Description
What happened?
The order_by clause is preserved in group_by().agg()
, but not in group_by().mutate()
.
import ibis
t = ibis.memtable(
[
(1, 2, "a"),
(1, 1, "b"),
(2, 5, "c"),
(2, 1, "d"),
(2, 8, "e"),
],
schema={"groupby": "int", "rank": "int", "val": "string"},
)
# If I swap this to a .aggregate() then it works as expected
t = t.group_by("groupby").mutate(first_val_ordered=t.val.first(order_by=t.rank))
print(ibis.to_sql(t))
print(repr(t.get_backend().compiler.to_sqlglot(t)))
print(t.execute().to_markdown())
results in
SELECT
"t0"."groupby",
"t0"."rank",
"t0"."val",
FIRST_VALUE("t0"."val") OVER (PARTITION BY "t0"."groupby" ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS "first_val_ordered"
FROM "ibis_pandas_memtable_luaaov2shbhlrpp3zt53oagsau" AS "t0"
Select(
expressions=[
Column(
this=Identifier(this=groupby, quoted=True),
table=Identifier(this=t0, quoted=True)),
Column(
this=Identifier(this=rank, quoted=True),
table=Identifier(this=t0, quoted=True)),
Column(
this=Identifier(this=val, quoted=True),
table=Identifier(this=t0, quoted=True)),
Alias(
this=Window(
this=FirstValue(
this=Column(
this=Identifier(this=val, quoted=True),
table=Identifier(this=t0, quoted=True))),
partition_by=(Column(
this=Identifier(this=groupby, quoted=True),
table=Identifier(this=t0, quoted=True)),),
spec=WindowSpec(kind=ROWS, start=UNBOUNDED, start_side=PRECEDING, end=UNBOUNDED, end_side=FOLLOWING, over=OVER)),
alias=Identifier(this=first_val_ordered, quoted=True))],
from=From(
this=Table(
this=Identifier(this=ibis_pandas_memtable_luaaov2shbhlrpp3zt53oagsau, quoted=True),
alias=Identifier(this=t0, quoted=True))))
| | groupby | rank | val | first_val_ordered |
|---:|----------:|-------:|:------|:--------------------|
| 0 | 1 | 2 | a | a |
| 1 | 1 | 1 | b | a |
| 2 | 2 | 5 | c | c |
| 3 | 2 | 1 | d | c |
| 4 | 2 | 8 | e | c |
If you point me in the right direction I can try to fix this, but I haven't yet dealt with window functions so they look intimidating to me
What version of ibis are you using?
main
What backend(s) are you using, if any?
duckdb
Relevant log output
Code of Conduct
- I agree to follow this project's Code of Conduct
Metadata
Metadata
Assignees
Type
Projects
Status
backlog