Skip to content

bug: order_by is not preserved in .first() aggregate when used in window functions #11307

@NickCrews

Description

@NickCrews

What happened?

The order_by clause is preserved in group_by().agg(), but not in group_by().mutate().

import ibis

t = ibis.memtable(
    [
        (1, 2, "a"),
        (1, 1, "b"),
        (2, 5, "c"),
        (2, 1, "d"),
        (2, 8, "e"),
    ],
    schema={"groupby": "int", "rank": "int", "val": "string"},
)
# If I swap this to a .aggregate() then it works as expected
t = t.group_by("groupby").mutate(first_val_ordered=t.val.first(order_by=t.rank))
print(ibis.to_sql(t))
print(repr(t.get_backend().compiler.to_sqlglot(t)))
print(t.execute().to_markdown())

results in

SELECT
  "t0"."groupby",
  "t0"."rank",
  "t0"."val",
  FIRST_VALUE("t0"."val") OVER (PARTITION BY "t0"."groupby" ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS "first_val_ordered"
FROM "ibis_pandas_memtable_luaaov2shbhlrpp3zt53oagsau" AS "t0"
Select(
  expressions=[
    Column(
      this=Identifier(this=groupby, quoted=True),
      table=Identifier(this=t0, quoted=True)),
    Column(
      this=Identifier(this=rank, quoted=True),
      table=Identifier(this=t0, quoted=True)),
    Column(
      this=Identifier(this=val, quoted=True),
      table=Identifier(this=t0, quoted=True)),
    Alias(
      this=Window(
        this=FirstValue(
          this=Column(
            this=Identifier(this=val, quoted=True),
            table=Identifier(this=t0, quoted=True))),
        partition_by=(Column(
            this=Identifier(this=groupby, quoted=True),
            table=Identifier(this=t0, quoted=True)),),
        spec=WindowSpec(kind=ROWS, start=UNBOUNDED, start_side=PRECEDING, end=UNBOUNDED, end_side=FOLLOWING, over=OVER)),
      alias=Identifier(this=first_val_ordered, quoted=True))],
  from=From(
    this=Table(
      this=Identifier(this=ibis_pandas_memtable_luaaov2shbhlrpp3zt53oagsau, quoted=True),
      alias=Identifier(this=t0, quoted=True))))
|    |   groupby |   rank | val   | first_val_ordered   |
|---:|----------:|-------:|:------|:--------------------|
|  0 |         1 |      2 | a     | a                   |
|  1 |         1 |      1 | b     | a                   |
|  2 |         2 |      5 | c     | c                   |
|  3 |         2 |      1 | d     | c                   |
|  4 |         2 |      8 | e     | c                   |

If you point me in the right direction I can try to fix this, but I haven't yet dealt with window functions so they look intimidating to me

What version of ibis are you using?

main

What backend(s) are you using, if any?

duckdb

Relevant log output

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugIncorrect behavior inside of ibis

    Type

    No type

    Projects

    Status

    backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions