Skip to content

[mypyc] Switch to table-driven imports for smaller IR #14917

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Apr 24, 2023

Conversation

ichard26
Copy link
Collaborator

@ichard26 ichard26 commented Mar 17, 2023

Add CPyImport_ImportFromMany() which imports a module and a tuple of
names, placing them in the globals dict in one go.

Previously, each name would imported and placed in globals manually
in IR, leading to some pretty verbose code.

The other option to collect all from imports and perform them all at
once in the helper would remove even more ops, however, it has some
major downsides:

  • It wouldn't be able to be done in IRBuild directly, instead being
    handled in the prebuild visitor and codegen... which all sounds
    really involved.

  • It would cause from imports to be performed eagerly, potentially
    causing circular imports (especially in functions whose imports are
    probably there to avoid a circular import!).

The latter is the nail in the coffin for this idea.


Change how imports (not from imports!) are processed so they can be
table-driven (tuple-driven, really) and compact. Here's how it works:

Import nodes are divided in groups (in the prebuild visitor). Each group
consists of consecutive Import nodes:

import mod         <| group 1
import mod2         |

def foo() -> None:
    import mod3    <- group 2 (*)

import mod4        <- group 3

Every time we encounter the first import of a group, build IR to call
CPyImport_ImportMany() that will perform all of the group's imports in
one go.

(*) Imports in functions or classes are still transformed into the original,
verbose IR as speed is more important than codesize.

Previously, each module would imported and placed in globals manually
in IR, leading to some pretty verbose code.

The other option to collect all imports and perform them all at once in
the helper would remove even more ops, however, it's problematic for
the same reasons from the previous commit (spoiler: it's not safe).

Implementation notes:

  • I had to add support for loading the address of a static directly,
    so I shoehorned in LoadStatic support for LoadAddress.

  • Unfortunately by replacing multiple import nodes with a single
    function call at the IR level, if any import within a group fails,
    the traceback line number is static and will be probably wrong
    (pointing to the first import node in my original impl.).

    To fix this, I had to make CPyImport_ImportMany() add the traceback
    entry itself on failure (instead of letting codegen handle it
    automatically). This is admittedly ugly.

Overall, this doesn't speed up initialization. The only real speed
impact is that back to back imports in a tight loop seems to be 10-20%
slower. I believe that's acceptable given the code size reduction.


Other changes:

  • Don't declare internal static for non-compiled modules

    It won't be read anywhere as the internal statics are only used to
    avoid runaway recursion with import cycles in our module init
    functions.

  • Wrap long RArray initializers and long annotations in codegen

    Table-driven imports can load some rather large RArrays and tuple
    literals so this was needed to keep the generated C readable.

  • Add LLBuilder helper for setting up a RArray

Resolves mypyc/mypyc#591.

Table-driven imports can initialize some rather large RArrays so let's
make sure they stay readable. Thankfully, all it takes is moving and
reusing c_array_initializer().
It won't be read anywhere as the internal statics are only used to
avoid runaway recursion with import cycles in our module init functions.
Add CPyImport_ImportFromMany() which imports a module and a tuple of
names, placing them in the globals dict in one go.

Previously, each name would imported and placed in globals manually
in IR, leading to some pretty verbose code.

The other option to collect all from imports and perform them all at
once in the helper would remove even more ops, however, it has some
major downsides:

- It wouldn't be able to be done in IRBuild directly, instead being
  handled in the prebuild visitor and codegen... which all sounds
  really involved.

- It would cause from imports to be performed eagerly, potentially
  causing circular imports (especially in functions whose imports are
  probably there to avoid a circular import!).

The latter is the nail in the coffin for this idea.
Change how imports (not from imports!) are processed so they can be
table-driven (tuple-driven, really) and compact. Here's how it works:

Import nodes are divided in groups (in the prebuild visitor). Each group
consists of consecutive Import nodes:

  import mod         <| group 1
  import mod2         |

  def foo() -> None:
      import mod3    <- group 2

  import mod4        <- group 3

Every time we encounter the first import of a group, build IR to call
CPyImport_ImportMany() that will perform all of the group's imports in
one go.

Previously, each module would imported and placed in globals manually
in IR, leading to some pretty verbose code.

The other option to collect all imports and perform them all at once in
the helper would remove even more ops, however, it's problematic for
the same reasons from the previous commit (spoiler: it's not safe).

Implementation notes:

  - I had to add support for loading the address of a static directly,
    so I shoehorned in LoadLiteral support for LoadAddress.

  - Unfortunately by replacing multiple import nodes with a single
    function call at the IR level, if any import within a group fails,
    the traceback line number is static and will be probably wrong
    (pointing to the first import node in my original impl.).

    To fix this, I had to make CPyImport_ImportMany() add the traceback
    entry itself on failure (instead of letting codegen handle it
    automatically). This is admittedly ugly.

Overall, this doesn't speed up initialization. The only real speed
impact is that back to back imports in a tight loop is 10-20% slower. I
believe that's acceptable given the code reduction.
Just like with long arrays, table-driven imports can lead to some
massive literals being loaded... let's keep the C readable.
Copy link
Collaborator Author

@ichard26 ichard26 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference, this is the file I used to measure looped import performance:

import time, statistics

import dataclasses, enum, typing

times = []
for _ in range(10):
    t0 = time.perf_counter()
    for _ in range(10000):
        import dataclasses
        # import enum
        # import typing
    elapsed = (time.perf_counter() - t0) * 1000
    times.append(elapsed)
    print(f"{elapsed:.2f}")
mean = statistics.mean(times)
stdev = statistics.stdev(times)
print(f"mean: {mean:.1f} (stdev: {stdev:.3f}) | min: {min(times):.1f} | max: {max(times):.1f}")

On master (best):

1.04
1.03
1.01
1.01
1.02
1.02
1.10
1.02
1.03
1.01
mean: 1.0 (stdev: 0.026) | min 1.0 | max: 1.1

This PR (most common result):

1.18
1.19
1.20
1.19
1.19
1.22
1.22
1.21
1.21
1.23
mean: 1.2 (stdev: 0.018) | min 1.2 | max: 1.2

If the slowdown is unacceptable, it's possible to cut it in half by unrolling the call to single_import() in IR (only within functions where performance could matter).

I haven't measured whether from imports are any slower.

@ichard26
Copy link
Collaborator Author

ichard26 commented Mar 17, 2023

Here's a GitHub Gist with the old and new C code for the file I've been using to test my implementation. The new code is roughly 3000 LOC smaller! It's not a real example so the usual caveats apply, but long from imports and "import groups" aren't uncommon in the real world. (I stole the long from import from some mypyc file IIRC.)

Note: Imports in functions (or classes) are now transformed into the original verbose but faster IR, so this example is technically invalid. It still shows how effective this PR is at reducing codesize though.

@AlexWaygood AlexWaygood added the topic-mypyc mypyc bugs label Mar 26, 2023
@ichard26 ichard26 force-pushed the tighter-imports-try-3 branch from 1337874 to f908715 Compare March 28, 2023 00:04
@JukkaL
Copy link
Collaborator

JukkaL commented Apr 14, 2023

I haven't done a full review yet, but based on a quick experiment the impact to self-compiling is pretty huge -- ~8% smaller generated code, and ~7% faster compilation when using -O0 (running an interpreted mypyc). And once a few additional bottlenecks in compilation speed are fixed, the relative impact will be bigger.

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, thanks! Left some comments, the main questions are about the possibility of reverting to the old behavior within functions. If the performance impact would be negligible, this is not important.

module = builder.gen_import_from(id, globals, imported_names, node.line)

# Copy everything into our module's dict.
builder.imports[id] = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to above, what about only using the table based option outside functions? Imports within functions are already something of a performance bottleneck, so it would be nice if we could keep them as fast as they are currently. (Eventually I'd like us to make nested imports faster, but that might take a while.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't measure any slowdown running from typing import Any in a tight loop with the table-driven code.1 In fact, the table-driven code can be faster if there are several names. (I tried six.)

**one name**

[new] mean: 11.7 (stdev: 0.273) | min 11.4 | max: 12.2
[new] mean: 12.0 (stdev: 0.530) | min 11.3 | max: 13.0
[old] mean: 12.8 (stdev: 0.492) | min 11.8 | max: 13.5
[old] mean: 12.9 (stdev: 0.606) | min 12.0 | max: 13.8

**six names**

[new] mean: 15.9 (stdev: 0.543) | min 14.9 | max: 16.7
[new] mean: 15.0 (stdev: 0.365) | min 14.6 | max: 15.6
[old] mean: 18.8 (stdev: 0.358) | min 18.3 | max: 19.3
[old] mean: 18.4 (stdev: 0.451) | min 17.6 | max: 19.1

Since the table-driven code doesn't seem to introduce any slowdown, I won't bring back the old from import code.

Footnotes

  1. My data suggests it's actually faster, but that seems suspect. I won't call it any faster as the speedup is within the margin of error.

@ichard26 ichard26 requested a review from JukkaL April 19, 2023 21:05
@ichard26
Copy link
Collaborator Author

Also, when you say "compile time" or "compile speed", do you mean the time mypyc + gcc/clang/msvc takes OR just the time mypyc takes? I'd guess the latter, but it has never been clear to me...

@JukkaL
Copy link
Collaborator

JukkaL commented Apr 24, 2023

Also, when you say "compile time" or "compile speed", do you mean the time mypyc + gcc/clang/msvc takes OR just the time mypyc takes? I'd guess the latter, but it has never been clear to me...

I mean the overall time, i.e. mypyc + C compiler. For larger programs this is usually dominated by C compilation speed when using optimizations.

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates! This looks great now.

@JukkaL JukkaL merged commit ba35026 into python:master Apr 24, 2023
@ichard26
Copy link
Collaborator Author

And thank you for the review! :)

@ichard26 ichard26 deleted the tighter-imports-try-3 branch April 29, 2023 05:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-mypyc mypyc bugs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Experiment with table-driven import code
3 participants