[mypyc] Switch to table-driven imports for smaller IR #14917
Conversation
Table-driven imports can initialize some rather large RArrays so let's make sure they stay readable. Thankfully, all it takes is moving and reusing c_array_initializer().
It won't be read anywhere as the internal statics are only used to avoid runaway recursion with import cycles in our module init functions.
Add CPyImport_ImportFromMany() which imports a module and a tuple of names, placing them in the globals dict in one go. Previously, each name would be imported and placed in globals manually in IR, leading to some pretty verbose code.
The other option, collecting all from imports and performing them all at once in the helper, would remove even more ops; however, it has some major downsides:
- It wouldn't be able to be done in IRBuild directly, instead being handled in the prebuild visitor and codegen... which all sounds really involved.
- It would cause from imports to be performed eagerly, potentially causing circular imports (especially in functions whose imports are probably there to avoid a circular import!).
The latter is the nail in the coffin for this idea.
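As a rough mental model, here is a Python-level sketch of what such a helper does (illustrative only; the real helper is written in C, and its exact signature, "as" name handling, and error paths differ):

# Python-level model of CPyImport_ImportFromMany() (illustrative sketch, not the
# actual C implementation): import the module once, then bind every requested name
# into the target globals dict in a single call.
def import_from_many(module_id: str, names: tuple[str, ...],
                     globals_dict: dict) -> object:
    # Same shape as "from module_id import name1, name2, ..." at module level.
    module = __import__(module_id, globals_dict, None, list(names), 0)
    for name in names:
        # "as" targets and the submodule fallback a real from-import performs
        # are omitted here for brevity.
        globals_dict[name] = getattr(module, name)
    return module

For example, import_from_many("typing", ("Any", "Optional"), globals()) would mirror "from typing import Any, Optional".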
Change how imports (not from imports!) are processed so they can be table-driven (tuple-driven, really) and compact. Here's how it works:
Import nodes are divided into groups (in the prebuild visitor). Each group consists of consecutive Import nodes:

    import mod   <| group 1
    import mod2   |

    def foo() -> None:
        import mod3  <- group 2

    import mod4      <- group 3

Every time we encounter the first import of a group, build IR to call CPyImport_ImportMany(), which will perform all of the group's imports in one go. Previously, each module would be imported and placed in globals manually in IR, leading to some pretty verbose code. The other option, collecting all imports and performing them all at once in the helper, would remove even more ops; however, it's problematic for the same reasons as in the previous commit (spoiler: it's not safe).
Implementation notes:
- I had to add support for loading the address of a static directly, so I shoehorned in LoadStatic support for LoadAddress.
- Unfortunately, by replacing multiple import nodes with a single function call at the IR level, if any import within a group fails, the traceback line number is static and will probably be wrong (pointing to the first import node in my original impl.). To fix this, I had to make CPyImport_ImportMany() add the traceback entry itself on failure (instead of letting codegen handle it automatically). This is admittedly ugly.
Overall, this doesn't speed up initialization. The only real speed impact is that back-to-back imports in a tight loop are 10-20% slower. I believe that's acceptable given the code reduction.
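For intuition, a similarly hedged Python-level sketch of what one grouped call amounts to (the function name, tuple layout, and "as" handling here are assumptions; the real CPyImport_ImportMany() is C and also updates mypyc's per-module statics):

# Python-level model of handling one import group (illustrative sketch only).
from __future__ import annotations

import importlib

def import_many(group: tuple[tuple[str, str | None], ...],
                globals_dict: dict) -> None:
    # Each entry is (fully qualified module name, "as" name or None).
    for fullname, as_name in group:
        module = importlib.import_module(fullname)
        if as_name is not None:
            # "import a.b as m" binds the submodule itself.
            globals_dict[as_name] = module
        else:
            # Plain "import a.b" binds the top-level package "a".
            top = fullname.split(".")[0]
            globals_dict[top] = importlib.import_module(top)

# e.g. group 1 in the diagram above would correspond to:
# import_many((("mod", None), ("mod2", None)), globals())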
Just like with long arrays, table-driven imports can lead to some massive literals being loaded... let's keep the C readable.
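For context, this is the kind of wrapping being referred to. The sketch below is a hypothetical helper, not mypyc's actual c_array_initializer(), whose exact behavior may differ:

# Wrap a long C array initializer across lines so the generated C stays readable
# (illustrative sketch under assumed formatting rules).
def wrap_array_initializer(components: list[str], indent: str = "    ",
                           max_width: int = 72) -> str:
    lines: list[str] = []
    current: list[str] = []
    cur_len = len(indent)
    for c in components:
        # Flush the current line when adding another component would overflow it.
        if current and cur_len + len(c) + 2 > max_width:
            lines.append(indent + ", ".join(current) + ",")
            current = []
            cur_len = len(indent)
        current.append(c)
        cur_len += len(c) + 2
    if current:
        lines.append(indent + ", ".join(current))
    return "{\n" + "\n".join(lines) + "\n}"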
For reference, this is the file I used to measure looped import performance:
import time, statistics
import dataclasses, enum, typing

times = []
for _ in range(10):
    t0 = time.perf_counter()
    for _ in range(10000):
        import dataclasses
        # import enum
        # import typing
    elapsed = (time.perf_counter() - t0) * 1000
    times.append(elapsed)
    print(f"{elapsed:.2f}")

mean = statistics.mean(times)
stdev = statistics.stdev(times)
print(f"mean: {mean:.1f} (stdev: {stdev:.3f}) | min: {min(times):.1f} | max: {max(times):.1f}")
On master (best):
1.04
1.03
1.01
1.01
1.02
1.02
1.10
1.02
1.03
1.01
mean: 1.0 (stdev: 0.026) | min: 1.0 | max: 1.1
This PR (most common result):
1.18
1.19
1.20
1.19
1.19
1.22
1.22
1.21
1.21
1.23
mean: 1.2 (stdev: 0.018) | min: 1.2 | max: 1.2
If the slowdown is unacceptable, it's possible to cut it in half by unrolling the call to single_import()
in IR (only within functions where performance could matter).
I haven't measured whether from imports are any slower.
Here's a GitHub Gist with the old and new C code for the file I've been using to test my implementation. The new code is roughly 3000 LOC smaller! It's not a real example so the usual caveats apply, but long from imports and "import groups" aren't uncommon in the real world. (I stole the long from import from some mypyc file IIRC.)
I haven't done a full review yet, but based on a quick experiment the impact to self-compiling is pretty huge -- ~8% smaller generated code, and ~7% faster compilation when using -O0 (running an interpreted mypyc). And once a few additional bottlenecks in compilation speed are fixed, the relative impact will be bigger.
Overall looks good, thanks! Left some comments, the main questions are about the possibility of reverting to the old behavior within functions. If the performance impact would be negligible, this is not important.
module = builder.gen_import_from(id, globals, imported_names, node.line)

# Copy everything into our module's dict.
builder.imports[id] = None
Similar to above, what about only using the table-based option outside functions? Imports within functions are already something of a performance bottleneck, so it would be nice if we could keep them as fast as they are currently. (Eventually I'd like us to make nested imports faster, but that might take a while.)
I can't measure any slowdown running `from typing import Any` in a tight loop with the table-driven code.[1] In fact, the table-driven code can be faster if there are several names. (I tried six.)
**one name**
[new] mean: 11.7 (stdev: 0.273) | min 11.4 | max: 12.2
[new] mean: 12.0 (stdev: 0.530) | min 11.3 | max: 13.0
[old] mean: 12.8 (stdev: 0.492) | min 11.8 | max: 13.5
[old] mean: 12.9 (stdev: 0.606) | min 12.0 | max: 13.8
**six names**
[new] mean: 15.9 (stdev: 0.543) | min 14.9 | max: 16.7
[new] mean: 15.0 (stdev: 0.365) | min 14.6 | max: 15.6
[old] mean: 18.8 (stdev: 0.358) | min 18.3 | max: 19.3
[old] mean: 18.4 (stdev: 0.451) | min 17.6 | max: 19.1
Since the table-driven code doesn't seem to introduce any slowdown, I won't bring back the old from import
code.
Footnotes
[1] My data suggests it's actually faster, but that seems suspect. I won't call it any faster as the speedup is within the margin of error.
The prebuild visitor would mark these two imports as part of the same group, which isn't right:

def f() -> None:
    import x
import y
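For what it's worth, a minimal sketch of scope-safe grouping (hypothetical code, not mypyc's actual prebuild visitor): computing runs of consecutive Import statements within a single statement list means imports from different scopes can never share a group.

# Illustrative sketch only: group runs of consecutive Import statements within
# one statement list, so nested and module-level imports never end up together.
from __future__ import annotations

from mypy.nodes import Import, Statement

def import_groups(body: list[Statement]) -> list[list[Import]]:
    groups: list[list[Import]] = []
    run: list[Import] = []
    for stmt in body:
        if isinstance(stmt, Import):
            run.append(stmt)
        elif run:
            # Any other statement ends the current run of imports.
            groups.append(run)
            run = []
    if run:
        groups.append(run)
    return groups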
Also, when you say "compile time" or "compile speed", do you mean the time mypyc + gcc/clang/msvc takes OR just the time mypyc takes? I'd guess the latter, but it has never been clear to me...
I mean the overall time, i.e. mypyc + C compiler. For larger programs this is usually dominated by C compilation speed when using optimizations.
Thanks for the updates! This looks great now.
And thank you for the review! :)
Add CPyImport_ImportFromMany() which imports a module and a tuple of
names, placing them in the globals dict in one go.
Previously, each name would be imported and placed in globals manually
in IR, leading to some pretty verbose code.
The other option, collecting all from imports and performing them all
at once in the helper, would remove even more ops; however, it has some
major downsides:
- It wouldn't be able to be done in IRBuild directly, instead being
  handled in the prebuild visitor and codegen... which all sounds
  really involved.
- It would cause from imports to be performed eagerly, potentially
  causing circular imports (especially in functions whose imports are
  probably there to avoid a circular import!).
The latter is the nail in the coffin for this idea.
Change how imports (not from imports!) are processed so they can be
table-driven (tuple-driven, really) and compact. Here's how it works:
Import nodes are divided into groups (in the prebuild visitor). Each group
consists of consecutive Import nodes:

    import mod   <| group 1
    import mod2   |

    def foo() -> None:
        import mod3  <- group 2 (*)

    import mod4      <- group 3

Every time we encounter the first import of a group, build IR to call
CPyImport_ImportMany(), which will perform all of the group's imports in
one go.
(*) Imports in functions or classes are still transformed into the original,
verbose IR, as speed is more important than code size there.
Previously, each module would be imported and placed in globals manually
in IR, leading to some pretty verbose code.
The other option, collecting all imports and performing them all at once in
the helper, would remove even more ops; however, it's problematic for the
same reasons as in the previous commit (spoiler: it's not safe).
Implementation notes:
- I had to add support for loading the address of a static directly,
  so I shoehorned in LoadStatic support for LoadAddress (a rough sketch
  of the resulting IR follows these notes).
- Unfortunately, by replacing multiple import nodes with a single
  function call at the IR level, if any import within a group fails,
  the traceback line number is static and will probably be wrong
  (pointing to the first import node in my original impl.).
  To fix this, I had to make CPyImport_ImportMany() add the traceback
  entry itself on failure (instead of letting codegen handle it
  automatically). This is admittedly ugly.
Overall, this doesn't speed up initialization. The only real speed
impact is that back-to-back imports in a tight loop seem to be 10-20%
slower. I believe that's acceptable given the code size reduction.
Other changes:
- Don't declare the internal static for non-compiled modules. It won't be
  read anywhere, as the internal statics are only used to avoid runaway
  recursion with import cycles in our module init functions.
- Wrap long RArray initializers and long annotations in codegen.
  Table-driven imports can load some rather large RArrays and tuple
  literals, so this was needed to keep the generated C readable.
- Add an LLBuilder helper for setting up an RArray.

Resolves mypyc/mypyc#591.