Description
My theory is that the shapeless slowdown is caused by the change that inlines synchronized { if (moduleVar eq null) moduleVar = new NestedModule }; moduleVar
into the module accessor method; previously this code lived in a separate method.
Concretely, trait lazy vals are mixed into subclasses
with the needed synchronization logic in place, as do
lazy vals in classes and methods. Similarly, modules
are initialized using double checked locking.
Since the code to initialize a module is short,
we do not emit compute methods for modules (anymore). For simplicity, local lazy vals do not get a compute method either.
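To make the two accessor shapes concrete, here is a minimal hand-written sketch of what the compiler generates in each version. The names NestedModule, SplitAccessor, InlinedAccessor and moduleLzyCompute are hypothetical and exist only for illustration; the @volatile annotation is my assumption so that the hand-written double-checked locking is safe, and it is not something the listings in this issue show.

```scala
// Hypothetical stand-in for a nested module class such as Types$NoPrefix$.
final class NestedModule

// 2.11.8 shape: the accessor only holds the null check (fast path);
// the synchronized initialization lives in a separate "lzycompute" method.
class SplitAccessor {
  // Assumption: volatile, so the unsynchronized first read is safe.
  @volatile private[this] var moduleVar: NestedModule = _

  private def moduleLzyCompute(): NestedModule = {
    synchronized { if (moduleVar eq null) moduleVar = new NestedModule }
    moduleVar
  }

  def module: NestedModule =
    if (moduleVar eq null) moduleLzyCompute() else moduleVar
}

// 2.12.0-RC1 shape: the synchronized block is inlined into the accessor,
// so the accessor carries the monitor enter/exit and its exception handler.
class InlinedAccessor {
  // Assumption: volatile, as above.
  @volatile private[this] var moduleVar: NestedModule = _

  def module: NestedModule = {
    if (moduleVar eq null)
      synchronized { if (moduleVar eq null) moduleVar = new NestedModule }
    moduleVar
  }
}
```

Both classes behave the same from the caller's point of view; they differ only in how much bytecode ends up in the accessor itself, which is what the javap listings for the two versions show.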
2.11.8:
scala> :javap -c -private scala.reflect.internal.SymbolTable#NoPrefix
public scala.reflect.internal.Types$NoPrefix$ NoPrefix();
Code:
0: aload_0
1: getfield #3094 // Field NoPrefix$module:Lscala/reflect/internal/Types$NoPrefix$;
4: ifnonnull 14
7: aload_0
8: invokespecial #3100 // Method NoPrefix$lzycompute:()Lscala/reflect/internal/Types$NoPrefix$;
11: goto 18
14: aload_0
15: getfield #3094 // Field NoPrefix$module:Lscala/reflect/internal/Types$NoPrefix$;
18: areturn
public scala.reflect.api.Types$TypeApi NoPrefix();
Code:
0: aload_0
1: invokevirtual #5560 // Method NoPrefix:()Lscala/reflect/internal/Types$NoPrefix$;
4: areturn
scala> :javap -c -private scala.reflect.internal.SymbolTable#NoPrefix$lzycompute
private scala.reflect.internal.Types$NoPrefix$ NoPrefix$lzycompute();
Code:
0: aload_0
1: dup
2: astore_1
3: monitorenter
4: aload_0
5: getfield #3094 // Field NoPrefix$module:Lscala/reflect/internal/Types$NoPrefix$;
8: ifnonnull 23
11: aload_0
12: new #3096 // class scala/reflect/internal/Types$NoPrefix$
15: dup
16: aload_0
17: invokespecial #3097 // Method scala/reflect/internal/Types$NoPrefix$."<init>":(Lscala/reflect/internal/SymbolTable;)V
20: putfield #3094 // Field NoPrefix$module:Lscala/reflect/internal/Types$NoPrefix$;
23: getstatic #737 // Field scala/runtime/BoxedUnit.UNIT:Lscala/runtime/BoxedUnit;
26: pop
27: aload_0
28: monitorexit
29: aload_0
30: getfield #3094 // Field NoPrefix$module:Lscala/reflect/internal/Types$NoPrefix$;
33: areturn
34: aload_1
35: monitorexit
36: athrow
Exception table:
from to target type
4 29 34 any
2.12.0-RC1:
scala> :javap -c -private scala.reflect.internal.SymbolTable#NoPrefix
public scala.reflect.internal.Types$NoPrefix$ NoPrefix();
Code:
0: aload_0
1: getfield #2615 // Field NoPrefix$module:Lscala/reflect/internal/Types$NoPrefix$;
4: ifnonnull 36
7: aload_0
8: monitorenter
9: aload_0
10: getfield #2615 // Field NoPrefix$module:Lscala/reflect/internal/Types$NoPrefix$;
13: ifnonnull 28
16: aload_0
17: new #867 // class scala/reflect/internal/Types$NoPrefix$
20: dup
21: aload_0
22: invokespecial #2616 // Method scala/reflect/internal/Types$NoPrefix$."<init>":(Lscala/reflect/internal/SymbolTable;)V
25: putfield #2615 // Field NoPrefix$module:Lscala/reflect/internal/Types$NoPrefix$;
28: aload_0
29: monitorexit
30: goto 36
33: aload_0
34: monitorexit
35: athrow
36: aload_0
37: getfield #2615 // Field NoPrefix$module:Lscala/reflect/internal/Types$NoPrefix$;
40: areturn
This has probably interfered with HotSpot's inlining.
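As a rough check of that hypothesis, the bytecode sizes can be read off the javap listings and compared against HotSpot's size limit for inlining methods at call sites that are not yet hot. The sketch below assumes the default -XX:MaxInlineSize=35 (bytes of bytecode); that value is not stated anywhere in this issue and should be confirmed for the JVM in use, for example with java -XX:+PrintFlagsFinal. InlineBudget and its members are hypothetical names.

```scala
object InlineBudget {
  // Size = offset of the last instruction + 1 (areturn and athrow are 1 byte).
  val accessorSize2118    = 18 + 1 // NoPrefix() in 2.11.8
  val lzyComputeSize2118  = 36 + 1 // NoPrefix$lzycompute() in 2.11.8
  val accessorSize2120RC1 = 40 + 1 // NoPrefix() in 2.12.0-RC1

  // Assumption: HotSpot default for -XX:MaxInlineSize.
  val maxInlineSize = 35

  // True if a method of this size is within the MaxInlineSize limit.
  def fitsMaxInlineSize(size: Int): Boolean = size <= maxInlineSize
}
```

Under that assumption the 2.11.8 accessor (19 bytes) fits within the limit, while the 2.12.0-RC1 accessor (41 bytes) does not and would only be inlined once its call site is considered hot. This is consistent with the theory but does not prove it; the inlining decisions themselves would need to be inspected, for example with -XX:+PrintInlining.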
I was comparing 2.11.8 and 2.12.0-RC1++ performance on the shapeless test suite, file by file. hlist.scala showed the worst regression, probably because of the relatively high number of subtype checks performed during implicit search.
⚡ time ((scalac -J-XX:+UnlockCommercialFeatures -J-XX:+UnlockDiagnosticVMOptions -J-XX:+DebugNonSafepoints -J-XX:+FlightRecorder -J-XX:FlightRecorderOptions=defaultrecording=true,dumponexit=true,stackdepth=1024,loglevel=debug,settings=profile,dumponexitpath=/tmp/old.jfr -Xlog-implicits @args-small.txt /Users/jz/code/shapeless/core/src/test/scala/shapeless/hlist.scala 2>&1) > /tmp/hlist-old.log)
real 0m27.659s
user 1m33.629s
sys 0m2.012s
⚡ time ((/code/scala/build/pack/bin/scalac -J-XX:+UnlockCommercialFeatures -J-XX:+UnlockDiagnosticVMOptions -J-XX:+DebugNonSafepoints -J-XX:+FlightRecorder -J-XX:FlightRecorderOptions=defaultrecording=true,dumponexit=true,stackdepth=1024,loglevel=debug,settings=profile,dumponexitpath=/tmp/new.jfr -Xlog-implicits @args-2.12.0-RC-small.txt /Users/jz/code/shapeless/core/src/test/scala/shapeless/hlist.scala 2>&1) > /tmp/hlist-new.log)
real 1m27.506s
user 2m37.283s
sys 0m2.292s
The actual output of -Xlog-implicits was quite similar in both runs, so it seems a similar amount of work was being done.