Use Vector API in the Java Extension #824

Open
wants to merge 10 commits into
base: master
from

samyron
Contributor

@samyron samyron commented Jul 8, 2025

PLEASE DO NOT MERGE

Overview

This PR uses the jdk.incubator.vector module as mentioned in issue #739 to accelerate generating JSON with the same algorithm as the C extension.

As it stands, this PR attempts to build the json.ext.VectorizedEscapeScanner class with a target release of 16, the first version of Java that supports the jdk.incubator.vector module. The remaining code is built for Java 1.8. At runtime, json.ext.VectorizedEscapeScanner is loaded only if the json.enableVectorizedEscapeScanner system property is set to true (or 1).
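
The opt-in loading described above could look roughly like the following. This is a minimal sketch, not the code in this PR: `ScannerSketch`, `ScalarScannerSketch`, and `ScannerLoaderSketch` are hypothetical names, and the sketch assumes the vectorized class has a public no-arg constructor and implements the hypothetical interface. Only the json.ext.VectorizedEscapeScanner class name and the json.enableVectorizedEscapeScanner property come from the description.

```java
// Hypothetical sketch of property-gated, reflective loading with a scalar fallback.
interface ScannerSketch {
    String name();
}

final class ScalarScannerSketch implements ScannerSketch {
    @Override
    public String name() {
        return "scalar";
    }
}

final class ScannerLoaderSketch {
    static final String PROPERTY = "json.enableVectorizedEscapeScanner";
    static final String VECTORIZED_CLASS = "json.ext.VectorizedEscapeScanner";

    // The property counts as enabled when it is "true" or "1".
    static boolean enabled() {
        String value = System.getProperty(PROPERTY, "false");
        return "true".equalsIgnoreCase(value) || "1".equals(value);
    }

    static ScannerSketch load() {
        if (!enabled()) {
            return new ScalarScannerSketch();
        }
        try {
            // Loading by name keeps the Java 8 code free of any compile-time
            // reference to classes that need jdk.incubator.vector.
            Class<?> cls = Class.forName(VECTORIZED_CLASS);
            Object instance = cls.getDeclaredConstructor().newInstance();
            if (instance instanceof ScannerSketch) {
                return (ScannerSketch) instance;
            }
        } catch (ReflectiveOperationException | LinkageError e) {
            // Class missing, wrong class-file version, or module not added:
            // fall through to the scalar implementation.
        }
        return new ScalarScannerSketch();
    }

    public static void main(String[] args) {
        System.out.println(load().name());
    }
}
```

Catching `LinkageError` as well as `ReflectiveOperationException` is a deliberate choice in this sketch: on a JVM older than 16, or one started without `--add-modules jdk.incubator.vector`, the failure may surface as an error rather than a checked exception.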

I'm not entirely sure how this is packaged / included with JRuby, so I'd love @byroot's and @headius's (and others'?) thoughts on how to package and/or structure the JARs. I did consider putting json.ext.VectorizedEscapeScanner in a separate generator-vectorized.jar, but I thought I'd solicit feedback before spending any more time on the build / package process.

Benchmarks

Machine: M1 MacBook Air

Note: I've had trouble modifying the compare.rb I was using for the C extension to work reliably with the Java extension. I'll probably spend more time trying to get it to work, but as of right now these are pretty raw benchmarks.

Below are two sample runs of the real-world benchmarks. The results are much more variable than with the C extension for some reason. I'm not sure if HotSpot is doing something slightly different per execution.

Vector API Enabled

scott@Scotts-MacBook-Air json % ONLY=json JAVA_OPTS='--add-modules jdk.incubator.vector -Djson.enableVectorizedEscapeScanner=true' ruby -I"lib" benchmark/encoder-realworld.rb
WARNING: Using incubator modules: jdk.incubator.vector
== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
                json     1.384k i/100ms
Calculating -------------------------------------
                json     15.289k (± 0.8%) i/s   (65.41 μs/i) -    153.624k in  10.048481s

== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
                json    76.000 i/100ms
Calculating -------------------------------------
                json    753.787 (± 3.6%) i/s    (1.33 ms/i) -      7.524k in   9.997059s

== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
                json   173.000 i/100ms
Calculating -------------------------------------
                json      1.751k (± 1.1%) i/s  (571.24 μs/i) -     17.646k in  10.081260s

== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
                json     2.390k i/100ms
Calculating -------------------------------------
                json     23.829k (± 0.8%) i/s   (41.97 μs/i) -    239.000k in  10.030503s

Vector API Disabled

scott@Scotts-MacBook-Air json % ONLY=json JAVA_OPTS='--add-modules jdk.incubator.vector -Djson.enableVectorizedEscapeScanner=false' ruby -I"lib" benchmark/encoder-realworld.rb
WARNING: Using incubator modules: jdk.incubator.vector
VectorizedEscapeScanner disabled.
== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
                json     1.204k i/100ms
Calculating -------------------------------------
                json     12.937k (± 1.1%) i/s   (77.30 μs/i) -    130.032k in  10.052234s

== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
                json    80.000 i/100ms
Calculating -------------------------------------
                json    817.378 (± 1.0%) i/s    (1.22 ms/i) -      8.240k in  10.082058s

== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
                json   147.000 i/100ms
Calculating -------------------------------------
                json      1.499k (± 1.3%) i/s  (667.08 μs/i) -     14.994k in  10.004181s

== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
                json     2.269k i/100ms
Calculating -------------------------------------
                json     22.366k (± 5.7%) i/s   (44.71 μs/i) -    224.631k in  10.097069s

master as of commit c5af1b68c582335c2a82bbc4bfa5b3e41ead1eba

scott@Scotts-MacBook-Air json % ONLY=json ruby -I"lib" benchmark/encoder-realworld.rb
== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
                json   886.000 i/100ms
Calculating -------------------------------------
                json^C%                                                                                                                   
scott@Scotts-MacBook-Air json % ONLY=json ruby -I"lib" benchmark/encoder-realworld.rb
== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
                json     1.031k i/100ms
Calculating -------------------------------------
                json     10.812k (± 1.3%) i/s   (92.49 μs/i) -    108.255k in  10.014260s

== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
                json    82.000 i/100ms
Calculating -------------------------------------
                json    824.921 (± 1.0%) i/s    (1.21 ms/i) -      8.282k in  10.040787s

== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
                json   141.000 i/100ms
Calculating -------------------------------------
                json      1.421k (± 0.7%) i/s  (703.85 μs/i) -     14.241k in  10.023979s

== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
                json     2.274k i/100ms
Calculating -------------------------------------
                json     22.612k (± 0.9%) i/s   (44.22 μs/i) -    227.400k in  10.057516s

Observations

activitypub.json and twitter.json seem to be consistently faster with the Vector API enabled. citm_catalog.json seems consistently a bit slower and ohai.json is fairly close to even.
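
To put rough numbers on this, the relative change can be computed from the i/s figures in the "Vector API Enabled" and "Vector API Disabled" runs above. The snippet below is only illustrative arithmetic on those single runs; the class and method names are made up for the example, and given the run-to-run variance noted earlier the results should be read loosely.

```java
// Illustrative arithmetic only: percentage change in throughput (i/s)
// between the "enabled" and "disabled" runs reported above.
final class SpeedupSketch {
    // Percentage change of `enabled` relative to `disabled`.
    static double percentChange(double enabled, double disabled) {
        return (enabled - disabled) / disabled * 100.0;
    }

    public static void main(String[] args) {
        String[] names = {"activitypub.json", "citm_catalog.json", "twitter.json", "ohai.json"};
        double[] enabled = {15289.0, 753.787, 1751.0, 23829.0};   // i/s, Vector API enabled
        double[] disabled = {12937.0, 817.378, 1499.0, 22366.0};  // i/s, Vector API disabled
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-18s %+.1f%%%n", names[i], percentChange(enabled[i], disabled[i]));
        }
    }
}
```

For these particular runs that works out to roughly +18% for activitypub.json, -8% for citm_catalog.json, +17% for twitter.json, and +7% for ohai.json, the last of which is within the ± 5.7% spread reported for the disabled run.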

@samyron samyron force-pushed the sm/java-vector-simd branch from 194ba01 to 15c7187 Compare July 15, 2025 03:12
@samyron
Contributor Author

samyron commented Jul 15, 2025

Using hsdis to examine the generated assembly, I can verify that on my MacBook Air the HotSpot C2 compiler does indeed emit NEON instructions.

ONLY=json JAVA_OPTS='--add-modules jdk.incubator.vector -Djson.enableVectorizedEscapeScanner=true -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -XX:+PrintIntrinsics -XX:CompileCommand=print,*VectorizedEscapeScanner.*' ruby -I"lib" benchmark/encoder-realworld.rb > output.txt 2>&1
Compiled method (c2)   22086 5801       4       json.ext.VectorizedEscapeScanner::scan (391 bytes)
<snip>

[Disassembly]
--------------------------------------------------------------------------------
[Constant Pool (empty)]

--------------------------------------------------------------------------------

[Entry Point]
  # {method} {0x0000000133c3a0d8} 'scan' '(Ljson/ext/EscapeScanner$State;)Z' in 'json/ext/VectorizedEscapeScanner'
  # this:     c_rarg1:c_rarg1 
                        = 'json/ext/VectorizedEscapeScanner'
  # parm0:    c_rarg2:c_rarg2 
                        = 'json/ext/EscapeScanner$State'
  #           [sp+0x30]  (sp of caller)
  0x000000011b28d0c0:   ldr		w8, [x1, #8]
  0x000000011b28d0c4:   cmp		w9, w8
  0x000000011b28d0c8:   b.eq		#0x11b28d0d0
  0x000000011b28d0cc:   b		#0x11aa5fe80        ;   {runtime_call ic_miss_stub}
[Verified Entry Point]
  0x000000011b28d0d0:   nop		
  0x000000011b28d0d4:   sub		x9, sp, #0x14, lsl #12
  0x000000011b28d0d8:   str		xzr, [x9]
  0x000000011b28d0dc:   sub		sp, sp, #0x30
 <snip>
  0x000000011b28d194:   add		x12, x5, w14, sxtw
  0x000000011b28d198:   ldr		q20, [x12, #0x10]
  0x000000011b28d19c:   eor		v21.16b, v20.16b, v17.16b
  0x000000011b28d1a0:   cmgt		v22.16b, v19.16b, v20.16b
  0x000000011b28d1a4:   cmgt		v21.16b, v18.16b, v21.16b
  0x000000011b28d1a8:   cmeq		v20.16b, v20.16b, v16.16b
  0x000000011b28d1ac:   bic		v21.16b, v21.16b, v22.16b
  0x000000011b28d1b0:   orr		v20.16b, v20.16b, v21.16b
  0x000000011b28d1b4:   str		w1, [x2, #0x30]
  0x000000011b28d1b8:   addv		b21, v20.16b
  0x000000011b28d1bc:   umov		w8, v21.b[0]
  0x000000011b28d1c0:   cmp		w8, wzr
  0x000000011b28d1c4:   b.ne		#0x11b28d40c
  0x000000011b28d1c8:   add		w14, w7, #0x10
  0x000000011b28d1cc:   ldr		x12, [x28, #0x450]
  0x000000011b28d1d0:   str		w14, [x2, #0x14]    ; ImmutableOopMap {c_rarg2=Oop c_rarg5=Oop }
                                                            ;*goto {reexecute=1 rethrow=0 return_oop=0}
                                                            ; - (reexecute) json.ext.VectorizedEscapeScanner::scan@308 (line 59)
<snip>
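
For readers not used to NEON, the eor / cmgt / cmgt / cmeq / bic / orr sequence above can be restated per byte in plain Java. This is one plausible reading of the excerpt, not something the listing confirms: it assumes v16 holds the backslash, v17 holds 0x02, v18 holds 33, and v19 holds zero, none of which is visible in the snippet. What is certain is the target predicate: JSON requires escaping control characters below 0x20, the double quote, and the backslash. The class and method names are hypothetical.

```java
// Scalar restatement of a per-lane "needs escape" test.
// Register contents named in the comments are assumptions, not read from the listing.
final class EscapePredicateSketch {
    // Reference predicate: control characters, '"', or '\\'.
    static boolean needsEscapeNaive(byte b) {
        int u = b & 0xFF; // treat the byte as unsigned
        return u < 0x20 || u == '"' || u == '\\';
    }

    // Same predicate arranged the way signed byte lanes would compute it.
    static boolean needsEscapeLaneStyle(byte b) {
        boolean negative = b < 0;                      // cmgt: 0 > b, i.e. bytes >= 0x80
        boolean quoteOrControl = (byte) (b ^ 2) < 33;  // eor, then signed cmgt: 33 > (b ^ 2)
        boolean backslash = b == '\\';                 // cmeq
        // bic drops lanes that only matched because the signed compare
        // treats bytes >= 0x80 as negative; orr merges in the backslash lanes.
        return backslash || (quoteOrControl && !negative);
    }

    public static void main(String[] args) {
        for (int i = -128; i <= 127; i++) {
            byte b = (byte) i;
            if (needsEscapeNaive(b) != needsEscapeLaneStyle(b)) {
                throw new AssertionError("mismatch at byte " + i);
            }
        }
        System.out.println("predicates agree on all 256 byte values");
    }
}
```

The XOR with 2 works because it permutes the values 0..31 among themselves and maps the double quote (0x22) to 0x20, so a single "less than 33" comparison covers both cases. The trailing addv / umov / cmp would then simply check whether any lane matched.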

@headius
Contributor

headius commented Jul 16, 2025

@samyron OMG I look away for a few days and you just go and do it! Bravo!

I'll have a look at these changes soon and see if I can offer any suggestions. This API is still a bit of a moving target, but I think we can work around that with a little Ruby magic here and there.

I will also point the Vector API folks at this PR so they can see what we're doing and provide additional input.

Amazing work!

@headius
Contributor

headius commented Jul 16, 2025

I've posted a thread to the panama-dev list here: https://mail.openjdk.org/pipermail/panama-dev/2025-July/021080.html

2 participants