Running Qwen3 XNNPACK on Android fails while setting up pretokenizer #10867

@kkimmk

Description

I'm trying to run the Qwen3-0.6B model on Android using XNNPACK, following the instructions in the qwen3 example and step 4 of the llama example.

When running ./llama_main --model_path qwen3-0_6b.pte --tokenizer_path tokenizer.json --prompt "Hi" --seq_len 120 on a Galaxy S23, I got the following error while setting up the pretokenizer.

I 00:00:00.003396 executorch:cpuinfo_utils.cpp:62] Reading file /sys/devices/soc0/image_version
I 00:00:00.003583 executorch:main.cpp:76] Resetting threadpool with num threads = 4
I 00:00:00.008963 executorch:runner.cpp:90] Creating LLaMa runner: model_path=qwen3-0_6b.pte, tokenizer_path=tokenizer.json
Setting up pretokenizer...
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1747192804.212017   24371 re2.cc:237] Error parsing '((?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}| ?[^\s\p{L}\p{N}]+[\r\n]*|\s*[\r\n]+|\s...': invalid perl operator: (?!
RE2 failed to compile pattern with lookahead: (?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}| ?[^\s\p{L}\p{N}]+[\r\n]*|\s*[\r\n]+|\s+(?!\S)|\s+
Error: invalid perl operator: (?!
Compile with SUPPORT_REGEX_LOOKAHEAD=ON to enable support for lookahead patterns.
libc++abi: terminating due to uncaught exception of type std::runtime_error: Error: 4
Aborted

It seems the runner now accepts the .json format for the tokenizer. Is there anything I'm missing?
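For context, the pretokenizer pattern in the log is valid Perl-style regex; the failure comes from RE2 (which the runner uses, per the re2.cc line), which by design has no lookahead support. A minimal sketch with Python's stdlib re module, isolating just the \s+(?!\S) term that triggers "invalid perl operator: (?!":

```python
import re

# The "\s+(?!\S)" term from the tokenizer's pretokenizer pattern uses a
# negative lookahead. PCRE-style engines such as Python's re accept it;
# RE2 rejects it, which is exactly the error in the log above.
lookahead_term = re.compile(r"\s+(?!\S)")

# Matches whitespace runs not followed by a non-space character,
# i.e. trailing whitespace only.
m = lookahead_term.search("word   ")
print(repr(m.group(0)))  # the three trailing spaces
```

This is only a demonstration of why the pattern is rejected; the runner itself links RE2, not Python.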

Environments

  • executorch main branch (b173722)
  • Android NDK r28b
  • Galaxy S23
  • Ran install_executorch.sh --pybind xnnpack and examples/models/llama/install_requirements.sh.

Commands used (taken from the instructions)

  • model
    python -m examples.models.llama.export_llama \
      --model qwen3-0_6b \
      --params examples/models/qwen3/0_6b_config.json \
      -kv \
      --use_sdpa_with_kv_cache \
      -d fp32 \
      -X \
      --xnnpack-extended-ops \
      -qmode 8da4w \
      --metadata '{"get_bos_id": 151644, "get_eos_ids":[151645]}' \
      --output_name="qwen3-0_6b.pte" \
      --verbose
    
  • runner
    cmake -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
        -DANDROID_ABI=arm64-v8a \
        -DANDROID_PLATFORM=android-23 \
        -DCMAKE_INSTALL_PREFIX=cmake-out-android \
        -DCMAKE_BUILD_TYPE=Release \
        -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
        -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
        -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
        -DEXECUTORCH_ENABLE_LOGGING=1 \
        -DPYTHON_EXECUTABLE=python \
        -DEXECUTORCH_BUILD_XNNPACK=ON \
        -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
        -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
        -DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
        -Bcmake-out-android .
    
    cmake --build cmake-out-android -j16 --target install --config Release
    
    cmake  -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
        -DANDROID_ABI=arm64-v8a \
        -DANDROID_PLATFORM=android-23 \
        -DCMAKE_INSTALL_PREFIX=cmake-out-android \
        -DCMAKE_BUILD_TYPE=Release \
        -DPYTHON_EXECUTABLE=python \
        -DEXECUTORCH_BUILD_XNNPACK=ON \
        -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
        -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
        -DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
        -Bcmake-out-android/examples/models/llama \
        examples/models/llama
    
    cmake --build cmake-out-android/examples/models/llama -j16 --config Release
    

Afterwards, I pushed the .pte file, tokenizer.json, and llama_main to the device via adb and ran the command above.
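Given the "Compile with SUPPORT_REGEX_LOOKAHEAD=ON" hint in the log, one workaround may be reconfiguring the runner build with that flag. This is an assumption taken from the error message, not a verified fix; the reconfigure step would look something like:

```shell
# Hypothetical reconfigure of the llama runner: same flags as above, plus the
# lookahead option named in the error output (unverified assumption).
cmake -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI=arm64-v8a \
    -DANDROID_PLATFORM=android-23 \
    -DCMAKE_INSTALL_PREFIX=cmake-out-android \
    -DCMAKE_BUILD_TYPE=Release \
    -DPYTHON_EXECUTABLE=python \
    -DEXECUTORCH_BUILD_XNNPACK=ON \
    -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
    -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
    -DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
    -DSUPPORT_REGEX_LOOKAHEAD=ON \
    -Bcmake-out-android/examples/models/llama \
    examples/models/llama

cmake --build cmake-out-android/examples/models/llama -j16 --config Release
```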

Labels

triaged — This issue has been looked at by a team member, and triaged and prioritized into an appropriate module