
Fix quantize_row_q4_1() with ARM_NEON #876


Description

@ggerganov

It is currently bugged. See results of quantize-stats on M1:

$  ./quantize-stats -m models/7B/ggml-model-f16.bin 
Loading model
llama.cpp: loading model from models/7B/ggml-model-f16.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 256
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: f16        = 1
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =  59.11 KB
llama_model_load_internal: mem required  = 14645.07 MB (+ 2052.00 MB per state)
llama_init_from_file: kv self size  =  256.00 MB
note: source model is f16
testing 291 layers with max size 131072000
q4_0                                              : rmse 0.00222150, maxerr 0.18429124, 95pct<0.0040, median<0.0018
q4_1                                              : rmse 0.00360044, maxerr 0.26373291, 95pct<0.0066, median<0.0028

main:    total time = 93546.68 ms

The RMSE is too high - worse than Q4_0, even though Q4_1 stores an extra per-block minimum and should therefore be at least as accurate.

There is a bug in the following piece of code:

https://github.com/ggerganov/llama.cpp/blob/180b693a47b6b825288ef9f2c39d24b6eea4eea6/ggml.c#L922-L955

We should fix it.
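
For reference, a correct NEON path has to produce the same result as the scalar path: per block of 32 values, find the minimum and maximum, derive the scale d = (max - min)/15, store d and the minimum, and pack round((x - min)/d) into nibbles. Below is a minimal, self-contained sketch of that scalar logic with a round-trip RMSE check. The block_q4_1 layout and QK = 32 block size are assumptions taken from this revision of ggml.c, and quantize_row_q4_1_ref is a hypothetical helper written for illustration, not the function in the repo.

// Sketch of the scalar Q4_1 quantization the ARM_NEON path should reproduce.
// The block layout (QK = 32, float d, float m, 16 packed nibbles) is assumed
// from the ggml.c of this revision and may not match the current code exactly.
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>

#define QK 32

typedef struct {
    float   d;          // delta (scale): d = (max - min) / 15
    float   m;          // minimum of the block
    uint8_t qs[QK / 2]; // 32 4-bit quants packed into 16 bytes
} block_q4_1;

// Reference (scalar) quantization: per 32-value block, find min/max,
// derive the scale, and store each value as round((x - min) / d) in [0, 15].
static void quantize_row_q4_1_ref(const float *x, block_q4_1 *y, int k) {
    assert(k % QK == 0);
    const int nb = k / QK;

    for (int i = 0; i < nb; i++) {
        float min =  FLT_MAX;
        float max = -FLT_MAX;

        for (int l = 0; l < QK; l++) {
            const float v = x[i*QK + l];
            if (v < min) min = v;
            if (v > max) max = v;
        }

        const float d  = (max - min) / 15.0f;
        const float id = d != 0.0f ? 1.0f / d : 0.0f;

        y[i].d = d;
        y[i].m = min;

        for (int l = 0; l < QK; l += 2) {
            const uint8_t v0 = (uint8_t) roundf((x[i*QK + l + 0] - min) * id);
            const uint8_t v1 = (uint8_t) roundf((x[i*QK + l + 1] - min) * id);

            assert(v0 < 16 && v1 < 16);

            y[i].qs[l/2] = v0 | (v1 << 4); // even index -> low nibble, odd -> high
        }
    }
}

// Tiny self-check: round-trip one block and print its RMSE, which should be
// on the order of the quantize-stats numbers above for a correct implementation.
int main(void) {
    float x[QK];
    for (int l = 0; l < QK; l++) {
        x[l] = sinf(0.3f * l) - 0.5f; // arbitrary test data
    }

    block_q4_1 y;
    quantize_row_q4_1_ref(x, &y, QK);

    double err2 = 0.0;
    for (int l = 0; l < QK; l++) {
        const uint8_t q = (l % 2 == 0) ? (y.qs[l/2] & 0x0F) : (y.qs[l/2] >> 4);
        const float   r = q * y.d + y.m;
        err2 += (double)(r - x[l]) * (r - x[l]);
    }
    printf("rmse = %f\n", sqrt(err2 / QK));

    return 0;
}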

Labels: bug (Something isn't working), high priority (Very important issue)
