
Fix quantize_row_q4_1() with ARM_NEON #876


Description

@ggerganov

It is currently bugged. See results of quantize-stats on M1:

$  ./quantize-stats -m models/7B/ggml-model-f16.bin 
Loading model
llama.cpp: loading model from models/7B/ggml-model-f16.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 256
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: f16        = 1
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =  59.11 KB
llama_model_load_internal: mem required  = 14645.07 MB (+ 2052.00 MB per state)
llama_init_from_file: kv self size  =  256.00 MB
note: source model is f16
testing 291 layers with max size 131072000
q4_0                                              : rmse 0.00222150, maxerr 0.18429124, 95pct<0.0040, median<0.0018
q4_1                                              : rmse 0.00360044, maxerr 0.26373291, 95pct<0.0066, median<0.0028

main:    total time = 93546.68 ms

The RMSE is too high - worse than Q4_0, even though Q4_1 stores an extra per-block minimum and should therefore be at least as accurate.

There is a bug in the following piece of code:

https://github.com/ggerganov/llama.cpp/blob/180b693a47b6b825288ef9f2c39d24b6eea4eea6/ggml.c#L922-L955

We should fix it.
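
For reference, a correct NEON path has to produce the same result as the scalar path: per block of 32 values, find the minimum and maximum, derive the scale d = (max - min)/15, store d and the minimum, and pack round((x - min)/d) into nibbles. Below is a minimal, self-contained sketch of that scalar logic with a round-trip RMSE check. The block_q4_1 layout and QK = 32 block size are assumptions taken from this revision of ggml.c, and quantize_row_q4_1_ref is a hypothetical helper written for illustration, not the function in the repo.

// Sketch of the scalar Q4_1 quantization the ARM_NEON path should reproduce.
// The block layout (QK = 32, float d, float m, 16 packed nibbles) is assumed
// from the ggml.c of this revision and may not match the current code exactly.
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>

#define QK 32

typedef struct {
    float   d;          // delta (scale): d = (max - min) / 15
    float   m;          // minimum of the block
    uint8_t qs[QK / 2]; // 32 4-bit quants packed into 16 bytes
} block_q4_1;

// Reference (scalar) quantization: per 32-value block, find min/max,
// derive the scale, and store each value as round((x - min) / d) in [0, 15].
static void quantize_row_q4_1_ref(const float *x, block_q4_1 *y, int k) {
    assert(k % QK == 0);
    const int nb = k / QK;

    for (int i = 0; i < nb; i++) {
        float min =  FLT_MAX;
        float max = -FLT_MAX;

        for (int l = 0; l < QK; l++) {
            const float v = x[i*QK + l];
            if (v < min) min = v;
            if (v > max) max = v;
        }

        const float d  = (max - min) / 15.0f;
        const float id = d != 0.0f ? 1.0f / d : 0.0f;

        y[i].d = d;
        y[i].m = min;

        for (int l = 0; l < QK; l += 2) {
            const uint8_t v0 = (uint8_t) roundf((x[i*QK + l + 0] - min) * id);
            const uint8_t v1 = (uint8_t) roundf((x[i*QK + l + 1] - min) * id);

            assert(v0 < 16 && v1 < 16);

            y[i].qs[l/2] = v0 | (v1 << 4); // even index -> low nibble, odd -> high
        }
    }
}

// Tiny self-check: round-trip one block and print its RMSE, which should be
// on the order of the quantize-stats numbers above for a correct implementation.
int main(void) {
    float x[QK];
    for (int l = 0; l < QK; l++) {
        x[l] = sinf(0.3f * l) - 0.5f; // arbitrary test data
    }

    block_q4_1 y;
    quantize_row_q4_1_ref(x, &y, QK);

    double err2 = 0.0;
    for (int l = 0; l < QK; l++) {
        const uint8_t q = (l % 2 == 0) ? (y.qs[l/2] & 0x0F) : (y.qs[l/2] >> 4);
        const float   r = q * y.d + y.m;
        err2 += (double)(r - x[l]) * (r - x[l]);
    }
    printf("rmse = %f\n", sqrt(err2 / QK));

    return 0;
}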

Labels: bug (Something isn't working), high priority (Very important issue)
