Closed
Labels: bug (Something isn't working), high priority (Very important issue)
Description
Q4_1 quantization is currently bugged. See the results of quantize-stats on an M1:
$ ./quantize-stats -m models/7B/ggml-model-f16.bin
Loading model
llama.cpp: loading model from models/7B/ggml-model-f16.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 256
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: f16 = 1
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 59.11 KB
llama_model_load_internal: mem required = 14645.07 MB (+ 2052.00 MB per state)
llama_init_from_file: kv self size = 256.00 MB
note: source model is f16
testing 291 layers with max size 131072000
q4_0 : rmse 0.00222150, maxerr 0.18429124, 95pct<0.0040, median<0.0018
q4_1 : rmse 0.00360044, maxerr 0.26373291, 95pct<0.0066, median<0.0028
main: total time = 93546.68 ms
The Q4_1 RMSE is too high - worse than Q4_0's. Since Q4_1 stores both a scale and a minimum per block, while Q4_0 stores only a scale, a correct implementation should never be meaningfully less accurate than Q4_0.
There is a bug in the following piece of code:
We should fix it.
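As a sanity check, below is a minimal scalar sketch of the two block formats as ggml defines them (QK = 32 values per block; Q4_0 keeps one scale, Q4_1 keeps a scale and a minimum). The function names and the toy data are hypothetical, and this is not the llama.cpp implementation - just a reference round-trip showing that a correct Q4_1 should produce an RMSE at or below Q4_0's.

// Scalar reference round-trip for the Q4_0 / Q4_1 block formats.
// Not the llama.cpp code; a sketch for sanity-checking only.
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#define QK 32  // values per quantization block, as in ggml

// Q4_0: one scale per block, signed 4-bit levels in [-8, 7].
static float q4_0_roundtrip_rmse(const float *x, int n) {
    double err = 0.0;
    for (int i = 0; i < n; i += QK) {
        float amax = 0.0f;
        for (int j = 0; j < QK; j++) {
            float v = fabsf(x[i + j]);
            if (v > amax) amax = v;
        }
        const float d  = amax / 7.0f;            // per-block scale
        const float id = d ? 1.0f / d : 0.0f;
        for (int j = 0; j < QK; j++) {
            int q = (int)roundf(x[i + j] * id);  // quantize
            if (q < -8) q = -8;
            if (q >  7) q =  7;
            const float r = q * d;               // dequantize
            err += (double)(x[i + j] - r) * (x[i + j] - r);
        }
    }
    return (float)sqrt(err / n);
}

// Q4_1: a scale AND a minimum per block, unsigned 4-bit levels in [0, 15].
static float q4_1_roundtrip_rmse(const float *x, int n) {
    double err = 0.0;
    for (int i = 0; i < n; i += QK) {
        float lo = x[i], hi = x[i];
        for (int j = 1; j < QK; j++) {
            if (x[i + j] < lo) lo = x[i + j];
            if (x[i + j] > hi) hi = x[i + j];
        }
        const float d  = (hi - lo) / 15.0f;      // per-block scale
        const float id = d ? 1.0f / d : 0.0f;
        for (int j = 0; j < QK; j++) {
            int q = (int)roundf((x[i + j] - lo) * id);
            if (q <  0) q =  0;
            if (q > 15) q = 15;
            const float r = q * d + lo;          // dequantize
            err += (double)(x[i + j] - r) * (x[i + j] - r);
        }
    }
    return (float)sqrt(err / n);
}

int main(void) {
    enum { N = QK * 4096 };
    static float x[N];
    for (int i = 0; i < N; i++) {
        x[i] = (float)rand() / RAND_MAX - 0.5f;  // toy weight distribution
    }
    printf("q4_0 rmse: %.8f\n", q4_0_roundtrip_rmse(x, N));
    printf("q4_1 rmse: %.8f\n", q4_1_roundtrip_rmse(x, N));
    return 0;
}

On random data this reference gives Q4_1 an RMSE at or slightly below Q4_0's, which is why the numbers above (Q4_1 roughly 60% worse) point to a bug in the optimized path rather than a limitation of the format.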