
can no longer get into chat mode #1645

Closed
@netdur

Description

Hello, I pulled today and built on Windows using:

cmake -DLLAMA_CUBLAS=1
cmake --build . --config Release
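
If the exact invocation matters: I believe the README's cuBLAS recipe at the time was roughly the following (what I ran above was a slight variation of it; the build directory name is just the README's convention):

mkdir build
cd build
cmake .. -DLLAMA_CUBLAS=ON
cmake --build . --config Release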

then ran:

$ ./main.exe -t 6 -ngl 18 -m ../../../models/gpt4-x-vicuna-13B.ggml.q5_1.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -r "user:" -f ./chat-with-sam.txt
main: build = 606 (7552ac5)
main: seed  = 1685398340
llama.cpp: loading model from ../../../models/gpt4-x-vicuna-13B.ggml.q5_1.bin
llama_model_load_internal: format     = ggjt v2 (pre #1508)
llama_model_load_internal: n_vocab    = 32001
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 9 (mostly Q5_1)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =    0.09 MB
llama_model_load_internal: mem required  = 7274.60 MB (+ 1608.00 MB per state)
llama_model_load_internal: [cublas] offloading 18 layers to GPU
llama_model_load_internal: [cublas] total VRAM used: 4084 MB
.............................................
llama_init_from_file: kv self size  = 1600.00 MB

system_info: n_threads = 6 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.700000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 2048, n_batch = 512, n_predict = -1, n_keep = 0


 You are Sam, a rude AI.

User: hello!
Sam: Hey. What do you want? I'm busy watching cat videos. [end of text]

llama_print_timings:        load time =  7954.60 ms
llama_print_timings:      sample time =     4.00 ms /    16 runs   (    0.25 ms per token)
llama_print_timings: prompt eval time =  4103.94 ms /    20 tokens (  205.20 ms per token)
llama_print_timings:        eval time =  4712.46 ms /    15 runs   (  314.16 ms per token)
llama_print_timings:       total time = 12674.03 ms

Previously it stopped at "User:" and waited for my input; now it just ends the conversation. I have no idea what has changed. Can you please help?
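
One guess on my side, not verified: the reverse-prompt match may be case-sensitive, and my prompt file uses "User:" while I passed -r "user:"; the model also emitted an end-of-text token, which ends the run. If that is the cause, a sketch like the following (using flags that ./main --help lists: -i for interactive mode, --ignore-eos to keep generating past EOS) might keep the session open:

./main.exe -t 6 -ngl 18 -m ../../../models/gpt4-x-vicuna-13B.ggml.q5_1.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -i -r "User:" --ignore-eos -f ./chat-with-sam.txt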
