My script, which uses the high-level generate API function, broke after v0.2.13, and I managed to track it down to this commit: #795 ab028cb
The model stops responding (returns EOS almost immediately), and when it does respond it seems to generate a random symbol as the first character.
Here's my test script to demonstrate the issue:
from llama_cpp import Llama

path = '../models/' + 'mistral-7b-instruct-v0.1.Q4_K_M.gguf'
llm = Llama(model_path=path, n_gpu_layers=-1, n_ctx=4096, verbose=True)

history = []

while True:
    user_input = input("\nInput -> ")
    history.append({"role": "user", "content": user_input})

    prompt = "<s>"
    prompt += '[INST] ' + user_input + ' [/INST]'
    # for msg in history:
    #     if msg['role'] == 'user':
    #         prompt += '[INST] ' + msg['content'] + ' [/INST]'
    #     else:
    #         prompt += msg['content'] + '</s>\n'
    print(prompt)

    tok = llm.tokenize(prompt.encode("utf-8"), add_bos=False, special=True)
    print('Prompt tokens: {0}'.format(len(tok)))

    stream = llm.generate(tok, temp=0.3)

    num = 0
    output = ''
    for token in stream:
        num += 1
        if token == llm.token_eos():  # or num >= max_tokens:
            print('</s>', end='', flush=True)
            break
        text = llm.detokenize([token])
        text = text.decode("utf-8")
        output += text
        print(text, end='', flush=True)

    # print("\n\nFull response:", output)
    print(f'\nTokens generated: {num}')
    history.append({"role": "assistant", "content": output})
Some example outputs:
Input -> hi
<s>[INST] hi [/INST]
Prompt tokens: 9
↓</s>
Tokens generated: 2
(restarted the script)
Input -> howdy
<s>[INST] howdy [/INST]
Prompt tokens: 10
♂ Hey there! How's life been treating you these days?</s>
Tokens generated: 15
Input -> bruh
<s>[INST] howdy [/INST]♂ Hey there! How's life been treating you these days?</s>
[INST] bruh [/INST]
Prompt tokens: 34
Llama.generate: prefix-match hit
</s>
Tokens generated: 1
(restarted the script)
Input -> hi
<s>[INST] hi [/INST]
Prompt tokens: 9
▲ Hello! How can I help you today?</s>
Tokens generated: 11
(restarted the script)
Input -> howdy
<s>[INST] howdy [/INST]
Prompt tokens: 10
► Hey there! How's life been treating you these days?</s>
Tokens generated: 15
Input -> lol
<s>[INST] lol [/INST]
Prompt tokens: 9
Llama.generate: prefix-match hit
</s>
Tokens generated: 1
(restarted the script)
Input -> hi
<s>[INST] hi [/INST]
Prompt tokens: 9
Hello! How can I assist you today?</s>
Tokens generated: 11
Input -> how r ya
<s>[INST] hi [/INST] Hello! How can I assist you today?</s>
[INST] how r ya [/INST]
Prompt tokens: 31
Llama.generate: prefix-match hit
</s>
Tokens generated: 1
Input -> why ..
<s>[INST] hi [/INST] Hello! How can I assist you today?</s>
[INST] how r ya [/INST]</s>
[INST] why .. [/INST]
Prompt tokens: 42
Llama.generate: prefix-match hit
</s>
Tokens generated: 1
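For comparison, the same turn can also be run through the built-in chat completion API instead of manual prompt building plus generate(), to see whether it hits the same early EOS. A sketch, with parameter names as I understand llama-cpp-python exposes them:

# Sketch: cross-check via the high-level chat API. Assumes create_chat_completion
# with messages/temperature behaves as documented for the non-streaming case.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "hi"}],
    temperature=0.3,
)
print(out["choices"][0]["message"]["content"])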