
Broken generate after "Add support for batched decoding" #888

Closed
@tk-master

Description

My script, which uses the high-level generate API function, broke after v0.2.13, and I managed to track it down to this commit: #795 ab028cb

The model stops responding (emits EOS almost immediately) and seems to generate a random symbol as the first character.
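(As a side note, here is a quick way to confirm which package version is installed, since the regression seems to start with v0.2.13; this is just standard packaging metadata, nothing specific to the library:)

import importlib.metadata

# prints the installed llama-cpp-python version; the issue appears from 0.2.13 onward for me
print(importlib.metadata.version("llama_cpp_python"))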

Here's my test script to demonstrate the issue:

from llama_cpp import Llama

path = '../models/' + 'mistral-7b-instruct-v0.1.Q4_K_M.gguf'
llm = Llama(model_path=path, n_gpu_layers=-1, n_ctx=4096, verbose=True)

history = []

while True:
    user_input = input("\nInput -> ")

    history.append({"role": "user", "content": user_input})

    prompt = "<s>"
    prompt += '[INST] '+ user_input +' [/INST]'
 
    # for msg in history:
    #     if msg['role'] == 'user':
    #         prompt += '[INST] '+ msg['content'] +' [/INST]'
    #     else:
    #         prompt += msg['content'] + '</s>\n'
    print(prompt)

    tok = llm.tokenize(prompt.encode("utf-8"), add_bos=False, special=True)
    print('Prompt tokens: {0}'.format(len(tok)))

    stream = llm.generate(tok, temp=0.3)

    num = 0
    output = ''
    for token in stream:
        num += 1
        if token == llm.token_eos():# or num >= max_tokens:
            print('</s>', end='', flush=True)
            break

        text = llm.detokenize([token])
        text = text.decode("utf-8")
        output += text
        print(text, end='', flush=True)

    #print("\n\nFull response:", output)
    print(f'\nTokens generated: {num}')

    history.append({"role": "assistant", "content": output})

Some example outputs:

Input -> hi
<s>[INST] hi [/INST]
Prompt tokens: 9
↓</s>
Tokens generated: 2


(restarted the script)


Input -> howdy
<s>[INST] howdy [/INST]
Prompt tokens: 10
♂ Hey there! How's life been treating you these days?</s>
Tokens generated: 15

Input -> bruh
<s>[INST] howdy [/INST]♂ Hey there! How's life been treating you these days?</s>
[INST] bruh [/INST]
Prompt tokens: 34
Llama.generate: prefix-match hit
</s>
Tokens generated: 1



(restarted the script)


Input -> hi
<s>[INST] hi [/INST]
Prompt tokens: 9
▲ Hello! How can I help you today?</s>
Tokens generated: 11



(restarted the script)



Input -> howdy
<s>[INST] howdy [/INST]
Prompt tokens: 10
► Hey there! How's life been treating you these days?</s>
Tokens generated: 15

Input -> lol
<s>[INST] lol [/INST]
Prompt tokens: 9
Llama.generate: prefix-match hit
</s>
Tokens generated: 1



(restarted the script)


Input -> hi
<s>[INST] hi [/INST]
Prompt tokens: 9
  Hello! How can I assist you today?</s>
Tokens generated: 11

Input -> how r ya
<s>[INST] hi [/INST]  Hello! How can I assist you today?</s>
[INST] how r ya [/INST]
Prompt tokens: 31
Llama.generate: prefix-match hit
</s>
Tokens generated: 1

Input -> why ..
<s>[INST] hi [/INST]  Hello! How can I assist you today?</s>
[INST] how r ya [/INST]</s>
[INST] why .. [/INST]
Prompt tokens: 42
Llama.generate: prefix-match hit
</s>
Tokens generated: 1
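For what it's worth, the interactive loop isn't required to trigger this. A condensed, non-interactive version of the same script (same model file and settings, just a single hard-coded prompt) reproduces the immediate EOS / stray first symbol for me:

from llama_cpp import Llama

llm = Llama(model_path='../models/mistral-7b-instruct-v0.1.Q4_K_M.gguf',
            n_gpu_layers=-1, n_ctx=4096, verbose=True)

prompt = "<s>[INST] hi [/INST]"
tok = llm.tokenize(prompt.encode("utf-8"), add_bos=False, special=True)

out = b""
for token in llm.generate(tok, temp=0.3):
    if token == llm.token_eos():
        break
    out += llm.detokenize([token])

# on v0.2.13+ this prints a stray symbol or nothing; on v0.2.12 it prints a normal reply
print(out.decode("utf-8", errors="replace"))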
