Prerequisites
Please answer the following questions for yourself before submitting an issue.
- I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
Generation continues to completion and the program exits gracefully.
Current Behavior
Segmentation fault while generating tokens. It usually happens after generating ~121 tokens (I ran 4 different prompts, which crashed at tokens 122, 121, 118, and 124), and it doesn't seem to happen in the llama.cpp ./main example.
Environment and Context
I am using a context size of 512, a prediction length of 256, and a batch size of 1024; the rest of the settings are default. I am also using CLBlast, which gives me a 2.5x performance boost in llama.cpp, and a libllama.so built from the latest llama.cpp source so I can debug it with gdb.
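For context, here is roughly how those settings would map onto llama-cpp-python's high-level API (a sketch only; the model path is a placeholder, and my actual runs go through the custom class linked under Steps to Reproduce):

from llama_cpp import Llama

# n_ctx / n_batch mirror the settings above; max_tokens is the prediction length
llm = Llama(model_path="./wizardlm-7b.ggml.q4_0.bin", n_ctx=512, n_batch=1024)
output = llm("Write me a long essay about cookies.", max_tokens=256)
print(output["choices"][0]["text"])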
- AMD Ryzen 5 3600 6-Core Processor + RX 580 4 GB
Vendor ID: AuthenticAMD
Model name: AMD Ryzen 5 3600 6-Core Processor
CPU family: 23
Model: 113
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
Stepping: 0
Frequency boost: enabled
CPU(s) scaling MHz: 94%
CPU max MHz: 4208,2031
CPU min MHz: 2200,0000
BogoMIPS: 7186,94
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sev sev_es
Number of devices 1
Device Name gfx803
Device Vendor Advanced Micro Devices, Inc.
Device Vendor ID 0x1002
Device Version OpenCL 1.2
Driver Version 3513.0 (HSA1.1,LC)
Device OpenCL C Version OpenCL C 2.0
Device Type GPU
Device Board Name (AMD) AMD Radeon RX 580 Series
- Linux:
Linux bober-desktop 6.3.1-x64v1-xanmod1-2 #1 SMP PREEMPT_DYNAMIC Sun, 07 May 2023 10:32:57 +0000 x86_64 GNU/Linux
- Versions:
Python 3.11.3
GNU Make 4.4.1
Built for x86_64-pc-linux-gnu
g++ (GCC) 13.1.1 20230429
Failure Information (for bugs)
Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007ffff6c35769 in ggml_element_size (tensor=0x7ffea0fff130) at ggml.c:3666
3666 return GGML_TYPE_SIZE[tensor->type];
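As a side note, wrapping the script with the stdlib faulthandler module also prints the Python-level stack when the SIGSEGV hits, which may help correlate the C trace above with the Python code (a generic debugging suggestion, not something specific to this crash):

import faulthandler

faulthandler.enable()  # on SIGSEGV, dump the Python traceback to stderr before dying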
Steps to Reproduce
Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.
- Add my class to the source code: https://gist.github.com/Firstbober/d7f97e7f743a973c14425424e360eeda
- Create an instance with WizardLM-7b
- Use llamaChat.load_context with some lengthy prompt (mine has 1300 characters)
- Use llamaChat.generate to try to generate something; I used this piece of code:
tokens = ""
i = 0
for token in llamaChat.generate('[[YOU]]: Write me a long essay about cookies, as long as you can.\n'):
    print(token, i)
    tokens += token
    i += 1
print(tokens)
- Watch as Python crumbles at around token 121. (The steps above are condensed into a single sketch below.)
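For convenience, the steps condensed into one script. This is a sketch: LlamaChat is the class from the gist linked in step 1, but the module name and constructor arguments here are assumptions, not its real signature.

# Hypothetical import and constructor; see the gist for the real class definition
from llama_chat import LlamaChat

llamaChat = LlamaChat("./wizardlm-7b.ggml.q4_0.bin")  # assumed: an instance backed by WizardLM-7b

long_prompt = "..."  # any lengthy prompt; mine has ~1300 characters
llamaChat.load_context(long_prompt)

tokens = ""
for i, token in enumerate(llamaChat.generate('[[YOU]]: Write me a long essay about cookies, as long as you can.\n')):
    print(token, i)
    tokens += token
print(tokens)  # never reached; SIGSEGV around token 121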