Prerequisites
Please answer the following questions for yourself before submitting an issue.
- I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
Generation continues to completion and the program exits gracefully.
Current Behavior
Segmentation fault while generating tokens. It usually happens after generating ~121 tokens (I ran 4 different prompts, which crashed at tokens 122, 121, 118, and 124), and it doesn't seem to happen in the llama.cpp ./main example.
Environment and Context
I am using a context size of 512, a prediction length of 256, and a batch size of 1024; the rest of the settings are default. I am also using CLBlast, which gives me a 2.5x performance boost in llama.cpp, and a libllama.so built from the latest llama.cpp source so I can debug it with gdb.
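For context, here is roughly how those settings would map onto llama-cpp-python's high-level API (a sketch only; the model path is a placeholder, and my actual runs go through the custom class linked under Steps to Reproduce):

from llama_cpp import Llama

# n_ctx / n_batch mirror the settings above; max_tokens is the prediction length
llm = Llama(model_path="./wizardlm-7b.ggml.q4_0.bin", n_ctx=512, n_batch=1024)
output = llm("Write me a long essay about cookies.", max_tokens=256)
print(output["choices"][0]["text"])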
- AMD Ryzen 5 3600 6-Core Processor + RX 580 4 GB
Vendor ID: AuthenticAMD
Model name: AMD Ryzen 5 3600 6-Core Processor
CPU family: 23
Model: 113
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
Stepping: 0
Frequency boost: enabled
CPU(s) scaling MHz: 94%
CPU max MHz: 4208,2031
CPU min MHz: 2200,0000
BogoMIPS: 7186,94
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sev sev_es
Number of devices 1
Device Name gfx803
Device Vendor Advanced Micro Devices, Inc.
Device Vendor ID 0x1002
Device Version OpenCL 1.2
Driver Version 3513.0 (HSA1.1,LC)
Device OpenCL C Version OpenCL C 2.0
Device Type GPU
Device Board Name (AMD) AMD Radeon RX 580 Series
- Linux:
Linux bober-desktop 6.3.1-x64v1-xanmod1-2 #1 SMP PREEMPT_DYNAMIC Sun, 07 May 2023 10:32:57 +0000 x86_64 GNU/Linux
- Versions:
Python 3.11.3
GNU Make 4.4.1
Built for x86_64-pc-linux-gnu
g++ (GCC) 13.1.1 20230429
Failure Information (for bugs)
Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007ffff6c35769 in ggml_element_size (tensor=0x7ffea0fff130) at ggml.c:3666
3666 return GGML_TYPE_SIZE[tensor->type];
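As a side note, wrapping the script with the stdlib faulthandler module also prints the Python-level stack when the SIGSEGV hits, which may help correlate the C trace above with the Python code (a generic debugging suggestion, not something specific to this crash):

import faulthandler

faulthandler.enable()  # on SIGSEGV, dump the Python traceback to stderr before dying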
Steps to Reproduce
Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.
- Add my class to the source code: https://gist.github.com/Firstbober/d7f97e7f743a973c14425424e360eeda
- Create an instance with WizardLM-7b
- Use llamaChat.load_context with some lengthy prompt (mine has 1300 characters)
- Use llamaChat.generate to try to generate something; I used this piece of code:
tokens = ""
i = 0
for token in llamaChat.generate('[[YOU]]: Write me a long essay about cookies, as long as you can.\n'):
    print(token, i)
    tokens += token
    i += 1
print(tokens)
- Watch as Python crumbles at around token 121. (The steps above are condensed into a single sketch below.)
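For convenience, the steps condensed into one script. This is a sketch: LlamaChat is the class from the gist linked in step 1, but the module name and constructor arguments here are assumptions, not its real signature.

# Hypothetical import and constructor; see the gist for the real class definition
from llama_chat import LlamaChat

llamaChat = LlamaChat("./wizardlm-7b.ggml.q4_0.bin")  # assumed: an instance backed by WizardLM-7b

long_prompt = "..."  # any lengthy prompt; mine has ~1300 characters
llamaChat.load_context(long_prompt)

tokens = ""
for i, token in enumerate(llamaChat.generate('[[YOU]]: Write me a long essay about cookies, as long as you can.\n')):
    print(token, i)
    tokens += token
print(tokens)  # never reached; SIGSEGV around token 121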