Description
Jan version
0.6.0
Describe the Bug
I tried loading the Gemma 3 models after the update and noticed that token generation is much slower, and saw that they run on the CPU instead of the GPU. Other models I tried load and run fine. This happens with both the 4b and 12b versions; I didn't try the other versions. GPU Layers is set to 100.
Steps to Reproduce
Download a Gemma 3 model (4b or 12b), load it, and ask it to generate a test message.
Screenshots / Logs
What is your OS?
- MacOS
- Windows
- Linux