- [x] Use `llama_decode` instead of deprecated `llama_eval` in `Llama` class (see the sketch below)
- [ ] Implement batched inference support for `generate` and `create_completion` methods in `Llama` class
- [ ] Add support for streaming / infinite completion
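
For reference, here is a minimal sketch of the batch-based `llama_decode` call that replaces the deprecated `llama_eval`, written against llama-cpp-python's low-level bindings (which mirror `llama.h`). The model path and token values are placeholders, and exact binding signatures vary across versions, so treat this as illustrative rather than the library's actual internals:

```python
# Minimal sketch: evaluating tokens with llama_decode via the low-level
# llama_cpp bindings. Placeholder model path and token ids; binding
# signatures may differ between versions.
import llama_cpp

llama_cpp.llama_backend_init()

model_params = llama_cpp.llama_model_default_params()
model = llama_cpp.llama_load_model_from_file(b"./models/7B/model.gguf", model_params)

ctx_params = llama_cpp.llama_context_default_params()
ctx = llama_cpp.llama_new_context_with_model(model, ctx_params)

# Tokens to evaluate (placeholders; normally produced by llama_tokenize).
tokens = [1, 15043, 3186]

# llama_decode consumes a llama_batch instead of the raw token / n_past
# arguments that the deprecated llama_eval took.
batch = llama_cpp.llama_batch_init(len(tokens), 0, 1)
batch.n_tokens = len(tokens)
for i, tok in enumerate(tokens):
    batch.token[i] = tok
    batch.pos[i] = i                         # position within the sequence
    batch.n_seq_id[i] = 1
    batch.seq_id[i][0] = 0                   # single sequence, id 0
    batch.logits[i] = i == len(tokens) - 1   # only need logits for last token

if llama_cpp.llama_decode(ctx, batch) != 0:
    raise RuntimeError("llama_decode failed")

logits = llama_cpp.llama_get_logits(ctx)     # logits for the last token

llama_cpp.llama_batch_free(batch)
llama_cpp.llama_free(ctx)
llama_cpp.llama_free_model(model)
llama_cpp.llama_backend_free()
```

Because `llama_decode` operates on a `llama_batch` with per-token sequence ids rather than a single flat token list, it is also the building block for the batched, multi-sequence inference planned in the second item above.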