🚨FAQs | 常见问题🚨

> [!NOTE]
> Please **avoid** creating issues regarding the following questions, as they might be closed without a response.
> 请**避免**创建与下述问题有关的 issues，这些 issues 可能不会被回复。

> [!TIP]
> Documentation: https://llamafactory.readthedocs.io/en/latest/
> 中文文档：https://llamafactory.readthedocs.io/zh-cn/latest/
> NPU 中文文档：https://ascend.github.io/docs/sources/llamafactory/
> 中文版入门教程：https://zhuanlan.zhihu.com/p/695287607

----

### Most problems / 大多数问题

#### Versions of dependencies conflict / 依赖库版本冲突

#### Supported models are not found / 无法找到已支持的模型

#### llamafactory-cli: command not found / 无法找到命令

Please update the repository and reinstall it as follows.

请按照以下方式更新仓库并重新安装。

```bash
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git && cd LLaMA-Factory
pip install -e ".[torch,metrics]" --no-build-isolation
```

----

### Out-of-memory / 显存溢出

An out-of-memory (OOM) error during training usually means that a device does not have enough free VRAM to complete the computation. You can try the following methods to deal with this issue:

1. Reduce the training batch size `per_device_train_batch_size: 1`
2. Reduce the maximum sequence length `cutoff_len: 512`
3. Replace compute kernels `enable_liger_kernel: true` and `use_unsloth_gc: true`
4. Use DeepSpeed ZeRO-3 or FSDP to partition model weights on multiple devices or use CPU offloading
5. Set `quantization_bit: 4` to quantize model parameters (only compatible with LoRA tuning)
6. Use the paged optimizer `optim: paged_adamw_8bit`

模型训练时显存溢出，通常是由于当前某个设备的剩余显存不足以完成计算任务。可尝试下述方法解决：

1. 降低批处理大小 `per_device_train_batch_size: 1`
2. 降低最大序列长度 `cutoff_len: 512`
3. 替换模型算子 `enable_liger_kernel: true` 和 `use_unsloth_gc: true`
4. 使用 DeepSpeed ZeRO-3 或 FSDP 将模型权重拆分到多个设备或使用 CPU Offloading
5. 设置 `quantization_bit: 4` 量化模型参数（仅限于 LoRA 方法）
6. 使用分页低精度优化器 `optim: paged_adamw_8bit`
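
As a rough illustration of why `quantization_bit: 4` helps, the sketch below estimates the memory needed to hold the model weights alone at different precisions. The function name and the 7B parameter count are assumptions made for this example; the estimate ignores activations, gradients, optimizer states, and quantization overhead, so actual usage will be higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory needed to store the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Hypothetical 7B-parameter model, used only for illustration.
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gb(7e9, bits):.1f} GB")
```

Under these assumptions, moving from 16-bit to 4-bit weights cuts weight storage to a quarter. The other options in the list mainly reduce activation or optimizer memory instead.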

----

### Unsatisfying fine-tuning results / 微调效果无法令人满意

Unsatisfying fine-tuning results are usually due to insufficient training samples, leading to underfitting. You can try the following methods to deal with this issue:

1. Increase the size of the training dataset
2. Increase the number of epochs `num_train_epochs: 5.0` or steps `max_steps: 1000`
3. Use a larger learning rate `learning_rate: 2.0e-4`
4. Use a different fine-tuning method `finetuning_type: freeze` or `finetuning_type: full`

微调效果较差，通常是由于训练样本过少，导致模型欠拟合。可尝试下述方法解决：

1. 增加训练数据集的大小
2. 提高训练轮数 `num_train_epochs: 5.0` 或步数 `max_steps: 1000`
3. 增大学习率 `learning_rate: 2.0e-4`
4. 使用不同的微调方法 `finetuning_type: freeze` 或 `finetuning_type: full`

----

### Corrupted or repeated model responses / 胡乱或重复的模型回答

If this issue occurs before training, it is usually due to using an unaligned (base) model or a mismatched `template`. Please ensure an aligned (instruct/chat) model and correct `template` are used.
If this issue occurs after training, please check that the same `template` is used for training and inference. Also check whether the model has overfitted; if so, try decreasing the number of epochs `num_train_epochs` and the learning rate `learning_rate`.

若该问题发生在训练之前，通常是由于使用了未经对齐（base）的模型或不恰当的模板 `template`，请保证使用对齐后（instruct/chat）的模型和正确的模板 `template`。
若该问题发生在训练之后，请检查训练和推理使用的模板 `template` 是否一致，同时检查是否发生了过拟合。如果发生了过拟合，请减小训练轮数 `num_train_epochs` 和学习率 `learning_rate`。

----

### Training hangs / 训练进程卡住

If distributed training is not enabled, please use the following command to check whether the CUDA build of PyTorch is installed correctly:

```bash
python -c "import torch; print(torch.cuda.is_available())"
```

If distributed training is enabled, try setting the environment variable `export NCCL_P2P_LEVEL=NVL`.

如果没有使用分布式训练，请使用下述命令检查 CUDA 版本的 PyTorch 是否被正确安装：

```bash
python -c "import torch; print(torch.cuda.is_available())"
```

如果使用了分布式训练，请尝试设置环境变量 `export NCCL_P2P_LEVEL=NVL`。

----

### LLaMA Board cannot display datasets / LLaMA Board 无法显示数据集

Please ensure that you launch LLaMA Board from the LLaMA-Factory root directory, so that the working directory matches the repository directory.

请确保启动 LLaMA Board 时的工作目录与 LLaMA-Factory 主目录一致。

----

### How to shard model weights on multiple devices / 如何将模型权重拆分到多个设备上

For training, please refer to the [examples](https://github.com/hiyouga/LLaMA-Factory/blob/main/examples/README.md#supervised-fine-tuning-with-deepspeed-zero-3-weight-sharding) on how to use DeepSpeed ZeRO-3 (recommended) or FSDP.
For inference, please use vLLM to enable tensor parallelism: [examples](https://github.com/hiyouga/LLaMA-Factory/tree/main/examples#inferring-lora-fine-tuned-models).

在训练阶段，请参考 [examples](https://github.com/hiyouga/LLaMA-Factory/blob/main/examples/README_zh.md#%E4%BD%BF%E7%94%A8-deepspeed-zero-3-%E5%B9%B3%E5%9D%87%E5%88%86%E9%85%8D%E6%98%BE%E5%AD%98) 使用 DeepSpeed ZeRO-3（推荐）或 FSDP。
在推理阶段，请使用 vLLM 来开启张量并行：[examples](https://github.com/hiyouga/LLaMA-Factory/blob/main/examples/README_zh.md#%E6%8E%A8%E7%90%86-lora-%E6%A8%A1%E5%9E%8B).

----

### How to use ORPO or SimPO / 如何使用 ORPO 或 SimPO

Change `pref_loss` in the [example script](https://github.com/hiyouga/LLaMA-Factory/blob/main/examples/train_lora/llama3_lora_dpo.yaml) to `orpo` or `simpo`.

将[示例脚本](https://github.com/hiyouga/LLaMA-Factory/blob/main/examples/train_lora/llama3_lora_dpo.yaml) 中的 `pref_loss` 改为 `orpo` 或 `simpo`。

----

### How to debug with VSCode / 如何用 VSCode 调试程序

See #5337

----

### Why the number of examples is small in pre-training / 为什么预训练样本数比实际的少

Pre-training automatically uses packing, which concatenates multiple samples into one sequence, so the number of examples displayed is smaller than the number of samples in the dataset.

我们在预训练时候自动使用了 Packing，将多个样本打包成一条序列，因此显示的样本数量会比实际的少。
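
To make the effect of packing concrete, here is a simplified sketch, not LLaMA-Factory's actual implementation. It concatenates tokenized samples into one stream and cuts the stream into blocks of `cutoff_len` tokens. The function name `pack_sequences`, the made-up token ids, and the choice to drop the incomplete tail block are assumptions for illustration.

```python
from typing import List


def pack_sequences(samples: List[List[int]], cutoff_len: int) -> List[List[int]]:
    """Concatenate tokenized samples and cut the stream into blocks of cutoff_len tokens."""
    stream = [tok for sample in samples for tok in sample]
    # Drop the incomplete tail so that every packed example has the same length.
    usable = (len(stream) // cutoff_len) * cutoff_len
    return [stream[i:i + cutoff_len] for i in range(0, usable, cutoff_len)]


# Six short "documents" of 3 tokens each (token ids are made up).
samples = [[i] * 3 for i in range(6)]
packed = pack_sequences(samples, cutoff_len=8)
print(len(samples), "samples ->", len(packed), "packed examples")
```

Here 18 tokens fit into two blocks of 8, so six samples are reported as two examples. The shorter the samples are relative to `cutoff_len`, the larger the gap.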

----

### Will the training data be shuffled / 训练数据是否会被打乱

LLaMA-Factory will randomly shuffle the training data by default. You can use `disable_shuffling` to turn off the shuffling.

LLaMA-Factory 默认会随机打乱训练数据，可使用 `disable_shuffling` 关闭打乱。

----

### How to enable streaming / 如何启用流式数据读取

We recommend shuffling the dataset before training if you want to use streaming.

如果您希望使用流式数据读取，请在训练前手动打乱数据。

```yaml
buffer_size: 128
preprocessing_batch_size: 128
streaming: true
accelerator_config:
  dispatch_batches: false
```
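
The manual shuffle recommended above can be done offline before training. The following is a minimal sketch, assuming the dataset is a single JSON file containing a list of records; the function name `shuffle_json_dataset` and the file paths in the usage comment are hypothetical.

```python
import json
import random
from pathlib import Path


def shuffle_json_dataset(src: str, dst: str, seed: int = 42) -> int:
    """Shuffle a JSON file that holds a list of records and write the result to dst."""
    records = json.loads(Path(src).read_text(encoding="utf-8"))
    # A fixed seed keeps the shuffled order reproducible across runs.
    random.Random(seed).shuffle(records)
    Path(dst).write_text(
        json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return len(records)


# Example call (hypothetical paths):
# shuffle_json_dataset("data/my_dataset.json", "data/my_dataset_shuffled.json")
```

This loads the whole file into memory, so it only suits datasets that fit in RAM. Larger datasets would need a different approach to shuffling.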

----

> [!TIP]
> If the problems still exist with the **latest** code, please create an issue.
> 若使用**最新的**代码仍然无法解决问题，请创建一个 issue。

