Skip to content

fix: prevent integer overflow in candle backend sequence length calcu… #681

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

xrwang8
Copy link

@xrwang8 xrwang8 commented Jul 11, 2025

Fix integer overflow in Candle backend sequence length calculation

What does this PR do?

This PR fixes a critical integer overflow bug in the Candle backend that causes CUDA driver crashes and massive memory allocation requests (~18.4 EB) when processing certain batch configurations.

Root Cause: The issue occurs in backends/candle/src/lib.rs where sequence lengths are calculated using unsigned integer subtraction without overflow protection:

(batch.cumulative_seq_lengths[i + 1] - batch.cumulative_seq_lengths[i]) as usize

When cumulative_seq_lengths[i] > cumulative_seq_lengths[i + 1] (due to malformed batch data), the subtraction underflows, producing a very large u32 value that gets cast to usize, resulting in massive memory allocation requests that crash the CUDA driver.

Solution: Replace unsafe subtraction with checked_sub() to detect overflow conditions and fail fast with a clear error message:

batch.cumulative_seq_lengths[i + 1]
    .checked_sub(batch.cumulative_seq_lengths[i])
    .expect("Invalid cumulative sequence lengths: sequence lengths must be non-decreasing")
    as usize

Symptoms this fixes:

  • CUDA driver errors: CUDA_ERROR_UNKNOWN
  • HAMi OOM errors with extremely large memory requests
  • Service becoming unresponsive after initial successful requests
  • nvidia-smi showing corrupted memory usage displays

Error logs example:

[HAMI-core ERROR]: Device 0 OOM 18446744073702267944 / 24146608128
DriverError(CUDA_ERROR_UNKNOWN, "<Failure when calling cuGetErrorString()>")

Impact:

  • Fixes critical CUDA driver crashes and OOM errors
  • Improves error diagnostics and debugging experience
  • Maintains full backward compatibility for valid inputs
  • Negligible performance impact (single integer comparison per sequence)

Testing: Added unit test to verify the fix correctly detects and handles invalid cumulative sequence lengths.

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@Narsil

…lation

This commit fixes a critical integer overflow bug in the Candle backend that
causes CUDA driver crashes and massive memory allocation requests.

The issue occurred when calculating sequence lengths using unsigned integer
subtraction without overflow protection:
(batch.cumulative_seq_lengths[i + 1] - batch.cumulative_seq_lengths[i]) as usize

When cumulative_seq_lengths[i] > cumulative_seq_lengths[i + 1], the subtraction
underflows, producing a very large u32 value that gets cast to usize, resulting
in memory allocation requests of ~18.4 EB.

Signed-off-by: xrwang8 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant