fix: prevent integer overflow in candle backend sequence length calcu… #681
+46
−1
Fix integer overflow in Candle backend sequence length calculation
What does this PR do?
This PR fixes a critical integer overflow bug in the Candle backend that causes CUDA driver crashes and massive memory allocation requests (~18.4 EB) when processing certain batch configurations.
Root Cause: The issue occurs in `backends/candle/src/lib.rs`, where sequence lengths are calculated using unsigned integer subtraction without overflow protection. When `cumulative_seq_lengths[i] > cumulative_seq_lengths[i + 1]` (due to malformed batch data), the subtraction underflows, producing a very large `u32` value that gets cast to `usize`, resulting in massive memory allocation requests that crash the CUDA driver.
Solution: Replace the unchecked subtraction with `checked_sub()` to detect overflow conditions and fail fast with a clear error message.
Symptoms this fixes:
CUDA_ERROR_UNKNOWN
Error logs example:
Impact:
Testing: Added unit test to verify the fix correctly detects and handles invalid cumulative sequence lengths.
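
For readers without the diff at hand, the root cause and the `checked_sub()` fix described above can be sketched roughly as follows. This is a minimal illustration, not the code from `backends/candle/src/lib.rs`: the helper name `sequence_lengths`, its `Result<_, String>` return type, and the error wording are all assumptions made for the example.

```rust
/// Hypothetical helper (not the actual backend code): derives per-sequence
/// lengths from cumulative offsets, rejecting offsets that decrease.
fn sequence_lengths(cumulative_seq_lengths: &[u32]) -> Result<Vec<usize>, String> {
    cumulative_seq_lengths
        .windows(2)
        .enumerate()
        .map(|(i, pair)| {
            // `checked_sub` returns `None` instead of wrapping around
            // when pair[0] > pair[1].
            pair[1]
                .checked_sub(pair[0])
                .map(|len| len as usize)
                .ok_or_else(|| {
                    format!(
                        "invalid cumulative sequence lengths at index {i}: {} > {}",
                        pair[0], pair[1]
                    )
                })
        })
        .collect()
}

/// Shows what unchecked arithmetic yields in a release build, using
/// `wrapping_sub` so the wraparound is explicit and deterministic.
fn wrapped_length(start: u32, end: u32) -> usize {
    end.wrapping_sub(start) as usize
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn rejects_decreasing_offsets() {
        // Well-formed offsets produce the expected lengths.
        assert_eq!(sequence_lengths(&[0, 3, 7]), Ok(vec![3, 4]));
        // Malformed offsets (5 > 2) are reported instead of wrapping.
        assert!(sequence_lengths(&[0, 5, 2]).is_err());
    }
}
```

With malformed offsets such as `[0, 5, 2]`, `wrapped_length(5, 2)` evaluates to 4294967293, which is the kind of bogus length that would otherwise feed into an allocation size, whereas `sequence_lengths` returns an error before any allocation is attempted.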
Fixes # (issue)
Before submitting
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
@Narsil