[docs] Create clearer optimization sections #4870
</Tip>
CPU offloading can also be chained with attention slicing to reduce memory consumption to less than 2GB.
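In diffusers the two are switched on per pipeline with `enable_sequential_cpu_offload()` and `enable_attention_slicing()`. The toy sketch below only illustrates the idea behind the slicing half and is not how diffusers implements it. It is plain Python, and the function names `attention` and `sliced_attention` are hypothetical. Instead of building the whole query-by-key score matrix at once, it processes `slice_size` queries at a time. The output stays the same while the peak size of the score matrix shrinks.

```python
import math
import random


def _softmax(row):
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention on plain lists (hypothetical helper).

    Returns (output, score_entries), where score_entries is the size of the
    score matrix materialized in one go: len(q) * len(k).
    """
    scale = 1.0 / math.sqrt(len(k[0]))
    # The whole len(q) x len(k) score matrix is built here.
    scores = [[scale * sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]
    weights = [_softmax(row) for row in scores]
    out = [
        [sum(w * vj[c] for w, vj in zip(row, v)) for c in range(len(v[0]))]
        for row in weights
    ]
    return out, len(q) * len(k)


def sliced_attention(q, k, v, slice_size):
    """Same result as attention(), computed slice_size queries at a time.

    Only a slice_size x len(k) block of scores exists at any moment, which is
    the memory-saving idea behind attention slicing.
    """
    out, peak = [], 0
    for start in range(0, len(q), slice_size):
        block, entries = attention(q[start:start + slice_size], k, v)
        out.extend(block)
        peak = max(peak, entries)
    return out, peak


if __name__ == "__main__":
    random.seed(0)
    n, m, d = 8, 8, 4
    q = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    k = [[random.gauss(0, 1) for _ in range(d)] for _ in range(m)]
    v = [[random.gauss(0, 1) for _ in range(d)] for _ in range(m)]
    full, full_peak = attention(q, k, v)
    sliced, sliced_peak = sliced_attention(q, k, v, slice_size=2)
    max_diff = max(abs(a - b) for r1, r2 in zip(full, sliced) for a, b in zip(r1, r2))
    print(f"score entries: full={full_peak}, sliced={sliced_peak}, max diff={max_diff:.2e}")
```

Slicing trades speed for memory. A smaller `slice_size` lowers the peak score-matrix size but runs more, smaller steps.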
Should we still promote attention slicing here with CPU offloading (and below for model offloading)? I think we removed it a while ago and are just keeping it in the docstring.
The documentation is not available anymore as the PR was closed or merged.
New structure/format looks good to me - thanks!
717e453 to 905d81e
For some reason, the PR link to the docs does not reflect the latest changes, so please be aware of that! I'll leave this open a little while longer to allow others some time to chime in before I merge 🙂
905d81e to ef25994
* refactor
* update general optim sections
* update more sections
* few more updates
* benchmark code
Implements #4786 to tidy up the optimization/special hardware section so topics are more discoverable.