[Misc] Support adapter scaling to all replicas #1132
base: main
Conversation
Signed-off-by: dittops <[email protected]>
dittops force-pushed from 52d03df to 1604126
@dittops Great! I will spend some time this week to review this change.
	return nil
}
if targetPod.DeletionTimestamp != nil {
	continue
TODO: add a log message here for any pods skipped
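For illustration, a minimal sketch of what that skip log could look like (the helper name and log verbosity are assumptions, not the actual controller code):

```go
package controller

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/klog/v2"
)

// filterActivePods drops pods that are pending deletion and logs each skip,
// addressing the TODO above. Names here are illustrative only.
func filterActivePods(pods []corev1.Pod) []corev1.Pod {
	active := make([]corev1.Pod, 0, len(pods))
	for i := range pods {
		if pods[i].DeletionTimestamp != nil {
			klog.V(4).InfoS("Skipping pod pending deletion", "pod", klog.KObj(&pods[i]))
			continue
		}
		active = append(active, pods[i])
	}
	return active
}
```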
if err := r.Get(ctx, types.NamespacedName{Namespace: instance.Namespace, Name: instance.Name}, found); err != nil {
	if apierrors.IsNotFound(err) {
		klog.Warningf("Failed to fetch the endpoint slice %s", klog.KObj(instance))
podList := []corev1.Pod{}
Seems the logic has been simplified a lot. I will double-check the logic.
}
if len(activePods) != 0 {
	selectedPod, err = r.schedulePod(ctx, instance, activePods)
Hmm, @dittops can you confirm that the scheduling logic has been disabled in the new workflow? Any reasons?
@dittops I think the only part I'm not sure about is the scheduling part. Can you give more details?
I have used the following logic for adapter loading/unloading:
@dittops The workflow sounds good. From the change, I notice the LoRA scheduling logic has been deleted. In this case, how are pods selected?
schedulePod was used to pick one pod, which was then assigned to instance.Status.Instances. Instead of choosing one pod, the new approach uses ALL pods that match the selector and adds all of them to instance.Status.Instances. If we need to keep schedulePod, we can move getActivePodsForModelAdapter inside schedulePod so that it selects all pods and returns a list.
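A rough sketch of that refactor, under the assumption that selection simply keeps every running, non-terminating pod (selectAllActivePods is a hypothetical name, not one of the PR's actual helpers):

```go
package controller

import (
	corev1 "k8s.io/api/core/v1"
)

// selectAllActivePods mirrors the behavior described above: every matching,
// non-terminating, running pod is selected, and the caller appends the
// returned names to instance.Status.Instances.
func selectAllActivePods(pods []corev1.Pod) []string {
	names := make([]string, 0, len(pods))
	for i := range pods {
		if pods[i].DeletionTimestamp == nil && pods[i].Status.Phase == corev1.PodRunning {
			names = append(names, pods[i].Name)
		}
	}
	return names
}
```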
@dittops Yeah, I think the behavior has changed a bit recently. Option 1: schedule the LoRA model to specific pods based on the specified replica count. Option 2: load the adapter onto all matching replicas, as this PR does. While Option 2 is a valid pattern that we can support, I strongly recommend sticking with Option 1 (with multi-replica support) as the primary solution for now. In our case, some high-rank LoRA models are quite large, and it's not practical to scale them with Option 2. We could consider adding Option 2 as a separate feature later. What do you think?
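A minimal sketch of Option 1, assuming the replica count comes from the ModelAdapter spec and that a simple first-N pick stands in for a real placement policy:

```go
package controller

import (
	corev1 "k8s.io/api/core/v1"
)

// scheduleReplicas picks at most `replicas` candidate pods to host the
// adapter instead of loading it onto every matching pod. A production
// implementation would apply a placement policy (spread, least loaded, etc.)
// rather than taking the first candidates.
func scheduleReplicas(candidates []corev1.Pod, replicas int) []corev1.Pod {
	selected := make([]corev1.Pod, 0, replicas)
	for i := range candidates {
		if len(selected) >= replicas {
			break
		}
		if candidates[i].DeletionTimestamp == nil && candidates[i].Status.Phase == corev1.PodRunning {
			selected = append(selected, candidates[i])
		}
	}
	return selected
}
```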
@Jeffwan Are you referring to adding a replica count to the ModelAdapter and using that for scheduling?
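If so, a hypothetical shape for that API change might look like the following (the field name and semantics are assumptions for discussion, not an agreed design):

```go
package v1alpha1

// ModelAdapterSpec sketch: only the hypothetical Replicas field is shown;
// existing spec fields are elided.
type ModelAdapterSpec struct {
	// Replicas is the number of base-model pods the adapter should be
	// loaded on. When unset, the controller could fall back to a single
	// pod (old behavior) or to all matching pods (this PR's behavior).
	Replicas *int32 `json:"replicas,omitempty"`
}
```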
Pull Request Description
Support adapter scaling to all replicas.
Related Issues
Resolves: #1095