[Misc] Support adapter scaling to all replicas #1132

Open · wants to merge 2 commits into main
Conversation

@dittops dittops commented May 23, 2025

Pull Request Description

Support adapter scaling to all replicas.

  • allow controller to sync adapter instances with all active pods
  • load adapter on each pod
  • update EndpointSlice with all pod IPs
  • adjust resources and tests for multi-pod support

Related Issues

Resolves: #1095

@dittops dittops force-pushed the feature/adapter-multi-pod-scaling branch from 52d03df to 1604126 Compare May 24, 2025 19:49
@varungup90 varungup90 requested a review from Jeffwan May 28, 2025 16:53
Jeffwan commented May 28, 2025

@dittops Great! I will spend some time this week reviewing this change.

return nil
}
if targetPod.DeletionTimestamp != nil {
continue
Collaborator review comment:

TODO: add a log message here for any pods skipped

if err := r.Get(ctx, types.NamespacedName{Namespace: instance.Namespace, Name: instance.Name}, found); err != nil {
if apierrors.IsNotFound(err) {
klog.Warningf("Failed to fetch the endpoint slice %s", klog.KObj(instance))
podList := []corev1.Pod{}
Collaborator review comment:

It seems the logic has been simplified a lot. I will double-check the logic.

}
if len(activePods) != 0 {
selectedPod, err = r.schedulePod(ctx, instance, activePods)
Collaborator review comment:

@dittops can you confirm whether the scheduling logic has been disabled in the new workflow? If so, for what reason?

Jeffwan commented Jun 16, 2025

@dittops I think the only part I'm not sure about is the scheduling part. Can you give more details?

dittops commented Jun 18, 2025

I have used the following logic for adapter loading/unloading:

  1. Pods are selected with label-based matching on adapter.model.aibrix.ai/enabled: "true".
  2. The selected pods are added to the Status.Instances list.
  3. The reconcileLoading function iteratively loads the adapter on each pod in the Instances list.

Jeffwan commented Jun 19, 2025

@dittops the workflow sounds good. From the change, I notice the LoRA scheduling logic has been deleted. In this case, how are pods selected?

dittops commented Jun 19, 2025

schedulePod was used to pick one pod, which was then assigned to instance.Status.Instances. Instead of choosing one pod, the new approach uses ALL pods that match the selector and adds all matching pods to instance.Status.Instances.

If we need to keep schedulePod, we can move getActivePodsForModelAdapter inside schedulePod so that it selects all matching pods and returns a list.

Jeffwan commented Jun 19, 2025

@dittops Yeah, I think the behavior has changed a bit recently.

Option 1: Schedule the LoRA model to specific pods based on the specified replicas.
Option 2: Load the LoRA into all base model replicas so that all models are identical — this is the approach you're switching to.

While Option 2 is a valid pattern that we can support, I strongly recommend sticking with Option 1 (with multi-replica support) as the primary solution for now. In our case, some high-rank LoRA models are quite large, and it's not practical to scale using Option 2. We could consider adding Option 2 as a separate feature later.

What do you think?

dittops commented Jun 19, 2025

@Jeffwan Are you referring to adding a replica count to the ModelAdapter and using it for scheduling?
e.g.:

  spec:
    replicas: 3  # Only load on 3 pods
    podSelector:
      matchLabels:
        model.aibrix.ai/name: base-model

Jeffwan commented Jun 19, 2025


Successfully merging this pull request may close these issues.

ModelAdapters do not dynamically route to new pods