Closed
Labels
:Distributed Coordination/Autoscaling, :ml (Machine learning), >bug, Team:Distributed (Obsolete) (replaced by Distributed Indexing/Coordination), Team:ML
Description
Issue
Versions: 7.11-7.13
Fixed in: 7.14+
Due to inaccurate memory estimates, a scale-down request from the ML autoscaling decider can end up requiring a scale up.
Here is a response that epitomizes the scenario:
"ml": {
  "required_capacity": {
    "node": {
      "memory": 2520765440
    },
    "total": {
      "memory": 2520765440
    }
  },
  "current_capacity": {
    "node": {
      "storage": 0,
      "memory": 2147483648
    },
    "total": {
      "storage": 0,
      "memory": 6442450944
    }
  },
  "current_nodes": [
    {
      "name": "instance-0000000099"
    },
    {
      "name": "instance-0000000100"
    },
    {
      "name": "instance-0000000101"
    }
  ],
  "deciders": {
    "ml": {
      "required_capacity": {
        "node": {
          "memory": 2520765440
        },
        "total": {
          "memory": 2520765440
        }
      },
      "reason_summary": "Requesting scale down as tier and/or node size could be smaller",
      "reason_details": {
        "waiting_analytics_jobs": [],
        "waiting_anomaly_jobs": [],
        "configuration": {},
        "perceived_current_capacity": {
          "node": {
            "memory": 2503160627
          },
          "total": {
            "memory": 6074310888
          }
        },
        "required_capacity": {
          "node": {
            "memory": 2520765440
          },
          "total": {
            "memory": 2520765440
          }
        },
        "reason": "Requesting scale down as tier and/or node size could be smaller"
      }
    }
  }
}
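Note how the current node size is actually 2GB (2147483648 bytes), but ML's estimate of the required node memory is off because values are rounded inappropriately (2520765440 bytes). Because the required node memory exceeds the current node memory, this caused a scale up instead of a scale down.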
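To make the mismatch concrete, here is a minimal Python sketch that checks a response of this shape for the problem. It assumes the `"ml"` object has already been parsed into a dict; the function name `scale_down_needs_scale_up` is hypothetical and not part of Elasticsearch, and the dict below is a trimmed copy of the response above.

```python
def scale_down_needs_scale_up(ml: dict) -> bool:
    """Return True when the decider says "scale down" but asks for
    more memory per node than the current nodes have."""
    decider = ml["deciders"]["ml"]
    says_scale_down = "scale down" in decider["reason_summary"].lower()
    required_node = ml["required_capacity"]["node"]["memory"]
    current_node = ml["current_capacity"]["node"]["memory"]
    return says_scale_down and required_node > current_node


# Trimmed copy of the response above (only the fields the check reads).
ml = {
    "required_capacity": {"node": {"memory": 2520765440}},
    "current_capacity": {"node": {"memory": 2147483648}},
    "deciders": {
        "ml": {
            "reason_summary": "Requesting scale down as tier and/or node size could be smaller",
        }
    },
}

# How many bytes the requested node size exceeds the current node size by.
shortfall = (
    ml["required_capacity"]["node"]["memory"]
    - ml["current_capacity"]["node"]["memory"]
)

print(scale_down_needs_scale_up(ml))  # True
print(shortfall)                      # 373281792
```

The requested node size is about 356 MiB larger than the 2GB nodes in use, which is why the "scale down" decision is acted on as a scale up.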
Workaround
If you are running an affected Elasticsearch version and this scenario occurs, you can statically set the minimum and maximum autoscaling sizes for ML in Elastic Cloud.