Troubleshooting Pods
This doc is about troubleshooting plain Pods managed directly by Kueue, in other words, Pods that are not owned by Kubernetes Jobs or other supported CRDs.
Note
This doc focuses on Kueue behavior for Pods that differs from other job integrations. You can read Troubleshooting Jobs for more general troubleshooting steps.
Is my Pod managed directly by Kueue?
Kueue adds the label kueue.x-k8s.io/managed with the value true to Pods that it manages.
If the label is not present on a Pod, Kueue is not going to admit the Pod or account for its resource usage directly.
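To check whether the label is set, you can query the Pod directly; the Pod and namespace names below are placeholders:

```shell
# Prints "true" if Kueue manages the Pod; empty output means it does not.
kubectl get pod my-pod -n my-namespace \
  -o jsonpath='{.metadata.labels.kueue\.x-k8s\.io/managed}'
```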
A Pod might not have the kueue.x-k8s.io/managed label due to one of the following reasons:
- The Pod integration is disabled.
- The Pod belongs to a namespace, or has labels, that don't satisfy the requirements of the podOptions configured for the Pod integration.
- The Pod is owned by a Job or an equivalent CRD that is managed by Kueue.
- The Pod doesn't have a kueue.x-k8s.io/queue-name label and manageJobsWithoutQueueName is set to false.
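As a minimal sketch, a Pod that Kueue would manage could look like the following, assuming the Pod integration is enabled and a LocalQueue named user-queue exists (both names here are assumptions):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
  namespace: my-namespace
  labels:
    kueue.x-k8s.io/queue-name: user-queue  # assumption: a LocalQueue with this name exists
spec:
  containers:
  - name: main
    image: registry.k8s.io/pause:3.9
    resources:
      requests:
        cpu: "1"
```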
Identifying the Workload for your Pod
When using Pod groups, the name of the Workload matches the value of the label kueue.x-k8s.io/pod-group-name.
When using single Pods, you can identify the corresponding Workload by following the guide for Identifying the Workload of a Job.
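For a Pod group, you can read the group label off one of its Pods and fetch the matching Workload; the Pod and namespace names are placeholders:

```shell
# The Workload name equals the value of the pod-group-name label.
GROUP=$(kubectl get pod my-pod -n my-namespace \
  -o jsonpath='{.metadata.labels.kueue\.x-k8s\.io/pod-group-name}')
kubectl get workload "$GROUP" -n my-namespace
```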
Why doesn’t a Workload exist for my Pod group?
Before creating a Workload object, Kueue expects all the Pods for the group to be created.
The Pods should all have the same value for the label kueue.x-k8s.io/pod-group-name, and the number of Pods should equal the value of the annotation kueue.x-k8s.io/pod-group-total-count.
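As a sketch, each Pod in a group of two would carry the same group label and the total-count annotation; all names below are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-pod-group-0  # a second Pod, e.g. my-pod-group-1, carries the same label and annotation
  namespace: my-namespace
  labels:
    kueue.x-k8s.io/queue-name: user-queue        # assumption: a LocalQueue with this name exists
    kueue.x-k8s.io/pod-group-name: my-pod-group
  annotations:
    kueue.x-k8s.io/pod-group-total-count: "2"    # Kueue waits until 2 Pods with this label exist
spec:
  containers:
  - name: main
    image: registry.k8s.io/pause:3.9
```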
You can run the following command to check whether Kueue has created a Workload for the Pod:
kubectl describe pod my-pod -n my-namespace
If Kueue didn’t create the Workload object, you will see an output similar to the following:
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning ErrWorkloadCompose 14s pod-kueue-controller 'my-pod-group' group has fewer runnable pods than expected
Note
The above event might show up for the first Pod that Kueue observes, and it will remain even if Kueue successfully creates the Workload for the Pod group later.
Once Kueue observes all the Pods for the group, you will see an output similar to the following:
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal CreatedWorkload 14s pod-kueue-controller Created Workload: my-namespace/my-pod-group
Why did my Pod disappear?
When you enable preemption, Kueue might preempt Pods to accommodate higher priority jobs or to reclaim quota.
Preemption is implemented via DELETE calls, the standard way of terminating a Pod in Kubernetes.
When using single Pods, Kubernetes deletes the Workload object along with the Pod, as nothing else holds an owner reference to it.
Kueue doesn't typically fully delete Pods in a Pod group upon preemption. See the next question to understand the deletion mechanics for Pods in a Pod group.
Why aren’t Pods in a Pod group deleted when Failed or Succeeded?
When using Pod groups, Kueue keeps a finalizer kueue.x-k8s.io/managed on each Pod to prevent it from being deleted and to be able to track the progress of the group.
You should not modify finalizers manually.
Kueue will remove the finalizer from Pods when:
- The group satisfies the termination criteria, for example, when all Pods terminate successfully.
- For Failed Pods, when Kueue observes a replacement Pod.
- You delete the Workload object.
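To see whether the finalizer is still present on a Pod, you can inspect its metadata; the Pod and namespace names are placeholders:

```shell
# Lists the Pod's finalizers; kueue.x-k8s.io/managed appears while Kueue
# is still tracking the Pod as part of its group.
kubectl get pod my-pod -n my-namespace -o jsonpath='{.metadata.finalizers}'
```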
Once a Pod doesn't have any finalizers, Kubernetes will delete it based on:
- Whether a user or a controller has issued a Pod deletion.
- The Pod garbage collector.