Prometheus Metrics

Prometheus metrics exported by Kueue

Kueue exposes Prometheus metrics to monitor the health of the system and the status of ClusterQueues.

Kueue health

Use the following metrics to monitor the health of the Kueue controllers:

| Metric name | Type | Description | Labels |
| --- | --- | --- | --- |
| kueue_admission_attempts_total | Counter | The total number of attempts to admit workloads. Each admission attempt might try to admit more than one workload. | result: possible values are success or inadmissible |
| kueue_admission_attempt_duration_seconds | Histogram | The latency of an admission attempt. | result: possible values are success or inadmissible |
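
As a rough illustration of how the result label can be used, the following Python sketch parses a snippet of Prometheus text exposition output and computes the share of admission attempts that succeeded. The sample values and the helper names (parse_samples, admission_success_ratio) are assumptions made for this example and are not part of Kueue. The parser is deliberately minimal: it ignores timestamps and escaped characters in label values.

```python
import re

# Hypothetical scrape output; the sample values are invented for illustration.
SAMPLE = """\
# TYPE kueue_admission_attempts_total counter
kueue_admission_attempts_total{result="success"} 120
kueue_admission_attempts_total{result="inadmissible"} 30
"""

# Matches lines of the form: name{label="value",...} value
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return a list of (metric name, labels dict, float value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip anything this minimal parser does not understand
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def admission_success_ratio(samples):
    """Fraction of admission attempts with result="success", or None if there are none."""
    counts = {"success": 0.0, "inadmissible": 0.0}
    for name, labels, value in samples:
        if name == "kueue_admission_attempts_total" and labels.get("result") in counts:
            counts[labels["result"]] += value
    total = sum(counts.values())
    return counts["success"] / total if total else None


ratio = admission_success_ratio(parse_samples(SAMPLE))
print(ratio)  # 0.8 for the sample above
```

Because the metric is a counter, a ratio computed from raw values covers the whole lifetime of the process. For a recent window, compare the differences between two scrapes instead.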

ClusterQueue status

Use the following metrics to monitor the status of your ClusterQueues:

| Metric name | Type | Description | Labels |
| --- | --- | --- | --- |
| kueue_pending_workloads | Gauge | The number of pending workloads. | cluster_queue: the name of the ClusterQueue; status: possible values are active or inadmissible |
| kueue_admitted_workloads_total | Counter | The total number of admitted workloads. | cluster_queue: the name of the ClusterQueue |
| kueue_admission_wait_time_seconds | Histogram | The time from when a Workload was created until it was admitted. | cluster_queue: the name of the ClusterQueue |
| kueue_admitted_active_workloads | Gauge | The number of admitted Workloads that are active (unsuspended and not finished). | cluster_queue: the name of the ClusterQueue |
| kueue_cluster_queue_status | Gauge | Reports the status of the ClusterQueue. | cluster_queue: the name of the ClusterQueue; status: possible values are pending, active or terminated (for a given ClusterQueue, only one of the statuses reports a value of 1) |
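
Since kueue_cluster_queue_status reports a value of 1 for only one status per ClusterQueue, the current status can be recovered by picking the series whose value is 1. The sketch below shows this on in-memory samples. The queue names, the sample values, and the current_status helper are hypothetical and exist only for this example.

```python
# Hypothetical samples of kueue_cluster_queue_status as (labels, value) pairs.
SAMPLES = [
    ({"cluster_queue": "team-a", "status": "pending"}, 0.0),
    ({"cluster_queue": "team-a", "status": "active"}, 1.0),
    ({"cluster_queue": "team-a", "status": "terminated"}, 0.0),
    ({"cluster_queue": "team-b", "status": "pending"}, 1.0),
    ({"cluster_queue": "team-b", "status": "active"}, 0.0),
    ({"cluster_queue": "team-b", "status": "terminated"}, 0.0),
]


def current_status(samples):
    """Map each ClusterQueue name to the status whose series has the value 1."""
    statuses = {}
    for labels, value in samples:
        if value == 1.0:
            statuses[labels["cluster_queue"]] = labels["status"]
    return statuses


print(current_status(SAMPLES))  # {'team-a': 'active', 'team-b': 'pending'}
```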

Optional metrics

The following metrics are available only if metrics.enableClusterQueueResources is enabled in the manager’s configuration.

| Metric name | Type | Description | Labels |
| --- | --- | --- | --- |
| kueue_cluster_queue_resource_usage | Gauge | Reports the ClusterQueue’s total resource usage. | cohort: the cohort to which the queue belongs; cluster_queue: the name of the ClusterQueue; flavor: the referenced flavor; resource: the resource name |
| kueue_cluster_queue_nominal_quota | Gauge | Reports the ClusterQueue’s resource quota. | cohort: the cohort to which the queue belongs; cluster_queue: the name of the ClusterQueue; flavor: the referenced flavor; resource: the resource name |
| kueue_cluster_queue_borrowing_limit | Gauge | Reports the ClusterQueue’s resource borrowing limit. | cohort: the cohort to which the queue belongs; cluster_queue: the name of the ClusterQueue; flavor: the referenced flavor; resource: the resource name |
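
One possible use of these metrics is to relate usage to nominal quota for each combination of ClusterQueue, flavor, and resource. The following sketch joins samples of kueue_cluster_queue_resource_usage and kueue_cluster_queue_nominal_quota on those labels. The label values, the numbers, and the utilization helper are assumptions made for illustration only.

```python
# Hypothetical samples as (labels, value) pairs; all values are invented.
USAGE = [  # kueue_cluster_queue_resource_usage
    ({"cohort": "research", "cluster_queue": "team-a", "flavor": "default", "resource": "cpu"}, 6.0),
    ({"cohort": "research", "cluster_queue": "team-a", "flavor": "default", "resource": "memory"}, 8.0),
]
QUOTA = [  # kueue_cluster_queue_nominal_quota
    ({"cohort": "research", "cluster_queue": "team-a", "flavor": "default", "resource": "cpu"}, 8.0),
    ({"cohort": "research", "cluster_queue": "team-a", "flavor": "default", "resource": "memory"}, 32.0),
]


def series_key(labels):
    """Identify a series by ClusterQueue, flavor, and resource."""
    return (labels["cluster_queue"], labels["flavor"], labels["resource"])


def utilization(usage, quota):
    """Return usage divided by nominal quota for every series present in both inputs."""
    quota_by_key = {series_key(labels): value for labels, value in quota}
    result = {}
    for labels, used in usage:
        nominal = quota_by_key.get(series_key(labels))
        if nominal:  # skip series with a missing or zero quota
            result[series_key(labels)] = used / nominal
    return result


for key, ratio in utilization(USAGE, QUOTA).items():
    print(key, ratio)
```

A ratio above 1 would mean that usage exceeds the nominal quota, which may happen when a ClusterQueue borrows resources from its cohort.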