
Troubleshooting

vLLM Deployment Pending Due to Insufficient Memory

Issue Observed

The vLLM deployment pod remains in the Pending state because the scheduler reports an insufficient memory error, so the pod is never placed on a node.
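The scheduling failure can be confirmed from the pod's events. A minimal sketch, assuming placeholder pod and namespace names (substitute your own):

```shell
# List pods and confirm the vLLM pod is stuck in Pending
kubectl get pods -n <namespace>

# Inspect the pod's events; a memory scheduling failure typically
# appears as: "0/N nodes are available: N Insufficient memory."
kubectl describe pod <vllm-pod-name> -n <namespace>
```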

Debugging Steps

  1. Validate MIG Profile and GPU Capacity: Verify the configured MIG profile and the GPU memory it exposes. The node exposes a 3g.71gb MIG instance (~71 GB of GPU memory), confirming that the selected MIG profile provides adequate capacity.

  2. Check GPU Utilization: Execute nvidia-smi and confirm that no GPU workloads are running on the node.

  3. Review Pod Resource Configuration: Inspect the pod YAML and validate the requested resources. The memory request and limit are set higher than the node can satisfy, which is why the scheduler cannot place the pod.

  4. Verify vLLM SKU Service Profile: Review the vLLM SKU Service Profile configuration and confirm that its memory limit is also set higher than necessary, matching the oversized value seen in the pod spec.
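The hardware and pod checks in steps 1–3 can be sketched with the following commands (pod name and namespace are placeholders):

```shell
# Steps 1-2: confirm the MIG layout and that no workloads occupy the GPU.
nvidia-smi                # overall GPU utilization and running processes
nvidia-smi mig -lgi       # list configured MIG GPU instances (e.g. 3g.71gb)

# Step 3: print only the container resource requests/limits from the pod spec.
kubectl get pod <vllm-pod-name> -n <namespace> \
  -o jsonpath='{.spec.containers[*].resources}'
```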

Resolution

Update the memory request and limit to values the node can satisfy and redeploy the model. The pod then transitions to the Running state.
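As an illustration, the corrected container resource block might look like the following. The values and the MIG resource name are assumptions for this sketch; the actual memory figure must fit within the node's allocatable memory, and the MIG resource name depends on the cluster's device-plugin strategy:

```yaml
resources:
  requests:
    memory: "64Gi"             # placeholder: must fit within node allocatable memory
    nvidia.com/mig-3g.71gb: 1  # example MIG resource name; may differ per cluster
  limits:
    memory: "64Gi"
    nvidia.com/mig-3g.71gb: 1
```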