Interactive challenge

vLLM Inference Challenge

Deploy a GPU-backed, OpenAI-compatible vLLM endpoint, then verify GPU scheduling, health checks, time to first token (TTFT), request queueing, and rollback readiness.

Difficulty

Hard

Duration

75 min

Persona

AI infrastructure engineer

Tools

kubectl, vLLM, Prometheus

Prerequisites

GPU node pool, GPU scheduling basics, model artifact access

Step 01

Prepare the platform boundary

Create or identify the namespace, labels, and ownership metadata that make the workload reviewable.
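
As a rough illustration of this step, the sketch below builds a Namespace manifest that carries labels and an ownership annotation, and checks the values against the Kubernetes naming rules before printing it. The namespace name `llm-serving`, the label keys and values, and the `example.com/owner` annotation are placeholders assumed for the example, not values prescribed by the challenge. The owner contact is stored as an annotation because label values cannot contain `@`.

```python
import json
import re

# Namespace names must be DNS-1123 labels (lowercase alphanumerics and '-').
NAME_RE = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")
# Label values may be empty, otherwise alphanumerics with '-', '_' or '.' inside.
LABEL_VALUE_RE = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")


def is_valid_name(name):
    """True if name is usable as a namespace name (max 63 characters)."""
    return len(name) <= 63 and NAME_RE.fullmatch(name) is not None


def is_valid_label_value(value):
    """True if value is usable as a label value (max 63 characters)."""
    return len(value) <= 63 and LABEL_VALUE_RE.fullmatch(value) is not None


def build_namespace(name, labels, annotations):
    """Return a Namespace manifest as a plain dict, rejecting invalid metadata."""
    if not is_valid_name(name):
        raise ValueError(f"invalid namespace name: {name!r}")
    for key, value in labels.items():
        if not is_valid_label_value(value):
            raise ValueError(f"invalid value for label {key!r}: {value!r}")
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            "labels": dict(labels),
            "annotations": dict(annotations),
        },
    }


# All names and values below are hypothetical placeholders.
manifest = build_namespace(
    "llm-serving",
    labels={
        "app.kubernetes.io/part-of": "vllm-inference",
        "team": "ai-platform",
    },
    annotations={"example.com/owner": "ai-infra@example.com"},
)

# kubectl accepts JSON manifests as well as YAML.
print(json.dumps(manifest, indent=2))
```

If the script were saved as, say, `make_namespace.py` (a hypothetical filename), its output could be piped into `kubectl apply -f -`, and the result reviewed with `kubectl get namespace llm-serving --show-labels`.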
