Challenge catalog

Kubernetes LLM challenges with evidence-based checks.

Start with free guided labs. Each challenge has objectives, commands, expected signals, paste-output validation, progressive hints, and solution reveal tracking.

Guided checks for Kubernetes LLM operators.

Showing 12 of 12 challenges

Model serving | Hard | 75 min | Free

vLLM Inference Challenge

Deploy a GPU-backed OpenAI-compatible endpoint and prove scheduling, health, TTFT, queueing, and rollback readiness.

Persona: AI infrastructure engineer
Tools: kubectl, vLLM, Prometheus
Progress: not started

RAG | Medium | 60 min | Free

RAG Retrieval Challenge

Operate ingestion, metadata filters, vector retrieval, answer evaluation, and failure drills for production RAG.

Persona: MLOps engineer
Tools: kubectl, curl, vector database
Progress: not started

Production | Hard | 50 min | Free

Production Readiness Challenge

Run a launch review across security, quota, rollout, observability, cost, and ownership before live traffic.

Persona: Platform lead
Tools: kubectl, policy engine, dashboard
Progress: not started

Observability | Medium | 45 min | Free

LLM Observability Challenge

Build the signal model needed to debug user latency, runtime saturation, GPU pressure, traces, logs, and alerts.

Persona: SRE
Tools: Prometheus, Grafana, OpenTelemetry
Progress: not started

Model serving | Medium | 55 min | Free

vLLM Kubernetes Deployment Lab

Design the deployment contract for vLLM with model cache, readiness, runtime flags, and service exposure.

Persona: AI infrastructure engineer
Tools: kubectl, vLLM, container registry
Progress: not started

Architecture | Medium | 35 min | Free

KServe vs Ray Serve Decision Lab

Choose the serving abstraction by ownership model, CRDs, graph complexity, autoscaling, and rollout needs.

Persona: Platform architect
Tools: decision matrix, runtime inventory
Progress: not started

GPU capacity | Hard | 65 min | Free

GPU Node Pool Scheduling Lab

Prove accelerator placement with labels, taints, tolerations, quotas, and unschedulable-pod debugging.

Persona: Platform engineer
Tools: kubectl, NVIDIA device plugin, cluster autoscaler
Progress: not started
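
As a rough illustration of the unschedulable-pod debugging this lab covers, the sketch below checks which node taints a pod's tolerations fail to cover. It follows the standard Kubernetes matching rules (an empty toleration effect matches any effect, `Exists` ignores the value, an empty key with `Exists` matches every taint). The function names and the `nvidia.com/gpu` taint in the usage note are illustrative assumptions, not part of the lab material.

```python
from typing import Dict, List

Taint = Dict[str, str]
Toleration = Dict[str, str]


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Return True if a single toleration matches a single taint."""
    # An empty effect on the toleration matches every taint effect.
    effect = toleration.get("effect", "")
    if effect and effect != taint.get("effect"):
        return False

    key = toleration.get("key", "")
    operator = toleration.get("operator", "Equal")  # Equal is the default

    if not key:
        # An empty key is only meaningful with Exists: it matches all keys.
        return operator == "Exists"
    if key != taint.get("key"):
        return False
    if operator == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


def untolerated_taints(taints: List[Taint], tolerations: List[Toleration]) -> List[Taint]:
    """List the taints that would keep the pod off the node.

    PreferNoSchedule is a soft preference, so only NoSchedule and
    NoExecute taints are treated as blocking here.
    """
    blocking = ("NoSchedule", "NoExecute")
    return [
        t for t in taints
        if t.get("effect") in blocking
        and not any(tolerates(tol, t) for tol in tolerations)
    ]
```

For example, a node tainted with `{"key": "nvidia.com/gpu", "value": "present", "effect": "NoSchedule"}` blocks a pod with no tolerations, while a pod carrying `{"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}` passes the check.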

RAG | Hard | 70 min | Free

RAG Retrieval Quality Lab

Measure retrieval recall, citation accuracy, tenant filtering, and reranking latency before generation.

Persona: MLOps engineer
Tools: evaluation set, vector database, reranker
Progress: not started
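
To make the retrieval recall measurement concrete, here is a minimal sketch of recall@k over a labeled evaluation set. The data layout (a list of retrieved document IDs paired with the IDs judged relevant) and the function names are assumptions for illustration; the lab may define its evaluation set differently.

```python
from typing import Iterable, List, Sequence, Tuple


def recall_at_k(retrieved_ids: Sequence[str], relevant_ids: Iterable[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("each query needs at least one relevant document")
    hits = set(retrieved_ids[:k]) & relevant
    return len(hits) / len(relevant)


def mean_recall_at_k(eval_set: List[Tuple[Sequence[str], Iterable[str]]], k: int) -> float:
    """Average recall@k across all (retrieved, relevant) pairs in the set."""
    if not eval_set:
        raise ValueError("evaluation set is empty")
    scores = [recall_at_k(retrieved, relevant, k) for retrieved, relevant in eval_set]
    return sum(scores) / len(scores)
```

With retrieved IDs `["a", "b", "c", "d"]` and relevant IDs `["a", "d", "z"]`, recall@3 is 1/3 and recall@4 is 2/3, since `"z"` is never retrieved.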

Cost | Medium | 45 min | Free

Inference Cost Model Lab

Calculate cost per request from input tokens, output tokens, GPU profile, utilization, and cache behavior.

Persona: AI platform lead
Tools: spreadsheet, metrics export, benchmark report
Progress: not started
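
One way to sketch the cost-per-request calculation is shown below. It is a deliberately simplified model, not the lab's official formula: it assumes the GPU profile is described by an hourly price plus aggregate prefill and decode throughput, that a prefix-cache hit removes the prefill work for the cached share of input tokens, and that idle time is charged to requests by dividing by utilization. All names and parameters are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GpuProfile:
    """Hypothetical GPU profile; values would come from a benchmark report."""
    hourly_cost_usd: float
    prefill_tokens_per_sec: float  # aggregate input-token throughput
    decode_tokens_per_sec: float   # aggregate output-token throughput


def cost_per_request(
    input_tokens: int,
    output_tokens: int,
    gpu: GpuProfile,
    utilization: float,
    cache_hit_rate: float = 0.0,
) -> float:
    """Estimate the USD cost of one request under the stated assumptions."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    if not 0 <= cache_hit_rate <= 1:
        raise ValueError("cache_hit_rate must be in [0, 1]")

    # Cached input tokens are assumed to skip prefill entirely.
    effective_input = input_tokens * (1 - cache_hit_rate)

    # GPU-seconds attributed to this request at full load.
    gpu_seconds = (
        effective_input / gpu.prefill_tokens_per_sec
        + output_tokens / gpu.decode_tokens_per_sec
    )

    # Spread the cost of idle capacity across the busy fraction.
    cost_per_second = gpu.hourly_cost_usd / 3600
    return gpu_seconds * cost_per_second / utilization
```

With a profile of 3.60 USD per hour, 10,000 prefill tokens/s, and 1,000 decode tokens/s, a request with 10,000 input and 1,000 output tokens at 50% utilization costs about 0.004 USD, dropping to about 0.003 USD at a 50% cache hit rate.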

Production | Hard | 60 min | Free

LLM Rollout and Rollback Lab

Design traffic shifting, readiness gates, rollback triggers, and model-version ownership for inference services.

Persona: SRE
Tools: Argo CD, gateway policy, metrics dashboard
Progress: not started

Security | Hard | 70 min | Free

Multi-Tenant LLM Security Lab

Review tenant routing, namespace boundaries, secrets, NetworkPolicy, prompt logging, and retrieval authorization.

Persona: Security-minded platform engineer
Tools: kubectl, NetworkPolicy, admission policy
Progress: not started

Observability | Medium | 50 min | Free

LLM Observability and Cost Dashboard Lab

Create a dashboard model that joins user latency, queue wait, GPU pressure, token throughput, and cost signals.

Persona: SRE
Tools: Prometheus, Grafana, OpenTelemetry
Progress: not started