Kubernetes LLM guided labs

Practice the platform checks behind production LLM systems.

K8sLLM Labs turns Kubernetes LLM architecture into interactive operator challenges: type commands in a lab terminal, inspect Kubernetes output, unlock hints, validate readiness, and keep private progress on this device.

Browse challenges | Follow roadmap | Read docs

Pull architecture guide -> Choose challenge

Type kubectl or curl -> Inspect terminal output

Regex check passes -> Step completed

Open hint if blocked -> Reveal solution if needed

Example check: paste_regex

kubectl get nodes -L accelerator,nvidia.com/gpu.product

The check passes when the pasted output shows evidence of GPU placement, node labeling, or accelerator scheduling.
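A paste_regex check of this kind could be sketched roughly as follows. This is a minimal illustration, not the labs' actual validator: the pattern, the function name `check_gpu_evidence`, and both sample outputs are assumptions made for the example.

```python
import re

# Assumed pattern (not the labs' real one): a node row whose status is Ready
# and whose trailing label columns contain an NVIDIA product name.
GPU_ROW = re.compile(r"^\S+\s+Ready\b.*\bNVIDIA[\w.-]*", re.MULTILINE)


def check_gpu_evidence(pasted_output: str) -> bool:
    """Return True if at least one Ready node row shows an NVIDIA GPU label."""
    return GPU_ROW.search(pasted_output) is not None


# Hand-written samples shaped like `kubectl get nodes -L ...` output;
# they were not captured from a real cluster.
SAMPLE_WITH_GPU = """\
NAME     STATUS   ROLES    AGE   VERSION   ACCELERATOR   GPU.PRODUCT
node-a   Ready    <none>   12d   v1.29.3   nvidia        NVIDIA-A100-SXM4-40GB
node-b   Ready    <none>   12d   v1.29.3
"""

SAMPLE_WITHOUT_GPU = """\
NAME     STATUS   ROLES    AGE   VERSION   ACCELERATOR   GPU.PRODUCT
node-b   Ready    <none>   12d   v1.29.3
"""

if __name__ == "__main__":
    print(check_gpu_evidence(SAMPLE_WITH_GPU))     # True
    print(check_gpu_evidence(SAMPLE_WITHOUT_GPU))  # False
```

Matching on a whole node row, rather than on the word "gpu" anywhere in the paste, means the check does not pass on the column header alone.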

Challenge of the Week

vLLM Inference Challenge

Deploy a GPU-backed OpenAI-compatible endpoint and prove scheduling, health, TTFT, queueing, and rollback readiness.

Hard | 75 min | Model serving | AI infrastructure engineer

Start challenge | Read guide | Follow roadmap

Free challenges: 12

Learning paths: 5

Guided checks: 68

Progress: saved on this device

Product paths

Built for platform engineers, DevOps, MLOps, and AI infrastructure learners.

Kubernetes LLM Foundations

Build the cluster mental model needed before serving models.

vLLM Production Serving

Move from runtime deployment to latency, health, and rollout checks.

RAG Platform Engineering

Operate ingestion, retrieval, policy, answer quality, and evaluation.

LLM Observability and Cost

Connect user latency, runtime saturation, GPU pressure, and economics.

Production Readiness for AI Workloads

Review security, rollout, tenancy, rollback, and platform ownership.

Challenge catalog

Operator lab index

6/6 visible

ID | Challenge | Difficulty | Time | Progress status | Persona / tools | Actions

01
Topic: Model serving
Paths: vllm-production-serving / production-readiness-ai
Challenge: vLLM Inference Challenge
Deploy a GPU-backed OpenAI-compatible endpoint and prove scheduling, health, TTFT, queueing, and rollback readiness.
Difficulty: Hard
Time: 75 min
Progress status: not started
Persona: AI infrastructure engineer
Tools: kubectl + vLLM + Prometheus
Actions: Open lab | Docs

02
Topic: RAG
Path: rag-platform-engineering
Challenge: RAG Retrieval Challenge
Operate ingestion, metadata filters, vector retrieval, answer evaluation, and failure drills for production RAG.
Difficulty: Medium
Time: 60 min
Progress status: not started
Persona: MLOps engineer
Tools: kubectl + curl + vector database
Actions: Open lab | Docs

03
Topic: Production
Paths: production-readiness-ai / kubernetes-llm-foundations
Challenge: Production Readiness Challenge
Run a launch review across security, quota, rollout, observability, cost, and ownership before live traffic.
Difficulty: Hard
Time: 50 min
Progress status: not started
Persona: Platform lead
Tools: kubectl + policy engine + dashboard
Actions: Open lab | Docs

04
Topic: Observability
Paths: llm-observability-cost / production-readiness-ai
Challenge: LLM Observability Challenge
Build the signal model needed to debug user latency, runtime saturation, GPU pressure, traces, logs, and alerts.
Difficulty: Medium
Time: 45 min
Progress status: not started
Persona: SRE
Tools: Prometheus + Grafana + OpenTelemetry
Actions: Open lab | Docs

05
Topic: Model serving
Path: vllm-production-serving
Challenge: vLLM Kubernetes Deployment Lab
Design the deployment contract for vLLM with model cache, readiness, runtime flags, and service exposure.
Difficulty: Medium
Time: 55 min
Progress status: not started
Persona: AI infrastructure engineer
Tools: kubectl + vLLM + container registry
Actions: Open lab | Docs

06
Topic: Architecture
Paths: kubernetes-llm-foundations / vllm-production-serving
Challenge: KServe vs Ray Serve Decision Lab
Choose the serving abstraction by ownership model, CRDs, graph complexity, autoscaling, and rollout needs.
Difficulty: Medium
Time: 35 min
Progress status: not started
Persona: Platform architect
Tools: decision matrix + runtime inventory
Actions: Open lab | Docs

Advanced lab access

Get notified when downloadable kits, deeper scenarios, and review worksheets are ready.

Email for early access

For now, this only records your interest on this device. A real signup form with consent and email delivery will be in place before launch.