Interactive challenge
Inference Cost Model Lab
Calculate cost per request from input tokens, output tokens, GPU profile, utilization, and cache behavior.
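One way to frame the calculation is sketched below. This is a minimal model under stated assumptions, not a reference solution: the `GpuProfile` fields, the throughput figures, and the price are all hypothetical, throughput is treated as aggregate tokens per second per GPU under batching, and a cache hit is assumed to skip prefill for that fraction of the input tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuProfile:
    """Hypothetical GPU profile; every value here is an assumption."""
    hourly_cost_usd: float        # price paid per GPU-hour
    prefill_tokens_per_s: float   # aggregate input-token throughput
    decode_tokens_per_s: float    # aggregate output-token throughput


def cost_per_request(input_tokens, output_tokens, gpu,
                     utilization, cache_hit_rate=0.0):
    """Estimate USD per request from token counts and a GPU profile.

    utilization: fraction of paid GPU time spent on useful work (0 < u <= 1).
    cache_hit_rate: fraction of input tokens served from a prefix cache,
    assumed here to skip prefill entirely.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    if not 0 <= cache_hit_rate <= 1:
        raise ValueError("cache_hit_rate must be in [0, 1]")

    uncached_input = input_tokens * (1 - cache_hit_rate)
    gpu_seconds = (uncached_input / gpu.prefill_tokens_per_s
                   + output_tokens / gpu.decode_tokens_per_s)
    # Idle GPU time is still paid for, so divide by utilization.
    return gpu_seconds * gpu.hourly_cost_usd / 3600 / utilization


# Example with made-up numbers: 2000 input tokens, 500 output tokens,
# half of the input cached, GPU busy 50% of the time.
profile = GpuProfile(hourly_cost_usd=3.60,
                     prefill_tokens_per_s=10_000,
                     decode_tokens_per_s=1_000)
cost = cost_per_request(2000, 500, profile,
                        utilization=0.5, cache_hit_rate=0.5)
print(f"{cost:.6f} USD per request")
```

With these inputs the request uses about 0.6 GPU-seconds, which comes to roughly 0.0012 USD after the utilization adjustment. Lower utilization or a lower cache hit rate raises the figure, which is the sensitivity the lab asks you to explore.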
Prerequisites
Latency and token metrics
Guided step
Build the signal model
Connect user-facing latency to runtime queueing, GPU pressure, token throughput, logs, and traces.
Commands
kubectl -n llm-serving get pod --show-labels
curl -sS "$METRICS_ENDPOINT" | grep -E "ttft|queue|tokens|gpu"
kubectl -n llm-serving logs deploy/<runtime-deployment> --tail=80
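To turn the scraped output into notes, it can help to group metric names by signal family. The sketch below assumes the endpoint returns the Prometheus text exposition format, which the lab does not state explicitly, and the metric names in `SAMPLE` are hypothetical. Unlike the grep above, it matches the keywords against the metric name only, not against label text.

```python
import re
from collections import defaultdict

# Same keywords as the grep pattern in the Commands list.
FAMILIES = ("ttft", "queue", "tokens", "gpu")


def group_metric_names(exposition_text):
    """Map each signal family to the sorted metric names that contain it."""
    groups = defaultdict(set)
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        match = re.match(r"[A-Za-z_:][A-Za-z0-9_:]*", line)
        if not match:
            continue
        name = match.group(0)
        for family in FAMILIES:
            if family in name:
                groups[family].add(name)
    return {family: sorted(names) for family, names in groups.items()}


# Hypothetical scrape output, for illustration only.
SAMPLE = """\
# HELP llm_ttft_seconds Time to first token (hypothetical name)
llm_ttft_seconds_sum{route="/v1/chat",model="m1"} 12.5
llm_queue_wait_seconds_sum{route="/v1/chat",model="m1"} 3.0
llm_generated_tokens_total{model="m1"} 90210
gpu_memory_used_bytes{gpu="0"} 1.2e10
http_requests_total 77
"""

for family, names in group_metric_names(SAMPLE).items():
    print(family, names)
```

A family that comes back empty is a gap in the signal model worth noting before moving on.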
Expected signals
- Metrics have stable route, model, and tenant-safe labels.
- TTFT and queue wait are visible separately.
- GPU pressure can be correlated with user latency.
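The last two signals can be checked with a small calculation. The sketch below makes two assumptions: that TTFT is approximately queue wait plus prefill time, so subtracting queue wait isolates the runtime's own contribution, and that you have paired samples of GPU utilization and latency from the same time windows. The sample values are invented.

```python
import math


def split_ttft(ttft_s, queue_wait_s):
    """Return (queue_wait, remainder), assuming TTFT = queue wait + prefill."""
    if queue_wait_s > ttft_s:
        raise ValueError("queue wait cannot exceed TTFT")
    return queue_wait_s, ttft_s - queue_wait_s


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Invented samples: GPU utilization and p95 latency per time window.
gpu_util = [0.40, 0.55, 0.70, 0.85, 0.95]
p95_latency_s = [0.8, 0.9, 1.3, 2.1, 3.4]

r = pearson(gpu_util, p95_latency_s)
print(f"correlation between GPU utilization and p95 latency: {r:.2f}")
```

A strongly positive coefficient suggests GPU pressure tracks user latency in these windows. It does not establish causation, so cross-check against queue wait and traces before drawing conclusions.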
Checks
Paste metric names or dashboard notes for LLM serving.
Confirm that prompt text is not used as a metric label.
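The label check can be partly automated. The sketch below scans scrape output for labels outside an allow-list and for label values that look like free text. It assumes the Prometheus text exposition format; the allow-list, the length threshold, and the sample metrics are hypothetical and should be adapted to your own label scheme.

```python
import re

# Hypothetical allow-list; "le" and "quantile" cover histogram and summary series.
ALLOWED_LABELS = {"route", "model", "tenant_tier", "le", "quantile"}
MAX_VALUE_LEN = 64  # assumed threshold for "too long to be an identifier"

LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def find_unsafe_labels(exposition_text):
    """Return (metric, label, reason) tuples for suspicious labels."""
    findings = []
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "{" not in line:
            continue
        name, _, rest = line.partition("{")
        label_block = rest.rsplit("}", 1)[0]
        for label, value in LABEL_RE.findall(label_block):
            if label not in ALLOWED_LABELS:
                findings.append((name, label, "label not on allow-list"))
            elif len(value) > MAX_VALUE_LEN or re.search(r"\s", value):
                findings.append((name, label, "value looks like free text"))
    return findings


# Hypothetical scrape output with two deliberate problems.
SAMPLE = """\
llm_ttft_seconds_count{route="/v1/chat",model="m1"} 10
llm_requests_total{route="/v1/chat",prompt="Summarize this contract for me"} 1
llm_requests_total{route="/v1/chat",model="please write a poem"} 1
"""

for finding in find_unsafe_labels(SAMPLE):
    print(finding)
```

An empty result is necessary but not sufficient: the heuristic only catches unknown label names and values containing whitespace or excessive length, so a manual review of the label set is still worthwhile.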