Interactive challenge

Inference Cost Model Lab

Calculate cost per request from input tokens, output tokens, GPU profile, utilization, and cache behavior.
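
The inputs listed above can be combined in many ways; the following is a minimal sketch of one of them, not the lab's reference solution. It assumes that GPU time is prefill time plus decode time, that cached input tokens skip prefill entirely, and that idle capacity is charged to requests by dividing by utilization. The names GpuProfile, cost_per_request, and cache_hit_rate are hypothetical, as are any throughput and price figures you plug in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuProfile:
    """Hypothetical GPU profile; all three figures are assumed inputs."""
    hourly_cost_usd: float        # price of one GPU-hour
    prefill_tokens_per_s: float   # input-token (prefill) throughput
    decode_tokens_per_s: float    # output-token (decode) throughput


def cost_per_request(input_tokens: int,
                     output_tokens: int,
                     gpu: GpuProfile,
                     utilization: float,
                     cache_hit_rate: float = 0.0) -> float:
    """Estimate the USD cost of one request under the stated assumptions."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be in [0, 1]")

    # Assumption: cached input tokens cost no prefill time at all.
    uncached_input = input_tokens * (1.0 - cache_hit_rate)

    gpu_seconds = (uncached_input / gpu.prefill_tokens_per_s
                   + output_tokens / gpu.decode_tokens_per_s)

    # Assumption: idle GPU time is spread across the served requests,
    # so lower utilization raises the effective price per GPU-second.
    usd_per_gpu_second = gpu.hourly_cost_usd / 3600.0
    return gpu_seconds * usd_per_gpu_second / utilization
```

Under this sketch, halving utilization doubles the cost of every request, and a higher cache hit rate reduces only the input-token term; output tokens are unaffected by caching.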

Prerequisites

Latency and token metrics

Guided step

Build the signal model

Connect user latency to runtime queueing, GPU pressure, token throughput, logs, and traces.

Commands

kubectl -n llm-serving get pod --show-labels
curl -sS "$METRICS_ENDPOINT" | grep -E "ttft|queue|tokens|gpu"
kubectl -n llm-serving logs deploy/<runtime-deployment> --tail=80
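
If the grep output from the curl command is long, a small script can reduce it to a list of distinct metric names. The sketch below assumes the endpoint returns the Prometheus text exposition format and that you have saved the response to a string. Unlike the grep, it matches the keywords against metric names only, not against comments or label values. The function name signal_metric_names is hypothetical.

```python
import re

# Same keywords as the grep in the curl command above.
SIGNAL_PATTERN = re.compile(r"ttft|queue|tokens|gpu")


def signal_metric_names(exposition_text: str) -> list[str]:
    """Return sorted, de-duplicated metric names that match the keywords."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" (labels) or whitespace (value).
        name = re.split(r"[{\s]", line, maxsplit=1)[0]
        if SIGNAL_PATTERN.search(name):
            names.add(name)
    return sorted(names)
```

The resulting list is one possible starting point for the metric names requested under Checks.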

Expected signals

  • Metrics carry stable route, model, and tenant-safe labels (no prompt text or other per-request values).
  • TTFT and queue wait are visible separately.
  • GPU pressure can be correlated with user latency.
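
One rough way to test the last signal is to compute a correlation coefficient between a GPU pressure series and a latency series sampled at the same timestamps. The sketch below implements the Pearson coefficient by hand. It assumes you have already exported two equal-length lists of samples; the function name pearson and the choice of Pearson correlation are illustrative, not something the lab prescribes.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of two equal-length sample series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    # Sums of products of deviations from the mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)

    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)
```

A value near 1 suggests latency rises with GPU pressure in the sampled window. It does not show causation, and queue wait should still be checked separately from TTFT.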

Checks

Paste metric names or dashboard notes for LLM serving.

Confirm that prompt text is not used as a metric label.
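
The label check can be partly automated. The sketch below scans Prometheus-style exposition text and flags any label that is not on an allowlist, or whose value is long or contains spaces, which is a rough heuristic for free text such as prompts. The allowlist, the 64-character limit, and the function name suspicious_labels are all assumptions to adapt to your own metrics.

```python
import re

# Assumed allowlist; adjust to the labels your metrics really use.
ALLOWED_LABELS = {"route", "model", "tenant_tier", "le", "quantile"}
MAX_VALUE_LEN = 64  # assumed upper bound for a sane label value

# Matches name="value" pairs, allowing escaped characters in the value.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def suspicious_labels(exposition_text: str) -> list[tuple[str, str, str]]:
    """Return (metric, label, value) triples that look unsafe."""
    findings = []
    for line in exposition_text.splitlines():
        line = line.strip()
        # Comment lines and label-free samples cannot leak label data.
        if line.startswith("#") or "{" not in line:
            continue
        metric, rest = line.split("{", 1)
        for name, value in LABEL_RE.findall(rest):
            free_text = len(value) > MAX_VALUE_LEN or " " in value
            if name not in ALLOWED_LABELS or free_text:
                findings.append((metric, name, value))
    return findings
```

An empty result is not proof that labels are safe; it only means nothing tripped this heuristic, so a manual review of label names is still worthwhile.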
