Interactive challenge

Inference Cost Model Lab

Calculate cost per request from input tokens, output tokens, GPU profile, utilization, and cache behavior.

Difficulty: Medium

Duration: 45 min

Persona: AI platform lead

Tools: spreadsheet, metrics export, benchmark report

Prerequisites: Latency and token metrics
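The cost-per-request calculation named in the lab description can be sketched as follows. This is a minimal illustration, not the lab's reference solution: the `GpuProfile` fields, the `cost_per_request` function, and the simplifying assumptions (cached input tokens cost nothing, idle GPU time is spread evenly across requests) are all hypothetical and should be replaced with figures from your own benchmark report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuProfile:
    """Hypothetical GPU profile; take real values from a benchmark report."""
    hourly_cost_usd: float       # price of one GPU-hour
    prefill_tokens_per_s: float  # input tokens processed per second at full load
    decode_tokens_per_s: float   # output tokens generated per second at full load


def cost_per_request(input_tokens: int, output_tokens: int, gpu: GpuProfile,
                     utilization: float, cache_hit_rate: float = 0.0) -> float:
    """Estimate the USD cost of one request.

    Assumptions (illustrative only):
    - input tokens served from cache consume no GPU time;
    - idle capacity is paid for, so busy time is divided by utilization.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be in [0, 1]")

    # Only uncached input tokens go through prefill.
    uncached_input = input_tokens * (1.0 - cache_hit_rate)
    busy_seconds = (uncached_input / gpu.prefill_tokens_per_s
                    + output_tokens / gpu.decode_tokens_per_s)

    # Charge each request its share of idle time as well.
    billed_seconds = busy_seconds / utilization
    return billed_seconds * gpu.hourly_cost_usd / 3600.0


if __name__ == "__main__":
    profile = GpuProfile(hourly_cost_usd=3.6,
                         prefill_tokens_per_s=1000.0,
                         decode_tokens_per_s=100.0)
    cost = cost_per_request(1000, 100, profile,
                            utilization=0.5, cache_hit_rate=0.5)
    print(f"estimated cost per request: ${cost:.6f}")
```

Dividing by utilization is one simple way to account for idle capacity; a spreadsheet version of the same model would use one column per input and the same two formulas.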

Step 01: Build the signal model

Connect user-facing latency to runtime queueing, GPU pressure, and token throughput, using logs and traces as supporting evidence.
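One way to start the signal model is to split request latency into queue wait, prefill, and decode time. The sketch below assumes that simple additive decomposition; the function name `estimate_latency_s` and all of its parameters are hypothetical, and the throughput figures would come from your own metrics export.

```python
def estimate_latency_s(queue_wait_s: float, input_tokens: int, output_tokens: int,
                       prefill_tokens_per_s: float,
                       decode_tokens_per_s: float) -> dict:
    """Break end-to-end latency into three additive parts (an assumption).

    queue_wait_s: time the request waits before the runtime schedules it
    prefill_s:    time to process the input tokens
    decode_s:     time to generate the output tokens
    """
    prefill_s = input_tokens / prefill_tokens_per_s
    decode_s = output_tokens / decode_tokens_per_s
    return {
        "queue_wait_s": queue_wait_s,
        "prefill_s": prefill_s,
        "decode_s": decode_s,
        "total_s": queue_wait_s + prefill_s + decode_s,
    }


if __name__ == "__main__":
    parts = estimate_latency_s(queue_wait_s=0.2, input_tokens=1000,
                               output_tokens=100,
                               prefill_tokens_per_s=2000.0,
                               decode_tokens_per_s=50.0)
    for name, seconds in parts.items():
        print(f"{name}: {seconds:.3f}")
```

Comparing each part against the corresponding log or trace span shows which signal dominates: a large queue wait points at scheduling or GPU pressure, while a large decode time points at token throughput.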
