Quiz: What is an LLM — For SRE / DevOps
4 questions. Check your understanding before moving on.
LLM Fundamentals — SRE / DevOps
Q1
A 70B parameter model in FP16 requires approximately how much GPU memory just to load the weights?
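As a hint for working this out, weight memory is just parameter count times bytes per parameter (FP16 stores each parameter in 2 bytes); a minimal back-of-envelope sketch, with a hypothetical helper name:

```python
# Rough estimate of GPU memory needed to load model weights alone.
# This is a sketch: it ignores KV cache, activations, and framework overhead.
def weight_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """bytes_per_param: FP32 = 4, FP16/BF16 = 2, INT8 = 1."""
    return params_billions * 1e9 * bytes_per_param / 1e9

print(weight_memory_gb(70))  # 70B params x 2 bytes -> 140.0 (GB) in FP16
```

The same function covers quantized variants by swapping `bytes_per_param`, e.g. `weight_memory_gb(70, 1)` for INT8.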
Q2
Your LLM service handles independent requests from multiple users. Which load-balancing strategy is sufficient?
Q3
Your monitoring dashboard shows normal tokens-per-second (TPS) throughput, but Time to First Token (TTFT) p99 has spiked 3x. What is the most likely cause?
Q4
You're evaluating API-hosted vs self-hosted LLMs for a workload processing ~500K tokens/day. Which is the strongest argument for API-hosted?