🏛️

Tech Leader Path

You've built and led systems for years. This path gives you the accurate mental model for AI engineering: the ROI framing, the risk surface, and the architectural decisions that actually matter. No boilerplate, no hand-waving.

Big picture first Minimal code Decision-focused

What you'll come away with

✓ An accurate mental model of how LLMs, agents, and protocols actually work: not marketing
✓ How to measure ROI on AI projects and frame the buy vs. build vs. tune decision
✓ The real risk surface: prompt injection, supply chain attacks, EU AI Act obligations
✓ What agent protocols (MCP, A2A) mean for your vendor and integration strategy

Your curriculum

4.1

What is an Agent

An agent is not a smarter chatbot: it is a different execution model. This module defines what makes something agentic, maps the spectrum from single call to autonomous agent, and gives you the decision matrix to know which approach fits your problem.

→

6.1

What Makes LLM Evaluation Hard

Learn why LLM eval is structurally different from traditional ML testing, what the three axes of eval design are, and how to build a mental model for the rest of the track.

→

1.1

What is an LLM?

Large Language Models are stateless text-transformation functions: they take text in and return text out, with no memory between calls. Understanding this one fact shapes every architectural decision you'll make with AI.

→

5.1

Hosting Options

Choosing where to run your model determines your cost structure, latency floor, and operational burden: understanding the tradeoffs between API inference, self-hosted, and cloud-managed endpoints lets you pick the right option for each workload rather than defaulting to whatever is easiest to start.

→

2.1

What is RAG and Why

LLMs know a lot, but they don't know your data. Retrieval-Augmented Generation is the pattern that fixes this: not by training the model on your data, but by finding the relevant pieces at query time and handing them directly to the model.

→

7.1

The AI Threat Landscape

Every LLM application has a multi-layer attack surface: model, context, tools, memory, and outputs. Understanding what attackers want and what they can do is the prerequisite to building defences that actually hold. This module maps the threat landscape and establishes why defence in depth is not optional.

→

8.1

AI ROI: What Actually Gets Measured

Most AI pilots show impressive returns that evaporate at scale. Understanding why, and how to measure value correctly, is the difference between AI investments that compound and ones that quietly fail.

→

1.2

How Prompts Work

A prompt is not a question: it's a structured program. Understanding its anatomy (system instruction, conversation history, user message) lets you communicate intent reliably and debug output failures systematically.

→

7.2

Prompt Injection

Prompt injection is the most prevalent attack class in LLM applications. It takes two forms: direct injection from user input, and indirect injection through retrieved documents or tool results. Both exploit the same root cause: the model cannot distinguish instructions from data when they share the same channel.

→

8.2

Buy vs Build vs Fine-tune

Every AI capability involves a make-or-buy decision, but the options are more nuanced than they look. This module gives you a decision framework and total cost of ownership model for each path.

→

1.3

Models and Model Selection

Not every task needs the most capable model. Understanding the capability-cost-latency tradeoff lets you pick the right model for each job, and avoid paying frontier prices for work a smaller model handles just as well.

→

7.3

Jailbreaking and Policy Bypass

Jailbreaking is the attempt to get a model to produce output that its alignment training or system prompt prohibit. No defence is permanent: the arms race between jailbreak techniques and countermeasures is ongoing. This module covers the attack taxonomy and the multi-layer defences that reduce, but never eliminate, the risk.

→

8.3

Where AI Creates Durable Advantage

Most AI features can be replicated by any competitor with API access. Durable advantage comes from the layer underneath: proprietary data, deep workflow integration, and feedback loops that compound over time.

→

1.4

Hallucinations and Model Reliability

LLMs generate plausible text, not verified truth. Understanding why models hallucinate, and how to architect around it, is the single most important reliability concern in production AI systems.

→

7.4

Data Privacy and PII

LLM systems create new PII leakage vectors that traditional data protection controls do not cover: model memorisation, cross-user context leakage, and RAG pipelines that pull in customer records without scrubbing. This module covers detection, scrubbing, retention, and the vendor agreements that govern what happens to your data.

→

8.4

Team Structure and AI Capability

How you organise your AI function determines what it can ship. This module maps the tradeoffs between centralised and federated models, defines the roles that actually matter, and gives you a maturity test for assessing whether your AI team can reach production.

→

6.5

Cost Attribution & Token Budgets

Learn to track, attribute, and control LLM API costs before the invoice surprises you: per-request tagging, per-feature aggregation, token budget enforcement, and anomaly alerting.

→

1.5

Structured Output and Tool Use

Getting reliable, machine-readable output from an LLM requires more than asking nicely. Structured output and tool use turn a text generator into a component your application can depend on.

→

9.5

Multimodal Safety

Images and audio introduce attack surfaces that text-only safety systems do not cover: injected instructions inside images, adversarial visual inputs, deepfakes, and PII embedded in non-text modalities. This module covers the threat model for multimodal inputs and the defensive patterns that close the gaps.

→

7.5

Guardrails Architecture

Guardrails are controls on inputs, outputs, or both: classifiers, validators, and policy checks that run independently of the model. Designing a guardrails architecture means choosing which controls to apply, how to layer them for coverage and performance, and how to calibrate them so false positives do not kill legitimate use.

→

8.5

Managing AI Risk at the Org Level

AI systems introduce risk categories that traditional software governance does not cover. This module maps the five risk categories, explains how to set risk appetite, and distinguishes real risk management from risk theatre.

→

4.6

Human-in-the-Loop

Human oversight is not a bolt-on safety feature: it is an architectural primitive that determines what an agent is permitted to do autonomously and what requires a human decision. This module covers the design of approval gates, interrupt points, confidence escalation, and audit trails that make human oversight practical at scale.

→

1.6

Context and Memory Management

LLMs are stateless: they have no memory between calls. Every form of 'memory' in an AI application is something your code explicitly puts into the context window. Understanding how to manage that window is the core engineering skill behind every reliable AI system.

→

7.6

Supply Chain Security

The AI supply chain, base model, fine-tuning data, adapters, Python packages, and API keys, has more attack surfaces than teams typically consider. A .pkl file is executable code. An unverified model weight can contain backdoors. This module covers the controls that keep your AI system trustworthy from training data to production inference.

→

8.6

Communicating AI to Stakeholders

The gap between what engineers know about AI systems and what stakeholders need to hear is where AI projects lose trust. This module gives you the frameworks to communicate outcomes, risk, cost, and failures in language that lands.

→

6.7

Production Monitoring & Drift Detection

Learn to detect quality regressions, distribution shifts, and cost anomalies in live LLM systems before users report them: using metrics, statistical process control, and a sample-and-judge pipeline.

→

1.7

Evaluating LLM Systems

LLM outputs are probabilistic and hard to unit-test. Building a systematic evaluation practice, before you ship, and continuously in production, is what separates AI features that stay reliable from ones that silently degrade.

→

7.7

Regulatory Landscape

The regulatory environment for AI is moving quickly. The EU AI Act introduced risk tiers and mandatory requirements. GDPR has always applied to automated decision-making. The US has the NIST AI RMF. This module maps the landscape for a B2B SaaS product using LLMs: what you likely need to document, what you need to avoid, and where you need legal counsel.

→

8.7

AI Procurement and Vendor Evaluation

Choosing an AI vendor on benchmark performance alone is one of the most reliable ways to end up with the wrong vendor. This module gives you a complete evaluation framework covering quality, pricing, data handling, SLAs, and exit planning.

→

4.8

Agent Evaluation

Evaluating an agent is fundamentally different from evaluating a model. The question is not just 'was the answer correct?' but 'did the agent take the right path to get there, and would it hold up under different conditions?' This module covers offline trajectory evaluation and online production monitoring: the two distinct disciplines that together keep agent quality measurable.

→

6.8

Red-teaming & Adversarial Evaluation

Learn to systematically discover failure modes in LLM systems before attackers do: how to run a red-team session, categorize findings, and convert every confirmed vulnerability into a permanent regression test.

→

1.8

Safety and Guardrails

Safety in AI systems is not a single feature: it is a layered architecture. Understanding what the model handles automatically, what you must build, and where the gaps are is essential before shipping anything user-facing.

→

5.8

Scaling & Cost Management

LLM serving costs accumulate differently from typical web services; GPU-hours are expensive, autoscaling on CPU metrics is wrong, and scale-to-zero creates cold-start latency that makes it unsuitable for interactive workloads; knowing the right signals to scale on and how to build the cost math keeps infrastructure expenses from becoming a surprise.

→

9.8

The Multimodal Frontier

Multimodal AI is advancing faster than any other part of the field: native multimodality, video understanding, and real-time audio-visual interaction are moving from research to production on a timescale of months. This module covers where the field is heading and, more importantly, what durable knowledge to invest in when specific capabilities become outdated within a year.

→

7.8

Incident Response for AI Systems

An AI incident is not a software incident: it involves model misbehaviour, safety violations, or data leakage, each with distinct root causes and remediation paths. This module covers detection, containment, investigation, and post-mortem structure for AI-specific incidents, and the one logging investment that makes all of it possible.

→

8.8

Building an AI-Ready Data Foundation

Most AI ambitions stall not on model capability but on data readiness. This module gives you a practical checklist to assess whether your data is ready for AI, and explains why data infrastructure investment returns more than model investment for most organisations.

→

6.9

Cost Management

LLM costs are non-linear and easy to underestimate — especially in multi-agent systems where one orchestration call spawns dozens of sub-calls. This module covers token economics, prompt caching, cost ceilings with graceful degradation, and the attribution infrastructure needed to run LLM workloads sustainably.

→

1.9

Prototype to Production Checklist

A prototype that works in a demo is not a production system. This capstone synthesises every Foundations concept into a practical checklist: the gaps teams consistently miss when shipping their first AI feature.

→

5.9

Fine-Tuning: When & Why

Fine-tuning is one of several ways to adapt a model to a task — and often the most expensive, slowest, and most fragile. This module is a decision framework: when to fine-tune, when not to, and what you give up either way.

→

7.9

EU AI Act & Governance, Risk, and Compliance

The EU AI Act is the first comprehensive binding regulation for AI systems. It classifies AI by risk tier, imposes strict obligations on high-risk deployments, and prohibits specific uses outright. This module covers what you must do, what you cannot do, and how to determine which rules apply to your system.

→

4.10

Internal Coding Agents

Coding agents are moving from personal tools to team infrastructure. This module covers the architecture for deploying coding agents internally — startup context, sandboxed execution, CI integration, and the review gates that keep automation safe.

→

1.11

Reasoning Models and Test-Time Compute

Reasoning models generate internal thinking traces before responding, trading token cost for accuracy on hard problems. This module explains when that trade is worth making, how to budget reasoning tokens, and what the empirical evidence says about where test-time compute actually helps.

→

6.12

Human Feedback Operations

Human review of AI output is not a checkbox — it's an operational discipline with its own failure modes. Reviewer quality degrades over time, labels drift, and retraining on degraded data makes models worse. This module covers the workflows, tooling, and quality controls that keep human feedback reliable.

→

5.12

Sovereign & Air-Gapped AI Architecture

Some data cannot leave your environment. Air-gapped AI deployments run the full stack — embeddings, vector database, and inference — entirely on-premise with no internet access. The architecture is straightforward; the hard parts are model provenance, patch strategy, and keeping the system from going stale.

→

2.12

Long-Context vs RAG Decision Framework

Models with million-token context windows seem to make RAG obsolete. They don't. The decision between long-context, RAG, and hybrid depends on update frequency, query pattern, cost ceiling, and latency SLO — not just how large your documents are.

→

Start here →