🤖 AI Explained

Sovereign & Air-Gapped AI Architecture

Some data cannot leave your environment. Air-gapped AI deployments run the full stack — embeddings, vector database, and inference — entirely on-premise with no internet access. The architecture is straightforward; the hard parts are model provenance, patch strategy, and keeping the system from going stale.

Layer 1: Surface

Data sovereignty means you control where your data goes and who can access it — down to the hardware. For regulated industries (defence, financial services, healthcare in certain jurisdictions, critical infrastructure), this isn’t a preference: it’s a legal requirement.

Air-gapped deployment is the strictest implementation: the AI system has no network connectivity to the internet or external services. Everything runs on hardware you control, inside a perimeter you control.

Why this is harder than it sounds:

Most AI tooling assumes internet access. Model weights are downloaded from Hugging Face or provider APIs. Embedding models are called via REST endpoints. Vector databases pull updates from the cloud. Error reporting goes to external services. In an air-gapped environment, none of these paths exist. You must pre-stage everything before sealing the air gap.
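A cheap way to catch a dependency that still assumes internet access is a process-level tripwire that makes any outbound connection fail loudly during offline testing. This is a minimal sketch — the `AirGapViolation` name and the monkey-patch approach are illustrative, and no substitute for network-level egress controls:

```python
import socket

class AirGapViolation(RuntimeError):
    """Raised when code inside the air gap attempts a network connection."""

def enforce_airgap() -> None:
    """Monkey-patch socket so any connection attempt raises instead of egressing.

    A belt-and-braces check for offline testing only: real enforcement
    belongs in the network layer (no routes, no DNS, firewall default-deny).
    """
    class GuardedSocket(socket.socket):
        def connect(self, address):
            raise AirGapViolation(f"Blocked outbound connection attempt to {address}")

        def connect_ex(self, address):
            raise AirGapViolation(f"Blocked outbound connection attempt to {address}")

    socket.socket = GuardedSocket
```

Running the staging test suite with this tripwire active surfaces every library that quietly phones home before the environment is sealed.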

The compliance case (for leaders):

Regulation / Framework            Key requirement                                  AI implication
GDPR / national equivalents       Personal data must not leave the jurisdiction    Cloud API calls are cross-border transfers
Defence / Government classified   Data must not leave the classified network       Any cloud inference is prohibited
HIPAA / health data               PHI must be secured with BAAs                    Self-hosted is often cleaner than BAA negotiation
Financial regulatory              Audit trail for all data processing              Self-hosted with local logging = cleaner audit
Critical infrastructure           Air-gap requirement                              Absolute prohibition on external network access

The architecture at a glance:

┌─────────────────────────────────────────────────────┐
│                Air-Gapped Environment                │
│                                                     │
│  ┌─────────────┐    ┌──────────────┐    ┌────────┐  │
│  │  Documents  │───▶│  Embedding   │───▶│ Vector │  │
│  │  (on-prem)  │    │  Model       │    │   DB   │  │
│  └─────────────┘    │  (local GPU) │    │(local) │  │
│                     └──────────────┘    └────┬───┘  │
│                                              │      │
│  ┌─────────────┐    ┌──────────────┐         │      │
│  │    User     │───▶│  Inference   │◀────────┘      │
│  │   Query     │    │  Model       │                │
│  └─────────────┘    │  (local GPU) │                │
│                     └──────────────┘                │
└─────────────────────────────────────────────────────┘
         No traffic crosses this boundary

Production Gotcha: Model provenance requires cryptographic verification at download time and again at deployment. A model weight file that was clean when downloaded can be tampered with before it reaches the air-gapped environment. Hash the file at download, store the hash in a tamper-evident log, and verify that hash at deployment.
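The tamper-evident log can be as simple as a hash chain: each entry's hash covers the previous entry's hash, so altering any historical record invalidates every entry after it. A minimal sketch — field names are illustrative, and a production system would back this with an append-only store or transparency log:

```python
import hashlib
import json

def append_entry(log: list[dict], record: dict) -> list[dict]:
    """Append a record whose hash chains to the previous entry."""
    prev_hash = log[-1]["entry_hash"] if log else "0" * 64
    payload = json.dumps(record, sort_keys=True)
    entry_hash = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
    log.append({"record": record, "prev_hash": prev_hash, "entry_hash": entry_hash})
    return log

def verify_chain(log: list[dict]) -> bool:
    """Recompute every link; any tampered record breaks the chain."""
    prev = "0" * 64
    for entry in log:
        payload = json.dumps(entry["record"], sort_keys=True)
        expected = hashlib.sha256((prev + payload).encode()).hexdigest()
        if entry["prev_hash"] != prev or entry["entry_hash"] != expected:
            return False
        prev = entry["entry_hash"]
    return True
```

Record each model's download hash as an entry at download time; at deployment, verify the chain before trusting any hash in it.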


Layer 2: Guided

Full offline RAG + inference stack

The following implements a complete air-gapped RAG pipeline using only components that can run without internet access.

Components:

  • Inference: Ollama (runs any GGUF model locally)
  • Embeddings: a local sentence-transformers model
  • Vector store: Chroma (fully local, no external dependencies)
  • Documents: pre-indexed at staging time

import hashlib
from pathlib import Path

import chromadb
import ollama
from sentence_transformers import SentenceTransformer

# ── Model verification ──────────────────────────────────────────────────

VERIFIED_MODELS = {
    # Populate this from your tamper-evident manifest at deployment time
    "embeddings": {
        "path": "/opt/models/all-MiniLM-L6-v2",
        "sha256": "a3e7d89f2b1c4..."  # pre-computed at download time
    },
    "inference": {
        "path": "/opt/models/llama3.2-3b.gguf",
        "sha256": "b4f2a91c3d8e7..."
    }
}

def verify_model(model_path: str, expected_hash: str) -> bool:
    """Verify model file integrity before loading."""
    sha256 = hashlib.sha256()
    with open(model_path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            sha256.update(chunk)
    actual_hash = sha256.hexdigest()
    if actual_hash != expected_hash:
        raise RuntimeError(
            f"Model integrity check FAILED for {model_path}. "
            f"Expected: {expected_hash[:16]}... Got: {actual_hash[:16]}... "
            f"Do not proceed — the model file may have been tampered with."
        )
    return True

# Verify every model before loading anything
for record in VERIFIED_MODELS.values():
    verify_model(record["path"], record["sha256"])

# ── Embedding model (fully local) ───────────────────────────────────────

embedding_model = SentenceTransformer(
    VERIFIED_MODELS["embeddings"]["path"],
    # Disable any network calls — sentence-transformers will try to check
    # for model updates if given a model name string instead of a local path
    local_files_only=True,
)

# ── Vector store (fully local) ──────────────────────────────────────────

chroma_client = chromadb.PersistentClient(
    path="/opt/vectorstore/chroma_db"
    # No server URL — fully in-process, no network
)

collection = chroma_client.get_or_create_collection(
    name="documents",
    metadata={"hnsw:space": "cosine"}
)

# ── Document ingestion (run at staging time, before air-gap) ───────────

def ingest_documents(doc_paths: list[Path]) -> None:
    """Pre-index documents before sealing the air-gapped environment."""
    for path in doc_paths:
        text = path.read_text()
        # Chunk into ~512-token segments (approximated as 2000 characters here;
        # use a real tokenizer for production chunking)
        chunks = [text[i:i+2000] for i in range(0, len(text), 2000)]
        embeddings = embedding_model.encode(chunks).tolist()
        collection.add(
            documents=chunks,
            embeddings=embeddings,
            ids=[f"{path.stem}-chunk-{i}" for i in range(len(chunks))],
        )
    print(f"Indexed {len(doc_paths)} documents into local vector store.")

# ── RAG query (fully air-gapped) ────────────────────────────────────────

def rag_query(user_question: str, top_k: int = 3) -> str:
    """Answer a question using only local models and local vector store."""
    # Embed the question locally
    query_embedding = embedding_model.encode([user_question]).tolist()[0]

    # Retrieve from local vector store
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=top_k,
    )
    context_chunks = results["documents"][0]
    context = "\n\n---\n\n".join(context_chunks)

    # Generate answer using local inference model (via Ollama)
    response = ollama.chat(
        model="llama3.2:3b",  # must be pre-pulled before air-gapping
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a helpful assistant with access to internal documents. "
                    "Answer only from the provided context. "
                    "If the context does not contain enough information, say so."
                )
            },
            {
                "role": "user",
                "content": f"Context:\n{context}\n\nQuestion: {user_question}"
            }
        ]
    )
    return response["message"]["content"]

Staging checklist: what must be pre-staged before air-gapping

BEFORE SEALING THE AIR-GAP:
  □ Download all model weights
  □ Compute SHA-256 of each model file
  □ Store hashes in tamper-evident manifest (signed, separate from models)
  □ Pre-pull inference models into Ollama model directory
  □ Install all Python dependencies (no pip install should run inside air-gap)
  □ Pre-index document corpus into vector store
  □ Test full RAG pipeline end-to-end in offline mode
  □ Document model versions and training cutoff dates
  □ Establish patch ingress process (USB/optical media + verification)

BEFORE EACH DEPLOYMENT:
  □ Verify model file hashes against manifest
  □ Verify manifest signature
  □ Run integration smoke test

Patch and update strategy

In a connected environment, model updates happen transparently. In an air-gapped environment, you need an explicit ingress process:

Secure Download Zone (internet-connected) → Physical Media → Air-Gapped Zone
         ↓                                                         ↓
  Download new model                                        Verify hash
  Compute hash                                              Load model
  Sign manifest                                             Test
  Transfer to media                                         Replace if verified

Establish a patch cycle cadence before deployment — quarterly is common for stable models, monthly for security patches. Without a cadence, air-gapped environments drift and become stale without anyone noticing.


Layer 3: Deep Dive

Why “air-gapped” is not binary

True air-gapping (no physical network connection) is rare outside classified government environments. Most regulated deployments implement data sovereignty without full physical air-gapping:

  • Network-isolated VLAN — AI system on a dedicated subnet with no egress rules
  • Private cloud / on-premise VPC — cloud infrastructure in your data centre with no public endpoints
  • Sovereign cloud region — cloud provider infrastructure in a specific jurisdiction, with contractual data residency guarantees

Each has different threat models and compliance implications. Physical air-gapping eliminates network-based exfiltration but introduces the supply chain problem: the only way in or out is physical media, which can carry malware.

Model provenance and the supply chain threat

Model provenance is the ability to verify that a model file is exactly what it claims to be, and that it hasn’t been modified after its legitimate source released it.

This is harder than it sounds because:

  1. Model files are large (multi-GB) and rarely checked after download
  2. Hugging Face and similar repositories serve models that have not been code-reviewed by the consuming organisation
  3. A compromised model can behave normally on test inputs while carrying adversarial backdoors triggered by specific inputs

Cryptographic verification flow:

import hashlib
import os
from datetime import datetime, timezone

def create_model_manifest(model_paths: dict[str, str]) -> dict:
    """
    Create a signed manifest at download time.
    Store this manifest separate from the models,
    in a tamper-evident system (append-only log, HSM, etc.)
    """
    manifest = {
        "created_at": datetime.now(timezone.utc).isoformat(),
        "models": {}
    }
    for name, path in model_paths.items():
        sha256 = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                sha256.update(chunk)
        manifest["models"][name] = {
            "path": path,
            "sha256": sha256.hexdigest(),
            "size_bytes": os.path.getsize(path),
        }
    return manifest

def verify_against_manifest(manifest: dict) -> bool:
    """Verify all models at deployment time."""
    for name, record in manifest["models"].items():
        sha256 = hashlib.sha256()
        with open(record["path"], "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                sha256.update(chunk)
        actual = sha256.hexdigest()
        if actual != record["sha256"]:
            raise RuntimeError(
                f"INTEGRITY FAILURE: {name} ({record['path']}) "
                f"hash mismatch. Expected {record['sha256'][:16]}... "
                f"Got {actual[:16]}... HALT DEPLOYMENT."
            )
    return True

For high-security environments, supplement SHA-256 with a hardware security module (HSM) that holds the signing key for the manifest. If the manifest is unsigned, an attacker who can modify the model can also modify the manifest.
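The shape of that check is simple; the sketch below uses a symmetric HMAC as a stand-in, whereas a real deployment would sign with an asymmetric key held in the HSM so the air-gapped side only ever holds the verification key:

```python
import hashlib
import hmac
import json

def sign_manifest(manifest: dict, key: bytes) -> str:
    """Sign the canonical JSON form of the manifest (HMAC stand-in for an HSM signature)."""
    payload = json.dumps(manifest, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify_manifest_signature(manifest: dict, signature: str, key: bytes) -> bool:
    """Constant-time comparison against a freshly computed signature."""
    return hmac.compare_digest(sign_manifest(manifest, key), signature)
```

Canonical serialisation (`sort_keys=True`) matters: the same manifest must always produce the same bytes, or a legitimate manifest will fail verification.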

Named failure modes

Supply chain compromise. A malicious model is staged into the air-gapped environment via the legitimate ingress process, bypassing network controls. Mitigation: cryptographic manifest verification at every boundary crossing, with the manifest signed by a key held outside the staging pipeline.

Model staleness drift. The model’s training cutoff is January 2025. Your organisation continues to use it in 2027. Users ask questions about events that happened after the cutoff; the model answers confidently and incorrectly. Mitigation: track training cutoff dates in the manifest, display them to users, and establish a replacement cadence.

Dependency update failure. The Python environment inside the air-gap uses library versions that were current at staging time. Over months, CVEs are discovered in those libraries but cannot be patched because pip install has no egress. Mitigation: treat OS and Python dependency updates as part of your patch ingress process, not just model updates.

VRAM planning error. The model that worked in the staging environment fails to load in the air-gapped environment because the hardware configuration differs. Mitigation: staging must use identical hardware to production, or VRAM requirements must be explicitly documented and verified against target hardware before transfer.

Embedding model / inference model version mismatch. The embedding model was updated but the vector index was not re-built. New documents are embedded with the new model while old documents are embedded with the old model — similarity search produces inconsistent results. Mitigation: version embedding models alongside index builds; rebuilding the index is required when the embedding model changes.
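One guard is to stamp the embedding model's identifier into the index metadata at build time and refuse to query or add with a different model. A sketch — the `embedding_model` metadata key is an illustrative convention, not a built-in of any particular vector store:

```python
def assert_embedding_version(index_metadata: dict, current_model: str) -> None:
    """Refuse to use an index built with a different embedding model."""
    indexed_model = index_metadata.get("embedding_model")
    if indexed_model is None:
        raise RuntimeError(
            "Index metadata does not record its embedding model; "
            "rebuild the index with a version stamp."
        )
    if indexed_model != current_model:
        raise RuntimeError(
            f"Index was built with {indexed_model!r} but the current embedding "
            f"model is {current_model!r}; rebuild the index before mixing embeddings."
        )
```

Calling this at the top of both the ingestion and query paths converts a silent relevance degradation into an immediate, explainable failure.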

Regulatory frameworks and their specific requirements

GDPR Articles 44–49 (transfers): Personal data transfers to third countries require either adequacy decisions or appropriate safeguards. Sending queries containing personal data to a cloud API may constitute a transfer. Self-hosted with no egress eliminates this question.

NIS2 Directive (EU, 2024): Critical infrastructure operators must ensure AI systems used in critical functions meet security requirements that may effectively mandate on-premise deployment for classified workloads.

FedRAMP / DoD Impact Levels (US): Government cloud frameworks permit cloud deployment, but only at specific authorization levels. Workloads above IL4 generally require on-premise or private cloud in approved facilities.


Sovereign & Air-Gapped AI Architecture — Check your understanding

Q1

Your organisation deploys a fully air-gapped RAG system. Three months after deployment, the model starts giving confidently wrong answers on a topic that was never in the training data. What is the most likely root cause?

Q2

You download a model weight file from a trusted source to deploy in an air-gapped environment. Your security team says a hash check at download time is not sufficient. Why not?

Q3

Your air-gapped deployment needs to support multiple teams with different data access levels. Team A can query all documents; Team B can only query documents tagged 'public'. How should this access control be implemented?

Q4

You operate an air-gapped AI system in a regulated environment. A critical security patch is released for the LLM serving framework. What is the correct patching approach?

Q5

A CISO asks whether an air-gapped AI system eliminates data sovereignty risk entirely. What is the accurate answer?