
Deployment Patterns

Docker, Kubernetes, serverless deployment — plus rollback strategies, canary deployments, health check endpoints, and CI/CD pipelines.

When your harness is tested and ready, deployment is the next critical step. This document covers containerization, orchestration, CI/CD pipelines, and scaling strategies for production AI agent harnesses.

In simple terms: How to take your local harness, package it reliably, deploy it to production, scale it, and monitor the deployment.


Part 1: Docker & Containerization

Why Docker for Harnesses

Containers solve several harness deployment problems:

  1. Environment consistency — “Works on my machine” becomes irrelevant
  2. Dependency isolation — Python version, library versions, system packages all versioned
  3. Resource limits — CPU and memory caps prevent runaway agents
  4. Stateless packaging — Deploy identical images to any environment
  5. Rollback capability — Keep previous images, revert instantly if new version breaks

Dockerfile Template for Python Harness

This template works for most harnesses (Claw-Code based, LangChain, etc.):

# Multi-stage build: development vs production
# Stage 1: Builder (install dependencies)
FROM python:3.11-slim as builder

WORKDIR /build

# Install system dependencies needed for Python packages
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    git \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements file
COPY requirements.txt .

# Create virtual environment in /build/venv
RUN python -m venv /build/venv
ENV PATH="/build/venv/bin:$PATH"

# Install Python dependencies
RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir -r requirements.txt

# Stage 2: Runtime (minimal, production image)
FROM python:3.11-slim

WORKDIR /app

# Install only runtime dependencies (no build tools)
RUN apt-get update && apt-get install -y --no-install-recommends \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Copy virtual environment from builder
COPY --from=builder /build/venv /app/venv

# Copy application code
COPY . .

# Set environment variables
ENV PATH="/app/venv/bin:$PATH"
ENV PYTHONUNBUFFERED=1
ENV PYTHONHASHSEED=random

# Health check (uses the curl installed above, so it works even if the
# `requests` package is not part of the runtime image)
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1

# Run harness
CMD ["python", "main.py"]
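The HEALTHCHECK above assumes the harness exposes an HTTP `/health` endpoint on port 8000. A minimal standard-library sketch of such an endpoint (the routes and port mirror the Dockerfile; a real harness would usually register these routes in its own web framework instead):

```python
# health_server.py -- minimal /health and /ready endpoints (illustrative
# sketch, standard library only).
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path in ("/health", "/ready"):
            body = json.dumps({"status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep container logs free of per-request noise

def serve(port: int = 8000) -> HTTPServer:
    """Create (but do not start) the health server; call serve_forever() to run."""
    return HTTPServer(("0.0.0.0", port), HealthHandler)
```

In the container you would start this (e.g. `serve().serve_forever()` on a background thread) as part of `main.py` startup so Docker's health check has something to probe.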

Optimizing Dockerfile for Harnesses

1. Minimize layer count — Fewer layers = smaller images

# Bad: Each RUN creates a new layer
RUN apt-get update
RUN apt-get install -y package1
RUN apt-get install -y package2
RUN apt-get clean

# Good: One layer with all commands
RUN apt-get update && apt-get install -y package1 package2 && rm -rf /var/lib/apt/lists/*

2. Cache busting strategy — Put frequently changing code last

# COPY requirements first (rarely changes)
COPY requirements.txt .
RUN pip install -r requirements.txt

# COPY code last (changes on every build)
COPY . .

3. Size optimization — Multi-stage builds remove build dependencies

Builder stage:    500 MB (with gcc, git, build tools)
Runtime stage:    150 MB (python + pip packages only)
Deployed image:   150 MB (not 500 MB)

Running Containers Locally

Test your harness in a container before deploying:

# Build the image
docker build -t my-harness:latest .

# Run interactively
docker run -it \
  -e OPENAI_API_KEY="sk-..." \
  -e CLAUDE_API_KEY="..." \
  -v /path/to/local/data:/app/data \
  my-harness:latest

# Run with resource limits
docker run \
  --cpus="2.0" \
  --memory="4g" \
  -e OPENAI_API_KEY="sk-..." \
  my-harness:latest

# Run in background, see logs
docker run -d --name harness-prod my-harness:latest
docker logs -f harness-prod

# Stop and remove
docker stop harness-prod
docker rm harness-prod

Docker Compose for Local Development

Run harness + dependencies (database, cache, queue) locally:

# docker-compose.yml
version: '3.8'

services:
  harness:
    build: .
    container_name: harness-dev
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - CLAUDE_API_KEY=${CLAUDE_API_KEY}
      - DATABASE_URL=postgresql://postgres:password@postgres:5432/harness
      - REDIS_URL=redis://redis:6379
      - LOG_LEVEL=DEBUG
    ports:
      - "8000:8000"  # API port
    volumes:
      - .:/app  # Live code reload
      - ./logs:/app/logs
    depends_on:
      - postgres
      - redis
    networks:
      - harness-network

  postgres:
    image: postgres:15-alpine
    container_name: harness-postgres
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=password
      - POSTGRES_DB=harness
    ports:
      - "5432:5432"
    volumes:
      - postgres-data:/var/lib/postgresql/data
    networks:
      - harness-network

  redis:
    image: redis:7-alpine
    container_name: harness-redis
    ports:
      - "6379:6379"
    networks:
      - harness-network

  # Optional: Prometheus for metrics
  prometheus:
    image: prom/prometheus:latest
    container_name: harness-prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    ports:
      - "9090:9090"
    networks:
      - harness-network

volumes:
  postgres-data:
  prometheus-data:

networks:
  harness-network:
    driver: bridge

Usage:

# Start all services
docker-compose up -d

# View logs
docker-compose logs -f harness

# Stop all services
docker-compose down

# Clean up volumes (careful!)
docker-compose down -v

Image Tagging & Registry Strategy

Tag images for different environments:

# Local development
docker build -t my-harness:dev .

# Staging (before production)
docker build -t my-harness:staging .
docker push my-registry/my-harness:staging

# Production (immutable release)
docker build -t my-harness:v1.2.3 .
docker tag my-harness:v1.2.3 my-registry/my-harness:latest
docker push my-registry/my-harness:v1.2.3
docker push my-registry/my-harness:latest

# Tag for rollback capability
docker tag my-registry/my-harness:v1.2.3 my-registry/my-harness:stable
docker push my-registry/my-harness:stable

Part 2: Kubernetes Deployment (For Scaling)

When to Use Kubernetes vs Simpler Options

Option             Complexity    Scaling            Use Case
Docker locally     Low           Manual             Development, testing
Docker + Systemd   Low-Medium    Manual, scripted   Small production (1-10 containers)
Docker Compose     Low-Medium    Limited            Multi-container locally, simple staging
Kubernetes (K8s)   High          Automatic          >10 containers, complex orchestration, auto-scale

Start with simpler options, migrate to K8s when you have:

  • Multiple harness instances (horizontal scaling needed)
  • Complex dependencies (database, cache, queue, monitoring)
  • Need for automatic failover and recovery
  • Multi-region or multi-cloud deployment

Kubernetes Architecture for Harnesses

┌─────────────────────────────────────────┐
│  Load Balancer (K8s Service)            │
│  Routes requests to healthy pods        │
└──────────┬──────────────────────────────┘
           │
    ┌──────┴──────┐
    │             │
┌───▼──────┐  ┌───▼──────┐      ┌──────────┐
│  Pod 1   │  │  Pod 2   │ .... │  Pod N   │
│ Harness  │  │ Harness  │      │ Harness  │
│ Container│  │ Container│      │ Container│
│ (CPU/Mem │  │ (CPU/Mem │      │ (CPU/Mem │
│  limits) │  │  limits) │      │  limits) │
└───┬──────┘  └───┬──────┘      └───┬──────┘
    │             │                 │
    └─────────────┼─────────────────┘
                  │
        ┌─────────▼──────────┐
        │ Horizontal Pod     │
        │ Autoscaler (HPA)   │
        │ Scales on queue len│
        └────────────────────┘

Kubernetes Deployment Manifest

Basic Deployment for harness (covers most scenarios):

# k8s-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: harness
  namespace: default
  labels:
    app: harness
    version: v1
spec:
  replicas: 3  # Start with 3, HPA will adjust
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: harness
  template:
    metadata:
      labels:
        app: harness
        version: v1
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8000"
        prometheus.io/path: "/metrics"
    spec:
      # Spread replicas across nodes (pod anti-affinity for high availability)
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - harness
              topologyKey: kubernetes.io/hostname
      
      containers:
      - name: harness
        image: my-registry/my-harness:v1.2.3
        imagePullPolicy: IfNotPresent
        
        # Port the harness listens on
        ports:
        - name: http
          containerPort: 8000
          protocol: TCP
        - name: metrics
          containerPort: 9090
          protocol: TCP
        
        # Environment variables
        env:
        - name: ENVIRONMENT
          value: "production"
        - name: LOG_LEVEL
          value: "INFO"
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: harness-secrets
              key: database-url
        - name: OPENAI_API_KEY
          valueFrom:
            secretKeyRef:
              name: harness-secrets
              key: openai-api-key
        - name: CLAUDE_API_KEY
          valueFrom:
            secretKeyRef:
              name: harness-secrets
              key: claude-api-key
        
        # Resource requests and limits (critical!)
        resources:
          requests:
            memory: "1Gi"        # Minimum guaranteed
            cpu: "500m"
          limits:
            memory: "4Gi"        # Never use more than this
            cpu: "2"
        
        # Liveness probe (is pod alive?)
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        
        # Readiness probe (ready to serve traffic?)
        readinessProbe:
          httpGet:
            path: /ready
            port: 8000
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 5
          failureThreshold: 3
        
        # Startup probe (graceful startup)
        startupProbe:
          httpGet:
            path: /health
            port: 8000
            scheme: HTTP
          initialDelaySeconds: 0
          periodSeconds: 5
          failureThreshold: 30
        
        # Graceful shutdown
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 15"]
        
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
          capabilities:
            drop:
            - ALL
        
        # Volume mounts
        volumeMounts:
        - name: tmp
          mountPath: /tmp
        - name: logs
          mountPath: /app/logs
      
      # Volumes
      volumes:
      - name: tmp
        emptyDir: {}
      - name: logs
        emptyDir: {}
      
      # ImagePullSecrets for private registries
      imagePullSecrets:
      - name: registry-credentials
      
      # Termination grace period (time for graceful shutdown)
      terminationGracePeriodSeconds: 30

Kubernetes Service (Load Balancer)

Routes traffic to harness pods:

# k8s-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: harness
  namespace: default
  labels:
    app: harness
spec:
  type: LoadBalancer  # Or ClusterIP for internal, NodePort for external
  selector:
    app: harness
  ports:
  - name: http
    port: 80
    targetPort: 8000
    protocol: TCP
  - name: metrics
    port: 9090
    targetPort: 9090
    protocol: TCP
  
  # Session affinity for stateful agents
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 3600

Horizontal Pod Autoscaler (HPA)

Auto-scale based on metrics:

# k8s-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: harness-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: harness
  
  minReplicas: 2    # Never scale below 2 (high availability)
  maxReplicas: 20   # Never scale above 20 (cost control)
  
  metrics:
  # CPU-based scaling (30% target utilization)
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 30
  
  # Memory-based scaling (75% target utilization)
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 75
  
  # Custom metric: queue depth (requires monitoring setup)
  - type: Pods
    pods:
      metric:
        name: queue_depth
      target:
        type: AverageValue
        averageValue: "100"  # 1 pod per 100 items in queue
  
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100        # Double pods when scaling up
        periodSeconds: 15
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50         # Remove 50% when scaling down
        periodSeconds: 15
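The `queue_depth` metric above has to come from somewhere: the harness must expose it, and a metrics adapter (e.g. prometheus-adapter) must surface it to the HPA as a Pods metric. A minimal sketch of the exposition side, in Prometheus text format (the metric name matches the HPA spec; everything else is an assumption):

```python
# metrics.py -- render a queue_depth gauge in Prometheus text exposition
# format, to be served from the /metrics endpoint the pod annotations declare.
def render_metrics(queue_depth: int) -> str:
    lines = [
        "# HELP queue_depth Items waiting in the work queue.",
        "# TYPE queue_depth gauge",
        f"queue_depth {queue_depth}",
    ]
    return "\n".join(lines) + "\n"
```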

ConfigMap & Secrets

Store configuration and sensitive data:

# k8s-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: harness-config
  namespace: default
data:
  # Non-sensitive configuration
  logging.level: INFO
  prometheus.enabled: "true"
  queue.max_retries: "3"
  health_check.interval_seconds: "30"

---
# k8s-secrets.yaml
apiVersion: v1
kind: Secret
metadata:
  name: harness-secrets
  namespace: default
type: Opaque
data:
  # Base64 encoded (use: echo -n 'value' | base64)
  database-url: cG9zdGdyZXM6Ly91c2VyOnBhc3NAaG9zdDpwb3J0L2Ri
  openai-api-key: c2stLnNvbWVrZXloZXJl
  claude-api-key: Y2wtc29tZWtleWhlcmU=

NEVER commit secrets to git! Use your cloud provider’s secret manager:

  • AWS: Secrets Manager or Parameter Store
  • Google Cloud: Secret Manager
  • Azure: Key Vault
  • Kubernetes: External Secrets Operator (to sync from above)
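When a secret manager syncs secrets into the cluster, they typically reach the pod either as environment variables or as mounted files. A small helper that prefers the mounted file and falls back to the environment (the mount path and name mapping are illustrative assumptions, not a Kubernetes convention):

```python
# secrets.py -- read a secret from a mounted file if present, otherwise from
# an environment variable (mount dir and name mapping are assumptions).
import os
from pathlib import Path

def read_secret(name: str, mount_dir: str = "/var/run/secrets/harness") -> str:
    path = Path(mount_dir) / name
    if path.is_file():
        return path.read_text().strip()
    # e.g. "claude-api-key" -> "CLAUDE_API_KEY"
    return os.environ.get(name.upper().replace("-", "_"), "")
```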

StatefulSet for Persistent State

If the harness needs persistent storage (session files, a local memory database):

# k8s-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: harness
  namespace: default
spec:
  serviceName: harness
  replicas: 3
  selector:
    matchLabels:
      app: harness
  template:
    metadata:
      labels:
        app: harness
    spec:
      containers:
      - name: harness
        image: my-registry/my-harness:v1.2.3
        ports:
        - containerPort: 8000
        
        volumeMounts:
        - name: data
          mountPath: /app/data
  
  # Persistent volumes for each replica
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: standard
      resources:
        requests:
          storage: 10Gi
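A sketch of how the harness might persist per-session state under the mounted volume (the `/app/data` path matches the mount above; the file layout and class are illustrative assumptions):

```python
# session_store.py -- persist session state as JSON files under the
# StatefulSet's per-replica volume (layout is an illustrative assumption).
import json
from pathlib import Path

class SessionStore:
    def __init__(self, root: str = "/app/data/sessions"):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, session_id: str, state: dict) -> None:
        (self.root / f"{session_id}.json").write_text(json.dumps(state))

    def load(self, session_id: str) -> dict:
        path = self.root / f"{session_id}.json"
        return json.loads(path.read_text()) if path.is_file() else {}
```

Because each StatefulSet replica gets its own PersistentVolumeClaim, this state survives pod restarts but is not shared between replicas, which is why session affinity (shown earlier on the Service) matters here.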

Part 3: Serverless Deployment

When Serverless Makes Sense

Criterion                  Serverless ✅                                        Containers ❌
Invocation pattern         Bursty, event-driven                                 Continuous, streaming
Request duration           ≤15 min (Lambda), <1 hour (Cloud Functions 2nd gen)  Any duration
Startup time sensitivity   Acceptable (cold start 1-10s)                        Need fast startup
Persistent state           Minimal (stateless)                                  Complex (persistent DB, cache)
Cost model                 Pay per execution                                    Pay per hour/month
Cost threshold             <100K requests/month                                 >100K requests/month

Recommendation: Use serverless for:

  • Event-driven harnesses (web hook → process → respond)
  • Scheduled jobs (cron → batch processing)
  • Infrequent inference (< 100 requests/day)

Don’t use serverless for:

  • Streaming agents (continuous interactions)
  • Long-running tasks (>15 minutes)
  • Stateful agents (complex session management)
  • High-frequency APIs (>100 requests/sec)
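The cost threshold above is really a break-even point between pay-per-execution and always-on pricing. A back-of-envelope sketch, with all prices as illustrative assumptions (check your provider's actual pricing):

```python
# cost_breakeven.py -- rough serverless vs. always-on container comparison.
# All prices are illustrative assumptions, not vendor pricing.
def serverless_monthly(requests: int, per_request_usd: float = 0.0002) -> float:
    return requests * per_request_usd

def container_monthly(hours: float = 730, per_hour_usd: float = 0.05) -> float:
    return hours * per_hour_usd

def breakeven_requests(per_request_usd: float = 0.0002,
                       hours: float = 730, per_hour_usd: float = 0.05) -> int:
    """Requests/month above which the always-on container becomes cheaper."""
    return round(container_monthly(hours, per_hour_usd) / per_request_usd)
```

Under these assumed prices the crossover sits well above 100K requests/month, which is where the table's rule of thumb comes from.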

AWS Lambda Deployment

Harness entry point for Lambda:

# lambda_handler.py
import json
import os
from harness import AgentHarness

# Initialize harness once (reused across invocations)
harness = AgentHarness(
    model="claude-3-5-sonnet",
    system_prompt=os.environ.get("SYSTEM_PROMPT"),
    api_key=os.environ.get("CLAUDE_API_KEY")
)

def lambda_handler(event, context):
    """
    AWS Lambda entry point.
    
    event: API Gateway event or SNS message
    context: Lambda execution context
    """
    
    try:
        # Parse input
        if 'body' in event:  # API Gateway
            body = json.loads(event['body'])
            user_input = body.get('input')
        else:  # SNS or direct invocation
            user_input = event.get('input')
        
        # Run harness
        result = harness.run(user_input)
        
        # Return success
        return {
            'statusCode': 200,
            'body': json.dumps({
                'status': 'success',
                'result': result,
                'cost': harness.last_cost,
                'tokens_used': harness.last_tokens
            })
        }
    
    except Exception as e:
        # Return error
        return {
            'statusCode': 500,
            'body': json.dumps({
                'status': 'error',
                'error': str(e)
            })
        }

# For batch processing (SQS)
def sqs_handler(event, context):
    """Process messages from SQS queue."""
    for record in event['Records']:
        body = json.loads(record['body'])
        user_input = body.get('input')
        
        result = harness.run(user_input)
        
        # Could write results to S3, database, or SQS output queue
        print(f"Processed: {result}")
    
    return {'statusCode': 200}

Lambda deployment:

# Package with dependencies
pip install -r requirements.txt -t lambda_package/
cp lambda_handler.py lambda_package/
cd lambda_package && zip -r ../lambda_deployment.zip . && cd ..

# Deploy to Lambda
aws lambda create-function \
  --function-name my-harness \
  --runtime python3.11 \
  --role arn:aws:iam::ACCOUNT:role/lambda-execution-role \
  --handler lambda_handler.lambda_handler \
  --zip-file fileb://lambda_deployment.zip \
  --timeout 60 \
  --memory-size 1024 \
  --environment Variables="{CLAUDE_API_KEY=sk-...,SYSTEM_PROMPT=You are...}"

# Update function code
aws lambda update-function-code \
  --function-name my-harness \
  --zip-file fileb://lambda_deployment.zip

Google Cloud Functions Deployment

# main.py (Google Cloud Functions)
import os

import functions_framework
from harness import AgentHarness

harness = AgentHarness(
    model="claude-3-5-sonnet",
    api_key=os.environ.get("CLAUDE_API_KEY")
)

@functions_framework.http
def process_request(request):
    """HTTP Cloud Function."""
    request_json = request.get_json(silent=True)
    user_input = request_json.get('input')
    
    result = harness.run(user_input)
    
    return {
        'status': 'success',
        'result': result,
        'cost': harness.last_cost
    }

@functions_framework.cloud_event
def process_event(cloud_event):
    """Event Cloud Function (Pub/Sub, Cloud Storage)."""
    import base64
    
    payload = base64.b64decode(cloud_event.data["message"]["data"])
    user_input = payload.decode('utf-8')
    
    result = harness.run(user_input)
    print(f"Processed: {result}")

Deployment:

gcloud functions deploy process_request \
  --runtime python311 \
  --trigger-http \
  --allow-unauthenticated \
  --set-env-vars CLAUDE_API_KEY=sk-...

Azure Functions Deployment

# function_app.py (Azure Functions)
import json
import logging

import azure.functions as func
from harness import AgentHarness

harness = AgentHarness(model="claude-3-5-sonnet")

app = func.FunctionApp()

@app.route(route="process")
def http_trigger(req: func.HttpRequest) -> func.HttpResponse:
    """HTTP triggered function."""
    try:
        req_body = req.get_json()
        user_input = req_body.get('input')
        
        result = harness.run(user_input)
        
        return func.HttpResponse(
            json.dumps({'status': 'success', 'result': result}),
            status_code=200
        )
    except Exception as e:
        return func.HttpResponse(
            json.dumps({'status': 'error', 'error': str(e)}),
            status_code=500
        )

@app.queue_trigger(arg_name="msg", queue_name="harness-queue",
                   connection="AzureWebJobsStorage")
def queue_trigger(msg: func.QueueMessage):
    """Queue triggered function."""
    body = json.loads(msg.get_body().decode())
    user_input = body.get('input')
    
    result = harness.run(user_input)
    logging.info(f"Processed: {result}")

Deployment:

az functionapp create \
  --resource-group mygroup \
  --consumption-plan-location centralus \
  --runtime python \
  --functions-version 4 \
  --name my-harness

func azure functionapp publish my-harness

Cold Start Mitigation

Serverless functions have cold starts (first invocation slower). Strategies:

  1. Keep initialization lightweight

    # Initialize once, reuse across invocations
    harness = AgentHarness()  # Outside handler
    
    def handler(event, context):
        result = harness.run(...)  # Reuses initialized harness
  2. Provision concurrent executions (keep warm)

    # AWS: Reserve concurrency
    aws lambda put-provisioned-concurrency-config \
      --function-name my-harness \
      --provisioned-concurrent-executions 10 \
      --qualifier LIVE
  3. Use scheduled warmups

    # CloudWatch rule: invoke every 5 minutes
    aws events put-rule \
      --name warmup-harness \
      --schedule-expression "rate(5 minutes)"
    
    aws events put-targets \
      --rule warmup-harness \
      --targets "Id"="1","Arn"="arn:aws:lambda:..."
  4. Language choice (Go/Node faster than Python, Python faster than Java)
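For strategy 3, the handler should recognize warm-up invocations so a scheduled ping doesn't run the full agent. A hypothetical short-circuit (the `{"warmup": true}` marker is a convention you define in the EventBridge rule's input, not an AWS field):

```python
# warmup.py -- short-circuit scheduled warm-up invocations before doing
# real work (the "warmup" marker is an assumed convention, not an AWS one).
def lambda_handler(event, context=None):
    if isinstance(event, dict) and event.get("warmup"):
        return {"statusCode": 200, "body": "warm"}
    # ... normal processing path (elided) ...
    return {"statusCode": 200, "body": f"processed {event.get('input')}"}
```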


Part 4: CI/CD Pipelines

Complete CI/CD Workflow

Pipeline stages:

Code Push (main)
  → 1. Lint & Format Check
  → 2. Unit Tests
  → 3. Security Scan
  → 4. Build Docker Image
  → 5. Deploy to Staging
  → 6. Integration Tests
  → 7. Deploy to Production
  → 8. Smoke Tests
  → Complete

GitHub Actions Pipeline

Complete, production-ready workflow:

# .github/workflows/deploy.yml
name: Deploy Harness

on:
  push:
    branches: [main]
    paths:
      - 'src/**'
      - 'Dockerfile'
      - 'requirements.txt'
      - '.github/workflows/deploy.yml'

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  # Job 1: Lint and format checks
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      
      - name: Install lint tools
        run: |
          pip install pylint flake8 black isort
      
      - name: Run Black (formatter)
        run: black --check src/
      
      - name: Run isort (import sorter)
        run: isort --check-only src/
      
      - name: Run Flake8 (linter)
        run: flake8 src/ --max-line-length=120
      
      - name: Run Pylint (strict linter)
        run: pylint src/ --fail-under=8.0

  # Job 2: Unit tests
  test:
    runs-on: ubuntu-latest
    needs: lint
    steps:
      - uses: actions/checkout@v4
      
      - uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest pytest-cov pytest-timeout
      
      - name: Run unit tests
        run: pytest tests/unit/ -v --tb=short --timeout=10
      
      - name: Generate coverage report
        run: pytest tests/unit/ --cov=src --cov-report=xml
      
      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          files: ./coverage.xml
          fail_ci_if_error: true

  # Job 3: Security scan
  security:
    runs-on: ubuntu-latest
    needs: lint
    steps:
      - uses: actions/checkout@v4
      
      - name: Run Bandit (security linter)
        run: |
          pip install bandit
          bandit -r src/ -ll -f json -o bandit-report.json
      
      - name: Check for hardcoded secrets
        run: |
          pip install detect-secrets
          detect-secrets scan --baseline .secretsbaseline
      
      - name: Dependency vulnerability check
        run: |
          pip install safety
          safety check --file requirements.txt

  # Job 4: Build Docker image
  build:
    runs-on: ubuntu-latest
    needs: [lint, test, security]
    permissions:
      contents: read
      packages: write
    outputs:
      image-tag: ${{ steps.meta.outputs.tags }}
      image-digest: ${{ steps.build.outputs.digest }}
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
      
      - name: Log in to Container Registry
        uses: docker/login-action@v2
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      
      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=semver,pattern={{version}}
            type=semver,pattern={{major}}.{{minor}}
            type=ref,event=branch
            type=sha
      
      - name: Build and push Docker image
        id: build
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  # Job 5: Deploy to staging
  deploy-staging:
    runs-on: ubuntu-latest
    needs: build
    environment:
      name: staging
      url: https://harness-staging.example.com
    steps:
      - uses: actions/checkout@v4
      
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
      
      - name: Deploy to ECS/K8s
        run: |
          # Example: ECS task definition update
          aws ecs update-service \
            --cluster harness-staging \
            --service harness \
            --force-new-deployment
      
      - name: Wait for deployment
        run: |
          aws ecs wait services-stable \
            --cluster harness-staging \
            --services harness
      
      - name: Smoke test staging
        run: |
          curl -f https://harness-staging.example.com/health || exit 1
          python tests/smoke/test_staging.py

  # Job 6: Integration tests
  integration-test:
    runs-on: ubuntu-latest
    needs: deploy-staging
    steps:
      - uses: actions/checkout@v4
      
      - uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest
      
      - name: Run integration tests against staging
        run: pytest tests/integration/ -v --tb=short
        env:
          HARNESS_URL: https://harness-staging.example.com

  # Job 7: Deploy to production
  deploy-prod:
    runs-on: ubuntu-latest
    needs: [build, integration-test]
    environment:
      name: production
      url: https://harness.example.com
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v4
      
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
          role-to-assume: arn:aws:iam::ACCOUNT:role/github-actions-deploy
      
      - name: Deploy to production (blue-green)
        run: |
          # Build a new task definition pointing at the freshly built image,
          # stripping the read-only fields that register-task-definition rejects
          TASK_DEF=$(aws ecs describe-task-definition \
            --task-definition harness \
            --query taskDefinition | \
            jq '.containerDefinitions[0].image = "${{ needs.build.outputs.image-tag }}"
                | del(.taskDefinitionArn, .revision, .status, .requiresAttributes,
                      .compatibilities, .registeredAt, .registeredBy)')
          
          TASK_VERSION=$(aws ecs register-task-definition \
            --cli-input-json "$TASK_DEF" \
            --query taskDefinition.taskDefinitionArn \
            --output text)
          
          # Update service (triggers blue-green deployment)
          aws ecs update-service \
            --cluster harness-prod \
            --service harness \
            --task-definition "$TASK_VERSION"
      
      - name: Wait for deployment
        run: |
          aws ecs wait services-stable \
            --cluster harness-prod \
            --services harness
      
      - name: Smoke test production
        run: curl -f https://harness.example.com/health

  # Job 8: Post-deployment validation
  validate:
    runs-on: ubuntu-latest
    needs: deploy-prod
    if: always()
    steps:
      - name: Check deployment status
        if: needs.deploy-prod.result == 'failure'
        run: |
          echo "Production deployment failed! Rolling back..."
          # Rollback to previous version
          aws ecs update-service \
            --cluster harness-prod \
            --service harness \
            --task-definition harness:PREVIOUS_VERSION \
            --force-new-deployment
      
      - name: Send notification
        run: |
          # Send to Slack, email, etc.
          curl -X POST https://hooks.slack.com/... \
            -d '{"text": "Harness deployed: ${{ needs.build.outputs.image-tag }}"}'

Pre-Deployment Validation in CI/CD

Add quality gates before production:

# Before deploy-prod step, add validation
- name: Validate deployment readiness
  run: |
    # Check all required environment variables
    [ -n "${{ secrets.CLAUDE_API_KEY }}" ] || exit 1
    [ -n "${{ secrets.DATABASE_URL }}" ] || exit 1
    
    # Verify staging tests passed
    python tests/validate_staging_metrics.py
    
    # Check cost projection
    python tests/validate_budget.py --max-daily-cost 100
    
    # Verify no hardcoded secrets in code
    # (grep exits 0 on a match, which must fail the step; a plain
    #  `grep ... && exit 1` fails the step when nothing is found instead)
    if grep -r "sk-" src/; then exit 1; fi
    if grep -r "claude_" src/; then exit 1; fi
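The shell-based secret gate can also be written in Python, which is easier to extend with additional patterns (the regexes below are illustrative, not exhaustive):

```python
# check_secrets.py -- fail the build if source files contain likely key
# prefixes (patterns are illustrative; tune them for your providers).
import re
from pathlib import Path

PATTERNS = [re.compile(p) for p in (r"sk-[A-Za-z0-9]{8,}", r"AKIA[0-9A-Z]{16}")]

def scan(root: str) -> list[str]:
    """Return paths of .py files under root that match a secret pattern."""
    hits = []
    for path in Path(root).rglob("*.py"):
        text = path.read_text(errors="ignore")
        if any(pat.search(text) for pat in PATTERNS):
            hits.append(str(path))
    return hits
```

A CI step would call `scan("src")` and exit non-zero if the returned list is non-empty.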

Part 5: Environment Configuration

Configuration Hierarchy

Environments should be identical except for specific overrides:

base/
  ├── Dockerfile (same for all)
  ├── requirements.txt (same for all)
  ├── k8s-deployment.yaml (use for all)

development/
  ├── .env (local API keys, debug flags)
  │   LOG_LEVEL=DEBUG
  │   CLAUDE_API_KEY=<test key>

staging/
  ├── .env.staging (pre-prod verification)
  │   LOG_LEVEL=INFO
  │   CLAUDE_API_KEY=<staging key>
  │   BUDGET_DAILY=100

production/
  ├── secrets/ (managed by cloud provider)
  │   LOG_LEVEL=WARNING
  │   CLAUDE_API_KEY=<from AWS Secrets Manager>
  │   BUDGET_DAILY=500
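The per-environment `.env` files above are usually loaded with python-dotenv; a stdlib-only sketch of the same idea (KEY=VALUE lines, `#` comments, existing environment wins):

```python
# load_env.py -- minimal .env loader (stdlib only; python-dotenv is the usual
# production choice). Existing environment variables take precedence.
import os

def load_env(path: str) -> None:
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```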

Configuration Management

Use environment variables (twelve-factor app):

# config.py
import os
from dataclasses import dataclass
from typing import Literal

Environment = Literal["development", "staging", "production"]

@dataclass
class Config:
    # Environment identification
    environment: Environment = os.environ.get("ENVIRONMENT", "development")
    
    # API Keys (from secrets manager, never hardcoded)
    claude_api_key: str = os.environ.get("CLAUDE_API_KEY", "")
    openai_api_key: str = os.environ.get("OPENAI_API_KEY", "")
    
    # Database
    database_url: str = os.environ.get(
        "DATABASE_URL",
        "postgresql://localhost/harness"
    )
    database_pool_size: int = int(os.environ.get("DATABASE_POOL_SIZE", "10"))
    
    # Logging
    log_level: str = os.environ.get("LOG_LEVEL", "INFO")
    log_json: bool = os.environ.get("LOG_JSON", "true").lower() == "true"
    
    # Budget and cost control
    budget_daily_usd: float = float(os.environ.get("BUDGET_DAILY", "100"))
    budget_monthly_usd: float = float(os.environ.get("BUDGET_MONTHLY", "3000"))
    
    # Scaling
    min_replicas: int = int(os.environ.get("MIN_REPLICAS", "2"))
    max_replicas: int = int(os.environ.get("MAX_REPLICAS", "10"))
    
    # Feature flags
    enable_streaming: bool = os.environ.get("ENABLE_STREAMING", "true").lower() == "true"
    enable_caching: bool = os.environ.get("ENABLE_CACHING", "true").lower() == "true"
    
    @property
    def is_production(self) -> bool:
        return self.environment == "production"
    
    @property
    def is_staging(self) -> bool:
        return self.environment == "staging"
    
    def validate(self):
        """Ensure all required settings are present."""
        if not self.claude_api_key and not self.openai_api_key:
            raise ValueError("At least one API key must be set")
        
        if not self.database_url:
            raise ValueError("DATABASE_URL must be set")
        
        if self.budget_daily_usd <= 0:
            raise ValueError("BUDGET_DAILY must be > 0")

# Usage in harness
config = Config()
config.validate()

harness = AgentHarness(
    model="claude-3-5-sonnet",
    api_key=config.claude_api_key,
    log_level=config.log_level,
    budget_daily=config.budget_daily_usd
)

Secret Management in CI/CD

AWS Secrets Manager:

# Store secret
aws secretsmanager create-secret \
  --name harness/claude-api-key \
  --secret-string "sk-..."

# Retrieve in CI/CD
aws secretsmanager get-secret-value \
  --secret-id harness/claude-api-key \
  --query SecretString \
  --output text
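The same retrieval can be done from application code at startup rather than in CI. A minimal sketch, assuming the secret name used above; `load_claude_api_key` and `extract_secret` are illustrative helpers, and the boto3 call requires `secretsmanager:GetSecretValue` IAM permission:

```python
def extract_secret(resp: dict) -> str:
    """Pull the plaintext secret out of a GetSecretValue response."""
    return resp["SecretString"]

def load_claude_api_key(secret_id: str = "harness/claude-api-key") -> str:
    """Fetch the key once at process startup, not on every request."""
    import boto3  # deferred import so offline tooling can load this module
    client = boto3.client("secretsmanager")
    return extract_secret(client.get_secret_value(SecretId=secret_id))
```

Fetching once at startup avoids a Secrets Manager round trip per request; rotate by restarting or re-fetching on a timer.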

GitHub Actions Secrets:

# Use in workflow
- name: Deploy
  env:
    CLAUDE_API_KEY: ${{ secrets.CLAUDE_API_KEY }}
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
  run: ./deploy.sh

Never commit .env files:

# .gitignore
.env
.env.local
.env.*.local
*.p8
*.p12
*.pem
*.key
credentials.json

Part 6: Scaling Strategies

Horizontal Scaling (Multiple Instances)

Load balancing approaches:

| Approach | Setup | Pros | Cons |
|---|---|---|---|
| Round-robin | Load balancer (Nginx, K8s) | Simple | Uneven load if instance capacity varies |
| Least connections | Stateful load balancer | Balanced | Requires state tracking |
| Sticky sessions | Hash client IP | Preserves state | One slow client can pin a single instance |
| Queue-based | SQS, Celery, Kafka | Decoupled | Extra infrastructure |

Round-robin (simplest):

# Nginx as load balancer
upstream harness_backend {
    server harness-1.internal:8000;
    server harness-2.internal:8000;
    server harness-3.internal:8000;
}

server {
    listen 80;
    location / {
        proxy_pass http://harness_backend;
    }
}

Queue-based scaling (best for variable load):

# Producer: Add to queue
import json
import boto3

sqs = boto3.client('sqs')
QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/ID/harness-queue'

def queue_harness_job(user_input):
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({'input': user_input})
    )
    return {'status': 'queued'}

# Consumer: Process from queue
def harness_worker():
    while True:
        messages = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10)
        
        for msg in messages.get('Messages', []):
            body = json.loads(msg['Body'])
            result = harness.run(body['input'])
            
            # Delete only after successful processing
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg['ReceiptHandle'])

# Scale based on queue depth
# CloudWatch alarm: if queue > 100, add 5 more worker instances
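The queue-depth scaling rule in the comment above can be expressed as a pure decision function. Names and thresholds below are illustrative, not prescriptive:

```python
import math

def desired_workers(queue_depth: int,
                    jobs_per_worker: int = 20,
                    min_workers: int = 2,
                    max_workers: int = 10) -> int:
    """Target worker count from queue depth, clamped to [min, max].

    Mirrors the CloudWatch-alarm idea: a deeper queue means more workers.
    """
    needed = math.ceil(queue_depth / jobs_per_worker)
    return max(min_workers, min(max_workers, needed))

print(desired_workers(0))    # 2  (never below the floor)
print(desired_workers(100))  # 5
print(desired_workers(500))  # 10 (capped at the ceiling)
```

Keeping this pure makes the scaling policy trivially unit-testable before wiring it to CloudWatch or a Kubernetes HPA.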

Cost Optimization in Scaling

1. Reserved instances (commit to uptime, get discount)

On-demand:     $0.10/hour
Reserved (1yr): $0.06/hour (40% discount)
Reserved (3yr): $0.04/hour (60% discount)
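The discount math above works out to the following annual figures (a quick sketch using the example rates, assuming 24/7 uptime):

```python
HOURS_PER_YEAR = 24 * 365  # 8760

def annual_cost(hourly_rate: float, instances: int = 1) -> float:
    """Annual cost for always-on instances at a given hourly rate."""
    return hourly_rate * HOURS_PER_YEAR * instances

on_demand = annual_cost(0.10)     # $876 per instance-year
reserved_1yr = annual_cost(0.06)
reserved_3yr = annual_cost(0.04)

savings_pct = 100 * (1 - reserved_1yr / on_demand)
print(f"1-yr reserved saves {savings_pct:.0f}%")  # 1-yr reserved saves 40%
```

Reserved pricing only pays off for baseline capacity that genuinely runs around the clock; burst capacity is usually cheaper on-demand or spot.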

2. Spot instances (unused capacity, 70-90% discount)

# Kubernetes: Use spot instances for non-critical pods
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      preference:
        matchExpressions:
        - key: cloud.google.com/gke-preemptible
          operator: In
          values: ["true"]

3. Right-sizing instances (measure actual usage)

# Monitor CPU/memory in production
# If consistently <50% utilized, right-size down
kubectl top pods  # See actual usage
kubectl describe nodes  # See available resources

4. Caching to reduce API calls

# Cache successful results
from functools import lru_cache

@lru_cache(maxsize=1000)
def get_embedding(text):
    # Calls the embedding API only on a cache miss.
    # Note: lru_cache is per-process; for a cache shared across replicas, use Redis.
    return model.embed(text)

Part 7: Monitoring & Health During Deployment

Deployment Health Checks

Pre-deployment:

# test_deployment_readiness.py
import os

import requests
from anthropic import Anthropic

def test_health_endpoint():
    """Verify health endpoint responds."""
    response = requests.get("http://localhost:8000/health", timeout=5)
    assert response.status_code == 200

def test_readiness():
    """Verify harness is ready to serve."""
    response = requests.get("http://localhost:8000/ready", timeout=5)
    assert response.status_code == 200
    data = response.json()
    assert data.get('database_connected')
    assert data.get('model_responsive')
    assert data.get('memory_accessible')

def test_database_connectivity():
    """Verify database connection."""
    from sqlalchemy import create_engine, text
    engine = create_engine(os.environ["DATABASE_URL"])
    with engine.connect() as conn:
        result = conn.execute(text("SELECT 1"))
        assert result.scalar() == 1

def test_api_key_validity():
    """Verify API keys work."""
    client = Anthropic(api_key=os.environ["CLAUDE_API_KEY"])
    response = client.messages.create(
        model="claude-3-5-sonnet",
        max_tokens=10,
        messages=[{"role": "user", "content": "Hi"}]
    )
    assert response.stop_reason == "end_turn"

During deployment (continuous monitoring):

# metrics.py
import time
from functools import wraps

from prometheus_client import Counter, Gauge, Histogram

request_count = Counter('harness_requests_total', 'Total requests')
error_count = Counter('harness_errors_total', 'Total errors')
request_latency = Histogram('harness_request_latency_seconds', 'Request latency')
cost_gauge = Gauge('harness_cost_usd', 'Current run cost in USD')

def monitor_request(func):
    """Decorator to monitor request metrics."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        request_count.inc()
        start = time.time()
        
        try:
            result = func(*args, **kwargs)
            return result
        except Exception:
            error_count.inc()
            raise
        finally:
            latency = time.time() - start
            request_latency.observe(latency)
    
    return wrapper

Canary Deployments

Deploy to small percentage of traffic first:

# Istio VirtualService: 90% to stable, 10% to canary
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: harness
spec:
  hosts:
  - harness
  http:
  - match:
    - uri:
        prefix: "/"
    route:
    - destination:
        host: harness
        subset: stable
      weight: 90
    - destination:
        host: harness
        subset: canary
      weight: 10
    timeout: 10s
    retries:
      attempts: 3
      perTryTimeout: 5s

Monitor canary metrics:

# If canary error rate > 2%, rollback automatically
error_rate_canary = errors_canary / requests_canary
if error_rate_canary > 0.02:
    print("Canary error rate too high, rolling back")
    # Trigger rollback

Blue-Green Deployments

Keep two identical environments, switch instantly:

# Blue environment (current production)
version=v1.2.2
kubectl apply -f deployment-blue.yaml

# Green environment (new version)
version=v1.2.3
kubectl apply -f deployment-green.yaml

# Test green environment
curl http://green-harness.internal:8000/health

# Switch traffic to green
kubectl patch service harness -p '{"spec":{"selector":{"deployment":"green"}}}'

# Keep blue ready for instant rollback
# If issues detected, switch back: kubectl patch service harness -p '{"spec":{"selector":{"deployment":"blue"}}}'

Rollback Strategies

Immediate rollback (something obviously wrong):

# Revert to previous image tag
kubectl set image deployment/harness \
  harness=my-registry/my-harness:v1.2.2 \
  --record

# Or revert last deployment
kubectl rollout undo deployment/harness
kubectl rollout status deployment/harness

Gradual rollback (intermittent issues):

# Shift traffic back to stable in steps (canary weight: 80 → 0)
for percentage in 80 60 40 20 0; do
  kubectl patch vs harness --type merge --patch "{
    \"spec\": {
      \"http\": [{
        \"route\": [
          {\"destination\": {\"subset\": \"stable\"}, \"weight\": $((100 - percentage))},
          {\"destination\": {\"subset\": \"canary\"}, \"weight\": $percentage}
        ]
      }]
    }
  }"
  
  # Monitor metrics between steps (pseudocode: query your metrics backend here)
  sleep 60
  # If the error rate has recovered at this split, you can stop shifting early
  # and hold the current weights while you investigate.
done

Part 8: Database & Persistence

Session Persistence in Distributed Deployments

When running multiple harness instances, sessions need shared storage:

# distributed_session.py
import json
import os
from datetime import datetime

from sqlalchemy import create_engine, Column, Integer, String, DateTime, Text
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class SessionRecord(Base):
    """Persistent session storage."""
    __tablename__ = 'harness_sessions'
    
    session_id = Column(String(36), primary_key=True)
    agent_id = Column(String(100))
    user_id = Column(String(100))
    created_at = Column(DateTime, default=datetime.utcnow)
    last_updated = Column(DateTime, onupdate=datetime.utcnow)
    
    # Serialized session state
    memory_json = Column(Text)
    context_window = Column(Text)
    feature_progress = Column(Text)
    iteration_count = Column(Integer, default=0)

# Usage
engine = create_engine(os.environ["DATABASE_URL"])
SessionLocal = sessionmaker(bind=engine)

class DistributedHarness(AgentHarness):
    def __init__(self, session_id, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.session_id = session_id
        self.db_session = SessionLocal()
        self.load_session()
    
    def load_session(self):
        """Load session from database."""
        record = self.db_session.query(SessionRecord).filter_by(
            session_id=self.session_id
        ).first()
        
        if record:
            self.memory = json.loads(record.memory_json)
            self.context = json.loads(record.context_window)
            self.iteration_count = record.iteration_count
    
    def save_session(self):
        """Persist session to database."""
        record = self.db_session.query(SessionRecord).filter_by(
            session_id=self.session_id
        ).first() or SessionRecord(session_id=self.session_id)
        
        record.memory_json = json.dumps(self.memory)
        record.context_window = json.dumps(self.context)
        record.iteration_count = self.iteration_count
        
        self.db_session.add(record)
        self.db_session.commit()

# Run with explicit persistence
harness = DistributedHarness(session_id="sess-abc123")
result = harness.run("Do something")
harness.save_session()  # Persist updated state so any replica can resume this session

State Database Choices

| Option | Best For | Pros | Cons |
|---|---|---|---|
| SQLite | Single-process harnesses | Minimal setup, embedded | Not multi-process safe |
| PostgreSQL | Distributed harnesses | Robust, scalable, ACID | More infrastructure |
| Redis | Fast, in-memory sessions | Sub-millisecond speed | Data loss on restart |
| DynamoDB | AWS-only deployment | Managed, serverless | Vendor lock-in |
| MongoDB | Document-style state | Flexible schema, horizontal scaling | Consistency challenges |

Recommendation: PostgreSQL for production (most reliable, widely supported).
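For the Redis row above, the session store can be sketched with an injected client so the same code works against a real `redis.Redis` or an in-memory stand-in. The class names and key prefix below are illustrative:

```python
import json
from typing import Optional

class SessionStore:
    """Minimal key-value session store in the spirit of the Redis option.

    `client` is anything with get/set (a real redis.Redis, or a dict-backed
    fake for tests). State is JSON-serialized, as in the PostgreSQL example.
    """
    def __init__(self, client, prefix: str = "harness:session:"):
        self.client = client
        self.prefix = prefix

    def save(self, session_id: str, state: dict) -> None:
        self.client.set(self.prefix + session_id, json.dumps(state))

    def load(self, session_id: str) -> Optional[dict]:
        raw = self.client.get(self.prefix + session_id)
        return json.loads(raw) if raw is not None else None

class DictClient:
    """In-memory stand-in for redis.Redis (same get/set shape)."""
    def __init__(self):
        self.data = {}
    def set(self, key, value): self.data[key] = value
    def get(self, key): return self.data.get(key)

store = SessionStore(DictClient())
store.save("sess-1", {"iteration_count": 3})
print(store.load("sess-1"))  # {'iteration_count': 3}
```

Injecting the client keeps the store testable without a Redis instance; in production, pass `redis.Redis(...)` and consider a TTL on keys.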

Backup & Recovery

# backup.py
import os
import subprocess
from datetime import datetime

import boto3

def backup_database():
    """Backup PostgreSQL to S3."""
    timestamp = datetime.utcnow().isoformat()
    backup_file = f"/tmp/harness-backup-{timestamp}.sql"
    
    # Dump database to a local file (with-block closes the handle even on error)
    with open(backup_file, 'w') as f:
        subprocess.run([
            "pg_dump",
            "-U", os.environ["DB_USER"],
            "-h", os.environ["DB_HOST"],
            os.environ["DB_NAME"]
        ], stdout=f, check=True)
    
    # Upload to S3
    s3 = boto3.client('s3')
    s3.upload_file(
        backup_file,
        'harness-backups',
        f"backups/{timestamp}.sql"
    )
    
    print(f"Backup complete: {timestamp}")

# Run daily via cron or Lambda
# 0 2 * * * python backup.py
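A backup is only useful if restoration is tested. A minimal restore counterpart, sketched as a pure function that builds the argv lists (bucket, database name, and local path are illustrative and should match your backup configuration):

```python
def restore_commands(timestamp: str,
                     bucket: str = "harness-backups",
                     db_name: str = "harness") -> list[list[str]]:
    """Commands to restore a backup written by backup_database() above.

    Returned as argv lists suitable for subprocess.run(cmd, check=True).
    """
    local = f"/tmp/restore-{timestamp}.sql"
    return [
        # Download the dump from S3, then replay it into the target database
        ["aws", "s3", "cp", f"s3://{bucket}/backups/{timestamp}.sql", local],
        ["psql", "-d", db_name, "-f", local],
    ]

cmds = restore_commands("2026-04-01T02:00:00")
```

Building commands as data makes the restore path inspectable and unit-testable; run them with `subprocess.run(cmd, check=True)` only against a scratch database when rehearsing recovery.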

Part 9: Pre-Deployment Checklist

Complete this checklist before every production deployment:

Infrastructure & Operations

  • Kubernetes/Container manifests created and validated
  • Resource limits set (CPU, memory)
  • Health checks configured (liveness, readiness, startup probes)
  • Persistence configured (database, cache, volumes)
  • Load balancer configured with health checks
  • Autoscaling policies defined (HPA or cloud-native)
  • Secrets manager integration working (AWS Secrets, GCP Secret Manager, etc.)

Environment Configuration

  • All environment variables set correctly for production
  • API keys in secrets manager (not code)
  • Database connection tested
  • Logging configured and working
  • Monitoring setup (Prometheus, Datadog, CloudWatch)
  • Alerts configured (budget, latency, errors)

Security Review

  • No hardcoded secrets in code or Docker image
  • Input validation implemented (sanitizing user input)
  • Output validation implemented (preventing data leaks)
  • Rate limiting configured
  • CORS configured correctly (if API)
  • Audit logging enabled
  • Security scanning passed in CI/CD (Bandit, Trivy)
  • Dependencies have no known CVEs (safety check)

Testing & Quality

  • Unit tests pass (100% passing)
  • Integration tests pass in staging
  • Load tests pass (expected QPS)
  • Quality baseline established and no regression
  • Cost projection within budget
  • Error handling tested (graceful degradation)

Monitoring & Observability

  • Structured logging configured (JSON)
  • Core metrics exposed:
    • Request latency (p50, p95, p99)
    • Request throughput
    • Error rate by type
    • Cost per request
    • Loop iterations (for agents)
  • Dashboard created and accessible
  • Alert thresholds set and tested
  • Budget tracking enabled with hard limits

Deployment Plan

  • Deployment strategy documented (canary, blue-green, rolling)
  • Rollback plan documented and tested
  • Availability target defined (e.g., ≥99.9%) and expected downtime calculated
  • Gradual rollout plan (5% → 25% → 100%)
  • Monitoring interval defined (how often check metrics?)

Post-Deployment

  • Smoke tests ready to run immediately
  • On-call rotation established
  • Incident playbook ready (what to do if something breaks)
  • Rollback command documented and practiced

Backup & Recovery

  • Database backups scheduled
  • Backup restoration tested
  • Data retention policy defined
  • Disaster recovery plan documented

Cost & Capacity

  • Cost projection validated (daily, monthly)
  • Budget alerts configured
  • Scaling limits set (max replicas, max instances)
  • Cost per request calculated and within budget

Related Documents

  • Observability & Monitoring: See 09_operations_and_observability.md for detailed logging, metrics, cost tracking
  • Security & Safety: See 10_security_and_safety.md for input validation, secret management, compliance
  • Testing & QA: See 11_testing_and_qa.md for pre-deployment validation, quality baselines, regression detection
  • Architecture: See 06_harness_architecture.md for complete system design patterns

Checklist Summary

| Phase | Key Deliverables |
|---|---|
| Development | Dockerfile, docker-compose.yml, local testing |
| CI/CD | GitHub Actions workflow (lint → test → build → deploy) |
| Staging | K8s manifests, ConfigMap/Secrets, integration tests |
| Production | Deployment strategy, monitoring, alerts, rollback plan |
| Operations | Health checks, metrics, logging, incident playbook |

Part 10: Rollback Strategy

Why Rollback Planning Is Non-Negotiable

A broken harness deployment can burn through API budget, return hallucinated results, or loop indefinitely — all silently. Every deployment must have a tested rollback path before going live.
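The failure modes above (runaway cost, infinite loops) can also be capped inside the harness itself, independent of deployment tooling. A minimal sketch with illustrative limits; call `check()` once per agent iteration and abort the run when it returns False:

```python
class RunGuard:
    """Hard stop for runaway cost and unbounded agent loops."""

    def __init__(self, max_cost_usd: float = 10.0, max_iterations: int = 50):
        self.max_cost_usd = max_cost_usd
        self.max_iterations = max_iterations
        self.cost_usd = 0.0
        self.iterations = 0

    def record(self, step_cost_usd: float) -> None:
        """Account for one completed agent step."""
        self.cost_usd += step_cost_usd
        self.iterations += 1

    def check(self) -> bool:
        """True while the run is still within budget and iteration limits."""
        return (self.cost_usd <= self.max_cost_usd
                and self.iterations <= self.max_iterations)

guard = RunGuard(max_cost_usd=1.0, max_iterations=3)
for _ in range(5):
    if not guard.check():
        break  # abort the run instead of burning more budget
    guard.record(0.30)

print(guard.iterations)  # 4
```

Even with a tested rollback path, an in-process guard limits the damage a bad deploy can do in the minutes before monitoring triggers the rollback.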

Blue-Green Deployment for Harnesses

Maintain two identical environments. Only one receives live traffic at a time:

                   ┌───────────────┐
    Traffic ──────►│ Load Balancer │
                   └───────┬───────┘
                           │  (only one side is live)
              ┌────────────┴────────────┐
              │                         │
         ┌────▼────┐               ┌────▼────┐
         │  BLUE   │               │  GREEN  │
         │ v1.2.2  │               │ v1.2.3  │
         │ (live)  │               │ (idle)  │
         └─────────┘               └─────────┘

Deployment flow:

# 1. Deploy new version to the idle environment (green)
kubectl apply -f deployment-green.yaml
kubectl rollout status deployment/harness-green --timeout=120s

# 2. Run smoke tests against green (not yet receiving live traffic)
curl -f http://harness-green.internal:8000/health || exit 1
python tests/smoke/test_green.py --url http://harness-green.internal:8000

# 3. Switch traffic from blue to green
kubectl patch service harness \
  -p '{"spec":{"selector":{"deployment":"green"}}}'

# 4. Monitor for 10 minutes (check error rate, latency, cost)
echo "Monitoring green deployment for 10 minutes..."
# Check metrics dashboard or query Prometheus

# 5. If problems detected, instant rollback (< 5 seconds)
kubectl patch service harness \
  -p '{"spec":{"selector":{"deployment":"blue"}}}'
echo "Rolled back to blue (v1.2.2)"

Canary Deployment (Gradual Traffic Shift)

Route a small percentage of traffic to the new version, increase only if metrics are healthy:

# Stage 1: 10% to canary
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: harness
spec:
  hosts: [harness]
  http:
  - route:
    - destination:
        host: harness
        subset: stable
      weight: 90
    - destination:
        host: harness
        subset: canary
      weight: 10
EOF

echo "Canary at 10% — monitoring for 15 minutes..."

# Stage 2: Check metrics before proceeding
ERROR_RATE=$(curl -s "http://prometheus:9090/api/v1/query?query=rate(harness_errors_total{subset='canary'}[5m])" \
  | jq '.data.result[0].value[1]' -r)

if [ "$ERROR_RATE" != "null" ] && (( $(echo "$ERROR_RATE > 0.05" | bc -l) )); then
  echo "ERROR: Canary error rate ${ERROR_RATE} exceeds 5% — rolling back"
  kubectl apply -f virtualservice-stable-only.yaml
  exit 1
fi

# Stage 3: 50% to canary
kubectl patch virtualservice harness --type merge \
  -p '{"spec":{"http":[{"route":[{"destination":{"host":"harness","subset":"stable"},"weight":50},{"destination":{"host":"harness","subset":"canary"},"weight":50}]}]}}'

echo "Canary at 50% — monitoring for 15 minutes..."

# Stage 4: 100% to canary (promote)
kubectl patch virtualservice harness --type merge \
  -p '{"spec":{"http":[{"route":[{"destination":{"host":"harness","subset":"canary"},"weight":100}]}]}}'

echo "Canary promoted to 100% — new version is now stable"

Automated Rollback Triggers

Define clear conditions that trigger automatic rollback. Implement as a monitoring script or Prometheus alert rule:

# rollback_monitor.py
import time
import requests

PROMETHEUS_URL = "http://prometheus:9090"
ROLLBACK_TRIGGERS = {
    "error_rate_spike": {
        "query": 'rate(harness_errors_total[5m]) / rate(harness_requests_total[5m])',
        "threshold": 0.05,       # > 5% error rate
        "comparison": "greater",
        "description": "Error rate exceeded 5%"
    },
    "latency_increase": {
        "query": 'histogram_quantile(0.95, rate(harness_latency_step_ms_bucket[5m]))',
        "threshold": 15000,      # > 15 seconds p95
        "comparison": "greater",
        "description": "P95 latency exceeded 15 seconds"
    },
    "cost_overrun": {
        "query": 'increase(harness_cost_total_usd[1h])',
        "threshold": 50.0,       # > $50/hour
        "comparison": "greater",
        "description": "Hourly cost exceeded $50"
    },
}

def check_triggers() -> list[str]:
    """Check all rollback triggers, return list of fired triggers."""
    fired = []
    for name, trigger in ROLLBACK_TRIGGERS.items():
        resp = requests.get(f"{PROMETHEUS_URL}/api/v1/query", params={"query": trigger["query"]})
        result = resp.json()["data"]["result"]
        if not result:
            continue
        value = float(result[0]["value"][1])
        if trigger["comparison"] == "greater" and value > trigger["threshold"]:
            fired.append(f"{name}: {trigger['description']} (value={value:.4f})")
    return fired

def execute_rollback(reason: str):
    """Roll back to previous known-good version."""
    print(f"ROLLBACK TRIGGERED: {reason}")

    # Kubernetes: undo last deployment
    import subprocess
    subprocess.run(["kubectl", "rollout", "undo", "deployment/harness"], check=True)
    subprocess.run(["kubectl", "rollout", "status", "deployment/harness", "--timeout=120s"], check=True)

    # Notify the team
    requests.post("https://hooks.slack.com/services/YOUR/WEBHOOK/URL", json={
        "text": f":rotating_light: Harness auto-rollback triggered: {reason}"
    })

# Run as a monitoring loop (or wire into Prometheus Alertmanager)
if __name__ == "__main__":
    while True:
        triggered = check_triggers()
        if triggered:
            execute_rollback("; ".join(triggered))
            break
        time.sleep(30)  # Check every 30 seconds

Docker Rollback Commands (Quick Reference)

# Docker Compose: point docker-compose.yml at the previous image tag, then:
docker-compose pull harness
docker-compose up -d --no-deps harness

# Docker standalone: switch to previous tag
docker stop harness-prod
docker run -d --name harness-prod my-registry/my-harness:v1.2.2

# Kubernetes: undo last rollout
kubectl rollout undo deployment/harness
kubectl rollout status deployment/harness

# Kubernetes: rollback to specific revision
kubectl rollout history deployment/harness          # List revisions
kubectl rollout undo deployment/harness --to-revision=3

# ECS: revert to previous task definition
aws ecs update-service \
  --cluster harness-prod \
  --service harness \
  --task-definition harness:42   # Previous task definition revision

Part 11: Health Check Endpoints

Why Health Checks Matter for Harnesses

Standard web services only need “is the server running?” checks. Harnesses need deeper validation: Is the model reachable? Are tools functional? Is there budget remaining? Without these, your load balancer routes traffic to a pod that accepts requests but cannot actually process them.

FastAPI Health Check Implementation

# health.py — Copy-paste ready for any FastAPI harness
import json
import time
from datetime import datetime

from fastapi import FastAPI, Response

app = FastAPI()

# Shared state (in production, use a proper state manager)
_last_model_check: float = 0
_model_healthy: bool = False
_startup_time: float = time.time()


@app.get("/health")
async def health():
    """
    Liveness probe: Is the process alive and responsive?
    Used by: Kubernetes livenessProbe, Docker HEALTHCHECK
    Should be fast (< 100ms) and never call external services.
    """
    return {
        "status": "healthy",
        "uptime_seconds": int(time.time() - _startup_time),
        "timestamp": datetime.utcnow().isoformat() + "Z"
    }


@app.get("/ready")
async def ready():
    """
    Readiness probe: Can this instance serve traffic right now?
    Used by: Kubernetes readinessProbe
    Checks all dependencies. If any fail, return 503 to stop traffic routing.
    """
    checks = {}

    # Check 1: Model API reachable (cached for 60s to avoid hammering)
    global _last_model_check, _model_healthy
    if time.time() - _last_model_check > 60:
        try:
            from anthropic import Anthropic
            client = Anthropic()
            resp = client.messages.create(
                model="claude-sonnet-4",
                max_tokens=5,
                messages=[{"role": "user", "content": "ping"}]
            )
            _model_healthy = resp.stop_reason is not None
        except Exception:
            _model_healthy = False
        _last_model_check = time.time()
    checks["model_responsive"] = _model_healthy

    # Check 2: Database connection
    try:
        from sqlalchemy import create_engine, text
        import os
        engine = create_engine(os.environ.get("DATABASE_URL", "sqlite:///test.db"))
        with engine.connect() as conn:
            conn.execute(text("SELECT 1"))
        checks["database_connected"] = True
    except Exception:
        checks["database_connected"] = False

    # Check 3: Budget remaining
    try:
        from harness.cost import get_daily_spend, get_daily_budget
        remaining = get_daily_budget() - get_daily_spend()
        checks["budget_remaining_usd"] = round(remaining, 2)
        checks["budget_ok"] = remaining > 0
    except Exception:
        checks["budget_ok"] = False

    # Check 4: Memory/disk accessible
    try:
        import os
        workspace = os.environ.get("WORKSPACE_DIR", "/app/data")
        checks["workspace_accessible"] = os.path.isdir(workspace)
    except Exception:
        checks["workspace_accessible"] = False

    # Overall verdict
    all_ok = all([
        checks.get("model_responsive", False),
        checks.get("database_connected", False),
        checks.get("budget_ok", False),
        checks.get("workspace_accessible", False),
    ])

    status_code = 200 if all_ok else 503
    return Response(
        content=json.dumps({
            "status": "ready" if all_ok else "not_ready",
            "checks": checks,
            "timestamp": datetime.utcnow().isoformat() + "Z"
        }),
        status_code=status_code,
        media_type="application/json"
    )


@app.get("/metrics")
async def metrics():
    """
    Prometheus-compatible metrics endpoint.
    Used by: Prometheus scraper (see k8s-deployment.yaml annotations)
    """
    from prometheus_client import generate_latest, CONTENT_TYPE_LATEST
    return Response(
        content=generate_latest(),
        media_type=CONTENT_TYPE_LATEST
    )

Flask Health Check Implementation

# health_flask.py — For Flask-based harnesses
from flask import Flask, jsonify
import time
import json

app = Flask(__name__)
_startup_time = time.time()

@app.route("/health")
def health():
    """Liveness probe."""
    return jsonify({
        "status": "healthy",
        "uptime_seconds": int(time.time() - _startup_time)
    })

@app.route("/ready")
def ready():
    """Readiness probe with dependency checks."""
    checks = {}
    try:
        # Add your dependency checks here (same pattern as FastAPI above)
        checks["model_responsive"] = _check_model()
        checks["database_connected"] = _check_database()
        checks["budget_ok"] = _check_budget()
    except Exception as e:
        return jsonify({"status": "error", "error": str(e)}), 503

    all_ok = all(checks.values())
    return jsonify({
        "status": "ready" if all_ok else "not_ready",
        "checks": checks
    }), 200 if all_ok else 503

Wiring Health Checks to Kubernetes

Reference the deployment manifest in Part 2 of this document. The key fields are:

# Liveness: restart pod if /health fails 3 times
livenessProbe:
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 10
  failureThreshold: 3

# Readiness: stop sending traffic if /ready fails
readinessProbe:
  httpGet:
    path: /ready
    port: 8000
  periodSeconds: 5
  failureThreshold: 3

# Startup: give the pod time to initialize (model loading, etc.)
startupProbe:
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 5
  failureThreshold: 30    # 30 * 5s = 150s max startup time

See Also

  • Doc 09 (Operations & Observability) — Monitoring and debugging production harnesses; deployment enables observability infrastructure
  • Doc 11 (Testing & QA) — Testing must complete before deploying; pre-deployment checklist validates readiness
  • Doc 10 (Security & Safety) — Security controls are deployed as part of container/Kubernetes configuration
  • Doc 20 (Integration Patterns) — Harness deployed as microservice integrates with other systems via these patterns

Changelog

  • April 2026: Created comprehensive deployment guide
    • Docker containerization patterns
    • Kubernetes manifests (Deployment, Service, HPA)
    • Serverless deployment (Lambda, Cloud Functions, Azure)
    • Complete CI/CD pipeline (GitHub Actions)
    • Environment configuration and secrets management
    • Scaling strategies and cost optimization
    • Pre-deployment checklist and monitoring patterns