Deployment Patterns
Docker, Kubernetes, serverless deployment — plus rollback strategies, canary deployments, health check endpoints, and CI/CD pipelines.
When your harness is tested and ready, deployment is the next critical step. This document covers containerization, orchestration, CI/CD pipelines, and scaling strategies for production AI agent harnesses.
In simple terms: How to take your local harness, package it reliably, deploy it to production, scale it, and monitor the deployment.
Part 1: Docker & Containerization
Why Docker for Harnesses
Containers solve several harness deployment problems:
- Environment consistency — “Works on my machine” becomes irrelevant
- Dependency isolation — Python version, library versions, system packages all versioned
- Resource limits — CPU and memory caps prevent runaway agents
- Stateless packaging — Deploy identical images to any environment
- Rollback capability — Keep previous images, revert instantly if new version breaks
Dockerfile Template for Python Harness
This template works for most harnesses (Claude Code-based, LangChain, etc.):
# Multi-stage build: development vs production
# Stage 1: Builder (install dependencies)
FROM python:3.11-slim as builder
WORKDIR /build
# Install system dependencies needed for Python packages
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
git \
&& rm -rf /var/lib/apt/lists/*
# Copy requirements file
COPY requirements.txt .
# Create virtual environment in /build/venv
RUN python -m venv /build/venv
ENV PATH="/build/venv/bin:$PATH"
# Install Python dependencies
RUN pip install --no-cache-dir --upgrade pip && \
pip install --no-cache-dir -r requirements.txt
# Stage 2: Runtime (minimal, production image)
FROM python:3.11-slim
WORKDIR /app
# Install only runtime dependencies (no build tools)
RUN apt-get update && apt-get install -y --no-install-recommends \
curl \
&& rm -rf /var/lib/apt/lists/*
# Copy virtual environment from builder
COPY --from=builder /build/venv /app/venv
# Copy application code
COPY . .
# Set environment variables
ENV PATH="/app/venv/bin:$PATH"
ENV PYTHONUNBUFFERED=1
ENV PYTHONHASHSEED=random
# Health check (responds to Docker health checks)
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
CMD curl -fsS http://localhost:8000/health || exit 1
# Run harness
CMD ["python", "main.py"]
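The HEALTHCHECK above (and the Kubernetes probes later in this document) assume the harness serves /health and /ready endpoints on port 8000. A minimal standard-library sketch of what those endpoints might look like — the readiness checks are stubs you would replace with real probes:

```python
# Minimal /health and /ready endpoints using only the standard library.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def database_connected() -> bool:
    return True  # stub: replace with a real "SELECT 1" probe

def model_responsive() -> bool:
    return True  # stub: replace with a cheap model ping

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            # Liveness: the process is up and serving HTTP
            self._respond(200, {"status": "alive"})
        elif self.path == "/ready":
            # Readiness: dependencies are reachable
            checks = {
                "database_connected": database_connected(),
                "model_responsive": model_responsive(),
            }
            self._respond(200 if all(checks.values()) else 503, checks)
        else:
            self._respond(404, {"error": "not found"})

    def _respond(self, code: int, payload: dict):
        body = json.dumps(payload).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep container logs quiet; wire to real logging if needed
```

Serve it with `HTTPServer(("0.0.0.0", 8000), HealthHandler).serve_forever()` in a thread alongside the harness loop.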
Optimizing Dockerfile for Harnesses
1. Combine related commands into one layer — files deleted in a later layer still occupy space in earlier layers, so cleanup only shrinks the image when it runs in the same RUN as the install
# Bad: Each RUN creates a new layer
RUN apt-get update
RUN apt-get install -y package1
RUN apt-get install -y package2
RUN apt-get clean
# Good: One layer with all commands
RUN apt-get update && apt-get install -y package1 package2 && rm -rf /var/lib/apt/lists/*
2. Layer caching strategy — put frequently changing code last, so the expensive dependency layers stay cached between builds
# COPY requirements first (rarely changes)
COPY requirements.txt .
RUN pip install -r requirements.txt
# COPY code last (changes on every build)
COPY . .
3. Size optimization — Multi-stage builds remove build dependencies
Builder stage: 500 MB (with gcc, git, build tools)
Runtime stage: 150 MB (python + pip packages only)
Deployed image: 150 MB (not 500 MB)
Running Containers Locally
Test your harness in a container before deploying:
# Build the image
docker build -t my-harness:latest .
# Run interactively
docker run -it \
-e OPENAI_API_KEY="sk-..." \
-e CLAUDE_API_KEY="..." \
-v /path/to/local/data:/app/data \
my-harness:latest
# Run with resource limits
docker run \
--cpus="2.0" \
--memory="4g" \
-e OPENAI_API_KEY="sk-..." \
my-harness:latest
# Run in background, see logs
docker run -d --name harness-prod my-harness:latest
docker logs -f harness-prod
# Stop and remove
docker stop harness-prod
docker rm harness-prod
Docker Compose for Local Development
Run harness + dependencies (database, cache, queue) locally:
# docker-compose.yml
version: '3.8'
services:
harness:
build: .
container_name: harness-dev
environment:
- OPENAI_API_KEY=${OPENAI_API_KEY}
- CLAUDE_API_KEY=${CLAUDE_API_KEY}
- DATABASE_URL=postgresql://postgres:password@postgres:5432/harness
- REDIS_URL=redis://redis:6379
- LOG_LEVEL=DEBUG
ports:
- "8000:8000" # API port
volumes:
- .:/app # Live code reload
- ./logs:/app/logs
depends_on:
- postgres
- redis
networks:
- harness-network
postgres:
image: postgres:15-alpine
container_name: harness-postgres
environment:
- POSTGRES_USER=postgres
- POSTGRES_PASSWORD=password
- POSTGRES_DB=harness
ports:
- "5432:5432"
volumes:
- postgres-data:/var/lib/postgresql/data
networks:
- harness-network
redis:
image: redis:7-alpine
container_name: harness-redis
ports:
- "6379:6379"
networks:
- harness-network
# Optional: Prometheus for metrics
prometheus:
image: prom/prometheus:latest
container_name: harness-prometheus
volumes:
- ./prometheus.yml:/etc/prometheus/prometheus.yml
- prometheus-data:/prometheus
ports:
- "9090:9090"
networks:
- harness-network
volumes:
postgres-data:
prometheus-data:
networks:
harness-network:
driver: bridge
Usage:
# Start all services
docker-compose up -d
# View logs
docker-compose logs -f harness
# Stop all services
docker-compose down
# Clean up volumes (careful!)
docker-compose down -v
Image Tagging & Registry Strategy
Tag images for different environments:
# Local development
docker build -t my-harness:dev .
# Staging (before production)
docker build -t my-harness:staging .
docker push my-registry/my-harness:staging
# Production (immutable release)
docker build -t my-harness:v1.2.3 .
docker tag my-harness:v1.2.3 my-registry/my-harness:latest
docker push my-registry/my-harness:v1.2.3
docker push my-registry/my-harness:latest
# Tag for rollback capability
docker tag my-registry/my-harness:v1.2.3 my-registry/my-harness:stable
docker push my-registry/my-harness:stable
Part 2: Kubernetes Deployment (For Scaling)
When to Use Kubernetes vs Simpler Options
| Option | Complexity | Scaling | Use Case |
|---|---|---|---|
| Docker locally | Low | Manual | Development, testing |
| Docker + Systemd | Low-Medium | Manual, scripted | Small production (1-10 containers) |
| Docker Compose | Low-Medium | Limited | Multi-container locally, simple staging |
| Kubernetes (K8s) | High | Automatic | >10 containers, complex orchestration, auto-scale |
Start with simpler options, migrate to K8s when you have:
- Multiple harness instances (horizontal scaling needed)
- Complex dependencies (database, cache, queue, monitoring)
- Need for automatic failover and recovery
- Multi-region or multi-cloud deployment
Kubernetes Architecture for Harnesses
┌─────────────────────────────────────────┐
│      Load Balancer (K8s Service)        │
│    Routes requests to healthy pods      │
└────────────────────┬────────────────────┘
                     │
      ┌──────────────┼──────────────┐
      │              │              │
┌─────▼──────┐ ┌─────▼──────┐ ┌─────▼──────┐
│   Pod 1    │ │   Pod 2    │ │   Pod N    │
│  Harness   │ │  Harness   │ │  Harness   │
│ container  │ │ container  │ │ container  │
│ (CPU/mem   │ │ (CPU/mem   │ │ (CPU/mem   │
│  limits)   │ │  limits)   │ │  limits)   │
└─────┬──────┘ └─────┬──────┘ └─────┬──────┘
      │              │              │
      └──────────────┼──────────────┘
                     │
          ┌──────────▼──────────┐
          │   Horizontal Pod    │
          │  Autoscaler (HPA)   │
          │ Scales on queue len │
          └─────────────────────┘
Kubernetes Deployment Manifest
Basic Deployment for harness (covers most scenarios):
# k8s-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: harness
namespace: default
labels:
app: harness
version: v1
spec:
replicas: 3 # Start with 3, HPA will adjust
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
selector:
matchLabels:
app: harness
template:
metadata:
labels:
app: harness
version: v1
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8000"
prometheus.io/path: "/metrics"
spec:
# Anti-affinity: spread pods across nodes for high availability
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- harness
topologyKey: kubernetes.io/hostname
containers:
- name: harness
image: my-registry/my-harness:v1.2.3
imagePullPolicy: IfNotPresent
# Port the harness listens on
ports:
- name: http
containerPort: 8000
protocol: TCP
- name: metrics
containerPort: 9090
protocol: TCP
# Environment variables
env:
- name: ENVIRONMENT
value: "production"
- name: LOG_LEVEL
value: "INFO"
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: harness-secrets
key: database-url
- name: OPENAI_API_KEY
valueFrom:
secretKeyRef:
name: harness-secrets
key: openai-api-key
- name: CLAUDE_API_KEY
valueFrom:
secretKeyRef:
name: harness-secrets
key: claude-api-key
# Resource requests and limits (critical!)
resources:
requests:
memory: "1Gi" # Minimum guaranteed
cpu: "500m"
limits:
memory: "4Gi" # Never use more than this
cpu: "2"
# Liveness probe (is pod alive?)
livenessProbe:
httpGet:
path: /health
port: 8000
scheme: HTTP
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
# Readiness probe (ready to serve traffic?)
readinessProbe:
httpGet:
path: /ready
port: 8000
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 5
timeoutSeconds: 5
failureThreshold: 3
# Startup probe (graceful startup)
startupProbe:
httpGet:
path: /health
port: 8000
scheme: HTTP
initialDelaySeconds: 0
periodSeconds: 5
failureThreshold: 30
# Graceful shutdown
lifecycle:
preStop:
exec:
command: ["/bin/sh", "-c", "sleep 15"]
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 1000
capabilities:
drop:
- ALL
# Volume mounts
volumeMounts:
- name: tmp
mountPath: /tmp
- name: logs
mountPath: /app/logs
# Volumes
volumes:
- name: tmp
emptyDir: {}
- name: logs
emptyDir: {}
# ImagePullSecrets for private registries
imagePullSecrets:
- name: registry-credentials
# Termination grace period (time for graceful shutdown)
terminationGracePeriodSeconds: 30
Kubernetes Service (Load Balancer)
Routes traffic to harness pods:
# k8s-service.yaml
apiVersion: v1
kind: Service
metadata:
name: harness
namespace: default
labels:
app: harness
spec:
type: LoadBalancer # Or ClusterIP for cluster-internal traffic, NodePort to expose a static port on each node
selector:
app: harness
ports:
- name: http
port: 80
targetPort: 8000
protocol: TCP
- name: metrics
port: 9090
targetPort: 9090
protocol: TCP
# Session affinity for stateful agents
sessionAffinity: ClientIP
sessionAffinityConfig:
clientIP:
timeoutSeconds: 3600
Horizontal Pod Autoscaler (HPA)
Auto-scale based on metrics:
# k8s-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: harness-hpa
namespace: default
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: harness
minReplicas: 2 # Never scale below 2 (high availability)
maxReplicas: 20 # Never scale above 20 (cost control)
metrics:
# CPU-based scaling (30% target utilization)
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 30
# Memory-based scaling (75% target utilization)
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 75
# Custom metric: queue depth (requires monitoring setup)
- type: Pods
pods:
metric:
name: queue_depth
target:
type: AverageValue
averageValue: "100" # 1 pod per 100 items in queue
behavior:
scaleUp:
stabilizationWindowSeconds: 0
policies:
- type: Percent
value: 100 # Double pods when scaling up
periodSeconds: 15
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 50 # Remove 50% when scaling down
periodSeconds: 15
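Under the hood, the HPA computes desired replicas as ceil(currentReplicas × currentMetricValue / targetValue), clamped to the min/max bounds. A sketch of that arithmetic, using the bounds from the manifest above:

```python
import math

def hpa_desired_replicas(current_replicas: int,
                         current_metric: float,
                         target_metric: float,
                         min_replicas: int = 2,
                         max_replicas: int = 20) -> int:
    """Core HPA rule: desired = ceil(current * metric / target),
    clamped to the configured replica bounds."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))

# 3 pods at 60% CPU against the 30% target above -> scale to 6 pods
assert hpa_desired_replicas(3, 60, 30) == 6
# 3 pods each reporting queue_depth 250 against the 100 target -> 8 pods
assert hpa_desired_replicas(3, 250, 100) == 8
```

This is why a low CPU target (30%) scales aggressively: halving utilization headroom roughly doubles the replica count.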
ConfigMap & Secrets
Store configuration and sensitive data:
# k8s-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: harness-config
namespace: default
data:
# Non-sensitive configuration
logging.level: INFO
prometheus.enabled: "true"
queue.max_retries: "3"
health_check.interval_seconds: "30"
---
# k8s-secrets.yaml
apiVersion: v1
kind: Secret
metadata:
name: harness-secrets
namespace: default
type: Opaque
data:
# Base64 encoded (use: echo -n 'value' | base64)
database-url: cG9zdGdyZXM6Ly91c2VyOnBhc3NAaG9zdDpwb3J0L2Ri
openai-api-key: c2stLnNvbWVrZXloZXJl
claude-api-key: Y2wtc29tZWtleWhlcmU=
NEVER commit secrets to git! Use your cloud provider’s secret manager:
- AWS: Secrets Manager or Parameter Store
- Google Cloud: Secret Manager
- Azure: Key Vault
- Kubernetes: External Secrets Operator (to sync from above)
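Note that the data values in a Secret are plain base64, not encryption — anyone with read access can decode them. A small sketch mirroring the `echo -n 'value' | base64` recipe from the comment above:

```python
import base64

def k8s_secret_value(raw: str) -> str:
    """Encode a raw secret the way `echo -n 'value' | base64` does."""
    return base64.b64encode(raw.encode()).decode()

def decode_secret_value(encoded: str) -> str:
    """Reverse the encoding -- which is exactly why Secrets need RBAC."""
    return base64.b64decode(encoded).decode()

# Round-trip check
url = "postgres://user:pass@host:5432/db"
assert decode_secret_value(k8s_secret_value(url)) == url
```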
StatefulSet for Persistent State
If harness needs persistent storage (session files, local memory database):
# k8s-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: harness
namespace: default
spec:
serviceName: harness
replicas: 3
selector:
matchLabels:
app: harness
template:
metadata:
labels:
app: harness
spec:
containers:
- name: harness
image: my-registry/my-harness:v1.2.3
ports:
- containerPort: 8000
volumeMounts:
- name: data
mountPath: /app/data
# Persistent volumes for each replica
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes: [ "ReadWriteOnce" ]
storageClassName: standard
resources:
requests:
storage: 10Gi
Part 3: Serverless Deployment
When Serverless Makes Sense
| Criterion | Serverless | Containers |
|---|---|---|
| Invocation pattern | Bursty, event-driven | Continuous, streaming |
| Request duration | Up to 15 min (Lambda), 60 min (Cloud Functions gen2) | Any duration |
| Startup time sensitivity | Tolerant (1-10s cold starts acceptable) | Need fast, predictable startup |
| Persistent state | Minimal (stateless) | Complex (persistent DB, cache) |
| Cost model | Pay per execution | Pay per hour/month |
| Cost threshold | Cheaper below ~100K requests/month | Cheaper above ~100K requests/month |
Recommendation: Use serverless for:
- Event-driven harnesses (web hook → process → respond)
- Scheduled jobs (cron → batch processing)
- Infrequent inference (< 100 requests/day)
Don’t use serverless for:
- Streaming agents (continuous interactions)
- Long-running tasks (>15 minutes)
- Stateful agents (complex session management)
- High-frequency APIs (>100 requests/sec)
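The ~100K requests/month threshold can be sanity-checked with rough arithmetic. The prices below are illustrative assumptions (Lambda-style per-GB-second and per-request rates vs. the $0.10/hour on-demand figure used later in this document), not current quotes:

```python
def monthly_serverless_cost(requests: int,
                            gb_seconds_per_request: float,
                            price_per_gb_second: float = 0.0000166667,
                            price_per_million_requests: float = 0.20) -> float:
    """Lambda-style pricing model: compute charge + request charge."""
    compute = requests * gb_seconds_per_request * price_per_gb_second
    per_request = requests / 1_000_000 * price_per_million_requests
    return compute + per_request

def monthly_container_cost(instances: int,
                           price_per_hour: float = 0.10,
                           hours_per_month: float = 730) -> float:
    """Always-on container cost at a flat hourly rate."""
    return instances * price_per_hour * hours_per_month

# 100K light requests (1 GB-second each): serverless is far cheaper
assert monthly_serverless_cost(100_000, 1.0) < monthly_container_cost(1)
```

With these assumed rates, the crossover depends heavily on per-request work: at ~44 GB-seconds per request (e.g. a 45-second agent run at 1 GB), break-even lands near 100K requests/month, which is where the rule of thumb comes from.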
AWS Lambda Deployment
Harness entry point for Lambda:
# lambda_handler.py
import json
import os
from harness import AgentHarness
# Initialize harness once (reused across invocations)
harness = AgentHarness(
model="claude-3-5-sonnet",
system_prompt=os.environ.get("SYSTEM_PROMPT"),
api_key=os.environ.get("CLAUDE_API_KEY")
)
def lambda_handler(event, context):
"""
AWS Lambda entry point.
event: API Gateway event or SNS message
context: Lambda execution context
"""
try:
# Parse input
if 'body' in event: # API Gateway
body = json.loads(event['body'])
user_input = body.get('input')
else: # SNS or direct invocation
user_input = event.get('input')
# Run harness
result = harness.run(user_input)
# Return success
return {
'statusCode': 200,
'body': json.dumps({
'status': 'success',
'result': result,
'cost': harness.last_cost,
'tokens_used': harness.last_tokens
})
}
except Exception as e:
# Return error
return {
'statusCode': 500,
'body': json.dumps({
'status': 'error',
'error': str(e)
})
}
# For batch processing (SQS)
def sqs_handler(event, context):
"""Process messages from SQS queue."""
for record in event['Records']:
body = json.loads(record['body'])
user_input = body.get('input')
result = harness.run(user_input)
# Could write results to S3, database, or SQS output queue
print(f"Processed: {result}")
return {'statusCode': 200}
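Both handlers repeat the same input-parsing branch. A hypothetical helper that normalizes the two event shapes (API Gateway proxy events wrap the payload as a JSON string under body; SNS and direct invocations pass the fields at the top level):

```python
import json

def extract_input(event: dict):
    """Normalize user input across Lambda invocation sources.

    API Gateway proxy events carry a JSON string under 'body';
    SNS/direct invocations pass the fields at the top level.
    """
    if "body" in event:  # API Gateway proxy event
        return json.loads(event["body"]).get("input")
    return event.get("input")

# API Gateway-shaped event
assert extract_input({"body": json.dumps({"input": "hello"})}) == "hello"
# Direct invocation
assert extract_input({"input": "hello"}) == "hello"
```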
Lambda deployment:
# Package with dependencies
pip install -r requirements.txt -t lambda_package/
cp lambda_handler.py lambda_package/
cd lambda_package && zip -r ../lambda_deployment.zip . && cd ..
# Deploy to Lambda
aws lambda create-function \
--function-name my-harness \
--runtime python3.11 \
--role arn:aws:iam::ACCOUNT:role/lambda-execution-role \
--handler lambda_handler.lambda_handler \
--zip-file fileb://lambda_deployment.zip \
--timeout 60 \
--memory-size 1024 \
--environment Variables="{CLAUDE_API_KEY=sk-...,SYSTEM_PROMPT=You are...}"
# Update function code
aws lambda update-function-code \
--function-name my-harness \
--zip-file fileb://lambda_deployment.zip
Google Cloud Functions Deployment
# main.py (Google Cloud Functions)
import os

import functions_framework
from harness import AgentHarness

harness = AgentHarness(
    model="claude-3-5-sonnet",
    api_key=os.environ.get("CLAUDE_API_KEY")
)
@functions_framework.http
def process_request(request):
"""HTTP Cloud Function."""
request_json = request.get_json(silent=True)
user_input = request_json.get('input')
result = harness.run(user_input)
return {
'status': 'success',
'result': result,
'cost': harness.last_cost
}
@functions_framework.cloud_event
def process_event(cloud_event):
"""Event Cloud Function (Pub/Sub, Cloud Storage)."""
import base64
payload = base64.b64decode(cloud_event.data["message"]["data"])
user_input = payload.decode('utf-8')
result = harness.run(user_input)
print(f"Processed: {result}")
Deployment:
gcloud functions deploy process_request \
--runtime python311 \
--trigger-http \
--allow-unauthenticated \
--set-env-vars CLAUDE_API_KEY=sk-...
Azure Functions Deployment
# function_app.py (Azure Functions)
import json
import logging

import azure.functions as func
from harness import AgentHarness

harness = AgentHarness(model="claude-3-5-sonnet")
app = func.FunctionApp()
@app.route(route="process")
def http_trigger(req: func.HttpRequest) -> func.HttpResponse:
"""HTTP triggered function."""
try:
req_body = req.get_json()
user_input = req_body.get('input')
result = harness.run(user_input)
return func.HttpResponse(
json.dumps({'status': 'success', 'result': result}),
status_code=200
)
except Exception as e:
return func.HttpResponse(
json.dumps({'status': 'error', 'error': str(e)}),
status_code=500
)
@app.queue_trigger(arg_name="msg", queue_name="harness-queue", connection="AzureWebJobsStorage")
def queue_trigger(msg: func.QueueMessage):
    """Queue triggered function."""
    import json
    import logging
    body = json.loads(msg.get_body())
    user_input = body.get('input')
    result = harness.run(user_input)
    logging.info(f"Processed: {result}")
Deployment:
az functionapp create \
--resource-group mygroup \
--consumption-plan-location centralus \
--runtime python \
--functions-version 4 \
--name my-harness
func azure functionapp publish my-harness
Cold Start Mitigation
Serverless functions have cold starts (first invocation slower). Strategies:
1. Keep initialization lightweight — initialize once, reuse across invocations:

# Initialize once, outside the handler
harness = AgentHarness()

def handler(event, context):
    result = harness.run(...)  # Reuses the already-initialized harness

2. Provisioned concurrency — keep instances warm:

# AWS: Reserve provisioned concurrency
aws lambda put-provisioned-concurrency-config \
  --function-name my-harness \
  --provisioned-concurrent-executions 10 \
  --qualifier LIVE

3. Scheduled warmups:

# CloudWatch rule: invoke every 5 minutes
aws events put-rule \
  --name warmup-harness \
  --schedule-expression "rate(5 minutes)"
aws events put-targets \
  --rule warmup-harness \
  --targets "Id"="1","Arn"="arn:aws:lambda:..."

4. Language choice — Go and Node start faster than Python; Python starts faster than Java.
Part 4: CI/CD Pipelines
Complete CI/CD Workflow
Pipeline stages:
Code Push (main)
↓
1. Lint & Format Check
↓
2. Unit Tests
↓
3. Security Scan
↓
4. Build Docker Image
↓
5. Deploy to Staging
↓
6. Integration Tests
↓
7. Deploy to Production
↓
8. Smoke Tests
↓
Complete
GitHub Actions Pipeline
Complete, production-ready workflow:
# .github/workflows/deploy.yml
name: Deploy Harness
on:
push:
branches: [main]
paths:
- 'src/**'
- 'Dockerfile'
- 'requirements.txt'
- '.github/workflows/deploy.yml'
env:
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}
jobs:
# Job 1: Lint and format checks
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v4
with:
python-version: '3.11'
- name: Install lint tools
run: |
pip install pylint flake8 black isort
- name: Run Black (formatter)
run: black --check src/
- name: Run isort (import sorter)
run: isort --check-only src/
- name: Run Flake8 (linter)
run: flake8 src/ --max-line-length=120
- name: Run Pylint (strict linter)
run: pylint src/ --fail-under=8.0
# Job 2: Unit tests
test:
runs-on: ubuntu-latest
needs: lint
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v4
with:
python-version: '3.11'
- name: Install dependencies
run: |
pip install -r requirements.txt
pip install pytest pytest-cov pytest-timeout
- name: Run unit tests
run: pytest tests/unit/ -v --tb=short --timeout=10
- name: Generate coverage report
run: pytest tests/unit/ --cov=src --cov-report=xml
- name: Upload coverage
uses: codecov/codecov-action@v3
with:
files: ./coverage.xml
fail_ci_if_error: true
# Job 3: Security scan
security:
runs-on: ubuntu-latest
needs: lint
steps:
- uses: actions/checkout@v4
- name: Run Bandit (security linter)
run: |
pip install bandit
bandit -r src/ -ll -f json -o bandit-report.json
- name: Check for hardcoded secrets
run: |
pip install detect-secrets
detect-secrets scan --baseline .secretsbaseline
- name: Dependency vulnerability check
run: |
pip install safety
safety check --file requirements.txt
# Job 4: Build Docker image
build:
runs-on: ubuntu-latest
needs: [lint, test, security]
permissions:
contents: read
packages: write
outputs:
image-tag: ${{ steps.meta.outputs.tags }}
image-digest: ${{ steps.build.outputs.digest }}
steps:
- uses: actions/checkout@v4
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
- name: Log in to Container Registry
uses: docker/login-action@v2
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Extract metadata
id: meta
uses: docker/metadata-action@v4
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
tags: |
type=semver,pattern={{version}}
type=semver,pattern={{major}}.{{minor}}
type=ref,event=branch
type=sha
- name: Build and push Docker image
id: build
uses: docker/build-push-action@v4
with:
context: .
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
cache-from: type=gha
cache-to: type=gha,mode=max
# Job 5: Deploy to staging
deploy-staging:
runs-on: ubuntu-latest
needs: build
environment:
name: staging
url: https://harness-staging.example.com
steps:
- uses: actions/checkout@v4
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: us-east-1
- name: Deploy to ECS/K8s
run: |
# Example: ECS task definition update
aws ecs update-service \
--cluster harness-staging \
--service harness \
--force-new-deployment
- name: Wait for deployment
run: |
aws ecs wait services-stable \
--cluster harness-staging \
--services harness
- name: Smoke test staging
run: |
curl -f https://harness-staging.example.com/health || exit 1
python tests/smoke/test_staging.py
# Job 6: Integration tests
integration-test:
runs-on: ubuntu-latest
needs: deploy-staging
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v4
with:
python-version: '3.11'
- name: Install dependencies
run: |
pip install -r requirements.txt
pip install pytest
- name: Run integration tests against staging
run: pytest tests/integration/ -v --tb=short
env:
HARNESS_URL: https://harness-staging.example.com
# Job 7: Deploy to production
deploy-prod:
runs-on: ubuntu-latest
needs: [build, integration-test]
environment:
name: production
url: https://harness.example.com
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
steps:
- uses: actions/checkout@v4
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: us-east-1
role-to-assume: arn:aws:iam::ACCOUNT:role/github-actions-deploy
- name: Deploy to production (blue-green)
run: |
# Create new task definition pointing at the freshly built image
# (strip read-only fields before re-registering)
TASK_DEF=$(aws ecs describe-task-definition \
  --task-definition harness \
  --query taskDefinition | \
  jq '.containerDefinitions[0].image = "${{ needs.build.outputs.image-tag }}"
      | del(.taskDefinitionArn, .revision, .status, .requiresAttributes,
            .compatibilities, .registeredAt, .registeredBy)')
TASK_VERSION=$(aws ecs register-task-definition \
  --cli-input-json "$TASK_DEF" \
  --query taskDefinition.taskDefinitionArn \
  --output text)
# Update service (triggers blue-green deployment)
aws ecs update-service \
--cluster harness-prod \
--service harness \
--task-definition "$TASK_VERSION"
- name: Wait for deployment
run: |
aws ecs wait services-stable \
--cluster harness-prod \
--services harness
- name: Smoke test production
run: curl -f https://harness.example.com/health
# Job 8: Post-deployment validation
validate:
runs-on: ubuntu-latest
needs: deploy-prod
if: always()
steps:
- name: Check deployment status
if: needs.deploy-prod.result == 'failure'
run: |
echo "Production deployment failed! Rolling back..."
# Rollback to previous version
aws ecs update-service \
--cluster harness-prod \
--service harness \
--task-definition harness:PREVIOUS_VERSION \
--force-new-deployment
- name: Send notification
run: |
# Send to Slack, email, etc.
curl -X POST https://hooks.slack.com/... \
-d '{"text": "Harness deployed: ${{ needs.build.outputs.image-tag }}"}'
Pre-Deployment Validation in CI/CD
Add quality gates before production:
# Before deploy-prod step, add validation
- name: Validate deployment readiness
run: |
# Check all required environment variables
[ -n "${{ secrets.CLAUDE_API_KEY }}" ] || exit 1
[ -n "${{ secrets.DATABASE_URL }}" ] || exit 1
# Verify staging tests passed
python tests/validate_staging_metrics.py
# Check cost projection
python tests/validate_budget.py --max-daily-cost 100
# Verify no hardcoded secrets in code
# (plain `grep ... && exit 1` trips errexit when nothing is found)
if grep -rq "sk-" src/; then echo "Hardcoded API key found"; exit 1; fi
if grep -rq "claude_" src/; then echo "Hardcoded API key found"; exit 1; fi
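The grep approach is easy to fool and easy to trip on false positives. A slightly more precise sketch using regular expressions — the patterns are illustrative and should be extended for the key formats you actually use:

```python
import re

# Illustrative patterns -- extend for the providers you actually use.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),  # OpenAI-style secret keys
    re.compile(r"AKIA[0-9A-Z]{16}"),     # AWS access key IDs
]

def find_secrets(text: str) -> list:
    """Return all substrings that look like hardcoded credentials."""
    hits = []
    for pattern in SECRET_PATTERNS:
        hits.extend(pattern.findall(text))
    return hits

assert find_secrets("api_key = 'sk-" + "a" * 24 + "'")
assert not find_secrets("no credentials in this line")
```

Run it over src/ in CI and fail the build on any hit; unlike raw grep pipelines, the pass/fail logic is explicit.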
Part 5: Environment Configuration
Configuration Hierarchy
Environments should be identical except for specific overrides:
base/
├── Dockerfile (same for all)
├── requirements.txt (same for all)
├── k8s-deployment.yaml (use for all)
│
development/
├── .env (local API keys, debug flags)
│ LOG_LEVEL=DEBUG
│ CLAUDE_API_KEY=<test key>
│
staging/
├── .env.staging (pre-prod verification)
│ LOG_LEVEL=INFO
│ CLAUDE_API_KEY=<staging key>
│ BUDGET_DAILY=100
│
production/
├── secrets/ (managed by cloud provider)
│ LOG_LEVEL=WARNING
│ CLAUDE_API_KEY=<from AWS Secrets Manager>
│ BUDGET_DAILY=500
Configuration Management
Use environment variables (twelve-factor app):
# config.py
import os
from dataclasses import dataclass
from typing import Literal
Environment = Literal["development", "staging", "production"]
@dataclass
class Config:
# Environment identification
environment: Environment = os.environ.get("ENVIRONMENT", "development")
# API Keys (from secrets manager, never hardcoded)
claude_api_key: str = os.environ.get("CLAUDE_API_KEY", "")
openai_api_key: str = os.environ.get("OPENAI_API_KEY", "")
# Database
database_url: str = os.environ.get(
"DATABASE_URL",
"postgresql://localhost/harness"
)
database_pool_size: int = int(os.environ.get("DATABASE_POOL_SIZE", "10"))
# Logging
log_level: str = os.environ.get("LOG_LEVEL", "INFO")
log_json: bool = os.environ.get("LOG_JSON", "true").lower() == "true"
# Budget and cost control
budget_daily_usd: float = float(os.environ.get("BUDGET_DAILY", "100"))
budget_monthly_usd: float = float(os.environ.get("BUDGET_MONTHLY", "3000"))
# Scaling
min_replicas: int = int(os.environ.get("MIN_REPLICAS", "2"))
max_replicas: int = int(os.environ.get("MAX_REPLICAS", "10"))
# Feature flags
enable_streaming: bool = os.environ.get("ENABLE_STREAMING", "true").lower() == "true"
enable_caching: bool = os.environ.get("ENABLE_CACHING", "true").lower() == "true"
@property
def is_production(self) -> bool:
return self.environment == "production"
@property
def is_staging(self) -> bool:
return self.environment == "staging"
def validate(self):
"""Ensure all required settings are present."""
if not self.claude_api_key and not self.openai_api_key:
raise ValueError("At least one API key must be set")
if not self.database_url:
raise ValueError("DATABASE_URL must be set")
if self.budget_daily_usd <= 0:
raise ValueError("BUDGET_DAILY must be > 0")
# Usage in harness
config = Config()
config.validate()
harness = AgentHarness(
model="claude-3-5-sonnet",
api_key=config.claude_api_key,
log_level=config.log_level,
budget_daily=config.budget_daily_usd
)
Secret Management in CI/CD
AWS Secrets Manager:
# Store secret
aws secretsmanager create-secret \
--name harness/claude-api-key \
--secret-string "sk-..."
# Retrieve in CI/CD
aws secretsmanager get-secret-value \
--secret-id harness/claude-api-key \
--query SecretString \
--output text
GitHub Actions Secrets:
# Use in workflow
- name: Deploy
env:
CLAUDE_API_KEY: ${{ secrets.CLAUDE_API_KEY }}
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
run: ./deploy.sh
Never commit .env files!:
# .gitignore
.env
.env.local
.env.*.local
*.p8
*.p12
*.pem
*.key
credentials.json
Part 6: Scaling Strategies
Horizontal Scaling (Multiple Instances)
Load balancing approaches:
| Approach | Setup | Pros | Cons |
|---|---|---|---|
| Round-robin | Load balancer (Nginx, K8s) | Simple | Uneven load if instance capacity varies |
| Least connections | Stateful load balancer | Balanced | Requires state tracking |
| Sticky sessions | Hash client IP | Preserves state | Hot spots when one client dominates traffic |
| Queue-based | SQS, Celery, Kafka | Decoupled | Extra infrastructure |
Round-robin (simplest):
# Nginx as load balancer
upstream harness_backend {
server harness-1.internal:8000;
server harness-2.internal:8000;
server harness-3.internal:8000;
}
server {
listen 80;
location / {
proxy_pass http://harness_backend;
}
}
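What the upstream block does can be shown in a few lines — an in-process sketch of round-robin selection, using the same placeholder backend names as the Nginx config:

```python
from itertools import cycle

class RoundRobinBalancer:
    """In-process illustration of the Nginx upstream rotation."""
    def __init__(self, backends):
        self._backends = cycle(backends)

    def next_backend(self) -> str:
        return next(self._backends)

lb = RoundRobinBalancer(["harness-1.internal:8000",
                         "harness-2.internal:8000",
                         "harness-3.internal:8000"])
# Requests rotate evenly across the three instances, wrapping around
assert [lb.next_backend() for _ in range(4)] == [
    "harness-1.internal:8000",
    "harness-2.internal:8000",
    "harness-3.internal:8000",
    "harness-1.internal:8000",
]
```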
Queue-based scaling (best for variable load):
# Producer: Add to queue
import json

import boto3

sqs = boto3.client('sqs')
def queue_harness_job(user_input):
sqs.send_message(
QueueUrl='https://sqs.us-east-1.amazonaws.com/ID/harness-queue',
MessageBody=json.dumps({'input': user_input})
)
return {'status': 'queued'}
# Consumer: Process from queue
def harness_worker(queue_url):
    while True:
        messages = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=10,
            WaitTimeSeconds=20  # Long polling: avoid busy-looping on an empty queue
        )
        for msg in messages.get('Messages', []):
            body = json.loads(msg['Body'])
            result = harness.run(body['input'])
            # Delete only after successful processing
            sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg['ReceiptHandle'])
# Scale based on queue depth
# CloudWatch alarm: if queue > 100, add 5 more worker instances
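The "1 pod per 100 items" policy from the comment above reduces to a one-line formula. A sketch with illustrative min/max bounds:

```python
import math

def desired_workers(queue_depth: int,
                    items_per_worker: int = 100,
                    min_workers: int = 1,
                    max_workers: int = 20) -> int:
    """One worker per items_per_worker queued messages, clamped to bounds."""
    if queue_depth <= 0:
        return min_workers
    return max(min_workers,
               min(max_workers, math.ceil(queue_depth / items_per_worker)))

assert desired_workers(450) == 5       # ceil(450 / 100)
assert desired_workers(10_000) == 20   # capped at max_workers
```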
Cost Optimization in Scaling
1. Reserved instances (commit to uptime, get discount)
On-demand: $0.10/hour
Reserved (1yr): $0.06/hour (40% discount)
Reserved (3yr): $0.04/hour (60% discount)
2. Spot instances (unused capacity, 70-90% discount)
# Kubernetes: Use spot instances for non-critical pods
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
preference:
matchExpressions:
- key: cloud.google.com/gke-preemptible
operator: In
values: ["true"]
3. Right-sizing instances (measure actual usage)
# Monitor CPU/memory in production
# If consistently <50% utilized, right-size down
kubectl top pods # See actual usage
kubectl describe nodes # See available resources
4. Caching to reduce API calls
# Cache successful results
from functools import lru_cache
@lru_cache(maxsize=1000)
def get_embedding(text):
# Only call API if not cached
return model.embed(text)
Part 7: Monitoring & Health During Deployment
Deployment Health Checks
Pre-deployment:
# test_deployment_readiness.py
import requests
import time
def test_health_endpoint():
"""Verify health endpoint responds."""
response = requests.get("http://localhost:8000/health", timeout=5)
assert response.status_code == 200
def test_readiness():
"""Verify harness is ready to serve."""
response = requests.get("http://localhost:8000/ready", timeout=5)
assert response.status_code == 200
data = response.json()
assert data.get('database_connected')
assert data.get('model_responsive')
assert data.get('memory_accessible')
def test_database_connectivity():
"""Verify database connection."""
from sqlalchemy import create_engine, text
engine = create_engine(os.environ["DATABASE_URL"])
with engine.connect() as conn:
result = conn.execute(text("SELECT 1"))
assert result.scalar() == 1
def test_api_key_validity():
"""Verify API keys work."""
client = Anthropic(api_key=os.environ["CLAUDE_API_KEY"])
response = client.messages.create(
model="claude-3-5-sonnet",
max_tokens=10,
messages=[{"role": "user", "content": "Hi"}]
)
assert response.stop_reason == "end_turn"
During deployment (continuous monitoring):
# metrics.py
import time
from prometheus_client import Counter, Gauge, Histogram
request_count = Counter('harness_requests_total', 'Total requests')
error_count = Counter('harness_errors_total', 'Total errors')
request_latency = Histogram('harness_request_latency_seconds', 'Request latency')
cost_gauge = Gauge('harness_cost_usd', 'Current run cost in USD')
def monitor_request(func):
    """Decorator to monitor request metrics."""
    def wrapper(*args, **kwargs):
        request_count.inc()
        start = time.time()
        try:
            return func(*args, **kwargs)
        except Exception:
            error_count.inc()
            raise
        finally:
            latency = time.time() - start
            request_latency.observe(latency)
    return wrapper
Canary Deployments
Deploy to small percentage of traffic first:
# Istio VirtualService: 90% to stable, 10% to canary
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: harness
spec:
  hosts:
  - harness
  http:
  - match:
    - uri:
        prefix: "/"
    route:
    - destination:
        host: harness
        subset: stable
      weight: 90
    - destination:
        host: harness
        subset: canary
      weight: 10
    timeout: 10s
    retries:
      attempts: 3
      perTryTimeout: 5s
Monitor canary metrics:
# If canary error rate > 2%, rollback automatically
error_rate_canary = errors_canary / requests_canary
if error_rate_canary > 0.02:
print("Canary error rate too high, rolling back")
# Trigger rollback
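The check above can be made concrete as a small, testable decision helper. Fetching the raw `errors` and `requests` counters for the canary subset from Prometheus is assumed to happen elsewhere:

```python
# canary_check.py — concrete version of the canary rollback decision.
# How the canary counters are fetched (Prometheus, Datadog, etc.) is assumed.

def canary_error_rate(errors: int, requests: int) -> float:
    """Error fraction for the canary subset; 0.0 when it has seen no traffic."""
    return errors / requests if requests else 0.0

def should_rollback(errors: int, requests: int,
                    threshold: float = 0.02, min_requests: int = 50) -> bool:
    """Only judge the canary once it has enough traffic to be meaningful."""
    if requests < min_requests:
        return False
    return canary_error_rate(errors, requests) > threshold
```

For example, `should_rollback(5, 120)` is true at the default 2% threshold (5/120 ≈ 4.2%), so the deploy script would shift traffic back to stable.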
Blue-Green Deployments
Keep two identical environments, switch instantly:
# Blue environment (current production)
version=v1.2.2
kubectl apply -f deployment-blue.yaml
# Green environment (new version)
version=v1.2.3
kubectl apply -f deployment-green.yaml
# Test green environment
curl http://green-harness.internal:8000/health
# Switch traffic to green
kubectl patch service harness -p '{"spec":{"selector":{"deployment":"green"}}}'
# Keep blue ready for instant rollback
# If issues detected, switch back: kubectl patch service harness -p '{"spec":{"selector":{"deployment":"blue"}}}'
Rollback Strategies
Immediate rollback (something obviously wrong):
# Revert to previous image tag (--record is deprecated in modern kubectl)
kubectl set image deployment/harness \
harness=my-registry/my-harness:v1.2.2
# Or revert last deployment
kubectl rollout undo deployment/harness
kubectl rollout status deployment/harness
Gradual rollback (intermittent issues):
# Reduce traffic to new version gradually
for percentage in 80 60 40 20 0; do
kubectl patch vs harness --patch "{
\"spec\": {
\"http\": [{
\"route\": [
{\"destination\": {\"subset\": \"stable\"}, \"weight\": $((100-percentage))},
{\"destination\": {\"subset\": \"canary\"}, \"weight\": $percentage}
]
}]
}
}"
# Monitor metrics between changes
sleep 60
# Pseudocode: once the error rate recovers, stop shifting traffic further
# if error_rate <= threshold; then break; fi
done
Part 8: Database & Persistence
Session Persistence in Distributed Deployments
When running multiple harness instances, sessions need shared storage:
# distributed_session.py
import json
import os
from sqlalchemy import create_engine, Column, Integer, String, DateTime, Text
from sqlalchemy.orm import declarative_base, sessionmaker
from datetime import datetime
Base = declarative_base()
class SessionRecord(Base):
"""Persistent session storage."""
__tablename__ = 'harness_sessions'
session_id = Column(String(36), primary_key=True)
agent_id = Column(String(100))
user_id = Column(String(100))
created_at = Column(DateTime, default=datetime.utcnow)
last_updated = Column(DateTime, onupdate=datetime.utcnow)
# Serialized session state
memory_json = Column(Text)
context_window = Column(Text)
feature_progress = Column(Text)
iteration_count = Column(Integer, default=0)
# Usage
engine = create_engine(os.environ["DATABASE_URL"])
SessionLocal = sessionmaker(bind=engine)
class DistributedHarness(AgentHarness):
def __init__(self, session_id, *args, **kwargs):
super().__init__(*args, **kwargs)
self.session_id = session_id
self.db_session = SessionLocal()
self.load_session()
def load_session(self):
"""Load session from database."""
record = self.db_session.query(SessionRecord).filter_by(
session_id=self.session_id
).first()
if record:
self.memory = json.loads(record.memory_json)
self.context = json.loads(record.context_window)
self.iteration_count = record.iteration_count
def save_session(self):
"""Persist session to database."""
record = self.db_session.query(SessionRecord).filter_by(
session_id=self.session_id
).first() or SessionRecord(session_id=self.session_id)
record.memory_json = json.dumps(self.memory)
record.context_window = json.dumps(self.context)
record.iteration_count = self.iteration_count
self.db_session.add(record)
self.db_session.commit()
# Run, then persist after each unit of work
harness = DistributedHarness(session_id="sess-abc123")
result = harness.run("Do something")
harness.save_session()  # Persist state so any replica can resume this session
State Database Choices
| Option | Best For | Pros | Cons |
|---|---|---|---|
| SQLite | Single-process harnesses | Minimal setup, embedded | Not multi-process safe |
| PostgreSQL | Distributed harnesses | Robust, scalable, ACID | More infrastructure |
| Redis | Fast, in-memory sessions | Sub-millisecond speed | Data loss on restart |
| DynamoDB | AWS-only deployment | Managed, serverless | Vendor lock-in |
| MongoDB | Document-style state | Flexible schema, horizontal | Consistency challenges |
Recommendation: PostgreSQL for production (most reliable, widely supported).
Backup & Recovery
# backup.py
import os
import subprocess
import boto3
from datetime import datetime
def backup_database():
"""Backup PostgreSQL to S3."""
timestamp = datetime.utcnow().isoformat()
backup_file = f"/tmp/harness-backup-{timestamp}.sql"
# Dump database (close the file handle and fail loudly on dump errors)
with open(backup_file, "w") as f:
    subprocess.run([
        "pg_dump",
        "-U", os.environ["DB_USER"],
        "-h", os.environ["DB_HOST"],
        os.environ["DB_NAME"]
    ], stdout=f, check=True)
# Upload to S3
s3 = boto3.client('s3')
s3.upload_file(
backup_file,
'harness-backups',
f"backups/{timestamp}.sql"
)
print(f"Backup complete: {timestamp}")
if __name__ == "__main__":
    backup_database()
# Run daily via cron or Lambda:
# 0 2 * * * python backup.py
Part 9: Pre-Deployment Checklist
Complete this checklist before every production deployment:
Infrastructure & Operations
- Kubernetes/Container manifests created and validated
- Resource limits set (CPU, memory)
- Health checks configured (liveness, readiness, startup probes)
- Persistence configured (database, cache, volumes)
- Load balancer configured with health checks
- Autoscaling policies defined (HPA or cloud-native)
- Secrets manager integration working (AWS Secrets, GCP Secret Manager, etc.)
Environment Configuration
- All environment variables set correctly for production
- API keys in secrets manager (not code)
- Database connection tested
- Logging configured and working
- Monitoring setup (Prometheus, Datadog, CloudWatch)
- Alerts configured (budget, latency, errors)
Security Review
- No hardcoded secrets in code or Docker image
- Input validation implemented (sanitizing user input)
- Output validation implemented (preventing data leaks)
- Rate limiting configured
- CORS configured correctly (if API)
- Audit logging enabled
- Security scanning passed in CI/CD (Bandit, Trivy)
- Dependencies have no known CVEs (safety check)
Testing & Quality
- Unit tests pass (100% passing)
- Integration tests pass in staging
- Load tests pass (expected QPS)
- Quality baseline established and no regression
- Cost projection within budget
- Error handling tested (graceful degradation)
Monitoring & Observability
- Structured logging configured (JSON)
- Core metrics exposed:
- Request latency (p50, p95, p99)
- Request throughput
- Error rate by type
- Cost per request
- Loop iterations (for agents)
- Dashboard created and accessible
- Alert thresholds set and tested
- Budget tracking enabled with hard limits
Deployment Plan
- Deployment strategy documented (canary, blue-green, rolling)
- Rollback plan documented and tested
- Availability target defined (e.g., ≥99.9%) and expected downtime budgeted
- Gradual rollout plan (5% → 25% → 100%)
- Monitoring interval defined (how often check metrics?)
Post-Deployment
- Smoke tests ready to run immediately
- On-call rotation established
- Incident playbook ready (what to do if something breaks)
- Rollback command documented and practiced
Backup & Recovery
- Database backups scheduled
- Backup restoration tested
- Data retention policy defined
- Disaster recovery plan documented
Cost & Capacity
- Cost projection validated (daily, monthly)
- Budget alerts configured
- Scaling limits set (max replicas, max instances)
- Cost per request calculated and within budget
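The cost line items above reduce to simple arithmetic. A sketch for validating projections — the per-million-token rates below are illustrative assumptions, not current pricing:

```python
# cost_model.py — illustrative token rates (assumptions, not current pricing)
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}  # USD per million tokens

def cost_per_request(input_tokens: int, output_tokens: int) -> float:
    """Model-API cost of a single request in USD."""
    return (input_tokens * PRICE_PER_MTOK["input"]
            + output_tokens * PRICE_PER_MTOK["output"]) / 1_000_000

def monthly_projection(requests_per_day: int, avg_in: int, avg_out: int) -> float:
    """Rough monthly spend, assuming a 30-day month."""
    return cost_per_request(avg_in, avg_out) * requests_per_day * 30
```

At these assumed rates, a request with 10k input and 1k output tokens costs $0.045; at 1,000 requests/day that projects to $1,350/month.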
Related Documentation
- Observability & Monitoring: See 09_operations_and_observability.md for detailed logging, metrics, cost tracking
- Security & Safety: See 10_security_and_safety.md for input validation, secret management, compliance
- Testing & QA: See 11_testing_and_qa.md for pre-deployment validation, quality baselines, regression detection
- Architecture: See 06_harness_architecture.md for complete system design patterns
Checklist Summary
| Phase | Key Deliverables |
|---|---|
| Development | Dockerfile, docker-compose.yml, local testing |
| CI/CD | GitHub Actions workflow (lint → test → build → deploy) |
| Staging | K8s manifests, ConfigMap/Secrets, integration tests |
| Production | Deployment strategy, monitoring, alerts, rollback plan |
| Operations | Health checks, metrics, logging, incident playbook |
Part 10: Rollback Strategy
Why Rollback Planning Is Non-Negotiable
A broken harness deployment can burn through API budget, return hallucinated results, or loop indefinitely — all silently. Every deployment must have a tested rollback path before going live.
Blue-Green Deployment for Harnesses
Maintain two identical environments. Only one receives live traffic at a time:
              ┌───────────────┐
Traffic ─────►│ Load Balancer │
              └───────┬───────┘
                      │
          ┌───────────┴───────────┐
          │                       │
     ┌────▼────┐             ┌────▼────┐
     │  BLUE   │             │  GREEN  │
     │ v1.2.2  │             │ v1.2.3  │
     │ (live)  │             │ (idle)  │
     └─────────┘             └─────────┘
Deployment flow:
# 1. Deploy new version to the idle environment (green)
kubectl apply -f deployment-green.yaml
kubectl rollout status deployment/harness-green --timeout=120s
# 2. Run smoke tests against green (not yet receiving live traffic)
curl -f http://harness-green.internal:8000/health || exit 1
python tests/smoke/test_green.py --url http://harness-green.internal:8000
# 3. Switch traffic from blue to green
kubectl patch service harness \
-p '{"spec":{"selector":{"deployment":"green"}}}'
# 4. Monitor for 10 minutes (check error rate, latency, cost)
echo "Monitoring green deployment for 10 minutes..."
# Check metrics dashboard or query Prometheus
# 5. If problems detected, instant rollback (< 5 seconds)
kubectl patch service harness \
-p '{"spec":{"selector":{"deployment":"blue"}}}'
echo "Rolled back to blue (v1.2.2)"
Canary Deployment (Gradual Traffic Shift)
Route a small percentage of traffic to the new version, increase only if metrics are healthy:
# Stage 1: 10% to canary
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: harness
spec:
  hosts: [harness]
  http:
  - route:
    - destination:
        host: harness
        subset: stable
      weight: 90
    - destination:
        host: harness
        subset: canary
      weight: 10
EOF
echo "Canary at 10% — monitoring for 15 minutes..."
# Stage 2: Check metrics before proceeding
ERROR_RATE=$(curl -s "http://prometheus:9090/api/v1/query?query=rate(harness_errors_total{subset='canary'}[5m])" \
| jq '.data.result[0].value[1]' -r)
if (( $(echo "$ERROR_RATE > 0.05" | bc -l) )); then
echo "ERROR: Canary error rate ${ERROR_RATE} exceeds 5% — rolling back"
kubectl apply -f virtualservice-stable-only.yaml
exit 1
fi
# Stage 3: 50% to canary
kubectl patch virtualservice harness --type merge \
-p '{"spec":{"http":[{"route":[{"destination":{"host":"harness","subset":"stable"},"weight":50},{"destination":{"host":"harness","subset":"canary"},"weight":50}]}]}}'
echo "Canary at 50% — monitoring for 15 minutes..."
# Stage 4: 100% to canary (promote)
kubectl patch virtualservice harness --type merge \
-p '{"spec":{"http":[{"route":[{"destination":{"host":"harness","subset":"canary"},"weight":100}]}]}}'
echo "Canary promoted to 100% — new version is now stable"
Automated Rollback Triggers
Define clear conditions that trigger automatic rollback. Implement as a monitoring script or Prometheus alert rule:
# rollback_monitor.py
import time
import requests
PROMETHEUS_URL = "http://prometheus:9090"
ROLLBACK_TRIGGERS = {
"error_rate_spike": {
"query": 'rate(harness_errors_total[5m]) / rate(harness_requests_total[5m])',
"threshold": 0.05, # > 5% error rate
"comparison": "greater",
"description": "Error rate exceeded 5%"
},
"latency_increase": {
"query": 'histogram_quantile(0.95, rate(harness_request_latency_seconds_bucket[5m]))',
"threshold": 15.0, # > 15 seconds p95
"comparison": "greater",
"description": "P95 latency exceeded 15 seconds"
},
"cost_overrun": {
"query": 'increase(harness_cost_total_usd[1h])',
"threshold": 50.0, # > $50/hour
"comparison": "greater",
"description": "Hourly cost exceeded $50"
},
}
def check_triggers() -> list[str]:
"""Check all rollback triggers, return list of fired triggers."""
fired = []
for name, trigger in ROLLBACK_TRIGGERS.items():
resp = requests.get(f"{PROMETHEUS_URL}/api/v1/query", params={"query": trigger["query"]})
result = resp.json()["data"]["result"]
if not result:
continue
value = float(result[0]["value"][1])
if trigger["comparison"] == "greater" and value > trigger["threshold"]:
fired.append(f"{name}: {trigger['description']} (value={value:.4f})")
return fired
def execute_rollback(reason: str):
"""Roll back to previous known-good version."""
print(f"ROLLBACK TRIGGERED: {reason}")
# Kubernetes: undo last deployment
import subprocess
subprocess.run(["kubectl", "rollout", "undo", "deployment/harness"], check=True)
subprocess.run(["kubectl", "rollout", "status", "deployment/harness", "--timeout=120s"], check=True)
# Notify the team
requests.post("https://hooks.slack.com/services/YOUR/WEBHOOK/URL", json={
"text": f":rotating_light: Harness auto-rollback triggered: {reason}"
})
# Run as a monitoring loop (or wire into Prometheus Alertmanager)
if __name__ == "__main__":
while True:
triggered = check_triggers()
if triggered:
execute_rollback("; ".join(triggered))
break
time.sleep(30) # Check every 30 seconds
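If you prefer to keep the triggers in Prometheus itself, the same conditions translate to an alerting rule. A sketch — metric names mirror the script above; wiring the alert to an auto-rollback webhook is Alertmanager configuration:

```yaml
# rollback-alerts.yaml — Prometheus rule-file sketch mirroring rollback_monitor.py
groups:
  - name: harness-rollback
    rules:
      - alert: HarnessErrorRateHigh
        expr: rate(harness_errors_total[5m]) / rate(harness_requests_total[5m]) > 0.05
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "Harness error rate above 5% for 5 minutes — consider rollback"
```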
Docker Rollback Commands (Quick Reference)
# Docker Compose: revert to previous image
docker-compose pull # Assumes the compose file now pins the previous tag
docker-compose up -d --no-deps harness
# Docker standalone: switch to previous tag (remove old container to free the name)
docker stop harness-prod && docker rm harness-prod
docker run -d --name harness-prod my-registry/my-harness:v1.2.2
# Kubernetes: undo last rollout
kubectl rollout undo deployment/harness
kubectl rollout status deployment/harness
# Kubernetes: rollback to specific revision
kubectl rollout history deployment/harness # List revisions
kubectl rollout undo deployment/harness --to-revision=3
# ECS: revert to previous task definition
aws ecs update-service \
--cluster harness-prod \
--service harness \
--task-definition harness:42 # Previous task definition revision
Part 11: Health Check Endpoints
Why Health Checks Matter for Harnesses
Standard web services only need “is the server running?” checks. Harnesses need deeper validation: Is the model reachable? Are tools functional? Is there budget remaining? Without these, your load balancer routes traffic to a pod that accepts requests but cannot actually process them.
FastAPI Health Check Implementation
# health.py — Copy-paste ready for any FastAPI harness
from fastapi import FastAPI, Response
from datetime import datetime
import json
import time
app = FastAPI()
# Shared state (in production, use a proper state manager)
_last_model_check: float = 0
_model_healthy: bool = False
_startup_time: float = time.time()
@app.get("/health")
async def health():
"""
Liveness probe: Is the process alive and responsive?
Used by: Kubernetes livenessProbe, Docker HEALTHCHECK
Should be fast (< 100ms) and never call external services.
"""
return {
"status": "healthy",
"uptime_seconds": int(time.time() - _startup_time),
"timestamp": datetime.utcnow().isoformat() + "Z"
}
@app.get("/ready")
async def ready():
"""
Readiness probe: Can this instance serve traffic right now?
Used by: Kubernetes readinessProbe
Checks all dependencies. If any fail, return 503 to stop traffic routing.
"""
checks = {}
# Check 1: Model API reachable (cached for 60s to avoid hammering)
global _last_model_check, _model_healthy
if time.time() - _last_model_check > 60:
try:
from anthropic import Anthropic
client = Anthropic()
resp = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=5,
messages=[{"role": "user", "content": "ping"}]
)
_model_healthy = resp.stop_reason is not None
except Exception:
_model_healthy = False
_last_model_check = time.time()
checks["model_responsive"] = _model_healthy
# Check 2: Database connection
try:
from sqlalchemy import create_engine, text
import os
engine = create_engine(os.environ.get("DATABASE_URL", "sqlite:///test.db"))
with engine.connect() as conn:
conn.execute(text("SELECT 1"))
checks["database_connected"] = True
except Exception:
checks["database_connected"] = False
# Check 3: Budget remaining
try:
from harness.cost import get_daily_spend, get_daily_budget
remaining = get_daily_budget() - get_daily_spend()
checks["budget_remaining_usd"] = round(remaining, 2)
checks["budget_ok"] = remaining > 0
except Exception:
checks["budget_ok"] = False
# Check 4: Memory/disk accessible
try:
import os
workspace = os.environ.get("WORKSPACE_DIR", "/app/data")
checks["workspace_accessible"] = os.path.isdir(workspace)
except Exception:
checks["workspace_accessible"] = False
# Overall verdict
all_ok = all([
checks.get("model_responsive", False),
checks.get("database_connected", False),
checks.get("budget_ok", False),
checks.get("workspace_accessible", False),
])
status_code = 200 if all_ok else 503
return Response(
content=json.dumps({
"status": "ready" if all_ok else "not_ready",
"checks": checks,
"timestamp": datetime.utcnow().isoformat() + "Z"
}),
status_code=status_code,
media_type="application/json"
)
@app.get("/metrics")
async def metrics():
"""
Prometheus-compatible metrics endpoint.
Used by: Prometheus scraper (see k8s-deployment.yaml annotations)
"""
from prometheus_client import generate_latest, CONTENT_TYPE_LATEST
return Response(
content=generate_latest(),
media_type=CONTENT_TYPE_LATEST
)
Flask Health Check Implementation
# health_flask.py — For Flask-based harnesses
from flask import Flask, jsonify
import time
app = Flask(__name__)
_startup_time = time.time()
@app.route("/health")
def health():
"""Liveness probe."""
return jsonify({
"status": "healthy",
"uptime_seconds": int(time.time() - _startup_time)
})
@app.route("/ready")
def ready():
"""Readiness probe with dependency checks."""
checks = {}
try:
# Add your dependency checks here (same pattern as FastAPI above)
checks["model_responsive"] = _check_model()
checks["database_connected"] = _check_database()
checks["budget_ok"] = _check_budget()
except Exception as e:
return jsonify({"status": "error", "error": str(e)}), 503
all_ok = all(checks.values())
return jsonify({
"status": "ready" if all_ok else "not_ready",
"checks": checks
}), 200 if all_ok else 503
Wiring Health Checks to Kubernetes
Reference the deployment manifest in Part 2 of this document. The key fields are:
# Liveness: restart pod if /health fails 3 times
livenessProbe:
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 10
  failureThreshold: 3
# Readiness: stop sending traffic if /ready fails
readinessProbe:
  httpGet:
    path: /ready
    port: 8000
  periodSeconds: 5
  failureThreshold: 3
# Startup: give the pod time to initialize (model loading, etc.)
startupProbe:
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 5
  failureThreshold: 30 # 30 * 5s = 150s max startup time
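When running under plain Docker (no Kubernetes probes), the same `/health` endpoint can back a container-level check — a sketch using the `curl` already installed in the runtime image:

```dockerfile
# Dockerfile: container-level health check against the liveness endpoint
HEALTHCHECK --interval=30s --timeout=5s --start-period=60s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1
```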
See Also
- Doc 09 (Operations & Observability) — Monitoring and debugging production harnesses; deployment enables observability infrastructure
- Doc 11 (Testing & QA) — Testing must complete before deploying; pre-deployment checklist validates readiness
- Doc 10 (Security & Safety) — Security controls are deployed as part of container/Kubernetes configuration
- Doc 20 (Integration Patterns) — Harness deployed as microservice integrates with other systems via these patterns
Changelog
- April 2026: Created comprehensive deployment guide
- Docker containerization patterns
- Kubernetes manifests (Deployment, Service, HPA)
- Serverless deployment (Lambda, Cloud Functions, Azure)
- Complete CI/CD pipeline (GitHub Actions)
- Environment configuration and secrets management
- Scaling strategies and cost optimization
- Pre-deployment checklist and monitoring patterns