Real-World AI Applications
Autonomous vehicles, robotics, industrial IoT, healthcare, recommendation systems — how AI agents are deployed in production across industries.
AI in the lab is clean, controlled, and reproducible. AI in production is messy, complex, and must work when the stakes are high. This chapter explores how AI systems actually get deployed across industries, the patterns that emerge, and the real engineering challenges that matter.
1. Autonomous Vehicles: A Complete System
The self-driving car is the most complex AI application ever deployed at scale. It’s not one model—it’s a tightly integrated system where every component has hard real-time constraints.
System Architecture
Perception — Making sense of the world.
The car sees through multiple sensors:
- Cameras (8-10 per vehicle): RGB data, 1920×1080, 30 fps. Neural networks detect cars, pedestrians, cyclists, lane markings, traffic lights.
- LIDAR (2-5 units): 360° point clouds at 10-20 Hz, up to 200m range. Detection networks output 3D bounding boxes.
- Radar (4-6 units): Works in rain/fog when cameras fail. Good for velocity estimation.
Perception runs multiple networks in parallel:
- Object detection (YOLO, EfficientDet variants): cars, pedestrians, cyclists, signs
- Lane detection (custom CNNs): lane markings, curbs, drivable area
- Traffic light state (classification network): red/yellow/green
- Semantic segmentation: road surface, obstacles, traversable space
These run on dedicated hardware (NVIDIA DRIVE AGX, Tesla's custom FSD chip) under hard real-time constraints. Latency budget: 50-100ms for the full pipeline. Miss the deadline and the car reacts to a pedestrian too late.
Sensor Fusion — Combining multiple modalities.
Raw detections from cameras, LIDAR, radar are independent. Fusion networks learn to combine them:
- LIDAR gives 3D position, camera gives appearance
- Radar gives velocity, confirms moving vs static
- Output: unified object list with position, velocity, classification, confidence
Localization — Where am I?
GPS alone is 5-10 meters of error. Not precise enough.
- HD Maps: Pre-built maps with centimeter-level detail (lane boundaries, curb positions, traffic rules per lane)
- Map Matching: GPS position snapped to nearest road, then refined
- IMU + Odometry: Integrate accelerometer and wheel odometry between GPS fixes
- Particle Filter: Fuse GPS, map, and motion to get 10cm accuracy
Localization must be continuous. Even 1 second of uncertainty is dangerous.
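The fuse step can be sketched as a particle filter in one dimension; the noise scales, particle count, and constant-velocity motion below are illustrative assumptions, not production parameters.

```python
import math
import random

def particle_filter_step(particles, motion, gps, gps_sigma=5.0, motion_sigma=0.1):
    """One predict/update/resample cycle of a 1-D particle filter."""
    # Predict: move every particle by the odometry reading plus motion noise.
    particles = [p + motion + random.gauss(0, motion_sigma) for p in particles]
    # Update: weight each particle by how well it agrees with the GPS fix.
    weights = [math.exp(-0.5 * ((p - gps) / gps_sigma) ** 2) for p in particles]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Resample: draw a new particle set proportional to the weights.
    return random.choices(particles, weights=weights, k=len(particles))

random.seed(0)
particles = [random.gauss(0, 10) for _ in range(1000)]  # initial uncertainty
for t in range(50):                                     # drive 1 m per step
    particles = particle_filter_step(particles, motion=1.0, gps=t + 1.0)
estimate = sum(particles) / len(particles)              # position estimate
```

The posterior concentrates far below the raw 5m GPS error because every step fuses odometry with the fix; production stacks do the same in 3-D with HD-map constraints added as extra measurement terms.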
Prediction — What will they do?
Other agents (vehicles, pedestrians, cyclists) don’t stand still. Prediction networks forecast 3-8 seconds ahead:
- Trajectory Models (LSTM, transformer variants): given past positions and velocities, predict future
- Interaction Models: pedestrian behavior depends on nearby cars
- Uncertainty Quantification: not a single prediction, but a distribution (multimodal—multiple likely futures)
Critical insight: at intersections, vehicles have many possible futures (turn left, right, straight). Prediction networks must capture all of them.
Planning — Safe path given predictions.
Given my position, other agents’ predictions, and traffic rules, compute a safe trajectory:
- Search (hybrid A*, RRT): generate candidate paths
- Cost Function: collision risk, comfort (acceleration), efficiency, traffic rules
- Constraint Satisfaction: can’t accelerate infinitely, must obey lane boundaries
- Output: desired position and velocity for next 100ms
Planning runs at 10 Hz. The car re-plans continuously as new information arrives.
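The generate-then-score loop above can be sketched in a few lines; the cost weights, Manhattan-distance clearance metric, and straight-line candidates are simplifying assumptions.

```python
def plan(candidates, obstacles, w_clear=1.0, w_dev=0.5):
    """Pick the candidate path with the lowest weighted cost."""
    def cost(path):
        # Collision term: grows sharply as the path nears any obstacle.
        clearance = min(abs(px - ox) + abs(py - oy)
                        for px, py in path for ox, oy in obstacles)
        collision = w_clear / (clearance + 0.1)
        # Comfort term: penalize lateral deviation from the lane center (y = 0).
        deviation = w_dev * sum(abs(py) for _, py in path) / len(path)
        return collision + deviation
    return min(candidates, key=cost)

# Three candidates: hold the lane, nudge left, nudge right.
straight = [(float(x), 0.0) for x in range(10)]
left = [(float(x), -1.5) for x in range(10)]
right = [(float(x), 1.5) for x in range(10)]
best = plan([straight, left, right], obstacles=[(5.0, 0.0)])  # obstacle ahead
```

With an obstacle in the lane, a swerve candidate wins despite its comfort penalty; re-run at 10 Hz with fresh predictions, this is the skeleton of receding-horizon planning.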
Control — Execute the plan.
A planning output of “go to (10.5, 20.3) at 5 m/s” must translate to steering angle and throttle:
- Lateral Control (steering): PID controller tracks desired path
- Longitudinal Control (throttle/brake): PID controller tracks desired speed
- Actuator Limits: steering has max angle, acceleration has limits
- Latency Compensation: control signal takes ~50ms to affect wheels, model predicts ahead
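Both control loops are classical PID with output clamping for the actuator limits. A minimal longitudinal controller, with made-up gains and a toy first-order throttle-to-acceleration plant:

```python
class PID:
    """PID controller with output clamping to respect actuator limits."""
    def __init__(self, kp, ki, kd, out_min=-1.0, out_max=1.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0
        self.prev_error = None

    def step(self, setpoint, measured, dt):
        error = setpoint - measured
        self.integral += error * dt
        deriv = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        out = self.kp * error + self.ki * self.integral + self.kd * deriv
        return max(self.out_min, min(self.out_max, out))  # actuator limit

# Track a 5 m/s speed target against an assumed toy vehicle model.
pid, speed = PID(kp=0.8, ki=0.2, kd=0.05), 0.0
for _ in range(200):                        # 20 s of control at 10 Hz
    throttle = pid.step(setpoint=5.0, measured=speed, dt=0.1)
    speed += throttle * 0.5                 # assumed plant dynamics
```

The integral term removes steady-state error; the derivative term damps overshoot, which is one crude form of compensating for actuation lag.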
Real-World Complexity
The lab has clean scenarios. The road doesn’t.
Weather: Rain reduces camera quality and LIDAR range. Snow covers lane markings. Fog blocks everything. Models trained on sunny California don’t work in Seattle.
Shadows and Reflections: A shadow looks like a pothole. Rain on the camera lens creates artifacts. Specular reflections off wet roads confuse perception.
Occlusion: A parked car hides a pedestrian 20 meters ahead. Prediction must infer unseen agents.
Novel Scenarios: A construction worker in a pink tutu is not in the training set. A child chasing a ball into traffic is a tail case. The long tail of edge cases is infinite.
Adversarial Inputs: Can a sticker on a stop sign fool the perception system? (Yes: adversarial examples were demonstrated in the lab by Szegedy et al. 2014, and physical stop-sign attacks by Eykholt et al. 2018. Harder in the real world, but possible.)
Map Errors: HD maps are months old. Construction changed the road. A lane is missing. Planning must handle map uncertainty.
Current State of the Art
Tesla (Full Self-Driving, Autopilot):
- Vision-only: 8 cameras, no LIDAR or radar
- Approach: if you have enough cameras and neural networks, LIDAR is redundant
- Reality: works well on highways, struggles with complex urban scenarios
- Scale: millions of cars collecting data daily, continuous model improvement
- Advantage: lower cost, simpler hardware
Waymo (Waymo Driver):
- Full stack: cameras, LIDAR, radar, HD maps
- Approach: multimodal fusion, sophisticated prediction and planning
- Operations: fully autonomous (no human safety driver) in Phoenix, San Francisco, Los Angeles
- Scale: hundreds of thousands of rides, good operational data
- Advantage: more sensors → more redundancy, higher confidence
Others (Cruise, Aurora, Motional):
- Most use hybrid approaches (cameras + LIDAR)
- Regional focus: Phoenix for Waymo, San Francisco for Cruise
- Challenge: getting to >99.99% reliability for regulatory approval
Challenges
Edge Cases: Training data contains 99.9% highway driving. The 0.1% of interesting edge cases dominate actual failures. Collecting enough examples of rare scenarios takes years.
Long Tail: Once you solve the 80% case, the remaining 20% takes 80% of effort. Moving from 95% to 99% to 99.9% reliability requires exponential data collection and engineering.
Regulatory Approval: Proving safety to regulators requires showing the system is safer than humans. How do you measure the safety of a system deployed on only a few thousand cars? Waymo's answer: constrain the operational design domain (geofenced areas, mapped routes, favorable weather) and accumulate millions of real and simulated miles.
Integration Complexity: Hundreds of neural networks, classical algorithms, and real-time constraints all interacting. One bug in the planning module causes a collision. Testing is hard; simulation helps but doesn’t catch everything.
Hardware Constraints: All this must run on a car’s compute platform, in real time, with <100ms latency, within a tight power budget. An NVIDIA DRIVE AGX platform delivers on the order of a few hundred TOPS, and a single perception model can consume a large share of it. No room for inefficiency.
2. Robotics Ecosystems: From SLAM to Grasping
Robots perceive, plan, and act. Unlike self-driving cars (which move in one plane), robots are 3D agents manipulating complex environments.
Navigation: SLAM
Before a robot can plan, it must know where it is and what the world looks like.
SLAM (Simultaneous Localization and Mapping):
- Robot has no prior map
- Uses cameras and/or LIDAR to observe the environment
- Builds a map incrementally while estimating its own position
- Classical approach: visual odometry (track features between frames) + loop closure detection (recognize when you’ve seen a place before)
- Modern approach: neural networks learn visual features that are robust to viewpoint and lighting changes
Warehouse robots (Amazon Robotics, Fetch, Mobile Industrial Robots):
- Environment: known warehouse with fixed layout
- Approach: use pre-built floor plans, no SLAM needed
- Precision required: 5cm accuracy to dock with shelves
- Method: wheel odometry + occasional global localization (magnetic markers or barcode landmarks)
Legged robots (Boston Dynamics Spot):
- Challenge: rough terrain, uneven ground, dynamic balance
- Approach: IMU + foot force sensors + visual odometry
- Real-time constraint: balance control runs at 1kHz, must react in milliseconds to terrain
- Navigation: SLAM on visual features + terrain classification (avoid obstacles, assess traversability)
Manipulation: Object Detection to Grasping
A robot arm is useless without knowing what to grasp and where.
Object Detection:
- Camera mounted on gripper or body observes the scene
- CNN detects objects: boxes, bottles, books (depends on task)
- 3D localization: use LIDAR or stereo to get 3D position
Grasp Planning:
- Given object position and 3D model, compute grasping poses
- Classical: test 1000 candidate grasps, choose best by physics simulation
- Learning: CNN predicts grasp quality directly from image (faster, but less generalizable)
- Key challenge: friction varies (wet surface vs dry), objects deform (soft materials), gripper shape matters
Placement:
- Once grasped, robot must place object somewhere
- Target: shelf, bin, conveyor belt (depends on task)
- Constraints: don’t hit obstacles, don’t drop on fragile items
- Use path planning (RRT, probabilistic roadmaps) in 6D space (3D position + 3D orientation)
Real-World Challenges:
- Objects are soft or rigid, heavy or light—physics varies
- Suction cups fail on porous materials, parallel grippers fail on cylinders
- Small changes in friction cause grasp failures
- No perfect 3D models in real warehouses; objects are jumbled together
Learning from Demonstration
Robots can learn from human examples (imitation learning):
- Human demonstrates grasping a bottle
- Camera records RGB + depth during demo
- Train neural network: image → robot joint commands
- Deploy: robot executes learned behavior
Challenge: distribution shift. Human moves smoothly, robot is jerky. Human understands physics intuitively, robot doesn’t. Network memorizes the demo but fails on new scenarios.
Reinforcement Learning for Robotics
Trial and error in the real world is expensive (destroy the robot). Use simulation:
- Train in simulation (PyBullet, Mujoco) with randomized physics
- Transfer to real robot via domain randomization (make sim sufficiently random that real world is just another variant)
Example: Google’s large-scale grasping experiments (Levine et al. 2016). More than a dozen robotic arms collect grasping attempts in parallel, each learning from its own experience. Data is aggregated across robots, the model retrained, and redeployed. The cycle runs continuously. Within months, the system could grasp most objects it had never seen.
Challenge: simulation doesn’t match reality perfectly. Systematic differences (friction coefficient) cause sim-to-real gap. Domain randomization helps but isn’t perfect.
Future: General-Purpose Robots
Current robots are task-specific:
- Amazon Robotics arms: grasp boxes only
- Dishwasher robots: load dishes only
- Welding robots: weld only
Future vision: Foundation models for robotics. Similar to how GPT-3 is “general” (can do many NLP tasks), train a single model on data from thousands of robots doing diverse tasks. Adapt via prompt or fine-tuning to new tasks.
Current state: OpenVLA (an open-source vision-language-action model) and RT-1 (Robotics Transformer) are early attempts. Still far from “one model for all robotics.”
3. Industrial IoT & Predictive Maintenance: Sensing Failure Before It Happens
In a manufacturing plant, an unexpected machine failure costs $100k+ per hour (lost production, labor, parts). Predictive maintenance uses ML to schedule maintenance before failure occurs.
Data Pipeline
Machines have sensors: vibration accelerometers, temperature, acoustic emission, current draw.
- Vibration sensor on a bearing: samples at 10kHz, records time series
- Temperature sensor: thermocouples, 1 Hz sample rate
- Current sensor: measures motor current draw, indicates load and efficiency
Data flows:
- Sensors → edge device (industrial computer on factory floor)
- Edge device → local buffer (SQL database)
- Buffer → cloud analytics (batch processing every night)
- Analytics → alerting (schedule maintenance for day X)
Models: LSTM and GRU
Predictive maintenance is a time series forecasting problem:
- Input: 30 days of historical sensor data
- Output: time until failure (0-365 days)
RNN architectures:
- LSTM (Long Short-Term Memory): learns long-range dependencies, good for slow degradation (bearing wear over months)
- GRU (Gated Recurrent Unit): simpler than LSTM, similar performance, less compute
- 1D Convolutions: simpler and faster, competitive with RNN for some tasks
Example model (Keras; assumes windows of 30 daily aggregates from 10 sensors):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

model = Sequential([
    LSTM(128, input_shape=(30, 10)),  # 30 days, 10 sensors
    Dropout(0.2),
    Dense(64, activation='relu'),
    Dense(1)                          # output: days to failure
])
model.compile(optimizer='adam', loss='mse')
Alerting Strategy
A model outputs: “bearing will fail in 48 hours.”
Question: do you schedule maintenance now?
Cost function:
- Schedule too early: wasteful (spend money replacing part that still works)
- Schedule too late: catastrophic (machine fails, expensive downtime)
- Don’t schedule at all: disaster
Most plants use a decision rule:
- If predicted time to failure < 7 days: alert maintenance crew
- Crew schedules replacement within 7 days
- Hedge: run bearing 7 days with increased monitoring
Data Collection and Baseline
Collecting sufficient training data is the hard part.
A healthy bearing runs for 2-3 years. A failed bearing is the last day. To build a model:
- Collect data from 50 machines for 6 months
- Hopefully observe 5-10 failures
- Lots of healthy data, few failure examples
Imbalanced dataset: 99.9% healthy, 0.1% failure. Standard techniques fail (model just predicts “healthy” always).
Solution: oversampling failures, weighted loss functions, or anomaly detection (model learns normal, flags deviations).
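Both fixes are a few lines each. A sketch of inverse-frequency class weights and minority oversampling; the 999:1 split below mirrors the 99.9%/0.1% imbalance described above.

```python
import random
from collections import Counter

def class_weights(labels):
    """Inverse-frequency weights, usable as per-class loss multipliers."""
    counts = Counter(labels)
    n = len(labels)
    return {y: n / (len(counts) * c) for y, c in counts.items()}

def oversample(samples, labels):
    """Resample every class (with replacement) up to the majority-class size."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    return [(random.choice(xs), y) for y, xs in by_class.items()
            for _ in range(target)]

labels = ['healthy'] * 999 + ['failure'] * 1
w = class_weights(labels)     # failure errors weighted ~1000x heavier
balanced = oversample(list(range(1000)), labels)
```

The weight dict plugs directly into most frameworks' weighted-loss hooks; oversampling is the cruder alternative and risks overfitting the few failure examples.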
Real-World Deployment
Sensors fail: a vibration sensor disconnects, temperature readout drifts, current measurement becomes noisy. Model must handle missing data, outliers, and sensor drift.
Fallback Logic:

if model_confidence < threshold:
    alert = rule_based_alerting()
else:
    alert = model_prediction()
Rules: if vibration suddenly spikes 10x normal, alert immediately (don’t wait for model).
Economics
A company with 100 industrial machines:
- Current approach: reactive maintenance (fix when broken) costs $50k/month in downtime
- Predictive maintenance: costs $20k/month (scheduled, no surprises)
- Cost of ML system: $200k initial, $20k/year operations
- Payback period: 4 months
- Once the payback is proven, there is strong motivation to deploy everywhere
4. Smart Power Grids: Real-Time Optimization Under Uncertainty
A power grid must balance supply and demand in real time. Too much demand and the grid frequency drops (blackout risk). Too little and generators are overprovisioned (waste).
AI adds flexibility, cost optimization, and resilience.
Load Forecasting
Predict electricity demand hours/days ahead so generators can ramp up in time.
Factors:
- Time of day: peak at breakfast and evening
- Day of week: weekday vs weekend
- Season: summer (air conditioning) vs winter (heating)
- Weather: temperature, cloud cover, humidity
- Events: sports game, concert, holiday
Model:
- Inputs: past 2 weeks of demand, weather forecast, calendar features
- Output: demand (MW) for next 1, 6, 24 hours ahead
- Architecture: GRU or Transformer, multiple output heads for different horizons
For a large utility (50M customers), a 1% error in load forecast translates to millions of dollars of wasted generation or shortage risk.
Real-world challenge: Renewable penetration introduces variability. Solar output depends on cloud cover (hard to predict). Wind is intermittent. Battery storage adds flexibility but must be charged/discharged strategically.
Anomaly Detection
Grid attacks (physical or cyber) cause unusual patterns: a major line failure drops demand 20% instantly.
Approach:
- Baseline: normal grid behavior patterns (learned from months of data)
- Monitor: real-time power flows, voltage, frequency
- Alert: when measurements deviate from baseline (Z-score > 3σ)
Example: a distribution line is attacked (cut). Current suddenly drops. Voltage becomes unstable. Within milliseconds, anomaly detection flags it. Engineers can reroute power via alternate lines to avoid cascading failure.
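A rolling-baseline detector of this kind fits in a short class. The window size, warm-up length, and 3σ threshold below are illustrative, and the `statistics` module stands in for a streaming analytics engine.

```python
import statistics
from collections import deque

class ZScoreDetector:
    """Flag readings more than k standard deviations from a rolling baseline."""
    def __init__(self, window=100, warmup=10, k=3.0):
        self.history = deque(maxlen=window)
        self.warmup, self.k = warmup, k

    def update(self, value):
        anomalous = False
        if len(self.history) >= self.warmup:
            mean = statistics.mean(self.history)
            std = statistics.pstdev(self.history)
            anomalous = std > 0 and abs(value - mean) > self.k * std
        self.history.append(value)
        return anomalous

det = ZScoreDetector()
flags = [det.update(50.0 + 0.5 * (i % 3)) for i in range(100)]  # normal flow
tripped = det.update(10.0)  # sudden 80% drop on the line
```

In practice each sensor stream gets its own baseline, and seasonal patterns (daily demand cycles) are modeled rather than lumped into the rolling window.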
Fault Localization
When a line fails, operators must identify which line to repair.
Traditional approach: manual investigation, walk the line to find the break (hours of time).
ML approach:
- Measure voltages and currents at multiple points in the grid
- Train model: electrical measurements → which line failed
- Use graph neural networks (grid is naturally a graph) or physics-informed neural networks
Given measurements from 100 sensor points, classify which of 1000 possible faults occurred.
Real-time constraint: decision within seconds, before cascading failures occur.
Optimization: Microgrids
A microgrid is a small grid (neighborhood, university campus) with distributed generation (solar, wind, batteries) that can operate independently or connected to the main grid.
Optimization problem:
- Many solar panels (variable output)
- Many batteries (can store or release)
- Many consumers (variable demand)
- Goal: minimize cost (buy from grid when cheap, sell back when expensive), or maximize renewables (use local solar first)
This is a unit commitment problem (which generators should be on?) and economic dispatch (how much power from each generator?).
Real-time approach:
- Every 5 minutes, forecast next 1 hour of solar and demand
- Optimize: battery charge/discharge schedule, load shedding if needed
- Execute: send setpoints to inverters and controllers
- Repeat: reoptimize as new forecasts arrive
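Real microgrid controllers solve a constrained optimization, but a greedy sketch conveys the shape of the loop. The capacities, forecasts, and store-surplus/cover-deficit rule below are illustrative assumptions.

```python
def dispatch(solar_forecast, demand_forecast, capacity=50.0, charge=25.0):
    """Greedy per-interval dispatch: charge on surplus, discharge on deficit.

    Returns grid imports per interval (power bought from the main grid).
    """
    imports = []
    for solar, demand in zip(solar_forecast, demand_forecast):
        net = demand - solar                        # positive means deficit
        if net < 0:                                 # solar surplus
            charge += min(-net, capacity - charge)  # absorb into the battery
            imports.append(0.0)
        else:                                       # deficit
            released = min(net, charge)             # battery covers what it can
            charge -= released
            imports.append(net - released)          # remainder bought from grid
    return imports

# Noon surplus charges the battery; the evening peak drains it, then imports.
grid = dispatch(solar_forecast=[30, 40, 5, 0], demand_forecast=[20, 20, 25, 60])
```

A real controller would optimize over the whole forecast horizon at once (so the battery saves capacity for the evening peak instead of reacting interval by interval), typically as a linear or mixed-integer program.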
With high renewable penetration, this optimization becomes critical. Too much solar at noon (demand is moderate) means batteries must absorb the surplus. Too little at 6pm (demand peaks, sun sets) means grid must supply power (expensive). Planning ahead avoids this mismatch.
Real-World Complexity
Grid modernization is gradual. Utilities have decades-old infrastructure:
- Measurement infrastructure is sparse (few sensors in the grid)
- Communication is slow (SCADA systems, not real-time APIs)
- Integration with legacy systems requires months of engineering
- Regulatory approval: grid reliability is critical, must pass extensive testing
A utility can’t deploy a new algorithm without months of validation. Failures have real consequences (blackouts).
Current state: some utilities deploy load forecasting (well-proven, low risk). Optimization and anomaly detection are pilot programs.
5. Healthcare Applications: When AI Decisions Impact Lives
AI in healthcare has regulatory, ethical, and practical complexities beyond most domains.
Diagnostics: Radiology
Chest X-ray analysis:
- Radiologist reviews image, looks for pneumonia, tuberculosis, pneumothorax, nodules
- Highly variable: image quality, patient body composition, prior images matter
- Inter-rater variability: two radiologists disagree 10-20% of the time
AI approach:
- Train CNN (ResNet, DenseNet) on 100k+ labeled chest X-rays
- Model learns: nodule appearance, Kerley B lines (a sign of pulmonary edema), pleural effusion
- Deployment: radiologist uploads image, model outputs predictions + attention map (highlights abnormal regions)
- Workflow: model assists, doesn’t replace (radiologist makes final call)
FDA Approval (FDA clearance pathway):
- Phase 1: validate on internal dataset (does model work?)
- Phase 2: validate on external dataset from different hospitals (does it generalize?)
- Phase 3: prospective clinical trial (does it improve patient outcomes?)
- Submission to FDA: provide validation data, intended use, failure modes
- Review: takes 6-12 months for FDA clearance
Current state: Multiple radiology AI systems have FDA clearance (products from Siemens, GE Healthcare, and startups such as Aidoc and Lunit; Stanford’s CheXpert is a widely used benchmark dataset rather than a cleared product). Most are assistive (help the radiologist decide), not autonomous.
Challenges:
- Labeling: need thousands of images labeled by expert radiologists (expensive, takes months)
- Shifts: model trained on modern equipment, deployed on older machines → performance drops
- Rare diseases: model trained on common cases (pneumonia, TB) fails on rare conditions (silicosis)
- Legal liability: if model-assisted diagnosis is wrong and patient harmed, who’s liable?
Wearables and Continuous Monitoring
Heart rate variability → Arrhythmia:
- Smartwatch measures heart rate continuously (100 Hz sampling)
- LSTM detects irregular patterns: atrial fibrillation (Afib) is an irregular heart rhythm
- User gets alert: “irregular rhythm detected, consult doctor”
Challenge: false positives. Watch detects motion artifact (user moving, not heart issue). Too many false alarms and user ignores alerts (alert fatigue).
Approach: increase model specificity, add confirmation rules (must see pattern for >60 seconds), use multimodal data (combine HR with motion data to rule out artifact).
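The duration rule is simple debouncing. A sketch, assuming one rhythm classification every 5 seconds and the 60-second persistence rule described above:

```python
def confirmed_alerts(flags, interval_s=5, min_duration_s=60):
    """Alert only when the irregular-rhythm flag persists for min_duration_s."""
    needed = min_duration_s // interval_s   # consecutive positives required
    run, alerts = 0, []
    for flag in flags:
        run = run + 1 if flag else 0        # reset the streak on any negative
        alerts.append(run >= needed)
    return alerts

blip = confirmed_alerts([True] * 6 + [False] * 10)  # 30 s artifact: suppressed
episode = confirmed_alerts([True] * 14)             # 70 s episode: alerts
```

This trades sensitivity for specificity by design: a genuine but very short episode is also suppressed, which is acceptable when the cost of alert fatigue is high.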
Apple Watch received FDA clearance for atrial fibrillation detection in 2018.
Drug Discovery
Developing new drugs:
- Identify disease target (protein that causes disease)
- Compound screening: test millions of molecules, find ones that bind to target (inhibit the disease protein)
- Validation: test that the compound works in cells, then animals, then humans
- Trials: prove to FDA that drug works and is safe (costs $2B, takes 10+ years)
AI’s role: accelerate step 2 (compound screening).
- Traditionally: test 1000s of molecules in lab manually (months)
- With AI: train generative model on known active compounds, generate new candidates predicted to work, prioritize by ML ranking model, test top 10 candidates (weeks)
Models:
- Graph neural networks: molecules are graphs (atoms as nodes, bonds as edges)
- Variational autoencoders: learn latent space of molecules, generate new ones by sampling latent space
- Transformer models: treat molecule SMILES strings as sequences, use language models to generate candidate molecules
Success story: DeepMind’s AlphaFold (protein structure prediction). Knowing protein 3D structure is crucial for drug design. AlphaFold predicts structure from amino acid sequence. Previously took years of X-ray crystallography; AlphaFold does it in seconds. Deployed via public database, free for researchers. Impact: accelerated drug discovery, structural biology.
Challenges Specific to Healthcare
Privacy and Compliance (HIPAA):
- Patient data is sensitive
- Models trained on patient data must respect privacy
- Real approach: train locally on hospital servers, don’t send data to cloud
- Alternative: federated learning (train model across hospitals without sharing raw data)
Regulatory Approval:
- FDA requires extensive validation before use
- Every new version of the model must be revalidated
- Slows innovation (AI companies are used to weekly model updates; healthcare allows monthly or quarterly at best)
Fairness:
- Training data is often biased (fewer examples of rare diseases, certain demographics underrepresented)
- Model learns: “this demographic has low risk of disease” (statistical artifact, not reality)
- Real harm: disease missed in underrepresented group due to model bias
- Solution: audit for fairness, balance training data, monitor performance per demographic in production
Explainability:
- Doctor needs to understand why model flagged a case
- “Neural network says abnormal” doesn’t help if radiologist can’t see what the network saw
- Attention mechanisms, saliency maps help: “network focused on upper left lobe” → doctor looks there
- Challenge: saliency maps can be misleading or gamed
6. Recommendation Systems: The Economics of Predicting What You’ll Like
Netflix, Amazon, and Spotify are fundamentally ML companies. Their business model is: predict what users will like, recommend it, users engage more, more ad revenue or subscriptions.
Collaborative Filtering
Insight: Users with similar tastes should like similar content.
Matrix factorization:
- Matrix: M[user, item] = rating (1-5 stars)
- Problem: matrix is sparse (each user rates <1% of items)
- Solution: factorize into two low-rank matrices: M ≈ U × V
- U: user latent factors (user vectors in 100D space)
- V: item latent factors (item vectors in 100D space)
- Prediction: M[u, i] ≈ U[u] · V[i] (dot product of user and item vectors)
Training: gradient descent to minimize ||M - U×V||² over observed entries.
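That training loop, in miniature. The toy ratings, rank k=8, and hyperparameters below are illustrative; production systems use ALS or sampled SGD over billions of entries.

```python
import random

def factorize(ratings, n_users, n_items, k=8, lr=0.02, reg=0.05, epochs=300):
    """SGD on squared error over the observed entries of the ratings matrix."""
    random.seed(0)
    U = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(U[u][f] * V[i][f] for f in range(k))
            for f in range(k):  # gradient step with L2 regularization
                uf, vf = U[u][f], V[i][f]
                U[u][f] += lr * (err * vf - reg * uf)
                V[i][f] += lr * (err * uf - reg * vf)
    return U, V

# (user, item, rating) triples: users 0 and 1 like item 0, dislike item 1.
ratings = [(0, 0, 5), (0, 1, 1), (1, 0, 4), (1, 1, 1), (2, 1, 5)]
U, V = factorize(ratings, n_users=3, n_items=2)
predict = lambda u, i: sum(U[u][f] * V[i][f] for f in range(len(V[i])))
```

After training, a dot product of learned vectors reconstructs observed ratings and, more importantly, fills in the unobserved ones.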
Challenge: cold start. New users with no history. New items with no ratings.
- Solution: ask new user for ratings (bootstrap), or infer from demographics, or use content-based approach
Content-Based Filtering
Insight: Items similar in content should appeal to the same users.
- Netflix has metadata for each movie: genre, actors, director, year
- Compute similarity between movies (cosine similarity in feature space)
- If user liked “The Matrix,” recommend similar movies (sci-fi, special effects, Neo-like protagonist)
Advantage: no cold start for new items (you have metadata). Disadvantage: limited serendipity (always recommend similar movies, user never discovers new genres).
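The similarity computation is a one-liner over metadata vectors. A toy version with hand-made genre features (real systems embed metadata with learned models):

```python
import math

def cosine(a, b):
    """Cosine similarity of two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

# Illustrative feature vectors: [sci-fi, action, romance, comedy].
catalog = {
    "The Matrix":   [1.0, 0.9, 0.1, 0.0],
    "John Wick":    [0.0, 1.0, 0.1, 0.1],
    "Notting Hill": [0.0, 0.0, 1.0, 0.8],
}
liked = catalog["The Matrix"]
ranked = sorted(catalog, key=lambda t: cosine(catalog[t], liked), reverse=True)
# ranked[1] is the nearest title the user hasn't already seen
```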
Serendipity vs. Accuracy
Pure accuracy: recommend movies user will definitely watch → engagement up, revenue up.
But: user gets bored (always same genre). Recommendation engines are too predictable.
Serendipity: recommend something unexpected but relevant → user discovers new genres, long-term engagement stays high.
Trade-off:
- Explore-exploit: 80% exploitation (recommend what user will like), 20% exploration (recommend novel items)
- Bandit algorithms: formalize the trade-off mathematically
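The simplest bandit, epsilon-greedy, hard-codes the 80/20 split above. A sketch with hypothetical click-through rates and a running-average value estimate:

```python
import random

def epsilon_greedy(estimates, epsilon=0.2):
    """Explore a random item 20% of the time, else exploit the best estimate."""
    if random.random() < epsilon:
        return random.randrange(len(estimates))
    return max(range(len(estimates)), key=lambda i: estimates[i])

random.seed(1)
true_ctr = [0.2, 0.5, 0.8]              # unknown to the algorithm
estimates, counts = [0.0] * 3, [0] * 3
for _ in range(5000):
    arm = epsilon_greedy(estimates)
    reward = 1.0 if random.random() < true_ctr[arm] else 0.0
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean
best = max(range(3), key=lambda i: estimates[i])
```

After a few thousand interactions the estimates converge and the best arm absorbs most of the traffic; UCB and Thompson sampling refine how the exploration budget is spent.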
A/B Testing at Scale
Change recommendation algorithm, measure impact:
- Control group: old algorithm
- Variant group: new algorithm
- Metric: did users engage more? (watch time for Netflix, click-through rate for Amazon)
With billions of users, even 0.1% improvement is significant.
But: 0.1% improvement is hard to detect statistically. Need large sample sizes.
Real process:
- Test new model offline (does it predict held-out ratings accurately?)
- Shadow mode: run new model but don’t use predictions (measure how it would perform if deployed)
- Canary: deploy to 1% of users, measure impact
- Ramp: gradually increase to 10%, 50%, 100% if impact is positive
Scale and Latency
Netflix has 250M users, 10k+ titles. A dense 250M × 10k ratings matrix would take terabytes of memory. Instead:
- Pre-compute recommendations offline: for each user, compute top 100 items (batch process, run nightly)
- Store in cache (Redis): lookup is instant
- Personalization: real-time factors (what user watched today) refine the cached recommendations
Latency budget: <100ms from user requesting page to recommendations appearing.
7. Natural Language Applications: Language Models in Production
From customer service chatbots to semantic search, NLP powers many applications.
Chatbots
Customer Service: user asks “how do I return this item?” → chatbot either answers directly or routes to human.
Approach:
- Intent classification: does message mean “return policy question”?
- Slot filling: extract parameters (what item? when purchased?)
- Response generation: template-based (“Here’s our return policy…”) or neural (generate response)
Real-world deployment:
- Template + rules: cheap, predictable, limited flexibility
- Fine-tuned LLM: GPT-3 with few-shot examples → flexible, handles novel questions
- Hybrid: use LLM for generation, but validate response before sending (check facts)
Challenge: out-of-scope questions. User asks “where does your CEO live?” (reasonable question, but not relevant to customer service). Model should say “I don’t know, please contact support.”
Summarization
Contract review: read 50-page legal document, extract key terms.
- Manual: lawyer spends 2 hours, costs $400
- AI: summarization model generates abstract in seconds
- Reality: abstract is missing nuances, lawyer still reviews, but faster
Approach:
- Extractive: copy sentences from document that are most important
- Abstractive: generate new sentences that capture the document
- Most successful: hybrid (extract the key sentences first, then feed them to an abstractive summarizer)
Translation
“Break the language barrier.”
Modern translation (Google Translate, DeepL):
- Neural machine translation: sequence-to-sequence model
- Encoder: reads source language sentence, builds representation
- Decoder: generates target language sentence from representation
- Real-time constraint: <1 second from uploading document to translated output
Real-world challenges:
- Domain-specific terms: “malware” in security vs “malware” in medicine (different translations)
- Cultural nuance: idioms don’t translate literally
- Ambiguity: “I saw the man with the telescope” → who has the telescope?
Current: neural MT is very good for high-resource language pairs (top WMT systems approach human parity on news translation). Rare language pairs still struggle.
Semantic Search
User searches “how to fix leaky faucet” → system must find relevant documents (YouTube videos, forum posts) even if documents don’t use exact keywords.
Traditional: keyword matching (fast, brittle to synonyms)
Modern: semantic search
- Index all documents: pass each through an embedding model (BERT-style encoders, OpenAI’s text-embedding-3 models), get a fixed-size vector (typically 384-3072 dimensions)
- User searches: embed query in same space
- Find closest document vectors (cosine similarity or vector DB)
- Return top K documents
Advantage: “fix water leak” matches “repair broken tap” even with no shared keywords.
Real deployment: vector database (Weaviate, Pinecone) stores millions of vectors, retrieval is <100ms.
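Retrieval itself is nearest-neighbor search over stored vectors. A brute-force sketch with stand-in 4-D embeddings; a real system uses a trained embedding model and an approximate index instead of this linear scan.

```python
import math

def top_k(query_vec, index, k=2):
    """Rank documents by cosine similarity to the query embedding."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) *
                      math.sqrt(sum(y * y for y in b)))
    ranked = sorted(index, key=lambda doc: cos(query_vec, index[doc]),
                    reverse=True)
    return ranked[:k]

# Hypothetical document embeddings; a production index holds millions.
index = {
    "repair-broken-tap":  [0.9, 0.8, 0.1, 0.0],
    "garden-hose-setup":  [0.2, 0.1, 0.9, 0.1],
    "kitchen-remodeling": [0.1, 0.0, 0.2, 0.9],
}
results = top_k([0.8, 0.9, 0.0, 0.1], index)  # query: "fix water leak"
```

The query shares no keywords with the top document; the match comes entirely from proximity in embedding space.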
Sentiment Analysis
Review: “The product is amazing, highly recommend.” → Label: positive.
Review: “Broke after one week, terrible build quality.” → Label: negative.
Simple RNN or fine-tuned BERT does well. Real-world:
- Sarcasm: “Oh great, another bug” (negative despite positive words)
- Mixed: “Good product, terrible shipping” (mixed sentiment)
- Domain shift: model trained on product reviews fails on social media sentiment
Information Extraction
Document: “John Smith, age 30, employed by Acme Corp.” → Extract (person: “John Smith”, age: 30, company: “Acme Corp”)
Approaches:
- Rule-based: regex patterns (“age (\d+)”)
- Sequence tagging: BIO tagging (Begin, Inside, Outside tags for each entity type)
- Generative: prompt LLM (“extract person name, age, company from this text”)
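The rule-based approach for the example above takes a few regexes. The name and company patterns are naive illustrations; real systems use sequence taggers precisely because such rules break on messy text.

```python
import re

def extract(text):
    """Regex-based extraction of (person, age, company) from free text."""
    person = re.match(r"([A-Z][a-z]+ [A-Z][a-z]+)", text)
    age = re.search(r"age (\d+)", text)
    company = re.search(r"employed by ([A-Z]\w*(?: [A-Z]\w*)*)", text)
    return {
        "person": person.group(1) if person else None,
        "age": int(age.group(1)) if age else None,
        "company": company.group(1) if company else None,
    }

record = extract("John Smith, age 30, employed by Acme Corp.")
```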
Deployment: APIs vs. Local
Cloud API (OpenAI, Anthropic):
- Pros: latest models, no hardware cost, easy to scale
- Cons: privacy (data sent to cloud), latency (network round-trip), cost per request
Local (Ollama, Hugging Face Transformers):
- Pros: privacy, offline capability, no per-request cost
- Cons: need GPU hardware, older models, scaling is manual
Real-world: enterprises with sensitive data run local. Startups use APIs.
8. Computer Vision Applications: Seeing the World
Beyond autonomous vehicles, computer vision powers many systems.
Object Detection
Example: retail shelf monitoring. Camera observes shelves, detects if products are out of stock.
Model: YOLO (You Only Look Once), Faster R-CNN
- Input: image
- Output: bounding boxes + class labels + confidence scores
Deployment:
- Edge device (camera on shelf) runs inference locally
- Sends alerts to store manager: “milk is out of stock, aisle 3”
- Latency: <100ms per image
Challenge: 10,000 different products. Model can’t be trained on all. Solution:
- Train on broad categories (milk, bread, water bottles)
- Use instance segmentation to distinguish individual products via appearance
- Augment with barcode scanning (ground truth for out-of-stock)
Face Recognition
Use cases: security (access control), photo organization, law enforcement
Technical: face detection + feature extraction + comparison
- Face detection: MTCNN, RetinaFace (find face in image)
- Feature extraction: ResNet, VGGFace (convert face to 128D vector)
- Comparison: compute distance between vectors (same person = small distance)
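Verification then reduces to thresholding a similarity score. A sketch with 4-D stand-ins for the 128-D face embeddings; the 0.8 threshold is an illustrative choice that a real system would tune on a validation set.

```python
import math

def same_person(vec_a, vec_b, threshold=0.8):
    """Decide identity match by cosine similarity of two face embeddings."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    return dot / (norm_a * norm_b) >= threshold

enrolled = [0.9, 0.1, 0.3, 0.2]         # stored template
probe_same = [0.85, 0.15, 0.25, 0.2]    # same face, different pose
probe_other = [0.1, 0.9, 0.1, 0.7]      # different person
match = same_person(enrolled, probe_same)
mismatch = same_person(enrolled, probe_other)
```

The threshold sets the false-accept vs false-reject trade-off: raise it for access control, lower it for photo clustering.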
Real-world:
- Accuracy: ~99% on benchmark datasets, lower in the wild (lighting, angle, expression)
- Privacy concerns: mass surveillance, bias against minorities
- Regulatory: EU restricts real-time facial recognition by law enforcement
Segmentation: Pixel-Level Understanding
Semantic segmentation: label each pixel (road, building, tree, person)
Use case: autonomous vehicles. Knowing “that’s a road” is more useful than “there’s a car-shaped object.”
Instance segmentation: separate individuals (three people → three masks)
Models: U-Net (dense prediction), Mask R-CNN (combines detection + segmentation)
Pose Estimation
Detect human keypoints: head, shoulders, elbows, wrists, hips, knees, ankles.
Applications:
- Sports analysis: swing biomechanics, running form
- Fitness: is user doing squat correctly?
- Healthcare: physical therapy (did patient do prescribed exercises?)
Model: OpenPose, MediaPipe, PoseNet
- Input: video
- Output: 17 keypoints per person, 25 frames/sec
Document Understanding
OCR (Optical Character Recognition): extract text from images of documents.
Modern approach: end-to-end model (detect text regions + recognize characters in one pass)
Layout understanding: document has structure (title, paragraphs, tables). Models must preserve structure.
Use cases: digitize paper documents, extract information from invoices (date, amount, vendor).
Real-world: combination of OCR + NLP. OCR extracts text, NLP extracts structured information (date field, amount field).
Video Analysis
Frame-by-frame analysis is too slow (30 fps × video length = huge computation) and ignores temporal context.
Approach: 3D convolutions or recurrent models
- 3D CNN: convolve over time + space (learn temporal patterns)
- I3D (Inflated 3D): treat video as 3D data, scale up 2D models to 3D
Use cases:
- Activity recognition: is person playing soccer or swimming?
- Anomaly detection: unusual activity in surveillance footage
- Video understanding: summarize what happens in video
9. Financial Applications: AI Making High-Stakes Decisions
Finance involves regulatory oversight and adversarial actors. ML systems here are especially fraught.
Fraud Detection
Credit card transactions: millions per day, of which roughly 0.1% are fraudulent, so rare positives are buried in overwhelmingly legitimate traffic.
Approach:
- Baseline: user’s normal behavior (location, time of day, merchant type)
- Real-time model: is this transaction anomalous?
- If suspicious: extra verification (call user, require 2FA)
Models: random forest (interpretable, good for fraud), neural networks (higher accuracy, black box).
Real-world:
- False positive rate: flag too many legitimate transactions and users get frustrated
- False negative rate: miss fraud and customer is liable or bank loses money
- Trade-off: set threshold based on cost of error (cost of false positive vs. cost of false negative)
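Setting the threshold by cost of error can be done by brute force over candidate thresholds. The costs below are illustrative assumptions: a false positive annoys a customer, a false negative is a missed fraud and costs far more:

```python
def best_threshold(scores, labels, cost_fp=5.0, cost_fn=500.0):
    """scores: model fraud probabilities; labels: 1 = fraud, 0 = legitimate.
    Returns the threshold minimizing expected cost and that cost."""
    candidates = sorted(set(scores)) + [1.01]   # 1.01 = flag nothing
    best_t, best_cost = None, float("inf")
    for t in candidates:
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        cost = cost_fp * fp + cost_fn * fn
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost
```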
Challenges:
- Concept drift: fraud patterns change (criminals adapt to detection system)
- Continuous retraining: retrain model weekly to catch new patterns
- Explainability: if transaction is blocked, user wants to know why
Credit Scoring
Loan application: predict if applicant will repay.
Data: income, employment history, credit history, loan amount, purpose
Model: logistic regression (simple, interpretable) or gradient boosting (higher accuracy)
Regulatory: fair lending laws (e.g., the Equal Credit Opportunity Act) prohibit discrimination based on protected characteristics: race, gender, religion, national origin.
Problem: protected characteristics are correlated with wealth/income (historical injustice). Model trained on data learns the correlation.
Solution:
- Remove protected features from input (but correlated features might still encode protected info)
- Audit model for disparate impact (does model treat minorities differently?)
- Adjust decision thresholds to equalize approval rates across demographics
- Explicit fairness constraints in optimization
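A disparate-impact audit can start with approval rates per group. The four-fifths ratio below (flagging when the lowest group rate falls under 80% of the highest) is a common heuristic, not a legal determination:

```python
from collections import defaultdict

def approval_rates(decisions):
    """decisions: list of (group, approved_bool) pairs."""
    totals, approved = defaultdict(int), defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        approved[group] += int(ok)
    return {g: approved[g] / totals[g] for g in totals}

def disparate_impact_ratio(decisions):
    """Ratio of lowest to highest group approval rate (four-fifths rule)."""
    rates = approval_rates(decisions)
    return min(rates.values()) / max(rates.values())
```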
Algorithmic Trading
High-frequency trading: make 1000s of trades per second, profit from small price differences.
Real-time constraint: decisions in about a microsecond, roughly five orders of magnitude faster than a human blink (~100 ms).
Approach:
- Model observes market microstructure (order book, recent trades, news)
- Predicts next price movement
- Places trades if profit expected exceeds cost
Challenges:
- Latency: colocate servers at exchange to reduce network latency
- Model risk: if the model’s prediction is wrong, losses are instant and compound across thousands of trades
- Regulatory: SEC oversees automated trading, limits certain strategies
- Adversarial: other traders are also sophisticated, model must outcompete them
Risk Management
Portfolio optimization: given 1,000 assets with estimated returns and risks, allocate capital to maximize return at an acceptable risk level.
Classical: Mean-variance optimization (Markowitz). Choose weights that minimize variance for target return.
Modern: use ML to predict asset correlations and returns, then optimize.
Challenge: predictions are uncertain. Model predicts Apple stock will return 10%, but uncertainty is ±20%. Optimization is sensitive to predictions; small change in prediction → large change in allocation.
Real approach: robust optimization (optimize for worst case within uncertainty bounds).
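As a toy instance of the mean-variance idea, the two-asset minimum-variance portfolio has a closed form. Real systems optimize over a full covariance matrix numerically, and robust variants optimize against the worst case in an uncertainty set:

```python
# Two-asset minimum-variance weights (closed form). sigma1/sigma2 are asset
# volatilities, rho is their correlation; inputs are illustrative.
def min_variance_weights(sigma1, sigma2, rho):
    """Return (w1, w2) minimizing portfolio variance for two assets."""
    cov = rho * sigma1 * sigma2
    w1 = (sigma2 ** 2 - cov) / (sigma1 ** 2 + sigma2 ** 2 - 2 * cov)
    return w1, 1.0 - w1
```

With uncorrelated assets the less volatile one gets the larger weight, which matches the intuition behind the full optimization.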
Compliance
AML (Anti-Money Laundering): detect suspicious patterns.
Models: graph neural networks to detect money laundering rings (suspicious transfers between accounts), or anomaly detection (unusual transfer amounts/frequencies).
Regulatory: banks must report suspicious activity. Model flags suspicious transactions, analysts review, escalate if confirmed.
10. Smart Home & IoT: Ambient Intelligence
Homes with 10-100 connected devices learning user preferences and automating comfort/security.
Occupancy Detection
Problem: is the house empty? If yes, turn off HVAC to save energy.
Sensors:
- Motion detectors (PIR sensors)
- Door/window sensors (entry points)
- Cameras (computer vision)
- WiFi network (devices connected?)
Model: ensemble combining multiple signals.
- If motion detected in last 10 min → occupied
- If door was unlocked and weather is cold → likely occupied
- If no devices on network → unoccupied
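The ensemble rules above as a sketch. The signal priorities and cutoffs are assumptions; a deployed system would learn them from labeled occupancy data:

```python
def is_occupied(minutes_since_motion, door_unlocked, outdoor_temp_c,
                devices_on_wifi):
    """Combine occupancy signals in priority order (illustrative rules)."""
    if devices_on_wifi == 0:
        return False                  # strongest "empty" signal
    if minutes_since_motion <= 10:
        return True                   # recent motion
    if door_unlocked and outdoor_temp_c < 5:
        return True                   # unlikely to leave door unlocked in cold
    return False
```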
Real-world: false negatives (model thinks empty when occupied) cause comfort loss. False positives (thinks occupied when empty) waste energy. Tune threshold based on preference.
Security
Intrusion detection:
- Window broken? (acoustic signature)
- Door forced? (motion + door sensor + time of day; an entry at 3 a.m. is suspicious)
- Unusual entry pattern? (timing, door order)
Models: anomaly detection (learn normal patterns, alert on deviations).
Real-world: false alarms desensitize users to alerts, yet a real intrusion must still be caught.
Advanced: face recognition at doorbell (is this person authorized to enter?).
Energy Optimization
Prediction: forecast next 24 hours of electricity demand and solar generation.
Optimization: schedule large appliances (water heater, EV charging) when solar production is high (peak midday).
Real-world:
- Weather forecast error → solar prediction error
- User changes patterns (travel unexpectedly) → demand prediction fails
- Optimization must be conservative (avoid running out of battery power)
Voice Control
Smart speakers (Alexa, Google Home, Siri) understand spoken commands.
Pipeline:
- Audio → speech-to-text (transcribe)
- Text → intent classification (turn on lights, adjust temperature)
- Intent → action (send command to light controller)
Privacy concern: device records audio, sends to cloud for processing. Some users uncomfortable.
Solution: on-device processing. Recent models are small enough to run on speaker hardware. Google Recorder does on-device speech recognition, and Alexa is moving the same way.
Privacy: On-Device Preferred
IoT devices contain intimate home data: when you’re home, what rooms you’re in, temperature preferences, security.
Approach: process locally, don’t send to cloud.
Trade-off: cloud models are more powerful (larger, better trained). Local models are smaller, less accurate, but private.
Real deployment: hybrid.
- On-device: basic functionality (voice commands, simple automation)
- Cloud: when needed (complex queries, learning patterns over time)
11. Real-World Deployment Patterns
How do ML systems actually get deployed?
Batch Processing
Pattern: process data once per day/hour, store results, serve from cache.
Example: predictive maintenance.
- Every night: run LSTM on 100 machines’ sensor data from past 30 days
- Output: maintenance schedule for next week
- Store in database
- Maintenance team queries database in morning
Advantages:
- Can use expensive models (nightly batch has time budget)
- Can recompute if needed (no real-time constraint)
- Cost is predictable (fixed compute, fixed time)
Disadvantages:
- Latency: if event happens at 11pm, notification comes 12 hours later
- No adaptation: schedule doesn’t change until next batch
Real-Time Streaming
Pattern: process events as they arrive, make decisions instantly.
Example: fraud detection.
- User swipes card
- In <100ms: run fraud model on transaction
- If suspicious: block or challenge
- Send response to card terminal
Advantages:
- Instant feedback (user knows immediately)
- Adapts to latest data
Disadvantages:
- Latency budget is tight (<100ms)
- Can’t afford expensive models (must be fast)
- Models must be simple + optimized
Technology: Kafka (event streaming), real-time ML frameworks (Seldon, BentoML), feature stores (Tecton, Feast).
Hybrid: Batch Retraining, Streaming Inference
Most large-scale systems use this:
- Batch: nightly retraining on accumulated data, produce new model
- Streaming: serve latest model to inference engine
- Real-time: send predictions to user instantly
Example: Netflix recommendation.
- Batch: nightly, retrain collaborative filtering on 24 hours of new watch data (too much data to update in real-time)
- Streaming: user logs in, lookup precomputed recommendations (fast, <1ms)
- Real-time: if new user, fallback to content-based recommendations
Fallback: When ML Fails
Pattern: always have a backup plan.
If ML model crashes, is slow, or gives nonsensical output → fallback to simpler method.
Example: autonomous vehicle.
- Primary: neural network planner (sophisticated)
- Fallback 1: reactive planner (stop if obstacles detected)
- Fallback 2: manual control (human takes over)
Example: recommendation engine.
- Primary: collaborative filtering
- Fallback 1: content-based recommendations
- Fallback 2: trending items (what’s popular today)
- Fallback 3: editorial picks (human curated)
Real-world deployment requires fallback chains. Distributed systems are fragile; assume something will fail.
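A fallback chain is a few lines of control flow. The sketch below mirrors the recommendation example, with a hypothetical `trending` function standing in for the last automated tier:

```python
def recommend(user_id, strategies):
    """Try each strategy in order; fall through on failure or empty output."""
    for strategy in strategies:
        try:
            result = strategy(user_id)
            if result:                # treat empty output as a soft failure
                return result
        except Exception:
            continue                  # log in production, then fall through
    return []                         # last resort: nothing to show

def trending(_user_id):
    """Static popular picks: always available, never personalized."""
    return ["item-42", "item-7"]
```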
Gradual Rollout: De-Risking Deployment
New model: unknown behavior on real data. Rollout strategy:
Shadow Mode:
- Run new model but don’t use predictions
- Log predictions for analysis
- Compare to old model: does new model agree?
- If not → investigate before rollout
Canary:
- Deploy new model to 1% of users
- Monitor metrics (latency, accuracy, crashes)
- If metrics good → increase to 10%
- Gradually ramp to 100%
Metrics to watch:
- Latency: did inference slow down?
- Accuracy: are predictions still good? (A/B test against old model)
- Crashes: does new model have bugs?
- Business metrics: did engagement/revenue change?
Example: Facebook tests feed ranking algorithm on 1% of users, measures engagement. If engagement up, ramp to more users. If down, investigate or revert.
In practice, rollouts take days to weeks. A bad model reaching all users could harm the business significantly.
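Canary assignment is typically a deterministic hash of the user ID, so each user lands in a stable bucket across sessions. A minimal sketch; the salt and bucket count are arbitrary choices:

```python
import hashlib

def in_canary(user_id: str, percent: float, experiment: str = "model-v2") -> bool:
    """Stable per-user bucketing: same user, same answer every time.
    Salting by experiment name keeps different rollouts independent."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 10000   # 0..9999, stable per user
    return bucket < percent * 100          # percent=1.0 -> buckets 0..99
```

Ramping from 1% to 10% to 100% is then just raising `percent`; users already in the canary stay in it.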
12. Common Challenges Across All Applications
Data Quality
Garbage in, garbage out. Most ML failures are data problems, not algorithm problems.
Issues:
- Labeling errors: human labels are wrong (inter-rater disagreement)
- Missing data: sensors fail, data gets lost
- Outliers: one extreme example skews training (one person spends $1M, model thinks everyone does)
- Data imbalance: 99.9% negative examples, 0.1% positive (rare events)
Detection:
- Visualize label distribution (is it realistic?)
- Check for duplicate examples (data leakage)
- Compare train/test label distribution (should be similar)
- Get multiple annotators (measure agreement, find inconsistencies)
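The first two checks can be automated cheaply. A sketch; the 1% imbalance threshold is illustrative:

```python
from collections import Counter

def quality_report(examples, labels):
    """Report label balance and exact-duplicate count for a dataset."""
    label_counts = Counter(labels)
    total = len(labels)
    rarest = min(label_counts.values()) / total
    duplicates = len(examples) - len(set(examples))
    return {
        "label_distribution": dict(label_counts),
        "imbalanced": rarest < 0.01,       # <1% minority class
        "duplicate_examples": duplicates,  # candidates for train/test leakage
    }
```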
Distribution Shift
Problem: model trained on data type X, deployed on data type Y. Performance drops.
Examples:
- Chest X-ray model trained on modern equipment, deployed on older machines → accuracy drops
- Fraud model trained on 2020 fraud patterns, deployed in 2024 when fraud techniques changed → model misses new patterns
- Autonomous vehicle model trained on California weather, deployed in Seattle → fails in rain
Detection:
- Monitor predictions over time: if distribution changes, flag alert
- Compare model accuracy on recent data vs. old data
- Use test-time adaptation: update model slightly on new data
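One common drift monitor is the Population Stability Index (PSI) over model scores: compare the histogram of recent predictions against a reference window. A sketch, assuming scores in [0, 1]; the 0.2 alert level is a rule of thumb, not a standard:

```python
import math

def psi(reference, recent, bins=10):
    """PSI between two score samples; 0 means identical distributions."""
    def histogram(values):
        counts = [0] * bins
        for v in values:
            counts[min(int(v * bins), bins - 1)] += 1
        n = len(values)
        return [max(c / n, 1e-6) for c in counts]   # avoid log(0)
    ref, cur = histogram(reference), histogram(recent)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))

def drifted(reference, recent, threshold=0.2):
    return psi(reference, recent) > threshold
```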
Solution:
- Continuous retraining: retrain on new data monthly
- Domain adaptation: transfer learning from new domain
- Robustness: train on diverse data to generalize
Regulatory Compliance
Different domains have different rules:
- Healthcare: FDA approval required, extensive validation, documentation
- Finance: Fair lending laws, explainability required, risk management
- EU: GDPR (right to explanation), AI Act (classification by risk level)
- Autonomous vehicles: state-level approval, safety validation
Real impact: can’t deploy in some regions due to regulation. Can’t use certain models (black-box neural networks) where explainability is required.
Privacy
Regulations:
- GDPR: EU resident data is protected, users have right to access/delete
- CCPA: California similar to GDPR
- HIPAA: medical data
Practical:
- Don’t send sensitive data to cloud (keep medical data in-hospital)
- Use differential privacy: add noise to training data so individual contributions are hidden
- Federated learning: train model across hospitals without centralizing data
Real-world: data is often the moat (more data → better model). Privacy regulations limit data access, slowing model improvement.
Explainability
Why does it matter?
- User trust: “why was I denied a loan?” (lender must explain)
- Debugging: “why did model fail here?” (engineer must understand)
- Safety: “why did car brake?” (reassure passengers)
Techniques:
- Attention mechanisms: show which parts of input model focused on
- Feature importance: which features drove prediction?
- LIME: local interpretable model-agnostic explanations
- Saliency maps: visualize gradients (which pixels matter?)
Limitations: explanations can be misleading (models can rely on spurious correlations). A saliency map indicating that blue pixels matter doesn’t mean blue is causal; it may simply be correlated with the actual cause.
Fairness and Bias
Problem: model discriminates against group (race, gender, age).
Sources:
- Biased training data (historical injustice encoded in data)
- Proxy features (zip code correlates with race, not causal but predictive)
- Model amplifies: learns subtle correlations that humans wouldn’t consciously use
Detection:
- Disaggregate metrics by demographic: is model equally accurate for all groups?
- Compare decision rates: does model accept/reject equitable proportions across groups?
Solutions:
- Balance training data: oversample underrepresented groups
- Fairness constraints: explicitly minimize group disparities during training
- Threshold adjustment: use different decision thresholds for different groups (controversial)
Real-world: perfect fairness is impossible (trade-offs between accuracy and fairness). Find acceptable trade-off, document it, monitor for drift.
Cost of Inference
Cloud APIs: $0.001 per request (ChatGPT API), millions of requests → significant cost.
On-device: one-time cost of hardware, no per-request cost. But limited accuracy (small models).
Trade-off: accuracy vs. cost.
- ChatGPT (highest cost, highest accuracy)
- Open-source Llama (free to run, lower accuracy)
- Distilled model (cheap, decent accuracy, smaller)
Real deployment: use cheap model first. If accuracy insufficient, use expensive model.
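The cloud-vs-local decision reduces to a break-even calculation. All numbers below are illustrative assumptions, not current vendor pricing:

```python
def breakeven_requests(cost_per_request, hardware_cost, monthly_power_cost,
                       months):
    """Request volume over `months` at which local hardware beats the cloud API."""
    local_total = hardware_cost + monthly_power_cost * months
    return local_total / cost_per_request

# e.g. $0.001/request vs a $2,000 GPU box + $50/month power over 12 months:
# break-even at about 2.6 million requests; above that, local is cheaper.
```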
Latency: Speed of Response
Real-time applications (fraud, autonomous vehicles): <100ms requirement.
- Models must be small and optimized
- Can’t afford big transformers, use distilled models or classical algorithms
Batch applications (predictive maintenance, recommendations): seconds to minutes.
- Can use larger models
- More processing time available
Trade-off: size vs. accuracy.
- Large transformer: highest accuracy, slow
- Distilled model: slightly lower accuracy, 10x faster
- Choose based on requirements
13. How These Connect to Harnesses
A harness is the decision-making layer orchestrating complex systems. Each real-world application follows a pattern:
Perception → Decision → Action
Harness is the Decision step.
Autonomous Vehicle Harness
Perception (camera/LIDAR → object detections)
↓
Harness: Planning & Control
- Prediction: what will others do?
- Planning: safe path given predictions
- Control: steering commands
↓
Action (actuators: steering wheel, brakes, accelerator)
Robotics Harness
Perception (camera → object detection, localization → SLAM)
↓
Harness: Manipulation & Navigation
- Object detection → where to grasp?
- Planning: how to reach object?
- Grasping: what grasp strategy?
↓
Action (robot arm: move to position, close gripper)
Smart Grid Harness
Perception (sensors → power flows, demand, generation)
↓
Harness: Optimization
- Load forecasting: predict demand
- Optimization: allocate generation
- Fault localization: where is problem?
↓
Action (controllers: reroute power, start generator, trip circuit)
In each case, the harness must:
- Integrate perception (uncertain sensor data, multiple modalities)
- Reason about the world (predictions, planning, optimization)
- Make decisions (choose action that achieves goals)
- Execute under constraints (real-time, reliability, safety)
The perception systems are deep learning (neural networks). The harness could be classical algorithms (optimization, planning), learned models (reinforcement learning), or hybrid.
Conclusion
AI in production is not the lab version. It’s:
- Integrated: perception, decision, action working together
- Constrained: latency, cost, power, reliability requirements
- Uncertain: sensors fail, data shifts, edge cases happen
- Real-time: decisions in milliseconds or predictions 24 hours ahead
- High-stakes: failures have real consequences (crashes, blackouts, financial loss)
- Regulated: privacy, fairness, explainability requirements vary by domain
The common pattern: perception systems extract meaning from raw data, harnesses orchestrate decisions, controllers execute actions. The harness is where real-world AI becomes practical.
Understanding these applications shows why harnesses matter: they translate ML predictions into reliable systems. Without a proper harness, AI is just math—good predictions don’t help if the system can’t integrate them into decisions that actually work.
Validation Checklist
How do you know you got this right?
Performance Checks
- Identified which application pattern fits your domain (autonomous vehicle, robotics, predictive maintenance, smart grid, healthcare, recommendations, NLP, computer vision, finance, smart home)
- Have a latency budget for your use case: real-time (<100ms), near-real-time (<1s), or batch (minutes to hours)
- Measured end-to-end system latency from sensor/input to action/output on representative data
Implementation Checks
- Perception-Decision-Action pipeline defined: each stage has clear inputs, outputs, and latency allocation
- Fallback chain implemented: primary ML model → simpler model → rule-based logic → human escalation
- Gradual rollout strategy planned: shadow mode → canary (1% of traffic) → ramp to 100%
- Data quality checks in place: label distribution validated, duplicates removed, train/test split verified
- Distribution shift monitoring configured: track model accuracy on recent data, alert if performance degrades
- Domain-specific regulatory requirements identified (FDA for healthcare, Fair Lending for finance, GDPR for EU data)
- Cost of inference calculated: per-request cost for cloud API vs amortized hardware cost for your expected volume
Integration Checks
- Harness orchestration layer connects perception outputs to decision logic to action execution
- A/B testing infrastructure ready: can compare old model vs new model on live traffic with statistical significance
- Monitoring dashboards configured: latency, accuracy, crash rate, and business metrics tracked per deployment
Common Failure Modes
- Distribution shift in production: Model trained on historical data fails on current patterns (fraud evolves, weather changes, equipment ages). Fix: retrain monthly on new data, monitor accuracy per time window, set up automated drift detection.
- False positive fatigue: Too many alerts desensitize users (security alarms, predictive maintenance warnings, medical alerts). Fix: increase model specificity, add confirmation rules (pattern must persist >60 seconds), tune threshold based on cost of false positive vs false negative.
- Edge case domination: 99.9% of data is easy; the 0.1% of edge cases causes all real failures. Fix: active learning to prioritize hard examples for labeling, simulation for synthetic edge cases, gradual deployment with human oversight.
- Explainability gap: Stakeholders (doctors, regulators, customers) don’t trust black-box predictions. Fix: add attention maps, LIME explanations, or feature importance scores; document model limitations in user-facing documentation.
Sign-Off Criteria
- End-to-end system tested on real data (not just held-out test set) with realistic traffic patterns and failure scenarios
- Fallback behavior verified: system degrades gracefully when model fails, is slow, or gives low-confidence output
- Regulatory compliance confirmed for your domain (HIPAA audit for healthcare, fairness audit for lending, safety validation for autonomous systems)
- Business metrics tracked: does the ML system improve the metric you care about (revenue, safety, efficiency, user satisfaction)?
- Runbook documented: what to do when the model fails in production (who to alert, how to revert, when to retrain)
See Also
- Doc 06 (Harness Architecture) — Seven components apply to all real-world systems shown here; understand the abstract pattern
- Doc 05 (AI Agents) — Agentic reasoning (ReAct, planning) powers decision-making in these applications
- Doc 25 (Edge & Physical AI) — Edge deployment and physical systems are the targets for real-world applications
- Doc 09 (Operations & Observability) — Production systems require monitoring and observability; each application has unique metrics