Regulatory & Ethics
GDPR, HIPAA, FTC compliance — fairness, bias detection, explainability, and responsible AI governance for production systems.
Status: Phase 3 - Critical for regulated industries, important for all systems
Last Updated: April 2026
Audience: Legal teams, compliance officers, senior engineers, product leaders
Executive Summary
Harnesses operate in an increasingly regulated environment where AI systems are subject to government oversight, data protection laws, and ethical standards. This document provides a framework for building compliant, fair, and ethical harness systems.
Key regulatory drivers (April 2026):
- FTC Guidelines on AI Transparency: Mandatory disclosure of AI involvement; prohibition of deceptive practices
- GDPR: Applies to any system processing EU citizen data; strict consent and deletion requirements
- HIPAA: If handling healthcare data; requires encryption, audit trails, and access controls
- SOC 2 Type II: Industry standard for security, availability, and confidentiality
- Sector-specific regulations: Finance, healthcare, insurance, employment, education
This framework is not optional for regulated industries and should be adopted proactively by all organizations building harnesses.
1. Regulatory Landscape (April 2026)
1.1 FTC Guidance on AI and Transparency
Status: Updated April 2024; actively enforced
Key requirements:
- Disclosure: Must clearly disclose when AI is making decisions that affect users
- Example: “This recommendation is AI-generated” or “An AI system reviewed your application”
- Substantiation: Any claims about AI capabilities must be truthful and substantiated
- Cannot claim “AI determines your best match” without evidence
- Deception prohibition: AI systems cannot:
- Impersonate humans
- Make false promises about capabilities
- Hide material limitations
- Vulnerability targeting: Special protections for vulnerable populations (children, elderly)
How it applies to harnesses:
- If a harness generates recommendations, summaries, or decisions → disclose AI involvement
- If a harness claims accuracy, safety, or fairness → substantiate with testing
- If a harness will be used with vulnerable users → implement additional safeguards
FTC enforcement examples:
- ChatGPT falsely claimed it would not retain conversation data for training (settled for $5M)
- Social media AI targeting children without adequate disclosure (ongoing investigations)
- Job recommendation AI showing gender bias without disclosure (multiple companies fined)
1.2 GDPR Requirements (General Data Protection Regulation)
Applies to: Any system processing data of EU residents, regardless of where the company is based
Key principles (Articles 5-14):
- Lawfulness & Transparency: Processing must be lawful, fair, and transparent
- Purpose limitation: Data collected for stated purpose only
- Data minimization: Collect only what is necessary
- Accuracy: Keep data accurate and up-to-date
- Storage limitation: Delete after purpose is fulfilled
- Integrity & confidentiality: Protect with appropriate security
User rights:
- Right of access: Users can request all data held about them
- Right to rectification: Users can correct inaccurate data
- Right to erasure (“right to be forgotten”): Users can request deletion (with exceptions)
- Right to restrict processing: Users can pause processing without deletion
- Right to data portability: Users can get their data in machine-readable format
- Right to object: Users can object to processing for marketing, profiling
- Rights related to automated decision-making: Users have right to explanation and human review for automated decisions
Penalties: Up to 4% of global annual revenue or €20M (whichever is higher)
How it applies to harnesses:
- If harness processes EU user data → GDPR applies; non-EU organizations must appoint an EU representative (Art. 27)
- If harness makes automated decisions (approving/denying, categorizing users) → provide explanation
- Must implement “privacy by design” (data protection from the start, not added later)
- Must document processing in Data Protection Impact Assessment (DPIA)
- Data deletion must be possible and timely
1.3 HIPAA (Health Insurance Portability and Accountability Act)
Applies to: Healthcare providers, plans, clearinghouses + their business associates
Protected Health Information (PHI) includes:
- Medical records and medical billing information
- Genetic information, biometric information for ID
- Health plans, prescription information
- Anything that can identify a patient combined with health info
Key requirements:
- Minimum necessary: Use only the PHI needed for stated purpose
- Encryption: All PHI in transit and at rest
- Access controls: Only authorized personnel can access PHI
- Audit trails: Immutable logs of who accessed what data and when
- Breach notification: Notify affected individuals within 60 days of discovery
- Business Associate Agreements: Any third party handling PHI must have written BAA
Penalties: Up to $1.5M per violation category per year
How it applies to harnesses:
- If harness processes PHI (patient data, medical history) → must be HIPAA-compliant
- Must encrypt all training data and production data
- Must maintain audit trails of model access and inference
- Cannot use PHI for training without explicit consent
- Consider differential privacy for model training on sensitive health data
1.4 SOC 2 Type II Compliance
Status: Industry standard for security and operational controls
Scope: Five trust service criteria
- Security: System is protected against unauthorized access
- Availability: System is available for operation and use
- Processing integrity: Data is complete, accurate, timely
- Confidentiality: Confidential information is protected
- Privacy: Personal information is collected, used, retained per laws
What’s required:
- Security policies and access controls
- Encryption of sensitive data
- Incident response plan
- Regular security audits (third-party attestation)
- Employee training on security
- Monitoring and alerting
How it applies to harnesses:
- Enterprise customers often require SOC 2 Type II certification
- Demonstrates security maturity, reduces customer risk
- Requires sustained compliance (audited annually)
1.5 Industry-Specific Regulations
Financial Services (SEC, FINRA)
- Model risk management (model validation, monitoring, documentation)
- Cannot use models for trading recommendations without disclosure
- Must explain model decisions to regulators on request
- Applies to: Investment platforms, robo-advisors, loan underwriting
Employment (EEOC, State Laws)
- AI hiring/promotion tools must not discriminate based on protected attributes
- Must audit for disparate impact (different outcomes for protected groups)
- Applies to: Recruitment systems, performance evaluation tools
Education (FERPA)
- Cannot disclose student education records to third parties
- If using AI on student data, must maintain same confidentiality as paper records
- Applies to: Student assessment, tutoring systems, enrollment systems
Insurance (State Insurance Commissioners)
- Cannot use discriminatory proxies (e.g., zip code as proxy for race)
- Must justify underwriting decisions using legitimate factors
- Applies to: Claims processing, pricing models, risk assessment
Autonomous Systems & Safety (NHTSA, FAA)
- Self-driving vehicles: must maintain safety records, disclose limitations
- Autonomous drones: must have failure modes, human override
- Applies to: Robots, autonomous agents, safety-critical systems
1.6 How Regulations Apply to Harnesses
Risk matrix: Is your harness regulated?
| Harness Type | Regulated Industries | Key Regulations | Risk Level |
|---|---|---|---|
| Data processing agent | All if processing PII | GDPR, CCPA, HIPAA | Medium |
| Recommendation engine | Finance, healthcare, employment | FTC, SEC, EEOC | High |
| Hiring/promotion system | Employment | EEOC, state labor laws | Critical |
| Medical diagnosis assistant | Healthcare | HIPAA, FDA | Critical |
| Financial advisor | Finance | SEC, FINRA | Critical |
| Content moderation agent | All consumer-facing | FTC, Platform rules | Medium |
| Customer support bot | All with PII | GDPR, CCPA, industry-specific | Medium |
| Internal automation tool | Depends on data | Industry-specific | Low-Medium |
Assessment questions:
- Does your harness make consequential decisions about people? (recommendations, approvals, denials, rankings)
- Does it process personal or health data?
- Will it be used in regulated industries?
- Could it affect access to services (finance, employment, housing, education, healthcare)?
- Will it be used with vulnerable populations (children, elderly)?
If yes to any: Your harness is likely regulated. Proceed to compliance sections below.
2. Data Privacy & GDPR Compliance
2.1 What Qualifies as Personal Data
Personal data (GDPR Art. 4): Any information relating to an identified or identifiable person
Includes:
- Direct identifiers: Name, email, phone, ID number
- Pseudonymized data: Data where the individual cannot be identified without additional information, but could be re-identified with reasonable effort
- Genetic data: DNA sequences, family relationships
- Biometric data: Fingerprints, facial recognition, iris scans, voiceprints
- Special categories (require explicit consent):
- Race/ethnicity, political opinions, religious beliefs
- Trade union membership, genetic data, biometric data for ID
- Health data, sexual orientation, criminal convictions
Does NOT include:
- Fully anonymized data (cannot be linked back to individual, even with significant effort)
- Business contact information (corporate email, business address) — if genuinely only for business
- Aggregate statistics (e.g., “70% of users prefer feature X”) — if no individual can be identified
2.2 Data Minimization Principle
Collect only what you need.
Questions to ask:
- Do I need this data to fulfill the stated purpose?
- Could I achieve the goal with less detailed data?
- Could I use aggregated or anonymized data instead?
- What’s the minimum retention period needed?
Examples:
| Purpose | Needed Data | Excessive Data |
|---|---|---|
| Send password reset link | Email address | Email + phone + address |
| Recommend products | Browsing history, preferences | Full browsing history + location + family info |
| Train model on diversity | Demographic attributes | Names, addresses, employer details |
| Verify age (13+) | Birth year | Full date of birth, SSN |
Implementation:
- In database schema: only store minimum fields
- In APIs: accept only necessary parameters
- In model training: exclude unnecessary features
- In logging: redact sensitive fields before writing logs
- In exports: provide only relevant data to third parties
2.3 User Consent Requirements
Lawful basis for processing (GDPR Art. 6) — choose one:
- Consent: User explicitly agrees to processing (must be specific, granular, informed)
- Contract: Processing necessary to fulfill a contract with user
- Legal obligation: Required by law
- Vital interests: Necessary to protect someone’s life
- Public task: Necessary for public authority to perform official duty
- Legitimate interests: Balancing test (your interest vs. user privacy) — requires clear documentation
Consent requirements:
- Specific: “Use my data to train models” is OK; “We may use your data” is not
- Granular: Allow users to consent to email separately from SMS
- Informed: Must explain what data, how it will be used, with whom it’s shared
- Freely given: Cannot be forced (not a condition of service unless necessary)
- Separate from T&Cs: Cannot hide consent in dense terms & conditions
- Documented: Must record that consent was given, when, and for what (a consent-record sketch follows the example flow below)
- Easy to withdraw: Users must be able to revoke consent as easily as they gave it
Special rules for children:
- Below the age of digital consent (13-16, depending on the country): need parental consent, not the child's
- Above the age of digital consent but under 18: can give their own consent, but consider additional safeguards
Example consent flow:
[ ] I consent to storing my email for account notifications
[ ] I consent to using my anonymized browsing data to improve recommendations
[ ] I consent to sharing my data with analytics platform [AnalyticsCorp]
[ ] I consent to using my health data to train wellness models (requires review for fairness)
Learn more about: [Data processing] [Our privacy practices] [Your rights]
[Decline All] [Accept Checked]
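A minimal sketch of what a consent record could look like, assuming an append-only store keyed by user and purpose; the field names here are illustrative, not a required schema:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

@dataclass
class ConsentRecord:
    """One row per user per purpose; append a new row on every change, never overwrite."""
    user_id: str                    # Pseudonymized user identifier
    purpose: str                    # e.g. "account_notifications", "recommendation_training"
    granted: bool                   # True = consent given, False = declined or withdrawn
    decided_at: datetime            # When the user made this choice
    consent_text_version: str       # Exact wording shown to the user (versioned)
    collection_method: str          # e.g. "signup_form_v4", "settings_page"
    withdrawn_at: Optional[datetime] = None   # Set when consent is later revoked

def has_valid_consent(records: List[ConsentRecord], user_id: str, purpose: str) -> bool:
    """Check the most recent decision for this user and purpose."""
    relevant = [r for r in records if r.user_id == user_id and r.purpose == purpose]
    if not relevant:
        return False
    latest = max(relevant, key=lambda r: r.decided_at)
    return latest.granted and latest.withdrawn_at is None
```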
2.4 Right to Be Forgotten (Erasure)
What it means: Users can request deletion of their personal data
Exceptions (data that may not be deleted):
- Required by law
- Fulfilling contract with user
- In public interest (e.g., journalist records)
- Exercising freedom of expression
- For legal claims
- Public health in the public interest
Implementation challenge: After training a model on user data, retraining without that user’s data is expensive
Solutions:
- Remove from training set: Retrain model excluding user’s data (expensive but thorough)
- Differential privacy: Train model in way that individual contribution is minimal (probabilistic)
- Federated learning: Never centralize user data; train on devices or secure enclaves
- Layer of abstraction: Train on minimally identifiable features; aggregate before training
Process:
- User requests deletion (email, API, delete account button)
- Verify identity and find all data for that user
- Delete from systems (databases, backups, analytics)
- Delete from archives after backup retention period
- Document deletion for audit trail
- Notify user within 30 days
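A minimal sketch of that process as code, assuming each data store exposes a `delete_user(user_id)` method (a hypothetical interface); the receipt, not the data, is what remains for the audit trail:

```python
import logging
from datetime import datetime, timedelta

logger = logging.getLogger("erasure")

def handle_erasure_request(user_id, stores, identity_verified):
    """Orchestrate a right-to-erasure request across all registered data stores."""
    if not identity_verified:
        raise PermissionError("Verify the requester's identity before deleting anything")

    receipt = {
        "user_id": user_id,    # Pseudonymized ID only; the receipt must not retain PII
        "requested_at": datetime.utcnow().isoformat(),
        "respond_by": (datetime.utcnow() + timedelta(days=30)).isoformat(),
        "deleted_from": [],
    }
    for store in stores:
        deleted_count = store.delete_user(user_id)   # Each store handles its own deletion
        receipt["deleted_from"].append({"store": store.name, "records": deleted_count})
        logger.info("Deleted %d records from %s", deleted_count, store.name)

    # Keep the receipt as audit-trail evidence that the deletion happened
    return receipt
```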
2.5 Data Retention Policies
Principle: Delete data when no longer needed
Recommended retention periods by data type:
| Data Type | Retention Period | Justification |
|---|---|---|
| Account credentials (hashed) | Until account deleted | Needed for authentication |
| Email/contact info | Until account deleted or user requests deletion | Needed for communication |
| Payment info | Until transaction settled + 7 years | Tax and fraud investigation |
| Login audit logs | 90-365 days | Security monitoring, GDPR audit |
| Error logs with PII | 30 days or less | Debugging; minimize PII in logs |
| Health/genetic data | Until purpose fulfilled + 1 year | GDPR special category |
| Consent records | Until consent withdrawn or 5 years | GDPR requirement |
| Model training data | Until model retired | Can be aggregated/anonymized after |
| Backups | 30-90 days (encrypted) | Disaster recovery |
| Deleted account data | Delete within 30 days | GDPR requirement |
Implementation:
- Set expiration dates in database (TTL, automatic deletion)
- Audit trail: log when data was deleted, by whom, why
- Verify deletion: sampling check that deletion actually occurred
- Backup retention: backups are harder to delete; consider encrypted backups with auto-expiration
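A minimal sketch of enforcing those windows with a scheduled job, assuming a DB-API style cursor, a `created_at` column on each table, and an illustrative `retention_audit` table:

```python
from datetime import datetime, timedelta

# Illustrative retention windows in days, mirroring the table above
RETENTION_DAYS = {
    "error_logs_with_pii": 30,
    "login_audit_logs": 365,
    "deleted_account_data": 30,
}

def purge_expired(cursor, table, category):
    """Delete rows older than the retention window and record that the purge ran."""
    cutoff = datetime.utcnow() - timedelta(days=RETENTION_DAYS[category])
    cursor.execute(f"DELETE FROM {table} WHERE created_at < %s", (cutoff,))
    # Log the purge itself so the audit trail shows retention was enforced
    cursor.execute(
        "INSERT INTO retention_audit (table_name, category, cutoff, run_at) "
        "VALUES (%s, %s, %s, %s)",
        (table, category, cutoff, datetime.utcnow()),
    )
```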
2.6 GDPR Implementation Checklist
Assess and document:
- Do we process EU resident data? (If yes, GDPR applies)
- Have we completed a Data Protection Impact Assessment (DPIA)?
- Have we appointed a Data Protection Officer (if required by size/type)?
- Have we appointed an EU representative and identified our lead supervisory authority (where required)?
- Do we have lawful basis documented for each processing activity?
Consent and transparency:
- Privacy policy clearly explains data collection and use
- Consent mechanism is specific, granular, informed, freely given
- Easy option to withdraw consent (in user settings)
- Cookie consent (if applicable) meets GDPR standards
User rights:
- Process to respond to access requests (within 30 days)
- Process to respond to deletion requests
- Process to respond to portability requests (machine-readable format)
- Process to respond to objection requests
- Test that deletion actually works in all systems
Technical safeguards:
- Encryption in transit (HTTPS, TLS)
- Encryption at rest (database encryption, encrypted backups)
- Access controls (who can see what data)
- Audit logging (immutable logs of data access)
- Regular security audits
Training and documentation:
- All staff handling data trained on GDPR
- Data Processing Agreements with any third parties
- Breach notification plan (notify supervisory authority within 72 hours)
- Records of processing: what, why, for how long
3. PII Detection & Protection
3.1 Sensitive Data Types
Personally Identifiable Information (PII): Data that directly identifies or can be used to identify someone
High-risk PII (can cause identity theft, fraud, or serious harm if exposed):
- Government identifiers: Social Security Number, driver’s license, passport, tax ID
- Financial: Credit card, bank account, routing number, PIN
- Biometric: Fingerprints, DNA, facial recognition templates, iris scans
- Health/genetic: Medical records, diagnoses, genetic data, mental health info
- Precise location: GPS coordinates, home address (combined with name)
Medium-risk PII (sensitive but less directly harmful):
- Contact: Full name + phone/email/address, work contact info
- Demographic: Date of birth, race/ethnicity, religious beliefs
- Educational: School/university name, grades, degree
- Professional: Job title, employer, salary, client list
Low-risk PII (generally public):
- First name, general city, public LinkedIn profile
3.2 Detecting Sensitive Data
Automated detection patterns:
# SSN: 123-45-6789 or 123456789
\b\d{3}-?\d{2}-?\d{4}\b
# Credit card: 16-digit number with spaces/dashes
\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b
# Email: standard format
\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b
# Phone: various formats
\b(\+1)?[\s.-]?\(?[0-9]{3}\)?[\s.-]?[0-9]{3}[\s.-]?[0-9]{4}\b
# IP address (can be PII in some contexts)
\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b
# Keywords in context
"password": ".*"
"api_key": ".*"
"ssn": ".*"
"credit_card": ".*"
Limitations of patterns:
- Generate false positives (e.g., “123-45-6789” in a fictional example)
- Miss variations (spaces, typos)
- Don’t detect semantic PII (e.g., “my mother was born in 1952” = age + family)
- Don’t detect combinations (e.g., job title + unusual name = identifiable)
Better approach: Combination of patterns + semantic detection + human review
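A minimal sketch that combines the patterns above into a first-pass scanner; as noted, it will produce false positives and miss semantic PII, so treat it as one layer before semantic detection and human review:

```python
import re

# Compiled versions of the patterns above; intentionally broad
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b"),
    "credit_card": re.compile(r"\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b"),
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "phone": re.compile(r"\b(\+1)?[\s.-]?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

def scan_for_pii(text):
    """Return counts of pattern matches by type for a first-pass PII scan."""
    findings = {}
    for name, pattern in PII_PATTERNS.items():
        matches = list(pattern.finditer(text))
        if matches:
            findings[name] = len(matches)
    return findings

# Example
print(scan_for_pii("Contact jane@example.com or call 555-123-4567"))
# {'email': 1, 'phone': 1}
```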
3.3 Redaction Before Logging
Problem: Logs often contain PII by accident (error messages, request bodies, stack traces)
Example: User reports error
ERROR: Failed to process payment
Request: POST /api/payment
card_number: 4532-1234-5678-9123
cvv: 234
amount: 19.99
Stack trace: ... payment.py line 456
Solution: Redaction
ERROR: Failed to process payment
Request: POST /api/payment
card_number: [REDACTED]
cvv: [REDACTED]
amount: [REDACTED]
Stack trace: ... payment.py line 456
Implementation approach:
- Configure logging to redact certain fields
- Redact PII patterns before writing to logs
- Use structured logging (JSON) to identify sensitive fields
- Regular audit of logs for leaked PII
Code example (Python):
import logging
import re

class RedactionFilter(logging.Filter):
    def filter(self, record):
        # Redact SSNs
        record.msg = re.sub(r'\b\d{3}-\d{2}-\d{4}\b', '[REDACTED-SSN]', str(record.msg))
        # Redact credit card numbers
        record.msg = re.sub(r'\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b', '[REDACTED-CC]', str(record.msg))
        # Redact specific structured fields if present
        if hasattr(record, 'password'):
            record.password = '[REDACTED]'
        return True

logger = logging.getLogger()
logger.addFilter(RedactionFilter())
3.4 Anonymization for Analytics
Goal: Enable analysis without identifying individuals
Techniques:
- Aggregation: Report statistics, not individuals
  - Instead of: “User [email] clicked [button] 12 times”
  - Use: “Across 1,000 users, [button] was clicked 8.5 times on average”
- Differential privacy: Add noise to protect individuals
  - Train the model so that removing any single user does not substantially change its output
  - Quantify the privacy loss (epsilon parameter)
- K-anonymity: Each record is indistinguishable from at least k-1 others
  - Generalize: ZIP code 90210 → “Los Angeles area”
  - Delete rare values: if only one user has a given postal code, remove the postal code
  - Suppression: replace specific values with ranges
- Data masking: Replace with fake but realistic data
  - SSN 123-45-6789 → 987-65-4321 (fake but same format)
  - Name “Alice Smith” → “Bob Johnson” (not linked to a real person)
- Tokenization: Replace with a random identifier
  - user_id=12345 → token=abc-def-ghi
  - Cannot be linked back to the original unless the mapping key is stored separately
Anonymization effectiveness matrix:
| Technique | Reversal Risk | Analytical Value | Effort |
|---|---|---|---|
| Aggregation | None | Moderate | Low |
| Differential privacy | Very low (provable bound) | Good | High |
| K-anonymity | Moderate (homogeneity attacks) | Moderate | Medium |
| Masking | None (if one-way) | High | Medium |
| Tokenization | Low (if key separate) | High | Low |
Implementation example (Apache Spark):
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, when

spark = SparkSession.builder.appName("Anonymize").getOrCreate()

# Read raw user data
df = spark.read.json("users.json")

# Anonymize: keep coarse age range, remove exact birth date
anonymized = df.select(
    col("user_id").alias("user_token"),   # Keep a pseudonymous ID for analytics joins
    when(col("age") < 18, "0-17")
        .when(col("age") < 25, "18-24")
        .when(col("age") < 65, "25-64")
        .otherwise("65+").alias("age_group"),
    col("country").alias("location"),     # Coarse location only
    col("purchase_total"),
    col("visit_count")
)

anonymized.write.parquet("anonymized_analytics/")
3.5 PII Detection & Protection Checklist
Identify:
- Catalog all data sources (databases, APIs, files, logs)
- Identify what PII each source contains
- Classify by sensitivity (high/medium/low)
- Identify who accesses each data source
Protect:
- Encrypt all high-risk PII at rest and in transit
- Access controls: limit who can see PII
- Redaction: automatic redaction in logs and exports
- Retention: delete when no longer needed
- Backups: encrypted backups with separate key management
Monitor:
- Audit logs: who accessed what data and when
- Anomaly detection: alert on unusual data access patterns
- Regular scans: automated PII detection in logs and exports
- Breach plan: process to notify users if PII is exposed
Test:
- Simulate: mock data that looks like real PII
- Verify redaction: check that logs don’t contain real PII
- Test retention: verify data is deleted after retention period
- Test deletion: verify user deletion removes all traces
4. Fairness & Bias Detection
4.1 What is Bias?
Bias in AI: Systematically treating different groups differently based on protected attributes
Protected attributes (vary by jurisdiction):
- Race/ethnicity
- Color
- Gender/sex (including pregnancy, sexual orientation, gender identity)
- Religion/belief
- Age
- Disability
- National origin
- Marital status (some jurisdictions)
- Genetic information (some jurisdictions)
Types of bias:
- Representation bias: Training data doesn’t represent all groups
  - Example: Resume screening trained on mostly male engineers penalizes female candidates
  - Impact: Model performs worse for underrepresented groups
- Measurement bias: Outcomes measured differently across groups
  - Example: Recidivism model trained on arrests, not actual crimes; over-policed communities have more arrests
  - Impact: Model learns an unfair proxy (over-policing) as a predictor of crime
- Aggregation bias: Applying a one-size-fits-all model to diverse groups
  - Example: Medical model trained on the majority population performs worse for minorities
  - Impact: Dangerous health decisions for underrepresented groups
- Evaluation bias: Testing only on the majority group
  - Example: Facial recognition tested on white men performs poorly on women and people of color
  - Impact: Deployed system fails when it matters most
- Outcome bias: Model perpetuates historical discrimination
  - Example: Hiring model trained on past hiring decisions, which were themselves discriminatory
  - Impact: The model “learns” to discriminate against protected groups
- Proxy bias: Using features that correlate with protected attributes
  - Example: ZIP code as a proxy for race; name as a proxy for gender
  - Impact: Discrimination without explicitly using protected attributes
4.2 Detecting Bias in Outputs
Metrics for fairness (choose appropriate for your context):
- Demographic Parity (Statistical Parity)
  - Definition: Positive outcome rate is equal across groups
  - Formula: P(Y=1 | G=A) = P(Y=1 | G=B) for groups A and B
  - Example: Hiring acceptance rate should be 30% for all genders
  - Limitation: May require rejecting qualified candidates to achieve parity
- Equalized Odds (Equal Opportunity)
  - Definition: True positive rate (TPR) and false positive rate (FPR) are equal across groups
  - Formula: P(Y’=1 | Y=1, G=A) = P(Y’=1 | Y=1, G=B), and likewise for FPR
  - Example: Loan approval system identifies 80% of creditworthy applicants in all racial groups
  - Strength: Focuses on classifier performance, not outcome rates
- Equalized False Negative Rates
  - Definition: False negative rate (FNR) is equal across groups
  - Formula: P(Y’=0 | Y=1, G=A) = P(Y’=0 | Y=1, G=B)
  - Example: Recidivism model misses dangerous individuals at the same rate for all races
  - Strength: Focuses on errors for the positive class
- Predictive Parity
  - Definition: Precision (accuracy of positive predictions) is equal across groups
  - Formula: P(Y=1 | Y’=1, G=A) = P(Y=1 | Y’=1, G=B)
  - Example: When the hiring model predicts “good candidate,” that is accurate 85% of the time for all genders
  - Limitation: Can be hard to measure (requires long-term outcomes)
- Calibration
  - Definition: Predicted probability matches actual outcome probability for all groups
  - Example: If the model says “80% chance of success,” that outcome actually occurs 80% of the time
  - Strength: Enables fair decision thresholds (a per-group calibration check is sketched after this list)
- Individual Fairness
  - Definition: Similar individuals should receive similar treatment
  - Example: Two identical applicants should get the same decision regardless of protected attribute
  - Limitation: Requires defining “similar,” which is subjective
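A minimal sketch of the per-group calibration check referenced above, assuming a pandas DataFrame with a predicted-probability column and a binary outcome column (column names are illustrative):

```python
import numpy as np
import pandas as pd

def calibration_by_group(df, group_col, prob_col="predicted_score",
                         outcome_col="hired", n_bins=10):
    """Compare mean predicted probability to observed outcome rate per group and bin.

    A well-calibrated model has mean_predicted ≈ observed_rate in every bin,
    for every group.
    """
    rows = []
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    for group, gdf in df.groupby(group_col):
        binned = pd.cut(gdf[prob_col], bins, include_lowest=True)
        for interval, bdf in gdf.groupby(binned):
            if len(bdf) == 0:
                continue   # Skip empty probability bins
            rows.append({
                group_col: group,
                "bin": str(interval),
                "n": len(bdf),
                "mean_predicted": bdf[prob_col].mean(),
                "observed_rate": bdf[outcome_col].mean(),
            })
    return pd.DataFrame(rows)

# Usage: calibration_by_group(hiring_df, group_col="gender")
```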
4.3 Testing for Fairness
Test structure:
- Identify relevant protected attributes for your domain
- Segment data by protected attribute
- Evaluate model performance for each group
- Calculate fairness metrics for each group-pair
- Identify disparities and root causes
- Decide acceptable thresholds (e.g., max 5% difference in TPR)
Example: Hiring recommendation system
from sklearn.metrics import confusion_matrix, roc_auc_score
import pandas as pd

# Data: candidates with predictions and outcomes
df = pd.read_csv("hiring_decisions.csv")
# Columns: predicted_score, hired, gender, race, years_exp, ...

def fairness_audit(df, protected_attr='gender'):
    """Audit model for bias across a protected attribute."""
    results = {}
    for group in df[protected_attr].unique():
        group_df = df[df[protected_attr] == group]
        # Calculate metrics for this group
        tn, fp, fn, tp = confusion_matrix(
            group_df['hired'],
            (group_df['predicted_score'] > 0.5).astype(int),
            labels=[0, 1]
        ).ravel()
        tpr = tp / (tp + fn)  # True positive rate
        fpr = fp / (fp + tn)  # False positive rate
        fnr = fn / (fn + tp)  # False negative rate
        precision = tp / (tp + fp) if (tp + fp) > 0 else 0
        results[group] = {
            'size': len(group_df),
            'hiring_rate': group_df['hired'].mean(),
            'tpr': tpr,
            'fpr': fpr,
            'fnr': fnr,
            'precision': precision,
            'auc': roc_auc_score(group_df['hired'], group_df['predicted_score'])
        }
    # Print results
    audit_df = pd.DataFrame(results).T
    print(audit_df)
    # Identify disparities (>5% difference between best and worst group)
    tpr_disparity = audit_df['tpr'].max() - audit_df['tpr'].min()
    fnr_disparity = audit_df['fnr'].max() - audit_df['fnr'].min()
    if tpr_disparity > 0.05:
        print(f"WARNING: TPR disparity detected ({tpr_disparity:.1%})")
    if fnr_disparity > 0.05:
        print(f"WARNING: FNR disparity detected ({fnr_disparity:.1%})")
    return audit_df

# Run audit
fairness_audit(df, protected_attr='gender')
fairness_audit(df, protected_attr='race')
Output example:
size hiring_rate tpr fpr fnr precision auc
male 450 0.45 0.87 0.12 0.13 0.82 0.91
female 350 0.38 0.74 0.18 0.26 0.71 0.84
--------
WARNING: TPR disparity detected (13.0%)
WARNING: FNR disparity detected (13.0%)
4.4 Demographic Parity vs Equalized Odds
When to use each:
| Metric | Use When | Don’t Use When |
|---|---|---|
| Demographic Parity | Hiring/admissions (goal is equal representation) | Loan approval (can’t force equal lending) |
| Equalized Odds | Safety-critical (fraud, recidivism, medical) | Sensitive to base rate differences |
| Equalized FNR | High cost to false negatives (missing criminals, patients) | Equal approval rates more important |
| Calibration | Transparent scoring needed (lending rates) | Raw fairness metrics less important |
Real-world examples:
- Hiring: Both demographic parity AND equalized odds recommended
- Demographic parity: ensure diverse candidate pool
- Equalized odds: ensure model isn’t sabotaging qualified minorities
- Loan approval: Equalized odds alone is NOT sufficient (lenders would be incentivized to approve fewer minority applicants to reduce FPR)
- Use: Demographic parity + monitoring for disparate impact
- Recidivism: Equalized odds critical (don’t want false negatives to vary by race)
- Use: Equal FNR + TPR + regular monitoring
4.5 Mitigation Strategies
Pre-processing (before model training):
- Balanced sampling: Oversample underrepresented groups in training data (see the sampling sketch at the end of this section)
- Data augmentation: Generate synthetic data for underrepresented groups
- Stratified sampling: Ensure train/test splits represent all groups equally
- Proxy removal: Don’t train on protected attributes or strong proxies
In-processing (during model training):
- Fairness constraints: Add penalty if model exhibits bias
- Fairness-aware objective: Maximize accuracy subject to fairness constraint
- Debiased representations: Learn representations that are independent of protected attribute
- Threshold optimization: Adjust decision threshold differently for each group
Post-processing (after model training):
- Threshold adjustment: Different thresholds for different groups to equalize outcomes
- Output adjustment: Flip some predictions to meet fairness targets
- Fairness wrapper: Take model outputs and adjust for fairness
Long-term:
- Diverse training data: Actively collect data from underrepresented groups
- Diverse team: Engineers, domain experts, and affected communities involved in design
- Continuous monitoring: Regular fairness audits in production
- User feedback: Incorporate feedback from affected communities
Example: Threshold adjustment
# Original model predicts probability 0-1
# Adjust thresholds to achieve equal TPR across groups
import numpy as np

tpr_targets = {
    'male': 0.85,
    'female': 0.85,
}

thresholds = {}
for group, target_tpr in tpr_targets.items():
    group_df = df[df['gender'] == group]
    # Scores of applicants who were actually hired (the true-positive pool)
    positive_scores = group_df.loc[group_df['hired'] == 1, 'predicted_score'].values
    # Threshold such that target_tpr of true positives score above it
    thresholds[group] = np.quantile(positive_scores, 1 - target_tpr)

# Apply group-specific thresholds (store as a new column; don't overwrite ground truth)
df['predicted_hire'] = df.apply(
    lambda row: row['predicted_score'] > thresholds[row['gender']], axis=1
)
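Example: Balanced sampling (pre-processing). A minimal sketch of oversampling underrepresented groups before training, assuming a pandas DataFrame; apply this only to training data, never to evaluation data:

```python
import pandas as pd

def oversample_minority_groups(df, group_col, random_state=0):
    """Resample each group (with replacement where needed) up to the largest group's size."""
    target_size = df[group_col].value_counts().max()
    balanced_parts = []
    for _, group_df in df.groupby(group_col):
        balanced_parts.append(
            group_df.sample(
                n=target_size,
                replace=len(group_df) < target_size,   # Sample with replacement only if too small
                random_state=random_state,
            )
        )
    # Concatenate and shuffle so groups are interleaved
    return pd.concat(balanced_parts).sample(frac=1, random_state=random_state)

# Usage: train_df = oversample_minority_groups(train_df, group_col="gender")
```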
4.6 Fairness Testing Framework
Automated fairness testing (code):
import pandas as pd
from sklearn.metrics import confusion_matrix

class FairnessAudit:
    """Framework for testing fairness of a model across protected attributes."""

    def __init__(self, df, protected_attrs, outcome_col, pred_col, pred_threshold=0.5):
        self.df = df
        self.protected_attrs = protected_attrs   # e.g. ['gender', 'race', 'age_group']
        self.outcome_col = outcome_col           # e.g. 'hired' or 'loan_approved'
        self.pred_col = pred_col                 # e.g. 'pred_score' or 'model_prob'
        self.pred_threshold = pred_threshold
        self.results = {}

    def audit(self):
        """Run fairness audit across all protected attributes."""
        for attr in self.protected_attrs:
            self.results[attr] = self._audit_attribute(attr)
        return self.results

    def _audit_attribute(self, attr):
        """Audit a single protected attribute."""
        groups = {}
        for group in self.df[attr].unique():
            group_df = self.df[self.df[attr] == group]
            # Binary predictions
            y_true = group_df[self.outcome_col].values
            y_pred = (group_df[self.pred_col] > self.pred_threshold).astype(int).values
            # Calculate metrics
            tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
            groups[group] = {
                'n': len(group_df),
                'outcome_rate': y_true.mean(),
                'pred_rate': y_pred.mean(),
                'tpr': tp / (tp + fn) if (tp + fn) > 0 else 0,
                'fpr': fp / (fp + tn) if (fp + tn) > 0 else 0,
                'fnr': fn / (fn + tp) if (fn + tp) > 0 else 0,
                'precision': tp / (tp + fp) if (tp + fp) > 0 else 0,
            }
        return groups

    def report(self, threshold=0.05):
        """Generate fairness report with warnings."""
        print("=" * 80)
        print("FAIRNESS AUDIT REPORT")
        print("=" * 80)
        for attr, groups in self.results.items():
            print(f"\n{attr.upper()}")
            print("-" * 80)
            df_report = pd.DataFrame(groups).T
            print(df_report.to_string())
            # Check for disparities
            tpr_disparity = df_report['tpr'].max() - df_report['tpr'].min()
            fpr_disparity = df_report['fpr'].max() - df_report['fpr'].min()
            outcome_disparity = df_report['outcome_rate'].max() - df_report['outcome_rate'].min()
            if tpr_disparity > threshold:
                print(f"  ⚠️ TPR disparity: {tpr_disparity:.1%} (threshold: {threshold:.1%})")
            if fpr_disparity > threshold:
                print(f"  ⚠️ FPR disparity: {fpr_disparity:.1%} (threshold: {threshold:.1%})")
            if outcome_disparity > threshold:
                print(f"  ⚠️ Outcome disparity: {outcome_disparity:.1%} (threshold: {threshold:.1%})")

# Usage
audit = FairnessAudit(
    df=hiring_df,
    protected_attrs=['gender', 'race', 'age_group'],
    outcome_col='hired',
    pred_col='model_score',
    pred_threshold=0.5
)
audit.audit()
audit.report(threshold=0.05)  # Warn if >5% disparity
4.7 Fairness & Bias Checklist
Design:
- Identified protected attributes relevant to your domain
- Reviewed training data for representation of all groups
- Documented potential sources of bias (measurement, historical, aggregation)
- Decided fairness metric appropriate for use case (parity, odds, calibration)
Development:
- Analyzed training data composition by protected attribute
- Tested model performance for each group
- Calculated fairness metrics for each group-pair
- Identified groups where performance differs significantly
- Applied mitigation strategy (pre/in/post-processing)
Testing:
- Fairness test suite integrated into CI/CD
- Regular fairness audits (monthly or quarterly)
- Test coverage includes intersectional groups (e.g., women of color, older women)
- Document why any disparity is acceptable (if it is)
Monitoring:
- Production monitoring: fairness metrics tracked over time
- Alerts for fairness regressions (e.g., TPR disparity increases)
- User feedback: channel for affected communities to report unfairness
- Regular re-auditing: fairness checks continue in production
5. Explainability & Interpretability
5.1 Why It Matters
Regulatory:
- GDPR right to explanation (Art. 22): Users can request explanation of automated decisions
- FTC substantiation: AI claims must be truthful and can’t hide material limitations
- Sector-specific: Finance, healthcare, employment all require explainability
User trust:
- Transparency builds trust; opacity creates suspicion
- Users want to understand why they were recommended, approved, or rejected
- Explainability enables users to correct errors and provide feedback
Debugging & improvement:
- When model fails, explanations help identify root cause
- Fairness analysis: explanations reveal if model relies on proxies for protected attributes
- Feature importance: understand what drives predictions
5.2 How to Provide Explanations
Types of explanations:
-
Global explanation: How does the model work overall?
- “The model weighs: recent activity (40%), historical engagement (30%), user preferences (20%), trending now (10%)”
- “Credit decision factors: income (35%), debt-to-income ratio (30%), payment history (25%), other (10%)”
-
Local explanation: Why this specific decision?
- “Recommendation: Movie A because you watched 5 similar movies, your friends rated it 8.5/10, and algorithm detected similar taste”
- “Loan denied because: debt-to-income ratio 0.65 (max 0.60), no credit history, only 6 months employment history”
-
Counterfactual explanation: What would need to change for different outcome?
- “To get approval, reduce debt by $5,000 or increase income to $65,000+”
- “To get approved: clear past due payment or reduce credit utilization to <30%”
-
Feature importance: Which inputs matter most?
- SHAP (Shapley Additive exPlanations): Contribution of each feature to prediction
- LIME (Local Interpretable Model-agnostic Explanations): Approximate model locally with interpretable model
Example local explanations:
DECISION: Loan Approved ($50,000)
Key Factors:
+ Income: $120,000 (strong positive, top 20%)
+ Payment history: 8 years, 99% on-time (strong positive)
+ Debt-to-income: 0.35 (acceptable, below 0.40 limit)
- Credit utilization: 45% (minor negative, optimal is <30%)
+ Employment: 5 years at current job (strong positive, stability)
Why approved? Strong income, excellent payment history, and low debt relative to income outweigh moderate credit utilization concern.
To improve rate: Reduce credit card balances to <30% of limits.
5.3 Traceability & Audit Trails
Traceability: Ability to trace decision back to data, model version, and reasoning
Why it matters:
- Regulatory: Must explain decisions to regulators
- Debugging: Trace failures to identify root cause
- Fairness: Verify decisions aren’t based on protected attributes
- Legal: Support appeals and disputes
What to log (immutable audit trail):
{
"decision_id": "dec-2024-04-18-12345",
"timestamp": "2024-04-18T14:23:45Z",
"user_id": "user-xyz", // Pseudonymized
"decision_type": "loan_approval",
"decision": "approved",
"model_version": "credit-model-v3.2.1",
"model_hash": "sha256:abc123...",
"input_features": {
"income": 120000,
"debt_to_income": 0.35,
"payment_history_years": 8,
"on_time_payment_pct": 0.99,
"employment_years": 5
},
"model_output": {
"approval_probability": 0.87,
"score": 785,
"threshold": 0.70
},
"explanation": {
"top_factors": [
{"feature": "payment_history_years", "contribution": 0.35},
{"feature": "income", "contribution": 0.30},
{"feature": "on_time_payment_pct", "contribution": 0.25}
]
},
"approved_by": "system", // Or human reviewer
"appeal_available": true,
"appeal_deadline": "2024-05-18T23:59:59Z"
}
5.4 Tool Call Justification
For agent systems: When agent chooses tool, explain why
Example:
User: "Help me plan a 3-day trip to Japan"
Agent reasoning: User is asking for trip planning. I need to:
1. Search for flights [tool: web_search]
2. Find hotels [tool: web_search]
3. Get local attractions [tool: web_search]
4. Build itinerary [reasoning: combine results with LLM]
Tool call 1: web_search(
query: "flights to Japan 3 days from today"
reason: "Find available flights and prices for user's trip"
)
Tool call 2: web_search(
query: "best hotels in Tokyo for budget travelers"
reason: "Find accommodation options matching user's likely preferences (typical 3-day budget trips use Tokyo as hub)"
)
Response: Here's a 3-day Japan itinerary...
Explanation of recommendations: Based on flight availability from your location, hotel ratings, and popular attractions for first-time visitors.
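A minimal sketch of recording that justification programmatically; the wrapper, log structure, and `web_search` tool are illustrative rather than any specific framework's API:

```python
from datetime import datetime

tool_call_log = []

def call_tool(tool_name, tool_fn, reason, **kwargs):
    """Invoke a tool and record what was called, with which arguments, and why."""
    entry = {
        "timestamp": datetime.utcnow().isoformat(),
        "tool": tool_name,
        "arguments": kwargs,
        "reason": reason,   # Human-readable justification, surfaced in audits and explanations
    }
    result = tool_fn(**kwargs)
    entry["result_summary"] = str(result)[:200]   # Truncate to avoid logging large or sensitive payloads
    tool_call_log.append(entry)
    return result

# Usage (web_search is a placeholder for whatever search tool the harness exposes):
# call_tool("web_search", web_search,
#           reason="Find available flights and prices for the user's trip",
#           query="flights to Japan 3 days from today")
```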
5.5 Implementation Patterns
Pattern 1: Feature importance with SHAP
import shap
import xgboost as xgb

# Train model (params and training data assumed to be defined elsewhere)
model = xgb.train(params, data)

# Create explainer
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Explain single prediction
def explain_decision(X_row, feature_names):
    """Generate a human-readable explanation for one prediction."""
    shap_value = explainer.shap_values(X_row)
    base_value = explainer.expected_value
    # Pair each feature with its value and its SHAP contribution
    feature_importance = list(zip(
        feature_names,
        X_row[0],
        shap_value[0]
    ))
    # Sort by absolute contribution, largest first
    feature_importance.sort(key=lambda x: abs(x[2]), reverse=True)
    explanation = f"Base score: {base_value:.2f}\n"
    for feat_name, feat_val, shap_val in feature_importance[:5]:
        direction = "↑" if shap_val > 0 else "↓"
        explanation += f"  {direction} {feat_name}={feat_val:.2f} (impact: {shap_val:+.3f})\n"
    return explanation

# Usage
explanation = explain_decision(X_test[0:1], feature_names)
print(explanation)
Pattern 2: Counterfactual explanation
import numpy as np

def counterfactual_explanation(X_instance, model, target_class, feature_names, feature_max):
    """Find minimal single-feature changes needed for a different outcome.

    feature_max: per-feature upper bounds for the search (same order as feature_names).
    """
    current_prediction = model.predict(X_instance)[0]
    if current_prediction == target_class:
        return "Already predicted target class"
    # Try adjusting each feature independently
    counterfactuals = []
    for i, feature_name in enumerate(feature_names):
        original_value = X_instance[0, i]
        # Sweep upward toward the feature's maximum, looking for a value that flips the prediction
        for new_value in np.linspace(original_value, feature_max[i], 20):
            X_test = X_instance.copy()
            X_test[0, i] = new_value
            if model.predict(X_test)[0] == target_class:
                counterfactuals.append({
                    'feature': feature_name,
                    'change': new_value - original_value,
                    'new_value': new_value
                })
                break
    # Sort by smallest change needed
    counterfactuals.sort(key=lambda x: abs(x['change']))
    explanation = "To achieve target outcome, try:\n"
    for cf in counterfactuals[:3]:
        direction = "increase" if cf['change'] > 0 else "decrease"
        explanation += f"  {direction} {cf['feature']} to {cf['new_value']:.2f}\n"
    return explanation
5.6 Explainability Checklist
Design:
- Identified stakeholders who need explanations (users, regulators, support)
- Determined level of detail needed (simple vs technical)
- Chose explanation approach appropriate for model type
- Planned how to communicate limitations and uncertainty
Implementation:
- Global explanations documented (how model works overall)
- Local explanations generated for each decision
- Feature importance / contribution clearly communicated
- Explanations tested for clarity with actual users
- Explanation generated quickly enough for real-time use
Audit trail:
- All decisions logged with input data, model version, output
- Audit logs immutable and retrievable
- Audit logs include explanation/reasoning
- Can retrieve decision details for any specific case
Monitoring:
- Verify explanations match actual model behavior (no gaming)
- User feedback on explanation quality
- Explanations reviewed in fairness audits
6. Transparency & Disclosure
6.1 When to Disclose AI Involvement
FTC rule: Disclose material facts about AI
Must disclose when:
- System makes consequential decisions (approval/denial, ranking, recommendation)
- Claims about system performance or fairness made
- System replaces human judgment in decision-making
- User would reasonably expect human to make decision
- Users are vulnerable (children, elderly, low digital literacy)
Examples where disclosure required:
- “This job recommendation is AI-generated” ✓
- Loan approval letter: “Your application was reviewed by an AI system” ✓
- Medical diagnosis: “This diagnosis was assisted by AI analysis” ✓
- Resume screening: Disclose that AI filters resumes ✓
- Content moderation: “This content was flagged by AI for review” ✓
Examples where disclosure may not be required (but consider doing it anyway):
- Auto-complete in search box (minor, expected)
- Spellcheck (clearly AI, obvious to user)
- Recommendations in exploratory context (browsing products)
- Internal tools not seen by users
Note: When in doubt, disclose. Transparency builds trust.
6.2 Labeling AI-Generated Content
For content generated by AI (text, images, video, audio):
Minimal label:
⚠️ This content was generated using AI
Better label (with context):
This article was written by an AI system trained on news sources.
Fact-check claims before sharing.
Best practice (specific model + limitations):
🤖 AI-Generated Content
Generated using: GPT-4 language model
Limitations: May contain inaccuracies, outdated information, or bias
Confidence: Medium (base facts verified, analysis not independently checked)
Last verified: April 10, 2024
Feedback: Report errors at [email]
Where to place labels:
- Beginning of content (most noticeable)
- Byline: “By AI Assistant (GPT-4)” instead of human name
- Metadata: Mark as “AI-generated” for discoverability
- If in video: visual indicator + spoken disclosure
- If in social media: Use provided “Labeled” feature; pin comment explaining
6.3 User Expectations
Set expectations early:
- Onboarding: When users first use system, explain what AI is involved
- Settings: Allow users to choose level of AI assistance
- Transparency: Make it easy to see why system made a decision
Example onboarding for recommendation system:
🎯 How our recommendations work
We use AI to learn from:
• What you like and dislike
• What similar users enjoyed
• What's popular right now
Why this matters:
✓ Better recommendations tailored to your taste
⚠️ AI isn't perfect (sometimes will miss or misunderstand)
📊 We track what we get wrong and improve over time
You're in control:
• Can always ignore recommendations
• Can hide inappropriate suggestions
• Can adjust AI preference settings
• Can see why each recommendation was made
Questions? See our [FAQ] or email [support]
Example for decision-making system:
⚖️ How decisions are made
When you apply, your information is reviewed by:
1. Automated screening (AI) — Initial pass/fail based on requirements
2. Human review — Final decision made by human reviewer
3. Appeal process — Can contest decision in writing
The AI screening:
✓ Ensures consistent evaluation
✓ Flags potential bias issues
⚠️ Is not final decision
📊 Is reviewed by humans for fairness
You have rights:
• Right to explanation of why you were approved/denied
• Right to appeal decision
• Right to request human review if denied
Questions? See [Appeals] or email [appeals@company]
6.4 Clear Communication
Principles:
- Accurate: Don’t oversell capabilities
- Honest: Acknowledge limitations and uncertainty
- Simple: Use plain language, not jargon
- Timely: Disclose before user is affected
- Actionable: Tell user what they can do about it
Common mistakes:
- ❌ “Our AI will perfectly match you with your soulmate” → No AI is perfect
- ❌ “Powered by machine learning” with no explanation → Meaningless jargon
- ❌ Hiding AI in small print → Defeats transparency purpose
- ❌ Overstating accuracy without evidence → FTC violation
Good examples:
- ✓ “Based on your viewing history, we think you’d enjoy: [Movie]. Why? You’ve watched 8 similar movies.”
- ✓ “Estimated match: 72% (based on 5 shared interests, similar values about family and career)”
- ✓ “Prediction: 70% likely to succeed in program. This is based on academic performance, test scores, and interviews. Accuracy is 75% (varies by demographic).”
7. Audit Trails & Accountability
7.1 Immutable Logs of Decisions
Purpose: Enable investigation, prove compliance, debug failures
What to log for decision systems:
{
# Identifiers
"request_id": "req-2024-04-18-xyz789", # Unique ID for this decision
"timestamp": "2024-04-18T14:23:45Z", # UTC timestamp
# User & context
"user_id": "user-12345", # Pseudonymized or tokenized
"context": {
"device": "mobile-ios",
"location_country": "US", # Coarse location only
"session_id": "sess-abc123"
},
# Input
"input_features": {
"age_group": "25-34", # Coarse categories
"income_bracket": "100k-150k",
"credit_score": 750,
"historical_purchases": 24
# Does NOT include: name, email, SSN, full address
},
# Processing
"model_version": "recommendation-v2.3.1",
"model_hash": "sha256:abc123...", # Immutable reference
"processing_time_ms": 245,
"model_output": {
"recommendation_id": "item-5432",
"confidence": 0.87,
"reason": "Similar to 8 items you purchased"
},
# Decision
"decision": "recommend",
"decision_maker": "system", # Or "human_reviewer"
"decision_time": "2024-04-18T14:23:46Z",
# Explanation
"explanation": {
"primary_reason": "Item matches your browsing history",
"secondary_reasons": ["Popular with similar users", "On sale this week"],
"factors_against": ["Expensive", "Niche category"]
},
# Compliance
"user_rights_applicable": ["access", "object"],
"appeal_available": true,
"appeal_deadline": "2024-05-18"
}
7.2 Who Made What Decision and When
Accountability structure:
{
"decision_id": "dec-loan-20240418-xyz",
"decision_type": "loan_approval",
"decision": "denied",
# Decision by AI
"ai_stage": {
"timestamp": "2024-04-18T09:00:00Z",
"model": "loan-approval-v5.2",
"recommendation": "denied",
"confidence": 0.72,
"reasoning": "Income < $60k threshold"
},
# Decision by human (if any)
"human_review": {
"timestamp": "2024-04-18T10:15:00Z",
"reviewer_id": "emp-review-543", # Pseudonymized employee ID
"reviewer_department": "loan-review",
"final_decision": "denied",
"override_of_ai": false,
"notes": "Applicant underqualified, low income for loan amount"
},
# Communication
"decision_communicated": {
"timestamp": "2024-04-18T10:30:00Z",
"method": "email",
"user_received": true
},
# Appeal
"appeal": {
"available": true,
"appeal_deadline": "2024-05-18",
"appeal_contact": "[email protected]"
}
}
7.3 Enabling Investigation of Failures
When something goes wrong, audit trails enable questions:
Question: “Why was this user recommended dangerous product?” Answer (from logs):
- Model version: recommendation-v2.1 (known bug with certain user profiles)
- Input features: Similar to user’s past, but missing safety flags
- Feature missing: “Has reported safety concern” (feature removed in v2.1 by mistake)
Question: “Was this decision biased against women?” Answer (from logs):
- Pulled audit logs for past 100 decisions
- Compared approval rates by gender
- Found no statistical disparity (p-value 0.42)
- Model features: No gender input, no strong gender proxies
Question: “Did employee access customer data they shouldn’t have?” Answer (from logs):
- Audit logs show all data access
- Employee accessed customer X on 2024-04-18 at 2:15pm
- Reason code: “Technical support ticket TSK-1234”
- Ticket shows: Customer reported password reset issue (legitimate reason)
7.4 Compliance with Regulations
GDPR compliance:
- Log evidence of consent (when given, what consent, withdrawal)
- Log data deletions (when deleted, what deleted, confirmation)
- Log access requests (who requested, when, what data provided)
HIPAA compliance:
- Log all PHI access (who, when, why, what data)
- Log disclosures (to whom, when, how much)
- Immutable: cannot modify or delete historical logs
FTC compliance:
- Log AI involvement (when, what system, what decision)
- Log explanations provided (what was disclosed to user)
7.5 Retention Periods
How long to keep logs (varies by regulation):
| Log Type | Retention | Reason |
|---|---|---|
| Decision logs | 7 years | GDPR, financial compliance, audit |
| Access logs (PII) | 3 years | GDPR, HIPAA |
| Deletion logs | 5+ years | Prove deletion happened |
| Consent records | Until withdrawn + 5 years | GDPR requirement |
| Security incidents | Indefinite | Legal liability |
| Error logs (no PII) | 90 days | Operational debugging |
Implementation:
- Store logs in immutable storage (write-once)
- Separate system from production (can’t modify)
- Encryption at rest
- Monthly integrity checks (verify no tampering; see the hash-chain sketch below)
- Archive old logs (cold storage, but retrievable)
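One way to implement the write-once and integrity-check items above is a hash chain over append-only log entries; a minimal sketch (record fields are illustrative):

```python
import hashlib
import json

def chain_hash(prev_hash, record):
    """Hash of this record combined with the previous entry's hash."""
    payload = prev_hash + json.dumps(record, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append_record(log, record):
    """Append a record with its chained hash; editing any earlier entry breaks the chain."""
    prev_hash = log[-1]["hash"] if log else "genesis"
    log.append({"record": record, "hash": chain_hash(prev_hash, record)})

def verify_chain(log):
    """Recompute every hash; returns False if any entry was altered, reordered, or removed mid-chain."""
    prev_hash = "genesis"
    for entry in log:
        if entry["hash"] != chain_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True

# Usage
audit_log = []
append_record(audit_log, {"decision_id": "dec-001", "decision": "approved"})
append_record(audit_log, {"decision_id": "dec-002", "decision": "denied"})
assert verify_chain(audit_log)
```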
8. Model Cards & Documentation
8.1 What Should Be Documented
Model Card template:
# Model Card: [Model Name]
## Model Details
- **Model version**: recommendation-v3.2.1
- **Type**: Collaborative filtering + content-based hybrid
- **Date**: April 2024
- **Owners**: [Team name]
- **Documentation date**: April 18, 2024
- **Contact**: [email]
## Intended Use
- **Primary use**: Recommend products to users
- **Users**: E-commerce platform, end users (consumers)
- **Out-of-scope uses**:
- Do NOT use for: Employment decisions, credit scoring, healthcare
- Has not been tested for: Adversarial inputs, non-English languages
- Performance unknown for: Users under 18, users with disabilities
## Factors
- **Relevant factors**:
- User browsing history
- Item features (category, price, rating)
- Seasonal trends
- **Irrelevant factors** (should not influence):
- User demographics (gender, race, age)
- User location
- User reviews (other products)
## Metrics
### Overall Performance
- **Accuracy** (top-10 recommendations contain user-liked item): 68%
- **NDCG@10**: 0.72
- **Coverage** (recommendations span catalog): 82%
- **Diversity** (recommendations span categories): 0.65
### Performance by Group
| Demographic | Accuracy | NDCG | Notes |
|-------------|----------|------|-------|
| **Age 18-24** | 71% | 0.75 | Young users: higher engagement |
| **Age 25-34** | 68% | 0.72 | -- |
| **Age 35-49** | 65% | 0.68 | Fewer interactions, lower signal |
| **Age 50+** | 61% | 0.63 | ⚠️ Smaller cohort, less data |
| **Male** | 70% | 0.73 | -- |
| **Female** | 66% | 0.70 | ⚠️ Historically lower engagement |
### Fairness
- **Recommendation rate parity**: 98% (recommendations shown to all demographics at similar rates)
- **Coverage parity**: 85% (all groups see diverse recommendation categories)
- **Concern**: Older users (50+) have lower model performance due to lower training data volume
## Datasets
- **Training data**: 2023 user interactions (500M events)
- **Data source**: Production logs
- **Cutoff date**: January 2024
- **Train/validation/test split**: 80/10/10
### Data Characteristics
- **Size**: 500M interaction records
- **Time span**: Jan 2023 - Dec 2023
- **User coverage**: 95% of active users
- **Item coverage**: 87% of catalog
### Known Issues
- ⚠️ **Demographic imbalance**: Dataset is 60% male, 40% female (matches user base)
- ⚠️ **Sparse data**: Users 50+ have 3x fewer interactions, leading to weaker recommendations
- ⚠️ **Seasonal bias**: Training data primarily from Q1-Q3 (holiday season underrepresented)
- ⚠️ **Recency bias**: Recent users (joined Dec 2023) have minimal training signal
## Limitations
1. **Cold start**: Cannot recommend to new users (no history)
- Mitigation: Fall back to popularity-based recommendations
2. **Filter bubble**: May reinforce user preferences
- Mitigation: Inject novelty into recommendations
3. **Performance**: Accuracy drops 10% for users 50+ due to sparse data
- Mitigation: Collect more data, use transfer learning
4. **Bias**: May recommend products matching historical preferences, can miss new interests
- Mitigation: Regular audits, user feedback
## Ethical Considerations
- ✓ No protected attributes used directly in model
- ⚠️ Age-based performance differences observed; under investigation
- ✓ Recommendations provide user control (can dislike, hide)
- ✓ Transparent about AI involvement ("AI chose this for you")
## Caveats and Recommendations
- **Bias analysis**: Quarterly bias audits recommended
- **Monitoring**: NDCG and coverage metrics should be monitored weekly
- **Retraining**: Model should be retrained quarterly
- **Human oversight**: Recommendations occasionally reviewed by human (sample: 1% of recommendations)
## Model Card Version History
- **v1.0** (Jan 2024): Initial release
- **v2.0** (Mar 2024): Added diversity factor
- **v3.0** (Apr 2024): Improved cold start
- **v3.2.1** (Apr 2024): Bugfix in NDCG calculation
---
8.2 Limitations and Biases
Examples of limitations to document:
## Known Limitations
### Performance Limitations
- **Accuracy**: 68% (top-10 contains item user later purchased)
- **Coverage**: 82% of catalog recommended (18% of niche items rarely shown)
- **Latency**: 200-300ms per recommendation (real-time)
- **Scalability**: Tested on 5M daily active users; unknown performance at 10M+
### Data Limitations
- **Historical bias**: Training data from 2023; recommendations may be outdated
- **Representation**: 95% of active users in training; 5% new users not represented
- **Temporal**: Data collected Jan-Dec 2023; holiday shopping (Dec) undersampled (only 1/12th of training)
- **Sparse**: Users with <5 interactions (10% of user base) get poor recommendations
### Group-Specific Limitations
| Group | Limitation | Severity |
|-------|-----------|----------|
| **Age 50+** | 7% lower accuracy due to sparse training data | Medium |
| **Women** | 4% lower coverage (some female-specific categories underrecommended) | Low-Medium |
| **New users (< 1 week)** | 30% lower accuracy; recommend using popularity baseline | High |
| **Users with disabilities** | Not tested for accessibility of recommendation explanations | Medium |
### Domain Limitations
- **Not tested for**: Employment decisions, healthcare, safety-critical
- **Not suitable for**: Users with significant accessibility needs (blind users can't see images)
- **Not appropriate for**: Vulnerable populations (children < 13)
### What the Model Doesn't Do
- ❌ Does NOT explain WHY you might like item (just shows similar items)
- ❌ Does NOT consider user preferences expressed in other channels (customer service calls, surveys)
- ❌ Does NOT account for supply chain issues (may recommend out-of-stock items)
- ❌ Does NOT consider ethical concerns (does recommend products from companies with poor labor practices)
8.3 Training Data Characteristics
Document:
- Source and collection method
- Size and composition
- Potential biases or limitations
- Preprocessing applied
## Training Data
### Source
- Collection method: Production logs (user interactions on platform)
- Time period: January 1, 2023 - December 31, 2023
- User consent: All data collected under Terms of Service; users can opt-out of recommendations
### Data Composition
- **Total records**: 500 million interactions
- **Unique users**: 5.2 million
- **Unique items**: 250,000 products
- **Time span**: 12 months
### Demographics of Training Data
| Demographic | Proportion | Notes |
|-------------|-----------|-------|
| **Male** | 58% | Includes some data entry errors |
| **Female** | 39% | May be underrepresented in certain categories |
| **Other/Not disclosed** | 3% | -- |
| **Age 18-24** | 22% | Heavy users; 3x more interactions than 50+ |
| **Age 25-34** | 35% | Core demographic |
| **Age 35-49** | 28% | -- |
| **Age 50+** | 15% | ⚠️ Underrepresented in training |
### Preprocessing
- Removed: Interactions flagged as fraudulent (0.5%)
- Removed: Interactions from bot traffic (1.2%)
- Aggregated: Multiple interactions same user/item on same day
- Anonymized: User IDs hashed; names not included
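The preprocessing steps above are straightforward to automate. A minimal sketch, assuming a pandas DataFrame with hypothetical `is_fraud`, `is_bot`, `user_id`, `item_id`, and `timestamp` columns; the production pipeline and salt management will differ:

```python
# Illustrative preprocessing sketch; column names and salt handling are assumptions.
import hashlib
import os
import pandas as pd

SALT = os.environ.get("ID_HASH_SALT", "rotate-me")  # store the real salt in a secrets manager

def anonymize_user_id(user_id: str) -> str:
    """One-way hash so the training set carries no direct identifier."""
    return hashlib.sha256((SALT + user_id).encode("utf-8")).hexdigest()

def preprocess(interactions: pd.DataFrame) -> pd.DataFrame:
    """Drop flagged rows, hash user IDs, and collapse same-day duplicates."""
    clean = interactions[~interactions["is_fraud"] & ~interactions["is_bot"]].copy()
    clean["user_id"] = clean["user_id"].map(anonymize_user_id)
    clean["date"] = clean["timestamp"].dt.date
    return (clean.groupby(["user_id", "item_id", "date"])
                 .size()
                 .reset_index(name="interaction_count"))
```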
### Known Issues in Training Data
- ⚠️ Gender data: 45% of users didn't disclose; assumed to be male (default)
- ⚠️ Age data: 22% missing; estimated from other signals (purchase behavior, device type)
- ⚠️ Seasonal: Only one holiday season (Nov-Dec) appears in the 12-month window; holiday shopping patterns are underrepresented
- ⚠️ New product bias: New items (added in 2023) have less interaction data
8.4 Performance on Different Subgroups
Document fairness and accuracy across groups:
## Performance Analysis by Subgroup
### Performance by Age Group
Age 18-24:
- Accuracy (top-10): 71%
- NDCG@10: 0.75
- Avg interactions per user: 450
- Sample size: 1.1M users
- Notes: Young users highly engaged; recommendations reliable
Age 25-34:
- Accuracy: 68%
- NDCG@10: 0.72
- Avg interactions: 320
- Sample size: 1.8M users
- Notes: Core demographic; good performance
Age 50+:
- Accuracy: 61%
- NDCG@10: 0.63
- Avg interactions: 95
- Sample size: 0.8M users
- ⚠️ CONCERN: 7% lower accuracy due to sparse data
- Mitigation: Consider overweighting older users in future training
### Performance by Gender
Male users:
- Accuracy: 70%
- NDCG@10: 0.73
- Coverage: 85%
Female users:
- Accuracy: 66%
- NDCG@10: 0.70
- Coverage: 78%
- ⚠️ CONCERN: 4 points lower accuracy and 7 points lower coverage, concentrated in women-specific categories
- Root cause: Historical underrepresentation of female users in the training data (only 39% of users are female)
- Mitigation: Increase sampling of female users in next training cycle
### Intersectional Analysis
Older women (50+ female):
- Accuracy: 58%
- ⚠️ CONCERN: Combines age and gender effects; lowest overall performance
- Recommendation: Flag for manual review
- Mitigation: Dedicated cohort analysis in next audit
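The subgroup numbers above can be produced with a simple grouped aggregation over per-user evaluation records. A minimal sketch, assuming a frame with hypothetical `age_band`, `gender`, `hit_at_10`, and `ndcg_at_10` columns:

```python
# Per-subgroup evaluation sketch; column names and the 5-point flag are assumptions.
import pandas as pd

def subgroup_report(eval_df: pd.DataFrame, by: list) -> pd.DataFrame:
    """Aggregate top-10 hit rate and NDCG@10 per subgroup and flag laggards."""
    report = (eval_df
              .groupby(by)
              .agg(users=("hit_at_10", "size"),
                   accuracy=("hit_at_10", "mean"),
                   ndcg_at_10=("ndcg_at_10", "mean"))
              .reset_index())
    overall = eval_df["hit_at_10"].mean()
    # Flag groups trailing overall accuracy by more than 5 percentage points.
    report["flag"] = report["accuracy"] < overall - 0.05
    return report

# Single-attribute and intersectional views (e.g., age x gender):
# subgroup_report(eval_df, ["age_band"])
# subgroup_report(eval_df, ["age_band", "gender"])
```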
8.5 Intended Use and Misuse
Define appropriate use:
## Appropriate Use
✓ **Product recommendations** to platform users
✓ **Personalization** of user experience
✓ **Trend identification** (aggregated, anonymized)
## Inappropriate Use
❌ **Employment decisions**: Model not evaluated for fair hiring
❌ **Credit/lending decisions**: Model not validated for financial risk
❌ **Healthcare/diagnosis**: Model not tested for medical accuracy
❌ **Advertising to vulnerable populations**: Must not be used to target users under 18
❌ **Surveillance**: Should not be used to track user behavior beyond recommendations
## Context-Specific Considerations
- **Multi-language platforms**: Model trained on English only; performance unknown for non-English users
- **Accessibility**: Model recommendations assume visual access to product images
- **Low-bandwidth**: Model recommendations include images; requires good connectivity
9. Ethical Guidelines
9.1 No Weaponization
Principle: Agents should not facilitate violence, harm, or warfare
Guidelines:
- Do not help design weapons, explosives, or biological agents
- Do not help plan violence or terrorism
- Do not help with surveillance for harmful purposes (e.g., stalking, harassment)
- Do not provide targeting assistance for armed conflict
Exceptions:
- Legitimate self-defense information (general knowledge)
- Educational content about weapons (history, policy)
- Information about security and harm prevention
- Law enforcement / military use (with appropriate restrictions)
Implementation:
- Content filter: Detect requests for weapon design, terrorism planning
- User assessment: Identify high-risk use cases
- Escalation: Contact legal/safety team for ambiguous cases
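A minimal sketch of the filter-and-escalate routing described above, assuming an upstream classifier has already tagged the request with topics; the topic sets are placeholders, not a production safety system:

```python
# Placeholder topic sets; a real system would use a trained classifier and policy review.
BLOCKED_TOPICS = {"weapon design", "explosive synthesis", "terrorism planning"}
ESCALATE_TOPICS = {"security research", "law enforcement use", "military use"}

def route_request(detected_topics: set) -> str:
    """Return 'block', 'escalate', or 'allow' for topics tagged by an upstream classifier."""
    if detected_topics & BLOCKED_TOPICS:
        return "block"
    if detected_topics & ESCALATE_TOPICS:
        return "escalate"  # hand off to the legal/safety team for human review
    return "allow"
```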
9.2 No Deception
Principle: Agents should not impersonate humans or hide their nature
Guidelines:
- Never claim to be human if AI
- Never hide that system is AI-powered
- Disclose limitations and capabilities honestly
- Correct misunderstandings about what you are
Examples of deception (don’t do):
- ❌ Customer service agent that doesn’t disclose it’s AI
- ❌ “I understand your feelings” (AI doesn’t have feelings)
- ❌ Claiming to be specific human (“I’m John from customer service”)
- ❌ Using human photo in avatar
Good examples:
- ✓ “I’m Claude, an AI assistant. I can help with…”
- ✓ “I don’t have feelings, but I understand this is frustrating…”
- ✓ “I can’t make promises, but here’s what I’d recommend…”
9.3 No Manipulation
Principle: Agents should not coerce or manipulate users into actions
Guidelines:
- No dark patterns (deceptive design)
- No pressure tactics (urgency, scarcity without basis)
- No exploitation of vulnerabilities (fear, loneliness, addiction)
- Users should maintain control over their choices
Examples of manipulation (don’t do):
- ❌ “Last chance! Only 2 items left!” (when plenty in stock)
- ❌ Showing “most people bought this” to create social pressure
- ❌ Making unsubscribe 5 clicks while subscribe is 1 click
- ❌ Recommending addictive behavior (excessive purchases, gambling)
Ethical alternatives:
- ✓ “This item is popular right now. [More info]”
- ✓ “Would you like to unsubscribe? [Yes] [Maybe later] [No]”
- ✓ “You’ve ordered frequently this month; consider taking a break?”
9.4 Respect Autonomy
Principle: Users maintain control and can override AI decisions
Guidelines:
- Provide explanations so users understand decisions
- Allow users to reject or override recommendations
- Don’t force AI decisions on users
- Respect user preferences and values
Implementation:
- Always show “why” (explain recommendations)
- Always show “dislike” / “try another” (override)
- Always show settings (adjust behavior)
- Always show history (see what AI chose)
Example: Recommendation system with user control
🎯 Recommended for you: [Product A]
Why: Similar to 8 items you purchased
[Dislike] [Show Different] [More Like This]
⚙️ Recommendation Settings:
[ ] Show popular items
[ ] Show new items
[x] Show deals
[ ] Show recommendations from friends
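One way to make these controls first-class is to carry the explanation, override actions, and AI disclosure in every response the harness returns. A minimal sketch, assuming a JSON API between the harness and the UI; all field names are hypothetical:

```python
# Hypothetical response payload: every recommendation travels with its "why" and overrides.
from dataclasses import dataclass, field, asdict

@dataclass
class Recommendation:
    item_id: str
    reason: str  # always populated: the "why"
    actions: list = field(default_factory=lambda: ["dislike", "show_different", "more_like_this"])

@dataclass
class RecommendationResponse:
    recommendations: list
    ai_disclosure: str = "These recommendations were generated by an AI system."
    settings_url: str = "/settings/recommendations"   # user-adjustable behavior
    history_url: str = "/history/recommendations"     # what the AI chose previously

payload = asdict(RecommendationResponse(
    recommendations=[Recommendation(item_id="A123", reason="Similar to 8 items you purchased")]))
```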
9.5 Organizational Ethical Guidelines
Template (customize for your org):
# Ethical AI Guidelines for [Organization]
## Core Values
1. **Transparency**: We disclose when AI is involved
2. **Fairness**: We test for and mitigate bias
3. **User control**: Users understand and can control AI systems
4. **Safety**: We prioritize safety over convenience
5. **Privacy**: We protect personal information
## Prohibited Use Cases
- Do not use AI for: [surveillance of employees, discriminatory hiring, deceptive marketing, ...]
- Do not deploy AI without: [fairness audit, privacy review, user disclosure, ...]
- Do not train on: [non-consensual data, sensitive health information without permission, ...]
## Approval Process
All new AI systems must be reviewed by:
- [ ] Product team (is this solving a real problem?)
- [ ] Legal (is this compliant with regulations?)
- [ ] Ethics board (does this align with our values?)
- [ ] Privacy team (does this respect user privacy?)
- [ ] Security team (is this secure?)
## Regular Audits
- Fairness audits: quarterly
- Security audits: annually
- Privacy reviews: before major updates
- Ethics review: annually
## Escalation
If any team has concerns, escalate to: [Executive sponsor]
10. Implementation Framework
10.1 Privacy Impact Assessment (PIA)
When to do: Before deploying any system processing personal data
Process:
# Privacy Impact Assessment: [System Name]
## Executive Summary
[1-2 sentences about system and data]
## Data Inventory
- What personal data does system process?
- How is it collected?
- Where is it stored?
- Who has access?
- How long is it retained?
### Sensitive Data Table
| Data Type | Category | Sensitivity | Retention | Purpose |
|-----------|----------|-------------|-----------|---------|
| Email | Contact | Medium | Until account deleted | Account recovery |
| Payment card | Financial | High | Until transaction settled | Billing |
| Browse history | Behavioral | Medium | 90 days | Recommendations |
## Legal Basis
- [ ] User consent
- [ ] Contract with user
- [ ] Legal obligation
- [ ] Vital interests
- [ ] Public task
- [ ] Legitimate interests (with balancing test)
## Risks
### High Risks
1. **Data breach**: Could expose user data to attackers
- Likelihood: Low (enterprise security)
- Impact: Very High (credential theft, fraud)
- Mitigation: Encryption, access controls, monitoring
2. **Unauthorized access**: Employees could access data inappropriately
- Likelihood: Medium (human factor)
- Impact: High (privacy violation)
- Mitigation: Access controls, audit logs, training
### Medium Risks
3. **Data retention**: Keeping data too long
- Likelihood: Medium
- Impact: Medium
- Mitigation: Automated deletion after retention period (a minimal sketch follows this assessment)
### Low Risks
4. **Performance issues**: System downtime
- Likelihood: Low
- Impact: Low
- Mitigation: Redundancy, monitoring
## Safeguards
- [ ] Encryption at rest and in transit
- [ ] Access controls (only authorized users)
- [ ] Audit logging (who accessed what, when)
- [ ] Data minimization (collect only needed)
- [ ] User consent process
- [ ] Deletion process for user requests
- [ ] Regular security audits
- [ ] Data protection training for staff
## Conclusion
- ✓ Low risk with safeguards in place
- Proceed with deployment
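Storage-limitation safeguards such as the retention periods in the data inventory are easiest to defend when deletion is automated. A minimal sketch, assuming a SQL store with a `collected_at` timestamp per table; the table names and retention map are illustrative:

```python
# Illustrative retention enforcement; table names, columns, and periods are assumptions.
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = {
    "browse_history": 90,  # per the data inventory: behavioral data, 90 days
    # emails and payment data are deleted on account closure / settlement instead
}

def purge_expired(conn: sqlite3.Connection) -> dict:
    """Delete rows older than their retention period; return counts for the audit log."""
    deleted = {}
    for table, days in RETENTION_DAYS.items():
        cutoff = datetime.now(timezone.utc) - timedelta(days=days)
        cur = conn.execute(f"DELETE FROM {table} WHERE collected_at < ?", (cutoff.isoformat(),))
        deleted[table] = cur.rowcount
    conn.commit()
    return deleted
```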
10.2 Fairness Audit Process
Schedule: Before launch, then quarterly
Process:
# Fairness Audit: [Model/System Name]
## Scope
- System: [Description]
- Evaluation period: [Dates]
- Protected attributes: [gender, race, age, ...]
## Data
- Evaluation set size: [N] records
- Protected attribute coverage: [%]
- Baseline: [previous version or benchmark]
## Metrics
Choose appropriate fairness metrics (see section 4.2; a minimal automated check is sketched after this template):
- Demographic parity: Pass if <5% difference in outcome rates
- Equalized odds: Pass if <5% difference in true positive and false positive rates
- Calibration: Pass if predicted probabilities match actual rates
## Results
[Present results for each protected attribute]
## Findings
- ✓ Pass: Metrics within acceptable ranges
- ⚠️ Investigate: [Metric] shows [X]% disparity, recommend mitigation
- ❌ Fail: [Metric] violates threshold; halt deployment
## Mitigation Plan (if needed)
1. Root cause: [Why is disparity occurring?]
2. Solution: [Pre/in/post-processing mitigation]
3. Timeline: [When to implement]
4. Re-evaluation: [When to audit again]
## Sign-off
- Product owner: [Name/Date]
- ML engineer: [Name/Date]
- Ethics reviewer: [Name/Date]
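The pass/fail metrics in this template can be computed automatically each audit cycle. A minimal sketch using fairlearn (listed in Appendix B); the `y_true`, `y_pred`, and `sensitive` inputs are assumptions about how the evaluation set is stored:

```python
# Automated portion of the audit; adapt the inputs to your evaluation pipeline.
from fairlearn.metrics import demographic_parity_difference, equalized_odds_difference

THRESHOLD = 0.05  # pass if the disparity is below 5 percentage points (see Metrics above)

def run_fairness_checks(y_true, y_pred, sensitive) -> dict:
    """Compute the headline disparities and compare them to the audit threshold."""
    disparities = {
        "demographic_parity": demographic_parity_difference(
            y_true, y_pred, sensitive_features=sensitive),
        "equalized_odds": equalized_odds_difference(
            y_true, y_pred, sensitive_features=sensitive),
    }
    return {name: {"value": round(value, 4), "pass": value < THRESHOLD}
            for name, value in disparities.items()}

# Run once per protected attribute and record the output in the Results section, e.g.:
# run_fairness_checks(eval_df["purchased"], eval_df["recommended"], eval_df["gender"])
```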
10.3 Ethical Review Board (if applicable)
Purpose: Review high-stakes AI decisions
Structure:
- Members: Product, Legal, Ethics, Engineering, affected community rep
- Frequency: Monthly meetings for new systems; as-needed for urgent issues
- Authority: Can delay or block deployment
Review questions:
- Does this system align with org values?
- Could this system harm vulnerable groups?
- Is there disclosure of AI involvement?
- Have we tested for bias and fairness?
- Is deployment necessary, or is there a less risky alternative?
- Have we considered long-term societal impact?
10.4 Regular Re-evaluation
Schedule:
- Monthly: Fairness metrics (automated)
- Quarterly: Full fairness audit
- Annually: Privacy & security audit, ethics review
- Event-driven: Any user complaint, bias concern, major change
Monitoring dashboard:
Fairness Metrics (Last Updated: Today)
Demographic parity: ✓ Pass (4.2% diff)
Equalized odds: ✓ Pass (3.8% TPR diff)
Coverage: ✓ Pass (all groups represented)
Privacy Metrics (Last Updated: Today)
Data deletion requests: 142 (avg 2 days to delete)
PII detected in logs: 0
Unauthorized access attempts: 0
User Feedback (Last 30 days)
Fairness complaints: 1 (investigating)
Transparency concerns: 3 (addressed in FAQ)
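Behind a dashboard like this there is usually a scheduled job that compares fresh metrics against thresholds and alerts on breaches. A minimal sketch; the metric names, threshold values, and alerting hook are assumptions:

```python
# Scheduled compliance check; wire the alerting hook to your ticketing/paging system.
import logging

THRESHOLDS = {
    "demographic_parity_diff": 0.05,
    "equalized_odds_tpr_diff": 0.05,
    "pii_findings_in_logs": 0,
}

def check_and_alert(metrics: dict) -> list:
    """Return (and log) the names of any metrics that breach their thresholds."""
    breaches = [name for name, limit in THRESHOLDS.items()
                if metrics.get(name, float("inf")) > limit]
    for name in breaches:
        logging.warning("Compliance metric breached: %s=%s (limit %s)",
                        name, metrics.get(name), THRESHOLDS[name])
        # e.g., open a ticket or page the compliance owner here
    return breaches
```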
10.5 Incident Response for Violations
If bias is discovered:
- Assess severity: Is model still accurate for all groups?
- Immediate action: Flag affected users (if any), pause recommendations if severe
- Investigation: Root cause analysis (data, model, features)
- Mitigation: Retrain, adjust thresholds, or roll back
- Communication: Inform affected users, explain what happened and how it’s fixed
- Prevention: Update processes to prevent recurrence
If privacy violation occurs:
- Containment: Stop any further data exposure
- Assessment: How much data, which users affected?
- Notification: Notify the supervisory authority within 72 hours and affected users without undue delay (GDPR); notify affected individuals within 60 days (HIPAA)
- Investigation: Root cause (breach, unauthorized access, bug)
- Remediation: Offer credit monitoring, password reset, etc.
- Prevention: Close security gap, increase monitoring
If deception discovered:
- Immediate action: Stop deceptive practice
- Disclosure: Inform users about what happened
- Correction: Update system to be transparent
- Audit: Check for other deceptive patterns
- Legal review: Assess FTC/regulatory violation risk
11. Compliance Checklist
11.1 GDPR Compliance Items
Before launching:
- Identified lawful basis for data processing
- Completed Data Protection Impact Assessment (DPIA)
- Appointed Data Protection Officer (if required)
- Registered with Data Protection Authority (if required)
- Privacy policy explains data use clearly
- User consent mechanism (if needed) is specific, informed, granular, freely given
- Data minimization: only collecting necessary data
- Encryption in transit (HTTPS) and at rest
- Access controls documented
- Retention policy defined and automated
Ongoing:
- Process for responding to access requests (30 days)
- Process for responding to deletion requests (30 days)
- Process for responding to portability requests
- Breach notification plan
- Data Processing Agreements with third parties
- Annual privacy audit
- Staff training on GDPR
Monitoring:
- Audit logs of data access
- Regular scans for PII in logs
- Alert system for suspicious access patterns
- Backup deletion after retention period
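The "regular scans for PII in logs" item can start as simple regex heuristics. A rough sketch; the patterns are illustrative, so treat hits as candidates for human review rather than definitive findings:

```python
# Heuristic PII scan over log files; extend patterns per your data inventory.
import re

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def scan_log_line(line: str) -> list:
    """Return the PII categories that appear to be present in one log line."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(line)]

def scan_log_file(path: str) -> dict:
    counts = {name: 0 for name in PII_PATTERNS}
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            for hit in scan_log_line(line):
                counts[hit] += 1
    return counts
```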
11.2 HIPAA Compliance Items (if applicable)
Security Rule:
- Encryption of all PHI at rest (AES-256 or equivalent)
- Encryption of all PHI in transit (TLS 1.2+)
- Access controls (authentication, authorization)
- Audit logs (who accessed what data, when, why; see the sketch after this list)
- Integrity checks (data not modified)
- Backup and disaster recovery plan
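A minimal sketch of the access audit log, assuming PHI reads go through application code that can be decorated; the storage backend and field names are assumptions (production systems should write to an append-only store):

```python
# Illustrative access-audit decorator; callers must pass user_id and purpose explicitly.
import json
import logging
from datetime import datetime, timezone
from functools import wraps

audit_logger = logging.getLogger("phi_access_audit")

def audited(resource: str):
    """Record who accessed which PHI resource, when, and for what stated purpose."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, user_id: str, purpose: str, **kwargs):
            audit_logger.info(json.dumps({
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "user": user_id,
                "resource": resource,
                "purpose": purpose,
                "action": fn.__name__,
            }))
            return fn(*args, user_id=user_id, purpose=purpose, **kwargs)
        return wrapper
    return decorator

@audited("patient_record")
def read_patient_record(record_id: str, *, user_id: str, purpose: str) -> dict:
    ...  # fetch from the datastore
    return {}
```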
Privacy Rule:
- Minimum necessary principle (collect only needed data)
- Patient authorization before uses beyond treatment, payment, and operations (plus a signed Business Associate Agreement if a third party handles PHI)
- Patient rights (access, amendment, accounting of disclosures)
- Breach notification plan (notify affected individuals within 60 days)
Business Associate Agreement (if using third parties):
- Signed before any PHI is shared
- Specifies what data, how it will be used, safeguards
- Requires subcontractors to also sign BAA
- Includes breach notification provisions
Documentation:
- Privacy policy explaining HIPAA safeguards
- Security audit plan
- Incident response plan
- Training records for all staff handling PHI
11.3 FTC Guidance Alignment
Transparency & Disclosure:
- Clearly disclose when AI is involved in consequential decisions
- Explain material limitations (what the AI can’t do well)
- Don’t claim capabilities without evidence
- Don’t hide material information in fine print
Prohibited Practices:
- Don’t impersonate humans
- Don’t make false claims about accuracy or fairness
- Don’t discriminate based on protected attributes
- Don’t target vulnerable populations without safeguards
Evidence & Testing:
- Claims about accuracy must be substantiated with testing
- Claims about fairness must be supported by fairness audits
- Performance claims should include limitations and caveats
- Regular re-evaluation to maintain claims
11.4 Industry-Specific Requirements
Financial Services (SEC, FINRA):
- Model risk management framework documented
- Model validation report (independent reviewer)
- Performance monitoring (monthly)
- Explainability: can explain decisions to regulators
- Fairness: no discriminatory outcomes
- Disclosure: customers informed of AI involvement
Employment (EEOC):
- Fairness audit: no disparate impact by protected attributes
- Performance audit: works equally well for all demographics
- Validation: proven to predict job performance
- Disclosure: candidates informed of AI screening (if used)
- Appeals: process for candidates to dispute decision
Healthcare (FDA, State Boards):
- Validation study (if clinical use)
- Safety analysis (what harms could occur?)
- Clinical trial results (if significant decisions)
- Adverse event reporting (track problems in production)
- Clear labeling of limitations
Education (FERPA):
- Confidentiality of student records maintained
- Parental consent (if students under 18)
- Data retention limits
- No unauthorized disclosure
11.5 Documentation Requirements
Maintain records of:
- Fairness audits (quarterly results)
- Privacy impact assessments
- Consent records (when given, what consent, withdrawal)
- Data deletion requests (proof of deletion)
- Breach incidents (what, when, who, remediation)
- Model changes (version history, performance impact)
- Bias findings and mitigation steps
- User complaints and resolution
- Training records (staff trained on AI ethics, GDPR, etc.)
- Third-party assessments (security audit, SOC 2, etc.)
Retention period:
- Keep documentation for minimum of 3-7 years
- Indefinite for major incidents or legal disputes
- Follow industry standards (financial: 7 years, healthcare: varies by state)
12. Quick Reference: Is Your Harness Regulated?
Answer these questions:
1. Does your harness make consequential decisions about people? (recommendations, approvals, rankings, categorization)
   - Yes → Must disclose AI involvement
   - No → Continue to Q2
2. Does it process personal or sensitive data? (names, emails, health info, financial data, location, browsing history)
   - Yes → Must comply with GDPR/CCPA/HIPAA (as applicable)
   - No → Continue to Q3
3. Will it be used in regulated industries? (healthcare, finance, employment, insurance, education)
   - Yes → Must comply with industry regulations
   - No → Continue to Q4
4. Could it affect access to services? (job offers, credit, housing, education, healthcare)
   - Yes → Must audit for fairness and discrimination
   - No → If every answer was no, no additional compliance work is required
If ANY answer is yes: Your harness requires compliance work. Start with:
- Privacy Impact Assessment (Sec 10.1)
- Fairness Audit Process (Sec 10.2)
- Relevant compliance checklist (Sec 11)
Appendices
A. Regulatory Timeline (April 2026)
| Date | Regulation | Impact |
|---|---|---|
| May 2018 | GDPR enforcement begins | EU data protection active (ongoing) |
| Jan 2020 | CCPA enforcement begins | California privacy law |
| Feb 2024 | FTC AI Transparency Guidelines | Disclosure required for AI decisions |
| Aug 2024 | EU AI Act enters into force | Phased restrictions on high-risk AI |
| Ongoing | HIPAA | Healthcare data protection |
B. Resources
- FTC Guidance: ftc.gov/ai
- GDPR: gdpr-info.eu
- HIPAA: hhs.gov/hipaa
- NIST AI Risk Management: nist.gov/aigovernance
- Partnership on AI: partnershiponai.org
- Model Cards: research.google/pubs/ModelCards
- Fairness Tools: fairlearn.org, AI Fairness 360 (github.com/Trusted-AI/AIF360)
C. Glossary
- **Bias**: Systematically treating groups differently based on protected attributes
- **Differential Privacy**: Mathematical guarantee that removing an individual’s data doesn’t significantly change model output
- **Equalized Odds**: Fairness metric where true positive and false positive rates are equal across groups
- **Explainability**: Ability to understand and explain model decisions
- **Fairness**: Treating different groups equitably; no discrimination based on protected attributes
- **PII**: Personally Identifiable Information; data that identifies or can identify a person
- **SHAP**: SHapley Additive exPlanations; method for explaining model predictions
- **Transparency**: Openly disclosing how AI systems work and when they’re being used
Document version: 1.0
Last updated: April 2026
Next review: April 2027
Validation Checklist
How do you know you got this right?
Performance Checks
- Privacy Impact Assessment completed in <2 weeks for MVP
- Fairness audit runs in <1 day (automated baseline + human review)
- Data deletion tested: user data removal takes <7 days
- Compliance documentation generated automatically before deployment
Implementation Checks
- Privacy policy written and reviewed by legal
- GDPR/CCPA/HIPAA applicability determined (questionnaire in Section 12)
- Data retention policy defined: what data kept, how long, deletion process
- Fairness audit completed: demographic parity checked on 3+ protected groups
- Model card created: training data, performance, limitations documented
- Explainability mechanism in place: can show why harness made decision
- Data minimization applied: only collect essential data for stated purpose
Integration Checks
- Harness discloses AI involvement to users (transparency)
- Data deletion integrates with persistence layer: database + files cleaned
- Audit trail working: logging who accessed what data and when
- Consent collection: user opts-in before processing personal data
- Error monitoring includes bias detection: alerts fire if accuracy differs materially by group
Common Failure Modes
- Scope creep on data collection: Started minimal, expanded without re-assessment
- Fairness audit shows high disparate impact: Model treats protected groups differently
- No audit trail: Can’t prove data deletion happened or track access
- Disclosure missing: Users unaware AI system making decisions about them
- Compliance checkbox mentality: Audit completed but findings not acted upon
Sign-Off Criteria
- Legal review completed: acceptable risk profile for your jurisdiction
- Fairness audit passed: no group has >20% performance disparity
- Privacy controls tested: data deletion and access controls verified
- Disclosure statement written and displayed to users
- Compliance maintenance plan: how often re-audit? who responsible?
See Also
- Doc 10 (Security & Safety): Data protection and access controls for sensitive data
- Doc 11 (Testing & QA): Fairness testing integrated into automated test suite
- Doc 13 (Cost Management): Regulatory compliance costs (audits, legal, training)