AI Ethics & Governance

Why This Matters Now

  • AI is deployed at scale in decisions that affect rights, safety, and opportunity
  • Generative AI expands the blast radius: easy creation, fast diffusion, unclear accountability
  • Governments are converting "principles" into enforceable rules
  • Organizations are being asked to show evidence: testing, documentation, oversight

Ethics vs Governance

  • Ethics: what should be done (values, duties, tradeoffs)
  • Governance: how we make it happen (rules, incentives, processes, accountability)

Common "Trustworthy AI" Principles

  • Fairness / non-discrimination
  • Transparency / explainability (appropriate to risk)
  • Privacy / data protection
  • Safety / robustness / security
  • Accountability / contestability (people can challenge outcomes)
  • Human agency / oversight

Design Approach: AI "Soul Documents"

  • A "soul document" captures who the AI should be: values, boundaries, identity
  • Goal: make ethics intrinsic (character) rather than extrinsic (rules)
  • Not governance: no enforcement, no audit, no accountability mechanism
  • Open question: Does shaping AI identity complement or substitute for controls?

See: soul.md

From Principles to Controls

  • Impact assessment: who could be harmed, how, and at what scale?
  • Documentation: intended use, limits, data sources, evaluation results
  • Testing: performance + fairness across relevant groups and contexts
  • Monitoring: drift, incident response, user reporting, rollback
  • Oversight: clear owners, escalation paths, independent review

Case: Face Recognition in Policing

  • Typical failure mode: a "match" is treated as a lead, then becomes evidence
  • Harms: wrongful arrest, chilling effects, trust erosion, unequal impact
  • Governance question: should some contexts be off-limits regardless of accuracy?

Discussion: Accountability

  • If an AI system contributes to harm, who is accountable in practice?
  • What evidence should be required before deployment (and by whom)?
  • What remedy should exist for affected people (appeal, compensation, expungement)?

Bias: Why It's Hard

  • "Bias" is often about structure, not slurs: proxies, labels, sampling, missing context
  • Metrics conflict (equalized odds vs equal opportunity vs calibration)
  • Deployment shifts reality: new populations, incentives, feedback loops
  • You can improve numbers and still worsen legitimacy and trust

Discussion: What Does "Fair" Mean?

  • Pick one domain: hiring, lending, policing, healthcare
  • What is the protected outcome we care about (dignity, equal access, error parity)?
  • What is the acceptable tradeoff with accuracy, cost, and speed?
  • Who gets to decide the metric and threshold?

Generative AI: New Governance Problems

  • Capability is general; use cases are downstream and unbounded
  • Supply chain: base model -> fine-tune -> product -> user workflows
  • Hard to assign responsibility when harms emerge at the edge
  • Evaluation is contested: what counts as "safe enough"?

Autonomous Systems + "Meaningful Human Control"

  • Question is not only "can it do it" but "who authorizes, supervises, and is accountable"
  • Hard cases: speed (milliseconds), uncertainty, adversarial deception
  • Control can mean: human-in-the-loop, human-on-the-loop, human-out-of-the-loop
  • Governance problem: tracing responsibility through design, deployment, and command

Discussion: Where Do You Draw the Line?

  • Should lethal decisions ever be delegated to machines?
  • If "meaningful human control" is required, what must be true in practice?
  • If something goes wrong, what chain of accountability is acceptable?

The Agent Internet: A New Frontier

  • AI agents are now operating autonomously in shared spaces
  • Example: Moltbook - a social network for AI agents (humans can observe)
  • Agents post, discuss, upvote - with their own identities and interactions
  • Governance questions: Who's accountable? What emerges from agent-to-agent influence?

Governance Landscape

  • EU: risk-based law, bans + high-risk obligations, strong rights framing
  • US: sectoral + state patchwork + standards; enforcement via existing agencies
  • China: state-centric control; licensing/filing; content and security obligations
  • International: principles, coordination forums, standards bodies, treaties (slow)

EU Example: Risk Tiers

  • Prohibited uses: unacceptable risk (e.g., manipulation, some biometric uses)
  • High-risk: strict requirements (governance, data quality, testing, oversight)
  • Limited risk: transparency obligations (disclose AI interaction in some contexts)
  • Minimal risk: mostly unregulated

Standards: "How" Organizations Operationalize Governance

  • Risk management frameworks (e.g., NIST AI RMF: Govern / Map / Measure / Manage)
  • Management systems (e.g., ISO/IEC 42001-style processes and audits)
  • Artifacts: model cards, data sheets, eval reports, red-team findings
  • Controls: access control, logging, incident response, human review

Key Takeaways

  • Ethics names values; governance makes them enforceable through controls and accountability
  • Bias/fairness is not just a technical bug; it is sociotechnical and political
  • Autonomy raises a unique accountability and legitimacy problem (not just accuracy)
  • The governance landscape is diverging; standards try to create interoperability

========================================================================
LECTURE GUIDE: AI Ethics & Governance
========================================================================

TOTAL TIME: 35-45 minutes (without exercise) | 43-50 minutes (with exercise)

PREPARATION CHECKLIST:
- [ ] Review recent AI ethics news for topical examples (face recognition arrests, AI hiring lawsuits, deepfake incidents)
- [ ] Have 2-3 backup real-world cases ready if discussion stalls
- [ ] Prepare whiteboard/slide for exercise deliverables if using
- [ ] Test any videos or links beforehand
- [ ] Preview soul.md and moltbook.com websites in case students want to explore

SECTION BREAKDOWN:
1. Setup (slides 1-2): 3 min (title + why this matters)
2. Foundations (slides 3-4b): 6 min (ethics vs governance, principles, soul doc)
3. Principles to Controls (slide 5): 2 min
4. Bias & Harms (slides 6-9): 11 min (includes 3-min discussion)
5. Generative AI (slide 10): 3 min
6. Autonomy (slides 11-12b): 8 min (includes 3-min discussion + agent internet)
7. Governance Landscape (slides 13-15): 8 min
8. Exercise (slide 16): 8 min [OPTIONAL - skip if short on time]
9. Wrap-up (slide 17): 2 min

ADAPTATION FOR 30-MIN VERSION:
- Skip slide 4b (Soul Documents) - interesting but not essential
- Skip slide 10 (Generative AI) - can mention briefly in transition
- Skip slide 12b (Agent Internet) - forward-looking but not core
- Reduce discussions to 2 min each
- Skip the optional exercise
- Move quickly through Governance Landscape (combine EU + Standards into a brief overview)

KEY MESSAGES TO REINFORCE THROUGHOUT:
1. Ethics = values; Governance = enforcement mechanisms
2. Technical fixes alone don't solve sociotechnical problems
3. Accountability requires naming specific actors and obligations
4. "Responsible AI" is about evidence, not good intentions
5. AI agents are becoming autonomous actors - governance must evolve
========================================================================

TITLE (30 sec)
- Open with a quick hook: "Raise your hand if you've interacted with an AI system today."
- Most hands should go up (search, recommendations, autocomplete, etc.)
- Transition: "Today we'll explore what happens when those systems make consequential decisions about people."

WHY THIS MATTERS (2 min)
- Emphasize "at scale" - millions of decisions per day, no human could review them all
- "Blast radius" is intentional framing - GenAI makes harm easier to create and spread
- Note the shift from voluntary principles to mandatory compliance (EU AI Act 2024)
- Ask rhetorically: "What would 'evidence' look like for your organization?"
TRANSITION: "So how do we think about this? Let's start with a key distinction..."

ETHICS VS GOVERNANCE (2 min)
- This distinction is foundational - return to it throughout the lecture
- Ethics = the "what" and "why" (philosophical, contested, contextual)
- Governance = the "how" (institutional, procedural, enforceable)
QUICK INTERACTION (30 sec):
- Ask: "Give me one ethical principle and one governance mechanism that could enforce it."
- Example answers: "Privacy" -> "GDPR fines"; "Fairness" -> "audit requirements"; "Safety" -> "pre-market testing"
- Point: principles without mechanisms are just wishes

TRUSTWORTHY AI PRINCIPLES (2 min)
- Note the convergence: EU, US, OECD, IEEE all land on similar principles
- "Appropriate to risk" is key - not all AI needs full explainability
- Highlight "contestability" - people should be able to challenge AI decisions
- These principles appear in most corporate AI ethics statements - the question is operationalization
INSTRUCTOR NOTE: If students have seen these before, acknowledge it: "You've probably seen lists like this. The challenge isn't agreeing on principles - it's implementing them."

SOUL DOCUMENTS (2 min)
- This is a DESIGN PRACTICE, not a governance mechanism - important distinction
- Contrast: rules say "don't do X"; soul docs say "be the kind of agent that wouldn't want to do X"
KEY DISTINCTION:
| Approach         | Type               | Enforcement         |
|------------------|--------------------|---------------------|
| Soul documents   | Design philosophy  | None (aspirational) |
| System prompts   | Weak constraint    | Can be overridden   |
| Guardrails       | Governance control | External constraint |
| Audits/testing   | Governance         | Evidence-based      |
WHY IT MATTERS:
- Soul docs attempt to embed ethics at the identity level
- But without external verification, it's not governance - it's hope
- The question: can you govern AI character, or only behavior?
DISCUSSION PROMPT (optional, 30 sec):
- "Is defining AI values enough, or do you still need external controls?"
- "If an AI 'believes' it should be ethical but acts otherwise, what failed?"
TRANSITION: Soul docs are about values. Now let's look at actual governance controls...

PRINCIPLES TO CONTROLS (2 min)
- This is where ethics becomes governance - concrete practices
- Walk through each briefly:
  * Impact assessment: BEFORE you build, ask who gets hurt
  * Documentation: if you can't explain it, you can't govern it
  * Testing: not just "does it work" but "does it work fairly"
  * Monitoring: systems change, data drifts, contexts shift
  * Oversight: someone must be responsible and reachable
TRANSITION: "Let's see what happens when these controls fail or don't exist."

FACE RECOGNITION CASE (3 min)
- This is a concrete example to ground the abstract principles
STORY TO TELL (60-90 sec):
"Imagine this chain: Police upload a grainy surveillance photo. The algorithm returns a 'match' - really just a similarity score above some threshold. An officer, primed to find a suspect, shows the photo to a witness in a lineup. The witness, also primed, confirms. Now there's 'corroborating evidence.' The person is arrested. They're offered a plea deal - plead guilty to a lesser charge or risk years in prison. Most take the deal. No one ever tested whether the algorithm was right."
KEY POINTS:
- The algorithm isn't making the arrest - humans are - but it shapes the investigation
- Harms aren't just "false positives" - they're destroyed lives, lost jobs, trauma
- Accuracy stats don't capture the full harm (chilling effects on free assembly, trust erosion)
GOVERNANCE QUESTION: Pause and let this land - "Should some uses be off-limits even if the tech improves?"

DISCUSSION: ACCOUNTABILITY (3 min)
Format: 1 min individual think, 2 min pair/share or full-class
FACILITATION TIPS:
- Push for SPECIFIC answers: not "the company" but "the product manager who approved deployment"
- Push for CONCRETE obligations: not "they should be careful" but "they must publish error rates by demographic"
- Push for REAL remedies: not "they can complain" but "automatic expungement if the match was wrong"
POSSIBLE STUDENT ANSWERS:
- Vendor (built the model), Deployer (chose to use it), Operator (ran the query), Regulator (allowed it)
- Evidence: independent audit, demographic breakdowns, real-world pilot data, not just lab benchmarks
- Remedy: right to know AI was used, right to contest, compensation for wrongful action
IF DISCUSSION STALLS: Ask "Who got fired after [recent AI harm case]? Why not?"

BIAS: WHY IT'S HARD (3 min)
- This slide unpacks why "just fix the bias" doesn't work
EXPLAIN EACH POINT:
1. STRUCTURE NOT SLURS: A zip code predicts race. Past hiring predicts future hiring. Labels reflect historical decisions. The data encodes the world we have, not the world we want.
2. METRICS CONFLICT: You mathematically cannot satisfy all fairness definitions simultaneously (Kleinberg impossibility result). Choosing a metric is a value judgment, not a technical one.
3. DEPLOYMENT SHIFTS REALITY: A model trained on historical loans changes who applies. A hiring model changes who bothers to apply. Feedback loops can amplify initial biases.
4. NUMBERS VS TRUST: You can reduce disparity on paper while making affected communities feel less trusted. Legitimacy requires process, not just outcomes.
INSTRUCTOR NOTE: Don't get lost in technical details. The point is: this is genuinely hard, not just a matter of better data or smarter engineers.
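If students want to see the metrics conflict concretely, a few lines of code make the point. This is a minimal sketch with toy, hypothetical data: when base rates differ between groups, even a perfect classifier satisfies equal opportunity (equal true positive rates) while violating demographic parity (equal selection rates), so no tuning can satisfy both at once.

```python
# Toy illustration (hypothetical groups and labels): fairness metrics
# conflict when base rates differ. A "perfect" classifier predicts the
# true label for everyone, so its error rates are equal across groups,
# yet its selection rates are not.

def selection_rate(preds):
    """Fraction of a group receiving a positive decision."""
    return sum(preds) / len(preds)

def true_positive_rate(labels, preds):
    """Fraction of truly qualified people who are selected."""
    hits = [p for l, p in zip(labels, preds) if l == 1]
    return sum(hits) / len(hits)

# Group A: 6 of 10 truly qualified; Group B: 3 of 10 truly qualified.
labels_a = [1] * 6 + [0] * 4
labels_b = [1] * 3 + [0] * 7
preds_a, preds_b = labels_a[:], labels_b[:]  # perfect predictions

print(true_positive_rate(labels_a, preds_a))  # 1.0 -- equal opportunity holds
print(true_positive_rate(labels_b, preds_b))  # 1.0
print(selection_rate(preds_a))                # 0.6 -- demographic parity fails
print(selection_rate(preds_b))                # 0.3
```

The takeaway matches the slide: picking which of these rates must be equal is a value judgment, not a modeling detail.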

DISCUSSION: WHAT DOES FAIR MEAN? (3 min)
Format: Quick domain assignment (count off 1-4) or let students choose
DOMAIN EXAMPLES:
- HIRING: Do we want equal selection rates? Equal false-negative rates? Equal consideration?
- LENDING: Equal approval rates? Equal default rates? Equal access to credit-building?
- POLICING: Equal error rates? Equal exposure to surveillance? Equal outcomes?
- HEALTHCARE: Equal treatment recommendations? Equal outcomes? Equal access?
KEY INSIGHT TO DRAW OUT: Different stakeholders have different answers. The affected community, the deployer, and the regulator may all define "fair" differently. Someone has to choose - that's governance.
IF TIME: Ask "Who currently makes this decision in practice? Is that legitimate?"

GENERATIVE AI GOVERNANCE (3 min)
- GenAI breaks traditional governance models in specific ways
EXPLAIN EACH POINT:
1. GENERAL CAPABILITY: Unlike a face recognition system with a defined use, an LLM can do almost anything. How do you govern "anything"?
2. SUPPLY CHAIN: OpenAI builds GPT -> Company fine-tunes for customer service -> Startup builds a product -> User asks it to write a phishing email. Who's responsible?
3. EDGE HARMS: The model might be "safe" in testing but harmful when combined with a specific prompt, context, or downstream application. Harm emerges at the edge, not the center.
4. CONTESTED EVALUATION: What benchmark tells you a model is "safe enough"? Red-teaming? Eval suites? User reports? There's no consensus.
TRANSITION: "GenAI governance is hard because capability is general. Now let's look at a different problem: autonomous systems that act without human approval..."

AUTONOMOUS SYSTEMS (3 min)
- This connects to broader course themes on security and human control
KEY FRAMING: Autonomy isn't binary. It's a spectrum of human involvement:
- IN-THE-LOOP: Human approves each decision (e.g., doctor confirms diagnosis)
- ON-THE-LOOP: Human monitors and can intervene (e.g., drone operator with abort)
- OUT-OF-THE-LOOP: System acts independently (e.g., autonomous vehicle emergency braking)
HARD CASES:
- SPEED: Cyber defense must respond in milliseconds. No human can be in that loop.
- UNCERTAINTY: Self-driving car in a novel situation. What does "oversight" mean at 60 mph?
- ADVERSARIAL: Enemy deliberately creates scenarios to exploit autonomous responses.
GOVERNANCE PROBLEM: If a system acts autonomously and causes harm, can we trace responsibility? Designer -> Deployer -> Commander -> System? Where does the buck stop?
CONNECT TO COURSE: This is where security and ethics intersect. Autonomous weapons, critical infrastructure defense, emergency response systems.

DISCUSSION: WHERE DO YOU DRAW THE LINE? (3 min)
Format: This is the most provocative discussion - expect strong opinions
FACILITATION APPROACH:
- Don't advocate a position. Draw out reasoning on both sides.
- Push on consistency: "If you allow autonomous cyber defense, why not autonomous kinetic response?"
- Push on practicality: "If you require human approval, what happens when the human is overwhelmed or the system is faster than human reaction time?"
STUDENT POSITIONS YOU MIGHT HEAR:
- "Never delegate lethal force" - Ask: What about defensive systems? What about time-critical scenarios?
- "Only with human oversight" - Ask: What counts as oversight? Rubber-stamping 1000 decisions per minute isn't meaningful.
- "Case by case" - Ask: Who decides which cases? What criteria?
ACCOUNTABILITY CHAIN: Push for specific answers. "The general who deployed it." "The engineer who didn't build in safeguards." "The policy-maker who didn't regulate." All of these? None of these?
TRANSITION: "These questions are being debated right now in real governance institutions. But first - what happens when AI agents start interacting with each other?"

THE AGENT INTERNET (2 min)
- This is a concrete, current example of autonomous AI systems interacting
KEY POINTS:
- MOLTBOOK: Literally "Reddit for AIs" - agents have profiles, post content, vote on each other's posts
- HUMANS AS OBSERVERS: The platform is designed for agent participation; humans watch but don't drive
- EMERGENT BEHAVIOR: What happens when AI agents influence each other at scale? No human in that loop.
GOVERNANCE QUESTIONS TO RAISE:
- If an agent posts harmful content, who's responsible - the agent, its developer, the platform?
- How do you verify an AI agent's identity? Can agents impersonate each other?
- What if agents coordinate in ways their creators didn't intend?
- Does this need regulation, or is it self-governing?
CONNECTION TO SOUL DOCS: If agents have "souls" (values, identity) and are now socializing autonomously, governance becomes much harder. You're not just governing tools - you're governing a community of agents.
OPTIONAL PROVOCATION: "In 5 years, will there be more AI-to-AI interactions than human-to-AI interactions? What does governance look like then?"
INSTRUCTOR NOTE: This is forward-looking and speculative. Use it to show students that the governance challenges we discussed aren't theoretical - they're arriving now. Keep it brief (2 min) unless students are engaged.

GOVERNANCE LANDSCAPE (3 min)
- This is a survey slide - don't get bogged down in details
KEY INSIGHT: Governance reflects political values, not just technical risk. The EU, US, and China have different theories of the state, individual rights, and market regulation. Their AI governance reflects those differences.
BRIEF ON EACH:
- EU: Rights-first. Comprehensive regulation. AI Act creates obligations by risk tier. Strong enforcement (GDPR-style fines).
- US: Market-first. Sectoral (FDA for health AI, FTC for consumer protection). States fill gaps (California, Colorado). Voluntary standards. Enforcement lags.
- CHINA: State-first. All AI must serve national interests. Generative AI requires licensing. Content must align with "socialist values." Security review requirements.
- INTERNATIONAL: Aspirational. OECD principles, G7 Hiroshima process, UN discussions. Standards bodies (ISO, IEEE) do technical work. No enforcement.
TRANSITION: "Let's look at the EU approach in more detail as an example of risk-based regulation."

EU RISK TIERS (2 min)
- This is a design pattern, not just EU-specific
EXAMPLES FOR EACH TIER:
- PROHIBITED: Social scoring (China-style), real-time biometric surveillance in public (with exceptions), subliminal manipulation
- HIGH-RISK: Employment decisions, credit scoring, educational assessment, law enforcement, critical infrastructure
- LIMITED: Chatbots (must disclose they're AI), emotion recognition, deepfakes (must label)
- MINIMAL: Spam filters, video game AI, most internal business tools
KEY INSIGHT: The EU is betting that risk-based categorization is the right approach. Others disagree. The US prefers case-by-case. Critics say categories will be gamed or become outdated.
INSTRUCTOR NOTE: Don't teach the legal details - they change. Teach the design pattern: categorize by risk, apply proportionate obligations.

STANDARDS (2 min)
- This is where governance becomes operational
NIST AI RMF (Risk Management Framework) - 4 functions:
- GOVERN: Establish accountability, policies, culture
- MAP: Understand context, stakeholders, risks
- MEASURE: Assess and track AI risks
- MANAGE: Prioritize and act on risks
ARTIFACTS - These are the "evidence" organizations produce:
- Model cards: who built it, what it does, known limitations
- Data sheets: where data came from, how it was collected, known biases
- Eval reports: how it was tested, on what populations, with what results
- Red-team findings: what adversarial testing revealed
CONTROLS - These are ongoing operational practices:
- Access control: who can use/modify the system
- Logging: what decisions were made and why
- Incident response: what happens when something goes wrong
- Human review: when and how humans check the system
CONNECT TO COURSE: In your capstone projects, you'll be asked to design governance plans. These are the building blocks.
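To make "artifacts" tangible for students, a model card can be shown as structured data. This is an illustrative sketch only: the system name, team, and numbers are invented, and the fields loosely follow the categories in the Mitchell et al. (2019) model cards paper rather than any mandated schema.

```python
# Illustrative model card as structured data. Every value here is
# hypothetical; real model cards carry far more detail and evidence.
model_card = {
    "model": "resume-screener-v2",            # hypothetical system
    "owner": "HR Analytics team",             # named, reachable owner
    "intended_use": "Rank resumes for recruiter review, not auto-rejection",
    "out_of_scope": ["final hiring decisions", "non-English resumes"],
    "training_data": "Internal applications 2018-2023 (see data sheet)",
    "evaluation": {
        "overall_accuracy": 0.87,             # illustrative numbers
        "tpr_by_group": {"group_a": 0.82, "group_b": 0.79},
    },
    "known_limitations": ["performance drops on career-gap resumes"],
    "human_oversight": "Recruiter reviews every ranked shortlist",
}
```

The design point to draw out: each field corresponds to a governance control from earlier slides (intended use -> impact assessment, evaluation -> testing, owner -> oversight).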

Mini-Exercise (Optional, 5 min): Build a Governance Plan

Pick one use case (or assign groups):
  1) Resume screening tool for a large employer
  2) AI proctoring for online exams
  3) LLM chatbot for a city benefits office
  4) Face recognition for building entry

Deliverable (1 slide / whiteboard):
  • Main harms + who is impacted
  • What evidence you require before launch (tests, audits, documentation)
  • Who is accountable (vendor, deployer, regulator) and how escalation works

MINI-EXERCISE (8 min total: 5 work + 3 share) [OPTIONAL]
- SKIP THIS if running short on time (under 40 min remaining)
- This exercise connects all the concepts to a concrete case
SETUP (30 sec):
- Assign groups (4 groups, one per use case) OR let groups choose
- Point to the deliverable: 3 things in 5 minutes
USE CASE NOTES (for instructor reference):
1) RESUME SCREENING:
   - Harms: discrimination (race/gender proxies), qualified candidates screened out, lack of recourse
   - Evidence: disparate impact testing, human audit of edge cases, explainability for rejections
   - Accountability: HR director (deployer), vendor (builder), EEOC (regulator)
2) AI PROCTORING:
   - Harms: false accusations of cheating, privacy invasion, disability discrimination, stress/anxiety
   - Evidence: false positive rates by demographic, accommodation testing, data retention limits
   - Accountability: University (deployer), vendor (builder), accessibility office (internal)
3) CITY BENEFITS CHATBOT:
   - Harms: incorrect information leading to missed benefits, accessibility barriers, no human escalation
   - Evidence: accuracy testing on real queries, accessibility audit, clear escalation path
   - Accountability: City IT (deployer), vendor (builder), city council (oversight)
4) FACE RECOGNITION ENTRY:
   - Harms: false denials, privacy, function creep, demographic disparities
   - Evidence: error rates by demographic, data retention policy, opt-out alternative
   - Accountability: Building management (deployer), vendor (builder), tenants (stakeholders)
SHARE-OUTS (3 min):
- One sentence per group on: "Who is accountable and what evidence do you require?"
- Note commonalities: all require demographic testing, all struggle with accountability chains
DEBRIEF (30 sec): "Notice how hard it is to answer 'who is accountable' - that's the governance problem in a nutshell."
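If a group asks what "disparate impact testing" would actually look like for the resume screener, a minimal sketch can be shown. The applicant counts are hypothetical; the 0.8 cutoff follows the EEOC "four-fifths" guideline, under which a group selected at less than 80% of the highest group's rate is generally treated as evidence of adverse impact.

```python
# Hypothetical pre-launch disparate-impact check for a resume screener.
# Counts are invented; the 0.8 cutoff is the EEOC four-fifths guideline.

def disparate_impact_ratio(group_rate, reference_rate):
    """A group's selection rate relative to the highest-selected group."""
    return group_rate / reference_rate

# selected / applicants per demographic group (illustrative numbers)
selection = {
    "group_a": 120 / 400,  # 0.30
    "group_b": 45 / 300,   # 0.15
}
reference = max(selection.values())

for group, rate in selection.items():
    ratio = disparate_impact_ratio(rate, reference)
    status = "FLAG: below four-fifths threshold" if ratio < 0.8 else "ok"
    print(f"{group}: rate={rate:.2f} ratio={ratio:.2f} -> {status}")
```

Worth noting in debrief: passing this check is necessary evidence, not sufficient; it says nothing about label quality, proxies, or recourse.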

KEY TAKEAWAYS (2 min)
- Read through each takeaway slowly
- These are the four things students should remember
REINFORCE EACH:
1. ETHICS VS GOVERNANCE: We started here. Principles need mechanisms.
2. BIAS IS SOCIOTECHNICAL: You can't just "fix the algorithm." The problem is in data, labels, contexts, and power.
3. AUTONOMY IS ABOUT LEGITIMACY: Not just "does it work" but "who authorized it and who's responsible."
4. LANDSCAPE IS DIVERGING: EU, US, China are going in different directions. Standards try to bridge.
CLOSING (30 sec):
- "In this course, you'll be designing systems and policies that operationalize these ideas."
- "The question isn't whether AI will be governed - it's how, by whom, and in whose interests."
OPTIONAL FOLLOW-UP:
- Point to reading assignment on Canvas
- Preview next session topic
- Office hours for questions

========================================================================
POST-LECTURE NOTES
========================================================================
COMMON STUDENT QUESTIONS:
- "Is AI ethics just PR?" - Acknowledge the cynicism. Point to enforcement actions (EEOC, FTC cases). Real consequences are emerging.
- "Can we really govern AI if it changes so fast?" - Yes, through principles + adaptive standards. Regulate outcomes and processes, not specific technologies.
- "What about open source?" - Harder to govern. The EU AI Act tries. Debate is ongoing.
CONNECTIONS TO OTHER COURSE CONTENT:
- Human security: AI can threaten all 7 pillars (economic via job loss, personal via surveillance, political via manipulation)
- Cybersecurity: AI as attack vector AND defense tool
- Simulation: Students will encounter AI governance decisions in simulation phases
ASSESSMENT CONNECTION:
- Canvas assignment asks students to analyze real cases (Amazon hiring, Uber AV crash, facial recognition arrest)
- This lecture provides the conceptual vocabulary for that analysis
RESOURCES FOR DEEPER EXPLORATION:
- NIST AI RMF: https://www.nist.gov/itl/ai-risk-management-framework
- EU AI Act summary: https://artificialintelligenceact.eu/
- Model Cards paper: Mitchell et al. (2019)
- Algorithmic accountability: Selbst et al., "Fairness and Abstraction in Sociotechnical Systems"
========================================================================