Flash Debate
Build the evidence case for a contested claim — then argue it.
A contested claim, two teams, and an AI that will argue either side just as readily — your job is to make it argue well, with citations you can defend.
The goal
By the end, your team can stand up and deliver a 90-second case on your assigned side of a contested pain-neuroscience resolution, anchored by your three strongest pieces of evidence, a plausible mechanism, and a pre-loaded rebuttal to the other side’s best shot. Every citation you put on the slide is one a teammate has actually verified: it exists, and it says what you claim it says. The bar is defensible, not just fluent.
Why it matters
Science is an argument that never fully ends, and your career is a long series of structured disagreements: defending a hypothesis before your dissertation committee, answering Reviewer 2, justifying a grant aim, steering a journal club away from over-reading one flashy fMRI figure. Research-capable AI can collapse hours of literature triage into minutes, but it can also fabricate a citation that looks perfect down to the DOI, or quietly give you only the evidence that flatters your side.
This challenge trains the two skills that separate a useful AI research partner from a confidently wrong one: getting it to steelman both sides of a real controversy, and checking its work before you stake your name on it. The opioid-remodeling resolution below is genuinely unsettled in the literature, which is exactly the point — there is real evidence and real counter-evidence to find.
Run of show
- 0:00–0:05 · Challenge introduction (5 min)
- 0:05–0:20 · Work in your group (15 min)
- 0:20–0:22 · Post your best prompt (2 min)
- 0:22–0:32 · Share & debrief (10 min)
- 0:32–0:35 · Reset (3 min)
Bad prompt to better prompt
Why it disappoints: you asked the model to confirm a foregone conclusion, so it returns a tidy paragraph of agreeable generalities, a couple of plausible-sounding citations you can’t verify, and no counter-evidence. You now have a case that will collapse the moment the CON team asks one question.
Better prompt:
You are preparing a graduate journal club to debate this resolution: “Chronic opioid use causes remodeling of opioid receptors and related circuits in the brain, which mediates adverse psychological outcomes.”
I am arguing PRO. Do three things, clearly separated:
EVIDENCE FOR: List the 3-4 strongest empirical findings supporting the resolution. For each, give the study type (human PET, rodent autoradiography, longitudinal cohort, etc.), the specific finding, the approximate sample/species, and a citation (authors, year, journal). Prioritize causal or longitudinal designs over cross-sectional correlation.
STEELMAN THE OPPOSITION: Give the 3 strongest counterarguments a sharp CON team would raise — confounds, reverse causation, species-translation problems, receptor downregulation being adaptive rather than harmful.
SELF-AUDIT: For each citation, rate your confidence it is real and accurately characterized (high/medium/low) and flag any you may have approximated. Then tell me the single weakest link in my PRO case.
Why it works: it assigns a role and a stance, decomposes the task into evidence / steelman / audit, demands study design and verifiable citation details (so you can check them), forces it to build the other team’s case, and makes it score its own confidence and name its own weak point.
Prompting moves to try
- Decompose the case (claim → evidence → mechanism → rebuttal). Ask for each layer separately. A debate point with a mechanism beats a debate point that’s just a correlation, and a rebuttal you wrote before the round beats one you improvise on stage.
- Role + stance prompting. “You are a skeptical addiction neuroscientist on the CON side” produces sharper, more adversarial output than a neutral “summarize the evidence.” Run it once for your side, once for the opposition.
- Adversarial self-evaluation. After it gives you a case, prompt: “Now play Reviewer 2. Attack this case as harshly as a hostile peer reviewer would, then rate the strength of each of my points from 1-10 and tell me which would not survive cross-examination.”
- Force a confidence + citation audit. Make it tag every reference high/medium/low confidence and explicitly flag anything it might have fabricated or paraphrased from memory. Treat “low” as “do not cite until verified.”
- Ask it to improve your prompt. “Before you answer, rewrite my prompt to get a more rigorous, less one-sided debate brief — then answer the improved version.”
- Hunt for the disconfirming study. “Find the strongest single study that contradicts my side, and explain why my evidence still holds despite it.” This is how you pre-empt the other team.
Starter materials
The resolution (read it aloud, assign sides):
Resolution: “Chronic opioid use causes remodeling of opioid receptors and related circuits in the brain, which mediates adverse psychological outcomes (e.g., depression, anhedonia, dysphoria).”
Teams are assigned PRO or CON at random. CON is not arguing “opioids are harmless” — CON argues that the specific causal chain (use → receptor/circuit remodeling → psychological harm) is not established by the evidence.
Debate-prep scaffold — fill one row per argument (copy into your shared doc):
| Slot | PRO fills in | CON fills in |
|---|---|---|
| Claim (one sentence) | Chronic exposure drives lasting μ-opioid receptor downregulation / circuit changes | The receptor changes are inconsistent, reversible, or not the cause of mood symptoms |
| Evidence #1 (study type, finding, citation) | | |
| Evidence #2 | | |
| Evidence #3 | | |
| Mechanism (the causal “how”) | e.g., receptor downregulation → blunted endogenous reward signaling → anhedonia | e.g., pain, withdrawal, and pre-existing depression confound the link |
| Best opposing point (steelman) | | |
| Rebuttal (your answer to it) | | |
| Weakest link in our case | | |
Evidence-quality rubric — rank what AI hands you before you trust it:
| Tier | What it looks like | Debate weight |
|---|---|---|
| A | Longitudinal / interventional human study, or converging human + animal causal evidence | Lead with it |
| B | Cross-sectional human imaging (PET/fMRI), or controlled rodent study | Solid support |
| C | Single small study, narrative review, or animal-only with weak translation | Use with a caveat |
| D | Mechanistic speculation, no primary data, or a citation you can’t verify | Do not put on the slide |
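If your group keeps its evidence pile in a script rather than a doc, the rubric above could be sketched roughly as below. This is a minimal illustration, not a required tool: the `Evidence` class, its field names, and every sample entry are hypothetical placeholders (none of them is a real study), and the tier itself is still assigned by a human who has read the paper.

```python
from dataclasses import dataclass

# Debate weight for each tier, copied from the rubric table.
TIER_WEIGHT = {
    "A": "Lead with it",
    "B": "Solid support",
    "C": "Use with a caveat",
    "D": "Do not put on the slide",
}


@dataclass
class Evidence:
    label: str  # short description of the finding (placeholder text here)
    tier: str   # "A", "B", "C", or "D", assigned by a human reader


def rank_evidence(items):
    """Return items ordered A to D; the sort is stable, so ties keep input order."""
    return sorted(items, key=lambda e: e.tier)


def slide_ready(items):
    """Drop Tier D, which the rubric says never goes on the slide."""
    return [e for e in rank_evidence(items) if e.tier != "D"]


# Hypothetical entries for illustration only; these are not real studies.
pile = [
    Evidence("placeholder narrative review", "C"),
    Evidence("placeholder unverifiable citation", "D"),
    Evidence("placeholder longitudinal human study", "A"),
    Evidence("placeholder rodent study", "B"),
]

for e in slide_ready(pile):
    print(e.tier, "-", e.label, "->", TIER_WEIGHT[e.tier])
```

The ordering only tells you what to lead with; it does not replace the judgment call of deciding which tier a study belongs in.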
Citation-check protocol (do this before the prompt-post):
- Copy each citation the AI gave you. Search the title or DOI in a real database (PubMed / Google Scholar / your library).
- Confirm three things: the paper exists, the authors/year/journal match, and the finding actually says what your brief claims (open the abstract).
- Any citation that fails → drop it or downgrade to Tier D. Note in your doc: “AI-claimed, unverified.”
- Bonus: if a real paper exists but the AI got the finding subtly wrong, log that — it is the most teachable failure in the room.
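One way to keep the protocol above honest is to log the three checks per citation and let the log decide the verdict. The sketch below assumes a human has already done the database search and filled in the three yes/no answers; the `CitationCheck` class, its field names, and the verdict strings are hypothetical, chosen only to mirror the steps in the list.

```python
from dataclasses import dataclass


@dataclass
class CitationCheck:
    citation: str           # the citation text exactly as the AI gave it
    exists: bool            # found in PubMed / Google Scholar / your library
    metadata_matches: bool  # authors, year, and journal all match
    finding_matches: bool   # the abstract says what your brief claims


def verdict(check):
    """Map the three checks from the protocol onto a log entry."""
    if not check.exists:
        return "Tier D: AI-claimed, unverified"
    if not check.metadata_matches:
        return "Tier D: AI-claimed, unverified (metadata mismatch)"
    if not check.finding_matches:
        # The bonus case: a real paper whose finding the AI got wrong.
        return "Tier D: real paper, wrong finding (log for debrief)"
    return "verified"


# Placeholder citation text for illustration; not a real reference.
example = CitationCheck(
    citation="placeholder citation",
    exists=True,
    metadata_matches=True,
    finding_matches=False,
)
print(verdict(example))
```

Only entries that come back `verified` should reach the slide; everything else stays in the doc with its note attached.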
Two pre-loaded landmines (hand one to each side as a stress test):
For CON to deploy: “Receptor downregulation after chronic agonist exposure is a normal, often reversible homeostatic adaptation. Present it as evidence of harm and you’ve confused a thermostat for a fire. Where is the human evidence that the receptor change causes the mood change, rather than both being caused by ongoing pain or withdrawal?”
For PRO to deploy: “Reversibility doesn’t mean harmless: blunted endogenous opioid signaling during use can plausibly mediate anhedonia and dysphoria in real time, and there is converging rodent and human imaging evidence. Absence of a perfect longitudinal human RCT is not absence of a causal signal.”
Debrief questions
- Which AI tool produced the most verifiable evidence — not the most fluent — and how could you tell the difference from the output alone?
- How many citations survived the check protocol? What did the failures look like — invented outright, real-paper-wrong-finding, or right-paper-overstated?
- Did asking the AI to steelman the opposite side change your own case? What was the strongest counterargument you would have missed?
- When the AI rated its own confidence, was it calibrated — were its “high confidence” claims actually the ones that checked out?
- What is the honest scientific answer here? After all this, is the resolution PRO, CON, or “the evidence doesn’t settle it yet”?
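For the calibration question, a quick tally makes the answer concrete: for each confidence label the AI assigned, count how many citations passed the check. The sketch below is a minimal illustration; the function name and the sample records are hypothetical, and the pass/fail values would come from your own citation check.

```python
from collections import defaultdict


def calibration(records):
    """Take (confidence, verified) pairs and return {confidence: (passed, total)}."""
    tally = defaultdict(lambda: [0, 0])
    for confidence, verified in records:
        tally[confidence][1] += 1      # one more citation at this confidence level
        if verified:
            tally[confidence][0] += 1  # it also survived the citation check
    return {level: tuple(counts) for level, counts in tally.items()}


# Hypothetical results for illustration only.
records = [
    ("high", True),
    ("high", True),
    ("high", False),
    ("medium", True),
    ("low", False),
]

for level, (passed, total) in calibration(records).items():
    print(f"{level}: {passed}/{total} verified")
```

If the “high” row does not clearly beat the “low” row, the self-ratings were not telling you much.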
Level up
- Run the opposite side. Feed your finished PRO brief to a fresh AI session and ask it to demolish you as the CON team, then patch every hole it finds. Bring the patched version to the round.
- Force a quantitative claim. Make the AI give an effect size or a number (e.g., percent change in receptor availability, hazard ratio for depression) — then verify that number against the primary source. Numbers are where fabrication hides best.
- Multi-model triangulation. Run the same brief through two different research-capable tools and only put a finding on the slide if both surface it and it survives the citation check. Note where the two models disagreed — that’s your map of the genuinely contested terrain.
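The triangulation rule in the last bullet is a pair of set operations. In the rough sketch below, each finding is represented by a short key your team agrees on by hand, for example a DOI or a one-line label; the function name and the sample keys are hypothetical.

```python
def triangulate(model_a, model_b, verified):
    """Split findings into slide-ready and contested sets.

    model_a, model_b: sets of finding keys surfaced by each tool.
    verified: set of finding keys that survived the citation check.
    """
    both = model_a & model_b        # surfaced by both tools
    slide = both & verified         # and survived the citation check
    contested = model_a ^ model_b   # surfaced by only one tool
    return slide, contested


# Hypothetical finding keys for illustration only.
tool_one = {"finding-1", "finding-2", "finding-3"}
tool_two = {"finding-2", "finding-3", "finding-4"}
checked = {"finding-2", "finding-4"}

slide, contested = triangulate(tool_one, tool_two, checked)
print("Put on the slide:", sorted(slide))
print("Where the tools disagreed:", sorted(contested))
```

Matching findings across tools is the hard part and stays manual: two tools may describe the same study in different words, so agree on the keys before you compare.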