Experiment Generator

From a single finding to six experiments that pin down the mechanism.

Turn one phenomenon into three basic-science and three clinical experiments that triangulate the mechanism across levels of analysis.

35 min · Intermediate · Teams of 3-5 · Any chat AI

The goal

Starting from a single robust finding, produce six experiments (three basic-science and three clinical/psychosocial) that together would define the mechanism of the phenomenon. The set should span different levels of analysis (molecular, cellular, circuit, behavioral, cognitive, social), and the experiments should be complementary: each one rules out something the others can’t. We care about the destination (a convergent, mechanism-defining battery), not the route you take with the AI to get there.

Why it matters

The leap from “we observed X” to “here is why X happens” is the hardest move in science, and it’s exactly what reviewers, thesis committees, and grant panels grill you on. A finding only earns the word mechanism when it survives attack from multiple directions: a knockout AND an fMRI study AND a longitudinal cohort that all point the same way. Most of us get stuck inside one level of analysis: the molecular person designs more molecular studies, the clinician designs more trials. AI is a remarkably good brainstorming partner for jumping levels: it will propose an optogenetics experiment and a workplace-stress cohort in the same breath. Your job is to be the skeptic who keeps it honest, pruning the plausible-but-untestable and demanding that each experiment actually isolate a cause. Get good at this and you have a reusable engine for Aim 2 of every grant you’ll write.

Run of show

0:00–0:05 · Challenge introduction (5 min)
0:05–0:20 · Work in your group (15 min)
0:20–0:22 · Post your best prompt (2 min)
0:22–0:32 · Share & debrief (10 min)
0:32–0:35 · Reset (3 min)

Bad prompt to better prompt

Weak prompt

Give me some experiments to study experience-dependent sensitization in pain.

Why the output disappoints: you get a generic listicle — “do an animal study,” “do a survey,” “use fMRI” — with no hypotheses, no predictions, and no thought about what each design rules out. Everything clusters at one comfortable level of analysis, the experiments are interchangeable rather than complementary, and nothing actually isolates a cause. It reads like a textbook margin, not a research program.

Strong prompt

You are a study-section reviewer with a joint appointment in molecular pain neuroscience and clinical psychology. I’ll give you one robust finding and a short background. Design SIX experiments that together would define its MECHANISM: exactly three basic-science and three clinical/psychosocial, and force them to span different levels of analysis (molecular, cellular/circuit, behavioral, cognitive, social). The experiments must be COMPLEMENTARY — each must rule out something the others cannot. For each, output a card with: (1) Hypothesis (a specific causal claim), (2) Design (species/sample, manipulation, key measure, control), (3) Prediction (the result that would SUPPORT the hypothesis, plus the result that would FALSIFY it), (4) Confound (the single most likely alternative explanation and how the design addresses it). After the six cards, add a one-paragraph “Convergence” note: if all six came out as predicted, what mechanistic claim would we now be entitled to make? Finding and background below.

Why it works: it assigns a cross-level identity, hard-codes the 3+3 structure and the levels-of-analysis spread, demands falsification (not just confirmation) and an explicit confound per card, and ends with a convergence test that checks whether the six experiments actually add up to a mechanism. The output becomes a defensible research program instead of a brainstorm.
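The structural moves named above (a cross-level role, a fixed 3+3 split, an explicit list of levels, a four-part card, and a closing convergence note) can be kept as a reusable template so teams can swap in their own finding. The following is a minimal Python sketch and not part of the original challenge: the function name `build_prompt`, its parameters, and the `CARD_FIELDS` constant are hypothetical, and the wording is condensed from the strong prompt.

```python
from typing import List

# The four card sections requested in the strong prompt (wording taken from it).
CARD_FIELDS = [
    "Hypothesis (a specific causal claim)",
    "Design (species/sample, manipulation, key measure, control)",
    "Prediction (the result that would SUPPORT the hypothesis, "
    "plus the result that would FALSIFY it)",
    "Confound (the single most likely alternative explanation "
    "and how the design addresses it)",
]


def build_prompt(finding: str, background: str, role: str, levels: List[str],
                 n_basic: int = 3, n_clinical: int = 3) -> str:
    """Assemble a mechanism-focused prompt. All names here are illustrative."""
    total = n_basic + n_clinical
    fields = ", ".join(f"({i}) {text}" for i, text in enumerate(CARD_FIELDS, start=1))
    parts = [
        f"You are {role}.",
        f"Design {total} experiments that together would define the MECHANISM "
        f"of the finding below: exactly {n_basic} basic-science and "
        f"{n_clinical} clinical/psychosocial, spanning different levels of "
        f"analysis ({', '.join(levels)}).",
        "The experiments must be COMPLEMENTARY: each must rule out something "
        "the others cannot.",
        f"For each, output a card with: {fields}.",
        f"After the {total} cards, add a one-paragraph Convergence note: if all "
        "came out as predicted, what mechanistic claim would we now be "
        "entitled to make?",
        f"Finding: {finding}",
        f"Background: {background}",
    ]
    return "\n\n".join(parts)


prompt = build_prompt(
    finding="Experience-dependent sensitization.",
    background="(paste the background paragraph here)",
    role="a study-section reviewer with a joint appointment in molecular "
         "pain neuroscience and clinical psychology",
    levels=["molecular", "cellular/circuit", "behavioral", "cognitive", "social"],
)
print(prompt)
```

Keeping the card fields in one list means a team that adds a fifth section (say, feasibility) changes the prompt in a single place.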

Prompting moves to try

Decompose the goal first. Before asking for experiments, ask the AI to list the candidate causal levels at which the finding could operate (receptor, synapse, circuit, learning, appraisal, social context). Then request one experiment per level — you’ve turned a vague ask into a structured grid.
Role-prompt for range. Give it a deliberately interdisciplinary identity (“molecular pain neuroscientist who also runs RCTs”). Single-discipline personas produce single-level experiments; hybrid personas jump levels.
Make it falsify itself. For every card, force both the supporting AND the falsifying prediction. A hypothesis you can’t imagine disconfirming isn’t a hypothesis — it’s an assumption.
Adversarial self-review. After the six cards, prompt: “Now act as a brutal R01 reviewer. Score each experiment 1-10 on how cleanly it isolates the mechanism, flag the weakest one, and rewrite it.” Then ask it to rate its own confidence (0-100%) that the set truly defines the mechanism.
Hunt the redundancy. Ask: “Which two of these six are secretly testing the same thing? Replace the redundant one with an experiment at a level we’re missing.” This is what makes the set complementary rather than six variations on a theme.
Let it fix your prompt. End with: “Before answering, rewrite my prompt to get a sharper, more mechanistic answer, then proceed with your improved version.” You’ll often learn a better question than the one you asked.

Starter materials

Paste the finding and background below into your AI, then use the card template to structure the output. (Swap in any finding from your own work if you’d rather — the template is the reusable part.)

The finding (paste this)

Finding: Experience-dependent sensitization. After a single painful or threatening episode in a given context, both rodents and humans show amplified pain responses to later, milder stimuli encountered in that same context, and the amplification grows with repeated exposure rather than habituating.

Background (≈120 words): Sensitization is the lab cousin of “once bitten, twice shy” — but for nociception. In a classic demonstration, animals given an inflammatory insult to one paw later show lowered withdrawal thresholds not only at the injured site but in surrounding and even contralateral tissue, and the effect strengthens with each subsequent challenge. Parallel human work shows that people with prior painful medical procedures report higher pain and stronger anticipatory anxiety to standardized noxious stimuli delivered in similar settings. The phenomenon spans peripheral changes (nociceptor excitability), spinal mechanisms (central sensitization, wind-up), supraspinal modulation, and learning (Pavlovian fear conditioning, expectancy). What’s unresolved is which level(s) drive the amplification, whether the same mechanism operates in acute lab settings and chronic clinical pain, and why some individuals sensitize while others habituate.

Experiment card template (use for all six)

Experiment N — [short name] · Level: [molecular / cellular-circuit / behavioral / cognitive / social] · Track: [basic | clinical]

Hypothesis: a single specific causal claim — “X causes/modulates the amplification via Y.”
Design: species or sample + N; the manipulation (what you do to whom); the key outcome measure; the control or comparison condition.
Prediction: the result that would support the hypothesis — and the result that would falsify it.
Confound: the single most likely alternative explanation, and the one design feature that rules it out.
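For teams that want to collect the AI's cards in a structured form, the template above maps naturally onto a small record type. This is one possible sketch in Python, not something the challenge requires: the class name `ExperimentCard`, its field names, and the `problems` helper are assumptions made for illustration, and the allowed levels and tracks are copied from the template header.

```python
from dataclasses import dataclass
from typing import List

# Allowed values, copied from the card template header.
LEVELS = {"molecular", "cellular-circuit", "behavioral", "cognitive", "social"}
TRACKS = {"basic", "clinical"}


@dataclass
class ExperimentCard:
    """One experiment card; field names are hypothetical."""
    name: str
    level: str
    track: str
    hypothesis: str
    design: str
    supports: str   # the result that would support the hypothesis
    falsifies: str  # the result that would falsify it
    confound: str

    def problems(self) -> List[str]:
        """Return human-readable gaps; an empty list means the card is complete."""
        issues = []
        if self.level not in LEVELS:
            issues.append(f"unknown level: {self.level!r}")
        if self.track not in TRACKS:
            issues.append(f"unknown track: {self.track!r}")
        for field_name in ("hypothesis", "design", "supports", "falsifies", "confound"):
            if not getattr(self, field_name).strip():
                issues.append(f"missing {field_name}")
        return issues


# A card with the falsifying prediction left blank (text condensed for illustration).
draft = ExperimentCard(
    name="Block the wind-up",
    level="cellular-circuit",
    track="basic",
    hypothesis="NMDA-dependent spinal sensitization is necessary for the amplification.",
    design="Mice; intrathecal NMDA antagonist vs. vehicle before each session.",
    supports="Antagonist flattens the session-by-session sensitization slope.",
    falsifies="",
    confound="Drug-induced analgesia; checked via first-exposure thresholds.",
)
print(draft.problems())  # ['missing falsifies']
```

Splitting the prediction into `supports` and `falsifies` makes a missing falsifier show up as an empty field rather than hiding inside a longer paragraph.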

Worked example card (so teams know the bar)

Experiment 1 — Block the wind-up · Level: cellular-circuit · Track: basic

Hypothesis: NMDA-receptor-dependent spinal central sensitization is necessary for the contextual amplification, not just for baseline nociception.
Design: Mice (n≈12/group) receive repeated mild noxious heat in a distinctive chamber. Intrathecal NMDA antagonist (e.g., MK-801) vs. vehicle delivered before each session. Outcome: withdrawal-threshold drop across sessions (slope of sensitization). Control: same drug given in a different chamber to separate spinal blockade from context.
Prediction: Support → antagonist flattens the session-by-session sensitization slope while leaving first-session threshold intact. Falsify → sensitization slope is unchanged, implicating a non-NMDA (e.g., supraspinal/learning) driver.
Confound: The drug could simply blunt acute pain (analgesia mimicking “less sensitization”). Addressed by checking that first-exposure thresholds are normal and by the cross-chamber control.

Coverage checklist (score your six)

Exactly 3 basic + 3 clinical/psychosocial.
At least 4 different levels of analysis represented across the set.
Every card has a falsifying prediction, not just a confirming one.
No two experiments are secretly testing the same thing.
At least one experiment directly tests the basic↔clinical translation (does the same mechanism operate in both?).
The convergence note states a mechanism claim the full set would earn.
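Three of the checklist items (the 3+3 split, the four-level spread, and the presence of a falsifying prediction) are mechanical enough to check automatically; redundancy, translation, and the convergence claim still need human judgment. The sketch below is a hypothetical Python helper, assuming each card is stored as a dictionary with `level`, `track`, and `falsifies` keys. Those key names and the sample cards are illustrative placeholders, not part of the challenge materials.

```python
from collections import Counter
from typing import Dict, List


def coverage_report(cards: List[Dict[str, str]]) -> Dict[str, bool]:
    """Score the mechanically checkable checklist items for a set of cards."""
    tracks = Counter(card["track"] for card in cards)
    levels = {card["level"] for card in cards}
    return {
        "exactly 3 basic + 3 clinical": (
            len(cards) == 6 and tracks["basic"] == 3 and tracks["clinical"] == 3
        ),
        "at least 4 levels of analysis": len(levels) >= 4,
        "every card has a falsifying prediction": all(
            card.get("falsifies", "").strip() for card in cards
        ),
    }


# Placeholder cards: only the fields the checker reads are filled in.
cards = [
    {"level": "molecular", "track": "basic", "falsifies": "slope unchanged"},
    {"level": "cellular-circuit", "track": "basic", "falsifies": "slope unchanged"},
    {"level": "behavioral", "track": "basic", "falsifies": "no context effect"},
    {"level": "cognitive", "track": "clinical", "falsifies": "expectancy unrelated to ratings"},
    {"level": "social", "track": "clinical", "falsifies": "no group difference"},
    {"level": "cognitive", "track": "clinical", "falsifies": ""},  # missing falsifier
]

for item, passed in coverage_report(cards).items():
    print(("PASS" if passed else "FAIL"), item)
```

With these placeholders the first two items pass and the third fails, because the last card has no falsifying prediction.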

Debrief questions

Which level of analysis did your AI reach for first — and which did it ignore until you forced it? What does that reveal about its (and your) defaults?
Point to the two experiments in your set that are the most complementary. What can the pair conclude together that neither could alone?
Did any card have a falsifying prediction that was actually hard to imagine? If so, was it a real hypothesis or a disguised assumption?
When the AI critiqued and scored its own experiments, was it right about which was weakest? Where was its confidence miscalibrated?
If all six came back exactly as predicted, what would you still be unable to claim about the mechanism?

Level up

Demand the cheap-and-fast version. Ask the AI to redesign your strongest experiment under brutal constraints: one undergrad, $500, eight weeks, no animal facility. Mechanism on a shoestring is a real grant skill.
Pre-register one card. Have the AI turn a single experiment into a mini pre-registration: hypotheses, primary outcome, sample-size justification, and analysis plan with a stopping rule. Notice what gets harder when you can’t move the goalposts.
Build the decisive study. Ask: “Design the ONE experiment whose result would most cleanly distinguish your top two competing mechanisms — and tell me which existing finding it would have to contradict to matter.” That’s the experiment a study section remembers.
