Who Are My Friends?

Design a system that finds the reviewers who will love your paper.

An agentic challenge: decompose reviewer discovery into verifiable subtasks, mine citation context and sentiment, and stay on the right side of journal etiquette.

Somewhere out there is the reviewer who will read your discussion section and think “finally, someone who gets it.” Your job is to build the system that finds them.

35 min · Advanced · Claude Code / Codex · Teams of 3-5 · Agentic CLI

The goal

You have a paper (real or invented) that you want in front of reviewers who will appreciate it: people fluent in your methods, sympathetic to your framing, and likely to give a fair, engaged read. Produce a ranked shortlist of 5-8 candidate reviewers, each with a one-line justification and a confidence score, that you could plausibly hand to an editor’s “suggest reviewers” box. How you get there is up to you. Do not just ask an AI “who should review my paper?” and accept the answer.

Why it matters

Reviewer assignment quietly decides the fate of your science. The same manuscript can draw a “reject — out of scope” from someone who thinks pain is a peripheral nociception problem, and a “minor revisions, lovely work” from someone steeped in central sensitization and predictive coding. Editors routinely ask you to suggest reviewers, and most of us do it from memory — which means we suggest the three people we met at a conference, miss the early-career researcher who has been quietly citing us with admiration, and overlook the lab on another continent doing exactly our thing.

This is also a problem well suited to agents. “Find good reviewers” is not one prompt; it is a pipeline of search, retrieval, reading-in-context, scoring, and verification. The transferable skill is learning to decompose a fuzzy academic goal into checkable subtasks and to make the AI show its sources. The same scaffold finds collaborators, grant panel allies, journal clubs that would host your work, and people to cite.

Run of show

0:00–0:05 · Challenge introduction (5 min)
0:05–0:20 · Work in your group (15 min)
0:20–0:22 · Post your best prompt (2 min)
0:22–0:32 · Share & debrief (10 min)
0:32–0:35 · Reset (3 min)

Bad prompt to better prompt

Weak prompt

Who are good peer reviewers for my paper on chronic pain and brain imaging?

The model hands you a tidy list of five famous names it half-remembers — the textbook authors, the keynote speakers — with no sources, no evidence they would like the work, and a real chance one of them is your direct competitor or, worse, fabricated. It optimized for “famous in this field,” not “right reader for this manuscript.” You cannot verify any of it.

Strong prompt

You are a research-integrity-minded editorial assistant with web and file access. Goal: build a ranked shortlist of 5-8 candidate peer reviewers who would give my paper a fair, expert, sympathetic read.

My paper abstract and my 3 most-cited prior papers are in ./paper/.

Work as an agent in explicit steps, and pause to show me each step’s output before continuing:
1. Extract 6-8 key concepts + methods from my abstract.
2. For each concept, search recent (last 5 yrs) open literature for authors publishing on it; collect name, affiliation, a representative paper, and the source URL.
3. Find papers that CITE my prior work; for each, quote the sentence of citing context and label it positive / neutral / critical.
4. Merge into a candidate table. Score each person 0-1 on expertise-match and on likely-sympathy, and give an overall confidence with one sentence of reasoning.
5. Flag any candidate who is a likely competitor, frequent co-author of mine, or where you are NOT confident the person/paper is real.

Cite a source for every factual claim. If you cannot verify someone, say so rather than guessing.

It works because it assigns a role, decomposes the fuzzy goal into checkable subtasks, forces the model to externalize evidence (quoted citing context + URLs), separates “knows the topic” from “will like the paper,” and builds in adversarial self-checks (competitor flag, hallucination flag, confidence). You get an auditable shortlist rather than an unsourced guess.

Prompting moves to try

Decompose into agentic subtasks. Turn “find reviewers” into: concept extraction → author search → citation-context mining → sentiment scoring → dedupe/merge → verification. Ask the agent to checkpoint after each so you can catch a wrong turn early instead of at the end.
Role + audience prompting. “You are an editor who must defend this assignment to the EIC” produces more cautious, source-backed output than “list some reviewers.” Try also: “You are a skeptical reviewer of your own shortlist.”
Mine the citing context, not just the citation count. Ask the AI to quote the actual sentence where someone cites your work and classify its sentiment. “Building on Wager et al.” signals an ally; “contrary to claims by Wager et al.” signals a critic.
Adversarial self-evaluation. After it produces the list: “For each candidate, give your confidence (0-1) that this is a real person, that the cited paper exists, and that they’d be sympathetic. Sort by lowest confidence first and tell me what would change your mind.” This surfaces hallucinations fast.
Ask the AI to improve your prompt. “Before running, rewrite my instructions to make the verification step more rigorous and the sentiment scoring more reproducible. Then run the better version.”
Separate two axes. Expertise-match and likely-sympathy are different things; a world expert who hates your theory is a bad suggestion. Make the model score them in separate columns.
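The citing-context move above can be prototyped before any model is involved. The following is a minimal sketch, assuming a small hand-written list of cue phrases; the cue lists and the function name `tag_citing_context` are illustrative and not part of the challenge materials. A keyword pass like this is only a first filter, and the quoted sentence should still be read by a person or a model.

```python
# Hypothetical cue phrases (an assumption, not an established lexicon).
POSITIVE_CUES = ("building on", "builds on", "builds directly on",
                 "extends", "confirming", "well-powered", "we adopt")
CRITICAL_CUES = ("contrary to", "contra ", "however", "conflates", "fails to")


def tag_citing_context(sentence: str) -> str:
    """Label a quoted citing sentence as positive, neutral, or critical.

    Critical cues are checked first so that a sentence mixing praise and
    disagreement is surfaced as a possible critic rather than an ally.
    """
    text = sentence.lower()
    if any(cue in text for cue in CRITICAL_CUES):
        return "critical"
    if any(cue in text for cue in POSITIVE_CUES):
        return "positive"
    return "neutral"


if __name__ == "__main__":
    for quote in ("Building on Wager et al.",
                  "contrary to claims by Wager et al.",
                  "see Wager et al. for a review"):
        print(f"{tag_citing_context(quote):9s} <- {quote}")
```

Checking critical cues first is a deliberate choice: a false "critic" label costs a second look, while a false "ally" label can put the wrong name on the shortlist.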

Starter materials

Paste-ready scaffold for the room. Drop a folder like ./paper/ next to your agent with the manuscript and prior papers, feed the agent the system-design scaffold, and use the scoring rubric and verification checklist to keep it honest. A mock messy dataset is included so groups with no manuscript handy can still run the full pipeline.

1. System-design scaffold (fill in the brackets, hand to the agent)

ROLE: You are an editorial-assistant agent with web search and local file access. You value verifiability over completeness.

INPUT:
- Manuscript abstract: [paste or ./paper/abstract.txt]
- My prior papers to track citations of: [3-5 titles or DOIs]
- My recent co-authors (exclude as reviewers): [names]
- Known direct competitors (flag, don’t auto-exclude): [names or “none known”]
- Target journal suggest-reviewer policy: [e.g. “needs 4, no recent co-authors, no shared institution”]

PIPELINE (checkpoint after each step):
S1 Concepts: extract 6-8 concept/method keywords from the abstract.
S2 Expertise search: per concept, find 2-3 active authors (last 5 yrs); record name, affiliation, 1 representative paper + URL.
S3 Citation mining: find works citing my prior papers; quote the citing sentence; tag positive / neutral / critical.
S4 (optional) Social signal: find public posts/preprint-server mentions of my prior work; tag sentiment + link.
S5 Merge & dedupe: one row per person.
S6 Score: fill the rubric columns below.
S7 Verify: run the verification checklist; demote or drop anything that fails.

OUTPUT: a ranked table (highest overall confidence first) + a 2-sentence note on who you deliberately excluded and why.

CONSTRAINTS: cite a source URL for every factual claim; never invent a person, paper, or quote; if unsure, write “UNVERIFIED” instead of guessing.
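If a group wants to script the “checkpoint after each step” idea rather than rely on the agent pausing, one possible shape is sketched below. It assumes each pipeline step is a plain function from state to state and that a reviewer callback decides whether to continue; `run_with_checkpoints` and `approve` are hypothetical names, not part of any agent CLI.

```python
from typing import Any, Callable, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]


def run_with_checkpoints(steps: List[Step], state: Any,
                         approve: Callable[[str, Any], bool]):
    """Run steps in order, stopping at the first checkpoint that is rejected.

    Returns the last approved state and the names of the approved steps,
    so a wrong turn is caught at the step where it happened.
    """
    completed = []
    for name, step in steps:
        proposed = step(state)
        if not approve(name, proposed):
            break  # keep the last approved state; do not build on a bad step
        state = proposed
        completed.append(name)
    return state, completed


if __name__ == "__main__":
    # Toy steps standing in for S1-S3; real steps would call the agent.
    steps = [
        ("S1", lambda s: s + ["concepts"]),
        ("S2", lambda s: s + ["authors"]),
        ("S3", lambda s: s + ["UNVERIFIED quote"]),
    ]
    reject_unverified = lambda name, s: not any("UNVERIFIED" in x for x in s)
    print(run_with_checkpoints(steps, [], reject_unverified))
```

In the workshop the `approve` callback is simply a human reading the step output; the sketch only makes the stop-early behavior explicit.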

2. Reviewer scoring rubric (the columns the agent must fill)

Candidate	Affiliation	Expertise match (0-1)	Likely sympathy (0-1)	Evidence (quote + source)	Conflict flag	Overall confidence (0-1)
e.g. L. Okafor	Univ. of Toronto	0.9	0.7	“extends the predictive-coding account of Wager et al.” (Pain, 2024, doi:…)	none	0.8
(agent fills)

Expertise match = overlap of their work with your concepts/methods.
Likely sympathy = inferred from citing tone, shared theoretical framing, methodological kinship. Be explicit it is an inference.
Conflict flag = recent co-author / same institution / advisor-advisee / direct competitor.
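The rubric can be mirrored in a small data structure so the two axes never collapse into one number by accident. This is a sketch under stated assumptions: the rubric does not say how overall confidence is derived, so the plain mean used here is a guess that happens to reproduce the example row (0.9 and 0.7 giving 0.8), and demoting flagged candidates to the bottom is one possible policy. The sympathy value for the critic and the co-author entry are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    expertise: float        # 0-1, overlap with your concepts/methods
    sympathy: float         # 0-1, an inference from citing tone and framing
    evidence: str           # quoted sentence + source
    conflict: str = "none"  # e.g. "recent co-author", "direct competitor"

    def overall(self) -> float:
        # Assumption: simple mean of the two axes (not specified by the rubric).
        return round((self.expertise + self.sympathy) / 2, 2)


def rank(candidates: List[Candidate]) -> List[Candidate]:
    """Conflict-free candidates first, then highest overall confidence first."""
    return sorted(candidates, key=lambda c: (c.conflict != "none", -c.overall()))


if __name__ == "__main__":
    shortlist = rank([
        Candidate("Hypothetical Coauthor", 1.0, 1.0, "n/a", "recent co-author"),
        Candidate("R. Sandberg-Lee", 0.9, 0.2, "conflates nociception with pain"),
        Candidate("L. Okafor", 0.9, 0.7, "extends the predictive-coding account"),
    ])
    for c in shortlist:
        print(c.name, c.expertise, c.sympathy, c.overall(), c.conflict)
```

Keeping `expertise` and `sympathy` as separate fields means the expert critic stays visible in the table with both numbers, instead of disappearing into a middling single score.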

3. Verification checklist (run before trusting the list)

Every candidate has a real, locatable paper with a working source link.
Every “sympathy” score is backed by a quoted sentence, not a vibe.
No recent co-authors or same-institution conflicts slipped through.
At least one candidate is early-career (not just the usual senior names).
Spot-check 2 entries by hand — does the quote actually say what the agent claims?
The agent flagged anything it could not verify rather than smoothing over it.
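Several checklist items are mechanical enough to script, which leaves human attention for the spot-check. The sketch below assumes each candidate row is a dictionary with the hypothetical keys `candidate`, `source_url`, `quote`, `conflict`, and `early_career`; none of these field names come from the challenge materials. It cannot tell whether a link works or whether a quote is accurate, so the hand spot-check still applies.

```python
from typing import Dict, List


def checklist_failures(rows: List[Dict]) -> List[str]:
    """Return readable failures for the checklist items a script can test."""
    failures = []
    for row in rows:
        name = row.get("candidate") or "(unnamed)"
        if not row.get("source_url"):
            failures.append(f"{name}: no source link")
        if not row.get("quote"):
            failures.append(f"{name}: sympathy score has no quoted sentence")
        conflict = row.get("conflict", "none")
        if conflict != "none":
            failures.append(f"{name}: conflict flag ({conflict})")
    # List-level check: at least one early-career candidate.
    if not any(row.get("early_career") for row in rows):
        failures.append("list: no early-career candidate")
    return failures


if __name__ == "__main__":
    rows = [
        {"candidate": "L. Okafor", "source_url": "https://example.org/paper",
         "quote": "extends the predictive-coding account", "conflict": "none"},
        {"candidate": "A. Ruiz", "source_url": "",
         "quote": "replicated by Ruiz et al.", "conflict": "none"},
    ]
    for problem in checklist_failures(rows):
        print(problem)
```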

4. Mock messy dataset (for groups without a manuscript)

Use this fake “who cites my work” export — deliberately inconsistent — and have the agent clean it, score sentiment from the context snippet, and produce reviewer candidates.

citing_author	inst	year	citing_context_snippet	source
Okafor, L.	UofT	2024	“…elegantly extends the central-sensitization framework proposed by [our group]…”	Pain 165(4)
j. mehta	Imperial Coll.	23	“see also their well-powered fMRI study”	bioRxiv
Sandberg-Lee, R	Karolinska	2025	“However, this conflates nociception with pain, contra [our group]”	J Neurosci
A. Ruiz	n/a	2024	“replicated by Ruiz et al., confirming the dorsal-horn effect”	Twitter/X thread
Park, Min-ji	SNU	2022	“builds directly on the predictive-coding model”	Neuroimage
(blank)	Stanford	2024	“we adopt their preprocessing pipeline wholesale”	preprint, no DOI
Okafor L.	Toronto	2024	DUPLICATE of row 1?	Pain

Note for groups: rows have ragged years (23 vs 2023), inconsistent name and institution formats (“Okafor, L.” / UofT vs “Okafor L.” / Toronto), a duplicate, a missing author, an unverifiable Twitter source, and one clearly critical citation (Sandberg-Lee). A good agent will normalize, dedupe, demote the unverifiable source, and correctly tag Sandberg-Lee as a poor sympathy bet despite high expertise.
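For groups that want a baseline to compare the agent against, the cleanup can be sketched in plain Python. The rows below copy the author, institution, year, and source columns from the mock export (the context snippet is omitted for brevity). The rules are assumptions: two-digit years are read as 20xx, a person is identified by surname plus first initial, and any source mentioning Twitter is treated as unverifiable. The helper names are illustrative.

```python
ROWS = [
    ("Okafor, L.", "UofT", "2024", "Pain 165(4)"),
    ("j. mehta", "Imperial Coll.", "23", "bioRxiv"),
    ("Sandberg-Lee, R", "Karolinska", "2025", "J Neurosci"),
    ("A. Ruiz", "n/a", "2024", "Twitter/X thread"),
    ("Park, Min-ji", "SNU", "2022", "Neuroimage"),
    ("(blank)", "Stanford", "2024", "preprint, no DOI"),
    ("Okafor L.", "Toronto", "2024", "Pain"),
]


def normalize_year(raw: str) -> int:
    """Expand two-digit years, assuming they belong to the 2000s."""
    year = int(raw)
    return year + 2000 if year < 100 else year


def author_key(raw: str):
    """Reduce a messy name to (surname, first initial), or None if missing."""
    name = raw.strip()
    if not name or name == "(blank)":
        return None
    if "," in name:                      # "Surname, Given"
        surname, given = (part.strip() for part in name.split(",", 1))
    else:
        tokens = [t.strip(".") for t in name.split()]
        initials = [t for t in tokens if len(t) == 1]
        if initials:                     # "A. Ruiz" or "Okafor L."
            given = initials[0]
            surname = " ".join(t for t in tokens if len(t) > 1)
        else:                            # assume "Given ... Surname" order
            given, surname = tokens[0], tokens[-1]
    return (surname.lower(), given[:1].lower())


def clean(rows):
    """Normalize, keep one row per person, and mark what cannot be verified."""
    seen, cleaned = set(), []
    for author, inst, year, source in rows:
        key = author_key(author)
        if key is not None and key in seen:
            continue                     # duplicate person, keep first row
        seen.add(key)
        if key is None:
            status = "UNVERIFIED: missing author"
        elif "twitter" in source.lower():
            status = "UNVERIFIED: social-media source"
        else:
            status = "ok"
        cleaned.append({"author": key, "inst": inst,
                        "year": normalize_year(year),
                        "source": source, "status": status})
    return cleaned


if __name__ == "__main__":
    for row in clean(ROWS):
        print(row)
```

Institution strings are left untouched on purpose: deciding that “UofT” and “Toronto” are the same place is a judgment the agent should justify rather than a string rule.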

Debrief questions

Which signal turned out to be the strongest predictor of a sympathetic reviewer — topical overlap, citing tone, shared methods, or social-media mentions? Did any team find these disagreed?
Where did your agent hallucinate, and which verification step caught it? Which step should have caught it but didn’t?
How did separating “expertise match” from “likely sympathy” change your shortlist versus a single “good reviewer” score?
The critical citer (Sandberg-Lee) is a domain expert who disagrees with you. Is suggesting them ever the right move? What would an editor think of your list if it contained only fans?
What part of this pipeline would you trust to run unattended, and what part demands a human in the loop every time?

Level up

Add the preprint loop. Design step S4+ as a real strategy: post a preprint, instrument it (who downloads, who posts, who emails), and let engagement — not just citation history — surface interested experts over time. Sketch the data sources and how you’d score genuine interest vs. polite noise.
Draft personalized outreach (and a stop rule). Have the agent draft a short, honest “thought you might find this relevant” note per candidate, and write the rule for when NOT to send one. The skill is restraint, not reach.
Build the conflict-of-interest auditor. Add an agent that cross-checks every candidate against your co-author graph, shared grants, and institution, and produces an editor-ready COI statement for the whole list.
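As a starting point for the conflict-of-interest auditor, here is a minimal sketch that covers only two of the signals named above, recent co-authorship and shared institution. It assumes exact (case-insensitive) string matches on hypothetical `name` and `institution` fields; shared grants and advisor-advisee links would need data sources this sketch does not model, and real name matching needs more care than string equality.

```python
from typing import Dict, Iterable, List


def audit_conflicts(candidates: List[Dict[str, str]],
                    coauthors: Iterable[str],
                    my_institution: str) -> Dict[str, List[str]]:
    """Map each candidate name to conflict reasons (empty list = none found)."""
    coauthor_set = {name.casefold() for name in coauthors}
    report = {}
    for cand in candidates:
        reasons = []
        if cand["name"].casefold() in coauthor_set:
            reasons.append("recent co-author")
        if cand["institution"].casefold() == my_institution.casefold():
            reasons.append("same institution")
        report[cand["name"]] = reasons
    return report


if __name__ == "__main__":
    # All names and institutions below are placeholders for illustration.
    report = audit_conflicts(
        [{"name": "Candidate A", "institution": "Example University"},
         {"name": "Candidate B", "institution": "Other Institute"}],
        coauthors=["candidate b"],
        my_institution="example university",
    )
    for name, reasons in report.items():
        print(name, "->", reasons or "no conflict found")
```

An empty list means only that these two checks found nothing, which is weaker than “no conflict”; the editor-ready statement should say which checks were run.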

Ethics

Responsible use: Suggesting reviewers is normal and expected, but the line between “finding the right expert reader” and “stacking the deck” is real. Suggest people who will give a fair read, not people you can manipulate; a list with zero potential critics is a red flag to any good editor, and norms vary by journal (some forbid recent co-authors or same-institution suggestions; some don’t take suggestions at all, so check). Never contact a candidate to lobby for a favorable review, and never fabricate engagement. Sentiment scoring uses public statements; do not infer private feelings or build profiles beyond the task. And the hard rule for the inverse case (call it Reviewer 4): never point an AI at a manuscript you have been assigned to review in order to “find allies,” shape your verdict, or auto-write a review. This scaffold is for your own drafts and submissions only.
