The Problem
When planning clinical validation for AI/ML-enabled medical devices, two study designs dominate regulatory submissions:
- MRMC (Multi-Reader Multi-Case) studies, which compare unaided human performance to AI-aided performance.
- Standalone studies, which evaluate algorithm performance directly against ground truth, independent of clinical workflow.
Most companies treat these as separate, sequential commitments—performing one study, then later deciding whether a second is necessary. But this mindset leaves time and data on the table.
In practice, you can—and often should—continue collecting data for a standalone study even after the MRMC study is complete. Doing so allows you to:
- Accelerate time to clearance: Run the MRMC first and continue collecting standalone data during the reader washout and **even after you submit the 510(k) and wait for FDA's feedback.**
- De-risk your submission: If the MRMC doesn’t fully meet expectations, or the data you collected has issues like missing race, ethnicity, or scanner manufacturer representation, the standalone dataset provides a powerful backup.
Parallel accrual maximizes speed to market. The standalone study can also compensate for gaps in the data collected for the MRMC study, such as insufficient representation of race, ethnicity, scanner manufacturer, convolution kernel, and other subgroups.
Executive Summary
- MRMC and standalone answer different questions. MRMC tells FDA how your device changes reader performance; standalone shows your algorithm’s native accuracy and generalization across devices, sites, and subgroups. FDA’s CADe guidance explicitly anticipates both kinds of evidence and even recommends engaging FDA on both protocols up front.
- In real 510(k)s, standalone datasets are usually much larger than MRMC. In our cleaned dataset of recent radiology SaMD 510(k)s, the median standalone sample size was ~4× MRMC (IQR ~1–9×). That’s by design: MRMC studies are tightly controlled and necessarily small; standalone studies are where you prove breadth and robustness.
- You can keep accruing standalone cases during the MRMC washout. This is common practice in cleared devices across modalities (X‑ray, mammography, ultrasound/echo). The data streams are independent; continuing standalone accrual doesn’t contaminate the reader study and accelerates your filing. FDA’s CADe guidance encourages advance alignment on both protocols via Pre‑Sub.
- MRMC “problems” aren’t a show‑stopper if your claims are crafted correctly. Reader studies can run into variance, prevalence enrichment effects, or subgroup noise. Strong standalone evidence does not erase a negative MRMC for a reader‑improvement claim—but it does mitigate risk, supports safety and effectiveness, and can enable claim shaping (e.g., non‑inferiority to a prior version under a Special 510(k) or manufacturer‑expansion under a PCCP).
- A pragmatic “fast 510(k)” plan in ~3 months is realistic when you: (1) run the MRMC first; (2) keep standalone accrual running through the washout; (3) pre‑align endpoints via Q‑Sub; and (4) pick the right path (Traditional vs Special 510(k)).
About the Author
Yujan Shrestha, MD is a physician‑engineer and Partner at Innolitics specializing in AI/ML SaMD. He unites clinical‑evidence design with software and regulatory execution, drawing on clinical, regulatory, and technical expertise to find the least burdensome approach to get your AI/ML SaMD to market as quickly as possible without cutting corners on safety and efficacy. Together with his execution team, he delivers a submission‑ready file in **3 months** (once prerequisites are met).
Why this matters for our “FDA submission in 3 months” guarantee
Our three‑month submission guarantee lives or dies on removing timeline risk. The biggest single risk is the pivotal clinical evidence package—especially the MRMC reader study (to show clinical utility) and the Standalone assessment (to show algorithm robustness). The fastest way to keep that risk small is simple: don’t let your data pipeline stop. Keep Standalone accrual running during MRMC washout and through the 510(k) interactive review window. Below are the concrete failure modes you avoid—and the misconceptions that, if left uncorrected, push companies into repeating a whole MRMC when more Standalone data would have solved the problem.
What can go wrong if you stop collecting data during washout and while you wait on 510(k) review
Borderline precision (wide CIs) on a key endpoint
- What it looks like: The MRMC effect size (e.g., AUC‑Δ or sensitivity at fixed specificity) is on target, but confidence intervals are wider than planned—often because disease prevalence or case difficulty skewed harder than forecast.
- Why it matters: FDA may ask for additional analyses or subgroup breakdowns to ensure clinical risks are covered.
- If you stopped collecting: You must re‑open sites or negotiate access to new data sources—work measured in weeks to months.
- If you kept collecting: You already have a fresh Standalone tranche to tighten operating‑point CIs, add subgroups, or show calibration stability. No site re‑activation. (The sketch below shows how CI width shrinks as n grows.)
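To make the precision stakes concrete, here is a minimal sketch (hypothetical numbers, not from any submission) of how a Wilson 95% confidence interval around a fixed 90% observed sensitivity tightens as the standalone sample grows:

```python
import math

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial proportion (e.g., sensitivity)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

# Hypothetical: 90% observed sensitivity at two standalone sample sizes.
for n in (150, 600):
    lo, hi = wilson_ci(round(0.9 * n), n)
    print(f"n={n}: 95% CI ({lo:.3f}, {hi:.3f}), width {hi - lo:.3f}")
# n=150: 95% CI (0.842, 0.938), width 0.097
# n=600: 95% CI (0.873, 0.922), width 0.048
```

The arithmetic is unforgiving: quadrupling n roughly halves the CI width, which is exactly why a fresh standalone tranche is the cheapest fix for a borderline endpoint.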
Subgroup coverage is thin (vendor, scanner, age/sex, skin tone, BMI, clinical site)
- What it looks like: MRMC hits its primary endpoint overall but shows wide variability or low counts in one or two risk‑relevant slices.
- Why it matters: FDA can request subgroup breakouts; payers and key customers will ask the same.
- If you stopped collecting: You’re stuck writing narrative justifications or trying to assemble one‑off data after the fact.
- If you kept collecting: You can top‑up the exact subgroups that need precision, show stable point estimates, and move on.
Selection bias discovered late
- What it looks like: Post‑hoc checks show over‑representation of obvious cases or under‑representation of edge cases (small lesions, motion, rare comorbidities).
- Why it matters: Calls into question generalizability and safety in real use.
- If you stopped collecting: You need to find and contract new sources, re‑open IRBs, and rebuild adjudication—time you don’t have.
- If you kept collecting: Your pipeline already includes quota controls (or at least tracking) for under‑represented strata, so you can supply balanced evidence quickly. (A minimal quota tracker is sketched below.)
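Quota tracking does not need heavy infrastructure. Below is a minimal sketch with hypothetical vendor strata and quota targets; the same pattern works for age bands, sex, site, or skin tone:

```python
from collections import Counter

# Hypothetical quota targets per scanner-vendor stratum.
QUOTAS = {"GE": 300, "Siemens": 300, "Philips": 300, "Canon": 150}

def accrual_gaps(collected: list[dict]) -> dict[str, int]:
    """How many more cases each stratum needs to reach its quota."""
    counts = Counter(case["vendor"] for case in collected)
    return {v: max(target - counts[v], 0) for v, target in QUOTAS.items()}

# Example tally during the MRMC washout: Canon is clearly under-accrued.
cases = ([{"vendor": "GE"}] * 310 + [{"vendor": "Siemens"}] * 295
         + [{"vendor": "Philips"}] * 301 + [{"vendor": "Canon"}] * 80)
print(accrual_gaps(cases))  # {'GE': 0, 'Siemens': 5, 'Philips': 0, 'Canon': 70}
```

Run weekly against your accrual log, a report like this tells sites exactly which cases to prioritize while the reader study proceeds in parallel.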
MRMC vs Standalone: Conceptual Differences
Understanding the key differences between these validation approaches is critical. MRMC studies involve readers interpreting cases both with and without AI assistance, effectively capturing workflow impact and demonstrating real clinical utility. However, they come with significant logistical challenges including reader recruitment, required washout periods, and coordinating multiple reading sessions. In contrast, standalone studies directly compare algorithm outputs against definitive ground truth (whether from biopsy, invasive measurements, or expert consensus). While standalone studies are more straightforward, faster to execute, and allow for larger sample sizes, they don't capture the crucial human-AI interaction that's central to how these tools actually function in clinical settings. The strategic choice—and timing—between these approaches can significantly impact your regulatory timeline.
The Methodology
I have a database of every 510(k) and De Novo summary PDF ever published by FDA (over 90,000 of them). With the help of LLMs, I found ~58 summaries describing a multi-reader multi-case comparative effectiveness study. From these, I used an LLM to extract the sample sizes used for MRMC and standalone performance testing. I spot-checked the results and provide references at the end of this article so you can verify them yourself. I encourage you to double-check any particular number before using it as precedent for your own submission.
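For readers who want to reproduce this kind of screen, here is a minimal sketch of the extraction step. It assumes the openai Python client (v1.x); the model name, prompt, and JSON schema are illustrative rather than the exact ones used for this analysis, and every extracted number should be spot-checked against the source PDF.

```python
import json
from openai import OpenAI  # assumes the openai>=1.0 client; any LLM API works

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = """From the 510(k) summary text below, extract the MRMC and standalone
test set sizes. Reply as JSON with keys mrmc_n, mrmc_unit, standalone_n, and
standalone_unit; use null for any study type that is absent.

Summary:
"""

def extract_sample_sizes(summary_text: str) -> dict:
    """One call per 510(k) summary; outputs feed the ratio table below."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # illustrative choice; the article does not name a model
        messages=[{"role": "user", "content": PROMPT + summary_text}],
        response_format={"type": "json_object"},  # request machine-readable output
    )
    return json.loads(resp.choices[0].message.content)
```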
The Results
- Units: MRMC often reports cases/images/exams; standalone often reports patients/cases/images. A small subset cites “studies” or “lesions.”
- Interpretation: Across diverse device types, standalone datasets trend 3–4× larger than MRMC at the median, consistent with the idea that MRMC extracts more information per case via multiple readers, while standalone testing must increase N to achieve comparable precision. There is also a cluster at exactly 1:1, since reusing the MRMC dataset for the standalone analysis is acceptable. (The sketch below recomputes these statistics from the references table.)
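As a sanity check, the headline statistics can be recomputed directly from the references table at the end of this article. A minimal sketch, with the ratios transcribed from that table:

```python
import statistics

# Standalone:MRMC ratios transcribed from the references table below.
ratios = [0.2, 0.3, 0.5, 0.5, 0.7, 0.7] + [1.0] * 11 + [
    1.1, 1.4, 1.4, 1.6, 1.8, 1.9, 2.0, 2.0, 2.5, 3.0, 3.8, 4.0, 4.2,
    4.4, 4.4, 4.6, 4.6, 4.7, 4.9, 5.0, 5.4, 5.5, 5.5, 5.9, 8.5, 10.0,
    10.1, 10.1, 11.7, 14.1, 16.2, 18.6, 20.5, 20.5, 20.5, 31.4, 68.4, 84.0,
]

q1, median, q3 = statistics.quantiles(ratios, n=4)  # quartile cut points
print(f"n={len(ratios)} devices; median {median:.1f}x; IQR {q1:.1f}-{q3:.1f}x")
# n=55 devices; median 3.8x; IQR 1.0-8.5x
```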
What the cleared devices actually do (recent 510(k) examples)
Below are representative, publicly available 510(k) summaries illustrating the MRMC vs standalone split and explicit washout language. They also show that maintaining a large standalone corpus alongside controlled MRMC is common.
Chest‑CAD (CXR, Imagen) — K210666
- MRMC: 24 clinical readers × 238 cases (two sessions with ≥28‑day washout).
- Standalone: 20,000 chest radiographs from 12 U.S. sites.
- Both streams: algorithm subgroup metrics, followed by reader AUC/sensitivity/specificity deltas under DBM (Dorfman-Berbaum-Metz) analysis.
FractureDetect (MSK radiographs, Imagen) — K193417
- MRMC: 175 cases across study types; aided readers showed improved AUC in the paired MRMC design.
- Standalone: 11,970 radiographs with high AUC (0.982), stratified per anatomic region.
MammoScreen® 3 (Therapixel) — K240301
- MRMC: Retrospective dataset with 240 combined DBT/2D mammograms.
- Standalone: 7,544 exams for algorithm performance (with priors support).
MammoScreen® 4 (Therapixel) — K243679 (with PCCP)
- Standalone: 1,475 patients / 2,950 studies to establish non‑inferiority vs earlier versions across multiple endpoints; data independence and subgrouping specified.
- MRMC: Three separate MRMC studies (FFDM; DBT; combined with priors) showed superiority of aided vs unaided readers.
- Regulatory note: cleared with a Predetermined Change Control Plan (PCCP), which pairs clean standalone endpoints to future updates—an approach tailor‑made for continued standalone accrual.
Rayvolve LN (Pulmonary Nodules on CXR, AZmed) — K243831
- Standalone: 2,181 radiographs with AUC/sensitivity/specificity and subgroup analyses.
- MRMC: 400 cases; readers significantly improved AUC, sensitivity, and specificity with ≥1‑month washout.
EchoSolv AS (Severe Aortic Stenosis, Echo IQ) — K241245
- Standalone: 6,268 TTE studies; AUROC 0.948 with extensive subgroup reporting (age, sex, ethnicity, BMI, LVEF).
- MRMC: 5 readers × 200 TTE studies. (Cardiology, 21 CFR 892.2060.)
Saige‑Dx (DBT, DeepHealth/RadNet) — K220105 and K251873
- K220105 (Traditional 510(k)): MRMC: 18 MQSA readers × 240 cases with ≥4‑week washout; Standalone: 1,304 cases from 9 U.S. sites.
- K251873 (Special 510(k), 2025): Standalone: 2,002 DBT mammograms (multi‑vendor: Hologic & GE), non‑inferiority vs predicate (K243688). Prior MRMC evidence remained applicable. Illustrates how ongoing standalone accrual supports faster modifications/expansions.
When Standalone Isn’t Bigger: What We Learn from Devices with Standalone:MRMC ≤1
What the data show. In our data snapshot, 17 devices report a Standalone:MRMC ratio of ≤1. This does not mean those programs are under‑evidenced; rather, it reflects claim type, task design, and what “n” measures (patients vs. scans vs. lesions). In several categories, including image‑acquisition guidance, scoring/quantification tools, and high‑annotation‑cost tasks, it is common to see MRMC datasets as large as or larger than standalone sets.
Why the ratio can be ≤1 (and still be appropriate)
- Claim type drives the evidence mix.
  - Reader‑impact claims (e.g., “aided reads are better”) naturally put more weight on MRMC, sometimes with hundreds of cases and many readers to detect a realistic ΔAUC.
  - Guidance / acquisition‑optimization tools (product code QJU) test operator performance under controlled reading/acquisition tasks; standalone data may be narrower (e.g., algorithm gating or quality thresholds).
- Units matter. Ratio calculations can look “equal” on paper while measuring different denominators: patients vs. nodules, cases vs. lesions, cases vs. series. That’s acceptable if the protocol is clear and the unit maps to the endpoint and labeling.
- High‑effort truthing. For some devices, ground truth is expensive (full ROI segmentation, biopsy, consensus, longitudinal follow‑up), whereas the MRMC measurements can be less burdensome (e.g., categorical reads or length measurements).
Conclusion
Our analysis across dozens of AI/ML SaMD 510(k) submissions reveals several consistent study design frameworks:
- Compact MRMC Studies: Reader studies typically involve 12–24 readers and 200–400 cases, with ≥4-week washouts to measure aided vs. unaided reader performance.
- Balanced Evidence Strategies: Precedent exists for MRMC studies that reuse the same dataset for the standalone test, yielding a 1:1 standalone‑to‑MRMC ratio.
- Large Standalone Datasets: Standalone validations are often 3× the size of the MRMC set or more, covering thousands of patients, exams, or images across sites, vendors, and subgroups to demonstrate robustness and independence from training data.
- Parallelization as a Speed Lever: The fastest clearances come from sponsors who run MRMC early and keep standalone data accrual running through washout.
These frameworks—compact MRMC for reader effect, expansive standalone for robustness, and careful protocol separation—reflect practices distilled over years of submissions and refined by real-world edge cases. They will continue to evolve as models, claims, and regulatory pathways adapt.
If you have a working product and want to get to market ASAP, reach out today. Let’s schedule the gap assessment, finalize claims and endpoints, and put a date on your submission calendar now. We can add certainty and speed to your FDA journey. Our 3-Month 510(k) Submission program guarantees FDA submission in 3 months with clearance in 3 to 6 months afterwards. We are able to offer this accelerated service because, unlike other firms, we have physicians, engineers, and regulatory consultants all in-house focused on AI/ML SaMD. We leverage decades of combined experience, fine-tuned templates, and custom-built submission software to offer a done-for-you, turnkey fast 510(k) submission.
No working product? Reach out anyway. We can build it for you too!
Our People. Our Process. Our Proof.
References
K Number | Device Name | MRMC n | MRMC Unit | Standalone n | Standalone Unit | Standalone:MRMC Ratio
--- | --- | --- | --- | --- | --- | ---
DEN190040 | Caption Guidance | 240 | patients | 50 | patients | 0.2 |
K223347 | UltraSight AI Guidance | 240 | subjects | 75 | subjects | 0.3 |
K240044 | CADDIE | 841 | patients | 389 | patients | 0.5 |
K161201 | ClearRead CT | 200 | cases | 100 | cases | 0.5 |
K241770 | Prostate MR AI (VA10A) | 340 | cases | 222 | transversal T2 series | 0.7 |
K221624 | Avenda Health AI Prostate Cancer Planning Software | 200 | cases | 137 | patients | 0.7 |
K243294 | Brainomix 360 e-ASPECTS | 140 | cases | 137 | scans | 1.0 |
K212783 | ProstatID | 150 | patients | 150 | cases | 1.0 |
K240712 | icobrain aria | 199 | cases | 199 | cases | 1.0 |
K233342 | CINA-ASPECTS | 200 | cases | 200 | cases | 1.0 |
K211541 | MammoScreen 2.0 | 240 | cases | 240 | cases | 1.0 |
K202300 | Optellum Virtual Nodule Clinic, Optellum software, Optellum platform | 300 | subjects | 300 | nodules | 1.0 |
K243688 | Saige-Dx (3.1.0) | 419 | studies | 419 | studies | 1.0 |
K240697 | See-Mode Augmented Reporting Tool, Thyroid (SMART-T) | 600 | cases | 600 | cases | 1.0 |
K190442 | Koios DS for Breast | 900 | patient cases | 900 | lesions | 1.0 |
K240003 | Velmeni for Dentists (V4D) | 1,797 | images | 1,797 | images | 1.0 |
K210365 | Second Opinion | 2,010 | images | 2,010 | images | 1.0 |
K242683 | QP-Prostate® CAD | 228 | cases | 247 | lesions | 1.1 |
K242130 | Koios DS | 650 | cases | 900 | lesions | 1.4 |
K212616 | Koios DS | 650 | cases | 900 | lesions | 1.4 |
K220624 | AI4CMR v1.0 | 146 | cases | 238 | cases | 1.6 |
K210670 | BU-CAD | 628 | cases | 1,139 | cases | 1.8 |
K233738 | Overjet Caries Assist-Pediatric | 636 | patients | 1,190 | images | 1.9 |
K201019 | Genius AI Detection | 390 | cases | 764 | cases | 2.0 |
K250221 | StrokeSENS ASPECTS Software Application | 100 | CT scans | 200 | CT scans | 2.0 |
K242437 | Smile Dx® | 352 | cases | 867 | cases | 2.5 |
K231678 | Overjet Periapical Radiolucency Assist | 379 | images | 1,147 | images | 3.0 |
K232384 | Videa Dental Assist | 378 | radiographs | 1,445 | radiographs | 3.8 |
K241725 | Better Diagnostics Caries Assist (BDCA) Version 1.0 | 328 | images | 1,298 | images | 4.0 |
K222176 | BoneView | 480 | cases | 2,000 | radiographs | 4.2 |
K251071 | Fetal EchoScan (v1.1) | 200 | exams | 877 | exams | 4.4 |
K242342 | Fetal EchoScan | 200 | exams | 877 | exams | 4.4 |
K213795 | Videa Caries Assist | 226 | radiographs | 1,034 | radiographs | 4.6 |
K230144 | Denti.AI Detect | 154 | images | 709 | images | 4.6 |
K221564 | Brainomix 360 e-ASPECTS | 54 | cases | 256 | patients | 4.7 |
K182373 | PowerLook Tomo Detection V2 Software | 260 | cases | 1,265 | cases | 4.9 |
DEN180005 | OsteoDetect | 200 | cases | 1,000 | images | 5.0 |
K220105 | Saige-Dx | 240 | cases | 1,304 | cases | 5.4 |
K243831 | Rayvolve LN | 400 | cases | 2,181 | radiographs | 5.5 |
DEN230008 | DermaSensor | 286 | lesions | 1,579 | lesions | 5.5 |
DEN170022 | QuantX | 111 | cases | 652 | lesions | 5.9 |
K231470 | Lunit INSIGHT DBT | 258 | DBT exams | 2,202 | DBT exams | 8.5 |
K231001 | DeepTek CXR Analyzer v1.0 | 300 | cases | 3,000 | scans | 10.0 |
K211678 | Lunit INSIGHT MMG | 240 | mammograms | 2,412 | mammograms | 10.1 |
K242171 | TechCare Trauma | 769 | cases | 7,744 | images | 10.1 |
K243614 | Sonio Suspect | 750 | images | 8,745 | images | 11.7 |
K220164 | Rayvolve | 186 | patients | 2,626 | radiographs | 14.1 |
K241620 | ChestView US | 240 | cases | 3,884 | cases | 16.2 |
K212365 | BoneView | 480 | cases | 8,918 | radiographs | 18.6 |
K230085 | Lung-CAD | 244 | cases | 5,000 | cases | 20.5 |
K223811 | Lung-CAD | 244 | cases | 5,000 | cases | 20.5 |
K213353 | Aorta-CAD | 244 | cases | 5,000 | cases | 20.5 |
K240301 | MammoScreen® (3) | 240 | mammograms | 7,544 | exams | 31.4 |
K193417 | FractureDetect (FX) | 175 | cases | 11,970 | radiographs | 68.4 |
K210666 | Chest-CAD | 238 | cases | 20,000 | cases | 84.0 |