The Problem
When planning clinical validation for AI/ML-enabled medical devices, two study designs dominate regulatory submissions:
- MRMC (Multi-Reader Multi-Case) studies, which compare unaided human performance to AI-aided performance.
- Standalone studies, which evaluate algorithm performance directly against ground truth, independent of clinical workflow.
Most companies treat these as separate, sequential commitments—performing one study, then later deciding whether a second is necessary. But this mindset leaves time and data on the table.
In practice, you can—and often should—continue collecting data for a standalone study even after the MRMC study is complete. Doing so allows you to:
- Accelerate time to clearance: Run the MRMC first and continue collecting standalone data during the reader washout and **even after you submit the 510(k) and wait for FDA's feedback.**
- De-risk your submission: If the MRMC doesn’t fully meet expectations, or the data you collected has issues like missing race, ethnicity, or scanner manufacturer representation, the standalone dataset provides a powerful backup.
Parallel accrual maximizes speed to market. The standalone study can also compensate for gaps in the data collected for the MRMC study, such as insufficient representation of race, ethnicity, scanner manufacturer, convolution kernel, and other subgroups.
Executive Summary
- MRMC and standalone answer different questions. MRMC tells FDA how your device changes reader performance; standalone shows your algorithm’s native accuracy and generalization across devices, sites, and subgroups. FDA’s CADe guidance explicitly anticipates both kinds of evidence and even recommends engaging FDA on both protocols up front.
- In real 510(k)s, standalone datasets are usually much larger than MRMC. In our cleaned dataset of recent radiology SaMD 510(k)s, the median standalone sample size was ~4× MRMC (IQR ~1–9×). That’s by design: MRMC studies are tightly controlled and necessarily small; standalone studies are where you prove breadth and robustness.
- You can keep accruing standalone cases during the MRMC washout. This is common practice in cleared devices across modalities (X‑ray, mammography, ultrasound/echo). The data streams are independent; continuing standalone accrual doesn’t contaminate the reader study and accelerates your filing. FDA’s CADe guidance encourages advance alignment on both protocols via Pre‑Sub.
- MRMC “problems” aren’t a show‑stopper if your claims are crafted correctly. Reader studies can run into variance, prevalence enrichment effects, or subgroup noise. Strong standalone evidence does not erase a negative MRMC for a reader‑improvement claim—but it does mitigate risk, supports safety and effectiveness, and can enable claim shaping (e.g., non‑inferiority to a prior version under a Special 510(k) or manufacturer‑expansion under a PCCP).
- A pragmatic “fast 510(k)” plan in ~3 months is realistic when you: (1) run the MRMC first; (2) keep standalone accrual running through the washout; (3) pre‑align endpoints via Q‑Sub; and (4) pick the right path (Traditional vs Special 510(k)).
About the Author
Yujan Shrestha, MD is a physician‑engineer and Partner at Innolitics specializing in AI/ML SaMD. He unites clinical‑evidence design with software and regulatory execution, drawing on clinical, regulatory, and technical expertise to find the least burdensome approach to get your AI/ML SaMD to market as quickly as possible without cutting corners on safety and efficacy. Together with his execution team, he delivers a submission‑ready file in **3 months** (once prerequisites are met).
Why this matters for our “FDA submission in 3 months” guarantee
Our three‑month submission guarantee lives or dies on removing timeline risk. The biggest single risk is the pivotal clinical evidence package—especially the MRMC reader study (to show clinical utility) and the Standalone assessment (to show algorithm robustness). The fastest way to keep that risk small is simple: don’t let your data pipeline stop. Keep Standalone accrual running during MRMC washout and through the 510(k) interactive review window. Below are the concrete failure modes you avoid—and the misconceptions that, if left uncorrected, push companies into repeating a whole MRMC when more Standalone data would have solved the problem.
What can go wrong if you stop collecting data during washout and while you wait on 510(k) review
Borderline precision (wide CIs) on a key endpoint
- What it looks like: The MRMC effect size (e.g., AUC‑Δ or sensitivity at fixed specificity) is on target, but confidence intervals are wider than planned—often because disease prevalence or case difficulty skewed harder than forecast.
- Why it matters: FDA may ask for additional analyses or subgroup breakdowns to ensure clinical risks are covered.
- If you stopped collecting: You must re‑open sites or negotiate access to new data sources—work measured in weeks to months.
- If you kept collecting: You already have a fresh Standalone tranche to tighten operating‑point CIs, add subgroups, or show calibration stability. No site re‑activation. (The sketch below shows how CI width shrinks as n grows.)
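To make the precision stakes concrete, here is a minimal sketch (hypothetical numbers, not from any submission) of how a Wilson 95% confidence interval around a fixed 90% observed sensitivity tightens as the standalone sample grows:

```python
import math

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial proportion (e.g., sensitivity)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

# Hypothetical: 90% observed sensitivity at two standalone sample sizes.
for n in (150, 600):
    lo, hi = wilson_ci(round(0.9 * n), n)
    print(f"n={n}: 95% CI ({lo:.3f}, {hi:.3f}), width {hi - lo:.3f}")
# n=150: 95% CI (0.842, 0.938), width 0.097
# n=600: 95% CI (0.873, 0.922), width 0.048
```

The arithmetic is unforgiving: quadrupling n roughly halves the CI width, which is exactly why a fresh standalone tranche is the cheapest fix for a borderline endpoint.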
Subgroup coverage is thin (vendor, scanner, age/sex, skin tone, BMI, clinical site)
- What it looks like: MRMC hits its primary endpoint overall but shows wide variability or low counts in one or two risk‑relevant slices.
- Why it matters: FDA can request subgroup breakouts; payers and key customers will ask the same.
- If you stopped collecting: You’re stuck writing narrative justifications or trying to assemble one‑off data after the fact.
- If you kept collecting: You can top‑up the exact subgroups that need precision, show stable point estimates, and move on.
Selection bias discovered late
- What it looks like: Post‑hoc checks show over‑representation of obvious cases or under‑representation of edge cases (small lesions, motion, rare comorbidities).
- Why it matters: Calls into question generalizability and safety in real use.
- If you stopped collecting: You need to find and contract new sources, re‑open IRBs, and rebuild adjudication—time you don’t have.
- If you kept collecting: Your pipeline already includes quota controls (or at least tracking) for under‑represented strata, so you can supply balanced evidence quickly. (A minimal quota tracker is sketched below.)
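Quota tracking does not need heavy infrastructure. Below is a minimal sketch with hypothetical vendor strata and quota targets; the same pattern works for age bands, sex, site, or skin tone:

```python
from collections import Counter

# Hypothetical quota targets per scanner-vendor stratum.
QUOTAS = {"GE": 300, "Siemens": 300, "Philips": 300, "Canon": 150}

def accrual_gaps(collected: list[dict]) -> dict[str, int]:
    """How many more cases each stratum needs to reach its quota."""
    counts = Counter(case["vendor"] for case in collected)
    return {v: max(target - counts[v], 0) for v, target in QUOTAS.items()}

# Example tally during the MRMC washout: Canon is clearly under-accrued.
cases = ([{"vendor": "GE"}] * 310 + [{"vendor": "Siemens"}] * 295
         + [{"vendor": "Philips"}] * 301 + [{"vendor": "Canon"}] * 80)
print(accrual_gaps(cases))  # {'GE': 0, 'Siemens': 5, 'Philips': 0, 'Canon': 70}
```

Run weekly against your accrual log, a report like this tells sites exactly which cases to prioritize while the reader study proceeds in parallel.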
MRMC vs Standalone: Conceptual Differences
Understanding the key differences between these validation approaches is critical. MRMC studies involve readers interpreting cases both with and without AI assistance, effectively capturing workflow impact and demonstrating real clinical utility. However, they come with significant logistical challenges including reader recruitment, required washout periods, and coordinating multiple reading sessions. In contrast, standalone studies directly compare algorithm outputs against definitive ground truth (whether from biopsy, invasive measurements, or expert consensus). While standalone studies are more straightforward, faster to execute, and allow for larger sample sizes, they don't capture the crucial human-AI interaction that's central to how these tools actually function in clinical settings. The strategic choice—and timing—between these approaches can significantly impact your regulatory timeline.
The Methodology
I have a database of every 510(k) and De Novo summary PDF ever published by FDA (over 90,000 of them). With the help of LLMs, I found ~58 summaries describing a multi-reader multi-case comparative effectiveness study. From these, I used an LLM to extract the sample sizes used for MRMC and standalone performance testing. I spot-checked the results and provide references at the end of this article so you can verify them yourself. I encourage you to double-check any particular number before using it as precedent for your own submission.
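For readers who want to reproduce this kind of screen, here is a minimal sketch of the extraction step. It assumes the openai Python client (v1.x); the model name, prompt, and JSON schema are illustrative rather than the exact ones used for this analysis, and every extracted number should be spot-checked against the source PDF.

```python
import json
from openai import OpenAI  # assumes the openai>=1.0 client; any LLM API works

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = """From the 510(k) summary text below, extract the MRMC and standalone
test set sizes. Reply as JSON with keys mrmc_n, mrmc_unit, standalone_n, and
standalone_unit; use null for any study type that is absent.

Summary:
"""

def extract_sample_sizes(summary_text: str) -> dict:
    """One call per 510(k) summary; outputs feed the ratio table below."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # illustrative choice; the article does not name a model
        messages=[{"role": "user", "content": PROMPT + summary_text}],
        response_format={"type": "json_object"},  # request machine-readable output
    )
    return json.loads(resp.choices[0].message.content)
```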
The Results
- Units: MRMC often reports cases/images/exams; standalone often reports patients/cases/images. A small subset cites “studies” or “lesions.”
- Interpretation: Across diverse device types, standalone datasets trend 3–4× larger than MRMC at the median, consistent with the idea that MRMC extracts more information per case via multiple readers, while standalone testing must increase N to achieve comparable precision. There is also a cluster at exactly 1:1, since reusing the MRMC dataset for the standalone analysis is acceptable. (The sketch below recomputes these statistics from the references table.)
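As a sanity check, the headline statistics can be recomputed directly from the references table at the end of this article. A minimal sketch, with the ratios transcribed from that table:

```python
import statistics

# Standalone:MRMC ratios transcribed from the references table below.
ratios = [0.2, 0.3, 0.5, 0.5, 0.7, 0.7] + [1.0] * 11 + [
    1.1, 1.4, 1.4, 1.6, 1.8, 1.9, 2.0, 2.0, 2.5, 3.0, 3.8, 4.0, 4.2,
    4.4, 4.4, 4.6, 4.6, 4.7, 4.9, 5.0, 5.4, 5.5, 5.5, 5.9, 8.5, 10.0,
    10.1, 10.1, 11.7, 14.1, 16.2, 18.6, 20.5, 20.5, 20.5, 31.4, 68.4, 84.0,
]

q1, median, q3 = statistics.quantiles(ratios, n=4)  # quartile cut points
print(f"n={len(ratios)} devices; median {median:.1f}x; IQR {q1:.1f}-{q3:.1f}x")
# n=55 devices; median 3.8x; IQR 1.0-8.5x
```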
What the cleared devices actually do (recent 510(k) examples)
Below are representative, publicly available 510(k) summaries illustrating the MRMC vs standalone split and explicit washout language. They also show that maintaining a large standalone corpus alongside controlled MRMC is common.
Chest‑CAD (CXR, Imagen) — K210666
- MRMC: 24 clinical readers × 238 cases (two sessions with ≥28‑day washout).
- Standalone: 20,000 chest radiographs from 12 U.S. sites.
- Both streams: algorithm subgroup metrics, followed by reader AUC/sensitivity/specificity deltas under DBM (Dorfman-Berbaum-Metz) analysis.
FractureDetect (MSK radiographs, Imagen) — K193417
- MRMC: 175 cases across study types; aided readers showed improved AUC in the paired MRMC design.
- Standalone: 11,970 radiographs with high AUC (0.982), stratified per anatomic region.
MammoScreen® 3 (Therapixel) — K240301
- MRMC: Retrospective dataset with 240 combined DBT/2D mammograms.
- Standalone: 7,544 exams for algorithm performance (with priors support).
MammoScreen® 4 (Therapixel) — K243679 (with PCCP)
- Standalone: 1,475 patients / 2,950 studies to establish non‑inferiority vs earlier versions across multiple endpoints; data independence and subgrouping specified.
- MRMC: Three separate MRMC studies (FFDM; DBT; combined with priors) showed superiority of aided vs unaided readers.
- Regulatory note: cleared with a Predetermined Change Control Plan (PCCP), which pairs clean standalone endpoints to future updates—an approach tailor‑made for continued standalone accrual.
Rayvolve LN (Pulmonary Nodules on CXR, AZmed) — K243831
- Standalone: 2,181 radiographs with AUC/sensitivity/specificity and subgroup analyses.
- MRMC: 400 cases; readers significantly improved AUC, sensitivity, and specificity with ≥1‑month washout.
EchoSolv AS (Severe Aortic Stenosis, Echo IQ) — K241245
- Standalone: 6,268 TTE studies; AUROC 0.948 with extensive subgroup reporting (age, sex, ethnicity, BMI, LVEF).
- MRMC: 5 readers × 200 TTE studies. (Cardiology, 21 CFR 892.2060.)
Saige‑Dx (DBT, DeepHealth/RadNet) — K220105 and K251873
- K220105 (Traditional 510(k)): MRMC: 18 MQSA readers × 240 cases with ≥4‑week washout; Standalone: 1,304 cases from 9 U.S. sites.
- K251873 (Special 510(k), 2025): Standalone: 2,002 DBT mammograms (multi‑vendor: Hologic & GE), non‑inferiority vs predicate (K243688). Prior MRMC evidence remained applicable. Illustrates how ongoing standalone accrual supports faster modifications/expansions.
When Standalone Isn’t Bigger: What We Learn from Devices with Standalone:MRMC ≤1
What the data show. In our data snapshot, 17 devices report a Standalone:MRMC ratio of ≤1. This does not mean those programs are under‑evidenced; rather, it reflects claim type, task design, and what “n” measures (patients vs. scans vs. lesions). In several categories, including image‑acquisition guidance, scoring/quantification tools, and high‑annotation‑cost tasks, it is common to see MRMC datasets as large as or larger than standalone sets.
Why the ratio can be ≤1 (and still be appropriate)
- Claim type drives the evidence mix.
  - Reader‑impact claims (e.g., “aided reads are better”) naturally put more weight on MRMC, sometimes with hundreds of cases and many readers to detect a realistic ΔAUC.
  - Guidance / acquisition‑optimization tools (product code QJU) test operator performance under controlled reading/acquisition tasks; standalone data may be narrower (e.g., algorithm gating or quality thresholds).
- Units matter. Ratio calculations can look “equal” on paper while measuring different denominators: patients vs. nodules, cases vs. lesions, cases vs. series. That’s acceptable if the protocol is clear and the unit maps to the endpoint and labeling.
- High‑effort truthing. For some devices, ground truth is expensive (full ROI segmentation, biopsy, consensus, longitudinal follow‑up), whereas the MRMC measurements can be less burdensome (e.g., categorical reads or length measurements).
Conclusion
Our analysis across dozens of AI/ML SaMD 510(k) submissions reveals several consistent study design frameworks:
- Compact MRMC Studies: Reader studies typically involve 12–24 readers and 200–400 cases, with ≥4-week washouts to measure aided vs. unaided reader performance.
- Balanced Evidence Strategies: Precedent exists for MRMC studies that reuse the same dataset for the standalone test, yielding a 1:1 standalone‑to‑MRMC ratio.
- Large Standalone Datasets: Standalone validations are often 3× the size of the MRMC set or more, covering thousands of patients, exams, or images across sites, vendors, and subgroups to demonstrate robustness and independence from training data.
- Parallelization as a Speed Lever: The fastest clearances come from sponsors who run MRMC early and keep standalone data accrual running through washout.
These frameworks—compact MRMC for reader effect, expansive standalone for robustness, and careful protocol separation—reflect practices distilled over years of submissions and refined by real-world edge cases. They will continue to evolve as models, claims, and regulatory pathways adapt.
If you have a working product and want to get to market ASAP, reach out today. Let’s schedule the gap assessment, finalize claims and endpoints, and put a date on your submission calendar now. We can add certainty and speed to your FDA journey. Our 3-Month 510(k) Submission program guarantees FDA submission in 3 months with clearance in 3 to 6 months afterwards. We are able to offer this accelerated service because, unlike other firms, we have physicians, engineers, and regulatory consultants all in-house focused on AI/ML SaMD. We leverage decades of combined experience, fine-tuned templates, and custom-built submission software to offer a done-for-you, turnkey fast 510(k) submission.
No working product? Reach out anyway. We can build it for you too!
Our People. Our Process. Our Proof.
References
K Number | Device Name | MRMC n | MRMC Unit | Standalone n | Standalone Unit | Standalone:MRMC Ratio
--- | --- | --- | --- | --- | --- | ---
DEN190040 | Caption Guidance | 240 | patients | 50 | patients | 0.2 |
K223347 | UltraSight AI Guidance | 240 | subjects | 75 | subjects | 0.3 |
K240044 | CADDIE | 841 | patients | 389 | patients | 0.5 |
K161201 | ClearRead CT | 200 | cases | 100 | cases | 0.5 |
K241770 | Prostate MR AI (VA10A) | 340 | cases | 222 | transversal T2 series | 0.7 |
K221624 | Avenda Health AI Prostate Cancer Planning Software | 200 | cases | 137 | patients | 0.7 |
K243294 | Brainomix 360 e-ASPECTS | 140 | cases | 137 | scans | 1.0 |
K212783 | ProstatID | 150 | patients | 150 | cases | 1.0 |
K240712 | icobrain aria | 199 | cases | 199 | cases | 1.0 |
K233342 | CINA-ASPECTS | 200 | cases | 200 | cases | 1.0 |
K211541 | MammoScreen 2.0 | 240 | cases | 240 | cases | 1.0 |
K202300 | Optellum Virtual Nodule Clinic, Optellum software, Optellum platform | 300 | subjects | 300 | nodules | 1.0 |
K243688 | Saige-Dx (3.1.0) | 419 | studies | 419 | studies | 1.0 |
K240697 | See-Mode Augmented Reporting Tool, Thyroid (SMART-T) | 600 | cases | 600 | cases | 1.0 |
K190442 | Koios DS for Breast | 900 | patient cases | 900 | lesions | 1.0 |
K240003 | Velmeni for Dentists (V4D) | 1,797 | images | 1,797 | images | 1.0 |
K210365 | Second Opinion | 2,010 | images | 2,010 | images | 1.0 |
K242683 | QP-Prostate® CAD | 228 | cases | 247 | lesions | 1.1 |
K242130 | Koios DS | 650 | cases | 900 | lesions | 1.4 |
K212616 | Koios DS | 650 | cases | 900 | lesions | 1.4 |
K220624 | AI4CMR v1.0 | 146 | cases | 238 | cases | 1.6 |
K210670 | BU-CAD | 628 | cases | 1,139 | cases | 1.8 |
K233738 | Overjet Caries Assist-Pediatric | 636 | patients | 1,190 | images | 1.9 |
K201019 | Genius AI Detection | 390 | cases | 764 | cases | 2.0 |
K250221 | StrokeSENS ASPECTS Software Application | 100 | CT scans | 200 | CT scans | 2.0 |
K242437 | Smile Dx® | 352 | cases | 867 | cases | 2.5 |
K231678 | Overjet Periapical Radiolucency Assist | 379 | images | 1,147 | images | 3.0 |
K232384 | Videa Dental Assist | 378 | radiographs | 1,445 | radiographs | 3.8 |
K241725 | Better Diagnostics Caries Assist (BDCA) Version 1.0 | 328 | images | 1,298 | images | 4.0 |
K222176 | BoneView | 480 | cases | 2,000 | radiographs | 4.2 |
K251071 | Fetal EchoScan (v1.1) | 200 | exams | 877 | exams | 4.4 |
K242342 | Fetal EchoScan | 200 | exams | 877 | exams | 4.4 |
K213795 | Videa Caries Assist | 226 | radiographs | 1,034 | radiographs | 4.6 |
K230144 | Denti.AI Detect | 154 | images | 709 | images | 4.6 |
K221564 | Brainomix 360 e-ASPECTS | 54 | cases | 256 | patients | 4.7 |
K182373 | PowerLook Tomo Detection V2 Software | 260 | cases | 1,265 | cases | 4.9 |
DEN180005 | OsteoDetect | 200 | cases | 1,000 | images | 5.0 |
K220105 | Saige-Dx | 240 | cases | 1,304 | cases | 5.4 |
K243831 | Rayvolve LN | 400 | cases | 2,181 | radiographs | 5.5 |
DEN230008 | DermaSensor | 286 | lesions | 1,579 | lesions | 5.5 |
DEN170022 | QuantX | 111 | cases | 652 | lesions | 5.9 |
K231470 | Lunit INSIGHT DBT | 258 | DBT exams | 2,202 | DBT exams | 8.5 |
K231001 | DeepTek CXR Analyzer v1.0 | 300 | cases | 3,000 | scans | 10.0 |
K211678 | Lunit INSIGHT MMG | 240 | mammograms | 2,412 | mammograms | 10.1 |
K242171 | TechCare Trauma | 769 | cases | 7,744 | images | 10.1 |
K243614 | Sonio Suspect | 750 | images | 8,745 | images | 11.7 |
K220164 | Rayvolve | 186 | patients | 2,626 | radiographs | 14.1 |
K241620 | ChestView US | 240 | cases | 3,884 | cases | 16.2 |
K212365 | BoneView | 480 | cases | 8,918 | radiographs | 18.6 |
K230085 | Lung-CAD | 244 | cases | 5,000 | cases | 20.5 |
K223811 | Lung-CAD | 244 | cases | 5,000 | cases | 20.5 |
K213353 | Aorta-CAD | 244 | cases | 5,000 | cases | 20.5 |
K240301 | MammoScreen® (3) | 240 | mammograms | 7,544 | exams | 31.4 |
K193417 | FractureDetect (FX) | 175 | cases | 11,970 | radiographs | 68.4 |
K210666 | Chest-CAD | 238 | cases | 20,000 | cases | 84.0 |