How to Plan MRMC and Standalone Studies for a 3-Month 510(k)

 September 29, 2025

AI/ML, Regulatory

The Problem 🔗

Figure. Our secret sauce for a 3-month 510(k) submission: run the MRMC study first, keep collecting standalone cases in parallel, and use the washout period wisely. That way, when the FDA asks for more data—you already have it. Day 90 submission, Day 180 clearance.

When planning clinical validation for AI/ML-enabled medical devices, two study designs dominate regulatory submissions:

  • MRMC (Multi-Reader Multi-Case) studies, which compare unaided human performance to AI-aided performance.
  • Standalone studies, which evaluate algorithm performance directly against ground truth, independent of clinical workflow.

Most companies treat these as separate, sequential commitments—performing one study, then later deciding whether a second is necessary. But this mindset leaves time and data on the table.

In practice, you can—and often should—continue collecting data for a standalone study even after the MRMC study is complete. Doing so allows you to:

  • Accelerate time to clearance: Run the MRMC first and continue collecting standalone data during the reader washout and even after you submit the 510(k) and wait for FDA's feedback.
  • De-risk your submission: If the MRMC doesn’t fully meet expectations, or the data you collected has issues like missing race, ethnicity, or scanner manufacturer representation, the standalone dataset provides a powerful backup.

In short, continued standalone collection maximizes speed to market. It can also compensate for gaps in the data collected for the MRMC study, such as insufficient representation of race, ethnicity, scanner manufacturer, convolution kernel, and other subgroups.

Executive Summary 🔗

  • MRMC and standalone answer different questions. MRMC tells FDA how your device changes reader performance; standalone shows your algorithm’s native accuracy and generalization across devices, sites, and subgroups. FDA’s CADe guidance explicitly anticipates both kinds of evidence and even recommends engaging FDA on both protocols up front.
  • In real 510(k)s, standalone datasets are usually much larger than MRMC. In our cleaned dataset of recent radiology SaMD 510(k)s, the median standalone sample size was roughly 3–4× the MRMC size (IQR roughly 1–8×). That’s by design: MRMC studies are tightly controlled and necessarily small; standalone studies are where you prove breadth and robustness.
  • You can keep accruing standalone cases during the MRMC washout. This is common practice in cleared devices across modalities (X‑ray, mammography, ultrasound/echo). The data streams are independent; continuing standalone accrual doesn’t contaminate the reader study and accelerates your filing. FDA’s CADe guidance encourages advance alignment on both protocols via Pre‑Sub.
  • MRMC “problems” aren’t a show‑stopper if your claims are crafted correctly. Reader studies can run into variance, prevalence enrichment effects, or subgroup noise. Strong standalone evidence does not erase a negative MRMC for a reader‑improvement claim—but it does mitigate risk, supports safety and effectiveness, and can enable claim shaping (e.g., non‑inferiority to a prior version under a Special 510(k) or manufacturer‑expansion under a PCCP).
  • A pragmatic “fast 510(k)” plan in ~3 months is realistic when you: (1) run the MRMC first; (2) keep standalone accrual running through the washout; (3) pre‑align endpoints via Q‑Sub; and (4) pick the right path (Traditional vs Special 510(k)).

About the Author 🔗

Yujan Shrestha, MD is a physician–engineer and Partner at Innolitics specializing in AI/ML SaMD. He unites clinical‑evidence design with software and regulatory execution, using his clinical, regulatory, and technical expertise to find the least burdensome approach that gets your AI/ML SaMD to market as quickly as possible without cutting corners on safety and efficacy. With his execution team, he delivers a submission‑ready file in 3 months (once prerequisites are met).

Why this matters for our “FDA submission in 3 months” guarantee 🔗

Our three‑month submission guarantee lives or dies on removing timeline risk. The biggest single risk is the pivotal clinical evidence package—especially the MRMC reader study (to show clinical utility) and the Standalone assessment (to show algorithm robustness). The fastest way to keep that risk small is simple: don’t let your data pipeline stop. Keep Standalone accrual running during MRMC washout and through the 510(k) interactive review window. Below are the concrete failure modes you avoid—and the misconceptions that, if left uncorrected, push companies into repeating a whole MRMC when more Standalone data would have solved the problem.

What can go wrong if you stop collecting data during washout and while you wait on 510(k) review 🔗

Borderline precision (wide CIs) on a key endpoint 🔗

  • What it looks like: The MRMC effect size (e.g., AUC‑Δ or sensitivity at fixed specificity) is on target, but confidence intervals are wider than planned—often because disease prevalence or case difficulty skewed harder than forecast.
  • Why it matters: FDA may ask for additional analyses or subgroup breakdowns to ensure clinical risks are covered.
  • If you stopped collecting: You must re‑open sites or negotiate access to new data sources—work measured in weeks to months.
  • If you kept collecting: You already have a fresh Standalone tranche to tighten operating‑point CIs, add subgroups, or show calibration stability. No site re‑activation. (See the sketch after this list.)
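To make the CI point concrete, here is a minimal Python sketch. The case counts and sensitivity are hypothetical (not taken from any cleared device); it uses a Wilson score interval to show how a standalone top‑up tightens the interval around a fixed operating point.

```python
# Illustrative sketch: how added standalone positives tighten a sensitivity CI.
# The counts below are hypothetical, chosen only to show the effect of n.
from math import sqrt

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (e.g., sensitivity)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half, center + half

# Hypothetical: 85% sensitivity observed on 120 positives from the MRMC case set...
lo, hi = wilson_ci(successes=102, n=120)
print(f"MRMC-only positives (n=120): {lo:.3f}-{hi:.3f}")   # roughly 0.78-0.90

# ...versus the same rate after standalone accrual brings positives to 480.
lo, hi = wilson_ci(successes=408, n=480)
print(f"With standalone top-up (n=480): {lo:.3f}-{hi:.3f}")  # roughly 0.82-0.88
```

Because interval width shrinks roughly with 1/√n, quadrupling the positives roughly halves the CI width, which is often the difference between a borderline operating‑point table and a convincing one.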

Subgroup coverage is thin (vendor, scanner, age/sex, skin tone, BMI, clinical site) 🔗

  • What it looks like: MRMC hits its primary endpoint overall but shows wide variability or low counts in one or two risk‑relevant slices.
  • Why it matters: FDA can request subgroup breakouts; payers and key customers will ask the same.
  • If you stopped collecting: You’re stuck writing narrative justifications or trying to assemble one‑off data after the fact.
  • If you kept collecting: You can top‑up the exact subgroups that need precision, show stable point estimates, and move on. (A sizing sketch follows this list.)
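As a rough planning aid, the sketch below estimates how many additional cases a thin subgroup needs before its estimate reaches a target precision. The subgroup size, expected sensitivity, and precision target are assumptions for illustration, and the normal‑approximation formula is a planning heuristic, not a substitute for your statistician's protocol.

```python
# Illustrative sketch (assumed numbers): how many additional subgroup cases are
# needed so a proportion estimate (e.g., per-vendor sensitivity) reaches a target
# 95% CI half-width, using the normal-approximation sample-size formula.
from math import ceil

def n_for_half_width(p: float, half_width: float, z: float = 1.96) -> int:
    """Total n such that z * sqrt(p * (1 - p) / n) <= half_width."""
    return ceil((z**2 * p * (1 - p)) / half_width**2)

current_n = 35       # hypothetical cases already collected for a thin vendor subgroup
expected_p = 0.85    # anticipated sensitivity in that subgroup
target = 0.07        # desired CI half-width (+/- 7 percentage points)

needed = n_for_half_width(expected_p, target)
print(f"Need {needed} total cases, i.e., {max(0, needed - current_n)} more")  # ~100 total
```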

Selection bias discovered late 🔗

  • What it looks like: Post‑hoc checks show over‑representation of obvious cases or under‑representation of edge cases (small lesions, motion, rare comorbidities).
  • Why it matters: Calls into question generalizability and safety in real use.
  • If you stopped collecting: You need to find and contract new sources, re‑open IRBs, and rebuild adjudication—time you don’t have.
  • If you kept collecting: Your pipeline already includes quota controls (or at least tracking) for under‑represented strata, so you can supply balanced evidence quickly. (A minimal tracking sketch follows.)
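Here is a minimal sketch of the kind of quota tracking mentioned above. The strata, targets, and counts are made up for illustration; in practice they would come from your accrual log and your protocol's stratification plan.

```python
# Minimal sketch (hypothetical strata and targets): tracking standalone accrual
# against per-stratum quotas so under-represented slices surface early, while
# the MRMC reader study runs in parallel.
from collections import Counter

targets = {              # hypothetical enrollment quotas per stratum
    "vendor:GE": 150, "vendor:Siemens": 150, "vendor:Philips": 150,
    "lesion:<6mm": 100, "site:community": 200,
}
accrued = Counter({      # hypothetical counts pulled from the accrual log
    "vendor:GE": 162, "vendor:Siemens": 88, "vendor:Philips": 141,
    "lesion:<6mm": 37, "site:community": 210,
})

# Report strata that are still short, largest gap first.
shortfalls = {k: t - accrued[k] for k, t in targets.items() if accrued[k] < t}
for stratum, gap in sorted(shortfalls.items(), key=lambda kv: -kv[1]):
    print(f"{stratum}: {gap} cases short")
```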

MRMC vs Standalone: Conceptual Differences 🔗

Understanding the key differences between these validation approaches is critical. MRMC studies involve readers interpreting cases both with and without AI assistance, effectively capturing workflow impact and demonstrating real clinical utility. However, they come with significant logistical challenges including reader recruitment, required washout periods, and coordinating multiple reading sessions. In contrast, standalone studies directly compare algorithm outputs against definitive ground truth (whether from biopsy, invasive measurements, or expert consensus). While standalone studies are more straightforward, faster to execute, and allow for larger sample sizes, they don't capture the crucial human-AI interaction that's central to how these tools actually function in clinical settings. The strategic choice—and timing—between these approaches can significantly impact your regulatory timeline.

The Methodology 🔗

I have a database of all 510(k) and De Novo summary PDFs ever published by FDA (that is over 90k of them). With the help of LLMs, I found ~58 summaries with a multi-reader multi-case comparative effectiveness study. Of these, I used an LLM to extract the number of samples used for MRMC and Standalone performance testing. I spot checked the results and provided references at the end of this article for you to check yourself. I encourage you to double check any particular number before using it as precedent for your own submission.
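For readers who want to replicate this kind of analysis, the record structure below is an illustrative sketch (not the author's actual extraction pipeline) of the fields worth capturing per submission, including a verbatim quote so every number can be spot‑checked against the source PDF before being cited as precedent.

```python
# Illustrative sketch only: a structured record for LLM-assisted extraction of
# sample sizes from 510(k)/De Novo summary text, kept traceable to the source.
from dataclasses import dataclass
from typing import Optional

@dataclass
class StudyCounts:
    k_number: str                   # e.g., "K210666"
    device_name: str
    mrmc_n: Optional[int]           # None if no MRMC reported
    mrmc_unit: Optional[str]        # "cases", "exams", "patients", ...
    standalone_n: Optional[int]
    standalone_unit: Optional[str]
    source_quote: str               # verbatim sentence from the summary, for spot-checking

    @property
    def ratio(self) -> Optional[float]:
        """Standalone-to-MRMC ratio, when both counts are reported."""
        if self.mrmc_n and self.standalone_n:
            return round(self.standalone_n / self.mrmc_n, 1)
        return None
```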

The Results 🔗

Figure. MRMC case counts. Median 244; IQR 200–398; 10th–90th percentile 148–650.
Figure. Standalone case counts. Median 900; IQR 249–2,008; 10th–90th percentile ~104–5,000; heavy-tailed with some very large datasets (e.g., ≥10k). Note: the graph is truncated at 1,000 for better visualization. See the references section for more information on the long tail.
Median ratio (Standalone ÷ MRMC): 3.4× (p25=1.0×, p75≈7.9×; p90≈20.5×).
  • Units: MRMC often reports cases/images/exams; standalone often reports patients/cases/images. A small subset cites “studies” or “lesions.”
  • Interpretation: Across diverse device types, Standalone datasets trend 3–4× larger than MRMC on the median, consistent with the idea that MRMC squeezes more information per case via multiple readers, while Standalone must increase N to achieve comparable precision. Equal sample sizes are also acceptable, however, which explains the cluster of devices at a 1:1 ratio. (See the sketch below.)
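The sketch below recomputes those summary statistics from the ratio column of the reference table at the end of this article. Because the table rounds each ratio to one decimal place and omits a few of the ~58 summaries found, its output approximates rather than exactly matches the figures quoted above.

```python
# Recompute the Standalone-to-MRMC ratio summary from the reference table.
# Ratios are copied from the table (already rounded to one decimal place).
import statistics

ratios = [
    0.2, 0.3, 0.5, 0.5, 0.7, 0.7,
    1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
    1.1, 1.4, 1.4, 1.6, 1.8, 1.9, 2.0, 2.0, 2.5, 3.0, 3.8, 4.0, 4.2,
    4.4, 4.4, 4.6, 4.6, 4.7, 4.9, 5.0, 5.4, 5.5, 5.5, 5.9, 8.5, 10.0,
    10.1, 10.1, 11.7, 14.1, 16.2, 18.6, 20.5, 20.5, 20.5, 31.4, 68.4, 84.0,
]

q = statistics.quantiles(ratios, n=100, method="inclusive")  # 99 percentile cut points
print(f"median {statistics.median(ratios):.1f}x  "
      f"p25 {q[24]:.1f}x  p75 {q[74]:.1f}x  p90 {q[89]:.1f}x")
```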

What the cleared devices actually do (recent 510(k) examples) 🔗

Below are representative, publicly available 510(k) summaries illustrating the MRMC vs standalone split and explicit washout language. They also show that maintaining a large standalone corpus alongside controlled MRMC is common.

Chest‑CAD (CXR, Imagen) — K210666 🔗

  • MRMC: 24 clinical readers × 238 cases (two sessions with ≥28‑day washout).
  • Standalone: 20,000 chest radiographs from 12 U.S. sites.
  • Both streams reported: subgroup metrics for standalone algorithm performance, plus reader AUC/sensitivity/specificity deltas under DBM analysis.

FractureDetect (MSK radiographs, Imagen) — K193417 🔗

  • MRMC: 175 cases across study types; reader AUC improved (paired in MRMC).
  • Standalone: 11,970 radiographs with high AUC (0.982), stratified by anatomic region.

MammoScreen® 3 (Therapixel) — K240301 🔗

  • MRMC: Retrospective dataset with 240 combined DBT/2D mammograms.
  • Standalone: 7,544 exams for algorithm performance (with priors support).

MammoScreen® 4 (Therapixel) — K243679 (with PCCP) 🔗

  • Standalone: 1,475 patients / 2,950 studies to establish non‑inferiority vs earlier versions across multiple endpoints; data independence and subgrouping specified.
  • MRMC: Three separate MRMC studies (FFDM; DBT; combined with priors) showed superiority of aided vs unaided readers.
  • Regulatory note: cleared with a Predetermined Change Control Plan (PCCP), which pairs clean standalone endpoints to future updates—an approach tailor‑made for continued standalone accrual.

Rayvolve LN (Pulmonary Nodules on CXR, AZmed) — K243831 🔗

  • Standalone: 2,181 radiographs with AUC/sensitivity/specificity and subgroup analyses.
  • MRMC: 400 cases; readers significantly improved AUC, sensitivity, and specificity with ≥1‑month washout.

EchoSolv AS (Severe Aortic Stenosis, Echo IQ) — K241245 🔗

  • Standalone: 6,268 TTE studies; AUROC 0.948 with extensive subgroup reporting (age, sex, ethnicity, BMI, LVEF).
  • MRMC: 5 readers × 200 TTE studies. (Cardiology, 21 CFR 892.2060.)

Saige‑Dx (DBT, DeepHealth/RadNet) — K220105 and K251873 🔗

  • K220105 (Traditional 510(k)): MRMC: 18 MQSA readers × 240 cases with ≥4‑week washout; Standalone: 1,304 cases from 9 U.S. sites.
  • K251873 (Special 510(k), 2025): Standalone: 2,002 DBT mammograms (multi‑vendor: Hologic & GE), non‑inferiority vs predicate (K243688). Prior MRMC evidence remained applicable. Illustrates how ongoing standalone accrual supports faster modifications/expansions.

When Standalone Isn’t Bigger: What We Learn from Devices with Standalone:MRMC ≤1 🔗

What the data show. In our data snapshot, 17 devices report a Standalone:MRMC ratio of ≤1. This does not mean the programs are under‑evidenced; rather, it reflects claim type, task design, and what “n” measures (patients vs. scans vs. lesions). In several categories—image‑acquisition guidance, scoring/quantification tools, and high‑annotation‑cost tasks—it’s common to see MRMCs that are as large or larger than standalone sets.

Why the ratio can be ≤1 (and still be appropriate) 🔗

  1. Claim type drives the evidence mix.
    • Reader‑impact claims (e.g., “aided reads are better”) naturally put more weight on MRMC, sometimes with hundreds of cases and many readers to detect realistic ΔAUC.
    • Guidance / acquisition optimization tools (product code QJU) test operator performance under controlled reading/acquisition tasks; standalone data may be narrower (e.g., algorithm gating or quality thresholds).
  2. Units matter. Ratio calculations can be “equal” on paper while measuring different denominators: patients vs nodules; cases vs lesions; cases vs series. That’s acceptable if the protocol is clear and the unit maps to the endpoint and labeling.
  3. High‑effort truthing. For some devices, ground truth can be expensive (full ROI segmentation, biopsy, consensus, longitudinal follow‑up), whereas the MRMC measurements can be less burdensome (e.g., categorical reads or length measurements).

Conclusion 🔗

Our analysis across dozens of AI/ML SaMD 510(k) submissions reveals several consistent study design frameworks:

  • Compact MRMC Studies: Reader studies typically involve 12–24 readers and 200–400 cases, with ≥4-week washouts to measure aided vs. unaided reader performance.
  • Balanced Evidence Strategies: Precedent exists for MRMC studies that reuse the same case set for standalone testing, yielding a 1:1 standalone-to-MRMC ratio.
  • Large Standalone Datasets: Standalone validations often run 3× or more larger than MRMC, covering thousands of patients, exams, or images across sites, vendors, and subgroups to demonstrate robustness and independence from training data.
  • Parallelization as a Speed Lever: The fastest clearances come from sponsors who run MRMC early and keep standalone data accrual running through washout.

These frameworks—compact MRMC for reader effect, expansive standalone for robustness, and careful protocol separation—reflect practices distilled over years of submissions and refined by real-world edge cases. They will continue to evolve as models, claims, and regulatory pathways adapt.

If you have a working product and want to get to market ASAP, reach out today. Let’s schedule the gap assessment, finalize claims and endpoints, and put a date on your submission calendar now. We can add certainty and speed to your FDA journey. Our 3-Month 510(k) Submission program guarantees FDA submission in 3 months with clearance in 3 to 6 months afterwards. We are able to offer this accelerated service because, unlike other firms, we have physicians, engineers, and regulatory consultants all in-house focused on AI/ML SaMD. We leverage decades of combined experience, fine-tuned templates, and custom-built submission software to offer a done-for-you, turnkey fast 510(k) submission.

No working product? Reach out anyway. We can build it for you too!

Our People. Our Process. Our Proof. 🔗

Our People: A team of seasoned medtech experts—clinicians, engineers, and regulatory specialists—who have guided hundreds of devices from concept to FDA clearance.
Our Process: A proven, end-to-end pathway for AI/ML SaMD—regulatory strategy, validation, deployment, and FDA submission in just 90 days, guaranteed.
Our Proof: Trusted by leading medtech innovators—over 60 FDA SaMD clearances, 100+ projects delivered, and partnerships with top institutions worldwide.

References 🔗

             
| K Number | Device Name | MRMC n | MRMC Unit | Standalone n | Standalone Unit | Ratio (Standalone ÷ MRMC) |
| --- | --- | --- | --- | --- | --- | --- |
| DEN190040 | Caption Guidance | 240 | patients | 50 | patients | 0.2 |
| K223347 | UltraSight AI Guidance | 240 | subjects | 75 | subjects | 0.3 |
| K240044 | CADDIE | 841 | patients | 389 | patients | 0.5 |
| K161201 | ClearRead CT | 200 | cases | 100 | cases | 0.5 |
| K241770 | Prostate MR AI (VA10A) | 340 | cases | 222 | transversal T2 series | 0.7 |
| K221624 | Avenda Health AI Prostate Cancer Planning Software | 200 | cases | 137 | patients | 0.7 |
| K243294 | Brainomix 360 e-ASPECTS | 140 | cases | 137 | scans | 1.0 |
| K212783 | ProstatID | 150 | patients | 150 | cases | 1.0 |
| K240712 | icobrain aria | 199 | cases | 199 | cases | 1.0 |
| K233342 | CINA-ASPECTS | 200 | cases | 200 | cases | 1.0 |
| K211541 | MammoScreen 2.0 | 240 | cases | 240 | cases | 1.0 |
| K202300 | Optellum Virtual Nodule Clinic, Optellum software, Optellum platform | 300 | subjects | 300 | nodules | 1.0 |
| K243688 | Saige-Dx (3.1.0) | 419 | studies | 419 | studies | 1.0 |
| K240697 | See-Mode Augmented Reporting Tool, Thyroid (SMART-T) | 600 | cases | 600 | cases | 1.0 |
| K190442 | Koios DS for Breast | 900 | patient cases | 900 | lesions | 1.0 |
| K240003 | Velmeni for Dentists (V4D) | 1,797 | images | 1,797 | images | 1.0 |
| K210365 | Second Opinion | 2,010 | images | 2,010 | images | 1.0 |
| K242683 | QP-Prostate® CAD | 228 | cases | 247 | lesions | 1.1 |
| K242130 | Koios DS | 650 | cases | 900 | lesions | 1.4 |
| K212616 | Koios DS | 650 | cases | 900 | lesions | 1.4 |
| K220624 | AI4CMR v1.0 | 146 | cases | 238 | cases | 1.6 |
| K210670 | BU-CAD | 628 | cases | 1,139 | cases | 1.8 |
| K233738 | Overjet Caries Assist-Pediatric | 636 | patients | 1,190 | images | 1.9 |
| K201019 | Genius AI Detection | 390 | cases | 764 | cases | 2.0 |
| K250221 | StrokeSENS ASPECTS Software Application | 100 | CT scans | 200 | CT scans | 2.0 |
| K242437 | Smile Dx® | 352 | cases | 867 | cases | 2.5 |
| K231678 | Overjet Periapical Radiolucency Assist | 379 | images | 1,147 | images | 3.0 |
| K232384 | Videa Dental Assist | 378 | radiographs | 1,445 | radiographs | 3.8 |
| K241725 | Better Diagnostics Caries Assist (BDCA) Version 1.0 | 328 | images | 1,298 | images | 4.0 |
| K222176 | BoneView | 480 | cases | 2,000 | radiographs | 4.2 |
| K251071 | Fetal EchoScan (v1.1) | 200 | exams | 877 | exams | 4.4 |
| K242342 | Fetal EchoScan | 200 | exams | 877 | exams | 4.4 |
| K213795 | Videa Caries Assist | 226 | radiographs | 1,034 | radiographs | 4.6 |
| K230144 | Denti.AI Detect | 154 | images | 709 | images | 4.6 |
| K221564 | Brainomix 360 e-ASPECTS | 54 | cases | 256 | patients | 4.7 |
| K182373 | PowerLook Tomo Detection V2 Software | 260 | cases | 1,265 | cases | 4.9 |
| DEN180005 | OsteoDetect | 200 | cases | 1,000 | images | 5.0 |
| K220105 | Saige-Dx | 240 | cases | 1,304 | cases | 5.4 |
| K243831 | Rayvolve LN | 400 | cases | 2,181 | radiographs | 5.5 |
| DEN230008 | DermaSensor | 286 | lesions | 1,579 | lesions | 5.5 |
| DEN170022 | QuantX | 111 | cases | 652 | lesions | 5.9 |
| K231470 | Lunit INSIGHT DBT | 258 | DBT exams | 2,202 | DBT exams | 8.5 |
| K231001 | DeepTek CXR Analyzer v1.0 | 300 | cases | 3,000 | scans | 10.0 |
| K211678 | Lunit INSIGHT MMG | 240 | mammograms | 2,412 | mammograms | 10.1 |
| K242171 | TechCare Trauma | 769 | cases | 7,744 | images | 10.1 |
| K243614 | Sonio Suspect | 750 | images | 8,745 | images | 11.7 |
| K220164 | Rayvolve | 186 | patients | 2,626 | radiographs | 14.1 |
| K241620 | ChestView US | 240 | cases | 3,884 | cases | 16.2 |
| K212365 | BoneView | 480 | cases | 8,918 | radiographs | 18.6 |
| K230085 | Lung-CAD | 244 | cases | 5,000 | cases | 20.5 |
| K223811 | Lung-CAD | 244 | cases | 5,000 | cases | 20.5 |
| K213353 | Aorta-CAD | 244 | cases | 5,000 | cases | 20.5 |
| K240301 | MammoScreen® (3) | 240 | mammograms | 7,544 | exams | 31.4 |
| K193417 | FractureDetect (FX) | 175 | cases | 11,970 | radiographs | 68.4 |
| K210666 | Chest-CAD | 238 | cases | 20,000 | cases | 84.0 |
