Definitive Guide to AI/ML SaMD Ground Truthing

The Problem

**Figure. Adjudication Method Worksheet.** Decision tree for selecting a fast, defensible ground‑truth method. Start with a clinical anchor; if present, imaging adjudication is unnecessary. If not, route by task: **(A)** combined detection+segmentation (CADe) with disagreement gates (distance/Dice) and majority/consensus rules; **(B)** continuous/categorical values—measure directly or derive from segmentation and adjudicate by median/majority; **(C)** segmentation outputs—use STAPLE or expert pick with minimal edits, guarded by Dice thresholds. The goal is minimal meetings and auditable, pre‑specified rules.

AI/ML SaMD teams routinely struggle to design adjudication protocols that are regulator-defensible yet fast enough for commercial timelines. Sponsors face a thicket of inconsistent practices: panel sizes, voting rules, QC layers, segmentation consensus methods (e.g., STAPLE vs. per-pixel), measurement aggregation (mean/median/two-most-concordant), and when to rely on objective diagnosis instead of multi-expert opinion. The result is avoidable rework, slow study execution, uncertainty during FDA interactions, and safe, effective devices that are unnecessarily delayed from reaching where they belong: on the market and in the clinic, saving lives and making an impact.

This article aims to solve three concrete problems:

Ambiguity in “ground truth” construction. Teams lack a clear, task-fit playbook (detection, triage, diagnosis, segmentation, measurement, EEG/time-series) to convert expert reads and objective diagnosis into a single, auditable reference standard.
Excessive operational burden. Over-sized panels, unnecessary synchronous meetings, and open-ended QC loops inflate cost and time. Sponsors need lean defaults (asynchronous reads, 2-of-3 / 2+1, thresholded QC) that preserve rigor without calendar drag.
Regulatory predictability. Without precedent-aligned patterns, adjudication sections in protocols and submissions invite questions and delays. Teams need concise, precedent-backed templates that map methods to indications and explicitly define triggers, thresholds, and escalation paths.

In short: we are codifying pragmatic, precedent-aligned adjudication patterns that minimize meetings and re-reads, maximize use of objective diagnosis, and standardize thresholds—so we can lock ground truth faster, document it cleanly, and move to market sooner with fewer regulatory surprises.

About the Author

Yujan Shrestha, MD is a physician–engineer and Partner at Innolitics specializing in AI/ML SaMD. He unites clinical‑evidence design with software and regulatory execution, using his clinical, regulatory, and technical expertise to find the least burdensome approach to get your AI/ML SaMD to market as quickly as possible without cutting corners on safety and efficacy. Working with his execution team, he delivers a submission‑ready file in 3 months (once prerequisites are met).

Why this matters for our “FDA submission in 3 months” guarantee

Our 3-month submission guarantee depends on eliminating as many unknowns and timeline risks as possible. One of the biggest timeline risks is the pivotal clinical study needed to substantiate key marketing claims. The most critical aspects of this study are the establishment of acceptance criteria, sample size determination, and ground truth definition. In this article, we will focus specifically on ground truth.

The fastest path to a defensible submission is to lock a clean, auditable “ground truth” early—with minimal meetings, minimal panel size, maximum automation, and maximum reliance on objective diagnosis. This article operationalizes that into concrete defaults we implement on day one:

Objective-first: We prioritize pathology/operative findings, PSG, echo, or structured chart review as primary truth; expert reads only verify how those findings map to cases. This compresses ground truthing from weeks into days and reduces FDA back-and-forth over reference standards, as well as the risk of getting stuck on high inter-reader variability. It is hard to argue with the pathology result.
Asynchronous reads with 2-of-3 / 2+1: Blinded readers work in parallel, either three readers with a 2-of-3 majority or two readers plus a third adjudicator who engages only on discordance. No standing panel meetings. This keeps synchronous meetings off your critical path.
Segmentation consensus by algorithm (STAPLE), not meetings. We will generate a single per-case consensus mask by applying STAPLE to ≥3 independent expert masks (probability map → pre-specified threshold with morphology/topology guards).
- If the primary endpoint is a measurement (e.g., volume, area, length, mean HU), we will derive the scalar(s) from the STAPLE consensus mask for each case (no panel meetings or per-pixel manual reconciliation).
- If the endpoint is segmentation quality (Dice/HD), the primary analysis will compare the algorithm directly to the STAPLE consensus mask. As a secondary sensitivity, we will also analyze algorithm vs. each reader mask using a mixed-effects model with random effects for case and reader to account for clustering; the median of per-reader Dice will be reported as an additional robustness metric.
Decouple detection and segmentation consensus. Detection consensus can be established by majority vote in most cases, but you must predefine the detection threshold and minimum detectable unit first. For example, if your device detects lung nodules, you can count a nodule as present when at least two radiologists' segmentations overlap. Segmentations that do not overlap with another reader's are removed (treated as false positives).
Measurement truth = math: Mean/median across three measurements; escalate only when a disagreement gate is exceeded (e.g., >5 mm or >10%). Everything below the gate resolves without convening a room.
Pre-specified triggers & “Indeterminate” handling: We write numeric gates (overlap seconds, Dice, IoU, disagreement) directly into your protocol, plus clear “Indeterminate → forced decision/objective” rules—preventing post-hoc debates and extra cycles.
Parallel Execution: We batch work, pre-calibrate readers, start as many workstreams as early as possible, keep alternate readers on standby, and sometimes run redundant parallel threads for high-risk items so that timeline contingencies are in place.
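
The overlap rule for detection consensus can be sketched in Python. This is a minimal sketch, assuming each reader's lesions for a case arrive as one boolean mask (the union of that reader's segmentations) and that "overlap" means sharing at least one voxel; `detection_consensus` and `min_readers` are hypothetical names, not part of any library. It uses `scipy.ndimage.label` to turn the union of all readers' marks into candidate findings and keeps a candidate only if at least two readers mark a common voxel inside it.

```python
import numpy as np
from scipy import ndimage


def detection_consensus(reader_masks, min_readers=2):
    """Keep candidate findings where >= min_readers readers overlap.

    reader_masks: list of same-shape boolean arrays, one per reader, each the
    union of that reader's lesion segmentations for the case (an assumption).
    Returns (confirmed_mask, case_positive).
    """
    stack = np.stack([np.asarray(m, dtype=bool) for m in reader_masks])
    votes = stack.sum(axis=0)                               # readers marking each voxel
    candidates, n_candidates = ndimage.label(votes >= 1)   # components of the union

    confirmed = np.zeros(votes.shape, dtype=bool)
    for idx in range(1, n_candidates + 1):
        component = candidates == idx
        # Confirmed if >= min_readers readers overlap somewhere inside it;
        # a component marked by a single reader is treated as a false positive.
        if votes[component].max() >= min_readers:
            confirmed |= component
    return confirmed, bool(confirmed.any())
```

The `confirmed` footprint only answers "is this finding real?"; its final mask should still come from the separate segmentation-consensus step, which keeps detection and segmentation decoupled. One caveat of this simple sketch: a single reader's oversized mask can bridge two nearby lesions into one candidate.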

Bottom line: These adjudication patterns are how we keep your validation on schedule, your protocol reviewer-friendly, and your documentation bulletproof—so we can stand behind a three-month submission timeline with confidence. These are guiding principles rather than hard-and-fast rules. Nuances of your situation may require departures.

Executive Summary

**Figure. Reference-Standard Pathways.** Four practical routes to ground truth: (1) two independent reads with a third adjudicator (2+1), (2) automated consensus (2/3 vote or mean/median for measurements), (3) asynchronous reads followed by a time-boxed manual consensus for discordant cases, and (4) fully synchronous three-reader consensus for all cases.

Method 1 — Two Readers → Third Adjudicator

What it is (diagram left): Reader 1 and Reader 2 perform independent, blinded reads. If their case-level calls disagree (or a pre-set gate is exceeded), a third blinded adjudicator reviews and issues the final reference label.
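
A minimal Python sketch of the 2+1 flow for categorical case-level labels follows. The function name `two_plus_one` and the `adjudicate` callback (a stand-in for routing a case to the third blinded reader) are hypothetical. A real study would also apply any pre-set gates before treating a case as concordant.

```python
from typing import Callable, Dict, List, Tuple


def two_plus_one(
    reads_1: Dict[str, str],
    reads_2: Dict[str, str],
    adjudicate: Callable[[str], str],  # hypothetical hook to the third reader
) -> Tuple[Dict[str, str], List[str]]:
    """Return (reference label per case ID, case IDs that needed the adjudicator)."""
    truth: Dict[str, str] = {}
    adjudicated: List[str] = []
    for case_id, label_1 in reads_1.items():
        label_2 = reads_2[case_id]
        if label_1 == label_2:
            truth[case_id] = label_1              # concordant: no third read needed
        else:
            truth[case_id] = adjudicate(case_id)  # discordant: third blinded reader decides
            adjudicated.append(case_id)
    return truth, adjudicated


# Example: only c2 is discordant, so only c2 reaches the adjudicator.
reads_1 = {"c1": "PE", "c2": "no PE", "c3": "PE"}
reads_2 = {"c1": "PE", "c2": "PE", "c3": "PE"}
truth, adjudicated = two_plus_one(reads_1, reads_2, adjudicate=lambda case_id: "no PE")
print(truth, adjudicated)
```

The length of `adjudicated` is also a quick way to estimate how many third reads, and how much budget, a 2+1 design actually consumes.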

When to use: When your budget is tight. You pay for the fewest reads this way, but it can be slower because of the additional sequential adjudication step, and FDA may push for a stronger method. You can upgrade to Method 2 or 3 later.

Precedents:

CADe: K190424, K202992. CADt: K213721, K214043, K243611, K243363, K231384. CADx: K231130, K251766, K243189. CADm: K231324, K231631.

Method 2 — Three Asynchronous Reads → Automated Consensus

What it is (diagram top-right): All readers work asynchronously and you compute an automatic consensus:

2-of-3 majority for categorical case-level labels (present/absent, triage).
Median/Mean of 3 for measurements (diameter, volume, EF).
STAPLE for segmentation masks.
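
The two non-image rules above reduce to a few lines of Python. This sketch assumes three readers per case. `majority_label` returns `None` when no label reaches two votes (possible with three or more categories), which is the point at which a case would drop to manual consensus (Method 3). The function names are illustrative.

```python
from collections import Counter
from statistics import median
from typing import Optional, Sequence


def majority_label(labels: Sequence[str], min_votes: int = 2) -> Optional[str]:
    """2-of-3 majority; None means no consensus and the case should escalate."""
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes >= min_votes else None


def consensus_measurement(values: Sequence[float]) -> float:
    """Median of the readers' measurements (robust to one outlying reader)."""
    return median(values)


print(majority_label(["present", "present", "absent"]))  # present
print(majority_label(["LVO", "ICH", "normal"]))          # None -> escalate
print(consensus_measurement([41.0, 43.5, 42.0]))         # 42.0
```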

When to use: When you want a good balance between budget, speed, and FDA regulatory risk. Convert to Method 3 if there is too much disagreement to adjudicate automatically or if FDA objects to the automated adjudication strategy.

Precedents:

CADe: K231025, K241923, K242821; ICH: K203260, K232431. CADt (2:3 concurrence): K241727, K232751, K230020, K251151. CADx (ordinal): K222275 (median of five), K202013, K241561. CADm: K230497, K232083, K230534, K242607, K222361.

Method 3 — Three Asynchronous Reads → Manual Consensus

What it is (diagram bottom-left/right): All three readers complete blinded, asynchronous reads. Optionally, only the discordant subset is brought to a consensus session (i.e., all experts in the same room or on a teleconference) to reach a single reference label.

When to use: When minimizing FDA regulatory risk takes priority over cost.

Precedents:

CADe/CADt: K232410, K242171, K251983. CADx: K202013, K241561, K243234, K223490. Seg/metrics: K243647.

Method 4 — Three Synchronous Reads

What it is (diagram bottom-right): All three experts meet live to label every case together and produce one reference label per case.

I am including this method for completeness; it is implied by various 510(k) summaries.

When to use: I do not recommend this approach because one reader with a strong personality can bias the others.

Precedents (consensus panels; synchrony typically not specified in summaries): K202013, K241561, K243234, K223490.

Cost, Risk, and Speed Considerations

Figure. The Tradeoff Triad. “fast, cheap, good — pick two”

Here is an adaptation of the adage “fast, cheap, good — pick two” to medical-device development. The triangle shows three competing objectives: Low FDA risk (apex), Low cost (left), and High speed (right). It illustrates that teams can reliably achieve at most two:

Low FDA risk + Low cost → longer timelines (more contract negotiation, extended reader recruitment, and conservative study designs).
Low FDA risk + High speed → higher cost (parallel workstreams, duplicate evidence generation, and contingency activities to de-risk the regulatory pathway).
Low cost + High speed → rapid delivery but greater likelihood of FDA questions, requests for additional data, or rejection.

Now, applying the tradeoff triad concept to our study design, I have outlined the following:

Figure. Strategy matrix comparing three study/operational approaches—**3+3**, **3+auto**, and **2+1**—across **Speed**, **Cost**, and **FDA Risk**. Each cell summarizes the expected outcome for that strategy (e.g., **3+3** → medium speed / high cost / low FDA risk; **3+auto** → medium–high speed / medium–high cost / low–moderate FDA risk; **2+1** → medium–high speed / low–medium cost / moderate–high FDA risk).

This matrix converts the abstract tradeoff between cost, speed, and regulatory risk into three practical program choices. The 3+3 option represents a conservative, highly manual approach with extensive human review and redundancy; it minimizes regulatory risk but increases cost and could slow time-to-market. The 3+auto option is a hybrid that integrates automation into parts of the workflow to raise throughput while retaining substantial human oversight—it shortens timelines relative to fully manual designs but still requires meaningful investment to manage validation and QA, yielding low-to-moderate regulatory risk. The 2+1 option prioritizes speed and lower up-front cost by reducing manual reader burden (with a single adjudicator or lighter QA), accepting a higher probability of FDA questions or requests for additional evidence. Use this matrix to pick the approach that aligns with your commercial objectives and risk tolerance; if you select a faster/cheaper route, explicitly budget for contingency activities (parallel risk-mitigation threads, targeted post-market studies) so your regulatory strategy remains defensible.

Seven Guiding Principles

1) Prioritize Objective Diagnosis Over Expert Consensus

When an objective diagnosis exists, use it as the primary reference. This removes the risk that high inter-reader variability muddies your ground truth and makes your device look worse on paper.

Objective Diagnosis: pathology/operative findings, PSG, echocardiography, structured chart review.

Note: While objective diagnosis is preferred, you can also use a more definitive imaging modality as ground truth. Together these form a hierarchy of evidence: the higher up the ladder you go, the more difficult and expensive the data are to obtain (with 5-year outcomes and/or pathology at the top and subjective, human-based methods at the bottom).

In fact, some of the most commercially successful SaMD are the ones that can reliably gate an expensive, higher-tier invasive exam with a cheaper, lower-tier non-invasive one. For example:

CT-FFR from CCTA (HeartFlow FFRct, K152733) to gate invasive FFR
ECG-AI LVEF screening from a 12-lead ECG (Anumana, K232699) to gate confirmatory echocardiography
Lung-nodule malignancy CADx from CT (Optellum VNC, K202300) to gate biopsy/PET vs. CT surveillance
Volumetric breast density from mammography (Volpara, K182310) to inform supplemental MRI/US decisions, and
CT-based opportunistic BMD/strength (VirtuOst, K220402) to prompt DXA or osteoporosis work-up.

The examples below are organized by tier of ground truth.

Tiers of ground truth for SaMD validation

Top-tier (e.g., histopathology, invasive physiology, hard outcomes) carries high data cost, low annotation cost, minimal inter-reader variability, and supports strong marketing claims. Middle-tier (e.g., adjudicated quantitative imaging/scales) has moderate data cost, high annotation cost, medium variability, and yields moderate claim strength. Bottom-tier (e.g., subjective expert reads) offers low data cost, moderate annotation cost, high variability, and only limited claim strength.

Top tier – hard clinical truth

Adjudication: tissue diagnosis or invasive physiologic gold standards. Usually definitive enough that expert consensus is trivial or unnecessary.

Histopathology-proven malignancy (breast lesions):

Genius AI Detection 2.0: pathology/diagnostic imaging verification. K221449

Histopathology‑proven malignancy (lung nodules, CADx risk):

Optellum Virtual Nodule Clinic (VNC) validated against biopsy/resection or ≥2‑year imaging follow‑up for malignant/benign truth. K202300.

Invasive physiology (coronary ischemia):

HeartFlow FFRct validated versus invasive FFR ≤0.80 as the reference standard. K152733 (original), later expanded submissions exist.

Middle tier – objective imaging or widely accepted clinical scales

Adjudication: quantitative imaging surrogates; core‑lab or standardized criteria; noninvasive comparators.

Coronary calcium / Agatston (CT): automated CAC scoring from non‑contrast cardiac CT producing Agatston, volume, mass scores. Siemens syngo.CT CaScoring K221219; Nano‑X HealthCCSng K241440; BunkerHill CAC (gated) Algorithm K240369; Imbio CAC K230112. (syngo.CT explicitly lists Agatston equivalents.)
Coronary percent stenosis / plaque burden (CCTA): HeartFlow Analysis (v3.plus) outputs % stenosis and plaque volumes (alongside FFRct). K213857. Autoplaque 3.0 (Cedars‑Sinai core‑lab) provides detailed plaque quantification with core‑lab methodology. K212758.
Breast density (BI‑RADS density category + volumetrics): Volpara Imaging Software provides volumetric breast density and BI‑RADS (4th/5th ed.) density category. K182310, K153427.
Tumor response metrics (RECIST / qEASL styled workflows): mint Lesion supports standardized staging/response criteria for tumor tracking/assessment (RECIST‑type use). K142647.
Bone mineral density (CT QCT / opportunistic BMD): Mindways QCT Pro (asynchronous calibration)—CT‑derived BMD measurement. K140342. VirtuOst (O.N. Diagnostics)—CT‑based BMD, bone strength, load‑to‑strength ratio. K220402. (FDA has also defined a specific device type for opportunistic low‑BMD software.)

Bottom tier – expert image interpretation (subjective clinical scales)

Adjudication: majority/consensus reads by radiologists; subjective severity scores.

Radiologist‑determined truth (example triage AI): Aidoc BriefCase pivotal studies used two radiologists with a third tie‑breaker (consensus) as ground truth; similar approach across other indications (e.g., head CTA VO; AAA). K214043, K220709, K230534.
Radiographic severity scales (e.g., knee OA—Kellgren‑Lawrence): IB Lab KOALA outputs KL grade and objective OA measures (JSW, osteophytes, etc.) from knee radiographs—truth typically via expert reads per OARSI/KL criteria. K192109 (with later submissions such as K223646 for related modules).

Precedents: pathology/diagnostic imaging verification K221449; PSG scoring and physiologist review K213360, K233618; echo anchoring K213794; duplicate chart review for AF K233549; structured Sepsis‑3 committee with “Indeterminate/forced‑majority” handling DEN230036.

2) Prioritize Asynchronous Workflows Over Synchronous Bottlenecks

The principle is similar to “this meeting could have been an email”: avoid setting up synchronous meetings with busy clinical experts whenever an asynchronous workflow will do.

3) Prioritize Algorithmic Consensus Over Manual Consensus

When possible, use an algorithm to collapse multiple expert measurements into a single ground truth. It can be as simple as:

Median for continuous measurements
Majority vote for categorical ones
STAPLE for segmentation

≥3 expert masks → STAPLE (primary) or per‑pixel majority to form the consensus mask; QC only if thresholds are breached (e.g., Dice/HD outside bounds).
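
Both consensus options can be sketched in Python with NumPy. The sketch below is a minimal binary STAPLE, following the expectation-maximization scheme of Warfield et al., with a fixed global foreground prior and per-rater sensitivity and specificity. It sits alongside a per-pixel majority vote. It is an illustration under those assumptions, not a validated implementation. For a submission you would more likely use, and verify, an established implementation such as SimpleITK's `STAPLEImageFilter`. Thresholding the probability map at 0.5 is an assumed pre-specified choice.

```python
import numpy as np


def staple_binary(masks, max_iter=100, tol=1e-6, eps=1e-6):
    """Minimal binary STAPLE via EM.

    masks: list of same-shape boolean arrays, one per expert.
    Returns (probability_map, sensitivity_per_rater, specificity_per_rater).
    """
    D = np.stack([np.asarray(m, dtype=bool).ravel() for m in masks]).astype(float)
    n_raters = D.shape[0]
    prior = D.mean()                  # fixed global foreground prior (assumption)
    if prior in (0.0, 1.0):           # all empty or all full: nothing to estimate
        return D[0].reshape(np.shape(masks[0])), np.ones(n_raters), np.ones(n_raters)

    p = np.full(n_raters, 0.99)       # initial sensitivities
    q = np.full(n_raters, 0.99)       # initial specificities
    for _ in range(max_iter):
        # E-step: posterior probability that each voxel is truly foreground.
        a = prior * np.prod(np.where(D == 1, p[:, None], 1 - p[:, None]), axis=0)
        b = (1 - prior) * np.prod(np.where(D == 1, 1 - q[:, None], q[:, None]), axis=0)
        W = a / (a + b)
        # M-step: re-estimate each rater's sensitivity and specificity.
        p_new = np.clip((D @ W) / W.sum(), eps, 1 - eps)
        q_new = np.clip(((1 - D) @ (1 - W)) / (1 - W).sum(), eps, 1 - eps)
        done = max(np.abs(p_new - p).max(), np.abs(q_new - q).max()) < tol
        p, q = p_new, q_new
        if done:
            break
    return W.reshape(np.shape(masks[0])), p, q


def majority_mask(masks, min_votes=None):
    """Per-pixel majority: foreground where more than half the experts agree."""
    stack = np.stack([np.asarray(m, dtype=bool) for m in masks])
    if min_votes is None:
        min_votes = stack.shape[0] // 2 + 1
    return stack.sum(axis=0) >= min_votes
```

A consensus mask would then be `prob >= 0.5`. The returned per-rater sensitivity and specificity are a useful by-product for QC, because they flag an expert whose mask systematically over- or under-segments.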

Instance segmentation: match objects (IoU) first, then apply the same consensus per instance; use an enclosing box rule if working with boxes.
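
For box-based localizations, the match-then-merge idea can be sketched in plain Python. This sketch assumes boxes are `(x1, y1, x2, y2)` tuples. It uses a simple greedy matcher as a stand-in for optimal Hungarian matching: each cluster holds at most one box per reader, and a box joins the cluster it overlaps best if that IoU is at least the threshold. The 0.5 IoU threshold and the 2-reader minimum are assumed pre-specified values, and all function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_instances(reader_boxes: Sequence[Sequence[Box]],
                    iou_threshold: float = 0.5) -> List[Dict[int, Box]]:
    """Greedy matching (assumed stand-in for Hungarian): reader index -> box."""
    clusters: List[Dict[int, Box]] = []
    for reader, boxes in enumerate(reader_boxes):
        for box in boxes:
            best, best_score = None, 0.0
            for cluster in clusters:
                if reader in cluster:          # one box per reader per instance
                    continue
                score = max(iou(box, other) for other in cluster.values())
                if score > best_score:
                    best, best_score = cluster, score
            if best is not None and best_score >= iou_threshold:
                best[reader] = box
            else:
                clusters.append({reader: box})
    return clusters


def enclosing_box(boxes: Sequence[Box]) -> Box:
    """Smallest box that encloses all agreeing readers' boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def consensus_boxes(reader_boxes, iou_threshold=0.5, min_readers=2) -> List[Box]:
    return [enclosing_box(list(c.values()))
            for c in match_instances(reader_boxes, iou_threshold)
            if len(c) >= min_readers]


readers = [
    [(0, 0, 10, 10), (50, 50, 60, 60)],  # reader 0
    [(1, 1, 11, 11)],                    # reader 1
    [(0, 0, 10, 10)],                    # reader 2
]
print(consensus_boxes(readers))  # [(0, 0, 11, 11)]
```

Greedy matching can depend on reader order when several boxes compete for the same instance. If that matters for your data, `scipy.optimize.linear_sum_assignment` on a negative-IoU cost matrix gives an optimal pairwise assignment.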

Precedents: STAPLE consensus K220034, K223268, K252362, K250686; per‑pixel majority/QC K241108, K242607; thresholded expert QC/reconciliation K242745, K243647; box reconciliation by enclosure K213566.

4) Prioritize simple segmentation → measurement algorithms over complex but ideal ones

Comparison of two adjudication pathways for AI segmentation validation. **Option 1** adjudicates at the *segmentation level* (e.g., STAPLE consensus) and then derives clinical values, enabling Dice/HD analysis but requiring verification of the consensus method. **Option 2** derives *clinical values* (Agatston, BI-RADS, RECIST, BMD, stenosis %) directly from each expert segmentation, then adjudicates at the scalar level (e.g., median). This avoids STAPLE artifacts and aligns directly with regulatory claims, making Option 2 my **preferred approach for primary clinical endpoints**, while Option 1 may be retained for secondary segmentation-quality analyses.

Many devices derive their final outputs from upstream segmentations. It may be tempting to make these derivations as high-fidelity as possible, which often means complex methods such as deep neural networks. It is far better to keep them simple so that the derivation algorithms can be verified with unit tests. This opens up flexibility in how the ground truth is established and adjudicated, as shown in the figure above.
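
Option 2 can be as small as the sketch below: a deterministic segmentation → measurement function that is easy to unit test, applied to each expert's mask, followed by a median. Volume in millilitres is used only as the simplest example of a derived scalar. The function names, the `spacing_mm` argument (voxel spacing in mm), and the `_mask_with` example helper are assumptions for illustration.

```python
from statistics import median

import numpy as np


def mask_volume_ml(mask, spacing_mm):
    """Volume of a binary mask in mL, given voxel spacing in mm."""
    voxel_mm3 = float(np.prod(spacing_mm))
    return int(np.count_nonzero(mask)) * voxel_mm3 / 1000.0  # 1 mL = 1000 mm^3


def scalar_truth(reader_masks, spacing_mm):
    """Derive the scalar from each expert mask, then adjudicate by median."""
    per_reader = [mask_volume_ml(m, spacing_mm) for m in reader_masks]
    return median(per_reader), per_reader


def _mask_with(n_voxels, shape=(10, 10, 20)):
    """Hypothetical helper: a synthetic mask with a known voxel count."""
    flat = np.zeros(int(np.prod(shape)), dtype=bool)
    flat[:n_voxels] = True
    return flat.reshape(shape)


masks = [_mask_with(1000), _mask_with(1200), _mask_with(1500)]
truth, per_reader = scalar_truth(masks, spacing_mm=(1.0, 1.0, 1.0))
print(per_reader, truth)
```

Because `mask_volume_ml` is a pure function, it can be verified once against synthetic masks of known size. The per-reader values also stay auditable alongside the adjudicated median.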

Why: Fast automatic adjudication

Precedents: Mean/average across three readers K232083, K230534, K230497; “two most concordant” rule / disagreement handling K222361, K231324; measurement on consensus structures/derived metrics K242607, K241038.

5) Pre‑specify adjudication triggers, thresholds, and “Indeterminate” handling

If you must use a manual consensus step, I recommend first adjudicating the easy cases automatically and escalating to manual adjudication only when disagreements exceed a predefined threshold. Don't overthink this threshold: few precedents exist, and justifications are usually hard to come by. I usually stick to a one-size-fits-all combination of a percentage error and an absolute error, where the absolute error keeps values close to zero from triggering spurious escalations.
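
One way to encode the combined gate is Python's `math.isclose`, which treats two values as agreeing when their difference is within `max(rel_tol * max(|a|, |b|), abs_tol)`. Under that reading, a case escalates only when some pair of readers differs by more than both the percentage and the absolute tolerance, so near-zero values are not flagged just because their relative error is large. Treat the 10% and 5 mm defaults as placeholders. Whether a gate requires exceeding one tolerance or both is a protocol choice to pre-specify. The function names are hypothetical.

```python
import math
from itertools import combinations
from statistics import median
from typing import Optional, Sequence


def needs_escalation(values: Sequence[float], rel_tol: float = 0.10,
                     abs_tol: float = 5.0) -> bool:
    """True if any reader pair disagrees beyond BOTH tolerances (placeholder values)."""
    return any(
        not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, b in combinations(values, 2)
    )


def measurement_truth(values: Sequence[float], rel_tol: float = 0.10,
                      abs_tol: float = 5.0) -> Optional[float]:
    """Median when readers agree; None routes the case to manual adjudication."""
    if needs_escalation(values, rel_tol, abs_tol):
        return None
    return median(values)


print(measurement_truth([40.0, 42.0, 44.0]))     # 42.0 (spread 4 <= 5 mm)
print(measurement_truth([0.5, 1.5, 2.0]))        # 1.5 (large % error, tiny in mm)
print(measurement_truth([100.0, 108.0, 109.0]))  # 108.0 (within 10%)
print(measurement_truth([40.0, 42.0, 55.0]))     # None (15 mm apart, beyond both)
```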

Detection/classification: A two-out-of-three majority vote is usually enough. However, if you are concerned about high inter-reader variability, you may want to manually adjudicate discordant cases as well.
Segmentation: Dice < Y or HD > Z → QC re‑read; otherwise accept consensus.
Instance matching: Dice ≥ 0.5. This is useful for combined detection and segmentation problems, such as caries detection on dental X-rays or tumor segmentation on lung-nodule screening CT scans. Two experts agree on a finding if their segmentations overlap by at least the pre-specified amount. You can then either adjudicate automatically (a lesion is present if two of the three readers detect it) or adjudicate manually if you are concerned about high inter-reader variability skewing your results.
Time‑series: An event is “true” if ≥2/3 reviewers’ intervals overlap by ≥1 s. This is similar to instance matching: when a time range must be selected, time-series detection can be treated as a segmentation task, which turns it into a combined segmentation and detection task.

Precedents: Temporal‑overlap rules (≥1s) and majority definitions K211452, K240993, with majority EEG policies K120260, K141883; STAPLE/consensus thresholds and QC workflows K220034, K252362, K241108; formal committee logic incl. “Indeterminate” DEN230036; “two most concordant” measurement rule K222361.
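
The time-series rule can be sketched with a sweep over interval endpoints. This sketch interprets "≥2/3 reviewers' intervals overlap ≥1 s" as a contiguous stretch of at least 1 s during which at least two reviewers are marking an event. Intervals are `(start_s, end_s)` tuples, and each reviewer's own overlapping marks are merged first so that no reviewer counts twice. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # (start_s, end_s)


def merge_intervals(intervals: Sequence[Interval]) -> List[Interval]:
    """Merge one reviewer's overlapping or touching intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def consensus_events(
    reviewer_intervals: Sequence[Sequence[Interval]],
    min_reviewers: int = 2,
    min_overlap_s: float = 1.0,
) -> List[Interval]:
    """Stretches where >= min_reviewers overlap for >= min_overlap_s (assumed reading)."""
    points = []
    for intervals in reviewer_intervals:
        for start, end in merge_intervals(intervals):
            points.append((start, +1))
            points.append((end, -1))
    points.sort()  # at equal times, ends (-1) sort before starts (+1)

    events: List[Interval] = []
    active, seg_start = 0, 0.0
    for t, delta in points:
        previous = active
        active += delta
        if previous < min_reviewers <= active:      # agreement begins
            seg_start = t
        elif active < min_reviewers <= previous:    # agreement ends
            if t - seg_start >= min_overlap_s:
                events.append((seg_start, t))
    return events


reviewers = [
    [(10, 15), (30, 30.5)],  # reviewer A
    [(12, 14)],              # reviewer B
    [(30.2, 31)],            # reviewer C
]
print(consensus_events(reviewers))  # [(12, 14)]
```

Here the 12–14 s stretch is confirmed, while the 0.3 s overlap near 30 s is dropped because it falls short of the 1 s gate.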

6) Use 3-5 readers for ground truthing; reserve larger reader pools for comparative-effectiveness MRMC studies

Detection/triage: stick to 3 readers (2‑of‑3 or 2+1).
Ordinal/nuanced categories (e.g., breast density): use consensus of 5 only when justified; median tie‑rule acceptable.
Reader‑aid claims: use MRMC; otherwise avoid.
Synchronous consensus: deploy only for a pre‑flagged minority (e.g., policy choice for CADe), not for the entire cohort.

Precedents: 5‑reader density consensus/median K202013, K241561, K222275; MRMC reader‑performance studies K240301, K243234, K223347, K223623.

7) Plan for contingencies and build speed and risk mitigations into contracts

Life happens: readers sometimes drop out of studies, or are unavailable on a weekend when the rest of the timeline depends on them. Depending on your speed requirements, it may be worth spending extra money on additional readers to de-risk these disruptions (for example, engaging a spare reader and locking ground truth as soon as the first three finish).

Ensure you have the flexibility to onboard a new reader halfway through the study if necessary. Identify key bottlenecks and plan how to parallelize them. For example, it is possible to run the comparative-effectiveness multi-reader (MRMC) study in parallel with ground truthing if the algorithm is already complete and you are reasonably confident about standalone performance. For clients for whom speed to market is the most important factor, this and similar risks are worth taking: they essentially trade money for time while keeping regulatory risk low.

Batch cases; run asynchronous first reads; pre‑book alternate readers; set response‑time SLAs.
Calibrate with a short pilot to reduce downstream discordance.

Precedents: Reader‑variability assessment/calibration K223347; dual‑reader reconciliation to consensus with iterative corrections K243647; targeted expert QC in segmentation K242745; senior adjudicator final‑say for edge disagreements K231631; 2+1 schemes that keep progress unblocked K213721, K214043, K243611, K243363, K231384.

Case Review: Themes, Patterns, and Outliers

Below are some general observations, patterns, and outliers.

Dominant patterns

Multi‑expert truthing is the norm. Most sponsors use either 2‑of‑3 majority, two readers + a third adjudicator, or explicit panel consensus (often with blinding). Examples span triage/detection, diagnosis, segmentation, and measurement tasks K231025, K241923, K242821, K251151, K213721, K214043, K243611, K243363, K231384, K202013, K241561, K243234.
Method matches task.
- Detection/triage (CADe/CADt) → Default to three blinded experts with 2‑of‑3 majority and pre‑specified 2+1 escalation; this is fast, familiar, and robust K231025, K241923, K242821, K213721, K214043, K243611, K243363, K231384.
- Diagnosis (CADx) → Prefer external objective diagnosis (pathology/operative/PSG/echo/validated chart review) as primary truth; use expert verification only as needed K221449, K213360, K213794, K233549, K233618, DEN230036.
- Segmentation (incl. instance segmentation) → Use STAPLE or per‑pixel majority across ≥3 experts; add senior QC and thresholded re‑reads (e.g., Dice/HD limits). For multi‑object tasks, match instances (IoU/Hungarian) then combine per instance K220034, K252362, K223268, K241108, K242607, K242745, K243647.
- Measurements → median or “two most concordant,” with thresholds for escalation K230497, K232083, K231324, K242607, K222361.
- Time‑series (EEG/sleep) → temporal‑overlap rules (e.g., ≥1s overlap among ≥2/3 reviewers) K211452, K240993, K241390, K233438, K120260, K141883.
- High expected inter-reader variability → senior review with edits and/or committee consensus appears frequently, especially for complex masks K221449, K243647, K242745, K240411.

Outliers and special cases

User review as a safety net appears in a few algorithm‑improvement contexts, but should not be the primary truthing mechanism K243769.
Pass/Fail scoring (without explicit adjudication) is rare; pre‑specify clinical thresholds if you go this route K222745.
Union‑of‑annotations for localization is a niche but transparent option DEN200080.

Examples by Adjudication Strategy

A. Majority vote (2‑of‑3)

Common for presence/absence decisions in detection and triage: K231025, K241923, K242821, K251151, K220709, K221552, K222076, K230020, K231130, K232410, K232431, K232751, K241390, K241440, K242292, K243685, K243851, DEN170073, DEN200069, DEN200080.

B. Two readers + third adjudicator (2+1)

A workhorse pattern for CADt and nuanced calls: K190424, K191556, K213721, K214043, K231384, K231767, K241480, K243611, K243363, K251766.

C. Panel consensus

Especially useful for difficult tasks with high anticipated inter-observer variability: Breast density (K202013, K241561), dental (K243234), vessel extraction (K223490), organs RT pipelines (K221305, K242745), MR Planner (K211841), and others: K223491, K233968, K242522, K242600.

D. Segmentation consensus—STAPLE / per‑pixel

STAPLE: white‑matter changes, brain hyperintensities, multi‑expert masks K220034, K223268, K252362, K250686.
Per‑pixel majority: spinal/lumbar structures, per‑pixel rules and medians for measures K241108, K242607, K220497.
QC layers: board‑certified reviewer corrections or dual‑reader reconciliation K242745, K243647.

E. Measurements—mean/median/“two most concordant”

Aorta diameters, EF, CAC quant, other continuous outputs: mean of 3 K230497, K232083, two most concordant K222361, median/thresholds K231324, EF vs consensus seg K232331, K241038.

F. Time‑series adjudication (EEG/sleep)

Explicit overlap/epoch‑majority rules: seizures and spikes K211452, K240993, sleep staging K233438, additional EEG adjudication K241390, with classic 2/3 policies K120260, K141883.

G. External objective diagnosis

Pathology/diagnostic reports and confirmatory tests supersede panel opinion. In this case, ground truthing may not even be needed if the data are already structured. Simple chart review may need only one expert double-checking technician classifications. Complex chart review may still need expert consensus, but it is nowhere near as difficult as agreeing on lung nodule margins. Examples: breast lesion localization with path/diag images K221449, PSG as gold standard K213360, echo pairing K213794, duplicate chart review in AF K233549, adjudication committee for sepsis DEN230036, sleep scoring by trained physiologists K233618.

Examples

Predictive CADx (e.g., 5‑year risk)

For prognostic CADx, the reference standard must be future incident disease defined before study start, with clear event windows and handling of censoring. FDA’s De Novo for the first 5‑year breast‑cancer risk SaMD (Allix5) makes this explicit: validation protocols must prespecify endpoints and performance goals, use an independent test set, establish a clinically justified reference standard to distinguish who does and does not develop disease within the window, and adjust for left/right censoring of time‑to‑disease. In practice, sponsors often use registry/pathology‑confirmed incidence plus independent/core‑lab adjudication when imaging is part of the truthing. [DEN240047]

At the time of this writing, the De Novo decision summary has not been released, so the adjudication method is unknown. It is likely based on a strong clinical anchor (objective diagnosis), so reliance on expert consensus is probably minimal.
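
The event-window logic described above can be sketched as a labeling function. This is a simplified illustration, not the method used for Allix5, whose adjudication details are not public. It assumes each subject has a first confirmed diagnosis date (e.g., from pathology or a registry) and a last disease-free follow-up date. It approximates five years as 5 × 365 days, and it excludes disease present at or before the index exam as a simple stand-in for left-censoring. Subjects followed for less than the window are labeled censored, which a time-to-event analysis would then have to handle. All names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Subject:
    index_exam: date            # date of the exam the device analyzes
    diagnosis: Optional[date]   # first confirmed diagnosis, None if never observed
    last_disease_free: date     # last follow-up date known to be disease-free


def window_label(s: Subject, window_days: int = 5 * 365) -> str:
    """Assign 'positive', 'negative', 'censored', or 'exclude_prevalent'.

    window_days = 5 * 365 is an approximation of five years (assumption).
    """
    if s.diagnosis is not None:
        if s.diagnosis <= s.index_exam:
            return "exclude_prevalent"   # disease already present: not incident
        if (s.diagnosis - s.index_exam).days <= window_days:
            return "positive"            # incident disease inside the window
        return "negative"                # diagnosed only after the window closed
    if (s.last_disease_free - s.index_exam).days >= window_days:
        return "negative"                # disease-free through the whole window
    return "censored"                    # follow-up too short to decide


index = date(2015, 1, 1)
print(window_label(Subject(index, date(2017, 6, 1), date(2017, 5, 1))))  # positive
print(window_label(Subject(index, None, date(2021, 1, 1))))              # negative
print(window_label(Subject(index, None, date(2018, 1, 1))))              # censored
```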

CADe: MSK X‑ray Fracture Detection (Presence/Absence + Localization)

Objective

Establish case‑level truth for fracture presence/absence and fracture localization on extremity radiographs.

Readers & Blinding

5 board‑certified MSK radiologists (independent, AI‑blinded).
Prior to consensus, each submits a binary label and one or more localization marks per case.

Adjudication

**3‑of‑5 majority** on presence/absence; localization marks resolved by enclosing the overlapping boxes (or union region) from agreeing readers.

Precedent:

Consensus/iterative reconciliation: K202013, K241561, K243647. Majority/2+1 patterns for CADe/CADt: K231025, K241923, K242821, K213721, K214043, K243611, K243363. Localization via box reconciliation: K213566. MSK fracture panels: K220164, K240845, K193417.

CADt: PE Triage on CTPA (Case‑Level Triage)

Objective

Define ground truth for pulmonary embolism presence (case‑level triage).

Readers & Blinding

Two ABR‑certified thoracic radiologists (independent, AI‑blinded).
Adjudicator: third senior thoracic radiologist (blinded).

Adjudication

Primary: 2+1—if the two initial reads disagree, the adjudicator reviews and finalizes.
Secondary: If adjudicator flags “indeterminate,” escalate to anchored evidence (e.g., confirmatory ultrasound/CTA addendum if available), then finalize.

Precedent:

2+1 triage schemes: K213721, K214043, K243611, K243363, K231384. Three‑expert/majority triage precedents: K241727, K232751, K230020, K221330, K251151.

CADx: Breast Cancer Diagnosis (Objective‑First Truthing)

Objective

Determine lesion‑level truth using objective diagnosis; avoid expert consensus unless the objective findings are ambiguous.

Primary Truth (Objective‑First)

Pathology report, diagnostic/post‑biopsy imaging, and radiology reports serve as primary reference.
A single MQSA‑qualified radiologist verifies objective diagnosis extraction/mapping to lesions; a second verifier only if ambiguous.

Adjudication

No panel consensus is needed when the objective findings are definitive.

Precedent:

Objective‑first (pathology/diagnostic/echo/PSG/duplicate chart review; adjudication committees for CDS): K221449, K213360, K213794, K233549, K233618, DEN230036.

Segmentation: Brain WMH on MRI (Mask Consensus via STAPLE)

Objective

Create robust consensus masks for white‑matter hyperintensities (WMH).

Readers & Blinding

3 experienced neuroradiologists (independent, AI‑blinded) produce voxel‑level masks.

Adjudication

Primary: STAPLE to combine expert masks into a probabilistic and binary consensus mask.
QC thresholds: If Dice < 0.80 or 95th-percentile HD > 10 mm vs. STAPLE for any individual mask, a senior neuroradiologist reviews that case and issues corrections (threshold‑triggered QC).
Archive all original masks + STAPLE + final.
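
The threshold-triggered QC check can be sketched with NumPy and SciPy. This sketch computes Dice and a symmetric 95th-percentile Hausdorff distance from boundary voxels, using `scipy.ndimage.distance_transform_edt` with the voxel spacing. HD95 conventions differ between tools: some pool both directions, as done here, while others take the larger of the two directional percentiles. The convention is therefore an assumption to pin down in the protocol. The 0.80 and 10 mm defaults mirror the thresholds above, and the function names are hypothetical.

```python
import numpy as np
from scipy import ndimage


def dice(a, b):
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    denom = int(a.sum()) + int(b.sum())
    return 1.0 if denom == 0 else 2.0 * int(np.logical_and(a, b).sum()) / denom


def _boundary(mask):
    """Foreground voxels with at least one background neighbour."""
    return mask & ~ndimage.binary_erosion(mask)


def hd95(a, b, spacing_mm):
    """Symmetric 95th-percentile Hausdorff distance in mm (pooled convention, assumed)."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    if not a.any() and not b.any():
        return 0.0
    if not a.any() or not b.any():
        return float("inf")
    edge_a, edge_b = _boundary(a), _boundary(b)
    # Distance (mm) from every voxel to the nearest boundary voxel of the other mask.
    dist_to_b = ndimage.distance_transform_edt(~edge_b, sampling=spacing_mm)
    dist_to_a = ndimage.distance_transform_edt(~edge_a, sampling=spacing_mm)
    distances = np.concatenate([dist_to_b[edge_a], dist_to_a[edge_b]])
    return float(np.percentile(distances, 95))


def needs_senior_review(reader_mask, staple_mask, spacing_mm,
                        min_dice=0.80, max_hd95_mm=10.0):
    """True if this reader's mask breaches either QC gate vs. the consensus."""
    return (dice(reader_mask, staple_mask) < min_dice
            or hd95(reader_mask, staple_mask, spacing_mm) > max_hd95_mm)
```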

Precedent:

STAPLE for segmentation consensus: K220034, K223268, K252362, K250686. Per‑pixel/consensus variants and QC: K241108, K242607, K242745, K243647.

Aortic Diameter on CT (Continuous Measurement Truth)

Objective

Define reference for maximum aortic diameter (mm).

Readers & Blinding

3 cardiovascular radiologists (independent, AI‑blinded) measure diameters.

Adjudication

Primary: truth = median of 3 measurements.

Precedent:

Means/medians/two‑most‑concordant and thresholds: K232083, K230534, K243859, K231324, K222361. Related continuous metrics precedents: K241038, K242607.

Measurement‑from‑Segmentation: LV Ejection Fraction (Echo) 🔗

Objective

Establish EF (%) derived from consensus LV segmentation.

Two‑Stage Adjudication

Segmentation: 3 expert sonographers/radiologists generate masks; combine via STAPLE (primary) or per‑pixel majority; QC thresholds (Dice/HD) trigger senior review.
EF measurement: Three experts compute EF from the accepted consensus mask; truth = mean of the three; if disagreement > 10% EF, apply two‑most‑concordant rule and add a senior re‑measurement.
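Below is a minimal sketch of the EF measurement stage. It assumes that "disagreement" means the spread (maximum minus minimum) of the three EF values, measured in absolute percentage points. The function name is hypothetical. The senior re‑measurement is only flagged here, because how it is folded into the final value is protocol‑specific.

```python
from itertools import combinations
from statistics import mean
from typing import List, Tuple


def ef_truth(ef_values: List[float], max_spread: float = 10.0) -> Tuple[float, bool]:
    """Return (reference EF %, needs_senior_remeasurement) from three readers' EF values."""
    if len(ef_values) != 3:
        raise ValueError("expects exactly three EF measurements")
    if max(ef_values) - min(ef_values) <= max_spread:
        return float(mean(ef_values)), False
    # Two-most-concordant rule: average the pair with the smallest absolute difference.
    a, b = min(combinations(ef_values, 2), key=lambda pair: abs(pair[0] - pair[1]))
    return (a + b) / 2, True
```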

Precedent:

EF vs consensus segmentation and multi‑reader EF: K232331, K241038, K241430, K232501. STAPLE/per‑pixel + QC: K220034, K241108, K242607, K242745, K252362.

Classification‑from‑Segmentation: Breast Density Categories 🔗

Objective

Produce BI‑RADS density category truth when classification is derived from segmentation outputs.

Readers & Blinding

5 MQSA‑qualified radiologists (independent, AI‑blinded).

Adjudication

Primary: majority vote for the density category.
If segmentation is used to aid density: finalize segmentation via per‑pixel majority (or STAPLE) before category voting.
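Below is a minimal sketch of category voting for an odd‑sized panel, such as the 5 readers above. When no category reaches a strict majority, the sketch falls back to the median category. That fallback is an assumption borrowed from the median‑based precedents listed below and is not part of the majority rule itself. Names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

DENSITY = ["A", "B", "C", "D"]  # BI-RADS density categories in ordinal order


def density_truth(votes: List[str]) -> Tuple[str, str]:
    """Return (category, rule_used) from an odd number of reader votes."""
    if len(votes) % 2 == 0:
        raise ValueError("use an odd number of readers")
    category, count = Counter(votes).most_common(1)[0]
    if count > len(votes) // 2:
        return category, "majority"
    # No strict majority: take the median of the ordinal categories instead.
    ranks = sorted(DENSITY.index(v) for v in votes)
    return DENSITY[ranks[len(ranks) // 2]], "median"
```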

Precedent:

5‑reader density consensus/median: K202013, K241561, K222275. Segmentation consensus aids: K241108, K220034.

Instance Segmentation: Rib Fracture Instances on CT 🔗

Objective

Establish instance‑level truth masks and counts for rib fractures.

Readers & Blinding

3 thoracic radiologists (independent, AI‑blinded) annotate instance masks.

Adjudication

Match instances across readers algorithmically (IoU ≥0.5; operational choice).
For matched instances, combine masks via STAPLE (primary) or per‑pixel majority to create a final instance mask.
For instances not matched across all three readers, require 2‑of‑3 readers to concur on presence; the majority mask becomes final; if still discordant, a senior adjudicator decides.
For box‑based localizations, replace the agreeing boxes with the smallest box that encloses them all.
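Below is a minimal sketch of the box‑based branch. It performs greedy cross‑reader matching at IoU ≥ 0.5, requires presence from at least 2 readers, and outputs the smallest enclosing box around the agreeing boxes. The sketch assumes the following:

- Boxes are `(x1, y1, x2, y2)` tuples.
- Matching is greedy rather than optimal (an optimal assignment would use, e.g., the Hungarian method).
- Single‑reader findings are returned for the senior adjudicator rather than discarded.

Mask fusion for matched instances (STAPLE or per‑pixel majority) is not shown. All names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2); hypothetical convention


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def enclosing(boxes: List[Box]) -> Box:
    """Smallest box that encloses every box in the list."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def reconcile(readers: List[List[Box]], thr: float = 0.5, min_readers: int = 2):
    """Return (final_boxes, unresolved), where unresolved holds (reader_index, box)."""
    used = [set() for _ in readers]
    final, unresolved = [], []
    for r, boxes in enumerate(readers):
        for i, anchor in enumerate(boxes):
            if i in used[r]:
                continue
            used[r].add(i)
            cluster = [anchor]
            for s, other in enumerate(readers):
                if s == r:
                    continue
                # Best still-unmatched box from reader s, accepted only at IoU >= thr.
                candidates = [(iou(anchor, b), j) for j, b in enumerate(other) if j not in used[s]]
                if candidates:
                    best, j = max(candidates)
                    if best >= thr:
                        used[s].add(j)
                        cluster.append(other[j])
            if len(cluster) >= min_readers:
                final.append(enclosing(cluster))
            else:
                unresolved.append((r, anchor))
    return final, unresolved
```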

Precedent:

STAPLE/per‑pixel: K220034, K252362, K241108, K242607. Box reconciliation: K213566. Majority presence standards: K231025, K241923, K242821.

EEG: Seizure/Spike Detection (Temporal Overlap Rules) 🔗

Objective

Define truth for EEG event detection (seizures/spikes) using epoch overlap criteria.

Readers & Blinding

3 board‑certified EEG experts (independent, AI‑blinded) provide start/stop times for events.

Adjudication

An event is true if ≥2 of 3 reviewers’ intervals overlap by ≥1 s (seizures) or overlap at all (spikes).
Localization truth uses the overlapping time range of the agreeing reviewers.
Rhythmic/periodic patterns require consensus between the two reviewers assigned to those sub‑tasks.
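Below is a minimal sketch of the temporal‑overlap rule. It assumes each reviewer's marks are `(start, end)` intervals in seconds. The event epoch is taken as the union of all pairwise intersections, i.e., the stretch where at least two reviewers agree. That reading of "overlapping time range" is an assumption. Use `min_overlap=1.0` for seizures and `min_overlap=0.0` (any positive overlap) for spikes. Names are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_s, end_s)


def consensus_events(reviewers: List[List[Interval]], min_overlap: float = 1.0) -> List[Interval]:
    """Epochs where at least two reviewers' intervals overlap by >= min_overlap seconds."""
    pieces = []
    for a, b in combinations(range(len(reviewers)), 2):
        for s1, e1 in reviewers[a]:
            for s2, e2 in reviewers[b]:
                start, end = max(s1, s2), min(e1, e2)
                overlap = end - start
                # Require a positive overlap; seizures additionally need >= min_overlap.
                if overlap > 0 and overlap >= min_overlap:
                    pieces.append((start, end))
    # Union of pairwise intersections = time where >= 2 reviewers agree.
    merged: List[Interval] = []
    for start, end in sorted(pieces):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```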

Precedent:

Temporal overlap adjudication: K211452, K240993; majority EEG rules: K120260, K141883; neurologist majority approaches: K241390.

Dental: Caries Detection + Pixel‑Level Segmentation 🔗

Objective

Establish tooth/surface‑level presence of caries and (if applicable) pixel‑level segmentation.

Readers & Blinding

3 licensed dentists (independent, AI‑blinded).
Adjudicator: oral/dental radiologist (blinded).

Adjudication

Classification (tooth/surface): consensus of all 3 readers; if there is no consensus, apply the 2‑of‑3 majority; any remaining ties (possible only when more than two labels are in play) are adjudicated by the oral radiologist.
Segmentation (if present): per‑pixel majority (3 readers); if structure‑level measures are needed, take the median or mean across readers’ measurements; adjudicator may correct masks for protocol deviations.
(Optional) Run an MRMC reader study if the device claims reader aid.
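Below is a minimal sketch of the tooth/surface label cascade (consensus, then 2‑of‑3 majority, then the oral radiologist) and of per‑pixel majority masks. Label strings and function names are hypothetical. The adjudicator's own decision is represented only by a routing value.

```python
from collections import Counter
from typing import List, Optional, Tuple

import numpy as np


def surface_label(labels: List[str]) -> Tuple[Optional[str], str]:
    """Return (label, route) for one tooth surface read by 3 dentists."""
    label, n = Counter(labels).most_common(1)[0]
    if n == len(labels):
        return label, "consensus"
    if n > len(labels) // 2:
        return label, "majority"
    return None, "oral_radiologist"  # no majority (needs more than two label options)


def pixel_majority(masks: np.ndarray) -> np.ndarray:
    """Per-pixel majority over readers; masks has shape (R, H, W) with 0/1 values."""
    return masks.sum(axis=0) > masks.shape[0] / 2
```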

Precedent:

Dentist consensus + oral radiologist adjudication: K220928, K212519, K222746. Pixel‑majority/consensus references (dentistry and general): K233590, K242607, K241108. MRMC for reader‑aid studies: K243234, K223347, K223623.

SaMD for Patient‑specific anatomical models for 3D printing 🔗

Axial3D’s Clinical Segmentation Performance study used ACR’s RADPEER peer-review framework: three radiologists reviewed 12 cases, and all segmentations met the prespecified acceptance criterion of RADPEER 1 or 2a. In RADPEER, 1 means full concordance with the original interpretation; 2 denotes an understandable miss, and the “a” qualifier indicates the discrepancy is unlikely to be clinically significant—so “1 or 2a” implies clinical acceptability with at most minor, non-significant variances.

How ground truth was established/adjudicated:

The 510(k) summary reports that “all cases were scored within the acceptance criteria of 1 or 2a.” The summary excerpt does not spell out the full truthing workflow, but this language indicates a predefined acceptance‑grade framework applied to expert‑derived reference segmentations. In practice, expert segmentations serve as the reference, and a case meets the ground‑truth standard when it satisfies the acceptance threshold. [K222745]

A closely related precedent (patient‑specific 3D models) with explicit adjudication detail:

inHEART Models: To use manual segmentations as ground truth, two external experts evaluated the concordance of the segmentations for the task; those concordant expert segmentations constituted the reference standard. This is an explicit multi‑expert adjudication step you can mirror in a 3D‑printing workflow when you need more than acceptance grading alone. [K231683]

Practical takeaway:

For 3D‑printed anatomical models, an FDA‑familiar path is:

Expert reference segmentations (primary truth).
Objective acceptance criteria to verify each case meets the reference standard (as in Axial3D Insight). [K222745]
If you need a clearer adjudication trail, add a two‑expert concordance (or 2+1) review to finalize ground truth (as in inHEART Models). [K231683]
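Below is a minimal sketch of the acceptance gate. It assumes that every reviewer's RADPEER score must fall in {1, 2a} for a case to pass; the summary language quoted above does not say whether the criterion was applied per reviewer or per case. It also assumes that failing cases are the ones you would send to a two‑expert concordance or 2+1 review. Names are hypothetical.

```python
from typing import Dict, List, Tuple

ACCEPTABLE = {"1", "2a"}  # RADPEER scores treated as clinically acceptable


def case_accepted(scores: List[str]) -> bool:
    """Pass only if at least one score exists and every reviewer scored 1 or 2a."""
    return bool(scores) and all(s in ACCEPTABLE for s in scores)


def study_gate(cases: Dict[str, List[str]]) -> Tuple[bool, List[str]]:
    """Return (all_cases_pass, case_ids_needing_concordance_review)."""
    failing = [case_id for case_id, scores in cases.items() if not case_accepted(scores)]
    return not failing, failing
```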

EEG seizures/spikes (time‑series events) 🔗

Typical adjudication. Independent expert marking with temporal‑overlap rules: an event is “true” if at least 2 of 3 reviewers’ intervals overlap by a pre‑specified minimum (e.g., ≥1 s), and epoch boundaries/localization come from the overlapping region. Majority voting is also used for seizure presence on longer windows.

Precedent: Overlap rules and majority criteria were explicitly defined for seizures/spikes and other EEG patterns K211452, K240993, with majority rules also reported in other EEG devices K120260, K141883; seizure/spike adjudication via multi‑expert panels is likewise described in neurologic indications K241390, K231779.

Coronary artery calcium (CAC) on CT (including non‑gated chest CT) 🔗

Typical adjudication. 2‑of‑3 majority among experienced radiologists for CAC category; when disagreement persists or for quantitative scoring/thresholding, a senior adjudicator finalizes the grade.
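Below is a minimal sketch of the category majority. It assumes each reader's quantitative Agatston score is first mapped to the commonly used categories 0, 1 to 99, 100 to 399, and 400 or more. The cut points and function names are assumptions, and a given program may define categories differently.

```python
from bisect import bisect_right
from collections import Counter
from typing import List, Optional, Tuple

# Assumed Agatston cut points giving the categories 0 | 1-99 | 100-399 | >=400.
CUTS = [1, 100, 400]
NAMES = ["0", "1-99", "100-399", ">=400"]


def cac_category(agatston: float) -> str:
    """Map an Agatston score to its category (scores are assumed to be whole numbers)."""
    return NAMES[bisect_right(CUTS, agatston)]


def cac_truth(scores: List[float]) -> Tuple[Optional[str], str]:
    """2-of-3 majority on category; otherwise route to the senior adjudicator."""
    if len(scores) != 3:
        raise ValueError("expects three readers' Agatston scores")
    category, n = Counter(cac_category(s) for s in scores).most_common(1)[0]
    if n >= 2:
        return category, "majority"
    return None, "senior_adjudicator"
```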

Precedent: Majority‑rule CAC category from three radiologists is used (and replicated across versions) K210085, K241440; CAC level finalization by a senior radiologist is described when reviewers disagree K231631; consensus/majority truth for CAC segmentation/labels is also used in related CT workflows K242188.

Breast density (BI‑RADS A–D) 🔗

Typical adjudication. Panel consensus with larger reader groups (often 5 experts) to stabilize ordinal categories; some programs compute the median category as the final label.

Examples. Five‑reader consensus and median‑based truth appear across multiple submissions K202013, K241561, K243685, with median/consensus rules also documented by another sponsor K222275.

Breast lesion diagnosis / cancer confirmation (CADx) 🔗

Typical adjudication. External objective diagnosis first (pathology, diagnostic/post‑biopsy imaging, radiology reports); experts verify mapping of objective diagnosis to the case/lesion but do not overrule definitive objective diagnosis.

Examples. Objective‑first truthing and verification by MQSA‑qualified radiologists (including pathology/diagnostic image review) are explicitly described K221449. Parallel objective‑first models exist for sleep (PSG; K213360, K233618) and auscultation/echo (K213794).

Intracranial hemorrhage (ICH)/SDH on head CT (detection/triage) 🔗

Typical adjudication. Three neuroradiologists with majority or explicit consensus; many CADt workflows use 2+1 (third reader adjudicates disagreements).
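Below is a minimal sketch of 2+1 for a binary finding such as ICH presence. The third read is passed as a callable, a hypothetical stand‑in for scheduling an adjudicator. This makes the operational point explicit: the adjudicator reads only discordant cases.

```python
from typing import Callable, Tuple


def two_plus_one(read_a: bool, read_b: bool, third_read: Callable[[], bool]) -> Tuple[bool, str]:
    """2+1 adjudication: two independent blinded reads; a third read only on disagreement."""
    if read_a == read_b:
        return read_a, "concordant"
    return third_read(), "adjudicated"
```

With a 3‑reader majority design, every case gets three reads. With 2+1, the third‑read workload scales with the discordance rate between the first two readers.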

Examples. Majority read of three neuroradiologists for ICH K203260, K232431; SDH truth established by three expert neuroradiologists K232436; explicit 2+1 adjudication schemes for ICH/SDH triage in related products K243363; neuroradiologist consensus for broader stroke triage is also documented K251983.

Pulmonary embolism (PE) triage on CTPA 🔗

Typical adjudication. 2+1 (two independent thoracic/neuroradiology readers; third adjudicator if disagreement) or 3‑reader majority, all blinded.

Examples. Multi‑site triage studies compared device performance to ground truth by three experts (often “2:3 concurrence”) K251151, K220499, with other programs using three senior radiologists/majority voting K232751, K230020, K241727.

Midline shift (MLS) quantification on head CT 🔗

Typical adjudication. Quantitative truth from multiple independent measurements (often three), combined by mean/average; segmentation components may use consensus/STAPLE.

Examples. Mean of three neuroradiologist measurements defines the reference standard K232083; average shift distance of all annotators is also used K223268; MLS truth by three experts appears in allied stroke tools K243378.

Aortic aneurysm diameter (abdominal/CT) — quantitative CADm 🔗

Typical adjudication. Three independent measurements → mean (or median) as truth; escalation to a senior reviewer if disagreement exceeds preset thresholds (e.g., >5 mm or >10%).
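Below is a minimal sketch of the measurement rule. It takes the median as the reference value and defines "disagreement" as the spread (maximum minus minimum) of the three diameters. The spread is checked against both example thresholds: 5 mm absolute and 10% of the median. Thresholds and names are illustrative.

```python
from statistics import median
from typing import List, Tuple


def diameter_truth(mm: List[float], abs_tol: float = 5.0, rel_tol: float = 0.10) -> Tuple[float, bool]:
    """Return (reference diameter in mm, escalate_to_senior)."""
    ref = float(median(mm))
    spread = max(mm) - min(mm)
    # Escalate if the readers disagree by more than abs_tol mm or more than rel_tol of the reference.
    escalate = spread > abs_tol or spread > rel_tol * ref
    return ref, escalate
```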

Examples. Abdominal aorta diameter truth from three experts across multi‑center validation K230534, with a similar three‑expert measurement paradigm in a larger cohort K241112.

Dental caries/periapical radiolucency (tooth or surface level; optional pixel masks) 🔗

Typical adjudication. Three‑dentist consensus/majority for classification; oral radiologist adjudicates non‑consensus; when masks are used, per‑pixel majority (or consensus) defines the ground truth segmentation.

Examples. Consensus labels with oral‑radiologist adjudication for non‑consensus cases K212519, K222746; three‑dentist consensus with study‑specific majority/consensus rules K230144; adjudicated labeling by a dental specialist in another program K232384; pixel‑level majority/consensus ground truth appears in related dental imaging K242600, K242522, and pixel‑majority principles are documented in broader imaging tasks K233590.

Lung nodule detection/segmentation/quantification (CT/CXR) 🔗

Typical adjudication. Three‑expert majority or consensus for presence; for segmentation/measurement, combine masks (STAPLE/per‑pixel) and then average measurements or apply rules like “two most concordant.”

Examples. Dataset truthed by three dedicated chest radiologists K221592; lung nodules defined using multi‑expert truthers (with CT/auxiliary reports for context) K231805; nodule delineations by three expert radiologists for quantification tasks K240740.

Rib fracture detection/localization (X‑ray/CT) 🔗

Typical adjudication. Three‑expert majority or 2+1 adjudication for presence; bounding‑box disagreements reconciled by enclosing or consensus regions; pediatric/adult sub‑panels where relevant.

Examples. 2+1 adjudication to resolve inconsistencies K202992; majority consensus with third‑reader review for initial disagreements across adult/pediatric panels K242171; multi‑expert MSK fracture truthing in related devices K220164, K240845.

White‑matter hyperintensities (WMH) segmentation (MRI) 🔗

Typical adjudication. Multi‑expert masks combined via STAPLE or per‑pixel majority, with senior clinical expert QC; longitudinal change protocols often separate annotators/reviewers/experts by design.

Examples. STAPLE‑based consensus for hyperintensities K252362 and related neuro segmentation K220034; per‑pixel consensus/QC pipelines K241108; standardized multi‑stage annotation with expert corrections K213706, with longitudinal designs describing disjoint annotator/reviewer/expert groups K232305.

Sleep staging / sleep physiology (PSG‑anchored and algorithmic staging) 🔗

Typical adjudication. Objective diagnosis first (PSG scored to AASM standards by trained scorers/physiologists); for algorithmic staging evaluation, 2‑of‑3 technologist majority per epoch is common.
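Below is a minimal sketch of per‑epoch 2‑of‑3 majority across three scorers' hypnograms, plus device agreement on epochs that have a majority. Epochs where all three scorers disagree are excluded rather than adjudicated; that choice is an assumption. Stage labels follow AASM names, and function names are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def epoch_consensus(hypnograms: List[List[str]]) -> List[Optional[str]]:
    """Per-epoch 2-of-3 majority; None where no stage has at least two votes."""
    if len({len(h) for h in hypnograms}) != 1:
        raise ValueError("hypnograms must cover the same epochs")
    consensus = []
    for labels in zip(*hypnograms):
        stage, n = Counter(labels).most_common(1)[0]
        consensus.append(stage if n >= 2 else None)
    return consensus


def epoch_agreement(device: List[str], consensus: List[Optional[str]]) -> float:
    """Fraction of majority-labeled epochs where the device matches the consensus."""
    pairs = [(d, c) for d, c in zip(device, consensus) if c is not None]
    return sum(d == c for d, c in pairs) / len(pairs) if pairs else float("nan")
```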

Examples. PSG datasets scored by trained physiologists; video annotations by blinded reviewers for auxiliary labels K233618; sleep‑staging software compared to 2/3 expert consensus per epoch K233438; PSG gold standard and independent scorers in OSA screening K213360.

Conclusion 🔗

Our analysis across hundreds of FDA submissions reveals several consistent adjudication frameworks:

Tiered Ground Truth: Using multiple experts (typically 3) with majority voting or consensus protocols, often with escalation paths for disagreements
Automatic Adjudication: Pre‑specified statistical rules replace discussion, such as STAPLE (or per‑pixel majority) for segmentations, mean or median for continuous measurements, and majority vote for categorical labels.

The frameworks above (tiered ground truth, scalar‑level adjudication for derived metrics, and gating lower‑tier tests to higher‑tier decisions) reflect practices distilled over years of submissions and refined by real‑world edge cases. They will continue to evolve as models, data sources, and clinical workflows change.

If you have a working product and want to get to market ASAP, reach out today. Let’s schedule the gap assessment, finalize claims and thresholds, and put a date on your submission calendar now. We can add certainty and speed to your FDA journey. Our 3‑Month 510(k) Submission program guarantees FDA submission in 3 months with clearance in 3 to 6 months afterwards. We are able to offer this accelerated service because, unlike other firms, we have physicians, engineers, and regulatory consultants all in house, focused on AI/ML SaMD. We leverage our decades of combined experience, fine‑tuned templates, and custom‑built submission software to offer a done‑for‑you, hands‑off, turnkey 510(k) submission.


References 🔗

The following sources were used to support the claims made in this article.

K Number	Device Name	Applicant	Adjudication Quote
DEN170073	ContaCT	Viz.ai, Inc.	"In cases where the neuro-radiologists did not agree on whether a study required further review, an additional neuro-radiologist provided an additional opinion and established a ground truth by majority consensus."
DEN180005	OsteoDetect	Imagen Technologies, Inc.	"Ground truth for each case was determined by three US board certified orthopedic hand surgeons who independently interpreted images using the standard clinical definition of a distal radius fracture. Ground truth for the presence/absence of distal radius fracture is defined as the majority opinion of at least 2 of the 3 clinicians participating in the truthing process."
DEN190040	Caption Guidance	Bay Labs, Inc.	"Following the study and control exams, a panel of five (5) expert cardiologist readers independently provided assessments of whether the patient study, in its totality, provided sufficient information to assess ten clinical parameters."
DEN200069	Cognoa ASD Diagnosis Aid	Cognoa, Inc.	"Majority rule was used to resolve discrepancies between the two central reviewers and the site diagnosing specialist who all evaluated the same subjects."
DEN200080	Paige Prostate	Paige.AI	"The union of annotations between at least 2 of the 3 annotating pathologists was used as the localization ground truth."
DEN220066	BrainSee	Darmiyan, Inc.	"All ground truth labels were reviewed and confirmed by consensus of three physicians with clinical experience evaluating aMCI patients."
DEN230003	Viz HCM	Viz.ai, Inc.	"Study protocols must include a description of the adjudication process(es) for determining ground truth of training and test datasets."
DEN230027	NaviCam ProScan	Ankon Technologies Co., Ltd	"When the cutoff value for consistency is less than 3, two arbitration experts independently review and modify the classification results, correcting any missed diagnoses, misdiagnoses, or misjudgments. If difficult questions arise, the arbitration experts engage in collective discussion and confirmation."
DEN230036	Sepsis ImmunoScore	Prenosis, Inc.	"The adjudication process for resolving reader disagreements involved "a retrospective chart review done by a team of three physicians that reviewed the medical chart to determine the presence of a sepsis event." "The entirety of the patient's record was sent to an adjudication committee of three physicians." "If it was unclear whether the infection was the cause of organ dysfunction, the adjudicator was instructed to answer 'Indefinite,' and the patient's Sepsis status was labeled as 'Indeterminate.' In addition to providing the 'Septic.' 'Non-Septic,' or 'Indeterminate' label for each subject, each adjudicator was also asked to also provide a 'forced decision' in 'Indeterminate' cases. This led to two groups for analysis, the adjudicated forced majority group and the adjudicated forced unanimous - the majority group was all patients that received adjudication and their Sepsis 3 determination was defined by the majority rule of diagnosis by physicians and the unanimous was where all physicians agreed on the diagnosis."
K120260	ICTA	EXCEL-TECH LTD. (XLTEK)	"Due to the anticipated inter-rater variability among EEG experts, a majority rule (at least 2 out of 3) was applied to make the final determination of 'true' electrographic seizure."
K141883	CLINISCANSM EEG	PICOFEMTO LLC	"Due to the expected inter-rater variability, a two-thirds majority rule was used to determine the ground truth for seizure presence."
K142273	EmboGuide	PHILIPS MEDICAL SYSTEMS NEDERLAND B.V.	"First, feeding vessels of the lesions were defined by consensus of two experienced interventional radiologists (located outside the United States) who also performed the procedures by using all available information (2D angiography, MR and /or CT, Cone Beam CT (CBCT) and EmboGuide). This was used as the "ground truth"."
K182177	Accipio Ix	MaxQ-AI Ltd.	"Device sensitivity and specificity was compared to ground truth established by concurrence of at least two expert neuroradiologist readers."
K183019	SIS Software version 3.3.0	Surgical Information Sciences, Inc.	adjudication was done.
K190072	BriefCase	Aidoc Medical, Ltd.	"Another radiologist was used to break ties between the report and the reviewer."
K190424	HealthICH	Zebra Medical Vision Ltd.	"In the event that the two ground truthers did not agree, a third, more senior US Board Certified neuro-radiologist reviewed the axial CT series and determined ground truth (presence or absence of ICH)."
K191556	Red Dot	Behold.AI Technologies Limited	"The ground truth was determined by two readers with a third reader in the event of disagreement/discrepancy."
K191647	QLAB Advanced Quantification Software	Philips Healthcare	"The results of the validation show that when used as intended, the healthcare professional was able to successfully determine which contours required revision and was capable of revising in the "tracking revision" screen prior to accepting the measurements for a report to create accurate measurements of the RV volume."
K192109	KOALA	IB Lab GmbH	"This dataset contained a total of 6597 radiographs, representing 1149 individuals for which ground truth grading for Kellgren Lawrence grades, as well as osteophyte, sclerosis and joint space narrowing grades according to the OARSI (Osteoarthritis Research Society International) guidelines, was established by three physicians following adjudication procedures for discrepancies."
K192320	HealthCXR	Zebra Medical Vision, Ltd.	"The validation data set was truthed (ground truth) by three US Board-Certified Radiologists (truthers)."
K192969	Ezra Plexo Software	Ezra AI Inc.	"consensus ground truth created by five U.S. board certified expert radiologists."
K193087	Rapid ICH	iSchemaView Incorporated	"the RAPID ICH performance has been validated through the use of phantoms and retrospective case data and through the use of reader truthing of the data."
K193267	AI-Rad Companion (Musculoskeletal)	Siemens Medical Solutions USA, Inc.	"Ground truth annotations were established using manual vertebra height and density measurements performed by four radiologists (two readers per case plus a third reader for adjudications)."
K193300	AIMI-Triage CXR PTX	RADLogics, Inc.	"The AIMI-Triage CXR PTX output was compared to the ground truth established by 3 independent US-board certified radiologists (Truther involved in the ground truthing process was blinded to any other Truther's results, to any existing report, and to the results obtained by the AIMI-Triage CXR PTX software."
K193417	FractureDetect (FX)	Imagen Technologies, Inc.	"Each case had been previously evaluated by a panel of three U.S. board-certified orthopedic surgeons or U.S. board-certified radiologists who assigned a ground truth binary label indicating the presence or absence of a fracture."
K193658	Viz ICH	Viz.ai, Inc.	"Sensitivity and specificity were calculated in the image database, comparing the Viz ICH's output to ground truth as established by trained neuro-radiologists."
K200621	Caption Interpretation Automated Ejection Fraction Software	Caption Health	"Results of the Clip Annotator were compared to evaluation by a panel of expert readers. That study met the pre-defined acceptance criteria and found that the observed PPV point estimates for the Clip Annotator were greater than 97% for identification of the imaging mode and the view."
K200667	EyeArt	Eyenuk, Inc	"Each subject’s images were graded independently by 2 experienced and certified graders and in case of significant differences (determined using prespecified significance levels) in the 2 independent gradings, a more experienced adjudication grader graded the same images."
K200717	CLEWICU System (ClewICUServer and ClewICUnitor)	CLEW Medical Ltd.	"As an initial matter, a tagging system was developed and validated (against human physician readers as ground truth)."
K200760	Rapid ASPECTS	iSchemaView Inc.	"Data truthing was performed by three experts."
K200855	CINA	AVICENNA.AI	"Device sensitivities and specificities were compared to ground truth established by concurrence of three US-board-certified neuroradiologist readers."
K200873	HALO	NICo-Lab B.V.	"Ground truth was established by an expert panel consisting of 3 neuro radiologists."
K201034	Syngo.CT CaScoring	Siemens Medical Solutions USA, Inc.	"No statistically relevant difference between the performance of the three individual readers compared to their consensus, and the algorithm compared to the consensus was found."
K201411	Visage Breast Density	Visage Imaging GmbH	"Three board certified radiologists with MQSA qualification per site performed a breast density classification and the consensus of the three reviewers was determined for each study."
K202013	WRDensity by Whiterabbit.ai	Whiterabbit.ai Inc.	"consensus of five expert radiologists who independently assessed breast density on a test dataset."
K202928	DV. Target	Deepvoxel INC	"The ground truth OARs contours on the public validation data were generated from the consensus of three board-certified physicians."
K202992	BriefCase, RIB Fractures Triage (RibFx)	Aidoc Medical, Ltd.	"Ground truthing was performed by two radiologists with an additional third radiologist to resolve inconsistencies."
K203235	VBrain	Vysioneer Inc.	"The ground truth of each tumor contours was generated from the consensus of three board-certified radiation oncologists."
K203256	Imbio RV/LV Software	Imbio, LLC	"The second test (Reader Study- II) will demonstrated the accuracy of RVLV diameter ratios compared to radiologist's measurement of the RVLV diameter ratio."
K203258	syngo.CT Lung CAD	Siemens Healthcare GmbH	"The reference standard was based on reader majority (three out of five) followed by expert adjudication, as needed."
K203260	syngo.CT Brain Hemorrhage	Siemens Medical Solutions USA, Inc.	"The data cohort consisted of 600 anonymized head CT cases from 5 sites in US and Europe with approximately equal distribution of positive (case with ICH) and negative (case without ICH) cases. Sensitivity and specificity of syngo.CT Brain Hemorrhage in processing of non-contrast head CT have been analyzed by comparison to a ground truth established by majority read of 3 US board certified neuroradiologists with more than 10 years of experience."
K203517	Saige-Q	DeepHealth, Inc.	"Each case was reviewed by two independent expert radiologists (and an adjudicator if discordance was observed) to establish the reference standard for each case."
K203696	RBknee	Radiobotics ApS	"Ground truth grading for Kellgren Lawrence grades, as well as osteophyte, sclerosis and joint space narrowing grades according to the OARSI (Osteoarthritis Research Society International) guidelines, and measurements of the minimum joint space width was established by two physicians following adjudication procedures with a third reviewer for discrepancies."
K210085	HealthCCSng	Zebra Medical Vision Ltd.	"Ground truth category was determined by the majority agreement of two of three radiologists."
K210187	Overjet Dental Assist	Overjet, Inc.	"These measurements were then adjudicated by two US Dental Radiologists".
K210237	CINA CHEST	Avicenna.AI	"Device sensitivities and specificities were compared to ground truth established by concurrence of several US-board-certified radiologist readers."
K211452	Encevis	Austrian Institute of Technology GmbH	"An event was considered as "true seizure" only if the time interval of two out of three reviewers overlapped by at least 1 second. A seizure epoch was then defined as the overlapping time range of two reviewers." "An event was considered as "true spike" only if the time interval of two out of three reviewers overlapped." "the 3D-coordinates of the electrode which is next to the spike maximum averaged over reviewers was used." "Annotations had to be consistent between both reviewers to be used in the sensitivity and specificity measurement." "The detection performance was analyzed for consensus annotations of the two reviewers. The consensus annotations only include annotation segments where both reviewers showed the same decision about Burst Suppression pattern."
K211803	HealthPPT	Zebra Medical Vision Ltd.	"The validation data set was truthed (ground truth) by three US Board-Certified Radiologists (truthers)."
K211841	MRI Planner	Spectronic Medical AB	"Manual delineations were generated by two expert truthers using the consensus approach, based on US clinical guidelines."
K212519	Overjet Caries Assist	Overjet, Inc.	"Ground truth was established by the consensus labels of three US licensed dentists, and non-consensus labels were adjudicated by a Dental Radiologist."
K212758	Autoplaque	Cedars-Sinai Medical Center: AIM	"Ground truthing was performed by two cardiologists with one additional highly experienced radiologist to resolve discrepancies."
K213155	RT-Mind-AI	MedMind Technology Co., Ltd.	"Ground truthing of each image was generated from the consensus of at least three licensed physicians."
K213272	Formus Hip	Formus Labs, Ltd	"A third senior radiologist reviewed each pair of segmentation and selected the most accurate segmentation which was the final manually segmented mesh."
K213360	SleepCheckRx	ResApp Health	"The clinical study used a binomial endpoint comparing the presence and severity of OSA, by using sleep sounds captured and analyzed by the SleepCheckRx algorithm and comparing them to a simultaneous PSG diagnosis (gold standard). PSG diagnosis was established by independent scorers, in accordance with the Type II (in-home) American Academy of Sleep Medicine (AASM) 2017 Guidelines. Each sleep study was scored by a qualified independent sleep scorer."
K213409	ZEUS System (Zio Watch)	iRhythm Technologies, Inc.	"The ECG-based preliminary findings in the Zio Watch Transmission Reports are quality reviewed by Certified Cardiographic Technicians (CCTs) prior to publishing."
K213519	Rune Labs Tremor Transducer System	Rune Labs, Inc.	"The choreiform movement score (CMS) was calculated from sensor data in the pilot study and compared to dyskinesia ratings from three MDS-certified experts during multiple MDS-UPDRS assessments."
K213566	ClearRead Xray Pneumothorax	Riverain Technologies, Inc.	"The final image label and associated annotations were derived from a majority voting rule, where the associated annotation bounding boxes were replaced with a single box that enclosed all bounding boxes."
K213686	SKOUT Software	Iterative Scopes Inc.	"Ground truth was defined as data reviewed and either validated or created by expert gastroenterologists through a process referred to as gastroenterologist review. During gastroenterologist review, experts reviewed and either validated, rejected new labels post primary annotation."
K213706	AI-Rad Companion Brain MR	Siemens Healthcare GmbH	"For each test dataset, the three initial annotations are annotated by three different in-house annotators. Then, each initial annotation is reviewed by the in-house reviewer. Afterwards, each initial annotation is reviewed by the referred clinical expert. The clinical expert reviews and corrects the initial annotation of the WMH according to the annotation protocol."
K213721	BriefCase	Aidoc Medical, Ltd.	"Ground truthing was performed by two US Board-certified radiologists and a third one to resolve inconsistencies."
K213794	Eko Murmur Analysis Software (EMAS)	Eko Devices, Inc.	"All recordings were annotated by multiple cardiologists in respect to their quality and the presence of any murmur." and "Ground truth for murmur classification was obtained via pairing cardiologist annotations with gold standard echocardiogram."
K213941	Annalise Enterprise CXR Triage Pneumothorax	Annalise-AI	"To determine the ground truth, each deidentified CXR case was annotated in a blinded fashion by at least two American Board of Radiology (ABR)-certified and protocol-trained radiologists (ground truthers), with consensus determined by two ground truthers and a third ground truther in the event of disagreement."
K213944	HealthOST	NanoxAI Ltd.	"Ground truth measurements were determined by the three US board-certified radiologists."
K213986	CerebralGo Plus	Yukun (Beijing) Technology Co., Ltd	"When the two radiologists conflicted, the third radiologist would arbitrate and generate the reference standard."
K214043	BriefCase	Aidoc Medical, Ltd.	"Ground truthing was performed by two radiologists with an additional third radiologist to resolve inconsistencies."
K220034	NEUROShield	In-Med Prognostics L3C	"This ground truth was combined into one tracing per case by the STAPLE (Simultaneous Truth and Performance Level Estimation) algorithm. The STAPLE-derived ground truth was then compared with segmentation provided by each radiologist and statistical tests were performed to ensure the validity of ground truth."
K220105	Saige-Dx	DeepHealth, Inc.	"For exams where there were discrepancies between the two truther's assessment of density, lesion type, and/or lesion location, a third truther served as the adjudicator."
K220164	Rayvolve	AZmed SAS	"Each case had been previously evaluated by a panel of three US board-certified MSK radiologists to provide ground truth binary labeling indicating the presence or absence of fracture and the localization information for fractures."
K220349	TeraRecon Neuro	TeraRecon, Inc	"The evaluator was asked to confirm through qualitative assessment that the generated maps of TeraRecon Neuro are at least 85% substantially equivalent or better than the predicate and reference devices."
K220408	AVIEW RT ACS	Coreline Soft Co.,Ltd	"Second, segmentation results generated by 1 expert are sequentially edited by 2 experts. In the editing process, the first expert makes corrections, and the result is received by another expert completes the gold standard by finalizing it. This process was performed by a panel of three radiation oncology physicians' experiences."
K220437	Neurophet AQUA	NEUROPHET, Inc.	"Ground-truth data were initially generated using FreeSurfer (General Hospital Corporation, Boston, MA, USA, version 6.0) and verified and corrected by four radiologists."
K220497	CoLumbo	Smart Soft Healthcare AD	"The standalone software performance assessment study compared the CoLumbo software outputs without any editing by a radiologist to the ground truth defined by 3 radiologists on segmentations and measurements. ... The per-pixel majority opinion of the three (3) radiologists established the ground truth for each segmented tissue. Similarly, each radiologist used a commercial software tool to produce a standard set of areal, angular and linear measurements. The ground truth measurements were established by taking the median of three radiologists' measurements."
K220499	Rapid PE Triage and Notification (PETN)	iSchemaView Inc.	"Final performance validation included 306 CTPA cases with ground truth established by 3 experts using a 2:3 confirmation."
K220709	BriefCase	Aidoc Medical, Ltd.	"The study compared the software's performance to the ground truth, as determined by 3 expert US board certified Neurologists reviewers, using majority voting."
K220815	BrainInsight	Hyperfine, Inc.	"Ground truth for midline shift was determined based on the average shift distance of all annotators." "Ground truth for segmentation is calculated using Simultaneous Truth and Performance Level Estimation (STAPLE)."
K220928	Overjet Calculus Assist	Overjet Inc.	"Ground truth was established by the consensus labels of three US-licensed dentists, and non-consensus labels were adjudicated by an oral radiologist."
K220940	EchoPAC Software Only, EchoPAC Plug-in	GE Medical Systems Ultrasound and Primary Care Diagnostics,	"For all datasets, two certified cardiologists performed manual delineation, then reviewed the annotations for each other. A consensus reading was first done whereby the two cardiologists discussed if they agreed on or not. A panel of experienced experts further reviewed annotations that the two cardiologists could not agree on."
K221241	DrAid for Radiology v1	VinBrain Joint Stock Company	"This data set was truthed by a panel of 3 US board certified radiologists."
K221305	AI-Rad Companion Organs RT	Siemens Medical Solutions USA, Inc	"adjudication was done."
K221330	BriefCase	Aidoc Medical, Ltd.	"the ground truth as determined by 2 out of 3 majority voting senior board-certified radiologists."
K221449	Genius AI Detection 2.0	Hologic, Inc.	"The truth was verified by another MQSA-qualified, board-certified radiologist to ensure accuracy and consistency."
K221552	EFAI ChestSuite XR Pneumothorax Assessment System	Ever Fortune AI Co., Ltd.	"The reference standard (ground truth) was generated by the majority agreement between the three board-certified radiologists."
K221592	AVIEW Lung Nodule CAD	Coreline Soft Co.,Ltd.	"Three dedicated chest radiologists with at least ten years of experience determined the ground truth using a dataset of 151 Chest CTs with 103 negative controls and 48 cases with one or more lung nodules."
K221716	CINA	AVICENNA.AI	"Device sensitivities and specificities were compared to ground truth established by concurrence of three US-board-certified neuroradiologist readers."
K221868	QOCA image Smart CXR Image Processing System	Quanta Computer Inc.	"The dataset was truthed by three radiologists."
K221921	DTX Studio Clinic 3.0	Nobel Biocare AB	"The dataset of 452 adult IOR images was 'ground-truthed by a group of 10 dental practitioners followed by an additional expert review.'"
K222054	Denti.AI Auto-Chart	Denti.AI Technology Inc.	"The GT was established with the help of two experienced dental hygienists with an experienced dentist reviewing cases of disagreement."
K222076	EFAI ChestSuite XR Pleural Effusion Assessment System	Ever Fortune.AI Co., Ltd.	"Three US board-certified radiologists determined the presence of pleural effusion in each case independently. The majority agreement was used as the reference standard (ground truth)."
K222179	Annalise Enterprise CXR Triage Trauma	Annalise-AI Pty Ltd	"To determine the ground truth, each deidentified CXR case was annotated in a blinded fashion by ABR-certified and protocol trained radiologists (ground truthers), with consensus determined by two ground truthers and a third ground truther in the event of disagreement for the primary finding."
K222268	Annalise Enterprise CXR Triage Trauma	Annalise-AI Pty Ltd	"To determine the ground truth, each deidentified CXR case was annotated in a blinded fashion by at least two ABR-certified and protocol-trained radiologists (ground truthers), with consensus determined by two ground truthers and a third ground truther in the event of disagreement."
K222275	Saige-Density	DeepHealth, Inc.	"Ground truth was established for each case as the consensus of five expert radiologists' breast density categories on the same set of cases, and calculated as the median of the reported categories for each case."
K222361	AI-Rad Companion (Musculoskeletal)	Siemens Medical Solutions USA, Inc.	"For outliers, a third annotation was blindly provided by one of the radiologist who had not annotated before. The ground truth was generated by the average of the two most concordant measurements. For all other cases, the two annotations were used as ground truth."
K222692	BriefCase	Aidoc Medical, Ltd.	"ground truth as determined by three senior board-certified radiologists"
K222745	Axial3D Insight	Axial Medical Printing Limited	"all cases were scored within the acceptance criteria of 1 or 2a [1]."
K222746	Overjet Caries Assist	Overjet, Inc.	"Standalone performance of the OCA device was compared to a ground truth established by consensus of labels of three US licensed dentists, and non-consensus labels were adjudicated by an oral radiologist."
K222781	Augmento	Deeptek Medical Imaging Private Limited	"Replicability was demonstrated by measurements made by two readers in twelve independent X-ray scans...These measurements were compared to the angles measured using Augmento. Two statistical analysis tests: the equivalence test and T-test were used."
K223240	Annalise Enterprise CTB Triage Trauma	Annalise-AI Pty Ltd	"To determine the ground truth, each deidentified case was annotated in a blinded fashion by at least two ABR-certified and protocol-trained neuroradiologists (ground truthers), with consensus determined by two ground truthers and a third ground truther in the event of disagreement."
K223268	BrainInsight	Hyperfine, Inc.	"Ground truth for midline shift was determined based on the average shift distance of all annotators." "Ground truth for segmentation is calculated using Simultaneous Truth and Performance Level Estimation (STAPLE)."
K223296	Videa Perio Assist	VideaHealth, Inc.	No response
K223347	UltraSight AI Guidance	UltraSight Inc.	"The clips acquired during those scans were reviewed by a panel of 5 expert cardiologists blinded to whether the clip was acquired by a non-expert user or a sonographer and to each other's evaluations." and "Assessment of intra-cardiologists' variability using Cohen's kappa coefficient (k) was assessed on a randomly selected 10% of the examinations on which a repeated assessment was performed."
K223396	Rapid RV/LV	iSchema View Inc.	"Final performance validation included 124 CTPA cases with ground truth established by 3 experts."
K223443	Viz AAA	Viz. ai, Inc.	"Sensitivity and specificity were calculated for the image database, comparing Viz AAA's output to ground truth as established by trained radiologists with fellowship in vascular radiology."
K223490	FlightPlan for Embolization	GE Medical Systems SCS	"For vessel extraction, the ground truth was produced by the consensus of 3 board certified radiologists."
K223491	Critical Care Suite with Pneumothorax Detection AI Algorithm, Critical Care Suite 2.1, Critical Care Suite	GE Medical Systems, LLC	"The reference standard was established by three blinded radiologists."
K223502	MR Diffusion Perfusion Mismatch V1.0	Olea Medical	"the qualitative assessment allowed an US board-certified neuroradiologist to conclude that all parametric maps were substantially equivalent." "the appraisal performed by an US board-certified neuroradiologist led to the conclusion that Volume 1 was visually equivalent for all 30 cases" "the visual inspection performed by an US board-certified neuroradiologist led to the conclusion that Volume 2 was equivalent for all 30 cases."
K223623	SubtleMR (2.3.x)	Subtle Medical Inc.	"Based upon the results of this testing, the SubtleMR performance was determined to be substantially equivalent to the predicate device."
K223646	IB Lab LAMA	IB Lab GmbH	"If any pair of assessments differs by more than the threshold defined in the Test-Plan, the respective leg was consensus read by the two truthers in order to establish a reliable ground truth."
K223757	Bonelogic	Disior Ltd	"adjudication was done."
K223774	Contour ProtégéAI	MIM Software Inc.	"The initial segmentations were then reviewed and corrected by a radiation oncologist against the same standards and guidelines. Qualified staff at MIM Software (M.D. or licensed dosimetrists) then performed a final review and correction."
K230020	BriefCase	Aidoc Medical, Ltd.	"The study compared the software's performance to the ground truth, as determined by three senior board-certified radiologists, using majority voting."
K230039	uOmnispace	Shanghai United Imaging Healthcare Co., Ltd	"annotators will refine the first round annotation, they will check each other's annotation. At last, a senior clinical specialist will check and modify annotations to make sure the ground truth correct."
K230074	Rapid Aneurysm Triage and Notification	iSchemaView Inc.	"Final performance validation included 266 (151 pos, 115 neg) CTA cases with ground truth established by 3 experts."
K230082	Auto Segmentation	GE Medical Systems, LLC	"Ground truth annotations were established following RTOG and DAHANCA clinical guidelines manually by three independent, qualified radiotherapy practitioners."
K230144	Denti.AI Detect	Denti.AI Technology, Inc.	"Ground truthing was performed by three independent dentists with the consensus rule applied to establish final reference standard." "Ground truthing was performed by three independent dentists with majority rule applied to establish final reference standard."
K230209	Sonix Health	Ontact Health Co., Ltd.	"The ground truth annotation for the test was performed by two experienced sonographers with a Registered Diagnostic Cardiac Sonographer (RDCS) certification. The annotation was supervised by two experienced cardiologists and the consensus annotation was used as the final ground truth."
K230497	Bladder AI (AIBV01)	Exo Inc	"The ground truth for bladder volume (reference data) was obtained as the average bladder volume measurement among three expert clinicians."
K230534	BriefCase-Quantification	Aidoc Medical, Ltd.	"Aidoc conducted a retrospective, blinded, multicenter study with the BriefCase-Quantification software to evaluate the software's performance in providing maximum axial diameter measurements of the abdominal aorta in CT images in 160 cases, from 6 US-based clinical sites, both academic and community centers, compared to the ground truth, as determined by three US board-certified radiologists."
K230685	AutoContour Model RADAC V3	Radformation, Inc.	"Ground truthing of each test data set were generated manually using consensus (NRG/RTOG) guidelines as appropriate by three clinically experienced experts consisting of 2 radiation therapy physicists and 1 radiation dosimetrist."
K230899	qXR-PTX-PE	Qure.ai Technologies	"The ground truth was established by 3 ABR thoracic radiologists with a minimum of 10 years of experience."
K231001	DeepTek CXR Analyzer v1.0	DeepTek Medical Imaging Pvt Ltd	"The ground truth (GT) label for the presence or absence of ROI for each category was defined as the majority opinion of 2 out of the 3 the radiologists."
K231025	EFAI NeuroSuite CT ICH Assessment System	Ever Fortune.AI Co., Ltd.	"The presence of ICH in each case was determined independently by three U.S. board-certified neuroradiologists, and the reference standard (ground truth) was generated by the majority agreement between the three experts."
K231094	Annalise Enterprise CTB Triage-OH	Annalise-AI Pty Ltd	"To determine the ground truth, each deidentified case was annotated in a blinded fashion by at least two ABR-certified and protocol-trained neuroradiologists (ground truthers), with consensus determined by two ground truthers and a third ground truther in the event of disagreement."
K231130	TumorSight Viz	SimBioSys, Inc.	"In cases where the two radiologists did not agree on whether the segmentation was appropriate, a third radiologist provided an additional opinion and established a ground truth by majority consensus."
K231324	DASI Dimensions (V1.0)	DASI Simulations	"The reference standard was derived from 2 qualified truthing each CTA, whose measurements were averaged for each case. If there was a significant variance between the initial two truthers, an adjudicator was involved."
K231355	Aurora	EnsoData	"For an event to be officially scored or reported, a consensus of at least two-thirds among the scorers was required."
K231384	Annalise Enterprise CTB Triage Trauma	Annalise-AI Pty Ltd.	"To determine the ground truth, each deidentified case was annotated in a blinded fashion by at least two ABR-certified and protocol-trained neuroradiologists (ground truthers), with consensus determined by two ground truthers and a third ground truther in the event of disagreement."
K231396	CEPHX- Cephalometric Analysis Software	Orca Dental AI LTD	"The study design involved the comparison of 21 clinically significant landmarks detected automatically by the AI algorithm to the manually detected landmarks by the three orthodontic specialists, with a margin of up to 2.0mm considered 'pass' and a margin above this range considered 'fail'."
K231631	BriefCase-Quantification	Aidoc Medical, Ltd.	"In cases where the reviewers disagree on the level of CAC, the senior US board-certified radiologist provided a final opinion which has established the ground truth."
K231678	Overjet Periapical Radiolucency Assist	Overjet, Inc	"The consensus reference standard established by 3 endodontists."
K231683	inHEART Models	inHEART, SAS	"In order to use this as a ground truth, two external experts evaluated the concordance of the manual segmentations for the task in which the use of this software is inscribed."
K231690	iCAS-LV	HighRAD Ltd.	"The ground truthing process involved two experienced radiologists, one of whom is US board-certified, independently identifying and delineating liver metastases in abdominal ceCT scans. A third senior radiologist reviewed and compared their findings, with the final lesion delineations validated or modified by the third radiologist being considered as the Ground Truth for the study."
K231767	Annalise Enterprise CTB Triage Trauma	Annalise-AI Pty Ltd	"To determine the ground truth, each deidentified case was annotated in a blinded fashion by at least two ABR-certified and protocol-trained neuroradiologists (ground truthers), with consensus determined by two ground truthers and a third ground truther in the event of disagreement."
K231779	REMI AI Discrete Detection Module	Epitel, Inc.	"Consensus ground truth electrographic seizure negative determinations were made using the wired EEG records when at least 2 of 3 members identified the presence or absence of an electrographic seizure event."
K231805	qXR-LN	Qure.ai Technologies	"The standalone study was performed to compare qXR-LN's performance against a ground truth determined by 5 ABR certified ground truthers. They read the Chest X-rays with the accompanying CT scans and reports and the ground truth was based on the nodules visible on the Chest Xray."
K231837	Brainomix 360 Triage LVO	Brainomix Limited	"To determine the ground truth, each case was reviewed by two ABR-certified neuroradiologists (ground truthers), with a consensus determined by a third ground truther in the event of disagreement."
K231871	Radify Triage	Envisionit Deep AI Ltd	"The ground truth was established by 3 board-certified ABR (USA) radiologists with a minimum of 11 years of experience."
K232083	BriefCase-Quantification	Aidoc Medical, Ltd.	"Aidoc conducted a retrospective, blinded, multicenter, study with the BriefCase-Quantification software to evaluate the software's performance in providing adequate measurements of the midline shift in non-contrast head CT images in 284 cases from 228 unique patients from 6 US-based clinical sites, both academic and community centers, compared to the ground truth, as determined by three neuroradiologists, who independently measured the midline shift, the reference standard was created as the mean of all three measurements."
K232096	Transpara Density 1.0.0	Screenpoint Medical B.V.	"Ties in the panel majority-vote were resolved by taking the majority vote of the three most experienced radiologists in the panel."
K232237	Tyto Insights for Wheeze Detection	Tyto Care Ltd.	"To establish the ground truth, all of the recordings were read by three blinded experienced Pulmonologists at random, the binary ground truth was determined by majority vote of these three Pulmonologists."
K232305	AI-Rad Companion Brain MR	Siemens Medical Solutions U.S.A.	"For each dataset, three sets of ground truth of white matter hyperintensity changes between two time points are annotated manually. Each set is annotated by a disjoint group of annotator, reviewer, and clinical expert, with the expert randomly assigned per case to minimize annotation bias."
K232331	InVision Precision LVEF (LVEF)	InVision Medical Technology Corporation	"The primary success criterion was that the subject device would produce an ejection fraction number with a Root Mean Square Deviation below a set threshold as compared to the reference ground truth EF as well as Dice score above a set threshold compared to the consensus annotation of three cardiologists."
K232384	Videa Dental Assist	VideaHealth, Inc.	"US licensed dentists labeled the data and a US licensed dentist adjudicated those labels to establish a reference standard for the study."
K232410	SmartChest	Milvue	"The presence or absence of pneumothorax and pleural effusion was established by three ABR-certified radiologists with a minimum of 5 years of experience in cardiologists independently interpreted each case and the third radiologist independently reviewed the cases where there was disagreement between the first two. The final reference standard was determined by majority consensus."
K232431	syngo.CT Brain Hemorrhage	Siemens Medical Solutions USA, Inc.	"The performance of the syngo.CT Brain Hemorrhage device has been [validated in a standalone] performance study. Sensitivity and specificity of syngo.CT Brain Hemorrhage in processing of noncontrast head CT have been analyzed by comparison to a ground truth established by majority read of 3 US board certified neuroradiologists with more than 10 years of experience."
K232436	Rapid SDH	iSchemaView, Inc.	"Truth was established using three (3) expert neuro-radiologists."
K232501	AI Platform (AIP001)	Exo Inc	"The ground truth for ejection fraction (reference data) was obtained as the average ejection fraction measurement of three experts." "The ground truth of the presence of A-line was determined by consensus of two or more experts." "The ground truth of B-line counts was determined as the average of B-line counts from three experts."
K232751	BriefCase-Triage	Aidoc Medical, Ltd.	"The study compared the software's performance to the ground truth, as determined by three senior board-certified radiologists, using majority voting."
K232928	DeepContour (V1.0)	Wisdom Technologies., Inc.	"a third qualified internal staff member available to adjudicate if needed."
K233176	uOmnispace.MI	Shanghai United Imaging Healthcare Co., Ltd.	"For ground truth annotations in spine labeling: 'Finally, a senior clinical specialist will check and modify annotations to make sure the ground truth correct.'" "For ground truth annotations in rib labeling: 'At last, a senior clinical specialist will check and modify annotations to make sure the ground truth correct.'"
K233186	uOmnispace.MR	Shanghai United Imaging Healthcare Co., Ltd.	"If there is a disagreement, a consensus between the experts was done."
K233196	Medihub Prostate	JLK Inc.	"The ground truthing was conducted by expert-level radiologists. They independently annotated the prostate images, and these annotations were then consolidated into a definitive ground truth through a majority rule approach. The rationale for employing consensus among three radiologists, resolved through discussion and mutual agreement in cases of ties, ensures a reliable and unbiased representation of the prostate, crucial for the accurate clinical performance evaluation of our device."
K233209	uOmnispace.CT	Shanghai United Imaging Healthcare Co., Ltd.	"After the first round of annotation, they will check each other's annotation. Finally, all ground truth are evaluated by two licensed physicians with U.S. credentials."
K233247	Heuron ICH	Heuron Co., Ltd.	"The ground truth was determined by the two US board-certified neuroradiologists (truthers) interpretating each NCCT images, and in case of disagreement between the two truthers, a third truther reviewed the case for generating the final ground truth."
K233438	SleepStageML	Beacon Biosignals, Inc.	"SleepStageML software performance was evaluated against the expert consensus sleep stages that were constructed using 2/3 majority scoring (i.e., the stage per epoch where at least 2 of the 3 experts agree)."
K233549	Tempus ECG-AF	Tempus AI, Inc.	"Each clinical site contributed >1000 patient records, from which the AF status of each patient was determined based on duplicate manual chart review."
K233590	Overjet Charting Assist	Overjet, Inc	"The results were compared to a robust consensus reference standard established by trained dentists via majority pixel voting."
K233618	Oxevision Sleep Device	Oxehealth Limited	"Reference PSG measurements were assessed and scored (in accordance with the American Academy of Sleep Medicine Manual for the Scoring of Sleep and Associated Events version 2.6 of January 2020) by three trained sleep physiologists, blinded to the video data collected by the standard off-the-shelf camera." "Oxevision video data was reviewed and annotated (to obtain a reference standard) for periods of bed occupancy by two reviewers, blinded to the algorithm development details."
K233753	AI-Rad Companion (Pulmonary)	Siemens Healthcare GmbH	"In case of disagreement a third radiologist (9 years of experience) served as an adjudicator."
K233968	CINA-iPE	Avicenna.AI	"Device Sensitivity [95% CI] and Specificity [95% CI] were computed against the groundtruth established by consensus of three US-board-certified expert radiologists."
K233998	TRAQinform IQ	AIQ Global, Inc.	"with blinded or otherwise neutral adjudication regarding interpretation/classification source."
K234042	EFAI Bonesuite XR Bone Age Pro Assessment System (BAP-XR-100)	Ever Fortune.AI Co., Ltd.	"The study design measured the performance of EFAI BAPXR against the ground truth (GT) from four U.S. board-certified expert radiologists. As shown in the following figure A) for the ground truthing workflow, the ground truthing was generated through the truthing process based on the current standard of care, with the addition of multiple checkpoints to ensure consistency and consensus among all readers reviewing the radiographs when comparing them to the Greulich-Pyle Atlas."
K234141	AISAP Cardio V1.0	Aisap	"Any discrepancies were interpreted by a third ground truth cardiologist ('2+1' annotation strategy). Any persistent disagreements were decided at a meeting of the three ground truth cardiologists."
K240003	Velmeni for Dentists (V4D)	Velmeni Inc.	"Standalone performance was compared to ground truth established by consensus labels of three US licensed dentists, and nonconsensus labels were adjudicated by an oral radiologist."
K240094	LumiNE US; Lumi	Augmedit B.V.	"The U.S data was individually truthed by 3 U.S. based neurosurgeons with relevant experience including fellowships. The definitive US ground truth test set was established by mutual agreement after internal discussion and signed off per scan per truther."
K240291	EFAI CARDIOSUITE CTA ACUTE AORTIC SYNDROME ASSESSMENT SYSTEM	Ever Fortune.AI, Co., Ltd.	"The presence of AD or IMH in each case was determined independently by three U.S. board-certified radiologists, and the reference standard (ground truth) was generated by the majority agreement between the three experts."
K240301	MammoScreen® (3)	Therapixel	"The study applied a fully crossed design, so that each case was [read] by each reader both with and without the aid of MammoScreen 3."
K240411	uAI Portal	Shanghai United Imaging Intelligence Co., Ltd.	"During the ground truthing process, two Chinese radiologists, each with at least 5 years of clinical experience, independently annotated vessel mask for each patient case, resulting in two sets of annotations per case. Both radiologists are hospital employees and are independent from United Imaging. After completion, an American Board-Certified Radiology adjudicator with at least 10 years of clinical experience reviewed both sets of segmented images. Based on his assessment, the adjudicator selected the most accurate segmentation set as the final ground truth. If needed, he would make any necessary modification until a satisfactory ground truth was established for the study."
K240555	Tyto Insights for Crackles Detection	Tyto Care Ltd.	"To establish the ground truth, all the recordings were read by three blinded experienced Pulmonologists at random, the binary ground truth was determined by a majority vote of these three Pulmonologists."
K240612	CINA-VCF	Avicenna.AI	"Device Area Under the Receiver Operating Characteristic curve (ROC AUC) was computed against the ground truth established by consensus of three US-board-certified expert radiologists, as the primary endpoint, in accordance with the established required technical method under the QFM product code."
K240642	SMART Bun-Yo-Matic CT	Disior Ltd	"Based on the majority vote of three, two same responses were required to establish a ground truth on each of the DICOM series."
K240697	See-Mode Augmented Reporting Tool, Thyroid (SMART-T)	See-Mode Technologies Pte. Ltd.	"The ground truth labels for localisation, ACR TI-RADS lexicon descriptors, and TI-RADS level agreement were based on the labels of two expert US-board certified radiologists and an adjudicator (also US-board certified radiologist with the most years of experience)."
K240712	icobrain aria	icometrix NV	"Ground truth obtained via a consensus of 3 experts."
K240736	SMART Bun-Yo-Matic X-Ray	Disior Ltd	"The ground truth for the testing data was established by 2 (2) clinicians with over five (5) years of experience practicing medicine. Each clinician was given the same image data to review dorsoplantar and lateral x-ray images. Each clinician then marks on a spreadsheet the presence of the bone in the image."
K240740	qCT LN Quant	Qure.ai Technologies	"Ground Truth was established by three expert radiologists. The truthers independently read the scans and mark out the boundaries of the nodule in all slices"
K240791	ADAS 3D	Adas3D Medical S.L	"Ground truth annotations were generated using the FDA-cleared ADAS 3D software by two clinical experts independent of the clinical experts who established the ground truth of the training dataset."
K240845	Rayvolve	AZmed SAS	"Each case had been previously evaluated by a panel of three US board-certified MSK radiologists to provide ground truth binary labeling the presence or absence of fracture and the localization information for fractures."
K240901	Stethophone	Sparrow Acoustics Inc.	"Each recording in a testing dataset was annotated by multiple expert cardiologists. Annotation of each recording included determining the presence of a heart murmur of any type and providing timings of all S1 and S2 heart sounds that were audible in the recording."
K240942	CINA-CSpine	Avicenna.AI	"Device Sensitivity [95% CI] and Specificity [95% CI] were computed against the ground truth established by consensus of three US-board-certified expert radiologists."
K240993	encevis (2.1)	AIT Austrian Institute of Technology GmbH	"An event was considered as 'true seizure' only if the time interval of two out of three reviewers overlapped by at least 1 second. A seizure epoch was the overlapping time range of two reviewers." "An event was considered as 'true spike' only if the time interval of two out of three reviewers overlapped." "Annotations had to be consistent between both reviewers to be used in the sensitivity and specificity measurement"
K241009	PeriCALM Patterns 3.0	PeriGen, Inc.	"To resolve the inter-observer variation, a majority opinion approach was used."
K241038	Cardiac CT Function Software Application	Circle Cardiovascular Imaging	"Compared to a reference standard established from three expert readers, the ML-based model is capable of segmenting the LV cavity with less than 10% difference in MAE, a Dice coefficient above 86%, a HD below 9.5 mm, and an EF bias of 1.3% with a 95% confidence interval of [-12, 14]."
K241108	RemedyLogic AI MRI Lumbar Spine Reader	Remedy Logic Inc.	"For the segmentation, each radiologist used a specialized pixel labeling tool to independently label the pixels of the tissues at the predetermined levels of the preselected axial and sagittal slices. The per-pixel majority opinion of the five (5) radiologists established the ground truth for each anatomical structure. Specially, if at least 3 of the 5 radiologists labeled a pixel as belonging to a particular anatomical structure, the pixel was included. Otherwise, the pixel was excluded."
K241112	BriefCase-Quantification	Aidoc Medical, Ltd.	"determined by three US board-certified radiologists."
K241211	CoLumbo	Smart Soft Healthcare	"the ground truth was defined by 3 radiologists"
K241232	Galen™ Second Read™	Ibex Medical Analytics Ltd.	"the GT determination for a slide was performed by two independent expert pathologists; slides where the pathologists disagreed, a third independent expert pathologist was asked to review the slide and the majority rule determined the GT for the slide."
K241380	FETOLY-HEART	Diagnoly	"Images in which the pair of annotators disagreed were reviewed by an adjudicator, who made the final decision." "If the overlap was lower or there was a disagreement on the criterion presence, an adjudicator reviewed the boxes. The final decision regarding the presence was based on majority consensus among the adjudicator and annotators. The final decision for the criteria localization was based on the adjudicator's decision to either keep one of the annotator's boxes or draw a new one."
K241390	NeuroMatch	LVIS Corporation	"A reference standard was established for the validation dataset by a panel of three independent EEG trained neurologists who reviewed and annotated the EEG recordings for seizure episodes. Seizures were identified based on a 2 out of 3 majority rule." "A reference standard was established for the validation dataset by a panel of three independent EEG trained neurologists who reviewed and annotated the EEG recordings for spike events. The reference standard for spike is established with majority consensus among the annotating physicians (i.e., consensus of at least 2 out of the 3 physicians)."
K241430	EchoMeasure	iCardio.ai	"Ground truth annotations were established using manual measurements and segmentations performed by experienced clinicians (using the mean of three experienced US-based cardiac sonographers per case to establish the Ground Truth)."
K241440	HealthCCSng	Nano-X AI Ltd.	"The ground truth (Coronary Artery Calcium Category) was determined by the majority agreement of two out of three US board certified radiologists, experienced in identifying coronary calcium on non-gated CT studies."
K241480	JBS-LVO	JLK, Inc.	"In this standalone performance evaluation, each case output from the JBS-LVO device was compared with a ground truth was determined by two ground truthers, with a third ground truther intervening in cases of disagreement. All truthers were US board-certified neuroradiologists."
K241561	MammoScreen BD	Therapixel	"The primary objective was to evaluate the accuracy of MammoScreen BD in assessing the breast density value in terms of agreement between MammoScreen BD and the ground truth (GT) established by consensus among the visual assessment of 5 breast radiologists."
K241593	BoneMetrics (US)	Gleamer SAS	"Any cases with discrepancies exceeding the predetermined threshold were subjected to an adjudication process, where the three experts mutually agreed on a value for the ground truth."
K241696	Ortho AI	Ortho AI LLC	"After the reviews from each blinded surgeon, a final senior-level surgeon adjudicator reviewed the modifications and added further modifications to the segmentations, if necessary."
K241719	NeuroICH	Neurocareai Inc.	"comparing the NeuroICH's output to the ground truth as established by three US board certified Neurologists."
K241725	Better Diagnostics Caries Assist (BDCA) Version 1.0	Better Diagnostics AI Corp	"Ground truth was determined through the consensus of two out of three experienced, licensed dentists, each with over 10 years of professional experience. These dentists examined and labeled dental surfaces, agreeing on the final labels for analysis when at least two dentists identified a surface as carious."
K241727	BriefCase-Triage	Aidoc Medical, Ltd.	"determined by three senior board-certified radiologists"
K241747	Saige-Dx	DeepHealth, Inc	"Briefly, each cancer exam and supporting medical reports were reviewed by two independent truthers, plus an additional adjudicator if needed."
K241923	EFAI Neurosuite CT Midline Shift Assessment System (MLS-CT-100)	Ever Fortune.AI, Co., Ltd.	"The presence of MLS in each case was determined independently by three U.S. board-certified radiologists, and the reference standard (ground truth) was generated by the majority agreement between the three experts."
K242062	1CMR Pro	Mycardium AI Limited	"This was done by 3 independent US based truthers, all with >5 years experience."
K242120	OTOPLAN	Cascination AG	"The ground truth has been established by three qualified surgeons."
K242166	TribusConnect	TribusMed Beheer BV	"Validation of Heart segmentation is performed by 2 US board certified radiologists who qualitatively evaluated the performance."
K242171	TechCare Trauma	Milvue	"The ground-truth was established by American Board of Radiology (ABR)-certified radiologists with a minimum of 5 years of experience since ABR certification. Pediatric and adult cases followed two parallel ground-truthing (GT) pathways: the pediatric cases were annotated by a pediatric GT panel made of three ABR-certified pediatric radiologists and the adult cases by an adult GT panel made of three ABR-certified musculoskeletal (MSK) radiologists independently interpreted each case for the presence or absence of fracture and EJE using the standard clinical definitions of these pathologies. The third radiologist independently reviewed the cases where there was disagreement between the first two. The final reference standard was determined by majority consensus." "...with a third reviewing cases with initial disagreements. The final reference standard was determined by majority consensus."
K242188	ClearRead CT CAC	Riverain Technologies, Inc.	"In total, 491 cases were used in the clinical assessment of the device... as determined by a consensus ground truth review by three radiologists."
K242203	BriefCase-Quantification	Aidoc Medical, Ltd.	"Aidoc conducted a retrospective, blinded, multicenter study with the BriefCase-Quantification software to evaluate the software's performance...compared to the ground truth, as determined by three US board-certified radiologists."
K242292	uAI Easy Triage ICH	Shanghai United Imaging Intelligence Co., Ltd.	"Sensitivity and specificity of uAI Easy Triage ICH in processing of non-contrast head CT have been analyzed by comparison to a ground truth established by majority read of 3 U.S.-board-certified neuroradiologists."
K242342	Fetal EchoScan	BrightHeart	"The reference standard was derived from the dataset through a truthing process in which three pediatric cardiologists assessed the presence or absence of each of the eight findings, and majority voting was used."
K242411	Brainomix 360 e-Lung	Brainomix Limited	"The lung segmentation performance of the updated algorithm was validated through a head-to-head comparison between proposed and predicate devices. The study evaluated the accuracy of the e-Lung lung mask generation compared to a ground truth mask generated from the consensus of three experienced US board certified radiologists, who segmented the lungs following their usual standard of care."
K242437	Smile Dx®	Cube Click, Inc.	"Both devices were evaluated in a multi-reader, multi-case (MRMC) retrospective study with at least 13 US licensed dentists (Smile Dx® had 14 readers). Ground truth was established by the consensus labels of at least three US licensed dentists (the ground truth for Smile Dx®'s study was established by four US licensed dentists)."
K242461	IRISeg	Intuitive Surgical Inc.	"A consensus of three U.S. Board Certified Radiologists was used to resolve discrepancies / reader disagreements during the performance testing of the machine learning model."
K242522	Second Opinion CC	Pearl Inc.	"The ground truth (GT) was established using the consensus approach based on agreement among at least three out of four expert readers."
K242600	Second Opinion Periapical Radiolucency Contours	Pearl Inc.	"The ground truth (GT) was established using the consensus approach based on agreement among at least three out of four expert readers."
K242607	ScanDiags Ortho L-Spine MR-Q	ScanDiags AG	"Consent ground truth for anatomic structure segmentation determined by pixel-based majority opinion between the three radiologists. Consent ground truth for area and distance measurements determined by averaging the measurements of all three readers."
K242729	AutoContour (Model RADAC V4)	Radformation, Inc.	"Ground truthing of each test data set were generated manually using consensus (NRG/RTOG) guidelines as appropriate by three clinically experienced experts consisting of 2 radiation therapy physicists and 1 radiation dosimetrist."
K242745	AI-Rad Companion Organs RT	Siemens Healthcare GmbH	"Additionally, a quality assessment including review and correction of each annotation was done by a board-certified radiation oncologist using validated medical image annotation tools."
K242781	cvi42 Software Application	Circle Cardiovascular Imaging Inc.	"The performance of the constrained tissue tracking algorithm was also compared to manual tracking in ES phase by three expert readers."
K242807	HeartFocus (V.1.1.1)	Deski	"When necessary, disagreements were resolved either through direct reconciliation by the 2 experts or by a third expert. The ground truth (or gold standard) was defined from the consensus between the first expert annotator and the expert reviewer(s)."
K242821	EFAI Chestsuite XR Malpositioned ETT Assessment System (ETT-XR-100)	Ever Fortune.AI, Co., Ltd.	"The determination of malpositioned ETT in each case was independently assessed by three U.S. board-certified radiologists, with cases classified as positive for malpositioned ETT. Cases where the ETT was correctly positioned or with no ETT were classified as negative. The reference standard (ground truth) was based on the majority agreement among the three U.S. board-certified radiologists, resulting in 259 positive cases and 681 negative cases (Correctly Positioned ETT: 316, With No ETT: 365)."
K242837	BriefCase-Triage	Aidoc Medical, Ltd.	"compared to the ground truth as determined by three senior board-certified radiologists."
K242925	MR Contour DL	GE HealthCare	"all (3) independently validated ground-truth contours were incorporated in the performance evaluation."
K242994	OncoStudio (OS-01)	OncoSoft. Co., Ltd.	"Ground truth segmentations were established by three radiation oncologists following international clinical guidelines." "First, the 1 radiation oncologist manually delineated the organs. Second, segmentation results generated by 1 radiation oncologist are sequentially edited and confirmed by 2 radiation oncologists. In this editing process, the first radiation oncologist makes corrections, and the corrected results are received and finalized by another radiation oncologist."
K243145	syngo.CT LVO Detection	Siemens Medical Solutions USA, Inc.	"Ground truth was established by two US-board certified neuroradiologists independently assessing the cases. In case of disagrement, adjudication was performed by a third US-board certified neuroradiologists."
K243189	TumorSight Viz	SimBioSys, Inc.	"In cases where the two radiologists did not agree on whether the segmentation was appropriate, a third radiologist provided an additional opinion and established a ground truth by majority consensus."
K243230	Second Opinion® BLE	Pearl Inc.	"measurement differences that were clinically significant required an adjudication. These divergent measurements were then adjudicated by two U.S. Dental Radiologists."
K243234	Second Opinion® CS	Pearl Inc.	"ground truth determined by four experienced dentists achieving consensus (Jaccard index ≥0.4)."
K243239	Lung AI (LAI001)	Exo Inc	"Adjudication, in case of disagreement, was provided by a third expert."
K243294	Brainomix 360 e-ASPECTS	Brainomix Limited	"Ground truth was determined by the consensus of three board-certified US neuroradiologists."
K243341	Genius AI Detection 2.0	Hologic, Inc.	"The ground truthing to evaluate performance metrics including the locations of cancer lesions was done by two MQSA-certified radiologists with over 20 years of experience."
K243350	Rapid Neuro3D	iSchemaView, Inc.	"The primary endpoint, clinical accuracy, as determined by the consensus of up to three clinical experts against the source DICOM images, passed for all RN3D outputs with 99.8% agreement for MIP images, 98.6% agreement for VR images, 100.0% agreement for SSE images, and 100.0% agreement for CPR."
K243363	JLK-ICH	JLK, Inc.	"Each case output from the JLK-ICH device was compared with a ground truth standard determined by two ground truthers, with a third ground truther intervening in cases of disagreement (i.e., 2+1 truther scheme). All truthers were US board-certified neuroradiologists."
K243378	Rapid MLS	iSchemaview Inc.	"Final performance validation included 153 NCCT cases with ground truth established by 3 experts."
K243548	BriefCase-Triage	Aidoc Medical, Ltd.	"The study compared the software's performance to the ground truth, as determined by three senior board-certified radiologists."
K243611	JLK-SDH	JLK, Inc.	"Each case output from the JLK-SDH device was compared with a ground truth standard determined by two ground truthers, with a third truther intervening in cases of disagreement (i.e., 2+1 truther scheme). All truthers were US board-certified neuroradiologists."
K243647	Synapse PACS (7.5)	FUJIFILM Healthcare Americas Corporation	"An initial bone mask was created by certified technologist, then subjected to an independent dual-reader consensus review: two U.S. board-certified radiologists independently evaluated the mask, recorded any discrepancies, and iteratively reconciled them until consensus was achieved. The resulting consensus mask serves as the definitive ground truth for performance testing."
K243685	MammoScreen BD	Therapixel	"The reference standard for breast density value was established by majority rule among the assessment of 5 breast radiologists with at least 10 years of experience in breast imaging interpretation."
K243743	autoSCORE (V 2.0.0)	Holberg EEG AS	"A consensus of three HEs was used as the reference standard for all calculations. Each segment was prepared in two forms: 1. Without any markers placed by autoSCORE v 2.0 for recording level validation 2. With autoSCORE v2.0 markers and their assigned type of abnormality for marker level validation." "A marker was classified as a True Positive (TP) if at least two HEs agreed that it correctly the abnormality type. Conversely, if fewer than two HEs agreed, the marker was considered a False Positive (FP)."
K243769	QFR (3.0)	QFR Solutions bv	"For all of these algorithmic improvements the user is able to review and correct the results before the QFR value is calculated."
K243810	TraumaCad Neo (1.1)	Brainlab Ltd.	"Accuracy of implant presence and 2D landmark detection have been tested against ground-truth annotations done by qualified and trained personnel."
K243851	CHLOE BLAST	Fairtility Ltd.	"The TLI videos were annotated at a frame level with the ground truth of one of the morphokinetic stages and at a video level with blastulation results. Number of pronuclei (PNs) and embryo quality (according to SART) were also annotated to allow subgroup analysis."
K243859	PRAEVAorta®2	Nurea	"The manual measurements performed by these healthcare professionals are referred to as the "ground truth." The measurements performed by these professionals showed no discrepancy greater than 5 mm at the end of the collected data process."
K243863	Opulus™ Lymphoma Precision	Roche Molecular Systems, Inc.	"Reference standard (ground truth) was established using three radiologists/nuclear medicine physicians with expertise in interpreting PET/CT scans from patients with FDG-avid lymphoma. The ground truth for each scan was based on the independent input from three radiologists randomly selected from a pool of nine radiologists."
K250035	Contour ProtégéAI+	MIM Software Inc.	"The initial segmentations were then reviewed and corrected by radiation oncologists against the same standards and guidelines. Qualified staff at MIM Software (MD or licensed dosimetrists) then performed a final review and correction."
K250221	StrokeSENS ASPECTS Software Application	Circle Cardiovascular Imaging Inc.	"The primary standalone performance assessment was a region-level Clustered ROC Analysis to demonstrate the standalone performance of the ASPECTS device with respect to the expert consensus reference standard."
K250239	NeuroMatch	LVIS Corporation	"Specifically, a clinical study was designed to evaluate the concordance of the SL algorithms and the resected brain areas, following the 510(k) summary of the FDA-cleared device PreOp (K172858). In this study, three US board-certified epileptologists were recruited to independently complete a survey. The physicians were presented with the source localization results of each device, along with normalized post-operative MRIs with distinctive resection regions. They were instructed to first determine the resection region at the sublobar level. They then assessed whether SL output of each device (NeuroMatch: sLORETA on idealized brain model, CURRY: LORETA on idealized brain model, PreOp: sLORETA on individualized brain model) had any overlap with the determined resection region at a sublobar level. For a particular patient, for every device, the physicians responded to a Yes/No question that asked whether there is concordance for the corresponding device."
K250248	BriefCase-Triage	Aidoc Medical, Ltd.	"ground truth [was] determined by three senior board-certified radiologists."
K250686	GyriCalc (Version 1.0.0)	NeuroSpectrum Insights Corp.	"For each brain MRI, the expert used an annotation platform to view the image series and a pre-loaded initialization of 16 subregions of the brain. The expert then reviewed the initial segmentation and edited the segmentations as necessary for accuracy. The segmentations of the 3 experts where then combined to produce a single segmentation using the STAPLE method. Reference measurements (i.e., volume, surface area, and local gyrification index) were derived from the combined segmentation."
K250831	Annalise Enterprise	Annalise-AI	"To determine the ground truth, each deidentified case was annotated in a blinded fashion by at least two ABR-certified and protocol-trained radiologists who interpret chest X-ray as part of regular clinical practice (ground truthers), with consensus determined by two ground truthers and a third ground truther in the event of disagreement."
K251071	Fetal EchoScan (v1.1)	BrightHeart	"The reference standard was derived from the dataset through a truthing process in which three pediatric cardiologists assessed the presence or absence of each of the eight findings, and majority voting was used."
K251151	Rapid CTA 360	iSchemaView	"Final performance validation included 403 CTA cases with ground truth established by 3 experts (2:3 concurrence), all cases are independent of the development data."
K251342	EchoPAC Software Only / EchoPAC Plug-in	GE Medical Systems Ultrasound and Primary Care Diagnostics	"A review panel of five clinical experts provided feedback on the annotations which were corrected (as needed) until a consensus agreement was achieved between the annotators and reviewers."
K251456	BrightHeart View Classifier	BrightHeart	"The reference standard was derived from the dataset through a truthing process in which a sonographer and an MFM specialist with experience in fetal echocardiography determined the presence or absence of standard views on fetal ultrasound images."
K251528	syngo.via MI Workflows; Scenium; syngo MBF	Siemens Medical Solutions USA, Inc.	"In the first analysis conducted, the reference standard used to evaluate the subject device method performance consisted of liver VOI positioning obtained semi-automatically by two expert readers. The subject device algorithm was then compared to the reference standard and shown to yield results in better agreement with semi-automatic evaluation by expert readers compared with the method of placement used in the predicate device."
K251590	Methinks CTA Stroke	Methinks Software S.L.	"Ground truthing was established by two US board certified neuroradiologists that read the cases and a third ground truther in case the two first readers were in disagreement regarding LVO findings. The final ground truth was established based on the majority vote."
K251766	TumorSight Viz	SimBioSys, Inc.	"In cases where the two radiologists did not agree on whether the segmentation was appropriate, a third radiologist provided an additional opinion and established a ground truth by majority consensus."
K251837	Salix Coronary Plaque (V1.0.0)	Artrya Limited	"Discrepancies between the expert readers was resolved by a third independent adjudicator with Level III qualifications or equivalent experience."
K251983	Brainomix 360 Triage Stroke	Brainomix Limited	"Truthing was conducted by consensus of three experienced US board certified neuroradiologists."
K252362	GBrain MRI	Galileo CDS, Inc	"The GBrain MRI segmentation performance was evaluated by comparing the software-derived segmentations to a Simultaneous Truth and Performance Level Estimation (STAPLE) algorithm generated consensus of three expert-labeled segmentations of hyperintensities for volume measurement accuracy, and segmentation overlap agreement. Comparisons to expert segmentations were quantified using OLS Regression, and Dice similarity coefficient (extent of software-derived vs. ground truth overlap). The three expert labeled segmentations were performed by three independent US board certified, experienced neuroradiologists."
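Most case-level rows in the table above reduce to one of two vote-counting rules: a fixed k-of-n majority (2-of-3 for Nano-X HealthCCSng and EFAI MLS-CT-100; at least 3-of-4 for Pearl Second Opinion CC) or the 2+1 scheme, in which a third reader sees only discordant cases (Ibex Galen Second Read, JLK-ICH, JLK-SDH, syngo.CT LVO Detection). The Python sketch below encodes both for binary labels. The function names and the choice to return `None` for a split vote are illustrative assumptions, not taken from any cited submission.

```python
from typing import Callable, Optional, Sequence, Tuple


def k_of_n(votes: Sequence[bool], k: int) -> Optional[bool]:
    """Return the label at least k readers agree on, or None if neither side reaches k.

    None marks the case as indeterminate (e.g., a 2-2 split with k=3 of 4 readers),
    to be handled by the protocol's pre-specified escalation path.
    """
    if not len(votes) / 2 < k <= len(votes):
        raise ValueError("k must be a strict majority of the votes")
    positives = sum(votes)
    negatives = len(votes) - positives
    if positives >= k:
        return True
    if negatives >= k:
        return False
    return None


def two_plus_one(read_a: bool, read_b: bool,
                 adjudicate: Callable[[], bool]) -> Tuple[bool, bool]:
    """2+1 scheme: request a third read only when the first two readers disagree.

    Returns (ground_truth, was_adjudicated).
    """
    if read_a == read_b:
        return read_a, False
    # With a 1-1 split on a binary label, the 2-of-3 majority equals the third read.
    label = k_of_n([read_a, read_b, adjudicate()], k=2)
    return bool(label), True


# Hypothetical reads (True = finding present).
print(two_plus_one(True, True, adjudicate=lambda: False))   # (True, False)
print(two_plus_one(True, False, adjudicate=lambda: False))  # (False, True)
print(k_of_n([True, True, False, False], k=3))              # None
```

Passing the third read as a callback keeps the adjudicator's workload limited to discordant cases, which is where the 2+1 scheme saves reader time relative to three full reads on every case.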
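RemedyLogic AI MRI Lumbar Spine Reader (K241108) and ScanDiags Ortho L-Spine MR-Q (K242607) describe per-pixel majority masks: a pixel belongs to a structure if at least 3 of 5 (or 2 of 3) readers labeled it. A minimal pure-Python sketch, assuming each reader's output for one structure is a same-shaped 2D binary mask; `per_pixel_majority` is a hypothetical helper, and a production pipeline would typically operate on image arrays rather than nested lists.

```python
from typing import List

Mask = List[List[int]]  # 2D binary mask for one structure: 1 = labeled by the reader


def per_pixel_majority(masks: List[Mask], min_votes: int) -> Mask:
    """Include a pixel in the reference mask if at least `min_votes` readers labeled it."""
    if not masks:
        raise ValueError("need at least one reader mask")
    rows, cols = len(masks[0]), len(masks[0][0])
    if any(len(m) != rows or any(len(row) != cols for row in m) for m in masks):
        raise ValueError("all reader masks must have the same shape")
    return [
        [1 if sum(m[r][c] for m in masks) >= min_votes else 0 for c in range(cols)]
        for r in range(rows)
    ]


# Five hypothetical 2x2 reader masks, combined with a 3-of-5 rule.
readers = [
    [[1, 1], [0, 0]],
    [[1, 0], [0, 0]],
    [[1, 1], [1, 0]],
    [[0, 1], [0, 0]],
    [[1, 0], [1, 1]],
]
print(per_pixel_majority(readers, min_votes=3))  # [[1, 1], [0, 0]]
```

For multiple anatomical structures, the rule is applied once per structure, as in the RemedyLogic description. Because each structure is voted independently, a protocol should state how pixels claimed by two structures (or by none) are handled.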
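GyriCalc (K250686) and GBrain MRI (K252362) fuse three expert masks with STAPLE, which jointly estimates a consensus mask and each rater's sensitivity and specificity by expectation-maximization. The sketch below is a simplified binary STAPLE under stated assumptions: masks are flattened to 0/1 lists, the foreground prior is fixed at the mean of all rater labels, rater parameters start at 0.9, and the consensus keeps voxels whose posterior is at least 0.5. `staple_binary` is a hypothetical name for illustration only; a submission would normally rely on a validated implementation such as ITK's STAPLE filter.

```python
from typing import List, Sequence, Tuple


def staple_binary(
    decisions: Sequence[Sequence[int]],
    init: float = 0.9,
    max_iter: int = 100,
    tol: float = 1e-8,
    eps: float = 1e-6,
) -> Tuple[List[int], List[float], List[float]]:
    """Binary STAPLE via expectation-maximization.

    decisions[j][i] is rater j's 0/1 label for voxel i (masks flattened).
    Returns (consensus_mask, sensitivities, specificities).
    """
    n_raters, n_vox = len(decisions), len(decisions[0])
    # Fixed foreground prior: mean of all rater decisions (an assumption of this sketch).
    prior = sum(map(sum, decisions)) / (n_raters * n_vox)
    p = [init] * n_raters  # per-rater sensitivity estimates
    q = [init] * n_raters  # per-rater specificity estimates
    w = [0.0] * n_vox      # posterior probability that each voxel is truly foreground
    for _ in range(max_iter):
        # E-step: combine all raters' labels, weighted by current p and q.
        for i in range(n_vox):
            a, b = prior, 1.0 - prior
            for j in range(n_raters):
                if decisions[j][i]:
                    a *= p[j]
                    b *= 1.0 - q[j]
                else:
                    a *= 1.0 - p[j]
                    b *= q[j]
            w[i] = a / (a + b)
        # M-step: re-estimate each rater's sensitivity and specificity.
        sum_w = sum(w)
        sum_not_w = n_vox - sum_w
        new_p, new_q = [], []
        for j in range(n_raters):
            tp = sum(wi for wi, d in zip(w, decisions[j]) if d)
            tn = sum(1.0 - wi for wi, d in zip(w, decisions[j]) if not d)
            # Clamp away from 0 and 1 so the E-step never divides by zero.
            new_p.append(min(max(tp / max(sum_w, eps), eps), 1.0 - eps))
            new_q.append(min(max(tn / max(sum_not_w, eps), eps), 1.0 - eps))
        delta = max(abs(x - y) for x, y in zip(new_p + new_q, p + q))
        p, q = new_p, new_q
        if delta < tol:
            break
    mask = [1 if wi >= 0.5 else 0 for wi in w]
    return mask, p, q


# Three hypothetical raters on six voxels: rater 2 under-segments, rater 3 over-segments.
raters = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
]
mask, sens, spec = staple_binary(raters)
print(mask)  # [1, 1, 1, 0, 0, 0]
```

Unlike a per-pixel vote, STAPLE weights raters by their estimated reliability, so a reader who systematically over- or under-segments carries less weight on contested voxels. Reporting the per-rater sensitivity and specificity alongside the consensus mask also leaves an auditable record of reader quality.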
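For continuous endpoints, the table shows three related patterns: averaging across readers (iCardio.ai EchoMeasure, K241430; ScanDiags, K242607), a predetermined discrepancy threshold that triggers adjudication (Gleamer BoneMetrics, K241593), and a maximum allowed inter-reader discrepancy (5 mm for PRAEVAorta®2, K243859). The sketch below combines them with mean, median, and "two most concordant" aggregation. The function names, the range-based spread check, and the 5.0 mm threshold in the example are illustrative assumptions; each protocol must pre-specify its own metric and threshold.

```python
from statistics import mean, median
from typing import Callable, Dict, Optional, Sequence, Tuple


def two_most_concordant(values: Sequence[float]) -> float:
    """Mean of the two closest readings (the closest pair is adjacent once sorted)."""
    s = sorted(values)
    i = min(range(len(s) - 1), key=lambda k: s[k + 1] - s[k])
    return (s[i] + s[i + 1]) / 2


AGGREGATORS: Dict[str, Callable[[Sequence[float]], float]] = {
    "mean": mean,
    "median": median,
    "two_most_concordant": two_most_concordant,
}


def aggregate_measurement(values: Sequence[float], max_spread: float,
                          method: str = "mean") -> Tuple[Optional[float], bool]:
    """Return (reference_value, needs_adjudication) for one case.

    If the readers' range exceeds `max_spread`, no value is produced and the case is
    flagged for the pre-specified adjudication step instead.
    """
    if len(values) < 2:
        raise ValueError("need at least two readings")
    if max(values) - min(values) > max_spread:
        return None, True
    return AGGREGATORS[method](values), False


# Hypothetical diameters (mm) from three readers, 5.0 mm allowed spread.
print(aggregate_measurement([41.0, 42.0, 43.5], 5.0, "median"))               # (42.0, False)
print(aggregate_measurement([41.0, 42.0, 43.5], 5.0, "two_most_concordant"))  # (41.5, False)
print(aggregate_measurement([40.0, 42.0, 46.0], 5.0))                         # (None, True)
```

The median and "two most concordant" options are less sensitive than the mean to a single outlying reader, at the cost of discarding some of the readers' information.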
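NeuroMatch (K241390) establishes seizure and spike references by a 2-of-3 majority, and autoSCORE (K243743) counts a marker as a true positive when at least two of three experts agree. The quoted text does not say how temporal overlap between readers' annotations is defined, so the sketch below assumes one option: a time span is a consensus event wherever at least k readers' intervals overlap, which is the time-series analogue of a per-pixel vote. An event-matching rule (for example, pairing reader events by any overlap) is an equally plausible alternative that would need to be pre-specified. Intervals are hypothetical (start, end) pairs in seconds, and the function names are illustrative.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # (start_s, end_s); touching intervals do not overlap


def _merge(intervals: Sequence[Interval]) -> List[Interval]:
    """Merge one reader's overlapping or touching intervals so they count once."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def consensus_intervals(readers: Sequence[Sequence[Interval]], k: int) -> List[Interval]:
    """Return time spans annotated by at least k readers simultaneously (sweep line)."""
    events: List[Tuple[float, int]] = []
    for intervals in readers:
        for start, end in _merge(intervals):
            events.append((start, +1))
            events.append((end, -1))
    # At equal timestamps, -1 sorts before +1, so ends are processed before starts.
    events.sort()
    out: List[Interval] = []
    count = 0
    seg_start = 0.0
    for t, delta in events:
        before = count
        count += delta
        if before < k <= count:
            # Re-join a consensus span that closed at this same timestamp.
            if out and out[-1][1] == t:
                seg_start = out.pop()[0]
            else:
                seg_start = t
        elif count < k <= before:
            out.append((seg_start, t))
    return out


# Hypothetical seizure annotations (seconds) from three readers; 2-of-3 rule.
reader_a = [(0.0, 10.0), (30.0, 40.0)]
reader_b = [(2.0, 12.0)]
reader_c = [(35.0, 50.0)]
print(consensus_intervals([reader_a, reader_b, reader_c], k=2))  # [(2.0, 10.0), (35.0, 40.0)]
```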
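FETOLY-HEART (K241380) routes an image to an adjudicator when the two annotators' boxes overlap less than a threshold or when they disagree on whether a criterion is present. The sketch below uses intersection-over-union (the Jaccard index of the two box areas) as the overlap measure; the quoted summary does not state the metric or threshold, so both, along with the routing labels and function names, are assumptions. The same gate shape applies to mask-level Jaccard thresholds such as the ≥0.4 reported for Pearl Second Opinion CS (K243234).

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union (Jaccard index) of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def route_pair(a: Optional[Box], b: Optional[Box], min_iou: float) -> str:
    """Route one criterion on one image; None means the annotator marked it absent."""
    if a is None and b is None:
        return "agree_absent"
    if a is None or b is None:
        return "adjudicate"  # annotators disagree on presence
    return "agree_present" if box_iou(a, b) >= min_iou else "adjudicate"


print(route_pair((0, 0, 10, 10), (1, 0, 11, 10), min_iou=0.5))  # agree_present
print(route_pair((0, 0, 10, 10), (5, 0, 15, 10), min_iou=0.5))  # adjudicate
print(route_pair((0, 0, 10, 10), None, min_iou=0.5))            # adjudicate
```

After routing, the quoted FETOLY-HEART rule settles presence by majority among the adjudicator and both annotators, and localization by the adjudicator either keeping one annotator's box or drawing a new one; that step is manual and not modeled here.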
