Coffee Time: Is FDA Doing a Good Job with AI?

 November 01, 2024

AI/ML, Regulatory, Software

Participants 🔗

  • Yujan Shrestha - CEO, Partner
  • J. David Giese - President, Partner

Key Takeaways 🔗

  1. The discussion begins with a focus on defining what it means for the FDA to be successful in its regulatory role, highlighting the importance of clear targets and standards for evaluating its performance.
  2. In an ideal scenario, the regulatory process would be streamlined, requiring minimal effort from companies. Ideally, the process would be automated to confirm device compliance effortlessly.
  3. A key aspect of FDA decisions is the accuracy of risk-benefit assessments, ensuring that devices are approved only when benefits outweigh risks. There is an acknowledgment that some decisions may not always align perfectly with this ideal.
  4. The timing of regulatory approvals is critical. Overly strict regulations can delay necessary treatments, while rushing approvals may compromise patient safety.
  5. The FDA must strike a balance between approving devices quickly to benefit patients and ensuring thorough safety evaluations to prevent harmful products from reaching the market.

Transcript 🔗

Yujan Shrestha: Thanks, everyone, for joining. The topic of today's conversation is how we think the FDA is doing in terms of regulating AI medical devices. Are they doing a good job? Is that right, David? That was the topic we chose.

David Giese: Do we feel the FDA is doing a good job regulating AI in medical devices? What could they do better?

Yujan Shrestha: Cool. So, the format for this session is that for the first 15-20 minutes or so, we'll chat, give our thoughts on the topic, share any insights, and screen share anything we want to discuss.

And while we discuss this, it'd be great if the audience could start typing their questions into the LinkedIn channel. At the end of the conversation, we'll pick out the questions and try our best to answer them. So, without further ado, let's get started with a quick round of introductions. My name is Yujan Shrestha, CEO and partner at Innolitics. I have a background in biomedical engineering, an M.D., and about 13 years in the medical device space. Our team has 19 full-time employees, many of whom are software engineers with regulatory experience. We also have regulatory consultants and software experts who help our clients build software and achieve FDA clearance.

David Giese: Great. I'm David, also known as D.C. I studied biomedical engineering. Yujan and I actually met at UT Austin a long time ago. I later pursued a Ph.D. at Boston University but left to start Innolitics when I realized I preferred software over hardware. Since then, I've worked in software development for about ten years.

Before moving more in the FDA regulatory direction, I also spent some time on regulatory consulting, though these days I’m primarily involved in operations and sales for Innolitics. Excited to be here!

Exploring FDA’s Regulatory Goals and Balance 🔗

Yujan Shrestha: Cool, cool. Let's start out. I think the first step in any scholarly discussion is to define terms, right? It might not be the most interesting part, but it’s essential to ensure we’re all on the same page. So, David, when you ask how well the FDA is doing, how would we define success? How do we set a target for what it means to do a good job?

David Giese: In an ideal regulatory scenario, the process for companies submitting devices, or doing whatever is required, should be as effortless as possible. Ideally, it would require no extra effort—just install an app that reads your code and says you're cleared. But that would be the ideal, right? Next would be ensuring they make perfect decisions, where the risk-benefit analysis is accurate—clearing devices only when the benefits outweigh the risks, and declining when they don't. That's the ideal we're aiming for, and assessing how close they are to this. I'm trying to think of other parameters beyond just effort and the final risk-benefit assessment, which is always tied to the device's intended use. For example, the risk-benefit ratio could vary for different indications.

Yujan Shrestha: So, you're saying it should be low effort, with high accuracy in clearing devices based on the risk-benefit analysis.

David Giese: By accuracy, I mean that each decision consistently aligns with the risk-benefit analysis, like a diagnostic test’s sensitivity and specificity. Ideally, when benefits outweigh risks for patients, devices are always cleared, and when they don’t, they aren’t cleared. Right now, that’s not always the case. Some devices are cleared that maybe shouldn’t be, and some aren’t that should be.

Yujan Shrestha: This topic definitely goes deep. If you keep digging, you might hit uncomfortable questions about the value of human life. Another important factor is timing—not too fast, not too slow. If regulation is too strict, devices take too long to get cleared.

That speed aspect also plays a critical role, as both too fast and too slow can harm patients and, ultimately, society. It’s the agency’s job to ensure it maintains a balanced pace. Going too fast could mean unsafe devices make it to market, while too slow means patients may not get the treatment they need. For example, in clinical trials, if a significant benefit is observed early, the trial might be stopped because continuing could harm patients by delaying access to effective treatment. I think this is similar. The EU's Medical Device Regulation (MDR) could be an example of overcorrection. The process is so slow and cumbersome now that many manufacturers are choosing to launch in the U.S. instead, meaning EU patients miss out on timely access to treatments. Patients are clearly suffering due to this. I can’t think of a clear example of the opposite—where things have moved too fast—but perhaps there are cases within the EU's move from MDD to MDR that illustrate this.

David Giese: For new technologies, the FDA’s decision-making relies heavily on clinical evidence. It’s unpredictable, and that’s where the "least burdensome approach" could help—not aiming for a ‘not guilty’ standard, but something manageable and evidence-based.

The "least burdensome" approach, in theory, would mean zero effort or burden, but without gathering more evidence, perfect accuracy isn’t possible. Balancing speed with accuracy often comes down to deciding how much evidence is needed to make a decision. So, if we’re aiming for high accuracy, we need to account for a variety of factors, as gathering data is essential for these decisions.

Yujan Shrestha: For this discussion, maybe it makes sense to focus on recent FDA measures that attempt to speed up the process. The FDA could easily regulate everything heavily to prevent risk but has moved towards faster pathways to avoid that extreme.

A useful way to frame this conversation is to look at the risks the FDA is taking to accelerate the process—how it’s deviating from that natural state of over-regulation.

David Giese: For example, I think the FDA has done a good job publishing guidance quickly, despite this being new territory.

I'm especially excited about the upcoming AI lifecycle guidance in 2025, as it shows their willingness to provide direction even when standards are evolving. They've cleared around 1,000 devices, and we haven't seen major safety issues with them. So, it seems they're finding the right balance.

Yujan Shrestha: In recent moves, like the development of the PCCP (Predetermined Change Control Plan) and lifecycle guidance, it's clear they're taking regulated risks, leaning toward a less burdensome model. They seem to be relying on adverse event data to loosen controls on categories that have proven safe, allowing them to trust manufacturers more. For instance, the PCCP guidance enables manufacturers to manage their own devices, processes, and even post-market activities with more autonomy. This frees the FDA to focus on creating targeted guidance documents and dedicating more attention to high-risk devices.

So, it certainly seems that these are FDA initiatives moving in the right direction. I’d say there’s overregulation, underregulation, and then the “just right” level of regulation.

David Giese: This balance reminds me of Aristotle’s “golden mean”—finding the right balance between extremes. In this case, the challenge is finding that balance in regulation.

Yujan Shrestha: If I can use a physics analogy, it's like asking where a ball would roll without external forces. Over time, I think regulatory bodies, not just the FDA, naturally move from a less conservative stance to a more conservative one. It’s almost like a shift from high potential energy (underregulation) to low potential energy (overregulation). Adding new regulations can happen gradually and naturally, but it takes a concerted effort and some risk-taking to scale back.

David Giese: I think we see this in our own projects, where things usually go well, but when they don’t, we add new controls. Over time, these controls accumulate, and things can become unwieldy. Good intentions guide each control, but eventually, all those safeguards together can create excessive complexity. I think the federal government can experience the same challenge.

Returning to the FDA’s recent clearance of a thousand devices, there haven’t been significant adverse events. But could that be because these devices aren’t widely used yet? Out of a thousand clearances, only a subset might be actively used in clinical practice.

Yujan Shrestha: Additionally, adverse events may not be reported accurately, which is something to consider. It brings us back to the fundamental question of benefit-risk analysis. If we’re aiming to maximize benefits and minimize harm but struggle to measure those outcomes accurately, then maybe that’s the question we should focus on.

I believe the FDA is indeed focusing on that question. There's a strong emphasis on post-market surveillance of these algorithms to ensure that data drift doesn’t lead to adverse impacts. From my interactions with the agency, it appears there’s a concerted effort to detect adverse events and respond effectively. The FDA seems to recognize the importance of post-market monitoring to gauge real-world use in clinical settings and detect any underreporting of adverse events.

Let's switch over to some questions. Our first question is from Fathom (apologies if I mispronounce your name). They ask whether we're going to put the FDA's approach to regulating AI in perspective alongside EU and India regulations.

David Giese: At Innolitics, we primarily focus on the U.S. FDA, although we occasionally handle projects for CE marking. However, I don't have as much experience with EU regulations and therefore can't speak in detail about that.

Yujan Shrestha: Next, we have a question from Angela: "What is the best regulatory pathway if I already have a SaMD cleared by the FDA, specifically for digital surgical planning, and want to add additional AI tools?" Great question! This situation comes up frequently. Adding new AI indications could be pursued as a separate SaMD, especially if you already have a hardware device that you’re looking to augment with AI. A good example is the Apple Watch series, which has two separate SaMD AI indications, each cleared independently. This approach could allow for a more lightweight application while decoupling the AI functionality from your main SaMD, which may be less dynamic. By separating the AI SaMD, you gain flexibility for retraining or adding a PCCP specifically for the AI component, and discussions can remain focused on AI performance rather than the entire system’s infrastructure that’s already been cleared.

David Giese: A further consideration would be the integration of AI tools with the surgical planning, especially in terms of the user interface. If these tools are tightly coupled, it might be challenging to separate them. A clear, well-defined interface is essential if you're planning to modularize new AI tools, along with a strong configuration management system to control versions and monitor data flow between the AI and your existing software.

If the AI and surgical planning tools are tightly coupled at the user interface layer, it can indeed be challenging—or even impossible in some cases—to decouple them cleanly.

Yujan Shrestha: This setup can require re-architecting your existing, cleared product to accommodate a separate SaMD for AI. Essentially, you’d need to maintain a more modular setup, where the AI SaMD can operate somewhat independently, ideally using standardized data formats like DICOM or other common protocols to ensure seamless integration across devices. This modular approach, while potentially beneficial, could demand restructuring in software design, so definitely bring your software engineering team into these discussions early. Aligning your regulatory, business, and software architecture strategies will be crucial here.
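To make the modular picture a bit more concrete, here is a minimal sketch, assuming the cleared planning product and the AI component exchange data as standard DICOM series behind a narrow, versioned interface. The module, class, and function names are hypothetical; this is an illustration of the decoupling idea, not a prescribed architecture.

```python
# Hypothetical sketch of a decoupled AI SaMD interface. The cleared surgical
# planning product writes a standard DICOM series to a shared location; the
# AI SaMD consumes it through a narrow, typed interface so it can be cleared,
# retrained, and updated independently. All names are illustrative.
from dataclasses import dataclass
from pathlib import Path
from typing import List

from pydicom import dcmread
from pydicom.dataset import Dataset


@dataclass
class AiResult:
    """Output contract returned to the planning software (illustrative)."""
    series_uid: str
    model_version: str
    finding_summary: str


def load_series(series_dir: Path) -> List[Dataset]:
    """Read every DICOM instance in a series directory."""
    return [dcmread(p) for p in sorted(series_dir.glob("*.dcm"))]


def run_ai_samd(series_dir: Path, model_version: str) -> AiResult:
    """Entry point of the hypothetical AI SaMD module.

    Keeping this boundary narrow (DICOM in, a small typed result out) is what
    lets the AI component evolve under its own PCCP without touching the
    already-cleared planning software.
    """
    instances = load_series(series_dir)
    series_uid = instances[0].SeriesInstanceUID
    # ... model inference would happen here ...
    return AiResult(series_uid=series_uid,
                    model_version=model_version,
                    finding_summary="placeholder")
```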

To your next question, regarding FDA’s push toward ISO 13485, the FDA is indeed moving toward harmonizing its regulations with ISO 13485, moving away from the current Quality System Regulation (QSR).

David Giese: The FDA is rolling out what's called the Quality Management System Regulation (QMSR), which will require compliance with ISO 13485. This transition period is expected to span roughly a year to a year and a half, during which companies will be expected to shift their quality systems to this international standard.

Yujan Shrestha: For AI’s impact on risk analysis frameworks, such as ISO 14971, the principles of risk management remain fundamentally the same. While AI does introduce unique risks, particularly with emerging technologies that rely on foundational models, alarms, and real-time data analysis, the existing framework of 14971 still holds. You’ll need to account for AI-specific failure modes, like potential inaccuracies, misinterpretations, or complete malfunctions, but these fit within the broader structure of risk management. Essentially, the core process hasn’t changed—though you’ll be expanding your analysis to include new potential failure sequences specific to AI behaviors.

I think that really boils down to considering different modes of potential failure, especially with the advent of AI and its application to risk assessment.
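As a small illustration of how an AI-specific failure mode can slot into the familiar ISO 14971 structure, here is a minimal, hypothetical sketch of a single hazard-analysis entry expressed in code. The fields mirror the usual hazard / sequence of events / hazardous situation / harm breakdown; the specific content and risk controls are illustrative only, not a template from any standard.

```python
# Minimal, hypothetical sketch: one AI-specific row in an ISO 14971-style
# hazard analysis. The structure is the standard one; only the failure mode
# (performance degradation from data drift) is specific to AI.
from dataclasses import dataclass, field
from typing import List


@dataclass
class RiskEntry:
    hazard: str                 # potential source of harm
    sequence_of_events: str     # foreseeable chain leading to exposure
    hazardous_situation: str
    harm: str
    risk_controls: List[str] = field(default_factory=list)


drift_entry = RiskEntry(
    hazard="AI model output is inaccurate for the presented population",
    sequence_of_events=(
        "Scanner fleet or patient demographics shift from the training "
        "distribution; model performance silently degrades; clinician "
        "relies on the output"
    ),
    hazardous_situation="Clinician reviews an incorrect AI finding",
    harm="Delayed or incorrect diagnosis",
    risk_controls=[
        "Post-market performance monitoring with drift alerts",
        "Human-in-the-loop review required before clinical action",
        "Labeling that states validated populations and modalities",
    ],
)
```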

Integrating AI Risk Management with ISO Standards 🔗

David Giese: I'd also recommend looking into AAMI TIR 34971, a consensus report on integrating AI risk management with ISO 14971; it's worth checking out. I agree with everything you're saying, Yujan.

Yujan Shrestha: The last question concerns whether IEC 62304 has been adapted to AI for software lifecycle management. David, would you like to address this?

David Giese: Sure. I'd say no, as there are many unique aspects of AI lifecycle management not covered in IEC 62304. There are multiple standards in development for this, but none seem widely adopted yet. Yujan and I were both at the Admin Conference last week, and Troy Tazbaz, head of the Digital Health Center of Excellence, mentioned that the industry isn't gravitating toward any single AI standard. That's been our experience, too. The guidance document expected next year should reflect the FDA's stance on AI lifecycle management. In cybersecurity, IEC 81001-5-1 is the standard the industry rallies around, much as IEC 62304 is for the software lifecycle. Hopefully, a similar standard will emerge for AI, but it doesn't seem to be there yet.

Determining When to File for a New Submission 🔗

Yujan Shrestha: Regarding the next question: Under what conditions is a change considered minor enough that a new submission isn’t needed? For instance, retraining the AI model or updating the patch level for cybersecurity. For retraining the AI model, there’s guidance on whether to file a new submission, often based on your interpretation of what qualifies as a significant change. In my opinion, retraining the AI model might not necessitate a new submission, but there’s a PCCP option, with an example submitted by Overjet, specifically for retraining, that’s worth investigating. David, could you discuss the cybersecurity part?

David Giese: Before moving on, I’ll add that risk tolerance plays a role here, with larger companies likely being more cautious and using PCCPs more frequently. We recently hosted a 10x talk on our website covering this topic, with links to FDA guidance documents that may be helpful. Cybersecurity in this context can be nebulous, but we address some specific scenarios and provide a transcription, which may be useful to review.

For updates that patch vulnerabilities, the FDA has been clear that submissions are not needed. However, if you're making significant refactors in the codebase that alter its structure considerably, you might consider submitting, depending on your level of caution. Generally, though, cybersecurity-related changes don’t require a new submission, but specifics can vary.

Using Postmarket Surveillance Data for Validation 🔗

Yujan Shrestha: The FDA's guidance on this is intentionally open-ended because there are numerous edge cases that must be assessed individually. I'd recommend reading the examples at the end of the FDA's guidance document—they help clarify what the FDA considers a significant change. I often read guidance documents from the examples upward, as it sets a clear context for interpreting the rest. Moving to validation, AI’s characteristic of learning and improving outcomes presents unique challenges. Could EMS data or postmarket surveillance data (PMS) be used in validation?

David Giese: If the AI model isn’t adaptive—meaning it's trained, validated, and then deployed without updates—FDA typically allows the reuse of validation data, as long as the development team doesn’t rely on it excessively, which can risk overfitting.

Yujan Shrestha: For adaptive models, postmarket data should be monitored to ensure performance. Customer feedback can reveal issues, as poor performance may lead to subscription losses, underscoring the importance of continuous monitoring.

For continuous learning AI models that retrain in the field, a Predetermined Change Control Plan (PCCP) is highly recommended. Frequent updates in such models can make it impractical to rerun verification and validation (V&V) every time, potentially leading to overfitting on test data. Postmarket surveillance data can be an effective solution here, as it represents real-world usage, matching the intended audience and patient demographics more closely than pre-market data.

Using postmarket data also allows monitoring of the model’s actual performance in practice, which is beneficial for adaptive algorithms that continually adjust.
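To make the monitoring idea concrete, below is a minimal sketch of rolling post-market performance monitoring, assuming adjudicated ground truth eventually becomes available for a sample of real-world cases. The window size, threshold, and data source are illustrative assumptions; in practice these would come from your monitoring plan or PCCP.

```python
# Minimal sketch of post-market performance monitoring for a deployed model.
# Assumes adjudicated ground truth becomes available for a sample of
# real-world cases; thresholds and window sizes are illustrative only.
import random
from collections import deque
from statistics import mean


class PerformanceMonitor:
    def __init__(self, window_size: int = 200, alert_threshold: float = 0.85):
        self.scores = deque(maxlen=window_size)  # rolling window of per-case scores
        self.alert_threshold = alert_threshold   # minimum acceptable accuracy (illustrative)

    def record_case(self, model_correct: bool) -> None:
        self.scores.append(1.0 if model_correct else 0.0)

    def drift_detected(self) -> bool:
        """True if performance over a full window has dropped below threshold."""
        if len(self.scores) < self.scores.maxlen:
            return False  # not enough post-market data yet to judge
        return mean(self.scores) < self.alert_threshold


def stream_of_adjudicated_results():
    """Hypothetical feed of per-case correctness from post-market review."""
    return (random.random() < 0.9 for _ in range(1000))


monitor = PerformanceMonitor()
for correct in stream_of_adjudicated_results():
    monitor.record_case(correct)
    if monitor.drift_detected():
        print("Performance drift detected: trigger an investigation under your QMS")
        break
```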

David Giese: There's also a relevant FDA paper from a few years back on re-using test datasets for adaptive algorithms, which discusses overfitting risks when repeatedly testing on the same dataset. It's worth a read for anyone working in this area.

Yujan Shrestha: Regarding reusing clinical data for a different version in a new 510(k) submission, it’s technically possible, but caution is required to avoid overfitting. Running validation multiple times on the same dataset can inadvertently lead to overfitting, even with good training/test set separation. Ideally, testing is done only once per release. To prevent unintended bias, separating the engineering team from the test data and maintaining a log of dataset use is recommended, ensuring thorough documentation if the FDA inquires about testing frequency.

This aligns with recent trends in data validation practices; for example, in the machine learning competition platform Kaggle, a similar phenomenon occurs with public leaderboards. Top-ranking models on public test sets often perform poorly on private ones due to overfitting, which is a lesson in the risks of excessive tuning.
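That Kaggle effect is easy to reproduce. The short simulation below (purely synthetic data) evaluates many equally mediocre model candidates on one fixed test set, keeps the best-looking one, and shows that its apparent score overstates its performance on a fresh set, which is exactly the risk of validating repeatedly against the same dataset.

```python
# Illustrative simulation: repeatedly evaluating candidate models on one fixed
# test set and keeping the best-looking one overestimates real performance.
import random

random.seed(0)

N_CASES = 200         # size of the fixed "public" test set
N_CANDIDATES = 50     # number of model versions evaluated against it
TRUE_ACCURACY = 0.80  # every candidate has exactly the same underlying skill


def observed_accuracy(n_cases: int, true_acc: float) -> float:
    """Accuracy measured on one finite test set (sampling noise included)."""
    return sum(random.random() < true_acc for _ in range(n_cases)) / n_cases


# Score every candidate on the same fixed test set and keep the best.
public_scores = [observed_accuracy(N_CASES, TRUE_ACCURACY) for _ in range(N_CANDIDATES)]
best_public = max(public_scores)

# The "winning" model is no better than the rest, so on fresh data it reverts
# to its true accuracy.
private_score = observed_accuracy(N_CASES, TRUE_ACCURACY)

print(f"Best score on the reused test set: {best_public:.3f}")
print(f"Same model on a fresh test set:    {private_score:.3f}")
```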

David Giese: Finally, on the FDA’s stance towards continuous learning algorithms: as of now, there are ongoing discussions, and more guidance on AI lifecycle management is expected by 2025, likely covering best practices for adaptive algorithms in healthcare applications.

Yujan Shrestha: Yes, there are additional regulatory requirements for continuous learning algorithms, especially around postmarket surveillance, as continuous updates in AI models increase the need for oversight. A PCCP (Predetermined Change Control Plan) can help, enabling a structured approach for managing and documenting updates in a way that ensures safety and compliance without needing to file for each minor adjustment. The PCCP was partly designed to accommodate these continuously learning algorithms, as well as applications like antibiotic susceptibility testing.

However, continuous learning AI models have yet to see widespread release. Business challenges and regulatory hurdles, alongside limited market demand, likely contribute to the hesitation. Developing and deploying these algorithms is complex and may not bring enough commercial benefits to justify the effort. The added regulatory scrutiny, risk, and market challenges make traditional, static AI models more appealing to most sponsors, who may prefer a more established regulatory path over forging new ground.

As for PCCP timelines, they are somewhat flexible. The guidance leaves the cadence of model updates—whether daily, weekly, or another timeframe—up to the sponsor.

David Giese: However, while an open-ended approach to updates is permissible, the frequency should be manageable from both a technical and operational standpoint. A nightly update, for example, could be feasible for some applications, but excessive frequency (e.g., microsecond updates) may be unrealistic from both a regulatory and a business perspective.

Yujan Shrestha: On the question about reusing a Software as a Medical Device (SaMD) cleared by the FDA for surgical planning, it depends on how the original intended use is defined. If the new use aligns with the original intended use, it might be suitable for a "letter to file." However, if it represents a new intended use or indication, a traditional 510(k) or special 510(k) may be required, as these submissions offer a safer path for handling changes in intended use.

Lastly, for in vitro diagnostics (IVDs) in the U.S., the regulatory pathway typically includes premarket notification (510(k)), premarket approval (PMA), or De Novo classifications, depending on the risk class. This is often the domain of specialized regulatory teams, as the requirements for IVDs can be quite distinct from general software and medical device pathways.

Adoption Challenges for AI and ML SaMDs in Healthcare 🔗

David Giese: It sounds like there's significant interest in integrating AI and ML SaMDs into healthcare, but adoption has been slower than expected despite increasing FDA clearances. While there isn't hard data immediately available, the main challenge appears to be the gap between those purchasing the technology and the end users, often clinicians. This disconnect means that purchasing decisions may not always align with what clinicians find most valuable in their workflows. Moreover, reimbursement structures in the U.S. healthcare system don't always incentivize efficiency gains alone, which makes it challenging for AI technologies to demonstrate a clear return on investment (ROI) unless they can directly tie back to cost savings or improved patient outcomes.

Yujan Shrestha: On the cybersecurity front, medical devices and healthcare applications face typical cloud-based threats, such as malware, misconfigured deployments, and authentication vulnerabilities. Effective mitigation strategies include keeping dependencies updated, robust authentication, and proactive monitoring (e.g., using services like AWS GuardDuty). Ensuring data protection by securing log files and other sensitive assets is also essential.
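As one concrete example of the proactive-monitoring piece, a cloud deployment on AWS might periodically poll GuardDuty for high-severity findings. The sketch below uses boto3 and assumes GuardDuty is already enabled in the account; the severity threshold and what you do with a finding are illustrative choices, not regulatory requirements.

```python
# Minimal sketch: poll Amazon GuardDuty for high-severity findings as part of
# proactive security monitoring. Assumes GuardDuty is already enabled; the
# severity threshold and the alert handling below are illustrative choices.
import boto3

guardduty = boto3.client("guardduty")

for detector_id in guardduty.list_detectors()["DetectorIds"]:
    finding_ids = guardduty.list_findings(
        DetectorId=detector_id,
        FindingCriteria={"Criterion": {"severity": {"Gte": 7}}},  # high severity and up
    )["FindingIds"]

    if not finding_ids:
        continue

    findings = guardduty.get_findings(DetectorId=detector_id, FindingIds=finding_ids)
    for finding in findings["Findings"]:
        # In practice this would feed a ticketing system or on-call alert,
        # and the event would be handled under your security/vigilance SOPs.
        print(finding["Severity"], finding["Title"])
```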

The Nvidia and AI Doc collaboration you mentioned might signal a push toward addressing integration issues and fostering adoption by creating more resilient infrastructure. In addition, reimbursement challenges have also been cited as a barrier to adoption, as some AI applications struggle to show direct financial benefits that align with payer incentives, especially in a multi-payer system like the U.S.

David Giese: If you’d like to dive deeper into any specific topics—whether it's cybersecurity threats or value propositions in healthcare AI—I'm here to help!

It sounds like there was a good wrap-up on both cybersecurity and broader device safety. Firmware update mechanisms, as you mentioned, have become a key area of concern, especially since compromised updates can introduce vulnerabilities at a foundational level. This type of threat likely sparked recent FDA attention on cybersecurity protocols for medical devices. Implementing robust firmware signing, secure distribution channels, and verification processes are some key ways to mitigate these risks, though, as you said, it’s a large topic in itself.
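To illustrate the firmware-signing point, here is a minimal sketch of verifying a detached Ed25519 signature over a firmware image before accepting an update, using the Python cryptography library. The key provisioning, file paths, and failure handling are assumptions; a production update mechanism would also layer in secure boot, rollback protection, and version checks.

```python
# Minimal sketch: verify a detached Ed25519 signature on a firmware image
# before accepting the update. Assumes the manufacturer's public key was
# provisioned on the device at manufacture; paths are illustrative.
from pathlib import Path

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey


def firmware_is_authentic(image: Path, signature: Path, public_key_bytes: bytes) -> bool:
    """Return True only if the image was signed by the holder of the private key."""
    public_key = Ed25519PublicKey.from_public_bytes(public_key_bytes)
    try:
        public_key.verify(signature.read_bytes(), image.read_bytes())
        return True
    except InvalidSignature:
        return False


# Usage (illustrative): refuse to install anything that fails verification.
# if not firmware_is_authentic(Path("fw.bin"), Path("fw.sig"), provisioned_key):
#     abort_update("Firmware signature verification failed")
```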

Yujan Shrestha: Thanks, everyone, for the insightful discussion! If you have any specific follow-ups, particularly on cybersecurity strategies for medical devices or best practices for AI in healthcare, feel free to reach out.
