10x Coffee Talk: Review of FDA's Latest Draft AI Guidance

 February 14, 2025

Regulatory, Software

Participants 🔗

  1. J. David Giese – Partner
  2. Bimba Shrestha – Software Engineer
  3. Ethan Ulrich – Software Engineer, AI/ML
  4. JP Centeno – Software Engineer, AI/ML
  5. Jim Luker – Senior Regulatory Affairs Project Manager
  6. Joshua Tzucker – Senior Software Engineer
  7. Kris Huang – Senior Software Engineer
  8. Matt Hancock – Software Engineer
  9. Reece Stevens – Director of Engineering

5 Key Takeaways 🔗

  1. FDA’s AI Guidance Focuses on the Total Product Lifecycle (TPLC)
    • The FDA emphasizes a Total Product Lifecycle (TPLC) approach to AI in medical devices, addressing development, validation, risk assessment, and post-market monitoring.
  2. Cybersecurity is a Major Concern in AI-Enabled Devices
    • The discussion highlights AI-specific cyber threats such as data poisoning, model inversion, and adversarial attacks, which could compromise AI models used in medical devices.
  3. Regulatory Compliance Requires Robust Security Measures
    • The FDA guidance suggests penetration testing, fuzz testing, and encryption to ensure AI models are secure and resistant to cyber threats.
  4. Challenges of AI-Specific Cyber Threats in the Medical Industry
    • Participants debated whether real-world cases of AI cyberattacks have been observed in the medical device industry, with some arguing that concerns may be overstated or theoretical.
  5. Industry Needs to Provide Feedback on FDA’s Guidelines
    • The discussion concludes with a call for industry feedback on FDA’s draft guidance, suggesting that companies engage in the public comment process to shape future AI regulations.

Transcript 🔗

Introduction to FDA’s Draft Guidance on AI in Medical Devices 🔗

Matt Hancock: So the FDA released this draft guidance on medical devices containing AI functionality earlier this year, and I thought it would be good to spend some time going over it and looking at some of the sections in more detail. This diagram here is actually taken from a blog post. It shows the AI lifecycle, which is a lot of what the document is about — how these parts of the AI lifecycle relate to the overall product lifecycle. So I thought I would start by looking at what the FDA's recent activities in AI have been, to contextualize ourselves, then give an overview of the guidance itself, and then look at the cybersecurity section in more detail. And then I think that'll hopefully lead to some questions and discussions and opinions and things like that.

Overview of FDA’s AI Activities and Regulatory Timeline 🔗

Matt Hancock: Is this big enough for you all to see? I don't know if it matters all that much, but just in general.

Reece Stevens: I can see it.

Matt Hancock: Yeah. So I was looking at the FDA's page where they list a lot of what they've done related to AI. I've looked at the page before and delved into the different resources, but I put it together into more of a timeline. And what I noticed is, they've spent the last — I guess 2019 is six years ago now, so I could say the last few years, but it's been longer than that — the last 5 or 6 years looking at how you can change your device after it's been cleared, or how you propose how you'll do that in your submission. We know that now as the predetermined change control plan. So that was one of their first things back in 2019, then there was draft guidance on it later on, and then, just last December, the actual guidance. And in between there, there's been a smattering of more general thoughts on how they're thinking about how people should develop AI.

Key Recommendations in the FDA AI Guidance Document 🔗

Matt Hancock: So those are the guiding principle documents and things like that. So what does the recent guidance actually contain? It's a lot of recommendations for what submissions — 510(k) submissions, De Novos, or whatever — should contain if your device has AI functionality. They actually have this term, "artificial intelligence-enabled device software functions." So if your device has anything like that, this is a guidance document saying what your submission could contain — lots of recommendations — and they have some guiding principles they've been talking about a lot lately, which is a focus on the total product lifecycle. I think there was actually a reorg, too — maybe not FDA itself but higher up, I forget — restructuring the organization based on this way of looking at things, the total product lifecycle. So this is just a way to break it down visually in terms of the different aspects of a product's lifecycle.

Total Product Lifecycle Approach in AI-Enabled Medical Devices 🔗

Matt Hancock: So there's development, and then these are all related to AI: risk assessment, of course; describing and developing your model; how you manage the data related to training the model, things like that, in the early part of the product lifecycle; how you validate it; how you manage the data related to that; how you describe it — how you present to users what the model does and how it achieves what it does. This guidance document mentions model cards, which I think are used a lot in industry but are now making their way into a recommendation for how to present things about a model. And they also have some guidance on what the public submission summary should contain, what the device description should contain, and then some things on post-market monitoring.

Cybersecurity Considerations for AI-Enabled Medical Devices 🔗

Matt Hancock: So yeah, that goes from development all the way to post-market. And at the end they actually have the breakdown of what the specifics are. So in practical terms, how does this affect us? Let's say we're trying to build out the documentation for a new software device, and it has an AI function in it. We would look at these things here, go into section V of this guidance, see what the recommendations are, and then place them in the submission under the corresponding section. And each of these sections has quite a number of recommendations to itself. I didn't look at all of these in detail — I looked at the cybersecurity one the most. I did look at the performance monitoring one — sorry, the performance validation one. Actually, this is incorrect: this arrow should go here, to performance monitoring. There's a note in that section — I didn't see a note like it on any of the other sections, I don't know if anybody else did — saying that it's recommended that you have performance monitoring.

But it's only required if your device type has special controls for it. If people aren't aware: for example, if your product type is LLZ — which means you just move images around, do some image processing, interact with the PACS maybe — the special control for that product code under the regulations is that you have a DICOM conformance statement, or conform to DICOM somehow. So for this I wasn't really sure — I don't know if anybody knows — whether any product types have regulations that specify post-market monitoring. But I was kind of happy to see that, because it seems like it would be a difficult thing to monitor in a lot of cases, at least the cases I'm thinking about.

J. David Giese: It's interesting — I was thinking about this question the other day, in the context of a possible De Novo project, and I was actually looking for regulations that require post-market surveillance, but I couldn't find any. I didn't do a comprehensive search. So I think it'd be interesting to propose that as a feature: if we do a De Novo and we think monitoring would be good, it seems like the FDA would be happy.

Matt Hancock: Yeah. I think I remember seeing something — I've never been involved with a De Novo myself. Does that end up creating a new product code, or does it slot into an existing one? Does it create a new regulation to correspond with the product code — is that why you would propose new special controls? Or would you be saying we should add special controls to an existing regulation in that circumstance?

J. David Giese: Yeah, good question. So the way it works is, you're doing a De Novo mainly because, one, you don't think it's Class III — so you don't think it's a PMA — but there's not a predicate that you can rely on. So you actually are creating a new regulation, which may have one or more product codes. You've probably noticed that some regulations have only one product code, but some have multiple. Ultimately the regulation is really the part that matters the most, because that's where the special controls are listed. Anyway, you propose what you think the special controls should be to FDA; they ultimately get to decide.

Matt Hancock: Gotcha. Well, yeah. So there's quite a bit related to AI from these sections that I think would be beneficial to add to the templates we use for expediting our documentation process. So, getting into cybersecurity: this slide is just saying what they say, which is basically that if your device is AI-enabled and it meets the definition of a cyber device...

Potential Cyber Threats in AI Medical Devices 🔗

Matt Hancock: There are some extra recommendations here to help meet the standards that are required. And they give some examples, which I think might be interesting to talk about — examples of potential cyber threats. So, data poisoning: I didn't actually know what some of these terms meant — or I didn't know these things had terms for them, I should say. With data poisoning, I think they're implying somebody is able to manipulate data at training time to alter the behavior of the model. I can go through these one by one, and on the next slide I have some thoughts about how they could potentially be mitigated, and also how these things might or might not come up in the use cases we see. But for now I'll just go through them and say what they are so everybody's on the same page. So that's data poisoning. Model inversion / stealing: that's where, if your model is exposed on an API endpoint, you can constantly ping that endpoint with different data to collect your own training data, see how that model behaves, and then copy it. Model evasion: this is a term I hadn't heard before, and the description in this guidance document was essentially the same as data poisoning, but in the inference scenario rather than the training scenario — basically somebody manipulating the data that's being sent as input to get a prediction.

So if you have, say, a benign image and you're sending it to a cancer classifier, you don't want somebody to manipulate it such that it gets predicted as cancer. That would be bad, right? Data leakage: this is just about not having your training data leak out. A lot of these relate to having your model training in a context where it's pulling its training parameters or its data from network sources, which isn't the case a lot of the time for us. Overfitting: it's kind of hard to understand why that would be a cyber threat, but say your model training script lives in the cloud and pulls its training configuration from some database, and somebody tampers with that — "hey, run for a bajillion iterations" — and that makes it overfit. That could be a cyber threat in this context. Model bias: similar kind of thing. And then performance drift — I don't remember exactly how they put it. A lot of these weren't really spelled out, and I had to make some guesses as to what they were thinking the actual threat was. What I think they were suggesting here is that if somebody just hit your model inference endpoint with a bunch of unrelated data, and your training process was set up to retrain automatically, say, without any quality control on the data being sent to the model, that could be a maliciously induced performance drift.

So that's what I think they were getting at. So yeah, I don't know what people think about these. I had some thoughts here. Data poisoning: a lot of times we do quality control on the training data, and we're not pulling in training data on the fly from network requests, so there's not really tampering in transit or tampering at rest that we usually have to worry about — which isn't to say we wouldn't need to document that. And that's one of the recommendations: that we have controls in place, say access controls on the machine where training is happening, and a statement that it's not pulling data from background requests. Model stealing: in most applications I'm familiar with, the model endpoints themselves aren't user-facing, but it seems like rate limiting could be a mitigation for this, and encryption — if you have a web application or a cyber device, you're probably using encrypted communication anyhow. Overfitting: we don't really have a lot of use cases where we're pulling training job configuration from a database, but it's something to be aware of. I don't know — did anybody have any thoughts on this, or any other cyber threats related to ML training or ML inference?
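
To make the rate limiting Matt mentions concrete, here is a minimal sketch of a fixed-window limit in front of an inference endpoint. The window size, request cap, and function names are illustrative assumptions, not recommendations from the guidance.

```python
import time
from collections import defaultdict

WINDOW_SECONDS = 60
MAX_REQUESTS = 30                 # per client per window; tune to realistic clinical usage

_request_log = defaultdict(list)  # client_id -> timestamps of recent requests

def allow_inference(client_id):
    """Fixed-window rate limit guarding a model inference endpoint."""
    now = time.time()
    recent = [t for t in _request_log[client_id] if now - t < WINDOW_SECONDS]
    if len(recent) >= MAX_REQUESTS:
        _request_log[client_id] = recent
        return False              # reject: high-volume probing is what model stealing relies on
    recent.append(now)
    _request_log[client_id] = recent
    return True
```

The same throttling used against ordinary abuse also limits how quickly anyone can harvest input/output pairs from the model.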

Reece Stevens: Yeah, I had just one note, which is that I was really surprised to see the number of cybersecurity threats that happen at training time, given that some of the more malicious ones would seem to require threat actors who understood your model even better than you did, if they were able to maliciously manipulate predictions in that way. It's not that it's unrealistic, it just seems like not the highest-priority risk factor, so it's surprising how much time is spent talking about it. I don't know if maybe I'm missing something, or if anyone else thought that.

J. David Giese: Yeah, 100%. You know, I think it's interesting — I was reading the risk-benefit guidance, the one in the context of De Novos, and one of the things they say in there is they really want you to focus on real risks that have actually happened. And I'm thinking about that comment in the context of this, like, wow, I've never seen these things happen. It seems pretty wild to think, like, they're inserting training data just to have a backdoor. Like, that's pretty cool.

Challenges and Feasibility of AI-Specific Cyber Threats 🔗

Jim Luker: I mean, I was thinking the same thing. If there were more time, I was thinking of going out to look and see if there were any instances of anything like this actually being reported. It seems kind of odd.

J. David Giese: Yeah, it is. You can sit around and think of all sorts of stuff, and maybe if it's really high severity you think, okay, maybe we should think about this. But I don't know — it feels a little over the top.

Jim Luker: Yeah and sort of.

J. David Giese: I think the comment phase is still open. Maybe that's a point you could raise — whether we've actually seen any of these.

Jim Luker: Where did this come from?

Joshua Tzucker: I don't know about within the medical industry specifically, but data poisoning definitely has happened — there have been some recent incidents with that. And I saw an interesting article on Hackaday where someone did a study of the exact percentage of training data that's necessary to trigger a poisoning incident, and it was much lower than you would think — something like 0.001%. Because really what we're talking about is introducing a latent bias that's somewhat hidden and hard to discover, which I think also makes it really hard to control for.

Matt Hancock: Yeah.

Jim Luker: So it affects the weighting — the weights — because the data set is poisoned, or whatever. But how would it turn into an issue if the data was poisoned? Incorrect outputs? I mean, yeah, it's kind of...

Joshua Tzucker: Yeah. I mean.

Jim Luker: Nebulous.

Joshua Tzucker: Yeah. You could have something where it comes back with something wildly off, because the data was correlated in a certain way in the training data. I don't know about for medicine — say it's some diagnostic tool, or it's giving advice for diabetes, and it says "chop off their arm" or something. Theoretically something like that could be snuck in there.

Matt Hancock: I imagine it'd be easier to poison the training so the model just doesn't do its job, rather than training it to do something specific. Like, say there's a watermark X under all the image data that you put in there, and then when you go to run it in production, none of the data has that, and it just doesn't work. But still, I don't know.

Reece Stevens: The real-world examples I've seen of data poisoning are image recognition things — image recognition algorithms where people make a sticker, and if the sticker is anywhere in the image, it'll always say it's a toaster, no matter what's in the image. Stuff like that, where you're manipulating the output to a certain fixed result.

Matt Hancock: Yeah, that would be the example I think of — at runtime, where the model is already trained and it's supposed to recognize faces or stop signs or something like that, and because it has a black pixel in one spot, it says it's a squirrel or something.

Kris Huang: There was one example I've seen where they added a very small amount of very specially crafted noise to an image, so a human looking at the image couldn't tell it had been altered — there's no visible indication. I think the classic example is a picture of a cat: with specially crafted noise added to the image, suddenly the model is even more certain that it's something else and not a cat. That's sort of like Reece's example of slapping a sticker on a stop sign and suddenly it's a 55-mile-per-hour speed limit sign. But you could easily imagine that sort of thing also happening at training time, where somebody imperceptibly alters the training data, so even if the researchers went back and looked at all the images themselves, they wouldn't be able to tell that anything had happened. And then suddenly, you know, their model is just bonkers.
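
The specially crafted noise Kris describes is essentially an adversarial example. Here is a minimal sketch of the idea, assuming a PyTorch image classifier; the `model`, `image`, and `label` tensors are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, label, epsilon=0.01):
    """Add imperceptible, specially crafted noise that pushes a classifier toward a
    wrong answer (fast gradient sign method)."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Nudge every pixel a tiny step in the direction that most increases the loss.
    perturbed = image + epsilon * image.grad.sign()
    return perturbed.clamp(0.0, 1.0).detach()
```

With a small epsilon the change is invisible to a human reviewer, which is what makes the training-time variant Kris mentions hard to catch by manual inspection.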

Matt Hancock: Yeah, it's more difficult for me to imagine the training-time thing, because you need to know so much more about how the model is being trained and how what you're doing to the data will affect the model that's produced. But it seems like the runtime thing, where you're manipulating the inputs — they probably go hand in hand, and they're both pretty easily mitigated by rate limiting and encryption, to me.

Kris Huang: I agree. The government, I think, is also reacting to all these China fears — suddenly there's all this awareness that, oh, our telecommunications infrastructure is actually pretty deeply infiltrated. So they're warning people: don't use SMS, use end-to-end encryption, all these other things. Operationally, within our own networks, I'm pretty sure we don't do these things. A lot of this is very tinfoil-hat, I agree. Sorry, I think I interrupted somebody.

Reece Stevens: I was just going to say, I think the other controls here are at a higher level, more focused on the user needs and requirements. If the model is saying "this patient, you need to cut their arm off," at some point you might want other risk control measures in your software that look at the output of the model and sanity check it with the full context. It's a model in isolation versus a model in a medical device with more context as to what the prediction means and how it should be used. So there might be some broader things to take into account there, too. I think I might have cut JP off too — sorry. JP, did you have something you wanted to add?
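
A sketch of the kind of higher-level risk control Reece describes — checking the model's output against broader context before it reaches the user. The field names and thresholds are made up for illustration; a real device would derive them from its requirements and intended use.

```python
# Assumption for this sketch: the model was only validated on adults.
MODEL_VALIDATED_FOR_PEDIATRICS = False

def checked_prediction(risk_score, patient):
    """Sanity-check a model's risk score against clinical context before displaying it;
    returning None routes the case to manual review instead of showing the score."""
    if not 0.0 <= risk_score <= 1.0:
        return None   # malformed or out-of-range output: never display it
    if patient.get("age_years", 0) < 18 and not MODEL_VALIDATED_FOR_PEDIATRICS:
        return None   # outside the population the model was validated on
    return risk_score
```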

JP Centeno: Something pretty minor, I guess. Assuming you had a whole automated pipeline that would retrain automatically on some sort of schedule, maybe an automatic way of knowing your data is correct is to hash the whole thing: rehash it before training, make sure the hashes match, and then off it goes. Otherwise, you take a second look at it.
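
JP's hashing idea could be as simple as a recorded digest over the curated training set that gets re-checked before any automated retraining run. A minimal sketch, with hypothetical paths and function names:

```python
import hashlib
from pathlib import Path

def dataset_fingerprint(data_dir):
    """One SHA-256 digest over every file in the training set, in deterministic order."""
    digest = hashlib.sha256()
    for path in sorted(Path(data_dir).rglob("*")):
        if path.is_file():
            digest.update(str(path.relative_to(data_dir)).encode())
            digest.update(path.read_bytes())
    return digest.hexdigest()

def verify_before_retraining(data_dir, expected_digest):
    """Abort an automated retraining run if the data changed since it was last curated."""
    if dataset_fingerprint(data_dir) != expected_digest:
        raise RuntimeError("Training data changed since curation; review before retraining.")
```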

Matt Hancock: Yeah. And I wonder if a lot of the focus here on training-time things — a lot of these potential issues — is based on the assumption that you have this kind of automated retraining pipeline. They don't say that explicitly, but from what I can tell, that's where a lot of this could really come up. And I wonder if that's just because of the recent PCCP stuff, where they maybe anticipate seeing more automated retraining pipelines and want to get ahead of that. I don't know.

Reece Stevens: Do we know of any clients or products that use automated retraining in a medical device context?

J. David Giese: As far as I know — and this is a comment one of the FDA speakers made sometime last year, I forget where — there haven't been any devices cleared with automatic retraining. Obviously that could have changed in the last few months. And by automatic I'm assuming that means on the device, in real time — not automatic in the background.

Matt Hancock: So I mean, a lot of this stuff makes sense, potentially. But it does seem like it's getting kind of burdensome to say, for each of these potential threats you can conjure up, that we have mitigations for it. I wonder if there's a way to say that your training process is not part of a cyber device, or something like that, so that you don't have to be responsible for a whole new bulk of documented risk controls.

Joshua Tzucker: I'm not saying this is at all likely to be happening at the moment, but couldn't you make the argument that by the time we get training data, it's usually passed through multiple people's hands? Like we're getting it from a third-party vendor such as Gradient Health, or a client is providing it; usually it's sourced from a study and then de-identified along the way. Could you make the argument that there could be a supply chain attack, just like there is on software dependencies — that the images have been tampered with?

J. David Giese: That's a cool idea. Like one of these data vendor companies — if they get hacked, and, I'm just spitballing here, they have some script installed that messes with the pixels to make the model work a certain way. I don't know.

Matt Hancock: Yes, I guess that's another possible route for data poisoning. I was only thinking about it in the online retraining scenario.

Kris Huang: I think the FDA is trying to be a very forward-looking organization, and we think of medical AI as being of a certain scale, but I think they're looking far beyond that. We have OpenAI, and the scale of data they're dealing with is so huge that it isn't possible for humans to curate it. And for data poisoning, it would actually be really trivial to mess that up. For example, with images you don't even need to manipulate anything — you could just copy an image a million times and you've basically bombed your model training, because you have a million of the same case, or a million with very trivial changes. So I think they're really trying to get ahead of things — not just things you would code, but operational measures that organizations need to take, like how you secure your infrastructure. I like the supply chain idea. And then for performance drift and things like that, it reminds me of a potential PEBCAK situation where you might say, "hey, let's augment our training in this category because our telemetry shows users are really looking at this particular sort of case," and then you might be unintentionally biasing your models without even knowing it — without, for example, actually confirming that, say for cancer, the NCI-designated cancer centers are indeed seeing this happen, rather than just going by your own surveys. I don't know, I think they're trying to be extremely forward-looking here.

Matt Hancock: Yeah, I guess they have pretty high expectations, right, when you're doing a clinical or non-clinical performance study — in terms of saying, hey, we have some data, it's high quality, it has these inclusion and exclusion criteria, and so forth. And I guess one way to look at this would be: have that not only at study time, but also when you're doing training — the same levels of quality assurance — and then also monitoring.

Reece Stevens: Just thinking a little bit, apart from mitigations you can build around the model and its inference: for a data supply chain tampering event, what mitigations could you actually even do, especially if it's upstream of you? I'm not exactly sure what that would be, except to try to detect anomalous behavior — but almost by definition, this is behavior you aren't knowledgeable of and can't trigger until some very specific edge case occurs, by design. It's good to think about these things, but I'm also not sure what the mitigation actually could be, apart from a higher-level mitigation surrounding the model, its outputs, and the way they're interpreted.

JP Centeno: I think with medical images, you probably have to have a pretty good sense of what the data looks like — like the example Kris gave, where you can inject subtle pixel differences to make the model think it's something else. But maybe if we had some code that took the histograms of all those images, and you saw that all of a sudden there's a new peak or something, maybe you could spot it that way. Yeah, I'm not sure.
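
A rough sketch of the histogram check JP describes — flag images whose intensity distribution looks unlike the rest of the dataset so a human can take a second look. The bin count and threshold are arbitrary placeholders.

```python
import numpy as np

def intensity_histogram(img, bins=64):
    """Normalized pixel-intensity histogram for one image (img: a 2-D numpy array)."""
    scaled = (img - img.min()) / (np.ptp(img) + 1e-9)
    hist, _ = np.histogram(scaled, bins=bins, range=(0.0, 1.0))
    return hist / hist.sum()

def flag_unusual_images(images, z_threshold=4.0):
    """Return indices of images whose histograms deviate strongly from the dataset average."""
    hists = np.array([intensity_histogram(img) for img in images])
    mean, std = hists.mean(axis=0), hists.std(axis=0) + 1e-9
    z_scores = np.abs((hists - mean) / std).max(axis=1)
    return np.where(z_scores > z_threshold)[0]   # candidates for manual review
```

A check like this would only catch crude manipulations; carefully crafted noise of the kind discussed above is specifically designed to leave such statistics nearly unchanged.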

Ethan Ulrich: Yeah, adding to that — like the example I think Matt gave, where there's an X in the corner of an X-ray, and maybe that X was associated with patients with cancer. Then when the model is trained, it learns that, hey, if there's this X in this corner, it means cancer, and your model is biased by that. Maybe more sinister: there's a noise pattern in a section of the image. I can't imagine this happening, but maybe a data broker puts a little noise stamp on every single image diagnosed with cancer. Then you train on that and you'll never know — you'll find that your model predicts patients with cancer really well. One way to figure out if that's happening: there are ways to investigate the features of an image that are contributing to the prediction — I forget the term for it, but essentially it shows, based on the model weights, what is contributing to the model's prediction. And if something fishy is happening, like the model focusing on the corner of an image, that's evidence your model is biased. So there are ways to do that. I don't think it's done enough; it probably should be done a little bit more. And again, this is all assuming you have doubt about the supply chain. So, I guess.
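
The technique Ethan is reaching for is usually called feature attribution, or a saliency map. A minimal gradient-based version, assuming a PyTorch classifier and a batched image tensor (both hypothetical here):

```python
import torch

def saliency_map(model, image):
    """Gradient of the top predicted score with respect to the input pixels:
    a rough picture of which pixels drive the prediction."""
    image = image.clone().detach().requires_grad_(True)
    top_score = model(image).max()
    top_score.backward()
    # Consistently large values in, say, one corner across positive cases would be suspicious.
    return image.grad.abs().squeeze()
```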

Reece Stevens: Maybe in addition to that, redundancy — like, don't get both your training and validation data sets from Gradient Health. Having diversity in data suppliers seems like another way to hedge against that event. And maybe there are even cases where it wouldn't happen maliciously — like Gradient's data processing systems introducing certain attributes. I can imagine scenarios where systemic bias in your data sets could happen at the provider level if the conditions are right or certain bugs happened, even absent any malicious intent.

Joshua Tzucker: Yeah — and thinking about it, I think unintentional is actually more likely. There was a non-medical incident with one of the AI image models overfitting on images from Shutterstock or Getty: it was reproducing the Getty Images logo in its outputs, because obviously that's in some of the training data. I could see Gradient Health or somebody trying to put a digital watermark on things and that unintentionally getting learned.

Matt Hancock: Yeah. Still, with all that, I'm wondering what the practical, security-related controls are that you would use for some of this stuff, because a lot of it is prospective data analysis for the possibility of tampering. I guess I'm not really saying anything beyond: when you get the data, have it delivered over encrypted channels and get a checksum, like has been suggested. But I'm also wondering, from a security controls point of view, what can you really do besides trust the sender and demand encryption?

Bimba Shrestha: Yeah, I mean, this isn't strictly a technical mitigation, but I know for OTS software and cloud service providers we have vendor validation, where there's a checklist of things you look at for every single vendor you interact with. I could imagine the list of things you check getting longer to mitigate against a supply chain issue like that.

FDA’s Expectations for Cybersecurity Testing & Penetration Testing 🔗

Bimba Shrestha: But yeah, I feel like those are things we'd be doing anyway — they would just get captured by our regular process. I don't know if this guidance necessarily changes what people would do for that.

Matt Hancock: Yeah, okay. Well, I guess this is like a lot of other security risk assessments you do: there's a long list of potential things, but at the end you look at it and ask, is this likely — well, you're not supposed to use likelihood in security — is it exploitable enough to warrant being included in the assessment, and then is the severity high enough, in combination with the exploitability, to warrant a control? Because for a lot of these things the answer might just be, well, we trust our supplier, and they hand it to me on a thumb drive, so I don't have to worry about data poisoning or something like that.

Anyhow, I'm going to move on. So there were some recommendations in this guidance in particular — like security use case views: they're recommending having specific ones for AI, though potentially some of those could be the same security use case views as for the non-AI functionality, just related to network encryption or encryption at rest. For the security risk assessments and reports that we make, it would probably be helpful to have some questions to prompt these kinds of thoughts about whether the things we talked about are likely and warrant controls, if they're not there already. They also said they're pretty concerned about data vulnerability and leakage — I don't know if that means we should have default security mitigations for AI devices.

So those were coming from what they specifically suggest in the cybersecurity section of the AI document. We only have ten minutes left, and this could be a longer discussion, but they also said they would like an explanation regarding how the cybersecurity testing you do for your device is appropriate to address the risks associated with the model, including fuzz testing and penetration testing. We often already do penetration testing, for example, for cyber devices, and a lot of the companies offering those services additionally, I think, provide some testing related to ML models specifically. I'm just curious what people think about this and what their interpretation is, because I would hate to see an extra — I don't know, thousands of dollars, I'm just making up a number — tacked on to every single 510(k) for the additional cost of ML-specific testing as part of the usual penetration testing.
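
For what it's worth, one reading of "fuzz testing" here is simply fuzzing the input-handling code that sits in front of the model (parsers, loaders) rather than the model itself. A toy sketch of that, with a hypothetical load function:

```python
import io
import os
import random

def fuzz_input_loader(load_fn, iterations=1000):
    """Throw random, truncated byte blobs at whatever parses inbound data before it
    reaches the model, and confirm it raises a clean error rather than crashing."""
    random.seed(0)
    for _ in range(iterations):
        blob = os.urandom(random.randint(0, 4096))
        try:
            load_fn(io.BytesIO(blob))
        except Exception:
            pass   # a caught, clean failure is the acceptable outcome
```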

Bimba Shrestha: I'm guessing — I'm not sure, and I only read the cybersecurity section, so maybe it's discussed somewhere else — but I'm guessing when they say penetration testing here, they mostly mean it in the context of where the AI model is deployed, not fuzz testing or penetration testing the actual model — like trying to get the model to behave in unexpected ways in a vacuum through its inputs and outputs. I'm guessing that's not what they're talking about, because I feel like it wouldn't be that useful. Is that other people's read as well, or not?

J. David Giese: That would be my read as well, yeah. I'm trying to think of a real use case where the model is somehow more directly exposed. This is really contrived, but say you have an API that you can send results through, and the model is doing real-time training — you could use the fact that it's doing real-time training to mess with it. But yeah, I don't know.

Kris Huang: I'm thinking about if you include the large language models in this. So, for example, guardrails: socially we have guardrails so that ChatGPT or whatever won't tell you how to make a bomb, things like that. Medically, you could imagine similar things — for example, HIPAA-related things, like you should only have access to certain patients' data, but if you could phrase your question in a certain way and somehow get past that barrier, then you would have improper access. That might be an example. Or, if orders are only supposed to go through with a confirmed diagnosis or something like that, but you're able to phrase your question to fool it into executing, or at least entering, an order that doesn't make sense. It's a different definition of penetration compared to what we normally think of — it's basically jailbreaking. Again, I think they're trying to be really forward-looking about every potential thing that could happen and how to design guidance for when it eventually does, even if it's not real soon.

Bimba Shrestha: Yeah. You know, on one level it's hard to disagree with any of the new, quote, requirements where they say "at minimum, the following tests" — it's hard to disagree because, okay, the more security the better, and who would say no to that? But in practice I am a little worried when they say "at minimum." There are so many cases where I think fuzz testing and penetration testing of the AI models is just not that useful. I also don't see why this is in a document about AI in particular — their cybersecurity guidance around regular software development kind of already captures this type of risk. I don't know.

Reece Stevens: I definitely agree. I don't know why the fact that there's an AI component in the device would necessitate penetration testing and fuzz testing, whereas the same device without AI wouldn't. It's almost an implementation detail — I mean, it isn't quite an implementation detail — but it seems kind of odd to use that as a trigger for certain kinds of testing.

Matt Hancock: Oh yeah, I agree on the penetration testing.

Bimba Shrestha: Like it's the ... go ahead man.

Matt Hancock: Yeah, I was just saying, for penetration testing, for example: if there's an API endpoint and it just so happens to forward the image on to an ML model, and a user is not supposed to have access to that endpoint given their credentials, I don't see how that's unique to what the API endpoint does. And I'm not really sure what fuzz testing gets you, though.

Bimba Shrestha: I mean, if you replaced the AI model with an algorithm — just some complex for-loops and a mathematical algorithm — I feel like any AI-specific guidelines would have to cover cases that wouldn't apply to the algorithm version but would apply to AI specifically. Right? I can see there being additional cases, especially if the model can change, but it sounds like they aren't clearing a lot of those kinds of devices anyway. Most of our devices aren't like that.

Matt Hancock: Yeah. With penetration testing, I just really can't think of anything — even if there's something that exposes something about the model, that should be guarded by an admin flag or whatever. It's not really unique.

Closing Discussion on FDA Guidance and Potential Industry Feedback 🔗

Joshua Tzucker: Wouldn't they… going back to what Kris was saying about guardrails, I would think it'd be a different type of penetration test — the thing you're trying to bypass is a different domain.

Kris Huang: I agree. If we're thinking about the usual cybersecurity, it's good to include it in the AI-related cybersecurity, but you would just refer to your regular cybersecurity documents for those particular sections. And I think they're giving you an out here, because it says "an explanation regarding how it's appropriate." So if it's appropriate not to do it because it doesn't make sense, then I think it's perfectly viable to say so and just not do it.

Jim Luker: They mentioned adversarial testing, specific to the model. I wonder if that would be a reasonable additional test.

J. David Giese: Wait, sorry. Can you expand on that a little bit? Yeah.

Jim Luker: In the guidance, it also talks about adversarial testing, which I'm not exactly sure about, but it looks like that could be pointed at challenging the model itself.

Kris Huang: Yeah, the adversarial thing is related to model evasion, where you manipulate the input image and get some ridiculous answer out. I mean, I did that for those kinds of models, and perhaps that is a reasonable thing to try.

Matt Hancock: Yeah, and I think this is where a lot of it is coupled: to do that, you either have to be able to test the model with tons of images to figure out what an adversarial input is, or you have to have some kind of access to details about the model — say it's an open model pulled off the shelf, where there are known adversarial inputs and you can somehow discover that fact through inappropriate access.

JP Centeno: On that — it looks like GE, Siemens, and others have this very sophisticated watermark where they only share how to decrypt it with approved vendors, and that's how you know your input data is good.

Kris Huang: In the journalism space, there was a similar thing for DSLRs to help verify that the images news organizations were getting were legitimate. I think Nikon had a special model just for that, where it would include in the JPEG tags some sort of fingerprint based on the pixels of the image and other things about the camera. I would think a medical device could do something similar — not actually mess with the image, but use the image to generate a fingerprint that you could verify, rather than altering the image itself, I wonder?
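
A sketch of the fingerprint idea Kris and JP are describing — a keyed hash over the raw pixel data, stored in a metadata tag and verified on receipt, rather than a watermark embedded in the pixels themselves. The key handling here is a placeholder assumption.

```python
import hashlib
import hmac

SHARED_KEY = b"<key exchanged with the approved vendor>"   # placeholder

def pixel_fingerprint(pixel_bytes):
    """Keyed fingerprint of the raw pixel data; could be stored in a private metadata tag."""
    return hmac.new(SHARED_KEY, pixel_bytes, hashlib.sha256).hexdigest()

def verify_image(pixel_bytes, tag_value):
    """Recompute on receipt and compare; any pixel-level tampering breaks the match."""
    return hmac.compare_digest(pixel_fingerprint(pixel_bytes), tag_value)
```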

Bimba Shrestha: Yeah, I'm curious what people think about whether our strategy when submitting 510(k)s would change at all with the addition of this new text. My understanding is that our current default strategy for most clients is to submit without penetration testing and then let the FDA request it. Go ahead, David.

J. David Giese: I would say there are some cases where you can make a case that it's not needed — say if it's just a model running in a Docker container. But I think it's really unlikely that, for a web app hosted in the cloud, you'd be able to not do pen testing — just based on the number of them I'm seeing, I think it's very unlikely they would go for that. Since I'm talking already, a couple of other comments. I definitely interpreted this not to mean that you would do pen testing in a case when you wouldn't otherwise; I interpreted it more to mean just make sure the scope of your pen testing covers any risks related to the model, and similar for fuzz testing. So not that it would necessarily trigger it. I also can't think of any real cases where it matters, with maybe the really far-out case of an AI agent that you would try to fool into doing something — but we're so far from that at this point, it seems a little out there. Though I guess that would be a case where part of your pen testing approach might be trying to fool the model, which is a little different from the usual pen testing. Anyway.

Bimba Shrestha: Yeah, yeah. The guardrail thing that Kris and Josh brought up — that makes sense to me, and I guess it would kind of expand the scope of what penetration testing means a little. I can think of examples outside of medical imaging — guardrails so that when you ask ChatGPT about things like suicide, there are guardrails around that. But I'm wondering about specific things like, I don't know, image segmentation models — what would the guardrails there even look like, I guess.

Matt Hancock: I think I'm kind of with David on that. I mean, there are risks that are specific to ML models — like, say you have an endpoint that gives you some information about the model and you can run away with that. But I can't think of risks that are specific to ML models that aren't also covered by regular penetration testing, and I'm happy to interpret it that way, rather than seeing it as a new kind of penetration testing we have to do in addition to the existing kind.

Reece Stevens: For some reason, thinking about AI-specific penetration testing makes me think of writing a Django authentication middleware that's an LLM acting like a troll under the bridge or something. You have to convince it that you do indeed need to access something. Okay, can you convince the LLM that it's worth it?

Jim Luker: To 42.

Matt Hancock: And there you go. That's the AI specific pen test. The Django troll.

Kris Huang: I just had one last thought that's not related to pen testing, but to all the other things — the supply chain security and operational things. We usually think of a problem as being very obvious, like the model calling our patient a panda bear or something. But I think the far bigger danger is a very subtle adulteration of the model that decreases its performance by a little bit. It would be akin to a drug manufacturing plant adding a little more filler into the pill, so it doesn't actually have 100% of the dose you intended — it might be 90% — which is a whole lot less perceptible. I think that's probably the bigger danger. And one last tie-in: it's sort of related to how viruses spread. The viruses that are extremely virulent don't get very far, because they kill their hosts way too quickly, before they have a chance to spread or do their damage. The ones that are pervasive are the ones that don't kill you — they just annoy you, they leave you alive, and you don't do anything about them.

Matt Hancock: Yeah. I think if you read this as just a heap of new documentation we have to do, that seems like a lot, but I don't think that's the intent. I think it's: consider all these things, there are all these new potential threats, and maybe document that you've considered them — but you don't necessarily have to go and add a bunch of controls for every potential risk you can cook up, even if it's not likely. So anyhow, thanks everybody for the discussion. Oh, and if anybody has strong opinions and strong thoughts, maybe we should get them together and submit comments, if we feel like there are changes or suggestions we want to make.

Bimba Shrestha: Awesome topic. That's great. All right. Yeah, I will catch everyone later on the next one.

Matt Hancock: Yeah. Bye

