10x Coffee Talk: Generative AI and FDA Guidelines

 November 19, 2024

AI/ML, Regulatory, Software

Participants 🔗

  • Yujan Shrestha - CEO, Partner
  • J. David Giese - President, Partner

Key Takeaways 🔗

  1. GenAI refers to AI models that generate synthetic content by mimicking input data structures, with emphasis on the vast and often untraceable training datasets used in foundation models.
  2. Generative AI tools pose unique risks such as producing false but plausible outputs, necessitating transparency, error detection, and user training.
  3. FDA might prioritize post-market monitoring over exhaustive pre-market evaluations to manage the risks and benefits of rapidly evolving GenAI technologies.
  4. Innovations like virtual scribes demonstrate the potential of GenAI to unlock previously unviable applications, though they introduce risks tied to critical errors, such as in DNR or allergy information.
  5. Regulators and developers need to ensure the provenance of training data and address gaps in understanding large-scale datasets to maintain device efficacy and safety.

Transcript 🔗

Introduction and Webinar Overview 🔗

J. David Giese: Hey Yujan. Good morning.

Yujan Shrestha: Morning. How's it going?

J. David Giese: Hey, it's going well. I just got off a practice run for a webinar next week on AI. It's more focused on reimbursement, though.

Yujan Shrestha: Yeah, I'd be very curious to hear what you had to say about that.

J. David Giese: Yeah. Well, since it's on the reimbursement side specifically, I'm mostly just listening.

Introduction to GenAI and FDA Meeting 🔗

Yujan Shrestha: Yeah, it's really interesting. And definitely, if you want to make a big difference, the product has to pay for itself. A good product has to have product-market fit; you can't put that in as an afterthought. So I think that's super important. Perhaps not the most interesting part technically, but impact-wise it's like a prerequisite.

Oh, really? Cool. Well, I've got my coffee in hand. I don't know about you, or about everyone else watching, but we're good to go. We have the brain juice ready, so I'm going to kick things off. Today's topic is about GenAI.

We titled it "What is a GenAI-enabled device?" but we're going to go into more than just that. Can you see my screen?

J. David Giese: Yes I can.

Yujan Shrestha: Yep. Right. So David told me about this Digital Health Advisory Committee meeting that's going on next week, and I'm going to it in person. This is my first time going to one of these meetings, so I don't know exactly what to expect, but I believe the idea is that FDA is getting feedback from industry on how they should regulate this new technology, what would be burdensome, and what industry is thinking about it.

And the idea is that that feedback would inform FDA's future guidance. There's a page around this topic that is publicly available; anyone can go to this URL, and it talks about what this advisory committee is about and the specific items FDA wants to discuss.

Also, if you scroll all the way to the bottom, you can download a document that goes over what FDA thinks about it and the questions they want the public to answer or provide feedback on. As we'll see, FDA has already put a lot of thought into this, because they've come to us with their thoughts on GenAI. To be clear, this is not a guidance document by any means, but to my understanding it's the first time I've seen a concrete statement from FDA telling us what they think about GenAI.

J. David Giese: Yeah, I think that.

Yujan Shrestha: Will be.

J. David Giese: As far as I'm aware, I agree it's the only thing, and it's pretty good. Just for full disclosure, I've only read probably two thirds of it, so I didn't finish the last part, but I think it's a pretty great document. It has some useful terms in it.

Yujan Shrestha: Yeah, definitely. So I guess the point of this talk is to go over the questions FDA laid out for industry, which are what the panel discussion is about. I'm just going to go over some of these questions and we can talk through them.

FDA's Definition of GenAI 🔗

Yujan Shrestha: I'll go over my thoughts on some of these, and I'd be very curious to hear what David thinks about them, and also what you all think as well. So let me pull up this diagram I've got, alongside the panel questions about GenAI-enabled devices. But before we dive into that, what is a GenAI-enabled device?

Well, according to FDA, GenAI-enabled devices are medical devices that fall under the definition of a device and that use GenAI. That goes back to, well, what is GenAI? This is what they have to say: it's the class of AI models that emulate the structure and characteristics of input data in order to generate derived synthetic content.

I think reading how they use it in the paragraph clarifies it a bit better for me, where they say GenAI can create new derived synthetic content, and it's usually meant to create new data that resembles the data it learned from, instead of just identifying patterns. Also, and I think this is important, to me this is probably the clearest part of the definition of what GenAI is: importantly, GenAI models are frequently developed on data sets so large that human developers cannot know everything about the data used for development.

That's the clearest part to me, at least. Whenever I try to more clearly define what it means to be synthetic, that it "looks like the data it learned from," you could argue that the current generation of AI in general creates output that looks like its input; that's what machine learning is supposed to do, right? So I couldn't come up with a crisp definition to tease the two apart, and I kind of fell back on this second part.

Comparing GenAI and Traditional AI Devices 🔗

J. David Giese: Yujan, can I ask a question? Could you contrast this with a normal AI-enabled device, one that does not incorporate GenAI? How does that not meet this definition?

Yujan Shrestha: Yeah, okay. So take a non-GenAI-enabled device. Let's say it's a CT segmentation algorithm. We have the CT, and we also have ground truth, which in this case is, say, prostate segmentations. During training you use both of these.

And this applies to any AI, not just GenAI. The output of this would be prostate segmentations, and you could argue, doesn't that resemble what it was trained on? Go back to what they wrote: "that class of AI models that emulate the structure and characteristics of input data in order to generate synthetic content."

And that phrase, "mimics the structure and characteristics of the input data," I guess is what synthetic content means: something that mimics the structure and characteristics of the input data. Anyway, I thought about this distinction because I think I understand it when I see it.

It's producing something new from the input, right? But when you actually try to write a definition around it, it seems like the current generation of AI in general also fits. I'm not sure how to qualify this first part of the definition further; maybe it's that the output is probabilistic, or that it can generate something new and different every time. I don't know, but I think it needs a little more workshopping to really narrow that down.

However, the second part seems a lot clearer to me: if your AI algorithm uses a data set so large that developers can't know everything about it, which is what they call a foundation model, then that characterizes it as GenAI.

Foundation Models and Data Transparency 🔗

J. David Giese: So in the past, for a while, I'd always heard the distinction between supervised and unsupervised learning, and I feel like unsupervised learning isn't something I hear talked about a whole lot anymore. It seems to me like a lot of the AI products we work on in the radiology space are very clearly supervised.

Right? Here's the image, and here's the right answer, usually produced by a radiologist. Then you train the model to get the right answer, and you do that on data sets where you know the answer because the radiologist produced it. The hope is that when you give it a new data set that the radiologists haven't looked at, it gives you the answer.

So you need the supervised part. But obviously with ChatGPT in particular, you just have blobs of text and it's just predicting the next token. I hope I'm not getting off on a tangent here, but would you consider GenAI supervised or unsupervised?

Yujan Shrestha: I think it's still supervised training, because you do have labeled data; the labels essentially come for free from the corpus of text it was trained on, so it generates its own supervised data set. They call it self-supervised. The way the LLMs are trained is that they mask out the rest of the paragraph and then train the AI to predict the next tokens, but they already know what the next token is.

So it's actually not unsupervised; I think they call it self-supervised. To my understanding, most of these foundation models, I don't want to say all, are self-supervised in that there's already a large set of data, and they mask out text, or mask out part of an image, or mask out a frame in a video, and then have the transformer architecture predict that masked-out data.

But it's all still supervised. So this one is kind of a strange one; it seems like it should be easy to tease the two apart, but I think there's a lot more nuance there. Anyway, moving on to the panel questions. FDA asked: please discuss what specific information related to GenAI should be available to FDA in order to validate efficacy.
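
To make the self-supervised idea above concrete, here is a minimal, purely illustrative sketch (not any particular foundation model's training code; the function name and toy corpus are made up) of how next-token prediction turns raw text into supervised (context, label) pairs, with the labels coming straight from the corpus itself:

```python
# Purely illustrative: build (context, next-token) pairs from raw text, showing how
# "self-supervised" training labels come for free from the corpus itself.
def next_token_pairs(tokens, context_size=4):
    """Yield (context, target) pairs; the target is simply the token that follows."""
    for i in range(context_size, len(tokens)):
        context = tokens[i - context_size:i]  # what the model sees
        target = tokens[i]                    # the "label", already present in the data
        yield context, target

corpus = "the patient denies chest pain and shortness of breath".split()
for context, target in next_token_pairs(corpus):
    print(context, "->", target)
```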

Performance Evaluation and Pre-Market Submissions 🔗

Yujan Shrestha: And this is a group of questions; there isn't a single question here, but they're all on the pre-market side. I have this diagram here, which is from FDA. It lays out the AI product lifecycle, which includes design, development, and V&V, and I'd probably add the clearance point right after that.

Typically you're required to get a 510(k), or at least some sort of pre-market permission, right at this point, and after that you're in the post-market phase. And as shown here, once you're cleared you're not completely done; you're going to keep iterating on your product, which is the point of this diagram.

In this other one they break it out in more detail and show that it's a cycle, with checks and gates between all of these phases. So anyway, this is what FDA means by pre-market, and this is also what they mean by the whole product lifecycle.

So here's the first question FDA asked: what information should be included as part of the device description or characterization in a pre-market submission for a device enabled by GenAI? For example, when a human is not in the loop, or for a device intended to recall versus generate new recommendations, what information is particularly valuable to evaluate the safety and effectiveness of devices enabled by GenAI compared to non-GenAI devices?

So my go-to is these diagrams that we use to describe AI-driven devices, and I think they apply to GenAI as well as non-GenAI. For those who haven't seen them, they're also available on our website. We have data sets, which are the training and testing data sets, and we have data, which are individual images or pieces of text or whatever. I think highlighting the device output is good, along with all the external systems that have an impact. In green are non-machine-learning algorithms, and dashed boxes are things that require manual intervention, which covers the human in the loop we're talking about here.

In blue is anything machine-learning related. Solid blue is non-SOUP: machine learning trained in-house from scratch, or at least where you have full provenance of everything. Dashed blue is machine learning that you don't have provenance over, so something off the shelf, like something you've downloaded from GitHub.

On what information should be included, I think you should do the same exercise: go figure out what data was used to train the model, how the data flows through your system, and fully lay out all of the machine learning components you use, both developed in-house and off the shelf.

And honestly, this is what you should already be doing. How different is GenAI, other than the fact that you're going to have a much bigger black box in this component here, an external off-the-shelf component that you don't really have control over?

That is obviously a much bigger black box than before. But other than that, I'm not sure there's much inherently different on the algorithm description side. On risk, as we'll get to later, there are some differences. But for the device description, to be honest, I couldn't think of too many more differences.

J. David Giese: What do you think about this? To make it more concrete, let's think of a foundation imaging model. With a lot of the AI models we've worked on, we know the data set completely and it's all relevant for the intended use. If we're segmenting a particular structure in a CT, we know all the training data is for that.

But with these foundation models, the data that's going into training them is a lot broader, and it could be for a whole bunch of things that aren't related to the particular intended use of the device. So I would imagine that just describing the training data, the way you describe it, will have to be kind of different, because there's just so much more of it.

In the past it was all related to the intended use, but now it's for all sorts of things. It seems like a lot more effort will go into describing it, and into questions like: is the data valid or not? Did we just pull a whole ton of clinical data, did we just go to several academic medical centers and pull all their data?

Yujan Shrestha: Like right.

J. David Giese: I could imagine the visibility and transparency of the information around that being different. I don't know, what do you think?

Yujan Shrestha: Yeah, I think that's definitely going to be different, because it's also not as well known. That's definitely where a lot of the differences are: the scale of the training data. And that ties into FDA's definition of foundation models, and their definition of GenAI devices that use foundation models. They say that a developer can't possibly fully understand everything about the data that was used to train it.

They also talk about this in their comments here, where they say, for GenAI-enabled device training, this may include information, and this is important, "as reasonably as possible," pertaining to data management and model development for the initial foundation model. I think that's what you're talking about: the data management and model development for the initial foundation model. And here FDA is conceding that this is actually really difficult to get.

They're not expecting you to drive up to Facebook and ask them to give you exactly what data was used to train the model; they're conceding that that's not quite possible. But you're right, at least you should make a reasonable effort to figure out what was used.

Some of these foundation models are more open than others and have published what types of data they used, but others are pretty closed.

J. David Giese: One thing we've seen, when companies sell commercial off-the-shelf software to medical device manufacturers, is that they'll provide quite a lot of information to the companies buying the software. For example, there are some real-time operating systems meant for medical devices, and they'll have a lot of validation documentation that's aligned with how FDA expects it.

So what do you think? If there are companies, and we know there are because we've talked to some of them, building foundation models for the radiology space, they're not going to be clearing their models directly in many cases; they'll be selling them to other medical device manufacturers. Do you think there will be an equivalent expectation that, hey, you're going to have to disclose that kind of validation documentation? Because you're right, if you're thinking about something like Llama or OpenAI, you're probably not going to get that much visibility.

Yujan Shrestha: Yeah, if I were the one purchasing a specific foundation model, I would want to make sure that whatever I use it for, I'd be able to commercialize it, and one step of that is probably getting FDA clearance. If I can't get it FDA cleared because they're not sharing this information, that's not a reasonable outcome.

Risks and Usability Challenges of GenAI 🔗

Yujan Shrestha: That's not a favorable outcome for either party. So I would think the people making the base foundation model would want to make it as easy as possible for their customers to get FDA cleared, and therefore they'd want to make that data available. It doesn't need to be all the data, or the raw data; just saying, we got X number of CTs or MRs from this site, here's the demographic breakdown of the data, things like that. I think that's reasonable to expect, and advantageous for both the foundation model developer and the customer.

J. David Giese: So yeah.

Yujan Shrestha: Yeah, I think it's a great point.

J. David Giese: It's analogous to, hey, we're not going to give you the source code, we're just giving you the binary, but we are going to give you some validation documentation that describes things at a high level.

Yujan Shrestha: Yeah, exactly. All right, I think we've got some questions, and we can entertain them now since they might be related to what we've discussed so far. Rama would like a link to the reimbursement webinar; yes, we can certainly send you that, along with the link to the diagram.

It's on our website, and I can paste it into the chat in a little bit. Walter asked: what about the intellectual property concerns an investor would consider in addition to FDA clearance? I'm not sure I have the knowledge to really answer this one. I don't know, David, do you have any thoughts on it?

J. David Giese: Yeah. To be honest, Walter, that's outside of my space too. We're typically really focused on product development and the FDA regulatory side; we bleed a little into reimbursement and a little into IP, but I wouldn't say we're really qualified to say a whole lot there. I know there are specific concerns for foundation models around the training data that was used, and lots of people are talking and thinking about that, but we certainly wouldn't know...

Yujan Shrestha: Yeah.

J. David Giese: ...anything definitive about it. That does seem like the question, though.

Yujan Shrestha: And, full disclosure on how much I don't know about this, but I heard there was a court case arguing about whether the fact that Google scraped all the data from websites and then trained a foundation model on it was copyright infringement.

I believe the outcome was no, the judge ruled in favor of the foundation model developer, but again, I'm not a lawyer, and I'm not sure if that's the case you're thinking about. Okay, I see there's another question from Walter: how could providing further data and technical details potentially compromise intellectual property protection?

Yeah, again, I'm sorry, I don't really know the answer to that.

J. David Giese: One thing we could comment on: in the context of a company developing a foundation model and having to disclose some details to the medical device manufacturers that use that model, that sort of disclosure, I would say, is pretty expected. There's kind of a status quo here. FDA has an off-the-shelf software guidance that talks about using off-the-shelf software you've bought from a vendor in a medical device. There are different levels of risk related to using that off-the-shelf software, and at the higher levels of risk, one of the things that's required is that you have some visibility into the product development lifecycle of that software, which typically involves disclosing a certain amount of information about how it was put together.

So there's an expectation that you'll need to provide more visibility into how your software works than you might in other industries. I would imagine something similar will happen with models of unknown provenance: if you're trying to sell a foundation model, you should expect to disclose more.

But as for what the intellectual property protections around that are, whether you're treating it as a trade secret versus a patent, I don't really know. Hopefully that's at least somewhat useful. Great, looks like there's another question.

Yujan Shrestha: Yeah. So, for AI SaMD design controls: do you typically see manufacturers separating their design traceability matrix into product-specific software and technical requirements versus process- and regulatory-specific requirements, or do you typically see quantitative and qualitative DTMs?

J. David Giese: Okay, I can make a few comments on this. We typically don't include process-specific requirements in our design traceability matrices. For our internal audits of our quality management system and our processes, we'll have our own traceability matrix, where we trace from regulatory requirements to the processes and to evidence that those processes have been followed.

That usually operates at a higher level than a particular product. But yeah, we typically don't mix those, because the verification step for each is a little different, and the timing of when you do them is a little different. So I generally don't like to mix them, although I have seen companies do that quite a bit.

And especially if you're a startup and you only have one product, it may just be simpler to do it that way. But I don't know if that answers your question, but.

Yujan Shrestha: Yeah, and on the question of quantitative versus qualitative: I don't separate them out. They're both either verification or validation, but I don't separate quantitative and qualitative into separate matrices. Cool, thanks for the questions. If you have any more, please type them in and we'll answer them later.

The other question FDA had: what evidence specific to GenAI-enabled devices should be considered during pre-market evaluation, regarding performance validation and characterization of training data, to keep devices safe and effective across the whole product lifecycle? Again, this touches on our earlier discussion about it not being possible to get full information on the foundation model, and, as Walter brought up, there are potentially other legal roadblocks too.

I think what FDA is asking here is, specific to the data, what can be done. To expand on the previous point, the one thing I could come up with is some assurance that your training set has not been contaminated with your test set.

This is definitely a phenomenon seen with large language models: there are these public benchmarks, and you'll see all the algorithms perform better on them over time, but then there are benchmarks that are sequestered from the internet, and there's a big difference in model performance between the two, which you wouldn't expect.

There are also many papers showing that even seemingly small changes to the questions, changing names, changing things a human wouldn't even really notice, seem to make a big difference in performance. So a simple thing I can think of is just to make sure that the test set is not on the internet, and that you're reasonably sure it wasn't used to train the foundation model.
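
As a minimal, hypothetical sketch of that last point (not from the FDA document; the function names are made up), here is one crude way to screen a held-out test set for overlap with an available training corpus using normalized n-gram hashes. Real contamination audits of foundation models are far more involved, and often impossible when the training data is closed:

```python
# Illustrative sketch: a simple overlap check between a held-out test set and a
# training corpus, using normalized n-gram hashes. This only shows the basic idea.
import hashlib
import re

def ngram_hashes(text, n=8):
    """Hash every n-word window of normalized text."""
    words = re.sub(r"\W+", " ", text.lower()).split()
    return {
        hashlib.sha1(" ".join(words[i:i + n]).encode()).hexdigest()
        for i in range(max(len(words) - n + 1, 1))
    }

def contamination_report(test_cases, training_documents, n=8):
    """Flag test cases whose n-grams also appear in the training corpus."""
    train_hashes = set()
    for doc in training_documents:
        train_hashes |= ngram_hashes(doc, n)
    flagged = []
    for idx, case in enumerate(test_cases):
        overlap = ngram_hashes(case, n) & train_hashes
        if overlap:
            flagged.append((idx, len(overlap)))
    return flagged  # list of (test case index, number of overlapping n-grams)

# Example usage (with your own test cases and whatever training text is available):
# print(contamination_report(test_cases, training_documents))
```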

Yeah, I think that was the main thing I could come up with here: what evidence regarding training data can you provide that's specific to GenAI? Obviously the stuff that isn't specific to GenAI still applies, like where the training data came from and whether it represents your device's intended use.

All of that is still there, but specific to GenAI, that's the best I could come up with in the time I had to think about it. I don't know, David, if you have anything else you think should be in this.

J. David Giese: Yeah. I have some more thoughts on the performance evaluation part, but as far as the training data and understanding it, I do think in general that, ideally, we would know more about the foundation models. It really comes down to a risk-benefit analysis. Everyone, FDA, manufacturers, would wish we knew more.

But if the benefits are sufficiently high, does that outweigh the risk of not knowing that information? I'm sure the later questions will get into that aspect of it.

Yujan Shrestha: Yeah, and I'll touch a little bit on that. What we're talking about here is pre-market, but I think the combination of pre-market and post-market is going to be very important for anyone getting a GenAI device cleared. The case to make is: we can't know everything about this, but here's limited pre-market data showing a high likelihood of a favorable benefit-risk ratio, and we'll follow that up with a commitment to post-market data and post-market surveillance.

It's a bit like how drug trials are phased, where there's a post-market phase. With the two combined, that lowers the risk of clearing it now. But I think we'll talk more about that later. Here's another question from FDA: what new and unique risks related to usability may be introduced by generative AI compared to non-generative AI?

And what, if any, specific information relevant to healthcare professionals, patients, and caregivers needs to be conveyed to help improve transparency or control these risks? Hallucinations are definitely what first comes to mind for me; that's where the LLMs produce false information that looks very compelling. I think that's the main new usability risk. For the second question, clearly annotating what was AI-generated and what was not would really help.

Then, tracking that through the user interface, so people know whether something is AI-generated or not, citing references for where that content came from, and doing things like a confidence score where you can say, hey, we're 95% sure this was said, or 50%, or whatever.
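
As a purely illustrative sketch of those controls (not taken from the FDA document or any particular product; the class and function names are made up), here is one way a UI layer might carry provenance, confidence, and references alongside each generated segment so users can see what is AI-generated and how much to trust it:

```python
# Illustrative sketch only: carry provenance and confidence alongside each
# generated segment so clinicians can see what is AI-generated, what it was
# derived from, and how confident the system is.
from dataclasses import dataclass, field

@dataclass
class OutputSegment:
    text: str
    ai_generated: bool                 # rendered differently in the UI when True
    confidence: float | None = None    # e.g. 0.95 -> "95% sure this was said"
    sources: list[str] = field(default_factory=list)  # references for the content

def render(segments):
    """Render segments with simple provenance markers for review."""
    for seg in segments:
        tag = "[AI]" if seg.ai_generated else "[source]"
        conf = f" ({seg.confidence:.0%})" if seg.confidence is not None else ""
        refs = f" refs: {', '.join(seg.sources)}" if seg.sources else ""
        print(f"{tag}{conf} {seg.text}{refs}")

render([
    OutputSegment("Patient reports intermittent chest pain.", True, 0.95),
    OutputSegment("Allergies: penicillin.", True, 0.55, ["transcript 00:02:13"]),
])
```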

Also, I think there's human training. We shouldn't be using GenAI the way we use Google, where you type something in with no context and rely on whatever comes out. There still needs to be a lot of training, including for the general public, on how to use it effectively to reduce the likelihood of these hallucinations. And likewise, transparency about what went into the context, so the user can judge the likelihood of hallucinations: if the medication list wasn't in the context, but you get a medication list out the other end, that should be suspect. Anyway, there are probably a lot more controls we could think of, but these are some common themes. I don't know, David, if you have any other ideas on that.

J. David Giese: Yeah, I've got several other thoughts on the usability topic specifically. I think one thing with generative AI, especially text-based generative AI tools, is that because they're so general purpose, the boundaries between what a tool is intended for and what it's not intended for are a lot blurrier. FDA talks about that quite a bit in their document. In the past, before generative AI, it was technically hard to make tools that were so general purpose, so it was usually more obvious that, hey, this can only really do the one thing it does.

If you tried using it for something else, it just wouldn't work. So I think the usability risk of someone starting to use it for something it wasn't intended for goes up. That said, when you think about foundation imaging models used in the radiology space, some of those problems aren't really there.

That's more specific to a text-based interface. Say you take a foundation model and fine-tune it on some specific task, where the output is: you feed in a CT and it spits out a report. Those risks aren't really there, even though it is using generative AI; it's just using generative AI to perform better with less training data.

In that case, I think it's more or less the same as what we're doing already; it'll just probably work better.

Yujan Shrestha: But I think, you know.

J. David Giese: Maybe another case is if you can just send it, say, a chest MRI and it annotates everything as if it were a radiologist, which I know people are working on; you read about it online. That's probably a situation where, if FDA says, hey, we only cleared this for these areas, what happens when the radiologist isn't aware that it's annotating something it wasn't supposed to, and just believes that it works? Then that risk maybe comes back into play.

Yujan Shrestha: I'm going to call that predicate creep; is that a term? I don't know. That's what this reminds me of: FDA clears one thing, and then that device slowly creeps and more things get added to it over time. The main check and balance on that is that it's hard to build new features.

But now we're saying you can just repurpose it, so this amplifies the problem. I've seen devices that we've tried to use as a predicate, but they were cleared 15 years ago and only had one clearance, for version one, and now they're up to version 12 and it does something completely different, or a lot of different things added over time. Maybe it started out just notifying on a log file, and then it got to be something completely different. This seems like it will amplify that effect.

J. David Giese: Yeah, I think that's another risk. Maybe it's not a usability risk per se. But another usability thought: if we think about using Cursor or GitHub Copilot, some of what makes those uses of generative AI so nice is that you can iterate quickly. So thinking about usability, having the generative AI be able to handle when it's wrong, and making it really easy for someone to review it, is helpful. This is really software-specific, but with the Cursor tool, one of the things it does is make it really easy to rerun a prompt and see what's different.

That makes it easier to check that it worked well. I wonder if there are similar situations in clinical uses of generative AI, where if it's an iterative process and the user is interacting with the generative AI, there could be controls or usability features that make it easier to evaluate: hey, did it work well?

I haven't seen any applications like that in the clinical space, where you're doing it interactively. I don't know, Yujan, have you seen anything like that?

Yujan Shrestha: No, I haven't seen anything like Cursor in the clinical space.

J. David Giese: We had someone asking us about Glass Health, which is interesting. It's a startup that's raised money, I think it's Y Combinator-backed, and it's supposedly clinical decision support. I don't know if it's the sort of thing where you can kind of edit it iteratively like that, though.

Yujan Shrestha: Yeah. This seems to be more like searching for evidence-based medicine. I mean, at the simplest, if it's just combining large language models with peer-reviewed guidelines and it's not making specific claims about a patient, something like this may not even be considered a medical device.

J. David Giese: Yeah, I hadn't seen it. Although when it says "draft differential diagnosis"...

Yujan Shrestha: I kind of think of that like Google, like you're going to Google. Arguably, I think this is actually more of a gray area, and maybe a slippery slope. But I think this might be a topic in and of itself: figuring out whether a GenAI tool is a device or not.

Post-Market Surveillance and FDA Guidance 🔗

Yujan Shrestha: And FDA talked about this too. They bring up three categories, kind of like the CDS guidance: some AI that does not meet the definition of a device; some that meets the definition but that they're not going to worry about; and others that meet the definition and that they are going to worry about, which is what they're focusing on. Then they go into some detail about what falls where. I don't know if this is an educational tool, but it got me thinking: if it is an educational tool, maybe it's not considered a device. It's not clear exactly which class this AI falls into.

It definitely depends on how it's being used, I think.

J. David Giese: And on the marketing claims and everything.

Yujan Shrestha: Yeah, exactly. So going back to this, one more thing I want to talk about with usability is the detectability of errors, and I think it's not uniform. The virtual scribe is what got me thinking about this. Speech-to-text errors in particular, I think, are really difficult to detect. If there's an error, I can't see a busy clinician going back to the recording and verifying it, unless they check the transcript right then and there, while it's still fresh in their mind. I just don't see that happening, so I think the detectability of that kind of error is low, and the probability that the sequence of events goes on without the physician detecting it is higher.

It's different for each of these applications. If you have a segmentation algorithm, you have the image right there alongside the segmentation, so it's a lot easier to detect those types of errors. But the one related to ambient scribes is particularly hairy. So I think another usability concern is evaluating the detectability of the error, and that's a human factors concern: I just don't think busy physicians are really going to go back and check.

J. David Giese: Yeah, I think it's a great point, because the harder something is to do, the less likely you are to do it if you're busy. I know for myself, using ChatGPT early on versus GitHub Copilot: you'd say, hey ChatGPT, write me a script that does this and this, then paste it in, and that works reasonably well.

But when you have Cursor, where it's tightly integrated into your editor and you can quickly see the diff of what it's changing, it's way easier to check. And because it's easier to check, the risk on that second arrow, the sequence of events there, goes down. I think that's what you're saying, but for the audio.

And that's a good point about the control being: we're only going to let you do this if it's immediately after you've talked to the patient, and the clinician has to review it right then; otherwise it disappears and they have to re-listen and do it themselves.

Maybe that would be a way to enforce it as a risk control.

Yujan Shrestha: Yeah, maybe. I think one other way is in the post-market: if you store the recording, then as these LLMs get better you can go back and say, oh shoot, this was actually a transcription error, and it affected care. I think there are some legal reasons why a company wouldn't want to do that, but for safety reasons it seems like the right thing to do.

Which is, I think, why FDA is putting a lot of emphasis on the post-market. They want to get these devices out there, because the benefit is clearly there, but to get them out sooner without a bunch of pre-market evidence, pushing that to the post-market makes sense.

And it's a more sustainable way to do it too, I think. But anyway, I'm getting into an aside about my own opinions, which don't necessarily matter. I'm going to skip this question so we have plenty of time to go on to the other ones. Risk management is what I wanted to talk about here.

The question is: what new opportunities and intended use cases have been enabled by GenAI, and what new controls might be needed to mitigate those risks? We talked about risks previously. One thing I wanted to discuss here is: why are virtual scribes so much more prevalent now than they used to be? I think it's because pre-large-language-models, or pre-AI, it was really expensive to do and it didn't work as well.

There wasn't a clear financial or technical benefit to doing it. Now the floodgates are open: anyone can call, say, the OpenAI APIs to transcribe the audio and then toss the transcript into ChatGPT, and there does seem to be a business need that comes out the other end. It unblocks the whole value chain at every step along the way. I think that's the answer to why there are these new opportunities, and there are probably more like it, where before it was hard and now it's unlocked.

That's why we have so much more adoption now. Now, on to risk. I think this is another dimension to the risk discussion we had before: there we were talking about how an error would be detected, and this is more about how those errors would be used clinically.
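
As a hypothetical, minimal sketch of the kind of scribe pipeline Yujan describes (transcription followed by LLM summarization), assuming the current OpenAI Python SDK, with model names and prompts as illustrative placeholders; a real ambient scribe product would need much more around it, including the review steps and risk controls discussed above:

```python
# Hypothetical sketch of a bare-bones ambient scribe pipeline: transcribe the
# visit audio, then ask an LLM to draft a note for clinician review.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def draft_note(audio_path: str) -> str:
    # 1. Speech-to-text on the recorded visit.
    with open(audio_path, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            model="whisper-1", file=audio_file
        ).text

    # 2. Summarize the transcript into a draft note.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Draft a clinical visit note strictly from the transcript. "
                        "Do not add information that is not in the transcript."},
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content

# draft = draft_note("visit_recording.mp3")  # output still requires clinician review
```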

Opportunities and New Use Cases for GenAI 🔗

Yujan Shrestha: I think some things, like DNR status, which stands for do not resuscitate, are important to get right. If you rely on the virtual scribe for this and it says yes instead of no, that's objectively a problem. So not all outputs can be weighed equally; I think that's another important thing to consider.

There are going to be some consequences of an error that are unacceptable. DNR status and the allergy list are some I could think of that are really high risk, where we probably shouldn't be relying on virtual scribes. There are others that are lower risk and probably okay.

J. David Giese: That's a really interesting way to think about it. Having a hallucination in the DNR stuff is definitely problematic, right?

Yujan Shrestha: Right, it's terrible. If you trace these red arrows: high likelihood, high likelihood, really shitty consequences. So if you ran your risk evaluation on this, you should probably come out with "unacceptable."

J. David Giese: It's like, well, there are too many novels about DNRs in the training data or whatever, so it's biased to always answer one way. Something silly like that.

Yujan Shrestha: And this often gets into the performance evaluation aspect of it. You can't just weight all of these equally, right? If your transcript includes the allergy list, you should probably weight errors on that higher than the other content, rather than using just a word error rate based on characters or words alone.

"Yes" versus "no" is only one word off, but it really matters. I'd say that should be weighted significantly more, semantically, based on the consequences of the error. It's a really difficult thing to do, but I think it should be considered when you write the function that evaluates how well your algorithm is doing. The other stuff we'll leave for another day, because we've only got five more minutes and there are some questions we can try to get to.
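
A minimal, purely illustrative sketch of that idea (the field names and weights below are made up for illustration, not clinically validated): score scribe output by the consequence of each disagreement rather than by a flat word error rate.

```python
# Illustrative sketch: a consequence-weighted error score for scribe output,
# instead of a plain word error rate. Weights here are placeholders.
SEVERITY_WEIGHTS = {
    "dnr_status": 100.0,   # "yes" vs "no" is one word, but the consequence is severe
    "allergies": 50.0,
    "medications": 25.0,
    "narrative": 1.0,      # free-text wording errors matter far less
}

def weighted_error_score(reference: dict, generated: dict) -> float:
    """Sum severity weights over fields where the generated note disagrees."""
    score = 0.0
    for field, weight in SEVERITY_WEIGHTS.items():
        if generated.get(field) != reference.get(field):
            score += weight
    return score

reference = {"dnr_status": "no", "allergies": "penicillin", "narrative": "..."}
generated = {"dnr_status": "yes", "allergies": "penicillin", "narrative": "..."}
print(weighted_error_score(reference, generated))  # 100.0: one critical error dominates
```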

So, from Ricky, thanks for your question. Ricky said: it's not unreasonable to think FDA would issue guidance on what to include within a pre-market submission for GenAI, but given how long it's taken to get AI-specific guidance for predictive AI, and we're still waiting on a lifecycle guidance, how long do you think it will take FDA to get some meaningful guidances out?

Q&A Session 🔗

Yujan Shrestha: Yeah, that's a great question. I don't know. I think there's a lot of urgency to this one, because there's just so much GenAI being cowboyed into products right now that perhaps it gets a higher priority.

When do you all think we'll have a generalized guidance? I don't know, David, do you have a guess?

J. David Giese: We asked them this at the AdvaMed conference, at the big FDA symposium, and they didn't really give an answer. My suspicion is that if there is a guidance, at first it'll be around where they're drawing the boundaries between what is and isn't a device, like in the clinical decision support space, because I would think that's the most immediately urgent thing to clarify.

Then, supposedly they have not cleared any GenAI-enabled devices, and they said they haven't even had any submitted, but I don't know if they're just not allowed to say. I don't quite believe that, just from other conversations we've had, so maybe it's just not fully communicated within FDA. My guess is there have been some submissions involving GenAI-enabled devices, certainly at least pre-subs.

Probably once they've had enough to actually have a standard, I could see them doing a guidance, but I doubt they will until they've at least cleared several and start to have some internal consensus about what's reasonable to expect. So that'd be my guess, but it's just a guess.

Yujan Shrestha: Yeah, I'm kind of reading between the lines of the current document. It seems like they're pushing a lot more into the post-market. I know post-market surveillance has definitely been on FDA's mind, but I think this time around maybe they'll actually put something out there with strings attached:

you have to have a strong post-market surveillance plan in order to get it cleared. I think that's where they're going. So maybe it'll be sooner than we'd think. Rather than being reactive, as it probably was in the past, maybe it'll be a bit more proactive, because they have assurances that post-market surveillance is happening. I don't know.

J. David Giese: But I think one issue is just their authority to enforce post-market stuff. It seems like they're kind of stuck a little bit, in that they have the authority to do this big pre-market check.

Yujan Shrestha: But like, I'm.

J. David Giese: I'm not really that aware of how this works on the legal side. I know they do audits of your QMS, but what authority would they have to enforce that you're doing post-market surveillance, unless it's specifically in the special controls for a new product code?

Yujan Shrestha: But yeah.

J. David Giese: I know that Troy, I forget his last name, the director of the digital health group, was at AdvaMed, and I think he said that they needed more statutory authority to implement the post-market pieces they would like to do. But to be honest, I need to read and think about this a little more to be sure.

So take what I'm saying here with a grain of salt.

Yujan Shrestha: Cool. Thanks for your question, Ricky. And we got one more question, from Richie: the quality of the prompt could play a key role in the output generated; what do you think about developing something like a prompting guide that could be part of the device labeling to help users with prompting? Yeah, I think so. Having clear instructions for the user, and definitely educating them on how to best use the AI to reduce the chances of these hallucinations, would be important. This also touches on what FDA is saying about open-ended input and open-ended output: to them, that's the highest level of risk, the highest chance that it could be misused and that the kind of creep we talked about happens. So I think a prompting guide is most relevant when you have open-ended input, and that's also probably the most difficult thing to get FDA cleared right now.

Anyway, we're already two minutes over, but I think we had a great conversation and some excellent questions. Thanks a lot for your participation, everybody. We'll continue this conversation; I'm not sure if we'll have one next week because I'll be out, but we'll definitely let you know, so please keep a lookout for any posts. Thanks a lot for joining, and I hope you enjoyed it. And please consider whether you're making devices that you would use on yourself and your family; if that's true, keep it pragmatic, keep it simple.

J. David Giese: Okay. Thanks Yujan. I thought it was a great, great topic. Thanks.

Yujan Shrestha: Yep. Talk to you later.

