Participants 🔗
- Isaac Miller - FDA's Office of Scientific Professional Development
- Dr. Elena Sizikova - Office of Science and Engineering Laboratories, Center for Devices and Radiological Health, U.S. Food and Drug Administration
Key Takeaways 🔗
- Synthetic Data in Medical Imaging: Synthetic data plays a pivotal role in bridging gaps where real medical imaging data is limited, enabling safer AI model development and testing without breaching privacy or facing accessibility issues.
- Applications and Benefits: Synthetic data is being used across domains like histopathology, pediatric imaging, and skin simulation, offering adjustable variability and removing risks such as radiation exposure during data collection.
- Challenges in Synthetic Data Usage: Key challenges include ensuring the realism of synthetic data, addressing biases, and defining proper evaluation and validation methods for its application in AI-driven medical devices.
- Comparisons of Generative and Knowledge-Based Methods: Generative methods often yield realistic outputs but inherit training data biases, while knowledge-based methods provide controlled outcomes but require complex implementation and tuning.
- Blending Synthetic and Real Data: Hybrid approaches combining synthetic and real data can improve model performance, but the ratio of synthetic-to-real data needs to be carefully optimized based on context.
Transcript 🔗
Introduction and Overview of FDA Grand Rounds 🔗
Isaac Miller: All right, good afternoon and welcome. Good morning to the folks in the Western time zones, and good afternoon to everybody here on the East Coast. My name is Isaac Miller. I am with the FDA's Office of Scientific Professional Development. Thank you for joining us for today's FDA Grand Rounds, sponsored by the FDA's Office of the Chief Scientist. Our lecture today will be provided by Dr. Elena Sizikova. Before we begin today's grand rounds, I have a few CE in-class instructions to read. Speakers are expected to use generic names; if trade names are used, those of several companies should be used rather than only that of a single supporting company. Disclosure of unapproved use: CE speakers are required to disclose to the attendees when products or procedures being discussed are off-label, unlabeled, or not FDA approved, and any limitations on the information that is presented.
Disclosures today: the speaker, the FDA Faculty Planning Committee, and the CE consultation and accreditation team have nothing to disclose. Continuing education: this activity is approved for one CME, CPE, and CNE contact hour. Continuing education credit will be available for those who attend via Zoom; pre-registration via Zoom is required. Requirements for continuing education: attend the activity, verified by Zoom registration. Pharmacists, nurses, and those claiming non-physician CME must attest to their attendance and complete the final activity evaluation via the CE portal.
We encourage all participants to complete the post-activity survey, as it provides valuable feedback which may affect future activity offerings. Instructions on how to complete the survey, claim CE credit, and/or print the certificate will be emailed to you within 24 hours of this session. The claiming code is on the screen; I'll give folks a moment to write it down if they want to get a head start on claiming directly after the class. If not, it will be emailed to you sometime tomorrow. Closed captioning is provided for today's seminar via Zoom; if you do not see the captions, press the CC button at the bottom of the screen. There will be a Q&A session at the conclusion of today's talk. Please put all questions in the Q&A pod at the bottom of the screen.
This webinar is being recorded, and the recording will be posted on the website within five days post-event. All right, today's speaker: Elena Sizikova. Elena Sizikova is a staff fellow in the Office of Science and Engineering Laboratories, Center for Devices and Radiological Health, Food and Drug Administration, located in Silver Spring, Maryland. Her work addresses research problems at the intersection of artificial intelligence, medical imaging, and computer vision, focusing on the use of synthetic data and regulatory science.
Speaker Introduction: Dr. Elena Sizikova 🔗
Isaac Miller: The title of today's presentation is Synthetic Data for Medical Imaging AI. All right, it's time for our esteemed speaker, coming from the White Oak campus, FDA, Silver Spring, Maryland. Elena, it's all yours.
Elena Sizikova: Thank you so much for the introduction, and thank you for the opportunity to discuss my work. Let me share my screen. Are you able to see my screen now?
Isaac Miller: Yes, ma'am.
Elena Sizikova: Great, thank you. My name is Elena, and I'm a senior staff fellow in the Division of Imaging, Diagnostics, and Software Reliability (DIDSR) at the U.S. Food and Drug Administration, and it's an honor to present our work to you today. First, I'd like to start with some slides that outline what I'd like to discuss today. I want to talk about different classes of techniques for generating synthetic medical imaging data.
Overview of Synthetic Data in Medical Imaging 🔗
Elena Sizikova: And summarize their strengths and weaknesses. I also want to discuss how synthetic data generation for some practical applications, such as breast and skin imaging, can be effectively used at various stages of medical AI development. So what is DIDSR? It is a division within the Office of Science and Engineering Laboratories (OSEL), which is the research arm of the Center for Devices and Radiological Health within FDA. At DIDSR, we conduct regulatory reviews of AI-enabled medical devices and develop regulatory science tools (RSTs) to accelerate and facilitate the review of the types of devices that we see daily. In this chart, you can see where regulatory science tools and publications fit in the evolution of regulatory tools. RSTs are a new way to de-risk technology development by providing practical solutions for evaluating devices.
RSTs do not replace other regulatory tools that we employ during the FDA review process, but one notable benefit of RSTs is that it takes less than a year to get them out of our hands and into the hands of stakeholders. By comparison, the development of guidance documents typically takes in the ballpark of 3 to 6 years, while standards take even longer, in the range of 5 to 12 years. Here's a snapshot of the RST catalog, which is already three years old as of today. You can access this catalog online; it's freely available. Early data suggests that the catalog has approximately tripled the use of the tools that are part of it.
We're adding new tools at the rate of 50 a year and looking to significantly expand the capacity. I want to emphasize that many of these tools have links to the data and forward links to GitHub, so you can interact, open issues, and really see how you can use these tools in practice.
Advantages of Synthetic Data Over Real Data 🔗
Elena Sizikova: So today I want to focus on one of the major gaps and challenges facing medical imaging AI, which is limited labeled data sets. In this slide you can find some data about the number of device submissions to the agency that contain an AI component. This is public data across many product areas. Look at what happened in the last few years.
There has been a dramatic increase, dominated by radiological applications but expanding to other device areas, and we expect this trend to continue; we don't see any indications of a slowdown. So what is common across all of these devices? Medical devices need to be evaluated on representative populations, and AI algorithms in particular require access to large training and testing data sets with enough variability to demonstrate safety and effectiveness in the intended use populations.
On the other hand, real patient data sets that are commonly used within development are often limited by a number of factors. They're often small in size and constrained in variability. They may not represent rare or life-critical cases. They may be unavailable due to privacy regulations, policies, or risks of acquisition.
As a result, they may be biased. And even when data is available, it may not have labels for the type of information needed to develop a particular device. So data availability is a huge issue in medical device development, and one of the potential solutions is synthetic data.
So, as defined by the artificial intelligence glossary, synthetic data is artificial data that is intended to mimic the properties and relationships seen in real patient data. Synthetic data are examples that have been partially or fully generated using computational techniques rather than acquired from a human subject by a physical system. Here you can see some initial comparisons of patient and synthetic data.
Patient data is human data, while synthetic data is an approximation, so that's a negative for synthetic data. On the other hand, patient data is more burdensome to collect, while synthetic data can often be collected in a much easier fashion. Patient data may contain limited samples, while with synthetic data, if you have access to a good compute system, you can generate unlimited samples. Patient data may have limited patient variability due to challenges in acquisition, but with synthetic data you can tune the variability that you'd like. Patient data is often more narrow in scope, while synthetic data is adjustable. In patient data, some of the reference standard or truth may be uncertain, while with synthetic data it is often easier to obtain the labels. Synthetic data may also reduce the risk of data collection: there is no inherent risk, for example radiation risk, as when imaging a patient. Patient data may also pose privacy concerns for data sharing, while synthetic data, depending on how it is generated, may pose fewer privacy concerns for further data sharing. And finally, patient data, due to all of these factors individually or together, may be biased, while with synthetic data, depending again on how it is generated, we have more control over the biases that we know about and can potentially address them.
So as you can see, synthetic data offers a number of potential advantages over patient data. The question is: can synthetic data partially or fully replace real data for AI, commonly classification, detection, or segmentation algorithms? Can we simulate enough synthetic data for AI development? In this slide I show some sample synthetic mammography images.
These were developed in-house to support some of the testing used to evaluate this claim; I will present the data set itself in a later slide. Here I want to show an image of a breast cancer lesion, and I want to ask you which image is synthetic and which one is not. I'll give you a few seconds to decide. So, the left image is actually the synthetically generated one. I want to say that this is a cherry-picked example, and not all synthetic data is this realistic, but we can tune the synthetic data to be as realistic as we need, depending on how much realism we actually need.
Applications of Synthetic Data 🔗
Elena Sizikova: In the next couple of slides, I want to show some use cases of synthetic data at various stages of medical imaging AI development and analysis. DIDSR has many staff fellows, AI fellows, and other researchers, and we have looked at a number of applications ranging from histopathology to breast imaging, as well as scorecards for synthetic medical data evaluation.
Even beyond this, there are a number of other ways synthetic data has been used for various applications in medical devices. To start with synthetic histopathology data generation: nuclei and cell segmentation is a critical step in pathology image analysis, and quantifying cells can help improve the accuracy and efficiency of diagnostics and biomarker quantification.
But developing these algorithms requires accurate annotations of millions of nuclei and cells on large-scale slide images, which is often challenging; imagine taking one of the images shown in the slide and annotating every cell. In this project, my colleagues collected four cancer pathology data sets and trained a diffusion-based generative AI model on pairs of histopathology images and their corresponding masks.
They produced a synthetic histopathology image dataset that can be used for downstream task analysis. A pathologist confirmed that the results are very realistic and that stroma, carcinoma, and nuclei can be identified in the generated synthetic images. Performance evaluation using synthetic data quality metrics shows a high level of detail and variety in the generated images. I encourage you to look at the links shown in this and other slides to learn more about each individual project.
Next, a different project looks at pediatric AI and synthetic data usage. We all know that kids are not just small adults; in fact, their medical images vary substantially from adults'. The anatomy, disease presentation, and image acquisition and quality are all different. All of these factors can contribute to variable performance in the AI devices used to process or aid in interpreting pediatric images. AI devices are increasingly being used to optimize pediatric imaging, but collecting and annotating pediatric data is more challenging than adult data, because pediatric patients are a protected population and the work requires more expertise.
This means that performance on pediatric patients is unknown for most AI devices. We have developed a synthetic pediatric imaging pipeline, as shown in the slide, that enables us to check that AI devices are safe and effective across the range of dimensions where pediatric images differ from adults'.
The pipeline creates synthetic CT images using a virtual CT scanner, shown in the center. The resulting images are then fed into an AI model, on the right. Since we know the ground-truth patient and disease characteristics, we can evaluate the effects when comparing AI output to the ground truth. This project is just one of many examples of how synthetic data can give us an opportunity to conduct virtual imaging trials to better understand the safety and effectiveness of AI devices in kids, without unnecessary radiation exposure or expensive patient recruitment.
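To make the structure of such a virtual imaging trial concrete, here is a minimal, self-contained Python sketch. It is a toy stand-in, not the actual FDA pipeline: the phantom, virtual scanner, and AI model below are deliberately simplistic placeholders, and the dose-to-noise relationship is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_phantom(lesion_present, size=64):
    """Toy 'patient': a 2D array with a bright lesion at a known location."""
    phantom = np.zeros((size, size))
    if lesion_present:
        phantom[28:36, 28:36] = 1.0   # ground truth known by construction
    return phantom

def simulate_scan(phantom, dose=1.0):
    """Toy 'virtual scanner': noise level decreases as dose increases."""
    return phantom + rng.normal(0.0, 0.2 / np.sqrt(dose), phantom.shape)

def ai_model(image, threshold=0.8):
    """Toy 'AI device': flags a lesion if the 64 brightest pixels are bright on average."""
    return np.sort(image.ravel())[-64:].mean() > threshold

# Run the virtual trial: compare AI output to the known ground truth across doses.
for dose in (0.25, 1.0, 4.0):
    truth = rng.integers(0, 2, 200).astype(bool)
    correct = sum(ai_model(simulate_scan(make_phantom(t), dose)) == t for t in truth)
    print(f"dose={dose}: accuracy={correct / 200:.2f}")
```

Because the ground truth is known by construction, the loop can quantify how device performance degrades at lower simulated doses, which is the kind of question a virtual imaging trial is designed to answer.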
In another project, we developed S-SYNTH, a 3D knowledge-based and open-source skin simulation framework for generating synthetic skin images with systematic variations of skin and lesion properties, as well as the rendering. I will go into the detail of this pipeline in the second part of the presentation. In a follow-up project to S-SYNTH, we also show that we can use synthetic training images generated by this pipeline to contain common skin artifacts and improve downstream performance on lesion segmentation tasks.
Knowledge-based simulation pipelines are not exclusive to skin. For example, in the M-SYNTH project we focused on the creation of synthetic mammography datasets containing 45,000 images. This dataset can be used to analyze AI model performance and find regions of parameter space for which performance varies from the aggregate level. Synthetic data can also be used to study disease progression over time.
In this project, we introduced a computational model for simulating growth of breast cancer lesions, accounting for the stiffness of surrounding anatomic tissues. A key aspect of synthetic data use is accurate reporting and evaluation. One of the tools developed by our group is the Synthetic Medical Data Evaluation Scorecard, which is a comprehensive report to accompany artificially generated data sets.
The scorecard is intended to help standardize evaluation and report performance according to a comprehensive set of metrics. I want to emphasize that imaging AI applications are only a part of a much larger set of applications of synthetic data within medical devices. Computational modeling and simulation is an adjacent field; for instance, an algorithm in a device's software may take in patient data and simulate the patient to obtain patient-specific results.
I encourage you to look at the links shown below to learn more about this topic, which is beyond the scope of this presentation. At this point, I've shown a number of applications, and you are probably wondering: how can I actually generate synthetic data? In this project, we created a taxonomy of the different ways synthetic data can be generated.
Synthetic Data Generation Techniques 🔗
Elena Sizikova: We distinguish two kinds of models: individual models, which are non-stochastic and intended to model an individual patient (a so-called digital twin) or a family of related patients; and population models, which are stochastic models that can be sampled from to create new individuals. Population models can be grouped into either image-based or knowledge-based methods.
Image-based methods create synthetic data based on information obtained from patient images, such as CT scans. Generative AI is one type of image-based method that learns an AI model from images in order to generate new images. Knowledge-based models, on the other hand, take data from physical or biological measurements to create synthetic data.
At this point, I want to pause and ask: which of the following terms can be used to describe a digital representation of a patient's anatomy? I'll give you a few more seconds to respond. The correct answer here is C, all of the above: a phantom or a 3D model can be used to describe a digital representation of a patient's anatomy.
In this slide I want to present an overview of different techniques where synthetic data is used for radiology. I want to emphasize that a lot of work has been done across different fields, across almost every application: CT scans, X-rays, mammography, across multiple modalities and anatomic sites.
The types of models that are used are also varied. Many models are image-based: generative adversarial networks, diffusion models, and so forth. But a lot of models are also simulation-based, patient-based, or knowledge-based. The colors in this table group the entries by the type of model used to generate synthetic data, and by whether the model is used to generate a particular patient or to simulate, for example, an imaging acquisition device.
I want to emphasize that a lot of the models today are image-based generative AI methods. You have probably seen that these models are very popular, and you may be wondering at this point: why would we want to use simulation? Why don't we just use the latest and greatest generative AI model, which is often developed not for medical imaging data?
These are just some popular examples of models that can be used to generate images based on input prompts or other input data, and they generate extremely nice outputs. But for medical imaging, one of the limitations is the difference between the image space, for example the space where the X-ray lives, and the object space, which is the true representation of the object you are trying to image, such as the patient. Information is frequently lost during imaging operations, so these two spaces are non-bijective: mapping a particular point into the imaging space results in a loss of information. How does this manifest in practice? In the middle you see a visual (not medical) example, where you take images, extract all the information from them, and learn a model from them.
You can see that the resulting images may not be realistic; they may not be anatomically correct and may contain artifacts. So, depending on how the synthetic data is used, it's important to account for these kinds of limitations when using the model. Another problem is that the outputs are often unconstrained, and depending on how the synthetic data generation model is parameterized, it may generate artifacts which cause the resulting images to be unrealistic.
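To make the null space idea concrete, here is a tiny, self-contained numerical illustration (an editorial toy example, not from the talk): a downsampling "imaging operator" maps two different objects to identical measurements, so any method that learns only from the measurements cannot tell the objects apart.

```python
import numpy as np

# Toy "imaging operator": 2x downsampling by averaging adjacent pixel pairs.
# It maps an 8-pixel object to a 4-pixel image and therefore has a null space.
A = np.kron(np.eye(4), np.ones((1, 2)) / 2)

obj1 = np.array([1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0])
delta = np.array([0.5, -0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])  # lies in A's null space
obj2 = obj1 + delta                                          # a genuinely different object

print(A @ obj1)                          # [1. 0. 1. 0.]
print(A @ obj2)                          # [1. 0. 1. 0.]  (identical image)
print(np.allclose(A @ obj1, A @ obj2))   # True: the difference is invisible in image space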
Here's a very brief comparison of knowledge-based versus generative models. Knowledge-based models obtain data not from images but from physical or biological measurements, so there are generally no null space limitations, while generative methods that learn from images do have a null space problem. As a result, generative methods may suffer from hallucinations, while knowledge-based methods may not. Knowledge-based methods are not the solution to everything: they may be much slower both to implement and to run, and may often generate unrealistic outputs. Generative methods, on the other hand, learn directly from the distribution of images, and the outputs are often more realistic. However, as you probably remember from the breast imaging example earlier, in simulations we can control the factors that we simulate and potentially generate very realistic data.
The choice of how much realism we need depends on the type of application. Another advantage of knowledge-based methods is control over biases: because we know what data is used to create a particular model, we have more control. Generative methods, which learn directly from patient images, often suffer from the limitations of patient data sets and propagate biases from the training data into the resulting synthetic data.
In summary, if you're going to choose a synthetic data generation technique, you may want to consider the quality and quantity of the source data used to create the synthetic data generator. Whether biological or physical knowledge is available, as well as application domain features such as a null space, are also important to consider, along with how each of these models separates characteristics of the object from those of the imaging system.
Again, the choice of technique really depends on the risk associated with how the synthetic data is used; there's a term for this, the context of use. It's really important to be able to characterize what the synthetic data is, how it has been developed, how it is used, and what the resulting risks are.
Case Study: Synthetic Mammography Images 🔗
Elena Sizikova: In the next part of my presentation, I want to go back to the graph I shared earlier and look at some specific examples of how knowledge-based models can be used to create synthetic data in conjunction with AI development. The first project I want to discuss is the M-SYNTH data set, which starts from a knowledge-based model of the breast developed by Graff and colleagues; the breast is then compressed and a mass is inserted into it.
The resulting digital representation of the breast is then imaged through a virtual simulator of a mammography device. This results in images where we vary characteristics of the breast and mass, such as the breast density, mass radius, and mass density, as well as characteristics of the imaging system, such as the relative dose with respect to the clinically recommended dose.
We can see the effect of each of these variations on the images; for example, increasing the mass radius results in masses of larger size. The M-SYNTH data set is one of the regulatory science tools I mentioned earlier, and the full pipeline as well as the images are available open source via GitHub.
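As an illustration of how such a factorial design over breast and imaging parameters can be organized, here is a short sketch. The parameter names follow the description above, but the specific levels are hypothetical placeholders, not the actual M-SYNTH settings.

```python
from itertools import product

# Hypothetical parameter levels for a factorial synthetic-mammography grid;
# the real M-SYNTH values differ. These are placeholders for illustration.
breast_densities = ["fatty", "scattered", "heterogeneous", "dense"]
mass_radii_mm = [4.0, 6.0, 8.0]
mass_densities = [1.0, 1.06, 1.1]     # relative to surrounding tissue
relative_doses = [0.5, 1.0, 2.0]      # fraction of clinically recommended dose

grid = [
    {"breast_density": b, "mass_radius_mm": r, "mass_density": m, "relative_dose": d}
    for b, r, m, d in product(breast_densities, mass_radii_mm, mass_densities, relative_doses)
]
print(len(grid))   # 4 * 3 * 3 * 3 = 108 imaging configurations
print(grid[0])
```

The value of the full grid is that every image comes with known generating parameters, so model performance can later be sliced along any single factor.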
We then look at some experimental validation of this data set on one specific task: mass presence detection in digital mammography images. This means that we take an image and output a single label indicating whether a mass is present in that image, and we look at performance for each specific breast density and other properties.
We generate a large synthetic data set and then compare against a real patient data set, which in this case is 410 mammography images from the INbreast data set, a public mammography data set. We use a metric called area under the ROC curve (AUC), which is a standard metric for evaluating this type of task.
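For readers unfamiliar with the metric, here is a minimal example of computing AUC for a mass-presence classifier with scikit-learn; the labels and scores are made up purely for illustration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical ground truth (1 = mass present) and model scores for 8 images.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.10, 0.70, 0.80, 0.60, 0.90, 0.30, 0.40, 0.20])

print(roc_auc_score(y_true, y_score))  # 0.875 here; 0.5 is chance, 1.0 is perfect
```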
In this graph we look at how the training data affects performance as a function of mass radius and breast density. In each of the graphs, the colors indicate the type of training data used to train a specific model, the x-axis indicates the parameters of the testing data, and the y-axis indicates the AUC metric.
In this case, higher AUC is better. Synthetic data generates the trends expected from a physics understanding of this task: easier cancer cases, for instance masses with larger sizes or higher densities, generated higher AUC values. Training on in-domain examples, such as masses of matching size or examples with matching breast density, generated higher performance compared to using out-of-domain data, and including all examples within a domain, for instance all mass radii, generates the best performance.
Here I want to ask another poll question: what type of models suffer from null space limitations, models trained from imaging data or models trained from physical and biological knowledge? Again, I'll give you a few more seconds to think about it. The correct answer here is (A), models trained from imaging data.
In the next project, I want to discuss in more detail the S-SYNTH framework I mentioned earlier, an open-source skin simulation framework for generating synthetic skin images where we vary properties of the skin and lesions, as well as properties of the rendering, via detailed physics models, to check how downstream AI performance varies.
Case Study: S-SYNTH Skin Imaging Framework 🔗
Elena Sizikova: So the framework for this is quite similar to the breast imaging that I discussed earlier, but the application is much different. And here I want to show a video of how this model was created. So we start with a model, developed in Houdini, which is a multi-layer synthetic skin model that's procedurally generated. There is a blood network that is grown as well as hair and the lesion that is inserted into the skin that we each of the layers is assigned a material and rendered using a virtual camera, using the Mitsuba framework, where we can, very systematically characteristics of all of the layers and the imaging to generate the resulting images. So some of the variations, that this model allows are shown here. The Melanosome fraction, which is a proxy for skin color, the variation of which is shown on the left, shows that we can generate or the same exact model of the lesion and skin. We can generate variations in skin. So we can also do this for other parameters such as the blood fraction.
And very again, these characteristics based on values that we obtained from literature, we can also look at binary characteristics. For example, we can add and subtract here from the same exact model. We can also make the lesion shape. More regular or more irregular depending on what we want to study. So this model allows us to look at a given a particular, skin segmentation model, that's trained on all the patient data.
How does the performance vary with these annotations? I want to emphasize that for skin imaging there are very the patient data sets are extremely limited, and virtually none of the labels that are seen in this slide are available for patient images. And so we can actually, test the variations are not label in patient images using these models.
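A sketch of what sampling such skin and lesion parameters might look like in code; the parameter names mirror the talk, but the ranges and the sampler are hypothetical, not the actual S-SYNTH interface.

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_skin_params():
    """Draw one hypothetical skin/lesion configuration. Ranges are illustrative,
    not the literature-derived values used by the real S-SYNTH framework."""
    return {
        "melanosome_fraction": rng.uniform(0.01, 0.45),  # proxy for skin color
        "blood_fraction": rng.uniform(0.002, 0.07),
        "hair": bool(rng.integers(0, 2)),                # binary: hair present or not
        "lesion_irregularity": rng.uniform(0.0, 1.0),    # 0 = regular, 1 = irregular
    }

# Each configuration would be rendered to an image with a known lesion mask,
# so every variation comes labeled "for free".
for _ in range(3):
    print(sample_skin_params())
```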
We find that synthetic test images with hair artifacts result in lower performance compared to images without hair artifacts. Performance drops with darker skin color, which is a known issue with many skin imaging AI tasks. It is slightly easier to segment lesions that are more regularly shaped than irregularly shaped, and the blood fraction also slightly affects performance.
We can also add synthetic data during training and see what happens. In this set of graphs, I look at how the proportion of patient to synthetic data affects performance. The x-axis indicates the proportion of patient data to synthetic data, while the y-axis indicates the Dice score, which is a segmentation metric; higher Dice is better.
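For reference, the Dice score mentioned here can be computed as follows; this is the standard definition, shown with toy masks for illustration.

```python
import numpy as np

def dice_score(pred, target, eps=1e-8):
    """Dice similarity coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return 2.0 * intersection / (pred.sum() + target.sum() + eps)

# Toy lesion masks: the prediction recovers 3 of the 4 true lesion pixels.
truth = np.array([[0, 1, 1], [0, 1, 1], [0, 0, 0]])
pred = np.array([[0, 1, 1], [0, 1, 0], [0, 0, 0]])
print(dice_score(pred, truth))  # 2*3 / (3 + 4) ≈ 0.857
```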
The left graph shows performance when the model is trained only on real data and then evaluated on a held-out test data set; the test data sets are the same patient data sets in all three graphs. The dashed curves correspond to the ISIC data set and the solid curves to the HAM data set.
The difference between these two data sets is that ISIC is much smaller than HAM. You can see that if we only have real patient data, which is the graph on the left, and we only use 20% of the data, the performance is much worse than if we had 100% of the data.
This shows that the more real patient data we have, the better. If we consider a simulated scenario where we replace portions of the real patient data with synthetic data, which is the graph in the middle, we find that if the model is trained only on synthetic data, the performance is not that great. But if even 10 to 20% of the real data is used and the rest is synthetic, the performance is actually nearly the same as if the entire patient data set were used.
And finally, if we add synthetic data to the full real data set, which you can see on the right, the performance also improves. So we find that the addition of synthetic data to small patient data sets generally results in improved performance for this task. This concludes the S-SYNTH project.
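The replacement and addition experiments described above can be organized along these lines; a minimal sketch assuming simple in-memory lists of samples, not the actual training code.

```python
import random

def build_training_set(real, synthetic, real_fraction, total_size, seed=0):
    """Replacement experiment: keep `real_fraction` of the budget as real patient
    samples and fill the remainder with synthetic samples."""
    rng = random.Random(seed)
    n_real = int(round(total_size * real_fraction))
    return rng.sample(real, n_real) + rng.sample(synthetic, total_size - n_real)

# Toy "datasets" of sample IDs; in practice these would be image/mask pairs.
real = [f"real_{i}" for i in range(1000)]
synthetic = [f"synth_{i}" for i in range(1000)]

for frac in (0.0, 0.1, 0.2, 0.5, 1.0):
    train = build_training_set(real, synthetic, frac, total_size=500)
    print(frac, sum(s.startswith("real") for s in train), "real samples of", len(train))
```

Sweeping `real_fraction` while holding the total budget fixed is what lets one ask how much real data can be replaced by synthetic data before performance degrades.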
Challenges and Future Considerations in Synthetic Data 🔗
Elena Sizikova: There are, however, a number of remaining challenges and considerations that are important to look at with the use of synthetic data for these and other applications. There are a few I want to highlight. How do we evaluate synthetic data sets? What are the appropriate metrics, and how do we actually check whether they can or cannot be used in a particular context?
Many of the synthetic data methods we see today are image-based. How do we address the limitations of image-based models? Do we need to address them, and how do we address the dependency on, for example, the acquisition device used to collect a particular data set when we want to generalize? And in the context of knowledge-based and simulation-based methods:
how can we simulate more realistic knowledge-based data so that it can be fully used in a practical context? Responsible use of synthetic data in AI development has promise for resolving gaps in real patient data availability and variability. But not all synthetic data is alike, and it's important to consider the context of use and the risk associated with the particular application where the synthetic data would be used.
Conclusion and Q&A Session 🔗
Elena Sizikova: We have developed and disseminated several regulatory science tools and applications that contain knowledge-based pipelines as a proof of principle for developers using these synthetic models, and I encourage you to look at the Regulatory Science Tool catalog to learn more about them. So this concludes my talk. Please feel free to send me an email for the slides, and I think this would be a good time to open up for questions. Thank you.
Isaac Miller: Yes, thank you. So far we have two questions. If there are any more questions, please place them in the Q&A. I did get some requests for the slides; my department cannot share the slides, and we will not be able to post them, but as Elena said, if you want to email her for the slides, that will be fine.
First question: why might knowledge-based methods produce unrealistic results, while generative methods produce more realistic results?
Elena Sizikova: So this is a great question. If you look at the synthetic images here, for example, this is a skin model. It's hard for me to tell because I look at these images all the time, but they generally don't look like a realistic image of skin, right? To some extent it depends on the resolution.
To simulate a realistic image, it's important to model characteristics such as light reflection and the underlying parameters, and tuning these parameters is very hard. Whereas if you start with patient images and learn a generative model on top of them, it is easier to learn that kind of distribution, because you already start from a set of patient imaging data and then generate the results.
Isaac Miller: Thank you. Next question. You mentioned that knowledge based methods allow better control over biases as compared to generative methods that usually incorporate biases from the training data. Could you elaborate more on that? Additionally, which part of the pipeline does this fit into?
Elena Sizikova: Again, I would give an example from the S-SYNTH framework. I kind of glossed over this, but here is the distribution of skin color in one of the public data sets that we use, as well as the synthetic data distribution that we generated. You can see that the real distribution is not balanced by skin tone. Specifically, if you're learning, say, a diffusion model on skin images distributed by the skin tones shown in the real data distribution on the left, then you're probably going to get that same type of distribution as output. Whereas in knowledge-based models we can choose to manipulate the melanosome fraction, as well as the imaging, to control the distribution, which would be much more difficult to do
if you already have a fixed set of images with very few dark-skinned patients.
Isaac Miller: Thank you. Next question: how does adding or replacing real data with synthetic data differ from using data augmentation methods?
Elena Sizikova: So this is a great question. You can think of synthetic data as a form of data augmentation, a modified form of data augmentation. In the experiments with addition or replacement, again, we systematically vary the proportions of synthetic data and patient data subsets. I think one key difference is how the synthetic data is generated.
For example, a common data augmentation method that is not synthetic-data-based is flipping the images and then using the resulting flipped images, in addition to the original ones, to improve performance. But you're really using the same distribution in terms of, for example, skin tones, so that doesn't really change. If you're augmenting with data that comes from a GAN or another generative model, you're also adding a very similar distribution of data to the original training patient data. Whereas if you're using simulations or external information, you're adding other distributions to try to balance or improve performance, either overall or for a particular subgroup.
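The distinction can be seen in a few lines of code; flipping reuses the same distribution, while a simulator can be asked for samples the patient set lacks. Everything below is a toy illustration: `render_skin` is a hypothetical stand-in for a knowledge-based simulator, not a real one.

```python
import numpy as np

rng = np.random.default_rng(1)
patient_images = [rng.random((8, 8)) for _ in range(4)]   # toy patient "images"

# Classical augmentation: new arrays, but the same underlying distribution
# (skin tones, acquisition conditions, ...) as the originals.
augmented = patient_images + [np.fliplr(img) for img in patient_images]

def render_skin(melanosome_fraction):
    """Hypothetical stand-in for a knowledge-based simulator: a higher
    melanosome fraction yields a darker toy image."""
    return (1.0 - melanosome_fraction) * rng.random((8, 8))

# Knowledge-based synthesis: explicitly request underrepresented subgroups,
# e.g. darker skin tones missing from the patient set.
synthetic = [render_skin(f) for f in (0.25, 0.35, 0.45)]
training_set = augmented + synthetic
print(len(training_set))   # 8 augmented + 3 synthetic = 11 samples
```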
Isaac Miller: Right, thank you. Could we use a generative method to create a data set for performance evaluation for an FDA submission?
Elena Sizikova: That's a great question. I wouldn't be qualified to answer regulatory questions on this webinar, but there are submissions we see at FDA that use synthetic data, so it's an emerging trend, and oftentimes the synthetic data is used for training. One of the important things to consider is, if you're using synthetic data, what information do you have about how the synthetic data was created?
It's not that different whether you're using AI, in which case you're just using a different type of AI to make data, or a simulation framework: again, what is it that you're doing, how is it used, and what are the associated risks and contexts of use?
Isaac Miller: Right, thank you. Are there any regulations from the FDA with regards to approvals when synthetic data is used to train the models?
Elena Sizikova: Again, this is a great question. At the moment, I believe we do not have a guidance specifically for synthetic data, but I would encourage you to look at the existing guidance documents I mentioned earlier to learn more about the current state.
Isaac Miller: All right, the next question has a couple of parts; I'm going to read them all. Great talk, thank you. Mixing synthetic and real data sounds like the least burdensome approach. How do you validate synthetic data generation? Is it a mostly quantitatively driven assessment, like the breast image example given earlier?
Elena Sizikova: Again, as we see more synthetic data use, it's important to consider how it is used and in what context, and we are learning this as the industry expands and more applications appear. The scorecard project is an initial effort to come up with a standardized evaluation and reporting framework for synthetic data. Depending on the type of application, it may be important to look at either task-based assessment or non-task-based assessment. So I think the answer here is: it depends on your task.
Isaac Miller: All right, thank you. Next question: to date, has FDA given marketing authorization to any devices that included synthetic data in training?
Elena Sizikova: I won't be able to answer that. Sorry.
Isaac Miller: This is more of a note: they want to see your email address again, so please put the slide with your email address back up. Right, next question. These are all rolling in, and I think we have ten more minutes to go. Are you fine with that, Elena?
Elena Sizikova: Absolutely, yeah.
Isaac Miller: Okay, great. For the synthetic images generated and downstream analysis: were you just experimenting with different proportions of synthetic plus real images and how they affect performance? For example, whether adding 50% synthetic data improved performance on the downstream task.
Elena Sizikova: So we did experiment with that. The slides have a very brief overview, but for example, the closest here is 30/70, 40/60, 60/40; these splits are proportions of real patient and synthetic data sets, and we look at what happens to the Dice performance.
That's what we were doing. And I think the replacement experiment is actually a very nice way to look at this: we have the full data set available for each case here, so in a hypothetical scenario where we didn't have it, for example in a new use case, we can ask what would happen if we actually used synthetic data to supplement the limited patient data.
And we have a reference standard, an oracle, to compare against: if we had the full data set, this would be the final performance.
Isaac Miller: Are these data sets specific for some spectrum of imaging equipment, e.g. what an image obtained on 0.72 and 1.5 to MRI, MRI give you similar results after training and validation.
Elena Sizikova: So we haven't really considered, MRI simulation in sort of these projects. And one of the reasons is that it's challenging to to be able to simulate sort of a very specific, MRI machine in comparison to, for example, for a mammography image, where in mammography, we, we had we use, very specific, detectors that were they could have a particular manufacturer just as an example.
But again, it sort of depending on the application and you really want to think whether it's mammography, whether it's MRI, whether it's skin, you really want to you probably know best what are the differences across the detectors or imaging devices. And so and how these differences would affect the downstream performance.
Isaac Miller: Thank you. Is there a correlation of model A you see in the higher spectral resolution spectral excuse me spectral resolution of the data set. Does this apply only to patient derived versus synthetic sources.
Elena Sizikova: So if you generate synthetic images. So absolutely a resolution matters right. We can downsample these images to, you know, very few pixels kind of the resolution size. And then we won't be really able to see the masses. So we haven't explicitly conducted an experiment, where we compare the kind of the effect of resolution. But actually it would be an interesting experiment to conduct.
And the other thing I wanted to note is that we are able to simulate, images in both masses, in both the mammography project and the skin project at extremely high resolution. And then for the downstream AI performance, we actually downsample down to a smaller size because sort of two bits, because we were constrained by the architecture. But absolutely, the resolution is important and the simulation actually allows you and us to compare some of these effects versus if you were looking, for example, at a data driven model, it may be constrained to the resolution of the input image.
And so it would be more difficult to generate images at a higher resolution without sort of appropriate super resolution techniques.
Isaac Miller: Great. Thank you. This next one I'm not really sure answer, but I'm going to ask it anyways. Will FDA accept a submission package of AI as medical device where proportion of the data used for AI training is synthetic?
Elena Sizikova: So again, this is a kind of a more, where you talk. So, the FDA has a mechanism called regulatory mechanism called u sub. And so any kind of questions that would be for a particular device. That would be a great way to sort of interact with FDA to ask these type of questions. But I do want to emphasize that synthetic data, again, it's not used in every submission. But certainly we do see submissions with synthetic data.
Isaac Miller: Thank you. Are there models that combine both knowledge-based methods and generative methods?
Elena Sizikova: That is also a great question. There are such models; oftentimes they're called hybrid or physics-informed models, and there are many different names for them. Specifically for radiology, you can look at the survey of techniques; you can see the title on the top right.
Hybrid models actually allow you, in many cases, to take the best of both worlds, and I hope more models are developed in this domain.
Isaac Miller: Is the evaluation of synthetic data different from validation of synthetic data?
Elena Sizikova: I think this is a terminology question. I think the key distinction would be whether the synthetic data is evaluated on its own or in comparison to the patient data.
Isaac Miller: Thank you. Could you elaborate more on the concept of null space?
Elena Sizikova: The null space is the concept shown here. You start with an object space, for example a breast or another kind of object from a patient, and then you image it into an imaging space, which is lossy. Here you just see a non-medical example, but imagine the set of information that is lost, such as a certain piece of the object.
By learning on the lossy measurements, you're not necessarily learning to generate the full model, which may pose various downstream issues.
In addition, if your model is under-constrained, both null space and hallucination issues can occur. So, depending on how much they actually affect your performance, you may want to account for them.
Isaac Miller: Thank you. We're getting kind of short on time, so I think we'll maybe do two more questions. For the breast example: if an AI detection algorithm is trained only on real cases and tested on synthetic data, does it give the same result as if it were tested on real cases, assuming the test and synthetic data sets are matched?
Elena Sizikova: This is also a great question that really gets at this project. Unfortunately, I don't have that graph; I only have the synthetic-to-synthetic results. But in practice we found that no, for the breast project, at the moment this is not the case. If you look at the breast images here, there are other important processes that need to be applied to the images to make them more realistic.
When we qualitatively compared the synthetic mammography images to the INbreast and other patient images, they were clearly distinct. The M-SYNTH project was a demonstration to show that we can, in fact, use this pipeline and obtain performance trends similar to what we would have expected in patient data.
But more work needs to be done to improve the whole pipeline and make it more realistic, whether that's post-processing the images or adding things like calcifications, maybe to add more realistic confounders or other things that would close this synthetic-to-real gap.
Isaac Miller: Thank you.
Elena Sizikova: The synthetic-to-real gap is a big problem in many of these methods, but again, with enough work it may be addressed.
Isaac Miller: All right, we've got a minute left, so I'm going to ask the very last question here. Thank you so much. A rule of thumb I've heard from the FDA is that 50% of test data for AI/ML-enabled devices must come from the US. Is there a similar rule of thumb for the synthetic-to-real data ratio?
Elena Sizikova: Again, this is more of a regulatory question. I think, again, take a look at the Q-Sub mechanism and other official outreach mechanisms.
Isaac Miller: Great. Thank you, thank you. Thank you all for so many great questions; they're still rolling in, and I think we got to only about half of them. But thank you so very much for your presentation and for presenting your research and data, Dr. Sizikova; we really appreciate it, and good luck in your future research. It was a very informative Q&A session. Thank you, everybody, for attending our Grand Rounds. That's it for today. Thank you.
Elena Sizikova: Thank you so much.