Do your AI/ML engineers have access to your Test Dataset? This may be a $100k+ mistake.
Your Test Dataset is the set of data that is never shown to the ML training algorithm during training. It is used to estimate the ML model's performance after training.
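In code, this is just a held-out split that training never touches. Here is a minimal sketch using scikit-learn; the synthetic data and logistic-regression model are placeholders for illustration only, not a stand-in for a curated clinical Test Dataset:

```python
# Minimal sketch of a held-out Test Dataset; synthetic data for illustration.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

# Set the test split aside up front; it is never shown to training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # trains on the training split only
print(model.score(X_test, y_test))  # post-training performance estimate
```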
FDA understands that if you reuse the Test Dataset too many times, you may be implicitly using it to train your model.
At the same time, FDA realizes Test Datasets can be very expensive to develop. They therefore DO allow you to reuse them, but only if you do so carefully.
Many AI/ML startups are not being careful. Their AI/ML engineers have direct access to the Test Dataset; often, anyone on the team can access this valuable data at any time. Furthermore, no records are kept of when or why the model's performance is being evaluated.
FDA could ask about your test data reuse. They could ask what procedures you had in place to prevent overfitting. Worst case, they may ask you to collect a new Test Dataset. For many AI/ML teams, this would be a $100k, $200k, or $500k problem.
To avoid this problem, we suggest that our clients follow FDA’s guidance:
> In the event that you would like the [FDA] to consider the reuse of any test data in your standalone evaluation, you should control the access of your staff to performance results for the test subpopulations and individual cases. It may therefore be necessary for you to set up a “firewall” to ensure those outside of the regulatory assessment team (e.g., algorithm developers) are completely insulated from knowledge of the [test data]. You should maintain test data integrity throughout the lifecycle of the product. This is analogous to data integrity involving clinical trial data monitoring committees and the use of a “firewall” to insulate personnel responsible for proposing interim protocol changes from knowledge of interim comparative results.
This quote actually appears in two different FDA guidance documents. Both are scoped to radiology-specific devices; however, I believe FDA will apply similar reasoning to other device types when it publishes guidance for them as well.
We haven’t seen FDA ask a startup to create a new Test Dataset, but since the fix is simple and the cost is high, it’s not worth the risk!
The firewall strategy can be quite simple:
- Limit access to the Test Dataset.
- Have your engineers write a script that evaluates their model (see the sketch after this list).
- Designate someone else to be responsible for running that script and selectively reporting results.
- Each time they run the script, they should create a record that includes:
  - When the data was accessed
  - Why the test data was used
  - How well the model performed (optional)
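To make this concrete, here is a minimal sketch of what such a firewalled evaluation runner might look like. Everything in it (the restricted paths, the `evaluate` function the engineers hand over, the CSV access log) is a hypothetical placeholder, not a prescribed implementation:

```python
# Minimal sketch of a firewalled evaluation runner. All paths and names
# are illustrative assumptions; adapt them to your own infrastructure.
import csv
import getpass
from datetime import datetime, timezone
from pathlib import Path

# Placeholder for an access-controlled location that only the regulatory
# assessment team can read; engineers never touch it directly.
TEST_DATA_DIR = Path("/restricted/test_dataset")
ACCESS_LOG = Path("/restricted/test_access_log.csv")

def run_firewalled_evaluation(evaluate, reason, record_metrics=False):
    """Run the engineers' evaluation function and log the access.

    `evaluate` is the script/function the developers provide; it takes
    the test data location and returns a dict of performance metrics.
    """
    metrics = evaluate(TEST_DATA_DIR)

    # Append an access record: when, who, why, and (optionally) results.
    is_new = not ACCESS_LOG.exists()
    with ACCESS_LOG.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["timestamp_utc", "user", "reason", "metrics"])
        writer.writerow([
            datetime.now(timezone.utc).isoformat(),
            getpass.getuser(),
            reason,
            metrics if record_metrics else "withheld",
        ])

    # The assessment team, not the engineers, decides what (if anything)
    # from `metrics` gets reported back to the development side.
    return metrics
```

The key design point is that the engineers supply `evaluate` but never see the restricted data or the full results; the person running it controls what crosses the firewall.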
All of this would (ideally) be written up in a formal process.