"Update Robot Doctor Now?" Gulp!
In April 2019 the US Food and Drug Administration proposed a new regulatory framework that would cover when a robot doctor gets a software upgrade.
Yikes!
Whether you call it a robot doctor or an AI/ML SaMD, these show up in your daily life in the form of devices and machines that use software to inform or even drive medical decisions, and in certain cases, even to treat or diagnose without the need for human intervention. These are definitely very different classes of device!
Most of the devices we encounter deal with non-serious healthcare situations: think of a bathroom scale that also reports your BMI. Sometimes, though, the healthcare situation is serious or even critical.
It makes sense to treat each of these situations, and also each of these classes of device, differently. The FDA proposal does exactly this.
What is in the new proposed regulatory framework?
On April 2, 2019, the US FDA posted the "Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD)" discussion paper on its website.
EXECUTIVE SUMMARY
The framework first categorizes the "risk" of the AI/ML-based SaMD as level I, II, III, or IV, from lowest to highest, based on a combination of how critical the healthcare situation is and how significant the SaMD's output will be to the final healthcare decision. The framework then categorizes three "types of modifications," which can help developers decide when a formal 510(k) review is required. The three types are i) Performance, ii) Inputs, and iii) Intended use.
PERTINENT DETAILS
The proposal has not yet been made into regulation, so it may change substantially or be withdrawn, but some aspects of the current proposal are pertinent to those working in the field. SaMDs are subdivided into three broad classes based on how significant the SaMD's output will be to the final healthcare decision: "Inform clinical management," "Drive clinical management," and "Treat or diagnose." These classes, further modified by the state of the healthcare situation or condition (Critical, Serious, or Non-serious), drive the clinical risk category (from I to IV).
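As a rough illustration of that two-factor categorization, here is a minimal Python lookup. The mapping below is my reading of the framework's IMDRF-style risk table, not text quoted from the document, so verify it against the proposal before relying on it.

```python
# Hypothetical lookup for the SaMD risk categorization described above.
# Keys: (state of the healthcare situation, significance of the information).
RISK_CATEGORY = {
    ("critical",    "treat or diagnose"):          "IV",
    ("critical",    "drive clinical management"):  "III",
    ("critical",    "inform clinical management"): "II",
    ("serious",     "treat or diagnose"):          "III",
    ("serious",     "drive clinical management"):  "II",
    ("serious",     "inform clinical management"): "I",
    ("non-serious", "treat or diagnose"):          "II",
    ("non-serious", "drive clinical management"):  "I",
    ("non-serious", "inform clinical management"): "I",
}

def samd_risk_category(situation: str, significance: str) -> str:
    """Return the SaMD risk category, from I (lowest) to IV (highest)."""
    return RISK_CATEGORY[(situation.lower(), significance.lower())]

# The BMI bathroom scale: non-serious situation, merely informs.
print(samd_risk_category("Non-serious", "Inform clinical management"))  # I
```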
Modifications are divided into three broad categories as follows:
Performance: Modifications related to performance, with no change to the intended use and no new input type. This may include re-training with new data sets within the intended use population, from the same type of input signal. As a "litmus test," this type of change will NOT alter any of the explicit use claims about the product.
Inputs: Modifications to inputs, with no change in intended use. These changes may also involve changes to the algorithm to handle new types of signals, but do not change the product's use claims. As examples, the document cites a "modification to support compatibility with CT scanners from additional manufacturers," or an atrial fibrillation diagnosis system that will now "include oximetry data in addition to heart rate data."
Intended Use: Modifications that result in a change in the significance of the information provided by the SaMD and/or a change to the healthcare situation or condition explicitly claimed by the manufacturer. Examples include an expanded patient population, such as including a pediatric population when the SaMD was initially intended for adults, or an expanded number of diseases or conditions, such as extending lesion detection from one type of cancer to another.
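To make the three categories concrete, here is a hypothetical triage helper in Python. The function name and the yes/no questions are my own distillation of the definitions above, not anything prescribed by the FDA.

```python
def categorize_modification(changes_intended_use: bool,
                            changes_input_types: bool) -> str:
    """Hypothetical triage of a SaMD modification into the three
    proposed categories, per the definitions summarized above."""
    if changes_intended_use:
        # e.g., expanding from an adult to a pediatric population
        return "Intended Use"
    if changes_input_types:
        # e.g., adding oximetry data alongside heart rate data
        return "Inputs"
    # e.g., re-training on new data from the same type of input signal
    return "Performance"

print(categorize_modification(False, False))  # Performance
print(categorize_modification(False, True))   # Inputs
print(categorize_modification(True, False))   # Intended Use
```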
Finally, the proposal also examines the total product life cycle, with an understanding that AI/ML-based technologies have the potential to transform healthcare by deriving new and important insights from the vast amount of data generated during the delivery of healthcare every day.
INVITATION FOR FEEDBACK BY THE US FDA
Although the entire document is open for discussion and anyone in the world may provide feedback, two major areas stand out as needing expert advice. The first is "GMLP," or Good Machine Learning Practices, which, if followed, help ensure that the SaMD is accurate, reliable, precise, and achieves its intended purpose. The second is the establishment of the SPS and ACP filings. The "SPS," or SaMD Pre-Specification, describes the anticipated modifications. The "ACP," or Algorithm Change Protocol, covers the steps used to make those modifications while controlling the risk of violating GMLP.
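To see what an SPS/ACP pairing might record, here is a sketch using Python dataclasses. The field names are my own invention for illustration, not an FDA-prescribed schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SaMDPreSpecification:
    """SPS: the modifications the manufacturer anticipates making.
    Field names are illustrative only."""
    anticipated_performance_changes: List[str] = field(default_factory=list)
    anticipated_input_changes: List[str] = field(default_factory=list)

@dataclass
class AlgorithmChangeProtocol:
    """ACP: the controlled steps for making those changes while
    managing risk. Field names are illustrative only."""
    data_management_plan: str = ""
    retraining_procedure: str = ""
    validation_procedure: str = ""
    rollout_and_rollback_plan: str = ""

sps = SaMDPreSpecification(
    anticipated_performance_changes=["quarterly re-training on new exams"],
    anticipated_input_changes=["support CT scanners from another vendor"],
)
acp = AlgorithmChangeProtocol(
    data_management_plan="curate and de-identify incoming exams",
    retraining_procedure="fixed architecture, frozen data splits",
    validation_procedure="must meet or exceed prior sensitivity/specificity",
    rollout_and_rollback_plan="staged deployment with rollback",
)
```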
Want to Get Involved?
Why not try your hand at some of these machine learning techniques? You can try them on Kaggle.com, a site owned by Google. Registration is free, and the software you write runs on Google's CPUs and GPUs, so you can use a very inexpensive PC or laptop to do very sophisticated machine learning. You can even enter some fun competitions (I like the Titanic competition to start out with).
Once you get the hang of it, you'll be making modifications like a pro. With regard to modifications (since robot upgrades are the topic of this article), let's take a look at how your Kaggle software might learn from these FDA recommendations.
As per the FDA document relating to Medical Devices, modifications are divided into three broad categories as follows:
Performance - Example: re-training with new data sets within the intended use population from the same type of input signal.
Inputs - Example: modifications to inputs and/or algorithms with no change in intended use.
Intended Use - Example: Change from a laboratory prediction to a real-world use and/or change from informing to driving or taking action.
For any of these changes, we want to make sure that the device is accurate, reliable, precise, and achieves its intended purpose. Because of this, we need to understand the meaning of these terms and relate them to our application.
Accuracy and Precision: Accuracy refers to the closeness of a measured value to a standard or known value. Precision, on the other hand, refers to the closeness of two or more measurements to each other. Precision is sometimes equated with "reliability," although the latter is a measure of how dependably an observation repeats exactly. Accuracy is sometimes equated with "achieving the intended purpose," although the latter assumes we have chosen a training set that generalizes to the real-world population.
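These definitions are easy to demonstrate numerically. In the NumPy sketch below (with made-up readings from a scale), accuracy is the closeness of the mean measurement to a known reference value, and precision is the spread of the repeated measurements around their own mean.

```python
import numpy as np

known_value = 70.0  # reference weight in kg (made up for illustration)
measurements = np.array([70.4, 70.5, 70.3, 70.6, 70.4])  # repeated readings

accuracy_error = abs(measurements.mean() - known_value)  # closeness to truth
precision_spread = measurements.std(ddof=1)              # closeness to each other

print(f"accuracy error:   {accuracy_error:.2f} kg")    # ~0.44 kg: biased high
print(f"precision spread: {precision_spread:.2f} kg")  # ~0.11 kg: very repeatable
```

A scale like this is precise but not accurate: the readings cluster tightly, yet they cluster around the wrong value.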
Using good machine learning practices learned from medicine
In the aforementioned FDA document, a number of documentation and procedural steps are discussed. These are definitely worth reading in detail, and if you are familiar with Kaggle competitions, you'll find much of this familiar. A brief summary follows.
For the initially created SaMD, the steps include:
- Data selection and management
- Model training and tuning
- Performance and clinical model validation
- Pre-market testing
For the modified SaMD, the steps include:
- Data for re-training
- All of the usual steps above, with the addition of SaMD Pre-Specifications and the Algorithm Change Protocol
Across the total product life cycle, the proposal would also:
- Establish clear expectations on quality systems and good ML practices (GMLP);
- Conduct pre-market review for those SaMD that require pre-market submission to demonstrate reasonable assurance of safety and effectiveness and establish clear expectations for manufacturers of AI/ML-based SaMD to continually manage patient risks throughout the life-cycle;
- Expect manufacturers to monitor the AI/ML device and incorporate a risk management approach and other approaches outlined in the "Deciding When to Submit a 510(k) for a Software Change to an Existing Device" guidance in the development, validation, and execution of the algorithm changes (SaMD Pre-Specifications and Algorithm Change Protocol); and
- Enable increased transparency to users and FDA using post-market real-world performance reporting for maintaining continued assurance of safety and effectiveness.
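If you want to mirror these steps in a Kaggle-style workflow, a minimal scikit-learn skeleton might look like the following. The dataset, model, and acceptance threshold are placeholders of my own choosing; the point is how each stage corresponds to the steps listed above, not the particular estimator.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Data selection and management: a fixed, held-out "pre-market" test set
X, y = load_breast_cancer(return_X_y=True)
X_dev, X_premarket, y_dev, y_premarket = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# Model training and tuning
model = RandomForestClassifier(n_estimators=200, random_state=0)

# Performance validation: cross-validate on the development set only
cv_scores = cross_val_score(model, X_dev, y_dev, cv=5)
print(f"cross-validated accuracy: {cv_scores.mean():.3f}")

# "Pre-market" testing: one evaluation on data never touched during tuning
model.fit(X_dev, y_dev)
premarket_accuracy = model.score(X_premarket, y_premarket)
assert premarket_accuracy >= 0.90, "fails the (made-up) acceptance threshold"
print(f"pre-market accuracy: {premarket_accuracy:.3f}")
```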
What does this mean for the software?
As a Kaggle competition entrant, documenting the items below will mostly suffice, and adherence to Good Machine Learning Practices (GMLP) should be something you are doing anyway. In competitions, the rules for submissions have already been set, whereas submission of medical trial documentation will have its own (lengthy) rules which must be adhered to. In short, it comes down to three things (with a small sketch after the list):
- Being clear about how we categorize the device and its intended use.
- Making plans for future modifications.
- Using good machine learning practices learned from medicine.
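Putting those three items together, even a lightweight record like the one below goes a long way, whether attached to a Kaggle kernel or kept for a future submission. The keys are my own suggestion, not a prescribed format.

```python
# A lightweight documentation record; the keys are illustrative only.
model_documentation = {
    "categorization": {
        "significance": "inform clinical management",
        "situation": "non-serious",
        "risk_category": "I",
    },
    "planned_modifications": [
        "monthly re-training on newly labeled data (Performance)",
        "add a second sensor stream (Inputs)",
    ],
    "gmlp_evidence": {
        "data_selection": "documented inclusion/exclusion criteria",
        "validation": "5-fold CV plus a frozen hold-out set",
    },
}
```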
An Example
If you need a concrete example, I took the liberty of creating one (a Kernel) for a recent "Earthquake Prediction" competition. Specifically, I examine "what can we learn from medical warning systems to help improve this earthquake warning system?" I have been an active user of Kaggle, and I have shared it with my math and engineering technology students. The Kaggle team over at Google was kind enough to recognize me with the moniker of "Kernels Expert," so hopefully this example will be as useful to you as it has been to others.
As a potentially lifesaving warning device, an earthquake prediction system has much in common with other lifesaving warning devices, most notably medical devices. There is additional commonality in that many data scientists and researchers work across several machine learning areas, so I expect participants analyzing geophysical data in this competition may also be interested in biomedical signal analysis.
Because of this, I wanted to share the aforementioned FDA document, published earlier this month. The FDA document helps us categorize the system, describe ongoing changes to it, and apply good machine learning practices.
I break down the components of the proposed regulatory framework, and use this example Kernel to demonstrate how it might impact your future work as a researcher, programmer, and data scientist.
https://www.kaggle.com/pnussbaum/earthquake-pred-cnn-medical-analogy-v07
P.S. Silly and serious images are all courtesy of Wikimedia.org.