It’s one thing when companies use algorithms to personalize ads for shoes or tag you in a Facebook photo—it’s quite another when algorithms get to decide whether to release you on bail or send you to prison. While many have voiced concerns about using algorithms to inform such high-stakes decisions, few have discussed how we can determine if they can actually be used safely and effectively.
As health-tech professionals, we think it’s useful to view these high-stakes algorithms as prescription drugs or medical devices. For decades, pharma and biotech companies have tested drugs through meticulously fine-tuned clinical trials. Why not take some of those best practices and use them to create algorithms that are safer, more effective, and even more ethical?
Both algorithms and drugs can have an enormous impact on human lives—the trick is in balancing the benefits and risks. Chemotherapy drugs, for example, can shrink a patient’s tumor, but they can also cause harrowing side effects. Algorithms are much the same. An algorithm used in child-protection services to predict endangerment may save someone from violence, but misplaced scrutiny could be unnecessarily intrusive to families.
But unlike with risky drugs and medical procedures, we’re interacting with algorithms without reading the warning labels, because there aren’t any. An oncologist would likely never recommend invasive chemotherapy drugs for a healthy person; it is only when treating an aggressive cancer that we consider the risks and rewards. When considering high-stakes applications of technologies, architects of AI systems could take a similar approach that accounts for both the risks and potential benefits.
Because algorithms and drugs have commonalities, we can learn from existing, successful regulatory paradigms in health care to think about algorithms. Here are some of the ways in which they are similar.
They affect lives. Like drugs and devices, high-stakes algorithms have the potential to transform a user’s world in significant ways. Algorithms have already been developed to make recommendations about whether defendants should be released on bail, to determine health-care benefits, and to evaluate teachers.
They can be used as medical treatment. There’s a whole new field of software-driven treatments called digital therapeutics (DTx), which are software programs that prevent, manage, or treat a disease. For instance, Akili Interactive Labs has created a DTx that looks and feels like a video game to treat pediatric ADHD. Some have even been cleared by the US Food and Drug Administration (FDA), like reSET-O, a product from Pear Therapeutics for patients with opioid-use disorder.
They perform differently on different populations. Some drugs work on one population but not another. For example, the blood thinner clopidogrel, or Plavix, doesn’t work in the 75% of Pacific Islanders whose bodies don’t produce the enzyme required to activate the drug. Similarly, algorithms can affect different populations differently because of algorithmic bias. For example, Amazon’s AI-driven recruiting platform had a systematic bias against female-oriented words in resumes.
Researchers have responded to such findings by crafting strategies for detecting and reducing algorithmic bias. Outlets like ProPublica have investigated problematic algorithms, while computer scientists have created academic conferences and research centers focused on the topic. These initiatives represent steps in the right direction, just as growing momentum for more representative clinical trials is likely to lead to better medical research.
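To make the idea of detecting bias concrete, here is a minimal sketch of one common check: comparing an algorithm's false-positive rate across groups. The function name, the group labels, and the toy records are all invented for illustration; a real audit would use a validated dataset and more than one metric.

```python
from collections import defaultdict

def false_positive_rate_by_group(records):
    """Compute the false-positive rate per group.

    records: iterable of (group, predicted_positive, actually_positive).
    Groups with no true negatives are skipped, since the rate is undefined.
    """
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for group, predicted, actual in records:
        if not actual:
            negatives[group] += 1
            if predicted:
                false_positives[group] += 1
    return {g: false_positives[g] / n for g, n in negatives.items() if n > 0}

# Invented toy data, not drawn from any real system.
records = [
    ("A", True, False), ("A", False, False), ("A", False, False), ("A", True, True),
    ("B", True, False), ("B", True, False), ("B", False, False), ("B", True, True),
]

rates = false_positive_rate_by_group(records)
gap = max(rates.values()) - min(rates.values())
print(rates, gap)
```

A large gap between groups does not by itself prove that an algorithm is unfair, but it is the kind of signal that, in a clinical trial, would prompt a closer look at the affected subgroup.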
They can have side effects. Just as a drug that targets one condition can have side effects in another system, algorithms can also have unintended effects. In the online war for attention, a website that aims to increase engagement may retrospectively find that its machine-learning algorithms learned to optimize for anger- and fear-inducing content in order to increase time spent on the site. In the quest to make the product sticky, the side effect is an unwelcome behavior change, outrage, in its user population.
In drug development, manufacturers are required to prove the safety and effectiveness of drug products before they go on the market. But while the FDA has created and enforces such protocols, algorithms remain largely unregulated.
A few guiding principles help to illustrate how tools from drug development could be used to build safer, more effective algorithms:
Handling “adverse events.” Taking prescription drugs carries the risk of injury, hospitalization, and other adverse events, as the industry calls them. The FDA has a well-documented public reporting structure for handling such mishaps, where reports of serious events like death and hospitalization are voluntarily recorded in a public database.
But what do we do when the medical “input” is an algorithm? We currently lack strong public reporting tools to handle adverse algorithm outcomes—such as Facebook’s Cambridge Analytica crisis—but public databases could be created for common use cases.
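As a rough sketch of what one entry in such a public database might look like, the record below borrows the basic shape of a drug adverse-event report. Every field name, the severity scale, and the example system are hypothetical; no existing reporting standard for algorithms is implied.

```python
import json
from dataclasses import dataclass, asdict
from datetime import date

# Hypothetical severity scale, loosely modeled on drug-safety reporting.
SEVERITIES = ("minor", "serious", "critical")

@dataclass
class AlgorithmAdverseEvent:
    system_name: str   # which algorithm was involved
    use_case: str      # what it was being used for
    severity: str      # one of SEVERITIES
    description: str   # what went wrong, in plain language
    reported_on: str   # ISO 8601 date

def make_report(system_name, use_case, severity, description, reported_on):
    """Validate the severity and build a report record."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    return AlgorithmAdverseEvent(
        system_name, use_case, severity, description, reported_on.isoformat()
    )

report = make_report(
    "example-risk-score",            # invented system name
    "bail recommendation",
    "serious",
    "Score was applied to a population it was not validated on.",
    date(2020, 1, 15),
)
line = json.dumps(asdict(report), sort_keys=True)  # one line per report
print(line)
```

Keeping each report as a single structured line would make the database easy to publish and search, much as public adverse-event data for drugs can be queried today.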
Knowing how the product works. Before a manufacturer brings a drug to market, it needs to understand the biochemical interaction between the body and the drug. But this has never been the case for algorithms. Because of something called the “black box effect,” many machine-learning algorithms are difficult or even impossible to interpret. This needs to be acknowledged and addressed where possible. When we understand how the inputs are transformed into outputs, it’s easier to understand potential risks if the system malfunctions.
Understanding who the product is for. Clinical researchers must clearly define a drug’s target users so a prescribing clinician can have confidence that the drug has been successfully tested on similar patients. Similarly, well-designed algorithms should define the characteristics of the population in which they are intended to be used. Because algorithms may perform differently on populations for whom the algorithm was not developed, it is essential to ensure that algorithms specify and document which populations and use cases they apply to. Doing so provides confidence that the risks and rewards for the target group have been sufficiently studied and deemed an acceptable trade-off.
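One way to picture this kind of documentation is as a machine-readable "label" that a system checks before it is applied to a new case. The sketch below is an assumption on our part, not an established practice: the `IntendedUse` fields, the age range, and the region names are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IntendedUse:
    """Hypothetical label: the use case and population an algorithm was validated on."""
    use_case: str
    min_age: int
    max_age: int
    validated_regions: frozenset

def off_label_reasons(label, use_case, age, region):
    """Return the reasons a case falls outside the documented population.

    An empty list means the case is within the label.
    """
    reasons = []
    if use_case != label.use_case:
        reasons.append(f"use case '{use_case}' was not validated")
    if not (label.min_age <= age <= label.max_age):
        reasons.append(f"age {age} is outside {label.min_age}-{label.max_age}")
    if region not in label.validated_regions:
        reasons.append(f"region '{region}' was not in the validation data")
    return reasons

# Invented example label and cases.
label = IntendedUse(
    "pretrial risk screening", 18, 65, frozenset({"county-a", "county-b"})
)
in_scope = off_label_reasons(label, "pretrial risk screening", 30, "county-a")
out_of_scope = off_label_reasons(label, "pretrial risk screening", 16, "county-c")
print(in_scope, out_of_scope)
```

As with off-label prescribing, a non-empty list need not forbid use outright, but it tells the person relying on the output that the risks and rewards have not been studied for this case.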
Understanding how the product was developed. Clinical trials rely on public-trial registries and mandatory reporting by sponsors to support transparency. Such a system holds product developers accountable for conducting ethical studies and publishing their results. The developers of today’s high-stakes algorithms, though, often don’t share their validation methods publicly; because companies are so protective of their algorithmic IP, it’s often unclear how a product has been tested or whether the results are reproducible. But understanding how products are developed can help to clarify and mitigate unintended outcomes.
Informing users of the risks and benefits. The Belmont Report, written in 1979, outlines basic ethical principles involving human subjects in medical research, such as “informed consent.” But how aware are you of the subtle experiments being administered to you online? One could argue that Facebook A/B testing new design elements on its newsfeed to determine what to implement is a form of unconsented human-subject research.
The tech world’s version of informed consent consists of the privacy policies and terms of service that accompany many apps and websites. But these are rarely read by users, prompting some researchers to refer to them as “the biggest lie on the internet.” When the stakes are high, it is particularly important that these agreements be not only readable, but also read by users. Aiming to address the lack of data transparency, the European Union created the General Data Protection Regulation (GDPR), one of the world’s strongest data protection rules, which went into effect in May 2018 with the goal of giving individuals more power over their data. Under GDPR, companies assume more accountability for data protection and a clear responsibility to obtain the consent of the individuals from whom they collect information. Individuals can now request comprehensive information about what data a company has stored on them.
Protecting data rights and privacy. In health care, patients and clinicians have clear rights and governance rules for biospecimens like blood, urine, and genomes; researchers cannot use these specimens for research outside of studies and procedures that the patient has consented to. But there are no equivalents for “digital specimens,” which often contain highly sensitive individual data. For most technology products, data rights and governance are not clear to users, and high-stakes algorithms demand more than a one-size-fits-all approach. Data rights need to be baked into the product itself, and not blindly agreed to in a hurry to complete a sign-up process.
International quality and ethical standards have been well adopted across medical industries for decades: There’s Good Clinical Practice (GCP) for managing clinical trials, Good Manufacturing Practice (GMP) for products, and Good Laboratory Practice (GLP) for ensuring the consistency and reliability of research laboratories.
Is it time for a Good Algorithm Practice (GAP)? Should we establish an FDA for algorithms?
There are several barriers to making these governance structures work. Because of the ubiquity of AI across disciplines, one global regulatory body would be unrealistic; oversight can and should be tailored to each field of application.
The health-care industry, for one, is already well positioned to regulate the algorithms within its field. The FDA has started publishing detailed content, issuing guidelines, and clearing AI-driven products like digital therapeutics. Other industries with regulatory bodies, such as education and finance, could also be responsible for articulating best practices through guidance or even formal regulation. However, many other industries do not have regulatory agencies with public accountability. In unregulated settings, industry consortia and industry leaders will have to play an important role in articulating best practices.
Society is searching for ways to develop safe and effective algorithms in high-stakes settings. Although many questions remain, concepts and tools from clinical research can be used as a thought-provoking starting point.