Excerpted from BIG DATA: A Revolution That Will Transform How We Live, Work, and Think by Viktor Mayer-Schönberger, Kenneth Cukier.
Computer systems currently base their decisions on rules they have been explicitly programmed to follow. Thus when a decision goes awry, as is inevitable from time to time, we can go back and figure out why the computer made it. For example, we can investigate questions like “Why did the autopilot system pitch the plane five degrees higher when an external sensor detected a sudden surge in humidity?” Today’s computer code can be opened and inspected, and those who know how to interpret it can trace and comprehend the basis for its decisions, no matter how complex.
With big-data analysis, however, this traceability will become much harder. The basis of an algorithm’s predictions may often be far too intricate for the average human to understand.
When computers were explicitly programmed to follow sets of instructions, as with IBM’s early program for translating Russian into English in 1954, a human could readily grasp why the software substituted one word for another. But Google Translate incorporates billions of pages of translations into its judgments about whether the English word “light” should be “lumière” or “léger” in French (that is, whether the word refers to brightness or to weight). It’s impossible for a human to trace the precise reasons for the program’s word choices because they are based on massive amounts of data and vast statistical computations.
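To make the contrast concrete, here is a deliberately tiny sketch of how a purely statistical system might pick between the two French words. The “corpus” is invented for illustration; a real system uses billions of examples. The point is that the choice emerges from counting context overlap, not from any rule a human could point to:

```python
# Toy sketch of corpus-driven word choice: pick the French rendering of "light"
# whose previously seen contexts best match the sentence at hand.
# The corpus below is invented purely for illustration.
corpus = {
    "lumière": ["bright light shone", "the light faded", "light filled the room"],
    "léger":   ["a light bag", "light as a feather", "a light meal"],
}

def translate_light(sentence):
    words = set(sentence.lower().split())
    scores = {}
    for french, examples in corpus.items():
        # Score each candidate by overlap between the sentence and the
        # context words observed alongside that choice.
        context = set(w for ex in examples for w in ex.split()) - {"light"}
        scores[french] = len(words & context)
    return max(scores, key=scores.get)

print(translate_light("the bright light in the room"))  # lumière
print(translate_light("pack a light bag"))              # léger
```

Even in this toy, the “reason” for a choice is just a tally over data; scale that up by nine orders of magnitude and the tally becomes untraceable.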
Big data operates at a scale that transcends our ordinary understanding. For example, the correlation Google identified between a handful of search terms and the flu was the result of testing 450 million mathematical models. In contrast, Cynthia Rudin initially designed 106 predictors for whether a manhole might catch fire, and she could explain to Con Edison’s managers why her program prioritized inspection sites as it did. “Explainability,” as it is called in artificial intelligence circles, is important for us mortals, who tend to want to know why, not just what. But what if instead of 106 predictors, the system automatically generated a whopping 601 predictors, the vast majority of which had very low weightings but which, when taken together, improved the model’s accuracy? The basis for any prediction might be staggeringly complex. What could she tell the managers then to convince them to reallocate their limited budget?
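The worry about hundreds of weakly weighted predictors can be illustrated with a small simulation (the numbers are invented). Each synthetic predictor agrees with the true outcome only 52 percent of the time, so any one of them is nearly useless; yet a simple vote over 600 of them is far more accurate, and no single predictor can be pointed to as the reason:

```python
import random

random.seed(42)

N_WEAK = 600      # hypothetical count of weak predictors, echoing the 601 in the text
N_SAMPLES = 2000  # synthetic cases to score

# Each weak feature matches the true label only 52% of the time.
def make_sample():
    label = random.choice([0, 1])
    feats = [label if random.random() < 0.52 else 1 - label for _ in range(N_WEAK)]
    return feats, label

data = [make_sample() for _ in range(N_SAMPLES)]

# Accuracy using a single weak predictor: barely better than a coin flip.
single = sum(feats[0] == label for feats, label in data) / N_SAMPLES

# Accuracy of a majority vote over all 600, each carrying a tiny equal weight.
combined = sum((sum(feats) > N_WEAK / 2) == bool(label)
               for feats, label in data) / N_SAMPLES

print(f"one predictor: {single:.2f}")
print(f"600 combined:  {combined:.2f}")
```

The combined model is right far more often, but its accuracy is a property of the ensemble, which is exactly what makes it hard to explain to a budget manager.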
In these scenarios, we can see the risk that big-data predictions, and the algorithms and datasets behind them, will become black boxes that offer us no accountability, traceability, or confidence. To prevent this, big data will require monitoring and transparency, which in turn will require new types of expertise and institutions. These new players will provide support in areas where society needs to scrutinize big-data predictions and enable people who feel wronged by them to seek redress.
As a society, we’ve often seen such new entities emerge when a dramatic increase in the complexity and specialization of a particular field produced an urgent need for experts to manage the new techniques. Professions like law, medicine, accounting, and engineering underwent this very transformation more than a century ago. More recently, specialists in computer security and privacy have cropped up to certify that companies are complying with the best practices determined by bodies like the International Organization for Standardization (which was itself formed to address a new need for guidelines in this field).
Big data will require a new group of people to take on this role. Perhaps they will be called “algorithmists.” They could take two forms—independent entities to monitor firms from outside, and employees or departments to monitor them from within—just as companies have in-house accountants as well as outside auditors who review their finances.
These new professionals would be experts in the areas of computer science, mathematics, and statistics; they would act as reviewers of big-data analyses and predictions. Algorithmists would take a vow of impartiality and confidentiality, much as accountants and certain other professionals do now. They would evaluate the selection of data sources, the choice of analytical and predictive tools, including algorithms and models, and the interpretation of results. In the event of a dispute, they would have access to the algorithms, statistical approaches, and datasets that produced a given decision.
Had there been an algorithmist on staff at the Department of Homeland Security in 2004, he might have prevented the agency from generating a no-fly list so flawed that it included Senator Kennedy. More recent instances where algorithmists could have played a role have occurred in Japan, France, Germany, and Italy, where people have complained that Google’s “autocomplete” feature, which produces a list of common search terms associated with a typed-in name, has defamed them. The list is largely based on the frequency of previous searches: terms are ranked by their mathematical probability. Still, which of us wouldn’t be angry if the word “convict” or “prostitute” appeared next to our name when potential business or romantic partners turned to the Web to check us out?
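The ranking the authors describe, ordering suggestions by the frequency (and hence the empirical probability) of past searches, takes only a few lines to sketch. The name and query counts below are invented for illustration:

```python
from collections import Counter

# Hypothetical search log: queries previously typed after a person's name.
search_log = [
    "john doe lawyer", "john doe lawyer", "john doe lawyer",
    "john doe convict", "john doe convict",
    "john doe charity",
]

# Count how often each completion follows the name, then rank by frequency,
# i.e. by the empirical probability of each suggestion.
counts = Counter(q.removeprefix("john doe ") for q in search_log)
total = sum(counts.values())
suggestions = [(term, n / total) for term, n in counts.most_common()]

for term, prob in suggestions:
    print(f"{term}: {prob:.2f}")
```

Nothing in such a ranking asks whether a suggestion is true or fair; it only asks how often it was typed, which is precisely the complaint the defamation suits raise.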
We envision algorithmists as providing a market-oriented approach to problems like these that may head off more intrusive forms of regulation. They’d fill a need similar to the one accountants and auditors filled when they emerged in the early twentieth century to handle the new deluge of financial information. The numeric onslaught was hard for people to understand; it required specialists organized in an agile, self-regulatory way. The market responded by giving rise to a new sector of competitive firms specializing in financial surveillance. By offering this service, the new breed of professionals bolstered society’s confidence in the economy. Big data could and should benefit from a similar confidence boost provided by algorithmists.
We envision external algorithmists acting as impartial auditors who review the accuracy or validity of big-data predictions whenever the government requires it, such as under court order or regulation. They could also take on big-data companies as clients, performing audits for firms that want expert support. And they might certify the soundness of big-data applications like anti-fraud techniques or stock-trading systems. Finally, external algorithmists could consult with government agencies on how best to use big data in the public sector.
As in medicine, law, and other occupations, we envision that this new profession would regulate itself with a code of conduct. The algorithmists’ impartiality, confidentiality, competence, and professionalism would be enforced by tough liability rules; if they failed to adhere to these standards, they’d be open to lawsuits. They could also be called on to serve as expert witnesses in trials, or to act as “court masters,” experts appointed by judges to assist them in technical matters on particularly complex cases.
Moreover, people who believe they’ve been harmed by big-data predictions—a patient rejected for surgery, an inmate denied parole, a loan applicant denied a mortgage—could look to algorithmists much as they already look to lawyers for help in understanding and appealing those decisions.
Internal algorithmists would work inside an organization to monitor its big-data activities. They would look out not just for the company’s interests but also for the interests of people affected by its big-data analyses. They would oversee big-data operations, and they’d be the first point of contact for anybody who feels harmed by their organization’s big-data predictions. They would also vet big-data analyses for integrity and accuracy before letting them go live. To perform these roles, algorithmists would need a certain level of freedom and impartiality within the organization they work for.
The notion of a person who works for a company remaining impartial about its operations may seem counterintuitive, but such situations are actually fairly common. The surveillance divisions at major financial institutions are one example; so are the boards of directors at many firms, whose responsibility is to shareholders, not management. And many media companies, including the New York Times and the Washington Post, employ ombudsmen whose primary responsibility is to defend the public trust. These employees handle readers’ complaints and often chastise their employer publicly when they determine that it has done wrong.
And there’s an even closer analogue to the internal algorithmist—a professional charged with ensuring that personal information isn’t misused in the corporate setting. For instance, Germany requires companies above a certain size (generally ten or more people employed in processing personal information) to designate a data-protection representative. Since the 1970s, these in-house representatives have developed a professional ethic and an esprit de corps. They meet regularly to share best practices and training and have their own specialized media and conferences. Moreover, they’ve succeeded in maintaining dual allegiances to their employers and to their duties as impartial reviewers, managing to act as data-protection ombudsmen while also embedding information-privacy values throughout their companies’ operations. We believe in-house algorithmists could do the same.
There are no foolproof ways to prepare fully for the world of big data; it will require that we establish new principles by which we govern ourselves. A series of important changes to our practices can help society as it becomes more familiar with big data’s character and shortcomings. We must design safeguards to allow a new professional class of “algorithmists” to assess big-data analytics, so that a world that has become less random by dint of big data does not turn into a black box, simply replacing one form of the unknowable with another.
Copyright © 2013 by Viktor Mayer-Schönberger and Kenneth Cukier. Reprinted with permission of Houghton Mifflin Harcourt Publishing Company. All rights reserved.