How big data will haunt you forever: your high school transcript

Big data will show you the way.
Image: Reuters/Shannon Stapleton

Excerpted from Learning with Big Data: The Future of Education by Viktor Mayer-Schönberger, Kenneth Cukier.

Arizona State University, like many colleges across the United States, has a problem with students who enter their freshman year ill prepared in math. Though the school offers remedial classes, one-third of students earn less than a C, a key predictor that they will leave before getting a degree. To improve the dismal situation, ASU turned to adaptive-learning software by Knewton, a prominent edtech company. The result: pass rates zipped up from 64% to 75% between 2009 and 2011, and dropout rates were cut in half.

But imagine the underside to this seeming success story. What if the data collected by the software never disappeared and the fact that one had needed to take remedial classes became part of a student’s permanent record, accessible decades later? Consider if the technical system made predictions that tried to improve the school’s success rate not by pushing students to excel, but by pushing them out, in order to inflate the overall grade average of students who remained.

These sorts of scenarios are entirely possible. Some educational reformers advocate for “digital backpacks” that would have students carry their electronic transcripts with them throughout their schooling. And adaptive-learning algorithms are a spooky art. Khan Academy’s “dean of analytics,” Jace Kohlmeier, raises a conundrum about the “domain learning curves” used to identify what students know. “We could raise the average accuracy for the more experienced end of a learning curve just by frustrating weaker learners early on and causing them to quit,” he explains, “but that hardly seems like the thing to do!”
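Kohlmeier's conundrum can be made concrete with a few lines of arithmetic. The sketch below is only an illustration: the accuracy figures, the four learners, and the helper `mean_accuracy_at_step` are all hypothetical, not data or code from Khan Academy. It shows how the average at the far end of a learning curve rises when weaker learners drop out early, even though nobody has learned anything more.

```python
def mean_accuracy_at_step(learners, step):
    """Average accuracy at a practice step, counting only learners who got that far."""
    still_active = [curve[step] for curve in learners if len(curve) > step]
    return sum(still_active) / len(still_active)

# Hypothetical accuracy at practice steps 0, 1, 2 (illustrative numbers only).
strong = [0.7, 0.8, 0.9]
weak = [0.3, 0.4, 0.5]

# Scenario A: all four learners complete every step.
everyone_stays = [strong, strong, weak, weak]

# Scenario B: the weaker learners are frustrated and quit after the first step.
weak_quit_early = [strong, strong, weak[:1], weak[:1]]

final_step = 2
avg_all = mean_accuracy_at_step(everyone_stays, final_step)
avg_survivors = mean_accuracy_at_step(weak_quit_early, final_step)

print(round(avg_all, 2))        # average over all four learners
print(round(avg_survivors, 2))  # average over the two who remained
```

In this toy case the final-step average climbs from 0.7 to 0.9 purely because half the learners are no longer counted. The same mechanism is how a system could inflate a school's grade average by pushing struggling students out.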

Big data, the ability to collect, store and process more data than ever, is poised to overturn traditional education. It will add a quantified component to aspects of learning and teaching that have never been measured this way before, enabling society to improve not only student performance, but the instructor’s work as well. However, there are risks.

Parents and education experts have long worried about protecting the privacy of minors. Also, people have fretted over the consequences of academically “tracking” students, which potentially narrows their opportunities in life. Big data doesn’t simply magnify both of these problems: it changes their very nature. Here, as elsewhere, the change in scale leads to a change in state.

Permanence of the past

Many parents are viscerally alarmed by the huge stockpile of personal data that is starting to accumulate over the course of their children’s schooling. For example, the nonprofit organization inBloom—backed with $100 million by the prestigious Gates Foundation and Carnegie—struck agreements with nine states to be a repository of student data. But after huge parental outcry in 2013, six of those states put the initiatives on hold.

Yet behind the intuitive opposition lies not just the conventional concern over privacy and data protection, but a more distinctive worry. Where traditional data protection has mostly been focused on addressing the power imbalance that results from others having access to one’s personal data, here the concern is more about the threat posed by an unshakable past. School records may no longer be stored in cardboard boxes and left to molder before being thrown out: they may be saved forever, and continually called up at the speed of light.

Think about records of student activism being stored and made available to prospective employers when an individual applies for a job a quarter of a century later. Today past records are very hard to access, save for high-profile individuals. But in the future this information will be routinely accessible for everyone. And it may not be just “snapshot” data like standardized college admissions tests; it may be every scrap of data related to our progress as a student, from the number of sick days and visits to the guidance counselor, to the number of pages read and passages underlined in Huckleberry Finn.

Hence, the first significant danger with comprehensive educational data is not that the information may be released improperly, but that it shackles us to our past, denying us due credit for our ability to evolve, grow, and change. And there is no reliable safeguard against this danger. We can’t easily change how we evaluate others, and what we take into account. Most of our thought processes happen without our ability to fully control them rationally. On the other hand, not collecting or keeping the data would stunt the benefits that big data brings to learning.

Fixed futures

The second danger is equally severe. The comprehensive educational data collected on all of us will be used to make predictions about our future: that we should learn at this pace, in this sequence, that we will have a 90% likelihood of getting a B or above if we review the material between 8:00pm and 9:00pm, but that it drops to 50% if we do so earlier in the evening, and so on. This is probabilistic prediction, and the danger is that it may restrict our “learning freedom,” and ultimately, our opportunity in life.

The huge promise of big data is that it individualizes learning and improves educational materials and teaching, and ultimately student performance. In the age of big data, these predictions will be far more accurate than today. This puts more pressure on decision makers, from admissions boards to job recruiters, to put more stock in what they foretell. In the past we could argue that the traits of a group to which we belonged might not apply specifically to us as an individual.

For example, some universities are experimenting with “e-advisors”—big-data software systems that crunch the numbers to help students graduate. Since the University of Arizona implemented such a system in 2007, the proportion of students who move on from one year to the next has increased from 77% to 84%. At Austin Peay State University in Tennessee, when students take a class for which software called Degree Compass indicates they will get at least a B, they have a 90% chance of doing so, compared to around 60% otherwise.

These systems can make a big difference in graduation rates, considering that in the United States only about half of students graduate within six years. But they can have pernicious consequences too. What if the system predicts we’re not likely to do well in one field, like bioinformatics, and so subtly directs us toward another, like nursing? We may think it has our best interests at heart, providing us with a comfortable educational trajectory. But that may actually be the problem. Perhaps we should be pushed to succeed against the odds rather than feel content to advance along a smoother track.

One hope, and it’s just a hope, is that big data will make tracking disappear. As students learn at their own pace, and the sequence of material is algorithmically optimized so they learn best, we may see less need to formally track students.

But the reality could well be the inverse. Customized education may actually lock in these streams more ruthlessly, making it harder for students to break out of a particular track even if they wanted to and were able to. There are now a billion different tracks: one for every individual student. The upside is that education is custom tailored to each individual. The downside is that it may actually be harder to leap out of the canyon-like groove we’re locked into. We’re still trapped in a track, even if it is a bespoke one.

Addressing the anxiety

How can we overcome these instinctual and rational fears of the dangers big data poses when applied to education?

In most countries, some form of privacy law currently protects against the comprehensive collection and long-term storage of personal information. Generally, these laws require data users to inform people whose data they collect what it might be used for and get their consent for that use. But much of the appeal of big data is that its value lies in its reuse for purposes that were scarcely contemplated when the data was initially gathered. So, informed consent at the time of collection is often impossible.

Policymakers in Europe and the United States are already discussing how to reform privacy laws to make those who use big data more accountable for any misuse of it. In return for taking on more responsibility (and thus more legal liability), data processors would be able to reuse personal information for new purposes. They would need to define which categories of use are acceptable and which are restricted.

In education, this could permit the use of personal data to improve learning materials and tools, while using the same data to predict students’ future abilities may be allowed only under much more stringent safeguards (such as transparency and regulatory oversight). It may require the explicit consent of the students themselves. It will also need tough enforcement, so that firms that use the data know that they cannot afford to break the rules.

Ultimately, how much big-data analysis we would like to see in education, and how we best protect against the dystopian dangers we foresee, will remain a delicate tradeoff between our desire to optimize learning and our refusal to let the past dictate the future.