What standardized testing and schools could learn from Target

Excerpted from Anya Kamenetz’s latest book, The Test: Why Our Schools Are Obsessed with Standardized Testing—But You Don’t Have to Be, published by PublicAffairs.

Technologies in use in about a third of US K-12 schools today have the potential, say their creators, to relieve some of the worst toll taken by standardized testing: the anxiety, coaching, and cheating, to give a few examples. This is about using software to reduce the inefficiencies of standardized testing and produce detailed, personalized feedback that emphasizes growth over achievement.

These technologies belong to the world of learning analytics, also known as adaptive learning software.

Valerie Shute, a professor at Florida State University and former principal research scientist at ETS, coined the term “stealth assessment.”

“In 1974 businesses had to close down once or twice a year to take stock,” she says. “Barcodes and automated checkout instead allowed for collection of a continuous stream of information. Business owners can aggregate data and examine the trends. It’s very, very powerful and allows for all sorts of just-in-time decision making.”

In the business world, this is dubbed predictive analytics and is heavily used by marketers. For example, by looking at patterns of purchasing items like body lotion and vitamin supplements, Target was able to create a “pregnancy score” that guesses whether a particular customer is pregnant, the better to start showering her with coupons for baby products and diapers.

The flow of interactions in a classroom is far richer than the flow of transactions in a Target. But that doesn’t mean similar tools can’t yield important insights, Shute says.

“If business can collect and use a continuous stream of information, why the hell can’t we in education do that too? Wouldn’t it be great to aggregate this information and make inferences about developing competencies at any point in time and at any grain size?”

Right now we bring school to a halt to take high-stakes tests. Maybe a student is nervous or had a bad night’s sleep. Or she may be a good crammer, but will have forgotten it all in three weeks. With stealth assessment, there is no big test day. Testing is simultaneous with learning. And the cost of testing is folded into the cost of teaching.

The big test companies agree: Software could replace standardized tests very soon. All that is currently lacking is the large-scale studies to validate the results of these programs by comparing them to students’ actual performance in school, as well as their scores on traditional tests. Kimberly O’Malley is the senior vice president of school research at Pearson Education. Pearson officially put together her research group in 2012 to get academics in the fields of learning, assessment, and educational technology working more closely together. “Invisible, integrated assessment, to me, is the future,” she says. “We can monitor students’ learning day to day in a digital scenario. Ultimately, if we’re successful, the need for and the activity of stopping and testing will go away in many cases.”

In 2009, at the first-ever Venture Capital in Education Summit at Stanford University, I met a brash dot-commer named Jose Ferreira. He painted his learning analytics company, Knewton, as nothing less than an educational messiah. “Look at what other industries the Internet has transformed,” he told me then. “Print, digital, video, music. Travel, hotels, restaurants, retail—anything with a big information component. But for whatever reason, people don’t see it with education. It is blindingly obvious to me that it will happen with education.” As he elaborated in a later interview: “All the content behind education is going to move online in the next 10 years. It’s one giant Oklahoma land grab—one big tectonic shift. And that is what Knewton is going to power.”

As a reporter for five years for the technology and business magazine Fast Company, I’ve heard those kinds of claims quite a few times. But Ferreira was one bigmouth who put his money where his mouth was. By the fall of 2013, Knewton’s software platform was available to the vast majority of the nation’s colleges and K-12 school districts through partnerships with three of the five major textbook publishers: Pearson, Macmillan, and Houghton Mifflin Harcourt.

Teaching machines

The dream of automated learning is even older than computers themselves. In 1924 an educational psychology professor named Sidney Pressey built a mechanical teaching machine that supplied questions, along with the correct answers, when a button was pressed. B.F. Skinner, the famous behaviorist, also introduced a teaching machine in 1954, a clunky thing that looked like a typewriter. He claimed some of the very same benefits for it that you hear from ed-tech entrepreneurs to this day—allowing every student to move at his own pace, supplying immediate feedback, improving motivation.

Computer programmers have been working on so-called intelligent tutoring systems since the 1970s, and they’ve gotten quite good for certain jobs. A 2011 review of research that carefully compared software-based tutoring to one-on-one tutoring with a human found that the resulting improvement in student performance was almost exactly the same in both cases: 0.76, or three-quarters of a standard deviation, for the computers, and 0.79 for the humans.

Today, besides the Knewton-powered products, other adaptive software is sold by Dreambox, Scholastic, and Khan Academy, a nonprofit. Most work more or less the same way. They introduce concepts with text, video or animations, ask students to respond in the form of quick short-answer or multiple-choice questions, give them increasingly broad hints when they get stuck, and choose what concept to bring in next based on the student’s responses. What the “adaptive” part means is that the specific selection of questions and order of content presented to each student will vary according to the students’ responses. You take a quick diagnostic test or start off with a medium-hard question. If you get it right, you proceed to harder questions; flub it and you get easier questions. As a result, each student’s path and pace through the material is slightly different.
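The "right answer, harder question; wrong answer, easier question" rule described above can be sketched in a few lines of Python. This is only a toy illustration, not how Knewton, Dreambox, or Khan Academy actually select items: the class name `AdaptiveSession`, the five difficulty levels, and the one-level step size are all assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class AdaptiveSession:
    """Toy item selector (hypothetical): difficulty goes up one level after a
    correct answer and down one level after a miss, within fixed bounds."""
    min_level: int = 1          # easiest questions (assumed scale)
    max_level: int = 5          # hardest questions (assumed scale)
    level: int = 3              # start with a medium-hard question
    history: list = field(default_factory=list)

    def record(self, correct: bool) -> int:
        """Log the response and return the difficulty of the next question."""
        self.history.append((self.level, correct))
        step = 1 if correct else -1
        # Clamp so the level never leaves the [min_level, max_level] range.
        self.level = max(self.min_level, min(self.max_level, self.level + step))
        return self.level


def path(responses):
    """Return the sequence of difficulty levels a student is served next."""
    session = AdaptiveSession()
    return [session.record(r) for r in responses]


if __name__ == "__main__":
    # Two students answering the same number of questions end up on different paths.
    print(path([True, True, False]))
    print(path([False, False, True]))
```

Even with a rule this crude, two students who answer differently see different sequences of questions, which is the sense in which each student's path through the material varies.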

These platforms log and analyze lots of data on each student—not just each right or wrong answer but every mouse click, every hover, every hesitation or deletion. Most come with dashboards showing at a glance how many math problems a student has solved, how many concepts within a lesson plan they’ve covered, how many times they logged in, how much time they’ve spent, and similar indicators.

Math and English mechanics (grammar, vocabulary, spelling) are the subjects most commonly taught with the help of such programs. But software engines can be used to help present any set of facts and concepts. A research group at Carnegie Mellon has created a set of adaptive college courses to automatically teach French, biology, statistics, logic and more.

Ferreira has long been a mortal enemy of standardized testing. A former professional poker player with a Harvard MBA, he started out in education working for Kaplan. At one point he discovered a flaw that turned a very technical math question on the GREs into child’s play. ETS deleted an entire section from the test and referred to him privately as “the Antichrist.” Kaplan turned the hack into the centerpiece of an international marketing campaign.

So it’s not surprising that Ferreira, too, believes his platform and those like it could reduce the need for high-stakes tests. “Knewton allows for much more gentle, passive data collection via ongoing formative assessment in homework/classwork,” he told me in an online chat. “We can predict your score on a bunch of these high stakes tests anyway, lessening the need for so many of them.”

“Formative assessment” refers to the feedback that is part of nearly any teaching and learning scenario, such as when a teacher calls on the class during the lecture, or when a student is studying vocabulary with flash cards and flips the card over to see the right answer. It’s opposed to summative assessment, which “sums up” learning at the end of a period of time, ranging from a unit test to a graduation exam.

The Angry Bird model

Most kids have experience getting ongoing formative assessment while playing video games. They can see at a glance how many Angry Birds they’ve already launched at the pigs, how many levels they have to go, their rankings compared to other players and their all-time personal best. Since this feedback is ongoing, it’s inherently more fluid. Games always offer the chance to try again and do better next time. Failure is part of the process.

I observed a seventh grade remedial math class in Los Altos, California, using Khan Academy software in the fall of 2012, that seemed to operate on these principles. Students could earn badges, little sun and moon graphics, for solving ten similar problems in a row, solving problems quickly, attempting harder material, or for helping each other out.

With a glance at her dashboard, the teacher, Courtney Cadwell, could easily pinpoint where students were struggling. She pulled these students out for intensive work in groups of one or two, while making sure everyone else was on task. The program supplied the students much more practice time than in years past. They progressed quickly—gaining an astonishing 2.5 to 3.5 grade levels in the first 12 weeks. At one point, while I was watching, the teacher had the whole class racing to solve as many fraction multiplication problems as possible, as they watched their collective progress zoom upwards on a screen at the front of the room. It was exciting and even fun.

Cadwell, a former NASA recruit, took advantage of the time saved by the Khan Academy system to introduce more demos and experiments that brought math concepts to life. She saw her remedial students, at an age when they were statistically likely to give up on math altogether, instead discover a love for the subject. It’s “kind of like a game,” John Martinez, 13, told me. “It’s kind of an addiction—you want a ton of badges.”

Software-mediated learning can be powerful. For example, when Knewton produced a remedial math course for college freshmen at the University of Arizona, half the students mastered the material four weeks or more ahead of schedule. Ferreira says this is because the course was presented to them at the exact pace they could handle and in the order that made the most sense to them, instead of a predetermined chunk of the syllabus each week. Most software platforms use the “mastery-based learning model,” which means you don’t move on to a new concept until you can demonstrate a good grasp of an underlying basic concept. Seems like a no-brainer for solid learning, except that’s not at all what happens in a typical class, where the pace for the group is dictated by the teacher. Even for the Arizona students who couldn’t pass the final exam, the system was able to identify who was making progress, and would pass given, say, half a semester more. Compared to, “you failed, now start over,” that’s the kind of feedback that can really reinforce a growth mindset.
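The mastery-based rule mentioned above, in which a student does not move on until demonstrating a grasp of the current concept, can be illustrated with a short Python sketch. The definition of mastery used here (a streak of consecutive correct answers) and the function name `concepts_cleared` are assumptions chosen for the example; real platforms may define mastery quite differently.

```python
def concepts_cleared(answers, streak_needed=3):
    """Count how many concepts a student clears under a toy mastery rule.

    answers: booleans in the order given, each for whatever concept the
    student is currently working on.
    streak_needed: hypothetical mastery threshold; a concept is cleared after
    this many correct answers in a row, and a wrong answer resets the streak.
    """
    cleared = 0
    streak = 0
    for correct in answers:
        streak = streak + 1 if correct else 0
        if streak == streak_needed:
            cleared += 1   # mastery shown, so move on to the next concept
            streak = 0     # the streak starts over on the new concept
    return cleared


if __name__ == "__main__":
    fast = [True] * 6
    slow = [True, True, False, True, True, True]
    print(concepts_cleared(fast), concepts_cleared(slow))
```

Under this rule the pace is set by each student's own answers rather than by a weekly syllabus, so a student with an unbroken run of correct answers clears concepts faster than one who stumbles and has to rebuild the streak.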

Drill and kill or grow and evolve?

Clearly, Team Robot doesn’t have all the answers. Learning-analytics platforms are, so far, largely customized for teaching times tables and “i before e except after c.” That makes them a complement, not a replacement, for a 21st century curriculum. They don’t handle truly creative or collaborative work. In the hands of a less skilled teacher than Courtney Cadwell in Los Altos, they have the potential to increase emphasis on drill and kill. And cheating could become as easy as hacking a computer.

The politics of software-centered testing are complicated too. The big textbook and test incumbents, alongside a raft of new startups like Knewton and large technology companies like Apple, are seizing on this stuff for a reason. The integration of assessment with materials, all delivered on an electronic device, is a huge sales opportunity. More technology tends to mean a greater role for private business in schools, as vendors and partners, for good and for ill.

Most troubling would be if computer-centered teaching and assessment were implemented as a means to improve the “efficiency” of education, that is, by hiring fewer teachers. There’s something a bit, well, bloodless about a vision that puts software in the driver’s seat in this way.

More to the point: if education’s purpose in the 21st century is to prepare students to excel at the very tasks that computers can’t master, it would follow that any task gradable by a computer is probably less than central to that mission. The optimal role of software is as an aid to human teaching, not a replacement for it. The difference between what a piece of technology can accomplish in the hands of a skilled and creative human practitioner and what machines can do alone is like the difference between an espresso crafted by a master barista operating a $24,000 La Marzocco and a cup of coffee from a vending machine.

While overhauling our accountability system may not be as simple as a software upgrade, there are things to love about Team Robot. The Scantron score sheets of the past focused on static “achievement” or even worse, “aptitude”; today’s technology is making it possible to focus on growth. “The snapshot in time has driven the accountability system in the past, and that has served its purpose for a while,” says O’Malley of Pearson. “We’ve really recommended looking at models of growth: not just where students are today, but where they’ve come from and where they need to go.”

You can follow Anya on Twitter at @anya1anya. We welcome your comments at ideas@qz.com.
