After the tragedy and mayhem of last year’s Boston Marathon bombing, figuring out projected finish times for the race’s non-finishers was a low-priority task. But the attack left runners stranded across the city, distressed and confused, and without a sense of closure—a problem that a team of statisticians decided to tackle.
Their systems for calculating finish times, which used methods similar to the one Netflix uses to predict people’s movie preferences, could also be used to help future racers calculate their finish times as they run.
The study, published in March in PLOS One, is not the first to estimate runners’ pace, but the researchers say their model is more accurate. Throughout the Boston Marathon, race officials from the Boston Athletic Association (BAA) record each runner’s time at 5-km intervals; these checkpoint times are called “splits.” As runners pass each 5-km checkpoint, officials use the split to calculate their pace and generate an estimated finish time. This linear extension is called “constant pace,” and the researchers behind this study believe it’s flawed.
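To make the “constant pace” idea concrete, here is a minimal sketch in Python. It assumes a single checkpoint reading (distance covered and elapsed seconds) and simply stretches that pace over the full 42.195 km. The function names are illustrative and do not come from the BAA or the study.

```python
MARATHON_KM = 42.195  # standard marathon distance


def constant_pace_finish(split_km, split_seconds):
    """Project a finish time by assuming the pace so far holds to the end."""
    if split_km <= 0:
        raise ValueError("split_km must be positive")
    pace = split_seconds / split_km  # seconds per kilometre so far
    return pace * MARATHON_KM        # projected finish, in seconds


def format_hms(seconds):
    """Render a duration in seconds as H:MM:SS."""
    total = round(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# A runner who reaches the 30-km checkpoint in 2:20:00 (8,400 seconds)
print(format_hms(constant_pace_finish(30, 8400)))
```

For that runner, the projection comes out to 3:16:55. The estimate has no way of accounting for a runner who slows down late in the race.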
The problem, they say, is that most people tend to peter out in the final stretches, and some runners fluctuate wildly throughout. The race times were publicly available, and the problem is easy to describe, but because the dataset is so large and the outcomes so variable, the researchers turned to a strategy Netflix used to turn 100 million ratings of about 18,000 movies from 480,000 subscribers into a predictive tool for smart recommendations.
To solve this “matrix completion problem,” the statisticians created a grid of known paces for each runner with blank spaces for those whose finish times were unknown. They plugged that data into nine different statistical models, including constant pace, to get realistic estimates. The data didn’t just come from the non-finishers, and included races prior to 2013. One model called for times from a runner’s earlier marathons, while another compared each runner to 200 other runners with similar paces. Many also included multipliers that took into account a runner’s ability, their age and sex, and the difficulty of different sections of the race route.
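The similar-runners approach can be sketched roughly as follows. This is an illustration under stated assumptions, not the researchers’ actual model: it measures similarity as the straight-line distance between cumulative split times at the checkpoints a runner reached, then averages the finish times of the closest finishers. The function name, the distance measure and the plain average are all assumptions; only the figure of 200 comparison runners comes from the study as described above.

```python
from statistics import mean


def knn_finish_estimate(known_splits, finishers, k=200):
    """Estimate a finish time from the k most similar finishers.

    known_splits: cumulative times (seconds) at the checkpoints the runner reached.
    finishers: list of (splits, finish_seconds) pairs for runners who completed
               the race; each splits list covers at least the same checkpoints.
    """
    if not finishers:
        raise ValueError("need at least one finisher to compare against")
    n = len(known_splits)

    def distance(splits):
        # Euclidean distance over the checkpoints both runners share (an assumption)
        return sum((a - b) ** 2 for a, b in zip(known_splits, splits[:n])) ** 0.5

    # Keep the k finishers whose early splits look most like this runner's
    nearest = sorted(finishers, key=lambda f: distance(f[0]))[:k]
    return mean(finish for _, finish in nearest)


# Toy data: three finishers with 5-km and 10-km splits, and a runner stopped after 10 km
finishers = [
    ([1500, 3000], 12600),
    ([1510, 3020], 12700),
    ([2000, 4000], 17000),
]
estimate = knn_finish_estimate([1505, 3010], finishers, k=2)
```

With this toy data the two closest finishers ran 12,600 and 12,700 seconds, so the estimate is 12,650 seconds. A fuller version would fold in the multipliers for age, sex and course difficulty mentioned above.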
Four of the methods worked equally well, and could predict finish times for 80% of the runners to within two minutes. The rest of the methods were too far off to be useful. Women, young people, and fast runners were easier to predict than men, older people, and slow runners, though the researchers didn’t offer an explanation of why.
The Boston race is one of the most popular in the country, and has some of the strictest entry requirements. To qualify, runners have to finish an approved marathon in a highly competitive age-and-sex-specific time. Many of those runners use their time from the previous Boston Marathon as their qualifier for the next year. Demand to be in this year’s race was higher than ever before, possibly because Americans decided that running it would be a great way to thumb their noses at terrorism. In response, race organizers increased capacity to 36,000 runners, up from 23,000 in 2013.
Regardless of projected finish times, the Boston Athletic Association decided to give all the 2013 non-finishers who made it at least halfway through the race a pass on their qualification requirements for this year’s race.
These results could also be used to help competitive runners get a better sense of their pace during races, the researchers said. A team on the sidelines could apply any one of the statistical methods and relay the information to the runner, who could then speed up or slow down to save energy, based on her goal.