How to think critically about polls and rankings

Ever since The Harvard Business Review declared data science to be “the sexiest job of the 21st century” in 2012, I feel like I have been promoted. I used to be a statistician, with a PhD and a professorship at a major university. This almost guaranteed glazed eyes and short conversations at social events. Now I’m a data scientist!

Permit me to use my new status to express some skepticism, particularly when it comes to rankings and polls. As a consumer of information, I can tell you they play right into our insatiable need to order entities. As a scientist, I can tell you their methods are often flawed and easily manipulated, making them highly fallible representations of reality.

Take, for example, university rankings. Should a top-notch student prefer Columbia over Stanford? In the latest US News and World Report’s National University Rankings, Columbia (#3) bested Stanford (#6). But in the Times Higher Education’s World University Rankings of US schools, Stanford (#1) destroyed Columbia (#12). How can an 18-year-old make sense of this? (And even if they could, how would they now begin to factor in the effects of the pandemic on each campus and on their classes?)

I am often asked by friends whether we should believe polls, a concern that was front and center during the US presidential election. My answer is, “How do I know?” The quality of each poll is rooted in the way in which the data are collected and not in the analysis: the most careful examination of bad data will lead to the most precise wrong answer.

The pitfalls of polling

For argument’s sake, let us make the generous assumption that surveys are constructed in an honest way with a phone conversation that does not suggest a preferred answer. Let us also assume that households are randomly chosen, again in a way that is not biased toward answers of a certain proclivity. Even then, whether the intended respondent is at home and stays on the line long enough to answer the questions is not under the pollster’s control. If only a small fraction of those who are called respond, who is to say that these respondents answer in a similar way to the many for whom a response is not obtained?
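To see how much damage nonresponse alone can do, here is a minimal sketch in Python. Every number in it is invented for illustration: a hypothetical electorate split exactly 50/50 between candidates A and B, and assumed response rates of 6% and 4% for their respective supporters. The function name observed_share is my own, not a standard one.

```python
# Hypothetical electorate: the true split between A and B is exactly 50/50.
true_share_a = 0.50

# Assumed response rates (made-up numbers): supporters of A complete the
# survey slightly more often than supporters of B.
response_rate_a = 0.06
response_rate_b = 0.04


def observed_share(true_share, rate_a, rate_b):
    """Share of respondents backing A when response rates differ by group."""
    responders_a = true_share * rate_a
    responders_b = (1 - true_share) * rate_b
    return responders_a / (responders_a + responders_b)


poll_share_a = observed_share(true_share_a, response_rate_a, response_rate_b)
print(f"True support for A: {true_share_a:.0%}")
print(f"Support for A among respondents: {poll_share_a:.0%}")
```

Under these assumptions, a dead heat shows up in the poll as a comfortable lead for A, and no amount of careful analysis of the responses would reveal it.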

These potential pitfalls are why I am leery of polling results. But the enterprise of rankings rankles even more.

We are inundated with rankings: of the best cities to live in, best cities to retire to, and best cities in which to raise families, of the most powerful people in an industry, and of universities, cars, politicians, entertainers, and professional athletes. There seems to be an insatiable need to order entities.

But rankings are often meaningless or even misleading because of the data inputs and how they are manipulated. If rankings are presented just for interest’s sake, they might be harmless. But they can have bad consequences for those who rely on them to make decisions, and can lead to poor decisions on the part of the entities being evaluated, particularly when they have the opportunity to affect the values of the attributes on which they are ranked.

Consider the example of ranking cities to determine which are most desirable for quality of life. The analysis begins with choosing attributes that matter. In one such ranking, these attributes might include temperature, crime rate, and cultural amenities. Weights are assigned to each of the attributes to reflect their relative importance. Each city is scored on each of these attributes. These scores are multiplied by their corresponding weights and summed to obtain an overall score for each city, and the cities are then ordered by overall score to provide the ranks.
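The mechanics just described can be sketched in a few lines of Python. The cities, attribute scores, and weights below are all hypothetical; I assume each attribute has already been scored from 0 to 10 with higher meaning better.

```python
# Hypothetical attribute scores on a 0-10 scale, higher is better.
scores = {
    "City A": {"weather": 9, "safety": 6, "culture": 5},
    "City B": {"weather": 5, "safety": 6, "culture": 10},
    "City C": {"weather": 3, "safety": 7, "culture": 6},
}

# Hypothetical weights reflecting one analyst's view of relative importance.
weights = {"weather": 0.4, "safety": 0.3, "culture": 0.3}


def overall_scores(scores, weights):
    """Multiply each attribute score by its weight and sum per city."""
    return {
        city: sum(weights[attr] * value for attr, value in attrs.items())
        for city, attrs in scores.items()
    }


def rank_cities(scores, weights):
    """Order city names from highest to lowest overall score."""
    totals = overall_scores(scores, weights)
    return sorted(totals, key=totals.get, reverse=True)


ranking = rank_cities(scores, weights)
for position, city in enumerate(ranking, start=1):
    print(position, city)
```

Note how little separates the top two cities in this made-up example: City A edges out City B by a tenth of a point, yet the published list would show only #1 and #2.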

So what’s the problem? For starters, who is to say that the weights that are used for the analysis match anyone’s particular preferences? You might prefer the good weather of San Diego and I might prefer the museums and cultural activities available in New York. None of us likes crime, so care must be taken that this variable is oriented so that low crime earns a good score. Assuming that temperatures that are either too hot or too cold are generally not desirable, this variable needs to be presented so that the bottom of the scale is extreme weather and the top of the scale is temperate weather. However, some people dislike exceedingly cold weather more than exceedingly hot weather and others have the reverse sensitivity.
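One way to orient such variables, sketched here under assumptions of my own, is to rescale each raw value onto a 0-10 scale with an explicit "worst" and "best" end. The thresholds (1,000 crimes per 100,000 residents as the worst case, 70°F as the ideal temperature, a 40-degree tolerance) are hypothetical, as are the function names.

```python
def rescale(value, worst, best):
    """Map a raw value onto 0-10, where `worst` scores 0 and `best` scores 10."""
    score = 10 * (value - worst) / (best - worst)
    return max(0.0, min(10.0, score))  # clip values outside the range


def crime_score(crimes_per_100k, worst=1000.0):
    """Lower crime is better, so zero crime is the 'best' end of the scale."""
    return rescale(crimes_per_100k, worst=worst, best=0.0)


def temperature_score(avg_temp_f, ideal_f=70.0, tolerance_f=40.0):
    """Score falls off with distance from an assumed ideal temperature."""
    distance = abs(avg_temp_f - ideal_f)
    return rescale(distance, worst=tolerance_f, best=0.0)


print(crime_score(250))
print(temperature_score(70))
print(temperature_score(30), temperature_score(110))
```

The temperature function treats 40 degrees too cold and 40 degrees too hot as equally bad. That symmetry is itself a choice made on the reader's behalf, which is exactly the kind of hidden assumption that may not match anyone's actual sensitivity.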

The weights are critical. Of the 248 cities that were considered in one recent ranking, well over half could plausibly be ranked #1 with a suitable choice of weights. To make matters worse, the weights are often not disclosed at all, or at best they appear in small print in footnotes.
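A rough way to check how many entities could reach #1 is to sweep over a grid of weights and record every winner. The sketch below does this for four hypothetical cities with invented scores; it is not the method used in the ranking mentioned above, only an illustration of the idea. Weights are expressed in whole tenths so the arithmetic stays exact.

```python
from itertools import product

# Hypothetical scores (weather, safety, culture), 0-10, higher is better.
scores = {
    "City A": (9, 6, 5),
    "City B": (5, 6, 10),
    "City C": (3, 10, 6),
    "City D": (4, 5, 5),
}


def possible_winners(scores, steps=10):
    """Return every city that ranks #1 for at least one weighting on the grid.

    Weights are non-negative integers that sum to `steps` (tenths by default).
    """
    n_attrs = len(next(iter(scores.values())))
    winners = set()
    for combo in product(range(steps + 1), repeat=n_attrs):
        if sum(combo) != steps:
            continue  # keep only weightings that sum to the whole
        totals = {
            city: sum(w * s for w, s in zip(combo, values))
            for city, values in scores.items()
        }
        winners.add(max(totals, key=totals.get))
    return winners


winners = possible_winners(scores)
print(sorted(winners))
print(f"{len(winners)} of {len(scores)} cities can be ranked #1")
```

In this toy example, three of the four cities can claim the top spot. Only City D, which is matched or beaten on every attribute by another city, can never win no matter how the weights are set.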

To complicate matters further, weights may change from ranking to ranking. If New York is considered a “better place to live” than San Diego one year but not the next, is this because San Diego has really improved (or New York declined) or because someone has decided that balmy weather is more important than ever?

The trouble with inputs

The rankings for a person, place, or institution depend not only on the weights assigned to each attribute but also on the values derived from the input data. And these values might not be reliable, which is the second major problem with calculating rankings.

Take the example of the Chronicle of Higher Education’s rankings of departments in different academic disciplines. One of the inputs to these rankings is the views of faculty in each discipline comparing departments across universities. In one such survey some time ago, Princeton’s statistics department was ranked #2. On the face of it, that does not sound strange as Princeton is an excellent university. But at the time of the survey, Princeton did not even have a statistics department.

What went wrong? Princeton had the reputation of having an excellent department, primarily because John Tukey, arguably the leading statistician in the middle of the 20th century, was on its faculty. But Tukey left for Bell Labs, and the department, which was small to begin with, collapsed. Many of the academics queried from universities around the country were not aware, however, that Princeton’s statistics department had ceased to exist.

Where else rankings can go wrong

There’s no shortage of other ways in which rankings can distort reality.

If raw scores for a ranking are on a scale of 0-10, but the ranking analysts divide the scores into quartiles, there might not be much daylight between a school in the top quartile and a school one quartile down.
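A small sketch makes the point concrete. The eight schools and their raw scores are invented, and the quartile rule (split the ordered list into four equal groups) is one simple assumption among several that an analyst might use.

```python
# Hypothetical raw scores on a 0-10 scale.
raw_scores = {
    "School A": 8.1, "School B": 8.0, "School C": 7.9, "School D": 7.8,
    "School E": 6.0, "School F": 5.5, "School G": 4.0, "School H": 3.0,
}


def assign_quartiles(raw_scores):
    """Label each school 1 (top) to 4 (bottom) by its position in the ordering."""
    ordered = sorted(raw_scores, key=raw_scores.get, reverse=True)
    n = len(ordered)
    return {
        school: 1 + (position * 4) // n
        for position, school in enumerate(ordered)
    }


quartiles = assign_quartiles(raw_scores)
gap = raw_scores["School B"] - raw_scores["School C"]
print("School B quartile:", quartiles["School B"])
print("School C quartile:", quartiles["School C"])
print("Raw score gap:", round(gap, 1))
```

Schools B and C land in different quartiles even though a tenth of a point separates them, while School C shares a quartile label with nobody more than a tenth of a point away and sits nearly two points above the next group.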

And if the ranking is derived from evaluations by the students attending each school, that, too, is problematic, as comparing scores from students at one institution with scores from students at another won’t necessarily tell you which school is best. (That’s how, in the first survey of MBA programs, roughly 30 years ago, Northwestern bested Harvard.)

And then there are all the opportunities for data manipulation. In the case of university rankings, there are examples of institutions simply lying about their students’ data. Of course, that is highly unethical, but data manipulation doesn’t need to be so nefarious.

Consider the faculty-to-student ratio, typically an important component in the rankings of universities. How could this be anything other than objective? It’s one number divided by another. But one prominent university reported this number to be 15:1 in one year and 6:1 in the following year, which led to a substantial climb in the rankings for that university. That change in ratio could mean either that enrollment fell 60% in one year or that the faculty grew to two and a half times its previous size. Or, more plausibly, the university simply changed the definition of “professor” used in its reported numbers. Large academic institutions may have associated medical schools and hospitals that employ many physicians who have academic titles but are involved only in clinical practice or in some combination of clinical practice and research. And a substantial number of faculty may practice at satellite clinics far from campus. Regardless, these faculty are not part of the team that educates undergraduates. Their presence likely has zero bearing on the quality of an undergraduate’s education. But including them in the faculty size does wonders to improve the faculty-to-student ratio.
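The arithmetic behind those two implausible explanations is easy to verify. The sketch below reads 15:1 and 6:1 as students per faculty member, which is my interpretation of the reported figures, and uses exact fractions to avoid rounding noise.

```python
from fractions import Fraction

# Reported figures, read as students per faculty member.
students_per_faculty_before = Fraction(15)
students_per_faculty_after = Fraction(6)

# Explanation 1: faculty size held fixed, so enrollment must have shrunk.
enrollment_change = students_per_faculty_after / students_per_faculty_before - 1

# Explanation 2: enrollment held fixed, so the faculty must have grown.
faculty_multiplier = students_per_faculty_before / students_per_faculty_after

print(f"Enrollment change with faculty fixed: {float(enrollment_change):.0%}")
print(f"Faculty multiplier with enrollment fixed: {float(faculty_multiplier):.1f}x")
```

Neither a 60% drop in enrollment nor a faculty two and a half times larger is credible within a single year, which is why a change in definition is the more likely story.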

Another problem with how rankings are computed is the need for change. Would rankings be interesting if they did not change over time? Of course not. So, to “sell” these rankings it is necessary for them to change to some extent. Accordingly, most of the big university rankings change from year to year—but having been in a university setting my whole life, I am comfortable saying that real change in universities happens at a snail’s pace because of the complexities of these institutions.

The inevitable conclusion

All these problems with how rankings are calculated lead to the inevitable conclusion that relying on published rankings rather than focusing on personal research and preferences may lead to poor decisions for any particular individual. And that indeed may be the case. For example, maybe the academic competition at the typically top-ranked Harvard, Princeton, or Yale would be detrimental for some high-achieving students, although to others it would be an important impetus. Remember, too, that much of a student’s university experience and value is obtained outside of the classroom, which is not measured in rankings. Having peers with whom one can relate and areas in which one can shine are critical, particularly as students find themselves away from home perhaps for the first time.

So the value of the ranking enterprise to those who use rankings is ambiguous at best. The results may be misconstrued or consumers may be led astray. But we must also consider in equally severe measure the potential effects of rankings on those being ranked. These effects go beyond possibly motivating unethical behavior like manipulating or outright lying about the data. Because these rankings are taken at face value and viewed as important by consumers, the entities that are ranked might very well modify their behavior with the sole purpose of improving their rank. For example, universities typically are ranked in part on their graduation rate. Could that be an incentive for a university to lower its standards, thereby increasing its graduation rate? Or, for rankings that are determined by student surveys, might universities inflate grades and water down courses (or concentrate more on “student amenities” and less on academics) with the goal of making students happier and getting a more favorable rating from them? It would be very difficult for outsiders to evaluate how rankings negatively impact the quality of an institution. But the possibilities are surely there.

Crafting better rankings

We rely too much on rankings because of our thirst for knowing which choice is best, even when the answers are fallible and not personalized to our preferences. However, with a few important changes, rankings could be improved to enable better decision making.

For example, misleading responses about the faculty-to-student ratio could be reduced by redefining the measurement. What is the ratio trying to measure? Perhaps it is the availability of faculty to the students, and the sizes of classes. So, one could ask for the following information instead: 1) What is the average class size? and, perhaps, 2) To what extent do students engage in research, one-on-one, with a faculty member? To be sure, even these measures can be manipulated. Many universities offer the same class, meeting at the same time, with the same professor, but under different numbers in the course guide, depending on the student’s program within the university. There might be four course numbers, with each course populated with 25 students, giving the impression of an average class size of 25, but really there are 100 students taking the class together. With some careful thought, definitions, and directions on how to respond, the data inputs could clearly be improved.
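The cross-listing example can be worked through directly. In the sketch below, the course numbers, meeting slot, and professor are hypothetical; the rule I assume for "one real class" is that listings sharing a meeting time and a professor are the same room of students.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical course-guide entries:
# (course number, meeting slot, professor, enrolled students)
listings = [
    ("PROG1-301", "Mon 10:00", "Prof. X", 25),
    ("PROG2-301", "Mon 10:00", "Prof. X", 25),
    ("PROG3-301", "Mon 10:00", "Prof. X", 25),
    ("PROG4-301", "Mon 10:00", "Prof. X", 25),
]


def average_size_as_listed(listings):
    """Average enrollment if every course number counts as its own class."""
    return mean(enrolled for *_, enrolled in listings)


def average_size_as_taught(listings):
    """Average enrollment after merging listings that share a slot and professor."""
    rooms = defaultdict(int)
    for _, slot, professor, enrolled in listings:
        rooms[(slot, professor)] += enrolled
    return mean(rooms.values())


print("Average class size as listed:", average_size_as_listed(listings))
print("Average class size as taught:", average_size_as_taught(listings))
```

A survey question that asks for class size per meeting, rather than per course number, would close this particular loophole, though clear instructions to respondents would still be needed.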

As consumers of rankings, we, too, need to be much more discerning. If we place our own weights on the factors that matter most to us (realizing that one size never fits all), and take the input data with a grain of salt (supplementing it with our own research for important decisions), we can improve our decision-making.

So let the best schools, cities, cars, athletes, and executives have their moment; just remember that understanding the pitfalls of rankings can empower us to take control of their influence on our behavior.