When American high school students, and foreign students keen on studying in the US, survey their choice of universities, they can consult any number of rankings and listings. Or they can trudge through the websites of dozens of colleges to make up their own minds. Either way, it’s a bit of a chore.
The website Onlyboth.com seeks to make that process easier with a program that crawls through the thicket of available data and pulls out useful insights and comparisons. The startup’s co-founders, Raul Valdez-Perez, formerly a research faculty member at Carnegie Mellon’s computer science department, and his colleague Andre Lessa, collected data from several sources, chiefly the federal government, to create a database of each US university’s special characteristics. Type in a college name, and the website shows how it compares to its neighbors and peers, as well as across lists.
Take two of the top colleges in the US, Harvard and Yale, for example: You’ll discover that Harvard grants higher average aid to undergraduates ($41,555, to Yale’s $39,771); boasts more Rhodes scholars among its alumni (341 to 230); and has a bigger research budget ($624.2 million to $438.9 million). It also has a higher percentage of undergraduates who get financial aid (77% to 62%); fewer property crimes on campus per 1,000 students (1.23 to 3.23); and more foreign students (6,997 to 2,551). Then again, Yale is 22 meters above mean sea level, to Harvard’s 3 meters, so it can legitimately claim to occupy the higher ground.
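A comparison like this is simple to sketch in code. The Python below is only an illustration, not Onlyboth's actual program: the dictionary layout, the attribute labels, and the `compare` function are assumptions made for this example, while the figures are the ones quoted above.

```python
# Figures are those quoted in the article; the labels and layout are hypothetical.
HARVARD = {
    "average undergraduate aid in dollars": 41555,
    "number of Rhodes scholars": 341,
    "research budget in millions of dollars": 624.2,
    "elevation in meters": 3,
}
YALE = {
    "average undergraduate aid in dollars": 39771,
    "number of Rhodes scholars": 230,
    "research budget in millions of dollars": 438.9,
    "elevation in meters": 22,
}


def compare(name_a, a, name_b, b):
    """Return one plain-English sentence per attribute the two records share."""
    sentences = []
    for key in a:
        # Skip attributes that only one school reports, and exact ties.
        if key not in b or a[key] == b[key]:
            continue
        # Order the pair so the school with the larger value comes first.
        if a[key] > b[key]:
            hi_name, hi, lo_name, lo = name_a, a[key], name_b, b[key]
        else:
            hi_name, hi, lo_name, lo = name_b, b[key], name_a, a[key]
        sentences.append(
            f"{hi_name} has a higher {key} than {lo_name} ({hi:,} to {lo:,})."
        )
    return sentences


for line in compare("Harvard", HARVARD, "Yale", YALE):
    print(line)
```

A real system would also need to know which direction counts as good for each attribute (fewer crimes, but more scholars), something this sketch leaves out.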
A more complex example: “Brown has the 2nd-highest Times Higher Education world ranking (52nd place) of the 278 colleges that enroll from 5,000 to 9,999 students.” That sort of cross-referencing would be a challenge to do manually. Onlyboth’s idea was to combine several data sources and let the machine spit out the results, all without human intervention. Even the words were written by computer.
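The cross-referencing behind a sentence like the Brown one can be pictured as three steps: filter colleges down to a peer group, rank the group on one attribute, and fill in a sentence template. The Python sketch below shows one way that could work; it is not Onlyboth's method, and the college names, enrollment figures, rankings, and function names are all invented for illustration.

```python
# Invented toy data: names, enrollments, and rankings are placeholders, not real figures.
COLLEGES = [
    {"name": "College A", "enrollment": 6000, "world_rank": 40},
    {"name": "College B", "enrollment": 8000, "world_rank": 52},
    {"name": "College C", "enrollment": 9500, "world_rank": 120},
    {"name": "College D", "enrollment": 20000, "world_rank": 10},
]


def ordinal(n):
    """Turn 2 into '2nd', 11 into '11th', 52 into '52nd', and so on."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def peer_rank_sentence(colleges, name, low, high):
    """Describe where one college stands among peers in an enrollment band."""
    # Step 1: keep only colleges whose enrollment falls inside the band.
    peers = [c for c in colleges if low <= c["enrollment"] <= high]
    # Step 2: sort by world ranking; a smaller number is a better place.
    peers.sort(key=lambda c: c["world_rank"])
    # Raises ValueError if the named college is not in the band.
    position = [c["name"] for c in peers].index(name) + 1
    place = peers[position - 1]["world_rank"]
    # Step 3: fill in the sentence template.
    return (
        f"{name} has the {ordinal(position)}-highest world ranking "
        f"({ordinal(place)} place) of the {len(peers)} colleges "
        f"that enroll from {low:,} to {high:,} students."
    )


print(peer_rank_sentence(COLLEGES, "College B", 5000, 9999))
```

The hard part for a real service is presumably deciding which of the many possible peer groups and attributes produce a statement worth reading, which this sketch does not attempt.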
The result—automated insights, not just automated writing—might interest those in the corporate and media world trying to crack the code of algorithmic publishing.
The machine writes
Valdez-Perez likes to describe Onlyboth as a “reverse-Watson,” a reference to the IBM supercomputer that gained fame after winning on the US quiz show Jeopardy. Watson is great at looking at natural language (tweets, blogs) and finding data in it. Onlyboth takes structured data, investigates it for novel insights, and then writes them out in plain English.
Onlyboth is not the first program to make machines do the writing. Software from the Chicago-based company Narrative Science ingests data sets and produces plain-English versions of reports, numbers and stock movements. Yseop, a European firm, does something similar, except it does it in English, French, Spanish and German. (Portuguese, simplified Chinese and Japanese are in development.)
All these firms address the same fundamental problem: Everybody from corporations to individuals to news outlets is trying to make sense of huge amounts of data, which is hard to do without sophisticated tools. Narrative Science and Yseop sell their tools as a service to companies. Onlyboth, which went live this week, is taking a different approach. Instead of releasing tools, the company has made the website available for anyone to use.
Onlyboth is still very basic: prospective students can enter only the names of colleges, not specific attributes. For example, they cannot search for a college that is in a beach town, with high diversity, low tuition, and a great women’s ice hockey team, and get back a list of schools that meet the criteria.
Valdez-Perez says he isn’t yet sure what makes the most sense for the business: to serve companies or add more functions. The team is already working on applications that would parse baseball statistics, voting records of members of Congress, and facts about human genes, which Valdez-Perez hopes will find users among fans, hobbyists, and professionals. As data science becomes less esoteric and easier to access, finding patterns may become easier.