Alex Krizhevsky didn’t get into the AI business to change the course of history.
Krizhevsky, born in Ukraine but raised in Canada, was just looking to delay getting a coding job when he reached out to Geoff Hinton about doing a computer-science PhD in AI at the University of Toronto. The fateful moment came when, as graduate students, Krizhevsky and a fellow student named Ilya Sutskever decided to enter the ImageNet competition, a test for AI built around a huge database of online images.
The competition, open to anyone in the world, evaluated algorithms designed for large-scale object detection and image classification. The point wasn’t just to crown a winner, but to test a hypothesis: with the right algorithm, the massive amount of data in the ImageNet database could be the key to unlocking AI’s potential. The two grad students, working with Hinton as an advisor, decided to enter the 2012 competition using a fringe idea: an artificial neural network designed by Krizhevsky. The approach dominated the contest, beating every other research lab by a huge 10.8% margin.
Thus, the current AI boom was born. Google hired the three researchers to seed major new projects using neural nets; the technology’s decision-making prowess soon put the words “deep learning” on the lips of every founder and Silicon Valley executive. Other tech companies like Facebook, Amazon, and Microsoft started positioning their businesses around the tech.
Now Krizhevsky, following a four-and-a-half-year stint at Google, is riding the wave he helped generate by joining deep-learning startup Dessa as its technical adviser. Dessa, previously called Deeplearni.ng, works with companies to overhaul their businesses with AI. For example, it worked with Scotiabank to develop a deep-learning system that identifies the signs of potentially delinquent customers faster.
A highly non-obvious solution
Back in his grad-school years, Krizhevsky was reading papers on an earlier algorithm invented by his advisor, Hinton, called the “restricted Boltzmann machine.” He had seen graphics processing units (GPUs) used with restricted Boltzmann machines instead of central processing units (CPUs). He thought that if he could use those GPUs on other kinds of neural networks with more layers (or “deep neural networks”), he could ratchet up their processing speeds and create a better algorithm. The result was a neural network designed to quickly beat other state-of-the-art benchmarks in algorithm accuracy.
Shortly after that discovery, in 2011, Sutskever, another of Hinton’s grad students, learned about the ImageNet dataset. It contained more than a million images, specifically crafted for the kinds of computer-vision problems the Toronto team was trying to tackle. “I realized that his code was capable of solving ImageNet,” says Sutskever. “A highly non-obvious realization at the time.”
Krizhevsky then used the enhanced capabilities of his GPU-accelerated code to train the neural network on the dataset. The higher calculation speeds allowed the network to process those millions of images in five or six days, rather than the weeks or even months it would have taken previously. All the extra data that could be processed gave the neural network unprecedented sensitivity in telling apart the objects in an image.
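To see why GPUs made such a difference, it helps to know that almost all of a convolutional network’s training time is spent on a few large, highly parallel operations. The sketch below (purely illustrative, not Krizhevsky’s actual code; the function name and shapes are invented for this example) expresses a convolution layer as one big matrix multiply — exactly the kind of computation a GPU performs orders of magnitude faster than a CPU:

```python
import numpy as np

def conv2d_as_matmul(image, kernels):
    """Valid 2-D convolution via "im2col" + one matrix multiply.

    image:   (H, W) single-channel input
    kernels: (K, kh, kw) stack of K filters
    returns: (K, H-kh+1, W-kw+1) feature maps
    """
    K, kh, kw = kernels.shape
    H, W = image.shape
    oh, ow = H - kh + 1, W - kw + 1
    # im2col: unroll every receptive field of the image into a column
    cols = np.empty((kh * kw, oh * ow))
    for i in range(oh):
        for j in range(ow):
            cols[:, i * ow + j] = image[i:i + kh, j:j + kw].ravel()
    # One large matmul applies all K filters at every position at once --
    # this dense linear algebra is what GPUs parallelize so well
    out = kernels.reshape(K, kh * kw) @ cols
    return out.reshape(K, oh, ow)

image = np.arange(25, dtype=float).reshape(5, 5)
kernels = np.ones((2, 3, 3))  # two 3x3 "sum" filters, just for illustration
maps = conv2d_as_matmul(image, kernels)
print(maps.shape)  # (2, 3, 3)
```

Stacked across many layers and millions of images, these matrix multiplies dominate training time, which is why moving them from CPU to GPU cut the job from weeks or months down to days.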
Hinton was originally resistant to the idea, since the neural network still needed to be told which objects were in which images rather than learning the labels itself, but he still contributed to the project in an advisory role. It took six months just to match the best image-classification results then published on ImageNet, and another six to achieve the results the team submitted.
“[Krizhevsky] has an extremely deep understanding of [machine learning], and unlike many other researchers, he’s an engineer at heart,” says Sutskever, who is now director of research at OpenAI. “He has the ability to keep at a problem until it’s solved.”
Krizhevsky, who is soft-spoken and has never talked to the media before now, chuckles when recalling the weeks after the 2012 ImageNet results came out. “It became kind of surreal,” he says. “We started getting acquisition offers very quickly. Lots of emails.”
The “end goal of computer science”
The eventual neural-network framework was validated in a seminal research paper (pdf) in the field of AI, first presented at AI’s largest annual conference in 2012, after the ImageNet challenge. That study has now been cited more than 24,000 times, according to Google Scholar.
The neural-network framework that resulted is now known colloquially as AlexNet, but it didn’t originally bear that name.
After the ImageNet challenge, Google tasked an intern named Wojciech Zaremba, now head of robotics at OpenAI, with recreating Krizhevsky’s paper for the company. Since Google has a tradition of naming neural networks after their creators, the company’s approximation of Krizhevsky’s neural network was originally called WojNet. But then Google won the war for the rights to hire the researchers and acquire their technology. After the acquisition, the name was rightfully changed to AlexNet.
During his Google tenure, Krizhevsky worked on Google Photos and then became deeply entrenched in the company’s self-driving car project. In September 2017, he left the company—he lost interest in the work, he says.
At Dessa, Krizhevsky will advise and help research new deep-learning techniques. The company is looking to double in size to 80 employees in 2018. To Krizhevsky, it makes perfect sense that AI has become such a force in the tech world and beyond.
“Artificial intelligence is sort of the end goal of computer science,” Krizhevsky says. “Computer science is about automating stuff, and artificial intelligence is about automating everything.”