
Transcript of Jeff Dean at ScaledML 2018


Chase Hart | April 12th, 2018



Jeff Dean, Head of Google Brain

Google’s ML Applications and Systems for ML
We’re at an interesting transition point. Moore’s law has slowed down, so we’re not getting the fundamental circuit scaling benefits we’ve been getting for many decades. At the same time, we have this amazing new technology for solving problems in completely different ways. It’s a new way where you learn to do things rather than hand coding them in software.
Just to give you a sense of the kinds of problems where I think machine learning is going to make a difference: In 2008, the U.S. National Academy of Engineering put out this list of 14 grand engineering challenges for the 21st century. If we solve all these, the world would be a much better place. People would be happier and healthier. I actually think machine learning is going to impact how we solve all of these.

Advancing health informatics.
I think machine learning for health care is one of the most transformative things we could do as a society. Just to give you a couple of examples of what we’ve been doing in our group: there’s a disease called diabetic retinopathy, and about 400 million people are at risk for it every year. You look at a retinal image, and an ophthalmologist can tell you whether you are healthy or showing early signs of this disease. It’s very important that you get treated, because if it’s not treated, you can actually suffer vision loss.
We’ve done some early work in this space, where we now have a machine learning model trained on a fairly large number of retinal images graded by ophthalmologists. When we published this paper a year ago, our model was slightly better than the median certified ophthalmologist.
But we’re also finding that there’s information in these retinal images that ophthalmologists had no idea existed. We discovered that age and gender could be predicted based on retinal images. It turns out you can predict a lot of things that are relevant to cardiovascular health. We now actually have a new biomarker from retinal images, whereas previously, you had to draw blood, a more invasive procedure, to do these kinds of predictions.

Application of ML in chemistry.
We’ve been doing a little bit of work in this space, where we want to be able to predict various properties of molecules. The traditional way a chemist does this is they have a molecule configuration. They put it through a very computationally intensive HPC-style simulator called a density functional theory simulator, and in roughly an hour, you get your answer. Does it bind with this thing? What are the various properties of this molecule in this configuration?
You basically now have a supervised training problem, where you try to match the behavior of the simulator, but with a neural net you can actually do it much faster. So essentially, we have a neural-based predictor that is indistinguishable in accuracy from the much more computationally intensive simulator, but it’s 300,000 times faster. You could imagine that would really change how you would do chemistry. You might conceivably evaluate 100 million molecules through this cheap neural-based predictor and look at the most promising 5,000 of those, or 1,000.
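The screening workflow described here can be sketched roughly as follows. This is an illustration only, not code from the talk: `cheap_predictor` and `expensive_simulator` are hypothetical stand-ins (a real system would use a trained neural network and a density functional theory code), and molecules are represented as made-up feature tuples.

```python
import heapq
import random


def cheap_predictor(molecule):
    """Hypothetical stand-in for a fast learned predictor; returns a score."""
    return sum(molecule) / len(molecule)


def expensive_simulator(molecule):
    """Hypothetical stand-in for the slow, accurate simulator."""
    return sum(molecule) / len(molecule)


def screen(candidates, predictor, top_k):
    """Score every candidate with the cheap predictor and keep the top_k."""
    return heapq.nlargest(top_k, candidates, key=predictor)


rng = random.Random(0)
# Made-up candidate "molecules": 10,000 random 4-feature tuples.
candidates = [tuple(rng.random() for _ in range(4)) for _ in range(10_000)]

# Cheap pass over everything, expensive pass only over the shortlist.
shortlist = screen(candidates, cheap_predictor, top_k=50)
verified = [(m, expensive_simulator(m)) for m in shortlist]
```

The point of the design is that the expensive call runs only `top_k` times, while the cheap predictor absorbs the full candidate set.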

Reverse engineering the brain.
One of the first things you need to do to reverse engineer the brain is to understand the connectivity patterns in the brain (Google Research has been collaborating with the Max Planck Institute and other institutions on this). Essentially, reconstructing the connectivity matrix of the brain.
The way you do this is you take volumes of brain tissue, you slice it very thin, then you take very high-resolution electron microscope images of that, as you see in the second picture there. Then, from that stack, we try to reconstruct the jumble of connections and neurons in there. This is a graph showing the expected run length in neural tissue traced before you make an error. So essentially, as that run length gets larger and larger, you can now, with very few errors, reconstruct larger and larger pieces of really interesting neural tissue, like a whole songbird brain.

ML in astrophysics.
A person in our group collaborated with Harvard astrophysicists to find planets. The Kepler survey produced a significant amount of data. You want to find planets that transit in front of stars, but the data you get from any particular part of the sky is fragmented. It turns out you can use machine learning to identify those signals, and they’ve found in this data a new planet (actually, a couple of new planets), including the first known eight-planet solar system in the universe besides our own. It’s unfortunately 700 degrees there, but the years are only 14 days long.

ML in farming.
Other uses of TensorFlow are really incredibly varied. There’s a company in Amsterdam that essentially builds fitness centers for cows. They can tell when their cows are sick, and can say this cow in your herd needs attention, using a bunch of signals that they gather from this fitness center. There are people who are trying to prevent deforestation by detecting very subtle chainsaw noises from a kilometer away using machine learning.
I think the point is that machine learning is going to be everywhere. One of the things we want to nail is bringing machine learning to more and more people. There are actually not that many people in the world who know how to train machine learning models as they’re applied to particular problems. The way I like to think about this is there are probably 10,000 organizations in the world that are actually deploying machine learning models and putting them into production use for real products. But there are probably 10 million organizations that have data that should be used for machine learning.

Getting humans out of the AI loop.
One of the things we want to change is the way you currently solve machine learning problems: you take data and computation, and then you stir in a machine learning expert, and those are in fairly short supply. Then, you stir it all together and you hopefully come up with a solution. What if we could turn that into something that looks more like you have data and a lot of computation, but you don’t need the machine learning expert to solve a new problem? Can we get the human out of the loop for solving every new problem in machine learning? That would be pretty powerful.
Our group has been looking at a bunch of different techniques for this. One of them is called Neural Architecture Search. One of the things that a machine learning expert does, particularly for deep learning models, is sit down and say, okay, I’m going to use this model structure. It’s gonna have 17 layers and 3 x 3 convolutions in the first layer. The first layer is gonna connect with the second and the third layer, and so on. They make a bunch of decisions like this based on prior knowledge and other kinds of things that have worked for similar problems.

Machine Learning for Systems.
The hypothesis I have is that learning should be used throughout our computing systems. Traditional low-level computing software, like operating systems and storage systems, doesn’t really make use of machine learning today. That should really change.
Here’s some work we’ve been doing in our group on how we can essentially take a large model that we want to run and, instead of running a copy of the model on one device, improve its throughput by running that model spread across multiple devices. Essentially, we divide up the computation so that we can now run a single example across many devices and get good throughput.
For these large models, sometimes they might not fit on a single chip or a single device, and you actually have to use model parallelism for that.
But it’s not always obvious how you should apply model parallelism. It’s very obvious how you apply data parallelism, but model parallelism is more subtle. One way we can spread this across multiple devices is essentially splitting the layers across different GPUs and running the computation across those GPUs. That’s a good human design. If the computation is a little unbalanced between each of these layers, though, it’s a little less obvious what to do. It turns out this is very amenable to reinforcement learning.
You can take a graph and a set of devices and have a model produce a mapping from each of the nodes in the graph to a device. Then, you can run that in a loop so that you get better and better at doing placements for this particular model. We essentially then measure the step time, and that gives you the reward signal for the reinforcement learning.
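To make the feedback loop concrete, here is a minimal toy sketch. It is not the system described in the talk. It assumes an independent softmax policy per graph node, a plain REINFORCE update with a moving-average baseline, and a made-up `simulated_step_time` that just returns the load of the busiest device, ignoring communication cost. A real setup would measure step time on actual hardware.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def simulated_step_time(placement, node_costs, num_devices):
    """Toy stand-in for a measured step time: load of the busiest device."""
    load = [0.0] * num_devices
    for node, device in enumerate(placement):
        load[device] += node_costs[node]
    return max(load)


def train_placement(node_costs, num_devices, steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    # One row of logits per node: the policy's preference for each device.
    logits = [[0.0] * num_devices for _ in node_costs]
    baseline = None
    best_placement, best_time = None, float("inf")
    for _ in range(steps):
        probs = [softmax(row) for row in logits]
        # Sample a placement: one device per node.
        placement = [rng.choices(range(num_devices), weights=p)[0] for p in probs]
        step_time = simulated_step_time(placement, node_costs, num_devices)
        if step_time < best_time:
            best_placement, best_time = placement, step_time
        reward = -step_time  # a shorter step time is a higher reward
        advantage = (reward - baseline) if baseline is not None else 0.0
        baseline = reward if baseline is None else 0.9 * baseline + 0.1 * reward
        # REINFORCE: d log pi(a) / d logit_d = 1[d == a] - pi(d)
        for node, device in enumerate(placement):
            for d in range(num_devices):
                grad = (1.0 if d == device else 0.0) - probs[node][d]
                logits[node][d] += lr * advantage * grad
    return best_placement, best_time


node_costs = [4.0, 3.0, 3.0, 2.0, 2.0, 2.0]  # made-up per-node compute costs
best_placement, best_time = train_placement(node_costs, num_devices=2)
```

Placements that produce a shorter step time than the running baseline have their sampled choices reinforced, so the policy drifts toward better-balanced assignments over repeated trials.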
We’ve been working on basically having the model also decide on the clusters of nodes. Taking graphs of 80,000 or 90,000 nodes and then being able to place them by first clustering them, and then placing those clusters, and doing everything in a differentiable way, so that the clustering algorithm is also informed by the reinforcement learning feedback signal. It actually comes up with pretty interesting splittings.

Learned Index Structures.
Another thing I want to talk about is the work we’ve been doing on learned index structures. If you think about the things that are in the core of traditional database systems or data management systems, there are things like B-trees, hash tables and hash maps, Bloom filters, things like that. We started by exploring the idea of treating a B-tree as a model. If you think about a B-tree, it’s really an approximate model.
You take in a key, and that B-tree will then predict what page of a large storage system that key should reside on, if it exists at all in this data structure. You can think of it as giving you a range in which that key must exist, which is essentially a page in a storage system or a database table. If you take the view that the B-tree is essentially a model taking a key and predicting a position, then you can actually remove the B-tree and replace it with a neural net.
We now take in a key from our key space and we try to predict an approximate position. If we know the set of keys that are present in this thing ahead of time, we can actually bound the error that this model makes for all the keys. That gives us a range within which we can search.
You can think of this as trying to predict the position in the cumulative distribution function of the keys. The keys can be numbers, names, or whatever kinds of sorted keys you want, and this works pretty well, it turns out. Somewhat surprisingly well. For example, one trade-off we can make is we could get something that is 60% faster than the B-tree, but 1/20th the size. We could also get 17% faster at 1/100th the size.
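The idea of predicting a position, bounding the error over the known keys, and then searching only inside that range can be sketched as below. This is a simplified illustration, not the structure from the talk: it assumes a single least-squares linear model instead of a neural net, numeric keys, and a hypothetical `LearnedIndex` class name.

```python
import bisect


class LearnedIndex:
    """Toy learned index over a non-empty sorted list of numeric keys."""

    def __init__(self, sorted_keys):
        if not sorted_keys:
            raise ValueError("sorted_keys must be non-empty")
        self.keys = sorted_keys
        n = len(sorted_keys)
        # Least-squares fit of: position ~= slope * key + intercept.
        mean_k = sum(sorted_keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in sorted_keys)
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(sorted_keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # The keys are known ahead of time, so the worst-case prediction
        # error can be measured exactly and used as a search bound.
        self.max_err = max(
            abs(i - self._predict(k)) for i, k in enumerate(sorted_keys)
        )

    def _predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key):
        """Return the position of key, or None if it is not present."""
        n = len(self.keys)
        guess = self._predict(key)
        lo = min(max(0, guess - self.max_err), n)
        hi = max(lo, min(n, guess + self.max_err + 1))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < n and self.keys[i] == key:
            return i
        return None


keys = [i * i for i in range(500)]  # made-up sorted keys, not evenly spaced
index = LearnedIndex(keys)
```

The stored model is just two floats and an error bound. The cost of a lookup depends on how wide that bound is, which in turn depends on how well the model fits the key distribution.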
You could also learn the hash functions. One of the things about hash functions is they tend to be designed to work well with all distributions of keys. If you replace the hash function with a model that learns to map from a key to a bucket in a hash structure, you can actually get much denser packing in these things, leaving less wasted space.
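One simple way to picture a learned hash function is to estimate the cumulative distribution of the keys and scale it to the number of buckets. The sketch below assumes a piecewise-linear CDF built from evenly spaced quantiles of the key set, which is a much simpler model than a neural net. The names `fit_cdf_model` and `learned_bucket` are hypothetical, and how much packing improves depends entirely on how well the model captures the actual key set.

```python
import bisect


def fit_cdf_model(sample_keys, num_knots=64):
    """Approximate the key CDF by keeping evenly spaced quantiles as knots.

    Assumes sample_keys is non-empty and num_knots >= 2.
    """
    s = sorted(sample_keys)
    return [s[(len(s) - 1) * j // (num_knots - 1)] for j in range(num_knots)]


def learned_bucket(key, knots, num_buckets):
    """Map a key to a bucket using estimated CDF(key) * num_buckets."""
    j = bisect.bisect_right(knots, key) - 1  # last knot <= key
    if j < 0:
        return 0  # key is below every knot
    if j >= len(knots) - 1:
        return num_buckets - 1  # key is at or above the last knot
    left, right = knots[j], knots[j + 1]
    frac = (key - left) / (right - left)  # right > left holds here
    cdf = (j + frac) / (len(knots) - 1)
    return min(num_buckets - 1, int(cdf * num_buckets))


keys = [i * i for i in range(1, 5001)]  # made-up, skewed key set
num_buckets = len(keys)
knots = fit_cdf_model(keys)
buckets = [learned_bucket(k, knots, num_buckets) for k in keys]
occupied = len(set(buckets))  # how many buckets ended up non-empty
```

Unlike a conventional hash, this mapping preserves key order, so it only spreads keys evenly when the fitted CDF tracks the real keys closely.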
We can learn the best design properties. The evidence from neural architecture search is that if you have a good reward metric, you can automatically explore the search space in a way that is very hard for humans to do rapidly. If you run 50 experiments and then say, okay, I need to look at the results of those experiments to inform the next set of experiments I want to run, that has a human in the loop. But you can instead run those experiments and automatically incorporate their outcomes into the next set of experiments by updating the model.
It does mean you need a nice metric for measuring and optimizing something that you really care about, and having a clean interface to easily integrate this kind of learning system into low-level computer systems is something that I think is an open area.