Welcome, Rus, I'm really glad
you could join us here today. >> Thank you, thank you Andrew. >> So today you're the director
of research at Apple, and you also have a faculty, a professor for
Carnegie Mellon University. So I'd love to hear a bit
about your personal story. How did you end up doing this
deep learning work that you do? Yeah, it's actually,
to some extent it was, I started in deep learning
to some extent by luck. I did my master's degree at Toronto,
and then I took a year off. I was actually working
in the financial sector. It's a little bit surprising. And at that time, I wasn't quite sure
whether I want to go for my PhD or not. And then something happened,
something surprising happened. I was going to work one morning,
and I bumped into Geoff Hinton. And Geoff told me, hey,
I have this terrific idea. Come to my office, I'll show you. And so, we basically walked together and
he started telling me about these Boltzmann Machines and contrasting
divergence, and some of the tricks which I didn't at that time quite
understand what he was talking about. But that really, really excited, that
was very exciting and really excited me. And then basically, within three
months I started my PhD with Geoff. So that was kind of the beginning,
because that was back in 2005, 2006. And this is where some of the original
deep learning algorithms, using Restricted Boltz Machines, unsupervised
pre-training, were kind of popping up. And so that's how I started it,
was really. That one particular morning when
I bumped into Geoff completely changed my future career moving forward. >> And then in fact you were
co-author on one of the very early papers on
Restricted Boltzmann Machines that really helped with this resurgence of
neural networks and deep learning. Tell me a bit more what that
was like working on that seminal-
>> Yeah, this was actually a really, this was exciting, yeah, it was the first year,
it was my first year as a PGD student. And Geoff and I were trying to explore these ideas
of using Restricted Boltz Machines and using pre-training tricks
to train multiple layers. And specifically we were trying
to focus on auto-encoders, how do we do a non-linear
extension of PCA effectively? And it was very exciting,
because we got these systems to work on which was exciting, but
then the next steps for us were to really see whether we can
extend these models to dealing with faces. I remember we had this
Olivetti faces dataset. And then we started looking at can
we do compression for documents? And we started looking at
all these different data, real value count, binary,
and throughout a year, I was a first year PhD student, so
it was a big learning experience for me. But and really within six or seven months, we were able to get really interesting
results, I mean really good results. I think that we were able to train
these very deep auto-encoders. This is something that you couldn't
do at that time using sort of traditional optimization techniques. And then it turned out into really,
really exciting period for us. That was super exciting, yeah,
because it was a lot of learning for me, but at the same time,
the results turned out to be really, really impressive for
what we were trying to do. >> So in the early days of those
researches of deep learning, a lot of the activity was centered on
Restricted Boltzmann Machines, and then Deep Boltzmann Machines. There's still a lot of exciting
research there being done, including some in your group, but what's
happening with Boltzmann Machines and Restricted Boltzmann Machines? >> Yeah, that's a very good question. I think that, in the early days, the way that we
were using Restricted Boltz Machines is you sort of can imagine training
a stack of these Restricted Boltz Machines that would allow you to learn
effectively one layer at a time. And there's a good theory behind
when you add a particular layer, it can prove variational bound and
so forth under certain conditions. So there was a theoretical justification,
and these models were working quite well in terms of
being able to pre-train these systems. And then around 2009, 2010,
once the Computes started showing up, GPUs, then a lot of us started realizing
that actually directly optimizing these deep neural networks was giving similar
results or even better results. >> So just standard backprop
without the pre-training or the Restricted Boltz Machine. >> That's right, that's right. And that's sort of over three or
four years, and it was exciting for the whole community,
because people felt that wow, you can actually train these deep models
using these pre-training mechanisms. And then, with more Computes people
started realizing that you can just basically do standard backpropagation,
something that we couldn't do back in 2005 or 2004, because it would
take us months to do it on CPU's. And so that was a big change. The other thing that I think that
we haven't really figured out what to do with Boltz Machines and
Deep Boltzmann Machines. I believe they're very powerful models, because you can think of
them generative models. They're trying to model coupling
distributions in the data, but when we start looking at learning
algorithms, learning algorithms right now, they require using Markov Chain Monte
Carlo and variational learning and such, which is not as scalable as
backpropagation algorithms. So we yet have to figure out more
efficient ways of training these models, and also the use of convolution, it's something that's fairly difficult
to integrate into these models. I remember some of your work on
using probabilistic max pooling for sort of building these generative
models of different objects, and using these ideas of convolution
was also very, very exciting, but at the same time, it's still extremely
hard to train these models, so. >> How likely is work? >> Yes, how likely is work, right? And so we still have to figure out where. On the other side, some of the recent
work using variational auto-encoders, for example, which could be viewed as
interactive versions of Bolzmann Machines. We have figured out ways of training
these modules, a work by Max Welling and Diederik Kingma,
on using reparameterization tricks. And now we can use backpropagation
algorithm within the stochastic system, which is driving a lot
of progress right now. But we haven't quite figured out how to do
that in the case of Boltzmann Machines. >> So that's actually a very interesting
perspective I actually wasn't aware of, which is that in an early era where
computers were slower, that the RBM, pre-training was really important, it was only faster computation that
drove switching to standard backprop. In terms of the evolution of the
community's thinking in deep learning and other topics, I know you spend
a lot time thinking about this, the generative,
unsupervised versus supervised approaches. Do you share a bit about how your thinking
about that has evolved over time? >> Yeah, I think that's a really,
I feel like it's a very important topic, particularly if we think about
unsupervised, semi-supervised or generative models because to some extent
a lot of successes that we've seen recently is due to supervised learning,
and back in the early days, unsupervised learning was primarily
viewed as unsupervised pre-training, because we didn't know how to
train these multi layer systems. And even today, if you're working
in settings where you have lots and lots of unlabeled data and
a small fraction of labeled examples, these unsupervised pre-training models, building these generative models,
can help for supervised eyes. So I think that a lot of us in
the community, it kind of was the belief. When I started doing my PhD, was all about
generative models and trying to learn these stacks of model because that was the
only way for us to train these systems. Today, there is a lot of work
right now in generative modeling. If you look at
Generative Adversarial Networks. If you look at variational auto-encoders, deep energy models is something that
my lab is working on right now as well. I think it's very exciting research, but
perhaps we haven't quite figured it out, again, for many of you who are thinking
about getting into deep learning field, this is one area that's,
I think we'll make a lot of progress in, hopefully in the near future. >> So, unsupervised learning. >> Unsupervised learning, right. Or maybe you can think of it
as unsupervised learning, or semi-supervised learning, where you have,
I give you some hints or some examples of what different things mean and I throw
you lots and lots of unlabeled data. >> So that was actually a very important
insight that in an earlier era of deep learning where
computers where just slower, the Restricted Boltzmann Machine and
Deep Boltzmann Machine that was needed for initializing the neural network weights,
but as computers got faster, straight backprop then
started to work much better. So one other topic that I know you
spend a lot of time thinking about is the supervised learning versus generative
models, unsupervised learning approaches. So how does your, tell me a bit about how your thinking
on that debate has evolved over time? >> I think that we all believe that we
should be able to make progress there. It's just all the work on Boltz machines,
variational auto-encoders, GANs. You think a lot of these models
as generative models, but we haven't quite figured it out
how to really make them work and how can you make use of large moments. And even for, I see a lot of in IT sector,
companies have lots and lots of data, lots of unlabeled data,
lots of efforts for going through annotations
because that's the only way for us to make progress right now. And it seems like we should be able to make use of unlabeled data because
it's just abundance of it. And we haven't quite figured
out how to do that yet. >> So you mentioned for people wanting
to enter deep learning research, unsupervised learning is exciting area. Today there are a lot of people
wanting to enter deep learning, either research or applied work,
so for this global community, either research or applied work,
what advice would you have? >> Yes, I think that one of
the key advices I think I should give is people entering that field, I would encourage them to
just try different things and not be afraid to try new things, and
not be afraid to try to innovate. I can give you one example, which is when I was a graduate student,
we were looking at neural nets, and these are highly non-convex
systems that are hard to optimize. And I remember talking to my friends
within the optimization community. And the feedback was always that, well,
there's no way you can solve these problems because these are non-convex,
we don't understand optimization, how could you ever even do that
compared to doing convex optimization? And it was surprising,
because in our lab we never really cared that much about
those specific problems. We're thinking about
how can we optimize and whether we can get interesting results. And that effectively was
driving the community so we we're not scared,
maybe to some extent because we were lacking actually the theory
behind optimization. But I would encourage
people to just try and not be afraid to try to
tackle hard problems. >> Yeah, and I remember you once said,
don't learn to code just into high level deep learning frameworks, but
actually understand deep learning. >> Yes, that's right. I think that it's one of the things that
I try to do when I teach a deep learning class is, one of the homeworks,
I'm asking people to actually code backpropogation algorithms for
convolutional neural networks. And it's painful, but
at the same time, if you do it once, you'll really understand how these
systems operate, and how they work. And how you can efficiently
implement them on GPU, and I think it's important for you to,
when you go into research or industry, you have a really good understanding
of what these systems are doing. So it's important, I think. >> Since you have both academic
experience as professor, and corporate experience, I'm curious,
if someone wants to enter deep learning, what are the pros and cons of doing
a PhD versus joining a company? >> Yeah, I think that's
actually a very good question. In my particular lab,
I have a mix of students. Some students want to go and
take an academic route. Some students want to go and
take an industry route. And it's becoming very challenging because
you can do amazing research in industry, and you can also do amazing
research in academia. But in terms of pros and
cons, in academia, I feel like you have more freedom to work
on long-term problems, or if you think about some crazy problem, you can work on
it, so you have a little bit more freedom. At the same time the research that you're
doing in industry is also very exciting because in many cases with
your research you can impact millions of users if you
develop a core AI technology. And obviously,
within the industry you have much more resources in terms of Compute, and
be able to do really amazing things. So there are pluses and minuses,
it really depends on what you want to do. And right now it's interesting, very interesting environment where
academics move to industry, and then folks from industry move to academia,
but not as much. And so it's, it's very exciting times. >> It sounds like academic machine
learning is great and corporate machine learning is great, and the most
important thing is just jump in, right? Either one, just jump in. >> It really depends on your preferences
because you can do amazing research in either place. >> So you've mentioned unsupervised
learning is one exciting frontier for research. Are there other areas that you consider
exciting frontiers for research? >> Yeah, absolutely. I think that what I see now,
in the community right now, particularly in deep learning community,
is there are a few trends. One particular area I
think is really exciting is the area of deep
reinforcement learning. Because we were able to figure out how
we could train agents in virtual worlds. And this is something that in just
the last couple of years, you see a lot, of lot of progress, of how can we scale
these systems, how can we develop new algorithms, how can we get
agents to communicate to each other, with each other, and
I think that that area is, and in general, the settings where you're interacting
with the environment is super exciting. The other area that I think
is really exciting as well is the area of reasoning and
natural language understanding. So can we build dialogue-based systems? Can we build systems that can reason,
that can read text and be able to answer questions intelligently. I think this is something that a lot
of research is focusing on right now. And then there's another
sort of sub-area also is this area of being able to
learn from few examples. So typically people think of it as
one-shot learning or transfer learning, a setting where you learn
something about the world, and I throw you a new task at you and
you can solve this task very quickly. Much like humans do without requiring
lots and lots of labeled examples. And so this is something that's, a lot of
us in the community are trying to figure out how we can do that and how can we come
closer to human-like learning abilities. >> Thank you, Rus, for
sharing all the comments and insights. That was interesting to see, hearing the story of your
early days doing this as well. >> [LAUGH].
Thanks, Andrew, yeah. Thanks for having me.