[MUSIC] Okay, so in this course we've talked about lots of different machine learning methods and lots of applications where these types of methods can be very impactful. But of course there are lots of open challenges that still remain in machine learning, so let's discuss some of them.

One is the fact that we often have a choice of which model to use. For example, when we talked about recommending products, we said we could use a classification model, where we take features of the user and the product and pass them through a classifier that says yes or no, this person will like or not like this product. But we also talked about using matrix factorization, where we learn features about users and products and use those to recommend products to users. And then we talked about featurized matrix factorization, which combines these two ideas. The list of possible models we can consider for a task is often very large, and this typically leaves the practitioner very perplexed: which model should I use? Searching over this set of possible choices is still an open challenge in machine learning.

Another really important challenge we're often faced with is how to represent our data. For example, when we talked about our document retrieval task, we said we could use just raw word counts, or we could normalize those vectors, or we could use something like tf-idf to downweight very popular words and emphasize the important words in a document. But honestly, there are lots of different variants of tf-idf; we just provided one example. You could also think about using bigrams and trigrams, and there are lots and lots of ways to represent the words that appear in a document. And that's just for a document. Then maybe we have images: how do we represent an image? We've talked about some ways.
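Just to make the document example concrete before moving on, here's one minimal sketch of tf-idf in plain Python on a toy corpus. This is just one of the many variants mentioned above (raw count times log inverse document frequency), and the corpus and function names are purely illustrative:

```python
import math
from collections import Counter

def tfidf(docs, ngram=1):
    """Compute tf-idf vectors for a list of tokenized documents.

    docs: list of lists of words; ngram=2 also adds bigrams, etc.
    Returns one {term: weight} dict per document.
    """
    def terms(words):
        # unigrams, plus n-grams joined with '_' when ngram > 1
        out = list(words)
        for n in range(2, ngram + 1):
            out += ['_'.join(words[i:i + n]) for i in range(len(words) - n + 1)]
        return out

    term_counts = [Counter(terms(d)) for d in docs]
    n_docs = len(docs)

    # document frequency: in how many documents each term appears
    df = Counter()
    for tc in term_counts:
        df.update(tc.keys())

    # tf-idf = term count * log(N / document frequency);
    # a word in every document gets weight log(N/N) = 0
    return [{t: c * math.log(n_docs / df[t]) for t, c in tc.items()}
            for tc in term_counts]

docs = [['the', 'cat', 'sat'], ['the', 'dog', 'ran'], ['the', 'cat', 'ran']]
vecs = tfidf(docs)
# 'the' appears in all three documents, so its weight is 0 everywhere,
# while rarer words like 'dog' get positive weight
```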
We'll talk about others, but there are lots of challenges there too. Then maybe you have data that's really network based, like data from Facebook. So you can have very complicated data structures coming from very different, diverse data sets, and we wanna be able to use the types of methods we've described on all of them. How we represent our data is gonna have a significant impact on the types of inferences we make from that data. So this is a really, really important problem, and there's no one method for choosing the right representation of your data.

One of the other really important and really significant challenges we're faced with in machine learning these days is how to scale up, in multiple dimensions. One aspect of this is the fact that data is getting bigger and bigger, something that's been talked about extensively in the media. So let's describe a few situations in which we're faced with a growing amount of data. One is that there's a large number of platforms out there for social networking, collecting data via crowdsourcing, sharing your photos and videos, reviewing restaurants, and things like this. The list of ways you can now go online and give data to the world keeps growing, and the number of people doing this and providing data is growing at a huge, huge rate. So we have lots of new data sources available to us.

In addition, think about the way we buy products. We no longer just go to a store that keeps some handwritten record of what was purchased. We now have vendors like Amazon with huge online marketplaces, collecting data about different products, customers, and purchases being made, and lots and lots of data comes from sources like this. And beyond these types of websites, there are also a lot of devices that we can now wear.
So there are these wearable devices: I can now wear a watch that monitors all the activities I'm doing and how I'm sleeping at night, or glasses that record everything I'm seeing. We can also talk about the Internet of Things, which is just lots of connected devices and lots of different sources of information communicating with one another. These are just some of the areas in which we're seeing lots and lots of new data sources, but of course that's not exhaustive. We can also talk about things like medical records. Again, no longer do you go into your doctor's office and just have them write notes by hand that get put in some file. Often they're keeping electronic health records, and these now communicate across systems, so we have lots and lots of electronic health records that are a source of data to be parsed, understood, and used to innovate in medicine.

So lots of new data sets, which is exciting. We can learn a lot about how people operate, about our bodies, about how people purchase and make friends and how they go about their day-to-day activities. But of course we need methods that scale to analyze these types of data sets, and that handle the unique structure of the data they present, and the noise in that data; the list of challenges is really extensive. This is one of the very big challenges in machine learning: how to deal with this big data.

And simultaneously with the data getting really large, we're also faced with the fact that the models we use to analyze these increasingly complex data sets are also growing. The models themselves are becoming bigger and more complicated in order to extract information from these ever growingly, I don't know if that's a word, but you get my point, these very intricate and very large data sources.
So just as an example, when we talked about clustering we mentioned an application where you have recordings of brain activity taken over time. This is just one quick example of a model that was used to analyze that type of data set, and without going into the details of what's shown here on this slide, just realize that there are lots of circles and lots of arrows, and what that means is that this is a really complicated, big, big model.

So you might think, okay, data's getting bigger and models are getting bigger, but that's okay because processors are getting faster. Well, that was the story for a while: we were seeing exponentially increasing processor speeds. But that stopped about a decade ago, and now what we're seeing is a very marginal increase in the speed of an individual processor. So instead, we have to think about new ways to scale up, and the typical thing we leverage these days is collections of processors. And there are different architectures here: things like GPUs, multicore machines, clusters, cloud computing resources, and really, really fancy and expensive supercomputers. So that's great; those are really, really powerful, or potentially powerful, computing resources that we have.

But a question is how do we use these in machine learning? And here we're faced with a number of challenges. One is taking our machine learning algorithms and thinking about how to distribute them across these different processors and run everything we want to run in a coherent way, which is very challenging. Another challenge is how to distribute the data across these different machines, and how to do all of this in a way that is tolerant to failures of the individual machines. So these represent a number of challenges that we are facing in machine learning.
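As a toy illustration of that data-parallel idea, shard the data across workers, have each worker compute small partial summaries, and combine them back on a driver. Here's a sketch using Python's multiprocessing module; a real system would run across machines and add fault tolerance, and the function names here are made up for the example:

```python
from multiprocessing import Pool

def partial_stats(shard):
    # each worker sees only its shard of the data and returns small,
    # easily combinable summaries: (sum, sum of squares, count)
    return sum(shard), sum(x * x for x in shard), len(shard)

def distributed_mean_var(data, n_workers=4):
    # "distribute the data": split it into one shard per worker
    shards = [data[i::n_workers] for i in range(n_workers)]
    with Pool(n_workers) as pool:
        parts = pool.map(partial_stats, shards)
    # combine the partial summaries back on the driver
    s = sum(p[0] for p in parts)
    ss = sum(p[1] for p in parts)
    n = sum(p[2] for p in parts)
    mean = s / n
    return mean, ss / n - mean * mean

if __name__ == '__main__':
    # mean and variance of 0..999, computed across 4 worker processes
    mean, var = distributed_mean_var(list(range(1000)))
```

The key design point is that the workers never ship raw data back, only tiny summaries that add up, which is the same pattern that MapReduce-style systems use at cluster scale.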
And a lot of exciting research is coming out to start addressing these problems. [MUSIC]