Okay, so in this course, we've talked about lots of different machine learning methods and lots of applications where these types of methods can be very impactful. But of course, lots of open challenges still remain in machine learning, so let's discuss some of them.

One is the fact that we often have a choice of which model to use. For example, when we talked about recommending products, we said we could use a classification model, where we take features of the user and the product and pass them through a classifier to predict whether the person will like the product or not. But we also talked about using matrix factorization, where we learn features about users and products and use those to recommend products to users. And then we talked about featurized matrix factorization, which combines these two ideas. The list of possible models we can consider for a task is often very large, and this typically leaves the practitioner perplexed: which model should I use? Searching over this set of possible choices is still an open challenge in machine learning. (A sketch of the matrix factorization option appears just after this passage.)

Another really important challenge we're often faced with is how to represent our data. For example, when we talked about our document modeling and document retrieval tasks, we said we could use raw word counts, or we could normalize the vectors, or we could use things like tf-idf to account for very popular words and put more emphasis on the important words in a document. But honestly, there are lots of different variants of tf-idf; we just provided one example. You could also think about using bigrams and trigrams, and there are lots and lots of ways to represent the words that appear in a document we'd like to model. (See the second sketch below.)

But that's just for a document. Maybe we have images instead; how do we represent an image? We've talked about some ways and we'll talk about others, but there are lots of challenges there. Or maybe you have data that's really network based, like data from Facebook. So you can have very complicated data structures coming from very different, diverse data sets, and we want to be able to use the types of methods we've described on all of them.
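To make the model-choice point concrete, here is a minimal sketch of the matrix factorization idea: learn a small number of latent features per user and per product from observed ratings using stochastic gradient descent. This is an illustrative toy, not the course's implementation; the data, dimensions, and hyperparameters are all assumptions.

```python
import numpy as np

# Toy observed ratings: (user_id, product_id, rating). Purely illustrative data.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
n_users, n_products, k = 3, 3, 2  # k = number of latent features (assumed)

rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(n_users, k))     # latent user features
V = rng.normal(scale=0.1, size=(n_products, k))  # latent product features

lr, reg = 0.05, 0.01  # learning rate and L2 regularization (assumed values)
for epoch in range(200):
    for u, p, r in ratings:
        err = r - U[u] @ V[p]        # prediction error on this observed rating
        u_row = U[u].copy()          # keep the old user row for V's gradient
        # Gradient steps on the regularized squared error
        U[u] += lr * (err * V[p] - reg * U[u])
        V[p] += lr * (err * u_row - reg * V[p])

# Predicted score for a (user, product) pair we never observed
print("predicted rating:", U[0] @ V[2])
```

The classification approach and the featurized combination would look quite different in code, which is exactly the practitioner's dilemma: several plausible models, no obvious winner up front.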
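And as one concrete illustration of the representation question for text, here is a minimal sketch using scikit-learn (an assumption; the course itself may use other tooling), showing raw counts, tf-idf, and an n-gram variant side by side. The toy corpus is made up.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = [
    "the quick brown fox",
    "the lazy dog",
    "the quick dog jumps over the lazy fox",
]  # toy corpus, purely illustrative

# Raw word counts: one column per distinct word
counts = CountVectorizer().fit_transform(corpus)

# tf-idf: downweights words that are common across the corpus (like "the")
tfidf = TfidfVectorizer().fit_transform(corpus)

# Unigrams plus bigrams: a richer, and much larger, representation
bigrams = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(corpus)

print(counts.shape, tfidf.shape, bigrams.shape)
```

Each of these is a defensible choice, and which one works best for a given retrieval task is exactly the kind of question the search over representations has to answer.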
How we represent our data, of course, is going to have a significant impact on the types of inferences we make from that data. So this is a really, really important problem, and there's no one method for choosing the right representation of your data.

One of the other really significant challenges we're faced with in machine learning these days is how to scale up in multiple dimensions. One aspect of this is the fact that data is getting bigger and bigger, something that's been talked about extensively in the media. So let's just describe a few situations in which we're faced with a growing amount of data.

One is the fact that there's a large number of platforms out there for social networking, collecting data via crowdsourcing, sharing your photos and videos, reviewing restaurants, and things like that. The list of ways you can now go online and give data to the world is growing, and the number of people doing this and providing data is growing at a huge rate. So we have lots of new data sources available to us.

In addition, think about the way we buy products: we no longer just go to a store and have some handwritten record of what product was purchased. We now have vendors like Amazon with huge online marketplaces, collecting data about different products, customers, and purchases, and lots and lots of data from sources like these.

And beyond these types of websites, there are also a lot of devices we can now wear. I can wear a watch that monitors all the activities I'm doing and how I'm sleeping at night, or glasses that record everything I'm seeing. There's also the Internet of Things, which is just lots of connected devices and lots of different sources of information communicating with one another.

So these are just some of the areas in which we're seeing lots and lots of new data sources, but of course that's not exhaustive. We can also talk about things like medical records.
Again, no longer do you go into your doctor's office and just have them write notes by hand that get put in some file. Often they're taking electronic health records, and these now communicate across systems, so we have lots and lots of electronic health records: a source of data to be parsed, understood, and used to innovate in medicine.

So there are lots of new data sets, which is exciting. We can learn a lot about how our bodies operate, about how people purchase and make friends, and about how they go about their day-to-day activities. But of course, we need methods that scale to analyze these types of data sets, that handle the unique structure of the data they present, that deal with the noise, and so on; the list of challenges is really extensive. This is one of the very big challenges in machine learning: how to deal with this big data.

Simultaneously with data getting really large, we're also faced with the fact that the models we use to analyze these increasingly complex data sets are growing too. The models themselves are becoming bigger and more complicated in order to extract information from these very intricate and very large data sources. Just as an example, when we talked about clustering, we discussed an application where you have recordings of brain activity taken over time. This is just one quick example of a model that was used to analyze that type of data set, and without going into the details of what's shown on the slide, just realize that there are lots of circles and lots of arrows, which means this is a really big, complicated model.

So you might think, okay, data's getting bigger and models are getting bigger, but that's okay because processors are getting faster. Well, that was the story for a while: we were seeing exponential increases in processor speed. But that stopped about a decade ago, and now we're seeing only very marginal increases in the speed of an individual processor. So instead, we have to think about new ways to scale up.
And the typical thing we're leveraging these days is collections of processors. There are different architectures we can use: GPUs, multicore machines, clusters, cloud computing resources, and really fancy, expensive supercomputers. So that's great; those are really powerful, or potentially powerful, computing resources.

But the question is how we use these in machine learning, and there we face a number of challenges. One is taking our machine learning algorithms and thinking about how to distribute them across these different processors so that everything we want to run executes in a coherent way; that's very challenging. Another is how we distribute the data across these different machines, and how we do all of this in a way that's tolerant to failures of the individual machines. (A toy sketch of the data-distribution idea appears at the end of this section.)

So these represent a number of challenges that we're facing in machine learning, and a lot of exciting research is coming out to start addressing these problems.
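To give a flavor of the data-distribution idea just mentioned, here is a minimal sketch that partitions a data set across worker processes and combines their partial results, in this case partial sums for computing a mean. This is a toy illustration of data parallelism on a single machine, not a fault-tolerant distributed system; the data, shard count, and function names are all assumptions.

```python
from multiprocessing import Pool

import numpy as np


def partial_stats(chunk):
    """Compute this worker's contribution: (sum, count) for its shard of the data."""
    return chunk.sum(), chunk.size


if __name__ == "__main__":
    data = np.arange(1_000_000, dtype=float)  # toy data set, purely illustrative

    # "Distribute" the data: split it into shards, one per worker
    shards = np.array_split(data, 4)

    # Each worker computes statistics on its own shard in parallel
    with Pool(processes=4) as pool:
        results = pool.map(partial_stats, shards)

    # Combine the partial results into the global answer
    total, count = map(sum, zip(*results))
    print("mean =", total / count)  # matches data.mean()
```

Real systems layer a lot on top of this pattern: replicating shards so lost data can be recovered, schedulers that reassign work when a machine fails, and communication-efficient ways to combine model updates rather than simple sums. Those are precisely the open challenges discussed above.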