[MUSIC]

So the way we're going to learn about this data-to-intelligence pipeline is by examining a number of case studies that ground the methods we present in real-world applications. And that's one of the really unique features of this course.

In our first case study, we're going to look at predicting house values. So the intelligence we're deriving is a value associated with some house that's not on the market. We don't know its value, and we want to learn that from data.

And what's our data? Well, in this case we're going to look at other houses and their sales prices to inform the value of the house we're interested in. In addition to the sales prices, we're going to look at other features of the houses, like how many bedrooms and bathrooms they have, the number of square feet, and so on.

Our machine learning method is something that's going to relate the house attributes to the sales price. Because if we can learn this model, this relationship from house-level features to observed sales price, then we can use it for predicting on the new house: we take its attributes and predict its sales price. This method is called regression.
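To make that concrete, here is a minimal sketch in Python of what this kind of regression looks like. The houses, attribute values, and prices are made up for illustration, and the choice of scikit-learn's LinearRegression is my assumption, not a tool the course prescribes:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: attributes of houses that already sold.
# Columns: [bedrooms, bathrooms, square_feet]
X_train = np.array([
    [3, 2, 1500],
    [4, 3, 2200],
    [2, 1,  900],
    [5, 4, 3000],
])
# Observed sales prices for those same houses (made-up numbers).
y_train = np.array([310_000, 450_000, 195_000, 620_000])

# Learn the relationship from house attributes to sales price.
model = LinearRegression().fit(X_train, y_train)

# Predict the value of a house that's not on the market.
new_house = np.array([[3, 2, 1800]])
print(model.predict(new_house))  # estimated sales price
```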
In our second case study, we're going to explore a sentiment analysis task where we have reviews of some restaurants. So, for example, in this case a review says the sushi was awesome, the food was awesome, but the service was awful. And we want to take this review and classify whether it has positive sentiment, a good review, thumbs up, or negative sentiment, thumbs down.

And how are we going to do this? Well, we're going to look at a lot of other reviews: the text of each review and the rating of that review, in order to understand the relationship we need for classifying sentiment. So, for example, we might analyze the text of a review in terms of how many times it uses the word "awesome" versus how many times it uses the word "awful." And from the other reviews we have, we're going to learn a decision boundary, based on the balance of usage of these words, between positive and negative reviews. The way we learn that from those other reviews is based on the ratings associated with their text. And this method is called classification.
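As a rough illustration of that idea, here is a small Python sketch, again with scikit-learn and made-up reviews. Counting only "awesome" and "awful" mirrors the lecture's example, but the word-count features and the logistic regression classifier are my assumptions about one reasonable way to do it:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical labeled reviews: 1 = positive (thumbs up), 0 = negative.
reviews = [
    "the sushi was awesome and the staff was awesome",
    "awful food and awful service",
    "awesome experience, would come back",
    "the wait was awful",
]
labels = [1, 0, 1, 0]

# Count word occurrences; we track just the two words from the example.
vectorizer = CountVectorizer(vocabulary=["awesome", "awful"])
X = vectorizer.fit_transform(reviews)

# Learn a decision boundary between positive and negative reviews.
classifier = LogisticRegression().fit(X, labels)

# Classify a new review based on its balance of "awesome" vs. "awful".
new_review = ["the sushi was awesome, the food was awesome, "
              "but the service was awful"]
print(classifier.predict(vectorizer.transform(new_review)))  # expect 1
```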
In our third case study, we're going to do a document retrieval task, where the intelligence we're deriving is an article, or a book, or something like that, that's of interest to our reader. And the data we have is a huge collection of possible articles that we could recommend. What we're going to do in this case is try to find structure in the data based on groups of related articles: maybe there's a collection of articles about sports, world news, entertainment, and science. If we find this structure and annotate our corpus, our collection of documents, with these types of labels, which we don't have ahead of time and are trying to infer from the data, then we can use it for very rapid document retrieval. Because if I'm sitting here reading some article about world news, then if I want to retrieve another article, I already know which articles to search over. This type of approach is called clustering.

In our fourth case study, we're going to do this really interesting thing called collaborative filtering, which has had a lot of impact in many domains in the last decade. Specifically, we're going to look at product recommendation, where you take your past purchases and try to use those to recommend a set of other products you might be interested in purchasing.

So in this case, the data we're going to use to derive the intelligence for product recommendation comes from wanting to understand the relationship between what you bought before and what you're likely to buy in the future. To do this, we're going to use other users' purchase histories, and possibly features of those users. But the key idea here is that we take this data and arrange it into a customers-by-products matrix, where the squares indicate products that a customer actually purchased, products that are liked by that customer. And from this matrix, we're going to learn features about users and features about products.

Once we learn those features about users and products from the data I've described, we can use them to see how much agreement there is between the attributes a user likes and whether a product is actually about those attributes. So in the example I'm showing here, maybe a user is a mom and has certain features that are similar to other users who are also moms. From that, we can infer things about products, what their attributes are, for example, baby products that are of interest to moms. And we use that information to form our recommendations. This type of approach, going from the customers-by-products matrix to learned features about users and products, is called matrix factorization.
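Here is a toy sketch of that idea in Python with NumPy: we factor a small, made-up customers-by-products matrix into user features and product features by gradient descent, then score products for a user by the dot-product agreement between the two feature vectors. The data, the feature dimension, and the learning rate are all assumptions for illustration, not the course's actual algorithm:

```python
import numpy as np

# Hypothetical customers-by-products matrix: 1 = purchased/liked.
R = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
])
num_users, num_products = R.shape
k = 2  # number of learned features per user and per product

rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(num_users, k))     # user features
P = rng.normal(scale=0.1, size=(num_products, k))  # product features

# Gradient descent on ||U @ P.T - R||^2: make the dot products of
# user and product features reproduce the observed purchases.
for _ in range(2000):
    error = U @ P.T - R
    grad_U = error @ P
    grad_P = error.T @ U
    U -= 0.05 * grad_U
    P -= 0.05 * grad_P

# Agreement between user 0's features and every product's features:
# higher scores suggest products this customer is more likely to like.
print(U[0] @ P.T)
```

The factorization itself never sees explicit labels like "mom" or "baby product"; those kinds of attributes emerge implicitly in the learned feature vectors, which is what makes the dot-product scores useful for recommendation.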
Okay, well, in our final case study, we're going to look at a visual product recommender. So here, our data is an image: somebody goes to the web and inputs not text but a picture, maybe of a black shoe, or a black boot, or a high heel, or some docker shoe or running shoe. And what they want back is a set of results, shoes that might also be of interest to them, shoes that are visually similar to the picture they have, and they want to be able to search over those to purchase an item.

The way we're going to do this, to go from an image to a set of related images, is that we need very good features of that image in order to find other images that are similar. And the way we're going to derive those really detailed features is with something called deep learning. In particular, we're going to look at neural networks, where every layer of the network provides more and more descriptive features. In the little example we show here, the first layer might just detect things like different edges in the image. When we get to the second layer, we start detecting corners and more interesting features like that. And as you go deeper and deeper into the layers, you get more and more intricate features arising.

So, as you see, we're going to walk through a series of real-world case studies, real-world problems, and real-world solutions using machine learning. And through this, we're going to explore a series of methods that have a lot of power out there, methods that will allow you to develop and deploy new machine learning techniques on new problems, ones that aren't the exact case studies we used. But the case studies will allow us to really ground the methods we're describing in things that are very interpretable.

[MUSIC]