[MUSIC]

So, we've now built a whole new kind of restaurant experience, or restaurant review experience, using classifiers. Let's dig in and really understand a little bit more what a classifier is, and some other applications of classification.

A classifier takes some input x, for example a sentence from a review, or other inputs as we'll see. It pushes that input through what's called a model to output some value y that we're trying to predict. Here, y is a class, for example positive or negative. Positive, in the case of sentiment analysis, corresponds to thumbs-up reviews, while negative corresponds to thumbs-down reviews.

But this is just one example of classification. You can look at text, for example a web page, and figure out which web pages interest you by assigning them to categories. For example, is this a page about education, a page about finance, a page about technology, and so on? So there aren't just two categories; there can be three, four, or even thousands of categories to predict from.

Now, another example of classification, one that has really impacted all of our lives, is spam filtering. Some of you might remember, perhaps from the early 2000s, what spam filters were like.
The quality was not very good. They were really all hand-tuned systems, where somebody said, oh, it contains certain words, it must be spam. But the spammers kept changing the words a little bit, substituting numbers for letters, and beating the spam filters. What really changed the world of spam filtering, changed it so much that I don't even look at my spam folder anymore (sorry if your message went to my spam folder, I just don't open it), is machine learning. It's classifiers. They take the input x, the email, and feed it through a classifier that predicts whether or not it's spam, and they do that really well. And they do it by looking not just at the text of the email, but at other characteristics. For example, who sent it: if it's a close friend, or somebody you communicate with a lot, it's less likely to be spam. Or the IP address: is the person sending from their usual computer? And so on. Lots of information.

So this is another really interesting practical application.

In computer vision, we do a lot of classification. We take an image and figure out what is in that image. Here, for example, the input x is the pixels of the image.
We feed it to a classifier, and we're going to predict things like: is this a dog? In fact, is it a Labrador Retriever, a Golden Retriever, or a different kind of dog? This is actually my dog, a Labrador Retriever. And as we will see later on in the deep learning module, there are really interesting new ways of doing this with very high accuracy.

Now, you can also use classification in medical diagnosis systems, and in fact this is what your doctor does. They take your temperature, maybe they look at your x-ray, they look at some medical tests, and they try to make a prediction about what's ailing somebody. Maybe they say, no, you're just healthy, or you have a cold, you have the flu, maybe even pneumonia. The disease is the variable y that's being predicted. Now, these days there are really interesting new developments around personalized medicine, because the prediction doesn't have to depend just on the standard measurements; it can be really personalized for me. It can depend on my particular DNA sequencing, which is pretty exciting, and also on my lifestyle, which maybe looks something like this, or, more realistically, something like this. And so, given all these measurements, we can make an even better prediction of what's ailing me.
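The pipeline described throughout this lecture, an input x pushed through a model to produce a predicted class y, can be sketched in a few lines of code. This is a toy illustration only: the word lists and the word-counting "model" are made-up stand-ins, not how a real sentiment classifier is built.

```python
# Toy sketch of a classifier: input x (a review sentence) goes
# through a "model" and comes out as a predicted class y
# (positive or negative). The word lists below are invented
# for illustration, not a real sentiment lexicon.

POSITIVE_WORDS = {"great", "awesome", "delicious", "amazing", "good"}
NEGATIVE_WORDS = {"terrible", "awful", "bland", "bad", "disgusting"}

def classify_sentiment(sentence):
    """The 'model': count positive vs. negative words in x and
    output the class y with the higher count (ties go positive)."""
    words = sentence.lower().split()
    pos = sum(w in POSITIVE_WORDS for w in words)
    neg = sum(w in NEGATIVE_WORDS for w in words)
    return "positive" if pos >= neg else "negative"

print(classify_sentiment("The sushi was awesome and the service was great"))   # → positive
print(classify_sentiment("The food was bland and the wait was terrible"))      # → negative
```

A real classifier would learn its model from labeled training data rather than use hand-tuned word lists; as the lecture notes about early spam filters, hand-tuned rules are brittle and easy to defeat.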
Now, this idea of classification in machine learning has really gone much further, even to being able to read your mind. When I was at Carnegie Mellon, Tom Mitchell, one of my friends and colleagues, had his office next door, and he came to me and said, we've done this amazing thing: we can take an image of your brain using a technology called fMRI, which is a brain scan, and predict, while you're reading a word of text, whether you're reading the word "hammer" or the word "house". So it's really reading your mind. And in fact, he went on to do many interesting things. For example, if you're looking at a picture of a hammer or a house, but you trained the classifier on you reading the words "hammer" and "house", you're still able to read your mind and figure out which picture you're looking at. So this is yet the next frontier of classification: understanding how the brain works.

[MUSIC]