1 00:00:00,000 --> 00:00:06,835 For Homework six, you will need to complete a programming assignment to do 2 00:00:06,835 --> 00:00:14,351 with Bayesian networks. This is the Bayesian network that we'll be 3 00:00:14,351 --> 00:00:19,567 evaluating. This is a simple diagnostic network where 4 00:00:19,567 --> 00:00:28,641 the variables are characteristics of the patient, such as whether they smoke or not, 5 00:00:28,641 --> 00:00:37,102 whether they have visited Asia or not; some symptoms, such as whether or not they 6 00:00:37,102 --> 00:00:44,403 have dyspnea, or whether or not they have tested positive on an X-ray, which shows 7 00:00:44,672 --> 00:00:51,924 occlusion in the lung area; and there are three possible conditions that they may 8 00:00:51,924 --> 00:00:56,669 have, which are tuberculosis, lung cancer, or bronchitis. 9 00:00:56,669 --> 00:01:03,473 And our goal will be to write a program that will evaluate the a posteriori 10 00:01:03,473 --> 00:01:10,720 probability of each of the three possible diseases given some combination of 11 00:01:10,720 --> 00:01:14,364 evidence. For example, you may be given that the 12 00:01:14,364 --> 00:01:19,990 patient has visited Asia and has a positive X-ray, and then you need to 13 00:01:19,990 --> 00:01:26,709 figure out which disease is most likely. The probabilities associated with this 14 00:01:26,709 --> 00:01:30,990 network are given as follows. They're also available in this paper, 15 00:01:30,990 --> 00:01:33,845 where this example was originally published. 16 00:01:33,845 --> 00:01:40,003 Let me explain what these things mean. A is the variable about Asia: 17 00:01:40,003 --> 00:01:46,693 the probability that somebody has actually visited Asia is 0.01. As a consequence, 18 00:01:46,693 --> 00:01:53,268 the probability of not visiting Asia, that is, the probability of not A, is 0.99. So, 19 00:01:53,268 --> 00:01:57,437 those probabilities are implied in this description.
20 00:01:57,437 --> 00:02:03,210 So, the description does not list all the possible values that you will need to encode. 21 00:02:03,210 --> 00:02:08,823 Please note that. Similarly, the probability of tuberculosis, given that 22 00:02:08,823 --> 00:02:14,275 somebody has visited Asia, is 5%. Whereas, in general, if they have not 23 00:02:14,275 --> 00:02:19,593 visited Asia, it's 1%. As a consequence of this, the probability 24 00:02:19,593 --> 00:02:24,779 that somebody does not have tuberculosis if they have visited Asia is 95%. 25 00:02:24,779 --> 00:02:30,819 And the probability that they do not have tuberculosis if they have not visited Asia 26 00:02:30,819 --> 00:02:34,158 is 99%. So, the complementary probabilities need to be 27 00:02:34,158 --> 00:02:37,853 inferred for each possible case in this description. 28 00:02:37,853 --> 00:02:43,488 Don't forget that in your encoding. Once you've encoded this, you will be 29 00:02:43,488 --> 00:02:50,201 given evidence and asked questions about the a posteriori probabilities, or rather the most likely 30 00:02:50,201 --> 00:02:54,728 explanation, that is, the best diagnosis given the evidence. 31 00:02:54,728 --> 00:03:00,973 For example, given no evidence whatsoever, what is the chance that you would have 32 00:03:00,973 --> 00:03:04,954 tuberculosis? And the answer is just over 1%, if you 33 00:03:04,954 --> 00:03:11,143 believe this belief network. Your program can encode these 34 00:03:11,143 --> 00:03:19,273 probabilities in tables and use SQL, the way we described in class, to evaluate 35 00:03:19,273 --> 00:03:24,500 the required a posteriori probability for each condition. 36 00:03:25,240 --> 00:03:32,577 Using SQL makes it easier. And this network is certainly not too big 37 00:03:32,577 --> 00:03:39,325 for any SQL engine to handle. 
The sqlite3 module in Python, which is built in, is 38 00:03:39,325 --> 00:03:44,944 a good thing to use, or you can use any SQL engine, or you can encode the network 39 00:03:44,944 --> 00:03:50,631 directly using any algorithm that you may wish to read up on, but the one which we have 40 00:03:50,631 --> 00:03:54,875 covered in class is SQL. So, this is the assignment. 41 00:03:54,875 --> 00:04:01,060 Please implement your program, try it out for different possible conditions, and 42 00:04:01,060 --> 00:04:04,505 then open Homework six to try and answer it. 43 00:04:04,505 --> 00:04:11,004 It will be a timed homework, as before, so that I am sure you're actually using your 44 00:04:11,004 --> 00:04:14,918 program. But there will be enough time to run your 45 00:04:14,918 --> 00:04:21,182 program, type in the conditions that are given, get the answer, and answer 46 00:04:21,182 --> 00:04:24,120 the homework. Good luck.
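As a rough illustration of the table-based approach the lecture describes, here is a minimal sketch using Python's built-in sqlite3 module. It covers only the two variables whose numbers are quoted in the lecture (visited Asia, tuberculosis); the remaining variables would need their own tables, with values taken from the assignment's probability listing, which this transcript does not reproduce. The table names `asia` and `tub`, the column names, and the helper `posterior` are hypothetical choices made for this example, not part of the assignment.

```python
import sqlite3

# Probabilities quoted in the lecture: P(A)=0.01, P(T|A)=0.05, P(T|not A)=0.01.
p_a = 0.01
p_t_given_a = {1: 0.05, 0: 0.01}  # key: value of A (1 = visited Asia)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE asia (a INTEGER, p REAL)")            # P(A)
conn.execute("CREATE TABLE tub (t INTEGER, a INTEGER, p REAL)")  # P(T | A)

# Store the stated probability and the implied complement for every case.
conn.executemany("INSERT INTO asia VALUES (?, ?)", [(1, p_a), (0, 1 - p_a)])
tub_rows = []
for a, p in p_t_given_a.items():
    tub_rows.append((1, a, p))      # has tuberculosis
    tub_rows.append((0, a, 1 - p))  # does not have tuberculosis
conn.executemany("INSERT INTO tub VALUES (?, ?, ?)", tub_rows)

# Hypothetical mapping from variable names to qualified column names.
COLUMNS = {"a": "asia.a", "t": "tub.t"}


def posterior(target, evidence=None):
    """Return P(target = 1 | evidence) by joining the tables, keeping only
    rows consistent with the evidence, summing, and normalising."""
    evidence = evidence or {}
    where = " AND ".join(f"{COLUMNS[k]} = ?" for k in evidence) or "1 = 1"
    sql = (
        f"SELECT {COLUMNS[target]}, SUM(asia.p * tub.p) "
        f"FROM asia JOIN tub ON asia.a = tub.a "
        f"WHERE {where} GROUP BY {COLUMNS[target]}"
    )
    joint = dict(conn.execute(sql, tuple(evidence.values())).fetchall())
    return joint.get(1, 0.0) / sum(joint.values())


print(round(posterior("t"), 4))            # no evidence: 0.0104
print(round(posterior("t", {"a": 1}), 4))  # visited Asia: 0.05
print(round(posterior("a", {"t": 1}), 4))  # Asia given tuberculosis: 0.0481
```

The no-evidence result, 0.01 × 0.05 + 0.99 × 0.01 = 0.0104, matches the "just over 1%" figure mentioned in the lecture. Extending the sketch to the full network would mean adding one table per remaining variable and multiplying its probability column into the same join.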