When we use the word 'information' in everyday speech, our meaning is colloquial: we use the term informally, without any precise definition. But it has a very formal mathematical definition as well. For a really enjoyable account of the formal meaning of information and its history, James Gleick's recent book, The Information, is highly recommended.

In the context of advertising and news, information has a very important role to play. Many of you will remember the scandal that broke last year in the British media world. And you may have wondered, as I did, why these people did such obviously illegal things. Well, simply so that you and I would read their stories and they would become more and more popular.

To put it simply, a story about a dog biting a man is not newsworthy at all; it happens very often. But reverse the words, so that the story is about a man biting a dog, and it becomes interesting, because that really doesn't happen very often. Why is news so much about such events: the scandalous, the rare, the unexpected?

As it turns out, Claude Shannon, in his famous 1948 paper, "A Mathematical Theory of Communication", formally defined the term 'information' as being related to surprise.
In particular, consider a message informing us that some event has occurred, an event that normally occurs with probability p. Such a message conveys a precise amount of information, which Shannon argued was exactly -log p.

Let's look at this for a moment. If p is one, then log p is zero. So a message informing us that an event guaranteed to occur has actually occurred is not really telling us anything new: its information content is zero. On the other hand, if p is close to zero, meaning the event is very rare, then -log p is a very large number, telling us that a rare event conveys a large amount of information.

Equally importantly, Shannon made the bit the unit of information: taking logarithms to base 2, a message informing us of an event of probability p conveys -log₂ p bits of information.

Now one might wonder whether this bit has anything to do with the bits we are all used to. Well, it's exactly the same bit, and here is why. Think about Morse code, in which letters are encoded as dots and dashes and which was used for communication by telegraph: outdated now, but very common in Shannon's time. Notice that the letters E, A and I, which are among the most common letters in English, are encoded with a small number of symbols, whereas less common letters are encoded with longer sequences.
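To make the formula concrete, here is a minimal Python sketch, assuming base-2 logarithms so that results come out in bits. The function name `self_information` is a hypothetical choice for this illustration, not Shannon's notation, and the six Morse letters are an illustrative pick of common (E, I, A) and rare (Q, Z, J) English letters rather than a frequency study.

```python
import math

def self_information(p: float) -> float:
    """Information content, in bits, of learning that an event of probability p occurred."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    # log2(1/p) equals -log2(p); this form returns 0.0 (not -0.0) when p == 1.
    return math.log2(1.0 / p)

# A certain event tells us nothing; the rarer the event, the more bits it carries.
for p in (1.0, 0.5, 0.25, 0.001):
    print(f"p = {p:<6} -> {self_information(p):6.3f} bits")

# International Morse code for three common and three rare English letters.
MORSE = {"E": ".", "I": "..", "A": ".-", "Q": "--.-", "Z": "--..", "J": ".---"}
for letter, code in MORSE.items():
    print(f"{letter}: {code:<5} length {len(code)}")
```

The p = 0.5 row corresponds to a fair coin toss and comes out at exactly one bit.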
Similarly, notice that the more common words in a language tend to have shorter spellings, and the rarer words longer ones. Quite simply, words, and even artificial codes, naturally encode more frequent items with fewer symbols, or fewer bits, reserving the longer sequences for rarer items, purely for efficiency. And this is the relationship between bits and surprise.

Another way to understand this is to see what happens when p is 0.5, as when you toss a coin that can come up either heads or tails, one or zero. The information works out to be exactly one bit, since -log₂ 0.5 = 1. So news telling us which of two equally probable things has occurred conveys one bit of information.

In fact, the concept of information is so deep, not only for our digital world but for the universe itself, that some physicists, such as John Wheeler, have suggested that the laws of physics can be derived directly from information, an idea Wheeler summed up as "it from bit".

Well, all that sounds very interesting, but let's now return to our discussion of news, newspapers and what it takes to make news sell.
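The efficiency argument made earlier in this segment, that frequent items get short codes and rare items long ones, is what Huffman coding does systematically. The sketch below is an illustration under stated assumptions: Huffman's algorithm (1952) is not discussed in this lecture, and the four-symbol source and its probabilities are hypothetical. Because those probabilities are powers of 1/2, each codeword length comes out equal to -log₂ p, the surprise of that symbol, and a fair coin gets one bit per outcome.

```python
import heapq
import math
from itertools import count

def huffman_code_lengths(probs: dict[str, float]) -> dict[str, int]:
    """Return the Huffman codeword length (in bits) for each symbol."""
    if len(probs) == 1:
        # A lone symbol still needs one bit to be transmitted.
        return {sym: 1 for sym in probs}
    tie = count()  # tie-breaker so heapq never has to compare the dicts
    heap = [(p, next(tie), {sym: 0}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Repeatedly merge the two least probable subtrees.
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        # Merging pushes every symbol in both subtrees one level deeper.
        merged = {sym: depth + 1 for sym, depth in {**left, **right}.items()}
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]

# Hypothetical source whose probabilities are powers of 1/2.
source = {"common": 0.5, "fairly common": 0.25, "rare": 0.125, "very rare": 0.125}
lengths = huffman_code_lengths(source)
for sym, p in source.items():
    print(f"{sym:<14} p={p:<6} code length={lengths[sym]}  -log2 p={-math.log2(p):.0f}")

# A fair coin: two equally likely outcomes, one bit each.
print(huffman_code_lengths({"heads": 0.5, "tails": 0.5}))
```

For probabilities that are not powers of 1/2, Huffman lengths are whole numbers and can only approximate -log₂ p, but the ordering still holds: rarer symbols never get shorter codewords than more frequent ones.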