When we normally use the word 'information', our meaning is colloquial; we use the term informally to convey its everyday sense. But it also has a very precise mathematical definition. For a really enjoyable account of this formal meaning and its history, a recent book by James Gleick is highly recommended.

In the context of advertising and news, information has a very important role to play. Many of you will remember the scandal that broke last year in the British media world. And you may have wondered, as I did, why these people did such obviously illegal things. Quite simply, so that you and I would read their stories and they would become more and more popular. To put it plainly, a story about a dog biting a man is not newsworthy at all; it happens very often. But reverse the words, so that the story is about a man biting a dog, and it becomes interesting, because that doesn't happen very often. Why is news so much about scandal, rare events, the unexpected?

As it turns out, Claude Shannon, in his famous 1948 theory of communication, defined the term 'information' formally as being related to surprise. In particular, a message informing us that some event has occurred, an event which has probability p of occurring, conveys a precise amount of information, which Shannon argued was exactly minus log of p. Let's look at this for a moment. If p is one, then log of p is zero. So a message informing us that an event guaranteed to occur has actually occurred is not really telling us anything new; its information content is zero. On the other hand, if p is close to zero, meaning the event is very rare, then minus log p is a very large number, telling us that news of a rare event conveys a large amount of information. Equally importantly, Shannon defined the term 'bit': a message informing us of an event of probability p conveys minus log (to the base two) of p bits of information.

Now one might wonder whether this bit has anything to do with the bits we are all used to. Well, it is exactly the same bit, and here is why. Think of Morse code, in which letters are encoded by dots and dashes for communication by telegraph: outdated now, but very common in Shannon's time. Notice that the letters E, A and I, which are among the most common letters in the language, are encoded with a small number of symbols, whereas less common letters are encoded with longer sequences. Similarly, notice that the more common words in language itself tend to have shorter spellings, and the rarer words have longer spellings. Quite simply, languages, and even artificial codes, naturally encode the more frequent items with fewer symbols, or fewer bits, reserving the longer sequences for the rarer items, purely for efficiency. And this is the relationship between bits and surprise.

Another way to see this is to look at what happens when p is 0.5, as when tossing a coin that can come up either heads or tails, one or zero. The information works out to be exactly one, so a message telling us that one of two equally probable events has occurred conveys exactly one bit of information. In fact, the concept of information is so deep, not only for our digital world but for our entire universe, that some physicists, such as John Wheeler, have suggested that the laws of physics themselves can be derived directly from information: 'it from bit'.
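To make the arithmetic concrete, here is a minimal sketch (not part of the lecture itself) that simply evaluates Shannon's formula, minus log base two of p, for a few probabilities; the function name information_bits is just illustrative.

```python
import math

def information_bits(p: float) -> float:
    """Information content, in bits, of a message reporting an event of probability p.

    This is Shannon's -log2(p); valid for 0 < p <= 1.
    """
    return -math.log2(p)

# A certain event carries no information, a fair coin flip carries exactly one bit,
# and rarer events carry more bits ("man bites dog" is bigger news than "dog bites man").
for p in [1.0, 0.5, 0.25, 0.001]:
    print(f"p = {p:>6}: {information_bits(p):.3f} bits")
```

Running this prints 0 bits for p = 1, exactly 1 bit for the fair coin at p = 0.5, 2 bits for p = 0.25, and roughly 10 bits for a one-in-a-thousand event, which is the numerical version of the argument above.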
Well, all that sounds very interesting, but let's now return to our discussion about news, newspapers and what it takes to make news sell.