So now we really have to face that crucial question: how can we tell when a generalization from a sample is a good argument? We're not going to be able to do a lot of mathematical statistics here. That would take a whole course, and you ought to take one; statistics is useful, important, and interesting, definitely worth a course of its own. All we're going to be able to do here is look at some really common errors that people make, so you won't be misled in these obvious ways. It won't so much help you do statistical studies of your own, but it will help you avoid getting fooled by people who cite statistical studies for conclusions that you might not want to believe. To illustrate these problems, I did a little survey. I happen to love chocolate chip cookies. Now, not all chocolate chip cookies are equally good. Some of them, I think, have too few chips. You need a lot of chocolate chips, but you don't want too many, because then it's just a bunch of chocolate and you don't get the dough, the butter, the sugar. That stuff is good too. So you want just the right balance. In, say, a three-inch-diameter cookie, you want about 10 to 12 chocolate chips. There happened to be five bakeries in my hometown that sell chocolate chip cookies. So I went and got ten chocolate chip cookies from each of the five bakeries, counted the chocolate chips in the cookies, and found that 80% of the chocolate chip cookies that I bought from bakery A had 10 to 12 chips in them. So I conclude that 80% of the chocolate chip cookies from bakery A have 10 to 12 chocolate chips in them. That sounds like a pretty good argument, doesn't it? I could have sampled 20 or 30, but ten is a pretty good number for a sample. So you ought to believe that conclusion, right? There's nothing wrong with that argument, is there? Well, what's the problem? The problem is, I'm lying. I didn't buy a single cookie.
I didn't do this survey. What does that show you? It shows that one problem for statistical generalizations from samples is that the premises have to be true. That's kind of obvious. Just as a valid deductive argument is no good unless it's also sound, an argument that would be strong if its premises were true is no good if the premises aren't true. But also like deductive arguments, it's not enough for the premises to be true. What if I really did buy the cookies, and I really did count, but I missed a bunch of chips because I was going so fast? Or I couldn't count the chocolate chips very well because they all melted together? Or maybe I got the cookies from bakery A mixed up with the cookies from bakery B or bakery C or bakery D, and I just made a mistake when I was counting the cookies. Then the premises of the generalization from the sample are false, but not because I'm lying; rather, because I didn't count well, so I'm unjustified. And of course it doesn't help if I'm justified when you're not justified. You have no reason to believe this about the cookies unless, in addition to my counting them accurately, carefully, and reliably, you have reason to believe that I did so. So the general point is simply that when you face a generalization from a sample, the premises have to be true and justified. The first question you ought to ask about any generalization from a sample is: are the premises true, and are they justified? Next, let's assume that I'm honest, I'm not lying, and I count carefully and thoroughly, so I don't make a mistake and I'm justified in believing that I haven't made one. I go to all five of the bakeries, I buy a cookie from each bakery, and I count the chips in the cookies that I bought.
And it turns out that the cookie from bakery A has 11 chips in it, and the cookies from bakeries B, C, D, and E all have fewer than ten chips in them. So now I can do two of these generalizations from samples, right? I can say: 100% of the cookies that I sampled from bakery A have between 10 and 12 chips, therefore 100% of the cookies from bakery A have between 10 and 12 chips. And 0% of the cookies I sampled from bakery B have 10 to 12 chips, so 0% of the cookies from bakery B have 10 to 12 chips. What's wrong with that argument? Well, I hope it's pretty clear: the problem is that you can't generalize from just one cookie. If the cookies aren't made in a totally mechanical way, where they count the chips before they put them in the cookies, then you can't know that every cookie is the same; and if you can't know that every cookie is the same, then you can't generalize from one cookie alone to all the cookies in the bakery. You might have gotten one that happened to have 11 chips when all the other cookies in the bakery had fewer than ten. Or, from the other bakeries B, C, D, and E, you might have gotten a cookie that had fewer than ten when all the rest of the cookies in the store had between 10 and 12. When you generalize from a sample that's too small, it's called the fallacy of hasty generalization. It's so obvious that it's hard to believe it's actually often committed, but it's a very common fallacy. Your next-door neighbor buys a new car, it breaks down, and you say that kind of car is no good. Or you meet somebody from Sweden, and this person from Sweden likes football, so you say people from Sweden like football. People constantly generalize from extremely small samples in order to form generalizations that guide their behavior in everyday life. Sometimes they're right, but a lot of times they're wrong, and that's when you have the fallacy of hasty generalization.
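The one-cookie problem can be made concrete with a small simulation. This is just an illustrative sketch, not anything from the lecture: the chip-count distribution below is made up, and the point is only to show how much further off a one-cookie sample tends to be than a larger one.

```python
import random

random.seed(0)

# Hypothetical bakery: chip counts vary cookie to cookie.
# With counts drawn uniformly from 7..14, the true rate of
# 10-to-12-chip cookies is about 3/8.
population = [random.randint(7, 14) for _ in range(10_000)]
true_rate = sum(10 <= c <= 12 for c in population) / len(population)

def average_error(sample_size, trials=2000):
    """Average distance between the sampled rate and the true rate
    for a given sample size, over many repeated samples."""
    errors = []
    for _ in range(trials):
        sample = random.sample(population, sample_size)
        rate = sum(10 <= c <= 12 for c in sample) / sample_size
        errors.append(abs(rate - true_rate))
    return sum(errors) / trials

# A one-cookie sample always reports 0% or 100%, so its average
# error is huge; the error shrinks steadily as the sample grows.
for n in (1, 10, 100):
    print(n, round(average_error(n), 3))
```

The exact numbers depend on the made-up distribution, but the pattern does not: the sampled rate from one cookie is almost always badly wrong, which is exactly the hasty-generalization mistake.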
To avoid that fallacy, we need to ask a second question about generalizations from samples: is the sample large enough? Notice that samples come in varying sizes, from just one item to almost the whole set. So the question is not simply whether the sample is large; what we really want to know is whether it's large enough, because sometimes a very small sample can be plenty big. Imagine that you come across an apple tree and you want to find out whether the apples off that tree float in water. So you bring in a tub of water, you pull an apple off the tree, you put it in the tub, and it floats. Now you can generalize that all, or at least almost all, of the apples off that tree will float in water. One is a big enough sample in that case. Why? Because you have background information from biology that the apples on that tree are going to be very similar to each other, because of how they arose. So sometimes a single instance is enough, even if it's not enough when we're counting chips in chocolate chip cookies. The other point to keep in mind about the sample being large enough is that whether it's large enough depends on what the stakes are. If you're testing a bunch of parachutes to see if they work, you'd better not just check a few. Even ten is not enough, like we used for the chocolate chip cookies. You want to check every parachute to make sure that it's packed properly and is going to work, or you're going to have a disaster when they fail. What about chocolate chip cookies? Suppose you take a sample of ten and it turns out that sample's not really representative. Big deal. So there are nine chips or 13 chips; it's just not that serious an issue.
So a sample can be large enough for something that doesn't matter, like chocolate chip cookies, without being large enough for something that really does matter, like whether parachutes are packed properly. Whether the sample is large enough depends on the background information (that's what the apple case showed us) and also on what's at stake (that's what the parachute case showed us). Next, let's assume that I'm not lying, the count's accurate, you're justified in believing it's accurate, and the sample's big enough. All that is settled, right? So what I did was go into the five bakeries, turn to the person behind the counter, and say, "I'm doing this little survey because I've got to figure out where I want to buy my chocolate chip cookies in town, and I want to find out how many of your cookies have between 10 and 12 chocolate chips. So could you sell me ten chocolate chip cookies for my survey?" Then I take ten cookies from each of the five bakeries, I bring them home, and I count the chips. Sure enough, 80% of the cookies that I bought from bakery A have 10 to 12 chips, and so I conclude that 80% of the cookies that are made in bakery A have 10 to 12 chips. That seems like a pretty good argument, doesn't it? It's got a big enough sample, and if you don't believe that, let's say it's 20 cookies. There's still something wrong with that argument. What's wrong with it? Well, I told the guy behind the counter that I was doing this survey to figure out where I was going to buy my chocolate chip cookies. So if he wants me to buy chocolate chip cookies from his bakery, then he's going to look in the counter for the ten cookies that look like they have about 10 to 12 chips in them. So it might be no surprise that 80% of the cookies that I sampled have 10 to 12 chips, because he picked out the ones that did.
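The idea that "large enough depends on the stakes" can be roughed out numerically. The lecture doesn't derive this, but the standard textbook approximation for estimating a proportion says you need roughly n = z²·p(1−p)/m² respondents to get within margin m at about 95% confidence (z = 1.96, worst case p = 0.5). The stakes set the margin you can tolerate:

```python
import math

def sample_size_needed(margin, p=0.5, z=1.96):
    """Approximate sample size to estimate a proportion within
    +/- margin at ~95% confidence (normal approximation,
    worst-case variance at p = 0.5)."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)

# Low stakes (cookies): being within +/-15% is fine.
print(sample_size_needed(0.15))  # -> 43
# Higher stakes (an election poll): within +/-3%.
print(sample_size_needed(0.03))  # -> 1068
# Parachutes: no sampling margin is acceptable at all --
# you inspect every single one, so this formula doesn't apply.
```

Note how fast the requirement grows as the tolerable margin shrinks; and for truly high-stakes cases like the parachutes, sampling is the wrong tool entirely.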
This is called the fallacy of biased sampling. Sometimes the sample that you take is not representative of the whole because it's biased in a certain way. In this case, it was biased by the person behind the counter: what he knew about what I was doing, and his motive of trying to get me to buy from his shop. The fallacy of biased sampling is something we have to watch out for, and to avoid it we need to ask a third question about all generalizations from samples. We want to know whether the premises are true and justified, we want to know whether the sample is large enough, and we also have to ask: is the sample biased in any way that's going to weaken the argument? It might seem that the fallacy of biased sampling is so obvious that nobody who was careful and any good at what they were doing would ever commit it, but actually, people do it all the time. Some of the top pollsters in history have done it. The most famous example involved Franklin Delano Roosevelt, who was running for president in 1936 against Alf Landon. The Literary Digest did a poll that took in tons of data; I can't remember how many tens of thousands of letters they sent out. And they reached the conclusion from their poll that Alf Landon was going to win, 56 to 44. But that's not the way it turned out: Roosevelt won, 62 to 38. They were way off; they weren't even close. So what was going on? Well, they needed addresses of people to send the survey to. And what did they use for that? They used a phone book. But back then, remember, this is 1936, a lot of people didn't have phones. In particular, poor people didn't have phones, and people who lived in rural areas didn't have phones. And it was those poor people and people who lived in rural areas who loved Franklin Roosevelt, because of the New Deal and all the policies that he had put in place that helped them out.
So they voted for him, and he won by a landslide when the prediction had gone the other way, all because of biased sampling. And the same problem continues today. Many pollsters, especially when they want to do a really quick poll, will call people on their phones. But there are lots of restrictions and lots of problems with that. For one thing, you're not supposed to do polls on cell phones, but a lot of young people only have cell phones and don't have landlines. That means young people are going to be underrepresented in the sample. And even people who have a landline often have caller ID, and they know it's a pollster, so they don't answer; so you get very low response rates. Some studies have found that women tend to answer the phone more than men, so you get samples skewed in that direction. So what do you do when you get a biased sample like this? Well, if you know it's been biased in these ways, then you try to correct for that. That's where the pollsters don't all agree; they correct in different ways. Think about it: suppose that this time you got a lot more people from the Liberal Party than you did last time. That might mean that your sample is skewed, but it might just mean that more people have moved over to the Liberal Party, or to the Conservative Party, either way. How do you know the proper way to correct so as to get an accurate answer? Different polling organizations use different techniques, and that explains why the polls very often reach quite different results about who's ahead in an election. Another reason why polls often reach different results is that some pollsters are dishonest. I know it comes as a shock, but it's true. Pollsters can sometimes reach the conclusions that they want to reach by slanting their questions. So the fourth question that we want to ask about any generalization from samples is: was the question slanted in some way?
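One standard way of doing the kind of correction described above is post-stratification weighting: re-scale each group's answers to that group's known share of the population. This sketch uses made-up numbers (a phone sample where women are overrepresented, as in the example above); it is one common technique, not necessarily what any particular pollster does.

```python
# Known population shares (e.g. from a census) -- hypothetical here.
population_share = {"men": 0.50, "women": 0.50}

# Suppose women answered the phone more often, so the sample is skewed:
# 650 women but only 350 men, with different support rates for a candidate.
sample = {
    "men":   {"count": 350, "support": 0.40},
    "women": {"count": 650, "support": 0.60},
}

total = sum(g["count"] for g in sample.values())

# The unweighted estimate just averages over whoever happened to answer,
# so it leans toward the overrepresented group.
unweighted = sum(g["count"] * g["support"] for g in sample.values()) / total

# The weighted estimate counts each group at its true population share.
weighted = sum(population_share[name] * g["support"]
               for name, g in sample.items())

print(round(unweighted, 3))  # -> 0.53 (skewed toward women's answers)
print(round(weighted, 3))    # -> 0.5  (each group at its real share)
```

The catch is exactly the one raised above: weighting only fixes the skew if you know the true population shares, and deciding what those shares are (has the electorate really shifted, or is the sample just biased?) is where polling organizations diverge.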
That means: was it phrased in a way that made it more likely to reach one result rather than a conflicting result? How does this happen? Well, a simple way to do it is to word the question so that people are going to feel bad about giving a certain answer. For example, if you want to find out how many people in a certain society think that it's wrong to experiment on animals in scientific experiments, then you can always ask one question if you want one answer and another question if you want another answer. Suppose they ask: is it okay to kill a mouse in order to save a human? I think most people are going to say yes. And then they're going to say: those people who gave that answer actually support scientific experiments on mice to save human lives; therefore, they support animal experimentation. If you want to reach the other result in your poll, then you can ask: should scientists torture animals in their experiments? Nobody wants to say they're for torture, so they're going to say no, scientists shouldn't do that. And then you can conclude that most people are against animal experimentation, because animal experimentation is torturing animals. In fact, you can go a little further. You could ask: should scientists stop torturing animals in their experiments? That's like the old question, "When did you stop beating your wife?" It presupposes that they are torturing animals. And people say: of course they should stop it; I didn't even realize they were doing it. You're just guaranteeing the results of your poll by the way you ask the question. Another way to slant the questions in a poll is to give the survey participants limited options.
In one example, the New York Times Magazine reported a poll by the Doris Day Animal League which said that 51% of the people surveyed think that chimpanzees should be treated about the same as children. But what happened in the survey was that the participants were given only four options. They could say that chimpanzees should be treated like property, or treated a lot like children, or treated like adult humans, or that they're not sure. Notice that, given those options, you don't want to say that chimps are like property, because you can just destroy your property if you want: if you've got an old car and you want to tear it to bits with a sledgehammer, that's up to you. And they're not like adult humans, because we don't think they should be given schooling or votes and so on. So you're pretty much left with either saying you're not sure, which feels like admitting you should have thought about it a bit more, or saying, as 51% did, that chimpanzees ought to be treated similarly to children. By limiting the options to a list where all the other options were undesirable for one reason or another, the pollsters can get you to pick the option that they want you to pick. And then there's the next trick: how you report the conclusion. They actually didn't say that 51% said that chimpanzees should be treated similarly to children. They said that the survey showed that primates should be treated the same as children. They changed "chimpanzees" to "primates", which includes a lot of very small primates that aren't nearly as intelligent and close to us as chimps are. And they changed "similar to children" to "the same as children". So you can also slant a poll not by playing around with the question, but by playing around with the way you report the conclusion of the survey. Those are some of the tricks that people use in order to reach a predetermined result and fool you into believing that their poll results are reliable.
And we saw other mistakes that people make even when they're not trying to fool you. So what you need to do is keep your eye on all of those. You need to ask the questions that I've emphasized throughout this lecture: Are the premises true and justified? Is the sample large enough? Is the sample biased in some way? Were the questions, or the conclusions as reported, slanted in some way? There's a lot more to learn about statistics, and again I want to emphasize that it would be useful to take a statistics course; people ought to know more about statistics, but we can't go into all of those details here. Still, if you can just avoid these few simple mistakes that I've been talking about, then you can avoid being misled in a number of situations in everyday life.