
I am curious what everyone's thoughts are on the following problem I found in a middle school textbook I am teaching out of. The chapter is on biased and unbiased data samples. Here is the question:

To find how much money the average American family spends to cool their home, 100 Alaskan families are surveyed at random. Of these families, 85 said they spend less than 75 dollars per month on cooling. The researcher concluded that the average American family spends less than 75 dollars on cooling per month. Is the conclusion valid?

The book states that it is valid because it is a simple random sample. I would say otherwise, considering they surveyed Alaskans and made a generalization about the whole nation. If you were given the options of unbiased simple random sample, unbiased systematic sample, biased convenience sample, and biased voluntary sample, which would you say is the better answer? I'm thinking a convenience sample.

K Math
  • The question specifically says that it's a random sample, though. You are however correct that it's biased because Alaska isn't representative of the entire United States. – Michael McGovern Apr 11 '18 at 00:08
  • I don't know anything about statistics, but the problem seems to be set up to say the opposite. If I were to try to demonstrate biased data samples, I think surveying Alaskans on cooling costs would be a great example. – TeaFor2 Apr 11 '18 at 00:08
  • I would also have to say that this would be a convenience sample, but tbh, I'm not sure there's enough information to reach a definite conclusion. – supersmarty1234 Apr 11 '18 at 00:08
  • There are multiple issues here. First of all, the sample size is $100$, so a single survey out of this sample does not represent the entire survey. Second of all, the families surveyed are Alaskan, and the assertion that "cooling spending in Alaskan families represents cooling spending in American families" is needed to apply the result to an average American family. – Frenzy Li Apr 11 '18 at 00:10
  • The fact that they only surveyed Alaskans about their cooling costs is a hilarious error. Wow. What textbook is this? – littleO Apr 11 '18 at 00:37
  • It does get really warm in the Panhandle. – Michael McGovern Apr 11 '18 at 00:42

2 Answers


You are right.

A simple random sample is a subset of the population chosen so that each member of the population has the same probability of being in the sample. It is clearly stated that the population here is the set of American families, while it seems that the sample was chosen from Alaskan families. If this is the case, then for any American family that isn't Alaskan, the chance of them being in the sample is 0, while for a family which was picked, the chance was (obviously) nonzero. Thus, it would not be a simple random sample.
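The equal-probability requirement above can be checked on a toy example. This is only a sketch: the population of 1000 families, the 20 Alaskan families, the sample size of 10, and the helper `inclusion_probabilities` are all made up for illustration and are not from the textbook problem.

```python
import random

def inclusion_probabilities(population, frame, sample_size):
    """Chance that each family in `population` ends up in the sample when
    `sample_size` families are drawn uniformly, without replacement, from `frame`."""
    frame_set = set(frame)
    p = sample_size / len(frame)
    return {family: (p if family in frame_set else 0.0) for family in population}

# Hypothetical toy population: 20 Alaskan families and 980 others.
population = [("AK", i) for i in range(20)] + [("other", i) for i in range(980)]
alaska_frame = [family for family in population if family[0] == "AK"]

# Scheme 1: draw from all families. Scheme 2: draw only from Alaskan families.
srs = inclusion_probabilities(population, population, 10)
alaska_only = inclusion_probabilities(population, alaska_frame, 10)

# A simple random sample needs one shared inclusion probability for everyone.
srs_is_equal = len(set(srs.values())) == 1
alaska_only_is_equal = len(set(alaska_only.values())) == 1

# Drawing from the Alaskan frame can only ever produce Alaskan families.
sample = random.Random(0).sample(alaska_frame, 10)
all_alaskan = all(state == "AK" for state, _ in sample)
```

Under these assumptions, every family has the same chance in the first scheme, while in the second scheme non-Alaskan families have chance 0, so it is not a simple random sample of the whole population.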

However, if the statement that they were chosen randomly means that everyone had an equal chance of being in the sample, but it happened to be the case that all 100 families chosen were Alaskan, then it would in fact be a simple random sample.
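To get a feel for how unlikely that second reading is, here is a rough back-of-the-envelope sketch. The share of families that are Alaskan (`assumed_share`) is a made-up placeholder, not a real figure, and the draws are treated as independent, which slightly overstates the true without-replacement probability.

```python
import math

def log10_prob_all_from_group(group_share, sample_size):
    """log10 of the chance that every one of `sample_size` independent draws
    lands in a group making up `group_share` of the population."""
    return sample_size * math.log10(group_share)

assumed_share = 0.002  # hypothetical share of families that are Alaskan
log10_p = log10_prob_all_from_group(assumed_share, 100)
print(f"P(all 100 families Alaskan) is about 10^{log10_p:.1f}")
```

With any plausible small share, the probability is astronomically small, which is why the reading in which the sample was drawn only from Alaskan families is far more natural.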

A simple random sample is not inherently biased (if everyone has the same chance of being in the sample, how could it be?). However, it is simply a method of choosing who will be in the sample, and not how the sample itself is taken. Is it an optional anonymous survey? Are they asked in public, in front of friends and/or family? For this reason, being a simple random sample does not guarantee that it is an unbiased sample.

Lastly, this should be a lesson as to the downsides of simple random sampling (SRS). However unlikely, there is always the possibility of a fluke like this, where everyone or most people in the sample share some trait which could be a lurking variable for the parameter being studied. If we suspect that different subsets of the population will give different answers because of their age, sex, location, race, religion, ethnicity, etc., then we would be better off using a sampling scheme which deals with each group separately. Otherwise, we could simply increase the sample size to a significant multiple of the number of groups to increase the chances of each group having some representation in the sample.
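A scheme that deals with each group separately is usually called stratified sampling. Here is a minimal sketch with proportional allocation; the climate labels, group sizes, and the helper `stratified_sample` are hypothetical and chosen only to illustrate the idea.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(population, stratum_of, sample_size, rng):
    """Draw from each stratum separately, in proportion to its size.
    Every stratum gets at least one unit, so rounding may shift the total slightly."""
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        k = max(1, round(sample_size * len(members) / len(population)))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population of 1000 families labelled by climate.
population = (
    [("cold", i) for i in range(50)]
    + [("mild", i) for i in range(450)]
    + [("hot", i) for i in range(500)]
)

sample = stratified_sample(population, lambda family: family[0], 100, random.Random(0))
counts = Counter(climate for climate, _ in sample)
```

Under these assumptions the sample always contains 5 cold-climate, 45 mild-climate, and 50 hot-climate families, so a fluke in which one group dominates cannot happen.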

Statistics is not an exact science, and nothing gained from a statistical study or experiment is ever "fact". We determine the validity in the context of the problem we are trying to solve and the probability of getting such a sample. Sometimes we just get extremely rare samples, and blindly calling them valid simply because they pass some mathematical test with arbitrary requirements is missing the point entirely.

Alex Jones

The question specifically says that the sample is a random sample. However, the problem is that it is a sample of Alaskans, and, regardless of whether it is representative of Alaskans, it is certainly not representative of all Americans.