How do I tell if sample data is roughly normal? (Normal Probability Plots)

Question

I don't understand why this graph shows that the data set is roughly normal:

[Image: normal probability plot of the sample data]

I can see that the points are evenly spread out below and above the line, but why does that make the data approximately normal?

I'm also having trouble understanding where this formula comes from: $f_i = \frac{i-0.375}{n+0.25}$, which gives the area $f_i$ under the standard normal curve to the left of the $i$-th normal score, where $i$ is the index of the sample data arranged in increasing order and $n$ is the number of observations. Answering either question is fine.
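For reference, this looks like Blom's (1958) plotting position, one member of the family $f_i = \frac{i-a}{n+1-2a}$; choosing $a = 3/8$ gives $\frac{i-0.375}{n+0.25}$, and $\Phi^{-1}(f_i)$ is then a close approximation to the expected value of the $i$-th smallest of $n$ independent standard normal values. Those normal scores are what the sorted data get plotted against. A minimal sketch using only Python's standard library (`statistics.NormalDist`); the function name `blom_scores` is just illustrative:

```python
from statistics import NormalDist


def blom_scores(n):
    """Return (f_i, z_i) pairs for i = 1..n using Blom's plotting position.

    f_i = (i - 0.375) / (n + 0.25) is the area under the standard normal
    curve to the left of z_i, so z_i = Phi^{-1}(f_i).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    pairs = []
    for i in range(1, n + 1):
        f = (i - 0.375) / (n + 0.25)
        pairs.append((f, std_normal.inv_cdf(f)))
    return pairs


for f, z in blom_scores(5):
    print(f"f = {f:.4f}  z = {z:+.4f}")
```

The positions are symmetric ($f_i + f_{n+1-i} = 1$), so the normal scores come in $\pm$ pairs around zero.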

score 1 · Answer 1 · answered Jan 13 '14 at 02:51

I'm guessing this is a normal probability plot, which is designed so that a normally distributed dataset plotted on it falls approximately along the solid diagonal line in your image. The particular dataset plotted on the graph above lies close to the solid line, so it is approximately normal.

score 0 · Answer 2 · answered Jan 13 '14 at 15:35

The data are roughly normal in that they fall around the line and within the confidence bands. However, the deviations from the line are not random: they follow a systematic S-shaped pattern, which suggests the distribution may be heavy tailed. With only 19 points there just isn't enough data to make this "significant", but I'd guess more data would show a heavy-tailed distribution, although the normal wouldn't be too far off.
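To see why heavy tails produce an S shape, one can compare, at the same plotting positions, the quantiles of a heavier-tailed distribution with the normal scores. The sketch below uses the Laplace distribution scaled to unit variance purely as an illustrative stand-in for "heavy tailed" (an assumption; the answer does not name a distribution), with n = 19 to match the point count in this answer and Blom positions as in the question:

```python
import math
from statistics import NormalDist


def laplace_quantile(p, b=1 / math.sqrt(2)):
    """Inverse CDF of a Laplace(0, b) distribution; b = 1/sqrt(2) gives variance 1."""
    if p < 0.5:
        return b * math.log(2 * p)
    return -b * math.log(2 * (1 - p))


n = 19
std_normal = NormalDist()
positions = [(i - 0.375) / (n + 0.25) for i in range(1, n + 1)]
normal_scores = [std_normal.inv_cdf(p) for p in positions]
# Idealised heavy-tailed "data": the Laplace quantile at each plotting position.
laplace_values = [laplace_quantile(p) for p in positions]

# diff > 0 on the right (or < 0 on the left) means the point is more extreme
# than a normal distribution with the same variance would put it.
for z, x in zip(normal_scores, laplace_values):
    print(f"z = {z:+.3f}  heavy-tailed = {x:+.3f}  diff = {x - z:+.3f}")
```

Most of the points come out closer to zero than the normal scores, while the most extreme ones come out farther away. With normal scores on the horizontal axis, the points therefore run flatter through the middle and steeper at the ends, which is the S pattern; with only 19 points, sampling noise can easily hide or mimic it.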

To get a feel for what normal probability plots look like for actual normal data, simulate some data (of various sample sizes, e.g., 5, 20, 50, and 100 points) from a normal distribution and make this exact same plot of it using a statistical package. Seeing what the plots look like when you know the data are normal should help your intuition, since distribution fitting is very visual.

Also, this link will be helpful: http://www.itl.nist.gov/div898/handbook/eda/section3/normprpl.htm
