School of Chemical and Mathematical Sciences

Q-Q plots

A Q-Q plot is used to check whether or not your sample could have come from some specific target population. The target population can be any shape you like. Usually it is Normal, in which case the plot is more properly called a Normal Q-Q plot, although it seldom is. The most common usage is as a diagnostic tool in analysis of variance or regression, where the theory relies heavily on the assumption of Normality, even though the practice may not. Checking for gross departures from Normality is always a good idea, lest you make a complete ass of yourself in your reported conclusions.

The first Q stands for the quantiles of your sample; not just any quantiles, but the ones which coincide with the sample members themselves. So if you have 9 data points, for example, calculate the 0.1, 0.2, ..., 0.9 quantiles. (This assumes you are using the SPSS style of quantile calculation and that no two sample values are the same.) The second Q stands for the same quantiles of the population you are using as a yardstick. So, in the example, calculate the 0.1, 0.2, ..., 0.9 quantiles of your target population. This task entails some inverse probability calculations involving the theoretical target population, and is best left to a computer. If you plot the target population quantiles (y) against the respective sample quantiles (x), you have constructed a Q-Q plot. If your sample is indeed from the population you suspect, the sample quantiles will lie close to where they might be expected and the points on the plot will straggle about the line y = x.

In order to calculate the theoretical quantiles, the target population must be specified exactly, which means, in the case of a Normal target, that the population mean and standard deviation must be known. In practice, of course, this is never the case, and so sample estimates are used. So, the target or benchmark Normal population always has the same mean and standard deviation as your sample. In discussing departures from Normality it is well to keep this in mind. A population is skew relative to a Normal population with the same mean and standard deviation. Similarly, populations are heavy tailed relative to a Normal population with the same mean and standard deviation.
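
The construction described above can be sketched in a few lines of Python. This is only an illustration, not the method of any particular package: it assumes the i-th smallest of n sample values is treated as the i/(n + 1) quantile (which reproduces the 0.1, 0.2, ..., 0.9 of the 9 point example), it uses the sample mean and standard deviation for the benchmark Normal population, and the data values and the function name normal_qq_points are made up for the purpose.

```python
from statistics import NormalDist, mean, stdev

def normal_qq_points(sample):
    """Return (sample quantile, Normal quantile) pairs for a Normal Q-Q plot."""
    xs = sorted(sample)
    n = len(xs)
    # Benchmark Normal population: same mean and standard deviation as the sample.
    target = NormalDist(mean(xs), stdev(xs))
    # Assumption: the i-th smallest value is treated as the i/(n+1) quantile.
    probs = [i / (n + 1) for i in range(1, n + 1)]
    return [(x, target.inv_cdf(p)) for x, p in zip(xs, probs)]

# Hypothetical data: 9 points, so the quantiles are 0.1, 0.2, ..., 0.9.
data = [7.1, 12.4, 9.8, 10.6, 5.9, 11.3, 8.7, 14.2, 10.0]
for x, y in normal_qq_points(data):
    print(f"sample {x:5.1f}   Normal {y:6.2f}")
```

Plotting the second column (y) against the first (x) gives the Q-Q plot. The inverse probability calculation is the call to inv_cdf.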


The One True Path

Normal Q-Q Plots of Samples from Normal Populations

A sufficiently trained statistician can read the vagaries of a Q-Q plot like a shaman can read a chicken's entrails, with a similar recourse to scientific principles. Interpreting Q-Q plots is more a visceral than an intellectual exercise. The uninitiated are often mystified by the process. Experience is the key here. The first step is to examine Normal Q-Q plots of samples known to be from Normal populations, to get some idea of how much straggling about the line is acceptable.
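
One way to build up that experience without staring at hundreds of plots is to simulate. The sketch below draws many samples of 20 from a Normal population with mean 10 and standard deviation 3 and records, for each, the largest horizontal gap between a point and the line y = x, measured in sample standard deviations. The gap measure, the i/(n + 1) quantile convention, the seed and the number of repetitions are all choices made for this illustration, not part of any standard recipe.

```python
import random
from statistics import NormalDist, mean, stdev

def max_qq_gap(sample):
    """Largest |sample quantile - Normal quantile|, in sample standard deviations."""
    xs = sorted(sample)
    n = len(xs)
    m, s = mean(xs), stdev(xs)
    target = NormalDist(m, s)  # benchmark with the sample's mean and sd
    return max(
        abs(x - target.inv_cdf(i / (n + 1))) for i, x in enumerate(xs, start=1)
    ) / s

rng = random.Random(1)  # fixed seed so that the run is repeatable
gaps = sorted(
    max_qq_gap([rng.gauss(10, 3) for _ in range(20)])
    for _ in range(1000)
)
print(f"median gap: {gaps[500]:.2f} sd, 95th percentile: {gaps[950]:.2f} sd")
```

The printed figures give a rough feel for how far a point can stray from the line when the sample really is Normal. They are no substitute for looking at the plots themselves.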

[ Figure: Normal Q-Q plot of a sample of 20 observations from a Normal population with mean 10 and standard deviation 3. The points straggle about the line y = x. ]

 


Departures from the One True Path and the Implications Thereof

Normal Q-Q Plots of Samples from Skew Populations

Specific departures from Normality in the population being sampled manifest themselves as specific departures from the straight and narrow in the Q-Q plot. If the population being sampled is actually skewed to the right, i.e. has a long right hand tail, and thus a short left hand tail, then the sample quantiles close to 1 will lie to the right of where Normality would place them, and similarly for the sample quantiles close to 0. For quantiles closer to 0.5, the Normal quantiles will exceed those of the sample. (Why?) The result is a Q-Q plot which resembles the left hand top of an arch, starting below the target line (or to the right if you prefer), arching across it and then back to finish below (or to the right of) the line again. If the sampled population is skewed to the left, the arch is reflected about y = x, starting above it, crossing below and then back to finish above.
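
The arch can be checked without any sampling noise by comparing population quantiles directly. The sketch below is only an illustration: it takes a lognormal population with mean 10 and standard deviation 3 as the skew population (its two parameters are solved from the mean and standard deviation), lets its quantiles stand in for the sample quantiles, and compares them with those of the benchmark Normal population. The probabilities chosen are arbitrary.

```python
from math import exp, log, sqrt
from statistics import NormalDist

MEAN, SD = 10.0, 3.0

# Lognormal parameters chosen so that the population has mean 10 and sd 3.
sigma2 = log(1 + (SD / MEAN) ** 2)
mu = log(MEAN) - sigma2 / 2
sigma = sqrt(sigma2)

z = NormalDist()                  # standard Normal, used for z-scores
benchmark = NormalDist(MEAN, SD)  # Normal with the same mean and sd

def lognormal_quantile(p):
    """p-th quantile of the right-skewed lognormal population."""
    return exp(mu + sigma * z.inv_cdf(p))

for p in (0.01, 0.10, 0.50, 0.90, 0.99):
    x = lognormal_quantile(p)     # plays the role of the sample quantile
    y = benchmark.inv_cdf(p)      # Normal quantile
    side = "below" if x > y else "above"
    print(f"p={p:.2f}  skew {x:6.2f}  Normal {y:6.2f}  point lies {side} y = x")
```

With x as the skew quantile and y as the Normal quantile, points in both tails fall below the line and the point at the median falls above it, which is the arch described above.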

[ Figure: Normal Q-Q plot of a sample of 20 observations from a lognormal population with mean 10 and standard deviation 3. This population is skewed to the right (i.e. it has a long right hand tail). The points arch up and over the line y = x. ]

 


Normal Q-Q Plots of Samples from Heavy Tailed (Leptokurtic) Populations

Heavy tailed populations are symmetric, with more members at greater remove from the population mean than in a Normal population with the same standard deviation. To compensate for the extreme members of the population, there must also be a higher concentration around the population mean than in a Normal population with the same standard deviation. (Simply adding weight to the tails would increase the standard deviation.) That is, heavy tailed populations also have higher, narrower peaks than the benchmark Normal population. Hence the term leptokurtic, meaning narrow arched. (Remember the benchmark Normal population has the same mean and standard deviation as your population.) Consequently, in the left hand tail the sample quantiles might be expected to lie to the left of their Normal counterparts, but closer to the median the Normal quantiles fall to the left of the sample's. Symmetry would place both sample and target median back together again. The pattern is mirrored as you move from the median into the right hand tail, with the sample quantiles to the left of the targets to begin with, but eventually passing to the right of them. The result is a Q-Q plot which resembles a stretched S, starting to the left of the target line, and ending to the right of it, having crossed it three times in between.
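
The three crossings can be seen by comparing population quantiles directly. The sketch below uses a Laplace (double exponential) population as the heavy tailed example. That choice is an assumption made here because its quantiles have a simple closed form, and it is not necessarily the population behind any plot on this page. Its scale is set so that the standard deviation is 3, matching the benchmark Normal population with mean 10.

```python
from math import log, sqrt
from statistics import NormalDist

MEAN, SD = 10.0, 3.0
benchmark = NormalDist(MEAN, SD)
b = SD / sqrt(2)  # Laplace scale giving standard deviation 3 (variance is 2*b**2)

def laplace_quantile(p):
    """p-th quantile of a Laplace population with mean MEAN and scale b."""
    if p < 0.5:
        return MEAN + b * log(2 * p)
    return MEAN - b * log(2 * (1 - p))

for p in (0.01, 0.25, 0.50, 0.75, 0.99):
    x = laplace_quantile(p)      # plays the role of the sample quantile
    y = benchmark.inv_cdf(p)     # Normal quantile
    if x < y:
        side = "left of"
    elif x > y:
        side = "right of"
    else:
        side = "on"
    print(f"p={p:.2f}  heavy {x:6.2f}  Normal {y:6.2f}  point lies {side} y = x")
```

Reading down the output, the points go from left of the line, to right, to on it at the median, to left, and finally to right: the stretched S.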

[ Figure: Normal Q-Q plot of a sample of 20 observations from a heavy tailed population with mean 10 and standard deviation 3. The points snake over the line y = x. ]