The Sample Mean

The mean of a sample of numerical observations is the sum of the observations divided by the number of observations. It is the simple arithmetic average of the numbers in the sample. If the sample members are denoted by x_1, x_2, ..., x_n, where n is the number of observations in the sample (the sample size), then the sample mean is usually denoted by an x with a ¯ over the top and pronounced "x bar". In mathematical parlance:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
For example, one sample of 20 observations has a mean of 7.69, being the sum 4.2 + 4.4 + 5.1 + 5.6 + ... + 11.6 all divided by 20.

In physical terms, the sample mean is the balance point of the sample. Physicists call it the first moment about the origin. If you imagine the sample points to be equally weighted marbles stuck on a number line (see dot plot of the sample), then the total turning force about the sample mean is zero: the pull of the marbles to its right exactly cancels the pull of those to its left. If you place a support at the sample mean, the line will not tilt. In the following diagrams the blue arrow indicates the point of support.
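The balance-point idea can be checked numerically. The sketch below uses a small made-up sample (hypothetical values, not the 20 observations discussed above) and shows that the deviations from the mean sum to zero, which is the arithmetic version of "the line will not tilt".

```python
from statistics import mean

# Hypothetical sample, invented for illustration only.
sample = [2.0, 3.5, 4.0, 6.5, 9.0]

# Sample mean: sum of the observations divided by the number of observations.
x_bar = sum(sample) / len(sample)

# Signed distance of each point from the mean
# (positive to the right of the mean, negative to the left).
deviations = [x - x_bar for x in sample]

# With equally weighted points, the net turning force about the mean
# is proportional to the sum of these signed distances.
net_turning_force = sum(deviations)

print("sample mean:", x_bar)
print("matches statistics.mean:", x_bar == mean(sample))
print("sum of deviations:", net_turning_force)
```

For samples whose values are not exactly representable in floating point, the sum of deviations may come out as a tiny non-zero number rather than exactly zero.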
If the sample is represented by a histogram rather than a dot plot, the same idea applies. If you imagine the shape of the histogram cut from cardboard, then the place to balance it is at the sample mean. (Objection from pedant in back row). Armed with this idea, and presented with a histogram, it is not difficult to have a good stab at the sample mean without doing any calculations. Test your arm.
The Sample Standard Deviation

The standard deviation of a sample of numerical observations is a measure of the spread of the sample values. It is derived from the distance of each point in the sample from the sample mean (positive distance to the right, negative to the left). These distances are the deviations of the title: they are deviations from the sample mean. If you sum the squared deviations, and then divide by one less than the sample size, you get what is known as the sample variance. Typically this is denoted by s^2. The sample variance is a useful measure in itself of the variability in the sample values, but its units of measurement are the square of those of the sample values themselves. The standard deviation of a sample is the (positive) square root of the sample variance, and is usually denoted by (no surprises here) s. It is a measurement on the same scale as that of the original sample values. In mathematical terms, if the sample values are denoted by x_1, x_2, ..., x_n, where n is the sample size, and \bar{x} is the sample mean, then the standard deviation of the sample is given by:

$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$
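The recipe just described (sum the squared deviations, divide by one less than the sample size, take the positive square root) can be sketched in a few lines. The sample is a hypothetical one chosen for easy arithmetic, and the result is compared with Python's standard `statistics` module, which uses the same n - 1 divisor.

```python
import math
from statistics import stdev, variance

# Hypothetical sample, invented for illustration only.
sample = [2.0, 3.5, 4.0, 6.5, 9.0]
n = len(sample)

x_bar = sum(sample) / n

# Squared deviations from the sample mean.
squared_deviations = [(x - x_bar) ** 2 for x in sample]

# Sample variance: divide by n - 1, not n.
s2 = sum(squared_deviations) / (n - 1)

# Sample standard deviation: positive square root of the variance.
s = math.sqrt(s2)

print("sample variance s^2:", s2)
print("sample standard deviation s:", s)
print("agrees with statistics.variance:", math.isclose(s2, variance(sample)))
print("agrees with statistics.stdev:", math.isclose(s, stdev(sample)))
```

Note that the variance is in squared units of the data, while s is back on the original scale.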
The standard deviation cannot be less than zero. If the standard deviation of a sample is zero, then all sample values are the same. If the sample values are not all the same then they must exhibit some form of variability. How much variability the sample values exhibit is encapsulated by the standard deviation. If the standard deviation is small, then the sample values cluster close to the sample mean. If the standard deviation is large, then the sample values are widely dispersed.

Dispersion or variability is the bane of a researcher's life. If everything were the same there would be no arguments as to the conclusions of an experiment or study. Unfortunately, large variability in a sample can eclipse any information provided by that sample. It is the noise to the mean's signal.

Whilst you can put your finger on the sample mean, the sample standard deviation is a much more nebulous entity. This can only be expected from something which was designed to measure nebulousness. However, beginning students tend to feel somewhat mystified by the concept. If the standard deviation is a measurement on the same scale as the sample values, where exactly is it?

To get some idea of what the standard deviation represents, think of it as a distance from the sample mean. For most reasonably symmetric samples of a respectable size (20 or more observations, say), almost all sample values will lie within two standard deviations either side of the sample mean. That is, four standard deviations will just about cover your entire sample, from smallest to largest value. This last rough rule of thumb seems to hold for skew samples as well, even though the mean will no longer be central in the four standard deviation range. With a little practice, you can get quite proficient at estimating the standard deviation from a histogram of the sample, as the following quiz will attest.
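The two-standard-deviation rule of thumb is easy to try out on simulated data. The sketch below is an illustration under stated assumptions: the sample is 50 simulated values from a bell-shaped distribution (the centre of 10, spread of 2, and seed are arbitrary choices, not taken from the text), and it reports what fraction of the sample lies within two standard deviations of the mean and how many standard deviations the full range spans.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so repeated runs give the same sample

# Hypothetical, roughly symmetric sample of 50 simulated observations.
sample = [random.gauss(10, 2) for _ in range(50)]

x_bar = mean(sample)
s = stdev(sample)

# Interval reaching two standard deviations either side of the mean.
lower, upper = x_bar - 2 * s, x_bar + 2 * s

inside = [x for x in sample if lower <= x <= upper]
fraction_inside = len(inside) / len(sample)

# Width of the whole sample, measured in standard deviations.
range_in_sds = (max(sample) - min(sample)) / s

print("fraction within two standard deviations:", fraction_inside)
print("range of sample in standard deviations:", round(range_in_sds, 2))
```

Changing the seed, the sample size, or the distribution (for example, a skewed one) is a quick way to see how far the rule of thumb stretches.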