The Median of a Sample

The idea of a sample median is very simple. It is the point in the middle of the ordered sample: half the sample values exceed it and half do not. It is used, not surprisingly, to measure where the centre of the sample lies, and hence where the centre of the population from which the sample was drawn might lie. Unfortunately, this simple idea is complicated by technical detail when confronted with an actual sample, since the definition must be refined to determine the median uniquely (for example, when the sample size is even and no single value sits in the middle). However, you should not lose sight of the fundamental simplicity: half the sample values lie to one side of the median, and half to the other.

The following diagram demonstrates the point for a sample of 20 observations. The sample values are presented as red "dots" on the number line in what is known as a dot plot. Sample values which are equal are piled on top of one another.

[Figure: dot plot of the 20 sample values]
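As a minimal sketch of how the median might be pinned down for an actual sample, the function below sorts the values and takes the middle one, averaging the two middle values when the sample size is even. That averaging rule is a common convention assumed here, not the only possible definition, and the name `sample_median` is purely illustrative.

```python
def sample_median(values):
    """Return the median of a non-empty sample.

    Assumed convention: for an even sample size, average the two
    middle order statistics.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd sample size: exactly one value sits in the middle.
        return ordered[mid]
    # Even sample size: no single middle value, so average the two central ones.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Illustrative data only, not the sample plotted in the text.
odd_sample = [7, 1, 4, 9, 3]
even_sample = [7, 1, 4, 9, 3, 6]
median_odd = sample_median(odd_sample)
median_even = sample_median(even_sample)
```

With the even-sized sample, three values lie below the result and three above, which is the "half to one side, half to the other" property described above.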
The Minimum, Maximum and Range of a Sample

The smallest sample value is called the minimum of the sample, and the largest sample value is called the maximum. The distance between the sample minimum and maximum is called the range of the sample. So the range is a distance rather than an interval, even though it is quite common to see the phrase "the sample values ranged from (insert minimum value here) to (insert maximum value here)" used in the literature. This causes no real confusion.

The range is clearly a measure of the spread of the sample values. As such it is a fairly blunt instrument, for it takes no account of where or how the values between the minimum and maximum are located. Are they, for example, mainly clustered around a central point with a few outlying values tailing off on either side, or are they uniformly spread over the interval? Measures of spread (also called dispersion or variability) which do attempt to take such issues into account include the sample standard deviation and the interquartile range.

That said, if you have a reasonable idea of the nature of your population, then there is a fairly close association between the range and the standard deviation of your sample. This fact is exploited in control charts, which monitor industrial processes and are designed to be drawn by shop floor factory workers in real time. In this situation, ease of calculation is of prime importance.
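The bluntness of the range can be illustrated with a short sketch. The two samples below are invented for the purpose: they share the same minimum and maximum, and hence the same range, but one is clustered around its centre while the other is spread fairly evenly. The helper name `sample_range` is an assumption for illustration.

```python
from statistics import stdev


def sample_range(values):
    """Return the range as a distance: maximum minus minimum."""
    return max(values) - min(values)


# Hypothetical samples with identical minimum (0) and maximum (10).
clustered = [0, 4, 5, 5, 5, 5, 6, 10]   # mostly near the centre
spread_out = [0, 1, 3, 4, 6, 7, 9, 10]  # roughly uniform over the interval

range_clustered = sample_range(clustered)
range_spread_out = sample_range(spread_out)

# The sample standard deviation uses every value, so it can tell the
# two samples apart even though the range cannot.
sd_clustered = stdev(clustered)
sd_spread_out = stdev(spread_out)
```

Both ranges come out equal, while the standard deviation of the clustered sample is the smaller of the two.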
The Upper and Lower Quartile of a Sample and the Interquartile Range

Quartiles divide the sample into four equal parts. The lower quartile has 25% of the sample values below it and 75% above. The upper quartile has 25% of the sample values above it and 75% below. The middle quartile is, of course, the median. The middle half of the sample lies between the lower and upper quartiles.

The distance between the lower and upper quartiles is called the interquartile range. Like the range, the interquartile range is a distance, not an interval, and it is a measure of the spread (variability or dispersion) of the sample. For a Normal population with standard deviation s, the interquartile range is approximately 4s/3 (more precisely, about 1.35s).

The following diagram demonstrates the point for a sample of 20 observations. The sample values are presented as red "dots" on the number line in what is known as a dot plot. Sample values which are equal are piled on top of one another.

[Figure: dot plot of the 20 sample values]
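A brief sketch of these quantities, using Python's standard `statistics` module. The 20 values are hypothetical, not the sample plotted in the text, and the library's default interpolation rule is only one of several conventions for locating sample quartiles. The last lines check the Normal-population rule of thumb by computing the interquartile range in units of the standard deviation.

```python
from statistics import NormalDist, quantiles

# Hypothetical sample of 20 observations (illustrative only).
sample = [2, 3, 3, 4, 5, 5, 5, 6, 6, 7, 7, 7, 8, 8, 9, 9, 10, 11, 12, 14]

# n=4 requests the three cut points that split the data into quarters.
lower_quartile, median, upper_quartile = quantiles(sample, n=4)

# The interquartile range is a distance, not an interval.
iqr = upper_quartile - lower_quartile

# For a Normal population the quartiles lie symmetrically about the mean,
# so the IQR is twice the 0.75 quantile of the standard Normal, in units of s.
normal_iqr_in_sd_units = 2 * NormalDist().inv_cdf(0.75)
rule_of_thumb = 4 / 3
```

The value of `normal_iqr_in_sd_units` is close to, though slightly larger than, the 4/3 rule of thumb.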
Quantiles and Percentiles

The lower quartile is the 0.25 quantile or 25th percentile. The median is the 0.5 quantile or 50th percentile. The extension to other fractions between 0 and 1 (percentages between 0 and 100) should be obvious. Not so obvious is how these quantities should be defined and calculated for samples, which alas are not infinitely divisible. The problem is the same as that encountered in defining and calculating the quartiles of a sample, with the added complication that the point you are seeking may lie outside the interval spanned by the sample. For example, if you only have 10 data points in your sample, you might imagine that the 5th percentile lies somewhere (conceptually anywhere) to the left of the sample minimum, unless you strictly prohibit this in your definition. The size of the sample thus places restrictions on the class of sensible quantiles of the sample.
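To make the 10-point example concrete, here is a rough sketch of two possible definitions, both interpolating linearly between ordered sample values. The function name, the convention labels and the choice to return `None` are assumptions made for illustration. Under the "exclusive" convention the p-quantile sits at position p(n+1) in the ordered sample and may fall outside it; under the "inclusive" convention it sits at position 1 + p(n-1), which the definition keeps inside the sample.

```python
import math


def sample_quantile(values, p, convention="inclusive"):
    """Return the p-quantile (0 <= p <= 1) of a sample, or None if the
    chosen convention places it outside the sample.

    Positions are 1-based ranks in the ordered sample.
    """
    ordered = sorted(values)
    n = len(ordered)
    if convention == "inclusive":
        position = 1 + p * (n - 1)   # always between 1 and n
    else:
        position = p * (n + 1)       # can be below 1 or above n
    if position < 1 or position > n:
        return None                  # the sample is too small for this quantile
    lower_rank = math.floor(position)
    if lower_rank == n:
        return ordered[-1]
    fraction = position - lower_rank
    below = ordered[lower_rank - 1]
    above = ordered[lower_rank]
    # Linear interpolation between the two neighbouring order statistics.
    return below + fraction * (above - below)


# Hypothetical sample of 10 points.
ten_points = list(range(1, 11))
fifth_exclusive = sample_quantile(ten_points, 0.05, "exclusive")
fifth_inclusive = sample_quantile(ten_points, 0.05, "inclusive")
median_either = sample_quantile(ten_points, 0.5, "exclusive")
```

With 10 points, the exclusive convention puts the 5th percentile at position 0.55, to the left of the minimum, so the sketch reports no value, whereas the inclusive convention returns a point just above the minimum. The two conventions agree on the median.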