Why use the median and not the mean?
As summary statistics, the sample median and sample mean both give us an idea of where the middle of the sample is.
Statisticians call them measures of location, or of central tendency, but only in the privacy of their own offices. As
competitors for the centre ground, they both have advantages and disadvantages.
The sample mean has an accommodating and egalitarian nature
which takes into account the opinions of all sample values equally as to where the centre might be. As a consequence,
it is easily led astray by outlying or extreme sample values. The sample median, on the other hand, favours the executive
style of decision making, and is influenced only by an inner sanctum of sample values.
Dissenting points of view are ignored. The unswerving, blinkered view of the sample median earns it a reputation
as a robust statistic, which is actually a technical term.
For example, the following sample has a mean of 7.69 and median of 7.65:
4.2, 4.4, 5.1, 5.6, 6.0, 6.4, 6.8, 7.1, 7.4, 7.4, 7.9, 8.2, 8.2, 8.7, 9.1, 9.6, 9.6, 10.0, 10.5, 11.6
If the last two sample values were changed to 20.5 and 21.6, then the mean of the new sample would be 8.69, but
the median would remain resolutely unchanged at 7.65.
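The arithmetic in this example can be checked with a short Python sketch. It uses only the standard library's statistics module; the variable names (sample, contaminated, summary) are illustrative choices, not part of the original text.

```python
from statistics import mean, median

# The 20 sample values quoted in the text.
sample = [4.2, 4.4, 5.1, 5.6, 6.0, 6.4, 6.8, 7.1, 7.4, 7.4,
          7.9, 8.2, 8.2, 8.7, 9.1, 9.6, 9.6, 10.0, 10.5, 11.6]

# Replace the last two values with 20.5 and 21.6, as described in the text.
contaminated = sample[:-2] + [20.5, 21.6]

# (mean, median) for each version of the sample, rounded to two decimals.
summary = {
    "original": (round(mean(sample), 2), round(median(sample), 2)),
    "contaminated": (round(mean(contaminated), 2), round(median(contaminated), 2)),
}
print(summary)
```

Only the mean moves when the two largest values are inflated; the two middle values, 7.4 and 7.9, are untouched, so the median stays put.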
Both the mean and the median have their place as measures of centre, and often the nature of the data
you collect will determine which might be better in the circumstances. For example, the term "median house price" is now firmly established
as the industry standard, replacing the outmoded "average house price". The latter is inflated by relatively few sales
of executive mansions. Curiously, the term "average yearly rainfall" is still in common parlance, even though the distribution of rainfall figures in most areas is almost as skewed to the right as that of house prices. The skewness in the distribution ensures that the average rainfall will exceed the median rainfall, and so it is more likely than not that the rainfall of any given year will be less than average. It is not at all unusual to find several years in succession of below-average rainfall, a fact used to good advantage by water corporations arguing for supply-side price increases.
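To put a rough number on "more likely than not", here is a minimal sketch under a loudly stated assumption: yearly rainfall is modelled as lognormal, which is a convenient right-skewed stand-in and not something the text claims. For a lognormal variable with shape parameter sigma, the chance of falling below the mean is Phi(sigma / 2), where Phi is the standard Normal distribution function. The value sigma = 1.0 and the independence of successive years are also hypothetical.

```python
from statistics import NormalDist

# Hypothetical shape parameter of the assumed lognormal rainfall model.
sigma = 1.0

# P(X < mean) for a lognormal variable: Phi(sigma / 2).
p_below_mean = NormalDist().cdf(sigma / 2)

# Chance of three below-average years in a row, assuming independent years.
p_three_in_a_row = p_below_mean ** 3

print(f"P(below average)       = {p_below_mean:.3f}")
print(f"P(three years running) = {p_three_in_a_row:.3f}")
```

Under these assumptions a below-average year is the usual case, and a run of three such years happens about a third of the time.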
The sample mean has the upper hand when it comes to theoretical considerations.
It is much easier to handle as a mathematical object than the median, whose rigorous definition depends on whether
the sample has an odd or even number of observations.
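The odd/even split in the definition can be made concrete with a small sketch. The function name sample_median is illustrative; the rule it follows is the usual convention of taking the middle value for an odd sample size and averaging the two middle values for an even one.

```python
def sample_median(values):
    """Median of a sample, with separate rules for odd and even sizes."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd sample size: a single middle value exists.
        return ordered[mid]
    # Even sample size: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(sample_median([3, 1, 2]))      # odd number of observations
print(sample_median([4, 1, 3, 2]))   # even number of observations
```

The mean needs no such case analysis, which is part of why it is the easier object to work with algebraically.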
The Alphabet Rule
In a population which is skewed to the right (i.e. has a long right hand tail, or extreme high values)
the mean, median and mode occur in reverse alphabetical order. That is, mode < median < mean. To date, no one
has found a use for this strange but true phenomenon.
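One right-skewed population where the ordering can be checked directly is the lognormal distribution, chosen here purely as an illustration. Its mode, median and mean have the closed forms exp(mu - sigma^2), exp(mu) and exp(mu + sigma^2 / 2). The function name lognormal_centres and the parameter values are hypothetical.

```python
import math

def lognormal_centres(mu, sigma):
    """Mode, median and mean of a lognormal distribution (closed forms)."""
    mode = math.exp(mu - sigma ** 2)
    median = math.exp(mu)
    mean = math.exp(mu + sigma ** 2 / 2)
    return mode, median, mean


mode, median, mean = lognormal_centres(mu=0.0, sigma=1.0)
print(f"mode={mode:.3f}  median={median:.3f}  mean={mean:.3f}")
print("mode < median < mean:", mode < median < mean)
```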
The Large Sample Distribution of the Sample Median
In samples from a Normal population with mean μ and standard deviation σ,
the sample median is asymptotically Normally distributed with mean μ and standard deviation
√(π/2)·σ/√n, or approximately 1.2533σ/√n.
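A quick simulation offers a sanity check on this large-sample result. The sketch below draws repeated Normal samples, takes the median of each, and compares the spread of those medians with the asymptotic standard deviation, the square root of pi/2 times the population standard deviation divided by the square root of n. The sample size, number of repetitions, seed and the function name simulated_median_sd are all arbitrary choices for illustration.

```python
import math
import random
from statistics import median, stdev

def simulated_median_sd(n, reps, mu=0.0, sigma=1.0, seed=0):
    """Standard deviation of the sample median over `reps` simulated Normal samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    medians = [
        median([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(reps)
    ]
    return stdev(medians)


n, sigma = 101, 1.0
asymptotic_sd = math.sqrt(math.pi / 2) * sigma / math.sqrt(n)
empirical_sd = simulated_median_sd(n, reps=4000, sigma=sigma)
print(f"asymptotic: {asymptotic_sd:.4f}   simulated: {empirical_sd:.4f}")
```

The two figures should agree closely but not exactly, since the formula is a large-sample approximation and the simulation has its own sampling noise. For comparison, the sample mean has standard deviation equal to the population standard deviation divided by the square root of n, so the median is roughly 25% more variable in Normal samples.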