Murdoch University
School of Chemical and Mathematical Sciences

Dot Plots

A dot plot is a primitive display of a sample of numerical values. Every sample value is simply represented by a dot or a blob where it occurs on the number line. Identical sample values are piled on top of each other. For example,

[Figure: dot plot of a sample of 20 observations]

is the dot plot of the sample 4.2, 4.4, 5.1, 5.6, 6.0, 6.4, 6.8, 7.1, 7.4, 7.4, 7.9, 8.2, 8.2, 8.7, 9.1, 9.6, 9.6, 10.0, 10.5, 11.6. Obviously, a dot plot becomes less effective as the sample size grows.
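The construction is simple enough to sketch in a few lines of Python. This minimal, text-only version is an illustration rather than how the figure was produced. The function name `text_dot_plot` and the 0.1 column resolution are assumptions chosen to suit data recorded to one decimal place. It puts one `o` per value in the column for its position on the number line and stacks repeated values.

```python
from collections import Counter

# The 20-observation sample from the text.
SAMPLE = [4.2, 4.4, 5.1, 5.6, 6.0, 6.4, 6.8, 7.1, 7.4, 7.4,
          7.9, 8.2, 8.2, 8.7, 9.1, 9.6, 9.6, 10.0, 10.5, 11.6]


def text_dot_plot(sample, step=0.1):
    """Return the rows of a character dot plot, top row first.

    Each value becomes one 'o' in the column for its position on the number
    line (one column per `step`), and identical values pile up vertically.
    """
    lo = min(sample)
    # Column index of each value; round() absorbs floating-point noise.
    stacks = Counter(round((x - lo) / step) for x in sample)
    n_cols = max(stacks) + 1
    height = max(stacks.values())
    rows = []
    for level in range(height, 0, -1):
        row = "".join("o" if stacks[col] >= level else " " for col in range(n_cols))
        rows.append(row.rstrip())
    rows.append("-" * n_cols)  # the number line itself
    return rows


print("\n".join(text_dot_plot(SAMPLE)))
```

Only the three repeated values (7.4, 8.2 and 9.6) reach the second level. With a few hundred values the rows would become too crowded to read, which is the point made above.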


Histograms

A histogram (from the Greek histos, meaning mast) of a sample of numerical values is a plot made up of rectangles whose areas represent frequencies of occurrence. It therefore resembles a bar chart, but it should never be confused with one. Histograms build on the idea of dot plots, and their use is limited to samples of continuous variables, where any value in a range is possible. An interval on the number line that completely encompasses the data, from the sample minimum to the sample maximum, is subdivided into smaller adjoining intervals called bins (to use an apple-sorting metaphor). The bins typically all have the same width, the bin width, but they need not.

The number, or frequency, of sample values falling inside each bin is counted, and a rectangle is constructed on each bin. The height of each rectangle equals the frequency in its bin divided by the bin width, so the area of the rectangle represents the number of sample values in the bin. If all bins have the same width, every rectangle's area is the same multiple of its height, so heights are proportional to frequencies and the vertical axis can be labelled in frequency units. This is the origin of the popular misconception that the height of a rectangle represents frequency: it does so only when all bins have the same width.
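The rule that height equals frequency divided by bin width can be checked with a short Python sketch. The unequal bin edges below are a hypothetical choice made for illustration and do not come from the source. Each bin is taken as the half-open interval [lower, upper).

```python
# The 20-observation sample from the text.
SAMPLE = [4.2, 4.4, 5.1, 5.6, 6.0, 6.4, 6.8, 7.1, 7.4, 7.4,
          7.9, 8.2, 8.2, 8.7, 9.1, 9.6, 9.6, 10.0, 10.5, 11.6]

# Hypothetical unequal bins for illustration; widths are 2, 1, 1, 1, 1, 3.
EDGES = [3.5, 5.5, 6.5, 7.5, 8.5, 9.5, 12.5]


def histogram_heights(sample, edges):
    """Return (frequency, width, height) for each bin [edges[i], edges[i+1])."""
    rows = []
    for lo, hi in zip(edges, edges[1:]):
        freq = sum(1 for x in sample if lo <= x < hi)
        width = hi - lo
        rows.append((freq, width, freq / width))  # height = frequency / bin width
    return rows


for freq, width, height in histogram_heights(SAMPLE, EDGES):
    # The area, height * width, recovers the frequency whatever the width.
    print(f"freq={freq} width={width:.1f} height={height:.2f} area={height * width:.1f}")
```

The last bin holds the most values (5), but because it is three units wide its rectangle is one of the shortest. Reading that height as a frequency would badly understate how many values the bin contains.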

Every sample value should be counted in one and only one bin of the histogram. If a value falls on the crack between two bins, adopt a consistent rule for assigning it to one of them, e.g. always upgrade such values to the upper bin. Which rule you choose makes little difference in the scheme of things.
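A minimal Python sketch of such a rule is given here. The names `bin_counts` and `upgrade` are hypothetical. The edges follow the "bin width 1.0 starting at 3.2" scheme mentioned below, where 4.2 and both 8.2s sit exactly on cracks. The source does not say which rule its figure used. Values are held as `Decimal` so that a value on an edge is recognised exactly, rather than depending on binary floating-point rounding.

```python
from bisect import bisect_left, bisect_right
from decimal import Decimal

# The sample from the text, held as exact decimals so that values lying
# exactly on a bin edge are recognised as such.
SAMPLE = [Decimal(s) for s in (
    "4.2 4.4 5.1 5.6 6.0 6.4 6.8 7.1 7.4 7.4 "
    "7.9 8.2 8.2 8.7 9.1 9.6 9.6 10.0 10.5 11.6").split()]

# Bin width 1.0 starting at 3.2: edges 3.2, 4.2, ..., 12.2 (9 bins).
EDGES = [Decimal("3.2") + k * Decimal("1.0") for k in range(10)]


def bin_counts(sample, edges, upgrade=True):
    """Count the values in each bin defined by ascending `edges`.

    upgrade=True:  a value on an interior edge goes to the upper bin, [a, b).
    upgrade=False: it goes to the lower bin instead, (a, b].
    Every value is counted in exactly one bin.
    """
    counts = [0] * (len(edges) - 1)
    for x in sample:
        if not edges[0] <= x <= edges[-1]:
            raise ValueError(f"{x} lies outside the binned interval")
        find = bisect_right if upgrade else bisect_left
        i = find(edges, x) - 1
        # Keep values on the two outermost edges inside the first/last bin.
        i = min(max(i, 0), len(counts) - 1)
        counts[i] += 1
    return counts


print("upgrade:  ", bin_counts(SAMPLE, EDGES))
print("downgrade:", bin_counts(SAMPLE, EDGES, upgrade=False))
```

Both rules account for all 20 values. They differ only in where 4.2 and the two 8.2s end up.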

A histogram of the sample 4.2, 4.4, 5.1, 5.6, 6.0, 6.4, 6.8, 7.1, 7.4, 7.4, 7.9, 8.2, 8.2, 8.7, 9.1, 9.6, 9.6, 10.0, 10.5, 11.6 is given below:

[Figure: histogram of a sample of 20 observations]
Histogram with 9 bins based on the intervals 3.5-4.5, 4.5-5.5, 5.5-6.5, ..., 11.5-12.5. The bin width is 1.0 for all bins, so the frequency can be read straight from the vertical axis.

The phrase "a histogram" above was used advisedly: the histogram shown is just one of an infinite number that could be drawn for this sample. Here are a few more.

[Figures: three further histograms of the same sample of 20 observations]
Left: bin width 1.0, starting at 3.2. Centre: bin width 1.5, starting at 3.25. Right: bin width 0.5, starting at 3.75.
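The non-uniqueness is easy to see by counting the same 20 values under each of the bin schemes given in the captions. The following is a minimal Python sketch. The helper name `equal_width_counts` is hypothetical. A value on a crack is upgraded to the upper bin, and `Decimal` keeps edge values such as 10.5 (starting at 3.5) and 4.2 and 8.2 (starting at 3.2) exact. The sketch returns frequencies. For bin widths other than 1.0, rectangle heights would be these counts divided by the width.

```python
from decimal import Decimal

# The sample from the text as exact decimals.
SAMPLE = [Decimal(s) for s in (
    "4.2 4.4 5.1 5.6 6.0 6.4 6.8 7.1 7.4 7.4 "
    "7.9 8.2 8.2 8.7 9.1 9.6 9.6 10.0 10.5 11.6").split()]


def equal_width_counts(sample, start, width):
    """Frequencies in equal-width bins [start + k*width, start + (k+1)*width).

    A value on the crack between two bins is upgraded to the upper bin.
    """
    n_bins = int((max(sample) - start) / width) + 1  # enough bins to reach the maximum
    counts = [0] * n_bins
    for x in sample:
        counts[int((x - start) / width)] += 1  # index of the bin containing x
    return counts


# (start, width) pairs taken from the figure captions.
SCHEMES = [
    (Decimal("3.5"), Decimal("1.0")),
    (Decimal("3.2"), Decimal("1.0")),
    (Decimal("3.25"), Decimal("1.5")),
    (Decimal("3.75"), Decimal("0.5")),
]

for start, width in SCHEMES:
    print(f"width {width} from {start}: {equal_width_counts(SAMPLE, start, width)}")
```

Even between the two schemes with width 1.0, the fullest bin moves from [6.5, 7.5) to [8.2, 9.2) just by shifting the starting point from 3.5 to 3.2.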

Obviously, the narrower you make the bins, the more the histogram resembles a dot plot, and the less sense of the overall shape of the sample you get. Conversely, a few wide bins will summarise your sample into oblivion. The optimal choice lies somewhere in between. There are rules of thumb for choosing the bin width, which depend on the range of your data and the sample size.

Remember that a histogram is a summary plot: you lose specific information about where individual sample values lie. The aim is to get a sense of the location, spread and overall shape of the sample, and perhaps to highlight any salient features. The latter can be done by varying the bin widths in places where something is happening that you don't want smoothed over. Because a histogram is not unique, it is dangerous to read too much into the appearance of any one particular histogram: depending on your choice of bins, observations in the tails can become detached from the pack, gaps can appear almost anywhere, and modes can spring up from nowhere. In general, however, the larger the sample, the less the histogram's appearance changes with the choice of bin width.
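The text does not say which rule of thumb it has in mind. One widely quoted rule that depends only on the sample size and the range is Sturges' rule. It suggests k = ⌈log2 n⌉ + 1 bins, each of width (max - min)/k. The sketch below uses it as a stand-in. Other rules, such as Scott's or the Freedman–Diaconis rule, base the width on the standard deviation or interquartile range instead. The function names are hypothetical.

```python
import math

# The 20-observation sample from the text.
SAMPLE = [4.2, 4.4, 5.1, 5.6, 6.0, 6.4, 6.8, 7.1, 7.4, 7.4,
          7.9, 8.2, 8.2, 8.7, 9.1, 9.6, 9.6, 10.0, 10.5, 11.6]


def sturges_k(n):
    """Sturges' rule of thumb: ceil(log2 n) + 1 bins for a sample of size n."""
    return math.ceil(math.log2(n)) + 1


def sturges_width(sample):
    """Bin width that spreads sturges_k(n) equal bins over the sample range."""
    return (max(sample) - min(sample)) / sturges_k(len(sample))


print(sturges_k(len(SAMPLE)), "bins of width", round(sturges_width(SAMPLE), 2))
for n in (20, 100, 1000):
    # The suggested number of bins grows only slowly with the sample size.
    print(n, "->", sturges_k(n), "bins")
```

For the sample above this suggests 6 bins of width about 1.23. In practice you would still round the width to a convenient value and pick a starting point, so the choice of bins is never fully settled by a rule.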


Bar Charts

A bar chart is a plot designed for categorical variables, i.e. variables that take only a finite number of discrete values, or categories. A bar is drawn for each category, and its length represents some characteristic of that category. (The width of a bar signifies nothing, although all bars should have the same width.) For example, the categories could be sales areas, and the bar lengths could represent the sales figures for each area. Or the categories could be hair colours, and the bar lengths could represent the frequency of each hair colour in a population, or in a sample from that population. The possibilities are limitless, unlike the uses of bar charts in statistical analysis. Still, they are fun to draw, and both Excel and SPSS can churn them out like billyo (American translation: "all get out"), irrespective of their appropriateness to the situation at hand.
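A minimal text bar chart in Python is shown below. It uses a made-up hair-colour sample; the data are hypothetical and for illustration only. It tallies category frequencies with `collections.Counter` and draws one bar per category. Every bar is one character wide, so only its length carries information. The function name `text_bar_chart` is likewise hypothetical.

```python
from collections import Counter

# Hypothetical sample of hair colours (made-up data, for illustration only).
hair = ["brown", "black", "brown", "blonde", "red",
        "brown", "black", "blonde", "brown"]


def text_bar_chart(values):
    """One bar per category; the bar length is that category's frequency."""
    freq = Counter(values)
    label_width = max(len(category) for category in freq)
    lines = []
    # most_common() lists categories from most to least frequent
    # (ties keep the order in which categories were first seen).
    for category, count in freq.most_common():
        lines.append(f"{category:<{label_width}} | {'#' * count} {count}")
    return lines


print("\n".join(text_bar_chart(hair)))
```

Unlike a histogram, the categories have no natural position on a number line, so ordering the bars by frequency is just a presentational choice.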