Distributions, Central Tendency and Variability

Measures of Central Tendency

A measure of central tendency is a single number that best represents an entire set of values (either a sample or an entire population). We have three options:

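  • Mean: the arithmetic average (the sum of the values divided by how many there are); outliers can pull it strongly toward them
  • Median: the middle value once the data are sorted (if there are two middle values, average those two)
  • Mode: the value that appears most often; especially useful with categorical variables, where a mean or median may not make sense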
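To make the three definitions concrete, here is a minimal sketch using Python's built-in `statistics` module. The scores are made up for illustration, with 98 deliberately chosen as an outlier so you can see how much it moves the mean compared with the median.

```python
import statistics

# Hypothetical sample of test scores; 98 is an unusually high value (an outlier).
scores = [70, 72, 72, 75, 78, 80, 98]

print(statistics.mean(scores))    # arithmetic mean: 545 / 7, about 77.86
print(statistics.median(scores))  # middle of the 7 sorted values: 75
print(statistics.mode(scores))    # most frequent value: 72

# Drop the outlier: the mean shifts by more than 3 points, the median by 1.5.
# With an even count, median() averages the two middle values (72 and 75).
trimmed = scores[:-1]
print(statistics.mean(trimmed))    # 447 / 6 = 74.5
print(statistics.median(trimmed))  # (72 + 75) / 2 = 73.5
```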

Measures of Variability 1: Ranges

With measures of variability we want a single number that represents how spread out or how clumped together a set of values is. Typically, we will look at:

  • Range: the minimum and maximum values of a set of data (also reported as the difference between those two numbers)
  • Interquartile range (IQR): the 25th and 75th percentile values of a set of data, i.e. the first and third quartiles (also reported as the difference between those two numbers, which covers the middle 50% of the data)
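A small sketch of both range measures, assuming Python 3.8+ (for `statistics.quantiles`) and a made-up data set. There are several conventions for computing percentiles. The `method="inclusive"` choice used here interpolates linearly between data points, and other tools may report slightly different quartiles for the same data.

```python
import statistics

# Hypothetical data set, already sorted for readability.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Range: the minimum and maximum, or their difference.
low, high = min(data), max(data)
value_range = high - low  # 9 - 1 = 8

# Quartiles: n=4 returns the three cut points Q1, Q2 (the median) and Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # 7.0 - 3.0 = 4.0

print(low, high, value_range)
print(q1, q3, iqr)
```

Unlike the range, the IQR ignores the most extreme quarter of values on each side, so a single outlier barely affects it.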

Measures of Variability 2: Variance and Standard Deviation

Variance and standard deviation measure spread by looking at how far the values typically sit from the mean:

  • Variance: take how far each value is from the mean, square each difference (so negative and positive differences don't cancel out), then average all those squared differences. For a whole population, divide by N. For a sample, the usual estimate divides by n - 1 instead. Because of the squaring, the result is on a different scale from the original values (and the measures of central tendency). For example, if we are looking at height in inches, the variance will be in inches squared (in²).
  • Standard deviation: the square root of the variance; its units match those of the data set (if we are looking at height in inches, the standard deviation will also be in inches)
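The recipe above (deviations from the mean, square, average, then square-root) can be sketched by hand and checked against Python's `statistics` module. The heights are hypothetical and were chosen so the arithmetic comes out to whole numbers.

```python
import math
import statistics

# Hypothetical heights in inches (their mean is 65).
heights = [62, 64, 64, 64, 65, 65, 67, 69]

def population_variance(values):
    """Average of the squared distances from the mean (divides by N)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

var = population_variance(heights)  # squared deviations sum to 32; 32 / 8 = 4.0 square inches
sd = math.sqrt(var)                 # 2.0 inches, the same units as the data

# The standard library gives the same population figures...
print(var, statistics.pvariance(heights), sd, statistics.pstdev(heights))

# ...and the sample versions, which divide by n - 1 = 7 instead of n = 8.
sample_var = statistics.variance(heights)  # 32 / 7, about 4.57
sample_sd = statistics.stdev(heights)
print(sample_var, sample_sd)
```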

Frequency Distributions

Frequency distributions are one of the most important ideas in statistics. A frequency distribution of a variable shows how often each value of the variable appears in the dataset, and it is usually drawn as a graph. The x-axis shows the values of the variable (e.g. for the variable height, the x-axis shows all possible heights) and the y-axis shows the number of times each value appears in the dataset. For instance, if a height of 65 inches appears in the dataset three times, then we would plot a value of y = 3 at x = 65. For continuous variables, values are often grouped into ranges (bins) first, which gives a histogram. Watch the video. You'll get it.
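Here is a minimal sketch of building a frequency distribution with `collections.Counter` and drawing it as a sideways text "graph". The heights are hypothetical. Height 65 appears three times to match the example above, and 67 never appears, so its bar is empty.

```python
from collections import Counter

# Hypothetical heights in inches; 65 appears three times.
heights = [63, 64, 65, 65, 65, 66, 66, 68]

freq = Counter(heights)  # maps each height -> how many times it occurs

# x = height, y = count (one '#' per occurrence). Looping over every
# whole-inch value shows gaps too: Counter returns 0 for missing keys.
for height in range(min(heights), max(heights) + 1):
    print(f"{height} in | {'#' * freq[height]}")
```

Printed this way, the bar for 65 is three characters long, which is the text equivalent of plotting y = 3 at x = 65.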

Frequency Distributions & Central Tendency

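Not all distributions take the familiar bell shape. Some are skewed to the left (a long tail toward low values) and others to the right (a long tail toward high values). The shape matters for central tendency: in a right-skewed distribution the mean is typically pulled above the median, in a left-skewed one it is typically pulled below, and in a symmetric distribution the two are about equal. For strongly skewed data, the median is often the better summary.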
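The effect of skew on the mean and median can be seen with a tiny made-up data set. The right-skewed list has one long-tail value (20), and the left-skewed list is simply its mirror image.

```python
import statistics

# Hypothetical right-skewed data: most values are small, one long tail to the right.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]
# Mirror image: a long tail to the left.
left_skewed = [-x for x in right_skewed]

# Right skew: the tail drags the mean above the median (4.75 vs 3.0).
print(statistics.mean(right_skewed), statistics.median(right_skewed))
# Left skew: the mean falls below the median (-4.75 vs -3.0).
print(statistics.mean(left_skewed), statistics.median(left_skewed))
```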

The Normal Distribution

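The normal distribution holds special relevance in statistical inference. When a set of data follows this symmetric bell shape, we can make precise statements about it. For example, about 68% of values fall within one standard deviation of the mean, about 95% within two, and about 99.7% within three (the "68-95-99.7" or empirical rule). Watch the video.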
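The 68-95-99.7 figures can be recovered from the normal distribution's cumulative distribution function. This sketch assumes Python 3.8+ for `statistics.NormalDist`. It uses the standard normal (mean 0, standard deviation 1), since the share within k standard deviations is the same for any normal distribution.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))  # about 0.6827, 0.9545, 0.9973
```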

Z-scores

The power of the normal distribution comes from understanding z-scores. A z-score tells you how many standard deviations a value lies above (positive) or below (negative) the mean: z = (x - mean) / standard deviation. Converting the values of a normal distribution to z-scores gives the standard normal distribution, which has a mean of 0 and a standard deviation of 1. With it you can calculate the probability of a value being above (or below) a certain number. This is an important concept to understand, as it will serve as the foundation of hypothesis testing later.
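A minimal sketch of a z-score and the tail probability it gives, assuming Python 3.8+ for `statistics.NormalDist`. The population mean (66 in) and standard deviation (3 in) are hypothetical values chosen so that a height of 72 inches sits exactly 2 standard deviations above the mean.

```python
from statistics import NormalDist

# Hypothetical population of heights: mean 66 in, standard deviation 3 in.
mu, sigma = 66, 3

def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

z = z_score(72, mu, sigma)     # (72 - 66) / 3 = 2.0
p_below = NormalDist().cdf(z)  # P(Z < 2) on the standard normal, about 0.977
p_above = 1 - p_below          # P(Z > 2), about 0.023

print(z, round(p_below, 4), round(p_above, 4))

# Same probability without standardizing, using the original units directly.
print(round(NormalDist(mu, sigma).cdf(72), 4))
```

Standardizing first is what makes a single table (or function) for the standard normal usable for any normally distributed variable.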


Test your comprehension

Work through the Distributions, Variability and Central Tendency problem set.

