By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.
Measures of Central Tendency
A measure of central tendency is a statistical value that gives a reasonable estimate for the center of a group of data.
There are several different ways of describing the measure of central tendency. Each one has a unique way it is calculated, and each one gives a slightly different perspective on the data set.
Whenever you give a measure of central tendency, always make sure the units are the same. If the data has different units, such as hours, minutes, and seconds, convert all the data to the same unit, and use the same unit in the measure of central tendency.
If no units are given in the data, do not give units for the measure of central tendency.
Mean
The statistical mean of a group of data is the same as the arithmetic average of that group.
To find the mean of a set of data, first convert each value to the same units, if necessary. Then find the sum of all the values, and count the total number of data values, making sure you take into consideration each individual value.
If a value appears more than once, count it more than once. Divide the sum of the values by the total number of values and apply the units, if any.
Note that the mean does not have to be one of the data values in the set, and may not divide evenly.
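For instance, the mean of the data set {88, 72, 61, 90, 97, 68, 88, 79, 86, 93, 97, 71, 80, 84, 89} would be the sum of the fifteen numbers divided by 15:
$$\bar{x} = \frac{88 + 72 + 61 + 90 + 97 + 68 + 88 + 79 + 86 + 93 + 97 + 71 + 80 + 84 + 89}{15} = \frac{1243}{15} \approx 82.9$$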
While the mean is relatively easy to calculate and averages are understood by most people, the mean can be very misleading if used as the sole measure of central tendency. If the data set has outliers (data values that are unusually high or unusually low compared to the rest of the data values), the mean can be very distorted, especially if the data set has a small number of values. If unusually high values are countered with unusually low values, the mean is not affected as much.
For example, if five of twenty students in a class get a 100 on a test, but the other 15 students have an average of 60 on the same test, the class average would appear as 70.
Whenever the mean is skewed by outliers, it is always a good idea to include the median as an alternate measure of central tendency.
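As a minimal sketch of the steps above, the following Python computes the mean of the fifteen-value data set and reproduces the class-average example. The variable names are illustrative, and the standard-library `statistics` module is only one convenient way to check the arithmetic.

```python
from statistics import fmean

# Data set from the worked example above (15 values).
scores = [88, 72, 61, 90, 97, 68, 88, 79, 86, 93, 97, 71, 80, 84, 89]

# Mean = sum of all values / number of values (repeated values count each time).
manual_mean = sum(scores) / len(scores)
library_mean = fmean(scores)  # same calculation via the standard library

# Class example: 5 students score 100 and the other 15 average 60.
# Total points = 5*100 + 15*60, divided by all 20 students.
class_mean = (5 * 100 + 15 * 60) / 20

print(round(manual_mean, 1))  # 82.9
print(class_mean)             # 70.0
```

The class average of 70 describes neither group well: no student in the lower group is near 70 on average, and the top five are far above it, which is why a second measure such as the median is worth reporting.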
A weighted mean, or weighted average, is a mean that uses “weighted” values.
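The formula is $\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n}$.
Weighted values, such as $w_1, w_2, w_3, \dots, w_n$, are assigned to each member of the set $x_1, x_2, x_3, \dots, x_n$.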
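A minimal sketch of a weighted mean, assuming the usual definition (the sum of each weight times its value, divided by the sum of the weights). The function name `weighted_mean` and the scores and weights below are hypothetical, chosen only for illustration.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w); requires exactly one weight per value."""
    if len(values) != len(weights):
        raise ValueError("each value needs exactly one weight")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Hypothetical example: three scores, the last one counted twice as heavily.
values = [80, 90, 70]
weights = [1, 1, 2]
result = weighted_mean(values, weights)
print(result)  # (80 + 90 + 140) / 4 = 77.5
```

With equal weights the result reduces to the ordinary mean, which is a quick sanity check on any weighted calculation.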
When calculating a weighted mean, make sure to use a weight value for each member of the set.
Median
The statistical median is the value in the middle of the set of data.
To find the median, list all data values in order from smallest to largest or from largest to smallest. Any value that is repeated in the set must be listed the number of times it appears.
If there is an odd number of data values, the median is the value in the middle of the list.
If there is an even number of data values, the median is the arithmetic mean of the two middle values.
For example, the median of the data set {88, 72, 61, 90, 97, 68, 88, 79, 86, 93, 97, 71, 80, 84, 88} is 86 since the ordered set is {61, 68, 71, 72, 79, 80, 84, 86, 88, 88, 88, 90, 93, 97, 97}.
The big disadvantage of using the median as a measure of central tendency is that it relies solely on a value’s relative size as compared to the other values in the set. When the individual values in a set of data are evenly dispersed, the median can be an accurate tool. However, if there is a group of rather large values or a group of rather small values that are not offset by a different group of values, the information that can be inferred from the median may not be accurate because the distribution of values is skewed.
Mode
The statistical mode is the data value that occurs the greatest number of times in the data set. It is possible to have exactly one mode, more than one mode, or no mode.
To find the mode of a set of data, arrange the data as you would to find the median (all values in order, listing each repeated value as many times as it appears). Count the number of times each value appears in the data set.
If all values appear an equal number of times, there is no mode.
If one value appears more than any other value, that value is the mode. If two or more values appear the same number of times, but there are other values that appear fewer times and no values that appear more times, all of those values are the modes.
For example, the mode of the data set {88, 72, 61, 90, 97, 68, 88, 79, 86, 93, 97, 71, 80, 84, 88} is 88.
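A minimal sketch of the counting rules above, including the "no mode" case. The function name `modes` is illustrative, and returning an empty list to mean "no mode" is an assumption made for this example.

```python
from collections import Counter

def modes(values):
    """Return a sorted list of modes; an empty list means there is no mode."""
    counts = Counter(values)  # how many times each value appears
    if not counts:
        return []
    top = max(counts.values())
    # Rule from the text: if every value appears equally often, there is no mode.
    if all(count == top for count in counts.values()):
        return []
    # Otherwise every value tied for the highest count is a mode.
    return sorted(value for value, count in counts.items() if count == top)

scores = [88, 72, 61, 90, 97, 68, 88, 79, 86, 93, 97, 71, 80, 84, 88]
print(modes(scores))           # [88]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]
print(modes([1, 2, 3]))        # []
```

Python's `statistics.multimode` returns every value when nothing repeats, so the sketch handles the no-mode case explicitly to match the convention used here.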
The main disadvantage of the mode is that the values of the other data in the set have no bearing on the mode. The mode may be the largest value, the smallest value, or a value anywhere in between in the set.
The mode only tells which value or values, if any, occurred the greatest number of times. It does not give any suggestions about the remaining values in the set.
Dispersion
The measure of dispersion is a single value that helps to “interpret” the measure of central tendency by providing more information about how the data values in the set are distributed about the measure of central tendency. The measure of dispersion helps to eliminate or reduce the disadvantages of using the mean, median, or mode as a single measure of central tendency, and gives a more accurate picture of the dataset as a whole.
Range
The range of a set of data is the difference between the greatest and lowest values of the data in the set. To calculate the range, you must first make sure the units for all data values are the same, and then identify the greatest and lowest values. If there are multiple data values that are equal for the highest or lowest, just use one of the values in the formula. Write the answer with the same units as the data values you used to do the calculations.
Standard Deviation
Standard deviation is a measure of dispersion that compares all the data values in the set to the mean of the set to give a more accurate picture.
To find the standard deviation of a sample, use the formula
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
Note that s is the standard deviation of a sample, x represents the individual values in the data set, $\bar{x}$ is the mean of the data values in the set, and n is the number of data values in the set. The higher the value of the standard deviation is, the greater the spread of the data values about the mean. The units associated with the standard deviation are the same as the units of the data values.
Variance
The variance of a sample, or just variance, is the square of the standard deviation of that sample. While the mean of a set of data gives the average of the set and gives information about where a specific data value lies in relation to the average, the variance of the sample gives information about the degree to which the data values are spread out and tells you how close an individual value is to the average compared to the other values. The units associated with variance are the same as the units of the data values squared.
Percentile
Percentiles and quartiles are other methods of describing data within a set.
Percentiles tell what percentage of the data in a set fall below a specific point. For example, achievement test scores are often given in percentiles.
A score at the 80th percentile is one which is equal to or higher than 80 percent of the scores in the set. In other words, 80 percent of the scores were at or below that score.
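A minimal sketch of a percentile calculation, reusing the fifteen test scores from the median example. It assumes the "equal to or higher than" reading, counting scores at or below the given score; some conventions count only scores strictly below, which gives a smaller figure when there are ties. The function name `percentile_rank` is illustrative.

```python
def percentile_rank(values, score):
    """Percent of values that are less than or equal to `score`."""
    if not values:
        raise ValueError("values must not be empty")
    at_or_below = sum(1 for v in values if v <= score)
    return 100 * at_or_below / len(values)

scores = [88, 72, 61, 90, 97, 68, 88, 79, 86, 93, 97, 71, 80, 84, 88]

# 12 of the 15 scores are 90 or lower, so 90 sits at the 80th percentile.
print(percentile_rank(scores, 90))  # 80.0
```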
Quartiles are percentile groups that make up quarter sections of the data set.
The first quartile, Q1, is the 25th percentile. The second quartile, Q2, is the 50th percentile; this is also the median of the dataset. The third quartile, Q3, is the 75th percentile. The interquartile range (IQR) is the difference between the third quartile and the first quartile, Q3 – Q1.
Outlier
An outlier is an extremely high or extremely low value in the data set. It may be the result of a measurement error, in which case, the outlier is not a valid member of the data set. However, it may also be a valid member of the distribution. Unless a measurement error is identified, the experimenter cannot know for certain if an outlier is or is not a member of the distribution.
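The quartiles and interquartile range described above can be sketched with Python's standard library. Two assumptions are worth flagging: the default "exclusive" method of `statistics.quantiles` is used (other quartile methods can give slightly different values for the same data), and possible outliers are flagged with fences 1.5 × IQR beyond the quartiles, which is one common convention rather than the only one.

```python
from statistics import quantiles

scores = [88, 72, 61, 90, 97, 68, 88, 79, 86, 93, 97, 71, 80, 84, 88]

# quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
# Assumption: the default "exclusive" method is acceptable for this data.
q1, q2, q3 = quantiles(scores, n=4)
iqr = q3 - q1  # interquartile range, Q3 - Q1

# Assumed convention: fences sit 1.5 * IQR below Q1 and above Q3.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
possible_outliers = [x for x in scores if x < lower_fence or x > upper_fence]

print(q1, q2, q3, iqr)
print(lower_fence, upper_fence, possible_outliers)
```

For this data set the sketch gives Q1 = 72, Q2 = 86 (the median found earlier), and Q3 = 90, so the IQR is 18 and the fences fall at 45 and 117. No score lies outside them, so nothing is flagged.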
There are arbitrary methods that can be employed to designate an extreme value as an outlier.
One method designates an outlier (or possible outlier) to be any value less than Q1 – 1.5(IQR) or any value greater than Q3 + 1.5(IQR).
Practice
P1. Given the following graph, determine the range of patient ages:
P2. Calculate the sample variance for the dataset
Practice Solutions:
P1. Patient 1 is 54 years old; Patient 2 is 55 years old; Patient 3 is 60 years old; Patient 4 is 40 years old; and Patient 5 is 25 years old. The range of patient ages is the age of the oldest patient minus the age of the youngest patient. In other words, 60 – 25 = 35. The range of ages is 35 years.
P2. To find the variance, first find the mean:
Now, apply the formula for sample variance: