Welcome to a basic lesson in statistics. It recapitulates much of what you already know, but it is nice to have it all in one place, and maybe it will give you a fresh perspective on the statistics you learnt oh so long ago.
So statistics, hmm, rambling thought: is it plural? Well, I have heard the term "summary statistics", so methinks "statistic" is a common noun. Well, it is. A statistic is any (algebraic) formula that reduces a set of numbers given to it to a single number describing that group in some (mostly) approximate way. In other words, it is possible to describe a group of numbers and their properties using a much smaller group of numbers. So statistics gives you formulae that reduce a giant set of numbers into a much smaller, useful set, which in turn lets you compare two such giant sets.
The mean/average is a basic statistic. It is the sum of all the values divided by the number of values, and it gives you the central value of the set. So knowing the mean, you get a rough idea of what a randomly picked number from that set might be.
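Here is a minimal sketch of that in Python, with made-up numbers purely for illustration:

```python
# A minimal sketch of the mean, using made-up numbers for illustration.
values = [4.0, 7.0, 5.5, 6.5, 7.0]

# Sum of all the values divided by how many values there are.
mean = sum(values) / len(values)
print(mean)  # 6.0
```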
However, the mean does not tell you about the spread of the values around it: very differently spread-out sets of numbers can have the same mean. This means the mean can be misleading about the scale of the numbers, and it can also be heavily influenced by a few extreme values in the group, which drag the mean towards themselves. A quick sketch of both problems is below.
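```python
# Made-up examples: two sets with the same mean but very different spread,
# and a set where one extreme value drags the mean towards itself.
tight = [9, 10, 11]   # values hug the centre
wide  = [0, 10, 20]   # values are far apart

salaries = [30, 32, 31, 33, 500]  # one extreme value (say, in thousands)

print(sum(tight) / len(tight))        # 10.0
print(sum(wide) / len(wide))          # 10.0 -- same mean, very different spread
print(sum(salaries) / len(salaries))  # 125.2 -- nowhere near the typical value
```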
This is where the variance comes in. The variance gives you an estimate of how much the numbers are spread around the central number (which in this case is the mean). It is calculated by subtracting the mean from every observation, squaring each of these differences, and finally averaging those squares. So the variance gives you an idea of the "average spread" of the numbers around the mean.
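Continuing the same sketch with the same made-up numbers:

```python
# A minimal sketch of the variance, using the same made-up numbers.
values = [4.0, 7.0, 5.5, 6.5, 7.0]
mean = sum(values) / len(values)

# Subtract the mean from each value, square the differences, then average them.
# (This is the population variance; dividing by len(values) - 1 instead gives
# the sample variance, which is what many libraries report by default.)
variance = sum((v - mean) ** 2 for v in values) / len(values)
print(variance)  # 1.3
```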
However, the variance does not have the same units as the original numbers, so to bring the measure of spread back to the same units we take the square root of the variance; this is known as the standard deviation. Being in the same units lets you say that a particular number in your group is X times the standard deviation away from the mean.
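Here is the "X standard deviations away from the mean" idea in the same sketch:

```python
import math

# The standard deviation is just the square root of the variance,
# so it is back in the same units as the original numbers.
values = [4.0, 7.0, 5.5, 6.5, 7.0]
mean = sum(values) / len(values)
variance = sum((v - mean) ** 2 for v in values) / len(values)
std_dev = math.sqrt(variance)

# How many standard deviations is a particular value away from the mean?
x = 4.0
print((x - mean) / std_dev)  # about -1.75, i.e. 1.75 standard deviations below the mean
```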
Another government conspiracy which I think is true is that taking the square root keeps the measure of spread growing slowly: the variance grows with the square of the spread, while the standard deviation grows only linearly with it. But then that is just me thinking that statistics is all about the algebraic properties of the numbers themselves rather than any significance of the physical quantities they represent.
So the mean tells you where the centre point of your group of numbers is (from now on I'll call your group of numbers a distribution) and the variance tells you how fat the spread of the numbers around that centre is. So using two numbers we can describe a distribution (well, ideally only in the case of nicely shaped distributions). For instance, the normal distribution, a set of numbers that shows up as a bell-curve shape when plotted as a density/histogram, can be described completely by its mean and variance.
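A small sketch of that idea, using Python's built-in random module: draw numbers from a normal distribution with a chosen mean and variance, and check that the sample statistics come out close (the mean and variance here are made up).

```python
import random

# Draw numbers from a normal distribution with a chosen mean and variance,
# then check that the sample mean and variance come out close to them.
mean, variance = 10.0, 4.0
std_dev = variance ** 0.5  # random.gauss expects the standard deviation

random.seed(0)  # so the example is repeatable
samples = [random.gauss(mean, std_dev) for _ in range(100_000)]

sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
print(round(sample_mean, 2), round(sample_var, 2))  # close to 10.0 and 4.0
```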
So say you had 100 numbers which you tell me are top secret. Now I know that they are normally distributed because I spied your histogram of those numbers as I went past your PC. So if we played a guessing game and you gave me the hint that the mean and the variance of those numbers were x and y, I could not recover the exact hundred numbers, but I could reconstruct the entire shape of their distribution, because there is a formula for the normal density that takes in just the mean and the variance and tells me how likely any given value is. In that sense those two numbers give away almost everything about your secret set.
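Here is a sketch of that formula in Python; the mean and variance used are hypothetical "hints" from the guessing game.

```python
import math

def normal_density(x, mean, variance):
    """The bell-curve formula: given only a mean and a variance, it tells
    you how likely values near x are."""
    return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

# Reconstructing the shape of the "secret" distribution from just two numbers.
mean, variance = 50.0, 25.0   # hypothetical hints from the guessing game
for x in [35, 40, 45, 50, 55, 60, 65]:
    print(x, round(normal_density(x, mean, variance), 4))
# The printed values rise towards x = 50 and fall symmetrically on either side:
# the bell curve, recovered without seeing a single one of the original numbers.
```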
So that takes care of the mean, the variance and the standard deviation. More in another post.