Six Sigma deploys a number of statistical methods to understand current processes. As we have discussed previously, variation in output costs the business both directly and indirectly, so we need to analyse and mitigate as much variation as we can.
First off, let’s look at the two main types of data: attribute data and continuous data.
Attribute data is data that cannot be sensibly added or subtracted, such as hair colours, t-shirt sizes or yes/no responses from a survey. Some attribute data can still be put in order, however: t-shirt sizes can be sorted from smallest to largest, and the months of the year can be sorted by date.
Continuous data is data that can be added or subtracted, such as your bank balance, the length or width of an item, process completion time or temperature readings.
Now, let’s kick off this statistical party.
Above is a chart that graphically shows the distribution of our data set. By looking at it, we can say that next time we carry out the process, we’re most likely to have a result of 5 or 6 and quite unlikely to get a 1 or 10 – so hey, we’ve got a bit of insight.
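If you want to build this kind of picture yourself, here is a minimal sketch in Python. The `results` list is made-up illustrative data on a 1 to 10 scale, not the data behind the chart, so treat the output as an example of the idea rather than a reproduction.

```python
from collections import Counter

# Hypothetical process results on a 1-10 scale (illustrative only).
results = [1, 3, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 8, 10]

# Count how often each value occurs.
counts = Counter(results)

# Print a simple text histogram: one '#' per observation.
for value in range(1, 11):
    print(f"{value:>2} | {'#' * counts[value]}")

# The two most frequent values are the ones we'd expect to see most often.
most_common = [value for value, _ in counts.most_common(2)]
```

With this made-up data, the tallest bars sit at 5 and 6, while 1 and 10 each appear only once.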
All of our X’s and Y’s will have a distribution. It may be the dimensions of the output product; the temperature of the inputs or something else, but all of these could have variability and we need to understand how much they vary.
To begin understanding variance, we need to look at the mean, median and mode of the data. The mean is the most common measure and very useful. The median is very useful when the data has outliers. The mode isn’t really a very useful measure; some people do use it but, admittedly, I’ve never found much reason to.
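To see how the three measures behave, here is a small sketch using Python's built-in `statistics` module. The data set is hypothetical and includes one deliberate outlier (50) to show why the median is handy in that situation.

```python
import statistics

# Hypothetical data set with one outlier (50); values are illustrative only.
data = [4, 5, 5, 6, 7, 50]

mean = statistics.mean(data)      # sum / count, pulled upward by the outlier
median = statistics.median(data)  # middle value, barely affected by the outlier
mode = statistics.mode(data)      # most frequently occurring value

print(mean, median, mode)
```

Here the mean lands near 12.8 even though most values sit between 4 and 7, while the median stays at 5.5.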
So to understand our variance, we can use two methods. One is to simply calculate the range, which is the maximum value minus the minimum value. Here, we have a maximum value of 10 and a minimum value of 1, so the range is 9.
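As a quick sketch, the same subtraction in Python looks like this. The list is hypothetical; it was simply chosen so that its smallest value is 1 and its largest is 10, as in the text.

```python
# Hypothetical results whose smallest value is 1 and largest is 10.
data = [1, 4, 5, 5, 6, 6, 7, 10]

# Range = largest value minus smallest value.
data_range = max(data) - min(data)
print(data_range)  # 9
```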
That’s useful. However, we can also use the standard deviation, which tells us the typical distance of a data point from the mean.
The formula is s = √( ∑(Xi – X̄)² / (n – 1) ). In this formula, Xi refers to an individual data point, X̄ (the X with the bar over the top) is the mean of all data points and n refers to the total number of data points.
We must calculate (Xi – X̄) squared for every data point in the data set. We square the results because it ensures that all resulting values are positive numbers.
Then we sum all of the resulting values and divide the total by the number of data points (n) minus 1.
Finally, we take the square root of the resulting figure and we’ll have our standard deviation. As mentioned above, standard deviation represents the typical distance from the mean. And don’t despair, Excel has a standard deviation formula to do this for you (STDEV.S uses the n – 1 version described here).
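If you'd rather see the steps spelled out than trust a spreadsheet, here is a minimal Python sketch of the calculation described above. The function name `sample_std_dev` and the data set are my own illustrative choices; Python's built-in `statistics.stdev` uses the same n minus 1 calculation, so it makes a handy cross-check.

```python
import math
import statistics

def sample_std_dev(values):
    """Standard deviation using the n - 1 steps described in the text."""
    n = len(values)
    mean = sum(values) / n
    # Step 1: squared distance from the mean for every data point.
    squared_diffs = [(x - mean) ** 2 for x in values]
    # Step 2: sum them and divide by n - 1.
    variance = sum(squared_diffs) / (n - 1)
    # Step 3: square root.
    return math.sqrt(variance)

# Illustrative data set.
data = [1, 3, 5, 7, 9]
result = sample_std_dev(data)

print(round(result, 2))                  # 3.16
print(round(statistics.stdev(data), 2))  # 3.16 (built-in cross-check)
```

For this data the mean is 5, the squared distances are 16, 4, 0, 4 and 16, their sum is 40, and 40 / 4 = 10, so the standard deviation is the square root of 10.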
Short term deviation
So, standard deviation as calculated above is the long-term deviation, that is, the deviation for a full data set. If we’re going to take a subset of the data, say 1 hour from a 1-month data set, we need to use the short-term variation formula: short-term deviation = R̄ / 1.128.
Now, to utilize this formula, we must find the mean of R (shown above as an R with a bar above it). First, we’re going to figure out the differences between sequential measures. So, let’s take this series of data:
1, 3, 5, 7, 9
We can use the formula Ri = |Xi – Xi+1|. Note: the vertical bars mean we’re taking the absolute value of the result, which converts any negative value (e.g. -55) into a positive value (55).
So in our example, X1 is equal to 1 and X2 is equal to 3. So here, R1 = 2. We do this for each pair of neighbouring data points.
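Here is a short Python sketch of that step for the series above. The helper name `moving_ranges` is just an illustrative choice.

```python
def moving_ranges(values):
    """Absolute difference between each data point and the next one."""
    return [abs(a - b) for a, b in zip(values, values[1:])]

series = [1, 3, 5, 7, 9]
ranges = moving_ranges(series)
print(ranges)  # [2, 2, 2, 2]
```

Five data points give four ranges, because each range needs a pair of neighbours.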
Then, to find the mean of R, we use the formula R̄ = (1 / (n – 1)) * ∑ Ri, summed from i = 1 to n – 1, where:
- R with a bar over it = the mean of R
- Ri = the range between two neighbouring data points
- Sigma sign = sum
- n = number of data points
Let’s work through a new data set of 1, 7, 9, 10 using the calculation discussed above:
Ri = |Xi – Xi+1|
So, for our data set, we get the below R values:
R1 = |1 – 7| = 6
R2 = |7 – 9| = 2
R3 = |9 – 10| = 1
Now that we have our R values and our n value (4 data points), we can write the equation as:
(1 / 3) * ∑(6, 2, 1)
So, that’s 0.333 * 9, which equals 3. That’s the mean of R.
Once we’ve calculated the mean of R, we simply divide it by 1.128. So our answer for short-term deviation is 3 / 1.128, which is approximately 2.66.
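To check the arithmetic, here is a minimal Python sketch of the whole short-term calculation for the data set 1, 7, 9, 10. The names `short_term_deviation` and `D2_FOR_PAIRS` are my own; 1.128 is the divisor from the text (it is commonly listed as the d2 constant for ranges taken over pairs of points).

```python
# Divisor from the text; commonly listed as the d2 constant for pairs of points.
D2_FOR_PAIRS = 1.128

def short_term_deviation(values):
    """Mean of the ranges between neighbouring points, divided by 1.128."""
    ranges = [abs(a - b) for a, b in zip(values, values[1:])]
    mean_range = sum(ranges) / len(ranges)  # len(ranges) is n - 1
    return mean_range / D2_FOR_PAIRS

data = [1, 7, 9, 10]
sigma_st = short_term_deviation(data)
print(round(sigma_st, 2))  # 2.66
```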
N.B. the numbers above and below the sigma sign are not used in the calculation itself. They show the range of the sum: the bottom is 1 and the top is 3 (n – 1).
The lower the variation, the better.
What does X standard deviations from the mean, mean?
Let’s take the above example. We have a normal distribution. Zero denotes the mean and each number after it shows the number of standard deviations from the mean. So, if we had a standard deviation of 2.5, then each jump in the above diagram would represent a change of 2.5. With a normal distribution, we can say that about 68% of all observations fall between minus one and plus one standard deviation from the mean.
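You can sanity-check the 68% figure with Python's built-in `statistics.NormalDist`. This is only a sketch: the mean of 10 in the second distribution is a hypothetical value, paired with the standard deviation of 2.5 from the example above, to show that the share does not depend on the particular mean or standard deviation.

```python
from statistics import NormalDist

# Standard normal: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)
within_one = standard_normal.cdf(1) - standard_normal.cdf(-1)
print(f"{within_one:.1%}")  # 68.3%

# Hypothetical process: mean 10, standard deviation 2.5.
# One standard deviation either side of the mean is 7.5 to 12.5.
process = NormalDist(mu=10, sigma=2.5)
within_one_process = process.cdf(12.5) - process.cdf(7.5)
print(f"{within_one_process:.1%}")  # 68.3%
```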
Content based on study of the Six Sigma Black Belt course and Six Sigma for Dummies