Let us take the example of examination/ test centers, that run an exam throughout the year, every day. Past one-year data shows – 30% of the candidates pass the exam and 70% fail the exam, all over India.
The Bangalore test center handles around 1000 candidates per month, whereas the Mysore center handles around 100 per month. Over the last one year, both centers have shown the same 30 pass: 70 fail ratio.
For the month of June 2010, one center has reported 38% pass and another has reported 29% pass. Which center (Bangalore or Mysore) is more likely (has a higher probability) to have reported 38%?
Well, Mysore is more likely to have the higher deviation from the average (+8%) than Bangalore (-1%), because Mysore, handling lesser candidates, has a lesser number of opportunities to “average out”. An easy way to figure this out is to take the case of a center that handles only 1 candidate. This center can have either 0% or 100% pass percentage; a -30% to +70% deviation from the average.
Let us now get back to the process performance baselines that we create and the way we do sub-process control. Here are some things that we need to keep in mind while creating, publishing and using baselines:
1) Baseline (mean and standard deviation) for a sub-process parameter (like coding productivity) will be different depending on whether we consider each the coding phase of each project as a data point, or we consider each program coded in each project as a data point. The standard deviation in the first case (large base) is likely to be smaller than the second case (small base).
2) When we publish performance baseline data, we need to qualify it with the level of detail at which it applies.
3) When we use the baseline data to do sub-process control, it needs to be applied to the same level of detail. So, to do sub-process control on program level coding productivity, we need to use the baseline that was created using programs as data points (not each project as a data point).
4) Baselines need to be created using similar situations of the base data. For example, we cannot combine the coding productivity on large programs with the productivity on small programs. Even if the average/ mean remains the same, the standard deviation will be higher when we take data from a smaller base as against a larger base.
The above points are not just “nits” but have an impact of the usefulness of baselines and sub-process control. Incorrect usage of baselines leads to incorrect displays of process instability / stability.