Tag Archives: Baseline

What comes first – SPC or a stable process?

This is an interesting topic that has been discussed very often. In every discussion, people agree on what is right and what needs to be implemented, but in actual implementation the principles are forgotten. It is therefore good to re-align ourselves with the basics from time to time.

What is often seen in actual implementation of SPC (ineffective and incorrect implementation):

1)    A process is documented and used

2)    Data related to the process is collected

3)    When we need to do sub-process control (because we are aiming for a High Maturity rating), an SPC chart is prepared.

4)    Data points which are outliers are thrown out (root cause analysis is not possible, because the outlier data belongs to the distant past, and the causes are lost in the mists of time)

5)    Control limits are recalculated

6)    Steps 4) and 5) are repeated till all (remaining) points demonstrate process stability

7)    The SPC parameters (center line, UCL/ UNPL, LCL/ LNPL) are declared as baselines and used for sub-process control. The fact that the limits are too wide or that a lot of data points were thrown out (without changing anything in the process) is ignored.

What we have in the above scenario is a maturity level 2/ 3 organization using maturity level 4 tools. Usage of tools alone does not increase maturity. We cannot create a stable process through the use of SPC; we can only confirm the stability of the process through SPC and get signals when the process goes out of control or shows changes in trends.
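To make the mechanics concrete, here is a minimal sketch of how the limits in step 7) are typically computed on an individuals (XmR) chart. The data values are hypothetical, and 2.66 is the standard constant for moving ranges of size 2; treat this as an illustration, not a prescription.

```python
# Minimal sketch of individuals (XmR) chart limits, using hypothetical data.
# 2.66 is the standard constant (3/d2) for moving ranges of size 2.
data = [12.1, 11.8, 12.5, 12.2, 11.9, 12.4, 12.0, 12.3, 30.0, 12.2]

def xmr_limits(points):
    """Return center line and natural process limits for an individuals chart."""
    center = sum(points) / len(points)
    moving_ranges = [abs(b - a) for a, b in zip(points, points[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return center, center - 2.66 * mr_bar, center + 2.66 * mr_bar

center, lnpl, unpl = xmr_limits(data)
signals = [x for x in data if x < lnpl or x > unpl]
print(f"CL={center:.2f}, LNPL={lnpl:.2f}, UNPL={unpl:.2f}, signals={signals}")
# Dropping the signalling points and recomputing will eventually make the chart
# look "stable", but nothing about the underlying process has changed.
```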

The More Effective Implementation of SPC:

1)    A process is documented and used. As the process is used, variations in the interpretation of the documented process are qualitatively analyzed. Actions are taken to augment the process definition, training and orientation until the interpretation and the qualitative understanding of the process are consistent.

2)    Process compliance audits (PPQA audits) on the implementation of the process identify more actions that need to be implemented to fine-tune the definition, training and orientation related to the process.

3)    Once the audits show consistent compliance, data related to the process performance are collected. The integrity of the data is checked, and the data collection process is streamlined and consolidated until the collected data demonstrates the required credibility.

4)    Now we start looking at the data somewhat quantitatively (without using full SPC) – does the trend chart show stability? Is it showing too much dispersion/ variation? Based on the findings, the definition, training and orientation related to the process are refined further.

5)    This is the point where we start using SPC charts to confirm process stability. Each instance of instability is analyzed. Corrective and preventive actions are identified, based on the analysis of past instability, to further standardize the process. Once we are sure that the causes of those instabilities have been removed, we can remove the corresponding points from the analysis.

6)    We are still left with points which show instability, and our CAR analysis tells us that some of the causes are truly extremely rare events. These are then removed from the data pool. Now all the remaining points are a part of the process. If the process still shows instability, then we can do further analysis – are these really part of a single process? Beneath the surface, are there two or more processes, so that we need to separate out the data (e.g., the process may behave differently in the “performance appraisal season”? :-))? A quick sketch of such a stratification check follows this list.
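As a quick illustration of step 6), here is a minimal sketch of a stratification check. The numbers and the “appraisal season” flag are made up for illustration; any candidate stratifier (team, shift, product line) would be checked the same way.

```python
# Minimal stratification check: compare the center and spread of two suspected
# sub-populations hidden inside what looks like a single process.
from statistics import mean, stdev

# each record is (measured value, collected during appraisal season?) -- hypothetical
records = [
    (10.2, False), (10.8, False), (9.9, False), (10.5, False), (10.1, False),
    (14.5, True),  (15.2, True),  (13.9, True), (14.8, True),  (15.0, True),
]

for label, flag in (("regular months", False), ("appraisal season", True)):
    values = [v for v, in_season in records if in_season == flag]
    print(f"{label}: n={len(values)}, mean={mean(values):.2f}, stdev={stdev(values):.2f}")

# If the two strata have clearly different centers, we are really looking at two
# processes, and each should get its own baseline and control limits.
```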

Having followed all the above steps, we now have a basis (and hence baseline) for an effective implementation of SPC.

Remember: We cannot create a stable process through the use of SPC; we can only confirm the stability of the process through SPC.


I am Rajesh Naik. I am an author, management consultant and trainer, helping IT and other tech companies improve their processes and performance. I also specialize in CMMI® (DEV and SVC), People CMM® and Balanced Scorecard. I am a CMMI Institute certified/ authorized Instructor and Lead Appraiser for CMMI® and People CMM®. I am available on LinkedIn and I will be glad to accept your invite. For more information please click here. To get email alerts for new posts, click here to subscribe.

Size Does Matter! (for baselines and sub-process control) -Continued

Let us take the example of examination/ test centers that run an exam every day, throughout the year. The past one-year data shows that 30% of the candidates pass the exam and 70% fail it, all over India.

The Bangalore test center handles around 1000 candidates per month, whereas the Mysore center handles around 100 per month. Over the last year, both centers have shown the same 30:70 pass-to-fail ratio.

For the month of June 2010, one center has reported 38% pass and another has reported 29% pass. Which center (Bangalore or Mysore) is more likely (has a higher probability) to have reported 38%?

Well, Mysore is more likely to have the higher deviation from the average (+8%) than Bangalore (-1%), because Mysore, handling fewer candidates, has fewer opportunities to “average out”. An easy way to see this is to take the case of a center that handles only 1 candidate: that center can only report a pass percentage of 0% or 100%, i.e., a deviation of -30% or +70% from the average.
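If you want to check the intuition numerically, here is a rough simulation (not from the original post) that assumes every candidate independently passes with probability 0.30 and uses the two center sizes mentioned above.

```python
# Rough simulation of the brain-teaser: assume each candidate independently
# passes with probability 0.30 (the all-India rate).
import random

random.seed(42)
TRIALS = 5_000

def prob_pass_rate_at_least(n_candidates, threshold=0.38, p_pass=0.30):
    """Estimate P(monthly pass rate >= threshold) for a center of the given size."""
    hits = 0
    for _ in range(TRIALS):
        passes = sum(random.random() < p_pass for _ in range(n_candidates))
        if passes / n_candidates >= threshold:
            hits += 1
    return hits / TRIALS

print("P(pass rate >= 38%) with 1000 candidates (Bangalore):", prob_pass_rate_at_least(1000))
print("P(pass rate >= 38%) with  100 candidates (Mysore):   ", prob_pass_rate_at_least(100))
# The smaller center crosses 38% far more often: fewer candidates, less averaging out.
```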

Let us now get back to the process performance baselines that we create and the way we do sub-process control. Here are some things that we need to keep in mind while creating, publishing and using baselines:

1) The baseline (mean and standard deviation) for a sub-process parameter (like coding productivity) will be different depending on whether we consider the coding phase of each project as a data point, or each program coded in each project as a data point. The standard deviation in the first case (where each data point is built on a large base) is likely to be smaller than in the second case (small base).

2) When we publish performance baseline data, we need to qualify it with the level of detail at which it applies.

3) When we use the baseline data to do sub-process control, it needs to be applied to the same level of detail. So, to do sub-process control on program level coding productivity, we need to use the baseline that was created using programs as data points (not each project as a data point).

4) Baselines need to be created from base data drawn from similar situations. For example, we cannot combine the coding productivity on large programs with the productivity on small programs. Even if the average/ mean remains the same, the standard deviation will be higher when we take data from a smaller base than from a larger base.

The above points are not just “nits” but have an impact on the usefulness of baselines and sub-process control. Incorrect usage of baselines leads to incorrect indications of process stability/ instability.
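Here is a minimal sketch of point 1), using made-up project and productivity numbers, to show how the same raw data yields a different baseline depending on the level of detail at which the data points are defined.

```python
# Same raw data, two baselines: program-level data points vs. project-level data points.
from statistics import mean, stdev

# hypothetical program-level coding productivity (e.g., LOC/hour), grouped by project
projects = {
    "proj_A": [10, 14, 9, 13, 12, 11],
    "proj_B": [8, 15, 16, 7, 12, 14],
    "proj_C": [11, 10, 13, 12, 9, 15],
}

program_level = [p for programs in projects.values() for p in programs]
project_level = [mean(programs) for programs in projects.values()]

print("Program-level baseline: mean=%.2f stdev=%.2f (n=%d)"
      % (mean(program_level), stdev(program_level), len(program_level)))
print("Project-level baseline: mean=%.2f stdev=%.2f (n=%d)"
      % (mean(project_level), stdev(project_level), len(project_level)))
# The project-level stdev is smaller because each data point is already an average
# over several programs -- so project-level limits must not be used to control
# individual programs, and vice versa.
```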



Size Does Matter! (for baselines and sub-process control)

Here is a small brain-teaser.

Let us take the example of examination/ test centers that run an exam throughout the year, every day of the year. Analysis of the past one-year data shows that 30% of the candidates pass the exam and 70% fail it, all over India.

The Bangalore test center handles around 1000 candidates per month, whereas the Mysore center handles around 100 per month. Over the last year, both centers have shown the same 30:70 pass-to-fail ratio.

For the month of June 2010, one center has reported 38% pass and another has reported 29% pass. Which center (Bangalore or Mysore) is more likely (has a higher probability) to have reported 38%? Why do you think so?

See my post dated August 3, 2010 for the answer and implications.



Generating Lots of Data through Monte Carlo (a misuse?!?)

I have seen the metrics groups of organizations generating “enough” data for creating process performance baselines, from very few available data points, using Monte Carlo simulation.

Here is the method they use: Ten data points are available; using the pattern of the ten data points, they generate a thousand (or maybe a million) data points using Monte Carlo simulation. Now they feel that they have enough data points to generate a baseline.

But in reality the baseline is still based on only 10 data points. The 1000 simulated points merely give a feeling of having lots of data; this is clearly a misuse of Monte Carlo simulation.
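Here is a minimal sketch of the misuse, with made-up numbers and an assumed normal distribution: the simulated points simply echo the parameters estimated from the ten real points, so no new information is created.

```python
# Simulate 1,000 points from the 10 we actually have, then pretend the baseline
# rests on 1,000 observations. All numbers are made up for illustration.
import random
from statistics import mean, stdev

random.seed(1)
observed = [42, 55, 38, 61, 47, 50, 44, 58, 39, 52]   # the only real data (n=10)

mu, sigma = mean(observed), stdev(observed)
simulated = [random.gauss(mu, sigma) for _ in range(1000)]  # "lots of data"

print("Real data     : n=%4d  mean=%.1f  stdev=%.1f" % (len(observed), mu, sigma))
print("Simulated data: n=%4d  mean=%.1f  stdev=%.1f"
      % (len(simulated), mean(simulated), stdev(simulated)))
# The simulated points reproduce the mean and spread of the 10 real points (plus a
# distribution shape we assumed). The uncertainty in the baseline is still that of
# a 10-point sample; no new information has been created.
```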



Normal Distribution is Actually Rare

When we use statistical analysis tools and techniques, the underlying assumption is often that the process/ sub-process displays “normal” behavior. Even if the limited data that we have shows non-normal behavior, we assume that the reason is the lack of data, and we approximate the distribution to normal.

The analysis, conclusions and decisions built on this assumption can therefore be inaccurate, especially when we combine “assumed” normal behavior across multiple processes, as in Process Performance Modeling.

“Normal” behavior is very rare in real life. For example, suppose you travel from your home to the office usually in 1 hour, and the least time you have ever taken for the trip is 30 minutes. If the distribution were normal, the worst time should have been around 1 hour 30 minutes (symmetrical on both sides). Instead, you will find that on some days when you were delayed, the trip took 2 or even 3 hours!
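Here is a small sketch of the commute example, modeling travel times as lognormal; the lognormal choice and its parameters are assumptions picked only to mimic “bounded below, long tail above”, not a claim about actual commute data.

```python
# Commute times modeled as lognormal: bounded below, with a long upper tail.
import random
from statistics import mean, median

random.seed(7)
# parameters chosen so the typical (median) trip is about 60 minutes
trips = [random.lognormvariate(4.09, 0.25) for _ in range(10_000)]

print("median = %5.1f min" % median(trips))
print("mean   = %5.1f min" % mean(trips))
print("best   = %5.1f min" % min(trips))
print("worst  = %5.1f min" % max(trips))
# The worst trip is far further above the median than the best trip is below it:
# a long upper tail, unlike the symmetric picture a normal distribution would give.
```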

Another way of saying that real life does not behave in a “normal” way is: “there is a limit on how well you can do, but no limit on how badly you can screw up!”

There is more on this in the books “Fooled by Randomness” and “The Black Swan” by Nassim Nicholas Taleb, both must-reads for anyone involved in high maturity CMMI® implementation.


