Generating Lots of Data through Monte Carlo (a misuse?!?)

I have seen metrics groups in organizations use Monte Carlo simulation to generate “enough” data for process performance baselines when only a handful of real data points are available.

Here is the method they use: ten data points are available. Using the pattern of those ten data points, they generate a thousand (or maybe a million) data points with Monte Carlo simulation. Now they feel that they have enough data points to build a baseline.

But in reality the baseline still rests on 10 data points. The 1000 simulated points carry no information beyond what those 10 already contain; they only give a feeling of having lots of data. This is clearly a misuse of Monte Carlo simulation.
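
To make this concrete, here is a minimal sketch in Python. The ten values are invented for illustration, and I am assuming the "pattern" is captured by fitting a normal distribution to them (real metrics groups may fit something else). The interval half-width uses a simple normal approximation (1.96 standard errors), which is itself only a rough choice for a sample of ten.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Ten hypothetical observations (illustrative values, not real project data)
observed = [12.1, 9.8, 11.4, 10.7, 13.2, 8.9, 10.1, 12.6, 11.0, 9.5]

mean_10 = statistics.mean(observed)
sd_10 = statistics.stdev(observed)

# "Generate" 1000 points from a normal distribution fitted to the ten points
# (assumption: the pattern of the data is modelled as normal)
simulated = [random.gauss(mean_10, sd_10) for _ in range(1000)]

mean_1000 = statistics.mean(simulated)
sd_1000 = statistics.stdev(simulated)


def ci_half_width(sd, n):
    """Approximate 95% half-width for the mean (normal approximation)."""
    return 1.96 * sd / n ** 0.5


hw_10 = ci_half_width(sd_10, len(observed))
hw_1000 = ci_half_width(sd_1000, len(simulated))

print(f"10 real points:       mean={mean_10:.2f} sd={sd_10:.2f} +/-{hw_10:.2f}")
print(f"1000 simulated points: mean={mean_1000:.2f} sd={sd_1000:.2f} +/-{hw_1000:.2f}")
```

The mean and standard deviation of the simulated set simply echo those of the ten real points, because that is what the simulation was fed. The interval around the mean shrinks by roughly a factor of ten, but only because the formula is told that n is 1000; no new observation of the real process has been added.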

I am Rajesh Naik. I am an author, management consultant and trainer, helping IT and other tech companies improve their processes and performance. I also specialize in CMMI® (DEV and SVC), People CMM® and Balanced Scorecard. I am a CMMI Institute certified/authorized Instructor and Lead Appraiser for CMMI® and People CMM®. I am available on LinkedIn and I will be glad to accept your invite.

One thought on “Generating Lots of Data through Monte Carlo (a misuse?!?)”

  1. Agreed. Statistical descriptors (mean, median, standard deviation, etc.) for the 1000 data points will be more or less the same as those for the 10 data points. The apparent confidence level may increase because of the 100-fold increase in data points, but it is a misplaced sense of confidence! The remaining 990 data points are not real data points at all!
