# Probability/ Stats Puzzles – 2 & 3 (Solutions)

If you’ve not seen/ attempted the puzzles, the links are here: puzzle-2 and puzzle-3. These were presented in earlier posts.

Both these puzzles are adapted from a delightful little book by John Allen Paulos titled A Mathematician Reads the Newspaper.

I will provide more details about the book next week. For now, here are the solutions to the two puzzles.

## Puzzle-2

You need to call the throw of a die 1000 times. Like any die, on each throw this die gives you a number between 1 and 6. You are also told that the die is slightly distorted / damaged – the probabilities of the six results are as follows: 1: 20%; 2: 10%; 3: 25%; 4: 15%; 5: 15%; 6: 15%.

What strategy would you use to call the answers for the 1000 throws? Your objective is to get the right answer for as many of the throws as possible.

Solution:

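Call 3, 3, 3, 3… all 1000 times. Since 3 is the most likely result on every single throw (25%), this will get you approximately 250 right calls, which is more than any other way of calling can be expected to deliver.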

Or better still, tell the dice roller that your call is 3 all the thousand times, go for a coffee, or do something useful, come back after some time.
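To see why calling 3 every time beats the other popular answers, here is a minimal sketch that works out the expected number of right calls for three strategies: always calling 3, calling each number as often as the die shows it, and calling uniformly at random. The function and variable names are mine, and the sketch assumes that each call is made independently of the throw it is trying to predict.

```python
from fractions import Fraction

# Probabilities stated in the puzzle for faces 1..6.
p = {1: Fraction(20, 100), 2: Fraction(10, 100), 3: Fraction(25, 100),
     4: Fraction(15, 100), 5: Fraction(15, 100), 6: Fraction(15, 100)}
THROWS = 1000

def expected_hits(call_probs):
    """Expected right calls when face f is called with probability
    call_probs[f], independently of the throw (an assumption)."""
    return THROWS * sum(call_probs[f] * p[f] for f in p)

always_three = {f: Fraction(int(f == 3)) for f in p}
match_the_die = dict(p)                     # call each face as often as it appears
uniform = {f: Fraction(1, 6) for f in p}    # ignore the distortion

print(float(expected_hits(always_three)))   # 250.0
print(float(expected_hits(match_the_die)))  # 180.0
print(float(expected_hits(uniform)))        # about 166.67
```

Matching the pattern of the die only gets about 180 right calls, because a call of, say, 2 is right only 10% of the time no matter how often you make it.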

## Puzzle-3

Two contestants are to decide on the winner of 10 mn by flipping a coin. The winner will be the one who reaches six (6) correct calls first.

After 8 flips, contestant A has 5 correct calls, and contestant B has 3 correct calls. At this stage they agree NOT to continue with the flipping of the coin. Here are some proposals on how the money should be shared:

1. Contestant A says that since he is leading, he should get the 10mn.
2. Contestant B says that since the flipping was called off before the final result, the 10mn should be shared equally.
3. The show-host says that TV quiz program sponsors should retain the 10mn, since both the contestants agreed to call off the contest.
4. Someone from the audience suggests that the prize money be split in the 5:3 ratio (5 for A and 3 for B), in line with the number of right calls
5. A mathematician calls in to suggest that the money be split A7:B1 (try and guess the logic here, it is related to the probability of winning from this point, if the flipping had continued)

Solution:

The question on how the money is to be shared is not a mathematical /statistical problem at all! It is a matter of fairness and justice, and each solution proposed (and some yet to be proposed) has its own merit.

However, if you have not yet worked out the logic of why the mathematician proposed option # 5 above, here it is:

For contestant B to win, he/ she needs to reach 6 right calls before A does, which means calling ALL of the next three flips correctly (if he/ she calls even one incorrectly, A will reach 6 right calls). So the probability of B winning is 0.5 x 0.5 x 0.5 = 0.125, which means A has a probability of 0.875 – that is 7:1.
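The same 7:1 figure can be reached by working backwards from every possible score, which also handles other interrupted scores. The sketch below is only an illustration: the function name is mine, and it assumes (as the 5 + 3 = 8 tally suggests) that every flip adds one right call to exactly one contestant, each with probability 1/2.

```python
from fractions import Fraction
from functools import lru_cache

TARGET = 6  # first to six right calls wins

@lru_cache(maxsize=None)
def p_a_wins(a, b):
    """Probability that A reaches TARGET before B, assuming each further
    flip is credited to A or to B with probability 1/2."""
    if a == TARGET:
        return Fraction(1)
    if b == TARGET:
        return Fraction(0)
    return Fraction(1, 2) * (p_a_wins(a + 1, b) + p_a_wins(a, b + 1))

pa = p_a_wins(5, 3)
print(pa, 1 - pa)               # 7/8 1/8
print(10 * pa, 10 * (1 - pa))   # 35/4 5/4, i.e. 8.75 mn and 1.25 mn
```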

Next week, I will cover the source of these puzzles, a book titled A Mathematician Reads the Newspaper by John Allen Paulos.

You may also forward the link to this post to your friends, colleagues, and anyone else who may be interested.

Notes:

Nothing Official About It! – The views presented above are in no manner reflective of the official views of any organization, community, group, institute, country, government, or association. They may not even be the official views of the author of this post :-).

# Probability/ Stats Puzzle – 3

I encountered another problem in the same book (I will disclose the name of the book in a later post along with the answer). Here is the problem:

Two contestants have reached the last round of a TV quiz contest and one of them is hoping to be the winner of a prize of 10 mn (currency deliberately left vague) via a tie-breaker. Even after the tie-breaker, neither of them has beaten the other.

The show-host offers to break the tie with a coin (my guess is that the show host did not have any more questions left :-)). However, to maintain the suspense and gain more TRP, he proposes that the winner will be one who reaches six (6) correct calls first.

After 8 flips, contestant A has 5 correct calls, and contestant B has 3 correct calls. At this stage both the contestants agree NOT to continue with the flipping of the coin (maybe the coin is lost or it breaks or falls into something disgusting – use your imagination). They have to decide on the winner based on result of the 8 flips.

Here are some proposals:

1. Contestant A says that since he is leading, he should get the 10mn.
2. Contestant B says that since the flipping was called off before the final result, the 10mn should be shared equally.
3. The show-host says that TV quiz program sponsors should retain the 10mn, since both the contestants agreed to call off the contest.
4. Someone from the audience suggests that the prize money be split in the 5:3 ratio (5 for A and 3 for B), in line with the number of right calls
5. A mathematician calls in to suggest that the money be split A7:B1 (try and guess the logic here, it is related to the probability of winning from this point, if the flipping had continued)
6. Any other…

It is interesting to note so many options to a simple situation.

# Probability/ Stats Puzzle – 2

I encountered this simple problem in a book (I will disclose the name of the book in a later post along with the answer).

Here is the problem:

You need to call the throw of a die 1000 times. Like any die, on each throw this die gives you a number between 1 and 6. You are also told that the die is slightly distorted / damaged – the probabilities of the six results are as follows: 1: 20%; 2: 10%; 3: 25%; 4: 15%; 5: 15%; 6: 15%.

What strategy would you use to call the answers for the 1000 throws? Your objective is to get the right answer for as many of the throws as possible.

Here are some answers that I have heard:

1. Call the number ‘3’ all the 1000 times – this is the most common answer I have heard.
2. Call the numbers in the same pattern as the probability: 1- 200 times; 2- 100 times; 3- 250 times; 4-150 times; 5-150 times; 6-150 times.
3. Call the numbers randomly, ignoring the distortion in the die.
4. A variation of 2 above is to call the numbers in the same pattern, but also taking into account the results of the past throws, so that we try and keep the proportions similar to the expected pattern. So if in the first 100 throws, 1 has already been rolled more than 20% of the time and 2 has been rolled less than 10% of the time, then on the 101st throw, call 2 instead of 1, and so on.
5. There are other possible answers too – and the right one may not be listed above (this is not a multiple choice question 🙂 )

Work out the reasons for your choice; do not just make a choice. The reasons are more important.

This is a simple question, and you should get the right answer.

The answer will be posted later.

# Probability/ Stats Puzzle – 1 (Solution)

If you have not tried to solve the puzzle, click here for the problem. The problem was discussed in an earlier post.

This is a famous puzzle, called the “Monty Hall Problem”. Monty Hall was a host in the early episodes of the game show Let’s Make a Deal.

The common version of the puzzle uses three doors (instead of 3 boxes) and a car and two goats (instead of gold and garbage).

The problem was originally posed by Steve Selvin and became famous when it was quoted by Marilyn vos Savant in Parade magazine in 1990.

The answer: You increase the probability of winning the gold if you change your choice of the box to open. The probability of winning the gold is only 1/3 if you continue with your original choice and 2/3 if you change your choice.

Here is a brief explanation of why:

When you initially selected a box, you had a 1/3 probability of being right. The host knowingly opened a box with garbage in it, so that eliminated one of the wrong choices.  You still have a 1/3 probability that you initially chose the right box; this means that the other unopened box has a 2/3 probability of containing the gold.
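If the argument still feels slippery, every case can be enumerated exactly. The sketch below is mine, not part of the original puzzle: it assumes the gold is equally likely to be in any box, that you start with B1, and that the host always opens a garbage box other than yours (choosing at random when two are available).

```python
from fractions import Fraction

BOXES = ("B1", "B2", "B3")

def win_probability(switch):
    """Exact chance of taking home the gold when the first pick is B1 and
    the host always opens a garbage box other than the pick."""
    pick = "B1"
    total = Fraction(0)
    for gold in BOXES:                            # gold equally likely in each box
        openable = [b for b in BOXES if b != pick and b != gold]
        for opened in openable:                   # host chooses at random if he can
            weight = Fraction(1, 3) * Fraction(1, len(openable))
            final = pick
            if switch:
                final = next(b for b in BOXES if b not in (pick, opened))
            if final == gold:
                total += weight
    return total

print(win_probability(switch=False))  # 1/3
print(win_probability(switch=True))   # 2/3
```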

Amit Bhattacharjee, Satish K Mariyappagoudar, and Patrick OToole got it right.

Or you can watch the YouTube video.

You can also search the internet for the keywords “Monty Hall Problem” – you will get lots of hits.


# Probability/ Stats Puzzle – 1

This problem was presented to me by Swapna (my wife) last Friday – I could not work out the right answer even after considerable struggle.

You are participating in a TV show contest. You have reached the last round. If you win this round, you get to take home a pure gold brick of 5 kg (5 kg = 11.02 lbs); if you lose, you have to take away an equivalent quantity of stinking garbage.

Here is the problem in the last round:

There are 3 closed boxes (let us say B1, B2, B3). Inside two of the boxes is garbage. Inside one of the boxes is the gold. You have to open one box and take home whatever is in that box. You decide to open B1. The show-host/ quiz-master asks you to stop, and as a hint opens one of the other two boxes, and inside that box there is garbage. The show-host gives you the option of changing your choice. Would you still go for your original choice or switch to the other unopened box?

Here are some relevant assumptions/ hints/ guidances:

1. Most important: you would prefer to take home the gold instead of the garbage :-).
2. You will not be able to smell the garbage or gold without opening the boxes, or in any way be able to “know” what is inside the unopened boxes.
3. You do not know the show host’s motivation. The show-host may be trying to help you or trick you, or trying to increase hir (his/her) popularity rating, or just following a script. So, do not consider the show host’s motivation in trying to solve the problem (when Swapna presented me the problem, I went on the motivation track, and could not approach it as a problem of probability, even after she told me to ignore the show host’s motivation 🙁 ).
4. There is no “trick” in the problem or the solution – so, approach it as a problem of probability/ statistics.
5. Do not be lazy and search the internet to find a solution. That is cheating. I have changed some things in the problem so that it is not easy to search. In any case, this is not a test of how quickly and ingeniously you can search the internet.
6. You will have to work out the reasons for your choice, not just make a choice. The reasons are more important.

Don’t feel bad if you don’t get the answer right; many renowned statisticians have got it wrong.

The answer is available in another post here.

# HMBP Conference 2013 on 24th Sept in Pune and 27th in Bangalore, India

The HMBP Conference (an annual event in its 4th year), showcases what leading organizations are doing and are planning to do in their implementation of high maturity practices.

This year, the HMBP Conference theme is “High Maturity Impacts: Interweaving Services and People” and it is also being held in two cities.

• at a new venue – Pune on September 24th, 2013
• in its regular den – Bangalore on September 27th, 2013

Since nobody could figure out the difference between a “Colloquium” and a “Conference”, I believe the organizers decided to stick to the more traditional word. 🙂

# HMBP (High Maturity Best Practices) Colloquium – 2012 proceedings are now available for download

The HMBP Colloquium (an annual event in its 3rd year), was held in Bangalore on September 7, 2012. This year’s theme was “High Maturity – Beyond Statistics & Quantification”.

Full details of the colloquium are available here.

The presentations of the day are put up on a page at the site here. To get the presentations, you will need a login id and password – which you can get by writing to conferences@qaiglobal.com .

# How Mature is High Maturity Implementation of CMMI®? – Find out at the HMBP Colloquium on 7th September, 2012 in Bangalore, India

Over the last few years the implementation of high maturity practices has evolved significantly. Some of the evolution has been driven by changes to the model, but most of the change is driven by sophisticated interpretation by LAs, and through the HMLA qualification/ certification program.

To “having a few control charts”, we have added “tests of hypotheses”, “regression equations”, “Monte Carlo Simulations”, “Comprehensive PPMs” to the pot churning out HM practices.

The HMBP Colloquium (an annual event in its 3rd year), showcases what leading organizations are doing and are planning to do. This year’s HMBP Colloquium is in Bangalore on September 7, 2012. It is aptly titled “High Maturity – Beyond Statistics & Quantification”.

I am looking forward to understanding the emerging direction. And meeting all of you.

I also hope to figure out the difference between a “Colloquium” and a “Conference”. 🙂

This Colloquium was held on Sept 7, 2012 at Le Meridian, Bangalore.

# What comes first – SPC or a stable process?

An interesting topic, which has been discussed very often. In every discussion, people agree on what is right and what needs to be implemented. But in actual implementation the principles are forgotten. Therefore it is good to re-align ourselves to the basics time and again.

What is often seen in actual implementation of SPC (ineffective and incorrect implementation):

1)    A process is documented and used

2)    Data related to the process is collected

3)    When we need to do sub-process control (because we are aiming for High Maturity rating), an SPC chart is prepared.

4)    Data which are outliers are thrown out (root cause analysis is not possible, because the outlier data belongs to a distant past, and the causes are lost in the mists of time)

5)    Control limits are recalculated

6)    Steps 4) and 5) are repeated till all (remaining) points demonstrate process stability

7)    The SPC parameters (center line, UCL/ UNPL, LCL/ LNPL) are declared as baselines and used for sub-process control. The fact that the limits are too wide or that a lot of data points were thrown out (without changing anything in the process) is ignored.

What we have in the above scenario is a maturity level 2/ 3 organization using maturity level 4 tools. Usage of tools alone does not increase maturity. We cannot create a stable process through the use of SPC; we can only confirm the stability of the process through SPC and get signals when the process is out of control or shows changes in trends.
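The trim-and-recalculate loop in the ineffective steps above can be sketched in a few lines, which makes it easy to see how "stability" gets manufactured. This is only an illustration under assumptions of my own: it uses an individuals (XmR) chart, where the natural process limits are the mean plus or minus 2.66 times the average moving range, and the data values are made up.

```python
from statistics import mean

def xmr_limits(values):
    """Return (LNPL, center line, UNPL) for an individuals (XmR) chart."""
    center = mean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    spread = 2.66 * mean(moving_ranges)   # 2.66 = 3 / d2, with d2 = 1.128
    return center - spread, center, center + spread

def trim_until_stable(values):
    """The anti-pattern: drop points outside the limits and recalculate
    until nothing is left outside. Returns (kept_points, number_dropped)."""
    kept = list(values)
    while len(kept) > 2:
        lnpl, _, unpl = xmr_limits(kept)
        inside = [v for v in kept if lnpl <= v <= unpl]
        if len(inside) == len(kept):
            break
        kept = inside
    return kept, len(values) - len(kept)

# Hypothetical effort data in hours; not taken from any real project.
data = [12, 14, 13, 15, 40, 13, 12, 14, 16, 13, 35, 14, 12, 15, 13]
kept, dropped = trim_until_stable(data)
print(xmr_limits(data))
print(xmr_limits(kept), dropped)
```

With this made-up data, the value 35 sits inside the first set of limits and only becomes an "outlier" after 40 has been thrown out and the limits recalculated. Nothing in the process changed between the two passes; only the arithmetic did.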

The More Effective Implementation of SPC:

1)    A process is documented and used. As the process is used, variations in the interpretation of the documented process are qualitatively analyzed. Actions are taken to augment the process definition, training and orientation till the interpretation and the qualitative understanding of the process is consistent.

2)    Process compliance audits (PPQA audits) on the implementation of the process identify more actions that need to be implemented to fine-tune the definition, training and orientation related to the process.

3)    Once the audits show consistent compliance, data related to the process performance are collected. Integrity of the data is checked and the data collection process is streamlined and consolidated, till the collected data demonstrates the required credibility.

4)    Now we start looking at the data somewhat quantitatively (without using full SPC) – does the trend chart show stability? Is it showing too much dispersion/ variation? Based on the findings, the definition, training and orientation related to the process are refined further.

5)    This is the point at which we start using SPC charts to confirm process stability. Each inflection of instability is analyzed. Corrective and preventive actions are identified to further standardize the process, based on analysis of past instability. Once we are sure that the causes of those inflections are removed, we can remove the points from the analysis.

6)    We are still left with points which show instability, and our CAR analysis tells us that some of the causes are truly extremely rare events. These are then removed from the data pool. Now all the remaining points are a part of the process. If the process still shows instability, then we can do further analysis – are these really part of a single process? Beneath the surface, are there two or more processes, and we need to separate out the data (e.g., the process may behave differently in the “performance appraisal season”? :-))

Having followed all the above steps, we now have a basis (and hence baseline) for an effective implementation of SPC.

Remember: We cannot create a stable process through the use of SPC; we can only confirm the stability of the process through SPC.

# Size Does Matter! (for baselines and sub-process control) -Continued

Let us take the example of examination/ test centers that run an exam every day, throughout the year. Data for the past year shows that, all over India, 30% of the candidates pass the exam and 70% fail.

The Bangalore test center handles around 1000 candidates per month, whereas the Mysore center handles around 100 per month. Over the last one year, both centers have shown the same 30 pass: 70 fail ratio.

For the month of June 2010, one center has reported 38% pass and another has reported 29% pass. Which center (Bangalore or Mysore) is more likely (has a higher probability) to have reported 38%?

Well, Mysore is more likely to have the higher deviation from the average (+8%) than Bangalore (-1%), because Mysore, handling fewer candidates, has fewer opportunities to “average out”. An easy way to figure this out is to take the case of a center that handles only 1 candidate. This center can have either a 0% or a 100% pass percentage: a -30% or +70% deviation from the average.
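For those who would like to put numbers on this, here is a rough sketch that treats each candidate as an independent pass/ fail trial with a 30% chance of passing (that independence is my assumption; the post only gives the long-run ratio). It computes the standard deviation of a month's pass fraction and the exact binomial chance of a month with 38% or more passes, for 100 and for 1000 candidates.

```python
from fractions import Fraction
from math import comb, sqrt

P_PASS = Fraction(3, 10)  # long-run pass rate from the example

def prob_at_least(n, k, p=P_PASS):
    """Exact binomial probability of k or more passes among n candidates."""
    total = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))
    return float(total)

def std_error(n, p=P_PASS):
    """Standard deviation of the monthly pass fraction for n candidates."""
    return sqrt(float(p * (1 - p)) / n)

for name, n in (("Mysore", 100), ("Bangalore", 1000)):
    k = round(0.38 * n)  # passes needed for a 38% month
    print(name, round(std_error(n), 4), prob_at_least(n, k))
```

The standard deviation of the pass fraction is about 4.6 percentage points for Mysore and about 1.4 for Bangalore, so an 8-point excursion is unremarkable for the small center and extremely unlikely for the large one.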

Let us now get back to the process performance baselines that we create and the way we do sub-process control. Here are some things that we need to keep in mind while creating, publishing and using baselines:

1) The baseline (mean and standard deviation) for a sub-process parameter (like coding productivity) will be different depending on whether we consider the coding phase of each project as a data point, or we consider each program coded in each project as a data point. The standard deviation in the first case (large base) is likely to be smaller than in the second case (small base).

2) When we publish performance baseline data, we need to qualify it with the level of detail at which it applies.

3) When we use the baseline data to do sub-process control, it needs to be applied to the same level of detail. So, to do sub-process control on program level coding productivity, we need to use the baseline that was created using programs as data points (not each project as a data point).

4) Baselines need to be created using similar situations of the base data. For example, we cannot combine the coding productivity on large programs with the productivity on small programs. Even if the average/ mean remains the same, the standard deviation will be higher when we take data from a smaller base as against a larger base.

The above points are not just “nits” but have an impact on the usefulness of baselines and sub-process control. Incorrect usage of baselines leads to incorrect displays of process instability / stability.
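The difference between a project-level and a program-level baseline is easy to demonstrate with simulated numbers. The sketch below uses made-up data (30 projects of 40 programs each, with program productivity drawn from a normal distribution with mean 20 and standard deviation 5); none of these figures come from a real baseline.

```python
import random
from statistics import mean, pstdev

random.seed(7)  # fixed seed so the sketch is repeatable

# Hypothetical data: 30 projects, 40 programs each, productivity ~ N(20, 5).
projects = [[random.gauss(20, 5) for _ in range(40)] for _ in range(30)]

program_points = [x for proj in projects for x in proj]  # each program is a data point
project_points = [mean(proj) for proj in projects]       # each project is a data point

print(round(mean(program_points), 2), round(pstdev(program_points), 2))
print(round(mean(project_points), 2), round(pstdev(project_points), 2))
```

The two means agree, but the standard deviation of the project-level points is far smaller. Control limits built from the project-level baseline would therefore flag perfectly ordinary programs as out of control.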