Repeatability and Reproducibility in a Noisy World

Sam Taylor
Feb 6, 2021


Cedric Chin at Commonplace recently published a very interesting article on how to make better predictions. In it he breaks down a recent paper arguing that the best way to improve prediction accuracy is not by reducing cognitive biases, but by reducing “noise”.

But what is noise? I will quote Cedric at length to allow him to explain.

A more interesting example is to take, say, a loan officer at a bank, receiving an application in their inbox. In theory, the same application submitted to the same officer on Monday and on Wednesday should result in the same decision (“Hi! Mr Yang, I’m pleased to say that we’ve approved your loan of $50k at 3.7% per annum …”). Similarly, the same application submitted to different officers of the same bank should in theory result in the same decision: an approval, or a rejection, and on the same terms.

In practice, when you go looking, there turns out to be fair amount of variability in the outcome. Perhaps the loan officer broke up with his girlfriend on Wednesday, and zipped through his inbox in anguish. Perhaps the officers in bank branch A are more paranoid than the officers in bank branch B. Kahneman and his co-authors argued that variability can be measured relatively easily in any decision making organisation; more importantly, this variability may result in millions of dollars of invisible costs in higher risk domains like loan underwriting or investment sizing. Such random variability in decision making outcomes is also called ‘noise’.

So one definition of noise is getting wildly different outputs from the same inputs! This makes reliable prediction hard: given the same information, two people can give different answers. Even more worryingly, the same person can give a different answer depending on the mood they are in.

Cutting out this noise, by building systems to ensure that when the inputs are the same the outputs are the same, appears to be critical to good predictions.

Manufacturing’s Answer to Noise

This idea made me remember a practice common in manufacturing plants — Gauge Repeatability and Reproducibility (GR&R) studies.

The purpose of a GR&R study is to see how “noisy” a measuring process is, and whether that “noise” is loud enough to make your process useless.

Let's have a look at an example GR&R study to help visualise this:

Imagine you have a factory producing cereal. You are the Quality Manager and you have your workers weigh every box on a scale at the end of the line. If the box is too light you know you haven’t added enough cereal; if it is too heavy you know you have put in too much.

One day you are in a meeting with your Quality Engineer and she says there are lots of failures for weight. She is confused, as she calibrated the scale last week and there are no issues with the cereal dispenser. You say it is time for a GR&R study!

  1. You first ask your engineer to get 10 random boxes of cereal from the production line, and then call 3 workers to the scale: Amanda (A), Beryl (B), and Connor (C).
  2. You then have each operator weigh every box in a random order, and have the engineer record the results.
  3. You then repeat step 2, with the order of the box weighing randomised again.
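The three steps above can be sketched as a randomised weighing schedule. This is only an illustration: the function name `weighing_schedule`, the box labels 1 to 10 and the fixed seed are assumptions made for the example, not part of any standard GR&R tool.

```python
import random

def weighing_schedule(boxes, operators, trials=2, seed=None):
    """Return a list of (trial, operator, box) tuples.

    Within each trial, every operator weighs every box once,
    in an order that is shuffled independently each time.
    """
    rng = random.Random(seed)  # seed only so the example is repeatable
    schedule = []
    for trial in range(1, trials + 1):
        for operator in operators:
            order = list(boxes)
            rng.shuffle(order)  # fresh random order for this operator and trial
            schedule.extend((trial, operator, box) for box in order)
    return schedule

# Hypothetical labels: boxes 1..10 and operators A, B, C from the story above.
schedule = weighing_schedule(range(1, 11), ["A", "B", "C"], trials=2, seed=42)
for trial, operator, box in schedule[:5]:
    print(f"trial {trial}: operator {operator} weighs box {box}")
```

Shuffling the order separately for each operator and each trial makes it harder for anyone to remember what a particular box weighed last time.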

So at the end of this process each operator has weighed each box twice. Now we analyse the results for 3 elements:

  1. Reproducibility — Do all the different workers get similar weights when weighing the same cereal box?
  2. Repeatability — Does an individual worker get similar weights when weighing the same cereal box more than once?
  3. Part to Part Variation — How much variation in weight is there between the boxes themselves?
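One way to put numbers on these three elements is a two-way analysis of variance on the recorded weights. The sketch below is a simplified version of that idea, not necessarily what a given factory or statistics package does. It assumes every operator weighs every box the same number of times, and it counts the operator-by-box interaction as part of reproducibility, which is one common convention. The function name and the sample weights are made up for illustration.

```python
from statistics import mean

def grr_variance_components(weights):
    """Estimate variance components from a crossed study.

    weights[box][operator] is the list of repeated readings that
    one operator took of one box (same count in every cell, >= 2).
    """
    p, o, r = len(weights), len(weights[0]), len(weights[0][0])
    grand = mean(x for box in weights for cell in box for x in cell)
    box_means = [mean(x for cell in box for x in cell) for box in weights]
    op_means = [mean(x for box in weights for x in box[j]) for j in range(o)]
    cell_means = [[mean(cell) for cell in box] for box in weights]

    # Mean squares for boxes, operators, their interaction, and pure error.
    ms_box = o * r * sum((m - grand) ** 2 for m in box_means) / (p - 1)
    ms_op = p * r * sum((m - grand) ** 2 for m in op_means) / (o - 1)
    ms_int = r * sum(
        (cell_means[i][j] - box_means[i] - op_means[j] + grand) ** 2
        for i in range(p) for j in range(o)
    ) / ((p - 1) * (o - 1))
    ms_err = sum(
        (x - cell_means[i][j]) ** 2
        for i in range(p) for j in range(o) for x in weights[i][j]
    ) / (p * o * (r - 1))

    # Negative estimates are clipped to zero.
    var_int = max(0.0, (ms_int - ms_err) / r)
    var_op = max(0.0, (ms_op - ms_int) / (p * r))
    return {
        "repeatability": ms_err,              # same operator, same box
        "reproducibility": var_op + var_int,  # operator to operator
        "part_to_part": max(0.0, (ms_box - ms_int) / (o * r)),
    }

# Made-up weights in grams: 3 boxes x 2 operators x 2 readings each.
sample = [
    [[500.1, 500.3], [500.6, 500.4]],
    [[503.0, 502.8], [503.5, 503.3]],
    [[497.9, 498.2], [498.4, 498.6]],
]
print(grr_variance_components(sample))
```

Each value is a variance, so the three can be compared directly: the bigger the first two are relative to the third, the noisier the measuring process.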

So what do we want to see?

  1. Operators get similar weights when they weigh the same box for a second time.
  2. Different operators get similar weights to each other when they weigh the same box.
  3. Any variation within or between operators is MUCH smaller than the natural variation in the weight of the boxes.

So why go deep on GR&R studies?

Because of the simple fact that a study like this has to exist. A factory is one of the most controlled of environments, and putting a box of cereal on a scale is one of the easiest processes one can imagine. Even in this situation we often find that the same person gets different readings when weighing the same box of cereal!

There are no cognitive biases creeping in to change what reading the scale gives us. The scale doesn’t have political views that change the readings when it doesn’t like the opinions of the cereal box.

Yet still noise creeps in.

And if you have controlled the cereal box filling machine better than your scale, you may find that the variation from workers is greater than the variation in your parts. This would mean your scale is useless for distinguishing one part from another!
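That comparison can be written as a small check. The sketch below assumes you already have variance estimates for repeatability, reproducibility and part-to-part variation; the function name `gauge_verdict` and the numbers passed to it are hypothetical.

```python
def gauge_verdict(var_repeatability, var_reproducibility, var_part):
    """Compare measurement noise with real box-to-box variation."""
    var_gauge = var_repeatability + var_reproducibility
    total = var_gauge + var_part
    share = var_gauge / total if total > 0 else 0.0
    return {
        "gauge_share": share,                    # fraction of total variance that is noise
        "gauge_dominates": var_gauge > var_part, # the "useless scale" case
    }

# Hypothetical variances (grams squared) for a well-controlled filling machine.
print(gauge_verdict(var_repeatability=0.8, var_reproducibility=0.6, var_part=0.5))
```

When `gauge_dominates` is true, most of what the scale reports is measurement noise rather than real differences between boxes.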

Now imagine this applied to a more complicated problem, say the loan application that Cedric mentioned at the start of this post. The world is much more complicated than a factory. It makes you think about how important a factor noise is in so many of our judgements, and how important it is to understand and control its impact on our decision making and predictions.
