“Trust but verify”: a data analytic cautionary tale

Posted:  April 3, 2013

It’s a phenomenon everyone in the research business faces now and then:  you see something surprising and unexpected in your data, something not particularly positive for your client.  You say to yourself, “Oh no, the clients are going to be so disappointed.  This is going to come out of the blue for them.  They aren’t going to be happy.  They’re going to have a hard time believing it.  I’m not looking forward to presenting this.”

So what do we do?  We wait.

Then there’s the opposite kind of finding in your data.  You say to yourself, “That’s it—that’s a huge finding.  Who would have thought?  I can’t wait for our clients to see this.  They’re going to love it.  It will change the way they act on this opportunity.” 

So what do we do?  We wait.  Much as we don’t want to, we wait.

And why do we wait?  It’s not because we don’t trust our data.  I was taught a long time ago, as a graduate student (and that was a long time ago), to trust your data.  If you’ve followed the principles and best practices on sampling and questionnaire design, the data don’t lie.*

So we trust our data, but we don’t trust them blindly.  We “trust but verify,” meaning we test every possible reason why and how our data could be wrong before we’re comfortable reporting that surprising, unexpected finding.  We never want to fall into the trap the Russian army recruiter fell into in a parable related by John Allen Paulos in his book, “Once Upon a Number:  The Hidden Mathematical Logic of Stories.”

“A recruiter in the Tsar’s army was riding through a small town and noticed dozens of chalked circular targets on the side of a barn, each with a bullet hole through the bull’s eye.  The recruiter was impressed and asked a neighbor who this perfect shooter might be.  The neighbor responded, ‘Oh that’s Shepsel, the shoemaker’s son.  He’s a little peculiar.’  The enthusiastic recruiter was undeterred until the neighbor added, ‘You see, first Shepsel shoots and then he draws the chalk circles around the bullet hole.’”


It’s the perfect cautionary tale for anyone in our business:  don’t take that surprising, unexpected finding at face value—trust your data, but verify that trust by testing every alternative explanation for it. 

*Every time I think of this phrase, “data don’t lie,” I’m reminded of the line attributed to NBA player Rasheed Wallace (now with the Knicks, formerly with the Celtics, and a long-time Piston):  “Ball don’t lie.”  Whenever ’Sheed felt a foul called on him was wrong (meaning, if you know Rasheed Wallace, every foul ever called on him) and the foul shooter subsequently missed the free throw, ’Sheed would yell to the ref, “Ball don’t lie!”  The foul call was incorrect; the missed free throw verified it.