The shadow war between stories and spreadsheets
Your brain has an invisible army fighting against data that challenges its beliefs. A data pioneer explains why facts alone can't win this battle.
This is the third in a series in which data-industry leaders answer a question I’ve often wondered about: Is there widespread resistance among business people to the use of data in decision making? If so, why, and what can be done about it?
Now comes Scott Davis for a philosophical view. After my comments comes Mark Madsen’s response to Scott.
Scott Davis
Scott Davis is tough to categorize. He describes himself as a hybrid of “corporate operator and mechanical adventurer” and “a kind of polymath who can as easily crawl under a diesel engine as design a cloud platform.”
I’ve known him since 2008 when he launched his pioneering analytics tool, Lyza. Unlike every other tool of that era, it dared to inject social trust and conversation as factors for judging data quality. His theory was that data trusted by trusted people would be more valuable than data of unknown origins.
He’s also the author of a recent book, Surf the Seesaw: Unconventional Essays on Balance, Beauty and Meaning in Life. I reviewed it here. In short, I recommend it.
I’ve come to know Scott as one of the most interesting and insightful iconoclasts I've encountered in the technology industry.
Scott’s thoughts on the sources of resistance to the use of data in business
There are many. I’ll give one example for each of two very different types:
A bad reason to resist data….
1. Myths have mass. Before data arrive on the scene of a discussion, it’s not like there’s a void in everyone’s head. There are anecdotes, stories, prejudices, memories, biases, values, preferences, etc. Together, these constitute the myths we tell ourselves as we try to disambiguate an ambiguous world. These are the pre-data factors in our sense-making. And they affect our emotional receptivity to data — not just whether we agree with the data, but WHICH data we seek and even, in many cases, which specific pieces of the data in front of us we even SEE. Our myths have mass, and that mass gives our preconceived stories an inertia against which data must push very hard to produce any movement. Becoming an impartial strategic juror who considers all data objectively is a very big ask of a human brain.
A good reason to be circumspect about data….
2. Data are synthetic. They are not the real world. They are manifestations of the real world, shadows on the wall of Plato’s cave — they are representations. Those data representations are captured through a series of instruments and massaged through a series of processes, all of which were designed under a specific set of preconceived notions about how the world works, about what the business should be doing, about which decisions matter, and therefore about which data should even be captured to feed those decisions. In other words, our data feeds — despite the fact that they are staggeringly large and dizzyingly rapid — are not even remotely (a) comprehensive or (b) objective. They are selective, and they manifest the prejudices of the folks who designed the instrumentation processes through which they were captured and presented. In such cases, being “data driven” amounts to being driven by the architects of the data ecosystem.
TC: The two reasons are two sides of the same coin. Both sides originate in experience, either imagined or real — though both seem real, and probably indispensable. The “bad” reasons to resist data might actually be antibodies from past experience. The “good” reasons might actually be a valid sense that something is just not right and shouldn’t be trusted.
The remedy for both, it seems to me, is education that helps new users of data manage those internal filters. Few of us are truly aware of our own myths and biases, and even fewer can ignore them on command. Perhaps a starting course would cover such basic problems as how to identify your questions; how to select your data for quality, judging provenance, relevance, completeness, and the like; and finally, how to read “cave paintings.”
Mark Madsen’s reply to Scott:
Myths, or implicit knowledge? How do you know the difference when experience is long?
I had to learn to debate such things with people a long time ago because it was the barrier to adoption. Everyone knows it works this way, and it's probably true. Until someone changes a process, or the external factors change.
For example, I argued with the exec of a hardware store chain that the drops we were seeing were not due to the weather but to the introduction of big-box home stores. A systemic shift, not the usual factors.
The inexplicable in physical retail sales was often the result of something like weather or another unscheduled event, and the retreat to the known and explicable continues even after conditions change.
In this case the only way to convince him otherwise was to incorporate weather data into the system: temperature, cloud cover, and precipitation, all time-logged as a dimension of sales at a location. That way he could see for himself that weather wasn’t the cause, and open up to alternatives that are normally too painful to bear. Denial caused by discomfort can prevent people from seeing the truth that information shows.
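The mechanics of that fix are simple to sketch. A minimal, hypothetical illustration in pandas (the store, dates, and figures are invented for the example, not Mark’s actual data): join weather observations to daily sales as a dimension, then check whether the drop persists on dry days.

```python
import pandas as pd

# Hypothetical daily sales for one store (the fact table).
sales = pd.DataFrame({
    "date": pd.to_datetime(["2024-05-01", "2024-05-02", "2024-05-03", "2024-05-04"]),
    "store": ["A", "A", "A", "A"],
    "revenue": [1200, 1150, 400, 390],
})

# Hypothetical weather observations for the same location (the new dimension).
weather = pd.DataFrame({
    "date": pd.to_datetime(["2024-05-01", "2024-05-02", "2024-05-03", "2024-05-04"]),
    "store": ["A", "A", "A", "A"],
    "precip_mm": [0.0, 1.0, 0.5, 0.0],
})

# Time-log weather as a dimension of sales at a location.
joined = sales.merge(weather, on=["date", "store"], how="left")

# If revenue drops even on low-precipitation days, weather can't be the explanation.
dry_days = joined[joined["precip_mm"] <= 0.5]
print(dry_days[["date", "revenue"]])
```

The point of the join is not the arithmetic but the visibility: once weather sits next to revenue in the same table, the exec can falsify his own story instead of taking the analyst’s word for it.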
I like the "data isn't real" point. It's an abstraction, a representation. If the abstraction is poor, or poorly understood, it's hard to trust the data given the gap. Plus, over time as processes change, or conditions change, the data doesn't change to match. Metrics and attributes become untethered to reality and again trust is lost.
Trust is at the root of many behaviors. If you don’t trust “the data,” then you have to rely on other sources of information. Today it’s easier than ever to distrust the data, because we have ever more distributed systems, the machinations of which are poorly specified and poorly understood.
Even 20 years ago I was finding bugs in transaction systems (a.k.a. the source of truth, the system of record) because the integrated data didn’t match reality. But that SoR is the data reality according to most people, so you trust that arbitrary generator of data over the triangulation that says something is amiss.
Now roll forward to today’s hands-off data environments, where “we collect, you decide” is the architectural principle. The data content is not determined by the need driving the collection, but by what is available to be collected. This further disempowers people, because most of the best data comes from making good choices about how and what to collect based on the use to which it will be applied.
This was the big failing of the structural interpretation of the star schema. You can build a model out of the sources, sure, but it fails to serve the questions that are asked, which redefine the required semantics of both the sources and the model.
As Scott said, the architects of the ecosystem are the drivers.
I would comment on the notion of "data-driven", which I think is another disempowering sound bite phrase. Good decision making under conditions of ambiguity or uncertainty requires one be decision driven, or need driven. Letting the data drive is putting the wrong questions and people at the front of the process.
Mark
Mark’s hardware exec was “data driven”; he could operate the merchandise and read data charts. Yet it wasn’t enough. Could he have invoked a money-back guarantee? Why must he pay extra, Mark’s fee, to know which data should drive him? Or was there a disclaimer in the “solution’s” warranty? “Vendor is not responsible for choice of data.”
The exec was lucky enough, or smart enough, to look away from his preconception to recognize the correct data — while the data machine stood by offering both.
The industry’s happy talk of “data driven” and “easy to use” avoids the basic and difficult questions involved in judging reality. Data’s just another manifestation — imperfect, often subtle, but always requiring wise judgement.
The industry puts shockingly little emphasis on education for becoming truly adept at using data. Sure, it sponsors educational events, online courses, etc. But that’s an afterthought.
The data industry’s vendors sell the MacGuffin — the object of high dramatic focus but low importance. In Alfred Hitchcock’s crook stories, as Hitchcock put it, “it’s always the necklace, and in spy stories it’s always the papers.” In the data industry, it’s always the “solution.”
Mark, as usual, focuses on some interesting angles. To follow up and expand on those topics: I think the fact that we cannot tell the difference between tacit/implicit knowledge and myths is a great way to restate my point. The two are logically equivalent until proven or disproven via experiment and evidence, and only then can we discern which was which. From an information-theory perspective, the proper data will force the condensation of our Schrödinger-suspended reality (my story is unproven, and the contrary is also unproven) into an at-least-temporarily-reliable binary state of true/false. Only then do we know whether the stories we’ve been telling ourselves are reliable models of reality.

I also 100% hate the term Data Scientist. It is demeaning. It communicates an impotent construction of a job expectation, because it says data are the ends rather than the means, that data constitute a domain of science rather than a tool in the practice of scientific thought, and so on.

As you point out, the productive genesis of a logical and scientific approach to these management questions is the decision. What decisions matter to us, why do they matter, and what does that tell us about the best way to structure the decision calculus? That calculus then explicitly sets forth the set of data necessary to inform and make those decisions. You simply cannot get to the right answer by starting with the data, because any set of data will enable an infinite set of calculi, 99.999% of which are nonsense. That is the root reason that “let’s build a data world to support every possible decision” is a fool’s wasteful errand.

Even though I worked neck-deep in data for decades, I always referred to myself as a decision scientist, not a data scientist. The data were simply a means to the end of scientific decision-making. My job was to map out how we should be making decisions, then design instrumentation and data-management processes to enable us to execute those decisions at scale.