'I don't have time for this garbage': Why data resistance isn't what you think
Data pro's passionate reply to my question about “resistance” to data in decision making (rev. 2.5.2025)
Is there a general resistance among business decision makers to the use of data? If so, what’s to be done? I asked three leaders in the data industry — Scott Davis, Donald Farmer, and Mark Madsen. Each of the three gave a starkly different view.
First comes Mark, with his passionate and characteristically iconoclastic challenge to the “resistance” premise. What’s assumed to be resistance to “the data” is usually resistance to something else entirely.
Data analysis at the scale and penetration that modern business deploys it, or tries to deploy it, is a relatively new beast. Data from inside the business and outside it arrives in far greater volume than even a decade ago. Remember when, a bit more than a dozen years ago, “big data” was all the rage? “Big” was relative to what we had been used to, and the volume has only grown since. So has the need to use it, now that every competitor uses it.
Here’s Mark, with my comments interspersed and marked “TC.”
So your starting position is that there is widespread resistance?
I ask because I don't see that so much as an inability to use data, with many different reasons and rationales.
When there is resistance to "using data" it's usually resistance to something else, not the use of data.
“Bad people” - Could be bad actors, who don't want the failings or failures of what they do to be revealed.
“Stupid people” - They don't know how to use it, which has many guises. They are stupid and can't comprehend, so they do what they have always done.
They're ignorant, which has many guises too. Ignorant of the tools to access information, ignorant of the available information, ignorant of the relationship between the information and their practices.
In my experience, there is a considerable amount of the latter, not really seeing how the data relates to operations, or operations to the data.
There is also considerable lack of understanding by users of "the data," which is also inherent in your question. What data? Which data? Do you mean data, or the metrics derived from the data?
TC: He asks, what’s “the data”? His argument made into an analogy with food would ask, “What’s food?” Is it the raw grain coming off the field? The milled grain? Or bread?
“The data” is usually understood among business people as the metrics derived from data — just as they understand food as what they buy at any retail source.
Which leads us into the rabbit hole of shitty IT and the regression to pre-1980s data practices in the guise of self-service, and the confusion IT and devs [developers] have between data from systems (aka "raw") and data that is used to inform decisions, and the fact that it is usually required to transform one into the other, and that "the data" isn't a single-system thing but comes from many systems and must be integrated in order to make sensible information.
TC: Imagine a “self-service” restaurant that touts itself as cutting-edge for its refusal to offer actual meals. Instead, patrons select raw ingredients from farm boxes of unwashed produce and animal carcasses. From there, they describe the preparation from start to finish. If the result is inedible, well, that’s on you. Such is what’s often called “self-service” data.
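TC: To make the grain-to-bread point concrete, here is a minimal sketch in Python of the refinement step Mark describes: reconciling "raw" exports from two systems into one decision-ready metric. The system feeds, column names, and the return-rate metric are my hypothetical illustration, not his.

import pandas as pd

# "Raw" data as it arrives from the applications: different systems,
# different keys, different conventions. (Hypothetical examples.)
orders = pd.DataFrame({
    "sku": ["A-100", "A-100", "B-200"],  # from the order system
    "qty": [10, 5, 8],
})
returns = pd.DataFrame({
    "product_code": ["A-100", "B-200"],  # same products, different key name
    "returned_qty": [3, 1],
})

# The refinement step: reconcile keys across systems, aggregate, and
# derive the metric a decision maker actually asks for.
shipped = orders.groupby("sku")["qty"].sum()
returned = returns.set_index("product_code")["returned_qty"]
return_rate = (returned / shipped).fillna(0.0).rename("return_rate")

print(return_rate)  # A-100: 0.200, B-200: 0.125

Trivial as it looks, every step (key reconciliation, aggregation, derivation) is exactly the work that "self-service" quietly hands to the user.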
“Lazy people” - Could be they are just lazy. I hear this all the time from IT people in companies large and small. If only the user would just learn... followed by something. The BI tool. The names of metrics and attributes. The semantic layer. The data model. The self-service tool and how to use it. SQL. Now, of course, “Just learn Python.”
Sometimes I do meet lazy people, lazy in their thinking.
TC: Resistance that seems to have one cause when a different one is at work is as common with data as with so many other complex, learned activities. Take cooking, for example. Does so-and-so produce bland or bad-tasting food because their equipment is bad or they have no sense of smell? Not usually. More often the reason is laziness, such as failing to pay attention. Or passive aggression: “Here, take that, you wretched spouse.” And there are the “stupid,” who throw frozen fish into hot soup and serve it as is.
But when the tools are an obstacle, whether in their ease of use or their ability to make clear what information is available and how it relates to process, it's chalked up to laziness rather than IT's or the vendor's fault.
Knowledge of "the data" can be an obstacle. [Users are told,] “Here's a pile of columns assembled from the applications for you. Just turn that into information with your self-service tool.” It's rarely so simple.
The "resistance" is more like learned helplessness. You have more IT crap to learn to do your job than what you have to learn to do your job. IT tells you "it's easy, you just have to..." and rolls their eyes when you don't "get it.”
It's also time. Learn all these tools. “Go to this training on data. Fit that into your standard 8-hour day that is actually a 10-hour day, because we realize that remote work allows us to reclaim your former 2 hours of daily commute. Now fit in the time to self-serve the data for the thing you need. Oh, we didn't teach you how to understand and map the problem domain onto the data? No problem, the modern data stack developers didn't learn that either.”
TC: Ill-designed tools, IT’s browbeating, days of training on top of regular work, hours spent prepping “self-service” data: so goes the organization’s investment in becoming “data driven.”
If I were a user in a business I'd probably barely use data, or use the wrong data, or take what's present and easy, because I DON'T HAVE THE TIME TO DEAL WITH THIS GARBAGE, IT. This is literally the complaint in one BI project after another over my career. And now I hear it from data scientists, to the point that I made a data cleaning course and people are asking me, "Why didn't anyone tell us about this? Why isn't IT doing this?"
To which one need only respond that malice, stupidity, and laziness are either projections by IT people or a lack of empathy. Empathy is underrated. How often does the developer take the time to sit down and learn what the user is doing, why they do what they do, and why they need what they say they need? Usually the developer looks like a data order-taker and is treated that way. Because asking questions like this is hard, and takes people skills, and it's not “agile.”
Speed kills. In particular, speed kills the notion of design, which requires time spent up front thinking about and defining the problem.
What to do about the various problems, as I labeled them, seems pretty obvious. Vendors held accountable. People given the time to learn. Emphasis by IT, biz, and vendors on learnability, ease of use, and so on. IT held accountable for making useful data, not just making data.
Data scientists who are supposed to know how to deal with data preparation as 70% of their work whine that it should only be 20%, when feature engineering is basically data prep and the quality of a solution is based entirely on knowing your data and shaping it. Proper emphasis on training of the practitioners, including the IT people. That is a tall order.
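TC: A quick sketch of why, as Mark says, feature engineering is basically data prep. Everything here (the fields, the cleaning step, the derived features) is my hypothetical illustration:

import pandas as pd

# Hypothetical raw customer records, messy as usual.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "signup_date": ["2021-06-01", "not available", "2023-02-15"],
    "orders_last_year": [12, 0, 4],
})

# Data prep: coerce the messy date field; bad values become NaT.
signup = pd.to_datetime(customers["signup_date"], errors="coerce")

# "Feature engineering": tenure and an activity rate. Both are simply
# derived data, which is to say data prep by another name.
as_of = pd.Timestamp("2024-01-01")
customers["tenure_days"] = (as_of - signup).dt.days
customers["orders_per_month"] = customers["orders_last_year"] / 12

print(customers[["customer_id", "tenure_days", "orders_per_month"]])

The "features" are nothing but cleaned and reshaped data; knowing which fields to trust and how to shape them is the whole job.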
Focus on the problem of data as more than "copy everything from distributed to one place" with the assumption now All Will Be Easy, which is the thinking behind cloud DW/lake/lakehouse/MDS.
This is the terrible mistake in all the modern [sic] data infrastructure, where the physical access problem isn't the real one. The real problem is what you do to the data after you resolve the physical access problem. It always has been. The industry confused itself over this circa 2009 and has regressed. It is slowly relearning all the old problems and solutions.
TC: I would like to know why they regressed. What happened within the data industry to retreat from the hard problem to deal with an easier one?
AI [sic] isn't going to help this problem either. It's a reasoning problem, deep in semantics, not something you can use fuzzy pattern matching to dig your way out of.
In the above, the uses to which I refer are primarily making decisions that are not purely operational in the applies-to-a-single-task operational BI sense. That sort of lookup-and-do is mostly "raw" data, with low context requirements and little or no derivation. The "how many widgets came from the machine" or "what's the balance" kinds of questions are most often trivial to answer.
Mark
TC: As I read it, he winds up with this: Key parts of the data industry have shirked their responsibility. Instead of slapping the glib “easy to use” slogan onto so many tools, make them truly easy to use. Properly refine raw data into usable, meaningful metrics. Don’t intimidate users. Deal with the hard problems that the data industry and IT departments are there to handle. Make extracting value from data a clear-cut, natural part of business people’s work. To vendors and IT: just step up and do your work.
NEXT WEEK: Donald Farmer’s response, followed by Scott Davis’s.
I agree with Mark that the problem is not technical per se. It is people. It always was.

The best technology designers divide the tech-usage (or data-usage) challenge into two separate cognitive domains: the tool domain and the problem domain. Design challenges in the tool domain pertain to how the user interacts with the tool. These are the attributes of the tech that the user must focus on and master in order for the tool to do its thing well. Where is the button for X? Does the tool define the workflow in this sequence or that? Generally speaking, this is NOT where tech design fails. Most tech product engineers of ordinary skill can deliver products that work OK within their vision of what the product is supposed to do: the tool domain. The same goes for data. The breakdowns we see in industry are very, very rarely that users cannot figure out how tools work or what data mean. The breakdowns are not within the tool domain. They are within the problem domain.

Very few business folks have the logical, strategic, and mathematical skills to design a data-informed decision-making process for the business question on the table at any moment. This is the dirty little secret behind the billions of dollars that failed in "BI for the masses": you were never going to overcome the cognitive shortcomings of the average business analyst in the face of the average analytical challenge. That might be a little harsh, but not by a lot. There is no amount of data prep, no ease of access, no library of simple statistical widgets that will substitute for an analyst being able to write, with pencil and paper, a good analytical design: one that specifies what data are needed to feed what mathe-logical heuristic, elicits the real root causes beneath the felt symptoms, surfaces intervention options that reflect the organic and connected nature of the enterprise, and defines a defensible means of choosing a "best" decision from those options. Not one aspect of that is tool domain. It is entirely problem domain, pure reasoning, every step of which can be laid out in the strong analyst's mind without consideration of any buttons or affordances in any tool.

Imo this is the comeuppance visited on the tech industry for playing the pied piper of "BI for the masses" for decades. Now everyone believes the masses can reason through these very complex logical and mathematical decision spaces. And they cannot. From this reality spring all manner of behaviors you might be observing as "resistance to data."