Welcome back, data provocateurs! This recording is the second segment of our data discussion with Tom Redman about his new book, Getting in Front on Data. Tom discusses different themes about becoming a data-driven business, and best practices to ensure data quality.
Tom Redman, the “Data Doc,” helps companies, including many of the Fortune 100, improve data quality. Those that follow his innovative approaches enjoy the many benefits of far-better data, including far lower cost. He is the author of Getting in Front on Data: Who Does What (Technics Publications, 2016) and Data Driven (Harvard Business Review, 2008). His articles have appeared in many publications, including Harvard Business Review, The Wall Street Journal and MIT Sloan Management Review. Tom started his career at Bell Labs, where he led the Data Quality Lab. He has a Ph.D. in Statistics and two patents.
Path to Data Quality: Where to Start?
Joe: Let’s say an executive accepts that they really need to get on top of data quality, and they understand it’s an issue. How does that translate to action? So in other words, what do you suggest as far as where to start?
TR: I think the first thing that… Data quality is a solvable problem, and this is one of the points that I wanted to really drive home in “Getting in Front… ”
Lots and lots of companies have figured out that the secret is to figure out what’s most important to customers. Figure out where the data’s created, measure the quality of data at the point of creation, and then find and eliminate the root causes of error. It is just not that hard; virtually everyone can participate. And I said a couple of things there:
- The first thing is make your needs known.
- For many executives, I don’t know how they manage with the quality of data they’re dealing with day in and day out, and a first step for many of them is to become increasingly intolerant of bad data, right?
- To get this thing kicked off and say, “We’ve been tolerating this stuff too long. Let’s start focusing in on an area and let’s follow the script that I just talked about. Let’s articulate what’s most important, let’s figure out where that stuff comes from and let’s help those guys, those creators.”
TR: And it may be a process, it may be a department, it may be physical people, men and women. Let’s help them figure out the root causes of the errors that are bedeviling us, and make them go away. I think that’s the first step for everybody is recognize you’re a data customer, and there’s no way for most creators to know that you’re not getting what you need unless you tell them. And so this is very important, I really want to emphasize this:
This is a solvable problem when customers speak up. If customers don’t speak up, then I don’t know how we solve it.
Friday Afternoon Measurement (FAM):
TR: One of the things that I observed early on was that people are trying to manage data quality by anecdote. And so the people who think it’s bad would come up with a list of counter examples and the people who think it’s not so bad would say, “Yeah, but you haven’t really shown us any hard statistics or anything around the cost, or anything like that.” And a sort of stasis, no consensus, no movement, existed.
So the obvious answer was, “Well, let’s make some measurements.” And then you found out that making measurements, you can go purchase a tool and then install it and use it.
TR: I developed this thing called the Friday Afternoon Measurement. The basic idea was just pick something that’s important to you. If you open customer accounts, pick 10 or 15 things.
So you assemble something and there’s 15 data items and a hundred records. Lay this out on a spreadsheet, it’s 100 by 15, and get a couple of people together in a room and give them each a red pen, and just work line by line and have them circle everything that they see that’s a problem.
They can’t do 50 million records like you do with a computer, but they can do a hundred. And a lot of times, we’d do the last hundred and then we’d start counting: well, how many rows had no errors? And we were finding out that somewhere between 20% and 50% of the data records, the last hundred things you did, had errors in them.
TR: What we found is that this has been prima facie evidence of a data quality problem. It was a simple, fast, reasonably defensible method for figuring out whether you have a problem or not, and informing this debate, “No, we do. Yes, we do. No, we don’t.” Kind of thing. It’s been one of the things that have worked out really well for the organization.
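The counting step of the Friday Afternoon Measurement can be sketched in a few lines of Python. This is a minimal illustration and not a tool from the book: the function name `fam_score`, the field names, and the sample review results are all hypothetical. It assumes each reviewed record is represented by the set of data items the reviewers circled in red.

```python
from typing import List, Set


def fam_score(error_marks: List[Set[str]]) -> float:
    """Return the fraction of reviewed records with no circled errors.

    error_marks holds one set per record; each set contains the names of
    the data items that reviewers flagged in that record (empty = clean).
    """
    if not error_marks:
        raise ValueError("need at least one reviewed record")
    clean = sum(1 for marks in error_marks if not marks)
    return clean / len(error_marks)


# Hypothetical results for the last 100 records: three rows were flagged.
reviewed: List[Set[str]] = [set() for _ in range(100)]
reviewed[3] = {"postal_code"}
reviewed[17] = {"name", "phone"}
reviewed[42] = {"email"}

score = fam_score(reviewed)
print(f"{score:.0%} of records error-free, {1 - score:.0%} with at least one error")
```

Note that a record counts as bad if any one of its data items is flagged, which matches the "how many rows had no errors" count described above.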
TR: One of the things that people sort of think is true is that anything around data requires senior leader commitment. I did a re-synthesis of every engagement that I’ve worked on and others that I knew about. In every case, I did not find the leader at the tip of the spear.
What I’ve found was somebody, usually in the middle, often at fairly low levels of the organization, who asked the question, “Why am I having to deal with this bad data, and can I make it better?”
And one thing led to another, and they made some improvements. And they got good traction, they made some gains. And then what they were able to do was hold those up to the organization. And by the way, I want to make it clear, these people, most of them when they were starting out, they weren’t interested in data quality per se. What they were interested in was doing their job better, and doing their job better required better data. And when they really looked at that they just grew dissatisfied with correcting all this stuff. And they followed the script we talked about.
TR: And so, the way they became provocateurs was by having a success story, and then putting that in front of their organization, and saying, “Hey, this is the way we ought to work.” And variously this work team, or this department, or this business unit, you ought to just be doing this differently. My fondest hope for “Getting in Front on Data” is that people develop the courage to go be provocateurs, to go do this.
Data Customers vs. Data Creators:
Joe: Something you talk about in the book is the difference between a data creator and a data customer and that relationship. And I think an issue with how some people understand data quality is that they can’t assign who is responsible for data. You talk about how everybody deals with data. So what exactly is the difference between a data customer and a data creator, and how do those two relate to each other?
TR: Okay. So you’ve asked a really important question. In quality, a customer is somebody who uses the product, service, or data. And so, a data customer is obviously somebody who’s using the data. They may be inside the organization or they may be outside the organization. A boss certainly qualifies as a customer. And we use the term person. A customer need not be a person. It could be a device. There’s lots of things that are done automatically.
Sometimes I find it helpful to personalize it and other times to de-personalize it, and that’s a process. And then the process is using the data. And so, if you think about that, the process, the customer is on the using the data end.
TR: The other side is the creation. And there’s sort of a realization that every day, all of us are creating data. And we may be putting it in a system, we may be putting it in a report, we may be putting it in an article we write, it may be in an email. It’s how we create value in our organization. And for a large, large number of problems, what we’ve observed is that the creators have no idea who is using their stuff.
And you think about this, at one level people are working to do order entry. And their job is to do order entry. They have no window into the fact that two days later, somebody has to ship the thing. And so, it’s “Well, what did you put in?” “Well, whatever the system would allow.”
That the customer is depending on the creator to do things right is really sort of fundamental. And pity the poor creator who doesn’t know what the customer wants. And you can’t blame the creator if they don’t know. But again, this is why customers have to become intolerant and articulate their needs, and reach out to creators, and say, “Yeah. Hey, this is what we need out of this thing.”
Data Quality Management:
Joe: So the bridge between customer and creator, you talk about data quality management in Getting in Front on Data, and setting up the infrastructure within companies to deal with the task of making sure that data is of good quality. Can you elaborate on what it takes to set up data quality management systems, and what are really the essential aspects of being a data quality manager?
TR: In the beginning, everybody in an organization is dealing with data quality in their own way. As we talked about, they may go to extraordinary lengths to fix up the data they need so that they can complete the task at hand. Then a provocateur comes along, kind of saying, “Yeah, that doesn’t make much sense.” He or she will lead efforts to create a working example and a script for the organization. But if it’s going to go further, somebody in a more senior position has to say, “Hey, this data quality stuff is pretty good stuff.”
They have to put some management in place to make that happen, and so they name a data quality manager. The simple definition of his or her job and the job of his or her team is to figure out the most important needs of the most important customers and make it easier for customers to connect with creators, make it easier for creators to make the measurements they need, and so forth.
So the data quality manager is put in place by a more senior leader than the provocateur, with the understanding that we need to spread this more certainly across an organization.
Culture of Data Quality:
Joe: So what you’re really talking about is changing the culture within a business. That’s something both “Data Driven” and “Getting in Front on Data” are really talking about. Have you noticed any change within the various businesses you’ve worked with as far as how they are approaching data quality at a cultural level?
TR: I find culture is this very, very amorphous topic. One addresses culture indirectly, and the way they do that is they put people in place and they put processes in place and they set goals around data quality and so forth, and then they change hearts and minds one at a time. And then, after they do that for four or five years, you wake up and you go, “Wow, the culture is different around here. The customers step up to their responsibility.”
But the point is, if you want to address culture, then you put a data quality manager in place. You train people how to be customers. You train them how to be creators. Leadership talks about the importance of data quality and then they back that talk up. And if those things happen then the culture changes as a result, but the point wasn’t to change the culture. The point was to do the work necessary to change the culture.
Joe: So with you personally, when you’re assisting companies in this transition, is that something you’re focused on intently, sort of establishing the structure and then enabling the business to be able to work organically within this new structure of having data quality managers to sort of deal with their data quality problems?
TR: Yes, right on. Get the right people and structure in place and it’s a lot easier to improve data quality and then the culture follows.
Changes Since Data Driven:
TR: In the eight years since Data Driven was published, two things have happened:
- Lots of companies are addressing data quality and we see plenty of good examples. At the same time, the sheer quantity of data has increased enormously. I think the industry statistic is, the quantity of data is doubling every 18 months or whatever it is. I see lots more companies who get it and I see lots more attacking it properly. At the same time, I’m pretty sure we’re not keeping up with the data deluge. Companies and individuals have to move faster.
- We need more provocateurs, we need more switched-on leaders, we need more data quality managers who are out there pushing this ‘get in front’ mantra. We’ve got to keep up.