Bank Systems & Technology is part of the Informa Tech Division of Informa PLC



12:44 PM
Norbert Turek, InformationWeek


In this day of stepped-up data analysis, data quality is a critical issue that is getting more attention.

Nobody had to convince Jim Eardley, managing director of business development and strategy for FleetBoston Financial, that data quality is important. FleetBoston's highest-value customers have multiple accounts with the bank and, therefore, multiple touch points. To meet their needs effectively, Eardley says, "everyone must be reading from the same script, and that script has to be exactly right."

But achieving and maintaining clean data isn't easy. The rapid growth of e-commerce, the ongoing integration of multiple databases, and the rise in the use of business intelligence across companies have only amplified the problem. The more information companies access, the more unreliable it is. "We've gone from IT environments where there were a few data-entry clerks, to a world where everyone is a data-entry clerk," says Mike Schiff, an analyst at Current Analysis.

Data cleansing in the past has been done as a batch job, comparing name-and-address records with available reference material, or on a case-by-case basis when a customer complains about faulty information. "Most companies have seen data quality as a tactical IT problem rather than a strategic advantage for business," says Ted Friedman, a Gartner analyst. "That's just the wrong way to do it. But most companies don't have data quality on their radar yet."

They should, given that the cost of erroneous mailings--including postage, printing, and staffing--hit $611 billion for U.S. businesses in 2002, according to a study by the Data Warehousing Institute. And that doesn't include a potentially bigger loss: missed opportunities because of bad marketing decisions that are based on faulty data.

Rather than looking at maintaining the integrity of data as a necessary evil, companies should see data quality as a critical business process, and clean data a key product of the business. "You need to build your database around your constituents. And then you need to have a corporate commitment to maintain the quality of that data," says Ron Boeving, VP of information services and CIO at First Health Group Corp., a managed-care service provider that uses data-cleansing software from Group 1 Software. "If we don't control the quality of our data, we have no products," he says.

But doing that isn't easy. Part of the problem is that databases in a company of any size are typically owned by individual lines of business. Each database administrator manages the data to a sufficient level of quality for the department. For example, a call center may allow operators to input addresses without apartment numbers, while a service manager wouldn't. "In data reengineering, one of the hardest problems is defining what bad is," says Andy Lesser, a senior technical analyst at FedEx Corp.

Databases created at different times, in different formats, and for different business purposes may look similar, but undocumented assumptions about fields can lead to errors even in hand-checked records. When First Health was importing data from a company that it had recently acquired, it thought the data was relatively clean. But then the undocumented data started to create problems. "We saw 'start date' as a field in the database," says Bob Bularzik, assistant VP of software technologies at First Health. "But the start date of what?" What's more, when multiple databases are combined, errors multiply. When two records are nearly identical, companies must determine which, if either, is correct. For example, two people may be shown at the same address. Do they both live there? Has one moved? Are they the same person?
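The matching question can be made concrete with a minimal sketch. It assumes records are plain dictionaries with hypothetical `name` and `address` fields, uses the standard library's `difflib.SequenceMatcher` for fuzzy comparison, and picks an arbitrary 0.85 similarity threshold. It is an illustration only, not the method used by First Health or any vendor mentioned in this article.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return " ".join(kept.split())


def similarity(a, b):
    """Return a 0..1 similarity score between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def classify_pair(rec_a, rec_b, threshold=0.85):
    """Label a pair of records; the threshold is an assumed, tunable value."""
    same_address = similarity(rec_a["address"], rec_b["address"]) >= threshold
    same_name = similarity(rec_a["name"], rec_b["name"]) >= threshold
    if same_address and same_name:
        return "likely duplicate"
    if same_address:
        # Two people at one address: both residents, or has one moved?
        return "same address, different names: needs review"
    return "distinct"


# Hypothetical records, invented for illustration
a = {"name": "Jon Smith", "address": "12 Main St., Apt 4"}
b = {"name": "John Smith", "address": "12 Main St Apt 4"}
c = {"name": "Mary Jones", "address": "12 Main St Apt 4"}

print(classify_pair(a, b))
print(classify_pair(b, c))
```

The sketch can flag the first pair as a probable duplicate and route the second to a person for review, but it cannot answer the harder question of which record is correct. That still takes a reference source or a human decision.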

Data acquired from mergers, acquisitions, foreign divisions, third-party data providers, and customers or partners entering information on their own should be suspect, analysts and database managers say. "But the worst thing of all is the paper document," First Health's Boeving says, because it can include errors from both the person who filled out the document and the data-entry clerk.

Even companies that have engaged in data cleansing once often lapse on maintenance. That leads to data-quality drift, in which data quality declines without anyone touching the data, simply because the records no longer match what's happening in the real world. Marriage, death, divorce, moves, change of business, and change of product suppliers all create drift, which is estimated at upwards of 2% a month for the U.S. population, according to Trillium Software.
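To see what a drift rate of that size means over time, here is a rough back-of-the-envelope sketch. It assumes a constant 2% monthly rate that compounds, with each month's changes hitting the records that are still accurate; that is a simplification for illustration, not something the Trillium estimate itself states.

```python
def stale_fraction(monthly_drift, months):
    """Fraction of records out of date after `months`, assuming a constant
    monthly drift rate applied to the still-accurate records (an assumption)."""
    return 1.0 - (1.0 - monthly_drift) ** months


for months in (6, 12, 24):
    share = stale_fraction(0.02, months)
    print(f"{months:>2} months: {share:.1%} of records out of date")
```

Under those assumptions, a database that is cleaned once and then left alone would have a bit more than a fifth of its records out of date after a year.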

"We try to stay on top of company-executive changes, but even with our research and subscription to multiple business-data providers, we sometimes end up listing two different CEOs," says Bill Schumacher, senior VP of content at business data aggregator and provider OneSource Information Services Inc.

Most companies still favor building their own data-scrubbing tools because simple data validation is relatively easy to program, and they think simple data cleansing is all they need. It's also much less expensive than the $100,000 to $200,000 that data-quality software can cost.
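A home-built check of the simple kind described here might look like the following minimal sketch. The field names, the list of required fields, and the U.S. ZIP code pattern are all assumptions chosen for illustration. As the FedEx analyst's comment earlier suggests, each department would have to decide for itself which fields count as required.

```python
import re

# U.S. ZIP or ZIP+4; checks the format only, not whether the code exists
ZIP_RE = re.compile(r"\d{5}(-\d{4})?")


def validate(record, required=("name", "street", "city", "state", "zip")):
    """Return a list of problems found in one record (empty list means clean)."""
    problems = []
    for field in required:
        if not str(record.get(field, "")).strip():
            problems.append(f"missing {field}")
    zip_code = str(record.get("zip", "")).strip()
    if zip_code and not ZIP_RE.fullmatch(zip_code):
        problems.append("malformed zip")
    return problems


# Hypothetical records, invented for illustration
good = {"name": "A. Customer", "street": "1 Main St", "city": "Boston",
        "state": "MA", "zip": "02110"}
bad = {"name": "", "street": "1 Main St", "city": "Boston",
       "state": "MA", "zip": "2110"}

print(validate(good))
print(validate(bad))
```

Checks like these catch empty or malformed fields. They do not catch a well-formed address that is simply wrong, which is where reference data and record matching come in.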

But complex data validation isn't easy to program or anticipate, so high-end data-quality tools offer much more. These suites, which vendors say typically take two to five days to implement, include both batch and real-time audits, repairs, records matching, and augmentation of records with additional data, such as geospatial data, worldwide postal-code information, a new company affiliation, or part information.

Sheer power is also a good reason to move to a commercial data-cleansing tool, says Peter Harvey, president and CEO of Intellidyn, a data-management and research firm. One of Intellidyn's jobs is to give clients credit data on U.S. consumers. "I have to update the credit history of everyone in the U.S. on my system on a frequent basis," Harvey says. The multiterabyte database is updated and checked using tools from DataFlux Corp. "This used to take days to do using home-built tools. Now we can perform the entire upload and clean in 16 hours," Harvey says.

Strategies to fight data degradation range from the simple to the very complex, with results that typically map closely to the effort. That's why First Health, FleetBoston, FedEx, and OneSource take it so seriously. "Our executives don't ask about ROI. They ask if we'll lose out on an opportunity by not doing this," Boeving says. And your company's loss may be another's gain.

This article originally appeared in InformationWeek.
