
Washing your car can be rewarding. Scrubbing your kitchen is satisfying. Organizing your home office boosts productivity. And cleaning your bedroom can help you relax or even sleep better. Know what else can help you sleep better at night? Maintaining a squeaky-clean database.

What exactly constitutes a clean database? It’s one free of outdated, incorrect or just plain junk data. Keeping it clean also involves consolidating datasets to create a more efficient system.

But despite everyone’s best efforts, bad data has a way of sneaking into databases. It’s simply a fact of life. And maintenance only gets harder as time passes: people move, phone numbers change. Think about it: Do you have the same phone number you did 10 years ago? Home address? Email address? Almost certainly, somewhere, there’s a database with your old information stored.

And what are the companies, consultants or government agencies doing with that bad information? Sending advertisements to old email addresses? Making crucial business decisions based on faulty data? Relying on outdated information to make decisions about infrastructure improvements?

Bad data hurts. In fact, according to findings from IBM, poor data quality costs the U.S. economy more than $3 trillion per year. That’s trillion—with a T. And, according to Gartner research, poor data quality costs organizations an average of almost $10 million per year.

And it’s easy to see how bad data takes the financial toll it does. For instance, unqualified leads that have wiggled their way into a marketing database waste advertising dollars. Opening a storefront based on errant data about traffic in the area can lead to less-than-satisfying sales, or worse. And basing crucial business decisions on erroneous demographic information is usually disastrous.

It may not be as fun as taking your car to the car wash, but following some guidelines can help keep your database clean and uncluttered.

4 Best Practices for Keeping Your Database Clean

Eliminate Duplicates

Duplicates are simply a fact of databases. In other words, it’s hard to avoid them altogether. But the first step in cleaning out your database involves proactively eliminating duplicates. Pay special attention as you do so. Cross-reference duplicates before deleting to decide which has the most complete and pertinent data.
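
Here is a minimal Python sketch of that cross-referencing step. It assumes records arrive as dictionaries and that two records are duplicates when their email addresses match after trimming and lowercasing. The field names, the sample contacts and the "most filled-in fields wins" rule are illustrative assumptions, not the only way to decide which record to keep.

```python
def completeness(record):
    """Count the fields that hold a non-empty value."""
    return sum(1 for value in record.values() if value not in (None, ""))


def deduplicate(records, key_fields):
    """Group records by key_fields and keep the most complete one per group."""
    best = {}
    for record in records:
        # Normalize the key so "Maria@Example.com " and "maria@example.com" match.
        key = tuple(str(record.get(f) or "").strip().lower() for f in key_fields)
        if key not in best or completeness(record) > completeness(best[key]):
            best[key] = record
    return list(best.values())


# Hypothetical sample data for illustration only.
contacts = [
    {"email": "maria@example.com", "name": "Maria Garcia", "phone": ""},
    {"email": "Maria@Example.com ", "name": "Maria Garcia", "phone": "801-555-0142"},
    {"email": "robert@example.com", "name": "Robert Smith", "phone": None},
]

cleaned = deduplicate(contacts, ["email"])
```

In this sketch the second Maria Garcia record survives because it carries a phone number the first one lacks. In a real cleanup you might merge the two records rather than discard one, or route close calls to a person for review.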

Ensure Data Uniqueness

Uniqueness is key to a tidy database. Here’s how to ensure unique data: Avoid identifying records by columns prone to duplicate values. For instance, no two Social Security numbers are the same. However, there are many people named Maria Garcia, Robert Smith and other common names. Thus, it’s important to set up your database so that records are not referenced by columns with potential duplicates. And if no single column is inherently unique, then use two or more columns as a composite, which taken together are more likely to be unique.
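
As a rough illustration of a composite, the sketch below uses Python’s built-in sqlite3 module to declare a uniqueness constraint across several columns. The table name, the column names and the choice of name plus birth date plus postal code are assumptions made for the example. Pick whatever combination is actually distinguishing in your own data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customers (
        first_name  TEXT NOT NULL,
        last_name   TEXT NOT NULL,
        birth_date  TEXT NOT NULL,
        postal_code TEXT NOT NULL,
        -- No single column is unique, so the four together form the key.
        UNIQUE (first_name, last_name, birth_date, postal_code)
    )
    """
)


def add_customer(row):
    """Insert a row; return False if the composite key already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO customers VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


# Hypothetical sample rows for illustration only.
first = add_customer(("Maria", "Garcia", "1985-03-14", "84101"))
namesake = add_customer(("Maria", "Garcia", "1990-11-02", "84101"))
repeat = add_customer(("Maria", "Garcia", "1985-03-14", "84101"))
```

Two different people named Maria Garcia are both accepted, while the exact repeat of the first row is rejected by the database itself instead of being caught later during cleanup.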

Ensure Uniformity

This is one of those “easier said than done” situations. But, indeed, uniform data is clean data. Let’s say you have a form on your website that collects information on potential customers. One field asks them to enter their state of residence. Some of these potential leads who live in, say, Utah may type in “UT.” Others may type it out, “Utah.” And still others may mistype it as “Urah.” All these users live in the same state, but now your database doesn’t understand that. A drop-down list can eliminate many of these mistakes.

The same goes for databases used by municipalities, government organizations and others. One court clerk may enter a city one way, while another might enter that same city another way—no matter the regulations put in place to avoid these types of formatting mistakes. After all, databases are prone to human error. Again, a drop-down list could help eliminate these inconsistencies.

The point? As you review your database, make sure all the information is formatted and displayed similarly.
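
For data that has already been collected as free text, a small normalization pass can help. The sketch below is one possible approach in Python: it maps known spellings to a two-letter code and uses the standard library’s difflib to catch near-misses such as “Urah.” The three-state lookup table and the 0.7 similarity cutoff are assumptions chosen to keep the example short. A real table would cover every value you expect, and automatic typo correction deserves a human spot check.

```python
import difflib

# Illustrative subset only; a real lookup would list every state.
STATE_NAMES = {"utah": "UT", "idaho": "ID", "nevada": "NV"}
STATE_CODES = set(STATE_NAMES.values())


def normalize_state(raw):
    """Return a two-letter state code, or None if the value is unrecognized."""
    text = raw.strip()
    if text.upper() in STATE_CODES:
        return text.upper()
    name = text.lower()
    if name in STATE_NAMES:
        return STATE_NAMES[name]
    # Fall back to the closest known name to catch simple typos.
    close = difflib.get_close_matches(name, list(STATE_NAMES), n=1, cutoff=0.7)
    return STATE_NAMES[close[0]] if close else None


entries = ["UT", "Utah", " utah ", "Urah", "Atlantis"]
normalized = [normalize_state(entry) for entry in entries]
```

Values that come back as None are not guessed at. They are the ones worth sending to a person for review.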

Take Out the Trash

Garbage piles up in your database. Just like at home, right? For instance, some users may be wary of entering their real email address or phone number into a form. That’s when you may start seeing data such as “999-999-9999” phone numbers or “blahblahblah@msn.com” email addresses. It’s important to eliminate these bad leads. Filter for these common and suspicious entries to automatically find all their instances and remove them. Along with this, look for the telltale red flag of records missing required data.
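
A filter along those lines might look like the Python sketch below. The required fields, the “every digit is the same” phone test and the “same chunk repeated” email test are all assumed heuristics for illustration. They will occasionally flag a legitimate entry, so it is safer to treat the result as a review list than as an automatic delete list.

```python
import re

# Assumed required fields; adjust to match your own form.
REQUIRED_FIELDS = ("name", "email", "phone")

# Matches text made of one chunk repeated, such as "blahblahblah".
REPEATED_CHUNK = re.compile(r"^(.{2,}?)\1+$")


def junk_reasons(record):
    """Return a list of reasons a record looks like junk (empty if none)."""
    reasons = []
    for field in REQUIRED_FIELDS:
        if not (record.get(field) or "").strip():
            reasons.append(f"missing {field}")
    # A phone number whose digits are all identical is a likely placeholder.
    digits = re.sub(r"\D", "", record.get("phone") or "")
    if digits and len(set(digits)) == 1:
        reasons.append("placeholder phone")
    # Check only the part of the email address before the "@".
    local_part = (record.get("email") or "").partition("@")[0].lower()
    if REPEATED_CHUNK.match(local_part):
        reasons.append("suspicious email")
    return reasons


# Hypothetical sample leads for illustration only.
leads = [
    {"name": "Maria Garcia", "email": "maria@example.com", "phone": "801-555-0142"},
    {"name": "Test", "email": "blahblahblah@msn.com", "phone": "999-999-9999"},
    {"name": "", "email": None, "phone": "801-555-0142"},
]

flagged = [(lead, junk_reasons(lead)) for lead in leads if junk_reasons(lead)]
```

Recording the reason alongside each flagged record makes it easier to audit what was removed and to tune the rules over time.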

Conclusion

Data is everywhere. And quality data can be an asset when gathered, managed and utilized correctly. A clean, uncluttered database is one of the cornerstones of reliable data. Following the best practices outlined here will get you on the path to ensuring your data is an asset, not a liability.

Derek Smith is the president of White Box Technologies, a trusted leader in database conversion and integration projects. Since 2007, Smith and the team at White Box Technologies have helped convert or integrate databases for organizations ranging from the healthcare industry to public safety and everything in between. Whether it’s moving all the data from a legacy database to a new system or merging data from disparate frameworks into an accurate, meaningful and trusted pipeline of information, White Box Technologies can handle it.

Database stock photo by Lightspring/Shutterstock