Many questions circle the heads of Database Architects during the design phase of any database build; especially those that aim to integrate existing data sources into a database to support a direct marketing practice. A solid discovery process answers many of these questions, and usually concentrates on aspects of the data that are self-evident, either from direct analysis or exposure to some level of documentation.
However, there are a few select cases where what emerges from the data is new information about the business that is not obvious from any normal dictionary view of the data sources. I like to call these “undocumented truths”…a class of business rules hidden or embedded in the design of corporate data sources that reveal a condition that’s important to the business. Examples of such discoveries are unique combinations of status codes, or the absence of data which has its own meaning, or my personal favorite; “Data Type Stretching” — e.g. improbable data such as a date attribute populated with the 0th of the month.
The last example clearly didn’t come from an information system that enforced common data types, but it’s a perfect example to run with in this entry. As we look at this example more closely, we discover that a specific date attribute, let’s say ship date, containing the 0th of the month means that a product is back-ordered. What do you do? You obviously can’t store it in a date field. Do you store it in a character field and give up a few DBMS functions and features? A purist view would be to store exactly what you received, how you received it, and decide whether it’s associated with dimension or fact. But in doing so, the mystery continues for future users of the data.
This is where the database architect has a unique opportunity to make the data more intuitive, and should take it…the creative license. When discovery and content knowledge reach a critical mass for the data modeler, every effort should be taken to improve the model’s usability by adding structure, and attributes that are more self-revealing. From the source-side, it doesn’t make the “undocumented truth” go away, but at least its interpretation will be recorded in your new database and the ETL process. You’ll be happier in the future when the use of the resultant data model will not require elusive domain knowledge or documentation or extensive training by users.
About the Author:
Chris Dewey is the CIO at SIGMA Marketing Group. Connect with Chris on , or follow him on .