This article will make the case for using Agile to build Data Warehouses. There are many who say that creating an enterprise-level database or data warehouse requires BDUF. They will argue that it is necessary to have the logical E-R design locked down prior to development. Otherwise, a Winchester Mansion-like mess will result. However, it is not only possible, Agile is actually the better way to go as both Kimball and Inmon attest. Given that Kimball’s architecture is best suited for Agile, it will be the one that underlies this article. Kimball’s architecture is the best approach due to its:
Agile allows us to eliminate the key Data Warehouse failure modes often seen on standard waterfall-based projects.
|DW Failure Mode||How Agile Eliminates or Mitigates|
|Insufficient business involvement||Fosters close collaboration with the customer. PO acts as surrogate for customer|
|Underestimating the complexity and scope||Spikes, Walking Skeletons, progressive elaboration, and planning patterns|
|Not anticipating or allowing change||Agile expects and embraces change. Iterative development addresses change.|
|Misunderstood expectations||Close collaboration with the customer, Agile Modeling, Specification By Example, Behavior Driven Development (BDD), and progressive elaboration|
|Over-complicated architecture||Use of Minimal Viable Architecture, Architectural Runway, and just in time modeling|
|Poor understanding of the data||Close collaboration with the customer, Agile Modeling, Specification By Example, BDD, and progressive elaboration|
Avoiding failure consists of “architecting” to support the delivery of working DW/BI functionality early and continuously to our customer. Applying best practices and creating a flexible, extensible data architecture to support evolutionary development is also key.
Conformed dimensions are sets of descriptive master reference data that are addressed in multiple dimensional models. These are fundamental to the Kimball approach and enable Agile DW/BI by levering the relevant conformed dimensions.
To create conformed dimensions, start by identifying a subset of entities that have significance across the enterprise and iteratively grow from there. For these entities, define common names and attributes for like entities. Failure to create conformed dimensions at the start will result in significant technical debit and is one of the key reasons for Agile Data Warehouse project failure.
The key to implementing Kimball’s methodology is the creation of a bus matrix. These are laid out as follows:
Simple Example for a Pizza Parlor
|Pizza Type||Process Owner|
A Bus Matrix meets the Agile principle of just enough documentation. It can be created in a matter of days in collaborative workshops with the right people at the table and a skilled facilitator. Good business and Data SME’s are crucial as a solid understanding of data and processes is required for success. Output will inform release and iteration planning as well as provide a list of reusable common dimensions.
Focusing on one process row at a time reduces risks that result from overly-ambitious plans and supports the Agile principle of rapid development of valuable software. Each row can then be tackled in individual iterations.
Agile Modeling, conceived by Scott Ambler, can be done concurrently or even in conjunction with creating the Bus Matrix. The goal of Agile Modeling is to iteratively develop models that are just barely good enough. The process starts with a light-weight envisioning session to create an initial domain model. With each iteration, the team develops just barely enough of the data model to support development of the sprint backlog.
Software architecture should include a database encapsulation layer, aka persistence layer or data layer. This layer hides the physical details of the database from the application code, so If the database changes, only this layer needs to be changed. The layer also consolidates all DB access code in ‘one’ place.
“Database refactorings” are design improvement changes to the schema that still preserve its behavioral and informational semantics. These includes both structural and functional aspects and in some ways can be considered as normalization after the fact.
Refactoring can involve doing three changes together:
Refactoring Examples include:
Hyper-Normalization goes beyond the 3rd Normal Form. It consists of a data vault with attributes stored in satellite tables and with the foreign keys stored in link tables. This allows changes to data relationship without changing the data(hub) tables, thereby creating a data structure more conducive to Agile development. A variant is Hyper-Generalization, where all hub data is moved to a single table and has a table of tables to identify which rows belong to what data category. In addition, only one link table is needed. While it reduces code and data structure complexity which makes it easier to make changes, it is at the price of more complex queries.
While this article has only scratched the surface of the subject of Agile Data Warehouse implementation, I hope that you can now see that it is not only possible to implement a Data Warehouse with Agile, but that it is actually the best approach.