How to Use Agile to Build Data Warehouses

Data Warehouse

This article will make the case for using Agile to build Data Warehouses. There are many who say that creating an enterprise-level database or data warehouse requires BDUF. They will argue that it is necessary to have the logical E-R design locked down prior to development. Otherwise, a Winchester Mansion-like mess will result. However, it is not only possible, Agile is actually the better way to go as both Kimball and Inmon attest. Given that Kimball’s architecture is best suited for Agile, it will be the one that underlies this article. Kimball’s architecture is the best approach due to its:

  • Bottom-up approach
  • Conformed Dimensions and Bus
  • Matrix Bus

Agile allows us to eliminate the key Data Warehouse failure modes often seen on standard waterfall-based projects.

DW Failure Mode How Agile Eliminates or Mitigates
Insufficient business involvement Fosters close collaboration with the customer. PO acts as surrogate for customer
Underestimating the complexity and scope Spikes, Walking Skeletons, progressive elaboration, and planning patterns
Not anticipating or allowing change Agile expects and embraces change. Iterative development addresses change.
Misunderstood expectations Close collaboration with the customer, Agile Modeling,  Specification By Example, Behavior Driven Development (BDD), and progressive elaboration
Over-complicated architecture Use of Minimal Viable Architecture, Architectural Runway, and just in time modeling
Poor understanding of the data Close collaboration with the customer, Agile Modeling, Specification By Example, BDD, and progressive elaboration

 

Mitigating Failure

Avoiding failure consists of “architecting” to support the delivery of working DW/BI functionality early and continuously to our customer. Applying best practices and creating a flexible, extensible data architecture to support evolutionary development is also key.

For example:

  • Close collaboration between Customers, DBAs, testers, and developers
  • Each developer gets their own database instance and test data
  • Continuous integration into the shared master database and code trunk
  • Automate the refactoring and testing as much as possible
  • Automatically update the developer instances whenever the master is changed
  • Have a clear data access layer within the code
  • Employ Continuous Integration (CI) and Continuous Delivery/Deployment (CD)
  • Beware of delivering one-off solutions

 

Laying the Foundation – Conformed Dimensions

Conformed dimensions are sets of descriptive master reference data that are addressed in multiple dimensional models.  These are fundamental to the Kimball approach and enable Agile DW/BI by levering the relevant conformed dimensions.

To create conformed dimensions, start by identifying a subset of entities that have significance across the enterprise and iteratively grow from there. For these entities, define common names and attributes for like entities. Failure to create conformed dimensions at the start will result in significant technical debit and is one of the key reasons for Agile Data Warehouse project failure.

 

Bus Matrix

The key to implementing Kimball’s methodology is the creation of a bus matrix. These are laid out as follows:

  • Each column is a conformed dimension
  • Separate columns describe other information associated to each business process i.e. Owner, etc.
  • Every row is a business process
  • Each dimension is associated to a process by placing an X in the intersecting cell

 

Simple Example for a Pizza Parlor

Process Customer

Name

Customer

Address

Order

Number

Pizza Type Process Owner
Take Order X X X X Maria
Make Pizza X X Giovanni
Deliver Pizza X X X Fred

 

A Bus Matrix meets the Agile principle of just enough documentation. It can be created in a matter of days in collaborative workshops with the right people at the table and a skilled facilitator. Good business and Data SME’s are crucial as a solid understanding of data and processes is required for success. Output will inform release and iteration planning as well as provide a list of reusable common dimensions.

Focusing on one process row at a time reduces risks that result from overly-ambitious plans and supports the Agile principle of rapid development of valuable software. Each row can then be tackled in individual iterations.

 

Agile Modeling

Agile Modeling, conceived by Scott Ambler, can be done concurrently or even in conjunction with creating the Bus Matrix. The goal of Agile Modeling is to iteratively develop models that are just barely good enough. The process starts with a light-weight envisioning session to create an initial domain model. With each iteration, the team develops just barely enough of the data model to support development of the sprint backlog.

Agile Database Development

Database Encapsulation Layer

Software architecture should include a database encapsulation layer, aka persistence layer or data layer. This layer hides the physical details of the database from the application code, so If the database changes, only this layer needs to be changed. The layer also consolidates all DB access code in ‘one’ place.

Variations include:

  • Single application, single DB – the simplest form
  • Multiple-applications, single DB – common when there is a legacy DB
  • Multiple applications, multiple DB
  • Direct SQL access, DAOs, Persistence Frameworks, or services

 

Database Refactoring

“Database refactorings” are design improvement changes to the schema that still preserve its behavioral and informational semantics. These includes both structural and functional aspects and in some ways can be considered as normalization after the fact.

Refactoring can involve doing three changes together:

  1. Changing the schema
  2. Migrating the data to the new schema
  3. Changing the code in the database encapsulation layer

Refactoring Enablers:

  • Regression testing
  • Strong configuration management
  • Close collaboration
  • The will to do it

Refactoring Examples include:

  • Apply Standard Types to Similar Data
  • Consolidate Key Strategy for Entity
  • Encapsulate Common Structure With View(s)
  • Introduce Column Constraint
  • Migrate Database Method to Application
  • Introduce Common Format
  • Rename Column
  • Introduce Lookup Table
  • Replace One-To-Many With Associative Table
  • Replace View With Method(s)

 

Hyper-Normalization – Taking Refactoring to the Next Level

Hyper-Normalization goes beyond the 3rd Normal Form. It consists of a data vault with attributes stored in satellite tables and with the foreign keys stored in link tables. This allows changes to data relationship without changing the data(hub) tables, thereby creating a data structure more conducive to Agile development.  A variant is Hyper-Generalization, where all hub data is moved to a single table and has a table of tables to identify which rows belong to what data category. In addition, only one link table is needed. While it reduces code and data structure complexity which makes it easier to make changes, it is at the price of more complex queries.

 

In Closing

While this article has only scratched the surface of the subject of Agile Data Warehouse implementation, I hope that you can now see that it is not only possible to implement a Data Warehouse with Agile, but that it is actually the best approach.

Type to search blog.learningtree.com

Do you mean "" ?

Sorry, no results were found for your query.

Please check your spelling and try your search again.