TDAN: The Data Administration Newsletter, Since 1997

THE DATA ADMINISTRATION NEWSLETTER – TDAN.com
ROBERT S. SEINER – PUBLISHER

Subscribe to TDAN

TDWI
Dataversity
Data Governance Winter
DGI Conference
Master Data Management

   > home > newsletter > article
 Printer-friendly
 E-mail to friend

Temporal Versioning

by Paul Fong, Dan Snyder
Published: October 1, 2008
This article explains temporal versioning and why it is a useful design approach for tracking changes in operational systems so that users can explain why “the system” did what it did.

The Need

“What do you mean my claim was rejected? Why?”

…Irate caller to customer service representative

Do your customer reps frequently need to be able to explain why “the system” did what it did? Then your system needs to keep careful track of changes. This article presents an innovative design approach for operational systems to track changes, including retroactive and future dated ones, so that users can answer these types of questions and more.

For most business systems, a change to a birth date or gender is treated like a correction, where the value is simply updated in place. This is because a person can’t change his/her birth date so it must have been incorrect; and when a change in gender is required, it’s usually not necessary to keep the old value.

However, the authors worked on an insurance claims processing system where these and most other changes had to be tracked because the fields were used to adjudicate claims. For example, a claim for birth control pills may have been denied if the gender of the patient was incorrectly recorded as male. The business needed to be able to make the change in gender retroactive, but also needed to preserve the old value to be able to explain why the claim adjudicated the way it did.

Another challenge with this particular system is that claims received needed to be processed with the values that were in effect on the claim’s date of service, which could have been a year ago. Most systems process items like orders received today with the values in effect today. Not so for claims. The member’s plan may have changed, but the claim has to be processed with the plan as it was on the date of service.

The Solution

These requirements are addressed by keeping track of changes along two different timelines – a business timeline and a chronological timeline. The business timeline records when values are in effect for business purposes, i.e., the periods of effectiveness for various states. This timeline supports retroactive or future dated changes.

The chronological timeline records when changes were actually made over time, literally “arranged in the order of time” – so no retroactive or future dated changes are supported on this timeline. Tracking changes along these two timelines allows a system to accurately answer the following questions for an instance of a business entity, such as a member:

  • As of right now, what is in effect on May 1, 2008?

  • Last week Tuesday at 10:30 a.m., what was in effect on May 1, 2008?


Figures 1, 2 and 3 below illustrate these two concepts using a highly simplified example of a member whose gender and preferred language fields change over the following series of events:

  1. On 2007/4/1 a member with a gender of Male and a preferred language of English, effective from 2006/1/1 to the end of time was created. Let’s call this initial state A.

  2. On 2007/7/15 the member’s gender was corrected from Male to Female, effective 2006/1/1 to the end of time. Let’s call this state B.

  3. On 2007/8/6, the member’s preferred language was updated from English to French, effective 2007/1/1/ to the end of time. Let’s call this state C.


alt

Figure 1: Business vs. Chronological Timeline Event 1

“On 2007/4/1 a member with a gender of Male and a preferred language of English, effective from 2006/1/1 to the end of time was created. State A.”

alt

Figure 2: Business vs. Chronological Timeline Event 2

“On 2007/7/15 the member’s gender was corrected from Male to Female, effective 2006/1/1 to the end of time. State B.”

Here is what is going on behind the scenes at the database level in this event. Since the requirement is to be able to explain why the system did what it did, true updates are not allowed. So, when the member’s gender was corrected from Male to Female in this second event:

  • The original row with the state of A (Male, English, etc.) is inactivated.

  • A new active row with a state of B (Female, English, etc.), effective 2006/1/1/ to the end of time, is created on 2007/7/15.

State A is effectively replaced by B in this event.

alt

Figure 3: Business vs. Chronological Timeline Event 3

“On 2007/8/6, the member’s preferred language was updated from English to French, effective 2007/1/1/ to the end of time. State C.”

Now, when the member’s preferred language is updated from English to French in this third event, there is more going on behind the scenes at the database level:

  • The row with a state of B (Female, English, etc.) is inactivated, and

  • A new active row, also with a state of B, is created to cover the period from 2006/1/1 to 2006/12/31, (the day before C is effective), and

  • A new active row with a state of C (Female, French, etc.), effective from 2007/1/1 to the end of time, is created.

There are many other possible scenarios not shown here. It is important to discuss and document all the different scenarios that are permitted and how each will be handled. For example, how should an update, with a specific end date, that is in the middle of a state that goes to the end of time be handled? Should the original state continue after the end of the update, or end before the start of the update? The section on Eliciting Requirements discusses the challenges in teasing out requirements.

The Gory Details

The following fields were used to implement the two timeline concepts:

  • Effective Date and Expiry Date to record the periods of effectiveness on the business timeline,

  • Create Date/Timestamp and Last Update Date/Timestamp to record when changes were actually made, on the chronological timeline, and

  • Record Status, to record whether a particular row is “active” or “inactive”.

    Instead of a record status, a second date/timestamp could have been used to end date records in chronological time, with each record in chronological time being ended by the creation of a new record that overlaps it in business time, similar to Expiry Date for business time. However, by using a record status, queries are simplified by being able to look for rows where the record status is active or inactive, and the maintenance logic is simplified. Also, since this kind of application can generate large volumes of data, indexes can help speed access to the most interesting rows – the “active” ones!

Now, how should we relate two business entities that are temporally versioned? If they each will have one or more temporal versions (i.e., rows with different keys), it will be difficult over time to preserve the relationship. So, we’ll need to create “independent identity anchors” that uniquely identify instances of the business entities being modeled with keys that remain constant over time, and are not temporally versioned, and then relate these to each other. Those fields whose changes need to be tracked should be logically grouped into temporally versioned “dependent child entities” of the independent identity anchors. Fields that never change, or those whose changes do not need to be tracked, can be placed in the independent identity anchors.

Figure 4 below illustrates these ideas with members and groups, where members may belong to different groups over time:

 alt

Figure 4: Member and Group


In this entity relationship diagram, MEMBER and GROUP are independent identity anchors that are not temporally versioned, where each row represents individual members and groups respectively. MEMBER PROFILE and MEMBER ADDRESS are dependent child entities of MEMBER that are temporally versioned. GROUP PROFILE is a dependent child entity of GROUP that is likewise temporally versioned.

GROUP MEMBERSHIP is an associative entity between MEMBER and GROUP that records the periods of effectiveness of the relationship, and that supports the many-to-many relationship between members and groups – a group can have many members, and a member can belong to many groups over time, but only to one group at a time. GROUP MEMBERSHIP is also temporally versioned.

We discovered that two different sets of temporal versioning rules were required. The first is for entities that are permanent in nature – there should be a row in effect at all times. The second is for entities that are temporary in nature – that are effective for a period of time, but may not be effective for other periods. The main rules are:

Permanent entity rules (e.g., for dependent child entities):

  1. Overlapping active rows is not permitted, i.e., only one active row can be in effect on any date.

  2. No gaps in periods of effectiveness.

  3. The row with the greatest business effective date is open-ended, i.e., is effective to the “end-of-time.”

Temporary entity rules (e.g., for associative entities):

  1. Overlapping active rows is not permitted, i.e., only one active row can be in effect on any date.

  2. Gaps are permitted.

  3. The row with the greatest business effective date may be end-dated, i.e., does not have to be effective to the “end-of-time.”

In Figure 4 above, MEMBER PROFILE follows the permanent entity rules, because even if the member’s status says he is deceased that row should be effective from the date of death until the end of time. GROUP MEMBERSHIP naturally follows the temporary entity rules.

Design Recommendations / Decisions

First, we recommend that inactivation of states that have been overlapped by a new state, and the creation of new gap-filling boundary states, be managed in the database to ensure consistency, rather than leaving it up to each developer to do correctly.

Second, in order to preserve history we recommended that row updates not be permitted, except to update the record status to inactive. You can use database triggers and stored procedures to intercept application updates and replace them with one or more inserts.

You do have to decide the granularity of your business date/times and how boundary conditions are to be resolved. (e.g., are expiry dates the last day the state is effective or the first day it is no longer effective?)

You also have to decide what value will represent the “end of time,” whether it is some arbitrary future date, the greatest date that your RDBMS can store, e.g., Dec 31, 9999, or null. Our preference is for the greatest date that your RDBMS can store. An arbitrary future date leaves the design open to multiple arbitrary future dates, each one meaning something different. A null requires different SQL to correctly find the period of effectiveness that goes until the end of time, rather than for other periods of effectiveness with a real expiry date. Using the greatest date that your RDBMS can store simplifies the development effort by making the SQL consistent for any period of effectiveness. (You may also find a need for the related concept of “beginning of time.”)

Querying Temporally Versioned Data

Dependent child entities and associative entities, as modeled in Figure 4, have date/time ranges on both time dimensions. Each row (state) on these entities applies to a business date range and to a chronological date range. So, to query data that is temporally versioned this way, such that a single state can be chosen, it is necessary to provide two reference points in time: a business date/time and a chronological date/time.

A typical query in the claim processing system used the date of the service as the business date (which could be in the past, present, or future) and “now” as the chronological date. Only on rare occasions, when it was necessary to find out why “the system” did something in the past did the chronological date need to vary.

A further note on chronological dates at query time: By choosing to use a record status to mark records as active or inactive, as was done in the example, it made it possible to effectively ignore the date/time on the chronological scale when seeking a state that is chronologically current. Instead, the queries simply filtered out “inactive” records since the set of active records are a non-overlapping sequence of states representing what the system knows to be current. So, a query need only specify a point on the business timeline to be able to isolate a single state!

Eliciting Requirements – When Words Fail, Try Pictures!

Not surprisingly, it is a challenge to discuss multiple time dimensions. For some people, coming up with clearly defined terms is enough; for others, it is necessary to use pictures. Whatever you choose to use, it is critical that your language and/or symbols are well defined to be able to document the use cases.

Words: Some of the terms we found helpful were “point of interest” to represent the point in time on the business timeline and “point of view” to represent the point in time on the chronological timeline.

Pictures: You could use a graph like the one in Figure 1 that uses one axis for each time dimension. Or, you could use more of a state transition diagram that shows tbefore, action, and tafter. Using this approach we were able to document the use cases by representing each as a set of pre-existing temporal versions (tbefore), a new or changing state (action), and a resulting set of temporal versions (tafter).

Summary

Temporal versioning is a useful design approach for tracking changes in operational systems so that users can explain why “the system” did what it did. It relies on tracking changes along two time dimensions, a business timeline and a chronological timeline, using an effective date and an expiry date for the business timeline, and a create date/timestamp and a record status for the chronological timeline.

Modeling consists of identifying independent identity anchors that are not temporally versioned, containing fields that never change, or fields whose changes do not need to be tracked, and temporally versioned dependent child entities for those fields whose changes need to be tracked, as well as temporally versioned associative entities to relate independent identity anchors to each other.

Two different sets of temporal versioning rules are required: the first for entities that are permanent in nature (i.e., where there should be a row in effect at all times) and the second for entities that are temporary in nature (i.e., that are effective for a period of time, but may not be effective for all periods).


Reference:

Developing Time-Oriented Database Applications in SQL, 2000, by Richard T. Snodgrass.

Go to Current Issue | Go to Issue Archive

Paul Fong - Paul is a data and business intelligence architect based in Oakville, Ontario. He has over 25 years of consulting experience in information technology in a wide variety of industries including retail, financial, insurance and airlines, for both operational and data warehouse systems. He has held key leadership positions on numerous projects and can be contacted by email at pfong@performancekey.com.
Dan Snyder - Dan is a business systems analyst based in Oakville, Ontario. He has more than 18 years of experience in data and application solution delivery. His focus is on requirements capture and high-level solution design for back-end database applications. His industry experience is in investments, insurance and pensions. He can be contacted at td_snyder@sympatico.ca.