|
Defining Keys
Published: July 1, 2006
2006 Published in TDAN.com July 2006
IntroductionWe all know that keys are essential to relational structures. It is crucial that we be able to differentiate between one instance and another. Consider the following:
Person
Name Jim Smith Name here is inadequate as a key; consider what happens when we add additional instance with the same name:
Person
Name Jim Smith Jim Smith How do we tell the differences between the two? Can we assume that we can use occurrence? The second instance was inserted before the first (just to be perverse). In a relational structure you cannot dependent on occurrence to differentiate between instances. The solution here is to:
Natural KeysA natural key is a collection of attributes that basically insure that an instance is uniquely identified in all instances. Natural keys can be hard to come by. An attribute is an inherent characteristic. For example the only inherent characteristics that uniquely identify a person are biometric in nature (e.g. DNA signatures, fingerprints etc). Unfortunately, the biometric technology is not mature enough to use these characteristics as natural neys. As a consequence, Person entities are almost always assigned a surrogate key. Lets us then examine an entity that has readily identifiable Natural Keys; Automobile.. ISO 3779 mandates a structure for Vehicle Identification Numbers to be used world wide to uniquely identify Automobiles. The mandated structure for a VIN is:
World Manufacturer Identifier The Automobile entity would look like:
Automobile Use of the ISO VIN structure permits us to identify 2 occurrences of an Automobile that are otherwise exactly identical. Natural Keys Selection Rules
Must ensure uniqueness The previous discussion dealt with why occurrences of attributes must be unique. To re-iterate, we have to have a mechanism that allows us to differentiate between two otherwise identical occurrences. Cannot be null in whole or in part In a relational structure, a null value equates to an 'unknown' value. If one allows key attributes to assume null values, once the value for the instance becomes 'known', then a the possibility of a resulting duplicate exists. Must have an invariant value Once a value is assigned to a primary key attribute, it cannot change. Typically RDBMS's do not handle changes in key values well. Furthermore, changes in primary key values will cause cascading change to foreign key values in related tables. Consider the following
Invoice
Invoice Line Let us change the value for the Invoice instance from 123 to 456. In order to maintain the relation integrity, we have to change the foreign key reference in Invoice Line. Not that invoice number is part of a concatenated primary key. As a result of changing the value of the invoice number foreign key reference in Invoice Line, we also have to change any foreign key references to Invoice Line and so on and so forth. Must have an invariant format The comments made regarding invariant values also apply to the format for the primary key attribute. Surrogate KeysA 'surrogate key' is an arbitrary artificial attribute / column. In tables they quite often are implemented as identity columns So when does one use a 'surrogate key'? There are two circumstances where the use of a 'surrogate key' is required:
No suitable natural key exists Let us consider our Person entity again.
Person As mentioned previously, Person does not typically have a suitable natural key attribute. So what we do is assign a arbitrary attribute to the entity to ensure that we will always have a unique value.
Person Too Long a concatenated key Let us assume that an entity has a concatenated key {CK1, CK2, CK3, ... CK255}. A table implemented with 255 columns in its primary key index would exhibit unacceptably poor performance in most RDBMS's. So quite often a 'surrogate key' column is created for the table and the concatenated key becomes a unique alternate index. At what point a surrogate key becomes necessary varies from RDBMS to RDBMS. The decision to define a surrogate key column should belong to the DBA who is better equipped to evaluate the performance issues.
Copyright © Maximum Business Solutions
Go to Current Issue | Go to Issue Archive
James P. O'Brien -
James O'Brien is a IT consultant with over 30 years of experience in the industry. For the last 10 years he has specialized in Data Architecture. His clients included numerous fortune 500 clients both in the United States, Canada and Europe. As a certified product instructor he taught classes in Data Modeling and Model Management all over the United States. |