Using Levels of Abstraction to Name Data Elements
Published: December 1, 1998
Introduction

Naming conventions for data elements are part of the toolset of data administrators. A naming convention is a collection of rules which, when applied to data, results in a set of data elements named in a logical and standardized way. These names concisely inform the user about the contents of the data value domain (the set of possible valid values for a data element) and the usage of the data element [ISO P]. When gathered into a repository or data registry, this collection of metadata helps users achieve efficient use and reuse of data while maximizing understanding of information both within and outside their organization.

The International Standard ISO 11179, Information Technology - Specification and Standardization of Data Elements, describes a set of rules for developing naming conventions together with standards for data classification, attribution, definition and registration. Part 5 of this standard is Naming and Identification Principles for Data Elements [ISO 5]. This article is based on the information in that standard.

Types of Names

Data elements are ideally the result of a development process involving several levels of abstraction. Levels progress from the most general (conceptual) to the most specific (physical). The objects at each level are called data element components; their names become name components. Using the Zachman Framework, for instance, the highest levels of definition are contained in the business view; development progresses to the implemented system level. Components are defined and combined differently at each level. Each component contributes its name, or part of its name, to the final products. The rules by which these component names are combined constitute a data element naming convention. In addition, one data element may have many names depending on its context of use. Naming conventions must reflect this multiplicity.
After the conceptual components are developed by a process of specification from the highest conceptual level, a representation term is assigned, which may in turn be derived from a structure set or process. Components are envisioned as a set of building blocks that can be assembled into data elements, and serve to ensure that the end product, the total set of data elements, is as discrete and complete as possible.

Names derived in this way serve as the primary means of identification for elements external to the systems that process them. Within physical systems, however, names are subject to constraints imposed by software limitations. Other names may be used by reports or EDI files. Provision for identification of synonymous names is made through sets of name-context pairs in the element description.

Since many names may be associated with a single data element, it is important to also use a unique identifier, usually in the form of a number, to distinguish each data element from any other. ISO 11179-5 discusses assigning this identifier at the international registry level. Both the identifier and at least one name are considered necessary to comply with ISO 11179-5. Each organization should decide the form of identifier best suited to its individual requirements.

Levels of Abstraction

Name development begins at the conceptual level (see Figure 1). At this stage, a set of concepts exists as entities or objects (called object classes), which, with the assignment of properties, become data element concepts (DECs). An object class represents an idea, abstraction or thing in the real world, such as tree or country. A property is something that describes all objects in the class, such as height or identifier. Each of these components has its own name. When applied to data element names, these are called the object class term and the property term. DECs are named by combining the object class term and the property term.
From the examples above, we can form the DECs tree height and country identifier. DECs also contain conceptual domains, which are composed of value meanings. These value meanings are defined but do not have a specific form of representation (Figure 2).

The next step in forming data element names takes place at the logical level. A complete logical data element must include a form of representation for the values in its data value domain (the set of possible valid values of a data element). The representation term describes the data element's representation class. The representation class is equivalent to the class word of the prime/class naming convention that many data administrators are familiar with. For example, name, code, and measure can be applied to the DECs above to produce tree height measure, country identifier name and country identifier code.
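The two steps described so far, naming a DEC from an object class term and a property term and then appending a representation term, can be sketched in a few lines of Python. This is only an illustration: the names DataElementConcept and logical_name are hypothetical and do not come from ISO 11179, and joining terms with single spaces is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElementConcept:
    """Conceptual-level component (hypothetical class, for illustration only)."""
    object_class_term: str
    property_term: str

    @property
    def name(self) -> str:
        # A DEC is named by combining the object class term and the property term.
        return f"{self.object_class_term} {self.property_term}"


def logical_name(dec: DataElementConcept, representation_term: str) -> str:
    """Logical-level name: the DEC name followed by a representation term."""
    return f"{dec.name} {representation_term}"


tree_height = DataElementConcept("tree", "height")
country_identifier = DataElementConcept("country", "identifier")

print(tree_height.name)                          # tree height
print(logical_name(tree_height, "measure"))      # tree height measure
print(logical_name(country_identifier, "name"))  # country identifier name
print(logical_name(country_identifier, "code"))  # country identifier code
```

Keeping the DEC as its own object mirrors the text: one concept can yield several logical data elements, one per representation term.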
Notice that identifier name and identifier code are somewhat redundant. A naming convention could include a rule that eliminates this redundancy by allowing the property term to be dropped in such cases, giving country name and country code. The property would still exist as part of the inheritance structure of the data element, but it would not be visible in the data element name. Such a rule makes concise names easier to achieve.

Some logical data elements can be considered generic elements. These are data elements that have a well-established data value domain and are recognized at the organizational level or above as useful and shared among several systems. Country name and country code are both potential candidates for designation as generic elements. ISO 3166, Codes for the representation of names of countries, presents a well-established reference list of country names and codes. Note that the logical level is the highest level at which true data elements, by the definition of ISO 11179, appear: they have an object class, a property, and a representation.

The next level of data element development is the application level. Typically, a data element is customized to an application by subsetting its data value domain or narrowing its definition (or both) to include only those values of interest to the application. The name is changed to reflect this by adding qualifier terms to the logical name. For example, if an application of Country name were to list all the countries with which a certain organization had trading agreements, the application data element would be called Trading partner country name. Its data value domain would consist of a subset of the countries listed in ISO 3166. Note that the qualifier term trading partner is itself an object class. This relationship could be expressed as a hierarchy in the data model.

The last type of name is the physical name.
These are the names that actually appear in database table column headers, file descriptions, EDI transaction file layouts, and so on. They will have abbreviations and possibly other accommodations to the restrictions of a particular software system, and they may also carry additional information about their origin or format. For example, trd-ptnr-3166-Eng-name may appear in an EDI transaction file. (Expanded, this name would read Trading partner ISO 3166 English name.)

In a registry, each of the above names and name components is always paired with a context attribute, which identifies the source or usage of the name or name component. One registry entry gathers all the names of each data element and allows users to trace every appearance of that data element, no matter what name it is using at the time.

Principles of Naming Conventions

We have seen that components of data elements have names. By combining these names in a specific way, that is, by following the naming rules, standardized names are given to data elements. These rules will vary depending on the requirements of each organization developing data elements, but the basic principles for developing rule sets are constant. There are three kinds of rules that form a complete naming convention: semantic rules, syntax rules, and lexical rules.
While the following naming convention is oriented to the development of application-level names, the rule set may be adapted to the development of names at any level.

An Example Naming Convention

This naming convention is adapted from Annex A of ISO 11179-5.

Semantic Rules

These are rules based on the meaning of name components.

Syntax Rules

These rules specify the arrangement of name components.

Lexical Rules

These rules determine the standard "look" of names.
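
Rules of these three kinds lend themselves to mechanical application. The Python sketch below assumes a small illustrative rule set: every name needs an object class term, a property term, and a representation term from a controlled list (semantic); qualifiers come first and the representation term comes last, with the property term optionally hidden as discussed earlier (syntax); and words are lowercased, separated by single spaces, with the first letter capitalized (lexical). These specific rules, the function and field names, and the short representation term list are assumptions made for the example; they are not quoted from Annex A of ISO 11179-5.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical controlled list; a real convention defines and maintains its own.
REPRESENTATION_TERMS = {"amount", "code", "measure", "name"}


@dataclass
class NameComponents:
    object_class_term: str
    property_term: str
    representation_term: str
    qualifier_terms: List[str] = field(default_factory=list)
    hide_property_term: bool = False  # assumed redundancy rule, e.g. "identifier name"


def check_semantics(c: NameComponents) -> List[str]:
    """Semantic rules: which components must be present and what they may be."""
    problems = []
    if not c.object_class_term.strip():
        problems.append("object class term is required")
    if not c.property_term.strip():
        problems.append("property term is required")
    if c.representation_term.strip().lower() not in REPRESENTATION_TERMS:
        problems.append(f"unknown representation term: {c.representation_term!r}")
    return problems


def arrange(c: NameComponents) -> List[str]:
    """Syntax rules: qualifiers, object class, property, representation, in order."""
    words = list(c.qualifier_terms) + [c.object_class_term]
    if not c.hide_property_term:
        words.append(c.property_term)
    words.append(c.representation_term)
    return words


def format_name(words: List[str]) -> str:
    """Lexical rules: lowercase words, single spaces, first letter capitalized."""
    text = " ".join(w.strip().lower() for w in words)
    return text[:1].upper() + text[1:]


def build_name(c: NameComponents) -> str:
    problems = check_semantics(c)
    if problems:
        raise ValueError("; ".join(problems))
    return format_name(arrange(c))


print(build_name(NameComponents("tree", "height", "measure")))
# Tree height measure
print(build_name(NameComponents("country", "identifier", "name",
                                qualifier_terms=["trading partner"],
                                hide_property_term=True)))
# Trading partner country name
```

Separating the three rule kinds into three functions lets an organization change one kind, for example a different word separator, without touching the others. Note that hiding the property term affects only the rendered name; the component itself stays in the structure.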

Representation Term List

Representation terms must be strictly controlled. Their definitions should allow the user to easily decide which term is most appropriate for each data element. This list of representation terms and definitions has been updated from the Class Word list in Guide on Data Entity Naming Conventions [NEWT].
Amount - Monetary quantity.

In addition to these representation terms for data elements, one more term for group elements is appropriate:

Group - Indicates a designation for a set of data elements that have relationships to each other. For example: Employee Address Group.

This article is a contribution of the National Institute of Standards and Technology, not subject to copyright in the United States.

References

[ISO 5] ISO/IEC International Standard 11179-5, Information technology - Specification and standardization of data elements, Part 5: Naming and identification principles for data elements, International Organization for Standardization, Geneva, January 1996.
[ISO P] ISO/IEC PDTR 15452, Information Technology - Specification of Data Value Domains, August 1998.
[NEWT] Newton, Judith, Guide on Data Entity Naming Conventions, NIST Special Publication 500-149, Gaithersburg, MD, October 1987.

Judith Newton - Judith is Principal of Ashton Computing and Management Services, LLC, a consulting firm specializing in web design and metadata development. She is currently the Senior Analyst for two metadata registry development projects.
She is a U.S. delegate to the International Standards Organization Subcommittee for Data Management and Interchange (ISO/IEC JTC 1/SC 32), Working Group 2, Metadata, and author and editor of the ISO Standard on Metadata Registries: Naming and Identification Principles (ISO/IEC 11179-5) and the technical report Specification of Data Value Domains (ISO/IEC TR 15452). She is editor of the technical report on Procedures for Achieving Metadata Registry Content Consistency: Data Elements (ISO/IEC PDTR 20943-1). She is a member of ANSI INCITS L8, Metadata, which is the U.S. TAG to SC 32/WG 2. As Chair of the L8 Task Group for Technical Development, she led the technical development and consensus process to achieve completion of products at the national and international levels. Ms. Newton is a past member of the American National Standards Accredited Committee for Information Resource Dictionary System (X3H4). In 1992 she chaired the Task Group that produced the Technical Report IRDS Support for Naming Convention Verification (ANSI X3/TR-11-92), addressing the feasibility of an automated naming tool for the IRDS.

Judith was employed by the National Institute of Standards and Technology from 1979 to 2004. At NIST, her most recent project involved study of the synergy between XML registries and 11179-based metadata registries. Other projects have addressed enterprise data modeling, data repositories, and semantic interoperability. In a consultant capacity, she has advised several agencies and Federal committees on metadata usage, among them EPA, DoD (DISA), and Navy. She also served as president of the Data Administration Management Association (DAMA) National Capital Region Chapter (DAMA-NCR) from its founding in 1987 to 1990, and chaired the highly successful DAMA Symposia in 1988, 1989, 1990 and 2001. She continues to serve on the Executive Board of DAMA-NCR.
She served on the Program Committee for the DAMA-International/Metadata Symposium 2000, and the DAMA-NCR Symposium 2003. From 1973 to 1979, she was employed by Navy Regional Data Automation Command (NARDAC), Washington, D.C. to develop and maintain the RAS STADES system, an early effort to manage standard data elements using a data element dictionary system. She was the recipient of the 2001 InterNational Committee for Information Technology Standards (INCITS) Merit Award, and the 2005 DAMA-International Government Award. She is a graduate of Temple University.