TDAN: The Data Administration Newsletter, Since 1997

THE DATA ADMINISTRATION NEWSLETTER – TDAN.com
ROBERT S. SEINER – PUBLISHER

Terminology, Semantics and the Meanings of Names

by Judith Newton
Published: July 10, 2007
Basic terminology principles show how naming conventions can be formed and then applied, so that the meanings of objects are related to the development of their names in a structured way.

Why Think About Theory?

Business users of information management systems often consider semantics an abstract science with little application to their day-to-day problems. Applying research from the academic study of meaning, however, can yield metadata entity names and definitions that convey immediate meaning to users. Meaningful names can increase efficiency and reduce the level of effort needed to transfer information through and beyond the enterprise.

Terminology research applies linguistic theory to classifying things in the real world. An introduction to some basic terminology principles will help the reader understand how naming conventions can be formed. These conventions can then be applied to develop well-formed names.

Semantics and Names

Naming of data entities (data elements, value domains, attributes, etc.) in a rational and organized way is an integral part of the metadata management of an organization. Artificial intelligence researchers try to develop languages that have restricted vocabularies, rules and constraints so that their meanings may be easily interpreted by both machine and human intelligence. These are called controlled languages. Naming conventions are sets of such rules applied to names. Unlike natural language names, which have evolved from many influences so that any particular name may or may not describe the thing named, the goal for metadata naming is to have maximum clarity and transparency of meaning, combined with concision, with minimal effort of interpretation by the end user.
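As a minimal sketch of how such a rule set might be checked by machine, the following Python fragment validates candidate names against a small controlled vocabulary. The vocabulary, the three rules (capitalized alphabetic words, a final word drawn from a short list of representation words, at most four words) and the function name check_name are all illustrative assumptions, not rules taken from any standard.

```python
import re

# Hypothetical controlled vocabulary; a real one would be drawn from
# the enterprise's own concept system.
ALLOWED_TERMS = {
    "Employee", "Customer", "Birth", "Mailing", "Address",
    "Street", "Name", "Date", "Amount", "Code",
}
# Hypothetical subset of words allowed in the final position of a name.
REPRESENTATION_TERMS = {"Name", "Date", "Amount", "Code"}
MAX_WORDS = 4


def check_name(name: str) -> list[str]:
    """Return the rule violations for a name; an empty list means it conforms."""
    words = name.split()
    if not words:
        return ["name is empty"]
    problems = []
    if len(words) > MAX_WORDS:
        problems.append(f"more than {MAX_WORDS} words")
    for word in words:
        if not re.fullmatch(r"[A-Z][a-z]+", word):
            problems.append(f"'{word}' is not a capitalized alphabetic word")
        elif word not in ALLOWED_TERMS:
            problems.append(f"'{word}' is not in the controlled vocabulary")
    if words[-1] not in REPRESENTATION_TERMS:
        problems.append("last word is not a representation term")
    return problems


print(check_name("Employee Birth Date"))
print(check_name("emp_dob"))
```

Under these assumed rules, "Employee Birth Date" passes, while a cryptic name such as "emp_dob" is rejected with a list of reasons that a person can read.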

Concept Systems

Concept systems consist of sets of concepts ordered according to the relationships among them [ISO 108]. These can be as simple as ordered lists of keywords, or as complicated as taxonomies and ontologies. A data model is the most common example of a concept system used by data managers. It is the primary source of the components used to form rational data entity names.

Designations

A concept is a unit of knowledge created by a unique combination of characteristics [ISO 108]. There are two types of concept:

  • A general concept corresponds to two or more objects that form a group by reason of common properties;

  • An individual concept corresponds to only one object.

Relationships among concepts in concept systems provide clues to structure names, which may then be codified in naming conventions. Some of the relationships defined in ISO 1087-1 are:

Hierarchical relation – a relation between two concepts, which is either generic or partitive.

  • Generic: the definition of one concept includes that of the other and at least one additional distinguishing characteristic (also known as an IS-A relationship – e.g., an employee is a person)

  • Partitive: one of the concepts constitutes the whole and the other a part of that whole (also known as a PART-OF relationship – e.g., a street name is part of a mailing address).

Associative relation – a relation between two concepts having a non-hierarchical thematic connection by virtue of experience – e.g., "cost" and "amount."
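The relation types above can be sketched as a small data structure. This is only an illustration: the class ConceptSystem and its method names are hypothetical, each concept is assumed to have at most one broader concept and one whole, and the "manager" concept is an added example that does not come from ISO 1087-1.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptSystem:
    """Toy concept system holding generic, partitive and associative relations."""
    is_a: dict = field(default_factory=dict)      # generic: narrower -> broader
    part_of: dict = field(default_factory=dict)   # partitive: part -> whole
    associated: set = field(default_factory=set)  # unordered pairs of concepts

    def add_generic(self, narrower: str, broader: str) -> None:
        self.is_a[narrower] = broader

    def add_partitive(self, part: str, whole: str) -> None:
        self.part_of[part] = whole

    def add_associative(self, a: str, b: str) -> None:
        self.associated.add(frozenset((a, b)))

    def broader_concepts(self, concept: str) -> list:
        """Follow IS-A links upward, stopping if a cycle is detected."""
        chain = []
        while concept in self.is_a and self.is_a[concept] not in chain:
            concept = self.is_a[concept]
            chain.append(concept)
        return chain

    def are_associated(self, a: str, b: str) -> bool:
        return frozenset((a, b)) in self.associated


system = ConceptSystem()
system.add_generic("employee", "person")
system.add_generic("manager", "employee")  # hypothetical extra concept
system.add_partitive("street name", "mailing address")
system.add_associative("cost", "amount")

print(system.broader_concepts("manager"))
print(system.are_associated("amount", "cost"))
```

Storing associative pairs as unordered sets reflects the fact that the relation has no direction, whereas the two hierarchical relations are stored as directed links.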

A designation is the representation of a concept by a sign which denotes it [ISO 108]. Two ways to categorize a designation are shown in Figure 1:

Designation by kind. This designation sub-type consists of three entities that can be used in a process to develop well-formed names:

  • A term is a verbal designation of a general concept in a specific subject field (“Employee”).

  • An appellation is a verbal designation of an individual concept (“French”).

  • A symbol is a visual representation of a concept (“$”).

The three parts of designation by kind can be used as building blocks that guide the development of semantic rules, within a naming convention, for constructing names that convey meaning to human users. Together with rules about the relationships among the components and rules about the appearance of names, they can be used to form names that express information about the data in a grammar that is simpler than natural language but still understandable. Ideally, a name resembles a summary of the formal definition of the information being named.

Designation by intended use. Designations of this sub-type need not consist of names meant to convey meaning to a human user. Their primary use is to identify, locate or refer to a piece of data for use by software or another automated service. As such, they may be cryptic or unintelligible to a naïve user.

Figure 1: Designation and Its Components


This structure can be adapted to the process of developing metadata entity names. In this article, the term name refers to any result of the application of a process involving the three parts of designation by kind.

When naming classes of objects, terms for general concepts are preferred. Appellations and symbols are used as part of a name in combination with one or more terms, when a name contains more than just a term. Appellations may be used to name individual concepts. The use of symbols as sole name components should be avoided.
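One way to read these guidelines is as checks applied when a name is assembled from its components. The sketch below is an assumption about how that could be coded, not a prescribed method: Kind, Component and compose_name are hypothetical names, and joining components with a single space is an arbitrary choice.

```python
from enum import Enum
from typing import NamedTuple


class Kind(Enum):
    TERM = "term"                # designates a general concept
    APPELLATION = "appellation"  # designates an individual concept
    SYMBOL = "symbol"            # visual representation of a concept


class Component(NamedTuple):
    text: str
    kind: Kind


def compose_name(components: list, individual: bool = False) -> str:
    """Join components into a name, enforcing the guidelines stated above.

    individual=True means the name designates an individual concept, so an
    appellation may stand without a term.
    """
    if not components:
        raise ValueError("a name needs at least one component")
    kinds = {c.kind for c in components}
    if kinds == {Kind.SYMBOL}:
        raise ValueError("a name should not consist of symbols alone")
    if Kind.TERM not in kinds and not individual:
        raise ValueError("a name for a class of objects needs at least one term")
    return " ".join(c.text for c in components)


print(compose_name([Component("French", Kind.APPELLATION),
                    Component("Employee", Kind.TERM)]))
print(compose_name([Component("French", Kind.APPELLATION)], individual=True))
```

A name built only from "$" would be rejected, while "$" combined with a term would be accepted under these assumed checks.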

Relationships as defined above are used to determine the relationships of components of a name. These are applied to the semantic and syntactic rules of a naming convention.

Modeling Names

Enterprises that have developed a data model have a major tool for developing a rational system of names. This provides a firm basis for collecting and organizing metadata. The components of a traditional data model may be translated into meaningful information. The semantic information contained may be collected from anywhere in an enterprise's area of interest. Names can then be developed using the components of the model as building blocks of name parts.

Using a model for metadata, such as the conceptual metamodel depicted in Figure 2, users can store metadata about classifying, naming, identifying, defining, and registering information in order to make it understandable and shareable. Data about sources, usages, and derivation of information can be stored in a readily accessible form. This metamodel is the basis for the registry for the standard described in ISO/IEC 11179, Information technology - Metadata registries [ISO 111].

Using a conceptual metamodel allows relationships among differing representations and value sets of the same information to be mapped together in one place. This is useful, for instance, for tracing XML objects generated for interchange back to their original usage, and for documenting other usages of that information within an organization. Source information of this kind tends to get lost because XML structures focus on data syntax rather than on semantics or other kinds of metadata. The mapped information can then be used to avoid redundancy and reprocessing of information.

The metamodel components can be used in the development of entity names. A structure is developed in which higher-level component names are used to construct the lower-level names. Relationships among the components are reflected in the names, contributing to rationalization of name development and understandability.
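The idea of building lower-level names from higher-level component names can be sketched as follows. The sketch assumes a metamodel with object class, property and representation components, loosely following the term structure discussed in ISO/IEC 11179-5; the class names, the field names and the space-separated word order are illustrative assumptions rather than a reproduction of the metamodel in the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectClass:
    name: str


@dataclass(frozen=True)
class Property:
    name: str


@dataclass(frozen=True)
class DataElementConcept:
    """Higher-level component: an object class paired with a property."""
    object_class: ObjectClass
    prop: Property

    @property
    def name(self) -> str:
        # Derived from the component names rather than stored separately.
        return f"{self.object_class.name} {self.prop.name}"


@dataclass(frozen=True)
class DataElement:
    """Lower-level component: a concept plus a representation word."""
    concept: DataElementConcept
    representation: str

    @property
    def name(self) -> str:
        return f"{self.concept.name} {self.representation}"


concept = DataElementConcept(ObjectClass("Employee"), Property("Birth"))
element = DataElement(concept, "Date")
print(concept.name)
print(element.name)
```

Because each name is computed from its parent components, the relationship between a data element and its concept is visible in the name itself, and renaming a higher-level component propagates to the names derived from it.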

Figure 2: Conceptual Metamodel

Name Control

The proliferation of names has many causes. Each application of data has a unique set of requirements and restrictions that constrain the name used in that application. Determining the semantics of names is part of a broader issue of getting computers to “understand” meaning.

Since a name is a non-unique form of identification for a metadata entity, a unique identifier must also be associated with the registry entry. This is one of the several means by which a populated metamodel can maintain a complete set of metadata, including all names in all contexts of applications in which the entity is used, and all sources and targets of an entity used in data interchange.

Polysemy. Except within a controlled namespace, there is no guarantee of name uniqueness. Thus the possibility that two or more different data entities may use the same name must also be accounted for and controlled in the registry.
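A minimal sketch of how a registry might account for this is shown below. It assumes that every entry receives a unique identifier on registration and that each name is recorded together with the context in which it is used; the Registry class, its methods and the sample entries are hypothetical and do not describe any particular registry product.

```python
from collections import defaultdict
from itertools import count


class Registry:
    """Toy registry: unique identifiers for entries, non-unique names per context."""

    def __init__(self):
        self._next_id = count(1)
        self.definitions = {}  # identifier -> definition text
        self.usages = []       # (identifier, context, name) triples

    def register(self, definition: str) -> int:
        identifier = next(self._next_id)
        self.definitions[identifier] = definition
        return identifier

    def add_name(self, identifier: int, context: str, name: str) -> None:
        if identifier not in self.definitions:
            raise KeyError(identifier)
        self.usages.append((identifier, context, name))

    def shared_names(self) -> dict:
        """Return each name that designates more than one registered entry."""
        by_name = defaultdict(set)
        for identifier, _context, name in self.usages:
            by_name[name].add(identifier)
        return {name: ids for name, ids in by_name.items() if len(ids) > 1}


registry = Registry()
# Hypothetical entries that happen to share the name "Code" in two systems.
postal = registry.register("Code identifying a postal delivery area")
currency = registry.register("Code identifying a currency")
registry.add_name(postal, "shipping system", "Code")
registry.add_name(currency, "billing system", "Code")
registry.add_name(currency, "enterprise", "Currency Code")

print(registry.shared_names())
```

The identifier, not the name, is what distinguishes the two entries, so a report of shared names gives a data administrator a list of places where a name alone is ambiguous.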

Synonymy. In a metadata registry, one name may be designated as the "enterprise name," derived by describing the content of a metadata entity in a structured way, using a set of rules, i.e., by application of a formalized naming convention. Other names for the same data entity may occur in any context. For example, these may be:

  • Software system names

  • Programming language names

  • Report header names

  • Data interchange (e.g., XML) names

  • Names in other natural languages

They may have varying levels of rigor applied to their formation and usage. The collection and display of all names used by any one metadata entity is a major strength of the metadata registry. The process of deriving names from concept systems and arranging semantic components with a naming convention forms a set of consistent, meaningful enterprise names. Names from other contexts, which may or may not have been formed with naming conventions and therefore may have little or no semantic content, are collected and related to the enterprise name, thus contributing in a valuable way to enterprise data management.
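Relating names from other contexts to the enterprise name can be sketched as a simple lookup structure. The entry identifier, the context labels and the sample names below are invented for illustration, and RegistryEntry and build_index are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class RegistryEntry:
    identifier: str
    enterprise_name: str
    other_names: dict = field(default_factory=dict)  # context -> name

    def all_names(self) -> list:
        return [self.enterprise_name, *self.other_names.values()]


def build_index(entries: list) -> dict:
    """Map every known name to the enterprise names of the entries that use it."""
    index = {}
    for entry in entries:
        for name in entry.all_names():
            index.setdefault(name, set()).add(entry.enterprise_name)
    return index


# Hypothetical entry with names collected from several contexts.
birth_date = RegistryEntry(
    identifier="DE-0001",
    enterprise_name="Employee Birth Date",
    other_names={
        "software system": "EMP-DOB",
        "report header": "Date of Birth",
        "data interchange (XML)": "birthDate",
    },
)

index = build_index([birth_date])
print(index["EMP-DOB"])
print(birth_date.all_names())
```

With such an index, a cryptic system name can be resolved to the meaningful enterprise name, and all names used for one entry can be displayed together.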

Summary

Applying the principles developed by the terminology research community lets us relate the meanings of objects to the development of their names in a structured way. A name that conveys information about a business object aids the understanding of applications across an organization: all usages can be mapped to a name that anyone can understand, and names can be developed using sets of rules that anyone can apply.

References:

  1. [ISO 704] ISO 704:2000, Terminology work – Principles and methods, International Organization for Standardization, Geneva.
  2. [ISO 108] ISO 1087-1:2000, Terminology work – Vocabulary – Part 1: Theory and application, International Organization for Standardization, Geneva.
  3. [ISO 111] ISO/IEC 11179:2003, Information technology – Metadata registries (MDR) – Parts 1-6, International Organization for Standardization, Geneva. Available for download at:
    http://isotc.iso.org/livelink/livelink/fetch/2000/2489/Ittf_Home/PubliclyAvailableStandards.htm

Judith Newton - Judith is Principal of Ashton Computing and Management Services, LLC, a consulting firm specializing in web design and metadata development. She is currently the Senior Analyst for two metadata registry development projects.

She is a U.S. delegate to the International Organization for Standardization Subcommittee for Data Management and Interchange (ISO/IEC JTC 1/SC 32), Working Group 2, Metadata, and author and editor of the ISO Standard on Metadata Registries: Naming and Identification Principles (ISO/IEC 11179-5) and the technical report Specification of Data Value Domains (ISO/IEC TR 15452). She is editor of the technical report on Procedures for Achieving Metadata Registry Content Consistency: Data Elements (ISO/IEC PDTR 20943-1).

She is a member of ANSI INCITS L8, Metadata, which is the U.S. TAG to SC 32/WG 2. As Chair of the L8 Task Group for Technical Development, she led the technical development and consensus process to achieve completion of products at the national and international levels.

Ms. Newton is a past member of the American National Standards Accredited Committee for Information Resource Dictionary System (X3H4). In 1992 she chaired the Task Group that produced the Technical Report IRDS Support for Naming Convention Verification (ANSI X3/TR-11-92), addressing the feasibility of an automated naming tool for the IRDS.

Judith was employed by the National Institute of Standards and Technology from 1979 to 2004. At NIST, her most recent project involved a study of the synergy between XML registries and 11179-based metadata registries. Other projects have addressed enterprise data modeling, data repositories, and semantic interoperability. In a consultant capacity, she has advised several agencies and Federal committees on metadata usage, among them EPA, DoD (DISA), and Navy.

She has also served as president of the Data Administration Management Association (DAMA) National Capital Region Chapter (DAMA-NCR), from its founding in 1987 to 1990, and chaired the highly successful DAMA Symposia in 1988, 1989, 1990 and 2001. She continues to serve on the Executive Board of DAMA-NCR. She served on the Program Committee for the DAMA-International/Metadata Symposium 2000, and the DAMA-NCR Symposium 2003.

From 1973 to 1979, she was employed by Navy Regional Data Automation Command (NARDAC), Washington, D.C. to develop and maintain the RAS STADES system, an early effort to manage standard data elements using a data element dictionary system.

She was the recipient of the 2001 InterNational Committee for Information Technology Standards (INCITS) Merit Award, and the 2005 DAMA-International Government Award.

She is a graduate of Temple University.