|
In SQL: Unified Heterogeneous Virtual Hierarchical Views
Published: October 1, 2004
Published in TDAN.com October 2004
IntroductionThis is the third and final part in this series of articles on ANSI SQL’s inherent hierarchical processing capability and its support of transparent native XML integration. Part 1, Using XML to Link Below the Root, described how SQL can naturally integrate native XML at a full hierarchical processing level utilizing the Left Outer Join operation to inherently model and process hierarchical data structures. Part 2, Processing XML’s Complex Hierarchical Structures, demonstrated how standard SQL can naturally support advanced XML hierarchical operations such as logical network structures, variable structures, and structural transformations. This third and final article will address issues of efficiency and ease of use utilizing SQL hierarchical views and how they come together to form an extremely powerful combination. The synergy created from this combination delivers a totally seamless, hierarchically optimized unified heterogeneous virtual view defined entirely in standard SQL. This assures an accurate, consistent, and efficient ANSI SQL hierarchical operation across the entire heterogeneous view. Figure 1 (below) demonstrates the processing of a heterogeneous hierarchical ANSI SQL query. The query involves the dynamic hierarchical joining and processing of a physical XML hierarchical structure and a logical hierarchically modeled relational structure. It will be used to demonstrate the SQL hierarchical capabilities described in this article necessary for ANSI SQL seamless native XML integration.
Figure 1
Hierarchical ViewsAs shown at the top of Figure 1, the left outer joins that model and define the XML documents and the relational hierarchical structures can be defined separately in their own SQL views. An important concept here is that these outer joins are also defining the hierarchical structure metadata and do not necessarily have to represent or perform join operations as in the case of XML physical structures. In all cases, they still represent their structure and data portion in the query’s virtual view shown on the right middle side of Figure 1. The ability to utilize this hierarchical metadata raises these SQL views to a higher more abstract level allowing significantly more freedom in how they are used to perform their ANSI SQL function. Hierarchical SQL views can be embedded in other hierarchical views or joined dynamically on the invoking SQL statement to hierarchical create larger views. The latter allows for ad hoc data modeling or joining of hierarchical structures at query invocation. This is demonstrated in Figure 1 starting at the top down where the XML view is joined over the RDB view which follows the semantics of the left join operation with its hierarchical data preservation. The ON clause is used to specify the exact join points between structures. Figure 1 demonstrates a typical hierarchical joining of hierarchical structures where the lower level structure is joined by its root node in a standard manner, keeping the newly formed structure and its hierarchical semantics intuitive. This does not have to be case, Part 1 used an example where the lower level structure was joined under its root node. This powerful view combining capability still produces a valid hierarchical result. It significantly increases the hierarchical view’s flexibility and transparency, and establishes SQL’s left outer join syntax as an advanced hierarchical data modeling language. Unified Heterogeneous Virtual View The natural view expansion of the invoking SQL query shown in the middle of Figure 1 demonstrates how the heterogeneous view types automatically merge together to form a single unified SQL view. Since this unified view also includes XML data which may be remotely located, it is also heterogeneous and virtual. This enables a consistent SQL operation across heterogeneous structure types. Also of interest is that the expanded views are automatically nested. This nesting is controlled by the delayed matching ON clause placement caused by expanding embedded views inserting additional joins and their matching ON clauses. Since outer joins are not performed until their matching ON clause is processed, this causes SQL views to be fully materialized before being joined. This occurs in the example in Figure 1 where the lower level RDB view is materialized fully before being joined to the previously materialized XML view. This operation, like the hierarchical operations already covered, is naturally present in the ANSI SQL standard and occurs automatically. While the nesting of views is not necessary for the query in Figure 1, there are many situations where this automatic operation becomes necessary as in the linking of the lower level structure below its root described in Part 1. It also protects other participating views from possible side effects caused by a view materialization because the nesting process causes the current view being materialized to be isolated in its own working set. This view nesting operation demonstrates that the left outer join is a powerful data modeling language for logical and physical structures and possesses many recombinant SQL capabilities. For distributed and heterogeneous processing there is an inverse operation to the recombinant SQL capability. When the virtual view is broken up and sent as separate SQL sections to their remote locations, the left outer join query statements that represent these different substructures also inherently carry their hierarchical structure metadata. This automatically enables the distributed processing to be naturally carried out hierarchically in autonomous pieces. It will be described later how the hierarchical processing can enable the distributed processing to perform in a linear fashion. It is worth mentioning that the left outer join hierarchical data modeling does produce less logic and coding errors. This is because it is more formal in nature and adheres to sound hierarchical principles. This is very similar to the advantages gained by structured programming. These characteristics also enable powerful semantic optimizations that are described in the next section. OptimizationEarlier in this article, it was mentioned that the data modeling outer join views can be used as metadata that defines the hierarchical structure rather than a more physical (code) definition. This allows both the hierarchical definition of physical structures like XML and the logical data modeling of relational structures. This also allows the hierarchical optimization of physical and logical structures by pruning the unneeded legs of the hierarchical data structure to avoid physical access navigation or logical join operations. This is possible by utilizing the hierarchical semantics of the metadata. Basically, only referenced nodes and nodes on the path to referenced nodes need to be accessed and processed. Referenced nodes are determined at runtime, so that every request is optimally optimized based on the data requested which is easily added or deleted from the SQL SELECT list at execution. Notice in Figure 1, the Relational Working Set located along the lower left side and the corresponding Hierarchical Working Data along the right side, do not contain the M or B nodes specified in their respective views. These nodes have been identified in the Unified SQL View by their dashed box and dashed connection lines indicating that they are not referenced or on a path to a referenced node. This is shown halfway down the right side of Figure 1. This optimization not only reduces data access, but significantly decreases Cartesian product data explosions which cause replicated data and their required additional processing. This has been a significant efficiency problem with XML integration in SQL. Standard inner join views can not utilize this powerful hierarchical optimization. With inner joins, all tables (or nodes) must be accessed to detect missing data (dangling tuples) in order to remove their incomplete rows from the result. This is not the case with left outer joins that hierarchically preserve rows with missing data. One reason that most SQL databases do not support this optimization is because they do not recognize hierarchical structures. Another reason is that the ANSI SQL specification defines the outer join operation in terms of the inner join causing this optimization not to be considered. Use of this powerful hierarchical optimization is easily triggered when recognizing the hierarchical virtual structure, and can be applied simply as a pre-step to the normal inner join optimization which is still applicable. This also makes these hierarchical outer join views even more powerful and user friendly than the older inner join views because a single large hierarchical view can do the job of many tailored inner join views. Normally views comprised of code would be very difficult to optimize globally, code is usually optimized in chunks, but this is not so for hierarchical outer join views with their hierarchical structure metadata. Their Unified SQL Virtual View defined by the expanded outer join views, shown midway down the right side of Figure 1, can be globally optimized utilizing its hierarchical structure metadata. This total control over the virtual view enables the transparent replacement of the relational Cartesian product engine with a hierarchical engine. This will completely eliminate the Cartesian product data replication efficiency problems discussed earlier and enable a linear processing. The use of a hierarchical engine is depicted near the bottom of Figure 1 where the left side represents relational processing and the right side demonstrates its corresponding hierarchical processing for the same query. Note that in hierarchical processing, the XML data access routine at the bottom center can supply the data directly in hierarchical data form. This complete hierarchical processing can be extremely efficient and can approach linear processing because there is no replicated data and the hierarchical joins are now performed by simple linking operations. Hierarchical View and Optimization Synergy Combining the hierarchical SQL view capability with the hierarchical optimization capabilities significantly enhances and extends ANSI SQL’s hierarchical processing capabilities. By optimizing views, larger more global views can be utilized without increasing overhead, this cuts down on the number of hierarchical views needed. This increases reuse, view abstraction, and ease of use. More significantly, this also allows a predefined view to automatically process any single or multi-leg query under its global view control. This allows fully optimized general purpose views and flexible ad hoc queries unique to SQL for hierarchical processing and native XML integration. Conclusion The first two articles in this series demonstrated that ANSI SQL can naturally support basic through advanced native XML hierarchical integration nonprocedurally. This third article has shown how the natural ANSI SQL support of hierarchical views and additional hierarchical optimization have supplied the ease of use and efficiency that makes this SQL native XML integration world class. In addition, the left outer join hierarchical metadata was shown to provide significant flexibility in how the hierarchical processing is carried out internally. This allows the hierarchical processing performed by the Cartesian product relational engine to be transparently replaced by a hierarchical engine which is considerably more suited for hierarchical processing. There are two extremely powerful leverage points that ANSI SQL can capitalize on for SQL native XML integration. This is SQL’s inherent hierarchical processing capability and through this capability the automatic utilization of the vast amount of hierarchical semantics naturally in the data structure being accessed. This allows for a full nonprocedural operation where the hierarchical semantics naturally available are utilized automatically to supply the correct hierarchical semantic processing. Because of this automatic operation, the user does not have to specify how the data is to be processed hierarchically. The user just specifies what hierarchical results are needed. Because this nonprocedural processing can automatically process hierarchical queries involving extremely complex hierarchical logic utilizing the data’s own semantics, it increases the value of the data by making this advanced processing more practical and available. The nonprocedural capabilities described above are not being fully utilized today by XML query products. This is evident by their required procedural programming of the complex hierarchical semantics needed to process the query. For further information on the SQL hierarchical processing and native XML integration capabilities described in this article please refer to: www.adatinc.com. Recap of unique SQL Native XML integration capabilities: * Full Nonprocedural Multi-leg Hierarchical Processing
* Uses Only ANSI SQL Centric Syntax
* Hierarchical Unified SQL Virtual Views
Copyright 2004 by Advanced Data Access Technologies, Inc. Go to Current Issue | Go to Issue Archive Recent articles by Michael M. David
Michael M. David - Michael is founder of Advanced Data Access Technologies, Inc. Previously he was a staff scientist designing relational utilities and the lead XML architect for NCR/Teradata and their representative
to the ANSI SQLX Group. Before the advent of relational databases, he designed and implemented commercial hierarchical query languages and program generator products for Informatics General and
Sterling Software. He has over twenty-five years experience researching and designing commercial nonprocedural, heterogeneous relational and hierarchical database processing products. Based on this
experience, he has authored the book Advanced ANSI SQL Data Modeling and Structure Processing and many papers and articles on this subject. Contact Mike at mike@adatinc.com.
|