Data Modeling & Enterprise Project Management - Part 4
Estimation – Considering Derived & Intermediate Data
Published: October 1, 2004
Published in TDAN.com October 2004
This is the fourth in a series of articles from Amit Bhagwat.

Abstract
Data modeling is no doubt one of the most important and challenging aspects of developing, maintaining, augmenting and integrating typical enterprise systems. More than 90% of the functionality of enterprise systems is centered around creating, manipulating and querying data. It therefore stands to reason that individuals managing enterprise projects should leverage data modeling to execute their projects successfully and deliver systems that are not only capable and cost-effective but also maintainable and extendable. A project manager is involved in a variety of tasks including estimation, planning, risk evaluation, resource management, monitoring & control, delivery management, etc. Virtually all of these activities are influenced by the evolution of the data model and may benefit from taking it as the primary reference. This series of articles by Amit Bhagwat has been examining the associations between data modeling and various aspects of project management. Having explained the importance of the data model in the estimation process, surveyed various estimation approaches and presented illustrative examples for them, the series now addresses, in this article, potential confusion arising out of derived and intermediate data.

A Recap
In the first article[1] of this series we established the central function of most enterprise projects as data-operation and concluded that the data structures associated with a system would prove an effective starting point for the estimation process. We also discussed the temporal and accuracy implications of analysis-time and design-time estimates and briefly considered interpreting estimates for a specific funding style.

In the next article[2] we took a simple case to illustrate the function-based estimation approach. We presented the case, itemized the functions involved and then atomized them into transactions. We next analyzed how each transaction manipulated entities, and then converted the transaction data to an Unadjusted Function Point count (UFP) that can stand as the basis for various estimation and resourcing calculations. We also briefly discussed a shortcut function-based technique.

In the last article[3] we continued with the case considered in the previous article and illustrated the quicker but less accurate data-based approach. There were a few important points noted here:
1. The data-based approach is a direct and simplified derivation of the function-based approach.
2. The data-based approach should consider data owned by the system.
3. Whenever possible, an elaborate function-based approach should be used, with the data-based approach providing a pre-estimate / quick-check.

Agenda
Having illustrated the two approaches to FPA, I hinted in the last article that we would next cover the impact of the following on estimation:
1. Intermediate & Derived data
2. Extent of Normalization / Denormalization
3. Generalization
To keep the reader focused and to allow assimilation of the many concepts that these topics present, it may be prudent to confine ourselves in this article to the first point, i.e. Intermediate & Derived data. We'll continue to use the example of the book-lending facility at a public library that we have developed over the previous two articles. Our discussion will also touch upon the value of logical models to estimation and the value of the data-based approach to detailed function-based analysis.

Before we begin, it will be useful to have, for ready reference, a view of the important data elements and the entities owned by our subsystem. These are provided in Figures 1 & 2 below.
Fig. 1. : A view of important data elements
Fig. 2. : Entities owned by the lending facility subsystem

Intermediate & Derived Data
In our subsystem, let's consider a requirement that if the Total Fine Amount is less than a system setting value X (which is set by another subsystem), then the fine charged is zero; otherwise the fine charged is the amount rounded to the nearest whole dollar (which also means that X automatically has a minimum meaningful value of half a dollar). What impact does this additional requirement have on the estimate?

First, consider what changes there are to the data structure. We have a new system setting X, which may not be completely owned by our subsystem, but which is nonetheless read in the context of our subsystem's functionality. Then we have two Total Fine quantities:
1. Total Fine calculated before rounding (Let's call it Fcalc)
2. Total Fine charged after rounding (Let's call it Fact)
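
The requirement above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `total_fine` and `fine_charged` are hypothetical, amounts are held as `Decimal` dollars, and an exact half dollar is assumed to round up (the requirement says only "nearest").

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Iterable


def total_fine(item_fines: Iterable[Decimal]) -> Decimal:
    """Return Fcalc: the sum of the individual fines for one Return."""
    return sum(item_fines, Decimal("0"))


def fine_charged(fcalc: Decimal, x: Decimal) -> Decimal:
    """Return Fact: zero below the threshold X, else Fcalc in whole dollars."""
    if fcalc < x:
        return Decimal("0")
    # Assumption: halves round up; the article does not specify tie-breaking.
    return fcalc.quantize(Decimal("1"), rounding=ROUND_HALF_UP)


# Hypothetical values: X is set by another subsystem and only read here.
x = Decimal("0.50")
print(fine_charged(total_fine([Decimal("0.20"), Decimal("0.25")]), x))  # 0
print(fine_charged(total_fine([Decimal("0.75"), Decimal("0.80")]), x))  # 2
```

Note that the sketch reads X and Fcalc and produces Fact without saying anything about where either quantity is stored.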
We know that Fcalc is the total of all fines for a particular Borrower ID for a particular Return on a Return Date. (I know what some of you are thinking: if you have exceeded the return date by a small margin, you can potentially avoid paying any fine at all, simply by returning the items separately throughout the day. I personally will be happy to waive your fine so long as the delay is slight and hasn't inconvenienced other library users, though I bet I have set some librarians thinking!) We can therefore, from a data-economy viewpoint, do without separately storing Fcalc in the database. (We would need an ID locking process to define a Return, which could be reconstructed from timestamps on Returned Date; this of course means that Returned Date is actually a Returned Date-time.) On the other hand, storing Fcalc achieves a significant amount of process economy if this data is going to be referred to repeatedly. Fact is likewise redundant data, in that it can be calculated easily knowing X and Fcalc. So which will you store, say in the entity Total Fine in our example?
a. Fcalc
b. Fact
c. Both
d. Neither
I am not sure that there is any right or wrong in any of these answers, though I would have considered storing Fact, as this is the Total Fine Amount that is of the greatest interest to the business. Those who follow this approach will have one Total Fine Amount, Fact, saved with the entity Total Fine. To them, Fcalc will be a transient intermediate quantity. For those who save Fcalc instead, Fact will be a transient derived quantity. For those who store both, both will be persistent derived quantities; whereas for those who like to calculate Fact on-the-fly, both will be transient derived quantities.

This means the number of attributes associated with the entity Total Fine will vary depending on the line of thinking. The run-time performance of the system will likewise be dictated by this line of thinking. However, given the ultimate responsibility of the system to calculate the total fine and, from it, the payable fine, the programmer will have to write the entire algorithm in any of the four cases. It is therefore prudent here to consider both Fcalc and Fact as attributes of the entity Total Fine for estimation purposes.

In transaction terms, if we need to implement an algorithm y = f(x), where f is a simple algorithm (as in most business systems), we need to account for the effort of writing this algorithm as that of reading x and writing y, whether x and y exist in the database or only in transient memory.

Impact of Potentially Transient Data
In the last article, we calculated UFP based on E=3, R=2, A=11. Now adding one attribute to account for the two Fine quantities associated with the entity Total Fine, i.e. A = 12, we get

UFP = (1.42 x 12 x (1 + 2/3)) + (8.58 x 3) + (13.28 x 2) = 28.4 + 25.74 + 26.56 = 80.7 ~ 81

That's an addition of 3 UFP caused by the added functionality.
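
The data-based figure above is easy to recompute. The sketch below assumes, reading the worked numbers, that the formula has the form UFP = 1.42 x A x (1 + R/E) + 8.58 x E + 13.28 x R; the function name `data_based_ufp` and its parameter names are hypothetical, and the A = 11 result assumes the same formula was applied in the previous article.

```python
def data_based_ufp(entities: int, relationships: int, attributes: int) -> float:
    """Data-based UFP, with the formula shape inferred from the worked example."""
    return (
        1.42 * attributes * (1 + relationships / entities)
        + 8.58 * entities
        + 13.28 * relationships
    )


# E = 3 and R = 2 are unchanged; only the attribute count A grows from 11 to 12.
before = data_based_ufp(entities=3, relationships=2, attributes=11)
after = data_based_ufp(entities=3, relationships=2, attributes=12)

print(f"A = 11: {before:.2f}")          # 78.33
print(f"A = 12: {after:.2f}")           # 80.70
print(f"increase: {after - before:.2f}")  # 2.37
```

On these assumptions the unrounded increase is about 2.4 UFP; the figure of 3 arises from comparing the rounded values 78 and 81.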

In functional terms, we have one additional transaction that deals with an input, an entity and an output. (The input is X: although it is typically obtained from a system setting, it is too trivial to be considered an entity and rather deserves the status of an input.) We thus have an addition of 0.58 + 1.66 + 0.26 = 2.50 to our old estimate of 78.78 (from the second article of this series), giving 81.28 ~ 81.

Using either approach, we find the estimate going up by ~ 4% to implement the additional feature desired.

Estimation Approach
We observe here that when both Fcalc and Fact are important in the logic, they contribute to the estimate, whether or not they are stored physically. This illustrates that in estimation we must consider all data that logically belongs to the system, irrespective of whether it is physically located in persistent storage, or for that matter whether it is afforded a separate variable in the algorithm. In other words, for those who may be harassed by the logical-physical dilemma, the message is loud and clear: as the process of estimation is fundamentally function-based, stick to logical data for the purpose of estimation.

In terms of the steps followed in estimation, it is useful, easier and quicker to locate Fcalc and Fact, and associate them with Total Fine, simply by scanning through the requirement document for nouns and their adjective qualifiers. This activity precedes and leads to the discovery and understanding of transactions. An elaborate transactional analysis that projects the role of the data (as contributor to a transaction) in system functionality can then follow. In other words, the typical early steps taken in the data-based approach make the function-based approach easier, quicker and more accurate.
This means that not only is data-based estimation a quick-reckoner, but in fact it is also a useful tool for an estimator who is only moderately familiar with the system functionality and who therefore needs a means of attaining familiarity with what the system does, before proceeding with the function-based treatment.

Conclusions

What's next
Our next focus of attention will be on the effect of Denormalization, Normalization & Generalization of data on estimation. I also have some comments to make towards the end on estimation refinement, in sync with data structure refinement.

In the meantime, I would like to suggest a small activity for you. Spot an instance of looser coupling attained by a multiplicity relationship between certain entities we have discussed in our example. This multiplicity relationship can be refined in the context of the system functionality presented to us. See if you can locate it. I'll be taking this as an example when I discuss data structure refinement.

--------------------------------------------------------------------------------
[1] Amit Bhagwat - Data Modeling & Enterprise Project Management, Part 1: Estimation – TDAN (Issue 26)
[2] Amit Bhagwat - Data Modeling & Enterprise Project Management, Part 2: Estimation Example – The Function-based Approach – TDAN (Issue 27)
[3] Amit Bhagwat - Data Modeling & Enterprise Project Management, Part 3: Estimation Example – The Data-based Approach – TDAN (Issue 28)