This course concerns business-oriented systems design, particularly with regard to the business and process implications of information technology (IT) and database design.
Why are such issues important? The journey from an intriguing strategic idea to a group of business systems that successfully implement that idea is neither direct nor easy, and it often is never completed unless it can be accomplished relatively quickly. Designing traditional information systems tends to be neither simple nor fast, especially when system design gets caught up in organizational conflict between line departments and the systems development function.
With this in mind, our focus while considering the problem of applying information systems to business processes will be on highly iterative, relatively rapid techniques for designing IT-based process support. Specifically, we will focus on asking you to develop expertise that offers business solutions that line managers will want - with the additional skill of being able to express those solutions at a sufficiently detailed design level so that information systems and database specialists know what production-strength systems to build.
One way of defining a system is as a collection of highly-interrelated parts. An information systems researcher at Boeing, for example, describes his company's product in this way:
We think of a 747 as more than two million parts flying in close formation.
In this sense, the success of a Boeing airliner depends importantly on the design that enables the plane's parts to work together successfully. This analogy holds beyond the boundaries of the aerospace industry. Most consumer goods, particularly consumer electronics goods, are physical representations of systems in that they have component parts that work together to deliver some function. Tape decks have a read/write head assembly, a volume control, input/output jacks, a volume/overload display, and other components that assist in their adjustment and operation. CD players and clock/radios have many of the same components, designed in slightly different ways.
In our discussions during this course we will use the term system to refer to such collections of component parts, particularly as those components represent the pieces of hardware and software that combine to deliver an information system. Within these boundaries, our choices of hardware will necessarily be limited to microprocessor-controlled PC- or Mac-compatible machines, and our work with software will be constrained to various applications packages that run on such equipment, but there is nothing that says that the expertise we develop here could not be transferred to other types of information systems and other types of software.
An information system represents a particularly interesting challenge from the point of view of systems design. You have no doubt heard at some point that the term computer is actually a shortened form of a reference to so-called general purpose computing devices - in other words, a machine that in theory could do any type of computation provided that you fed it the appropriate programming instructions. In many respects, computers live up to this advance billing. Originally used to calculate missile trajectories (e.g., ENIAC in WWII), they now process accounting transactions inside large corporations, simulate weather for NOAA, provide document management services to law firms, function as electronic typewriters when running word processing packages, and deliver the calculation horsepower behind spreadsheets.
The "do-anything" general purpose nature of these machines, however, leads to a class of design problems that is particularly salient to software systems. These problems center on the need to provide structure within which to design useful systems. The techniques and tools that we will examine this term illustrate several examples of solutions that system and process designers have developed for this problem. In one sense, being able to do "anything" means being able to do "too much" - without the physical constraints that affect more tangible products (e.g., you can only do so much with plastic buttons on the front of a portable CD player), code-based information systems offer confusingly few boundaries against which designers can react. The result can lead to operationally elegant systems that prove useless in practice.
One traditional way of thinking about information systems, for example, suggested that a simple model could capture most of the salient design features that needed to be considered for any application: This model described information systems as a combination of input, processing, and output components (see Figure 1).

This perspective added some simple structure for understanding information systems development projects: e.g., if a team of designers could understand the requirements of the system as they related to input, processing, and outputs, they could deliver a system that would be both satisfactory and operationally successful. During the years that development teams were programming in assembly language and prior to the introduction of successful full-featured operating systems, building a system that worked at all was no small task, much less building one that fulfilled its sponsor's (often inflated) hopes for its business application.
The input-process-output model is still useful today, and can provide productive checklists to use in addition to the tools and techniques developed more recently. Several decades of experience with this approach, however, led to insights that extended the notion of system structure.
One fundamental insight that arose from the experience of system developers who sought to apply the input-process-output model was that the success of IS support depended critically on the data that the system delivered. This might not seem earth-shattering to you, but the developers who worked in this area did take the thought in an interesting direction: they began to ask whether there wasn't structure inherent in the data that a system was to use. In many business applications, this proved to be a very powerful idea, enabling system designers to develop data-driven applications that uncovered relationships unknown even to business managers who had spent their careers working in the markets the systems were intended to support. This notion of data structure had important influence in other ways, especially insofar as it supported the development of so-called database management systems that handled data as an entity abstracted from the other operations of a business. This separation supported an important shift in perspective on the part of many managers - one that recognized that information flows could be as important as physical product flows. Beginning in the late 1960s, researchers in disciplines related to both information systems and management control began to understand the potential of focusing on information flow as well as on other, more physical flows associated with a business.
The ideas behind data structure analysis have direct bearing on the design of the database management software we will use in this course and on the techniques that we will be using to pursue traditional information systems design while building systems prototypes. In one sense, the progression of analytical data models that has taken place since WWII and the software systems built to exploit each new paradigm shift has largely dictated the technical system design terrain as we now know it.
This perspective has led to focusing on two aspects of data: data flows and data structure. Not surprisingly, the two interrelate, and techniques exist to assist analysts in understanding both areas. We will study two of these techniques during this term.
In the past ten years a new perspective has gained prominence, one that suggests that the frameworks offered by traditional systems development approaches are somewhat limited in scope. This newer perspective, which originated in ideas associated with industrial engineering (a field often concerned with the need to keep assembly lines running at maximum efficiency), focuses on business processes.
The first insight from this area was that process design was of strategic importance because it was the structure of processes that dictated business performance. For results-oriented managers, this shift was often difficult to absorb, but the success of various reengineering consultants in the past five years suggests the rapidity with which this message has been absorbed. With renewed focus on processes came new focus on process analysis and design. Tools in these areas are just beginning to be developed. We will consider one approach to process design in this course in some detail.
The link between strategies and processes is an important one, for it shows a path by which we can put information systems design techniques to very productive use (see Figure 2). To the degree that a blend of analytical approaches can assist us in developing systems that support process redesign, it may become possible to develop systems that act as powerful levers for both organizational performance improvements and longer-term organizational learning. We will discuss some of the implications of this shift during the term.
Figure 2: Combining process and data structures links strategy and implementation
The remaining sections of this reading suggest some of the trends that have taken place in the development of data structure analysis and database management in the past several decades. In effect, they describe activities that have evolved at the furthest right-hand extreme of Figure 2 (the area marked Information Systems Design). They provide some background to the technical IS design ideas that we will be covering in the upcoming weeks.
This section (1) introduces a perspective on information within organizations that examines the assumptions implicit behind terms such as database management and database systems, and (2) introduces a technique, commonly referred to as Entity-Relationship Analysis (or Entity-Relationship modeling), for understanding data characteristics important to database design.
Entity-Relationship analysis uses Entity-Relationship Diagrams (often referred to as ERDs) to understand the structures inherent in data that a database management system is trying to organize. Because of this focus, we will often refer to ERDs as a data structure modeling technique to distinguish them from other approaches, such as data flow diagramming, which tend to focus on data processing rather than data structures alone. We will discuss data flow diagramming (i.e., DFDs) in upcoming course sessions, after we have examined ERDs as an example of data structure diagramming (i.e., DSD).
I should warn you at the outset that engineers and system developers who have devoted their professional lives to database development sometimes harbor feelings of near-religious intensity about data structures and data processes. There are those who passionately believe that one methodology or one tool alone can do all the database design one needs -- and they may tend to describe all other tools as useless. For example, some system designers with excellent technical educations have been trained in nothing but entity-relationship analysis. Some systems analysts use data flow analysis techniques heavily and let other people worry about data structures. Others swear by object-oriented design techniques. Some would even argue that it is unfair to call entity-relationship modeling "data-structure" analysis, since they reserve that term exclusively for techniques developed by industry researchers such as Charles Bachman.
Our point of view in this course is less doctrinal than that. At one level, we are interested in understanding how technical system design tools can be useful for building information systems within which databases play an important role. At a higher level, we are interested in understanding how we can leverage a useful set of design techniques to integrate knowledge about organizational activities, information processing, and data structures in order to design business processes that achieve strategically important goals. Among the many available system design techniques we could examine, I have chosen two -- entity-relationship modeling and data flow diagramming -- to provide representative perspectives on data structures and data flows. We will discuss the value and limitations of each technique extensively during class sessions.
The point of this explanation is not to dissuade you from concentrating on understanding the particular tools that we cover (although those of you who are taking seven courses and working part-time on the side while planning a wedding for the middle of the term may find this a convenient opportunity), but to emphasize the following point: while there is no universally accepted way to analyze data, information flows, or business process characteristics for the purpose of designing database systems, there are some helpful techniques available for structuring the masses of detail that you must handle while doing so. Analyses that focus on data structures and methods that consider data flows represent two of these.
By the end of the course you will be able to discuss ERD and DFD techniques intelligently with technically-trained systems analysts who have spent years working with specific methodologies - at least intelligently enough so that you can learn from them and they can learn from you. You will be able to use analytical techniques to build prototypes that professional systems developers can understand and use to deliver industrial-strength systems to you in less time than would otherwise be possible. You will also be able to use such prototypes to articulate to line business managers why they should care about the organizational and systems changes that you recommend. In short, you will have developed expertise that includes powerful tools for generating, communicating, and implementing new strategies within business organizations.
Portions of the following material are adapted from McFadden, F. R. and J. A. Hoffer (1994), Modern Database Management, Menlo Park, CA: Benjamin/Cummings, Chapters 1-4, pp. 2-199. Our discussion begins by examining the evolution of databases and database design theories. We then focus on entity-relationship modeling as a method for supporting database design.
Databases represent a phenomenon that emerged from technical developments in information systems as business managers encouraged system designers to deliver increasingly cost-efficient yet effective methods for storing, organizing, distributing, and managing the details of business operations. In very rough terms, it has taken upwards of thirty years to develop the database concepts, technology and systems that are represented by familiar business tools used today. These tools include products that create and manage databases on mainframe computers (such as IBM's DB2), on minicomputers (such as Oracle and Informix), and on personal computers (such as Microsoft Access, Borland's Paradox, and the various dBase packages). Although the differences in scale between these products can be large (e.g., DB2 routinely handles millions of records while we may discover that Access often begins to slow down after a thousand), they tend to use such similar design concepts and data models that it is not difficult to develop a prototype delivered on a PC, for example, into a much larger application and database intended to reside on a mainframe.
To begin with, it might be helpful to review briefly how database concepts developed out of the habits and constraints associated with early programming and the problems that managers encountered in attempting to apply early information processing tools to business procedures. In this discussion, it can be useful to remember two points. First, computers are essentially numerical processors - i.e., they are best at following sequences of extremely precise instructions, no matter how long those sequences might be. Second, computers can execute long sequences of instructions extremely quickly: i.e., they are fast. Human beings, on the other hand, tend to be quite slow at executing numerical instructions (how fast can you do long division to sixteen decimal places in your head?), but are positively brilliant at handling ambiguous or incomplete data: i.e., they normally don't need all the steps of a process spelled out to them beyond a certain level. A sign said to hang in IBM's Tokyo office captures this contrast quite well:
Much of the tension in database design, and many of the techniques that attempt to force structure onto the database design process, derive from this fact: that people are very good at handling relatively small amounts of relatively stable but potentially ambiguous detail, while computers are suited to managing very large volumes of rapidly changing details so long as the definitions of the data they encounter are precisely and consistently defined. Much of the history of database theory and system design tools can be interpreted as an attempt to strike an appropriate balance between human aptitudes and machine aptitudes, given the technological choices available at the time.
Figure 3 suggests how the interaction of business needs and technology evolution has produced the concepts that we label databases and database management systems. Reviewing these developments offers a useful way to understand and define basic terms that we will need for our discussions. For another perspective on the ideas presented here, refer to McFadden and Hoffer (1994: 5-13).
Figure 3: Database management system (DBMS) evolution

Business expansion following WWII. Following the Second World War, between approximately 1946 and 1965, the U.S. economy surprised most observers by avoiding the recession/depression cycles that were expected (and had indeed been experienced following demobilization from WWI). Instead, it grew at unprecedented rates for an unexpectedly long period of time. As a result, many U.S. corporations found themselves forced to cope with rapidly increasing volumes of business as they expanded on a worldwide scale. In many industries this expansion placed new strain on procedures that had been developed in the first third of the century for collecting and managing the information needed to run a large business. Starting in the early 1950s, selected large corporations followed the lead of the U.S. government and started to apply then so-called data processing equipment to collecting and managing business data. The federal government had used numerical processors during the war for mathematically-intensive work such as computing artillery trajectories, and began to apply electronic computing machines to collecting large volumes of data, such as census data, soon afterwards. Census-taking, with its requirements for handling enormous amounts of pre-defined details, had been a focal point for mechanically-assisted data processing since the end of the 19th century, and was one of the first areas to go electronic after WWII.
Against this background, the first mainframe computers emerged to collect transaction-oriented business data. It is useful to remember that these machines replaced systems that collected and collated business information using paper-based systems -- systems that sometimes resulted in the equivalent of acres of large file cabinets at large corporations. This situation is suggested by panel (a) of Figure 3.
Early file-oriented systems. It was only natural, perhaps, that the early computing machines treated "data" much as paper forms treated tables of figures -- e.g., they maintained data as an integral part of a specific program, just as paper reports retained tables of data as integral parts of a page (see panel (b) of Figure 3). To understand the implications of this approach, consider two typical business processes that converted early to computerized support: sales order entry and accounts receivable. Sales order systems, for example, retained all the data about sales orders as a part of the sales order program. Accounts Receivable (A/R) programs retained receivables data as part of the A/R program. This worked well until the Accounting department wanted to generate receivables based on the order data generated by Sales. All too often, companies then discovered, the data used by the two programs could not be integrated -- e.g., for technical reasons the data generated by the sales order system could not be used directly by the receivables system. This led to inefficient practices, such as hiring armies of data-entry personnel to retype order information into the receivables system from paper reports generated by the sales order system.
Modularization. As recently as 1970, many companies using mainframe-based software found themselves at the mercy of nonintegrated programs. The solution commercialized during that decade exploited a technology trend that continues to this day: the increasing modularization of software components. In database terms, this meant that the raw details that programs employed for their calculations (e.g., the data needed by both the sales order system and the receivables system described above) became maintained separately from the applications programs that specified how a sale was transacted or a receivable collected. Maintaining and managing data as a separate resource from applications code required a new type of software that could (a) organize and manage data storage efficiently while (b) providing standardized, effective access to data for specialized programs (such as order entry and A/R). This type of software became referred to as a database management system, just as the collection of raw details that it managed came to be called a database. Panel (c) of Figure 3 illustrates a database management package and a specific applications package residing on a corporate mainframe. The applications package no longer retains specific lines of data used by the application, as shown in panel (b). Instead, the applications software refers to the database management system to obtain any data needed for its operations. The database management system (DBMS) maintains the data in a form that can be accessed by any applications program used by the company.
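The separation between applications and shared data can be sketched in a few lines of Python, using the standard-library sqlite3 module as a stand-in for a DBMS. This is a minimal illustration under stated assumptions, not a description of any system discussed in this reading: the table name `orders`, its columns, and the two functions are hypothetical.

```python
import sqlite3

# An in-memory SQLite database stands in for the corporate DBMS. This is an
# assumption made for illustration; the products of the 1970s ran on mainframes.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, "
    "amount REAL, paid INTEGER DEFAULT 0)"
)


def enter_order(conn, customer, amount):
    """Hypothetical sales order entry application: writes to the shared data."""
    cur = conn.execute(
        "INSERT INTO orders (customer, amount) VALUES (?, ?)", (customer, amount)
    )
    conn.commit()
    return cur.lastrowid


def open_receivables(conn):
    """Hypothetical A/R application: reads the same data, with no retyping."""
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM orders WHERE paid = 0 "
        "GROUP BY customer ORDER BY customer"
    )
    return dict(rows.fetchall())


enter_order(db, "Acme", 1200.0)
enter_order(db, "Acme", 300.0)
enter_order(db, "Bolt", 450.0)
receivables = open_receivables(db)
print(receivables)
```

Neither function holds any order data of its own. Either one could be replaced by a new program without touching the stored records, which is the flexibility that modular data management was meant to deliver.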
Figure 4 summarizes this shift. The change from monolithic to modular software proved to be an important step forward -- one that we can see continuing to play out in the mid-1990s with Internet standards, operating system kernels, graphical user interfaces, componentware such as Visual Basic Extensions (VBXs), and object-oriented programming.
Figure 4: Non-integrated vs. integrated database systems
Flexibility. The business implications of this change were important. For the first time, data could travel across organizational boundaries where it was most needed -- without the enormous effort of developing integrated, customized software to make this possible. The presence of modular databases made it possible to increase organizational flexibility as well: supporting a new product or service usually meant developing a new application to interact with the database management system, not building entirely new software from the ground up. At the same time, existing business procedures could be advanced by swapping in a new software application for an older one, without the need to rework data that was already satisfactorily organized by the DBMS. Changing to a new sales order entry system, for example, now meant converting to a new program that could work with the existing corporate database rather than rebuilding both program and database from scratch.
Data as a resource. In this context, "data" came to be viewed as a generalized corporate "resource", similar in many ways to the more tangible resources represented by assets such as land, buildings, and equipment. Over the decades from 1950 to 1980, new technical positions emerged devoted to the care and feeding of corporate data, with names such as database administrator, data analyst, and database designer. Engineering departments at universities, database vendors, and industry consultants developed database design methodologies that purported to solve the many problems that arose as corporations attempted to separate program code from data management. As you probably guessed, the transition from the paper-based systems suggested in panel (a) of Figure 3 to the distributed database architecture suggested by panel (d) has not been as smooth as the preceding paragraphs may have implied -- in fact, many companies have neither successfully nor fully made the transition even by 1996. Some observers, moreover, might suggest that most large organizations will perpetually remain in a state of transition, as new data and new data structures emerge from the course of business as rapidly as existing databases can adjust to them.
The information hierarchy. Against this background some generalized definitions of terms might now be useful. In an attempt to be precise, we can distinguish between data, or the "facts concerning things such as people, objects, or events" that a business needs to track, and information, which we can consider the description of relationships between items of data (McFadden and Hoffer 1994: 7). Concepts, following this progression, provide guidance and insight by emphasizing the relationships between pieces of information. Understanding shared concepts based on similar information and consistent data suggests one way of defining business knowledge. Figure 5 illustrates how these definitions complement one another.
Figure 5: A hierarchy of data, information, concepts, and knowledge
Are these definitions bulletproof? Of course not. We could probably have arguments about them for hours and come out having made little more progress than when we started. All four terms are notoriously slippery -- easy to say, hard to define precisely. A knowledge-concepts-information-data hierarchy seems important, however, because it implies one of the more difficult challenges facing database designers. It is relatively easy for us to say in general that these four elements are linked, and to provide examples which describe knowledge as built upon concepts, information, and data (as above). A database designer, however, has to go in both directions at once: he or she is presented with masses of apparently unrelated data and has to understand what information the database system should support to provide useful business knowledge to line managers. This sometimes (often, perhaps) is not a question to which the line managers themselves know the answer. The techniques of database design that we discuss in this course represent attempts to understand the generic structures that need to be identified within collections of data in order to develop useful knowledge from them. They represent techniques for identifying and designing a useful, modular data "resource".
Managing the data asset. Researchers have pointed out several characteristics of data management that assist in defining how information can be managed as a corporate asset. McFadden and Hoffer (1994: 6) build an organizational model of "information resource management" with six elements. I have paraphrased their model below. They emphasize how an organization's reliance on information about physical products or services tends to increase with sales volume. They also point out the role played by database and systems management in ensuring that information used to make management decisions is timely, accurate, secure, and relevant. As organizations move towards increasing use of widely-distributed databases to support flexible and partially autonomous business processes (e.g., the situation implied by panel (d) in Figure 3), the interrelationships between processes, the flow of physical resources, and the flow of information prove to be of increasing strategic importance.
Table 1: A resource-flow model of the corporation
| Model component | Description & Argument |
| A resource-focused business model | A business organization transforms resources to deliver products and services within the competitive conditions of an external market. |
| Physical flows and information flows | Those resources can be described as contributing to a flow of physical products or services and a flow of information about those products or services. Physical resources include people, materials, machines, buildings, and money. Information resources represent knowledge and information derived from the ongoing flow of data about physical flows. |
| Greater scale means increased reliance on information flows | As the scale of an operation grows, it becomes more difficult to manage physical flows directly; organization managers typically come to rely increasingly on information flows to manage the business. |
| Managing physical flows and information | Information flows can be managed according to the same principles developed for managing physical resources (e.g., planning, inventory management, cost efficiency, performance effectiveness). |
| Specifics for managing information flow | Managing information flows includes managing data acquisition (so that data are available prior to when they are needed), data security (so the information resource is not compromised), quality assurance (so the data are consistently accurate), and maintenance (so that stale data, however defined, is filtered out appropriately). |
| Organizational commitment | Information flows can only be managed well through organizational rather than merely individual commitment. |
Source: Adapted from McLeod and Brittain-White, 1988
Information infrastructure. Many organizations -- particularly large corporations that must coordinate large volumes of business over widely-dispersed geographies -- have evolved mixed architectures of hardware, software, and networks intended to deliver data on an as-needed basis to managers at any company location. The combination of networks, mainframes, minicomputers, and personal computers with attendant software applications for each can be considered an information infrastructure -- a set of information-technology-intensive organizational structures exclusively dedicated to supporting the capture, development and communication of information within the organization. Databases tend to play an important role in storing, managing, and transferring data within such an infrastructure. What began as large, centralized databases such as those suggested by panel (c) in Figure 3 have evolved into so-called distributed databases, as suggested by panel (d) in Figure 3.
Distributed data. Distributed databases no longer restrict data storage to a monolithic, centralized machine: instead, they enable a corporation to aggregate data as needed from many (sometimes thousands of) local databases that store details of use to specific departments. This approach tends to add complexity to challenges of data management. Where distributed databases are homogeneous, meaning that each small local database uses the same definitions for defining data, aggregating detail into a centralized data repository is not much more difficult than electronically collecting the most recent copy of data in each location. Over time, however, distributed databases tend to become increasingly heterogeneous -- i.e., their designs tend to diverge as local business needs encourage departments to adapt the kinds of data that they define and collect.
Heterogeneous databases prove difficult to combine, for several reasons. First, they can produce data definitions that are inconsistent -- and there are often powerful organizational reasons for those inconsistencies to persist. One company, for example, discovered that the term "sales" did not have the same meaning across the company. Databases in Marketing recorded orders as sales. Databases in Manufacturing recorded shipments as sales. Databases in Accounting matched sales with receivables. Relying on such data, functional managers in these three areas developed three very different views of the business -- so much so that they had difficulty working together until the inconsistency was resolved. There were logical organizational reasons for the difficulty: the structure of the company's performance incentives meant that adopting a sales=shipments definition would have reduced marketing managers' compensation and using a sales=orders definition would have penalized manufacturing executives.
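A small Python sketch can make the "sales" inconsistency concrete. The order records and the three functions below are hypothetical, invented only to illustrate the three departmental definitions described above; in particular, treating Accounting's definition as "invoiced orders" is an assumed simplification of matching sales with receivables.

```python
from dataclasses import dataclass


@dataclass
class Order:
    amount: float
    shipped: bool
    invoiced: bool


# Hypothetical order records; none of these figures come from the reading.
orders = [
    Order(100.0, shipped=True, invoiced=True),
    Order(250.0, shipped=True, invoiced=False),
    Order(400.0, shipped=False, invoiced=False),
]


def sales_marketing(records):
    """Marketing's definition: every order counts as a sale."""
    return sum(o.amount for o in records)


def sales_manufacturing(records):
    """Manufacturing's definition: only shipments count as sales."""
    return sum(o.amount for o in records if o.shipped)


def sales_accounting(records):
    """Accounting's definition (simplified): only invoiced orders count."""
    return sum(o.amount for o in records if o.invoiced)


print(sales_marketing(orders), sales_manufacturing(orders), sales_accounting(orders))
```

The same three records produce three different "sales" totals, so reports drawn from the three departmental databases cannot simply be added together or compared until one definition is agreed.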
Beyond differences in data definition, separate databases often hold multiple copies of the same data -- copies that diverge over time. As a simple example, the database in a bank's Loans department might list a customer as Jane A. Doe, while the database in the Savings department might list the same person as Jane Austen Doe. The database in the Customer Service department might list this individual as Jane S. Doe (because the data entry operator typed the record late on a Friday afternoon and pressed the S instead of A). This might not strike you as a difficulty, but consider that the bank handled over 2 million records listing the account information of 25 products and services covering over 200,000 customers. At any time, between 10% and 25% of the bank's records could have been in error -- the only problem was determining which ones. From the database's point of view, Jane A. Doe, Jane Austen Doe, and Jane S. Doe represent different individuals. Over time, it became impossible for managers using the database to be sure that they had correct data.
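A minimal Python sketch makes the database's point of view concrete. The three names come from the example above; the `crude_key` function is a hypothetical matching rule introduced only for illustration, not a technique the bank is known to have used.

```python
from collections import defaultdict

# The three departmental records from the example above.
records = [
    ("Loans", "Jane A. Doe"),
    ("Savings", "Jane Austen Doe"),
    ("Customer Service", "Jane S. Doe"),
]

# Exact comparison: the database sees three different customers.
distinct_names = {name for _, name in records}


def crude_key(name):
    """Hypothetical matching rule: compare first and last name only."""
    parts = name.replace(".", "").lower().split()
    return (parts[0], parts[-1])


# Group records whose crude keys agree, as candidates for manual review.
candidates = defaultdict(list)
for department, name in records:
    candidates[crude_key(name)].append((department, name))

print(len(distinct_names))  # number of customers by exact match
print(len(candidates))      # number of customers by crude key
```

A rule this loose would also merge a genuinely different Jane B. Doe, so it can only flag records for a person to check. That residual uncertainty is why the managers in the example could not be sure which records were wrong.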
A third type of problem arises when business needs change the kind of information that corporations need to generate from existing databases. Between 1985 and 1995, cross-selling became increasingly important at insurance companies and other financial institutions: existing customers became an important source of new product revenue. To cross-sell effectively it became important to have a complete profile of a customer's dealings with the company. Many insurance companies discovered that their existing databases did not describe customer relationships in this way: instead, they listed specific relationships by policy number rather than by customer name. Jane Doe, for example, might be known to the company as Life Insurance Policy #115524, not as Jane Doe; it was almost impossible to find out what other relationships she had with the company short of sorting through millions of other policy records looking for possible matches. For some companies it took years to complete the sorts and database redesigns required to represent their business on a customer-relationship basis rather than a product-relationship basis.
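The difference between a product-relationship and a customer-relationship view can be sketched as follows. Policy #115524 is taken from the example above; the other policy numbers, products, and the second customer are invented, and matching on an exact holder name is a simplifying assumption.

```python
from collections import defaultdict

# Hypothetical policy file keyed by policy number (product-relationship view).
policies = {
    "115524": {"product": "Life", "holder": "Jane Doe"},
    "220817": {"product": "Auto", "holder": "Jane Doe"},
    "331002": {"product": "Home", "holder": "John Roe"},
}


def products_for(holder):
    """Answering a customer question means scanning every policy record."""
    return sorted(p["product"] for p in policies.values() if p["holder"] == holder)


# One-time redesign: re-key the same data by customer (customer-relationship view).
by_customer = defaultdict(list)
for number, policy in policies.items():
    by_customer[policy["holder"]].append((number, policy["product"]))

print(products_for("Jane Doe"))
print(by_customer["Jane Doe"])
```

With three records the scan is trivial. With millions of policy records and inconsistent spellings of holder names, the one-time re-keying becomes the multi-year effort described above.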
Data integrity, redundancy, and timeliness These examples, easiest to understand in the case of heterogeneous databases, highlight problems encountered by any database installation. They are often referred to as issues of data integrity and data redundancy, although the two issues prove highly interrelated in practice. Simply put, data redundancy (the presence of multiple copies of any single data item) threatens data integrity (the accuracy and consistency with which data are maintained in the database). Many of the features of the data models we will study in this course are devoted to maximizing integrity by minimizing redundancy.
From this perspective it is easier to see data-related issues as a recurring balance that must be maintained between data accuracy, data redundancy, and data timeliness. Accurate data often take longer to deliver than approximate data, and take more effort to maintain, yet a corporate database that is inaccurate rapidly becomes untrustworthy (and a waste of time and money). Redundant data breeds mistakes (e.g., as one systems developer put it, "multiple versions of the truth"), and increases maintenance costs. Issues of timeliness suggest the importance of understanding the "shelf life" of relevant data items -- e.g., a sufficiently detailed understanding of business performance so that an information support system knows what data need to be seen in what form at what time by whom. The level of understanding of both market dynamics and internal operations required to accomplish effective information flows often takes time and great effort to develop.
Figure 6 suggests the interrelatedness of these issues. The balance that must be struck between efficiency (e.g., non-redundant data) and effectiveness (e.g., trustworthy, timely data) influences every database model and database design technique we examine in this course. The search for ways to resolve tradeoffs between these factors first led database designers to think carefully about the structure of the data that they were trying to represent using DBMS applications. Computer-aided systems engineering (CASE) software often explicitly incorporates a specific method for understanding data structures in order to encourage efficient and effective design. While databases themselves do not represent an entire solution for supporting business processes, they often play a pivotal role in implementing process redesign.
Figure 6: Data management issues
Since 1950 there have been five major approaches to database management systems: file-oriented, hierarchical, network, relational, and object-oriented database designs. Production systems in large corporations tend to use a mix of hierarchical and relational databases. Databases in smaller companies or in R&D facilities tend to concentrate on relational databases and object-oriented databases. Our prototypes in this course will focus on using relational databases.
This section offers a quick comparison of hierarchical, network, and relational database models, with file-structured and object-oriented systems noted as contrasts. Most commercial databases in the mid-1990s rely on relational data models, even when high transaction speeds are needed. File-oriented systems predate the development of database management concepts. Object-oriented systems apply the newest concepts of modularity to both applications and data.
The five major database design perspectives represent an evolution of design thinking -- in a sense, each design model represents a reaction to those that preceded it. Hierarchical systems were developed, in part, to overcome the shortcomings of file-oriented systems. Network database designs developed as a response to the limitations of hierarchical designs. Relational databases emerged as a new solution to problems raised by both hierarchical and network designs. Object-oriented systems developed from a contrasting perspective that questioned orthodox assumptions underlying earlier design approaches. Each new approach to database design required the use of increasingly powerful computers to achieve satisfactory performance for large volumes of data. As faster machines became available, commercial versions of each design perspective gained temporary dominance (see Figure 7). In the decades prior to 1970, hierarchical and network systems replaced file-oriented databases. Between 1970 and 1990, relational databases became the design of choice. In the mid-1990s a transition may be occurring from relational databases to object-oriented databases, but it is not clear whether computer power has yet reached a sufficient level to support this shift.
Figure 7: Database design trends
It is also important to recognize that other approaches to databases are emerging that adhere less rigidly to the data file/data record/data field organizational structure first encouraged by hierarchical designs. Some databases, such as those built using software such as Lotus Notes or the World Wide Web, can be described as document-oriented -- e.g., the database organizes text based on documents, forms, and templates rather than records and fields. Database purists might argue with this interpretation (Lotus Notes, for example, uses fields within its documents and is actually built upon a sophisticated database engine that most users never see), but the development of such systems is worth noting because of their management implications. For some purposes, document-oriented databases may be easier to construct and use than more traditional designs. We will examine this question during the course by using several examples of newly-emerging database designs, and comparing the results with using Microsoft Access, a small-scale relational database.
File-oriented systems Early computer programs retained data that programmers thought important to program execution as lines of data included in the program code itself. This approach worked well for many scientific programs, where the lines of code devoted to processing were extensive relative to the lines of code devoted to describing data, but business programs (which often represented relatively simple processing steps applied to huge amounts of data) soon proved unwieldy. Soon programs were being written to use so-called data files, or external lists of data that one specific program knew how to read.
File-oriented systems preserved specific data files for specific programs. This was efficient from a processing standpoint, but soon led to complications from a business standpoint. File-oriented systems tended to be slow, hard to maintain, and very cumbersome when business processes required trading data across organizational functions or departments: too often the programs in one department could not read the data used by programs in another (as suggested in Figure 8).
File-oriented systems led to situations where corporate data existed in fragments throughout the organization but where moving data across functional boundaries (to track a business process, for example) was extremely difficult, if not impossible. Using the arrangement of programs and files shown in Figure 8, for example, it would be very difficult for a sales representative to tell a customer what the expected price changes on back-ordered products might be: prices reside with Accounting while inventory information resides with the Orders Department; both sets of data are accessed by different people using different systems; and the file formats used by different programs may be incompatible.
Figure 8: File-oriented database design
Hierarchical systems The original response to the limitations of file-oriented systems, as we have discussed, initiated the development of specialized database management software (database management systems, or DBMS). Early forms of DBMS arranged data in separate repositories in a hierarchical structure based on what functions system developers expected the database would be used for most frequently. This hierarchical structure collects specific data characteristics into data fields which are arranged as data records within a database. The hierarchical model derives its name from the manner in which records are linked for navigational purposes within the database. A customer record, for example, might include a customer number that links to customer name and address data, order data, and price data.
In Figure 9, for example, a hierarchical database is designed for rapid searches of orders by customer number. Each customer number is linked to a customer name and address and one or more orders. By searching to find what orders are associated with a customer number, sales representatives can find what products are included in any order, and search the database separately to find which orders are backordered. Accounting staff can accumulate order quantities and prices by customer number to generate invoices.
Figure 9 also illustrates two of the kinds of links between data items that play a key role in database design. Note that in this example, each customer number is associated with one and only one name (e.g., see link (a) in the schematic). Database designers often refer to such a link as a one-to-one relationship. Each customer number, however, is linked with one or more orders (as suggested by the Product/Price illustration at the bottom of link (b) in the schematic). This kind of link is often referred to as a one-to-many relationship. Two other kinds of links are possible: many-to-one relationships and many-to-many relationships. At one level, much of the work of database design focuses on identifying such relationships and accommodating record structures to them.
The advantages of separating data from programs using a DBMS begin to become apparent from examining Figure 9. Now any sales rep can use the database to answer customers' questions about orders, and any member of the accounting staff can use the database to generate invoices. In this sense the hierarchical database begins to solve the data fragmentation problems suggested by the file-oriented design shown in Figure 8, and offers a way to share data for multiple purposes across organizational boundaries.
Figure 9: Hierarchical database design
The hierarchical design, however, carries with it severe limitations. In the example above, these become clear when we attempt to generate a price list for all products that includes order and backorder history. Such a task quickly demonstrates that the database above is well-designed to answer any questions that can be answered based on looking at data according to customer number or product number, but is less well suited for answering questions that require some other navigational hierarchy (e.g., price). Building a price list using the database design above would require sorting through all products and prices by order and customer number, then rearranging the results on the basis of price. Only then could backorder data begin to be added. Such tasks often required extensive custom programming in early hierarchical databases. When business conditions changed to make existing navigational hierarchies obsolete (as when insurance companies sought to understand their product sales on a customer basis rather than on a policy-number basis), hierarchical databases sometimes proved unexpectedly cumbersome.
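The contrast between the question the hierarchy was built for and the question it was not built for can be sketched with a nested Python structure. This is only an analogy for a hierarchical record layout, not how any particular hierarchical DBMS stores data; all customer numbers, product numbers, and prices are invented.

```python
# Hypothetical hierarchy: customer number -> name/address -> orders.
hierarchy = {
    "C001": {
        "name": "Jones Hardware",
        "address": "12 Main St",
        "orders": [
            {"product": "P10", "price": 4.50, "quantity": 20, "backordered": False},
            {"product": "P22", "price": 12.00, "quantity": 5, "backordered": True},
        ],
    },
    "C002": {
        "name": "Smith Supply",
        "address": "9 Elm Ave",
        "orders": [
            {"product": "P10", "price": 4.50, "quantity": 8, "backordered": True},
        ],
    },
}


def orders_for(customer_number):
    """The anticipated question: one step down the hierarchy."""
    return hierarchy[customer_number]["orders"]


def price_list():
    """The unanticipated question: walk every customer and every order,
    regroup by product, then re-sort the result by price."""
    rows = {}
    for customer in hierarchy.values():
        for order in customer["orders"]:
            entry = rows.setdefault(
                order["product"],
                {"price": order["price"], "ordered": 0, "backordered": 0},
            )
            entry["ordered"] += order["quantity"]
            if order["backordered"]:
                entry["backordered"] += order["quantity"]
    return sorted(rows.items(), key=lambda item: item[1]["price"])


print(orders_for("C002"))
print(price_list())
```

`orders_for` follows the stored path directly, while `price_list` has to visit the whole structure and rebuild it around a different key. That second function stands in for the custom programming described above.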
Network systems Network database designs (not to be confused with data communications networks, which are a different subject entirely) evolved in part to solve navigational problems encountered in hierarchical designs. In practice, the two types of databases often appear quite similar. Network designs, however, build more sophisticated links between database records than do hierarchical approaches. In particular, network designs enable multiple paths between records. These paths are maintained as part of the database definition, and are intended to specify mechanisms for answering most conceivable questions that could be asked of the data.
The schematic shown in Figure 10 suggests a simple extension to the hierarchical database that would add network design properties. Here, for example, an explicit pathway is maintained between product numbers, orders, and prices (see the arrow labeled (a) in the schematic). Maintaining this link would make it easier to traverse from customer orders to backordered products to determine the intersection of the two sets -- e.g., which records exist in both groups. The result would list what customers had products on backorder, and describe pricing for those products. In this sense, the network database would make it easier to collect pricing information without the potentially cumbersome multi-step processing required by the hierarchical design.
Figure 10: Network database design
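The idea of maintaining more than one stored path to the same records can be sketched in Python. The two link tables below (`orders_by_customer` and `orders_by_product`) are hypothetical stand-ins for paths kept in the database definition, and all record contents are invented.

```python
# Hypothetical records, each stored once.
customers = {"C001": {"name": "Jones Hardware"}, "C002": {"name": "Smith Supply"}}
products = {"P10": {"price": 4.50}, "P22": {"price": 12.00}}
orders = {
    "O1": {"customer": "C001", "product": "P10", "backordered": False},
    "O2": {"customer": "C001", "product": "P22", "backordered": True},
    "O3": {"customer": "C002", "product": "P10", "backordered": True},
}

# Two explicit, stored paths to the same order records.
orders_by_customer = {"C001": ["O1", "O2"], "C002": ["O3"]}
orders_by_product = {"P10": ["O1", "O3"], "P22": ["O2"]}


def backorders_for_product(product_number):
    """Follow the product path to find customers with that product on backorder."""
    result = []
    for order_id in orders_by_product[product_number]:
        order = orders[order_id]
        if order["backordered"]:
            customer_name = customers[order["customer"]]["name"]
            result.append((customer_name, products[product_number]["price"]))
    return result


print(backorders_for_product("P10"))
```

The product path answers the pricing question without walking every customer. The cost is that each extra path must be kept in step with the records whenever an order is added or changed.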

Relational systems Relational databases employ a fundamentally different model for understanding data relationships than do either hierarchical or network database designs. The breakthrough achieved by this approach was to employ a mathematically precise (and therefore repeatable) relational algebra to link data listed in data tables in such a way that even unforeseen combinations of data could be reconstructed relatively easily from the database.
Simply put, relational databases completely separate data storage from data navigation. Relational designs focus exclusively on relating data in tables in a fashion that enables later processing to recombine small tables into larger ones that answer specific questions. The design insight here was that by separating data tables from (navigational) processes that combine data in tables, it would be easy to answer even questions that were not anticipated at the time the database was designed. In this sense, relational databases represented a further step in the modularization of database software: now data storage and data navigation could be separated from one another.
This advantage also pointed to the initial drawback of relational databases: to accomplish anything meaningful with the data required a very large amount of processing. Because of this, early relational databases were very slow. Once computer power evolved to the levels where the processing-intensive aspects of relational designs could be accomplished at speed, the relational model gained broad popularity across multiple hardware and software platforms. Chances are that if you hear the term "database" used to refer to production-strength data manipulation applications built in the past 5 to 7 years, the reference is to a relational database design. Most of the leading DBMS that were developed for use on minicomputers are based on the relational model, including Oracle and Informix. The major database packages used on PCs, including Paradox and Access, employ a relational perspective. For this reason we will concentrate in this course on relational designs.
Figure 11 shows the customer order example expressed as a relational database. The relational design breaks apart customer data into its smallest component parts, identifying each part with a unique identification (i.d.) number -- e.g., each customer name gets a customer number (ensuring, for example, that the database can distinguish between many different customers named Jones). Activities that occur within the organization can thereby be represented by combinations of i.d. numbers. For example, an order can be represented by the combination of a customer i.d. number and a product i.d. number (see the Orders Table below).
Questions that may not have been anticipated in the original design of the database can be readily answered by combining pieces of different tables. The table shown in panel (b) of Figure 11 shows how much revenue is represented by backordered product. It was generated by using customer and product i.d.'s to combine data from all of the tables shown in panel (a).
Figure 11: Relational database design
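A relational version of the customer order example can be sketched with Python's built-in `sqlite3` module. The table layouts, column names, and sample rows are assumptions made for illustration and are not taken from the figure; the two customers named Jones show why each customer needs its own i.d. number.

```python
import sqlite3

# In-memory database; schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT, price REAL);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    product_id INTEGER REFERENCES products(product_id),
    quantity INTEGER,
    backordered INTEGER
);
INSERT INTO customers VALUES (1, 'Jones'), (2, 'Jones'), (3, 'Wu');
INSERT INTO products VALUES (10, 'Widget', 4.50), (22, 'Gadget', 12.00);
INSERT INTO orders VALUES
    (100, 1, 10, 20, 0),
    (101, 2, 22, 5, 1),
    (102, 3, 10, 8, 1);
""")

# Combine all three tables through their i.d. numbers to answer a question
# that the table design itself did not anticipate.
backorder_revenue = conn.execute("""
    SELECT c.customer_id, c.name, p.name, o.quantity * p.price AS revenue
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    JOIN products AS p ON p.product_id = o.product_id
    WHERE o.backordered = 1
    ORDER BY c.customer_id
""").fetchall()

total = sum(row[3] for row in backorder_revenue)
print(backorder_revenue)
print(total)
```

Nothing in the three tables records a navigation path. The join is specified at query time, so a different question needs only a different query rather than a redesigned database.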
Object-oriented databases
The most recent development in database modularization is object-oriented databases. Object-oriented systems attempt to use database structures to more directly reflect definitions used in business practice. For example, an object-oriented database may enable a designer to define an object called a "customer", and then define to the database that a customer has a specific list of attributes, e.g., a name, an address, a credit limit, and so forth. Similarly, a "product" object would have a name, a price, and other attributes. Building a system that understands attributes of the objects it represents can make the database easier to use. For example, a system that understands that orders include quantities of product objects, each of which has a price attribute, can calculate projected revenues with very little instruction from the user. In fact, an object-oriented database could attach a "revenue" calculation to each individual order object, based upon that order's quantity and the relevant product price. Projecting revenues would not mean much more than summarizing these revenue calculations for all the order objects in the database.
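The revenue example can be sketched with ordinary Python classes. This illustrates the object idea only and is not an object-oriented database; the class names, attribute names, and sample values are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    price: float


@dataclass
class Order:
    product: Product
    quantity: int

    @property
    def revenue(self) -> float:
        """Revenue calculation attached to each individual order object."""
        return self.quantity * self.product.price


# Hypothetical products and orders.
widget = Product("Widget", 4.50)
gadget = Product("Gadget", 12.00)
orders = [Order(widget, 20), Order(gadget, 5)]

# Projecting revenue is just a summary over the order objects.
projected = sum(order.revenue for order in orders)
print(projected)
```

Because each order knows how to compute its own revenue, the code that projects revenue needs no knowledge of prices or quantities.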
Object-oriented systems have two other characteristics, aspects of which we will employ in the process modeling that we perform in this course. These characteristics are captured by concepts labeled inheritance and specialization.
Inheritance An object-oriented system can understand "generalized" characteristics of objects. For example, if a company looks at all its customers, it will realize that all of its customers have a name and an address. A generalized "customer" object would therefore have attributes for capturing both. The power of this representation begins to become evident when the system generates a new object that inherits the characteristics of a more general one. For example, consider that Lynn Wu becomes a new customer by placing an order with the company. A hierarchical database would need to add a new customer and order records for Wu. A relational database would add new data table entries for Wu, her address, her order, and any other relevant information. An object-oriented database would generate a new customer object for Wu and a new order object for her order.
The way in which an object-oriented system would accomplish this is described in Figure 12. The system would already know about attributes assigned to customers (e.g., name, address) and attributes assigned to orders (e.g., product, customer). In understanding Wu's order and customer characteristics, the system would generate a new customer object specifically for Wu that inherited the characteristics of the more general customer object whose characteristics it knows. It would do the same for Wu's order. Now the system would have a customer object and a Wu customer object whose attributes were inherited from the more general customer object. The same would be true for Wu's order: the system would already know many of the characteristics of an order because of its knowledge of the generic order object.
Inheritance is one aspect of object-oriented databases that makes objects reusable. In this sense, information associated with a known customer object acts as a template for data organized about specific customers. The more sophisticated object representations become, the more definitional knowledge can be reused, and the more useful the database's representation of real-world data becomes.
Specialization Inheritance accounts for data related to new objects when those objects' attributes are the same as attributes associated with a more generalized object in the database, but what about situations where objects are clearly similar but different? As a simple example, consider a customer in the United States (U.S.) vs. a customer in England (the U.K.). Each customer will have an address that includes a shorthand location code. The address of the U.S. customer will include a zip code -- a number that will include at least five digits but possibly five digits followed by a dash followed by four more. The address of the U.K. customer will include a postal code, which will be represented by a collection of four or more numbers, characters, and spaces. These differences may seem trivial -- until the company needs to send out $3 million in refunds to 50,000 different addresses and needs to sort the letters by location code.
An object-oriented database would account for differences in location code by specializing an address object. An address object might include street, city, and location code information. A U.S. customer's address would inherit all of these attributes -- but would specialize to the extent that it would include a zip code as the location code. A U.K. customer would inherit address attributes but specialize the location code as a postal code. Based on this specialization, the system would know to use a zip code for any future U.S. customer and a postal code for any future U.K. customer (see Figure 12).
Figure 12: Object-oriented inheritance and specialization
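Inheritance and specialization of the address object can be sketched with Python classes. The class names and sample addresses are hypothetical, and the two patterns follow only the loose descriptions given above (five digits with an optional dash and four more; four or more letters, digits, and spaces). They are not complete postal validation rules.

```python
import re


class Address:
    """General address object: street, city, and a location code."""

    def __init__(self, street, city, location_code):
        if not self.valid_code(location_code):
            raise ValueError(f"invalid location code: {location_code!r}")
        self.street = street
        self.city = city
        self.location_code = location_code

    @classmethod
    def valid_code(cls, code):
        # The general object only requires that some code be present.
        return bool(code.strip())


class USAddress(Address):
    """Inherits street and city; specializes the location code as a zip code."""

    _ZIP = re.compile(r"\d{5}(-\d{4})?")

    @classmethod
    def valid_code(cls, code):
        return cls._ZIP.fullmatch(code) is not None


class UKAddress(Address):
    """Inherits street and city; specializes the location code as a postal code."""

    # Deliberately loose: four or more letters, digits, and spaces.
    _POSTCODE = re.compile(r"[A-Za-z0-9 ]{4,}")

    @classmethod
    def valid_code(cls, code):
        return cls._POSTCODE.fullmatch(code) is not None


us = USAddress("1 Main St", "Boston", "02134-1000")
uk = UKAddress("10 High St", "London", "SW1A 1AA")
print(us.location_code, uk.location_code)
```

Both subclasses reuse the general constructor unchanged and override only the rule for the location code, which is the sense in which an object inherits general attributes and specializes one of them.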


Assuming sufficient computing power to represent and manipulate objects (an assumption that is only beginning to prove true on a production basis in the mid-1990s), object-oriented databases can provide broad libraries of reusable objects whose structures are relatively straightforward to understand and whose use of inheritance and specialization support both precision and flexibility.
It is likely that object-orientation will represent the next major shift in information system and database design. Object-oriented systems modularize both data and processing according to objects that represent real-world entities. This represents a further step beyond the separation of data structure from navigational structure that relational databases strive to achieve. We are currently seeing trends towards object-orientation occurring at many levels within software architectures, from operating systems (where every major operating system vendor is working on object-aware systems for release over the next 3-5 years), to program development (where "componentware" such as Visual Basic and object-oriented languages such as C++ are becoming the development and prototyping platforms of choice within large corporations), to user interfaces (on machines like the Macintosh and the Windows 95 standard that represent programs and files as objects on the screen). At present, however, production-strength object-oriented databases are only just beginning to emerge as commercial products. In many respects, object-oriented databases are at a similar stage of development to that of relational databases ten years ago.
In this course, we will perform process analyses that rely greatly on object-oriented perspectives. The specific database development tools that we learn, however, will be geared to relational database designs. This mix represents a reasonable approximation of conditions existing within many large corporations in the mid-1990s.