Review of Entity-Relationship Modeling

@ Charles S. Osborn, Babson College 1996

RDB limits | ER vocabulary | 1:1 | 1:M | M:M | Summary | ER Checklist

One perspective on the trends in database design since 1950 emphasizes the growing realization of the importance of data structures as software design becomes increasingly modularized. File-oriented database systems were concerned with the structure of data that were kept in customized files intended only for predefined uses. Hierarchical and network database designs had to specify record structures that could adequately capture the meaning in corporate data so that managers and staff could complete appropriate aggregations and analyses. Relational databases faced the more difficult challenge of designing data tables that could recombine to answer future questions, not all of which could be anticipated by designers. Object-oriented databases were intended to mirror actual business conditions with sufficient fidelity that ostensibly unanticipated questions would prove largely anticipated by the structure of the objects in the database.

The development of data analysis tools has occurred parallel to the development of database designs themselves. As database designs evolved, analysis tools that helped designers understand the structure of the data that they were attempting to organize came to play an increasingly important role in the design process.

Of the many different approaches to database design available, one perspective has proven particularly useful for understanding the mix of data and navigational characteristics employed by relational databases. This technique, known as entity-relationship (ER) modeling, focuses on the characteristics of data used by an organization and the relationships between those characteristics. It provides effective assistance when translating high-level descriptions of the data generated by business processes into the low-level descriptions of data tables and data identifiers used by relational database systems.

ER modeling is a technique for producing so-call entity-relationship diagrams (ERDs). ERDs provide graphical techniques for organizing an analyst's understanding of the major data entities used by an organization, their characteristics, and the ways in which they might be deployed within a relational database. For the next section of this course we will concentrate on learning to build ERDs in order to understand how they can accelerate and facilitate the process of relational database design.

ER modeling dates from the mid-1970s, when database analysts were struggling to develop an integrated, generalizable view of data structures that would make it easier to design databases that could abstract data characteristics away from the needs of data navigation. Since then, ERDs have become well enough accepted as data-description tools to be built into many commercial computer-aided software engineering (CASE) tools, effectively becoming one of the main links between data conceptualization and database execution. If ER elements appear to you to be somewhat object-oriented, do not be surprised -- it can be argued that ER modeling represented one of the final stages in analytical thinking as trends in analysis evolved towards object-orientation.

ER models attempt to depict data objects and data relationships in two dimensions on a single page by using a vocabulary of symbols, labels, and lines. This vocabulary acts, in effect, as a table of contents for more detailed information about data characteristics, data tables, and data identifiers. ER analysis can compress what would require pages of narrative description into a small number of diagrams that attempt to depict data structures schematically.

In this course, we will use ERDs to assist in designing data tables in Microsoft Access. You may discover that newer versions of Access have a so-called Relationships window that mimics the final result of ER analysis, making the job of translating data analyses into specific table designs that much easier. The following discussion offers an introduction to ERDs; some database engineers spend their entire careers working within ERD vocabularies, and extensions of ER modeling can become much more complex that shown here. For a more complete discussion, see Chapter 4 of McFadden and Hoffer (1993:123-166).

Limitations of relational databases

The relational model of database design offers flexible mechanisms for representing data in the context of changing business conditions because it allows for recombining small data tables into larger ones. These transformations can be completed on an ad hoc basis for supporting business processes or answering management questions.

Unfortunately, the effectiveness of a relational database depends largely on the care with which the initial data tables that comprise the database are designed. If the data tables are incorrectly specified, the "relational" model no longer works within the database and navigational problems rapidly surface that recall the difficulties of old-style hierarchical database design.

One key issue in relational database design focuses on how many data tables to build. Build too few and you build a so-called "flat-file" database that cannot deliver the advantages of relational design; build too many and you develop so many data tables that the database becomes helplessly confusing. Build the wrong tables and you can generate both problems.

ERDs offers a mechanism for understanding what and how many data tables to use for database prototypes. On a complex (i.e., real-world) problem, they can assist in simplifying the task of data analysis by

The following discussion introduces the vocabulary of ER modeling and suggests how to use ERDs to accomplish each of the above three objectives: i.e., to identify candidate keys, to manage repeating groups, and to handle many-to-many relationships.

ER symbols

ERDs use a number of symbols (representing data entities, their attributes, and relationships) connected by lines (representing characteristics of those relationships). We will use a subset of these symbols in this course. A more complete set can be found in McFadden and Hoffer (1993:125). Figure 11 shows the major components of an ER diagram as we will use them. These components include entities, entity attributes, relationships, and relationship types.

An "entity" represents something about which you wish to save data. Academic descriptions are more precise (and perhaps a bit overblown): McFadden and Hoffer define an entity as "a person, place, object, event, or concept in the user environment about which the organization wishes to maintain data" (1993:126). In ERDs, entities are normally represented by boxes with one-word labels in them.

Entities have attributes that represent the entity characteristics that you wish to capture in the database. Entities are related to one another in identifiable ways that describe the links that need to be maintained between tables in a relational database. The combination of entities, attributes, and relationships assists in (a) defining what data tables and relationships you should define for your database, and (b) what further tables and relationships you should identify to compensate for limitations within relational database design. The latter use for ERDs suggests (1) when a set of attributes should be included in its own data table, and (b) when a new data table needs to be introduced in order to adjust for many-to-many relationships (which relational databases have difficulty representing). Entities, attributes, and relationships assist in specifying relational database fields, records, and tables, as shown in Table 2. Figure 13 summarizes the vocabulary of symbols used by entity-relationship modeling.

Table 2: ERD vocabulary and relational database components

ER model component Relational database component or indication
Entity Names a data table
Attributes Describes data table's fields
Repeating groups Identify a need to build a new data table (by specifying a new entity)
Relationships Describe links between data tables, based on key fields
Many-to-many relationships Identify relationships that must be "broken" by the addition of a new, "linking" data table

Figure 13: Entities, Attributes, Relationships, and Kinds of Relationships in ERDs

For example, in the relational database we discussed above that tracked customers, orders, products, and prices, both Customer and Product are entities (as are Orders, Backorders, and Prices). Attributes shown for Customers are Customer Number and Customer Name. Attributes shown for Products are Product Number and Product Name (see Figure 11 and Figure 14 below).

Figure 14: Examples of Entities and Repeating Groups

Repeating groups represent one way in which ERDs assist in converting data structures into working relational databases. As we will see in the exercises we complete over the next several weeks, repeating groups actually signal a hidden entity -- one which is related to the current entity by a one-to-many relationship. For example, where each customer has more than one telephone number, the hidden entity is telephone numbers: this suggests the addition of a telephone number table to the database. The relationship linking the tables would be labeled "has". There would remain one line in the customer table, identified by customer i.d. number, and potentially several lines in the Telephone Numbers table, each identified by the same customer i.d. The relational database would process both tables, matching i.d. numbers to reassemble the telephone numbers for any one customer on an ad hoc basis. This change is shown in Figure 15.

Figure 15: Extending repeating groups into one-to-many relationships

The manner in which an ERD would handle this transformation is shown below in Figure 16. Panel (b) of this figure illustrates a one-to-many relationship specified between a customer entity and a telephone entity. One implementation of this relationship, translated into data tables and key fields, is illustrated in Figure 15 above.

ERDs capture three key relationships between data entities. These relationships translate directly into specific types of links between data tables in a relational database. We have described these relationships as one-to-one, one-to-many, and many-to-many relationships.

Figure 16: Examples of relationships

One-to-one relationships

A one-to-one relationship, for example, suggests that for each unique key field in one data table there will be one and only one row in another data table that includes that key field value. The example in panel (a) of Figure 16 suggests that for each employee identifier in a data table named Employee, there will be one and only one row in a data table named Parking Place that contains that employee identifier:

Figure 17: Implementing a one-to-one relationship

One-to-many relationships

For a one-to-many relationship, the data table for the entity on the "one" side of the relationship will have a single uniquely identified row whose identifier appears in more than one row of the data table representing the entity on the "many" side of the relationship (see Figure 15).

Many-to-many relationships

A many-to-many relationship encounters one of the limitations of relational database design. Where a relational database encounters non-unique identifies (e.g., the same customer i.d. occurs on multiple rows of both tables being compared), it has no way to make valid comparisons between the tables. In essence, a relational DBMS can efficiency search one-to-many relationships but becomes confused by many-to-many relationships.

ERDs assist in resolving this problem by identifying where many-to-many relationships exist, then offering guidance on how to convert a many-to-many relationship into two one-to-many relationships by adding a new, or "linking" table.

Panel (c) of Figure 16 illustrates a many-to-many relationship represented by customers placing orders for products. In the real world, a customer may order one or more products, and one product may be ordered by one or more customers. Most DBMS software cannot handle such a relationship gracefully (although human beings have little difficulty understanding it).

The solution is to convert the many-to-many relationship into two one-to-many relationships by adding a third table. In the example used in Figure 16, this means adding a third table (call it Orders) that links both customers and products in a one-to-many fashion: each customer will have one or more entries in the Orders table, and each product may have one or more entries in it, but each customer still only appears once in the Customer table and each product still only appears once in the Product table. In this sense resolving the many-to-many relationship with a table that can manage two one-to-many relationships represents a reasonably elegant way of solving the design problem. ERDs can assist in identifying and resolving many-to-many links in this manner.

Figure 18: Linking tables resolve many-to-many relationships

Summary

In summary, ER modeling can contribute to accelerating and supporting your data structure analyses in at least the following ways:

Checklist for completing ERDs

As a final summary, the following list suggests ways in which to use ERDs to support database design. It is presented as a chronological set of steps to consider when designing a database from scratch.

Table 3: Checklist for using ER modeling to accelerate database prototype implementation

Step Conceptual description ERD mechanics description
1. Name a set of candidate entities that map the scope of the database. What labels do you put in the boxes? These will be names for your data tables.
2. Consider entity attributes. What bubbles go around the boxes? These will be field names for (e.g., the columns in) your data tables.
3. Consider candidate key fields. What bubble label(s) get underlined? These will be key fields for your data tables.
4. Consider relationships (you may want to do this step immediately after naming entities). What lines do you draw between the boxes? What labels to you give the lines? You will use these connections when you specify relationships between your data tables.
5. Consider attributes that represent repeating groups. Should any of the bubbles have a double line for its outline? If so, think about placing them in a separate data table.
6. Extend repeating groups into additional entities. Label a new box for the new entity and consider its attributes. This will add a new data table, with a one-to-many link.
7. Consider potential many-to-many links. Look for lines that have double crow's-feet connecting their entity boxes, e.g.,

8. Resolve many-to-many relationships by adding linking entities. Add a new box that can break the M:M relationship into two 1:M relationships. This will be a new data table, with two new relationships to specify: