An early method for developing data-based systems involved considering the generalized structure of the data that had to be captured and maintained by computer software. Several methods were developed for this purpose; one of the most successful is referred to as entity-relationship analysis. This technique produces entity-relationship diagrams (ERDs) that summarize, often on a single page, characteristics of the data that make it easier to design a database that proves useful.
ERDs are particularly relevant for our purposes because they facilitate the design of tables and links between tables that we would use in a relational database (e.g., such as Access). One quick way to translate between ERDs and a commonsense perspective on your database design problem is to remember two approximations that translate between data structure analysis and actual database design:
We will discuss this terminology more in class, and will exercise the idea of key fields - e.g., primary keys, foreign keys, and candidate keys - sufficiently to eliminate any questions you might have about these concepts.
This reading takes simple examples of entity-relationship diagrams (ERDs) and uses them to develop data tables and relationships that work effectively within Microsoft Access. It is adapted from an example presented on pages 104-115 and page 156 of McFadden, F. R. and J. A. Hoffer (1994) Modern Database Management Menlo Park, CA, Benjamin/Cummings.
Mountain View Hospital (MVH) is a small 100-bed community hospital that is trying to deliver high-quality health care while controlling costs within a service area that includes a section of central Colorado. The population of the service area is growing rapidly (at greater than 8% per year) as more people move to the area, and MVH is attempting to upgrade its information infrastructure to handle a greater patient load without a greater increase in costs. The hospital is expecting to increase by 50 beds in the next two years, and has placed money in its strategic investments fund to support the development of new information systems to support its internal medicine and surgical care divisions. Medicine accounts for 60% of the hospital's patient load and the Surgery service accounts for most of the rest.
The hospital is particularly interested in the use of information systems for supporting patient tracking. Hospital management is acutely aware that regulatory bodies are becoming increasingly concerned with understanding the quality of care on a case-by-case basis, and recent news stories about process-oriented mishaps at a community hospital in Florida (including the amputation of the wrong leg on one patient) have given both MVH administrators and physicians new motivation to collaborate on a goal best described as "that will never happen here".
The patient tracking system raises questions about how best to follow patients through the cycle of their relationship with the hospital. It is clear that the hospital wants to collect data on patients and their experiences with MVH medical care as they interact with hospital administrative departments and MVH physicians. The specifics of this data collection are less obvious, however.
In this example, we will use ERDs to suggest one way of analyzing the data that might be collected about an MVH patient. We will use the same sequence of steps suggested at the end of the prior reading that discussed performing entity-relationship analysis. The table that suggested this sequence is reproduced below (see Table 1).
Table 1: Suggested sequence for entity-relationship analysis
| 1. | Name a set of candidate entities that map the scope of the database. | What labels do you put in the boxes? These will be names for your data tables. |
| 2. | Consider entity attributes. | What bubbles go around the boxes? These will be field names for (e.g., the columns in) your data tables. Alternatively, what list of attributes do you want to include for each box? |
| 3. | Consider candidate key fields. | What bubble label(s) get underlined (or bolded, if you wish)? These will be key fields for your data tables. |
| 4. | Consider relationships (you may want to do this step immediately after naming entities). | What lines do you draw between the boxes? What labels to you give the lines? You will use these connections when you specify relationships between your data tables. |
| 5. | Consider attributes that represent repeating groups. | Should any of the bubbles have a double line for its outline (or are represented by italics in our shorthand version)? If so, think about placing them in a separate data table. |
| 6. | Extend repeating groups into additional entities. | Label a new box for the new entity and consider its attributes. This will add a new data table, with a one-to-many link. |
| 7. | Consider potential many-to-many links. | Look for lines that have double crow's-feet connecting their entity boxes, e.g.,
![]() |
| 8. | Resolve many-to-many relationships by adding linking entities. | Add a new box that can break the M:M relationship into two 1:M relationships. This will be a new data table, with two new relationships to specify:
|
The first step suggested in Table 1 is to name the entities about which you want to collect data. For MVH, this obviously includes each patient, (which can generalize to a Patient entity), but it also includes the "things" with which a patient interacts. For example a patient may be billed for a procedure, may be assigned to a hospital room, and may be seen by one or more physicians at specific times during his or her stay at MVH. During any of those procedures or visits, a patient may use one or more kinds of consumable supplies. These range from items such as bed linens in patient rooms to medications employed during surgical procedures.
The "boxes" of an ERD diagram provide a rapid way to assemble a possible description of these entities (see Figure 1). At this stage, this description should not be considered a definitive list of the entities to be considered, but it can rapidly generate an initial version of real-world entities that a database system might need to understand in order to support processes within the hospital.
Figure 1: Candidate entities for MVH
Having a list of entities makes it possible to consider the attributes of each entity in turn. The next step in Table 1 suggests reviewing each suggested entity in turn to understand which of its characteristics the hospital might want to collect information on. A preliminary list of such characteristics is shown below in Figure 2. Information on patients includes contact information; information on physicians includes their medical specialty; information on services includes a relevant description and a price, and information on supplies includes appropriate costs.
Figure 2: Entity attributes
Note that some of the attributes are underlined. These represent characteristics that could be used as key fields in a relational database. Key fields represent the data items that provide the single "unique identifier" for each row of a data table in a relational design. In this case, for example, Patient # represents a single number that could consistently identify one and only one patient. Social security numbers (a single number that identifies each U.S. citizen) are often used for this purpose. Since relational databases use key fields to combine tables in appropriate ways, selecting keys is an important step in database design. ERDs assist by making it easy to concentrate on each entity's possible key fields in turn. Sometimes there are obvious options for keys: a room number for the Room table, for example. Sometimes an organization will develop a complex system of identifiers to track materials or supplies (many manufacturing organizations use a complex system of part numbers for this purpose, and so many retail organizations key inventory by stock keeping units, or "skus" (pronounced "skews") that the term has become part of the language of warehousing). In this instance, the Consumables # (or Consum. #) shown above is probably an identifier that is generated and used internally by the hospital.
Note also that the Telephone attribute for the Patient and Physician entity is printed in italics. In the notation we are using in this discussion, italics refer to a repeating group. Upon examination of the attributes of a patient, for example, it becomes clear that each patient may have more than one telephone number. The same is true for physicians: one doctor may have many numbers (e.g., an office number, a home number, a pager number, a mobile phone number, etc.). Since relational databases have difficulty handling repeating groups within a single data table, this suggests that it might be worthwhile to add a new entity to the diagram, called Telephone, to represent the many possible telephone numbers that might be recorded for each patient. Telephone would describe a data table that captured information on telephone numbers. It is interesting to note that the same entity might be able to capture information about physicians' telephone numbers as well. The final representation of such an entity appears later, following the discussion of relationships.
The reason for drawing entity boxes on a page as suggested by the foregoing figures is to facilitate a focus on relationships between entities. These relationships will suggest how the data tables that implement suggested entities within a relational database will be linked. Figure 3 suggests some of the possible relationships that may exist among the entities identified for MVH.
Figure 3: Relationships between entities
Represent relationships between entities by drawing lines between the boxes; the "crow's feet" at the end of the lines suggest what type of relationship the picture represents. We will focus on three types of possible relationships: one-to-many, many-to-one, and many-to-many. One to many is reasonably easy to understand: one physician may have many telephone numbers. Many-to-many relationships (like those shown above in Figure 3) are sometimes more complex. For example, the schematic suggests a many-to-many relationship between physicians and patients. This occurs because each physician may take care of many patients but many patients may, over the course of their stay at MVH, come under the care of more than one physician (in this sense, whatever is not "one" is "many"). In the same sense, each patient receives many services from the hospital during the course of treatment, but each service may be delivered to many patients over time.
Many-to-many relationships pose a special problem for relational databases because the unique identifiers that such database designs impose can only join data on a one-to-many basis. This means that any many-to-many relationship needs to converted to several related one-to-many relationships as part of database design. ERDs assist this effort by making it possible to isolate specific relationships, then to use entity attributes to understand how such conversions can occur.
To understand this better, consider the patient/service relationship suggested above. A traditional ER diagram might represent that relationship (in simple form) as shown in Figure 4:
Figure 4: The Patient/Service relationship example
This picture suggests, at very least, a Patients table and a Services table (derived from the Patient and Service entities). The Patients table might have fields like ID, First Name, Last Name, address & phone. the Services table might have fields like Service ID, Description, and Price.
If we were to move directly to representing this ERD using data tables, a preliminary tables list might include the following tables and fields. Patient ID is the unique identifier for the Patients table and Service ID is the unique identifier for the Services table.
Figure 5: Patients and Services tables
The ER diagram suggests that one service may have zero to many patients and each patient uses from one to many services. This looks like a problem, i.e., a many-to-many relationship. It would be difficult to build relationships between the two tables without inserting hierarchical meaning within one table (e.g., suggesting that patient 300 gets both services 30 and 20). This might work for some applications, but would mean that the database no longer fit the relational model.
To convert the many-to-many relationship to two one-to-many relationships, consider adding a third table, e.g. a transactions table, that links patients with services on the basis of two one-to-many relationships.
Figure 6: Linking tables break many-to-many relationship conflicts
Note how adding a new entity (called transactions) to represent the link across the many-to-many relationship enables a relational database to maintain its unique-identifier structure while capturing information about specific transactions, even when more than one service if offered to more than patient or many patients receive many services.
Notice that the Patient ID (P#) and the Service ID (S#) are in the Transactions table, and provide a way of connecting from any transaction back to its relevant patient and service(s). Access builds all of its relationships between tables based on primary keys in one table (e.g., Patient ID or P# in the Patients table) that are present as a "foreign" key in another table (e.g., Patient ID or P# in the Transactions table). Where the program finds a match across inter-table relationships, it can build accurate connections between individual records.
Please note that you can sketch tables and links between tables very quickly on a piece of paper; the effort will help immensely when you are in the midst of using Access to define a database.
Figure 7: Patients, Services, and Transactions tables: resolving a many-to-many link
Using Microsoft Access, it is possible to define relationships between data tables, using the Tools Relationships menu command, by dragging and dropping fields between tables much as suggested by Figure 7. Using this approach, it is possible to translate relatively directly from an ERD to a database design. Figure 8, for example, shows a relatively complete ERD for the MVH example that adds attributes and relationships to the initial entity list suggested by Figure 1. The Telephone entity is included to handle the repeating group implied by telephone numbers. Each many-to-many relationship is converted to two one-to-many relationships through the use of a linking table (shown in Figure 8 using a vertical bar to represent the "one" and the infinity symbol to represent the "many" side of the relationship).
Figure 8: ERD showing entities, attributes, and relationships
Figure 9: Relationships window implementing MVH design in Access
If you turned Figure 8 on its side, you would see a picture similar to that shown in Figure 9. This is a screenshot taken of the Relationships window in an Access database that implemented the data structure design suggested in this discussion. It shows the results of defining Access tables with the primary keys and attributes shown in Figure 8, including the four linking tables (Supplies Consumed, Room Assignments, Transactions, and Visits) that converted many-to-many links into a more manageable form. The Tools Relationships command was then used to specify links (one-to-many links) between each appropriate table. All queries and forms subsequently defined within this database will automatically know how to organize data between the data tables based on these relationships.
This simple example will, I hope, clarify questions that you have
about ERDs - especially if you have not encountered data structure
analysis in earlier courses or work experience. As we use the
technique in this course, it is intended as a rapid way to identify
key elements of database design relating to the constraints of
the relational data model, including entities, relationships,
key fields and repeating groups. Each of these elements translates
directly into important conversions that you might need to accomplish
for prototype databases that you design using sofware such as
Access. Indeed, the Access interface tries hard to provide ways
to facilitate such a translation -- so much so that a well-designed
Relationships window within a working Access database will bear
a one-to-one correspondence with the ERD that you developed to
understand relevant data structures while designing the database.