Basic structures: classes and schemes

Objects/things modeled by UML classes

Databases model enterprises where we find objects or things crucial to the how the enterprise works. This article introduces how to model such things or objects in UML using classes and shows how to implement them in the relational model.

Introduction

A database has information that is representative of the real world. When we model aspects of the real world, we find that we need to represent a physical “thing” or simply a fact or an event that happens.

UML is used in modeling complex object oriented software systems and thus the language is thorough. In this tutorial we use only a subset of UML; here, we introduce a UML class. A UML class (ER term: entity type) is used to model any “thing” in the enterprise that is to be represented in our database. As mentioned, it could be a physical “thing” (e.g., a store, an animal, etc) or simply a fact about the enterprise or an event that happens in the real world (e.g., the lending of a book). We are interested in the following important aspects of a UML class.

name
the name of the class is nearly always a noun as it represents a thing or fact.
description
The first step in modeling a class is to describe it in natural language. This helps us to know exactly what this class means in the enterprise, as two class diagrams could use the same class name and yet represent different things.
attributes
Each class is uniquely defined by its set of attributes (UML and ER), also called properties in some OO languages. Each attribute is one piece of information that characterizes each member of this class in the database. Together, they provide the structure for our objects (entities in ER) or database tables. In UML, we will only identify descriptive attributes— those which actually provide real-world information relevant to the enterprise about the objects being modeled; these are sometimes called natural attributes. We will not add “ID numbers” or similar attributes that a database developer may introduce only for the purpose of the internal implementation of the database; often, these are artificially introduced for efficiency or simplicity. As these are not natural nor descriptive, they are excluded as attributes in the UML class.

Example: Customers

Throughout this tutorial, we’ll build a sales database — it could be for any kind of business. To sell anything, we need to model the customers of our business, so Customer will be our first class (entity type). After meetings and discussions with the managers/owners of the business, we might agree on the following description of a customer.

A customer is any person who has done business with us or who we think might do business with us in the future. We need to know this person’s name, phone number and address in order to contact him or her.

Class diagram

The UML notation for a class is a three-element rectangle that shows the class name (always a singular noun) at the top of the rectangle, its list of attributes with a data type in the mid section, and a list of methods at the bottom. Since the classes used in this tutorial are modeling information and not behavior or actions that objects can take, the diagrams will exclude the third section listing methods. In addition, for the sake of brevity and clarity, some of the diagrams may not list the attributes' data types and readers are encouraged to add the appropriate data types.

Example: Customer class diagram

Customer class diagram
Customer class representing customers of a business by their name and some contact information. Other views of this diagram: Large image - Data dictionary (text)

Relation scheme

In an object oriented (OO) programming language, an object is the instance of a class. Methods can be invoked on an object to inspect and manipulate its properties. Unlike in OO systems, traditional relational databases store information as records of data, not as objects. Thus, for now, we are interested in learning how to model the attributes of a class without being concerned with the methods, thus explaining why these were excluded from the class diagram.

In building a relational database, each class is first translated (mapped) into a relational model scheme. The scheme is identified by the plural form of the class name, and lists all of the attributes in the class diagram.

In the relational model, a relation scheme is defined as a set of attributes, together with an assignment rule that associates each attribute with a set of legal values that may be assigned to it. These values are called the domain of the attribute. Note that, a domain is much more than a data type. A scheme can be shown graphically or in set notation.

Example: Customers relation scheme

The mapping of the UML class diagram representing customers of a company results in the Customers relation scheme, given in the figure below.

Customer relation scheme
Customer relation scheme resulting from its corresponding class. Other views of this diagram: Large image - Data dictionary (text)

Using set notation to represent this relation scheme gives the following:

Customers = {firstName, lastName, phone, street, zipcode}

There is no convenient graphical way to represent domains; we’ll discuss this issue in a later page. For the moment, our Customers relation scheme looks exactly like the Customer class diagram, only drawn sideways. However, it won’t stay that way for long as there are still more concepts to introduce.

Everything is a set

It’s important to recognize that defining schemes or domains as sets of elements automatically tells us a lot more about them, from the mathematical properties of sets.

  • A set cannot contain duplicate elements.
  • The elements in a set are unordered.
  • Given a set, rules can be developed to determine when an element can be included or excluded from it.
  • We can define subsets of them; for example, we can display only a selected set of attributes from a scheme, or we can limit the domain of an attribute to a specific range of values.
  • They may be manipulated with the usual set operators: union, intersection, etc. For example, given a set of customers from one company and a second set of customers from a different company, the intersection of these gives the set of customers in common to both companies.

The properties of sets discussed above can be applied to the Customers example.

  • The Customers scheme cannot have two phone attributes (even if they are called phone1 and phone2).
  • It doesn't matter if a customer's last name is listed at the start or at the end of the relation scheme.
  • The rules that include/exclude elements from a domain tell us that zip codes don’t belong in the domain (set) of phone numbers, and vice-versa. Note that both may be defined with a data type of string, yet, they clearly have different rules.

Table structure in SQL

When we actually build the database, each relation scheme becomes the structure for one table. The SQL syntax for creating the table includes a data type for each attribute, which is needed for the database but a data type is not the same as the domain of the attribute.

Example: customers table

CREATE TABLE customers (
   first_name VARCHAR(20) NOT NULL,
   last_name  VARCHAR(20) NOT NULL,
   phone      VARCHAR(20) NOT NULL,
   street     VARCHAR(50),
   zipcode    VARCHAR(5));

In this example, VARCHAR is a datatype in SQL databases that state the values of the column are variable-length character string of no more than the number of characters in parentheses. Consult your own system documentation for supported data types. We will explain the extra keyword NOT NULL when we look at rows and tables.