Basic structures: tables and rows

Database tables, rows, keys

Representing data in rows

Each object, i.e., a real-world individual of a class (for example, each customer who does business with our enterprise), is represented by a row of information in a database table. The row is defined in the relational model as a tuple that is constructed over a given scheme. Mathematically, the tuple is a function that assigns a constant value to each attribute of the scheme from its corresponding attribute domain. Notice that because the scheme is a set of attributes, we could show them in any order without changing the meaning of the data in the row (tuple).

Customers row or tuple
A customer tuple. Other views of this diagram: Large image - Description (text)

In formal notation, we could show the assignments explicitly, where the identifier t represents a tuple: tTJ = ‹first_name := 'Tom', last_name := 'Jewett', phone := '714-555-1212', street := '10200 Slater', zipcode := '92708'›

Inserting rows

In practice, when we create a table row in SQL, we are actually making the assignment of domain values to attributes, just as in the tuple definition.

INSERT INTO customers(first_name, last_name, phone, street, zipcode)
  VALUES ('Tom', 'Jewett', '714-555-1212', '10200 Slater', '92708');
SQL statement to insert one row with values for all columns.

In SQL, you can omit the attribute names from the INSERT INTO statement, as long as you keep the comma-delimited list of values in exactly the same order as was used to create the table. This syntax is more prone to errors, so use sparingly. It is provided for completion. Removing the attribute names from the above statement, gives the SQL statement below.

INSERT INTO customers
  VALUES ('Tom', 'Jewett', '714-555-1212', '10200 Slater', '92708');
SQL statement to insert one row with values for all columns in the order specified when the table was created.

It is also possible to list only some of the attribute names, the ones we provide values for; in that case, the remaining attributes will be assigned NULL in such rows, assuming there is no constraint prohibiting NULLs in such columns. In the case of the customers table, it was created with NOT NULL constraint on the name and phone number columns, so non-NULL values for these must be inserted. On the other hand, there is no such constraint on the other columns, so NULL is allowed.

INSERT INTO customers(first_name, last_name, phone)
  VALUES ('Alvaro', 'Monge', '562-985-4671'),
         ('Wayne', 'Dick', '562-985-1190');
SQL statement to insert two rows, omitting values for some columns.

Updating rows

The UPDATE statement is used to modify the value for one or more attributes in the specified rows of a table. When we change the data in a table row using SQL, we are also following the tuple definition of assigning domain values to attributes.

The statement below will modify all rows in table customers that have a phone number of '714-555-1212' and change them to have the phone number specified in the SET clause.

UPDATE customers
  SET phone = '714-555-2323'
  WHERE phone = '714-555-1212';
SQL statement to update a phone number in the table.

Deleting rows

The SQL DELETE statement is used to delete rows

SQL Tables with constraints

A database table is simply a collection of zero or more rows. This follows from the relational model definition of a relation as a set of tuples over the same scheme. (The name “relational model” comes from the relation being the central object in this model.)

Customers table or relation
The customers database table or relation. Other views of this diagram: Large image - Description (text)

Knowing that the relation (table) is a set of tuples (rows) tells us more about this structure, as we saw with schemes and domains.

  • Each tuple/row is unique; there are no duplicates
  • Tuples/rows are unordered; we can display them in any order we like and the meaning doesn’t change. (SQL gives us the capability to control the display order.)
  • Tuples/rows may be included in a relation/table set only if they are constructed on the scheme of that relation; they are excluded otherwise. (It would make no sense to have an Order row in the Customers table.)
  • We can define subsets of the rows in a table by specifying criteria for inclusion in the subset. (Again, this is part of a SQL query.)
  • We can find the union, intersection, or difference between two subsets of rows as long as both are constructed over the same scheme.

Insuring unique rows

Since each tuple in a relation must be unique, no two tuples can have exactly the same values for every one of their attributes, that is, there can be no duplicate tuples in a relation. Unfortunately, the same cannot be said about SQL tables. A SQL table is bag (i.e., a multiset) of rows, unless constraints are placed on the table to ensure there be no duplicate rows. Thus, to implement a relation as a SQL table, there must be some set of attributes in each relation whose values, taken together, guarantee uniqueness of each row. Any set of attributes that can do this is called a super key (SK). By the definition of a relation as a set of tuples, the set of all attributes must be a super key. If such a set were not a super key, it would allow two or more identical tuples in the relation which would violate the definition of a set. Since super keys are constraints on the data, they must be true for any relation (table) of a relation scheme, thus these super keys are shown in the relation scheme diagram. A super key is our first database constraint, we will learn more of them throughout.

The set of all attributes in a relation scheme R is just one super key of that scheme, there can be and usually there are more. The other super keys are proper subsets of the relation scheme. Out of all these super keys, the database designer picks one to serve as the primary key (PK) of the relation. (Notice that the PK is a SK, but not all SKs are PKs, since only one is chosen as a PK!) The PK is sometimes also called a unique identifier for each row of the table. This is not an arbitrary choice—we’ll discuss it in detail on a later page. For our customers table, we’ll pick the set {first_name, last_name, phone}. We are likely to have at least two customers with the same first and last name, but it is very unlikely that they will both have the same phone number.

In SQL, we specify the primary key of a table with a data constraint that lists the attributes that form the PK. We also give the constraint a name that is easy for us to remember later (as is done below using “customers_pk”).

ALTER TABLE customers
  ADD CONSTRAINT customers_pk PRIMARY KEY (first_name, last_name, phone);
SQL statement to alter a table by adding a PK

Since every SQL table must have a PK in order for it to correctly represent a relation, it is best to avoid altering a table to add a PK constraint and instead specify such constraints at the time the table is created. In creating the customers table, you may have noticed the use of NOT NULL for some of the attributes. NULL is a special constant in database systems (as is the case in most programming languages) that means “this field doesn’t have any value assigned to it. ” It’s not the same as a zero length string or the number zero. Thus, NOT NULL is a database constraint that prevents rows from being inserted without a value for attributes with this constraint (similarly also prevents updates that would assign NULL to such attributes).

The statement below recreates the table with the required PK constraint. If you’ve previously created the table, you can use the DROP TABLE statement to delete the table from the database, including of course all data that may be in the table.

CREATE TABLE customers (
   first_name VARCHAR(20) NOT NULL,
   last_name  VARCHAR(20) NOT NULL,
   phone      VARCHAR(20) NOT NULL,
   street     VARCHAR(50),
   zipcode    VARCHAR(5),
   CONSTRAINT customers_pk PRIMARY KEY (first_name, last_name, phone));
SQL statement to create Customers table, including its PK.