Database Design

Once we have created tables and loaded them with data, we need to retrieve this data. This website introduces the fundamental retrieval statements from two languages, SQL and Relational Algebra.

Retrieving data from one table

Retrieval with SQL

In SQL, to retrieve data stored in our tables, we use the SELECT statement. The result of this statement is always in the form of a table that we can view with our database client software or use with programming languages to build dynamic web pages or desktop applications. While the result may look like a table, it is not stored in the database like the named tables are. The result of a SELECT statement can also be used as part of another statement.

Basic syntax of the `SELECT` statement

The basic syntax consists of the four clauses shown in the figure below. SQL keywords are not case sensitive, but by convention many database developers write them in uppercase to improve readability.

```
SELECT {attribute}+
  FROM {table}+
  [ WHERE {boolean predicate to pick rows} ]
  [ ORDER BY {attribute}+ ];
```

The four basic clauses of a SQL SELECT statement.

Of the four clauses, only the first two are required; the two shown in square brackets are optional. When you start learning to build queries, it is helpful to follow a step-by-step sequence, look at the data after each modification to the query, and make sure that you understand the results at each step. This iterative refinement will help you home in on just the right SQL statement to retrieve the desired information. Below is a summary of the clauses.

- The SELECT clause lets us specify a comma-separated list of attribute names corresponding to the columns to be retrieved. You can use an asterisk, *, to retrieve all the columns.
- In queries where all the data is found in one table, the FROM clause is where we name the table from which to retrieve rows. In other articles we will use it to retrieve rows from multiple tables.
- The WHERE clause constrains which rows to retrieve. We do this by specifying a boolean predicate that compares the values of table columns to literal values or to other columns.
- The ORDER BY clause lets us specify the order in which the rows of the result are displayed.

The example in the next section shows, step by step, how to retrieve information with this SELECT statement.

SQL Example: customers in a specified zip code

We’ll build a list of customers who live in a specific zip code area, showing their first and last names and phone numbers and listing them in alphabetical order by last name. A company might want to do this to initiate a marketing campaign to customers in this area. In this example, we’ll use zip code 90840. Listed below are the refinement steps we take to arrive at the statement that will retrieve what we need.

Start by retrieving all of the relevant data, which in this case is all the data for every customer. In our database this is all stored in one table, so that table is specified in the FROM clause. We want every column from this table. Instead of naming each column individually, we can use the abbreviation symbol * to retrieve all of them. That completes our SQL statement, which is shown below. Note that we have no use for the two optional clauses in this initial statement. The same figure also shows the result of this query executed on a tiny database.
```
SELECT *
  FROM customers;
```
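Customers

| cFirstName | cLastName | cPhone | cStreet | cZipCode |
| --- | --- | --- | --- | --- |
| Tom | Jewett | 714-555-1212 | 10200 Slater | 92708 |
| Alvaro | Monge | 562-333-4141 | 2145 Main | 90840 |
| Wayne | Dick | 562-777-3030 | 1250 Bellflower | 90840 |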

SQL statement to retrieve all customers and the result set

Although the result of a query is called a result set, it is not always a set. It can be a multiset, that is, a collection of rows that may contain duplicates.
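
To experiment with these queries, one option is Python's standard sqlite3 module with an in-memory SQLite database. The sketch below loads the three sample customers and runs the first query. It then shows the multiset behavior: projecting onto the zip code alone returns 90840 twice unless DISTINCT is added. The column name cStreet is our assumption. The other column names appear in the queries on this page.

```python
import sqlite3

# In-memory copy of the tiny customers table from this page.
# The column name cStreet is an assumption; the others appear in the queries.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (cFirstName TEXT, cLastName TEXT,"
    " cPhone TEXT, cStreet TEXT, cZipCode TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?, ?)",
    [
        ("Tom", "Jewett", "714-555-1212", "10200 Slater", "92708"),
        ("Alvaro", "Monge", "562-333-4141", "2145 Main", "90840"),
        ("Wayne", "Dick", "562-777-3030", "1250 Bellflower", "90840"),
    ],
)

# Refinement #1: every column of every row.
all_rows = conn.execute("SELECT * FROM customers").fetchall()

# Keeping only the zip code column retains duplicate rows (a multiset)...
zips = conn.execute("SELECT cZipCode FROM customers").fetchall()
# ...unless DISTINCT asks the database to remove them.
distinct_zips = conn.execute("SELECT DISTINCT cZipCode FROM customers").fetchall()

print(len(all_rows))          # 3
print(sorted(zips))           # [('90840',), ('90840',), ('92708',)]
print(sorted(distinct_zips))  # [('90840',), ('92708',)]
```

The results are sorted in Python before printing because, without an ORDER BY clause, SQL does not promise any particular row order.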

Clearly we need another refinement step. The query retrieves all customers, but we are only interested in customers who live in zip code 90840. We need to specify in the statement that the only rows to retrieve from the database are those that meet this criterion. Such qualifying criteria are specified in the WHERE clause using boolean expressions. Our first statement is thus refined as shown in the figure below.
```
SELECT *
  FROM customers
  WHERE cZipCode = '90840';
```
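Customers in zip code 90840

| cFirstName | cLastName | cPhone | cStreet | cZipCode |
| --- | --- | --- | --- | --- |
| Alvaro | Monge | 562-333-4141 | 2145 Main | 90840 |
| Wayne | Dick | 562-777-3030 | 1250 Bellflower | 90840 |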

Refinement #2 to retrieve desired customers.

Note that SQL syntax requires single quotes around literal strings like '90840'. This example does not show it, but unlike SQL keywords, literal strings and strings stored in the database are typically compared in a case-sensitive way. Thus 'Long Beach' is a different string from 'long beach'. Exactly how strings are compared depends on the database system and its collation settings.
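
The following self-contained SQLite sketch runs refinement #2 and then checks how SQLite compares strings. As before, cStreet is an assumed column name. SQLite's default BINARY collation is case sensitive, and an explicit COLLATE NOCASE ignores case for ASCII letters. Other systems may behave differently. For instance, MySQL's default collations are case insensitive.

```python
import sqlite3

# In-memory copy of the sample table; the column name cStreet is assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (cFirstName TEXT, cLastName TEXT,"
    " cPhone TEXT, cStreet TEXT, cZipCode TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?, ?)",
    [
        ("Tom", "Jewett", "714-555-1212", "10200 Slater", "92708"),
        ("Alvaro", "Monge", "562-333-4141", "2145 Main", "90840"),
        ("Wayne", "Dick", "562-777-3030", "1250 Bellflower", "90840"),
    ],
)

# Refinement #2: the string literal '90840' is written in single quotes.
in_zip = conn.execute(
    "SELECT * FROM customers WHERE cZipCode = '90840'"
).fetchall()

# SQLite's default BINARY collation compares strings case-sensitively...
same_binary = conn.execute("SELECT 'Long Beach' = 'long beach'").fetchone()[0]
# ...while an explicit NOCASE collation ignores ASCII letter case.
same_nocase = conn.execute(
    "SELECT 'Long Beach' = 'long beach' COLLATE NOCASE"
).fetchone()[0]

print(len(in_zip))   # 2
print(same_binary)   # 0 (false)
print(same_nocase)   # 1 (true)
```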

We need just a couple more refinements. We are now retrieving only the customers we want, but we are still retrieving every column from the table, and not all of them are needed. We need a way to pick the attributes (columns) we want. We do this by listing them in the SELECT clause, separated by commas. The figure below shows this refinement and its corresponding result set.
```
SELECT cLastName, cFirstName, cPhone
  FROM customers
  WHERE cZipCode = '90840';
```
Columns from SELECT

| cLastName | cFirstName | cPhone |
| --- | --- | --- |
| Monge | Alvaro | 562-333-4141 |
| Dick | Wayne | 562-777-3030 |

Refinement #3 to retrieve specific columns.

Note that changing the order of the columns (for example, showing the last name first) does not change the meaning of the results.

For practical purposes, our last refinement is all that we need. To make the result set more appealing to a human reader, however, we may want to order it. Imagine a result set 100 times larger than the one shown here! It would be better to display it sorted alphabetically by customer name. In SQL, the ORDER BY clause specifies the order in which to retrieve the results. Once again, this ordering does not change the meaning of the results. The same rows are returned, and only the order in which they are displayed changes. This final refinement and its result are shown below.
```
SELECT cLastName, cFirstName, cPhone
  FROM customers
  WHERE cZipCode = '90840'
  ORDER BY cLastName ASC, cFirstName ASC;
```
Rows in order

| cLastName | cFirstName | cPhone |
| --- | --- | --- |
| Dick | Wayne | 562-777-3030 |
| Monge | Alvaro | 562-333-4141 |

Refinement #4 to order the rows in the result.

The keyword ASC orders rows by ascending values. This is the default ordering, so the keyword is not necessary and is shown here only for completeness. To order rows by descending values, use the keyword DESC. In the statement above, rows are first ordered by ascending last name. In case of ties (two or more customers with the same last name), those rows are then ordered by ascending first name.
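
Running refinement #4 against an in-memory SQLite copy of the sample table shows that ASC and DESC change only the order of the rows, not which rows are returned. Here cStreet is an assumed column name. The sample data has no ties on last name, so the tie-breaking on cFirstName is not exercised.

```python
import sqlite3

# In-memory copy of the sample table; the column name cStreet is assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (cFirstName TEXT, cLastName TEXT,"
    " cPhone TEXT, cStreet TEXT, cZipCode TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?, ?)",
    [
        ("Tom", "Jewett", "714-555-1212", "10200 Slater", "92708"),
        ("Alvaro", "Monge", "562-333-4141", "2145 Main", "90840"),
        ("Wayne", "Dick", "562-777-3030", "1250 Bellflower", "90840"),
    ],
)

# Refinement #4: ASC is the default and is written out only for clarity.
asc = conn.execute(
    "SELECT cLastName, cFirstName, cPhone FROM customers"
    " WHERE cZipCode = '90840'"
    " ORDER BY cLastName ASC, cFirstName ASC"
).fetchall()

# DESC sorts each key from the highest value to the lowest.
desc = conn.execute(
    "SELECT cLastName, cFirstName, cPhone FROM customers"
    " WHERE cZipCode = '90840'"
    " ORDER BY cLastName DESC, cFirstName DESC"
).fetchall()

print(asc)   # [('Dick', 'Wayne', '562-777-3030'), ('Monge', 'Alvaro', '562-333-4141')]
print(desc)  # [('Monge', 'Alvaro', '562-333-4141'), ('Dick', 'Wayne', '562-777-3030')]
print(sorted(asc) == sorted(desc))  # True: same rows, different order
```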

Retrieval with relational algebra

SQL is a declarative language. As such, SQL is used to declare what is to be retrieved from the database. In our SQL statement above, we did not specify how to retrieve the result. In an imperative language, we do specify the steps to take to solve a problem, such as how to retrieve a result from a database. Thus, it is the responsibility of the database system to determine how to retrieve what is declared in SQL. In relational database systems, this is commonly done by translating SQL into Relational Algebra.
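
For contrast, here is an imperative Python sketch of what refinement #4 asks for. It spells out each step: scan every row, test the zip code, keep three columns, then sort. This is only an illustration. A database system is free to evaluate the declarative statement however it likes, for example by using an index on the zip code column if one exists. The list of tuples standing in for the customers table and the function name customers_in_zip are hypothetical.

```python
# Hypothetical in-memory stand-in for the customers table:
# (first name, last name, phone, street, zip code).
customers = [
    ("Tom", "Jewett", "714-555-1212", "10200 Slater", "92708"),
    ("Alvaro", "Monge", "562-333-4141", "2145 Main", "90840"),
    ("Wayne", "Dick", "562-777-3030", "1250 Bellflower", "90840"),
]


def customers_in_zip(rows, zip_code):
    """Imperative version of refinement #4: we spell out *how* to compute it."""
    result = []
    for first, last, phone, _street, zc in rows:  # step 1: scan every row
        if zc == zip_code:                        # step 2: the WHERE test
            result.append((last, first, phone))   # step 3: keep three columns
    # step 4: ORDER BY last name, then first name
    result.sort(key=lambda row: (row[0], row[1]))
    return result


print(customers_in_zip(customers, "90840"))
# [('Dick', 'Wayne', '562-777-3030'), ('Monge', 'Alvaro', '562-333-4141')]
```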

Like all algebras, RA applies operators to operands to produce results of the same type as the operands. RA operands are relations and thus the results are also relations. Furthermore, like all algebras, the results of operators can be used as operands in building more complex expressions. We introduce two of the RA operators following the example and refinements above for SQL.

RA operators: σ and π

To retrieve a single relation in RA, we only need to use its name. The common notation in the relational model is to use uppercase letters for relation schemes (R, S, T, U, etc.) and lowercase letters for relations (r, s, t, u, etc.). Thus the simplest RA expression, which retrieves every column and every row of a relation, is just the name of the relation: r.

The two RA operators introduced here are σ, the select operator, and π, the project operator.

The select (RA) operator, written with the symbol σ, picks the tuples that satisfy a predicate. In this it serves a purpose similar to the SQL WHERE clause. The RA select operator σ is unary: it takes a single relation or RA expression as its operand. The predicate θ that specifies which tuples are required is written as a subscript of the operator, giving the syntax σ_θ e, where e is an RA expression.
The scheme of the result of σ_θ r is R, the same scheme we started with, because every tuple that satisfies the predicate is selected in its entirety. The result of this operation includes all tuples of relation r that satisfy the predicate θ, that is, the tuples for which θ evaluates to true.
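
One way to make σ concrete is a small Python sketch. As an assumption for illustration only, a relation is modeled as a pair: its scheme (a tuple of attribute names) and a frozenset of tuples, so results are sets as in RA. The predicate θ is a Python function that receives one tuple as a dictionary from attribute names to values. The names sigma, Scheme, and Relation are hypothetical, and so is the attribute name cStreet.

```python
from typing import Callable, Dict, FrozenSet, Tuple

# Hypothetical model of a relation: (scheme, frozenset of tuples).
Scheme = Tuple[str, ...]
Relation = Tuple[Scheme, FrozenSet[tuple]]


def sigma(theta: Callable[[Dict[str, str]], bool], e: Relation) -> Relation:
    """σ_θ e: keep every tuple of e for which the predicate θ is true."""
    scheme, tuples = e
    kept = frozenset(t for t in tuples if theta(dict(zip(scheme, t))))
    return (scheme, kept)  # whole tuples are kept, so the scheme is unchanged


# The sample relation; the attribute name cStreet is an assumption.
customers: Relation = (
    ("cFirstName", "cLastName", "cPhone", "cStreet", "cZipCode"),
    frozenset({
        ("Tom", "Jewett", "714-555-1212", "10200 Slater", "92708"),
        ("Alvaro", "Monge", "562-333-4141", "2145 Main", "90840"),
        ("Wayne", "Dick", "562-777-3030", "1250 Bellflower", "90840"),
    }),
)

in_zip = sigma(lambda t: t["cZipCode"] == "90840", customers)
print(in_zip[0] == customers[0])  # True: same scheme R
print(len(in_zip[1]))             # 2
```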
The project (RA) operator, written with the symbol π, picks attributes. Confusingly, its role corresponds to the SQL SELECT clause, not to the RA select operator. It is also a unary operator that takes a single relation or expression as its operand. The attributes to retrieve are specified as a subscheme X (a subset of its operand's scheme). The syntax is π_X e where, as before, e is an RA expression. Following are additional properties of the project operator.
- For X to be a subscheme of R, it must be a subset of the attributes in R that preserves the assignment rule from R (that is, each attribute of X must have the same domain as its corresponding attribute in R).
- The scheme of the result of π_Xr is X. The tuples resulting from this operation are tuples of the original relation, r, but cut down to the attributes contained in X.
- If X is a super key of r, then the result has the same number of tuples as r. If X is not a super key of r, then several tuples of r may be cut down to the same tuple. Such duplicate (non-distinct) tuples are eliminated, which ensures that the result is always a set, possibly with fewer tuples than r. This is unlike SQL, where the result of a SELECT statement keeps duplicate rows (it is a multiset) unless the DISTINCT keyword is used.
As with other algebras, we can use function composition, applying the project operator to the result of the select operator described above, to get: π_X σ_θ r
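
A similar hypothetical Python sketch for π models a relation as a scheme plus a frozenset of tuples. It defines a helper pi alongside a minimal sigma and then composes the two. It checks that X is a subscheme only by attribute name, because domains are not modeled. It relies on frozenset to eliminate duplicate tuples, which is what separates π from a SQL SELECT without DISTINCT. The helper names and the attribute name cStreet are assumptions.

```python
from typing import Callable, Dict, FrozenSet, Tuple

# Hypothetical model of a relation: (scheme, frozenset of tuples).
Scheme = Tuple[str, ...]
Relation = Tuple[Scheme, FrozenSet[tuple]]


def sigma(theta: Callable[[Dict[str, str]], bool], e: Relation) -> Relation:
    """σ_θ e: keep the tuples of e that satisfy θ; the scheme is unchanged."""
    scheme, tuples = e
    return (scheme, frozenset(t for t in tuples if theta(dict(zip(scheme, t)))))


def pi(x: Scheme, e: Relation) -> Relation:
    """π_X e: cut every tuple of e down to the attributes listed in X."""
    scheme, tuples = e
    # X must be a subscheme of e's scheme. Domains are not modeled here,
    # so only attribute membership is checked, not the assignment rule.
    missing = [a for a in x if a not in scheme]
    if missing:
        raise ValueError(f"attributes not in scheme: {missing}")
    positions = [scheme.index(a) for a in x]
    # Collecting into a frozenset drops duplicate tuples: the result is a set.
    return (x, frozenset(tuple(t[i] for i in positions) for t in tuples))


# The sample relation; the attribute name cStreet is an assumption.
customers: Relation = (
    ("cFirstName", "cLastName", "cPhone", "cStreet", "cZipCode"),
    frozenset({
        ("Tom", "Jewett", "714-555-1212", "10200 Slater", "92708"),
        ("Alvaro", "Monge", "562-333-4141", "2145 Main", "90840"),
        ("Wayne", "Dick", "562-777-3030", "1250 Bellflower", "90840"),
    }),
)

# cZipCode is not a super key here: three tuples shrink to two.
zips = pi(("cZipCode",), customers)
# In this data, name plus phone identifies each tuple, so all three remain.
names = pi(("cLastName", "cFirstName", "cPhone"), customers)
# Composition π_X σ_θ r: zip codes of every customer except Jewett.
composed = pi(("cZipCode",), sigma(lambda t: t["cLastName"] != "Jewett", customers))

print(len(customers[1]), len(zips[1]), len(names[1]))  # 3 2 3
print(composed)  # (('cZipCode',), frozenset({('90840',)}))
```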

RA Example: customers in a specified zip code

Given the above RA syntax, we can now use RA to create expressions that match the SQL statements from above which retrieve the customers who live in zip code 90840.

The first step is to retrieve all customers. This is done by an RA expression that consists of just the name of the relation, so the RA expression customers is the equivalent of the first SQL statement above. Its scheme is the same as the Customers scheme.
To retrieve the equivalent result set as the SQL statement in refinement #2, we apply the σ operator to the result set of our previous expression:
```
σ_{cZipCode='90840'}customers
```
Again, the scheme of the result set is the same as the Customers scheme.
Now, applying function composition here, we can retrieve just the columns we desire from the result set of the previous expression to get the RA expression that retrieves the equivalent result set as the SQL statement in refinement #3:
```
π_{cLastName, cFirstName, cPhone} σ_{cZipCode='90840'}customers
```
Note that in RA the results of expressions are strictly sets of tuples, so there is no way to specify the order of tuples in a result. This is unlike SQL and its ORDER BY clause.
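
As a check, the sketch below evaluates the RA expression for refinement #3 with minimal hypothetical sigma and pi helpers, which model a relation as a scheme plus a frozenset of tuples. It then compares the resulting tuples with what SQLite returns for the equivalent SQL statement. Sorting the RA result happens in Python, outside the algebra, because the RA presented here has no way to order tuples. The attribute name cStreet is an assumption.

```python
import sqlite3

# Sample rows from this page; the attribute name cStreet is an assumption.
SCHEME = ("cFirstName", "cLastName", "cPhone", "cStreet", "cZipCode")
ROWS = [
    ("Tom", "Jewett", "714-555-1212", "10200 Slater", "92708"),
    ("Alvaro", "Monge", "562-333-4141", "2145 Main", "90840"),
    ("Wayne", "Dick", "562-777-3030", "1250 Bellflower", "90840"),
]


def sigma(theta, e):
    """σ_θ e on a (scheme, frozenset of tuples) relation."""
    scheme, tuples = e
    return (scheme, frozenset(t for t in tuples if theta(dict(zip(scheme, t)))))


def pi(x, e):
    """π_X e; building a frozenset removes duplicate tuples."""
    scheme, tuples = e
    positions = [scheme.index(a) for a in x]
    return (tuple(x), frozenset(tuple(t[i] for i in positions) for t in tuples))


customers = (SCHEME, frozenset(ROWS))

# π_{cLastName, cFirstName, cPhone} σ_{cZipCode='90840'} customers
ra_result = pi(
    ("cLastName", "cFirstName", "cPhone"),
    sigma(lambda t: t["cZipCode"] == "90840", customers),
)

# SQL refinement #3 evaluated on the same data with SQLite.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (cFirstName TEXT, cLastName TEXT,"
    " cPhone TEXT, cStreet TEXT, cZipCode TEXT)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?, ?)", ROWS)
sql_rows = conn.execute(
    "SELECT cLastName, cFirstName, cPhone FROM customers"
    " WHERE cZipCode = '90840'"
).fetchall()

print(set(sql_rows) == ra_result[1])  # True: the same tuples
# RA has no ORDER BY; any ordering is imposed outside the algebra.
print(sorted(ra_result[1]))
# [('Dick', 'Wayne', '562-777-3030'), ('Monge', 'Alvaro', '562-333-4141')]
```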

Basic queries: SQL and RA