Discussion: more about domains
You learned earlier that a domain is the set of legal values that can be assigned to an attribute. Each attribute in a database must have a well-defined domain; you can’t mix values from different domains in the same attribute. (See below for some examples.) One goal of database developers is to provide data integrity, part of which means insuring that the value entered in each field of a table is consistent with its attribute domain. Sometimes we can devise a validation rule to separate good from bad data; sometimes we can’t. Before you design the data type and input format for an attribute, you have to understand the characteristics of its domain.
Some domains can only be described with a general statement of what they contain. These are difficult or impossible to analyze precisely; the best we can do is to make them VARCHAR strings that are long enough to hold any expected value. The examples listed below provide more context for these domains.
- Names of people. We typically show these broken into First (which might include a middle name or initial) and Last (which is really the family name—some languages write this first). Depending on your application, you might have to add attributes for a courtesy title (Mr., Ms., Dr., etc.), a suffix (Jr., III, etc.) or a nickname ('Tom' for 'Thomas' and so on).
- Names of businesses or organizations. These typically fit into a single character field, and are not in the same domain as people’s names. Different domains require different attributes (which sometimes can even mean different class/entity types).
- Street addresses. Even the format of these can vary widely: '3201 Main St., Apt. 3', 'Sohnmattstraße 14', and 'P.O. Box 8259' are all valid address strings.
Domains with a pattern
Some domains have at least some pattern in their permitted values. These might be recognizable in code, for example with a regular expression, although it is still impossible to insure that every value that passes a validity check is actually correct. The examples listed below include three well-known attributes with domains that have such patterns.
- Email addresses. To be valid, an email address must contain a single @ sign that separates the user name from the server and domain names. It cannot contain any spaces. Unfortunately, that’s about all we can check.
- Web addresses (URLs). These form a different domain from email addresses or telephone numbers, no matter how tempting it might be to permit any of them to be entered in the same attribute field. Other than checking for spaces and other illegal characters, there’s again not much way to be sure a URL is valid.
- North American telephone numbers. Many databases, and most wireless phones, require numbers to be exactly ten digits—perhaps providing formatting such as (800)-555-1212. Problem: you can't store extension numbers, access codes, or other data that has to be transmitted along with the number itself. Solution: make this an unformatted character string long enough to hold all the information that is needed.
Domains with a precise pattern
Few domains conform to a precise pattern that can be analyzed or specified exactly. Two examples follow.
- United States Social Security numbers. These are always of the form 999-99-9999, where 9 represents any digit.
- United States Zip codes, which are always of the form 99999 or 99999-9999.
- United States state abbreviations are always of the form AA, where A is an upper-case letter; however, just checking the format won’t insure a valid abbreviation. There is a much better way to model this kind of domain, which we will explain separately.
Definitely NOT international postal codes or phone numbers. Don’t ever over-specify a domain or data entry field in any way that would prevent users from entering a valid real-life domain value.
- Easy domains to handle are those which can be specified by a well-defined, built-in system data type. These include integers, real numbers, and dates/times. You might have to range-check these data types to insure that realistic values are entered. In most systems, a boolean data type is also available; oddly, Oracle® doesn’t provide this. (Oracle developers typically use a CHAR(1) data type, and assign it values of 'T' or 'F'). See Oracle built-in data types for more information (Oracle 12 as of October 2018). Apache Derby, on the other hand, does support the BOOLEAN data type. See Apache Derby Data types for a full list of the data types supported by Apache Derby version 10.8 as of October 2018.
- Finally, there are many domains that may be specified by a well-defined, reasonably-sized set of constant values. We’ll look at these in a separate page.
- In general, your user interface should provide any necessary format or range checking. If done well, this can help the user with data entry, increase data integrity, and prevent the user from having to deal with cryptic and frustrating error messages from the database itself.
Consult DBMS documentation
While there is a SQL standard defined, DBMS vendors deviate from it. Thus, database developers must consult the DBMS vendor documentation to learn about the data types supported in the product being used. Wherever possible, data types supported across multiple vendors should be used as that makes the database more portable.
- In SQL, the characters
%(percent sign) and
_(underscore) are used to define patterns in strings. The percent sign matches any sequence of 0 or more characters, whereas the underscore matches one character. These are commonly used with the
LIKEoperator for string comparison.