Creating tables with related data

As described in the previous section, each customer of a bank may have more than one account. We describe the "relationship" between a person and her accounts as a "one to many" relationship — one person may have many accounts.

Because there is no limit to the number of accounts a person may have, there is no way to design a record structure ahead of time that can handle all possible combinations of accounts.

Relational databases, are designed to solve this problem. We will create one table for customers, and another table for accounts. (In a real bank, we would probably divide the accounts into multiple tables, too, with one table for checking accounts, another table for savings accounts, and so on.) Then we create a "link" between the customer and each of her accounts. This allows us to waste very little space and yet still have complete information available to us.

As we mentioned earlier, in our example of composers, every record should have a primary key that allows us to identify that record. It is usually just an integer. We will now use that unique integer to help us "relate" a customer to her accounts. Below are the commands to create and populate the customer table:

CREATE TABLE customers (id INTEGER PRIMARY KEY, name CHAR(20),
address CHAR(40));
INSERT INTO customers (id, name, address) VALUES (1, ’Smith’,
’123 Main Street’);
INSERT INTO customers (id, name, address) VALUES (2, ’Jones’,
’456 Fifth Avenue’);

We have inserted two customers, named Smith and Jones. Let us create the account table:

Here, we have designated the column customer_id to be a "foreign key" pointing to the customer table (this is indicated by the REFERENCES keyword). The value of this column is supposed to be exactly the same as the "id" value (the primary key) in the corresponding customer row in the "customers" table. This way we will associate account rows with customer rows. The feature of a database allowing to maintain such relationships in a reliable way is called "referential integrity", and the corresponding SQL syntax elements used to define such relationships are called "referential integrity constraints". For more on referential integrity, see Referential integrity.

Customer Smith has two accounts, and customer Jones has 1 account.

INSERT INTO accounts (id, balance, customer_id)
VALUES (1001, 200.00, 1);
INSERT INTO accounts (id, balance, customer_id)
VALUES (1002, 5000.00, 1);
INSERT INTO accounts (id, balance, customer_id)
VALUES (1003, 222.00, 2);

As Smith has two accounts, each of Smith's accounts has a 1 in the customer_id field. That means that a user can find all of Smith's account records by doing the following:

It's as though you taped a copy of your home telephone number onto the forehead of each of your children when they went to school. If there is an emergency and you need to send a taxi driver to find and pick up your children at school, you can simply tell the taxi driver your phone number and he can check every child in the school to see if the child has your phone number. (This isn't very efficient, but it works.) By knowing the parent's id number, you can identify all the children. Conversely, by knowing each child, you can identify the parent. If, for example, one of your children is lost on a field trip away from the school, any helpful person can simply read the telephone number off the child's forehead and call you.

As you can see, the parent and child are linked to each other without any sort of physical contact. Simply having the id number (or phone number) is enough to determine which children belong to a parent and which parent belongs to each child. The technique works regardless of how many children you have.

Relational databases use the same technique. Now that we've created our customer table and our accounts table, we can show each customer and each of the accounts that she has. To do this, we use what SQL programmers call a "join" operation. The WHERE clause in the SELECT statement "joins" those pairs of records where the account's customer_id number matches the customer's id number.

The output of this query is similar to the following:

If a person has multiple accounts, she might want to know the total amount of money that she has in all accounts. The computer can provide this information by using the following query:

The output of this query is similar to the following:

Note that this time, Smith appears only once, and she appears with the total amount of money in all her accounts.

This query uses the GROUP BY clause and an aggregate function named SUM(). The topic of GROUP BY clauses is more complex than we want to go into during this simple introduction to SQL. This query is just to give you a little taste of the type of useful work that SQL can do in a single statement. Getting the same result in a language like C would take many statements.

Note that join operations are not limited to two tables. It's possible to create joins with an almost arbitrary number of tables. As a realistic extension of our banking example, we might have another table, "checks", which holds information about each check written. Thus we would have not only a 1-to-many relationship from each customer to her accounts, but also a 1-to-many relationship from each checking account to all of the checks written on that account. It's quite possible to write a query that will list all the checks that a customer has written, even if that customer has multiple checking accounts.