7.12 Databases

Companies are required by law to keep proper records: transactions, employees, customers, expenditures, sales and financial information. Increasingly the records are kept in databases, which are software programs that store and manage data. A database management system (DBMS) or relational database management system (RDBMS) is the software application used to create, maintain, access and manipulate data. The most common of database management systems are D2 from IBM and some variety of the industry-standard SQL in Sybase, Oracle, Access, and other proprietary databases. Databases operating under the client-server model are called database servers, popular examples being SQL Server, MySQL and PostGreSQL.

Relational Databases

Modern databases are relational databases where data is stored in two-dimension tables arranged in rows and headings, known technically as records and fields.

Field Properties

Information is not simply entered into fields, but those fields are first specified by data type and properties. Some examples:

Property

Apply to Data Type

Specifies

field

size

alphanumeric

maximum number of characters

format

most types

appearance of data

input

mask

phone nos. zip codes,

social security no., etc.

entry in predefined

and validated format

decimal

places

numeric & currency

number of decimal places

default

most types

value filled in automatically

if not otherwise specified

null

most types

undefined value to which

special rules apply

required

most types

cannot be left empty

primary

key

alphanumeric

one or more fields as unique: every table must

have at least one, which allows for database normalization.

memory

size

OLE objects (Access)

maximum size of graphic, video or sound files

SQL

SQL or Structured Query Language is a database computer declarative language designed to manage data in relational database management systems (RDBMS), and was originally based upon relational algebra and tuple relational calculus. It allows for data insertion, query, update and deletion, for schema creation and modification, and for data access control.

Some Basic SQL commands:

Command

Operation

SELECT

retrieves data from one or more tables

FROM

indicates table(s) from which data is to be retrieved

WHERE

specifies what condition holds

GROUP BY

groups rows having common values into a smaller set of rows

HAVING

filters by common property

ORDER BY

specifies which field is used to order records (and whether

ascending or descending)

INSERT INTO

adds records to table

DELETE

deletes records from table.

JOIN

operates across two or more tables joined by the field specified

MERGE

combines tables

UPDATE

modifies data of several tables

COMMIT

makes data change permanent

ROLLBACK

returns table to state of last commit command

CREATE TABLE

creates table to the specifications included in the command

ALTER TABLE

modifies table in the way specified

DROP TABLE

deletes the table

GRANT

authorizes one or more users to perform

an operation or a set of operations.

REVOKE

eliminates a grant

An example:

SELECT Sales,
    COUNT(*) AS Customer
        FROM Sales
    JOIN Customer
        ON Customer.ID = Sales.Customer.ID
GROUP BY Customer;

Returns a list of customers and the number of purchases made by each, grouped by customer.

Normalization

Normalization is the process of efficiently organizing data in a database. It has two goals: to eliminate redundant data (e.g. storing the same data in more than one table) and to store only related data in a same table. Both goals reduce the amount of space a database consumes, ensures data is logically stored, and maximize operational efficiency.

Normalization proceeds by a series of rules or recommendations:

First Normal Form

First normal form (1NF) sets the very basic rules for an organized database:

1 Eliminate duplicated columns from the same table.
2. Create separate tables for each group of related data, and and identify each row with a unique column or set of columns (the primary key).

Second Normal Form

Second normal form (2NF) takes the removal of duplicated data further:

1. Meet all the requirements of the first normal form.
2. Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
3. Create relationships between these new tables and their predecessors through the use of foreign keys.

Third Normal Form

The third normal form (3NF) goes one large step further:

1. Meet all the requirements of the second normal form.
2. Remove columns that are not dependent upon the primary key.

Fourth Normal Form

The fourth normal form (4NF) has one additional requirement:

1. Meet all the requirements of the third normal form.
2. Remove any multi-valued dependencies.

Though usually recommended, normalization can be restricted in these cases:

1. Where creation of vast numbers of separate tables actually slows down the operation of the database.
2. Very complicated databases where full normalization is just too difficult.
3. Where a 'quick and dirty' approach is more time efficient (as in developing early prototypes).

Data Migration

Data often has to be moved between databases of slightly different design (between different company divisions, or upgrading to more complex software.) Keying in data again is unthinkable in most cases, but small differences in the implementation of SQL between databases can cause serious problems. It's therefore wise to first check what software exists to facilitate data migration between different proprietary databases, ensuring what will be needed in the second database can indeed be stored in the first.

ODBC Connectivity

Data can be added to and retrieved from databases by most computer programs: those commonly used in websites include Perl, PHP, and Java Server Pages. A standard database access method is via ODBC (Open Database Connectivity), a snippet of code that makes the 'handshake' between computer program and the database concerned. {12}

Questions

1. What are relational databases, and why are they preferable to flat files for storing information?
2. Explain records, fields and data types.
3. What is Structured Query Language? Give some important commands relating to table creation, use and removal?
4. Explain what is meant by database normalization. Why is it undertaken?
5. Your company has switched from an Apache server to (Windows) SQL Server. Undertake a literature search to find the best way of migrating data from the original MySql database.

Sources and Further Reading

1. Databases for Beginners: A Gentle Introduction to Databases, SQL, and Microsoft Access by Mike Chapple. About.com. Short linked articles on the basics.
2. Getting started with Microsoft Access. University of Leeds. Clear introduction to basic RDBMS operations.
3. Relational databases. Wikipedia. Simple explanation of terminology.
4. Brief Summary of Popular Database Systems. Paragon Corporation. October 2006. Brief comparison of SQL Server, MySql and PostGreSQL database servers.
5. Beginner's Guide to Data Modelling. Database Answers. Extended article using Starbucks Coffee as an example.
6. Microsoft Access 2000 Bible by Cary N. Prague and Michael R. Irwin. IDG Books. Similar reference books exist for most database systems.
7. Comparison of relational database management systems. Wikipedia. Tables comparing RDBMS by release date, operating system support and selected features.
8. Structured Query Language (SQL). About.Com. Rather fragmentary but useful set of articles.
9. SQL. Wikipedia. Introduction to history and theory and problems, with examples and vender comparisons.
10. Client Server Databases. CfConf. Powerpoint presentation of key elements of server databases.
11. MySQL. Mysql. Details of and tutorials on the world's most popular open-source database server.
12. PHP Database ODBC. W3Schools. Example of PHP connection to MS Access database (others are similar).