Normal Forms: database schema with ``good'' properties
Goal: avoid redundancies and anomalies
Each normal form subsumes the level below it:
Boyce-Codd Normal Form (BCNF)
Third Normal Form (3NF)
Second Normal Form (2NF)
First Normal Form (1NF)
Design: First Normal Form [229]
Requires that each attribute consists of indivisible values, not sets or tuples of values from a more elementary domain or domains.
Arrays are not allowed in first normal form.
All the attributes within R must be flat.
| NAME | BIRTHDATE |
| allen | june 7,1949 |
| alfred | march 31,1933 |
| NAME | BMONTH | BDAY | BYEAR |
| allen | june | 7 | 1949 |
| alfred | march | 31 | 1933 |
| NAME | SEX |
| {john,joe,ivan} | male |
| {mary,marie} | female |
| NAME | SEX |
| john | male |
| joe | male |
| ivan | male |
| mary | female |
| marie | female |
Design: Second Normal Form [230]
Every nonkey column depends ENTIRELY on the primary key.
| A | B | C | D |
Key: AB
But if
, then not in 2NF
| Course | StudentID | Grade | Name |
Key: Course, StudentID
But StudentID
Name
This would be better as:
| Course | StudentID | Grade |
| StudentID | Name |
Eliminates redundancy: Name not repeated
Eliminates anomalies: Student can have a Name without Grade
This is in 3NF. Note that the relations just follow the FD's.
Design: Third Normal Form [231]
2NF and no nonkey column determines any other nonkey column
| A | B | C | D |
Key: A
But if
, then not in 3NF
Exception: if
is prime (belongs to some candidate), then 3NF
| Course | InstructorID | Name | Time |
Key: Course
But InstructorID
Name
This would be better as
| Course | InstructorID | Time |
| InstructorID | Name |
Eliminates redundancy: Name not repeated
Eliminates anomalies: Instructor can have a Name without a Course
This is in 3NF. Note that the relations just follow the FD's.
Design: 1NF but not 2NF [232]
Not 1NF: NotFirst(S#, STATUS-CITY, P#, QTY)
Given: First(S#, STATUS, CITY, P#, QTY)
Key: S# P#
Note: CITY always indicates the STATUS here.
Redundancy: same CITY implies same STATUS
This is not in 2NF because a subkey (S#) determines a nonkey.
Design: 2NF but not 3NF [233]
Second(S#, STATUS, CITY) and SP(S#, P#, QTY)
This is in 2NF because a subkey does not determine a nonkey.
This is not 3NF because a nonkey determines a nonkey.
The redundancy is still there.
Design: 3NF [234]
ThirdA(S#, CITY), ThirdB(CITY, STATUS), SP(S#, P#, QTY)
This is in 3NF because there are only key
nonkey.
It is also in BCNF.
Design: BCNF [235]
3NF: Candidate Key
X or X is prime
BCNF: Candidate Key
X
Two candidate keys, both composites, and overlap
| A | B | C | D |
Candidate Keys: AC and BC
B is not determined by a candidate key but it is prime
Is it 3NF: Yes, because or X is prime
Is it BCNF: No
BCNF: every FD has a candidate key as a determinant (lhs)
Design: BCNF [236]
| S# | SNAME | P# | QTY |
S# P#
SNAME QTY
SNAME P#
S# QTY
S#
SNAME (and SNAME
S#)
This would be better as
| S# | SNAME |
| S# | P# | QTY |
Or as
| S# | SNAME |
| SNAME | P# | QTY |
Design: 3NF Decomposition Example [237]
| SS | NAME | CITY | ZIP |
| 123 | bill | hayward | 94542 |
| 124 | joe | hayward | 94542 |
This is NOT 3NF because
Decompose by finding a Minimal Cover: second one is redundant
Rule: make 3NF by making a relation out of each FD.
So 3NF is (SS,NAME), (SS,ZIP), (ZIP,CITY)
But better as
| SS | NAME | ZIP |
| 123 | bill | 94542 |
| 124 | joe | 94542 |
| ZIP | CITY |
| 94542 | hayward |
But does a ZIP ever change to a new CITY?
How do you think the University stores your ZIP code?
Design: 3NF Decomposition Example [238]
| FLIGHT | DAY | PILOTID | NAME |
| 112 | 6jun | 31174 | bosley |
| 112 | 7jun | 30046 | brooks |
| 203 | 9jun | 31174 | bosley |
This is not in 3NF. The
is redundant
A 3NF decomposition:
| FLIGHT | DAY | PILOTID | |
| 112 | 6jun | 31174 | |
| 112 | 7jun | 30046 | |
| 203 | 9jun | 31174 |
| PILOTID | NAME |
| 31174 | bosley |
| 30046 | brooks |