U Me N CS: DBMS- Normalization

Tuesday, May 22, 2012

DBMS- Normalization

Posted by umencs

Functional Dependency

Consider a relation R defined over a set of attributes (A1,A2,…..An) and let X and Y be subset or equal to (A1,A2,……...An), then
X ---> Y
Y is functionally dependent on X if and only ,whenever two tuples in R agree on their X value ,they also agree on their Y value . i.e if each X value in (A1,A2,…..An) has associated with it one Y value in (A1,A2,……..An)

A functional dependency of the form
X ---> Y
where Y subset or equal to X is said to be trivial .

Candidate Key

Consider a relation R on set of n-attributes, then a candidate key K is a set of one or more fields if and only if it satisfies following :

Uniqueness : K uniquely identifies each tuple, i.e. no two tuples can have same value of K

Non - redundancy : K is non - redundant, i.e. no proper subset of K has the uniqueness property

There is no subset of the key which has above above properties and in general more than one candidate key may exist.

Example:

STUDENT( name, father-name, course, enroll-number, grade)
Candidate keys assuming they give an unique identification : {name,father-name}

{enroll-number}

Non Candidate Key :
{name, enroll-number} violates property 2.
{ course} violates property 1.

Prime/ Non Prime

Prime attribute : Attribute participates in a candidate key.

Non Prime attribute: Attribute does not participate in any candidate key

Example:

STUDENT( name, father-name, course, enroll-number, grade)
Where key be enroll-number, then
Prime attribute : enroll-number.
Non Prime attribute : name, father-name, course, grade

Primary Key

A designated candidate key is a primary key. Primary key must be fully defined i.e no unknown values or NULL values are permitted.

Example:

Let candidate key of STUDENT be rollno, the
Every student must have a rollno.
rollno is a PRIME attribute
name, fname, course, grade are NON PRIME attributes.

Super Key

A collection of attributes which is a unique identifier, may be non minimal.

Axioms

Reflexivity rule

If A is a set of attributes and B subset or equal to A
            =>A ----> B

Augmentation rule
If A ---->B holds and C is a set of attributes
           =>CA---> CB

Transitivity rule

If A ----> B holds and B ----> C holds
=>A ----> C

These axioms are sound and complete as they generate all other functional dependencies for a given set F of functional dependencies.

This collection of rules is known as Armstrong’s axioms

Additional Rules

Union rule
If A---->B holds and A---->C holds
=> A----> BC

Decomposition rule
If A---->BC holds
=A---->B and A----->C

pseudo transitivity rule
If A---->B holds and CB----->D holds
=> AC----->D

Normalization

Two broad approaches to normalization :
1. Decomposition approach
2. Synthesis approach

Decomposition approach:

Treat all the attributes as defining the properties of one entity
Determine the functional/multi-valued dependencies.
Decompose the global entity into its components. Repeatedly decompose each entity thus obtained till no further decomposition is possible.

Synthesis approach:

Identify all the functional / multi-valued dependencies.

Group together into entities all those attributes which exhibit these dependencies.

Definition of Decomposition:

Let r be a relation on relation scheme R and let ri=Ri(r) for
i=1,2,…. then

r subset or equal to r1 join r2 ………..join rn

The Decomposition of the relational scheme R={A1,A2,A3…An} in its replacement by a set of relation schemes {R1,R2,R3….Rn} such that R1 join R2 join R3…..Rn = R.

Decomposition can be Lossless-Join Decomposition : All the original information can be recovered by joining

Decomposition can be Non-Lossless-Join or Lossy Decomposition : Only partial information can be recovered

Lossless-Join Decomposition

R a relation and F a set of FDs
Decompose R into R1 and R2
Decomposition is lossless if either
R1 ∩ R2 → R1 or
R1 ∩ R2 → R2 is in F+

EmpDept ( empno, empname, job, deptno, dname, dloc)
F = { deptno → dname deptno → dloc empno → empname
empno → deptno empno → job }

Decompose EmpDept into two relations
Emp ( empno, empname, job, deptno )
Dept( deptno, dname, dloc)

Decomposition is lossless-join

deptno → dname deptno → dloc
deptno → dname, dloc (union)
deptno → dname, dloc, deptno ( augmentation)
Emp ∩ Dept = { deptno } → Dept

Decompose EmpDept into two relations
Emp ( empno, empname, job)
Ejob( deptno, dname, dloc, job)

Emp ∩ Dept = { job } !→ Emp or Ejob
Decomposition is non-lossless-join

Dependency Preserving

A relation R with a set of functional dependencies F be decomposed into the relations R1, R2, ……., Rn .
The restriction of F to Ri = Fi = {FDs in F+ which include attributes only of Ri }
F| = F1 U F2 U … U Fn
Decomposition is dependency preserving if F| = F or F|+ = F+

EmpDept ( empno, empname, job, deptno, dname, dloc)
F = { deptno → dname deptno → dloc empno → empname
empno → deptno empno → job }

Decompose EmpDept into two relations

    Emp ( empno, empname, job, deptno )
        Femp =     {empno → empname,     empno → deptno, empno → job }
    Dept( deptno, dname, dloc)
        Fdept = {deptno → dname,    deptno → dloc }

F| = Femp U Femp = F hence dependency preserving

Notion of Normalization

Normalization refers to the procedure of successive decomposition of a given relation into smaller relations. Normalization should be information preserving.

1st Normal Form

A relation R defined (A1, A2, ……., An) is said to be in 1 NF if : Values in the domain of each attribute of the relation are atomic . i.e the value is not a set of values or a list of values. Relational model expects relations to be in 1 NF.

Example:

STUDENT(name, fname, roll-no, course,grade)
Every attribute takes on a simple value.Thus it is in 1 NF.

EMPLOYEE(name, address, child)
child has attributes like child-name,age,sex.It is not atomic, thus is not in 1NF

PRODUCT(product-no, price, qty)

It is in 1 NF as every attribute has as atomic value

Enforcing the 1 NF

Replacement method: Systematically replaces all complex attributes by their constituents

Example:

For EMPLOYEE (name, address, child) define as

EMPLOYEE(name,address, child-name, cage, csex)

Decomposition method: Split the relation into two components, each of which are in 1 NF.

Example:

For EMPLOYEE define
EMPLOYEE(ename, address) and CHILD(cname, cage, asex)

Notion Of Anomaly

Knowledge of the relation that is required to perform an operation on relation without creating any data inconsistencies.

Three anomalies are:

INSERT: One can not insert the fact that a particular supplier is located in a particular city until that supplier supplies at least one part

DELETE: If one deletes a tuple for a particular supplier, the information about the supplier’s City is also lost.

UPDATE: The city value for a given supplier appears many times.This redundancy causes problems.For eg is a supplier S1 moves from LONDON to AMSTERDAM, then there is a problem either to find every tuple connecting S1 and LONDON or to produce an inconsistence result.

Partial functional Dependency

An attribute is partially functionally dependent upon another when it is functionally dependent upon it and also upon a proper subset of it.

Example:

A , B------>C
A ----->C

It leads to redundancy

2nd Normal Form

A relation is said to be in 2 NF if and only if it is in 1 NF and every non-key attribute is irreducibly dependent on the primary key.

It is concerned with the elimination of redundancy due to partial functional dependencies.

Redundancy causes inconsistent modifications :

Update Anomaly : In supplier if X shifts business from Delhi to Bangalore then time dependent behavior on the number of parts being supplied at that time.Number of updates performed may be less than required

Deletion Anomaly : In supplier if X stops supplying parts 1, 2 and 3 then all three rows are deleted. And thus information about city of X is lost.

Insertion Anomaly : A new supplier C starts operating from Calcutta then, one can not insert since it will cause an undefined value in the primary key

If partial functional dependency does not involve the primary key then time dependent behavior is shown

Assume (S, P#) is not the primary key then, INSERTION is okay but when first P# is defined then an UPDATE has to be performed and for subsequent P# insertion has to be done.

Again inconsistent modifications are possible.

Eliminating Redundancy due to Partial Dependency

Eliminate partial functional dependency by having only full functional dependencies.

This is known as the 2NF norm.

An entity is in 2 NF if it is in 1 NF and if each non-prime field is fully dependent upon each candidate key

Represent the offending partial functional dependency as as separate entity.Thus split the original relation into two.

Transitive Dependency

Let A, B, C be three distinct collections of attributes of an entity and following functional dependencies hold :

A → B, B !→ A, B → C
Then we say that A → C transitively or that C is transitively functionally dependent upon A

Transitive functional dependencies give rise to redundancies and thus inconsistencies.

3rd Normal Form

Codd’s Definition
A relation is in 3NF if it satisfies 2NF and no non prime attribute of R is transitively dependent on the primary key

Equivalently,

A relation is in 3 NF if for every functional dependency of the form X---->A, one of the following statements is true:

1. it is a trivial FD
2. X is a super key

3. A is a prime attribute

Example:

Consider a relation Stdinf (Name, Phoneno, Course, Major, Prof. ,Grade ,Major-Elective) with following FD’s

The key of the relation is {Name Course}

The partial dependencies are caused by Name → Phoneno

Name → Major and Course → Prof. as Name, Course is the key

The only transitive dependency is

Name → Major, Major → Major-Elective.

2NF Decomposition:
R1 {Name, Phoneno, Major, Major-Elective}
R2 {Course,Prof..}
R3 {Name,Course,Grade}

3NF Decomposition:
R1-1{Name,Phoneno,Major}
R1-2{Major,Major-Elective}
R2 {Course,Prof.}
R3 {Name,Course,Grade}

Preserves all the Functional Dependencies existing in the original relation

BC Normal Form

Need For BCNF arises when X → A and A → B where B is a subset of X

A relation is in BCNF if whenever a functional dependency

X → A holds then, either

i) X is a super key of R, or

ii) X → A is trivial (A is subset of X)

Lossless BCNF Decomposition:

If A-->B in R violates BCNF then create
R1(A,B ), R2 (R - B)

Case where BCNF is Lossless and Dependency Preserving:

Consider Relation Shipping(Ship,Capacity,Date,Cargo,Value) with following FD’s
Ship-->Capacity
Ship,Date-->Cargo
Cargo,Capacity-->Value
In this case first and third FD’s are nontrivial and also determinant is not super key so it is not in BCNF

(Ship,Capacity,Date,Cargo,Value)

(Capacity,Cargo,Value)

With FD(Cargo,Capacity)---->Value

(Ship,Capacity)

With FD Ship---->Capacity

(Ship,Date,Cargo)

With FD Ship ,Date---->Cargo