Note on Normalization

  • Note
  • Things to remember

Normalization of Data

Normalization is a process which has a setof steps that enables to identify the existence of potential problems called update anomalies (irregularities) in the design of a relation database. This process supplies the methods for correcting these anomalies. In this process, it involves converting tables into various types of normal forms. A table in particular normal form processes a certain desirable collection of properties.

.

Fig: Database Normalization

Source :www.w3schools.in

The normal forms are 1NF, 2NF,3NF, which were later extended by Boyce-Codd. The normalization process allows a table or collection of tables and produces a new collection of tables that represent the same information but free of anomalies.

In the normalization process, two terms are basically used. They are functional dependence and keys.

Functional dependence:

Functional Dependence is a formal name for what is basically a simple idea. Example: A sales representative pay class determine his or her commission rate. In other words, a sales representative’s pay class functionally determines his commission rate or a sales commission rate functionally depends on his pay class.

File name: SaleRep

Sales Rep

Last Name

First Name

Price

City

Commission

Rate

[Note: Each blank row can hold data]

Filename: Customer

Customer Number

Last Name

First Name

Place

City

Balance

Credit Limit

Sales rep number

[Note: Each blank row can hold data]

A column (attribute) B is functionally dependent on another column (or possibly a collection of columns), if a value for A determines a single value for B at any one time. The functional dependence of A is written as (AàB). If B is functionally dependent on A. This can express as A functionally determines B.

Functional dependence is determined on the basis of the unique field. For example, last name, address, ID code, etc.

Keys

It is the second underlying concept of the normalization process. A primary key is very important. A column (attribute) C (or collection of columns) is the primary key for table (relation) T.

If;

  1. All column in T is functionally dependent on C.
  2. No sub-collection of the columns in C (assuming C is a collection of columns and not just a single column) also has property (a).

Example, given the name, you cannot determine the address, city etc. In this case, you will choose candidate key. The candidate key is a column or collection of columns on which all columns in the table are functionally dependent. The primary key defines the candidate key as well.

The candidate keys that are not chosen to be the primary key are often referred to as an alternate key. The primary key is frequently called simply, the key in other studies on database management and the relational model.

First Normal Form

A table (relation) is said to be the first normal form ( 1NF) if it does not contain the repeating groups. A table (relation) that contains a repeating group (or multiple entries for as single record) is called a non normalized relation. Removal of repeating groups is the starting points in the quest for tables that are free of problems as possible.

Tables without repeating groups are in the first normal form. In the given table, if a student (SIDà 100) is deleted, the information about the fee of Golf is lost. This has deletion anomaly. Similarly, if a student wants to take part in cricket, its fee is unknown. It also possesses the insertion anomaly.

Second Normal Form

Consider the third relation in the above figure. Here, we have assumed that the student is allowed to enroll for more than that one activity at a time. We observe that it has modification anomalies. If we delete the tuple for student 175 (SIDà175), we lose the fact that the monthly fee of squash is Rs. 50. Also, we cannot enter an activity until a student signs up for it. Thus, the relation suffers from both insertion and deletion anomalies.

The problem with this relation is that it has a dependency involving only part of the key. The key here is the combination (SID, Activity) but the relation contains dependency, Activity a Fee (Fee depends on the type of Activity). The determinant of this dependency (Activity) is only part of the key (SID, Activity). The modification anomalies could be eliminated if the non-key attribute (Fee) were dependent on all the key, not just a part of it.

To resolve the situation, we must separate the relation into small ones. The situation now leads to the definition of the second normal form:

A relation is in second normal form if all the non-key attributes are dependent on the entire key. Observe that this definition pertains to relations that have composite keys. If the key is a single attribute, then the relation is automatically in second normal form.

Filename: ACTIVITY (SID, Activity, Fee)

Key: (SID, Activity)

Activities can be split to form two relations in second normal form. The relations are the same as those in fig.

STU-ACT

SID

Activity

100

Skiing

150

Swimming

175

Squash

200

Swimming

ACT-COST

Activity

Fee

Skiing

200

Swimming

50

Squash

50

Third Normal Form

The relation in second normal form also has anomalies. Consider the relation HOUSING in given fig. Here, the students stay in different houses and their cost prices per quarter are different. The key here is SID and the functional dependencies are SID à Building and Building à Fee (SID à Activity and Activity à Fee, as in second normal form). The dependencies arise because a given student lives in only one building and each building has its own fixed fee structure, for e.g. everyone living in Yellow House pays Rs. 1200 per quarter.

Since, SID determines Building and since, Building determines Fee, then, transitively SID à Fee. Hence, SID depends on fee which is incorrect. Thus, SID, a single attribute, is the key and the relation must, therefore, be in second normal form. The relation, however, does still have anomalies. These anomalies can be resolved by splitting the table.

HOUSING ( SID, Building, Fee)

Functional dependencies: Building à Fee

SID à Building à Fee SID à Housing (SID,Building) BLDG-FEE (Building, Fee)

SID

Building

Fee

100

Yellow

1200

150

Blue

1100

200

Red

1200

250

White

1100

300

Green

1200

Table 1

SID

Building

100

Yellow

150

Blue

200

Red

250

White

300

Yellow

Table 2

Building

Fee

Yellow

1200

Blue

1100

White

1100

Table 3

If we delete the second row of the Housing relation (table 1), we lose not only the fact that student 150 lives in Blue house, but also that it costs Rs. 1100 to live there. This is a deletion anomaly. Further, how can one record the fact that the fee for XY House (imagine) is R.s 1500? We cannot do that until a student moves into the XY House. This is an insertion anomaly. To eliminate the anomalies from the relation in second normal, the transitive dependency must be removed. This leads is to the definition of the third normal form:

A relation is in third normal form if it is in second normal form and has no transitive dependencies.

Housing relation in above fig can be split into two relations in third normal form. The relation STU-Housing (SID, Building) and BLDG-Fee (Building, Fee) are examples.

Structured Query Language (SQL)

SQL was developed by IBM in the, 1970s as a way to get information into and out of relational database management systems. It was first standardized in 1986 ANSI. It is declarative in nature. That is its commands are accurate and declared so that they perform on particular databases. SQL commands are categorized as:

  • DDL such as create a table, alter table, drop table, etc.
  • DML such as select, insert into, update, delete from, etc.

SQL is a declarative language. It is used to find the results of the database. SQL queries are the most common in use. The SQL sublanguage DML and DDL are very common in server-based database management system. For using SQL, you need to create the table, insert data records into the table and then, you can manipulate the data records or make the queries. A typical command could be:

SELECT CarModel FROM CarSales WHERE CarSoldDate Between ‘May 1, 2005’ AND May 31, 2005’;

For creating database you would like to enter the SQL command as:

CREATE TABLE MyFirstTable

(Stud_name varchar (40),

Stud_Address varchar (40),

Stud_DateOfBirth Date, )

For listing all the records from the table, you would enter the command as:

Select * from MyFirstTable

Similarly, for various activities such as deleting records, indexing records, filtering records, joining tables, etc., you have commands of SQL. These commands are beyond context at this level.

References:

Khanal, R.C. Khanal, R.C. Computer Concept for XII. Pashupatigriha Marga, Thapathali, Kathmandu, Nepal: Ekta Books Distributors Pvt. Ltd., 2010. 9-13.

Adhikari, Deepak Kumar.,et.al., Computer Science XII,Asia Publication Pvt.Ltd

  1. Normalization is the processes of identifying the existence of potential problem called update anomalies (irregularities) in the design of the relational database.
  2. First Normal Form is said to be the first normal form (1NF) if it does not contain the repeating groups. It includes the tables without repeating groups are in the first normal form.
  3. A relation is in second normal form if all the non-key attributes are dependent on all the key. It includes the modification anomalies could be eliminated if the non-key attribute were dependent on all the key not just a part of it.
  4. A relation is in third normal form if it is in second normal form and has no transitive dependencies. It includes the eliminate the anomalies from the relation is second normal form, the transitive dependency must be removed which leads us to the definition of the third normal form.
.

Very Short Questions

0%

DISCUSSIONS ABOUT THIS NOTE

No discussion on this note yet. Be first to comment on this note