Pages

May 24, 2014

All about Normalization!

Normalization is one of the favorite topics of interviewee, here are the basics about what Normalization is all about.

What is Normalization?
A normalization is the process of organizing the columns, tables of a database to minimize the redundancy of data and ensure data dependencies make sense( means storing related data in a table).


It involves in dividing large tables into smaller tables and defining relationships between them.

What are different types of Normalization Levels or Normalization Forms?

The technique of normalization includes some rules, which when applied, changes the unnormalized data to a normalized-structured state. The process of nomalizing is implemented in 3 levels which as known as normal forms.

Inner and Outer Joins!..

First Normal Form:
1). Identify the primary key.
2). Remove all duplicate columns from the same table.
3). Create separate tables for each group of related data and identify each row with a unique column or set of columns (Primary Key).
4). Add a primary key column to the new table. It should be the same column as the foreign key in the base table.
   
Second Normal Form:
First it should meet the requirement of first normal form.

All the non-primary key column should be dependent on the whole primary key.  When a non primary key column is partially dependent on primary key then it is known as partial dependency. In 2nd normal form this partial dependency is removed.

Frequently asked SQL queries in interview..

How to remove Partial Dependency?
a). First you need to identify the columns which are partially dependent on primary key.
b). Remove those column from the base table and create another table, in which all the non-primary column are dependent on primary key of newly created table.

Third Normal Form:
First it should meet the requirements of second normal form. in 3rd NF all the non-primary key columns in the tables are directly dependent on the primary key and are not dependent on any other non-primary key.

1). First step is to identify the non-primary key column that depend on other non-primary key columns.
2). Then you need to remove these columns from the base table.
3). Create another table with this removed columns and include the non-primary key column that they are dependent on making it the primary key.
4). Create a foreign key in the base table and link it to primary key of newly created table.


What is Transitive Dependency?
It's a situation where a non-key column of a table is dependent on another non-key column.


What is De-normalization?
The de-normalization is the process of optimizing the read performance of a database by adding redundant data or by grouping data.

Let us take an example:


Student Details
Course Details
Result Details
Roll No
Student Name
DOB
Course ID
Course Name
Duration in Month
Date Of Exams
Marks Obtained
Grade
1001
Ram
11/9/1986
M4
Basic Maths
7
11/11/2004
89
A
1002
Shyam
12/8/1987
M4
Basic Maths
7
11/12/2004
78
B
1001
Ram
23/06/1987
H6

4
11/13/2004
87
A
1003
Sita
16/07/1985
C3
Basic Chemistry
11
11/14/2004
90
A
1004
Gita
24/09/1988
B3

8
11/15/2004
78
B
1002
Shyam
23/06/1988
P3
 Basic Physics
13
11/16/2004
67
C
1005
Sunita
14/09/1987
P3
 Basic Physics
13
11/17/2004
78
B
1003
Sita
23/10/1987
B4

5
11/18/2004
67
C
1005
Sunita
13/03/1990
H6

4
11/19/2004
56
D
1004
Gita
21/08/1987
M4
Basic Maths
7
11/20/2004
78
E

The above table contains the student and result details. Above table has below issues:

1). Firstly, we cannot insert prospective course which does not have any registered student or we cannot insert student details that is yet to register for any course.
2). Secondly, if we want to update the course M4’s name we need to do this operation three times. Similarly we may have to update student 1003’s name twice if it changes.
3). Also, if we want to delete a course M4 , in addition to M4 occurs details , other critical details of student also will be deleted. This kind of deletion is harmful to business. Moreover, M4 appears thrice in above table and needs to be deleted thrice.
4). Course M4’s data is stored thrice and student 1002’s data stored twice .This redundancy will increase as the number of course offerings increases.

To remove above issues, normalization is performed on above table.

-K Himaanshu Shukla...



Copyright © 2014 - ScrutinyByKHimaanshu


No comments:

Post a Comment