Tuesday, November 3, 2009

Introduction to Databases

CHAPTER I

1.1 Overview
Relational database systems have become increasingly popular since the late 1970's. They offer a powerful method for storing data in an application-independent manner. This means that for many enterprises the database is at the core of the I.T. strategy. Developments can progress around a relatively stable database structure which is secure, reliable, efficient, and transparent.
In early systems, each suite of application programs had its own independent master file. The duplication of data over master files could lead to inconsistent data.
Efforts to use a common master file for a number of application programs resulted in problems of integrity and security. The production of new application programs could require amendments to existing application programs, resulting in `unproductive maintenance'.
Data structuring techniques, developed to exploit random access storage devices, increased the complexity of the insert, delete and update operations on data. As a first step towards a DBMS, packages of subroutines were introduced to reduce programmer effort in maintaining these data structures. However, the use of these packages still requires knowledge of the physical organization of the data.
1.2 The Database Approach
A database system is a computer-based system to record and maintain information. The information concerned can be anything of significance to the organization for whose use it is intended.
The contents of a database can hold a variety of different things. To make database design more straight-forward, databases contents are divided up into two concepts:
• Schema
• Data
The Schema is the structure of data, indicates the rules which the Data must obey whereas the Data are the "facts".
Imagine a case where we want to store facts about Employees in a company. Such facts could include their name, address, date of birth, and salary.
In a database all the information on all employees would be held in a single storage "container", called a table.
This table is a tabular object like a spreadsheet page, with different employees as the rows, and the facts (e.g. their names) as columns... Let's call this table EMP, and it could look something like:


Name Address Date of Birth Salary
Jim Smith 1 Apple Lane 1/3/1991 11000
Jon Greg 5 Pear St 7/9/1992 13000
Bob Roberts 2 Plumb Road 3/2/1990 12000
Table 1.1
From this information the schema would define that EMP has four components, "NAME","ADDRESS","DOB","SALARY".
Protecting the database against rubbish data is one of the most important database design steps, and is what much of this course is about.
Facts can be things like:
• NAME is a string, and needs to hold at least 12 characters.
• ADDRESS is a string, and needs to hold at least 12 characters.
• DOB is a date... The company forbids people over 100 years old or younger than 18 years old working for them.
• SALARY is a number. It must be greater than zero.
Such rules can be enforced by a database. During the design phase of a database schema these and more complex rules are identified and where possible implemented. The more rules the harder it is to enter poor quality data.
1.3 USER TYPES
When considering users of a Database system, there are three broad classes to consider:
1. The application programmer, responsible for writing programs in some high-level language such as COBOL, C++, SQL, ORACLE etc.
2. The end-user, who accesses the database via a query language
3. The database administrator (DBA), who controls all operations on the database
1.4 DATABASE ARCHITECTURE
DBMSs do not all conform to the same architecture.
• There are three-level architecture that forms the basis of modern database architectures.


The architecture for DBMSs is divided into three general levels:
• External
• Conceptual
• Internal
1.4.1 Three level database architecture

Figure 1.1: Three level architecture
1. The external level : concerned with the way individual/end-users see the data
2. The conceptual level: can be regarded as a community user view ¬ a formal description of data of interest to the organization, independent of any storage considerations.
The structure of the data how it is stored. E.g. Data type, Properties
3. The internal level : concerned with the way in which the data is actually stored
How data is seen by the Programmer


Figure 1.2: How the three level architecture works

1.4.2 EXTERNAL VIEW (Data and its Schema as it appears to the End Users)
A user is anyone who needs to access some portion of the data. They may range from application programmers to casual users each user has a language at his/her disposal.
The application programmer may use a high level language (e.g. COBOL) while the casual user will probably use a query language.
Regardless of the language used, it will include a data sub¬language DSL which is that subset of the language which is concerned with storage and retrieval of information in the database and may or may not be apparent to the user.
A DSL is a combination of two languages:
• A data definition language (DDL) - provides for the definition or description of database objects
This is packed within any database system when loaded to the system.
• A data manipulation language (DML) - supports the manipulation or processing of database objects this is packed within any database system when loaded to the system.
• All these languages have been written in a computer language format.
Each user sees the data in terms of an external view: Defined by an external schema, consisting basically of descriptions of each of the various types of external record in that external view, and also a definition of the mapping between the external schema and the underlying conceptual schema.
1.4.3 CONCEPTUAL VIEW (Intangible representation of information within the database each conceptual records with its own conceptual schema)
• An abstract (intangible) representation of the entire information content of the database.
• It consists of multiple occurrences of multiple types of conceptual record, defined in the conceptual schema.
• To achieve data independence, the definitions of conceptual records must involve information content only.
• storage structure is ignored
• access strategy is ignored
• In addition to definitions, the conceptual schema contains authorization and validation procedures.
1.4.4 INTERNAL VIEW (Multiple types of stored records)
The internal view is a low-level representation of the entire database consisting of multiple occurrences of multiple types of internal (stored) records.
It does not deal in terms of physical records or blocks nor with any device specific constraints such as cylinder or track sizes.
Details of mapping to physical storage are highly implementation specific and are not expressed in the three-level architecture.
The internal view described by the internal schema which describes the following:
• defines the various types of stored record
• what indices (indexes, directories) exist
• how stored fields are represented
• what physical sequence (chains) the stored records are in
In effect the internal schema is the storage structure definition.

1.4.5 MAPPINGS
• The conceptual/Internal mapping:
o Defines conceptual and internal view (analysis) correspondence (association)
o Specifies mapping from conceptual records to their stored counterparts
• An External/Conceptual mapping:
o Defines a particular external and conceptual view correspondence
• A change to the storage structure definition means that the conceptual/internal mapping must be changed accordingly, so that the conceptual schema may remain invariant (unchanged) , achieving physical data independence.
• A change to the conceptual definition means that the conceptual/external mapping must be changed accordingly, so that the external schema may remain invariant, achieving logical data independence.

1.5 DATABASE MANAGEMENT SYSTEMS (DBMS)
The database management system (DBMS) is the software that:
• Handles all access to the database
• Is responsible for applying the authorization checks and validation procedures
Conceptually what happens is:
1. A user issues an access request, using some particular DML.
2. The DBMS intercepts the request and interprets it.
3. The DBMS inspects in turn the
 External Schema,
 The External/Conceptual Mapping,
 The Conceptual Schema,
 The Conceptual/Internal Mapping,
 The Storage Structure Definition.
4. The DBMS performs the necessary operations on the stored database.

1.5.1 DATABASE ADMINISTRATOR
The database administrator (DBA) is the person (or group of people) responsible for overall control of the database system.
The DBA's responsibilities include the following:
• Deciding the information content of the database, i.e. identifying the entities of interest to the enterprise and the information to be recorded about those entities. This is defined by writing the conceptual schema using the DDL
• Deciding the storage structure and access strategy, i.e. how the data is to be represented by writing the storage structure definition. The associated internal/conceptual schema must also be specified using the DDL
• Communicate with users, i.e. to ensure that the data they require is available and to write the necessary external schemas and conceptual/external mapping (again using DDL)
• Defining authorization checks and validation procedures. Authorization checks and validation procedures are extensions to the conceptual schema and can be specified using the DDL
• Defining a strategy (Plan. Policy, Approach) for backup and recovery. For example periodic dumping of the database to a backup tape and procedures for reloading the database for backup. Use of a log file where each log record contains the values for database items before and after a change and can be used for recovery purposes
• Monitoring performance and responding to changes in requirements, i.e. changing details of storage and access thereby organizing the system so as to get the performance that is `best for the enterprise'
1.5.2 FACILITIES (SERVICES) AND LIMITATIONS
The facilities offered by DBMS vary a great deal, depending on their level of sophistication.

In general, however, a good DBMS should provide the following advantages over a conventional system:
• Independence of data and program - This is a prime advantage of a database. Both the database and the user program can be altered independently of each other thus saving time and money which would be required to retain consistency.
• Data shareability and non¬redundance of data - The ideal situation is to enable applications to share an integrated database containing all the data needed by the applications and thus eliminate as much as possible the need to store data redundantly.
• Integrity (Accurate) - With many different users sharing various portions of the database, it is impossible for each user to be responsible for the consistency of the values in the database and for maintaining the relationships of the user data items to all other data item, some of which may be unknown or even prohibited for the user to access.
• Centralized control - With central control of the database, the DBA can ensure that standards are followed in the representation of data.
• Security - Having control over the database the DBA can ensure that access to the database is through proper channels and can define the access rights of any user to any data items or defined subset of the database. The security system must prevent corruption of the existing data either accidentally or maliciously.
• Performance and Efficiency - In view of the size of databases and of demanding database accessing requirements, good performance and efficiency are major requirements. Knowing the overall requirements of the organization, as opposed to the requirements of any individual user, the DBA can structure the database system to provide an overall service that is `best for the enterprise'.
1.6 Data Independence
• This is a prime advantage of a database. Both the database and the user program can be altered independently of each other.
• In a conventional system applications are data dependent. This means that the way in which the data is organized in secondary storage and the way in which it is accessed are both dictated by the requirements of the application, and, moreover, that knowledge of the data organization and access technique is built into the application logic.
• For example, if a file is stored in indexed sequential form then an application must know
o that the index exists
o the file sequence (as defined by the index)
The internal structure of the application will be built around this knowledge. If, for example, the file was to be replaced by a hash-addressed file, major modifications would have to be made to the application.
Such an application is data-dependent - it is impossible to change the storage structure (how the data is physically recorded) or the access strategy (how it is accessed) without affecting the application, probably drastically. The portions of the application requiring alteration are those that communicate with the file handling software - the difficulties involved are quite irrelevant to the problem the application was written to solve.
• It is undesirable to allow applications to be data-dependent - different applications will need different views of the same data.
• The DBA must have the freedom to change storage structure or access strategy in response to changing requirements without having to modify existing applications.
• Data independence can be defines as
`The immunity of applications to change in storage structure and access strategy'.
1.7 Data Redundancy
In non-database systems each application has its own private files. This can often lead to redundancy in stored data, with resultant waste in storage space. In a database the data is integrated.
The database may be thought of as a unification of several otherwise distinct data files, with any redundancy among those files partially or wholly eliminated.
Data integration is generally regarded as an important characteristic of a database. The avoidance of redundancy should be an aim, however, the vigour with which this aim should be pursued is open to question.
Redundancy is
• Direct if a value is a copy of another
• Indirect if the value can be derived from other values:
o Simplifies retrieval but complicates update
o Conversely integration makes retrieval slow and updates easier
• Data redundancy can lead to inconsistency in the database unless controlled.
• The system should be aware of any data duplication - the system is responsible for ensuring updates are carried out correctly.
• A db with uncontrolled redundancy can be in an inconsistent state - it can supply incorrect or conflicting information
• A given fact represented by a single entry cannot result in inconsistency - few systems are capable of propagating updates i.e. Most systems do not support controlled redundancy.
1.8 Data Integrity
This describes the problem of ensuring that the data in the database is accurate...
• Inconsistencies between two entries representing the same `fact' give an example of lack of integrity (caused by redundancy in the database).
• Integrity constraints can be viewed as a set of assertions to be obeyed when updating a DB to preserve an error-free state.
• Even if redundancy is eliminated, the DB may still contain incorrect data.
• Integrity checks which are important are checks on data items and record types.
Integrity checks on data items can be divided into 4 groups:
1. type checks
o e.g. ensuring a numeric field is numeric and not a character - this check should be performed automatically by the DBMS.
2. redundancy checks
o Direct or indirect (see data redundancy) - this check is not automatic in most cases.
3. range checks
o e.g. to ensure a data item value falls within a specified range of values, such as checking dates so that say (age > 0 AND age < 110).
4. comparison checks
o In this check a function of a set of data item values is compared against a function of another set of data item values. For example, the max salary for a given set of employees must be less than the min salary for the set of employees on a higher salary scale.
A record type may have constraints on the total number of occurrences, or on the insertions and deletions of records. For example in a patient database there may be a limit on the number of x-ray results for each patient or the details of a patients visit to hospital must be kept for a minimum of 5 years before it can be deleted
• Centralized control of the database helps maintain integrity, and permits the DBA to define validation procedures to be carried out whenever any update operation is attempted (update covers modification, creation and deletion).
• Integrity is important in a database system - an application run without validation procedures can produce erroneous data which can then affect other applications using that data.

CHAPTER II
DATA ANALYSIS AND ENTITY-RELATIONSHIP MODELING
2.1 Introduction
Data analysis is concerned with the NATURE and USE of data.
It involves:
 The identification of the data elements which are needed to support the data processing system of the organization,
 The placing of these elements into logical groups
 The definition of the relationships between the resulting groups.
Other approaches, e.g. D.F.Ds and Flowcharts, have been concerned with the flow of data-dataflow methodologies.
In data analysis we analyze the data and build a system representation in the form of a data model (conceptual).
A conceptual data model specifies the structure of the data and the processes which use that data.
= establishing the nature of data.
Functional Analysis = establishing the use of data.


2.2 Database Analysis Life Cycle

Figure2.1: Database Analysis Life Cycle

When a database designer is approaching the problem of constructing a database system, the logical steps followed is that of the database analysis life cycle:
A. Database study - here the designer creates a written specification in words for the database system to be built.
This involves:
o Analyzing the company situation - is it an expanding company, dynamic in its requirements, mature in nature, solid background in employee training for new internal products, etc. These have an impact on how the requirement is to be viewed.
o Define problems and constraints – Overall or Portion of the Current system problems and limitations. What are the limits of the new system?
o Define objectives - what is the new database system going to have to do, and in what way must it be done. What information does the company want to store specifically, and what does it want to calculate. How will the data evolve?
o Define scope (level) and boundaries (limits) - what is stored on this new database system, and what it stored elsewhere? Will it interface to another database?
• Database Design - conceptual, logical, and physical design steps in taking specifications to physical implementable designs. This is looked at more closely in a moment.
• Implementation and loading -
Once a DBMS has been installed for example: Oracle, SQL Server, the database must be created within that DBMS depending on the aim of usage ready for the application of required and prepared data.
• Testing and evaluation - the database, once implemented, must be tested against the specification (document) supplied by the client.
It is also useful to test the database with the client using mock (approved) data, as clients do not always have a full understanding of what they think they have specified and how it differs from what they have actually asked for!
In addition, this step in the life cycle offers the chance to the designer to fine-tune the system for best performance. Finally, it is a good idea to evaluate the database in-situ, along with any linked applications.
• Operation - this step is where the system is actually in real usage by the company.
• Maintenance and evolution - designers rarely get everything perfect first time, and it may be the case that the company requests changes to fix problems with the system or to recommend enhancements or new requirements.
o Commonly development takes place without change to the database structure. In elderly systems the DB structure becomes fossilized.
2.3 THREE-LEVEL DATABASE MODEL
This is where the design moves from a written specification taken from the real-world requirements to a physically-implementable design for a specific DBMS.
The three levels commonly referred to are `Conceptual Design', `Data Model Mapping', and `Physical Design'.

Figure2.2: Logic behind the three level architecture
a) The specification is usually in the form of written document containing customer requirements, mock reports, screen drawings and the like, written by the client to indicate the requirements which the final system is to have.

These data has to be collected together from a variety of internal sources to the company and then analyzed to see if the requirements are necessary, correct, and efficient.
b) Once the Database requirements have been collected, the Conceptual Design phase takes the requirements and produces a high-level data model of the database structure.
In this module, we use ER Modeling to represent high-level data models, but there are other techniques. This model is independent of the final DBMS which the database will be installed in.

c) Then from there the high-level data model it taken and converted into a conceptual schema, which is specific to a particular DBMS class (e.g. relational). For a relational system, such as Oracle, an appropriate conceptual schema would be relations.
d) Finally, the conceptual schema is converted into database internal structures. This is specific to a particular DBMS product.

2.4 Basics of the ER model
Entity Relationship (ER) modeling
• Is a graphical representation of the database system
• Supports the user's perception (responsiveness) of the data
• Is composed of Entities, Attributes, And Relationships
2.4.1 Entities
• An entity is any object in the system that we want to model and store information about
• Individual objects are called entities
• Groups of the same type of objects are called entity types or entity sets
• Entities are represented by rectangles (either with round or square corners)

Figure2.3: Entities in this entities is Lecturer
2.4.2 Attribute
• All the data relating to an entity is held in its attributes E.g. Lecturer Name, Address, Office No.
Attributes can be:
• simple or composite
• single-valued or multi-valued
• Attributes can be shown on ER models
• They appear inside ovals and are attached to their entity.
• Note that entity types can have a large number of attributes... If all are shown then the diagrams would be confusing. Only show an attribute if it adds information to the ER diagram, or clarifies a point.

Figure2.4: In this case Attribute is “Name”
2.4.3 Keys
• A key is a data item (record) that allows us to uniquely identify entity
• A candidate key is an attribute or set of attributes that uniquely identifies an entity. E.g. Lecture Name, Address, Office No.
• An entity may have one or more possible candidate keys, the one which is selected is known as the primary key.
• A composite key is a candidate key that consists of two or more attributes
• The name of each primary key attribute is underlined.
2.4.4 Relationships
• A relationship type is a meaningful association between entities or entity types
• A relationship is an association of entities where the association includes one entity from each participating entity type.
• Relationship types are represented on the ER diagram.
• The relationship can be placed inside a diamond or by a series of lines.
For Example:. Managers Manage Employees:

Figure 2.5: Chen’s notation for relationships
• For this module, we will use an alternative notation, where the relationship is a label on the line. The meaning is identical

Figure2.6: Relationships used in this document

2.4.5 Degree of a Relationship
• The number of participating entities in a relationship is known as the degree of the relationship.
• If there are two entity types involved it is a binary relationship type


Figure 2.7: Binary Relationships
• If there are three entity types involved it is a ternary relationship type

Figure 2.8: Ternary relationship
• It is possible to have an n-ary relationship (e.g. quaternary or unary).
• Unary relationships are also known as a recursive relationship.

Figure 2.9: Recursive relationship
• It is a relationship where the same entity participates more than once in different roles.
For Example: In the example above we are saying that employees are managed by employees.
• If we wanted more information about who manages whom, we could introduce a second entity type called manager.

Degree of a Relationship
• It is also possible to have entities associated through two or more distinct relationships.

Figure2.10: Multiple relationships
• In the representation we use it is not possible to have attributes as part of a relationship. To support this other entity types need to be developed.
2.4.5.1 Replacing ternary relationships
Sometimes the relationships can be replaced by a series of binary relationships that link pairs of the original ternary relationship.

Figure 2.11: A ternary relationship example
• This can result in the loss of some information - It is no longer clear which sales assistant sold a customer a particular product.
• Try replacing the ternary relationship with an entity type and a set of binary relationships.
Relationships are usually verbs, so name the new entity type by the relationship verb rewritten as a noun.
• The relationship sells can become the entity type sale.

Figure 2.12: Replacing a ternary relationship
• So a sales assistant can be linked to a specific customer and both of them to the sale of a particular product.
• This process also works for higher order relationships.
2.4.5.2 Cardinality
i. Relationships are rarely one-to-one
For example, a manager usually manages more than one employee
There are four possible categories.
ii. One to one (1:1) relationship
iii. One to many (1:m) relationship
iv. Many to one (m:1) relationship
v. Many to many (m:n) relationship
 On an ER diagram, if the end of a relationship is straight, it represents 1, while a "crow's foot" end represents many.
 A one to one relationship - a man can only marry one woman, and a woman can only marry one man, so it is a one to one (1:1) relationship


For Example:

Figure2.13 : One to One relationship example
• A one to many relationship - one manager manages many employees, but each employee only has one manager, so it is a one to many (1:n) relationship

Figure 2.14: One to Many relationship example
• A many to one relationship - many students study one course. They do not study more than one course, so it is a many to one (m:1) relationship

Figure 2.15: Many to One relationship example
• A many to many relationship - One lecturer teaches many students and a student is taught by many lecturers, so it is a many to many (m:n) relationship

Figure2.16: Many to Many relationship example

2.4.5.3 Optionality
A relationship can be optional or mandatory.
• If the relationship is MANDATORY
An entity at one end of the relationship must be related to an entity at the other end.
• The OPTIONALITY can be different at each end of the relationship
For example,
 A student must be on a course. This is mandatory. To the relationship `student studies course' is mandatory.
 But a course can exist before any students have enrolled. Thus the relationship `course is_studied_by student' is optional.



• To show OPTIONALITY, put a circle or `0' at the `optional end' of the relationship.
• As the optional relationship is `course is_studied_by student', and the optional part of this is the student, then the `O' goes at the student end of the relationship connection.

Figure 2.17: Optionality example
• It is important to know the optionality because you must ensure that whenever you create a new entity it has the required mandatory links.

2.4.6 Entity Sets
Sometimes it is useful to try out various examples of entities from an ER model.
One reason for this is to confirm the correct cardinality and optionality of a relationship.
We use an `Entity Set Diagram' to show entity examples graphically.
Consider the example of `course is_studied_by student'.

Figure2.18: Entity set example
2.4.6.1 Confirming Correctness

Figure2.19: Entity set confirming errors

NOTE:
• Use the diagram to show all possible relationship scenarios.
• Go back to the requirements specification and check to see if they are allowed.
• If not, then put a cross through the forbidden relationships
• This allows you to show the cardinality and optionality of the relationship

2.4.7 Deriving the relationship parameters
To check we have the correct parameters (sometimes also known as the degree) of a relationship, ask two questions:
1. ONE COURSE IS STUDIED BY HOW MANY STUDENTS? ANSWER = `ZERO OR MORE'.
o This gives us the degree at the `student' end.
o The answer `zero or more' needs to be split into two parts.
o The `more' part means that the cardinality is `many'.
o The `zero' part means that the relationship is `optional'.
o If the answer was `one or more', then the relationship would be `mandatory'.
2. ONE STUDENT STUDIES HOW MANY COURSES? ANSWER = `ONE'
o This gives us the degree at the `course' end of the relationship.
o The answer `one' means that the cardinality of this relationship is 1, and is `mandatory'
o If the answer had been `zero or one', then the cardinality of the relationship would have been 1, and be `optional'.
2.4.8 Redundant relationships
Some ER diagrams end up with a relationship loop.
• check to see if it is possible to break the loop without losing info
For Example:
Given three entities A, B, C, where there are relations A-B, B-C, and C-A,
Check if it is possible to navigate between A and C via B. If it is possible, then A-C was a redundant relationship.
• Always check carefully for ways to simplify your ER diagram. It makes it easier to read the remaining information.
2.4.8.1 Example of Redundant relationships.
• Consider entities `customer' (customer details), `address' (the address of a customer) and `distance' (distance from the company to the customer address).

Figure 2.20: Redundant relationship
2.4.9 Splitting n:m Relationships
A many to many relationship in an ER model is not necessarily incorrect. They can be replaced using an intermediate entity. This should only be done where:
• The m:n relationship hides an entity
• The resulting ER diagram is easier to understand.
2.4.9.1 Splitting n:m Relationships - Example
Consider the case of a car hire company. Customers hire cars, one customer hires many cars and a car is hired by many customers.


Figure 2.21: Many to Many example
The many to many relationship can be broken down to reveal a `hire' entity, which contains an attribute `date of hire'.

Figure 2.22: Splitting the Many to Many example

2.5 Constructing an ER model
Before beginning to draw the ER model, read the requirements specification carefully. Document any assumptions you need to make.
1. Identify entities - list all potential entity types. These are the object of interest in the system. It is better to put too many entities in at this stage and then discard them later if necessary.
2. Remove duplicate entities - Ensure that they really separate entity types or just two names for the same thing.
o Also do not include the system as an entity type
o e.g. if modeling a library, the entity types might be books, borrowers, etc.
o The library is the system, thus should not be an entity type.
3. List the attributes of each entity (all properties to describe the entity which are relevant to the application).
o Ensure that the entity types are really needed.
o Are any of them just attributes of another entity type?
o if so keep them as attributes and cross them off the entity list.
o Do not have attributes of one entity as attributes of another entity!
4. Mark the primary keys.
o Which attributes uniquely identify each instances of that entity type?
o This may not be possible for some weak entities.
5. Define the relationships
o Examine each entity type to see its relationship to the others.
6. Describe the cardinality and optionality of the relationships
o Examine the constraints between participating entities.
7. Remove redundant relationships
o Examine the ER model for redundant relationships.
ER modeling is an iterative process, so draw several versions, refining each one until you are happy with it.
Note that there is no one right answer to the problem, but some solutions are better than others!
EXAMPLES:
2.5.1 Country Bus Company
A Country Bus Company owns a number of busses. Each bus is allocated to a particular route, although some routes may have several busses. Each route passes through a number of towns. One or more drivers are allocated to each stage of a route, which corresponds to a journey through some or all of the towns on a route. Some of the towns have a garage where busses are kept and each of the busses are identified by the registration number and can carry different numbers of passengers, since the vehicles vary in size and can be single or double-decked. Each route is identified by a route number and information is available on the average number of passengers carried per day for each route. Drivers have an employee number, name, address, and sometimes a telephone number.
 Entities
• Bus - Company owns busses and will hold information about them.
• Route - Buses travel on routes and will need described.
• Town - Buses pass through towns and need to know about them
• Driver - Company employs drivers, personnel will hold their data.
• Stage - Routes are made up of stages
• Garage - Garage houses buses, and need to know where they are.
 Relationships
• A bus is allocated to a route and a route may have several buses.
• Bus-route (m:1) is serviced by
• A route comprises of one or more stages.
• route-stage (1:m) comprises
• One or more drivers are allocated to each stage.
• driver-stage (m:1) is allocated
• A stage passes through some or all of the towns on a route.
• stage-town (m:n) passes-through
• A route passes through some or all of the towns
• route-town (m:n) passes-through
• Some of the towns have a garage
• garage-town (1:1) is situated
• A garage keeps buses and each bus has one `home' garage
• garage-bus (m:1) is garaged
 Draw E-R Diagram

Figure2.23: Bus Company
 Attributes
• Bus (reg-no,make,size,deck,no-pass)
• Route (route-no,avg-pass)
• Driver (emp-no,name,address,tel-no)
• Town (name)
• Stage (stage-no)
• Garage (name,address)
2.5.2 Problems with ER Models
There are several problems that may arise when designing a conceptual data model. These are known as connection traps.
There are two main types of connection traps:
1. fan traps
2. chasm traps
 Fan traps
A fan trap occurs when a model represents a relationship between entity types, but the pathway between certain entity occurrences is ambiguous. It occurs when 1:m relationships fan out from a single entity.

Figure 2.24: Fan Trap
A single site contains many departments and employs many staff. However, which staff work in a particular department?
The fan trap is resolved by restructuring the original ER model to represent the correct association.

Figure2.25: Resolved Fan Trap
 Chasm traps
A chasm trap occurs when a model suggests the existence of a relationship between entity types, but the pathway does not exist between certain entity occurrences.
It occurs where there is a relationship with partial participation, which forms part of the pathway between entities that are related.

Figure 2.26: Chasm Trap
• A single branch is allocated many staff who oversees the management of properties for rent. Not all staff oversees property and not all property is managed by a member of staff.
• What properties are available at a branch?
• The partial participation of Staff and Property in the oversees relation means that some properties cannot be associated with a branch office through a member of staff.
• We need to add the missing relationship which is called `has' between the Branch and the Property entities.
• You need to therefore be careful when you remove relationships which you consider to be redundant.

Figure2.27: Resolved Chasm Trap
2.6 Enhanced ER Models (EER)
The basic concepts of ER Modelling are not powerful enough for some complex applications... We require some additional semantic modeling concepts:
• Specialization
• Generalization
• Categorization
• Aggregation
First we need some new entity constructs.
• Superclass - an entity type that includes distinct subclasses that require to be represented in a data model.
• Subclass - an entity type that has a distinct role and is also a member of a superclass.

Figure2.28: Superclass and subclasses
Subclasses need not be mutually exclusive; a member of staff may be a manager and a sales person.
The purpose of introducing superclasses and subclasses is to avoid describing types of staff with possibly different attributes within a single entity. This could waste space and you might want to make some attributes mandatory for some types of staff but other staff would not need these attributes at all.
2.6.1 Specialization
This is the process of maximizing the differences between members of an entity by identifying their distinguishing characteristics.
• Staff(staff_no,name,address,dob)
• Manager(bonus)
• Secretary(wp_skills)
• Sales_personnel(sales_area, car_allowance)

Figure 2.29: Specialization in action
• Here we have shown that the managed relationship is only applicable to the Manager subclass, whereas the works_for relationship is applicable to all staff.
• It is possible to have subclasses of subclasses.
2.6.2 Generalization
Generalization is the process of minimizing the differences between entities by identifying common features.
This is the identification of a generalized superclass from the original subclasses. This is the process of identifying the common attributes and relationships.
For instance, taking:
car(regno,colour,make,model,numSeats)
motorbike(regno,colour,make,model,hasWindshield)
And forming:
vehicle(regno,colour,make,model,numSeats,hasWindshielf)
In this case vehicle has numSeats which would be NULL if the vehicle was a motorbike, and has Windshield which would be NULL if it was a car.
2.7 Mapping ER Models into Relations
2.7.1 What is a relation?
A relation is a table that holds the data we are interested in.
It is two-dimensional and has rows and columns.
Each entity type in the ER model is mapped into a relation.
• The Attributes become the Columns.
• The Individual entities become the Rows.

Figure 2.30: a relation
Relations can be represented textually as:
Tablename(primary key, attribute 1, attribute 2, ... , foreign key)
If matric_no was the primary key, and there were no foreign keys, then the table above could be represented as:
student(matric no, name, address, date_of_birth)
When referring to relations or tables, cardinality is considered to the number of rows in the relation or table, and arity is the number of columns in a table or attributes in a relation.

2.7.2 Foreign keys
A foreign key is an attribute (or group of attributes) that is the primary key to another relation.
Characteristics:
• Roughly, each foreign key represents a relationship between two entity types.
• They are added to relations as we go through the mapping process.
• They allow the relations to be linked together.
• A relation can have several foreign keys.
• It will generally have a foreign key from each table that it is related to.
• Foreign keys are usually shown in italics or with a wiggly underline.


2.7.3 Preparing to map the ER model
Before we start the actual mapping process we need to be certain that we have simplified the ER model as much as possible.
2.7.3.1 Mapping 1:1 relationships
Before tackling a 1:1 relationship, we need to know its optionality.
There are three possibilities the relationship can be:
1. mandatory at both ends
2. mandatory at one end and optional at the other
3. optional at both ends
2.7.3.2 Mandatory at both ends
If the relationship is mandatory at both ends it is often possible to subsume one entity type into the other.
• The choice of which entity type subsumes the other depends on which is the most important entity type (more attributes, better key, semantic nature of them).
• The result of this amalgamation is that all the attributes of the `swallowed up' entity become attributes of the more important entity.
• The key of the subsumed entity type becomes a normal attribute.
• If there are any attributes in common, the duplicates are removed.
• The primary key of the new combined entity is usually the same as that of the original more important entity type.
2.7.3.3 When not to combine
There are a few reasons why you might not combine a 1:1 mandatory relationship.
• The two entity types represent different entities in the `real world'.
• The entities participate in very different relationships with other entities.
• Efficiency considerations when fast responses are required or different patterns of updating occur to the two different entity types.
If not combined...
If the two entity types are kept separate then the association between them must be represented by a foreign key.
• The primary key of one entity type becomes the foreign key in the other.
• It does not matter which way around it is done but you should not have a foreign key in each entity.
Example
• Two entity types; staff and contract.
• Each member of staff must have one contract and each contract must have one member of staff associated with it.
• It is therefore a mandatory relation at both ends.

Figure2.31: 1:1 mandatory relationship
• These to entity types could be amalgamated into one.
Staff(emp_no, name, cont_no, start, end, position, salary)
• or kept apart and a foreign key used
Staff(emp_no, name, contract_no)
Contract(cont_no, start, end, position, salary)
• or
Staff(emp_no, name)
Contract(cont_no, start, end, position, salary, emp_no)

2.7.3.4 Mandatory <->Optional
The entity type of the optional end may be subsumed into the mandatory end as in the previous example.
It is better NOT to subsume the mandatory end into the optional end as this will create null entries.

Figure2.32: 1:1 with 1 optional end
If we add to the specification that each staff member may have at most one contract or no contact (For temporary) ,Thus making the relation optional at one end).
• Map the foreign key into Staff - the key is null for staff without a contract.
Staff(emp_no, name, contract_no)
Contract(cont_no, start, end, position, salary)
• Map the foreign key into Contract - emp_no is mandatory thus never null.
Staff(emp_no, name)
Contract(cont_no, start, end, position, salary, emp_no)
 Example
Consider this example:
• Staff “Gordon”, empno 10, contract no 11.
• Staff “Andrew”, empno 11, no contract.
• Contract 11, from 1st Jan 2001 to 10th Jan 2001, lecturer, on £2.00 a year.
Foreign key in Staff:
Contract Table:
Cont_no Start End Position Salary
11 1st Jan 2001 10th Jan 2001 Lecturer £2.00
Staff Table:
Empno Name Contract No
10 Gordon 11
11 Andrew NULL


However, Foreign key in Contract:
Contract Table:
Cont_no Start End Position Salary Empno
11 1st Jan 2001 10th Jan 2001 Lecturer £2.00 10
Staff Table:
Empno Name
10 Gordon
11 Andrew
As you can see, both ways store the same information, but the second way has no NULLs.

Mandatory <->Optional - Subsume? IF OPTIONAL AT ONE END.
The reasons for not subsuming are the same as before with the following additional reason.
• Very few of the entities from the mandatory end are involved in the relationship. This could cause a lot of wasted space with many blank or null entries.

Figure2.33: 1 optional end
• If only a few lecturers manage courses and Course is subsumed into Lecturer then there would be many null entries in the table.
Lecturer(lect_no, l_name, cno, c_name, type, yr_vetted, external)
• It would be better to keep them separate.
Lecturer(lect_no, l_name)
Course(cno, c_name, type, yr_vetted, external,lect_no)
Summary...
 So for 1:1 optional relationships, take the primary key from the `mandatory end' and add it to the `optional end' as a foreign key.
That is if one entity ends to the other entity with “Optional” relationship, then Take the Primary key of the 2nd entity and include as the “Foreign Key” in the 1st entity table.
So, given entity types A and B, where A <->B is a relationship where the A end it optional, the result would be:
A (primary key,attribute,...,foreign key to B)
B (primary key,attribute,...)
OPTIONAL AT BOTH ENDS...
Such examples cannot be amalgamated as you could not select a primary key. Instead, one foreign key is used as before.

Figure 2.34: 2 optional end
• Each staff member may lease up to one car
• Each car may be leased by at most one member of staff
• If these were combined together...
Staff_car(emp_no, name, reg_no, year, make, type, colour)
What would be the primary key?
• If emp_no is used then all the cars which are not being leased will not have a key.
• Similarly, if the reg_no is used, all the staff not leasing a car will not have a key.
• A compound key will not work either.
Thus it needs to combine the “Entities” in one Table.
2.7.3.5 Mapping 1:m relationships
To map 1:m relationships, the primary key on the `one side’ of the relationship is added to the `many side' as a foreign key.
For example, the 1:m relationship `course-student':

Figure2.35: Mapping 1:m relationships
• Assuming that the entity types have the following attributes:
Course(course_no, c_name)
Student(matric_no, st_name, dob)
• Then after mapping, the following relations are produced:
Course(course_no, c_name)
Student(matric_no, st_name, dob, course_no)
• If an entity type participates in several 1:m relationships, then you apply the rule to each relationship, and add foreign keys as appropriate.
2.7.3.6 Mapping n:m relationships
If you have some m:n relationships in your ER model then these are mapped in the following manner.
• A new relation is produced which contains the primary keys from both sides of the relationship
• These primary keys form a composite primary key.

Figure2.36: Mapping n:m relationships
• Thus
Student(matric_no, st_name, dob)
Module(module_no, m_name, level, credits)
• becomes
Student(matric_no, st_name, dob)
Module(module_no, m_name, level, credits)
Studies(matric_no,module_no)
This is equivalent to:

Figure 2.37: After Mapping a n:m relationship
Student(matric_no,st_name,dob)
Module(module_no,m_name,level,credits)
Study()





Summary
• 1-1 relationships
Depending on the optionality of the relationship, the entities are either combined or the primary key of one entity type is placed as a foreign key in the other relation.
• 1-m relationships
That is, If the relationship between Entities is one to Many then The primary key from the `one side' is placed as a foreign key in the `many side'.
• m-n relationships
A new relation is created with only field names which are primary keys from each of the available entity forming a composite key.
2.8 Advanced ER Mapping
2.8.1 Mapping parallel relationships
Parallel relationships occur when there are two or more relationships between two entity types (e.g. employees own and service cars).
For Example:

Figure 2.38: Parallel Relationships
• In order to distinguish between the two roles we can give the foreign keys different names.
• Each relationship is mapped according to the rules, and we end up with two foreign keys added to the Vehicle table.
• So we add the employee_no as the owner_no in order to represent the `owns' relationship.
• We then add the employee_no as the serviced_by attribute in order to represent the `services' relationship.
• Before mapping
Employee(employee_no,...)
Vehicle(registration_no,...)
• After mapping
Employee(employee_no,...)
Vehicle(registration_no,owner_no,serviced_by,...)
2.8.2 Mapping 1:m in unary relationships

Figure 2.39: Mapping recursive relationships
• Employees manage employees
• Each employee has an employee_no with is the primary key
• We represent the manager’s relationship by adding a manager_no as a foreign key.
• This is in fact the employee_no of the manager.
• It is given a different name to clearly convey what it represents, and to ensure that all the entity type's attributes have unique names, as to do otherwise would be invalid.
• After mapping
Employee(employee_no,manager_no, name,...)
• So in general, for unary 1:n relationships, the foreign key is the primary key of the same table, but is given a different name.
• Note that the relationship is optional in both directions because not all staff can be managers, and the top manager is not managed by anybody else.
2.8.3 Mapping superclasses and subclasses
There are three ways of implementing superclasses and subclasses and it depends on the application which will be the most suitable.
Only the first method is a true reflection of the superclasses and subclasses and if either of the other methods is preferential then the model should not have subclasses.
1. One relation for the superclass and one relation for each subclass.
2. One relation for each subclass.
3. One relation for the superclass.

Example

Figure 2.40: Superclass/Subclass mapping example
Staff(staff_no,name,address,dob)
Manager(bonus)
Secretary(wp_skills)
Sales_personnel(sales_area, car_allowance)
One relation for the superclass and one relation for each subclass:
Staff(staff_no,name,address,dob)
Manager(staff_no,bonus)
Secretary(staff_no,wp_skills)
Sales_personnel(staff_no,sales_area, car_allowance)
The primary key of the superclass is mapped into each subclass and becomes the subclasses primary key. This represents most closely the EER model. However it can cause efficiency problems as there needs to be a lot of joins if the additional information is often needed for all staff.
One relation for each subclass:
Manager(staff_no,name,address,dob,bonus)
Secretary(staff_no,name,address,dob,wp_skills)
Sales_personnel(staff_no,name,address,dob,sales_area, car_allowance)
All attributes are mapped into each subclass. It is equivalent to having three separate entity types and no superclass.
It is useful if there are no overlapping entities and there are no relationships between the superclass and other entity types. It is poor if the subclasses are not disjoint as there is data duplication in each relation which can cause problems with consistency.
One relation for the superclass:
Staff(staff_no,name,address,dob, bonus, wp_skills, sales_area, car_allowance)
This represents a single entity type with no subclasses.
This is no good if the subclasses are not disjoint or if there are relationships between the subclasses and the other entities.
In addition, there will be many null fields if the subclasses do not overlap a lot. However, it avoids any joins to get additional information about each member of staff.

No comments:

Post a Comment