Introduction to Databases
Quick Nav
Create E-R (entity-relationship) Diagrams and
written descriptions
Working with Weak Entities
Creating E-R diagrams and written descriptions for superclasses and subclasses
Computing as a utility
The idea that computers can be turned on and off as needed allows companies to align technology expenditure with requirements at any time is computing as a utility, with other utilities being water, power, etc, the supply is effectively unlimited so the only factor is demand. Side thought; UTILITY MARKETS, NOT PLASTIC
Foundations
- What is a database?
- Why would you need a database?
- What would a database be used for.
Database: Noun
A database is a large collection of related data, personal is a few MB of data, medium is GB to TB of data,
large is petabytes.
A database is šļø:
- Persistent: stored on a stable medium
- Shared: Multiple (simultaneous) users
- Interrelated: forms a bigger picture
- Models real world information
Database example šļø :
Airline data
- Airplane: model, weight, capacity...
- Airport: name, location, code
- Customer: name, address, flights taken...
- Ticket: customer, flight, price
Airline database users: - Customers: lookup flights, buy tickets
- Analyst: View purchased tickets and flight data
Database Management Systems (DBMS)
A database management system (DBMS) is a collection of software components that lets you
- Create (define, populate)
- Maintain (update, modify, enforce constants in)
- Control access to (secure, allow concurrency in)
a database
Why not use the flat file method? (spreadsheets)
Redundancy! š
Data management is less flexible for creating and managing data which can lead to duplicate entries.
Program data dependency
- Inconsistency in the data, is "John smith" the same as "j. smith" or "jon smih"?
- Duplication of data - Why are we storing "jon Smith" every time he takes a classes?
- Update inconsistency- You need to update all of your data every time there is a change
No data integrity
- No support for sharing data
- No recovery in case of crash
- No prevention of data entry errors (eg, two people editing the same file)
- No security mechanism
How does a database fix this?
- A database uses a single access repository of data accessed by multiple users
- Supports different views and sharing of concurrent access
- The costs are higher for design, implementation, and maintenance, but the result is improved safety and reliability
Database questions (queries)
- I want to know..
- I want to change... (do you have permission?)
Database benefits
- Program-data independence
- Controlled data redundancy
- Controlled access to data
- Support for multiple user interfaces
- More efficient query processing
- Faster application development
File processing systems
Hierarchical model
- Data arranged in "parent-child" relationships
- A company has many departments, a department has many employees, an employee has many dependents, etc.
- A structure can be too restrictive: for example, a student may take many courses, but each course has many students ("many many")
Network Model
- Can represent more general relationships among data, but it has drawbacks.
- Applications still need to navigate relationships explicitly š©
- Very difficult to maintain or reorganize š©
Relational Model
- First model to separate the logical structure of the database from its physical structure
- Data is divided into two-dimensional tables called relations
- Relationships between tables are given by shared keys
- Rules exist for dividing data among table
- A standardized query language (SQL) exists
later models
- Object and object-relational databases: add support for more structured types of data (used in multimedia and geographic information systems)
- Not as common as the relational model
Users' views of a database
- End users: Various interfaces to work with particular "views" of a database
- Developers: Tools to write, test, and debug applications
- Administrators: Tools to control data access, track system performance, maintain database
- Designers: Full access to DBMS to create and modify schema and stored data
Three-Schema Architecture
- Internal schema: How data is stored š
- Physical schema: arrangement of data on desk
- Conceptual schema: Abstract description of the different types of data āļø
- Entities, relationships, constraints
- External schema: A user's view of the data š
- Could be different for different users
Data independence / abstraction
- A lower-level schema can be changed without affecting a higher-level schema
- Can make changes to the entities and relationships between them without invalidating existing access queries
- Can change the organization of the data or the arrangement of the data on the disk without affecting the conceptual schema
Schema vs Snapshot
- Database schema describes the data
- The structure of the database
- Types
- Constraints or connections
- Does not tell us what the data is
- Snapshots are the particular data stored in the database at any given point in time
- Putting the data in is populatingthe database
The Making of a Database
- Requirement analysis
- Document business rules, determine entities, relationships, and constraints needed
- Conceptual Design
- Generate conceptual schema
- Logical Design
- Generate local schema
Entity-Relationship (E-R) Model
- A way to represent different kinds of data within the organization and how they interact
- Entities (nouns, people, places, things, NOT individual items but categories of
items)
- Entity names (capitalized) should be single nouns and unique within the database
- Attributes (adjectives, things that describe properties of an entity)
- Typically lower case, should be singular nouns (or noun phrases)
- A key uniquely identifies each entity instance, but multiple entities can share the same attributes. You can not have duplicate attributes on the same entity, eg, a student appearing twice on the same course for the same quarter, where course is the entity.
- Relationships
- Entities (nouns, people, places, things, NOT individual items but categories of
items)
- Developed from business rules
- Survey the domain experts
- Represent the information accordingly
Notional Conventions
- Unfortunately, no single standard
- Use the notation from the textbook
- Will accept submissions using alternate notation if this is something you already know from elsewhere
- Entities RECTANGLE
- STUDENT, INSTRUCTOR, COURSE
- Atributes OVAL
- Name, address, phone_number, code, fee, length
- Put an additional OVAL around an attribute to indicate that it can have multiple values
- Relationships DIAMOND with a line
- enrolled, teaches
Other Indicators:
- Double box = Weak entity
- Underline = Unique Identifier
- Double Outline / bubble = Multiple values
- Derived attribute = Dotted oval (not solid)
- Optional = No special indicator, explain it in documentation.
- Single line -- Optional participation
- Double line == Mandatory participation
An ENTITY description includes...
- What it represents
- What attributes it has
- What causes it to be created/deleted
- What instances to include/exclude
An Attribute description includes...
- What it represents
- Maximum and minimum number of values recorded (eg. if you have multiple students there will be multiple duplicate attributes on an object)
- Any dependencies on other attributes
- Add a sentence to each entity identifying its key)
A Relationship description includes
- Name of relationship
- Describe both directions of relationship, with names of entities involved preceding and and following the name of the relationship
- For each instance of one entity, specify how many instances of the other entity it must (minimum) and may (maximum) relate to, and vice versa
- Explain any additional restrictions
- Participation keywords: Can, may, may not, might, will, must, exactly
- Cardinality keywords include: One, one or more, more than one, many, several
Types of Attributes
- Required vs Optional: Do]es a value need to be specified?
- Composite vs Simple: Can the attribute be divided into meaningful parts?
- Multi-valued vs Single-Valued: Can the attribute have more than one value at once?
- Derived vs Stored: Can the value of the attribute be computed from others?
Carnality Constraints
Carnality constraints determine how many relationship instances each entity instance may participate in. Cardinals are numbers which refer to things, eg, 3 chairs, two cars, 4 cars, Ordinal numbers are relative, eg, 3 greater, 4 fewer, etc.
Three types of relationships:
- One-to-one (1:1) ("lock and key")
- Many-to-one (M:1) ("bills and wallet") (also includes one-to-many)
- Many-to-many (M:N) ("library": books and borrowers)
Constraints
- Primary key
- Foreign key (full dependency, directly references a primary key from ancient table)
- Unique
- Not null (must have a value)
- Check
Normalization
- Full dependency - Column is fully dependent on primary key column
- Partial dependency - Column is partially dependent on primary key (for composite primary keys)
- Transitive dependency - Column is dependent on another key dependent on the primary key
Dependencies are reduced as you apply normal forms, normalization itself is the process of minimizing redundancy & dependencies by separating data elements into related tables
Record / Instances
Entity instances are rows in a database while attributes are columns
Participation Constraints
Participation determines if every entity instance must participate in some relationship instance
- Partial participation: Some instances of this entity may not relate to any instance of any other
- Full participation ever instance of this entity must relate to at least one instance of the other
The Four Participation Types
- Partial-Partial, some instances of A participate in B, and some instances of B participate in A
- Full-Partial, all instances of A must participate in B
- Partial-Full, all instances of B must participate in A
- Full-Full, all instances of A participate in B and all instances of B participate in A
Record type: Weak Entities
A weak entity is an instance that depends on other instances for their existence. The associated
entity is the owner entity, and the relationship between a weak entity and its owner is called an identifying
relationship.
A weak entity only has a partial key which uniquely identifies the instances associated with a
particular instance of the owner entity.
Example
A social media user account is an entity, which can create messages, which are weak entities. The messages cannot exist without an associated account.
E-R Diagram representation of weak entities
- A weak entity is represented by a double rectangle
- Identifying relationship is represented by a double diamond
- Partial key is represented by a dashed underline
Written representation of Weak Entities
- Entity: "No WEAK_ENTITY will be included without a corresponding OWNER_ENTITY.
- Attributes: "No set of attributes is sufficient to uniquely identify each WEAK_ENTITY. Each WEAK_ENTITY is identified by it's partial key together with the key of the related OWNER_ENTITY."
Relationships with attributes and intersecting entities
- Relationships can sometimes have attributes, in this case you can sometimes replace the relationship and attribute with a weak entity derived from the intersection of the other two strong entities
- New relationships are identifying relationships
- To represent it in writing, say "A NEW_ENTITY represents a combination of a ENTITY1 and a ENTITY2", then describe its attributes, and state that it is a weak entity, thus "There is no attribute that uniquely identifies a NEW_ENTITY; instances of NEW_ENTITY will be identified by the key1 and key2 of the related instances of ENTITY1 and ENTITY2"
Relationship Types
- Unary (recursive) Relationship
- Entity relationship to self
- Relates two instances of the same entity, eg, employees being supervised by other employees, or persons being married to other persons.
- Appears just once in a diagram, with the relationship connected to both ends and directions written for both directions of the relationship.
- Ternary Relationship
- Three entities connected by one relationship
- Eg, an interpreter meets with a buyer and seller, where each combination has a different MeetingTime and MeetingRoom
- In a written representation, describe all directions of the relationship and describe attributes
- Create a new intersecting entity, it will be weak with three identifying relationships.
- N-ary Relationship
- Four+ entities connected by one relationship
Composite vs Simple attributes
Composite attributes contain derived or sub attributes while simple attributes stand on their own.
2024-09-18 Super classes and Sub classes
Super classes contain only attributes which are shared between all sub classes, while sub classes contain attributes which are purely self contained and not shared.
Specialization vs Generalization
- Specialization is identifying the superclass first and then determining the subclasses (top down)
- Total specialization: Every instance of the superclass MUST be an instance of some subclass
- Partial specialization: Instances of the superclass might not be instances of any subclass
- Generalization is identifying the sub classes first and then determining the super class (bottom up)
Specialization constraints
- Disjointedness Constraint (circle contains D): An instance can only belong to one subclass, like a Chipotle divided container where the rice and beans belong to their own compartments that don't touch
- Overlapping constraint (circle contains O): An instance can belong to more than one subclass, like having a bowl where rice and beans can both be part of the same section of the container
- Lines are drawn from superclass to circle, and from circle to each subclass
- Line to circle is single for partial specialization, double for total specialization
- Circle contains ādā for disjointness constraint, 'oā for overlapping constraint
- Lines from circle to subclasses contain subset symbols (with open end toward circle)
Partial, disjoint: Every superclass may also be one of the following: list
subclasses
Partial, overlapping: Every superclass may also be one or more of the following: list
subclasses
Total Disjoint: Every superclass MUST also be exactly one of the following: list
subclasses
Total, overlapping: Every superclass must also be one or more of the following: list
subclasses
The relational model
A relational database is made up of two-dimensional tables called relations, each representing
some entity recorded in the database.
Each row of a relation is a tuple, or record, representing one instance of that entity.
Each column of a relation is an attribute for which each tuple has a value assigned, and a domain
which from which its values are taken. A domain consists of atomic values- no multi-valued or divisible attributes
are allowed.
Lastly, missing or unknown values are indicated by NULL.
Domains (data types)
- Numerical values, which include integer and floating point values
- Character strings, fixed or variable length
- Dates and times
Constraints can also be enforced, such as ranges, upper/lower bounds, allowable or forbidden values, no NULLs, etc.
Properties of a Relation
- Each relation (entity instance) has a unique name (key) in the database
- Each attribute has a unique name
- Each entry of a relation contains a single value from its attribute's domain
- The order of the records does not matter
- The order of the attributes does not matter
- No two records in a relation are identical
DBMS Constraints
- Domain constraints: All values of an attribute must come from the same domain
- Entity integrity: A table CANNOT contain a null value for a primary key
- Referential Integrity: Every foreign key value must match some value of the primary key in the table that it references
E-R mapping
Is the process of turning an E-R Model into a set of relations in the relational model.
- Map strong entities
- Map weak entities
- Map binary many-to-many relationships
- Map binary one-to-one relationships
- Map binary one-to-many/many-to-one relationships
- Map unary relationships
- Map ternary (and higher) relationships
- Map generalizations/specializations
Strong entities become a table, the key of an entity becomes the primary key of the table, atomic attributes (not derived and has constraints so it's not null) become columns. Multi-valued attributes become new tables with columns for different values and it's own primary key, the original primary key from the parent becomes the foreign key in the new table.
Classes of SQL Commands
- Data Definition Language (DDL)
- Create schemas, tables, constrains, views
- Data Manipulation Language (DML)
- Modify and update tables, retrieve infomation
- Data Control Language (DCL)
- Grant and revoke access to parts of the database
Most users only have access to the DML - We will use both the DDL and DML.
The Schema of a Relation
- The Schema of a relation consists of the name of the relation followed by a list of its attributes. It's basically the header row in a PostgreSQL database on it's own to communicate it easier
- A relation for the schema is a set of tuples where each tuple contains an element from the associated domain of each of its attributes
SQL Domains
- String Domains include
char(n)
, which is exactly N characters long,varchar(n)
andvarchar2(n)
, which hold up to n bytes which caps at 2000 and 4000 bytes respectively.
Generally, the form is SELECT, WHERE, GROUP BY , HAVING, ORDER BY (in order)
Others include FROM, DISTINCT
INSERT INTO
It's like for adding things and stuff but more fast I think... Make sure columns match 1:1!
INSERT INTO Customers (CustomerName, ContactName, Address, City, PostalCode, Country)
VALUES ('Cardinal', 'Tom B. Erichsen', 'Skagen 21', 'Stavanger', '4006', 'Norway');
Equijoins
These simply sort data where columns match between multiple tables, usually a primary and foreign key.
select *
from mydb.STUDENT, mydb.RESULT
where mydb.STUDENT.SNO = mydb.RESULT.SNO
and mydb.RESULT.GRADE > 90
Outer joins
Two tables are selected and all rows from the specified table are returned, however if there is a match, data from the other table is returned as well.
select bc.lastname, bo.orderid, bo.orderdate
from Book_customer bc LEFT OUTER JOIN Book_order bo
using (customerid);
Join On
Join ON just sandwiches results from two tables together. Not much to it. Not always suppoted.
select bc.lastname, bo.orderid, bo.orderdate
from Book_customer bc JOIN Book_order bo
ON bc.customerid = bo.customerid
AND bc.lastname = 'GIRARD';
Natural Join
Natural join needs identical columns which have the same data type and simply merges return data.
select *
from mydb.STUDENT
NATURAL JOIN mydb.RESULT
where GRADE > 93
Self Join
Pro tip: Don't lick frozen poles in winter. This joins a table to itself, which is helpful for when you need to find sets of data stored in the same table.
select customer.lastname, referredby.lastname
from customer.book_customer, referredby.book_customer
where refferedby.referred = customer.customerid
Update
Update a record! Change names, details, etc.
UPDATE customer.
SET customer.book_customer = 'Joe'
where customer.id = 24245
SQL Wildcards
_
= One character
%
= Any number of characters (eg, B% could be B0, Boo, Boston, or Boooooo)
SQL Math
Count, sum, min, max, and avg all work in SQL!