SQL Data Models
Definition of SQL Data Models
SQL data models are an essential aspect of database management and play a critical role in organizing and structuring data to ensure efficiency, integrity, and accessibility.
At its core, a SQL data model defines the logical design and structure of a database, including tables, relationships between tables, constraints, and rules for accessing and manipulating data. It acts as a blueprint for how data will be stored, organized, and retrieved within a database system.
There are two main types of SQL data models: the conceptual model and the physical model. The conceptual model focuses on the high-level design of the database, identifying the entities, attributes, and relationships between data elements. It provides a big-picture view of the database's structure and helps stakeholders understand the overall data architecture.
On the other hand, the physical model delves into the technical implementation of the database, specifying the storage structures, indexing, and optimization techniques. It translates the conceptual model into a concrete database schema, detailing how data will be stored, indexed, and accessed at the physical level.
Additionally, SQL data models adhere to the principles of normalization, which is the process of organizing data to minimize redundancy and dependency. Normalization ensures data integrity and reduces the risk of anomalies or inconsistencies within the database.
Essential terms
An entity is an abstraction of some object: a student, a car, or a building. A relationship defines how entities are interconnected—each student is assigned to one of the groups for studying. Such a relationship in Relational DB is called a many-to-one relationship.
A relation consists of attributes and tuples. An attribute is a column in a table, some property of which has a relation. Concerning a student, these are the attributes: last name, first name, date of birth, and student number. A tuple is a row in a table, a collection of specific attribute values.
For a student relation with the attributes student number, last name, first name, and date of birth, the tuple can be 88881111, Jackson, John, 01.01.2000. A relation with its attributes can be briefly denoted as follows: Student(student number, last name, first name, date of birth).
Restrictions
There are some restrictions imposed on relations in a relational data model:
- In a database, it is not possible to have two relations with identical names;
- When designing a database, make sure that each attribute within a relation has a unique name;
- One value per cell.;
- In relations, the order of the rows and columns does not matter.
Entity integrity
Integrity is the conformity of a data model with certain rules. There are two kinds of integrity: entity integrity and referential integrity.
The primary key (PK) concept maintains entity integrity by ensuring a table has no identical tuples (rows).
A primary key is a field or a set of fields used to uniquely identify each record in the table, guaranteeing that no two records are identical. It's vital as it allows easy and efficient access to specific records and helps maintain the integrity and consistency of data in the table by preventing duplicate values.
Relation between the two tables, Student and Group, will be:
Adding the following row about the student: 88881188, Madison, Peter, 01.02.2002, 132 will cause a referential integrity violation as there are no groups where the group number of 132.
Referential integrity
Referential integrity occurs when users have two connected relations. The relationship of the relations is provided using a foreign key (FK). A foreign key is an attribute of the relation (subordinate/dependent) that points to the primary key of another relation (main/master). Referential integrity requires every foreign key in a main table to reference an existing tuple in a subordinate table.
Student and Group have two relations. The Group has an attribute number, the primary key, and the Student relation has the same attribute, indicating that the student belongs to a particular group. In the Student relation, the group number is a foreign key. If a tuple in the Student table references a group tuple that does not exist in the Group table, referential integrity is violated.
FK creates a many-to-one (one-to-many) relationship.
Relationships
There are relationships of one-to-one, one-to-many, and many-to-many types.
- A one-to-one relationship (1-1) means that one tuple of the first relation can be associated with no more than one tuple of another relation. The opposite is also true. For example, passport-citizen (one person can have only one passport, and a passport refers to one person).
- A one-to-many relationship (1-M) means that a tuple of the first relation can be associated with one or more tuples of the second relation, but the opposite is false. Any tuple of the second relation can be associated with only one tuple of the first relation. For example, student-group (there can be many students in a group, but each student belongs to only one group).
- A many-to-many (M-M) relationship means that any tuple of the first relation can be associated with one or more tuples of another relation. The opposite is also true. For example, student-discipline (each student studies many disciplines, and each discipline can be studied by many students).
Note that many-to-many relationships do not exist in RDM. It is split into two one-to-many relationships, so an intermediate associative relation appears, with two foreign keys as attributes that indicate the primary keys of the original relations.
Any of the relationships described above can be identifying or non-identifying. An identifying relationship identifies if a subordinate relationship's foreign key goes into (or becomes part of) its primary key. For example, the store's order number (PK of order relation) is generated based on the customer number and the order date. The customer number is the PK for the customer relation. The order number is a composite PK of two attributes: customer number (also a foreign key) and order date.
In a non-identifying relationship, the foreign key relationship does NOT go to the primary key, for example, the Student-Group relationship discussed above.
Pros and cons
The popularity of RDM can be attributed to a couple of reasons. Firstly, it comes with a query language called SQL, making it easy to interact with data. Secondly, it is based on set theory, a mathematical apparatus ensuring reliability. However, DBMS based on other data models could be more scalable and performant with larger data volumes, albeit simpler and less general. Additionally, the relational data model differs from the object data model, requiring a translational layer that increases code complexity.