11. Data Integrity

Data Integrity — Ensuring Data Quality in SQL

Motivation: Imagine an online shop where a customer order references a customer ID that does not exist, or a product price is accidentally stored as a negative number. Both situations lead to inconsistent, unreliable data — and ultimately to wrong decisions or broken applications.

Data integrity is the set of rules and mechanisms that prevent exactly these problems. It ensures that data stored in a database is correct, consistent, and meaningful at all times.

These three words — correct, consistent, and meaningful — are worth defining precisely, because they are often used loosely in everyday language.

Correct means that every stored value is a valid instance of what the column is supposed to represent. A date of birth column contains an actual calendar date, not an arbitrary string. A price column contains a non-negative number, not NULL or a letter. Correctness is enforced by data types, NOT NULL, and CHECK constraints.
Consistent means that the data does not contradict itself — neither within a single row nor across related tables. An order row that references a customer who does not exist is inconsistent. An end_date that is earlier than the start_date in the same row is inconsistent. Consistency is enforced by FOREIGN KEY constraints and multi-column CHECK constraints.
Meaningful means that the data makes sense in the real-world context it is meant to model. A value can be correct (technically a valid integer) and consistent (no contradictions) but still be meaningless — for example, a quantity of zero in an order line, or a discount of 150%. Meaningful data satisfies the business rules of the domain. These rules are enforced by user-defined CHECK constraints.

Why Data Integrity Matters

Without integrity rules, databases are fragile:

A foreign key column could reference a row that was deleted.
A salary column could hold a negative value.
Two users could accidentally share the same email address.
A mandatory field could be left empty.

These problems are often hard to detect after the fact and expensive to fix. By defining integrity constraints at the database level, errors are caught immediately at the moment of insertion or update — regardless of which application or user caused them.

The key benefits of enforcing data integrity are:

Benefit	Explanation
Reliability	Data is consistent and trustworthy across the whole system.
Automation	The database enforces rules automatically, reducing the burden on application code.
Early error detection	Invalid data is rejected immediately, before it can cause harm.
Interoperability	Multiple applications using the same database all benefit from the same rules.

Categories of Data Integrity

Data integrity is traditionally divided into four categories. Each addresses a different dimension of data quality.

Entity Integrity

Every table must be able to uniquely identify each of its rows. No two rows may be indistinguishable from one another, and no row may be “anonymous”.

Rule: Every table must have a primary key. The primary key must be unique and must never be NULL.

PostgreSQL
MariaDB / MySQL

-- The id column uniquely identifies each customer.
CREATE TABLE customers (
  id   SERIAL PRIMARY KEY,
  name TEXT   NOT NULL
);

-- The id column uniquely identifies each customer.
CREATE TABLE customers (
  id   INT  AUTO_INCREMENT PRIMARY KEY,
  name TEXT NOT NULL
);

Without entity integrity, it would be impossible to reliably address a specific row — for example when updating or deleting a record.

Referential Integrity

When one table references rows in another table (via a foreign key), those referenced rows must actually exist. A reference to a non-existent row is called a dangling reference and leads to inconsistent data.

Rule: Every value in a foreign key column must either match an existing primary key value in the referenced table, or be NULL (if the relationship is optional).

PostgreSQL
MariaDB / MySQL

-- An order must reference an existing customer.
CREATE TABLE orders (
  id          SERIAL  PRIMARY KEY,
  customer_id INTEGER NOT NULL REFERENCES customers(id),
  placed_at   DATE    NOT NULL
);

-- An order must reference an existing customer.
CREATE TABLE orders (
  id          INT  AUTO_INCREMENT PRIMARY KEY,
  customer_id INT  NOT NULL,
  placed_at   DATE NOT NULL,
  FOREIGN KEY (customer_id) REFERENCES customers(id)
);

If someone tries to insert an order with a customer_id that does not exist in customers, the database will reject the operation immediately.

Domain Integrity

Each column has a defined domain: the set of all values that are valid for that column. Domain integrity ensures that only values within this domain can be stored.

The domain of a column is determined by:

its data type (e.g., INTEGER, DATE, VARCHAR(100))
additional constraints (e.g., NOT NULL, CHECK, DEFAULT)

Examples:

A birth_date column of type DATE will reject the text "hello".
A price column with CHECK (price >= 0) will reject -5.00.
A status column with CHECK (status IN ('ACTIVE', 'INACTIVE')) rejects any other string.

User-Defined Integrity

Beyond the three structural categories above, databases also support user-defined rules that reflect specific business requirements. These are constraints that cannot be derived from the schema structure alone but must be explicitly specified.

Examples:

A discount percentage must be between 0 and 100.
A start_date must not lie after an end_date.
A quantity in an order line must be greater than zero.

In SQL, user-defined integrity is primarily expressed using CHECK constraints and, in more advanced scenarios, triggers or stored procedures.

Integrity Constraints in SQL

SQL provides a set of constraints that are defined directly in the table schema. The database engine evaluates them automatically on every INSERT, UPDATE, and DELETE.

`NOT NULL`

A column declared NOT NULL must always contain a value. An attempt to insert or update a row with NULL in that column is rejected.

PostgreSQL
MariaDB / MySQL

CREATE TABLE employees (
  id         SERIAL PRIMARY KEY,
  first_name TEXT   NOT NULL,
  last_name  TEXT   NOT NULL,
  email      TEXT   -- optional (NULL is allowed)
);

CREATE TABLE employees (
  id         INT  AUTO_INCREMENT PRIMARY KEY,
  first_name TEXT NOT NULL,
  last_name  TEXT NOT NULL,
  email      TEXT -- optional (NULL is allowed)
);

Use NOT NULL for every column where an empty value makes no sense — for example, a person’s name or a product code.

`UNIQUE`

A column (or combination of columns) declared UNIQUE must contain no duplicate values across all rows. Unlike a primary key, a UNIQUE column may contain NULL — and, in most database systems, multiple NULL values are allowed because NULL is never considered equal to anything, including another NULL.

PostgreSQL
MariaDB / MySQL

CREATE TABLE users (
  id       SERIAL PRIMARY KEY,
  username TEXT   NOT NULL UNIQUE,
  email    TEXT   NOT NULL UNIQUE
);

-- MariaDB/MySQL requires VARCHAR (not TEXT) for UNIQUE constraints,
-- because index keys on TEXT columns need an explicit prefix length.
CREATE TABLE users (
  id       INT          AUTO_INCREMENT PRIMARY KEY,
  username VARCHAR(255) NOT NULL UNIQUE,
  email    VARCHAR(255) NOT NULL UNIQUE
);

You can also declare a UNIQUE constraint across multiple columns at the table level. This means the combination of those columns must be unique, while individual columns may repeat.

CREATE TABLE enrollments (
  student_id INTEGER NOT NULL,
  course_id  INTEGER NOT NULL,
  UNIQUE (student_id, course_id)
);

Here, the same student can appear in multiple courses, and the same course can have multiple students — but the same student cannot be enrolled in the same course twice.

`PRIMARY KEY`

The PRIMARY KEY constraint is the central pillar of entity integrity. It combines two rules in one:

The column (or columns) must be unique.
The column (or columns) must never be NULL.

A table can have only one primary key, but that key can span multiple columns (a composite primary key).

PostgreSQL
MariaDB / MySQL

-- Single-column primary key
CREATE TABLE products (
  product_id SERIAL  PRIMARY KEY,
  name       TEXT    NOT NULL
);

-- Composite primary key (common in junction tables)
CREATE TABLE course_registrations (
  student_id INTEGER NOT NULL,
  course_id  INTEGER NOT NULL,
  PRIMARY KEY (student_id, course_id)
);

-- Single-column primary key
CREATE TABLE products (
  product_id INT  AUTO_INCREMENT PRIMARY KEY,
  name       TEXT NOT NULL
);

-- Composite primary key (common in junction tables)
CREATE TABLE course_registrations (
  student_id INT NOT NULL,
  course_id  INT NOT NULL,
  PRIMARY KEY (student_id, course_id)
);

`FOREIGN KEY` and Referential Actions

A FOREIGN KEY constraint links a column in one table to the primary key of another table, enforcing referential integrity.

CREATE TABLE orders (
  id          SERIAL  PRIMARY KEY,
  customer_id INTEGER NOT NULL REFERENCES customers(id),
  total       NUMERIC(10, 2) NOT NULL CHECK (total >= 0)
);

What happens when the referenced row changes?

A question arises immediately: what should happen to the child rows when a referenced row in the parent table is deleted or updated?

SQL allows you to specify this behavior using referential actions:

Action	Effect
`RESTRICT`	The operation on the parent row is rejected if child rows exist. This is the default in most systems.
`CASCADE`	The operation is propagated to all child rows automatically (e.g., delete or update them).
`SET NULL`	The foreign key columns in child rows are set to `NULL`.
`SET DEFAULT`	The foreign key columns in child rows are set to their default value.
`NO ACTION`	Similar to `RESTRICT`, but the check may be deferred until the end of a transaction.

PostgreSQL
MariaDB / MySQL

CREATE TABLE orders (
  id          SERIAL  PRIMARY KEY,
  customer_id INTEGER NOT NULL
    REFERENCES customers(id)
    ON DELETE RESTRICT
    ON UPDATE CASCADE,
  placed_at   DATE    NOT NULL
);

CREATE TABLE orders (
  id          INT AUTO_INCREMENT PRIMARY KEY,
  customer_id INT NOT NULL,
  placed_at   DATE NOT NULL,
  CONSTRAINT fk_orders_customer
    FOREIGN KEY (customer_id)
    REFERENCES customers(id)
    ON DELETE RESTRICT
    ON UPDATE CASCADE
);

Reading the example above:

ON DELETE RESTRICT — if someone tries to delete a customer who still has orders, the database refuses the deletion.
ON UPDATE CASCADE — if a customer’s id changes (rare, but possible), all matching customer_id values in orders are updated automatically.

`CHECK`

A CHECK constraint defines a logical condition that every row must satisfy. The condition can involve any column in the same table and any SQL expression that returns a boolean result.

PostgreSQL
MariaDB / MySQL

CREATE TABLE products (
  id       SERIAL         PRIMARY KEY,
  name     TEXT           NOT NULL,
  price    NUMERIC(10, 2) NOT NULL CHECK (price >= 0),
  discount NUMERIC(5, 2)           CHECK (discount BETWEEN 0 AND 100)
);

CREATE TABLE products (
  id       INT            AUTO_INCREMENT PRIMARY KEY,
  name     TEXT           NOT NULL,
  price    DECIMAL(10, 2) NOT NULL CHECK (price >= 0),
  discount DECIMAL(5, 2)           CHECK (discount BETWEEN 0 AND 100)
);

You can also write table-level CHECK constraints that compare multiple columns with each other:

PostgreSQL
MariaDB / MySQL

CREATE TABLE projects (
  id         SERIAL PRIMARY KEY,
  start_date DATE   NOT NULL,
  end_date   DATE,
  CONSTRAINT chk_dates CHECK (end_date IS NULL OR end_date >= start_date)
);

CREATE TABLE projects (
  id         INT  AUTO_INCREMENT PRIMARY KEY,
  start_date DATE NOT NULL,
  end_date   DATE,
  CONSTRAINT chk_dates CHECK (end_date IS NULL OR end_date >= start_date)
);

Here, end_date may be left empty (the project is still running), but if it is set it must not be earlier than start_date.

`DEFAULT`

A DEFAULT value is not a constraint in the strictest sense — it does not reject invalid data. Instead, it provides a fallback value that is used automatically when no explicit value is supplied for a column during an INSERT.

PostgreSQL
MariaDB / MySQL

CREATE TABLE articles (
  id         SERIAL    PRIMARY KEY,
  title      TEXT      NOT NULL,
  published  BOOLEAN   NOT NULL DEFAULT FALSE,
  created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE articles (
  id         INT      AUTO_INCREMENT PRIMARY KEY,
  title      TEXT     NOT NULL,
  published  BOOLEAN  NOT NULL DEFAULT FALSE,
  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
);

If you insert a new article without specifying published or created_at, the database fills in FALSE and the current timestamp automatically.

DEFAULT is particularly useful for:

Timestamps — recording when a row was created.
Boolean flags — setting a sensible initial state (e.g., “inactive”, “unpublished”).
Status columns — starting every new record in a defined state.

Naming Constraints

The database generates a name automatically for every constraint that has no explicit name. These auto-generated names are often cryptic and hard to work with when you receive an error message or need to drop a constraint later.

It is good practice to name every constraint explicitly using the CONSTRAINT keyword:

PostgreSQL
MariaDB / MySQL

CREATE TABLE order_items (
  id         SERIAL         PRIMARY KEY,
  order_id   INTEGER        NOT NULL,
  product_id INTEGER        NOT NULL,
  quantity   INTEGER        NOT NULL,
  unit_price NUMERIC(10, 2) NOT NULL,

  CONSTRAINT fk_order_items_order
    FOREIGN KEY (order_id) REFERENCES orders(id) ON DELETE CASCADE,

  CONSTRAINT fk_order_items_product
    FOREIGN KEY (product_id) REFERENCES products(id) ON DELETE RESTRICT,

  CONSTRAINT chk_order_items_quantity
    CHECK (quantity > 0),

  CONSTRAINT chk_order_items_unit_price
    CHECK (unit_price >= 0)
);

CREATE TABLE order_items (
  id         INT            AUTO_INCREMENT PRIMARY KEY,
  order_id   INT            NOT NULL,
  product_id INT            NOT NULL,
  quantity   INT            NOT NULL,
  unit_price DECIMAL(10, 2) NOT NULL,

  CONSTRAINT fk_order_items_order
    FOREIGN KEY (order_id) REFERENCES orders(id) ON DELETE CASCADE,

  CONSTRAINT fk_order_items_product
    FOREIGN KEY (product_id) REFERENCES products(id) ON DELETE RESTRICT,

  CONSTRAINT chk_order_items_quantity
    CHECK (quantity > 0),

  CONSTRAINT chk_order_items_unit_price
    CHECK (unit_price >= 0)
);

A widely used naming convention is:

pk_<table> — primary key
uq_<table>_<column> — unique constraint
fk_<table>_<referenced_table> — foreign key
chk_<table>_<column> — check constraint

Constraint Violations

When an operation violates a constraint, the database aborts the statement and returns an error. The data is not changed. Here are the most common violations and what triggers them:

Constraint	Violation Example
`NOT NULL`	Inserting a row without a value for a required column.
`UNIQUE`	Inserting a row whose email already exists in the table.
`PRIMARY KEY`	Inserting a row with an `id` that already exists.
`FOREIGN KEY`	Inserting an order for a `customer_id` that does not exist.
`CHECK`	Inserting a product with a negative price.

-- Attempting to insert an order for customer_id = 999 (does not exist):
INSERT INTO orders (customer_id, placed_at) VALUES (999, CURRENT_DATE);
-- Result: ERROR — foreign key constraint violation

-- Attempting to insert a product with a negative price:
INSERT INTO products (name, price) VALUES ('Widget', -5.00);
-- Result: ERROR — check constraint violation

Error messages differ between database systems, but they always tell you which constraint was violated and on which table. Reading these messages carefully is an important skill when working with SQL.

Adding and Removing Constraints

Constraints can also be added to or removed from an existing table using ALTER TABLE.

PostgreSQL
MariaDB / MySQL

-- Add a CHECK constraint to an existing table
ALTER TABLE products
ADD CONSTRAINT chk_products_price CHECK (price >= 0);

-- Add a UNIQUE constraint
ALTER TABLE users
ADD CONSTRAINT uq_users_email UNIQUE (email);

-- Add a FOREIGN KEY constraint
ALTER TABLE orders
ADD CONSTRAINT fk_orders_customer
  FOREIGN KEY (customer_id) REFERENCES customers(id);

-- Remove a constraint by name
ALTER TABLE products
DROP CONSTRAINT chk_products_price;

-- Add a CHECK constraint
ALTER TABLE products
ADD CONSTRAINT chk_products_price CHECK (price >= 0);

-- Add a UNIQUE constraint
ALTER TABLE users
ADD CONSTRAINT uq_users_email UNIQUE (email);

-- Add a FOREIGN KEY constraint
ALTER TABLE orders
ADD CONSTRAINT fk_orders_customer
  FOREIGN KEY (customer_id) REFERENCES customers(id);

-- Remove a CHECK constraint
ALTER TABLE products
DROP CONSTRAINT chk_products_price;

-- Remove a FOREIGN KEY constraint
ALTER TABLE orders
DROP FOREIGN KEY fk_orders_customer;

-- Remove a UNIQUE constraint (treated as an index in MariaDB/MySQL)
ALTER TABLE users
DROP INDEX uq_users_email;

Summary

Data integrity is the foundation of a trustworthy database. It ensures that only correct, consistent, and meaningful data can enter the system — automatically, at the database level, for every application that uses it.

Constraint	Category	What it enforces
`PRIMARY KEY`	Entity	Unique, non-null identifier for every row
`NOT NULL`	Domain	Column must always contain a value
`UNIQUE`	Entity / Domain	No duplicate values in a column
`FOREIGN KEY`	Referential	References must point to existing rows
`CHECK`	Domain / User-defined	Values must satisfy a logical condition
`DEFAULT`	Domain	Fallback value when no value is supplied

Learning Outcomes: What you should be able to do after this chapter

After completing this chapter, students should be able to:

Name: name the categories of data integrity (entity, referential and domain integrity) and common constraints.
Explain: explain why data integrity is essential for data quality.
Apply: define, name, add and remove integrity constraints (PRIMARY KEY, FOREIGN KEY, NOT NULL, UNIQUE, CHECK).
Analyze: analyze a constraint violation based on an error message and determine its cause.
Evaluate: assess which constraints secure the integrity of a given data model.