Skip to content

5. SQL Foundations

Switch to Zen Mode

SQL (Structured Query Language) is a language for working with relational databases.

It was developed from IBM’s earlier language SEQUEL (Structured English Query Language).

The aim was to create a database language that is relatively easy to read and write, even for people who are not professional programmers, and that does not require mathematical symbols such as ∀ or ∃ (hence, relations are represented as tables, which are mathematically equivalent to sets).

Over time, SQL became an international standard. The first standard was published by ANSI (American National Standards Institute) in 1986. Since then, the language has been extended several times. New versions add more functions, but most database systems still rely on the core of the older SQL-92 standard and then add some newer features and vendor-specific extensions.

Modern SQL also offers support for JSON (JavaScript Object Notation). JSON is a compact, text-based data format used to exchange data between applications, especially in web and mobile development. It is easy to read, independent of any specific programming language, and supported by many tools and libraries. Newer standards also introduce advanced features such as pattern matching on table rows and support for multidimensional data and graph queries, but these are more relevant for special use cases and are not yet fully available in all database systems.

SQL is an interactive language. It allows ad-hoc queries, meaning commands can be entered and executed immediately. At the same time, SQL is used inside applications to communicate with a database, both for reading and for writing data. Several SQL commands can be stored in scripts to be executed automatically.

The language is usually divided into four main groups of commands:

  • DDL (Data Definition Language)

    Commands for creating and changing the structure of databases, tables, and indexes (for example: creating tables).

  • DQL (Data Query Language)

    Commands for querying and reading data from tables (for example: selecting data).

  • DML (Data Manipulation Language)

    Commands for inserting, changing, and deleting data records.

  • DCL (Data Control Language)

    Commands for creating users and assigning access rights.

In summary, SQL is a standardized, widely used language that provides clear and structured commands to define data structures, retrieve data, modify data, and control access to data in relational database systems.

In practice, there is not just one “SQL”, but several SQL dialects, depending on the database system. All of them follow the general SQL standard, but each system adds its own functions, data types, and small differences in syntax. This means that basic queries often look very similar, but advanced features may behave differently.

Because of these differences, SQL code is not always 100% portable from one database system to another. Simple queries (for example basic SELECT statements) usually work in many systems without change, while complex queries, stored procedures, or special data types often need to be adjusted for each SQL dialect.

Basic workflow when working with databases

Section titled “Basic workflow when working with databases”

When creating a database, three basic steps are followed:

  1. Create the database. The database acts as a container that holds all tables and other database objects.

  2. Create the required tables. At this stage only the structure is defined: which columns a table has, which data types are stored in these columns, and which column (or columns) forms the primary key.

  3. Fill the tables with data. Tables are the only database objects in which data is physically stored. Once the data is inserted, it can be evaluated, changed, or deleted.

A database can contain several tables, and any number of databases may be created in order to separate data by topic or project. If different users work with the same database system, it is important that appropriate access rights are assigned so that each user only has the permissions that are intended for them.

A database management system (DBMS) can host multiple databases. Before working with a database, it must first be created. When a database is created, it serves as a container for all the tables and other database objects that will be defined later.

Finally, if a database is no longer needed, it can be deleted using the DROP DATABASE command. This command permanently removes the database and all its contents, so it should be used with caution.

The very first SQL command you encounter is the CREATE DATABASE command. This command creates a new database in the database management system. The syntax may vary slightly between different SQL dialects, but the basic structure is similar.

The syntax for creating a database in PostgreSQL is as follows:

CREATE DATABASE someDatabaseName
[WITH OWNER userName
TEMPLATE template
ENCODING encodingSpecification
TABLESPACE tablespaceName];

Where:

  • someDatabaseName: The name of the database to be created.
  • userName: The name of the database owner.
  • template: The template database from which to create the new database (default is template1).
  • encodingSpecification: The character encoding for the database (for example, UTF8).
  • tablespaceName: The tablespace where the database files will be stored.

At times you do not know which databases already exist in your database management system. You need some means to list all existing databases.

There is no direct SQL command to list databases in PostgreSQL. Instead, you can use the following command in the psql command-line interface:

\l

A mere backslash \ followed by the letter “l” lists all databases along with their owners, encodings, and access privileges.

Before you can create tables and insert data, you need to select the database you want to work with. This is done using the USE command in MariaDB/MySQL. In PostgreSQL, you connect to the desired database when starting the psql command-line interface.

This is necessary because a database management system can contain multiple databases, and you need to specify which one you want to work with at any given time.

In PostgreSQL, you connect to a specific database when starting the psql command-line interface. You can do this by specifying the database name as an argument:

\c databaseName

Where:

  • databaseName: The name of the database you want to connect to.

Finally, if a database is no longer needed, it can be deleted using the DROP DATABASE command. This command permanently removes the database and all its contents, so it should be used with caution

To destroy a database in PostgreSQL, you can use the following command:

DROP DATABASE databaseName;

Where:

  • databaseName: The name of the database to be deleted.

Tables (relations) are the very first class citizens objects in a database. It is these tables in which data is stored. Every access to the data takes place via the tables. For every query or evaluation, the name of the table must be specified. The database serves as a container for the logically related tables. The table name must be unique within one database, but the same name may be used in several databases of a DBMS.

A table consists of individual fields (data fields, attributes, columns). The structure or schema of the table is defined by the names and data types of these fields. Related data are entered into one row of the table. They form a record or a tuple.

To be able to create a table, the execution right for the CREATE TABLE statement is required. The administrator grants this right. The user who creates the table then becomes its owner.

To create a table, the CREATE TABLE command is used. The syntax may vary slightly between different SQL dialects, but the basic structure is similar. There is no need to specify the database again, as the table is created in the currently selected database.

The syntax for creating a table in PostgreSQL and MariaDB/MySQL is very much alike. However, there are differences in the supported data types and constraints.

To create a table in PostgreSQL, you can use the following command:

CREATE TABLE tableName (
column1 dataType1 [constraints],
column2 dataType2 [constraints],
...
columnN dataTypeN [constraints]
);

Where:

  • tableName: The name of the table to be created.
  • columnX: The name of a column in the table.
  • dataTypeX: The data type of the column (for example, INTEGER, VARCHAR(255), DATE).
  • constraints: Optional constraints for the column (for example, PRIMARY KEY, NOT NULL).

Derived from an ER model, the table structure is created using the CREATE TABLE command. Each entity in the ER model corresponds to a table in the database. The attributes of the entity become the columns of the table, and the relationships between entities are represented through foreign keys.

When defining a table, it is important to specify the data type for each column. The data type determines what kind of data can be stored in that column (for example, integers, text, dates, etc.) and how much space is allocated for it.

Datatypes can vary between different database systems, but there are some common ones that are widely used. They can be broadly categorized into several groups.

Numeric Data Types for integer values:

Data TypeSizeDescriptionExample Values
INT or INTEGER4 BytesStandard integer type42, -7, 0
SMALLINT2 BytesSmaller range integer type32000, -32000
BIGINT8 BytesLarger range integer type9223372036854775807

Numeric Data Types for floating point values:

Data TypeSizeDescriptionExample Values
FLOAT or REAL4 BytesStandard floating point type42.0, -7.5, 0.0
DOUBLE or DOUBLE PRECISION8 BytesDouble precision floating point type32000.123, -32000.456
DECIMAL (p,s) or NUMERIC (p,s)VariableExact numeric type with fixed precision12345.67, -12345.67

Date and Time Data Types:

Data TypeSizeDescriptionExample Values
DATE4 BytesA single date01.01.2026
TIME8 BytesTime value12:34:56
TIMESTAMP8 BytesA value combining a date and a time01.01.2026 12:34:56

Time values are usually stored in 24-hour format: HH:MM:SS (hours:minutes:seconds). Date values are often stored in the format YYYY-MM-DD (year-month-day) or DD-MM-YYYY, depending on the database system and locale settings.

Text Data Types:

Data TypeSizeDescriptionExample Values
CHAR(n)n BytesFixed-length character string’Hello’
VARCHAR(n)Up to n BytesVariable-length character string’Hello, World!’
TEXTVariableLarge text data’This is a long text…’

Binary Data Types:

Data TypeSizeDescriptionExample Values
BINARY(n)n BytesFixed-length binary data0x4D5A
VARBINARY(n)Up to n BytesVariable-length binary data0x4D5A3B
BYTEAVariableLarge binary data (PostgresSQL)(images, files, etc.)
BLOBVariableLarge binary data (MariaDB / MySQL)(images, files, etc.)

Boolean Data Type:

Data TypeSizeDescriptionExample Values
BOOLEAN1 ByteRepresents true/falseTRUE, FALSE

In SQL, DECIMAL and NUMERIC are fixed-point number types used when exact values are needed - especially for money, quantities, or measurements where rounding errors are not acceptable.

Basically, both DECIMAL and NUMERIC store numbers with:

  • a fixed total number of digits (precision)
  • a fixed number of digits after the decimal point (scale)

General syntax:

DECIMAL(precision, scale)
NUMERIC(precision, scale)
  • precision = total number of digits (before + after decimal)
  • scale = number of digits after the decimal point

If scale is omitted, many systems default to 0 (integer-like) or sometimes to the implementation’s default (often 0 or same as precision). It depends on the DBMS. In the SQL standard, DECIMAL and NUMERIC are intended to be equivalent. In practice, most database systems treat them as synonyms.

Example:

DECIMAL(7, 2)
  • precision = 7 → total digits = 7
  • scale = 2 → 2 digits after the decimal point

So this allows numbers from:

  • -99999.99 to +99999.99

because:

  • 5 digits before decimal
  • 2 digits after decimal
  • → total = 7 digits

Valid values for DECIMAL(7, 2):

  • 0.00
  • 12.34
  • 99999.99
  • -123.45

Invalid (too many digits or too many decimals):

  • 123456.78 → 6 digits before decimal (needs precision ≥ 8)
  • 1.234 → 3 digits after decimal (scale = 2, so would be rounded or rejected)

NUMERIC(p,s) must have exact precision as specified. DECIMAL(p, s) may internally use slightly more precision, but logically they behave the same from the user’s point of view.

Floating-point types (FLOAT, REAL, DOUBLE PRECISION, etc.) are approximate:

  • They are stored in binary format and cannot represent all decimal fractions exactly.
  • This leads to rounding artifacts like 0.1 + 0.2 becoming 0.30000000000000004.

DECIMAL/NUMERIC are exact decimal types:

  • Calculations with money (prices, taxes, balances) must not be affected by binary rounding errors.
  • Always use DECIMAL/NUMERIC for currency, precise financial calculations, and counts/quantities with fixed decimal places.

Example problem with float:

-- Might not store exactly 0.10
price FLOAT = 0.10
-- May produce strange results because of binary rounding

Better:

price DECIMAL(10, 2) = 0.10 -- Exactly two decimal digits

Commonly used for money and financial data, i.e. monetary amounts like prices, salaries, taxes, etc.

How many bytes are used to store DECIMAL values depends on the DBMS and the specified precision/scale. Generally, higher precision requires more storage space.

Example storage requirements for DECIMAL/NUMERIC in Transact-SQL (SQL Server) decimal and numeric (Transact-SQL):

PrecisionStorage bytes
1 - 95
10-199
20-2813
29-3817

In addition to the basic data types, SQL also provides modifiers that can be used to further specify the characteristics of a column. Common modifiers include:

  • UNSIGNED: Specifies that a numeric column can only store non-negative values (i.e., zero and positive numbers). This is often used for fields like IDs or counts where negative values do not make sense.
  • ZEROFILL: (MariaDB / MySQL specific) Pads numeric values with leading zeros when displayed. This is often used for fields like account numbers or codes where a fixed length is desired.
  • MariaDB / MySQL: AUTO_INCREMENT, automatically generates a unique value for each new row inserted into the table. This is often used for primary key columns.
  • PostgreSQL: SERIAL, automatically generates a unique value for each new row inserted into the table. This is often used for primary key columns.
  • DEFAULT: Specifies a default value for a column if no value is provided during an insert operation.

When defining a table, you can also specify constraints for the columns. Constraints are rules that restrict the type of data that can be stored in a column. Common constraints include:

  • PRIMARY KEY: Uniquely identifies each record in the table: this marks a column (or a combination of columns) as the primary key of the table.
  • FOREIGN KEY, REFERENCES: Ensures referential integrity between two tables: this marks a column (or a combination of columns) as a foreign key that references the primary key of another table.
  • NOT NULL: Ensures that a column cannot have a NULL value. Every row must have a value for this column.
  • UNIQUE: Ensures that all values in a column are unique, i.e. no duplicate values are allowed. Typically used for fields like email addresses or usernames.
  • CHECK: Ensures that all values in a column satisfy a specific condition.

You can define constraints either at the column level (directly after the data type) or at the table level (after all columns have been defined).

Row level constraints:

CREATE TABLE blog_post (
post_id SERIAL PRIMARY KEY,
author_id INTEGER NOT NULL REFERENCES app_user(user_id) ON DELETE RESTRICT ON UPDATE CASCADE,
slug VARCHAR(120) NOT NULL UNIQUE,
title TEXT NOT NULL,
body TEXT NOT NULL,
status VARCHAR(20) NOT NULL CHECK (status IN ('DRAFT', 'PUBLISHED')),
published_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

Table level constraints:

CREATE TABLE blog_post (
post_id SERIAL,
author_id INTEGER NOT NULL,
slug VARCHAR(120) NOT NULL,
title TEXT NOT NULL,
body TEXT NOT NULL,
status VARCHAR(20) NOT NULL,
published_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
CONSTRAINT pk_blog_post PRIMARY KEY (post_id),
CONSTRAINT uq_blog_post_slug UNIQUE (slug),
CONSTRAINT ck_blog_post_status CHECK (status IN ('DRAFT', 'PUBLISHED')),
CONSTRAINT fk_blog_post_author
FOREIGN KEY (author_id)
REFERENCES app_user(user_id)
ON DELETE RESTRICT
ON UPDATE CASCADE
);

Users table (identity + text + booleans + timestamps)

CREATE TABLE app_user (
user_id SERIAL PRIMARY KEY,
email VARCHAR(320) NOT NULL UNIQUE,
display_name TEXT NOT NULL,
is_active BOOLEAN NOT NULL DEFAULT TRUE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

Products table (identity + text + numeric + timestamps)

CREATE TABLE product (
product_id SERIAL PRIMARY KEY,
serial_number VARCHAR(64) NOT NULL UNIQUE,
name TEXT NOT NULL,
price NUMERIC(12,2) NOT NULL CHECK (price >= 0),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

Blog posts table (foreign key + text + timestamps)

CREATE TABLE blog_post (
post_id SERIAL PRIMARY KEY,
author_id INTEGER NOT NULL REFERENCES app_user(user_id),
title TEXT NOT NULL,
body TEXT NOT NULL,
published_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

A “table budget” is a pragmatic capacity estimate: row width (bytes per row) x expected row count gives you a approximation of how much storage a table will consume. This is useful early in schema design because it forces you to think in units: “If each row is about 350 bytes and we expect 50 million rows, that table alone is on the order of 17.5 GB before overhead.” Even if the estimate is rough, it helps you catch design choices that will scale poorly.

This helps you identify potential issues early on, such as choosing data types that are too large for the expected data volume. For example, using BIGINT for a column that will only ever store small integers wastes space. Instead, using SMALLINT or INT would be more efficient.

Here’s a simple approach to estimating a table budget:

  1. Estimate Row Width: Sum the sizes of all columns in bytes. Consider data types and any overhead (e.g., variable-length types like VARCHAR have additional bytes for length).

    Fixed-width types are straightforward (typical heap sizes):

    • BOOLEAN = 1 byte (plus alignment effects)
    • SMALLINT = 2 bytes
    • INTEGER = 4 bytes
    • BIGINT = 8 bytes
    • TIMESTAMP / TIMESTAMPTZ = 8 bytes
    • UUID = 16 bytes

    Variable-width types (TEXT, VARCHAR, BYTEA, JSONB, NUMERIC, etc.) are:

    • average payload bytes (e.g., string length in bytes under UTF-8), plus
    • a variable length header (commonly 4 bytes, sometimes 1 byte for very short values), plus
    • possible overhead for alignment/padding.

    Example: email VARCHAR(320) with average email length of 50 bytes: 50 bytes + 4 bytes (header) = 54 bytes (actually more due to padding/alignment but the length may also only be encoded in 1 single byte too.). But these are estimations; actual storage may vary due to implementation details.

  2. Estimate Row Count: Based on expected usage, estimate how many rows the table will hold.

  3. Calculate Table Size: Multiply row width by row count to get an estimated table size.

Changing the structure of an existing table is done using the ALTER TABLE command. This command allows you to add, modify, or delete columns, as well as change constraints and other properties of the table.

The alteration of a table can be done in several ways, depending on the specific change you want to make. Common operations include:

  • Adding a new column.
  • Modifying an existing column (e.g., changing its data type or constraints).
  • Deleting a column.

With respect to constraints, you can also

  • Add a new constraint (e.g., adding a primary key or foreign key).
  • Modify an existing constraint (e.g., changing the behavior of a foreign key on delete).
  • Delete a constraint.
  1. Add a column
ALTER TABLE employees
ADD COLUMN phone varchar(30);
  1. Rename a column
ALTER TABLE employees
RENAME COLUMN phone TO phone_number;
  1. Drop a column
ALTER TABLE employees
DROP COLUMN phone_number;
  1. Add a constraint (examples)
ALTER TABLE employees
ADD CONSTRAINT employees_salary_nonnegative
CHECK (salary >= 0);
ALTER TABLE employees
ADD CONSTRAINT employees_email_unique
UNIQUE (email);
  1. Drop a constraint
ALTER TABLE employees
DROP CONSTRAINT employees_email_unique;

Deletion of a table is done using the DROP TABLE command. This command permanently removes the table and all its data from the database.

Example:

  1. A simple drop:
DROP TABLE posts;
  1. Drop with cascade (also removes dependent objects, e.g., foreign keys in other tables) if the table exists:
DROP TABLE IF EXISTS users CASCADE;

Learning Outcomes: What you should be able to do after this chapter

Section titled “Learning Outcomes: What you should be able to do after this chapter”

After completing this chapter, students should be able to:

  • Name: name the four SQL command groups (DDL, DQL, DML, DCL) and common data types.
  • Describe: describe the purpose and history of SQL as well as the concept of SQL dialects.
  • Apply: create, alter and drop databases and tables using CREATE, ALTER and DROP.
  • Analyze: select an appropriate data type and suitable constraints for a given column (e.g., DECIMAL vs. FLOAT).
  • Evaluate: assess the impact of data type and constraint decisions on storage requirements and data quality.