4. DBMS Market

Market Overview of Database Management Systems

In the modern software landscape, database management systems form a kind of invisible infrastructure, quietly storing and organizing the data that almost every application depends on. There is not just one type of database, however, but a whole ecosystem of different systems, each optimized for particular kinds of data and workloads.

Fig. 7.1: Database Management System Market Size 2024-2030

Source: [9]

The figure shows the global Database Management System (DBMS) market by region from 2018 to 2030. Overall market size grows steadily from a bit under 50 billion USD in 2018 to around 100 billion USD in 2023.

Regionally, North America (dark purple) remains the largest segment throughout the period, followed by Europe and the Asia-Pacific region. Asia-Pacific’s share visibly expands over time, contributing a large part of the growth in later years, while Europe also increases but a bit more moderately. Latin America and the Middle East & Africa stay relatively small segments, but they, too, grow gradually as the total market expands.

Year	Total DBMS market	Relational share	Approx. non-relational (mostly NoSQL) share
2022	USD 78.5 B	>74%	<26%
2023	USD 100.79 B	>62%	<38%
2024	USD 114.99 B	≈61.8%	≈38.2%

Table 4.1: Estimated market share of relational vs. NoSQL DBMS (source: [9])
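
To make the figures in Table 4.1 more tangible, the following minimal sketch derives absolute segment sizes for 2024 and the implied average annual growth between 2022 and 2024. It only uses the numbers from the table; treating the approximate 61.8% share as exact and computing a compound annual growth rate are simplifying assumptions made here, not figures from the source.

```python
# Values copied from Table 4.1 (USD billions); the share is the approximate 2024 value.
market_2022 = 78.5
market_2024 = 114.99
relational_share_2024 = 0.618

# Absolute size of each segment in 2024, assuming the share is exact.
relational_2024 = market_2024 * relational_share_2024
non_relational_2024 = market_2024 - relational_2024

# Compound annual growth rate (CAGR) of the total market over two years.
years = 2024 - 2022
cagr = (market_2024 / market_2022) ** (1 / years) - 1

print(f"Relational 2024:     ~USD {relational_2024:.2f} B")
print(f"Non-relational 2024: ~USD {non_relational_2024:.2f} B")
print(f"Total market CAGR 2022-2024: ~{cagr:.1%}")
```

Because the table gives the 2022 and 2023 shares only as bounds (for example ">74%"), the same calculation for those years would yield bounds rather than point estimates.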

Relational database systems: the classical core

Relational database systems represent the classic, well-established core of the database world. Data is organized in tables with rows and columns; each table describes one kind of entity, for example customers, orders or products. Relationships between these tables are expressed through primary keys and foreign keys.

Fig. 7.2: Relational Database Management

Among the best known are PostgreSQL, MySQL and MariaDB on the open-source side, and Oracle Database, Microsoft SQL Server and IBM DB2 on the proprietary side. All of these systems are accessed through SQL, the Structured Query Language, which is the standard language for managing and querying relational databases.
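
As a small illustration of tables, keys and SQL, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The customers and orders tables, their columns and the sample rows are hypothetical; amounts are stored as integer cents to avoid floating-point rounding.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE customers (
    id   INTEGER PRIMARY KEY,                 -- primary key
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),  -- foreign key
    total_cents INTEGER NOT NULL
);
INSERT INTO customers (id, name) VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO orders (id, customer_id, total_cents)
VALUES (10, 1, 2598), (11, 1, 599), (12, 2, 1999);
""")

# Join the two tables over the key relationship and aggregate per customer.
rows = conn.execute("""
SELECT c.name, COUNT(o.id) AS order_count, SUM(o.total_cents) AS revenue_cents
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.id, c.name
ORDER BY revenue_cents DESC
""").fetchall()

for name, order_count, revenue_cents in rows:
    print(name, order_count, revenue_cents / 100)
```

The same statements would run with minor dialect differences on the server-based systems named above; SQLite is used here only because it needs no installation.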

A central characteristic of relational systems is their support for ACID transactions (Atomicity, Consistency, Isolation, Durability). This means that a sequence of changes, such as transferring money between bank accounts or booking a seat on a flight, is either applied completely or not at all, keeping the database in a consistent state even in the presence of failures. For many classical business applications (ERP systems, inventory management, finance, HR systems and many web backends), this combination of strong guarantees, mature tooling and rich query capabilities makes relational databases the default choice.
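
The all-or-nothing behavior can be sketched with Python's built-in sqlite3 module. The accounts table, the transfer helper and the balances are hypothetical; the CHECK constraint stands in for the business rule that an account must not be overdrawn.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, sender, receiver, amount):
    """Move `amount` between two accounts; True if committed, False if rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit was undone as well.
        return False


ok = transfer(conn, "alice", "bob", 30)       # both updates are applied
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice: nothing is applied
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

In the failing call the credit to the receiver has already been executed when the debit is rejected; the rollback is what removes it again, which is the atomicity guarantee in action.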

However, this model also has limitations. Among these are:

With the rigid schema it is more difficult to store very flexible or rapidly changing data structures.
Horizontal scaling across hundreds or thousands of machines is possible, but often complex.

Yet the rigid schema and strong consistency guarantees are the very reasons why relational databases remain dominant in many enterprise applications, where data integrity and complex querying are paramount.

PROs of relational database management systems (RDBMS)

Strong data integrity & consistency (ACID transactions)
- RDBMSs are built around ACID guarantees (Atomicity, Consistency, Isolation, Durability).
- This makes them ideal when it’s critical that data is always correct and consistent (e.g. banking, inventory, grades).
Powerful, standardized query language (SQL)
- SQL is expressive, declarative and standardized across many systems (PostgreSQL, MySQL, SQL Server, etc.).
- Complex joins, aggregations, subqueries, window functions, etc. are first-class citizens, whereas they are often harder or more verbose to express in NoSQL systems.
Clear, enforced schema (data model discipline)
- Tables, columns, types, and constraints (NOT NULL, UNIQUE, FOREIGN KEY, CHECK, …) enforce structure at the database level.
- This reduces inconsistent or “dirty” data and shifts many validation concerns away from the application code.
Mature tooling & ecosystem
- RDBMSs have decades of optimization: query planners, indexes, profiling tools, backup/restore, replication, migration tools, ORMs, etc.
- Easier to find DBAs (Database Administrators), documentation, and community knowledge.
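
The point about schema enforcement can be illustrated with a short sketch, again using Python's sqlite3 module. The students and grades tables, the 1 to 5 grading scale and the sample statements are assumptions chosen for illustration; each statement violates one of the constraint types listed above and is rejected by the database itself rather than by application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE students (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE
);
CREATE TABLE grades (
    student_id INTEGER NOT NULL REFERENCES students(id),
    grade      INTEGER NOT NULL CHECK (grade BETWEEN 1 AND 5)
);
INSERT INTO students (id, email) VALUES (1, 'a@example.com');
""")

# Each statement breaks exactly one constraint.
bad_statements = {
    "NOT NULL":    "INSERT INTO students (id, email) VALUES (2, NULL)",
    "UNIQUE":      "INSERT INTO students (id, email) VALUES (3, 'a@example.com')",
    "FOREIGN KEY": "INSERT INTO grades (student_id, grade) VALUES (99, 2)",
    "CHECK":       "INSERT INTO grades (student_id, grade) VALUES (1, 9)",
}

rejected = []
for constraint, sql in bad_statements.items():
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError as exc:
        rejected.append(constraint)
        print(f"{constraint}: rejected ({exc})")
conn.rollback()

student_count = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]
```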

NoSQL as an umbrella for alternative models

From around the late 2000s, the term “NoSQL” gained prominence. Strictly speaking, it does not mean “no SQL”, but rather “Not only SQL”. The idea is that, alongside the relational approach, there are other models that might be better suited for certain scenarios, especially large-scale web applications, big data analytics or very flexible, rapidly evolving data.

Under this NoSQL umbrella, several families of databases are grouped. A non-exhaustive list includes:

Document databases
Key-value stores
Graph databases and
Time-series databases
… and others.

These systems often prioritize horizontal scalability, flexible schemas or specialized access patterns over strict relational consistency. For the sake of scalability and performance, many NoSQL systems relax some of the ACID properties, embracing eventual consistency or other consistency models.

Document-oriented databases: JSON as the natural format

Document-oriented databases treat data as documents, usually in JSON or a similar format. Instead of splitting information across many normalized tables, as relational databases do, a single document describes an entire entity, potentially including nested structures. For example, an order document could contain customer data, the list of products, shipping information and status updates in one nested structure.

Fig. 7.3: MongoDB Logo

A sample order document in a document database might look like this:

{
  _id: ObjectId("675f4ef49b09f4a0e8a4b001"),
  orderNumber: "ORD-1234",
  createdAt: ISODate("2025-12-03T09:15:23Z"),
  status: "pending",

  customer: {
    customerId: ObjectId("675f4ef49b09f4a0e8b2a111"),
    name: "Andrea Hofer",
    email: "[email protected]"
  },

  shippingAddress: {
    street: "Tschinowitscher Straße 12",
    city: "Villach",
    postalCode: "9500",
    country: "AT"
  },

  items: [
    {
      productId: ObjectId("675f4ef49b09f4a0e8c30001"),
      name: "Wireless Mouse",
      quantity: 1,
      price: 19.99,
      currency: "EUR"
    },
    {
      productId: ObjectId("675f4ef49b09f4a0e8c30002"),
      name: "Mouse Pad",
      quantity: 1,
      price: 5.99,
      currency: "EUR"
    }
  ],

  totals: {
    subtotal: 25.98,
    currency: "EUR"
  }
}

MongoDB and CouchDB are representatives of this category. In such systems, different documents in the same collection can have slightly different fields; the schema is flexible or even completely dynamic. This flexibility makes it easier to evolve an application’s data model without complex migrations.
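
The idea of a flexible schema can be sketched in plain Python, with dictionaries standing in for documents and a list standing in for a collection. The products data and the find and get_path helpers are hypothetical and only mimic simple equality filters on dotted field paths; they are not the API of MongoDB or CouchDB.

```python
# Two documents in the same hypothetical collection with different sets of fields.
products = [
    {"_id": 1, "name": "Wireless Mouse", "price": 19.99,
     "specs": {"dpi": 1600, "wireless": True}},
    {"_id": 2, "name": "Mouse Pad", "price": 5.99, "color": "black"},
]


def get_path(doc, dotted_path, default=None):
    """Follow a dotted path such as 'specs.dpi' through nested dictionaries."""
    current = doc
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def find(collection, query):
    """Return the documents whose fields equal every value in `query`."""
    return [
        doc for doc in collection
        if all(get_path(doc, path) == wanted for path, wanted in query.items())
    ]


wireless = find(products, {"specs.wireless": True})
black = find(products, {"color": "black"})
print([doc["name"] for doc in wireless], [doc["name"] for doc in black])
```

Documents that lack a queried field simply do not match; no schema change is needed to add the color field to one document only.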

Fig. 7.4: CouchDB Logo

Querying is powerful, but instead of SQL, these systems use a document-specific query language or an API that works with JSON-style filters and aggregation pipelines. Document databases are attractive when data naturally appears as JSON, for example in many web and mobile applications, or when the structure changes frequently, such as in content management systems, product catalogs or logging. The following sample query uses MongoDB’s aggregation framework to find the top 5 customers by revenue from electronics orders in 2024. It assumes a more normalized layout than the sample document above (orders carry orderDate, customerId and items with a unitPrice, while product categories and customer names live in separate products and customers collections):

db.orders.aggregate([
  // Orders in 2024
  {
    $match: {
      orderDate: {
        $gte: ISODate("2024-01-01T00:00:00Z"),
        $lt:  ISODate("2025-01-01T00:00:00Z")
      }
    }
  },

  // One doc per order item
  { $unwind: "$items" },

  // Join products (to filter by category)
  {
    $lookup: {
      from: "products",
      localField: "items.productId",
      foreignField: "_id",
      as: "product"
    }
  },
  { $unwind: "$product" },

  // Only Electronics
  { $match: { "product.category": "Electronics" } },

  // Group by customer: revenue + quantity
  {
    $group: {
      _id: "$customerId",
      totalRevenue: {
        $sum: { $multiply: ["$items.quantity", "$items.unitPrice"] }
      },
      totalQuantity: { $sum: "$items.quantity" }
    }
  },

  // Join customers for name + country
  {
    $lookup: {
      from: "customers",
      localField: "_id",
      foreignField: "_id",
      as: "customer"
    }
  },
  { $unwind: "$customer" },

  // Final shape
  {
    $project: {
      _id: 0,
      customerId: "$_id",
      customerName: "$customer.name",
      country: "$customer.country",
      totalRevenue: 1,
      totalQuantity: 1
    }
  },

  // Top 5
  { $sort: { totalRevenue: -1 } },
  { $limit: 5 }
]);
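
To make the stages of the pipeline above easier to follow, here is a rough plain-Python equivalent that runs on a few in-memory records. The sample data is invented for illustration, and dictionaries keyed by ID stand in for the products and customers collections; the field names follow the query.

```python
from collections import defaultdict
from datetime import datetime, timezone

products = {"p1": {"category": "Electronics"}, "p2": {"category": "Books"}}
customers = {"c1": {"name": "Alice", "country": "AT"},
             "c2": {"name": "Bob", "country": "DE"}}
orders = [
    {"customerId": "c1", "orderDate": datetime(2024, 3, 1, tzinfo=timezone.utc),
     "items": [{"productId": "p1", "quantity": 2, "unitPrice": 10.0},
               {"productId": "p2", "quantity": 1, "unitPrice": 7.0}]},
    {"customerId": "c2", "orderDate": datetime(2024, 6, 5, tzinfo=timezone.utc),
     "items": [{"productId": "p1", "quantity": 1, "unitPrice": 50.0}]},
    {"customerId": "c1", "orderDate": datetime(2023, 12, 31, tzinfo=timezone.utc),
     "items": [{"productId": "p1", "quantity": 5, "unitPrice": 10.0}]},
]

start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = datetime(2025, 1, 1, tzinfo=timezone.utc)

totals = defaultdict(lambda: {"totalRevenue": 0.0, "totalQuantity": 0})
for order in orders:
    if not (start <= order["orderDate"] < end):       # $match: orders in 2024
        continue
    for item in order["items"]:                        # $unwind: one row per item
        product = products[item["productId"]]          # $lookup: join products
        if product["category"] != "Electronics":       # $match: only Electronics
            continue
        entry = totals[order["customerId"]]            # $group: by customer
        entry["totalRevenue"] += item["quantity"] * item["unitPrice"]
        entry["totalQuantity"] += item["quantity"]

# $lookup customers + $project, then $sort and $limit
shaped = [
    {"customerId": cid,
     "customerName": customers[cid]["name"],
     "country": customers[cid]["country"],
     **agg}
    for cid, agg in totals.items()
]
top5 = sorted(shaped, key=lambda row: row["totalRevenue"], reverse=True)[:5]
for row in top5:
    print(row)
```

The database performs the same steps, but close to the data and with the help of indexes, so only the five result documents travel to the application.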

PROs of document databases

Flexible schema (“schema-less”)
- You don’t have to predefine a rigid schema like in SQL.
- Fields can be added/removed per document without migrations.
- Super handy in evolving projects or when requirements are still fuzzy (typical in web dev).
Natural fit for JSON / application objects
- Documents are basically JSON (or JSON-like), which fits perfectly with JavaScript/TypeScript backends and frontends.
- You often can store data in almost the same structure you use in your code → less mapping / less impedance mismatch.
Good for nested / aggregate data (fewer joins)
- Related data can be embedded directly (like customer, items and totals in the order example above).
- Many read operations become a single query instead of multiple joins across tables as in an RDBMS, which can be a substantial speed-up for read-heavy workloads.
- This can make certain read patterns faster and simpler to reason about.
Easy horizontal scalability & high availability
- Many document stores are designed with sharding and replication in mind from the start, so that they can scale out across many servers.
- This makes it relatively straightforward to build highly available setups spanning multiple machines.
Fast development & iteration
- Because of flexible schema + JSON + fewer joins, you can iterate quickly, especially in early stages or prototypes.
- Changing requirements often mean just adjusting the code and letting the database adapt.

Key-value stores: simplicity for maximum speed

Key-value databases have the simplest mental model: each piece of data is stored under a unique key and retrieved by that key. Conceptually, the whole system behaves like a very large, distributed dictionary or hash map.

Fig. 7.5: Redis Logo

Redis is a prominent example, often operating primarily in memory but with optional persistence. Amazon DynamoDB is another important system, offering a key-value and document-style API as a fully managed cloud service.

Because of their simplicity, key-value stores can be extremely fast and highly scalable. They are used where simple access patterns dominate, such as caching frequently accessed data, storing user sessions, managing feature flags or maintaining counters and queues at very high throughput. In many architectures, a key-value store does not replace a relational database, but complements it, for example as a caching layer in front of a slower but more feature-rich system.

Fig. 7.6: Typical setup of a Redis key-value store used as a cache. First (1) the application server asks Redis for a value by key. If the value is not found (cache miss), (2) the value is fetched from the primary database and returned to the application server, and (3) it is also stored in Redis for future requests.

Source: [11]
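
The three steps in Fig. 7.6 are often called the cache-aside pattern. The following minimal sketch shows the control flow in plain Python. The TTLCache class is a hypothetical in-memory stand-in for a key-value store such as Redis, and primary_db is a dictionary standing in for the slower primary database; no real client library is used.

```python
import time


class TTLCache:
    """Hypothetical in-memory stand-in for a key-value store with expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._data = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:  # entry is too old: treat as a miss
            del self._data[key]
            return None
        return value

    def set(self, key, value):
        self._data[key] = (self._clock() + self._ttl, value)


primary_db = {"user:1": {"name": "Andrea"}}  # stand-in for the primary database
stats = {"db_reads": 0}


def load_from_primary(key):
    stats["db_reads"] += 1
    return primary_db.get(key)


def get_user(cache, key):
    value = cache.get(key)                # (1) ask the cache first
    if value is None:                     # cache miss
        value = load_from_primary(key)    # (2) fall back to the primary database
        if value is not None:
            cache.set(key, value)         # (3) store it for future requests
    return value


cache = TTLCache(ttl_seconds=60)
first = get_user(cache, "user:1")   # miss: reads the primary database
second = get_user(cache, "user:1")  # hit: served from the cache
print(first, second, stats)
```

The time-to-live bounds how long stale data can be served after the primary database changes; choosing it is a trade-off between freshness and load on the primary system.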

PROs of key-value stores

Very high performance & low latency
- Data is usually stored in memory → reads/writes are extremely fast.
- Great for use cases where milliseconds (or microseconds) matter (caching, real-time analytics, gaming, etc.).
Simple data model
- Just key → value. Easy to understand, easy to use.
- No complex schemas, joins, or relations to manage.
Easy horizontal scalability
- Key-value stores are usually easy to partition/shard across many nodes by key.
- This makes it straightforward to scale to large workloads.
Great for caching
- Perfect as a cache in front of another database or API.
- Built-in TTL (time-to-live) and eviction policies help automatically expire old data.
Atomic operations & concurrency support
- Many operations are atomic (e.g., INCR, DECR, list push/pop), simplifying concurrent updates.
- This is very useful for rate limiting, distributed locks, and counters.
Extra features beyond simple storage
- Redis provides Pub/Sub, streams, Lua scripting, and more.
- That allows using it as a message broker, event stream, or coordination service.
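
Partitioning by key, as mentioned in the list above, can be sketched in a few lines. The example hashes each key and uses the result to pick one of four in-memory dictionaries, each standing in for a separate node. The shard count, the put and get helpers and the use of SHA-256 are illustrative assumptions, not the scheme of any particular product.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Four dictionaries stand in for four separate nodes.
shards = [dict() for _ in range(4)]


def put(key, value):
    shards[shard_for(key, len(shards))][key] = value


def get(key):
    return shards[shard_for(key, len(shards))].get(key)


for user_id in range(1000):
    put(f"session:{user_id}", {"user": user_id})

print([len(shard) for shard in shards])  # keys are spread over the shards
print(get("session:42"))
```

Plain modulo hashing moves most keys to a different shard when the number of shards changes, which is why production systems typically use schemes such as consistent hashing or a fixed number of hash slots instead.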

Graph databases: focus on relationships

Graph databases emerged from the observation that in many problem domains, the relationships between entities are just as important as the entities themselves, or even more so. Social networks, recommendation systems and network analyses all rely heavily on complex relationship patterns.

Fig. 7.7: Neo4j Logo

In a graph database such as Neo4j, data is modeled as nodes (representing entities such as people, products or devices) and edges (representing relationships such as “friend of”, “purchased”, “connected to”). Both nodes and edges carry properties. Queries in such a system express patterns in the graph: for example, “find all people who are friends of a person who bought product X and who also liked product Y”.

This model excels where multi-step relationships and traversals across the data are frequent and performance-critical. While relational databases can also represent graphs using join tables, the query complexity and performance characteristics are often less favorable when traversals become deep and irregular. For these niche but important cases, specialized graph databases offer a more natural and efficient solution.

An example Cypher query that finds friends-of-friends in a social network graph might look like this:

// Create some people and friendships
CREATE (alice:Person {name: "Alice"})
CREATE (bob:Person   {name: "Bob"})
CREATE (carol:Person {name: "Carol"})

CREATE (alice)-[:FRIENDS_WITH]->(bob)
CREATE (bob)-[:FRIENDS_WITH]->(carol);

// Find Alice's friends-of-friends
MATCH (a:Person {name: "Alice"})-[:FRIENDS_WITH]->(:Person)-[:FRIENDS_WITH]->(fof)
RETURN fof.name AS friendOfFriend;
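
For readers less familiar with Cypher, the traversal above can be mimicked in plain Python with an adjacency list. This is only a sketch of the idea: the friends_with dictionary mirrors the three people created above, and the two helper functions are hypothetical. A graph database performs such traversals on its own storage structures rather than in application code.

```python
from collections import deque

# Directed FRIENDS_WITH edges, mirroring the Cypher example.
friends_with = {"Alice": ["Bob"], "Bob": ["Carol"], "Carol": []}


def friends_of_friends(graph, person):
    """Follow exactly two FRIENDS_WITH hops, like the MATCH pattern above."""
    result = []
    for friend in graph.get(person, []):
        for friend_of_friend in graph.get(friend, []):
            result.append(friend_of_friend)
    return result


def reachable_within(graph, start, max_hops):
    """Breadth-first search: nodes reachable in 1..max_hops directed hops."""
    seen = {start}
    frontier = deque([(start, 0)])
    found = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                found.append(neighbor)
                frontier.append((neighbor, depth + 1))
    return found


fof = friends_of_friends(friends_with, "Alice")
within_two = reachable_within(friends_with, "Alice", 2)
print(fof, within_two)
```

In a relational schema each additional hop would typically require another self-join on a friendship table, which is the "join-heavy" situation that graph databases aim to avoid.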

PROs of graph databases

Natural modeling of highly connected data
- Nodes = entities, relationships = edges.
- Social networks, recommendation systems, identity/access graphs, network topologies, knowledge graphs, etc. map directly to the data model.
Fast traversal of relationships
- Queries that follow many hops (friends-of-friends, shortest path, common neighbors, etc.) are typically faster and simpler to express than in relational or key-value stores.
- Relationship edges are “first-class” and usually stored with efficient pointers.
Expressive, relationship-focused query languages
- Languages like Cypher are designed around pattern matching: (a)-[:KNOWS]->(b) is literally the query syntax.
- Complex graph questions (paths, patterns, neighborhoods) become short and readable.
Flexible schema / easy evolution
- You can easily add new node or relationship types and properties without painful migrations.
- Great when the domain evolves or you discover new connections over time.
Built-in graph algorithms & analytics
- Many graph DBs (Neo4j, etc.) offer libraries for centrality (PageRank), community detection, similarity, shortest paths…
Better insight into relationships and structure
- The model encourages you to think in terms of connections and structure, not just rows and columns.
Less join-hell for relationship-heavy queries
- In an RDBMS you’d need multiple joins across junction tables; in a graph DB, it’s just traversing edges.
- That simplifies both the schema and the query logic for highly relational domains.

Time-series databases: data over time as a first-class concern

Time-series databases specialize in data that is primarily structured around time: metrics, sensor readings, log entries, stock prices or any other values that change over time. In such systems, each record typically consists of a timestamp, one or more measured values and additional labels or tags describing the source of the measurement.

Fig. 7.8: InfluxDB Logo

Systems like InfluxDB, TimescaleDB or Prometheus fall into this category. They are optimized for extremely high write rates, since monitoring and IoT systems easily generate thousands or millions of measurements per second. They also provide efficient retention mechanisms to downsample or delete old data, and highly optimized queries for ranges of time, averages, maximums, percentiles and similar aggregations.

In monitoring scenarios (for servers, applications, networks or smart devices), time-series databases often form the backbone of dashboards and alerting systems. While relational or document databases can also store time-series data, they usually do not match the performance and convenience of specialized time-series engines when data volumes and time-based queries become very large.
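
The typical time-based aggregation, an average per time bucket, can be sketched as follows. The samples and the mean_per_bucket helper are hypothetical; a time-series database performs this kind of grouping natively and over far larger volumes.

```python
from collections import defaultdict

# (timestamp in seconds, CPU load in percent) from one hypothetical host.
samples = [(0, 10.0), (20, 20.0), (59, 30.0), (60, 40.0), (119, 60.0), (180, 5.0)]


def mean_per_bucket(points, bucket_seconds):
    """Group points into fixed time buckets and average the values per bucket."""
    sums = defaultdict(lambda: [0.0, 0])  # bucket start -> [sum, count]
    for timestamp, value in points:
        bucket_start = timestamp - timestamp % bucket_seconds
        accumulator = sums[bucket_start]
        accumulator[0] += value
        accumulator[1] += 1
    return {
        bucket_start: total / count
        for bucket_start, (total, count) in sorted(sums.items())
    }


per_minute = mean_per_bucket(samples, 60)
print(per_minute)
```

Storing only such per-bucket averages for older data, and discarding the raw points, is the essence of downsampling combined with a retention policy.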

The Grafana (https://grafana.com/) dashboard below shows an example of visualizing time-series data stored in a time-series database.

Fig. 7.8: Example of a Grafana Dashboard

PROs of time-series databases

Optimized for time-stamped data
- Data model, storage engine, and indexes are all designed around (measurement, tags, fields, timestamp).
- Much more efficient for metrics, logs, sensor data, financial ticks, etc. than a “generic” table.
High write throughput
- TSDBs are built to handle huge numbers of inserts per second (e.g. metrics from thousands of servers, sensors, IoT devices).
- Batched, compressed writes and append-only storage formats are common.
Efficient compression & storage
- Use time-series-aware compression (e.g. delta encoding, run-length encoding) to store large volumes cheaply.
- You can keep much longer history (months/years) without exploding disk usage.
Built-in retention policies & downsampling
- You can define rules like:
  - “Keep raw data for 7 days, 1-minute averages for 30 days, 1-hour averages for 1 year.”
- Old data is automatically aggregated and/or deleted → no manual cleanup scripts.
Time-series-friendly query language and functions
- Aggregations over time windows: MEAN(), SUM(), MAX(), etc. grouped by time buckets (GROUP BY time(1m)).
- Built-in functions for moving averages, derivatives (rates), windowing, interpolation, etc.
Good integration with monitoring & dashboards
- TSDBs integrate nicely with tools like Grafana & Co.
- Easy to build dashboards for system metrics, business KPIs, IoT telemetry, etc.
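
The compression techniques named in the list above can be illustrated with a small sketch. Regularly sampled timestamps differ by nearly constant amounts, so storing the differences (delta encoding) and then collapsing repeated differences (run-length encoding) shrinks them considerably. The functions and the sample timestamps are illustrative assumptions and do not reproduce the storage format of any specific database.

```python
def delta_encode(values):
    """Keep the first value, then store only the difference to the predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the original values by accumulating the differences."""
    values = []
    total = 0
    for delta in deltas:
        total += delta
        values.append(total)
    return values


def run_length_encode(values):
    """Collapse runs of equal values into (value, run_length) pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [tuple(run) for run in runs]


# Timestamps of a sensor that reports roughly every 10 seconds.
timestamps = [1700000000, 1700000010, 1700000020, 1700000030, 1700000045]

deltas = delta_encode(timestamps)
runs = run_length_encode(deltas[1:])
print(deltas)
print(runs)
```

Five large integers are reduced to one start value and two short runs, and decoding the deltas restores the original timestamps exactly.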

Embedded and in-process databases: databases that travel with the application

Not every application uses a separate database server running on its own machine or container. In many cases, especially in desktop applications, mobile apps or small tools, an embedded database running in the same process as the application is more appropriate.

Fig. 7.9: SQLite Logo

SQLite is the most prominent example worldwide: it does not run as a server at all, but stores its data in a single file and is linked directly into the application as a library. Other embedded databases, from the Java ecosystem, include H2 and Derby. These systems often offer a subset of relational features, but in a very lightweight package.

Embedded databases avoid the overhead of setting up and managing a separate server. They are ideal for offline-capable applications, local data storage on devices, prototyping or automated tests. Even in large web applications, SQLite still appears in development environments, while production uses a heavier relational system.

Fig. 7.10: SQLite architecture compared to traditional client-server RDBMS.

Source: [10]

Despite its lightweight nature, SQLite still gives you full SQL and ACID semantics. You get tables, indexes, transactions, constraints, and all the relational goodies students learn when working with larger RDBMSs. For many workloads—especially read-heavy, single-user scenarios—SQLite can be surprisingly fast because it avoids network round trips and context switches between client and server. The data is simply a file on the local filesystem, and the engine is optimized for small, quick transactions and queries.
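
How little is needed to use an in-process database can be seen in the following sketch, which uses the sqlite3 module from Python's standard library. The file name notes.db and the notes table are hypothetical; the database is created inside a temporary directory so the example cleans up after itself.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "notes.db")  # the whole database is this one file

    # First "application run": create the schema and store a row.
    conn = sqlite3.connect(db_path)  # no server, no credentials, no network
    conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
    with conn:  # transaction: committed when the block ends without an error
        conn.execute("INSERT INTO notes (body) VALUES (?)", ("remember the milk",))
    conn.close()

    file_exists = os.path.exists(db_path)

    # Second "application run": reopen the same file and read the data back.
    conn = sqlite3.connect(db_path)
    notes = conn.execute("SELECT id, body FROM notes").fetchall()
    conn.close()

print(file_exists, notes)
```

Copying that single file to another machine is enough to move the entire database, which is what makes this model attractive for desktop and mobile applications.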

However, in-process databases show limitations when it comes to concurrency and scaling. SQLite uses file-level locking strategies, which means heavy write concurrency (many writers at once) doesn’t scale nearly as well as a server-based RDBMS that can handle many concurrent clients and complex locking. Classic client-server databases are designed to coordinate many clients, manage complex transactions, and spread load over multiple cores and even multiple machines.

Another important difference is in deployment topology and security features. A server-based RDBMS supports connections from multiple host machines, user management, roles, fine-grained privileges, replication, and high-availability setups. You can centralize your data, run backups and monitoring on the database server, and expose it safely on the network. SQLite, by design, has no “remote access” layer—your application process needs direct access to the database file.

PROs of in-process databases

Zero setup & administration
- No separate server to install, configure, or run.
- Your app just links the SQLite library and uses a single .db file → great for end-users and simple deployments.
Easy deployment & portability
- The entire database is usually one file you can copy, move, or bundle with the app.
- Makes backups, migrations to another machine, or “portable apps” very straightforward.
Good performance for local, single-user workloads
- No network round-trips: queries run in the same process as the application.
- Very fast for read-heavy or small write workloads on a single device.
Full SQL & ACID semantics in a tiny package
- Despite being “lightweight”, SQLite still gives you real tables, indexes, constraints, and transactions.
Ideal for embedded / offline scenarios
- Perfect for mobile apps, desktop tools, IoT devices, and offline-first apps that must work without a network.
- The app can sync the SQLite file (or selected data) when a connection is available.
Lower operational complexity & cost
- No need for DB admins, monitoring of a DB server, or managing remote connections.
- Reduces the overall infrastructure and maintenance overhead.

Open source and proprietary systems: different models of development and licensing

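Beyond technical differences, the database market can also be divided by licensing and development models. Open-source databases publish their source code under licenses that allow free use, modification and distribution. PostgreSQL, MySQL, MariaDB, SQLite, Redis, Cassandra and many others are commonly placed in this category. MongoDB Community Edition is usually listed here as well, although its Server Side Public License (SSPL) is not recognized as an open-source license by the Open Source Initiative and is more precisely described as source-available.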

Fig. 7.11: Microsoft SQL Server Logo

Open-source databases often benefit from large communities, a rich ecosystem of tools and libraries and the absence of traditional license fees. Organizations can run them on their own infrastructure or choose managed offerings from cloud providers. Some vendors follow an “open core” model, where the core of the database remains open source but additional enterprise features are only available in a paid edition.

Fig. 7.12: Oracle Database Logo.

Proprietary systems, such as Oracle Database, Microsoft SQL Server, IBM DB2, SAP HANA or Snowflake, are instead developed and licensed by companies that do not publish the full source code. These products typically focus on enterprise features such as advanced clustering, integrated tooling, close alignment with other components of the vendor’s ecosystem (as with Microsoft) and strong commercial support. In return, they require license fees, often tied to the number of cores, users or the volume of data processed.

Fig. 7.13: IBM DB2 Logo.

The choice between open source and proprietary is not purely technical; budget constraints, support expectations, compliance requirements and existing vendor relationships all play a role.

PROs of commercial/proprietary databases

Vendor support & SLAs
- Dedicated support teams, 24/7 hotlines, guaranteed response times (SLAs).
- Useful when downtime or data loss has real financial or legal consequences.
Advanced enterprise features out of the box
- Built-in tools for clustering, high availability, automatic failover, advanced replication, partitioning, in-memory options, etc.
- Often more polished and easier to configure than DIY setups around open-source systems.
Integrated ecosystem & tooling
- GUI admin tools, monitoring dashboards, performance analyzers, migration assistants, and backup solutions that are designed to work together.
- Reduces the need to glue together many separate open-source tools.
Compliance, certifications, and governance
- Commercial vendors often provide features and documentation for compliance (GDPR, HIPAA, PCI-DSS, etc.).
- Audit logging, fine-grained access control, encryption, and certifications can be important in regulated industries.
Training, consulting & ecosystem partners
- Official training courses, certifications, and consulting services.
- Helpful for large organizations that want standardized skills and procedures.

Deployment models: from on-premises servers to “Database as a Service”

A further axis in the database landscape concerns how the system is operated. Traditionally, organizations install databases “on-premises” on their own servers, handle backups, updates and scaling themselves and possibly run them in virtual machines or containers.

With the rise of cloud computing, many databases are also offered as fully managed services. Amazon RDS, Azure SQL Database, Google Cloud SQL and similar services operate familiar relational engines on behalf of customers, handling patching, backups and monitoring within the provider’s infrastructure. More specialized cloud-native databases, such as DynamoDB, Cosmos DB, Bigtable or Firestore, are tightly integrated with a specific cloud provider and offer automatic scaling, high availability and pay-as-you-go pricing. This operating model is known as Database as a Service (DBaaS).

PROs of Database as a Service (DBaaS)

No server installation or maintenance
- Provider handles OS, database installation, patching, upgrades, and many config details.
- You don’t need your own DBA just to keep the database alive and updated.
Automatic backups & high availability
- Built-in automated backups, point-in-time recovery, and easy replication options.
- Failover to replicas is often handled automatically → higher uptime with less effort.
Easy scalability
- You can typically scale up (more CPU/RAM/storage) with a few clicks or an API call.
- Some services also support scaling out (read replicas, sharding options) without redesigning everything.
Pay-as-you-go cost model
- You pay for what you use (instances, storage, I/O) instead of buying hardware upfront.
- Good for projects with uncertain load or for student/experimental systems.
Integrated security & compliance features
- Encryption at rest & in transit, role-based access, audit logs, VPC/network integration.
- Easier to meet security/compliance requirements than rolling everything yourself.
Faster time-to-market
- You can provision a production-ready database in minutes.
- Lets teams focus on application logic instead of infrastructure setup.

How different types are combined in practice

In real systems, it is rare to see only a single database type. Instead, architects often combine multiple systems, each chosen for its strengths. A web application might use PostgreSQL as its primary relational database for core business data, while Redis provides caching and session management, Elasticsearch or another search engine handles full-text search and a time-series database collects monitoring metrics.

This polyglot approach acknowledges that no single system is ideal for all tasks. Relational, document, key-value, graph and time-series databases each occupy a niche in which they perform best.