A relational database management system (RDBMS) relies on many complex algorithms and data structures just to store and retrieve information reliably. Its complexity approaches that of an operating system, with many components working together almost in real time. A modern RDBMS has built-in facilities for memory management, file buffering, network communication, and more. Together these form the basic architecture of the RDBMS package. This article gives an overview of what happens behind the scenes from the moment a user submits a query until the result is returned from the database.
Understanding the RDBMS
An RDBMS package is generally a database server that serves several clients over communication channels such as network sockets or pipes. In a stand-alone database application, the client communicates with the database through programming interfaces. In such a case, the database server becomes part of the client application, or vice versa. Sometimes the database is embedded in an on-board system, where it is subordinate to the host system. In a large database application, the RDBMS server is usually separated from the application and hosted at a remote location. The business logic interacts with the database server over the network as needed. Either way, the logic of processing queries remains the same whether it is an embedded database application, a network application, or a stand-alone application.
Applications connect to the database through software components called database connectors. Open Database Connectivity (ODBC) is a well-known, vendor-neutral interface that an application can use to connect to almost any database for which a driver exists. There are also vendor-specific connectors for an RDBMS such as MySQL, which provides connectors for Java (JDBC), PHP, Python, .NET, and others. These implementations primarily communicate over network protocols. A connector exposes an API that transfers SQL commands to the database server and returns the results to the client on request. Connectors typically consist of a database driver and client access APIs.
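As a minimal sketch of what using a connector looks like from the application side, the following uses Python's standard sqlite3 module. This is an assumption made for the sake of a runnable example: SQLite is an embedded engine, so no network is involved, and the employee table is hypothetical. The pattern of connecting, sending SQL text, and fetching rows is similar with networked connectors.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a database server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The connector passes SQL text to the engine...
cur.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("INSERT INTO employee (name) VALUES (?)", ("Ada",))
conn.commit()

# ...and hands the result rows back to the client.
cur.execute("SELECT id, name FROM employee")
rows = cur.fetchall()
conn.close()

print(rows)
```

The `?` placeholder lets the connector pass the value separately from the SQL text instead of pasting it into the statement.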
Queries are simply questions put to the database according to the syntax and semantics of SQL (Structured Query Language), the standard query language. The database server understands the language and responds to the submitted request. SQL statements are commonly grouped into two broad types. The first is Data Definition Language (DDL), which is used to define and change the structure of the database: creating and modifying tables, defining indexes, handling constraints, and so on. The second is Data Manipulation Language (DML), which works on the data itself. This includes actions such as querying with SELECT and inserting, updating, and deleting data in database tables.
A typical SELECT query can be written as follows. Square brackets [ ] enclose optional clauses, and lowercase words stand for user-supplied names and expressions.
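To make the distinction concrete, here is a small sketch that runs a few statements of each kind. It assumes Python's built-in sqlite3 module and a hypothetical product table; the exact DDL options available differ between database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: statements that define the structure of the database.
conn.execute(
    "CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL, price REAL)"
)
conn.execute("CREATE INDEX idx_product_name ON product (name)")

# DML: statements that work on the data stored in that structure.
conn.execute("INSERT INTO product (name, price) VALUES ('pen', 1.5)")
conn.execute("INSERT INTO product (name, price) VALUES ('ink', 4.0)")
conn.execute("UPDATE product SET price = 2.0 WHERE name = 'pen'")
conn.execute("DELETE FROM product WHERE name = 'ink'")

remaining = conn.execute("SELECT name, price FROM product").fetchall()
print(remaining)
```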
SELECT [ DISTINCT ] columns FROM tables [ WHERE expression ] [ GROUP BY columns ] [ HAVING expression ] [ ORDER BY columns ] ;
- The DISTINCT keyword removes duplicate records in the final result.
- The FROM clause specifies the tables whose rows are the input to the other clauses.
- The WHERE clause keeps only the rows of the referenced tables that satisfy the expression.
- The GROUP BY clause groups the result according to the specified attribute.
- The HAVING clause applies a filter on groups.
- The ORDER BY clause sorts the result.
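The clauses above can be tried out together in one short sketch. The orders table and its rows are made up for illustration, and SQLite (through Python's sqlite3 module) is assumed as the engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 10), ("alice", 30), ("bob", 5),
     ("bob", 5), ("carol", 50), ("carol", 2)],
)

# WHERE filters rows, GROUP BY forms one group per customer,
# HAVING filters the groups, ORDER BY sorts what is left.
totals = conn.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE amount >= 5
    GROUP BY customer
    HAVING SUM(amount) > 10
    ORDER BY total DESC
    """
).fetchall()

# DISTINCT collapses the duplicate customer names.
customers = conn.execute(
    "SELECT DISTINCT customer FROM orders ORDER BY customer"
).fetchall()

print(totals)     # [('carol', 50), ('alice', 40)]
print(customers)  # [('alice',), ('bob',), ('carol',)]
```

Note that bob passes the WHERE filter with both rows but is dropped by HAVING, because his group total of 10 is not greater than 10.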
Query processing
After the client submits a query statement to the database server over the network protocol, the statement is first interpreted and then executed. The purpose of interpretation is to work out what the query means. This is done by parsing the SQL statement and breaking it down into pieces before executing it. Interpreting the query is a two-step process: first, a logical plan describes what the query is supposed to do, and second, a physical plan describes how to carry it out.
The physical plan of the query is handled by the database system’s query execution engine. A tree structure is created in which each node represents a query operator, and each node’s children represent its inputs, which are tables or the results of other operators. Before execution, the query passes through several phases: parsing, validation, optimization, plan generation or compilation, and finally execution.
- The parser splits the SQL statement into its parts, validates it, and translates the SQL query into a query tree expressed in relational algebra. This is the logical plan.
- The logical plan is then translated into a physical plan. There can be many such plans, and the query optimizer tries to find the best one, for example based on estimated run time. It does this by taking the relational algebra tree as the starting point of its search space, expanding it into alternative execution plans, and ultimately choosing the cheapest of them. The result is comparable to the code generation stage of a compiler. The statistics the optimizer depends on come from the database system catalog, which holds information such as the number of tuples in the stored relations referenced by the query, along with many other properties. The optimizer finally copies the chosen plan out of its in-memory structure and sends it to the execution engine. The execution engine runs the plan using the database relations as input and generates a new table whose rows and columns match the query criteria.
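Many database systems let you inspect the plan the optimizer chose. As one hedged illustration, SQLite offers EXPLAIN QUERY PLAN; the sketch below assumes Python's sqlite3 module and a hypothetical employee table, and compares the plan for the same query before and after an index exists. The helper name plan_for is made up for this example, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")

def plan_for(sql: str) -> list[str]:
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT name FROM employee WHERE dept = 'sales'"

before = plan_for(query)  # no index yet: expect a full table scan
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")
after = plan_for(query)   # same logical query, different physical plan

print(before)
print(after)
```

The SQL text is identical in both calls. Only the physical plan changes, because the catalog now tells the optimizer that an index on dept is available.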
Note that the chosen plan is optimal or near optimal only within the optimizer’s search space. Interpreting an SQL query is therefore far from simple for the RDBMS. Optimization is costly because it has to weigh alternative execution plans, and a single query can have a very large number of them. It therefore consumes additional processing time, which adds to the overall database response time.
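To give a feel for how quickly the search space grows, the sketch below enumerates every left-deep join order for three tables and picks the cheapest under a toy cost model. Everything in it is an assumption for illustration: the row counts, the single fixed selectivity, and the cost formula are invented, and real optimizers use far richer statistics and prune the search rather than enumerating it exhaustively.

```python
from itertools import permutations
from math import factorial

# Hypothetical row counts, of the kind a system catalog might report.
ROW_COUNTS = {"orders": 10_000, "customers": 500, "regions": 10}
SELECTIVITY = 0.001  # assumed fraction of row pairs surviving each join

def plan_cost(order: tuple[str, ...]) -> float:
    """Toy cost: total size of all intermediate results of a left-deep plan."""
    rows = float(ROW_COUNTS[order[0]])
    cost = 0.0
    for table in order[1:]:
        rows = rows * ROW_COUNTS[table] * SELECTIVITY
        cost += rows
    return cost

candidates = list(permutations(ROW_COUNTS))
best = min(candidates, key=plan_cost)

print(len(candidates), "join orders for 3 tables")
print("cheapest under the toy model:", best)
print(factorial(10), "left-deep join orders for 10 tables")
```

With n tables there are n! left-deep join orders alone, which is why an optimizer cannot afford to cost every alternative for large queries.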
This is only an overview of how an SQL query is executed. In short, parsing divides the SQL statement into pieces, which then go through validation to check the syntax against the SQL standard, report errors, and identify the query operation. The parser passes the query on in an intermediate form that the optimizer understands, and the optimizer generates an efficient execution plan from it. The execution engine then takes the optimized plan and runs it. The result is finally returned to the client.
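The final execution step can be sketched with a hand-built operator tree. The code below assumes an iterator-style design in which each operator pulls rows from its child; this is one common approach, not something the overview above prescribes, and the operator names and the EMPLOYEE data are hypothetical. Parsing and optimization are skipped, so the plan is written out by hand.

```python
from typing import Callable, Iterable, Iterator

Row = dict

def scan(table: Iterable[Row]) -> Iterator[Row]:
    """Leaf operator: yield every row of a stored table."""
    yield from table

def select(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Keep only the rows that satisfy the predicate (the WHERE clause)."""
    return (row for row in rows if predicate(row))

def project(rows: Iterable[Row], columns: list[str]) -> Iterator[Row]:
    """Keep only the requested columns (the SELECT list)."""
    return ({c: row[c] for c in columns} for row in rows)

EMPLOYEE = [
    {"name": "Ada", "dept": "eng", "salary": 120},
    {"name": "Bob", "dept": "ops", "salary": 90},
    {"name": "Cy", "dept": "eng", "salary": 80},
]

# Hand-built plan for: SELECT name FROM employee WHERE salary > 85
plan = project(select(scan(EMPLOYEE), lambda r: r["salary"] > 85), ["name"])
result = list(plan)

print(result)  # [{'name': 'Ada'}, {'name': 'Bob'}]
```

Because each operator is a generator, rows flow through the tree one at a time and no intermediate table is materialized.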