Skip to main content

Introduction to Klustron's MySQL Connection Protocol

KlustronAbout 13 min

Introduction to Klustron's MySQL Connection Protocol

Preface

Klustron's computing nodes are developed based on PostgreSQL, which directly supports PostgreSQL's connection protocol and SQL syntax. Therefore, software using JDBC, ODBC, and other common database connection protocols, as well as PostgreSQL-specific connection libraries in various programming languages, can connect to and operate with Klustron clusters without issue.

To allow applications originally using MySQL to connect to and use Klustron without modification or recompilation, we developed Klustron's MySQL connection protocol. This document provides an overview of this protocol.

In essence, for Klustron, the connection protocol is just a pipeline for communication between the client and the Klustron server. MySQL and PostgreSQL protocols are like two different-shaped pipes, but the SQL statements and query results they transport are fundamentally the same.

This means that any SQL syntax and functionality supported by Klustron can be transmitted to the server cluster and executed normally via either the MySQL or PostgreSQL connection protocol, with the results returned accordingly.

For example, you can send PostgreSQL-specific SQL or standard SQL statements, including prepared statement syntax, stored procedure syntax, and DDL syntax, through a MySQL connection and receive results following the MySQL protocol. This allows the use of MySQL client libraries for result retrieval. Conversely, you can send any Klustron-supported MySQL-specific syntax (e.g., prepared statements, DML) or standard SQL statements through a PostgreSQL connection and receive results following the PostgreSQL protocol, using PostgreSQL client libraries for result retrieval.

Overview of Klustron's MySQL Connection Protocol

  1. Supports text protocol and binary protocol
  2. Supports prepared statements (binary protocol)
  • Does not support MySQL syntax for EXECUTE statements but supports PostgreSQL syntax for EXECUTE statements
  1. Supports the widely used MySQL authentication method mysql_native_password
  2. Supports MySQL client libraries for all major programming languages
  • C/C++/C#/Go/Java/PHP/Python/Rust/Ruby/.NET
  1. Supports all MySQL proprietary data types and operators
  2. Supports all MySQL proprietary DML syntax
  3. Supports MySQL variable read/write syntax
  4. Supports common commands, including:
  • USE
  • KILL CONNECTION/QUERY
  • SHOW series commands
  1. Supports MySQL's AUTO_INCREMENT keyword and functionality
  2. Supports autocommit and implicit transaction start & commit
  3. Uses the same error codes as MySQL
  • Error descriptions are similar in meaning to MySQL but not identical in text
  1. Compatible with MySQL UTF8/GBK/latinN/ascii and other common character sets
  2. Supports MySQL data compression algorithms
  • zlib & zstd
  1. Supports all useful MySQL command types
  • COM_CHANGE_USER (mysql_change_user())
  • COM_INIT_DB (mysql_select_db())
  1. Does not currently support SSL; will be supported in the future
  2. Supports MySQL quoting rules (``, '', "")
  3. From Klustron-1.2, supports common MySQL DDL syntax

Features Supported by Klustron's MySQL Protocol

Klustron & MySQL protocols support all common features, including text and binary protocols, connection authentication (only supporting mysql_native_password), data compression, prepared statements, character sets, error handling, and SSL connections.

A Klustron-server (computing node) simultaneously listens on two TCP ports—one for PostgreSQL protocol (default 5432) and one for MySQL protocol (default 5306), both configurable via the configuration file.

MySQL and PostgreSQL clients use the same username and password to connect to Klustron-server, regardless of the connection protocol used. Upon receiving a TCP connection request, Klustron-server will initiate the respective server-side protocol processing module (i.e., PostgreSQL or MySQL) to complete connection authentication and establish a valid database connection. Subsequently, users can send identical SQL statements in both types of connections, regardless of the protocol. Users can send standard SQL statements or PostgreSQL or MySQL-specific SQL statements to Klustron in any connection type and receive results accordingly.

Accounts and Access Control

Users can create accounts by sending CREATE USER statements through Klustron-server using either PostgreSQL or MySQL connections. The CREATE USER syntax (as well as any other DDL syntax) must be PostgreSQL syntax. DBAs can optionally define access control rules in the pg_hba.conf configuration file, specifying that certain user accounts must come from specific IPs or domains or that certain accounts can only access specific databases. For more details on configuring access control rules in pg_hba.conf, refer to the PostgreSQL documentation.

Error Handling

Klustron's MySQL protocol automatically maps PostgreSQL error codes to the corresponding MySQL error codes. Errors returned during MySQL protocol operations use the same error codes as MySQL's server protocol implementation. Therefore, the application code handling MySQL errors does not need any modification and will work as expected. JDBC and other database client API libraries handle errors based on an exception class hierarchy, where each exception type is associated with several MySQL error codes. Consequently, as long as the application code implements exception handling, it can correctly catch errors returned by Klustron's MySQL connection.

Error description texts use PostgreSQL error strings instead of the MySQL error code texts. This is generally not an issue because, according to MySQL documentation, while error codes remain consistent across MySQL versions, error descriptions are not guaranteed to be stable. Thus, application logic should not rely on matching error strings.

Additionally, Klustron supports MySQL's SHOW WARNINGS and SHOW ERRORS statements, and their usage and results are identical to those in MySQL.

Character Sets

MySQL clients can specify the character set in the standard way. If the specified character set is not supported by Klustron, the connection will fail with an error. PostgreSQL natively supports a wide range of character sets, most of which overlap with those supported by MySQL. Both support all common character sets, including Chinese, Japanese, Korean, and major European languages. Any string sent from a MySQL client to Klustron is converted to the database's current character set before use; any string returned to the MySQL client is converted from the database's character set back to the MySQL client's character set before sending.

Klustron does not support MySQL's SET client/connection/server_character_set/collation commands but does support the SET NAMES syntax.

Data Compression

Klustron supports data compression using zlib and zstd before transmitting data packets.

Unsupported MySQL Features in Klustron's MySQL Connection Protocol

  1. Authentication Methods: Only supports mysql_native_password and no other authentication methods or external authentication plugins. This is compatible with all MySQL client versions since MySQL 4.1, allowing them to connect and function correctly.
  2. Password Management Features: Does not support features like password expiration, password validation, dual passwords, password reuse restrictions, or account temporary locking after multiple failed login attempts. All such features use PostgreSQL's existing functionality.
  3. Rarely Used Account Management Features: Does not support proxy users, unknown or anonymous users, account locking, or account resource limits.
  4. Text Protocol for Prepared Statements: Production systems should use binary connections to execute prepared statements as intended by their design, which Klustron's MySQL protocol supports. Using the text protocol to execute PREPARE/EXECUTE commands is generally for development and debugging purposes in both MySQL and PostgreSQL. Klustron does not support MySQL's PREPARE/EXECUTE syntax or MySQL's user-defined variables, which means it cannot bind parameters in the MySQL prepared statement manner.
  5. init_connect Statement for Initializing Connection State: This feature is rarely used and executing SQL statements before the connection is fully initialized contradicts database system design principles, thus not supported in Klustron.
  6. Session State Tracking: Klustron's computing nodes do not maintain the status variables tracked by MySQL. Therefore, Klustron does not support MySQL's mysql_session_track_xxx series of client API functions. However, Klustron inherits PostgreSQL's pgstat infrastructure, which collects extensive runtime statistics. Both MySQL and PostgreSQL clients can query pgstat views and functions to obtain this information. For more details, see the PostgreSQL pgstat documentation.

Features Different Between Klustron MySQL Server Protocol and MySQL Server Protocol

1. Target Database for Connections

When a MySQL client connects to a Klustron compute node without specifying a database name, it defaults to connecting to the "postgres" database. This behavior differs from MySQL, which does not specify the current database in such cases. MySQL supports switching the current database using the USE db statement or the mysql_select_db() client API, but PostgreSQL does not allow switching databases within an active connection.

Starting from Klustron-1.1, in Klustron's MySQL connections, when the transform_create_db_to_schema configuration parameter is set to on (default is on), executing CREATE/DROP DATABASE or SHOW DATABASES is equivalent to executing CREATE/DROP SCHEMA or SHOW SCHEMAS. This ensures MySQL compatibility by default. Thus, in Klustron's MySQL connections, users can execute the USE statement or call the mysql_select_db() function to switch the current database, achieving the same effect as in MySQL.

2. Data Output Format

For all data types, Klustron uses PostgreSQL's output functions to generate query results in text protocol. This means that the output for decimal/numeric, float, real (double), date, timestamp, and timestamptz data types might have slight differences compared to MySQL's output in some cases. These differences can arise due to floating-point precision discrepancies or locale and timezone settings. For example, PostgreSQL's timestamptz type always includes the timezone, such as "2022-05-30 21:08:35+08". These differences are not present when using the binary protocol (i.e., using prepared statements with bind parameters and results).

MySQL Commands Supported by Klustron

In Klustron's MySQL connections, MySQL client software can send several commands in addition to SQL statements (COM_QUERY command). These include:

  1. Commands that change connection state and their corresponding MySQL client APIs:
COM_SET_OPTION:mysql_set_server_option() 
COM_RESET_CONNECTION: mysql_reset_connection()
COM_PING:mysql_ping()
COM_QUIT: mysql_close()
  1. All commands related to prepared statements:
COM_STMT_EXECUTE
COM_STMT_FETCH
COM_STMT_PREPARE
COM_STMT_SEND_LONG_DATA
COM_STMT_RESET
COM_STMT_CLOSE
  1. Other Commands

These include COM_INIT_DB and COM_CHANGE_USER commands, which are detailed below.

Each of these commands corresponds to a function in the MySQL connector library, meaning these functions are available for user calls.

MySQL Commands Not Supported by Klustron

Klustron does not support deprecated commands (those that will be removed in future versions), replication and clone-related commands, or commands used internally by the MySQL server (which clients cannot use). If an application calls a MySQL client API to execute any of the unsupported commands listed below, the Klustron compute node will return the standard MySQL error 1047 (unsupported command).

The MySQL commands not supported by Klustron are as follows:

COM_PROCESS_KILL: mysql_kill() is deprecated; use kill connection/query instead. Klustron currently supports this command.
COM_PROCESS_INFO: mysql_list_processes() is deprecated; use show processlist instead. Klustron currently supports this command.
COM_BINLOG_DUMP_GTID: A replication command used when the replica I/O thread connects to the primary node.
COM_BINLOG_DUMP: A replication command used when the replica I/O thread connects to the primary node.
COM_REFRESH: Refreshes user accounts and access control information. Klustron does not require explicit cache refreshes, as it automatically detects changes in metadata tables and updates the cache accordingly. This command and the corresponding mysql_reload() and mysql_refresh() functions are not supported.
COM_STATISTICS: Klustron uses pgstat facilities for statistics. The mysql_stat() API function and mysqladmin for statistics are not supported.
COM_DEBUG: The mysql_debug() function is used only for developer debugging and is not available in production systems.
COM_REGISTER_SLAVE: A replication command used when the replica I/O thread connects to the primary node.
COM_CLONE: A clone command.
COM_FIELD_LIST: mysql_list_fields() is deprecated; use show columns instead. Klustron does not support this command. Users need to query pg_attribute and other metadata tables to obtain column metadata.
COM_SLEEP: Not sent by the client, used internally.
COM_CONNECT: Cannot be sent once the connection is established.
COM_TIME: Not sent by the client, used internally.
COM_DELAYED_INSERT: Removed in MySQL.
COM_END: Not sent by the client, used internally.
The mysql_set_local_infile_xxx() series of functions are not supported.

Commands to Switch Current Database and User Sessions

  1. COM_INIT_DB

This command is used in MySQL connections to switch the current (default) database, essentially the command version of the use db statement. In Klustron, switching databases this way is not supported. Instead, you can use the USE command and the mysql_select_db() client API to switch the current (default) schema.

PostgreSQL requires that clients use the same database throughout a session, without switching. Klustron inherits this requirement. PostgreSQL introduces the concept of schemas, which act as logical namespaces. Within a PostgreSQL database, you can create any number of schemas. Except for databases and schemas, all other database objects (tables, indexes, views, stored procedures, sequences, etc.) belong to a schema. When referencing these objects, if a schema name is not specified, they are searched for in the namespaces listed in the search_path variable. This variable can be dynamically modified within a session using the PostgreSQL set search_path command. Klustron fully inherits these features, and in a MySQL connection, you can switch schemas by executing PostgreSQL's set search_path.

In MySQL, the concept of a database combines the physical storage location and the logical namespace. MySQL also has schemas, where each MySQL database is equivalent to a schema. PostgreSQL separates these concepts, offering greater flexibility and freedom.

To fully support MySQL behavior, Klustron allows users to switch the current database using the USE statement or the mysql_select_db() function. Klustron supports optionally treating create/drop database statements as create/drop schema when the transform_create_db_to_schema variable is set to true. In MySQL connections, this variable defaults to true and does not affect PostgreSQL connections. This setup allows MySQL users to switch databases and schemas in Klustron seamlessly. When transform_create_db_to_schema is true, all databases imported from MySQL become schemas within a single Klustron database, maintaining their original structure.

Starting from Klustron 1.2, Klustron supports a more flexible way of specifying database names, schema names, or both in the COM_INIT_DB command, as detailed in the next section.

  1. COM_CHANGE_USER

In MySQL, this command switches the current user and can also switch databases. Due to the reasons mentioned above, Klustron does not support switching databases but does support switching the current user and schema. The mysql_change_user() function call will succeed only if the specified database name matches the database the connection is already using; otherwise, it will fail and return an error.

Starting from Klustron 1.2, Klustron supports a more flexible way of specifying database names, schema names, or both in the COM_CHANGE_USER command, as detailed in the next section.

Additionally, users can execute PostgreSQL's set session authorization, set role, and other commands to switch users/roles.

  1. Database Name in the Above Two Commands

Starting from Klustron 1.2, Klustron supports a more flexible way to specify database names, schema names, or both in the COM_INIT_DB and COM_CHANGE_USER commands.

3.1. Determining the Default Schema

First, introduce the string-type configuration variable mysql_default_schema, which is only effective in MySQL connections. This variable specifies the default schema name used when the COM_INIT_DB and COM_CHANGE_USER commands do not specify a schema name. Users can set this variable in the configuration file of the compute node to ensure clients connect uniformly to a specific schema. If the variable is set to an empty string (NULL or ''), the default schema in this case will be 'public'. All references to "default schema" in the following sections are determined by this method.

In the COM_CHANGE_USER command, users can specify a string in the form str1.str2 as the database name, where str1 must match the current connection's database name (referred to as current_database), otherwise, it will fail. If str1 matches current_database and str2 is a valid schema name, then the search_path will switch to the str2 schema within the current_database; if str2 is not a valid schema within current_database, the command will fail.

In the COM_CHANGE_USER command, if the database name parameter is a string str without a dot (.), then if str matches the current_database name, it switches to its default schema (determined by the method described in section 3.1); if str does not match the current_database name, it checks if str is a schema under current_database, and if it exists, switches the search_path to this schema; otherwise, the command will fail.

In the COM_INIT_DB command, if the database name is in the form str without a dot, Klustron checks if str is an existing database, and if so, connects to the default schema of this database (determined by the method described in section 3.1); otherwise, in versions before 1.3.1, this command fails. From version 1.3.1 onwards, if the database name does not exist, the system assumes str is a schema under the database named in the mysql_default_db variable. If this database and schema exist, the connection succeeds; otherwise, it fails. The default value for mysql_default_db is postgres, and users can modify the compute node's configuration file to set it to any existing database in the system.

In the COM_INIT_DB command, if the database name is in the form str1.str2, then str1 must be an existing database, and str2 must be a schema within str1; otherwise, the connection will fail. This connection will connect to str1 and set the current_schema to str2.

Klustron does not change the search_path lookup rules: it always searches the system schemas pg_catalog, pg_tempX, etc., first, and then the schemas specified in the search_path.

MySQL Syntax Compatibility

  1. PostgreSQL requires identifier names to be fewer than 64 bytes, meaning if the identifier is in Chinese, the maximum number of Chinese characters in an identifier (M) will vary within a range, with the smallest M being less than 20 characters. Identifiers include database, table, column, procedure names, etc.

  2. Klustron's Support for MySQL Private DML Syntax and Introduction to Klustron's Unique Variable Read and Write Features. Regardless of whether using PostgreSQL or MySQL connection protocols, any syntax supported by Klustron, including these MySQL private DML syntaxes, standard SQL syntax, and other PostgreSQL private syntaxes supported by Klustron, can be executed.

  3. Klustron supports the autoincrement keyword to define columns as auto-increment, and the last_insert_id() function, which is used exactly like in MySQL.

  4. Klustron supports all MySQL private data types, including: a. TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB, VARBINARY, BINARY These are stored and handled as bytea type in Klustron. b. TINYTEXT, MEDIUMTEXT, LONGTEXT These are stored and handled as text type in Klustron. c. DOUBLE (without PRECISION) d. byte, tinyint, mediumint, middleint, and all integer types with UNSIGNED modifier e. DATETIME: a timestamp without time zone information. f. TIMESTAMP The TIMESTAMP type has different meanings in MySQL and PostgreSQL: in MySQL, it represents a timestamp with time zone information, while in PostgreSQL, it represents a timestamp without time zone information. To be consistent with both, Klustron follows MySQL's definition in MySQL connections and PostgreSQL's definition in PostgreSQL connections.

  5. Klustron does not support user-defined MySQL stored procedures or MySQL-specific DDL syntax.

  6. Klustron supports MySQL-specific system functions, such as unix_timestamp(), timediff(), str_to_date(), etc.

  7. Klustron supports a series of common show commands (listed below).

  8. Klustron does not support the load data infile command; users can use the copy command to achieve similar data loading tasks.

Compatibility with MySQL Variable Read/Write Syntax

Updating Variables: Executed on the Primary Node of Each Storage Shard

  • Only partial variable updates are allowed, not all variables (for security control).
  • Session variables are synchronized to each new backend connection.
  • Updates to global variables are not persistent, similar to MySQL.
  • set global innodb_lock_wait_timeout = 4;
  • set session innodb_lock_wait_timeout = 3;
  • set persist_only innodb_lock_wait_timeout = 11;
  • set persist innodb_lock_wait_timeout = 11;
  • set innodb_lock_wait_timeout = 2;
  • set @@globa.innodb_lock_wait_timeout = 2;
  • set @@session innodb_lock_wait_timeout = 2;
  • set @@innodb_lock_wait_timeout = 2;

Reading Variables: Read from a Randomly Selected Primary Node of a Storage Shard

  • show global variables like 'innodb_lock_wait_timeout';
  • show session variables like 'innodb_lock_wait_timeout';
  • show variables like 'innodb_lock_wait_timeout'
  • select @@global.innodb_lock_wait_timeout, @@autocommit AS ac, @@session.sql_log_bin;

Not Limited by Connection Type (MySQL/PostgreSQL connections can both execute these).

Unique Variable Read/Write Feature: Introduction to Klustron's Unique Variable Read and Write Features

Support for Some Common MySQL DBA Syntax

  • SHOW DATABASES

  • SHOW SCHEMAS

    • True schemas, using databases for physical isolation and schemas for logical partitioning, suitable for multi-department, multi-business sharing of a single DB cluster.
    • MySQL: uses databases for logical partitioning.
  • SHOW TABLES [ IN SCHEMA ... ]

  • SHOW COLLATION

  • SHOW CREATE TABLE [LIKE pattern ]

  • SHOW WARNINGS

  • SHOW ERRORS

  • SHOW PROCESSLIST

    • Lists all sessions in all compute nodes.
  • KILL connection/query

    • Can kill connections in other compute nodes.
  • USE schema

    • PostgreSQL's schema is equivalent to MySQL's database.
    • Provides better data isolation, not only based on access control rules but also more secure.
    • To allow users to use the USE statement or mysql_select_db() to switch the current database, Klustron defaults to treating CREATE/DROP DATABASE as CREATE/DROP SCHEMA in MySQL connections. For more details, see the earlier explanation of the COM_INIT_DB command.

Other Technical Details of the Connection Protocol

Autocommit and Implicit Transaction Start & Commit

The way a transaction starts and commits is not tied together. An implicitly started transaction can be explicitly or implicitly committed, and an explicitly started transaction can also be explicitly or implicitly committed. This is completely consistent with MySQL.

  • Implicit Transaction Start: DML execution when autocommit is off.
  • Implicit Transaction Commit: Execution of one of the following statements:
    • set autocommit = on;
    • BEGIN/START TRANSACTION
    • Certain DDLs such as CREATE/DROP/ALTER TABLE/INDEX/DATABASE/SCHEMA, ALTER COLUMN, RENAME/TRUNCATE/REINDEX/CLUSTER TABLE
    • COMMIT (explicit commit)

Error Handling for Statement Execution

When enable_stmt_subxact is set to true, only the statement is rolled back upon an error, not the transaction. The application decides whether to roll back or commit the transaction (optionally after executing some additional statements). This achieves the same effect as MySQL. If enable_stmt_subxact is false, the transaction is rolled back upon a statement execution error, as per PostgreSQL convention.

Quotation Rules

  • Back-quote (`): Used for identifiers, e.g.,
CREATE TABLE ` space table ` (` space col ` int)
  • Single Quote ('): Used for string constants, e.g., 'this is a string constant', '这是一个字符串'
  • Double Quote ("):
    • In PostgreSQL connections, used for identifiers, e.g.,
CREATE TABLE " space table " (" space col " int)
  • In MySQL connections, if mysql_ansi_quotes=true, it behaves the same as in PostgreSQL connections, i.e., used for identifiers. This is equivalent to setting the ANSI_QUOTES flag in MySQL's sql_mode; otherwise, it is used for string constants, similar to MySQL's behavior when ANSI_QUOTES is not set in sql_mode.

Operators with Different Meanings in MySQL and PostgreSQL Connections

  • ||, &&, ^ These operators follow their definitions and usage in MySQL connections in MySQL and in PostgreSQL connections in PostgreSQL.

  • Stored procedures containing ||, &&, ^ operators can only run in the same type of connection in which they were created. If a stored procedure is created in a MySQL connection (using PostgreSQL PL/SQL syntax) and uses these operators, it can only be used in MySQL connections; if created in a PostgreSQL connection, it can only be used in PostgreSQL connections. If a stored procedure does not use these operators, it can be used in both MySQL and PostgreSQL connections.

  • The ? operator is a placeholder for prepared statement parameters in MySQL connections and is treated as a regular operator in PostgreSQL connections.

Autoincrement

  • A table can have multiple autoincrement columns. The field value is automatically generated when it is default, null, or does not exist.
  • Supports last_insert_id() function.
  • Cannot explicitly set offset and starting value.

SSL Support

Currently, Klustron's MySQL protocol does not support SSL. Future versions will include SSL support. At present, applications cannot use any MySQL client SSL API to work with Klustron.

END