Klustron Limits
By pooling the computing and storage resources of multiple servers, a Klustron distributed database achieves virtually unlimited data management capacity and pushes far beyond the limits of centralized PostgreSQL and MySQL databases, so that application systems are almost never constrained by them under real workloads. Nevertheless, Klustron still has some limits. Most of them, however, are purely theoretical and are nearly impossible to reach in practice.
General Limitations in a Klustron Cluster
| Limitation Item | Limit Value | Description |
|---|---|---|
| Maximum number of compute nodes | 1023 | |
| Maximum number of standby nodes per shard | 7 | The Klustron development team can raise this limit as user needs require. |
| Maximum number of shards | 2^24 (16M) | 2^N refers to 2 to the power of N, same below. |
| Maximum total number of nodes in all shards | 2^24 (16M) | |
| Maximum number of active client connections | 2^32 (4G) | Also limited by the number of file handles available to the operating system and available memory. |
| Maximum total number of databases and schemas | Both are approximately 2^32 | Also limited by the limit on the total number of directories in the file system where the data directory of each storage node is located. |
| Maximum number of relations in each database | 2^32 - number of system relations | Includes data tables, indexes, views, materialized views, sequences, etc. The total number of data tables is also limited by the file system's limits on the total number of files, and on files per directory, in the file system holding each storage node's data directory. The number of system relations ranges from hundreds to thousands; likewise below. |
| Maximum total number of partitions at all levels of a partitioned table | 2^32 - N - number of system relations | N is the total number of all internal nodes in the tree formed by this partitioned table. |
| Maximum number of data rows in each table | Single table: 2^48; Partitioned table: approximately 2^80 | Also limited by available disk space and the maximum file size allowed by the file system. |
| Maximum number of pages that can be used by each table | Single table: 2^32; Partitioned table: 2^64 | Also limited by available disk space and the maximum file size allowed by the file system. |
| Maximum data size (bytes) of each table | Single table: 2^48; Partitioned table: 2^80 | Also limited by available disk space and the maximum file size allowed by the file system. |
| Maximum page size of each table | 64KB | Same as PostgreSQL and MySQL |
| Number of indexes per table | 64 | Same as MySQL's limitation |
| Number of columns per index | 16 | Same as MySQL's limitation |
| Number of columns per table | Dynamic, see description | The theoretical limit on compute nodes is 1600 columns, further constrained by the page size and the widths of the column types already in the table. In MySQL's InnoDB engine the limit drops to 1017 columns, including virtual generated columns. RocksDB limits index rows to 3072 bytes but places no limit on the number of non-indexed columns, nor on the number of columns or the row length of a table. With the MyRocks engine, the maximum number of columns is therefore bounded by the MySQL server's 64KB data-row buffer and by the data types and maximum widths of the table's columns. If a table needs more columns than InnoDB allows, the MyRocks engine can be used instead. |
| Maximum number of partition columns per table | 32 | |
| Total number of each type of database object | 2^32 - small number of system relations | Database objects include databases, schemas, tables, indexes, views, materialized views, sequences, as well as stored procedures, triggers, constraints, types, domains, etc. |
| Maximum length of database object name string | 63 bytes | Roughly 30 Chinese characters (each occupies 2-4 bytes depending on the encoding). In addition, the combined length of a database name and a schema name cannot exceed 60 characters (not bytes). |
| Maximum length of each text/blob field | 1GB | |
| Maximum number of characters in each varchar field | Dynamic, see description | Limited by the maximum bytes per character of the column's character set, and by the space occupied by the table's other columns: all fields in a row share a 64KB row buffer. If the other columns use 32KB, slightly less than 32KB remains for this varchar column; with a 4-byte-per-character set such as utf8mb4, that allows roughly 8K characters. Exceeding this limit causes the table creation statement to fail. |
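The varchar-length arithmetic above can be sketched as a small calculation. This is an illustrative simplification, assuming all columns share a 64KB row buffer and ignoring per-column length-byte overhead:

```python
# Sketch of the varchar-length arithmetic: all columns in a row share a
# 64KB buffer, so the widest varchar a column can declare depends on what
# the other columns already occupy and on the character set's maximum
# bytes per character.  Figures are illustrative, not exact.

ROW_BUFFER_BYTES = 64 * 1024          # shared 64KB row buffer

def max_varchar_chars(other_columns_bytes: int, max_bytes_per_char: int) -> int:
    remaining = ROW_BUFFER_BYTES - other_columns_bytes
    return remaining // max_bytes_per_char

# Other columns occupy 32KB; this column uses utf8mb4 (up to 4 bytes/char):
print(max_varchar_chars(32 * 1024, 4))   # 8192 characters, roughly 8K
```

In practice the usable space is slightly smaller still, because varchar length prefixes and null bitmaps also consume row-buffer bytes.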
Runtime Load Limits
Total Number of Transactions Executable During the Entire Lifecycle of the Cluster
Same as InnoDB and RocksDB in each shard. InnoDB stores a 6-byte unsigned integer transaction ID in each row header, so the largest transaction ID that can be persisted is 2^48 - 1; the transaction ID allocated and used in memory is an 8-byte unsigned integer with a maximum of 2^64 - 1. Once the limit is reached, the IDs handed out by the InnoDB transaction subsystem wrap around, but the InnoDB code does not handle wrap-around, nor does the MySQL documentation mention it. Although the available ID space is huge, it is still finite: after wrap-around, newer transactions receive smaller IDs, which breaks InnoDB's MVCC snapshot-isolation (row-visibility) mechanism. And while the in-memory 'limit' is 2^64 - 1, the IDs stored in undo logs and row headers top out at 2^48 - 1, so allocating IDs beyond 2^48 - 1 corrupts InnoDB's internal transaction state --- at the very least, the transaction subsystem's MVCC visibility checks no longer work correctly after recovery.
InnoDB does not perform a freeze operation like PostgreSQL does. In PostgreSQL the maximum transaction ID is 2^32 - 1, which in today's data-management workloads inevitably wraps around --- in extreme cases as often as once a week. Freeze is therefore mandatory in PostgreSQL, and is one reason its users must run vacuum and endure its serious impact on database performance. InnoDB wrap-around is almost impossible, but if it does occur in some extreme scenario, MySQL currently has no effective remedy; in theory all data would have to be reloaded into a new instance. One such extreme scenario is a large number of concurrent connections performing single-row inserts, deletes, and updates with autocommit=on --- the fastest way to consume transaction IDs, and one under which the 2^48 - 1 ceiling could conceivably be reached.
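To give a sense of scale for the 2^48 - 1 ceiling, here is a back-of-the-envelope estimate. The sustained write-transaction rate used below is a hypothetical figure chosen for illustration, not a measurement:

```python
# Rough illustration of how long it would take to exhaust InnoDB's
# persisted transaction-ID space.  The transaction rate is hypothetical.

MAX_PERSISTED_TXN_ID = 2**48 - 1     # 6-byte ID stored in row headers / undo logs

def years_to_wrap(txns_per_second: float) -> float:
    seconds_per_year = 365 * 24 * 3600
    return MAX_PERSISTED_TXN_ID / (txns_per_second * seconds_per_year)

# Even at a sustained one million write transactions per second:
print(round(years_to_wrap(1_000_000), 1))   # 8.9 (years)
```

By contrast, PostgreSQL's 2^32 - 1 ID space at the same hypothetical rate would last roughly an hour, which is why freeze/vacuum is unavoidable there.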
RocksDB's maximum transaction ID is 2^64 - 1, and it likewise assumes that newer transactions have larger IDs, so there is also a vanishingly small theoretical possibility of wrap-around; if it occurs, RocksDB (and MyRocks) will not work properly.
Assuming a Klustron cluster has M shards, a transaction that executes insert, delete, or update statements writes to at least one and at most M shards. Klustron therefore expands the maximum total number of transactions a cluster can execute by up to M times compared with community MySQL, making transaction ID wrap-around even less likely.
Total Number of Active Transactions That Can Be Executed Concurrently
Same as InnoDB and RocksDB in each shard, limited by the resources their transactions require. With the InnoDB engine, the bound is the total number of available undo slots across all rollback segments of all InnoDB undo tablespaces; this can be tuned via innodb_rollback_segments, whose default is also its maximum value, 128.
Let N be the total number of available undo slots across all rollback segments in the system tablespace and all undo tablespaces at a given moment (N changes dynamically with load), and let M be the number of shards in the cluster. The cluster-wide limit on active transactions at that moment then lies in the range [N, M*N], depending on which shards each transaction writes to. With community MySQL, M is always 1, so Klustron can significantly raise the total number of active transactions the system can run concurrently.
With innodb_rollback_segments=128 and a 4KB page size, the worst case --- every active transaction inserting into and updating temporary tables --- caps concurrent write transactions at only 2^14, about 16,000. Initializing the database with a larger page size raises this ceiling correspondingly (innodb_rollback_segments itself cannot exceed 128).
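The 2^14 figure can be reproduced with a small calculation. This sketch assumes, per InnoDB's on-disk layout, that each rollback segment holds page_size/16 undo slots, and that a worst-case transaction occupies two slots; the slots_per_txn value is a simplifying assumption:

```python
# Back-of-the-envelope estimate of InnoDB's concurrent write-transaction
# ceiling.  Assumes each rollback segment provides page_size // 16 undo
# slots and a worst-case transaction occupies two slots (a simplification).

def max_concurrent_write_txns(page_size: int,
                              rollback_segments: int = 128,
                              slots_per_txn: int = 2) -> int:
    slots_per_segment = page_size // 16          # e.g. 4KB page -> 256 slots
    total_slots = rollback_segments * slots_per_segment
    return total_slots // slots_per_txn

print(max_concurrent_write_txns(4 * 1024))       # 16384, i.e. 2^14 (~16,000)
print(max_concurrent_write_txns(16 * 1024))      # 65536 with the default 16KB page
```

As the second call shows, quadrupling the page size quadruples the undo-slot pool and hence the worst-case concurrency ceiling.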
RocksDB has no undo log mechanism; the number of its concurrently executing transactions is limited by available memory and the write_batch_max_bytes parameter. Under the common write policies (WRITE_COMMITTED and WRITE_PREPARED), substantial memory is needed to hold the rows inserted, deleted, and updated by active transactions, so many concurrent transactions with large write sets significantly reduce the number that can execute at once.
Number of Statements That Can Be Executed in Each Transaction
If a transaction contains insert, delete, or update statements, the compute node limits it to 32767 statements; it is also bounded by the resources InnoDB and RocksDB transactions require on each shard's storage nodes.
With the InnoDB engine, the bound is the available disk space of the file system holding the undo tablespace files and the maximum file size that file system allows; call this limit N. If the cluster has M shards, the cluster-wide limit lies in the range [N, M*N]. Since community MySQL always has M = 1, Klustron can significantly raise both the number of insert, delete, and update statements per transaction and the total volume of data a transaction can modify.
RocksDB has no undo log mechanism, but a transaction may hold at most rocksdb_max_row_locks row locks; a transaction that locks more rows than this fails. Moreover, if rocksdb_lock_scanned_rows=true, a row that is scanned but fails the update statement's filter remains locked, even though in MySQL's RC isolation level such locks would normally be released automatically in this scenario. In addition, RocksDB's write_batch_max_bytes bounds the total memory a transaction's inserts, deletes, and updates may occupy, indirectly limiting the number of rows it can modify.
Limitations Imposed by Available Memory
When heavy swapping between memory and disk occurs, a database system cannot deliver meaningful performance, so the preceding two sections assume available memory is not so scarce that swapping becomes frequent. Available memory constrains the InnoDB and RocksDB transaction subsystems in many ways. InnoDB's largest consumer is the buffer pool; beyond that, memory is mainly needed for a large number of internal data structures and objects, including open tables and, especially, transaction row locks, whose footprint varies widely with the number of rows a transaction modifies. For RocksDB, the main consumers are the memtables, the block cache, each active transaction's write batch (holding the rows it has inserted, deleted, and updated), and transaction row locks. If InnoDB's buffer pool or RocksDB's block cache must constantly evict pages and read from external storage, query performance also becomes unacceptably slow.
Therefore, once memory is severely insufficient, the system can no longer deliver acceptable performance, which means that in real production systems the effective number of active transactions, and the number of statements each can execute, is in most cases smaller than the figures above.
Using a Klustron distributed database cluster significantly increases the total amount of available memory, mitigating the limits memory imposes on the total number of active transactions and on the scale of each transaction.
Summary
Klustron greatly extends many of the limits of PostgreSQL, MySQL, and their InnoDB and RocksDB engines, effectively avoiding the severe business impact and prolonged downtime these limits can cause.
