Introduction to Klustron's Binlog2sync Tool
Background
Klustron is a distributed relational database management system designed for TB- and PB-scale data. It handles massive data volumes and highly concurrent read and write requests with high throughput and low latency.
When using Klustron, customers may need to import data from an existing MySQL system into Klustron, or synchronize data from Klustron to other storage systems in real time.
To support these use cases, the Klustron team developed the Binlog2sync tool.
Binlog2sync Features
Connects to the source MySQL instance through the binlog dump protocol and dumps its binlog events, or reads a binlog file directly and parses it.
Processes each event according to its binlog event type: DML events such as INSERT/UPDATE/DELETE are converted into standard SQL statements, and DDL events are output directly.
For distributed XA transactions, decides whether to output the SQL statements based on whether the XA transaction commits or rolls back. On XA rollback, no SQL statements are output.
To keep Binlog2sync from consuming a large amount of memory when a single XA transaction contains many SQL statements, the user can configure how many SQL statements are cached. Once the configured value is exceeded, the cached SQL statements are automatically written to disk.
Supports two ways of dumping binlog, binlog_dump and binlog_dump_gtid, and chooses between them automatically based on the parameters the user supplies.
Supports filtering by GTID, binlog file position, and time range. Filtering and remapping at the database and table level are also supported.
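The XA handling described above (buffer statements, spill to disk past a limit, emit on commit, discard on rollback) can be sketched as follows. This is only an illustration of the idea, not Binlog2sync's actual implementation: the class `XaBuffer` and its methods are hypothetical, and `max_cached` and `spill_dir` are assumed to play the roles that the `--reserve_event_count` and `--reserve_event_dir` options appear to play.

```python
import json
import os
import tempfile


class XaBuffer:
    """Hypothetical buffer for the SQL statements of one XA transaction."""

    def __init__(self, max_cached, spill_dir):
        self.max_cached = max_cached  # assumed analogue of --reserve_event_count
        self.spill_dir = spill_dir    # assumed analogue of --reserve_event_dir
        self.cached = []
        self.spill_path = None

    def add(self, sql):
        """Cache one statement; spill the cache to disk once the limit is exceeded."""
        self.cached.append(sql)
        if len(self.cached) > self.max_cached:
            self._spill()

    def _spill(self):
        if self.spill_path is None:
            fd, self.spill_path = tempfile.mkstemp(dir=self.spill_dir, suffix=".sql")
            os.close(fd)
        # One JSON-encoded statement per line, so embedded newlines survive.
        with open(self.spill_path, "a", encoding="utf-8") as f:
            for sql in self.cached:
                f.write(json.dumps(sql) + "\n")
        self.cached.clear()

    def commit(self):
        """XA commit: return every statement in its original order."""
        statements = []
        if self.spill_path is not None:
            with open(self.spill_path, encoding="utf-8") as f:
                statements = [json.loads(line) for line in f]
        statements.extend(self.cached)
        self._reset()
        return statements

    def rollback(self):
        """XA rollback: drop everything and output nothing."""
        self._reset()
        return []

    def _reset(self):
        if self.spill_path is not None:
            os.remove(self.spill_path)
            self.spill_path = None
        self.cached = []
```

Spilled statements are read back before the in-memory ones on commit, which preserves the order in which they appeared in the binlog.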
Binlog2sync Prerequisites
- binlog_format = row.
- binlog_row_image = full (recommended). Binlog2sync also works in minimal mode. When Binlog2sync parses an event for a specific table and the table structure is not cached, it fetches the table's metadata from the information_schema.COLUMNS table. This requires that no table structure changes take place from the starting binlog dump position onward.
- When dumping binlog remotely, the account needs at least the SELECT and REPLICATION SLAVE privileges.
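The prerequisites above can be checked before a job is started. The sketch below is a hypothetical helper, not part of Binlog2sync: it assumes the caller has already collected the server variables (for example from `SHOW VARIABLES`) into a dict and the account's privileges into a collection of names. It does not treat a blanket grant such as ALL PRIVILEGES as covering the required ones.

```python
# Privileges the document lists as the minimum for remote dumping.
REQUIRED_PRIVILEGES = {"SELECT", "REPLICATION SLAVE"}


def check_prerequisites(variables, privileges):
    """Return (errors, warnings) for the given server variables and privileges.

    `variables` maps variable names to values, e.g. {"binlog_format": "ROW"}.
    `privileges` is an iterable of privilege names granted to the account.
    """
    errors = []
    warnings = []

    if variables.get("binlog_format", "").upper() != "ROW":
        errors.append("binlog_format must be ROW")

    # FULL is only recommended; MINIMAL works if the table structure stays unchanged.
    if variables.get("binlog_row_image", "").upper() != "FULL":
        warnings.append(
            "binlog_row_image is not FULL; do not change table structures "
            "after the starting binlog position"
        )

    missing = REQUIRED_PRIVILEGES - {p.upper() for p in privileges}
    if missing:
        errors.append("missing privileges: " + ", ".join(sorted(missing)))

    return errors, warnings
```

For example, `check_prerequisites({"binlog_format": "ROW", "binlog_row_image": "MINIMAL"}, ["select"])` reports one error for the missing REPLICATION SLAVE privilege and one warning about the row image.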
How to use Binlog2sync
Description of command line parameters
binlog2sync:
-h [ --help ] print usage message
--include_dbs arg need parse log event for db, Format: db1,db2,...
--remap_rules arg db.table remap to new db.table, Format: db1.t1=>db2.t2,db11.t11=>db12.t12, ...
--remote_host arg connect remote mysql host
--remote_port arg connect remote mysql port
--remote_user arg connect remote mysql user
--remote_password arg connect remote mysql password
--remote_binlog_file arg start dump binlog from binlog file
--binlog_position arg start dump binlog from binlog position
--exclude_gtids arg sync events but those gtids
--local_binlog_file arg parse local binlog file
--db_host arg send sql to db host
--db_port arg send sql to db port
--db_user arg send sql to db user
--db_password arg send sql to db password
--commit_sql_num arg number of one commit sql (defaults to 100 if not specified)
--reserve_event_dir arg save sql into file directory
--reserve_event_count arg reserve maximum of sql in memory
--job_id arg binlog sync job id
--stop_datetime arg stop to parse binlog in date time
--start_datetime arg start to parse binlog in date time
--stop_never_server_id arg assign server id to connect db sync binlog (if this parameter is given, the tool generates the id randomly)
--stop_never arg sync binlog forever
--verbose arg print sync sql, default 0
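To make the `--include_dbs` and `--remap_rules` formats concrete, here is a small sketch that parses both and applies them to a source table name. The functions are hypothetical and only mirror the formats shown in the help text. Two behaviors are assumptions rather than documented facts: that database filtering is applied to the source name before remapping, and that an empty include list means no filtering.

```python
def parse_include_dbs(text):
    """Parse 'db1,db2,...' into a set of database names."""
    return {db.strip() for db in text.split(",") if db.strip()}


def _split_qualified(name):
    """Split 'db.table' into (db, table), rejecting malformed names."""
    db, sep, table = name.strip().partition(".")
    if not sep or not db or not table:
        raise ValueError(f"expected db.table, got {name!r}")
    return db, table


def parse_remap_rules(text):
    """Parse 'db1.t1=>db2.t2,db11.t11=>db12.t12' into a mapping of name pairs."""
    rules = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        source, sep, target = item.partition("=>")
        if not sep:
            raise ValueError(f"expected source=>target, got {item!r}")
        rules[_split_qualified(source)] = _split_qualified(target)
    return rules


def route(db, table, include_dbs, rules):
    """Return the target (db, table), or None if the event is filtered out."""
    # Assumption: an empty include set means every database is accepted.
    if include_dbs and db not in include_dbs:
        return None
    return rules.get((db, table), (db, table))
```

With `rules = parse_remap_rules("db1.t1=>db2.t2")`, calling `route("db1", "t1", {"db1"}, rules)` yields the remapped pair, while a table in a database outside the include set yields `None`.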
Connect to a remote MySQL instance, dump its binlog events, and synchronize them downstream to MySQL/Klustron.
./binlog2sync --remote_host=127.0.0.1 --remote_port=1000 --remote_user=xxx \
--remote_password=xxxx --remote_binlog_file=binlog.xxxxx --binlog_position=xx --db_host=127.0.0.2 \
--db_port=1001 --db_user=xxx --db_password=xxxx
The intermediate output SQL statement is as follows:
Parse a local binlog file and synchronize it downstream to MySQL/Klustron.
./binlog2sync --local_binlog_file=binlog.000001 --binlog_position=xxx --db_host=127.0.0.2 \
--db_port=1001 --db_user=xxx --db_password=xxxx
The intermediate output SQL statement is as follows: