
Introduction to Klustron's Binlog2sync Tool


Background

Klustron is a distributed relational database management system designed for TB- and PB-scale data processing. It handles massive data volumes and highly concurrent read and write requests with high throughput and low latency.

Klustron users may need to import data from an existing MySQL system into Klustron, or to synchronize data from Klustron to other storage systems in real time.

To support these needs, the Klustron team developed the Binlog2sync tool.

Binlog2sync Features

Connects to the source MySQL instance through the binlog dump protocol and dumps its binlog events, or reads a binlog file directly and parses it.

Processes each event according to its binlog event type: DML events such as INSERT/UPDATE/DELETE are converted into standard SQL statements, and DDL events are output directly.
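The conversion of a row event into SQL can be illustrated with a minimal Python sketch. This is not Binlog2sync's implementation: the function names, the event kinds passed as strings, and the dictionaries holding the before and after row images are all hypothetical, and the quoting rules are simplified.

```python
from typing import Any, Mapping, Optional


def quote_ident(name: str) -> str:
    """Quote an identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def quote_value(value: Any) -> str:
    """Render a Python value as a simplified SQL literal."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return str(value)
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return "'" + text + "'"


def where_clause(row: Mapping[str, Any]) -> str:
    """Build a WHERE clause that matches every column of the row image."""
    parts = []
    for col, val in row.items():
        if val is None:
            parts.append(f"{quote_ident(col)} IS NULL")
        else:
            parts.append(f"{quote_ident(col)} = {quote_value(val)}")
    return " AND ".join(parts)


def row_event_to_sql(
    kind: str,
    db: str,
    table: str,
    before: Optional[Mapping[str, Any]] = None,
    after: Optional[Mapping[str, Any]] = None,
) -> str:
    """Turn one hypothetical row event into a single SQL statement."""
    target = f"{quote_ident(db)}.{quote_ident(table)}"
    if kind == "INSERT":
        cols = ", ".join(quote_ident(c) for c in after)
        vals = ", ".join(quote_value(v) for v in after.values())
        return f"INSERT INTO {target} ({cols}) VALUES ({vals});"
    if kind == "UPDATE":
        sets = ", ".join(
            f"{quote_ident(c)} = {quote_value(v)}" for c, v in after.items()
        )
        return f"UPDATE {target} SET {sets} WHERE {where_clause(before)};"
    if kind == "DELETE":
        return f"DELETE FROM {target} WHERE {where_clause(before)};"
    raise ValueError(f"unsupported event kind: {kind}")
```

For example, `row_event_to_sql("DELETE", "db1", "t1", before={"id": 1})` yields a `DELETE` statement whose `WHERE` clause is built from the before image.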

For distributed XA transactions, decides whether to output the SQL statements based on whether the XA transaction is committed or rolled back. If the XA transaction is rolled back, no SQL statements are output.

To keep Binlog2sync from using a large amount of memory when an XA transaction contains many SQL statements, the user can configure how many SQL statements are cached in memory. When the configured value is exceeded, the cached SQL statements are automatically written to disk.
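The buffering behaviour described for XA transactions can be sketched in Python as follows. The class `XaBuffer` and its methods are hypothetical names chosen for illustration, and the sketch assumes each SQL statement fits on one line; how Binlog2sync actually stores spilled statements is not documented here.

```python
import os
import tempfile
from typing import List, Optional


class XaBuffer:
    """Hold the SQL of one XA transaction until it commits or rolls back."""

    def __init__(self, max_cached: int, spill_dir: str) -> None:
        self.max_cached = max_cached      # statements allowed in memory
        self.spill_dir = spill_dir        # directory for overflow files
        self.cached: List[str] = []
        self.spill_path: Optional[str] = None

    def add(self, sql: str) -> None:
        """Cache a statement and spill to disk once the limit is exceeded."""
        self.cached.append(sql)
        if len(self.cached) > self.max_cached:
            self._spill()

    def _spill(self) -> None:
        """Append all cached statements to the spill file, one per line."""
        if self.spill_path is None:
            fd, self.spill_path = tempfile.mkstemp(suffix=".sql", dir=self.spill_dir)
            os.close(fd)
        with open(self.spill_path, "a", encoding="utf-8") as f:
            for sql in self.cached:
                f.write(sql + "\n")
        self.cached.clear()

    def commit(self) -> List[str]:
        """Return every statement in original order: spilled ones first."""
        statements: List[str] = []
        if self.spill_path is not None:
            with open(self.spill_path, encoding="utf-8") as f:
                statements = [line.rstrip("\n") for line in f]
        statements.extend(self.cached)
        self._reset()
        return statements

    def rollback(self) -> None:
        """Discard everything, so nothing is output for this transaction."""
        self._reset()

    def _reset(self) -> None:
        self.cached.clear()
        if self.spill_path is not None:
            os.remove(self.spill_path)
            self.spill_path = None
```

In this sketch `max_cached` plays the role that `--reserve_event_count` appears to play in the tool, and `spill_dir` the role of `--reserve_event_dir`; that correspondence is an assumption based on the option descriptions.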

Supports two ways of dumping the binlog, binlog_dump and binlog_dump_gtid, and chooses between them automatically according to the parameters the user supplies.

Supports filtering by GTID, binlog file position, and time range. Filtering and mapping at the database and table level are also supported.
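Database-level filtering and table-level mapping can be sketched in Python as below. The rule syntax follows the `db1.t1=>db2.t2,...` format shown for `--remap_rules`; the function names are hypothetical, and applying the database filter before the remap is an assumption rather than documented behaviour.

```python
from typing import Dict, Optional, Set, Tuple

Name = Tuple[str, str]  # (database, table)


def parse_name(text: str) -> Name:
    """Split 'db.table' into a (db, table) pair."""
    db, sep, table = text.strip().partition(".")
    if not sep or not db or not table:
        raise ValueError(f"expected db.table, got {text!r}")
    return (db, table)


def parse_remap_rules(spec: str) -> Dict[Name, Name]:
    """Parse 'db1.t1=>db2.t2,db11.t11=>db12.t12' into a lookup table."""
    rules: Dict[Name, Name] = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        source, sep, target = item.partition("=>")
        if not sep:
            raise ValueError(f"expected source=>target, got {item!r}")
        rules[parse_name(source)] = parse_name(target)
    return rules


def route(db: str, table: str, include_dbs: Set[str],
          rules: Dict[Name, Name]) -> Optional[Name]:
    """Return the target (db, table), or None if the event is filtered out."""
    if include_dbs and db not in include_dbs:
        return None  # assumption: filter on the source database name
    return rules.get((db, table), (db, table))
```

With the rules from the format example, an event on `db1.t1` would be routed to `db2.t2`, while a table without a matching rule keeps its original name.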

Prerequisites for using Binlog2sync

  1. binlog_format = row.
  2. binlog_row_image = full (recommended). Binlog2sync also works normally in minimal mode. When Binlog2sync parses an event for a specific table and the table structure is not cached, it obtains the table's metadata from the information_schema.COLUMNS table. This requires that no table structure changes occur from the binlog position where the dump starts.
  3. When dumping the binlog remotely, the account needs at least the SELECT and REPLICATION SLAVE privileges.
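The first two prerequisites can be checked before starting a job. The Python sketch below assumes the server variables have already been fetched, for instance with `SHOW VARIABLES`, into a dictionary; the function name and the message texts are hypothetical.

```python
from typing import List, Mapping, Tuple


def check_preconditions(variables: Mapping[str, str]) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for the binlog settings listed above."""
    errors: List[str] = []
    warnings: List[str] = []

    # binlog_format must be row for row events to be available.
    binlog_format = variables.get("binlog_format", "").upper()
    if binlog_format != "ROW":
        errors.append(f"binlog_format must be ROW, found {binlog_format or 'unset'}")

    # binlog_row_image = full is only recommended, so report a warning.
    row_image = variables.get("binlog_row_image", "").upper()
    if row_image != "FULL":
        warnings.append(
            f"binlog_row_image is {row_image or 'unset'}; FULL is recommended"
        )
    return errors, warnings
```

Treating a non-full row image as a warning rather than an error mirrors the text, which says the tool also works in minimal mode.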

How to use Binlog2sync

Description of command line parameters

binlog2sync:
  -h [ --help ]               print usage message
  --include_dbs arg           parse log events only for these dbs, Format: db1,db2,...
  --remap_rules arg           remap db.table to new db.table, Format: db1.t1=>db2.t2,db11.t11=>db12.t12, ...
  --remote_host arg           connect remote mysql host
  --remote_port arg           connect remote mysql port
  --remote_user arg           connect remote mysql user
  --remote_password arg       connect remote mysql password
  --remote_binlog_file arg    start dump binlog from binlog file
  --binlog_position arg       start dump binlog from binlog position
  --exclude_gtids arg         sync events except those gtids
  --local_binlog_file arg     parse local binlog file
  --db_host arg               send sql to db host
  --db_port arg               send sql to db port
  --db_user arg               send sql to db user
  --db_password arg           send sql to db password
  --commit_sql_num arg        number of sql per commit (defaults to 100 if not given)
  --reserve_event_dir arg     save sql into file directory
  --reserve_event_count arg   maximum number of sql reserved in memory
  --job_id arg                binlog sync job id
  --stop_datetime arg         stop to parse binlog in date time
  --start_datetime arg        start to parse binlog in date time
  --stop_never_server_id arg  assign server id to connect db sync binlog (if this parameter is not given, the tool generates one randomly)
  --stop_never arg            sync binlog forever
  --verbose arg               print sync sql, default 0

Connect to a remote MySQL instance, dump its binlog events, and synchronize them downstream to MySQL/Klustron:

./binlog2sync --remote_host=127.0.0.1 --remote_port=1000 --remote_user=xxx \
  --remote_password=xxxx --remote_binlog_file=binlog.xxxxx --binlog_position=xx --db_host=127.0.0.2 \
  --db_port=1001 --db_user=xxx --db_password=xxxx
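When the tool is launched from a script, the argument list can be assembled programmatically so that every option gets the two ASCII hyphens the option listing uses. The Python sketch below only builds the argument list and does not run anything; `build_command` is a hypothetical helper, and the option names are taken from the parameter description above.

```python
import shlex
from typing import List, Mapping, Union


def build_command(binary: str, options: Mapping[str, Union[str, int]]) -> List[str]:
    """Build an argv list of the form [binary, --name=value, ...]."""
    argv = [binary]
    for name, value in options.items():
        argv.append(f"--{name}={value}")
    return argv


argv = build_command(
    "./binlog2sync",
    {
        "remote_host": "127.0.0.1",
        "remote_port": 1000,
        "db_host": "127.0.0.2",
        "db_port": 1001,
    },
)
# shlex.join quotes each argument so the line can be pasted into a shell.
command_line = shlex.join(argv)
```

The resulting `argv` list can be passed to `subprocess.run` as is, which avoids shell quoting problems with passwords that contain special characters.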

[Image: intermediate SQL statements output by the tool]

Parse a local binlog file and synchronize it downstream to MySQL/Klustron:

./binlog2sync --local_binlog_file=binlog.000001 --binlog_position=xxx --db_host=127.0.0.2 \
  --db_port=1001 --db_user=xxx --db_password=xxxx

[Image: intermediate SQL statements output by the tool]
