Blog

What is MySQL replication?

MySQL is very popular as a database for many websites. Over time the database can grow to colossal size. Apart from storing page content (articles, comments, list of users), it also contains settings of the page or application itself. In this case, regular backup becomes a necessity. A large database is also a greater burden for the server, which has to cope with the management of millions of records. Replication can be a remedy for many of these problems.

MySQL database server provides a very interesting replication mechanism. What is it? In short, thanks to replication, each change on the master server leads to an identical change on the backup server (slave). Replication allows us to do that:

  • Scalability – it is possible to spread the load among many servers. Record saving and updating operations are performed on one server and data is downloaded and searched from another.
  • Security – thanks to replication we create a clone of the existing production base. If it does not protect us from DROP TABLE operations, it can help in case of hardware failure of the main server. A replicated database is perfect for performing a traditional backup, without the need to stop the work of the main database.
  • Analysis – complicated analytical operations, various conversions and statistical analyses can be performed on a separate server without the need to load the main database.
  • Separation – we can share a clone production database for developers or testers to do their work on a copy of the database.

MySQL replication mechanism

Replication of MySQL data is based on a very simple rule. The master server keeps a kind of logbook in which it records every action it has performed. It uses bin-logs for this purpose. There are binary files containing instructions that the master has performed. The backup server (slave) reads this data and executes queries one by one, filling the database with subsequent records. The result of this work are two identical databases.

Configuring the replication mechanism launches additional threads on the server side. An additional thread will appear on the master server (for each slave server), which is responsible for sending bin-logs to the slave servers.

The backup server creates two threads. The first one, called I/O Thread, is responsible for receiving the log from the main server. It saves them locally to disk in temporary files (relay-log). The second thread, called SQL Thread, parses these files and executes queries to the database.

Types of replication

Three different methods of replication are available, which translates into a format of data stored in bin-logs. This is due to the binlog_format variable, which can be as follows: ROW, STATEMENT, MIXED. These methods are:

  • SBR (statement-based replication) – in this mode, the server saves the queries it has made to the file. It was the only available mode before MySQL 5.1.4.
  • RBR (row-based replication) – the results of the queries are stored in the bin logs on the master server. The information about the record that has been changed is stored.
  • MFL (mixed-format logging) – it is a combination of the two above types of replications.

Each of the above methods has its advantages and disadvantages. The quickest method of replication is the use of SBR technology. The main server writes to the file the query it made, then the backup server reads and executes it. An example of such a query can be:

DELETE FROM customers WHERE customerNumber = 495;

This method is very fast and efficient. Only SQL queries are saved to the log file. Unfortunately, this causes problems in the case of more complex queries or stored procedures. Imagine a query that uses random functions (RAND()). Replication of such a query will have a completely different result on the side of each backup server.

The solution to this problem was the introduction of replication with the RBR method. In this case, the changes that occurred after the command was executed are stored in bin logs. Information on how to modify specific records is logged. Unfortunately, the method there is much slower than replication of queries. It also causes a significant increase in the amount of data sent between replicating servers. That’s why we created a mixed-format logging method.

In this method, in most cases, SQL queries are logged as in the case of SBR, while for queries whose result is not predictable, RBR replication is enabled.

Configuration

At the very beginning we will take care of the configuration of the master server. To do this, we need to edit the MySQL server configuration file. Most often it is /etc/my.cnf file. There should be such options as:

[mysqld]
log-bin = /tmp/mysql-bin
binlog_format = mixed
max_binlog_size = 50M
server-id = 1

Each option corresponds to a specific replication parameter:

  • log-bin = /tmp/mysql-bin – activates the mechanism for logging changes in the database. These changes are stored in a file.
  • binlog_format = mixed – sets the format for saving data to bin-logs. In this case it is a mixed format.
  • max_binlog_size = 50M – the maximum logbook size.
  • server-id – this number determines the unique server number within the replication mechanism.

There are also available, which allow us to replicate the selected database or table. By default, all databases and tables within the server will be cloned. The binlog-do-db=database option enforces replication of only selected databases.

After restarting MySQL server, we log into it (mysql -u root -p) and issue the following SQL query. It should show us the status of the master server.

mysql> SHOW MASTER STATUS\G
*************************** 1. row ***************************
            File: mysql-bin.000003
        Position: 106
    Binlog_Do_DB: 
Binlog_Ignore_DB: 
1 row in set (0.00 sec)

mysql>

Now you need to create a user who will be responsible for the authorization of backup servers. To do this, we issue a SQL query:

CREATE USER 'repuser'@'%' IDENTIFIED BY 'haslo';
GRANT REPLICATION SLAVE ON *.* TO 'repuser'@'%' IDENTIFIED BY 'haslo';

Now is the time to configure the slave server. Most often it’s /etc/my.cnf file. It should have the options as below. They do not apply to MySQL 5.5 or later. For them we set the replication using the SQL statement.

server-id=2
master-host=192.168.0.164
master-user=repuser
master-password=haslo
master-connect-retry=30

For MySQL 5.5 and above:

CHANGE MASTER TO MASTER_HOST='192.168.0.164',
MASTER_USER='repuser',
MASTER_PASSWORD='haslo';

We connect the client to the server and give a command that will load the first portion of data from the main server and then stop the replication.

mysql> LOAD DATA FROM MASTER;
Query OK, 0 rows affected, 1 warning (0.34 sec)
mysql> SLAVE STOP;
Query OK, 0 rows affected (0.00 sec)

The next step is to set the replication parameters. We publish a SQL query:

CHANGE MASTER TO
MASTER_HOST='192.168.0.164',
MASTER_LOG_FILE='mysql-bin.000003',
MASTER_LOG_POS=1274;

MASTER_LOG_FILE and MASTER_LOG_POS values are taken from the SHOW MASTER STATUS command issued on the main server side. Now we can turn on the replication and check its status.

mysql> START SLAVE;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> SHOW SLAVE STATUS\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.0.164
                  Master_User: repuser
                  Master_Port: 3306
                Connect_Retry: 30
              Master_Log_File: mysql-bin.000003
          Read_Master_Log_Pos: 1274
               Relay_Log_File: mysqld-relay-bin.000002
                Relay_Log_Pos: 251
        Relay_Master_Log_File: mysql-bin.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 1274
              Relay_Log_Space: 407
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
1 row in set (0.00 sec)

ERROR: 
No query specified

mysql>

The replication should work well.

Replicating a selected table

MySQL allows you to replicate a selected table, database or exclude it from replication. Parameters are responsible for this:

replicate-do-db
replicate-ignore-db
replicate-do-table
replicate-ignore-table
replicate-wild-do-table
replicate-wild-ignore-table
binlog-do-db
binlog-ignore-db

In theory, they allow you to filter what is replicated. We can replicate or not selected table, database and several tables using wildcards. Filtering can be set at the binog level (binlog-do-db, binlog-ignore-db). We then define what data goes into the files at all.

In practice it is no longer so colourful. Imagine that we have set the option: replicate-ignore-db=sales. This means that we do not want to replicate the sales database.

USE prices;
UPDATE sales.january SET amount=amount+1000;

In this case, the inquiry will be replicated. Why? MySQL filtering only takes into account the setting of the active database using the USE command; therefore, if we refer to the database using the database_data.table syntax, the MySQL filter will not catch such a reference and will replicate the query.

Therefore, it is not recommended to use this type of functionality, if we are not sure what queries go to the database. It may turn out that a nicely configured replication will suddenly arise due to this type of problem.

Typical replication problems

Many administrators believe that replication will replace the need for data backup. We can use it for backups. Thanks to this, the backup is taken from one of the backup servers, without burdening the production-running main server. You can even shut down the slave server for a while, back up the files and then restart it.

Unfortunately, the replicated database will not protect us from all kinds of queries that delete records, tables or databases. Queries like TRUNCATE TABLE or DROP TABLE, will be very quickly transferred to the backup server and delete our backup. To avoid such situations, we can use the command CHANGE MASTER TO MASTER_DELAY = N;. This will delay the replication of data by N seconds. This may prevent us from deleting data on the backup server side.

Replication protects us from physical damage to the main server. In this case we can switch our application to a backup database.

One of the most important features for MySQL replication is that it is asynchronous. This does not give us any guarantee that any operation on the main database, they were sent and saved on the backup databases. It may happen that we have a great bad luck and shortly after the completion of the transaction, the main server switches off. Our application switches to a backup database. Unfortunately, the second database did not manage to replicate.

The solution to such problems is a semisynchronous replication. It was added quite recently in MySQL 5.5. After the COMMIT command is executed, the data is sent to the backup server. It saves the data to log relations. As soon as it saves all received data, it sends a confirmation to the master server. When the master server receives information from at least one slave server, the transaction is considered completed. Of course, such a solution affects the performance of the main database, which must wait for answers from the backup server.

How to fix the replication?

It will happen to us many times that the slave server stops replicating the data. There can be many reasons for this, but the most common reason for stopping replication is an erroneous SQL query that cannot be executed. Checking the status of the whole mechanism we will see:

Slave_IO_Running: Yes
Slave_SQL_Running: No

This means that the thread downloading data from the master server works, but the thread that should load it to the database for some reason has been disabled. Below we will also see the reason why this thread has stopped.

Last_Error: Error 'Cannot add or update a child row: a foreign key constraint fails (`cms`.`comment_replies`, CONSTRAINT `comment_replies_12` FOREIGN KEY (`comment_id`) REFERENCES `comments` (`id`) ON DELETE CASCADE)' on query. Default database: 'cms'. Query: 'INSERT INTO comment_replies (`ID`,`FILE_ID`,`RECIPIENT_ID`,`COMMENT_ID`,`PROFILE_ID`,`CREATED`,`NUMBER`,`AUTHOR`,`TEXT`) VALUES (NULL,5236,2466656,'1133',454,1303150999,1,'Lorem Ipsum','Proin vel nulla vel nisi eleifend rhoncus dignissim vitae nulla. Quisque odio nibh.')

The replication should stop by itself, but to be sure, execute the command:STOP SLAVE;. Now we can manually correct the database, e.g. by adding a missing element in another array, creating a table, and so on. We can always execute a command that allows us to bypass an erroneous query:

mysql> SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;

Now all that remains is to turn on the replication with the help of replication: START SLAVE;.

Database replication mechanism is a very interesting solution. Before its implementation into production conditions, it is necessary to pay attention to a few shortcomings. It is worth testing the implementation of this mechanism in various situations in order to avoid unpleasant surprises.

Add a comment!