This is the English version of my post “用兩台家用電腦實現高可用性部署” (“Implementing a High Availability Deployment with Two Home PCs”), dated 3 September 2013.
High Availability has become an implicit non-functional requirement as enterprises pay ever more attention to their data. High Availability is the ability of the whole system to withstand failures; it is not achieved by a single piece of software or by deploying a load-balancing device. There is no perfect High Availability solution because, with a limited budget, you cannot cover every possible source of failure: the network, the hardware, the software, and even the business logic. Take a B2B platform whose application servers are deployed redundantly: if one application server fails, user requests can be routed to a healthy server, so the service can still be regarded as highly available. Another example is a master-slave database deployment that keeps data loss to a minimum during a failure. A UPS with a backup power supply is another good example, securing the electricity supply. Using two broadband providers is also a reasonable way to avoid a single point of failure at the network level.

Still, there is no perfect High Availability solution. Suppose you have put all four of these measures in place on your platform. What happens if your payment service provider suffers a blackout for a couple of hours? What happens if a fire breaks out next door, everyone escapes safely, but the water used to put it out damages both your main and backup power supplies? Either one is a tragedy, because each is a single point of failure that was never on your budget book. With a large enough budget you can mitigate most of the risks; with a limited budget, eliminate the riskiest problems and agree on an emergency plan with your stakeholders for the rest.
High Availability is a mature feature of hardware and networking devices nowadays: redundant components come online as soon as a failure is detected. For database systems it is much harder, because data synchronization, data integrity, and data recovery all have to be handled, which leads to complicated mechanisms. The natural idea is to copy the data in real time, typically implemented by shipping the transaction log files from the primary machine to a standby machine. If the primary database fails, the standby takes over and replays the transaction logs to recover to the latest image. This approach is simple, but it has three problems. First, it needs human intervention to change the configuration when a failure occurs. Second, the recovery time grows with the volume of data; a system that claims 99.99% availability is allowed only about 52.56 minutes of downtime per year (525,600 minutes x 0.01%), so log replay may not be acceptable for a large-scale system. Third, it is not cost-effective to buy a standby machine that sits idle most of the time. So what about MySQL Cluster? It is a good option if you can pay the expensive licence fee and adapt to a more complicated replication mechanism; its semi-synchronous replication adds extra synchronization work, so it needs higher-performance hardware and hence costs more. Another solution builds the cluster on a SAN, Oracle RAC for example. It offers fast synchronization and good scalability, but the SAN itself is a single point of failure. Today I want to introduce a High Availability solution that is open source and offers real-time data synchronization, robust data protection, replication, and automation.
First of all, please familiarize yourself with DRBD from LINBIT. Below I share a successful case of deploying a High Availability solution based on DRBD technology. The demonstration is provided as-is; I accept no responsibility or liability, so please test it thoroughly before applying it to a production environment.
References:
1. DRBD User Guide (8.3.x)
2. Linux HA User Guide
3. Combine GFS2 with DRBD
4. MySQL Availability with Red Hat Enterprise Linux
5. Dual Primary (Think Twice)
Background
In general it is not possible to keep two computers' file systems synchronized when both issue disk writes concurrently. GFS2 is a cluster file system developed by Red Hat that solves this synchronization problem at the file-system level. Here I demonstrate how to use LINBIT's DRBD together with CMAN and GFS2 to implement a dual-primary setup and achieve High Availability.
Table of Contents
0. Environment
1. Installation of operating system
2. Network configuration
3. Installation of software
4. DRBD initialization and first synchronization
5. GFS2 formatting and mounting
6. Testing
0. Environment
Hardware environment
- Computer 1: Intel Pentium Dual CPU T2330 @ 1.60 GHz, 2 GB RAM
- Computer 2: Intel Core 2 Duo CPU E7200 @ 2.53 GHz, 4 GB RAM
Software environment (identical on both computers)
1. Installation of operating system
GFS2 is developed by Red Hat, and the cluster suite it belongs to is a paid product on Red Hat Enterprise Linux, so I will demonstrate on CentOS 6.4 and install the packages manually. First, prepare two PCs and install CentOS 6.4 on them. I strongly suggest using two physical machines connected by a cross-over Ethernet cable, because the performance is better. If your two PCs run Windows, you can install VMware Player on each to host CentOS. I do not recommend creating two VMware Player instances on a single Windows PC: it works in theory, but it is much less realistic.
1.1 Install CentOS 6.4 (non VM)
1.1.1 Download the CentOS 6.4 ISO image and burn it to a bootable DVD.
1.1.2 Install CentOS on both computers. Click “Next” until you reach “Which type of installation would you like?”. DRBD needs a standalone partition, but the default layout offers only a boot partition and one more partition for the OS, so choose “Create custom layout”, as shown in the following image.
1.1.3 Re-organize the partition layout by deleting the existing partitions and creating new ones: sda1 for the boot partition, sda2 for LVM, and sda3 for the GFS2 partition. It is advisable to leave some unused space for future expansion. Do not make the GFS2 partition too large, because the initial synchronization takes a long time; 10 GB or 20 GB is a good choice for testing.
After finishing the partition layout, follow the instructions to complete the whole installation.
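Once the system boots you can confirm the layout took effect; this quick check is my own addition rather than part of the original post. sda1, sda2, and sda3 should appear with the sizes you chose, sda3 being the partition DRBD will use later.
root@your_machine# fdisk -l /dev/sda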
1.2 Install CentOS 6.4 (VM)
On Windows, first download and install VMware Player 5.0.2 together with the CentOS 6.4 ISO image. Create a VMware Player instance, choose “Edit virtual machine settings” on the instance, click “Create a new virtual disk”, and follow the instructions to complete the installation. The new disk will appear as sdb.
2. Network configuration
2.1 To minimize network latency, connect the NICs of the two computers directly with a cross-over Ethernet cable and configure them as follows.
root@dell2# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=none
IPADDR=192.168.1.16
BROADCAST=192.168.1.255
NETMASK=255.255.255.0
NETWORK=192.168.1.0
IPV6INIT=no
IPV4_FAILURE_FATAL=yes
root@sony# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=none
IPADDR=192.168.1.15
BROADCAST=192.168.1.255
NETMASK=255.255.255.0
NETWORK=192.168.1.0
IPV6INIT=no
IPV4_FAILURE_FATAL=yes
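After editing the interface files, restart networking on both machines and confirm the address is in place. These verification commands are my own addition, not part of the original post.
root@your_machine# service network restart
root@your_machine# ip addr show eth0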
2.2 The DRBD configuration refers to the machines by hostname, so add both names to /etc/hosts as follows. Make sure each machine's own hostname is set to the name used here (sony.localdomain or dell2.localdomain), because drbdadm matches the 'on' sections of its configuration against the local hostname.
root@your_machine# cat /etc/hosts
192.168.1.15 sony.localdomain
192.168.1.16 dell2.localdomain
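It is worth confirming that each machine can now reach the other by name; this check is my own suggestion rather than part of the original post.
root@sony# ping -c 3 dell2.localdomain
root@dell2# ping -c 3 sony.localdomain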
3. Installation of software
Install the cluster packages (CMAN, the GFS2 and DLM modules and tools, and related utilities) on both computers:
root@your_machine# yum install -y cman gfs2-utils kmod-gfs kmod-dlm modcluster ricci luci cluster-snmp iscsi-initiator-utils openais oddjob rgmanager
DRBD is not in the standard CentOS repositories, so add the ELRepo repository:
root@your_machine# wget http://elrepo.org/elrepo-release-6-4.el6.elrepo.noarch.rpm
root@your_machine# rpm -Uvh elrepo-release-6-4.el6.elrepo.noarch.rpm
root@your_machine# gedit /etc/yum.repos.d/elrepo.repo
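The post does not show what is changed inside elrepo.repo; I assume the edit only inspects or toggles the enabled flag of the [elrepo] section. Either way, you can confirm yum sees the repository before the next step (my own check, not from the original):
root@your_machine# yum repolist all | grep -i elrepo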
root@your_machine# yum --enablerepo=elrepo install drbd83-utils kmod-drbd83
root@your_machine# gedit /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster alias="cluster-setup" config_version="1" name="cluster-setup">
  <rm log_level="4"/>
  <fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="sony.localdomain" nodeid="1" votes="1">
      <fence>
        <method name="2">
          <device name="LastResortNode01"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="dell2.localdomain" nodeid="2" votes="1">
      <fence>
        <method name="2">
          <device name="LastResortNode02"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_manual" name="LastResortNode01" nodename="sony.localdomain"/>
    <fencedevice agent="fence_manual" name="LastResortNode02" nodename="dell2.localdomain"/>
  </fencedevices>
  <rm/>
  <totem consensus="4800" join="60" token="10000" token_retransmits_before_loss_const="20"/>
</cluster>
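cluster.conf must be identical on both nodes. The original does not show how the file is distributed; one simple way is to edit it on one machine and copy it to the other (adjust the hostname if you edited it on dell2):
root@your_machine# scp /etc/cluster/cluster.conf root@dell2.localdomain:/etc/cluster/cluster.conf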
root@your_machine# gedit /etc/drbd.conf
global {
  usage-count yes;
}
common {
  syncer { rate 100M; }
}
resource res2 {
  protocol C;
  startup {
    wfc-timeout 20;
    degr-wfc-timeout 10;
    # we will keep this commented until tested successfully:
    # become-primary-on both;
  }
  net {
    # the encryption part can be omitted when using a dedicated link for DRBD only:
    # cram-hmac-alg sha1;
    # shared-secret anysecrethere123;
    allow-two-primaries;
  }
  on sony.localdomain {
    device    /dev/drbd2;
    disk      /dev/sda3;
    address   192.168.1.15:7789;
    meta-disk internal;
  }
  on dell2.localdomain {
    device    /dev/drbd2;
    disk      /dev/sda3;
    address   192.168.1.16:7789;
    meta-disk internal;
  }
  disk {
    fencing resource-and-stonith;
  }
  handlers {
    # outdate-peer "/sbin/handler";
  }
}
- 'resource' is the reference name used throughout the DRBD configuration. For ease of management I suggest matching names to device numbers, e.g. 'res2' for '/dev/drbd2' and 'res0' for '/dev/drbd0'; which device you pick does not matter.
- 'device' is the path of the DRBD block device. After DRBD is installed, the devices appear as /dev/drbd0, /dev/drbd1, ..., /dev/drbd9.
- 'disk' is the hard disk partition to be synchronized, the one we prepared in section 1; we will format it as GFS2 shortly.
- 'address' is the IP address of that computer; 7789 is the port conventionally used by DRBD. After saving the file on both machines, a quick syntax check is shown below.
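To confirm the configuration parses correctly, you can ask drbdadm to dump the resource back (this check is my own addition, not part of the original post):
root@your_machine# drbdadm dump res2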
root@your_machine# gedit /etc/init.d/drbd
root@your_machine# iptables -I OUTPUT -o eth0 -j ACCEPT
root@your_machine# iptables -I INPUT -i eth0 -j ACCEPT
root@your_machine# service iptables save
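The rules above accept all traffic on eth0, which is reasonable for a dedicated cross-over link. A tighter variant (my own, not from the original) would open only the DRBD port configured earlier, but remember that CMAN/corosync and DLM need their own ports too:
root@your_machine# iptables -I INPUT -i eth0 -p tcp --dport 7789 -j ACCEPT
root@your_machine# service iptables save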
4. DRBD initialization and first synchronization
4.1 Start the DRBD services on both computers
root@your_machine# service drbd start
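Before going further you can confirm that the kernel module is loaded and check the driver version; this check is my own addition.
root@your_machine# lsmod | grep drbd
root@your_machine# cat /proc/drbd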
Create the DRBD metadata on both computers:
root@your_machine# drbdadm create-md res2
If you get an error like “exited with code 40”:
Device size would be truncated, which
would corrupt data and result in
'access beyond end of device' errors.
You need to either
 * use external meta data (recommended)
 * shrink that filesystem first
 * zero out the device (destroy the filesystem)
Operation refused.
Command 'drbdmeta 0 v08 /dev/hdb1 internal create-md' terminated with exit code 40
drbdadm create-md ha: exited with code 40
it means the partition still carries an old filesystem signature. Zero out the start of the DRBD backing partition (the sample error above refers to /dev/hdb1 from another machine; in this setup the partition is /dev/sda3):
root@your_machine# dd if=/dev/zero of=/dev/sda3 bs=1M count=100
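After zeroing, re-run the metadata creation (the original does not repeat the command, but the step has to be done again):
root@your_machine# drbdadm create-md res2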
Bring the resource up on both computers and check the status:
root@your_machine# drbdadm up res2
root@your_machine# drbd-overview
1:res2 Connected Secondary/Secondary Inconsistent/Inconsistent C r----
Both sides now report Secondary and Inconsistent. Force one node, and only one, to become the initial primary so that the first full synchronization can start:
root@your_machine# drbdadm -- --overwrite-data-of-peer primary res2
Once the synchronization completes, drbd-overview reports:
1:res2 Connected Primary/Secondary UpToDate/UpToDate C r----
root@your_machine# gedit /etc/drbd.conf
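The post does not say what is changed here, but given the commented line in the drbd.conf above, the natural edit is to uncomment become-primary-on both now that the first synchronization has succeeded, so that both nodes are promoted to primary automatically at startup. Under that assumption the startup section becomes:
startup {
    wfc-timeout 20;
    degr-wfc-timeout 10;
    become-primary-on both;
}
Apply the same change on both machines, then either restart the drbd service or promote the second node by hand with 'drbdadm primary res2'.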
5. GFS2 formatting and mounting
Format the DRBD device as GFS2 on one node. The '-t' value takes the form clustername:fsname and must match the cluster name defined in cluster.conf (“cluster-setup”), and '-j 2' creates one journal per node:
root@your_machine# mkfs.gfs2 -p lock_dlm -t cluster-setup:res2 /dev/drbd2 -j 2
Stop NetworkManager (it is not supported on cluster nodes and interferes with the cluster's network handling) and start the cluster manager:
root@your_machine# /etc/init.d/NetworkManager stop
root@your_machine# service cman start
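Once cman is running on both nodes, you can confirm they see each other before mounting; this check is my own addition.
root@your_machine# cman_tool nodes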
root@your_machine# mkdir /mnt/ha
root@your_machine# mount -t gfs2 -o noatime /dev/drbd2 /mnt/ha
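The table of contents lists a final testing step that is not reproduced above. As a minimal sanity test of my own (not the original author's procedure), and assuming sony is the node that was made primary earlier, bring the second node up to the same state and check that a file written on one side appears on the other:
root@dell2# service cman start
root@dell2# drbdadm primary res2
root@dell2# mkdir /mnt/ha
root@dell2# mount -t gfs2 -o noatime /dev/drbd2 /mnt/ha
root@sony# touch /mnt/ha/hello-from-sony
root@dell2# ls -l /mnt/ha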