This Blog is to share our knowledge and expertise on Linux System Administration and VMware Administration

Showing posts with label RHEL Cluster. Show all posts
Showing posts with label RHEL Cluster. Show all posts

Thursday, October 15, 2015

What is split-brain condition in Red Hat Cluster?

Thursday, October 15, 2015
 We say a cluster has quorum if a majority of nodes are alive, communicating, and agree on the active cluster members.
    For example, in a thirteen-node cluster, quorum is only reached if seven or more nodes are communicating. If the seventh node dies, the cluster loses quorum and can no longer function.

  • A cluster must maintain quorum to prevent split-brain issues.
  • If quorum was not enforced, quorum, a communication error on that same thirteen-node cluster may cause a situation where six nodes are operating on the shared storage, while another six nodes are also operating on it, independently.
  • Because of the communication error, the two partial-clusters would overwrite areas of the disk and corrupt the file system.
  • With quorum rules enforced, only one of the partial clusters can use the shared storage, thus protecting data integrity.
  • Quorum doesn't prevent split-brain situations, but it does decide who is dominant and allowed to function in the cluster.
  • quorum can be determined by a combination of communicating messages via Ethernet and through a quorum disk.

What are Tie-breakers in Red Hat Cluster?

Thursday, October 15, 2015 0
Tie-breakers are additional heuristics that allow a cluster partition to decide whether or not it is quorate in the event of an even-split - prior to fencing.
  • With such a tie-breaker, nodes not only monitor each other, but also an upstream router that is on the same path as cluster communications.
  • If the two nodes lose contact with each other, the one that wins is the one that can still ping the upstream router.
  • That is why, even when using tie-breakers, it is important to ensure that fencing is configured correctly.CMAN has no internal tie-breakers for various reasons. However, tie-breakers can be implemented using the API

Thursday, October 1, 2015

How to Configure Linux Cluster with 2 Nodes on RedHat and CentOS

Thursday, October 01, 2015 0
In an active-standby Linux cluster configuration, all the critical services including IP, filesystem will failover from one node to another node in the cluster.

It explains how to create and configure two node redhat cluster using command line utilities.
The following are the high-level steps involved in configuring Linux cluster on Redhat or CentOS:
• Install and start RICCI cluster service
• Create cluster on active node
• Add a node to cluster
• Add fencing to cluster
• Configure fail over domain
• Add resources to cluster
• Sync cluster configuration across nodes
• Start the cluster
• Verify fail over by shutting down an active node

1. Required Cluster Packages

First make sure the following cluster packages are installed. If you don’t have these packages install them using yum command.
[root@rh1 ~]# rpm -qa | egrep -i "ricci|luci|cluster|ccs|cman"
modcluster-0.16.2-28.el6.x86_64
luci-0.26.0-48.el6.x86_64
ccs-0.16.2-69.el6.x86_64
ricci-0.16.2-69.el6.x86_64
cman-3.0.12.1-59.el6.x86_64
clusterlib-3.0.12.1-59.el6.x86_64

2. Start RICCI service and Assign Password

Next, start ricci service on both the nodes.

[root@rh1 ~]# service ricci start
Starting oddjobd: [ OK ]
generating SSL certificates... done
Generating NSS database... done
Starting ricci: [ OK ]

You also need to assign a password for the RICCI on both the nodes.

[root@rh1 ~]# passwd ricci
Changing password for user ricci.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
Also, If you are running iptables firewall, keep in mind that you need to have appropriate firewall rules on both the nodes to be able to talk to each other.

3. Create Cluster on Active Node

From the active node, please run the below command to create a new cluster.
The following command will create the cluster configuration file /etc/cluster/cluster.conf. If the file already exists, it will replace the existing
cluster.conf with the newly created cluster.conf.

[root@rh1 ~]# ccs -h rh1.mydomain.net --createcluster mycluster
rh1.mydomain.net password:

[root@rh1 ~]# ls -l /etc/cluster/cluster.conf
-rw-r-----. 1 root root 188 Sep 26 17:40 /etc/cluster/cluster.conf
Also keep in mind that we are running these commands only from one node on the cluster and we are not yet ready to propagate the changes
to the other node on the cluster.

4. Initial Plain cluster.conf File

After creating the cluster, the cluster.conf file will look like the following:
[root@rh1 ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="1" name="mycluster">
<fence_daemon/>
<clusternodes/>
<cman/>
<fencedevices/>
<rm>
<failoverdomains/>
<resources/>
</rm>
</cluster>


5. Add a Node to the Cluster

Once the cluster is created, we need to add the participating nodes to the cluster using the ccs command as shown below.
First, add the first node rh1 to the cluster as shown below.

[root@rh1 ~]# ccs -h rh1.mydomain.net --addnode rh1.mydomain.net
Node rh1.mydomain.net added.

Next, add the second node rh2 to the cluster as shown below.
[root@rh1 ~]# ccs -h rh1.mydomain.net --addnode rh2.mydomain.net
Node rh2.mydomain.net added.
Once the nodes are created, you can use the following command to view all the available nodes in the cluster. This will also display the node
id for the corresponding node.

[root@rh1 ~]# ccs -h rh1 --lsnodes
rh1.mydomain.net: nodeid=1
rh2.mydomain.net: nodeid=2

6. cluster.conf File After Adding Nodes

This above will also add the nodes to the cluster.conf file as shown below.

[root@rh1 ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="3" name="mycluster">
<fence_daemon/>
<clusternodes>
<clusternode name="rh1.mydomain.net" nodeid="1"/>
<clusternode name="rh2.mydomain.net" nodeid="2"/>
</clusternodes>
<cman/>
<fencedevices/>
<rm>
<failoverdomains/>
<resources/>
</rm>
</cluster>

7. Add Fencing to Cluster

Fencing is the disconnection of a node from shared storage. Fencing cuts off I/O from shared storage, thus ensuring data integrity.
A fence device is a hardware device that can be used to cut a node off from shared storage.
This can be accomplished in a variety of ways: powering off the node via a remote power switch, disabling a Fiber Channel switch port, or revoking a host’s SCSI 3 reservations.
A fence agent is a software program that connects to a fence device in order to ask the fence device to cut off access to a node’s shared storage (via powering off the node or removing access to the shared storage by other means).
Execute the following command to enable fencing.

[root@rh1 ~]# ccs -h rh1 --setfencedaemon post_fail_delay=0
[root@rh1 ~]# ccs -h rh1 --setfencedaemon post_join_delay=25
Next, add a fence device. There are different types of fencing devices available. If you are using virtual machine to build a cluster, use
fence_virt device as shown below.

[root@rh1 ~]# ccs -h rh1 --addfencedev myfence agent=fence_virt
Next, add fencing method. After creating the fencing device, you need to created the fencing method and add the hosts to the fencing method.
[root@rh1 ~]# ccs -h rh1 --addmethod mthd1 rh1.mydomain.net
Method mthd1 added to rh1.mydomain.net.
[root@rh1 ~]# ccs -h rh1 --addmethod mthd1 rh2.mydomain.net
Method mthd1 added to rh2.mydomain.net.

Finally, associate fence device to the method created above as shown below:
[root@rh1 ~]# ccs -h rh1 --addfenceinst myfence rh1.mydomain.net mthd1
[root@rh1 ~]# ccs -h rh1 --addfenceinst myfence rh2.mydomain.net mthd1

8. cluster.conf File after Fencing

Your cluster.conf will look like below after the fencing devices, methods are added.
[root@rh1 ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="10" name="mycluster">
<fence_daemon post_join_delay="25"/>
<clusternodes>
<clusternode name="rh1.mydomain.net" nodeid="1">
<fence>
<method name="mthd1">
<device name="myfence"/>
</method>
</fence>
</clusternode>
<clusternode name="rh2.mydomain.net" nodeid="2">
<fence>
<method name="mthd1">
<device name="myfence"/>
</method>
</fence>
</clusternode>
</clusternodes>
<cman/>
<fencedevices>
<fencedevice agent="fence_virt" name="myfence"/>

</fencedevices>
<rm>
<failoverdomains/>
<resources/>
</rm>
</cluster>

9. Types of Failover Domain

A failover domain is an ordered subset of cluster members to which a resource group or service may be bound.
The following are the different types of failover domains:
• Restricted failover-domain: Resource groups or service bound to the domain may only run on cluster members which are also members
of the failover domain. If no members of failover domain are availables, the resource group or service is placed in stopped state.
• Unrestricted failover-domain: Resource groups bound to this domain may run on all cluster members, but will run on a member of the
domain whenever one is available. This means that if a resource group is running outside of the domain and member of the domain
transitions online, the resource group or
• service will migrate to that cluster member.
• Ordered domain: Nodes in the ordered domain are assigned a priority level from 1-100. Priority 1 being highest and 100 being the
lowest. A node with the highest priority will run the resource group. The resource if it was running on node 2, will migrate to node 1
when it becomes online.
• Unordered domain: Members of the domain have no order of preference. Any member may run in the resource group. Resource group
will always migrate to members of their failover domain whenever possible.

10. Add a Filover Domain

To add a failover domain, execute the following command. In this example, I created domain named as “webserverdomain”,
[root@rh1 ~]# ccs -h rh1 --addfailoverdomain webserverdomain ordered
Once the failover domain is created, add both the nodes to the failover domain as shown below:
[root@rh1 ~]# ccs -h rh1 --addfailoverdomainnode webserverdomain rh1.mydomain.net priority=1
[root@rh1 ~]# ccs -h rh1 --addfailoverdomainnode webserverdomain rh2.mydomain.net priority=2
You can view all the nodes in the failover domain using the following command.
[root@rh1 ~]# ccs -h rh1 --lsfailoverdomain
webserverdomain: restricted=0, ordered=1, nofailback=0
rh1.mydomain.net: 1
rh2.mydomain.net: 2

11. Add Resources to Cluster

Now it is time to add a resources. This indicates the services that also should failover along with ip and filesystem when a node fails. For
example, the Apache webserver can be part of the failover in the Redhat Linux Cluster.
When you are ready to add resources, there are 2 ways you can do this.
You can add as global resources or add a resource directly to resource group or service.
The advantage of adding it as global resource is that if you want to add the resource to more than one service group you can just reference the global resource on your service or resource group. In this example, we added the filesystem on a shared storage as global resource and referenced it on the service.
[root@rh1 ~]# ccs –h rh1 --addresource fs name=web_fs device=/dev/cluster_vg/vol01 mountpoint=/var/www fstype=ext4
To add a service to the cluster, create a service and add the resource to the service.
[root@rh1 ~]# ccs -h rh1 --addservice webservice1 domain=webserverdomain recovery=relocate autostart=1
Now add the following lines in the cluster.conf for adding the resource references to the service. In this example, we also added failover IP to
our service.
<fs ref="web_fs"/>
<ip address="192.168.1.12" monitor_link="yes" sleeptime="10"/>