This Blog is to share our knowledge and expertise on Linux System Administration and VMware Administration

Monday, September 14, 2015

Redhat Cluster Interview Questions and Answers - Linvirtshell

1. How do you freeze a service on a cluster node?
    # clusvcadm -Z service_name

2. What is CMAN

  • Basically, cluster manager is a component of the cluster project that handles communications between nodes in the cluster.
  • CMAN is Cluster Manager. It manages cluster quorum and cluster membership.
  • CMAN runs on each node of a cluster
3. What is RGManager
  • RGManager manages and provides failover capabilities for collections of cluster resources called services, resource groups, or resource trees.
  • In the event of a node failure, RGManager will relocate the clustered service to another node with minimal service disruption. You can also restrict services to certain nodes, such as restricting httpd to one group of nodes while mysql can be restricted to a separate set of nodes.
  • When the cluster membership changes, openais tells the cluster that it needs to recheck its resources. This causes rgmanager, the resource group manager, to run. It examines what changed and then starts, stops, migrates, or recovers cluster resources as needed.
  • Within rgmanager, one or more resources are brought together as a service. This service is then optionally assigned to a failover domain, a subset of nodes that can have preferential ordering.
4. What is Cluster Quorum
  • Quorum is a voting algorithm used by CMAN.
  • CMAN keeps track of cluster quorum by monitoring the number of nodes in the cluster.
  • If more than half of the members of a cluster are active, the cluster is said to be in quorum.
  • If half or fewer of the members are active, the cluster is said to be down (it has lost quorum) and all cluster activities are stopped.
  • Quorum is defined as the minimum set of hosts required in order to provide service and is used to prevent split-brain situations.
  • The quorum algorithm used by the RHCS cluster is called “simple majority quorum”, which means that more than half of the hosts must be online and communicating in order to provide service.
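
The simple majority rule can be sketched as a small shell calculation. This is only an illustration, assuming every node carries exactly one vote; the function names quorum_threshold and is_quorate are made up for this example and are not part of any cluster tool.

```shell
#!/bin/sh
# Illustrative only: assumes every node carries exactly one vote.

# Smallest number of active nodes that is "more than half" of the total.
# Usage: quorum_threshold TOTAL_NODES
quorum_threshold() {
    echo $(( $1 / 2 + 1 ))
}

# Succeeds (exit 0) when the active node count reaches the threshold.
# Usage: is_quorate ACTIVE_NODES TOTAL_NODES
is_quorate() {
    [ "$1" -ge "$(quorum_threshold "$2")" ]
}

quorum_threshold 13          # prints 7
if is_quorate 2 4; then
    echo "quorate"
else
    echo "not quorate"       # 2 of 4 is exactly half, which is not enough
fi
```

Note that an even split never satisfies the rule, which is why a plain two-node cluster needs special handling.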
5. What is split-brain
  • It is a condition where two instances of the same cluster are running and trying to access the same resource at the same time, which corrupts cluster integrity.
  • A cluster must maintain quorum to prevent split-brain problems. If quorum were not enforced, a communication error on a thirteen-node cluster could cause a situation where six nodes are operating on the shared disk and another six are also operating on it, independently.
  • Because of the communication error, the two partial clusters would overwrite areas of the disk and corrupt the file system.
  • With quorum rules enforced, only one of the partial clusters can use the shared storage, thus protecting data integrity.
  • Strictly speaking, quorum doesn't prevent split-brain situations from arising, but it does decide who is dominant and allowed to function in the cluster. Should split-brain occur, quorum prevents more than one cluster group from doing anything.
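
To see why at most one partial cluster can keep running, here is a rough sketch that checks each side of a split against the majority rule. It assumes one vote per node, and partition_status is a hypothetical helper written for this post, not a cluster command.

```shell
#!/bin/sh
# Illustrative only: one vote per node, simple majority quorum.

# Prints "quorate" if a partition of PART nodes out of TOTAL holds a
# strict majority, otherwise "inquorate".
# Usage: partition_status PART TOTAL
partition_status() {
    if [ "$1" -gt $(( $2 / 2 )) ]; then
        echo "quorate"
    else
        echo "inquorate"
    fi
}

# A 13-node cluster splits 7 / 6: only the larger side keeps running.
echo "7-node side: $(partition_status 7 13)"
echo "6-node side: $(partition_status 6 13)"

# One node is down and the rest split 6 / 6: neither side has the 7 votes.
echo "6-node side: $(partition_status 6 13)"
```

Two partitions can never both hold more than half of the votes, so at most one of them is allowed to touch the shared storage.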
6. What is Fencing
  Fencing is the disconnection of a node from the cluster’s shared storage. Fencing cuts off I/O from shared storage, thus ensuring data integrity.
  The cluster infrastructure performs fencing through the fence daemon, fenced.

  • Power fencing — A fencing method that uses a power controller to power off an inoperable node.
  • Storage fencing — A fencing method that disables the Fibre Channel port that connects storage to an inoperable node.
  • Other fencing — Several other fencing methods that disable I/O or power of an inoperable node, including IBM Bladecenters, PAP, DRAC/MC, HP ILO, IPMI, IBM RSA II, and others.
7. What can cause a node to leave the cluster?
      A node may leave the cluster for many reasons. Among them:
  • Shutdown: cman_tool leave was run on this node
  • Killed by another node: The node was killed by either cman_tool kill or qdisk.
  • Panic: cman failed to allocate memory for a critical data structure or hit some other very bad internal failure.
  • Removed: Like Shutdown, but the remainder of the cluster can adjust quorum downwards to keep working.
  • Membership Rejected: The node attempted to join a cluster but its cluster.conf file did not match that of the other nodes. To find the real reason, examine the syslog of all the valid cluster members to see why it was rejected.
  • Inconsistent cluster view: This is usually indicative of a bug but it can also happen if the network is extremely unreliable.
  • Missed too many heartbeats: This means what it says. All nodes are expected to broadcast a heartbeat every 5 seconds (by default). If none is received within
8. How can I define a two-node cluster if a majority is needed to reach quorum?
       Two-node clusters are allowed through a special exception to the quorum rules. The special "two_node" setting in the /etc/cluster/cluster.conf file looks like this:
       <cman expected_votes="1" two_node="1"/>
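
A quick way to check whether a configuration file carries this exception is to look for both attributes on the cman element. The sketch below is only an illustration: two_node_enabled is a hypothetical helper, and the demo runs against a throwaway file rather than the live configuration.

```shell
#!/bin/sh
# Succeeds when a cluster.conf-style file sets both two_node="1" and
# expected_votes="1" on its <cman> element.
# Usage: two_node_enabled /path/to/cluster.conf
two_node_enabled() {
    grep -q '<cman[^>]*two_node="1"' "$1" &&
    grep -q '<cman[^>]*expected_votes="1"' "$1"
}

# Demo against a temporary file, not the real configuration.
demo_conf=$(mktemp)
printf '%s\n' '<cman expected_votes="1" two_node="1"/>' > "$demo_conf"
if two_node_enabled "$demo_conf"; then
    echo "two-node mode is configured"
fi
rm -f "$demo_conf"
```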

9. How can you define a cluster and what are its basic types?

       A cluster is two or more computers (called nodes or members) that work together to perform a task. There are four major types of clusters:
  • Storage
  • High availability
  • Load balancing
  • High performance

10. What is Storage Cluster?

  • Storage clusters provide a consistent file system image across servers in a cluster, allowing the servers to simultaneously read and write to a single shared file system.
  • Storage cluster simplifies storage administration by limiting the installation and patching of applications to one filesystem.
  • The High Availability Add-On provides storage clustering in conjunction with Red Hat GFS2
11. What is High Availability Cluster?
  • High availability clusters provide highly available services by eliminating single points of failure and by failing over services from one cluster node to another in case a node becomes inoperative.
  • Typically, services in a high availability cluster read and write data (via read-write mounted file systems).
  • A high availability cluster must maintain data integrity as one cluster node takes over control of a service from another cluster node.
  • Node failures in a high availability cluster are not visible from clients outside the cluster.
  • High availability clusters are sometimes referred to as failover clusters.
12. What is Load Balancing Cluster?
  • Load-balancing clusters dispatch network service requests to multiple cluster nodes to balance the request load among the cluster nodes.
  • Load balancing provides cost-effective scalability because you can match the number of nodes according to load requirements. If a node in a load-balancing cluster becomes inoperative, the load-balancing software detects the failure and redirects requests to other cluster nodes.
  • Node failures in a load-balancing cluster are not visible from clients outside the cluster.
  • Load balancing is available with the Load Balancer Add-On.
13. What is a High Performance Cluster?
  • High-performance clusters use cluster nodes to perform concurrent calculations.
  • A high-performance cluster allows applications to work in parallel, therefore enhancing the performance of the applications.
  • High performance clusters are also referred to as computational clusters or grid computing.
14. How many nodes are supported in a Red Hat 6 Cluster?
     A cluster configured with qdiskd supports a maximum of 16 nodes. The limit exists for scalability reasons: increasing the node count increases the amount of synchronous I/O contention on the shared quorum disk device.

15. What is the minimum size of the Quorum Disk?
    The minimum size of the block device is 10 MB.

16. What is the order in which you will start the Red Hat Cluster services?
In Red Hat 4
    service ccsd start
    service cman start
    service fenced start
    service clvmd start (If CLVM has been used to create clustered volumes)
    service gfs start
    service rgmanager start
In Red Hat 5
    service cman start
    service clvmd start
    service gfs start
    service rgmanager start
In Red Hat 6
    service cman start
    service clvmd start
    service gfs2 start
    service rgmanager start

17. What is the order to stop the Red Hat Cluster services?
In Red Hat 4
    service rgmanager stop
    service gfs stop
    service clvmd stop
    service fenced stop
    service cman stop
    service ccsd stop
In Red Hat 5
    service rgmanager stop
    service gfs stop
    service clvmd stop
    service cman stop
In Red Hat 6
    service rgmanager stop
    service gfs2 stop
    service clvmd stop
    service cman stop
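
The stop order in each release is simply the start order reversed. Here is a small dry-run sketch of that relationship. The service names come from the lists above; start_order and stop_order are hypothetical helpers, and nothing is actually started or stopped.

```shell
#!/bin/sh
# Start order per release, as listed above (dry run: nothing is started).
# Usage: start_order RELEASE
start_order() {
    case "$1" in
        4) echo "ccsd cman fenced clvmd gfs rgmanager" ;;
        5) echo "cman clvmd gfs rgmanager" ;;
        6) echo "cman clvmd gfs2 rgmanager" ;;
        *) echo "unknown release: $1" >&2; return 1 ;;
    esac
}

# Stop order is the start order reversed.
# Usage: stop_order RELEASE
stop_order() {
    order=$(start_order "$1") || return 1
    reversed=""
    for svc in $order; do
        reversed="$svc${reversed:+ $reversed}"
    done
    echo "$reversed"
}

# Print the stop commands for Red Hat 6 instead of running them.
for svc in $(stop_order 6); do
    echo "service $svc stop"
done
```

Skip clvmd and gfs/gfs2 in either direction if the cluster does not use clustered volumes or GFS file systems.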

18. What are the lock states in Red Hat Cluster?

     A lock state indicates the current status of a lock request. A lock is always in one of three states:
  • Granted — The lock request succeeded and attained the requested mode.
  • Converting — A client attempted to change the lock mode and the new mode is incompatible with an existing lock.
  • Blocked — The request for a new lock could not be granted because conflicting locks exist.
    A lock's state is determined by its requested mode and the modes of the other locks on the same resource.

19. What is the maximum file system support size for GFS2?

  • GFS2 is based on 64 bit architecture, which can theoretically accommodate an 8 EB file system.
  • However, the current supported maximum size of a GFS2 file system for 64-bit hardware is 100 TB.
  • The current supported maximum size of a GFS2 file system for 32-bit hardware for Red Hat Enterprise Linux Release 5.3 and later is 16 TB.

     NOTE: It is better to have 10 1TB file systems than one 10TB file system.

20. What is the journalling file system?
  • A journalling file system is a file system that maintains a special file called a journal that is used to repair any inconsistencies that occur as the result of an improper shutdown of a computer.
  • In GFS2, every time the file system writes metadata, the metadata is committed to the journal before it is put into place.
  • This ensures that if the system crashes or loses power, you will recover all of the metadata when the journal is automatically replayed at mount time.
  • GFS2 requires one journal for each node in the cluster that needs to mount the file system. For example, if you have a 16-node cluster but need to mount the file system from only two nodes, you need only two journals. If you need to mount from a third node, you can always add a journal with the gfs2_jadd command.
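
The one-journal-per-mounting-node rule can be turned into a small calculation. This sketch works out how many journals are missing and prints the matching gfs2_jadd command as a dry run. journals_to_add is a hypothetical helper and /mnt/gfs2 is an assumed mount point.

```shell
#!/bin/sh
# Prints how many journals must be added so that NODES nodes can mount a
# file system that currently has JOURNALS journals (never negative).
# Usage: journals_to_add NODES JOURNALS
journals_to_add() {
    if [ "$1" -gt "$2" ]; then
        echo $(( $1 - $2 ))
    else
        echo 0
    fi
}

# Two journals exist and a third node needs to mount the file system.
missing=$(journals_to_add 3 2)
if [ "$missing" -gt 0 ]; then
    # Dry run: print the command instead of executing it.
    # /mnt/gfs2 is a hypothetical mount point.
    echo "gfs2_jadd -j $missing /mnt/gfs2"
fi
```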

21. What is the default size of journals in GFS?

  • When you run mkfs.gfs2 to create a GFS2 file system without specifying a journal size, a 128MB journal is created by default, which is enough for most applications.
  • Reducing the size of the journal can severely affect performance.
  • For example, if you reduce the journal to 32MB, it does not take much file system activity to fill it, and when the journal is full, performance slows because GFS2 has to wait for writes to the storage.
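
As a rough illustration of how journal count and journal size combine, the sketch below totals the journal space and prints a mkfs.gfs2 command as a dry run. journal_space_mb is a hypothetical helper, and the lock table mycluster:mygfs2 and the device /dev/vg01/lvol0 are made-up names.

```shell
#!/bin/sh
# Total journal space in MB for JOURNALS journals of SIZE_MB each.
# Usage: journal_space_mb JOURNALS SIZE_MB
journal_space_mb() {
    echo $(( $1 * $2 ))
}

journals=2          # one journal per node that mounts the file system
size_mb=128         # default journal size

echo "journal space: $(journal_space_mb "$journals" "$size_mb") MB"

# Dry run: print the mkfs.gfs2 command instead of running it.
# mycluster:mygfs2 and /dev/vg01/lvol0 are hypothetical names.
echo "mkfs.gfs2 -p lock_dlm -t mycluster:mygfs2 -j $journals -J $size_mb /dev/vg01/lvol0"
```

Here -j sets the number of journals and -J sets the size of each journal in megabytes.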
22. What is the DLM lock model?
  • DLM stands for Distributed Lock Manager.
  • A lock manager is a traffic cop that controls access to resources in the cluster, such as access to a GFS file system.
  • GFS2 uses locks from the lock manager to synchronize access to file system metadata (on shared storage).
  • CLVM uses locks from the lock manager to synchronize updates to LVM volumes and volume groups (also on shared storage).
  • In addition, rgmanager uses DLM to synchronize service states.
  • Without a lock manager, there would be no control over access to your shared storage, and the nodes in the cluster would corrupt each other's data.
23. What is rgmanager in Red Hat Cluster and its use?
  • rgmanager is the Resource Group Manager service.
  • RGManager manages and provides failover capabilities for collections of cluster resources called services, resource groups, or resource trees. It allows administrators to define, configure, and monitor cluster services.
  • In the event of a node failure, rgmanager will relocate the clustered service to another node with minimal service disruption.
24. What is luci and ricci in Red Hat Cluster?
  • Luci is the server component of the Conga administration utility.
  • Conga is an integrated set of software components that provides centralized configuration and management of Red Hat clusters and storage.
  • Luci is a server that runs on one computer and communicates with multiple clusters and computers via ricci.
  • Ricci is the client component of the Conga administration utility.
  • Ricci is an agent that runs on each computer (either a cluster member or a standalone computer) managed by Conga.
  • This service needs to be running on all the client nodes of the cluster.
25. What is cman in Red Hat Cluster?
  • This is an abbreviation used for Cluster Manager.
  • CMAN is a distributed cluster manager and runs in each cluster node.
  • It is responsible for monitoring, heartbeat, quorum, voting and communication between cluster nodes.
  • CMAN keeps track of cluster quorum by monitoring the count of cluster nodes.
26. What are the different port numbers used in Red Hat Cluster?
    IP Port No     Protocol    Component
    5404, 5405     UDP         corosync/cman
    11111          TCP         ricci
    21064          TCP         dlm (Distributed Lock Manager)
    16851          TCP         modclusterd
    8084           TCP         luci
    4196, 4197     TCP         rgmanager
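
If a firewall sits between the nodes, these ports have to be opened. The following dry-run sketch prints one iptables rule per entry of the port table. It is an illustration only: cluster_ports and print_firewall_rules are hypothetical helpers, the rules are printed rather than applied, and a real setup would normally also restrict the source addresses.

```shell
#!/bin/sh
# Port table from the answer above, one "ports protocol component" per line.
cluster_ports() {
    cat <<'EOF'
5404,5405 udp corosync/cman
11111 tcp ricci
21064 tcp dlm
16851 tcp modclusterd
8084 tcp luci
4196,4197 tcp rgmanager
EOF
}

# Dry run: print one iptables rule per entry instead of applying it.
print_firewall_rules() {
    cluster_ports | while read -r ports proto component; do
        echo "iptables -I INPUT -p $proto -m multiport --dports $ports -j ACCEPT  # $component"
    done
}

print_firewall_rules
```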

27. How does the NetworkManager service affect Red Hat Cluster?
     The use of NetworkManager is not supported on cluster nodes. If you have installed NetworkManager on your cluster nodes, you should either remove it or disable it.
     # service NetworkManager stop
     # chkconfig NetworkManager off
     The cman service will not start if NetworkManager is either running or has been configured to run with the chkconfig command.

28. What is the command used to relocate a service to another node?
    # clusvcadm -r service_name -m node_name
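
When relocations are scripted, it helps to build the command in one place and refuse to run with a missing argument. This is a minimal dry-run sketch: relocate_cmd is a hypothetical wrapper, and webserver and node2.example.com are made-up names.

```shell
#!/bin/sh
# Dry run: prints the clusvcadm relocate command instead of executing it.
# Usage: relocate_cmd SERVICE_NAME NODE_NAME
relocate_cmd() {
    if [ -z "$1" ] || [ -z "$2" ]; then
        echo "usage: relocate_cmd SERVICE_NAME NODE_NAME" >&2
        return 1
    fi
    echo "clusvcadm -r $1 -m $2"
}

# webserver and node2.example.com are hypothetical names.
relocate_cmd webserver node2.example.com
```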