This Blog is to share our knowledge and expertise on Linux System Administration and VMware Administration

Monday, December 21, 2015

VMware: How to rollback ESXi 5.1 to 5.0

Monday, December 21, 2015
If you run into issues after upgrading from ESXi 5.0 to ESXi 5.1, you can roll back to the previous version as follows.

Reboot the host and press Shift+R while the hypervisor is loading to start Recovery Mode. The following is displayed:
Installed hypervisors:
HYPERVISOR1: 5.0.0-623860
HYPERVISOR2: 5.1.0-799733 (Default)
CURRENT DEFAULT HYPERVISOR WILL BE REPLACED PERMANENTLY
DO YOU REALLY WANT TO ROLL BACK?

Press Y to start the roll back

Result:
The host is downgraded and back online again with VMware vSphere ESXi 5.0.0

How to Disable the interrupt remapping on ESXi

Monday, December 21, 2015
ESXi/ESX 4.1

To disable interrupt remapping on ESXi/ESX 4.1, perform one of these options:

    Run this command from a console or SSH session to disable interrupt remapping:

    # esxcfg-advcfg -k TRUE iovDisableIR

    To back up the current configuration, run this command twice:

    # auto-backup.sh

    Note: It must be run twice to save the change.

    Reboot the ESXi/ESX host:

    # reboot

    To check that interrupt remapping is disabled after the reboot, run this command:

    # esxcfg-advcfg -j iovDisableIR

    The output should be:

    iovDisableIR=TRUE

    In the vSphere Client:
        Click Configuration > (Software) Advanced Settings > VMkernel.
        Click VMkernel.Boot.iovDisableIR, then click OK.
        Reboot the ESXi/ESX host.

ESXi 5.x and ESXi 6.0.x

ESXi 5.x and ESXi 6.0.x do not expose this parameter as a configurable option in the GUI client. It can only be changed with the esxcli command or through PowerCLI.


    To set interrupt remapping using the esxcli command:

    List the current setting by running the command:

    # esxcli system settings kernel list -o iovDisableIR

    The output is similar to:

    Name          Type  Description                              Configured  Runtime  Default
    ------------  ----  ---------------------------------------  ----------  -------  -------
    iovDisableIR  Bool  Disable Interrupt Routing in the IOMMU   FALSE        FALSE    FALSE

    Disable interrupt remapping on the host using this command:

    # esxcli system settings kernel set --setting=iovDisableIR -v TRUE

    Reboot the host after running the command.

    Note: If the hostd service has failed or is not running, the esxcli command does not work. In that case you may have to use localcli instead. However, changes made with localcli do not persist across reboots, so repeat the configuration change with esxcli once the host has rebooted and hostd is responding again.

    To set interrupt remapping through PowerCLI:

    Note: The PowerCLI commands do not work with ESXi 5.1. You must use the esxcli commands as detailed above.

    PowerCLI> Connect-VIServer -Server 10.21.69.233 -User Administrator -Password passwd
    PowerCLI> $myesxcli = Get-EsxCli -VMHost 10.21.69.111
    PowerCLI> $myesxcli.system.settings.kernel.list("iovDisableIR")

    Configured  : FALSE
    Default     : FALSE
    Description : Disable Interrupt Routing in the IOMMU
    Name        : iovDisableIR
    Runtime     : FALSE
    Type        : Bool

    PowerCLI> $myesxcli.system.settings.kernel.set("iovDisableIR","TRUE")
    true

    PowerCLI> $myesxcli.system.settings.kernel.list("iovDisableIR")

    Configured  : TRUE
    Default     : FALSE
    Description : Disable Interrupt Routing in the IOMMU
    Name        : iovDisableIR
    Runtime     : FALSE
    Type        : Bool

    Reboot the host so that the configured value takes effect. After the host has finished booting, this entry in the /var/log/boot.gz log file confirms that interrupt remapping has been disabled:

    TSC: 543432 cpu0:0)BootConfig: 419: iovDisableIR = TRUE
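
The Configured and Runtime columns in the esxcli output only agree once the host has been rebooted, so a small script can flag hosts where the change is still pending. The Python sketch below is illustrative only. It assumes the table layout shown in the sample output above, and the names parse_kernel_setting and reboot_pending are hypothetical helpers, not part of any VMware tool.

```python
def parse_kernel_setting(output):
    """Parse the data row printed by `esxcli system settings kernel list -o <name>`.

    Assumption: the output has the column layout shown in the sample above.
    """
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and the dashed separator row.
        if not fields or fields[0] == "Name" or set(fields[0]) == {"-"}:
            continue
        # The description contains spaces, so read the last three columns from the right.
        return {
            "name": fields[0],
            "type": fields[1],
            "configured": fields[-3],
            "runtime": fields[-2],
            "default": fields[-1],
        }
    raise ValueError("no setting row found in esxcli output")


def reboot_pending(setting):
    """Return True when the configured value has not yet taken effect at runtime."""
    return setting["configured"] != setting["runtime"]
```

Feeding it the output captured right after the set command (Configured TRUE, Runtime FALSE) should report that a reboot is still pending.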

How to resolve - vCenter Server task migration fails with the error: Failed to create journal file provider, Failed to open for write

Monday, December 21, 2015
For vCenter Server

The journal files for vCenter Server on Windows are located at:

    Windows 2003 and earlier – %ALLUSERSPROFILE%\Application Data\VMware\VMware VirtualCenter\journal\
    Windows 2008 and later – %PROGRAMDATA%\VMware\VMware VirtualCenter\journal\

Symptoms

    Provisioning operations such as vMotion, Clone, Migrate, or Storage DRS fail.
    You cannot create new virtual machines.
    You cannot add an RDM disk to a virtual machine.
    vCenter Server task migration fails with the error:

    A general system error occurred: Failed to create journal file provider
    Failed to open "<filename>" for write

Cause

This issue occurs if there is not enough free disk space to store the journal information. The management components in vCenter Server and ESX/ESXi record transactions to journal files when tracking long-running operations. The path and filename cited in the error message indicate the layer that failed to create a journal file.

Resolution

Delete or archive unnecessary files on this filesystem to free up disk space. Depending on your vCenter Server implementation, a minimum of 40 GB of free disk space on the system is recommended.
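
To catch the low-space condition before provisioning tasks start failing, free space on the volume that holds the journal directory can be checked periodically. This is a minimal Python sketch using only the standard library. The 40 GB default comes from the recommendation above, while the helper names free_gib and has_enough_free_space are made up for illustration.

```python
import shutil

GIB = 1024 ** 3  # bytes per gibibyte


def free_gib(path):
    """Return the free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / GIB


def has_enough_free_space(path, minimum_gib=40):
    """Compare free space with a threshold (default: the 40 GB figure above)."""
    return free_gib(path) >= minimum_gib
```

On the vCenter Server you would pass the journal directory (for example, the %PROGRAMDATA% path listed earlier, expanded with os.path.expandvars) and alert when the function returns False.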



Explain about VSS writers in Virtual Machines and how to disable the specific VSS writers with VMware Tools

Monday, December 21, 2015
VMware products may require file systems within a guest operating system to be quiesced prior to a snapshot operation for the purposes of backup and data integrity.
VMware products which use quiesced snapshots include, but are not limited to, VMware Consolidated Backup and VMware Data Recovery.
As of ESX 3.5 Update 2, quiescing can be done by Microsoft Volume Shadow Copy Service (VSS), which is available in Windows Server 2003.


Operating systems which do not have VSS make use of the SYNC driver for quiescing operations. When VSS is invoked, all VSS providers must be running. If there is an issue with any third-party providers or the VSS service itself, the snapshot operation may fail.
Before verifying a VSS quiescing issue, ensure that you are able to create a manual non-quiesced snapshot using the vSphere Snapshot Manager.


With vSphere 4.0, VMware introduced the ability to disable specific VSS writers, which helps when troubleshooting an issue with a particular writer.

If backups of a virtual machine that rely on snapshots are failing and you have traced the problem to a specific VSS writer inside the guest, this post explains how to stop that writer from being called during a snapshot operation.

To prevent a specific VSS writer from being called during a snapshot operation:

    Determine the name of the VSS writer that you want to exclude from the snapshot operation. Run this command from within Windows:
    vssadmin list writers
    Note: With Windows Vista, 7, and 2008 the command prompt may need to be run with administrator elevation.

    You see output similar to:

    Writer name: 'Task Scheduler Writer'
       Writer Id: {d61d61c8-d73a-4eee-8cdd-f6f9786b7124}
       Writer Instance Id: {1bddd48e-5052-49db-9b07-b96f96727e6b}
       State: [1] Stable
       Last error: No error
    Note: The writer name is the quoted text after "Writer name:" (in this example, Task Scheduler Writer).

    Create or edit the vmbackup.conf file, which is located in %ALLUSERSPROFILE%\Application Data\VMware\VMware Tools\.

    Note: If the vmbackup.conf file does not exist, create it.

    Place the name of each VSS writer you want to disable on its own line. For example:

    Task Scheduler Writer
    NTDS
    SqlServerWriter
    Microsoft Exchange Replica Writer
    Microsoft Exchange Writer

    Restart the VMware Tools service.

    When the writer issue has been resolved, remove the writer from the vmbackup.conf file.

Note: VMware does not provide these VSS writers. Engage the provider of the VSS writer to troubleshoot the writer issue to ensure application consistency with the writer.
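
The two manual steps above (reading writer names out of the vssadmin output, then listing them one per line) can be sketched in Python as follows. This is only an illustration under the assumption that the output looks like the sample shown above. The helper names writer_names and render_vmbackup_conf are hypothetical.

```python
import re

# Matches lines such as:  Writer name: 'Task Scheduler Writer'
WRITER_NAME = re.compile(r"^\s*Writer name:\s*'(.+)'\s*$")


def writer_names(vssadmin_output):
    """Extract every writer name from `vssadmin list writers` output."""
    names = []
    for line in vssadmin_output.splitlines():
        match = WRITER_NAME.match(line)
        if match:
            names.append(match.group(1))
    return names


def render_vmbackup_conf(names):
    """Build vmbackup.conf content: one writer name per line."""
    return "".join(name + "\n" for name in names)
```

In practice you would pass only the writers you have identified as problematic to render_vmbackup_conf, not the full list, and write the result to vmbackup.conf yourself.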

Wednesday, December 9, 2015

Explain about SysRq command and How to reboot the hanged physical Linux & Xen Linux VM Server

Wednesday, December 09, 2015
The magic SysRq key is a key combination in the Linux kernel which allows the user to perform various low level commands regardless of the system’s state.

It is often used to recover from freezes or to reboot a computer without corrupting the filesystem. The key combination is Alt+SysRq+<command key>. On many keyboards the SysRq key is the Print Screen key.

First, you need to enable the SysRq key, as shown below.

echo "1" > /proc/sys/kernel/sysrq

List of SysRq Command Keys

Following are the command keys available for Alt+SysRq+commandkey.

    ‘k’ – Kills all processes running on the current virtual console.
    ‘s’ – Attempts to sync all mounted filesystems.
    ‘b’ – Immediately reboots the system, without syncing or unmounting filesystems.
    ‘e’ – Sends SIGTERM to all processes except init.
    ‘m’ – Outputs current memory information to the console.
    ‘i’ – Sends SIGKILL to all processes except init.
    ‘r’ – Switches the keyboard from raw mode (the mode used by programs such as X11) to XLATE mode.
    ‘t’ – Outputs a list of current tasks and their information to the console.
    ‘u’ – Remounts all mounted filesystems read-only.
    ‘o’ – Powers off the system immediately.
    ‘p’ – Prints the current registers and flags to the console.
    ‘0-9’ – Sets the console log level, which controls which kernel messages are printed to the console.
    ‘f’ – Calls oom_kill to kill a memory-hogging process.
    ‘h’ – Displays help. Any unrecognized key also prints help.
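
When no keyboard is attached (for example over SSH), a command key can also be written to /proc/sysrq-trigger as root instead of being pressed. The Python sketch below shows the idea. The send_sysrq helper and its key whitelist are assumptions made for illustration, and the whitelist covers only the keys listed above.

```python
SYSRQ_TRIGGER = "/proc/sysrq-trigger"

# Only the command keys described in the list above are accepted here.
COMMAND_KEYS = set("kbemirstuopfh") | set("0123456789")


def send_sysrq(key, trigger_path=SYSRQ_TRIGGER):
    """Write one SysRq command key to the trigger file (needs root on a real system)."""
    if key not in COMMAND_KEYS:
        raise ValueError("unsupported SysRq command key: %r" % (key,))
    with open(trigger_path, "w") as trigger:
        trigger.write(key)
```

For example, send_sysrq("s") asks the kernel to sync all mounted filesystems. The trigger_path parameter exists only so the function can be tried against an ordinary file first.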

Perform a Safe reboot of Linux

To safely reboot a Linux computer that has hung, hold Alt+SysRq and press the keys below in order (R, E, I, S, U, B), ideally pausing a few seconds between each. Because the filesystems are synced and remounted read-only first, this avoids an fsck on the next boot.

  •     R: unRaw (take control of the keyboard back from X11),
  •     E: tErminate (send SIGTERM to all processes, allowing them to terminate gracefully),
  •     I: kIll (send SIGKILL to all processes, forcing them to terminate immediately),
  •     S: Sync (flush data to disk),
  •     U: Unmount (remount all filesystems read-only),
  •     B: reBoot.

VM Server

To safely reboot a Linux Xen virtual server that has hung, run the commands below in Xen Dom0. This avoids an fsck on the next boot.

# xm sysrq <domainid> s
# xm sysrq <domainid> u
# xm sysrq <domainid> b
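
If several guests need the same treatment, the three xm commands above can be generated per domain. The following Python sketch only builds the command lines and does not execute anything. The function name xm_sysrq_commands is hypothetical, and it assumes the sync, remount, reboot order shown above.

```python
SAFE_REBOOT_KEYS = ("s", "u", "b")  # sync, remount read-only, reboot


def xm_sysrq_commands(domain_id, keys=SAFE_REBOOT_KEYS):
    """Return one argument list per SysRq key, in the order they should be run."""
    return [["xm", "sysrq", str(domain_id), key] for key in keys]
```

On Dom0, each list could be passed to subprocess.run(command, check=True), with a short pause between commands so the sync can complete before the reboot.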


Thursday, December 3, 2015

What are the tools available to properly diagnose a network performance problem in Linux Server?

Thursday, December 03, 2015
The following tools can be used to diagnose network performance problems on a Linux server.

netstat

    A command-line utility that prints network connections, routing tables, interface statistics, masquerade connections and multicast memberships. It retrieves information about the networking subsystem from the /proc/net/ file system. These files include:

        /proc/net/dev (device information)
        /proc/net/tcp (TCP socket information)
        /proc/net/unix (Unix domain socket information)

    For more information about netstat and the files it reads from /proc/net/, refer to the netstat man page: man netstat.

dropwatch
    A monitoring utility that monitors packets dropped by the kernel. For more information, refer to the dropwatch man page: man dropwatch.

ip
    A utility for managing and monitoring routes, devices, policy routing, and tunnels.

ethtool
    A utility for displaying and changing NIC settings.

/proc/net/snmp
    A file that displays ASCII data needed for the IP, ICMP, TCP, and UDP management information bases for an snmp agent. It also displays real-time UDP-lite statistics.
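
/proc/net/snmp stores each protocol as a pair of lines: a line of counter names followed by a line of values with the same prefix. The Python sketch below turns that layout into nested dictionaries so individual counters (such as UDP receive errors) are easy to compare over time. The function name parse_snmp is an assumption, and the numbers in the usage example are invented sample values, not real measurements.

```python
def parse_snmp(text):
    """Parse /proc/net/snmp-style text into {protocol: {counter: value}}."""
    lines = [line for line in text.splitlines() if line.strip()]
    stats = {}
    # Lines come in pairs: counter names first, then the matching values.
    for header, values in zip(lines[0::2], lines[1::2]):
        protocol, names = header.split(":", 1)
        value_protocol, numbers = values.split(":", 1)
        if protocol != value_protocol:
            raise ValueError("header/value lines do not match: %s vs %s"
                             % (protocol, value_protocol))
        stats[protocol] = dict(zip(names.split(), (int(n) for n in numbers.split())))
    return stats


def read_snmp(path="/proc/net/snmp"):
    """Read and parse the live file on a Linux host."""
    with open(path) as handle:
        return parse_snmp(handle.read())
```

For example, parse_snmp("Udp: InDatagrams InErrors\nUdp: 1200 3\n")["Udp"]["InErrors"] gives 3 for that made-up input. Sampling read_snmp() twice and subtracting the counters shows how fast errors are growing.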

Wednesday, December 2, 2015

Explain about Linux Memory Huge Pages & Transparent Huge Pages

Wednesday, December 02, 2015
1.       Memory is managed in blocks known as pages.
2.       A standard page is 4096 bytes.
3.       1MB of memory is equal to 256 pages.
4.       1GB of memory is equal to 262,144 pages, and so on.
5.       CPUs have a built-in memory management unit that contains a list of these pages, with each page referenced through a page table entry.

There are two ways to enable the system to manage large amounts of memory:

    Increase the number of page table entries in the hardware memory management unit
    Increase the page size

The first method is expensive, since the hardware memory management unit in a modern processor only supports hundreds or thousands of page table entries.

Red Hat Enterprise Linux 6 implements the second method through huge pages:

  • Simply put, huge pages are blocks of memory that come in 2MB and 1GB sizes.
  • The page tables used by the 2MB pages are suitable for managing multiple gigabytes of memory, whereas the page tables of 1GB pages are best for scaling to terabytes of memory.
  • Huge pages must be assigned at boot time.
  • They are also difficult to manage manually, and often require significant changes to code in order to be used effectively.
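
The benefit of a larger page size is easiest to see by counting how many pages, and therefore page table entries, a given amount of memory needs. The short Python sketch below does that arithmetic in binary units (1 GB taken as 1024 MB). The helper name pages_needed is made up for this example.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

BASE_PAGE = 4 * KIB  # standard 4096-byte page


def pages_needed(n_bytes, page_size=BASE_PAGE):
    """Number of pages required to cover n_bytes (rounded up)."""
    return -(-n_bytes // page_size)  # ceiling division


print(pages_needed(GIB))           # 4 KB pages for 1 GB
print(pages_needed(GIB, 2 * MIB))  # 2 MB huge pages for 1 GB
print(pages_needed(GIB, GIB))      # 1 GB huge pages for 1 GB
```

With 4 KB pages, 1 GB needs 262,144 entries. With 2 MB huge pages the same memory needs only 512, and with a 1 GB huge page just one.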

THP (transparent huge pages) is an abstraction layer that automates most aspects of creating, managing, and using huge pages.

  • THP hides much of the complexity in using huge pages from system administrators and developers.
  • As the goal of THP is improving performance, its developers (both from the community and Red Hat) have tested and optimized THP across a wide range of systems, configurations, applications, and workloads.
  • This allows the default settings of THP to improve the performance of most system configurations.
  • THP is not recommended for database workloads.
  • THP can currently only map anonymous memory regions such as heap and stack space.
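
Since THP is not recommended for database workloads, it is useful to check which mode a host is running in. The kernel reports this in a sysfs file whose active value is wrapped in square brackets, for example "[always] madvise never". The Python sketch below parses that format. The helper names are hypothetical, and the default path is the upstream kernel location (Red Hat Enterprise Linux 6 uses /sys/kernel/mm/redhat_transparent_hugepage/enabled instead).

```python
import re

THP_ENABLED = "/sys/kernel/mm/transparent_hugepage/enabled"


def active_thp_mode(contents):
    """Return the bracketed (currently active) THP mode from the sysfs text."""
    match = re.search(r"\[(\w+)\]", contents)
    if match is None:
        raise ValueError("no active mode found in: %r" % (contents,))
    return match.group(1)


def read_thp_mode(path=THP_ENABLED):
    """Read the THP mode from a live Linux host."""
    with open(path) as handle:
        return active_thp_mode(handle.read())
```

A database host would typically be expected to report "never" or "madvise" here rather than "always".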