This Blog is to share our knowledge and expertise on Linux System Administration and VMware Administration

Sunday, November 12, 2017

BUG: soft lockup - CPU#0 stuck for 10s!

Sunday, November 12, 2017

•Soft lockups are situations in which the kernel's scheduler subsystem has not been given a chance to perform its job for more than 10 seconds; they can be caused by defects in the kernel, by hardware issues, or by extremely high workloads.

Run the following command to raise the soft lockup threshold, then check whether you still encounter these "soft lockup" errors on the system:

# sysctl -w kernel.softlockup_thresh=30

To make this parameter persistent across reboots, add the following line to the /etc/sysctl.conf file:

 kernel.softlockup_thresh=30


Note: The softlockup_thresh kernel parameter was introduced in Red Hat Enterprise Linux 5.2 (kernel-2.6.18-92.el5), so it is not possible to modify it on older kernels.
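
If you prefer to apply and persist the change in one pass, a minimal sketch (assuming a kernel that exposes this sysctl, per the note above):

# sysctl -w kernel.softlockup_thresh=30
# echo "kernel.softlockup_thresh = 30" >> /etc/sysctl.conf
# sysctl -p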

SOLVED: Buffer I/O error on boot

Sunday, November 12, 2017
Situation:

•After upgrading from Red Hat Enterprise Linux (RHEL) 5.1 to RHEL 5.5 (kernel 2.6.18-53.el5 to 2.6.18-194.8.1.el5), a system started to show I/O errors on boot.

•The boot process took longer than before, but otherwise no significant problems occurred.


SCSI device sdc: 419430400 512-byte hdwr sectors (214748 MB)

sdc: Write Protect is off
sdc: Mode Sense: 77 00 10 08
SCSI device sdc: drive cache: write back w/ FUA
SCSI device sdc: 419430400 512-byte hdwr sectors (214748 MB)
sdc: Write Protect is off
sdc: Mode Sense: 77 00 10 08
SCSI device sdc: drive cache: write back w/ FUA
sdc:end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
Dev sdc: unable to read RDB block 0
end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
unable to read partition table
sd 1:0:0:1: Attached scsi disk sdc
 Vendor: SUN       Model: LCSM100_S         Rev: 0735
 Type:   Direct-Access                      ANSI SCSI revision: 05

Solution:


Either of the following approaches remediates the issue.


•Switching the controller to active/active mode would allow the devices to be probed through both controller ports and prevent the errors.

•An option to speed up the boot process is to rebuild the initrd without the HBA driver kernel modules and then probe the devices after boot, i.e.:

# mkinitrd -v -f --omit-scsi-modules /boot/initrd-2.6.18-194.8.1.el5.img 2.6.18-194.8.1.el5
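
After booting with such an initrd, the Fibre Channel devices have to be probed manually. A minimal sketch, assuming for illustration a QLogic qla2xxx HBA (substitute the driver module your HBA actually uses):

# modprobe qla2xxx
# for host in /sys/class/scsi_host/host*; do echo "- - -" > "$host/scan"; done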

Performance collection tools to gather data for fault analysis in VMware

Sunday, November 12, 2017
This article explains how to use performance collection tools to gather data for analysis of faults such as:
    Unresponsive ESX hosts
    Unresponsive virtual machines
    ESX host purple diagnostic screens

Why gather performance data for a fault?

If the diagnostic logs do not help you determine the cause of a fault, you may need to use performance collection tools to gather further data for analysis. Set up performance collection tools in advance so that data is captured when faults occur.

Performance gathering tools

VMware recommends the following tools for gathering performance data: 

top
The top utility provides a list of CPU-intensive tasks for the ESX host Service Console.
Use top in batch mode for fault troubleshooting by directing the output to a file so that it can be reviewed after the fault recurs.


Note: The top command is not available for ESXi.
To run the top utility in batch mode, use this command:


# top -bc -d <delay in seconds> [-n <iterations>] > output-perf-stats-file.txt
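
For example, to sample every 5 seconds for an hour (the output file name is only illustrative):

# top -bc -d 5 -n 720 > /tmp/top-perf-stats-$(date +%F).txt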

 
Use the information in the output file to identify any trends before the fault. 


esxtop
The esxtop tool provides performance statistics for the entire ESX/ESXi host. It provides details of network, storage, CPU, and memory load from the VMkernel perspective, broken down per VMkernel world.
To collect the data over long periods of time, run esxtop in batch mode. Direct the output to a file so that it can be reviewed after the fault.


To run the esxtop tool in batch mode, use this command:


# esxtop -b -d <delay in seconds> [-n <iterations>] > output-perf-statistics-file.csv
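
For example, to capture one sample every 10 seconds for an hour (the output file name is only illustrative):

# esxtop -b -d 10 -n 360 > /tmp/esxtop-perf-stats-$(date +%F).csv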

 
Like esxtop, the resxtop tool provides performance statistics for a specified ESX host in the network. It provides the same performance information as esxtop and may be used either after deploying the VMware vSphere Management Assistant (vMA) virtual appliance or installing the VMware Command-Line Interface (vCLI). 


To run the resxtop tool and collect batch performance data, log into the vMA or open the vCLI, and execute the command:


# resxtop --server <server> [--vihost <target ESX host>] [--portnumber <port>] [--username <user>] -b -d <delay in seconds> [-n <iterations>] > output-perf-statistics-file.csv
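
For example, against a hypothetical host esx01.example.com, sampling every 10 seconds for an hour:

# resxtop --server esx01.example.com --username root -b -d 10 -n 360 > esx01-perf-stats.csv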


vm-support -s

 
Use the vm-support command with the -s parameter to collect performance statistics, system configuration information, and logging. Submit the file generated by this command to VMware Support for further assistance, if required. 
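
A minimal invocation from the ESX Service Console (the command prints the path of the compressed bundle it creates, which is the file to attach to a support request):

# vm-support -s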


Performance Monitor (PERFMON.EXE)

 
Microsoft's Performance Monitor is a utility that comes with every Microsoft Windows NT-based Operating System. This utility can be used to monitor local and remote Microsoft Windows machines. It can log performance data and display data from logs or real-time data.


This utility is useful when reviewing data collected from the esxtop tool and for troubleshooting virtual machine unresponsiveness. When using Performance Monitor to troubleshoot an unresponsive virtual machine, collect the data remotely from another Microsoft Windows machine so that the utility does not affect the data being gathered.
For more information about Performance Monitor on your specific version of Windows, refer to Microsoft support sites.

Friday, November 10, 2017

Time command in Linux Server - Brief explanation

Friday, November 10, 2017
NAME
       time - time a simple command or give resource usage
      
Format
       time [options] command [arguments...]

The time command runs the specified program command with the given arguments. When command finishes, time writes a message to standard error giving timing statistics about this program run. These statistics consist of:


(i) the elapsed real time between invocation and termination,
(ii) the user CPU time (the sum of the tms_utime and tms_cutime values in a struct tms as returned by times(2)), and
(iii) the system CPU time (the sum of the tms_stime and tms_cstime values in a struct tms as returned by times(2)).

real %e
user %U
sys %S

%e - Elapsed real time (in seconds).
%U - Total number of CPU-seconds that the process spent in user mode.
%S - Total number of CPU-seconds that the process spent in kernel mode.
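
These specifiers can be passed to GNU time's -f option to print exactly those three fields. Note that the bash built-in time does not accept -f, so the external binary must be called explicitly; a quick sketch:

# /usr/bin/time -f "real %e\nuser %U\nsys %S" sleep 1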

Ex:

[root@nsk-linux ~]# time route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.0.2.0        0.0.0.0         255.255.255.0   U     1      0        0 eth4
0.0.0.0         10.0.2.2        0.0.0.0         UG    0      0        0 eth4

real    0m0.001s
user    0m0.000s
sys     0m0.001s

[root@nsk-linux ~]# time uptime

 08:34:58 up 57 min,  2 users,  load average: 0.04, 0.12, 0.08

real    0m0.003s
user    0m0.002s
sys     0m0.001s

For more options, please read man time.

Thursday, November 9, 2017

How to find files modified between certain days & delete them?

Thursday, November 09, 2017
Follow the steps below to find files modified between 20 and 30 days ago and delete them.

Command to find and list the files

# find / -mtime +20 -mtime -30 -type f -name "test.*" -exec ls -al {} \;

Command to delete the listed files.

# find / -mtime +20 -mtime -30 -type f -name "test.*" -exec rm {} \;
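
With GNU find, the same deletion can also be done with the built-in -delete action instead of spawning rm for every match; a sketch equivalent to the command above:

# find / -mtime +20 -mtime -30 -type f -name "test.*" -delete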

Change the file name pattern as per your needs.

Hope it helps.

Default Queue Depth values for QLogic HBAs for various ESXi/ESX versions

Thursday, November 09, 2017
This table lists the default Queue Depth values for QLogic HBAs for various ESXi/ESX versions:

[Table image: default QLogic HBA Queue Depth values by ESXi/ESX version]

The default Queue Depth value for Emulex adapters has not changed across the versions of ESXi/ESX released to date. The Queue Depth is 32 by default, and because 2 buffers are reserved, 30 are available for I/O data.

The default Queue Depth value for Brocade adapters is 32.
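
To check whether the QLogic default has been overridden on a classic ESX host, you can query the driver module's options; a sketch, assuming the qla2xxx driver (an empty options string means the default queue depth applies):

# esxcfg-module -g qla2xxx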

Wednesday, November 8, 2017

Enabling Intel VT-x and AMD-V Virtualization Hardware Extensions in BIOS of ESXI

Wednesday, November 08, 2017
This section describes how to identify hardware virtualization extensions and enable them in your BIOS if they are disabled. The Intel VT-x extensions can be disabled in the BIOS; the AMD-V extensions cannot.

Procedure for Enabling virtualization extensions in BIOS


1.    Reboot the computer and open the system's BIOS menu. This can usually be done by pressing the Delete key, the F1 key, Alt+F4, or the F10 key, depending on the hardware.


2.    Enable the virtualization extensions in the BIOS:


        a.    Open the Processor submenu. The processor settings menu may be hidden in the Chipset, Advanced CPU Configuration, or Northbridge menu.
        b.    Enable Intel Virtualization Technology (also known as Intel VT-x). AMD-V extensions cannot be disabled in the BIOS and should already be enabled. The virtualization extensions may be labeled Virtualization Extensions, Vanderpool, or various other names depending on the OEM and system BIOS.
        c.    Enable Intel VT-d or AMD IOMMU, if the options are available. Intel VT-d and AMD IOMMU are used for PCI device assignment.
        d.    Select Save & Exit.


3.    Reboot the machine.


4.    When the machine has booted, run grep -E "vmx|svm" /proc/cpuinfo (adding --color is optional, but useful if you want the search terms highlighted). If the command produces output, the virtualization extensions are now enabled. If there is no output, your system may not have the virtualization extensions, or the correct BIOS setting is not enabled.
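
For a quick yes/no answer, you can simply count the matching lines; a non-zero count means the CPU flags are present:

# grep -cE "vmx|svm" /proc/cpuinfo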