LINUX AND VMWARE STUFF: Linux Troubleshooting


Tuesday, December 19, 2017

Recover RPM DB from a corrupted RPM database in RHEL 7

Kalaiselvan Nachimuthu Tuesday, December 19, 2017 0

Rebuilding corrupted rpm database in RHEL7

Situation:
Although everything is done to keep your RPM databases intact, an RPM database may still become corrupt and unusable. This happens mainly when the filesystem on which the rpm db resides suddenly becomes inaccessible (full, read-only, rebooted, and so on).

Solution:
1. Start by creating a backup of your corrupt rpm db, as follows:
[root@nsk ~]# tar zcvf rpm-db.tar.gz /var/lib/rpm/*
tar: Removing leading `/' from member names
/var/lib/rpm/Basenames
/var/lib/rpm/Conflictname
/var/lib/rpm/__db.001
/var/lib/rpm/__db.002
/var/lib/rpm/__db.003
/var/lib/rpm/Dirnames
/var/lib/rpm/Group
/var/lib/rpm/Installtid
/var/lib/rpm/Name
/var/lib/rpm/Obsoletename
/var/lib/rpm/Packages
/var/lib/rpm/Providename
/var/lib/rpm/Requirename
/var/lib/rpm/Sha1header
/var/lib/rpm/Sigmd5
/var/lib/rpm/Triggername

2. Remove stale lock files, if they exist, with the following command:
[root@nsk ~]# rm -f /var/lib/rpm/__db*

3. Now, verify the integrity of the Packages database as follows:
[root@nsk ~]# /usr/lib/rpm/rpmdb_verify /var/lib/rpm/Packages; echo $?
BDB5105 Verification of /var/lib/rpm/Packages succeeded.
0

If it prints 0, proceed to next step.

4. Rename the Packages file (don't delete it, we'll need it!), as follows:
[root@nsk ~]# mv /var/lib/rpm/Packages /var/lib/rpm/Packages.org

5. Now, dump the Packages db from the original Packages db by executing the following command:
[root@nsk ~]# cd /var/lib/rpm/
[root@nsk rpm]# /usr/lib/rpm/rpmdb_dump Packages.org | /usr/lib/rpm/rpmdb_load Packages
rpmdb_load: BDB1540 configured environment flags incompatible with existing environment

6. Verify the integrity of the newly created Packages database by running the following:
[root@nsk rpm]# /usr/lib/rpm/rpmdb_verify /var/lib/rpm/Packages; echo $?
BDB5105 Verification of /var/lib/rpm/Packages succeeded.
0

If the exit code is not 0, you will need to restore the database from backup.

7. Rebuild the rpm indexes, as follows:
[root@nsk rpm]# rpm -vv --rebuilddb
D: rebuilding database /var/lib/rpm into /var/lib/rpmrebuilddb.1312
D: opening db environment /var/lib/rpm private:0x401
D: opening db index /var/lib/rpm/Packages 0x400 mode=0x0
D: locked db index /var/lib/rpm/Packages
D: opening db environment /var/lib/rpmrebuilddb.1312 private:0x401
D: opening db index /var/lib/rpmrebuilddb.1312/Packages (none) mode=0x42
D: opening db index /var/lib/rpmrebuilddb.1312/Packages 0x1 mode=0x42
D: disabling fsync on database
....
...
D: adding "5f7fd424d0773a4202731bff4901d449699b0929" to Sha1header index.
D: closed db index /var/lib/rpm/Packages
D: closed db environment /var/lib/rpm
D: closed db index /var/lib/rpmrebuilddb.1312/Sha1header
D: closed db index /var/lib/rpmrebuilddb.1312/Sigmd5
D: closed db index /var/lib/rpmrebuilddb.1312/Installtid
D: closed db index /var/lib/rpmrebuilddb.1312/Dirnames
D: closed db index /var/lib/rpmrebuilddb.1312/Triggername
D: closed db index /var/lib/rpmrebuilddb.1312/Obsoletename
D: closed db index /var/lib/rpmrebuilddb.1312/Conflictname
D: closed db index /var/lib/rpmrebuilddb.1312/Providename
D: closed db index /var/lib/rpmrebuilddb.1312/Requirename
D: closed db index /var/lib/rpmrebuilddb.1312/Group
D: closed db index /var/lib/rpmrebuilddb.1312/Basenames
D: closed db index /var/lib/rpmrebuilddb.1312/Name
D: closed db index /var/lib/rpmrebuilddb.1312/Packages
D: closed db environment /var/lib/rpmrebuilddb.1312

8. Use the following command to check the rpm db with yum for any other issues (this may take a long time):
[root@nsk rpm]# yum check
Loaded plugins: fastestmirror
....
...

9. Restore the SELinux context of the rpm database through the following command:
[root@nsk rpm]# restorecon -R -v /var/lib/rpm
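
The backup from step 1 is what the later steps fall back on, so it may be worth confirming that the archive really contains the Packages file before anything is removed. The following is a minimal sketch only: backup_rpmdb is a hypothetical helper name (not an rpm tool), and the default destination directory /root is an assumption.

```shell
# backup_rpmdb is a hypothetical helper, not part of rpm.
# Usage: backup_rpmdb [db_dir] [dest_dir]; prints the archive path on success.
# The body runs in a subshell, so its variables do not leak into the caller.
backup_rpmdb() (
    db_dir=${1:-/var/lib/rpm}
    dest_dir=${2:-/root}          # assumed destination, change as needed
    archive="$dest_dir/rpm-db-$(date +%Y%m%d-%H%M%S).tar.gz"

    # Archive the directory relative to its parent so member names stay short.
    tar -czf "$archive" -C "$(dirname "$db_dir")" "$(basename "$db_dir")" || exit 1

    # Refuse to report success unless the archive lists a Packages file.
    if ! tar -tzf "$archive" | grep -q '/Packages$'; then
        echo "backup $archive does not contain Packages" >&2
        exit 1
    fi
    printf '%s\n' "$archive"
)
```

Called as backup_rpmdb /var/lib/rpm /root, it prints the archive path, which can be kept for the restore mentioned after step 6.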


Saturday, November 25, 2017

Virtual machines show warning messages when starting the udev daemon Linux

Kalaiselvan Nachimuthu Saturday, November 25, 2017 0

Virtual machines show warning messages when starting the udev daemon.

After upgrading VMware Tools, Linux virtual machines show warnings when starting the udev daemon.

dmesg shows the below messages.

Starting udev:
udevd[572]: add_to_rules: unknown key 'SUBSYSTEMS'
udevd[572]: add_to_rules: unknown key 'ATTRS{vendor}'
udevd[572]: add_to_rules: unknown key 'ATTRS{model}'
udevd[572]: add_to_rules: unknown key 'SUBSYSTEMS'
udevd[572]: add_to_rules: unknown key 'ATTRS{vendor}'
udevd[572]: add_to_rules: unknown key 'ATTRS{model}'

Pressing Ctrl+C skips the udev daemon startup so that the boot process can finish.

To stop the warnings, comment out the unused lines (the entries for Ubuntu and other Unix types) in the /etc/udev/rules.d/99-vmware-scsi-udev.rules file.

On Linux, change the following line from

ACTION=="add", BUS=="scsi", SYSFS{vendor}=="VMware, " , SYSFS{model}=="VMware Virtual S", RUN+="/bin/sh -c 'echo 180 >/sys$DEVPATH/device/timeout'"

To

ACTION=="add", BUS=="scsi", SYSFS{vendor}=="VMware " , SYSFS{model}=="Virtual disk ", RUN+="/bin/sh -c 'echo 180 >/sys$DEVPATH/device/timeout'"

Save the modification and reboot the virtual machine.
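
The commenting-out step can also be previewed with a small filter before the rules file is edited by hand. This is only a sketch: comment_unsupported_rules is a hypothetical helper name, and it assumes the lines to disable are exactly the active ones that use the SUBSYSTEMS or ATTRS{...} keys reported as unknown in the messages above.

```shell
# comment_unsupported_rules is a hypothetical helper.
# Reads udev rules on stdin and writes them to stdout, prefixing "#" to any
# active line that uses the SUBSYSTEMS or ATTRS{...} keys this udevd rejects.
comment_unsupported_rules() {
    awk '
        /^[ \t]*#/              { print; next }          # already a comment
        /SUBSYSTEMS==|ATTRS[{]/ { print "#" $0; next }   # key reported as unknown
                                { print }                # everything else unchanged
    '
}
```

Run it with the rules file on stdin and a scratch file on stdout, then review the result before copying it into place. A rule that is still needed has to be rewritten in the BUS/SYSFS form shown above rather than commented out.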

Friday, November 17, 2017

Error "system was unable to find a physical volume" SOLVED -Step by Step

Kalaiselvan Nachimuthu Friday, November 17, 2017 0

If you get the error "system was unable to find a physical volume", the corrupted volume group metadata needs to be restored.

Situation :

If the volume group metadata area of a physical volume is accidentally overwritten or otherwise destroyed, you will get an error message indicating that the metadata area is incorrect, or that the system was unable to find a physical volume with a particular UUID. You may be able to recover the data on the physical volume by writing a new metadata area on the physical volume, specifying the same UUID as the lost metadata.

Solution:

The following example shows the sort of output you may see if the metadata area is missing or corrupted.

[root@test]# lvs -a -o +devices

Couldn't find device with uuid 'zhtUGH-1N2O-tHdu-b14h-gH34-sB7z-NHhkdf'.
Couldn't find all physical volumes for volume group VG.
Couldn't find device with uuid 'zhtUGH-1N2O-tHdu-b14h-gH34-sB7z-NHhkdf'.
Couldn't find all physical volumes for volume group VG.

...

You may be able to find the UUID for the physical volume that was overwritten by looking in the /etc/lvm/archive directory. Look in the file VolumeGroupName_xxxx.vg for the last known valid archived LVM metadata for that volume group.
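
Reading the UUID out of an archive file can be sketched with awk. The helper name list_archived_pvs is hypothetical, and the sketch assumes the usual layout of LVM archive files, where each entry in the physical_volumes section carries an id line followed by a device hint line.

```shell
# list_archived_pvs is a hypothetical helper, not an LVM command.
# Usage: list_archived_pvs /etc/lvm/archive/<file>.vg
# Prints "UUID device-hint" for each physical volume recorded in the file.
list_archived_pvs() {
    awk '
        /^[ \t]*physical_volumes[ \t]*[{]/ { inside = 1; depth = 0 }
        inside {
            # Track brace nesting so we stop where physical_volumes closes.
            opens  = gsub(/[{]/, "{")
            closes = gsub(/[}]/, "}")
            depth += opens - closes
            if ($1 == "id")     { gsub(/"/, "", $3); id = $3 }
            if ($1 == "device") { gsub(/"/, "", $3); print id, $3 }
            if (depth <= 0) inside = 0
        }
    ' "$1"
}
```

The device column is only the hint stored in the archive; the UUID column is the value to compare with the "Couldn't find device with uuid" message.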

Alternately, you may find that deactivating the volume and setting the partial (-P) argument will enable you to find the UUID of the missing corrupted physical volume.

[root@test]# vgchange -an --partial

Partial mode. Incomplete volume groups will be activated read-only.
Couldn't find device with uuid 'zhtUGH-1N2O-tHdu-b14h-gH34-sB7z-NHhkdf'.
Couldn't find device with uuid 'zhtUGH-1N2O-tHdu-b14h-gH34-sB7z-NHhkdf'.
...

Use the --uuid and --restorefile arguments of the pvcreate command to restore the physical volume. The following example labels the /dev/sdh1 device as a physical volume with the UUID indicated above, zhtUGH-1N2O-tHdu-b14h-gH34-sB7z-NHhkdf. This command restores the physical volume label with the metadata information contained in centos_00000-1802035441.vg, the most recent good archived metadata for the volume group.

The restorefile argument instructs the pvcreate command to make the new physical volume compatible with the old one on the volume group, ensuring that the new metadata will not be placed where the old physical volume contained data (which could happen, for example, if the original pvcreate command had used the command line arguments that control metadata placement, or if the physical volume was originally created using a different version of the software that used different defaults).

The pvcreate command overwrites only the LVM metadata areas and does not affect the existing data areas.

[root@test]# pvcreate --uuid "zhtUGH-1N2O-tHdu-b14h-gH34-sB7z-NHhkdf" --restorefile /etc/lvm/archive/centos_00000-1802035441.vg /dev/sdh1
Physical volume "/dev/sdh1" successfully created

You can then use the vgcfgrestore command to restore the volume group's metadata.

[root@test]# vgcfgrestore VG
Restored volume group VG

You can now display the logical volumes.

[root@test]# lvs -a -o +devices

LV     VG   Attr   LSize   Origin Snap% Move Log Copy% Devices

stripe VG   -wi--- 300.00G                               /dev/sdh1 (0),/dev/sda1(0)
stripe VG   -wi--- 300.00G                               /dev/sdh1 (34728),/dev/sdb1(0)

The following commands activate the volumes and display the active volumes.

[root@test]# lvchange -ay /dev/VG/stripe
[root@test]# lvs -a -o +devices
LV     VG   Attr   LSize   Origin Snap% Move Log Copy% Devices
stripe VG   -wi-a- 300.00G                               /dev/sdh1 (0),/dev/sda1(0)
stripe VG   -wi-a- 300.00G                               /dev/sdh1 (34728),/dev/sdb1(0)

If the on-disk LVM metadata takes at least as much space as what overwrote it, this command can recover the physical volume. If what overwrote the metadata went past the metadata area, the data on the volume may have been affected. You might be able to use the fsck command to recover that data.


Tuesday, November 14, 2017

Server hang at GRUB during boot - SOLVED

Kalaiselvan Nachimuthu Tuesday, November 14, 2017 0

If a RHEL server hangs on boot with nothing more than the word GRUB in the upper left-hand corner of the screen, GRUB is unable to read its configuration file. If you actually get a GRUB menu but the server does not boot, then you have a different and potentially more complex issue.

The most common reason for GRUB being unable to read its configuration is a discrepancy between how the BIOS enumerated the hard drives and what GRUB expects to be its boot disk.

To correct this issue, boot the server in rescue mode.

Once you have booted into rescue mode and your root disk filesystems have been mounted, check the /boot/grub/device.map file to ensure it has correctly identified the boot disk. hd0 should point to the disk that contains /boot. On an HP ProLiant system you should see the following line:

(hd0) /dev/cciss/c0d0

If it does not, correct the file and then update GRUB by issuing the following command:

/sbin/grub --batch --device-map=/boot/grub/device.map --config-file=/boot/grub/grub.conf --no-floppy

And then from the GRUB prompt enter the following commands:

grub> root (hd0,0)
grub> setup (hd0)
grub> quit

You can now eject the ISO and reboot the server normally.
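
The device.map check described above can be sketched as a one-line lookup. The helper name hd0_device is hypothetical; it only reads the file and prints whatever device the (hd0) entry maps to.

```shell
# hd0_device is a hypothetical helper: print the device that (hd0) maps to.
# Usage: hd0_device [path-to-device.map]
hd0_device() {
    awk '$1 == "(hd0)" { print $2; exit }' "${1:-/boot/grub/device.map}"
}
```

Compare its output with the disk that holds /boot. In rescue mode the file may sit under a different prefix, depending on where the root filesystem was mounted, so pass the full path explicitly in that case.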


Sunday, November 12, 2017

BUG: soft lockup - CPU#0 stuck for 10s!

Unknown Sunday, November 12, 2017 0

•Soft lockups are situations in which the kernel's scheduler subsystem has not been given a chance to perform its job for more than 10 seconds; they can be caused by defects in the kernel, by hardware issues or by extremely high workloads.

Run the following command, then check whether you still encounter these "soft lockup" errors on the system:

# sysctl -w kernel.softlockup_thresh=30

To make this parameter persistent across reboots, add the following line to the /etc/sysctl.conf file:

kernel.softlockup_thresh=30

Note: The softlockup_thresh kernel parameter was introduced in Red Hat Enterprise Linux 5.2 in kernel-2.6.18-92.el5, so it cannot be modified on older versions.
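
Adding the line by hand can leave duplicate entries if the key is already present in the file. The following is a minimal sketch of an idempotent edit, assuming a hypothetical helper named persist_sysctl (it is not part of procps); it drops any active assignment of the key and appends the new one.

```shell
# persist_sysctl is a hypothetical helper, not part of procps.
# Usage: persist_sysctl key value [conf_file]
# The body runs in a subshell, so its variables do not leak into the caller.
persist_sysctl() (
    key=$1
    value=$2
    conf=${3:-/etc/sysctl.conf}
    [ -f "$conf" ] || : > "$conf" || exit 1
    tmp=$(mktemp) || exit 1

    # Copy every line except active assignments of this key, then append ours.
    awk -v k="$key" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)            # ignore leading whitespace
            n = split(line, parts, /[ \t]*=/)   # parts[1] is the key, if any
            if (n == 0 || parts[1] != k) print
        }
    ' "$conf" > "$tmp" || exit 1
    printf '%s=%s\n' "$key" "$value" >> "$tmp"

    # Overwrite in place so the file keeps its owner, mode and SELinux label.
    cat "$tmp" > "$conf" && rm -f "$tmp"
)
```

For example, persist_sysctl kernel.softlockup_thresh 30 leaves exactly one kernel.softlockup_thresh line in /etc/sysctl.conf, however often it is run. Commented-out lines are left alone.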


SOLVED : Buffer I/O error on boot

Unknown Sunday, November 12, 2017 0

Situation:

•After upgrading from Red Hat Enterprise Linux (RHEL) 5.1 to RHEL 5.5 (kernel 2.6.18-53.el5 to 2.6.18-194.8.1.el5), a system started to show IO errors on boot.

•The boot process took more time than before, but there are otherwise no significant problems occurring.

SCSI device sdc: 419430400 512-byte hdwr sectors (214748 MB)
sdc: Write Protect is off
sdc: Mode Sense: 77 00 10 08
SCSI device sdc: drive cache: write back w/ FUA
SCSI device sdc: 419430400 512-byte hdwr sectors (214748 MB)
sdc: Write Protect is off
sdc: Mode Sense: 77 00 10 08
SCSI device sdc: drive cache: write back w/ FUA
sdc:end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
Dev sdc: unable to read RDB block 0
end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
unable to read partition table
sd 1:0:0:1: Attached scsi disk sdc
Vendor: SUN Model: LCSM100_S Rev: 0735
Type: Direct-Access ANSI SCSI revision: 05

Solution:

Follow the steps below to remediate the issue above.

•Switching the controller to active/active mode allows the devices to be probed through both controller ports and prevents the errors.
•An option to speed up the boot process is to rebuild the initrd without the HBA driver kernel modules and then probe the devices after boot, i.e.

# mkinitrd -v -f --omit-scsi-modules /boot/initrd-2.6.18-194.8.1.el5.img 2.6.18-194.8.1.el5


Friday, November 3, 2017

SOLVED : pam_ldap: error trying to bind as user

Kalaiselvan Nachimuthu Friday, November 03, 2017

If you get the error below even after entering the correct password:

Nov 2 03:56:42 testserver sshd[30173]: pam_ldap: error trying to bind as user "uid=testuser,ou=People,dc=test,dc=testdomain,dc=com" (Invalid credentials)

Nov 2 03:56:43 testserver sshd[30173]: Failed password for testuser from 10.17.0.3 port 51306 ssh2

Reason: The password is not syncing properly to all client servers during the scheduled window.

Solution: Restart the slapd service on the LDAP server, and it will sync to all servers.

# /etc/init.d/slapd restart

Hope it helps.


Wednesday, November 1, 2017

How do I exclude Kernel or other packages from getting updated in RHEL while updating system via yum?

Kalaiselvan Nachimuthu Wednesday, November 01, 2017 0

Excluding Kernel or other packages from getting updated in RHEL while updating system via yum

The up2date command in Red Hat Enterprise Linux 4 excludes kernel updates by default. yum in Red Hat Enterprise Linux 5 includes kernel updates by default.

To skip installing or updating the kernel or other packages while using yum update in Red Hat Enterprise Linux 5 and 6, use the following options:

Temporary solution via Command line:

# yum update --exclude=PACKAGENAME

For example, to exclude all kernel related packages:

# yum update --exclude=kernel*

To make the change permanent, edit the /etc/yum.conf file and add the following entries to it:

[main]
cachedir=/var/cache/yum/$basearch/$releasever
keepcache=0
debuglevel=2
logfile=/var/log/yum.log
exclude=kernel* redhat-release* <====

NOTE: If there are multiple packages to be excluded, separate them with a single space or a comma. Also, do not add multiple exclude= lines to the configuration file, because yum only considers the last exclude entry.

To exclude 32-bit packages, edit the /etc/yum.conf file:

exclude=*.i?86 *.i686
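
Because only the last exclude entry is considered, a quick count of exclude= lines can catch an accidental duplicate. This is a sketch only: count_exclude_lines is a hypothetical helper name, and it counts across the whole file without distinguishing sections.

```shell
# count_exclude_lines is a hypothetical helper: print how many active
# "exclude=" lines a yum configuration file contains.
# Usage: count_exclude_lines [conf_file]
count_exclude_lines() {
    grep -c -E '^[[:space:]]*exclude[[:space:]]*=' "${1:-/etc/yum.conf}"
}
```

A result greater than 1 suggests merging the entries into a single exclude= line, separated by spaces or commas as described in the note above.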


Tuesday, October 24, 2017

How to solve the Error "sendmail dead but subsys locked" sm-client (pid 28752) is running?

Kalaiselvan Nachimuthu Tuesday, October 24, 2017 0

The error "sendmail dead but subsys locked" with "sm-client (pid 28752) is running" appears because two MTAs (Mail Transfer Agents) were running at the same time. Something trying to start the postfix service as well can cause this issue.

[root@testserver ~]# /etc/init.d/sendmail status
sendmail dead but subsys locked
sm-client (pid 28752) is running...

First, check whether postfix is running on the server:

[root@testserver ~]# /etc/init.d/postfix status
-b (pid 1765) is running...
[root@testserver ~]#

Try to stop the service. If the service cannot be brought down, kill the process. Then restart the sendmail service.

[root@testserver ~]# /etc/init.d/postfix stop
Shutting down postfix:                                     [FAILED]
[root@testserver ~]#

[root@testserver ~]# ps -ef | grep -i postfix
root      1765     1 0 Jun09 ?        00:02:06 /usr/libexec/postfix/master
postfix   1772 1765 0 Jun09 ?        00:00:03 qmgr -l -t fifo -u
root     25822 24576 0 16:56 pts/7    00:00:00 grep -i postfix

[root@testserver ]# kill -9 1765
[root@testserver ]#

[root@testserver ]# /etc/init.d/sendmail restart
Shutting down sm-client:                                   [ OK ]
Shutting down sendmail:                                    [ OK ]
Starting sendmail:                                         [ OK ]
Starting sm-client:                                        [ OK ]
[root@testserver ]#

[root@testserver ]# /etc/init.d/sendmail status
sendmail (pid 28421) is running...
sm-client (pid 28429) is running...
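
The transcript above goes straight to kill -9. A gentler variant, offered here only as a sketch and not as what the post did, sends SIGTERM first and falls back to SIGKILL after a timeout. The helper name stop_process and the default 10-second wait are assumptions.

```shell
# stop_process is a hypothetical helper.
# Usage: stop_process pid [wait_seconds]
# Sends SIGTERM, waits up to wait_seconds, then sends SIGKILL if still alive.
stop_process() (
    pid=$1
    wait_secs=${2:-10}                          # assumed default timeout
    kill -TERM "$pid" 2>/dev/null || exit 0     # nothing to signal: already gone
    while [ "$wait_secs" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || exit 0    # exited after SIGTERM
        sleep 1
        wait_secs=$((wait_secs - 1))
    done
    kill -KILL "$pid" 2>/dev/null               # last resort
)
```

With the PID from the ps output above, this would be stop_process 1765, followed by the sendmail restart.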

Hope it helps


Thursday, October 19, 2017

Kernel: WARNING calibrate_APIC_clock: the APIC timer calibration may be wrong appear on Guest 5.x Linux VM's

Unknown Thursday, October 19, 2017 0

This was due to the MAX_DIFFERENCE parameter value (in the APIC calibration loop) of 1000 cycles being too aggressive for virtual guests. APIC (Advanced Programmable Interrupt Controllers) and TSC (Time Stamp Counter) reads normally take longer than 1000 cycles when performed from inside a virtual guest, due to processors being scheduled away from and then back onto the guest. With this update, the MAX_DIFFERENCE parameter value has been increased to 10,000 for virtual guests.

These messages can be stopped by adding 'apiccalibrationdiff=10000' to the guest's kernel line in /etc/grub.conf.


Friday, October 13, 2017

How to reduce / file system utilization in Linux Server?

Kalaiselvan Nachimuthu Friday, October 13, 2017 0

Reducing / file system utilization on a Linux server is a rare task.
Put another way: the / file system is full, so how do we do the housekeeping?

Situation:

We have separate mounts for the /boot, /usr, /tmp, /home, /var and /opt file systems, but / file system utilization is still almost full.

Solution:

First, check which directories under / are not separately mounted file systems, collect them, and run the command below.

For example, the directories listed below are not mounted separately, so we need to check which one is largest.

admin lib middleware net lib64 srv misc media mnt

[root@testserver]# du -sk /admin /lib /middleware /net /lib64 /srv /misc /media /mnt | sort -n

4       /srv
16      /mnt
20      /middleware
31852   /lib64
920388 /lib

So let's see what is under /lib:

[root@testserver ]# du -sk /lib/* | sort -n
109096 /lib/firmware
793332 /lib/modules

modules looks large, so let's check that one as well:

du -sk /lib/modules/* | sort -n

105840 /lib/modules/2.6.32-431.20.3.el6.x86_64
107040 /lib/modules/2.6.39-400.215.3.el6***.x86_64
109152 /lib/modules/2.6.32-504.23.4.el6.x86_64
116676 /lib/modules/2.6.32-696.1.1.el6.x86_64
176888 /lib/modules/3.8.13-68.3.3.el6***.x86_64
177732 /lib/modules/3.8.13-118.17.5.el6***.x86_64

Now check the current kernel, which is running on the server:

[root@testserver firmware]# uname -a
Linux testserver 3.8.13-118.17.5.el6***.x86_64 #2 SMP Wed Apr 12 09:16:08 PDT 2017 x86_64 x86_64 x86_64 GNU/Linux

Check which packages are related to the 2.6 (old) kernels:

# rpm -qf /lib/modules/2.6.*

kernel-2.6.32-431.20.3.el6.x86_64
kernel-2.6.32-504.23.4.el6.x86_64
kernel-2.6.32-696.1.1.el6.x86_64
kernel-uek-2.6.39-400.215.3.el6***.x86_64

We can see that some old kernels are still installed, so remove one:

# yum remove kernel-2.6.32-431.20.3.el6.x86_64

This frees some space. If it is not sufficient, remove the other (unused) older kernels.
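
Before removing anything, it may help to list the module directories that do not belong to the running kernel. The following is a minimal sketch, assuming a hypothetical helper named list_unused_kernel_dirs; it only prints names and removes nothing.

```shell
# list_unused_kernel_dirs is a hypothetical helper.
# Usage: list_unused_kernel_dirs [modules_dir] [running_release]
# Prints every directory name under modules_dir except the running kernel's.
list_unused_kernel_dirs() (
    modules_dir=${1:-/lib/modules}
    running=${2:-$(uname -r)}
    for dir in "$modules_dir"/*/; do
        [ -d "$dir" ] || continue               # glob matched nothing
        name=$(basename "$dir")
        [ "$name" = "$running" ] || printf '%s\n' "$name"
    done
)
```

Each printed name can be passed to rpm -qf /lib/modules/NAME, as in the post, to find the package that owns it before running yum remove.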

Hope it helps.


Tuesday, June 7, 2016

When up2date/yum fail with "Error Class Code 31" - Solved.

NAGARAJU AVALA Tuesday, June 07, 2016 0

Running up2date or yum update fails with the error below:

Error Message : Service not enabled for system profile: "system1.example.com"
Error Class Code: 31
Error Class Info :This system does not have a valid entitlement for Red Hat Network.

Please visit https://rhn-server/rhn/systems/SystemEntitlements. or
login at https://rhn-server, and from the "Overview" tab,
select "Subscription Management" to enable Redhat Network service for this system.

Situation

System registration fails with above error.
Redhat Network entitlements missing after Redhat contract renewal.
After executing rhn_register, the system appears in host list, but as unentitled.
Cannot entitle system.
System does not have a valid entitlement for Red Hat Network.
When trying to install a package, an error was received that said the system does not have a valid entitlement.
No longer able to update system.
Satellite certificate activation is failing with "Error Class Code 31"?

Resolution

If the system is not registered with rhn-server, follow the below steps to have an entitlement.

Log in to Satellite Customer Portal
Click on My Subscriptions
Under Redhat Network Classic select All Registered Systems
Click on system name
Click on Edit These Properties beside System Properties
Ensure either Update or Management is selected for Base Entitlement.
Click the Update Properties button located in the bottom-right corner.

Root Cause

Error Class Code: 31 means that a valid entitlement is not assigned to your system profile.
When you register a system, the base entitlement gets assigned to either Update / Management (as per the free entitlement in account) along with the base channel. But if the base entitlement is removed for the system profile then while updating the system it fails with Error Class Code: 31

