This Blog is to share our knowledge and expertise on Linux System Administration and VMware Administration


Wednesday, December 27, 2017

Monitoring services using journalctl in RHEL7


Systemd's journal has the added advantage that its filtering controls allow you to easily narrow down to messages generated by specific services.

1. First, display all the messages generated by your system.
To show every message generated on the system, run the following command:

[root@nsk ~]# journalctl
-- Logs begin at Tue 2017-12-26 07:31:23 IST, end at Tue 2017-12-26 08:32:09 IST. --
Dec 26 07:31:23 nsk systemd-journal[89]: Runtime journal is using 8.0M (max allowed 91.9M, trying to leave 137.9M free of 911.6M available → current limit 91.
Dec 26 07:31:23 nsk kernel: Initializing cgroup subsys cpuset
Dec 26 07:31:23 nsk kernel: Initializing cgroup subsys cpu
Dec 26 07:31:23 nsk kernel: Initializing cgroup subsys cpuacct
Dec 26 07:31:23 nsk kernel: Linux version 3.10.0-693.5.2.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #
Dec 26 07:31:23 nsk kernel: Command line: BOOT_IMAGE=/vmlinuz-3.10.0-693.5.2.el7.x86_64 root=/dev/mapper/centos-root ro crashkernel=auto rd.lvm.lv=centos/root
Dec 26 07:31:23 nsk kernel: e820: BIOS-provided physical RAM map:
Dec 26 07:31:23 nsk kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
Dec 26 07:31:23 nsk kernel: BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved…...
…………..
~]#
2. Now, display all system-related messages.
This command shows all the messages related to the system and not its users:
[root@nsk ~]# journalctl --system

3. Display all the current user messages.
This command shows all messages related to the user that you are logged on with:
[root@nsk ~]# journalctl --user

4. Next, display all messages generated by a particular service using the following command line:
journalctl --unit=<service>

[root@nsk ~]# journalctl --unit=sshd
-- Logs begin at Tue 2017-12-26 07:31:23 IST, end at Tue 2017-12-26 08:33:14 IST. --
Dec 26 07:31:29 nsk systemd[1]: Starting OpenSSH server daemon...
Dec 26 07:31:29 nsk sshd[944]: Server listening on 0.0.0.0 port 22.
Dec 26 07:31:29 nsk sshd[944]: Server listening on :: port 22.
Dec 26 07:31:29 nsk systemd[1]: Started OpenSSH server daemon.
Dec 26 07:33:37 nsk sshd[1238]: Accepted password for root from 10.0.2.2 port 60698 ssh2
Dec 26 07:34:11 nsk sshd[1261]: Accepted password for root from 10.0.2.2 port 60702 ssh2
Dec 26 08:30:19 nsk systemd[1]: Stopping OpenSSH server daemon...
……..

5. Now, display messages by priority.

Priorities can be specified by a keyword or number, such as debug (7), info (6), notice (5), warning (4), err (3), crit (2), alert (1), and emerg (0). When specifying a priority, this includes all the lower priorities as well. For example, err implies that crit, alert, and emerg are also shown. Take a look at the following command line:
journalctl -p <priority>

[root@nsk ~]# journalctl -p err
-- Logs begin at Tue 2017-12-26 07:31:23 IST, end at Tue 2017-12-26 08:33:14 IST. --
Dec 26 08:26:15 nsk rsyslogd[613]: imjournal: journal reloaded... [v8.24.0 try http://www.rsyslog.com/e/0 ]
Dec 26 08:30:21 nsk lvmetad[483]: Failed to accept connection errno 11.
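The keyword-to-number mapping described above can be captured in a tiny helper. This is only an illustrative sketch (prio_num is a hypothetical name, not a standard tool); journalctl -p accepts either form directly:

```shell
#!/bin/sh
# Map a syslog priority keyword to the numeric level used by journalctl -p.
# Lower numbers are more severe; -p <level> shows that level and all lower ones.
prio_num() {
    case "$1" in
        emerg)   echo 0 ;;
        alert)   echo 1 ;;
        crit)    echo 2 ;;
        err)     echo 3 ;;
        warning) echo 4 ;;
        notice)  echo 5 ;;
        info)    echo 6 ;;
        debug)   echo 7 ;;
        *)       return 1 ;;   # unknown keyword
    esac
}
```

So `journalctl -p err` and `journalctl -p 3` are equivalent, and both include crit, alert, and emerg messages.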

6. Next, display messages by time.
You can show all messages from the current boot through the following commands:

[root@nsk ~]# journalctl -b
-- Logs begin at Tue 2017-12-26 07:31:23 IST, end at Tue 2017-12-26 08:33:14 IST. --
Dec 26 08:30:45 nsk systemd-journal[86]: Runtime journal is using 8.0M (max allowed 91.9M, trying to leave 137.9M free of 911.6M available → current limit 91.
Dec 26 08:30:45 nsk kernel: Initializing cgroup subsys cpuset
Dec 26 08:30:45 nsk kernel: Initializing cgroup subsys cpu
Dec 26 08:30:45 nsk kernel: Initializing cgroup subsys cpuacct
Dec 26 08:30:45 nsk kernel: Linux version 3.10.0-693.5.2.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #
Dec 26 08:30:45 nsk kernel: Command line: BOOT_IMAGE=/vmlinuz-3.10.0-693.5.2.el7.x86_64 root=/dev/mapper/centos-root ro crashkernel=auto rd.lvm.lv=centos/root
Dec 26 08:30:45 nsk kernel: e820: BIOS-provided physical RAM map:
You can even show all the messages within a specific time range by running the following:
[root@nsk ~]# journalctl --since="2017-12-26 08:30:00" --until="2017-12-26 09:00:00"
-- Logs begin at Tue 2017-12-26 07:31:23 IST, end at Tue 2017-12-26 08:33:14 IST. --
Dec 26 08:30:19 nsk polkitd[619]: Registered Authentication Agent for unix-process:1587:353648 (system bus name :1.40 [/usr/bin/pkttyagent --notify-fd 5 --fal
Dec 26 08:30:19 nsk ntpd[1551]: ntpd exiting on signal 15
Dec 26 08:30:19 nsk systemd[1]: Stopped target Network is Online.
Dec 26 08:30:19 nsk sshd[944]: Received signal 15; terminating.
Dec 26 08:30:19 nsk systemd[1]: Stopping Network is Online.
Dec 26 08:30:19 nsk crond[628]: (CRON) INFO (Shutting down)
…….

For instance, if you want to show all the error messages between 8:30 and 9:00 on 2017-12-26, your command would be the following:

[root@nsk ~]# journalctl -p err --since="2017-12-26 08:30:00" --until="2017-12-26 09:00:00"
-- Logs begin at Tue 2017-12-26 07:31:23 IST, end at Tue 2017-12-26 08:33:14 IST. --
Dec 26 08:30:21 nsk lvmetad[483]: Failed to accept connection errno 11.
[root@nsk ~]#

The journal is stored in a binary format, so you cannot use traditional "following" techniques such as tail -f, or opening the file in less and using its follow mode. Instead, simply add -f or --follow as an argument to the journalctl command:

[root@nsk ~]# journalctl -f
-- Logs begin at Tue 2017-12-26 07:31:23 IST. --
Dec 26 08:30:53 nsk systemd[1]: Started Crash recovery kernel arming.
Dec 26 08:30:53 nsk systemd[1]: Startup finished in 392ms (kernel) + 1.702s (initrd) + 6.681s (userspace) = 8.777s.
Dec 26 08:32:08 nsk sshd[1259]: Accepted password for root from 10.0.2.2 port 63824 ssh2
Dec 26 08:32:09 nsk systemd[1]: Created slice User Slice of root.
Dec 26 08:32:09 nsk systemd[1]: Starting User Slice of root.
Although most environments still rely on syslog messages for troubleshooting, the journal provides the added value of simple filters that let you monitor messages live.
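These filters can also be combined in a single call. As a hedged sketch (not from the original post; the svc_log name and the DRY_RUN switch are my own illustration), a small wrapper over the options shown above might look like this:

```shell
#!/bin/sh
# Hypothetical helper combining the journalctl filters shown above:
# unit, priority, and a --since time window.
# DRY_RUN is an illustrative switch: when set, print the command that
# would run instead of running it (handy on hosts without systemd).
svc_log() {
    unit="$1"; prio="${2:-err}"; since="${3:-today}"
    set -- "--unit=$unit" -p "$prio" "--since=$since" --no-pager
    if [ -n "$DRY_RUN" ]; then
        echo "journalctl $*"
    else
        journalctl "$@"
    fi
}

# Example: all sshd errors since the given date.
DRY_RUN=1 svc_log sshd err "2017-12-26"
```

On a real RHEL7 host you would drop DRY_RUN and simply call `svc_log sshd warning` to get every sshd message of priority warning or worse.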

Wednesday, December 20, 2017

ssh_exchange_identification: Connection closed by remote host - Passwordless authentication setup



Situation:
While setting up passwordless authentication from testserver2 (192.181.166.55) to testserver1 (192.181.130.55), the following error appears: "ssh_exchange_identification: Connection closed by remote host".

[kanachim@testserver2 .ssh]$ ssh -vv testserver1
OpenSSH_5.3p1-hpn13v7, OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
debug2: ssh_connect: needpriv 0
debug1: Connecting to 192.181.166.55 [192.181.166.55] port 22.
debug1: Connection established.
debug2: key_type_from_name: unknown key type '-----BEGIN'
debug2: key_type_from_name: unknown key type '-----END'
debug1: identity file /home/kanachim/.ssh/id_rsa type 1
debug1: identity file /home/kanachim/.ssh/id_dsa type -1
ssh_exchange_identification: Connection closed by remote host

Solution:
Check /etc/hosts.deny on server testserver1

root@host testserver1# grep sshd /etc/hosts.deny 
# DenyHosts: Mon Dec 18 22:10:38 2017 | sshd: 192.181.166.55
sshd: 192.181.166.55

Remove the sshd entry from /etc/hosts.deny on testserver1.
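The removal can be scripted. Below is a hedged sketch (remove_deny_entry is a hypothetical name of my own; keep a backup before editing hosts.deny, and note that if DenyHosts is running it keeps its own work files and may re-add the entry):

```shell
#!/bin/sh
# Hypothetical helper: drop the sshd deny entry for one IP from hosts.deny.
# $1 = IP to unblock, $2 = path to hosts.deny (defaults to /etc/hosts.deny).
remove_deny_entry() {
    f="${2:-/etc/hosts.deny}"
    ip_re=$(printf '%s' "$1" | sed 's/\./\\./g')   # escape dots for the regex
    cp -p "$f" "$f.bak"                            # keep a backup first
    sed -i "/^sshd: ${ip_re}\$/d" "$f"
}

# Usage on testserver1:
# remove_deny_entry 192.181.166.55
```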

Log in to testserver2, switch to user kanachim, and create a new key:

-bash-3.2$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/kanachim/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/kanachim/.ssh/id_rsa.
Your public key has been saved in /home/kanachim/.ssh/id_rsa.pub.
The key fingerprint is:
37:6a:69:c1:a4:01:4b:c5:34:be:32:42:90:0b:20:a0 kanachim@192.181.166.55
The key's randomart image is:
+--[ RSA 2048]----+
|B.  o++          |
|=. . +..         |
|E.. . o .        |
|..     *         |
|  . o o S o      |
|   . o   = .     |
|        =        |
|       o         |
|                 |
+-----------------+
-bash-3.2$
-bash-3.2$ ssh-copy-id kanachim@192.181.130.55
Password:
Now try logging into the machine, with "ssh 'kanachim@192.181.130.55'", and check in:
  .ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.

Now log in from testserver2 to testserver1:

-bash-3.2$ ssh 192.181.130.55
Last login: Thu Feb 27 00:30:38 2014 from 10.217.230.145
-bash-3.2$ hostname
testserver1

Tuesday, December 19, 2017

Recover RPM DB from a corrupted RPM database in RHEL 7


Rebuilding corrupted rpm database in RHEL7

Situation:
Although everything is done to ensure that your RPM databases stay intact, an RPM database may still become corrupt and unusable. This happens mainly when the filesystem on which the RPM db resides suddenly becomes inaccessible (full, read-only, unexpected reboot, and so on).

Solution:
1. Start by creating a backup of your corrupt rpm db, as follows:
[root@nsk ~]# tar zcvf rpm-db.tar.gz /var/lib/rpm/*
tar: Removing leading `/' from member names
/var/lib/rpm/Basenames
/var/lib/rpm/Conflictname
/var/lib/rpm/__db.001
/var/lib/rpm/__db.002
/var/lib/rpm/__db.003
/var/lib/rpm/Dirnames
/var/lib/rpm/Group
/var/lib/rpm/Installtid
/var/lib/rpm/Name
/var/lib/rpm/Obsoletename
/var/lib/rpm/Packages
/var/lib/rpm/Providename
/var/lib/rpm/Requirename
/var/lib/rpm/Sha1header
/var/lib/rpm/Sigmd5
/var/lib/rpm/Triggername

2. Remove stale lock files if they exist through the following command:
[root@nsk ~]# rm -f /var/lib/rpm/__db*

3. Now, verify the integrity of the Packages database via the following:
[root@nsk ~]# /usr/lib/rpm/rpmdb_verify /var/lib/rpm/Packages; echo $?
BDB5105 Verification of /var/lib/rpm/Packages succeeded.
0

If it prints 0, proceed to the next step.

4. Rename the Packages file (don't delete it, we'll need it!), as follows:
[root@nsk ~]# mv /var/lib/rpm/Packages  /var/lib/rpm/Packages.org

5. Now, dump the Packages db from the original Packages db by executing the following command:
[root@nsk ~]# cd /var/lib/rpm/
 [root@nsk rpm]# /usr/lib/rpm/rpmdb_dump Packages.org | /usr/lib/rpm/rpmdb_load Packages
rpmdb_load: BDB1540 configured environment flags incompatible with existing environment

6. Verify the integrity of the newly created Packages database. Run the following:
[root@nsk rpm]#  /usr/lib/rpm/rpmdb_verify /var/lib/rpm/Packages; echo $?
BDB5105 Verification of /var/lib/rpm/Packages succeeded.
0

If the exit code is not 0, you will need to restore the database from backup.
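If you do need to fall back to the backup from step 1, note that tar stripped the leading / from the member names, so the archive extracts relative to the filesystem root. A hedged sketch (restore_rpm_db is a hypothetical name of my own; the archive path assumes the step-1 tar command was run from root's home directory):

```shell
#!/bin/sh
# Hypothetical helper: restore the rpm db files from the step-1 backup.
# $1 = backup archive, $2 = root to extract under (defaults to /).
restore_rpm_db() {
    root="${2:-/}"
    rm -f "$root"/var/lib/rpm/__db*   # drop stale environment files first
    tar zxf "$1" -C "$root"           # members were stored without leading /
}

# Usage (then rebuild the indexes again):
# restore_rpm_db /root/rpm-db.tar.gz && rpm --rebuilddb
```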

7. Rebuild the rpm indexes, as follows:
[root@nsk rpm]# rpm -vv --rebuilddb
D: rebuilding database /var/lib/rpm into /var/lib/rpmrebuilddb.1312
D: opening  db environment /var/lib/rpm private:0x401
D: opening  db index       /var/lib/rpm/Packages 0x400 mode=0x0
D: locked   db index       /var/lib/rpm/Packages
D: opening  db environment /var/lib/rpmrebuilddb.1312 private:0x401
D: opening  db index       /var/lib/rpmrebuilddb.1312/Packages (none) mode=0x42
D: opening  db index       /var/lib/rpmrebuilddb.1312/Packages 0x1 mode=0x42
D: disabling fsync on database
....
...
D: adding "5f7fd424d0773a4202731bff4901d449699b0929" to Sha1header index.
D: closed   db index       /var/lib/rpm/Packages
D: closed   db environment /var/lib/rpm
D: closed   db index       /var/lib/rpmrebuilddb.1312/Sha1header
D: closed   db index       /var/lib/rpmrebuilddb.1312/Sigmd5
D: closed   db index       /var/lib/rpmrebuilddb.1312/Installtid
D: closed   db index       /var/lib/rpmrebuilddb.1312/Dirnames
D: closed   db index       /var/lib/rpmrebuilddb.1312/Triggername
D: closed   db index       /var/lib/rpmrebuilddb.1312/Obsoletename
D: closed   db index       /var/lib/rpmrebuilddb.1312/Conflictname
D: closed   db index       /var/lib/rpmrebuilddb.1312/Providename
D: closed   db index       /var/lib/rpmrebuilddb.1312/Requirename
D: closed   db index       /var/lib/rpmrebuilddb.1312/Group
D: closed   db index       /var/lib/rpmrebuilddb.1312/Basenames
D: closed   db index       /var/lib/rpmrebuilddb.1312/Name
D: closed   db index       /var/lib/rpmrebuilddb.1312/Packages
D: closed   db environment /var/lib/rpmrebuilddb.1312

8. Use the following command to check the rpm db with yum for any other issues (this may take a long time):
[root@nsk rpm]# yum check
Loaded plugins: fastestmirror
....
...

9. Restore the SELinux context of the rpm database through the following command:
[root@nsk rpm]# restorecon -R -v /var/lib/rpm

Saturday, November 25, 2017

Virtual machines show warning messages when starting the udev daemon in Linux


After upgrading VMware Tools, Linux virtual machines show warnings when starting the udev daemon.

dmesg shows the following messages:

Starting udev:
udevd[572]: add_to_rules: unknown key 'SUBSYSTEMS'
udevd[572]: add_to_rules: unknown key 'ATTRS{vendor}'
udevd[572]: add_to_rules: unknown key 'ATTRS{model}'
udevd[572]: add_to_rules: unknown key 'SUBSYSTEMS'
udevd[572]: add_to_rules: unknown key 'ATTRS{vendor}'
udevd[572]: add_to_rules: unknown key 'ATTRS{model}'

Pressing Ctrl+C bypasses the udev daemon so that the boot process can finish.

To disable the warning messages, comment out the unused lines (the Ubuntu and other non-matching OS entries) in the /etc/udev/rules.d/99-vmware-scsi-udev.rule file.

For Linux, we need to modify the following line from:

ACTION=="add", BUS=="scsi", SYSFS{vendor}=="VMware, " , SYSFS{model}=="VMware Virtual S", RUN+="/bin/sh -c 'echo 180 >/sys$DEVPATH/device/timeout'"

To

ACTION=="add", BUS=="scsi", SYSFS{vendor}=="VMware " , SYSFS{model}=="Virtual disk ", RUN+="/bin/sh -c 'echo 180 >/sys$DEVPATH/device/timeout'"

Save the modification and reboot the virtual machine.
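If the file contains many non-matching entries, the commenting can be scripted. A hedged sketch (comment_unused_rules is a hypothetical name of my own; it comments out every ACTION line that lacks the pattern you want to keep, so review the result before rebooting):

```shell
#!/bin/sh
# Hypothetical helper: comment out udev ACTION rules that do not mention
# the given pattern. $1 = rules file, $2 = pattern to keep.
comment_unused_rules() {
    cp -p "$1" "$1.bak"                     # keep a backup first
    sed -i "/$2/!s/^ACTION/#ACTION/" "$1"
}

# Usage for the file from the post:
# comment_unused_rules /etc/udev/rules.d/99-vmware-scsi-udev.rule "Virtual disk"
```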

Friday, November 17, 2017

Error "system was unable to find a physical volume" SOLVED -Step by Step


If you get the error "system was unable to find a physical volume", you need to restore the corrupted volume group.


Situation :

If the volume group metadata area of a physical volume is accidentally overwritten or otherwise destroyed, you will get an error message indicating that the metadata area is incorrect, or that the system was unable to find a physical volume with a particular UUID. You may be able to recover the data on the physical volume by writing a new metadata area on the physical volume, specifying the same UUID as the lost metadata.

Solution:

The following example shows the sort of output you may see if the metadata area is missing or corrupted.

[root@test]# lvs -a -o +devices

  Couldn't find device with uuid 'zhtUGH-1N2O-tHdu-b14h-gH34-sB7z-NHhkdf'.
  Couldn't find all physical volumes for volume group VG.
  Couldn't find device with uuid 'zhtUGH-1N2O-tHdu-b14h-gH34-sB7z-NHhkdf'.
  Couldn't find all physical volumes for volume group VG.

  ...

You may be able to find the UUID for the physical volume that was overwritten by looking in the /etc/lvm/archive directory. Look in the file VolumeGroupName_xxxx.vg for the last known valid archived LVM metadata for that volume group.
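Scanning the archive directory for the missing UUID can be scripted. A minimal sketch (find_archive_for_uuid is a hypothetical name of my own; it prints the last matching file in name order, which follows the sequence number in archive file names for a given volume group):

```shell
#!/bin/sh
# Hypothetical helper: print the newest archived metadata file that
# mentions the given PV UUID.
# $1 = UUID, $2 = archive directory (defaults to /etc/lvm/archive).
find_archive_for_uuid() {
    dir="${2:-/etc/lvm/archive}"
    grep -l "$1" "$dir"/*.vg 2>/dev/null | tail -n 1
}

# Usage:
# find_archive_for_uuid zhtUGH-1N2O-tHdu-b14h-gH34-sB7z-NHhkdf
```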

Alternatively, you may find that deactivating the volume group with the partial (-P) argument enables you to find the UUID of the missing or corrupted physical volume.

[root@test]# vgchange -an --partial

  Partial mode. Incomplete volume groups will be activated read-only.
  Couldn't find device with uuid 'zhtUGH-1N2O-tHdu-b14h-gH34-sB7z-NHhkdf'.
  Couldn't find device with uuid 'zhtUGH-1N2O-tHdu-b14h-gH34-sB7z-NHhkdf'.

  ...

Use the --uuid and --restorefile arguments of the pvcreate command to restore the physical volume. The following example labels the /dev/sdh1 device as a physical volume with the UUID indicated above, zhtUGH-1N2O-tHdu-b14h-gH34-sB7z-NHhkdf. This command restores the physical volume label with the metadata information contained in centos_00000-1802035441.vg, the most recent good archived metadata for the volume group.

The restorefile argument instructs the pvcreate command to make the new physical volume compatible with the old one in the volume group, ensuring that the new metadata will not be placed where the old physical volume contained data (which could happen, for example, if the original pvcreate command had used command-line arguments that control metadata placement, or if the physical volume was originally created using a different version of the software with different defaults).

The pvcreate command overwrites only the LVM metadata areas and does not affect the existing data areas.

[root@test]# pvcreate --uuid "zhtUGH-1N2O-tHdu-b14h-gH34-sB7z-NHhkdf" --restorefile /etc/lvm/archive/centos_00000-1802035441.vg /dev/sdh1
  Physical volume "/dev/sdh1" successfully created

You can then use the vgcfgrestore command to restore the volume group's metadata.

[root@test]# vgcfgrestore VG
  Restored volume group VG 

You can now display the logical volumes.

[root@test]# lvs -a -o +devices

  LV     VG   Attr   LSize   Origin Snap%  Move Log Copy%  Devices

  stripe VG   -wi--- 300.00G                               /dev/sdh1 (0),/dev/sda1(0)
  stripe VG   -wi--- 300.00G                               /dev/sdh1 (34728),/dev/sdb1(0) 

The following commands activate the volumes and display the active volumes.

[root@test]# lvchange -ay /dev/VG/stripe
[root@test]# lvs -a -o +devices

  LV     VG   Attr   LSize   Origin Snap%  Move Log Copy%  Devices
  stripe VG   -wi-a- 300.00G                               /dev/sdh1 (0),/dev/sda1(0)
  stripe VG   -wi-a- 300.00G                               /dev/sdh1 (34728),/dev/sdb1(0)

If the on-disk LVM metadata takes at least as much space as what overrode it, this command can recover the physical volume. If what overrode the metadata extended past the metadata area, the data on the volume may have been affected. You might be able to use the fsck command to recover that data.