Troubleshooting tips: what to look for and where, Part 1

1. "Core" files are dumps of a process's memory. When a program crashes, it can leave behind a core file; loading that file into a debugger can help determine the cause of the crash. Most Linux systems disable core dumps by default. You can check the current limit with the following command at the command prompt or terminal:
[root@meadow~]# ulimit -c
A value of 0 means core dumps are disabled. Enable them with:
[root@meadow~]# ulimit -c unlimited
Check the limit again; it should now report unlimited:
[root@meadow~]# ulimit -c
Some administrators consider a limit of 75000 adequate; if you prefer, use 75000 instead of unlimited in the command above. Note that ulimit only affects the current shell and the programs started from it. Enabling core dumps does not solve the problem by itself; it helps you track it down.
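An unprivileged user cannot raise the soft limit above the hard limit, so `ulimit -c unlimited` can fail for non-root users. A minimal sketch, assuming a POSIX shell such as bash or dash, that raises the soft core-file limit as far as the hard limit allows; the function name raise_core_limit is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: raise the soft core-file limit to the hard limit.
# Setting the soft limit to any value up to the hard limit is always
# allowed, even for a non-root user.
raise_core_limit() {
  local hard
  hard=$(ulimit -H -c)          # e.g. "unlimited" or a number of blocks
  ulimit -S -c "$hard"
  echo "core file limit now: $(ulimit -S -c)"
}

# Call it in the shell that will start the program you are debugging.
raise_core_limit
```

Because the change only lasts for the current shell, run it in the same session that launches the program, or use the persistent settings described next.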

To make core dumps persistent for all applications:
Edit /etc/profile and find the line that reads:
ulimit -S -c 0 > /dev/null 2>&1
Update it as follows:
ulimit -c unlimited >/dev/null 2>&1
Save and close the file. Then edit /etc/sysctl.conf:
[root@meadow~]# vi /etc/sysctl.conf
Append the following lines:
kernel.core_uses_pid = 1
kernel.core_pattern = /tmp/core-%e-%s-%u-%g-%p-%t
fs.suid_dumpable = 2
Finally, allow daemons started through the Red Hat init scripts to dump core (the init script functions read DAEMON_COREFILE_LIMIT):
[root@meadow~]# echo "DAEMON_COREFILE_LIMIT='unlimited'" >> /etc/sysconfig/init
Then reload the settings in /etc/sysctl.conf by running:
[root@meadow~]# sysctl -p
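With the core_pattern above, the kernel names each dump after the executable (%e), the signal number (%s), the user ID (%u), the group ID (%g), the PID (%p) and the time of the dump in seconds since the epoch (%t). Because the pattern already includes %p, kernel.core_uses_pid has no additional effect here. The sketch below, with a hypothetical function name, splits such a file name back into its fields; it strips fields from the right so that executable names containing hyphens still parse:

```shell
#!/bin/sh
# Hypothetical helper: decode a file name produced by
#   kernel.core_pattern = /tmp/core-%e-%s-%u-%g-%p-%t
# Fields are removed from the right, so an executable name that itself
# contains hyphens (e.g. "my-app") is kept intact.
parse_core_name() {
  local name t p g u s
  name=${1##*/}                    # drop the directory part
  name=${name#core-}               # drop the literal "core-" prefix
  t=${name##*-}; name=${name%-*}   # %t: time of dump (epoch seconds)
  p=${name##*-}; name=${name%-*}   # %p: PID
  g=${name##*-}; name=${name%-*}   # %g: group ID
  u=${name##*-}; name=${name%-*}   # %u: user ID
  s=${name##*-}; name=${name%-*}   # %s: signal number
  printf 'exe=%s signal=%s uid=%s gid=%s pid=%s time=%s\n' \
    "$name" "$s" "$u" "$g" "$p" "$t"
}

parse_core_name /tmp/core-my-app-11-0-0-4242-1700000000
# prints: exe=my-app signal=11 uid=0 gid=0 pid=4242 time=1700000000
```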

To enable core dumps only for specific daemons, edit the daemon's file in /etc/sysconfig:
[root@meadow ~]# vi /etc/sysconfig/daemon-name-file
and append the line:
DAEMON_COREFILE_LIMIT='unlimited'
Save, exit, and restart the daemon.
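To confirm that a restarted daemon really picked up the new limit, you can read its limits from /proc. A minimal sketch, assuming a Linux /proc filesystem; the function name is hypothetical and sshd is only an example daemon:

```shell
#!/bin/sh
# Hypothetical helper: show the core-file size limit of a running process.
# /proc/<pid>/limits lists the soft and hard limit for each resource.
core_limit() {
  grep 'Max core file size' "/proc/$1/limits"
}

# Check the current shell; for a daemon, pass its PID instead, e.g.:
#   core_limit "$(pidof -s sshd)"
core_limit $$
```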

You are now all set: whenever an application or daemon crashes, you can examine the resulting core file with gdb, or send it to the vendor for insight into the problem.
gdb is the GNU debugger. A debugger is typically used by developers to debug applications in development. It allows for a very detailed examination of exactly what a program is doing.
For troubleshooting, once you know which program caused the failure, start gdb with the path to that program followed by the core file:
[root@meadow ~]# gdb daemon core
gdb prints a short banner and then its prompt:
(gdb)
At the prompt, type `where` to print a backtrace of the function calls that were active when the program crashed.
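For a quick, non-interactive backtrace you can pass the command to gdb on the command line (`bt` is gdb's backtrace command; `where` is an alias for it). A sketch, assuming the core_pattern configured earlier so that dumps land in /tmp as core-*; newest_core is a hypothetical helper and /usr/sbin/sshd stands in for whichever binary crashed:

```shell
#!/bin/sh
# Hypothetical helper: print the most recently modified core file in a
# directory (default /tmp), or nothing if there is none.
newest_core() {
  local dir=${1:-/tmp}
  ls -1t "$dir"/core-* 2>/dev/null | head -n 1
}

# Example (not run here): print a backtrace without entering the gdb prompt.
#   gdb -q -batch -ex bt /usr/sbin/sshd "$(newest_core)"
```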

2. Find
For troubleshooting a system that seems to have suddenly stopped working, find what changed very recently!
[root@meadow ~]# find / -mtime -1
This recursively lists every file under / that was modified in the last 24 hours.
To be more precise, if the problem started within the last few minutes or half hour, use minutes instead:
[root@meadow ~]# find /usr/lib -mmin -30
This lists all files under /usr/lib modified in the last 30 minutes.
Similar options exist for the inode change time (-ctime, -cmin) and the access time (-atime, -amin):
[root@meadow ~]# find /tmp -amin -30
This shows all files under /tmp accessed in the last 30 minutes.
The -atime/-amin options are useful when you want to know whether an app actually reads the files it is supposed to. Run the app, then run this command in the directory holding those files; if nothing has been accessed, something is wrong. Keep in mind that many systems mount filesystems with relatime or noatime, in which case access times are not updated on every read.
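When searching from /, find also descends into /proc, /sys and other mounted filesystems, which adds a lot of noise. A minimal sketch that stays on one filesystem with -xdev, lists only regular files, and sorts the result; the function name and its defaults are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: list regular files under DIR (default /) modified
# in the last MINUTES minutes (default 30), staying on DIR's filesystem.
recent_changes() {
  local dir=${1:-/} mins=${2:-30}
  find "$dir" -xdev -type f -mmin "-$mins" 2>/dev/null | sort
}

# Example: what changed under /etc in the last hour?
recent_changes /etc 60
```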

3. Watch
It is frustrating to spend hours debugging and looking everywhere, only to find out that a simple df -h would have revealed the problem.
In addition to running out of space, it's possible to run out of filesystem inodes. `df -h` will not show this, but `df -i` shows the number of inodes used and available on each filesystem. Running out of inodes can cause even more obscure failures than running out of space. Filesystems that allocate inodes dynamically, such as XFS and Btrfs, make this less likely, but ext4 still fixes the number of inodes when the filesystem is created. A good option is to keep an eye on these numbers with watch, which reruns a command every 2 seconds by default (use -n to change the interval). For example:
[root@meadow ~]# watch df -h
[root@meadow ~]# watch free -m
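Rather than watching the numbers, you can have a script flag any filesystem that is nearly out of space or inodes. A sketch, assuming GNU-style df output (df -P gives stable POSIX columns and -i switches to inode counts); the function names and the 90% default are hypothetical choices:

```shell
#!/bin/sh
# Print "KIND: MOUNT USE%" for rows of df -P style output whose use
# percentage (column 5) is at or above the limit; column 6 is the mount
# point. Rows showing "-" (no meaningful usage) are skipped.
_over_limit() {
  awk -v l="$1" -v kind="$2" 'NR > 1 && $5 != "-" {
    sub(/%/, "", $5)
    if ($5 + 0 >= l) print kind ": " $6 " " $5 "%"
  }'
}

# Hypothetical helper: check both block usage and inode usage.
check_fs_usage() {
  local limit=${1:-90}
  df -P  2>/dev/null | _over_limit "$limit" space
  df -Pi 2>/dev/null | _over_limit "$limit" inodes
}

check_fs_usage 90
```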

Always check syslog, application logs, dmesg, service status and lastlog.
I like to follow my log files with:
[root@meadow ~]# tail -f /var/log/messages
to see recent messages as they arrive, or I use grep to search for exact strings.
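A minimal sketch of the grep approach, showing the last few lines that mention common failure keywords; the function name, the keyword list and the defaults are hypothetical and should be adapted to your logs (on some distributions the main log is /var/log/syslog rather than /var/log/messages):

```shell
#!/bin/sh
# Hypothetical helper: show the last N (default 20) lines of a log file
# that mention common failure keywords, matched case-insensitively.
recent_errors() {
  local file=${1:-/var/log/messages} n=${2:-20}
  grep -iE 'error|fail|denied|segfault' "$file" | tail -n "$n"
}

# Example (not run here):
#   recent_errors /var/log/messages 50
```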

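Watch for missing files that an app needs in order to run, and check whether there were any errors. For example, if your ssh daemon is showing errors, trace the files it tries to open:
[root@meadow ~]# strace -e trace=open,openat /usr/sbin/sshd -d
On current systems most files are opened with the openat() system call, so trace both open and openat. Also use the ldd command to see whether any shared libraries are missing.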
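Reading a long strace log by eye is tedious. A sketch, assuming strace output was written to a file with -o, that prints the paths of open/openat calls that failed with ENOENT (file not found); the function name is hypothetical, and paths containing escaped quotes are not handled:

```shell
#!/bin/sh
# Hypothetical helper: list paths that a traced program tried to open but
# that did not exist. Expects strace lines such as:
#   openat(AT_FDCWD, "/etc/foo.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
# An optional PID prefix (from strace -f) is skipped by the sed expression.
missing_files() {
  grep -E 'open(at)?\(.*= -1 ENOENT' "$1" |
    sed -E 's/^[^"]*"([^"]*)".*/\1/' |
    sort -u
}

# Example (not run here):
#   strace -f -o /tmp/sshd.trace -e trace=open,openat /usr/sbin/sshd -d
#   missing_files /tmp/sshd.trace
```

ENOENT results are not always errors: programs routinely probe for optional files, so look for paths the application clearly needs. For missing shared libraries, `ldd /usr/sbin/sshd | grep 'not found'` gives a quicker answer.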

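Also check System V IPC resources with ipcs, and remove stale ones with ipcrm, for apps such as Apache and Oracle that use shared memory and semaphores.
Watch for correct file permissions: neither too permissive nor too restrictive. See man chmod.
Watch for firewall rules that might be blocking traffic.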
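As one example of permissions that are too loose, world-writable files under a system directory usually deserve a closer look. A sketch using find's -perm test (-perm -0002 matches files that have at least the other-write bit set); the function name and the /etc default are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: list regular files under DIR (default /etc) that
# any user can write to, staying on DIR's filesystem.
world_writable() {
  find "${1:-/etc}" -xdev -type f -perm -0002 2>/dev/null | sort
}

# Example (not run here):
#   world_writable /etc
```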

4. RPM dependencies missing?
Common usages include:
$ rpm -<options> package.rpm
-qp --requires package.rpm: see which dependencies a package file needs (for a new install)

-i: install
-ivh: install with verbose output and hash marks (#) as a progress bar (--percent prints percentages instead)
-Uvh: upgrade; installs the package and removes the older version, if any
-Uvh --oldpackage: allow replacing a newer installed version with an older one (downgrade)
-e: erase (remove) an installed package
-e --test package: simulate the erase without removing anything
-e --repackage package: save the erased files as a package first, as a backup (older RPM releases only)
-q package: query an installed package
-q --whatrequires package: show which installed packages require the given package
-qa: list every package installed on the system
-qpi package.rpm: show information about a package file that is not installed; -p queries the package file, and in query mode -i means --info, not install
-qpR package.rpm: list what a package file requires
-qf /path/to/file: show which installed package a file or program comes from
-ql package: list the files in an installed package
rpm -Uvh ftp://user:pass@ftpserver/directory/package.rpm (install from FTP)
--test: add to an install, upgrade or erase command to simulate what it would do without changing anything
-F: freshen; upgrade only packages that are already installed
-q --configfiles package (or -qc): list the configuration files of a package
$ rpmbuild -ba path/to/package.spec
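This builds both the binary and the source packages. The .spec file is usually located in the /usr/src/redhat/SPECS directory (newer releases default to ~/rpmbuild/SPECS). The binary RPM package(s) are written to /usr/src/redhat/RPMS/arch (~/rpmbuild/RPMS/arch on newer releases), where "arch" depends on your system and build configuration. If the build succeeds, you can install the generated package(s) with rpm -ivh.
$ rpm -qa --queryformat '%10{size} %{name}-%{version}\n' | sort -n
lists every installed package with its installed size in bytes, sorted from smallest to largest.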
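To see the largest packages first in a more readable form, the same query can be piped through a small formatter. A sketch; the function name is hypothetical, and it expects "SIZE NAME" lines such as those produced by the rpm query above (SIZE in bytes):

```shell
#!/bin/sh
# Hypothetical helper: read "SIZE NAME" lines (SIZE in bytes) on stdin and
# print the N largest (default 10) with sizes converted to KiB, MiB or GiB.
top_packages() {
  sort -k1,1nr | head -n "${1:-10}" | awk '{
    size = $1; unit = "B"
    if (size >= 1073741824) { size /= 1073741824; unit = "GiB" }
    else if (size >= 1048576) { size /= 1048576; unit = "MiB" }
    else if (size >= 1024) { size /= 1024; unit = "KiB" }
    printf "%8.1f %-3s  %s\n", size, unit, $2
  }'
}

# Example (on an RPM-based system):
#   rpm -qa --queryformat '%{size} %{name}-%{version}\n' | top_packages 10
```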

… to be continued