How do you figure out what's going wrong with a VM?
Still working on building this page.
Here's a list of commands that I find useful when debugging performance issues on Linux VMs where an application is running slowly. With these commands, I try to get a good idea of what might be causing the slow performance of the OS and the application.
Check the server load
uptime
#Check load averages to see if anything seems odd.
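A load average only tells you something relative to the number of CPUs in the VM, so one quick check is to divide the 1-minute figure by the CPU count. Here is a minimal sketch of that, assuming a Linux guest where `/proc/loadavg` is readable; `load_per_cpu` is a hypothetical helper name, not a standard tool.

```shell
#!/bin/sh
# Hypothetical helper: normalize a load average by the number of CPUs.
# $1 = load average, $2 = CPU count
load_per_cpu() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

# The first field of /proc/loadavg is the 1-minute load average (Linux-specific).
if [ -r /proc/loadavg ]; then
    read -r load1 rest < /proc/loadavg
    cpus=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
    echo "1-min load per CPU: $(load_per_cpu "$load1" "$cpus")"
fi
```

A value that stays well above 1.00 suggests more runnable (or, on Linux, uninterruptibly waiting) tasks than there are CPUs to serve them.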
Check the RAM and CPU usage at a glance with processes
htop
#Check if the CPU and the RAM usages are maxing out.
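When htop is not installed, or a single number is needed for a script, the RAM figure can be approximated from `/proc/meminfo`. This is only a sketch: `mem_used_pct` is a hypothetical helper, and it assumes a kernel that exposes the `MemAvailable` field (older kernels do not).

```shell
#!/bin/sh
# Hypothetical helper: read /proc/meminfo-style text on stdin and print
# the percentage of RAM in use, computed as (MemTotal - MemAvailable) / MemTotal.
mem_used_pct() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%.1f\n", (total - avail) * 100 / total }'
}

if [ -r /proc/meminfo ]; then
    echo "RAM in use: $(mem_used_pct < /proc/meminfo)%"
fi
```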
Check the network performance
sar -n DEV 1 #Report network device statistics
sar -n TCP,ETCP,DEV 1 #Report incoming and outgoing TCP connections and TCP errors, along with network device stats.
netstat -s
netstat -tnlp
netstat -r #prints the route table
#Get to know the things happening at the TCP level.
#Check which applications are using which ports.
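To get just the listening ports out of the netstat output, for example to compare two VMs, the listing can be filtered with awk. This is a rough sketch: `listening_ports` is a hypothetical helper, and it assumes the usual net-tools column layout in which the local address is the fourth field and the state is the sixth.

```shell
#!/bin/sh
# Hypothetical helper: read `netstat -tnl`-style output on stdin and print
# the unique listening TCP ports in numeric order.
listening_ports() {
    awk '$1 ~ /^tcp/ && $6 == "LISTEN" {
             n = split($4, parts, ":")   # the port is whatever follows the last ":"
             print parts[n]
         }' | sort -nu
}

# Only run the real command when netstat is available on this machine.
if command -v netstat >/dev/null 2>&1; then
    netstat -tnl 2>/dev/null | listening_ports
fi
```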
Check the network interface performance
nicstat 1
#Get the network interface statistics
#Check the network throughput (reads, writes) and interface %util
Check the process stats
pidstat -t 1 #%user, %system
pidstat -d 1 #Disk I/O
#Very useful process stats, e.g. by thread or disk I/O
Trace System Calls
strace -p <pid>
strace -tp "$(pgrep <process>)" 2>&1 | head -100
#system call tracer - basically records all the system calls going on.
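A raw trace gets long very quickly, so a per-syscall summary over a fixed window is often easier to read. Here is a sketch using strace's `-c` option; `trace_summary` is a hypothetical wrapper, and it assumes the coreutils `timeout` command is available and that you are allowed to attach to the target process.

```shell
#!/bin/sh
# Hypothetical helper: summarize the syscalls of a running process.
# $1 = pid, $2 = seconds to trace (defaults to 10)
trace_summary() {
    pid=$1
    secs=${2:-10}
    if [ -z "$pid" ]; then
        echo "usage: trace_summary <pid> [seconds]" >&2
        return 2
    fi
    # -c counts calls, errors and time per syscall; -f also follows threads
    # and child processes. timeout sends SIGINT (like Ctrl-C) after $secs so
    # that strace detaches and prints its summary table.
    timeout -s INT "$secs" strace -c -f -p "$pid"
}

# Example: trace_summary <pid> 5
```

strace adds noticeable overhead to the traced process, so a short window is the safer choice on a busy VM.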