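Troubleshooting high CPU load on a production server
1. Find the process with high CPU usage
Running the top command showed that the Java process with PID 12443 was using as much as 350% CPU, which was causing the fault. (top reports usage relative to a single core, so a multithreaded process can exceed 100%.)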
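For a scripted check, the same lookup can be sketched without the interactive top screen. This is a minimal sketch, assuming a procps-style ps; hottest_process is a hypothetical helper name, not part of any tool. Note that ps reports %CPU averaged over the lifetime of the process, so the number may differ from the value top shows.

```shell
# Reads `ps -eo pid,pcpu,comm` output on stdin and prints the row with the
# highest %CPU. hottest_process is a hypothetical helper name.
hottest_process() {
  awk 'BEGIN { max = -1 }
       # Skip the header row; keep the row whose second column is largest.
       NR > 1 && $2 + 0 > max { max = $2 + 0; row = $0 }
       END { if (row != "") print row }'
}

# Usage (assumes procps ps):
#   ps -eo pid,pcpu,comm | hottest_process
```

The helper only parses text, so it can be tried on saved ps output before running it on a live server.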
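2. Locate the specific thread or code
Once the process is identified, the next step is to locate the specific thread or code responsible. First, list the threads of the process, sorted by CPU usage in descending order, with the following command: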
[root@localhost logs]# ps -mp 12443 -o THREAD,tid,time | sort -rn -k2,2
The output looks like this:
USER %CPU PRI SCNT WCHAN USER SYSTEM TID TIME
root 10.6 17 - - - - 1838 10:12:20
root 10.2 17 - - - - 3223 10:12:16
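Thread 1838 has consumed the most CPU: its TIME column ([DD-]hh:mm:ss) reads 10:12:20, that is, more than 10 hours of accumulated CPU time.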
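Picking the busiest thread out of that listing can also be sketched in a few lines of awk. This assumes the nine-column layout shown above (%CPU in column 2, TID in column 8); hottest_thread is a hypothetical helper name.

```shell
# Reads `ps -mp <pid> -o THREAD,tid,time` output on stdin and prints the TID
# of the thread with the highest %CPU. hottest_thread is a hypothetical name.
hottest_thread() {
  awk 'BEGIN { max = -1 }
       # Only rows with a numeric TID are threads; this skips the header
       # and the per-process summary row, whose TID column is "-".
       $8 ~ /^[0-9]+$/ && $2 + 0 > max { max = $2 + 0; tid = $8 }
       END { if (tid != "") print tid }'
}

# Usage:
#   ps -mp 12443 -o THREAD,tid,time | hottest_thread
```

Filtering on a numeric TID rather than on line position means the helper works whether or not the listing has already been sorted.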
Alternatively, use top -Hp pid to locate the thread (inside top, Shift+P sorts by CPU and Shift+M sorts by memory):
top -Hp 8958
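This lists all threads of the process. In this per-thread view the PID column holds the thread ID, so the row with the highest %CPU identifies the thread consuming the most CPU.
Convert the thread ID to hexadecimal, because jstack reports native thread IDs in hex (the nid field):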
[root@localhost logs]# printf "%x\n" 1626
65a
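Because jstack prints the value with a 0x prefix, it can be convenient to produce the search string in that exact form. A minimal sketch, where tid_to_nid is a hypothetical helper name:

```shell
# Converts a decimal thread ID into the hexadecimal form used by the nid
# field in jstack output. tid_to_nid is a hypothetical helper name.
tid_to_nid() {
  printf '0x%x\n' "$1"
}

# Usage, with the thread found in the ps listing above:
#   tid_to_nid 1838
```

For thread 1838 this yields 0x72e, which is the string to look for after nid= in the dump.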
Finally, print the thread stack traces. Run jstack -l [PID] > /tmp/output.txt, where [PID] is the ID of the Java process, and then analyze /tmp/output.txt.
The dump begins with the time it was taken and basic JVM information:
2019-06-12 16:13:06
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.152-b16 mixed mode):
The thread information of the program follows:
"JetCacheHeavyIOExecutor3" #85 daemon prio=5 os_prio=0 tid=0x00007f76a93ab800 nid=0x1c47a waiting on condition [0x00007f7696acb000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1088)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
        - None
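Match the hexadecimal thread ID against the nid field in the dump to find that thread's stack trace, which points to the problematic code.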
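The matching step can be sketched as a small filter over the dump file. This assumes the layout shown above, where every thread entry starts with a line beginning with a double quote; stack_for_nid is a hypothetical helper name.

```shell
# Prints the stack trace of the thread whose nid matches.
#   $1 = nid in hex with 0x prefix, e.g. 0x72e
#   $2 = path to the jstack dump, e.g. /tmp/output.txt
# stack_for_nid is a hypothetical helper name.
stack_for_nid() {
  # The trailing space keeps 0x72e from also matching 0x72e1.
  awk -v nid="nid=$1 " '
    # A line starting with a double quote opens a new thread entry;
    # print only while inside the entry that carries the wanted nid.
    /^"/ { printing = (index($0, nid) > 0) }
    printing { print }
  ' "$2"
}

# Usage:
#   stack_for_nid 0x72e /tmp/output.txt
```

If the matching thread happens to be the last entry, any trailer lines that follow it in the dump are printed as well, since nothing marks the end of the entry.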