本文概览:在程序hang住时,可以通过jstack获取线程堆栈信息,分析是否是死锁或者死循环导致程序hang住。发现是log4j的问题,通过修改<root>中日志级别由DEBUG变为INFO和注释掉ConsoleAppender来修复
1 问题描述
在执行任务时,hang住了,于是通过 “ jstack -l 进程ID “命令来获取线程堆栈信息。如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 |
"fileModelCheckExecutor-19" prio=10 tid=0x00007f986cfcd800 nid=0x4f66 waiting for monitor entry [0x00007f975fffe000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.logging.log4j.core.appender.OutputStreamManager.write(OutputStreamManager.java:122) - waiting to lock <0x00000000b84256e0> (a org.apache.logging.log4j.core.appender.OutputStreamManager) at org.apache.logging.log4j.core.appender.OutputStreamManager.write(OutputStreamManager.java:135) at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputStreamAppender.java:110) Locked ownable synchronizers: - <0x00000000bb1d0710> (a java.util.concurrent.ThreadPoolExecutor$Worker) "fileModelCheckExecutor-18" prio=10 tid=0x00007f986cfcb800 nid=0x4f65 runnable [0x00007f9767ffe000] java.lang.Thread.State: RUNNABLE at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:345) at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122) - locked <0x00000000b84233d8> (a java.io.BufferedOutputStream) at java.io.PrintStream.write(PrintStream.java:480) - locked <0x00000000b84256c0> (a java.io.PrintStream) at org.apache.logging.log4j.core.appender.ConsoleAppender$CloseShieldOutputStream.write(ConsoleAppender.java:312) at org.apache.logging.log4j.core.appender.OutputStreamManager.write(OutputStreamManager.java:122) - locked <0x00000000b84256e0> (a org.apache.logging.log4j.core.appender.OutputStreamManager) at org.apache.logging.log4j.core.appender.OutputStreamManager.write(OutputStreamManager.java:135) at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputStreamAppender.java:110) at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:99) Locked ownable synchronizers: - <0x00000000bb0dee10> (a java.util.concurrent.ThreadPoolExecutor$Worker) "fileModelCheckExecutor-17" prio=10 tid=0x00007f986cb50000 nid=0x4f64 waiting for monitor entry [0x00007f981414d000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.logging.log4j.core.appender.OutputStreamManager.write(OutputStreamManager.java:122) - waiting to lock <0x00000000b84256e0> (a org.apache.logging.log4j.core.appender.OutputStreamManager) at org.apache.logging.log4j.core.appender.OutputStreamManager.write(OutputStreamManager.java:135) at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputStreamAppender.java:110) at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:99) at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:430) at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:409) at java.lang.Thread.run(Thread.java:745) Locked ownable synchronizers: - <0x00000000b8be89b0> (a java.util.concurrent.ThreadPoolExecutor$Worker) |
2 线程dump文件的介绍
2.1 进程状态
首先对于上面dump文件中线程的状态进行解析解释。java的进程状态有如下:
1 2 3 4 5 6 7 8 9 |
public enum State { NEW, RUNNABLE, BLOCKED, WAITING, TIMED_WAITING, TERMINATED; } |
对于每一个状态解释:
- RUNNABLE 正在执行的线程
- BLOCKED 被 RetrentLock#lock 或者 Synchronized 函数阻塞
- WAITING 通过wait操作或ReentrantLock#Condition进行阻塞。
- TIMED_WAITING 就是在wiat或者ReentrantLock#Condition进行阻塞时,设置了一个阈值时间。
不同线程状态之间转换可以通过下图来展示(http://www.uml-diagrams.org/examples/java-6-thread-state-machine-diagram-example.html)
2.2 死锁状态分析
1 、如果怀疑是死锁,需要关注线程什么状态?
首先想到的是要查看下Block状态的线程。对于死锁产生的一个例子如下
1 2 3 4 5 6 7 |
thread1 (BLOCKED) wait to lock <0x20002> locked <0x 100001> thread 2 (BLOCKED) wait to lock <0x10001> locked <0x 200002> |
还有一种情况是,线程thread1获取了一个锁,然后进行死循环,造成等待这个锁的thread2和thread3进入了hang住的状态。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
thread1 (RUNABLE) 发生了死循环 locked <0x 100001> ........ thread 2 (BLOCKED) wait to lock <0x10001> ....... thread 3 (BLOCKED) wait to lock <0x10001> ........ |
除了BLOCKED状态,对于WAITING状态,相当于使用了wait操作或者在RettrentLock中Condition,也会造成死锁。
2、获取产生死锁的线程
通过分析 “locked xxx “和 “waiting to lock xxx”信息来进行分析。
(1)取得 BLOCKED 状态的线程,
在这些线程中通过 “wait to lock xxx ”来找到锁的关键字,如下得到关键字 “0x00000000b84256e0”
1 2 3 4 5 6 7 8 |
"fileModelCheckExecutor-19" prio=10 tid=0x00007f986cfcd800 nid=0x4f66 waiting for monitor entry [0x00007f975fffe000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.logging.log4j.core.appender.OutputStreamManager.write(OutputStreamManager.java:122) - waiting to lock <0x00000000b84256e0> (a org.apache.logging.log4j.core.appender.OutputStreamManager) at org.apache.logging.log4j.core.appender.OutputStreamManager.write(OutputStreamManager.java:135) at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputStreamAppender.java:110) Locked ownable synchronizers: ......... |
(2)查询“locked 0x00000000b84256e0”的线程
1 2 3 4 5 6 7 8 9 10 11 12 |
"fileModelCheckExecutor-18" prio=10 tid=0x00007f986cfcb800 nid=0x4f65 runnable [0x00007f9767ffe000] java.lang.Thread.State: RUNNABLE at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:345) at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122) - locked <0x00000000b84233d8> (a java.io.BufferedOutputStream) at java.io.PrintStream.write(PrintStream.java:480) - locked <0x00000000b84256c0> (a java.io.PrintStream) at org.apache.logging.log4j.core.appender.ConsoleAppender$CloseShieldOutputStream.write(ConsoleAppender.java:312) at org.apache.logging.log4j.core.appender.OutputStreamManager.write(OutputStreamManager.java:122) - locked <0x00000000b84256e0> (a org.apache.logging.log4j.core.appender.OutputStreamManager) at org.apache.logging.log4j.core.appender.OutputStreamManager.write(OutputStreamManager.java:135) |
(3)这样,我们就可以知道 线程“fileModelCheckExecutor-18” 获取了锁 “0x00000000b84256e0”,其他线程都在等着获取这个锁而被阻塞。但是“fileModelCheckExecutor-18” 这个线程进行入了死循环。
注意:为什么说是死循环?因为这个线程状态为Runnable。如果改线程状态为Blocked,那么此时就是死锁状态了。
3 分析 dump文件
1、首先找到处于blocked状态线程,根据“waiting to lock <0x00000000b84256e0> “找到关键词“0x00000000b84256e0”
1 2 3 4 |
"Thread-1" prio=10 tid=0x00007f986cc48800 nid=0x4778 waiting for monitor entry [0x00007f9815460000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.logging.log4j.core.appender.OutputStreamManager.write(OutputStreamManager.java:122) - waiting to lock <0x00000000b84256e0> (a org.apache.logging.log4j.core.appender.OutputStreamManager) |
2、整个文件中搜索 “0x00000000b84256e0” ,发现有一个地方进行locked,对 “0x00000000b84256e0” 进行了上锁。然后其他等待这个锁的地方都被阻塞了看,如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 |
"fileModelCheckExecutor-18" prio=10 tid=0x00007f986cfcb800 nid=0x4f65 runnable [0x00007f9767ffe000] java.lang.Thread.State: RUNNABLE at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:345) at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122) - locked <0x00000000b84233d8> (a java.io.BufferedOutputStream) at java.io.PrintStream.write(PrintStream.java:480) - locked <0x00000000b84256c0> (a java.io.PrintStream) at org.apache.logging.log4j.core.appender.ConsoleAppender$CloseShieldOutputStream.write(ConsoleAppender.java:312) at org.apache.logging.log4j.core.appender.OutputStreamManager.write(OutputStreamManager.java:122) - locked <0x00000000b84256e0> (a org.apache.logging.log4j.core.appender.OutputStreamManager) at org.apache.logging.log4j.core.appender.OutputStreamManager.write(OutputStreamManager.java:135) at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputStreamAppender.java:110) at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:99) at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:430) at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:409) at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:367) at org.apache.logging.log4j.core.Logger.logMessage(Logger.java:112) at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:738) at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:708) at org.apache.logging.slf4j.Log4jLogger.info(Log4jLogger.java:188) at com.baifubao.licai.financeweb.service.api.common.income.service.IncomeCalcService.calcIncome(IncomeCalcService.java:64) at com.baidu.finance.licai.engine.checker.ladder.LadderBalanceAndProfitChecker.computeProfitAndAdjustment(LadderBalanceAndProfitChecker.java:550) at com.baidu.finance.licai.engine.checker.ladder.LadderBalanceAndProfitChecker.check(LadderBalanceAndProfitChecker.java:204) at com.baidu.finance.licai.engine.checker.ladder.LadderBalanceAndProfitChecker.check(LadderBalanceAndProfitChecker.java:83) at com.baidu.finance.licai.engine.extracter.BatchFileExtracter.checkOneFileModel(BatchFileExtracter.java:355) at com.baidu.finance.licai.engine.extracter.BatchFileExtracter.access$000(BatchFileExtracter.java:37) at com.baidu.finance.licai.engine.extracter.BatchFileExtracter$CheckCallable.call(BatchFileExtracter.java:386) at com.baidu.finance.licai.engine.extracter.BatchFileExtracter$CheckCallable.call(BatchFileExtracter.java:376) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Locked ownable synchronizers: - <0x00000000bb0dee10> (a java.util.concurrent.ThreadPoolExecutor$Worker) |
2、定位
就是log4j造成的,如上日志中 线程”fileModelCheckExecutor-18″ 进入到了死循环。
3、解决
如下是从网上找的的方案:将root的log级别设置为INFO.
https://stackoverflow.com/questions/3537870/production-settings-file-for-log4j/
4、后续(补)
当前文件中,出现问题的线程信息如下
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
"fileModelCheckExecutor-18" prio=10 tid=0x00007f986cfcb800 nid=0x4f65 runnable [0x00007f9767ffe000] java.lang.Thread.State: RUNNABLE at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:345) at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122) - locked <0x00000000b84233d8> (a java.io.BufferedOutputStream) at java.io.PrintStream.write(PrintStream.java:480) - locked <0x00000000b84256c0> (a java.io.PrintStream) at org.apache.logging.log4j.core.appender.ConsoleAppender$CloseShieldOutputStream.write(ConsoleAppender.java:312) at org.apache.logging.log4j.core.appender.OutputStreamManager.write(OutputStreamManager.java:122) - locked <0x00000000b84256e0> (a org.apache.logging.log4j.core.appender.OutputStreamManager) at org.apache.logging.log4j.core.appender.OutputStreamManager.write(OutputStreamManager.java:135) at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputStreamAppender.java:110) at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:99) |
又查看了2017-05-26和2017-05-24的dump文件,如下:
- 2017-05-26的dump文件
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
"commonPublisherExecutor-16" prio=10 tid=0x00007fa6a879d000 nid=0x6147 runnable [0x00007fa650f5a000] java.lang.Thread.State: RUNNABLE at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:345) at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122) - locked <0x00000000b87c3348> (a java.io.BufferedOutputStream) at java.io.PrintStream.write(PrintStream.java:480) - locked <0x00000000b87ad2a0> (a java.io.PrintStream) at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221) at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291) at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295) at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141) - locked <0x00000000b883f0a0> (a java.io.OutputStreamWriter) at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229) at java.util.logging.StreamHandler.flush(StreamHandler.java:242) - locked <0x00000000b86ce908> (a java.util.logging.ConsoleHandler) at java.util.logging.ConsoleHandler.publish(ConsoleHandler.java:106) at java.util.logging.Logger.log(Logger.java:616) |
- 2017-05-24的dump文件
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
"fileModelCheckExecutor-25" prio=10 tid=0x00007f571cc0d800 nid=0x3e89 runnable [0x00007f56724e2000] java.lang.Thread.State: RUNNABLE at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:345) at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122) - locked <0x00000000b87fb5c0> (a java.io.BufferedOutputStream) at java.io.PrintStream.write(PrintStream.java:480) - locked <0x00000000b87fa3e8> (a java.io.PrintStream) at org.apache.logging.log4j.core.appender.ConsoleAppender$SystemOutStream.write(ConsoleAppender.java:274) at java.io.PrintStream.write(PrintStream.java:480) - locked <0x00000000b8984a80> (a java.io.PrintStream) at org.apache.logging.log4j.core.appender.ConsoleAppender$CloseShieldOutputStream.write(ConsoleAppender.java:312) at org.apache.logging.log4j.core.appender.OutputStreamManager.write(OutputStreamManager.java:122) - locked <0x00000000b8984a60> (a org.apache.logging.log4j.core.appender.OutputStreamManager) |
由上面三个信息,共同点都涉及到了ConsoleAppender、CosoleHandler。所以这里想到一个方法是注释掉consoleAppender,如下
1 2 3 4 5 6 7 8 9 |
<Loggers> <Root level="DEBUG" includeLocation="true"> <!-- <AppenderRef ref="Console"/> --> <AppenderRef ref="File"/> <AppenderRef ref="error"/> </Root> </Loggers> |
参考资料
1、线程状态 http://fangjian0423.github.io/2016/06/04/java-thread-state/xi
2、jstack分析线程堆栈 https://segmentfault.com/a/1190000000615690
(全文完)