背景
出问题的系统是一个存储系统,系统有一个功能就是多线程下载hadoop上的文件;上层业务方使用的是python,而公司不提供python api下载hadoop上的文件,只提供java api下载;因此我们的系统就变成了上层用户通过python调用我们的python api,我们的python api通过py4j调起jvm操作java中的对象和方法;用户在下载文件通常是100多个进程,某天用户通知我们说你们下载一个文件咋下载了一整天都没下载下来,根据日志和下载的文件发现不是因为下载文件慢,而是程序就卡住了;
分析过程
1、分析日志,跟踪程序,发现代码没啥毛病,日志也看不出来个所以然
2、jstack -l pid 堆栈跟踪 ,堆栈信息给出部分
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.131-b11 mixed mode):
"Attach Listener" #186 daemon prio=9 os_prio=0 tid=0x00007fbf98001000 nid=0x28e8 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
Locked ownable synchronizers:
- None
"pool-47-thread-4" #185 prio=5 os_prio=0 tid=0x00007fbf4c44c800 nid=0x2b76 runnable [0x00007fbf348e3000]
java.lang.Thread.State: RUNNABLE
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:326)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
- locked <0x0000000080223058> (a java.io.BufferedOutputStream)
at java.io.PrintStream.write(PrintStream.java:480)
- locked <0x0000000080223038> (a java.io.PrintStream)
at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
at ch.qos.logback.core.joran.spi.ConsoleTarget$1.write(ConsoleTarget.java:37)
at ch.qos.logback.core.encoder.LayoutWrappingEncoder.doEncode(LayoutWrappingEncoder.java:131)
at ch.qos.logback.core.OutputStreamAppender.writeOut(OutputStreamAppender.java:187)
at ch.qos.logback.core.OutputStreamAppender.subAppend(OutputStreamAppender.java:212)
at ch.qos.logback.core.OutputStreamAppender.append(OutputStreamAppender.java:100)
at ch.qos.logback.core.UnsynchronizedAppenderBase.doAppend(UnsynchronizedAppenderBase.java:84)
at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:48)
at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:270)
at ch.qos.logback.classic.Logger.callAppenders(Logger.java:257)
at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:421)
at ch.qos.logback.classic.Logger.filterAndLog_0_Or3Plus(Logger.java:383)
at ch.qos.logback.classic.Logger.info(Logger.java:591)
at com.baidu.adb.client.download.AbstractPartitionDownloader$Worker.run(AbstractPartitionDownloader.java:99)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- <0x0000000587fa71f0> (a java.util.concurrent.ThreadPoolExecutor$Worker)
- <0x000000008021cbe8> (a java.util.concurrent.locks.ReentrantLock$FairSync)
"pool-47-thread-3" #184 prio=5 os_prio=0 tid=0x00007fbf4c44b800 nid=0x2b74 waiting on condition [0x00007fbf35af5000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000008021cbe8> (a java.util.concurrent.locks.ReentrantLock$FairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantLock$FairSync.lock(ReentrantLock.java:224)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
at ch.qos.logback.core.OutputStreamAppender.subAppend(OutputStreamAppender.java:210)
at ch.qos.logback.core.OutputStreamAppender.append(OutputStreamAppender.java:100)
at ch.qos.logback.core.UnsynchronizedAppenderBase.doAppend(UnsynchronizedAppenderBase.java:84)
at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:48)
at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:270)
at ch.qos.logback.classic.Logger.callAppenders(Logger.java:257)
at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:421)
at ch.qos.logback.classic.Logger.filterAndLog_1(Logger.java:398)
at ch.qos.logback.classic.Logger.info(Logger.java:583)
at com.baidu.adb.client.download.AbstractPartitionDownloader$Worker.run(AbstractPartitionDownloader.java:145)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- <0x0000000587fa4e68> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"pool-47-thread-2" #183 prio=5 os_prio=0 tid=0x00007fbf4c44b000 nid=0x2b73 waiting on condition [0x00007fbf349e4000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000008021cbe8> (a java.util.concurrent.locks.ReentrantLock$FairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantLock$FairSync.lock(ReentrantLock.java:224)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
at ch.qos.logback.core.OutputStreamAppender.subAppend(OutputStreamAppender.java:210)
at ch.qos.logback.core.OutputStreamAppender.append(OutputStreamAppender.java:100)
at ch.qos.logback.core.UnsynchronizedAppenderBase.doAppend(UnsynchronizedAppenderBase.java:84)
at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:48)
at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:270)
at ch.qos.logback.classic.Logger.callAppenders(Logger.java:257)
at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:421)
at ch.qos.logback.classic.Logger.filterAndLog_1(Logger.java:398)
at ch.qos.logback.classic.Logger.info(Logger.java:583)
at com.baidu.adb.client.download.AbstractPartitionDownloader$Worker.run(AbstractPartitionDownloader.java:145)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- <0x0000000587fa2ae0> (a java.util.concurrent.ThreadPoolExecutor$Worker)
从堆栈信息中可以看到,pool-47-thread-4 处于RUNNABLE,并且获得了2个锁
Locked ownable synchronizers:
- <0x0000000587fa71f0> (a java.util.concurrent.ThreadPoolExecutor$Worker)
- <0x000000008021cbe8> (a java.util.concurrent.locks.ReentrantLock$FairSync)
而pool-47-thread-3和pool-47-thread-2的状态是WAITING,并且都是处于waiting on condition状态,并且都在等待0x000000008021cbe8,也就是死锁了
再看一下pool-47-thread-4的堆栈信息, at ch.qos.logback.core.joran.spi.ConsoleTarget$1.write(ConsoleTarget.java:37),也就是说是logback没有释放锁,并且是console appender上
查看logback文件,logger中有个appender是console
解决方案
搜索了一下,http://www./article/p-shezgwyf-ps.html ,发现logback打印到console确实会出现死锁现象,原因在于buffer,解决方法是把logger中的console appender去掉即可