We ran into an interesting issue with a JVM app recently.
Basically, after a period of time the app would become unresponsive. Running htop showed one thread consuming ~100% of one CPU while all other CPUs were idle, as were the other threads belonging to the app. Once the app got into this state, it never came back.
Looking at this, the first thing you will probably think of is that the thread is spinning, most probably in a busy loop or a livelock. You might try running jstack to take a look at the threads (I did), but that turned up nothing useful: jstack couldn’t get any information on the running threads. Next, I tried to attach jdb to the app, hoping for an easy win.
However, there wasn’t much information to be found there, and restarting the app in debug mode wasn’t an option. So I tried running the app with hprof (in LOG ALL THE THINGS mode). Interestingly, after a while (a similar time frame to when we would otherwise notice one thread consuming ~100% CPU) the app just crashed. In case it was a memory issue, I cut back on the number of items I was logging and tried again. It still crashed, but it seemed to take a little longer. The crash error report wasn’t too helpful, but the fact that the crash took longer when logging fewer metrics seemed to support the instinct that it could be a memory issue.
Thus, I tried to look at the app’s heap usage. Aha, eden and perm gen space are both at ~100%! Not good. Well … good, as it’s a promising lead.
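If you want the same view from inside the app, here is a minimal sketch using the standard java.lang.management API. It is not the tool I used for the check above. The class name PoolUsage and its helper methods are made up for illustration, and pool names (eden, perm gen and so on) vary by JVM version and collector.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper: prints how full each JVM memory pool is.
class PoolUsage {
    // Percentage of max in use, or -1 when the pool reports no defined max.
    static double percentUsed(long used, long max) {
        if (max <= 0) {
            return -1.0;
        }
        return 100.0 * used / max;
    }

    // One line per memory pool: name, bytes used, max bytes, percent used.
    static String report() {
        StringBuilder sb = new StringBuilder();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            double pct = percentUsed(usage.getUsed(), usage.getMax());
            String pctText = pct < 0 ? "n/a" : String.format("%.1f%%", pct);
            sb.append(String.format("%-30s used=%d max=%d (%s)%n",
                    pool.getName(), usage.getUsed(), usage.getMax(), pctText));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

Logging this report periodically from a background thread would leave a trail to read after the app stops responding, since an outside tool may not be able to get anything out of the process at that point.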
Consequently, I restarted the app with gc logging turned on to see what was going on. Fairly quickly, the gc logs were filling up with “Full GC” lines. These coincided perfectly with the single thread eating up one CPU.
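On HotSpot JVMs of the perm gen era, gc logging is usually turned on with flags such as -verbose:gc and -XX:+PrintGCDetails. If restarting with new flags is awkward, a rough alternative is to sample the collectors from inside the app. The sketch below is an assumption-laden illustration rather than what I actually ran: GcOverhead and its methods are hypothetical names, and the reported collection time is only an approximation.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: estimates what share of wall-clock time goes to GC.
class GcOverhead {
    // Total milliseconds spent in GC so far, summed over all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when undefined
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // Fraction of a sampling window spent in GC, clamped to [0, 1].
    static double overhead(long gcMillisDelta, long wallMillisDelta) {
        if (wallMillisDelta <= 0) {
            return 0.0;
        }
        double fraction = (double) gcMillisDelta / wallMillisDelta;
        return Math.max(0.0, Math.min(1.0, fraction));
    }

    public static void main(String[] args) throws InterruptedException {
        long lastGc = totalGcMillis();
        long lastWall = System.nanoTime();
        for (int i = 0; i < 5; i++) {
            Thread.sleep(1000);
            long gc = totalGcMillis();
            long wall = System.nanoTime();
            double fraction = overhead(gc - lastGc, (wall - lastWall) / 1_000_000);
            System.out.printf("gc overhead in last window: %.0f%%%n", fraction * 100);
            lastGc = gc;
            lastWall = wall;
        }
    }
}
```

An overhead that sits near 100% window after window is the in-process equivalent of a log full of “Full GC” lines. Bear in mind that a stop-the-world collection also pauses the sampling thread, so samples will arrive late once things get bad.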
So: with eden and perm gen both full, the JVM can’t find room for new allocations, so the gc thread runs a full gc to clean things up. However, nothing can be cleaned up. So it tries again. And again. Ever hopeful, but never getting anywhere.
tl;dr: is your JVM app unresponsive, with one thread showing really high CPU usage? It’s probably the garbage collector; turn on gc logging to verify.
So how to fix this?
As far as I know, there really isn’t an easy way to fix this issue that isn’t just kicking the can down the street.
Auditing your app for memory leaks is a good first step. If you are legitimately using a lot of memory, either increase the max heap space (kick the can down the street) or, preferably, figure out a way to batch the work you are doing to minimize memory usage.
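To make the batching idea concrete, here is a minimal sketch, assuming the work can be pulled from an iterator and handled in independent chunks. Batcher and forEachBatch are hypothetical names, not something from our app, and the lambda syntax needs Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: feeds items to a handler in fixed-size batches so that
// only one batch at a time has to be held in memory.
class Batcher {
    // Returns the number of batches handed to the handler.
    static <T> int forEachBatch(Iterator<T> source, int batchSize, Consumer<List<T>> handler) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<T> batch = new ArrayList<>(batchSize);
        while (source.hasNext()) {
            batch.add(source.next());
            if (batch.size() == batchSize) {
                handler.accept(batch);
                // Start a fresh list so the finished batch becomes collectable.
                batch = new ArrayList<>(batchSize);
                batches++;
            }
        }
        if (!batch.isEmpty()) {
            handler.accept(batch); // final, partially filled batch
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            items.add(i);
        }
        int count = forEachBatch(items.iterator(), 4, batch -> System.out.println(batch));
        System.out.println(count + " batches");
    }
}
```

This only helps if the source is itself lazy (a cursor, a file reader, a queue) and the handler doesn’t hold on to the batches it is given. Loading everything into a list first, as main does here for demonstration, defeats the purpose.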