JVM GC fun times

We recently ran into an interesting issue with a JVM app.

Basically, after running for a while, the app would become unresponsive. Running htop showed one thread consuming ~100% of one CPU while all other CPUs were idle, as were the other threads belonging to the app. Once the app got into this state, it never came back.

Looking at this, the first thing you will probably think of is that the thread is spinning, most likely stuck in a busy loop or a livelock. You might try running jstack to take a look at the threads (I did), which turned up nothing useful (jstack couldn’t get any information on the running threads). Then I tried to attach jdb to the app, hoping for an easy win.

However, there wasn’t much information to be found there, and restarting the app in debug mode wasn’t an option. Next, I tried running the app with hprof (in LOG ALL THE THINGS mode). Interestingly, after a while (a similar time frame to when we would otherwise notice one thread consuming ~100% CPU) the app just crashed. In case it was a memory issue, I cut back on the number of items I was logging and tried again. It still crashed, but it seemed to take a little longer. The crash report wasn’t too helpful, but the fact that the crash took longer when logging fewer metrics seemed to support the instinct that it could be a memory issue.

So I looked at the app’s heap usage. Aha, eden and perm gen space were both at ~100%! Not good. Well … good, as it’s a promising lead :) I restarted the app with gc logging turned on to see what was going on. Fairly quickly, the gc logs filled up with “Full GC” lines. These coincided perfectly with the single thread eating up one CPU.

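So: the gc thread realizes that perm gen is full, so it runs a full gc to clean it up. (Strictly speaking, objects that outgrow eden are promoted to the survivor spaces and then the old generation; perm gen holds class metadata. Either way, a full perm gen triggers a full gc.) However, nothing can be cleaned up. So it tries again. And again. Ever hopeful, but never getting anywhere.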

tl;dr: is your JVM app unresponsive, with one thread pegging a CPU? It’s probably the garbage collector. Turn on gc logging to verify.
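
To put a number on “the logs are filling up with Full GC lines”, here is a minimal Python sketch that reports the longest run of back-to-back full collections in a gc log. The function name and the sample lines are made up for illustration; real gc log formats vary by JVM version and flags, so the sketch relies only on the “Full GC” text itself.

```python
def longest_full_gc_run(lines):
    """Return the longest streak of consecutive lines mentioning 'Full GC'."""
    longest = current = 0
    for line in lines:
        if "Full GC" in line:
            current += 1
            longest = max(longest, current)
        else:
            # any other line (e.g. a minor collection) breaks the streak
            current = 0
    return longest


# Hypothetical log excerpt, not real JVM output.
sample_log = [
    "1.0: [GC ...]",
    "2.0: [Full GC ...]",
    "3.0: [Full GC ...]",
    "4.0: [Full GC ...]",
]

if __name__ == "__main__":
    print(longest_full_gc_run(sample_log))
```

A long streak with no minor collections in between is the pattern described above: the collector keeps retrying and never frees anything.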

So how to fix this?
As far as I know, there really isn’t an easy fix that isn’t just kicking the can down the road.
Auditing your app for memory leaks is a good first step. If you are legitimately using a lot of memory, either increase the max heap space (kicking the can down the road) or, preferably, figure out a way to batch the work you are doing to minimize memory usage.
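
Batching is easier to show than to describe. Here is a minimal sketch, in Python purely for brevity (the same shape works on the JVM). The names batches and process_in_batches are hypothetical, and the sketch assumes the work items can be produced lazily rather than loaded all at once.

```python
from itertools import islice


def batches(items, size):
    """Yield lists of at most `size` items, pulling lazily from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def process_in_batches(items, size, handle):
    """Hand each batch to `handle`; only the current batch is held here."""
    for batch in batches(items, size):
        handle(batch)
```

As long as handle doesn’t hang on to what it is given, each batch can be collected before the next one is built, so peak memory tracks the batch size rather than the size of the whole job.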

Set a baseline of expectations for code reviews

Code reviews are important. Everyone knows this. Not only do they help enforce a baseline of quality in the product, they are also great for disseminating knowledge among developers.

Something that is important to realize, however, is that not all reviews are created equal. Not only do they differ from person to person, they may differ from day to day. So I think teams should get together and talk through what it means for code to have passed review.

What follows is the contents of the email I sent to my team to start us off in this process. Keep in mind that this is just a way to get the conversation started. No one person should dictate what a code review should encompass. Rather, the baseline should be generated by the team.

Very Basic:
- Does the code work? Verify happy path, and any edge cases you can find.
- Are there excuses? If there are comments or TODOs that say something could be done better, question them. TODOs rarely get cleaned up.
- Follow the boy scout rule. Everyone works with legacy code that makes us cringe. However, make sure that the changes leave the code in a better state than they found it.
- Does it smell?
- Is it idiomatic? i.e. make sure the code doesn’t fight the language/environment.
- Are there tests?

Basics:
- Aim for low cyclomatic complexity, with a max of 5.
- Aim for good code coverage, ideally 100% branch coverage.
- Watch for function parameters that are flags. A function should do one thing. If it does different things based on a flag, split it into separate functions. Similarly, classes should really just do one thing (and do it well).
- Is it SOLID?
- In most cases, interfaces make more sense than abstract classes.
- Don’t comment too much, because comments usually go out of date. Your code will (esp. with Python) usually be clear enough. Aim for readable code. Yes, some code needs comments. In these cases, make sure you comment what/why rather than how.
- Premature optimization is bad, esp. if it comes at the cost of readability. BUT keep the speed/memory ramifications in mind. It’s easier to make a correct program fast than it is to make a fast program correct :)
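
The bullet about flag parameters is the one that most often needs an example in a review. Here is a small Python sketch; render_report and friends are hypothetical names invented for this illustration, not code from any real project.

```python
# Before: one function, two behaviours selected by a flag.
def render_report(rows, as_csv):
    if as_csv:
        return "\n".join(",".join(str(cell) for cell in row) for row in rows)
    return "\n".join(" | ".join(str(cell) for cell in row) for row in rows)


# After: two functions that each do one thing, sharing a small helper.
def _join_rows(rows, separator):
    return "\n".join(separator.join(str(cell) for cell in row) for row in rows)


def render_csv(rows):
    return _join_rows(rows, ",")


def render_table(rows):
    return _join_rows(rows, " | ")
```

The call sites read better too: render_csv(rows) says what it does, whereas render_report(rows, True) makes the reader go look up what True means.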

Higher level/design stuff:
- Look at the class hierarchy. Does it make sense with our models? Are the right concepts being expressed in the code? Note that you aren’t looking to see if the person solved the problem exactly like you would have. You are verifying that the solution makes sense in terms of the domain.
- Look for anti-patterns. Take a look at the dev. section. Hopefully, most of these will make you laugh (not ruefully :) )

Be nice:
- Criticize the code, not the person.
- Be aware that there are different ways of solving a problem.
- Here’s a nice write up on the spirit of doing a code review.

There are loads of additional things that you could put in here. Different people will focus on different things, but the idea is to establish a baseline. If your team doesn’t have a baseline yet, I hope this helps you get started.

My phone number is pretty awesome


If you take f(n) to be a function that gives you the nth prime,
( f ∘ f ∘ f )  (141)  followed by the largest fermat prime you can get your hands on is my number :)
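
If you would rather compute than look things up, here is a minimal Python sketch of f and of function composition. The names nth_prime and compose are my own, trial division is just the simplest approach that is fast enough here, and the Fermat prime half of the puzzle is left to you.

```python
def nth_prime(n):
    """Return the nth prime (1-indexed), found by trial division."""
    if n < 1:
        raise ValueError("n must be at least 1")
    primes = []
    candidate = 2
    while len(primes) < n:
        is_prime = True
        for p in primes:
            if p * p > candidate:
                break  # no divisor up to sqrt(candidate), so it is prime
            if candidate % p == 0:
                is_prime = False
                break
        if is_prime:
            primes.append(candidate)
        candidate += 1
    return primes[-1]


def compose(*functions):
    """compose(f, g)(x) == f(g(x)); the rightmost function is applied first."""
    def composed(x):
        for function in reversed(functions):
            x = function(x)
        return x
    return composed


f_cubed = compose(nth_prime, nth_prime, nth_prime)
```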




Quick starting Scalatra or In which I discover Scalatra and sbt

Over the weekend, I was researching various frameworks for implementing a REST API. Although I had already started the implementation using Tornado, I wanted to see what else was out there.

And am I glad I looked. I discovered Scalatra, which seems to be exactly what I was looking for: a lightweight, Sinatra-esque way to map URLs to actions that easily lends itself to testing. I especially like the uber-readable way the tests are written.

Who wouldn’t want to write tests like this?

// taken from http://github.com/alandipert/step
class MyScalatraServletTests extends FunSuite with ShouldMatchers with ScalatraTests {
  // `MyScalatraServlet` is your app which extends ScalatraServlet
  route(classOf[MyScalatraServlet], "/*")
 
  test("simple get") {
    get("/path/to/something") {
      status should equal (200)
      body should include ("hi!")
    }
  }
}

I cloned the repo, ran the examples and decided my search had ended.

However, while running the example was easy enough, I wasn’t sure of how to get started with an actual app. It looked especially cryptic since I haven’t ever used maven or sbt. I even considered bailing on scalatra for the well-known shores of tornadoweb. But, since I recently started working with Java at work, I decided to stick it out.

I’m glad I did because sbt is a pleasure to use, especially if you take the time to RTFM.

Anyway, here are the basic steps to help cut down the 0 to 60 time when starting scalatra :)

Pre-reqs:
Install Java (I have Java 1.6).
Set up sbt as per these instructions.

Good, now we are ready to start.
Create a new sbt project “HelloScalatra”:

 mkdir HelloScalatra
 cd HelloScalatra
 sbt
# fill out the other inputs as you like, but make sure you enter 2.8.0 as the Scala version.

Create a project definition for sbt and save it under project/build. Here’s a barebones one, refer to the docs if you want more info on creating sbt build configs.

// save as project/build/HelloScalatraBuild.scala
import sbt._
class HelloScalatraBuild(info: ProjectInfo) extends DefaultWebProject(info)
{
  // scalatra
  val sonatypeNexusSnapshots = "Sonatype Nexus Snapshots" at "https://oss.sonatype.org/content/repositories/snapshots"
  val sonatypeNexusReleases = "Sonatype Nexus Releases" at "https://oss.sonatype.org/content/repositories/releases"
  val scalatra = "org.scalatra" %% "scalatra" % "2.0.0-SNAPSHOT"
 
  // jetty
  val jetty6 = "org.mortbay.jetty" % "jetty" % "6.1.22" % "test"
  val servletApi = "org.mortbay.jetty" % "servlet-api" % "2.5-20081211" % "provided"
}

Tell sbt to account for the new dependencies:

# from project root
sbt update

We have the basic dependencies taken care of, so let’s create our class that will serve requests. Create “HelloScalatra.scala” under src/main/scala/com/helloscalatra with the following content:

// save as src/main/scala/com/helloscalatra/HelloScalatra.scala
package com.helloscalatra
 
import org.scalatra._
 
class HelloScalatra extends ScalatraServlet with UrlSupport {
 
 before {
   contentType = "text/html"
 }
 
 get("/") {
   <html>
     <head>
       <title> My first scalatra webapp</title>
     </head>
     <body>
       <h1> Hello Scalatra </h1>
     </body>
   </html>
 }
 
 protected def contextPath = request.getContextPath
}

The last thing we need to do is set up web.xml to tell jetty what to do. Save the following as src/main/webapp/WEB-INF/web.xml:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE web-app
 PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.2//EN"
 "http://java.sun.com/j2ee/dtds/web-app_2_2.dtd">
<web-app>
 
 <servlet>
   <servlet-name>HelloScalatra</servlet-name>
   <servlet-class>com.helloscalatra.HelloScalatra</servlet-class>
 </servlet>
 
 <servlet-mapping>
   <servlet-name>HelloScalatra</servlet-name>
   <url-pattern>/*</url-pattern>
 </servlet-mapping>
</web-app>

Great! We should be good to go :)
Go to the project root and start up the sbt console:

sbt

and start jetty:

>jetty-run

TADA!

Navigate over to localhost:8080 to see the webapp in action.


Installing java in ubuntu lucid

Hopefully this saves someone 10 mins:

Sun Java was moved to the partner repo for lucid.
So, you’ll need to add deb http://archive.canonical.com/ubuntu lucid partner to your sources:

# tee runs under sudo, so the append to the root-owned file succeeds
echo "deb http://archive.canonical.com/ubuntu lucid partner" | sudo tee -a /etc/apt/sources.list

then update and install:

sudo apt-get update
sudo apt-get install sun-java6-jre sun-java6-jdk

PuSH Support follow up

Just a quick follow up from the last post.

I’ve just added bots that send you updates via email or jabber. Go to rackjam.codereviewr.com to check it out.

It’s a little rough around the edges in terms of UX, but it gets the job done. And, although there were some constraints that will probably cause me to move, all this was ridiculously easy thanks to Google App Engine.

Release & iterate and codereviewr PuSH support

I have recently been working on CodeReviewr. It’s motivated by the need to have a place to easily stick up code and discuss it with minimum hassle. I have rewritten the code numerous times using various platforms (from mochiweb to node.js to tornadoweb) towards multiple goals (collaborative coding, reviews, etc.) as I succumbed to feature creep, premature optimization, not-invented-here, et al.

Anyway, a couple of weekends back I decided to just deploy it so that we could use it for rackjam. I don’t consider it even close to finished. For example, you have to refresh in order to see comments you just made (ack), and there is no documentation on how to use it. But it’s finished enough. I decided to stop trying to release something perfect and go with the release and iterate model. So I rounded off the minimal feature set and let dogfooding sort out the priorities.

A few weekends later, I am still adding features. BUT these are user-driven requirements. And I have a website that people are using :) It’s surprisingly good motivation to keep trying to make something better when you see people actually using it. Driven by feedback, I have added support for multiple versions, diffs between versions, better diff visualization, the ability to download raw files, free private domain support, etc. Funnily enough, I still haven’t been asked to fix commenting to address the refresh annoyance! More than anything else, this has taught me the importance of release+iterate as opposed to trying to imagine complete use cases. Of course, I will throw in real time comment updates at some point, but not before the core feature set.

But this blog post isn’t only about extolling the virtues of MVP, release+iterate, dogfooding, blah blah blah. It’s about a new feature I have been asked to add: the ability to get emails when a review or set of reviews you are interested in changes.

Before beginning this task, I set two requirements: sending subscribers updates should be as painless and as decoupled from the current code as possible, and the update management subsystems should be as pluggable as possible. The web app really shouldn’t care who gets updated and how. And anyway, I definitely don’t want the webapp sending email. So my ideal solution is 1 line of code that fires off an event, so that w/e subsystem or subsystems are in charge of updates can inform interested parties as (and when) they see fit.

The first thing that I thought of was a message queue (à la beanstalkd) where I push out messages whenever an update occurs. Interested consumers could then process the message and send updates to whomever and however they wanted. All the logic for handling the complexity of delivering the message would live solely in the consumers; the webapp fires [off an event] and forgets. Sounds pretty perfect.

Could we do better? Enter PubSubHubBub. (I’m going to assume you know about it, or will do the required research to get up to speed if you are interested.) After some prototyping I decided to go with it because

  • It met the 2 requirements I set forth.
  • It’s easy.
  • And, it allows me to have an interim solution for free that
    was good enough (rss feeds).

So after adding rss feeds to reviews (append .rss to the url) and subdomains (/feed.rss), creating a hub at superfeedr, and adding the 1 line of code to ping the hub when there is an update, I am 100% there w.r.t providing users with a way to receive updates and 80% there w.r.t email alerts. All I have to do now is to write bots that will subscribe to the hub and push out notifications through w/e medium people want. The best thing about this is CodeReviewr is now PubSubHubBub capable so anyone else can subscribe to the hub right now if they don’t want to wait for me to implement the bots they want :)
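
For the curious, the “1 line of code to ping the hub” boils down to a form-encoded POST with hub.mode=publish and hub.url set to the feed that changed, which is the publisher notification described in the PubSubHubbub spec. Here is a minimal Python sketch. The function names are mine, the URLs in the usage note are placeholders, and this is not the actual CodeReviewr code.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_ping(hub_url, feed_url):
    """Build the POST request telling the hub that feed_url has new content."""
    body = urlencode({"hub.mode": "publish", "hub.url": feed_url}).encode("ascii")
    return Request(
        hub_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def ping_hub(hub_url, feed_url, timeout=10):
    """Send the ping and return the HTTP status code the hub answered with."""
    with urlopen(build_ping(hub_url, feed_url), timeout=timeout) as response:
        return response.status
```

Calling ping_hub("http://hub.example.com/", "http://example.com/feed.rss") after saving an update is all the web app has to do; the hub fetches the feed and fans the new entries out to whoever subscribed.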


How I stopped worrying and started loving my iPad

I got a nice little surprise yesterday from my girlfriend: a 32GB iPad. And now that I’ve played with it, I think I understand why it’s such a great device. Yeah, I know, it doesn’t have X amount of RAM or Y input ports, and it’s a closed system, etc. I, too, scoffed at this “crippled” device and bemoaned various things missing that, if only they were present, would make it BE-AWESOME.

But, after owning it for a day+ish, I think I get where I was wrong. The reason I, and many others, get mired in these comparisons is because we think that the iPad is meant to be a smaller laptop. It’s not. Whereas, on a laptop you can do both content creation and consumption, the iPad is primarily a content consumption device; you can watch movies, play games, listen to music, read books, but you really can’t edit or create anything of the sort.

And that’s perfectly fine. Think of it as a TV on steroids, rather than a downsized laptop. For lots of people, especially during their personal time, content generation really isn’t what they spend time doing. Despite this, the average computer is designed more for content generation than consumption.

If you think you are primarily a producer rather than a consumer, you may be surprised. I know I was. I went through the entire weekend, reading books, checking email etc. on my new iPad without missing my laptop in the least. Yeah, I need my laptop to code and I could never be without one, but the iPad fills a lot of gaps I lug my laptop around for.

A side effect of having a consumption only device is that now I can manage my time better. While I am on my laptop, I may be on hacker news or reddit for hours before realizing how much time I have wasted. However, by restricting the consumption activities to my iPad and productive ones to my laptop, I find it a lot easier to get more done :)

TL;DR: If you don’t think the iPad is great, then you probably don’t get it. You may need to get it to get it.



P.S.: No, I don’t think I am suffering from Stockholm syndrome.


Get updates when your server’s ip changes

It’s a simple problem: I want to get updates when my home server’s ip address changes. It’s such a simple and pervasive problem that I’m sure there are loads of solutions out there. But I figured it would be faster/easier to roll my own rather than evaluate+integrate w/e else is out there.
My first instinct at a solution consisted of 2 parts:

  1. Poll for changes to self.ip
  2. Publish updated ip using/to w/e

Let’s tackle part 1. The main question here is how the server should get its external ip. The first thing that came to mind is a web page that echoes the visitor’s ip. So I wrote a simple Google App Engine app that echoes your ipv4 address when you hit it. My script on the server GETs its external ip by hitting http://my-ipaddr.appspot.com/ and, if it has changed, publishes it using the passed-in function. Here’s the code:

# ipwatcher.py
import urllib2
import socket
import time
 
root_url = 'http://my-ipaddr.appspot.com'
 
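def get_ip():
    # Ask the echo service for our external address. Returns None if the
    # request fails or the response doesn't look like an IPv4 address.
    try:
        ip = urllib2.urlopen(root_url).read().strip()
        socket.inet_aton(ip)
        return ip
    except (urllib2.URLError, socket.error):
        return None
 
def watch(publisher):
    if not callable(publisher):
        raise Exception('The publisher needs to be a callable function')
 
    last_ip = None
    while True:
        current_ip = get_ip()
        # Skip failed lookups so we never publish None or forget the last good ip.
        if current_ip is not None and current_ip != last_ip:
            publisher(current_ip)
            last_ip = current_ip
 
        time.sleep(5*60)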
 
 
def publish(ip):
    # publish however
    print ip
 
if __name__ == '__main__':
    watch(publish)
 
 
# google app engine part
# simpler than simple. It's just here for completeness
from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app
 
class MainPage(webapp.RequestHandler):
    def get(self):
        # echo user ip
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.out.write(self.request.remote_addr)
 
application = webapp.WSGIApplication(
                                     [('/', MainPage)],
                                     debug=True)
def main():
    run_wsgi_app(application)
 
if __name__ == "__main__":
    main()

Yeah, it’s crummy that I do GET requests every 5 minutes, but with the GAE cap at 500 requests/s I’m not too worried yet. I don’t know if there is a better way to get the external IP, but as this works and it’s a means to an end, I’m going to resist getting distracted.
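
One thing worth knowing about the validation step: socket.inet_aton is lenient and accepts shorthand forms such as “127.1”. If you want to insist on a full dotted quad, the Python 3 ipaddress module is stricter. Here is a minimal sketch; parse_ipv4 is a hypothetical helper, not part of the script above, and it assumes Python 3.

```python
import ipaddress


def parse_ipv4(text):
    """Return text as a dotted-quad string, or None if it isn't one."""
    candidate = text.strip()  # the echo service may append a newline
    try:
        return str(ipaddress.IPv4Address(candidate))
    except ipaddress.AddressValueError:
        return None
```

This also guards against the echo service returning an error page instead of an address, since anything that doesn’t parse comes back as None.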

So, part 1 done with no sweat. Let’s move on to part 2, where we actually publish the ip. My first instinct was just to use email. But I get too many emails anyway. Second, I thought of scp or ftp to push the ip out. Meh. Been there, done that.

Since I have something running on GAE, why not use it? Let’s do it.
GAE’s datastore makes it brain-dead easy to persist data. We could just do this:

 
from google.appengine.ext import db
 
name = "my-ip"
class IpAddress(db.Model):
    ip = db.StringProperty(required=True)
    name =  db.StringProperty(required=True)
 
class MainPage(webapp.RequestHandler):
    def get(self):
        # store the visitor's ip under a fixed name, then echo it back
        record = IpAddress.gql("where name = :1", name).get()
        if not record:
            record = IpAddress(ip=self.request.remote_addr, name=name)
        else:
            record.ip = self.request.remote_addr
        record.put()
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.out.write(record.ip)

This would probably work just fine. At least until random people/bots start hitting it. But, with maybe a hundred views a week on my blog, I don’t foresee too many people hitting the random appspot site.

However, since I am playing around with the GAE, might as well make it a little more interesting. Let’s try to make a simple key value store.

class KeyValue(db.Model):
    name = db.StringProperty(required=False)
    passwd = db.StringProperty(required=False)
    value = db.TextProperty(required=False)
    modified = db.DateProperty(auto_now=True)

The password field is there just so random people who guess a key won’t be able to edit the data. Yup, it’s not encrypted. Why? Because I already know the password of everyone who’s going to use it (me :) )
If you are dying to use it and don’t want me to see your password, let me know and I can hash it or something.

Anyway, adding simple CRUD is easy too:

class KVStoreUpdate(webapp.RequestHandler):
    def post(self):
        # post to an existing key
        self.response.headers['Content-Type'] = 'text/plain'
        (name, passwd) =  (self.request.get('name'), self.request.get('passwd'))
        content = self.request.get('content')
        kv = KeyValue.gql("where name = :1 and passwd = :2", name, passwd).get()
        if not kv:
            self.response.out.write('-1')
        else:
            kv.value = content
            kv.put()
            self.response.out.write(kv.name)
 
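    def get_name_passwd(self, request):
        # helper that pulls the (name, passwd) pair out of a request
        return (request.get('name'), request.get('passwd'))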
 
class KVStoreCreate(webapp.RequestHandler):
    def post(self):
        # reserve a new key or confirm old key.
        self.response.headers['Content-Type'] = 'text/plain'
        (name, passwd) =  (self.request.get('name'), self.request.get('passwd'))
 
        kv = KeyValue.gql("where name = :1", name).get()
        if not kv:
            newKv = KeyValue (name = name, passwd = passwd)
            newKv.put()
            self.response.out.write(name)
        elif kv.passwd == passwd:
            self.response.out.write(name)
        else:
            self.response.out.write('-1')
 
class KVStoreRead(webapp.RequestHandler):
    def post(self):
        # return stored value for the given key
        self.response.headers['Content-Type'] = 'text/plain'
        (name, passwd) =  (self.request.get('name'), self.request.get('passwd'))
 
        kv = KeyValue.gql("where name = :1 and passwd = :2", name, passwd).get()
        if not kv:
            self.response.out.write('-1')
        else:
            self.response.out.write(kv.value)
 
class KVStoreDelete(webapp.RequestHandler):
    def post(self):
        # delete kv 
        self.response.headers['Content-Type'] = 'text/plain'
        (name, passwd) =  (self.request.get('name'), self.request.get('passwd'))
 
        kv = KeyValue.gql("where name = :1 and passwd = :2", name, passwd).get()
        if not kv:
            self.response.out.write('-1')
        else:
            kv.delete()
            self.response.out.write(name)
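
 
If the contract between the four handlers is hard to see through the datastore calls, here is an in-memory model of it in plain Python 3. InMemoryKVStore is a hypothetical stand-in written for this illustration; it is not what runs on GAE, and it only mirrors the rules above: create reserves or confirms a key, everything else needs the right password, and failures come back as '-1'.

```python
class InMemoryKVStore:
    """Mimics the create/update/read/delete handlers, returning '-1' on failure."""

    def __init__(self):
        self._entries = {}  # name -> (passwd, value)

    def _authorized(self, name, passwd):
        entry = self._entries.get(name)
        return entry is not None and entry[0] == passwd

    def create(self, name, passwd):
        # Reserve a new key, or confirm an existing one if the password matches.
        if name not in self._entries:
            self._entries[name] = (passwd, None)
            return name
        return name if self._authorized(name, passwd) else "-1"

    def update(self, name, passwd, content):
        if not self._authorized(name, passwd):
            return "-1"
        self._entries[name] = (passwd, content)
        return name

    def read(self, name, passwd):
        if not self._authorized(name, passwd):
            return "-1"
        return self._entries[name][1]

    def delete(self, name, passwd):
        if not self._authorized(name, passwd):
            return "-1"
        del self._entries[name]
        return name
```

One consequence of using '-1' as the error value is that a stored value of '-1' is indistinguishable from a failed read, which is fine for ip addresses but worth remembering.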

Now all we need is the client code:

import urllib2
import urllib
import random 
import string
 
# inbox_id is just the key. I was going to have versioning + 
# addressing but decided not to since it was starting to look
# like a hybrid of a queuing system and a key value store.
class KeyValuePublisher:
    def __init__(self, inbox_id, passwd, url):
        self.inbox_id = inbox_id
        self.passwd = passwd
        self.url = url
        self.reserve_inbox()
 
    def reserve_inbox(self):
        self.create_inbox()
 
    def create_inbox(self):
        response = self.make_request({}, "/c")
        if (response == '-1'):
            raise Exception('Name already taken, or password invalid')
 
    def publish(self, content):
        response = self.make_request({'content': content}, "/u")
        if (response == '-1'):
            raise Exception('Invalid update')
 
    def read(self):
        return self.make_request({}, "/r")
 
    def delete(self):
        response = self.make_request({}, "/d")
        if (response == '-1'):
            raise Exception("Can't delete")
 
 
    def make_request(self, data, resource = "", method = 'POST'):
        data['name'] = self.inbox_id
        data['passwd'] = self.passwd
        opener = urllib2.build_opener(urllib2.HTTPHandler)
        request = urllib2.Request(self.url + resource, data=urllib.urlencode(data))
        request.add_header('Content-Type', 'application/x-www-form-urlencoded') 
        request.get_method = lambda: method
        return opener.open(request).read()        
 
if __name__ == "__main__":
 
    # publish strings to a valid mailbox
    kvs = KeyValuePublisher("srijak0", "password", 'http://my-ipaddr.appspot.com')
    for i in range(20):
        # don't shadow the builtin len or the outer loop variable
        length = random.randint(0,10000)
        content = ''.join(random.choice(string.letters) for _ in xrange(length))
        kvs.publish(content)
        assert content == kvs.read()
 
    # initialize with taken mailbox name
    passed = False
    try:
        kvs = KeyValuePublisher("srijak0", "not_password", 'http://my-ipaddr.appspot.com')
    except Exception:
        passed = True
    assert passed
 
    # delete mailbox
    kvs = KeyValuePublisher("srijak0", "password",  'http://my-ipaddr.appspot.com')
    kvs.delete()
    assert kvs.read() == '-1'
 
    print "[Tests passed]"

Pretty self explanatory.

And there you have it. Not the prettiest code I’ve ever written but it gets the job done well enough for <1hour of work :)


Massaging a Tornado web pain point: restart requirement

I have been playing around with the Tornado web framework and I really like it: the shallow learning curve paired with everything it brings to the table makes for a very good framework to at least kick off your next real time app.

One of the core assumptions in tornado is that requests are handled quickly: if there are any blocking calls in your request handlers, they will cause other requests to queue. So I have http wrappers around my dbs and queues so that I can handle these blocking calls from my request handlers asynchronously.

Well and good, you say. What is the problem? The pain point is that nodes need to be restarted in order for code changes to propagate. And though it’s not a huge problem, it’s started to bug me that I have to waste seconds(!) restarting three nodes every time I make a change. So I wrote a quick python script to do so. It takes the simplest approach: polling for changes in the current directory and restarting nodes if required. I was going to use inotify, but as OSX apparently doesn’t have it (it has something called FSEvents), I decided to put off learning a new lib for another day so I could keep hacking on my project.

# Simple script to poll all files of interest below the current working directory
# for changes. On change, it will run w/e commands you want. For me, it has
# been helpful in restarting tornado web nodes.
import os
import re
import time
import signal
from subprocess import Popen
 
# define what you want to run here:
# each task is a list of command/arguments to run, popen style
tasks = [['python','httpDatabase.py'],['python','main.py']]
 
# define the file types you want to trigger on
# I opted for .py and .html files.
file_regexp = re.compile(r"\.(py|html)$")
 
def files_have_changed(old_stats, new_stats):
    # dict equality covers added, removed and modified files in one go
    return old_stats != new_stats
 
def get_stats():
    # map each matching file below the cwd to its last-modified time
    stats = {}
    for root, folders, files in os.walk(os.getcwd()):
        for name in files:
            if not file_regexp.search(name):
                continue
            path = os.path.join(root, name)
            try:
                stats[path] = os.stat(path).st_mtime
            except OSError:
                # the file vanished between the walk and the stat; skip it
                pass
    return stats
 
 
handles = []
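def stop_current():
    for h in handles:
        print "Killing %d" % (h.pid)
        try:
            os.kill(h.pid, signal.SIGTERM)
        except OSError:
            # the process already exited on its own
            pass
        h.wait()  # reap the child so it doesn't linger as a zombie
    del handles[:]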
 
def restart():
    print "Files changed. Restarting"
    stop_current()
    for t in tasks:
        p = Popen(t)
        print "Started %s. PID:%d" % (" ".join(t),p.pid)
 
        handles.append(p)
 
 
last_stats = {} 
while 1:
    current_stats = get_stats()
 
    if (files_have_changed(last_stats, current_stats)):
        last_stats = current_stats
        restart()
    time.sleep(1)