The non-programmer’s explanation of the Java deserialisation bug

In the last year we’ve had several serious and well-publicised software vulnerabilities, like Heartbleed and Shellshock, that set the whole tech press chattering and even made the national news. But not all vulnerabilities are as well marketed. One particular bug has been around for years and has been publicly known for over nine months, but is only now getting attention thanks to a report by Foxglove Security and a corresponding Slashdot article. As Foxglove say, “no one gave it a fancy name, there were no press releases” but “this bug is unlikely to go away soon”.

The report by Foxglove is a fascinating read if you’re a Java programmer, but it is very long and deeply technical. My goal is to explain what’s happening in enough detail that everyone else can understand how this bug got there, and why it’s not simple to get rid of. I do assume you are at least a little bit technical. You are reading an article about a software vulnerability after all.

This bug affects lots of very popular Java programs like IBM WebSphere and Jenkins. It exists because two very common Java features interact in a way that makes a lot of popular software vulnerable. The reason the bug isn’t going away any time soon is that, technically, neither of these features has a security vulnerability on its own, so neither can be “fixed”. Both are doing exactly what they claim to do. But the way most Java programmers use them together creates a serious vulnerability.

Feature 1: Java serialisation. Bits of data in a Java program, like numbers and lists, are called objects. Objects can be used by the program they live in, but can’t be sent over a network as they are. Serialisation is a feature built into Java that converts an object into a stream: a message containing a series of instructions that tell another computer how to recreate the same object. For example, if the object is the list of numbers “4, 5, 6”, then the stream might be the instructions “create an empty list with 3 slots, add ‘4’ in the first slot, add ‘5’ in the second slot, add ‘6’ in the third slot”.
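
For the programmers in the audience, here is a minimal sketch of that round trip using Java’s built-in ObjectOutputStream and ObjectInputStream. The class name SerialisationDemo and the helper names toStream and fromStream are made up for this illustration, and a byte array stands in for the network.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SerialisationDemo {

    // Convert an object into a stream of bytes that could be sent over a network.
    static byte[] toStream(Object object) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
        }
        return bytes.toByteArray();
    }

    // Recreate whatever object the stream describes.
    static Object fromStream(byte[] stream) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(stream))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Integer> numbers = new ArrayList<>(Arrays.asList(4, 5, 6));
        byte[] stream = toStream(numbers);  // what the sending computer does
        Object copy = fromStream(stream);   // what the receiving computer does
        System.out.println(copy);           // prints [4, 5, 6]
    }
}
```

Notice that fromStream returns a plain Object: the receiving side has no say in what comes out.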

When a computer receives a stream from another computer, it doesn’t know what’s inside the stream until it converts it back into objects. The program might be expecting a list of numbers, but perhaps the stream really contains a list of Belgian beer names. This is why every good programmer knows that you have to validate your input: check that the objects coming out of the stream really are lists of numbers.

Fundamental to how this bug works is that most programmers assume it’s safe to take a stream from an untrusted third party, turn it into objects, and then have a look at those objects to see if the stream contained the kind of data they were expecting. They’re wrong: by the time they get to look, the objects have already been created.
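
A rough sketch of why checking afterwards is too late. The class Surprise below is invented for this example; it stands in for any class on the receiving side that runs some code of its own while it is being rebuilt (Java lets a class do this with a private readObject method). Here the “damage” is just flipping a flag.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.List;

class ValidateTooLateDemo {

    // Hypothetical class that runs code while it is being deserialised.
    static class Surprise implements Serializable {
        private static final long serialVersionUID = 1L;
        static boolean codeRan = false;

        // Java calls this method automatically during deserialisation.
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            codeRan = true; // harmless stand-in for something nastier
        }
    }

    static byte[] toStream(Object object) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
        }
        return bytes.toByteArray();
    }

    // The common but unsafe pattern: rebuild first, check afterwards.
    static List<?> receiveList(byte[] stream) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(stream))) {
            Object received = in.readObject();  // the object is created here
            if (!(received instanceof List)) {  // the check only happens here
                throw new IllegalArgumentException("expected a list");
            }
            return (List<?>) received;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] stream = toStream(new Surprise());
        try {
            receiveList(stream);
        } catch (IllegalArgumentException rejected) {
            System.out.println("Rejected: " + rejected.getMessage());
        }
        System.out.println("Code ran before the check: " + Surprise.codeRan);
    }
}
```

The receiver correctly rejects the object for not being a list, yet the second line printed still reports true.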

Feature 2: Commons Collections. This is a library from the Apache project that contains useful snippets of code for working with lists and other collections of objects in computer memory. It’s very widely used because it’s free, and saves time when developing applications. One of its more rarely used features is called InvokerTransformer. This is an object that holds a description of what to do to an item: the name of a method to invoke and the values to pass to it. Apply it to each item in a list and you get a new, transformed list. For example, you create an InvokerTransformer with the command “invoke the ‘divideBy’ method with the value ‘2’”, apply it to the list [10, 11, 12], and the result will be the list [5, 5.5, 6].
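
To give a feel for the idea without pulling in the library, here is a plain-Java imitation. MethodInvoker and transformAll are invented names, not part of Commons Collections; the sketch only mirrors the general shape of InvokerTransformer (a method name, the parameter types, and the arguments) and uses Java reflection to do the invoking. Java’s BigDecimal has no “divideBy” method, so the example assumes its real “divide” method instead.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MethodInvokerDemo {

    // Hypothetical stand-in for InvokerTransformer: it stores the name of a
    // method plus its arguments, and invokes that method on whatever it is given.
    static class MethodInvoker {
        private final String methodName;
        private final Class<?>[] parameterTypes;
        private final Object[] arguments;

        MethodInvoker(String methodName, Class<?>[] parameterTypes, Object[] arguments) {
            this.methodName = methodName;
            this.parameterTypes = parameterTypes;
            this.arguments = arguments;
        }

        // Look the method up by name at run time and call it on the input.
        Object transform(Object input) throws ReflectiveOperationException {
            return input.getClass()
                        .getMethod(methodName, parameterTypes)
                        .invoke(input, arguments);
        }
    }

    // Apply the invoker to every item and collect the results in a new list.
    static List<Object> transformAll(List<?> items, MethodInvoker invoker)
            throws ReflectiveOperationException {
        List<Object> result = new ArrayList<>();
        for (Object item : items) {
            result.add(invoker.transform(item));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        List<BigDecimal> numbers = Arrays.asList(
                new BigDecimal("10"), new BigDecimal("11"), new BigDecimal("12"));
        MethodInvoker halve = new MethodInvoker(
                "divide", new Class<?>[] { BigDecimal.class }, new Object[] { new BigDecimal("2") });
        System.out.println(transformAll(numbers, halve)); // prints [5, 5.5, 6]
    }
}
```

The important detail is that the method name is just a piece of text. Nothing in MethodInvoker knows or cares whether that text says “divide” or something far less friendly.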

The creators of Commons Collections designed InvokerTransformer to be a general purpose tool that can do anything at all to a list. The example above divides every number in a list by 2. But there’s nothing to stop you from creating an InvokerTransformer that says “for each number in the list, delete every file on the computer hard drive”. In other words, this feature allows you to make a dangerous object: a bit of data that, simply by existing in a computer program, deletes every file on your computer. The authors of InvokerTransformer figured that while you could do this, you wouldn’t do it because it would be really, really dumb. Of course, this assumes that you are in sole control of what objects get created in your own program.

How the bug works:

  1. Java serialisation is a general purpose feature that allows other computers to create any kind of object in your program.
  2. Commons Collections enables certain kinds of objects that can do damage to your system simply by existing in your program.
  3. A lot of programmers making popular software haven’t put 2 and 2 together and have ended up shipping vulnerable software.
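
The three steps above can be sketched in a few lines of harmless Java. Everything here is an assumption made for illustration: Gadget is an invented class, not the real attack, which according to the Foxglove report chains together classes that are already present on the receiving side. The sketch only shows the principle that the sender, not the receiver, picks which method gets run.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class GadgetDemo {

    // Records what ran, so the effect is visible without doing any harm.
    static final List<String> log = new ArrayList<>();

    // Hypothetical "gadget": a serialisable object that carries a method name
    // chosen by the sender and invokes it while it is being deserialised.
    static class Gadget implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String target;
        private final String methodName;

        Gadget(String target, String methodName) {
            this.target = target;
            this.methodName = methodName;
        }

        // Java calls this automatically when the object is rebuilt from a stream.
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            try {
                Object result = target.getClass().getMethod(methodName).invoke(target);
                log.add(methodName + " -> " + result);
            } catch (ReflectiveOperationException e) {
                throw new IOException(e);
            }
        }
    }

    static byte[] toStream(Object object) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
        }
        return bytes.toByteArray();
    }

    static Object fromStream(byte[] stream) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(stream))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Sender: decides which method the receiver will end up running.
        byte[] stream = toStream(new Gadget("hello", "toUpperCase"));
        // Receiver: only asks for "an object", and the method runs anyway.
        fromStream(stream);
        System.out.println(log); // prints [toUpperCase -> HELLO]
    }
}
```

Swap “toUpperCase” for a method that does something destructive and you have the outline of the real problem.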

The result is that right now there are a lot of programs out there written in Java, big name programs used by Fortune 500 companies, that will do anything you want if you send them a carefully crafted message containing a serialised InvokerTransformer object. Like email you a copy of their customer database. Or change the last name of every contact in their corporate directory to “Sidebottom”.

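What software is vulnerable: any software that takes serialised objects over the network from someone it shouldn’t trust, deserialises them, and has Commons Collections installed alongside it. That describes most enterprise-oriented Java software.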

How to fix it: if you’re the developer of an app, you should use look-ahead deserialisation, which checks what kind of object the stream is about to create and refuses anything the app isn’t expecting, before the object is created. That stops this problem, provided the list of expected kinds is kept tight. But most popular software seems not to be using this technique, so there are a lot of vulnerable systems out there right now. If you’re the user of a vulnerable app and the developers haven’t released a fixed version, the Foxglove report describes a process for hacking the app to remove InvokerTransformer. It’s messy and error prone, but at least it works.
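
For developers, a minimal sketch of look-ahead deserialisation might look like the following. It overrides resolveClass, the method ObjectInputStream calls to look up each class named in the stream, and refuses any class that isn’t on an allow-list. LookAheadObjectInputStream, readAllowed and the contents of the allow-list are assumptions made for this example; a real app would need its own list.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Set;

class LookAheadDemo {

    // Hypothetical stream that checks each class name before creating the object.
    static class LookAheadObjectInputStream extends ObjectInputStream {
        private final Set<String> allowedClassNames;

        LookAheadObjectInputStream(InputStream in, Set<String> allowedClassNames) throws IOException {
            super(in);
            this.allowedClassNames = allowedClassNames;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc)
                throws IOException, ClassNotFoundException {
            if (!allowedClassNames.contains(desc.getName())) {
                throw new InvalidClassException(desc.getName(), "class is not on the allow-list");
            }
            return super.resolveClass(desc);
        }
    }

    static byte[] toStream(Object object) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
        }
        return bytes.toByteArray();
    }

    static Object readAllowed(byte[] stream, Set<String> allowed)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                     new LookAheadObjectInputStream(new ByteArrayInputStream(stream), allowed)) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // A list of numbers involves ArrayList, Integer, and Integer's parent class Number.
        Set<String> allowed = new HashSet<>(Arrays.asList(
                "java.util.ArrayList", "java.lang.Integer", "java.lang.Number"));

        byte[] expected = toStream(new ArrayList<>(Arrays.asList(4, 5, 6)));
        System.out.println("Accepted: " + readAllowed(expected, allowed));

        byte[] unexpected = toStream(new HashMap<String, String>());
        try {
            readAllowed(unexpected, allowed);
        } catch (InvalidClassException refused) {
            System.out.println("Refused: " + refused.classname);
        }
    }
}
```

The list of numbers comes through untouched, while the HashMap is turned away before it is ever created. An allow-list like this is only as good as what’s on it, so it’s best kept as short as the app can manage.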