Saturday, January 26, 2013

Is Functional Faster in Java?

I've been playing with functional programming paradigms in Java and have noticed the code seems to be slower. Being curious, I thought I'd benchmark it.

To do this, I'll be using the TotallyLazy Java library to make functional programming easier and the Caliper library to do my benchmarking and make pretty pictures.

Caveat: These examples are contrived but then I'd be interested in seeing examples that were not. And my apologies to the functional community if the functional code I wrote is bad.

First, the normal 'imperative' Java code.

Here, we sum up all the numbers to a maximum:

    private final ToRun sum = new ToRun() {
        @Override
        public long run() {
            int total = 0;
            for (int i = 0; i <= SUM_OVER; i++) {
                total += i;
            }
            return total;
        }
    };


And here, we add up all the numbers in an array, adding 1 each step:


    private final ToRun mapReduce = new ToRun() {
        @Override
        public long run() {
            long total = 0;
            for (int i = 0; i < BIG_ARRAY.length; i++) {
                total += BIG_ARRAY[i] + 1;
            }
            return total;
        }
    };


The equivalent functional code looks like this for summing up to a number:



    private final Sequence      rangeSequence   = Numbers.range(0, SUM_OVER);

    private final ToRun range = new ToRun() {
        public long run() {
            Number reduce = rangeSequence. reduce(sum);
            return reduce.intValue();
        }
    };


    private final Callable2 sum = new Callable2() {
        public Number call(Number a, Number b) throws Exception {
            return a.intValue() + b.intValue();
        }
    };




And this for summing and adding 1 over an array :



    private final Sequence        bigSequence     = Sequences.sequence(BIG_ARRAY);


    private final ToRun mapReduce = new ToRun() {
        public long run() {
            Sequence map = bigSequence.map(add1Callable);
            Number reduce = map.reduce(sum);
            return reduce.longValue();
        }
    };



    private final Callable1 add1Callable = new Callable1() {
        @Override
        public Long call(Long input) throws Exception {
            return input == null ? 0 : input.longValue() + 1;
        }
    };

The ToRun class looks like this plus some code to stop our loop being optimized away :


public class Runner {

public static interface ToRun {
long run();
}

public int run(int reps, ToRun toRun) {
int doNotOptimizeAway = 0;
for (int i = 0 ; i < reps ; i++) {
doNotOptimizeAway |= (toRun.run() + i);
}
return doNotOptimizeAway;
}
}


Caliper gives results that look like this:





Where the old-fashioned Imperative way is quicker by one to two orders of magnitude.


Autoboxing

This shouldn't be too surprising as the functional library makes liberal use of autoboxing. This is very slow.

"Changing the declaration of sum from Long to long reduces the runtime from 43 seconds to 6.8 seconds on my machine. The lesson is clear: prefer primitives to boxed primitives, and watch out for unintentional autoboxing." - Joshua Bloch, Effective Java, p23.

"The programmer ignored the distinction between primitives and boxed primitives and suffered the consequences ... severe performance problems." - ibid p223

"With Sun's current compiler, measurements show that this version [lots of autoboxing] is about 60% slower than the original" - Java Generics and Collections, Maurice Naftalin, Philip Wadler.

Even the creator of the functional language, Martin Odersky says:


"If you have tight loops over arrays of primitive values, nothing beats an imperative loop."
-StackOverflow



Multithreading to the rescue?

Naively, you might think that multi-threading would help us. After all, the ability to easily parallelize code is often one of its selling points.

[Note: what I am about to do can equally be a bad idea in the imperative world.]

Multi-threading our workload is easy in TotallyLazy and looks like this:


    private final ToRun mapReduceConcurrently = new ToRun() {
        public long run() {
            Sequence map = bigSequence.mapConcurrently(add1Callable, executor);
            Number reduce = map.reduce(sum);
            return reduce.longValue();
        }
    };
    
    //
    // Threads
    //

    private ThreadFactory threadFactory = new ThreadFactory() {
        final AtomicInteger threadId = new AtomicInteger();

        @Override
        public Thread newThread(Runnable target) {
            Thread thread = new Thread(target, "PH: thread "
                    + threadId.addAndGet(1));
            thread.setDaemon(true);
            return thread;
        }
    };

    private final ExecutorService executor = Executors.newFixedThreadPool(16,
            threadFactory);


But to no avail! The results look like this:



A billion times slower = wow!

The htop command shows the CPU usage looks like this during the multi-threaded test:


When a quiescent machine looks like this:


And here is the problem:

"When parallelizing... it is generally impractical to assign a separate thread to each element ... this would require too many threads, and the overhead of coordinating them would dwarf the computation. Instead,it makes sense to partition the problem into a number of subparts, let each thread solve a subpart, and then merge the results" - Java Concurrency in Practice, Goetz et al, p101.

I emphasise again that this mistake can be made in the imperative world. I am just saying that somebody new to functional programmer might naively use the ability to easily parallelize their code and shoot themselves in the foot.



No comments:

Post a Comment