Tuesday, August 23, 2011

Optimising Collection Access

I had an interview question recently that asked me to write code to access as efficiently as possible a collection while multiple threads intensively read and write to it.

Giving the caveat that everything should be subject to stress tests anyway, I went ahead and used a java.util.ConcurrentHashMap. I knew this is a very efficient piece of code and I wouldn't have to worry about synchronization.

When I got home, I wondered about my answer and started to put together JMeter tests to establish some empirical data. JMeter provides a lot of the plumbing for stress tests that makes it superior to just writing your own class.

[Incidentally, if you do write your own executable classes for stress tests, don't forget to pause for 4 seconds before running your code. Any object created in the first 4s of the JVM's life is not eligible for an optimization called Biased Locking (see this email from a Sun engineer). Biased Locking is an optimization for the use case where a lock is mostly sought by just one thread. In this case, the favoured thread finds attaining the lock is cheap.]

First, if you wish to plug your code into JMeter, it's helpful to have a superclass that implements the JMeter interface:
package com.henryp.stress;


import org.apache.jmeter.protocol.java.sampler.JavaSamplerClient;
import org.apache.jmeter.protocol.java.sampler.JavaSamplerContext;
import org.apache.jmeter.samplers.SampleResult;

public abstract class AbstractSamplerClient implements JavaSamplerClient {

public SampleResult runTest(JavaSamplerContext javaSamplerContext) {
SampleResult result = new SampleResult();
result.sampleStart();

doSample();

result.sampleEnd();
result.setSuccessful(true);
result.setResponseCodeOK();
result.setResponseMessageOK();

return result;
}

protected abstract void doSample();

}
Then, subclass it. For my ConcurrentHashMap, it looks something like this:
package com.henryp.stress;


import java.util.Map;

import org.apache.jmeter.config.Arguments;

public abstract class AbstractMapSamplerClient extends AbstractSamplerClient {

protected static Map concurrentHashMap;
protected final MapPopulator mapPopulator = new MapPopulator();

@Override
public Arguments getDefaultParameters() {
return mapPopulator.getDefaultParameters();
}

@Override
protected void doSample() {
for (Map.Entry entry :
concurrentHashMap.entrySet()) {
// NoOp while I think of something to do here
}
}

@Override
public synchronized void setupTest(JavaSamplerContext javaSamplerContext) {
if (concurrentHashMap == null) {
System.out.println("setupTest: " + javaSamplerContext);

concurrentHashMap = mapPopulator.makePopulatedMap(javaSamplerContext);
}
}

@Override
public void teardownTest(JavaSamplerContext javaSamplerContext) {
concurrentHashMap = null;
}
}
The MapPopulator is just a utility that instantiates a Map of a user-defined type and populates it with a user-defined number of elements. Setting the default parameters in that class looks something like:
.

.
.
protected static final int DEFAULT_COLLECTION_SIZE = 100;
protected static final String COLLECTION_SIZE_KEY = "COLLECTION_SIZE_KEY";

public Arguments getDefaultParameters() {
System.out.println("getDefaultParameters");

Arguments arguments = new Arguments();

Argument argument = new Argument();
argument.setName(COLLECTION_SIZE_KEY);
argument.setValue("" + DEFAULT_COLLECTION_SIZE);
arguments.addArgument(argument );

return arguments;
}
.
.
.

To make JMeter pick up my classes, I put this in $JMETER_HOME/bin/jmeter.properties:

search_paths=/Users/henryp/Documents/workspace/TestMaven/target/TestMaven-0.0.1-SNAPSHOT.jar

This way, I only had to execute Maven to compile my code and rebuild the JAR. It would be nice to hot-deploy the code but I don't know if JMeter supports this. So, instead I have to restart JMeter each time I change the code :-(

My code that iterates over the entry set of a ConcurrentHashMap was producing a throughput of about 25 000/s when using 100 read-threads while another 10 write-threads added an element to the collection with a constant throughput of 100/s (I've not shown the code for this).

Not bad. But could it be improved easily? Let's try using a java.util.List (perhaps surprisingly, ArrayList and LinkedList gave me similar figures) and use read/write locks to synchronize access. So, the read-code looks something like this:
 static ReadWriteLock readWriteLock;


@Override
protected void doSample() {
Lock readLock = readWriteLock.readLock();
try {
readLock.lock();
for (String aValue : list) {
// no-op
}
} finally {
readLock.unlock();
}
}

and the write-code looks something like this:
 private final AtomicInteger callId = new AtomicInteger();

.
.
.
@Override
protected void doSample() {
Lock writeLock = readWriteLock.writeLock();
try {
writeLock.lock();
String aValue = this.toString() + callId.getAndIncrement();
SingleListReaderWithRWLocksSamplerClient.list.add(aValue);
} finally {
writeLock.unlock();
}
}

The subsequent JMeter test (100 read thread, 10 write threads with a throughput of 100/s) indicated that this was about twice as fast as using ConcurrentHashMap.

What's more, it seemed to suffer a far smaller degradation in performance when I increased the throughput of write-threads to 1000/s.

Conclusion: they're both pretty nifty and perform at the same order of magnitude. Your mileage may vary but it seems java.util.concurrent.locks.ReadWriteLock is a more efficient approach.

No comments:

Post a Comment