Tuesday, September 30, 2014

Constructors and Assignment in Different Languages


If you focus on Java all the time it's easy to forget that something as simple as constructors and assignments follow very different paths in other languages.

Scala

Let's say we have a simple Scala class that takes no arguments in construction:

class ClassWithNoArgs {
  println("ClassWithNoArgs constructed")

  def fn() = { println("ClassWithNoArgs.fn") }
}

Now, let's instantiate it, assign it and use it:

    def classWithNoArgs = new ClassWithNoArgs() 
    println("classWithNoArgs assigned. About to use...")
    classWithNoArgs.fn()                        

So, a Java programmer might be surprised to see this output:

classWithNoArgs assigned. About to use...
ClassWithNoArgs constructed
ClassWithNoArgs.fn

That is, we make an assignment but not until we use the reference does instantiation take place!

This is entirely down to us using the def keyword rather than val. If the above code had been:

    val classWithNoArgs = new ClassWithNoArgs()
.
.

Then the output would have been more familiar to a Java programmer.

Similarly, if we had a function that had side effects then returned 1, such as:

    def fn() = {
      println("calling something")
      1
    }

and two similar functions:

  def callByValue(x: Int) = {
    println("x1=" + x)
    println("x2=" + x)
  }

  def callByName(x: => Int) = {
    println("x1=" + x)
    println("x2=" + x)
  }

we could pass fn to both functions but in a call to the first, fn will be evaluated before the call and in the second, it will be evaluated twice - once upon each call to println.

Consequently, since both functions can take fn, we cannot overload functions with these two different argument lists. For instance, we could not have in the same namespace functions overloaded(x: Int) and overloaded(x: => Int).

(I stole this code from this excellent clarification)

C++

Let's take two very simple classes, one extending the other. First the base class that we will imaginatively call BaseClass:

BaseClass.h

#include <iostream>

using namespace std;

class BaseClass {
public:
    BaseClass();
    BaseClass(string aString);
    BaseClass(const BaseClass& orig);
    virtual ~BaseClass();


    const string aString;

private:

};

BaseClass.cpp

#include "BaseClass.h"

BaseClass::BaseClass() {
    cout << "BaseClass::BaseClass()\n";
}

BaseClass::BaseClass(string aString) : aString(aString) {
    cout << "BaseClass::BaseClass(string)\n";
}

BaseClass::BaseClass(const BaseClass& orig) {
    cout << "BaseClass::BaseClass(const BaseClass& orig)\n";
}

BaseClass::~BaseClass() {
    cout << "BaseClass::~BaseClass()\n";
}

And now the sub class that we will call with equal imagination, SubClass:

SubClass.h

#include <iostream>
#include "BaseClass.h"

using namespace std;

class SubClass : BaseClass {
public:
    SubClass();
    SubClass(string aString);
    SubClass(const SubClass& orig);
    virtual ~SubClass();
private:

};

SubClass.cpp

#include "SubClass.h"

SubClass::SubClass() {
    cout << "SubClass::SubClass()" << endl;
}

SubClass::SubClass(string aString) : BaseClass(aString) {
    cout << "SubClass::SubClass(string aString)" << endl;
}

SubClass::SubClass(const SubClass& orig) {
    cout << "SubClass::SubClass(const SubClass& orig)" << endl;
}

SubClass::~SubClass() {
    cout << "SubClass::~SubClass()" << endl;
}

(My, isn't C++ verbose when you compare it to Scala?)

When we point to objects in C++, things are very similar to Java. But when we use code like this:

    int main(int argc, char** argv) {
.
.
    SubClass subclass; // instantiates a SubClass
    SubClass otherSubclass = subclass;.
.
.
    }
The output is very different.

BaseClass::BaseClass()
SubClass::SubClass()
BaseClass::BaseClass()
SubClass::SubClass(const SubClass& orig)
.
.
SubClass::~SubClass()
BaseClass::~BaseClass()
SubClass::~SubClass()
BaseClass::~BaseClass()

The last 4 lines are the destructors being called when we leave the method. But what of the first 4 lines?

The first two lines are fair enough. We're instantiating the SubClass via C++'s syntax for putting it on the stack. Just like Java, C++ put's an implicit call to super in the constructor for us automatically.

The third and fourth are the super constructor (again) and the copy constructor that assigns subclass to otherSubclass. This is very much unlike Java where there would be two references to the same object but here we have two different objects. We can prove this by writing a CppUnit test that looks like this:

    SubClass subClass;
    SubClass otherSubClass = subClass;

    CPPUNIT_ASSERT(&subClass != &otherSubClass);

Implicitly, C++ is passing otherSubclass to the copy constructor and it's the job of the developer to populate his new object. 

Now, let's instantiate another SubClass but this time with an argument:

    string      aString         = "a string";
    SubClass    subClassWArg    = SubClass(aString);
    CPPUNIT_ASSERT(aString == subClassWArg.aString); // must have Subclass : public BaseClass to access this

and use the copy-constructor again:

    SubClass    otherWArg       = subClassWArg;
    CPPUNIT_ASSERT(otherWArg.aString == subClassWArg.aString);

Hmm, this fails. So, let's change the copy constructor thus:

SubClass::SubClass(const SubClass& orig) : BaseClass(orig.aString) {
.
.

And that passes.

Note, you usually do assignments in C++ with:

    this->aString = orig.aString;

but in our code, we made aString a constant so this would cause a compilation error. Note, by making it constant, we must write our own copy-constructor and not use the compiler implicitly generates for us (see here for why). This demonstrates the difference between initialization (the former) and assignment (the latter).


Java

I won't spend too long on this as most Java programmers should know the order of instantiation and assignment but suffice to say Java assignment/instantiation follows the order of C++, behaves like C++ pointer assignment (not object assignment) and is like Scala's use of the val keyword using call-by-value semantics.

Wednesday, September 17, 2014

Scala Crib Sheet #3

Scala for the busy Java programmer:

Type Aliasing

In Akka, you'll see this in the Actor object:

type Receive = PartialFunction[Any, Unit]

What this means is that where ever we see Receive we mean PartialFunction[Any, Unit]. This is type-aliasing much like in C where, for example, you see:

   typedef int jint;

This is from the OpenJDK's jni_x86.h where jint is defined as an int in Linux. 

Obviously, running in the JVM, Scala does not need to define the type of something like jint according to architecture. But it's more than a convenience. Scala has something called type members whose effect is a little like parameterized types but promises not to bloat as quickly as generics can.

"In Scala, the inner class is addressed using the expression Outer#Inner instead of Java's Outer.Inner. The '.' syntax is reserved for objects" [1]

Let's use a slightly modified version of Odersky's example [1] to illustrate this:

abstract class Food

class Grass extends Food {
  override def toString() = "grass"
}

class DogFood extends Food {
  override def toString() = "dog food"
}

abstract class Animal {
  type SuitableFood

  def eat(food : SuitableFood) = { 
    println(this.toString + " eats " + food.toString)
  }
}

class Dog extends Animal {
  type SuitableFood  = DogFood

  override def toString() = "dog"
}

class Cow extends Animal {
  type SuitableFood  = Grass

  override def toString() = "cow"
}

object Animals {

  def main(args : Array[String]) {
    val dog                     = new Dog
    val cow                     = new Cow
    val dogFood                 = new DogFood
    val grass                   = new Grass

    dog.eat(dogFood)
    cow.eat(grass)
  }

  def doEat(animal : Animal, food : Animal#SuitableFood) {
    /* This fails to compile with:

 found   : food.type (with underlying type com.phenry.scala.Animal#SuitableFood)
 required: animal.SuitableFood 

    */

    animal.eat(food) // <-- does not compile!
  }
}

The same effect can be achieved with Java generics but that can be subverted via erasure. This appears more solid. More information can be found here.


Emptiness

There are a few ways to represent nothingness in Scala and a good description lives here and here. The main points are that null is just like Java, Nil is an empty List and Unit is just like Java's void.

This last one is interesting. There is only one way to have an instance of Unit and it's represented as (). It's an actual reference (unlike java.lang.Void which cannot be instantiated due to its private constructor) and can be used like this:

  def takesVoidToString(arg: () => String) : Unit = { 
    println("called with " + arg()) 
  }

  def main(arg : Array[String]) = {
    def fnTakesVoidReturnsString() = "fnTakesVoidReturnsString"
    takesVoidToString(fnTakesVoidReturnsString)
.
.

which in this case prints out:

called with fnTakesVoidReturnsString

Here our function takesVoidToString takes another function that in turn takes no arguments and just returns a String. By calling arg() we implicitly call fnTakesVoidReturnsString's apply method. Without the (), we'd see arg's toString method called and see that it is a Function0.

A note on syntax can be found here.


Partial Application of Functions in Scala

The syntax for a function that can be partially applied looks like this:

  def canBePartiallyApplied(count : Int)(f : Int => Int) { // this compiles but we'd like syntactic sugar: (f : Function1[Int, Int])
    for (i <- 1 to count) {
      println(f(i))
    }
  }

And we call it thus:

    def g(x : Int) = x + 1
    def partiallyApplied = app.canBePartiallyApplied(3)(_)
    partiallyApplied(g)

Note the underscore to indicate that we'll be filling that in later. This is also an example of currying.


Scala Hierarchy

Absolutely everything extends Any.

The Scala equivalents of primitives in Java extends AnyVal. All other types extend AnyRef.

Null is the base class for all  AnyRef.

Nothing is the base class for Null and all AnyVal.


Terminology

"Anonymous functions in source code are called function literals" [1].



Monday, September 15, 2014

Scala Crib Sheet #2

More Scala for Java programmers:

Case statements as functions

In Akka's sample.hello.HelloWorld:

  def receive = {
    // when the greeter is done, stop this actor and with it the application
    case Greeter.Done => context.stop(self)
  }

"Case sequences as partial functions: A sequence of cases (i.e. alternatives) in curly braces can be used anywhere a function literal can be used. Essentially, a case sequence is a function literal, only more general." [1]

Partial Functions

Not to be confused with partially applied functions, this borrows from the mathematical concept of a partial function where not all values in the domain map to other values (this contrasts with a total function). An example of a partial function in mathematics is the log function over all real numbers. Clearly, negative real numbers do not map to any real numbers.

Following on from the idea above of case statements as more generalized functions, then a case statement that does not cover every eventuality is a partial function. The Scala compiler will issue a warning but will not abort compilation of this (although you might get runtime errors).

Taking this example in the Scala docs:

    val isEven: PartialFunction[Int, String] = {
        case x if x % 2 == 0 => { println(x) ; x+" is even" }
    }

we can see that this is indeed a partial function as it doesn't return anything in the event x is odd.

All the other nice functionality that Scala gives you for this class is added by the compiler. For instance, I didn't implement the isDefinedAt function but the compiler did it for me:

$ javap -c ./target/scala-2.10/classes/com/phenry/scala/MyScala\$\$anonfun\$1.class
.
.
  public final boolean isDefinedAt(int);
    Code:
       0: iload_1       
       1: istore_2      
       2: iload_2       
       3: lookupswitch  { // 0
               default: 12
          }
      12: iload_2       
      13: iconst_2      
      14: irem          
      15: iconst_0      
      16: if_icmpne     23
      19: iconst_1      
      20: goto          24
      23: iconst_1      
      24: ireturn 
.
.

which when JADed, becomes:

    public final boolean isDefinedAt(int x1)
    {
        int i = x1;
        boolean flag;
        if(i % 2 == 0)
            flag = true;
        else
            flag = false;
        return flag;
    }


Type parameter hints

In Akka's sample.hello.HelloWorld:

    // create the greeter actor
    val greeter = context.actorOf(Props[Greeter], "greeter")

which calls:

  /**
   * Scala API: Returns a Props that has default values except for "creator" which will be a function that creates an instance
   * of the supplied type using the default constructor.
   */
  def apply[T <: Actor: ClassTag](): Props = apply(defaultDeploy, implicitly[ClassTag[T]].runtimeClass, List.empty)


Unlike Java, Scala can instantiate objects from type parameters. "What's required here is that you help the compiler by providing a runtime hint of what the actual type parameter is ... This means following the type with a a colon and the class name ClassManifest."  [1]. Since Programming in Scala was published, this ClassManifest has been deprecated in favour of ClassTag which is what we see here. It carries the type information that was erased at compile time.

This adding of an implicit parameter is called a context bound.

"Implicits in subclasses and subobjects take precedence over implicits in base classes." [1]

[1] Programming in Scala, Odersky.

Tuesday, September 9, 2014

SSH: Performance and Behaviour


The problem

We had set up a Jetty server to use SSL much as in a previous post. This was fine until we started pooling connections (pooling connections didn't greatly improve performance in London-to-London communication but improved Hong Kong-to-London performance by about 30% since the ping time for a packet was a huge 220ms). But when we pooled connections and introduced SSL, all communication froze after some happy-path results.

The problem did not appear to be with encryption itself as the first few requests succeeded. But after 30 hits, all the Jetty threads were blocked like this (from running jstack):

"qtp401625763-13" #13 prio=5 os_prio=0 tid=0x00007f3e2021b800 nid=0x1ecb runnable [0x00007f3e0d2e7000]
   java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:150)
at java.net.SocketInputStream.read(SocketInputStream.java:121)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.read(InputRecord.java:503)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:954)
- locked <0x00000000d90404a8> (a java.lang.Object)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:911)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
- locked <0x00000000d906e358> (a sun.security.ssl.AppInputStream)
at org.eclipse.jetty.io.ByteArrayBuffer.readFrom(ByteArrayBuffer.java:391)

(where we're using Jetty 7.6.15)

At first, we thought something was wrong with our certificates etc but a handful of requests at first went thorugh without issue. But look for this in the server-side logs when you have given the JVM argument -Djavax.net.debug=all

*** ServerHelloDone

on both the client and server side, then you know the server completed the handshake OK. Look for:

main, READ: TLSv1.2 Change Cipher Spec, length = 1

or, in Jetty: 

qtp1418621776-17, WRITE: TLSv1.2 Change Cipher Spec, length = 1

and you'll know the whole handshake pretty much completed OK.

What happens in SSL?

Asymmetric encryption used briefly at start-up to establish a more efficient cipher. The private keys in this exchange are ephemeral and generated on the server side as soon as it hears the client say "hello" and on the client side as soon as it receives the server's choice of key.

Syn

In kickstarting the SSL handshake, the client advertises all of its cipher suite codes, elliptic curve details etc to the server (see sun.security.ssl.HandshakeMessage$ClientHello.send(..) ). The client thread then blocks waiting for the server to respond.

Syn/Ack

Upon receiving the client's message, (see sun.security.ssl.ServerHandshaker.clientHello(..) ) the server chooses an algorithm that both client and server support. When I was stepping through the code, this appeared to be DSA. The server-side must have a public and private key that corresponds to this algorithm in its keystore (see ServerHandshaker.setupPrivateKeyAndChain(..)).

Here, a sun.security.ssl.DHCrypt is instantiated. From the JavaDocs:

"This class implements the Diffie-Hellman key exchange algorithm.  D-H means combining your private key with your partners public key to generate a number. The peer does the same with its private key and our public key. Through the magic of Diffie-Hellman we both come up with the same number. This number is secret (discounting MITM attacks) and hence called the shared secret."

Along with the server certificates, all of this is sent to the client and the server thread blocks.

Ack

The client then unblocks and deserializes the ServerHello created on the server side (via a bespoke deserialization process). It uses the cipher suite the server told it to use and checks the server's certificates and stores the server's public key. It then sends its own public key (via DHClientKeyExchange) to the server.

Symmetric cipher keys are then generated (Handshaker.calculateConnectionKeys), the client tells the server that it is ready to talk then blocks.

A little more server Ack

Given what the client and server have exchanged, the server now sets its own agreed secret key using the same method in Handshaker (that is the superclass to both ServerHandshaker and ClientHandshaker) and sends a Finished object back to the client. The server is now finished and it notifies an listeners in a separate thread (this listener can be found as an inner class in Jetty's SslConnectorEndPoint.run). The thread in SslConnectorEndPoint then awaits incoming data.

Introducing MAT

There's a very nice tool from the ladies and gentlemen of Eclipse called MAT. Very quickly, you can dump the memory of a JVM and query its contents with a SQL-like language. For instance, I found all the client-side sockets that were listening to my server port of 8192 by executing:

select * from java.net.SocksSocketImpl where port = 8192

Interestingly, all their incoming references originated in the connection pool. So that is what is keeping them open and correspondingly keeping the server threads listening for a request that never comes! All client threads doing something useful are blocked by Jetty not having any server threads to service them; and all those Jetty threads are listening on sockets whose other end is held open by the idle sockets in the client's connection pool.

Conclusion

By replacing Jetty's SslSocketConnector with its SslSelectChannelConnector, the server behaves asynchronously and connection pooling doesn't make it grind to a halt.

We rolled-our own nonce encryption and it performed poorly. When we got rid of it, we found that Jetty using SSL had roughly the same performance as plain, clear-text HTTP.


Further reading

1. A very good (maths) blog on elliptic curves.

Tuesday, August 19, 2014

Low Level Locks


Deep down in the JVM you'll find something like this in Atomic::cmpxchg:

  __asm__ volatile ("lock; 1: cmpxchgl %1,(%3)"
                    : "=a" (exchange_value)
                    : "r" (exchange_value), "a" (compare_value), "r" (dest)
                    : "cc", "memory");
  return exchange_value;

(Note: I've slightly modified this for reasons of clarity to ignore the case where the code might be running on a CPU that is not a multi-processor).

This code is hit often when the JVM tries to lock something. It is one of its many techniques to arbitrate contention (what happens in the event of contention I'll leave to another post).

So, what's it doing? It's basically doing what Java programmers would know from the AtomicXXX classes using low-level assembler. This is how GCC makes explicit assembler code. A good reference is here but briefly:
  • asm (or alternatively __asm__) are keywords that allow the programmer to directly write assembler. Volatile means don't move our code as part of an optimization. It must execute where we placed it.
  • The contents of the parentheses are delimited by colons and their meanings are ASSEMBLY_CODE:OUTPUT:INPUT:CLOBBERED.
  • The CLOBBERED list tells the compiler what we have fiddled with (other than input and output) and that it should bear this mind before and after executing it. In this case, we're telling it we fiddled with the memory and the condition flags ("cc").
  • The INPUT and OUTPUT fields map our C variables to particular CPU registers. In this case, we're saying use any registers for the values input via exchange_value and dest because "r" means "any register". For the input in compare_value and output in exchange_value we're saying explicitly use the EAX register ("a" and "=a" respectively. The equals sign means it is a write-only output.
Why the EAX register is used for input and output is given in Intel's Software Developer's Manual:

"The CMPXCHG instruction requires three operands: a source operand in a register, another source operand in the EAX register, and a destination operand. If the values contained in the destination operand and the EAX register are equal, the destination operand is replaced with the value of the other source operand (the value not in the EAX register). Otherwise, the original value of the destination operand is loaded in the EAX register." [1]

If you put this code in a simple .c file and compile it with:

$ gcc -S YOUR_FILE.c

you'll see the raw assembly output to YOUR_FILE.s confirms the above.

[1] Intel® 64 and IA-32 Architectures Software Developer’s Manual Combined Volumes: 1, 2A, 2B, 2C, 3A, 3B, and 3C


Monday, August 18, 2014

R


R is a great tool for plotting performance graphs. Every developer who worries about the performance profile of his application should learn it. Here are some miscellaneous scripts I've been using for the last few months.

But first...

A Crash Course in R

R is Modular

You might find that the standard installation does not have all the modules you need. Any R GUI will help you download the ones you want. But to use them, you need something like this:

.libPaths("YOUR_MODULE_DIRECTORY")
require("MODULE_NAME")

Think of this as setting the classpath and then using import in your Java files.

Loading the data

Take this example:

timings_data <- read.table("C:/Users/henphi/db_saves.txt")
timings      <- c(timings_data$V15)
times        <- timings_data$V1

The first line reads the raw data that I saved in a file.

The second line takes the 15th column of data (think of it as the awk command in Unix) and puts it into a vector. Like awk, R treats the first column as index 1, not zero. There is a 0th column but this is the line number. We put the  data into a vector using the c() command.

The third lines is just like the second but takes just the 1st ($V1) column. Here, I chose not to use the c() command but that's OK. In both cases it doesn't matter as they are both vectors already. In the paragraph above, I just wanted to introduce the standard way of making vectors in R.

Coercing the Data

We've already mentioned one data structure - vectors that are created using the c() command. Another is a list that can be created using (unsurprisingly) the list() command.

In this example, I am taking the variables defined in the section above and manipulating the types to make something I can plot.

data <- structure (
    list(
        time = structure(
            as.POSIXct(strptime(times, '%H:%M:%S')),
            class = c("POSIXct", "POSIXt"),
            tzone = ""
        ),
        variable = timings
    ),
    .Names    = c("time", "variable"),
    row.names = c(NA, -10L),
    class     = "data.frame"
)

Here, I have made a data structure of a type called a data frame. "Although a data set of 50 employees with 4 variables per worker has the look and feel of a 50x4 matrix, it does not qualify as such in R because it mixes types. Instead of a matrix we use a data frame. A data frame in R is a list with each component of the list being a vector corresponding to a column in our 'matrix' of data... Typically, data frames are created by reading in a data set from a file". [1]

As part of this structure, we have a list that has two entries that go by the arbitrary names of time and variable. Variable is simple, it's just the timings variable that we loaded from a file earlier.

The element of this list, time, is also a list but we have coerced the string representations of our times to a type that represents a date. This is done by the strptime function that converts the string in our data file to a standard type using regexp. Once in the standard format, it is then converted to a Unix time-from-epoch using as.POSIXct.

(Since the data file only has time of day entries in it, R will assume they belong to the day on which the line runs and give it "today's" date. That's OK as I know my data is only within an hour but I don't care which hour it was.)

The .Names element defines the names of the data objects within this structure. They would be used as default in the graphs we generated but we'll override them later.

A real world example

Let's say we want to plot the Garbage Collection times of our app. A couple of command line switches will help.

  • -XX:+PrintGCApplicationStoppedTime which GC guru, Gil Tene, considers "is probably the only number in GC logs that you should sort-of-semi-believe."
  • -XX:+PrintGCDateStamps which makes the logs much easier to deal with than time-since-epoch.
  • -Xloggc:/YOUR/FILE/HERE to make the JVM output log stats in the first place.

I clean this file up just to grab the application pause times:

$ grep "Total time for which application threads were stopped" jstringserver.gc.log > jstringserver_cleaned.gc.log 

such that the slightly cleaned GC logs look like this:

$ more /home/henryp/Documents/Blogs/jstringserver_cleaned.gc.log
2014-08-18T15:17:29.824+0100: 1.096: Total time for which application threads were stopped: 0.0004190 seconds
2014-08-18T15:17:32.895+0100: 4.166: Total time for which application threads were stopped: 0.0001930 seconds
2014-08-18T15:18:40.902+0100: 72.173: Total time for which application threads were stopped: 0.0001650 seconds
2014-08-18T15:18:43.903+0100: 75.174: Total time for which application threads were stopped: 0.0000450 seconds
.
.

Now, my R script looks like this:

gc_data <- read.table("/home/henryp/Documents/Blogs/jstringserver_cleaned.gc.log")

data <- structure (
  list(
    time = structure(
      as.POSIXct(strptime(gc_data$V1, '%Y-%m-%dT%H:%M:%S')),
      class = c("POSIXct", "POSIXt"),
      tzone = ""
    ),
    variable = gc_data$V11
  ),
  .Names    = c("time", "variable"),
  class     = "data.frame"
)

plot(
  data,
  xaxt="n",
  type="h",
  main="GC times for JStringServer",
  ylab="GC Pause Time (ms)",
  xlim=c(as.POSIXct(strptime("15:20", "%H:%M")),
         as.POSIXct(strptime("15:30", "%H:%M")))
)

axis.POSIXct(side=1, at=cut(data$time, "min"), format="%H:%M")

Where I have zoomed-in on a period of time I am interested in, namely from 15:20 to 15:30 and axis.POSIXct just put's the time ticks on the x-axis after we disabled x-axis marking in the plot() command with xaxt="n".

The generated graph looks like:


Note that if you prefer, you can set log="y" as an argument in the plot command and have the y-axis scaled logarithmically.


[1] The Art of R Programming.

Saturday, August 16, 2014

Performance of RSA vs. DES


Choice of encryption algorithm matters. "In software, DES is about 100 times faster than RSA. These numbers may change slightly as technology changes, but RSA will never approach the speed of symmetric algorithms". [1]

DES can be cracked with brute force in hours but for our application that needs to encrypt a nonce to check the integrity of the conversational state, that's good enough security. So, it looked like it was worth investigating a change of algorithm.

"DES is a symmetric algorithm: The same algorithm and key are used for both encryption and decryption" [1]. AS a result, the means of getting a key are slightly different to getting an RSA private/public key pair:

    KeyGenerator keyGenerator   = KeyGenerator.getInstance("DES");
    SecretKey    secretKey      = keyGenerator.generateKey();

But using the cipher is pretty similar:

    Cipher cipher        = Cipher.getInstance("DES/ECB/PKCS5Padding");
    cipher.init(ENCRYPT_MODE, secretKey);
    byte[] encrypted     = cipher.doFinal("hello world".getBytes());

The performance improvements were dramatic, even better than Schneier's estimate. To measure them, I used the Java Microbenchmark Harness. To allow your project to use JMH, you need Maven and a pom.xml that looks a little like this below (my version of Eclipse - Kepler SR2 - auto-generated it for me but my work computer couldn't seem to as it didn't have the relevant Maven archetype so I'll include my pom.xml explicitly here):

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.phenry</groupId>
    <artifactId>microbenchmarks</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <name>Auto-generated JMH benchmark</name>

    <prerequisites>
        <maven>3.0</maven>
    </prerequisites>

    <dependencies>
        <dependency>
            <groupId>org.openjdk.jmh</groupId>
            <artifactId>jmh-core</artifactId>
            <version>${jmh.version}</version>
        </dependency>
        <dependency>
            <groupId>org.openjdk.jmh</groupId>
            <artifactId>jmh-generator-annprocess</artifactId>
            <version>${jmh.version}</version>
            <scope>provided</scope>
        </dependency>
   <dependency>
     <groupId>junit</groupId>
     <artifactId>junit</artifactId>
     <version>4.8.1</version>
     <scope>test</scope>
   </dependency>
    </dependencies>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <jmh.version>0.9.4</jmh.version>
    </properties>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.1</version>
                <configuration>
                    <compilerVersion>1.6</compilerVersion>
                    <source>1.6</source>
                    <target>1.6</target>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.2</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <finalName>benchmarks</finalName>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <mainClass>org.openjdk.jmh.Main</mainClass>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
        <pluginManagement>
            <plugins>
                <plugin>
                    <artifactId>maven-clean-plugin</artifactId>
                    <version>2.5</version>
                </plugin>
                <plugin>
                    <artifactId>maven-deploy-plugin</artifactId>
                    <version>2.8.1</version>
                </plugin>
                <plugin>
                    <artifactId>maven-install-plugin</artifactId>
                    <version>2.5.1</version>
                </plugin>
                <plugin>
                    <artifactId>maven-jar-plugin</artifactId>
                    <version>2.4</version>
                </plugin>
                <plugin>
                    <artifactId>maven-javadoc-plugin</artifactId>
                    <version>2.9.1</version>
                </plugin>
                <plugin>
                    <artifactId>maven-resources-plugin</artifactId>
                    <version>2.6</version>
                </plugin>
                <plugin>
                    <artifactId>maven-site-plugin</artifactId>
                    <version>3.3</version>
                </plugin>
                <plugin>
                    <artifactId>maven-source-plugin</artifactId>
                    <version>2.2.1</version>
                </plugin>
                <plugin>
                    <artifactId>maven-surefire-plugin</artifactId>
                    <version>2.17</version>
                </plugin>
            </plugins>
        </pluginManagement>
    </build>

</project>

Your code now needs a minimum of the @Benchmark annotation like this:

    @Benchmark
    public byte[] decrypt() throws Exception {
        cipher.init(DECRYPT_MODE, secretKey);
        return cipher.doFinal(encrypted);
    }

You don't need to return anything but I did for the sake of the my unit tests.

There are many ways to run your benchmark but the one I quite like looks something like this:

$ mvn clean install && java -jar target/benchmarks.jar ".*CryptoBenchmarks.*" -wi 4 -i 20 -f 1 -bm avgt -tu ns

where wi is the number of warm-up iterations, i is the number of actual iterations for the test, f is the number of times we fork for a particular iteration and tu is time units (here it is nanoseconds). The options can be found in the CommandLineOptions class. The -bm switch is the benchmark mode and options can be found in the Mode class where here we're asking it to measure the average times.

Note: you'll need to add a main method to your classes.

The output for my comparison of RSA and DES looked like this:

Benchmark                                                      Mode   Samples        Score  Score error    Units
c.p.m.RsaCryptoBenchmarks.decryptionUsingPrivateKey            avgt        20  6257593.175   197497.265    ns/op
c.p.m.RsaCryptoBenchmarks.encryptionUsingPublicKey             avgt        20   203438.819     2428.467    ns/op
c.p.m.DesCryptoBenchmarks.decrypt                              avgt        20      672.128       40.558    ns/op
c.p.m.DesCryptoBenchmarks.encrypt                              avgt        20      731.335       57.101    ns/op


So, encryption is roughly 1000 times more expensive in RSA and decryption 10 000.

[1] Applied Cryptography, Bruce Schneier.