Sunday, December 14, 2014

Some useful Scala I learned this week


1. Indexes of a collection

If you want to traverse a list of elements while keeping track of each element's index, you could do this with a fold, but a more elegant way is something like:

    val words = List("the", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog")

    words.zipWithIndex // List((the,0), (quick,1), (brown,2), (fox,3), (jumped,4), (over,5), (the,6), (lazy,7), (dog,8))

2. Choosing only some elements of a collection

Now, let's say we want the tuples of those words that begin with 't'. We do this with the collect function in TraversableLike.

    words.zipWithIndex collect {
        case (word, index) if word.startsWith("t") => (word, index)
    }

(You will note that we pass collect a partial function.) This will return the list:

List((the,0), (the,6))

That is, the 0th and 6th words begin with 't', and they are both 'the'.

3. Counting the elements that meet a criterion

This is very simple:

    words count (_ startsWith("t"))

This will, unsurprisingly, return 2 for the above collection.


What I learned this week is that you could do all of this with folds etc. but Scala (I am told) has methods that do pretty much anything you could want. It's just a matter of learning them.

Scala's way of overriding final Java classes


Or "pimp my library".

I was pair programming with a Scala guru when he saw me creating a function to interrogate a String. He took my code and re-wrote it as something like this:

package com.phenry.scala

object MyImplicits {
  
  implicit class MyString(val string: String) {
    
      def startsWithD: Boolean = string.startsWith("D")
      
  }

}

object MyImplicitsMain {
  
  import com.phenry.scala.MyImplicits._
  
  def main(args: Array[String]) = {
    val aString = "Does this start with D?"
    println(aString.startsWithD) // this returns true
  }
  
}

So, effectively he added a function to the String class for any code that imports this implicit.

Note that implicits can live in objects and packages (as they basically hold what Java programmers would recognise as statics) but not in classes or traits.

One is advised to be careful with implicits, however. "Implicits are very powerful but you should use them with care," said Scala's creator.

Sunday, November 9, 2014

Specialized Scala


I've been playing with the maths library, Breeze (a good introduction is here). It's written in Scala and uses a cunning stunt to avoid boilerplate code.

The problem with writing an efficient library in Java is that autoboxing is horribly inefficient for sufficiently complicated mathematical operations. The solution is typically to make the code less generic and use primitives (as I did in my pet project, jMathematics).

Scala has a neat trick to avoid this. It uses the @specialized annotation. So, one of Breeze's implementations of a matrix looks like this:

final class DenseMatrix[@specialized(Int, Float, Double) V](...
                                                            val data: Array[V],
                                                            ...)

This results in the compiler creating methods that use these 3 primitives as well as the more generic types. This leads to quite a lot of classes:

henryp@phillsdell:~/Code/Scala/Maths/breeze$ find ./target/scala-2.10/classes/ -name DenseMatrix* | wc -l
350

but this Scala code:

  def valueAt(row: Int, col: Int) = apply(row,col)

(where apply ultimately reads from data: Array[V] in the constructor above) is compiled into this:

henryp@phillsdell:~/Code/Scala/Maths/breeze$ for FILE in `find ./target/scala-2.10/classes/ -name DenseMatrix*` ; do { javap "$FILE" | grep "valueAt(int, int" ; } done
  public int valueAt(int, int);
  public java.lang.Object valueAt(int, int);
  public float valueAt(int, int);
  public java.lang.Object valueAt(int, int);
  public double valueAt(int, int);
  public java.lang.Object valueAt(int, int);
  public V valueAt(int, int);

Great, so a generic Object return value plus methods for our three specialized primitives. What could possibly go wrong? Well, careless use can lead to literally megabytes of code being generated. That's why it's so rare in the standard Scala libraries.

However, this is a big win for mathematics library writers.

Saturday, November 8, 2014

More on Java 8 Lambdas


Using the code found in this tutorial on Java 8's new Optional class, I started playing with the syntax of the new lambdas.

First let's instantiate a simple data structure using POJOs:

        Computer computer = new Computer(new Soundcard(new USB("3.0")));

and wrap it in a fairly straightforward object that's new to Java 8's functional packages:

        Optional<Computer> option = Optional.of(computer);
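(The tutorial's POJOs aren't reproduced in this post. If you want to run the snippets, a minimal sketch of what they might look like is below - the constructors and field names are my guesses, not necessarily the tutorial's.)

import java.util.Optional;

class USB {
    private final String version;
    USB(String version) { this.version = version; }
}

class Soundcard {
    private final USB usb;
    Soundcard(USB usb) { this.usb = usb; }
}

class Computer {
    private final Soundcard soundcard;
    Computer(Soundcard soundcard) { this.soundcard = soundcard; }
    public Optional<Soundcard> getSoundcard() { return Optional.ofNullable(soundcard); }
}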

Now comes the more interesting code. Say Computer has a method thus:

    public Optional<Soundcard> getSoundcard() { ... }

we can flatMap like so:

        Optional<Soundcard> flatMapped = option.flatMap(Computer::getSoundcard);

What interesting syntax. 

There are three curious points about this line.


Computer::getSoundcard

We call this a method reference. It's the function we call on all the elements we flatMap over. Its type can be seen by refactoring it out:

        Function<Computer, Optional<Soundcard>> asFunction = Computer::getSoundcard;

Using this newly refactored code, the flatMap line above is equivalent to:

        Optional<Soundcard> flatMapped = option.flatMap(asFunction);


Alternative syntax

We could just as easily have used the lambda expression syntax, and the flatMap would have looked like this:

        Optional<Soundcard> flatMapped = option.flatMap( (Computer aComputer) -> aComputer.getSoundcard() );

or, equivalently, using the more verbose block form:

        Optional<Soundcard> flatMapped = option.flatMap( (Computer aComputer) -> {
            return aComputer.getSoundcard();
        } );

Again, refactoring this expression out into an isolated function shows us its type:

        Function<Computer, Optional<Soundcard>> asFunction = (Computer aComputer) -> aComputer.getSoundcard();


Map versus FlatMap

Note that the flatMap method was used rather than the map method. There is a very good reason for this.

The flatMap method signature looks like this:

    public<U> Optional<U> flatMap(Function<? super T, Optional<U>> mapper)

while the map method looks like this:

    public<U> Optional<U> map(Function<? super T, ? extends U> mapper)

(where T is the type that this Optional contains in both cases).

If we call map on this Optional, look at the return type:

        Optional<Optional<Soundcard>> mapped = option.map((Computer aComputer) -> aComputer.getSoundcard());

An Optional in an Optional is returned. Looking back at the signatures, the reason for this is clear. 

The method flatMap takes a function that returns an Optional<Soundcard> and this is exactly what the flatMap method returns.

But map's method signature says it takes a function that returns some type U, wraps that U in an Optional and returns it. Since the method reference Computer::getSoundcard returns Optional<Soundcard> (because that's what getSoundcard() says it returns), the type U here is Optional<Soundcard>. So map wraps that Optional<Soundcard> in another Optional, and an Optional wrapped in an Optional is returned.


Monads

It may seem odd that we map and flatMap on an Optional. Perhaps you're more used to calling these methods on Collections. But Lists and Sets share properties with Optionals in that they're all monads - something I'm only just getting my head around...
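The same map-versus-flatMap distinction shows up with collections. In Java 8 the collection analogue lives on Stream rather than on List itself; a small sketch (my own example, not from the tutorial):

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

List<List<Integer>> nested = Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3, 4));

// map would give a Stream<List<Integer>> - a container inside a container,
// just like Optional<Optional<Soundcard>> above - whereas flatMap flattens it
List<Integer> flat = nested.stream()
                           .flatMap(List::stream)
                           .collect(Collectors.toList()); // [1, 2, 3, 4]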




Wednesday, October 22, 2014

A Lock for all Weathers?


Continuing with the lock comparison code here, I've modified it to test the timings of all the locks for different permutations of readers and writers and compared the results.

I've tested the different locks with 1 to 16 readers and 1 to 16 writers on a 16 core (with hyper-threading) E5-2687W chip. With the generated matrix, I used R to graph the results (code at the bottom of the post).

Note: duration is measured in milliseconds and is plotted on a log10 scale.

Synchronized blocks

> showPlot("SYNCHRONIZED")



Perhaps surprisingly, with synchronized blocks, more writers mean the target number is reached more quickly. Reader threads (which are also synchronized) only make a huge difference to the overall duration when there are few writers. Then they can make a difference of an order of magnitude.

JDK 8 Adders

> showPlot("ADDER")


As you might expect from the way adders work, the threads don't seem to interfere with each other. The more threads that write, the faster the target number is reached, irrespective of reader threads. But even when they're slow, they're fast (relative to other locks).

Atomics

> showPlot("ATOMIC")



Atomic times are all tightly within an order of magnitude of each other. Like synchronized blocks, when there are fewer writer threads, a smaller number of reader threads makes a difference. However, the effect is by no means as pronounced.


Java Read-Write Locks

> showPlot("RWLOCK")



Read-write locks are also sensitive to the number of reader threads when there are few writer threads. It can make a difference of an order of magnitude. Best and worst times are also similar to those of synchronized blocks.


JDK 8 StampedLocks with an Optimistic Approach

> showPlot("OPTIMISTIC_CORRECT")


I've modified the original code to retry an optimistic lock in the event that it was not attained, which I think makes for a fairer comparison.

This strategy is relatively insensitive to the number of threads when compared to other locking techniques but it's not the fastest strategy.
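My modified class isn't reproduced in this post, but the retry pattern for an optimistic read with JDK 8's StampedLock looks roughly like this (a sketch of the idea, not the actual benchmark code):

import java.util.concurrent.locks.StampedLock;

class OptimisticLongHolder {
    private final StampedLock lock = new StampedLock();
    private long value;

    // Keep retrying the optimistic read until validate() confirms that no
    // writer interfered, rather than ignoring a failed validation.
    long read() {
        long stamp;
        long current;
        do {
            stamp = lock.tryOptimisticRead();
            current = value;
        } while (stamp == 0 || !lock.validate(stamp));
        return current;
    }

    void increment() {
        long stamp = lock.writeLock();
        try {
            value++;
        } finally {
            lock.unlockWrite(stamp);
        }
    }
}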

Volatile

> showPlot("VOLATILE")



NB: this is not to be trusted too much. Adding a value to a volatile field is not an atomic operation in Java (it's a read followed by a separate write), so this metric is somewhat misleading. However, if all you care about is speed, knock yourself out.

Having said that, this technique is fastest when there is only one writer - in this particular case, you'll get the correct result. Fastest of all is one reader, one writer. This is not that much slower than a dirty read/write.
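To spell out why the results can be wrong with more than one writer, here is a minimal sketch (my own example, not the benchmark code) of the lost-update problem with a volatile counter:

class VolatileCounter {
    private volatile long value;

    // value++ is really three steps: read, add one, write back. volatile makes
    // each read and write visible to other threads, but two threads can still
    // read the same value, both add one, and one of the increments is lost.
    void increment() {
        value++;
    }

    long current() {
        return value;
    }
}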

Dirty

> showPlot("DIRTY")


I've just added this for grins. As long as you don't mind dirty data, it's fine.

StampedLock

> showPlot("STAMPED")


StampedLock again, but this time using a pessimistic locking strategy. I've actually got some data missing at a low number of writer threads and a high number of reader threads, as the test decided it was taking too long and aborted! Suffice it to say, it's fast for anything but the combination of a low number of writer threads and a large number of reader threads.


R Scripting

These graphics were generated with R. The code is below for review.

The raw results are a CSV file that looks like:

Lock_type, Number_of_Readers, Number_of_Writers, Duration_in_ms

allLocksMatrix.R:

.libPaths("/home/henryp/R/x86_64-unknown-linux-gnu-library/3.1/")
require("rgl")

timingsData            <- read.csv("/home/henryp/Code/R/JavaLocks/all.csv", header = T)
names                  <- levels(timingsData$name)

toMatrix <- function (counterName) {
  counter.data             <- toRaw(counterName)
  counter.dim              <- sqrt(length(counter.data[,1]))
  counter.matrix           <- matrix(0, nrow=counter.dim, ncol=counter.dim, byrow=T)
  for (i in 1:counter.dim) {
    for (j in 1:counter.dim) {
      counter.matrix[i,j] = log10(counter.data[((i - 1)*counter.dim) + j, 4])
    }
  }
  return (counter.matrix)
}

toRaw <- function(counterName) {
  counter.data             <- subset(timingsData, name == counterName)
  return (counter.data)
}


rotate <- function(x) t(apply(x, 2, rev))

showPlot <- function (counterName) {
  myMatrix <- toMatrix(counterName) # rotate(rotate(rotate(toMatrix(counterName))))
  
  nbcol = 100
  color = rev(rainbow(nbcol, start = 0/6, end = 4/6))
  zcol  = cut(myMatrix, nbcol)
  
  open3d()
  persp3d(seq(1, length(myMatrix[,1])), 
          seq(1, length(myMatrix[1,])), 
          myMatrix, 
          col=color[zcol],
          xlab="",
          ylab="",
          zlab="",
          axes=FALSE
  )

  axes3d(c('x++','z++', 'y++')) 
  box3d()
  mtext3d("Writers",edge="x++",line=3,las=2,cex=1.1)
  mtext3d("Readers",edge="y++",line=3,las=2,cex=1.1)
  mtext3d("duration", edge="z++",line=2,las=2,cex=1.1)

  title3d(counterName, col='red', line=4)
}


Monday, October 20, 2014

New to Java 8: Adders


I was reading this blog and playing with its suite that tests how fast reads and writes to a long are, when I came across a class that is new to JDK 1.8: LongAdder.

(The test suite has a problem: the class that uses optimistic locking with a StampedLock ignores whether it actually attained the lock, but I'll deal with that elsewhere.)

It's quite a cunning idea. If there is contention on the field containing the long value, the thread that fails to attain the lock creates a new memory location (a Striped64.Cell object) and adds its value there. When it comes to seeing the result of many threads mutating the value, the reader thread returns not just the base value but also all the values in the additional memory. Thus contention is minimized.

There is a caveat when it comes to adding up all the additional values. From the JavaDoc:

The returned value is NOT an atomic snapshot; invocation in the absence of concurrent updates returns an accurate result, but concurrent updates that occur while the sum is being calculated might not be incorporated.
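The API itself is simple to use. A minimal sketch (my own example, not the test suite's code):

import java.util.concurrent.atomic.LongAdder;

public class AdderExample {
    public static void main(String[] args) {
        LongAdder adder = new LongAdder();

        // writer threads call increment()/add(); contention is spread over internal cells
        adder.increment();
        adder.add(10);

        // sum() totals the base value and every cell - not an atomic snapshot, per the JavaDoc above
        System.out.println(adder.sum()); // 11
    }
}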

Anyway, I updated the suite mentioned above, adding a modified optimistic locking class that retries in the event of not attaining the lock (OPTIMISTIC_CORRECT in the diagram below). The log output looks like this:

name,threads,duration
ADDER,16,782
ADDER,16,810
ADDER,16,769
ADDER,16,792
ADDER,16,771
ADDER,16,793
ADDER,16,735
ADDER,16,777
ADDER,16,768
ADDER,16,849
ATOMIC,16,514
.
.

and wrote some R code to process it and give me a graphic of all the different techniques:

.libPaths("/home/henryp/R/x86_64-unknown-linux-gnu-library/3.1/")
require("epade")
data            <- read.csv("/home/henryp/Code/R/JavaLocks/firstPastPost1W15R.csv")

groupBy         <- list(data$name, data$threads)
aggregatedMeans <-aggregate(data,
                            by=groupBy,
                            FUN=mean,
                            na.rm=TRUE)
aggregatedSds   <- aggregate(data,
                             by=groupBy,
                             FUN=sd,
                             na.rm=TRUE)

meansAndSds     <- cbind(aggregatedMeans[,c(1,5)],
                         aggregatedSds[,5])

ordered         <- meansAndSds[order(meansAndSds[,2]),]

bar3d.ade(data.matrix(ordered[,2],
                      rownames.force = NA),
          main="10000000 reads of a montonically incrementing long",
          y=ordered[,1],
          zticks=ordered[,1],
          xticks="1 writer, 15 readers",
          ylab="Duration (ms)")

The means and standard deviations look like this:

> ordered
             Group.1 duration aggregatedSds[, 5]
3              DIRTY    211.9           11.95780
2             ATOMIC    594.2           71.89777
9           VOLATILE    641.0           72.22188
1              ADDER    708.7           73.71122
5 OPTIMISTIC_CORRECT   1379.3          141.92334
4         OPTIMISTIC   7518.0         1392.87967
8       SYNCHRONIZED   8682.4         1507.51283
6             RWLOCK  10515.4          665.83251
7            STAMPED  21859.3         5076.10047

or, graphically:


Note: this has one writing thread and 15 reading threads on a 16 core (using hyperthreading) CPU. DIRTY means there was no locking, the fastest but least accurate technique. Duration (the vertical axis) is in milliseconds. I don't know why this axis is not labelled when I save the image, even though it is displayed in RStudio.

So, ADDER is indeed fast but not as fast as ATOMIC (which is also more accurate) on this computer and with this read/write profile. I'll try other numbers of readers and writers later.


Monday, October 13, 2014

R Tip of the Day


If I'd known this a month ago, I'd have saved myself a lot of time. I was pulling GC data from a JVM's logs but my pattern matching wasn't perfect so I was getting a lot of dirty data (NA in R).

To clean these from a vector, you can do something similar to this:

data      <- c( 1, 2, 3, NA, 5, 6 ) # data with some dirty elements
badData   <- is.na(data)            # a logical vector: TRUE where an element is NA
cleanData <- data[!badData]         # c(1, 2, 3, 5, 6)

Similarly, if you have 2-dimensional (or more) data, you can drop every observation for which even one element of the tuple is dirty. For example:

x       <- c( 1,  2,  3,  NA, 5,  6 )
y       <- c( 10, 20, NA, 40, 50, 60 )
cleaned <- complete.cases(x, y)

gives a logical vector that is TRUE only where both x and y are clean, so indexing with it removes any pair where either x or y (or both) is dirty:

> x[cleaned]
[1] 1 2 5 6
> y[cleaned]
[1] 10 20 50 60