Sunday, January 18, 2015

Generics design: Java and Scala


Java

"In contrast [to arrays], the subtyping relation for generics is invariant meaning that type List<S> is not considered to be a subtype of List<T> except in the trivial case where S and T are identical." [1] as a result, all of Java's collections are invariant.

So, why is there a hole in the type system that allows this to compile?

        List<Integer> ints  = Arrays.asList(1, 2, 3);
        boolean       found = ints.contains("a string");

of course ints cannot contain a String. You'll see the contains takes an Object.

"The designers of the Java libraries chose the first, more liberal, alternative, because someone using the Collections Framework before generics might well have written a test such as ints.containsAll
(objs), and that person would want that test to remain valid after generics were added to Java... Arguably, the library designers made the wrong choice." [1]

Scala

So, Scala must be smarter right? Let's take this class hierarchy:

class GrandParent { }
class Parent extends GrandParent { }
class Child extends Parent { }

And let's instantiate a few:

  val aGrandParent: GrandParent = new GrandParent
  val aParent:      Parent      = new Parent
  val aChild:       Child       = new Child

Now let's try


    val setOfParents:  Set[Parent] = Set(aParent)
    val setOfChildren: Set[Child]  = Set(aChild)
//    val contained = setOfChildren(aParent) // doesn't compile: type mismatch
    val contained                  = setOfParents(aChild)

So, we're stopped from asking if a set of children contains a parent object but we can still ask a set of parents if they contain child objects. That's better. So what about lists?

    val listOfParents:  List[Parent] = List(aParent)
    val listOfChildren: List[Child]  = List(aChild)

    listOfParents.contains(aChild)
    listOfChildren.contains(aParent)

Yikes - it's just as bad as Java.

The reason is straightforward (why I will explain later):

In the case of List, covariance was chosen, and List.contains takes Any.
In the case of Set, invariance was chosen, and Set.contains takes A.

The question remains: why was this design decision made?

To which Martin Odersky replied:

"On the issue of sets, I believe the non-variance stems also from the implementations. Common sets are implemented as hashtables, which are non-variant arrays of the key type. I agree it's a slighly annoying irregularity."

So, basically, a design decision in the cases of both Java and Scala.

[1] Java Generics, Naftalin and Wadler

Further reading:

1. the debate on StackOverflow.
2. Michael Payton Jones's blog.

Saturday, January 17, 2015

Automatic Boundary Checking


Surprises in Java's Math class

ScalaTest/ScalaCheck has a rather nice little feature. It can boundary check your unit tests. Let's take this code:

import org.scalatest.WordSpec
import org.scalatest.prop.GeneratorDrivenPropertyChecks
import org.scalatest.Matchers

class MyScalaTest extends WordSpec with GeneratorDrivenPropertyChecks with Matchers {

  "java.lang.Math.abs" should {
    "always return a non-negative value" in {
      forAll { x: Int =>
        Math.abs(x) should be >= 0
      }
    }
  }
}

It nicely runs the test many times, probing it with different values. Unfortunately, it fails with:

[info] MyScalaTest:
[info] Math.abs
[info] - should always return a non-negative value *** FAILED ***
[info]   TestFailedException was thrown during property evaluation.
[info]     Message: -2147483648 was not greater than or equal to 0
[info]     Location: (MyScalaTest.scala:12)
[info]     Occurred when passed generated values (
[info]       arg0 = -2147483648
[info]     )

What? How could java.lang.Math.abs fail to return a non-negative value? Well, we hit a boundary condition. ScalaCheck is clever enough to try Integer.MIN_VALUE. However, because there are more negative Integers than positive (the range is -231 to 231-1), not all negative integers map to positive ones.

If we are happy with a limited range, we can write:

import org.scalacheck.Gen
.
.     
      val sensibleRange = Gen.choose(0, 100)

      forAll(sensibleRange) {
.
.

to choose numbers inclusive of 0 to 99.

Friday, January 16, 2015

All the Kings of America have beards


Consider this. I say: "everybody in the room next door is called John". You look inside the room and see nobody there and tell me so. If I then said: "Yes, everybody in the room is called John!", you might look sideways at me.

Yet this is not only mathematically correct but also what we see in functional language.

Take this Java:

        Set<String> emptySet = new HashSet<String>();
        Predicate<? super String> isHelloWorld = (String x) -> { return "hello world".equals(x); };
        System.out.println(emptySet.stream().allMatch(isHelloWorld)); // true
        System.out.println(emptySet.stream().anyMatch(isHelloWorld)); // false

Or alternatively this Scala:

    def isHelloWorld: (String) => Boolean = "hello world".equals(_)
    val emptySet: Set[String] = Set[String]()
    println(emptySet.forall(isHelloWorld)) // true
    println(emptySet.exists(isHelloWorld)) // false

Any assertion on all the elements of an empty set is always true.

Wednesday, December 24, 2014

Two tips for Java and Scala interoperability


I'm writing more and more Scala these days but don't want to give up on Java libraries I know and love. To this end, I've been writing ScalaTests with Mockito. It's fairly straightforward but two things initially foxed me.

Varargs

The first were varargs. "Both java and scala have varargs" but calling a Java method using varargs from Scala is not obvious.

I had code that looked like:

val x: List[X]       = ...
val ordered: InOrder = Mockito.inOrder(x)

which didn't compile because x was just being treated like an Object rather than an array of them.

The solution looks like this:

val ordered: InOrder = Mockito.inOrder(x:_*)


Conversions between Java and Scala collections

This is solved with a simple import.

import collection.JavaConversions._

Now we can do things like:

Thread.getAllStackTraces.map(x => println(x._1 + ":\n" + (x._2.mkString("\n    "))))

to print out all the stack traces of all the threads currently running. Without this import, the map function would fail as getAllStackTraces returns Map<Thread, StackTracesElement[]> - a Java Map that doesn't have the map method.

Sunday, December 14, 2014

Some useful Scala I learned this week


1. Indexes of a collection

If you want to  traverse a list of elements keeping the index in mind, you could do this with a fold but a more elegant way is something like:

    val words = List("the", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog")

    words.zipWithIndex // returns tuples like (word, index)

2. Choosing only some elements of a collection

Now, let's say we want the tuples of those words that begin with 't'. We do this with the collect function in TraversableLike.

    words.zipWithIndex collect {
        case (word, index) if word.startsWith("t") => (word, index)
    }

(You will note that we pass collect a partial function). This will return the list:

List((the,0), (the,6))

That is, the 0th and 6th words begin with 't', and they are both 'the'.

3. Counting the elements that meet a criteria

is very simple:

    words count (_ startsWith("t"))

Will, unsurprisingly, return 2 for the above collection.


What I learned this week is that you could do all these with folds etc but Scala (I am told) has methods that do pretty much anything you could want. It's just a matter of learning them.

Scala's way of overriding final Java classes


Or "pimp my library".

I was pair programming with a Scala guru when he saw me creating a function to interrogate a String. He took my code and re-wrote as  something like this:

object MyImplicits {
  
  implicit class MyString(val string: String) {
    
      def startsWithD: Boolean = string.startsWith("D")
      
  }

}

object MyImplicitsMain {
  
  import com.phenry.scala.MyImplicits._
  
  def main(args: Array[String]) = {
    val aString = "Does this start with D?"
    println(aString.startsWithD) // this returns true
  }
  
}

So, effectively he added a function to the String class for any code that imports this implicit.

Note that implicits can live in objects and packages (as they basically hold what Java programmers would recognise as statics) but not class or traits.

One is advised to be careful with implicits, however. "Implicits are very powerful but you should use them with care," said Scala's creator.

Sunday, November 9, 2014

Specialized Scala


I've been playing with the maths library, Breeze (a good introduction is here). It's written in Scala and uses a cunning stunt to avoid boilerplate code.

The problem with writing an efficient library in Java is autoboxing is horribly inefficient for sufficiently complicated mathematical operations. The solution is typically to make the code less generic and use primitives (as I did in my pet project, jMathematics).

Scala has a neat trick to avoid this. It uses the @specialized annotation. So, one of Breeze's implementations of a matrix looks like this:

final class DenseMatrix[@specialized(Int, Float, Double) V](...
                                                            val data: Array[V],
                                                            ...)

This results in the compiler creating methods that use these 3 primitives as well as the more generic types. This leads to quite a lot of classes:

henryp@phillsdell:~/Code/Scala/Maths/breeze$ find ./target/scala-2.10/classes/ -name DenseMatrix* | wc -l
350

but this Scala code:

  def valueAt(row: Int, col: Int) = apply(row,col)

(where apply ultimately reads from data: Array[V] in the constructor above) is compiled into this:

henryp@phillsdell:~/Code/Scala/Maths/breeze$ for FILE in `find ./target/scala-2.10/classes/ -name DenseMatrix*` ; do { javap "$FILE" | grep "valueAt(int, int" ; } done
  public int valueAt(int, int);
  public java.lang.Object valueAt(int, int);
  public float valueAt(int, int);
  public java.lang.Object valueAt(int, int);
  public double valueAt(int, int);
  public java.lang.Object valueAt(int, int);
  public V valueAt(int, int);

Great, so a generic Object return value plus methods for our three specialized primitives. What could possibly go wrong? Well careless use can lead to literally megabytes of code being generated. That's why it's so rare in the standard Scala libraries.

However, this is a big win for mathematics library writers.