Friday, December 16, 2016

More Spark tuning


I've spent some time trying to tune matrix multiplication in Spark. I had 32 executors, each with 4 task threads, and sampled their stacks with jstack over a few minutes using a handy little script:

> cat ~/whatsEveryoneDoing.sh
# For each box in the cluster, find the executor JVMs running the given
# application ($1 is the application ID) and dump the stack of every
# thread whose name starts with "Exec (the Executor task launch workers).
for BOX in `cat ~/boxes.txt` ; do {
    echo $BOX ;
    ssh $BOX "for PID in \`ps -ef | grep $1 | grep jdk8 | grep -v bash | awk '{print \$2}'\` ; do { sudo -u yarn jstack \$PID | grep -A45 ^\\\"Exec ; } done ; echo " 2>/dev/null ;
} done

where ~/boxes.txt is just a file listing the addresses of all the boxes in the cluster, and the argument you invoke the script with is the application ID of the job.
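So an invocation looks something like this (the application ID here is made up; use whatever YARN reports for your job):

> bash ~/whatsEveryoneDoing.sh application_1481874000000_0007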

I noticed that, in each sample, around 50 of my 128 executor threads were contending for the lock at:

org.apache.spark.storage.memory.MemoryStore.reserveUnrollMemoryForThisTask(MemoryStore.scala:570)

(This is Spark 2.0.2).

That lock is shared by every task thread in an executor, since there is one MemoryStore per executor JVM, so the fewer threads per JVM, the less contention. Reducing --executor-cores from 4 to 2 while only increasing the number of executors from 32 to 48 had a big impact, even though it meant fewer task threads overall (96 rather than 128): the runtime dropped from about 7.5 hours to 5 hours.
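For reference, the change amounts to something like this on the spark-submit command line. This is only a sketch: the master, memory setting, class name and jar are placeholders rather than the actual job; the flags that matter are --num-executors and --executor-cores.

# 48 executors x 2 cores = 96 task threads in total, but only 2 threads per JVM
# competing for the MemoryStore lock (previously 32 x 4 = 128 threads, 4 per JVM).
spark-submit \
  --master yarn \
  --num-executors 48 \
  --executor-cores 2 \
  --executor-memory 8g \
  --class com.example.MatrixMultiply \
  matrix-multiply-assembly.jar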
