# Reflecting on some questions about Scala’s performance

Scala is the programming language behind many of the most important tools that allow The Guardian to deliver its journalism. And compared to other languages, Scala has a lot of wonderful tools to make your code simpler. However, this is sometimes achieved by hiding much of the complexity. Usually that’s a good thing – it’s the compiler author’s problem and we have the luxury of being able to just ignore it and continue with what we’re trying to do. But sometimes it can actually influence the way our code executes and, as a result, present us with surprises.

To have something to work with, I started playing around with solutions to Project Euler problem 39 (spoilers for problems beyond number 100 are discouraged, but this one is OK):

If p is the perimeter of a right angle triangle with integral length sides, {a, b, c}, there are exactly three solutions for p = 120.

{20,48,52}, {24,45,51}, {30,40,50}

For what value of p ≤ 1000 is the number of solutions maximised?

We want to find the combinations of three numbers that are valid side lengths of a right angle triangle, and which add up to each number (perimeter) less than 1000. The final answer is the perimeter that has the largest number of these triplets.
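As a quick sanity check (an illustrative snippet, not part of the solutions below – the object and method names are made up), we can verify the three example triples for p = 120:

```scala
object TripleCheck {
  // A triple is valid for perimeter p if its sides sum to p
  // and it satisfies Pythagoras' theorem (c being the hypotenuse).
  def isValid(p: Int, triple: (Int, Int, Int)): Boolean = {
    val (a, b, c) = triple
    a + b + c == p && a * a + b * b == c * c
  }

  def main(args: Array[String]): Unit = {
    val triples = Seq((20, 48, 52), (24, 45, 51), (30, 40, 50))
    assert(triples.forall(isValid(120, _)))
    println("all three example triples are valid for p = 120")
  }
}
```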

There are almost certainly better ways to solve this problem than those discussed below – we’re just using a naïve brute-force approach – but the topic under study is the relative performance of different Scala coding styles, and for that purpose it serves very well.

We could start by writing a function which finds the valid triples for a given perimeter p. A first instinct might be to reach for a for() loop:

```scala
object ExampleScala {
  def calc(p: Int) = {
    var result = 0
    for {
      a <- 3 to p
      b <- 1 until ((p - a) / 2)
      c = p - (a + b)
    } if (b * b + c * c == a * a) result += 1
    result
  }
}
```

Running this in the Scala REPL (the interactive coding environment) indeed gives us the solution:

```
scala> timed((3 to 1000).maxBy(ExampleScala.calc _))
took 1240123372ns (1240ms)
val res1: Int = 840
```

but we can see that it takes about 1.3 seconds (for the definition of timed(), which is very simple, see this gist).
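The linked gist isn’t reproduced here, but a minimal sketch of what such a timed() helper might look like (an assumption – the gist’s actual definition may differ) is simply to measure elapsed nanoseconds around a by-name expression and pass its value through:

```scala
object Timed {
  // Evaluate the by-name argument once, report how long it took,
  // and return its value unchanged.
  def timed[A](body: => A): A = {
    val start = System.nanoTime()
    val result = body
    val elapsed = System.nanoTime() - start
    println(s"took ${elapsed}ns (${elapsed / 1000000}ms)")
    result
  }
}
```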

If you’re familiar with a language like Java or C, you might have assumed that this code was roughly equivalent to this Java snippet:

```java
public class ExampleJava {
  public static int calc(int p) {
    int result = 0;
    for (int a = 3; a <= p; a++) {
      for (int b = 1; b < (p - a) / 2; b++) {
        int c = p - (a + b);
        if (b * b + c * c == a * a) result++;
      }
    }
    return result;
  }
}
```

but the Java version above is more than ten times faster:

```
scala> timed((3 to 1000).maxBy(ExampleJava.calc _))
took 85822387ns (85ms)
val res1: Int = 840
```

This suggests that although the two languages share the for keyword, it results in very different execution. Perhaps due to its emphasis on immutability, Scala has no direct equivalent of this Java for loop, but of course it does have while, so we can easily write something in Scala that is roughly equivalent to the Java snippet:

```scala
object ExampleScala {
  def calcWithWhile(p: Int) = {
    var (result, a) = (0, 3)
    while (a <= p) {
      var b = 1
      while (b < (p - a) / 2) {
        val c = p - (a + b)
        if (b * b + c * c == a * a) result += 1
        b += 1
      }
      a += 1
    }
    result
  }
}
```

To our poor, sensitive Scala eyes, accustomed to contemplating only sublime functional concision, this may be unpleasant to look at. But how does it perform?

```
scala> timed((3 to 1000).maxBy(ExampleScala.calcWithWhile _))
took 66075510ns (66ms)
val res0: Int = 840
```

Even better than the Java version (by the way, in all these cases I’m running on a cold JVM instance).

What is the difference between these two implementations? We can find out by using the -Xprint:parser compiler option to show us what our Scala code looks like after some of the desugaring has been applied – in this case, the expansion of the for-comprehension (the terminology alone is revealing here: Java has for loops, but Scala has for-comprehensions – clearly not the same thing at all…).

```scala
def calc(p: Int) = {
  var result = 0;
  3.to(p).foreach(((a) => 1.until(p.$minus(a).$div(2)).map(((b) => {
    val c = p.$minus(a.$plus(b));
    scala.Tuple2(b, c)
  })).foreach(((x$1) => x$1: @scala.unchecked match {
    case scala.Tuple2((b @ _), (c @ _)) => if (b.$times(b).$plus(c.$times(c)).$eq$eq(a.$times(a)))
      result.$plus$eq(1)
    else
      ()
  }))));
  result
};
```

Our code has been transformed into a number of nested anonymous functions, which are then invoked in a loop via foreach(). More alarming is the map() call. Unlike map(), foreach() doesn’t keep its results and build them into a new data structure, so it is naturally much faster. Here, map() pairs each value of b with its corresponding c in a tuple, building them all into a new sequence, before another function pattern-matches each tuple and performs the calculation.
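The distinction is easy to see in miniature (an illustrative snippet, not from the desugared output above): map() materialises a whole new collection, while foreach() merely runs a side effect for each element and returns Unit.

```scala
object MapVsForeach {
  def main(args: Array[String]): Unit = {
    // map() builds and returns a new Vector of tuples – an extra allocation
    val pairs = (1 to 3).map(b => (b, 10 - b))
    println(pairs) // Vector((1,9), (2,8), (3,7))

    // foreach() builds no intermediate collection; it only runs the side effect
    var count = 0
    (1 to 3).foreach(b => if (b % 2 == 1) count += 1)
    println(count) // 2
  }
}
```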

This storing of values in a newly allocated sequence before processing them is memory-intensive and entirely redundant. I was surprised to see it, as I had assumed that assignments in the comprehension header were equivalent to assignments in the code block, but that is clearly not the case.

This is confirmed when we examine the stack traces produced by running this code with a profiler:

```
         ns  percent  samples  top of stack
-----------  -------  -------  ------------
30940000000   55.07%     3094  scala.collection.immutable.VectorBuilder.result
 5000000000    8.90%      500  ExampleScala$.$anonfun$calc$2
 3420000000    6.09%      342  scala.collection.immutable.VectorStatics$.foreachRec
 2290000000    4.08%      229  scala.collection.immutable.VectorBuilder.advance1
```

It spends 55% of its time creating vectors that we don’t even need.

We can therefore get a quick performance boost by making a small, seemingly innocuous change to this code:

```scala
object ExampleScala {
  def calcNoAssign(p: Int) = {
    var result = 0
    for {
      a <- 3 to p
      b <- 1 until ((p - a) / 2)
    } {
      val c = p - (a + b)
      if (b * b + c * c == a * a) result += 1
    }
    result
  }
}
```

Note that the assignment has now moved into the code block. How does it perform?

```
scala> timed((3 to 1000).maxBy(ExampleScala.calcNoAssign _))
took 302834504ns (302ms)
val res0: Int = 840
```

302ms! Another huge improvement. But it’s still much slower than the while-loop version. Why is that? I’m presuming it is down to the cost of creating and invoking the remaining lambda function. Function calls are relatively expensive operations, largely because we need to store our execution state somewhere before jumping into the next function, so that we can return to that state afterwards. Of course, this is exactly what makes function calls powerful, so when you need them there is no substitute. But in a problem like this, that state is entirely redundant, because each iteration can be computed without any state from the previous one. This also shows why tail-call optimisation is so fast: it lets you write code that conceptually looks just like a recursive function call, but it eliminates that execution state at each recursion point – precisely because, in the cases where tail-call optimisation is valid, the previous state is not needed.
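To make the tail-call point concrete, here is a tail-recursive version of our counting function (an illustrative sketch, not from the original article – the names are made up). Scala’s @tailrec annotation guarantees the compiler rewrites the recursion into a loop, so no call-stack state accumulates between “iterations”:

```scala
import scala.annotation.tailrec

object ExampleTailRec {
  // Same iteration space as calc(): a from 3 to p, b from 1 until (p - a) / 2.
  def calcTailRec(p: Int): Int = {
    @tailrec
    def loop(a: Int, b: Int, result: Int): Int =
      if (a > p) result                              // exhausted all perimeters' a values
      else if (b >= (p - a) / 2) loop(a + 1, 1, result) // inner range done; next a
      else {
        val c = p - (a + b)
        loop(a, b + 1, if (b * b + c * c == a * a) result + 1 else result)
      }
    loop(3, 1, 0)
  }
}
```

Because every call to loop is in tail position, the compiled bytecode is essentially the while-loop version in disguise.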

By the way, while investigating this problem I came across the awesome cfor() macro from Typelevel’s Spire library:

```scala
object ExampleScala {
  import spire.syntax.cfor._

  def calcWithCfor(p: Int) = {
    var result = 0
    cforRange(3 to p) { a =>
      cforRange(1 until ((p - a) / 2)) { b =>
        val c = p - (a + b)
        if (b * b + c * c == a * a) result += 1
      }
    }
    result
  }
}
```

```
scala> timed((3 to 1000).maxBy(ExampleScala.calcWithCfor _))
took 57206928ns (57ms)
val res0: Int = 840
```

Surprisingly, this is the fastest result yet, even faster than the while-loop solution.

One reassuring conclusion we can take from all of this is that Scala can perform really well when it needs to. We just have to be careful about where and when we use its rich abstractions. Standalone programming problems like those on Project Euler and other such sites can be a very useful way of exploring Scala’s many different approaches to solving a problem, so why not start exploring! And of course, it’s never really about getting the right answer at the end, but much more about the path(s) you take to get there.