prose :: and :: conz

Map, reduce, and fold for the programmatically imperative

When I first experienced functional programming in Scala, I had some learning curves to get over thanks to years of imperative programming. One of those came when knowing what to do with a collection of items (more generally, this applies to some (all?) monadic types. However, I’m going to resist the urge to contribute to the explosion of monad tutorials at this time). My instinct is to “loop over” the collection, grabbing each object in my collection with a variable. In this post, I’ll share how I’ve tried to make my brain work differently for idiomatic functional programming. In particular, we’re going to look at these three operations: map, reduce, and fold. My approach here will be much like the way I teach when instructing intro to discrete structures at UAHuntsville; I’ll give a thorough disposition on the topic, then give you the rules of thumb to help you remember the material.

I use map constantly when programming in Scala. It strikes me as being as ubiquitous as a for loop in the imperative paradigm. The name “map” should hearken back to your own discrete mathematics days when you studied functions, as the two are synonymous. The map applies a given mapping/function to the entire collection. If you recall, a function defines an association of an element in the domain to precisely one element in the co-domain. A rather descriptive term for an element in the domain is “pre-image” and the respective element in the co-domain is the “image”. Likewise, a function applied to an entire set produces an “image”.

Let’s look at an example of using map. Suppose you have this array of not-so-random numbers:

val nums = List(827, 34, 15, 27, 8, 1, 55)

And you would like to add 1 to each number:

val numsPlusOne = => i + 1)

Hopefully even if you’re new to Scala and functional programming, you can see what’s happening here. The expression i => i + 1 is a function/mapping (or “lambda” in programming terms) which is applied to every element of the collection to produce the elements of the returned collection. Hence it produces an image that is equal to the following list:

List(828, 35, 16, 28, 9, 2, 56)

For extra credit, there is also a short-hand notation that we can use to produce the same result:

val numsPlusOne = + 1)

The underscore is like an anonymous argument. No need to name it i when you can just tell the compiler what to do with the elements the collection contains.

Of course, it’s not always the case that we can act on the elements individually this way. Often times we need some state carried along the way to solve the problem. The most classic example is finding the largest element of a collection. If you’re programmatically imperative like me, you only know how to do this with a mutable variable to hang on to the max element so far. This would be the Java code I would write for the above list:

Collection<Integer> nums = Arrays.asList(827, 34, 15, 27, 8, 1, 55);
Integer max = nums.get(0);
for(Integer i : nums) {
    if(i > max) max = i;

So how do we shift our thinking from imperative’s mutable state? Well, as a student of Odersky’s Functional Programming Principles in Scala course, my instinct was to reach for a recursive solution:

def max(nums:List[Int]) = {
  def maxRec(nums:List[Int], maxThusFar:Int):Int = {
    if(nums.isEmpty) maxThusFar
    else maxRec(nums.tail, 
      if(nums.head > maxThusFar) nums.head else maxThusFar)
  maxRec(nums, nums(0))

While that is certainly valid and correct, one should be very suspicious of this approach. Notice how much more code it took than the Java solution. In general, Scala requires one-half or one-third the amount of code as Java to solve the same problem. Well it turns out that this is a very common pattern that is implemented as a the operation reduce.

This operation to find the max of a collection utilizes a binary operator. Don’t recall that from discrete mathematics? Quick review: A binary operator on a set S is a function defined on S x S that always returns a member of set S. For example, + when defined as arithmetic addition is a binary operator on the natural numbers because it is defined for any two natural numbers and always returns another natural number.

In the case of finding the max of a collection, we are applying a binary operator (Line 5 in the previous snippet) pair-wise across the collection. For each pair, this binary operator returns the larger of the two integers. Reduce will apply any binary operator across the collection exactly in this manner. In particular, we will use reduceLeft which traverses the collection from left to right:

def max(nums:List[Int]) = 
  nums.reduceLeft((maxThusFar, next) => 
    if(next > maxThusFar) next else maxThusFar)

Here we see a lambda which takes a pair of integers (the type is statically known due to type inference) and returns an integer, hence we have a binary operator on the set of integers. In particular, returns one of the two integers keying off the if condition. This lambda is precisely our binary operator from before, except I’ve used the name next in lieu of the call to nums.head. As I previously stated, reduce will apply this lambda to the first two elements, then to the result of that application and the third element, etc.

Now that it’s a one-liner (sans the newlines I added for readability) as opposed to the three in our Java source, we can feel pretty good about this solution.

Caveat: reduceLeft will raise a runtime exception if the nums argument is an empty list.

That’s pretty handy as long as the solution to your problem is of the same type as the items in the collection. For instance, perhaps we want to produce a string by concatenating our numbers together. We’ll keep it simple and leave the string unformatted, hence for our nums value, we want to produce the string "8273415278155". To solve this problem imperatively, I would naturally reach for the same idiom as the max problem: a mutable variable and a for loop:

String str = "";
for(Integer i : nums) {
    str += i;

Note: If you’re rather seasoned with Java, you may even feel the need to reach for a StringBuilder. You’re welcome to do that, but note that it’s even MORE imperative than my snippet.

Again, we can implement this recursively in Scala:

def concatStr(nums:List[Int]) = {
  def strRec(nums:List[Int], strThusFar:String):String = {
    if(nums.isEmpty) strThusFar
    else strRec(nums.tail, strThusFar + nums.head)
  strRec(nums, "")

Just as with max(nums), we find that we’re implementing a pattern that is quite common. Speaking very abstractly, the pattern is to take some initial value (The empty string in our case) along with each value of the collection, perform an operation, carry the result of the operation forward (as strThusFar in the above code), and return the final result. This pattern is what is known as the operation fold. Let me show you the call to foldLeft (like reduceLeft, this one traverses the collection from the left to the right) and we’ll discuss it further:

def concatStr(nums:List[Int]) = 
  nums.foldLeft("")((strThusFar, i) => strThusFar + i)

Here again we have a nice one-liner solution which removes boilerplate due to the common pattern utilized and emphasizes the core operation. You see the empty string as the first argument to foldLeft as our initial value, and the lambda which incrementally builds our solution.

Want more extra credit? We can use the underscore in this snippet as well:

def concatStr(nums:List[Int]) = nums.foldLeft("")(_ + _)

By the way, I should confess that this example is a little bit contrived. If you want to convert a collection of items into a string, an even better solution is to call mkString:

def concatStr(nums:List[Int]) = nums.mkString

It optionally takes an argument to define a separator/delimiter. I often find mkString to be very useful.

The Rules of Thumb
Now that I have shown you these three common functional programming operations, I’d like to give you the rules of thumb that help me know when to use which one.

Use map any time you want to do something with every element individually in a collection, especially if you want a collection of the results. The terminology of “pre-image” and “image” of a map as defined in mathematics always gives me the visual cue to select map as the operation I need. Also note that the order in which the map is applied to the collection doesn’t really matter because of how each application of the lambda is in isolation to the others. So individual/isolated operation + collection of results => map.

Use reduce any time you want to perform an operation on the collection that would otherwise require an intermediate variable and the result is the same type as the members of the collection. I always think of it as reducing the collection to a single element. However, note that it’s not required that the result is a member of the collection; it needs only to be the same type as the collection. For instance, you could use reduce to find the sum of a collection of integers. So operation that needs intermediate state + result is the same type as the members of the collection => reduce. (Also, don’t forget that unlike the other two operations, this one goes belly-up with an empty collection. The reason why is an exercise left to the reader)

Use fold when like reduce you want to perform an operation on the collection that would otherwise require an intermediate variable, but either the result of the operation is of a different type than the collection or you need to provide an initial value. Note that reduce is a special case of fold in that it doesn’t have an explicit initial value (one can think of the first element as the initial value) and it can only return elements of the same type as the collection. By having an initial value, fold is a little more resilient in that it handles the empty collection case.

Hopefully this little guide will help other folks who are programmatically imperative like I am. Be on the lookout for more posts like this one in the future as I’m sure I’ll need to create some more posts to help myself rehab from imperative programming.

Tagged with: scala (41), java (22)