Why Spark can’t foldLeft: Monoids and Associativity.

Apache Spark is the elephant in a room full of data processing engines, yet Spark does not supply a foldLeft() or foldRight() method on its RDD class. Strange, right? Such a fundamental collection method. How could it be forgotten? Or was this not an accident? … scoreAverageByPlayer(), which would take an RDD and return an RDD … Continue reading Why Spark can’t foldLeft: Monoids and Associativity.

Zookeeper in AWS: Practices for High Availability with Exhibitor

Overview: Zookeeper is a distributed, sequentially consistent system developed to attack the many tough use cases surrounding distributed systems, such as leader election in a cluster, configuration, and distributed locking. For more Zookeeper recipes, visit http://zookeeper.apache.org/doc/current/recipes.html. Zookeeper clusters (ensembles) can be made of any number of nodes, but typically take the form of a three … Continue reading Zookeeper in AWS: Practices for High Availability with Exhibitor

A Web Server in 5 Minutes with Scala + Jetty + SBT

Recently, I was tasked with developing a load generation tool on top of Twitter's open source Iago project. I initially validated the request rates of the app using a separate local Play! app as a victim server, with RESTful endpoints summing the requests. But this setup wasn't going to cut it within my acceptance test suite. Solution: Embedded Jetty Server. Goal … Continue reading A Web Server in 5 Minutes with Scala + Jetty + SBT