Why Spark can’t foldLeft: Monoids and Associativity.

Apache Spark is the elephant in a room full of data processing engines, yet Spark does not supply a foldLeft() or foldRight() method on its RDD class. Strange, right? Such a fundamental collection method. How could it be forgotten? Or was this not an accident? scoreAverageByPlayer(), which would take an RDD, and return an RDD … Continue reading Why Spark can’t foldLeft: Monoids and Associativity.

Zookeeper in AWS: Practices for High Availability with Exhibitor

Overview Zookeeper is a distributed, sequentially consistent system developed to attack the many tough use cases surrounding distributed systems, such as leader election in a cluster, configuration, and distributed locking. For more Zookeeper recipes visit: http://zookeeper.apache.org/doc/current/recipes.html. Zookeeper clusters (ensembles) can be made up of any number of nodes, but typically take the form of a three … Continue reading Zookeeper in AWS: Practices for High Availability with Exhibitor