
Robinhood CLI for Quick Stock Exits + Multi-Factor Login Support


What Robinhood is Missing

I’ve really enjoyed trading with Robinhood (NO FEES) these last few months, but viewing some metrics in its web UI and mobile app takes too many clicks. For one, determining my percent return requires three clicks for each stock position. That is too slow when you need to know quickly whether it is time to exit a position.

The CLI Tool

Ideally, I wanted a quick-and-dirty CLI tool in Python to crunch these numbers for me. I found a great open source Python framework for interacting with Robinhood’s backend API and made some tweaks to it to support:

  1. Streamlined Multi-factor Auth Login
  2. Improved Security with credentials as environment variables
  3. Calculate and display percent return on each position

You can find my working Fork here:

Running the Tool

Executing Script + Login Prompt + MFA Code: 

(I have a bash/zsh alias, “robin”, pointing to the script.)

Metrics Output:

Sorting positions by this newly generated percent-return field quickly draws your eye to the positions you may want to exit soon.
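The percent-return field itself is simple arithmetic: (last price - average buy price) / average buy price × 100. The fork is written in Python and its code is not reproduced here. The sketch below restates the calculation and the sort in Scala, the language used elsewhere on this blog. A hypothetical Position record stands in for the data the API returns.

```scala
// Hypothetical position record; the real tool reads these values from
// Robinhood's API, which is not modeled here.
case class Position(symbol: String, avgBuyPrice: BigDecimal, lastPrice: BigDecimal)

// Percent return relative to the average buy price (10 means +10%).
// Assumes avgBuyPrice > 0.
def percentReturn(p: Position): BigDecimal =
  (p.lastPrice - p.avgBuyPrice) / p.avgBuyPrice * 100

// Pair each symbol with its percent return, sorted ascending,
// so the biggest losers come first.
def byPercentReturn(positions: Seq[Position]): Seq[(String, BigDecimal)] =
  positions
    .map(p => p.symbol -> percentReturn(p))
    .sortBy(_._2)
```

Sorting in ascending order puts the biggest losers first. Reverse the order if you would rather see your biggest winners at the top.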


Happy trading!

Note: There are plans to merge this work back into the source repo in some fashion.


A Better Docker Container Tagging Strategy for CI/CD

Continuous delivery is difficult, but if your applications are containerized with Docker, you’re moving in the right direction! Containers provide a ton of flexibility and portability, but they can become a nightmare once you feel the pain of container management. One way to ease that pain is a standard container tagging strategy that gives the team shared assumptions and vocabulary.

Docker container build pipelines, tagging strategies, and CI/CD should go hand-in-hand.

Do I Need a Better Tagging Strategy?

You might want to rethink your Docker Image tagging strategy if you don’t immediately know the answer to the following questions:

  1. “What git commit hash of our app is currently running in production?”
  2. “Which container version in our registry is currently running in production?”

Strategy: Release Candidate Lifecycle Tagging

The tagging method I find most attractive is what I call “Release Candidate Lifecycle Tagging”. The tag values should follow a flavor of release candidate terminology along the delivery pipeline similar to:

Build Stage: Tag Value(s) (Development Stage)
  • Initial Build: <Commit Hash>, unstable
  • Passed Tests (Contract, Integration, Service): stable (“Release Candidate”)
  • Deployed to Production + Smoke Tested: live (“GA”, General Availability)

What it Looks like in a Build Pipeline:

In the following example, a release of the app “app”, the current git checkout SHA is “ff613f”. Tagging the initial Docker image build with the git SHA is pivotal. It tells teams exactly which commit to check out for local or remote debugging of that exact version of the application.

A CI/CD build pipeline with incremental image tagging.
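To make the lifecycle concrete, here is a minimal Scala sketch that produces the docker CLI commands a pipeline might run at each stage for the “ff613f” build. The registry name registry.example.com/app is a placeholder, and the stage names are my own labels for the stages listed above. The sketch only builds command strings and does not call Docker.

```scala
// Placeholder registry/image name; substitute your own.
val image = "registry.example.com/app"

// Lifecycle stages, in pipeline order.
sealed trait Stage
case object InitialBuild extends Stage
case object PassedTests extends Stage
case object LiveInProduction extends Stage

// Later stages may run on a different CI agent, so pull the immutable
// commit-hash tag before pointing a lifecycle tag at the same image.
def promote(sha: String, tag: String): Seq[String] = Seq(
  s"docker pull $image:$sha",
  s"docker tag $image:$sha $image:$tag",
  s"docker push $image:$tag"
)

// Commands a pipeline step could run when the build for `sha` reaches `stage`.
def tagCommands(sha: String, stage: Stage): Seq[String] = stage match {
  case InitialBuild =>
    Seq(
      s"docker build -t $image:$sha .",
      s"docker tag $image:$sha $image:unstable",
      s"docker push $image:$sha",
      s"docker push $image:unstable"
    )
  case PassedTests      => promote(sha, "stable")
  case LiveInProduction => promote(sha, "live")
}
```

The commit-hash tag never moves, while unstable, stable, and live are moving pointers. Answering “What git commit hash is running in production?” then becomes a matter of checking which hash tag shares an image with live.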

Taking it Further

Post-production Tags

With canary or blue/green deployments, additional tagging stages could be added incrementally. These would reflect not only that containers have made it to production, but also that they have reached certain levels of validity or traffic-based performance metrics.

Retiring Images

Once an app image has been replaced by its upgraded successor, the previous image needs to remain in the Docker registry for some period of time in case a rollback is required. This can be accomplished by adding another tag when the image is retired, such as “retired-<RETIRED_DATE>”. A reaping process could then use this tag to remove only images that have been retired for more than X days.
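Here is a minimal sketch of the reaping filter. It assumes the retirement date is written as an ISO-8601 date (e.g. retired-2019-02-19) and that deleting the matching images from the registry happens elsewhere. The function names are hypothetical.

```scala
import java.time.LocalDate
import java.time.temporal.ChronoUnit
import scala.util.Try

// Assumed tag format: "retired-" followed by an ISO-8601 date.
val RetiredTag = """retired-(\d{4}-\d{2}-\d{2})""".r

// The retirement date encoded in a tag, if the tag follows the assumed format.
def retiredOn(tag: String): Option[LocalDate] = tag match {
  case RetiredTag(date) => Try(LocalDate.parse(date)).toOption
  case _                => None
}

// Tags a reaper may act on: retired more than `maxAgeDays` days before `today`.
def reapable(tags: Seq[String], today: LocalDate, maxAgeDays: Long): Seq[String] =
  tags.filter { tag =>
    retiredOn(tag).exists(d => ChronoUnit.DAYS.between(d, today) > maxAgeDays)
  }
```

Tags such as live or a bare commit hash never match the pattern, so images that are still in service are never selected.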



I want to give a shout-out to Daniel Nephin, whose detailed, explanatory GitHub comments and issue discussions have helped me resolve many issues around Docker and container strategy.


From Junior to Senior: Software Engineering Must-Knows

*This is a living document and will be updated over time.*

Why these Resources?

Along a software developer’s journey from post-grad to seasoned vet, you come across articles and literature that enlighten you, propelling your skills forward by miles rather than inches. This is a collection of those essential resources that I feel a software engineer should know to be an informed, efficient, and effective engineer.


  1. Maintaining Clean Code
  2. Database Design
  3. Lean Engineering
  4. Testing
  5. Technical Decision Making
  6. Managing Deployments
  7. Container Orchestration
  8. JVM
    1. JVM Tuning
    2. Scala
  9. Machine Learning


1. Maintaining Clean Code

Clean Code (Book by Robert Martin)

“Clean Code” is one of those books that leaves you with a mix of excitement (you know how to write maintainable code now!) and regret (you realize the code you have been writing your whole life is smelly!). While a few chapters are technically dated, it outlines sound practices for keeping object-oriented codebases hygienic, and many of them carry over to other programming paradigms. This book is a must-know!

Dependency Injection (DI)/Inversion of Control (IoC)

2. Database Design


Normalization is easy to skip early on, but its absence is hard to ignore later down the road. When designing databases, five extra minutes spent thinking about and adhering to normalization will save days, if not weeks, of later redesign and data integrity fixes. Trust me.
Short walkthrough on Normalization:

3. Lean Engineering

Implementing Lean Software Development (Book by Mary and Tom Poppendieck)

4. Testing

Testing Quadrants

Teams that need to prune their testing practices, or cherry-pick new ones, can benefit from the “Agile Testing Quadrants” diagram. It outlines each test type’s organizational boundaries, initiation mechanism, and outcomes.

Is Unit Testing Worth it?

Chances are you eventually joined a company with a baked-in culture of quality, where you wrote tests because you were told to and only later realized their benefits. Or perhaps you are one of the testing thought leaders at your organization and have to sell those benefits yourself! This article lays out why unit testing is more than a nicety.

Testing in a Microservices Architecture

5. Technical Decision Making

Building Consensus Before Commitment

Venturing into the famed “How to Win Friends and Influence People” genre, this article explains how and why to take a holistic approach to presentations and to decisions that affect multiple organizations.

Technology Radar

A must in every developer’s toolkit for staying current. The Thoughtworks team hand-curates languages, frameworks, and practices that organizations should adopt, trial, or assess.

Site Reliability Engineering Learnings

6. Managing Deployments

Continuous Integration

Git Workflows

Terraform: Up & Running

While not critical to know intimately, Terraform is an excellent infrastructure-as-code tool for provisioning and managing resources across multiple cloud and PaaS providers.

7. Container Orchestration

Kubernetes vs ECS

While this article will quickly grow stale, it is a great comparison of two of the leaders in cloud container orchestration and hosting.

8. JVM


Class and Package Naming Strategies

While we all like to think we always execute the best file and class packaging practices, this naming and scoping refresher from Nikita Volkov can keep you sharp!

Scala Interview Questions

Effective Scala


Extensive Learnings from JVM Performance Tuning

Profiling with VisualVM

This tool is great for investigating how JVM options affect performance and for getting a feel for your app’s overall health.


9. Machine Learning

10 Algorithms Software Engineers must know

Disclaimer on References

The resources in this list point back to their original sources, and those original authors deserve an immense amount of credit.

Think a resource should be added to this article? Please submit it here:


Why Spark can’t foldLeft: Monoids and Associativity.

Apache Spark is the elephant in a room full of data processing engines, yet Spark does not supply a foldLeft() or foldRight() method on its RDD class. Strange, right? It is such a fundamental collection method. How could it be forgotten? Or was this no accident?


Consider a function, scoreAverageByPlayer(), which would take an RDD of scores and return an RDD of tuples pairing each player with their average score. Note: foldLeft() is not an available method on the scores RDD class.
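On an ordinary Scala collection, foldLeft is a natural fit for this job. The sketch below is a hypothetical local version that takes a Seq of (player, score) pairs in place of an RDD:

```scala
// Local stand-in for scoreAverageByPlayer(): a Seq replaces the RDD.
def scoreAverageByPlayer(scores: Seq[(String, Int)]): Seq[(String, Double)] = {
  // One left-to-right pass accumulating (sum, count) per player.
  val totals = scores.foldLeft(Map.empty[String, (Int, Int)]) {
    case (acc, (player, score)) =>
      val (sum, count) = acc.getOrElse(player, (0, 0))
      acc.updated(player, (sum + score, count + 1))
  }
  totals.toSeq.map { case (player, (sum, count)) => player -> sum.toDouble / count }
}
```

This compiles against a Seq, but the same code would not compile against an RDD, because RDD has no foldLeft method. In Spark, one common alternative is to map each score to a (score, 1) pair and combine the pairs with reduceByKey, whose combining function is associative.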

Remembering associativity

Let’s dig deeper, because working out the answer is more useful than the answer itself.

Back to algebra class we go! Associativity is one of the algebraic properties underpinning mathematics and, by extension, functional programming. Without understanding it, we cannot truly appreciate parallelism in computing and its limitations.

Mathematical Associativity:
"When the order in which the operations are performed does not matter as long as the sequence of the operands is not changed. That is, rearranging the parentheses in such an expression will not change its value."
The following expressions are associative:


Even though the parentheses were rearranged in the equation for res2, the values of res1 and res2 remained equivalent. It can then be said that the act of addition of real numbers is an associative operation.
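With illustrative integer values, here is the same idea in Scala, alongside subtraction as a non-associative counterexample. One caveat is worth knowing for Spark: floating-point (Double) addition is only approximately associative, because each step rounds.

```scala
// Integer addition: regrouping the parentheses leaves the value unchanged.
val res1 = (1 + 2) + 3
val res2 = 1 + (2 + 3)

// Subtraction is not associative: regrouping changes the result.
val sub1 = (1 - 2) - 3 // -4
val sub2 = 1 - (2 - 3) // 2

// Double addition rounds at each step, so regrouping can change the last digit.
val d1 = (0.1 + 0.2) + 0.3
val d2 = 0.1 + (0.2 + 0.3)
```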

How Spark achieves Parallelism

In order for Spark to become a leader in computational speed, it needed to incorporate operational parallelism. Parallelism will ultimately be the reason foldLeft is not found on the RDD class.


At a high level, Spark splits the data to be processed into partitions and distributes those partitions to a cluster of “worker” nodes (machines). Each node runs the computation on its own partitions in parallel, and the partial results are then aggregated back on the master node.


You can turn a local collection into an RDD, which Spark distributes across the cluster and processes in parallel, by calling parallelize() on a SparkContext.

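// sc is the SparkContext that spark-shell provides
val scores = Array(68, 71, 73)
val parScores = sc.parallelize(scores) // an RDD[Int] split into partitions across the cluster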

The figure below shows a function f being applied concurrently to an input dataset on a Spark cluster. This can be thought of as a map transformation.

Parallelization Visualized
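A local, sequential sketch of the same idea: the data is split into partitions, and f is applied to each partition on its own. Because no partition needs another partition's output, a map can run fully in parallel. The partition count of 3 and the squaring function f are illustrative choices.

```scala
val data = (1 to 8).toVector

// Illustrative split into 3 partitions; Spark chooses real partition counts itself.
val partitions = data.grouped(3).toVector

// f is applied to each partition independently. On a cluster, each
// partition would be processed by a separate task on some worker.
def f(x: Int): Int = x * x
val mappedPartitions = partitions.map(_.map(f))

// Concatenating the partitions gives the same result as a single local map.
val result = mappedPartitions.flatten
```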


Parallelizing reduce() in Spark

Let’s look at how Spark parallelizes the reduce operation on an RDD.

reduce() from the Spark Documentation

Action: reduce(func)
Meaning: Aggregate the elements of the dataset using a function func (which takes two arguments and returns one). The function should be commutative and associative so that it can be computed correctly in parallel.

“The function should be commutative and associative so that it can be computed correctly in parallel.”

Signature of reduce()

def reduce[A](op: (A, A) => A): A

This reads:

Execute the function “op” on each element (type A), with the result of the previous op computation (accumulator of type A) and respective element (type A) as inputs, eventually returning the resulting accumulator value from the last iteration (type A).

Spark’s reduce() in action

Now let’s say we have a set of “score” integers and want to determine the lowest score. We can solve this by executing a reduce action on the RDD and passing the monoidal operation findMin() (more on monoids later) as its argument.

In code we would solve this like:

// sc is the SparkContext that spark-shell provides
val data = Array(1, 2, 3, 4, 5)
val distData = sc.parallelize(data)
def findMin(first: Int, second: Int): Int = first.min(second)

val min = distData.reduce(findMin) // 1

This would evaluate in Spark as:
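Here is a local sketch of that evaluation. It assumes, for illustration, that the five elements land in two partitions, [1, 2, 3] and [4, 5]. In practice, Spark decides the partitioning itself.

```scala
def findMin(first: Int, second: Int): Int = first.min(second)

val data = Array(1, 2, 3, 4, 5)

// Assumed partitioning for illustration: [1, 2, 3] and [4, 5].
val partitions = Seq(data.take(3), data.drop(3))

// Step 1: each worker reduces its own partition independently.
val partialMins = partitions.map(_.reduce(findMin)) // Seq(1, 4)

// Step 2: the partial results are reduced once more into the final answer.
val min = partialMins.reduce(findMin) // 1
```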



We know what parallelism looks like in Spark, but why can’t we use foldLeft()? This will come together, but we need to understand Monoids first.

The monoid laws are tightly coupled to associativity. A monoid consists of:

  • Some type A
  • An associative operation, op, that combines two values of type A into a single value of type A: op(op(x,y), z) == op(x, op(y,z)) for any x, y and z of type A
  • An identity element, zero, for that operation: op(x, zero) == x and op(zero, x) == x for any x of type A
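These laws can be written down directly. The Monoid trait below is a minimal illustrative type class, not an API provided by Spark or the Scala standard library. obeysLaws only spot-checks the laws on sample values rather than proving them.

```scala
// Minimal illustrative monoid type class.
trait Monoid[A] {
  def op(x: A, y: A): A
  def zero: A
}

// Integer addition, with 0 as the identity.
val intAddition: Monoid[Int] = new Monoid[Int] {
  def op(x: Int, y: Int): Int = x + y
  val zero: Int = 0
}

// findMin as a monoid: Int.MaxValue is the identity, since x.min(Int.MaxValue) == x.
val intMin: Monoid[Int] = new Monoid[Int] {
  def op(x: Int, y: Int): Int = x.min(y)
  val zero: Int = Int.MaxValue
}

// Subtraction is NOT a monoid: it is not associative, and 0 - x != x.
val intSubtraction: Monoid[Int] = new Monoid[Int] {
  def op(x: Int, y: Int): Int = x - y
  val zero: Int = 0
}

// Spot-check associativity and identity on every combination of the samples.
def obeysLaws[A](m: Monoid[A], samples: Seq[A]): Boolean = {
  val associative = (for (x <- samples; y <- samples; z <- samples)
    yield m.op(m.op(x, y), z) == m.op(x, m.op(y, z))).forall(identity)
  val identityHolds = samples.forall(x => m.op(x, m.zero) == x && m.op(m.zero, x) == x)
  associative && identityHolds
}
```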

Balanced Folds

reduce() can be categorized as a balanced fold: a fold that allows for parallelism. Compare the following for the sequence (a, b, c, d).

foldLeft() with the operation op would look like:

op(op(op(a, b), c), d)

While a balanced fold looks like:

op(op(a, b), op(c, d))

Can you see how this operation could be parallelized with a fork-join approach?
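Here is a minimal sketch of the two shapes, using a string-building op so the grouping is visible. balancedFold is a hypothetical helper written for this example. It splits the input in half, folds each half independently, and combines the results, which is exactly the kind of split a fork-join scheduler could hand to separate workers.

```scala
// Balanced fold: split in half, fold each half independently, combine.
// The two recursive calls share no state, so they could run in parallel.
def balancedFold[A](xs: IndexedSeq[A], zero: A)(op: (A, A) => A): A =
  if (xs.isEmpty) zero
  else if (xs.length == 1) xs.head
  else {
    val (left, right) = xs.splitAt(xs.length / 2)
    op(balancedFold(left, zero)(op), balancedFold(right, zero)(op))
  }

// Render each application of op as text to expose the grouping.
val show = (x: String, y: String) => s"op($x, $y)"
val letters = Vector("a", "b", "c", "d")

val leftShape = letters.reduceLeft(show)            // "op(op(op(a, b), c), d)"
val balancedShape = balancedFold(letters, "")(show) // "op(op(a, b), op(c, d))"
```

For an associative op such as integer addition, both shapes produce the same value, which is why reduce() is free to pick the balanced one.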

Thinking of reduce() as a Balanced Fold

If we look back at the reduce with the findMin operation, its execution looks like this:

Parallelizing reduce - Visualized

This is a balanced fold, which can be written as either:





foldLeft and foldRight are available on many Scala collections. Let’s focus on the List collection, which provides the following foldLeft signature (the body below is a simplified recursive version of its implementation):

def foldLeft[B](z: B)(op: (B, A) => B): B =
  if (this.isEmpty) z
  else tail.foldLeft(op(z, head))(op) // fold the head into z, then recurse on the tail

This reads:

foldLeft takes an initial value z of type B and applies op to the current accumulator (type B) and each element (type A) in turn, from left to right. Each step produces a new accumulator of type B, and foldLeft eventually returns the final accumulator (type B).
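Here is a small illustrative example in which the accumulator type B (String) differs from the element type A (Int):

```scala
val digits = List(1, 2, 3)

// z = "" fixes B = String; op has type (String, Int) => String.
val joined = digits.foldLeft("")((acc, n) => acc + n.toString) // "123"

// scanLeft exposes every intermediate accumulator, each of type String.
val steps = digits.scanLeft("")((acc, n) => acc + n.toString)  // List("", "1", "12", "123")
```

Note that op here has type (String, Int) => String, so it has no way to merge two partial String results.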

foldLeft()’s operation is NOT a Monoid

Notice that foldLeft’s operation has a non-monoidal signature, (B, A) => B. Its two inputs can have different types, so it is not an operation on a single type A, and the associativity and identity laws cannot even be stated for it. But why, again, is foldLeft not supported by Spark? Simply put, because foldLeft is not sufficiently parallelizable!

Looking back at the monoid laws, the associativity law states that the following must be true:

op(op(x,y), z) == op(x, op(y,z))

This law is what makes parallelism possible. Spark can fork a monoidal operation across a dataset into n operations (say, one per partition) and join the resulting values on the master. In the best case, this fork-join parallelization cuts execution time by a factor of n.
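Here is a minimal local sketch of fork-join reduction, using Scala Futures in place of Spark workers. forkJoinReduce is a hypothetical helper. It splits the data into roughly n chunks, reduces each chunk on its own thread, then joins the partial results with the same operation.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Fork: reduce each chunk concurrently. Join: combine the partial results with op.
// The answer is only independent of n when op is associative.
def forkJoinReduce[A](data: Vector[A], n: Int)(op: (A, A) => A): A = {
  require(data.nonEmpty && n > 0, "need data and at least one chunk")
  val chunkSize = math.max(1, math.ceil(data.length.toDouble / n).toInt)
  val partials: Vector[Future[A]] =
    data.grouped(chunkSize).toVector.map(chunk => Future(chunk.reduce(op)))
  // Future.sequence preserves chunk order, so the join sees results left to right.
  Await.result(Future.sequence(partials), 10.seconds).reduce(op)
}
```

With an associative op, the answer does not depend on n. With subtraction it does: forkJoinReduce(Vector(1, 2, 3, 4), 1)(_ - _) returns -8, while n = 2 returns 0.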

foldLeft() cannot be Parallelized

If we pretend foldLeft were an operation available on Spark collections and walk through its execution visually, its limitations may be easier to understand.

Given the following collection and operation, let’s see it in action…

val nums = List(2.2, 3.3, 4.4)
nums.foldLeft(1)((agg, next) => (agg * next).toInt)

Parallelizing foldLeft - Visualized
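Here is a local sketch of that walkthrough. The split into [2.2, 3.3] and [4.4] is a hypothetical partitioning chosen for illustration.

```scala
val nums = List(2.2, 3.3, 4.4)
val op = (agg: Int, next: Double) => (agg * next).toInt

// Sequential foldLeft: every step waits on the previous accumulator.
val steps = nums.scanLeft(1)(op)  // List(1, 2, 6, 26)
val result = nums.foldLeft(1)(op) // 26

// Pretend the list were split into two partitions, [2.2, 3.3] and [4.4].
// Each partition can be folded independently from the initial value 1...
val left = List(2.2, 3.3).foldLeft(1)(op) // 6
val right = List(4.4).foldLeft(1)(op)     // 4

// ...but op has type (Int, Double) => Int, so it cannot join two Int partial
// results. Guessing that multiplication joins them gives 24, not 26.
val joined = left * right
```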

We can now see why foldLeft() was never implemented in Spark: even under the fork-join execution model, each step would block on the result of the previous one, so the computation would remain serial!
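For contrast, Spark's RDD does offer fold(zeroValue)(op), whose op must have type (T, T) => T. It also offers aggregate(zeroValue)(seqOp, combOp), which runs a foldLeft-style seqOp within each partition and then merges the partition results with a combOp of type (U, U) => U. The sketch below imitates aggregate's semantics on local collections. aggregateLocal and the explicit partition list are stand-ins for illustration, not Spark's implementation.

```scala
// Hypothetical local stand-in for RDD.aggregate; `partitions` plays the role
// of an RDD's partitions.
def aggregateLocal[T, U](partitions: Seq[Seq[T]], zeroValue: U)(
    seqOp: (U, T) => U, combOp: (U, U) => U): U =
  partitions
    .map(_.foldLeft(zeroValue)(seqOp)) // sequential fold inside each partition
    .foldLeft(zeroValue)(combOp)       // merge the partition results with combOp

// Average score via (sum, count): combOp is associative with identity (0, 0).
val scorePartitions = Seq(Seq(68, 71), Seq(73))
val (sum, count) = aggregateLocal(scorePartitions, (0, 0))(
  (acc, score) => (acc._1 + score, acc._2 + 1),
  (a, b) => (a._1 + b._1, a._2 + b._2)
)
val average = sum.toDouble / count
```

Splitting the same scores into different partitions yields the same (sum, count), because combOp is associative and (0, 0) is its identity.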