FP & the 'Context Problem'

[Originally written 2017-09-22. I don't have time to finish this post now, so I might as well just publish it while it's still not rotted.]

While coding large backend applications in Clojure I noticed a pattern that continued to pop up.

When learning FP initially, you initially learn the basics: your function should not rely on outside state. It should not mutate it, nor observe it, unless it's explicitly passed in as an argument to the function. This rule generally includes mutable resources in the same namespace, e.g. an atom, although constant values are still allowed. Any atom that you want to access must be passed in to the function.

Now, this makes total sense at first, and it allows us to easily implement the pattern described in Gary Bernhardt's talk "Boundaries", of "Functional Core, Imperative Shell" [FCIS]. This means that we do all I/O at the boundaries.

(defn sweep [db-spec]
  (let [all-users (get-all-users db-spec)]
    (let [expired-users (get-expired-users all-users)]
      (doseq [user expired-users]
        (send-billing-problem-email! user)))))

This is a translation of Gary's example. A few notes on this implementation.

sweep as a whole is considered part of the imperative shell.
get-all-users and send-billing-problem-email! are what we'll loosely refer to as "boundary functions".
get-expired-users is the "functional core".

The difference that Gary stresses is that the get-expired-users function contains all the decisions and no dependencies. That is, all the conditionals are in the get-expired-users function. That function purely operates on a data in, data out basis: it knows nothing about I/O.

This is a small-scale paradigm shift for most hackers, who are used to interspersing their conditionals with output; consider your typical older-school PHP bespoke system, which is bursting with DB queries that have their result spliced directly into pages. But, this works very well for this simple example. It accomplishes the goal of making everything testable pretty well. And you'd be surprised how far overall this method can take you.

It formalizes as this: Whenever you have a function that intersperses I/O with logic, separate out the logic and the I/O, and apply them separately. This is usually harder for output than for input, but it's usually possible to construct some kind of data representation of what output operation should in fact be effected -- what I'll call an "output command" -- and pipe that data to a "dumb" driver that just executes that command.

You can reconstruct most procedures in this way. The majority of problems, particularly in a backend REST system, break down to "do some input operation", "run some logic", "do some output operation". Here I'm referring to the database as the source and target of IO. This is the 3-tier architecture described by Fowler in PoEAA.

However, you probably noticed an inefficiency in the code above. Likely we get all users and then decide within the language runtime whether a given user is expired or not. We've given up the ability of the database to answer this question for us. Now we're reading the entire set of users into memory, and mapping them to objects, before we make any decision about whether they're expired or not.

Realistically, this isn't likely to be a problem, depending on the number of users. Obviously Gmail is going to have a problem with this approach. But surely you're fine until perhaps 10,000 users, assuming that your mapping code is relatively efficient.

Anyway, this isn't the problem that led me to discover this. The problem happened when I was implementing the basics of the REST API, and attempting to be as RESTfully-correct as possible, I wanted to use linking. This seems easy, when you only need to produce internal links, right? In JSON, we chose a certain representation (the Stormpath representation).

GET /users/1

{
   "name": "Dave",
   "pet": {"href": <url>},
   "age": 31
}

Now, assume we also have a separate resource for a user's pet. In REST, that's represented by the URL /pets/1 for a pet with identifier 1. We have the ability to indicate this pet through either relative or absolute URLs. Assume that our base URL for the API is https://cool-pet-tracker.solasistim.net/api.

The relative URL is /pets/1.
The absolute URL is https://cool-pet-tracker.solasistim.net/api.

If you search around a bit, you'll find that from what small amount of consensus exists, REST URLs that get returned are always required to be absolute. This pretty much makes sense, given that a link represents a concrete resource that is available at a certain point in time, in the sense of "Cool URLs Don't Change".

Now the problem becomes, say we have a function that attempt to implement the /users/:n API. We'll write this specifically NOT in the FCIS style, so we'll entangle the I/O. (Syntax is specific to Rook.)

 (defn show [id ^:injection db-spec]
   (let [result (get-user db-spec {:user (parse-int-strict id)})]
     {:name (:name result)
      :pet nil
      :age (:age result)}))

You'll notice that I left out the formation of the link. Let's add the link.

 (defn show [id request ^:injection db-spec]
   (let [result (get-user db-spec {:user (parse-int-strict id)})]
     {:name (:name result)
      :pet (make-rest-link request "/pet" (:pet_id result))
      :age (:age result)}))

Now, we define make-rest-link naively as something like this.

 (defn make-rest-link [request endpoint id]
    (format "%s/%s/%s" (get-in request [:headers "host"])
                       endpoint
                       id))

Yeah, there's some edges missed here but that's the gist of it. The point is that we use whatever Host URI was requested to send back the linked result. [This has some issues with reverse proxy servers that sometimes calls for a more complicated solution, but that's outside the scope of this document.]

Now did you notice the issue? We had to add the request to the signature of the function. Now, that's pretty much a small deal in this case: the use of the request is a key part of the function's purpose, and it makes sense for every function to have knowledge of it. But just imagine that we were dealing with a deeply nested hierarchy.

(defn form-branch []
  {:something-else 44})

(defn form-tree []
  {:something-else 43
   :branch (form-branch)})

(defn form-hole [id]
   {:something 42
    :tree (form-tree)})

(defn show [id ^:injection db-spec]
  (form-hole id))

As you can see, this is a nested structure: a hole has a tree and that tree itself has a branch. That's fine so far, but we don't really want to go any deeper than 3 layers. Now, the branch gets a "limb" (this is a synonym for "bough", a large branch). But we only want to put a link to it.

(defn form-limb [request]
  (make-rest-link request "/limbs" 1))

(defn form-branch [request]
  {:something-else 44
   :limb (form-limb request)})})

(defn form-tree [request]
  {:something-else 43
   :branch (form-branch request)})

(defn form-hole [id request]
   {:something 42
    :tree (form-tree request)})

(defn show [id request ^:injection db-spec]
  (form-hole id request))

Now we have a refactoring nightmare. All of the intermediate functions, that mirror the structure of the entity, had to be updated to know about the request. Even though they themselves did not examine the request at all. This isn't bad just because of the manual work involved: it's bad because it clouds the intent of the function.

Now anyone worth their salt will be thinking of ways to improve this. We could cleverly invert control and represent links as functions.

(defn form-branch []
   {:something-else 44
    :limb #(make-rest-link % "/limbs" 1)})

Then, though, we need to run over the entire structure before coercing it to REST and specially treat any functions. This could be accomplished using clojure.walk and it would probably work OK.

What's actually being required here? What's happened is a function deep in the call stack has a need for context that's only available in the outside of the stack. But, that information is really only peripheral to its purpose. As you can see we were able to form an adequate representation of the link as a function, which by no means obscures its purpose from the reader. If anything the purpose is clearer.

This problem can also pop up in other circumstances that seem less egregious. In general, any circumstance where you need to use I/O for a small part of the result at a deep level in the stack will result in a refactoring cascade as all intervening functions end up with added parameters. There are several ways to ameliorate this.

1: The "class" method

This method bundles up the context with the functionality as a record. The context then becomes referrable to by any function within that protocol.

(defprotocol HoleShower
  (show [id] "Create JSON-able representation of the given hole."))

(defrecord SQLHoleShower [request db-spec]
  HoleShower
  (show [this id]
    {:something 42
     :tree (form-tree id)})
  (form-tree [this id]
    {:something-else 44
     :branch (make-rest-link request "/branches" 1)}))

As you can see, we don't need to explicitly pass request because every instance of an SQLHoleShower automatically has access to the request that was used to construct it. However, it has the very large downside that these functions then become untestable outside of the context of an SQLHoleShower. They're defined, but not that useful.

2. The method of `maker`

This is a library by Tamas Jung that implements a kind of implicit dependency resolution algorithm. Presumably it's a topo-sort equivalent to the system logic in Component.

(ns clojure-playground.maker-demo
  (:require [maker.core :as maker]))

(def stop-fns (atom (list)))

(def stop-fn
  (partial swap! stop-fns conj))

(maker/defgoal config []
  (stop-fn #(println "stop the config"))
  "the config")

;; has the more basic 'config' as a dependency
;; You can see that 'defgoal' actually transparently manufactures the dependencies.
;; After calling (make db-conn), (count @stop-fns) = 2:
;; that means that both db-conn AND its dependency config were constructed.

(maker/defgoal db-conn [config]
 (stop-fn #(println "stop the db-conn"))
  (str "the db-conn"))

;; This will fail at runtime with 'Unknown goal', until we also defgoal `foo`
(maker/defgoal my-other-goal [foo]
  (str "somthing else"))

The macro defgoal defines a kind of second class 'goal' which is only known about by the maker machinery. When a single item anywhere in the graph is "made" using the make function, the library knows how to resolve all the intermediaries. It's kind of isomorphic to the approach taken by Claro, although it relies on more magical macrology.

https://www.niwi.nz/2016/03/05/fetching-and-aggregating-remote-data-with-urania/ https://github.com/kachayev/muse https://github.com/facebook/Haxl https://www.youtube.com/watch?v=VVpmMfT8aYw

See this: Retaking Rules for developers: https://www.youtube.com/watch?v=Z6oVuYmRgkk&feature=youtu.be&t=9m54s

And of course, the "Out of the Tar Pit" paper.

Update 2018-08-14: Two other solutions to this broad problem are the Reader monad (see this great Pascal Hartig article and what Sinclair refers to as the Effect functor.

Update 2020-08-12: This same problem reappears in React, where it's termed 'prop drilling' and is handled by use of the Context API (along with the useContext hook), and also the older 'render props' technique.

1: The "class" method

2. The method of maker

2. The method of `maker`