During the conference circuit in 2018, we paid close attention to talks about how other companies are doing A/B testing. We do a lot of A/B testing at G/O Media so this is a topic near and dear to our hearts.

One of the themes we noticed immediately is that many companies have fully featured, no-fuss A/B testing dashboards. These dashboards let users set up and run A/B tests, monitor their behavior over the course of the test, stop them when they’re done, and automatically pull relevant results. By building these dashboards, these teams were able to make A/B testing a fully streamlined, even easy, process.

Our current process is significantly more manual. Setting up our A/B tests involves coordinating a number of systems: GA Experiments, GAM key-values, our CDN and our own features service. Individually these work pretty well, but pulling the levers on the individual pieces has historically required a bunch of manual effort. Moreover, we have to pull results by hand from many different sources, unlike companies whose dashboards sit on top of a single, fully in-house analytics pipeline.

Naturally, building tools that would make this process more streamlined appealed deeply to us. We had already done some preliminary work here: we’d built out a number of CLI tools to help with this process, and in May last year we did a hackathon project that tested out some of these ideas in a simple Flask app. After seeing enough of these talks about A/B testing dashboards and saying to ourselves that it would be totally sick to have one of these ourselves, the engineering team decided to build such a dashboard deeply integrated into Kinja’s admin interfaces.

Kinja, like many content platforms, is built out of a front-end that serves HTML and JavaScript and a number of backends that the front-end and browser communicate with over HTTP and JSON. Our A/B testing system was to be no exception.


This post is about that backend layer. We’ll explore why we chose Python and Flask for our stack and why we needed to build an API framework on top of them. Then, we’ll get into the API framework our backend team built on top of Scala and Play. We’ll go over some of the good features of this framework, in particular a control abstraction called an Accumulator and a serialization system built on top of data objects. Then we’ll discuss how these work (or don’t!) in the Python world. Finally, we’ll get into the nitty-gritty and go over how we carried these concepts over to Python, with code!

Why Flask? Why Python?

One of the consequences of choosing to deeply integrate our A/B testing dashboard into Kinja was that we had to build the necessary infrastructure into Kinja’s primary AWS account. The data team’s work is often fairly siloed from the rest of the engineering team. We have our own AWS accounts, for instance, and none of the Kinja APIs call into or out of our data infrastructure at an HTTP level (typically, our integrations are at the database level). Because of this integration, however, we needed a Kinja-native A/B testing API that our UI can call into to manage our tests.


The Kinja backend team generally uses Scala, Play and RabbitMQ to implement their services, and over time has built a fairly comprehensive platform around that stack. Our team, however, mostly works in Python. Additionally, all of the existing tooling for managing A/B tests is written in Python. Rather than reimplementing this functionality in Scala, building a Play service and then needing the backend team to manage our A/B testing tooling, we decided to write our new service in Python, leveraging the existing libraries and making a service that, hopefully, the data team will be comfortable maintaining and iterating on for years to come.

For our web framework, we chose Flask. I have my criticisms of Flask, but Flask also has a number of distinct advantages for our team. For one, members of our team have used Flask previously to build simple dashboards, as well as our ingestion pipeline’s admin interface. Flask is also relatively lightweight - this does mean that less stuff is done for us, but it also means that there’s less stuff that we need to customize. Finally, Flask plays relatively nicely with SQLAlchemy, our database access library, via the Flask-SQLAlchemy Flask extension.

API Frameworks

In my experience, most backend teams design standards for their internal APIs and then over time build a bunch of custom libraries and additions/modifications on top of their base stack, constituting what I like to call an API framework. API frameworks aren’t unusual - in fact open source frameworks such as Django REST framework are quite common! However, many stacks are idiosyncratic enough that one size doesn’t fit all. Customization still happens.


Our backend team has similarly adapted Play to its own needs by building such a framework. This sort of thing makes it much easier to build a consistent new backend service, allowing engineers to think more about the logic of their APIs. In general, when we make a new backend service, our backend team can iterate pretty quickly by using an existing backend service template and idioms shared across the Kinja backend codebases. The advantages are similar to the ones you get from a framework such as vanilla Flask, just specific to HTTP/JSON APIs and to our business’s domain.

A major disadvantage of using Python and Flask for our A/B testing backend is that we aren’t able to leverage this existing API framework. Traditionally, the data team has only used Flask to build out simple HTML dashboards, and over the last few years we’ve been moving away from them and towards building dashboards in our BI tool. This means that we had to build out a bunch of Python infrastructure to stand in place of the backend team’s existing platform - in other words, we had to port the Kinja backend team’s API framework to Python and Flask.

What Do The Scala APIs Look Like?

Our goal then was to build an API framework that will allow us to set up a Kinja-style API in Python. This meant taking idioms developed over the years by the Kinja backend team in Scala, and doing our best to port them to Python. Ideally, a user shouldn’t be able to tell whether they’re working with a Scala service or a Python service - the shape of the APIs should be indistinguishable. At the Python layer, things should still feel Pythonic while also retaining the best features of the Scala APIs.


In light of that, it will be instructive to look at how the Scala APIs work, so that we can pick the best ideas to port to Python. In this next section, we’ll look at a completely fake, made-up example that involves working with a simple Person model, roll it around, and try to understand what the pieces do.

I don’t expect you to be fluent in Scala! That said, you should be game to learn some Scala and perhaps be a little confused. If Scala doesn’t interest you, feel free to skip this section and go straight to the part where we write the Python framework.

Kinja APIs always return a JSON payload with the same general shape. For our users API example, such a payload might look like this:


Any route that runs on Kinja may either return a non-null error (with a code and a message) or data. A successful response may also come with a list of warnings. For our purposes, we can skip the details of how warnings work and focus on errors and successful results.
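To make that shape concrete, here is a minimal Python sketch of building such payloads. The `data`/`error`/`warnings` keys and the `code`/`message` fields follow the description above, but the exact layout of a real Kinja payload is an assumption here, and `success_payload` and `error_payload` are hypothetical helper names.

```python
import json
from typing import Any, List, Optional


def success_payload(data: Any, warnings: Optional[List[str]] = None) -> dict:
    """Hypothetical helper: data is set, error is null, warnings may be attached."""
    return {"data": data, "error": None, "warnings": warnings or []}


def error_payload(code: str, message: str) -> dict:
    """Hypothetical helper: error carries a code and a message, data is null."""
    return {"data": None, "error": {"code": code, "message": message}, "warnings": []}


print(json.dumps(success_payload({"id": 1, "name": "Ada"})))
print(json.dumps(error_payload("NotFound", "No such person")))
```

Keeping every key present in both cases means clients can always check `error` first and only then read `data`.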

The example controller that supports creating people in our example API looks something like this:


At a really high level, what this code is doing is pulling user fields from the POST body, saving them to a database, and returning the result.

There are a few major things to explore here. The first of those is figuring out what “for”, “<-” and “yield” mean in Scala (not the same things as in Python). The second, which is related, is what the API is doing to render the result (a saved Person) into a JSON response. Finally, it’s worth looking at how the model and database code works, since we’ll need some form of that in our app as well.

What’s a for comprehension? What’s an Acc?

A for comprehension is an expression that, as far as I know, has no exact equivalent in any other language, though it has some similarities to Python list comprehensions, Haskell do blocks, and async/await. “Really? These all seem pretty different,” you may be saying, but bear with me.


For comprehensions are ultimately sugar on top of map and flatMap methods of, well, things that have map and flatMap defined on them. This means that the most intuitive example is going to involve list-like data structures.

For instance, here’s a simple Scala for comprehension:

for (i <- 0 until 3; j <- 0 until 3) yield i + j

which in a REPL returns:

res0: scala.collection.immutable.IndexedSeq[Int] = Vector(0, 1, 2, 1, 2, 3, 2, 3, 4)


The idiomatic way to write this expression in Python is with a list comprehension:

>>> [i + j for i in range(3) for j in range(3)]
[0, 1, 2, 1, 2, 3, 2, 3, 4]

or, less idiomatically but showing more directly what the for comprehension is doing:


>>> from itertools import chain
>>> list(
...     chain.from_iterable(
...         map(
...             lambda i: map(
...                 lambda j: i + j,
...                 range(3)
...             ),
...             range(3)
...         )
...     )
... )
[0, 1, 2, 1, 2, 3, 2, 3, 4]

This is actually fairly similar to what the for comprehension in Scala is doing. We don’t have a flatMap function in Python as far as I know, so we have to use itertools.chain, but conceptually it’s the same - the for comprehension under the hood gets rewritten as maps and flatMaps. In this way, it can be used similarly to list comprehensions.
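Spelled out, the Scala expression `for (i <- xs; j <- ys) yield i + j` desugars to `xs.flatMap(i => ys.map(j => i + j))`: every generator but the last becomes a flatMap, and the last becomes a map. A small Python `flat_map` helper (our own name, not a standard library function) makes that desugaring explicit:

```python
from itertools import chain
from typing import Callable, Iterable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def flat_map(f: Callable[[A], Iterable[B]], xs: Iterable[A]) -> List[B]:
    """Apply f to each element, then flatten the resulting iterables by one level."""
    return list(chain.from_iterable(map(f, xs)))


# for (i <- 0 until 3; j <- 0 until 3) yield i + j
# desugars to (0 until 3).flatMap(i => (0 until 3).map(j => i + j))
result = flat_map(lambda i: map(lambda j: i + j, range(3)), range(3))
print(result)  # [0, 1, 2, 1, 2, 3, 2, 3, 4]
```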

The interesting part about for comprehensions though, and the thing that makes it useful to us in this context, is that they can operate on any objects that implement “map” and “flatMap” as methods. In Scala, this turns out to be a lot of things, including Scala Futures and, more importantly, an abstraction built by our backend team that has similar use cases to those of Futures but custom-tailored, called an “Acc” - short for “accumulator”.


In fact, if you squint hard enough, “<-” looks a lot like the “await” keyword in Python, and “for” seems to stand in place for “async”. Perhaps if we had implemented our service in asyncio and were using some mysterious analogous framework, our controller would look something like this:
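Here is a minimal, runnable sketch of that idea. It is not real Kinja or framework code: `parse_person` and `save_person` are hypothetical stand-ins, and `asyncio.sleep(0)` simulates the asynchronous I/O a real service would perform.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    email: str


async def parse_person(body: dict) -> Person:
    # Hypothetical: build a Person from the request body.
    await asyncio.sleep(0)  # stand-in for asynchronous work
    return Person(name=body["name"], email=body["email"])


async def save_person(person: Person) -> Person:
    # Hypothetical: asyncio.sleep(0) stands in for a database write.
    await asyncio.sleep(0)
    return person


async def create(body: dict) -> Person:
    # Each await plays the role of a "<-" binding in the Scala for comprehension.
    person = await parse_person(body)
    saved = await save_person(person)
    return saved  # analogous to "yield saved"


print(asyncio.run(create({"name": "Ada", "email": "ada@example.com"})))
```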

Like in the Scala example, our async def under the hood works with and returns an abstraction that wraps asynchronous results - in our case, asyncio coroutines and Futures.


I don’t want to mislead, however, as coroutines have significant differences from simple maps. In Python, coroutines are generators of Futures (or awaitables more generally) and additional coroutines, and “running” a coroutine involves exhausting that generator and attaching behavior to the “add_done_callback” method of each Future. This is unlike the for comprehension, where the returned result is already a Future (or Acc or whatever compatible type you use with it). Additionally, within the control flow of the coroutine, an exception wrapped by an awaited Future is re-raised. In other words, you can wrap an await in a try/except:

try:
    result = await async_call()
except SomeError as exc:
    # do something with this error
    ...

No such behavior happens with for comprehensions. In Scala it’s generally considered bad form to raise exceptions, and in fact our Scala code always keeps error objects wrapped in Accs. For our Scala Futures, calling “map” on a Future wrapping an error datatype will be a noop. The end result, however, is similar, since the result of calling ensure_future on a coroutine is a Future (specifically a Task) which wraps any unhandled errors.
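To make the “map on an error is a noop” behavior concrete, here is a tiny, hypothetical `Result` type with `map` and `flat_map`. It is not the backend team’s Acc, just an illustration of the same idea: once a step fails, later steps are skipped and the error stays wrapped instead of being raised.

```python
from typing import Any, Callable, Optional


class Result:
    """Hypothetical container holding either a value or an error; it never raises."""

    def __init__(self, value: Any = None, error: Optional[Exception] = None):
        self.value = value
        self.error = error

    def map(self, f: Callable[[Any], Any]) -> "Result":
        # On an error, map is a no-op: the error is passed along unchanged.
        if self.error is not None:
            return self
        return Result(value=f(self.value))

    def flat_map(self, f: Callable[[Any], "Result"]) -> "Result":
        # f returns a Result itself, so return it directly instead of nesting.
        if self.error is not None:
            return self
        return f(self.value)


def parse(body: dict) -> Result:
    if "name" not in body:
        return Result(error=ValueError("missing name"))
    return Result(value=body["name"])


ok = parse({"name": "ada"}).map(str.title)
failed = parse({}).map(str.title)  # str.title never runs
print(ok.value, failed.error)
```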


Either way, the take-away from all of this is that ultimately we need some way of managing both successful results and exceptions. Because we’re using Flask and not asyncio, our implementation won’t strictly have a class that’s analogous to Kinja’s Acc or asyncio’s Future; instead, we’ll be dealing with returned results and raised exceptions. Even so, with a little bit of work to adapt this idea to the control flow abstractions idiomatic to Flask, we can achieve the same goals.

What’s a Person and how do you save one?

You may have noticed that instead of returning some sort of Response object (or rather, a Response object wrapped in an Acc), the Scala code is returning a wrapped “Person”, the thing returned from personLogic.create. So what’s a Person?


Our definition of a Person in Scala looks simple enough:

Case classes in Scala are a special type of class designed to represent data. They’re called case classes largely because of how they can be used with pattern matching, a pretty cool Scala feature that we don’t happen to need here.


More important to us, they define a bunch of standard behavior without a lot of boilerplate. The closest things to them in Python are the all-new dataclasses (added in Python 3.7) and the older but more common attrs classes. Like Scala case classes, attrs defines a reasonable __init__ method and includes more features besides. For example, an analogous attrs class might look like this:
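Since attrs is a third-party package, here is a sketch using the standard-library dataclasses module instead; the attrs version is nearly identical, swapping `@dataclass` for `@attr.s(auto_attribs=True)`. The fields are made up for illustration.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    # Hypothetical fields; frozen=True mirrors the immutability of case classes.
    name: str
    email: str
    id: Optional[int] = None


ada = Person(name="Ada", email="ada@example.com")
print(ada)  # generated __repr__
print(ada == Person("Ada", "ada@example.com"))  # generated __eq__ compares fields
print(asdict(ada))  # plain dict, one step away from JSON
```

As with a case class, `__init__`, `__repr__` and `__eq__` come for free, which is what makes these classes convenient as plain data carriers.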

In both the Scala/case class and Python/attrs situations, these classes are mostly used for representing structured data.


The Scala apps use case classes as their models and use a repository pattern to keep data access and database writes cleanly separated from data representation. We, on the other hand, are using the SQLAlchemy ORM, which is highly regarded not just on our team but in the Python community more generally. However, it couples data access and data representation fairly tightly. What this means is that while our Scala controller only needs to know how to handle case classes, our Python code will have to handle SQLAlchemy models directly.

How do you turn an Acc and a Person into JSON?

Ultimately, the data structures wrapped in an Acc have to be translated to JSON - but how?


JSON handling in Play is super powerful, but it uses advanced Scala features that are frankly hard to understand. Rather than get into exactly how JSON serialization works in Scala, I’m going to go over some high-level features of Play’s JSON library.

In general, when serializing data structures to JSON, Play looks for an implementation of a serializer for that data structure - called a Writes[T] for anyone still reading Play’s JSON documentation. These can be implemented for any data structure T. Case classes happen to be super straightforward to serialize, and in fact Play has special support for them - not 100% automatic, but pretty close.

The JSON library in Python by default knows how to handle simple data structures such as lists and dicts, but doesn’t know how to handle arbitrary classes, such as attrs classes and SQLAlchemy models. This seems to present a minor challenge to us, but as it turns out there are libraries in Python that can be used to implement more advanced serialization techniques - more on that later.
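The standard library does offer one escape hatch: `json.dumps` accepts a `default` callable that is invoked for any object it can’t encode natively. The sketch below uses it with a dataclass (the `Person` fields are illustrative). It works, but a single catch-all function grows awkward as the number of types increases, which is part of why a dedicated serialization library is attractive.

```python
import json
from dataclasses import asdict, dataclass, is_dataclass


@dataclass
class Person:
    name: str
    email: str


def to_jsonable(obj):
    # Called by json.dumps only for objects it doesn't already know how to encode.
    if is_dataclass(obj) and not isinstance(obj, type):
        return asdict(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


person = Person("Ada", "ada@example.com")

try:
    json.dumps(person)
except TypeError as exc:
    print("plain json.dumps fails:", exc)

print(json.dumps({"data": person}, default=to_jsonable))
```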


Ultimate Take-Aways from Scala and Play

The techniques used in Scala and Play, while in an environment wildly different from Python and Flask, still have some lessons to teach us, and some ideas that we can implement in our API framework.

First, while a straight port of the backend team’s Acc is a weird flex in Flask and Python (I actually tried this initially, and it was bad), there is in fact a general lesson around control flow. The techniques used in a synchronous context with exceptions are pretty different, but ultimately we have many of the same needs: we need to be able to generate result data structures, and we need to be able to contain and handle exceptions.


We also need to think about data access and data serialization, two interrelated concerns. We don’t need to implement the repository pattern in SQLAlchemy (SQLAlchemy’s ORM offers an alternate set of abstractions), but we do need to manage SQLAlchemy models and we need to be able to serialize them. In fact, ideally we would be able to serialize all sorts of data structures, with the capability of customizing that serialization in a way analogous to what we do in Play.

What Does This Look Like In Flask?

This app written with Flask and using our framework could look something like this:


Some things I want to call out directly:

  1. We’re using blueprints to collect routes for later registration with our Flask app, while the Scala app exports the controllers and uses a routes file to configure Play. This is why our Flask example has routing code, while our Scala example does not. Ultimately this is fine and doesn’t require much additional attention.
  2. Instead of case classes, we’re using Flask-SQLAlchemy and SQLAlchemy models. As previously mentioned, this just means that we have to serialize SQLAlchemy models instead of (or in addition to) attrs objects. This snippet includes not just the constructed model, but also a custom “unstructure hook” for telling our framework how to convert our data structure into something more readily JSON-serializable. More on what this is actually doing further down the page.
  3. Unlike the Play app or the asyncio handler, there’s no aggregation of results or errors here; exceptions are raised directly. This means our app has to be able to capture those errors and handle them by returning an appropriate status code. The ultimate functionality is analogous to what we saw in Play, but more idiomatic for how Flask is used.


This means that in order to make Flask meet our use cases, we need to do the following:

  1. We need to figure out how to get Flask to serialize the types of objects we want to return in our handlers into meaningful JSON. These objects can be simple data structures, but at a minimum they should also include SQLAlchemy models, and ideally other structured data objects such as attrs objects as well. By default Flask can handle Response objects and will treat strings as HTML, so we’ll need to do some work here to override or modify this behavior.
  2. We also need to make sure that we can handle errors. Flask’s error handling exists but is super simple and returns HTML rather than JSON, so we’ll want to modify the default error handling. Lucky for us, this is a fairly common and well-supported case in Flask.


Our end framework supports significant amounts of additional functionality on top of this, but for the purposes of this post I’ll focus on these fundamentals.

Serializing Results in Flask

In Flask, results returned by route handlers are ultimately passed to the make_response method. In case that link is dead, the current docstring looks like this:


As can be seen, this method can handle a lot of different kinds of values! Overriding this method means reimplementing a bunch of this logic. In our case, we wanted to make two important changes:

  1. In stock Flask, returning None from your handler raises a TypeError, which turns into a 500 error. In our case, None is a valid result, since it will JSON-serialize to null.
  2. For non-Response return values, we want to attempt to JSON-serialize them.

The code inside our custom make_response, in a subclass of Flask, then looks pretty similar to what’s in stock Flask, but this section in the middle looks a little different:


In this handler, we skip the None check, explicitly call force_type on Response objects from Werkzeug (distinct from response_class!), and then finally call a function called “success” that we wrote for serializing the return value.
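The two changes can be illustrated without Flask at all. In this sketch, `Response` and `to_response` are hypothetical stand-ins rather than Flask or Werkzeug code, and the envelope keys are assumptions; a real override would live in a `make_response` method on a Flask subclass.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Response:
    # Hypothetical stand-in for a framework response object.
    body: str
    status: int = 200
    headers: Dict[str, str] = field(default_factory=dict)


def to_response(rv: Any) -> Response:
    status: Optional[int] = None
    # Like Flask, accept a (value, status) tuple.
    if isinstance(rv, tuple):
        rv, status = rv
    # Ready-made responses pass through, optionally with an overridden status.
    if isinstance(rv, Response):
        if status is not None:
            rv.status = status
        return rv
    # Everything else, including None, is serialized; None becomes JSON null.
    body = json.dumps({"data": rv, "error": None, "warnings": []})
    return Response(body, status if status is not None else 200,
                    {"Content-Type": "application/json"})


print(to_response(None).body)
print(to_response(({"id": 1}, 201)).status)
```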

That “success” function looks like this:


which calls create_response:

which ultimately calls this helper:


You can see that we took pains to keep the behavior of these largely compatible with Flask’s flask.json.jsonify helper. The important part of all this, the part that makes it all work, is this one line from _to_json:


The magic then is in the cattrs library. There are other ways of managing serialization and deserialization of complex data structures in Python - marshmallow also seems common amongst other Python engineers I’ve talked to - but cattrs is nice because we can use it with attrs classes the same way that we use case classes in Scala.


The cattrs library also supports a very important need of ours, which is customizing this behavior for non-attrs classes, namely with the cattr.register_unstructure_hook function. The Scala apps find needs for customizing JSON serialization as well, but we have a particular need for it because of how we do data access.
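In cattrs, a hook is registered with a call like `cattr.register_unstructure_hook(SomeClass, func)`. As a dependency-free illustration of the same per-type idea (not how cattrs is implemented), `functools.singledispatch` can route each type to its own converter. `PersonRow` is a hypothetical stand-in for an ORM model that isn’t an attrs class.

```python
import json
from datetime import datetime, timezone
from functools import singledispatch


@singledispatch
def unstructure(obj):
    # Fallback: assume obj is already JSON-friendly (str, int, None, ...).
    return obj


@unstructure.register(dict)
def _(obj):
    return {key: unstructure(value) for key, value in obj.items()}


@unstructure.register(list)
def _(obj):
    return [unstructure(item) for item in obj]


@unstructure.register(datetime)
def _(obj):
    return obj.isoformat()


class PersonRow:
    """Hypothetical stand-in for an ORM model (not a dataclass or attrs class)."""

    def __init__(self, id, name, created_at):
        self.id = id
        self.name = name
        self.created_at = created_at


@unstructure.register(PersonRow)
def _(obj):
    # Custom hook: choose which columns to expose, unstructuring them recursively.
    return {"id": obj.id, "name": obj.name, "created_at": unstructure(obj.created_at)}


row = PersonRow(1, "Ada", datetime(2019, 1, 1, tzinfo=timezone.utc))
print(json.dumps(unstructure({"data": [row]})))
```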

Error Handling in Flask

Once serialization is handled, next comes error handling.

Flask comes with a built-in system for registering error handlers: you give it an exception class and a callback that converts that exception into a return value that our make_response method will know how to handle.
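Flask picks a handler by walking the raised exception’s class hierarchy (its MRO) and using the most specific registered class. Here is a framework-free sketch of that lookup; `ErrorRegistry`, `NotFound` and the handlers are hypothetical, and Flask’s real lookup also considers blueprints and HTTP status codes.

```python
from typing import Callable, Dict, Type


class ErrorRegistry:
    """Hypothetical sketch of exception-class-based handler lookup."""

    def __init__(self) -> None:
        self.handlers: Dict[Type[BaseException], Callable] = {}

    def register_error_handler(self, exc_class: Type[BaseException], handler: Callable) -> None:
        self.handlers[exc_class] = handler

    def handle(self, exc: BaseException):
        # Walk from the most specific class to the most general one.
        for cls in type(exc).__mro__:
            handler = self.handlers.get(cls)
            if handler is not None:
                return handler(exc)
        raise exc  # no handler registered: let it propagate


class NotFound(Exception):
    code = 404


registry = ErrorRegistry()
registry.register_error_handler(Exception, lambda exc: ({"error": {"code": "InternalError"}}, 500))
registry.register_error_handler(NotFound, lambda exc: ({"error": {"code": "NotFound"}}, exc.code))

print(registry.handle(NotFound()))     # the most specific handler wins
print(registry.handle(KeyError("x")))  # falls back to the Exception handler
```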


Flask suggests using a pattern that they call app factories. We ended up implementing this pattern in the create_app function used in our earlier route example. Our create_app function merely instantiates our subclass of Flask, sets up some internal state, registers these error handlers, adds some health routes, and returns the app object. That looks like this:

You can see here that we ended up having to register handlers for a few different exception classes. Overriding the default exception handling requires registering a handler for Exception, but we also had to create one for Invalid errors from our schema validation library and for HTTP errors. I’ll focus on that last one.


Our app uses Werkzeug exceptions to represent HTTP-compatible error states. We capture them by creating an error handler for Werkzeug’s HTTPException like so:

app.register_error_handler(HTTPException, http_error_handler)

and the actual handler looks like this:


log_and_generate_uid creates a uid, logs the exception with that uid and returns the uid so we can include it in the error response. This is an idea that we borrowed from the backend team, which uses these uids to find the stack traces corresponding to errors returned to users. kinja_error_code_from_exc simply returns a string representation of our Werkzeug error which matches the string representations seen in the Scala apps.
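A helper like log_and_generate_uid could look something like the following minimal sketch, assuming a random UUID and the standard logging module. The real helper’s uid format and logger aren’t shown in this post, and the logger name here is hypothetical.

```python
import logging
import uuid

logger = logging.getLogger("ab_testing_api")  # hypothetical logger name


def log_and_generate_uid(exc: BaseException) -> str:
    """Log exc with its traceback under a fresh id and return that id."""
    uid = uuid.uuid4().hex
    # exc_info=exc attaches the traceback, so the uid can later be matched to it.
    logger.error("Unhandled error [uid=%s]: %s", uid, exc, exc_info=exc)
    return uid


try:
    1 / 0
except ZeroDivisionError as exc:
    uid = log_and_generate_uid(exc)

print(uid)  # the same id would be returned to the client in the error payload
```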

“error” looks a lot like the “success” helper from before. The secret within is that Werkzeug exceptions typically have a ‘code’ property which contains the HTTP status code (distinct from the Kinja error code!), so we’re able to use that status directly if none is supplied:

status = status if status else getattr(exc, 'code', 500)
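Put together, an error helper along these lines might look like the sketch below. Its signature, the envelope keys and the `HTTPError` stand-in class are assumptions; only the status fallback comes from the line above.

```python
import json
from typing import Optional, Tuple


class HTTPError(Exception):
    """Hypothetical stand-in for a Werkzeug-style exception carrying a status code."""

    code = 404


def error(exc: BaseException, kinja_code: str, uid: str,
          status: Optional[int] = None) -> Tuple[str, int]:
    # Prefer an explicit status, then the exception's own HTTP code, then 500.
    status = status if status else getattr(exc, "code", 500)
    body = {
        "data": None,
        "error": {"code": kinja_code, "message": str(exc), "uid": uid},
        "warnings": [],
    }
    return json.dumps(body), status


print(error(HTTPError("no such person"), "NotFound", "abc123"))
print(error(ValueError("boom"), "InternalError", "def456"))
```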

Putting all of this together: When an exception is raised inside a controller like the “create” function in this example, Flask finds the error handler registered in our app factory that best matches that exception - say a Werkzeug HTTPException - and passes the exception along to the registered handler. Our error handler then kicks in, converting the exception into a more readily serializable data structure. Finally, that serialized data structure is passed to our overridden make_response method, which then uses cattrs to finish the job. How cool!



In order to build a nice A/B testing UI, we needed to create an API framework on top of Python and Flask. Our backend team uses Scala and Play. This meant that while we couldn’t use their work directly, we were able to learn lessons from it and apply them to our own framework.

We saw that in Scala, our backend team uses a custom control flow abstraction called an Acc in combination with for comprehensions to manage asynchronous actions and collect results and error states without raising exceptions. This is unlike the case with Flask, which expects synchronous behavior and raised exceptions in cases of errors. We also saw that in Play we can serialize fairly complicated data structures, a feature we wanted to implement in our own framework.


Flask’s internal APIs enable handling both exceptions and complex structured results. By overriding the make_response method and adding the cattrs library, we were able to get Flask to not only JSON-serialize simple data structures automatically but also add behavior and hooks for serializing more complicated data structures such as attrs classes and SQLAlchemy models.

So that’s how we customized Flask for use as an API framework! Or, at least, this is the beginning. We of course had to implement our app.requires_auth decorator as seen in an earlier example, as well as implementing warnings (hint: we used the `g` object and an importable `warn` function). We also had to grapple with how the Play apps manage application dependencies - perhaps in a follow-up blog post!