The Kinja data team uses Google Analytics as our “source of truth” on traffic and user behavior. As such, our A/B testing system has been deeply integrated with the analytics platform for a long time. Unfortunately, we recently ran into an issue with how our system works with GA.
We noted in previous blog posts that we use a server-side implementation of Google Content Experiments, which has allowed us to collect data within GA’s built-in dimensions. We created our kinja-experiments API around this, but ran into a challenge when Google shut down Content Experiments and pushed users to Google Optimize in August.
To integrate kinja-experiments with GA, we use the Analytics Management API, which allows one to manage tests that are stored in GA. We’ve used this to programmatically create, start, and stop experiments (although because we use a server-side implementation, starting and stopping here only starts and stops recording test data in GA).
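For reference, creating an experiment through the Management API (v3) with Google's Python client looks roughly like the sketch below. The account, property, and profile IDs are placeholders, the variation names are illustrative, and a real request would typically need additional fields (such as an objective metric) — this is a shape sketch, not our production code.

```python
# Sketch (not production code): creating an experiment through the
# GA Management API v3 using the google-api-python-client library.

def make_experiment_body(name):
    """Request body for a server-side experiment. With a server-side
    serving framework, GA only records data; variants are assigned
    by the application itself."""
    return {
        "name": name,
        "status": "READY_TO_RUN",
        "servingFramework": "API",  # server-side: GA does not serve variants
        "variations": [
            {"name": "control"},
            {"name": "variant"},
        ],
    }

def create_experiment(credentials, name):
    # Imported here so the rest of the sketch runs without the client library.
    from googleapiclient.discovery import build
    analytics = build("analytics", "v3", credentials=credentials)
    return analytics.management().experiments().insert(
        accountId="12345",           # placeholder
        webPropertyId="UA-12345-1",  # placeholder
        profileId="67890",           # placeholder
        body=make_experiment_body(name),
    ).execute()
```

Stopping an experiment is the same pattern with an `update` call setting `status` to `ENDED`.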
Unfortunately, Google Optimize does not support this API.
This left us with a bit of a conundrum. Using the Management API was key to how kinja-experiments operated. Since we already assign users to different variants on our end and only use GA experiments for reporting, we decided to stop using Google’s experiment offerings and to create our own experiment IDs that we would store in custom dimensions in GA. This would require two things:
- Creating our own IDs in the kinja-experiments API and storing them in custom dimensions in our GA pageviews and events.
- Writing SQL to pull test data from Big Query to mimic how Google’s experiment dimensions are scoped.
Every experiment in Google Content Experiments is associated with an ID that is generated when you create the experiment. These IDs are used in various ways in our system. For example, we use them in an Edge dictionary to determine which version of the site Fastly (our Content Delivery Network) should serve. As part of our server-side implementation, we send this info to GA so that we can track which hits belong to which experiment and which variation.
Without Content Experiments/Optimize, we have to create our own IDs. Because of how kinja-experiments works, this was actually a fairly easy task. We created a flag in our code to determine whether to use the Management API (in case Google Optimize ever supports it and we decide to go back); if we aren't using the API, we run our own code to generate an ID.
This generates an ID identical in length to, and drawn from the same character set as, the IDs GA would have produced. We also check our database of existing IDs in kinja-experiments to ensure we aren't creating duplicate values.
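Our exact generation code isn't shown here, but the idea can be sketched as follows. The 22-character, URL-safe alphabet below is an assumption meant to mirror the look of GA's experiment IDs, and `generate_experiment_id` is a hypothetical helper name:

```python
import secrets
import string

# Assumed format: 22-character, URL-safe base64-style strings, mirroring
# the shape of GA's own experiment IDs. Adjust to your actual format.
ID_ALPHABET = string.ascii_letters + string.digits + "-_"
ID_LENGTH = 22

def generate_experiment_id(existing_ids):
    """Generate a new experiment ID, retrying on collision with the
    set of IDs already stored in the experiments database."""
    while True:
        candidate = "".join(secrets.choice(ID_ALPHABET) for _ in range(ID_LENGTH))
        if candidate not in existing_ids:
            return candidate
```

The collision check matters little at this ID length, but it keeps the generated IDs unique against any legacy GA-created IDs already in the database.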
Then anywhere we would use the GA experiment ID in kinja-experiments, we use the ID we’ve generated.
On the front end, we retrieve the experiment ID and variation from the window and store them in two custom dimensions we've created: one for the experiment ID and another for the user's experiment variation. Both are configured as hit-level dimensions, which means they can change from hit to hit (more on this later).
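For a server-side hit, attaching those dimensions looks roughly like a GA Measurement Protocol payload of this shape. Here `build_pageview_payload` is a hypothetical helper, and the `cd1`/`cd2` slot indices and property ID are placeholders for whatever slots you actually configured:

```python
from urllib.parse import urlencode

def build_pageview_payload(client_id, page, experiment_id, variant,
                           exp_dimension="cd1", variant_dimension="cd2"):
    """Build a GA Measurement Protocol pageview payload carrying the
    experiment ID and variant in two hit-level custom dimensions.
    The cd1/cd2 indices and tid below are placeholders."""
    params = {
        "v": "1",             # Measurement Protocol version
        "tid": "UA-12345-1",  # placeholder property ID
        "cid": client_id,
        "t": "pageview",
        "dp": page,
        exp_dimension: experiment_id,
        variant_dimension: str(variant),
    }
    return urlencode(params)
```

Because the dimensions ride on individual hits, any hit that fails to pick up the values from the window simply arrives without them — which is exactly the gap the queries below have to account for.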
While working on our A/B testing infrastructure earlier this year, we discovered that GA's experiment dimensions don't necessarily follow the hit-, session-, or user-level scoping that other dimensions fall under. We previously wrote:
- The GA experiment ID dimension is user-scoped for the duration of the experiment but it does not overwrite values
- The GA experiment variant dimension is also user-scoped for the duration of the experiment but can be overwritten
Because our custom dimensions are hit-level, they are vulnerable to two different types of issues compared to Google’s dimensions. First, a hit might not have our experiment dimensions set (perhaps if there was an issue retrieving the values from the window). Second, if a user is placed in multiple experiment variations throughout a session, they’d have different values set from one hit to another.
We've done several things to address this. To make sure we're only including one experiment ID and variation per session, we created a SQL table of session IDs with the max and min hit number of each session where the experiment ID and variant were set. This is then joined to the table from our initial pull of user metrics.
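A simplified sketch of that session table, run here against SQLite for illustration rather than BigQuery (table and column names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hits (
    session_id TEXT, hit_number INTEGER,
    experiment_id TEXT, variant TEXT
);
-- one session: experiment dimensions set on hits 2 and 4, missing on 1 and 3
INSERT INTO hits VALUES
    ('s1', 1, NULL, NULL),
    ('s1', 2, 'exp1', 'A'),
    ('s1', 3, NULL, NULL),
    ('s1', 4, 'exp1', 'A');
""")

# Min and max hit number per session where the experiment dimensions were set;
# in production this result is joined back to the initial pull of user metrics.
rows = conn.execute("""
    SELECT session_id,
           MIN(hit_number) AS minhit,
           MAX(hit_number) AS maxhit
    FROM hits
    WHERE experiment_id IS NOT NULL
    GROUP BY session_id
""").fetchall()
print(rows)  # [('s1', 2, 4)]
```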
Through another query, we then take the latest values for each session and use those as the values for the entire session. Additionally, we exclude any hits that occurred before the minhit to ensure we aren't counting hits that might have occurred before the test began.
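That second query can be sketched like this, again against SQLite with illustrative table names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hits (
    session_id TEXT, hit_number INTEGER,
    experiment_id TEXT, variant TEXT
);
INSERT INTO hits VALUES
    ('s1', 1, NULL, NULL),    -- before the experiment dimensions appear
    ('s1', 2, 'exp1', 'A'),
    ('s1', 3, 'exp1', 'B');   -- latest value wins for the session
""")

# Apply the session's latest experiment values to every remaining hit,
# and drop hits that occurred before the first tagged hit (the minhit).
rows = conn.execute("""
    WITH bounds AS (
        SELECT session_id, MIN(hit_number) AS minhit, MAX(hit_number) AS maxhit
        FROM hits WHERE experiment_id IS NOT NULL
        GROUP BY session_id
    ),
    latest AS (
        SELECT h.session_id, h.experiment_id, h.variant
        FROM hits h JOIN bounds b
          ON h.session_id = b.session_id AND h.hit_number = b.maxhit
    )
    SELECT h.session_id, h.hit_number, l.experiment_id, l.variant
    FROM hits h
    JOIN bounds b ON h.session_id = b.session_id
    JOIN latest l ON h.session_id = l.session_id
    WHERE h.hit_number >= b.minhit
    ORDER BY h.hit_number
""").fetchall()
print(rows)  # [('s1', 2, 'exp1', 'B'), ('s1', 3, 'exp1', 'B')]
```

Hit 1 is dropped because it predates the session's minhit, and both surviving hits carry the session's latest values.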
Finally, we remove any users who had more than one experiment ID or variation during the course of our test. This allows us to ensure our results only include those who received a consistent experience throughout the duration of the test.
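That final filter amounts to a grouping over users, sketched here with hypothetical table names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE session_values (
    user_id TEXT, session_id TEXT, experiment_id TEXT, variant TEXT
);
INSERT INTO session_values VALUES
    ('u1', 's1', 'exp1', 'A'),
    ('u1', 's2', 'exp1', 'A'),  -- consistent across sessions: kept
    ('u2', 's3', 'exp1', 'A'),
    ('u2', 's4', 'exp1', 'B');  -- saw two variants: excluded
""")

# Keep only users who saw exactly one experiment ID and one variant.
rows = conn.execute("""
    SELECT user_id
    FROM session_values
    GROUP BY user_id
    HAVING COUNT(DISTINCT experiment_id) = 1
       AND COUNT(DISTINCT variant) = 1
""").fetchall()
print(rows)  # [('u1',)]
```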
We were disappointed to learn that Google Optimize does not support the Management API, but ultimately we think this solution works as a decent replacement. Because of the work done to build kinja-experiments, we were able to move quickly to replicate GA’s behavior around creating experiment IDs.
This solution worked well for us since Google Content Experiments made up only a small portion of our A/B testing system. While this could work for others, one's mileage may vary depending on how many of Google Optimize's features one uses. Given that we used a server-side implementation, moving on from Google's testing tools was mostly painless for us.