Here are the text and slides from my talk today at Strata Hadoop World NYC.
The title of my talk is web analytics in the platform era: I’m going to be talking about the ‘Gizmodo Media’ experience.
A couple of administrative notes:
- The full text and slides of this talk are available at gawkerdata.kinja.com, so if you want to follow along there or catch up because you were distracted by Twitter and missed one of my pearls of wisdom, you can do that.
- Speaking of Twitter, I also have posted a link to this talk on my account, @joshlaurito, if you have an easier time navigating via social platforms.
- Also, I don’t know if we’re doing Q&A here. I’m happy to talk with you about the whole Gawker Media thing and what’s been going on there in the last year, later today or later in the conference, but let’s keep any questions this session about platforms. Feel free to grab me later or ping me on Twitter.
Ok, with that out of the way, let me give you a little background about Gizmodo Media: we are a medium sized digital media outlet with 6 brands: Deadspin, Gizmodo, Jalopnik, Jezebel, Kotaku and Lifehacker.
- These properties combined are visited by over 90 million unique visitors per month, generating about 400 million page views; all the traffic data I’m mentioning here is available publicly on Quantcast, which I’ve linked to in the online version of this talk
- We employ a bit over 200 people
- Last year we made about $50 million in revenue and are looking to increase that by more than 10% this year
- We publish about 250 posts per day, so not on the order of the really big news outlets but still a lot for a small analytics team to keep track of
We were acquired by Univision out of bankruptcy 18 days ago; Univision is best known for its Spanish language properties, but has recently put together a pretty impressive collection of English language digital properties, including Fusion, The Onion, the Root, and now us: I’m really excited to be working with these new friends, but we haven’t done so yet. So none of what I’m going to say applies to any of them, don’t blame them if (when) I say something stupid.
So let’s talk about platforms. Whenever I say ‘platform’, from a digital media perspective I really mean distributed content: anyone or anything that allows users to read content off of our sites. By this definition, platforms have been around for a really long time starting with emails and RSS feeds, and any other general way of pushing content to clients other than on our own website.
So why are we talking about platforms now? There are a bunch of reasons but by far the most dominant one is the move to mobile device consumption of news. Gizmodo Media has moved from about 20% of visits from mobile devices in the beginning of 2012 to about 60% today.
This move led to a brutal squeeze on publisher economics: mobile advertising as an industry didn’t develop nearly as fast as the traffic did. Mary Meeker’s Internet Trends Report for this year shows that mobile still makes about half as much money per time spent as desktop. And that’s a huge improvement over the dark days of 2013, when mobile CPMs were still under a dollar.
This created bad incentives for publishers. With the price per impression crashing, publishers tried to make up for it by loading additional ads and creating bigger, more obnoxious ones. This was annoying but manageable on desktops, and absolutely brutal for the mobile experience.
It got so bad that a local rag even put together a whole interactive about mobile webpages: how slow they are and how much they’re costing you. Several sites came in at over 10 seconds per pageload. For a news story! In the interest of full disclosure I should mention that Gawker.com is listed on this interactive, but thankfully a bit further down. You can see how bad we were in the link in the posted materials.
Into this mess, Facebook saw an opportunity and in May of 2015, announced Instant Articles. Technically, Facebook instant works as an RSS reader: we load in articles, modified to fit their HTML5 format and restrictions. There are other ways to send Facebook Instant Articles, but this is the one most publishers use.
Not to be outdone, Google launched the AMP project, going live in February of this year. AMP works in a similar way to Facebook Instant: we, the publisher, create a restricted version of a website, which Google caches and is able to serve out faster than we can serve our own sites. AMP is open-sourced: the code is open to anyone. Effectively Google is trying to build a credible alternative to Facebook in this space, hoping that other players like Twitter and maybe Amazon adopt this standard to put it on even footing with Facebook.
Facebook Instant and AMP are just two of a number of platforms we can publish to. In the last two years there’s been an explosion in both the number of platforms and their importance in terms of the traffic that they’re able to provide. Primarily in this talk I’m going to be referring to Facebook Instant and Google AMP, but Snapchat Discover and Facebook Live are also major platforms, and there are a number of other platforms that deserve to be mentioned even though they aren’t huge drivers of traffic right now including Amazon’s Alexa, Apple News and Apple Spoken Layer, and many other content aggregators that ingest data from publishers and provide it to their users.
The value proposition of the platform-publisher relationship is pretty straightforward for the platform: ingesting feeds from publishers keeps users engaged longer on the platform and allows the platform to control the user experience. Also, this allows platforms to gather more data on individual users and what they’re interested in, and gives them more inventory to advertise on.
From the publisher’s perspective there are advantages as well. Publishers think they may get preferential treatment from platform algorithmic discovery engines, meaning their stories are more likely to show up in Google search results or the Facebook newsfeed. Both Google and Facebook walk a really fine line on this concept: we’ll get back to this point later in the talk. Platforms also may offer access to a new class of readers and advertisers, so it’s a new set of users for publishers to bring on board. Also, working with a platform potentially increases the value of a publisher’s inventory, thanks to additional data that can be provided by the platforms. For smaller publishers that don’t have the scale or relationships to bag a major advertiser, this is an especially important point. Finally, working through platforms might give publishers a better chance to combat ad blockers. Publishers generally become irrationally angry, apoplectic even, at the thought of ad blockers: you might too if someone was able to access your product for free. But we all suffer from a collective action problem: while we can detect when someone is using adblockers on our site, preventing them from reading the news might drive them away. Facebook and Google, since they have less in the way of real competition, are more likely to get users to accept ads. They also have a much stronger incentive to make the user experience smooth, so ads are unobtrusive and not detrimental to the reading experience.
Let’s get a little more into the publishers’ decision making, in particular at Gizmodo. Publishers are famously political organizations that typically have a tremendous amount of tension between different parts of the org chart. Gizmodo Media is organized into three major constituencies: there’s the editorial team, the writers & editors, who are mostly concerned with their ability to write freely and getting an audience. Business and revenue’s primary goals are to maximize revenue and to attain stable growth. Finally there’s the product and tech teams, who primarily value operational simplicity, scalability, and extensibility. They want the ability to quickly and effectively build out new products & features for the other parts of the organization. I should note some organizations don’t break out Tech & Product into its own part of the team, instead folding it into the Business or Editorial side. If you squint at your favorite publisher’s site, you can probably tell which one tech reports to, since, to paraphrase Steven Sinofsky, most organizations ship their org charts.
Editorial is almost totally on board with this change. From an editorial perspective working with platforms is an obvious way to increase their reach and potential impact on the greater news conversation. There are slight reservations around being boxed in by another layer of management, meaning losing the freedom they have to publish and having platforms manipulate news coverage, but generally the idea of getting additional people to look at their posts is a positive. One other concern from the editorial point of view is the additional workload to fit stories into different form factors for different platforms: at the Online News Association conference earlier this month, an editor of the publisher NowThis mentioned that they took content from a recent interview with Joe Biden and turned it into 60 pieces of content across 14 platforms! However, with totally passive applications like Facebook Instant and AMP, this isn’t even a concern.
As advertisers form stronger relationships with Facebook, the revenue team knows that there might be issues down the tracks. But Instant Articles and AMP are generally seen as free options to increase inventory and improve delivery rates without much work on the business side, allowing us to grow the business in new directions. Given how dire the economic situation has been for publishers, any type of meaningful revenue source looks good.
I should mention here that Facebook has paid and continues to pay Gizmodo Media to create Facebook Live content: we have commitments to produce a certain number of videos and a certain number of minutes every month. We’ve been happy with the arrangement. Given our aggressive reporting on Facebook, I’m not sure they are as happy but they’ve been excellent, mature partners.
From a technical perspective, though, the move to platform publishing has been tough: a huge investment just to keep up with the state of the art.
Once upon a time we only had to worry about a single website. Less than a decade ago, when mobile traffic started becoming meaningful, we generally created separate mobile pages so people could access our sites on their phones. You might remember those URLs that started with an ‘m’, like ‘m dot espn’ or ‘m dot cnn’. A few years ago with the launch of tablets, responsive design came into vogue and we designed three (or more) separate versions for different screen form factors.
With the launch of multiple different platforms which we can integrate with, we now not only still maintain our responsive design pages, we also need to adjust our content management systems to support Facebook Instant feeds and feeds potentially posting to AMP, Alexa, and any number of other partners.
And we aren’t even done yet! There is concern that with open platforms like AMP, there’s the possibility of additional closed source components being built on top of them which would potentially require additional work. While this is all do-able, each additional version of our site increases the risk and complexity of making any changes to our system, and makes QA’ing and identifying bugs that much harder. And we don’t even support mobile apps.
Some of these issues can be outsourced: Gizmodo is relatively unusual in that we have built our own content management system, or CMS, which is called Kinja. But there are still additional operational considerations: if posts no longer live on our website, our ability to edit and remove them needs to be thought through. Additionally we need to account for different editorial guidelines: what’s allowed on one platform may not be allowed on another, so our management of what’s appropriate needs to become much more complicated. Finally, each of these platforms has its own rules for UI elements and tracking, which elements can appear on a page, and what reporting we can get out of that page.
To get a sense of what I mean by UI differences, let’s look at one of our pages in both Google AMP and on our own site. This is a fairly representative post from io9, which is the science fiction section of Gizmodo. Our mobile page is on the left (or above), the AMP page is on the right (or below).
The first thing you’ll probably notice is that the GIF auto-plays on AMP, but not on our site. Like a lot of publishers, we load our GIFs asynchronously for mobile visitors for better performance. The animation module in AMP does not support this. We technically could support it by putting the GIF in an iframe, but then the load would lag behind the rest of the page, and we would have to restrict what sizes of GIFs our writers could use on all pages, or adjust this GIF on the fly.
You’ll also notice directly above the GIF on the right side there are a few icons that are missing: these are tracking the number of recommendations and the number of pageviews on our site: obviously these update over time, so depending on when AMP caches our site, these would potentially be stuck at incorrect levels, and even if we wanted to put them into AMP, they’re generated by a different service internally, so we’d need to add an additional, major dependency to our site. At the top you’ll notice that my login information and my ability to see notifications is gone in the AMP version: this is because we can’t support login on AMP. And then you’ll notice a few inconsistencies that are just due to errors or changes that haven’t been ported to AMP. At the very top left the hamburger menus are slightly different, at the very bottom you’ll notice in one case the share buttons for Facebook and Twitter are centered and in the other case they are over at the corners, and there’s a difference in the title font sizes, etc, etc…
These differences in our page experiences cascade into meaningful data challenges. For example, A/B Testing across our site is no longer possible: we can’t load our testing libraries on these pages, and even if we could, the changes in UI elements and our ability to track reader activity would corrupt the results.
Google AMP does support limited A/B testing of CSS elements. In order to track this, we need to isolate traffic from each platform. This is relatively straightforward to do: most of our partners allow us to instrument each platform’s call so we can identify traffic to, say, a Facebook Instant article or a Google AMP page. We’re also able to run some platform testing in parallel with the site’s own testing. This does add overhead to our testing process, of course.
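To make "isolating traffic from each platform" concrete, here is a minimal sketch of the idea, not our actual tracking code. It assumes each platform's tracking call carries a `platform` query parameter; that parameter name, the `amp` and `fbia` labels, and the example.com URLs are all hypothetical, and each real partner has its own way of instrumenting calls.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Hypothetical platform labels; anything else is treated as our own site.
KNOWN_PLATFORMS = {"amp", "fbia"}

def platform_of(tracking_url: str) -> str:
    """Return the platform tag on a tracking call, or "site" if there is none."""
    query = parse_qs(urlparse(tracking_url).query)
    tag = query.get("platform", ["site"])[0]
    return tag if tag in KNOWN_PLATFORMS else "site"

# Three illustrative tracking calls for the same post.
hits = [
    "https://example.com/track?post=123&platform=amp",
    "https://example.com/track?post=123&platform=fbia",
    "https://example.com/track?post=123",
]

# Pageviews per platform, so each platform's test can be read separately.
counts = Counter(platform_of(url) for url in hits)
```

Once every hit carries a label like this, an experiment running on AMP can be analyzed on AMP traffic alone, separately from the tests running on the site.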
We have actually implemented A/B Testing on AMP, which I’ve written about on our website. Their solution is pretty clean though I think the documentation could be improved. One nice thing about the open-source nature of AMP in particular is that we’re able to make suggestions and engage with the maintainers in a productive and transparent way.
Here you can see an issue I filed around getting Google’s AMP experiments to work natively with Google Analytics, which they somehow don’t right now (remember, you ship your org structure). Of course, Google is under no obligation to actually listen to anything I say, and the issue has just been sitting there for a month. Oh well.
A bigger issue is data transparency and consistency. Data has been in the news recently with Facebook announcing that one of the metrics they are providing to advertisers has been calculated incorrectly for two years, making user engagement look better than it actually is. While this is just one metric among many, I think this is a good reminder that we aren’t working with an organic phenomenon. Instead of optimizing for a stable set of parameters like the lift of an aircraft or even the behavior of crowds, we’re really working with the output from algorithms generated by companies with shifting priorities, and we have to worry that their decisions or values will change in the future.
The poster child for this concern is a website, which I won’t name but which was well known for its uplifting but saccharine videos, its ‘curiosity-gap’ headlines, and its very sophisticated and successful understanding of the Facebook Algorithm. This is a chart of their monthly traffic over the last few years. Keep in mind that Facebook made a major change to its newsfeed algorithm in early 2014 which was widely considered to be hurtful to this publisher. Effectively, the system they optimized for was pulled out from under them. This is a great lesson for us: optimizing our pages for social traffic is undeniably valuable, but keeping our system easy to update so we can switch tracks quickly might be even more so.
So we’re still in the construction and implementation phase of our platform partnerships, but let’s talk about our experience so far. Some things have been easier than we thought: there’s been effectively a 100% agreement between the platforms on what is appropriate and what is not, so we haven’t had to manage multiple types of content filters. But there have been some operational issues we didn’t expect: one is that we occasionally need to update posts or even remove them. When we post to third parties, updating or changing posts becomes something of an ordeal, especially since corrections and updates are typically very high priority. Building systems to effectively update posts from all platforms, or to prevent new posts that don’t work on a certain platform for technical reasons has been a major consumer of time and effort.
Another issue I mentioned earlier is correctly gathering and unifying stats. It’s been really difficult to compute certain stats, like session length or engaged time, across all our different platforms. But even harder to think through is the fact we also face a number of competing conventions.
To take a micro-example, let’s look at one visitor who finds us through a Google search, goes to the AMP version of our page, and is interested enough to go to our homepage (which is not on AMP). The convention for web-statistics from a single visit, session-level stats, is the state of the user at the end of their session. This usually makes sense: websites are conversion machines, and if you move a user down your conversion funnel, such as converting them from a guest to having an account, you want to treat them as having an account for the rest of the visit. However, if a visitor comes in through a Google AMP page and then clicks on our homepage to go to our site, convention would be to treat this user as a regular-site visitor for their entire visit.
Of course, this would be a misrepresentation of how we got this user. So we could break convention for this particular case and use the initial platform the visitor came from. However, now we’ll be overstating the amount of time users spend on our platforms.
We also could split the sessions between the time on the AMP post and the time on our page. Now when we aggregate activity on AMP and on our site, we get the correct number of pageviews and amount of time spent on each. Unfortunately, we’ve now doubled the number of sessions we’re reporting.
Finally, we could change ‘platform’ from being a session-level stat to being a page-level stat. Now we accurately capture the time spent on each platform, the number of sessions, and which platforms are helping obtain traffic. Unfortunately, we’ve added a new page-level field that our analytics partners may or may not support, and we’ve increased the number of rows in our fact tables 10x. We also now have de-coupled pageviews from sessions, so we need to do some complex calculations to find out how many sessions started on AMP, which is probably what we wanted in the first place.
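The four conventions above are easier to compare side by side on the single-visitor example. What follows is a rough sketch rather than how our analytics stack or any vendor actually works: the `Pageview` record, the `amp` and `site` labels, and the 40 and 90 second timings are all made up for illustration, and pageviews are assumed to arrive in chronological order.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Pageview:
    session_id: str
    platform: str  # hypothetical labels: "amp" or "site"
    seconds: int   # made-up engaged time on this page

# One visitor: lands on an AMP post from search, then clicks through to our homepage.
views = [Pageview("s1", "amp", 40), Pageview("s1", "site", 90)]

def session_level(views, pick):
    """Give each whole session one platform: pick=-1 is the last page, pick=0 the first."""
    by_session = {}
    for v in views:
        by_session.setdefault(v.session_id, []).append(v)
    sessions, seconds = Counter(), Counter()
    for pages in by_session.values():
        platform = pages[pick].platform
        sessions[platform] += 1
        seconds[platform] += sum(p.seconds for p in pages)
    return sessions, seconds

def split_sessions(views):
    """Start a new session whenever the platform changes mid-visit."""
    sessions, seconds = Counter(), Counter()
    previous = {}  # session_id -> platform of the previous page
    for v in views:
        if previous.get(v.session_id) != v.platform:
            sessions[v.platform] += 1
        seconds[v.platform] += v.seconds
        previous[v.session_id] = v.platform
    return sessions, seconds

def page_level(views):
    """Keep platform on each pageview; count sessions by the platform they entered on."""
    seconds, entry = Counter(), {}
    for v in views:
        seconds[v.platform] += v.seconds
        entry.setdefault(v.session_id, v.platform)
    return Counter(entry.values()), seconds

last_state = session_level(views, -1)   # the usual convention: all 130s credited to "site"
first_state = session_level(views, 0)   # all 130s credited to "amp"
split = split_sessions(views)           # time is right, but one visit becomes two sessions
per_page = page_level(views)            # one session entering on "amp", time split 40/90
```

Only the page-level version gets both the session count and the time split right for this visitor, and it needs the extra pass over entry pages to do it, which is the added complexity described above.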
In terms of our two most important platforms, Facebook Instant Articles hasn’t really generated new traffic: this is something generally agreed-upon with our partners and other publishers. It is possible there is an advantage to Facebook Instant and we’re just all competing in an arms race with one another, but we don’t see great data to indicate that.
Right now a bit less than 10% of our traffic comes to us through Facebook Instant, but it is all cannibalizing traffic we would have gotten from Facebook anyways.
Facebook has been really strong on the monetization side: we are seeing some publishers who are able to make as much money on a Facebook Instant page as on their own page. Earlier this year, Facebook announced that some publishers, specifically Business Insider and Mic, were making as much money on Instant Articles as they were on their own site. While we aren’t there, it’s undeniable that the tools Facebook has given us, and the audience they can bring, are a positive.
Unfortunately, just a few months later, Facebook and one of those publishers, Mic, were back in the news. Apparently Mic was putting banner ads on their Facebook Videos, which somehow slipped Facebook’s notice. The ads didn’t even violate Facebook’s standards, but they did violate what Facebook’s intentions were. So Facebook did what any cranky parent does: they immediately changed their standards and banned the ads. Nothing punitive, just removal.
Google AMP, which we launched with much later, has recently become a much larger part of our traffic. This is partially because we over-index on search traffic compared to our competitors since we’re so much older. Google AMP has had a measurable, demonstrable impact on our traffic as well. While Google insists that participation in AMP does not have SEO effects, they did create a mobile search carousel at the top of their search results page for AMP stories which acts as a huge leg up on search traffic.
Unfortunately, while AMP is bringing us people, we can’t monetize it at a reasonable rate yet. Restrictions on the types of ads we can run mean we have to be really creative about how we make our business sustainable.
Looking to the future, we have real concerns with the proliferation of platforms. The operational complexity continues to go up, and we worry about commodifying our product and severing loyalty with our readers. With the adoption of AMP, it’s likely that Google will try to turn this open source project into a deeply closed source walled garden by building important closed-source tools on top of it. We have a very bad precedent in Android: it hasn’t been a huge deal for mobile devices because there haven’t really been credible alternatives on mobile to Google standard apps. But in media, with Jeff Bezos getting into the news business and Twitter being a major player in referral traffic, I think there is a very real possibility of this project forking in a way that will require serious investment on the part of publishers.
But platforms should lead to better user experiences and better policing of abusive and malware ads, as the platforms have a much stronger incentive to police bad ads than the programmatic networks do today. They also do have the potential to be the engine that turns online media into a truly sustainable business. We just hope they’re on the right track.