It’s becoming clear that Airbnb completed a large removal of New York City listings right before releasing data to the press last year. The manipulation might never have been discovered if not for a real estate trade reporter following up on tips from hosts. As one of the few people who saw the data, I want to offer some insight into how it all went down.
Airbnb “released” some data on its activity back in December. The idea was to temporarily share information with journalists, policymakers and interested citizens, including me. The data was presented as a snapshot of Airbnb properties in New York City on November 17. It purportedly showed how the vast majority of Airbnb listings in the city were from individuals, not folks running illegal hotels through the site.
Then, things got tricky. Earlier this month, Murray Cox of Inside Airbnb and journalist Tom Slee claimed Airbnb removed listings immediately before this November 17th date. (If you’re interested in what the data looked like, the Buzzfeed team did an outstanding job of copying and publicizing the numbers.)
Cox typically scrapes data from Airbnb at the beginning of each month. With the removal of listings happening in the middle of the month, it would have been impossible to say with confidence that the removals happened before the data release. But then, Ariel Stulberg of The Real Deal, a NYC real estate blog, started getting tips from commercial Airbnb hosts who were getting kicked off the platform in November. (Stulberg mentioned these tips in his piece last week.)
Stulberg reached out to Cox, who did a special scrape of the Airbnb site for New York on November 20th in response. This information is what makes the change in Airbnb’s data pre-release so clear. The very idea that Airbnb manipulated data makes it appear that it’s trying to cover up how its service is being used for illegal purposes. Obviously, a cover-up makes the company look pretty bad.
After the claim of data manipulation arose a few weeks ago, I expected to get some sort of statement from Airbnb that the claims were false. That hasn’t happened.
Instead, Airbnb has been trying to cast doubt on Cox’s findings, without denying them. Its spokesperson theorized to The Guardian that the drop in listings might somehow be explained by tourism related to Halloween and the NYC Marathon.
That theory doesn’t hold up, though, because there wasn’t a drop in total listings. Cox’s data (which is publicly available) doesn’t show that the total number of listings decreased, just that the number of listings where the host has multiple properties dropped during the two weeks before the data release.
At the same time, the number of listings from owners with one listing increased a lot faster than it did before or after, possibly because disqualified owners re-listed their homes under separate accounts in order to avoid getting removed.
Since Cox doesn’t have the full set of data, I needed to confirm that the analysis of which hosts have multiple listings is correct. So I spot-checked his data: I chose a few of the hosts that it indicates have multiple listings, and it seems like they do.
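For anyone who wants to repeat this comparison on Cox’s public data, here is a minimal sketch of the counting involved. It assumes each scrape has been reduced to (listing ID, host ID) pairs; the function name and the sample values are hypothetical and made up for illustration, not taken from Inside Airbnb.

```python
from collections import Counter


def count_by_host_type(snapshot):
    """Split one scrape into listings from single-listing and multi-listing hosts.

    snapshot: iterable of (listing_id, host_id) pairs (an assumed format).
    Returns (single_host_listings, multi_host_listings).
    """
    pairs = list(snapshot)
    listings_per_host = Counter(host_id for _, host_id in pairs)
    single = sum(1 for _, host_id in pairs if listings_per_host[host_id] == 1)
    multi = len(pairs) - single
    return single, multi


# Made-up data: the total stays at five listings, but the mix shifts.
before = [(1, "a"), (2, "a"), (3, "b"), (4, "c"), (5, "c")]
after = [(1, "a"), (3, "b"), (4, "c"), (6, "d"), (7, "e")]

print(count_by_host_type(before))
print(count_by_host_type(after))
```

The toy data shows why a flat total can hide a change: the count of listings is the same in both scrapes, while the split between single-listing and multi-listing hosts is not.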
It’s unclear how many of these are really distinct listings. Are some of them accidental double listings? Unfortunately, without Airbnb releasing additional data, we can’t tell for sure.
Looking through Airbnb’s forums, it does look like there were hosts complaining about being removed in November. Here is one. (For the record, there does seem to be the same thing happening in London right now, so we’ll see if a data release is forthcoming.)
Ultimately, Airbnb has better data about this unusual activity than its accusers do. If the alleged scrubbing of data was really major, I’d also expect more complaints on the message boards and Twitter from the time it took place. Along those lines, the best piece of evidence that Airbnb didn’t do anything inappropriate is that there weren’t many more listings removed than normal in November.
So using data scraped by Inside Airbnb, I looked at what percentage of listings from each month were no longer listed the next month: this number is often known as “churn” in the business world. If Airbnb had completed a massive one-time-only removal of listings in November, we’d expect to see a big spike in churn. Instead, we see a churn number that’s higher than surrounding months but not shockingly so.
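The churn calculation is simple enough to sketch. This is my own minimal version, assuming each monthly scrape is just a collection of listing IDs; the function name and the sample IDs are hypothetical.

```python
def churn_rate(current_ids, next_ids):
    """Share of listings in one scrape that are missing from the next one."""
    current = set(current_ids)
    if not current:
        raise ValueError("current scrape has no listings")
    gone = current - set(next_ids)
    return len(gone) / len(current)


# Made-up IDs: 10 listings one month, 2 of them gone the next.
this_month = range(1, 11)
next_month = [1, 2, 3, 4, 5, 6, 7, 8, 11, 12]

print(churn_rate(this_month, next_month))
```

Note that new listings (11 and 12 here) don’t affect churn at all. Only listings that disappear count, which is why churn can rise even when the total number of listings holds steady.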
With the additional data pull on November 20th, however, we can identify the daily churn rate much more clearly and see that all of the increase occurred before November 20th. (Remember, the data was pulled on the 17th.)
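Because the scrapes are unevenly spaced, comparing them means dividing churn by the number of days between scrapes. Here is a rough sketch under the assumption that churn is spread evenly across each interval, which is a simplification. The function name, the dates (including the year) and the listing IDs are placeholders, not Cox’s actual figures.

```python
from datetime import date


def daily_churn(ids_start, ids_end, day_start, day_end):
    """Average share of listings lost per day between two scrapes."""
    days = (day_end - day_start).days
    if days <= 0:
        raise ValueError("day_end must be after day_start")
    start = set(ids_start)
    if not start:
        raise ValueError("starting scrape has no listings")
    gone = start - set(ids_end)
    return len(gone) / len(start) / days


# Placeholder scrapes: many removals before the 20th, few after.
scrape_1 = set(range(100))      # start of the month
scrape_2 = set(range(19, 100))  # special mid-month scrape
scrape_3 = set(range(20, 100))  # start of the next month

early = daily_churn(scrape_1, scrape_2, date(2015, 11, 1), date(2015, 11, 20))
late = daily_churn(scrape_2, scrape_3, date(2015, 11, 20), date(2015, 12, 1))
print(early > late)
```

Without the middle scrape, the two intervals collapse into one monthly figure and the early spike gets averaged away.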
The fact that Airbnb isn’t using its superior information to respond directly to the accusations is pretty damning, if you take the company’s word that the data was released “so that policymakers can make informed decisions about home sharing in their communities.”
But if you think of Airbnb’s “transparency” as a catalyst for self-regulation, I think there are positive signs here. It’s possible that Airbnb only realized how bad the data looked when it was considering opening up its books a little, and these were honest steps to fix its user base. I’ve worked at companies where we spent a lot of time and money to make sure our users were operating within our terms and conditions, and it’s not a trivial problem. I have a lot of sympathy for Airbnb as it tries to govern its marketplace.
Even if we give Airbnb the benefit of the doubt—and there’s so much doubt!—the disclosures here are woefully inadequate. If there had been a major shift in policy or listings during the time period the data was provided, it should have been disclosed to people who were writing about and making decisions about this data.
Airbnb, for its part, did release the following statement:
The facts are clear for all to see - the vast majority of our hosts are everyday people who have just one listing and share their space a few nights a month to help make ends meet. Airbnb is an open people-to-people platform where listings come on and go off throughout the year. We’ve also done significant work to educate our community about what is in the best interest of their city and we routinely review our listings to ensure guests are having the quality, local experience they expect and deserve. We’re glad that our platform has evolved and that most of our hosts share just one listing.
As someone who works in the world of web data and analytics, it’s worrisome to me to see tech companies playing fast and loose with their publicly released data this way. Airbnb, like most companies, has a careful process set up to vet its public statements. There’s a clear understanding of the importance of not saying anything incorrect.
However, the standards for releasing numbers are much, much lower. It’s harder to prove malfeasance, and disclosure requirements are not well established. This isn’t a new problem. Mark Twain popularized the famous line about statistics more than a century ago: “There are three kinds of lies: lies, damned lies, and statistics.” (Apparently, he wasn’t even the first to say it.)
But as more and more of our life and economy become trackable, and policymakers depend more and more on that data, standards on how it is gathered and presented are more important than ever.