Wikipedia pageviews: still in decline

https://www.lesswrong.com/posts/ghBZDavgywxXeqWSe/wikipedia-pageviews-still-in-decline

In March 2015, I wrote about a decline in Wikipedia desktop pageviews over the last few years (and posted a short version to LessWrong). With a lot of help from Issa Rice over the last year, and a lot more quality data, I’ve revisited the claims of that post. This post provides a high-level summary of my takeaways. If enough people express interest in the comments, I intend to write in more detail about the aspects they ask about. If I do a more detailed writeup, it will probably be in the latter half of 2018, which would give enough additional data to evaluate how well the decline hypothesis holds up. Here are the top-level conclusions.

Comment

https://www.lesswrong.com/posts/ghBZDavgywxXeqWSe/wikipedia-pageviews-still-in-decline?commentId=YzP2YYXfwcKLQPZCG

I’ve found that I’m using Wikipedia less, and relying more on Google for what I used to use Wikipedia for. In particular, Featured Snippets in Search (which will often pull an excerpt from a Wikipedia article!) are a fantastic substitute for quick questions that I would, in past years, have asked Wikipedia, although they aren’t a substitute for a deeper exploration.

Comment

https://www.lesswrong.com/posts/ghBZDavgywxXeqWSe/wikipedia-pageviews-still-in-decline?commentId=sDbkWKavP7QWp6j7k

This sounds very plausible, and my singular data point confirms.

https://www.lesswrong.com/posts/ghBZDavgywxXeqWSe/wikipedia-pageviews-still-in-decline?commentId=8mfsKACfJZB7iJLXe

I agree that Featured Snippets is probably the cause of my decreased reliance as well.

https://www.lesswrong.com/posts/ghBZDavgywxXeqWSe/wikipedia-pageviews-still-in-decline?commentId=xXTfZXfS5ovvm8wxE

Another possibility is that Wikipedia is facing increased competition from other info providers such as content marketers?

Edit: I suppose you might measure this effect by trying to see if Wikipedia’s position in search engine rankings has dropped. Or alternatively, it might be interesting to compare Wikipedia traffic for a particular concept to Google Trends for that concept. If it’s a concept that doesn’t get discussed much on social media, and Google Trends is increasing while Wikipedia is declining, that seems like evidence against the social media displacement hypothesis.
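To make the proposed comparison concrete, here is a minimal sketch of the check described above: fit a straight-line trend to each monthly series and flag the case where search interest rises while Wikipedia pageviews fall. The function names and all numbers are hypothetical, and dividing each slope by the series mean is just one simple way to put the two series on a comparable scale.

```python
from typing import Sequence


def relative_slope(series: Sequence[float]) -> float:
    """Least-squares slope per period, expressed as a fraction of the series mean."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(series) / n
    if mean_y == 0:
        raise ValueError("series mean is zero; relative slope is undefined")
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(series))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return (cov / var) / mean_y


def search_up_wiki_down(wiki_views: Sequence[float],
                        search_interest: Sequence[float]) -> bool:
    """True when search interest trends upward while Wikipedia pageviews trend downward."""
    return relative_slope(search_interest) > 0 and relative_slope(wiki_views) < 0


# Hypothetical monthly figures for one concept (not real data).
wiki_views = [1000, 950, 900, 870, 820, 800]
search_interest = [40, 42, 45, 47, 50, 52]
flag = search_up_wiki_down(wiki_views, search_interest)
```

A flag on a concept that is rarely discussed on social media would be the pattern described above; the sketch says nothing about how large or statistically meaningful the divergence is.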

Comment

https://www.lesswrong.com/posts/ghBZDavgywxXeqWSe/wikipedia-pageviews-still-in-decline?commentId=8HW2hSPPc3kP2Zb7t

Great points. As I noted in the post, search and social media are the two most likely proximal mechanisms of causation for the part of the decline that’s real. But neither may represent the "ultimate" cause: the growth of alternate content sources, or better marketing by them, or changes in user habits, might be what’s driving the changes in social media and search traffic patterns (in the sense that the reason Google’s showing different results, or Facebook is making some content easier to share, is itself driven by some combination of what’s out there and what users want).

The main challenge with search engine ranking data is that (a) the APIs forbid downloading the data en masse across many search terms, and (b) getting historical data is difficult. Some SEO companies offer historical data, but based on research Issa and I did last year, we’d have to pay a decent amount to even be able to see if the data they have is helpful to us, and it may very well not be.

The problem with Google Trends is that (a) it does a lot of normalization (it normalizes search volume relative to total search volume at the time), which makes it tricky to interpret data over time, and (b) it’s hard to download data en masse. Also, a lot of Google Trends results are just amusingly weird, e.g. https://trends.google.com/trends/explore?date=all&q=Facebook (see https://www.facebook.com/vipulnaik.r/posts/10208985033078964 for more discussion): are we really to believe that interest in Facebook spiked in October 2012, and that it has returned in 2017 (after a 5-year decline) to what it used to be back in 2009? Google Trends is just yet another messy data series that I would have to acquire expertise in the nuances of, not a reliable beacon of truth against which Wikipedia data can be compared.

The one external data source I have been able to collect with reasonable reliability is Facebook share counts. At the end of each month, I record Facebook share counts for a number of Wikipedia pages by hitting the Facebook API (a process that takes several days because of Facebook’s rate limiting). Based on this I now have decent time series of cumulative Facebook share counts, such as https://wikipediaviews.org/displayviewsformultiplemonths.php?tag=Colors&allmonths=allmonths-api&language=en&drilldown=cumulative-facebook-shares If I do a more detailed analysis, this data will be important for evaluating the social media hypothesis.

How interested are you in seeing an exploration of the search engine ranking and increased use of social media hypotheses?
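As an aside on the cumulative share counts mentioned above: because each month-end reading is a running total, comparing sharing activity with monthly pageviews first requires differencing consecutive readings. The sketch below shows that step only. The function name, the "YYYY-MM" key format, and the numbers are illustrative assumptions, not the actual wikipediaviews.org code or data.

```python
from typing import Dict


def monthly_increments(cumulative: Dict[str, int]) -> Dict[str, int]:
    """Convert month-end cumulative counts into per-month increments.

    Keys are assumed to be "YYYY-MM" strings, which sort chronologically.
    The first month has no earlier reading, so it gets no increment.
    """
    months = sorted(cumulative)
    return {
        later: cumulative[later] - cumulative[earlier]
        for earlier, later in zip(months, months[1:])
    }


# Hypothetical month-end cumulative share counts for one page (not real data).
cumulative_shares = {"2017-06": 1200, "2017-07": 1260, "2017-08": 1290}
new_shares = monthly_increments(cumulative_shares)
# new_shares == {"2017-07": 60, "2017-08": 30}
```

If a recorded total ever drops from one month to the next, the increment comes out negative; the sketch leaves such values as they are rather than guessing at a cause.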

Comment

"are we really to believe that interest in Facebook spiked in October 2012, and that it has returned in 2017 (after a 5-year decline) to what it used to be back in 2009"

That seems very plausible to me; this kind of cyclical interest seems pretty common for social sites. This would also explain Facebook’s eagerness to acquire up-and-comers like Instagram and Snapchat.

"How interested are you in seeing an exploration of the search engine ranking and increased use of social media hypotheses?"

Somewhat interested, although I’m also not super clear on what relevance we think Wikipedia traffic has in the grand scheme of things.

https://www.lesswrong.com/posts/ghBZDavgywxXeqWSe/wikipedia-pageviews-still-in-decline?commentId=L5NAJSPbNXEHKaq9P

It’s interesting how this concern is completely absent from the present Wikimedia strategy planning exercise.

Comment

https://www.lesswrong.com/posts/ghBZDavgywxXeqWSe/wikipedia-pageviews-still-in-decline?commentId=uz64LpGoSSCQkAZZf

The Wikimedia Foundation has not ignored the decline. For instance, they discuss the overall trends in detail in their quarterly readership metrics reports, the latest of which is at https://commons.wikimedia.org/wiki/File:Wikimedia_Foundation_Readers_metrics_Q4_2016-17_(Apr-Jun_2017).pdf The main differences between what they cover and what I intend to cover are that (a) they cover only overall rather than per-page pageviews, (b) they focus more on year-over-year comparisons than long-run trends, and (c) related to (b), they don’t discuss the long-run causes. However, these reports are a great way of catching up on incremental overall traffic level updates as well as any analytics or measurement discrepancies that might be driving weird numbers.

The challenge of raising more funds with declining traffic has also been noted in fundraiser discussions, such as at https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-10-14/News_and_notes which has the quote:

Better performing banners are required to raise a higher budget with declining traffic. We’ll continue testing new banners into the next quarter and sharing highlights as we go.

https://www.lesswrong.com/posts/ghBZDavgywxXeqWSe/wikipedia-pageviews-still-in-decline?commentId=hB9Y4efBgNCWoDC8z

Added to the frontpage. (Also, deleted your two empty comments.)

Comment

https://www.lesswrong.com/posts/ghBZDavgywxXeqWSe/wikipedia-pageviews-still-in-decline?commentId=kki5nsEtWcdjQXMb9

They still show up in the total comment count :).

https://www.lesswrong.com/posts/ghBZDavgywxXeqWSe/wikipedia-pageviews-still-in-decline?commentId=j62Z9dnjBRX2LZomo

To mention the elephant in the living room, I wonder if the increasingly broken Wikipedia mod culture has something to do with this.

Comment

https://www.lesswrong.com/posts/ghBZDavgywxXeqWSe/wikipedia-pageviews-still-in-decline?commentId=Ai4voooHFKwQ95QEt

Great point. As somebody who has been in the crosshairs of Wikipedia mods (see ANI) my bias would push me to agree :). However, despite what I see as problems with Wikipedia mod culture, it remains true that Wikipedia has grown quite a bit, both in number of articles and length of already existing articles, over the time period when pageviews declined. I suspect the culture is probably a factor in that it represents an opportunity cost: a better culture might have led to an (even) better Wikipedia that would not have declined in pageviews so much, but I don’t think the mod culture led to a quality decline per se. In other words, I don’t think the mechanism:

counterproductive mod culture → quality decline → pageview decline

is feasible.

Comment

You seem to be conflating quantity and quality.

Comment

In the case of Wikipedia, I think the aspects of quality that correlate most with explaining pageviews are readily proxied by quantity. Specifically, the main quality factors in people reading a Wikipedia page are (a) the existence of the page (!), (b) whether the page has the stuff they were looking for. I proxied the first by number of pages, and the second by length of the pages that already existed. Admittedly, there are a lot more subtleties to quality measurement (which I can go into in depth at some other point), some of which can have indirect, long-term effects on pageviews, but on most of these dimensions Wikipedia hasn’t declined in the last few years (though I think it has grown more slowly than it would with a less dysfunctional mod culture, and arguably too slowly to keep pace with the competition).

Comment

"Specifically, the main quality factors in people reading a Wikipedia page are (a) the existence of the page (!), (b) whether the page has the stuff they were looking for."

(c) whether the information on the page is accurate.

"I proxied the first by number of pages, and the second by length of the pages that already existed."

Except not all topics and not all information are of equal interest to people.

Comment

FWIW, my impression is that data on Wikipedia has gotten somewhat more accurate over time, due to the push for more citations, though I think much of this effect occurred before the decline started. I think the push for accuracy has traded off a lot against growth of content (both growth in number of pages and growth in amount of data on each page). These are crude impressions (I’ve read some relevant research but don’t have strong reason to believe that should be decisive in this evaluation) but I’m curious to hear what specific impressions you have that are contrary to this.

If you have more fine-grained data at your disposal on different topics and how much each has grown or shrunk in terms of number of pages, data available on each page, and accuracy, please share :).

https://www.lesswrong.com/posts/ghBZDavgywxXeqWSe/wikipedia-pageviews-still-in-decline?commentId=dbSwH66yM49YNtbpA

I’ve never been a big user of Wikipedia.