Friday, March 23, 2007
Is Wikipedia approaching a barrier?
A couple of years ago, I speculated that the growth of Wikipedia's articles would plateau somewhere between five and ten million articles. My reason for that speculation was simple: after a certain point, it becomes easier to work on existing articles than to create new ones.
As an example, consider all possible Wikipedia articles about Ethiopia. The first few subjects are easy to name: an article about the nation, its history, economy, geography, etc. -- all of the topics that the average encyclopedia would cover. The next step is a little harder, but still straightforward & easily done -- series of articles on related things, like its rulers or heads of state, the historic provinces or the current subdivisions, or major battles. (Warfare seems to be a perennial favorite topic, second only to pop culture.) Another avenue is mining online sources for further topics: translation from one electronic format to another is always faster than translation from print to electronic. Yet eventually all of the low-hanging fruit gets picked, and a would-be contributor finds it easier to improve existing articles than to create new ones.
There is also the dynamic that some articles -- either stubs or articles of marginal importance -- are merged into a single article. Consider the Patriarchs of Alexandria: of the first 12 office holders, only Mark the Evangelist and Demetrius are more than names, even for the most informed specialist. One of my long-procrastinated projects is to combine the entries of the other ten of these ancient religious leaders into a single article, with the little information we possess about them; when this is done, ten articles will become one -- decreasing the total article count.
That is why I found Sage Ross' analysis important: where I have been guessing, he did the necessary number-crunching to show that we are already approaching this limit. He writes:
Another side to the watershed, which nobody is quite recognizing yet, relates to the limits of Wikipedia. The exponential phase of (English) Wikipedia's growth (in terms of number of articles, and in terms of number of active users) is probably over. From 2003 to mid-2006, the number of articles had followed a very regular exponential pattern. Had exponential growth continued, it would have hit 2,000,000 a few weeks ago; it just passed 1,700,000 today. The average number of articles created per day since late December (around 1724) has actually been lower than the average number per day over the previous year (1823). This difference is only partly the result of the always slower holiday season.
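To make the numbers in that passage concrete, here is a minimal Python sketch of the arithmetic. The four figures are the ones Sage quotes; the function names are mine, and the last calculation assumes a constant daily rate and a five-million target (the low end of my own guess), which is an illustration rather than a forecast by either of us.

```python
# Figures quoted in Sage Ross's post (March 2007).
PROJECTED_ARTICLES = 2_000_000  # where the exponential trend would have been
ACTUAL_ARTICLES = 1_700_000     # count the English Wikipedia had just passed
RATE_RECENT = 1724              # new articles per day since late December
RATE_PREVIOUS_YEAR = 1823       # new articles per day over the previous year

# Assumption for illustration only: the low end of the plateau guess.
HYPOTHETICAL_TARGET = 5_000_000


def shortfall_fraction(projected: float, actual: float) -> float:
    """Fraction by which the actual count falls short of the projection."""
    return (projected - actual) / projected


def relative_drop(previous: float, recent: float) -> float:
    """Fractional decline from the previous rate to the recent rate."""
    return (previous - recent) / previous


def days_to_reach(target: float, current: float, per_day: float) -> float:
    """Days needed to reach target if the daily rate stayed constant."""
    return (target - current) / per_day


if __name__ == "__main__":
    gap = shortfall_fraction(PROJECTED_ARTICLES, ACTUAL_ARTICLES)
    drop = relative_drop(RATE_PREVIOUS_YEAR, RATE_RECENT)
    days = days_to_reach(HYPOTHETICAL_TARGET, ACTUAL_ARTICLES, RATE_RECENT)
    print(f"Shortfall against the exponential trend: {gap:.1%}")
    print(f"Drop in daily article creation: {drop:.1%}")
    print(f"Years to {HYPOTHETICAL_TARGET:,} at a constant rate: {days / 365.25:.1f}")
```

The point of the last line is only to show how different linear growth looks from the exponential curve Wikipedia had been following.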
Sage's conclusion is identical to mine: "It seems that available unwritten encyclopedic topics is becoming a significant constraint."
If we are correct, the principle of least work -- the easiest tasks will almost always be completed first -- would predict that the quality of Wikipedia's articles will gradually start to improve, because improving articles is becoming the easiest task on Wikipedia. At first this may take the form of automated edits -- running bots to make large numbers of repetitive changes. Eventually, though, someone will have to acknowledge the countless requests for sources that dot so many Wikipedia articles, and begin the long, tedious task of researching the issue and meeting that demand. It will be interesting to observe Wikipedia's reputation in schools and the mass media once that effort has made notable progress.
Geoff
Technorati tags: wikipedia.
Labels: speculations, wikipedia
Comments:
Note however: 1. More than half the articles created each day are shot on sight. 2. The AFD/CSD "notability" barrier is actively discouraging. I could write hundreds of articles I'm not going to bother starting because I can't be bothered justifying to an idiot why they can't find them on Google; imagine how people less obnoxious than me feel.
Also: this may just be a sign that low-hanging fruit is starting to be exhausted for easily accessed English resources.
Whenever I see someone say or think that perhaps Wikipedia is somewhere near completion, I go and re-read http://en.wikipedia.org/wiki/User:Piotrus/Wikipedia_interwiki_and_specialized_knowledge_test
Yes, I don't think anyone is claiming Wikipedia is near completion. Geoff and I probably both agree that exhaustion of low-hanging fruit is an issue. A crucial question is, to what extent does low-hanging fruit entice new editors into the fold who later tackle higher-value targets? Might rapid growth via the easy stuff hamper Wikipedia's long-term growth?
Wikipedia deletes itself after hoax entry http://ollysonions.blogspot.com/2007/03/wikipedia-deletes-itself-after-hoax.html
David, your figure for how many new articles are deleted each day has been fairly steady over the last two years, right? Then it shouldn't be a factor.
As for the "AfD/CSD 'notability' barrier", that may be another factor. As Wikipedia's coverage reaches further into esoteric subjects, it becomes more important for a Wikipedian to explain why the subject is important in the beginning of the article. We can more or less intuitively accept why a head of state is notable; why a given programmer or porn star is notable is not always obvious.
Geoff
Anonymous linked to Piotrus' essay, which is a valid criticism of the English Wikipedia. Piotrus' criticism could be extended further: I have found that the German Wikipedia has a number of articles that the English does not. The English Wikipedia can be justifiably criticized for being a provincial community within the Wikimedia universe: we pay far less attention to the other projects, and use far less of their content, than they do to us.
However, translation is one of those barriers that keeps sources like the Polish or German Wikipedia from being considered "low-hanging fruit". Another is the problem of writing general articles: it takes an informed mind to write a good article that adequately covers a general subject. This is the reason people often give when someone points out that our best articles are the specialized, esoteric ones -- not the general ones.
Geoff
Ollyonions, I searched your blog; I could not find the article you mention. Did you mean to write that you deleted your article on Wikipedia because it was a hoax?
Geoff