Monday, November 13, 2006
Can Wikipedia keep growing?
As I write this, the Main Page of Wikipedia states that it has 1,482,227 articles. (It's quite safe to assume that the actual number was at that moment a little higher -- and is even higher the moment you read this.) Over a million articles! Is it possible for this number to climb even higher -- say to two million and above?
If you poke around a little, you will find pages like Wikipedia: Size Comparisons, where it is shown that Wikipedia has 40% more articles than the largest print encyclopedia, the Spanish Enciclopedia universal ilustrada europeo-americana. Then again, if you poke around a little, you may wonder just how many articles are not about popular culture subjects like Lord of the Rings, Anime, video games, and The Simpsons television show.
And yet, the number of articles has been doubling on a fairly regular basis: according to Wikipedia: Modelling Wikipedia's growth, based on prior growth the size doubles once every 354 days. And if you think about it for a moment, there are still thousands -- if not tens of thousands -- of towns and villages outside the United States which lack articles, and each of these is on average connected to at least one person or event that is arguably notable enough for inclusion. That is only one possible case, and I am sure any reader can come up with her or his own. Looked at this way, it would appear that Wikipedia has barely scratched the surface of all of the possible -- and justifiable -- topics.
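To make the doubling figure concrete, here is a rough back-of-the-envelope sketch in Python. It assumes, purely for illustration, that the 354-day doubling time holds steady indefinitely, which is a simplification rather than a forecast; the function name and the milestone figures are made up for the example.

```python
import math

ARTICLES_NOW = 1_482_227   # Main Page count quoted at the top of this post
DOUBLING_DAYS = 354        # doubling time quoted from the growth-modelling page

def days_to_reach(target, current=ARTICLES_NOW, doubling_days=DOUBLING_DAYS):
    """Days until `target` articles, ASSUMING steady exponential growth."""
    return doubling_days * math.log2(target / current)

# Illustrative milestones only; they are not taken from the modelling page.
for milestone in (2_000_000, 5_000_000, 10_000_000):
    print(f"{milestone:>10,} articles in about {days_to_reach(milestone):.0f} days")
```

On those assumptions, two million articles would be less than half a year away and ten million less than three years away. Whether the doubling really can continue that long is another matter.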
However, I suspect that we are within sight of the practical limit of articles for Wikipedia. To put it in quantifiable terms, I expect that at some point between five and ten million articles, the number of new articles will dramatically fall off. There are a number of reasons for this:
The number of Wikipedians is finite. Something that occasionally gets overlooked is the simple fact that contributing to Wikipedia is a rather odd hobby. Further, writing for Wikipedia is not a way to gain fame or fortune because of its emphasis on group ownership of articles -- which leads to almost all articles having an anonymous personality. While this quality has a number of real benefits, it also leads to inevitable frustration among editors, who just as arguably feel that they do not receive enough recognition for their hard work -- which is a topic I will discuss in a later post.
And it takes people to write, revise, and revert the inevitable vandalism on articles. A person can only do so much, regardless of how addicted she or he might be to Wikipedia. Once the number of articles on Wikipedia has reached the practical limit our active members can handle, I wouldn't be surprised if a consensus emerges to stop accepting any new articles. That's not something I'd like to see, but if it needs to happen so we can keep the project going forward, it will happen.
So just how many potential Wikipedians are out there? If we could find a way to determine just how many people use Wikipedia, then compare that number to the number of accounts on Wikipedia (2,740,857 total), then compare that last number to the number of active editors (this information appears to be no longer available, but my seat-of-the-pants guess is that it is roughly equal to the number of Admins -- 1000), we might be able to extrapolate an answer. But if you consider that 2740 people have created an account on Wikipedia for every currently active editor, it's hard not to conclude that the pool of potential editors is very small.
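The arithmetic behind that ratio is easy to check. The sketch below uses only the figures quoted in this post; the count of 1,000 active editors is the guess made above, not a measured number, and the helper name is invented for the example. It also works out how many existing articles fall to each active editor on that same guess.

```python
ACCOUNTS = 2_740_857      # registered accounts quoted above
ARTICLES = 1_482_227      # Main Page article count quoted at the top of the post
ACTIVE_EDITORS = 1_000    # ASSUMPTION: the seat-of-the-pants guess, not a measurement

def per_active_editor(total, active=ACTIVE_EDITORS):
    """Average share of `total` that falls to each active editor."""
    return total / active

print(f"accounts per active editor: {per_active_editor(ACCOUNTS):,.0f}")
print(f"articles per active editor: {per_active_editor(ARTICLES):,.0f}")
```

If the real number of active editors is several times larger than the guess, both ratios shrink by the same factor, so the conclusion is only as good as that one estimate.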
Difficulty of writing a new article. How many people out there remember writing papers in college? Remember how difficult it was to write a five-page paper -- let alone a 15-page term paper? Writing articles from scratch for Wikipedia is no easier. I remember organizing the Wikipedia article on King Arthur, and being thanked for sorting it out. And almost four years later, that structure is still almost entirely in place. I could point to a few other examples, but one challenge in writing an article is figuring out how to present the information.
Then there is the challenge of research. Repeating what one might have seen on the television last night will only get an editor so far, and a Google search will not find that much information. The time comes when a serious article-writer has to visit the local public library (or a university library) to learn the information the article needs. And even then, a Wikipedian will encounter another barrier: sometimes the information is available only in a work the library does not have (e.g., anyone know where I can find a copy of the 1994 Ethiopian census?); sometimes the information is not available in print at all and requires original research.
Limits on what to include. Love it, hate it, or just wish we could find another way to figure out which articles we should include or exclude, some subjects will never have an article in Wikipedia. For example, there's a mailbox a few blocks from my house: although I can prove this object exists, I'll be surprised if it is ever the subject of its own article. Wikipedia is a reference work: if no one can be expected to want or need to read about a subject, it shouldn't have an article.
And there are only so many subjects worth an article on Wikipedia. For example, in Oregon, there are just over 90 state legislators: of these, at most a tenth at any time are worth an article on Wikipedia, but I suspect the number might be closer to one or two. Most of them, principled or hard-working as they might be, are doing little more than what their job requires -- and are not of interest to anyone who is not a constituent. And much the same can be said for people in many other fields.
And as I said above, sometimes one cannot find reliable information about a given subject. To put this another way, sometimes when I am researching a subject I reach a point where I am at the limit of what is currently known; I feel as if I am standing at the shoulder of a given expert, watching her or him consider the evidence in the middle of formulating a conclusion. This is not the cutting edge of knowledge: it is what is sometimes called the bleeding edge, where you gather the finest minds, explain the problem, and they tell you, "Well, this might work" -- but don't offer any guarantee because they honestly don't know.
For example, I've been writing a lot of articles on the local districts of Ethiopia -- called woreda -- and I have often encountered situations where the information is incomplete or contradictory. As a result, I wrote articles about woredas that I suspect do not exist; after working with the material for a while, one comes to a point where one can read between the lines and see what the actual facts are. Yet because of the rules of Wikipedia, if there are reasonably reliable sources then I have to write the article -- even if the cites are wrong. I justify this approach by remembering that people also look to reference works to learn about misinformation: by showing where a given fact or statement comes from, we let them determine for themselves that it is misleading.
This acknowledgement of limits to our current knowledge, I believe, is one of the justifications for replacing the rule against Original Research with one about Verifiability. An intelligent reader will accept that some points of knowledge are still undetermined; and knowing which ones are will help this reader evaluate other claims. Wikipedia therefore is meeting its mission in being a reference work.
This is weird: in a way, this argument defends the existence of stubs -- especially if they fail to mature into complete articles.
Geoff
Labels: wikipedia