Original Research: 1.8 million articles

Friday, July 13, 2007

1.8 million articles

The good news: Wikipedia has over 1.8 million articles. You can find something about practically *any* subject.

The bad news: Wikipedia has over 1.8 million articles. Not every article has been reviewed, perhaps not even most of them. One can't say with a straight face anymore that Wikipedia is peer-reviewed; with Wikipedia, we're on the verge of re-inventing the World Wide Web in all of its instability and unreliability.

And Andrew Lih once accused me of optimism.

In the last week, a number of us Wikipedia-related bloggers have ranted about dealing with deletion-happy Wikipedians. The bloggers include Andrew, Ben Yates, Kelly Martin, yours truly, and most recently Urpo Lankinen. The deletion-happy Wikipedians are the kind of people who can quote policy at length, sometimes by reference or shortcut (e.g. "A7", "WP:NOT"), but can't explain what it means or how it applies to save their lives. I regret to say that this is just the tip of an iceberg -- a problem caused by the challenge of managing over 1.8 million articles.

The problem with so many articles is that people turn to tools to cope with this growing monster. The one tool I've had an excuse to rant about -- but didn't, because I honestly didn't want to post that much anger and bile about any one topic -- is the use of bots to tag images for deletion over questions about their fair-use status. Yes, there is a problem with abuse of this intentionally narrow exception to Wikipedia's general policy of free content, but it is a problem that needs to be solved with a modicum of thought and care -- not with a bot following an algorithm built from if/then statements. However, taking 30 seconds to determine whether a given image obviously falls within fair-use guidelines, and whether the problem lies in an easily fixable omission or minor error, means that fewer images are tagged than if the bot is allowed to make an edit every 6 seconds -- and with hundreds of thousands of images to be tagged, editcount is the important thing. (Augh. I didn't mean to write this much, so I'll stop here.)

In other words, because it is easier to tack a tag or template onto an article than to fix it, Wikipedians tend to make these kinds of edits rather than fix the article. And if a Wikipedian can create a bot to do the tagging for her or him, these are the kinds of edits that Wikipedian will make to the exclusion of all others. A bot tags -- but a bot won't read the article. So typos and factual inconsistencies, some of which even the newest of Wikipedians could see and fix with a click, some typing, and another click, persist for months. I have found simple mistakes like these in articles I wrote months ago, embarrassedly fixed them, then wondered if this was evidence that no one reads what I write. Obviously it is not critically read, so any errors in the text spread beyond Wikipedia to its countless mirrors, and beyond into the various dark corners of the Internet, where they hibernate until someone reintroduces them into the mainstream once again.

But good, critical reworking of an article takes time -- sometimes months, because sources have to be found, read, and understood. I've mentioned in an earlier post that I was trying another tactic for solving this problem -- adding facts from books, as I read them, to the relevant existing articles -- but this only speeds up the process in some cases. No two articles are structured in the same manner (beyond having a lead paragraph, a body, and sections providing links and sources at the end), even on similar topics. And once I start making one change to an article, I often find that other edits are needed -- typos and grammar fixed, wording tightened, maybe another fact or two added and sourced -- and links fixed. (I have this obsession about making sure that links point to the right article.)

However, I understand the other side of this problem: a newbie discovers Wikipedia and wants to prove that she or he not only belongs but is as good as -- or better than -- the current crowd. Every new Wikipedian wants to compete in the marketplace of improving Wikipedia, and being human, the newbie will find the most efficient -- or easiest -- way to compete. Newbies tend to make more edits rather than better ones; and it's always easier to create articles than to write Featured ones. They are attracted to policy issues: they try to formulate new policies and rewrite old ones (okay, so do old-timers like me, but I see it as a break from the writing, not a substitute for it), or to enforce policy. Fighting vandals, reverting spam and other bad edits, and debating in Wikipedia:Articles for Deletion appear to be very popular for this reason. All of this is valuable work, but policy issues are best handled when one has experience with how Wikipedia works: after one has written a few articles, debated a few changes, followed a few threads about the problems of enforcing policy -- and learned that policy really isn't as important as it might first appear.

If someone joins Wikipedia to write encyclopedia articles, and makes the usual effort not only to conform to community standards (which are described, not legislated, in the policy pages) but also to get along with people, she or he really doesn't need to pay attention to what the policy pages actually say.

Another error I've seen these newbies make is thinking that all of those pages in the Wikipedia namespace are legislated rules, rather than descriptions of processes and considerations. If I may compare the ideal behavior of Wikipedia to a river (both are attempts to go somewhere), the guidelines are not an attempt to turn it into a channel confined between dykes paved with stone, but an effort to remove the worst of the sandbars and snags on one hand, while on the other discouraging people from settling in the floodplain. The first way will always be more efficient than the second, but not only does the first have less charm, and often less beauty, than the second; it is also less robust and adaptable. This is something experience teaches, and a good reason why newbies should avoid involving themselves in policy until they gain experience.

(Someone might insist that there are exceptions to that last statement. Some Wikipedians can demonstrate that they possess clue with their first edits, and some of us are still newbies after many years of active participation. In such cases, a reasoned argument for why an exception should be made is the best way to decide. Perhaps, to become a full-fledged Wikipedian, candidates should have to write an essay explaining what they believe Wikipedia is -- and convince us that they know. I'm not sure many of us would pass such a test.)

Returning to my point: when people compete, it is natural for them to seek the most efficient or easiest way to compete. Competition itself is not always a bad thing; it is how the better eventually wins out over the good. However, this human tendency can sabotage the positive nature of competition; instead of playing the game, people game the rules. Instead of lowering prices and raising quality, both prices and quality are lowered, and all players find themselves trapped in a race to the bottom. I see this happening in Wikipedia: far more effort is spent arguing over things like original research and permitted fair use than on improving articles. People may win those arguments, and prevail in their bot-enabled mass deletions, but the average quality of Wikipedia's articles will remain the same.

Yet even if, by some act of God, every current editor in Wikipedia were replaced by a group of superhumans endowed with sufficient wisdom and learning to work together harmoniously and write articles of such great quality as to make the Encyclopaedia Britannica look as reliable as the Weekly World News, the problem facing Wikipedia would remain: it has over 1.8 million articles. Unless this becomes their full-time job, by the time our super-Wikipedians are rewriting the last articles, they will find that the first rewrites need to be reviewed and updated; human knowledge, natural phenomena and history wait for no one.

Geoff

Technorati tags: online communities, wikipedia

Labels: speculations, wikipedia

# posted by llywrch @ 2:57 PM

Comments:

"Wikipedia-related bloggers have ranted about dealing with deletion-happy Wikipedians"

Feel free to add me to your list... ;-)

The Wikipedia Bureaucracy

# posted by Anonymous : 11:59 AM, November 14, 2007
