Daily Archives: September 14, 2011

Session Preview: Understanding Wikipedia

The technical session Understanding Wikipedia will feature four presentations. See the schedule for details on when and where to go.

WP:Clubhouse? An Exploration of Wikipedia’s Gender Imbalance

Shyong (Tony) K. Lam, Anuradha Uduwage, Zhenhua Dong, Shilad Sen, David R. Musicant, Loren Terveen, John Riedl

Wikipedia has rapidly become an invaluable destination for millions of information-seeking users. However, media reports suggest an important challenge: only a small fraction of Wikipedia’s legion of volunteer editors are female. In the current work, we present a scientific exploration of the gender imbalance in the English Wikipedia’s population of editors. We look at the nature of the imbalance itself, its effects on the quality of the encyclopedia, and several conflict-related factors that may be contributing to the gender gap. Our findings confirm the presence of a large gender gap among editors and a corresponding gender-oriented disparity in the content of Wikipedia’s articles. Further, we find evidence hinting at a culture that may be resistant to female participation.

Gender Differences in Wikipedia Editing

Judd Antin, Raymond Yee, Coye Cheshire, Oded Nov

As Wikipedia has become an indispensable source of online information, concerns about who writes, edits, and maintains it have come to the forefront. In particular, the 2010 UNU-MERIT survey found evidence of a significant gender skew: fewer than 13% of Wikipedia contributors are women. However, the number of contributors is just one way to examine gender differences in contribution. In this paper we take a more fine-grained perspective by examining how much and what types of Wiki-work men and women tend to do. First, we find that the so-called “Gender Gap” in number of editors may not be as wide as prior studies have suggested. Second, although more than 80% of editors in our sample were men, among the bottom 75% of editors by activity-level, we find that men and women made similar numbers of revisions. However, among the most active Wikipedians men tended to make many more revisions than women. Finally, we find that the most active women in our sample tended to make larger revisions than the most active men. We conclude by discussing directions for future research.

Finding Patterns in Behavioral Observations by Automatically Labeling Forms of Wikiwork in Barnstars

David W. McDonald, Sara Javanmardi, Mark Zachry

Our everyday observations about the behaviors of others around us shape how we decide to act or interact. In social media the ability to observe and interpret others’ behavior is limited. This work describes one approach to leverage everyday behavioral observations to develop tools that could improve understanding and sense-making capabilities of contributors, managers and researchers of social media systems. One example of behavioral observation is Wikipedia Barnstars. Barnstars are a type of award recognizing the activities of Wikipedia editors. We mine the entire English Wikipedia to extract barnstar observations. We develop a multi-label classifier based on a random forest technique to recognize and label distinct forms of observed and acknowledged activity. We evaluate the classifier through several means, including use of separate training and testing datasets and the application of the classifier to previously unlabeled data. We use the classifier to identify Wikipedia editors who have been observed with some predominant types of behavior and explore whether those patterns of behavior are evident and how observers seem to be making the observations. We discuss how these types of activity observations can be used to develop tools and potentially improve understanding and analysis in wikis and other online communities.

What Wikipedia Deletes: Characterizing Dangerous Collaborative Content

Andrew G. West, Insup Lee

Collaborative environments, such as Wikipedia, often have low barriers-to-entry in order to encourage participation. This accessibility is frequently abused (e.g., vandalism and spam). However, certain inappropriate behaviors are more threatening than others. In this work, we study contributions which are not simply “undone” – but deleted from revision histories and public view. Such treatment is generally reserved for edits which: (1) present a legal liability to the host (e.g., copyright issues, defamation), or (2) present privacy threats to individuals (i.e., contact information).

Herein, we analyze one year of Wikipedia’s public deletion log and use brute-force strategies to learn about privately handled redactions. This permits insight about the prevalence of deletion, the reasons that induce it, and the extent of end-user exposure to dangerous content. While Wikipedia’s approach is generally quite reactive, we find that copyright issues prove most problematic of those behaviors studied.