
Types of wiki metrics

There are several kinds of data that could be collected to evaluate a wiki:

  • Direct user observation and surveys - This is the most direct way of measuring how people work with their wikis, and observing beneficial activities that result.
  • Analysis of wiki content and logs - This entails indirectly observing beneficial activities in the wiki by observing the evolution of wiki content without the involvement of its users. Measurements must be carefully chosen in order to most accurately capture the significance of the events that are observed. Measurements in this category are the ones most frequently seen.
  • Indirect measurement by tracking organizational performance - These metrics attempt to quantify the impact of the wiki by measuring its concrete impact on the organization. This data can be powerful for justifying the continued operation of the wiki, but confounding variables are a problem because the observed changes could have been the result of other factors, and not the wiki. Correlating organizational performance measurements with analysis of the wiki content could mitigate this risk.

Analysis of wiki content and logs

This type of wiki analysis entails directly measuring some quantity on the wiki (without involving the users) and using it to demonstrate that the wiki is adding value to the organization or fulfilling some goal. This is most commonly seen when the number of users or pages on the wiki is cited as evidence that the wiki is popular, implicitly justifying its existence.

Because corporate wikis can provide various benefits (see Benefits of corporate wikis), the wiki practitioner must decide which usage pattern they want to measure. Some measurements can distinguish different usage patterns, while other measurements cannot. For example, counting the number of "hits per day" on a wiki can measure the volume of activity on a wiki but cannot distinguish any of the usage patterns.

Listed below are some wiki usage patterns, and proposed metrics which can distinguish a more collaborative usage pattern from a less collaborative one. The goal is to quantify the collaborative activity occurring within a wiki, which could result in more meaningful statistics than simply counting page views or logins.

A place to store your notes

  • Simple statistics (page counts, user counts, page view counts...)

Sharing information within an organizational unit

  • Number of links - Links between pages facilitate discovery by other members of the organization
  • Page view counts (by user) - Information sharing does not occur when a wiki page is read by its sole author. Each page view can therefore be weighted by the amount of content on the page that was not written by the viewer.
  • Number of links (weighted by user) - Links between pages authored by the same user might not facilitate information sharing. Links can be weighted using cosine similarity (see below) such that links between pages with distinct authorship have more value.
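
The page view weighting described above could be sketched as follows. This is only an illustration under stated assumptions: the function name `weighted_page_view` and the input format (a mapping from author to the amount of content, such as characters, that the author contributed to the page) are hypothetical and not taken from any particular wiki engine.

```python
def weighted_page_view(viewer, content_by_author):
    """Return the value of one page view, between 0.0 and 1.0.

    content_by_author maps each author to the amount of content
    (for example, characters) that the author contributed to the page.
    Both the name and the input format are assumptions for illustration.
    """
    total = sum(content_by_author.values())
    if total == 0:
        return 0.0  # an empty page shares no information
    own = content_by_author.get(viewer, 0)
    # Share of the page that was written by someone other than the viewer.
    return (total - own) / total


page = {"alice": 300, "bob": 700}
print(weighted_page_view("alice", page))  # alice wrote part of the page
print(weighted_page_view("carol", page))  # carol wrote none of it
```

Under this weighting, a view by the sole author of a page counts as 0, and a view by someone who contributed nothing to the page counts as a full view.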

Sharing information across organizational units

  • Page view counts (by organization) - In the same way that we can weight page views by the amount of content not written by the viewer, we can also weight page views by the amount of content not written by someone in the same organization as the viewer.
  • Number of links (weighted by organization) - In the same way that we can weight links to give additional value to links between pages with distinct authorship, we can also weight links to give value to pages authored by distinct organizations.
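
One possible way to move from users to organizations is to aggregate each author's contribution by organization before weighting. The sketch below assumes a hypothetical `org_of` mapping from user to organizational unit (for example, exported from a staff directory); the function names and data layout are illustrative only.

```python
from collections import defaultdict


def content_by_org(content_by_author, org_of):
    """Sum each author's contribution into the author's organization.

    Authors missing from org_of are grouped under the key None.
    """
    totals = defaultdict(int)
    for author, amount in content_by_author.items():
        totals[org_of.get(author)] += amount
    return dict(totals)


def org_weighted_page_view(viewer, content_by_author, org_of):
    """Share of the page written outside the viewer's organization."""
    by_org = content_by_org(content_by_author, org_of)
    total = sum(by_org.values())
    if total == 0:
        return 0.0
    viewer_org = org_of.get(viewer)
    # A viewer with no known organization is treated as external to everyone.
    same_org = by_org.get(viewer_org, 0) if viewer_org is not None else 0
    return (total - same_org) / total


org_of = {"alice": "eng", "bob": "eng", "carol": "sales", "dave": "sales"}
page = {"alice": 300, "bob": 200, "carol": 500}
print(org_weighted_page_view("dave", page, org_of))
```

The aggregated per-organization totals can be fed to the same co-authorship measures that are otherwise computed per user.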

Collaborative synthesis of ideas

  • Co-authorship measures - These measures can be applied within an article or wiki-wide.
    • Number of distinct editors
    • Author entropy - An information entropy formula (see below) can be used to weight the number of distinct editors by the number of edits that each editor made. When the editing volume is highly unequal (one user made a disproportionate number of edits), the number of distinct editors is weighted downward to reflect the smaller contribution of the other authors.
    • Gini coefficient of authorship - This is another way to measure the unequal contributions from the various editors of a wiki page (see below). The Gini coefficient has previously been used in wiki research to measure the extent to which content in a page has been contributed by one editor (or a small community of editors), and it has been correlated with article quality (citation (external link)).
    • Distinct authors by organization, author entropy by organization, Gini coefficient by organization - The three measurements above can also be computed across organizations instead of users, to measure collaboration across organizational boundaries.
    • Interlocking edits by user - A measure based on "interlocking edits" (see below) has been proposed to identify discourse occurring within a wiki page.

Formulas for analyzing wiki content

Gini coefficient

The Gini coefficient is a formula that measures how unequally a quantity is distributed across a population. The coefficient is scaled such that a perfectly equal distribution has a value of 0.0 and the most unequal distribution (where one member of the population is the only one with the quantity) has a value of 1.0.
In wikis, the Gini coefficient is normally used to measure the degree to which some users use the wiki more than others. The "population" in this case is the wiki users, while the "quantity" is the number of reads or edits to a certain article (or the whole wiki).
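
As a rough illustration, the coefficient can be computed from a list of per-user edit (or read) counts. The sketch below uses one common discrete formula over the sorted values; the function name `gini` and the input format are assumptions. Note that with this particular formula the most unequal case among n users yields (n - 1) / n, which approaches 1.0 only as the population grows.

```python
def gini(values):
    """Gini coefficient of a list of non-negative counts.

    Uses the discrete formula
        G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    where x_i are the values sorted in ascending order and i starts at 1.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # nothing to distribute, treat as perfectly equal
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


edits_per_user = [5, 5, 5, 5]      # every user edits equally
print(gini(edits_per_user))
edits_per_user = [0, 0, 0, 10]     # one user makes every edit
print(gini(edits_per_user))
```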

Author entropy

Several authors have proposed using an information entropy measure to provide a weighted count of the number of authors of a document. This is done by computing the empirical distribution of the authors of each edit, and then raising 2 to the power of the entropy of that distribution (where the entropy is measured in bits). When all authors made the same number of edits to a wiki article, the resulting value equals the number of unique authors. If some authors contributed fewer revisions than others, the value will be less than the number of unique authors.

The result is that the author entropy reflects the number of users that have edited a wiki article, while giving more weight to editors who have made a large number of edits. Using this measurement, a user who contributes only one edit to a frequently-edited article barely affects the statistic. In a simple count of unique authors, by contrast, a user who contributed only one edit affects the statistic as much as an author who contributed 1000 edits.

Author entropy is described in the paper Author Entropy: A Metric for Characterization of Software Authorship Patterns (external link) by Q Taylor, J Stevenson, D Delorey, C Knutson.
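
A minimal sketch of the calculation described above, assuming the edit history is available as a list with one author name per revision. The function names `author_entropy_bits` and `effective_authors` are hypothetical, and the paper may define or normalize the measure differently.

```python
import math
from collections import Counter


def author_entropy_bits(edit_authors):
    """Shannon entropy (in bits) of the empirical author distribution."""
    counts = Counter(edit_authors)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def effective_authors(edit_authors):
    """Weighted author count: 2 raised to the entropy in bits."""
    if not edit_authors:
        return 0.0
    return 2 ** author_entropy_bits(edit_authors)


# Four authors with one edit each: the weighted count equals four.
print(effective_authors(["a", "b", "c", "d"]))
# One dominant author plus a single drive-by edit: barely above one.
print(effective_authors(["a"] * 1000 + ["b"]))
```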

Cosine similarity

When analyzing links between two pages, it may be helpful to determine if two pages have similar authorship. When collecting statistics, different significance may be placed on links between pages with identical authors and pages with distinct authors.

The authorship of a page can be represented by a vector, where the number of dimensions is equal to the number of authors in the wiki, and the value of each dimension is the number of edits that the author made to the given page. Then, the cosine similarity of the vectors will be the authorship similarity of the two pages.

This same concept is used in Information Retrieval for computing document similarity through term-frequency vectors.
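
The authorship vectors can be represented sparsely as mappings from author to edit count, so that authors who never edited a page simply do not appear. The sketch below is illustrative: the names `authorship_similarity` and `link_weight` are assumptions, and using one minus the similarity as a link weight is only one possible choice, not a formula given by the text.

```python
import math


def authorship_similarity(edits_a, edits_b):
    """Cosine similarity of two pages' author -> edit-count mappings."""
    dot = sum(n * edits_b.get(author, 0) for author, n in edits_a.items())
    norm_a = math.sqrt(sum(n * n for n in edits_a.values()))
    norm_b = math.sqrt(sum(n * n for n in edits_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a page with no edits is similar to nothing
    return dot / (norm_a * norm_b)


def link_weight(edits_a, edits_b):
    """Assumed weighting: links between distinct authorship count more."""
    return 1.0 - authorship_similarity(edits_a, edits_b)


page_1 = {"alice": 3, "bob": 4}
page_2 = {"carol": 7}
print(authorship_similarity(page_1, page_1))  # same authorship
print(link_weight(page_1, page_2))            # fully distinct authorship
```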

Interlocking

In the paper Methods and Measures for the Analysis of Corporate Wikis: A Case Study (external link) by S Blaschke and K Stein, an "interlocking" measure was proposed to determine when collaborative exchanges have taken place in a wiki.

With the interlocking measure, collaborative intensity is measured using the number of times that authors alternate in an article's edit history. Extensions are given for measuring collaborative intensity when there are multiple authors. The interlocking measurement can also be used to detect pairs of users in a wiki that tend to work together.
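
The core idea of counting author alternations can be sketched as below. This is a simplified illustration and not the exact definition from the paper: it assumes the edit history is a chronological list of author names, and the function names are hypothetical.

```python
from collections import Counter


def count_alternations(edit_authors):
    """Number of consecutive revisions made by different authors."""
    return sum(
        1 for prev, cur in zip(edit_authors, edit_authors[1:]) if prev != cur
    )


def alternating_pairs(edit_authors):
    """Count how often each unordered pair of authors follows one another."""
    pairs = Counter()
    for prev, cur in zip(edit_authors, edit_authors[1:]):
        if prev != cur:
            pairs[frozenset((prev, cur))] += 1
    return pairs


history = ["a", "a", "b", "a", "c"]
print(count_alternations(history))
print(alternating_pairs(history).most_common(1))
```

Summing the pair counts over all articles in the wiki would give a rough indication of which users tend to work together.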

Strategies for analyzing wiki content

Several measurements are outlined above that can gauge the amount of collaborative activity in the wiki. These measures could be used to identify "early adopters" who are using the wiki in a highly collaborative way. These users could then be targeted for future pilot projects or other initiatives.


Indirect measurement by tracking organizational performance


Some ideas in this section are from the discussion in the corporate wikis workshop at WikiSym 2009.

This category of metrics gauges how the wiki affected the organization by measuring the same quantity before and after a user adopted the wiki. In order to make this kind of measurement, the expected impact of the wiki on the organization must be quantified, which is not always straightforward. For example, deploying a wiki could reduce the number of phone calls within an organization (by providing another way to collect information), but it could also increase the number of phone calls between organizational units (because the wiki uncovers opportunities for collaboration within an organization, leading to increased communication).


Baselines for wiki metrics


When collecting quantitative wiki metrics, it is often helpful to have a baseline value for the measurement, in order for the numbers to result in meaningful information.

One way to set a baseline is to compare a measurement in one wiki with the same measurement in other wiki projects. Implementing this is straightforward, but it is of questionable validity because the purposes of the wikis and their user bases could be qualitatively different. If wikis are compared in this way, a similar wiki should be chosen for comparison.

Another way to set a baseline is to measure something both before and after the wiki is implemented. This can allow the benefit of the wiki to be quantified, but choosing a suitable measurement is difficult: the wiki itself cannot be measured before it exists, so the quantity measured must be something outside the wiki.

Measurements of the wiki could also be tracked over time in order to measure how a specific policy or software change affects the wiki.