[wiki-research] Loaded questions about Semantic Wikis

Desilets, Alain Alain.Desilets at nrc-cnrc.gc.ca
Fri Jul 14 15:22:06 CEST 2006


1. Successful Deployment of a Semantic Wiki Concept

We deployed a Semantic Wiki (based on the Semantic MediaWiki extension) at this year's European Semantic Web Conference <http://wiki.eswc2006.org/> (credit: Denny Vrandecic and Markus Krötzsch). Was it a success? From a semantic point of view, yes, very much so: the wiki was populated automatically with the people listed in the semantic delegates list, and users were able to create statements about their interests, affiliations, etc., which were then available as RDF/XML. There were still the issues of reaching widespread adoption/uptake and of not everyone being motivated to contribute, but I'd say these are generic wiki issues.

-- Alain:
Hum... That's getting closer, and I'll have a look at it for sure.

But it still doesn't count as a "real" use. One of the basic principles of usability engineering is that "You (yes, YOU!) are NOT the user". In other words, the people building a technology cannot really evaluate its usability and usefulness. The reasons are that (a) their skill and knowledge w.r.t. the technology being evaluated are not representative of the target end-user population and (b) they have a lot riding on the success of the technology. I do believe builders of a technology CAN get a FEEL for how useful it will be. But they have to work real hard at putting themselves in the shoes of end-users who are very different from themselves, and at fighting the very natural tendency to think that one's baby is beautiful. In my experience, few people (techies or not) are able to do this well. So the ultimate test is to have people from the actual target user population try the system.
----

2. Semantic Wiki: Consumer or Producer of Data for the Semantic Web?

> 2) If so, how do they get around the issue of making it easy and worth
> it for end-users to provide the metadata that Semantic Wikis are
> based on?
>
> 3) Also, what *used* features (i.e. features that people have used for 
> real in a real operational situation) does the metadata enable that 
> could not be provided otherwise?

Drawing on the ESWC2006 experience again, then part of the answer to your question 2) is that the semantic delegates list (for example) was produced automatically from the conference registration system, and then consumed by the semantic wiki; 

-- Alain:
Presumably that data was published as a list of machine readable facts that said things like:

Jack Sparrow was-a-delegate-at ESWC06
Wil Turner was-a-delegate-at ESWC06
Etc...

Note: Right away you can tell how little I know about semantic web by the fact that I have no idea how to format these facts using the conventions of semantic web ;-).
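
A minimal sketch of how such facts might be written down as subject-predicate-object triples, here rendered one per line in the N-Triples style. The example.org namespace and the helper name to_ntriples are assumptions made up for illustration; the real ESWC2006 data may well have used different URIs and property names.

```python
# Hypothetical namespace; the real ESWC2006 data may have used different URIs.
EX = "http://example.org/eswc/"

# The facts from the message above, as (subject, predicate, object) tuples.
facts = [
    ("Jack_Sparrow", "was-a-delegate-at", "ESWC06"),
    ("Wil_Turner", "was-a-delegate-at", "ESWC06"),
]

def to_ntriples(subject, predicate, obj, base=EX):
    """Render one fact as an N-Triples line: three <URI> terms and a final dot."""
    return f"<{base}{subject}> <{base}{predicate}> <{base}{obj}> ."

lines = [to_ntriples(*fact) for fact in facts]
print("\n".join(lines))
```

Each line names the thing being described, the relationship, and the value, all as URIs, so that two data sources using the same URIs are talking about the same things.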

Here are a few more questions:

A) Can you give me an example of a useful inference that the machine could make based on those facts?
B) Can you give me a scenario where an end user would benefit from the above inference? By "scenario", I mean something very concrete like: "Elizabeth Swan is a delegate at ESWC06 and wants to know if the infamous Jack Sparrow will also be there, so she asks the system for a list of delegates and sees a list with Jack's name in it".

Note that B) should be something that can't be done without using the semantic metadata encoded in the pages. For example, the scenario I provided above does not meet that criterion, because you could just as easily have written a dynamic web page that pulls the list of delegates directly from the same source used to automatically generate the metadata, and renders it as an HTML page instead of a list of metadata facts.
----

a pre-populated wiki and no human data creation required. It could then be supplemented by manual edits that could themselves produce more data for the semantic web in the form of RDF triples. One reason for people doing this is being able to say more about themselves (quite a motivator in my experience ;), e.g. <http://wiki.eswc2006.org/index.php/User:Tom_Heath>

-- Alain:
That's great. Now we're getting into something concrete.

So we have a user writing this:

***
User:Tom Heath  Affiliation  Open University  +
User:Tom Heath  Nationality  UK  +
User:Tom Heath  Attends  ESWC2006  +
User:Tom Heath  Interested in  recommender systems  +, and social networks on the Semantic Web  +
***

I can see how, if everyone in the world agreed on the exact words and syntax to be used to describe these sorts of facts, and was able to use them correctly and consistently, we would have something very useful. I could go to SemanticGoogle (I bet they're working on it as we speak) and write a query that essentially asked: "Show me all the researchers in the UK who are interested in 'semantic web' and 'HCI'".
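
A rough sketch of what such a query could look like over a small in-memory set of facts. The Tom Heath rows mirror the statements quoted above; the "Jane Doe" and "John Roe" rows, and the function names, are invented purely for illustration. This is plain Python set matching, not any real Semantic Web query language.

```python
# Facts as (subject, property, value) triples. The Tom Heath rows mirror the
# wiki page quoted above; the Jane Doe and John Roe rows are invented.
triples = [
    ("User:Tom Heath", "Affiliation", "Open University"),
    ("User:Tom Heath", "Nationality", "UK"),
    ("User:Tom Heath", "Attends", "ESWC2006"),
    ("User:Tom Heath", "Interested in", "recommender systems"),
    ("User:Tom Heath", "Interested in", "social networks on the Semantic Web"),
    ("User:Jane Doe", "Nationality", "UK"),
    ("User:Jane Doe", "Interested in", "semantic web"),
    ("User:Jane Doe", "Interested in", "HCI"),
    ("User:John Roe", "Nationality", "Canada"),
    ("User:John Roe", "Interested in", "HCI"),
]

def subjects_with(prop, value):
    """Return every subject that has exactly this property/value pair."""
    return {s for s, p, v in triples if p == prop and v == value}

def find_people(nationality, interests):
    """People of the given nationality who list ALL of the given interests."""
    result = subjects_with("Nationality", nationality)
    for interest in interests:
        result &= subjects_with("Interested in", interest)
    return sorted(result)

matches = find_people("UK", ["semantic web", "HCI"])
print(matches)  # ['User:Jane Doe']
```

Note that Tom Heath is not returned: his interest is recorded as "social networks on the Semantic Web", which does not match the exact string "semantic web". That is the vocabulary-agreement problem in miniature.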

But how will we get to a point where we all agree on this? How large will the language be that people need to master in order to express all the important facts they might want to express in a Semantic Web page either regarding their work life or private life? And given that this will have to be a precise machine readable language, will people actually be able to use it?
----

As for your question 3), I would argue that the full range of user interaction and functionality that a semantic web enables is only now beginning to be properly explored (about time too imho :) e.g. see <http://swui.semanticweb.org/swui06/>

-- Alain:
I was hoping someone would give me a couple of really cool and convincing examples, without me having to read a bunch of papers ;-).
----

3. How to Create all the Semantic Web Data?

> How do you convince people to write this metadata, given that it will
> require a significant amount of work beyond just authoring the
> content?

You're absolutely right - motivating people to create this data is very hard, so we need to use a wide range of different methods, including some where creating the content *is* also creating the metadata. Very few people will edit RDF directly; it's just too laborious. However, something like the OpenGuides software <http://openguides.org/> allows people to make wiki entries and fill out forms for specific bits of information, which then get automatically exported as RDF/XML. The software does this out of the box, and most users don't even (need to) know. This sort of model is one way of getting people to create data for the semantic web without needing to understand what's going on beneath the surface.
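
To make the form-to-RDF idea concrete, here is a minimal sketch of turning a dictionary of form fields into an RDF/XML description. It is not OpenGuides' actual export code or vocabulary: the example.org namespace, the field names, and the function form_to_rdfxml are all assumptions for illustration. Only the rdf: namespace URI is the real one.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"  # standard RDF namespace
EX_NS = "http://example.org/guide#"  # hypothetical vocabulary, not OpenGuides' own

# Ask ElementTree to use readable prefixes when serializing.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("ex", EX_NS)

def form_to_rdfxml(page_uri, form_fields):
    """Build one rdf:Description for the page, with one property per form field."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": page_uri}
    )
    for name, value in form_fields.items():
        prop = ET.SubElement(desc, f"{{{EX_NS}}}{name}")
        prop.text = value
    return ET.tostring(root, encoding="unicode")

# What a user might have typed into a form (invented values).
form = {"name": "The Example Arms", "category": "Pub", "postcode": "AB1 2CD"}
rdf_xml = form_to_rdfxml("http://example.org/wiki/The_Example_Arms", form)
print(rdf_xml)
```

The person filling in the form only ever sees three text boxes; the mapping from field names to properties is decided once, by whoever sets up the software.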

-- Alain:
That's interesting. I think that could actually work. Essentially, ordinary users wouldn't have to agree on and master a standardized semantic tagging language. 

But it would still require the IT industry to agree on a very large standardized language for describing the important things in different domains. But maybe this is amenable to divide-and-conquer. For example, the financial industry could figure out a standardized way of expressing the important semantics relevant in their domain. Universities could do the same, government, etc... But I think even that will be a big challenge. Like I said, we can't even agree on a standard for JavaScript.
----

4. Semantic Web for Reasoning or Data Integration (or both)?

Lots of noise is made about the reasoning aspects of the semantic web, but that's not the full story. Others would argue that the semantic web is primarily a platform for information integration (e.g. see McBride (2002): <http://scholar.google.com/scholar?hl=en&lr=&cluster=4489085341494063500>).
I'll leave you to make up your own mind ;)

-- Alain:
There are already a bunch of standard data exchange formats relevant to various industries. Actually, for each industry, there is usually more than one competing format, I believe. It seems to me that the Semantic Web may be able to contribute by unifying the syntax of all those different languages, so that you don't have to write a different parser every time. And that may enable some interesting things that are currently not possible. But I'll need to see concrete examples to really understand what the benefits are.
----

5. Schemas, Vocabularies, Ontologies, and Getting it Right

Firstly I think it's worth distinguishing between schemas/vocabs/ontologies for describing the contents of a wiki article, and those for the rest of the wiki "page". In terms of how the contents of an article are made available semantically, an article about a famous winemaker would likely use some classes and properties from FOAF and some from the Wine ontology. Being able to mix and match vocabularies without having to validate a "document" against one definitive schema, XML-style, is one of the beauties of RDF. So, a heterogeneous, open world of many different vocabs and ontologies (some small, some big, many overlapping) is where we're headed.
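
A small sketch of what "mix and match" might look like in practice: one resource described with properties drawn from two vocabularies at once. The FOAF namespace URI is the real one; the wine namespace, the property producesWine, and all the example.org URIs are placeholders invented for illustration, not terms taken from the actual Wine ontology.

```python
FOAF = "http://xmlns.com/foaf/0.1/"   # real FOAF namespace
WINE = "http://example.org/wine#"     # placeholder standing in for a wine vocabulary

winemaker = "http://example.org/wiki/Some_Winemaker"  # hypothetical article URI

# One description, predicates from two different vocabularies, no single schema.
description = [
    (winemaker, FOAF + "name", "Some Winemaker"),
    (winemaker, FOAF + "homepage", "http://example.org/~winemaker"),
    (winemaker, WINE + "producesWine", "http://example.org/wiki/Some_Red"),
]

def vocabularies_used(triples, known):
    """Report which known vocabularies the predicates of these triples come from."""
    used = set()
    for _, predicate, _ in triples:
        for prefix, namespace in known.items():
            if predicate.startswith(namespace):
                used.add(prefix)
    return sorted(used)

used = vocabularies_used(description, {"foaf": FOAF, "wine": WINE})
print(used)  # ['foaf', 'wine']
```

Because every property is identified by a full URI, a consumer that only understands FOAF can still pick out the name and homepage and simply ignore the wine-specific statement.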

As for people getting it right, as you say, they won't ;) at least not always. The issue is probably more about how they get it wrong. Luckily the semantic web assumes information may be contradictory; the issue is then who or what to trust.

-- Alain:
That's VERY interesting. I tend to be biased more towards statistical approaches than rule-based ones, because they are the ones that seem to work best in practice. All search engines are statistically based, the current best machine translation systems are all statistically based, and successful speech recognition systems are all statistically based.

Looking at the search engine example, the reason why they work better than rule based ones is that:

A) The data has lots of noise.
B) Statistical methods are robust to noise
C) You have lots of data to compute stats from (so that you can measure the noise and signal and distinguish them from each other)
D) In the context of the web, precision matters more than recall

What you are saying is that B) also holds for at least some of the inference engines being developed for the Semantic Web. Now, C) may eventually hold if we can find a way to make it easy for machines or people to produce this data at a large scale. And D) will certainly hold for at least SOME SemanticWiki approaches. For example, when looking for "the best prices for a Toyota Corolla in the Ottawa-Montreal corridor", I'm not REALLY looking for the BEST price. I just want to find a really good rock-bottom price.

I would also guess that A) will be true for the semantic metadata, but much less so than for free-form text.
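
As a toy illustration of points A) to C), here is one very simple way redundancy can absorb noise: when several sources make contradictory statements about the same fact, take the most common value and report how much support it has. The statements and source names are invented, and a majority vote is only a stand-in for whatever real trust or statistical model an actual system would use.

```python
from collections import Counter

# Invented, partly contradictory statements about one person's affiliation.
statements = [
    ("source_a", "Open University"),
    ("source_b", "Open University"),
    ("source_c", "Open Universty"),   # noisy entry (typo)
    ("source_d", "Open University"),
]

def majority_value(statements):
    """Return the most frequently asserted value and its share of all statements."""
    counts = Counter(value for _, value in statements)
    value, votes = counts.most_common(1)[0]
    return value, votes / len(statements)

value, support = majority_value(statements)
print(value, support)  # Open University 0.75
```

With only one or two statements there is nothing to vote on, which is why C), having lots of data, matters as much as B).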
----

Ok, this is quite long enough. Hope it's of some use, whether or not it convinces you one way or another ;)

-- Alain:
Thank you for taking the time to write it. It was very informative. 

I'm almost convinced ;-).
----


