[wiki-standards] how to get raw material for stats?

Whiteg Weng whiteg at whiteg.net
Mon Oct 20 21:59:02 CEST 2008


Hi Denis,

The following two Wikipedia pages explain how to retrieve data from
Wikipedia, either as full database dumps or through the web API:
http://en.wikipedia.org/wiki/Wikipedia:Database_download
http://en.wikipedia.org/w/api.php
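
As a rough illustration of the second link, here is a minimal Python sketch
that asks the API for a batch of random articles together with their wiki
source text. It is only a sketch under assumptions, not an official recipe:
the https://<lang>.wikipedia.org/w/api.php address pattern, the parameter
set (generator=random, rvslots=main, formatversion=2) and the JSON layout
reflect my understanding of the current MediaWiki API, and the function
names and User-Agent string are made up for this example.

```python
import json
import urllib.parse
import urllib.request

# Assumption: every language edition exposes the API at this address.
API_URL = "https://{lang}.wikipedia.org/w/api.php"


def build_random_pages_url(lang="en", count=10):
    """Build an API URL asking for `count` random articles with their wiki source."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "generator": "random",   # pick pages at random
        "grnnamespace": "0",     # articles only, no talk or user pages
        "grnlimit": str(count),
        "prop": "revisions",
        "rvprop": "content",     # wiki source text of the latest revision
        "rvslots": "main",
    }
    return API_URL.format(lang=lang) + "?" + urllib.parse.urlencode(params)


def extract_wikitext(response):
    """Map page title -> wiki source text from a decoded API response."""
    pages = response.get("query", {}).get("pages", [])
    result = {}
    for page in pages:
        revisions = page.get("revisions")
        if not revisions:
            continue  # page came back without content, skip it
        result[page["title"]] = revisions[0]["slots"]["main"]["content"]
    return result


def fetch_random_pages(lang="en", count=10):
    """Download and decode one batch of random pages (needs network access)."""
    request = urllib.request.Request(
        build_random_pages_url(lang, count),
        # Hypothetical identifier; replace it with your own contact details.
        headers={"User-Agent": "wiki-markup-stats-sketch/0.1 (hobby project)"},
    )
    with urllib.request.urlopen(request) as reply:
        return extract_wikitext(json.load(reply))


if __name__ == "__main__":
    # Example: five random pages from the French edition.
    for title, text in fetch_random_pages("fr", 5).items():
        print(title, len(text))
```

Changing the lang argument ("fr", "de", "ja", ...) would give samples from
different language editions. For thousands of pages the database dumps from
the first link are probably the better choice, since they avoid many small
requests.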

Cheers.
Whiteg


On Oct 20, 2008, at 10:50 PM, spir wrote:

> Hello everybody,
>
> I'm new here, and this is my first post on the list. I'm no professional
> researcher, programmer, or web designer, only a hobbyist, and a
> lover of all kinds of languages.
>
> I'd like to do some statistics about the actual use of wiki language
> features: which ones are the most used, which ones could be left
> aside, how predominant the most used ones are, etc. I'm also really
> interested in seeing how cultural/linguistic pregnancy (?) may
> influence that.
> To do this, I plan to parse hundreds of wiki pages from Wikipedia or
> any other wiki that exists in multiple language versions. The issue
> for me is: how do I get the raw material? I have no clue. Could you
> help me with that? The resulting dataset could then be made available
> for further use. I would prefer random pages.
> I can cope with parsing (my code will not comply with any industry
> standard, but it will be clear and do the job ;-)). The data may be
> either the full page's HTML, the wiki document's HTML, or the wiki
> source text. It may also be DB extracts if I can get the format needed
> to decode them -- but I strongly prefer human-readable data.
>
> Thank you for your attention,
> Denis
