[wiki-standards] how to get raw material for stats?

spir denis.spir at free.fr
Tue Oct 21 22:42:39 CEST 2008


Hello Whiteg,

Thank you very much for the links. Exploring from there led me to at 
least 3 methods to get the data I need:
* through the API
* using the Special:AllPages and Special:Export special pages
* with a bot such as the ones described at http://botwiki.sno.cc/wiki/
This last solution may require some learning time to become familiar 
with the whole lot of Python-MediaWiki tools for 'botting' available 
there, but I guess I'll still favor it, as it may not be lost time at all.
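
For the first option, here is a minimal sketch of what fetching random 
pages through the API might look like. It assumes the parameters of the 
present-day api.php (generator=random, prop=revisions, rvslots=main, 
formatversion=2); older installations may expect different ones. The 
function names and the User-Agent string are only illustrative.

```python
import json
import urllib.parse
import urllib.request

# English Wikipedia endpoint; change the language code for other editions.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_random_query(limit=10):
    """Return a URL asking for `limit` random articles and their wiki source."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "generator": "random",
        "grnnamespace": "0",      # main (article) namespace only
        "grnlimit": str(limit),
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_wikitext(response):
    """Map page title -> wiki source text from a decoded API response."""
    pages = response.get("query", {}).get("pages", [])
    result = {}
    for page in pages:
        revisions = page.get("revisions")
        if not revisions:
            # The server may omit content for some pages; skip those.
            continue
        result[page["title"]] = revisions[0]["slots"]["main"]["content"]
    return result


def fetch_random_pages(limit=10):
    """Perform the request (needs network access) and return title -> text."""
    request = urllib.request.Request(
        build_random_query(limit),
        # Hypothetical identifier; use one that describes your own project.
        headers={"User-Agent": "wiki-markup-stats-sketch/0.1"},
    )
    with urllib.request.urlopen(request) as reply:
        return extract_wikitext(json.load(reply))
```

Calling fetch_random_pages(20) should give a dictionary of page titles 
and their raw wiki text, ready for parsing. A single request may return 
fewer pages than asked for, so a loop would be needed to collect 
hundreds of them.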
Thanks again,
Denis

Whiteg Weng wrote:
> Hi Denis,
>
> The following two Wikipedia pages explain how to retrieve data 
> from Wikipedia.
> http://en.wikipedia.org/wiki/Wikipedia:Database_download
> http://en.wikipedia.org/w/api.php
>
> Cheers.
> Whiteg
>
>
> On Oct 20, 2008, at 10:50 PM, spir wrote:
>
>> Hello everybody,
>>
>> I'm new here, this is my first post on the list. I'm no professional 
>> researcher, programmer, or web designer, only a hobbyist. And a 
>> lover of all kinds of languages.
>>
>> I'd like to do some statistics about the actual use of wiki language 
>> features: which ones are the most used, which ones could be left 
>> aside, how predominant the most used ones are, etc. I'm also really 
>> interested in seeing how cultural/linguistic pregnancy (?) may 
>> influence that.
>> To do this, I plan to parse hundreds of wiki pages from Wikipedia or 
>> any other wiki that exists in multilingual versions. The issue for 
>> me is: how to get the raw material? I have no clue. Would you help 
>> me with that? This dataset may then be made available for further 
>> use. I would prefer random pages.
>> I can cope with parsing (my code will not comply with any industry 
>> standard, but it will be clear and do the job ;-)). The data may be 
>> either the full page's HTML, the wiki doc's HTML, or the wiki source 
>> text. It may also be DB extracts if I can have the format to decode 
>> them -- but I highly prefer human-readable data.
>>
>> Thank you for your attention,
>> Denis
>>
>> _______________________________________________
>>
>> wiki-standards mailing list. wiki-standards at wikisym.org
>> http://www.wikisym.org/cgi-bin/mailman/listinfo/wiki-standards
>>
>> For the wiki-research, wiki-standards, wikisym-announce mailing 
>> lists, please see:
>> http://www.wikisym.org/cgi-bin/mailman/listinfo
>
>



