[wiki-standards] how to get raw material for stats?
spir
denis.spir at free.fr
Tue Oct 21 22:42:39 CEST 2008
Hello Whiteg,
Thank you very much for the links. Exploring from there led me to at
least three methods to get the data I need:
* through the API
* using the Special:AllPages and Special:Export special pages
* with a bot such as the ones described at http://botwiki.sno.cc/wiki/
This last solution may require some learning time to become familiar
with the whole lot of Python-MediaWiki tools for 'botting' available
there, but I guess I'll still favor it; it may not be lost time at all.
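
For comparison, the first option (the API) can be sketched in a few lines
of Python. This is only a minimal illustration, not a tested harvesting
tool: it assumes the present-day MediaWiki query parameters
(generator=random, prop=revisions, rvslots=main, formatversion=2) and the
English Wikipedia endpoint, and the function names and User-Agent string
are made up for the example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the English Wikipedia endpoint; any MediaWiki api.php URL
# (for example another language edition) could be substituted.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_random_pages_url(limit=5, api_url=API_URL):
    """Build a query URL asking for the wikitext of `limit` random articles."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "generator": "random",   # pick pages at random
        "grnnamespace": "0",     # main (article) namespace only
        "grnlimit": str(limit),
        "prop": "revisions",
        "rvprop": "content",     # return the wiki source text
        "rvslots": "main",
    }
    return api_url + "?" + urlencode(params)


def extract_wikitext(response):
    """Map page title -> wiki source text from a decoded JSON response."""
    pages = response.get("query", {}).get("pages", [])
    texts = {}
    for page in pages:
        revisions = page.get("revisions") or []
        if revisions:  # skip pages returned without content
            texts[page["title"]] = revisions[0]["slots"]["main"]["content"]
    return texts


def fetch_random_wikitext(limit=5):
    """Perform the request (needs network access) and return the texts."""
    # Hypothetical User-Agent; a real script should identify its operator.
    request = Request(
        build_random_pages_url(limit),
        headers={"User-Agent": "wiki-markup-stats-sketch/0.1 (hobby script)"},
    )
    with urlopen(request) as reply:
        return extract_wikitext(json.load(reply))
```

Calling fetch_random_wikitext(10) would then be expected to return a
dictionary of up to ten titles with their source text, ready for parsing.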
Thanks again,
Denis
Whiteg Weng wrote:
> Hi Denis,
>
> These two Wikipedia pages provide information on how to retrieve data
> from Wikipedia.
> http://en.wikipedia.org/wiki/Wikipedia:Database_download
> http://en.wikipedia.org/w/api.php
>
> Cheers.
> Whiteg
>
>
> On Oct 20, 2008, at 10:50 PM, spir wrote:
>
>> Hello everybody,
>>
>> I'm new here, this is my first post on the list. I'm no professional
>> researcher, programmer, or web designer, only a hobbyist. And a
>> lover of all kinds of languages.
>>
>> I'd like to do some statistics about the actual use of wiki language
>> features: which ones are the most used, which ones could be left
>> aside, how predominant are the most used, etc. I'm also really
>> interested in seeing how cultural/linguistic pregnancy (?) may
>> influence that.
>> To do this, I plan to parse hundreds of wiki pages from Wikipedia or
>> any other wiki that exists in multilingual versions. The issue for
>> me is: how to get the raw material? I have no clue. Would you help
>> me with that? This dataset may then be available for further use. I
>> would prefer random pages.
>> I can cope with parsing (my code will not comply with any industry
>> standard, but it will be clear and do the job ;-)). The data may be
>> either the full page's HTML, the wiki doc's HTML, or the wiki source
>> text. It may also be db extracts if I can have the format to decode
>> it -- but I highly prefer human-readable data.
>>
>> Thank you for your attention,
>> Denis
>>
>> _______________________________________________
>>
>> wiki-standards mailing list. wiki-standards at wikisym.org
>> http://www.wikisym.org/cgi-bin/mailman/listinfo/wiki-standards
>>
>> For the wiki-research, wiki-standards, wikisym-announce mailing
>> lists, please see:
>> http://www.wikisym.org/cgi-bin/mailman/listinfo
>
>