[wiki-standards] how to get raw material for stats?
Marc Laporte
marc at marclaporte.com
Tue Oct 21 00:05:20 CEST 2008
Hi Denis!
Please do read up on existing research first and try to build on it. You'll
find some good links at:
http://wiki-translation.com/BabelWiki08
Best regards,
--
Marc Laporte
http://MarcLaporte.com
http://TikiWiki.org/MarcLaporte
http://AvanTech.net
http://OurWiki.net
On Mon, Oct 20, 2008 at 3:50 PM, spir <denis.spir at free.fr> wrote:
> Hello everybody,
>
> I'm new here; this is my first post to the list. I'm not a professional
> researcher, programmer, or web designer, only a hobbyist, and a lover of
> all kinds of languages.
>
> I'd like to compute some statistics on the actual use of wiki language
> features: which ones are used most, which ones could be left aside, how
> predominant the most used ones are, etc. I'm also really interested in
> seeing how cultural/linguistic background may influence that.
> To do this, I plan to parse hundreds of wiki pages from Wikipedia or any
> other wiki that exists in multilingual versions. The issue for me is: how
> do I get the raw material? I have no clue. Could you help me with that?
> The resulting dataset could then be made available for further use. I
> would prefer random pages.
> I can cope with the parsing (my code will not comply with any industry
> standard, but it will be clear and do the job ;-)). The data may be the
> full page's HTML, the HTML of the wiki document alone, or the wiki source
> text. It may also be database extracts, if I can get the format needed to
> decode them, but I much prefer human-readable data.
>
> Thank you for your attention,
> Denis
>
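One possible route to the raw material, sketched under assumptions: the present-day MediaWiki Action API (`api.php`) can return random articles together with the wiki source text of their latest revision. The Python sketch builds such a request, extracts the wikitext from the JSON reply, and counts a few markup features with deliberately rough regular expressions. The `rvslots` and `formatversion=2` parameters postdate this 2008 thread, so the exact response shape is an assumption to check against the API documentation; the function names, the feature list, and the User-Agent string are hypothetical.

```python
import json
import re
from collections import Counter
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def random_pages_url(lang="en", count=10):
    """Build an Action API URL asking for `count` random articles
    (namespace 0) plus the wikitext of each one's latest revision."""
    params = {
        "action": "query",
        "generator": "random",   # pick pages at random
        "grnnamespace": 0,       # main/article namespace only
        "grnlimit": count,
        "prop": "revisions",
        "rvprop": "content",     # include the wiki source text
        "rvslots": "main",
        "format": "json",
        "formatversion": 2,      # pages come back as a list, not a dict
    }
    return f"https://{lang}.wikipedia.org/w/api.php?{urlencode(params)}"


def extract_wikitext(response_text):
    """Map page title -> wikitext, assuming the formatversion=2 shape."""
    data = json.loads(response_text)
    result = {}
    for page in data.get("query", {}).get("pages", []):
        revisions = page.get("revisions") or []
        if revisions:
            result[page["title"]] = revisions[0]["slots"]["main"]["content"]
    return result


def fetch_random_wikitext(lang="en", count=10):
    """Network call; replace the placeholder User-Agent with real contact info."""
    req = Request(random_pages_url(lang, count),
                  headers={"User-Agent": "wiki-markup-stats-sketch/0.1 (you@example.org)"})
    with urlopen(req, timeout=30) as resp:
        return extract_wikitext(resp.read().decode("utf-8"))


# Hypothetical, deliberately rough patterns: nested templates, <nowiki>,
# bold-italic ''''' runs and similar corner cases are not handled.
FEATURES = {
    "heading": re.compile(r"^(={1,6})[^=\n].*?\1[ \t]*$", re.MULTILINE),
    "bold": re.compile(r"'''[^'\n]+'''"),
    "italic": re.compile(r"(?<!')''[^'\n]+''(?!')"),
    "internal_link": re.compile(r"\[\["),
    "external_link": re.compile(r"\[https?://"),
    "template": re.compile(r"\{\{"),
    "list_item": re.compile(r"^[*#]", re.MULTILINE),
    "table": re.compile(r"^\{\|", re.MULTILINE),
}


def count_features(wikitext):
    """Count occurrences of each feature in one page's wikitext."""
    return Counter({name: len(p.findall(wikitext)) for name, p in FEATURES.items()})
```

To compare language editions, one could aggregate per language, e.g. `sum((count_features(t) for t in fetch_random_wikitext("fr").values()), Counter())`, and repeat small batches to build a larger sample rather than asking for many pages in one request.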