Mailing List Archive

Re: [Offline-l] The Whole Wikipedia in English with pictures in one 40GB big file
On 02/03/2014 01:33, Samuel Klein wrote:
> Brilliant. Congrats to everyone who is working on this!
> What is needed to scrape categories?

0 - For all dumped pages (so at least NS_MAIN and NS_CATEGORY pages),
download the list of categories they belong to (with the MW API).
1 - For each dumped page, implement the HTML rendering of the category
list at the bottom.
2 - For each category page, get the content HTML rendering from Parsoid
and compute and render sorted lists of articles and sub-categories in a
fashion similar to the online version (with multiple pages if necessary).

All of this must be integrated into the nodejs script, and the category
graph must be stored in redis.
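
The three steps above can be sketched roughly as follows. This is only an
illustration in Python, not the mwoffliner code (which is a nodejs script):
all function names, the endpoint URL, the ".html" link layout and the page
size of 200 are assumptions made for the example, and a plain dict stands in
for the category graph that would be stored in redis. Only the query
parameters (action=query, prop=categories, cllimit) are taken from the
MediaWiki API itself.

```python
from collections import defaultdict
from html import escape
from urllib.parse import quote, urlencode

# Assumed endpoint; any MediaWiki api.php URL would do.
API_URL = "https://en.wikipedia.org/w/api.php"


def categories_query_url(titles, api_url=API_URL):
    """Step 0: build the API request listing the categories of each title."""
    params = {
        "action": "query",
        "prop": "categories",
        "titles": "|".join(titles),
        "cllimit": "max",
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


def parse_categories(response):
    """Map each page title to its category titles (decoded JSON response)."""
    pages = response.get("query", {}).get("pages", {})
    return {
        page["title"]: [c["title"] for c in page.get("categories", [])]
        for page in pages.values()
    }


def render_category_footer(categories):
    """Step 1: render the category list shown at the bottom of a page."""
    links = []
    for cat in sorted(categories):
        label = cat.split(":", 1)[-1]  # drop the "Category:" prefix
        # Hypothetical link layout inside the dump.
        href = quote(cat.replace(" ", "_"), safe=":") + ".html"
        links.append('<li><a href="%s">%s</a></li>' % (href, escape(label)))
    return '<div class="catlinks"><ul>' + "".join(links) + "</ul></div>"


def category_members(page_categories):
    """Step 2a: invert page -> categories into category -> sorted members."""
    members = defaultdict(list)
    for title, cats in page_categories.items():
        for cat in cats:
            members[cat].append(title)
    return {
        cat: sorted(titles, key=str.casefold)
        for cat, titles in members.items()
    }


def paginate(items, page_size=200):
    """Step 2b: split a sorted member list into pages of page_size entries."""
    return [items[i:i + page_size] for i in range(0, len(items), page_size)]
```

In a real run, the response passed to parse_categories would come from
fetching the URL built in step 0 (following the API's continuation for long
lists), and the result of category_members would be written to redis instead
of being kept in memory.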

Emmanuel
--
Kiwix - Wikipedia Offline & more
* Web: http://www.kiwix.org
* Twitter: https://twitter.com/KiwixOffline
* more: http://www.kiwix.org/wiki/Communication

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Offline-l] The Whole Wikipedia in English with pictures in one 40GB big file
On 07/03/2014 19:25, Asaf Bartov wrote:
> btw, are these new improved tools documented anywhere?
> http://kiwix.org/wiki/Development does not seem to point in the right
> direction.

The usage is pretty straightforward (for IT people) and IMO everything
necessary is explained in the READMEs:
* mwoffliner:
https://sourceforge.net/p/kiwix/other/ci/master/tree/mwoffliner/
* zimwriterfs:
https://sourceforge.net/p/kiwix/other/ci/master/tree/zimwriterfs/

NB: The goal is not for everybody to create their own full Wikipedia ZIM
file. The goal is that we (Wikimedia) provide these files often enough
that the ZIM content is always up to date (so at least once per month).
Thus, the challenge now is to set up an infrastructure similar to the one
that creates the XML dumps.

Emmanuel

PS: We really want to publish a post on blog.wikimedia.org (so in English).
If someone volunteers to write it, I would really appreciate the help.

Re: [Offline-l] The Whole Wikipedia in English with pictures in one 40GB big file
----- Original Message -----
> From: "Emmanuel Engelhart" <kelson@kiwix.org>

> PS: We really want to publish a post on blog.wikimedia.org (so in English).
> If someone volunteers to write it, I would really appreciate the help.

If you write such a blog post in what English you have handy, I'd be happy
to English it up for you; you know what points you want to make better than
I would. :-)

Cheers,
-- jra
--
Jay R. Ashworth Baylink jra@baylink.com
Designer The Things I Think RFC 2100
Ashworth & Associates http://www.bcp38.info 2000 Land Rover DII
St Petersburg FL USA BCP38: Ask For It By Name! +1 727 647 1274
