Mailing List Archive

[Wikimedia-l] Lsjbot has now started to generate 1-1.5 M articles of species on sv:wp
Inspired by the bot-generated species articles created on nl:wp in late
2010, a colleague of mine, User:Lsj, started a similar project on sv:wp
in early 2012. By October 2012 his bot had generated some 65,000 articles,
with essentially complete coverage of all fungi and birds.

He has since extended the scope to include all living species, both
animals and plants, which means another 1-1.5 million articles. Running
at full permissible bot speed, the bot generates around 10,000 articles
per day, but at a more realistic speed the full project will take the
rest of 2013 to complete.
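
As a rough check of these figures, here is a minimal sketch of the
arithmetic. The article counts and the full-speed rate are the ones
quoted above; taking 11 January 2013 (the date of this post) as the
start and 31 December 2013 as the end is an assumption made only for
illustration.

```python
from datetime import date

ARTICLES_LOW, ARTICLES_HIGH = 1_000_000, 1_500_000  # "1-1.5 million articles"
FULL_SPEED_PER_DAY = 10_000                          # "around 10,000 articles per day"

def days_at_full_speed(articles: int) -> float:
    """Days needed if the bot ran at full permissible speed every day."""
    return articles / FULL_SPEED_PER_DAY

def implied_daily_rate(articles: int, start: date, end: date) -> float:
    """Average articles per day needed to finish between start and end."""
    return articles / (end - start).days

# Assumed window: date of the post to the end of 2013.
START, END = date(2013, 1, 11), date(2013, 12, 31)

print(days_at_full_speed(ARTICLES_LOW), days_at_full_speed(ARTICLES_HIGH))
print(round(implied_daily_rate(ARTICLES_LOW, START, END)),
      round(implied_daily_rate(ARTICLES_HIGH, START, END)))
```

Under these assumptions, full speed would finish in 100 to 150 days,
while spreading the work over the rest of 2013 corresponds to an
average of roughly 2,800 to 4,200 articles per day.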

The bot code has been written in a language-independent way, so that it
can be ported to other language versions with only a modest effort. All
language-specific text strings are in external files, so the code itself
does not need changing between language versions. Beyond Swedish, the
code has been tested on Cebuano Wikipedia as well; full production on
cebwp is ready to go, just awaiting community blessing there.
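
To illustrate the idea of keeping language-specific strings outside the
code, here is a minimal sketch in Python (the real bot is written in C#,
and its file format is not described here). The JSON layout, the key
name species_sentence, the sentence templates and the function names
are all hypothetical, and an English table stands in for a second
language.

```python
import json

# Stand-ins for external per-language files (hypothetical format and keys).
# In a real setup each value would live in its own file, e.g. one per wiki.
STRING_FILES = {
    "sv": '{"species_sentence": "{name} är en art i familjen {family}."}',
    "en": '{"species_sentence": "{name} is a species in the family {family}."}',
}

def load_strings(lang: str) -> dict:
    """Parse the string table for one language."""
    return json.loads(STRING_FILES[lang])

def render_intro(lang: str, name: str, family: str) -> str:
    """Build the opening sentence; only the string table differs per language."""
    template = load_strings(lang)["species_sentence"]
    return template.format(name=name, family=family)

print(render_intro("sv", "Solaster endeca", "Solasteridae"))
print(render_intro("en", "Solaster endeca", "Solasteridae"))
```

The generating function never mentions a language, so adding a new
language version would only mean adding a new string table.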

The core data is taken from the Catalogue of Life
http://en.wikipedia.org/wiki/Catalogue_of_Life but the bot also checks
Commons, other language versions (interwiki links) and other appropriate
databases, such as the IUCN Red List of Threatened Species.

The bot code is written in C# and uses the DotNetWikiBot framework.

Example articles:
http://sv.wikipedia.org/wiki/Lichenopora_verrucaria
http://sv.wikipedia.org/wiki/Phylactolaemata
http://sv.wikipedia.org/wiki/Rundkrassing
http://ceb.wikipedia.org/wiki/Sipunculidae
http://ceb.wikipedia.org/wiki/Solaster_endeca

The full set of created articles (includes some other stuff as well,
besides organisms):
http://sv.wikipedia.org/wiki/Kategori:Robotskapade_artiklar
http://ceb.wikipedia.org/wiki/Kategoriya:Paghimo_ni_bot

My colleague is much too busy to join the discussion himself just now,
but I think the project could be an inspiration for us all.

Besides Lsj himself, about 10 users support him by checking that the
bot generates correct data and so on. The project has also been
discussed extensively on our village pump. Wikidata is not yet used.

The page where the project is currently being discussed (in Swedish,
of course):

http://sv.wikipedia.org/wiki/Anv%C3%A4ndardiskussion:Lsjbot/Projekt_alla_arter



Anders

_______________________________________________
Wikimedia-l mailing list
Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l
Re: [Wikimedia-l] Lsjbot has now started to generate 1-1.5 M articles of species on sv:wp
Wow

On 01/11/2013 08:45 AM, Anders Wennersten wrote:
> Inspired by the botgenerated articles of species made on nl:wp in late
> 2010 a colleague of mine, User:Lsj, started a similar project on sv:wp
> early 2012. By October 2012 his bot had generated some 65 000 articles,
> with essentially complete coverage of all fungi and birds.
>
> He has since...


Very interesting! I personally think this is a good story for the
Wikimedia Blog.

https://meta.wikimedia.org/wiki/Wikimedia_Blog/Guidelines


In the meantime:

News section at https://www.mediawiki.org/

http://identi.ca/notice/98949038

https://twitter.com/mediawiki/status/289786939358445568

https://www.facebook.com/MediaWikiProject/posts/146083212211559

https://plus.google.com/u/0/b/103470172168784626509/103470172168784626509/posts/YqVTWGHmLmy

--
Quim Gil
Technical Contributor Coordinator @ Wikimedia Foundation
http://www.mediawiki.org/wiki/User:Qgil

Re: [Wikimedia-l] Lsjbot has now started to generate 1-1.5 M articles of species on sv:wp
Quim Gil, 11/01/2013 18:42:
> On 01/11/2013 08:45 AM, Anders Wennersten wrote:
>> Inspired by the botgenerated articles of species made on nl:wp in late
>> 2010 a colleague of mine, User:Lsj, started a similar project on sv:wp
>> early 2012. By October 2012 his bot had generated some 65 000 articles,
>> with essentially complete coverage of all fungi and birds.
>>
>> He has since...
>
>
> Very interesting! I personally think this is a good story for the
> Wikimedia Blog.

See also previous histories at
<https://meta.wikimedia.org/wiki/Proposal_for_Policy_on_overuse_of_bots_in_Wikipedias>

Nemo

Re: [Wikimedia-l] Lsjbot has now started to generate 1-1.5 M articles of species on sv:wp
2013/1/11 Quim Gil <qgil@wikimedia.org>:
> Wow
>
>
> On 01/11/2013 08:45 AM, Anders Wennersten wrote:
>>
>> Inspired by the botgenerated articles of species made on nl:wp in late
>> 2010 a colleague of mine, User:Lsj, started a similar project on sv:wp
>> early 2012. By October 2012 his bot had generated some 65 000 articles,
>> with essentially complete coverage of all fungi and birds.
>>
>> He has since...
>
>
>
> Very interesting! I personally think this is a good story for the Wikimedia
> Blog.

A story about a bot-pedia? No, thanks. Humans can create much more
valuable content based on various sources, and this is a strong
advantage of Wikipedia (not short articles like in traditional paper
encyclopaedias).

--
Daniel // Leinad

Re: [Wikimedia-l] Lsjbot has now started to generate 1-1.5 M articles of species on sv:wp
On 11 January 2013 16:45, Anders Wennersten <mail@anderswennersten.se> wrote:
>
> He has since then extended the scope to include all living species, both
> animals and plants, which means another 1-1,5 million articles. Running
> at full permissible bot speed, the bot generates around 10,000 articles
> per day, but at a more realistic speed, the full project will take the
> rest of 2013 to complete.

Wow!

Very interested to see you got community support for this - normally
it's sharply the other way around.

--
- Andrew Gray
andrew.gray@dunelm.org.uk

Re: [Wikimedia-l] Lsjbot has now started to generate 1-1.5 M articles of species on sv:wp
Andrew Gray wrote 2013-01-12 11:58:
> On 11 January 2013 16:45, Anders Wennersten <mail@anderswennersten.se> wrote:
> Wow!
>
> Very interested to see you got community support for this - normally
> it's sharply the other way around.
We had a very lengthy discussion, of course, with all the common
arguments. We were and are as negative as everyone else towards the bot
generation of articles from other language versions, as was done around
2008. During our discussion we evolved the concept quite a bit (special
templates, categories, messages on talk page etc). The precedent and
experience from nl:wp was also important for us.

We also had two successful runs in 2012: not only the birds already
mentioned, but also 30,000 articles on French communes (also with data
taken from primary sources, Commons and interwiki links to other
language versions).

Anders



Re: [Wikimedia-l] Lsjbot has now started to generate 1-1.5 M articles of species on sv:wp
On 11 January 2013 16:45, Anders Wennersten <mail@anderswennersten.se> wrote:
> Inspired by the botgenerated articles of species made on nl:wp in late
> 2010 a colleague of mine, User:Lsj, started a similar project on sv:wp
> early 2012. By October 2012 his bot had generated some 65 000 articles,
> with essentially complete coverage of all fungi and birds.

For bird species, you may wish to replicate and deploy this
en.Wikipedia template:

http://en.wikipedia.org/wiki/Template:Xeno-canto_species


--
Andy Mabbett
@pigsonthewing
http://pigsonthewing.org.uk

Re: [Wikimedia-l] Lsjbot has now started to generate 1-1.5 M articles of species on sv:wp
Anders Wennersten, 12/01/2013 12:20:
> We had a very lengthy discussion, of course, with all the common
> arguments. We were and are as negative as everyone else towards the bot
> generation of articles from other language versions, as was done around
> 2008. During our discussion we evolved the concept quite a bit (special
> templates, categories, messages on talk page etc).

Could you elaborate on this "evolution of the concept"? I'm not able to
see what's new, from the "titles" in parentheses.

Nemo

Re: [Wikimedia-l] Lsjbot has now started to generate 1-1.5 M articles of species on sv:wp
Federico Leva (Nemo) wrote 2013-01-15 11:02:
> Anders Wennersten, 12/01/2013 12:20:
>> During our discussion we evolved the concept quite a bit (special
>> templates,
>> categories, messages on talk page etc).
>
> Could you elaborate on this "evolution of the concept"? I'm not able
> to see what's new, from the "titles" in parentheses.
>
> Nemo
This bot puts a template in every generated article clearly stating
that it is bot-generated, with the text "/This article has been created
by Lsjbot and can have language errors and/or a mildly confusing setup
of illustrations. This template can be deleted after checks of content
has been done/". Of the bot-generated bird articles, more than half
have since been manually reviewed. This was our major concern: a reader
must not be given the impression that bot-generated articles are
manually created.

Example
http://sv.wikipedia.org/wiki/Acanthochitona_arragonites

The bot makes a major effort to translate English text, such as the
geographical name of the area a species inhabits. To balance keeping
the translation table from growing too big against skipping translation
when it gets complicated, the bot now puts the complicated text on the
talk page. In the example above this is the case for Gulf of
California. This way the reader, or whoever does the manual follow-up,
finds the information and can make use of it.
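
The table-with-fallback behaviour described above can be sketched
roughly as follows. This is only an illustration in Python, not the
bot's actual C# code; the table entries, the function name and the
wording of the talk-page note are hypothetical.

```python
# Hypothetical translation table; the real table and its entries are not
# shown in this thread.
PLACE_NAMES_SV = {
    "Europe": "Europa",
    "Mediterranean Sea": "Medelhavet",
}

def place_text(english_name: str, table: dict):
    """Return (article_text, talk_page_note); exactly one of them is set.

    Known names are translated into the article. Unknown names are left
    out of the article and passed on, untranslated, for the talk page.
    """
    translated = table.get(english_name)
    if translated is not None:
        return translated, None
    return None, f"Untranslated distribution text from source: {english_name}"

print(place_text("Mediterranean Sea", PLACE_NAMES_SV))
print(place_text("Gulf of California", PLACE_NAMES_SV))
```

With this split the table can stay small: anything not covered still
reaches a human reviewer instead of being dropped or mistranslated.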

The set of categories that all bot-generated articles carry, even after
they have been manually checked or corrected, serves partly for general
tracking and partly to make it possible to start automatic checks or
corrections of a specific set of bot-generated articles if a problem or
error is found some time after generation.
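
As a small sketch of how such categories could be used to pick out an
affected set later on (Python, for illustration only): the category
"Robotskapade artiklar" is the one linked earlier in this thread, while
the batch category names and the category assignments in the sample
data are invented.

```python
BOT_CATEGORY = "Robotskapade artiklar"  # category linked earlier in the thread

def select_for_recheck(pages: dict, batch_category: str) -> list:
    """Titles that are bot-created and also belong to the affected batch."""
    return sorted(
        title
        for title, categories in pages.items()
        if BOT_CATEGORY in categories and batch_category in categories
    )

# Invented sample data: page title -> set of category names.
pages = {
    "Acanthochitona arragonites": {BOT_CATEGORY, "Example batch A"},
    "Rundkrassing": {BOT_CATEGORY, "Example batch B"},
    "Handwritten page": {"Example batch A"},
}

print(select_for_recheck(pages, "Example batch A"))
```

Because the bot category stays on the article even after manual review,
a later automatic correction can still find every page from a batch.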

There are also processes set up for the inspectors of the articles, so
that they can easily report any questions and get feedback that these
have been taken care of. If a backlog of reported problems builds up,
the bot generation stops until all is fixed (very few things are being
reported at this stage). On sv:wp there are around 6-8 frequent
contributors in the zoological area, with 10-15 more infrequent
contributors. They are very competent and are all supporting this
effort with inspecting etc. Without their support the project would
never have got off the ground.
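
The stop-on-backlog rule mentioned above could look something like the
sketch below. It is written in Python for illustration and is not the
bot's real code; the function names and the threshold (stop as soon as
any report is open) are assumptions, since the thread does not say how
large a backlog has to be.

```python
def run_batch(titles, count_open_reports, create_article, max_open=0):
    """Create articles in order, stopping once open reports exceed max_open.

    count_open_reports and create_article are callables supplied by the
    caller; max_open=0 is an assumed threshold.
    """
    created = []
    for title in titles:
        if count_open_reports() > max_open:
            break  # leave the rest until the reported problems are fixed
        create_article(title)
        created.append(title)
    return created

# Tiny simulation: creating "B" triggers a problem report.
reports = []

def create(title):
    if title == "B":
        reports.append("problem found in B")

print(run_batch(["A", "B", "C"], lambda: len(reports), create))
```

In this simulation "C" is not created, because a report was opened
while "B" was being handled.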

Anders

Regarding the suggestion about Xeno-canto: thanks, I will forward this
to Lsj. A link to Wikispecies for the corresponding articles is already
included (of course).



Re: [Wikimedia-l] Lsjbot has now started to generate 1-1.5 M articles of species on sv:wp
Thanks for the reply.

Anders Wennersten, 15/01/2013 12:15:
> Federico Leva (Nemo) wrote 2013-01-15 11:02:
>> Anders Wennersten, 12/01/2013 12:20:
>>
>> Could you elaborate on this "evolution of the concept"? I'm not able
>> to see what's new, from the "titles" in parentheses.
>>
> This bot puts a template in every generated article clearly stating
> that it is bot-generated, with the text "/This article has been created
> by Lsjbot and can have language errors and/or a mildly confusing setup
> of illustrations. This template can be deleted after checks of content
> has been done/". Of the bot-generated bird articles, more than half
> have since been manually reviewed. This was our major concern: a reader
> must not be given the impression that bot-generated articles are
> manually created.
>
> Example
> http://sv.wikipedia.org/wiki/Acanthochitona_arragonites

Oh, sure, such warnings are customary on most bot creations nowadays.

>
> The bot makes a major effort to translate English text, such as the
> geographical name of the area a species inhabits. To balance keeping
> the translation table from growing too big against skipping translation
> when it gets complicated, the bot now puts the complicated text on the
> talk page. In the example above this is the case for Gulf of
> California. This way the reader, or whoever does the manual follow-up,
> finds the information and can make use of it.

I don't know if the talk page is better than a central wikiproject page
with task subpages, which is what is usually used for such cases, but
yes, this is useful.

>
> The set of categories that all bot-generated articles carry, even after
> they have been manually checked or corrected, serves partly for general
> tracking and partly to make it possible to start automatic checks or
> corrections of a specific set of bot-generated articles if a problem or
> error is found some time after generation.

This is very useful, I liked it in particular for the geograph
bot-uploads on Commons by multichill.

>
> There are also processes set up for the inspectors of the articles, so
> that they can easily report any questions and get feedback that these
> have been taken care of. If a backlog of reported problems builds up,
> the bot generation stops until all is fixed (very few things are being
> reported at this stage). On sv:wp there are around 6-8 frequent
> contributors in the zoological area, with 10-15 more infrequent
> contributors. They are very competent and are all supporting this
> effort with inspecting etc. Without their support the project would
> never have got off the ground.

I agree, the success of such initiatives lies in how much human work
they're able to instigate and be supported by.
6-8 editors is much better than nothing. It's still a drop in the ocean
for such an amount of articles, of course: at least on it.wiki we
usually have a similar number of checkers for something like three
orders of magnitude fewer articles (asteroids in recent years; Italian
municipalities in the ~2005 golden age).

Nemo
