Mailing List Archive

serious interwiki.py issues on MW 1.18 wikis
Hello to both the wikitech and pywikipedia lists -- please keep both
informed when replying. Thanks.

A few days ago, we - the pywikipedia developers - received alarming
reports of interwiki bots removing content from pages. This does not
seem to happen often, and we have not been able to reproduce the
conditions in which this happens.

However, the common denominator is that it seems to be happening
only on the Wikipedias that run MediaWiki 1.18. As such, I
think this topic might be relevant for wikitech-l, too. In addition,
no one on the pywikipedia team has a clear idea of why this
is happening, so we would appreciate any ideas.

1. What happens?
Essentially, the interwiki bot does its job, retrieves the graph and
determines the correct interwiki links. It should then add them to the
page, but instead /only/ the interwiki links are stored. For example:
http://nl.wikipedia.org/w/index.php?title=Blankenbach&diff=next&oldid=10676248
http://eo.wikipedia.org/w/index.php?title=Anton%C3%ADn_Kl%C3%A1%C5%A1tersk%C3%BD&action=historysubmit&diff=3855198&oldid=1369139
http://simple.wikipedia.org/w/index.php?title=Mettau%2C_Switzerland&action=historysubmit&diff=3060418&oldid=1249270

2. Why does this happen?
This is unclear. On the one hand, interwiki.py is somewhat black
magic: none of the current developers intimately knows its workings.
On the other hand, the bug is not reproducible: running it on the
exact same page with the exact same page text does not result in a
cleared page. It could very well be something like broken network
error handling - but mainly, we have no idea. Did anything change in
Special:Export (which is still used in interwiki.py) or the API which
might cause something like this? I couldn't find anything in the
release notes.

3. Reasons for relating it to MW 1.18
To find out on which wikis this problem happens, I used a
quick-and-dirty heuristic:
select rc_comment, rc_cur_time, rc_user, rc_namespace, rc_title,
       rc_old_len, rc_new_len
from recentchanges
left join user_groups on ug_user = rc_user
where rc_new_len < rc_old_len * 0.1
  and ug_group = 'bot'
  and rc_namespace = 0
limit 10 /* SLOW OK */;

This is a slow query (~30s for nlwiki_p on the toolserver), but it
gives some interesting results:
nlwiki: 9 rows, all broken interwiki bots
eowiki: 25 rows, all interwiki bots
simplewiki: 3 rows, of which 2 are interwiki bots
dewiki: 0 rows (with rc_old_len * 0.3 instead: 14 rows, all double-redirect
fixes)
frwiki: 9 rows, but *none* from interwiki bots (all edits are by the
same antivandalism bot)
itwiki: 0 rows
ptwiki: 0 rows


All ideas and hints are very welcome. Hopefully we will be able to
solve this before Tuesday...

Best regards,
Merlijn van Deen

Re: serious interwiki.py issues on MW 1.18 wikis [ In reply to ]
On Thu, Sep 29, 2011 at 1:08 PM, Merlijn van Deen <valhallasw@arctus.nl> wrote:

> 2. Why does this happen?
> This is unclear. On the one hand, interwiki.py is somewhat black
> magic: none of the current developers intimately knows its workings.
> On the other hand, the bug is not reproducible: running it on the
> exact same page with the exact same page text does not result in a
> cleared page. It could very well be something like broken network
> error handling - but mainly, we have no idea. Did anything change in
> Special:Export (which is still used in interwiki.py) or the API which
> might cause something like this? I couldn't find anything in the
> release notes.
>

The thing I'd recommend is enabling some debug instrumentation in the bots,
so that next time one makes a bad edit y'all can review those logs and see
what it was doing.

I don't know what logging is already available, but you basically want to
see every HTTP request it makes (URL and POST data if any), and the response
received.

This should help narrow it down significantly to one of:
* something in MW is outputting wrong data (visibly wrong output from
api/export)
* something in pywikipediabot is processing data wrong (all right output
from api/export, but input data being sent on edit is already wrong)
* something in MW is processing input data wrong (all right output from
api/export, all input being sent looks correct)

Note that there may be legitimate differences in api or export data that the
bot is processing incorrectly, so look close. :)

-- brion
Re: serious interwiki.py issues on MW 1.18 wikis [ In reply to ]
Merlijn van Deen wrote:
> Hello to both the wikitech and pywikipedia lists -- please keep both
> informed when replying. Thanks.
>
> A few days ago, we - the pywikipedia developers - received alarming
> reports of interwiki bots removing content from pages. This does not
> seem to happen often, and we have not been able to reproduce the
> conditions in which this happens.
>
> However, the common denominator is that it seems to be happening
> only on the Wikipedias that run MediaWiki 1.18. As such, I
> think this topic might be relevant for wikitech-l, too. In addition,
> no one on the pywikipedia team has a clear idea of why this
> is happening, so we would appreciate any ideas.
>
> 1. What happens?
> Essentially, the interwiki bot does its job, retrieves the graph and
> determines the correct interwiki links.

Does it use the page content to retrieve the interwiki links? Or is it
retrieved, e.g., by doing a different query to the API?
I.e., would receiving no content (from the bot's POV) produce that behavior?

Re: serious interwiki.py issues on MW 1.18 wikis [ In reply to ]
On Thu, 29-09-2011, at 22:08 +0200, Merlijn van Deen wrote:
> Hello to both the wikitech and pywikipedia lists -- please keep both
> informed when replying. Thanks.
>
> A few days ago, we - the pywikipedia developers - received alarming
> reports of interwiki bots removing content from pages. This does not
> seem to happen often, and we have not been able to reproduce the
> conditions in which this happens.
>
> However, the common denominator is that it seems to be happening
> only on the Wikipedias that run MediaWiki 1.18. As such, I
> think this topic might be relevant for wikitech-l, too. In addition,
> no one on the pywikipedia team has a clear idea of why this
> is happening, so we would appreciate any ideas.
>
> 1. What happens?
> Essentially, the interwiki bot does its job, retrieves the graph and
> determines the correct interwiki links. It should then add them to the
> page, but instead /only/ the interwiki links are stored. For example:
> http://nl.wikipedia.org/w/index.php?title=Blankenbach&diff=next&oldid=10676248
> http://eo.wikipedia.org/w/index.php?title=Anton%C3%ADn_Kl%C3%A1%C5%A1tersk%C3%BD&action=historysubmit&diff=3855198&oldid=1369139
> http://simple.wikipedia.org/w/index.php?title=Mettau%2C_Switzerland&action=historysubmit&diff=3060418&oldid=1249270
>
> 2. Why does this happen?
> This is unclear. On the one hand, interwiki.py is somewhat black
> magic: none of the current developers intimately knows its workings.
> On the other hand, the bug is not reproducible: running it on the
> exact same page with the exact same page text does not result in a
> cleared page. It could very well be something like broken network
> error handling - but mainly, we have no idea. Did anything change in
> Special:Export (which is still used in interwiki.py) or the API which
> might cause something like this? I couldn't find anything in the
> release notes.

Out of curiosity... If the new revisions of one of these badly edited
pages are deleted, leaving the top revision as the one just before the
bad iw bot edit, does a rerun of the bot on the page fail?

Ariel




Re: serious interwiki.py issues on MW 1.18 wikis [ In reply to ]
On Fri, Sep 30, 2011 at 1:06 AM, Platonides <platonides@gmail.com> wrote:

> Merlijn van Deen wrote:
> > Hello to both the wikitech and pywikipedia lists -- please keep both
> > informed when replying. Thanks.
> >
> > A few days ago, we - the pywikipedia developers - received alarming
> > reports of interwiki bots removing content from pages. This does not
> > seem to happen often, and we have not been able to reproduce the
> > conditions in which this happens.
> >
> > However, the common denominator is that it seems to be happening
> > only on the Wikipedias that run MediaWiki 1.18. As such, I
> > think this topic might be relevant for wikitech-l, too. In addition,
> > no one on the pywikipedia team has a clear idea of why this
> > is happening, so we would appreciate any ideas.
> >
> > 1. What happens?
> > Essentially, the interwiki bot does its job, retrieves the graph and
> > determines the correct interwiki links.
>
> Does it use the page content to retrieve the interwiki links? Or is it
> retrieved, e.g., by doing a different query to the API?
>

The interwiki links are retrieved from the page content. The page content
is fetched through a call to Special:Export.
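
For illustration, a rough sketch of that flow in plain Python 3 -- not the
actual pywikipedia code, and the regex is only an approximation of how
interwiki links are recognised:

import re
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

def interwiki_links(title, site='nl.wikipedia.org'):
    """Fetch page text via Special:Export and list its interwiki links."""
    url = 'http://%s/wiki/Special:Export/%s' % (site, urllib.parse.quote(title))
    with urllib.request.urlopen(url) as resp:
        tree = ET.parse(resp)
    # The export XML is namespaced; match on the tag suffix instead.
    text_el = next((el for el in tree.iter() if el.tag.endswith('}text')), None)
    if text_el is None or text_el.text is None:
        # A missing <text> element means "page does not exist" -- it must not
        # be confused with an existing page whose text is empty.
        raise LookupError('no content returned for %s' % title)
    return re.findall(r'\[\[([a-z][a-z-]{1,11}):([^\]|]+)\]\]', text_el.text)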


> I.e., would receiving no content (from the bot's POV) produce that behavior?
>

Yes, the only reasonable explanation seems to be that the bot interprets
what it gets from the server as an empty page.

--
André Engels, andreengels@gmail.com
Re: serious interwiki.py issues on MW 1.18 wikis [ In reply to ]
On Fri, Sep 30, 2011 at 12:56 PM, Andre Engels <andreengels@gmail.com> wrote:

>
> The interwiki links are retrieved from the page content. The page content
> is fetched through a call to Special:Export.
>
>
> > I.e., would receiving no content (from the bot's POV) produce that behavior?
> >
>
> Yes, the only reasonable explanation seems to be that the bot interprets
> what it gets from the server as an empty page.
>

So you screen-scrape? No surprise it breaks. Why? For example, due to
protocol-relative URLs, or some other changes to the HTML output. Why not
just use the API?

--
Best regards,
Max Semenik ([[User:MaxSem]])
Re: serious interwiki.py issues on MW 1.18 wikis [ In reply to ]
On Fri, Sep 30, 2011 at 9:39 AM, Ariel T. Glenn <ariel@wikimedia.org> wrote:

> Out of curiosity... If the new revisions of one of these badly edited
> pages are deleted, leaving the top revision as the one just before the
> bad iw bot edit, does a rerun of the bot on the page fail?
>

I did a test, and the result was very interesting, which might point to the
cause of this bug:

I deleted the page [[nl:Blankenbach]], then restored the 2 versions before
the problematic bot edit. When I now look at the page, instead of the page
content I get (translated from Dutch):

No content was found in the database for the page with .

This can occur when you follow an outdated link to the difference between
two versions of a page, or when you request a version that has been deleted.

If this is not the case, you may have found a bug in the software. Please
report it to a system administrator
<http://nl.wikipedia.org/wiki/Speciaal:Gebruikerslijst/sysop> of Wikipedia,
mentioning the URL of this page.


Going to the specific version that should be the newest after the
deletion-and-partial-restore (
http://nl.wikipedia.org/w/index.php?title=Blankenbach&oldid=10676248), it
claims that there is a newer version, but going to the newer version or the
newest version, I get the above-mentioned message again.

As an extra test, I did the
delete-then-restore-some-versions-but-not-the-most-recent action with
another page (http://nl.wikipedia.org/wiki/Gebruiker:Andre_Engels/Test), and
there I found no such problem. From this I conclude that the bug has not
been caused by that process, but that for some reason the page had a wrong
(or empty) version number for its 'most recent' version, or something like
that.




--
André Engels, andreengels@gmail.com
Re: serious interwiki.py issues on MW 1.18 wikis [ In reply to ]
On Fri, Sep 30, 2011 at 11:12 AM, Max Semenik <maxsem.wiki@gmail.com> wrote:

> On Fri, Sep 30, 2011 at 12:56 PM, Andre Engels <andreengels@gmail.com
> >wrote:
>
> >
> > The interwiki links are retrieved from the page content. The page content
> > is fetched through a call to Special:Export.
> >
> >
> > > I.e., would receiving no content (from the bot's POV) produce that
> > > behavior?
> > >
> >
> > Yes, the only reasonable explanation seems to be that the bot interprets
> > what it gets from the server as an empty page.
> >
>
> So you screen-scrape? No surprise it breaks. Why? For example, due to
> protocol-relative URLs, or some other changes to the HTML output. Why not
> just use the API?
>

Basically, because most of the core functionality comes from before the API
came into existence. At least, that would be my explanation.

--
André Engels, andreengels@gmail.com
Re: serious interwiki.py issues on MW 1.18 wikis [ In reply to ]
On Fri, Sep 30, 2011 at 11:13 AM, Andre Engels <andreengels@gmail.com> wrote:

> On Fri, Sep 30, 2011 at 9:39 AM, Ariel T. Glenn <ariel@wikimedia.org> wrote:
>
>> Out of curiosity... If the new revisions of one of these badly edited
>> pages are deleted, leaving the top revision as the one just before the
>> bad iw bot edit, does a rerun of the bot on the page fail?
>>
>
> I did a test, and the result was very interesting, which might point to the
> cause of this bug:
>
> I deleted the page [[nl:Blankenbach]], then restored the 2 versions before
> the problematic bot edit. When I now look at the page, instead of the page
> content I get (translated from Dutch):
>
> No content was found in the database for the page with .
>
> This can occur when you follow an outdated link to the difference between
> two versions of a page, or when you request a version that has been deleted.
>
> If this is not the case, you may have found a bug in the software. Please
> report it to a system administrator
> <http://nl.wikipedia.org/wiki/Speciaal:Gebruikerslijst/sysop> of Wikipedia,
> mentioning the URL of this page.
>
>
> Going to the specific version that should be the newest after the
> deletion-and-partial-restore (
> http://nl.wikipedia.org/w/index.php?title=Blankenbach&oldid=10676248), it
> claims that there is a newer version, but going to the newer version or the
> newest version, I get the above-mentioned message again.
>
> As an extra test, I did the
> delete-then-restore-some-versions-but-not-the-most-recent action with
> another page (http://nl.wikipedia.org/wiki/Gebruiker:Andre_Engels/Test),
> and there I found no such problem. From this I conclude that the bug has not
> been caused by that process, but that for some reason the page had a wrong
> (or empty) version number for its 'most recent' version, or something like
> that.
>
Curiouser and curiouser... I now see that when I click the edit button from
the above-mentioned page, I do get to edit the page as it is shown, even
though that version is not in the history (the page is a copy of
[[MediaWiki:Missing-article]]
<http://nl.wikipedia.org/wiki/MediaWiki:Missing-article> with the empty
string filled in for $2).

--
André Engels, andreengels@gmail.com
Re: serious interwiki.py issues on MW 1.18 wikis [ In reply to ]
On Fri, Sep 30, 2011 at 9:21 PM, Andre Engels <andreengels@gmail.com> wrote:
> On Fri, Sep 30, 2011 at 11:13 AM, Andre Engels <andreengels@gmail.com> wrote:
>
>> On Fri, Sep 30, 2011 at 9:39 AM, Ariel T. Glenn <ariel@wikimedia.org> wrote:
>>
>>> Out of curiosity... If the new revisions of one of these badly edited
>>> pages are deleted, leaving the top revision as the one just before the
>>> bad iw bot edit, does a rerun of the bot on the page fail?
>>>
>>
>> I did a test, and the result was very interesting, which might point to the
>> cause of this bug:
>>
>> I deleted the page [[nl:Blankenbach]], then restored the 2 versions before
>> the problematic bot edit. When I now look at the page, instead of the page
>> content I get (translated from Dutch):
>>
>> No content was found in the database for the page with .
>>
>> This can occur when you follow an outdated link to the difference between
>> two versions of a page, or when you request a version that has been deleted.
>>
>> If this is not the case, you may have found a bug in the software. Please
>> report it to a system administrator
>> <http://nl.wikipedia.org/wiki/Speciaal:Gebruikerslijst/sysop> of Wikipedia,
>> mentioning the URL of this page.
>>
>>
>> Going to the specific version that should be the newest after the
>> deletion-and-partial-restore (
>> http://nl.wikipedia.org/w/index.php?title=Blankenbach&oldid=10676248), it
>> claims that there is a newer version, but going to the newer version or the
>> newest version, I get the above-mentioned message again.
>>
>> As an extra test, I did the
>> delete-then-restore-some-versions-but-not-the-most-recent action with
>> another page (http://nl.wikipedia.org/wiki/Gebruiker:Andre_Engels/Test),
>> and there I found no such problem. From this I conclude that the bug has not
>> been caused by that process, but that for some reason the page had a wrong
>> (or empty) version number for its 'most recent' version, or something like
>> that.
>>
> Curiouser and curiouser... I now see that when I click the edit button from
> the above-mentioned page, I do get to edit the page as it is shown, even
> though that version is not in the history (the page is a copy of
> [[MediaWiki:Missing-article]]
> <http://nl.wikipedia.org/wiki/MediaWiki:Missing-article> with the empty
> string filled in for $2).
>
> --
> André Engels, andreengels@gmail.com

page_latest = 0 ... WTF?
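
If page_latest really is 0 for the affected rows, a direct check against
the page table should show whether other pages are in the same state (an
untested sketch against the standard MediaWiki schema):

select page_namespace, page_title, page_latest
from page
where page_latest = 0
limit 10 /* SLOW OK */;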

Re: serious interwiki.py issues on MW 1.18 wikis [ In reply to ]
On Fri, Sep 30, 2011 at 11:13 AM, Andre Engels <andreengels@gmail.com> wrote:
> On Fri, Sep 30, 2011 at 9:39 AM, Ariel T. Glenn <ariel@wikimedia.org> wrote:
>
>> Out of curiosity... If the new revisions of one of these badly edited
>> pages are deleted, leaving the top revision as the one just before the
>> bad iw bot edit, does a rerun of the bot on the page fail?
>>
>
> I did a test, and the result was very interesting, which might point to the
> cause of this bug:
>
> I deleted the page [[nl:Blankenbach]], then restored the 2 versions before
> the problematic bot edit. When I now look at the page, instead of the page
> content I get:
>
Can you try this on another of the problem pages?

Re: serious interwiki.py issues on MW 1.18 wikis [ In reply to ]
Hi Ariel and Andre,

On Fri, Sep 30, 2011 at 9:39 AM, Ariel T. Glenn <ariel@wikimedia.org> wrote:
> Out of curiosity... If the new revisions of one of these badly edited
> pages are deleted, leaving the top revision as the one just before the
> bad iw bot edit, does a rerun of the bot on the page fail?

On Fri, Sep 30, 2011 at 11:13 AM, Andre Engels <andreengels@gmail.com> wrote:
> I deleted the page [[nl:Blankenbach]], then restored the 2 versions before
> the problematic bot edit. When I now look at the page, instead of the page
> content I get:
(...)

Running interwiki.py against this undeleted version gives the
expected result:
valhallasw@dorthonion:~/src/pywikipedia/trunk$ python interwiki.py
-page:Blankenbach
NOTE: Number of pages queued is 0, trying to add 60 more.
Getting 1 pages from wikipedia:nl...
WARNING: Family file wikipedia contains version number 1.17wmf1, but
it should be 1.18wmf1
NOTE: [[nl:Blankenbach]] does not exist. Skipping.

This also happens when running it from dewiki (python interwiki.py
-lang:de -page:Blankenbach%20%28Begriffskl%C3%A4rung%29) or running as
'full-auto' bot (python interwiki.py -all -async -cleanup -log -auto
-ns:0 -start:Blankenbach).

Special:Export acts like the page just does not exist
(http://nl.wikipedia.org/w/index.php?title=Speciaal:Exporteren&useskin=monobook&action=submit&curonly=True&pages=Blankenbach%0D%0ABlanzac
shows page Blanzac but not Blankenbach)

api.php also more or less does the expected thing:
http://nl.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Blankenbach&rvprop=timestamp|user|comment|content
- that is, unless you supply rvlimit=1:
http://nl.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Blankenbach&rvprop=timestamp|user|comment|content&rvlimit=1
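
For anyone reproducing this, a quick way to compare the two responses --
plain Python 3, nothing pywikipedia-specific, rvprop trimmed to content
for brevity:

import json
import urllib.request

API = ('http://nl.wikipedia.org/w/api.php?action=query&prop=revisions'
       '&titles=Blankenbach&rvprop=content&format=json')

for suffix in ('', '&rvlimit=1'):
    with urllib.request.urlopen(API + suffix) as resp:
        page = next(iter(json.load(resp)['query']['pages'].values()))
    # A page object without a 'revisions' key means the API returned no
    # retrievable revision -- not the same thing as an empty page text.
    print(suffix or '(no rvlimit)', 'has revisions:', 'revisions' in page)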

However, none of them seem to return an empty page - and playing
around with pywikipediabot does not allow me to get an empty page
(depending on settings, the result is either the text from the edit page
(page.get(), use_api=False / screen scraping), a
pywikibot.exceptions.NoPage exception (PreloadingGenerator /
wikipedia.getall, which uses Special:Export), or the correct page text
(page.get(), use_api=True)).

Anyway, thanks a huge heap for trying this (and to everyone, for
thinking about it). Unfortunately, I won't have much time this weekend
to debug -- hopefully some other pwb developer has.

Best regards, and thanks again,
Merlijn

P.S.
On 30 September 2011 11:12, Max Semenik <maxsem.wiki@gmail.com> wrote:
> So you screen-scrape? No surprise it breaks. Why? For example, due to
> protocol-relative URLs, or some other changes to the HTML output. Why not
> just use the API?
No, most of pywikipedia has been adapted to the api and/or
special:export, which, imo, is just an 'old' mediawiki api. Keep in
mind interwiki.py is old (2003!), and pywikipedia initially was an
extension of the interwiki bot. Thus, there could very well be some
code that is seldom used which still uses screen scraping. And
actually, in practice, screen scraping worked pretty well.

Re: serious interwiki.py issues on MW 1.18 wikis [ In reply to ]
Merlijn,

Not bothered by any actual knowledge of pywikibot (which makes it far
easier to comment!): is it possible that the bot assumes it's fetching
a page, but the request actually raises an error instead, and this is
not handled, interpreting the lack of response as an empty string?
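
If so, a guard along these lines would at least make the failure loud
instead of silent (a hypothetical sketch, not the actual pywikipedia code
path):

import urllib.error
import urllib.request

def get_page_text(url):
    """Defensive fetch: never let a failed request look like an empty page."""
    try:
        with urllib.request.urlopen(url) as resp:
            text = resp.read().decode('utf-8', errors='replace')
    except urllib.error.URLError as err:
        # The suspected bug: swallowing this error and returning '' would
        # make the bot treat the page as empty and then save /only/ the
        # newly determined interwiki links. Re-raising keeps it visible.
        raise RuntimeError('fetch failed for %s: %s' % (url, err))
    if not text.strip():
        raise RuntimeError('suspiciously empty response for %s' % url)
    return text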

Regards,

Martijn

On Fri, Sep 30, 2011 at 10:37 PM, Merlijn van Deen <valhallasw@arctus.nl> wrote:
> Hi Ariel and Andre,
>
> On Fri, Sep 30, 2011 at 9:39 AM, Ariel T. Glenn <ariel@wikimedia.org> wrote:
>> Out of curiosity... If the new revisions of one of these badly edited
>> pages are deleted, leaving the top revision as the one just before the
>> bad iw bot edit, does a rerun of the bot on the page fail?
>
> On Fri, Sep 30, 2011 at 11:13 AM, Andre Engels <andreengels@gmail.com> wrote:
>> I deleted the page [[nl:Blankenbach]], then restored the 2 versions before
>> the problematic bot edit. When I now look at the page, instead of the page
>> content I get:
> (...)
>
> Running interwiki.py against this undeleted version gives the
> expected result:
> valhallasw@dorthonion:~/src/pywikipedia/trunk$ python interwiki.py
> -page:Blankenbach
> NOTE: Number of pages queued is 0, trying to add 60 more.
> Getting 1 pages from wikipedia:nl...
> WARNING: Family file wikipedia contains version number 1.17wmf1, but
> it should be 1.18wmf1
> NOTE: [[nl:Blankenbach]] does not exist. Skipping.
>
> This also happens when running it from dewiki (python interwiki.py
> -lang:de -page:Blankenbach%20%28Begriffskl%C3%A4rung%29) or running as
> 'full-auto' bot (python interwiki.py -all -async -cleanup -log -auto
> -ns:0 -start:Blankenbach).
>
> Special:Export acts like the page just does not exist
> (http://nl.wikipedia.org/w/index.php?title=Speciaal:Exporteren&useskin=monobook&action=submit&curonly=True&pages=Blankenbach%0D%0ABlanzac
> shows page Blanzac but not Blankenbach)
>
> api.php also more or less does the expected thing:
> http://nl.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Blankenbach&rvprop=timestamp|user|comment|content
> - that is, unless you supply rvlimit=1:
> http://nl.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Blankenbach&rvprop=timestamp|user|comment|content&rvlimit=1
>
> However, none of them seem to return an empty page - and playing
> around with pywikipediabot does not allow me to get an empty page
> (depending on settings, the result is either the text from the edit page
> (page.get(), use_api=False / screen scraping), a
> pywikibot.exceptions.NoPage exception (PreloadingGenerator /
> wikipedia.getall, which uses Special:Export), or the correct page text
> (page.get(), use_api=True)).
>
> Anyway, thanks a huge heap for trying this (and to everyone, for
> thinking about it). Unfortunately, I won't have much time this weekend
> to debug -- hopefully some other pwb developer has.
>
> Best regards, and thanks again,
> Merlijn
>
> P.S.
> On 30 September 2011 11:12, Max Semenik <maxsem.wiki@gmail.com> wrote:
>> So you screen-scrape? No surprise it breaks. Why? For example, due to
>> protocol-relative URLs, or some other changes to the HTML output. Why not
>> just use the API?
> No, most of pywikipedia has been adapted to the api and/or
> special:export, which, imo, is just an 'old' mediawiki api. Keep in
> mind interwiki.py is old (2003!), and pywikipedia initially was an
> extension of the interwiki bot. Thus, there could very well be some
> code that is seldom used which still uses screen scraping. And
> actually, in practice, screen scraping worked pretty well.
>
