Get ArchiveLinks the last step to completion
The Internet Archive particularly wants to make sure it archives the pages
that Wikipedians use as citations. A GSoC project last year got most of
the way to that goal, but never quite finished the feed of new links for
the Archive to consume. Would anyone else like to take this up?

More information:

https://www.mediawiki.org/wiki/User:Kevin_Brown/ArchiveLinks

http://toolserver.org/~nn123645/toolserver-feed/cronscript.php (You
could ask Kevin to turn his Toolserver project into an MMP
(multi-maintainer project), or you could just write your own script.)

https://www.mediawiki.org/wiki/Extension:ArchiveLinks - would have to be
moved into Git from Subversion.

http://en.wikipedia.org/w/index.php?title=Wikipedia:Village_pump_(policy)&oldid=511258971#Alarming_policy_on_Sources_that_should_be_addressed
- there is a real hunger for this!

--
Sumana Harihareswara
Engineering Community Manager
Wikimedia Foundation

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: Get ArchiveLinks the last step to completion
A really essential extension to finish and bring into production!
Unfortunately, I have no time to work on it :(
Emmanuel

Re: Get ArchiveLinks the last step to completion
Hi -- instead of the implementation suggested above, which seems to
combine link discovery with its own archiving engine, how about just
generating an RSS feed of external links present (or possibly just those
newly inserted) in pages edited in the last (say) five minutes, for
other entities such as the Internet Archive to consume?

This would require only soft state. It would not require the WMF to fetch
or store any external web content, with all of the possible problems
associated with web archiving (retries, security, copyright, legality...),
and it would not require the WMF to keep track of which resources had been
archived: each external archive could do that for itself.

The guts of something like this could be written using only the
http://www.mediawiki.org/wiki/API:Recentchanges and
http://www.mediawiki.org/wiki/API:Exturlusage APIs.
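
For illustration, here is a rough sketch of that approach (a sketch under
assumptions, not a definitive implementation): poll list=recentchanges for
pages edited in the last few minutes, fetch their external links with
prop=extlinks, and wrap the result in a bare-bones RSS document. The
endpoint, time window, user agent and the third-party "requests" library are
my own choices, and prop=extlinks stands in here since the goal is "the
links on recently edited pages":

# Sketch only: external links on recently edited enwiki pages, as minimal RSS.
# Endpoint, window, user agent and output shape are illustrative assumptions.
import datetime
from xml.sax.saxutils import escape

import requests  # third-party; any HTTP client would do

API = "https://en.wikipedia.org/w/api.php"
WINDOW_MINUTES = 5

def recent_titles(session):
    """Titles edited in the last WINDOW_MINUTES (list=recentchanges)."""
    end = datetime.datetime.utcnow() - datetime.timedelta(minutes=WINDOW_MINUTES)
    r = session.get(API, params={
        "action": "query", "format": "json",
        "list": "recentchanges", "rctype": "edit|new",
        "rcprop": "title|timestamp", "rclimit": "max",
        "rcend": end.strftime("%Y-%m-%dT%H:%M:%SZ"),  # stop at the window edge
    })
    r.raise_for_status()
    return {rc["title"] for rc in r.json()["query"]["recentchanges"]}

def external_links(session, titles):
    """External links currently present on those pages (prop=extlinks)."""
    links, titles = set(), list(titles)
    for i in range(0, len(titles), 50):  # the API takes up to 50 titles per query
        r = session.get(API, params={
            "action": "query", "format": "json",
            "prop": "extlinks", "ellimit": "max",
            "titles": "|".join(titles[i:i + 50]),
        })
        r.raise_for_status()
        for page in r.json()["query"]["pages"].values():
            for el in page.get("extlinks", []):  # continuation omitted for brevity
                links.add(el["*"])
    return links

def as_rss(links):
    """Wrap the links in a bare-bones RSS 2.0 document."""
    items = "\n".join("    <item><link>%s</link></item>" % escape(link)
                      for link in sorted(links))
    return ('<?xml version="1.0"?>\n<rss version="2.0"><channel>\n'
            '    <title>External links in recent edits</title>\n'
            '%s\n</channel></rss>' % items)

if __name__ == "__main__":
    s = requests.Session()
    s.headers["User-Agent"] = "archive-feed-sketch/0.1 (example contact)"
    print(as_rss(external_links(s, recent_titles(s))))

Run from cron every few minutes, with the output cached, something like that
would give the Archive a feed to poll without the WMF storing any external
content itself.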

It looks like Kevin's "cronscript" link above does something just like
this already -- adapting it to generate RSS, and caching the output to
prevent massive CPU overhead on repeated calls, would surely be trivial.
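
On the caching point, even a flat file with a short TTL would be enough; a
hypothetical sketch (the path and TTL are illustrative):

# Sketch of TTL-based output caching: regenerate the feed at most once per
# CACHE_SECONDS; otherwise serve the copy written on the previous run.
import os
import time

CACHE_FILE = "/tmp/archive-feed.rss"  # illustrative path
CACHE_SECONDS = 300                   # regenerate at most every five minutes

def cached_feed(generate):
    """Return the cached feed if fresh, else call generate() and cache it."""
    try:
        if time.time() - os.path.getmtime(CACHE_FILE) < CACHE_SECONDS:
            with open(CACHE_FILE) as f:
                return f.read()
    except OSError:
        pass  # no cache file yet; fall through and generate
    feed = generate()
    with open(CACHE_FILE, "w") as f:
        f.write(feed)
    return feed

Repeated requests inside the window then never re-run the expensive API
queries.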

Neil


Re: Get ArchiveLinks the last step to completion
On 18/11/12 13:36, Sumana Harihareswara wrote:
> The Internet Archive wants to particularly make sure to archive pages
> that Wikipedians use as citations. A GSoC project last year got most of
> the way to that goal but never quite finished making the feed of new
> links for use by the Archive. Would anyone else like to take this up?
>
> More information:
>
> https://www.mediawiki.org/wiki/User:Kevin_Brown/ArchiveLinks
>
> http://toolserver.org/~nn123645/toolserver-feed/cronscript.php (You
> could ask Kevin to make his Toolserver project a MMP or you could just
> write your own script.)

This is quite straightforward.


> https://www.mediawiki.org/wiki/Extension:ArchiveLinks - would have to be
> moved into Git from Subversion.

This is the longer-term plan, which is harder to do right, although I do
see the code going in the right direction.



> http://en.wikipedia.org/w/index.php?title=Wikipedia:Village_pump_(policy)&oldid=511258971#Alarming_policy_on_Sources_that_should_be_addressed
> - there is a real hunger for this!

I'd start by solving the problem with the Toolserver MMP; it can be
improved later.

I see a potential problem with missing new content added to a page later,
though, and I'm not sure how Kevin expected to handle that. It's possible
that the archiver recrawls automatically anyway, so it wouldn't matter
(this differs between archivers, e.g. the Internet Archive vs. WebCite).


