Mailing List Archive

Archival for Web Citations (GSoC project)
Hi, I'm Kevin Brown, a GSoC student this year. I live in Melbourne, Florida
and am attending Brevard Community College. My previous projects include work
on bots on the English Wikipedia for tagging of uncategorized pages and new
page patrol cleanup.

Almost since the web’s inception, link rot has been a major problem. Web-based
content comes and goes, sometimes within a matter of hours. This presents a
major problem, both for users seeking to access this information and for
Wikipedia's core content policy of verifiability. While Wikipedia policy does
not require users to use web citations, they are by far the most popular form
of citation, because they're easy for readers and editors to access.

To help solve this and ensure adherence to verifiability (WP:V), I plan to
create an archival system over the summer, so users can access all external
links even if they go down. This preemptive archival should effectively
solve the problem of link rot, as long as the source site allows caching of
its content. The project aims to get something that "just works" without
user input/request and to seamlessly integrate with existing page parsing
and rendering. Such a system will allow users to focus on content creation,
rather than on the distracting technical aspects of archival.

I would appreciate your help with the project. Specifically, I'd appreciate
it if communities could start discussing this on their project's local village
pump, so that we can start developing consensus for deployment.
Also, please feel free to email me or find me on IRC under the nick kevin_brown
regarding any questions you may have.

I am currently drafting proposal and design documents and will be linking
them as they become available. For now, please see a few relevant
proposals:
http://en.wikipedia.org/wiki/Wikipedia:WikiProject_External_links/Webcitebot2
http://en.wikipedia.org/wiki/Wikipedia_talk:Link_rot#Proposal_for_new_WikiProject_to_repair_dead_links
http://en.wikipedia.org/wiki/Wikipedia:Bots/Requests_for_approval/WebCiteBOT
http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Council/Proposals/Dead_Link_Repair

(Thanks to Neil and Sumana for helping me write this.)

Best,
Kevin
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: Archival for Web Citations (GSoC project)
You might want to dig into French Wikipedia. IIRC They run a link
archival service (there was discussion about enabling it for English
Wikipedia, but I don't think it came to anything) and might have some
helpful material.

I forget the name I'm afraid, it's discussed somewhere on the en.wiki
Village Pump so I'll see if I can dig it out.

Tom Morton

On 1 Jun 2011, at 21:51, foo bar <nnwiki@gmail.com> wrote:

> Hi, I'm Kevin Brown, a GSoC student this year. I live in Melbourne, Florida
> and am attending Brevard Community College. My previous projects include work
> on bots on the English Wikipedia for tagging of uncategorized pages and new
> page patrol cleanup.
>
> Almost since the web’s inception, link rot has been a major problem. Web-based
> content comes and goes, sometimes within a matter of hours. This presents a
> major problem, both for users seeking to access this information and for
> Wikipedia's core content policy of verifiability. While Wikipedia policy does
> not require users to use web citations, they are by far the most popular form
> of citation, because they're easy for readers and editors to access.
>
> To help solve this and ensure adherence to verifiability (WP:V), I plan to
> create an archival system over the summer, so users can access all external
> links even if they go down. This preemptive archival should effectively
> solve the problem of link rot, as long as the source site allows caching of
> its content. The project aims to get something that "just works" without
> user input/request and to seamlessly integrate with existing page parsing
> and rendering. Such a system will allow users to focus on content creation,
> rather than on the distracting technical aspects of archival.
>
> I would appreciate your help with the project. Specifically, I'd appreciate
> it if communities could start discussing this on their project's local village
> pump, so that we can start developing consensus for deployment.
> Also, please feel free to email me or find me on IRC under the nick kevin_brown
> regarding any questions you may have.
>
> I am currently drafting proposal and design documents and will be linking
> them as they become available. For now, please see a few relevant
> proposals:
> http://en.wikipedia.org/wiki/Wikipedia:WikiProject_External_links/Webcitebot2
> http://en.wikipedia.org/wiki/Wikipedia_talk:Link_rot#Proposal_for_new_WikiProject_to_repair_dead_links
> http://en.wikipedia.org/wiki/Wikipedia:Bots/Requests_for_approval/WebCiteBOT
> http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Council/Proposals/Dead_Link_Repair
>
> (Thanks to Neil and Sumana for helping me write this.)
>
> Best,
> Kevin

Re: Archival for Web Citations (GSoC project)
Wikiwix, I think --

http://en.wikipedia.org/wiki/Wikipedia:Requests_for_comment/Archived_citations
http://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_External_links/Webcitebot2

Kevin, you should check out the second link above for projects which are
potentially similar to yours.

Pete

On 6/1/11 13:59 PM, Thomas Morton wrote:

> You might want to dig into French Wikipedia. IIRC They run a link
> archival service (there was discussion about enabling it for English
> Wikipedia, but I don't think it came to anything) and might have some
> helpful material.
>
> I forget the name I'm afraid, it's discussed somewhere on the en.wiki
> Village Pump so I'll see if I can dig it out.
>
> Tom Morton


Re: Archival for Web Citations (GSoC project)
Welcome Kevin,

I tried to contact you a few days ago, but was unable to.
Please create a wiki account (with email notifications enabled) and
commit your USERINFO.



Re: Archival for Web Citations (GSoC project)
Hello

http://en.wikipedia.org/wiki/Wikipedia:Requests_for_comment/Archived_citations_2

If you want more explanation, you can contact me; we have built a solution
to store external links.

It is already used by fr.wikipedia and hu.wikipedia.


Regards,
Pascal Martin
06 13 89 77 32
02 32 40 23 69


----- Original Message -----
From: "Platonides" <Platonides@gmail.com>
To: <wikitech-l@lists.wikimedia.org>
Sent: Thursday, June 02, 2011 12:37 AM
Subject: Re: [Wikitech-l] Archival for Web Citations (GSoC project)


> Welcome Kevin,
>
> I tried to contact you a few days ago, but was unable to.
> Please create a wiki account (with email notifications enabled) and
> commit your USERINFO.

