Mailing List Archive

Re: [Wikimedia-l] Quality issues
On Sat, Nov 28, 2015 at 5:23 AM, Andreas Kolbe <jayen466@gmail.com> wrote:

> To the extent that Wikidata draws on Wikipedia, its CC0 license would
> appear to be a gross violation of Wikipedia's share-alike license
> requirement.
>

By the same logic, to the extent Wikipedia takes its facts from non-free
external sources, its free license would be a copyright violation. Luckily
for us, that's not how copyright works. Statements of fact cannot be
copyrighted. Large-scale arrangements of facts (i.e. a full database)
probably can be, but CC does not prevent others from using them without
attribution, only from distributing them without attribution (again, it's
like the GPL/Affero difference). There are sui generis database rights in
some countries, but not in the USA, where both Wikipedia and most
proprietary reusers/competitors are located, so relying on neighbouring
rights would not help there but would cause legal uncertainty for reusers
(e.g. OSM, which has lots of legal trouble importing coordinates due to
being EU-based).

> The generation of data always has a social context. Knowing where data come
> from is a good thing.
>

You probably won't find any Wikipedian who disagrees; verifiability is one
of the fundaments of the project. But something being good and using
restrictive licensing to force others to do it are very different things.
_______________________________________________
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>
Re: [Wikimedia-l] Quality issues
Hoi,
It was from the Myanmar Wikipedia that a lot of data was imported to
Wikidata, data that did not exist elsewhere. I do not really care what
"Freedom House" says. I do not know them; I do know that the data is
relevant and useful. It was even the subject of a blog post.

You may ignore data that is not from a source that you like. This
indiscriminate POV is not a NPOV.

As to Grasulf, you failed to get the point. It was NOT about the data
itself but about the presentation. I worked on this item because a
duplicate was created with even less data.

While I happily agree that sources are good, I will not ask people to start
adding sources at this point in time; it will not improve quality
significantly. It makes more sense once we are at a stage where multiple
sources disagree on values for statements. Adding sources is significantly
more meaningful and useful once we start curating data. Statistically, most
errors will be found where sources disagree.

When people add conflicting data, it is indeed really relevant to add
sources. My practice is to only add data that fulfils some minimal
criteria. Typically I am not interested in adding data that already exists.
I will replace less precise data with more precise data.

The biggest issue with data is that we do not have enough of it and the
second most relevant issue is that we need processes to compare sources
with Wikidata and have a workflow to curate differences.
Thanks,
GerardM

On 28 November 2015 at 19:23, Andreas Kolbe <jayen466@gmail.com> wrote:

> Gerard,
>
>
> On Fri, Nov 27, 2015, Gerard Meijssen <gerard.meijssen@gmail.com> wrote:
>
> > When you compare the quality of Wikipedias with what en.wp used to be, you
> > are comparing apples and oranges. The Myanmar Wikipedia is better informed
> > on Myanmar than en.wp etc.
> >
>
>
> Is it? The entire Burmese Wikipedia contains a mere 31,646 content pages at
> the time of writing, covering (or trying to cover) all countries of the
> world, and all aspects of human knowledge.[1]
>
> The English Wikipedia's WikiProject Myanmar, meanwhile, has 6,713 pages
> within its purview.[2] I dare say that's more articles on Myanmar than the
> Burmese Wikipedia contains. As an indication, the English Wikipedia's
> article on Myanmar is more than twice as long as the one in the Burmese
> Wikipedia.
>
> Moreover, according to Freedom House[3], the internet in Myanmar is not
> free:
>
> "The government detained and charged internet users for online activities
> [...] Government officials pressured social media users not to distribute
> or share content that offends the military, or disturbs the functions of
> government."
>
>
>
> > When you qualify a Wikipedia as fascist, it does not follow that the data
> > is suspect. Certainly when data in a source that you so easily dismiss is
> > typically the same, there is not much meaning in what you say from a
> > Wikidata point of view.
> >
>
>
> Data are always generated within a social context, and data generated by
> political extremists or people living under oppressive regimes are suspect
> whenever they have political implications. (Looking at the descriptions of
> Burmese politics, my feeling is the Burmese Wikipedia is not under
> significant government control, but largely written by ex-pats. However,
> the situation is quite different in some other Wikipedias serving countries
> labouring under similar regimes.)
>
>
>
> > PS What does your librarian think when she knows
>
>
>
> It was a he, but I'll leave him to join in himself if he chooses to.
>
>
> > I happen to work on the Dukes of Friuli. Compare the data from Wikidata and
> > the information by Reasonator based on the same item for one of them.
> >
> > https://tools.wmflabs.org/reasonator/?&q=2471519
> > https://www.wikidata.org/wiki/Q2471519
> >
>
>
> Let's look at this example. Reasonator says of Grasulf II of Friuli, "He
> died in 653". There is no source. Wikidata says he died in 653, and the
> indicated source is the Italian Wikipedia.
>
> However, when you look at the (very brief) Italian Wikipedia article[4],
> you will find that the year 653 is given with a question mark. The English
> Wikipedia, in contrast, states, in its similarly brief article[5],
>
> "Nothing more is known about Grasulf and the date of his death is
> uncertain."
>
> Do you now see the problem about nuance? Reasonator and Wikidata
> confidently proclaim as uncontested fact something that in fact is rather
> uncertain.
>
> The sole source cited by both the English and the Italian Wikipedia is the
> Historia Langobardorum, available in Wikisource.[6] My Latin is a bit
> rusty, but while the Historia mentions that Ago succeeded Grasulf upon the
> latter's death, it says nothing specific about when that was. The
> Historia's time indications are in general very vague, usually limited to
> the phrase "Circa haec tempora", meaning "about this time". So it is in
> this case.
>
> For reference, the Google Knowledge Graph states equally confidently that
> Grasulf II of Friuli died in AD 651. This may be based on the English
> Wikipedia's unsourced claim (in the template at the bottom of the English
> Wikipedia article) that his reign ended c. 651, or on some other source
> like Freebase.
>
> The other Wikipedias that have articles on Grasulf II provide the following
> death dates:
>
> Catalan: 651
> Galician: 653
> Lithuanian: 653
> Polish: 651
> Romanian: Unknown
> Russian: 653
> Ukrainian: 651
>
> As for published sources, I can offer Ersch's Allgemeine Encyclopädie
> (1849), which states on page 209 that Grasulf II died in 651.[7]
>
> The extreme vagueness of the available dates is pointed out by Thomas
> Hodgkin in Vol. 7 of "Italy and Her Invaders" (1895). Hodgkin puts the end
> of Grasulf's reign at 645, "as a mere random guess", and adds that "De
> Rubeis, following Sigonius", puts the accession of Ago in 661.[8]
>
> There may well be better and more recent sources beyond my reach, but
> having these published dates in Wikidata, with the source references, would
> actually make some sense. Unsourced data, not so much.
>
> Answers are comfortable, but they are not knowledge when they are
> unverifiable and/or wrong.
>
>
> [1] https://meta.wikimedia.org/wiki/List_of_Wikipedias#10_000.2B_articles
> [2] https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Myanmar_(Burma)/Assessment
> [3] https://freedomhouse.org/report/freedom-net/2015/myanmar
> [4] https://it.wikipedia.org/w/index.php?title=Grasulfo_II_del_Friuli&oldid=76641444
> [5] https://en.wikipedia.org/w/index.php?title=Grasulf_II_of_Friuli&oldid=633223880
> [6] https://la.wikisource.org/wiki/Historia_Langobardorum/Liber_IV
> [7] https://books.google.co.uk/books?id=FzxYAAAAYAAJ&pg=PA209&dq=grasulf+friuli+651%7C653&hl=en&sa=X&ved=0ahUKEwiNh5Tz0rPJAhUIChoKHV6lDTYQ6AEILzAC#v=onepage&q=grasulf%20friuli%20651%7C653&f=false
> [8] https://books.google.co.uk/books?id=8ToOAwAAQBAJ&dq=grasulf+friuli+651%7C653&q=Grasulf+%22mere+random+guess%22#v=snippet&q=Grasulf%20%22mere%20random%20guess%22&f=false
Re: [Wikimedia-l] Quality issues
>
> While I happily agree that sources are good, I will not ask people to start
> adding sources at this point in time; it will not improve quality
> significantly. It makes more sense once we are at a stage where multiple
> sources disagree on values for statements. Adding sources is significantly
> more meaningful and useful once we start curating data.


The problem is that by the time Wikidata starts to curate data, it will
have corrupted that data with its own data. Secondly, past experience with
wikis is that fixing data after it's been entered is actually harder and
more time-consuming, and the damage to reputation will have a lasting
impact; fixing that consumes millions of dollars in donor money. As said
earlier, there are lessons in the development of Wikipedia that should be
heeded in an attempt to avoid those same pitfalls.


--
GN.
President Wikimedia Australia
WMAU: http://www.wikimedia.org.au/wiki/User:Gnangarra
Photo Gallery: http://gnangarra.redbubble.com
Re: [Wikimedia-l] Quality issues
On Sun, Nov 29, 2015 at 12:37 AM, Gerard Meijssen <gerard.meijssen@gmail.com> wrote:

> As to Grasulf, you failed to get the point. It was NOT about the data
> itself but about the presentation.
>


QED. :)
Re: [Wikimedia-l] Quality issues
Hoi,
Wikidata is a wiki, and you seem to always forget that.

The corruption of data... how? Each statement is its own data item; how do
you corrupt that? As I say so often, when you get a collection that is 80%
correct, you have an error rate of 20%. When you do not include that data,
you have an error rate of 100%. When you have another source with similar
data that is 90% correct and you have an overlap of 50%, you can be smart
and, at the start or later, compare the data and curate. When you only
import at the start what is the same, you probably get something like 84%
correct data imported. You can gamify the rest, but however you slice it,
what you do not have and could have is 100% wrong.

Wikidata is NOT Wikipedia. It is much easier to curate data, and
consequently your argument is FUD. The big thing we have not learned is
cooperation. We do not cooperate. We do not have, as standard, RSS feeds for
the changes to the items that belong to a specific source. We are happy to
get data, but we do not reach out and give back. For me, the fact that VIAF
uses Wikidata as a link is an opportunity to do better. The cooperation with
the German DNB is the kind of project we should emulate.

When you talk about quality, you talk in an insular fashion: we have to do
it, our community. At Wikidata our community can include other
organisations with rich collections of high-quality data. We can share,
compare, curate. Even with our current low quality, we have subsets of data
that shine, subsets of data that are of at least the same quality as
Wikipedia. However, this quality is often marred by a lack of quantity,
quantity we can have when collaboration is what we do.

You are afraid for our reputation. Reputation has many aspects. Jane023
presented at the Dutch Wikimedia conference. She uses a tool that makes
things easier on her, because no Wikipedians bother her when it is a
Wikidata-based list. A similar list is now used, for its quality, on the
Welsh Wikipedia. As she reported, the data is of such quality that Google
actually uses it.

When I see the religious application of Wikipedia sentiments, I find that
we do not even care for the life of one of our own. Bassel has been executed
or is likely to be executed soon, and some think our neutrality is so
important. FOR WHAT? So that we may not even protect our own? Is it right to
protest against TTIP (and we should) and not protest for a Wikipedian who
embodies our values?

Wikipedia thinking is not applicable to Wikidata at this stage. Its quality
is arguably piss poor, but better in places. Many items are corrupt because
they follow the structure of Wikipedia articles, a structure Wikipedians
insist on because they wrote that article and "Wikidata is only a service
project".

I do agree that we need more quality. My approach has set theory on its
side and embodies the wiki approach; yours is one where Pallas Athena is to
rise from the brain of Zeus in full armour. You may have noticed that my
arguments are easy to follow and conform to something measurable. Yours are
private; there is no possibility to verify the accuracy of your argument. I
call bullshit on your argument, not because you do not make a fine
argument, but because it is an argument that prevents us from improving
Wikidata.

My hope is that we can work constructively on our quality and have a
measurable effect.
Thanks,
GerardM

On 29 November 2015 at 02:05, Gnangarra <gnangarra@gmail.com> wrote:

> The problem is that by the time Wikidata starts to curate data, it will
> have corrupted that data with its own data. Secondly, past experience with
> wikis is that fixing data after it's been entered is actually harder and
> more time-consuming, and the damage to reputation will have a lasting
> impact; fixing that consumes millions of dollars in donor money. As said
> earlier, there are lessons in the development of Wikipedia that should be
> heeded in an attempt to avoid those same pitfalls.
>
>
> On 29 November 2015 at 08:37, Gerard Meijssen <gerard.meijssen@gmail.com>
> wrote:
>
> > Hoi,
> > It was from the Myanmar WIkipedia that a lot of data was imported to
> > Wikidata. Data that did not exist elsewhere. I do not care really what
> > "Freedom House" says. I do not know them, I do know that the data is
> > relevant and useful It was even the subject on a blogpost..
> >
> > You may ignore data that is not from a source that you like. This
> > indiscriminate POV is not a NPOV.
> >
> > As to Grasulf, you failed to get the point. It was NOT about the data
> > itself but about the presentation. I worked on this item because a
> > duplicate was created with even less data.
> >
> > While I happily agree that Sources are good, I will not ask people to
> start
> > adding Sources at this point of time it will not improve quality
> > signifcantly. It makes more sense once we are at a stage where multiple
> > sources disagree on values for statements. Adding sources is signifcantly
> > more meaningful and useful once we start curating data. Statistically
> most
> > errors will be found where sources disagree.
> >
> > When people add conflicting data, it is indeed really relevant to add
> > Sources. My practice for adding data is that I will only add data that
> > fulfils some minimal criteria. Typically I am not interested in adding
> data
> > that already exists. I will remove less precise for more precise data.
> >
> > The biggest issue with data is that we do not have enough of it and the
> > second most relevant issue is that we need processes to compare sources
> > with Wikidata and have a workflow to curate differences.
> > Thanks,
> > GerardM
> >
> > On 28 November 2015 at 19:23, Andreas Kolbe <jayen466@gmail.com> wrote:
> >
> > > Gerard,
> > >
> > >
> > > On Fri, Nov 27, 2015, Gerard Meijssen <gerard.meijssen@gmail.com>
> wrote:
> > >
> > > When you compare the quality of Wikipedias with what en.wp used to be
> you
> > > > are comparing apples and oranges. The Myanmar Wikipedia is better
> > > informed
> > > > on Myanmar than en.wp etc.
> > > >
> > >
> > >
> > > Is it? The entire Burmese Wikipedia contains a mere 31,646 content
> pages
> > at
> > > the time of writing, covering (or trying to cover) all countries of the
> > > world, and all aspects of human knowledge.[1]
> > >
> > > The English Wikipedia's WikiProject Myanmar, meanwhile, has 6,713 pages
> > > within its purview.[2] I dare say that's more articles on Myanmar than
> > the
> > > Burmese Wikipedia contains. As an indication, the English Wikipedia's
> > > article on Myanmar is more than twice as long as the one in the Burmese
> > > Wikipedia.
> > >
> > > Moreover, according to Freedom House[3], the internet in Myanmar is not
> > > free:
> > >
> > > "The government detained and charged internet users for online
> activities
> > > [...] Government officials pressured social media users not to
> distribute
> > > or share content that offends the military, or disturbs the functions
> of
> > > government."
> > >
> > >
> > >
> > > > When you qualify a Wikipedia as fascist, it does not follow that the
> > data
> > > > is suspect. Certainly when data in a source that you so easily
> dismiss
> > is
> > > > typically the same, there is not much meaning in what you say from a
> > > > Wikidata point of view.
> > > >
> > >
> > >
> > > Data are always generated within a social context, and data generated
> by
> > > political extremists or people living under oppressive regimes are
> > suspect
> > > whenever they have political implications. (Looking at the descriptions
> > of
> > > Burmese politics, my feeling is the Burmese Wikipedia is not under
> > > significant government control, but largely written by ex-pats.
> However,
> > > the situation is quite different in some other Wikipedias serving
> > countries
> > > labouring under similar regimes.)
> > >
> > >
> > >
> > > > PS What does your librarian think when she knows
> > >
> > >
> > >
> > > It was a he, but I'll leave him to join in himself if he chooses to.
> > >
> > >
> > > I happen to work on Dukes of Friuli. Compare the data from Wikidata and
> > the
> > > > information by Reasonator based on the same item for one of them.
> > > >
> > > > https://tools.wmflabs.org/reasonator/?&q=2471519
> > > > https://www.wikidata.org/wiki/Q2471519
> > > >
> > >
> > >
> > > Let's look at this example. Reasonator says of Grasulf II of Friulim,
> "He
> > > died in 653". There is no source. Wikidata says he died in 653, and the
> > > indicated source is the Italian Wikipedia.
> > >
> > > However, when you look at the (very brief) Italian Wikipedia
> article[4],
> > > you will find that the year 653 is given with a question mark. The
> > English
> > > Wikipedia, in contrast, states, in its similarly brief article[5],
> > >
> > > "Nothing more is known about Grasulf and the date of his death is
> > > uncertain."
> > >
> > > Do you now see the problem about nuance? Reasonator and Wikidata
> > > confidently proclaim as uncontested fact something that in fact is
> rather
> > > uncertain.
> > >
> > > The sole source cited by both the English and the Italian Wikipedia is
> > the
> > > Historia Langobardorum, available in Wikisource.[6] My Latin is a bit
> > > rusty, but while the Historia mentions that Ago succeeded Grasulf upon
> > the
> > > latter's death, it says nothing specific about when that was. The
> > > Historia's time indications are in general very vague, usually limited
> to
> > > the phrase "Circa haec tempora", meaning "about this time". So it is in
> > > this case.
> > >
> > > For reference, the Google Knowledge Graph states equally confidently
> that
> > > Grasulf II of Friuli died in 651AD. This may be based on the English
> > > Wikipedia's unsourced claim (in the template at the bottom of the
> English
> > > Wikipedia article) that his reign ended c. 651, or on some other source
> > > like Freebase.
> > >
> > > The other Wikipedias that have articles on Grasulf II provide the
> > following
> > > death dates
> > >
> > > Catalan: 651
> > > Galician: 653
> > > Lithuanian: 653
> > > Polish: 651
> > > Romanian: Unknown
> > > Russian: 653
> > > Ukrainian: 651
> > >
> > > As for published sources, I can offer Ersch's Allgemeine Encyclopädie
> > > (1849), which states on page 209 that Grasulf II died in 651.[7]
> > >
> > > The extreme vagueness of the available dates is pointed out by Thomas
> > > Hodgkin in Vol. 7 of "Italy and Her Invaders" (1895). Hodgkin puts the
> > end
> > > of Grasulf's reign at 645, "as a mere random guess", and adds that "De
> > > Rubeis, following Sigonius", puts the accession of Ago in 661.[8]
> > >
> > > There may well be better and more recent sources beyond my reach, but
> > > having these published dates in Wikidata, with the source references,
> > would
> > > actually make some sense. Unsourced data, not so much.
> > >
> > > Answers are comfortable, but they are not knowledge when they are
> > > unverifiable and/or wrong.
> > >
> > >
> > > [1]
> > https://meta.wikimedia.org/wiki/List_of_Wikipedias#10_000.2B_articles
> > > [2]
> > >
> > >
> >
> https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Myanmar_(Burma)/Assessment
> > >
> > > [3] https://freedomhouse.org/report/freedom-net/2015/myanmar
> > > [4]
> > >
> > >
> >
> https://it.wikipedia.org/w/index.php?title=Grasulfo_II_del_Friuli&oldid=76641444
> > > [5]
> > >
> > >
> >
> https://en.wikipedia.org/w/index.php?title=Grasulf_II_of_Friuli&oldid=633223880
> > > [6] https://la.wikisource.org/wiki/Historia_Langobardorum/Liber_IV
> > > [7]
> > >
> > >
> >
> https://books.google.co.uk/books?id=FzxYAAAAYAAJ&pg=PA209&dq=grasulf+friuli+651%7C653&hl=en&sa=X&ved=0ahUKEwiNh5Tz0rPJAhUIChoKHV6lDTYQ6AEILzAC#v=onepage&q=grasulf%20friuli%20651%7C653&f=false
> > > [8]
> > >
> > >
> >
> https://books.google.co.uk/books?id=8ToOAwAAQBAJ&dq=grasulf+friuli+651%7C653&q=Grasulf+%22mere+random+guess%22#v=snippet&q=Grasulf%20%22mere%20random%20guess%22&f=false
>
>
>
> --
> GN.
> President Wikimedia Australia
> WMAU: http://www.wikimedia.org.au/wiki/User:Gnangarra
> Photo Gallery: http://gnangarra.redbubble.com
_______________________________________________
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>
Re: [Wikimedia-l] Quality issues
Gerard,
Thanks for highlighting my work! I already posted slides on Commons, but I
want to flesh them out with links to actual edits so people can better
understand some of these quality improvement workflows. The tools I use for
lists are written mostly by the Wikidata "god" Magnus Manske and the tools
I use on Commons are self-built kludges with the assistance of Commonist
Vera de Kok. Here is an example of a quality improvement I made this morning
to a file on Commons that an English Wikipedian originally uploaded with the
default uploader for use in an English Wikipedia list. The improvements come
both from the uploader's original edits on Wikipedia and from the associated
Wikidata list:
https://commons.wikimedia.org/w/index.php?title=File:Rembrandt_Man_with_a_Falcon_on_his_Wrist.jpg&diff=prev&oldid=180547014

Jane

On Sun, Nov 29, 2015 at 10:42 AM, Gerard Meijssen <gerard.meijssen@gmail.com> wrote:

> Hoi,
> Wikidata is a wiki, and you always seem to forget that.
>
> The corruption of data... how? Each statement is its own data item; how do
> you corrupt that? As I say so often, when you get a collection that is 80%
> correct, you have an error rate of 20%. When you do not include that data,
> you have an error rate of 100%. When you have another source with similar
> data that is 90% correct, and the two overlap by 50%, you can be smart and
> compare the data and curate, either at the start or later. When you only
> import at the start what is the same in both, you probably get something
> like 84% correct data imported. You can gamify the rest, but however you
> slice it, what you do not have and could have is 100% wrong.
>
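The compare-and-curate step described above can be sketched in a few lines. This is a minimal sketch, assuming each source is a plain dictionary from a hypothetical item key to a single value; `compare_sources` is an illustrative name, not part of any Wikidata tool. Values both sources agree on would be candidates for import, conflicts would go to a curation queue, and items held by only one source remain unverified.

```python
from typing import Dict, List, Tuple


def compare_sources(
    ours: Dict[str, object], theirs: Dict[str, object]
) -> Tuple[List[str], List[str], List[str]]:
    """Split item keys into (agreeing, conflicting, held by only one source)."""
    shared = ours.keys() & theirs.keys()
    agree = sorted(k for k in shared if ours[k] == theirs[k])
    conflict = sorted(k for k in shared if ours[k] != theirs[k])
    # Symmetric difference: keys present in exactly one of the two sources.
    only_one = sorted(ours.keys() ^ theirs.keys())
    return agree, conflict, only_one


# Hypothetical example data: item key -> year of death.
ours = {"item-a": 1901, "item-b": 1875, "item-c": 1920}
theirs = {"item-a": 1901, "item-b": 1876, "item-d": 1933}
agree, conflict, only_one = compare_sources(ours, theirs)
print(agree)     # ['item-a']            -> candidates for direct import
print(conflict)  # ['item-b']            -> curation queue
print(only_one)  # ['item-c', 'item-d']  -> single-source, unverified
```

Keeping the three buckets separate means the single-source items are not silently treated as confirmed just because they were imported.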
> Wikidata is NOT Wikipedia. It is much easier to curate data, and
> consequently your argument is FUD. The big thing we have not learned is
> cooperation. We do not cooperate. We do not have standard RSS feeds for
> the changes to the items that belong to a specific source. We are happy to
> get data, but we do not reach out and give back. For me, the fact that VIAF
> links to Wikidata is an opportunity to do better. The cooperation with the
> German DNB is the kind of project we should emulate.
>
> When you talk about quality, you talk in an insular fashion: we have to do
> it, our community. At Wikidata, our community can include other
> organisations with rich collections of high-quality data. We can share,
> compare, curate. Even with our current low quality, we have subsets of data
> that shine, subsets of data that are of at least the same quality as
> Wikipedia. However, this quality is often marred by a lack of quantity,
> quantity we can have when collaboration is what we do.
>
> You are afraid for our reputation. Reputation has many aspects. Jane023
> presented at the Dutch Wikimedia conference. She uses a tool that is easier
> on her because it produces a Wikidata-based list, so no Wikipedians bother
> her. A similar list is now used on the Welsh Wikipedia because of its
> quality. As she reported, the data is of such quality that Google actually
> uses it.
>
> When I see the religious application of Wikipedia sentiments, I find that
> we do not even care about the life of one of our own. Bassel has been
> executed or is likely to be executed soon, and some think our neutrality is
> so important. FOR WHAT? So that we may not even protect our own? Is it right
> to protest against TTIP (and we should) and not protest for a Wikipedian who
> embodies our values?
>
> Wikipedia thinking is not applicable to Wikidata at this stage. Its quality
> is arguably piss-poor, but better in places. Many items are corrupt because
> they follow the structure of Wikipedia articles, a structure Wikipedians
> insist on because they wrote that article and "Wikidata is only a service
> project".
>
> I do agree that we need more quality. My approach has set theory on its
> side and embodies the wiki approach; yours is one where Pallas Athena is
> to rise from the brain of Zeus in full armour. You may have noticed that my
> arguments are easy to follow and conform to something that is measurable.
> Yours is private; there is no way to verify the accuracy of your
> argument. I call bullshit on your argument, not because you do not make a
> fine argument, but because it is an argument that prevents us from
> improving Wikidata.
>
> My hope is that we can work constructively on our quality and have a
> measurable effect.
> Thanks,
> GerardM
>
> On 29 November 2015 at 02:05, Gnangarra <gnangarra@gmail.com> wrote:
>
> > >
> > > While I happily agree that sources are good, I will not ask people to
> > > start adding sources at this point in time; it will not improve quality
> > > significantly. It makes more sense once we are at a stage where multiple
> > > sources disagree on values for statements. Adding sources is significantly
> > > more meaningful and useful once we start curating data.
> >
> >
> > The problems will be that, by the time Wikidata starts to curate data, it
> > will have corrupted that data with its own data; and secondly, past
> > experience with wikis is that fixing data after it has been entered is
> > actually harder and more time-consuming, along with the fact that the
> > damage to reputation will have a lasting impact, and fixing that consumes
> > millions of dollars in donor money. As said earlier, there are lessons in
> > the development of Wikipedia that should be heeded in an attempt to avoid
> > those same pitfalls.
> >
> >
> > On 29 November 2015 at 08:37, Gerard Meijssen <gerard.meijssen@gmail.com> wrote:
> >
> > > Hoi,
> > > It was from the Myanmar Wikipedia that a lot of data was imported to
> > > Wikidata. Data that did not exist elsewhere. I do not care really what
> > > "Freedom House" says. I do not know them, I do know that the data is
> > > relevant and useful. It was even the subject of a blog post.
> > >
> > > You may ignore data that is not from a source that you like. This
> > > indiscriminate POV is not a NPOV.
> > >
> > > As to Grasulf, you failed to get the point. It was NOT about the data
> > > itself but about the presentation. I worked on this item because a
> > > duplicate was created with even less data.
> > >
> > > While I happily agree that sources are good, I will not ask people to
> > > start adding sources at this point in time; it will not improve quality
> > > significantly. It makes more sense once we are at a stage where multiple
> > > sources disagree on values for statements. Adding sources is significantly
> > > more meaningful and useful once we start curating data. Statistically, most
> > > errors will be found where sources disagree.
> > >
> > > When people add conflicting data, it is indeed really relevant to add
> > > sources. My practice for adding data is that I will only add data that
> > > fulfils some minimal criteria. Typically I am not interested in adding data
> > > that already exists. I will replace less precise data with more precise data.
> > >
> > > The biggest issue with data is that we do not have enough of it and the
> > > second most relevant issue is that we need processes to compare sources
> > > with Wikidata and have a workflow to curate differences.
> > > Thanks,
> > > GerardM
> > >
> > > On 28 November 2015 at 19:23, Andreas Kolbe <jayen466@gmail.com> wrote:
> > >
> > > > Gerard,
> > > >
> > > >
> > > > On Fri, Nov 27, 2015, Gerard Meijssen <gerard.meijssen@gmail.com> wrote:
> > > >
> > > > > When you compare the quality of Wikipedias with what en.wp used to be you
> > > > > are comparing apples and oranges. The Myanmar Wikipedia is better
> > > > > informed on Myanmar than en.wp etc.
> > > > >
> > > >
> > > >
> > > > Is it? The entire Burmese Wikipedia contains a mere 31,646 content pages
> > > > at the time of writing, covering (or trying to cover) all countries of the
> > > > world, and all aspects of human knowledge.[1]
> > > >
> > > > The English Wikipedia's WikiProject Myanmar, meanwhile, has 6,713 pages
> > > > within its purview.[2] I dare say that's more articles on Myanmar than the
> > > > Burmese Wikipedia contains. As an indication, the English Wikipedia's
> > > > article on Myanmar is more than twice as long as the one in the Burmese
> > > > Wikipedia.
> > > >
> > > > Moreover, according to Freedom House[3], the internet in Myanmar is not
> > > > free:
> > > >
> > > > "The government detained and charged internet users for online activities
> > > > [...] Government officials pressured social media users not to distribute
> > > > or share content that offends the military, or disturbs the functions of
> > > > government."
> > > >
> > > >
> > > >
> > > > > When you qualify a Wikipedia as fascist, it does not follow that the
> > > > > data is suspect. Certainly when data in a source that you so easily
> > > > > dismiss is typically the same, there is not much meaning in what you say
> > > > > from a Wikidata point of view.
> > > > >
> > > >
> > > >
> > > > Data are always generated within a social context, and data generated by
> > > > political extremists or people living under oppressive regimes are suspect
> > > > whenever they have political implications. (Looking at the descriptions of
> > > > Burmese politics, my feeling is the Burmese Wikipedia is not under
> > > > significant government control, but largely written by ex-pats. However,
> > > > the situation is quite different in some other Wikipedias serving countries
> > > > labouring under similar regimes.)
> > > >
> > > >
> > > >
> > > > > PS What does your librarian think when she knows
> > > >
> > > >
> > > >
> > > > It was a he, but I'll leave him to join in himself if he chooses to.
> > > >
> > > >
> > > > > I happen to work on Dukes of Friuli. Compare the data from Wikidata and
> > > > > the information by Reasonator based on the same item for one of them.
> > > > >
> > > > > https://tools.wmflabs.org/reasonator/?&q=2471519
> > > > > https://www.wikidata.org/wiki/Q2471519
> > > > >
> > > >
> > > >
> > > > Let's look at this example. Reasonator says of Grasulf II of Friuli, "He
> > > > died in 653". There is no source. Wikidata says he died in 653, and the
> > > > indicated source is the Italian Wikipedia.
> > > >
> > > > However, when you look at the (very brief) Italian Wikipedia article[4],
> > > > you will find that the year 653 is given with a question mark. The English
> > > > Wikipedia, in contrast, states, in its similarly brief article[5],
> > > >
> > > > "Nothing more is known about Grasulf and the date of his death is
> > > > uncertain."
> > > >
> > > > Do you now see the problem about nuance? Reasonator and Wikidata
> > > > confidently proclaim as uncontested fact something that in fact is rather
> > > > uncertain.
> > > >
> > > > The sole source cited by both the English and the Italian Wikipedia is the
> > > > Historia Langobardorum, available in Wikisource.[6] My Latin is a bit
> > > > rusty, but while the Historia mentions that Ago succeeded Grasulf upon the
> > > > latter's death, it says nothing specific about when that was. The
> > > > Historia's time indications are in general very vague, usually limited to
> > > > the phrase "Circa haec tempora", meaning "about this time". So it is in
> > > > this case.
> > > >
> > > > For reference, the Google Knowledge Graph states equally confidently that
> > > > Grasulf II of Friuli died in 651 AD. This may be based on the English
> > > > Wikipedia's unsourced claim (in the template at the bottom of the English
> > > > Wikipedia article) that his reign ended c. 651, or on some other source
> > > > like Freebase.
> > > >
> > > > The other Wikipedias that have articles on Grasulf II provide the
> > > > following death dates:
> > > >
> > > > Catalan: 651
> > > > Galician: 653
> > > > Lithuanian: 653
> > > > Polish: 651
> > > > Romanian: Unknown
> > > > Russian: 653
> > > > Ukrainian: 651
> > > >
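As a rough illustration of the remark earlier in the thread that errors are most likely where sources disagree, the death dates listed above can be tallied. This is a minimal sketch: the `tally` helper is hypothetical, and treating "Unknown" as a missing value is an assumption. An even three-to-three split is the situation described here, where a single unqualified value would hide the disagreement.

```python
from collections import Counter

# Death years for Grasulf II as listed above; None stands for "Unknown".
reported = {
    "Catalan": 651,
    "Galician": 653,
    "Lithuanian": 653,
    "Polish": 651,
    "Romanian": None,
    "Russian": 653,
    "Ukrainian": 651,
}


def tally(values):
    """Count each distinct known value; the sources agree only if at most one exists."""
    counts = Counter(v for v in values if v is not None)
    return counts, len(counts) <= 1


counts, agree = tally(reported.values())
print(sorted(counts.items()))  # [(651, 3), (653, 3)]
print(agree)                   # False
```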
> > > > As for published sources, I can offer Ersch's Allgemeine Encyclopädie
> > > > (1849), which states on page 209 that Grasulf II died in 651.[7]
> > > >
> > > > The extreme vagueness of the available dates is pointed out by Thomas
> > > > Hodgkin in Vol. 7 of "Italy and Her Invaders" (1895). Hodgkin puts the end
> > > > of Grasulf's reign at 645, "as a mere random guess", and adds that "De
> > > > Rubeis, following Sigonius", puts the accession of Ago in 661.[8]
> > > >
> > > > There may well be better and more recent sources beyond my reach, but
> > > > having these published dates in Wikidata, with the source references,
> > > > would actually make some sense. Unsourced data, not so much.
> > > >
> > > > Answers are comfortable, but they are not knowledge when they are
> > > > unverifiable and/or wrong.
> > > >
> > > >
> > > > [1] https://meta.wikimedia.org/wiki/List_of_Wikipedias#10_000.2B_articles
> > > > [2] https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Myanmar_(Burma)/Assessment
> > > > [3] https://freedomhouse.org/report/freedom-net/2015/myanmar
> > > > [4] https://it.wikipedia.org/w/index.php?title=Grasulfo_II_del_Friuli&oldid=76641444
> > > > [5] https://en.wikipedia.org/w/index.php?title=Grasulf_II_of_Friuli&oldid=633223880
> > > > [6] https://la.wikisource.org/wiki/Historia_Langobardorum/Liber_IV
> > > > [7] https://books.google.co.uk/books?id=FzxYAAAAYAAJ&pg=PA209&dq=grasulf+friuli+651%7C653&hl=en&sa=X&ved=0ahUKEwiNh5Tz0rPJAhUIChoKHV6lDTYQ6AEILzAC#v=onepage&q=grasulf%20friuli%20651%7C653&f=false
> > > > [8] https://books.google.co.uk/books?id=8ToOAwAAQBAJ&dq=grasulf+friuli+651%7C653&q=Grasulf+%22mere+random+guess%22#v=snippet&q=Grasulf%20%22mere%20random%20guess%22&f=false
> >
> >
> >
> > --
> > GN.
> > President Wikimedia Australia
> > WMAU: http://www.wikimedia.org.au/wiki/User:Gnangarra
> > Photo Gallery: http://gnangarra.redbubble.com
Re: [Wikimedia-l] Quality issues
Hoi,
If anything it proves that you did not understand. Happy that you
appreciate what you finally see.
Thanks,
GerardM

On 29 November 2015 at 03:38, Andreas Kolbe <jayen466@gmail.com> wrote:

> On Sun, Nov 29, 2015 at 12:37 AM, Gerard Meijssen <gerard.meijssen@gmail.com> wrote:
>
> > As to Grasulf, you failed to get the point. It was NOT about the data
> > itself but about the presentation.
> >
>
>
> QED. :)
Re: [Wikimedia-l] Quality issues
On 29/11/2015 09:42, Gerard Meijssen wrote:
> Hoi,
> Wikidata is a wiki, and you always seem to forget that.
>
> The corruption of data... how? Each statement is its own data item; how do
> you corrupt that? As I say so often, when you get a collection that is 80%
> correct, you have an error rate of 20%.

Surely this isn't some exam paper where you get an 80% passing mark.
What you have is a basket of eggs ... 20% of which are poisonous.


Re: [Wikimedia-l] Quality issues
Hoi,
More FUD. Poisonous how?
Thanks,
GerardM

On 29 November 2015 at 11:33, Lilburne <lilburne@tygers-of-wrath.net> wrote:

> On 29/11/2015 09:42, Gerard Meijssen wrote:
>
>> Hoi,
>> Wikidata is a wiki, and you always seem to forget that.
>>
>> The corruption of data... how? Each statement is its own data item; how do
>> you corrupt that? As I say so often, when you get a collection that is 80%
>> correct, you have an error rate of 20%.
>
> Surely this isn't some exam paper where you get an 80% passing mark.
> What you have is a basket of eggs ... 20% of which are poisonous.
>
>
>
Re: [Wikimedia-l] Quality issues
On 29/11/2015 00:37, Gerard Meijssen wrote:
> Hoi,
> It was from the Myanmar Wikipedia that a lot of data was imported to
> Wikidata. Data that did not exist elsewhere. I do not care really what
> "Freedom House" says. I do not know them, I do know that the data is
> relevant and useful. It was even the subject of a blog post.
>
> You may ignore data that is not from a source that you like. This
> indiscriminate POV is not a NPOV.
>

Isn't the point that the data was taken primarily because it was available,
and that there was no attempt to verify its accuracy? If I give you 10,000
images of lichen, but beforehand randomly switch the names of 2,000 of them
and add misleading geodata to another 2,000, are the images useful as data?
Would including them improve NPOV?


Re: [Wikimedia-l] Quality issues
Hoi,
When you do that, all your data is removed and you are banned from Wikidata.
Thanks,
GerardM

On 29 November 2015 at 11:40, Lilburne <lilburne@tygers-of-wrath.net> wrote:

> On 29/11/2015 00:37, Gerard Meijssen wrote:
>
>> Hoi,
>> It was from the Myanmar Wikipedia that a lot of data was imported to
>> Wikidata. Data that did not exist elsewhere. I do not care really what
>> "Freedom House" says. I do not know them, I do know that the data is
>> relevant and useful. It was even the subject of a blog post.
>>
>> You may ignore data that is not from a source that you like. This
>> indiscriminate POV is not a NPOV.
>>
>>
> Isn't the point that the data was taken primarily because it was available,
> and that there was no attempt to verify its accuracy? If I give you 10,000
> images of lichen, but beforehand randomly switch the names of 2,000 of them
> and add misleading geodata to another 2,000, are the images useful as data?
> Would including them improve NPOV?
>
>
>
Re: [Wikimedia-l] Quality issues
On 28 November 2015 at 19:17, Ed Erhart <the.ed17@gmail.com> wrote:

> On the very specific point of knowledge and how it's not always possible to
> boil it down to a single quantifiable value, I couldn't agree more. Thank
> you, Andreas, for the detailed anecdote displaying that problem, and I'll
> be happy to provide more if needed.
>
> Does Wikidata have a way of marking data entries as estimates, or at least
> dates as circa (not just unknown)?
>
>
Yes, https://www.wikidata.org/wiki/Property:P1317; however, a quick comparison
between the English Wikipedia and Wikidata suggests it isn't used very much.

Of course, there are a bunch of other issues. It gives dates for Egyptian
pharaohs without saying which chronology it is using. It keeps claiming
dates are Gregorian without showing that any conversion has actually taken
place (Wikipedians tend to be pretty poor at such conversions, since they
require a fair bit of background knowledge. For example, depending on the
year and the writer, the year in England can start on 1 January, 25 March
or the first day of Advent).
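The two conversions mentioned above can be sketched with integer arithmetic. This minimal sketch uses the standard Julian Day Number formulas for the proleptic Julian and Gregorian calendars and handles only the 25 March (Lady Day) year start, not the Advent one; the function names are illustrative. The example is the execution of Charles I, recorded in England as 30 January 1648 Old Style.

```python
def julian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian-calendar date (year starting 1 January) -> Julian Day Number."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def jdn_to_gregorian(jdn: int) -> tuple:
    """Julian Day Number -> (year, month, day) in the Gregorian calendar."""
    a = jdn + 32044
    b = (4 * a + 3) // 146097
    c = a - 146097 * b // 4
    d = (4 * c + 3) // 1461
    e = c - 1461 * d // 4
    m = (5 * e + 2) // 153
    day = e - (153 * m + 2) // 5 + 1
    month = m + 3 - 12 * (m // 10)
    year = 100 * b + d - 4800 + m // 10
    return year, month, day


def lady_day_to_january_year(year: int, month: int, day: int) -> int:
    """Year number for a year starting 25 March -> year number starting 1 January."""
    # 1 January to 24 March still belonged to the previous Lady Day year.
    if month < 3 or (month == 3 and day < 25):
        return year + 1
    return year


# Charles I's execution, recorded as 30 January 1648 (Julian, Lady Day year).
year = lady_day_to_january_year(1648, 1, 30)         # 1649
print(jdn_to_gregorian(julian_to_jdn(year, 1, 30)))  # (1649, 2, 9)
```

Doing the year-start normalisation before the calendar conversion matters: skipping it would put the date a full year off, on top of the ten-day Julian/Gregorian gap.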

Wikidata doesn't do very well on carbon dating either. If we look at Ötzi
https://www.wikidata.org/wiki/Q171291

We again get dates with no indication of the calibration used. Really, this
would be better handled by using the uncalibrated C14 numbers (4550 ± 27 BP,
http://digitalcommons.library.arizona.edu/objectviewer?o=http%3A%2F%2Fradiocarbon.library.arizona.edu%2FVolume36%2FNumber2%2Fazu_radiocarbon_v36_n2_247_250_v.pdf
) and then adding enough information for the correct calibration curve to
be selected (Northern Hemisphere, land-based, which at the moment probably
means IntCal13).
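Storing the raw measurement plus enough context to choose a calibration curve amounts to a small record type. This is a hypothetical sketch, not Wikidata's data model: the field names are invented, the curve choice is simplified to the 2013 curve family (IntCal13, SHCal13, Marine13), and no actual calibration is performed, since that needs the curve data itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RadiocarbonDate:
    """An uncalibrated radiocarbon age plus the context needed to calibrate it."""

    age_bp: int            # conventional radiocarbon age, years before 1950
    error: int             # one-sigma uncertainty, in radiocarbon years
    hemisphere: str        # "north" or "south"
    marine: bool = False   # marine samples need a marine curve instead

    def __post_init__(self) -> None:
        if self.hemisphere not in ("north", "south"):
            raise ValueError(f"unknown hemisphere: {self.hemisphere!r}")

    def calibration_curve(self) -> str:
        """Name of the curve a calibration program should use (simplified)."""
        if self.marine:
            return "Marine13"
        return "IntCal13" if self.hemisphere == "north" else "SHCal13"


oetzi = RadiocarbonDate(age_bp=4550, error=27, hemisphere="north")
print(f"{oetzi.age_bp} +/- {oetzi.error} BP, calibrate with {oetzi.calibration_curve()}")
# 4550 +/- 27 BP, calibrate with IntCal13
```

Because the raw determination is kept, the same record could be recalibrated when a newer curve replaces IntCal13.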

--
geni
Re: [Wikimedia-l] Quality issues
Gergo,


On Sun, Nov 29, 2015 at 12:36 AM, Gergo Tisza <gtisza@wikimedia.org> wrote:

> By the same logic, to the extent Wikipedia takes its facts from non-free
> external source, its free license would be a copyright violation. Luckily
> for us, that's not how copyright works.



I'm aware that facts are not copyrightable. By the same logic, Wikidata
being offered under a CC BY-SA license, say, would not prevent anyone from
extracting facts -- knowledge -- from it, and it would enable Wikidata to
import a lot of data it presently cannot, because of licence
incompatibilities.



> Statements of facts can not be
> copyrighted; large-scale arrangements of facts (ie. a full database)
> probably can, but CC does not prevent others from using them without
> attribution, just distributing them (again, it's like the GPL/Affero
> difference);



Distribution is the issue here – large-scale distribution and viral
propagation of data with a well-documented potential for manipulation and
error, in a way that makes the provenance of these data a closed book to
the end user.

Do you accept that this is a potential problem, and if so, how would you
guard against it, if not through the licence?



> there are sui generis database rights in some countries but
> not in the USA where both Wikipedia and most proprietary
> reusers/competitors are located, so relying on neighbouring rights would
> not help there but cause legal uncertainty for reusers (e.g. OSM which has
> lots of legal trouble importing coordinates due to being EU-based).
>


It seems noteworthy that Freebase specifically said, with regard to loading
structured data, "If a data source is under CC-BY, you can load it into
Freebase as long as you provide attribution."[1]

Wikidata practice seems to have taken a different path regarding licence
compatibility, given its systematic imports from Wikipedia.
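The one-way licence flow at issue here (Denny's 2012 comment, quoted below, puts it as "a CC0 source can be used by a Share-Alike project ... But not the other way around") can be modelled as a lookup table. This is a deliberately simplified sketch covering only CC0, CC BY and CC BY-SA for copyrightable material; it ignores licence versions and the open questions raised in this thread about facts and database rights. `ALLOWED_TARGETS` and `can_import` are illustrative names.

```python
# Simplified model: may copyrightable material under `source` be imported into a
# project distributed under `target`? Only the three licences discussed here.
ALLOWED_TARGETS = {
    "CC0": {"CC0", "CC BY", "CC BY-SA"},  # no conditions to carry along
    "CC BY": {"CC BY", "CC BY-SA"},       # attribution must be kept
    "CC BY-SA": {"CC BY-SA"},             # share-alike: adaptations stay BY-SA
}


def can_import(source: str, target: str) -> bool:
    """True if the simplified table allows the source -> target flow."""
    return target in ALLOWED_TARGETS.get(source, set())


for src in ("CC0", "CC BY", "CC BY-SA"):
    print(src, "-> CC0:", can_import(src, "CC0"))
# CC0 -> CC0: True
# CC BY -> CC0: False
# CC BY-SA -> CC0: False
```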

Interestingly enough, it's been pointed out to me that Denny said in 2012:[2]

---o0o---

Alexrk2, it is true that Wikidata under CC0 would not be allowed to import
content from a Share-Alike data source. Wikidata does not plan to extract
content out of Wikipedia at all. Wikidata will *provide* data that can be
reused in the Wikipedias. And a CC0 source can be used by a Share-Alike
project, be it either Wikipedia or OSM. But not the other way around. Do we
agree on this understanding? --Denny Vrandečić (WMDE)
<https://meta.wikimedia.org/wiki/User:Denny_Vrande%C4%8Di%C4%87_(WMDE)> (
talk
<https://meta.wikimedia.org/wiki/User_talk:Denny_Vrande%C4%8Di%C4%87_(WMDE)>)
12:39, 4 July 2012 (UTC)

---o0o---

The key sentence here is "Wikidata does not plan to extract content out of
Wikipedia at all."

That doesn't seem to be how things have turned out, because today we have
people on Wikidata raising alarms about mass imports from Wikipedia:[3]

---o0o---

Reliable Bot imports from wikipedias?

In a Wikipedia discussion I came by chance across a link to the following
discussion:

- Wikidata:Project_chat/Archive/2015/10#STOP_with_bot_import
<https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2015/10#STOP_with_bot_import>

[...] To provide an outside perspective as Wikipedian (and a potential
use[r] of WD in the future). I wholeheartedly agree with Snipre, in fact
"bots [ar]e running wild" and the uncontrolled import of data/information
from Wikipedias is one of the main reasons for some Wikipedias developing
an increasingly hostile attitude towards WD and its usage in Wikipedias.
*If* WD is ever to function as a central data storage for various Wikimedia
projects and in particular Wikipedia as well (in analogy to Commons), *then*
quality has to take the driver's seat over quantity. A central storage
needs a much better data integrity than the projects using it, because one
mistake in its data will multiply throughout the projects relying on WD,
which may cause all sorts of problems. For crude comparison think of a
virus placed on a central server [rather] than on a single client. The consequences
are much more severe and nobody in their right mind would run the server
with even less protection/restrictions than the client.

Another thin[g] is, that if you envision users of other Wikimedia projects
such as Wikipedia or even 3rd party external projects to eventually help
with data maintenance when they start using WD, then you might find them
rather unwilling to do so, if not enough attention is paid to quality,
instead they probably just dump WD from their projects.

In general all the advantages of the central data storage depend on the
quality (reliability) of data. If that is not given to reasonable high
degree, there is no point to have central data storage at all. All the
great application become useless if they operate on false data.--Kmhkmh
<https://www.wikidata.org/w/index.php?title=User:Kmhkmh&action=edit&redlink=1>
(talk <https://www.wikidata.org/wiki/User_talk:Kmhkmh>) 12:00, 19 November
2015 (UTC)

---o0o---

(I was unaware of that post by Kmhkmh when I started contributing to this
discussion, but it obviously echoes some of my own concerns.)

I've been told on the German Wikipedia that the Wikidata CC0 licence has
long been a controversial issue, subject to recurrent discussion,
especially with regard to official population statistics in Europe, whose
publishers often require attribution, making their wholesale import into
Wikidata's CC0 environment problematic.[4]

In reviewing these discussions, I couldn't help but be reminded of
Flickrwashing schemes by some contributors' lines of thought: how -- via
which intermediary steps -- can we get the info into our CC0 project
without being seen to fall foul of the original publishers' licenses?

As I understand it, the intent is to bully other data publishers into
making their data available under CC0 as well. I understand this from an
open-content perspective, and I can see how it might benefit Google's and
other information platforms' bottom line, but I reiterate -- there are
very, very significant downsides to having a central database subject to
anonymous manipulation by all comers whose data is automatically propagated
by major search engines. There are many autocratic regimes in the world
today who spend a lot of money and effort to achieve this kind of uniform
media response in their countries.

In my opinion, it creates a significant vulnerability in the global
information infrastructure. If, in more troubled times ahead, people are
fed the same unattributed lie by all major online outlets, because they are
all automatically propagating the content of Wikimedia's CC0 database, then
this could potentially alter the course of history, and not in a good way.

I am happy to hear ideas about how to address this that do not involve
licensing. We need more transparency about data provenance.

You may argue that Wikidata is still in its early days, and has nowhere
near the amount of data, nowhere near the reach and impact today to justify
such an effort. Maybe it never will, and I'm worrying for nothing.

But we thought much the same about Wikipedia around the time of the
Seigenthaler incident. Before we knew it, Wikipedia had become the world's
dominant information resource, with increasing numbers of government
officials, judges, journalists and academics happy to accept its word
uncritically – in a way that horrifies most Wikipedians, who are well aware
of the system's weaknesses.

Last month for example the Wikipedian in Residence at NIOSH (National
Institute for Occupational Safety and Health) said on Wikidata that he
would "cringe" at the thought of using Wikipedia as a source and personally
refrained from it:[5]

---o0o---


- As a note, I do semi-automated edits on my work account
<https://www.wikidata.org/wiki/User:James_Hare_(NIOSH)>, and I plan on
doing some as a volunteer as well. I don't use Wikipedia as a source (as
a Wikipedian of 11 years, I cringe at the thought ;), but if any batch
edits I do manage to screw something up despite my meticulous planning,
please let me know immediately. I will take responsibility for my own
messes. Harej <https://www.wikidata.org/wiki/User:Harej> (talk
<https://www.wikidata.org/wiki/User_talk:Harej>) 17:38, 27 October 2015
(UTC)


---o0o---

If Wikidata were to acquire the global reach its makers and sponsors hope
for, then we would have done well to build a robust system that minimises
harm, and cannot become a victim of its own success. I propose that there
is work to be done here.

Coming back briefly to the legal licensing situation, it seems to be fairly
complex even in the US, according to the relevant Wikilegal page on
Meta[6], with much depending on the amount of material extracted, as you
pointed out above.

Things are more complicated still in the EU, given that European law
protects databases created by EU citizens or residents (which includes a
good number of Wikimedians), with that protection extending to "sweat of
the brow" (unprotected in the US). EU law even prohibits the "repeated and
systematic extraction" of "insubstantial parts of the contents" of a
database (where the term "database" is defined broadly enough to include a
Wikipedia).

There's not much point in my saying more about the legal aspects of
licensing; even the advice from the Foundation's legal professionals says
it's rarely easy to predict how a court might rule under either EU or US
law.[6]

Andreas

[1] http://wiki.freebase.com/wiki/License_compatibility
[2] https://meta.wikimedia.org/wiki/Talk:Wikidata#Is_CC_the_right_license_for_data.3F
[3] https://www.wikidata.org/wiki/Wikidata:Project_chat#Reliable_Bot_imports_from_wikipedias.3F
[4] https://www.mail-archive.com/wikidata-l@lists.wikimedia.org/msg00178.html
    https://www.wikidata.org/w/index.php?title=Wikidata:General_disclaimer&diff=271182&oldid=270466
    http://osdir.com/ml/general/2012-11/msg31088.html
    http://www.mail-archive.com/wikidata-l@lists.wikimedia.org/msg03088.html
    https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2014/04#Modifying_license_.3F
    https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2014/04#Data_release_email_templates
    https://www.wikidata.org/wiki/Wikidata:Contact_the_development_team/Archive/2014/04#License
    http://www.gossamer-threads.com/lists/wiki/foundation/450291#450291
    https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2014/05#Population_statistics.3F
    https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2014/04#Data_owners
[5] https://www.wikidata.org/w/index.php?title=Wikidata:Project_chat&diff=prev&oldid=263358509
[6] https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights
Re: [Wikimedia-l] Quality issues
Hoi,
It would be a gross violation of trust to bring Wikidata under a different
license. When an external source is willing to share its data, it can do
so. With explicit agreement we can copy data in from them in this way. Even
when this is not possible for whatever reason, we can still contribute
because we can compare data and on the basis of differences in existing
data curate our data and enable them to share our findings.

I am amused by your fear of manipulation. Yes, data can be manipulated, but
once we see it happen, we can take measures when it affects the data we
hold. Provenance of data is at this stage something we at Wikidata wish
for. Arguably it does not make sense to make it a priority for all of our
data because it would stifle Wikidata and it is utterly against the wiki
spirit.

The best way to guard against manipulation is to cooperate widely and take
any difference in data seriously. Where data differ, we want to know why
the differences exist. Focusing on known issues helps us identify systemic
issues, and when we do, we can expose such manipulation with proof. In this
way we are using a SMART methodology. No, I would never use the license as
a weapon; that is how manipulation is justified.

Importing data from Wikipedia is a sensible thing to do. Its data is
relatively well known for its quality. It has its issues but its basis is
NPOV. When people are alarmed about importing from Wikipedia, it tells us
more of what they think of the quality of Wikipedia than of the quality of
Wikidata. When people are alarmed because they cannot control it, ask
yourself what their problem is, and how their arguments fit with the notion
of Wikidata as a wiki. When imported data is wrong, there are tools to
remove content quite delicately. So identify an issue and it can be dealt
with.

If you argue that Wikidata cannot be used as central storage, fine, do
not use it. In the meantime, specific sets of data are of higher quality
than in any Wikipedia. This is a proven fact. The question whether Wikidata
is useful as a central data repository at this time can only be answered
NO when it is about all of Wikidata. When it is about specific subsets of
data, the answer is clearly yes. It is also obvious that as time goes on,
more subsets of data will be of higher quality than in any Wikipedia
(when thinking in terms of sets of data; there will always be items where
a Wikipedia has an edge).

FYI, I am in contact with a German university that is likely to use Wikidata
internally for its research data. It needs Reasonator-type functionality
to make it useful. It wants to share its data with Wikidata and wants two-way
RSS feeds in order to include new information.

When we set up cooperation with statistical offices, we CAN attribute
easily by having bots import data on their behalf using THEIR user ID and
adding sources to the new data. We can also provide data from their website
in applications. It is not the license that means anything; it is what we
agree to do. When we have sourced data in this way, you are silly to change
it. False attributions are not permitted under any license.

If we are afraid of a Seigenthaler-type event based on Wikidata,
rest assured there is plenty wrong in either Wikipedia or Wikidata that
makes it possible for it to happen. The most important thing is to deal
with it responsibly. Just being afraid will not help us in any way. Yes, we
need quality and quantity. As long as we make a best effort to improve our
data, we will do well.

As to the Wikipedian in residence, that is his opinion. At the same time,
the article on Ebola has been very important. It may not be science, but it
is certainly encyclopaedic. This Wikipedian in residence is also
involved and makes a positive contribution, and while he may make mistakes, he
is part of the solution.

I am happy that you propose that work is to be done. What have you done, and,
more importantly, what are you going to do? For me there is "Number of edits:
2,088,923" <https://www.wikidata.org/wiki/Special:Contributions/GerardM>
Thanks,
GerardM




On 29 November 2015 at 15:10, Andreas Kolbe <jayen466@gmail.com> wrote:

Re: [Wikimedia-l] Quality issues
Simply put: if I have a litre of sewage and add 100ml of pure water to it,
I still have sewage. Conversely, if I have a litre of pure water and
pour 100ml of sewage into it, then what do I have?

If 2 out of 10 bank statements are erroneous, is that OK because
8 are accurate?

What if 2 out of every 10 gas stations delivered gasoline from the diesel
pump?


On 29/11/2015 10:38, Gerard Meijssen wrote:
> Hoi,
> More FUD. Poisonous how?
> Thanks,
> GerardM
>
> On 29 November 2015 at 11:33, Lilburne <lilburne@tygers-of-wrath.net
> <mailto:lilburne@tygers-of-wrath.net>> wrote:
>
> On 29/11/2015 09:42, Gerard Meijssen wrote:
>
> Hoi, Wikidata is a wiki and, you seem to always forget that.
> The corruption of data .. how? Each statement is its own data item;
> how do you corrupt that? As I say so often, when you get a collection
> that is 80% correct you have an error rate of 20%.
>
> Surely this isn't some exam paper where you get an 80% passing mark.
> What you have is a basket of eggs ... 20% of which are poisonous.

Re: [Wikimedia-l] Quality issues
Then you've not understood the point, have you? Whether it is freely
available ought to be the first stage of a process that verifies the
accuracy of the data.

On 29/11/2015 10:42, Gerard Meijssen wrote:
> Hoi,
> When you do that all your data is removed and you are banned from
> Wikidata.
> Thanks,
> GerardM
>
> On 29 November 2015 at 11:40, Lilburne <lilburne@tygers-of-wrath.net
> <mailto:lilburne@tygers-of-wrath.net>> wrote:
>
> On 29/11/2015 00:37, Gerard Meijssen wrote:
>
> Hoi,
> It was from the Myanmar Wikipedia that a lot of data was imported to
> Wikidata. Data that did not exist elsewhere. I do not care really what
> "Freedom House" says. I do not know them, I do know that the data is
> relevant and useful. It was even the subject of a blogpost.
>
> You may ignore data that is not from a source that you like. This
> indiscriminate POV is not a NPOV.
>
>
> Isn't the point that the data was taken primarily because it was
> available, and that there was no attempt to verify its accuracy? If I
> give you 10,000 images of lichen but beforehand randomly switch the names
> of 2,000 of them and randomly add misleading geodata to another 2,000,
> are the images useful as data? Would including them improve NPOV?

Re: [Wikimedia-l] Quality issues
Hoi,
In the Netherlands, water is an essential part of our country. It is a
friend and it is an enemy. Where I live, we are 3 meters below sea level.
The Rhine flows down after taking in all the effluent from Germany and
Switzerland. We do swim in the Rhine; it is clean enough for the WWF to
release sturgeons so that they may come back and breed in the future. We
use its water and process it into drinking water.

Yes, shit happens and we can deal with that.

Your attempt at rhetorical questions is suspect because you do not see that
quality / purity in water is, as with Wikidata, a process. It is not achieved
by doing it once. It is done by doing it again and again. That is why the
water is clean, and iteratively we will improve what we take in and bring it
to a quality that we will not get in any other way.
Thanks,
GerardM

On 29 November 2015 at 19:11, Lilburne <lilburne@tygers-of-wrath.net> wrote:

Re: [Wikimedia-l] Quality issues
Article by Mark Graham in Slate, Nov. 30, 2015:

Why Does Google Say Jerusalem Is the Capital of Israel?
It has to do with the fact that the Web is now optimized for machines, not
people.

http://www.slate.com/articles/technology/future_tense/2015/11/why_does_google_say_jerusalem_is_the_capital_of_israel.html

Excerpt:

[...] because of the ease of separating content from containers, the
provenance of data is often obscured. Contexts are stripped away, and
sources vanish into Google’s black box. For instance, most of the
information in Google’s infoboxes on cities doesn’t tell us where the data
is sourced from.

Second, because of the stripping away of context, it can be challenging to
represent important nuance. In the case of Jerusalem, the issue is less
that particular viewpoints about the city’s status as a capital are true or
false, but rather that there can be multiple truths, all of which are hard
to fold into a single database entry. Finally, it’s difficult for users to
challenge or contest representations that they deem to be unfair. Wikidata
is, and Freebase used to be, built on user-generated content, but those
users tend to be a highly specialized group—it’s not easy for lay users to
participate in those platforms. And those platforms often aren’t the place
in which their data is ultimately displayed, making it hard for some users
to find them. Furthermore, because Google’s Knowledge Base is so opaque
about where it pulls its information from, it is often unclear if those
sites are even the origins of data in the first place.

Jerusalem is just one example among many in which knowledge bases are
increasingly distancing (and in some case cutting off) debate about
contested knowledges of places. [followed by more examples]

My point is not that any of these positions are right or wrong. It is
instead that the move to linked data and the semantic Web means that many
decisions about how places are represented are increasingly being made by
people and processes far from, and invisible to, people living under the
digital shadows of those very representations. Contestations are
centralized and turned into single data points that make it difficult for
local citizens to have a significant voice in the co-construction of their
own cities. [...]

Linked data and the machine-readable Web have important implications for
representation, voice, and ultimately power in cities, and we need to
ensure that we aren't seduced into codifying, categorizing, and structuring
in cases when ambiguity, not certainty, reigns.
Re: [Wikimedia-l] Quality issues
Hoi,
This thread is called "quality". There are ways to include multiple
truths. Wikidata is the data project of the Wikimedia Foundation; it is a
wiki, so when you have issues, deal with them.

I prefer to quote what John Ruskin had to say: "Quality is never an
accident. It is always the result of intelligent effort". I am more
concerned with the fact that the Linguapax Prize does not have all of its
winners. I am more concerned that half of the items of Wikidata have fewer
than three statements.

These are issues that deal with the quality of Wikidata. As Magnus has
started to produce reports on discrepancies between Mix'n'match and Wikidata,
he invites people to improve our quality. It is one way in which the quality
of our current data improves measurably.

When I blog about the Nansen Refugee Award, I report on the type of issues I
find in Wikipedia. It is easy to find fault. The point however is not that
Wikipedia is bad nor that Wikidata is good. The point is that in order to
achieve quality there is a lot of work to do.
Thanks,
GerardM

On 1 December 2015 at 12:27, Andreas Kolbe <jayen466@gmail.com> wrote:

> Article by Mark Graham in Slate, Nov. 30, 2015:
>
> Why Does Google Say Jerusalem Is the Capital of Israel?
> It has to do with the fact that the Web is now optimized for machines, not
> people.
>
>
> http://www.slate.com/articles/technology/future_tense/2015/11/why_does_google_say_jerusalem_is_the_capital_of_israel.html
>
> Excerpt:
>
> [...] because of the ease of separating content from containers, the
> provenance of data is often obscured. Contexts are stripped away, and
> sources vanish into Google’s black box. For instance, most of the
> information in Google’s infoboxes on cities doesn’t tell us where the data
> is sourced from.
>
> Second, because of the stripping away of context, it can be challenging to
> represent important nuance. In the case of Jerusalem, the issue is less
> that particular viewpoints about the city’s status as a capital are true or
> false, but rather that there can be multiple truths, all of which are hard
> to fold into a single database entry. Finally, it’s difficult for users to
> challenge or contest representations that they deem to be unfair. Wikidata
> is, and Freebase used to be, built on user-generated content, but those
> users tend to be a highly specialized group—it’s not easy for lay users to
> participate in those platforms. And those platforms often aren’t the place
> in which their data is ultimately displayed, making it hard for some users
> to find them. Furthermore, because Google’s Knowledge Base is so opaque
> about where it pulls its information from, it is often unclear if those
> sites are even the origins of data in the first place.
>
> Jerusalem is just one example among many in which knowledge bases are
> increasingly distancing (and in some case cutting off) debate about
> contested knowledges of places. [followed by more examples]
>
> My point is not that any of these positions are right or wrong. It is
> instead that the move to linked data and the semantic Web means that many
> decisions about how places are represented are increasingly being made by
> people and processes far from, and invisible to, people living under the
> digital shadows of those very representations. Contestations are
> centralized and turned into single data points that make it difficult for
> local citizens to have a significant voice in the co-construction of their
> own cities. [...]
>
> Linked data and the machine-readable Web have important implications for
> representation, voice, and ultimately power in cities, and we need to
> ensure that we aren't seduced into codifying, categorizing, and structuring
> in cases when ambiguity, not certainty, reigns.
Re: [Wikimedia-l] Quality issues [ In reply to ]
On Sun, Nov 29, 2015 at 2:55 PM, Gerard Meijssen <gerard.meijssen@gmail.com>
wrote:

> So identify an issue and it can be dealt with.
>


The fact an issue *can* be dealt with does not mean that it *will* be dealt
with.

For example, in the post that opened this discussion a little over a week
ago, you said:

"At Wikidata we often find issues with data imported from a Wikipedia.
Lists have been produced with these issues on the Wikipedia involved and
arguably they do present issues with the quality of Wikipedia or Wikidata
for that matter. So far hardly anything resulted from such outreach."

These were your own words: "hardly anything resulted from such outreach."
Wikimedia is three years into this project. If people produce lists of
quality issues, that's great, but if nothing happens as a result, that's
not so great.

An example of this is available in this very thread. Three days ago I
mentioned the issues with the Grasulf II of Friuli entries on Reasonator
and Wikidata. I didn't expect that you or anyone else would fix them, and
they haven't been, at the time of writing.

You certainly could have fixed them -- you have made hundreds of edits on
Wikidata since replying to that post of mine -- but you haven't. Adding new
data is more satisfying than sourcing and improving an obscure entry. (If
you're wondering why I didn't fix the entry myself, see the section "And to
answer the obvious question …" in last month's Signpost op-ed.[1])

This problem is replicated across the Wikimedia universe. Wikimedia
projects are run by volunteers. They work on what interests them, or
whatever they have an investment in. Fixing old errors is not as appealing
as importing 2 million items of new data (including tens or hundreds of
thousands of erroneous ones), because fixing errors is slow work. It
retards the growth of your edit count! You spend one hour researching a
date, and all you get for that effort is one lousy edit in your
contributions history. There are plenty of tasks allowing you to rack up
500 edits in 5 minutes. People seem to prefer those.

That is why Wikipedia has the familiar backlogs in areas like copyright
infringement or AfC. Even warning templates indicating bias or other
problematic content often sit for years without being addressed.

There is a systemic mismatch between data creation and data curation. There
is a lot of energy for the former, and very little energy for the latter.
That is why initiatives like the one started by WMF board member James
Heilman and others, to have the English Wikipedia's medical articles
peer-reviewed, are so important. They are small steps in the right
direction.



> When we are afraid about a Seigenthaler type of event based on Wikidata,
> rest assured there is plenty wrong in either Wikipedia or Wikidata that
> makes it possible for it to happen. The most important thing is to deal
> with it responsibly. Just being afraid will not help us in any way. Yes we
> need quality and quantity. As long as we make a best effort to improve our
> data, we will do well.
>


That's "eventualism". "Quality is terrible, but eventually it will be
great, because ... we're all trying, and it's a wiki!" To me that sounds
more like religious faith or magical thinking than empirical science.

Things being on a wiki does not guarantee quality; far from it.[2][3][4][5]



> As to the Wikipedian in residence, that is his opinion. At the same time
> the article on ebola has been very important. It may not be science but it
> certainly encyclopaedic. At the same time this Wikipedian in residence is
> involved, makes a positive contribution and while he may make mistakes he
> is part of the solution.
>
> I am happy that you propose that work is to be done. What have you done but
> more importantly what are you going to do? For me there is "Number of
> edits: 2,088,923" <https://www.wikidata.org/wiki/Special:Contributions/GerardM>
>


I will do what I can to encourage Wikimedia Foundation board members and
management to review the situation, in consultation with outside academics
like those at the Oxford Internet Institute who are concerned about present
developments, and to consider whether more stringent sourcing policies are
required for Wikidata in order to assure the quality and traceability of
data in the Wikidata corpus.

The public is the most important stakeholder in this, and should be
informed and involved. If there are quality issues, the Wikimedia
Foundation should be completely transparent about them in its public
communications, neither minimising nor exaggerating the issues. Known
problems and potential issues should be publicised as widely as possible in
order to minimise the harm to society resulting from uncritical reuse of
faulty data.

I have started to reach out to scholars and journalists, inviting them to
review this thread as well as related materials, and form their own
conclusions. I may write an op-ed about it in the Signpost, because I
believe it's an important issue that deserves wider attention and debate.

As far as my own contributions are concerned, I am more inclined to boycott
Wikidata.

Apart from all the issues discussed over the past few days, there is
another aspect to my reluctance to contribute to Wikidata.

The Knowledge Graph is a major new Google feature. It adds value to
Google's search engine results pages. It stops people from clicking through
to other sources, including Wikipedia. The recent downturn in Wikipedia
pageviews has been widely linked to the Knowledge Graph.

By ensuring that more people visit Google's ad-filled pages, and stay on
them rather than clicking through to other sites, the Knowledge Graph is at
least partly responsible for recent increases in Google's revenue, which
currently stands at around $200 million a day.[6] (Income after expenses is
about a third of that, i.e. $65 million.)

The development of Wikidata was co-funded by Google, which I understand
donated 325,000 Euros (about $345,000) to that effort.[8] A little bit of
arithmetic shows that, with Google's profits running at $65 million a day,
it takes Google less than 8 minutes to earn that amount of money. Given how
much Google stands to benefit from this development, it seems a paltry
investment.
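A quick check of the arithmetic above, using only the post's own rounded figures (about $65 million in daily profit and a donation of about $345,000). The constant names are illustrative, and the inputs are approximations, not audited numbers:

```python
# Back-of-envelope check using the post's own rounded figures.
DAILY_PROFIT_USD = 65_000_000   # "about a third" of ~$200M daily revenue
DONATION_USD = 345_000          # 325,000 EUR at the conversion given in the post
MINUTES_PER_DAY = 24 * 60

# Profit earned per minute, then how many minutes cover the donation.
profit_per_minute = DAILY_PROFIT_USD / MINUTES_PER_DAY
minutes_to_earn_donation = DONATION_USD / profit_per_minute

print(f"profit per minute: ${profit_per_minute:,.0f}")
print(f"minutes to earn the donation: {minutes_to_earn_donation:.1f}")
```

With these inputs the script prints about $45,139 per minute and 7.6 minutes, consistent with "less than 8 minutes".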

This set me thinking. If we assume that Wikipedia's and Wikidata's
contribution to Google's annual revenue via the Knowledge Graph is just
1/365 – the revenue of one day per year – the monetary value of these
projects to Google is still astronomical.

There have been around 2.5 billion edits to Wikimedia projects to date.[7]
If Google chose to give one day's revenue each year to Wikimedia
volunteers, as a thank-you, this would average out at about 200,000,000 /
2,500,000,000 = 8 cents per edit. Someone like Koavf, who's made 1.5
million edits[9], would stand to receive around $120,000 a year. Even my
paltry 50,000 edits would net me about $4,000 a year. That's the value of
free content.
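The per-edit figure follows the same back-of-envelope logic. This sketch assumes, as the post does, a purely hypothetical yearly payout equal to one day's revenue (about $200 million) spread evenly over about 2.5 billion edits; `annual_payout` is an illustrative helper, not part of any real scheme:

```python
# Hypothetical "one day's revenue per year" payout, using the post's figures.
ONE_DAY_REVENUE_USD = 200_000_000   # approx. daily revenue cited in the post
TOTAL_EDITS = 2_500_000_000         # approx. edits to Wikimedia projects

# Spread the payout evenly across every edit ever made.
per_edit_usd = ONE_DAY_REVENUE_USD / TOTAL_EDITS


def annual_payout(edit_count: int) -> float:
    """Yearly payout for a contributor with the given number of edits."""
    return edit_count * per_edit_usd


print(f"per edit: ${per_edit_usd:.2f}")
for label, edits in [("1.5 million edits", 1_500_000), ("50,000 edits", 50_000)]:
    print(f"{label}: ${annual_payout(edits):,.0f} per year")
```

This reproduces the 8 cents per edit, $120,000 and $4,000 figures quoted above.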

And that's just Google. Other major players like Facebook and Bing profit,
too.

Wikidata seems custom-made to benefit Google and Microsoft, at the expense
of Wikipedia and other sites. Given my other commitments to Wikimedia
projects, the limited number of hours in a day, and all the other concerns
mentioned in this thread, I feel little inclined at present to further
expand my volunteering in order to work for these multi-billion dollar
corporations for free.


[1] https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-10-07/Op-ed
[2] http://www.newsweek.com/2015/04/03/manipulating-wikipedia-promote-bogus-business-school-316133.html
[3] http://www.salon.com/2013/05/17/revenge_ego_and_the_corruption_of_wikipedia/
[4] https://www.washingtonpost.com/news/the-intersect/wp/2015/04/15/the-great-wikipedia-hoax/
[5] http://www.dailydot.com/politics/croatian-wikipedia-fascist-takeover-controversy-right-wing/
[6] https://investor.google.com/earnings/2015/Q2_google_earnings.html
[7] https://tools.wmflabs.org/wmcounter/
[8] https://www.wikimedia.de/wiki/Pressemitteilungen/PM_3_12_Wikidata_EN
[9] https://www.washingtonpost.com/news/the-intersect/wp/2015/07/22/you-dont-know-it-but-youre-working-for-facebook-for-free/
Re: [Wikimedia-l] Quality issues [ In reply to ]
Hoi,
<grin> I do work on quality issues. I blog about them. I work towards
implementing solutions. </grin> I have fixed quite a few errors in Wikidata
and I do not rack up as many edits as I could because of it.

In the meantime, with your "I do not want to be involved" attitude, you are
the proverbial sailor who stays on shore. It is your option to get your
hands dirty or not. However, a friend of mine mentioned this attitude and
compared it to the people who said that Wikipedia would never work. That is
fine, so I will simply move on from many of your arguments.

I do not care about profit. I have over 2 million edits on Wikidata alone,
and a few on other projects as well. Others may make a profit from them;
that is implicit in the license. The point is that as more data is freed, it
will free more data. With more free data we can inform more people. We can
share more of the sum of all available knowledge.

I wonder: there are many ways in which quality can be improved, and all you
do is refer to others. Why should I bother with your arguments when they
are not yours and when you do not show how to make a difference? My
arguments are plausible and I actively work towards getting them
implemented. I do not need to convince people to do my work. The only thing
I want to do is ask people for their support so that we sooner reach the
stage where we share in the sum of all available knowledge, something
we do not really do at this stage.
Thanks,
GerardM

Re: [Wikimedia-l] Quality issues [ In reply to ]
On Tue, Dec 1, 2015 at 4:16 PM, Gerard Meijssen <gerard.meijssen@gmail.com>
wrote:

> In the mean time with your "I do not want to be involved attitude" you are
> the proverbial sailor who stays on shore.



Well, me and 99.9999 percent of the global population. Not everyone has to
contribute to Wikidata. :)



> My arguments are plausible and I actively work towards getting them
> implemented. I do not need to convince people to do my work. The only thing
> I want to do is ask people for their support so that we get sooner to the
> stage where we will share in the sum of all available knowledge, something
> we do not really do at this stage.
>


Thanks for the spirited debate, and good luck to you, Gerard. May your
efforts be fruitful.

Andreas
Re: [Wikimedia-l] Quality issues [ In reply to ]
On 2015-12-01 12:27, Andreas Kolbe wrote:
> Article by Mark Graham in Slate, Nov. 30, 2015:
>
> Why Does Google Say Jerusalem Is the Capital of Israel?
> It has to do with the fact that the Web is now optimized for machines, not
> people.
>

> Second, because of the stripping away of context, it can be challenging to
> represent important nuance. In the case of Jerusalem, the issue is less
> that particular viewpoints about the city’s status as a capital are true or
> false, but rather that there can be multiple truths, all of which are hard
> to fold into a single database entry. Finally, it’s difficult for users to
> challenge or contest representations that they deem to be unfair. Wikidata
> is, and Freebase used to be, built on user-generated content, but those
> users tend to be a highly specialized group—it’s not easy for lay users to
> participate in those platforms. And those platforms often aren’t the place
> in which their data is ultimately displayed, making it hard for some users
> to find them. Furthermore, because Google’s Knowledge Base is so opaque
> about where it pulls its information from, it is often unclear if those
> sites are even the origins of data in the first place.
>
> Jerusalem is just one example among many in which knowledge bases are
> increasingly distancing (and in some case cutting off) debate about
> contested knowledges of places. [followed by more examples]
>

The story with Jerusalem is very simple. I created the Wikidata item.
The English description was "city in Israel". Then POV pushers came.
Some of them wanted "city in Palestine", and others wanted "capital of
Israel". Then one user, who later was elected to the board of Wikimedia
Israel, canvassed a number of users on the Hebrew Wikipedia. When there were
too many POV pushers, I just unwatched the page, and it became "capital
of Israel". Later on, someone managed to change it to something neutral.
That's it. There is nothing automatic here.

Cheers
Yaroslav

Re: [Wikimedia-l] Quality issues [ In reply to ]
Hi Yaroslav,

Thanks for the background. The "POV pushing" you describe is of course what
Graham and Ford are examining in their paper.

For what it's worth, the Wikidata item for Jerusalem[1] still contains the
statement "capital of Israel" today.

As I understand it, the Knowledge Graph uses a number of sources to "guess"
whether something is factual or not. Whether Wikidata is one of them, and
what weight it has in this process, is something I suspect no one outside
Google knows.

The op-ed I mentioned writing last week is now out as part of the current
Signpost issue.[2]

Andreas

[1] https://www.wikidata.org/wiki/Q1218
[2] https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-12-02/Op-ed

On Mon, Dec 7, 2015 at 8:29 PM, Yaroslav M. Blanter <putevod@mccme.ru>
wrote:

> The story with Jerusalem is very simple. I created the Wikidata item. The
> English description was "city in Israel". Then POV pushers came. Some of
> them wanted "city in Palestine", and others wanted "capital of Israel".
> Then one user, who later was elected to the board of Wikimedia Israel,
> canvassed a number of users in Hebrew Wikipedia. When there were too many
> POV pushers, I just unwatched the page, and it became "capital of Israel".
> Later on, someone managed to change it to smth neutral. That's it. There is
> nothing automatic here.
>
> Cheers
> Yaroslav
Re: [Wikimedia-l] Quality issues [ In reply to ]
On Mon, Dec 7, 2015 at 9:53 PM, Andreas Kolbe <jayen466@gmail.com> wrote:

> Hi Yaroslav,
>
> Thanks for the background. The "POV pushing" you describe is of course what
> Graham and Ford are examining in their paper.
>
> For what it's worth, the Wikidata item for Jerusalem[1] still contains the
> statement "capital of Israel" today.
>


Really, I do not understand the difference between this kind of problem and
Wikipedia's edit wars or conflicts.
Wikidata represents knowledge in a structured, collaborative way: both
features define it, and it seems the op-ed just doesn't like them (either
one or both).

Aubrey
_______________________________________________
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>

1 2 3 4 5 6 7  View All