Mailing List Archive

Fwd: Revision tagging: use cases needed
-------- Original Message --------
Subject: Revision tagging: use cases needed
Date: Tue, 14 Feb 2012 14:18:49 -0800
From: Dario Taraborelli <dtaraborelli@wikimedia.org>

We're getting to a point where we need to be able to flag specific
revisions as generated via specific tools. For example, if we generate
edits via AFT calls to action, we want to measure:
• their volume (compared to regular edits)
• their survival/revert rate

The same request is now emerging from the Article Creation Workflow
team, and having talked to many of you it sounds like community, mobile
and other engineering teams would benefit from the ability of saying:

"revision N was created with tool X [version Y]"
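A minimal sketch of what such a record and the two measurements above might look like. The names RevisionTag and tool_stats are hypothetical and not part of MediaWiki; the sketch assumes the lists of revision IDs and reverted revision IDs are already available from elsewhere.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class RevisionTag:
    """Hypothetical record: "revision N was created with tool X [version Y]"."""
    rev_id: int
    tool: str
    version: Optional[str] = None  # the optional "[version Y]" part


def tool_stats(
    all_rev_ids: Iterable[int],
    reverted_rev_ids: Iterable[int],
    tags: Iterable[RevisionTag],
    tool: str,
) -> Tuple[float, float]:
    """Return (share of all edits made with `tool`, revert rate of those edits)."""
    all_revs = set(all_rev_ids)
    # Only count tagged revisions that are part of the sample being measured.
    tagged = {t.rev_id for t in tags if t.tool == tool} & all_revs
    if not tagged:
        return 0.0, 0.0
    share = len(tagged) / len(all_revs)
    revert_rate = len(tagged & set(reverted_rev_ids)) / len(tagged)
    return share, revert_rate
```

For instance, tool_stats([1, 2, 3, 4], [2, 4], tags, "aft") with revisions 1 and 2 tagged as "aft" gives a volume share of 0.5 and a revert rate of 0.5.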

I started capturing some use cases on this etherpad:

http://etherpad.wikimedia.org/RevisionTags

I'd like to have your input to start building requirements and
evaluating possible solutions. Let me know off-list if you have any
questions or concerns.

Dario
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: Fwd: Revision tagging: use cases needed
change_tag table?

Seems straightforward.
The only thing is that we may not want to show some of those automatic
tags by default, so we would have to introduce a new concept of a
'hidden' tag.
There are several ways to accomplish that: a list in the configuration,
adding a new column, storing it in ct_params, or just using a convention
in the tag name for hidden ones.
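As a rough illustration of two of these options (a configured list and a naming convention), here is a sketch in Python. HIDDEN_TAGS, HIDDEN_PREFIX and the tag names are made up for the example and do not correspond to any existing MediaWiki setting.

```python
# Hypothetical configuration: tags that should never be displayed by default.
HIDDEN_TAGS = {"aft-cta"}

# Hypothetical naming convention: any tag starting with this prefix is hidden.
HIDDEN_PREFIX = "_"


def is_hidden(tag, hidden_tags=HIDDEN_TAGS, hidden_prefix=HIDDEN_PREFIX):
    """A tag is hidden if it is listed in the config or follows the convention."""
    return tag in hidden_tags or tag.startswith(hidden_prefix)


def visible_tags(tags):
    """Filter a revision's tags down to the ones that would be shown."""
    return [t for t in tags if not is_hidden(t)]
```

The configured list keeps tag names free-form but needs updating for each new tool; the prefix convention needs no configuration but constrains how tags can be named.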


Re: Fwd: Revision tagging: use cases needed
+1 to adding to a modified version of change_tag, or something like it.
While unfamiliar with the current tagging interface(s), the content of
ct_tag seems arbitrary ("possible movie studio tagger" appears 4 times in
enwiki.change_tag.ct_tag out of >2mil rows) and it probably makes sense to
keep machine tagging automatically added at the time of an edit distinct
from the apparent post-edit human/bot annotation use of ct_tag.

Re: information on which automatic tags to hide, I don't think that should
be stored with every row. Keeping that in configuration (where
configuration options may consist of patterns to match) seems more
appropriate.
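One way to read "patterns to match" is shell-style globs kept in configuration, so that nothing is stored per row. The sketch below assumes that reading; HIDDEN_TAG_PATTERNS and the example patterns are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical configuration value: glob patterns for tags hidden by default.
HIDDEN_TAG_PATTERNS = ["tool:*", "mobile-*"]


def is_hidden_tag(tag, patterns=HIDDEN_TAG_PATTERNS):
    """Return True if the tag matches any configured pattern (case-sensitive)."""
    return any(fnmatchcase(tag, pattern) for pattern in patterns)
```

With these patterns, a tag such as "tool:aft/5" would be hidden, while an existing free-form tag like "possible movie studio tagger" would still be shown.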

The primary use cases for this feature appear to be around offline analysis
and I'd like to see design take into account the possibility of this table
existing in a separate database from the revision table at some point in
the future.

-A

On Wed, Feb 15, 2012 at 10:27 AM, Platonides <Platonides@gmail.com> wrote:

> change_tag table?
>
> Seems straightforward.
> The only thing is that we may not want to show some of those automatic
> tags by default, so we would have to introduce a new concept of a
> 'hidden' tag.
> There are several ways to accomplish that: a list in the configuration,
> adding a new column, storing it in ct_params, or just using a convention
> in the tag name for hidden ones.
>
Re: Fwd: Revision tagging: use cases needed
On Wed, Feb 15, 2012 at 12:05 PM, Asher Feldman <afeldman@wikimedia.org> wrote:

> +1 to adding to a modified version of change_tag, or something like it.
> While unfamiliar with the current tagging interface(s), the content of
> ct_tag seems arbitrary ("possible movie studio tagger" appears 4 times in
> enwiki.change_tag.ct_tag out of >2mil rows) and it probably makes sense to
> keep machine tagging automatically added at the time of an edit distinct
> from the apparent post-edit human/bot annotation use of ct_tag.
>

I'm going to jump in here and explain what change_tags actually is.

In 2009, while developing the Abuse Filter, I wanted a way to mark
suspicious edits for human or bot review on the basis of abuse filter
heuristics and rules.

I ended up developing the change_tags infrastructure, hoping to use it as a
*generic* framework for marking edits in various ways. Currently you can
filter Recent Changes, Contributions and Logs by their tags, and the tags
appear on those logs, RC and contributions, generally in parentheses.

In the three years since I introduced the feature, AbuseFilter has been the
only user of that functionality, and because the community could add
arbitrary tags to filters, all the tags are currently community-added
AbuseFilter tags.

I have some hopes that we could use change_tags for things other than
AbuseFilter, but my understanding is that last time we tried this the
community felt like the infrastructure was being "intruded on". Perhaps
some modifications to the infrastructure could allow abuse filter and other
tags to coexist in the ecosystem.

—Andrew

--
Andrew Garrett
Wikimedia Foundation
agarrett@wikimedia.org
Re: Fwd: Revision tagging: use cases needed
Andrew Garrett wrote:
> I have some hopes that we could use change_tags for things other than
> AbuseFilter, but my understanding is that last time we tried this the
> community felt like the infrastructure was being "intruded on". Perhaps
> some modifications to the infrastructure could allow abuse filter and other
> tags to coexist in the ecosystem.

Uh, I think the community was mostly upset that you didn't provide any tag
management interface. At all.

It's been a while, but the last time I looked, there was no way to add tags,
remove tags, or modify tags. I think there are still a number of revisions
on the English Wikipedia that have been tagged by the AbuseFilter extension
with text like "potentially libelous addition" or other incendiary comments
of that nature, with no means of removal for false positives.

I don't know if it was intentional (and I imagine it wasn't), but your reply
about the community's feelings toward whatever infrastructure you're
referring to reads like a bit of a slap in the face. The scare quotes didn't
help.

The tagging system was poorly implemented. That's why it's been
under-utilized.

MZMcBride



Re: Fwd: Revision tagging: use cases needed
On Wed, Feb 15, 2012 at 3:24 PM, MZMcBride <z@mzmcbride.com> wrote:

> Andrew Garrett wrote:
> > I have some hopes that we could use change_tags for things other than
> > AbuseFilter, but my understanding is that last time we tried this the
> > community felt like the infrastructure was being "intruded on". Perhaps
> > some modifications to the infrastructure could allow abuse filter and other
> > tags to coexist in the ecosystem.
>
> Uh, I think the community was mostly upset that you didn't provide any tag
> management interface. At all.
>
> It's been a while, but the last time I looked, there was no way to add tags,
> remove tags, or modify tags. I think there are still a number of revisions
> on the English Wikipedia that have been tagged by the AbuseFilter extension
> with text like "potentially libelous addition" or other incendiary comments
> of that nature, with no means of removal for false positives.
>
> I don't know if it was intentional (and I imagine it wasn't), but your reply
> about the community's feelings toward whatever infrastructure you're
> referring to reads like a bit of a slap in the face. The scare quotes didn't
> help.
>
> The tagging system was poorly implemented. That's why it's been
> under-utilized.


I caught Max online to talk about this.

I want to clarify that I was talking specifically about the possibility of
using the AbuseFilter tagging interface in other extensions (for example,
to tag changes made using particular tools or features), rather than being
upset that the community had not used the tagging interface as much as I
might have hoped. There are several technical shortcomings which make other
uses of change tagging likely to "intrude" (scare quotes because I'm not
sure that I have the right word) on the current community use of the
tagging feature. Among them are my failure to secure namespacing for change
tags, and the fact that tags are displayed in some way or other after items
in logs unconditionally. The lack of a tag management interface is another,
but it's a work-intensive problem that requires design work and does not
address the idea of using tags in other software contexts, though it does
open up new (and possibly helpful) uses of change tagging to the community
as well as allowing some cleanup work to take place.
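
To make the namespacing and conditional-display points concrete, here is a small sketch. The "namespace:name" syntax, the default "abusefilter" namespace and both function names are assumptions for illustration only; nothing like this exists in change tagging today.

```python
def split_tag(tag, default_namespace="abusefilter"):
    """Split a hypothetical "namespace:name" tag; bare tags get the default namespace."""
    namespace, sep, name = tag.partition(":")
    if not sep:
        # No separator: treat it as an existing community (AbuseFilter) tag.
        return default_namespace, tag
    return namespace, name


def tags_to_display(tags, shown_namespaces=("abusefilter",)):
    """Return only the tag names whose namespace is meant to be shown in logs."""
    return [name for ns, name in map(split_tag, tags) if ns in shown_namespaces]
```

Under this scheme tool-generated tags such as "aft:cta-edit" would stay out of Recent Changes unless their namespace is explicitly shown. One caveat of the sketch is that an existing community tag that already contains a colon would be misread as namespaced.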

I also want to make sure that I reinforce my qualification on the comment
that the community felt that change tagging was being "intruded upon". It's
something that I heard somewhere and isn't intended to mean that we need to
"work on" the community. I'm intending to say that some further work is
needed to genericise the feature so that Abuse Filter and other tagging
infrastructure can coexist – my point about the community objecting wasn't
intended to imply that these hypothetical/mythological objectors were being
unreasonable.

Hope this makes more sense to you than it does to me. :-)

—Andrew

--
Andrew Garrett
Wikimedia Foundation
agarrett@wikimedia.org