Mailing List Archive

Showing bytes added/removed in each edit in "View history" and "User contributions"
Hi all.

"Recent changes" shows bytes added/removed in green/red. But "View history"
only shows revision length in bytes, and "User contributions" shows no byte
counts at all.

I think it would be nice for both "View history"[1] and "User contributions" to
show bytes added/removed. This would make it easier to distinguish
small contributions from big ones: small typo fixes from
multiple-sentence additions.

What do you think?

All the best,
-Jason

^ [1]. You can already see bytes added/removed next to history revisions by
using a gadget. Just add the following line to your vector.js:
importScript('fr:MediaWiki:Gadget-HistoryNumDiff.js');
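
For illustration, the added/removed figure is just the difference between consecutive revision sizes. A minimal sketch, assuming the sizes are already available oldest-first; size_deltas() and the sample numbers are made up here and are not MediaWiki code:

```php
<?php
// size_deltas() is a hypothetical helper, not part of MediaWiki.
// $lengths: revision sizes in bytes, oldest first.
function size_deltas( array $lengths ): array {
    $deltas = [];
    $prev = 0; // the first revision is compared against an empty page
    foreach ( $lengths as $len ) {
        // %+d forces an explicit sign, as on "Recent changes"
        $deltas[] = sprintf( '%+d', $len - $prev );
        $prev = $len;
    }
    return $deltas;
}

// Made-up sample history
print implode( ', ', size_deltas( [ 1200, 1215, 1190 ] ) ) . "\n";
// prints "+1200, +15, -25"
```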


_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: Showing bytes added/removed in each edit in "View history" and "User contributions"
On 07/28/2010 04:57 AM, Jason Spiro wrote:
> I think it would be nice for both "View history"[1] and "User contributions" to
> show bytes added/removed. This would make it easier to distinguish
> small contributions from big ones: small typo fixes from
> multiple-sentence additions.

I'm not sure we should even show byte counts by default. It must be very
confusing for newbies (especially if they don't know what a byte is).
And it clutters up the UI.
Perhaps make it optional and disabled by default? It's mostly targeted at
experienced users anyway.
If we made it optional, I don't think your proposal would be any
problem (and as a Wikipedian I'd love to have that feature!).

-- Tobias (User:Church of emacs)
Re: Showing bytes added/removed in each edit in "View history" and "User contributions"
On Wed, Jul 28, 2010 at 2:37 AM, church.of.emacs.ml
<church.of.emacs.ml@googlemail.com> wrote:
>
> On 07/28/2010 04:57 AM, Jason Spiro wrote:
>>
>> I think it would be nice for both "View history"[1] and "User contributions" to
>> show bytes added/removed.  This would make it easier to distinguish
>> small contributions from big ones:  small typo fixes from
>> multiple-sentence additions.
>
> I'm not sure we should even show byte counts by default. It must be very
> confusing for newbies (especially if they don't know what a byte is).
> And it clutters up the UI.
> Perhaps make it optional and disabled by default? It's mostly targeted at
> experienced users anyway.
> If we made it optional, I don't think your proposal would be any
> problem (and as a Wikipedian I'd love to have that feature!).
>
> -- Tobias (User:Church of emacs)

Newbies know what characters are, and the byte counts are really just
character counts. If someone wants to see page history, then they
probably also would benefit from knowing which edits are text
additions and which are text removals, no?

Has anyone ever done usability studies of newbies -- new Internet
users, experienced Internet users who are non-editors, or new editors?
Have the study conductors watched how they play with the history
tools?

Maybe you and I should each ask our moms to try the history tools and
see how they react to seeing the history screens and the byte counts
on those screens.

By the way, why does page history say "12,345 bytes" and not "12,345
characters"?

--
Jason Spiro: software/web developer, packager, trainer, IT consultant.
I support Linux, UNIX, Windows, and more. Contact me to discuss your needs.
+1 (416) 992-3445 / www.jspiro.com

Re: Showing bytes added/removed in each edit in "View history" and "User contributions"
On Mon, Aug 2, 2010 at 4:59 PM, Jason A. Spiro <jasonspiro4@gmail.com> wrote:
> Has anyone ever done usability studies of newbies -- new Internet
> users, experienced Internet users who are non-editors, or new editors?

Yep, that's what the Usability Initiative does.

>  Have the study conductors watched how they play with the history
> tools?

That I don't know. I don't know if descriptions of the Usability
Initiative's studies are all public, or what. Maybe one of them could
fill us in. My personal guess is that the best usability for newbies
would be to hide as many things as possible to make it less
intimidating.

> By the way, why does page history say "12,345 bytes" and not "12,345
> characters"?

Because it's 12,345 bytes, not 12,345 characters. :)

Re: Showing bytes added/removed in each edit in "View history" and "User contributions"
On Mon, Aug 2, 2010 at 5:18 PM, Aryeh Gregor
<Simetrical+wikilist@gmail.com> wrote:
>
> On Mon, Aug 2, 2010 at 4:59 PM, Jason A. Spiro <jasonspiro4@gmail.com> wrote:
>
>> Has anyone ever done usability studies of newbies -- new Internet
>> users, experienced Internet users who are non-editors, or new editors?
>
> Yep, that's what the Usability Initiative does.

Ah, I just took a look at their website now:
http://usability.wikimedia.org/wiki/Main_Page

>> Have the study conductors watched how they play with the history
>> tools?
>
> That I don't know.  I don't know if descriptions of the Usability
> Initiative's studies are all public, or what.  Maybe one of them could
> fill us in.  My personal guess is that the best usability for newbies
> would be to hide as many things as possible to make it less
> intimidating.
>
>> By the way, why does page history say "12,345 bytes" and not "12,345
>> characters"?
>
> Because it's 12,345 bytes, not 12,345 characters.  :)

Does the difference really matter so much that we must really use the
more-obscure and more-technical term "bytes"?

Re: Showing bytes added/removed in each edit in "View history" and "User contributions"
On Mon, Aug 2, 2010 at 5:28 PM, Jason A. Spiro <jasonspiro4@gmail.com> wrote:
> Does the difference really matter so much that we must really use the
> more-obscure and more-technical term "bytes"?

In English, maybe not. In a lot of languages, they'll differ by a
somewhat unpredictable factor that can be as high as three. The sane
thing would be to just make the counts be in characters rather than
bytes to begin with, of course -- it's hardly difficult. I imagine
Chinese people are puzzled when RC reports +3 and there was only one
character added.
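
To make the factor concrete, here is a small sketch comparing UTF-8 byte length with character (code point) count. utf8_char_count() is a name invented for this example, not a MediaWiki function, and the source file is assumed to be saved as UTF-8:

```php
<?php
// utf8_char_count() is a hypothetical helper, not a MediaWiki function.
function utf8_char_count( string $str ): int {
    // With the /u flag, "." matches one whole UTF-8 code point;
    // the /s flag lets it match newlines too.
    $n = preg_match_all( '/./us', $str );
    return $n === false ? 0 : $n;
}

foreach ( [ 'a', 'π', '中' ] as $ch ) {
    printf( "%s: %d byte(s), %d character(s)\n", $ch, strlen( $ch ), utf8_char_count( $ch ) );
}
// prints:
// a: 1 byte(s), 1 character(s)
// π: 2 byte(s), 1 character(s)
// 中: 3 byte(s), 1 character(s)
```

So a one-character Chinese addition shows up as +3 in a byte-based counter.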

Re: Showing bytes added/removed in each edit in "View history" and "User contributions"
On Mon, 02-08-2010, at 17:36 -0400, Aryeh Gregor wrote:
> On Mon, Aug 2, 2010 at 5:28 PM, Jason A. Spiro <jasonspiro4@gmail.com> wrote:
> > Does the difference really matter so much that we must really use the
> > more-obscure and more-technical term "bytes"?
>
> In English, maybe not. In a lot of languages, they'll differ by a
> somewhat unpredictable factor that can be as high as three. The sane
> thing would be to just make the counts be in characters rather than
> bytes to begin with, of course -- it's hardly difficult. I imagine
> Chinese people are puzzled when RC reports +3 and there was only one
> character added.

I would love it if the indicator was in characters instead of bytes.
That's more meaningful for almost every project. Readers are looking at
text after all, not at raw strings.

Ariel



Re: Showing bytes added/removed in each edit in "View history" and "User contributions"
2010/8/2 Aryeh Gregor <Simetrical+wikilist@gmail.com>:
> That I don't know.  I don't know if descriptions of the Usability
> Initiative's studies are all public, or what.  Maybe one of them could
> fill us in.
There are videos around, yes, but I'm not sure we have reports.
Digging around on usabilitywiki should turn stuff up, or maybe someone
closer to these tests (both geographically and in terms of expertise)
can provide more exact links.

The tests were specifically focused on editing and general navigation,
and did not test the history view AFAIK.

Roan Kattouw (Catrope)

Re: Showing bytes added/removed in each edit in "View history" and "User contributions"
On 08/03/2010 01:48 AM, Ariel T. Glenn wrote:
> I would love it if the indicator was in characters instead of bytes.
> That's more meaningful for almost every project. Readers are looking at
> text after all, not at raw strings.
>
> Ariel
>

That would require introducing another field in the revision table,
since a byte count cannot be converted to a character count in UTF-8.

--vvv

Re: Showing bytes added/removed in each edit in "View history" and "User contributions"
On Mon, Aug 2, 2010 at 6:45 PM, Victor Vasiliev <vasilvv@gmail.com> wrote:
> That would require introducing another field in the revision table,
> since a byte count cannot be converted to a character count in UTF-8.

No, we'd just have to repurpose rev_len to mean "characters" instead
of "bytes", and update all the old rows. We don't actually need the
byte count for anything, do we?

Re: Showing bytes added/removed in each edit in "View history" and "User contributions"
On Mon, Aug 2, 2010 at 5:36 PM, Aryeh Gregor
<Simetrical+wikilist@gmail.com> wrote:
>
> On Mon, Aug 2, 2010 at 5:28 PM, Jason A. Spiro <jasonspiro4@gmail.com> wrote:
>>
>> Does the difference really matter so much that we must really use the
>> more-obscure and more-technical term "bytes"?
>
> In English, maybe not.  In a lot of languages, they'll differ by a
> somewhat unpredictable factor that can be as high as three.  The sane
> thing would be to just make the counts be in characters rather than
> bytes to begin with, of course -- it's hardly difficult.  I imagine
> Chinese people are puzzled when RC reports +3 and there was only one
> character added.

A question for the non-English wiki contributors out there: Do you
honestly care that MediaWiki shows byte counts and not character
counts? If so, why do you care?

Best regards,
-Jason

Re: Showing bytes added/removed in each edit in "View history" and "User contributions"
On Tue, Aug 3, 2010 at 2:53 PM, Jason A. Spiro <jasonspiro4@gmail.com> wrote:
> A question for the non-English wiki contributors out there:  Do you
> honestly care that MediaWiki shows byte counts and not character
> counts?  If so, why do you care?

If the count itself is useful (I don't think it is), then it is
probably way more useful when it's remotely accurate.

Of course, if the inaccuracy doesn't matter, then perhaps we could
just display random numbers next to the changes. That might be just as
helpful, and will save us a lot of trouble.

--
Andrew Garrett
http://werdn.us/

Re: Showing bytes added/removed in each edit in "View history" and "User contributions"
On 8/3/10, Aryeh Gregor <Simetrical+wikilist@gmail.com> wrote:
> No, we'd just have to repurpose rev_len to mean "characters" instead
> of "bytes", and update all the old rows. We don't actually need the
> byte count for anything, do we?

Byte count is used. For example in Chinese Wikipedia, one of the
criteria of "Did you know" articles is ">= 3000 bytes".

Re: Showing bytes added/removed in each edit in "View history" and "User contributions"
This is a policy requirement, not a technical requirement, and can surely be
adjusted.

On 03.08.2010 07:14, Liangent wrote:
> On 8/3/10, Aryeh Gregor <Simetrical+wikilist@gmail.com> wrote:
>> No, we'd just have to repurpose rev_len to mean "characters" instead
>> of "bytes", and update all the old rows. We don't actually need the
>> byte count for anything, do we?
>
> Byte count is used. For example in Chinese Wikipedia, one of the
> criteria of "Did you know" articles is ">= 3000 bytes".
Re: Showing bytes added/removed in each edit in "View history" and "User contributions"
On 8/3/10, ChrisiPK <chrisipk@gmail.com> wrote:
> This is a policy requirement, not a technical requirement, and can surely be
> adjusted.

It seems that counting 1 zh char as 3 bytes gives characters a reasonable
weight. On zh.wp, zh chars obviously matter more (when measuring the
amount of content) than en chars, which are usually wikisyntax...

Re: Showing bytes added/removed in each edit in "View history" and "User contributions"
Ahem.

The revision size (and page size, meaning that of last revision) in
bytes, is available in the API. If you change the definition there is
no telling what you will break. Essentially you can't.

A character count would have to be another field.

best,
Robert

On Tue, Aug 3, 2010 at 9:53 AM, ChrisiPK <chrisipk@gmail.com> wrote:
> This is a policy requirement, not a technical requirement, and can surely be
> adjusted.
>
> On 03.08.2010 07:14, Liangent wrote:
>> On 8/3/10, Aryeh Gregor <Simetrical+wikilist@gmail.com> wrote:
>>> No, we'd just have to repurpose rev_len to mean "characters" instead
>>> of "bytes", and update all the old rows.  We don't actually need the
>>> byte count for anything, do we?
>>
>> Byte count is used. For example in Chinese Wikipedia, one of the
>> criteria of "Did you know" articles is ">= 3000 bytes".

Re: Showing bytes added/removed in each edit in "View history" and "User contributions"
On Tue, Aug 3, 2010 at 1:14 AM, Liangent <liangent@gmail.com> wrote:
> Byte count is used. For example in Chinese Wikipedia, one of the
> criteria of "Did you know" articles is ">= 3000 bytes".

I mean, is byte count used for anything where character count couldn't
be used just about as well? Like is there some code that uses rev_len
to figure out whether an article can fit into a field limited to X
bytes, or whatever? (That's probably unsafe anyway.)

On Tue, Aug 3, 2010 at 3:48 AM, Robert Ullmann <rlullmann@gmail.com> wrote:
> The revision size (and page size, meaning that of last revision) in
> bytes, is available in the API. If you change the definition there is
> no telling what you will break.

The same could be said of practically any user-visible change. I
mean, maybe if we add a new special page we'll break some script that
was screen-scraping Special:SpecialPages. We can either freeze
MediaWiki and never change anything for fear that we'll break
something, or we can evaluate each potential change on the basis of
how likely it is to break anything. I can't see anything breaking too
badly if rev_len is reported in characters instead of bytes -- the
only place it's likely to be useful is in heuristics, and by their
nature, those won't break too badly if the numbers they're based on
change somewhat.

Re: Showing bytes added/removed in each edit in "View history" and "User contributions"
Just butting in here: if I recall correctly, both the PHP-native
mb_strlen() and the MediaWiki fallback mb_strlen() functions are
considerably slower than strlen() (1.5 to 5 times as slow). Unless
there's another way to count characters in multibyte UTF-8 strings,
this would not be a feasible idea.

-X!

Re: Showing bytes added/removed in each edit in "View history" and "User contributions"
On Tue, Aug 3, 2010 at 10:59 AM, soxred93 <soxred93@gmail.com> wrote:
> Just butting in here, if I recall correctly, both the PHP-native
> mb_strlen() and the MediaWiki fallback mb_strlen() functions are
> considerably slower (1.5 to 5 times as slow).

They only have to be run once, when the revision is saved. It's not
likely to be a noticeable cost.

Re: Showing bytes added/removed in each edit in "View history" and "User contributions"
Aryeh Gregor wrote:
> On Tue, Aug 3, 2010 at 10:59 AM, soxred93 <soxred93@gmail.com> wrote:
>
>> Just butting in here, if I recall correctly, both the PHP-native
>> mb_strlen() and the MediaWiki fallback mb_strlen() functions are
>> considerably slower (1.5 to 5 times as slow).
>>
>
> They only have to be run once, when the revision is saved. It's not
> likely to be a noticeable cost.
>
Yup, though we might as well remember that not everyone has mb_
functions installed.
MediaWiki is intended to be functional both with, and without mb_
functions. That's another point towards storing both and falling back to
bytes when the char field isn't populated.
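
A minimal sketch of that fallback, assuming a hypothetical rev_char_len column next to the existing rev_len. Neither the column name nor display_length() exists in MediaWiki; both are invented for illustration:

```php
<?php
// display_length() and 'rev_char_len' are hypothetical, for illustration only.
// $row mimics a revision row as an associative array.
function display_length( array $row ): array {
    // Prefer the character count when it has been populated...
    if ( isset( $row['rev_char_len'] ) ) {
        return [ $row['rev_char_len'], 'characters' ];
    }
    // ...otherwise fall back to the byte count that every row already has.
    return [ $row['rev_len'], 'bytes' ];
}

// An old row without a character count, and a newer row with one
var_dump( display_length( [ 'rev_len' => 9, 'rev_char_len' => null ] ) );
var_dump( display_length( [ 'rev_len' => 9, 'rev_char_len' => 3 ] ) );
```

Old rows keep working unchanged, and the unit is returned with the number so the interface can label it correctly.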

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]


Re: Showing bytes added/removed in each edit in "View history" and "User contributions"
On Tue, Aug 3, 2010 at 5:09 PM, Daniel Friesen
<lists@nadir-seen-fire.com> wrote:
> Yup, though we might as well remember that not everyone has mb_
> functions installed.

if ( !function_exists( 'mb_strlen' ) ) {
    /**
     * Fallback implementation of mb_strlen, hardcoded to UTF-8.
     * @param string $str
     * @param string $enc optional encoding; ignored
     * @return int
     */
    function mb_strlen( $str, $enc="" ) {
        $counts = count_chars( $str );
        $total = 0;

        // Count ASCII bytes
        for( $i = 0; $i < 0x80; $i++ ) {
            $total += $counts[$i];
        }

        // Count multibyte sequence heads
        for( $i = 0xc0; $i < 0xff; $i++ ) {
            $total += $counts[$i];
        }
        return $total;
    }
}
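
The fallback works because every UTF-8 character contains exactly one byte that is not a continuation byte (0x80 to 0xBF), so counting ASCII bytes plus sequence heads counts characters. The same idea written as a direct loop, as a sketch only; count_utf8_chars() is a name made up here, and valid UTF-8 input is assumed:

```php
<?php
// count_utf8_chars() is a hypothetical helper; it assumes valid UTF-8 input.
function count_utf8_chars( string $str ): int {
    $total = 0;
    $len = strlen( $str );
    for ( $i = 0; $i < $len; $i++ ) {
        // Continuation bytes have the bit pattern 10xxxxxx (0x80-0xBF); skip them.
        if ( ( ord( $str[$i] ) & 0xC0 ) !== 0x80 ) {
            $total++;
        }
    }
    return $total;
}

// "a" is 1 byte, "א" and "π" are 2 bytes each: 5 bytes, 3 characters
var_dump( strlen( "aאπ" ) );           // int(5)
var_dump( count_utf8_chars( "aאπ" ) ); // int(3)
```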

Re: Showing bytes added/removed in each edit in "View history" and "User contributions"
(just remember that it's 1.5 to 5 times slower, like I said earlier.
Whether or not that's an issue will have to be decided by higher powers)

On Aug 3, 2010, at 5:54 PM, Aryeh Gregor wrote:

> On Tue, Aug 3, 2010 at 5:09 PM, Daniel Friesen
> <lists@nadir-seen-fire.com> wrote:
>> Yup, though we might as well remember that not everyone has mb_
>> functions installed.
>
> if ( !function_exists( 'mb_strlen' ) ) {
>     /**
>      * Fallback implementation of mb_strlen, hardcoded to UTF-8.
>      * @param string $str
>      * @param string $enc optional encoding; ignored
>      * @return int
>      */
>     function mb_strlen( $str, $enc="" ) {
>         $counts = count_chars( $str );
>         $total = 0;
>
>         // Count ASCII bytes
>         for( $i = 0; $i < 0x80; $i++ ) {
>             $total += $counts[$i];
>         }
>
>         // Count multibyte sequence heads
>         for( $i = 0xc0; $i < 0xff; $i++ ) {
>             $total += $counts[$i];
>         }
>         return $total;
>     }
> }

Re: Showing bytes added/removed in each edit in "View history" and "User contributions"
On Tue, Aug 3, 2010 at 8:12 PM, soxred93 <soxred93@gmail.com> wrote:
> (just remember that it's 1.5 to 5 times slower, like I said earlier.
> Whether or not that's an issue will have to be decided by higher powers)

This is not some question that has to be decided by
specially-appointed performance gurus -- just do some quick testing.
Like so:

$ echo '<?php $str = str_repeat( "aאπ", 200000000 ); $start =
microtime( true ); mb_strlen( $str ); var_dump( microtime( true ) -
$start );' | php
float(1.1920928955078E-5)

Note that this string is one *billion* bytes long, and the mb_strlen()
still takes only about 10 *microseconds*. If you look at our own
mb_strlen() implementation, the only non-O(1) part is count_chars(),
and for that we find:

$ echo '<?php $str = str_repeat( "aאπ", 200000000 ); $start =
microtime( true ); count_chars( $str ); var_dump( microtime( true ) -
$start );' | php
float(1.8740479946136)

I.e., less than two seconds for a one-billion-byte string. This is
about 100,000 times worse than native mb_strlen(), and about 200,000
times worse than strlen(), but on a sub-megabyte article, it's still
only a millisecond or so in absolute terms.

In the future, remember that you can run this kind of
order-of-magnitude performance assessment yourself very easily. You
*have* to, to write code that performs decently -- you can't just push
all performance considerations off to reviewers. Thankfully, it's
easy to answer this kind of performance question. Things that involve
nontrivial scalability, like database operations, are considerably
harder, and you do need to develop specific expertise to easily
estimate what performance will be like, but that's not the case here.
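
The same kind of quick check can be done on an article-sized string. This is a rough sketch, not a rigorous benchmark: time_call() is a helper name invented here, each call is timed only once, and the numbers will vary by machine, so no timings are shown:

```php
<?php
// time_call() is a hypothetical helper: run $fn once on $arg and
// return [ result, elapsed seconds ].
function time_call( callable $fn, string $arg ): array {
    $start = microtime( true );
    $result = $fn( $arg );
    return [ $result, microtime( true ) - $start ];
}

// "aאπ" is 5 bytes in UTF-8, so this builds a 1,000,000-byte,
// 600,000-character string: roughly the size of a very large article.
$article = str_repeat( "aאπ", 200000 );

[ $bytes, $tStrlen ] = time_call( 'strlen', $article );
[ $counts, $tCount ] = time_call( 'count_chars', $article );

printf( "strlen:      %d bytes in %.6f s\n", $bytes, $tStrlen );
printf( "count_chars: %d distinct byte values in %.6f s\n",
    count( array_filter( $counts ) ), $tCount );
```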

Re: Showing bytes added/removed in each edit in "View history" and "User contributions"
On Mon, Aug 2, 2010 at 5:48 PM, Ariel T. Glenn <ariel@wikimedia.org> wrote:
> On Mon, 02-08-2010, at 17:36 -0400, Aryeh Gregor wrote:
>> On Mon, Aug 2, 2010 at 5:28 PM, Jason A. Spiro <jasonspiro4@gmail.com> wrote:
>> > Does the difference really matter so much that we must really use the
>> > more-obscure and more-technical term "bytes"?
>>
>> In English, maybe not.  In a lot of languages, they'll differ by a
>> somewhat unpredictable factor that can be as high as three.  The sane
>> thing would be to just make the counts be in characters rather than
>> bytes to begin with, of course -- it's hardly difficult.  I imagine
>> Chinese people are puzzled when RC reports +3 and there was only one
>> character added.
>
> I would love it if the indicator was in characters instead of bytes.
> That's more meaningful for almost every project.  Readers are looking at
> text after all, not at raw strings.

Ariel and Aryeh: I've just reported your shared wish at
https://bugzilla.wikimedia.org/show_bug.cgi?id=25198

And at https://bugzilla.wikimedia.org/show_bug.cgi?id=25199 I've
reported my original idea of showing the number of added or removed
characters on more pages.

To all who replied, thank you for your feedback. I am now
unsubscribing from wikitech-l. Please CC me on all replies.

--
Jason Spiro: software/web developer, packager, trainer, IT consultant.
I support Linux, UNIX, Windows, and more. Contact me to discuss your needs.
+1 (416) 992-3445 / www.jspiro.com
