Mailing List Archive

wikipedia is one of the slower sites on the web
Playing the role of the average dumb user, it seems to me that
en.wikipedia.org is one of the slower websites among the many I browse.

No matter what browser I use, it takes several seconds from the time I
click a link until the first bytes of the HTTP response start flowing
back to me.

Facebook, by comparison, seems zippier.

Maybe MediaWiki is not "optimized".

Re: wikipedia is one of the slower sites on the web [ In reply to ]
On Wed, Jul 28, 2010 at 3:13 PM, <jidanni@jidanni.org> wrote:
> Playing the role of the average dumb user, it seems to me that
> en.wikipedia.org is one of the slower websites among the many I browse.
>
> No matter what browser I use, it takes several seconds from the time I
> click a link until the first bytes of the HTTP response start flowing
> back to me.
>
> Facebook, by comparison, seems zippier.
>
> Maybe MediaWiki is not "optimized".

Is this while logged in or not? If you're not logged in, you should be
hitting the Squid cache most of the time, and we should be about as fast
as anyone with a similar RTT. But you might easily be farther from the
nearest Wikipedia server than from the nearest Facebook server. And if
you're logged in, I'm betting we're much less optimized -- certainly
if you have unusual parser preferences (which I'm sure you do), so you
miss the parser cache regularly.

Re: wikipedia is one of the slower sites on the web [ In reply to ]
> And if
> you're logged in, I'm betting we're much less optimized -- certainly
> if you have unusual parser preferences (which I'm sure you do), so you
> miss the parser cache regularly.

Could you please elaborate on that? Thanks.

Andrei

Re: wikipedia is one of the slower sites on the web [ In reply to ]
> Could you please elaborate on that? Thanks.

We don't have large blinking red lights warning people when they deviate from the default parser settings - that makes them miss the cache, and each pageview is slow.

Domas
Re: wikipedia is one of the slower sites on the web [ In reply to ]
On Thu, Jul 29, 2010 at 4:07 PM, Strainu <strainu10@gmail.com> wrote:
> Could you please elaborate on that? Thanks.

When pages are parsed, the parsed version is cached, since parsing can
take a long time (sometimes > 10 s). Some preferences change how
pages are parsed, so different copies need to be stored based on those
preferences. If these settings are all default for you, you'll be
using the same parser cache copies as anonymous users, so you're
extremely likely to get a parser cache hit. If any of them is
non-default, you'll only get a parser cache hit if someone with your
exact parser-related preferences viewed the page since it was last
changed; otherwise it will have to reparse the page just for you,
which will take a long time.

This is probably a bad thing. I'd think that most of the settings
that fragment the parser cache should be implementable in a
post-processing stage, which should be more than fast enough to run on
parser cache hits as well as misses. But we don't have such a thing.
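
To make the mechanism concrete, here is a rough sketch of a
preference-keyed parser cache lookup. The function names, key format and
preference list are simplified illustrations, not the actual MediaWiki
code (the real key construction is quoted later in this thread):

function getParserCacheKey( $pageId, array $prefs ) {
    // Every parser-affecting preference becomes part of the key, so any
    // non-default value sends the user to a different (often empty) slot.
    return 'pcache:' . $pageId . '!' . implode( '!', array(
        $prefs['math'],
        $prefs['stubthreshold'],
        $prefs['dateformat'],
        $prefs['numberheadings'] ? '1' : '',
        $prefs['language'],
        $prefs['thumbsize'],
    ) );
}

function getParsedPage( $pageId, array $prefs, $cache, $parser ) {
    $key = getParserCacheKey( $pageId, $prefs );
    $html = $cache->get( $key );
    if ( $html === false ) {
        // Cache miss: reparse just for this preference combination.
        // This is the slow path described above.
        $html = $parser->parse( $pageId, $prefs );
        $cache->set( $key, $html );
    }
    return $html;
}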

Re: wikipedia is one of the slower sites on the web [ In reply to ]
> This is probably a bad thing. I'd think that most of the settings
> that fragment the parser cache should be implementable in a
> post-processing stage, which should be more than fast enough to run on
> parser cache hits as well as misses. But we don't have such a thing.

Some of which could even be done with CSS/JS, I guess.
I'm all for simplifying whatever processing the backend has to do :-)

Domas

Re: wikipedia is one of the slower sites on the web [ In reply to ]
Domas Mituzas wrote:
>> This is probably a bad thing. I'd think that most of the settings
>> that fragment the parser cache should be implementable in a
>> post-processing stage, which should be more than fast enough to run on
>> parser cache hits as well as misses. But we don't have such a thing.
>
> Some of which could even be done with CSS/JS, I guess.
> I'm all for simplifying whatever processing the backend has to do :-)
>
> Domas

We have a couple of options, {$edit} and {$printable}, which in fact do
the same thing (remove the section edit links), so they could be merged.
Additionally, the non-editsection version can be retrieved from the
editsectioned one with a preg_replace.
So yes, I think it can be simplified without even affecting the poor
CSS-less users.
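
For example, assuming the section edit links are wrapped in
<span class="editsection">...</span> as in the default skin's output (a
sketch, not the exact markup of every skin):

// Derive the no-edit-links variant from the cached HTML that has them,
// instead of keeping a second parser cache entry without them.
$withoutEditLinks = preg_replace(
    '!<span class="editsection">.*?</span>!s',
    '',
    $withEditLinks
);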


Re: wikipedia is one of the slower sites on the web [ In reply to ]
2010/7/30 Platonides <Platonides@gmail.com>

>
>
> We have a couple of options, {$edit} and {$printable}, which in fact do
> the same thing (remove the section edit links), so they could be merged.
> Additionally, the non-editsection version can be retrieved from the
> editsectioned one with a preg_replace.
> So yes, I think it can be simplified without even affecting the poor
> CSS-less users.
>
>
Perhaps you're saying the same thing I'm about to suggest... My idea is to
have online a static version of every page, very fast too, which could be
the default version for logged-out users; very similar to the CD static
version of the wiki projects, with just some trick added to switch to the
normal, editable, complete, customizable (but slow) version. Obviously this
version would have only one copy of any page, with no need to parse it
again according to user preferences.

It doesn't matter if such an idea is completely foolish; I'm far from an
expert!

Alex
Re: wikipedia is one of the slower sites on the web [ In reply to ]
Alex Brollo wrote:
> 2010/7/30 Platonides <Platonides@gmail.com>
>
>
>> We have a couple of options, {$edit} and {$printable}, which in fact do
>> the same thing (remove the section edit links), so they could be merged.
>> Additionally, the non-editsection version can be retrieved from the
>> editsectioned one with a preg_replace.
>> So yes, I think it can be simplified without even affecting the poor
>> CSS-less users.
>>
>>
>>
> Perhaps you're saying the same thing I'm about to suggest... My idea is to
> have online a static version of every page, very fast too, which could be
> the default version for logged-out users; very similar to the CD static
> version of the wiki projects, with just some trick added to switch to the
> normal, editable, complete, customizable (but slow) version. Obviously this
> version would have only one copy of any page, with no need to parse it
> again according to user preferences.
>
> It doesn't matter if such an idea is completely foolish; I'm far from an
> expert!
>
> Alex
>
That's pretty much the purpose of the caching servers.

--
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]


Re: wikipedia is one of the slower sites on the web [ In reply to ]
2010/7/30 Daniel Friesen <lists@nadir-seen-fire.com>

>
> That's pretty much the purpose of the caching servers.
>

Yes, but I presume that a big advantage could come from having a
simplified, single, JS-free version of the pages online, completely devoid
of "user preferences", to avoid any need to parse them again when loaded by
different users with different preference profiles. Nevertheless, I say
again: it's only a complete layman's idea.

--
Alex
Re: wikipedia is one of the slower sites on the web [ In reply to ]
On Fri, Jul 30, 2010 at 6:23 AM, Aryeh Gregor
<Simetrical+wikilist@gmail.com> wrote:
> On Thu, Jul 29, 2010 at 4:07 PM, Strainu <strainu10@gmail.com> wrote:
>> Could you please elaborate on that? Thanks.
>
> When pages are parsed, the parsed version is cached, since parsing can
> take a long time (sometimes > 10 s).  Some preferences change how
> pages are parsed, so different copies need to be stored based on those
> preferences.  If these settings are all default for you, you'll be
> using the same parser cache copies as anonymous users, so you're
> extremely likely to get a parser cache hit.  If any of them is
> non-default, you'll only get a parser cache hit if someone with your
> exact parser-related preferences viewed the page since it was last
> changed; otherwise it will have to reparse the page just for you,
> which will take a long time.
>
> This is probably a bad thing.

Could we add a logged-in-reader mode, for people who are infrequent
contributors but wish to be logged in for the prefs.

They could be served a slightly old cached version of the page when
one is available for their prefs. e.g. if the cached version is less
than a minute old.
The down side is that if they see an error, it may already be fixed.
OTOH, if the page is being revised frequently, the same is likely to
happen anyway. The text could be stale before it hits the wire due to
parsing delay.
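
In pseudo-PHP, the serving logic being proposed would amount to something
like this (all names here are invented purely to illustrate the idea):

// Sketch of the "logged-in-reader" idea: accept a cached rendering for
// the reader's preference set as long as it is recent enough, otherwise
// fall back to the usual reparse.
function getPageForReader( $key, $cache, $parser, $maxStaleSeconds = 60 ) {
    $entry = $cache->get( $key );
    if ( $entry !== false
        && ( time() - $entry['timestamp'] ) <= $maxStaleSeconds
    ) {
        return $entry['html'];  // slightly stale, but no parse needed
    }
    $html = $parser->parse();   // the slow path
    $cache->set( $key, array( 'html' => $html, 'timestamp' => time() ) );
    return $html;
}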

For pending changes, the pref 'Always show the latest accepted
revision (if there is one) of a page by default' could be enabled by
default. Was there any discussion about the default setting for this
pref?

--
John Vandenberg

Re: wikipedia is one of the slower sites on the web [ In reply to ]
On Fri, Jul 30, 2010 at 11:49 AM, John Vandenberg <jayvdb@gmail.com> wrote:
>
> Could we add a logged-in-reader mode, for people who are infrequent
> contributors but wish to be logged in for the prefs.
>
> They could be served a slightly old cached version of the page when
> one is available for their prefs.  e.g. if the cached version is less
> than a minute old.
> The down side is that if they see an error, it may already be fixed.
> OTOH, if the page is being revised frequently, the same is likely to
> happen anyway.  The text could be stale before it hits the wire due to
> parsing delay.

That could work on the first 3-5 Wikipedias by number of visitors; for
the rest, you are most likely to serve VERY old versions (or just
re-parse the page if you set a low threshold).

Strainu

Re: wikipedia is one of the slower sites on the web [ In reply to ]
On Fri, Jul 30, 2010 at 2:42 AM, Alex Brollo <alex.brollo@gmail.com> wrote:
> Yes, but I presume that a big advantage could come from having a
> simplified, single, JS-free version of the pages online, completely devoid
> of "user preferences", to avoid any need to parse them again when loaded by
> different users with different preference profiles.

This is exactly what we have when you're logged out. The request goes
to a Squid, and it serves a static cached file, no dynamic bits (if
it's already cached). When you log in, it can't be static, because we
display your name in the upper right, etc.

On Fri, Jul 30, 2010 at 4:49 AM, John Vandenberg <jayvdb@gmail.com> wrote:
> Could we add a logged-in-reader mode, for people who are infrequent
> contributors but wish to be logged in for the prefs.

As soon as you're logged in, you're missing Squid cache, because we
have to add your name to the top, attach your user CSS/JS, etc. You
can't be served the same HTML as an anonymous user. If you want to be
served the same HTML as an anonymous user, log out.

Fortunately, the major slowdown is parser cache misses, not Squid
cache misses. To avoid parser cache misses, just make sure you don't
change parser-affecting preferences to non-default values. (We don't
say which these are, of course . . .)

> They could be served a slightly old cached version of the page when
> one is available for their prefs.  e.g. if the cached version is less
> than a minute old.

That would make no difference. If you've fiddled with your
preferences nontrivially, there's a good chance that not a single
other user has the exact same preferences, so you'll only hit the
parser cache if you yourself have viewed the page recently. For
instance, if you set your stub threshold to 357 bytes, you'll never
hit anyone else's cache (unless someone else has that exact stub
threshold). Even if you just fiddle with on/off options, there are
several, and the number of combinations is exponential.

Moreover, practically no page changes anywhere close to once per
minute. If the threshold is set that low, you'll essentially never
get extra parser cache hits. On the other hand, extra infrastructure
will be needed to keep around stale parser cache entries, so it's a
clear overall loss.

> The down side is that if they see an error, it may already be fixed.
> OTOH, if the page is being revised frequently, the same is likely to
> happen anyway.  The text could be stale before it hits the wire due to
> parsing delay.

However, in that case everyone will see the new contents at more or
less the same time -- it won't be inconsistent.

Re: wikipedia is one of the slower sites on the web [ In reply to ]
On Sat, Jul 31, 2010 at 1:45 AM, Aryeh Gregor
<Simetrical+wikilist@gmail.com> wrote:
> On Fri, Jul 30, 2010 at 4:49 AM, John Vandenberg <jayvdb@gmail.com> wrote:
>> Could we add a logged-in-reader mode, for people who are infrequent
>> contributors but wish to be logged in for the prefs.
>
> ...
>
> Fortunately, the major slowdown is parser cache misses, not Squid
> cache misses.  To avoid parser cache misses, just make sure you don't
> change parser-affecting preferences to non-default values.  (We don't
> say which these are, of course . . .)

So you're telling my theoretical logged-in-reader to use default
prefs, or log out, when the reason they are a logged-in-reader is so
they can control their preferences..!

>> They could be served a slightly old cached version of the page when
>> one is available for their prefs.  e.g. if the cached version is less
>> than a minute old.
>
> That would make no difference.  If you've fiddled with your
> preferences nontrivially, there's a good chance that not a single
> other user has the exact same preferences, so you'll only hit the
> parser cache if you yourself have viewed the page recently.  For
> instance, if you set your stub threshold to 357 bytes, you'll never
> hit anyone else's cache (unless someone else has that exact stub
> threshold).  Even if you just fiddle with on/off options, there are
> several, and the number of combinations is exponential.

Someone who sets their stub threshold to 357 is their own performance enemy.

Surely there are a few common 'preference sets' which large numbers of
readers use?

How many people only look at the front page in the morning, and jump
to a few pages from there..?

> Moreover, practically no page changes anywhere close to once per
> minute.  If the threshold is set that low, you'll essentially never
> get extra parser cache hits.  On the other hand, extra infrastructure
> will be needed to keep around stale parser cache entries, so it's a
> clear overall loss.

There are plenty of pages which change more than once per minute;
however, I'd expect a much higher threshold, variable based on the
volume of page activity, or some other mechanism to determine whether
the cached version is acceptably stale for the logged-in-reader.

There is no infrastructure required for extra stale entries. If the
viewer is happy to accept the slightly stale revision for their chosen
prefs, serve it. If not, reparse.

>> The down side is that if they see an error, it may already be fixed.
>> OTOH, if the page is being revised frequently, the same is likely to
>> happen anyway.  The text could be stale before it hits the wire due to
>> parsing delay.
>
> However, in that case everyone will see the new contents at more or
> less the same time -- it won't be inconsistent.

Not on frequently changing pages. Many edits can occur while I am
pulling the page down the wire. I then need to read the page to find
this error.

--
John Vandenberg

Re: wikipedia is one of the slower sites on the web [ In reply to ]
>>>>> "AG" == Aryeh Gregor <Simetrical+wikilist@gmail.com> writes:

AG> Fortunately, the major slowdown is parser cache misses, not Squid
AG> cache misses. To avoid parser cache misses, just make sure you don't
AG> change parser-affecting preferences to non-default values. (We don't
AG> say which these are, of course . . .)

Hmmm, maybe they're there amongst the "!"s below.
$ lynx --source http://en.wikipedia.org/wiki/Main_Page | grep parser
Expensive parser function count: 44/500
<!-- Saved in parser cache with key enwiki:pcache:idhash:15580374-0!3!0!default!1!en!4!edit=0 and timestamp 20100731001330 -->

Re: wikipedia is one of the slower sites on the web [ In reply to ]
Aryeh Gregor schrieb:
> As soon as you're logged in, you're missing Squid cache, because we
> have to add your name to the top, attach your user CSS/JS, etc. You
> can't be served the same HTML as an anonymous user. If you want to be
> served the same HTML as an anonymous user, log out.

This is a few years old, but I guess it's still relevant:
<http://brightbyte.de/page/Client-side_skins_with_XSLT> I experimented a bit
with ways to do all the per-user preference stuff on the client side, with XSLT.

-- daniel

Re: wikipedia is one of the slower sites on the web [ In reply to ]
On Fri, Jul 30, 2010 at 1:32 PM, John Vandenberg <jayvdb@gmail.com> wrote:
> So you're telling my theoretical logged-in-reader to use default
> prefs, or log out, when the reason they are a logged-in-reader is so
> they can control their preferences..!

Yep. If you want features, you often pay a performance penalty. In this
case the performance penalty should be reducible, or at least clearly
marked, but that's a general rule anyway.

> Surely there are a few common 'preference sets' which large numbers of
> readers use?

Changing any parser-related preference will kill page load times.

> There are plenty of pages which change more than once per minute,

No pages change once per minute on average. That would be 1440 edits
per day, or more than 500,000 per year. Only one page on enwiki
(WP:AIAV) has more than 500,000 edits *total*, let alone per year.
There were only 18 edits to WP:ANI between 17:00 and 18:00 today, just
for example, which is less than one edit every three minutes. There
are some times when a particular page changes many times in a minute
-- like when a major event occurs and everyone rushes to update an
article -- but these are rare and don't last long.

You also seem to be missing how many different possible parser cache
keys there are. It's not like there are only five or ten possible
versions. As I said before -- if you change your parser-related
settings around a bunch, you will probably rarely or never hit parser
cache except when you yourself viewed the page since it last changed.
There are too many possible permutations of settings here.

> however I'd expect a much higher threshold, variable based on the
> volume of page activity, or some other mechanism to determine whether
> the cached version is acceptably stale for the logged-in-reader.
>
> There is no infrastructure required for extra stale entries.  If the
> viewer is happy to accept the slightly stale revision for their chosen
> prefs, serve it.  If not, reparse.

Look, this is just not a useful solution, period. It would be
extremely ineffective. If you extended the permitted staleness level
so much that it would be moderately effective, it would be useless,
because you'd be seeing hours- or days-old articles. On the other
hand, for a comparable amount of effort you could implement a solution
that actually is effective, like adding an extra postprocessing stage.

On Fri, Jul 30, 2010 at 8:22 PM, <jidanni@jidanni.org> wrote:
> Hmmm, maybe they're there amongst the "!"s below.
> $ lynx --source http://en.wikipedia.org/wiki/Main_Page | grep parser
> Expensive parser function count: 44/500
> <!-- Saved in parser cache with key enwiki:pcache:idhash:15580374-0!3!0!default!1!en!4!edit=0 and timestamp 20100731001330 -->

Yes. That key is generated by the following line in
includes/parser/ParserCache.php:

$key = wfMemcKey( 'pcache', 'idhash',
"{$pageid}-{$renderkey}!{$hash}{$edit}{$printable}" );

The relevant bit of that, for us, is $hash, which is generated by
getPageRenderingHash() in includes/User.php:

// stubthreshold is only included below for completeness,
// it will always be 0 when this function is called by parsercache.

$confstr = $this->getOption( 'math' );
$confstr .= '!' . $this->getOption( 'stubthreshold' );
if ( $wgUseDynamicDates ) {
    $confstr .= '!' . $this->getDatePreference();
}
$confstr .= '!' . ( $this->getOption( 'numberheadings' ) ? '1' : '' );
$confstr .= '!' . $wgLang->getCode();
$confstr .= '!' . $this->getOption( 'thumbsize' );
// add in language specific options, if any
$extra = $wgContLang->getExtraHashOptions();
$confstr .= $extra;

So anonymous users on enwiki have math=3, stubthreshold=0 (although
the comment indicates this is irrelevant somehow), date preferences =
'default', numberheadings = 1, language = 'en', thumbsize = 4.
Changing any of those from the default will make you miss the parser
cache on enwiki.
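
Purely as a sanity check, concatenating those defaults the same way the
quoted code does reproduces the hash fragment seen in the earlier
"Saved in parser cache" comment:

$confstr  = '3';              // math
$confstr .= '!' . '0';        // stubthreshold
$confstr .= '!' . 'default';  // date preference
$confstr .= '!' . '1';        // numberheadings
$confstr .= '!' . 'en';       // language code
$confstr .= '!' . '4';        // thumbsize

echo $confstr;                // "3!0!default!1!en!4"
// Change any one of these and the key no longer matches the copy cached
// for anonymous users.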

On Sat, Jul 31, 2010 at 12:58 PM, Daniel Kinzler <daniel@brightbyte.de> wrote:
> This is a few years old, but I guess it's still relevant:
> <http://brightbyte.de/page/Client-side_skins_with_XSLT> I experimented a bit
> with ways to do all the per-user preference stuff on the client side, with XSLT.

XSLT seems a bit baroque. If the goal is to use script to avoid cache
misses, why not just use plain old JavaScript? A lot more people know
it, it supports progressive rendering (does XSLT?), and it's much
better supported. In particular, your approach of serving something
other than HTML and relying on XSLT support to transform it will
seriously confuse text browsers, search engines, etc.

Re: wikipedia is one of the slower sites on the web [ In reply to ]
Aryeh Gregor wrote:
> Look, this is just not a useful solution, period. It would be
> extremely ineffective. If you extended the permitted staleness level
> so much that it would be moderately effective, it would be useless,
> because you'd be seeing hours- or days-old articles. On the other
> hand, for a comparable amount of effort you could implement a solution
> that actually is effective, like adding an extra postprocessing stage.

Yes, I have some ideas on how to improve it.


> On Fri, Jul 30, 2010 at 1:32 PM, John Vandenberg <jayvdb@gmail.com> wrote:
> Someone who sets their stub threshold to 357 is their own performance enemy.

In fact, setting the stub threshold to anything disables the parser
cache. You can only hit it when it is set to 0.

Aryeh, can you do some statistics about the frequency of the different
stub thresholds? Perhaps restricted to people which edited this year, to
discard unused accounts.


Re: wikipedia is one of the slower sites on the web [ In reply to ]
2010/8/1 Platonides <Platonides@gmail.com>:
> Aryeh, can you do some statistics about the frequency of the different
> stub thresholds? Perhaps restricted to people who edited this year, to
> discard unused accounts.
>
He can't, but I can. I ran a couple of queries and put the result at
http://www.mediawiki.org/wiki/User:Catrope/Stub_threshold

Roan Kattouw (Catrope)

Re: wikipedia is one of the slower sites on the web [ In reply to ]
On Sun, Aug 1, 2010 at 1:43 PM, Roan Kattouw <roan.kattouw@gmail.com> wrote:
> 2010/8/1 Platonides <Platonides@gmail.com>:
>> Aryeh, can you do some statistics about the frequency of the different
>> stub thresholds? Perhaps restricted to people who edited this year, to
>> discard unused accounts.
>>
> He can't, but I can.  I ran a couple of queries and put the result at
> http://www.mediawiki.org/wiki/User:Catrope/Stub_threshold
>

Isn't stub threshold a *reading* preference? It wouldn't be
unreasonable to assume that someone could have that
preference set and not regularly edit.

It also doesn't take into account people who haven't changed
their preferences in a long time (and thus aren't in user_properties
yet).

-Chad

Re: wikipedia is one of the slower sites on the web [ In reply to ]
On Sun, Aug 1, 2010 at 4:43 PM, Roan Kattouw <roan.kattouw@gmail.com> wrote:
> He can't, but I can.  I ran a couple of queries and put the result at
> http://www.mediawiki.org/wiki/User:Catrope/Stub_threshold

I can too -- I'm a toolserver root, so I have read-only access to
pretty much the whole database (minus some omitted
databases/tables/columns, mainly IP addresses and maybe private
wikis). But no need, since you already did it. :) The data isn't
complete because not all users have been ported to user_properties,
right?

One easy hack to reduce this problem is just to only provide a few
options for stub threshold, as we do with thumbnail size. Although
this is only useful if we cache pages with nonzero stub threshold . .
. why don't we do that? Too much fragmentation due to the excessive
range of options?

Re: wikipedia is one of the slower sites on the web [ In reply to ]
2010/8/1 Aryeh Gregor <Simetrical+wikilist@gmail.com>:
> On Sun, Aug 1, 2010 at 4:43 PM, Roan Kattouw <roan.kattouw@gmail.com> wrote:
>> He can't, but I can.  I ran a couple of queries and put the result at
>> http://www.mediawiki.org/wiki/User:Catrope/Stub_threshold
>
> I can too -- I'm a toolserver root, so I have read-only access to
> pretty much the whole database (minus some omitted
> databases/tables/columns, mainly IP addresses and maybe private
> wikis).
Ah yes, I forgot about that. I was assuming you'd need access to the
live DB for this.

> But no need, since you already did it.  :)  The data isn't
> complete because not all users have been ported to user_properties,
> right?
>
I don't know. Cursory inspection seems to indicate user_properties is
relatively complete, but comprehensive count queries are too slow for
me to dare run them on the cluster. Maybe you could run something
along the lines of SELECT COUNT(DISTINCT up_user) FROM
user_properties; on the toolserver and compare it with SELECT COUNT(*)
FROM user;

> One easy hack to reduce this problem is just to only provide a few
> options for stub threshold, as we do with thumbnail size.  Although
> this is only useful if we cache pages with nonzero stub threshold . .
> . why don't we do that?  Too much fragmentation due to the excessive
> range of options?
Maybe; but the fact that the field is present but set to 0 in the
parser cache key is very weird. SVN blame should probably be able to
tell who did this and hopefully why.

Roan Kattouw (Catrope)

Re: wikipedia is one of the slower sites on the web [ In reply to ]
Roan Kattouw wrote:
>> One easy hack to reduce this problem is just to only provide a few
>> options for stub threshold, as we do with thumbnail size. Although
>> this is only useful if we cache pages with nonzero stub threshold . .
>> . why don't we do that? Too much fragmentation due to the excessive
>> range of options?
> Maybe; but the fact that the field is present but set to 0 in the
> parser cache key is very weird. SVN blame should probably be able to
> tell who did this and hopefully why.
>
> Roan Kattouw (Catrope)

Look at Article::getParserOutput() to see how $wgUser->getOption(
'stubthreshold' ) is explicitly checked to be 0 before the parser cache
is enabled.
*There are several other entry points to the ParserCache in Article;
it's a bit mixed.
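
In essence, the gate looks something like this (a simplified sketch of
the check being described, not the exact source):

// Simplified sketch: the parser cache is only consulted when the user's
// stub threshold is zero ($article stands in for the Article instance).
$useParserCache = $wgEnableParserCache
    && intval( $wgUser->getOption( 'stubthreshold' ) ) == 0
    && $article->exists();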


Note that we do offer several options, not only the free-text field. I
think that the underlying problem is that when changing an article from
98 bytes to 102, we would need to invalidate all pages linking to it for
stubthresholds of 100 bytes.

Since the pages are reparsed, custom values are not a problem now.
I think that to cache for the stubthresholds, we would need to cache
just before the replaceLinkHolders() and perform the replacement at the
user request.
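
Very roughly, and with the placeholder format and helper names invented
purely for illustration (the real replaceLinkHolders() machinery is more
involved), the per-request step could look like:

// Expand link placeholders left in the cached output, applying the
// reader's stub threshold only at request time.
function expandLinkHolders( $cachedHtml, $stubThreshold, array $pageSizes ) {
    return preg_replace_callback(
        '/<!--LINK (.*?)\|(.*?)-->/',   // e.g. <!--LINK Foo|the foo article-->
        function ( $m ) use ( $stubThreshold, $pageSizes ) {
            list( , $title, $text ) = $m;
            $size = isset( $pageSizes[$title] ) ? $pageSizes[$title] : 0;
            $isStub = $stubThreshold > 0 && $size < $stubThreshold;
            return '<a href="/wiki/' . urlencode( $title ) . '"'
                . ( $isStub ? ' class="stub"' : '' )
                . '>' . htmlspecialchars( $text ) . '</a>';
        },
        $cachedHtml
    );
}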


Re: wikipedia is one of the slower sites on the web [ In reply to ]
Roan Kattouw wrote:
> 2010/8/1 Platonides:
>> Aryeh, can you do some statistics about the frequency of the different
>> stub thresholds? Perhaps restricted to people who edited this year, to
>> discard unused accounts.
>>
> He can't, but I can. I ran a couple of queries and put the result at
> http://www.mediawiki.org/wiki/User:Catrope/Stub_threshold
>
> Roan Kattouw (Catrope)

Thanks, Roan.
I think the condition should have been the inverse (users with recent
edits, not users who don't have old edits), but anyway it shows that with
a few (8-10) values we could please almost everyone.

Also, it shows that people don't understand how to disable it. The tail
has many extremely large values, which can only mean "don't treat stubs
differently".


Re: wikipedia is one of the slower sites on the web [ In reply to ]
On Sun, Aug 1, 2010 at 5:03 PM, Roan Kattouw <roan.kattouw@gmail.com> wrote:
> I don't know. Cursory inspection seems to indicate user_properties is
> relatively complete, but comprehensive count queries are too slow for
> me to dare run them on the cluster. Maybe you could run something
> along the lines of SELECT COUNT(DISTINCT up_user) FROM
> user_properties; on the toolserver and compare it with SELECT COUNT(*)
> FROM user;

That won't work, because it won't count users whose settings are all
default. However, we can tell who's switched because user_options
will be empty.

On Sun, Aug 1, 2010 at 5:48 PM, Platonides <Platonides@gmail.com> wrote:
> Note that we do offer several options, not only the free-text field. I
> think that the underlying problem is that when changing an article from
> 98 bytes to 102, we would need to invalidate all pages linking to it for
> stubthresholds of 100 bytes.

Aha, that must be it. Any stub threshold would require extra page
invalidation, which we don't do because it would be pointlessly
expensive. Postprocessing would fix the problem.

> Since the pages are reparsed, custom values are not a problem now.
> I think that to cache for the stubthresholds, we would need to cache
> just before the replaceLinkHolders() and perform the replacement at the
> user request.

Yep. Or parse further, but leave markers lingering in the output
somehow. We don't need to cache the actual wikitext, either way. We
just need to cache at some point after all the heavy lifting has been
done, and everything that's left can be done in a couple of
milliseconds.
