Mailing List Archive

PHP parser made twice as fast
I just doubled the speed of the PHP parser.

In my test page ([[Anime]], ~60 links with half broken), I cut the time
for replaceInternalLinks from 800ms to 350ms, and the time for
Article::view from 1310ms to 610ms.

This was achieved by eliminating redundant calls to secureAndSplit,
using static variables for constants, and catering to PHP oddities such
as the fact that === is slower than ==.
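
As a rough illustration of the first optimisation (avoiding redundant
title parsing), here is a minimal Python sketch. It is not the actual
MediaWiki code: parse_title is a hypothetical stand-in for
secureAndSplit, and the CALLS counter exists only to show that a
repeated title is parsed once.

```python
from functools import lru_cache

# Hypothetical counter, used only to show how often the parser really runs.
CALLS = {"count": 0}


@lru_cache(maxsize=None)
def parse_title(raw):
    """Hypothetical stand-in for an expensive title parser.

    Splits "Namespace:Title" into (namespace, title). The result is
    cached, so parsing the same raw string again costs a dict lookup.
    """
    CALLS["count"] += 1
    namespace, sep, rest = raw.partition(":")
    if not sep:
        # No colon: treat the whole string as a title with no namespace.
        return ("", raw.strip())
    return (namespace.strip(), rest.strip())


def render_links(titles):
    """Parse every link target; duplicates hit the cache."""
    return [parse_title(t) for t in titles]
```

Calling render_links(["Anime", "Wikipedia:Oops", "Anime"]) would run the
parser body twice rather than three times. Whether caching or simply
restructuring the caller is the better fit depends on the real code.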

Okay, you may shower me with praise now.

-- Tim Starling.
Re: PHP parser made twice as fast
Tim-
> I just doubled the speed of the PHP parser.

> In my test page ([[Anime]], ~60 links with half broken), I cut the time
> for replaceInternalLinks from 800ms to 350ms, and the time for
> Article::view from 1310 to 610ms.

> This was achieved by eliminating redundant calls to secureAndSplit,
> using static variables for constants, and catering to PHP oddities such
> as the fact that === is slower than ==.

> Okay, you may shower me with praise now.

Very nice. Can we get this into stable ASAP?

Regards,

Erik
Re: PHP parser made twice as fast
Tim Starling wrote:

> I just doubled the speed of the PHP parser.
>
> In my test page ([[Anime]], ~60 links with half broken), I cut the
> time for replaceInternalLinks from 800ms to 350ms, and the time for
> Article::view from 1310 to 610ms.
>
> This was achieved by eliminating redundant calls to secureAndSplit,
> using static variables for constants, and catering to PHP oddities
> such as the fact that === is slower than ==.
>
> Okay, you may shower me with praise now.
>
> -- Tim Starling.
>
Fantastic!

-- Neil
Re: PHP parser made twice as fast
Tim Starling wrote:
> I just doubled the speed of the PHP parser.
>
> In my test page ([[Anime]], ~60 links with half broken), I cut the time
> for replaceInternalLinks from 800ms to 350ms, and the time for
> Article::view from 1310 to 610ms.
>
> This was achieved by eliminating redundant calls to secureAndSplit,
> using static variables for constants, and catering to PHP oddities such
> as the fact that === is slower than ==.
>
> Okay, you may shower me with praise now.

Cool! Keep going like this, and I'll throw my C++-parser away ;-)

Magnus
Re: PHP parser made twice as fast
On Wednesday, Oct 22, 2003, at 16:57 US/Pacific, Tim Starling wrote:
> This was achieved by eliminating redundant calls to secureAndSplit,
> using static variables for constants, and catering to PHP oddities
> such as the fact that === is slower than ==.
>
> Okay, you may shower me with praise now.

*sprinkle sprinkle sprinkle* Three cheers for Tim!

I've backported at least part of the changes to stable; it does make a
difference. On a copy of [[List of China-related topics]], with 1994
broken links and 1 live one, the new code gives about a 15% increase in
total page load speed -- which is a big difference considering that the
page takes about 3.5 seconds to render on my 2GHz Athlon with a single
request as the sole load! 16 of those page views in 60 seconds vs 70
seconds is a definite improvement. I'm sure there's more tweaking to be
done...
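
The "about 15%" figure can be checked against the 60-versus-70-second
numbers with a little arithmetic. This sketch assumes the 70-second run
is the old code and the 60-second run is the new code; the variable
names are illustrative only.

```python
# Figures from the message: 16 page views per run.
views = 16
old_seconds = 70.0  # assumed: old code
new_seconds = 60.0  # assumed: new code

old_rate = views / old_seconds  # page views per second, old code
new_rate = views / new_seconds  # page views per second, new code

# Relative gain in throughput (views per second).
throughput_gain = new_rate / old_rate - 1

# Relative reduction in wall-clock time for the same work.
time_saved = 1 - new_seconds / old_seconds

print(f"throughput gain: {throughput_gain:.1%}")
print(f"time saved:      {time_saved:.1%}")
```

Under that reading, the gain is roughly 17% measured as throughput and
roughly 14% measured as time saved, which brackets the quoted 15%.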

(A note: when you're into this many links, the time it takes to load
the info out of the link tables can actually be a significant chunk of
the render time. More aggressive caching will hopefully render all this
moot soon, though...)

Since I'm insane, I've also rewritten the replaceInternalLinks loop,
which has been pissing me off for a long time. It now lets
secureAndSplit do the link parsing rather than trying to do some of it
itself, so the code should be easier to maintain. The rewrite doesn't
seem to have made a significant impact on speed either way compared
with my initial backport of Tim's bits, but the code's IMHO cleaner and
I fixed some bugs while I was in there:

* initial spaces in the title of a link with a namespace are now
trimmed, so [[Wikipedia:_Oops]] now maps correctly to
[[Wikipedia:Oops]] instead of generating a technically illegal title
with initial whitespace in cur_title.
* 'media' is treated as a localizable pseudo-namespace like 'special',
and can be adjusted in the language files
* media links should now go into the imagelinks table and show up in
the image backlinks
* on the off chance someone makes a link in the form
[[:media:foo.jpg]], the link won't turn into a misguided attempt to
link to an article called "Media:foo.jpg".
* inline language links in the form [[:fr:lien interwiki]] finally work
* certain illegal links that were vanishing from the output completely
are now rendered as plaintext (such as "[[ ]]")

Two other behavior changes that could be reverted if people don't
think they're a good idea:
* on links with the initial colon, the colon now isn't displayed in the
default link text
* 'class="internal"' removed from normal inline links, since it just
wastes bandwidth without doing anything useful

This is all "bug fixes" in stable... I've committed to cvs but haven't
installed it just yet, but unless someone turns up problems in testing
I'll install it tomorrow or so when I've got time to look it over and
babysit the server for a while after it's online.

The new fixes will need merging into the dev branch along with all the
other fixes...

-- brion vibber (brion @ pobox.com)
Re: PHP parser made twice as fast
Brion Vibber wrote:

> * inline language links in the form [[:fr:lien interwiki]] finally work

Is this the same as [[FrWikipedia:lien interwiki]]?


Kurt
Re: PHP parser made twice as fast
Cool. Now that the algorithm is cleaned up, when do we rewrite it in
assembler?

Louis
(who is mostly kidding since the database dips wouldn't go any faster)

Tim Starling wrote:
> I just doubled the speed of the PHP parser.
>
> In my test page ([[Anime]], ~60 links with half broken), I cut the time
> for replaceInternalLinks from 800ms to 350ms, and the time for
> Article::view from 1310 to 610ms.
>
> This was achieved by eliminating redundant calls to secureAndSplit,
> using static variables for constants, and catering to PHP oddities such
> as the fact that === is slower than ==.
>
> Okay, you may shower me with praise now.
>
> -- Tim Starling.
Re: PHP parser made twice as fast
Brion Vibber wrote:
> I've backported at least part of the changes to stable; it does make a
> difference. On a copy of [[List of China-related topics]], with 1994
> broken links and 1 live one, the new code gives about a 15% increase in
> total page load speed -- which is a big difference considering that the
> page takes about 3.5 seconds to render on my 2GHz Athlon with a single
> request as the sole load! 16 of those page views in 60 seconds vs 70
> seconds is a definite improvement. I'm sure there's more tweaking to be
> done...
>
> (A note: when you're into this many links, the time it takes to load the
> info out of the link tables can actually be a significant chunk of the
> render time. More aggressive caching will hopefully render all this moot
> soon, though...)
>
> Since I'm insane, I've also rewritten the replaceInternalLinks loop,
> which has been pissing me off for a long time. It now lets
> secureAndSplit do the link parsing rather than trying to do some of it
> itself, so the code should be easier to maintain.

That's good. I cut down the number of title-parsing operations from 4 or
5 to 2, and you got it from 2 to 1. It probably didn't impact on speed
much because secureAndSplit is slower than the way replaceInternalLinks
was doing it. Optimising secureAndSplit will now have a more pronounced
effect.

Last night, I moved the first wfProfileIn to the top of Setup.php. It
turns out loading code is taking about 30% of the profiled time (for
[[Anime]] again). I was able to reduce that figure a little bit by
making a few more files conditionally included. But my ISP is down so I
couldn't commit it.
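
The idea of conditionally including files can be sketched in Python as
a deferred import: the module is loaded only on the code path that needs
it. This is an analogy under stated assumptions, not the Setup.php
change itself; render and its want_json flag are hypothetical names.

```python
import importlib


def render(text, want_json=False):
    """Return plain text, or a JSON wrapper when explicitly requested.

    The json module is imported only on the branch that needs it, so
    requests that never ask for JSON do not pay its load cost here.
    """
    if want_json:
        # Deferred load: happens on first use, cached by Python afterwards.
        json = importlib.import_module("json")
        return json.dumps({"text": text})
    return text
```

The saving only matters when the deferred code is rarely needed and
costly to load; for modules used on every request it changes nothing.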

Maybe we should try PHPA:

http://www.php-accelerator.co.uk/

> The rewrite doesn't
> seem to have made a significant impact on speed either way compared with
> my initial backport of Tim's bits, but the code's IMHO cleaner and I
> fixed some bugs while I was in there:
>
> * initial spaces in the title of a link with a namespace are now
> trimmed, so [[Wikipedia:_Oops]] now maps correctly to [[Wikipedia:Oops]]
> instead of generating a technically illegal title with initial
> whitespace in cur_title.

What about [[Wikipedia:__Oops]]?

> * 'media' is treated as a localizable pseudo-namespace like 'special',
> and can be adjusted in the language files
> * media links should now go into the imagelinks table and show up in the
> image backlinks
> * on the off chance someone makes a link in the form [[:media:foo.jpg]],
> the link won't turn into a misguided attempt to link to an article
> called "Media:foo.jpg".
> * inline language links in the form [[:fr:lien interwiki]] finally work
> * certain illegal links that were vanishing from the output completely
> are now rendered as plaintext (such as "[[ ]]")
>
> Another behavior change that could be changed back if people don't think
> it's a good idea:
> * on links with the initial colon, the colon now isn't displayed in the
> default link text
> * 'class="internal"' removed from normal inline links, it just wastes
> bandwidth without doing anything useful

All sounds good to me.

-- Tim Starling
Re: Re: PHP parser made twice as fast
On Fri, 24 Oct 2003, Tim Starling wrote:
> Maybe we should try PHPA:
>
> http://www.php-accelerator.co.uk/

We already use it. :)

> > * initial spaces in the title of a link with a namespace are now
> > trimmed, so [[Wikipedia:_Oops]] now maps correctly to [[Wikipedia:Oops]]
> > instead of generating a technically illegal title with initial
> > whitespace in cur_title.
>
> What about [[Wikipedia:__Oops]]?

As far as I remember, a run of spaces is already collapsed into a single
space, so it should be:

'Wikipedia:__Oops' -> 'Wikipedia:_Oops' -> (4, 'Oops')
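
The mapping above can be sketched in Python as follows. This is a
minimal illustration, not the secureAndSplit implementation: split_title
and the NAMESPACES table are hypothetical, only the index 4 for
"Wikipedia" is taken from this message, and using 0 for titles without a
known prefix is an assumption.

```python
import re

# Hypothetical namespace table; only "Wikipedia" -> 4 comes from the thread.
NAMESPACES = {"Wikipedia": 4}


def split_title(raw):
    """Normalize a link target and split it into (namespace index, title)."""
    # Step 1: collapse any run of spaces/underscores into one underscore.
    text = re.sub(r"[ _]+", "_", raw)
    # Step 2: drop underscores at the very start and end.
    text = text.strip("_")
    prefix, sep, rest = text.partition(":")
    if sep and prefix in NAMESPACES:
        # Step 3: trim the underscore left over after the namespace prefix.
        return (NAMESPACES[prefix], rest.strip("_"))
    # Assumption: anything without a known prefix goes in namespace 0.
    return (0, text)


print(split_title("Wikipedia:__Oops"))
```

With the trimming in step 3, inputs with any number of underscores after
the colon end up at the same title.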

-- brion vibber (brion @ pobox.com)
Re: PHP parser made twice as fast
Brion Vibber wrote:
> On Fri, 24 Oct 2003, Tim Starling wrote:
>
>>Maybe we should try PHPA:
>>
>>http://www.php-accelerator.co.uk/
>
>
> We already use it. :)

Ah. I should probably get that on my test system then.

>>>* initial spaces in the title of a link with a namespace are now
>>>trimmed, so [[Wikipedia:_Oops]] now maps correctly to [[Wikipedia:Oops]]
>>>instead of generating a technically illegal title with initial
>>>whitespace in cur_title.
>>
>>What about [[Wikipedia:__Oops]]?
>
>
> As far as I remember, a run of spaces is already collapsed into a single
> space, so it should be:
>
> 'Wikipedia:__Oops' -> 'Wikipedia:_Oops' -> (4, 'Oops')

Okay, I just did a test and it looks like your memory serves you
correctly. Currently, [[Wikipedia:_____Oops]] will take you to a page
titled [[Wikipedia:_Oops]]. I imagine when your latest fix is
implemented, it will become [[Wikipedia:Oops]].

-- ~~~~
Re: PHP parser made twice as fast
Tim Starling wrote:
> Okay, you may shower me with praise now.

Dude, you rock.

I hereby decree, in my usual authoritarian and bossy manner, that
today (10/31) shall forever be known as Tim Starling Day. Wikipedians
of the distant future will marvel at the day when the new parsing
algorithm dawned upon us. Tonight at dinner, every Wikipedian should
say a toast to Tim and his many inventions.

In countries that celebrate Halloween, children will first say "Trick
or Treat" and then, when they get the candy, they will say "Secure and
Split" and run away, in honor of Tim's work in this area.

See also:
http://en.wikipedia.org/wiki/Wikipedia%3AMagnus_Manske_Day

(Some may ask: but when is Brion Vibber day? Ah, but you should know
by now: *every day* is Brion Vibber day!)

--Jimbo