Mailing List Archive

SMP performance degradation with sysbench
Hi lkml,

according to the test below (sysbench) Linux seems to have scalability
problems beyond 8 client threads:
http://jeffr-tech.livejournal.com/6268.html#cutid1
http://jeffr-tech.livejournal.com/5705.html
Hardware is an 8-core amd64 system and jeffr seems willing to try more
Linux versions on that machine.
Anyway, is there anyone who can reproduce this?


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: SMP performance degradation with sysbench [ In reply to ]
Lorenzo Allegrucci wrote:
> Hi lkml,
>
> according to the test below (sysbench) Linux seems to have scalability
> problems beyond 8 client threads:
> http://jeffr-tech.livejournal.com/6268.html#cutid1
> http://jeffr-tech.livejournal.com/5705.html
> Hardware is an 8-core amd64 system and jeffr seems willing to try more
> Linux versions on that machine.
> Anyway, is there anyone who can reproduce this?

I have reproduced it on a quad core test system.

With 4 threads (on 4 cores) I get a high throughput, with
approximately 58% user time and 42% system time.

With 8 threads (on 4 cores) I get way lower throughput,
with 37% user time, 29% system time, and 35% idle time!

The maximum time taken per query also increases from
0.0096s to 0.5273s. Ouch!
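
For anyone who wants to check the user/system/idle split on their own box, here is a minimal sketch in C. It assumes a Linux /proc/stat whose first line starts with "cpu" followed by the user, nice, system and idle counters. The names cpu_times, cpu_split, read_cpu_times and cpu_split_between are made up for this illustration, and it is not how the numbers above were collected.

```c
#include <stdio.h>

/* Cumulative CPU times in jiffies, as on the first "cpu" line of /proc/stat. */
struct cpu_times {
    unsigned long long user, nice, system, idle;
};

/* Percentage of time spent in each state between two samples. */
struct cpu_split {
    double user, system, idle;
};

/* Read the aggregate "cpu" line; returns 0 on success, -1 on failure. */
int read_cpu_times(const char *path, struct cpu_times *t)
{
    FILE *f = fopen(path, "r");
    int n;

    if (!f)
        return -1;
    n = fscanf(f, "cpu %llu %llu %llu %llu",
               &t->user, &t->nice, &t->system, &t->idle);
    fclose(f);
    return n == 4 ? 0 : -1;
}

/* Turn the difference between two samples into percentages.
 * Nice time is counted as user time; iowait, irq etc. are ignored here. */
struct cpu_split cpu_split_between(const struct cpu_times *a,
                                   const struct cpu_times *b)
{
    double user = (double)((b->user - a->user) + (b->nice - a->nice));
    double sys = (double)(b->system - a->system);
    double idle = (double)(b->idle - a->idle);
    double total = user + sys + idle;
    struct cpu_split s = { 0.0, 0.0, 0.0 };

    if (total > 0.0) {
        s.user = 100.0 * user / total;
        s.system = 100.0 * sys / total;
        s.idle = 100.0 * idle / total;
    }
    return s;
}
```

Call read_cpu_times("/proc/stat", ...) once before and once after a benchmark run and pass both samples to cpu_split_between. A large idle share while the run queue should be full is the symptom under discussion.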

I don't know if this is MySQL, glibc or Linux kernel,
but something strange is going on...

--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.
Re: SMP performance degradation with sysbench [ In reply to ]
Rik van Riel wrote:
> Lorenzo Allegrucci wrote:
>
>> Hi lkml,
>>
>> according to the test below (sysbench) Linux seems to have scalability
>> problems beyond 8 client threads:
>> http://jeffr-tech.livejournal.com/6268.html#cutid1
>> http://jeffr-tech.livejournal.com/5705.html
>> Hardware is an 8-core amd64 system and jeffr seems willing to try more
>> Linux versions on that machine.
>> Anyway, is there anyone who can reproduce this?
>
>
> I have reproduced it on a quad core test system.
>
> With 4 threads (on 4 cores) I get a high throughput, with
> approximately 58% user time and 42% system time.
>
> With 8 threads (on 4 cores) I get way lower throughput,
> with 37% user time, 29% system time, and 35% idle time!
>
> The maximum time taken per query also increases from
> 0.0096s to 0.5273s. Ouch!
>
> I don't know if this is MySQL, glibc or Linux kernel,
> but something strange is going on...

Like you, I'm also seeing idle time start going up as threads increase.

I initially thought this was a problem with the multiprocessor scheduler,
because the pattern is exactly like some artificat in the load balancing.

However, after looking at the stats, and testing a couple of things, I
think it may not be after all.

I've reproduced this on an 8-socket/16-way dual core Opteron. So far what
I am seeing is that MySQL is having trouble putting enough load into the
scheduler.

Virtually all of the sleep time is coming from unix_stream_recvmsg, which
seems to be what the client and server threads use to communicate.
There doesn't seem to be any other tell-tale event that the database is
blocking on.

It seems like it might at least partially be a problem with MySQL
thread/connection management.

I found a couple of interesting issues so far. Firstly, the MySQL version
that I'm using (5.0.26-Max) is making lots of calls to sched_setscheduler,
attempting to fiddle with SCHED_OTHER priority in what looks like an
attempt to boost CPU time while holding some resource. All these calls
actually fail, because you cannot change SCHED_OTHER priority like that.
Adding a hack to make it fall through to set_user_nice provides a boost
which eliminates the cliff (but a downward degradation is still there).
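
To illustrate the failing call from userspace, here is a minimal sketch, assuming Linux with glibc. The helper names try_sched_other_priority and set_nice_value are invented for this example. It does not reproduce the kernel hack mentioned above; it only shows that a non-zero priority is rejected for SCHED_OTHER, and that the nice value (via setpriority) is the knob that does exist for ordinary tasks.

```c
#include <errno.h>
#include <sched.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Try to give the calling process a SCHED_OTHER "priority".
 * Returns 0 on success or the errno value on failure. On Linux a
 * non-zero priority is expected to fail with EINVAL, because
 * SCHED_OTHER only accepts sched_priority == 0. */
int try_sched_other_priority(int prio)
{
    struct sched_param sp = { .sched_priority = prio };

    if (sched_setscheduler(0, SCHED_OTHER, &sp) == -1)
        return errno;
    return 0;
}

/* Change the nice value of the calling process instead.
 * Returns 0 on success or the errno value on failure. Raising the
 * nice value needs no privilege; lowering it usually does. */
int set_nice_value(int nice_value)
{
    if (setpriority(PRIO_PROCESS, 0, nice_value) == -1)
        return errno;
    return 0;
}
```

An application that wants a temporary boost would therefore have to go through the nice value (and have the privilege to lower it), not through sched_setscheduler with SCHED_OTHER.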

Secondly, I've raised the thread numbers from 16 to 32 for my system,
which also provides a bit more (although doesn't help the downward
slope).

Combined, it looks like around 30-40% improvement past 16 threads. It
isn't anything like making up for the dropoff seen in the blog link, but
different systems, different mysql version... I wonder how close we are
with this hack in place?

Attached is a graph of my numbers, from 1 to 32 clients. plain = 2.6.20.1,
sched is with the attached sched patch, and thread is with 32 rather than
16 clients.

Anyway, I'll keep experimenting. If anyone from MySQL wants to help look
at this, send me a mail (especially about the sched_setscheduler issue,
where you might be able to do something better).

Nick

--
SUSE Labs, Novell Inc.
Re: SMP performance degradation with sysbench [ In reply to ]
Nick Piggin wrote:
> Rik van Riel wrote:
>
>> Lorenzo Allegrucci wrote:
>>
>>> Hi lkml,
>>>
>>> according to the test below (sysbench) Linux seems to have scalability
>>> problems beyond 8 client threads:
>>> http://jeffr-tech.livejournal.com/6268.html#cutid1
>>> http://jeffr-tech.livejournal.com/5705.html
>>> Hardware is an 8-core amd64 system and jeffr seems willing to try more
>>> Linux versions on that machine.
>>> Anyway, is there anyone who can reproduce this?
>>
>>
>>
>> I have reproduced it on a quad core test system.
>>
>> With 4 threads (on 4 cores) I get a high throughput, with
>> approximately 58% user time and 42% system time.
>>
>> With 8 threads (on 4 cores) I get way lower throughput,
>> with 37% user time, 29% system time, and 35% idle time!
>>
>> The maximum time taken per query also increases from
>> 0.0096s to 0.5273s. Ouch!
>>
>> I don't know if this is MySQL, glibc or Linux kernel,
>> but something strange is going on...
>
>
> Like you, I'm also seeing idle time start going up as threads increase.
>
> I initially thought this was a problem with the multiprocessor scheduler,
> because the pattern is exactly like some artificat in the load balancing.

"artificat"

Wow. I must need some sleep :) Please excuse any other typos!

--
SUSE Labs, Novell Inc.
Re: SMP performance degradation with sysbench [ In reply to ]
On Tue, Feb 27, 2007 at 12:36:04AM +1100, Nick Piggin wrote:
> I found a couple of interesting issues so far. Firstly, the MySQL
> version that I'm using (5.0.26-Max) is making lots of calls to

FYI, MySQL fixed some scalability problems in version 5.0.30, as
mentioned here:

http://www.mysqlperformanceblog.com/2007/01/03/innodb-benchmarks/

It may be worth using more recent sources than 5.0.26 if tracking down
scaling problems in MySQL.

--Pete

----------------------------------
Pete Harlan
ArtSelect, Inc.
harlan@artselect.com
http://www.artselect.com
ArtSelect is a subsidiary of a21, Inc.
Re: SMP performance degradation with sysbench [ In reply to ]
On Mon, Feb 26, 2007 at 04:04:01PM -0600, Pete Harlan wrote:
> On Tue, Feb 27, 2007 at 12:36:04AM +1100, Nick Piggin wrote:
> > I found a couple of interesting issues so far. Firstly, the MySQL
> > version that I'm using (5.0.26-Max) is making lots of calls to
>
> FYI, MySQL fixed some scalability problems in version 5.0.30, as
> mentioned here:
>
> http://www.mysqlperformanceblog.com/2007/01/03/innodb-benchmarks/
>
> It may be worth using more recent sources than 5.0.26 if tracking down
> scaling problems in MySQL.

The blog post that originated this discussion ran tests on 5.0.33.
Not that the mysql version should really matter. The key point here
is that FreeBSD and Linux were running the *same* version, and
FreeBSD was able to handle the situation better somehow.

Dave

--
http://www.codemonkey.org.uk
Re: SMP performance degradation with sysbench [ In reply to ]
Howdy,

MySQL 5.0.26 had some scalability issues, and they have been solved since 5.0.32:
http://ossipedia.ipa.go.jp/capacity/EV0612260303/
(written in Japanese, but you can read the graphs; we compared
5.0.24 vs 5.0.32)

The following is oprofile data
==>
cpu=8-mysql=5.0.32-gcc=3.4/oprofile-eu=2200-op=default-none/opreport-l.txt
<==
CPU: Core Solo / Duo, speed 2666.76 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Unhalted clock cycles) with a unit
mask of 0x00 (Unhalted core cycles) count 100000
samples   %        app name             symbol name
47097502  16.8391  libpthread-2.3.4.so  pthread_mutex_trylock
19636300   7.0207  libpthread-2.3.4.so  pthread_mutex_unlock
18600010   6.6502  mysqld               rec_get_offsets_func
18121328   6.4790  mysqld               btr_search_guess_on_hash
11453095   4.0949  mysqld               row_search_for_mysql

MySQL spends about 16.8% of CPU time in pthread_mutex_trylock, just trying
to get a mutex, on an 8-core machine.

I think there is a lot of room for improvement in the MySQL implementation.

On 2/27/07, Dave Jones <davej@redhat.com> wrote:
> On Mon, Feb 26, 2007 at 04:04:01PM -0600, Pete Harlan wrote:
> > On Tue, Feb 27, 2007 at 12:36:04AM +1100, Nick Piggin wrote:
> > > I found a couple of interesting issues so far. Firstly, the MySQL
> > > version that I'm using (5.0.26-Max) is making lots of calls to
> >
> > FYI, MySQL fixed some scalability problems in version 5.0.30, as
> > mentioned here:
> >
> > http://www.mysqlperformanceblog.com/2007/01/03/innodb-benchmarks/
> >
> > It may be worth using more recent sources than 5.0.26 if tracking down
> > scaling problems in MySQL.
>
> The blog post that originated this discussion ran tests on 5.0.33.
> Not that the mysql version should really matter. The key point here
> is that FreeBSD and Linux were running the *same* version, and
> FreeBSD was able to handle the situation better somehow.
>
> Dave
>
> --
> http://www.codemonkey.org.uk

Regards,
Hiro
--
Hiro Yoshioka
mailto:hyoshiok at miraclelinux.com
Re: SMP performance degradation with sysbench [ In reply to ]
Hiro Yoshioka wrote:
> Howdy,
>
> MySQL 5.0.26 had some scalability issues, and they have been solved since 5.0.32:
> http://ossipedia.ipa.go.jp/capacity/EV0612260303/
> (written in Japanese, but you can read the graphs; we compared
> 5.0.24 vs 5.0.32)
>
> The following is oprofile data
> ==>
> cpu=8-mysql=5.0.32-gcc=3.4/oprofile-eu=2200-op=default-none/opreport-l.txt
> <==
> CPU: Core Solo / Duo, speed 2666.76 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Unhalted clock cycles) with a unit
> mask of 0x00 (Unhalted core cycles) count 100000
> samples   %        app name             symbol name
> 47097502  16.8391  libpthread-2.3.4.so  pthread_mutex_trylock
> 19636300   7.0207  libpthread-2.3.4.so  pthread_mutex_unlock
> 18600010   6.6502  mysqld               rec_get_offsets_func
> 18121328   6.4790  mysqld               btr_search_guess_on_hash
> 11453095   4.0949  mysqld               row_search_for_mysql
>
> MySQL spends about 16.8% of CPU time in pthread_mutex_trylock, just trying
> to get a mutex, on an 8-core machine.
>
> I think there is a lot of room for improvement in the MySQL implementation.

That's one aspect.

The other aspect of the problem is that when the number of
threads exceeds the number of CPU cores, Linux no longer
manages to keep the CPUs busy and we get a lot of idle time.

On the other hand, with the number of threads being equal to
the number of CPU cores, we are 100% CPU bound...

--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.
Re: SMP performance degradation with sysbench [ In reply to ]
Hi,

From: Rik van Riel <riel@redhat.com>
> Hiro Yoshioka wrote:
> > Howdy,
> >
> > MySQL 5.0.26 had some scalability issues, and they have been solved since 5.0.32:
> > http://ossipedia.ipa.go.jp/capacity/EV0612260303/
> > (written in Japanese, but you can read the graphs; we compared
> > 5.0.24 vs 5.0.32)
snip
> > MySQL spends about 16.8% of CPU time in pthread_mutex_trylock, just trying
> > to get a mutex, on an 8-core machine.
> >
> > I think there is a lot of room for improvement in the MySQL implementation.
>
> That's one aspect.
>
> The other aspect of the problem is that when the number of
> threads exceeds the number of CPU cores, Linux no longer
> manages to keep the CPUs busy and we get a lot of idle time.
>
> On the other hand, with the number of threads being equal to
> the number of CPU cores, we are 100% CPU bound...

I have a question. If so, what is the difference, in the kernel's
view, between SMP and CPU cores?

Another question. When the number of threads exceeds the number of
CPU cores, we may get a lot of idle time. So is a workaround for
MySQL simply not to create more threads than the number
of CPU cores? Is that right?

Regards,
Hiro
--
Hiro Yoshioka
CTO/Miracle Linux Corporation
http://blog.miraclelinux.com/yume/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: SMP performance degradation with sysbench [ In reply to ]
Hiro Yoshioka wrote:
> Hi,
>
> From: Rik van Riel <riel@redhat.com>
>> Hiro Yoshioka wrote:
>>> Howdy,
>>>
>>> MySQL 5.0.26 had some scalability issues, and they have been solved since 5.0.32:
>>> http://ossipedia.ipa.go.jp/capacity/EV0612260303/
>>> (written in Japanese, but you can read the graphs; we compared
>>> 5.0.24 vs 5.0.32)
> snip
>>> MySQL spends about 16.8% of CPU time in pthread_mutex_trylock, just trying
>>> to get a mutex, on an 8-core machine.
>>>
>>> I think there is a lot of room for improvement in the MySQL implementation.
>> That's one aspect.
>>
>> The other aspect of the problem is that when the number of
>> threads exceeds the number of CPU cores, Linux no longer
>> manages to keep the CPUs busy and we get a lot of idle time.
>>
>> On the other hand, with the number of threads being equal to
>> the number of CPU cores, we are 100% CPU bound...
>
> I have a question. If so, what is the difference, in the kernel's
> view, between SMP and CPU cores?

None. Each schedulable entity (whether a fully fledged
CPU core or an SMT/HT thread) is treated the same.
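
A quick way to see what that means from userspace is to ask how many CPUs are online. The sketch below assumes glibc, where _SC_NPROCESSORS_ONLN is available as an extension; the function name online_logical_cpus is made up for this example. The number returned counts every core and every SMT/HT sibling as one CPU each.

```c
#include <unistd.h>

/* Number of CPUs currently online as the system reports them.
 * A full core and an SMT/HT sibling each count as one.
 * Returns -1 if the value cannot be determined. */
long online_logical_cpus(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}
```

On a machine with two hyperthreaded cores this would be expected to report 4, the same as a four-core machine without HT.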

> Another question. When the number of threads exceeds the number of
> CPU cores, we may get a lot of idle time. So is a workaround for
> MySQL simply not to create more threads than the number
> of CPU cores? Is that right?

Not really, that would make it impossible for MySQL to
handle more simultaneous database queries than the system
has CPUs.

Besides, it looks like this is not a problem in MySQL
per se (it works on FreeBSD) but some bug in Linux.

--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.
Re: SMP performance degradation with sysbench [ In reply to ]
On Mon, 26 Feb 2007 23:31:29 -0500, Rik van Riel <riel@redhat.com> wrote:

> Hiro Yoshioka wrote:
> > Hi,
> >
> > From: Rik van Riel <riel@redhat.com>
> >> Hiro Yoshioka wrote:
> >>> Howdy,
> >>>
> >>> MySQL 5.0.26 had some scalability issues, and they have been solved since 5.0.32:
> >>> http://ossipedia.ipa.go.jp/capacity/EV0612260303/
> >>> (written in Japanese, but you can read the graphs; we compared
> >>> 5.0.24 vs 5.0.32)
> > snip
> >>> MySQL spends about 16.8% of CPU time in pthread_mutex_trylock, just trying
> >>> to get a mutex, on an 8-core machine.
> >>>
> >>> I think there is a lot of room for improvement in the MySQL implementation.
> >> That's one aspect.
> >>
> >> The other aspect of the problem is that when the number of
> >> threads exceeds the number of CPU cores, Linux no longer
> >> manages to keep the CPUs busy and we get a lot of idle time.
> >>
> >> On the other hand, with the number of threads being equal to
> >> the number of CPU cores, we are 100% CPU bound...
> >
> > I have a question. If so, what is the difference, in the kernel's
> > view, between SMP and CPU cores?
>
> None. Each schedulable entity (whether a fully fledged
> CPU core or an SMT/HT thread) is treated the same.
>

And what are the SMT and Multi-Core scheduling options in the kernel
config for? Because of this thread I re-read the help text, and
it looks like one could de-select the SMT scheduler option, still get a
working SMP system, and see what difference? I suppose it's related
to migration and cache flushing and so on, but where could I get
more details?
And, stranger still, what is the difference between multi-core and
normal SMP configs?

> > Another question. When the number of threads exceeds the number of
> > CPU cores, we may get a lot of idle time. So is a workaround for
> > MySQL simply not to create more threads than the number
> > of CPU cores? Is that right?
>
> Not really, that would make it impossible for MySQL to
> handle more simultaneous database queries than the system
> has CPUs.
>

I don't know MySQL internals, but you assume one thread per query.
What if it's more like Apache, with one long-lived thread for several connections?
Isn't answering 4+4 queries the same as answering 8 at half the speed?

> Besides, it looks like this is not a problem in MySQL
> per se (it works on FreeBSD) but some bug in Linux.
>


--
J.A. Magallon <jamagallon()ono!com> \ Software is like sex:
\ It's better when it's free
Mandriva Linux release 2007.1 (Cooker) for i586
Linux 2.6.19-jam07 (gcc 4.1.2 20070115 (prerelease) (4.1.2-0.20070115.1mdv2007.1)) #2 SMP PREEMPT
Re: SMP performance degradation with sysbench [ In reply to ]
J.A. Magallón wrote:
> On Mon, 26 Feb 2007 23:31:29 -0500, Rik van Riel <riel@redhat.com> wrote:
>
>> Hiro Yoshioka wrote:

>>> Another question. When the number of threads exceeds the number of
>>> CPU cores, we may get a lot of idle time. So is a workaround for
>>> MySQL simply not to create more threads than the number
>>> of CPU cores? Is that right?
>> Not really, that would make it impossible for MySQL to
>> handle more simultaneous database queries than the system
>> has CPUs.
>>
>
> I don't know MySQL internals, but you assume one thread per query.
> What if it's more like Apache, with one long-lived thread for several connections?

Yes, they are longer lived client connections. One thread
per connection, just like Apache.

> Isn't answering 4+4 queries the same as answering 8 at half the speed?

That still doesn't fix the potential Linux problem that this
benchmark identified.

To clarify: I don't care as much about MySQL performance as
I care about identifying and fixing this potential bug in
Linux.

--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.
Re: SMP performance degradation with sysbench [ In reply to ]
Rik van Riel wrote:
> J.A. Magallón wrote:
>>[...]
>> Isn't answering 4+4 queries the same as answering 8 at half the speed?
>
> That still doesn't fix the potential Linux problem that this
> benchmark identified.
>
> To clarify: I don't care as much about MySQL performance as
> I care about identifying and fixing this potential bug in
> Linux.

IIRC a long time ago there was a change in the scheduler to prevent a
low prio task running on a sibling of a hyperthreaded processor from
slowing down a higher prio task on another sibling of the same processor.

Basically the scheduler would put the low prio task to sleep during an
adequate task slice to allow the other sibling to run at full speed for
a while.

I don't know the scheduler code well enough, but comments like this one
make me think that the change is still in place:

> /*
>  * If an SMT sibling task has been put to sleep for priority
>  * reasons reschedule the idle task to see if it can now run.
>  */
> if (rq->nr_running) {
>         resched_task(rq->idle);
>         ret = 1;
> }

If that is the case, turning off CONFIG_SCHED_SMT would solve the problem.

--
Paulo Marques - www.grupopie.com

"The face of a child can say it all, especially the
mouth part of the face."
Re: SMP performance degradation with sysbench [ In reply to ]
On Tue, 2007-02-27 at 09:02 -0500, Rik van Riel wrote:
> J.A. Magallón wrote:
> > On Mon, 26 Feb 2007 23:31:29 -0500, Rik van Riel <riel@redhat.com> wrote:
> >
> >> Hiro Yoshioka wrote:
>
> >>> Another question. When the number of threads exceeds the number of
> >>> CPU cores, we may get a lot of idle time. So is a workaround for
> >>> MySQL simply not to create more threads than the number
> >>> of CPU cores? Is that right?
> >> Not really, that would make it impossible for MySQL to
> >> handle more simultaneous database queries than the system
> >> has CPUs.
> >>
> >
> > I don't know MySQL internals, but you assume one thread per query.
> > What if it's more like Apache, with one long-lived thread for several connections?
>
> Yes, they are longer lived client connections. One thread
> per connection, just like Apache.
>
> > Isn't answering 4+4 queries the same as answering 8 at half the speed?
>
> That still doesn't fix the potential Linux problem that this
> benchmark identified.
>
> To clarify: I don't care as much about MySQL performance as
> I care about identifying and fixing this potential bug in
> Linux.

At http://people.freebsd.org/~kris/scaling/mysql.html Kris Kennaway
describes a patch for FreeBSD 7 that addresses poor scalability
of file descriptor locking, and says it is responsible for almost all
of the performance and scaling improvements.


Re: SMP performance degradation with sysbench [ In reply to ]
On 2/27/07, Paulo Marques <pmarques@grupopie.com> wrote:
> Rik van Riel wrote:
> > J.A. Magallón wrote:
> >>[...]
> >> Isn't answering 4+4 queries the same as answering 8 at half the speed?
> >
> > That still doesn't fix the potential Linux problem that this
> > benchmark identified.
> >
> > To clarify: I don't care as much about MySQL performance as
> > I care about identifying and fixing this potential bug in
> > Linux.
>
> IIRC a long time ago there was a change in the scheduler to prevent a
> low prio task running on a sibling of a hyperthreaded processor from
> slowing down a higher prio task on another sibling of the same processor.
>
> Basically the scheduler would put the low prio task to sleep during an
> adequate task slice to allow the other sibling to run at full speed for
> a while.
>
> I don't know the scheduler code well enough, but comments like this one
> make me think that the change is still in place:

<snip>

> If that is the case, turning off CONFIG_SCHED_SMT would solve the problem.

To chime in here, I was attempting to reproduce this on an 8-way Xeon
box (4 dual-core). With SCHED_SMT and SCHED_MC on, I saw scaling issues
above 4 threads (4 threads was the peak), to the point where I
couldn't break 1000 transactions per second. Turning both off (with
2.6.20.1) gives much better performance through 16 threads. I am now
running the cases from 17 to 32 threads to see if I can reproduce the
problem at hand. I'll regenerate my data and post numbers soon.

I don't know if anyone else has those on in their kernel .config, but
I'd suggest turning them off, as Paulo said.
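
For checking whether those options are on, a small sketch of a helper that scans the text of a kernel .config is below. The function name config_option_enabled is invented for this example, and where the config text comes from (a file under /boot, /proc/config.gz, the build tree) depends on the distro and kernel, so that part is left to the caller.

```c
#include <string.h>

/* Return 1 if `option` (e.g. "CONFIG_SCHED_SMT") is set to y in the
 * given .config text, 0 otherwise. A disabled option normally shows up
 * as "# CONFIG_FOO is not set", which does not match. */
int config_option_enabled(const char *config_text, const char *option)
{
    size_t len = strlen(option);
    const char *line = config_text;

    while (line && *line) {
        /* The option name must start the line and be followed by "=y". */
        if (strncmp(line, option, len) == 0 &&
            strncmp(line + len, "=y", 2) == 0) {
            char end = line[len + 2];

            if (end == '\n' || end == '\r' || end == '\0')
                return 1;
        }
        line = strchr(line, '\n');
        if (line)
            line++; /* step past the newline to the next line */
    }
    return 0;
}
```

Calling it once for "CONFIG_SCHED_SMT" and once for "CONFIG_SCHED_MC" is enough to tell whether a given test kernel matches the setup described above.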

Thanks,
Nish
Re: SMP performance degradation with sysbench [ In reply to ]
Hiro Yoshioka wrote:
> Howdy,
>
> MySQL 5.0.26 had some scalability issues, and they have been solved since 5.0.32:
> http://ossipedia.ipa.go.jp/capacity/EV0612260303/
> (written in Japanese, but you can read the graphs; we compared
> 5.0.24 vs 5.0.32)
>
> The following is oprofile data
> ==>
> cpu=8-mysql=5.0.32-gcc=3.4/oprofile-eu=2200-op=default-none/opreport-l.txt
> <==
> CPU: Core Solo / Duo, speed 2666.76 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Unhalted clock cycles) with a unit
> mask of 0x00 (Unhalted core cycles) count 100000
> samples   %        app name             symbol name
> 47097502  16.8391  libpthread-2.3.4.so  pthread_mutex_trylock
> 19636300   7.0207  libpthread-2.3.4.so  pthread_mutex_unlock
> 18600010   6.6502  mysqld               rec_get_offsets_func
> 18121328   6.4790  mysqld               btr_search_guess_on_hash
> 11453095   4.0949  mysqld               row_search_for_mysql
>
> MySQL spends about 16.8% of CPU time in pthread_mutex_trylock, just trying
> to get a mutex, on an 8-core machine.

Curious that it calls pthread_mutex_trylock (as opposed to
pthread_mutex_lock) so often. Maybe they're doing some kind of mutex
lock busy-looping?
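
For readers unfamiliar with the pattern, this is roughly what a trylock busy-loop looks like. It is only a sketch of the general technique, not MySQL's actual code; the function name spin_then_lock and the max_spins parameter are made up here.

```c
#include <pthread.h>

/* Spin on pthread_mutex_trylock for up to `max_spins` attempts, then
 * fall back to a blocking pthread_mutex_lock. Returns the number of
 * failed trylock attempts, or -1 if the blocking lock itself failed.
 * Unless -1 is returned, the caller holds the mutex on return. */
int spin_then_lock(pthread_mutex_t *m, int max_spins)
{
    int failed = 0;

    while (failed < max_spins) {
        if (pthread_mutex_trylock(m) == 0)
            return failed;
        failed++; /* each failed attempt is pure CPU burn */
    }
    if (pthread_mutex_lock(m) != 0)
        return -1;
    return failed;
}
```

Under contention every waiter keeps a CPU busy in the loop instead of sleeping, which is one way pthread_mutex_trylock could end up at the top of a profile like the one quoted above.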

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/

Re: SMP performance degradation with sysbench [ In reply to ]
On 2/26/07, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> Rik van Riel wrote:
> > Lorenzo Allegrucci wrote:
> >
> >> Hi lkml,
> >>
> >> according to the test below (sysbench) Linux seems to have scalability
> >> problems beyond 8 client threads:
> >> http://jeffr-tech.livejournal.com/6268.html#cutid1
> >> http://jeffr-tech.livejournal.com/5705.html
> >> Hardware is an 8-core amd64 system and jeffr seems willing to try more
> >> Linux versions on that machine.
> >> Anyway, is there anyone who can reproduce this?
> >
> >
> > I have reproduced it on a quad core test system.
> >
> > With 4 threads (on 4 cores) I get a high throughput, with
> > approximately 58% user time and 42% system time.
> >
> > With 8 threads (on 4 cores) I get way lower throughput,
> > with 37% user time, 29% system time, and 35% idle time!
> >
> > The maximum time taken per query also increases from
> > 0.0096s to 0.5273s. Ouch!
> >
> > I don't know if this is MySQL, glibc or Linux kernel,
> > but something strange is going on...
>
> Like you, I'm also seeing idle time start going up as threads increase.
>
> I initially thought this was a problem with the multiprocessor scheduler,
> because the pattern is exactly like some artificat in the load balancing.
>
> However, after looking at the stats, and testing a couple of things, I
> think it may not be after all.
>
> I've reproduced this on an 8-socket/16-way dual core Opteron. So far what
> I am seeing is that MySQL is having trouble putting enough load into the
> scheduler.

Here are some graphs from the 4-socket/8-way Xeon box (no SMT, no MC
in .config) I posted about earlier.

transactions.png resembles Nick's results pretty closely, in that a
drop-off occurs, at the same # of threads, too. That seems weird to
me, but I haven't thought about it too closely. Shouldn't Nick's be
dropping off closer to 16 threads (that would be 1 per core, then,
right?)

idle.png is the average % idle according to sar over each run from 1
to 32 threads. This appears to confirm what Rik was seeing.

Not sure if my data is hurting or helping, but this box remains
available for further tests.

Thanks,
Nish
Re: SMP performance degradation with sysbench [ In reply to ]
From: Robert Hancock <hancockr@shaw.ca>
Subject: Re: SMP performance degradation with sysbench
Date: Tue, 27 Feb 2007 18:20:25 -0600
Message-ID: <45E4CAC9.4070504@shaw.ca>

> Hiro Yoshioka wrote:
> > Howdy,
> >
> > MySQL 5.0.26 had some scalability issues, and they have been solved since 5.0.32:
> > http://ossipedia.ipa.go.jp/capacity/EV0612260303/
> > (written in Japanese, but you can read the graphs; we compared
> > 5.0.24 vs 5.0.32)
> >
> > The following is oprofile data
> > ==>
> > cpu=8-mysql=5.0.32-gcc=3.4/oprofile-eu=2200-op=default-none/opreport-l.txt
> > <==
> > CPU: Core Solo / Duo, speed 2666.76 MHz (estimated)
> > Counted CPU_CLK_UNHALTED events (Unhalted clock cycles) with a unit
> > mask of 0x00 (Unhalted core cycles) count 100000
> > samples   %        app name             symbol name
> > 47097502  16.8391  libpthread-2.3.4.so  pthread_mutex_trylock
> > 19636300   7.0207  libpthread-2.3.4.so  pthread_mutex_unlock
> > 18600010   6.6502  mysqld               rec_get_offsets_func
> > 18121328   6.4790  mysqld               btr_search_guess_on_hash
> > 11453095   4.0949  mysqld               row_search_for_mysql
> >
> > MySQL spends about 16.8% of CPU time in pthread_mutex_trylock, just trying
> > to get a mutex, on an 8-core machine.
>
> Curious that it calls pthread_mutex_trylock (as opposed to
> pthread_mutex_lock) so often. Maybe they're doing some kind of mutex
> lock busy-looping?

Yes, it is.

Regards,
Hiro
Re: SMP performance degradation with sysbench [ In reply to ]
Paulo Marques wrote:
> Rik van Riel wrote:
>> J.A. Magallón wrote:
>>> [...]
>>> Isn't answering 4+4 queries the same as answering 8 at half the speed?
>>
>> That still doesn't fix the potential Linux problem that this
>> benchmark identified.
>>
>> To clarify: I don't care as much about MySQL performance as
>> I care about identifying and fixing this potential bug in
>> Linux.
>
> IIRC a long time ago there was a change in the scheduler to prevent a
> low prio task running on a sibling of a hyperthreaded processor from
> slowing down a higher prio task on another sibling of the same processor.
>
> Basically the scheduler would put the low prio task to sleep during an
> adequate task slice to allow the other sibling to run at full speed for
> a while.
>
> I don't know the scheduler code well enough, but comments like this one
> make me think that the change is still in place:
>
>> /*
>>  * If an SMT sibling task has been put to sleep for priority
>>  * reasons reschedule the idle task to see if it can now run.
>>  */
>> if (rq->nr_running) {
>> 	resched_task(rq->idle);
>> 	ret = 1;
>> }
>
> If that is the case, turning off CONFIG_SCHED_SMT would solve the problem.
>
That may be the case, but in my opinion, even if this helps, it doesn't
"solve" the problem, because the real problem is that a process which is
not on an HT sibling is being treated as if it were.

Note that Intel does make multicore HT processors, and hopefully when
this code works as intended it will result in more total throughput. My
supposition is that it currently is NOT working as intended, since
disabling SMT scheduling is reported to help.

A test with MC on and SMT off would be informative for where to look next.

--
Bill Davidsen <davidsen@tmr.com>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot

Re: SMP performance degradation with sysbench [ In reply to ]
Nish Aravamudan wrote:
> On 2/26/07, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>
>> Rik van Riel wrote:
>> > Lorenzo Allegrucci wrote:
>> >
>> >> Hi lkml,
>> >>
>> >> according to the test below (sysbench) Linux seems to have scalability
>> >> problems beyond 8 client threads:
>> >> http://jeffr-tech.livejournal.com/6268.html#cutid1
>> >> http://jeffr-tech.livejournal.com/5705.html
>> >> Hardware is an 8-core amd64 system and jeffr seems willing to try more
>> >> Linux versions on that machine.
>> >> Anyway, is there anyone who can reproduce this?
>> >
>> >
>> > I have reproduced it on a quad core test system.
>> >
>> > With 4 threads (on 4 cores) I get a high throughput, with
>> > approximately 58% user time and 42% system time.
>> >
>> > With 8 threads (on 4 cores) I get way lower throughput,
>> > with 37% user time, 29% system time and 35% idle time!
>> >
>> > The maximum time taken per query also increases from
>> > 0.0096s to 0.5273s. Ouch!
>> >
>> > I don't know if this is MySQL, glibc or Linux kernel,
>> > but something strange is going on...
>>
>> Like you, I'm also seeing idle time start going up as threads increase.
>>
>> I initially thought this was a problem with the multiprocessor scheduler,
>> because the pattern is exactly like some artifact in the load balancing.
>>
>> However, after looking at the stats, and testing a couple of things, I
>> think it may not be after all.
>>
>> I've reproduced this on an 8-socket/16-way dual core Opteron. So far what
>> I am seeing is that MySQL is having trouble putting enough load into the
>> scheduler.
>
>
> Here are some graphs from the 4-socket/8-way Xeon box (no SMT, no MC
> in .config) I posted about earlier.
>
> transactions.png resembles Nick's results pretty closely, in that a
> drop-off occurs, at the same # of threads, too. That seems weird to
> me, but I haven't thought about it too closely. Shouldn't Nick's be
> dropping off closer to 16 threads (that would be 1 per core, then,
> right?)

I don't think it is exactly a matter of processes >= cores, but rather
just a general problem at higher concurrency.

--
SUSE Labs, Novell Inc.
Re: SMP performance degradation with sysbench [ In reply to ]
On 2/27/07, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> Nish Aravamudan wrote:
> > On 2/26/07, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> >
> >> Rik van Riel wrote:
> >> > Lorenzo Allegrucci wrote:
> >> >
> >> >> Hi lkml,
> >> >>
> >> >> according to the test below (sysbench) Linux seems to have scalability
> >> >> problems beyond 8 client threads:
> >> >> http://jeffr-tech.livejournal.com/6268.html#cutid1
> >> >> http://jeffr-tech.livejournal.com/5705.html
> >> >> Hardware is an 8-core amd64 system and jeffr seems willing to try more
> >> >> Linux versions on that machine.
> >> >> Anyway, is there anyone who can reproduce this?
> >> >
> >> >
> >> > I have reproduced it on a quad core test system.
> >> >
> >> > With 4 threads (on 4 cores) I get a high throughput, with
> >> > approximately 58% user time and 42% system time.
> >> >
> >> > With 8 threads (on 4 cores) I get way lower throughput,
> >> > with 37% user time, 29% system time and 35% idle time!
> >> >
> >> > The maximum time taken per query also increases from
> >> > 0.0096s to 0.5273s. Ouch!
> >> >
> >> > I don't know if this is MySQL, glibc or Linux kernel,
> >> > but something strange is going on...
> >>
> >> Like you, I'm also seeing idle time start going up as threads increase.
> >>
> >> I initially thought this was a problem with the multiprocessor scheduler,
> >> because the pattern is exactly like some artifact in the load balancing.
> >>
> >> However, after looking at the stats, and testing a couple of things, I
> >> think it may not be after all.
> >>
> >> I've reproduced this on an 8-socket/16-way dual core Opteron. So far what
> >> I am seeing is that MySQL is having trouble putting enough load into the
> >> scheduler.
> >
> >
> > Here are some graphs from the 4-socket/8-way Xeon box (no SMT, no MC
> > in .config) I posted about earlier.
> >
> > transactions.png resembles Nick's results pretty closely, in that a
> > drop-off occurs, at the same # of threads, too. That seems weird to
> > me, but I haven't thought about it too closely. Shouldn't Nick's be
> > dropping off closer to 16 threads (that would be 1 per core, then,
> > right?)
>
> I don't think it is exactly a matter of processes >= cores, but rather
> just a general problem at higher concurrency.

Ok, thanks for the clarification.

-Nish
Re: SMP performance degradation with sysbench [ In reply to ]
On 2/27/07, Bill Davidsen <davidsen@tmr.com> wrote:
> Paulo Marques wrote:
> > Rik van Riel wrote:
> >> J.A. Magallón wrote:
> >>> [...]
> >>> It's the same to answer 4+4 queries as 8 at half the speed, isn't it?
> >>
> >> That still doesn't fix the potential Linux problem that this
> >> benchmark identified.
> >>
> >> To clarify: I don't care as much about MySQL performance as
> >> I care about identifying and fixing this potential bug in
> >> Linux.
> >
> > IIRC a long time ago there was a change in the scheduler to prevent a
> > low prio task running on one sibling of a hyperthreaded processor from
> > slowing down a higher prio task on another sibling of the same processor.
> >
> > Basically the scheduler would put the low prio task to sleep during an
> > adequate task slice to allow the other sibling to run at full speed for
> > a while.
> >
> > I don't know the scheduler code well enough, but comments like this one
> > make me think that the change is still in place:
> >
> >> /*
> >> * If an SMT sibling task has been put to sleep for priority
> >> * reasons reschedule the idle task to see if it can now run.
> >> */
> >> if (rq->nr_running) {
> >> resched_task(rq->idle);
> >> ret = 1;
> >> }
> >
> > If that is the case, turning off CONFIG_SCHED_SMT would solve the problem.
> >
> That may be the case, but in my opinion, even if this helps, it doesn't
> "solve" the problem, because the real problem is that a process which is
> not on an HT sibling is being treated as if it were.
>
> Note that Intel does make multicore HT processors, and hopefully when
> this code works as intended it will result in more total throughput. My
> supposition is that it currently is NOT working as intended, since
> disabling SMT scheduling is reported to help.

It does help, but we still drop off, clearly. Also, that's my
baseline, so I'm not able to reproduce the *sharp* dropoff from the
blog post yet.

> A test with MC on and SMT off would be informative for where to look next.

I'm rebooting my box with 2.6.20.1 and exactly this setup now.

Thanks,
Nish
Re: SMP performance degradation with sysbench [ In reply to ]
On 2/27/07, Nish Aravamudan <nish.aravamudan@gmail.com> wrote:
> On 2/27/07, Bill Davidsen <davidsen@tmr.com> wrote:
> > Paulo Marques wrote:
> > > Rik van Riel wrote:
> > >> J.A. Magallón wrote:
> > >>> [...]
> > >>> It's the same to answer 4+4 queries as 8 at half the speed, isn't it?
> > >>
> > >> That still doesn't fix the potential Linux problem that this
> > >> benchmark identified.
> > >>
> > >> To clarify: I don't care as much about MySQL performance as
> > >> I care about identifying and fixing this potential bug in
> > >> Linux.
> > >
> > > IIRC a long time ago there was a change in the scheduler to prevent a
> > > low prio task running on one sibling of a hyperthreaded processor from
> > > slowing down a higher prio task on another sibling of the same processor.
> > >
> > > Basically the scheduler would put the low prio task to sleep during an
> > > adequate task slice to allow the other sibling to run at full speed for
> > > a while.
<snip>
> > > If that is the case, turning off CONFIG_SCHED_SMT would solve the problem.
<snip>
> > Note that Intel does make multicore HT processors, and hopefully when
> > this code works as intended it will result in more total throughput. My
> > supposition is that it currently is NOT working as intended, since
> > disabling SMT scheduling is reported to help.
>
> It does help, but we still drop off, clearly. Also, that's my
> baseline, so I'm not able to reproduce the *sharp* dropoff from the
> blog post yet.
>
> > A test with MC on and SMT off would be informative for where to look next.
>
> I'm rebooting my box with 2.6.20.1 and exactly this setup now.

Here are the results:

idle.png: average % idle over 120s runs from 1 to 32 threads
transactions.png: TPS over 120s runs from 1 to 32 threads

Hope the data is useful. All I can conclude right now is that SMT
appears to help (contradicting what I said earlier), but that MC seems
to have no effect (or no substantial effect).
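
For anyone wanting to produce comparable numbers: one common way to get an average idle percentage over a run is to sample the aggregate "cpu" line of /proc/stat before and after the run and compare the deltas. The sketch below assumes that approach. It is not necessarily how idle.png was generated, and the struct and function names are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Aggregate CPU time counters from the first line of /proc/stat, in ticks. */
struct cpu_sample {
    unsigned long long user, nice, system, idle, iowait, irq, softirq, steal;
};

/* Parse the aggregate "cpu ..." line (not the per-CPU "cpu0 ..." lines).
 * Returns 0 on success, -1 if the line does not look like the summary. */
int parse_cpu_line(const char *line, struct cpu_sample *s)
{
    int n;

    if (strncmp(line, "cpu ", 4) != 0)
        return -1;
    /* Trailing fields may be missing on older kernels, so default them to 0. */
    s->iowait = s->irq = s->softirq = s->steal = 0;
    n = sscanf(line, "cpu %llu %llu %llu %llu %llu %llu %llu %llu",
               &s->user, &s->nice, &s->system, &s->idle,
               &s->iowait, &s->irq, &s->softirq, &s->steal);
    return n >= 4 ? 0 : -1;
}

/* Read the current aggregate counters from /proc/stat. */
int read_cpu_sample(struct cpu_sample *s)
{
    char line[256];
    FILE *f = fopen("/proc/stat", "r");
    int rc = -1;

    if (f == NULL)
        return -1;
    if (fgets(line, sizeof line, f) != NULL)
        rc = parse_cpu_line(line, s);
    fclose(f);
    return rc;
}

static unsigned long long total_ticks(const struct cpu_sample *s)
{
    return s->user + s->nice + s->system + s->idle +
           s->iowait + s->irq + s->softirq + s->steal;
}

/* Percentage of all CPU time between two samples that was spent idle.
 * iowait is counted as non-idle here, which is a choice, not a rule. */
double idle_percent(const struct cpu_sample *before,
                    const struct cpu_sample *after)
{
    unsigned long long dt = total_ticks(after) - total_ticks(before);

    if (dt == 0)
        return 0.0;
    return 100.0 * (double)(after->idle - before->idle) / (double)dt;
}
```

Taking one sample with read_cpu_sample() before starting sysbench and one after it finishes, then passing both to idle_percent(), yields a single average figure per run.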

Thanks,
Nish
Re: SMP performance degradation with sysbench [ In reply to ]
On Tue, 2007-02-27 at 20:05 +0100, Lorenzo Allegrucci wrote:
> On Tue, 2007-02-27 at 09:02 -0500, Rik van Riel wrote:
> > That still doesn't fix the potential Linux problem that this
> > benchmark identified.
> >
> > To clarify: I don't care as much about MySQL performance as
> > I care about identifying and fixing this potential bug in
> > Linux.
>
> At http://people.freebsd.org/~kris/scaling/mysql.html Kris Kennaway
> describes a patch for FreeBSD 7 that addresses the poor scalability
> of file descriptor locking, and says it is responsible for almost all
> of the performance and scaling improvements.

How does Linux scale when many threads contend for a file descriptor
lock?
Has anyone tried running the test under oprofile?


Re: SMP performance degradation with sysbench [ In reply to ]
Hi Nick,

> Anyway, I'll keep experimenting. If anyone from MySQL wants to help look
> at this, send me a mail (eg. especially with the sched_setscheduler issue,
> you might be able to do something better).

I took a look at this today and figured I'd document it:

http://ozlabs.org/~anton/linux/sysbench/

Bottom line: it looks like the problem is in glibc's malloc; replacing
it with Google's malloc library (tcmalloc) fixes the negative scaling:

# apt-get install libgoogle-perftools0
# LD_PRELOAD=/usr/lib/libtcmalloc.so /usr/sbin/mysqld
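
To poke at the allocator outside of MySQL, a tiny multi-threaded malloc/free loop can be run with and without the LD_PRELOAD line above. The following is only a sketch under that assumption: malloc_churn, the allocation sizes and the thread cap are hypothetical, and this is not the benchmark used for the results on the page linked above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MAX_THREADS 64

struct churn_arg {
    long iters;   /* allocations to attempt */
    long done;    /* allocations completed */
};

/* Repeatedly allocate and free small blocks of varying size. */
void *churn_worker(void *p)
{
    struct churn_arg *a = p;
    long i;

    for (i = 0; i < a->iters; i++) {
        size_t sz = 16 + (size_t)(i % 64) * 8;   /* 16..520 bytes */
        char *buf = malloc(sz);

        if (buf == NULL)
            break;
        /* Volatile store so the compiler cannot drop the malloc/free pair. */
        ((volatile char *)buf)[0] = (char)i;
        free(buf);
        a->done++;
    }
    return NULL;
}

/* Run nthreads workers in parallel; return the total number of
 * malloc/free rounds completed across all threads. */
long malloc_churn(int nthreads, long iters)
{
    pthread_t tid[MAX_THREADS];
    struct churn_arg arg[MAX_THREADS];
    long total = 0;
    int i;

    assert(nthreads >= 1 && nthreads <= MAX_THREADS);
    for (i = 0; i < nthreads; i++) {
        int rc;

        arg[i].iters = iters;
        arg[i].done = 0;
        rc = pthread_create(&tid[i], NULL, churn_worker, &arg[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (i = 0; i < nthreads; i++) {
        pthread_join(tid[i], NULL);
        total += arg[i].done;
    }
    return total;
}
```

Calling malloc_churn(nthreads, iters) from a small main() and timing it for increasing thread counts, once with the default allocator and once with LD_PRELOAD=/usr/lib/libtcmalloc.so, shows whether the allocator alone scales differently on a given box.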

Anton