Mailing List Archive

OOM notifications
Hi,

AIX contains the SIGDANGER signal to notify applications to free up some
unused cached memory:

http://www.ussg.iu.edu/hypermail/linux/kernel/0007.0/0901.html

There have been a few discussions on implementing such an idea on Linux,
but nothing concrete has been achieved.

On the kernel side Rik suggested two notification points: "about to
swap" (for desktop scenarios) and "about to OOM" (for embedded-like
scenarios).

With that assumption in mind it would be necessary to either have two
special devices for notification, or somehow indicate both events
through the same file descriptor.
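
A minimal sketch of the second option (both events through one file descriptor), assuming the kernel would deliver one byte per notification. The event codes, the function names, and the use of a pipe as a stand-in for the notification device are all hypothetical; the proposal does not define an encoding.

```python
import os
import select

# Hypothetical one-byte event codes; the proposal does not define any encoding.
EVENT_ABOUT_TO_SWAP = b"S"
EVENT_ABOUT_TO_OOM = b"O"


def read_events(fd, timeout=0.0):
    """Return the list of event codes currently pending on fd (may be empty)."""
    readable, _, _ = select.select([fd], [], [], timeout)
    if not readable:
        return []
    data = os.read(fd, 64)
    # Each byte is treated as one event, in delivery order.
    return [data[i:i + 1] for i in range(len(data))]


def describe(event):
    """Map an event code to the scenario named in the proposal."""
    if event == EVENT_ABOUT_TO_SWAP:
        return "about to swap"
    if event == EVENT_ABOUT_TO_OOM:
        return "about to OOM"
    return "unknown"
```

With two separate devices the decoding step would disappear, at the cost of a second descriptor to watch.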

Comments are more than welcome.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: OOM notifications [ In reply to ]
On 10/18/2007 10:25 PM, Marcelo Tosatti wrote:

> AIX contains the SIGDANGER signal to notify applications to free up some
> unused cached memory:
>
> http://www.ussg.iu.edu/hypermail/linux/kernel/0007.0/0901.html
>
> There have been a few discussions on implementing such an idea on Linux,
> but nothing concrete has been achieved.
>
> On the kernel side Rik suggested two notification points: "about to
> swap" (for desktop scenarios) and "about to OOM" (for embedded-like
> scenarios).
>
> With that assumption in mind it would be necessary to either have two
> special devices for notification, or somehow indicate both events
> through the same file descriptor.
>
> Comments are more than welcome.

Given the desktop/embedded distinction you made, do you need both scenarios
active at the same time? If not, it seems something like a

echo -n <level> >/proc/sys/vm/danger

could do with just one sigdanger notification point? (with <level> suitably
defined as or in terms of the used threshold value).

Rene.
Re: OOM notifications [ In reply to ]
On Thu, 18 Oct 2007 22:38:21 +0200
Rene Herman <rene.herman@keyaccess.nl> wrote:

> On 10/18/2007 10:25 PM, Marcelo Tosatti wrote:
>
> > AIX contains the SIGDANGER signal to notify applications to free up
> > some unused cached memory:
> >
> > http://www.ussg.iu.edu/hypermail/linux/kernel/0007.0/0901.html
> >
> > There have been a few discussions on implementing such an idea on
> > Linux, but nothing concrete has been achieved.
> >
> > On the kernel side Rik suggested two notification points: "about to
> > swap" (for desktop scenarios) and "about to OOM" (for embedded-like
> > scenarios).
> >
> > With that assumption in mind it would be necessary to either have
> > two special devices for notification, or somehow indicate both
> > events through the same file descriptor.
> >
> > Comments are more than welcome.
>
> Given the desktop/embedded distinction you made, do you need both
> scenarios active at the same time? If not, it seems something like a
>
> echo -n <level> >/proc/sys/vm/danger
>
> could do with just one sigdanger notification point? (with <level>
> suitably defined as or in terms of the used threshold value).

If you do that, how are applications to know which of the two
scenarios is happening when they get a signal?

--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
Re: OOM notifications [ In reply to ]
On 10/18/2007 10:52 PM, Rik van Riel wrote:
> On Thu, 18 Oct 2007 22:38:21 +0200
> Rene Herman <rene.herman@keyaccess.nl> wrote:
>
>> On 10/18/2007 10:25 PM, Marcelo Tosatti wrote:
>>
>>> AIX contains the SIGDANGER signal to notify applications to free up
>>> some unused cached memory:
>>>
>>> http://www.ussg.iu.edu/hypermail/linux/kernel/0007.0/0901.html
>>>
>>> There have been a few discussions on implementing such an idea on
>>> Linux, but nothing concrete has been achieved.
>>>
>>> On the kernel side Rik suggested two notification points: "about to
>>> swap" (for desktop scenarios) and "about to OOM" (for embedded-like
>>> scenarios).
>>>
>>> With that assumption in mind it would be necessary to either have
>>> two special devices for notification, or somehow indicate both
>>> events through the same file descriptor.
>>>
>>> Comments are more than welcome.
>> Given the desktop/embedded distinction you made, do you need both
>> scenarios active at the same time? If not, it seems something like a
>>
>> echo -n <level> >/proc/sys/vm/danger
>>
>> could do with just one sigdanger notification point? (with <level>
>> suitably defined as or in terms of the used threshold value).
>
> If you do that, how are applications to know which of the two
> scenarios is happening when they get a signal?

They don't -- that's why I asked if you need both scenarios active at the
same time. SIGDANGER would just be SIGPLEASEFREEALLYOUCAN with the operator
deciding through setting the level at which point applications get it.

Or put differently: what's the additional value of notifying an application
that the system is about to go ballistic when you've already asked it to free
all it could earlier? SIGSEEDAMNITITOLDYOUSO?

Don't get me wrong; never saw this discussion earlier, may be sensible...

Rene.

Re: OOM notifications [ In reply to ]
On Thu, 18 Oct 2007 23:06:52 +0200
Rene Herman <rene.herman@keyaccess.nl> wrote:

> They don't -- that's why I asked if you need both scenarios active
> at the same time. SIGDANGER would just be SIGPLEASEFREEALLYOUCAN with
> the operator deciding through setting the level at which point
> applications get it.
>
> Or put differently: what's the additional value of notifying an
> application that the system is about to go ballistic when you've
> already asked it to free all it could earlier? SIGSEEDAMNITITOLDYOUSO?

The first threshold - "we are about to swap" - means the application
frees whatever memory it can, e.g. free()d memory that glibc has not yet
given back to the kernel, or a JVM running the garbage collector, or ...

The second threshold - "we are out of memory" - means that the first
approach has failed and the system needs to do something else. On an
embedded system, I would expect some application to exit or maybe
restart itself.
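
A rough sketch of how an application might treat the two thresholds differently. The class, the numeric levels (1 for "about to swap", 2 for "out of memory"), and the shutdown flag are assumptions made for illustration only.

```python
import gc


class CachingApp:
    """Toy application holding a droppable cache (hypothetical)."""

    def __init__(self):
        self.cache = {}
        self.shutdown_requested = False

    def on_memory_event(self, level):
        """React to a notification: 1 = about to swap, 2 = out of memory."""
        if level == 1:
            # First threshold: give back whatever can be rebuilt later.
            self.cache.clear()
            gc.collect()
        elif level == 2:
            # Second threshold: freeing was not enough, so prepare to exit
            # or restart; the main loop is expected to check this flag.
            self.shutdown_requested = True
        else:
            raise ValueError(f"unknown level: {level}")
```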

--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
Re: OOM notifications [ In reply to ]
On 10/18/2007 11:18 PM, Rik van Riel wrote:

> On Thu, 18 Oct 2007 23:06:52 +0200
> Rene Herman <rene.herman@keyaccess.nl> wrote:
>
>> They don't -- that's why I asked if you need both scenarios active
>> at the same time. SIGDANGER would just be SIGPLEASEFREEALLYOUCAN with
>> the operator deciding through setting the level at which point
>> applications get it.
>>
>> Or put differently: what's the additional value of notifying an
>> application that the system is about to go ballistic when you've
>> already asked it to free all it could earlier? SIGSEEDAMNITITOLDYOUSO?
>
> The first threshold - "we are about to swap" - means the application
> frees whatever memory it can, e.g. free()d memory that glibc has not yet
> given back to the kernel, or a JVM running the garbage collector, or ...
>
> The second threshold - "we are out of memory" - means that the first
> approach has failed and the system needs to do something else. On an
> embedded system, I would expect some application to exit or maybe
> restart itself.

That first threshold sounds fine yes. To me, the second mostly sounds like a
job for SIGTERM though.

The OOM killer could, after it has selected the task for killing, first try
a TERM on it to give it a chance to exit gracefully, and only when that
doesn't help make it eligible for killing on a second round through the
badness calculation.

You could moreover _never_ make a task eligible for killing before it
received a SIGTERM, thereby guaranteeing that everyone got the SIGTERM
before killing anything, and it seems SIGTERM would be a more focussed
version of SIGDANGER2 then.

Would at least forego any need for multiplexing the DANGER signal.

Rene.

Re: OOM notifications [ In reply to ]
Rene Herman wrote:
> That first threshold sounds fine yes. To me, the second mostly sounds
> like a job for SIGTERM though.

I agree. Applications shouldn't be expected to be yet more complicated
and have different levels of low memory handling. You might want to
give a process a second shot at handling SIGDANGER but after that it's
all about preparation for a shutdown.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
Re: OOM notifications [ In reply to ]
On 10/19/2007 12:01 AM, Rene Herman wrote:
> On 10/18/2007 11:18 PM, Rik van Riel wrote:
>
>> On Thu, 18 Oct 2007 23:06:52 +0200
>> Rene Herman <rene.herman@keyaccess.nl> wrote:
>>
>>> They don't -- that's why I asked if you need both scenarios active
>>> at the same time. SIGDANGER would just be SIGPLEASEFREEALLYOUCAN with
>>> the operator deciding through setting the level at which point
>>> applications get it.
>>>
>>> Or put differently: what's the additional value of notifying an
>>> application that the system is about to go ballistic when you've
>>> already asked it to free all it could earlier? SIGSEEDAMNITITOLDYOUSO?
>>
>> The first threshold - "we are about to swap" - means the application
>> frees whatever memory it can, e.g. free()d memory that glibc has not yet
>> given back to the kernel, or a JVM running the garbage collector, or ...
>>
>> The second threshold - "we are out of memory" - means that the first
>> approach has failed and the system needs to do something else. On an
>> embedded system, I would expect some application to exit or maybe
>> restart itself.
>
> That first threshold sounds fine yes. To me, the second mostly sounds
> like a job for SIGTERM though.
>
> The OOM killer could, after it has selected the task for killing, first
> try a TERM on it to give it a chance to exit gracefully, and only when
> that doesn't help make it eligible for killing on a second round through
> the badness calculation.
>
> You could moreover _never_ make a task eligible for killing before it
> received a SIGTERM, thereby guaranteeing that everyone got the SIGTERM
> before killing anything, and it seems SIGTERM would be a more focussed
> version of SIGDANGER2 then.

Well, no, that "guarantee" is fairly badly formulated but I mean "before
everyone got a SIGTERM" of course. That is, first do the same selection as
now but don't send KILL but TERM and mark the task as having received a TERM
already and make it not eligible anymore. Only when there are no TERM
eligible tasks anymore, start sending KILL.
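
The selection order described above can be sketched as follows. This is only a userspace model under stated assumptions: the `Task` class, its `badness` score, and `pick_victim` are hypothetical names, not the kernel's OOM killer code.

```python
class Task:
    """Hypothetical stand-in for a process considered by the OOM killer."""

    def __init__(self, pid, badness):
        self.pid = pid
        self.badness = badness   # higher means a better candidate
        self.got_term = False    # set once TERM has been sent


def pick_victim(tasks):
    """Return (signal_name, task) for the next step, or None if no tasks remain.

    Tasks that have not yet received TERM are considered first; KILL is only
    used once every remaining task has already been sent TERM.
    """
    if not tasks:
        return None
    not_termed = [t for t in tasks if not t.got_term]
    if not_termed:
        victim = max(not_termed, key=lambda t: t.badness)
        victim.got_term = True
        return ("TERM", victim)
    victim = max(tasks, key=lambda t: t.badness)
    return ("KILL", victim)
```

The caller is expected to drop tasks from the list once they have exited, so a task that honours TERM never reaches the KILL round.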

> Would at least forego any need for multiplexing the DANGER signal.

Rene.
Re: OOM notifications [ In reply to ]
Ulrich Drepper wrote:

> I agree. Applications shouldn't be expected to be yet more complicated
> and have different levels of low memory handling. You might want to
> give a process a second shot at handling SIGDANGER but after that it's
> all about preparation for a shutdown.

I disagree. From an embedded viewpoint it would be nice to have a
"please free up memory", then a "we really need memory NOW", then
finally the kernel oom killer.

The advantage of the middle message is that it allows userspace to do
smarter things if it wants to (for instance, if there is an overall
system manager or some such thing, it could do a better job of
restarting tasks than the kernel oom killer since it knows the relative
importance of tasks).

Chris
Re: OOM notifications [ In reply to ]
On Thu 2007-10-18 15:10:07, Ulrich Drepper wrote:
> Rene Herman wrote:
> > That first threshold sounds fine yes. To me, the second mostly sounds
> > like a job for SIGTERM though.
>
> I agree. Applications shouldn't be expected to be yet more complicated
> and have different levels of low memory handling. You might want to
> give a process a second shot at handling SIGDANGER but after that it's
> all about preparation for a shutdown.

That works okay on a PC, but try cellphone one day.

You want management app to close the least used application. You do
not want _kernel_ to select "who to send SIGTERM to".
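
As an illustration of the policy a management app could apply, here is a small sketch that tracks when each application was last used and picks the one idle the longest. The `AppManager` class and its method names are hypothetical.

```python
class AppManager:
    """Hypothetical management app that tracks application usage."""

    def __init__(self):
        self.last_used = {}  # application name -> time of last use

    def touch(self, app, now):
        """Record that the user interacted with `app` at time `now`."""
        self.last_used[app] = now

    def pick_app_to_close(self):
        """Return the least recently used application and stop tracking it."""
        if not self.last_used:
            return None
        victim = min(self.last_used, key=self.last_used.get)
        del self.last_used[victim]
        return victim
```

The point of the sketch is that this knowledge (which application the user touched last) lives in userspace, not in the kernel.
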
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: OOM notifications [ In reply to ]
>>>>> "Pavel" == Pavel Machek <pavel@ucw.cz> writes:

Pavel> That works okay on a PC, but try cellphone one day.

Pavel> You want management app to close the least used
Pavel> application. You do not want _kernel_ to select "who to send
Pavel> SIGTERM to".

That's why I would prefer that *all* processes receive the
SIGDANGER/whatever (and of course ignore it by default). Only the
management app would handle it in the case you describe and would
select one or more applications to unload to free some memory.

Sam
--
Samuel Tardieu -- sam@rfc1149.net -- http://www.rfc1149.net/

Re: OOM notifications [ In reply to ]
Samuel Tardieu wrote:
> >>>>> "Pavel" == Pavel Machek <pavel@ucw.cz> writes:
>
> Pavel> That works okay on a PC, but try cellphone one day.
>
> Pavel> You want management app to close the least used
> Pavel> application. You do not want _kernel_ to select "who to send
> Pavel> SIGTERM to".
>
> That's why I would prefer that *all* processes receive the
> SIGDANGER/whatever (and of course ignore it by default). Only the
> management app would handle it in the case you describe and would
> select one or more applications to unload to free some memory.

It's still helpful to have two stages...one that all apps can listen to
(and try to reduce their footprint if possible), and a second one that
only the manager would handle (and would kill some suitable target).
Finally, if all that fails then the kernel starts whacking things.
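
The staged escalation can be modelled as an ordered list of actions that stops as soon as enough memory has been recovered. This is a sketch only; the stage names and the idea of each stage reporting a number of freed pages are assumptions made for the example.

```python
def relieve_pressure(stages, pages_needed):
    """Run stages in order until pages_needed is covered.

    stages is a list of (name, action) pairs; each action is a callable
    returning the number of pages it managed to free. Returns the names of
    the stages that actually ran.
    """
    ran = []
    freed = 0
    for name, action in stages:
        if freed >= pages_needed:
            break
        freed += action()
        ran.append(name)
    return ran
```

For example, with stages ordered as "all apps trim", "manager kills a target", "kernel OOM killer", the last stage only runs when the first two did not free enough.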

Chris
Re: OOM notifications [ In reply to ]
On Thu, 18 Oct 2007 16:15:31 -0400
Marcelo Tosatti <marcelo@kvack.org> wrote:

> Hi,
>
> AIX contains the SIGDANGER signal to notify applications to free up some
> unused cached memory:
>
> http://www.ussg.iu.edu/hypermail/linux/kernel/0007.0/0901.html
>
> There have been a few discussions on implementing such an idea on Linux,
> but nothing concrete has been achieved.
>
> On the kernel side Rik suggested two notification points: "about to
> swap" (for desktop scenarios) and "about to OOM" (for embedded-like
> scenarios).
>
> With that assumption in mind it would be necessary to either have two
> special devices for notification, or somehow indicate both events
> through the same file descriptor.
>
> Comments are more than welcome.

Martin was talking about some mad scheme wherein you'd create a bunch of
pseudo files (say, /proc/foo/0, /proc/foo/1, ..., /proc/foo/9) and each one
would become "ready" when the MM scanning priority reaches 10%, 20%, ...
100%.

Obviously there would need to be a lot of abstraction to unhook a permanent
userspace feature from a transient kernel implementation, but the basic
idea is that a process which wants to know when the VM is getting into the
orange zone would select() on the file "7" and a process which wants to
know when the VM is getting into the red zone would select on file "9".
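
A userspace simulation of that idea, with pipes standing in for the pseudo files. Everything here is an assumption for illustration: /proc/foo/N does not exist, and the `PressureBuckets` class only mimics "file N becomes ready once pressure reaches (N+1)*10%".

```python
import os
import select


class PressureBuckets:
    """Ten pipes standing in for the pseudo files /proc/foo/0 .. /proc/foo/9."""

    def __init__(self, count=10):
        self.count = count
        self.pipes = [os.pipe() for _ in range(count)]
        self.signalled = [False] * count

    def read_fd(self, bucket):
        return self.pipes[bucket][0]

    def set_pressure(self, percent):
        """Mark every bucket whose threshold (10%, 20%, ...) has been reached."""
        for bucket in range(self.count):
            threshold = (bucket + 1) * 100 // self.count
            if percent >= threshold and not self.signalled[bucket]:
                os.write(self.pipes[bucket][1], b"!")
                self.signalled[bucket] = True

    def close(self):
        for r, w in self.pipes:
            os.close(r)
            os.close(w)


def ready_buckets(buckets, watched, timeout=0.0):
    """Return the watched bucket numbers that select() reports as readable."""
    fds = {buckets.read_fd(b): b for b in watched}
    readable, _, _ = select.select(list(fds), [], [], timeout)
    return sorted(fds[fd] for fd in readable)
```

A process interested in the "orange zone" would watch bucket 7 and one interested in the "red zone" bucket 9, each with an ordinary select() call.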

It gets more complicated with NUMA memory nodes and cgroup memory
controllers.

Re: OOM notifications [ In reply to ]
Andrew Morton wrote:
> On Thu, 18 Oct 2007 16:15:31 -0400
> Marcelo Tosatti <marcelo@kvack.org> wrote:
>
>> Hi,
>>
>> AIX contains the SIGDANGER signal to notify applications to free up some
>> unused cached memory:
>>
>> http://www.ussg.iu.edu/hypermail/linux/kernel/0007.0/0901.html
>>
>> There have been a few discussions on implementing such an idea on Linux,
>> but nothing concrete has been achieved.
>>
>> On the kernel side Rik suggested two notification points: "about to
>> swap" (for desktop scenarios) and "about to OOM" (for embedded-like
>> scenarios).
>>
>> With that assumption in mind it would be necessary to either have two
>> special devices for notification, or somehow indicate both events
>> through the same file descriptor.
>>
>> Comments are more than welcome.
>
> Martin was talking about some mad scheme wherein you'd create a bunch of
> pseudo files (say, /proc/foo/0, /proc/foo/1, ..., /proc/foo/9) and each one
> would become "ready" when the MM scanning priority reaches 10%, 20%, ...
> 100%.
>
> Obviously there would need to be a lot of abstraction to unhook a permanent
> userspace feature from a transient kernel implementation, but the basic
> idea is that a process which wants to know when the VM is getting into the
> orange zone would select() on the file "7" and a process which wants to
> know when the VM is getting into the red zone would select on file "9".
>
> It gets more complicated with NUMA memory nodes and cgroup memory
> controllers.

We ended up not doing that, but making a scanner that saw what
percentage of the LRU was touched in the last n seconds, and
printing that to userspace to deal with.

Turns out priority is a horrible metric to use for this - it
stays at default for ages, then falls off a cliff far too
quickly to react to.
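
The metric described above (share of the LRU touched in the last n seconds) can be illustrated with a toy calculation. The data layout, a mapping from page id to last-access time, is an assumption for the example and says nothing about how the real scanner was implemented.

```python
def percent_touched(last_access, now, window):
    """Percentage of pages last touched within the past `window` seconds.

    last_access maps a page id to the time it was last accessed.
    """
    if not last_access:
        return 0.0
    recent = sum(1 for t in last_access.values() if now - t <= window)
    return 100.0 * recent / len(last_access)
```

Unlike scanning priority, a value like this moves gradually as the working set grows, which gives userspace time to react.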

Re: OOM notifications [ In reply to ]
On Fri, 26 Oct 2007 14:05:47 -0700
Martin Bligh <mbligh@mbligh.org> wrote:

> > Martin was talking about some mad scheme wherein you'd create a bunch of
> > pseudo files (say, /proc/foo/0, /proc/foo/1, ..., /proc/foo/9) and each one
> > would become "ready" when the MM scanning priority reaches 10%, 20%, ...
> > 100%.
> >
> > Obviously there would need to be a lot of abstraction to unhook a permanent
> > userspace feature from a transient kernel implementation, but the basic
> > idea is that a process which wants to know when the VM is getting into the
> > orange zone would select() on the file "7" and a process which wants to
> > know when the VM is getting into the red zone would select on file "9".
> >
> > It gets more complicated with NUMA memory nodes and cgroup memory
> > controllers.
>
> We ended up not doing that, but making a scanner that saw what
> percentage of the LRU was touched in the last n seconds, and
> printing that to userspace to deal with.
>
> Turns out priority is a horrible metric to use for this - it
> stays at default for ages, then falls off a cliff far too
> quickly to react to.

Sure, but in terms of high-level userspace interface, being able to
select() on a group of priority buckets (spread across different nodes,
zones and cgroups) seems a lot more flexible than any signal-based
approach we could come up with.
Re: OOM notifications [ In reply to ]
On Fri, 26 Oct 2007 14:11:12 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> Sure, but in terms of high-level userspace interface, being able to
> select() on a group of priority buckets (spread across different
> nodes, zones and cgroups) seems a lot more flexible than any
> signal-based approach we could come up with.

Absolutely, the process needs to be able to just poll or
select on a file descriptor from the process main loop.

I am not convinced that the magic of NUMA memory distribution
and NUMA memory pressure should be visible to userspace. Due
to the thundering herd problem we cannot wake up all of the
processes that select on the file descriptor at the same time
anyway, so we can (later on) add NUMA magic to the process
selection logic in the kernel to only wake up processes on
the right NUMA nodes.

The initial patch probably does not need that.
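
To illustrate the thundering herd point, here is a sketch of a wait queue that wakes only a bounded batch of waiters per event. The `WaitQueue` class and the oldest-first policy are assumptions for the example, not a description of any kernel wait queue.

```python
from collections import deque


class WaitQueue:
    """FIFO of waiting process ids; wakes a bounded batch per event."""

    def __init__(self):
        self.waiters = deque()

    def wait(self, pid):
        """Register `pid` as waiting for a low-memory notification."""
        self.waiters.append(pid)

    def wake_some(self, batch):
        """Wake at most `batch` waiters, oldest first, and return their pids."""
        woken = []
        while self.waiters and len(woken) < batch:
            woken.append(self.waiters.popleft())
        return woken
```

If the first batch frees enough memory, the remaining waiters are never disturbed.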

--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
Re: OOM notifications [ In reply to ]
Rik van Riel wrote:
> On Fri, 26 Oct 2007 14:11:12 -0700
> Andrew Morton <akpm@linux-foundation.org> wrote:
>
>> Sure, but in terms of high-level userspace interface, being able to
>> select() on a group of priority buckets (spread across different
>> nodes, zones and cgroups) seems a lot more flexible than any
>> signal-based approach we could come up with.
>
> Absolutely, the process needs to be able to just poll or
> select on a file descriptor from the process main loop.
>
> I am not convinced that the magic of NUMA memory distribution
> and NUMA memory pressure should be visible to userspace. Due
> to the thundering herd problem we cannot wake up all of the
> processes that select on the file descriptor at the same time
> anyway, so we can (later on) add NUMA magic to the process
> selection logic in the kernel to only wake up processes on
> the right NUMA nodes.
>
> The initial patch probably does not need that.

Depends if you're using cpusets or not, I think?

Re: OOM notifications [ In reply to ]
On Fri, 26 Oct 2007 14:59:01 -0700
Martin Bligh <mbligh@mbligh.org> wrote:

> Rik van Riel wrote:
> > On Fri, 26 Oct 2007 14:11:12 -0700
> > Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> >> Sure, but in terms of high-level userspace interface, being able to
> >> select() on a group of priority buckets (spread across different
> >> nodes, zones and cgroups) seems a lot more flexible than any
> >> signal-based approach we could come up with.
> >
> > Absolutely, the process needs to be able to just poll or
> > select on a file descriptor from the process main loop.
> >
> > I am not convinced that the magic of NUMA memory distribution
> > and NUMA memory pressure should be visible to userspace. Due
> > to the thundering herd problem we cannot wake up all of the
> > processes that select on the file descriptor at the same time
> > anyway, so we can (later on) add NUMA magic to the process
> > selection logic in the kernel to only wake up processes on
> > the right NUMA nodes.
> >
> > The initial patch probably does not need that.
>
> Depends if you're using cpusets or not, I think?

The kernel knows on which cpuset a process can run.

The process itself may have been relocated to a different
cpuset at runtime, without it even knowing.

Because of that I think the magic of which process(es) to wake
up when there is memory pressure in some NUMA node should
live in the kernel.
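
A toy model of that kernel-side decision: given which NUMA nodes each waiting process is currently allowed to use, pick the ones worth waking for pressure on a given node. The function name and the dictionary layout are hypothetical.

```python
def waiters_to_wake(allowed_nodes, pressured_node):
    """Return the pids whose current cpuset includes the pressured node.

    allowed_nodes maps a pid to the set of NUMA nodes it may allocate from;
    the kernel can keep this up to date even when a process is moved to a
    different cpuset without its knowledge.
    """
    return sorted(pid for pid, nodes in allowed_nodes.items()
                  if pressured_node in nodes)
```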

--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
Re: OOM notifications [ In reply to ]
Andrew Morton wrote:
> It gets more complicated with NUMA memory nodes and cgroup memory
> controllers.
>

At OLS this year, users wanted user space notification of OOM
for cgroup memory controller. When a group is about to OOM,
a notification can help an external application re-adjust
memory limits across the system.

Keeping some memory reserved for handling OOM, this scheme could
be extended to handle global OOM conditions as well.
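
One way an external application might readjust limits on such a notification is sketched below. The policy (take limit from the group with the most unused headroom) and all names are assumptions for illustration; the message above does not prescribe any particular policy.

```python
def rebalance(limits, usage, needy, amount):
    """Move up to `amount` of limit from the group with most slack to `needy`.

    limits and usage map a group name to its memory limit and current usage.
    Returns a new limits dict; the sum of all limits is unchanged.
    """
    slack = {g: limits[g] - usage[g] for g in limits if g != needy}
    new_limits = dict(limits)
    if not slack:
        return new_limits
    donor = max(slack, key=slack.get)
    # Never take more than the donor's unused headroom.
    moved = min(amount, max(slack[donor], 0))
    new_limits[donor] -= moved
    new_limits[needy] += moved
    return new_limits
```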

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
Re: OOM notifications [ In reply to ]
Hello,

> AIX contains the SIGDANGER signal to notify applications to free up some
> unused cached memory:
>
> http://www.ussg.iu.edu/hypermail/linux/kernel/0007.0/0901.html
>
> There have been a few discussions on implementing such an idea on Linux,
> but nothing concrete has been achieved.
>
> On the kernel side Rik suggested two notification points: "about to
> swap" (for desktop scenarios) and "about to OOM" (for embedded-like
> scenarios).
>
> With that assumption in mind it would be necessary to either have two
> special devices for notification, or somehow indicate both events
> through the same file descriptor.

Actually, wouldn't a generic netlink interface be more elegant? Then
we could connect it with DBUS and it would be much easier for
applications (Desktop) to handle such events.

I agree that near-to-OOM conditions are quite volatile and maybe we
want a technically simple (and thus more reliable) mechanism for the
notification, but I wanted to point out this possibility anyway.

Honza
--
Jan Kara <jack@suse.cz>
SuSE CR Labs
Re: OOM notifications [ In reply to ]
On Tue, 30 Oct 2007 15:57:20 +0100
Jan Kara <jack@suse.cz> wrote:

> Hello,
>
> > AIX contains the SIGDANGER signal to notify applications to free up some
> > unused cached memory:
> >
> > http://www.ussg.iu.edu/hypermail/linux/kernel/0007.0/0901.html
> >
> > There have been a few discussions on implementing such an idea on Linux,
> > but nothing concrete has been achieved.
> >
> > On the kernel side Rik suggested two notification points: "about to
> > swap" (for desktop scenarios) and "about to OOM" (for embedded-like
> > scenarios).
> >
> > With that assumption in mind it would be necessary to either have two
> > special devices for notification, or somehow indicate both events
> > through the same file descriptor.
>
> Actually, wouldn't a generic netlink interface be more elegant? Then
> we could connect it with DBUS and it would be much easier for
> applications (Desktop) to handle such events.
>
> I agree that near-to-OOM conditions are quite volatile and maybe we
> want a technically simple (and thus more reliable) mechanism for the
> notification, but I wanted to point out this possibility anyway.

There's nothing wrong with being able to get this info via DBUS,
but we cannot expect every database and JVM out there (big targets
for the "reduce your memory footprint" thing on servers) to grow
a DBUS interface.

--
All Rights Reversed
Re: OOM notifications [ In reply to ]
On Tue 30-10-07 11:23:46, Rik van Riel wrote:
> On Tue, 30 Oct 2007 15:57:20 +0100
> Jan Kara <jack@suse.cz> wrote:
>
> > Hello,
> >
> > > AIX contains the SIGDANGER signal to notify applications to free up some
> > > unused cached memory:
> > >
> > > http://www.ussg.iu.edu/hypermail/linux/kernel/0007.0/0901.html
> > >
> > > There have been a few discussions on implementing such an idea on Linux,
> > > but nothing concrete has been achieved.
> > >
> > > On the kernel side Rik suggested two notification points: "about to
> > > swap" (for desktop scenarios) and "about to OOM" (for embedded-like
> > > scenarios).
> > >
> > > With that assumption in mind it would be necessary to either have two
> > > special devices for notification, or somehow indicate both events
> > > through the same file descriptor.
> >
> > Actually, wouldn't a generic netlink interface be more elegant? Then
> > we could connect it with DBUS and it would be much easier for
> > applications (Desktop) to handle such events.
> >
> > I agree that near-to-OOM conditions are quite volatile and maybe we
> > want a technically simple (and thus more reliable) mechanism for the
> > notification, but I wanted to point out this possibility anyway.
>
> There's nothing wrong with being able to get this info via DBUS,
> but we cannot expect every database and JVM out there (big targets
> for the "reduce your memory footprint" thing on servers) to grow
> a DBUS interface.

Hmm, that's right, but the kernel->userspace interface could still be via
netlink (which is much more flexible than signals etc.), and then in userspace
we could also implement some simple interface (a UNIX socket?) for server-like
apps...
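
The userspace half of that idea could look roughly like the relay below: a daemon receives an event (from netlink, in the proposal) and fans it out to local subscribers over UNIX sockets. The sketch leaves out the netlink side entirely, and the `LowMemoryRelay` class and its message format are hypothetical.

```python
import socket


class LowMemoryRelay:
    """Userspace daemon sketch: fans one event out to local subscribers."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self):
        """Create a UNIX datagram socket pair and return the client end."""
        ours, theirs = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
        self.subscribers.append(ours)
        return theirs

    def publish(self, message):
        """Forward one event to every subscriber; return how many got it."""
        delivered = 0
        for sock in list(self.subscribers):
            try:
                sock.send(message)
                delivered += 1
            except OSError:
                # The subscriber went away; stop tracking it.
                sock.close()
                self.subscribers.remove(sock)
        return delivered
```

A server application would then only need to watch one more socket in its event loop, with no DBUS dependency.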

Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
Re: OOM notifications [ In reply to ]
On Tue, 30 Oct 2007 16:55:25 +0100
Jan Kara <jack@suse.cz> wrote:

> Hmm, that's right, but the kernel->userspace interface could still be via
> netlink (which is much more flexible than signals etc.), and then in userspace
> we could also implement some simple interface (a UNIX socket?) for server-like
> apps...

I think we all agree that it should not be a Unix signal, if only
because glibc cannot manipulate memory pools from signal handlers :)

The low memory message (for lack of a better word) needs to get to
userspace over a file descriptor, which the process can select() or
poll() on from its main loop.

Whether that is a device node, a sysfs file, a netlink socket or
something else ... I don't particularly care :)
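
A minimal sketch of what "poll on it from the main loop" could look like for an application, using Python's standard selectors module. The notification descriptor is whatever the kernel interface ends up providing; here it is just a parameter, and the callback names are hypothetical.

```python
import os
import selectors


def run_main_loop(notify_fd, on_low_memory, do_work, iterations, timeout=0.0):
    """Interleave normal work with low-memory notifications from notify_fd."""
    sel = selectors.DefaultSelector()
    sel.register(notify_fd, selectors.EVENT_READ)
    try:
        for _ in range(iterations):
            for key, _events in sel.select(timeout):
                data = os.read(key.fd, 64)
                if data:
                    # Freeing memory is safe here: this runs in the main
                    # loop, not in a signal handler.
                    on_low_memory(data)
            do_work()
    finally:
        sel.unregister(notify_fd)
        sel.close()
```

Because the handler runs in ordinary program context, it can call into the allocator or trigger a garbage collection, which a signal handler could not safely do.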

--
All Rights Reversed