Mailing List Archive

kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()
Dear all,

I hope this is the right place to send this sort of backtrace dump.

I'm getting the following sort of dumps (below) on a 3.2.27 kernel on
an arm/kirkwood (actually DreamPlug) machine that's used as a router.

I imagine it being somehow related to the fact that it operates a
network bridge (I imagine this because I have another identical
machine with exactly the same kernel and a very similar config but not
running a bridge, and the warning never pops up).

Is this worth investigating? (I will, of course, provide the config
file and any other relevant data if the answer is "yes".) Is this
potentially serious? (I'm getting hard lockups on this machine which
I suspect are due to hardware and unrelated to this, but if someone
tells me it could be the cause, I'd be more than happy to believe it.)

[24711.204492] ------------[ cut here ]------------
[24711.209151] WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()
[24711.216667] Modules linked in: 8021q ath9k_htc mac80211 ath9k_common ath9k_hw ath cfg80211 bnep rfcomm sit tunnel4 sch_ingress cls_fw cls_u32 sch_sfq sch_htb pppoe pppox ppp_generic slhc bridge stp llc ip6t_REJECT ip6table_filter ip6table_mangle xt_NOTRACK ip6table_raw ip6_tables nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ftp nf_conntrack_ftp ipt_REJECT xt_conntrack iptable_filter ipt_MASQUERADE iptable_nat nf_nat xt_TCPMSS xt_tcpudp xt_mark iptable_mangle ip_tables x_tables nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 orion_wdt ipv6 snd_usb_audio snd_pcm snd_page_alloc snd_hwdep snd_usbmidi_lib snd_seq_midi snd_seq_midi_event snd_rawmidi btmrvl_sdio btmrvl snd_seq snd_timer snd_seq_device snd bluetooth soundcore
[24711.280663] [<c000d728>] (unwind_backtrace+0x0/0xf0) from [<c0022f74>] (warn_slowpath_common+0x50/0x68)
[24711.290124] [<c0022f74>] (warn_slowpath_common+0x50/0x68) from [<c0022fa8>] (warn_slowpath_null+0x1c/0x24)
[24711.299845] [<c0022fa8>] (warn_slowpath_null+0x1c/0x24) from [<c009caec>] (__alloc_pages_nodemask+0x1d4/0x68c)
[24711.309914] [<c009caec>] (__alloc_pages_nodemask+0x1d4/0x68c) from [<c009cfb4>] (__get_free_pages+0x10/0x3c)
[24711.319805] [<c009cfb4>] (__get_free_pages+0x10/0x3c) from [<c00c9fd0>] (kmalloc_order_trace+0x24/0xdc)
[24711.329269] [<c00c9fd0>] (kmalloc_order_trace+0x24/0xdc) from [<c038d638>] (pskb_expand_head+0x68/0x298)
[24711.338901] [<c038d638>] (pskb_expand_head+0x68/0x298) from [<bf0dd3ec>] (ip6_forward+0x4d4/0x7bc [ipv6])
[24711.348638] [<bf0dd3ec>] (ip6_forward+0x4d4/0x7bc [ipv6]) from [<bf0dfebc>] (ipv6_rcv+0x2bc/0x3dc [ipv6])
[24711.358333] [<bf0dfebc>] (ipv6_rcv+0x2bc/0x3dc [ipv6]) from [<c0394870>] (__netif_receive_skb+0x544/0x66c)
[24711.368106] [<c0394870>] (__netif_receive_skb+0x544/0x66c) from [<bf1cd054>] (br_nf_pre_routing_finish_ipv6+0x10c/0x160 [bridge])
[24711.379899] [<bf1cd054>] (br_nf_pre_routing_finish_ipv6+0x10c/0x160 [bridge]) from [<bf1cdae8>] (br_nf_pre_routing+0x59c/0x67c [bridge])
[24711.392271] [<bf1cdae8>] (br_nf_pre_routing+0x59c/0x67c [bridge]) from [<c03bd2a4>] (nf_iterate+0x8c/0xb4)
[24711.401988] [<c03bd2a4>] (nf_iterate+0x8c/0xb4) from [<c03bd328>] (nf_hook_slow+0x5c/0x118)
[24711.410540] [<c03bd328>] (nf_hook_slow+0x5c/0x118) from [<bf1c7fa4>] (br_handle_frame+0x1b8/0x290 [bridge])
[24711.420367] [<bf1c7fa4>] (br_handle_frame+0x1b8/0x290 [bridge]) from [<c03946f8>] (__netif_receive_skb+0x3cc/0x66c)
[24711.430872] [<c03946f8>] (__netif_receive_skb+0x3cc/0x66c) from [<c031e254>] (mv643xx_eth_poll+0x540/0x734)
[24711.440680] [<c031e254>] (mv643xx_eth_poll+0x540/0x734) from [<c0397390>] (net_rx_action+0x118/0x314)
[24711.449970] [<c0397390>] (net_rx_action+0x118/0x314) from [<c0029924>] (__do_softirq+0xac/0x234)
[24711.458817] [<c0029924>] (__do_softirq+0xac/0x234) from [<c0029f00>] (irq_exit+0x94/0x9c)
[24711.467046] [<c0029f00>] (irq_exit+0x94/0x9c) from [<c00094b0>] (handle_IRQ+0x34/0x84)
[24711.475007] [<c00094b0>] (handle_IRQ+0x34/0x84) from [<c04398d4>] (__irq_svc+0x34/0x98)
[24711.483068] [<c04398d4>] (__irq_svc+0x34/0x98) from [<c0011d6c>] (kirkwood_enter_idle+0x4c/0x94)
[24711.491908] [<c0011d6c>] (kirkwood_enter_idle+0x4c/0x94) from [<c0357a00>] (cpuidle_idle_call+0xc8/0x35c)
[24711.501532] [<c0357a00>] (cpuidle_idle_call+0xc8/0x35c) from [<c0009764>] (cpu_idle+0x88/0xdc)
[24711.510201] [<c0009764>] (cpu_idle+0x88/0xdc) from [<c05d8720>] (start_kernel+0x2a0/0x2f0)
[24711.518512] ---[ end trace e1776fbe32468909 ]---

--
David A. Madore
( http://www.madore.org/~david/ )
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c() [ In reply to ]
David Madore <david+ml@madore.org> :
[...]
> I imagine it being somehow related to the fact that it operates a
> network bridge (I imagine this because I have another identical
> machine with exactly the same kernel and a very similar config but not
> running a bridge, and the warning never pops up).

Could it not be a genuine allocation failure ?

--
Ueimor
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c() [ In reply to ]
On Wed, Aug 29, 2012 at 09:32:20PM +0200, Francois Romieu wrote:
> David Madore <david+ml@madore.org> :
> [...]
> > I imagine it being somehow related to the fact that it operates a
> > network bridge (I imagine this because I have another identical
> > machine with exactly the same kernel and a very similar config but not
> > running a bridge, and the warning never pops up).
>
> Could it not be a genuine allocation failure ?

I have no idea. How can I tell? In any case, if having 512MB RAM
isn't enough for the kernel in the router of a small home's network,
that's a bug somewhere, isn't it?

Also:

On Wed, Aug 29, 2012 at 02:25:48AM +0200, David Madore wrote:
> Is this worth investigating? (I will, of course, provide the config
> file and any other relevant data if the answer is "yes".) Is this
> potentially serious? (I'm getting hard lockups on this machine which
> I suspect are due to hardware and unrelated to this, but if someone
> tells me it could be the cause, I'd be more than happy to believe it.)

I'm now inclined to believe the hard lockups are indeed related to
this (I can semi-reproducibly make them happen with only network
traffic - actually, with the messages of a compilation taking place on
another machine being routed through this box (over IPv6)).

So how can I help debug this? (One difficulty is that I have only
remote access to this box, and it's not meant for experimenting with.)

--
David A. Madore
( http://www.madore.org/~david/ )
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c() [ In reply to ]
On Fri, Aug 31, 2012 at 12:59:36PM +0200, David Madore wrote:
> On Wed, Aug 29, 2012 at 09:32:20PM +0200, Francois Romieu wrote:
> > David Madore <david+ml@madore.org> :
> > [...]
> > > I imagine it being somehow related to the fact that it operates a
> > > network bridge (I imagine this because I have another identical
> > > machine with exactly the same kernel and a very similar config but not
> > > running a bridge, and the warning never pops up).
> >
> > Could it not be a genuine allocation failure ?
>
> I have no idea. How can I tell? In any case, if having 512MB RAM
> isn't enough for the kernel in the router of a small home's network,
> that's a bug somewhere, isn't it?

PS: I'm also getting the following kind of messages from a wlan
interface that's on the bridge:

[ 268.976317] ieee80211 phy0: failed to reallocate TX buffer
[ 716.880515] ieee80211 phy0: failed to reallocate TX buffer
[ 1160.877677] ieee80211 phy0: failed to reallocate TX buffer

Could they be related?

--
David A. Madore
( http://www.madore.org/~david/ )
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c() [ In reply to ]
David Madore <david+ml@madore.org> :
[...]
> [ 268.976317] ieee80211 phy0: failed to reallocate TX buffer
> [ 716.880515] ieee80211 phy0: failed to reallocate TX buffer
> [ 1160.877677] ieee80211 phy0: failed to reallocate TX buffer
>
> Could they be related?

Yes. This is a pskb_expand_head failure from net/mac80211/tx.c.
The router is experiencing GFP_ATOMIC allocation failure

You should monitor /proc/slabinfo content. A pig may appear over time.

--
Ueimor
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c() [ In reply to ]
[. Cc'ing original mail to netdev as the problem may be recognized there ]

On Wed, 29 Aug 2012, David Madore wrote:
> Dear all,
>
> I hope this is the right place to send this sort of backtrace dump.
>
> I'm getting the following sort of dumps (below) on a 3.2.27 kernel on
> an arm/kirkwood (actually DreamPlug) machine that's used as a router.
>
> I imagine it being somehow related to the fact that it operates a
> network bridge (I imagine this because I have another identical
> machine with exactly the same kernel and a very similar config but not
> running a bridge, and the warning never pops up).
>
> Is this worth investigating? (I will, of course, provide the config
> file and any other relevant data if the answer is "yes".) Is this
> potentially serious? (I'm getting hard lockups on this machine which
> I suspect are due to hardware and unrelated to this, but if someone
> tells me it could be the cause, I'd be more than happy to believe it.)
>
> [24711.204492] ------------[ cut here ]------------
> [24711.209151] WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()
> [24711.216667] Modules linked in: 8021q ath9k_htc mac80211 ath9k_common ath9k_hw ath cfg80211 bnep rfcomm sit tunnel4 sch_ingress cls_fw cls_u32 sch_sfq sch_htb pppoe pppox ppp_generic slhc bridge stp llc ip6t_REJECT ip6table_filter ip6table_mangle xt_NOTRACK ip6table_raw ip6_tables nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ftp nf_conntrack_ftp ipt_REJECT xt_conntrack iptable_filter ipt_MASQUERADE iptable_nat nf_nat xt_TCPMSS xt_tcpudp xt_mark iptable_mangle ip_tables x_tables nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 orion_wdt ipv6 snd_usb_audio snd_pcm snd_page_alloc snd_hwdep snd_usbmidi_lib snd_seq_midi snd_seq_midi_event snd_rawmidi btmrvl_sdio btmrvl snd_seq snd_timer snd_seq_device snd bluetooth soundcore
> [24711.280663] [<c000d728>] (unwind_backtrace+0x0/0xf0) from [<c0022f74>] (warn_slowpath_common+0x50/0x68)
> [24711.290124] [<c0022f74>] (warn_slowpath_common+0x50/0x68) from [<c0022fa8>] (warn_slowpath_null+0x1c/0x24)
> [24711.299845] [<c0022fa8>] (warn_slowpath_null+0x1c/0x24) from [<c009caec>] (__alloc_pages_nodemask+0x1d4/0x68c)
> [24711.309914] [<c009caec>] (__alloc_pages_nodemask+0x1d4/0x68c) from [<c009cfb4>] (__get_free_pages+0x10/0x3c)
> [24711.319805] [<c009cfb4>] (__get_free_pages+0x10/0x3c) from [<c00c9fd0>] (kmalloc_order_trace+0x24/0xdc)
> [24711.329269] [<c00c9fd0>] (kmalloc_order_trace+0x24/0xdc) from [<c038d638>] (pskb_expand_head+0x68/0x298)
> [24711.338901] [<c038d638>] (pskb_expand_head+0x68/0x298) from [<bf0dd3ec>] (ip6_forward+0x4d4/0x7bc [ipv6])
> [24711.348638] [<bf0dd3ec>] (ip6_forward+0x4d4/0x7bc [ipv6]) from [<bf0dfebc>] (ipv6_rcv+0x2bc/0x3dc [ipv6])
> [24711.358333] [<bf0dfebc>] (ipv6_rcv+0x2bc/0x3dc [ipv6]) from [<c0394870>] (__netif_receive_skb+0x544/0x66c)
> [24711.368106] [<c0394870>] (__netif_receive_skb+0x544/0x66c) from [<bf1cd054>] (br_nf_pre_routing_finish_ipv6+0x10c/0x160 [bridge])
> [24711.379899] [<bf1cd054>] (br_nf_pre_routing_finish_ipv6+0x10c/0x160 [bridge]) from [<bf1cdae8>] (br_nf_pre_routing+0x59c/0x67c [bridge])
> [24711.392271] [<bf1cdae8>] (br_nf_pre_routing+0x59c/0x67c [bridge]) from [<c03bd2a4>] (nf_iterate+0x8c/0xb4)
> [24711.401988] [<c03bd2a4>] (nf_iterate+0x8c/0xb4) from [<c03bd328>] (nf_hook_slow+0x5c/0x118)
> [24711.410540] [<c03bd328>] (nf_hook_slow+0x5c/0x118) from [<bf1c7fa4>] (br_handle_frame+0x1b8/0x290 [bridge])
> [24711.420367] [<bf1c7fa4>] (br_handle_frame+0x1b8/0x290 [bridge]) from [<c03946f8>] (__netif_receive_skb+0x3cc/0x66c)
> [24711.430872] [<c03946f8>] (__netif_receive_skb+0x3cc/0x66c) from [<c031e254>] (mv643xx_eth_poll+0x540/0x734)
> [24711.440680] [<c031e254>] (mv643xx_eth_poll+0x540/0x734) from [<c0397390>] (net_rx_action+0x118/0x314)
> [24711.449970] [<c0397390>] (net_rx_action+0x118/0x314) from [<c0029924>] (__do_softirq+0xac/0x234)
> [24711.458817] [<c0029924>] (__do_softirq+0xac/0x234) from [<c0029f00>] (irq_exit+0x94/0x9c)
> [24711.467046] [<c0029f00>] (irq_exit+0x94/0x9c) from [<c00094b0>] (handle_IRQ+0x34/0x84)
> [24711.475007] [<c00094b0>] (handle_IRQ+0x34/0x84) from [<c04398d4>] (__irq_svc+0x34/0x98)
> [24711.483068] [<c04398d4>] (__irq_svc+0x34/0x98) from [<c0011d6c>] (kirkwood_enter_idle+0x4c/0x94)
> [24711.491908] [<c0011d6c>] (kirkwood_enter_idle+0x4c/0x94) from [<c0357a00>] (cpuidle_idle_call+0xc8/0x35c)
> [24711.501532] [<c0357a00>] (cpuidle_idle_call+0xc8/0x35c) from [<c0009764>] (cpu_idle+0x88/0xdc)
> [24711.510201] [<c0009764>] (cpu_idle+0x88/0xdc) from [<c05d8720>] (start_kernel+0x2a0/0x2f0)
> [24711.518512] ---[ end trace e1776fbe32468909 ]---

Francois is right that a GFP_ATOMIC allocation from pskb_expand_head()
is failing, which can easily happen, and cause your "failed to reallocate
TX buffer" errors; but it's well worth looking up what's actually on
lines 2108 and 2109 of mm/page_alloc.c in 3.2.27:

if (order >= MAX_ORDER) {
WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));

That was probably not a sane allocation request, it has gone out of range:
maybe the skb header is even corrupted. If you're lucky, it might be
something that netdev will recognize as already fixed.

Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c() [ In reply to ]
Hugh Dickins <hughd@google.com> :
> [. Cc'ing original mail to netdev as the problem may be recognized there ]
[...]
> Francois is right that a GFP_ATOMIC allocation from pskb_expand_head()
> is failing, which can easily happen, and cause your "failed to reallocate
> TX buffer" errors; but it's well worth looking up what's actually on
> lines 2108 and 2109 of mm/page_alloc.c in 3.2.27:
>
> if (order >= MAX_ORDER) {
> WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));

You are right. The wifi Tx path error is not related.

I overlooked it was using SLUB btw.

> That was probably not a sane allocation request, it has gone out of range:
> maybe the skb header is even corrupted. If you're lucky, it might be
> something that netdev will recognize as already fixed.

Afaics nothing related to mv643xx_eth.c, nor pskb_expand_head nor
ip6_forward, at least nothing trivially close. Eric may provide a
better answer.

--
Ueimor
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c() [ In reply to ]
Since I had a rare occasion to physically access the machine, I did
the following experiment: connect another machine to the serial
console, run

while true ; do date ; cat /proc/slabinfo ; echo '***' ; sleep 3 ; done

and generate lots of IPv6 traffic through the box (as I mentioned, for
some reason, a Firefox compilation through ssh seems particularly
effective). So I now have lots of slabinfo data and, beyond the
initial WARNING, I also got messages along the lines of "swapper: page
allocation failure: order:10, mode:0x4020".

I put the full log in <URL:
http://www.madore.org/~david/.tmp/pollux-dump.0
> (unfortunately a bit garbled, because sometimes the cat slabinfo
was interspaced with printk output, but there are still plenty of
usable lines of each sort).

For completeness, here's a sample message from a page allocation
failure, and a copy of /proc/slabinfo from just about that time (I
have no idea how to read this, but one thing I can say is that there
is no extraordinarily large number in this):

[ 567.757489] swapper: page allocation failure: order:10, mode:0x4020
[ 567.763815] [<c000d728>] (unwind_backtrace+0x0/0xf0) from [<c009a87c>] (warn_alloc_failed+0xcc/0x10c)
[ 567.773119] [<c009a87c>] (warn_alloc_failed+0xcc/0x10c) from [<c009ce48>] (__alloc_pages_nodemask+0x530/0x68c)
[ 567.783184] [<c009ce48>] (__alloc_pages_nodemask+0x530/0x68c) from [<c009cfb4>] (__get_free_pages+0x10/0x3c)
[ 567.793084] [<c009cfb4>] (__get_free_pages+0x10/0x3c) from [<c00c9fd0>] (kmalloc_order_trace+0x24/0xdc)
[ 567.802547] [<c00c9fd0>] (kmalloc_order_trace+0x24/0xdc) from [<c038d638>] (pskb_expand_head+0x68/0x298)
[ 567.812317] [<c038d638>] (pskb_expand_head+0x68/0x298) from [<bf0e93ec>] (ip6_forward+0x4d4/0x7bc [ipv6])
[ 567.822056] [<bf0e93ec>] (ip6_forward+0x4d4/0x7bc [ipv6]) from [<bf0ebebc>] (ipv6_rcv+0x2bc/0x3dc [ipv6])
[ 567.831751] [<bf0ebebc>] (ipv6_rcv+0x2bc/0x3dc [ipv6]) from [<c0394870>] (__netif_receive_skb+0x544/0x66c)
[ 567.841521] [<c0394870>] (__netif_receive_skb+0x544/0x66c) from [<bf1d9054>] (br_nf_pre_routing_finish_ipv6+0x10c/0x160 [bridge])
[ 567.853306] [<bf1d9054>] (br_nf_pre_routing_finish_ipv6+0x10c/0x160 [bridge]) from [<bf1d9ae8>] (br_nf_pre_routing+0x59c/0x67c [bridge])
[ 567.865673] [<bf1d9ae8>] (br_nf_pre_routing+0x59c/0x67c [bridge]) from [<c03bd2a4>] (nf_iterate+0x8c/0xb4)
[ 567.875387] [<c03bd2a4>] (nf_iterate+0x8c/0xb4) from [<c03bd328>] (nf_hook_slow+0x5c/0x118)
[ 567.883800] [<c03bd328>] (nf_hook_slow+0x5c/0x118) from [<bf1d3fa4>] (br_handle_frame+0x1b8/0x290 [bridge])
[ 567.893624] [<bf1d3fa4>] (br_handle_frame+0x1b8/0x290 [bridge]) from [<c03946f8>] (__netif_receive_skb+0x3cc/0x66c)
[ 567.904137] [<c03946f8>] (__netif_receive_skb+0x3cc/0x66c) from [<c031e254>] (mv643xx_eth_poll+0x540/0x734)
[ 567.913928] [<c031e254>] (mv643xx_eth_poll+0x540/0x734) from [<c0397390>] (net_rx_action+0x118/0x314)
[ 567.923215] [<c0397390>] (net_rx_action+0x118/0x314) from [<c0029924>] (__do_softirq+0xac/0x234)
[ 567.932058] [<c0029924>] (__do_softirq+0xac/0x234) from [<c0029f00>] (irq_exit+0x94/0x9c)
[ 567.940421] [<c0029f00>] (irq_exit+0x94/0x9c) from [<c00094b0>] (handle_IRQ+0x34/0x84)
[ 567.948392] [<c00094b0>] (handle_IRQ+0x34/0x84) from [<c04398d4>] (__irq_svc+0x34/0x98)
[ 567.956454] [<c04398d4>] (__irq_svc+0x34/0x98) from [<c0011d6c>] (kirkwood_enter_idle+0x4c/0x94)
[ 567.965299] [<c0011d6c>] (kirkwood_enter_idle+0x4c/0x94) from [<c0357a00>] (cpuidle_idle_call+0xc8/0x35c)
[ 567.974925] [<c0357a00>] (cpuidle_idle_call+0xc8/0x35c) from [<c0009764>] (cpu_idle+0x88/0xdc)
[ 567.983581] [<c0009764>] (cpu_idle+0x88/0xdc) from [<c05d8720>] (start_kernel+0x2a0/0x2f0)
[ 567.991893] Mem-info:
[ 567.994185] Normal per-cpu:
[ 567.996995] CPU 0: hi: 186, btch: 31 usd: 84
[ 568.001815] active_anon:5592 inactive_anon:34 isolated_anon:0
[ 568.001820] active_file:2845 inactive_file:6118 isolated_file:0
[ 568.001825] unevictable:418 dirty:13 writeback:0 unstable:0
[ 568.001829] free:12507 slab_reclaimable:632 slab_unreclaimable:1124
[ 568.001835] mapped:2546 shmem:47 pagetables:152 bounce:0
[ 568.031126] Normal free:50028kB min:2884kB low:3604kB high:4324kB active_anon:22368kB inactive_anon:136kB active_file:11380kB inactive_file:24472kB unevictable:1672kB isolated(anon):0kB isolated(file):0kB present:520192kB mlocked:1672kB dirty:52kB writeback:0kB mapped:10184kB shmem:188kB slab_reclaimable:2528kB slab_unreclaimable:4496kB kernel_stack:584kB pagetables:608kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 568.071329] lowmem_reserve[]: 0 0
[ 568.074696] Normal: 1*4kB 1*8kB 8*16kB 7*32kB 0*64kB 2*128kB 1*256kB 4*512kB 20*1024kB 11*2048kB 1*4096kB = 50028kB
[ 568.085282] 9350 total pagecache pages
[ 568.089039] 0 pages in swap cache
[ 568.092363] Swap cache stats: add 0, delete 0, find 0/0
[ 568.097621] Free swap = 0kB
[ 568.100506] Total swap = 0kB
[ 568.117927] 131072 pages of RAM
[ 568.121087] 12771 free pages
[ 568.123972] 2839 reserved pages
[ 568.127140] 1361 slab pages
[ 568.129945] 8003 pages shared
[ 568.132919] 0 pages swap cached

slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
nf_conntrack_expect 0 0 176 23 1 : tunables 0 0 0 : slabdata 0 0 0
nf_conntrack_c06d1258 128 128 248 16 1 : tunables 0 0 0 : slabdata 8 8 0
ip6_dst_cache 72 72 224 18 1 : tunables 0 0 0 : slabdata 4 4 0
RAWv6 11 11 704 11 2 : tunables 0 0 0 : slabdata 1 1 0
UDPLITEv6 0 0 704 11 2 : tunables 0 0 0 : slabdata 0 0 0
UDPv6 33 33 704 11 2 : tunables 0 0 0 : slabdata 3 3 0
tw_sock_TCPv6 0 0 160 25 1 : tunables 0 0 0 : slabdata 0 0 0
TCPv6 12 12 1344 12 4 : tunables 0 0 0 : slabdata 1 1 0
ubifs_inode_slab 0 0 408 10 1 : tunables 0 0 0 : slabdata 0 0 0
mqueue_inode_cache 8 8 512 8 1 : tunables 0 0 0 : slabdata 1 1 0
jffs2_xattr_datum 0 0 56 73 1 : tunables 0 0 0 : slabdata 0 0 0
jffs2_refblock 0 0 248 16 1 : tunables 0 0 0 : slabdata 0 0 0
jffs2_i 0 0 360 11 1 : tunables 0 0 0 : slabdata 0 0 0
fat_inode_cache 0 0 392 10 1 : tunables 0 0 0 : slabdata 0 0 0
fat_cache 0 0 24 170 1 : tunables 0 0 0 : slabdata 0 0 0
jbd2_revoke_record 0 0 32 128 1 : tunables 0 0 0 : slabdata 0 0 0
journal_handle 170 170 24 170 1 : tunables 0 0 0 : slabdata 1 1 0
revoke_record 256 256 16 256 1 : tunables 0 0 0 : slabdata 1 1 0
ext4_inode_cache 0 0 568 14 2 : tunables 0 0 0 : slabdata 0 0 0
ext4_free_data 0 0 40 102 1 : tunables 0 0 0 : slabdata 0 0 0
ext4_allocation_context 0 0 112 36 1 : tunables 0 0 0 : slabdata 0 0 0
ext4_io_end 0 0 576 14 2 : tunables 0 0 0 : slabdata 0 0 0
ext4_io_page 0 0 8 512 1 : tunables 0 0 0 : slabdata 0 0 0
ext2_inode_cache 0 0 448 9 1 : tunables 0 0 0 : slabdata 0 0 0
ext3_inode_cache 952 952 464 17 2 : tunables 0 0 0 : slabdata 56 56 0
ext3_xattr 0 0 56 73 1 : tunables 0 0 0 : slabdata 0 0 0
dquot 0 0 160 25 1 : tunables 0 0 0 : slabdata 0 0 0
dnotify_mark 56 56 72 56 1 : tunables 0 0 0 : slabdata 1 1 0
dio 0 0 320 12 1 : tunables 0 0 0 : slabdata 0 0 0
UNIX 27 27 448 9 1 : tunables 0 0 0 : slabdata 3 3 0
UDP-Lite 0 0 544 15 2 : tunables 0 0 0 : slabdata 0 0 0
arp_cache 28 28 288 14 1 : tunables 0 0 0 : slabdata 2 2 0
RAW 15 15 544 15 2 : tunables 0 0 0 : slabdata 1 1 0
UDP 45 45 544 15 2 : tunables 0 0 0 : slabdata 3 3 0
tw_sock_TCP 32 32 128 32 1 : tunables 0 0 0 : slabdata 1 1 0
TCP 13 13 1216 13 4 : tunables 0 0 0 : slabdata 1 1 0
eventpoll_pwq 102 102 40 102 1 : tunables 0 0 0 : slabdata 1 1 0
blkdev_requests 54 54 216 18 1 : tunables 0 0 0 : slabdata 3 3 0
blkdev_ioc 0 0 48 85 1 : tunables 0 0 0 : slabdata 0 0 0
fsnotify_event_holder 256 256 16 256 1 : tunables 0 0 0 : slabdata 1 1 0
biovec-256 10 10 3072 10 8 : tunables 0 0 0 : slabdata 1 1 0
biovec-128 0 0 1536 10 4 : tunables 0 0 0 : slabdata 0 0 0
biovec-64 10 10 768 10 2 : tunables 0 0 0 : slabdata 1 1 0
sock_inode_cache 143 143 352 11 1 : tunables 0 0 0 : slabdata 13 13 0
file_lock_cache 36 36 112 36 1 : tunables 0 0 0 : slabdata 1 1 0
net_namespace 0 0 1248 13 4 : tunables 0 0 0 : slabdata 0 0 0
shmem_inode_cache 715 715 352 11 1 : tunables 0 0 0 : slabdata 65 65 0
taskstats 12 12 328 12 1 : tunables 0 0 0 : slabdata 1 1 0
proc_inode_cache 350 396 352 11 1 : tunables 0 0 0 : slabdata 36 36 0
sigqueue 56 56 144 28 1 : tunables 0 0 0 : slabdata 2 2 0
bdev_cache 9 9 448 9 1 : tunables 0 0 0 : slabdata 1 1 0
sysfs_dir_cache 11118 11118 80 51 1 : tunables 0 0 0 : slabdata 218 218 0
filp 925 925 160 25 1 : tunables 0 0 0 : slabdata 37 37 0
inode_cache 2508 2508 320 12 1 : tunables 0 0 0 : slabdata 209 209 0
dentry 4928 4928 128 32 1 : tunables 0 0 0 : slabdata 154 154 0
buffer_head 1984 1984 64 64 1 : tunables 0 0 0 : slabdata 31 31 0
vm_area_struct 1932 1932 88 46 1 : tunables 0 0 0 : slabdata 42 42 0
mm_struct 50 50 384 10 1 : tunables 0 0 0 : slabdata 5 5 0
signal_cache 84 84 640 12 2 : tunables 0 0 0 : slabdata 7 7 0
sighand_cache 84 84 1312 12 4 : tunables 0 0 0 : slabdata 7 7 0
task_struct 90 90 1088 15 4 : tunables 0 0 0 : slabdata 6 6 0
anon_vma_chain 2210 2210 24 170 1 : tunables 0 0 0 : slabdata 13 13 0
anon_vma 1280 1280 32 128 1 : tunables 0 0 0 : slabdata 10 10 0
radix_tree_node 962 962 304 13 1 : tunables 0 0 0 : slabdata 74 74 0
idr_layer_cache 260 260 152 26 1 : tunables 0 0 0 : slabdata 10 10 0
kmalloc-8192 44 44 8192 4 8 : tunables 0 0 0 : slabdata 11 11 0
kmalloc-4096 114 128 4096 8 8 : tunables 0 0 0 : slabdata 16 16 0
kmalloc-2048 88 160 2048 8 4 : tunables 0 0 0 : slabdata 20 20 0
kmalloc-1024 176 176 1024 8 2 : tunables 0 0 0 : slabdata 22 22 0
kmalloc-512 504 504 512 8 1 : tunables 0 0 0 : slabdata 63 63 0
kmalloc-256 240 240 256 16 1 : tunables 0 0 0 : slabdata 15 15 0
kmalloc-128 512 512 128 32 1 : tunables 0 0 0 : slabdata 16 16 0
kmalloc-64 1728 1728 64 64 1 : tunables 0 0 0 : slabdata 27 27 0
kmalloc-32 10240 10240 32 128 1 : tunables 0 0 0 : slabdata 80 80 0
kmalloc-192 546 546 192 21 1 : tunables 0 0 0 : slabdata 26 26 0
kmalloc-96 1092 1092 96 42 1 : tunables 0 0 0 : slabdata 26 26 0
kmem_cache 32 32 128 32 1 : tunables 0 0 0 : slabdata 1 1 0
kmem_cache_node 128 128 32 128 1 : tunables 0 0 0 : slabdata 1 1 0

--
David A. Madore
( http://www.madore.org/~david/ )
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c() [ In reply to ]
On Fri, 2012-08-31 at 19:21 -0700, Hugh Dickins wrote:

Hi,

> Francois is right that a GFP_ATOMIC allocation from pskb_expand_head()
> is failing, which can easily happen, and cause your "failed to reallocate
> TX buffer" errors; but it's well worth looking up what's actually on
> lines 2108 and 2109 of mm/page_alloc.c in 3.2.27:
>
> if (order >= MAX_ORDER) {
> WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
>
> That was probably not a sane allocation request, it has gone out of range:
> maybe the skb header is even corrupted. If you're lucky, it might be
> something that netdev will recognize as already fixed.

I have the same problem on the exact same hardware and found the cause:

Author: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue Apr 10 20:08:39 2012 +0000

net: allow pskb_expand_head() to get maximum tailroom

[ Upstream commit 87151b8689d890dfb495081f7be9b9e257f7a2df ]


It turns out this change has a bad side effect on drivers that uses
skb_recycle(), in that case mv643xx_eth.c

Since skb_recycle() resets skb->data using (skb->head + NET_SKB_PAD), a
recycled skb going multiple times through a path that needs to expand
skb head will get bigger and bigger each time, and you eventually end up
with an allocation failure.

An idea to fix this would be to pass needed skb size to skb_resize() and
set skb->data to MIN(NET_SKB_PAD, (skb->end - skb->head - skb_size) / 2)

skb recycling gives a small speed boost, but does not get a lot of test
coverage since only 3 drivers uses it

--
Maxime


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c() [ In reply to ]
On Thu, 2012-10-04 at 18:02 +0200, Maxime Bizon wrote:
> On Fri, 2012-08-31 at 19:21 -0700, Hugh Dickins wrote:
>
> Hi,
>
> > Francois is right that a GFP_ATOMIC allocation from pskb_expand_head()
> > is failing, which can easily happen, and cause your "failed to reallocate
> > TX buffer" errors; but it's well worth looking up what's actually on
> > lines 2108 and 2109 of mm/page_alloc.c in 3.2.27:
> >
> > if (order >= MAX_ORDER) {
> > WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
> >
> > That was probably not a sane allocation request, it has gone out of range:
> > maybe the skb header is even corrupted. If you're lucky, it might be
> > something that netdev will recognize as already fixed.
>
> I have the same problem on the exact same hardware and found the cause:
>
> Author: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Tue Apr 10 20:08:39 2012 +0000
>
> net: allow pskb_expand_head() to get maximum tailroom
>
> [ Upstream commit 87151b8689d890dfb495081f7be9b9e257f7a2df ]
>
>
> It turns out this change has a bad side effect on drivers that uses
> skb_recycle(), in that case mv643xx_eth.c
>
> Since skb_recycle() resets skb->data using (skb->head + NET_SKB_PAD), a
> recycled skb going multiple times through a path that needs to expand
> skb head will get bigger and bigger each time, and you eventually end up
> with an allocation failure.
>
> An idea to fix this would be to pass needed skb size to skb_resize() and
> set skb->data to MIN(NET_SKB_PAD, (skb->end - skb->head - skb_size) / 2)
>
> skb recycling gives a small speed boost, but does not get a lot of test
> coverage since only 3 drivers uses it
>

Thanks Maxime

Sure we can probably fix this issue, but its really not worth the pain.

I would get rid of it, its superseded by build_skb() to get cache hot
skbs anyway, and more over, rx path now uses skb->head allocated from a
page fragment for optimal GRO/TCP coalescing behavior.

skb_recycle() assumes skb allocation is slow, but its not per se.

Cache line misses are expensive, thats the real issue.



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c() [ In reply to ]
On Thu, 2012-10-04 at 18:02 +0200, Maxime Bizon wrote:
> O
>
> Since skb_recycle() resets skb->data using (skb->head + NET_SKB_PAD), a
> recycled skb going multiple times through a path that needs to expand
> skb head will get bigger and bigger each time, and you eventually end up
> with an allocation failure.
>

Because there is not enough headroom ?

> An idea to fix this would be to pass needed skb size to skb_resize() and
> set skb->data to MIN(NET_SKB_PAD, (skb->end - skb->head - skb_size) / 2)

I am trying to decode this but I cant ;)

What is skb_resize() ?
and what do you mean setting skb->data to MIN(NET_SKB_PAD, (skb->end -
skb->head - skb_size) / 2)

Care to explain again your idea ?

Thanks !



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c() [ In reply to ]
On Thu, 2012-10-04 at 18:50 +0200, Eric Dumazet wrote:

> > Since skb_recycle() resets skb->data using (skb->head + NET_SKB_PAD), a
> > recycled skb going multiple times through a path that needs to expand
> > skb head will get bigger and bigger each time, and you eventually end up
> > with an allocation failure.
> >
>
> Because there is not enough headroom ?

yes, on ipv6 forward path the default NET_SKB_PAD is too small, so each
packet forwarded has its headroom expanded, it is then recycled and gets
its original default headroom back, then it gets forwarded,
expanded, ...

> > An idea to fix this would be to pass needed skb size to skb_resize() and
> > set skb->data to MIN(NET_SKB_PAD, (skb->end - skb->head - skb_size) / 2)
>
> I am trying to decode this but I cant ;)
>
> What is skb_resize() ?
> and what do you mean setting skb->data to MIN(NET_SKB_PAD, (skb->end -
> skb->head - skb_size) / 2)
>
> Care to explain again your idea ?

damn typo, I meant skb_recycle(), and my formula is probably wrong.

skb_size is passed to skb_recycle_check() to ensure the skb is at least
that big.

The idea is it to pass the same value to skb_recycle(), allowing it to
set skb->data somewhat at the middle of skb head space (after honoring
NET_SKB_PAD), that way the recycled skb won't have its head expanded
again if it takes the same path.

--
Maxime


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c() [ In reply to ]
On Thu, 2012-10-04 at 19:09 +0200, Maxime Bizon wrote:

> yes, on ipv6 forward path the default NET_SKB_PAD is too small, so each
> packet forwarded has its headroom expanded, it is then recycled and gets
> its original default headroom back, then it gets forwarded,
> expanded, ...

Hmm, this sounds bad (especially without recycle)

Might I assume NET_SKB_PAD is 32 on this arch ?

We might adjust NET_SKB_PAD to avoid all these memory copies...


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c() [ In reply to ]
On Thu, 2012-10-04 at 19:17 +0200, Eric Dumazet wrote:

> > yes, on ipv6 forward path the default NET_SKB_PAD is too small, so each
> > packet forwarded has its headroom expanded, it is then recycled and gets
> > its original default headroom back, then it gets forwarded,
> > expanded, ...
>
> Hmm, this sounds bad (especially without recycle)
>
> Might I assume NET_SKB_PAD is 32 on this arch ?

It is, I have a setup with 6to4 tunneling, so needed headroom on tx is
quite big.

I used to be careful about raising this value to avoid drivers using
slab-4096 instead of slab-2048, but since our boards no longer have 16MB
of RAM and with the recent changes in mainline it doesn't seem to be an
issue anymore.

It's not a that big issue in the non recycle case, just lower
performance if the tunable is not set correctly. Though it would be nice
to have a stat/counter so you know when you hit this kind of slow path.

But on the recycle case, skb->head is reallocated to twice the size each
time the packet is recycled and takes the same path again. This stresses
the VM and you eventually get packet loss (and scary printk)

--
Maxime


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c() [ In reply to ]
On Thu, 2012-10-04 at 19:34 +0200, Maxime Bizon wrote:
> On Thu, 2012-10-04 at 19:17 +0200, Eric Dumazet wrote:
>
> > > yes, on ipv6 forward path the default NET_SKB_PAD is too small, so each
> > > packet forwarded has its headroom expanded, it is then recycled and gets
> > > its original default headroom back, then it gets forwarded,
> > > expanded, ...
> >
> > Hmm, this sounds bad (especially without recycle)
> >
> > Might I assume NET_SKB_PAD is 32 on this arch ?
>
> It is, I have a setup with 6to4 tunneling, so needed headroom on tx is
> quite big.
>

If we change NET_SKB_PAD minimum to be 64 (as it is on x86), it should
be enough for the added tunnel encapsulation or not ?


> I used to be careful about raising this value to avoid drivers using
> slab-4096 instead of slab-2048, but since our boards no longer have 16MB
> of RAM and with the recent changes in mainline it doesn't seem to be an
> issue anymore.

Yes, granted we can allocate order-3 pages for delivering skb->head
fragments, adding 32 bytes doesnt switch to slab-4096 since we dont use
it anymore.

>
> It's not a that big issue in the non recycle case, just lower
> performance if the tunable is not set correctly. Though it would be nice
> to have a stat/counter so you know when you hit this kind of slow path.
>

Yeah, we already mentioned this idea in the past. We are lazy now we
have good performance tools (perf)

> But on the recycle case, skb->head is reallocated to twice the size each
> time the packet is recycled and takes the same path again. This stresses
> the VM and you eventually get packet loss (and scary printk)
>

OK, so to fix this on stable trees, skb_recycle() should not recycle skb
if skb->head is too big.

By the way, another problem with skb_recycle() is that skb->truesize can
be wrong as well. (One skb might had one frag, with a truesize of
2048/4096 bytes, and this frag was pulled in skb->head, so skb->truesize
is slightly wrong.

So we also must check if skb->truesize is equal to
SKB_TRUESIZE(skb_end_offset(skb)), or reset it in skb_recycle(),
I have no strong opinion.

Something like this (untested) patch :

But I really think we should remove skb_recycle() when net-next is
opened again.


diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index b33a3a1..13ca215 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2659,7 +2659,10 @@ static inline bool skb_is_recycleable(const struct sk_buff *skb, int skb_size)
skb_size = SKB_DATA_ALIGN(skb_size + NET_SKB_PAD);
if (skb_end_offset(skb) < skb_size)
return false;
-
+ if (skb_end_offset(skb) > 2 * skb_size)
+ return false;
+ if (skb->truesize != SKB_TRUESIZE(skb_end_offset(skb)))
+ return false;
if (skb_shared(skb) || skb_cloned(skb))
return false;



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c() [ In reply to ]
On Thu, 2012-10-04 at 18:29 +0200, Eric Dumazet wrote:
> On Thu, 2012-10-04 at 18:02 +0200, Maxime Bizon wrote:
> > On Fri, 2012-08-31 at 19:21 -0700, Hugh Dickins wrote:
> >
> > Hi,
> >
> > > Francois is right that a GFP_ATOMIC allocation from pskb_expand_head()
> > > is failing, which can easily happen, and cause your "failed to reallocate
> > > TX buffer" errors; but it's well worth looking up what's actually on
> > > lines 2108 and 2109 of mm/page_alloc.c in 3.2.27:
> > >
> > > if (order >= MAX_ORDER) {
> > > WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
> > >
> > > That was probably not a sane allocation request, it has gone out of range:
> > > maybe the skb header is even corrupted. If you're lucky, it might be
> > > something that netdev will recognize as already fixed.
> >
> > I have the same problem on the exact same hardware and found the cause:
> >
> > Author: Eric Dumazet <eric.dumazet@gmail.com>
> > Date: Tue Apr 10 20:08:39 2012 +0000
> >
> > net: allow pskb_expand_head() to get maximum tailroom
> >
> > [ Upstream commit 87151b8689d890dfb495081f7be9b9e257f7a2df ]
> >
> >
> > It turns out this change has a bad side effect on drivers that uses
> > skb_recycle(), in that case mv643xx_eth.c
> >
> > Since skb_recycle() resets skb->data using (skb->head + NET_SKB_PAD), a
> > recycled skb going multiple times through a path that needs to expand
> > skb head will get bigger and bigger each time, and you eventually end up
> > with an allocation failure.
> >
> > An idea to fix this would be to pass needed skb size to skb_resize() and
> > set skb->data to MIN(NET_SKB_PAD, (skb->end - skb->head - skb_size) / 2)
> >
> > skb recycling gives a small speed boost, but does not get a lot of test
> > coverage since only 3 drivers uses it
> >
>
> Thanks Maxime

By the way, the commit you pointed has no effect on the reallocation
performed by pskb_expand_head() :

int size = nhead + skb_end_offset(skb) + ntail;

So pskb_expand_head() always assumed the current head is fully used, and
because we have some kmalloc-power-of-two contraints, each time
pskb_expand_head() is called with a non zero (nhead + ntail) we double
the skb->head ksize.

So why are we using skb_end_offset(skb) here is the question.

I guess it could be (skb_tail_pointer(skb) - skb->head) on some uses.

(ie not counting the unused bytes from tail_pointer to end_pointer)

Its probably long to audit all pskb_expand_head() users (about 77 in
tree), but most of them use (nhead = 0, ntail = 0)

It looks like the following patch should be fine.


diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index cdc2859..dd42c6a 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1053,11 +1053,22 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
{
int i;
u8 *data;
- int size = nhead + skb_end_offset(skb) + ntail;
+ unsigned int tail_offset = skb_tail_pointer(skb) - skb->head;
+ int size = nhead + ntail;
long off;

BUG_ON(nhead < 0);

+ /* callers using nhead == 0 and ntail == 0 want to get a fresh copy,
+ * so allocate same amount of memory (skb_end_offset)
+ * For others, they want extra headroom or tailroom against the
+ * currently used portion of header (skb->head -> skb_tail_pointer)
+ */
+ if (!size)
+ size = skb_end_offset(skb);
+ else
+ size += tail_offset;
+
if (skb_shared(skb))
BUG();

@@ -1074,7 +1085,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
/* Copy only real data... and, alas, header. This should be
* optimized for the cases when header is void.
*/
- memcpy(data + nhead, skb->head, skb_tail_pointer(skb) - skb->head);
+ memcpy(data + nhead, skb->head, tail_offset);

memcpy((struct skb_shared_info *)(data + size),
skb_shinfo(skb),


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c() [ In reply to ]
On Fri, 2012-10-05 at 09:41 +0200, Eric Dumazet wrote:

> By the way, the commit you pointed has no effect on the reallocation
> performed by pskb_expand_head() :

The commit has a side effect, because the problem appeared after it was
merged (and goes away if I revert it)

> int size = nhead + skb_end_offset(skb) + ntail;
>
> So pskb_expand_head() always assumed the current head is fully used, and
> because we have some kmalloc-power-of-two contraints, each time
> pskb_expand_head() is called with a non zero (nhead + ntail) we double
> the skb->head ksize.

That is true, but only after the commit I mentioned.

Before that commit, we indeed reallocate skb->head to twice the size,
but skb->end is *not* positioned at the end of newly allocated data. So
on the next pskb_expand_head(), if head and tail are not big values, the
kmalloc() will be of the same size.


The commit adds this after allocation:

size = SKB_WITH_OVERHEAD(ksize(data))
[...]
skb->end = skb->head + size;

so on the next pskb_expand_head, we are going to allocate twice the size
for sure.

> So why are we using skb_end_offset(skb) here is the question.
>
> I guess it could be (skb_tail_pointer(skb) - skb->head) on some uses.

I think your patch is wrong, ntail is not the new tailroom size, it's
what missing to the current tailroom size, by adding ntail + nhead +
tail_offset we are removing previous tailroom.

We cannot shrink the skb that way here I guess, a caller may check
needed headroom & tailroom, calls with nhead=1/ntail=0 because only
headroom is missing, but after the call tailroom would be less than
before the call.

Why don't we juste reallocate to this size:

MAX(current_alloc_size, nhead + ntail + current_end - current_head)

--
Maxime


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c() [ In reply to ]
On Fri, 2012-10-05 at 12:49 +0200, Maxime Bizon wrote:
> On Fri, 2012-10-05 at 09:41 +0200, Eric Dumazet wrote:
>
> > By the way, the commit you pointed has no effect on the reallocation
> > performed by pskb_expand_head() :
>
> The commit has a side effect, because the problem appeared after it was
> merged (and goes away if I revert it)
>
> > int size = nhead + skb_end_offset(skb) + ntail;
> >
> > So pskb_expand_head() always assumed the current head is fully used, and
> > because we have some kmalloc-power-of-two contraints, each time
> > pskb_expand_head() is called with a non zero (nhead + ntail) we double
> > the skb->head ksize.
>
> That is true, but only after the commit I mentioned.
>
> Before that commit, we indeed reallocate skb->head to twice the size,
> but skb->end is *not* positioned at the end of newly allocated data. So
> on the next pskb_expand_head(), if head and tail are not big values, the
> kmalloc() will be of the same size.
>
>
> The commit adds this after allocation:
>
> size = SKB_WITH_OVERHEAD(ksize(data))
> [...]
> skb->end = skb->head + size;
>
> so on the next pskb_expand_head, we are going to allocate twice the size
> for sure.

Yes, but the idea of the patch was to _avoid_ next pskb_expand_head()
calls...

Its defeated because you have a too small NET_SKB_PAD, and skb_recycle()
inability to properly detect ans skb is oversized.

>
> > So why are we using skb_end_offset(skb) here is the question.
> >
> > I guess it could be (skb_tail_pointer(skb) - skb->head) on some uses.
>
> I think your patch is wrong, ntail is not the new tailroom size, it's
> what missing to the current tailroom size, by adding ntail + nhead +
> tail_offset we are removing previous tailroom.
>



> We cannot shrink the skb that way here I guess, a caller may check
> needed headroom & tailroom, calls with nhead=1/ntail=0 because only
> headroom is missing, but after the call tailroom would be less than
> before the call.
>
> Why don't we juste reallocate to this size:
>
> MAX(current_alloc_size, nhead + ntail + current_end - current_head)

Hmm,

this changes nothing assuming current_end == skb_end_offset(skb)
and current_head = skb->head

Not sure what you mean.

I guess the right answer is to change API, because we have no clue if
the callers want some tailroom or not.

If they have 'enough' they currently pass 0, so we dont really now how
many bytes they really need..

In fact very few call sites need to increase the tailroom, so it's
probably very easy to do this change.

New convention would be : pass number of needed bytes after current
tail, not after current end.



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c() [ In reply to ]
On Fri, 2012-10-05 at 14:22 +0200, Eric Dumazet wrote:
> On Fri, 2012-10-05 at 12:49 +0200, Maxime Bizon wrote:
> > On Fri, 2012-10-05 at 09:41 +0200, Eric Dumazet wrote:
> >
> > > By the way, the commit you pointed has no effect on the reallocation
> > > performed by pskb_expand_head() :
> >
> > The commit has a side effect, because the problem appeared after it was
> > merged (and goes away if I revert it)
> >
> > > int size = nhead + skb_end_offset(skb) + ntail;
> > >
> > > So pskb_expand_head() always assumed the current head is fully used, and
> > > because we have some kmalloc-power-of-two contraints, each time
> > > pskb_expand_head() is called with a non zero (nhead + ntail) we double
> > > the skb->head ksize.
> >
> > That is true, but only after the commit I mentioned.
> >
> > Before that commit, we indeed reallocate skb->head to twice the size,
> > but skb->end is *not* positioned at the end of newly allocated data. So
> > on the next pskb_expand_head(), if head and tail are not big values, the
> > kmalloc() will be of the same size.
> >
> >
> > The commit adds this after allocation:
> >
> > size = SKB_WITH_OVERHEAD(ksize(data))
> > [...]
> > skb->end = skb->head + size;
> >
> > so on the next pskb_expand_head, we are going to allocate twice the size
> > for sure.
>
> Yes, but the idea of the patch was to _avoid_ next pskb_expand_head()
> calls...
>
> Its defeated because you have a too small NET_SKB_PAD, and skb_recycle()
> inability to properly detect ans skb is oversized.
>
> >
> > > So why are we using skb_end_offset(skb) here is the question.
> > >
> > > I guess it could be (skb_tail_pointer(skb) - skb->head) on some uses.
> >
> > I think your patch is wrong, ntail is not the new tailroom size, it's
> > what missing to the current tailroom size, by adding ntail + nhead +
> > tail_offset we are removing previous tailroom.
> >
>
>
>
> > We cannot shrink the skb that way here I guess, a caller may check
> > needed headroom & tailroom, calls with nhead=1/ntail=0 because only
> > headroom is missing, but after the call tailroom would be less than
> > before the call.
> >
> > Why don't we juste reallocate to this size:
> >
> > MAX(current_alloc_size, nhead + ntail + current_end - current_head)
>
> Hmm,
>
> this changes nothing assuming current_end == skb_end_offset(skb)
> and current_head = skb->head
>
> Not sure what you mean.

Following patch maybe ...

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index cdc2859..f6c1f52 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1053,11 +1053,22 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
{
int i;
u8 *data;
- int size = nhead + skb_end_offset(skb) + ntail;
+ unsigned int tail_offset = skb_tail_pointer(skb) - skb->head;
+ int size = nhead + ntail;
long off;

BUG_ON(nhead < 0);

+ /* callers using nhead == 0 and ntail == 0 wants to get a fresh copy,
+ * so allocate same amount of memory (skb_end_offset)
+ * For others, they want extra head or tail against the currently
+ * used portion of header (skb->head -> skb_tail_pointer).
+ * But we dont shrink the head.
+ */
+ if (size)
+ size += tail_offset;
+ size = max_t(int, size, skb_end_offset(skb));
+
if (skb_shared(skb))
BUG();

@@ -1074,7 +1085,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
/* Copy only real data... and, alas, header. This should be
* optimized for the cases when header is void.
*/
- memcpy(data + nhead, skb->head, skb_tail_pointer(skb) - skb->head);
+ memcpy(data + nhead, skb->head, tail_offset);

memcpy((struct skb_shared_info *)(data + size),
skb_shinfo(skb),


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c() [ In reply to ]
On Fri, 2012-10-05 at 14:37 +0200, Eric Dumazet wrote:

> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index cdc2859..f6c1f52 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -1053,11 +1053,22 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
> {
> int i;
> u8 *data;
> - int size = nhead + skb_end_offset(skb) + ntail;
> + unsigned int tail_offset = skb_tail_pointer(skb) - skb->head;
> + int size = nhead + ntail;
> long off;
>
> BUG_ON(nhead < 0);
>
> + /* callers using nhead == 0 and ntail == 0 wants to get a fresh copy,
> + * so allocate same amount of memory (skb_end_offset)
> + * For others, they want extra head or tail against the currently
> + * used portion of header (skb->head -> skb_tail_pointer).
> + * But we dont shrink the head.
> + */
> + if (size)
> + size += tail_offset;
> + size = max_t(int, size, skb_end_offset(skb));
> +
>

This can be factorized to :

size = tail_offset + nhead + ntail;
size = max_t(int, size, skb_end_offset(skb));



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c() [ In reply to ]
On Fri, 2012-10-05 at 14:22 +0200, Eric Dumazet wrote:

> Yes, but the idea of the patch was to _avoid_ next pskb_expand_head()
> calls...

yes but we cannot be sure of that, the caller may not have a good idea
of the headroom needed for the whole lifetime of the skb

it's better to think we will reduce number of calls, not avoid them

that's why I think doubling the size each time is dangerous, since we
silently request bigger and bigger allocations if an skb takes an
unoptimized path

> Hmm,
>
> this changes nothing assuming current_end == skb_end_offset(skb)
> and current_head = skb->head

My idea was to leave skb->end at its last position even if we grow
skb->head.

Since we have a way to know the current allocation size of skb->head,
further pskb_expand_head() calls to request tailroom would just push
skb->tail & skb->end together if that fits in current ksize().

I've not looked at recent changes in mainline, since you changed how
skb->head is managed, that may be totally impossible.

Your proposed changed API change to expand_head will fill this anyway.

> New convention would be : pass number of needed bytes after current
> tail, not after current end.

Fully agree on this

--
Maxime


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c() [ In reply to ]
On Fri, 2012-10-05 at 14:51 +0200, Maxime Bizon wrote:

> > New convention would be : pass number of needed bytes after current
> > tail, not after current end.
>
> Fully agree on this
>

Here is the proposal :

Change all occurrences of :

pskb_expand_head(skb, 0, 0, gfp)

to

pskb_realloc_head(skb, gfp)
[ an inline calling a common function ]



And few other pskb_expand_head() calls a new function

pskb_may_expand_head(skb, nhead, ntail, gfp)
[ an inline calling same common function ]


nhead : number of needed bytes before [data,tail] portion
(might be already < skb_headroom(skb))

ntail : number of needed bytes after [data,tail] portion
(might be already > skb_tailroom(skb))

For example, net/xfrm/xfrm_output.c would be like the following :

(Note how all the various ugly tests will be centralized in the common
function)

And the common function would now know the caller intent.


diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
index 95a338c..15760df 100644
--- a/net/xfrm/xfrm_output.c
+++ b/net/xfrm/xfrm_output.c
@@ -24,18 +24,10 @@ static int xfrm_output2(struct sk_buff *skb);
static int xfrm_skb_check_space(struct sk_buff *skb)
{
struct dst_entry *dst = skb_dst(skb);
- int nhead = dst->header_len + LL_RESERVED_SPACE(dst->dev)
- - skb_headroom(skb);
- int ntail = dst->dev->needed_tailroom - skb_tailroom(skb);
-
- if (nhead <= 0) {
- if (ntail <= 0)
- return 0;
- nhead = 0;
- } else if (ntail < 0)
- ntail = 0;
-
- return pskb_expand_head(skb, nhead, ntail, GFP_ATOMIC);
+ int nhead = dst->header_len + LL_RESERVED_SPACE(dst->dev);
+ int ntail = dst->dev->needed_tailroom;
+
+ return pskb_may_expand_head(skb, nhead, ntail, GFP_ATOMIC);
}

static int xfrm_output_one(struct sk_buff *skb, int err)


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c() [ In reply to ]
On Fri, 2012-10-05 at 15:02 +0200, Eric Dumazet wrote:

> On Fri, 2012-10-05 at 14:51 +0200, Maxime Bizon wrote:
>
> > > New convention would be : pass number of needed bytes after current
> > > tail, not after current end.
> >
> > Fully agree on this
> >
>
> Here is the proposal :

Looks fine

What is your plan for the actual pskb_expand_head() code now that you
will have absolute values for headroom & tailroom ?

Because there will still be callers that have no clue of required
tailroom (nor further headroom requirement), like skb_cow() in
vlan_reorder_header().

--
Maxime


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c() [ In reply to ]
On Fri, 2012-10-05 at 16:50 +0200, Maxime Bizon wrote:
> On Fri, 2012-10-05 at 15:02 +0200, Eric Dumazet wrote:
>
> > On Fri, 2012-10-05 at 14:51 +0200, Maxime Bizon wrote:
> >
> > > > New convention would be : pass number of needed bytes after current
> > > > tail, not after current end.
> > >
> > > Fully agree on this
> > >
> >
> > Here is the proposal :
>
> Looks fine
>
> What is your plan for the actual pskb_expand_head() code now that you
> will have absolute values for headroom & tailroom ?
>

They stay relative values.

For example, netlink_trim() really wants to shrink the skb head.


> Because there will still be callers that have no clue of required
> tailroom (nor further headroom requirement), like skb_cow() in
> vlan_reorder_header().
>

What I plan is to not shrink size, unless specifically asked.

Its 3.8 material anyway, so a stable fix is needed on skb_recycle() and
NET_SKB_PAD minimal value.



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c() [ In reply to ]
On Fri, 2012-10-05 at 17:04 +0200, Eric Dumazet wrote:
>
>
> What I plan is to not shrink size, unless specifically asked.
>
> Its 3.8 material anyway, so a stable fix is needed on skb_recycle()
> and NET_SKB_PAD minimal value.

You think removing skb_recycle() is too big a change for stable ?

Driver change is simple, as recycling is not guaranteed today you have
this:

if (!try_recycle(skb))
skb = alloc_skb()
>
we just remove the try_recycle part, we are not adding any new code
path.


I'm not the right person to give you a correct NET_SKB_PAD value, I have
a lot of out of tree patches in my kernels, some custom drivers as well
that needs quite a lot of headroom, and uncommon network setup.

--
Maxime


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

1 2  View All