Mailing List Archive

[PATCH] ospfd: avoid exhausting memory with OSPF vertices (BZ#476)
This was found in scale testing at OSR; ospfd is adding the same link
over and over again to the SPF tree. This fix prevents the resulting
memory corruption from happening and adds a debug message to track
occurence of this issue and/or confirm a proper fix.

(This version was improved by Scott Feldman over the earlier RFC.)

* ospfd/ospf_spf.c: (ospf_spf_add_parent) loop over existing vertices
and refuse to add duplicates.

Tested-by: Martin Winter <mwinter@opensourcerouting.org>
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
---
ospfd/ospf_spf.c | 17 +++++++++++++++--
1 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/ospfd/ospf_spf.c b/ospfd/ospf_spf.c
index d7f9ba2..974875e 100644
--- a/ospfd/ospf_spf.c
+++ b/ospfd/ospf_spf.c
@@ -422,7 +422,8 @@ ospf_spf_add_parent (struct vertex *v, struct vertex *w,
struct vertex_nexthop *newhop,
unsigned int distance)
{
- struct vertex_parent *vp;
+ struct vertex_parent *vp, *wp;
+ struct listnode *node;

/* we must have a newhop, and a distance */
assert (v && w && newhop);
@@ -456,7 +457,19 @@ ospf_spf_add_parent (struct vertex *v, struct vertex *w,
w->distance = distance;
}

- /* new parent is <= existing parents, add it to parent list */
+ /* new parent is <= existing parents, add it to parent list (if nexthop
+ * not on parent list)
+ */
+ for (ALL_LIST_ELEMENTS_RO(w->parents, node, wp))
+ {
+ if (memcmp(newhop, wp->nexthop, sizeof(*newhop)) == 0)
+ {
+ if (IS_DEBUG_OSPF_EVENT)
+ zlog_debug ("%s: ... nexthop already on parent list, skipping add", __func__);
+ return;
+ }
+ }
+
vp = vertex_parent_new (v, ospf_lsa_has_link (w->lsa, v->lsa), newhop);
listnode_add (w->parents, vp);

--
1.7.8.6

_______________________________________________
Quagga-dev mailing list
Quagga-dev@lists.quagga.net
http://lists.quagga.net/mailman/listinfo/quagga-dev
Re: [PATCH] ospfd: avoid exhausting memory with OSPF vertices (BZ#476) [ In reply to ]
On Wed, 25 Jul 2012, David Lamparter wrote:

> This was found in scale testing at OSR; ospfd is adding the same link
> over and over again to the SPF tree. This fix prevents the resulting
> memory corruption from happening and adds a debug message to track
> occurence of this issue and/or confirm a proper fix.

So not a fix, but a hack to make the problem go away, until a fix is
found? :)

It'd be really good to document a topology that causes this "add same link
forever" problem to occur. I.e. in terms of the router and network LSA
content. Ideally, to find a very simple topology that triggers this.

The other thing we desperately need is unit tests for the SPF code.

regards,
--
Paul Jakma paul@jakma.org @pjakma Key ID: 64A2FF6A
Fortune:
He who slings mud generally loses ground.
-- Adlai Stevenson
_______________________________________________
Quagga-dev mailing list
Quagga-dev@lists.quagga.net
http://lists.quagga.net/mailman/listinfo/quagga-dev
Re: [PATCH] ospfd: avoid exhausting memory with OSPF vertices (BZ#476) [ In reply to ]
On Wed, 25 Jul 2012, Paul Jakma wrote:

> The other thing we desperately need is unit tests for the SPF code.

Which is bug #349:

https://bugzilla.quagga.net/show_bug.cgi?id=349

I'd really recommend resolving that bug before trying to make any changes
to the SPF code.

regards,
--
Paul Jakma paul@jakma.org @pjakma Key ID: 64A2FF6A
Fortune:
That all men should be brothers is the dream of people who have no brothers.
-- Charles Chincholles, "Pensees de tout le monde"
_______________________________________________
Quagga-dev mailing list
Quagga-dev@lists.quagga.net
http://lists.quagga.net/mailman/listinfo/quagga-dev
Re: [PATCH] ospfd: avoid exhausting memory with OSPF vertices (BZ#476) [ In reply to ]
On Jul 25, 2012, at 10:44 AM, Paul Jakma wrote:

> On Wed, 25 Jul 2012, David Lamparter wrote:
>
>> This was found in scale testing at OSR; ospfd is adding the same link
>> over and over again to the SPF tree. This fix prevents the resulting
>> memory corruption from happening and adds a debug message to track
>> occurence of this issue and/or confirm a proper fix.
>
> It'd be really good to document a topology that causes this "add same link forever" problem to occur. I.e. in terms of the router and network LSA content. Ideally, to find a very simple topology that triggers this.

The topology used in testing was an NxN grid of routers. For example, a 4x4 grid:

r1
/\
/ \
r2 r3
/\ /\
/ \/ \
r4 r5 r6
/\ /\ /\
/ \/ \/ \
r7 r8 r9 r10
\ /\ /\ /
\/ \/ \/
r11 r12 r13
\ /\ /
\/ \/
r14 r15
\ /
\/
r16

So from root r1, r4 gets nh thru r2, r5 gets nhs thru r2/3, and r8 inherits nhs from r4 and r5, so it get 3 nhs, two of which are duplicates. The further you get from root, the more duplication. For this small grid, the duplicates aren't a big deal. However, the duplicate nhs grow combinatorially, so for a larger N, say 13, you have a very large number indeed. Enough to exhaust memory in our testing. The patch removes the duplicates as SPF works thru the graph, so r16 above would have just the 2 nhs, not the 20 nhs without the patch.

-scott
_______________________________________________
Quagga-dev mailing list
Quagga-dev@lists.quagga.net
http://lists.quagga.net/mailman/listinfo/quagga-dev
Re: [PATCH] ospfd: avoid exhausting memory with OSPF vertices (BZ#476) [ In reply to ]
On Wed, 25 Jul 2012, Scott Feldman wrote:

> So from root r1, r4 gets nh thru r2, r5 gets nhs thru r2/3, and r8
> inherits nhs from r4 and r5, so it get 3 nhs, two of which are
> duplicates. The further you get from root, the more duplication. For
> this small grid, the duplicates aren't a big deal. However, the
> duplicate nhs grow combinatorially,

Ah. I had the impression this was some kind of logic bug - a buggy bug.
Didn't realise it was an "extremely inefficient" bug. :) So this is the
correct fix.

One thing, I'd suggest perhaps arranging the fix:

- you set a compare function on the linked list

- you add a listnode_add_uniq function using that

(or otherwise extend the linked-lists to add possibility for having lists
that don't admit duplicates)

regards,
--
Paul Jakma paul@jakma.org @pjakma Key ID: 64A2FF6A
Fortune:
One of the chief duties of the mathematician in acting as an advisor...
is to discourage... from expecting too much from mathematics.
-- N. Wiener
_______________________________________________
Quagga-dev mailing list
Quagga-dev@lists.quagga.net
http://lists.quagga.net/mailman/listinfo/quagga-dev
Re: [PATCH] ospfd: avoid exhausting memory with OSPF vertices (BZ#476) [ In reply to ]
On Jul 25, 2012, at 12:52 PM, Paul Jakma wrote:

> On Wed, 25 Jul 2012, Scott Feldman wrote:
>
>> So from root r1, r4 gets nh thru r2, r5 gets nhs thru r2/3, and r8 inherits nhs from r4 and r5, so it get 3 nhs, two of which are duplicates. The further you get from root, the more duplication. For this small grid, the duplicates aren't a big deal. However, the duplicate nhs grow combinatorially,
>
> Ah. I had the impression this was some kind of logic bug - a buggy bug. Didn't realise it was an "extremely inefficient" bug. :) So this is the correct fix.

Thanks for confirming.

> One thing, I'd suggest perhaps arranging the fix:
>
> - you set a compare function on the linked list
>
> - you add a listnode_add_uniq function using that
>
> (or otherwise extend the linked-lists to add possibility for having lists that don't admit duplicates)

That's a good idea. Or, I noticed this comment on vertex_parent_new:

/* TODO: Parent list should be excised, in favour of maintaining only
* vertex_nexthop, with refcounts.
*/

So that's an alternative to the link list. Do you have a recommendation which way to go?

In the meantime, if there are no objections, let's go with the patch David submitted. Making the list duplicate check or using refcounts can be added later.

-scott
_______________________________________________
Quagga-dev mailing list
Quagga-dev@lists.quagga.net
http://lists.quagga.net/mailman/listinfo/quagga-dev
Re: [PATCH] ospfd: avoid exhausting memory with OSPF vertices (BZ#476) [ In reply to ]
On Wed, 25 Jul 2012, Scott Feldman wrote:

> That's a good idea. Or, I noticed this comment on vertex_parent_new:
>
> /* TODO: Parent list should be excised, in favour of maintaining only
> * vertex_nexthop, with refcounts.
> */
>
> So that's an alternative to the link list. Do you have a recommendation
> which way to go?

Going over that comment, I can't see any way to get rid of the parent
list. That comment should just be deleted.

> In the meantime, if there are no objections, let's go with the patch
> David submitted. Making the list duplicate check or using refcounts can
> be added later.

Sorted list should be trivial to add after the patch, yes.

I wonder if there are realistic, large networks where even that would be a
performance problem (10+ parent vertices)? Then it'd need a trie indexed
on the parent router-ID perhaps. Hopefully that's never an issue. :)

regards,
--
Paul Jakma paul@jakma.org @pjakma Key ID: 64A2FF6A
Fortune:
The more things change, the more they stay insane.
_______________________________________________
Quagga-dev mailing list
Quagga-dev@lists.quagga.net
http://lists.quagga.net/mailman/listinfo/quagga-dev