Mailing List Archive

PF_RING 75-88% packet loss (using Suricata)
This is my first time posting so I apologize if this is a simple
question or has been asked before.

I am seeing 75-88% packet loss on PF_RING, running Suricata 1.3dev
(rev e6dea5c) and PF_RING 5.3.0 on CentOS. Suricata is pegging
all four 1.6 GHz processor cores but the reason I'm posting here is
because it looks like PF_RING is responsible for all the drops.

The suricata.drop log is not showing drops and I'm running Suricata
with the pf_ring options '--pfring-int=eth2 --pfring-cluster-id=99
--pfring-cluster-type=cluster_flow ' and '--runmode=autofp' (I have
also increased pre-allocation, reassembly, and session memory sizes in
Suricata's config).
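
For reference, the same settings expressed in suricata.yaml (key names as in
the 1.3-era config, so adjust if yours differs) would look roughly like:

pfring:
  interface: eth2
  threads: 1
  cluster-id: 99
  cluster-type: cluster_flow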

ifconfig doesn't show the drops (except for some packets that wanted
to be forwarded and 1 checksum error):

# /sbin/ifconfig eth2
eth2 Link encap:Ethernet HWaddr 00:1B:78:31:D1:D4
UP BROADCAST RUNNING NOARP PROMISC MULTICAST MTU:1500 Metric:1
RX packets:2340853835 errors:1 dropped:130 overruns:0 frame:1
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2910286855 (2.7 GiB) TX bytes:0 (0.0 b)
Interrupt:185 Memory:f4000000-f4012800

# cat /proc/net/pf_ring/1509-eth2.23
Bound Device(s) : eth2
Slot Version : 13 [5.3.0]
Active : 1
Breed : Non-DNA
Sampling Rate : 1
Capture Direction : RX+TX
Socket Mode : RX+TX
Appl. Name : Suricata
IP Defragment : No
BPF Filtering : Disabled
# Sw Filt. Rules : 0
# Hw Filt. Rules : 0
Poll Pkt Watermark : 128
Num Poll Calls : 4711762
Channel Id : -1
Cluster Id : 0
Min Num Slots : 4982
Bucket Len : 1522
Slot Len : 1682 [bucket+header]
Tot Memory : 8388608
Tot Packets : 3654955802
Tot Pkt Lost : 2819352763
Tot Insert : 835603039
Tot Read : 835593109
Insert Offset : 6562037
Remove Offset : 6565218
Tot Fwd Ok : 0
Tot Fwd Errors : 0
Num Free Slots : 0

# cat /proc/net/pf_ring/info
PF_RING Version : 5.3.0 ($Revision: exported$)
Ring slots : 4096
Slot version : 13
Capture TX : Yes [RX+TX]
IP Defragment : No
Socket Mode : Standard
Transparent mode : Yes (mode 0)
Total rings : 1
Total plugins : 0

I already increased some memory limits in the OS:

sysctl -w net.core.rmem_max=33554432
sysctl -w net.core.wmem_max=33554432
sysctl -w net.ipv4.tcp_rmem=33554432
sysctl -w net.ipv4.tcp_wmem=33554432
sysctl -w net.core.netdev_max_backlog=5000
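
(Side note: net.ipv4.tcp_rmem and tcp_wmem expect a "min default max" triple
rather than a single value, so those two lines are probably not applied as
intended. The usual three-value form is something like the lines below, though
TCP socket buffers shouldn't matter for traffic sniffed off a span port anyway.)

sysctl -w net.ipv4.tcp_rmem="4096 87380 33554432"
sysctl -w net.ipv4.tcp_wmem="4096 16384 33554432"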

RAM usage on the box is less than half of the 3+ GB and eth2 basically
sits off a span port on the switch and sees 40-60 MiB of traffic.

Any idea why PF_RING is dropping so much? Let me know what other info you need.

Thanks.

-Mike Cox
_______________________________________________
Ntop-misc mailing list
Ntop-misc@listgateway.unipi.it
http://listgateway.unipi.it/mailman/listinfo/ntop-misc

Re: PF_RING 75-88% packet loss (using Suricata)
On 03/05/12 16:28, Mike Cox wrote:
> This is my first time posting so I apologize if this is a simple
> question or has been asked before.
>
> I am seeing 75-88% packet loss on PF_RING, running Suricata 1.3dev
> (rev e6dea5c) and PF_RING 5.3.0 on CentOS. Suricata is pegging
> all four 1.6 GHz processor cores but the reason I'm posting here is
> because it looks like PF_RING is responsible for all the drops.
>
> The suricata.drop log is not showing drops and I'm running Suricata
> with the pf_ring options '--pfring-int=eth2 --pfring-cluster-id=99
> --pfring-cluster-type=cluster_flow ' and '--runmode=autofp' (I have
> also increased pre-allocation, reassembly, and session memory sizes in
> Suricata's config).

Do you have any tcp.reassembly_gap in Suricata's stats.log? Another one
to look at is "tcp.ssn_memcap_drop" in case you've not made the buffers
big enough. I'd also recommend "runmode=workers". How many threads do you
have configured? This sort of discussion is better on the oisf-users
list, though.
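
Something like this will show whether those counters are climbing (the path is
just the default log directory; adjust to wherever your stats.log lives):

grep -E 'tcp\.(ssn_memcap_drop|segment_memcap_drop|reassembly_gap)' \
    /var/log/suricata/stats.log | tail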

More to the point in PF_RING, what network card and driver are you using?

>
> ifconfig doesn't show the drops (except for some packets that wanted
> to be forwarded and 1 checksum error):
>
> # /sbin/ifconfig eth2
> eth2 Link encap:Ethernet HWaddr 00:1B:78:31:D1:D4
> UP BROADCAST RUNNING NOARP PROMISC MULTICAST MTU:1500 Metric:1
> RX packets:2340853835 errors:1 dropped:130 overruns:0 frame:1
> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:2910286855 (2.7 GiB) TX bytes:0 (0.0 b)
> Interrupt:185 Memory:f4000000-f4012800
>
> # cat /proc/net/pf_ring/1509-eth2.23
> Bound Device(s) : eth2
> Slot Version : 13 [5.3.0]
> Active : 1
> Breed : Non-DNA
> Sampling Rate : 1
> Capture Direction : RX+TX
> Socket Mode : RX+TX
> Appl. Name : Suricata
> IP Defragment : No
> BPF Filtering : Disabled
> # Sw Filt. Rules : 0
> # Hw Filt. Rules : 0
> Poll Pkt Watermark : 128
> Num Poll Calls : 4711762
> Channel Id : -1
> Cluster Id : 0

That doesn't match the --pfring-cluster-id=99 you gave on the command line.

...


> # cat /proc/net/pf_ring/info
> PF_RING Version : 5.3.0 ($Revision: exported$)
> Ring slots : 4096
> Slot version : 13
> Capture TX : Yes [RX+TX]


> IP Defragment : No
> Socket Mode : Standard
> Transparent mode : Yes (mode 0)

I'd recommend "transparent_mode=2" on the PF_RING options; you probably
don't need TX.
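
That would be something along these lines when loading the module (parameter
names as in PF_RING 5.x; note that transparent_mode=2 only helps with a
PF_RING-aware driver):

rmmod pf_ring
modprobe pf_ring transparent_mode=2 enable_tx_capture=0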

> RAM usage on the box is less than half of the 3+ GB and eth2 basically
> sits off a span port on the switch and sees 40-60 MiB of traffic.

I'm monitoring more than 10x that with Suricata + PF_RING on a quite old
box.

Best Wishes,
Chris

--
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
Christopher Wakelin, c.d.wakelin@reading.ac.uk
IT Services Centre, The University of Reading, Tel: +44 (0)118 378 2908
Whiteknights, Reading, RG6 6AF, UK Fax: +44 (0)118 975 3094

Re: PF_RING 75-88% packet loss (using Suricata)
Thanks Chris. The reason I didn't post this to an OISF list is because it
appears to be related to PF_RING (I know Suricata is pegging the processors
and maybe that is causing PF_RING to drop so much but if not, I was going
to address that after I got PF_RING working). I've profiled the Suricata
rules and I don't have any "hogs" (just running the ET set) and the
stats.log file from Suricata shows some drops but drop.log is empty:

-------------------------------------------------------------------
Counter | TM Name | Value
-------------------------------------------------------------------
decoder.pkts | RxPFR1 | 833798615
decoder.bytes | RxPFR1 | 502533217963
decoder.ipv4 | RxPFR1 | 833225481
decoder.ipv6 | RxPFR1 | 0
decoder.ethernet | RxPFR1 | 833798615
decoder.raw | RxPFR1 | 0
decoder.sll | RxPFR1 | 0
decoder.tcp | RxPFR1 | 737496688
decoder.udp | RxPFR1 | 80191916
decoder.sctp | RxPFR1 | 0
decoder.icmpv4 | RxPFR1 | 220048
decoder.icmpv6 | RxPFR1 | 0
decoder.ppp | RxPFR1 | 0
decoder.pppoe | RxPFR1 | 0
decoder.gre | RxPFR1 | 0
decoder.vlan | RxPFR1 | 0
decoder.avg_pkt_size | RxPFR1 | 603
decoder.max_pkt_size | RxPFR1 | 1514
defrag.ipv4.fragments | RxPFR1 | 406
defrag.ipv4.reassembled | RxPFR1 | 4
defrag.ipv4.timeouts | RxPFR1 | 0
defrag.ipv6.fragments | RxPFR1 | 0
defrag.ipv6.reassembled | RxPFR1 | 0
defrag.ipv6.timeouts | RxPFR1 | 0
flow_mgr.closed_pruned | FlowManagerThread | 23484250
flow_mgr.new_pruned | FlowManagerThread | 2602702
flow_mgr.est_pruned | FlowManagerThread | 3074493
flow.memuse | FlowManagerThread | 41752256
flow.spare | FlowManagerThread | 101334
flow.emerg_mode_entered | FlowManagerThread | 0
flow.emerg_mode_over | FlowManagerThread | 0
tcp.sessions | Detect | 9464226
*tcp.ssn_memcap_drop | Detect | 0*
tcp.pseudo | Detect | 2961971
tcp.invalid_checksum | Detect | 0
tcp.no_flow | Detect | 0
tcp.reused_ssn | Detect | 637
tcp.memuse | Detect | 415236096
tcp.syn | Detect | 12401324
tcp.synack | Detect | 12055841
tcp.rst | Detect | 12859628
*tcp.segment_memcap_drop | Detect | 120855441*
tcp.stream_depth_reached | Detect | 207
tcp.reassembly_memuse | Detect | 6442450836
*tcp.reassembly_gap | Detect | 2469925*
detect.alert | Detect | 7408

For Suricata, I have 'threads: 1' under 'pfring:' and the default
'detect-thread-ratio: 1.5' under 'threading:'. NIC info (I don't think
this supports 'transparent_mode=2'):

# dmesg | grep 'Ethernet'
bnx2: Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v2.1.11 (July 20, 2011)

# ls /proc/net/pf_ring/
1509-eth2.23 dev info plugins_info

# ls /proc/net/pf_ring/dev/
eth0 eth1 eth2 eth3

# ls /proc/net/pf_ring/dev/eth2
info

# cat /proc/net/pf_ring/dev/eth2/info
Name: eth2
Index: 4
Address: 00:1B:78:31:D1:D4
Polling Mode: NAPI/TNAPI
Type: Ethernet
Family: Standard NIC
# Bound Sockets: 1
# Used RX Queues: 1

Am I missing something here? I'm pretty new to PF_RING so any help is
appreciated. Thanks.

-Mike Cox

Re: PF_RING 75-88% packet loss (using Suricata)
On 03/05/12 17:26, Mike Cox wrote:
> Thanks Chris. The reason I didn't post this to an OISF list is because it
> appears to be related to PF_RING (I know Suricata is pegging the processors
> and maybe that is causing PF_RING to drop so much but if not, I was going
> to address that after I got PF_RING working). I've profiled the Suricata
> rules and I don't have any "hogs" (just running the ET set) and the
> stats.log file from Suricata show some drops but drop.log is empty:
> *tcp.ssn_memcap_drop | Detect | 0*
> *tcp.segment_memcap_drop | Detect | 120855441*
> *tcp.reassembly_gap | Detect | 2469925*

I think you may need to increase one of the buffers there. I have
"stream-memcap" set to 64mb and "stream-reassembly-memcap" set to
4gb! Try doubling them from the defaults. You might also want to
increase "max-pending-packets" - I have it at 5000.
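
In suricata.yaml terms that is roughly the following (the values are just the
ones I happen to use, not magic numbers):

max-pending-packets: 5000

stream:
  memcap: 64mb
  reassembly:
    memcap: 4gb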

The reassembly gaps confirm you're losing packets somewhere.

It might be worth trying something like "tcpdump -i eth2 -s0 -v -w
/dev/null" for a few seconds without Suricata running, then see if you
got any dropped packets in "ethtool -S" and the "kernel dropped" line
when tcpdump stops.
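
i.e. something along the lines of (stop the tcpdump after 30 seconds or so and
look at the "packets dropped by kernel" line it prints on exit):

tcpdump -i eth2 -s0 -w /dev/null
ethtool -S eth2 | grep -iE 'drop|discard'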

>
> For Suricata, I have 'threads: 1' under 'pfring:' and the default
> 'detect-thread-ratio: 1.5' under 'threading:'. NIC info (I don't think
> this supports 'transparent_mode=2'):

For a quad-core machine you probably want "threads: 4".
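
i.e. in the pfring section of suricata.yaml, roughly:

pfring:
  interface: eth2
  threads: 4

together with --runmode=workers rather than --runmode=autofp.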

>
> # dmesg | grep 'Ethernet'
> bnx2: Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v2.1.11 (July 20,
> 2011)

No, bnx2 isn't properly supported in PF_RING (so it won't support
transparent_mode=1 or 2); the developers don't have any of those cards to
test with. I'd recommend adding an Intel card that uses the e1000e driver
and loading the PF_RING-enabled version of that driver; the cards are very
cheap these days (I think the cheaper Intel PCI-Express Gigabit cards use
e1000e and cost only 20-30 GBP, probably about the same in USD).
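
Building and loading the PF_RING-aware driver is roughly the following, run
from the e1000e source directory shipped under drivers/ in the PF_RING tree
(paths are approximate since the layout varies between releases, and eth4 is
just a placeholder for whatever name the new card gets):

make
rmmod e1000e
insmod ./e1000e.ko
ifconfig eth4 up promisc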

It's also not a bad idea to have the packet capture happening on a
different card/driver from your management traffic (SSH etc.); otherwise
transparent_mode=2 will kill your SSH connections (the kernel stack never
gets to see their traffic).

Best Wishes,
Chris

--
--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+-
Christopher Wakelin, c.d.wakelin@reading.ac.uk
IT Services Centre, The University of Reading, Tel: +44 (0)118 378 2908
Whiteknights, Reading, RG6 6AF, UK Fax: +44 (0)118 975 3094

Re: PF_RING 75-88% packet loss (using Suricata)
Thanks Chris. Running this:

/usr/sbin/tcpdump -nn -s0 -vv -c 1000000 -i eth2 -w /dev/null

I get 0 drops without Suricata running and 5-50K drops (0.5-5%) when it is
running. This doesn't match the almost 90% drop rate reported in the
pf_ring numbers.
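
For reference, those are the Tot Packets / Tot Pkt Lost counters from the
per-socket entry under /proc/net/pf_ring; they can be watched live with
something like:

watch -n 1 "cat /proc/net/pf_ring/*-eth2.*"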

ethtool doesn't show the massive drops:

# /sbin/ethtool -S eth2
NIC statistics:
rx_bytes: 22774537822664
rx_error_bytes: 0
tx_bytes: 0
tx_error_bytes: 0
...
tx_mac_errors: 0
tx_carrier_errors: 0
rx_crc_errors: 1
rx_align_errors: 0
tx_single_collisions: 0
tx_multi_collisions: 0
tx_deferred: 0
tx_excess_collisions: 0
tx_late_collisions: 0
tx_total_collisions: 0
rx_fragments: 0
...
rx_filtered_packets: 141179039
rx_ftq_discards: 0
rx_discards: 0
rx_fw_discards: 130

I already have Suricata configured like this:

stream:
  memcap: 1gb
  ...
  reassembly:
    memcap: 1gb
    depth: 4mb
...
libhtp:
  default-config:
    personality: IDS
    request-body-limit: 16kb
    response-body-limit: 4mb

As for an e1000 NIC, I'm sure I have one lying around, but it still seems
there is a larger issue that having a supported NIC may not fix (although I'm
sure it will help some and I plan on going that route soon). I am running
an older kernel though:

# uname -r
2.6.18-308.4.1.el5

I still think there is something up with PF_RING, or that I have set it up or
configured it wrong.

# /sbin/modprobe pf_ring enable_tx_capture=0

# cat /proc/net/pf_ring/info
PF_RING Version : 5.3.0 ($Revision: exported$)
Ring slots : 4096
Slot version : 13
Capture TX : No [RX only]
IP Defragment : No
Socket Mode : Standard
Transparent mode : Yes (mode 0)
Total rings : 0
Total plugins : 0

# cat /proc/net/pf_ring/4044-eth2.4
Bound Device(s) : eth2
Slot Version : 13 [5.3.0]
Active : 1
Breed : Non-DNA
Sampling Rate : 1
Capture Direction : RX+TX
Socket Mode : RX+TX
Appl. Name : Suricata
IP Defragment : No
BPF Filtering : Disabled
# Sw Filt. Rules : 0
# Hw Filt. Rules : 0
Poll Pkt Watermark : 128
Num Poll Calls : 0
Channel Id : -1
Cluster Id : 0
Min Num Slots : 4982
Bucket Len : 1522
Slot Len : 1682 [bucket+header]
Tot Memory : 8388608
Tot Packets : 433924
Tot Pkt Lost : 397198
Tot Insert : 36726
Tot Read : 27323
Insert Offset : 5233452
Remove Offset : 5521393
Tot Fwd Ok : 0
Tot Fwd Errors : 0
Num Free Slots : 0
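
Given "Num Free Slots : 0" above (i.e. the ring is completely full, and
everything arriving after that point is counted as lost), it may also be worth
loading the module with a larger ring; min_num_slots appears to be the
relevant PF_RING parameter, and the value below is just a guess:

/sbin/modprobe pf_ring enable_tx_capture=0 min_num_slots=32768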

Thanks.

-Mike Cox
