2010
03.28

Who ate all the bandwidth?


Today, Internet browsing is particularly slow.
At seemingly random intervals, available bandwidth drops and people get more and more irritable. :)

How do you find out why this is happening?

The possible causes boil down to:

  A. Router/Firewall[1] is not pleased by “something”. Could be an attack or a bug in the device firmware.
  B. Too many connections. Maybe they’re not passing much traffic, but the internet gateway can’t keep up with their number. I’ve seen firewalls perform very badly in this respect. E.g.: 3 connections downloading/uploading as fast as they can reach an aggregate bandwidth of 10Mbps; those same 3 plus 3000 “normal” connections only reach 6Mbps in total.
  C. A reasonable number of connections, effectively eating all of the available bandwidth.

I’ll skip case A, for now. 😉
In case B you’ll likely want to know the firewall’s idea of “netstat”, meaning the complete listing of TCP/UDP/other connections. No big deal if the device has some sort of CLI access: capture its output, import it into a spreadsheet, or use awk/sort/grep[2] to build your stats. Usually, computing the total number of connections per source IP address and sorting accordingly is enough to gain some insight into what’s going on.
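A minimal sketch of the per-source count, just to make the idea concrete. The dump format is an assumption (one connection per line, as “proto src_ip:port dst_ip:port state”, IPv4 only); every firewall prints something different, so the field number will need adjusting. The function name top_sources and the file name connections.txt are made up for the example.

```shell
# Assumed (hypothetical) dump format, one connection per line:
#   proto src_ip:port dst_ip:port state
# Change $2 below if the source address lives in another column.
top_sources() {
    # Strip the port from the source field and count lines per source IP,
    # then sort by count (highest first), breaking ties by address.
    awk '{ split($2, a, ":"); count[a[1]]++ }
         END { for (ip in count) print count[ip], ip }' |
    sort -k1,1nr -k2,2
}

# Example usage, with the captured listing saved to a file:
#   top_sources < connections.txt | head
```

The hosts at the top of the list are the ones holding the most entries in the firewall’s connection table, which is usually where to start looking.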
Case C… For long-running (days) data analysis, you could use a tool like NTOP. But if, like me today, you need to act quickly (perhaps because you know that the issue will disappear soon), iftop can hardly be beaten.
Both tools require the machine they run on to be able to “sniff” all the traffic passing through the firewall. This can be accomplished by configuring monitoring/monitored port(s) on a switch: monitored ports get their inbound/outbound traffic copied to the monitoring one. Vendors call the feature by different names; “port mirroring” is a good search keyphrase.

(You could just as well use a hub instead of a switch and get implicit mirroring of any port to every other port of the hub. Just unplug the firewall, link the hub to the switch, and plug the firewall and the monitoring host into the hub. Kludgy but quick and easy, if you can afford the temporary cabling changes and the bottleneck introduced by the hub…)

So:

  • Find the switch the firewall is connected to. Which side of the firewall? It depends on where you believe the issue originates. Let’s say the culprit is most likely to lie on the LAN → switch port A.
  • Connect your laptop/monitoring machine to the same switch → port B.
  • Set up monitoring: port A is monitored, port B is monitoring.
  • Run iftop. Useful options: “-P” also shows port numbers (without it, you’ll only see totals per source/destination IP address pair), “-n” skips hostname lookups, “-i eth0” selects the interface, and “-f” provides a meaningful filter (here I’m selecting packets whose source is not on the LAN[3]). The “-p” option instructs iftop to capture packets in promiscuous mode. Without it, iftop won’t lift off the wire packets that aren’t addressed to the machine on which it is running.
    iftop -p -P -n -i eth0 -f 'not src net 192.168.200.0/23'

    Iftop will produce a realtime table of running connections, sorted by how demanding they are in terms of bandwidth (10s average, by default). See the screenshot below; the top connections are due to two running video conference streams stealing 1Mbit/second worth of bandwidth, each.

    [Screenshot: iftop's output]


    Once everything is set up and you’re able to read iftop’s output, spotting the “top talkers” of your net becomes child’s play. Enjoy!

    1. For brevity, I’ll just say “firewall” from now on.
    2. Yuri is king at doing that. See his AWK weekly series.
    3. iftop will still show these source addresses, since its output is always made of bidirectional “connections”. Only the counters for the LAN → outside direction won’t increase.