2010
07.03

A newly installed FortiGate cluster (a simple two node HA active-passive setup) and some packet loss issues…
Ping from the LAN side to the Internet (or from the firewall itself) resulted in about 20% packet loss, while the other way around (WAN to firewall’s main public IP) didn’t work at all.

I used the following command to check my MAC addresses:

FORTIGATE-PRI # diagnose hardware deviceinfo nic wan1
[..]
Current_HWaddr                  00:09:0f:09:00:08
Permanent_HWaddr                00:09:0f:d1:be:ef
[..]

I then resorted to the switches’ “show mac” facilities (some Cisco, some ProCurve) to find out which network ports that particular MAC sat on… only to discover that the cluster’s “logical” MAC address (00:09:0f:09:00:08) wasn’t where I expected it to be.
Well, FortiGate’s virtual MAC addresses aren’t randomly generated. They have predictable values that depend on the firewall’s port number. The eighth port (wan1, in my case) will always have a virtual MAC like the one above. What happens if you have two clusters (as we had) sitting on the same L2 network segment (the same broadcast domain, that is)? Did you say MAC address conflict? You’re right.
The solution is simple: use the group-id directive to tweak the logical MAC address, i.e.:

config system ha
    set group-id 10
end

This changes the second right-most byte of the MAC, from 00 to 0a:

before  00:09:0f:09:00:08
after   00:09:0f:09:0a:08
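
To make the pattern explicit, here is a minimal Python sketch that rebuilds the two addresses above. The layout (fixed 00:09:0f:09 prefix, then the group-id, then the port index) is an assumption inferred from this post's example only; `virtual_mac` is a made-up helper name, not a Fortinet tool.

```python
def virtual_mac(group_id, port_index):
    """Build an HA virtual MAC following the pattern seen in this post.

    Assumption: 00:09:0f:09:<group-id>:<port index>, as inferred from
    the "before" and "after" addresses shown above.
    """
    if not 0 <= group_id <= 255 or not 0 <= port_index <= 255:
        raise ValueError("group_id and port_index must each fit in one byte")
    return "00:09:0f:09:{:02x}:{:02x}".format(group_id, port_index)


print(virtual_mac(0, 8))   # default group-id, eighth port
print(virtual_mac(10, 8))  # same port after "set group-id 10"
```

Two clusters left at the same group-id produce identical values for the same port, which is exactly the conflict described above.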

The point is that the “FortiOS High Availability Handbook” explains the case very thoroughly! See page 192, paragraph “Diagnosing packet loss with two FortiGate HA clusters in the same broadcast domain”. We’re so used to discardable product documentation that sometimes we don’t even try to look for clues where they should normally reside.
Instead of troubleshooting, this time, I should really have Read The (unexpectedly) Fine Manual…

2010
06.14

Following up on the “Unknown devices on IBM servers” post, let me talk about a similar situation with HP machines (DL180 G6, in my case).

The device that Windows fails to identify is this one:

PCI\VEN_8086&DEV_3A22&CC_0106

More info can be found by looking up the IDs in the pci.ids file (as I often do), or by means of the various “Unknown Device Identifier” type of software (e.g. this one). If you have a Linux machine at hand, a one-liner such as this may suit you:

# sed -n -e '/^8086/,/3a22/p' /usr/share/misc/pci.ids | sed -n -e '1p;$p'
8086  Intel Corporation
        3a22  82801JI (ICH10 Family) SATA AHCI Controller
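
If sed ranges feel too fragile, the same lookup can be sketched in Python. This is only an illustration: it assumes the usual pci.ids layout (vendor lines at column zero, device lines indented with one tab, subsystem lines with two), and the "1234" vendor in the sample is made up to show why the device ID must be searched inside the right vendor block.

```python
import re

# Tiny stand-in for /usr/share/misc/pci.ids; the "1234" entries are invented.
SAMPLE_PCI_IDS = (
    "# comment\n"
    "1234  Example Vendor (made up)\n"
    "\t3a22  Example Device (made up)\n"
    "8086  Intel Corporation\n"
    "\t3a22  82801JI (ICH10 Family) SATA AHCI Controller\n"
)


def parse_hardware_id(hwid):
    """Extract (vendor, device) from a Windows ID like PCI\\VEN_8086&DEV_3A22..."""
    m = re.search(r"VEN_([0-9A-Fa-f]{4})&DEV_([0-9A-Fa-f]{4})", hwid)
    if not m:
        raise ValueError("not a PCI hardware ID: %r" % hwid)
    return m.group(1).lower(), m.group(2).lower()


def lookup(pci_ids_text, vendor, device):
    """Return (vendor name, device name); either may be None if not found."""
    vendor_name = None
    in_vendor = False
    for line in pci_ids_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        if not line.startswith("\t"):
            # Top-level line: a vendor (or a class entry, which never matches).
            vid, _, name = line.partition("  ")
            in_vendor = vid.strip().lower() == vendor
            if in_vendor:
                vendor_name = name.strip()
        elif in_vendor and not line.startswith("\t\t"):
            # Single-tab line inside the matching vendor block: a device.
            did, _, name = line.strip().partition("  ")
            if did.lower() == device:
                return vendor_name, name.strip()
    return vendor_name, None


vendor, device = parse_hardware_id(r"PCI\VEN_8086&DEV_3A22&CC_0106")
print(lookup(SAMPLE_PCI_IDS, vendor, device))
```

On a real machine you would read the file's contents instead of `SAMPLE_PCI_IDS`.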

What’s missing is an Intel SATA driver; needless to say, you won’t find it anywhere on HP’s site.
I downloaded and installed the Rapid Storage Technology Driver from Intel’s web site (here). A 280KB download named “STOR_all32_f6flpy_9.6.0.1014_PV.zip” fixed things up for me.
Maybe the proper thing to try would’ve been the latest (March 2010) ProLiant Support Pack, but it’s a big download and I didn’t have the time. Also, the onboard SATA controller isn’t really used (the additional SAS RAID controller is, instead) and I just wanted to get rid of the yellow warning sign in Device Manager.

2010
05.28

The offline ACU CD

Well hidden in their labyrinthian web site, you may stumble upon HP’s “Array Configuration Utility (ACU) Offline CD for Smart Array”. A plain bootable CD, useful when ACU simply can’t be installed on the server/OS.
Example: I needed to tweak SSP (Selective Storage Presentation) settings on an MSA1000, connected through Fibre Channel HBAs (QLogic) to some rather old HP DL580 G2 servers. The servers were running VMware ESX 3i 3.5.0 build-207095 (the latest one compatible with that kind of CPU) with no management agents installed. Since the MSA1000 can only be managed “in-band” or via a non-standard serial cable (which the Customer, of course, lost long ago), I rebooted an ESX host with the offline ACU CD…
Before that, I also tried a standard SmartStart CD, but it didn’t work. I had version 7.80 (way younger than the servers/HBAs), but there were no link lights on the FC switch, meaning no firmware had been loaded on the QLogic card, meaning SmartStart had no supported HBA drivers. Offline ACU CD version 8.20.19 worked like a charm instead. Find its latest release by searching “array configuration utility” on hp.com, clicking on “Download software”, then “Linux GUI ACU”. The download link is somewhere on that page…

2010
05.24

(This, for once, is going to be quick.)
Did you know about the Dnscmd.exe command? Read about it here and here. It’s the command-line/DOS prompt way to configure Microsoft’s DNS servers… If you need to create many zones/records at once, it saves you lots of clicks.
Here’s how to add six DNS zones (same domain name, different TLD). With the /DSPrimary option, the zone will be stored into Active Directory (rather than a file).

dnscmd /ZoneAdd domainname.bz  /DSPrimary
dnscmd /ZoneAdd domainname.biz /DSPrimary
dnscmd /ZoneAdd domainname.com /DSPrimary
dnscmd /ZoneAdd domainname.eu  /DSPrimary
dnscmd /ZoneAdd domainname.net /DSPrimary
dnscmd /ZoneAdd domainname.org /DSPrimary

And here’s how to add the same “A” record (named “www”) to each of the zones created above.

dnscmd dns-dc-hostname /RecordAdd domainname.bz  www A 10.0.0.123
dnscmd dns-dc-hostname /RecordAdd domainname.biz www A 10.0.0.123
dnscmd dns-dc-hostname /RecordAdd domainname.com www A 10.0.0.123
dnscmd dns-dc-hostname /RecordAdd domainname.eu  www A 10.0.0.123
dnscmd dns-dc-hostname /RecordAdd domainname.net www A 10.0.0.123
dnscmd dns-dc-hostname /RecordAdd domainname.org www A 10.0.0.123
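
With more than a handful of zones, typing the lines by hand gets tedious. As a rough sketch, the two listings above could be generated with a few lines of Python; `dnscmd_lines` is a hypothetical helper, and it only prints the commands (paste them into a .cmd file on the DNS server to run them).

```python
def dnscmd_lines(domain, tlds, dns_server, record_ip):
    """Return the dnscmd commands for one zone (plus a "www" A record) per TLD."""
    zones = ["{}.{}".format(domain, tld) for tld in tlds]
    # AD-integrated primary zones, same syntax as the listing above.
    lines = ["dnscmd /ZoneAdd {} /DSPrimary".format(zone) for zone in zones]
    # One "www" A record per zone, pointing at the private address.
    lines += [
        "dnscmd {} /RecordAdd {} www A {}".format(dns_server, zone, record_ip)
        for zone in zones
    ]
    return lines


for line in dnscmd_lines("domainname", ["bz", "biz", "com", "eu", "net", "org"],
                         "dns-dc-hostname", "10.0.0.123"):
    print(line)
```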

As you may have guessed, this is the typical scenario where you’ve got to re-create some external zones on the internal DNS servers. That’s needed in order for the internal hosts to reach some server by its “public” DNS name, but at its private IP.
For the sake of completeness, let me also mention that you could achieve the same effect by leaving DNS as it is, and configuring “loopback NAT”/“double NAT” on the router/firewall. E.g.: an internal Client wants to reach an internal Server, given its public hostname, which maps to a public IP address. It asks the (possibly internal) DNS to translate the name. The DNS doesn’t know the zone, so it forwards the query to an external DNS server, obtaining a public IP address that it hands back to the Client. Since that address is non-local, the Client sends its packets for the Server to its default gateway (possibly the router/firewall). The firewall matches the Server’s public IP address and substitutes it with the right private one. It also changes the source IP, swapping the Client’s address with the firewall’s LAN address. This way Client and Server are actually communicating through the firewall, even though they’re both internal hosts. And the Server can’t tell Client A from Client B, since every connection to it comes from the firewall’s IP address. That’s the main reason why I prefer duplicating the public DNS zones on internal DNS servers, with private IP addresses: you avoid routing internal traffic through the firewall, and avoid NAT where there shouldn’t be any.
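
To see why the Server loses track of who is talking to it, here is a toy model of the double rewrite. It is not how any particular firewall implements it; the public address, the client addresses and the firewall's LAN address are all invented for the example (only 10.0.0.123 comes from the listings above).

```python
from collections import namedtuple

Packet = namedtuple("Packet", "src dst")


def hairpin_nat(packet, public_to_private, firewall_lan_ip):
    """Rewrite a LAN packet aimed at a public address that maps to an internal server."""
    if packet.dst not in public_to_private:
        return packet  # not one of our published servers: leave it alone
    # Destination NAT (public -> private) plus source NAT (client -> firewall).
    return Packet(src=firewall_lan_ip, dst=public_to_private[packet.dst])


NAT_TABLE = {"203.0.113.10": "10.0.0.123"}  # hypothetical public IP -> private IP
FIREWALL_LAN = "10.0.0.1"                    # hypothetical firewall LAN address

from_client_a = hairpin_nat(Packet("10.0.0.50", "203.0.113.10"), NAT_TABLE, FIREWALL_LAN)
from_client_b = hairpin_nat(Packet("10.0.0.51", "203.0.113.10"), NAT_TABLE, FIREWALL_LAN)
print(from_client_a)
print(from_client_b)
```

Both packets reach the Server with the same source address, which is the loss of information the paragraph above complains about.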

2010
05.16

This post will show you how to generate a list of all the users’ Distinguished Name, then filter it, then do something useful with it.

Scenario: Saturday morning (after having crashed into bed at 4:00 a.m., btw), the Customer calls. A virus hit the Company and one of the most annoying consequences of the outbreak is that every domain user account gets locked due to brute-force login attempts (as per the “Account Lockout Threshold” policy). While they run around cleaning PCs and fixing A/V installations[1], I’m asked for a method to quickly unlock the accounts.

I tend to carry out this kind of task “the Unix way”, using the available DOS prompt commands and a bit of VBScript.

  • Start off by calling LDIFDE:

    ldifde -r "(objectclass=user)" -l sAMAccountName -m -f users.ldf

    LDIFDE exports/imports Active Directory data to/from properly formatted (LDIF) text files. I use it a lot. Run as shown above, LDIFDE exports the objects of class “user” into a file named users.ldf. Of the many attributes an LDAP object bears, I tell LDIFDE to output just the “sAMAccountName” one. If I hadn’t specified any attribute, I’d have found duplicate DNs for the same user in the resulting file. That’s because of how the LDIF output is structured: some A/D data is “incrementally” added to existing objects, given their DN. I picked sAMAccountName simply because every user has one and, also, to keep the file small.

  • Then:
    findstr /I /b dn.*ou=service.users users.ldf > service_users.txt
    findstr /I /b dn.*cn=users users.ldf > normal_users.txt

    findstr is Microsoft’s “poor man’s version” of grep, supporting a subset of the regular expressions everyone has or should’ve come to love. Here I’m using it to extract Distinguished Names from the LDIF (only the ones that lie in a given Organizational Unit), saving them to the *_users.txt files. They will look like:

    dn: CN=squidauth,OU=Service Users,DC=contoso,DC=com
    dn: CN=exchangebackup,OU=Service Users,DC=contoso,DC=com
    dn: CN=ldap,OU=Service Users,DC=contoso,DC=com
    dn: CN=batchcopy,OU=Service Users,DC=contoso,DC=com
  • Here’s the VBScript function to unlock an account given its DN:
    Sub unlockuser(userDN)
      Set objUser = GetObject ("LDAP://" & userDN)
      objUser.IsAccountLocked = False
      objUser.SetInfo
    End Sub

    We just need to transform findstr’s output, substituting the leading “dn: ” with “unlockuser” and enclosing what follows in double quotes. At the top of the new, transformed file, we’ll copy/paste the unlockuser subroutine definition. That’ll make our final script.

  • How to carry out the transform? Using this VBS snippet; it processes its Standard Input line by line, and outputs the modifications on Standard Output, just like any Unix file filtering command.
    Set StdIn = WScript.StdIn
    Do While Not StdIn.AtEndOfStream
        line = stdin.readline
        line = right(line,len(line)-4)
        wscript.echo "unlockuser """ & line & """"
    Loop

    I saved it in a “dnfilter.vbs” file and used it this way:

    type service_users.txt | cscript /nologo dnfilter.vbs > unlock_service_users.vbs

    To obtain something like this:

    unlockuser "CN=squidauth,OU=Service Users,DC=contoso,DC=com"
    unlockuser "CN=exchangebackup,OU=Service Users,DC=contoso,DC=com"
    unlockuser "CN=ldap,OU=Service Users,DC=contoso,DC=com"
    unlockuser "CN=batchcopy,OU=Service Users,DC=contoso,DC=com"

As I said, add the unlockuser function at the top of unlock_service_users.vbs and you’ll have your bulk unlocking script.
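
For readers more at home with Python than with VBScript, the filtering step can be sketched like this. It is an alternative to dnfilter.vbs, not what I actually ran; `to_unlock_calls` is a made-up name, and the sketch deliberately skips base64-encoded entries (lines starting with “dn:: ”), which would need decoding first.

```python
def to_unlock_calls(ldif_lines):
    """Turn "dn: ..." lines into VBScript "unlockuser" calls."""
    calls = []
    for line in ldif_lines:
        line = line.rstrip("\r\n")
        if not line.lower().startswith("dn: "):
            continue  # blank lines, other attributes, base64 "dn:: " entries
        dn = line[4:]
        # VBScript escapes a double quote inside a string by doubling it.
        calls.append('unlockuser "{}"'.format(dn.replace('"', '""')))
    return calls


sample = [
    "dn: CN=squidauth,OU=Service Users,DC=contoso,DC=com\n",
    "\n",
    "dn: CN=ldap,OU=Service Users,DC=contoso,DC=com\n",
]
for call in to_unlock_calls(sample):
    print(call)
```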

  1. A/V usefulness is often questionable. At least three times a year an unfortunate Customer gets infected by a 0-day threat… :(
2010
05.13

Analysing TCP based protocols often means dealing with TCP sessions (also called streams or flows).
A TCP connection, from an application point of view, is much like a bidirectional file descriptor through which ordered data can be read or written. “On the wire” though, data is not ordered at all. It is split into packets, possibly shuffled and mixed with other traffic. You can capture packets using a sniffer, but to make any sense of them you also need an analyzer tool able to do the reordering/reassembling job. Wireshark, for instance, doubles as a sniffer and an analyzer, backed up by the ubiquitous libpcap.

Imagine having dumped/sniffed 1GB worth of traffic. We would like to pinpoint a single TCP session, isolating it from the rest. Here’s how we could proceed:

  • Identify the source/destination addresses and source/destination ports we’re interested in. Then throw away any packet that doesn’t match this tuple. That’s basically what Wireshark does when you select a packet, right click and hit “Follow TCP Stream”. As long as the same tuple doesn’t get reused for another, unrelated, session, this method works just fine[1].
  • Reorder/reassemble packets.
  • Extract the packets’ payload.
  • Present the payload in a way that makes sense. That depends on the L7 protocol. HTTP without keep-alive is strictly request/response: first print what the client sent to the server (outbound traffic), then what the server answered (inbound traffic). Other protocols may behave differently and you may choose to separate inbound traffic from outbound, or rely on timing to correctly present the dialogue between peers.
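
The four steps above can be sketched in a few lines of Python. This is a deliberately naive illustration, not what Wireshark or libpcap do internally: packets are plain tuples (the `Segment` type and the sample capture are invented), and sequence number wraparound, overlapping segments and gaps are all ignored.

```python
from collections import namedtuple

Segment = namedtuple("Segment", "src sport dst dport seq payload")


def follow_stream(segments, client, server):
    """Rebuild both directions of one TCP session.

    client and server are (ip, port) tuples; returns {"out": bytes, "in": bytes}.
    """
    by_seq = {"out": {}, "in": {}}
    for seg in segments:
        source, dest = (seg.src, seg.sport), (seg.dst, seg.dport)
        if (source, dest) == (client, server):
            direction = "out"
        elif (source, dest) == (server, client):
            direction = "in"
        else:
            continue  # step 1: drop anything outside the tuple
        if seg.payload:
            # Keep the first copy of each sequence number (drops retransmissions).
            by_seq[direction].setdefault(seg.seq, seg.payload)
    # Steps 2 and 3: order by sequence number, then concatenate the payloads.
    return {d: b"".join(p for _, p in sorted(s.items())) for d, s in by_seq.items()}


CLIENT, SERVER = ("10.0.0.5", 40000), ("10.0.0.9", 80)
CAPTURE = [  # shuffled on purpose, with one unrelated packet mixed in
    Segment("10.0.0.9", 80, "10.0.0.5", 40000, 5019, b"hello"),
    Segment("10.0.0.5", 40000, "10.0.0.9", 80, 1008, b"TP/1.0\r\n\r\n"),
    Segment("10.0.0.7", 40001, "10.0.0.9", 80, 77, b"unrelated"),
    Segment("10.0.0.5", 40000, "10.0.0.9", 80, 1000, b"GET / HT"),
    Segment("10.0.0.9", 80, "10.0.0.5", 40000, 5000, b"HTTP/1.0 200 OK\r\n\r\n"),
]
session = follow_stream(CAPTURE, CLIENT, SERVER)
# Step 4, HTTP style: request first, then the response.
print(session["out"])
print(session["in"])
```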

Besides Wireshark, there are tools that do just that and can also be automated. See TShark or tcpflow.

What if you want to script everything and build your own TCP analyzer? Perl’s module Net::Analysis is surprisingly convenient for the task. It does the dirty job I described above and presents your code with ready to be processed TCP sessions.

Practical goal: saving MP3 files streamed by Grooveshark. Disclaimer: I’m by no means pushing anyone to illegally download stuff, this is just a working, sensible, instructional example that uses a song freely available anyway (by Revolution Void, check them out here, they’re great).

GroovesharkListener.pm extends Net::Analysis::Listener::HTTP. It sniffs all the traffic from/to port 80 and, as soon as it sees an HTTP response with a content-type of “audio”, dumps its content to file and quits. Simple as that.

Put the module some place where Perl can find it and then launch (as root):

# perl -MNet::Analysis -e main GroovesharkListener 'port 80'
(starting live capture)
/crossdomain.xml
text/xml
/service.php?addSongsToQueueExt
text/html; charset=UTF-8
/static/amazonart/m8c8c9f4291508bca130c1caac2bda75b.png
image/png
[...some more cruft...]
/stream.php
audio/mpeg
Dumping 8481224 bytes to groovesharkgyzBy.mp3 be patient...

# id3v2 -l groovesharkgyzBy.mp3
id3v1 tag info for groovesharkgyzBy.mp3:
Title  : Invisible Walls                 Artist: Revolution Void              
Album  : Increase the Dosage             Year: 2004, Genre: Other (12)
Comment: http://www.jamendo.com/         Track: 1

That’s it. Just one more thing: Net::Analysis doesn’t allow you to select a specific network interface, it just picks up the first available one. I wrote a small patch to address this shortcoming; it adds a “device=” parameter that you can use this way:

# perl -MNet::Analysis -e main GroovesharkListener,device=wlan1 'port 80'

And here’s what GroovesharkListener.pm looks like:

# choose a song
# run (as root or via sudo):
#   perl -MNet::Analysis -e main GroovesharkListener 'port 80'
# click "play" and wait for the file to be dumped...
#                             -- Giuliano - http://www.108.bz
package Net::Analysis::Listener::GroovesharkListener;
use strict;
use warnings;
use base qw(Net::Analysis::Listener::HTTP);
use File::Temp;

# Handler for the HTTP listener's http_transaction event (request + response).
sub http_transaction {
    my ($self, $args) = @_;
    my $http_req  = $args->{req};
    my $http_resp = $args->{resp};

    print $http_req->uri(), "\n";
    # Some responses carry no Content-Type header at all.
    my $content_type = $http_resp->header('Content-Type') || '';
    print "$content_type\n";
    if ($content_type =~ /audio/i) {
        # UNLINK => 0 keeps the file around after we exit.
        my $fh = File::Temp->new(TEMPLATE => 'groovesharkXXXXX',
            SUFFIX   => '.mp3',
            UNLINK   => 0);
        binmode $fh;
        print "Dumping ".length($http_resp->content)." bytes to ".$fh->filename." be patient...\n";
        print $fh $http_resp->content;
        close $fh;
        exit;
    }
}

1;
  1. Newer versions of Wireshark use the “tcp.stream eq x” primitive.
2010
04.28

Domain Controllers replicate Active Directory data with each other. They do so through Connections that are partly generated by the KCC (Knowledge Consistency Checker), partly configured by you, the Sysadmin. Each connection is one-way. If you open Active Directory Sites and Services, expand a Site and then a Server node, you’ll notice that Connections listed under NTDS Settings are labeled “From Server” and “From Site”. In the image below (stolen from here), the DC named HEIDITEST will replicate AD changes by sending them to MHILLMAN2. The Connection Object is thus defined from HEIDITEST to MHILLMAN2. You can expect a mirror-image Connection to exist, defined under the NTDS Settings node of HEIDITEST.

See Active Directory Replication for a more in-depth explanation.
Besides Connection objects automatically created by the KCC, which does its best to build a proper replication topology, you sometimes add your own for fault/link tolerance or other reasons. If the domain is sufficiently big, things may become messy. Instead of fumbling my way through Active Directory Sites and Services I wanted to automatically generate a visual representation of such topology, with DCs and Connections: time to write yet another script.

This time I chose VBS over Perl, hoping that this post would be more “instructional”. Perl on Windows is not so common, while VBScript is the standard way to automate stuff on that O.S. (despite the language being incredibly clumsy and annoying[1]).

As for the graph format, I chose to output Graphviz DOT format/language.

The script works this way:

  • Find the current domain.
  • Find all the Domain Controllers (AD objects of class nTDSDSA, see this) and the Site they’re in.
  • For each DC/Site, select nTDSConnection objects in NTDS Settings. Of course this is done by means of LDAP queries over ADO, but the view we get is equivalent to what we’re seeing in Active Directory Sites and Services.
  • Print the DOT graph on standard output: DCs, connections and sites. DCs in the same site will be clustered together.

To use it, first generate the graph’s definition:

cscript /nologo ntdsconnections_graph.vbs > AD-pre.dot

Then use Graphviz’s tools to lay out the graph and turn it into an actual image. For optimal results, I suggest something like:

ccomps -x AD-pre.dot | dot | gvpack -u | neato -Tpng -n2 > AD-pre.png

Here’s what showed up, in my test case:

And here’s the same Domain, after some treatment:

Such graphs may be useful from a Sysadmin point of view, but they’re quite ugly, honestly. I originally planned to use Graphviz to output “some” format, read it in Dia or similar diagram drawing software, and then fix the aesthetics. But Dia support (if it ever worked) has been dropped from Graphviz (December 10, 2009). Dia’s 0.97.1 tarball bears a “dot2dia.py” plugin, but I haven’t hacked it into working. Any other editable format known to Graphviz (e.g.: SVG) doesn’t support “connector” primitives, meaning that arrows won’t stick to objects while you drag them around… I’ll follow up if I make some progress.

' A/D replication topology graph (Graphviz .DOT format)
' in the current Domain.
' ----------------------------
' Giuliano - http://www.108.bz

Set objRootDSE = GetObject("LDAP://RootDSE")
strConfigurationNC = objRootDSE.Get("configurationNamingContext")

Set adoCommand = CreateObject("ADODB.Command")
Set adoConnection = CreateObject("ADODB.Connection")
adoConnection.Provider = "ADsDSOObject"
adoConnection.Open "Active Directory Provider"
adoCommand.ActiveConnection = adoConnection

strBase = "<LDAP://" & strConfigurationNC & ">"
strFilter = "(objectClass=nTDSDSA)"
strAttributes = "AdsPath"
strQuery = strBase & ";" & strFilter & ";" & strAttributes & ";subtree"

adoCommand.CommandText = strQuery
adoCommand.Properties("Page Size") = 100
adoCommand.Properties("Timeout") = 60
adoCommand.Properties("Cache Results") = False

Set adoRecordset = adoCommand.Execute

Dim dictDCtoSite
Set dictDCtoSite = CreateObject("Scripting.Dictionary")
Dim dictSites
Set dictSites = CreateObject("Scripting.Dictionary")
Dim arrLink()

Function pp(s)
    pp = Replace(right(s,len(s)-3), "-", "_") ' trash the leading "CN="
End Function

Do Until adoRecordset.EOF
    Set objDC = _
        GetObject(GetObject(adoRecordset.Fields("AdsPath").Value).Parent)
    Set objSite = _
        GetObject(GetObject(objDC.Parent).Parent)
    dictDCtoSite.Add objDC.name, objSite.name
    if not dictSites.Exists(objSite.name) Then
        dictSites.Add objSite.name, 1
    End If
    adoRecordset.MoveNext
Loop
adoRecordset.Close

For Each strDcRDN in dictDCtoSite.Keys
    strSiteRDN = dictDCtoSite.Item(strDcRDN)

    strNtdsSettingsPath = "LDAP://cn=NTDS Settings," & strDcRDN & _
    ",cn=Servers," & strSiteRDN & ",cn=Sites," & strConfigurationNC

    Set objNtdsSettings = GetObject(strNtdsSettingsPath)

    objNtdsSettings.Filter = Array("nTDSConnection")

    For Each objConnection In objNtdsSettings
        'WScript.Echo strSiteRDN & " : " & Split(objConnection.fromServer, ",")(1) & " -> " & strDcRDN
        ReDim Preserve arrLink(2,k)
        arrLink(0,k) = strSiteRDN
        arrLink(1,k) = Split(objConnection.fromServer, ",")(1)
        arrLink(2,k) = strDcRDN
        k = k + 1
    Next

    Set objNtdsSettings = Nothing
Next

Dim arrSubgraphs()
Redim arrSubgraphs(dictSites.Count-1)

WScript.Echo "Digraph AD {"
WScript.Echo "  fontname=helvetica;"
WScript.Echo "  node [fontname=helvetica];"
' Same site links
For Each strSiteRDN in dictSites
    nosamesitelinks = True
    headerwritten = False
    For k = 0 To Ubound(arrLink, 2)
        If strSiteRDN = arrLink(0,k) Then
            if dictDCtoSite.Item(arrLink(1,k)) = dictDCtoSite.Item(arrLink(2,k)) Then
                if nosamesitelinks Then
                    nosamesitelinks = False
                    WScript.Echo "    subgraph cluster_" & pp(strSiteRDN) & " {"
                    headerwritten = True
                End If
                WScript.Echo "        " & pp(arrLink(1,k)) & " -> " & pp(arrLink(2,k)) & ";"
            End If
        End If
    Next
    If headerwritten Then
        WScript.Echo "        label= """ & pp(strSiteRDN) & """"
        WScript.Echo "    }"
    End If
Next
Wscript.Echo
' Inter-site links
For k = 0 To Ubound(arrLink, 2)
    if dictDCtoSite.Item(arrLink(1,k)) <> dictDCtoSite.Item(arrLink(2,k)) Then
        WScript.Echo "    " & pp(arrLink(1,k)) & " -> " & pp(arrLink(2,k)) & ";"
    End If
Next
WScript.Echo "}"
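
For comparison, the DOT-emitting half of the script above boils down to very little in a language with decent data types. The following Python sketch takes data shaped like the script's dictDCtoSite and arrLink; the DC and site names are invented, and no Active Directory query is attempted here.

```python
def pp(rdn):
    """Trash the leading "CN=" and replace dashes, like pp() in the script above."""
    return rdn[3:].replace("-", "_")


def to_dot(dc_to_site, links):
    """dc_to_site: {DC RDN: site RDN}; links: list of (from DC RDN, to DC RDN)."""
    lines = ["Digraph AD {", "  fontname=helvetica;", "  node [fontname=helvetica];"]
    # Same-site links, clustered by site.
    for site in sorted(set(dc_to_site.values())):
        same = [(a, b) for a, b in links
                if dc_to_site.get(a) == site and dc_to_site.get(b) == site]
        if not same:
            continue
        lines.append("    subgraph cluster_%s {" % pp(site))
        lines += ["        %s -> %s;" % (pp(a), pp(b)) for a, b in same]
        lines.append('        label= "%s"' % pp(site))
        lines.append("    }")
    # Inter-site links go at the top level of the graph.
    lines += ["    %s -> %s;" % (pp(a), pp(b)) for a, b in links
              if dc_to_site.get(a) != dc_to_site.get(b)]
    lines.append("}")
    return "\n".join(lines)


DC_TO_SITE = {"CN=DC-A1": "CN=SiteA", "CN=DC-A2": "CN=SiteA", "CN=DC-B1": "CN=SiteB"}
LINKS = [("CN=DC-A1", "CN=DC-A2"), ("CN=DC-A2", "CN=DC-A1"), ("CN=DC-B1", "CN=DC-A1")]
dot = to_dot(DC_TO_SITE, LINKS)
print(dot)
```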
  1. No powerful and convenient data types, no free and ready to use debugger, no public CPAN-like module repository, unnecessarily verbose syntax; I may go on for an hour…
2010
04.20

A while ago I was trying to get my head around some nasty network performance issues. A couple of firewalls were in the play, along with a Bandwidth Manager device (an Allot NetEnforcer AC-402).

I wasn’t completely satisfied with NetEnforcer reporting functions and wanted something more dependable and realtime. Well, if you turn to the device’s CLI access (SSH), you’ll notice an interesting acthruput command.
It shows the current throughput per Interface, Line, Pipe and Virtual Channel. What more could you ask for?

AC:~# acthruput
---------------------------------------------------------
Entity         Name                              Bits/sec
---------------------------------------------------------
INTERFACE      Internal                           1918600
  LINE         1                                  1770720
      PIPE     8                                     2144
          VC   32                                    2144
      PIPE     5                                     7136
          VC   8                                     7136
[..]
---------------------------------------------------------
INTERFACE      External                           9509880
  LINE         1                                  9421000
      PIPE     8                                    96960
          VC   32                                   96960
      PIPE     13                                     752
          VC   22                                     752
[..]

As you can see, acthruput identifies Pipes by number. How do you relate this number to the actual mnemonic pipe name? Use “acstat -l pipe”, which also displays the total number of live connections per pipe.

AC:~# acstat -l pipe
---------------------------------------------------------------------------------
Rule QID                Rule name                                Live connections
---------------------------------------------------------------------------------
1.8.0.0.0               Customer1 ; Fallback                     10
1.13.0.0.0              Customer2 ; Fallback                     7
1.5.0.0.0               Customer3 ; Fallback                     23
[..]
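
If you'd rather not match numbers to names by eye, a small parser can build the lookup table. This Python sketch assumes that the second field of the Rule QID is the pipe number, which is only what the two listings above suggest (pipes 8, 13 and 5), not something taken from Allot's documentation.

```python
import re

ACSTAT_SAMPLE = """\
Rule QID                Rule name                                Live connections
---------------------------------------------------------------------------------
1.8.0.0.0               Customer1 ; Fallback                     10
1.13.0.0.0              Customer2 ; Fallback                     7
1.5.0.0.0               Customer3 ; Fallback                     23
"""

# <line>.<pipe>.<three more fields>, rule name, live connections
ROW = re.compile(r"^(\d+)\.(\d+)\.\d+\.\d+\.\d+\s+(.*?)\s+(\d+)\s*$")


def pipe_names(acstat_output):
    """Map pipe number -> rule name, skipping headers and separators."""
    names = {}
    for line in acstat_output.splitlines():
        m = ROW.match(line)
        if m:
            names[int(m.group(2))] = m.group(3)
    return names


print(pipe_names(ACSTAT_SAMPLE))
```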

Wrap acthruput in a while loop that adds a timestamp and a delay (→ sampling frequency). Start your terminal emulator logging facilities, hit enter, wait, ctrl-c, stop logging.

AC:~# while [ 1 ] ; do date; acthruput; sleep 10; done

Finally, clean the log a bit and feed it to the Perl script you’ll find at the end of this post.

$ DATE='Thu Dec 10'; grep "$DATE\|INTERFACE\|LINE\|PIPE" "log.txt"  | ./allot_fmt.pl "$DATE" > log.csv

The script outputs CSV formatted data:

timestamp;ifc;L1;P1;P10;P12;P2;P3;P4;P5;P8;P9;
Thu Dec 10 14:48:00 CET 2009;Int;2779648;2599928;4608;;111760;1024;;9792;;52536;
Thu Dec 10 14:48:00 CET 2009;Ext;8372424;5372392;206448;;2407264;60720;;258816;;66784;
Thu Dec 10 14:48:12 CET 2009;Int;1909272;1699872;3776;;170624;512;;1216;;33272;
Thu Dec 10 14:48:12 CET 2009;Ext;7932680;7370584;97152;;350920;36432;;12144;;65448;
[..]

And here’s what it looks like when opened up in OpenOffice Calc (sorry, no fancy formatting).
[Figure: NetEnforcer bandwidth report]
The graph above shows that the 8Mbps link (the “Line”, in Allot’s parlance) is not saturated. The problem was that, during that timeframe, we were also trying to make Iperf “consume” all of the available bandwidth. We couldn’t, because one of the firewalls was acting as a bottleneck when presented with certain workloads (many connections, see this). Being able to generate this kind of report proved very useful in troubleshooting…

#!/usr/bin/perl
# Giuliano - http://www.108.bz
use strict;

my @samples;

my $lastsample;
my $lastint;

while (<STDIN>) {
    s/[\r\n]*//g;
    next unless $_;
    if (/$ARGV[0]/) {
        $lastsample = [];
        $lastsample->[0] = $_;
        $lastsample->[1] = {};
        push @samples, $lastsample;
        #print "$_\n";
    } elsif (/INTERFACE/) {
        s/^.*INTERFACE.*(Int|Ext)ernal.*$/$1/;
        $lastint = $_;
        #print "$lastint\n";
    } elsif (/LINE/) {
        s/^.*LINE\s*([0-9]+)\s*(\d+).*$/L$1;$2/;
        my ($line,$tput) = split ';', $_;
        #print "$line,$tput\n";
        $lastsample->[1]->{$lastint}->{$line} = $tput;
    } elsif (/PIPE/) {
        s/^.*PIPE\s*([0-9]+)\s*(\d+).*$/P$1;$2/;
        my ($pipe,$tput) = split ';', $_;
        #print "$pipe,$tput\n";
        $lastsample->[1]->{$lastint}->{$pipe} = $tput;
    } else {
        print STDERR "wtf\n";
        exit;
    }
}

my $keys = {};
foreach my $sample (@samples) {
    foreach my $int (keys %{$sample->[1]}) {
        foreach my $key (keys %{$sample->[1]->{$int}}) {
            $keys->{$key} = 1;
        }
    }
}

my @keys = sort keys %$keys;

print "timestamp;ifc;";
foreach my $key (@keys) {
    print "$key;";
}
print "\n";
foreach my $sample (@samples) {
    foreach my $int (('Int','Ext')) {
        print "$sample->[0];";
        print "$int;";
        foreach my $key (@keys) {
            print "$sample->[1]->{$int}->{$key};";
        }
        print "\n";
    }
}

exit;
2010
04.13

Today I had the need to (automatically) rewrite sender addresses of an email depending on the recipient domain. A way to trick Postfix into applying a sort of “conditional masquerading”. Postfix rewriting tables are just static key → value dictionaries: they’re used to lookup B given A, but there’s no available logic to cope with more complicated patterns.
A little more context to help me explain: I’m talking about a monitoring system. Alert emails are generated by Nagios and handed to a local Postfix on the same server. And here are the rules to implement:

  • A locally generated email whose destination is inside the company should leave Postfix with an @FQDN suffix (@hostname.localdomain.lan) in its sender addresses. That is, sender addresses shouldn’t be rewritten/masqueraded at all.
  • A locally generated email whose destination is outside the company needs to be masqueraded, with its sender addresses rewritten as @extdomain.com.

Moreover, but that’s a routing matter rather than a rewriting one:

  • Emails directed to @smsgw.localdomain.lan have to be relayed through a different mail server.

As you can see, the logic is: look up B (rewritten sender) given A (sender), depending on C (recipient).

I found the right hint deeply buried in Postfix’s mailing list: check out Noel Jones’s post, kudos to him.

  • First, define a new smtp transport in “master.cf”; just copy/paste the existing one and change the first column to whatever name you like. We are explicitly telling the new transport that it will use its own generic regexp map (the -o command-line option).

    [root@hostname postfix]# cd /etc/postfix
    [root@hostname postfix]# grep '^\(smtp\|toext\).*unix' master.cf
    smtp      unix  -       -       n       -       -       smtp
    toext     unix  -       -       n       -       -       smtp -o smtp_generic_maps=regexp:/etc/postfix/generic_toext
  • We also need to take control over the mail routing mechanism. This is done by enabling transport maps.
    [root@hostname postfix]# grep ^transport main.cf
    transport_maps = regexp:/etc/postfix/transport
  • Transport maps (remember that they’re matched against recipient addresses) are configured in order to:
    • Route mail that should be delivered locally through the local transport. This will preserve /etc/aliases and .forward behaviour and make everything act like you expect on Unix.
    • Route mail to @smsgw.localdomain.lan, via its dedicated gateway, using the “standard” smtp transport.
    • Route mail to @localdomain.lan, through the main SMTP gateway, using the smtp transport.
    • Route any other message through the main SMTP gateway, but use our custom transport.
    [root@hostname postfix]# tail -4 transport
    /@hostname\.localdomain\.lan$/  local:hostname.localdomain.lan
    /@smsgw\.localdomain\.lan$/     smtp:[smsgw.localdomain.lan]
    /@localdomain\.lan/             smtp:[gateway.localdomain.lan]
    /@.*$/                          toext:[gateway.localdomain.lan]
  • The custom transport’s generic map rewrites sender addresses, replacing the FQDN with the external domain name and changing the address part before the @ sign. The hostname is stripped from the domain, but I still want to be able to tell, at a glance, where the message originates from. When they leave the mail system, rewritten addresses look like username-hostname@extdomain.com.
    [root@hostname postfix]# cat generic_toext
    /^(.*)@hostname\.localdomain\.lan$/ $1-hostname@extdomain.com
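
To double check the “B given A depending on C” logic without sending test mails, the two maps can be simulated in Python. This is only a model of the configuration above, not of Postfix itself: it assumes first-match-wins, case-insensitive regexp lookups, and the mailbox names used in the example (nagios, admin, someone) are invented.

```python
import re

# Same entries as the transport and generic_toext files above, in order.
TRANSPORT = [
    (r"@hostname\.localdomain\.lan$", "local:hostname.localdomain.lan"),
    (r"@smsgw\.localdomain\.lan$", "smtp:[smsgw.localdomain.lan]"),
    (r"@localdomain\.lan", "smtp:[gateway.localdomain.lan]"),
    (r"@.*$", "toext:[gateway.localdomain.lan]"),
]
GENERIC_TOEXT = [
    (r"^(.*)@hostname\.localdomain\.lan$", r"\1-hostname@extdomain.com"),
]


def route(recipient):
    """Pick the transport for a recipient: the first matching pattern wins."""
    for pattern, result in TRANSPORT:
        if re.search(pattern, recipient, re.IGNORECASE):
            return result
    return None


def sender_on_the_wire(sender, recipient):
    """Return the sender address as it leaves the mail system."""
    transport = route(recipient)
    if transport and transport.startswith("toext:"):
        # Only the custom transport applies the generic map.
        for pattern, replacement in GENERIC_TOEXT:
            if re.search(pattern, sender, re.IGNORECASE):
                return re.sub(pattern, replacement, sender, flags=re.IGNORECASE)
    return sender


SENDER = "nagios@hostname.localdomain.lan"
print(sender_on_the_wire(SENDER, "admin@localdomain.lan"))
print(sender_on_the_wire(SENDER, "someone@example.org"))
```

The first call leaves the sender untouched (internal recipient), the second one masquerades it (external recipient).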
2010
03.28

Today Internet browsing is particularly slow.
At seemingly random intervals, available bandwidth drops and people get more and more irritable. :)

How do you find out why this is happening?

The possible causes boil down to:

  A. Router/Firewall[1] is not pleased by “something”. Could be an attack or a bug in the device firmware.
  B. Too many connections. Maybe they’re not passing much traffic, but the internet gateway can’t keep up with their number. I’ve seen firewalls perform very badly in this respect. E.g.: 3 connections trying to download/upload as fast as they can reach a total, aggregate b/w of 10Mbps; those same 3 plus 3000 “normal” connections only get to a total b/w of 6Mbps.
  C. A reasonable number of connections, effectively eating all of the available bandwidth.

I’ll skip case A, for now. ;)
In case B you’ll likely want to know the firewall’s idea of “netstat”, meaning the complete listing of TCP/UDP/other connections. No big deal if the device has got some sort of CLI access: capture its output, import it into a spreadsheet, or use awk/sort/grep[2] to build your stats. Usually, computing the total number of connections per source IP address and sorting accordingly is enough to gain some insight into what’s going on.
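
As an illustration of that per-source count, here is a minimal Python sketch. The dump format (protocol, source ip:port, destination ip:port) is entirely hypothetical, since every firewall prints its session table differently; adjust `src_column` and the parsing to whatever your device outputs.

```python
from collections import Counter

# Hypothetical session dump: protocol, source ip:port, destination ip:port.
DUMP = """\
tcp 192.168.200.15:51234 203.0.113.7:80
tcp 192.168.200.15:51235 203.0.113.7:80
udp 192.168.200.22:5353 198.51.100.9:53
tcp 192.168.200.15:51236 198.51.100.20:443
"""


def connections_per_source(dump, src_column=1):
    """Count connections per source IP, busiest sources first."""
    counts = Counter()
    for line in dump.splitlines():
        fields = line.split()
        if len(fields) > src_column:
            # Strip the trailing ":port" to keep just the address.
            counts[fields[src_column].rsplit(":", 1)[0]] += 1
    return counts.most_common()


for address, total in connections_per_source(DUMP):
    print(address, total)
```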
Case C… For long-running (days) data analysis, you could use a tool like NTOP. But if, like me today, you need to act quickly (perhaps because you know that the issue will disappear soon), iftop can hardly be beaten.
Both tools require the machine they run on to be able to “sniff” all the traffic passing through the firewall. This can be accomplished by configuring monitoring/monitored port(s) on a switch. Monitored ports get their inbound/outbound traffic copied to the monitoring one. Different vendors call the thing different names; “port mirroring” is also a good keyphrase. Here are a couple of resources:

(You could as well use a hub instead of a switch and get implicit mirroring of any port, to any port of the hub. Just unplug the firewall, link the hub to the switch, plug firewall and monitoring host in the hub. Kludgy but quick and easy, if you can afford the temporary cabling changes, and the bottleneck introduced by the hub…)

So:

  • Find the switch the firewall is connected to. Which side of the firewall? It depends on where you believe the issue originates from. Let’s say the culprit is most likely to lie on the LAN → switch port A.
  • Connect your laptop/monitoring machine to the same switch → port B.
  • Set up monitoring: port A is monitored, port B is monitoring.
  • Run iftop, telling it to show port numbers (“-P”; without this switch, you’ll only see totals per source/destination IP address pair), not to resolve hostnames (“-n”), which interface to listen on (“-i eth0”), and which filter to apply (“-f”; here I’m selecting packets whose source is not on the LAN[3]). The “-p” option instructs iftop to capture packets in promiscuous mode. Without it, iftop won’t lift off the wire packets that aren’t addressed to the machine on which it is running.
    iftop -p -P -n -i eth0 -f 'not src net 192.168.200.0/23'

    Iftop will produce a realtime table of running connections, sorted by how demanding they are in terms of bandwidth (10s average, by default). See the screenshot below; the top connections are due to two running video conference streams stealing 1Mbit/second worth of bandwidth, each.

    [Screenshot: iftop's output]


    Once everything is set up and you’re able to read iftop’s output, spotting the “top talkers” of your net becomes child’s play. Enjoy!
    1. for brevity, I’ll just say “firewall” from now on.
    2. Yuri is king at doing that. See his AWK weekly series.
    3. iftop will still show these source addresses, since its output is always made of bidirectional “connections”. Only, counters pertaining to the LAN → outside direction, won’t increase.