Dumping streaming media in 25 lines of Perl

Analysing TCP-based protocols often means dealing with TCP sessions (also called streams or flows).
A TCP connection, from an application's point of view, is much like a bidirectional file descriptor through which ordered data can be read or written. “On the wire”, though, there is no such ordering guarantee: the data is split into packets that may arrive out of order, be retransmitted, or be interleaved with other traffic. You can capture packets using a sniffer, but to make any sense of them you also need an analyzer able to do the reordering/reassembling job. Wireshark, for instance, doubles as a sniffer and an analyzer, backed by the ubiquitous libpcap.

Imagine having dumped/sniffed 1GB worth of traffic. We would like to pinpoint a single TCP session, isolating it from the rest. Here’s how we could proceed:

  • Identify the source/destination addresses and source/destination ports we’re interested in, then throw away any packet that doesn’t match this tuple. That’s basically what Wireshark does when you select a packet, right click and hit “Follow TCP Stream”. As long as the same tuple isn’t reused for another, unrelated session, this method works just fine [1].
  • Reorder/reassemble packets.
  • Extract packets’ payload.
  • Present the payload in a way that makes sense. That depends on the L7 protocol. HTTP without keep-alive is strictly request/response: first print what the client sent to the server (outbound traffic), then what the server answered (inbound traffic). Other protocols may behave differently, and you may choose to separate inbound traffic from outbound, or rely on timing to correctly present the dialogue between peers.
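
The steps above can be sketched in a few lines of plain Perl. This is only an illustration, not what Wireshark or Net::Analysis actually do internally: the packets are hypothetical, already-decoded hashes (a real tool would get them from libpcap), and the sketch ignores sequence number wraparound, overlapping segments and gaps.

```perl
use strict;
use warnings;

# Hypothetical captured packets, deliberately shuffled and mixed with
# unrelated traffic. The field names (src, dst, seq, payload) are made up.
my @packets = (
    { src => '10.0.0.2:40000', dst => '10.0.0.1:80',    seq => 1016, payload => "Host: example\r\n\r\n" },
    { src => '10.0.0.9:40001', dst => '10.0.0.1:80',    seq => 5000, payload => "unrelated" },
    { src => '10.0.0.2:40000', dst => '10.0.0.1:80',    seq => 1000, payload => "GET / HTTP/1.0\r\n" },
    { src => '10.0.0.1:80',    dst => '10.0.0.2:40000', seq => 9000, payload => "HTTP/1.0 200 OK\r\n\r\nhi" },
    { src => '10.0.0.2:40000', dst => '10.0.0.1:80',    seq => 1000, payload => "GET / HTTP/1.0\r\n" }, # retransmission
);

# Rebuild one direction of a session: keep only the packets matching the
# tuple, drop retransmissions (same sequence number seen twice), sort by
# sequence number and concatenate the payloads.
sub reassemble {
    my ($packets, $src, $dst) = @_;
    my %seen;
    my @segments = sort { $a->{seq} <=> $b->{seq} }
                   grep { !$seen{ $_->{seq} }++ }
                   grep { $_->{src} eq $src && $_->{dst} eq $dst } @$packets;
    return join '', map { $_->{payload} } @segments;
}

my ($client, $server) = ('10.0.0.2:40000', '10.0.0.1:80');
my $request  = reassemble(\@packets, $client, $server);
my $response = reassemble(\@packets, $server, $client);

# HTTP without keep-alive: show the request first, then the response.
print "--- client to server ---\n$request";
print "--- server to client ---\n$response\n";
```

Each direction is reassembled separately because the two peers use independent sequence numbers.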

Besides Wireshark, there are tools that do just that and can also be automated. See TShark or tcpflow.

What if you want to script everything and build your own TCP analyzer? The Perl module Net::Analysis is surprisingly convenient for the task. It does the dirty job I described above and hands your code TCP sessions that are ready to be processed.

Practical goal: saving MP3 files streamed by Grooveshark. Disclaimer: I’m by no means pushing anyone to illegally download stuff, this is just a working, sensible, instructional example that uses a song freely available anyway (by Revolution Void, check them out here, they’re great).

GroovesharkListener.pm extends Net::Analysis::Listener::HTTP. It sniffs all the traffic from/to port 80 and, as soon as it sees an HTTP response whose Content-Type contains “audio”, dumps its content to a file and quits. Simple as that.
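
To get a feel for the work the HTTP listener saves you, here is a rough, standalone sketch of splitting a reassembled response into headers and body and checking its Content-Type. It is not the module's actual code: parse_http_response and is_audio are hypothetical helpers, the sample responses are made up, and chunked encoding and folded headers are ignored.

```perl
use strict;
use warnings;

# Split a raw HTTP response into status line, headers and body.
sub parse_http_response {
    my ($raw) = @_;
    # Headers end at the first empty line; the body may itself contain CRLFs.
    my ($head, $body) = split /\r\n\r\n/, $raw, 2;
    my ($status_line, @header_lines) = split /\r\n/, $head;
    my %headers;
    for my $line (@header_lines) {
        my ($name, $value) = $line =~ /^([^:]+):\s*(.*)$/ or next;
        $headers{ lc $name } = $value;    # header names are case-insensitive
    }
    return {
        status  => $status_line,
        headers => \%headers,
        body    => defined $body ? $body : '',
    };
}

# Same test the listener applies: does the Content-Type mention "audio"?
sub is_audio {
    my ($response) = @_;
    my $type = $response->{headers}{'content-type'};
    return (defined $type && $type =~ /audio/i) ? 1 : 0;
}

# Made-up responses standing in for reassembled server-to-client streams.
my $raw_html  = "HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=UTF-8\r\n\r\n<html></html>";
my $raw_audio = "HTTP/1.1 200 OK\r\nContent-Type: audio/mpeg\r\nContent-Length: 4\r\n\r\n\xFF\xFB\x90\x00";

for my $raw ($raw_html, $raw_audio) {
    my $resp = parse_http_response($raw);
    printf "%-28s audio=%d body=%d bytes\n",
        $resp->{headers}{'content-type'}, is_audio($resp), length $resp->{body};
}
```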

Put the module some place where Perl can find it and then launch (as root):

# perl -MNet::Analysis -e main GroovesharkListener 'port 80'
(starting live capture)
text/html; charset=UTF-8
[...some more cruft...]
Dumping 8481224 bytes to groovesharkgyzBy.mp3 be patient...

# id3v2 -l groovesharkgyzBy.mp3
id3v1 tag info for groovesharkgyzBy.mp3:
Title  : Invisible Walls                 Artist: Revolution Void              
Album  : Increase the Dosage             Year: 2004, Genre: Other (12)
Comment: http://www.jamendo.com/         Track: 1

That’s it. Just one more thing: Net::Analysis doesn’t let you select a specific network interface; it just picks the first available one. I wrote a small patch to address this shortcoming. It adds a “device=” parameter that you can use this way:

# perl -MNet::Analysis -e main GroovesharkListener,device=wlan1 'port 80'

And here’s what GroovesharkListener.pm looks like:

# choose a song
# run (as root or via sudo):
#   perl -MNet::Analysis -e main GroovesharkListener 'port 80'
# click "play" and wait for the file to be dumped...
#                             -- Giuliano - http://www.108.bz
package Net::Analysis::Listener::GroovesharkListener;
use strict;
use base qw(Net::Analysis::Listener::HTTP);
use File::Temp;

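# Called for each HTTP request/response pair seen by the listener.
sub http_transaction {
    my ($self, $args) = @_;
    my ($http_req) = $args->{req};
    my ($http_resp) = $args->{resp};

    print $http_req->uri(), "\n";
    # The header may be missing; fall back to an empty string.
    my $content_type = $http_resp->header('Content-Type');
    $content_type = '' unless defined $content_type;
    print "$content_type\n";
    if ($content_type =~ /audio/i) {
        my $fh = File::Temp->new(TEMPLATE => 'groovesharkXXXXX',
            SUFFIX   => '.mp3',
            UNLINK   => 0);
        binmode $fh;    # MP3 data is binary
        print "Dumping ".length($http_resp->content)." bytes to ".$fh->filename." be patient...\n";
        print $fh $http_resp->content;
        # Ending reconstructed from the description above: flush the file and quit.
        close $fh;
        exit 0;
    }
}

1;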

  1. Newer Wireshark versions use the “tcp.stream eq x” primitive instead.

7 comments so far

  1. great post!

    Such a shame I could not install Net::Analysis, because it was impossible to install all the packages on my Mac…

    Now I’m trying to write an Objective-C version of your script

    • Well, using Objective-C you’ll go far beyond 25 lines of code. :)
      Net::Analysis and Net::Pcap should indeed work on Mac OS X: they’re using the same libraries as tcpdump. I’ll try as soon as I’ve got a Mac at hand.

      Good luck with your program,


  2. I’m a complete Perl newbie, and I can’t find a place where Perl will find GroovesharkListener.pm other than the main Perl library (which I cannot access without the root password!). Can you help me by explaining where or how I can place that piece of software?

    • Chances are that you’ll need root access anyway to be able to sniff traffic…
      But try something like this:
      $ mkdir -p Net/Analysis/Listener
      $ mv GroovesharkListener.pm Net/Analysis/Listener/
      $ sudo perl -I. -MNet::Analysis -e main 'GroovesharkListener' 'port 80'
      (starting live capture)

      See the “sudo”? You may do without it if your user has access to whatever packet capture facility your OS provides…
      One more thing: should you need to target a specific network interface, I’ve got a patch to Net::Analysis just for that.


  3. Great stuff! It will be definitely useful for my iPod and my gym session :)

  5. Great post; useful stuff!