Having worked with networking gear for many years, I thought it was about time to jump in and post something to our blog, and why not start by talking about pcap files? As most of you already know, when you test and support networking products, it is common to be handed a big pcap file. Often the file is so big that it is slow to open in Wireshark at best, and impossible to open at worst. Make no mistake, I am a big fan of Wireshark and cannot remember a day on the job when I didn’t use this wonderful tool. But how do you complete a task such as “grab some TCP sessions where there is no data from the server” if opening a 200MB pcap file crashes Wireshark every time?
No worries, programming to the rescue!
To solve the problem I used Perl (feel free to use your favorite language) to open a pcap file and do some analysis. Let us look at finding sessions where the client sent data but the server didn’t send any data in response. To make it easy I’ve included all the steps I took and, where appropriate, the code. Since the point is to illustrate how to use a scripting language like Perl to do the job, the code is greatly simplified. For the reader’s convenience, the complete code is listed at the end.
Step 1. Open the pcap file and put it in binary mode:
$inputFile = $ARGV[0]; #the first command line parameter is the name of the pcap file
open(FD, "<$inputFile") || die "failed to open $inputFile $!\n";
#a pcap file is a binary file, so open it in binary mode
binmode(FD);
Step 2a. Knowing the structure of the pcap file is helpful here: pcap files typically start with a 24-byte file header followed by a sequence of packets. The most important thing to get from the file header is the first 4 bytes, which tell us the endianness.
read(FD, $fileHdr, 24); #read the 24 byte pcap file header
checkFileHdr($fileHdr); #the routine is defined later, it checks the file header to find endianness
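The file header holds more than the magic number. As a minimal sketch, the routine below decodes the remaining fields as well, which makes it possible to confirm that the capture is Ethernet (link type 1) before trusting the fixed offsets used later. The name parseFileHdr and the returned hash keys are hypothetical and are not part of the script in this post; the sample header is synthetic.

```perl
use strict;
use warnings;

# Decode the 24-byte pcap file header.
# Returns a hash reference, or nothing if the header is short or the magic is unknown.
sub parseFileHdr {
    my ($hdr) = @_;
    return if length($hdr) < 24;

    # Read the magic number as big-endian to decide the byte order of the rest.
    my $magic = unpack("N", $hdr);
    my $template;
    if    ($magic == 0xa1b2c3d4) { $template = "N n n N N N N"; }   # big-endian file
    elsif ($magic == 0xd4c3b2a1) { $template = "V v v V V V V"; }   # little-endian file
    else                         { return; }

    # Fields: magic, major, minor, time zone offset, sigfigs, snaplen, link type.
    my (undef, $major, $minor, undef, undef, $snapLen, $linkType) = unpack($template, $hdr);
    return {
        pktHdrFormat => ($magic == 0xa1b2c3d4 ? "NNNN" : "VVVV"),
        version      => "$major.$minor",
        snapLen      => $snapLen,
        linkType     => $linkType,    # 1 means Ethernet
    };
}

# Synthetic little-endian header: version 2.4, snaplen 65535, Ethernet.
my $sample = pack("V v v V V V V", 0xa1b2c3d4, 2, 4, 0, 0, 65535, 1);
my $info   = parseFileHdr($sample);
print "format=$info->{pktHdrFormat} version=$info->{version} linktype=$info->{linkType}\n";
```

Returning the unpack format instead of setting a global keeps the routine easy to test on its own; the script in this post uses a global $pktHdrFormat for brevity.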
Step 2b. Each packet consists of a 16-byte packet header (timestamp, lengths and so on; be sure not to confuse it with the protocol headers) plus the packet data. For details, see a description of the pcap file format. We process each packet by first reading the 16 bytes of header and then reading the packet data. Note how endianness plays a role here.
while (!eof(FD))
{
read(FD, $pktHdr, 16);
my ($pktSecond,$pktMicroSecond,$capturedPktLen,$actualPktLen)
= unpack($pktHdrFormat, $pktHdr);
#endianness determines how time and packet length are stored.
if (read(FD, $pktBuf, $capturedPktLen) != $capturedPktLen )
{ print "Failed to read pkt data\n"; last;}
$proto = unpack("C", substr($pktBuf, 23,1));
#this is the byte in ip hdr for protocol type, UDP is 17, TCP is 6
if ($proto == 6) #we are only interested in TCP packet for this task
{
processTCPPkt(); #we will explain this function next
}
}
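The loop above checks the read of the packet data but not the read of the 16-byte header. The following is a minimal sketch of the same record-reading step with both reads checked. The helper name readPacket is hypothetical, and the two records are synthetic data held in an in-memory filehandle so the sketch runs without a real capture.

```perl
use strict;
use warnings;

# Read one packet record from $fh. Returns (seconds, microseconds, data),
# or an empty list at end of file or on a truncated record.
sub readPacket {
    my ($fh, $pktHdrFormat) = @_;
    my ($pktHdr, $pktBuf) = ('', '');
    return unless (read($fh, $pktHdr, 16) || 0) == 16;            # 16-byte record header
    my ($sec, $usec, $capturedLen, $actualLen) = unpack($pktHdrFormat, $pktHdr);
    return unless (read($fh, $pktBuf, $capturedLen) || 0) == $capturedLen;
    return ($sec, $usec, $pktBuf);
}

# Two synthetic little-endian records instead of a real capture file.
my $records = pack("VVVV", 100, 5, 3, 3)  . "abc"
            . pack("VVVV", 101, 7, 2, 60) . "xy";
open(my $fh, '<', \$records) or die "cannot open in-memory file: $!";
binmode($fh);
while (my ($sec, $usec, $data) = readPacket($fh, "VVVV")) {
    printf "%d.%06d captured %d bytes\n", $sec, $usec, length($data);
}
close($fh);
```

With a real capture, the same loop would run on the filehandle opened in Step 1, after the 24-byte file header has been consumed.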
Step 3. In the function “processTCPPkt”, we need to know the offsets of various fields, such as the IP header protocol field, source IP address, destination IP address, source TCP port, destination TCP port and more. How? Once again I rely on Wireshark (with a small pcap file, of course). It clearly shows, for example, that the source IP address is at offset 26 and the source TCP port is at offset 34. These offsets are counted from the start of the Ethernet frame and assume that the packet has no VLAN tags or IP options: the Ethernet header is 14 bytes and the IP header is 20 bytes, so the TCP header starts at 34.
This function looks at each TCP packet. If it’s a TCP SYN packet, we set up a hash entry for the TCP session using the src IP and port and the dst IP and port. If it’s a data packet, we use the hash to find whether it’s from the client or the server, and we update the TCP session state to record whether there is client data or server data.
sub processTCPPkt
{
my $srcIp = substr($pktBuf, 26,4);
my $dstIp = substr($pktBuf, 30,4);
my $srcPort = substr($pktBuf, 34,2);
my $dstPort = substr($pktBuf, 36,2);
my $tcpFlags = unpack("C", substr($pktBuf, 47,1));
if ($tcpFlags == 2) #2 means TCP SYN
{
$hashKey = "$srcIp$srcPort$dstIp$dstPort";
if (! defined $tcpSessionState{$hashKey})
{
$tcpSessionState{$hashKey} = 0;
#note that TCP session is only hashed from client side to server side,
#i.e. $srcIp is the TCP client IP
}
} elsif ($tcpFlags == 0x18) #TCP PSH ACK pkt, a.k.a data packet
{
$hashKey = "$srcIp$srcPort$dstIp$dstPort";
if (defined $tcpSessionState{$hashKey})
#this pkt is from client, see the above for tcp session hash set up
{
$tcpSessionState{$hashKey} |= 1;
#so if the session has client data, the least significant bit will be 1
} else
{
$hashKey = "$dstIp$dstPort$srcIp$srcPort";
if (defined $tcpSessionState{$hashKey})
#this pkt is from server
{
$tcpSessionState{$hashKey} |= 2;
#if the session has server data, the second least significant bit will be 1
}
}
}
}
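The fixed offsets in processTCPPkt hold only for untagged Ethernet frames carrying IPv4 without options. As a hedged sketch of how those assumptions could be relaxed, the helper below derives the header positions from the frame itself, skipping a single 802.1Q VLAN tag and honoring the IP header length field. The name findTcpOffsets is hypothetical, and the frame in the example is synthetic.

```perl
use strict;
use warnings;

# Locate the IPv4 and TCP headers inside an Ethernet frame.
# Returns (ipOffset, tcpOffset), or an empty list if the frame is not IPv4/TCP.
sub findTcpOffsets {
    my ($pktBuf) = @_;
    return if length($pktBuf) < 14;
    my $ipOffset  = 14;                                   # plain Ethernet header
    my $etherType = unpack("n", substr($pktBuf, 12, 2));
    if ($etherType == 0x8100) {                           # one 802.1Q VLAN tag
        return if length($pktBuf) < 18;
        $ipOffset  = 18;
        $etherType = unpack("n", substr($pktBuf, 16, 2));
    }
    return unless $etherType == 0x0800;                   # IPv4 only
    return if length($pktBuf) < $ipOffset + 20;
    my $verIhl = unpack("C", substr($pktBuf, $ipOffset, 1));
    my $proto  = unpack("C", substr($pktBuf, $ipOffset + 9, 1));
    return unless ($verIhl >> 4) == 4 && $proto == 6;     # IPv4 carrying TCP
    my $tcpOffset = $ipOffset + ($verIhl & 0x0f) * 4;     # IHL counts 32-bit words
    return if length($pktBuf) < $tcpOffset + 20;
    return ($ipOffset, $tcpOffset);
}

# Synthetic frame: Ethernet + 20-byte IPv4 header + 20-byte TCP header (SYN).
my $eth = ("\0" x 12) . pack("n", 0x0800);
my $ip  = pack("C C n n n C C n a4 a4", 0x45, 0, 40, 0, 0, 64, 6, 0,
               pack("C4", 1, 1, 17, 167), pack("C4", 1, 2, 167, 89));
my $tcp = pack("n n N N C C n n n", 7697, 80, 0, 0, 0x50, 0x02, 8192, 0, 0);
my ($ipOff, $tcpOff) = findTcpOffsets($eth . $ip . $tcp);
my $flags = unpack("C", substr($eth . $ip . $tcp, $tcpOff + 13, 1));
print "ip at $ipOff, tcp at $tcpOff, flags=$flags\n";
```

For the untagged, option-free case this gives 14 and 34, so the source IP sits at 14+12=26 and the TCP flags at 34+13=47, which matches the constants used in processTCPPkt.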
Step 4. After we go through all the packets, we go through all the TCP sessions and print the sessions that have no server data.
foreach $tcpSession (keys %tcpSessionState)
{
if (($tcpSessionState{$tcpSession} & 2) == 0)
{
printSession($tcpSession);
}
}
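Note that the check above prints every session without server data, including sessions where the client never sent data either. If only sessions with client data and no server response are wanted, both bits can be tested together. The following is a minimal sketch; the constant names, the helper unansweredSessions and the demo table with readable keys are hypothetical.

```perl
use strict;
use warnings;

use constant CLIENT_DATA => 1;   # bit 0: the client sent data
use constant SERVER_DATA => 2;   # bit 1: the server sent data

# Return the keys of sessions where the client sent data and the server did not.
sub unansweredSessions {
    my ($state) = @_;
    return grep { ($state->{$_} & (CLIENT_DATA | SERVER_DATA)) == CLIENT_DATA }
           sort keys %$state;
}

# Hypothetical session table with readable keys instead of packed bytes.
my %demo = (
    "a" => 0,                            # SYN only, no data either way
    "b" => CLIENT_DATA,                  # request without a response
    "c" => CLIENT_DATA | SERVER_DATA,    # normal exchange
    "d" => SERVER_DATA,                  # server spoke first
);
print join(",", unansweredSessions(\%demo)), "\n";
```

Only session "b" passes the stricter test, whereas the original check would also report session "a".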
We save it to a file, say, “findBadSessions.pl”. Here is part of the sample output for the command:
perl findBadSessions.pl bigPcap.pcap
1.1.17.167:7697 --> 1.2.167.89:80
1.1.172.218:7840 --> 1.2.26.37:80
1.1.121.19:7698 --> 1.2.122.238:80
1.1.127.196:7652 --> 1.2.129.100:80
1.1.172.131:7532 --> 1.2.174.40:80
You can then grab a TCP session using tcpdump or windump, for example:
windump -r "bigPcap.pcap" -w output.pcap host 1.1.17.167 and tcp port 7697
Your output.pcap will then contain the TCP session, and it is small enough to be opened by Wireshark.
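If many sessions need to be extracted, the capture filter can be generated from the session key instead of typed by hand. This is a small sketch that assumes the 12-byte key layout used in this post (client IP, client port, server IP, server port); the helper name sessionFilter is hypothetical. It produces a somewhat stricter filter than the one above by naming both hosts and both ports.

```perl
use strict;
use warnings;

# Build a tcpdump/windump filter expression from a 12-byte session key.
sub sessionFilter {
    my ($session) = @_;
    # Key layout: 4 bytes client IP, 2 bytes client port, 4 bytes server IP, 2 bytes server port.
    my @f = unpack("C4 n C4 n", $session);
    return sprintf("host %d.%d.%d.%d and host %d.%d.%d.%d and tcp port %d and tcp port %d",
                   @f[0..3], @f[5..8], $f[4], $f[9]);
}

# Session key built by hand for the first line of the sample output.
my $session = pack("C4 n C4 n", 1, 1, 17, 167, 7697, 1, 2, 167, 89, 80);
print sessionFilter($session), "\n";
```

The printed expression can be passed as the filter argument after the -r and -w options, in the same way as the filter in the command above.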
Since the code is arranged to make it easy to read, you may want to format it in your favorite coding style and add more error checks as you see fit. Whatever you do, let me know in the comments section. You may also be wondering about the speed of Perl when processing a big pcap file. Yes, a program written in C is faster, but Perl is also fast. On my Windows XP machine (3.1 GHz, dual core), I ran this Perl program on a 234MB pcap file with 13655 TCP sessions and it took about 2 seconds.
With this method we can also do the following:
- Find which TCP session has retransmissions from a big pcap file
- Open two big pcap files and find out which packets are present in the first pcap file but not in the second one. This is useful in determining which packets are dropped by a device under test.
- Determine the average latency between HTTP request and HTTP response.
- Honestly, using this method the options are limitless.
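As an illustration of the first item in the list, here is a rough sketch of one way retransmissions might be spotted: remember each data segment by its direction and TCP sequence number, and flag any segment that shows up a second time. This is a simple heuristic, not the method used by Wireshark, and it ignores cases such as keep-alives and sequence number wraparound. The helper name isRetransmission is hypothetical; in the script from this post, the direction key would be "$srcIp$srcPort$dstIp$dstPort", the sequence number would come from unpack("N", substr($pktBuf, 38, 4)), and the payload length would have to be derived from the IP and TCP header lengths.

```perl
use strict;
use warnings;

my %seenSegment;   # "direction:sequence number" -> number of times seen

# Returns 1 if a data segment with the same direction and sequence number
# was seen before, 0 otherwise. Segments without payload are ignored.
sub isRetransmission {
    my ($dirKey, $seq, $payloadLen) = @_;
    return 0 if $payloadLen == 0;
    return $seenSegment{"$dirKey:$seq"}++ ? 1 : 0;
}

# Readable direction labels stand in for the packed keys of the real script.
print isRetransmission("c->s", 1000, 100), "\n";   # first time this segment is seen
print isRetransmission("c->s", 1100, 100), "\n";   # next segment in the stream
print isRetransmission("c->s", 1000, 100), "\n";   # same sequence number again
print isRetransmission("s->c", 1000, 0),   "\n";   # pure ACK, no payload
```

A session could then be reported as soon as the function returns 1 for any of its packets.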
One of the better side benefits of completing this task is when you hear compliments from your colleagues in the form of the question, “How did you find the needle in the haystack?”.
For the sake of completeness, here is the entire Perl script:
my $pktHdrFormat; #depending on the endianness of the file hdr, the 16 byte pkt hdr should be read accordingly
my %tcpSessionState; #this is used to keep track of all the TCP sessions.
$inputFile = $ARGV[0]; #the first command line parameter is the name of the pcap file
open(FD, "<$inputFile") || die "failed to open $inputFile $!\n";
binmode(FD); #pcap file is a binary file, so open it in binary mode
read(FD, $fileHdr, 24); #read the 24 byte pcap file header
checkFileHdr($fileHdr); #check the file header to find endianness
#now process each packet
my $pktSecond;
my $pktMicroSecond;
my $capturedPktLen;
my $actualPktLen;
while (!eof(FD))
{
read(FD, $pktHdr, 16);
my ($pktSecond,$pktMicroSecond,$capturedPktLen,$actualPktLen) = unpack($pktHdrFormat, $pktHdr);
#print "$pktSecond,$pktMicroSecond,$capturedPktLen,$actualPktLen\n";
if (read(FD, $pktBuf, $capturedPktLen) != $capturedPktLen ) { print "Failed to read pkt data\n"; last;}
$proto = unpack("C", substr($pktBuf, 23,1));
#this is the byte in ip hdr for protocol type, UDP is 17, TCP is 6
if ($proto == 6) #we are only interested in TCP packet for this task
{
processTCPPkt();
}
}
foreach $tcpSession (keys %tcpSessionState)
{
if (($tcpSessionState{$tcpSession} & 2) == 0)
{
printSession($tcpSession);
}
}
#this function looks at each tcp packet. If it's a TCP SYN, we set up a hash entry for the TCP session
#using src IP and port, dst IP and port. If it's a data packet, we use the hash to find whether it's from
#the client or the server, and we update the tcp session state to record
#whether there is client data or server data
sub processTCPPkt
{
my $srcIp = substr($pktBuf, 26,4);
my $dstIp = substr($pktBuf, 30,4);
my $srcPort = substr($pktBuf, 34,2);
my $dstPort = substr($pktBuf, 36,2);
my $tcpFlags = unpack("C", substr($pktBuf, 47,1));
if ($tcpFlags == 2) #2 means TCP SYN
{
$hashKey = "$srcIp$srcPort$dstIp$dstPort";
if (! defined $tcpSessionState{$hashKey})
{
#print "SYN\n";
$tcpSessionState{$hashKey} = 0;
#note that TCP session is only hashed from client side to server side,
#i.e. $srcIp is the TCP client IP
}
} elsif ($tcpFlags == 0x18) #TCP PSH ACK pkt, a.k.a data packet
{
$hashKey = "$srcIp$srcPort$dstIp$dstPort";
if (defined $tcpSessionState{$hashKey})
#this pkt is from client, see the above for tcp session hash set up
{
$tcpSessionState{$hashKey} |= 1;
#so if the session has client data, the least significant bit will be 1
} else
{
$hashKey = "$dstIp$dstPort$srcIp$srcPort";
if (defined $tcpSessionState{$hashKey})
#this pkt is from server
{ $tcpSessionState{$hashKey} |= 2;
#if the session has server data, the second least significant bit will be 1
}
}
}
}
sub checkFileHdr
{
my $fHdr = shift;
my $signature = unpack("N", substr($fHdr,0,4));
if ($signature == 0xa1b2c3d4)
{
$pktHdrFormat = "NNNN";
} elsif ($signature == 0xd4c3b2a1)
{
$pktHdrFormat = "VVVV";
} else
{
die "unexpected signature bytes";
}
}
sub printSession
{
my $session = shift;
my @B = unpack("C*", $session);
printf "%d.%d.%d.%d:%d --> %d.%d.%d.%d:%d\n",
$B[0], $B[1],$B[2],$B[3], $B[4]*256+$B[5], $B[6],$B[7],$B[8],$B[9], $B[10]*256+$B[11];
}