What Are Traffic Analysis and Metadata?


In Learning Tree’s System and Network Security Introduction we discuss “traffic analysis,” noting that even if data are encrypted, an observer can still learn a good deal by looking at who is sending encrypted data to whom. Along that same line, there has been a lot of discussion in the press recently about “metadata”: information that describes data or is sent along with it. In short, metadata is data about data, and it is metadata that makes most forms of Internet traffic analysis possible. In this post I’d like to provide a brief introduction to the ideas of metadata and traffic analysis.

Traffic Analysis

The basic idea behind traffic analysis, at least as it applies here, is that there are patterns in many forms of communication. As we are concerned with cyber security, we’ll talk primarily about patterns in network traffic. Traffic analysis is a much broader topic, and it also plays a part in the signals intelligence (SIGINT) work of military and other security agencies. I’ll leave studying that to your own research.

If we think just about IP or TCP/IP in particular, there is metadata associated with each packet. Specifically, we have at least the source and destination IP addresses and the TCP (or UDP) port numbers. We can learn a lot from those alone. Likewise we can learn a lot from the metadata associated with a call from a mobile phone. That metadata includes at least: calling party, called party, cell information, possibly GPS location information, and call duration.
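To make the packet metadata concrete, here is a minimal sketch in Python that pulls the source and destination addresses and ports out of raw IPv4 and TCP headers. The sample packet, its addresses (taken from the documentation-only ranges), and the function names are my own assumptions for illustration; a real capture tool such as Wireshark does far more.

```python
import socket
import struct


def build_sample_packet():
    """Build a made-up IPv4 + TCP header pair (all values are illustrative)."""
    ip_header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45,                               # version 4, header length 5 words (20 bytes)
        0,                                  # DSCP/ECN
        40,                                 # total length: 20-byte IP + 20-byte TCP header
        0,                                  # identification
        0,                                  # flags and fragment offset
        64,                                 # time to live
        socket.IPPROTO_TCP,                 # protocol number 6
        0,                                  # checksum (left at zero in this sketch)
        socket.inet_aton("192.0.2.10"),     # hypothetical source address
        socket.inet_aton("198.51.100.25"),  # hypothetical destination address
    )
    tcp_header = struct.pack(
        "!HHIIBBHHH",
        49152,   # source port
        25,      # destination port (SMTP)
        0,       # sequence number
        0,       # acknowledgment number
        5 << 4,  # data offset: 5 words, no options
        0x02,    # flags: SYN
        65535,   # window size
        0,       # checksum (left at zero in this sketch)
        0,       # urgent pointer
    )
    return ip_header + tcp_header


def extract_metadata(packet):
    """Return addressing metadata from an IPv4 packet that carries TCP or UDP."""
    header_length = (packet[0] & 0x0F) * 4   # IHL field counts 32-bit words
    protocol = packet[9]
    src_ip = socket.inet_ntoa(packet[12:16])
    dst_ip = socket.inet_ntoa(packet[16:20])
    # TCP and UDP both begin with source port, then destination port.
    src_port, dst_port = struct.unpack(
        "!HH", packet[header_length:header_length + 4]
    )
    return {
        "protocol": protocol,
        "src_ip": src_ip,
        "dst_ip": dst_ip,
        "src_port": src_port,
        "dst_port": dst_port,
    }


metadata = extract_metadata(build_sample_packet())
print(metadata)
```

Note that nothing in this sketch touches the payload: every field it reports would still be visible if the payload were encrypted.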

Consider an IP datagram from an IP address at Company X to one at Company Y. If we see repeated traffic between those two addresses or address ranges, we can surmise that at least one person at each of these companies is communicating with someone at the other. If the communication is over TCP port 25, it is highly likely that email is being exchanged between the two organizations. (Yes, port 25 could be used for something else in order to confuse the observer.) If in that case we have access to the whole IP datagram, and the contents of that datagram are unencrypted, we could read the email messages.
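The reasoning in that paragraph can be sketched in a few lines of Python. The flow records, the addresses, and the small port table below are assumptions made up for this example, and the port-to-service guess is only a heuristic since, as noted, a port can be used for something other than its usual service.

```python
from collections import Counter

# A small, illustrative table of conventional port assignments.
LIKELY_SERVICE = {
    21: "FTP (file transfer)",
    23: "Telnet (remote login)",
    25: "SMTP (email)",
    443: "HTTPS",
}


def summarize_flows(records):
    """Count observations per (source, destination, port) and guess the service.

    Each record is a (src_ip, dst_ip, dst_port) tuple. Results are ordered
    from the most frequently seen flow to the least.
    """
    counts = Counter(records)
    return [
        (src, dst, port, seen, LIKELY_SERVICE.get(port, "unknown"))
        for (src, dst, port), seen in counts.most_common()
    ]


# Hypothetical observations between "Company X" and "Company Y".
observed = [
    ("192.0.2.10", "198.51.100.25", 25),
    ("192.0.2.10", "198.51.100.25", 25),
    ("192.0.2.10", "198.51.100.25", 25),
    ("192.0.2.44", "198.51.100.80", 443),
]

for src, dst, port, seen, service in summarize_flows(observed):
    print(f"{src} -> {dst} port {port}: seen {seen} times, likely {service}")
```

Repeated entries for the same pair of addresses are what suggest an ongoing relationship; the port only hints at what kind.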

Wireshark capture showing TCP and IP metadata

That we can read email messages is not surprising. Most long-established TCP applications such as SMTP (email), FTP (file transfer), and Telnet (remote login) send their data unencrypted. That is because in the earliest days of the ARPAnet (the predecessor of the Internet) the connected sites were a small, trusted community of military, research, and contractor organizations, so encryption wasn’t viewed as essential. Computers were also much slower, and any kind of meaningful encryption would have imposed significant overhead.

Now, what if email between Companies X and Y suddenly became encrypted or the companies set up a VPN (Virtual Private Network)? From this we might conclude that the communication had a higher requirement for secrecy. That could, perhaps, signal a change in the relationship between the companies. If there had been “unconfirmed reports” of a business transaction between them, this might help confirm that possibility.
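As a rough sketch of how an observer might notice such a change, the Python below tracks what share of the flows between two organizations uses ports conventionally associated with encrypted services, period by period, and flags a sudden jump. The port list, the threshold, and the sample data are all assumptions for illustration. Port numbers are only a proxy here: mail encrypted with STARTTLS on port 25, for example, would not show up this way.

```python
# Ports conventionally tied to encrypted services or VPNs (an assumed,
# incomplete list): HTTPS, SMTP over TLS, IMAP over TLS, OpenVPN, IPsec NAT-T.
ENCRYPTED_PORTS = {443, 465, 993, 1194, 4500}


def encrypted_share_by_period(observations):
    """Map each period to the fraction of its flows on an 'encrypted' port.

    Each observation is a (period, dst_port) tuple; periods must be sortable.
    """
    totals = {}
    for period, port in observations:
        total, encrypted = totals.get(period, (0, 0))
        totals[period] = (total + 1, encrypted + (port in ENCRYPTED_PORTS))
    return {
        period: encrypted / total
        for period, (total, encrypted) in sorted(totals.items())
    }


def find_shifts(shares, threshold=0.5):
    """Return the periods whose encrypted share rose by at least `threshold`."""
    periods = list(shares)
    return [
        current
        for previous, current in zip(periods, periods[1:])
        if shares[current] - shares[previous] >= threshold
    ]


# Hypothetical weekly observations between Company X and Company Y.
weekly = (
    [(1, 25)] * 4            # week 1: plain SMTP only
    + [(2, 25)] * 4          # week 2: plain SMTP only
    + [(3, 465)] * 3 + [(3, 25)]  # week 3: mostly SMTP over TLS
)

shares = encrypted_share_by_period(weekly)
print(shares)
print(find_shifts(shares))
```

A flagged period is a prompt to look closer, not proof of anything; the companies may simply have upgraded their mail servers.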

We can also gain information from metadata even if all the communication between two parties is encrypted. Consider an employee of Company A who suddenly starts sending encrypted data to Company B: maybe she is applying for a job (or sharing data she should be keeping confidential). If I make repeated HTTPS connections to a particular bank, it is likely that the bank is either a) my bank, or b) one of my clients.

The point is that communication between two or more parties can imply some kind of relationship between the parties. It isn’t a guarantee, but it is a strong indicator.

There are three lessons here:

  1. Metadata is a valuable indicator of a relationship between parties.
  2. Even when content is encrypted, information about communication can be extracted from the metadata.
  3. When the nature of communication changes (e.g. from unencrypted to encrypted), the change itself may reveal something about the relationship between the parties.

To avoid leaking information through these avenues, individuals and organizations can employ one or several techniques including:

  • Encrypting all communication so encrypted communication is not perceived as “special”
  • Using VPNs to limit exposure of endpoint metadata
  • Using a tool designed to obscure metadata (including the actual source and destination of e.g. an email message) such as Tor.

Of course, these tools and techniques could be used by either the good guys or the bad guys. It is also important to note that intelligence agencies may have ways around many of the countermeasures that they don’t want to share for obvious reasons. But these tools and techniques do have legitimate uses for businesses, government agencies, and individuals.

What do you think about these tools and techniques? Let us know in the comments below.

To your safe computing,
John McDermott
