Proven Fingerprinting Techniques for Effective CAASM

Proven fingerprinting techniques for effective attack surface management

Tom Sellers

Rob King

Updated September 4, 2024, 2:08pm EDT

One of the key components of runZero's ability to provide asset discovery, exposure management, and attack surface management data is its ability to identify an asset's operating system (OS), hardware, and services aka fingerprinting. This is often performed with very little or even conflicting data.

In this blog, we explore commonly used fingerprinting techniques and gain insights from the runZero Research Team on their approach to deciphering a real-world fingerprinting challenge. Let's go!

Fingerprinting concepts #

For the purposes of this blog, “fingerprinting” is defined as the process of trying to identify, with as much precision as possible, some aspect of an asset. There can be significant variation in the precision that can be achieved when fingerprinting. With certain data we may be able to identify the operating system and exact build number. With different data, it may only be possible to vaguely bucket the asset into an OS family such as “Windows” or “Linux.” For services we can sometimes even determine the programming language it was written in and perhaps a range of language versions that may have been used. All outcomes can be possible against the same asset depending on which protocols and services we can observe.

Fingerprinting techniques generally fall into one of three categories:

three main types of fingerprinting techniques

An example of self identification based fingerprinting would be an SSH MOTD banner of "Red Hat Enterprise Linux Server release 5.11 (Tikanga)". That is pretty straightforward and doesn't require any additional data. Attribute based fingerprinting, which we will discuss further in the next sections, includes looking at various response and data attributes such as TCP field values such as MSS or Window Scale. Behavior based techniques typically take more work to find and implement. An example would be when a particular OS or service implementation drops a TCP connection only when sent a certain payload at a particular stage in protocol negotiation.

A hat by any other name #

Identifying the OS of a network-connected system, without credentials, and with minimal services, has always been a game of precision. Some of the trickiest examples are the forks of the Red Hat Enterprise Linux (RHEL) distribution.

CentOS and certain other Linux distributions such as Oracle Linux were originally forks or “bug and binary compatible” redistributions of Red Hat Enterprise Linux. The relationship changed in 2021 when Red Hat, which acquired CentOS in 2014, discontinued CentOS Linux and created CentOS Stream. With this change CentOS would no longer be downstream of RHEL but would instead be the upstream source from which RHEL is created. The logical flow has since changed again and now has Fedora as the root with both CentOS Stream and RHEL downstream. In response to CentOS Linux being discontinued two new distributions were created: AlmaLinux OS and Rocky Linux.

Often, the only real difference between these distributions is the replacement of Red Hat trademarks and branding with that of the particular Linux project. In many cases, these distributions are byte-for-byte identical at the software package and network levels. These present a challenge to remote fingerprinting as a result.

To overcome these challenges, we collect and analyze enormous amounts of data. Our first pass at trying to differentiate the RHEL derivatives used a combination of two attributes, such as SSH version negotiation strings and the TCP Receive Window size. Over time, we realized this wasn’t going to be sufficient and that we needed more and better data.

Analyzing data at scale is useful, but in situations like this it is vital to know exactly what combination of distribution and version leads to what results. For this effort we built hundreds of virtual machines running as many versions of the different distributions as we could. In some cases, these releases were over two decades old!

Verify target, one SYN only #

From each of these virtual machines we collected as much information as we could about how the TCP stack communicated. While it is true that fingerprinting an operating system via TCP stack quirks has been a thing for years, our challenge was to improve our detection while sending the absolute minimum amount of traffic and, importantly, to look for evidence that would persist through common configuration changes by the system administrators.

To explain our findings, we first need to define some terms:

TCP Receive Window: Maximum amount of data that a particular endpoint can receive and buffer. The sending host has to stop after sending the maximum amount of data and wait for ACK and window updates.
MTU: Maximum Transmission Unit, which is the largest packet that the network interface can accept.
MSS: Maximum Segment Size, which is the maximum amount of TCP data that can fit into a single packet, calculated as the MTU minus the protocol headers.
TCP Window Scale: An optional factor by which the TCP Receive Window is scaled; this allows receive windows to exceed the maximum of 65535 bytes that can be specified in the TCP Receive Window field.

Of the TCP attributes that we observed, the one that provided the murkiest fingerprinting results was the TCP Window Scale. The values for it, when present, range from 0 to 14. With this information, we can usually determine if the target is running a general family of operating systems.

FIGURE 1 - TCP Window Scale by operating system.

Combining the TCP Receive Window and MSS offered the next significant improvement. In our past work, leveraging the Receive Window size sometimes yielded values that seemed to change unexpectedly. The reason why became clear when we looked at the data from the lab.

The key points were:

Changes to the link-layer MTU impacts the value of MSS, since MSS is calculated as the MTU minus the size of certain TCP/IP headers.
MSS is different between IPv6 and IPv4 due to the IPv6 IP headers being 20 bytes larger.
For Linux-based systems, Receive Windows less than the maximum value were almost always an even multiple of MSS. Due to the MSS difference mentioned above this means that the Receive Windows would vary as well.
Critically, the MSS multiplier for Linux-based OSs correlated with the Linux kernel version.

With the information above in hand, we can organize Linux systems into specific kernel version buckets based on the observed multiplier. That is quite a bit of information from the response to a single SYN packet!

FIGURE 2 - Relationship between IPv4/IPv6 MSS Multiplier and Linux Kernel version.

The kernel version also offers a hint as to the relative age of the system. A MSS multiplier of 4 indicates that the machine is likely running an ancient version of Linux, far beyond EOL, and certainly not something that should still be in production.

A little from column A, a little from column B #

TCP-based fingerprinting by itself doesn’t improve fingerprinting of RHEL derivatives as much as we’d like. Since most of the systems in our analysis had SSH running, we looked for patterns in RHEL-derivative type and version in the light of SSH version negotiation advertisements (for example, SSH-2.0-OpenSSH_8.7) combined with the Linux kernel version. This strategy quickly yielded results. We found that we could generally identify the distribution’s major version, and in some cases, minor version range as well.

The screenshots below demonstrate how specific patterns pop out under bulk analysis.

FIGURE 3 - Relationship between different Enterprise Linux distribution versions and various network attributes.

As we can see in this screenshot, by combining SSH version advertisement and various measured TCP attributes, it is possible to narrow the Linux distribution involved, sometimes down to individual point releases. Even when it is not possible to precisely determine the version, it is almost always possible to determine if the distribution in question is derived from RHEL.

FIGURE 4 - runZero detecting operating systems derived from Red Hat Enterprise Linux.

While determining which RHEL-based distribution an asset is running from just SSH remains unsolved, the work involved resulted in greatly improving the ability to assert the OS family, major version, and sometimes minor versions of the OS. This provides customers insight into the state of their asset fleet as well as the age, support, and end of life status of these assets. The same techniques also allow us to fingerprint other operating systems, such as OpenBSD, down to the specific release version.

Final thought #

Precise fingerprinting is the foundation for delivering actionable asset discovery, exposure management, and attack surface management data to any type of organization. The runZero Research Team’s process behind precise fingerprinting enables security and IT teams to better understand where and when to take action against potential threats in their environments.

Want to learn more about runZero’s unique research on the state of asset security? Check out the runZero Research Report for a deeper look into the drivers behind cyber asset attack surface management.

Not a runZero customer yet? Start a free trial and gain complete asset inventory and attack surface visibility in minutes.