+++
title = "Studying a communication protocol"
summary = "Step 2: Using a shark to sniff packets"
date = "2024-05-01"

tags = ["Reverse Engineering", "Attendance Reader", "TCP", "Sniffing", "Wireshark"]
categories = ["Projects"]
series = ["Attendance Reader"]
series_order = 2
+++

In the previous article, we started studying how the attendance reader client
works and even attempted to decompile its executable. In this article, I'd like
to explore the communication protocol that the client uses to talk to the
reader.

There are basically two reasons why I didn't immediately reverse-engineer the
protocol:

1. If I could decompile the executable code, I could create an alternative
   client much more easily;
2. Sometimes it's not possible (or at least not easy) to *sniff* a
   communication because of
   [TLS](https://en.wikipedia.org/wiki/Transport_Layer_Security).

However, decompiling DLLs is far from easy because:

> There's no magic "go back" button, there's a "generate shitty C code with
> random-ass variable names" button, but that's not a very good button
>
> **fasterthanlime** in the [How does the detour crate
> work?](https://www.youtube.com/watch?v=aLeMCUXFJwY&t=174s) video

If you're interested, check out [Ghidra](https://ghidra-sre.org/), the
decompiler developed by the NSA.

## Client configuration

In the last article, we only installed the client for Windows but never opened
it.

Since we need a client that can actually interact with the reader to intercept
the communication, I reopened my VM with [Windows 10
AME](https://archive.org/details/windows10-ame-21h1-2021-08-09/) and finished
configuring the client:

{{< carousel images="images/01-client-setup/*" aspectRatio="16-9"
interval="1000" >}}

Once the configuration is completed (and after manually modifying some
configuration files because the client still couldn't see the reader on the
network), we can request the reader's data over the network.

After opening the client **as an administrator**, pressing the button to
download data, and waiting **two minutes**, a total of 3543 attendances
appeared on the screen.

Something's odd: why does it take two minutes to transfer the equivalent of a
file weighing just under 200 kiB?

Doing some quick math:

{{< katex >}}
$$
\frac{3543\ \textrm{rows}}{120\ \textrm{s}} \cdot {\sim}460\ \frac{\textrm{bit}}{\textrm{row}} \approx 13.26\ \textrm{kib/s}
$$

13 kibps of useful throughput on a 100 Mbps connection? ***This sucks!***

I don't want to know what disaster of Italian corporate coding could have
caused this, but I have a feeling I'm about to find out...

## *The quieter you become...*

To analyze the network, I will use [Wireshark](https://wireshark.org), a very
popular tool for this kind of task.

After installing it and adding our user to the `wireshark` group, we can run it
and begin to *sniff* all packets on our network interface.

![Wireshark in operation](images/02-wireshark-working.png "Here's Wireshark
listening to all the packets circulating on my network.")

If this is your first time using a tool like this, you might notice that even
in a small Local Area Network there are a lot of packets flying around — too
many to analyze individually.

This is where filters come to the rescue. If we type the following string into
the filter bar:

```
ip.addr == <Device's IP>
```

We will see only packets that come *from* or are directed *to* the specified IP
address. We can also filter traffic that passes through a specific TCP port
with:

```
ip.addr == <IP> && tcp.port == <Port>
```

Wireshark filters are a vast topic; here's a [link to the official
documentation](https://wiki.wireshark.org/DisplayFilters) for those interested.

Once we start recording with the correct filters, we can start another full
scan of attendances on the official client, and we should see the packet
exchange between the client and the device in real-time.

![Wireshark with the IP filter](images/03-wireshark-with-filter.png "The
packets exchanged between the client and the device.")

At the end of the process, we've recorded an astonishing 14,423 packets,
carrying 3,543 attendances. *Things just get stranger...*

By taking a quick look at the traffic, we can deduce a few things:

1. The transport layer uses the TCP protocol on port `5005`;
2. [TLS](https://en.wikipedia.org/wiki/Transport_Layer_Security) is not used,
   *phew*;
3. There are at least three phases:
   * An initial setup phase;
   * A second phase in which data is exchanged with a few but large packets;
   * A third phase with many but small packets, where you can occasionally
     observe employee names in ASCII.

!["test" user in the ASCII box](images/04-test-name.png "A familiar name
appears in the ASCII box at the bottom right.")

To study the protocol in more depth, we only need the payloads of the TCP
packets, and Wireshark can extract them for us.

If we right-click a packet from the TCP communication we're interested in and
select `Follow` > `TCP Stream`, Wireshark will automatically reassemble the
payloads of all the packets and show only the application-layer (layer 7) data.

If we view the data as `Raw`, Wireshark will display the exchanged data in
hexadecimal format, with messages sent by the client in red and responses from
the attendance reader in blue.

Now we can copy the payloads into our preferred text editor and start to study
the protocol.

![The TCP stream shown by Wireshark](images/05-wireshark-tcp-stream.png "This
is what the message exchange looks like when we open the TCP packets.")

## Fuck around and find out

Now we just need to understand the communication protocol, which,
unfortunately, is binary rather than text-based (like ASCII or UTF-8).

It may seem complex, but it only took me an afternoon to find a comprehensive
enough solution for what I need to do.

### Requests

Client requests are all 16 bytes long and have this structure:

```regex
^55aa([0-9a-f]{24})([0-9a-f]{4})$
```

* The first two bytes are always `55 aa` (`01010101 10101010` in binary);
* The next 12 bytes specify the client command. I will call them "payload" from
  now on;
* Finally, there are two **little-endian** bytes indicating the packet number,
  starting from `00 00`.

I noticed that the server doesn't check whether the last two bytes are sent
sequentially, so they can remain at `00 00` throughout the message exchange
(the examples below simply use `01 00` for every request).
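To make the layout concrete, here's a minimal Rust sketch of how a request frame could be assembled. The `build_request` name is my own invention, and the sequence number is simply written as a little-endian `u16`, as described above.

```rust
/// Two-byte prefix that starts every client request.
const REQUEST_MAGIC: [u8; 2] = [0x55, 0xaa];

/// Builds a 16-byte request frame (hypothetical helper):
/// `55 aa`, then the 12-byte payload, then the sequence number in little-endian.
fn build_request(payload: [u8; 12], seq: u16) -> [u8; 16] {
    let mut frame = [0u8; 16];
    frame[..2].copy_from_slice(&REQUEST_MAGIC);
    frame[2..14].copy_from_slice(&payload);
    frame[14..].copy_from_slice(&seq.to_le_bytes());
    frame
}
```

For example, `build_request([0x01, 0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 1)` produces exactly the request used in the Ping section below.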

### Responses

Server responses don't have a fixed length and are divided into two parts,
which I will call "header" and "payload." The header is always present and,
together with the `aa 55` prefix, is 10 bytes long, while the payload can be
absent.

When there's no payload, the message acts as a kind of `null`/`ACK`.

```regex
^aa55([0-9a-f]{16})(?:55aa([0-9a-f]+))?$
```

* The first two bytes are always `aa 55` (`10101010 01010101` in binary);
* The following eight bytes are the header. Usually, they are `01 01 00 00 00
  00 00 00`, but they can change;
* If a payload is present, the message continues with `55 aa` (`01010101
  10101010` in binary);
* The remaining bytes are the payload.
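The parsing side can be sketched the same way. The `Response` type and the `parse_response` function are hypothetical; this sketch only checks the two magic sequences and doesn't try to interpret the header bytes.

```rust
/// A parsed server response (hypothetical type).
#[derive(Debug, PartialEq)]
struct Response {
    /// The eight header bytes that follow the `aa 55` prefix.
    header: [u8; 8],
    /// Everything after the `55 aa` separator, if present.
    payload: Option<Vec<u8>>,
}

/// Splits a raw response into header and optional payload.
/// Returns `None` if the bytes don't match the layout described above.
fn parse_response(raw: &[u8]) -> Option<Response> {
    if raw.len() < 10 || !raw.starts_with(&[0xaa, 0x55]) {
        return None;
    }
    let mut header = [0u8; 8];
    header.copy_from_slice(&raw[2..10]);
    let payload = match &raw[10..] {
        // Nothing after the header: a plain `null`/`ACK`.
        [] => None,
        // `55 aa` separator followed by the payload bytes.
        [0x55, 0xaa, body @ ..] => Some(body.to_vec()),
        // Anything else doesn't look like a valid response.
        _ => return None,
    };
    Some(Response { header, payload })
}
```

A payload-less `ACK` comes back with `payload: None`, which makes the "no payload" case easy to match on.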

---

### Ping

If we want to perform a "ping" and check if the server responds, we can send a
request with the payload set to `01 80 00 00 00 00 00 00 00 00 00 00`:

```
55aa0180000000000000000000000100
aa550101000000000000
```

The server will then respond with a packet without a payload and the header set
to `01 01 00 00 00 00 00 00`.
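Pinging is also a handy first connectivity test. Here's a minimal sketch using only Rust's standard library; it assumes the reader answers with exactly the 10-byte frame above, and the one-second timeout is an arbitrary choice of mine.

```rust
use std::io::{self, Read, Write};
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// The ping request shown above: payload `01 80 00 ...`, sequence number `01 00`.
const PING_REQUEST: [u8; 16] = [
    0x55, 0xaa, 0x01, 0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0x01, 0x00,
];

/// The payload-less reply (header `01 01 00 ...`) the reader sends back.
const PING_REPLY: [u8; 10] = [0xaa, 0x55, 0x01, 0x01, 0, 0, 0, 0, 0, 0];

/// Sends a ping and returns `Ok(true)` if the expected 10-byte reply arrives.
fn ping<A: ToSocketAddrs>(addr: A) -> io::Result<bool> {
    let mut stream = TcpStream::connect(addr)?;
    // Give up instead of hanging forever if the reader never answers.
    stream.set_read_timeout(Some(Duration::from_secs(1)))?;
    stream.write_all(&PING_REQUEST)?;
    let mut reply = [0u8; 10];
    stream.read_exact(&mut reply)?;
    Ok(reply == PING_REPLY)
}
```

Something like `ping("192.168.1.50:5005")` (the IP address is a placeholder) should return `Ok(true)` when the reader is reachable.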

### Employee name

Knowing the ID of an employee, it's possible to ask the server for their name
by sending a request with a payload set to `01 c7 xx xx xx xx 00 00 00 00 14
00`, where `xx xx xx xx` is a 32-bit **little-endian** integer representing the
employee ID.

```
55aa01c7xxxxxxxx0000000014000100
aa55010100000000000055aaxxxxxxxxxxxxxxxxxxxx4c0000000000595a7c7c0000
```

If the response header is set to `01 00 00 00 00 00 00 00`, the employee was not
found. If it is set to `01 01 00 00 00 00 00 00`, the first 10 bytes of the
payload contain the employee's name.

If the name is shorter than 10 characters, the remaining bytes are filled with
null characters (`\0`).

These messages make up almost the entirety of the third phase I described in
the previous section, the one with many small messages. This suggests that the
client quickly dumps the attendance data, then spends two whole minutes
downloading the employee's name **for each attendance**, even if it has already
requested it before. Someone should teach these developers the concept of
[memoization](https://en.wikipedia.org/wiki/Memoization)...
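Building the name request and decoding the reply could look like this. Both function names are hypothetical; the sketch treats any header whose second byte isn't `01` as "not found", and it assumes names are plain ASCII.

```rust
/// Builds the 12-byte payload that asks for an employee's name
/// (`01 c7`, the ID as a 32-bit little-endian integer, then `00 00 00 00 14 00`).
fn name_request_payload(employee_id: u32) -> [u8; 12] {
    let mut payload = [0u8; 12];
    payload[0] = 0x01;
    payload[1] = 0xc7;
    payload[2..6].copy_from_slice(&employee_id.to_le_bytes());
    payload[10] = 0x14;
    payload
}

/// Extracts the name from a reply. `header` holds the 8 bytes after `aa 55`,
/// `payload` the bytes after `55 aa`. Assumption: any header whose second byte
/// isn't `01` means "employee not found".
fn decode_name(header: &[u8; 8], payload: &[u8]) -> Option<String> {
    if header[1] != 0x01 || payload.len() < 10 {
        return None;
    }
    // The name takes the first 10 bytes and is padded with `\0`.
    let field = &payload[..10];
    let end = field.iter().position(|&b| b == 0).unwrap_or(field.len());
    // Assumption: names are plain ASCII, so a lossy UTF-8 conversion is enough.
    Some(String::from_utf8_lossy(&field[..end]).into_owned())
}
```

Caching the decoded names in a `HashMap<u32, String>` would be the obvious way to avoid asking for the same employee twice.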

### Total number of records

To ask for the total number of attendances registered on the device, you need
to send a request with a payload of `01 b4 08 00 00 00 00 00 ff ff 00 00`:

```
55aa01b4080000000000ffff00000100
aa550101xxxx00000000
```

Where `xx xx` is the number of saved attendances represented as a 16-bit
**little-endian** integer.

A maximum of 65,535 records seems a bit low, but I guess that's a future-me
problem.
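In code, the count request is just a constant payload, and the answer sits in the header itself. A minimal sketch, with names of my own choosing:

```rust
/// Payload that asks the reader how many records it has stored.
const COUNT_REQUEST_PAYLOAD: [u8; 12] = [
    0x01, 0xb4, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0x00, 0x00,
];

/// Reads the record count from the reply header (the 8 bytes after `aa 55`):
/// in `01 01 xx xx 00 00 00 00`, the `xx xx` pair is a little-endian u16.
fn record_count(header: &[u8; 8]) -> u16 {
    u16::from_le_bytes([header[2], header[3]])
}
```

For instance, the 3,543 attendances from earlier (`0x0dd7`) would arrive as `d7 0d`.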

### Downloading all records

The list of all attendances must be downloaded in blocks: we keep requesting
1024-byte blocks from the server (about 85.33 records each, so a record can
straddle two blocks) until the entire list has been extracted.

To do this, we first have to request the total number of attendances, then send
a request with a payload of `01 a4 00 00 00 00 xx xx 00 00 00 04`, where `xx
xx` is the total number of attendances in **little-endian**.

```
55aa01a400000000xxxx000000040100
aa55010100000000000055aa ...
```

The server will respond with a 1026-byte payload, containing the initial
records followed by two zero bytes.

We can request another 1026-byte block by sending a request with a payload of
`01 a4 00 00 00 00 00 00 xx xx 00 04`, where `xx xx` is a **little-endian**
integer starting from `01 00`:

```
55aa01a4000000000000010000040100
aa55010100000000000055aa ...
```

Once the records run out, the server pads the rest of the block with `ff` bytes
up to 1026 bytes.
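Putting the download steps together, here's a sketch of the helpers such a loop might need. It assumes that the first 1024 bytes of every 1026-byte block are record data (the text above only describes the two trailing zero bytes for the first block) and that records are 12 bytes long, as described in the next section. Since 1024 isn't a multiple of 12, the blocks are concatenated before being split; otherwise a record straddling two blocks would be lost. All names are hypothetical.

```rust
/// Useful data in each block (the reply payload is 1026 bytes).
const BLOCK_SIZE: usize = 1024;
/// Size of a single attendance record.
const RECORD_SIZE: usize = 12;

/// Builds the payload for block `index`.
fn block_request_payload(index: u16, total_records: u16) -> [u8; 12] {
    // Bytes 10-11 (`00 04`) might be the block size, 1024, in little-endian;
    // that's only a guess.
    let mut payload: [u8; 12] = [0x01, 0xa4, 0, 0, 0, 0, 0, 0, 0, 0, 0x00, 0x04];
    if index == 0 {
        // First request: total record count at bytes 6-7 (little-endian).
        payload[6..8].copy_from_slice(&total_records.to_le_bytes());
    } else {
        // Later requests: block index at bytes 8-9 (little-endian).
        payload[8..10].copy_from_slice(&index.to_le_bytes());
    }
    payload
}

/// How many blocks are needed to cover `total_records` records (rounded up).
fn block_count(total_records: u16) -> usize {
    let bytes = total_records as usize * RECORD_SIZE;
    (bytes + BLOCK_SIZE - 1) / BLOCK_SIZE
}

/// Joins the block payloads and splits them into 12-byte records.
fn split_records(blocks: &[Vec<u8>], total_records: u16) -> Vec<[u8; RECORD_SIZE]> {
    let data: Vec<u8> = blocks
        .iter()
        // Keep the first 1024 bytes of each block, dropping the trailing two.
        .flat_map(|b| b.iter().take(BLOCK_SIZE).copied())
        .collect();
    data.chunks_exact(RECORD_SIZE)
        // Ignore the `ff` padding after the last real record.
        .take(total_records as usize)
        .map(|chunk| {
            let mut record = [0u8; RECORD_SIZE];
            record.copy_from_slice(chunk);
            record
        })
        .collect()
}
```

With the 3,543 records from earlier, `block_count` returns 42, so 42 block requests would be needed.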

### Record structure

Once we have all the blocks, we can break them down into individual records,
each one 12 bytes long. I wasn't able to work out what every byte represents,
but the important ones are:

```regex
..([26ae]).{5}([0-9a-f]{8})([0-9a-f]{8})
```

* The two most significant bits of the second byte indicate whether the record
  is an entry or an exit:
  * If they're `00`, it's the first entry;
  * If they're `01`, it's the first exit;
  * If they're `10`, it's the second entry;
  * If they're `11`, it's the second exit;
* Bytes 5 to 8 represent the employee ID (in **little-endian**);
* The last four bytes represent the date and time of the attendance (in
  **little-endian**).
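The fields above can be extracted with a shift and a couple of `from_le_bytes` calls. The `Kind` and `Record` types are hypothetical, and apart from the entry/exit bits, the first four bytes are skipped since their meaning is still unknown.

```rust
/// Entry/exit kind, taken from the two most significant bits of byte 2.
#[derive(Debug, PartialEq)]
enum Kind {
    FirstEntry,
    FirstExit,
    SecondEntry,
    SecondExit,
}

/// The fields of a 12-byte record that we know how to read (hypothetical type).
#[derive(Debug, PartialEq)]
struct Record {
    kind: Kind,
    employee_id: u32,
    /// Packed date and time, not decoded yet.
    raw_datetime: u32,
}

fn parse_record(r: &[u8; 12]) -> Record {
    // Shift the second byte right by 6 to keep only its two top bits.
    let kind = match r[1] >> 6 {
        0b00 => Kind::FirstEntry,
        0b01 => Kind::FirstExit,
        0b10 => Kind::SecondEntry,
        _ => Kind::SecondExit,
    };
    Record {
        kind,
        // Bytes 5 to 8 (indices 4..8): employee ID, little-endian.
        employee_id: u32::from_le_bytes([r[4], r[5], r[6], r[7]]),
        // Bytes 9 to 12 (indices 8..12): date and time, little-endian.
        raw_datetime: u32::from_le_bytes([r[8], r[9], r[10], r[11]]),
    }
}
```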

Initially I thought the date was a UNIX timestamp, but when its bytes are shown
in big-endian order, it seems to have this format:

* The first 6 bits represent the minutes;
* The next 5 bits represent the hours;
* The next 5 bits represent the days;
* The next 4 bits represent the months;
* The last 12 bits represent the years.
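Decoding the packed date is a matter of masking and shifting. This sketch assumes that "first" means the least significant bits (minutes in bits 0-5, the year in bits 20-31) and that the year is stored as a full value such as 2024; both are guesses that still need checking against real records. The `DateTime` type is hypothetical.

```rust
/// Decoded timestamp (hypothetical type; seconds aren't in these 32 bits).
#[derive(Debug, PartialEq)]
struct DateTime {
    year: u16,
    month: u8,
    day: u8,
    hour: u8,
    minute: u8,
}

/// Unpacks the 32-bit value, already converted from little-endian.
/// ASSUMPTION: fields are packed starting from the least significant bit.
fn decode_datetime(v: u32) -> DateTime {
    DateTime {
        minute: (v & 0x3f) as u8,         // 6 bits
        hour: ((v >> 6) & 0x1f) as u8,    // 5 bits
        day: ((v >> 11) & 0x1f) as u8,    // 5 bits
        month: ((v >> 16) & 0x0f) as u8,  // 4 bits
        year: ((v >> 20) & 0xfff) as u16, // 12 bits
    }
}
```

If the decoded values look nonsensical on real data, reversing the field order (minutes in the top 6 bits) would be the next thing to try.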

---

I suspect that the first four bytes of each attendance may contain:

* The seconds;
* The recording method (whether the employee checked in with a PIN, a
  fingerprint, or a badge);
* The recorder ID.

But since these aren't very important fields, I've decided to ignore them for
now.

## Testing using the terminal

If you want to test the communication without writing a program that sends
bytes over a TCP socket, you can use common command-line tools like `netcat`
and `xxd`:

```shell
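# send_bytes <ip> <port> <hex>: converts the hex string to raw bytes, sends them,
# waits up to one second for a reply and prints it as a hex dump.

# If you're using Bash or Zsh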
function send_bytes { echo -n "$3" | xxd -r -p | timeout 1 nc "$1" "$2" | xxd; }

# If you're using Fish
function send_bytes -a ip port data
    echo -n "$data" | xxd -r -p | timeout 1 nc "$ip" "$port" | xxd
end

send_bytes 127.0.0.1 5005 55aa0180000000000000000000000100
```
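If you'd rather run these experiments from Rust, the first piece you need is something that does roughly what `xxd -r -p` does above: turning the hex strings from this article into bytes. Here's a small std-only sketch with hypothetical helper names; it also lets you reuse those hex strings as test vectors.

```rust
/// Decodes a hex string such as "55aa0180..." into bytes, ignoring whitespace.
/// Returns `None` on odd length or non-hex characters (like the `xx` placeholders).
fn hex_to_bytes(s: &str) -> Option<Vec<u8>> {
    let digits: Vec<u8> = s.bytes().filter(|b| !b.is_ascii_whitespace()).collect();
    if digits.len() % 2 != 0 {
        return None;
    }
    digits
        .chunks(2)
        .map(|pair| {
            let hi = (pair[0] as char).to_digit(16)?;
            let lo = (pair[1] as char).to_digit(16)?;
            Some((hi * 16 + lo) as u8)
        })
        .collect()
}

/// Formats bytes as a lowercase hex string, the same notation used in this article.
fn bytes_to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}
```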

I tried some of the requests from the examples above, and everything seems to
work correctly. In the next article, we'll see how to create a small Rust
library to extract data from the reader.