Added an article

2024-05-01 12:54:36 +02:00
parent 64a7887a77
commit af4dba4db5
10 changed files with 696 additions and 0 deletions
--- a/content/posts/2024/05/studying-a-communication-protocol/index.md
+++ b/content/posts/2024/05/studying-a-communication-protocol/index.md
@@ -0,0 +1,338 @@
+++
+title = "Studying a communication protocol"
+summary = "Step 2: Using a shark to sniff packets"
+date = "2024-05-01"
+
+tags = ["Reverse Engineering", "Attendance Reader", "TCP", "Sniffing", "Wireshark"]
+categories = ["Projects"]
+series = ["Attendance Reader"]
+series_order = 2
+++
+
+In the previous article, we started studying how the attendance reader client
+works and even attempted to decompile its executable. In this article, I'd like
+to explore the communication protocol that the client uses to talk to the
+reader.
+
+There are basically two reasons why I didn't immediately reverse-engineer the
+protocol:
+
+1. If I could decompile the executable code, I could create an alternative
+   client much more easily;
+2. Sometimes it's not possible (not easily, at least) to *sniff* a
+   communication because of
+   [TLS](https://en.wikipedia.org/wiki/Transport_Layer_Security).
+
+However, decompiling DLLs is far from easy because:
+
+> There's no magic "go back" button, there's a "generate shitty C code with
+> random-ass variable names" button, but that's not a very good button
+>
+> **fasterthanlime** in the [How does the detour crate
+> work?](https://www.youtube.com/watch?v=aLeMCUXFJwY&t=174s) video
+
+If you're interested, the NSA has developed its own decompiler called
+[Ghidra](https://ghidra-sre.org/); check it out.
+
+## Client configuration
+
+In the last article, we only installed the client for Windows but never opened
+it.
+
+Since we need a client that can actually interact with the reader to intercept
+the communication, I reopened my VM with [Windows 10
+AME](https://archive.org/details/windows10-ame-21h1-2021-08-09/) and finished
+configuring the client:
+
+{{< carousel images="images/01-client-setup/*" aspectRatio="16-9"
+interval="1000" >}}
+
+Once the configuration is completed (and after manually modifying some
+configuration files because the client still couldn't see the reader on the
+network), we can request the reader's data over the network.
+
+After opening the client **as an administrator**, pressing the button to
+download data, and waiting **two minutes**, a total of 3543 attendances
+appeared on the screen.
+
+Something's odd: why does it take two minutes to transfer the equivalent of a
+file weighing just under 200 KiB?
+
+Doing some quick math:
+
+{{< katex >}}
+$$
+\frac{3543\ \textrm{rows}}{120\ \textrm{s}} \cdot {\sim}460\ \textrm{bit/row}
+\approx 13.26\ \textrm{Kibit/s}
+$$
+
+13 Kibit/s of useful throughput on a 100 Mbit/s connection? ***This sucks!***
+
+I don't want to know what disaster of Italian corporate coding could have
+caused this, but I have a feeling I'm about to find out...
+
+## *The quieter you become...*
+
+To analyze the network, I will use [Wireshark](https://wireshark.org), a very
+popular tool for this type of operation.
+
+After installing it and adding our user to the `wireshark` group, we can run it
+and begin to *sniff* all packets on our network interface.
+
+![Wireshark in operation](images/02-wireshark-working.png "Here's Wireshark
+listening to all the packets circulating on my network.")
+
+If this is your first time using a tool like this, you might notice that even
+in a small Local Area Network there are a lot of packets flying around — too
+many to analyze individually.
+
+This is where filters come and save the day. If we type the following string
+into the filter bar:
+
+```
+ip.addr == <Device's IP>
+```
+
+We will see only packets that come *from* or are directed *to* the specified IP
+address. We can also filter traffic that passes through a specific TCP port
+with:
+
+```
+ip.addr == <IP> && tcp.port == <Port>
+```
+
+Filters in Wireshark are a vast topic; here's a [link to the official
+documentation](https://wiki.wireshark.org/DisplayFilters) for those interested.
+
+Once we start recording with the correct filters, we can start another full
+scan of attendances on the official client, and we should see the packet
+exchange between the client and the device in real-time.
+
+![Wireshark with the IP filter](images/03-wireshark-with-filter.png "The
+packets exchanged between the client and the device.")
+
+At the end of the process, we've recorded an astonishing 14,423 packets,
+carrying 3,543 attendances. *Things just get stranger...*
+
+By taking a quick look at the traffic, we can deduce a few things:
+
+1. The transport layer uses the TCP protocol on port `5005`;
+2. [TLS](https://en.wikipedia.org/wiki/Transport_Layer_Security) is not used,
+   *phew*;
+3. There are at least three phases:
+   * An initial setup phase;
+   * A second phase in which data is exchanged with few but large packets;
+   * A third phase with many small packets, where you can occasionally
+     observe employee names in ASCII.
+
+!["test" user in the ASCII box](images/04-test-name.png "A familiar name
+appears in the ASCII box at the bottom right.")
+
+To study the protocol in more depth, we only need the content of the TCP
+packets. This is where Wireshark comes in handy.
+
+If we select a packet from the TCP communication we're interested in and
+right-click, selecting `Follow` > `TCP Stream`, Wireshark will automatically
+extract the payload of all packets and show only the application-layer
+(layer 7) data.
+
+If we view the data as `Raw`, Wireshark will display the exchanged data in
+hexadecimal format, with messages sent by the client in red and responses from
+the attendance reader in blue.
+
+Now we can copy the payloads into our preferred text editor and start to study
+the protocol.
+
+![The TCP stream shown by Wireshark](images/05-wireshark-tcp-stream.png "This
+is what the message exchange looks like when we open the TCP packets.")
+
+## Fuck around and find out
+
+Now we just need to understand the communication protocol, which,
+unfortunately, isn't in a text-based format like ASCII or UTF-8.
+
+It may seem complex, but it only took me an afternoon to find a comprehensive
+enough solution for what I need to do.
+
+### Requests
+
+Client requests are all 16 bytes long and have this structure:
+
+```regex
+^55aa([0-9a-f]{24})([0-9a-f]{4})$
+```
+
+* The first two bytes are always `55 aa` (`01010101 10101010` in binary);
+* The next 12 bytes specify the client command. I will call them "payload" from
+  now on;
+* Finally, there are two **little-endian** bytes indicating the packet number,
+  starting from `00 00`.
+
+I noticed that the server doesn't check if the last two bytes are sent
+sequentially, so they can remain at `00 00` throughout the message exchange.
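+
+To make the layout concrete, here's a minimal Python sketch of how such a
+request could be assembled. The `build_request` helper and the `MAGIC_REQUEST`
+constant are names I made up for illustration; they don't come from the
+official client.
+
+```python
+import struct
+
+# Every client request starts with these two bytes.
+MAGIC_REQUEST = bytes.fromhex("55aa")
+
+
+def build_request(payload: bytes, packet_number: int = 0) -> bytes:
+    """Wrap a 12-byte command payload into a 16-byte request (hypothetical helper)."""
+    if len(payload) != 12:
+        raise ValueError("the command payload must be exactly 12 bytes long")
+    # "<H" packs the packet number as a 16-bit little-endian unsigned integer.
+    return MAGIC_REQUEST + payload + struct.pack("<H", packet_number)
+
+
+if __name__ == "__main__":
+    ping_payload = bytes.fromhex("018000000000000000000000")
+    print(build_request(ping_payload, 1).hex())
+```
+
+With the ping payload described later in this article and a packet number of
+1, this prints `55aa0180000000000000000000000100`. Since the server doesn't
+seem to check the counter, leaving `packet_number` at its default of 0 should
+work just as well.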
+
+### Responses
+
+Server responses do not have a fixed length and are divided into two parts,
+which I will call "header" and "payload." The header is always present and,
+counting the two leading marker bytes, is 10 bytes long, while the payload can
+be absent.
+
+When there's no payload, the message acts like a kind of `null`/`ACK`.
+
+```regex
+^aa55([0-9a-f]{16})(?:55aa([0-9a-f]+))?$
+```
+
+* The first two bytes are always `aa 55` (`10101010 01010101` in binary);
+* The following eight bytes are the header. Usually, they are `01 01 00 00 00
+  00 00 00`, but they can change;
+* If a payload is present, the message continues with `55 aa` (`01010101
+  10101010` in binary);
+* The remaining bytes are the payload.
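+
+The structure above can be sketched in Python like this. It's only a rough
+take based on what I observed: `parse_response` and the two constants are
+hypothetical names, and I'm assuming the whole message has already been read
+from the socket.
+
+```python
+from typing import Optional, Tuple
+
+MAGIC_RESPONSE = bytes.fromhex("aa55")  # opens every server response
+MAGIC_PAYLOAD = bytes.fromhex("55aa")   # separates the header from the payload
+
+
+def parse_response(message: bytes) -> Tuple[bytes, Optional[bytes]]:
+    """Split a response into (8-byte header, payload or None)."""
+    if len(message) < 10 or message[:2] != MAGIC_RESPONSE:
+        raise ValueError("the message does not start with a valid header")
+    header = message[2:10]
+    rest = message[10:]
+    if not rest:
+        # No payload: the message acts like a null/ACK.
+        return header, None
+    if rest[:2] != MAGIC_PAYLOAD:
+        raise ValueError("expected 55 aa before the payload")
+    return header, rest[2:]
+
+
+if __name__ == "__main__":
+    header, payload = parse_response(bytes.fromhex("aa550101000000000000"))
+    print(header.hex(), payload)
+```
+
+Run on the payload-less message `aa550101000000000000`, it prints
+`0101000000000000 None`.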
+
+---
+
+### Ping
+
+If we want to perform a "ping" and check if the server responds, we can send a
+request with the payload set to `01 80 00 00 00 00 00 00 00 00 00 00`:
+
+```
+55aa0180000000000000000000000100
+aa550101000000000000
+```
+
+The server will then respond with a packet without a payload and the header set
+to `01 01 00 00 00 00 00 00`.
+
+### Employee name
+
+Knowing the ID of an employee, it's possible to ask the server for their name
+by sending a request with a payload set to `01 c7 xx xx xx xx 00 00 00 00 14
+00`, where `xx xx xx xx` is a 32-bit **little-endian** integer representing the
+employee ID.
+
+```
+55aa01c7xxxxxxxx0000000014000100
+aa55010100000000000055aaxxxxxxxxxxxxxxxxxxxx4c0000000000595a7c7c0000
+```
+
+The first 10 bytes of the payload contain the employee's name; if it's shorter
+than 10 characters, the remaining space is filled with null bytes
+(`\0`).
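+
+Here's a small Python sketch of both halves of this exchange: building the
+command payload and decoding the name from the response payload. The function
+names are mine, and I'm assuming names are plain ASCII, since that's all I've
+seen in the captures.
+
+```python
+import struct
+
+
+def name_request_payload(employee_id: int) -> bytes:
+    """Build the 12-byte payload: 01 c7, the ID, then 00 00 00 00 14 00."""
+    # "<I" packs the employee ID as a 32-bit little-endian unsigned integer.
+    return (
+        bytes.fromhex("01c7")
+        + struct.pack("<I", employee_id)
+        + bytes.fromhex("000000001400")
+    )
+
+
+def decode_name(payload: bytes) -> str:
+    """Read the name from the first 10 bytes, stopping at the first null byte."""
+    raw_name = payload[:10].split(b"\x00", 1)[0]
+    # Assumption: names are ASCII; anything else is replaced, not rejected.
+    return raw_name.decode("ascii", errors="replace")
+
+
+if __name__ == "__main__":
+    print(name_request_payload(1).hex())
+    print(decode_name(b"test" + bytes(6) + bytes(12)))
+```
+
+For employee ID 1 the payload comes out as `01c701000000000000001400`, and a
+response payload starting with `test` followed by null bytes decodes to
+`test`.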
+
+These messages make up almost the entirety of the third phase I described
+earlier, the one with many but small messages. This suggests that the
+client quickly dumps the attendance data, then spends two whole minutes
+downloading the employee's name **for each attendance**, even if it's been
+requested before. Someone should teach these developers the concept of
+[memoization](https://en.wikipedia.org/wiki/Memoization)...
+
+### Total number of records
+
+To ask for the total number of attendances registered on the device, you need
+to send a request with a payload of `01 b4 08 00 00 00 00 00 ff ff 00 00`:
+
+```
+55aa01b4080000000000ffff00000100
+aa550101xxxx00000000
+```
+
+Where `xx xx` is the number of saved attendances represented as a 16-bit
+**little-endian** integer.
+
+A maximum of 65535 records seems a bit low, but I guess it's a future-me
+problem.
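+
+Going by the response shown above, the count sits in the third and fourth
+bytes of the 8-byte header (the part right after `aa 55`). A minimal Python
+sketch to extract it, where `record_count` is a name I picked for
+illustration:
+
+```python
+import struct
+
+
+def record_count(header: bytes) -> int:
+    """Extract the number of saved attendances from an 8-byte response header."""
+    if len(header) != 8:
+        raise ValueError("expected the 8 header bytes that follow aa 55")
+    # The count is a 16-bit little-endian integer at offset 2 of the header.
+    (count,) = struct.unpack_from("<H", header, 2)
+    return count
+
+
+if __name__ == "__main__":
+    print(record_count(bytes.fromhex("0101d70d00000000")))
+```
+
+The bytes `d7 0d` are `0x0dd7` in little-endian, so the example prints `3543`.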
+
+### Downloading all records
+
+The list of all attendances must be downloaded in blocks, continuing to request
+1024-byte blocks from the server (at 12 bytes per attendance, that's about
+85.3 attendances at a time) until the entire list is extracted.
+
+To do this, we first have to request the total number of attendances, then send
+a request with a payload of `01 a4 00 00 00 00 xx xx 00 00 00 04`, where `xx
+xx` is the total number of attendances in **little-endian**.
+
+```
+55aa01a400000000xxxx000000040100
+aa55010100000000000055aa ...
+```
+
+The server will respond with a 1026-byte payload, containing the initial
+records followed by two zero bytes.
+
+We can request another 1026-byte block by sending a request with a payload of
+`01 a4 00 00 00 00 00 00 xx xx 00 04`, where `xx xx` is a **little-endian**
+integer starting from `01 00`:
+
+```
+55aa01a4000000000000010000040100
+aa55010100000000000055aa ...
+```
+
+Once the records are finished, the server will start sending padding bytes set
+to `ff` to reach 1026 bytes.
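+
+The two payloads above can be generated with a couple of Python helpers. This
+is just a sketch: the function names are made up, and `blocks_needed` assumes
+that the 12-byte records are packed back to back across blocks, which is my
+reading of the traffic rather than something I've verified.
+
+```python
+import math
+import struct
+
+BLOCK_SIZE = 1024   # bytes of records per block, before the 2 trailing bytes
+RECORD_SIZE = 12    # bytes per attendance record
+
+
+def first_block_payload(total_records: int) -> bytes:
+    """Payload 01 a4 00 00 00 00 xx xx 00 00 00 04 (xx xx = total records)."""
+    return (
+        bytes.fromhex("01a400000000")
+        + struct.pack("<H", total_records)
+        + bytes.fromhex("00000004")
+    )
+
+
+def next_block_payload(block_index: int) -> bytes:
+    """Payload 01 a4 00 00 00 00 00 00 xx xx 00 04 (xx xx = index, from 1)."""
+    return (
+        bytes.fromhex("01a4000000000000")
+        + struct.pack("<H", block_index)
+        + bytes.fromhex("0004")
+    )
+
+
+def blocks_needed(total_records: int) -> int:
+    """Number of blocks to request, assuming records are contiguous."""
+    return math.ceil(total_records * RECORD_SIZE / BLOCK_SIZE)
+
+
+if __name__ == "__main__":
+    print(first_block_payload(3543).hex())
+    print(next_block_payload(1).hex())
+    print(blocks_needed(3543))
+```
+
+For the 3543 attendances on my reader, this gives `01a400000000d70d00000004`
+for the first request, `01a400000000000001000004` for the second one, and 42
+blocks in total.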
+
+### Record structure
+
+Once we have all the record blocks, we can break them down into
+individual records, each one 12 bytes long. I wasn't able to
+understand what all the bytes represent, but the important ones are:
+
+```regex
+..([26ae]).{5}([0-9a-f]{8})([0-9a-f]{8})
+```
+
+* The second byte's two most significant bits indicate if the registration
+  represents an entry or an exit:
+  * If it's `00`, it's the first entry;
+  * If it's `01`, it's the first exit;
+  * If it's `10`, it's the second entry;
+  * If it's `11`, it's the second exit;
+* The four bytes before the last four (bytes 5 to 8) represent the employee ID
+  (in **little-endian**);
+* The last four bytes represent the date and time of the attendance (in
+  **little-endian**).
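+
+The fields listed above can be pulled out of a 12-byte record with a few lines
+of Python. Take it as a sketch: `Record`, `parse_record` and the direction
+labels are names of my own, and the timestamp is left as a raw integer here.
+
+```python
+import struct
+from typing import NamedTuple
+
+# Indexed by the two most significant bits of the second byte.
+DIRECTIONS = ("first entry", "first exit", "second entry", "second exit")
+
+
+class Record(NamedTuple):
+    direction: str
+    employee_id: int
+    raw_timestamp: int
+
+
+def parse_record(record: bytes) -> Record:
+    """Decode the known fields of a single 12-byte attendance record."""
+    if len(record) != 12:
+        raise ValueError("an attendance record is exactly 12 bytes long")
+    # Shifting right by 6 keeps only the top two bits of the second byte.
+    direction = DIRECTIONS[record[1] >> 6]
+    # Two 32-bit little-endian integers start at offset 4: ID, then date/time.
+    employee_id, raw_timestamp = struct.unpack_from("<II", record, 4)
+    return Record(direction, employee_id, raw_timestamp)
+
+
+if __name__ == "__main__":
+    print(parse_record(bytes.fromhex("006000002a00000078563412")))
+```
+
+In the sample record the second byte is `60` (`01100000` in binary), so it's
+decoded as a first exit for employee 42, with `0x12345678` as the raw
+date and time value.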
+
+Initially I thought the date was represented as a UNIX timestamp, but it seems
+to have this format when shown as big-endian:
+
+* The first 6 bits represent the minutes;
+* The next 5 bits represent the hours;
+* The next 5 bits represent the days;
+* The next 4 bits represent the months;
+* The last 12 bits represent the years.
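+
+Here's how that layout could be decoded in Python. Be careful, because this
+sketch rests on two assumptions of mine: that the list above goes from the
+most significant bit to the least significant one, and that the year is stored
+as-is, without any offset. If either turns out to be wrong, the shifts below
+need to be adjusted. `decode_timestamp` is a hypothetical name.
+
+```python
+from datetime import datetime
+
+
+def decode_timestamp(raw: int) -> datetime:
+    """Unpack the 32-bit date/time value (assumed MSB-first field order)."""
+    minute = (raw >> 26) & 0x3F  # top 6 bits
+    hour = (raw >> 21) & 0x1F    # next 5 bits
+    day = (raw >> 16) & 0x1F     # next 5 bits
+    month = (raw >> 12) & 0x0F   # next 4 bits
+    year = raw & 0xFFF           # last 12 bits, assumed to be the full year
+    return datetime(year, month, day, hour, minute)
+
+
+if __name__ == "__main__":
+    # The four bytes arrive in little-endian order on the wire.
+    wire_bytes = ((54 << 26) | (12 << 21) | (1 << 16) | (5 << 12) | 2024).to_bytes(
+        4, "little"
+    )
+    print(decode_timestamp(int.from_bytes(wire_bytes, "little")))
+```
+
+The example builds a value by hand under the same assumptions, so it only
+shows that the bit arithmetic is self-consistent, not that the layout matches
+the device.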
+
+---
+
+I suspect that the first four bytes of each attendance may contain:
+
+* The seconds;
+* The recording method (if the employee checked in with the PIN, fingerprint,
+  or the badge);
+* The recorder ID.
+
+But since these aren't very important fields, I've decided to ignore them for
+now.
+
+## Testing using the terminal
+
+If you want to test communication without writing any program that sends bytes
+over a TCP socket, you can use some basic core utilities like `netcat` and
+`xxd`:
+
+```shell
+# If you're using Bash or Zsh
+function send_bytes { echo -n "$3" | xxd -r -p | timeout 1 nc "$1" "$2" | xxd; }
+
+# If you're using Fish
+function send_bytes -a ip port data
+    echo -n "$data" | xxd -r -p | timeout 1 nc "$ip" "$port" | xxd
+end
+
+send_bytes 127.0.0.1 5005 55aa0180000000000000000000000100
+```
+
+Trying some requests from the examples above, I can confirm everything seems to
+work correctly. In the next article, we'll see how to create a small Rust
+library to extract data from the reader.