+++
title = "Re-implementing a protocol in Rust"
summary = "Step 3: Creating a library based on reverse engineering"
date = "2024-08-01"
tags = ["Library", "Attendance Reader", "TCP", "Rust"]
categories = ["Projects"]
series = ["Attendance Reader"]
series_order = 3
+++
In the previous article, we managed to understand the meaning of the packets exchanged between the official client and the attendance reader.
There is only one thing left to do: Rewrite the API in Rust!
Recreating the Official API
To start, let's install Rust and create a new project with Cargo, Rust's package manager, by running the following command:

```sh
cargo new r701
```
We can then open the project with our text editor of choice.
Since we need to create a library, let's create the file `src/lib.rs` and start writing the struct that will describe our reader:
```rust
// src/lib.rs
use std::io::Result;
use std::net::{TcpStream, ToSocketAddrs};

#[derive(Debug)]
pub struct R701 {
    tcp_stream: TcpStream,
    sequence_number: u16,
}

impl R701 {
    pub fn connect(connection_info: impl ToSocketAddrs) -> Result<Self> {
        // Create a new R701 struct
        let mut new = Self {
            tcp_stream: TcpStream::connect(connection_info)?,
            sequence_number: 0,
        };

        // Try to ping the endpoint (`ping()` is defined further below)
        new.ping()?;

        Ok(new)
    }
}
```
Our struct contains two fields:

- `tcp_stream`, which holds the TCP connection to our reader;
- `sequence_number`, which counts the packets sent so far and is therefore the sequence number of the next packet.
To test whether our struct connects correctly, we can modify the file `src/main.rs` so that it connects to our endpoint:
```rust
// src/main.rs
use r701::R701;

fn main() {
    let r701 = R701::connect("127.0.0.1:5005").unwrap();
    println!("{:?}", r701);
}
```
If we now run `cargo run`...
Hurray! Our client successfully connects to the TCP server!
The next step is to use `std::net::TcpStream` to send the requests we derived from our reverse-engineering attempt, and to obtain and process the responses.
Since all requests share the same structure, we can create a method that takes the payload of a request as input (represented by a reference to an array of 12 `u8`) and returns a `Vec<u8>` containing the response:
```rust
// src/lib.rs
use std::io::{BufRead, BufReader, Write};
// ...

impl R701 {
    // ...

    pub fn request(&mut self, payload: &[u8; 12]) -> Result<Vec<u8>> {
        // Create a blank request starting with the `55 aa` header
        let mut request = [0x55, 0xaa, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0];

        // Insert the payload
        request[2..14].copy_from_slice(payload);

        // Insert the sequence number (little-endian), then advance it,
        // wrapping around instead of panicking on overflow
        request[14..].copy_from_slice(&self.sequence_number.to_le_bytes());
        self.sequence_number = self.sequence_number.wrapping_add(1);

        // Send the request
        self.tcp_stream.write_all(&request)?;

        // Return whatever the reader sent back in a single read
        let mut buffer = BufReader::new(&self.tcp_stream);
        Ok(buffer.fill_buf()?.to_vec())
    }
}
```
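The packet layout that `request()` builds can also be pulled out into a pure function. That layout is two header bytes `55 aa`, the 12-byte payload, then the sequence number in little-endian order. A pure function makes the framing testable without a reader on the network. This is a minimal sketch: `build_request` and `REQUEST_HEADER` are hypothetical names, not part of the r701 crate.

```rust
/// Marker bytes that open every request, as in `request()` above.
const REQUEST_HEADER: [u8; 2] = [0x55, 0xaa];

/// Builds a 16-byte request frame: 2-byte header, 12-byte payload,
/// then the sequence number in little-endian byte order.
fn build_request(payload: &[u8; 12], sequence_number: u16) -> [u8; 16] {
    let mut request = [0u8; 16];
    request[..2].copy_from_slice(&REQUEST_HEADER);
    request[2..14].copy_from_slice(payload);
    request[14..].copy_from_slice(&sequence_number.to_le_bytes());
    request
}
```

For example, a sequence number of `0x0102` ends up in the last two bytes as `02 01`.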
We can verify that everything works correctly by sending a ping packet and expecting the correct response:
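```rust
// src/main.rs
use r701::R701;

fn main() {
    // `request()` takes `&mut self`, so the binding must be mutable
    let mut r701 = R701::connect("127.0.0.1:5005").unwrap();

    // Send a ping and compare the reply with the expected bytes
    assert_eq!(
        r701.request(&[0x01, 0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]).unwrap(),
        [0xaa, 0x55, 0x01, 0x01, 0, 0, 0, 0, 0, 0],
    );
}
```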
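The assertion above needs something listening on `127.0.0.1:5005`. When the real reader is not at hand, a stand-in server built only on `std::net` can answer pings. This is a sketch that assumes the ping exchange shown above is all we need to emulate. `spawn_fake_reader`, `ping_once` and `PING_REPLY` are hypothetical helpers, and the fake server answers nothing but ping frames.

```rust
use std::io::{Read, Result, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread;

/// Reply to a ping, copied from the expected bytes in the assertion above.
const PING_REPLY: [u8; 10] = [0xaa, 0x55, 0x01, 0x01, 0, 0, 0, 0, 0, 0];

/// Starts a fake "reader" on an OS-assigned local port. It accepts a single
/// connection and answers every 16-byte ping frame (`55 aa 01 80 ...`)
/// with `PING_REPLY`, stopping at the first other frame or on disconnect.
fn spawn_fake_reader() -> Result<SocketAddr> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    thread::spawn(move || {
        if let Ok((mut stream, _)) = listener.accept() {
            let mut frame = [0u8; 16];
            while stream.read_exact(&mut frame).is_ok() {
                let is_ping = frame[..4] == [0x55, 0xaa, 0x01, 0x80];
                if !is_ping || stream.write_all(&PING_REPLY).is_err() {
                    break;
                }
            }
        }
    });
    Ok(addr)
}

/// Connects, sends one ping frame with the given sequence number,
/// and reads back exactly as many bytes as a ping reply has.
fn ping_once(addr: SocketAddr, sequence_number: u16) -> Result<Vec<u8>> {
    let mut stream = TcpStream::connect(addr)?;
    let mut frame = [0u8; 16];
    frame[..4].copy_from_slice(&[0x55, 0xaa, 0x01, 0x80]);
    frame[14..].copy_from_slice(&sequence_number.to_le_bytes());
    stream.write_all(&frame)?;
    let mut reply = vec![0u8; PING_REPLY.len()];
    stream.read_exact(&mut reply)?;
    Ok(reply)
}
```

Binding the listener to `127.0.0.1:5005` instead of port `0` should let the `main()` above run against it. Both `connect()` and the explicit request only ever send pings over a single connection.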
We could even make ping a method in our struct:
```rust
// src/lib.rs
use std::io::{Error, ErrorKind::InvalidData};
// ...

impl R701 {
    // ...

    pub fn ping(&mut self) -> Result<()> {
        // Create a request with a payload of `01 80 00 00 00 00 00 00 00 00 00 00`
        let response = self.request(&[0x01, 0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])?;

        // If the response is not `aa 55 01 01 00 00 00 00 00 00` then return an error
        if response != [0xaa, 0x55, 0x01, 0x01, 0, 0, 0, 0, 0, 0] {
            return Err(Error::new(InvalidData, "Malformed response"));
        }

        Ok(())
    }
}
```
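One caveat about `request()`: `fill_buf()` returns whatever a single read produced. A reply split across several TCP segments could therefore come back truncated, and `ping()` would report a malformed response. When the reply length is known in advance, as with the 10-byte ping reply, `Read::read_exact` keeps reading until the buffer is full. This is a minimal sketch that assumes the caller knows the expected length; `read_response` is a hypothetical helper.

```rust
use std::io::{self, Read};

/// Reads exactly `len` bytes from `reader`, however many `read` calls that
/// takes. Fails with `ErrorKind::UnexpectedEof` if the stream ends first.
fn read_response<R: Read>(reader: &mut R, len: usize) -> io::Result<Vec<u8>> {
    let mut response = vec![0u8; len];
    reader.read_exact(&mut response)?;
    Ok(response)
}
```

Since `TcpStream` implements `Read`, passing `&mut self.tcp_stream` as the reader would work as well.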
In this way we can also create methods to obtain the name of an employee, the total number of records, and a block of records.
If you are interested, all the source code is already available at nicolabelluti/r701.
{{< gitea server="https://git.nicolabelluti.me" repo="nicolabelluti/r701" >}}
Extracting Attendances via the TryInto Trait
Once we have created the method that allows us to extract a block of attendances, we need to find the idiomatic way to transform it from an array of bytes to a struct that represents a single attendance.
To start, let's do some refactoring by renaming `src/lib.rs` to `src/r701.rs` and creating a new `src/lib.rs` containing these lines:
```rust
// src/lib.rs
mod r701;

pub use r701::R701;
```
This way, the external interface of our library will not change, but we can organize our code into different files.
Let's add the file `src/record.rs` and include it in `src/lib.rs`:
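```rust
// src/lib.rs
mod r701;
mod record;

pub use r701::R701;
pub use record::{Record, Clock};
```

```rust
// src/record.rs
use chrono::{DateTime, Local, TimeZone};

// `Copy` allows `record.clock as u8` through a reference;
// `Debug` and `PartialEq` are required by `assert_eq!` in the tests
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Clock {
    FirstIn,
    FirstOut,
    SecondIn,
    SecondOut,
}

#[derive(Debug, PartialEq, Eq)]
pub struct Record {
    pub employee_id: u32,
    pub clock: Clock,
    pub datetime: DateTime<Local>,
}
```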
With this code, we have defined the structure of a record, which, as we mentioned in the previous article, consists of the employee ID, the date and time it was recorded, and the state (whether it is the first entry, the first exit, the second entry, or the second exit).
Since we don't want to go crazy managing time, let's add the `chrono` crate to handle dates:

```sh
cargo add chrono --no-default-features --features clock
```
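To make the conversion from a byte slice to our `Record` struct easy, we can implement the `TryFrom` trait. The standard library then provides the matching `TryInto` implementation automatically, so `.try_into()` works on byte slices too: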
```rust
// src/record.rs
impl TryFrom<&[u8]> for Record {
    type Error = &'static str;

    fn try_from(record_bytes: &[u8]) -> Result<Self, Self::Error> {
        // ...
    }
}
```
The finished code is available here.
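The body of `try_from` is elided above, so here is a self-contained sketch of the same pattern. It uses a hypothetical `RawRecord` type that only stores the 12 raw bytes; the real field layout is not reproduced. The sketch shows the length check a `try_from` can start with. It also shows that implementing `TryFrom<&[u8]>` is what makes `.try_into()` available on byte slices.

```rust
use std::convert::{TryFrom, TryInto};

/// Size of one attendance record in bytes.
const RECORD_LEN: usize = 12;

/// Hypothetical stand-in for `Record`: it keeps the raw bytes undecoded,
/// because only the record length is assumed here.
#[derive(Debug, PartialEq)]
struct RawRecord([u8; RECORD_LEN]);

impl TryFrom<&[u8]> for RawRecord {
    type Error = &'static str;

    fn try_from(bytes: &[u8]) -> Result<Self, Self::Error> {
        // Converting a slice into `[u8; 12]` fails unless it has exactly 12 bytes
        let array: [u8; RECORD_LEN] = bytes
            .try_into()
            .map_err(|_| "a record must be exactly 12 bytes long")?;
        Ok(RawRecord(array))
    }
}
```

With this impl in place, `let record: RawRecord = bytes.try_into()?;` works on any `&[u8]`, and an 11-byte slice is rejected with the error message.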
We can check that the conversion is correct with a simple unit test:
```rust
// src/record.rs
// ...

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn valid_record_conversion() {
        let record_bytes: &[u8] = &[0x10, 0x23, 0x0b, 0x1d, 0x01, 0, 0, 0, 0xb2, 0x17, 0x01, 0];

        assert_eq!(
            record_bytes.try_into(),
            Ok(Record {
                employee_id: 1,
                clock: Clock::FirstIn,
                datetime: Local.with_ymd_and_hms(1970, 1, 1, 0, 0, 0).single().unwrap(),
            })
        )
    }
}
```
Putting It All Together with Iterators
Once we have found a way to extract bytes from the device and a way to convert them into a struct, we need to find the idiomatic way to combine the two, and this is where iterators come into play.
To implement the `Iterator` trait, we only need to define the `next()` method. It returns the next element of the sequence, or `None` once there are no elements left.
Once this method is defined, we get access to many other tools, such as `map()`, `filter()`, and `fold()`. If we import the `itertools` crate, we also get `sorted()` and `into_group_map_by()`, just to name a few.
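To see how little is required, here is a toy iterator that implements only `next()` and still gets `filter()`, `map()`, `fold()` and `collect()` for free. `Countdown` is a hypothetical example unrelated to the reader.

```rust
/// A toy iterator that counts down from `remaining` to 1.
struct Countdown {
    remaining: u32,
}

impl Iterator for Countdown {
    type Item = u32;

    // The only required method: yield the next value,
    // or `None` once the sequence is exhausted
    fn next(&mut self) -> Option<Self::Item> {
        if self.remaining == 0 {
            return None;
        }
        let current = self.remaining;
        self.remaining -= 1;
        Some(current)
    }
}
```

For example, `Countdown { remaining: 6 }.filter(|n| n % 2 == 0)` yields 6, 4 and 2, without `filter()` being written anywhere.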
First, let's create a new struct, `RecordIterator`, with a `from()` constructor that generates an iterator from a mutable reference to an `R701` struct:
```rust
// src/lib.rs
mod r701;
mod record;
mod record_iterator;

pub use r701::R701;
pub use record::{Record, Clock};
pub use record_iterator::RecordIterator;
```
```rust
// src/record_iterator.rs
use crate::R701;
use std::io::Result;

#[derive(Debug)]
pub struct RecordIterator<'a> {
    r701: &'a mut R701,
    input_buffer: Vec<u8>,
    sequence_number: u16,
    total_records: u16,
    record_count: u16,
}

impl<'a> RecordIterator<'a> {
    pub fn from(r701: &'a mut R701) -> Result<Self> {
        // ...
    }
}
```
The `from()` method asks the reader for the total number of timestamps and for the first block of attendances. It stores them in the `total_records` field and the `input_buffer` vector, respectively.
The `next()` method of the `Iterator` trait then takes the first 12 bytes from `input_buffer` and turns them into a `Record` struct through the `TryFrom` implementation we wrote in the previous section.
When `input_buffer` is empty, the iterator requests another block of attendances from the reader, and so on until all records have been read.
If you are interested, all the code is already available on Git.
```rust
// src/record_iterator.rs
use crate::Record;
// ...

impl<'a> Iterator for RecordIterator<'a> {
    type Item = Record;

    fn next(&mut self) -> Option<Self::Item> {
        // ...
    }
}
```
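The bodies of `from()` and `next()` are elided here, so here is a reader-free sketch of the buffering idea described above. It is a stand-in built on assumptions, not the crate's code:

- `BlockIterator` is hypothetical and yields raw 12-byte arrays instead of `Record`s.
- A closure returning `Option<Vec<u8>>` plays the role of the reader.
- `None` (or an empty block) marks the end, instead of the `total_records` counter.

```rust
/// Length of one record in bytes.
const RECORD_LEN: usize = 12;

/// Hypothetical, reader-free version of the `RecordIterator` idea: records are
/// served from `buffer`, and `fetch_block` is called whenever it runs dry.
struct BlockIterator<F> {
    fetch_block: F,
    buffer: Vec<u8>,
    position: usize,
}

impl<F> BlockIterator<F>
where
    F: FnMut() -> Option<Vec<u8>>,
{
    fn new(fetch_block: F) -> Self {
        Self { fetch_block, buffer: Vec::new(), position: 0 }
    }
}

impl<F> Iterator for BlockIterator<F>
where
    F: FnMut() -> Option<Vec<u8>>,
{
    type Item = [u8; RECORD_LEN];

    fn next(&mut self) -> Option<Self::Item> {
        // Refill while fewer than RECORD_LEN unread bytes remain
        while self.buffer.len() - self.position < RECORD_LEN {
            // `None` from the source ends the iteration
            let block = (self.fetch_block)()?;
            // Treat an empty block as the end too, to avoid looping forever
            if block.is_empty() {
                return None;
            }
            // Drop consumed bytes, keep any partial record, append the new block
            self.buffer.drain(..self.position);
            self.position = 0;
            self.buffer.extend_from_slice(&block);
        }
        let mut record = [0u8; RECORD_LEN];
        record.copy_from_slice(&self.buffer[self.position..self.position + RECORD_LEN]);
        self.position += RECORD_LEN;
        Some(record)
    }
}
```

Because leftover bytes are kept when refilling, this version also copes with a record that straddles two blocks.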
For completeness, we can add an `into_record_iter()` method to the `R701` struct to simplify the use of the iterator:
```rust
// src/r701.rs
use crate::RecordIterator;
// ...

impl R701 {
    // ...

    pub fn into_record_iter(&mut self) -> Result<RecordIterator<'_>> {
        RecordIterator::from(self)
    }
}
```
Making Everything Blazingly Fast
First, let's write a `main()` that prints the records in the same layout as the `AGLog_001.txt` file we saw in the first chapter of this series:
```rust
// src/main.rs
use r701::R701;

fn main() {
    let mut r701 = R701::connect("127.0.0.1:5005").unwrap();

    println!("No\tMchn\tEnNo\t\tName\t\tMode\tIOMd\tDateTime\t");

    r701.into_record_iter()
        .unwrap()
        .collect::<Vec<_>>()
        .iter()
        .enumerate()
        .for_each(|(id, record)| {
            let name = r701
                .get_name(record.employee_id)
                .unwrap()
                .unwrap_or(format!("user #{}", record.employee_id));

            println!(
                "{:0>6}\t{}\t{:0>9}\t{: <10}\t{}\t{}\t{}",
                id + 1,
                1,
                record.employee_id,
                name,
                35,
                record.clock as u8,
                record.datetime.format("%Y/%m/%d %H:%M:%S"),
            );
        });
}
```
With this `main()`, we can obtain all the records in just under a minute, which is half the time taken by the official closed-source client.
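We are slightly cheating, because our client cannot extract three fields. The first is the ID of the recording device (the `Mchn` column, hard-coded to `1` above). The second is the attendance recording method (the `Mode` column, hard-coded to `35`). The third is the seconds of the `DateTime` field. For now we can ignore them, as they are superfluous.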
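The format string in `main()` handles the column alignment:

- `{:0>6}` right-aligns a value in a 6-character field padded with zeros.
- `{: <10}` left-aligns the name in a 10-character field padded with spaces.

Pulled into a hypothetical `format_log_line` helper (not part of the original code), the formatting can be checked in isolation:

```rust
/// Formats one line in the tab-separated layout printed by `main()` above.
/// `machine` and `mode` are parameters because the client hard-codes them.
fn format_log_line(
    number: usize,
    machine: u8,
    employee_id: u32,
    name: &str,
    mode: u8,
    clock: u8,
    datetime: &str,
) -> String {
    format!(
        "{:0>6}\t{}\t{:0>9}\t{: <10}\t{}\t{}\t{}",
        number, machine, employee_id, name, mode, clock, datetime
    )
}
```

Names longer than ten characters are not truncated; they simply push the following columns to the right.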
Memoizing Employee Names
To speed things up even more, we could avoid asking the reader for the name of the employee for each record.
We can create a `HashMap` of names and, for each record, check whether the name is already present in it. If not, we ask the reader for the employee's name and then save it in the `HashMap`.
This way, we reduce the number of requests to the minimum required.
```rust
// src/main.rs
use r701::R701;
use std::collections::HashMap;

fn main() {
    let mut names = HashMap::new();
    let mut r701 = R701::connect("127.0.0.1:5005").unwrap();

    println!("No\tMchn\tEnNo\t\tName\t\tMode\tIOMd\tDateTime\t");

    r701.into_record_iter()
        .unwrap()
        .collect::<Vec<_>>()
        .iter()
        .enumerate()
        .for_each(|(id, record)| {
            let name = names.entry(record.employee_id).or_insert_with(|| {
                r701.get_name(record.employee_id)
                    .unwrap()
                    .unwrap_or(format!("user #{}", record.employee_id))
            });

            // ...
        });
}
```
With this simple modification, we go from obtaining all records in a minute to obtaining them in one second. Now that is blazingly fast!
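The memoization pattern can be checked without a reader by counting how often the expensive lookup runs. In this sketch, `cached_name` is a hypothetical helper, and the `fetch` closure stands in for `r701.get_name()`. Only ids that are missing from the `HashMap` trigger a call.

```rust
use std::collections::HashMap;

/// Returns the cached name for `id`, calling `fetch` only when `id` is not
/// in `cache` yet (one round-trip to the reader in the real program).
fn cached_name<'c>(
    cache: &'c mut HashMap<u32, String>,
    id: u32,
    fetch: impl FnOnce(u32) -> String,
) -> &'c String {
    cache.entry(id).or_insert_with(|| fetch(id))
}
```

With the ids `1, 2, 1, 1, 2, 3`, `fetch` runs three times, once per distinct employee.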
Limiting Attendance Reading to a Certain Time Frame
Since I am interested in the data from the last month, we can use the `take_while()` and `skip_while()` methods to exclude all elements prior to last month and to stop the iterator once all relevant records have been extracted:
```rust
// src/main.rs
use r701::R701;
use std::collections::HashMap;
use chrono::{Local, TimeZone};

fn main() {
    let start = Local.with_ymd_and_hms(2024, 7, 1, 0, 0, 0).unwrap();
    let end = Local.with_ymd_and_hms(2024, 8, 1, 0, 0, 0).unwrap();

    let mut names = HashMap::new();
    let mut r701 = R701::connect("127.0.0.1:5005").unwrap();

    println!("No\tMchn\tEnNo\t\tName\t\tMode\tIOMd\tDateTime\t");

    r701.into_record_iter()
        .unwrap()
        .take_while(|record| record.datetime < end)
        .skip_while(|record| record.datetime < start)
        .collect::<Vec<_>>()
        .iter()
        .enumerate()
        .for_each(|(id, record)| {
            // ...
        });
}
```
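Why the window alone does not help can be made visible with plain integers standing in for `DateTime<Local>` timestamps, an assumption of this sketch. The hypothetical `in_window` applies the same `take_while()`/`skip_while()` pair to an oldest-first sequence. It uses `inspect()` to count how many items the chain pulls from the source.

```rust
/// Keeps the items of an oldest-first sequence that fall in `[start, end)`,
/// returning them along with how many items had to be pulled from the source.
fn in_window(oldest_first: &[u32], start: u32, end: u32) -> (Vec<u32>, usize) {
    let mut pulled = 0usize;
    let window: Vec<u32> = oldest_first
        .iter()
        .copied()
        // Count every item the chain asks the source for
        .inspect(|_| pulled += 1)
        // Stop as soon as an item reaches `end`
        .take_while(|&t| t < end)
        // Drop the leading items that precede `start`
        .skip_while(|&t| t < start)
        .collect();
    (window, pulled)
}
```

On the timestamps `1..=100` with the window `[90, 95)`, it returns `90` to `94` but pulls 95 items. Every record before the window still has to be read.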
This modification does not improve performance in any way, but there is one last very simple improvement we can apply for this specific use case...
Reading Records in Reverse
Instead of starting from the first record ever registered and excluding all records until we reach the first of the month we're interested in, we could read the records in reverse, starting from the most recent one and going back to the oldest.
This improvement requires a few modifications, but it is worth it considering that it reduces the time from just under a second to 0.2 seconds!
```rust
// src/main.rs
// ...

fn main() {
    // ...

    println!("No\tMchn\tEnNo\t\tName\t\tMode\tIOMd\tDateTime\t");

    r701.into_record_iter()
        .unwrap()
        .take_while(|record| record.datetime >= start)
        .skip_while(|record| record.datetime >= end)
        .collect::<Vec<_>>()
        .iter()
        .rev()
        .enumerate()
        .for_each(|(id, record)| {
            // ...
        });
}
```
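Using plain integers as stand-ins for timestamps (an assumption of this sketch) shows where the saving comes from. The hypothetical `in_window_newest_first` walks a newest-first sequence with the reversed predicates used in `main()` above. It counts the items pulled via `inspect()`, then reverses the result back into chronological order.

```rust
/// Keeps the items of a newest-first sequence that fall in `[start, end)`,
/// returned oldest-first, along with how many items had to be pulled.
fn in_window_newest_first(newest_first: &[u32], start: u32, end: u32) -> (Vec<u32>, usize) {
    let mut pulled = 0usize;
    let mut window: Vec<u32> = newest_first
        .iter()
        .copied()
        // Count every item the chain asks the source for
        .inspect(|_| pulled += 1)
        // Stop at the first item older than `start`
        .take_while(|&t| t >= start)
        // Skip the newest items, at or after `end`
        .skip_while(|&t| t >= end)
        .collect();
    // Restore chronological order, like `.rev()` in `main()`
    window.reverse();
    (window, pulled)
}
```

On a `1..=100` history with the window `[90, 95)`, it returns `90` to `94` after pulling only 12 items, because `take_while()` stops at the first timestamp older than `start`.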