Added on: Sunday, 04 June, 2023 | Updated on: Monday, 21 October, 2024
Hello there! This will be the first of many blog posts detailing the work I will be doing for the Tor Project as part of Google Summer of Code 2023.
In each post, I will describe the work I did during that period, the challenges I faced, and the outcomes of that work.
The project I was selected for is titled “Arti API exploration to build example tools”.
Confused? Let’s break it down.
Arti is a Rust rewrite of Tor, the software that lets you make TCP connections through the Tor network, run Tor relays and bridges, and more. You can read more about it in its repository.
Arti was built with the goal that other developers should be able to use it to build Tor-powered software, and that thinking is reflected in its design.
So, it exports several crates that other developers can use in their Rust projects, along with documentation for them, including some basic demonstration code snippets that you can follow along with at home.
However, Arti is a fairly bleeding-edge project. It reached version 1.0.0 only recently, and because development moves at breakneck speed, its APIs are not set in stone. Another developer could run into a lot of breakage.
In this project, I will be creating sample programs in Rust using Arti's APIs.
My goal will be to build my sample programs and document any difficulties that come up.
In this way, the project can get valuable feedback from an outsider who doesn’t have much knowledge of the codebase and the way the Arti proxy does things.
A bit before the GSoC contributor applications opened up, I created a repo which housed my attempt at building something using arti-client and arti-hyper: a download manager which would download Tor Browser from the Tor Project website.
I originally started it just to play around with the existing APIs, but I actually found a bug on my first day.
I continued to hack on this project and even included it in my proposal, both as something to link to and as something to keep working on during the GSoC work period.
After some more work, it can now create six different circuits through the Tor network (so requests can go out through up to six different exit nodes) and download the Linux version of Tor Browser through Tor.
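The post doesn't say how the download manager divides the work among its circuits. One plausible approach, assumed here and not necessarily what the project does, is to give each circuit an HTTP Range request for its own slice of the file. The hypothetical helpers below (split_ranges, range_header) compute those slices using only the standard library; the Tor side of things is deliberately left out.

```rust
/// Hypothetical helper: splits a file of `total_len` bytes into `parts`
/// contiguous, inclusive byte ranges, e.g. one per Tor circuit.
fn split_ranges(total_len: u64, parts: u64) -> Vec<(u64, u64)> {
    // Every part must get at least one byte, otherwise the ranges underflow.
    assert!(parts > 0 && total_len >= parts, "need 0 < parts <= total_len");
    let chunk = total_len / parts;
    let mut ranges = Vec::with_capacity(parts as usize);
    for i in 0..parts {
        let start = i * chunk;
        // The last range absorbs the remainder so no bytes are dropped.
        let end = if i == parts - 1 {
            total_len - 1
        } else {
            start + chunk - 1
        };
        ranges.push((start, end));
    }
    ranges
}

/// Formats an inclusive range as the value of an HTTP `Range` header.
fn range_header(start: u64, end: u64) -> String {
    format!("bytes={}-{}", start, end)
}
```

For a 100-byte file split six ways, the first five circuits would each fetch 16 bytes and the last one would fetch the remaining 20.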
Now that I have been selected, it would probably be a good idea to use this repo to house all my work for this summer, not just the download manager.
To do this, I looked to the Arti repo for inspiration.
Arti’s main repo houses all the crates that Arti needs or exports.
The root directory of the repository has a crates/ folder which has all the different crates in it.
The Cargo.toml in the root directory is configured to create a cargo workspace, which tells cargo that there are multiple crates inside this one repo.
Once I added my projects inside the crates folder, I was able to declare which one I wanted to run using cargo run --bin <crate-name>.
At this point, the download manager was working well, and I wanted to enhance it further by adding pluggable transport support. However, I wasn't too familiar with the relevant APIs, so I pivoted to one of the smaller projects in my proposal: a connection checker tool, which simply tries to connect to a website through Tor using a normal Tor connection, a bridge, an obfs4 bridge, or a Snowflake bridge.
I chose this tool specifically to get a better sense of Arti's bridge APIs, and while developing it I did find some useful feedback for the Arti devs.
There wasn't really any example code for connecting through Snowflake. I brought this to the attention of the Arti devs and filed an issue, since this wasn't something I could have fixed myself. It is addressed in this MR.
The example code I was using did not actually work. This was because the process did not get an exclusive lock on the cache of directory information, which Arti can share between multiple instances. Normal Arti usage can share this cache, but with bridges this falls apart; it is a known bug.
Since there were some issues with getting bridges working, I shifted to something else.
When I reported this issue over IRC, nickm suggested using tor_error::Report to generate an error message instead of copy-pasting panic output or even the Display trait's output.
This was the first I had heard of it, and it took some delving into tor-error's docs to figure out how to use it.
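Essentially, tor-error provides an ErrorReport extension trait, implemented for standard error types, whose report() method wraps the error in a tor_error::Report. Displaying that Report produces a nice, easy-to-understand message built from the caught error and its underlying causes.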
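So, instead of writing

```rust
function_which_fails().unwrap();
```

and looking at a complicated panic, you can write

```rust
// ErrorReport is the extension trait that provides the .report() method
use tor_error::ErrorReport;

match function_which_fails() {
    Ok(_) => {
        // some code here
    }
    Err(e) => {
        // Print a readable message that includes the error's underlying causes
        eprintln!("{}", e.report());
    }
}
```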
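To make the idea behind report() concrete without depending on tor-error, here is a std-only sketch that formats an error together with its chain of source() causes, outermost first. It only illustrates the general technique; report_chain and the two toy error types are hypothetical, and tor-error's real formatting may differ in its details.

```rust
use std::error::Error;
use std::fmt;

/// Joins an error's message with the message of every error in its
/// `source()` chain, outermost first, separated by ": ".
fn report_chain(err: &(dyn Error + 'static)) -> String {
    let mut out = format!("error: {}", err);
    let mut current = err.source();
    while let Some(cause) = current {
        out.push_str(&format!(": {}", cause));
        current = cause.source();
    }
    out
}

// Toy inner error (hypothetical), standing in for a low-level failure.
#[derive(Debug)]
struct CacheLocked;

impl fmt::Display for CacheLocked {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "directory cache is locked by another process")
    }
}

impl Error for CacheLocked {}

// Toy outer error (hypothetical) that records the inner one as its source.
#[derive(Debug)]
struct BootstrapFailed {
    cause: CacheLocked,
}

impl fmt::Display for BootstrapFailed {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "failed to bootstrap")
    }
}

impl Error for BootstrapFailed {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.cause)
    }
}
```

A plain Display of BootstrapFailed would only say "failed to bootstrap"; walking the chain is what surfaces the actual cause.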
Had I known about this trait earlier, debugging my use of Arti's APIs would have been much easier. So, I created an MR to add a section on error reporting to the docs for arti-client.
With that done, I moved on to another project from my proposal.
Since the download manager had gotten enough work for a while and the connection checker was stalled, I decided to work on the DNS resolver, a sample program I chose specifically to show how non-HTTP(S), TCP-based protocols might use Arti to make their connections anonymous.
The DNS resolver will use DNS over TCP to make a query to a DNS server for a particular domain.
I researched the protocol and found that DNS over TCP was virtually identical to regular UDP-based DNS.
This teaching resource helped me understand the DNS header and payload generation, and even provided some dummy values I could use to validate my DNS request.
The first thing I did was write the structs according to the definitions given in the above teaching resource, which I cross-referenced with various other sources.
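As a rough idea of what such struct definitions can look like, here is the 12-byte header layout from RFC 1035 (section 4.1.1). The field and type names are hypothetical, not taken from the project's repo; the interesting part is packing the flag bits into the header's second 16-bit word.

```rust
/// The fixed 12-byte DNS header (RFC 1035, section 4.1.1).
#[derive(Debug, Default, Clone, Copy)]
struct Header {
    id: u16,      // chosen by the client, echoed back in the response
    flags: Flags, // packed into one u16 on the wire
    qdcount: u16, // number of questions
    ancount: u16, // number of answer records
    nscount: u16, // number of authority records
    arcount: u16, // number of additional records
}

/// The individual fields of the header's second 16-bit word.
#[derive(Debug, Default, Clone, Copy)]
struct Flags {
    qr: bool,   // false = query, true = response
    opcode: u8, // 4 bits; 0 = standard query
    aa: bool,   // authoritative answer
    tc: bool,   // message was truncated
    rd: bool,   // recursion desired
    ra: bool,   // recursion available
    rcode: u8,  // 4 bits; 0 = no error
}

impl Flags {
    /// Packs the flags MSB-first: QR(1) Opcode(4) AA(1) TC(1) RD(1) RA(1) Z(3) RCODE(4).
    /// The reserved Z bits (6..4) are always left as zero.
    fn to_u16(self) -> u16 {
        (self.qr as u16) << 15
            | ((self.opcode & 0x0F) as u16) << 11
            | (self.aa as u16) << 10
            | (self.tc as u16) << 9
            | (self.rd as u16) << 8
            | (self.ra as u16) << 7
            | (self.rcode & 0x0F) as u16
    }
}
```

A standard recursive query only sets RD, which packs to 0x0100; a typical resolver response with QR, RD, and RA set packs to 0x8180.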
After that, I looked for a crate which could directly serialize and deserialize the structs into Vec<u8>. However, after trying serde and bincode, I realized that these crates all used their own bespoke format, and I'd have to manually write the code to serialize and deserialize.
In order to do that, I defined a trait AsBytes with an as_bytes() method, to be implemented by both the Header and Query structs (which represent the DNS header and query message respectively).
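A minimal sketch of an AsBytes trait like the one described above, assuming a Header whose flags are already packed into a u16 and a Query holding a name, a record type, and a class. These struct layouts are hypothetical, not the project's actual code. Every multi-byte field goes out big-endian (network byte order), and the query name is encoded as length-prefixed labels ending in a zero byte.

```rust
/// Each DNS struct knows how to write itself in wire format.
trait AsBytes {
    fn as_bytes(&self) -> Vec<u8>;
}

struct Header {
    id: u16,
    flags: u16, // already-packed flag bits, e.g. 0x0100 for "recursion desired"
    qdcount: u16,
    ancount: u16,
    nscount: u16,
    arcount: u16,
}

struct Query {
    qname: String, // e.g. "example.com"
    qtype: u16,    // 1 = A record
    qclass: u16,   // 1 = IN (Internet)
}

impl AsBytes for Header {
    fn as_bytes(&self) -> Vec<u8> {
        // Six u16 fields, each written big-endian: 12 bytes in total.
        [self.id, self.flags, self.qdcount, self.ancount, self.nscount, self.arcount]
            .iter()
            .flat_map(|field| field.to_be_bytes())
            .collect()
    }
}

impl AsBytes for Query {
    fn as_bytes(&self) -> Vec<u8> {
        let mut out = Vec::new();
        // QNAME: each label is a length byte followed by its characters.
        for label in self.qname.split('.').filter(|l| !l.is_empty()) {
            assert!(label.len() <= 63, "DNS labels are at most 63 bytes");
            out.push(label.len() as u8);
            out.extend_from_slice(label.as_bytes());
        }
        out.push(0); // zero-length root label terminates the name
        out.extend_from_slice(&self.qtype.to_be_bytes());
        out.extend_from_slice(&self.qclass.to_be_bytes());
        out
    }
}
```

With ID 0xAAAA, flags 0x0100, one question, and an A/IN query for example.com, the header and query together come to 29 bytes.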
After this was done and I was able to verify that the method worked as intended, I ran into another roadblock: I was getting a response of 0 bytes every time.
While at first I sent these bytes directly over Tor, I later resorted to using tokio::net::UdpSocket or tokio::net::TcpStream in order to validate my crafted request. This was a good step, since Wireshark revealed that my packet was not, in fact, valid.
I first dropped down from Tor to TCP, but when even that didn't really work, I went down to UDP. It was here that Wireshark revealed that my packet was a mangled UDP packet.
Apparently, some values weren't set right, so after fixing that and verifying in Wireshark, I went up to TCP. Here is where the statement "DNS over TCP was virtually identical to regular UDP-based DNS" falls apart.
The only real differences I could see between my packet and what dig generates for DNS over TCP were the following:

- Two bytes being added before my DNS message starts: 0x00 0x33. I just added these in using .push().
- The entire TCP payload being 53 bytes. The dig command accomplished this by making an additional request, but adding support for this seemed to be a bit beyond the objective of this project, so I (at first) copied the bytes from Wireshark and appended them to the Vec I was sending over the network. When this worked, I experimented and found that if I simply sent a 53-byte packet, even if I padded it with 0x00, it would be recognized as a DNS packet.
I don't really know why this happens. Keep in mind that I was sending the exact same payload over UDP and TCP, yet the previous iteration (without these hacks) only worked over UDP, not TCP.
Even after doing this, I could see that my machine was getting a response back, but all my Rust program saw was zeroes. I later figured out that fixing the size of the buffer I used to store the response got the actual response through.
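The two extra bytes are most likely the length prefix that RFC 1035 (section 4.2.2) requires for DNS over TCP: every message is preceded by a two-byte, big-endian count of the message's length, not including the prefix itself. 0x0033 is 51, and 51 + 2 = 53, which matches the 53-byte payload dig sent. If that's right, hardcoding 0x00 0x33 told the server to expect exactly 51 bytes of message, which would explain why padding the packet out to 53 bytes made it parse, and why the unprefixed bytes worked over UDP, where no length field is used. A small sketch (the function name is hypothetical) that computes the prefix from the real message length instead of hardcoding it:

```rust
/// Prepends the two-byte, big-endian length field used for DNS over TCP
/// (RFC 1035, section 4.2.2). The length covers only the DNS message,
/// not the two prefix bytes themselves.
fn frame_for_tcp(message: &[u8]) -> Vec<u8> {
    assert!(message.len() <= u16::MAX as usize, "DNS message too long for TCP framing");
    let len = message.len() as u16;
    let mut framed = Vec::with_capacity(message.len() + 2);
    framed.extend_from_slice(&len.to_be_bytes());
    framed.extend_from_slice(message);
    framed
}
```

A 51-byte message gets the prefix 0x00 0x33 and a 53-byte total; a 29-byte message would instead get 0x00 0x1D, so no padding is needed.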
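The post doesn't say what the buffer looked like before the fix, so this is an assumption: a common way to get 0 bytes back is calling read() on an empty Vec, since a zero-length slice can't hold any data, and a buffer whose size doesn't match the reply can leave it looking like zeroes. Because a DNS-over-TCP response carries the same two-byte length prefix, one option is to read that first and size the buffer from it. The sketch uses std::io::Read; tokio's AsyncReadExt::read_exact follows the same pattern for async streams.

```rust
use std::io::{self, Read};

/// Reads one length-prefixed DNS-over-TCP response from `stream`.
/// `read_exact` either fills the whole buffer or returns an error,
/// so a short or truncated reply can't be silently mistaken for data.
fn read_tcp_response<R: Read>(stream: &mut R) -> io::Result<Vec<u8>> {
    // First two bytes: big-endian length of the DNS message that follows.
    let mut len_bytes = [0u8; 2];
    stream.read_exact(&mut len_bytes)?;
    let len = u16::from_be_bytes(len_bytes) as usize;

    // Allocate exactly the announced length before reading into it.
    let mut message = vec![0u8; len];
    stream.read_exact(&mut message)?;
    Ok(message)
}
```

Any bytes after the announced length are left unread in the stream, ready for the next response.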
This week was just the start, and I’ve been learning something new almost every day. Here’s to more progress in the coming weeks and more improvements to Arti!
This website was made using Markdown, Pandoc, and a custom program to automatically add headers and footers (including this one) to any document that’s published here.
Copyright © 2024 Saksham Mittal. All rights reserved. Unless otherwise stated, all content on this website is licensed under the CC BY-SA 4.0 International License