Nonchalant Guidance

Added on: Sunday, 04 June, 2023 | Updated on: Monday, 21 October, 2024

GSoC 2023 Blog 1

Hello there! This will be the first of many blogposts detailing the work I will be doing for the Tor Project as part of Google Summer of Code 2023.

In each post I will detail the work I did over a given period, the challenges I faced, and the outcomes of that work.

Brief Intro on My Project

The project I was selected for is titled “Arti API exploration to build example tools”.

Confused? Let’s break it down.

Arti is the Rust rewrite of the Tor software, which allows you to make TCP connections through the Tor Network, run Tor relays and bridges, etc. You can read more about it in its repository.
Arti was built with the goal that other developers should be able to use it to build Tor-powered software, and that thinking has been incorporated into its design.

So, it exports some crates that other developers can use in their Rust projects, and has some documentation on them, including some basic demonstration code snippets that you can follow along with at home.

However, Arti is a fairly bleeding-edge project. It hit version 1.0.0 not too long ago, and due to the breakneck speed of development, APIs are not set in stone. Another developer could potentially run into a lot of breakage.
In this project, I will be creating sample programs in Rust using Arti’s APIs.

My goal will be to build my sample programs and document any difficulties that come up.

Maybe certain APIs are hard to use, or undocumented, or certain operations cause Arti to fail (exposing a bug). All these issues will be brought to the notice of the Arti team and fixes can be discussed and implemented.

In this way, the project can get valuable feedback from an outsider who doesn’t have much knowledge of the codebase and the way the Arti proxy does things.

Reorganizing work repository

A bit before the GSoC contributor applications opened up, I created a repo which housed my attempt at building something using arti-client and arti-hyper: a download manager which would download Tor Browser from the Tor Project website.

Now, I only opened this up to play around with the existing APIs, but I’d actually found a bug on my first day.

I continued to hack on this project and even included it in my proposal, both to link to and to continue to work on in the GSoC work period.

I have worked on the project for a bit and now it is able to create six different circuits through the Tor Network (so that means requests go through six different exit nodes) and download the Linux version of Tor Browser through Tor.

Now that I have been selected, it would probably be a good idea to use this repo to house all my work for this summer, not just the download manager.

To do this, I looked to the Arti repo for inspiration.

Arti’s main repo houses all the crates that Arti needs or exports. The root directory of the repository has a crates/ folder which has all the different crates in it.

The Cargo.toml in the root directory is configured to create a cargo workspace, which means that it tells cargo that there are multiple crates inside this one repo.

Once I added my projects inside the crates folder, I was able to choose which one to run using cargo run --bin <crate-name> (the binary name defaults to the crate name).

Making connections through bridges

At this point, the download manager was working well, and I wanted to enhance it further by adding pluggable transport support. However, I was not too familiar with the APIs, so I pivoted to one of the smaller projects in my proposal: a connection checker tool which just tries to connect to a website through Tor via a normal Tor connection, a plain bridge, an obfs4 bridge or a Snowflake bridge.

I chose to build this tool specifically to get a better feel for Arti’s bridge APIs, and while developing it I did come up with some useful feedback for the Arti devs.

There wasn’t really any example code showing how to connect to a Snowflake proxy. I brought this to the attention of the Arti devs and filed an issue, since this wasn’t something I could’ve fixed myself. It is addressed in this MR.
The example code I was using did not actually work. This was because the process could not get an exclusive lock on the cache for the directory info that Arti can share between multiple instances. Normal Arti usage can share this cache info, but when using bridges this falls apart, which is a known bug.

Since there were some issues with getting bridges working, I shifted to something else.

Making error reporting easier for new developers

When I reported this issue over IRC, nickm suggested using tor_error::Report to generate an error message instead of copy-pasting panic output or even the Display trait’s output.

This was the first I saw of this trait, and it took some delving into tor-error’s docs to figure out how to use it.

Essentially, tor_error::Report implements the report() method, which generates a nice, easy to understand error message from the error that has been caught.

So, instead of writing

function_which_fails().unwrap();

and looking at a complicated panic, you can write

match function_which_fails() {
  Ok(_) => {
    // some code here
  },
  Err(e) => {
    println!("{}", e.report());
  }
}

Had I known this trait existed for Arti’s APIs, debugging would’ve been much easier. So, I created an MR to add a section on Error Reporting to the docs for arti-client.
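To make the idea concrete without pulling in Arti, here is a std-only sketch of what a report-style helper does, as far as I understand it: it prints an error together with every underlying cause, rather than only the top-level message. The chain_report function and both error types are hypothetical stand-ins made up for this illustration; this is not tor-error's actual implementation.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical low-level error, standing in for something like an I/O failure.
#[derive(Debug)]
struct LowLevel;

impl fmt::Display for LowLevel {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "connection refused")
    }
}

impl Error for LowLevel {}

/// Hypothetical high-level error that keeps the low-level one as its source.
#[derive(Debug)]
struct HighLevel {
    source: LowLevel,
}

impl fmt::Display for HighLevel {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "could not bootstrap")
    }
}

impl Error for HighLevel {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

/// Join an error's message with the messages of all of its sources.
/// (Hypothetical helper; only an approximation of what a report shows.)
fn chain_report(err: &dyn Error) -> String {
    let mut out = format!("error: {}", err);
    let mut current = err.source();
    while let Some(cause) = current {
        out.push_str(&format!(": {}", cause));
        current = cause.source();
    }
    out
}

fn main() {
    let err = HighLevel { source: LowLevel };
    // Display alone would only show the top-level message.
    println!("{}", err);
    // The chained version also shows why it happened.
    println!("{}", chain_report(&err));
}
```

The point of the design is that the Display output of a wrapped error usually hides the root cause, which is exactly the part a bug report needs.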

Now that this was done, I worked on another project under the proposal.

Working on the DNS resolver

Since the download manager had gotten enough work for a while and the connection checker was stalled, I decided to work on the DNS resolver, which was a sample program I chose specifically to highlight how non-HTTP(S) TCP-based protocols might utilize Arti to make their connections anonymous.

The DNS resolver will use DNS over TCP to make a query to a DNS server for a particular domain.

I researched the protocol and found that DNS over TCP was virtually identical to regular UDP-based DNS.

This teaching resource helped me understand the DNS header and payload generation, and even provided some dummy values I could use to validate my DNS request.

The first thing I did was write the structs according to the definitions given in the above teaching resource, which I cross-referenced across various other sources.

After that, I looked for a crate which could directly serialize and deserialize the structs into Vec<u8>. However, after trying serde and bincode I realized that these crates encode data in their own bespoke formats rather than the DNS wire format, so I’d have to write the serialization and deserialization code manually.

In order to do that, I defined a trait AsBytes with an as_bytes() method, to be implemented by both the Header and Query structs (which represent the DNS header and query message respectively).
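A minimal sketch of what such a trait could look like is below. The AsBytes, Header and Query names come from the post, but the field names, field types and method bodies are assumptions based on the standard DNS message layout (RFC 1035, section 4.1), not the actual code from the project.

```rust
/// Anything that can be turned into DNS wire-format bytes.
trait AsBytes {
    fn as_bytes(&self) -> Vec<u8>;
}

/// The fixed 12-byte DNS header. Field names here are assumptions.
struct Header {
    id: u16,
    flags: u16,
    qdcount: u16,
    ancount: u16,
    nscount: u16,
    arcount: u16,
}

/// A single question: domain name, record type and class.
struct Query {
    name: String,
    qtype: u16,
    qclass: u16,
}

impl AsBytes for Header {
    fn as_bytes(&self) -> Vec<u8> {
        let fields = [
            self.id, self.flags, self.qdcount,
            self.ancount, self.nscount, self.arcount,
        ];
        let mut out = Vec::with_capacity(12);
        for field in fields.iter() {
            // Every field goes on the wire in network (big-endian) order.
            out.extend_from_slice(&field.to_be_bytes());
        }
        out
    }
}

impl AsBytes for Query {
    fn as_bytes(&self) -> Vec<u8> {
        let mut out = Vec::new();
        // Each label is prefixed with its length; a zero byte ends the name.
        // This sketch does not check the 63-byte limit on label length.
        for label in self.name.split('.').filter(|l| !l.is_empty()) {
            out.push(label.len() as u8);
            out.extend_from_slice(label.as_bytes());
        }
        out.push(0);
        out.extend_from_slice(&self.qtype.to_be_bytes());
        out.extend_from_slice(&self.qclass.to_be_bytes());
        out
    }
}

fn main() {
    // 0x0100 sets only the "recursion desired" flag; type 1 is A, class 1 is IN.
    let header = Header { id: 0x1234, flags: 0x0100, qdcount: 1, ancount: 0, nscount: 0, arcount: 0 };
    let query = Query { name: "example.com".to_string(), qtype: 1, qclass: 1 };

    let mut message = header.as_bytes();
    message.extend(query.as_bytes());
    println!("{} bytes: {:02x?}", message.len(), message);
}
```

Writing the bytes by hand keeps full control over byte order and name encoding, which a general-purpose serializer does not give you.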

After this was done and I was able to verify that the method worked as intended, I ran into another roadblock: I was getting a response of 0 bytes every time.

While at first I did directly send these bytes over Tor, I later resorted to using tokio::net::UdpSocket or tokio::net::TcpStream in order to validate my crafted request. This was a good step, since Wireshark would reveal that my packet was not, in fact, valid.

I’d first dropped down from Tor to TCP, but when even that didn’t really work I went down to UDP. It was here that Wireshark revealed that my packet was a mangled UDP packet.

Apparently, some values weren’t set right, so after fixing that and verifying in Wireshark, I went up to TCP. Here is where the statement “DNS over TCP was virtually identical to regular UDP-based DNS” falls apart.

The only real differences I could see between my packet and what dig generates for DNS over TCP were the following:

Two bytes being added before my DNS message starts: 0x00 0x33. I just added these in using .push().
The entire TCP payload being 53 bytes long. The dig command accomplished this by making an additional request, but adding support for this seemed to be a bit beyond the objective of this project, so I (at first) copied the bytes from Wireshark and appended them to the Vec I was sending over the network. When this worked, I experimented and found that if I simply sent a 53 byte packet, even if I padded it with 0x00, it would be recognized as a DNS packet.

Now, I don’t really know why this happens. Keep in mind, I was sending the same exact payload over UDP and TCP, yet the previous iteration (without these hacks) only worked in UDP and not in TCP.
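A likely explanation, going by RFC 1035 (section 4.2.2) rather than anything confirmed in this post: DNS messages sent over TCP are prefixed with a two-byte, big-endian length field that counts the message but not the prefix itself. 0x00 0x33 is 51, and 2 + 51 gives the 53-byte payload seen in Wireshark, so a prefix copied from dig would promise 51 bytes and a shorter message would leave the server waiting until padding made up the difference. The sketch below computes the prefix from the message instead of hardcoding it; frame_for_tcp is a hypothetical helper name.

```rust
/// Prefix a DNS message with its length, as DNS over TCP expects.
/// Returns None if the message does not fit in a 16-bit length field.
fn frame_for_tcp(message: &[u8]) -> Option<Vec<u8>> {
    if message.len() > u16::MAX as usize {
        return None;
    }
    let len = message.len() as u16;

    let mut framed = Vec::with_capacity(2 + message.len());
    // The length is sent in network (big-endian) byte order.
    framed.extend_from_slice(&len.to_be_bytes());
    framed.extend_from_slice(message);
    Some(framed)
}

fn main() {
    // A dummy 51-byte message, matching the length that 0x00 0x33 encodes.
    let message = vec![0u8; 51];
    let framed = frame_for_tcp(&message).expect("message fits in a u16 length");
    println!(
        "prefix = {:#04x} {:#04x}, total = {} bytes",
        framed[0], framed[1], framed.len()
    );
}
```

With the prefix derived from the real message length, no padding should be needed, whatever the length of the query.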

Even after doing this, I could see that although my machine got a response back, all that my Rust program saw was zeroes. I later figured out that once I fixed the size of the buffer I used to store the response, I would get the response.
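The post does not say what the buffer fix was, so the following is only one possible approach, not the code from the project. Responses over TCP carry the same two-byte length prefix as requests (RFC 1035, section 4.2.2), so the reader can learn the exact size before allocating the buffer. The read_framed helper is hypothetical, and a Cursor stands in for the network stream so that the sketch runs offline.

```rust
use std::io::{self, Cursor, Read};

/// Read one length-prefixed DNS message from any reader, such as a TCP stream.
fn read_framed<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    // First read the two-byte, big-endian length prefix.
    let mut prefix = [0u8; 2];
    reader.read_exact(&mut prefix)?;
    let len = u16::from_be_bytes(prefix) as usize;

    // Then allocate exactly that many bytes and fill all of them.
    // read_exact returns an error if the stream ends early.
    let mut body = vec![0u8; len];
    reader.read_exact(&mut body)?;
    Ok(body)
}

fn main() {
    // Fake stream: a prefix announcing 3 bytes, the 3 bytes, then unrelated data.
    let mut fake_stream = Cursor::new(vec![0x00u8, 0x03, 0xAA, 0xBB, 0xCC, 0xFF]);
    match read_framed(&mut fake_stream) {
        Ok(body) => println!("read {} bytes: {:02x?}", body.len(), body),
        Err(e) => eprintln!("read failed: {}", e),
    }
}
```

A buffer with spare capacity but zero length, or one that is never completely filled, can easily look like an empty or all-zero response, which is why reading an exact, known number of bytes is the safer pattern.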

Conclusion

This week was just the start, and I’ve been learning something new almost every day. Here’s to more progress in the coming weeks and more improvements to Arti!

This website was made using Markdown, Pandoc, and a custom program to automatically add headers and footers (including this one) to any document that’s published here.