Creating Rust-based NodeJS modules

Posted Saturday, June 10th, 2017

Creating NodeJS modules with Rust - sample webapp session from Amora Labs on Vimeo.

The popularity NodeJS has enjoyed in recent years made it much easier for front-end developers to become fullstack developers while mastering a single language, JavaScript, and thousands of JS-based shops flourished. The non-blocking nature of NodeJS and the asynchronous style of JS made it relatively easy to ship quality webapps. At the same time, the copious amounts of cheap CPU and RAM available in the cloud allowed developers to mostly ignore naive algorithms, unless they were working on products at a scale most developers never reach, such as Facebook and Google.

In this article I’ll show you how to build Rust-based NodeJS modules by exploring how to improve a naive webapp bottleneck. I thought this was a better example than the standard hello world we tend to see in this type of article. Still, there are many ways to solve the problem presented here, and before complaining to me that the one true solution lies elsewhere, remember that this is a demo crafted to show you how NodeJS modules work, not a real product. So without further ado, let’s first check what we are building here.

Airports close to you webapp

Our webapp allows the user to input a latitude and longitude by hand, or to geolocate themselves, and see the available airports within a 30 kilometer radius of those coordinates. The airport data lives in a CSV file that contains about 46 thousand entries. Our little system needs to check all those entries and present the proper results to the user in a timely manner.

How does it work? Why is it intensive?

As you might remember from some past lecture on algorithm analysis and Big O notation, not all algorithms are created equal. Even if you have a correct solution to a problem, your code might be costlier in terms of CPU and/or memory than other solutions. In our system, we have a big loop that iterates over about 46,000 airports, computing the Haversine distance between each airport’s coordinates and the coordinates given by the user to decide whether the airport is close enough to be included in the result set.
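For reference, this is the standard haversine formula for two points with latitudes φ1, φ2 and longitudes λ1, λ2 (in radians) on a sphere of radius R. The loop evaluates it once per airport, so roughly 46,000 times per request. A common choice for R is the mean Earth radius, about 6,371 km; which radius geolib or the geo crate actually use is not covered here.

$$
a = \sin^2\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos\varphi_1 \cos\varphi_2 \, \sin^2\left(\frac{\lambda_2 - \lambda_1}{2}\right), \qquad d = 2R \arcsin\sqrt{a}
$$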

Be aware that the Earth is not a perfect sphere. In an age when some people think the Earth is flat, it is worth making sure people realize that it is not a perfect sphere either, which means the Haversine distance is always imprecise and includes some error. You can reduce the error with some clever mathematical approximations, but the value is still not the real distance, just something close to it. The floating-point numbers used in the calculation also introduce tiny errors, as we stray further and further from perfect mathematical constructions.
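Because the formula assumes a sphere, the distance it returns scales directly with whichever radius you pick. This std-only Rust sketch is our own illustration: the function `haversine_m` and both radius constants are our choices, not values taken from geolib or the geo crate. It computes the same pair of points with the mean Earth radius and with the WGS-84 equatorial radius. For any pair of points the two results differ by the ratio of the radii, about 0.11%.

```rust
/// Mean Earth radius in meters (a common spherical approximation).
const MEAN_RADIUS_M: f64 = 6_371_008.8;
/// WGS-84 equatorial radius in meters, for comparison.
const EQUATORIAL_RADIUS_M: f64 = 6_378_137.0;

/// Haversine distance in meters between two (lat, lon) points given in degrees,
/// on a sphere with the given radius.
fn haversine_m(lat1: f64, lon1: f64, lat2: f64, lon2: f64, radius_m: f64) -> f64 {
    let (phi1, phi2) = (lat1.to_radians(), lat2.to_radians());
    let d_phi = (lat2 - lat1).to_radians();
    let d_lambda = (lon2 - lon1).to_radians();

    let a = (d_phi / 2.0).sin().powi(2)
        + phi1.cos() * phi2.cos() * (d_lambda / 2.0).sin().powi(2);
    2.0 * radius_m * a.sqrt().asin()
}

fn main() {
    // User position from the article's logs and the sample heliport from the CSV.
    let (lat, lon) = (-22.903106394777886, -43.112671367135995);
    let (apt_lat, apt_lon) = (40.07080078, -74.93360138);

    let mean = haversine_m(lat, lon, apt_lat, apt_lon, MEAN_RADIUS_M);
    let equatorial = haversine_m(lat, lon, apt_lat, apt_lon, EQUATORIAL_RADIUS_M);
    println!("mean radius:       {:.0} m", mean);
    println!("equatorial radius: {:.0} m", equatorial);
    println!("difference:        {:.0} m", equatorial - mean);
}
```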

We do have other intensive parts, such as parsing the CSV, but the loop is the most intensive part of our source code and the real performance bottleneck, as it runs 46k times for each request.

Building it with NodeJS

Instead of building the CSV parsing and the Haversine calculation from scratch, we’ll benefit from those who came before us and published awesome modules on NPM. For this project we’ll use the csv and geolib packages to do the grunt work so we can focus on our naive loop. But first, let’s look at our data representation.

Airports as CSV entries

The CSV file represents each record as a line. In our case, the first line is a header with the column names, and every other line contains an airport record. Think of it as a giant Excel spreadsheet. It looks like this:

ident,kind,name,coordinates,elevation_ft,continent,iso_country,iso_region,municipality,gps_code,iata_code,local_code
00A,heliport,Total Rf Heliport,"-74.93360138, 40.07080078",11,NA,US,US-PA,Bensalem,00A,,00A
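The coordinates value is wrapped in double quotes because it contains a comma of its own, which is why a real CSV parser is needed rather than splitting each line on commas. Here is a minimal std-only Rust sketch of that difference. `split_csv_line` is a hypothetical helper, and it ignores escaped quotes ("") and multi-line fields, both of which real CSV parsers handle.

```rust
/// Split one CSV line into fields, honoring double-quoted fields that contain commas.
/// Simplified: does not handle escaped quotes ("") or fields spanning several lines.
fn split_csv_line(line: &str) -> Vec<String> {
    let mut fields = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false;

    for ch in line.chars() {
        match ch {
            // Toggle the quoted state; the quote character itself is dropped.
            '"' => in_quotes = !in_quotes,
            // A comma outside quotes ends the current field.
            ',' if !in_quotes => fields.push(std::mem::take(&mut current)),
            _ => current.push(ch),
        }
    }
    fields.push(current); // the last field (possibly empty)
    fields
}

fn main() {
    let line = r#"00A,heliport,Total Rf Heliport,"-74.93360138, 40.07080078",11,NA,US,US-PA,Bensalem,00A,,00A"#;
    let fields = split_csv_line(line);
    println!("{} fields, coordinates = {:?}", fields.len(), fields[3]);
    // A naive split breaks the quoted coordinates column in two.
    println!("naive split gives {} fields", line.split(',').count());
}
```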

We’ll convert that kind of data into a JS object we can use.

Airports as an array of objects

After we use the csv package to parse the data file, we’ll end up with a giant array of objects that resembles this:

[
    {
        ident: "00A",
        kind: "heliport",
        name: "Total Rf Heliport",
        coordinates: "-74.93360138, 40.07080078",
        elevation_ft: "11",
        continent: "NA",
        iso_country: "US",
        iso_region: "US-PA",
        municipality: "Bensalem",
        gps_code: "00A",
        iata_code: "",
        local_code: "00A"
    },
    // ... thousands more records ...
]

With this array in hand, we can craft our loop.

Searching for close airports

Assuming we’ve placed our array in a variable called data and that the user passed coordinates in variables lat and lon, the following code finds the airports that are 30 km away or closer.

Tricky bug: notice how the value of the coordinates column is split() and its two halves swapped before being passed to geolib. That’s because the CSV stores them as longitude, latitude instead of the latitude, longitude convention used here. That was tricky to figure out, as both values look the same (floats).

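const geolib = require("geolib");

// Parse the user's coordinates once; request params arrive as strings.
const userLat = parseFloat(lat);
const userLon = parseFloat(lon);

const found = data
    .filter(e => {
        // The CSV stores "longitude, latitude", so destructure in that order.
        // parseFloat also ignores the space left after the comma.
        const [lon1, lat1] = e.coordinates.split(",").map(parseFloat);

        // geolib.getDistance returns the distance in meters.
        const dist = geolib.getDistance(
            { latitude: lat1, longitude: lon1 },
            { latitude: userLat, longitude: userLon }
        );

        return dist <= 30000; // 30 km
    });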
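The same longitude/latitude pitfall follows the loop into any other language. This minimal std-only Rust sketch assumes the column always holds two comma-separated numbers. It keeps the swap in one place and adds a range check that catches some, but not all, swapped values. The helper `parse_lon_lat` is our own illustration, not part of the module built later in this article.

```rust
/// Parse the CSV's "longitude, latitude" column and return it latitude-first, as (lat, lon).
/// Returns None if the text is not exactly two numbers or either value is out of range.
fn parse_lon_lat(coordinates: &str) -> Option<(f64, f64)> {
    let mut parts = coordinates.split(',').map(str::trim);
    let lon = parts.next()?.parse::<f64>().ok()?;
    let lat = parts.next()?.parse::<f64>().ok()?;
    if parts.next().is_some() {
        return None; // more than two values: not a coordinate pair
    }
    // Catches some swaps: a longitude like 120.0 cannot be a latitude,
    // but -74.9 could be, so this is a sanity check, not a guarantee.
    if !(-90.0..=90.0).contains(&lat) || !(-180.0..=180.0).contains(&lon) {
        return None;
    }
    Some((lat, lon)) // note the swap relative to the CSV order
}

fn main() {
    let (lat, lon) = parse_lon_lat("-74.93360138, 40.07080078").expect("valid pair");
    println!("lat = {}, lon = {}", lat, lon);
}
```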

Running this search on my modest machine, a Surface Pro 4 with an i5 and 8 GB of RAM, I get the following results:

Listening on http://localhost:5000/
params { lat: '-22.903106394777886', lon: '-43.112671367135995' }
node, result count 50
node: 3536.817ms
params { lat: '-22.903106394777886', lon: '-43.112671367135995' }
node, result count 50
node: 3159.232ms
params { lat: '-22.903106394777886', lon: '-43.112671367135995' }
node, result count 50
node: 3620.821ms

So it takes more than three seconds to answer, which in web time is an eternity. Our algorithm is a very straightforward one, basically just a filter over a large array. There are many ways to improve this, and teams today might be tempted to:

Store all the airports in a database, program the Haversine distance as a function in PL/SQL or whatever their database uses, and use SELECT queries to find matches.
Go crazy and spawn 50 microservices on AWS with Elasticsearch, TensorFlow, machine-learning unicorn magic, Kubernetes and a new airportOps team; you get the gist about buzzwords.
Delegate all that boring GIS stuff to a third-party API from Google that you have no control over.
$INSERT_OTHER_SOLUTION

And it is not my prerogative to tell you that any of those things are wrong; you do what is best for your team. In our fictitious NodeJS shop, we can imagine a group of fullstack JS developers pondering what to do. There might be talk of switching the whole backend to some other language, and fear of lacking the necessary skills in whatever language is being considered, until someone proposes something less drastic: why not rewrite just the loop as a native NodeJS module and call it from our current backend?

This approach requires minimal changes to their current infrastructure and code. It requires no commitment to new external services or trust in external APIs.

Going native with Rust

There are many ways of building native NodeJS modules. Writing them in C/C++ is probably the most common, but those languages are tricky: there are many ways to shoot yourself in the foot with them if you’re not careful. Our fictitious team of developers might want to play with a safer language, one that makes certain common C/C++ bugs impossible: Rust!

DISCLAIMER: Thanks for making it this far in the article. Before moving on, I need to make sure you understand one thing: I am a Rust newbie. The program presented here is the second program I have ever written in Rust. I haven’t even finished the Rust book yet. While this may not sound like a recommendation of my expertise, it is actually a very positive point for Rust, as it enables someone like me to ship good and reliable software. So please read on.

If I can make a poor analogy for a second, let’s imagine that the problem presented in this article is a nail and the JS code our developers are using to solve it is a hammer. Instead of finding some better and cleverer way to drive nails, we’ll just replace the JS hammer by borrowing a safer, rusty hammer. The algorithm will be exactly the same: a loop that goes through all 46,000 records, calculates the Haversine distance and places close airports in a result set. So we’re not thinking smarter; we’re just switching tools and running the same naive algorithm.
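To make “the same naive algorithm” concrete, here is a std-only Rust sketch of that linear scan: one distance computation per record, no index, no early exit. It is a simplification under our own assumptions. The `Site` struct holds already-parsed coordinates, and the hand-written `haversine_m` uses the mean Earth radius. The real module shown later uses the csv, serde and geo crates instead.

```rust
/// A trimmed-down airport record holding just what the scan needs.
struct Site {
    ident: &'static str,
    lat: f64,
    lon: f64,
}

/// Haversine distance in meters on a sphere with the mean Earth radius.
fn haversine_m(lat1: f64, lon1: f64, lat2: f64, lon2: f64) -> f64 {
    const R: f64 = 6_371_008.8;
    let d_phi = (lat2 - lat1).to_radians();
    let d_lambda = (lon2 - lon1).to_radians();
    let a = (d_phi / 2.0).sin().powi(2)
        + lat1.to_radians().cos() * lat2.to_radians().cos() * (d_lambda / 2.0).sin().powi(2);
    2.0 * R * a.sqrt().asin()
}

/// Naive O(n) scan: compute the distance to every site and keep those within `radius_m`.
fn within<'a>(sites: &'a [Site], lat: f64, lon: f64, radius_m: f64) -> Vec<&'a Site> {
    sites
        .iter()
        .filter(|s| haversine_m(s.lat, s.lon, lat, lon) <= radius_m)
        .collect()
}

fn main() {
    let sites = [
        Site { ident: "NEAR", lat: -22.91, lon: -43.12 }, // about 1 km from the query point
        Site { ident: "FAR", lat: -23.50, lon: -43.12 },  // about 66 km south
        Site { ident: "00A", lat: 40.07080078, lon: -74.93360138 },
    ];
    let found = within(&sites, -22.903106394777886, -43.112671367135995, 30_000.0);
    for s in &found {
        println!("{}", s.ident);
    }
}
```

Every query pays for all n distance computations, which is exactly the cost the Rust rewrite keeps; it only makes each computation cheaper.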

Let us work backward: I will show you the results first so that you may be impressed (because I know that, since the disclaimer above about my newbiness, you have been dismissing everything I say), and then we’ll check out how it works. Running the same queries with Rust gets us the following results:

params { lat: '-22.903106394777886', lon: '-43.112671367135995' }
rust, result count 50
rust: 142.168ms
params { lat: '-22.903106394777886', lon: '-43.112671367135995' }
rust, result count 50
rust: 115.435ms
params { lat: '-22.903106394777886', lon: '-43.112671367135995' }
rust, result count 50
rust: 123.389ms

HOLY IMPROVEMENTS BATMAN!!! The exact same algorithm, with the exact same CSV and coordinates, now executes in about 130ms. That’s more than an order of magnitude faster than the previous solution, and it didn’t even require Knuthing the hell out of our algorithm. Now that I have your attention, let’s see how it works:
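Averaging the three runs in each log above gives the following figures. These are only three samples on one machine, so treat them as rough.

$$
\bar{t}_{\text{node}} = \frac{3536.817 + 3159.232 + 3620.821}{3} \approx 3438.96\ \text{ms}, \qquad
\bar{t}_{\text{rust}} = \frac{142.168 + 115.435 + 123.389}{3} \approx 127.00\ \text{ms}
$$

$$
\frac{\bar{t}_{\text{node}}}{\bar{t}_{\text{rust}}} \approx 27.1, \qquad \log_{10} 27.1 \approx 1.43
$$

That is a speedup of a little under one and a half orders of magnitude.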

Searching airports in Rust

Like our previous NodeJS solution, we won’t do the parsing and calculations from scratch; we’ll use the csv and geo crates. To make the CSV parsing more effective, we’ll use serde to deserialize the airports into a nice native Rust structure.

The airport structure

Even though the CSV crate is able to deserialize the airports into a vector of some generic structure, we get better performance by defining a proper structure:

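// serde's derive macros (needs serde's "derive" feature); 2017-era code
// used `#[macro_use] extern crate serde_derive;` instead.
use serde::{Deserialize, Serialize};

#[derive(Debug, Deserialize, Serialize, Clone)]
struct Airport {
    // Field names match the CSV header, so the csv crate maps columns by name.
    ident: String,
    kind: String,
    name: String,
    coordinates: String,
    elevation_ft: String,
    continent: String,
    iso_country: String,
    iso_region: String,
    municipality: String,
    gps_code: String,
    iata_code: String,
    local_code: String,
}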

Thanks to Rust traits we get lots of freebies using the derive attribute which means we don’t need to write code for serializing, deserializing, pretty-printing and cloning our structure 💖🦀💖.

The loop in Rust

With that in place, and with rdr.deserialize() turning each CSV record into an Airport struct, we can search for close airports with:

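// Needs `use std::str::FromStr;` for f64::from_str.
for result in rdr.deserialize() {
    let airport: Airport = match result {
        Ok(a) => a,
        // Stop on the first malformed record (the JS callback is not called).
        Err(_e) => return Ok(JsUndefined::new()),
    };

    // The CSV stores coordinates as "longitude, latitude".
    let v: Vec<&str> = airport.coordinates.split(", ").collect();
    // get() avoids an index panic: a missing value fails to parse and throws instead.
    let lon1: f64 = f64::from_str(v.get(0).unwrap_or(&"")).or_else(|_e| JsError::throw(TypeError, "longitude from CSV is wrong"))?;
    let lat1: f64 = f64::from_str(v.get(1).unwrap_or(&"")).or_else(|_e| JsError::throw(TypeError, "latitude from CSV is wrong"))?;

    // geo's Point::new takes (x, y), i.e. (longitude, latitude).
    let p = Point::new(lon1, lat1);
    let dist = p.haversine_distance(&Point::new(lon2, lat2)); // meters

    if dist < 30_000.0 {
        // `airport` is owned here, so it can be moved into the results without cloning.
        r.push(airport);
    }
}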

In the snippet above:

rdr: is the CSV reader. The deserialize() call will give us an iterator.
airport: will be an Airport structure.
r: is a vector of Airport structures that collects the results.
lat2, lon2: are the user’s coordinates, passed in from NodeJS.

As you can see, even though it is in a different language, it is the same algorithm as before. It has some more error checking, as we need to make sure that our values are what they are supposed to be (in terms of their types) before we are allowed to compute with them.

Interfacing NodeJS and Rust

This is all very cool, but we still need a way to make Rust talk to NodeJS. You could write everything from scratch by building a shared library in Rust and calling it with the ffi NodeJS package, but there are easier solutions out there: Neon is a library and toolchain for embedding Rust in your Node.js apps and libraries.

As mentioned above, this was my second Rust program but it was my first Neon project. I was brand new to this, actually to all of this, and yet this project never seemed daunting or beyond my skills (even though I probably butchered most of the Rust best practices here).

Neon makes it almost trivial to build stuff in Rust and test it from the NodeJS side of the app. It creates a little scaffold of files and folders for you: a native folder that contains your Rust files and Cargo.toml, plus boilerplate JS files in the lib folder that load and export whatever you’re building in Rust.

An API is provided so that you can work with JavaScript types from Rust and also interface with V8. In my experience, if you take your time to make sure that the arguments passed into your function from NodeJS are what they need to be, and you are careful assembling your response or callback, it is very easy to arrange this round trip between NodeJS and Rust.

Getting the parameters from the call

When the developer inside NodeJS calls our module, they must pass four parameters which are:

the filename with the CSV airport data.
a latitude
a longitude
a callback to receive the array with results

In Rust we are checking them like this:

fn airport_distance(call: Call) -> JsResult<JsUndefined> {
    let scope = call.scope;

    // require() and check::<T>() throw a JS exception (instead of panicking)
    // when an argument is missing or has the wrong type.
    let file: String = call.arguments.require(scope, 0)?.check::<JsString>()?.value();
    let lat2: f64 = call.arguments.require(scope, 1)?.check::<JsNumber>()?.value();
    let lon2: f64 = call.arguments.require(scope, 2)?.check::<JsNumber>()?.value();

    // The callback; it is downcast to JsFunction right before it is called.
    let fn_handle = call.arguments.require(scope, 3)?;
    ...

Assembling the response

Unlike Helix, a similar project for interfacing Ruby and Rust, Neon has no way to automagically convert our vector of airports into a JavaScript object. I ended up writing another loop that goes through the vector and assembles the JS array of objects using the low-level functions Neon provides for building such values:

// One plain JS object per matching airport.
let arr = JsArray::new(scope, r.len() as u32);

for (i, a) in r.iter().enumerate() {
    let obj = JsObject::new(scope);

    // Copy every Airport field into a JS string property with the same name.
    let fields = [
        ("ident", &a.ident),
        ("kind", &a.kind),
        ("name", &a.name),
        ("coordinates", &a.coordinates),
        ("elevation_ft", &a.elevation_ft),
        ("continent", &a.continent),
        ("iso_country", &a.iso_country),
        ("iso_region", &a.iso_region),
        ("municipality", &a.municipality),
        ("gps_code", &a.gps_code),
        ("iata_code", &a.iata_code),
        ("local_code", &a.local_code),
    ];
    for &(key, value) in fields.iter() {
        let js_value = JsString::new(scope, value).expect("string from results array is wrong");
        obj.set(key, js_value)?;
    }

    arr.set(i as u32, obj)?;
}

Again, r is our vector of airport structures and we’re assembling a JS array of objects piece by piece. After that, we’re ready to execute the callback that was passed to Rust with the result array:

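// Call callback(results) with `this` set to null; nothing happens if the
// fourth argument is not a function.
if let Some(function) = fn_handle.downcast::<JsFunction>() {
    let args: Vec<Handle<JsArray>> = vec![arr];
    // Propagate any exception thrown by the callback back to NodeJS.
    function.call(scope, JsNull::new(), args)?;
}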

These are all the interesting parts of the code; the rest is just boilerplate to read the CSV, include crates and tell NodeJS the name of our exported function. The whole module has just 128 lines.

Conclusion & Source Code

This has been an interesting experiment for me. I started with a slow but understandable webapp in NodeJS, optimized it using a language that is new to me, and ended up with a big performance gain without ever feeling out of control, as I would if I were writing C/C++ (where I would probably shoot myself in the foot with pointers).

The source code for this webapp is available on my GitHub, and by following the instructions in the README file you’ll be able to replicate all this on your machine. A Dockerfile is also provided in case you don’t want to install Rust, NodeJS and Neon in your environment.

Rust is a safe and friendly language that doesn’t sacrifice performance or expressiveness. It was easy to build this, and I feel empowered to try this type of approach in real-world scenarios in the future. I advise all NodeJS developers to build a sample module in Rust and get their feet wet; there are friendly Rustaceans to help you all the way.