Exploring Hourglass APIs in Rust

There are two talks on APIs that I think every programmer should watch and study:

Designing and Evaluating Reusable Components from Casey Muratori. This absolutely wonderful talk is the fundamental source for how to design APIs.
Hourglass Interfaces for C++ APIs from Stefanus Du Toit. This talk discusses having a rich API for users that is backed by a C API into proprietary code.

These two talks give a really good and grounded lesson on fundamental API design - the choices you make and the ramifications they can have. In particular, the design challenge and solution of the hourglass API design is a really nice approach to being able to ship a high functioning API that is backed by some proprietary software that you don’t want to ship. For example:

struct MyDataOpaque;

extern "C" MyDataOpaque* mydata_create(int);
extern "C" void mydata_destroy(MyDataOpaque*);

struct MyData final {
  explicit MyData(int someState) : opaque(mydata_create(someState)) {}

  ~MyData() { mydata_destroy(opaque); }

private:
  MyDataOpaque* const opaque;
};

This is a classic hourglass API in C++ - we’ve got the C API that returns some opaque data, and the hourglass MyData struct lets us expose this to C++ users in a way they are familiar with. In Stefanus’ talk he went into detail about how they also implement the C API in C++ - meaning that both sides of the thin C API are written in the high level language they enjoy - C++.

This all got me thinking - would this approach be possible in some fashion with Rust? First the caveats:

I suspect, though I am not well versed enough in Rust to be sure, that building the two sides of the hourglass API with different versions of Rust could cause explosions. So I’m going to assume that both sides are built with the same version.
I really hope that it is safe to use memory allocating functions (like Box or Vec) on both Rust sides of a C API.

These pretty major caveats aside - is it possible?

The Bottom of the Hourglass

First we’ll implement the bottom of the hourglass - this would be the proprietary code that you don’t want to ship (all your secret sauce might be in it!).

#[repr(C)]
pub struct MyDataOpaque {
  some_state: i32,
}

impl MyDataOpaque {
  pub fn new(some_state: i32) -> MyDataOpaque {
    MyDataOpaque { some_state }
  }
}

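#[no_mangle]
pub extern "C" fn mydata_create(some_state: i32) -> *mut MyDataOpaque {
  // Hand ownership of the heap allocation to the caller as a raw pointer.
  Box::into_raw(Box::new(MyDataOpaque::new(some_state)))
}

#[no_mangle]
pub extern "C" fn mydata_destroy(d: *mut MyDataOpaque) {
  // Box::from_raw on a null pointer is undefined behaviour, so ignore null.
  if d.is_null() {
    return;
  }
  // Rebuilding the Box and dropping it runs Drop and frees the allocation.
  drop(unsafe { Box::from_raw(d) });
}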

It’s quite a simple bit of code (for our simple example), but the crux of it is just an opaque struct and some unmangled public extern symbols for the exported C API.
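A real bottom would need more than create and destroy. As a rough sketch of how a read accessor might be exported in the same style, here is one possible shape. Note that mydata_get_state is a hypothetical function (it is not part of the code above), and returning 0 for a null pointer is just an assumption I picked for the example.

```rust
pub struct MyDataOpaque {
    some_state: i32,
}

#[no_mangle]
pub extern "C" fn mydata_create(some_state: i32) -> *mut MyDataOpaque {
    // Ownership of the allocation moves to the caller as a raw pointer.
    Box::into_raw(Box::new(MyDataOpaque { some_state }))
}

// Hypothetical accessor, not part of the original API.
// Assumption: a null pointer reads as 0 instead of crashing.
#[no_mangle]
pub extern "C" fn mydata_get_state(d: *const MyDataOpaque) -> i32 {
    // The caller must pass null or a live pointer from mydata_create.
    match unsafe { d.as_ref() } {
        Some(data) => data.some_state,
        None => 0,
    }
}

#[no_mangle]
pub extern "C" fn mydata_destroy(d: *mut MyDataOpaque) {
    if !d.is_null() {
        // Rebuild the Box so that dropping it frees the allocation.
        drop(unsafe { Box::from_raw(d) });
    }
}
```

The struct layout never crosses the boundary here - callers only ever hold the pointer, so the fields can change without touching the C API.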

Building the Two Crate Solution

I wanted a way to build a Rust crate from the code above, and then build a second crate that uses it. I did not want a single invocation of cargo to build both, because I wanted to be 100% sure that I was exercising the export and import paths of Rust. To do this I used a build.rs like:

use std::env;
use std::process::Command;

fn main() {
    let out_dir = env::var_os("OUT_DIR").unwrap();
    let working_dir = env::var_os("CARGO_MANIFEST_DIR").unwrap();

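    // Build the inner crate with its own cargo invocation.
    let status = Command::new("cargo")
        .arg("build")
        .arg("--manifest-path")
        .arg(&format!(
            "{}/MyDataOpaque/Cargo.toml",
            working_dir.to_str().unwrap()
        ))
        .arg("--target-dir")
        .arg(&out_dir)
        .arg("--release")
        .status()
        .expect("failed to run cargo for MyDataOpaque");

    // Stop here if the inner build failed, rather than at a confusing link error.
    assert!(status.success(), "building MyDataOpaque failed");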

    println!(
        "cargo:rustc-link-search=native={}/release",
        out_dir.to_str().unwrap()
    );
}

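This build script calls cargo on the crate in a sub-folder called MyDataOpaque, builds it, and then adds the directory containing the built library to our crate’s link search path. The library itself is named by the #[link] attribute in the outer crate’s source.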
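For reference, here is a small sketch of the directives a build script like this could print. The link_directives helper is hypothetical (I made it up so the lines can be checked without running cargo), and the static kind assumes the inner crate is built as a static library, which may not match your setup.

```rust
use std::path::Path;

// Hypothetical helper: returns the lines a build script would print for cargo.
pub fn link_directives(out_dir: &Path, manifest_dir: &Path) -> Vec<String> {
    vec![
        // Where the linker should look for the inner crate's output.
        format!(
            "cargo:rustc-link-search=native={}",
            out_dir.join("release").display()
        ),
        // Assumption: the inner crate produces a static library.
        "cargo:rustc-link-lib=static=MyDataOpaque".to_string(),
        // Re-run the build script when the inner crate's folder changes.
        format!(
            "cargo:rerun-if-changed={}",
            manifest_dir.join("MyDataOpaque").display()
        ),
    ]
}
```

In a build.rs, main would pass in OUT_DIR and CARGO_MANIFEST_DIR and println! each returned line. Emitting rustc-link-lib from the build script is an alternative to naming the library with a #[link] attribute in the source.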

Building the Top of the Hourglass

So now we’ve got a Rust library, exposed via a C API, that we want to write Rust bindings for.

Just to prove it worked I first tried using cbindgen to generate a C API for the bottom of the hourglass, and then bindgen to take this C API and generate FFI bindings for it in the top of the hourglass. These have a ton of dependencies (like libclang) that I wasn’t happy about, so I instead decided to just manually write the FFI declarations for the top of the hourglass.

Aside: assuming this approach was viable you could foresee an hourglass_bindgen crate that did something similar to what cbindgen does, but just writes out the Rust FFI to the C API instead.

#[repr(C)]
pub struct MyDataOpaque {
    _private: [u8; 0],
}

#[link(name = "MyDataOpaque")]
extern "C" {
    fn mydata_create(some_state: i32) -> *mut MyDataOpaque;
    fn mydata_destroy(opaque: *mut MyDataOpaque);
}

pub struct MyData {
    opaque: *mut MyDataOpaque,
}

impl MyData {
    pub fn new(some_state: i32) -> MyData {
        MyData {
            opaque: unsafe { mydata_create(some_state) },
        }
    }
}

impl Drop for MyData {
    fn drop(&mut self) {
        unsafe {
            mydata_destroy(self.opaque);
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn create_destroy() {
        // Create a MyData and drop it straight away, exercising both C functions.
        let _ = MyData::new(42);
    }
}

As we can see this code is very similar to the C++ example I showed at the beginning of the post - it wraps a C API in a Rust struct and then can use the existing safe mechanisms of Rust (like the Drop trait) to ensure safety of the data used.
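The wrapper can also grow safe methods that forward to the C API. Below is a minimal sketch of that idea. To keep it a single runnable file, the bottom is a stand-in module rather than a separately linked crate (so there is no #[no_mangle] or #[link] here), and both mydata_get_state and the state method are hypothetical additions that are not in the code above.

```rust
// Stand-in for the separately built bottom of the hourglass.
mod bottom {
    pub struct MyDataOpaque {
        some_state: i32,
    }

    pub extern "C" fn mydata_create(some_state: i32) -> *mut MyDataOpaque {
        Box::into_raw(Box::new(MyDataOpaque { some_state }))
    }

    // Hypothetical accessor; assumes null reads as 0.
    pub extern "C" fn mydata_get_state(d: *const MyDataOpaque) -> i32 {
        match unsafe { d.as_ref() } {
            Some(data) => data.some_state,
            None => 0,
        }
    }

    pub extern "C" fn mydata_destroy(d: *mut MyDataOpaque) {
        if !d.is_null() {
            drop(unsafe { Box::from_raw(d) });
        }
    }
}

// The top of the hourglass: a safe owner of the opaque pointer.
pub struct MyData {
    opaque: *mut bottom::MyDataOpaque,
}

impl MyData {
    pub fn new(some_state: i32) -> MyData {
        MyData {
            opaque: bottom::mydata_create(some_state),
        }
    }

    // Safe method that forwards to the C-style accessor.
    pub fn state(&self) -> i32 {
        bottom::mydata_get_state(self.opaque)
    }
}

impl Drop for MyData {
    fn drop(&mut self) {
        // The pointer is freed exactly once, by the side that allocated it.
        bottom::mydata_destroy(self.opaque);
    }
}
```

Users of MyData never see a raw pointer or an unsafe block - the borrow on &self keeps the pointer alive for the duration of each call.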

And if I run it?

running 1 test
test tests::create_destroy ... ok

It works, nice!

Conclusion

Hourglass APIs appear to work in Rust (minus the caveats above that I’ll have to explore further). I do think that using no_std in the bottom of the hourglass could even mitigate issues with multiple versions of Rust being used, but again I’ll have to verify it.

You could even foresee an approach where, instead of building some other crate (whose source you don’t want to ship), the build.rs uses the TARGET to fetch the correct pre-built version from a web service. Alternatively, you could keep a bunch of pre-built library versions resident in the crate and pick between them (though the total crate size limit might become an issue).
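As a rough sketch of the "pick between them" idea, the helpers below map a target triple to a folder of pre-built libraries. Everything here is an assumption for illustration: the prebuilt/ folder layout, the function names, and the particular list of targets are all made up.

```rust
// Hypothetical layout: prebuilt/<target-triple>/ holds the pre-built library.
pub fn prebuilt_dir(target: &str) -> Option<&'static str> {
    match target {
        "x86_64-unknown-linux-gnu" => Some("prebuilt/x86_64-unknown-linux-gnu"),
        "aarch64-apple-darwin" => Some("prebuilt/aarch64-apple-darwin"),
        "x86_64-pc-windows-msvc" => Some("prebuilt/x86_64-pc-windows-msvc"),
        // No pre-built library is shipped for any other target.
        _ => None,
    }
}

// Builds the link-search line for cargo, or an error message for an
// unsupported target.
pub fn link_search_line(manifest_dir: &str, target: &str) -> Result<String, String> {
    prebuilt_dir(target)
        .map(|dir| format!("cargo:rustc-link-search=native={}/{}", manifest_dir, dir))
        .ok_or_else(|| format!("no prebuilt MyDataOpaque for target {}", target))
}
```

In a build.rs you would read TARGET and CARGO_MANIFEST_DIR from the environment, print the Ok line, and panic with the Err message so that unsupported targets fail early with a clear error.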

It’s pretty awesome to me that this approach looks feasible though!