Introduction to Code Generation in Rust

Source: dev.to

This article is about generating Rust code from other Rust code, not about the code generation step of the rustc compiler. Another term for source code generation is metaprogramming, but it will be referred to as code generation here. The reader is expected to have some Rust knowledge.

What problems can it solve?

I want to ship a web frontend embedded inside a Rust binary to end users, such as a desktop application. Projects like Tauri achieve embedding with code generation by writing Rust code that generates more Rust code. Why does Tauri choose to use code generation over less complicated solutions? Let's take a look at what such a simpler solution might look like.

Imagine the output of our web frontend looks like:

dist
├── assets
│  ├── script-44b5bae5.js
│  ├── style-48a8825f.css
├── index.html

Let's embed these in our Rust project by using include_str!(), which adds the content of the specified file into the binary. That would look something like this:

use std::collections::HashMap;

fn main() {
    let mut assets = HashMap::new();

    assets.insert(
        "/index.html",
        include_str!("../dist/index.html")
    );

    assets.insert(
        "/assets/script-44b5bae5.js",
        include_str!("../dist/assets/script-44b5bae5.js")
    );

    assets.insert(
        "/assets/style-48a8825f.css",
        include_str!("../dist/assets/style-48a8825f.css")
    );
}

Straightforward enough; now we can grab those assets directly from the final binary! However, what if we don't always know the assets' filenames ahead of time? Let's say we have worked more on our frontend project and now its output looks like:

dist
├── assets
│  │ # script-44b5bae5.js previously
│  ├── script-581f5c69.js
│  │
│  │ # style-48a8825f.css previously
│  ├── style-e49f12aa.css
├── index.html

Ah... the filenames of our assets have changed due to our frontend bundler utilizing cache busting. The Rust code no longer compiles until we fix the filenames inside of it. It would be a terrible developer experience if we had to update our Rust code every time we changed the frontend - imagine if we had dozens of assets! Tauri uses code generation to avoid this by finding the assets at compile time and generating Rust code which references the correct assets.

Tools

Let's talk about a few tools for code generation and then use them to implement our own simple asset bundler.

  • The quote crate lets us write Rust syntax that is turned into a stream of tokens (data), which can then be emitted as syntactically correct Rust code. This crate is ubiquitous across the Rust ecosystem for code generation.

  • The walkdir crate provides an easy way to recursively grab all items in a directory. This crate is highly applicable for our asset bundler use-case.

  • The phf crate provides a hash map built with perfect hash functions. This is useful when all keys and values in the map are known before it's built, which is exactly the situation our asset bundler is in.

Rust code generation typically occurs in build scripts or macros. We will be building our simple asset bundler using build scripts because we will be accessing the disk. While procedural macros can also do that, doing so can be problematic in a few ways.

Building the Assets Bundler

The source code is available on GitHub if you want to see how everything is put together afterwards.

Create our library

Let's start off with creating a new Rust library:

cargo new --lib asset-bundler
cd asset-bundler

We want to create a way for applications that use this library to grab the assets, so let's create that first. This will involve creating a wrapper around phf::Map and a method to let callers get the content.

cargo add phf --features macros

We don't need much functionality from our Assets struct, just a way to create it and a way to get at what's inside of it. The following goes into src/lib.rs:

pub use phf; // re-export phf so we can use it later

type Map = phf::Map<&'static str, &'static str>;

/// Container for compile-time embedded assets.
pub struct Assets(Map);

impl From<Map> for Assets {
    fn from(value: Map) -> Self {
        Self(value)
    }
}

impl Assets {
    /// Get the contents of the specified asset path.
    pub fn get(&self, path: &str) -> Option<&str> {
        self.0.get(path).copied()
    }
}

Codegen

Now, we build the library that will be used in a build script to generate our code. Because we will have multiple crates in the same repository, let's quickly convert the project to a Cargo workspace. Let's add the following to the top of our Cargo.toml:

[workspace]
members = ["codegen"]

Now we are ready to continue creating our codegen library. Run these commands to create our project and grab our dependencies:

cargo new --lib codegen --name asset-bundler-codegen
cargo add quote walkdir --package asset-bundler-codegen

Time to think a bit about what functionality we need and boil it down into a few concrete steps.

  • We pass an assets path to our function, which we will call base.

  • We check if base exists, or else we can't do anything.

  • Recursively gather all file paths inside base.

  • Generate code to embed the files at all of those paths.

One last thing to mention: we want to get assets by passing in a relative path. We want assets.get("index.html"), not assets.get("../dist/index.html"). This means we will need to keep track of that base directory passed into our function. Let's write those requirements down as code inside of codegen/src/lib.rs:

use std::path::{Path, PathBuf};

/// Generate Rust code to create an [`asset_bundler::Assets`] from the passed path.
pub fn codegen(path: &Path) -> std::io::Result<String> {
    // canonicalize also checks if the path exists
    // which is the only case that makes sense for us
    let base = path.canonicalize()?;

    let paths = gather_asset_paths(&base);
    Ok(generate_code(&paths, &base))
}

/// Recursively find all files in the passed directory.
fn gather_asset_paths(base: &Path) -> Vec<PathBuf> {
  todo!()
}

/// Generate Rust code to create an [`asset_bundler::Assets`].
fn generate_code(paths: &[PathBuf], base: &Path) -> String {
  todo!()
}

Let's take on gather_asset_paths first, since it's more specific to our project than codegen. We will use walkdir to recursively grab all the files from the passed base directory. This is a simple example project, so we will ignore errors for now by using flatten(), which flattens nested iterators. Because Result also implements IntoIterator (yielding its Ok value, or nothing for an Err), we are left with only the successful values. Let's implement it in codegen/src/lib.rs:

use walkdir::WalkDir;

/// Recursively find all files in the passed directory.
fn gather_asset_paths(base: &Path) -> Vec<PathBuf> {
  let mut paths = Vec::new();
  for entry in WalkDir::new(base).into_iter().flatten() {
    // we only care about files, ignore directories
    if entry.file_type().is_file() {
      paths.push(entry.into_path())
    }
  }

  paths
}
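The error-skipping behaviour of flatten() can be seen in isolation with plain standard-library types. This is only a sketch: the function names successes and partition are made up for illustration and are not part of the bundler. The second function shows one possible alternative for when the errors should be kept instead of dropped.

```rust
/// Keep only the successful values, silently dropping every `Err`.
/// This mirrors what `.into_iter().flatten()` does to walkdir's results.
fn successes(results: Vec<Result<u32, String>>) -> Vec<u32> {
    // `Result` implements `IntoIterator`: `Ok(v)` yields `v`, `Err(_)` yields nothing.
    results.into_iter().flatten().collect()
}

/// Alternative (hypothetical) approach: keep the errors around for reporting.
fn partition(results: Vec<Result<u32, String>>) -> (Vec<u32>, Vec<String>) {
    let mut oks = Vec::new();
    let mut errs = Vec::new();
    for result in results {
        match result {
            Ok(value) => oks.push(value),
            Err(err) => errs.push(err),
        }
    }
    (oks, errs)
}
```

For a real bundler, silently skipping unreadable entries is probably not what you want, which is why the article calls this out as a simplification.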

Cool cool cool.

Now we have a list of all asset files that are supposed to be included in the binary. The second function will generate the actual Rust code, but let's see what the code we are generating should look like. We need to make sure that:

  • We import the correct dependencies.

  • The phf::Map is created with all the values; we can use phf::phf_map! to help.

  • Our Assets struct from our first library is created.

The first point is pretty important: we need to make sure we are calling the correct library. We can prevent crate name collisions by using a leading :: on our use statement. Additionally, we need to make sure we use our re-exported phf, otherwise the end application will fail to compile if it doesn't itself depend on phf.

Using the frontend example from above, this is what the generated code using phf_map! should look like:

use ::asset_bundler::{Assets, phf::{self, phf_map}};

let map = phf_map! {
  "index.html" => include_str!("../dist/index.html"),
  "assets/script-44b5bae5.js" => include_str!("../dist/assets/script-44b5bae5.js"),
  "assets/style-48a8825f.css" => include_str!("../dist/assets/style-48a8825f.css")
};

let assets = Assets::from(map);

Our first problem comes from us only having the paths used in include_str!(); we don't have the "key" paths. We also need to turn our paths into strings at some point, because that is how they are used in the generated code. Let's first figure out how to transform our list of paths into a list of strings suitable for keys. We need to strip the base prefix we resolved earlier from all the paths, so let's write that inside of codegen/src/lib.rs:

/// Turn paths into relative paths suitable for keys.
fn keys(paths: &[PathBuf], base: &Path) -> Vec<String> {
  let mut keys = Vec::new();

  for path in paths {
    // ignore this failure case for this example
    if let Ok(key) = path.strip_prefix(base) {
      keys.push(key.to_string_lossy().into())
    }
  }

  keys
}
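One thing the keys function does not address is the path separator. On Windows, the relative path may come out with backslashes, so a lookup for "assets/script.js" could miss. A possible variant that always joins components with "/" is sketched below. The name relative_key is hypothetical, and this is not part of the article's bundler.

```rust
use std::path::Path;

/// Turn `path` into a key relative to `base`, always using `/` as separator.
/// Returns `None` if `path` is not located under `base`.
fn relative_key(path: &Path, base: &Path) -> Option<String> {
    let relative = path.strip_prefix(base).ok()?;

    // Rebuild the key from its components instead of relying on the
    // platform's native separator.
    let parts: Vec<String> = relative
        .components()
        .map(|c| c.as_os_str().to_string_lossy().into_owned())
        .collect();

    Some(parts.join("/"))
}
```

Like the original, this uses to_string_lossy(), so non-UTF-8 file names would still be altered rather than rejected.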

The values of the map are easier. Their paths are already the ones include_str!() needs, so we just need to turn them into strings. Let's write this one with an Iterator, which we also could have done with keys:

let values = paths.iter().map(|p| p.to_string_lossy());

So now we have both keys and values in usable formats. Next comes the macro part, where we will actually be generating code from all the data.

Let's talk about how we are about to use double braces. This is not something required when doing code generation, but in our case we want to be able to use the resulting Assets anywhere. By wrapping the generated code in a block expression we can use it anywhere an expression is valid, which is lots of places.
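As a small illustration of why a block expression is convenient, here is a hand-written sketch that uses a std HashMap in place of the generated phf map. The function name build_assets and the HTML content are made up. The use statement stays scoped to the block, and the block's last expression becomes its value.

```rust
fn build_assets() -> std::collections::HashMap<&'static str, &'static str> {
    // Everything between the braces is a single block expression.
    // The `use` is only visible inside the block, so it cannot clash
    // with names in the surrounding code.
    let assets = {
        use std::collections::HashMap;
        HashMap::from([("index.html", "<h1>hello</h1>")])
    };

    assets
}
```

The generated code follows the same shape, which is what lets it sit on the right-hand side of a let binding.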

Second, we are about to use some very unfamiliar syntax for those of you who have not written macros before. While it may seem strange at first, the syntax here is widely used across the ecosystem. In particular, we are going to be using the repetition syntax of quote. This allows us to use our two collections of keys and values together.

Let's do it:

quote! {{
  use ::asset_bundler::{Assets, phf::{self, phf_map}};
  Assets::from(phf_map! {
    #( #keys => include_str!(#values) ),*
  })
}}

While the syntax is surely a departure from normal Rust code, hopefully you are able to recognize some familiar patterns we already went over. Here's a side-by-side comparison to the phf_map! example we did before:

let keys = ["key1", "key2", "key3"];
let values = ["value1", "value2", "value3"];
quote! {
  phf_map! {
    #( #keys => include_str!(#values) ),*
  }
}

// turns into this
phf_map! {
  "key1" => include_str!("value1"),
  "key2" => include_str!("value2"),
  "key3" => include_str!("value3")
}
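For intuition only, the repetition above behaves roughly like zipping the two collections and joining the entries with commas. The sketch below does that with plain strings. It is not how quote works internally (quote operates on tokens, not text), and the function name phf_map_source is made up.

```rust
/// Build the source text of a `phf_map!` invocation by hand.
/// Each key is paired with the value at the same position.
fn phf_map_source(keys: &[&str], values: &[&str]) -> String {
    let entries: Vec<String> = keys
        .iter()
        .zip(values)
        // `{:?}` prints the string quoted and escaped, like a string literal.
        .map(|(key, value)| format!("{key:?} => include_str!({value:?})"))
        .collect();

    format!("phf_map! {{ {} }}", entries.join(", "))
}
```

Building code out of strings like this gets fragile quickly, which is a large part of why quote is the usual choice.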

With all that out of the way, let's plug that into our generate_code function we created earlier to see how it interacts with the rest of the code. Inside of codegen/src/lib.rs:

/// Generate Rust code to create an [`asset_bundler::Assets`].
fn generate_code(paths: &[PathBuf], base: &Path) -> String {
  let keys = keys(paths, base);
  let values = paths.iter().map(|p| p.to_string_lossy());

  // double braces to make it a block expression
  let output = quote! {{
    use ::asset_bundler::{Assets, phf::{self, phf_map}};
    Assets::from(phf_map! {
      #( #keys => include_str!(#values) ),*
    })
  }};

  output.to_string()
}

/// Turn paths into relative paths suitable for keys
fn keys(paths: &[PathBuf], base: &Path) -> Vec<String> {
  let mut keys = Vec::new();

  for path in paths {
    // ignore this failure case for this example
    if let Ok(key) = path.strip_prefix(base) {
      keys.push(key.to_string_lossy().into())
    }
  }

  keys
}

Phew! That actually wraps up the codegen library. I'll drop the full codegen/src/lib.rs here, and then we can skedaddle to actually using what we just worked on:

use quote::quote;
use std::path::{Path, PathBuf};
use walkdir::WalkDir;

/// Generate Rust code to create an [`asset_bundler::Assets`] from the passed path.
pub fn codegen(path: &Path) -> std::io::Result<String> {
  // canonicalize also checks if the path exists
  // which is the only case that makes sense for us
  let base = path.canonicalize()?;

  let paths = gather_asset_paths(&base);
  Ok(generate_code(&paths, &base))
}

/// Recursively find all files in the passed directory.
fn gather_asset_paths(base: &Path) -> Vec<PathBuf> {
  let mut paths = Vec::new();
  for entry in WalkDir::new(base).into_iter().flatten() {
    // we only care about files, ignore directories
    if entry.file_type().is_file() {
      paths.push(entry.into_path())
    }
  }

  paths
}

/// Generate Rust code to create an [`asset_bundler::Assets`].
fn generate_code(paths: &[PathBuf], base: &Path) -> String {
  let keys = keys(paths, base);
  let values = paths.iter().map(|p| p.to_string_lossy());

  // double braces to make it a block expression
  let output = quote! {{
    use ::asset_bundler::{Assets, phf::{self, phf_map}};
    Assets::from(phf_map! {
      #( #keys => include_str!(#values) ),*
    })
  }};

  output.to_string()
}

/// Turn paths into relative paths suitable for keys.
fn keys(paths: &[PathBuf], base: &Path) -> Vec<String> {
  let mut keys = Vec::new();

  for path in paths {
    // ignore this failure case for this example
    if let Ok(key) = path.strip_prefix(base) {
      keys.push(key.to_string_lossy().into())
    }
  }

  keys
}

Using it

We just made a simple asset bundler in 50 lines of code, and it's time to use it! We will start off with creating a new example project to consume the two libraries we just created.

First, add a new item to the root Cargo.toml:

[workspace]
members = ["codegen", "example"]

Then, we create the example binary and add our dependencies:

cargo new --bin example
cargo add asset-bundler --path . --package example
cargo add --build asset-bundler-codegen --path codegen --package example
touch example/build.rs
mkdir -p example/assets/scripts

Let's start off the Rust code with the build script since we just created our codegen library. We will call the codegen function we created earlier to get the generated code, then write that generated Rust code somewhere our other code can use it. This goes into example/build.rs:

use std::path::Path;

fn main() {
    let assets = Path::new("assets");
    let codegen = match asset_bundler_codegen::codegen(assets) {
        Ok(codegen) => codegen,
        Err(err) => panic!("failed to generate asset bundler codegen: {err}"),
    };

    let out = std::env::var("OUT_DIR").unwrap();
    let out = Path::new(&out).join("assets.rs");
    std::fs::write(out, codegen.as_bytes()).unwrap();
}
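One optional addition, not part of the article's build script: Cargo re-runs a build script when a file inside the package changes, unless the script prints a cargo:rerun-if-changed instruction, in which case only the listed paths are watched. That matters if the assets live outside the package, as the ../dist directory in the first example would. A minimal sketch of producing that instruction, with a made-up helper name rerun_if_changed:

```rust
use std::path::Path;

/// Build the Cargo instruction that asks for the build script to be
/// re-run whenever `path` (a file or a directory) changes.
fn rerun_if_changed(path: &Path) -> String {
    format!("cargo:rerun-if-changed={}", path.display())
}

// Inside build.rs one would print it to stdout, for example:
//     println!("{}", rerun_if_changed(Path::new("assets")));
```

Note that once any such instruction is printed, Cargo stops watching the rest of the package by default, so every relevant path needs to be listed.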

We ended up writing the code to $OUT_DIR/assets.rs because Cargo sets $OUT_DIR for build scripts to a directory that is unique to each crate, and to each version of the same crate. The path we just wrote to will be important in just a second, but first let's create some assets to actually use.

We want to create some assets that are somewhat representative of the example we used at the start. In this case, let's imagine that these assets are for a webserver and the files are served to the browser. This article isn't the place for implementing the server, but we will mimic index.html's script dependencies by making each file's content the path of the asset it requires. Run these commands to create them:

echo -n "scripts/loader-a1b2c3.js" > example/assets/index.html
echo -n "scripts/dashboard-f0e9d8.js" > example/assets/scripts/loader-a1b2c3.js
echo -n "console.log('dashboard stuff')" > example/assets/scripts/dashboard-f0e9d8.js

It's time to put it together and get a glimpse of how it works! We set up the example so that there is only a single "always known" filename, index.html. Our goal is to get the content of that dashboard script using only an "index.html" literal. Here we will jump from each asset to the next in example/src/main.rs:

fn main() {
  // include the assets our build script created
  let assets = include!(concat!(env!("OUT_DIR"), "/assets.rs"));

  let index = assets.get("index.html").unwrap();
  let loader = assets.get(index).unwrap();
  let dashboard = assets.get(loader).unwrap();

  assert_eq!(dashboard, "console.log('dashboard stuff')");
}
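The chained lookups can also be tried without a build script by swapping the generated phf map for a hand-written std HashMap. Everything in this sketch (StandInAssets, example_assets, follow) is a hypothetical stand-in for experimentation, not the article's Assets type.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `Assets`, backed by a std `HashMap`
/// instead of a `phf::Map`.
struct StandInAssets(HashMap<&'static str, &'static str>);

impl StandInAssets {
    /// Get the contents of the specified asset path.
    fn get(&self, path: &str) -> Option<&str> {
        self.0.get(path).copied()
    }
}

/// The three example assets, written out by hand.
fn example_assets() -> StandInAssets {
    StandInAssets(HashMap::from([
        ("index.html", "scripts/loader-a1b2c3.js"),
        ("scripts/loader-a1b2c3.js", "scripts/dashboard-f0e9d8.js"),
        ("scripts/dashboard-f0e9d8.js", "console.log('dashboard stuff')"),
    ]))
}

/// Treat each asset's content as the key of the next asset, `hops` times.
/// Returns `None` as soon as a lookup fails.
fn follow(assets: &StandInAssets, start: &str, hops: usize) -> Option<String> {
    let mut current = start.to_string();
    for _ in 0..hops {
        current = assets.get(&current)?.to_string();
    }
    Some(current)
}
```

Three hops from "index.html" land on the dashboard script's content, which is the same result the assert_eq! in main.rs checks for.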

Don't forget, you can see all the code on GitHub.

That's it!

A very bare-bones asset bundler in 94 lines of code, including the example. Treating code generation like any other Rust code is an important aspect of keeping it understandable and maintainable. Of those 94 lines, only a handful did actual code generation. Let's break down what we did...

  • We created the asset-bundler crate that provides the Assets type and re-exports phf so that the generated code can use it.

  • We created the asset-bundler-codegen crate to hold all the functionality codegen uses, along with providing a public function codegen to utilize it.

  • We created the example build script to call the codegen function on its own assets. The generated code was written to a file which we then included in our example/src/main.rs.

While having a separate crate isn't necessary for build script code generation specifically, it is very common. Not only does it help separate concerns and prevent unused dependencies, it also helps prevent circular dependencies on more complex projects. Having a separate crate is required for performing code generation with procedural macros.

Code generation is a powerful tool to bring advanced functionality to your Rust programs. Our example from earlier, Tauri, uses it extensively to perform code injection, compression, and validation for its own asset bundling.

Demystify code generation by writing it as regular Rust code, empowering you to build powerful software.

Author: Chip Reed, Security Engineer at CrabNebula
