WasThat: Simple Wasm Function

Deciding on Goals

An interpreter must parse, validate, and then execute instructions. For a Ruby interpreter, the instructions are statements in the Ruby language. For a Wasm interpreter, the instructions are Wasm bytecode.

I tend to equate an interpreter's implementation complexity with the complexity of the instructions. Glancing at the Wasm spec, there are hundreds of instructions with all sorts of behaviors. Yikes.

When faced with overwhelming numbers, I start to think more about the hundreds of decisions ahead. Should I write the parser first? Can I skip validation just to get things working? Do I know how SIMD works? How do I test if the code works?

Taking a breath, I start to formulate a plan. I need to define a "minimal viable product" for this project. For this series of articles, the product is a working interpreter.

Instead of expecting to handle all Wasm bytecode, I want to be able to execute the simplest programs I can think of and then incrementally add support for more instructions.

Fortunately, not every instruction appears in every Wasm program. A minimal viable product could be an interpreter which handles only a subset of all the possible instructions.
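
As a rough sketch of what "only a subset" could look like, here is a toy interpreter that understands two instructions and rejects everything else. The Instr enum and the run function are hypothetical names for illustration only, and the instructions are already decoded rather than read from real bytecode.

```rust
/// A hypothetical instruction set covering only a tiny slice of Wasm.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Instr {
    /// Push a constant onto the value stack.
    I32Const(i32),
    /// Pop two values and push their wrapping sum.
    I32Add,
    /// Stand-in for every instruction not supported yet, carrying its opcode byte.
    Unsupported(u8),
}

/// Runs the instructions in order and returns the value left on top of the
/// stack, or an error as soon as something unsupported or malformed appears.
pub fn run(code: &[Instr]) -> Result<Option<i32>, String> {
    let mut stack: Vec<i32> = Vec::new();
    for instr in code {
        match *instr {
            Instr::I32Const(value) => stack.push(value),
            Instr::I32Add => {
                let rhs = stack.pop().ok_or("stack underflow")?;
                let lhs = stack.pop().ok_or("stack underflow")?;
                stack.push(lhs.wrapping_add(rhs));
            }
            Instr::Unsupported(opcode) => {
                return Err(format!("unsupported opcode 0x{:02x}", opcode));
            }
        }
    }
    Ok(stack.last().copied())
}
```

Calling run(&[Instr::I32Const(42)]) yields Ok(Some(42)), while any program containing an Unsupported instruction fails loudly. Failing loudly is the point: each new instruction becomes one more match arm, and anything not handled yet is reported instead of being silently skipped.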

Simple Programs

There are many definitions of a "simple" program, so here are a few examples:

Hello World

pub fn hello() {
  println!("Hello world!");
}

Printing Hello World! to the screen is the standard "new to the language" program. Unfortunately, it is not ideal for this project. Strings are inherently more complex than any primitive type such as an i32 or bool. Performing I/O (or in this case just the output) can require thousands of instructions. Error handling will add even more complexity.

For Wasm programs specifically, there is no direct access to system calls, so printing to the console is not simple out of the box. "Hello World!" seems too complex for the first program to interpret.

Adding Two Numbers

Another common program is a function which adds two numbers. Something as simple as:

pub fn add(x: i32, y: i32) -> i32 {
  x + y
}

It is far simpler than "Hello World!", but it requires passing in 2 arguments to the function.

Return a Number

When starting to build an interpreter, it is probably best to use the simplest function possible. There are few programs as simple as a function which returns a constant number.

pub fn magic_num() -> i32 {
  42
}

It is an absolutely useless and trivial function, but it is the simplest example to start with.

Generating a Wasm Module

Now that I have decided on the function, I need to generate the actual Wasm bytecode for the interpreter to execute. I'm not well versed in the Wasm binary or textual formats, so I decided to write the function in a Rust library and generate the Wasm bytecode from it.

Create the Rust Library

Creating a new Rust library crate:

cargo new --lib return-number

And in the src/lib.rs, I changed the code to:

pub fn magic_num() -> i32 {
  42
}

Building the crate should be successful:

cargo build

Supporting Wasm

In order to support Wasm as a compile target, a few things must be done to the code.

Add cdylib as a crate-type

In Cargo.toml, the crate type needs to be changed to cdylib.

[lib]
crate-type = ["cdylib"]

The cdylib value indicates a dynamic system library should be built when compiling the code. When targeting wasm32-unknown-unknown, a Wasm bytecode module will be created.

If other artifacts still need to be supported (e.g. the crate is depended on by another Rust crate so it should be used like a normal Rust library), then crate-type can take multiple values like:

[lib]
crate-type = ["cdylib", "lib"]

For more information about crate-type, see the Rust Reference on Linkage.

Exporting the Function

For a Rust library, pub on a global function indicates the function may be called by another crate (assuming all the module visibility rules are also followed).

Unfortunately, declaring a public function for other use cases is not as simple. Code is generally very inflexible across external boundaries. Suppose there is a function declared like:

fn add(x: i32, y: i32) -> i32 {
  x + y
}

Calling the function from another Rust function is easy. However, suppose there is a program which is written in a different language like C or Python and it needs to call the Rust function.

It is possible to invoke code from a different language, and it is possible for another language to invoke Rust code, but there are restrictions and rules which must be followed. A calling convention (akin to a protocol) dictates various details, such as how the arguments are passed and what state the CPU and memory are in when the function returns. The two relevant topics are foreign function interfaces (FFI) and application binary interfaces (ABI) (which are not to be confused with application programming interfaces). If either term is new, I strongly recommend reading more about them.

Back to the problem at hand, since the magic_num function is intended to be used in a Wasm module, it needs to be exposed with a stable ABI. The function should be declared with extern and C as the ABI type like:

#[unsafe(no_mangle)]
pub extern "C" fn magic_num() -> i32 {
    42
}

The no_mangle attribute tells the compiler not to mangle the function's name, so the exported symbol is exactly magic_num.

The function looks noisier now with the extra keyword and attribute. Unfortunately, the "noise" is required to maintain a stable external interface. If the Rust code also supported being called by C code, it would require similar declarations.
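
One way to see that extern "C" changes more than the spelling is to note that the ABI is part of a function's type in Rust. The sketch below is only an illustration: call_with_c_abi and rust_magic_num are hypothetical helpers, and the no_mangle attribute is left out because it affects only the symbol name, not how the function is called.

```rust
/// Same body as the exported function, declared with the C ABI.
pub extern "C" fn magic_num() -> i32 {
    42
}

/// The same function with the default (unspecified) Rust ABI.
pub fn rust_magic_num() -> i32 {
    42
}

/// Hypothetical helper which accepts only functions that use the
/// C calling convention.
pub fn call_with_c_abi(f: extern "C" fn() -> i32) -> i32 {
    f()
}
```

Here call_with_c_abi(magic_num) compiles and returns 42, while call_with_c_abi(rust_magic_num) is rejected by the compiler with a mismatched types error, even though both functions have the same body.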

Building for Wasm

Now the code should be buildable as a Wasm module:

cargo build --target wasm32-unknown-unknown --release

There are many target platforms supported by rustc.

The --release flag builds with optimizations enabled, in the hope that the output has the same number of instructions as the "debug" build or fewer.

If there is an error saying that the wasm32-unknown-unknown target does not exist (or some std library does not exist), try:

rustup target add wasm32-unknown-unknown

And then retry the cargo build command.

In the end, there should be a file in:

target/wasm32-unknown-unknown/release/return_number.wasm

Congrats, there is a generated Wasm module.

Inspecting the Wasm Module

The return_number.wasm should be a small binary file. On my system using Rust 1.86.0, it is 333 bytes.

333 bytes still seems to be too many bytes for a function which just returns a constant. A function returning a constant integer can be implemented with 1 instruction on many CPU architectures.

Printing out the Wasm file's contents in hexadecimal:

xxd return_number.wasm
00000000: 0061 736d 0100 0000 0105 0160 0001 7f03  .asm.......`....
00000010: 0201 0004 0501 7001 0101 0503 0100 1006  ......p.........
00000020: 1903 7f01 4180 80c0 000b 7f00 4180 80c0  ....A.......A...
00000030: 000b 7f00 4180 80c0 000b 0731 0406 6d65  ....A......1..me
00000040: 6d6f 7279 0200 096d 6167 6963 5f6e 756d  mory...magic_num
00000050: 0000 0a5f 5f64 6174 615f 656e 6403 010b  ...__data_end...
00000060: 5f5f 6865 6170 5f62 6173 6503 020a 0601  __heap_base.....
00000070: 0400 412a 0b00 3c04 6e61 6d65 0013 1272  ..A*..<.name...r
00000080: 6574 7572 6e5f 6e75 6d62 6572 2e77 6173  eturn_number.was
00000090: 6d01 0c01 0009 6d61 6769 635f 6e75 6d07  m.....magic_num.
000000a0: 1201 000f 5f5f 7374 6163 6b5f 706f 696e  ....__stack_poin
000000b0: 7465 7200 4d09 7072 6f64 7563 6572 7302  ter.M.producers.
000000c0: 086c 616e 6775 6167 6501 0452 7573 7400  .language..Rust.
000000d0: 0c70 726f 6365 7373 6564 2d62 7901 0572  .processed-by..r
000000e0: 7573 7463 1d31 2e38 362e 3020 2830 3566  ustc.1.86.0 (05f
000000f0: 3938 3436 6638 2032 3032 352d 3033 2d33  9846f8 2025-03-3
00000100: 3129 0049 0f74 6172 6765 745f 6665 6174  1).I.target_feat
00000110: 7572 6573 042b 0a6d 756c 7469 7661 6c75  ures.+.multivalu
00000120: 652b 0f6d 7574 6162 6c65 2d67 6c6f 6261  e+.mutable-globa
00000130: 6c73 2b0f 7265 6665 7265 6e63 652d 7479  ls+.reference-ty
00000140: 7065 732b 0873 6967 6e2d 6578 74         pes+.sign-ext

Looking at the data, I confirm I cannot read binary code like Neo from the Matrix. There are some strings like magic_num which make me hopeful that the module does contain the expected code, but it is hard to understand anything else.
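
One part of the dump is easy to decode by hand: the first 8 bytes. According to the Wasm binary format, every module starts with the 4-byte magic number \0asm followed by a 4-byte little-endian version, which is currently 1. A minimal sketch of that check follows; check_header is a hypothetical name, not part of any existing library.

```rust
/// The four magic bytes that open every Wasm binary: "\0asm".
const WASM_MAGIC: [u8; 4] = [0x00, 0x61, 0x73, 0x6d];

/// Returns the module's version number if the bytes start with a valid
/// 8-byte preamble, or a description of what went wrong.
pub fn check_header(bytes: &[u8]) -> Result<u32, String> {
    if bytes.len() < 8 {
        return Err(format!("expected at least 8 bytes, found {}", bytes.len()));
    }
    if !bytes.starts_with(&WASM_MAGIC) {
        return Err("missing \\0asm magic number".to_string());
    }
    // The version is stored as a 4-byte little-endian integer.
    let version = u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
    Ok(version)
}
```

Fed the first row of the dump (00 61 73 6d 01 00 00 00 ...), the function returns Ok(1). Anything that does not begin with the magic number, such as a native executable, is rejected.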

Fortunately, the WebAssembly community has developed many tools to inspect Wasm bytecode. Some of the tools convert the Wasm binary bytecode into the official Wasm textual format (WAT).

I installed wasm-tools on a whim:

cargo install --locked wasm-tools

Then, I ran:

wasm-tools print return_number.wasm

And got back:

(module $return_number.wasm
  (type (;0;) (func (result i32)))
  (table (;0;) 1 1 funcref)
  (memory (;0;) 16)
  (global $__stack_pointer (;0;) (mut i32) i32.const 1048576)
  (global (;1;) i32 i32.const 1048576)
  (global (;2;) i32 i32.const 1048576)
  (export "memory" (memory 0))
  (export "magic_num" (func $magic_num))
  (export "__data_end" (global 1))
  (export "__heap_base" (global 2))
  (func $magic_num (;0;) (type 0) (result i32)
    i32.const 42
  )
  (@producers
    (language "Rust" "")
    (processed-by "rustc" "1.86.0 (05f9846f8 2025-03-31)")
  )
  (@custom "target_features" (after code) "\04+\0amultivalue+\0fmutable-globals+\0freference-types+\08sign-ext")
)

The WebAssembly text format uses S-expressions which makes it similar to Scheme/Lisp. There is lots of metadata text with the @producers and @custom which could probably be safely removed if I knew which compiler options to pass in.

There are many unexpected expressions, but a few stand out to me:

  • The overall (module $return_number.wasm ...) expression wraps all of the other expressions, indicating they belong to a single module.

  • The (type (;0;) (func (result i32))) expression seems to declare a function type which returns a single i32, matching the expected "return number" function.

  • The (export "magic_num" (func $magic_num)) expression matches the expected exported function name.

  • The (func $magic_num (;0;) (type 0) (result i32) i32.const 42) seems to be equivalent to the expected function body. Assuming $magic_num is just an identifier, the previous export expression is exporting this function definition.

If I read the official spec and the Textual Format, I could probably understand more about the module, but I'm satisfied for now.
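
One detail from the binary format ties the text listing back to the hexdump. After the 8-byte preamble, a module is a sequence of sections, each made of a one-byte id, the content size as an unsigned LEB128 integer, and then the contents. The sketch below only lists the sections and skips their contents. read_u32_leb128 and list_sections are hypothetical names, and the error handling is deliberately simplistic.

```rust
/// Decodes an unsigned LEB128 integer starting at `pos`.
/// Returns the value and the position just past it.
fn read_u32_leb128(bytes: &[u8], mut pos: usize) -> Result<(u32, usize), String> {
    let mut value: u32 = 0;
    let mut shift: u32 = 0;
    loop {
        let byte = *bytes.get(pos).ok_or("unexpected end of input")?;
        pos += 1;
        // The low 7 bits carry data; the high bit says whether more bytes follow.
        value |= u32::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Ok((value, pos));
        }
        shift += 7;
        if shift >= 32 {
            return Err("LEB128 value too large for u32".to_string());
        }
    }
}

/// Lists `(section id, content size)` for each section in a module.
pub fn list_sections(module: &[u8]) -> Result<Vec<(u8, u32)>, String> {
    // Skip the 8-byte preamble (magic number and version) without checking it.
    let mut pos = 8;
    let mut sections = Vec::new();
    while pos < module.len() {
        let id = module[pos];
        let (size, body_start) = read_u32_leb128(module, pos + 1)?;
        sections.push((id, size));
        // Jump over the contents without interpreting them.
        pos = body_start + size as usize;
    }
    if pos != module.len() {
        return Err("section runs past the end of the module".to_string());
    }
    Ok(sections)
}
```

Walking the hexdump by hand with the same rules, I count ten sections: type (id 1), function (3), table (4), memory (5), global (6), export (7), code (10), and three custom sections (id 0) named name, producers, and target_features. The contents of the three custom sections add up to 210 of the 333 bytes, while the code section holding i32.const 42 needs only 6.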

Conclusion

I have decided on a simple Rust function and generated the equivalent Wasm binary bytecode for it. The next step should be parsing the Wasm bytecode.

I wrote about a minimal "success" path in generating the Wasm bytecode, but it may be helpful to try to omit a step (such as not using extern "C" in the function declaration) and see what the resulting Wasm bytecode would be.

The source code for this article is located in the WasThat repository under the examples/return-number directory.

Rust 1.86.0 was used to build the code.