Deciding on Goals
An interpreter must parse, validate, and then execute instructions. For a Ruby interpreter, the instructions are statements in the Ruby language. For a Wasm interpreter, the instructions are Wasm bytecode.
I tend to equate an interpreter's implementation complexity with the complexity of the instructions. Glancing at the Wasm spec, there are hundreds of instructions with all sorts of behaviors. Yikes.
When faced with overwhelming numbers, I start to think about the hundreds of decisions ahead. Should I write the parser first? Can I skip validation just to get things working? Do I know how SIMD works? How do I test if the code works?
Taking a breath, I start to formulate a plan. I need to define a "minimal viable product" for this project. For this series of articles, the product is a working interpreter.
Instead of expecting to handle all Wasm bytecode, I want to be able to execute the simplest programs I can think of and then incrementally add support for more instructions.
Fortunately, not every instruction appears in every Wasm program. A minimal viable product could be an interpreter which handles only a subset of all the possible instructions.
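As a rough sketch of what "a subset" could look like in Rust, the interpreter can handle the few instructions it knows and report an error for everything else. The Instr and ExecError types and the execute function are hypothetical names for illustration, not code from this series, and the value stack is an assumption about how the interpreter will eventually be structured.

```rust
/// A deliberately tiny subset of Wasm instructions (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Instr {
    /// Push a constant onto the value stack.
    I32Const(i32),
    /// Pop two values and push their sum.
    I32Add,
    /// Any opcode the interpreter does not handle yet.
    Unsupported(u8),
}

/// Reasons execution can stop early (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum ExecError {
    UnsupportedOpcode(u8),
    StackUnderflow,
}

/// Runs the instructions in order and returns the value left on top of the stack, if any.
pub fn execute(code: &[Instr]) -> Result<Option<i32>, ExecError> {
    let mut stack: Vec<i32> = Vec::new();
    for instr in code {
        match *instr {
            Instr::I32Const(value) => stack.push(value),
            Instr::I32Add => {
                let rhs = stack.pop().ok_or(ExecError::StackUnderflow)?;
                let lhs = stack.pop().ok_or(ExecError::StackUnderflow)?;
                // Wasm integer addition wraps around on overflow.
                stack.push(lhs.wrapping_add(rhs));
            }
            Instr::Unsupported(opcode) => {
                return Err(ExecError::UnsupportedOpcode(opcode));
            }
        }
    }
    Ok(stack.pop())
}
```

Supporting a new instruction then means adding one variant and one match arm, which keeps each increment small.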
Simple Programs
There are many definitions of a "simple" program, so here are a few examples:
Hello World
pub fn hello() {
println!("Hello world!");
}
Printing Hello World! to the screen is the standard "new to the language"
program. Unfortunately, it is not ideal for this project. Strings are inherently
more complex than any primitive type such as an i32 or bool. Performing I/O
(or in this case just the output) can require thousands of instructions. Error
handling will add even more complexity.
For Wasm programs specifically, there is no direct access to system calls, so even printing to the console does not work out of the box. "Hello World!" seems too complex for the first program to interpret.
Adding Two Numbers
Another common program is a function which adds two numbers. Something as simple as:
pub fn add(x: i32, y: i32) -> i32 {
x + y
}
It is far simpler than "Hello World!", but it requires passing in 2 arguments to the function.
Return a Number
When starting to build an interpreter, it is probably best to use the simplest function possible. There are few programs as simple as a function which returns a constant number.
pub fn magic_num() -> i32 {
42
}
It is an absolutely useless and trivial function, but it is the simplest example to start with.
Generating a Wasm Module
Now that I have decided on the function, I need to generate the actual Wasm bytecode for the interpreter to execute. I'm not well versed in the Wasm binary or textual formats, so I decided to write the function in a Rust library and generate the Wasm bytecode from it.
Create the Rust Library
Creating a new Rust library crate:
cargo new --lib return-number
And in the src/lib.rs, I changed the code to:
pub fn magic_num() -> i32 {
42
}
Building the crate should be successful:
cargo build
Supporting Wasm
In order to support Wasm as a compile target, a few things must be done to the code.
Add cdylib as a crate-type
In Cargo.toml, the crate type needs to be changed to cdylib.
[lib]
crate-type = ["cdylib"]
The cdylib value indicates a dynamic system library should be built when
compiling the code. When targeting wasm32-unknown-unknown, a Wasm bytecode
module will be created.
If other artifacts still need to be supported (e.g. the crate is depended on by
another Rust crate so it should be used like a normal Rust library), then
crate-type can take multiple values like:
[lib]
crate-type = ["cdylib", "lib"]
For more information about crate-type, see the Rust Reference on
Linkage.
Exporting the Function
For a Rust library, pub on a global function indicates the function may be called by
another crate (assuming all the module visibility rules are also followed).
Unfortunately, declaring a public function for other use cases is not as simple. Code is generally very inflexible across external boundaries. Suppose there is a function declared like:
fn add(x: i32, y: i32) -> i32 {
x + y
}
Calling the function from another Rust function is easy. However, suppose there is a program which is written in a different language like C or Python and it needs to call the Rust function.
It is possible for Rust to invoke code written in a different language, and it is possible for another language to invoke Rust code, but there are restrictions and rules which must be followed. A calling convention (akin to a protocol) dictates details such as how the arguments are passed and what state the CPU and memory must be in when the function returns. The two relevant topics are foreign function interfaces (FFI) and application binary interfaces (ABI), which are not to be confused with application programming interfaces (API). If either term is new, I strongly recommend reading more about them.
Back to the problem at hand, since the magic_num function is intended to be
used in a Wasm module, it needs to be exposed with a stable ABI. The function
should be declared with extern and C as the ABI type like:
#[unsafe(no_mangle)]
pub extern "C" fn magic_num() -> i32 {
42
}
The no_mangle attribute prevents the compiler from mangling the function's
name, so the function is exported under exactly the name magic_num.
The function looks noisier now with the extra keyword and attribute.
Unfortunately, the "noise" is required to maintain a stable external interface.
If the Rust code also supported being called by C code, it would require
similar declarations.
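To make the calling convention point a little more concrete, here is a small sketch showing that the ABI is part of a Rust function's type. The rust_abi_num function and the call_through_c_abi helper are hypothetical names for illustration. The no_mangle attribute is left off here because it affects only the exported symbol name, not the calling convention.

```rust
/// Same body as before, declared with the C ABI.
pub extern "C" fn magic_num() -> i32 {
    42
}

/// The same function using Rust's default ABI (hypothetical, for comparison).
pub fn rust_abi_num() -> i32 {
    42
}

/// Hypothetical helper: accepts only functions that use the C calling convention.
pub fn call_through_c_abi(f: extern "C" fn() -> i32) -> i32 {
    f()
}
```

Passing magic_num to call_through_c_abi compiles, while passing rust_abi_num is rejected with a type mismatch, because fn() -> i32 and extern "C" fn() -> i32 are different types.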
Building for Wasm
Now the code should be buildable as a Wasm module:
cargo build --target wasm32-unknown-unknown --release
There are many target platforms supported by rustc.
The --release flag enables optimizations, in the hope that the generated module
has no more instructions than a "debug" build would.
If there is an error saying that the wasm32-unknown-unknown target does not
exist (or some std library does not exist), try:
rustup target add wasm32-unknown-unknown
And then retry the cargo build command.
In the end, there should be a file in:
target/wasm32-unknown-unknown/release/return_number.wasm
Congrats, there is a generated Wasm module.
Inspecting the Wasm Module
The return_number.wasm should be a small binary file. On my system using Rust
1.86.0, it is 333 bytes.
333 bytes still seems to be too many bytes for a function which just returns a constant. A function returning a constant integer can be implemented with 1 instruction on many CPU architectures.
Printing out the Wasm file's contents in hexadecimal:
xxd return_number.wasm
00000000: 0061 736d 0100 0000 0105 0160 0001 7f03 .asm.......`....
00000010: 0201 0004 0501 7001 0101 0503 0100 1006 ......p.........
00000020: 1903 7f01 4180 80c0 000b 7f00 4180 80c0 ....A.......A...
00000030: 000b 7f00 4180 80c0 000b 0731 0406 6d65 ....A......1..me
00000040: 6d6f 7279 0200 096d 6167 6963 5f6e 756d mory...magic_num
00000050: 0000 0a5f 5f64 6174 615f 656e 6403 010b ...__data_end...
00000060: 5f5f 6865 6170 5f62 6173 6503 020a 0601 __heap_base.....
00000070: 0400 412a 0b00 3c04 6e61 6d65 0013 1272 ..A*..<.name...r
00000080: 6574 7572 6e5f 6e75 6d62 6572 2e77 6173 eturn_number.was
00000090: 6d01 0c01 0009 6d61 6769 635f 6e75 6d07 m.....magic_num.
000000a0: 1201 000f 5f5f 7374 6163 6b5f 706f 696e ....__stack_poin
000000b0: 7465 7200 4d09 7072 6f64 7563 6572 7302 ter.M.producers.
000000c0: 086c 616e 6775 6167 6501 0452 7573 7400 .language..Rust.
000000d0: 0c70 726f 6365 7373 6564 2d62 7901 0572 .processed-by..r
000000e0: 7573 7463 1d31 2e38 362e 3020 2830 3566 ustc.1.86.0 (05f
000000f0: 3938 3436 6638 2032 3032 352d 3033 2d33 9846f8 2025-03-3
00000100: 3129 0049 0f74 6172 6765 745f 6665 6174 1).I.target_feat
00000110: 7572 6573 042b 0a6d 756c 7469 7661 6c75 ures.+.multivalu
00000120: 652b 0f6d 7574 6162 6c65 2d67 6c6f 6261 e+.mutable-globa
00000130: 6c73 2b0f 7265 6665 7265 6e63 652d 7479 ls+.reference-ty
00000140: 7065 732b 0873 6967 6e2d 6578 74 pes+.sign-ext
Looking at the data, I confirm I cannot read binary code like Neo from the
Matrix. There are some strings like magic_num which make me hopeful that the
module does contain the expected code, but it is hard to understand anything
else.
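One part of the dump is easy to decode by hand: the first eight bytes. Every Wasm binary starts with the magic bytes 00 61 73 6d (the string "\0asm") followed by the version number 1 as a little-endian 32-bit integer, which matches 0061 736d 0100 0000 above. A minimal sketch of that check in Rust, where has_wasm_header is a hypothetical helper name:

```rust
/// The first four bytes of every Wasm binary: "\0asm".
const WASM_MAGIC: [u8; 4] = [0x00, 0x61, 0x73, 0x6d];
/// The binary format version, stored as a little-endian u32.
const WASM_VERSION: u32 = 1;

/// Returns true if `bytes` starts with the Wasm magic number and version 1.
pub fn has_wasm_header(bytes: &[u8]) -> bool {
    if bytes.len() < 8 {
        return false;
    }
    let version = u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
    bytes.starts_with(&WASM_MAGIC) && version == WASM_VERSION
}
```

An interpreter could run a check like this before attempting to parse anything else and reject files which are not Wasm modules.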
Fortunately, the WebAssembly community has developed many tools to inspect Wasm bytecode. Some of the tools convert the Wasm binary bytecode into the official Wasm textual format (WAT).
I installed wasm-tools on a whim:
cargo install --locked wasm-tools
Then, I ran:
wasm-tools print return_number.wasm
And got back:
(module $return_number.wasm
(type (;0;) (func (result i32)))
(table (;0;) 1 1 funcref)
(memory (;0;) 16)
(global $__stack_pointer (;0;) (mut i32) i32.const 1048576)
(global (;1;) i32 i32.const 1048576)
(global (;2;) i32 i32.const 1048576)
(export "memory" (memory 0))
(export "magic_num" (func $magic_num))
(export "__data_end" (global 1))
(export "__heap_base" (global 2))
(func $magic_num (;0;) (type 0) (result i32)
i32.const 42
)
(@producers
(language "Rust" "")
(processed-by "rustc" "1.86.0 (05f9846f8 2025-03-31)")
)
(@custom "target_features" (after code) "\04+\0amultivalue+\0fmutable-globals+\0freference-types+\08sign-ext")
)
The WebAssembly text format uses S-expressions, which make it look similar
to Scheme/Lisp. There is a lot of metadata in the @producers and
@custom expressions, which could probably be safely removed if I knew which
compiler options to pass in.
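In the binary format, that metadata lives in custom sections. After the eight-byte header, a module is a sequence of sections, each made of a one-byte id, a size encoded as an unsigned LEB128 integer, and then that many bytes of content. Custom sections use id 0. The sketch below walks the sections and returns each id with its size, which is enough to see how much of a module is metadata. The names read_uleb128_u32 and list_sections are hypothetical, and the function does not verify the header itself.

```rust
/// Reads an unsigned LEB128 integer that must fit in a u32, advancing `pos`.
fn read_uleb128_u32(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let mut result: u64 = 0;
    let mut shift = 0u32;
    loop {
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        result |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return u32::try_from(result).ok();
        }
        shift += 7;
        if shift >= 35 {
            // A u32 never needs more than five LEB128 bytes.
            return None;
        }
    }
}

/// Returns (section id, content size) for each section after the 8-byte header.
/// Returns None if the input is truncated or a size is malformed.
pub fn list_sections(module: &[u8]) -> Option<Vec<(u8, u32)>> {
    if module.len() < 8 {
        return None;
    }
    let mut pos = 8; // skip the magic number and version
    let mut sections = Vec::new();
    while pos < module.len() {
        let id = module[pos];
        pos += 1;
        let size = read_uleb128_u32(module, &mut pos)?;
        let end = pos.checked_add(size as usize)?;
        if end > module.len() {
            return None;
        }
        sections.push((id, size));
        pos = end;
    }
    Some(sections)
}
```

In the hex dump above, the bytes 00 3c at offset 0x75 start a custom section of 60 bytes whose content begins with the name "name".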
There are many unexpected expressions, but a few stand out to me:
- The overall (module $return_number.wasm ...) expression indicates that all of the expressions belong to a single module.
- The (type (;0;) (func (result i32))) expression seems to declare a function type which returns a single i32, matching the expected "return number" function.
- The (export "magic_num" (func $magic_num)) expression matches the expected exported function name.
- The (func $magic_num (;0;) (type 0) (result i32) i32.const 42) expression seems to be equivalent to the expected function body. Assuming $magic_num is just an identifier, the previous export expression is exporting this function definition.
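The function body can also be matched back to the hex dump. The bytes 41 2a 0b at offset 0x72 are the i32.const opcode (0x41), the constant 42 encoded as a signed LEB128 integer (0x2a), and the end opcode (0x0b) which closes the function body. Here is a minimal sketch of decoding just those two instructions. The Op type and the read_sleb128_i32 and decode_expr functions are hypothetical names, and any other opcode is simply treated as unsupported.

```rust
/// The only two instructions this sketch understands (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Op {
    I32Const(i32),
    End,
}

/// Reads a signed LEB128 integer that must fit in an i32, advancing `pos`.
fn read_sleb128_i32(bytes: &[u8], pos: &mut usize) -> Option<i32> {
    let mut result: i64 = 0;
    let mut shift = 0u32;
    loop {
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        result |= i64::from(byte & 0x7f) << shift;
        shift += 7;
        if byte & 0x80 == 0 {
            // Bit 6 of the final byte is the sign bit; extend it if set.
            if byte & 0x40 != 0 {
                result |= -1i64 << shift;
            }
            return i32::try_from(result).ok();
        }
        if shift >= 35 {
            // An i32 never needs more than five LEB128 bytes.
            return None;
        }
    }
}

/// Decodes a byte sequence containing only i32.const and end instructions.
/// Returns None for unsupported opcodes or truncated input.
pub fn decode_expr(bytes: &[u8]) -> Option<Vec<Op>> {
    let mut pos = 0;
    let mut ops = Vec::new();
    while pos < bytes.len() {
        let opcode = bytes[pos];
        pos += 1;
        match opcode {
            0x41 => ops.push(Op::I32Const(read_sleb128_i32(bytes, &mut pos)?)),
            0x0b => ops.push(Op::End),
            _ => return None,
        }
    }
    Some(ops)
}
```

The same decoder handles the global initializers: the bytes 41 80 80 c0 00 0b from the dump decode to i32.const 1048576 followed by end, which agrees with the printed text.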
If I read the official spec and the Textual Format, I could
probably understand more about the module, but I'm satisfied for now.
Conclusion
I have decided on a simple Rust function and generated the equivalent Wasm bytecode for it. The next step should be parsing the Wasm bytecode.
I wrote about a minimal "success" path in generating the Wasm bytecode, but it
may be helpful to try to omit a step (such as not using extern "C" in the
function declaration) and see what the resulting Wasm bytecode would be.
The source code for this article is located in the WasThat repository
under the examples/return-number directory.
Rust 1.86.0 was used to build the code.