Estimating callGasLimit for UserOperations

A deep dive into all the known approaches used for calculating a UserOperation's callGasLimit and their trade-offs.

The callGasLimit (CGL) specifies the maximum amount of gas available for the execution phase of a UserOperation. Within the ERC-4337 standard, all account actions are encoded in the callData and allocated gas up to the callGasLimit.

Calculating a reliable estimate for CGL can be especially tricky. A good heuristic should work for any arbitrary UserOperation and return a value that won’t cause an out-of-gas (OOG) error once submitted. If a UserOperation cannot be executed, it should instead return the relevant revert data. In this article we review several approaches to estimating CGL that have been used by both Stackup and other bundler clients to date.

The complications of estimating callGasLimit

Out of all known edge cases, the one that we’re most interested in is the EVM’s 63/64 rule. Since EIP-150, the CALL opcode (and all its variants) cannot forward more than 63/64 of the remaining gas to the called frame. This means that as a transaction’s call stack gets deeper, more gas must be reserved upfront to meet the gas requirements of the nested call frames.
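To make the rule concrete, here is a minimal Go sketch of EIP-150’s “all but one 64th” cap. The function names are hypothetical, and the sketch ignores both the gas each frame spends on its own opcodes and the CALL’s own costs; it only shows how the cap compounds as the call stack deepens.

```go
package main

import "fmt"

// maxForwardable returns the most gas a CALL-family opcode can pass to a
// child frame under EIP-150: all but one 64th of the gas available at the
// call site, with the withheld 64th rounded down by integer division.
func maxForwardable(available uint64) uint64 {
	return available - available/64
}

// availableAtDepth forwards the maximum allowed gas at every level, starting
// with `start` gas at depth 1, and returns what reaches `depth`. It ignores
// the gas each frame spends on its own opcodes, so it only illustrates how
// the 63/64 cap compounds with depth.
func availableAtDepth(start uint64, depth int) uint64 {
	gas := start
	for d := 1; d < depth; d++ {
		gas = maxForwardable(gas)
	}
	return gas
}

func main() {
	fmt.Println(maxForwardable(640))            // 630
	fmt.Println(availableAtDepth(1_000_000, 3)) // 968995
}
```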

The current approaches to estimating callGasLimit

Due to this rule, many approaches to estimating gas limits for EOA transactions are based on binary search: they narrow down on the minimum gas limit that can execute the transaction without an OOG exception.

In the context of ERC-4337, estimating CGL with binary search can work too. But as we’ll see, each approach runs into its own constraint.

RPC to eth_estimateGas

This was the initial approach used by most bundler implementations. In this case we call the underlying node’s eth_estimateGas RPC method with the following parameters:

  • From: EntryPoint address
  • To: Sender address
  • Data: UserOperation’s callData

Under the hood, the node does a binary search where each iteration makes a call with the given parameters. It will eventually narrow down on a CGL or throw an exception if the transaction cannot be completed with the latest state.

This approach works well except for an account’s first UserOperation. At this stage the contract account is not yet deployed, which means the underlying simulation cannot execute the required code paths to get a reliable measure of the CGL.

Additionally, this approach does not take into account any relevant state changes that may occur during verification, for example paying the prefund during verification and then not having enough ETH left over for transfers during execution.

Binary search with eth_call to simulateHandleOp

To solve these problems, the EntryPoint has a helper method called simulateHandleOp, which allows off-chain entities to simulate a UserOperation’s verification and execution. On the first transaction, the simulation deploys the contract account during verification so that the required code and state are in place to get a reliable measure of CGL during execution.

simulateHandleOp function from EntryPoint v0.6

Above is the implementation of the EntryPoint’s simulateHandleOp method. Notice that any errors encountered during _executeUserOp are not returned by simulateHandleOp. Instead, the EntryPoint emits information on reverts in a log called UserOperationRevertReason.

UserOperationRevertReason event from EntryPoint v0.6

The problem is that eth_call does not return emitted logs. Even if we were to run a binary search for CGL here, there would be no way of telling whether the callData completed successfully or not.

A workaround in Solidity

To bypass this problem, it’s possible to use eth_call state overrides to replace the EntryPoint with a proxy contract that runs the binary search within Solidity. This gives us the ability to return the final result of the callData execution: if the transaction completes successfully, the search narrows down on an optimal CGL value; otherwise, the revert reason is returned to the eth_call request.

This approach was implemented by Alchemy’s Rundler and is discussed in further detail here.

Binary search with debug_traceCall to simulateHandleOp

An alternative approach is to run the same binary search using debug_traceCall instead of eth_call. This lets us instrument custom tracers that can retrieve any emitted logs or reverts during simulation, which means we can detect whether the execution completes successfully or throws an exception.

With a custom tracer we can use a successful execution or an OOG exception as the two bounds for running a binary search.

  • If successful: use the current CGL as the new upper bound and try the midpoint of the lower half next.
  • Else if OOG: use the current CGL as the new lower bound and try the midpoint of the upper half next.
  • Else if any other exception: return the error.

This will eventually narrow down to a reliable CGL value. The drawback is that debug_traceCall is expensive. Given a maximum gas limit of 30 million, this works out to 24 iterations in the worst case, which adds a significant amount of latency for users.

Backtracking with debug_traceCall to simulateHandleOp

To optimize this, we’ll need to find a way for our tracer to do more within a single call. This leads us to the current approach of using backtracking to reduce the number of total simulations.

A simple call stack, 3 levels deep

Let’s take a simple EVM call stack that’s 3 levels deep. The length of the bar represents the gas used at each depth. With tracing, we can easily observe the gas used at each frame on exit. The naive approach here would be to set the CGL as the gas used at depth 1. But remember that gas used is not equal to gas required, which is why all past approaches used binary search.

Breakdown of gas

Now consider the breakdown of gas used. The blue is the gas used by operations within that frame and the green is the total gas used by all nested frames. Recall that nested frames can only use up to 63/64 of the gas remaining at the current depth.

For simplicity, we’ll assume that the total gas used at depth 1 is equal to 1000 and the blue portion is equal to 360. This leaves the gas used by the nested frame at depth 2 equal to 640 (i.e. 1000 - 360). By setting the CGL to 1000, the gas remaining at the point of the nested call is exactly 640. However, the nested frame can only receive 63/64 of the remaining gas, which is 630 (640 - 640/64). This leads to an OOG error.

Annotated with gas required calculation

Given the 63/64 rule, we have to add a buffer so that 63/64 of the remaining gas is enough to execute the nested frame. In the ideal case, this can be done with a single trace. On exit of each frame, we subtract the green portion (gas used by the nested frame) and replace it with the gas required by the nested frame scaled by 64/63, so that enough gas is still forwarded after the EVM withholds 1/64. This value is then used in the same calculation as we work our way back toward depth 1. The gas required at depth 1 is our CGL.

We can generalize this to the following equation, where N is the current depth:

$$\text{gasRequired}_N = \text{gasUsed}_N - \text{gasUsed}_{N+1} + \frac{64 \cdot \text{gasRequired}_{N+1}}{63}$$
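Applied to a single chain of nested calls, the equation can be sketched in Go as below. The sketch assumes each frame spends its own gas before making its nested call (the worst case), uses integer division for the 64/63 scaling, and treats the deepest frame’s gas used as its gas required. With the numbers from the example above, a CGL of 1010 leaves 650 gas at the nested call, and 650 - 650/64 = 640 is forwarded, exactly what depth 2 used.

```go
package main

import "fmt"

// gasRequiredChain applies the backtracking formula to a single chain of
// nested calls. gasUsed[0] is the gas used at depth 1 (the account's
// execution), gasUsed[1] at depth 2, and so on; each value includes the gas
// used by the frames nested inside it, so gasUsed[n] >= gasUsed[n+1].
func gasRequiredChain(gasUsed []uint64) uint64 {
	if len(gasUsed) == 0 {
		return 0
	}
	// The deepest frame makes no further calls: what it used is what it needs.
	required := gasUsed[len(gasUsed)-1]
	// Backtrack toward depth 1, as a tracer would on each frame exit: drop the
	// nested frame's gas used and add back its requirement scaled by 64/63.
	for n := len(gasUsed) - 2; n >= 0; n-- {
		required = gasUsed[n] - gasUsed[n+1] + required*64/63
	}
	return required
}

func main() {
	fmt.Println(gasRequiredChain([]uint64{1000, 640}))      // 1010
	fmt.Println(gasRequiredChain([]uint64{1000, 640, 300})) // 1014
}
```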

Dealing with sibling call stacks

Sibling call stacks

One edge case to be aware of is frames with multiple nested call stacks. As we backtrack, we need to keep values related to the nested frame for our calculation of $\text{gasRequired}_N$ (i.e. the values for $\text{gasUsed}_{N+1}$ and $\text{gasRequired}_{N+1}$). On exiting each frame, it’s important that these values are not accumulated, to prevent double-counting bugs on re-entry to higher depths.
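One way to keep those values scoped is to hold one accumulator per open frame, mirroring a tracer’s enter and exit hooks, so that a parent’s child totals are discarded when the parent exits. The Go sketch below does this. Summing the buffered requirements of all direct children is our assumption (a conservative one), not necessarily how Stackup’s tracer combines siblings. The example has depth 1 using 1000 gas, first calling A (640 gas, which itself calls C using 300), then calling B (200 gas).

```go
package main

import "fmt"

// frameAcc holds the values a frame collects from its direct children.
type frameAcc struct {
	childUsed     uint64 // total gas used by children that have exited
	childRequired uint64 // their gas required, each scaled by 64/63
}

// backtracker mirrors a tracer's enter/exit hooks. Every open frame owns its
// own accumulator, so child totals are dropped when their parent exits and
// cannot be double counted when a later sibling opens a new subtree.
type backtracker struct {
	stack []frameAcc
	root  uint64 // gas required at depth 1, set when the outermost frame exits
}

func (b *backtracker) enter() {
	b.stack = append(b.stack, frameAcc{})
}

// exit closes the innermost frame; gasUsed includes the gas of its children.
func (b *backtracker) exit(gasUsed uint64) {
	top := b.stack[len(b.stack)-1]
	b.stack = b.stack[:len(b.stack)-1]
	// Replace the children's gas used with their buffered requirements.
	required := gasUsed - top.childUsed + top.childRequired
	if len(b.stack) == 0 {
		b.root = required
		return
	}
	parent := &b.stack[len(b.stack)-1]
	parent.childUsed += gasUsed
	parent.childRequired += required * 64 / 63
}

func main() {
	b := &backtracker{}
	b.enter()    // depth 1
	b.enter()    // A
	b.enter()    // C
	b.exit(300)  // C
	b.exit(640)  // A
	b.enter()    // B
	b.exit(200)  // B
	b.exit(1000) // depth 1
	fmt.Println(b.root) // 1017
}
```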

Static gas discounts

Backtracking is a neat optimization, although we still can’t rule out binary search altogether. The current backtracking method assumes that the CALL receives the full 63/64 of the remaining gas it is allowed, and that this is exactly enough to execute the nested frame. But some contracts pass the remaining gas minus a static discount.

For example:

target.someMethod{ gas: gasleft() - discount }();

Because of this discount, the gas actually passed may not be enough to execute the nested frame. For cases like these, we can still fall back to binary search. However, we can set much tighter bounds and find the correct CGL with fewer iterations.

Performance compared to binary search only

In practice this still takes 2 to 12 tracer calls. The first performs the initial backtracking, and the second re-simulates with the resulting estimate to check that everything is OK. If the second call passes, we stop there. However, if it ends in an OOG error, we fall back to binary search, which can cost up to 10 additional traces.

This roughly halves the number of iterations in the worst case, from 24 to 12. If falling back to binary search is unavoidable, any further performance gains will need to come from migrating tracers from JS to Go, which would then need to be compiled into the underlying Geth client.
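The overall flow (one backtracking trace, one verification trace, then a bounded binary search) can be sketched as follows. The tracer interface and fake implementation are hypothetical, and the upper bound used for the fallback (estimate + 1024) is an arbitrary illustrative choice that is assumed to succeed, since the article does not specify the bounds. With 10 halvings over a 1024-gas window, the worst case is the 12 traces described above.

```go
package main

import "fmt"

// outcome is the result a (hypothetical) tracer reports for one simulation.
type outcome int

const (
	success outcome = iota
	outOfGas
	reverted
)

// tracer abstracts two hypothetical tracer modes: a backtracking trace that
// returns an estimate, and a plain simulation at a fixed callGasLimit.
type tracer interface {
	backtrack() (uint64, error)
	simulate(cgl uint64) (outcome, error)
}

// estimateCGL runs the backtracking trace, re-simulates with its estimate and,
// only if that runs out of gas, binary searches (estimate, upper] using at
// most maxFallback extra traces. upper is assumed to succeed; it is not checked.
func estimateCGL(t tracer, upper uint64, maxFallback int) (uint64, int, error) {
	traces := 1
	est, err := t.backtrack()
	if err != nil {
		return 0, traces, err
	}
	traces++
	out, err := t.simulate(est)
	switch out {
	case success:
		return est, traces, nil // backtracking alone was enough
	case reverted:
		return 0, traces, err
	}
	// The estimate ran out of gas (e.g. a static gas discount): it is a lower bound.
	lo, hi := est, upper
	for i := 0; i < maxFallback && hi-lo > 1; i++ {
		mid := lo + (hi-lo)/2
		traces++
		out, err = t.simulate(mid)
		switch out {
		case success:
			hi = mid
		case outOfGas:
			lo = mid
		default:
			return 0, traces, err
		}
	}
	return hi, traces, nil
}

// fakeTracer returns a fixed estimate and succeeds at or above `required`.
type fakeTracer struct{ estimate, required uint64 }

func (f fakeTracer) backtrack() (uint64, error) { return f.estimate, nil }

func (f fakeTracer) simulate(cgl uint64) (outcome, error) {
	if cgl >= f.required {
		return success, nil
	}
	return outOfGas, nil
}

func main() {
	cgl, n, _ := estimateCGL(fakeTracer{estimate: 100_000, required: 99_000}, 101_024, 10)
	fmt.Println(cgl, n) // 100000 2
	cgl, n, _ = estimateCGL(fakeTracer{estimate: 100_000, required: 100_700}, 101_024, 10)
	fmt.Println(cgl, n) // 100700 12
}
```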

Results compared to binary search only

Aside from the main benefit of reducing tracer calls, another positive side effect is a tighter CGL value compared to binary search. To limit the number of iterations, most binary search approaches stop once the search range falls below a certain size, which results in an estimate slightly higher than what is required.

Tracer vs eth_estimate

For an already deployed account with a call stack 3 levels deep, the tracer returns an estimate ~14.8K gas lower than eth_estimateGas. Although this difference is not large enough to be a constraint, it’s worth calling out since it gets our prefund closer to the minimum gas required. This means more ETH can stay in the user’s account rather than being refunded to the EntryPoint deposit and carried over to the next transaction.

Summary

The latest implementation of Stackup Bundler uses this backtracking approach. In practice, the implementation can be challenging: it’s much easier to miss edge cases that plain binary search would have absorbed through its repeated iterations.

It would of course have been much simpler to stick with binary search only, or even eth_estimateGas. But given the context of ERC-4337 and the constraints of past approaches, we think consolidating on a more efficient tracer is the way to go when using debug_traceCall.

Our implementation of the tracer is also open source. Since this is a fairly novel approach to CGL estimation, we don’t expect it to be perfect. That said, it’s a good starting point for further optimizing CGL estimation through tracing.