Rust via Desugarings
This book proposes to explain the meaning of a piece of Rust code by successively desugaring it into into a simpler subset of Rust. At the end of this process we reach a subset simple enough that it can hopefully be described formally.
I'm writing this book in the context of two projects that are working towards formalizing Rust:
- MiniRust aims to specify the operational semantics of Rust, i.e. what it means to execute Rust, including a precise delineation of what is and isn't Undefined Behavior;
- a-mir-formality aims to describe, among other things, the trait and type system of Rust (including borrow-checking).
Both of these share the limitation of working only with function bodies in a very simplified+precise form (MIR). I'm writing this book as a complement to those two, filling the gap of how to get from real Rust code to this simplified+precise form1.
Goals
I have three goals in writing this, in order of importance:
-
Explanatory: I would like this book to be a reference to reach for when one doesn't understand a piece of Rust code or an error message. I care more to show where and what kind of transformations happen rather than exactly how they happen.
-
Specificatory: I want this book to be sufficiently complete (if not precise) that we could imagine building a specification for Rust on top of it. There should be no unknown unknowns when reading this. When details are skimmed over there should be a link to a source of truth about what really happens.
-
Implementablory: I want this to be close enough to the reality of the compiler that we could make a rustc-based tool that outputs the outcome of some of these desugaring steps. As we've learned many times, the best teaching tool is one you can interact with.
Non-Goals
This book focuses on the meaning of function bodies: statements, expressions, control-flow. It does not explain e.g. how typechecking works, anything about traits, type layouts, constant evaluation, linking crates, etc. In fact it heavily relies on types and trait data being known-facts we can make use of.
This book does not aim to be an actual specification document2 : duplicating the contents of the Reference would be a waste of effort; I see this book more as a guide for how to read the Reference.
Caveats
In order for each step to produce valid and understandable Rust code, I took the liberty to assume the existence of a number of language features that don't exist in real Rust. See the Extra Language Features chapter for details.
This also mostly doesn't include async, in large parts because I'm not very familiar with the
details of how it's implemented.
While I do my best to be precise and correct, this is just a fun project I'm doing with my current knowledge of Rust. This book will contain mistakes, imprecisions and omissions; please open an issue or PR if you notice any!
-
The majority of the info in this book is present in one way or another in the Rust Reference. The point of this book is in part to know where to start and which Reference sections are relevant to a given piece of Rust code. Also the Reference isn't executable even in theory, unlike this book. ↩
-
I do wonder if it would make sense for the Reference to be structured in that way. ↩
Desugaring Steps
Here is a birds-eye view of the transformations we'll be doing:
- Resolve names and expand macros;
- Lower surface control-flow sugar (
for,while,?,if let/let else, etc.) into a handful of constructs (loop,if,break/continue/return); - Make implicit operations explicit: autoderef/autoref, coercions, method resolution, operator overloading, match ergonomics;
- Materialize temporaries so that every intermediate value gets a name and a lifetime.
- Turn closures into plain structs;
- Make drops explicit.
Each step must produce an equivalent program, i.e. the desugared program compiles if and only if the original one does, and both have the same semantics. Some of the desugaring steps fails to enforce that; this is noted in their Discussion section.
At the end of all that, we get a program in a very limited and precise subset of Rust. See The Final Language for details and discussion.
Name Resolution & Macro Expansion
Name Resolution is the process by which Rust figures out what every identifier (local variable, function call, module, import, etc) refers to. Macro expansion is the process of running a macro and replacing the macro call with its output.
The two are intertwined: a macro may emit new macros, which will affect the state of name resolution for macros. For instance:
macro_rules! define_bump { ($name:ident) => { macro_rules! $name { ($x:expr) => { $x + 1 } } }; } // Name resolution figures out that this points to the macro above. define_bump!(bump); fn main() { // We can't resolve this name until after expanding the `define_bump!` call above. let _ = bump!(3); }
In fact the whole process is stateful: we must expand items in declaration order and I think even the order in which we explore modules can have consequences.
To represent the output of name resolution and hygiene, as part of our desugaring we expand all names to full paths, for any name where there could be ambiguity we rename identifiers to make them unique, and we insert Explicit Hygiene Markers as appropriate.
For example:
mod foo { fn bar(x: u32) {} } use foo::*; fn main() { let x = 4; let x = x + 1; bar(x); } // becomes: mod foo { fn bar() {} } fn main() { let x1 = 4; let x2 = x1 + 1; crate::foo::bar(x2); }
This also deals with macro hygiene:
#![allow(unused)] fn main() { fn foo() -> u32 { let x = 1; macro_rules! check { () => { x == 1 }; // Uses `x` from the definition site. } let x = 2; if check!() { x } else { 0 } } // becomes: fn foo() -> u32 { let x1 = 1; let x2 = 2; if x1 == 1 { x2 } else { 0 } } }
See the Reference for details on macro expansion and name resolution.
At the end of this step, there are no macros left, no use statements, every item is referred to by
its full path, and all variables have unique names.
Control-flow Desugarings
At the end of this series of steps, the remaining control-flow operators are loop, if, all
pattern matching constructs like match and let .. else, and
on_unwind.
Loop Desugaring
for and while loops are desugared into a conditionless loop:
#![allow(unused)] fn main() { for $pat in $iter { $loop_body } // becomes { let mut iter = IntoIterator::into_iter($iter); while let Some($pat) = iter.next() { $loop_body } } }
And then:
#![allow(unused)] fn main() { while $condition { $loop_body } // becomes loop { if $condition { $loop_body } else { break; } } }
Try Desugaring
$expr? desugars to (see RFC):
#![allow(unused)] fn main() { match Try::branch($expr) { ControlFlow::Continue(v) => v, ControlFlow::Break(r) => return FromResidual::from_residual(r), } }
Lazy Boolean Operators
The boolean operators && and || are lazy, which means that they only evaluate their
right-hand-side if it is can influence the final value of the operation.
#![allow(unused)] fn main() { $lhs || $rhs // desugars to: if $lhs { true } else { $rhs } }
#![allow(unused)] fn main() { $lhs && $rhs // desugars to: if $lhs { $rhs } else { false } }
Inside an if expression, the && and ||
operators are also allowed to be mixed with if let (this is called "let-chains").
We do not touch these at this stage, they will be dealt with in a later pass.
After this step, the only && and || operators left are involved in let-chains.
In particular, they're all directly inside an if.
Backlinks
Type-Directed Expression Transformations
At the end of this series of steps there are no invisible type changes left. The type of each subexpression is the type expected from its context.
Method Resolution & Operator Overload
Method calls are the expressions that look like $receiver.method($args..). Method calls in Rust
involve a fair bit of implicit magic: the receiver expression may be referenced and/or dereferenced
to get to the right type, and figuring out which methods are available requires trait solving.
Explaining how that works is out of scope for this guide (see the Reference section); whatever the exact process, the result is that we replace each method call with a full-unambiguous function call expression and some expression adjustments:
#![allow(unused)] fn main() { let opt = Some(42); let x: &i32 = opt.as_ref().clone().unwrap(); // desugars to: let x: &i32 = Option::unwrap(<Option<&i32> as Clone>::clone(&Option::as_ref(&opt))); }
The <Type as Trait>::method(self, args..) syntax is called UFCS (Uniform Function Call
Syntax)
and allows specifying exactly what trait method is getting called. Note also how opt got borrowed
into &opt in order to match the type required for Option::as_ref.
Aside postfix method calls, a number of operations can be overridden using traits. We desugar such overridden operations into the appropriate method call:
a + b -> Add::add(a, b)a += b -> AddAssign::add_assign(&mut a, b)a - b -> Sub::sub(a, b)-a -> Neg::neg(a)a[b] -> Index::index(&a, b)/IndexMut::index_mut(&mut a, b)f(args...) -> Fn::call/FnMut::call_mut/FnOnce::call_once(f, (args...))- etc
The non-overriden versions of these operations stay unchanged.
For example + on integers is built-in, but on integer references is defined by a trait:
#![allow(unused)] fn main() { let x = 1 + 2 + &3; // becomes let x = 1 + <i32 as Add<&i32>>::add(2, &3); }
Note that we don't handle the * operator (overrideable with Deref/DerefMut) here, we'll do it
in Deref/DerefMut Desugarings.
At the end of this step every method call and overridden operator use has been turned into a plain function call.
Autoderef
"Autoderef" refers the fact that postfix operations can work through
references by automatically inserting dereferences *$expr.
Some such derefs were introduced in method resolution already.
In this step we desugar the remaining cases: field access and indexing.
In the expression $expr.field, if the type of $expr does not have a field field then we
desugar it to (*$expr).field and try again, until it is no longer legal to dereference the place.
The expression $expr[$index] also has this behavior.
Coercions
Type coercions are implicit operations that change the type of a value. They happen automatically at certain locations when the expected type doesn't match the actual type of an expression.
The locations where coercions can happen are called "coercion sites" and listed in this Reference section. The allowed coercions are then listed here.
In this step, we desugar these coercions into explicit conversions. The outcome is either an
as-cast ($expr as $ty) or a reborrow (e.g. &*$expr).
For example:
#![allow(unused)] fn main() { fn foo(s: &str) { .. } let x: String = ...; let string_ref: &String = &x; foo(string_ref); // becomes: foo(&**string_ref); }
#![allow(unused)] fn main() { let x = 42u32; let dyn_x: &dyn Debug = &x; let meta = core::ptr::metadata(dyn_x); // becomes: let dyn_x: &dyn Debug = &x as &dyn Debug; let meta = core::ptr::metadata(dyn_x as *const dyn Debug); }
After this step, expressions have the type expected of them.
Deref/DerefMut Desugarings
The expression *$expr is allowed if $expr is a built-in reference type (&T, &mut T, *const T, *mut T and Box<T>), or if $expr: T and T: core::ops::Deref.
In this step we desugar the Deref case.
Let T: Deref be a type and $expr be an expression of that type.
Let context!(..) be a context in which one may find the expression *$expr.
For each context we figure out an associated mutability: &mut $expr and if let Some(ref mut x) = $expr are two examples of mutable contexts.
Uses that don't require mutability are considered immutable contexts.
Examples of immutable contexts are &$expr and function_call($expr).
For let place p = $expr; assignments, mutability is inferred from the
mutability of the contexts in which p is used.
We then desugar as follows:
#![allow(unused)] fn main() { context!(*$expr) // becomes, if the context is considered mutable: context!(*<T as DerefMut>::deref_mut(&mut $expr)) // otherwise, if the context is considered immutable: context!(*<T as Deref>::deref(&$expr)) }
For example:
#![allow(unused)] fn main() { let arc: Arc<Option<i32>> = ...; if arc.is_some() { .. } // becomes, before this step: if Option::is_some(&*arc) { .. } // becomes, after this step: if Option::is_some(&*Deref::deref(&arc)) { .. } }
#![allow(unused)] fn main() { let mut v: Vec<u32> = ...; match *v { [0, ref mut rest @ ..] => .., _ => .., } // becomes, because the pattern context is seen to be mutable: match *DerefMut::deref_mut(&mut v) { [0, ref mut rest @ ..] => .., _ => .., } }
Backlinks
Match Ergonomics
In their fundamental operation, patterns must have the same type as the place
they're matching on, e.g. Some(_) applies to places of type Option<..>.
"Match ergonomics" is the feature that allows some mismatches here, specifically
this allows patterns to transparently match through references.
The exact details are given in RFC 2005 "Match ergonomics" and the edition guide. In terms of desugaring, this step transforms patterns that involve match ergonomics into patterns that don't, i.e. that have exact types.
#![allow(unused)] fn main() { let opt: &&Option<u32> = ..; if let Some(x) = opt { .. } // becomes: if let &&Some(ref x) = opt { .. } }
After this step, patterns have exact types and explicit binding modes (i.e. x vs ref x vs
ref mut x).
Expression Unnesting
At the end of this series of steps, every place
context apart from
simple let-bindings contains a side-effect-free place expression and every value context contains
an operand, as defined in the Final Language section.
Temporaries and Lifetime Extension
A "value-to-place coercion" occurs when a value expression is used in a context where a place is needed, e.g. because it is borrowed, matched on, or has a field accessed. See this blog post for more details about place/value expressions and place/value contexts.
Whenever that happens, the value will get stored in a temporary variable. In this step, we make these temporaries explicit.
The rules that determine the scope of these temporaries are complex; they're described in the Reference. You may also enjoy this blog post with a more explanatory style.
In this step, for each expression $expr to be coerced, we first add a let tmp; statement,
then assign it tmp = $expr; (these two steps can be merged), then use tmp where the expression was.
The placement of the let tmp; determines how long the value will live and its drop order.
To get the right scope, extra blocks { .. } may be added.
For example:
#![allow(unused)] fn main() { let s = if Option::is_some(&Option::clone(&opt)) { let _x = &42; &String::new() } else { &String::new() }; // becomes: let tmp3; let tmp4; let s = if { let tmp1 = Option::clone(&opt); Option::is_some(&tmp1) } { let tmp2 = 42; let _x = &tmp2; tmp3 = String::new(); &tmp3 } else { tmp4 = String::new(); &tmp4 }; }
Or:
#![allow(unused)] fn main() { let opt: RwLock<Option<u32>> = ... if let Some(x) = Option::as_ref(&*Result::unwrap(RwLock::read(&opt))) { ... } else { ... } // becomes (in edition 2024): if let tmp = Result::unwrap(RwLock::read(&opt)) && let Some(x) = Option::as_ref(&*tmp) { ... } else { ... } }
Note how in let chains we may introduce the temporaries as part of the let chain to get the
right scope. Our Extended Let Chains allow forward declarations
let x; in the middle of a let chain for that purpose.
Taking an example from the edition book:
#![allow(unused)] fn main() { fn f() -> usize { let c = RefCell::new(".."); c.borrow().len() } // Becomes, after method resolution: fn f() -> usize { let c = RefCell::new(".."); str::len(*<Ref<_> as Deref>::deref(&RefCell::borrow(&c))) } // Before 2024, this becomes: fn f() -> usize { let tmp1; // Added at the start of scope so that it drops after the other locals. let tmp2; let c = RefCell::new(".."); tmp1 = RefCell::borrow(&c); // error[E0597]: `c` does not live long enough tmp2 = <Ref<_> as Deref>::deref(&tmp1); str::len(*tmp2) } // After 2024, this becomes: fn f() -> usize { let c = RefCell::new(".."); let tmp1; // drops before other locals let tmp2; tmp1 = RefCell::borrow(&c); tmp2 = <Ref<_> as Deref>::deref(&tmp1); str::len(*tmp2) } }
There is an exception to the above: temporaries can, when sensible, become statics instead of local variables. This is called "constant promotion":
#![allow(unused)] fn main() { let x = &1 + 2; // becomes: static TMP: u32 = 1 + 2; let x = &TMP; // this allows `x` to have type `&'static u32` }
After this step, all place contexts contain place expressions.
Implementation
This is a completely fake PoC implementation, as a placeholder to try out my test harness.
#![allow(unused)] fn main() { //! Entirely silly PoC desugaring pass. pub struct ValueToPlaceDesugarer<'tcx, 'a> { body: &'a mut Body<'tcx>, } impl<'tcx, 'a> ValueToPlaceDesugarer<'tcx, 'a> { pub fn new(body: &'a mut Body<'tcx>) -> Self { Self { body } } pub fn run(&mut self) { // Only iterate over the original expressions; newly created ones don't need to be // revisited. for expr_id in self.body.thir.exprs.indices() { if let ExprKind::Borrow { borrow_kind, arg } = self.body.thir.exprs[expr_id].kind.clone() { if self.should_temp(arg) { self.introduce_temp(expr_id, borrow_kind, arg); } } } } fn should_temp(&self, arg: thir::ExprId) -> bool { let peeled = self.peel_value_expr(arg); matches!(self.body.thir.exprs[peeled].kind, ExprKind::Call { .. }) } fn peel_value_expr(&self, mut id: thir::ExprId) -> thir::ExprId { loop { match self.body.thir.exprs[id].kind { ExprKind::Scope { value, .. } | ExprKind::Use { source: value } | ExprKind::NeverToAny { source: value } | ExprKind::PlaceTypeAscription { source: value, .. } | ExprKind::ValueTypeAscription { source: value, .. } | ExprKind::PlaceUnwrapUnsafeBinder { source: value } | ExprKind::ValueUnwrapUnsafeBinder { source: value } | ExprKind::WrapUnsafeBinder { source: value } => { id = value; } _ => break id, } } } fn fresh_scope(&mut self) -> region::Scope { region::Scope { local_id: self.body.new_hir_id().local_id, data: ScopeData::Node, } } }
Some markdown explanation:
#![allow(unused)] fn main() { fn introduce_temp( &mut self, expr_id: thir::ExprId, borrow_kind: rustc_middle::mir::BorrowKind, arg: thir::ExprId, ) { let hir_id = self.body.new_hir_id(); let tmp_name = Symbol::intern("tmp"); self.body.insert_synthetic_local_name(hir_id, tmp_name); let var_expr_id = self.body.thir.exprs.push(Expr { kind: ExprKind::VarRef { id: thir::LocalVarId(hir_id), }, ..self.body.thir.exprs[arg] }); let borrow_expr = Expr { kind: ExprKind::Borrow { borrow_kind, arg: var_expr_id, }, ..self.body.thir.exprs[expr_id] }; let borrow_expr_id = self.body.thir.exprs.push(borrow_expr); let arg_expr = &self.body.thir.exprs[arg]; let pat = Pat { ty: arg_expr.ty, span: arg_expr.span, kind: PatKind::Binding { name: tmp_name, mode: hir::BindingMode(hir::ByRef::No, Mutability::Not), var: thir::LocalVarId(hir_id), ty: arg_expr.ty, subpattern: None, is_primary: true, is_shorthand: false, }, }; let assign_stmt = Stmt { kind: StmtKind::Let { remainder_scope: self.fresh_scope(), init_scope: self.fresh_scope(), pattern: Box::new(pat), initializer: Some(arg), else_block: None, lint_level: thir::LintLevel::Inherited, span: self.body.thir.exprs[arg].span, }, }; let stmt_id = self.body.thir.stmts.push(assign_stmt); let block = thir::Block { targeted_by_break: false, region_scope: self.fresh_scope(), span: self.body.thir.exprs[expr_id].span, stmts: Box::from([stmt_id]), expr: Some(borrow_expr_id), safety_mode: thir::BlockSafety::Safe, }; let block_id = self.body.thir.blocks.push(block); self.body.thir.exprs[expr_id].kind = ExprKind::Block { block: block_id }; } } }
Intermediate Subexpression Elimination
In this step we add intermediate variables for every subexpression.
Specifically, if an expression that isn't a simple binding can be written expr!($subexpr),
we rewrite it.
If $subexpr is a value expression,
we rewrite it to { let tmp = $subexpr; expr!(tmp) };
if it is a place expression,
we rewrite it to { let place tmp = $subexpr; expr!(tmp) };
We do this in an order that preserves normal left-to-right evaluation order. We skip subexpressions that are constants.
#![allow(unused)] fn main() { let mut vec = Vec::new(); vec.push(42); vec[0] += 1; // becomes before this step: (*<Vec<_> as DerefMut>::deref_mut(&mut vec))[0] += 1; // becomes after this step: { let tmp1 = &mut vec; let tmp2 = <Vec<_> as DerefMut>::deref_mut(tmp1); *tmp2 += 1; } }
#![allow(unused)] fn main() { let x = 1 + 2 + Some(3).as_ref().unwrap(); // becomes, before this step: let x = { let tmp1 = Some(3); 1 + <u32 as Add<&u32>>::add(2, Option::unwrap(Option::as_ref(&tmp1))) }; // becomes, after this step: let x = { let tmp1 = Some(3); let tmp2 = &tmp1; let tmp3 = Option::as_ref(tmp2); let tmp4 = Option::unwrap(tmp3); let tmp5 = <u32 as Add<&u32>>::add(2, tmp4); 1 + tmp5 }; }
An example that will be important for Bounds Checks:
#![allow(unused)] fn main() { expr!($place[$i][$j]) // becomes { let place p = $place; let i = $i; let place q = p[i]; // order is important because the bound check happens here let j = $j; let place r = q[j]; expr!(r) } }
At the end of this step, every value context contains either a constant or a variable and every place expression is non-nested.
Discussion
We don't really have to skip constants. I just did it that way because MIR allows constants a operands. I think it's a practical matter of not ending up with billions of variables.
Backlinks
Bound Checks
After desugaring temporaries, the remaining place expressions are mostly side-effect free. The exception is bounds checks. In this step we make bounds checks explicit.
After the previous desugarings, all place expressions are broken into let place bindings, so the
only way an indexing expression may show up is as let place q = p[i]; where p and i are
bindings.
We desugar this as follows, using Unchecked Indexing:
#![allow(unused)] fn main() { let place q = p[i]; // becomes: let len = core::slice::length(&raw const p); assert!(i < len, "appropriate message"); let place q = unchecked_index!(p, i); }
We do something similar for range indexing.
At the end of this step, there are no checked indexing place expressions left.
Discussion
This desugaring is actually unsound if we don't run borrow-checking before doing it:
#![allow(unused)] fn main() { let mut x: &[[u32; 1]] = &[[42]]; let _ = &mut x[0][{x = &[]; 0}]; // becomes: let _ = { let i = 0; let len = x.len(); assert!(i < len); let p = unchecked_index!(x, i); let j = {x = &[]; 0}; &mut p[j] // out of bounds access }; }
Rustc rejects this code using borrow-checking tricks. See Borrow Checking.
let place should probably have a way to disallow
invalidating a place alias.
Backlinks
Overflow Checks
Depending on compilation flags, built-in arithmetic operations may introduce overflow checks. We desugar them here.
#![allow(unused)] fn main() { $a + $b // becomes, in debug mode: { let (res, overflow) = core::intrinsics::add_with_overflow($a, $b); if overflow { panic!("appropriate message") } res } }
We do similar checks for subtraction, multiplication, division, remainder, and right/left shifts. Some checks are not optional, like the zero check for division and remainder.
After this step, all built-in arithmetic operations are infallible.
Functional Record Update
In this step we desugar Functional Record Update syntax:
#![allow(unused)] fn main() { // Assume `Struct` has 4 fields named a,b,c,d x = Struct { a: $expr_a, b: $expr_b, ..$place }; // becomes: x = Struct { a: $expr_a, b: $expr_b, c: $place.c, d: $place.d }; }
This is correct because ..$expr is a place context and we've already made
sure that all place context contain side-effect free place expressions.
Explicit Copies/Moves
When a place expression is used where a value is needed, the contents of the place are copied or moved out, depending on the type of the place (this is called a "place-to-value coercion").
We'll use the copy! and move! operators proposed in Explicit
Copy/Move.
This step adds a copy! or move! to every place-to-value coercion:
#![allow(unused)] fn main() { let x = (String::new(), 42); let y = x.0; let z = x.1; // becomes: let y = move!(x.0); let z = copy!(x.1); }
After this step, the use of each place is explicit: copy, move, borrow, etc.
Backlinks
Pattern Desugarings
At the end of this series of steps no patterns remain and all bindings are declared uninitialized
(let x;/let mut x;).
Desugaring Pattern Expressions
This step transforms all the expressions that involve patterns into either match or if let expressions.
Patterns can show up in the following locations. In what follows, $pat is a pattern that's not
a simple binding.
-
Function parameters
#![allow(unused)] fn main() { fn f($pat: T) { $body } // becomes fn f(tmp: T) { let $pat = tmp; $body } } -
If let
#![allow(unused)] fn main() { if let $pat = $expr { $then } // becomes if let $pat = $expr { $then } else {} } -
If let else
#![allow(unused)] fn main() { if let $pat = $expr { $then } else { $else } // stays unchanged } -
let
#![allow(unused)] fn main() { let $pat = $expr; // becomes let $pat = $expr else { unsafe { core::hint::unreachable_unchecked() } }; } -
let else
#![allow(unused)] fn main() { { let $pat = $expr else { $else }; $body } // becomes if let $pat = $expr { $body } else { $else } }Where we added a block to make
let elsethe first statement of its block. -
Destructuring assignment
#![allow(unused)] fn main() { $pat = $expr; // becomes if let $pat_ = $expr { x1 = x1_; .. xn = x1_; } else { unsafe { core::hint::unreachable_unchecked() } } }where each of the
xiis one of the variables bound in$pat, and$pat_is the outcome of changing$patto use the variablesxi_instead; -
Matches
#![allow(unused)] fn main() { match $expr { $pat if $guard => $arm, .. } // stays unchanged }
After this step, the only expressions involving patterns are match and if let else expressions.
Or-patterns
"Or-patterns" are the patterns that look like $pat | $pat.
They are tricky when they have bindings and when they're under match guards.
We desugar them in this step.
We first "normalize" or-patterns by moving any nested | to the outside of a pattern, e.g. (0 | 1, 2 | 3) becomes (0, 2) | (0, 3) | (1, 2) | (1, 3) (see Discussion below about the combinatorial
explosion).
This expansion is done left-to-right.
Inside let chains, we simply turn let $pat1 | $pat2 = $expr into let $pat1 = $expr || let $pat2 = $expr using Extended Let Chains.
Inside matches, we encode the non-tree-like control-flow directly:
#![allow(unused)] fn main() { match $place { $pat1 | $pat2 if $guard => $arm, $remaining_arms } // becomes: 'match_end: { let x1_; // declare the bindings bound in the patterns, renamed to avoid shadowing. .. let xn_; 'arm: { break 'match_end (match $place { $pat1 if $guard => { x1_ = x1; .. xn_ = xn; break 'arm; }, $pat2 if $guard => { x1_ = x1; .. xn_ = xn; break 'arm }, $remaining_arms }); } $arm_ // modified to use `xi_` instead of `xi` } }
After this step, patterns don't involve |.
Discussion
Drop order
The let-chain desugaring is actually incorrect wrt drop order: or-patterns declare their bindings in the order given by the first subpattern (Reference), but our desugaring will drop them in the order of the alternative that succeeds.
I expect that design choice to prove to be trouble when mixing or-patterns and if-let guard patterns however, so I'd actually propose we make or-patterns drop their bindings in the order of the alternative that succeeded. This would make the proposed desugaring correct.
Running guards several times
Note an interesting property that this desugaring makes clear: a single match guard may run several times. This can be observed, e.g.:
#![allow(unused)] fn main() { let mut guard_count = 0; match (false, false) { (a, _) | (_, a) if { guard_count += 1; a } => {} _ => {} } assert_eq!(guard_count, 2); // succeeds // is equivalent to: let mut guard_count = 0; match (false, false) { (a, _) if { guard_count += 1; a } => {} (_, a) if { guard_count += 1; a } => {} _ => {} } assert_eq!(guard_count, 2); // succeeds }
See also this fun test.
Combinatorial explosion
This desugaring has the benefit of simplicity but two big drawbacks: it duplicates user code (the
match guards), and more importantly causes combinatorial explosion.
For example, (true|false, true|false, true|false, true|false) if $guard desugars to 16 patterns
and 16 copies of the guard code.
A more robust approach could be to give an index to each sub-pattern and branch/loop on these indices to know the right bindings to use/number of times to run a guard. For example, using guard patterns:
#![allow(unused)] fn main() { match $place { ($a | $b, Some($c | Ok($d | $e))) if $guard => $arm, _ => {} } // could become something like: macro_rules! cond_pat { // A conditional pattern: behaves like `$p` if `$i == 0` or like `$q` if `$i == 1` ($i:expr, $p:pat | $q:pat) => { (place pl if $i == 0 && let $p = pl) | (place pl if $i == 1 && let $q = pl) }; } 'success: for i in 0..=1 { for j in 0..=1 { let max_k = if j == 0 { 0 } else { 1 }; for k in 0..=max_k { if let ( cond_pat!(i, $a | $b), Some(cond_pat!(j, $c | Ok(cond_pat!(k, $d | $e)), ) = $place && guard { $arm break 'success } } } } }
By-Value Bindings
By-value bindings in patterns are special because they need to work in two steps: first we check
that the whole pattern matches, then we can set the by-value bindings.
To get flexibility in the coming steps, we transform these bindings using let place and If Let Guards.
Let pat!(x) stand for a pattern that involves a by-value binding x.
Inside a let-chain, we turn let pat!(x) = $expr into let pat!(place p) = $expr if let x = p.
For matches without guards, we desugar as follows:
#![allow(unused)] fn main() { match .. { pat!(x) => $arm, .. } // becomes: match .. { pat!(place p) => { let x = p; $arm } .. } }
For match guards, we need to make sure that any moves happen only when the guard succeeds. To make this work, we only give guards shared access to the value. In order for the binding to have the right type, we use a clever trick:
#![allow(unused)] fn main() { match .. { pat!(x) if $guard => $arm, .. } // becomes: match .. { pat!(place p) if let x1 = &p && let place x = *x1 && $guard => { let x = p; $arm } .. } }
After this step, patterns no longer have by-value bindings.
Match Guard Mutable Bindings
Match guards are only allowed shared-reference access to the variables bound in the branch.
In the previous step we dealt with by-value bindings.
In this step we deal with by-mutable-reference bindings, using let place and If Let Guards again.
Let $pat be a pattern that has a ref mut x binding.
We desugar this as follows:
#![allow(unused)] fn main() { match .. { $pat if $guard => $arm, .. } // becomes match .. { // To give only shared-access but have `x` keep its type, we use the little trick again: $pat if let x1 = &x && let place x = *x1 && $guard => { $arm } .. } }
After this step, guards no longer need special handling around bindings.
Discussion
Modifying discriminants in match guards
Guards actually have another mutability restriction: for soundness they must not be allowed to modify any discriminant that participates in the match.
In rustc this is enforced using borrow-checker tricks. This desugaring ignores that entirely; see also Borrow Checking.
Backlinks
Match Desugaring
We can now simply transform:
#![allow(unused)] fn main() { match $place { $pat1 if $guard1 => $arm1, $pat2 if $guard2 => $arm2, $pat3 => $arm3, } }
into:
#![allow(unused)] fn main() { if let $pat1 = $place && $guard1 { $arm1 } else if let $pat2 = $place && $guard2 { $arm2 } else if let $pat3 = $place { $arm3 } else { unsafe { core::hint::unreachable_unchecked() } } }
This is valid because 1. the scrutinee of the match has been turned into a side-effect-less place expression, and 2. we've dealt with any trickiness around guards, either related to or-patterns or to bindings.
If there are no arms, we emit:
#![allow(unused)] fn main() { let _ = $place; unsafe { core::hint::unreachable_unchecked() } }
At the end of this step the only remaining branching and pattern construct is if let else.
Pattern Unnesting
Operationally, pattern matching expressions stand for a series of comparisons of discriminants or
integers.
In this step we'll compile each if let expression down to built-in comparisons by recursively
simplifying the expressions.
By way of example:
let _ = $x=>true;let 42u32 = $x=>$x == 42u32;let 42u32..=73u32 = $x=>42u32 <= $x && $x <= 73u32;let &$p = $x=>let $p = *$x;let ($p0, $p1) = $x=>let $p0 = $x.0 && let $p1 = $x.1;let Struct { a: $pa, b: $pb } = $x=>let $pa = $x.a && let $pa = $x.b;let Enum::Variant { a: $pa, b: $pb } = $x=>$x.enum#discriminant == discriminant_of!(Enum, Variant) && let $pa = $x.Variant.a && let $pb = $x.Variant.b;let [$pa, .., $pz] = $x=>let len = core::slice::len(&raw const $x) && len >= 2 && let $pa = $x[0] && let $pz = $x[len - 1].
Note that we use Enum Projections and Enum Discriminant Access for enums. Note also that we don't deal with or-patterns because they've been dealt with already.
The left-to-right order is important here; these are lazy boolean operators.
At the end of this step, the only remaining patterns are x/ref x/ref mut x/place x bindings.
Discussion
Evaluation order
The exact semantics of patterns are not decided yet. What's presented in this section is actually a proposal I'm putting forward, that happens to mostly1 be compatible with what's implemented in rustc today.
This proposal has the benefit and drawback of setting in stone a particular order of evaluation. This is useful for unsafe code that may want to know exactly what is accessed in which order, and detrimental to optimizations.
The kind of unsafe code that may motivate such a rigid order is manually-implemented tagged unions:
#![allow(unused)] fn main() { struct MyOption<T> { is_some: bool, contents: MyOptionContents<T>, } union MyOptionContents<T> { uninit: (), init: T, } impl<T> MyOption<T> { fn as_ref(&self) -> Option<&T> { unsafe { match *self { MyOption { is_some: true, contents: MyOptionContents { ref init } } => Some(init), MyOption { is_some: false, .. } => None, } } } } }
I don't know if this sufficient motivation, nor exactly the extent of optimizations we'd lose if we set this in stone.
Precise semantics
There are tiny discrepancies between this proposed desugaring and what the lang team has decided to
be true today. For example, non-#[non_exhaustive] enums with a single variant don't incur
a discriminant read today but do in this desugaring.
These topics are in flux so I'll keep the simple desugaring for now.
Slice patterns
Our desugaring of slice patterns is incorrect wrt borrow-checking, because the borrow-checker can
actually track the disjointness of borrows such as let [ref mut x, ref mut y] = $place.
To express this, we'd need a feature to represent indexing with indices that the borrow-checker is
allowed to inspect (it normally does not inspect indices).
MIR has such an operation.
Backlinks
-
At least one difference is that rustc tests or-pattern alternatives after other patterns to reduce duplicate work. So
matches!($x, (true|true, false))is actually compiled tomatches!($x.1, false) && matches!($x.0, true|true). There are also details around enums with only one variant. Also constants in patterns get turned into patterns, which may behave differently than plain==comparison does in terms of exact UB. ↩
Let chains
"Let chains" are what we call expressions like if let $pat1 = $expr1 && let $pat2 = $expr2.
In this step, we desugar these. Note that we also support || in let
chains.
In what follows, $expr1/$expr2 are expressions made of &&, ||, let $binding = $place, let binding;, and boolean expressions.
First, the base cases:
#![allow(unused)] fn main() { if let $binding = $place { $then } else { $else } // becomes if true { let $binding = $place; $then } else { $else } }
#![allow(unused)] fn main() { if let $binding; { $then } else { $else } // becomes if true { let $binding; $then } else { $else } }
Then the && case, using block-break to jump over the else branch:
#![allow(unused)] fn main() { if $expr1 && $expr2 { $then } else { $else } // becomes 'exit: { if $expr1 { if $expr2 { break 'exit $then; } } $else } }
And finally the || case, which we specify as follows, with a duplication:
#![allow(unused)] fn main() { if $expr1 || $expr2 { $then } else { $else } // becomes if $expr1 { $then } else if $expr2 { $then // duplicated :/ } else { $else } }
After this step, the only remaining branching construct is if on booleans.
Discussion
Avoiding duplication
We'd like to avoid duplicating user code, especially as in this case this can lead to exponential blowup of code size. The tricky part is preserving drop order, particularly on unwind. This section proposes a solution; it is quite involved, yet it's the simplest I found.
First we get rid of non-place bindings in two steps:
- Move binding declarations to the left by turning every
$bool_expr && let x;intolet x; && $bool_expr; - Move binding declarations out of
||as follows, wherex1is a fresh name. (This usesscope_end!)#![allow(unused)] fn main() { (let x; && $expr1) || $expr2 // becomes let x1; && (let place x = x1 && $expr1 || { scope_end!(x1); true } && $expr2) }#![allow(unused)] fn main() { $expr1 || (let x; && $expr2) // becomes let x1; && ({ scope_end!(x1); true } && $expr1 || let place x = x1 && $expr2) }
This produces new top-level &&-chains, onto which we recursively apply the && case above.
This takes care to declare a series of bindings in the correct order for each branch,
which ensures correct drop order even on unwind.
Now the only bindings left are let place bindings.
We first move these to the right by transforming let place p = $place && $expr
into $expr && let place p = $place.
If $expr mentioned p, we substitute $place in its stead.
This even allows swapping two let place bindings.
By the above the order of the let place bindings is unimportant,
so for any place p that has a let place alias
in both || alternatives, we can write the condition as follows:
#![allow(unused)] fn main() { ($expr1 && let place p = $place1) || ($expr2 && let place p = $place2) }
then transform it using conditional place aliases:
#![allow(unused)] fn main() { let branch; && ($expr1 && { branch = true; true } || $expr2 && { branch = false; true }) && let place p = if_place!(branch, $place1, $place2) }
At the end of this, the remaining ||-chains involve only boolean expressions,
which we can desugar like in Lazy Boolean Operators
without needing to care about binding scopes.
Worked example:
#![allow(unused)] fn main() { if (let Some(a) = foo() && let Some(b) = a.method()) || (let Some(b) = bar() && let Some(a) = b.method()) { .. } // becomes: { let foo_left; let a_left; let method_left; let b_left; let bar_right; let b_right; let method_right; let a_right; let branch; if ({ foo_left = foo(); true } && foo_left.is_some() && { a_left = foo_left.Some.0; true } && { method_left = a_left.method(); true } && method_left.is_some() && { b_left = method_left.Some.0; true } && { branch = true; true } ) || ({ scope_end!(b_left); true } && { scope_end!(method_left); true } && { scope_end!(a_left); true } && { scope_end!(foo_left); true } && { bar_right = bar(); true } && bar_right.is_some() && { b_right = bar_right.Some.0; true } && { method_right = b_right.method(); true } && method_right.is_some() && { a_right = method_right.Some.0; true } && { branch = false; true } ) { let place a = if_place!(branch, a_left, a_right); let place b = if_place!(branch, b_left, b_right); .. } } }
The hoops we have to jump through to avoid duplicating code are not great. The thing we're trying to express is (somewhat) simple, but expressing it in surface Rust is hard.
An alternative to the proposed approach would be to make all the scope ends and unwind paths
explicit beforehand, which would give us full control over
the order in which locals are dropped.
Attempts at writing this in a compositional way have so far failed,
hence the conditional let place approach.
An in-between solution could be a feature to control drop order of bindings dynamically.
Spec-wise we can possibly just keep the version that duplicates; it has the benefit of utmost simplicity.
Backlinks
Desugaring Bindings
All the let expressions left are now bindings.
We desugar them all into binding declarations and assignments:
-
By-value bindings:
#![allow(unused)] fn main() { let x = $place; // becomes let x: $ty; x = &$place; } -
By-ref bindings:
#![allow(unused)] fn main() { let ref x = $place; // becomes let x: $ty; x = &$place; } -
By-ref-mut bindings:
#![allow(unused)] fn main() { let ref mut x = $place; // becomes let x: $ty; x = &mut $place; } -
Place aliases:
For place aliases, the RHS is already a side-effect-free place expression. We can therefore simply substitute
$placeforpsyntactically. For example:#![allow(unused)] fn main() { let place p = x.field; something(&p); something_else(p); // becomes: something(&x.field); something_else(x.field); }
At the end of this step, all the bindings are declared uninitialized (let x;/let mut x;).
Backlinks
Closure Desugarings
At the end of this series of steps no closure expressions remain.
Closure capture
Closures and async blocks can refer to the places in their environment:
#![allow(unused)] fn main() { let mut x = 0; let mut increment = || x += 1; // `&mut x` is captured increment(); increment(); assert_eq!(x, 2); }
This gets eventually compiled to a data type that stores references to, or the contents of, the captured places. The compiler determines automatically how to capture each place depending on how it is used in the closure/async block.
In this step, we use move expressions and let place to make all these captures explicit (note that this is very
different from move!(..) introduced in Explicit Copies/Moves).
For every captured place, we introduce at the start of the closure a let place p = ...; that uses
a move expression.
The rest of the closure stays unchanged.
Our initial example becomes:
#![allow(unused)] fn main() { let mut increment = || x = copy!(x) + 1; // desugars to: let mut increment = || { let place x = *move(&mut x); x = copy!(x) + 1 }; }
Another example:
#![allow(unused)] fn main() { let mut x = Some(42); let mut replace = move |new: u32| Option::replace(&mut x, copy!(new)); // desugars to: let mut replace = |new: u32| { let place x = move(x); Option::replace(&mut x, copy!(new)) }; }
This final example uses a unique immutable borrow (which we introduce in Unique-Immutable
Borrow) since a &mut borrow would require let mut rx:
#![allow(unused)] fn main() { let mut x = 42; let rx = &mut x; let mut increment = || *rx += 1; // desugars to: let mut increment = || { let place rx = *move(&uniq rx); *rx += 1 }; }
See the Reference for details about what gets captured and how. Thanks to previous desugarings all place uses are explicit, which makes the analysis straightforward.
After this step, all closure and async block captures are explicit, using move expressions.
Backlinks
Closure Desugaring
Once captures are explicit, desugaring closures into ADTs becomes straightforward.
A closure becomes a struct, with one field per move($expr) expression, and that field is
initialized with $expr. That struct then implements the appropriate Fn* traits. In the new
function body, what was a move(..) expression is replaced with the appropriate field expression.
Let's take our previous examples again:
#![allow(unused)] fn main() { let mut increment = || { let place x = *move(&mut x); x = copy!(x) + 1 }; // desugars to struct Closure<'a> { capture1: &'a mut u32, } impl FnOnce<()> for Closure<'_> { type Output = (); fn call_once(mut self, args: ()) { self.call_mut(args) } } impl FnMut<()> for Closure<'_> { fn call_mut(&mut self, _args: ()) { let place x = *self.capture1; x = copy!(x) + 1 } } let mut increment = Closure { capture1: &mut x }; }
and
#![allow(unused)] fn main() { let mut replace = |new: u32| { let place x = move(x); Option::replace(&mut x, copy!(new)) }; // desugars to struct Closure { capture1: u32, } impl FnOnce<(u32,)> for Closure { type Output = Option<u32>; fn call_once(mut self, args: (u32,)) -> Option<u32> { self.call_mut(args) } } impl FnMut<(u32,)> for Closure { fn call_mut(&mut self, (new,): (u32,)) -> Option<u32> { let place x = self.capture1; Option::replace(&mut x, copy!(new)) } } let mut replace = Closure { capture1: x }; }
To clean up the newly generated closure expressions, we run the Intermediate Subexpression Elimination, Explicit Copies/Moves and Desugaring Bindings steps again.
After this step, there are no closure expressions left.
Desugaring Nested Scopes
At the end of this series of steps, the exact code to be run for drop purposes has been made explicit, even on unwind. Scopes are no longer relevant for variable liveness or drops.
Removing Tail Expressions
After the previous desugarings, any block that returns a value is the target of an assignment. In this step we move the assignment inside the block so as to remove all tail expressions.
#![allow(unused)] fn main() { $place = { $statements; $expr }; // becomes: { $statements; $place = $expr; } }
#![allow(unused)] fn main() { $place = if $bool { $then } else { $else }; // becomes: if $bool { $place = $then; } else { $place = $else; } }
#![allow(unused)] fn main() { $place = loop { $statements; if $bool { break $expr; } }; // becomes loop { $statements; if $bool { $place = $expr; break; } } }
The one block that is special is the whole function.
Since the tail of a block is a value context we know the tail
expression of the function, if any, is a local variable.
We then simply add an explicit return statement.
#![allow(unused)] fn main() { fn $name($args..) -> $ty { $statements; $local } // becomes fn $name($args..) -> $ty { $statements; return $local; } }
After this step, all blocks end in a statement rather than an expression, and all blocks and
control-flow expressions have type ().
Explicit Binding Scopes
This step makes the end of variable scopes explicit using the Explicit End Of Scope feature.
At the end of each scope, for each variable x declared in that scope in reverse order of
declaration, we add a scope_end!(x) statement.
Before every break;/continue; statement, we similarly scope_end all the in-scope variables that will no
longer be in scope at the target of the break;/continue;.
Finally before a return $local; statement we end the scopes of all locals except $local.
For example:
#![allow(unused)] fn main() { let x; loop { let b; b = foo(); let c; c = bar(); if b { break; } else if c { return b; } } // becomes let x; loop { let b; b = foo(); let c; c = bar(); if b { scope_end!(c); scope_end!(b); break; } else if c { scope_end!(c); scope_end!(x); return b; } } scope_end!(x); }
Explicit Drop Locations
This step makes explicit where drops may happen, using the ensure_dropped!
macro.
We add ensure_dropped! statements in two locations: at end of scopes and before variable
assignments.
#![allow(unused)] fn main() { scope_end!($local); // becomes: ensure_dropped!($local); scope_end!($local); }
#![allow(unused)] fn main() { $place = $expr; // becomes: ensure_dropped!($place); $place = $expr; }
One tricky case is assignment through a mutable reference:
#![allow(unused)] fn main() { let a: &mut String = ...; *a = String::new(); // this drops the previous string // becomes: ensure_dropped!(*a); // this drops the previous string *a = String::new(); // borrowck knows there's no previous string to drop }
This is not allowed in today's Rust, but is legal for us thanks to the Moving Out Of
&mut feature.
After this step, all assignments are to statically uninitialized places (hence won't cause implicit
drops), and scope_end never needs to drop anything.
Explicit Unwind Cleanup
In this step we make explicit the cleanup that happens on unwinding, using the Cleanup On Unwinding feature.
We surround every function call and every use of ensure_dropped with an on_unwind block.
In the cleanup part of this block, we add ensure_dropped!($local); scope_end!($local); statements
for each in-scope local, in reverse order of declaration.
#![allow(unused)] fn main() { let n = 42; let x = String::new(); // becomes, before this stage: let n; n = 42; let x; x = String::new(); ensure_dropped!(x); scope_end!(x); ensure_dropped!(n); scope_end!(n); // becomes, after this stage: let n; n = 42; let x; x = on_unwind String::new() { ensure_dropped!(x); scope_end!(x); ensure_dropped!(n); scope_end!(n); }; on_unwind ensure_dropped!(x) { ensure_dropped!(x); scope_end!(x); ensure_dropped!(n); scope_end!(n); }; scope_end!(x); on_unwind ensure_dropped!(n) { ensure_dropped!(n); scope_end!(n); }; scope_end!(n); }
After this step, unwinding no longer causes any code to run implicitly; it has all been made explicit.
Scope Flattening
Now that all ends of scope are explicit, we can remove any blocks that aren't the target of a break 'label.
#![allow(unused)] fn main() { { let x; x = String::new(); } // becomes let x; x = String::new(); }
Final Desugarings
At the end of these series of steps, everything is explicit and we've reached the final language.
Phased Initialization
At this stage, some compound value expressions remain, namely struct, enum and union constructors. In this step we desugar those into individual assignments, using Phased Initialization.
#![allow(unused)] fn main() { x = Struct { a: $expr_a, b: $expr_b }; // becomes: x.a = $expr_a; x.b = $expr_b; }
#![allow(unused)] fn main() { x = Enum::Variant { a: $expr_a, b: $expr_b }; // becomes: x.Variant.a = $expr_a; x.Variant.b = $expr_b; x.enum#discriminant = discriminant_of!(Enum, Variant)); }
Note that we don't desugar tuple struct/enum constructors since these are semantically function calls.
The one aggregate we keep is array repeat expressions [$expr; $const].
Drop Elaboration
We know where drops may happen, so we now only need to decide which drops to run at each relevant program point.
Dropping happens in-place by running the compiler-generated core::ptr::drop_in_place function on
the place to be dropped.
Instead of calling that function directly we use the
drop_in_place!() built-in macro so that the borrow-checker can
tell that the place has been deinitialized.
In this step we'll replace every ensure_dropped!($place) with a series of
appropriate calls to drop_in_place!($subplace).
For any subplace of $place that hasn't been explicitly moved out, we insert a call to
drop_in_place!.
This can require adding extra booleans ("drop flags") if different branches haven't moved the same
places:
#![allow(unused)] fn main() { let x = Struct { a: String::new(), b: String::new(), }; if foo() { drop(move!(x.a)); } else { drop(move!(x.b)); } ensure_dropped!(x.a); x.a = "some other string".to_owned(); ensure_dropped!(x); scope_end!(x); // becomes: let a_is_initialized = true; let b_is_initialized = true; if foo() { drop(move!(x.a)); a_is_initialized = false; } else { drop(move!(x.b)); b_is_initialized = false; } if a_is_initialized { drop_in_place!(x.a); } x.a = "some other string".to_owned(); if b_is_initialized { drop_in_place!(x.b); } drop_in_place!(x.a); scope_end!(x); }
(unwind paths omitted for legibility)
After this step, all the code involved in dropping values is explicit.
Backlinks
Borrow Checking?
None of the desugarings so far mention borrow-checking. And the reason why is that unlike type-checking, borrow-checking really is just a "check". In particular, the result of borrow-checking must not influence runtime behavior.
So there should not need to be a desugaring related to borrow-checking.
Discussion
The question remains of when to run borrow-checking. Ideally we'd run it at the end of all the desugarings; as presented though we lose some information while desugaring, in particular around matches, so borrow-checking here would accept unsound code (see e.g. Bound Checks), and reject code that is accepted today (see the note about slice patterns in Pattern Unnesting).
If we wanted to have accurate borrow-checking, we'd need, at least:
- Fake borrows/fake reads of the places involved in a match (see Match Guard Mutable Bindings);
- Fake borrows/fake reads of the matched place to detect
let x: !; match x {}; - Fake borrows/fake reads around bounds checks to reject
x[0][{x = &[]; 0}](see Bound Checks); - Some false edges I don't recall where (I know MIR has some for loops and match guards but both of these are irrelevant for us);
- Keep
let _ = $place;place mentions I think; - Support tracking some constant slice indices, for the purpose of borrow-checking slice patterns (see Pattern Unnesting).
Backlinks
The Final Language
We have now fully desugared our Rust program. The resulting program uses a very limited subset of Rust, described here.
An "constant value" $const is:
- A constant literal
true,42u32,3.14f64,"str literal", etc; - A named constant
CONSTANT; - A function item ZST.
A "place expression" $place is:
- A local variable
x; - A named static or thread local
STATIC; - A dereference
*$place; - A field access
$place.$field_name; - An enum field access
$place.$variant_name.$field_name; - A discriminant access
$place.enum#discriminant(see Enum Discriminant Access); - Unchecked indexing
unchecked_index!($place, $operand),unchecked_index!($place, $operand..=$operand)(see Unchecked Indexing).
An "operand" $operand is:
- A place access
copy!($place)ormove!($place)(see Explicit Copy/Move); - A constant
$const.
An "value expression" $val_expr is:
- An operand
$operand; - A borrow
&$place/&mut $place/&raw const $place/&raw mut $place/&uniq $place; - A cast
$operand as $ty; - A built-in operation
$operand + $operand,$operand >= $operand,!$operand, etc; - A repeat expression
[$operand; $const].
A "statement" $statement is:
- A variable declaration
let x: $ty;/let mut x: $ty;. - Assignment
$place = $val_expr; - Function call
$place = $operand($operand..); - If expression
if $operand { $block } else { $block }; - Loop expression
'a: loop { $block }; - Unwind cleanup expression
on_unwind { $block } { $block }(see Cleanup On Unwinding); - Named block
'a: { $block };; - Jumps
break 'a/continue 'a; - Return
return $operand; scope_end!($local)drop_in_place!($place)inline_asm!(..).
A "block" $block is a list of ;-terminated statements. It is always of type ().
A fully desugared function body is a block.
Difference with MIR
This target language is intentionally very close to MIR1. The main differences are:
- Our language has structured control-flow whereas MIR has a graph of blocks with
gotos; - MIR has
StorageLive/StorageDeadstatements to track allocation/deallocation of locals; we instead havelet x;to allocate andscope_end!(x)(see Explicit End Of Scope) that marks where the local is deinitialized. This may or may not be exactly equivalent; - Instead of
return value;, MIR has a return place that must be written to before returning.
There a probably a ton of subtle differences I haven't noticed, but so far going from this to MIR looks pretty straightforward.
Difference with MiniRust
MiniRust is intentionally quite close to MIR2. Beyond the differences with MIR we already saw, from what I know to get valid MiniRust we'd also need at least the following:
- Corountine transform, which transforms
asyncblocks into state machines; I didn't know where to fit that in the desugarings and generally skipped anything related toasync; - Change a bunch of intrinsic calls like
ptr::read,u32::add_with_overflowto built-in operations.
There's likely more, I haven't investigated in detail yet.
Discussion
I feel like this accomplished the goals I set out in the introduction pretty well. I noted decisions made and caveats throughout the book.
The most important caveat to note is the question of borrow-checking. In the relevant section, I highlight how borrow-checking after our desugarings would allow unsound code to compile. We therefore need to either borrow-check somewhere in the middle or desugar differently. I am leaving this question open for now.
I am also left unsatisfied with the desugaring of ||-chains.
It seems we have two bad choices: duplicate user code (and risk exponential blowup),
or emit nasty-looking code.
There may be a clever language feature that can alleviate this conundrum.
Conclusion
I am pretty pleased with the shape this took. I am now convinced that this way of presenting things is fruitful. In an ideal universe this book would be combined with MiniRust and a-mir-formality to make a reference interpreter written in literate programming style; if I had to choose I would quite like that as an official spec for Rust.
By far the trickiest part of all this was the interaction between temporaries and patterns3. I'd like to thank @dianne for helping me get temporary lifetimes right and figure out a pass ordering that doesn't loop onto itself all over the place.
I don't know what will become of this book now. It would be a shame for it not to be kept up-to-date as the language evolves. Only time will tell if this will be deemed a worthy investment.
Thanks for reading, and please come chat4 if you've read this far! If you find mistakes or missing details I welcome issues and PRs.
Backlinks
-
MIR is actually a bunch of languages in a trenchcoat. The MIR I'm talking about here is a MIR post-drop elaboration but pre-coroutine transform. ↩
-
In this case, a different MIR than I was talking about. MiniRust is closer to runtime MIR. ↩
-
Desugaring or-patterns required temporaries to be handled, but these seemed to require let chains to be desugared, but we can't desugar let chains in if let guards before desugaring match guards in some form, etc etc. It was all a big fun mutually-dependent knot. ↩
-
Check my GitHub profile for contact info. ↩
Extra Language Features
To make the desugarings possible/easier, this book takes the liberty to assume a few extra Rust features available; this section describes them.
These are overwhelmingly not my ideas! These come from RFCs, giving a syntax to existing MIR operations, or ideas I've seen float around in random Zulip threads over the years. I tried to give credit when I knew who to credit.
Backlinks
Automatic Drop
This feature adds a ensure_dropped!($place) built-in macro.
This compiles to appropriate calls to drop_in_place!($place)
for any parts of the place still initialized.
Backlinks
Cleanup On Unwinding
This feature adds a new control-flow construct on_unwind $expr { $block }.
This evaluates $expr and returns its value,
unless $expr causes unwinding to occur.
In that case, the statements in $block are executed before
unwinding continues.
This is used to make cleanup code explicit. This could look like:
#![allow(unused)] fn main() { let x = String::new(); function_call() // becomes: let x = String::new(); on_unwind function_call() { scope_end!(x); }; }
Backlinks
Enum Discriminant Access
To read the discriminant of an enum value today, one must use core::mem::discriminant(&place). This
unfortunately requires a &-borrow, which may introduce UB in unsafe contexts.
Moreover there's no stable way to get a discriminant value without having an enum value with that
discriminant around.
To solve this, I propose to combine ideas from two existing RFCs:
- from RFC 3607 we add a new place expression
$place.enum#discriminantevaluates to the discriminant value at that place; - from RFC 3727, we add
discriminant_of!($enum_type, $variant_name)that returns the discriminant value for that variant.
Using both, we have that matches!($place, Some(_)) is equivalent to $place.enum#discriminant == discriminant_of!(Option<_>, Some).
Those two will also come useful for Phased Initialization.
Backlinks
Enum Projections
To desugar enum patterns we need a way to talk about the places inside of an enum variant.
If $place is of an enum type, this features adds a $place.$variant_name place expression that
refers to the contents of the $variant_name variant.
That place then has one field for each field of the variant.
For example:
#![allow(unused)] fn main() { let opt: &mut Option<u32> = ...; if let Some(ref mut x) = *opt { // x: &mut u32 here } // desugars to: if let Some(_) = *opt { let x = unsafe { &mut (*opt).Some.0 }; } }
Using this place is UB if the enum value didn't have the correct discriminant, and thus this operation
requires an unsafe block.
The place $place.$variant_name can't be used by itself because we don't have a type for it; it
must be directly followed by a field projection.
Backlinks
Explicit Copy/Move
This feature adds an explicit syntax to explicitly copy or move the contents of a place:
copy!($place)copies the contents of the place. It is a type error if applied to a non-Copytype;move!($place)moves the contents out of the place, regardless of whether the value implementsCopy. The place is considered uninitialized afterwards.
Backlinks
Explicit End Of Scope
When a binding goes out of scope, any parts of it that remained initialized are dropped, and any borrows of it are invalidated.
This feature introduces a special macro scope_end!($local) which has this same effect.
Operationally, this compiles to a call to ensure_dropped!($place), followed by
a deallocation of the local.
Unlike move outs, this is permanent: it is an error to re-initialize $local after
scope_end!($local).
Note: scope_end!($local); scope_end!($local); is the same as scope_end!($local);.
Backlinks
Explicit Hygiene Markers
Macro hygiene is not only there to avoid identifier name clashes; it affects language semantics in
at least one other way: editions. When the behavior of a program depends on the edition, the edition
to use is taken from an appropriate token; for example for the if let
rescope changes
I believe the relevant edition is that of the if token.
When a crate in edition N calls a macro from a crate in edition M, each token recalls which
edition it comes from. So the following code may or may not panic depending on the edition of the
crate that defines the if_let macro.
use core::cell::Cell; // Possibly defined in another crate. macro_rules! if_let { (($pat:pat = $expr:expr) { $($then:tt)* } else { $($else:tt)* }) => { if let $pat = $expr { $($then)* } else { $($else)* } } } fn main() { let x = Cell::new(0); x.set(42); if_let!((None = Some(&ZeroOnDrop(&x))) { assert_eq!(x.get(), 42); // In edition 2024, value is dropped here } else { assert_eq!(x.get(), 0); // Fails in edition 2021 }); // In edition 2021, value is dropped here assert_eq!(x.get(), 0); } /// Sets the value to `0` on drop. struct ZeroOnDrop<'b>(&'b Cell<u32>); impl<'b> Drop for ZeroOnDrop<'b> { fn drop(&mut self) { self.0.set(0); } }
Naively expanding macros could therefore change program semantics.
To be able to expand the macro in a semantics-preserving way, this feature adds a #[edition = "..."] attribute that changes the edition of the following token.
So the macro above would expand to #[edition = "2021"] if let ... or #[edition = "2024"] if let ... depending on the edition of the crate that declares it.
Discussion
I do not know if there are other hygiene considerations that can affect program semantics. If so we'll need to add similar attributes.
This attribute proposal is actually incomplete: attributes are not allowed between any two tokens.
Maybe the edition-relevant tokens can all be preceded by an attribute? Otherwise we'll need to
invent more syntax like x#edition#2024 or such horrors.
Backlinks
Extended Let Chains
Let Chain Disjunctions
This feature enables if let $pat = $expr || let $pat = $expr.
This works as one would expect.
This can be mixed with normal boolean conditions and &&-based let chains.
If any binding is bound in both alternatives, it must have the same type in both
and it can then be referred to in the if body.
let place bindings cannot be referred to in the arm body,
unless we accept the "conditional let place" feature which I'm feeling icky about.
#![allow(unused)] fn main() { if let Some(y) = foo() || (let Some(x) = bar() && let Some(y) = baz(x) && cond()) { // `x` is accessible here .. } }
The drop order of the bindings depends on which branch succeeded.
#![allow(unused)] fn main() { if (let Some(a) = foo() && let Some(b) = a.method()) || (let Some(b) = bar() && let Some(a) = b.method()) { .. } }
Let Chain Forward Declarations
This feature also enables let x; in the middle of a let chain.
This declares the variable without initializing it.
Its drop order is determined by this declaration site just like a normal let x = ....
This is used to get the right temporary lifetimes and drop order when desugaring.
#![allow(unused)] fn main() { if let Some(x) = foo() && let y; && let Some(z) = if bar() { y = Some(thing()); &y } else { &None } { .. } }
Backlinks
If Let Guards
This is RFC 2294 which is to be
stabilized soon: this feature enables if let in match guards.
#![allow(unused)] fn main() { match ui.wait_event() { KeyPress(mod_, key, datum) if let Some(action) = intercept(mod_, key) => act(action, datum), ev => accept!(ev), } }
Backlinks
In-Place Drop
This feature introduces a built-in macro operator drop_in_place!($place) that 1. sets the place as
uninitialized for the purposes of borrow-checking, and 2. calls core::ptr::drop_in_place(&raw mut $place) on it.
This is used in Drop Elaboration to make drops explicit.
This can't be desugared to two separate steps because deinitializing first would make the &raw mut
borrow invalid, and deinitializing last would cause the place to be double-dropped if the drop code
panics.
Backlinks
Move Expressions for Closure Captures
Closures and async blocks can "capture" places from their surrounding environment, which is an implicit operation. To make this explicit I propose to use a feature called "move expressions" that's been discussed lately.
This feature adds a move($expr) expression, valid in a closure, that:
- Evaluates
$exprin the parent of the closure; - Stores the result inside the closure;
- Acts as a place alias for the place where that result is stored (i.e. the field of the closure object).
For example, a function that increments a captured variable can be expressed as:
#![allow(unused)] fn main() { // Implicit capture: let mut increment = || x += 1; // Explicit capture: let mut increment = || *move(&mut x) += 1; }
To see why it's important that move(..) be a place expression, consider:
#![allow(unused)] fn main() { let mut x = Some(42); // This moves the whole of `x` inside the closure; this closure could be returned from // the current function. let mut replace = move |new: u32| x.replace(new); // Explicit capture: let mut replace = |new: u32| Option::replace(&mut move(x), new); }
Here the &mut move(..) directly borrows the place where we stored the initial value, and modifies
it on each call. That would not work if move(..) was a value expression.
Move expressions can be nested: move(move($expr)) evaluates the inner move($expr) when the outer
closure is created, then the outer move(move($expr)) when the inner closure is created:
#![allow(unused)] fn main() { let mut x = Some(42); let generate_replacer = || { do_some_stuff(); |new: u32| Option::replace(&mut move(move(x)), new) }; // is equivalent to: let generate_replacer = || { do_some_stuff(); let inner_x = move(x); |new: u32| Option::replace(&mut move(inner_x), new) }; }
Backlinks
Moving out of &mut
This proposes a (hopefully) simple modification to the borrow-checker: we should allow moving out of
&mut T references, as long as we put a value back before anyone else can see it.
In practice, that means that we must put the value back before any function call since any function call may unwind (unless compiled with panic=abort I guess).
#![allow(unused)] fn main() { fn foo(x: &mut Vec<u32>) { let vec: Vec<_> = *x; // move out // can't call any functions oops *x = vec; // write a value back // now we're back to normal } }
The purpose of this feature in this document is to Make dropping
explicit.
If we wanted it as a general feature, we may need things like nopanic functions to make it usable.
Backlinks
Phased Initialization
This feature allows the following1 :
#![allow(unused)] fn main() { let x: (u32, u32); x.0 = 42; x.1 = 43; // use `x` as normal here }
Until the value is fully initialized, panic/going out of scope would simply drop the
already-initialized fields. The moment the last field is initialized, the value is treated as a full
value, and dropping it would use its Drop impl, if any.
Note that by that token, let x: ZST; is sufficient to initialize a fieldless struct ZST;. This
may or may not be desirable.
For enums, I propose to use the semantics of RFC 3727 along with the syntax of RFC 3607 that we saw in Enum Discriminant Access:
#![allow(unused)] fn main() { let x: Option<u32>; unsafe { x.Some.0 = 42; x.enum#discriminant = discriminant_of!(Option<u32>, Some)); } }
Per the RFC, the discriminant is only allowed to be set once the rest of the fields are initialized. The value is therefore considered fully initialized the moment we write to its discriminant.
Backlinks
-
I swear I recall seeing an RFC proposing exactly that but I can't find it. ↩
Place Aliases
This feature adds place p bindings, allowed anywhere patterns are except inside or-patterns.
let place p = $expr; evaluates $expr to a place expression and then works as an alias to that
place.
It expresses the idea of "compute a place once and use it many times". In practice, if we apply our
initial desugaring steps to let place p = $expr;, we end up with $expr being a side-effect-free
place expression, which we can then syntactically substitute wherever p is used
(this is done in the Desugaring Bindings step).
For example:
#![allow(unused)] fn main() { let place p = x.field; // this does not try to move out of the place something(&p); something_else(p); // now this moves out // would be desugared to: something(&x.field); something_else(x.field); // now this moves out }
#![allow(unused)] fn main() { let place p = x.method().field; something(&p); // would be desugared to: let tmp = x.method(); something(&tmp.field); }
The one point where this feature is a bit tricky is autoderef:
#![allow(unused)] fn main() { let mut x: std::cell::RefMut<Struct> = ...; let place p = x.field; // should this use `deref` or `deref_mut`? something(&p); something_else(&mut p); // causes `deref_mut` to be called above // becomes: let mut x: std::cell::RefMut<Struct> = ...; let tmp = <_ as DerefMut>::deref_mut(&mut x) let place p = (*tmp).field; something(&p); something_else(&mut p); // and then: let mut x: std::cell::RefMut<Struct> = ...; let mut x: std::cell::RefMut<Struct> = ...; let tmp = <_ as DerefMut>::deref_mut(&mut x) something(&(*tmp).field); something_else(&mut (*tmp).field); }
For that to work, when determining the mutability that a place p binding requires of its matched
place, we consider the mutability required of p anywhere it is used.
This is done in the Deref/DerefMut Desugarings step.
Conditional Place Aliases
This is bit of a crazy feature extension, added to avoid duplicating code in Let Chain Desugaring.
This introduces a built-in macro if_place!($bool, $place1, $place2), valid as a place expression, which
behaves as follows:
- It propagates any place operation to the inside:
if_place!($bool, $place1, $place2).field = if_place!($bool, $place1.field, $place2.field); - When a non-place operation is done to it, it turns into a normal
if:&if_place!($bool, $place1, $place2) = if $bool { &$place1 } else { &$place2 }.
This is the only feature among those proposed that I don't actually wish to see in the language. I hope we can find a cleaner solution to the code duplication problem.
Backlinks
Unchecked Indexing
$place[$index] is a place operation that has the side effect of panicking on
out-of-bounds access.
This feature introduces the place expression unchecked_index!($place, $index) that does the same
but without the bounds check.
Just like built-in indexing, this supports range indexing.
It is of course unsafe to use.
Backlinks
Unique-Immutable Borrow
&uniq T is a new reference type that like &mut T is guaranteed to be the only reference pointing
to a given place. Unlike &mut T however, it doesn't allow mutating the pointed value.
The point of this borrow is that a &uniq &mut T is allowed to mutate the underlying T but not
the &mut T itself. This is used to desugar closure captures.
Discussion
This borrow actually exists in the
compiler
for exactly the same reason we need it: for capturing &mut references.
It is just not exposed to users.
I'm not sure if this is useful for anything else. I recall from discussions on true
reborrowing that possibly a function like fn reborrow<'a, 'b>(x: &'a uniq &'b mut T) -> &'a mut T is more true to what reborrowing does than the
same function with a &mut &mut T argument, but I haven't thought this through in detail.
Backlinks
Worked Examples
Example 1: A pattern match
Source:
#![allow(unused)] fn main() { fn is_north(cmd: &(Option<&str>, i32)) -> bool { match cmd { (Some("north" | "n"), dist) if *dist > 0 => true, (Some(_), _) | (None, _) => false, } } }
After match ergonomics:
#![allow(unused)] fn main() { fn is_north(cmd: &(Option<&str>, i32)) -> bool { match cmd { &(Some("north" | "n"), ref dist) if *dist > 0 => true, &(Some(_), _) | (None, _) => false, } } }
After or-pattern desugaring and match desugaring:
#![allow(unused)] fn main() { fn is_north(cmd: &(Option<&str>, i32)) -> bool { 'match_end: { let dist_; 'arm: { break 'match_end (if let &(Some("north"), ref dist) = cmd && *dist > 0 { dist_ = dist; break 'arm; } else if let &(Some("n"), ref dist) = cmd && *dist > 0 { dist_ = dist; break 'arm; } else if let &(Some(_), _) = cmd { false } else if let &(None, _) = cmd { false } else { unsafe { core::hint::unreachable_unchecked() } }); } true } } }
After pattern unnesting:
#![allow(unused)] fn main() { fn is_north(cmd: &(Option<&str>, i32)) -> bool { 'match_end: { let dist_; 'arm: { break 'match_end ( if (*cmd).0.enum#discriminant == discriminant_of!(Option, Some) && (*cmd).0.Some.0 == "north" && let ref dist = (*cmd).1 && *dist > 0 { dist_ = dist; break 'arm; } else if (*cmd).0.enum#discriminant == discriminant_of!(Option, Some) && (*cmd).0.Some.0 == "n" && let ref dist = (*cmd).1 && *dist > 0 { dist_ = dist; break 'arm; } else if (*cmd).0.enum#discriminant == discriminant_of!(Option, Some) { false } else if (*cmd).0.enum#discriminant == discriminant_of!(Option, None) { false } else { unsafe { core::hint::unreachable_unchecked() } } ); } true } } }
After if-let-chain desugaring (and simplifying blocks a bit):
#![allow(unused)] fn main() { fn is_north(cmd: &(Option<&str>, i32)) -> bool { 'match_end: { let dist_; 'arm: { if (*cmd).0.enum#discriminant == discriminant_of!(Option, Some) { if (*cmd).0.Some.0 == "north" { if let ref dist = (*cmd).1 { if *dist > 0 { dist_ = dist; break 'arm; } } } } if (*cmd).0.enum#discriminant == discriminant_of!(Option, Some) { if (*cmd).0.Some.0 == "n" { if let ref dist = (*cmd).1 { if *dist > 0 { dist_ = dist; break 'arm; } } } } break 'match_end { if (*cmd).0.enum#discriminant == discriminant_of!(Option, Some) { false } else if (*cmd).0.enum#discriminant == discriminant_of!(Option, None) { false } else { unsafe { core::hint::unreachable_unchecked() } } }; } true } } }
Example 2: Closure capture and bounds checks
Source:
#![allow(unused)] fn main() { fn bump_first(xs: &mut [i32]) -> i32 { let mut total = 0; let mut bump = |delta: i32| { total += delta; xs[0] += total; total }; bump(1); bump(2) } }
After expression unnesting:
#![allow(unused)] fn main() { fn bump_first(xs: &mut [i32]) -> i32 { let mut total = 0; let mut bump = |delta: i32| { total = copy!(total) + copy!(delta); let index = 0; let len = core::slice::length(&raw const *xs); assert!(copy!(index) < copy!(len), "index out of bounds"); unchecked_index!(*xs, copy!(index)) += copy!(total); total }; bump(1); bump(2) } }
After closure capture desugarings:
#![allow(unused)] fn main() { fn bump_first(xs: &mut [i32]) -> i32 { let mut total = 0; let mut bump = |delta: i32| { let place total = *move(&mut total); let place xs = *move(&uniq xs); total = copy!(total) + copy!(delta); let index = 0; let len = core::slice::length(&raw const *xs); assert!(copy!(index) < copy!(len), "index out of bounds"); unchecked_index!(*xs, copy!(index)) += copy!(total); total }; bump(1); bump(2) } }