Armed with the knowledge of IR
and Type
from last time
, we can finally write some code:
fn lower(
ast: Ast<TypedVar>,
scheme: ast::TypeScheme
) -> (IR, Type) {
let (ir_ty, types) = lower_ty_scheme(scheme);
todo!()
}
One line down. That’s great progress (especially compared to last post)!
Refresher on Base AST
It’s been awhile since we worked with our base typechecker. I don’t really remember what all was in there, so let’s go over it to get back up to speed.
Recall that our base AST is:
enum Ast<V> {
/// A local variable
Var(V),
/// An integer literal
Int(isize),
/// A function literal
/// (lambda, closure, etc.)
Fun(V, Box<Self>),
/// Function application
App(Box<Self>, Box<Self>),
}
Where our V
generic is the type of variables. It starts out as Var
:
struct Var(usize);
By the end of type checking, we’ve graduated it to TypedVar
:
struct TypedVar(Var, Type);
Naturally, our next question is “What’s Type
?”:
struct TypeVar(u32);
enum Type {
// A type variable
Var(TypeVar),
// Type of integers
Int,
// Type of functions
Fun(Box<Self>, Box<Self>),
}
Alongside the Ast<TypedVar>
, lower
also takes the TypeScheme
produced by typechecking.
The TypeScheme
binds all the unsolved type variables that appear in our typed AST:
struct TypeScheme {
unbound: BTreeSet<TypeVar>,
ty: Type,
}
With that we know everything we’ll see from the typechecker, back to lowering.
Lowering Type Schemes Link to heading
The first step in lowering is to lower our TypeScheme
.
We start with the TypeScheme
because it informs how we lower types.
Types show up both in the TypeScheme
itself and the Ast
, so until we know how to lower types we can’t lower our Ast
.
lower_ty_scheme
’s signature is:
fn lower_ty_scheme(
scheme: ast::TypeScheme
) -> (Type, LowerTypes) {
todo!()
}
lower_ty_scheme
starts by handling bound type variables.
Recall that our type scheme tracks all the generalized type variables that appear in our AST’s type.
The type scheme is the only place these type variables can be introduced.
Each of these AST TypeVar
s becomes an IR TypeVar
bound by a TyFun
node which means we have to convert from AST TypeVar
to DeBruijn Indices.
Fortunately, this is straightforward.
Our type scheme has all the type variables we’re ever going to see and in the order we’re going to see them.
We convert them to DeBruijn Indices once, and we know that will work for every type we need to lower in Ast
:
let ty_env = scheme
.unbound
.into_iter()
.rev()
.enumerate()
.map(|(i, tyvar)| (tyvar, TypeVar(i)))
.collect();
For each type variable in scheme.unbound
, we grab its index and use that as our DeBruijn Index.
Importantly, we do this in reverse order.
The output of this is ty_env
, a HashMap<ast::TypeVar, TypeVar>
.
If we have a type scheme:
let a = TypeVar(4);
let b = TypeVar(23);
let c = TypeVar(149);
let foo = TypeScheme {
unbound: btree_set![a, b, c],
ty: Type::fun(Type::Var(a), Type::fun(Type::Var(b), Type::Var(c)))
};
Our type env will be:
// Excuse the made up `hash_map!` macro.
let ty_env = hash_map![
a: TypeVar(2),
b: TypeVar(1),
c: TypeVar(0)
];
Remember that because DeBruijn Indices count how many TyFun
s we skip over, a
gets TypeVar(2)
not TypeVar(0)
.
This is why we reverse scheme.unbound
when constructing ty_env
.
Back in lower_ty_scheme
, we use ty_env
to lower the type
field of our TypeScheme
.
let lower = LowerTypes { env: ty_env };
let lower_ty = lower.lower_ty(scheme.ty);
LowerTypes
is a helper struct to hold our type environment while we recurse over a type:
struct LowerTypes {
env: HashMap<ast::TypeVar, TypeVar>,
}
It defines one method lower_ty
which is so short we don’t even need to break it up:
impl LowerTypes {
fn lower_ty(&self, ty: ast::Type) -> Type {
match ty {
ast::Type::Int => Type::Int,
ast::Type::Var(v) => Type::Var(self.env[&v]),
ast::Type::Fun(arg, ret) => {
let arg = self.lower_ty(*arg);
let ret = self.lower_ty(*ret);
Type::fun(arg, ret)
}
}
}
}
self.env[&v]
could panic if it encounters a v
absent from env
.
We’re okay with panicking, but if we wanted to be more graceful we’d use get
and call expect
to provide a nice error message.
If we ever do panic, we have a type containing a variable not listed in scheme.unbound
.
That’s a compiler bug.
Note lower_ty
never produces a Type::TyFun
.
ast::Type
can’t ever introduce a type variable, so it never needs to bind a type variable with TyFun
.
As we know, type variables are only introduced by TypeScheme
.
Leaving us with one final task in lower_ty_scheme
: introduce TyFun
s for our type variables.
let bound_lower_ty =
(0..lower.env.len())
.fold(lower_ty, |ty, _| Type::ty_fun(Kind::Type, ty));
For each type variable bound by our type scheme, we wrap lower_ty
in a TyFun
.
In our AST, we distinguish between ast::TypeScheme
and ast::Type
.
As we can see, that distinction disappears in lowering.
Both ast::TypeScheme
and ast::Type
turn into IR Type
.
As an example, lowering our foo
type scheme produces IR type:
TyFun(Kind::Type,
TyFun(Kind::Type,
TyFun(Kind::Type,
Fun(
Var(TypeVar(2),
Fun(TypeVar(1), TypeVar(0)))))))
A TyFun
for each variable in foo.unbound
, and our TypeVar
’s have been converted to DeBruijn Indices.
Putting it all together, our lower_ty_scheme
is:
fn lower_ty_scheme(scheme: ast::TypeScheme) -> (Type, LowerTypes) {
let ty_env = scheme
.unbound
.into_iter()
.rev()
.enumerate()
.map(|(i, tyvar)| (tyvar, TypeVar(i)))
.collect();
let lower = LowerTypes { env: ty_env };
let lower_ty = lower.lower_ty(scheme.ty);
let bound_lower_ty = (0..lower.env.len()).fold(lower_ty, |ty, _| {
Type::ty_fun(Kind::Type, ty)
});
(bound_lower_ty, lower)
}
Time to add another line to lower
:
fn lower(ast: Ast<TypedVar>, scheme: ast::TypeScheme) -> (IR, Type) {
let (ir_ty, types) = lower_ty_scheme(scheme);
// New!
let mut lower_ast = LowerAst {
supply: VarSupply::default(),
types,
};
let ir = lower_ast.lower_ast(ast);
todo!()
}
We got a couple lines this time. Oh, boy!
Lowering Ast Link to heading
LowerAst
shares a purpose with LowerTypes
:
struct LowerAst {
supply: VarSupply,
types: LowerTypes,
}
It holds state for the recursive AST functions we’re going to write.
VarSupply
exists to map AST variables into IR variables and generate new IR variables.
Its internals are uninteresting but can be found in the full code
.
Suffice to say, it supports one method:
impl VarSupply {
fn supply_for(&mut self, var: ast::Var) -> VarId {
// boring...
}
}
supply_for
caches internally.
If we pass the same ast::Var
, we receive the same VarId
.
Moving on, LowerAst
supports one method lower_ast
:
impl LowerAst {
fn lower_ast(&mut self, ast: Ast<TypedVar>) -> IR {
match ast {
//...
}
}
}
We’re well versed in this pattern by now.
Match on the ast
and produce an IR
term for each case.
First up is variables:
Ast::Var(TypedVar(var, ty)) => IR::Var(Var::new(
self.supply.supply_for(var),
self.types.lower_ty(ty),
)),
A TypedVar
turns into an IR Var
.
Thanks to all that code we wrote earlier converting is very easy.
We convert our ast::Var
into a Var
using VarSupply
and lower our ast::Type
using LowerTypes
.
Because self.types
uses the same ty_env
as in lower_ty_scheme
, we can be confident it will lower types equivalently.
Ast::Int(i) => IR::Int(i),
Once again Int
holds us down by being a consistent freebie.
Fun
isn’t much more complicated:
Ast::Fun(TypedVar(var, ty), body) => {
let ir_ty = self.types.lower_ty(ty);
let ir_var = self.supply.supply_for(var);
let ir_body = self.lower_ast(*body);
IR::fun(Var::new(ir_var, ir_ty), ir_body)
}
More code, but there are no surprises here.
We deconstruct our Fun
node, lower all its parts, and recombine them into an IR Fun
.
Last but not least, App
:
Ast::App(fun, arg) => {
let ir_fun = self.lower_ast(*fun);
let ir_arg = self.lower_ast(*arg);
IR::app(ir_fun, ir_arg)
}
Straightforward again, we lower our part and recombine them into an IR App
.
It feels almost like we’ve done nothing at all. Each of our cases turns an AST node into an IR node of the same name. This is rooted in our AST being lambda calculus and our IR being System F. Translating the lambda calculus into System F, a superset of the lambda calculus, requires little work.
Lowering Totality Link to heading
With lower_ast
complete, we return to lower
.
One task remains.
Just as we wrapped our lowered type in TyFun
nodes, we have to wrap our lowered Ast
in TyFun
nodes:
fn lower(ast: Ast<TypedVar>, scheme: ast::TypeScheme) -> (IR, Type) {
let (ir_ty, types) = lower_ty_scheme(scheme);
let mut lower_ast = LowerAst {
supply: VarSupply::default(),
types,
};
let ir = lower_ast.lower_ast(ast);
// New!
let bound_ir =
(0..lower_ast.types.env.len())
.fold(ir, |ir, _| IR::ty_fun(Kind::Type, ir));
(bound_ir, ir_ty)
}
The most important section of lowering resides in this innocuous bit of code. Terms in the AST have no concept of types. They show up purely in type schemes and types. In our IR, by contrast, we have a term that binds types at the term level.
The consequences of representing types in our term are far-reaching.
If an optimization removes ty_fun
s from a term, we know we’re removing polymorphism.
If we see a term with no ty_fun
, we know that term can’t be generic.
Determining where polymorphism resides is instrumental to monomorphization
and emitting machine code.
Two passes we’ll visit before completing our simple compiler.
Our lower
function now translates our implicitly typed AST into our explicitly typed IR.
Lowering may look ceremonial here, but rest assured it will get more interesting as we add more features from our fancier type checkers
.
Even without fancy features, we’ve taken another step towards completing our compiler.