Exploring inband lifetimes by converting librustc_mir
The diff from the conversion can be found here.
General Feeling
Inband lifetimes is a limited change, and does not feel like it greatly enhances code. However, it also doesn’t hurt much and feels slightly better in many cases.
However, there are numerous edge cases and slight pain points, many having to do with a lack of known standard ways to do things. As such, many of the edge cases are likely to fall away as we develop after stabilization and come up with standard methods to work with the new feature.
The primary work to migrate is essentially just deleting ~all lifetime headers (<'a, 'b, 'c>
)
across impls and functions. More intensive migration would involve replacing untied/single-use
lifetimes with '_
in all cases. This is quite hard to do from a person perspective (though
compiler can likely do so fairly easily).
Concern points
Determining what a lifetime’s “scope” is
The problem here lies in 'tcx
-like lifetimes which are named repeatedly, e.g. the 'tcx
lifetime
below is distinct from the 'tcx
lifetime in the parent.
Inband lifetimes remove the requirement to declare lifetimes at each scope. This is certainly a win for horizontal codespace and for ease of adding a lifetime (two places, input and output, instead of three) but is a loss for determining whether the lifetime being referenced comes from the function, that is, is freshly declared, related to the lifetime of each call or comes from a parent impl block.
# 2015
impl<'a, 'tcx> Foo for Bar<'a, 'tcx> {
fn foo(&self) {
// ... code ...
fn baz<'a, 'tcx>(baz: &'a mut Bar<'a, 'tcx>) -> &'a mut Vec<u32> {
unimplemented!()
}
}
}
# 2018
impl Foo for Bar<'a, 'tcx> {
fn foo(&self) {
// ... code ...
// where does this 'a refer to? In order to know, I need to look at indent levels essentially --
// harder than before, to an extent.
fn baz(baz: &'a mut Bar<'a, 'tcx>) -> &'a mut Vec<u32> {
unimplemented!()
}
}
}
Another case that introduces friction (especially, I think, for new users):
# 2018
impl MirBorrowckCtxt<'cx, 'gcx, 'tcx> {
fn report_use_of_moved_or_uninitialized(
&mut self,
curr_move_out: &FlowAtLocation<MovingOutStatements<'_, 'gcx, 'tcx>>,
) {
unimplemented!()
}
}
The &FlowAtLocation<MovingOutStatements<'_, 'gcx, 'tcx>>
type for curr_move_out
: what does the
'_
refer to here? Some new lifetime akin to what in 2015 would have 'a
on the fn
,
not the impl
? A new user might become confused, while “experienced” Rust programmers would likely
relatively quickly determine that this is equivalent to <'a>
on the fn in Rust 2015. However, it
would take a slight amount of thought
This specific pain point is grouped in with determining scope because previously such a lifetime
could always be ‘looked up’ whereas now '_
is never declared.
Need for '_
in already borrowed data
Consider &Mir<'_>
: what benefit does having the '_
there give us? The user is given no
additional information, because the &
already indicates that the type is borrowed data for some
lifetime. The reason we omit requiring '_
on &
and &mut
is because it already
obvious to the reader that there is a non-‘static lifetime there.
Can we do the same for any type that is explicitly behind &
/&mut
as well? This seems a natural
extension of the rules, though perhaps may not communicate quite as much information when
copying/moving code around.
The ever-prevalent 'a
lifetime
fn mir_borrowck<'a, 'tcx>(tcx: TyCtxt<'a, 'tcx, 'tcx>, def_id: DefId) -> BorrowCheckResult<'tcx>
This definition uses the typical 'a
lifetime to indicate the TyCtxt’s internal borrows, because
TyCtxt
is actually just a pair of references. The straightforward way of rewriting this is to
simply omit the lifetime declarations:
fn mir_borrowck(tcx: TyCtxt<'a, 'tcx, 'tcx>, def_id: DefId) -> BorrowCheckResult<'tcx>
I would argue though that this should be unidiomatic. Instead, a better way of rewriting this function would be:
fn mir_borrowck(tcx: TyCtxt<'_, 'tcx, 'tcx>, def_id: DefId) -> BorrowCheckResult<'tcx>
The initial intuition here is that any lifetime that is unnamed in all code should be replaced with '_
.
However, I’m not sure that '_
is quite the best way to do this. It might be that I’m just not used
to this yet, but today, at least, whenever you see a 'a
lifetime that generally just means that
either an explicit lifetime is necessary to tie a few lifetimes together, or you’re in a struct/impl
declaration with one lifetime.
Some part of me really likes 'a
over '_
because to me, seeing a
there immediately communicates
the unimportance of the lifetime, and lack of meaning attached to the lifetime. Unlike a
, though,
_
stands out quite a bit, especially in (most) fonts where the '
and _
symbols are quite far
apart vertically, and is harder to ignore. It also looks more sigil-y when compared to 'a
; '_
looks far more special than 'a
. In some ways it is, but it’s entire point is to de-emphasize
itself.
All this is somewhat nebulous; it’s unclear whether this is an initial feeling and we’ll all get
used to seeing 'a
replaced with '_
. However, it’s also not completely obvious that such a
replacement is always possible: for example, if the above function returned something more
complicated:
# 2015
fn mir_borrowck<...>(tcx: TyCtxt<'a, 'tcx, 'tcx>, def_id: &DefId) -> BorrowCheckResult<'a, 'tcx>`
# 2018 (but doesn't compile)
fn mir_borrowck(tcx: TyCtxt<'_, 'tcx, 'tcx>, def_id: &DefId) -> BorrowCheckResult<'_, 'tcx>`
This function can’t be written with '_
in the output, because that would be ambiguous. But, we
have no better name to give it. We want 'a
but that is (per the suggested rules) generally not the
best idea, might lead to needing to allow lints, that sort of thing.
I believe that in the past people have simply gone with 'a
here and if necessary cycled to 'b'
,
'c
, etc. as they need more lifetimes, but the new approach tends away from that, tending towards
trying to name all lifetimes explicitly.
The overall conclusion here is that '_
is primarily only usable for eliding any name in the input
parameters. Elision through it in the output parameters is likely to fail with time, as the above
code shows. This makes me believe that a valid approach may be to limit the use of '_
to input
arguments, and disallow it in the output parameters.
Does this mean we shouldn’t support '_
at all? I would argue that the answer is yes. In all cases
I’ve come up with, replacing '_
with an explicit tie to the input parameters isn’t that much more
work, and communicates intent far better. This is especially true to a beginner, who may believe
that '_
is special (it, afterall, looks special) whereas in most cases seeing the explicit
lifetime tie will be much clearer and less magic-feeling. Lifetimes are an area of the language
where beginners are often already confused, so eliminating magic seems like a good general sense. We
can also always permit elision through '_
in the future.
Closure Confusion
In the case where closure parameters are declared with lifetime names, it is unclear that these are
not ‘fresh’ names. See, for example, |mir: &mut MirBorrowckCtxt<'cx, 'gcx, 'tcx>|
: is the closure
generic over these lifetimes, or is it taking them from the parent scope? That is, are we treating
the function somewhat like an impl
block and inheriting lifetimes from it, or is the closure more
like fn foo<'cx, 'gcx, 'tcx>
?
Inband lifetimes make the distinction in a closure less clear, as especially new users may see these
‘cx, ‘gcx, and ‘tcx as re-declaring new lifetimes. This is especially true where 'a
or similar
generic names. Previously we’d be able to say that you can look for <...>
to find the
“declaration” and no other place could declare lifetimes or types. This is no longer the case.
This also means that rewriting a closure without captures to a fn
was never lossless, but
previously the compiler would say “please declare these lifetimes” when making such an attempt. Now,
no such warning/error is issued, but behavior has changed.
Lifetimes cannot use keyword names
For example, fn outgoing_edges(&'self self) -> Edges<'self>
is not permitted; which is not all
that nice. It seems like inband lifetimes are going to drive people towards wanting to use keywords
in lifetime names and it’s rather ugly to do 'self_
or the compiler’s semi-convention of this
.
I suppose the primary use case for reserving 'self
as a lifetime is to later be able to use it for
safe self-referential structs and the like, but perhaps we can allow it inside functions? There may
be a good middle ground here.
Repeated use of '_
For example, TyCtxt<'_, '_, '_>
. This looks somewhat like it’s tying the lifetimes of the TyCtxt
together, though of course after we explain that '_
is special then perhaps you won’t think that.
But it does seem like another case where '_
feels possibly harmful.
But it is actually quite common to see cases where nothing actually cares that the lifetimes are
tied together. For example, often you see code take a TyCtxt, where unless you need it to be
local (via 'tcx
being passed into both the 'gcx
and 'tcx
slots) you likely don’t actually care
what the lifetimes are; you just need three distinct lifetimes. This use case seems like a
surface-level perfect fit for '_
.
When named lifetimes don’t matter
Consider &Place<'tcx>
inside a function that also takes a TyCtxt
with that lifetime or impl
block of similar nature. There’s nothing you can actually do with the Place
that would use the
fact that it’s lifetime parameter is 'tcx
in most cases (if it’s variant over that lifetime, of
course), which means that this feels like it should instead be &Place<'_>
. But is that misleading?
All Places likely want to be 'tcx
in practice, so perhaps there’s no reason for us to elide the
lifetime there.
impl<'tcx> ToRegionVid for &'tcx RegionKind
uses 'tcx
, but this code arguably should be '_
or
'a
: the lifetime of the reference is absolutely unimportant.
Also Ty<'tcx>
as a param where the output is bool
or some such.
This is actually quite common in rustc code, especially where a type could contain tcx data but in this case contains static data (e.g., enum with a reference in one variant, u32 in the other).
Replacing multiple ‘a, ‘b with ‘_ is really nice
impl<'a, 'b, 'gcx, 'tcx> TypeOutlivesDelegate<'tcx> for &'a mut ConstraintConversion<'b, 'gcx, 'tcx>
impl TypeOutlivesDelegate<'tcx> for &'_ mut ConstraintConversion<'_, 'gcx, 'tcx>
This is really nice, and a perfect use case for '_
: the lifetimes don’t matter, but we need to
specify them. More so, using '_
also gives the added benefit that using the lifetime becomes
impossible: you can’t refer to '_
from inside the impl block.
Error: lifetimes used in fn
or Fn
syntax must be explicitly declared using <...>
binders
Quite unexpected, and especially awkward when the lifetime must be declared at impl scope, causing all other lifetimes declared to no longer be able to use inband.
See also this issue, which indicates that this is expected behavior.
When you have lots of functions taking ‘a
Harder to tell if all of these are bound to some root struct/scope. Previously you’d declare them
via <'a>
at each function.
Thoughts for standard choices
Tie lifetime naming
Anytime a need arises to tie lifetimes together, use the argument name instead of 'a
or similar if
no better name exists (e.g., 'tcx
can be a better name). This makes it easier to speak about the
lifetimes.
Unused lifetimes in impl blocks
Consider impl Iterator for Edges<'_>
: is that preferable to impl Iterator for Edges<'graph>
?
I posit that the answer is no: explicitly stating the lifetime provides better context and more information to the user. However, it is more verbose.
I do think that in general we should eventually lint for “unused” lifetimes in impls as well,
suggesting '_
if that’s the pattern we recommend. This is made more difficult by the fact that
today we do not lint at all for unused lifetimes.
However, this distinction becomes more murky with structs that are essentially bags of references,
where the lifetime is actually uninteresting and has no good name, e.g., for impl Iterator for
TyCtxt<'a, 'gcx, 'tcx>
: the 'a
there is uninteresting and would like be better replaced with
'_
.
Single-use lifetimes are actually quite common
TyCtxt<'a, 'gcx, 'tcx>
is often used in old code and can often be replaced with
TyCtxt<'_, '_, 'tcx>
. It is unclear at this point whether doing so in all cases is beneficial.