# RakuAST

RakuAST is an in-development proposed AST for the Raku language (if you are unfamiliar with ASTs, think of it as a document object model for Raku code). The intention is for it to become part of the language specification, and for the tests covering it (in `t/12-rakuast/` so far) to become specification tests.

## Status

RakuAST is currently in the early design/implementation phase. Absolutely everything is, at this point, speculative. More is missing than is done.

The files in this directory are translated into NQP. The bodies of the methods *are* NQP, while the signatures and class bits are parsed and then built up using the Raku MOP and meta-objects, so that they introspect and fit into the type system like normal Raku objects.

The overall approach to RakuAST is to:

1. Flesh out a decent amount of the object model and test-cover it via `EVAL` of the ASTs.
2. Then see about having alternative actions/world that, if an env var is set, produce RakuAST and compile from that. (This is where we are right now.) Currently, this variable is `RAKUDO_RAKUAST`.
3. Gradually work our way through passing the Rakudo tests with that env var set.
4. And then the spectests.
5. And then make it the default (hopefully this means we will have few problems left to discover when we throw `CORE.setting` at it!)
6. Implement a RakuAST-based optimizer.
7. Optimize RakuAST itself.
8. Ship it!

## If you want to help...

Designing and implementing the nodes themselves is, for now, probably best left to jnthn, in the hope that we achieve some consistency there, especially at this fairly early stage. However, for those eager to join in, there are still quite a few other things to do that will be helpful. Specifically:

* Fix `make test`, which is grumpy because we're adding all the new `RakuAST` nodes and they're visible in the setting, but it wants to test that there are no new symbols (which, well, there are).
  (done, for now at least)
* Get the compiler ID to factor in the AST nodes, so that a re-compile with only AST changes will invalidate previous pre-comps, and we don't have to remove `lib/.precomp/` after such a build. (done)
* Fix build system issues (it doesn't rebuild if the AST compiler changes, etc.). (Difficulty: well, it involves a build system...)
* Get the AST compiler to support roles, and gradually transition the things that should be roles to actually be roles. (Difficulty: maybe headachey, but you'll live)
* Make the AST compiler support return types with `-->` and add them to the signature that is generated. Make accessors get these automatically based on the declared type. (Difficulty: not so bad.)
* Make us check the types that are passed to methods. (Difficulty: depends how we decide to do it. It may be that we just get NQP to do the type checks and then rely on that. In fact, we could teach it to decont incoming arguments too, and support `is raw`, and then we get to clean up lots of explicit deconts in the bootstrap, MOP, etc. That would also let us simplify the RakuAST compiler.)
* Make us indicate slurpiness when signatures are introspected. (Difficulty: easy, just need to make sure the AST compiler passes that along when we build the `Parameter` object.)

And in general:

* While test-first development is being practiced, it's still possible to miss coverage; feel free to add more tests.
* Try it out, break it, etc. (Please don't report things that are missing at this point. Most things are, so it's not that useful.
  :-))

## RakuAST goals

* Model things at the language level, without normalizing too strongly (so, almost the opposite of the QAST design goal)
* Everything that happens with the AST up to and including `CHECK` time is for userspace; everything after it is not (the content of a `quasi` is not RakuAST for these purposes)
* RakuAST can be constructed directly or using `quasi`s
* RakuAST lets us use Raku as a frontend (example: a web framework could take a regex used for validation and translate the possible subset into something suitable for use with the `pattern` attribute in HTML, to naturally give client-side validation)
* RakuAST lets us use Raku as a backend (example: compile a JavaScript-style regex from a JSON schema into `RakuAST` nodes rather than Raku source)

## General API design notes

* RakuAST nodes are constructed by passing named arguments, *unless* they have a single attribute to be constructed with, in which case they take it as a single positional argument.
* Use Raku naming conventions (kebab-case names, etc.)
* Use names that already exist for concepts, whether that is what they are called in the MOP or in the grammar.
* Everything should be achievable through tree construction. The user should not have to do any linking themselves. Resolutions imply a graph, but are a level atop that, and many use-cases outside of the compiler will never have to explicitly participate in that.
* Nodes are free to cache things, but any laziness inside the nodes should be threadsafe. Benign races are fine (e.g. two threads both calculate the same thing and one wins at installation). Effectively, anything perceived as a read operation should be safe in a threaded program.

## Design notes on specific topics

### The compile-time challenge

Today in Rakudo, we do a lot of declarative things "as soon as possible", such as creating meta-objects. We then poke those into symbol tables.
Macros mean that we have to have much more capability to delay that, because, for instance, a `class` declaration written in a `quasi` must spring into existence each time the `quasi` is instantiated.

At the same time, activities at `BEGIN` time are well known as having effects on the parsing and compilation following them within the compilation unit. At the simplest level, knowing which things are types, which are terms, and which are subs is critical to being able to parse Raku code (for example, `if foo { }` is going to pass the block to a `sub` `foo`, but would be the block of the `if` when `foo` is a term). More generally, `BEGIN` time may do things that impact upon the future parse in any way whatsoever.

For the sake of a `quasi`, we'll need to restrict this, with `BEGIN` even deferred to splice time. Effectively, the set of language changes you can effect within a `quasi` is far more restricted. One particular challenge that will come up here is that of `use` or `import` statements in a `quasi`; since we don't do the `use` or `import` until splice time, we don't have the information to make the function vs. term distinction. An easy way out of this in many cases will be to do the `use` in the `macro` itself, in which case the imports are *fixated* and so can be referenced within the `quasi` unambiguously.

Outside of a `quasi`, certain `BEGIN`-time things will act as "sequence points", forcing the formation of meta-objects and the enactment of compile-time work. These include:

* `use` statements
* A `BEGIN` phaser
* A `constant` with a non-literal value [Conjectural: this may be possible to relax if we can understand something about the needs of its value.]

RakuAST elements that will create a meta-object all do the type `RakuAST::Meta`. Such an element may be born in a quasi or non-quasi state (it is born in the `quasi` state if it is written inside of a `quasi`, and transitions to a non-quasi state upon quasi quote interpolation).
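The lazily produced, cached meta-object described here (together with the `meta-object` and `has-meta-object` methods listed next) can be sketched in plain Raku. This is a hypothetical illustration, not Rakudo's implementation: the `Sketch::` names and the `PRODUCE-META-OBJECT` hook are made up. A `Lock` is used for thread safety purely for simplicity; a compare-and-swap that lets racing producers settle on one winner would also fit the "benign races are fine" guideline.

```raku
# Hypothetical sketch of the RakuAST::Meta contract; the Sketch:: names and
# PRODUCE-META-OBJECT are illustrative, not Rakudo's actual implementation.
role Sketch::Meta {
    has Bool $.in-quasi is rw = False;   # True while the node lives inside a quasi
    has Bool $!produced = False;         # a meta-object may be an (undefined) type
    has Mu   $!meta-object;              # object, so "produced" is tracked separately
    has Lock $!lock = Lock.new;          # keeps the lazy production threadsafe

    # Each node kind says how to build its own meta-object.
    method PRODUCE-META-OBJECT() { ... }

    method meta-object() {
        die "Cannot obtain a meta-object while in quasi state" if $!in-quasi;
        $!lock.protect: {
            unless $!produced {
                $!meta-object = self.PRODUCE-META-OBJECT;
                $!produced = True;
            }
            $!meta-object
        }
    }

    method has-meta-object(--> Bool) {
        $!lock.protect: { $!produced }
    }
}

# Hypothetical node whose meta-object is a freshly created class.
class Sketch::ClassDecl does Sketch::Meta {
    has Str $.name is required;
    has Int $.times-produced = 0;

    method PRODUCE-META-OBJECT() {
        $!times-produced++;
        my \type = Metamodel::ClassHOW.new_type(name => $!name);
        type.^compose;
        type
    }
}

my $decl = Sketch::ClassDecl.new(name => 'Point');
say $decl.has-meta-object;      # False
my $type = $decl.meta-object;   # produced on first demand...
$decl.meta-object;              # ...and served from the cache afterwards
say $type.^name;                # Point
say $decl.times-produced;       # 1
```

A real node would also trigger production of dependent meta-objects (signature, parameters) inside its own production step. Because each node has its own lock, such nested demands do not contend with one another.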
Such `RakuAST::Meta` elements have the methods:

* `meta-object` - throws an exception if the element is in quasi state. In a non-quasi state, returns the meta-object that the node describes, producing and caching it if it has not already been produced.
* `has-meta-object` - returns `True` if the meta-object has already been produced, and `False` otherwise

The production of the meta-object may entail the production of dependent meta-objects as a side-effect (for example, demanding the meta-object of a routine in turn requires the signature meta-object, which in turn requires the parameter meta-objects). The production of meta-objects requires that all references involved have been resolved.

Sometimes, we need to establish circular relationships between meta-objects. In that case, we have stubby meta, which allows us to separate stubbing the meta-object from finalizing it.

### Symbol resolution

#### Declarations

Every declaration of a symbol is a `RakuAST::Declaration`. Declarations are subject to instantiation when located within a `quasi`, being cloned just like anything else.

#### Lookups

Everything that is a reference to a symbol that is to be resolved is a `RakuAST::Lookup`. The resolution is supplied by a `RakuAST::Declaration`. Note that resolution to a declaration does not imply there's no runtime lookup involved. For example, a resolution that points to a lexical declaration will be subject to lexical lookup per scope instantiation at runtime.

Not all lookups need a resolution before runtime. In fact, for a dynamic variable like `$*foo` it's usually not possible. A given lookup can indicate whether it must have been resolved.
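Here is a rough, hypothetical sketch of how declarations, lookups and a resolution context could fit together, covering the lookup API listed next. The `Sketch::` classes and the `resolve-lexical` helper are invented names. The toy context only performs lexical lookups, whereas the resolution contexts described below also have access to the global symbol table and package lookups.

```raku
# Hypothetical sketch of declarations, lookups and a resolution context.
# The Sketch:: classes and resolve-lexical are illustrative names only.
class Sketch::Declaration {
    has Str $.name is required;
}

# A toy resolution context that only knows about lexical scopes,
# listed innermost first, each a Hash of name => Sketch::Declaration.
class Sketch::ResolutionContext {
    has @.scopes;

    method resolve-lexical(Str $name) {
        for @!scopes -> %scope {
            return %scope{$name} if %scope{$name}:exists;
        }
        Nil
    }
}

class Sketch::Lookup {
    has Str  $.name is required;
    has Bool $.needs-resolution = True;   # False for e.g. a dynamic like $*foo
    has $!resolution;

    method is-resolved(--> Bool) { $!resolution.defined }

    method resolution() {
        $!resolution // die "Lookup of '$!name' has not been resolved"
    }

    method set-resolution(Sketch::Declaration:D $decl) {
        $!resolution = $decl;
    }

    method resolve(Sketch::ResolutionContext:D $context) {
        my $decl = $context.resolve-lexical($!name);
        if $decl.defined {
            self.set-resolution($decl);
        }
        elsif $!needs-resolution {
            die "Could not resolve '$!name' at compile time";
        }
    }
}

my $x   = Sketch::Declaration.new(name => '$x');
my $ctx = Sketch::ResolutionContext.new(scopes => [{ '$x' => $x },]);

my $lookup = Sketch::Lookup.new(name => '$x');
$lookup.resolve($ctx);
say $lookup.is-resolved;    # True

my $dynamic = Sketch::Lookup.new(name => '$*foo', :!needs-resolution);
$dynamic.resolve($ctx);     # no error: left entirely until runtime
say $dynamic.is-resolved;   # False
```

The dynamic lookup, created with `:!needs-resolution`, is simply left unresolved rather than raising an error. Only a lookup that must be resolved at compile time fails when the context cannot find it.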
The API of a lookup is:

* `needs-resolution` - returns `True` if the lookup must be resolved at compile time, and `False` if it's happy being entirely left until runtime
* `is-resolved` - returns `True` if the lookup has a resolution
* `resolution` - gets the resolution if there is one, or throws an exception if not
* `set-resolution($decl)` - sets the resolution to a given `RakuAST::Declaration`
* `resolve($context)` - resolves the lookup using the provided resolution context

### Resolution contexts

A `RakuAST` node does not know about its context, and so is not able to resolve itself on its own. A resolution context provides the things needed in order to do that. It is able to perform lexical lookups, as well as having access to the current global symbol table. It can also resolve multi-level package lookups.

There will be multiple different resolution context implementations suited to different cases (for example, the one used when parsing a `quasi` will resolve variables to their fixations). So far we just have the `EVAL` resolver, which is for when we have an AST and so aren't doing any parsing.

### Scopes

A `RakuAST::LexicalScope` models something that implies a lexical scope upon the lexical declarations within it. It can be called upon to locate all of the lexical declarations made immediately within it (excluding those within nested scopes). It is free to make a cache of these to avoid having to search each time, allowing for faster resolution of symbols. It is up to tree modifiers to invalidate such caches.

### Provenance

All nodes will have an optional "provenance" attribute (name TBD), which answers "where did I come from?". By default we'll only attach such information to some elements (enough to provide line information at statement level, as today), but we will support a detailed mode where the exact textual positions of all elements can be recovered, as well as the original program text.
This will be useful for those writing tools such as code tidiers and syntax highlighters.
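The following example makes the construction conventions from the General API design notes concrete, and also shows the `EVAL`-based test approach from the Status section. It assumes a later Rakudo release in which the RakuAST classes are exposed via `use experimental :rakuast`. The node names used (`RakuAST::IntLiteral`, `RakuAST::Infix`, `RakuAST::ApplyInfix`) come from that implementation and may differ from what exists at this early stage.

```raku
use experimental :rakuast;   # exposes the RakuAST:: classes in recent Rakudo releases
use MONKEY-SEE-NO-EVAL;      # EVAL with a non-literal argument must be explicitly allowed

# A node constructed with a single attribute takes it positionally...
my $left  = RakuAST::IntLiteral.new(44);
my $right = RakuAST::IntLiteral.new(22);

# ...while a node with several attributes takes named arguments.
my $ast = RakuAST::ApplyInfix.new(
    left  => $left,
    infix => RakuAST::Infix.new('+'),
    right => $right,
);

# Compile and run the tree directly, with no Raku source text involved.
say EVAL($ast);   # 66
```

No linking step is needed. The tree is built purely by construction, and any resolution happens when it is compiled.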