## $Id$ =head1 RAKUDO COMPILER OVERVIEW F =head2 How the Rakudo compiler works This document describes the architecture and operation of the Rakudo Raku (or simply Rakudo) compiler. The F describes how to build and run Rakudo. Rakudo has six main parts summarized below. Source code paths are relative to Rakudo's F directory, and platform specific filename extensions such as F<.exe> are sometimes omitted for brevity. =over 4 =item 1. Not Quite Perl builds Raku source code parts into Rakudo =item 2. A main program drives parsing, code generation and runtime execution (F) =item 3. A grammar parses user programs (F) =item 4. Action methods build a Parrot Abstract Syntax Tree (F) =item 5. Parrot extensions provide Raku run time behavior (F, F, F) =item 6. Libraries provide functions at run time (F, F, F, F, F) =back The F (generated from F<../tools/build/Makefile.in> by F<../Configure.pl>) compiles all the parts to form the F executable and the F or F "fake executable". We call it fake because it has only a small stub of code to start the Parrot virtual machine, and passes itself as a chunk of bytecode for Parrot to execute. The source code of the "fakecutable" is generated as F with the stub at the very end. The entire contents of F are represented as escaped octal characters in one huge string called C. What a hack! =head2 1. NQP The source files of Rakudo are preferably and increasingly written in Perl, the remainder in Parrot Intermediate Representation (PIR) or C. Not Quite Perl (nqp) provides the bootstrap step of compiling compiler code (yes!) written in a subset of Perl, into PIR. The latest version of NQP includes the I<6model> library, which is the building block for all Raku objects. It also comes with a regex engine that Rakudo uses. NQP is a bootstrapped compiler, it is mostly written in NQP. The source code of NQP is in a separate repository at L. Note, NQPx only I the Rakudo compiler, and does not compile or run user programs. =head3 Stages NQP compiles us a compiler in F<../perl6.pbc> and then F<../perl6> or F<../perl6>. NQP also compiles the I found in F. This is a library that controls how classes, methods, roles and so on work. The bare-bones compiler then loads the compiled metamodel, and compiles the I found in F. Those core files provide the runtime library (like the C and C classes). But note that many of these classes are also used when the final compiler processes your Raku scripts. =head2 2. Compiler main program A subroutine called C<'MAIN'>, in F, starts the source parsing and bytecode generation work. It creates a C object for the C<'raku'> source type. Before tracing Rakudo's execution further, a few words about Parrot process and library initialization. Parrot execution does not simply begin with 'main'. When Parrot executes a bytecode file, it first calls all subroutines in it that are marked with the C<:init> modifier. Rakudo has over 50 such subroutines, brought in by C<.include> directives in F, to create classes and objects in Parrot's memory. Similarly, when the executable loads libraries, Parrot automatically calls subs having the C<:load> modifier. The Rakudo C<:init> subs are usually also C<:load>, so that the same startup sequence occurs whether Rakudo is run as an executable or loaded as a library. So, that Rakudo 'main' subroutine had created a C object. Next, 'main' invokes the C<'command_line'> method on this object, passing the command line arguments in a PMC called C. The C<'command_line'> method is inherited from the C parent class (part of the PCT, remember). And that's it, apart from a C<'!fire_phasers'('END')> and an C. Well, as far a C<'main'> is concerned. The remaining work is divided between PCT, grammar and actions. =head2 3. Grammar Using C, C target C uses F to compile F to F. The compiler works by calling C method in F. After some initialization, TOP matches the user program to the comp_unit (meaning compilation unit) token. That triggers a series of matches to other tokens and rules (two kinds of regex) depending on the source in the user program. For example, here's the parse rule for Rakudo's C statement (in F): token statement_control:sym { :s [ || <.panic: 'unless does not take "else", please rewrite using "if"'> ] } This token says that an C statement consists of the word "unless" (captured into C<< $ >>), and then an expression followed by a block. The C<.panic:> is a typical "Awesome" error message and the syntax is almost exactly the same as in F, described below. Remember that for a match, not only must the C<< >> match the word C, the C<< >> must also match the C token. If you read more of F, you will learn that C in turn tries to match an C<< >> and a C<< >>, which in turn tries to match ..... That is why this parsing algorithm is called Recursive Descent. The top-level portion of the grammar is written using Raku rules (Synopsis 5) and is based on the STD.pm grammar in the C repository (L). There are a few places where Rakudo's grammar deviates from STD.pm, but the ultimate goal is for the two to converge. Rakudo's grammar inherits from PCT's C, which provides the C<< <.panic> >> rule to throw exceptions for syntax errors. =head2 4. Actions The F file defines the code that the compiler generates when it matches each token or rule. The output is a tree hierarchy of objects representing language syntax elements, such as a statement. The tree is called a Parrot Abstract Syntax Tree (PAST). The C class inherits from C, another part of the Parrot Compiler Toolkit. Look in F<../parrot/ext/nqp-rx/stage0/src/HLL-s0.pir> for several instances of C<.namespace ["HLL";"Actions"]>. When the PCT calls the C<'parse'> method on a grammar, it passes not only the program source code, but also a pointer to a parseactions class such as our compiled C. Then, each time the parser matches a named regex in the grammar, it automatically invokes the same named method in the actions class. Back to the C example, here's the action method for the C statement (from F): method statement_control:sym($/) { my $past := xblock_immediate( $.ast ); $past.pasttype('unless'); make $past; } When the parser invokes this action method, the current match object containing the parsed statement is passed into the method as C<$/>. In Raku, this means that the expression C<< $ >> refers to whatever the parser matched to the C token. Similarly there are C<< $ >> and C<< $ >> objects etc until the end of the recursive descent. By the way, C<< $ >> is Raku syntactic sugar for C< $/{'xblock'} >. The magic occurs in the C<< $.ast >> and C expressions in the method body. The C<.ast> method retrieves the PAST made already for the C subtree. Thus C<$past> becomes a node object describing code to conditionally execute the block in the subtree. The C statement at the end of the method sets the newly created C node as the PAST representation of the unless statement that was just parsed. The Parrot Compiler Toolkit provides a wide variety of PAST node types for representing the various components of a HLL program -- for more details about the available node types, see PDD 26 ( L ). The PAST representation is the final stage of processing in Rakudo itself, and is given to Parrot directly. Parrot does the remainder of the work translating from PAST to PIR and then to bytecode. =head2 5. Parrot extensions Rakudo extends the Parrot virtual machine dynamically (i.e. at run time), adding 14 dynamic opcodes ("dynops") which are additional virtual machine code instructions, and 9 dynamic PMCs ("dynpmcs") (PolyMorphic Container, remember?) which are are Parrot's equivalent of class definitions. The dynops source is in F, which looks like C, apart from some Perlish syntactic sugar. A F<../parrot_install/bin/ops2c> desugars that to F which your C compiler turns into a library. For this overview, the opcode names and parameters might give a vague idea what they're about: rakudo_dynop_setup() rebless_subclass(in PMC, in PMC) find_lex_skip_current(out PMC, in STR) x_is_uprop(out INT, in STR, in STR, in INT) get_next_candidate_info(out PMC, out PMC, out PMC) transform_to_p6opaque(inout PMC) deobjectref(out PMC, in PMC) descalarref(out PMC, in PMC) allocate_signature(out PMC, in INT) get_signature_size(out INT, in PMC) set_signature_elem(in PMC, in INT, in STR, in INT, inout PMC, inout PMC, inout PMC, inout PMC, inout PMC, inout PMC, in STR) get_signature_elem(in PMC, in INT, out STR, out INT, out PMC, out PMC, out PMC, out PMC, out PMC, out PMC, out STR) bind_signature(in PMC) x_setprophash(in PMC, in PMC) The dynamic PMCs are in F, one file per class. The language is again almost C, but with other sugary differences this time, for example definitions like C whose purpose will appear shortly. A F<../parrot_install/lib/x.y.z-devel/tools/build/pmc2c.pl> converts the sugar to something your C compiler understands. For a rough idea what these classes are for, here are the names: P6Invocation P6LowLevelSig MutableVAR Perl6Scalar ObjectRef P6role Perl6MultiSub Perl6Str and P6Opaque. =head3 Binder The dynops and the dynpmcs call a utility routine called a signature binder, via a function pointer called C. A binder matches parameters passed by callers of subs, methods and other code blocks, to the lexical names used internally. Parrot has a flexible set of calling conventions, but the Perl 6 permutations of arity, multiple dispatch, positional and named parameters, with constraints, defaults, flattening and slurping needs a higher level of operation. The answer lies in F which is compiled into C and C libraries. Read L for a more detailed explanation of the binder. F has three C<.loadlib> commands early on. The C loads the 9 PMCs, the C does the 14 dynops, and the C adds over 30 mathematical operators such as C, C, C, C
, C, C, C, C etc. (source in F) =head2 6. Builtin functions and runtime support The last component of the compiler are the various builtin functions and libraries that a Raku program expects to have available when it is running. These include functions for the basic operations (C<< infix:<+> >>, C<< prefix: >>) as well as common global functions such as C and C. The stage-1 compiler compiles these all and they become part of the final F. The source code is in F, F, F, F and F. =head2 Still to be documented * Rakudo PMCs * The relationship between Parrot classes and Rakudo classes * Protoobject implementation and basic class hierarchy =head1 AUTHORS Patrick Michaud is the primary author and maintainer of Rakudo. The other contributors and named in F. =head1 COPYRIGHT Copyright (C) 2007-2010, The Perl Foundation. =cut # Local Variables: # fill-column: 100 # End: # vim: expandtab shiftwidth=4: