------------------------------- Page    i -------------------------------

                    A Tour through the UTS C Compiler




                                                 D. M. Ritchie



                                                 Edited for UTS

------------------------------- Page   ii -------------------------------

                            TABLE OF CONTENTS


1.    Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . .   1

2.    The Intermediate Language . . . . . . . . . . . . . . . . . . .   1

3.    Expression Optimization . . . . . . . . . . . . . . . . . . . .   6

4.    Code Generation . . . . . . . . . . . . . . . . . . . . . . . .   8

5.    Delaying and reordering . . . . . . . . . . . . . . . . . . . .  19


                                                            Last Page  21

-------------------------------- Page  1 --------------------------------

1.    PREFACE

This document describes the C compiler currently running under UTS.  Most
of this document is taken  directly from 'A Tour through the Unix C  Com-
piler' written  by D.  M. Ritchie  of Bell  Labs.  Additional  statements
describing those features added for UTS have been included.  Also,  since
the compiler produces Amdahl-470 code and  not PDP-11 code, the text  has
been edited where appropriate.

Briefly, the current C compiler was built from the PDP-11 compiler  mark-
eted under UNIX-.   Obviously, major  modifications to  the code  tables,
together with adjustments to certain compiler routines, were necessary to
produce Amdahl-470 code.  However, except  for the code tables, the  com-
piler closely resembles the UNIX PDP-11 C compiler.




2.    THE INTERMEDIATE LANGUAGE

Communication between the two  phases of the  compiler proper is  carried
out by a pair of  intermediate files.  These files are treated as  having
identical structure, although the second file contains only the code gen-
erated for strings.

The intermediate language  is not machine-independent;  its structure  in
several ways reflects the fact that C was originally a one-pass compiler
chopped in two to reduce the  maximum memory requirement.  In fact,  only
the latest version of the  compiler has a complete intermediate  language
at all.  Until recently, the first phase of the compiler generated assem-
bly code for those constructions  it could deal with, and passed  expres-
sion parse trees, in absolute binary form,  to the second phase for  code
generation.  Now, at  least, all  interphase information is  passed in  a
describable form, and  there are  no absolute pointers  involved, so  the
coupling between the phases is not so strong.

The areas in which the machine (and system) dependencies are most notice-
able are

 1.  Storage allocation for automatic variables and arguments has already
     been done, and nodes for such variables refer to them by offset from
     a display pointer.  Type  conversion (for example,  from integer  to
     pointer) has already occurred using the assumption of byte  address-
     ing and 4 byte words.

 2.  Data representations  suitable to  the  Amdahl-470 are  assumed;  in
_______________
  -UNIX is a trademark of Bell Laboratories.

-------------------------------- Page  2 --------------------------------

     particular, floating point constants are passed as four words in the
     machine representation.

As it happens, each  intermediate file  is represented as  a sequence  of
binary numbers  without  any explicit  demarcations.   It consists  of  a
sequence of conceptual lines,  each headed by  an operator, and  possibly
containing various operands.  The operators are small numbers; to  assist
in recognizing failure in  synchronization, the high-order  byte of  each
operator word is always  the octal number  0376.  Operands are either  32
bit binary numbers  or strings  of characters  representing names.   Each
name is ended by a null character.  There is no alignment requirement for
numerical operands and so there is no padding after a name string.
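
The record layout just described can be illustrated by a small reader.
This is only a sketch under stated assumptions: the width of an operator
word (32 bits), the byte order (high-order byte first), and the use of
the low-order 24 bits as the operator number are not given in the text,
and the function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: operator words are 32 bits wide and stored high-order
   byte first.  The text states only that the high-order byte of an
   operator word is octal 0376. */
#define OP_SYNC 0376u

/* Assemble a 32-bit word from four bytes, high-order byte first.
   No alignment is assumed, since names are not padded. */
static uint32_t get_word(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Return the operator number, or -1 if the sync byte is missing. */
static int decode_operator(uint32_t word)
{
    if ((word >> 24) != OP_SYNC)
        return -1;
    return (int)(word & 0x00FFFFFFu);
}

/* Skip a null-terminated name starting at pos; return the offset of
   the byte just past the null. */
static size_t skip_name(const unsigned char *buf, size_t pos, size_t len)
{
    while (pos < len && buf[pos] != '\0')
        pos++;
    assert(pos < len);          /* a name must be terminated */
    return pos + 1;
}
```

Under these assumptions the bytes 0376 0 0 7 decode to operator 7, while
a word whose high-order byte is not 0376 is rejected as a loss of
synchronization.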

The binary  representation was  chosen to  avoid converting  to and  from
character form and to minimize  the size of the files.  It would be  easy
to make each operator-operand 'line' in the file be a genuine,  printable
line, with the numbers in  octal or decimal; this was the  representation
originally used.

The operators fall naturally into two classes: those that represent  part
of an  expression, and  all  others.  Expressions  are transmitted  in  a
reverse Polish notation; as they are being read, a tree is built that  is
isomorphic to the tree built in the first phase.  Expressions  are passed
as a  whole, with  no nonexpression  operators intervening.   The  reader
maintains a stack; each leaf  of the expression tree (name, constant)  is
pushed on the stack; each unary operator replaces the top of the stack by
a node  whose operand  is  the old  top of  stack;  each binary  operator
replaces the top pair on the stack with a single entry.  When the expres-
sion is complete there is exactly one item on the  stack.  Following each
expression is a special operator that passes the unique previous  expres-
sion to the 'optimizer' described below and then to the code generator.
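
The stack discipline used by the reader can be sketched as follows.  The
node layout, the stack size, and all names here are assumptions made for
illustration; the real second phase also records types and other
information in each node.

```c
#include <assert.h>
#include <stdlib.h>

enum kind { LEAF, UNARY, BINARY };

struct node {
    int op;                     /* operator or leaf code */
    struct node *left, *right;  /* operands; unused ones are NULL */
};

#define STACK_MAX 64
static struct node *stack[STACK_MAX];
static int sp;                  /* number of items on the stack */

static struct node *mknode(int op, struct node *l, struct node *r)
{
    struct node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->op = op;
    n->left = l;
    n->right = r;
    return n;
}

/* Process one expression operator read from the intermediate file. */
static void read_op(enum kind k, int op)
{
    struct node *l, *r;

    switch (k) {
    case LEAF:                  /* name or constant: push it */
        assert(sp < STACK_MAX);
        stack[sp++] = mknode(op, NULL, NULL);
        break;
    case UNARY:                 /* replace the top of the stack */
        assert(sp >= 1);
        stack[sp - 1] = mknode(op, stack[sp - 1], NULL);
        break;
    case BINARY:                /* replace the top pair by one entry */
        assert(sp >= 2);
        r = stack[--sp];
        l = stack[--sp];
        stack[sp++] = mknode(op, l, r);
        break;
    }
}

/* Called when EXPR is seen: exactly one item must be on the stack. */
static struct node *finish_expr(void)
{
    assert(sp == 1);
    return stack[--sp];
}
```

Reading the sequence a, b, +, unary minus in this way leaves a single
tree for '-(a+b)' on the stack, which finish_expr then hands on.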

Here is the list of operators not themselves part of expressions.

EOF
     marks the end of an input file.

BDATA flag data ...
     specifies a sequence of bytes to be assembled as static data.  It is
     followed by pairs of words; the first member of the  pair is nonzero
     to mean that the data continue; a zero flag is not followed by  data
     and ends the operator.  The data bytes occupy the low-order  part of
     a word.

SINIT value
     specifies the value of an initialized external integer or  unsigned.
     The name associated with the  NLABEL preceding SINIT is the name  of
     the initialized external variable.

PROG

-------------------------------- Page  3 --------------------------------

     means that later information is to be compiled as program text.

DATA
     means that later information is to be compiled as static data.

BSS
     means that  later information  is to  be compiled  as  uninitialized
     static data.

SYMDEF name
     means that  the symbol  name  is an  external name  defined  in  the
     current program.  It is produced for each external data or  function
     definition.

CSPACE name size
     means that the name refers to a data  area whose size is the  speci-
     fied number of bytes.  It is produced for external data  definitions
     without explicit initialization.

SSPACE size
     means that size bytes should be set aside for data storage.  It is
     used to pad out short initializations of external data and to
     reserve space for static (internal) data.  It will be preceded by
     an appropriate label.

EVEN
     forces alignment on a full-word boundary.  It may be produced either
     before or after an external data definition to achieve proper align-
     ment.  It is not produced after strings except when they  initialize
     a character array.

HEVEN
     forces alignment on a halfword boundary.  It may be produced either
     before or after an external data definition to achieve proper align-
     ment.

DEVEN
     forces alignment on a doubleword boundary.  It may be produced either
     before or after an external data definition to achieve proper align-
     ment.

NLABEL name
     is produced  just before  an SSPACE,  SINIT, or  BDATA  initializing
     external data, and serves as a label for the data.

RLABEL name
     is produced just  before each  function definition,  and labels  its
     entry point.

SNAME name number

-------------------------------- Page  4 --------------------------------

     is produced at the start of  each function for each static  variable
     or label declared therein.   Later uses of  the variable will be  in
     terms of the given  number.  The  code generator uses  this only  to
     produce a debugging symbol table.

ANAME name number
     Likewise, each automatic variable's name and stack offset are
     specified by this operator.  Arguments count as automatics.

RNAME name number
     Each register variable is similarly named, with its register number.

SAVE number
     produces a register  save sequence  at the start  of each  function,
     just after its label (RLABEL).

SETREG number
     means the number of registers used for register variables.  It actu-
     ally gives the register number of the lowest nonfree register; it is
     redundant because the RNAME operators could be counted instead.

PROFIL
     is produced before the save sequence for functions when the  profile
     option is turned on.  It produces code to count the  number of times
     the function is called.

SWIT deflab line label value ...
     is produced for  switches.  When  control flows into  it, the  value
     being switched on is in the register forced by RFORCE  (below).  The
     switch statement occurred on the  specified line of the source,  and
     the label number of the default location is deflab.  Then the opera-
     tor is followed by a sequence  of label-number and value pairs;  the
     list is ended by a 0 label.

LABEL number type
     generates an internal label.  It is referred to elsewhere using the
     given number.  Different labels are generated depending on the type
     of the label, i.e. whether it precedes a string or an instruction;
     the optimizer must be able to distinguish string labels from
     instruction labels.

BRANCH number
     means an unconditional transfer to the internal label number given.

RETRN
     produces the return sequence for  a function.  It occurs only  once,
     at the end of each function.

EXPR line
     causes the expression just preceding  to be compiled.  The  argument

-------------------------------- Page  5 --------------------------------

     is the line number in the source where the expression occurred.

NAME class type name

NAME class type number
     means a name occurring  in an  expression.  The first  form is  used
     when the name is  external; the second  when the name is  automatic,
     static, or a register.  Then the number specifies the stack  offset,
     the label number, or the register number as appropriate.  Class  and
     type encoding is described elsewhere.

CON type value
     transmits an integer constant.   This and the  next three  operators
     occur as part of expressions.

FCON type 4-word-value
     transmits a floating constant as four words in Amdahl-470 notation.

SFCON type value
     transmits  a  floating  point  constant  whose  value  is  correctly
     represented by its high-order word in Amdahl-470 notation.

LCON type value
     transmits a long integer constant.

NULL
     means a null argument list of a function call in an expression; call
     is a binary operator whose second operand is the argument list.

CBRANCH label cond line
     produces a conditional branch.   It is an  expression operator,  and
     will be followed by an  EXPR.  The branch to the label number  takes
     place if the expression's truth value is  the same as that of  cond.
     That is,  if cond is  1 and  the expression evaluates  to true,  the
     branch is taken.

SRCSTR line string
     is used in producing assembler listings.  Line is a source line number
     and string is the C source line.  The line number  and C source line
     are reproduced as comments within the resulting .s file.

FILENME name
     passes the file name to the second pass  of the compiler so that  it
     can be used when printing error messages.

TEXTBSE number
     number specifies the number of registers  to be used as base  regis-
     ters for the text segment.   At least 2 registers are used for  text
     base registers.  Additional  registers are  taken from  the pool  of
     register variables.

-------------------------------- Page  6 --------------------------------

DATABSE number
     number specifies the number of registers  to be used as base  regis-
     ters for the data segment.   At least 2 registers are used for  data
     base registers.  Additional  registers are  taken from  the pool  of
     register variables.

binary operator type
     There  are  binary  operators  corresponding  to  each  such  source
     language operator; the type of the result of each is passed as well.
     Some  perhaps  unexpected  ones  are:  COMMA,  which  is  a   right-
     associative operator designed  to simplify right-to-left  evaluation
     of function arguments; prefix  and postfix ++  and --, whose  second
     operand is  the increment  amount,  as a  CON; QUEST  and COLON,  to
     express the conditional expression as  'a?(b:c)'; and a sequence  of
     special operators  for  expressing relations  between  pointers,  if
     pointer  comparison  is  different  from  integer  comparison  (e.g.
     unsigned).

unary operator type
     There are also numerous unary operators.  These include ITOF,  FTOI,
     FTOL, LTOF, ITOL, LTOI, FTOS, STOF, LTOS, STOL, ITOS and STOI, which
     convert among  floating,  long,  short,  and  integer;  JUMP,  which
     branches indirectly through a label expression; INIT, which compiles
     the value  of a  constant  expression used  as an  initializer;  and
     RFORCE, which is used before a return sequence or a  switch to place
     a value in an agreed upon register.




3.    EXPRESSION OPTIMIZATION

Each expression tree, as it is read  in, is subjected to a  comprehensive
analysis.  This is done by the optim routine and several subroutines; the
major things done are

 1.  Modifications and simplifications of  the tree so  its value may  be
     computed more efficiently and conveniently by the code generator.

 2.  Marking each interior node with an estimate of the number of  regis-
     ters required  to evaluate  it.  This  register count  is needed  to
     guide the code generation algorithm.

One thing that is not done is discovery or exploitation of common  subex-
pressions, nor is this done anywhere in the compiler.

The basic organization is simple: a depth-first scan of the tree.   Optim
does nothing for leaf nodes (except for automatics; see below), and calls

-------------------------------- Page  7 --------------------------------

unoptim to handle unary operators.  For binary operators, it calls itself
to process  the  operands, then  treats  each operator  separately.   One
important case is commutative and  associative operators, which are  han-
dled by acommute.

Here is  a brief  catalog of  the transformations  carried out  by  optim
itself.  It is not intended to be complete.  Some of  the transformations
are machine-dependent, although they may well be useful on machines other
than the Amdahl-470.

 1.  As described in the discussion  of unoptim below, the optimizer  can
     create a  node type  corresponding to  the location  addressed by  a
     register plus a constant offset.  Since this is precisely the imple-
     mentation of automatic variables  and arguments, where the  register
     is fixed by convention, such variables  are changed to the new  form
     to simplify later processing.

 2.  Associative and commutative operators are  processed by the  special
     routine acommute.

 3.  Relationals are turned around so the more complicated expression  is
     on the  left.  (So  that '2>f(x)' becomes  'f(x)<2'.) This  improves
     code generation  since  the  algorithm prefers  to  have  the  right
     operand require fewer registers than the left.

 4.  Operators with constant operands are evaluated.

 5.  Several special cases are simplified, such as division or  multipli-
     cation by 1, and shifts by 0.
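
The turning around of relationals (item 3) can be sketched as below.  The
operator codes, the structure, and the use of the register estimate as
the measure of 'more complicated' are assumptions made for illustration,
not the compiler's actual data structures.

```c
#include <assert.h>
#include <stddef.h>

enum { LT = 1, LE, GT, GE, EQ, NE };   /* hypothetical operator codes */

struct rel {
    int op;
    const char *left, *right;   /* operands, shown as source text */
    int lregs, rregs;           /* register estimate of each operand */
};

/* Operator giving the same result once the operands are exchanged. */
static int swapped(int op)
{
    switch (op) {
    case LT: return GT;
    case GT: return LT;
    case LE: return GE;
    case GE: return LE;
    default: return op;         /* EQ and NE are symmetric */
    }
}

/* Put the operand needing more registers on the left.
   Returns 1 if the operands were exchanged, 0 otherwise. */
static int turn_around(struct rel *r)
{
    const char *s;
    int t;

    assert(r != NULL);
    if (r->rregs <= r->lregs)
        return 0;
    s = r->left;  r->left = r->right;  r->right = s;
    t = r->lregs; r->lregs = r->rregs; r->rregs = t;
    r->op = swapped(r->op);
    return 1;
}
```

Note that exchanging the operands maps '>' to '<', not to '<='; the
truth value of the expression is unchanged.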

The unoptim routine does the same type of processing for unary operators.

 1.  '*&x' and '&*x' are simplified to 'x'.

 2.  If r is a register and c is a constant, the expressions '*(r+c)' and
     '*r' are turned into a special kind of name node that expresses the
     name itself and the offset.  This simplifies later processing
     because such constructions can appear as the address of an
     Amdahl-470 instruction.

 3.  If '!' is  applied to  a relational, the  '!' is  discarded and  the
     sense of the relational is reversed.

 4.  Special cases involving reflexive use  of negation and  complementa-
     tion are discovered.

 5.  Operations applying to constants are evaluated.
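
Item 3 relies on every relational having an opposite.  A minimal sketch
of that mapping follows; the operator codes are hypothetical, and eval is
included only so that the mapping can be checked against ordinary integer
comparison.

```c
#include <assert.h>

enum { LT = 1, LE, GT, GE, EQ, NE };   /* hypothetical operator codes */

/* Relational whose truth value is always the opposite of op's. */
static int negated(int op)
{
    switch (op) {
    case LT: return GE;
    case GE: return LT;
    case GT: return LE;
    case LE: return GT;
    case EQ: return NE;
    case NE: return EQ;
    }
    return 0;                   /* not a relational */
}

/* Evaluate a relational on two integers. */
static int eval(int op, int a, int b)
{
    switch (op) {
    case LT: return a < b;
    case LE: return a <= b;
    case GT: return a > b;
    case GE: return a >= b;
    case EQ: return a == b;
    case NE: return a != b;
    }
    assert(0);                  /* unknown operator code */
    return 0;
}
```

Thus '!(a<b)' becomes 'a>=b' with no code generated for the '!' itself.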

The acommute routine, called for  associative and commutative  operators,
discovers clusters of the same operator at the top levels  of the current

-------------------------------- Page  8 --------------------------------

tree, and arranges them in a list: for 'a+((b+c)+(d+f))' the list would
be 'a,b,c,d,f'.  After each subtree is optimized, the list is sorted in
decreasing difficulty of computation; as  mentioned above, the code  gen-
eration algorithm works best when  left operands are the difficult  ones.
The 'degree of  difficulty' computed  is finer  than the  mere number  of
registers required; a constant is considered simpler than the address  of
a static or  external, which  is simpler  than reference  to a  variable.
This makes it easy to fold all the constants together,  and also to merge
together the sum of a  constant and the address of  a static or  external
(since in such nodes  there is space  for an 'offset' value).  There  are
also special cases, like multiplication by 1 and addition of 0.
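
The flattening and sorting steps can be sketched as follows.  The node
layout, the single integer 'difficulty', and the list limit are
assumptions for illustration; the folding of constants and the rebuilding
of the tree are omitted.

```c
#include <assert.h>
#include <stdlib.h>

#define MAXLIST 32

struct node {
    int op;                     /* '+' for the operator, else a leaf */
    int difficulty;             /* larger means harder to compute */
    struct node *left, *right;
};

/* Collect the operands of the cluster of `op' nodes at the top of t
   into list, starting at index n; return the new count. */
static int flatten(struct node *t, int op, struct node **list, int n)
{
    if (t->op == op) {
        n = flatten(t->left, op, list, n);
        return flatten(t->right, op, list, n);
    }
    assert(n < MAXLIST);
    list[n] = t;
    return n + 1;
}

/* qsort comparison: operands of greater difficulty come first. */
static int harder_first(const void *a, const void *b)
{
    const struct node *x = *(struct node *const *)a;
    const struct node *y = *(struct node *const *)b;

    return (y->difficulty > x->difficulty) -
           (y->difficulty < x->difficulty);
}

static void sort_operands(struct node **list, int n)
{
    qsort(list, (size_t)n, sizeof *list, harder_first);
}
```

Because constants have the lowest difficulty they end up adjacent at the
tail of the sorted list, which is what makes folding them convenient.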

A special routine  is invoked  to handle  sums of  products.  Distrib  is
based on  the  fact  that  it  is better  to  compute  'c1*c2*x+c1*y'  as
'c1*(c2*x+y)' and makes  the divisibility  tests required  to assure  the
correctness of the transformation.  This transformation is rarely  possi-
ble with code directly written by the user, but it invariably occurs as a
result of the implementation of multidimensional arrays.
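
As a worked case, take a hypothetical declaration 'int a[10][20]' with
4-byte integers.  The byte offset of 'a[i][j]' is '80*i+4*j', which has
the form 'c1*c2*x+c1*y' with c1 = 4 and c2 = 20.  The sketch below checks
the divisibility condition and shows that the factored form gives the
same value; the function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Offset computed term by term: k1*x + k2*y. */
static long offset_plain(long k1, long x, long k2, long y)
{
    return k1 * x + k2 * y;
}

/* Factored form k2*((k1/k2)*x + y), legal only when k2 divides k1.
   Sets *ok to 0 and returns 0 when the divisibility test fails. */
static long offset_factored(long k1, long x, long k2, long y, int *ok)
{
    assert(ok != NULL);
    *ok = (k2 != 0 && k1 % k2 == 0);
    if (!*ok)
        return 0;
    return k2 * ((k1 / k2) * x + y);
}
```

For i = 3 and j = 7 both forms give 268; with k1 = 10 and k2 = 4 the
divisibility test fails and the transformation is not applied.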

Finally, acommute reconstructs a tree  from the list of expressions  that
result.




4.    CODE GENERATION

The grand  plan for  code  generation is  independent of  any  particular
machine; it depends largely on  a set of tables.  But this fact does  not
necessarily make it easy to modify the compiler to produce code for other
machines, both because there is  a good deal of machine-dependent  struc-
ture in the tables, and because in  any event such tables are  nontrivial
to prepare.

The arguments to the basic code  generation routine rcexpr are a  pointer
to a  tree representing  an  expression, the  name of  a code  generation
table, and the number of a register in which the value of the  expression
should be placed.  Rcexpr returns the number of the register in which the
value ended up; its caller may need to produce an lr instruction if the
value really needs to be in the given register.  There are four code gen-
eration tables.

Regtab is the basic  one, which  actually does the  job described  above:
namely, compile code that places the value represented by the  expression
tree in a register.

Cctab is used when the value of the expression is not needed, but instead
the value  of  the  condition  codes resulting  from  evaluation  of  the

-------------------------------- Page  9 --------------------------------

expression.  This table is used, for example, to evaluate the  expression
after if.  It is  clearly silly to  calculate the value  (0 or 1) of  the
expression 'a==b' in the context 'if (a==b) ...'

The sptab table is used when the value of  an expression is to be  pushed
onto the argument stack.  For example, in the function call 'f(a,b,c)',
the values of the three actual arguments are evaluated through sptab.
Note: sptab has  no entries  but causes the  appropriate actions to  take
place after regtab compiles the arguments.

The efftab table is used  when an expression is to  be evaluated for  its
side effects, not its value.  This occurs mostly for expressions that are
statements, which have no value.  Thus  the code for the statement  'a=b'
need produce only the appropriate  load and store instructions, and  need
not leave the value of b in the called for register, while in the expres-
sion 'a+(b=c)' the value of  'b=c' will appear in the register  specified
in the code table.

All tables besides regtab are small, and handle only a few special cases.
If a subsidiary table does not contain an entry applicable to the given
expression tree, rcexpr uses regtab to put the value of the expression
into a register and then fixes things up; nothing need be done when the
table was efftab, but an ltr instruction is produced when the table called
for was cctab, and an st instruction, pushing the register on the stack,
when the table was sptab.
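
The fallback rule can be summarized in a few lines.  This is a sketch
only: the callback standing in for the table search, the sample lookup,
and the function names are hypothetical, and only the mnemonic of the
fix-up instruction is produced, not its operands.

```c
#include <assert.h>

enum table { REGTAB, CCTAB, SPTAB, EFFTAB };

/* Hypothetical stand-in for the table search: nonzero if `tab' has an
   entry applicable to operator `op'. */
typedef int (*lookup_fn)(enum table tab, int op);

/* Mnemonic of the fix-up needed after regtab was used in place of the
   table that was asked for; "" means nothing need be done. */
static const char *fixup(enum table wanted)
{
    switch (wanted) {
    case CCTAB:  return "ltr";  /* set the condition codes */
    case SPTAB:  return "st";   /* push the register on the stack */
    case EFFTAB:                /* value not needed */
    case REGTAB: return "";
    }
    assert(0);                  /* unknown table */
    return "";
}

/* Try the wanted table first, then fall back on regtab.  Stores the
   table actually used in *used and returns the fix-up mnemonic. */
static const char *compile_with_fallback(enum table wanted, int op,
                                         lookup_fn has_entry,
                                         enum table *used)
{
    if (has_entry(wanted, op)) {
        *used = wanted;
        return "";
    }
    *used = REGTAB;
    return fixup(wanted);
}

/* Sample lookup for experiments: only regtab has entries. */
static int only_regtab(enum table tab, int op)
{
    (void)op;
    return tab == REGTAB;
}
```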

The rcexpr routine itself picks off some special cases, then calls  cexpr
to do the  real work.   Cexpr tries to  find an  entry applicable to  the
given tree in the given table, and returns -1 if no such entry is  found,
letting rcexpr  try again  with a  different table.   A successful  match
yields a string containing both  literal characters that are written  out
and pseudo-operations, or macros, that are expanded.  Before studying the
contents of these strings we will consider how table entries are  matched
against trees.

Recall that most nonleaf nodes in an expression tree contain the name  of
the operator, the type of the value represented, and pointers to the sub-
trees (operands).  They also contain an estimate of the number of  regis-
ters required to evaluate the expression, placed there by the  expression
optimizer routines.  The register counts  guide the code generation  pro-
cess, which is based on the Sethi-Ullman algorithm.

The main code  generation tables  consist of entries  each containing  an
operator number and a pointer to a subtable for the  corresponding opera-
tor.  A  subtable consists  of a  sequence of  entries, each  with a  key
describing certain properties of  the operands of the operator  involved;
associated with the key is a code string.  Once the subtable  correspond-
ing to the operator is  found, the subtable is searched linearly until  a
key is found such that the properties demanded by the key are  compatible
with the operands of the tree node.  A successful match  returns the code

-------------------------------- Page 10 --------------------------------

string; an unsuccessful search, either for the operator in the main table
or a compatible key in the subtable, returns a failure indication.
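
The two-level search can be sketched as follows.  The representation of
a key as a predicate function, the reduction of each operand to a single
integer, and the sample table are assumptions for illustration; the real
keys are encoded quite differently.

```c
#include <assert.h>
#include <stddef.h>

struct entry {
    int (*matches)(int left, int right);  /* key test on the operands */
    const char *code;                     /* code string; NULL ends */
};

struct optab {
    int op;                     /* operator number; 0 ends the table */
    const struct entry *sub;    /* subtable for this operator */
};

/* Return the code string of the first compatible subtable entry, or
   NULL if the operator or a compatible key is not found. */
static const char *match(const struct optab *tab, int op,
                         int left, int right)
{
    const struct entry *e;

    assert(tab != NULL);
    while (tab->op != 0 && tab->op != op)
        tab++;
    if (tab->op == 0)
        return NULL;            /* operator not in the main table */
    for (e = tab->sub; e->code != NULL; e++)
        if (e->matches(left, right))
            return e->code;
    return NULL;                /* no compatible key */
}

/* Sample data: an operand value of 0 stands for the constant zero. */
static int right_is_zero(int l, int r) { (void)l; return r == 0; }
static int anything(int l, int r) { (void)l; (void)r; return 1; }

static const struct entry plus_sub[] = {
    { right_is_zero, "F\n" },
    { anything,      "SS\nF\na\tR,O\n" },
    { NULL, NULL }
};

static const struct optab sample_tab[] = {
    { '+', plus_sub },
    { 0, NULL }
};
```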

The tables are all contained in a file  that must be processed to  obtain
an assembly language program.  Thus they are written in a special-purpose
language.  To provide definiteness to  the following discussion, here  is
an example of a subtable entry.

        %n,aw
                F
                a       R,A2

The '%' indicates the key; the information following (up to a blank line)
specifies the code string.   Briefly, this entry  is in the subtable  for
'+' of regtab; the key  specifies that the left  operand is any  integer,
character, or pointer expression, and the right operand is any word quan-
tity that is  directly addressable  (e.g. a variable  or constant).   The
code string  calls for the  generation of  the code to  compile the  left
(first) operand into the current  register ('F') and  then to produce  an
'add' instruction that  adds the  second operand ('A2')  to the  register
('R').  The notation will be explained below.

Only three features of the operands are used in deciding whether a  match
has occurred.  They are:

 1.  Is the type of the operand compatible with that demanded?

 2.  Is the 'degree of difficulty' (in the sense described below)  compa-
     tible?

 3.  The table may demand that the operand have a '*' (indirection opera-
     tor) as its highest operator.

As suggested above, the key for a subtable  entry is indicated by a  '%',
and a  comma-separated pair  of  specifications for  the operands.   (The
second specification  is ignored  for unary  operators.) A  specification
provides a type requirement  by including one  of the following  letters.
If no type letter is present, any integer, character, or pointer  operand
will satisfy the requirement (not float, double, or long).


     b   A byte (character) operand is required.

     w   A word (integer or pointer) operand is required.

     f   A float or double operand is required.

     d   A double operand is required.

     l   A long (64 bit integer) operand is required.

-------------------------------- Page 11 --------------------------------

     u   An unsigned operand is required.

     s   A short (16 bit integer) operand is required.

     t   An unsigned short operand is required.

Before discussing the 'degree of difficulty' specification, the algorithm
has to be explained more completely.  Rcexpr (and cexpr) are  called with
a register number in which  to place their result.   Registers 2, 3,  ...
are used during evaluation of expressions; the maximum register that  can
be used in this way depends on the  number of register variables, but  in
any event only r2 through r7 are available since r8  through r11 are used
for base registers, r12 is used for debugging purposes, r13 is used as  a
stack frame header, r14 holds  the return address during function  calls,
and r15 contains the address of  the current function.  The code  genera-
tion routines assume that when  called with register n as argument,  they
may use n+1, ...   (up to  the first register  variable) as  temporaries.
Consider the expression 'X+Y', where both X and Y are  expressions.  As a
first approximation, there are three ways  of compiling code to put  this
expression in register n.

 1.  If Y is an addressable cell, (recursively) put X into register n and
     add Y to it.

 2.  If Y is an expression that can be calculated in k registers, where k
     is smaller than the number of registers available, compile X into
     register n, Y into register n+1, and add register n+1 to n.

 3.  Otherwise, compile Y into register n, save the result in a temporary
     (on the stack), compile X into register n, then add in the temporary.

The distinction between cases 2  and 3 therefore  depends on whether  the
right operand can be compiled  in fewer than k registers, where k is  the
number of free registers left  after registers 2 through  n are taken:  2
through n-1 are presumed to contain already computed temporary results; n
will, in case 2, contain the value of the left operand while the right is
being evaluated.
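
The choice among the three cases can be sketched as below.  The names are
invented, and the exact boundary is an assumption: here nfree counts the
registers still free once X occupies register n, and Y qualifies for
case 2 when it needs no more than nfree registers.  The combining rule
shown for register estimates is the textbook Sethi-Ullman rule; the
compiler's own estimate may differ in detail.

```c
#include <assert.h>

enum strategy {
    ADD_FROM_MEMORY = 1,        /* case 1: Y is addressable */
    USE_NEXT_REGISTER,          /* case 2: Y fits in the free registers */
    USE_TEMPORARY               /* case 3: Y is saved on the stack */
};

/* addressable: nonzero if Y is an addressable cell.
   yregs: estimated number of registers needed to compute Y.
   nfree: registers still free once X occupies register n. */
static enum strategy choose(int addressable, int yregs, int nfree)
{
    assert(yregs >= 0 && nfree >= 0);
    if (addressable)
        return ADD_FROM_MEMORY;
    if (yregs <= nfree)
        return USE_NEXT_REGISTER;
    return USE_TEMPORARY;
}

/* Textbook Sethi-Ullman estimate for a binary node whose operands
   need l and r registers respectively. */
static int regs_needed(int l, int r)
{
    if (l == r)
        return l + 1;
    return l > r ? l : r;
}
```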

These considerations should make  clear the specification  codes for  the
degree of difficulty, bearing in mind that several special cases are also
present:

z    is satisfied when the operand is zero,  so that special code can  be
     produced for expressions like 'x=0'.

c    is satisfied when the operand is a positive (32 bit) constant.

q    is satisfied when the operand is a positive (64 bit) long  constant;
     this takes care of some  special cases in loading longs into  regis-
     ters.

-------------------------------- Page 12 --------------------------------

a    is satisfied when the operand  is addressable; this occurs only  for
     variables and constants.

r    is satisfied when the operand is a register variable.

e    is satisfied by an operand whose value can be generated in a  regis-
     ter using no more than k registers, where k is  the number of regis-
     ters left (not counting the  current register).  The 'e' stands  for
     'easy'.

n    is satisfied by any operand.  The 'n' stands for 'anything'.

These degrees of difficulty are  considered to lie  in a linear  ordering
and any  operand  that satisfies  an earlier-mentioned  requirement  will
satisfy a later one.  Since the subtables are searched linearly, if a 'c'
specification is included, almost certainly a 'z' must be written first
to prevent expressions containing the constant 0 from being compiled as if
the 0 were an ordinary constant.
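
The effect of the ordering can be demonstrated with three of the keys.
In this sketch each key is a predicate, which is an assumption about
representation only; it is also assumed that the constant 0 satisfies
'c', as the linear ordering described above implies.

```c
#include <assert.h>
#include <stddef.h>

/* An operand reduced to what the three keys below look at. */
struct operand {
    int is_const;
    long value;
};

static int sat_z(const struct operand *o)
{
    return o->is_const && o->value == 0;
}

static int sat_c(const struct operand *o)
{
    return o->is_const && o->value >= 0;
}

static int sat_n(const struct operand *o)
{
    (void)o;
    return 1;
}

typedef int (*key_fn)(const struct operand *);

/* Index of the first key the operand satisfies, or -1 if none. */
static int first_match(const key_fn *keys, int nkeys,
                       const struct operand *o)
{
    int i;

    assert(keys != NULL && o != NULL);
    for (i = 0; i < nkeys; i++)
        if (keys[i](o))
            return i;
    return -1;
}

static const key_fn good_order[] = { sat_z, sat_c, sat_n };
static const key_fn bad_order[]  = { sat_c, sat_z, sat_n };
```

With good_order the constant 0 selects the 'z' entry; with bad_order it
selects the 'c' entry and the 'z' entry can never be reached.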

Finally, a  key  specification may  contain  a '*',  which  requires  the
operand to have an indirection  as its leading operator.  Examples  below
should clarify the utility of this specification.

Now let us consider the contents of the code string associated with  each
subtable  entry.   Conventionally,  lower-case  letters  in  this  string
represent literal  information that  is copied  directly to  the  output.
Upper-case letters generally introduce specific macro-operations, some of
which may be followed by modifying information.  The code strings in  the
tables are  written  with  tabs  and new-lines  used  freely  to  suggest
instructions  that  will  be  generated;  the  table  compiling   program
compresses tabs (using  the 0200  bit of the  next character) and  throws
away some of  the new-lines.   For example  the macro  'F' is  ordinarily
written on a  line by  itself; but since  its expansion  will end with  a
new-line, the new-line after 'F' itself  is dispensable.  This is all  to
reduce the size of the stored tables.

The first set of macro-operations  is concerned with compiling  subtrees.
Recall that this is done by the cexpr routine.  In  the following discus-
sion the 'current register' is generally the argument register to  cexpr;
that is, the place where  the result is desired.  The 'next register'  is
numbered one higher than the  current register.  (This explanation  isn't
fully true because  of complications,  described below, involving  opera-
tions that require even-odd register pairs.)

F    causes a recursive call to the  rcexpr routine to compile code  that
     places the value of the first (left) operand of the  operator in the
     current register.

F2   is the same as F, but ensures that the value actually ends up in
     the specified register.

-------------------------------- Page 13 --------------------------------

F1   generates code that places the value of the first operand in the
     next register.  It is incorrectly used if there might be no next
     register; that is, if the degree of difficulty of the first operand
     is not 'easy', another register might not be available.

FS   generates code that pushes  the value  of the first  operand on  the
     stack.

Analogously,

S, S1, SS
     compile the second (right)  operand into the  current register,  the
     next register, or onto the stack.

To deal with registers, there are

R    which expands into the name of the current register.

R1   which expands into the name of the next register.

FR   which expands into the name of the current floating point register.

FR1  which expands into the name of the next floating point register.

R+   which expands into the name of the current register plus 1.  It
     was suggested  above that  this is  the same as  the next  register,
     except for complications; here is one.  Long integer variables  have
     64 bits and require 2 registers; in such cases the  next register is
     the current register plus 2.  The code would like to talk about both
     halves of the long  quantity, so R  refers to the register with  the
     high-order part and R+ to the low-order part.

R-   This is another complication, involving multiplication.  Multiplica-
     tion involves a pair of registers of which the odd-numbered contains
     the left operand.  Cexpr arranges that the current register is  odd;
     the R- notation allows  the code to  refer to the next lower,  even-
     numbered register.

To refer to addressable quantities, there are the notations:

A1   causes generation of  the address  specified by  the first  operand.
     For this to be legal, the operand must be addressable;  its key must
     contain an 'a' or a more restrictive specification.

A2   correspondingly generates the address of the second operand  provid-
     ing it has one.

O    expands into the address on the stack of the current temporary.  The
     temporary is then popped from the stack.

-------------------------------- Page 14 --------------------------------

We now have enough mechanism to show a complete, if suboptimal, table for
the + operator on word or byte operands.

        %n,z
                F

        %n,aw
                F
                a       R,A2

        %n,e
                F2
                S1
                ar      R,R1

        %n,n
                SS
                F
                a       R,O

The first two sequences  handle some  special cases.  It  turns out  that
handling a right operand of 0 is unnecessary since the  expression optim-
izer throws out adds of 0.  The case where the right operand is  address-
able is handled next.  It  must be a word quantity, since the  Amdahl-470
lacks an  'add byte'  instruction.   Finally the  cases where  the  right
operand either can,  or cannot,  be done in  the available registers  are
treated.
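
The way the four entries divide up the possible right operands can be
sketched in C.  The sketch assumes that entries are tried in the order
written and that the first matching key wins; the structure, its fields
and the 'regs_free' count are hypothetical simplifications of what the
matcher really examines.

```c
/* Hypothetical summary of a right operand, as the matcher might see it. */
struct operand {
    int is_zero;        /* the constant 0                        */
    int is_addressable; /* usable directly as an address         */
    int is_word;        /* a full-word quantity                  */
    int regs_needed;    /* registers needed to compute its value */
};

/* Return the key of the first entry of the '+' table that matches,
   given the number of registers available for the right operand. */
const char *match_add_entry(const struct operand *right, int regs_free) {
    if (right->is_zero)
        return "%n,z";
    if (right->is_addressable && right->is_word)
        return "%n,aw";
    if (right->regs_needed <= regs_free)
        return "%n,e";
    return "%n,n";
}
```

An addressable byte operand fails the '%n,aw' test in this sketch and
falls through to '%n,e' or '%n,n', where it is first brought into a
register or onto the stack.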

The next macro instructions are conveniently introduced by noticing  that
the above table is suitable for subtraction as well as addition, since no
use is made of the commutativity of addition.  All that is needed is sub-
stitution of 's' for  'a'.  Considerable saving  of space is achieved  by
factoring out several similar operations.

I    is replaced by a string from  another table indexed by the  operator
     in the node being expanded.

Thus, given that the entries for '+' and '-' in the side table (which  is
called instab)  are 'a'  and 's'  respectively, the middle  of the  above
addition table can be written

        %n,aw
                F
                I       R,A2

        %n,e
                F2
                S1
                Ir      R,R1

-------------------------------- Page 15 --------------------------------

and it will be suitable for subtraction as well.
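
A minimal C sketch of the substitution follows.  It handles only the I
macro and only the two operators mentioned; 'instab_lookup' and
'expand_I' are hypothetical names, and the real instab is a table rather
than a switch.

```c
#include <string.h>

/* Hypothetical side table in the spirit of instab: operator -> mnemonic. */
static const char *instab_lookup(char op) {
    switch (op) {
    case '+': return "a";
    case '-': return "s";
    default:  return "?";   /* operators not covered by this sketch */
    }
}

/* Copy a code string to out, replacing each 'I' by the mnemonic for op.
   Every other character is copied unchanged.  Output is truncated if it
   would not fit in n bytes. */
void expand_I(const char *tmpl, char op, char *out, size_t n) {
    size_t used = 0;
    if (n == 0)
        return;
    for (; *tmpl != '\0'; tmpl++) {
        const char *piece = (*tmpl == 'I') ? instab_lookup(op) : NULL;
        if (piece != NULL) {
            size_t len = strlen(piece);
            if (used + len >= n)
                break;
            memcpy(out + used, piece, len);
            used += len;
        } else {
            if (used + 1 >= n)
                break;
            out[used++] = *tmpl;
        }
    }
    out[used] = '\0';
}
```

Expanding the string 'Ir R,R1' for a '-' node yields 'sr R,R1', and for a
'+' node 'ar R,R1'; the R and R1 macros are left for a later step.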

To handle unsigned shorts, the following macro is used.

Q    is replaced by the logical AND ('n') operator if the operand  previ-
     ously loaded into a register  is an unsigned short.  This  correctly
     zeroes out the high-order 2 bytes  of the register.  If the  operand
     is not an unsigned short, the entire line following Q is discarded.
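
The effect of the Q line can be checked with ordinary C arithmetic.  The
sketch below assumes 32-bit registers and a halfword load that extends
the sign of its operand, which is why the mask is needed; the function
names are hypothetical.

```c
#include <stdint.h>

/* Value left in a 32-bit register by a sign-extending halfword load. */
int32_t load_halfword(uint16_t stored) {
    return stored >= 0x8000u ? (int32_t)stored - 0x10000
                             : (int32_t)stored;
}

/* The AND that Q supplies: clear the high-order 2 bytes. */
uint32_t mask_unsigned_short(int32_t reg) {
    return (uint32_t)reg & 0x0000FFFFu;
}
```

For a stored unsigned short of 0x8001 the load alone leaves a negative
value in the register; after the mask the register holds 0x8001 again.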

Next, there is the question of character and floating point operations.

B1   generates the letter 'b' if the first operand is a character,   'h'
     if it is a short or unsigned short, 'e' if it is float, 'd' if it is
     a double,  and nothing  otherwise.  It  is used  in a  context  like
     'lB1', which  generates  an  'l',  'lh', 'le'  or  'ld'  instruction
     according to the type of the operand.

B2   is just like B1 but applies to the second operand.

BF   generates 'e' or  'd' if  the type of  the operator  node itself  is
     float or double, otherwise null.
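
The type-dependent letters can be summarized as a pair of small C
functions.  The type codes below are hypothetical; the compiler's own
encoding of types is not shown in this section.

```c
/* Hypothetical type codes for this sketch. */
enum ctype { T_CHAR, T_SHORT, T_USHORT, T_INT, T_FLOAT, T_DOUBLE };

/* Letter that B1 or B2 generates for an operand of the given type. */
const char *type_suffix(enum ctype t) {
    switch (t) {
    case T_CHAR:   return "b";
    case T_SHORT:
    case T_USHORT: return "h";
    case T_FLOAT:  return "e";
    case T_DOUBLE: return "d";
    default:       return "";   /* nothing otherwise */
    }
}

/* Letter that BF generates, from the type of the operator node. */
const char *float_suffix(enum ctype t) {
    if (t == T_FLOAT)
        return "e";
    if (t == T_DOUBLE)
        return "d";
    return "";
}
```

Appending 'type_suffix' to the letter 'l' gives 'lh' for a short and 'ld'
for a double, in the manner of the 'lB1' context described above.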

Next, there is the question  of handling indirection properly.   Consider
the expression 'X + *Y', where X and Y are expressions.  If Y is either a
variable or qualifies as 'easy' in  the context, the expression would  be
compiled by placing the value of X in a register,  that of *Y in the next
register, and adding the registers.  It is easy to see that a better  job
can be done by compiling X, then Y (into the  next register), and produc-
ing the instruction symbolized by 'a   R,0(R1)'.  This scheme avoids gen-
erating the instruction 'l   R1,0(R1)' required to place the value of  *Y
in a register.  A  related case occurs  with the expression  'X + *(p+6)'
(assuming 'p' is  an integer pointer),  which exemplifies a  construction
frequent in structure  and array  references.  The  addition table  shown
above would produce

        [put X in register R]
        l       R1,'address of p'
        a       R1,=f'24'
        l       R1,0(R1)
        ar      R,R1

when the best code is

        [put X in R]
        l       R1,'address of p'
        a       R,24(R1)
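
The displacement 24 in both sequences is the constant 6 scaled by the
size of the object pointed to; the literal =f'24' implies that an integer
occupies 4 bytes.  As a one-line C sketch, with a hypothetical function
name:

```c
/* Byte displacement for *(p + index) when each element occupies
   elem_size bytes. */
long byte_displacement(long index, long elem_size) {
    return index * elem_size;
}
```

Here 'byte_displacement(6, 4)' is 24, the value that appears first as a
literal operand and then as the displacement in 'a R,24(R1)'.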

As we said above, a key specification for a code table entry may  require
an operand to have an  indirection as its highest operator.  To make  use
of the requirement, the following macros are provided.

-------------------------------- Page 16 --------------------------------

F*   the first operand must have  the form *X.  If  in particular it  has
     the form *(Y + c),  for some constant c, then code is produced  that
     places the value of Y in  the current register.  Otherwise, code  is
     produced that loads X into the current register.

F1*  resembles F* except that the next register is loaded.

S*   resembles F* except that the second operand is loaded.

S1*  resembles S* except that the next register is loaded.

FS*  The first operand must have the form '*X'.   Push the value of X  on
     the stack.

SS*  resembles FS* except that it applies to the second operand.

To capture the constant that may have been skipped over in the above mac-
ros, there are

#1   The first operand must have the form *X; if in particular it has the
     form *(Y + c)  for c a constant,  then the constant is written  out,
     otherwise a null string.

#2   is the same as #1 except that the second operand is used.

Now we can  improve the  addition table  above.  Just  before the  '%n,e'
entry, put

        %n,ew*
                F2
                S1*
                a       R,#2(R1)
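
The division of labor between the starred macros and '#1' or '#2' can be
sketched in C as a single function that splits a tree of the form '*X'.
The node layout and the name 'split_indirect' are hypothetical and much
simpler than the compiler's own trees.

```c
#include <stddef.h>

/* Hypothetical, much simplified expression node. */
enum op { NAME, CON, PLUS, STAR };
struct node {
    enum op op;
    struct node *left, *right;  /* operands; unused ones are NULL */
    long value;                 /* constant value when op == CON  */
};

/* For a tree of the form *X, return the subtree that a starred macro
   would compile into a register, and store in *disp the constant that
   #1 or #2 would write out.  *disp is 0 when there is no constant (the
   real macro writes a null string).  Returns NULL if t is not *X. */
struct node *split_indirect(struct node *t, long *disp) {
    struct node *x;
    *disp = 0;
    if (t == NULL || t->op != STAR)
        return NULL;
    x = t->left;
    if (x != NULL && x->op == PLUS &&
        x->right != NULL && x->right->op == CON) {
        *disp = x->right->value;
        return x->left;         /* the Y of *(Y + c) */
    }
    return x;                   /* plain *X */
}
```

Applied to '*(p+24)', with the constant already scaled, the function
returns the node for p and a displacement of 24, which is what the
'%n,ew*' entry needs to produce 'a R,24(R1)'.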

Some machine operations place  restrictions on the  registers used.   The
divide instruction, which implements  the divide and mod operations,  and
the multiply  instruction, require  the dividend  or multiplicand  to  be
placed in the odd  member of an even-odd  pair.  There is no theory  that
optimally accounts for  this kind  of requirement.  Cexpr  handles it  by
checking for a multiply,  divide, or mod  operation; in these cases,  its
argument register number is incremented by one or  two so that it is  odd
and so that it  is a member  of a free  even-odd pair.  The routine  that
determines the number of required registers conservatively estimates that
at least two  registers are  required for a  multiplication or  division.
After the expression is compiled, the register where the result ended  up
is returned.  (Divide and mod are actually the same operation  except for
the location of the result.)
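
One reading of the phrase 'incremented by one or two' is sketched below
in C.  It assumes that the register passed in and all higher-numbered
registers are free, so that the register just below the result is the
free even member of the pair; the function name is hypothetical.

```c
/* Advance a register number by one or two so that the result is odd.
   Assumes reg is non-negative and that reg and every higher-numbered
   register are free. */
int odd_pair_register(int reg) {
    return (reg % 2 == 0) ? reg + 1 : reg + 2;
}
```

Starting from r2 this selects r3 (pair r2, r3); starting from r3 it
selects r5 (pair r4, r5).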

These operations are the  ones that  cause results to  end in  unexpected
places, and this  possibility adds  a further level  of complexity.   The
simplest way of handling the problem is always to move the result to  the

-------------------------------- Page 17 --------------------------------

place where the  caller expected  it, but this  will produce  unnecessary
register moves in many simple cases; 'a=b*c' would generate

        l       r3,'value of b'
        m       r2,'value of c'
        lr      r2,r3
        st      r2,'address of a'

The next thought is to use the passed-back information about where  the
result landed to change the  notion of the current register.  While  com-
piling the '=' operation above, which comes from a table entry like

        %a,n
                S
                stB1    R,A1

it is enough to redefine the meaning of 'R' after processing the 'S' that
does the multiply.   This technique  is used; the  tables are written  so
that correct code is produced.  The trouble is that the technique  cannot
be used in  general, because it  invalidates the count  of the number  of
registers required for an expression.   Consider just 'a*b+X' where X  is
some expression.  The algorithm assumes that the value of a*b,  once com-
puted, requires just one register.   If there are three registers  avail-
able (i.e. r2, r3 and r4), and X requires two  registers to compute, then
this expression will match a key  specifying '%n,e'.  If a*b is  computed
and left  in register  3, then  there are, contrary  to expectations,  no
longer two registers available to compute X,  but only one, and bad  code
will be produced.  To  guard against this  possibility, cexpr checks  the
result returned by recursive calls  that implement F,  S and their  rela-
tives.  If the result is not in the expected register, then the number of
registers required by the  other operand is  checked; if it  can be  done
using those registers that remain even after making unavailable the unex-
pectedly occupied register, then the  notions of the 'next register'  and
possibly the 'current register' are redefined.  Otherwise a register copy
instruction is produced.  A  register copy is  also always produced  when
the current operator is one of those that have odd-even requirements.
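
The decision just described can be summarized in C.  This is a sketch of
the rule as stated in the text, not of cexpr itself; the names and the
way the register counts are passed in are hypothetical, and it assumes
that the odd-even rule applies only when the result is in fact
misplaced.

```c
/* Possible reactions when a recursive call leaves its result somewhere
   other than the register the caller expected. */
enum fixup { NO_FIXUP, REDEFINE_REGISTERS, COPY_REGISTER };

/* expected, actual: register numbers.
   other_needs: registers the other operand requires.
   regs_left:   registers still free once 'actual' is made unavailable.
   pair_op:     nonzero if the operator has odd-even requirements. */
enum fixup result_fixup(int expected, int actual,
                        int other_needs, int regs_left, int pair_op) {
    if (actual == expected)
        return NO_FIXUP;
    if (pair_op)
        return COPY_REGISTER;
    if (other_needs <= regs_left)
        return REDEFINE_REGISTERS;
    return COPY_REGISTER;
}
```

In the 'a*b+X' example, the product is expected in r2 but lands in r3; X
needs two registers and only r4 remains, so a register copy is produced.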

Finally, there are a few loose-end macro operations and facts about  the
tables.  The operators:

V    is used for long operations.  It is  written with an address like  a
     machine instruction; it expands into 'bc 12' if the operation is  an
     additive operator  and 'bc 3'  if  the operation  is  a  subtractive
     operator.  Its purpose is to allow common treatment of additive  and
     subtractive operations, which generate carries.

T    generates a 'srdl' instruction if an operand is an unsigned  integer
     or an unsigned short and a 'srda' otherwise.  It is used with divide
     and mod operations, which require a sign-extended 64-bit operand.

-------------------------------- Page 18 --------------------------------

U    generates unsigned comparisons  for unsigned  integers and  unsigned
     shorts.

H    is analogous to the 'F' and 'S' macros, except that it calls for the
     generation of code for  the current tree  (not one of its  operands)
     using regtab.

ZW   generates a mask used in full-word field assignments.

ZS   generates a mask used in short field assignments.

ZB   generates a mask used in byte field assignments.

X    used in relationals involving long values.
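
The choice that the T macro makes between 'srdl' and 'srda' corresponds
to the two ways of widening a 32-bit dividend to 64 bits.  The C sketch
below expresses only that distinction, assuming 32-bit words; it does not
model the register pair, and the function names are hypothetical.

```c
#include <stdint.h>

/* Arithmetic widening, as needed for a signed dividend. */
int64_t widen_signed(int32_t v) {
    return (int64_t)v;
}

/* Logical widening, as needed for an unsigned dividend. */
uint64_t widen_unsigned(uint32_t v) {
    return (uint64_t)v;
}
```

The same 32 bits give different 64-bit dividends in the two cases
whenever the high-order bit is set, which is why the macro must consult
the operand type.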

The above is discussed in  terms of operators  with operands.  Leaves  of
the expression tree (variables  and constants), however, are peculiar  in
that they have no  operands.  To regularize  the matching process,  cexpr
examines its operand to  determine if it is a  leaf; if so, it creates  a
special 'load' operator whose operand is the leaf, and substitutes it for
the argument tree; this allows  the table entry for the created  operator
to use the 'A1' notation to load the leaf into a register.
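
A C sketch of this regularization follows.  The node kinds, the structure
and the name 'regularize' are hypothetical, and the caller supplies
storage for the created operator, which the real routine would allocate
itself.

```c
#include <stddef.h>

/* Hypothetical node kinds for this sketch. */
enum kind { K_NAME, K_CON, K_PLUS, K_LOAD };
struct tree {
    enum kind kind;
    struct tree *first, *second;   /* operands, NULL if absent */
};

static int is_leaf(const struct tree *t) {
    return t->kind == K_NAME || t->kind == K_CON;
}

/* If t is a leaf, fill in *wrapper as a 'load' operator whose single
   operand is t and return the wrapper; otherwise return t unchanged. */
struct tree *regularize(struct tree *t, struct tree *wrapper) {
    if (!is_leaf(t))
        return t;
    wrapper->kind = K_LOAD;
    wrapper->first = t;
    wrapper->second = NULL;
    return wrapper;
}
```

After this step every tree handed to the matcher has an operator at its
root, so a leaf is reached as the first operand of 'load'.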

Purely to save space in  the tables, pieces of  subtables can be  labeled
and referred to  later.  It turns  out, for example,  that in the  PDP-11
compiler, large portions of the efftab table for the '=' and '+='  opera-
tors are identical.  Thus '=' has an entry

        %[move3:]
        %a,aw
        %ab,a
                IBE     A2,A1

while part of the '+=' table is

        %aw,aw
        %       [move3]

Labels are written as  '%[...:]', before the  key specifications;  refer-
ences are written with '%   [...]'  after the key.  Peculiarities in  the
implementation make it necessary that labels appear before references  to
them.  This  feature is  not used  in the current  implementation of  the
Amdahl-470 compiler.

The example illustrates the utility of allowing separate keys to point to
the same code string.  The  assignment code works properly if either  the
right operand is a word, or the left operand  is a byte; but since  there
is no 'add byte'  instruction the addition  code has to be restricted  to
word operands.

-------------------------------- Page 19 --------------------------------

5.    DELAYING AND REORDERING

Intertwined with the code generation routines are two other, interrelated
processes.  The first, implemented by a routine called delay, is based on
the observation that  naive code  generation for  the expression  'a=b++'
would produce

        l       r2,'value of b'
        a       r2,=f'1'
        lr      r3,r2
        l       r2,'value of b'
        st      r2,'address of a'
        lr      r2,r3
        st      r2,'address of b'

The point is that the table for postfix ++ has to preserve the value of b
before incrementing it;  the general way  to do this  is to preserve  its
value in a register.  A cleverer scheme would generate

        l       r2,'value of b'
        st      r2,'address of a'
        lr      r0,r2
        a       r0,=f'1'
        st      r0,'address of b'

Delay is called for each expression input to rcexpr, and it searches  for
postfix ++ and -- operators.  If one is found applied  to a variable, the
tree is patched to bypass  the operator and compiled  as it stands;  then
the  increment or  decrement  itself  is  done.   The  effect  is  as  if
'a=b; b++' had been written.  In this  example, of course, the user  him-
self could  have done  the same  job, but more  complicated examples  are
easily built, for  example 'switch (x++)'.  An  essential restriction  is
that the condition codes not be required.  It would be  incorrect to com-
pile 'if (a++) ...' as

        l       r2,'value of a'
        ltr     r2,r2
        lr      r0,r2
        a       r0,=f'1'
        st      r0,'address of a'
        be      $2

because the "a   r0,=f'1'" destroys the required setting of the condition
codes.
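
At the source level, the rewriting that delay performs when the condition
codes are not needed corresponds to the following pair of C functions.
They are an illustration of the equivalence only, with hypothetical
names; delay itself works on expression trees.

```c
/* 'a = b++' as written: returns the value assigned to a and
   increments *b. */
int assign_postinc(int *b) {
    int a = (*b)++;
    return a;
}

/* The same effect with the increment delayed: 'a = b; b++'. */
int assign_then_inc(int *b) {
    int a = *b;
    (*b)++;
    return a;
}
```

Both functions return the old value of b and leave b one greater.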

Reordering is a similar type of optimization.  Many cases that it detects
are useful mainly with register  variables.  If r is a register  variable
and occupies r7, the expression 'r=x+y' is best compiled as

-------------------------------- Page 20 --------------------------------

        l       r7,'value of x'
        a       r7,'value of y'

but the code tables without prior reordering would produce

        l       r2,'value of x'
        a       r2,'value of y'
        lr      r7,r2

The scheme  is to  compile  the expression  as  if it  had  been  written
'r=x; r+=y'.  The reorder routine is  called with a pointer to each  tree
that rcexpr is about to compile; if it has the right characteristics, the
'r=x' tree is built and  passed recursively to rcexpr; then the  original
tree is changed to read  'r+=y' and the calling  instance of rcexpr  com-
piles that instead.  Of course the whole business is itself  recursive so
that more  extended  forms  of  the same  phenomenon  are  handled,  like
'r=x+y|z'.

Care does  have to  be taken  to avoid  'optimizing' an  expression  like
'r=x+r' into 'r=x; r+=r'.  It is  required that the right operand of  the
expression on  the right  of the  '=' be  a register,  distinct from  the
register variable.
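
The danger is easy to demonstrate in C itself.  The three functions below
use hypothetical names and an ordinary variable in place of the register
variable.

```c
/* The expression as written: r = x + r. */
int direct(int x, int r) {
    r = x + r;
    return r;
}

/* The invalid rewriting 'r = x; r += r', which loses the old r. */
int bad_rewrite(int x, int r) {
    r = x;
    r += r;
    return r;
}

/* The valid rewriting of 'r = x + y' for y distinct from r. */
int good_rewrite(int x, int y) {
    int r;
    r = x;
    r += y;
    return r;
}
```

With x equal to 3 and the old r equal to 4, 'direct' yields 7 but
'bad_rewrite' yields 6, since the first assignment has already
overwritten r.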

The second case that  reorder handles  is expressions of  the form  'r=X'
used as a subexpression.  Again,  the code out of the tables for  'x=r=y'
would be

        l       r2,'value of y'
        lr      r7,r2
        st      r2,'address of x'

whereas if r were a register it would be better to produce

        l       r7,'value of y'
        st      r7,'address of x'

When reorder discovers that a register variable is being assigned to in a
subexpression, it calls rcexpr recursively to compile the  subexpression,
then fiddles the tree passed to it  so that the register variable  itself
appears as the operand instead of the whole subexpression.  Here care has
to be taken to avoid an infinite regress, with rcexpr and reorder calling
each other forever to handle assignments to registers.

A third set  of cases  treated by  reorder comes  up when  any name,  not
necessarily a register, occurs as a left operand of an  assignment opera-
tor other than '=' or as an operand of prefix '++' or '--'.  Unless  con-
dition code tests  are involved,  when a subexpression  like '(a+=b)'  is
seen, the assignment is done and the argument  tree changed so that a  is
its operand; effectively  'x+(y+=z)' is  compiled as 'y+=z; x+y'.   Simi-
larly, prefix increment and decrement are pulled out and done first, then

-------------------------------- Page 21 --------------------------------

the remainder of the expression.
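
The third case, written out in C with hypothetical function names, shows
that the reordered form computes the same value and leaves the same
side effect.

```c
/* 'x + (y += z)' as written. */
int embedded(int x, int *y, int z) {
    return x + (*y += z);
}

/* The reordered form: 'y += z; x + y'. */
int pulled_out(int x, int *y, int z) {
    *y += z;
    return x + *y;
}
```

For x, y and z equal to 1, 2 and 3, both functions return 6 and leave y
equal to 5.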

Throughout code generation, the expression  optimizer is called  whenever
delay or reorder changes the expression tree.  This allows some  special
cases to be found that otherwise would not be seen.
