% OVERVIEW OF STRUCTURE WORDS
% strucwords.tex
% David.N.Williams@umich.edu
% Last revision:  July 16, 2000

\section{Overview of structure words}\label{strucwords}
This overview is a brief functional description.  Stack patterns and
other specifications can be found in the implementation file {\tt
cstruct.fs}.

\subsection{Structure word set}
\begin{alltt}
  \verb|struct{  }struct  union{  }union|
  n-aligned  unstruct  field  array-field  bit-field  bit-pad
  cchar  cwchar  cint  cshort  clong  cpointer  cllong
  cfloat  cdouble  cldouble
  /type  /align  make-type-instance  typeof
  make-atomic-type  make-array-type  make-unstruct-type
  >sfa  >sfo  >sfa&type  >sfo&type
\end{alltt}

\subsection{Description}

Here is an example borrowed from Peters (see {\tt examples.fs}), except
that his word {\tt field} corresponds to our {\tt unstruct}, and his
word {\tt struct} corresponds to our {\tt field}:
\clearscreenpage
{\begin{alltt}
      \verb|struct{|
       12 unstruct first
       16 unstruct last
      \verb|}struct| name.struct

      \verb|struct{|
       2 unstruct month
       2 unstruct day
       2 unstruct year
      \verb|}struct| date.struct

      \verb|struct{|
       name.struct field name
       date.struct field doa
       12 unstruct mrn
       64 unstruct precis
      \verb|}struct| pt.struct
\end{alltt}}

The code above defines structure type words {\tt name.struct}, {\tt
date.struct}, and {\tt pt.struct}.  We are not particularly advocating
the {\tt .struct} naming convention.  If we were to do so, it would
probably be a {\tt .s} convention for structures and {\tt.u} for
unions.  The words {\tt unstruct} and {\tt field} between
\verb|struct{| and \verb|}struct| absorb the type information that
precedes them, define or look up identifying tokens for the field
names that follow them ({\tt first}, {\tt last}, {\tt month}, {\tt
day}, {\tt year}, {\tt name}, etc.), and build a sequence of field
definition parameters on the stack.  The word \verb|struct{| initiates
the sequence, and \verb|}struct| creates a structure type word and
compiles the sequence of field parameters from the stack into a
structure table in the structure type word's data field, preceded by
the other information described in Section~\ref{structypedef}.

When executed, the field names leave their identifying tokens, called
\id's, on the stack; and the structure type words leave their \dfa's,
that is, their \stype's.

Here is a structure type including one C {\tt char} field and one
field with an array of 10 C {\tt long}'s:

\begin{alltt}
      \verb|struct{|
       cchar field sue
       10 clong array-field george
      \verb|}struct| harry.struct
\end{alltt}
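
For comparison, the C layout that {\tt harry.struct} is meant to
mirror can be inspected with Python's {\tt ctypes} module.  This is
only a sketch: the class name {\tt Harry} and the C declaration in the
comment are hypothetical, and the printed numbers depend on the
platform's C ABI, so none are quoted here.

\begin{verbatim}
import ctypes

# Hypothetical C-side counterpart of harry.struct:
#   struct harry { char sue; long george[10]; };
class Harry(ctypes.Structure):
    _fields_ = [
        ("sue", ctypes.c_char),          # cchar field sue
        ("george", ctypes.c_long * 10),  # 10 clong array-field george
    ]

long_size = ctypes.sizeof(ctypes.c_long)
long_align = ctypes.alignment(ctypes.c_long)

# sue sits at offset 0; george is padded up to the alignment of long.
print("sue    offset", Harry.sue.offset, "size", Harry.sue.size)
print("george offset", Harry.george.offset, "size", Harry.george.size)
print("struct size", ctypes.sizeof(Harry),
      "alignment", ctypes.alignment(Harry))
\end{verbatim}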

And here is one containing two arrays of {\tt harry.struct} structures:
\clearscreenpage
\begin{alltt}
      \verb|struct{|
       15 harry.struct array-field arthur
       20 harry.struct array-field marie
      \verb|}struct| harry-arrays
\end{alltt}

The examples above are included in {\tt shotype.fs}, along with union
versions and bit-field examples, to illustrate a browser for structure
and union types implemented there.

The word {\tt n-aligned} is mainly a factor in the field constructors
{\tt field}, {\tt array-field}, and {\tt bit-field}; but it can be
used explicitly when alignment of an unstructured field is wanted (the
field constructor {\tt unstruct} does no alignment).  For example, if
one wanted the field
\begin{alltt}
      2 unstruct year 
\end{alltt}
in {\tt date.struct} above to have an alignment of four, one could say:
\begin{alltt}
      2 4 n-aligned unstruct year
\end{alltt}
This is tricky, because it not only has the effect of saying
\begin{alltt}
      4 unstruct year 
\end{alltt}
but also adjusts an implicit alignment deeper on the stack.  The best
policy is to avoid explicit use of {\tt n-aligned} where possible,
relying instead on the implicit minimum alignment of structures and on
padding included directly in the size of the unstructured field.
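
The two effects just described can be modelled in a few lines of
Python.  This is a sketch, not the code in {\tt cstruct.fs}: the
function name and argument order are hypothetical, and it assumes that
the implicit alignment is raised to the requested value when that
value is larger, and never lowered.

\begin{verbatim}
def n_aligned(struct_align, size, n):
    """Model of the described effect of   size n n-aligned.

    Returns (new_struct_align, padded_size).  Assumption: the
    implicit alignment kept deeper on the stack is raised to n
    when n is larger, and size is rounded up to a multiple of n.
    """
    padded = -(-size // n) * n   # round size up to a multiple of n
    return max(struct_align, n), padded

# 2 4 n-aligned unstruct year   behaves like   4 unstruct year,
# and the implicit alignment becomes at least 4.
print(n_aligned(1, 2, 4))
\end{verbatim}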

The word {\tt bit-pad} inserts unnamed bit padding.

The atomic type words {\tt cchar} \ldots\  {\tt cldouble} represent most
of the scalar GNU CC types.  Some of these are not standard C.

The word {\tt /type} converts any of the six type structures into its
data size in bytes, and {\tt /align} converts it into its alignment in
bytes.  For bit-fields, the data size is that of the 0, 1, or 2
container fields.
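
As a rough analogue for the atomic types, the sizes and alignments
used by the local C compiler can be listed with Python's {\tt ctypes}
module.  The pairing of atomic type words with {\tt ctypes} types
below is guessed from the names rather than taken from
{\tt cstruct.fs}, and the two helper functions are hypothetical
stand-ins for {\tt /type} and {\tt /align}.

\begin{verbatim}
import ctypes

# Assumed correspondence between the atomic type words and ctypes
# types; a guess from the names, not taken from cstruct.fs.
ATOMIC = {
    "cchar": ctypes.c_char,
    "cwchar": ctypes.c_wchar,
    "cint": ctypes.c_int,
    "cshort": ctypes.c_short,
    "clong": ctypes.c_long,
    "cpointer": ctypes.c_void_p,
    "cllong": ctypes.c_longlong,
    "cfloat": ctypes.c_float,
    "cdouble": ctypes.c_double,
    "cldouble": ctypes.c_longdouble,
}

def slash_type(name):
    """Data size in bytes, the role played by /type."""
    return ctypes.sizeof(ATOMIC[name])

def slash_align(name):
    """Alignment in bytes, the role played by /align."""
    return ctypes.alignment(ATOMIC[name])

for name in ATOMIC:
    print(f"{name:9s} size {slash_type(name):2d}"
          f"  align {slash_align(name):2d}")
\end{verbatim}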

The words {\tt make-unstruct-type} and {\tt make-array-type} are used
in this implementation by {\tt unstruct} and {\tt array-field} as
factors that build type words on the fly.  They could also be used
explicitly to make unstructured type and array type words to be used
with {\tt field}, dispensing with {\tt unstruct} and {\tt
array-field}.  The word {\tt make-atomic-type} is intended to let the
user cover C implementation-dependent gaps, for example, in our
pointer type coverage.  The {\tt make-} style of nomenclature is
borrowed from Anton Ertl's Gray \cite{aertl:94}.

The word {\tt make-type-instance} is used to allocate type instances.

A number of words like {\tt >sfo} do exactly the same thing when
operating on structure or union types.  To save names, we take the
attitude in such cases that a union is just a kind of structure.

The word {\tt >sfo} converts a structure or union type and a sequence
of \id's for nested substructures that resolves to an atomic or
unstructured or array field, such as {\tt last name} for the structure
type {\tt pt.struct} in the example above, into the offset of the
field from the \sda of an instance.  Here ``{\tt sfo}'' stands for
``structure field offset''.  For bit-fields, the offset of the first
container field is returned.  The word {\tt >sfo} can also convert a
truncated nesting, such as just {\tt name} with {\tt pt.struct}. 
Examples of the syntax are given in Section~\ref{extwords}.
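
The resolution of a chain of identifiers into an offset can be
pictured with the following toy model in Python.  It is a sketch under
stated assumptions, not the algorithm in {\tt cstruct.fs}: all names
are hypothetical, each structure type is a table from field identifier
to offset and type, the chain is written outermost identifier first,
and offsets are computed with no alignment padding, which need not
match the real implementation.

\begin{verbatim}
# Toy model: a structure type is a table mapping a field id to
# (offset, type); type is another table, or a plain byte size for
# an unstructured field.  Offsets assume no alignment padding.
def size_of(ftype):
    if isinstance(ftype, int):
        return ftype
    return sum(size_of(t) for _, t in ftype.values())

def make_struct(*fields):
    table, offset = {}, 0
    for ident, ftype in fields:
        table[ident] = (offset, ftype)
        offset += size_of(ftype)
    return table

def sfo(stype, chain):
    """Offset of the field reached by chain, outermost id first."""
    total = 0
    for ident in chain:
        offset, stype = stype[ident]
        total += offset
    return total

name_struct = make_struct(("first", 12), ("last", 16))
date_struct = make_struct(("month", 2), ("day", 2), ("year", 2))
pt_struct = make_struct(("name", name_struct), ("doa", date_struct),
                        ("mrn", 12), ("precis", 64))

print(sfo(pt_struct, ["name", "last"]))  # field last inside name
print(sfo(pt_struct, ["name"]))          # truncated nesting
print(sfo(pt_struct, ["doa", "year"]))
\end{verbatim}

In this model, adding the resulting offset to the address of an
instance's data gives the address of the field.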

The word {\tt >sfa} does the analogous thing, but takes an \sda as
well as an \stype as input, producing the address of the field instead
of the offset.  For bit-fields, that is the address of the first
container field.

The words \verb|>sfo&type| and \verb|>sfa&type| also leave the type
pointer.  They evolved from a factorization of Peters' implementation,
which returned sizes instead of types.

Nesting in the \id chains resolved by these words cannot go deeper
than an \id for one of the primitive types: unstructured, atomic data,
or bit-field.  It is also stopped by an array type.  Although an array
may have structure elements, a new chain would have to be started to
access any nesting in those, after indexing into the array.  As we
said earlier, we do not implement array access.
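
The two-chain access pattern for arrays of structures can be
illustrated on the C side with Python's {\tt ctypes} module.  This is
a sketch only: the class and function names are hypothetical, and the
classes are assumed C counterparts of {\tt harry.struct} and
{\tt harry-arrays}.  The first chain stops at the array field, the
indexing is done by hand, and a second chain resolves the field inside
the element.

\begin{verbatim}
import ctypes

class Harry(ctypes.Structure):         # mirrors harry.struct
    _fields_ = [("sue", ctypes.c_char),
                ("george", ctypes.c_long * 10)]

class HarryArrays(ctypes.Structure):   # mirrors harry-arrays
    _fields_ = [("arthur", Harry * 15),
                ("marie", Harry * 20)]

def element_field_offset(index):
    """Offset of arthur[index].george from the start of the data."""
    first_chain = HarryArrays.arthur.offset   # chain stops at array
    indexing = index * ctypes.sizeof(Harry)   # done by hand
    second_chain = Harry.george.offset        # new chain in element
    return first_chain + indexing + second_chain

inst = HarryArrays()
direct = (ctypes.addressof(inst.arthur[3].george)
          - ctypes.addressof(inst))
print(element_field_offset(3) == direct)
\end{verbatim}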

See {\tt cstruct.fs} for more details about the Structure word set.

\subsection{Implementation note}

Our implementation makes a type instance for every unstructured field,
array field, and bit-field in a structure type definition.  We
indicated above that {\tt unstruct} and {\tt array-field} can be
eliminated by using explicitly defined types with {\tt field}.  That
would reduce the type overhead if there were several unstructured
fields or arrays of the same type.

In the case of bit-fields, implicit type generation helps us track the
arbitrary bit offsets they can have in their containers.  We have not
been tempted to try to reduce that overhead.

\clearemptydoublepage