% STRUCTURE OF STRUCTURE DATA
% strucdata.tex
% David.N.Williams@umich.edu
% Last revision:  July 16, 2000

\section{Structure of structure data}\label{structdata}

To keep the discussion concrete, we describe here the layout of
structure data and structure type information that we have in mind.

\subsection{Structure instances}

A structure instance contains the actual data of a structure, and
possibly additional information.  Instances may be either named or
unnamed.  The essential kernel of either kind of instance is the
structure data itself, which we understand to exclude type
information.  This is the part whose layout should be C compatible.
We call the address of such a memory block of pure data the \sda, for
``structure data address''.  When we speak of a structure pointer, we
mean the \sda.  Substructure instances are presumed to contain only
pure data.

As a rule of thumb in our discussion, unnamed structure instances
contain only pure data, and named instances (in the sense of named
Forth words) contain one extra piece of information, a pointer to the
structure type information.  That pointer is called an \stype. 
Although substructures have named identifiers, substructure instances
are not ``named structures'' in the Forth sense, and as we said above
contain only pure data.

Layout of a typical named structure instance:
\begin{alltt}
       stype
       field 1 data
       \vdots
       field n data
\end{alltt}

The address of {\tt field~1 data} is the \sda.  Alignment compatible
with the underlying system is assumed throughout.  There is a subtlety
here.  The named structure instance is typically made with
{\tt CREATE}, and \stype occupies the first cell in its data field, at
the Forth-aligned \dfa.  Then {\tt field~1 data} is not necessarily
located one cell after the \dfa, because the structure alignment may
be stricter than the cell alignment.  Structure data access words must
take this into account.
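
In C terms, the address arithmetic that structure data access words
must perform for a named instance amounts to the following.  This is a
minimal sketch, not code from {\tt cstruct.fs}: it treats addresses as
unsigned integers, assumes that the structure alignment is a power of
two, and uses the hypothetical names {\tt align\_up} and
{\tt named\_instance\_sda}.

\begin{verbatim}
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (a power of two). */
uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Hypothetical helper: given the Forth-aligned dfa of a named instance,
   whose first cell holds the stype, return the sda.  The pure data start
   at the first address past that cell that meets the structure
   alignment. */
uintptr_t named_instance_sda(uintptr_t dfa, size_t cell_size,
                             uintptr_t struct_align)
{
    return align_up(dfa + cell_size, struct_align);
}
\end{verbatim}

For example, with 4-byte cells and a structure alignment of 8, a \dfa
of {\tt 0x1000} yields an \sda of {\tt 0x1008} rather than
{\tt 0x1004}.  When the structure alignment does not exceed the Forth
cell alignment, the \sda is simply one cell past the \dfa.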

A typical unnamed instance omits the \stype and consists of pure data
alone.

\subsection{Structure type definitions}\label{structypedef}

The structure type definition contains the information about the
layout and sizes of the structure fields.  We implement the structure
type pointer, or \stype, as the \dfa of a structure type word.

\clearscreenpage

Layout of a structure type definition (implementation dependent):

\begin{alltt}
              code field {\normalfont(\verb|CREATE|'d action leaves the \dfa)}
      stype:  structure data size {\normalfont (including padding)}
              class {\normalfont (1 for structures)}
              structure alignment
              #fields
              field 1 parameters
              \vdots
              field n parameters
\end{alltt}
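
Pictured as a C declaration, the header of this data field, starting
at the \stype, might look as follows.  This is only an illustration,
assuming that each entry occupies one {\tt size\_t}; the names
{\tt struct~stable\_header}, {\tt struct~sample}, and
{\tt sample\_header} are hypothetical.  The size and alignment are
taken from the C compiler via {\tt sizeof} and the C11 {\tt \_Alignof}
operator, since those are the values a C compatible structure table has
to reproduce.

\begin{verbatim}
#include <assert.h>
#include <stddef.h>

/* Hypothetical C picture of the structure table header at the stype;
   each member stands for one cell of the data field. */
struct stable_header {
    size_t size;     /* structure data size, including padding */
    size_t klass;    /* field class: 1 for structures */
    size_t align;    /* structure alignment */
    size_t nfields;  /* number of field parameter records that follow */
};

/* Sample C structure that the table is meant to describe. */
struct sample { char c; int i; char d; };

/* Fill in the header from what the C compiler reports. */
struct stable_header sample_header(void)
{
    struct stable_header h;
    h.size = sizeof(struct sample);      /* includes trailing padding */
    h.klass = 1;
    h.align = _Alignof(struct sample);
    h.nfields = 3;
    /* C makes the size a multiple of the alignment, so arrays of the
       structure stay aligned. */
    assert(h.size % h.align == 0);
    return h;
}
\end{verbatim}

On a typical system with 4-byte {\tt int} alignment,
{\tt sizeof(struct sample)} is 12 rather than 6: three bytes of padding
follow {\tt c}, and three more follow {\tt d} so that the size is a
multiple of the alignment.  Trailing padding of this kind is what the
first entry has to include.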

In this discussion the ``field'' in {\tt field~1}, etc., is used in the
sense of a C structure element.  In discussions of standard C,
``field'' is sometimes used as a synonym for ``bit-field''
\cite[p.~149]{k&r:88}, a practice we avoid.

The data field arrangement, following the code field, differs only
slightly from Peters' ``type definition table''.  We call it the
``structure table'' instead, reserving ``type definition'' for more
generic data typing.  In other words, our \stype is the address of the
structure table.  Storing an explicit field count rather than a table
terminator is an implementation detail.  So is the layout of the
structure table as a whole, but we prefer to keep the field parameters
in the same order as the fields are stored in the structure data.

The field parameters in the structure table allow the construction of
field offsets from the beginning of pure structure data, including
substructure nesting.  As long as they satisfy this function, their
order and content are implementation details.

Layout of structure field parameters (implementation dependent):
\begin{alltt}
       field identifier
       field offset
       field type pointer
\end{alltt}
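
The following C sketch illustrates how parameters of this kind suffice
to construct nested field offsets: the offset of a field inside a
substructure is the substructure's own offset plus the field's offset
within it.  It is a minimal sketch, not the {\tt cstruct.fs}
implementation.  The record types, the function {\tt nested\_offset},
and the sample tables are hypothetical; the structure table is reduced
to a field count and the field records, and a null type pointer marks a
field that is not a substructure.  Filling the sample tables with
{\tt offsetof} lets the result be checked against the C compiler.

\begin{verbatim}
#include <assert.h>
#include <stddef.h>

struct stable;  /* forward declaration */

/* Hypothetical field parameter record, one per field. */
struct fparam {
    const char *identifier;     /* field identifier */
    size_t offset;              /* offset within the enclosing structure */
    const struct stable *type;  /* field type pointer; NULL if the field
                                   is not a substructure */
};

/* Hypothetical structure table, reduced to what offsets need. */
struct stable {
    size_t nfields;
    const struct fparam *fields;
};

/* Offset from the sda of the outermost structure to the field reached
   by following path[0..depth-1], one field index per nesting level. */
size_t nested_offset(const struct stable *t, const size_t *path,
                     size_t depth)
{
    size_t off = 0;
    for (size_t k = 0; k < depth; k++) {
        assert(t != NULL && path[k] < t->nfields);
        const struct fparam *f = &t->fields[path[k]];
        off += f->offset;
        t = f->type;            /* descend into the substructure, if any */
    }
    return off;
}

/* Sample C structures, used to fill the tables via offsetof. */
struct point { int x, y; };
struct rect  { char tag; struct point lo, hi; };

static const struct fparam point_fields[] = {
    { "x", offsetof(struct point, x), NULL },
    { "y", offsetof(struct point, y), NULL },
};
static const struct stable point_table = { 2, point_fields };

static const struct fparam rect_fields[] = {
    { "tag", offsetof(struct rect, tag), NULL },
    { "lo",  offsetof(struct rect, lo),  &point_table },
    { "hi",  offsetof(struct rect, hi),  &point_table },
};
const struct stable rect_table = { 3, rect_fields };
\end{verbatim}

For instance, following the path {\tt hi}, {\tt y} through
{\tt rect\_table} gives {\tt offsetof(struct rect, hi)} plus
{\tt offsetof(struct point, y)}.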
        
\subsection{Structure field classes}

We require six classes of fields, organized in this implementation as
follows:
\begin{center}\noindent
\begin{tabular}{lc>{\tt}l}
        field&                  class&  \normalfont type pointer\\
        \hline
        unstructured data&      0&      ustype\\
        structure&              1&      stype\\
        atomic data type&       2&      adtype\\
        array&                  3&      atype\\
        union&                  4&      utype\\
        bit-field&              5&      bftype\\
\end{tabular}\end{center}

The class numbering is implementation dependent.  The numbering here
reflects our implementation priorities, with levels of conditional
compilation in mind.  An important use of the class number is to
indicate nesting termination.
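
One plausible reading of nesting termination is sketched below in C: a
walk through nested field types descends through structures, arrays,
and unions, and stops at unstructured data, atomic data, and
bit-fields.  The enumerator names and the function {\tt class\_nests}
are hypothetical; only the numeric values are taken from the table
above.

\begin{verbatim}
#include <assert.h>
#include <stdbool.h>

/* Field classes, numbered as in the table above. */
enum field_class {
    CLASS_UNSTRUCTURED = 0,   /* ustype */
    CLASS_STRUCTURE    = 1,   /* stype  */
    CLASS_ATOMIC       = 2,   /* adtype */
    CLASS_ARRAY        = 3,   /* atype  */
    CLASS_UNION        = 4,   /* utype  */
    CLASS_BITFIELD     = 5    /* bftype */
};

/* Hypothetical test: does a field of this class lead to a further level
   of nesting (true), or does nesting terminate here (false)? */
bool class_nests(enum field_class c)
{
    switch (c) {
    case CLASS_STRUCTURE:
    case CLASS_ARRAY:
    case CLASS_UNION:
        return true;
    case CLASS_UNSTRUCTURED:
    case CLASS_ATOMIC:
    case CLASS_BITFIELD:
        return false;
    }
    assert(!"unknown field class");
    return false;
}
\end{verbatim}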

The unstructured data class should not appear in C compatible
structures.  Together with the structure class alone, however, it can
provide a simple, standalone Forth structure facility with full
nesting and independent field identifiers, in which the user keeps
track of primary data types and sizes.  This kernel is our ANS Forth
translation of Peters' implementation, and {\tt cstruct.fs} has an
option to compile only this kernel.

The atomic data types are the standard C types, such as {\tt char},
{\tt short}, {\tt int}, {\tt long}, {\tt float}, {\tt long double},
and {\tt char*}.  Each has a type definition pointed to by an \adtype;
these type definitions are described in Section~\ref{typestruc}.

Arrays consist of elements of a single kind (in particular, of a single
size), which may belong to any of the six classes except bit-fields.
To be C compatible, the array elements should not be unstructured data.
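
Since all elements share one size, the offset of an array element
depends only on that size and the index, provided the size is the
padded one that {\tt sizeof} reports.  A minimal C sketch, using the
hypothetical names {\tt array\_element\_offset}, {\tt struct~pair},
and {\tt sample\_array}:

\begin{verbatim}
#include <assert.h>
#include <stddef.h>

/* Offset of element `index` from the start of the array data.  The
   element size must be the padded size, i.e. what sizeof reports. */
size_t array_element_offset(size_t elem_size, size_t index)
{
    assert(elem_size > 0);
    return index * elem_size;
}

/* Sample element type: sizeof includes the padding after b. */
struct pair { int a; char b; };
struct pair sample_array[4];
\end{verbatim}

On a typical system with 4-byte {\tt int} alignment,
{\tt sizeof(struct pair)} is 8 rather than 5, so consecutive elements
start 8 bytes apart.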

Unions consist of elements whose storage overlaps; the size of a union
must accommodate its largest element, and its alignment its most
strictly aligned element.  The elements may belong to any of the six
classes, except that unstructured data should not be included in C
compatible unions.

As Kernighan and Ritchie express it \cite[pp.~148, 213]{k&r:88}, a
union is just a structure with all elements offset by zero from the
beginning, with alignment accommodating the biggest alignment of any
element, and with size big enough to hold any element.
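
This description translates directly into a layout rule: every element
has offset zero, the alignment is the largest element alignment, and
the size is the largest element size rounded up to that alignment, so
that arrays of the union stay aligned.  The C sketch below implements
that rule, assuming power-of-two alignments; {\tt struct~layout},
{\tt union\_layout}, and {\tt union~sample\_union} are hypothetical
names.  On common ABIs the result matches what the compiler reports for
a real union.

\begin{verbatim}
#include <assert.h>
#include <stddef.h>

struct layout { size_t size, align; };

/* Combine element layouts into a union layout; every element sits at
   offset zero, so only the size and alignment need computing. */
struct layout union_layout(const struct layout *elems, size_t n)
{
    struct layout u = { 0, 1 };
    for (size_t k = 0; k < n; k++) {
        assert(elems[k].align != 0 &&
               (elems[k].align & (elems[k].align - 1)) == 0);
        if (elems[k].size > u.size)
            u.size = elems[k].size;
        if (elems[k].align > u.align)
            u.align = elems[k].align;
    }
    /* Round the size up to a multiple of the alignment. */
    u.size = (u.size + u.align - 1) & ~(u.align - 1);
    return u;
}

/* Sample union for comparison with the compiler. */
union sample_union { char c[5]; int i; };
\end{verbatim}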

There will be more words later about bit-fields than we would have wished.

In this implementation, each structure field parameter occupies one
cell, so the three parameters take 12 bytes per field in 32-bit
environments.  Since we expect that most applications will either
involve relatively few structure tables or, when there are many, run
in an industrial-strength environment, this overhead may not be
excessive.  On the other hand, restricting structure instances to 64K
bytes is likely to be more than adequate, in which case 16 bits for
each of the first two parameters should suffice, reducing the overhead
to 8 bytes per field.
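
The overhead figures can be checked with C record types standing in for
the field parameters, assuming 32-bit cells and a 32-bit type pointer.
The names {\tt struct~fparam\_cells} and {\tt struct~fparam\_compact}
are hypothetical, and {\tt uint32\_t} stands in for the type pointer so
that the sizes do not depend on the host's pointer width.

\begin{verbatim}
#include <assert.h>
#include <stdint.h>

/* One 32-bit cell per parameter: 3 x 4 = 12 bytes per field. */
struct fparam_cells {
    uint32_t identifier;   /* field identifier */
    uint32_t offset;       /* field offset */
    uint32_t type;         /* stands in for a 32-bit field type pointer */
};

/* 16-bit identifier and offset (instances up to 64K bytes):
   2 + 2 + 4 = 8 bytes per field. */
struct fparam_compact {
    uint16_t identifier;
    uint16_t offset;       /* 0 .. 65535 */
    uint32_t type;
};

static_assert(sizeof(struct fparam_cells) == 12, "three 32-bit cells");
static_assert(sizeof(struct fparam_compact) == 8, "2 + 2 + 4 bytes");
\end{verbatim}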

\clearemptydoublepage