% LAYOUT ALGORITHM
% layout.tex
% David.N.Williams@umich.edu
% Last revision:  July 13, 2000

\section{Layout algorithm}\label{layout}

Here is the algorithm we follow for structure and union layout, aimed
at compatibility with the GNU scheme for parametrizing systems, as
long as there are no unstructured fields, and with a certain exception
for bit-fields.  There is no distinction between signed and unsigned
types.  It is up to the user to handle that at the point of field
access, when it is an issue.  Byte-ordering is irrelevant at the
layout level---that affects field data access only.  Whether
bit-allocation for bit-fields is from left to right or right to left
within an embedding integer field is irrelevant at the layout level
for the same reason.

\subsection{Normal fields}

The rules in this subsection are mostly for everything except 
bit-fields.

1.  Each atomic data type has a size in address units and an alignment
in address units, which is system dependent.  These alignments are
often expressible in terms of only a few parameters, but each is
specified independently in our scheme at the level of the data type
structure.  We include a single, generic pointer type as an atomic
data type, not distinguishing among pointers to different data type
instances.  Other atomic data types can be supplied by the user.

2.  Each structure field has a size, an alignment, and an offset from
the beginning of the structure.

3.  The size of a structure is the sum of the sizes of its fields,
plus any padding between fields to achieve alignment of the later
field, plus a padding at the end to round up the size to a multiple of
the structure alignment.  The contribution of bit-fields to the size
will be discussed later.  GNU also includes a system dependent
\verb|ROUND_TYPE_SIZE| macro, which seems to be defined only for the
Intel 80960.  We have omitted this.  It would occur in the words
\verb|}struct| and \verb|}union|.

4.  The size of a union is the maximum of the sizes of its fields,
rounded up in the same way as the size of a structure.  The
contribution of bit-fields is again special.

5.  The alignment of a structure or union is the maximum of a minimum,
system prescribed alignment for structures and unions, and the alignments
of all of its fields except bit-fields, which count as integral atomic
types for the purpose of alignment.

6.  Each structure or union field consists of an atomic data type, a
structure, a union, an array, a bit-field, or an unstructured field.

7.  An array has elements all of the same type (which implies the same
size and alignment), which can be any of the atomic data types, a
structure, a union, or an array.

8.  The size of an array is the number of elements times the size of
one element.  The alignment of an array is the alignment of any
element.

9.  The raw alignment of an unstructured field is one address unit. 
Other alignments may be forced, but are recorded only implicitly in
the offset of the field, not in the unstructured field type instance.

\subsection{Bit-fields}

For bit-fields, we have already mentioned that we do not attempt to
map the GNU macros.  In the C Standard \cite[6.2.1.2]{ansic:90}, named
bit-fields are associated with one of the {\tt int} types (unsigned or
signed), and cannot have a width of more bits than that type.  Our
reading of the standard is that the actual container size can be
anything big enough, and does not have to be built from a sequence of
{\tt int} sizes.  The bit-field is allowed to overlap a container
boundary if there are too few bits available for packing in it, or
not, depending on the implementation.  The alignment of the container
is unspecified.  Unnamed bit-fields with only a container type and a
width specified can be used for padding; and an unnamed bit-field of
width zero forces packing to end in the current sequence of
bit-fields, if any, with the next bit-field starting a new container. 
The syntax for bit-field access is like that for an {\tt int}. 
Whether the policy for bit-field layout when embedding would overlap a
container boundary has to be the same for structures and unions is
undefined in the C Standard.\footnote%
{The terms ``unspecified'' and ``undefined'' have a technical meaning
in the C Standard, roughly the following.  ``Unspecified'' is for
correct language constructs and data, and means the standard
``explicitly imposes no requirements'' (not the same as imposing no
explicit requirements) \cite[3.17]{ansic:90}.  ``Undefined'' is for
nonportable or erroneous situations, and means the standard ``imposes
no requirements'' \cite[3.16]{ansic:90}.}

GNU CC admits other integral types for bit-fields, such as {\tt char},
{\tt long}, etc.

The approach we take in Forth makes possible the layout of a sequence
of contiguous bit-field declarations starting with any alignment, any
number of bits of initial, unnamed padding, any width of named
bit-fields with any unnamed padding in between, embedded in any
commensurate total number of address units with any unnamed
bit-padding at the end that is also commensurate with the total number
of address units.\footnote%
{As far as we understand, this is possible in GNU CC.}

We adopt the spirit that the container type not only limits the
maximum size of a bit-field, but also has size and alignment
implications for the container.  Flexibility of container size and
alignment is achieved by allowing non-{\tt int} container types.

1.  Any atomic type is allowed for bit-field embedding.  We call this
the container type or the type of the container field.  Container
types with the same number of bits and the same alignment are not
distinguished from each other.  The spirit is that the container type
should be of integral type, i.e., {\tt char}, {\tt int}, etc.

2.  The width of a named bit-field must be nonzero, while that of an
unnamed bit-field may be zero.

3.  When nonzero the width of a bit-field may not be larger than that
of its container type.  As many bits as possible are allocated from
the unallocated bits in the container field of any immediately
preceding bit-field with the same container type; and if a nonzero
width is left over, a new container field of the same type is started
for the remaining bits.  That is, we require the bit-field to overlap
an embedding boundary in such a case.  If there is an immediately
preceding bit-field of different container type, there is no embedding
in the preceding container field; and a new container field is started
(with the alignment of its type) for all of the bits in the bit-field. 
This paragraph applies only to nonzero widths.  It implies that such
bit-fields have either one container field, or two container fields of
the same type when it straddles an embedding boundary.  We require
that structure and union bit-field elements have the same layout;
i.e., if a bit-field in a structure overlaps, requiring two container
units, it also overlaps and generates two container units in a union. 
The offset of the first container field is zero in a union.

4.  A zero-width bit-field with any container type prevents further
embedding into any immediately preceding field, and aligns the initial
offset for the next structure or union field according to the
container type.  It does not allocate a new container field.

5.  Unnamed bit-fields survive in the structure or union table.  This
would not be necessary just to get the major effect of padding the
offsets of succeeding named bit-fields within their container fields,
and the offsets of those container fields and of other succeeding
named fields within the structure or union; but we think it's a good
idea to record the unnamed fields in the structure or union type
information.

6.  Both named and unnamed bit-fields may occur in any order, adjacent
to each other or isolated among other fields.

\clearemptydoublepage