% LAYOUT ALGORITHM % layout.tex % David.N.Williams@umich.edu % Last revision: July 13, 2000 \section{Layout algorithm}\label{layout} Here is the algorithm we follow for structure and union layout, aimed at compatibility with the GNU scheme for parametrizing systems, as long as there are no unstructured fields, and with a certain exception for bit-fields. There is no distinction between signed and unsigned types. It is up to the user to handle that at the point of field access, when it is an issue. Byte-ordering is irrelevant at the layout level---that affects field data access only. Whether bit-allocation for bit-fields is from left to right or right to left within an embedding integer field is irrelevant at the layout level for the same reason. \subsection{Normal fields} The rules in this subsection are mostly for everything except bit-fields. 1. Each atomic data type has a size in address units and an alignment in address units, which is system dependent. These alignments are often expressible in terms of only a few parameters, but each is specified independently in our scheme at the level of the data type structure. We include a single, generic pointer type as an atomic data type, not distinguishing among pointers to different data type instances. Other atomic data types can be supplied by the user. 2. Each structure field has a size, an alignment, and an offset from the beginning of the structure. 3. The size of a structure is the sum of the sizes of its fields, plus any padding between fields to achieve alignment of the later field, plus a padding at the end to round up the size to a multiple of the structure alignment. The contribution of bit-fields to the size will be discussed later. GNU also includes a system dependent \verb|ROUND_TYPE_SIZE| macro, which seems to be defined only for the Intel 80960. We have omitted this. It would occur in the words \verb|}struct| and \verb|}union|. 4. The size of a union is the maximum of the sizes of its fields, rounded up in the same way as the size of a structure. The contribution of bit-fields is again special. 5. The alignment of a structure or union is the maximum of a minimum, system prescribed alignment for structures and unions, and the alignments of all of its fields except bit-fields, which count as integral atomic types for the purpose of alignment. 6. Each structure or union field consists of an atomic data type, a structure, a union, an array, a bit-field, or an unstructured field. 7. An array has elements all of the same type (which implies the same size and alignment), which can be any of the atomic data types, a structure, a union, or an array. 8. The size of an array is the number of elements times the size of one element. The alignment of an array is the alignment of any element. 9. The raw alignment of an unstructured field is one address unit. Other alignments may be forced, but are recorded only implicitly in the offset of the field, not in the unstructured field type instance. \subsection{Bit-fields} For bit-fields, we have already mentioned that we do not attempt to map the GNU macros. In the C Standard \cite[6.2.1.2]{ansic:90}, named bit-fields are associated with one of the {\tt int} types (unsigned or signed), and cannot have a width of more bits than that type. Our reading of the standard is that the actual container size can be anything big enough, and does not have to be built from a sequence of {\tt int} sizes. The bit-field is allowed to overlap a container boundary if there are too few bits available for packing in it, or not, depending on the implementation. The alignment of the container is unspecified. Unnamed bit-fields with only a container type and a width specified can be used for padding; and an unnamed bit-field of width zero forces packing to end in the current sequence of bit-fields, if any, with the next bit-field starting a new container. The syntax for bit-field access is like that for an {\tt int}. Whether the policy for bit-field layout when embedding would overlap a container boundary has to be the same for structures and unions is undefined in the C Standard.\footnote% {The terms ``unspecified'' and ``undefined'' have a technical meaning in the C Standard, roughly the following. ``Unspecified'' is for correct language constructs and data, and means the standard ``explicitly imposes no requirements'' (not the same as imposing no explicit requirements) \cite[3.17]{ansic:90}. ``Undefined'' is for nonportable or erroneous situations, and means the standard ``imposes no requirements'' \cite[3.16]{ansic:90}.} GNU CC admits other integral types for bit-fields, such as {\tt char}, {\tt long}, etc. The approach we take in Forth makes possible the layout of a sequence of contiguous bit-field declarations starting with any alignment, any number of bits of initial, unnamed padding, any width of named bit-fields with any unnamed padding in between, embedded in any commensurate total number of address units with any unnamed bit-padding at the end that is also commensurate with the total number of address units.\footnote% {As far as we understand, this is possible in GNU CC.} We adopt the spirit that the container type not only limits the maximum size of a bit-field, but also has size and alignment implications for the container. Flexibility of container size and alignment is achieved by allowing non-{\tt int} container types. 1. Any atomic type is allowed for bit-field embedding. We call this the container type or the type of the container field. Container types with the same number of bits and the same alignment are not distinguished from each other. The spirit is that the container type should be of integral type, i.e., {\tt char}, {\tt int}, etc. 2. The width of a named bit-field must be nonzero, while that of an unnamed bit-field may be zero. 3. When nonzero the width of a bit-field may not be larger than that of its container type. As many bits as possible are allocated from the unallocated bits in the container field of any immediately preceding bit-field with the same container type; and if a nonzero width is left over, a new container field of the same type is started for the remaining bits. That is, we require the bit-field to overlap an embedding boundary in such a case. If there is an immediately preceding bit-field of different container type, there is no embedding in the preceding container field; and a new container field is started (with the alignment of its type) for all of the bits in the bit-field. This paragraph applies only to nonzero widths. It implies that such bit-fields have either one container field, or two container fields of the same type when it straddles an embedding boundary. We require that structure and union bit-field elements have the same layout; i.e., if a bit-field in a structure overlaps, requiring two container units, it also overlaps and generates two container units in a union. The offset of the first container field is zero in a union. 4. A zero-width bit-field with any container type prevents further embedding into any immediately preceding field, and aligns the initial offset for the next structure or union field according to the container type. It does not allocate a new container field. 5. Unnamed bit-fields survive in the structure or union table. This would not be necessary just to get the major effect of padding the offsets of succeeding named bit-fields within their container fields, and the offsets of those container fields and of other succeeding named fields within the structure or union; but we think it's a good idea to record the unnamed fields in the structure or union type information. 6. Both named and unnamed bit-fields may occur in any order, adjacent to each other or isolated among other fields. \clearemptydoublepage