% STRUCTURE OF STRUCTURE DATA
% strucdata.tex
% David.N.Williams@umich.edu
% Last revision: July 16, 2000

\section{Structure of structure data}\label{structdata}

In the interest of focus, we describe here the layout of structure
data and type information that we have in mind.

\subsection{Structure instances}

A structure instance contains the actual data of a structure, and
possibly more information. Instances may be either named or unnamed.

The essential kernel of either kind of instance is the structure data
itself, which we understand to exclude type information. This is the
part whose layout should be C compatible. We call the address of such
a memory block of pure data the \sda, for ``structure data address''.
If we speak of a structure pointer, we mean the \sda. Substructure
instances are presumed to contain only pure data.

As a rule of thumb in our discussion, unnamed structure instances
contain only pure data, and named instances (in the sense of named
Forth words) contain one extra piece of information, a pointer to the
structure type information. That pointer is called an \stype.

Although substructures have named identifiers, substructure instances
are not ``named structures'' in the Forth sense, and as we said above
contain only pure data.

Layout of a typical named structure instance:

\begin{alltt}
stype
field 1 data
\vdots
field n data
\end{alltt}

The address of {\tt field~1 data} is the \sda. Alignment for
compatibility with the underlying system is understood.

There is a subtlety here. The named structure instance is typically
made with {\tt CREATE}, and \stype occupies the first cell in its data
field, at the Forth-aligned \dfa. Then {\tt field~1 data} is not
necessarily located one cell after the \dfa, because the structure
alignment may not be consistent with that. Structure data access words
have to take this into account.

A typical unnamed instance would omit the \stype.

\subsection{Structure type definitions}\label{structypedef}

The structure type definition contains the information about the
layout and sizes of the structure fields. We implement the structure
type pointer, or \stype, as the \dfa of a structure type word.

\clearscreenpage

Layout of a structure type definition (implementation dependent):

\begin{alltt}
       code field  {\normalfont(\verb|CREATE|'d action leaves the \dfa)}
stype: structure data size  {\normalfont (including padding)}
       class  {\normalfont (1 for structures)}
       structure alignment
       #fields
       field 1 parameters
       \vdots
       field n parameters
\end{alltt}

In this discussion the ``field'' in {\tt field~1}, etc., is used in
the sense of a C structure element. In discussions of standard C,
``field'' is sometimes used as a synonym for ``bit-field''
\cite[p.~149]{k&r:88}, a practice we avoid.

The data field arrangement, following the code field, varies only a
little from Peters' ``type definition table''. We call it instead the
``structure table'', reserving ``type definition'' for more generic
data typing. In other words, our \stype is the address of the
structure table.

Including explicit information on the number of fields rather than a
table termination signal is an implementation detail.

Although the layout of the structure table is an implementation
detail, we like keeping the ordering the same as for the storage of
the structure data.

The field parameters in the structure table allow the construction of
field offsets from the beginning of pure structure data, including
substructure nesting. As long as they satisfy this function, their
order and content are implementation details.

Layout of structure field parameters (implementation dependent):

\begin{alltt}
field identifier
field offset
field type pointer
\end{alltt}

\subsection{Structure field classes}

We require six classes of fields, organized in this implementation as
follows:

\begin{center}\noindent
\begin{tabular}{lc>{\tt}l}
field& class& \normalfont type pointer\\
\hline
unstructured data& 0& ustype\\
structure& 1& stype\\
atomic data type& 2& adtype\\
array& 3& atype\\
union& 4& utype\\
bit-field& 5& bftype\\
\end{tabular}\end{center}

The class numbering is implementation dependent. The numbering here
reflects our personal implementation priority, with levels of
conditional compilation in mind. An important use of the class number
is to indicate nesting termination.

The unstructured data class should not be included in C compatible
structures. Taken together with just the structure class, it can
provide a simple, standalone Forth structure facility with full
nesting and independent field identifiers, where the user keeps track
of primary data types and sizes. This kernel is our ANS translation of
Peters' implementation, and there is an option in {\tt cstruct.fs} to
compile just that much.

The atomic data types are the standard C types, {\tt char},
{\tt short}, {\tt long}, {\tt int}, {\tt long double}, {\tt float},
{\tt char*}, etc. Each has a type definition pointed to by an \adtype,
described in Section~\ref{typestruc}.

Arrays are made of elements all of the same kind (including especially
size), which may belong to any of the six classes except bit-fields.
To be C compatible, the array elements should not be unstructured
data.

Unions are made of elements whose storage space overlaps, with size
and alignment large enough to accommodate the largest. The elements
may belong to any of the six classes, except that unstructured data
should not be included in C compatible unions.

As Kernighan and Ritchie express it \cite[pp.~148, 213]{k&r:88}, a
union is just a structure with all elements offset by zero from the
beginning, with alignment accommodating the biggest alignment of any
element, and with size big enough to hold any element.

There will be more words later about bit-fields than we would have
wished.

In this implementation, each of the three structure field parameters
occupies one cell, which means 12 bytes for each field in 32-bit
environments. Since we expect that most applications will either
involve relatively few structure tables, or will correspond to an
industrial strength environment when there are many, this may not be
excessive. On the other hand, restricting structure instances to 64K
bytes is likely to be more than adequate, in which case 16 bits for
each of the first two parameters should suffice, which would reduce
the overhead to 8 bytes per field (two 16-bit parameters plus a
one-cell type pointer).

\clearemptydoublepage