% OVERVIEW OF STRUCTURE WORDS
% strucwords.tex
% David.N.Williams@umich.edu
% Last revision: July 16, 2000

\section{Overview of structure words}\label{strucwords}

This overview is a brief functional description. Stack patterns and
other specifications can be found in the implementation file
{\tt cstruct.fs}.

\subsection{Structure word set}

\begin{alltt}
\verb|struct{ }struct union{ }union|
n-aligned
unstruct field array-field bit-field bit-pad
cchar cwchar cint cshort clong cpointer
cllong cfloat cdouble cldouble
/type /align
make-type-instance typeof
make-atomic-type make-array-type make-unstruct-type
>sfa >sfo >sfa&type >sfo&type
\end{alltt}

\subsection{Description}

Here is an example borrowed from Peters (see {\tt examples.fs}) except
that his word {\tt field} corresponds to our {\tt unstruct}, and his
word {\tt struct} corresponds to our {\tt field}:

\clearscreenpage
{\begin{alltt}
\verb|struct{|
   12 unstruct first
   16 unstruct last
\verb|}struct| name.struct

\verb|struct{|
   2 unstruct month
   2 unstruct day
   2 unstruct year
\verb|}struct| date.struct

\verb|struct{|
   name.struct field name
   date.struct field doa
   12 unstruct mrn
   64 unstruct precis
\verb|}struct| pt.struct
\end{alltt}}

The code above defines structure type words {\tt name.struct},
{\tt date.struct}, and {\tt pt.struct}. We are not particularly
advocating the {\tt .struct} naming convention. If we were to do so, it
would probably be a {\tt .s} convention for structures and {\tt .u} for
unions.

The words {\tt unstruct} and {\tt field} between \verb|struct{| and
\verb|}struct| absorb the type information that precedes them, define or
look up identifying tokens for the field names that follow them
({\tt first}, {\tt last}, {\tt month}, {\tt day}, {\tt year},
{\tt name}, etc.), and build a sequence of field definition parameters
on the stack.
The word \verb|struct{| initiates the sequence, and \verb|}struct|
creates a structure type word and compiles the sequence of field
parameters from the stack into a structure table in the structure type
word's data field, preceded by the other information described in
Section~\ref{structypedef}.

When executed, the field names leave their identifying tokens, called
\id's, on the stack; and the structure type words leave their \dfa's,
that is, their \stype's.

Here is a structure type including one C {\tt char} field and one field
with an array of 10 C {\tt long}'s:

\begin{alltt}
\verb|struct{|
   cchar field sue
   10 clong array-field george
\verb|}struct| harry.struct
\end{alltt}

And here is one containing two arrays of {\tt harry.struct} structures:

\clearscreenpage
\begin{alltt}
\verb|struct{|
   15 harry.struct array-field arthur
   20 harry.struct array-field marie
\verb|}struct| harry-arrays
\end{alltt}

The examples above are included in {\tt shotype.fs}, along with union
versions and bit-field examples, to illustrate a browser for structure
and union types implemented there.

The word {\tt n-aligned} is mainly a factor in the field constructors
{\tt field}, {\tt array-field}, and {\tt bit-field}; but it can be used
explicitly when alignment of an unstructured field is wanted (the field
constructor {\tt unstruct} does no alignment). For example, if one
wanted the field

\begin{alltt}
2 unstruct year
\end{alltt}

in {\tt date.struct} above to have an alignment of four, one could say:

\begin{alltt}
2 4 n-aligned unstruct year
\end{alltt}

This is tricky, because it not only has the effect of saying

\begin{alltt}
4 unstruct year
\end{alltt}

but also adjusts an implicit alignment deeper on the stack. The best
policy is to avoid explicit use of {\tt n-aligned} if possible, and to
rely on the implicit minimum alignment of structures, plus padding
included directly in the size of the unstructured field.

The word {\tt bit-pad} inserts unnamed bit padding.
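
As a rough illustration of the arithmetic that alignment involves, the
following sketch rounds an offset up to the next multiple of an
alignment. The word {\tt round-up} is hypothetical: it is not part of
{\tt cstruct.fs}, and it is not the definition of {\tt n-aligned}, whose
stack pattern is given in the implementation file. It uses only standard
Forth words and assumes a nonnegative offset and a positive alignment.

\begin{verbatim}
\ Hypothetical helper, for illustration only; not from cstruct.fs.
: round-up  ( offset n -- offset' )
   tuck 1- +      \ stack: n offset+n-1
   over / *  ;    \ n times the quotient (offset+n-1)/n

6 4 round-up .    \ prints 8
8 4 round-up .    \ prints 8
\end{verbatim}

An offset that is already a multiple of the alignment is left unchanged,
so applying such a step to a field that happens to be aligned costs no
padding.
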

The atomic type words {\tt cchar} \ldots\ {\tt cldouble} represent most
of the scalar GNU CC types. Some of these are not standard C.

The word {\tt /type} converts any of the six type structures into its
data size in bytes, and {\tt /align} converts it into the size of its
alignment. For bit-fields, the data size is that of the 0, 1, or 2
container fields.

The words {\tt make-unstruct-type} and {\tt make-array-type} are used in
this implementation by {\tt unstruct} and {\tt array-field} as factors
that build type words on the fly. They could also be used explicitly to
make unstructured type and array type words to be used with {\tt field},
dispensing with {\tt unstruct} and {\tt array-field}.

The word {\tt make-atomic-type} is intended to let the user cover C
implementation-dependent gaps, for example, in our pointer type
coverage. The {\tt make-} style of nomenclature is borrowed from Anton
Ertl's Gray \cite{aertl:94}.

The word {\tt make-type-instance} is used to allocate type instances.

A number of words like {\tt >sfo} do exactly the same thing when
operating on structure or union types. To save names, we take the
attitude in such cases that a union is just a kind of structure.

The word {\tt >sfo} converts a structure or union type and a sequence of
\id's for nested substructures that resolves to an atomic or
unstructured or array field, such as {\tt last name} for the structure
type {\tt pt.struct} in the example above, into the offset of the field
from the \sda of an instance. Here ``{\tt sfo}'' stands for ``structure
field offset''. For bit-fields, the offset of the first container field
is returned. The word {\tt >sfo} can also convert a truncated nesting,
such as just {\tt name} with {\tt pt.struct}. Examples of the syntax are
given in Section~\ref{extwords}.

The word {\tt >sfa} does the analogous thing, but takes an \sda as well
as an \stype as input, producing the address of the field instead of the
offset.
For bit-fields, that is the address of the first container field.

The words \verb|>sfo&type| and \verb|>sfa&type| also leave the type
pointer. They evolved from a factorization of Peters' implementation,
which returned sizes instead of types.

Nesting in the \id chains resolved by these words cannot go deeper than
an \id for one of the primitive types: unstructured, atomic data, or
bit-field. It is also stopped by an array type. Although an array may
have structure elements, a new chain would have to be started to access
any nesting in those, after indexing into the array. As we said earlier,
we do not implement array access.

See {\tt cstruct.fs} for more details about the Structure word set.

\subsection{Implementation note}

Our implementation makes a type instance for every unstructured field,
array field, and bit-field in a structure type definition. We indicated
above that {\tt unstruct} and {\tt array-field} can be eliminated by
using explicitly defined types with {\tt field}. That would reduce the
type overhead if there were several unstructured fields or arrays of the
same type.

In the case of bit-fields, implicit type generation helps us track the
arbitrary bit offsets they can have in their containers. We have not
been tempted to try to reduce that overhead.

\clearemptydoublepage