At first glance, the issues of software quality and software installability are almost completely distinct. There are high-quality software products which are trivial to install, and high-quality software products which are a nightmare. There are small, off-the-cuff programs on the net which are trivially easy to install, and others which give one nightmares.
After fifteen years as both software developer and system administrator, I have found a number of features of software design which affect the installability of the end product. These are not large features, and during software development they usually seem to be trivial. This paper will attempt to point out what makes software installable and why one particular choice might be superior to another. It will also describe some of the system checks and precautions which a good install script will do.
There are standards efforts under way for software installation [Archer93], but these focus much more on the mechanics of installation than on the qualities which make a package installable. The resulting standard will be an improvement over the current wild mish-mash, but it will not in and of itself improve the quality of the installations. It will simply make them more predictable.
As a part of this, we will occasionally discuss a theoretical word processing package, WhizzyWimp[2]. All of WhizzyWimp's installation features will be based on features of existing software packages. Some of these are taken from good examples found in the real world; others are taken from fixes needed to install recalcitrant packages. The guilty will remain nameless.
Much of this paper will look at the issues of what
makes for good installability.
Once this is complete,
we will turn to how those issues affect the development
of the program itself.
To be installable, a package must meet many criteria.
Some are related directly to installation, while
others cover the reconfiguration and use of the package
after installation.
During the installation process, a well-designed
package
After installation, a good package
In summary, a good package has minimum impact on
the rest of the system when installed.
In actual practice, the total changes can be reduced to the amount of
disk space consumed and the appearance of a single file
in the location of a site's standard executables.
These are not cast in stone; there may be reasons
to violate one or another of the above that are package-related.
For example, numerous word processors
have one-way conversion of old file formats to new formats.
This should not, however, mean that users of the
old form must upgrade if there is no need for sharing
files.
The benefits of increased installability are many.
The most important for the supplier are:
In addition, the professional programmer is concerned with the quality of all work, not simply the
source code.
When building a package, the installation
and cost of ownership are as important an issue as correct performance.
The professional should be as concerned about these items as about user interface or any
other developmental issue.[3]
By making all system changes in a single area, one
can minimize the effect of a change on a given system.
This has several additional benefits, which we will
discuss as well.
The ideal package consists of a single publicly
installed executable which references an area owned
only by that package.
In that way the system is minimally impacted by the additional software, and the
chance of packages interfering with each other is
reduced to almost zero.
With WhizzyWimp, it turns out we have a package
which consists of a number of separate executables.
There is WhizzyWimp itself, the spelling checker, the
PostScript[4] translation package, the index generator,
etc, etc.
All of these need to be installed in some
area where WhizzyWimp can find them, but we want to
avoid installing them in the public areas.
What we do is make WhizzyWimp itself begin with a
shell script which references a configuration file.
This particular model is based on the C News install
[Collyer87], but it may well predate such use.
The
WhizzyWimp script is shown in Figure 1.
Note that 90% of this file is error checking.
Nothing is ever used
before its existence is checked, and decent error messages are produced.
It's a reasonable example of
defensive shell script writing, with an overriding
effort to describe the error condition rather than
simply letting (non-)execution take its course.
Note the copious use of double quotes and curly braces.
The double quotes in the if tests ensure that
uninitialized variables (whether from oversight or typographical errors)
will not cause the test to abort.
The curly braces are defensive programming so that
future modifications are less likely to affect the
resilience of the script.
The double quotes in the
bodies of the error messages will capture errors from
uninitialized variables and from variables which
contain spaces or other whitespace.
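The style described above can be illustrated with a minimal sketch. This is not the script from Figure 1; the default configuration path and the executable name whizzywimp.bin are hypothetical, and the logic is wrapped in a function only so that it can be exercised without leaving the calling shell.

```sh
#!/bin/sh
# Sketch of a defensive front-end script. Not the paper's Figure 1.
# The default config path and the name "whizzywimp.bin" are hypothetical.

# Honor a caller-supplied location, otherwise fall back to a site default.
if [ -z "${WHIZZYWIMPCONFIG}" ]; then
    WHIZZYWIMPCONFIG=/usr/local/lib/whizzywimp/config
fi

run_whizzywimp() {
    # Check the configuration file before using it.
    if [ ! -f "${WHIZZYWIMPCONFIG}" ]; then
        echo "whizzywimp: configuration file '${WHIZZYWIMPCONFIG}' not found" >&2
        return 1
    fi
    if [ ! -r "${WHIZZYWIMPCONFIG}" ]; then
        echo "whizzywimp: configuration file '${WHIZZYWIMPCONFIG}' is not readable" >&2
        return 1
    fi

    # Pull the settings into this shell; expected to define WHIZZYWIMP_HOME.
    . "${WHIZZYWIMPCONFIG}"

    if [ -z "${WHIZZYWIMP_HOME}" ]; then
        echo "whizzywimp: '${WHIZZYWIMPCONFIG}' did not set WHIZZYWIMP_HOME" >&2
        return 1
    fi
    if [ ! -d "${WHIZZYWIMP_HOME}" ]; then
        echo "whizzywimp: directory '${WHIZZYWIMP_HOME}' does not exist" >&2
        return 1
    fi
    if [ ! -x "${WHIZZYWIMP_HOME}/whizzywimp.bin" ]; then
        echo "whizzywimp: '${WHIZZYWIMP_HOME}/whizzywimp.bin' is missing or not executable" >&2
        return 1
    fi

    # Everything checks out; hand over to the real program.
    "${WHIZZYWIMP_HOME}/whizzywimp.bin" "$@"
}
```

An installed script would end with a call such as run_whizzywimp "$@", or would carry the same checks inline and exit instead of returning.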
There are other methods which can be used to test
definition of shell variables.
In the C News configuration we see constructs like
The most critical portion is the line
The configuration file shown in Figure 2 follows similar paranoid principles.
Note that it also error checks for the existence
of every file and produces not simply an error message
about what is missing,
but a message about whether it is the standard or a
locally defined alternate which is missing.
The configuration file is also executable on its own,
so it may be tested without having to invoke the actual application.
All subsidiary programs and files which WhizzyWimp
requires should be installed in
Since the invocation of WhizzyWimp manipulates the
environment, it might be tempting to modify the user's PATH
in the configuration file.
Don't!
This can result in ugly and subtle errors.
WhizzyWimp executables, both the master and any slaves, should reference
the environment variables to build paths for any subsidiary
executable they invoke.
This will avoid problems of name collision.
This method has the additional benefit of allowing
installation of multiple versions of WhizzyWimp.
Each version gets its own tree under the master WhizzyWimp tree.
Individual users can simply put the appropriate
version number in their environment, and the right thing will happen.
This allows the system manager to install
a new version of WhizzyWimp without disturbing old versions.
At some future date, the manager decides
the new version is stable and deletes the old version
or makes the directory inaccessible.
Users of the old version will get a useful error message when this occurs.
This also simplifies uninstallability, which we'll return to later.
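A sketch of how a configuration file might pick the version directory follows. The variable names WHIZZYWIMP_VERSION and WHIZZYWIMP_DEFAULT_VERSION and the one-directory-per-version layout are assumptions made for illustration; the paper does not name them.

```sh
# Sketch only. WHIZZYWIMP_VERSION, WHIZZYWIMP_DEFAULT_VERSION and the
# layout ${WHIZZYWIMP_HOME}/<version> are assumptions, not names from the paper.
whizzywimp_version_dir() {
    # A user's own choice wins; otherwise use the site default.
    wv_version="${WHIZZYWIMP_VERSION}"
    if [ -z "${wv_version}" ]; then
        wv_version="${WHIZZYWIMP_DEFAULT_VERSION}"
    fi
    if [ -z "${wv_version}" ]; then
        echo "whizzywimp: no version requested and no site default configured" >&2
        return 1
    fi

    wv_dir="${WHIZZYWIMP_HOME}/${wv_version}"
    if [ ! -d "${wv_dir}" ]; then
        echo "whizzywimp: version '${wv_version}' is not installed in '${WHIZZYWIMP_HOME}'" >&2
        return 1
    fi
    if [ ! -x "${wv_dir}" ]; then
        echo "whizzywimp: version '${wv_version}' has been withdrawn; '${wv_dir}' is not accessible" >&2
        return 1
    fi
    echo "${wv_dir}"
}
```

With this arrangement, deleting or closing off an old version directory produces a specific message for its remaining users rather than a bare "not found" from the shell.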
We have reduced the number of changes visible to the average user
to one - the installation of the master WhizzyWimp script
in some (any!) normally searched public area.
We have not had to modify a user's PATH or
This method permits the sophisticated administrator
to take very direct control of the installation location of WhizzyWimp.
The default installation
method will use the default directories as indicated,
the expert method will permit the local administrator
to make major changes without affecting the basic run
style of WhizzyWimp.
One complicating factor of this method is that it
requires that the install script build the shell script on
the fly.
This is a long-solved problem.
The reader is
referred to the C-News [Collyer87] or INN source
[Salz92], or any Cygnus install package for examples.
Two tools of particular note are the Cygnus configuration utility
[Pixley92],
[Pixley93],
[Cygnus93]
and Larry Wall's MetaConfig [Wall].
When using shell scripts for configuration, one
wants to avoid needless repetition of configuration.
Since WhizzyWimp has a standalone spelling utility, we
would like a reliable method of determining if the configuration has already been done.
The simplest method is to create an environment
variable which is used exclusively for that purpose and
check it at the beginning of the configuration file or
before doing the invocation of it.
Thus one would
change the beginning of the config file to
In most situations, modifications to user setup
files are both unneeded and dangerous.
This tends to
be the area where most sites do extensive customization, and it is the area where conflicts are most
likely to occur[5].
The already described method of a standard configuration file with a conditional invocation applies
equally well to personal initialization files like
We have already shown that it is not necessary to
modify users' personal setup files.
Unfortunately, sometimes a package requires changes to system boot files
or configuration files.
This can lead to some interesting problems.
Curiously, the better-managed the site, the more likely the problem.
Consider the following two real-world examples.
At the Industrial Technology Institute, we carefully placed all system configuration files under control of RCS.
This has the same benefits for system
management that it does for source code control.
It
also was a time bomb for one particular package which
added two lines to
A similar problem occurred at sites which run file
integrity checking packages for security
or rdist[Cooper92] for distributed file management.
In the first case,
the integrity package began reporting security violations
on the
The situation is further complicated when multiple
packages are installed in quick succession.
Package A modifies a file and renames it to file.bak.
Package B modifies the same file and renames it to file.bak.
The original file is now lost.
There are some simple methods for dealing with this.
The install package should come with checksums
for all system files which it intends to modify.
(Of course, this assumes the install has been checked
in advance on all systems for which you are selling it.)
A pristine system should be obtained and checksums computed
for all files which will be modified.
At install time, those files should be checksummed.
If there is no match, it's extremely likely that either
the file has changed or the install is being done on a
new type of system.
In either case, the install should
halt and report the problem.
Blindly modifying already changed files is never acceptable.
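The check described here might be sketched as follows, using the POSIX cksum utility. The choice of cksum and the function name are assumptions; any checksum program available on every target system would serve.

```sh
# Sketch: refuse to touch a system file unless it matches the checksum
# recorded on a pristine system. "expected" is the "CRC SIZE" pair printed
# by: cksum < file
file_is_pristine() {
    fp_file="$1"
    fp_expected="$2"

    if [ ! -f "${fp_file}" ]; then
        echo "install: '${fp_file}' does not exist on this system" >&2
        return 2
    fi
    fp_actual=$(cksum < "${fp_file}") || return 2
    if [ "${fp_actual}" != "${fp_expected}" ]; then
        echo "install: '${fp_file}' differs from the version this install was tested against; not modifying it" >&2
        return 1
    fi
    return 0
}
```

The install script would call this for each file it intends to change and halt with a report if any call fails.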
In addition, the scope of the changes should be strongly restricted.
Adding ports or daemons to
File permissions should be checked before making changes.
RCS and other similar systems make the controlled file unwritable.
Running as root, most packages simply ignore writability and blast away.
Check first!
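One subtlety is that a script running as root cannot rely on the shell's -w test, which usually succeeds for root whatever the permission bits say. A sketch of a check that looks at the mode itself follows; the function name is hypothetical.

```sh
# Succeeds only if the owner write bit is set on the file.
# [ -w file ] is not enough here: for root it usually succeeds
# regardless of the mode.
owner_write_bit_set() {
    ow_listing=$(ls -ld "$1" 2>/dev/null) || return 2
    case "${ow_listing}" in
        ??w*) return 0 ;;   # e.g. -rw-r--r--
        *)    return 1 ;;   # e.g. -r--r--r--, as left by an RCS check-in
    esac
}

# Possible use in an install script:
#   if ! owner_write_bit_set /etc/inetd.conf; then
#       echo "install: /etc/inetd.conf is read-only; not modifying it" >&2
#   fi
```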
If
An error in an
The failures largely fall into two groups: bugs in
shell commands inserted into the
System V
Berkeley-based systems are not so lucky.
A number of changes have been proposed, including
[Nieusma],
[Romig91],
[Simmons91].
The reader should consult [Romig91]
in particular for good suggestions.
While the install package
cannot make wholesale changes to the
In general, do not take the existing style or
methods you see in the vendor-supplied
For the installer, the choice is a somewhat simpler one.
Use the System V style of a number of small,
individual
Once again, we have defensive shell programming
with checking for a file of the correct type.
Some particular points to note:
This last point, appropriate comments with begin
and end markers, also applies to any other system
tables which are modified.
However, one should not
assume those markers will still exist at uninstall
time.
One uninstall script looked for the begin
marker, found it, and deleted everything from there to
the end marker or end of file, whichever came first.
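A more cautious uninstall step might look like the sketch below. It removes the marked block only when both markers appear exactly once and in the right order, and it writes the result to standard output rather than editing the file in place. The marker strings are hypothetical.

```sh
# Sketch: print the file without the marked block, or fail and print
# nothing if the markers are not exactly as the install left them.
remove_marked_block() {
    rb_file="$1"
    rb_begin='# BEGIN WhizzyWimp'
    rb_end='# END WhizzyWimp'

    # First pass: both markers present once, begin before end.
    awk -v b="${rb_begin}" -v e="${rb_end}" '
        $0 == b { nb++ }
        $0 == e { ne++; if (nb != 1) bad = 1 }
        END     { exit ((nb == 1 && ne == 1 && !bad) ? 0 : 1) }
    ' "${rb_file}" || {
        echo "uninstall: markers in '${rb_file}' are missing or damaged; file left untouched" >&2
        return 1
    }

    # Second pass: copy everything outside the markers.
    awk -v b="${rb_begin}" -v e="${rb_end}" '
        $0 == b { skip = 1; next }
        $0 == e { skip = 0; next }
        !skip   { print }
    ' "${rb_file}"
}
```

The caller can compare the output with the current file and install it only after review.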
Many installation scripts assume the hardware or
software installed resides on the same machine as the
install media or the execution of the install script.
This is often not the case.
It is not at all unusual
for a system's /usr partition to be NFS-mounted read-only from one system, for /usr/local to be read-only
from another, and for the installation media to be attached to a third system.
These become formidable problems to which there is no general solution.
The
best one can do is design the package in such a way
that installation can be performed in a series of steps
appropriate to the various environments.
Extracting data from remote media should be possible without requiring remote root access.
An un-privileged account should be able to extract file sets using
tar, cpio, or whatever is appropriate.
The files containing the extracted sets can then be made read-accessible across the network to the install program.
There are four primary areas where files may need
to be installed:
Modifying each of these disk areas may require accessing the install media from four different systems,
sometimes with and without root access.
The only way to successfully deal with the problem is to break the
install process internally into four steps which can be
performed individually by the appropriate system managers if needed.
Thus there would typically be a separate
set in the install media for each of the areas the
install modifies.
Fortunately, installing device drivers onto diskless dataless workstations
is an extreme case.
On the other hand, it doesn't have to happen often to prevent
sales of the particular hardware or software.
The install software should be designed to detect which (if
any) of these situations apply and generate the appropriate messages.
A relatively small amount of effort
results in an install which seems no different in the
standard case but handles the extremes well.
A good installation will include complete documentation of all changes to the system.
This documentation should cover files added, files changed, and preserved files.
Many packages include a Manifest file which lists
all files installed.
This is useful but should go much
further.
The manifest file should list all files,
their ownerships and group memberships, permission
modes, and a checksum of all files which should be
invariant.
Ideally there would also be a script
included which the administrator could run at any time
to determine if any files have changed and how, and
note missing and added files.
A number of existing
installation methods (Digital Equipment, UNIX V, proposed POSIX standard) come quite close.
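A verification script of the kind suggested above might start from the sketch below. It assumes a hypothetical manifest format of one "CRC SIZE PATH" line per file, which is what cksum prints when given file names. It reports missing and changed files only; ownership, modes, and added files would need further checks.

```sh
# Sketch: compare installed files against a manifest written by
#     cksum file1 file2 ... > MANIFEST
# Prints one line per problem and returns non-zero if any were found.
verify_manifest() {
    vm_manifest="$1"
    vm_status=0

    if [ ! -r "${vm_manifest}" ]; then
        echo "verify: cannot read manifest '${vm_manifest}'" >&2
        return 2
    fi
    while read -r vm_crc vm_size vm_path; do
        if [ ! -f "${vm_path}" ]; then
            echo "missing: ${vm_path}"
            vm_status=1
            continue
        fi
        vm_actual=$(cksum < "${vm_path}")
        if [ "${vm_actual}" != "${vm_crc} ${vm_size}" ]; then
            echo "changed: ${vm_path}"
            vm_status=1
        fi
    done < "${vm_manifest}"
    return ${vm_status}
}
```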
When system files are changed, three particular
items should be documented.
First, the reason for the
change should be noted.
This should be done in a
ReadMe.InstallNotes file.
The file should also include
comments describing the manifest, the uninstall method,
and the next two items.
Second, a copy of the original unmodified file
should be kept.
Great caution should be used here! It
is not sufficient to copy /etc/rc.local to
Finally, a patch should be generated which will
undo the change.
As subsequent package installs and
system modifications are made the patch will no longer
work in its literal form.
However, the patch file can
easily be modified to change the line numbers and can
become a tool to allow the experienced administrator to
either undo the changes made or re-apply the changes if
lost.
The typical inexperienced administrator or end
user will never need or see the ReadMe files.
Their purpose is not for day to day use, but for dealing with
errors.
The experienced administrator will look for it
immediately if a problem is found; and the inexperienced will eventually stumble over it.
A properly designed software package should permit
the simultaneous installation of multiple versions.
This is often temporarily necessary for the conversion
period between versions, and sometimes is needed indefinitely for the support of legacy systems.
One of
WhizzyWimp's inspirations is a classic example of this.
A read-only version of WhizzyWimp is available to be
packaged with other software.
This allows the package
vendor to include very nice on-line documentation, but
leads to interesting complications.
The package vendor
may support products on systems (e.g., Sun 3s running
SunOS 3.5) where up-to-date versions of WhizzyWimp will
no longer run.
This is fairly easy to deal with at the
end user site, but a nightmare for the package vendor.
The use of the WhizzyWimp version settings in the
configuration files is one way to handle the problem.
The package vendor has multiple versions of WhizzyWimp
installed, each neatly isolated into directories named
after the version.
Most users do not define a WhizzyWimp version number in their environment, and hence
always get the latest version.
Another form of multiple simultaneous installation
comes from having multiple processor and/or operating
system types in a network.
Here there are two methods,
both of which work well.
In both cases, one first isolates the executables for a given OS/processor into
appropriately named directories.
For the first case, one can add a test to the configuration file
to dynamically determine which OS and processor is in use
and create the appropriate pathname.
This complicates the configuration file somewhat,
but is amenable to simple analysis and test.
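The test might be sketched with uname(1) as shown below. The directory layout ${WHIZZYWIMP_HOME}/bin/OS-CPU is an assumption for illustration.

```sh
# Sketch: build the name of the per-platform executable directory.
# The layout ${WHIZZYWIMP_HOME}/bin/<os>-<cpu> is hypothetical.
platform_bin_dir() {
    pd_os=$(uname -s) || return 1
    pd_cpu=$(uname -m) || return 1
    pd_dir="${WHIZZYWIMP_HOME}/bin/${pd_os}-${pd_cpu}"

    if [ ! -d "${pd_dir}" ]; then
        echo "whizzywimp: no executables installed for ${pd_os} on ${pd_cpu} (looked in '${pd_dir}')" >&2
        return 1
    fi
    echo "${pd_dir}"
}
```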
The other is more subtle, and is taken from the FrameMaker installation.
In this method, all scripts are symbolic links to a script called
As shipped by Frame, the
For a product to be uninstallable, there must be
careful tracking of the location of all installed files
and all modifications made to other system files.
Here is where the manifest files, dated backup files and
patch files come into their own.
The uninstall script should not blindly copy back
the modified system files, nor should it ignore them.
It should apply the patch file to a copy of the saved
copy (thereby reversing the changes) and compare the
result to the current installed copy.
If they differ,
there have been subsequent changes to the system file.
These changes would be lost if the old versions of the
system file were simply copied back.
Instead, a message should be generated by the uninstall script stating that the files are restored to their pre-install
version, and directing the uninstaller to the directory
containing the patch and install ReadMe file.
Another caution for uninstallation is the presence
of multiple versions.
If one is simply removing a now-outdated version, it is not a good idea to delete the
master shell script or the socket numbers from
/etc/services.
Note, however, that it is perfectly safe to delete
the executables and library files in the directory and
the specialized
Inevitably, someone will eventually notice the
message and investigate.
For ease in that investigation, the uninstall process should write a message in
the directory with the ReadMe file informing future
administrators of "pending" changes to the files.
The security issues which have already been discussed
focused on the issues of modifying files in a distributed network.
Unfortunately these are the easy issues.
Many packages require the use of a dedicated ID
for file ownership and set user ID programs.
This is a perfectly acceptable practice, but great caution should
be used in the implementation.
One cannot rely on a given user login id or UID
number to be available on any system.
Name conflicts[9] are inevitable
and can happen on systems of any size.
If your product is popular enough, you will eventually
find one.
A number of large sites are already facing
exhaustion of UID space [Doster90]
(as few as 32765 on some systems), so choosing a fixed UID number will
inevitably involve a collision as well.
The only general solution is to allow the product
login id to be modified by the site manager at install
time and modified again later if needed.
This modification would be made in the system install files. The
running executable would then do the appropriate getpwnam(3) calls rather than depending on compiled-in
parameters of any type.
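For scripts in the package, a shell analogue of the getpwnam(3) lookup is id(1). The sketch below resolves a configurable login name at run time; the variable WHIZZYWIMP_USER and the default name whizzy are hypothetical.

```sh
# Sketch: look up the package's UID from a site-configurable login name
# instead of assuming a fixed number.
package_uid() {
    pu_name="${WHIZZYWIMP_USER:-whizzy}"
    if pu_uid=$(id -u "${pu_name}" 2>/dev/null); then
        echo "${pu_uid}"
    else
        echo "whizzywimp: login id '${pu_name}' does not exist on this system" >&2
        return 1
    fi
}
```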
With a user id defined for the package, one can do
a great many useful things with suid programs which
will not compromise system security.
Unfortunately these require careful programming.
A full discussion of this is outside of the scope of this paper.
[Simmons90] examines many of these issues;
a careful examination of the C News or INN source will yield useful examples.
The lpr system is a classic bad example.
Many packages attempt to work around the issue by
installing suid root programs.
This should never be
done on a wholesale basis.
As system administrators
and sites in general become more concerned about
security issues, sites will simply refuse to install
such programs as unacceptable risks.
Other packages take a different tack - they make
all files globally writable, then depend on internal
equivalents of access control lists to manage data
integrity.
Typically such packages have been ported
from non-UNIX systems to UNIX and the developers were
either ignorant of how UNIX file and suid systems
worked or were not given sufficient time to do the correct work.
[10]
Adding IDs to a system is a surprisingly difficult task.
At a minimum the installation script should ask
before doing it; it is far preferable to have the
administrator set up the id first.
As a final note on IDs, everything which has been
said about user ids applies to group ids as well.
Some administrators will not blindly execute any
script as root.
Make your scripts straightforward,
whenever possible using standard utilities.
Many of the problems discussed can be laid at the
feet of programs which have been ported from non-UNIX
systems.
These programs and their operation usually
have many fundamental assumptions about how the operating system works.
When the port effort starts, the
developers and administrators take advantage of UNIX's
customizability and make their UNIX systems look as
much as possible like their original systems.
This
does minimize their own difficulties and seems to
reduce the slope of the learning curve of a new OS.
In
reality, they have not gotten up the curve at all.
They have merely delayed encountering it until the
package is ready to install at a UNIX-savvy site.
At
that point the curve becomes a brick wall and the product falls apart.
The whole world is not UNIX, or VMS, or MVS.
The
techniques which have been discussed here are specific
to UNIX, but their underlying principles are general.
We are now ready to articulate them and how they relate
to software development.
A general interface for getting locations of files
and the values of settable items should be developed.
A generalized getsetting() function should be written so that
the high-level interface is independent of the low-level implementation.
Such an interface works equally
well for UNIX environment variables, VMS logicals, data
from flat files, or compiled-in tables.
The underlying
mechanism is unimportant.
What is critical is an organized method of retrieving the data from a well-managed
store rather than compiling values into multiple locations in a program.
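As an illustration, a shell version of such an interface might look like the sketch below. It consults the shell environment first and then a flat file of NAME=value lines. The file named by WHIZZYWIMP_SETTINGS is a hypothetical store, and the lookup order is an assumption.

```sh
# Sketch of a getsetting-style lookup. Callers never learn where the
# value came from.
getsetting() {
    # Accept only plain identifiers, since the name is passed to eval below.
    case "$1" in
        ''|[!A-Za-z_]*|*[!A-Za-z0-9_]*)
            echo "getsetting: invalid setting name '$1'" >&2
            return 2 ;;
    esac

    # 1. A variable already set in the environment takes precedence.
    eval "gs_set=\${$1+yes}; gs_value=\${$1-}"
    if [ -n "${gs_set}" ]; then
        printf '%s\n' "${gs_value}"
        return 0
    fi

    # 2. Otherwise fall back to the flat file, if there is one.
    if [ -r "${WHIZZYWIMP_SETTINGS}" ]; then
        awk -v key="$1" '
            index($0, key "=") == 1 { print substr($0, length(key) + 2); found = 1; exit }
            END { exit (found ? 0 : 1) }
        ' "${WHIZZYWIMP_SETTINGS}" && return 0
    fi

    echo "getsetting: no value for '$1'" >&2
    return 1
}
```

Replacing the flat file with another store would change only the body of this function, not its callers.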
Once obtained, data should not be trusted until
tested.
Just as the configuration script checks to see
if given directories exist, the initialization portion
of the program should check the environment retrieved
by getsetting() against the actual system environment.
Far too many programs halt with the simple message
"could not open library".
This is inadequate.
The
error message should indicate which library, the name
of the file, and the type of error (permission denied,
missing file, etc).
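A sketch of a check that produces such messages follows. The function name and the message wording are illustrative only.

```sh
# Sketch: verify a file the program depends on, and say exactly what is
# wrong with it. $1 describes the item, $2 is its path.
check_library() {
    cl_what="$1"
    cl_path="$2"

    if [ ! -e "${cl_path}" ]; then
        echo "whizzywimp: ${cl_what}: file '${cl_path}' does not exist" >&2
        return 1
    fi
    if [ ! -f "${cl_path}" ]; then
        echo "whizzywimp: ${cl_what}: '${cl_path}' is not a regular file" >&2
        return 1
    fi
    if [ ! -r "${cl_path}" ]; then
        echo "whizzywimp: ${cl_what}: permission denied reading '${cl_path}'" >&2
        return 1
    fi
    return 0
}
```

A call such as check_library "spelling dictionary" "${WHIZZYWIMP_HOME}/lib/words" names the library, the file, and the kind of failure in a single line.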
The documentation provided must be two-fold.
First, standard documentation must be available for the
administrator who is attempting to debug a possibly
defective product installation.
Without knowledge of
what a proper installation is, the administrator can
never be sure that the installation is correct.
Second, the installation process must provide some
dynamic tracking of what it does and preserve that
tracking in a reasonable location.
Messages printed at
install time are useful, but must be supplemented with
logs, message files, and sometimes even recordings of
the install process.
Two of the most important principles of good programming are information hiding and well-defined interfaces with disciplined use.
These apply equally well
to the installation of software.
Follow them for the
installation, and the resulting package will be
improved.
The previously referenced POSIX standard [Archer93]
is a must.
FTPable drafts are available from dcdmjw.fnal.gov
in the directory /posix/1387.2.
As of this writing [1994], the 12th draft is now available.
An informal Software Installation Workshop was conducted by
Paul Anderson at the 1992 Large Installation System Administration Conference.
Notes from the workshop are in [Anderson93], and a mailing list formed
as a result can be reached at soft-managers-request@nas.nasa.gov.
[Anderson93]: Paul Anderson,
Software Installation On Large Systems,
;login:, March/April 1993, Volume 18, No. 2.
[Archer93]: Barrie Archer,
Towards a POSIX Standard for Software Administration,
Proceedings of the Large Installation Systems Administration Conference, 1993.
[Collyer87]: Geoff Collyer and Henry Spencer,
News Need not be Slow,
Winter USENIX Conference, 1987.
[Cooper92]: Michael A. Cooper,
Overhauling Rdist for the '90s,
Proceedings of the Large Installation Systems
Administration Conference, 1992.
[Cygnus93]: David MacKenzie, Roland McGrath, and Noah Friedman,
Autoconf: Generating Automatic Configuration Scripts,
Cygnus Support documentation.
[Doster90]: William A. Doster, Yew-Hong Leong, and Steven J Mattson,
Uniqname Overview,
Proceedings of the Large Installation Systems Administration Conference,
1990.
[Nieusma]:
Posting of rewritten Sun
[Pixley92]: K. Richard Pixley,
On Configuring Development Tools,
Cygnus Support documentation.
[Pixley93]: K. Richard Pixley,
Cygnus Configure,
Cygnus Support documentation.
[Romig91]: Steve Romig,
Some Useful Changes for Boot RC Files,
Proceedings of the Large Installation Systems Administration Conference,
1991.
[Salz92]: Rich Salz,
InterNetNews: Usenet transport for Internet sites,
Summer USENIX Conference, 1992.
[Simmons90]: Steve Simmons,
Life Without Root,
Proceedings of the Large Installation Systems Administration Conference, 1990.
[Simmons91]: Posting of rewritten Sun
[Wall]: Larry Wall, metaconfig(1) manual page, dist2 package,
posted in comp.sources.unix, Volume 16, 1988.
What Is Installability
Localizing Changes With A Shell Script
NEWSCTL=${NEWSCTL-/usr/lib/news}
While this is both correct and portable, it requires
the reader to know shell variable construction rules fairly intimately.
The if
construction is immediately
obvious to anyone with the slightest experience in programming,
and hence preferable.
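The two forms can be compared side by side in a short sketch. One detail worth knowing is that the ${VAR-default} form substitutes the default only when the variable is unset, while the if test shown here also catches a variable that is set but empty.

```sh
# Form 1: parameter expansion, as in the C News line above.
unset NEWSCTL
NEWSCTL=${NEWSCTL-/usr/lib/news}
echo "was unset: '${NEWSCTL}'"      # the default is used

NEWSCTL=""
NEWSCTL=${NEWSCTL-/usr/lib/news}
echo "was empty: '${NEWSCTL}'"      # still empty: set-but-empty is left alone

# Form 2: the explicit test, which treats unset and empty alike.
if [ -z "${NEWSCTL}" ]; then
    NEWSCTL=/usr/lib/news
fi
echo "after test: '${NEWSCTL}'"
```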
. ${WHIZZYWIMPCONFIG}
This is the invocation of the configuration script itself.
By using the .
directive, we can make changes
in the local shell script environment.
As we add more and more programs to the WhizzyWimp package,
each of them refers to this configure script rather than
embedding the definitions individually in the programs.
When a system change is needed, a single change to the
configuration file updates all programs immediately.
${WHIZZYWIMP_HOME}
.
The structure under this tree can be as complex as desired.
As long as all shell scripts begin by invoking the
${WHIZZYWIMPCONFIG}
file and
all child processes are started from there,
the tree can be placed anywhere.
Any time system installation requires moving the tree,
the system manager can do so without fear of breaking WhizzyWimp.
.cshrc
or .login
files - always a risky proposition
anyway, as we have no reliable way of predicting what
shell a user uses or the method by which it is invoked
on a given system.
WHIZZY_WIMP_CONFIGURED=0
and a check for WHIZZY_WIMP_CONFIGURED
would be done
before invoking the config file.
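A sketch of the check follows. It assumes the configuration file begins by setting and exporting WHIZZY_WIMP_CONFIGURED as described; the function name is hypothetical.

```sh
# The configuration file would begin with:
#     WHIZZY_WIMP_CONFIGURED=0
#     export WHIZZY_WIMP_CONFIGURED
#
# Each WhizzyWimp script then reads the file only if the marker is absent.
load_config_once() {
    if [ -n "${WHIZZY_WIMP_CONFIGURED}" ]; then
        return 0        # a parent script has already done the work
    fi
    if [ ! -r "${WHIZZYWIMPCONFIG}" ]; then
        echo "whizzywimp: cannot read configuration file '${WHIZZYWIMPCONFIG}'" >&2
        return 1
    fi
    . "${WHIZZYWIMPCONFIG}"
}
```

Because the marker is exported, a standalone utility started from the main program inherits it and skips the configuration step.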
Modification of User Setup Files
.cshrc
, .login
, .profile
, etc.
Given the configuration
method we have shown above, it should not be necessary.
But if it is needed, do it right.
Modification of Existing System Files
/etc/services
and a `paragraph' to
/etc/rc.local
.
When the system manager went to modify one of those files,
he checked out new copies from the RCS archive,
made the changes, and installed the modified file,
deleting the changes the install had made.
Bringing back the install-time changes required
restoring files from tape - not a favorite task of any administrator -
and then adding them into the RCS archive.
The problem occurred several times with
several packages before it was diagnosed;
now the managers routinely do rcsdiff
even on what appear
to be pristine files before checking out new copies for modification.
rc
files as soon as the package was installed.
Recovery involved rebuilding the integrity database.
In the second, the rdist update undid the install changes.
Unlike the situation at ITI,
the changed file was not backed up and hence not recoverable.
A new install of the product was required.
inetd.conf
, services
,
and other simple tables is fairly simple and hard to do wrong.
The various rc
files are much more complex,
and are discussed in a separate section below.
inetd.conf
is not writable,
the system manager is telling you something.
RC Files
rc
file can make a system unbootable.
In spite of this danger, a number of
installation methods are quite cavalier about how they
modify one or more of the rc
files.
rc
file, and excessive
changes to the file[6].
rc
files avoid the latter by isolating
the changes into a directory of rc
subfiles which are
executed in an easily-specified order.
The master rc
file is careful about how it executes those files,
and thus is resilient in the face of failure.
rc
files, whatever
changes are made should be done in accordance with the
suggestions for good quality rc
files.
rc
as examples
of how to do things.
The quality of standard rc
files
is low.
rc
files which are invoked by one of the
master rc
files.
In the master, make a minimal change
as shown in Figure 3.
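Figure 3 is not reproduced here. A sketch in the same spirit follows; the subfile name /etc/rc.whizzywimp is hypothetical, and the check is wrapped in a function only so that it can be tried against a scratch file.

```sh
# Sketch of a minimal, failure-tolerant hook for a master rc file.
run_rc_subfile() {
    if [ -f "$1" ]; then
        # Run in a separate shell so that an error or "exit" inside the
        # subfile cannot stop the master rc file.
        sh "$1" || echo "rc: '$1' failed; continuing with boot"
    else
        echo "rc: '$1' not found; skipping"
    fi
    return 0
}

# In the master rc file, between begin and end comment markers:
#   run_rc_subfile /etc/rc.whizzywimp
```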
Diskless, Dataless and Diskful Nodes
Documentation Of Changes
Dealing with multiple simultaneous installations
.wrapper
[8]
The .wrapper
script dynamically determines OS and processor type,
and invokes an executable from the appropriate directory.
The name of the executable is taken from $0
in the environment, almost always the same as the symbolic link to
.wrapper
.
.wrapper
script is functional
for FrameMaker but is not general enough to be
used by most other packages.
While it could be extended,
there seems to be no significant benefit to using it over
the simple configuration script method.
Uninstallability
rc
file which is invoked by the master
rc file.
The modification done to the master rc
file
checks for the existence of the specialized rc
file,
prints an appropriate message, and continues with system boot.
System Security
Porting From Non-UNIX Systems
Localization of Product Information
Verification of Configuration
Documentation
Conclusion
Other Commentary
References
rc
files to comp.sys.sun.
rc
files to
comp.sys.sun and Ultrix rc
files to comp.unix.ultrix.
Footnotes
1. UNIX is a trademark of X/Open. Today, anyway.
2. What You See Is What I Mostly Programmed.
3. And packages which are easy to install and maintain
often result in gifts of beer at conferences such as this.
4. PostScript is a trademark of Adobe, Inc.
5. One CAD/CAM package which will remain unnamed
requires over 300 lines of addition to
.cshrc
and .login
files
and includes multiple(!) lines like set path = ( a b c )
,
utterly destroying the user's carefully constructed path.
Worse, the installing site consisted of ksh
users.
Worst, the lines to be added contained several bugs.
6. Although I have seen one install which simply mangled the rc
file.
7. Some configurations literally rebuild the root
partition at every boot. In this case installation of
software which requires modification of files in /etc
is best left to the manual intervention of the site
managers.
8. The Frame .wrapper script cannot be reproduced here
due to copyright restrictions.
9. Around 1982 the Los Angeles phone book contained a
listing for the Ingres family, and the last name Root
has caused interesting issues on some systems.
10. There is one popular mainframe accounting package
which has been ported to UNIX which does exactly this.
It has now been two years since the problem was reported
and they have failed to fix it.