Using Computers in Linguistics: A Practical Guide

The Unix™ Language Family

Online Appendix:
Filter Languages

On Usenet:
comp.lang.awk
comp.lang.perl.*

Man pages:
sed awk nawk gawk

Downloadable Software:
HAD's AWK Page
Information on awk
DOS awks
Availability of Awk and Perl (from Henry Churchyard)

Tutorials and help:
How to use sed
awk Resources
Chapter 11, on awk, of UNIX in a Nutshell from O'Reilly
GNU Awk User's Guide
Coping With awk
How to use awk
Guide to awk
Introduction to Awk
The Perl Institute

sed, awk, and perl

sed, awk, and perl are among the Unix utilities that implement regular expressions; they are used mostly for tasks requiring pattern matching and substitution.

They are widely used for data manipulation, searching, and general programming. Although they were originally developed for Unix and are integrated into it, they have since been ported to most other computing environments, including PCs.

sed is a stream editor: it follows commands just like an interactive editor, but it is designed to run in batch mode, performing repetitive search-and-replace commands untouched by human hand. It works line by line, matching and rewriting strings of characters within each line, and is thus more useful for phonological manipulation than for large-scale textual analysis. It is cryptic, though no more so than, say, Turkish Vowel Harmony.
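
As a rough illustration of the kind of phonological manipulation sed is suited to, the following sketch applies a toy rule of word-final devoicing to a list with one word per line. The rule set, the function name devoice, and the sample words are assumptions invented for this example; the only sed features used are the standard s/pattern/replacement/ command and the -e option.

```shell
#!/bin/sh
# Toy word-final devoicing: b -> p, d -> t, g -> k at the end of each line.
# Assumes one word per line on standard input; 'devoice' is a hypothetical name.
devoice() {
    sed -e 's/b$/p/' -e 's/d$/t/' -e 's/g$/k/'
}

# Example run on three made-up forms.
printf 'rad\nlob\ntag\n' | devoice
```

Run as written, this should print rat, lop, and tak, one per line. Because $ anchors each pattern to the end of the line, only the final consonant of a line is affected; a real rule set would need to handle word boundaries inside running text.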

awk (named after its authors: Aho, Weinberger, and Kernighan) is a text-oriented pattern-matching language that is at its best and most powerful when coping with large amounts of moderately structured data. For instance, one can perform text analysis on Usenet posts using awk. It is less cryptic than sed, and works at the level of words rather than characters. It can do anything that sed can, but sed is faster and simpler for what it does. awk exists in several dialects, including nawk ('new awk'), with a richer command set, and gawk ('GNU awk'), part of the GNU operating system from the Free Software Foundation.
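
To give a sense of what working at the word level looks like, here is a minimal sketch of a word-frequency count in awk. It relies on awk's automatic splitting of each input line into whitespace-separated fields ($1 through $NF). The function name wordfreq and the sample sentences are assumptions for illustration, and tolower assumes a 'new awk' dialect such as nawk or gawk.

```shell
#!/bin/sh
# Count how often each whitespace-separated word occurs on standard input.
# 'wordfreq' is a hypothetical name; words are lowercased, punctuation is kept.
wordfreq() {
    awk '{ for (i = 1; i <= NF; i++) count[tolower($i)]++ }
         END { for (w in count) print count[w], w }' |
    LC_ALL=C sort -k1,1nr -k2,2   # most frequent first, ties alphabetical
}

# Example run on two made-up lines.
printf 'the cat saw the dog\nThe dog ran\n' | wordfreq
```

Run as written, this should list "3 the" first, then "2 dog", then cat, ran, and saw with a count of 1 each. The sort step is there only because awk does not guarantee any particular order when looping over an array.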

Both awk and sed exist on every Unix system; consult the local man pages for details of your specific implementation. They are also available for most microcomputer systems, including DOS. Both are line-oriented, and both have limitations, despite their utility.

Perl, by contrast, is a full-featured programming language designed for handling text; it will do everything sed and awk can, and plenty more besides. The script that runs The Chomskybot is written in Perl, and so are many of the CGI scripts that drive search engines and other Web programs.


Last change September 27, 1999       John Lawler