Frequency Counts is a Perl program I use to get a quick frequency table from a column of data values without the hassle of running a huge statistical package like SAS or SPSS. It answers the question "how many times does each unique value in columns x through y occur in this file?"

The input file is assumed to have the data of interest in the same column range on every line. The data may be numeric, character, or a mix of both. The output is reminiscent of that produced by SAS PROC FREQ: the individual values in increasing order, with a frequency count and a cumulative frequency count for each value.
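The counting step is easy to sketch in Perl. The following is a minimal illustration, not the program's actual source; the function name and the demo data are my own stand-ins:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch of the core of the program: tally the substring
# found in a fixed 1-based column range on each line, then report each
# value with its frequency and a running cumulative frequency.
sub frequency_table {
    my ($first_col, $last_col, @lines) = @_;
    my %count;
    for my $line (@lines) {
        chomp $line;
        next if length($line) < $last_col;    # skip short records
        $count{ substr($line, $first_col - 1, $last_col - $first_col + 1) }++;
    }
    my $cum = 0;
    my @rows;
    for my $value (sort keys %count) {        # character sort; -n would sort numerically
        $cum += $count{$value};
        push @rows, [ $value, $count{$value}, $cum ];
    }
    return @rows;
}

# Demo on three short records, counting column 1 only.
for my $row (frequency_table(1, 1, "0a", "1b", "0c")) {
    printf "%7s %10d %11d\n", @$row;
}
```

A hash keyed on the extracted value does all the counting in one pass; the cumulative column falls out of a running total during the sorted print loop.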


The command line is: [-h] [-n] -c#[-#] filename, where -c#-# gives the starting and ending column numbers of the variable, -c# indicates a single-column variable at column #, and -n selects a numeric sort of the output instead of the default character sort.
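Parsing the -c argument amounts to matching either a range or a single column number. A sketch of that step (hypothetical, not the program's own code):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical parse of a -c argument: "-c#-#" gives a range,
# "-c#" a single column (start and end coincide).
sub parse_columns {
    my ($arg) = @_;
    if ($arg =~ /^-c(\d+)(?:-(\d+))?$/) {
        return ($1, defined $2 ? $2 : $1);
    }
    die "usage: [-h] [-n] -c#[-#] filename\n";
}

my ($start, $end) = parse_columns("-c210");
print "columns $start-$end\n";    # prints: columns 210-210
```

Treating a single -c# as the degenerate range #-# is what lets the header in the example below read "columns 210-210".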

As an example, the command -c210 data.file might produce output like:

(pts/1):~> -c210 data.file

        Page  1
        Frequencies for the values in columns 210-210
        in the file "data.file"

                                                 Cumulative
                              Value   Frequency   Frequency
                             -------  ---------  ----------
                                   0       2214        2214
                                   1       1009        3223
                                   2        533        3756
                                   9      15721       19477

If there had been many values to print, the program would have produced a paged listing, each page carrying the page number and header information as in the table above. In testing, this program was usually faster than a shell script that uses sort, cut, uniq, and awk to produce similar output without the cumulative frequencies (though, to be fair, my program does not handle records per case).
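For comparison, a pipeline along the lines of the one described would look something like this (the file name and column are illustrative stand-ins; note it yields counts but no cumulative frequencies):

```shell
# Build a tiny sample file, then count unique values in column 1:
# cut out the column, sort, count runs with uniq -c, and reorder
# the fields with awk so the value comes first.
printf '0a\n1b\n0c\n' > data.file
cut -c1 data.file | sort | uniq -c | awk '{ printf "%s %d\n", $2, $1 }'
# prints:
# 0 2
# 1 1
```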