FAQ for STAT-L/SCI.STAT.CONSULT, last modified on December 13, 2002

Compiled by Steve Simon, ssimon@cmh.edu.

"A further knowledge of facts is necessary before I would venture to give a final and definite opinion." Sherlock Holmes in The Adventure of Wisteria Lodge.

This FAQ is posted once a month to STAT-L/SCI.STAT.CONSULT. The FAQ now has two home pages:

http://www-personal.umich.edu/~dronis/statfaq.htm and


Variations and earlier versions of the FAQ can be found on other sites on the web. You are welcome to post all or part of this FAQ at your web site. Please don't modify it without my permission, and please let me know where you are posting it.

Note: I use uppercase for certain items like e-mail addresses and listserver commands to help highlight them. You can, however, use upper or lower case (or even mixed case) for any of these items.

Table of contents

2 What are other related listserv/usenet groups?
3 How do I know that my message got posted?
4 How do I use LISTSERV to...
5 How do I get the archives of STAT-L/SCI.STAT.CONSULT?
6 Why have I stopped seeing messages?
7 How can I contact the ASA, Biometric Society, or IMS?
8 How can I contact the major statistics software vendors?
9 Where can I find free/shareware statistical software?
10 What statistics resources can be found on the web?
11 What should I do about these "Spams"?
12 What are some of the problems with stepwise regression?
13 What is the answer to the Monty Hall, Envelope, or Birthday problem?
14 Can someone provide me with references and/or books about [topic]?
15 Can you recommend a good statistics software package?
16 Acknowledgments


STAT-L and SCI.STAT.CONSULT are a combined LISTSERV/USENET group for the discussion of statistical consulting issues. Through the magic of Internet, any message posted on SCI.STAT.CONSULT also appears on STAT-L. Any message posted on STAT-L appears on SCI.STAT.CONSULT. So you can follow all the fascinating questions and answers using either system.

We discuss statistical issues of all levels of difficulty, as well as statistical education, the practice of statistical consulting, and other related topics. We also like to debate some of the more controversial issues in Statistics like the validity of the statistical models used in the Bell Curve book and the pitfalls of stepwise regression models.

Be sure to put your name and e-mail address at the end of your message. Some people have e-mail systems that strip headers from a message, making it impossible for them to reply directly to you.

If you have a question about a particular statistics package, you will probably get a faster and more accurate answer by posting the question on the list that specializes in a particular package (e.g., SAS-L/COMP.SOFT-SYS.SAS or S-NEWS). Refer to the section "How can I contact the major statistics software vendors?"

We appreciate questions at all levels from beginner to expert. Sometimes, the beginner questions lead to some interesting discussions as to the subtle nuances in statistical consulting. If you want advice on how to analyze some data, please include some context as to what your data means and what you are trying to investigate. No one can answer a question well that only says "Listed below is some data. How do I analyze it?"

Be careful about advice on STAT-L/SCI.STAT.CONSULT. You'll find many people who are glad to help you, but you must realize the serious limitations of e-mail. There is no adequate substitute for getting advice face-to-face with a professional, especially BEFORE collecting any data and BEFORE performing any experiments. Even the most experienced and wise Statisticians will be unable to make sense out of a poorly designed study.

There are three types of messages that we discourage. First, try to avoid any overly commercial pitches, including posting your resume. On the other hand, we do like to hear about job openings, especially ones that list starting salaries so we can bemoan how little we make on our current jobs. Postings of upcoming conferences are also acceptable.

Second, don't post your homework questions on here, even if you have permission to do so from your teacher. If you're looking for help on a thesis or dissertation, make sure that your advisor is aware that you are seeking outside help.

Third, while we enjoy a spirited debate, please refrain from flaming and personal attacks. Although we have occasional lapses, this list has a generally high level of civility and politeness. Let's keep it that way.

Here's some additional advice from Richard Ulrich for SCI.STAT.CONSULT folks.

If you are going to CROSS-POST to several groups, PLEASE send just one message in which you LIST THE SEVERAL GROUPS in the header. i) That way, when someone writes a response, it will show up in EACH group where the question could be read, not just in one. ii) That way, when a person reads with a Threaded-newsreader, he will see your message just ONCE, instead of over and over.

2 What are other related LISTSERV/USENET groups?

There are two nice web resources for Statistics related lists:

http://www.ukc.ac.uk/php/mff/netres/statlist.html and

http://www.stattransfer.com/lists.html is a nice site sponsored by Circle Systems, makers of Stat/Transfer software. It uses web forms to allow you to subscribe and unsubscribe to STAT-L and a lot of other lists.

SPECIAL WARNING!!! Please, please, please note that subscription requests go to the LISTSERV or MAILBASE address. If you send a subscription request to the list itself, it will be read by hundreds or thousands of people, none of whom can get you subscribed. Some of these people will be annoyed enough at your naivete that they will introduce you to a concept known as "flaming".

ALBERT-GIFI -- The Albert Gifi mailing list discusses correspondence analysis, multidimensional scaling, nonlinear multivariate analysis, and optimal scaling

How to subscribe: subscribe ALBERT-GIFI First-name Last-name

ALLSTAT -- Discussions on this list are similar to STAT-L/SCI.STAT.CONSULT, but there is a decidedly British flavor to ALLSTAT and a more U.S. flavor to STAT-L/SCI.STAT.CONSULT. This is particularly noticeable in the postings of meetings. ALLSTAT is a Mailbase system so it uses a slightly different syntax than the LISTSERV system.

How to subscribe: join ALLSTAT First-name Last-name
Post messages to: ALLSTAT@MAILBASE.AC.UK
Web info and FAQ: http://www.stats.gla.ac.uk/allstat/

Note: Contrary to previous information in this FAQ, you must include your name when subscribing. "Subscribe" can be substituted for "join," however. Here are some additional comments from Dr. Stuart Young, the list owner.

Note also, that while Allstat does indeed have a "UK flavour" it is not a discussion list. It is a "broadcast system" for distributing notices. Discussions are not encouraged on the list - replies go to the sender, not to the list.

CRSP-L -- Help With Center for Research in Security Prices (CRSP) Data Bases.

Subscriptions to: LISTSERV@TAMVM1.TAMU.EDU
How to subscribe: sub CRSP-L First-name Last-name
Post messages to: CRSP-L@TAMVM1.TAMU.EDU
Web info and FAQ: http://www-leland.stanford.edu/class/gsb/crsp/CRSP-L/

EDSTAT-L/SCI.STAT.EDU -- Statistics training and education issues.

How to subscribe: subscribe EDSTAT-L Firstname Lastname
Post messages to: EDSTAT-L@JSE.STAT.NCSU.EDU

MULTILEVEL -- This list is for people using multilevel analysis (multilevel modeling; hierarchical data analysis) and any associated software (e.g. MLn, HLM, VARCL, GENMOD). MULTILEVEL is a MAILBASE system so it uses a slightly different syntax than the LISTSERV system.

How to subscribe: subscribe MULTILEVEL first-name last-name

PSYCHOMETRICS -- A new listserv has been established for graduate students to discuss theoretical and applied issues in psychometrics. Faculty and research scientists are, of course, welcome to listen and offer insight.

Subscriptions to: majordomo@lists.stanford.edu
How to subscribe: subscribe PSYCHOMETRICS
Post messages to: psychometrics@lists.stanford.edu

SCI.STAT.MATH -- A more mathematical flavor can be found on this newsgroup, which, sad to say, is not mirrored to any LISTSERVer.

SEMNET -- SEMNET is an open forum for ideas and questions about the methodology that includes analysis of covariance structures, path analysis, and confirmatory factor analysis.

Subscriptions to: LISTSERV@UA1VM.UA.EDU
How to subscribe: sub SEMNET first-name last-name
Post messages to: SEMNET@UA1VM.UA.EDU
Web info and FAQ: http://www.gsu.edu/~mkteer/semfaq.html

STEPS -- an e-mail discussion list for users of the STEPS (STatistics Education through Problem Solving) statistical software.

Subscriptions to: mailbase@mailbase.ac.uk
How to subscribe: join STEPS first-name last-name
Post messages to: steps@mailbase.ac.uk
Web info and FAQ: http://www.stats.gla.ac.uk/steps/

TEACHING-STATISTICS -- This list is for those concerned with the initial teaching of statistics in all phases of education. It will relate to the objectives of the journal Teaching Statistics and the associated Trust, and will also enable discussion of how to make teaching and learning statistics more effective.

Subscriptions to: mailbase@mailbase.ac.uk
How to subscribe: join teaching-statistics first-name last-name
Post messages to: teaching-statistics@mailbase.ac.uk
Web info and FAQ: http://www.mailbase.ac.uk/lists/teaching-statistics/

3 How do I know that my message got posted?

First of all, be patient. It takes a while for your message to be posted. Internet is faster than the Post Office, but it isn't always instantaneous. There's nothing more annoying than seeing the same messages posted again and again in a half hour time period by people who are unsure whether their messages got through. Please wait half a day or more before panicking.

Second, if you are having trouble posting, it is more likely than not a local problem. Check with your help desk or other local resource.

Third, no matter where you post your message from, if the message gets through, it will be added to two very nice USENET archives, AltaVista and DejaNews. Search for your message using the subject line or a reasonably unique phrase in the message itself. This system is not instantaneous. Wait half a day or more before searching for your message. See the section "How do I get the archives of STAT-L/SCI.STAT.CONSULT?" for the web address and other details about AltaVista and DejaNews.

Fourth, if you are using SCI.STAT.CONSULT, then you will eventually see a copy of your message, if it got posted. There are special USENET groups where you can practice sending test messages (MISC.TEST or ALT.TEST). If you are a beginner, don't post to SCI.STAT.CONSULT until after you are comfortable posting to one of these test groups.

You will also see your message if you receive the digest from STAT-L.

If you receive individual messages rather than the digest from STAT-L, you will not see your own message when it is posted. The presumption is that you read it when you wrote it, so why would you want to see it again?

You can change this default in two ways. Send an e-mail to LISTSERV@VM1.MCGILL.CA with a one line message: SET STAT-L REPRO to inform STAT-L that you wish it to send you back a copy of any message you send in. Send a one line message: SET STAT-L ACK to inform STAT-L that you wish it to send a brief acknowledgment that your message has been sent to the list. Finally, send a one line message: SET STAT-L NOREPRO if you want to go back to the default. Please note that all of these commands go to LISTSERV and not to STAT-L.
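For readers who would rather script these requests, here is a minimal Python sketch of the one-line command messages described above. It only builds the message and does not send anything. The function name build_listserv_command and the example sender address are made up for illustration; the LISTSERV address and the SET commands are the ones given in this section.

```python
from email.message import EmailMessage

# Command address as given in this section. Commands go here,
# never to the STAT-L list address itself.
LISTSERV_ADDRESS = "LISTSERV@VM1.MCGILL.CA"


def build_listserv_command(sender: str, command: str) -> EmailMessage:
    """Return an unsent message carrying a single LISTSERV command.

    Hypothetical helper for illustration; it is not part of any
    LISTSERV software.
    """
    command = command.strip()
    if not command or "\n" in command:
        raise ValueError("expected a single, non-empty command line")
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = LISTSERV_ADDRESS
    # The FAQ asks for the command in the body of the message.
    msg.set_content(command)
    return msg


if __name__ == "__main__":
    # Ask for a copy of your own postings, then show the draft.
    draft = build_listserv_command("you@example.org", "SET STAT-L REPRO")
    print(draft)
```

The same helper would cover SET STAT-L ACK and SET STAT-L NOREPRO by changing the command string. Hard-coding the recipient is deliberate: it makes it difficult to mail a command to the list by mistake.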

Finally, please note that not every question posted on STAT-L/SCI.STAT.CONSULT gets an answer. No one is getting paid for their time, so you need to appeal to their curiosity or their altruism. If no one answered your question, maybe you need to ask the question differently?

4 How do I use LISTSERV to...

Before I discuss subscribing, changing digest options, etc., you should be aware of some resources that can help you with these problems.

There are two good web resources, the first specific to LISTSERV and the second a more general introduction that considers other systems such as mailbase:



If you are intimidated by sending commands to a listserver, check out


mentioned in section 2, which is a nice web resource for subscribing and unsubscribing to STAT-L and a lot of other lists.

Specific information about STAT-L is available at


...subscribe to STAT-L?

If you are using SCI.STAT.CONSULT, your USENET reader software should have a menu pick or a command that will allow you to subscribe to SCI.STAT.CONSULT. Every reader is different, so please consult your help file or your local computer guru.

To subscribe to STAT-L, send a message to LISTSERV@LISTS.MCGILL.CA with a single line: SUB STAT-L First-name Last-name in the body of the text. Please be sure that you send the message to LISTSERV@LISTS.MCGILL.CA and not to STAT-L@LISTS.MCGILL.CA. If you send your subscription request to STAT-L, hundreds of people will see your message and none of them will be able to subscribe you to the list. Some in fact will flame you for not reading these instructions more carefully.

It's sort of like a newspaper which has a circulation desk and a letters-to-the-editor desk. If you want to start delivery of the paper you send it to the circulation desk. If you want to start delivery of STAT-L, you send the request to LISTSERV. Sending a subscription request to STAT-L is like sending a letter to the editor that reads "Please start delivery of the Sunday paper to 1313 Mockingbird Lane."

...get the digest option turned on/off?

If you have no strong preference, the digest option (multiple messages compiled into a single mailing, usually daily) is less burdensome on Internet and creates fewer bounced messages for the list administrator to deal with. The default when you sign up is for the digest option.

To cancel digest format and to receive the list as separate mailings, send the command SET STAT-L MAIL to LISTSERV@LISTS.MCGILL.CA.

To receive the list in digest format, send the command SET STAT-L DIGEST in the body of a message to LISTSERV@LISTS.MCGILL.CA. Again, please be sure that you send all of these types of messages to LISTSERV@LISTS.MCGILL.CA and not to STAT-L@LISTS.MCGILL.CA.

...obtain a list of subscribers to STAT-L?

Send the command REVIEW STAT-L F=MAIL to LISTSERV@LISTS.MCGILL.CA. Send the command REVIEW STAT-L BY NAME F=MAIL to sort by name or REVIEW STAT-L BY COUNTRY F=MAIL to sort by country.

This list does not include subscribers to SCI.STAT.CONSULT, as they do not subscribe to the list the same way. I know of no way to obtain the list of subscribers to SCI.STAT.CONSULT.

...keep my name off of the list of subscribers?

Send a message to LISTSERV@LISTS.MCGILL.CA with the line SET STAT-L CONCEAL YES in the body of the message.

To reverse this, send the command SET STAT-L CONCEAL NO in the body of the message.

...stop mail from STAT-L (temporarily or permanently)?

Send a message to LISTSERV@LISTS.MCGILL.CA (again, please don't send the message to STAT-L@LISTS.MCGILL.CA).

To signoff permanently, include the line UNSUBSCRIBE STAT-L in the body of the message.

To temporarily suspend mail, use the line SET STAT-L NOMAIL and when you are ready to resume reading, use the line SET STAT-L MAIL or SET STAT-L DIGEST depending on your preference for individual messages versus a daily digest.

What if my initial signoff command doesn't work?

This happens sometimes, particularly if your e-mail address changes, even slightly. The key thing to remember here is that only the list owner can help you with this. Sending a message to STAT-L will not help much unless the list owner happens to be following STAT-L right at that moment.

I would recommend that you get a list of subscribers and see how your e-mail address looks to the system (see above for details). Some mail systems (like ELM) allow you to change the FROM field of a message. If your mail system supports this, then try sending a message to LISTSERV and change the FROM field so it looks like it came from the original address. You could also ask your system administrator to create a temporary (or permanent) alias name for you for outbound messages (including the necessary deviant domain part).

If none of the above works, or if it seems too complicated, don't panic. Every list has a human owner who can go in and unsubscribe you manually. You can find the e-mail address of the list owner on the same list of subscribers that you just got (again, see above). When I last checked in August 1995, the list owner was

* OWNER= MICHAEL@LISTS.MCGILL.CA (Michael Walsh, McGill University)
* (514-398-3680)

Send a message directly to the list owner, explaining your problem. The list owner will manually unsubscribe you from STAT-L. Most lists now have the convention that listname-REQUEST@hostname and OWNER-listname@hostname will be sent to the owner of the list. So for our list, you could send a message to STAT-L-REQUEST@LISTS.MCGILL.CA or OWNER-STAT-L@LISTS.MCGILL.CA to resolve any problem where intervention of the list owner is needed.

5 How do I get the archives of STAT-L/SCI.STAT.CONSULT?

There are three ways to get archives of STAT-L/SCI.STAT.CONSULT. First, the LISTSERV software for STAT-L maintains monthly archive files back to 1994. Send the command INDEX STAT-L to LISTSERV@VM1.MCGILL.CA to obtain a listing of these file names. Send the command GET filename filetype F=MAIL to receive a specific archive file.

You can also search the archives for keywords, but the syntax is a throwback to mainframe days. Here's an example of how to find statistics humor in previous postings. Send the following message to LISTSERV@VM1.MCGILL.CA (not to STAT-L!)

// JOB Echo=No Database Search DD=Rules
//Rules DD *
Search jokes in stat-l Index

This will get you the following output:

Database STAT-L, 11 hits.
Index
Item # Date Time Recs Subject
------ ---- ---- ---- -------
002264 94/05/12 20:47 57 Re: anyone know a good stats joke...
002346 94/05/16 12:42 24 Re: heard any good stats jokes?
002352 94/05/12 16:42 29 Re: anyone know a good stats joke...
002374 94/05/17 00:39 34 Re: anyone know a good stats joke...
002387 94/05/17 17:16 30 Re: anyone know a good stats joke...
004886 94/10/11 09:36 49 Re: The charge of epistemological naivete
005643 94/11/07 17:45 59 Re: Political Correctness vs. Offensive topics of +
005664 94/11/08 11:32 36 Re: Political Correctness vs. Offensive topics of +
008101 95/03/02 14:58 116 us government censorship to the internet?
009133 95/04/18 04:56 90 --NEED HELP WITH EVALUATION--
021605 96/12/23 10:04 48 Re: Farms (STAT-L 21 Dec 1996)

Obviously only some of these are successful hits. For example, any message with the word "epistemological" in the title can't be humorous. Send to LISTSERV@VM1.MCGILL.CA the following syntax to get the text of specific messages:

// JOB Echo=No Database Search DD=Rules
//Rules DD *
Search jokes in stat-l
Print all of 2264 2346 2352 2374 2387
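If you run these searches often, the job text can be generated rather than typed. The sketch below is only an illustration: build_search_job is a made-up helper, and it simply reproduces the lines shown in the two examples above (the index form when no item numbers are given, the print form otherwise). It makes no claim about other LISTSERV search syntax.

```python
def build_search_job(keyword, list_name, items=None):
    """Return the body text of a LISTSERV database search job.

    Hypothetical helper that mirrors the two examples in this FAQ.
    With no item numbers it asks for an index of hits; with item
    numbers it asks for the text of those messages.
    """
    lines = [
        "// JOB Echo=No Database Search DD=Rules",
        "//Rules DD *",
    ]
    if items:
        lines.append(f"Search {keyword} in {list_name}")
        lines.append("Print all of " + " ".join(str(n) for n in items))
    else:
        # Same form as the first example above.
        lines.append(f"Search {keyword} in {list_name} Index")
    return "\n".join(lines)


if __name__ == "__main__":
    # First pass: list the hits. Second pass: fetch chosen items.
    print(build_search_job("jokes", "stat-l"))
    print()
    print(build_search_job("jokes", "stat-l", [2264, 2346, 2352]))
```

As with every other command in this FAQ, the resulting text goes in the body of a message to the LISTSERV address, not to STAT-L.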

Send the command GET LISTDB MEMO F=MAIL to LISTSERV@UGA.CC.UGA.EDU to get a full description of LISTSERV search functions (note that LISTSERV.VM1.MCGILL.CA does not have this file).

gopher://jse.stat.ncsu.edu/11/othergroups/statl/ is a gopher site that contains the archives of STAT-L. If you are still using gopher software, point it to jse.stat.ncsu.edu. This site has archives going back to 1990. In case you were curious, there were 21 messages posted for the whole month of January 1990. Volume has picked up a bit since then.

http://www.reference.com also maintains an archive of STAT-L, other lists, USENET groups, and web discussion groups. I'm not sure how far back this archive goes.

Finally, archives of USENET messages, including messages for SCI.STAT.CONSULT are maintained at two sites, http://altavista.digital.com which apparently only goes back a month or so, and http://www.dejanews.com going back to March 19, 1995. Follow the instructions at either site for restricting your search to just one newsgroup.

Some people may wish to prevent their postings from being added to these databases. If your posting contains an X-Header looking like x-no-archive: yes or if you place x-no-archive: yes as the first line of the body text of your message, then your message will not be archived.
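Here is a small Python sketch of the two ways of marking a posting described above: as a header, or as the first line of the body. The function name mark_no_archive is invented for this example, and whether a given archive honors the marker is up to that archive, not to this code.

```python
from email.message import EmailMessage


def mark_no_archive(msg: EmailMessage, in_body: bool = False) -> EmailMessage:
    """Add the x-no-archive marker to a plain-text message.

    Hypothetical helper. By default the marker is added as a header;
    with in_body=True it becomes the first line of the body instead.
    """
    if in_body:
        original = msg.get_content()
        msg.set_content("x-no-archive: yes\n" + original)
    else:
        msg["X-No-Archive"] = "yes"
    return msg


if __name__ == "__main__":
    post = EmailMessage()
    post["Subject"] = "Question about stepwise regression"
    post.set_content("Please do not archive this posting.")
    print(mark_no_archive(post))
```

The body form is handy when your mail or news software does not let you add headers of your own.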

6 Why have I stopped seeing messages?

Nine times out of ten, the problem is at your site. If you aren't already good friends with the people who administer your Internet connection, now is a good time to start. These people will know when the connection is running smoothly and when it is erratic.

Posting a test message to STAT-L/SCI.STAT.CONSULT is not likely to help. If you aren't seeing normal traffic, what makes you think that you will see your test message? Also, the people who read your test message are not in a position to diagnose your problem. Only your new found friends who run your local Internet connection are in a position to diagnose your problem.

Your first step is to check one of the USENET archives described above (Altavista or Dejanews). If you see messages in either archive that are more than 48 hours old and which you have not received at your local site (via either SCI.STAT.CONSULT or STAT-L), then you have a real problem.

There are some obvious self-diagnostic questions you should ask yourself. For STAT-L readers, ask yourself if you have received mail from other Internet sources. If not, then perhaps the problem is bigger than STAT-L. Also for STAT-L readers, find out if your site has been bouncing back e-mail recently. The number one cause for not getting STAT-L mail is that the list administrator noticed a bunch of bounced e-mail error messages and has de-activated your subscription.

To find out if you've been deactivated, send a message to LISTSERV@VM1.MCGILL.CA with QUERY STAT-L in the body of the message. Please make sure you send this to the LISTSERV address and not the STAT-L address. Within a few hours, you should get a reply showing your status. If you don't get a response, that's a good sign that the listserver is down, which would mean that nobody is getting messages from STAT-L. If you do get a response, here's what it might look like.

Distribution options for Steve Simon <ssimon@CMH.EDU>, list STAT-L: Ack= No, Mail= Digests, Files= Yes, Repro= No, Header= Short(BSMTP), Renewal= Yes, Conceal= No

If your account was de-activated, the response will be

You are not subscribed to the STAT-L list.

or your distribution option will be set to NOMAIL. In either case, work with your local Internet experts to fix the problem and then either re-subscribe or set the distribution option back to MAIL.
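If you want to check the reply by program, the following Python sketch pulls the option names and values out of a reply shaped like the example above. The function name parse_query_reply is hypothetical, and the parsing assumes the reply looks like the two samples quoted in this section; real replies may differ.

```python
import re


def parse_query_reply(reply):
    """Parse a QUERY reply into a dict of distribution options.

    Hypothetical helper. Returns None when the reply says you are
    not subscribed; otherwise returns a dict such as {"Mail": "Digests"}.
    """
    text = " ".join(reply.split())
    if "not subscribed" in text.lower():
        return None
    # Options look like "Name= Value", separated by commas.
    pairs = re.findall(r"(\w+)=\s*([^,]+)", text)
    return {name: value.strip() for name, value in pairs}


if __name__ == "__main__":
    sample = (
        "Distribution options for Steve Simon <ssimon@CMH.EDU>, "
        "list STAT-L: Ack= No, Mail= Digests, Files= Yes, Repro= No, "
        "Header= Short(BSMTP), Renewal= Yes, Conceal= No"
    )
    options = parse_query_reply(sample)
    print(options["Mail"])
    print(parse_query_reply("You are not subscribed to the STAT-L list."))
```

A result of None, or a Mail value you did not choose yourself, would be the cue to talk to your local Internet experts before re-subscribing.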

By the way, don't complain to the list owner for de-activating your account. The typical listowner has to sort through hundreds or thousands of bounced message reports weekly, and the only way to stop these bounced message reports is to de-activate accounts. The people who you need to talk to are your new found friends who maintain your Internet access.

Failure to receive messages is less common for SCI.STAT.CONSULT readers. If you are experiencing problems, the obvious thing to look for is whether any of the newsgroups are getting through. If nothing is getting through, then you have a local problem. If you get postings from other newsgroups, then perhaps your server has decided not to carry SCI.STAT.CONSULT anymore. Either way, you have to talk to your local Internet experts.

7 How can I contact the ASA, Biometric Society, or IMS?

American Statistical Association
1429 Duke St.
Alexandria, VA 22314-3402
Tel: 703-684-1221
FAX: 703-684-2036
E-M: asainfo@amstat.org
Web: http://www.amstat.org

The International Biometric Society
808 17th Street, NW, Suite 200
Washington, DC 20006-3910
Tel: 202-223-9669
FAX: 202-223-9569
E-M: 75703.1407@compuserve.com
Web: http://www.stat.uga.edu/~lynne/symposium/biometric.html

Institute of Mathematical Statistics
3401 Investment Boulevard, Suite 7
Hayward, CA 94545
Tel: 510-783-8141 (Hazel Lowery)
FAX: 510-783-4131
E-M: HLLIMS@stat.berkeley.edu
Web: http://www.imstat.org

8 How can I contact the major statistics software vendors?

http://www.statistics.com/vendors.html, a web site maintained by Resampling Stats, Inc. has a very nice list of statistics software vendor information.

http://www.gsm.uci.edu/~joelwest/MacStats/ is a site for statistics software specific to the Macintosh.

Many of these companies have numerous locations and international distributors. I have only listed corporate headquarters to save space. If you can, check out the web site to get more detailed information. Also please bear in mind that mergers and other business activity may quickly make parts of this list obsolete.

Finally, I need to repeat my earlier plea about listservers. Please, please, please note that subscription requests go to the LISTSERV or MAILBASE or MAJORDOMO address.

Aptech Systems, Inc. (GAUSS)
23804 SE Kent-Kangley Road
Maple Valley, WA 98038 USA
Tel: 206-432-7855
FAX: 206-432-7832
Web: http://www.aptech.com/
E-M: support@aptech.com (support) info@aptech.com (sales information)

GAUSS mailing list --
How to subscribe: subscribe GAUSSIANS


Automatic Forecasting Systems, Inc. (Autobox)
PO Box 563
Hatboro, PA 19040
Tel: 215 675-0652
Fax: 215 672-2534
Web: http://www.autobox.com


Circle Systems, Inc. (Stat/Transfer)
1001 Fourth Avenue, Suite 3200
Seattle, WA 98154
Tel: 206-682-3783
Fax: 206-328-4788
Web: http://www.stattransfer.com
E-M: stat-transfer@circlesys.com (General Information) sales@circlesys.com (Sales) support@circlesys.com (Customer Support)


Civilized Software, Inc. (MLAB)
8120 Woodmont Ave. #250
Bethesda, MD 20815 USA
Tel: 1-301-652-4714
Fax: 1-301-656-1069
Web: http://www.civilized.com
E-M: csi@civilized.com


Conceptual Software Inc. (DBMS/COPY)
9660 Hillcroft # 510
Houston, TX 77096
Tel: 713-721-4200
Fax: 713-721-4298
Web: http://www.conceptual.com/
E-M: eroberts@conceptual.com (General Information) eroberts@conceptual.com (Sales) hfeldman@conceptual.com (Customer Support)


Cytel Software Corporation (StatXact, LogXact, EaSt)
675 Massachusetts Ave.
Cambridge, MA 02139 USA
Tel: (617) 661-2011
Fax: (617) 661-4405
Web: http://www.cytel.com
E-M: sales@cytel.com


Data Description, Inc. (DATADESK)
Box 4555
Ithaca, NY 14853 USA
Tel: (607) 257-1000
FAX: (607) 257-4146
Web: http://www.datadesk.com/datadesk/
E-M: datadesk@datadesk.com


DataMost Corporation (STATMOST)
520 West 9460 South
Sandy, UT 84070 USA
Tel: (801) 255-5008
Fax: (801) 255-5009
Web: http://www.datamost.com
E-M: techsupp@datamost.com


Kovach Computing Services (SIMSTAT, XLSTAT, MVSP)
Web: http://www.kovcomp.co.uk/
E-M: info@kovcomp.co.uk

Also see Provalis Research


Manugistics, Inc. (Statgraphics)
2115 East Jefferson St.
Rockville, MD 20852
Tel: 800-592-0050
Web: http://www.statgraphics.com/
E-M: sgsales@manu.com (sales) training@manu.com (training)


MathSoft, Inc. (MATHCAD, S-plus)
101 Main Street
Cambridge, MA 02142 USA
Tel: 617 577-1017
Fax: 617 577-8829
Web: http://www.mathsoft.com
E-M: ideas@mathsoft.com (comments and suggestions) support@mathsoft.com (Support, US or Canada) help@mathsoft.com (Support outside US/Canada) sales-info@mathsoft.com (Sales, US or Canada) int-info@mathsoft.com (Sales outside US/Canada)

S-plus mailing list --
Subscriptions to: s-news-request@wubios.wustl.edu
How to subscribe: subscribe s-news
Post messages to: s-news@wubios.wustl.edu
web site: http://www.biostat.wustl.edu/s-news/


The MathWorks, Inc. (MATLAB)
24 Prime Park Way
Natick, MA 01760-1500 USA
Tel: (508) 653-1415
Fax: (508) 653-2997
Web: http://www.mathworks.com/home.html
E-M: info@mathworks.com (Sales, pricing, information) support@mathworks.com (Technical support) bugs@mathworks.com (Bug reports) suggest@mathworks.com (Product suggestions) service@mathworks.com (Service)


Minitab Inc.
3081 Enterprise Drive
State College, PA 16801 USA
Tel: 814 238-3280
Fax: 814 238-4383
Web: http://www.minitab.com
E-M: sales@minitab.com


Modern Microcomputers (MODSTAT)
7302 Kim Shelly Court,
Mechanicsville, VA 23111
Tel: 804 746-3882
Web: http://members.aol.com/rcknodt/pubpage.htm
E-M: RCKnodt@aol.com


NCSS Statistical Software (NCSS, PASS)
329 North 1000 East
Kaysville, Utah 84037 USA
Tel: (800) 898-6109 (801) 546-0445
Fax: (801) 546-3907
Web: http://www.ncss.com
E-M: ncss@ix.netcom.com


Palisade Corporation (@RISK)
31 Decker Road
Newfield, NY 14867 USA
Tel: 607-277-8000 800-432-7475
Fax: 607-277-8001
Web: http://www.palisade.com


Provalis Research
5000 Adam Street
Montreal, QC
Tel: 1-800-242-4775 (from overseas: 713-524-6394)
FAX: 713-524-6398
Web: http://www.simstat.com

Also see Kovach Computing Services.


Quantitative Psychology Software (ANOVA MultiMedia, Quantos Power, Central Limit Theorem, Partialing Techniques)
Web: http://psychology.iupui.edu/fb/
E-M: jrasmuss@iupui.edu


Resampling Stats
612 N. Jackson St.
Arlington, VA 22201 USA
Tel: 703-522-2713
Fax: 703-522-5846
Web: http://www.statistics.com
E-M: stats@resample.com learning@statistics.com


SAS Institute Inc. (JMP, SAS, Statview)
SAS Campus Drive
Cary, NC 27513 USA
Tel: 919 677-8000 919 677-8008 (JMP technical support) 919 677-8000, ext 5071 (JMP sales)
Fax: 919 677-8123
Web: http://www.sas.com
ftp: ftp://ftp.sas.com
E-M: corpcom@unx.sas.com (Corporate Communications) sasedu@vm.sas.com (Education) eurwww@mvs.sas.com (European Offices) pubs@unx.sas.com (Publications) software@sas.sas.com (Sales and Marketing) bussol@unx.sas.com (Business Solutions Division) sasblb2@vm.sas.com (jmp-sales)

On September 26, 1997, SAS Institute purchased Statview software from Abacus, Inc. Information about Statview can be found at the web site, http://www.statview.com.

JMP mailing list --
How to subscribe: subscribe JMP-L
Post messages to: JMP-L@WUBIOS.WUSTL.EDU

SAS mailing list --
Subscriptions to: LISTSERV@UGA.CC.UGA.EDU
How to subscribe: subscribe SAS-L First-name Last-name
Post messages to: SAS-L@UGA.CC.UGA.EDU

SAS Technical Support News --
Subscriptions to: LISTSERV@VM.SAS.COM
How to subscribe: subscribe TSNEWS-L First-name Last-name
Post messages to: Messages posted by SAS Institute only


E-M: 75450.3171@compuserve.com


SPSS Inc. (BMDP, SPSS, Systat)
444 North Michigan Avenue
Chicago IL 60611 USA
Tel: 312 329-3410 800 543-2185 312-494-3283 (SYSTAT Technical Support)
Fax: 312/329-3668
BBS: 312/836-1900 (8/N/1)
ftp: ftp.spss.com
E-M: support@spss.com
Web: http://www.spss.com

BMDP mailing list --
Subscriptions to: LISTSERV@VM1.MCGILL.CA
How to subscribe: sub BMDP-L Firstname Lastname
Post messages to: BMDP-L@VM1.MCGILL.CA

SPSS mailing list --
Subscriptions to: LISTSERV@UGA.CC.UGA.EDU
How to subscribe: sub SPSSX-L Firstname Lastname
Post messages to: SPSSX-L@UGA.CC.UGA.EDU

SYSTAT mailing list --
Subscriptions to: LISTSERV@SPSS.COM
How to subscribe: sub SYSTAT-L Firstname Lastname
Post messages to: SYSTAT-L@SPSS.COM


Stata Corporation
702 University Drive East
College Station, Texas 77840 USA
Tel: 409-696-4600 800-STATA-PC
Fax: 409-696-4601
Web: http://www.stata.com/
E-M: stata@stata.com

STATA mailing list --
Subscriptions to: majordomo@hsphsun2.harvard.edu
How to subscribe: subscribe STATALIST
Post messages to: STATALIST@hsphsun2.HARVARD.EDU


Statistical Sciences (see MathSoft)


Statistics and Epidemiology Research Corporation (EGRET)
Tel: 206-632-3014
FAX: 206-547-4140
E-M: rhm@ms.washington.edu
Apparently, EGRET has been purchased by Cytel Corporation.


StatSoft, Inc. (STATISTICA)
2300 East 14th Street
Tulsa, OK 74104-4442 USA
Tel: (918) 749-1119
Fax: (918) 749-2217
Web: http://www.statsoftinc.com
E-M: info@statsoftinc.com


Product Coordinator Statistical Software Center
Research Triangle Institute
3040 Cornwallis Road
Research Triangle Park NC 27709-2194 USA
Tel: (919) 541-6602
Fax: (919) 541-7431
Web: http://www.rti.org/patents/sudaan/sudaan.html
E-M: sudaan@rti.org


Web: http://www.unistat.com

Here is a list of software for experimental design, collated by Bob Wheeler.

RS/1 software - including RS/Discover (A general purpose statistics package with extensive experimental design and analysis capability.)
BBN Domain Corp.
150 Cambridge Park Dr.
Cambridge, MA 02140
Tel: 617-873-5000
Fax: 617-873-6153
E-M: jtsullivan@bbn.com
Web: http://www.bbndomain.com/


Design Ease & Design Expert software (Experimental design, analysis, and training.)
Stat-Ease, Inc.
2021 E. Hennepin Ave., Ste. 191
Minneapolis, MN 55413
Tel: 612-378-9449
Fax: 612-378-2152
E-M: 72103,1436@compuserve.com


ECHIP software (Experimental design, analysis and training for scientists and engineers.)
ECHIP, Incorporated
724 Yorklyn Road
Hockessin, DE 19707-8733
Tel: 302-239-5429
Fax: 302-239-6227
E-M: support@echip.com

9 Where can I find free/shareware statistical software?

Any search for free/shareware statistical software should start with Statlib. Other software is arranged alphabetically after the description of Statlib.

http://lib.stat.cmu.edu/ Statlib. Link last verified September 14, 2000. "Welcome to StatLib, a system for distributing statistical software, datasets, and information by electronic mail, FTP and WWW. Starting October 1st [2000], StatLib's URL will just be http://lib.stat.cmu.edu and not http://www.stat.cmu.edu which will be reserved for the URL of the Statistics Department at Carnegie Mellon."

www.mrc-bsu.cam.ac.uk/bugs/welcome.shtml BUGS. Link last verified September 14, 2000. "Bayesian inference Using Gibbs Sampling is a piece of computer software for the Bayesian analysis of complex statistical models using Markov chain Monte Carlo (MCMC) methods. It grew from a statistical research project at the MRC Biostatistics Unit, but now is developed jointly with the Imperial College School of Medicine at St Mary's, London. The Classic BUGS program uses text-based model description and a command-line interface, and versions are available for major computer platforms. A Windows version, WinBUGS, has an option of a graphical user interface and has on-line monitoring and convergence diagnostics. CODA is a suite of S-plus/R functions for convergence diagnostics. The programs are reasonably easy to use and come with a range of examples. Considerable caution is, however, needed in their use, since the software is not perfect and MCMC is inherently less robust than analytic statistical methods. There is no in-built protection against misuse."

ftp://plato.la.asu.edu/pub/donlp2 DONLP2. This is the ftp site for DONLP2, one of the few high-quality programs for general nonlinear programming problems available completely free over the net. DONLP2 has been updated recently. There are four different versions (in f77 or f2c/cc, each with exact or numerical differentiation). A separate file contains three papers as PostScript files, and the user's guide (the README files and donlp2doc.txt) was last updated on 6-24-96.

http://www.epidata.dk/ EpiData. Link last verified on March 7, 2002. EpiData is a comprehensive yet simple tool for documented data entry. Overall frequency tables (codebook) and a listing of the data are included, but there are no statistical analysis tools. EpiData is free and currently developed for Windows 95/98/NT/2000. (Works on PowerMac with an emulator.)

http://www.cdc.gov/publications.htm Epi-Info/Epi-Map. Link last verified September 14, 2000. "Epi Info. Public domain microcomputer programs for handling public health data. Epi Map. Displays data using geographic or other maps. Epi Meta. Performs meta analysis. DoEpi. A series of interactive exercises for teaching epidemiology computing."

http://GKing.Harvard.Edu Gary King's homepage. Link last verified September 14, 2000. This page includes a wide range of freeware/shareware authored or co-authored by Gary King. "ReLogit: Rare Events Logistic Regression -- for Stata or for Gauss; AMELIA: A Program for Missing Data -- for Windows or for Gauss ; CLARIFY: Software for Interpreting and Presenting Statistical Results (Stata macros); Gauss Procedures: A set of utilities and statistical procs, for those who program in Gauss; EI: A Program for Ecological Inference (requires Gauss); EzI: A(n Easy) Program for Ecological Inference; COUNT: A Program for Estimating Event Count and Duration Regressions; JudgeIt: A Program for Evaluating Electoral Systems and Redistricting Plans; Maxlik: A set of Gauss programs and datasets (annotated for pedagogical purposes) to implement many of the maximum likelihood-based models I discuss in Unifying Political Methodology: The Likelihood Theory of Statistical Inference; The Virtual Data Center Project: An operational, open-source, digital library to enable the sharing of quantitative research data, and the development of distributed virtual collections of data and documentation and the Geospatial Liboratory Project."

http://bevo.che.wisc.edu/octave/ GNU Octave. "GNU Octave is a high-level language, primarily intended for numerical computations. It provides a convenient command line interface for solving linear and nonlinear problems numerically, and for performing other numerical experiments using a language that is mostly compatible with Matlab. It may also be used as a batch-oriented language. Octave has extensive tools for solving common numerical linear algebra problems, finding the roots of nonlinear equations, integrating ordinary functions, manipulating polynomials, and integrating ordinary differential and differential-algebraic equations. It is easily extensible and customizable via user-defined functions written in Octave's own language, or using dynamically loaded modules written in C++, C, Fortran, or other languages. GNU Octave is also freely redistributable software. You may redistribute it and/or modify it under the terms of the GNU General Public License (GPL) as published by the Free Software Foundation. Octave was written by John W. Eaton and many others. Because Octave is free software you are encouraged to help make Octave more useful by writing and contributing additional functions for it, and by reporting any problems you may have."

http://www.psychologie.uni-trier.de:8000/projects/gpower.html G*Power. Link last verified September 14, 2000. "G*Power is a general power analysis program that comes in two essentially equivalent versions: one runs under the Macintosh OS and the other was designed for MS-DOS. G*Power performs high-precision statistical power analyses for the most common statistical tests in behavioral research, that is, t-tests (independent samples, correlations, and any other t-test), F-tests (ANOVAS, multiple correlation and regression, and any other F-test), and Chi^2-tests (goodness of fit and contingency tables). G*Power computes power values for given sample sizes, effect sizes, and alpha levels (post hoc power analyses), sample sizes for given effect sizes, alpha levels, and power values (a priori power analyses), and alpha and beta values for given sample sizes, effect sizes, and beta/alpha ratios (compromise power analyses). The program may be used to display graphically the relation between any two of the relevant variables and it offers the opportunity to compute the effect size measures from basic parameters defining the alternative hypothesis."

http://www.kovcomp.com/ Kovach Computing Services. Link last verified September 14, 2000. This company produces and/or distributes the following statistical software: "MVSP - a MultiVariate Statistical Package, SIMSTAT - General purpose statistical program, WordStat - Textual content analysis add-in for Simstat, Simstat-TSF - Time series add-in for Simstat, XLSTAT - Statistical add-in for Excel spreadsheets (Windows & Mac), Data Desk - Exploratory Data Analysis (Windows & Mac), Oriana - Circular statistics, Wa-Tor - Population dynamics simulation." Some of this software is shareware or freeware. Free demos are available for much of the software also.

http://odin.mdacc.tmc.edu/anonftp M.D. Anderson Cancer Center Biomathematics Archive. Link last verified September 14, 2000. "This site contains all code available from the Section of Computer Science, Department of Biomathematics, University of Texas M. D. Anderson Hospital. The code can be freely copied and used (shareware distribution is encouraged) although the authors retain copyright for the University of Texas in order to control possible commercial incorporation."

http://www.uic.edu/~hedeker/mix.html The MIXOR/MIXREG Home Page by Don Hedeker. Link last verified October 3, 2000. "MIXOR, MIXREG, MIXNO, and MIXPREG programs A whole family of mixed-up programs! including mixed-effects linear regression, mixed-effects logistic regression for nominal or ordinal outcomes, mixed-effects probit regression for ordinal outcomes, mixed-effects Poisson regression, and mixed-effects grouped-time survival analysis. These models are also called multilevel models, hierarchical linear models, random-effects models, and random coefficients models, to name a few."

http://www.ioe.ac.uk/multilevel/ Multilevel Models Project. Link last verified September 14, 2000. "This page introduces some basic information about the Multilevel Models Project at the Institute of Education, University of London together with details of software, working papers and an introduction to multilevel models. It is updated periodically with links, including information about the project, macros for the MLwiN multilevel software, collaborations, newsletters etc."

http://www.sagebrushpress.com/pepibook.html PEPI. Link last verified December 31, 2001. Statistical software for Epidemiologists.

http://www-prophet.bbn.com/ PROPHET. Unable to verify link on September 14, 2000. PROPHET is a UNIX-based workstation software package that gives researchers a wide range of computing capabilities. One of PROPHET's greatest assets is its new graphical user interface. Employing the latest advances in software technology, PROPHET lets you store, analyze and present Data Tables, Graphs, Statistical Analyses and Mathematical Modeling, and Sequence Analyses with high-resolution graphics and multiple windows. Anyone, from the computer-naive to the computer-sophisticate, can learn to use it quickly and effectively.

http://www.tugsg.com/qdstat/qdst_fs.htm QDStat. Link last verified September 14, 2000. "The QD of QDStat stands for "Quickly Done" although some irreverent individuals favor the term "Quick and Dirty." The package, however, remains true to either name and is an analytical tool for rapid, easy evaluation of relatively small uncomplicated data sets using procedures common to basic statistical textbooks. QDStat does not understand about three dimensional mixed designs in the Lindquist tradition (there now you know how old I am), factorial designs with confounded interactions, balanced lattice designs, balanced incomplete-block designs and similar things. Those who have need of such techniques will not be well served by QDStat and should direct their efforts to one of several other more extensive packages such as SAS®. On the other hand, the needs of mere mortals may be met by QDStat."

http://www.cas.lancs.ac.uk/software/sabre3.1/sabre.html SABRE. Link last verified September 14, 2000. "SABRE is a program for the statistical analysis of binary, ordinal and count recurrent events. Such data are common in many surveys either with recurrent information collected over time or with a clustered sampling scheme. It is particularly appropriate for the analysis of work and life histories, and has been used intensively on many longitudinal datasets. Its development has been funded by ESRC and Lancaster University. In 1989, SABRE 2.0 was released, written by Jon Barry, Brian Francis and Richard Davies. SABRE 3.0, developed by Dave Stott, was released as freeware on the WWW in 1996. The current release is version 3.1. SABRE is available as freeware under the GNU general public licence on the WWW."

http://www.myatt.demon.co.uk/index.htm Some Free Public Health Software. Link last verified October 3, 2000. Mark Myatt has a nicely documented list of free software that he and others have written.

http://forrest.psych.unc.edu/research/index.html ViSta. Link last verified September 14, 2000. "ViSta, the Visual Statistics System, features statistical visualizations that are highly dynamic and very interactive. Dynamic, High-Interaction, Multi-View Graphics: ViSta constructs very-high-interaction, dynamic graphics that show you multiple views of your data simultaneously. The graphics are designed to augment your visual intuition so that you can better understand your data. See What Your Data Have To Say: ViSta's visually intuitive and computationally intensive approach to statistical data analysis is designed to clarify the meaning of data so that you can see what your data have to say. Freeware/Open Software: ViSta is free and open. It can be downloaded from the web. Platforms: ViSta runs under Windows, on Macintosh, and under Unix."

http://www.westat.com/statsoft.html Westat Statistical Software. Link last verified September 14, 2000. "Westat supports two classes of software packages for statistics professionals. WesVar is a software package that computes estimates and replicate variance estimates for data collected using complex sampling and estimation procedures. Westat is the distributor in the U.S. and Canada for the Blaise family of software, a complete survey processing system."

http://www.stat.umn.edu/ARCHIVES/archives.html U of Minnesota Statistics: Software. Link last verified September 14, 2000. "XLISP-STAT is an object-oriented statistical computing environment based on a dialect of the Lisp language called XLISP. Macanova is an interactive program for statistical analysis and matrix algebra. On the Macanova home page you will find links for Macintosh, DOS, and Windows executables, documentation, and program source. Arc is software that accompanies the book, Applied Regression Including Computing and Graphics by R. Dennis Cook and Sanford Weisberg, published by John Wiley in August 1999. Arc is the successor to R-code. CUSUM Programs and data sets referenced in the book Cumulative Sum Charts and Charting by Douglas M. Hawkins and David H. Olwell. FIRM (Formal Inference-based Recursive Modeling) fits dendrographic models relating a dependent variable to a set of predictors."

10 What statistics resources can be found on the web?

This section does not include web sites described in the "How can I contact the major statistics software vendors?" section or in other parts of the FAQ. The web is growing and changing rapidly, so it is impossible for me to compile a comprehensive list. Here are some interesting sites which have been mentioned on STAT-L/SCI.STAT.CONSULT. You are welcome to send me other interesting web sites.

http://www.nottingham.ac.uk/~mhzmd/bonf.html A biography of Carlo Emilio Bonferroni (Michael Dewey).
http://www.research.att.com/~volinsky/bma.html Bayesian Model Averaging
http://members.tripod.com/~Probability/bayes02.htm Bayesians vs. Non-Bayesians
http://www.dartmouth.edu/~chance/chance_news/news.html Chance News
http://www.execpc.com/~helberg/statframes.html Clay Helberg's Statistics on the Web
http://www.indiana.edu/~stigtsts/ Commentaries on Significance Testing.
http://www.stats.gla.ac.uk/cti/ CTI Statistics (Resources for Statistics with an emphasis on teaching)
http://www-leland.stanford.edu/class/gsb/excel2sas.html Excel to SAS and other data translations.
http://noppa5.pc.helsinki.fi/koe/index.html experimental WWW pages for teaching Statistics
http://curriculum.qed.qld.gov.au/kla/eda/ Exploring Data website: curriculum support materials for teachers of introductory statistics.
http://www-stat.ucdavis.edu/stat.html Graduate programs in Statistics
http://members.aol.com/johnp71/javastat.html Interactive Statistics pages (Java/JavaScript).
http://www.rt66.com/~llubet Lloyd's Warehouse of Economic Indicators.
http://www.w3.org/Math/ Math ML, Mathematical Markup Language
ftp://ftp.sas.com/pub/neural/measurement.html Measurement theory FAQ.
http://snipe.ukc.ac.uk/cgi-bin/hpda/mff/ Mike Fuller's homepage, which includes statistics resources on the Internet and the list of statistics email discussion lists.
ftp://ftp.sas.com/pub/neural/FAQ.html Neural networks FAQ.
SAS tips on the web.
http://www.stat.wisc.edu/statistics/consult/ the Section on Statistical Consulting (ASA).
http://www.bioss.sari.ac.uk/smart/unix/moutline.htm SMART, Explorapaedia of Statistical and Mathematical Techniques
http://www.statserv.com/ St@tServ, the central information server for Statistics & Data Analysis on the Internet
http://www.interchg.ubc.ca/cacb/power Statistical power analysis software (Len Thomas).
http://www.xs4all.nl/~jcdverha/scijokes/1_2.html Statistics jokes.
http://www.stat.duke.edu/~box/sis/ Statistics in Sports Section (ASA)
http://www.execpc.com/~helberg/statistics.html Statistics on the Web (Clay Helberg).
http://www.isds.duke.edu/stats-sites.html Statistics servers and other links (The Institute of Statistics and Decision Sciences).
http://www.stat.ucla.edu/textbook/ UCLA Statistics Textbook (interactive pages using JavaScript, Perl, xlisp-stat, etc.)
http://www.statlets.com/ STATLETS: a collection of Java applets designed to assist you in analyzing data over the Internet or local intranets.
http://www.stat.ucla.edu/teach Teaching of statistics
http://faculty.vassar.edu/~lowry/VassarStats.html VassarStats (JavaScript statistics programming)
http://www.stat.ufl.edu/vlib/statistics.html/ Virtual Library of Statistics
http://www.utexas.edu/world/lecture/ World Lecture Hall (Web-based lectures on many academic topics including Statistics).

Web sites for statistics journals (compiled by Tony Corso)

http://www.ams.org/journals American Mathematical Society Journals
http://www.amstat.org/publications/index.html American Statistical Association Publications
http://www.stat.colostate.edu/annappr The Annals of Applied Probability
http://www.stat.berkeley.edu/users/annstat The Annals of Statistics
http://www.nuff.ox.ac.uk/biometrika Biometrika
http://www.wiwi.hu-berlin.de/~sigbert/cs.html Computational Statistics
http://fims-www.massey.ac.nz/~maths/jamds/ Journal of Applied Math and Decision Sciences
http://www.shef.ac.uk/uni/companies/apt/apt2.html Journal of Applied Probability
http://www.o2.net/~jasr/jasr.html Journal of Applied Statistical Reasoning
http://www.carfax.co.uk/jas-ad.htm Journal of Applied Statistics
http://www.pitt.edu/~csna/joc.html Journal of Classification
http://fisher.stat.unipg.it/iasc/Misc-stat-journ-JCGS.html Journal of Computational and Graphical Statistics
http://www.stat.ucla.edu/journals/jebs Journal of Educational and Behavioral Statistics
http://www.apnet.com/www/journal/mv.htm Journal of Multivariate Analysis
http://www.gbhap.com/journals/718/718-top.htm Journal of Nonparametric Statistics
http://jscs.stat.vt.edu/JSCS Journal of Statistical Computation and Simulation
http://www.elsevier.nl/locate/inca/505561 Journal of Statistical Planning and Inference
http://www.stat.ucla.edu/journals/jss Journal of Statistical Software
http://www2.ncsu.edu/ncsu/pams/stat/info/jse/homepage.html Journal of Statistics Education
http://interstat.stat.vt.edu/InterStat Interstat - Statistics on the Internet
http://vision.arc.nasa.gov/publications/Psychometrika Psychometrika
http://www.gbhap.com/journals/604/604-top.htm Statistics - Theoretical and Applied Statistics
http://www.elsevier.nl/inca/publications/store/5/0/5/5/7/3 Statistics & Probability Letters
http://www.stat.ucla.edu/ims/publications/journals/statsci Statistical Science Journal
http://www.maths.uq.oz.au/~gks/webguide/journals.html Guide to the Web for Statisticians: Journals

11 What should I do about these "Spams"?

http://www.cauce.org is a web site for the Coalition Against Unsolicited Commercial E-mails (CAUCE). Visit this site if you want to do something constructive to stop spam. This site is lobbying for legislation that would make junk e-mail illegal, just like junk FAXes were outlawed recently. In my humble opinion, this seems like the best solution to a problem that is getting worse and worse over time.

A message distributed across multiple newsgroups or list servers, usually for commercial purposes, is known as a Spam. Some examples of Spams that have hit STAT-L/SCI.STAT.CONSULT are the green card lawyers, information about lonely women in Russia, and blueprints of the original atom bomb. First, keep in mind that often it is not the original spam messages that are so conspicuous and potentially intrusive, but rather the inevitable threads of discussion which seem to result from them. Please do not complain to STAT-L about a spam. The person who sent the spam is almost certainly not a subscriber to STAT-L and will not see your complaint. Other victims of the spam will see your complaint though, which multiplies the annoying effect of the spam.

There are constructive steps that you can take to discourage a spam, but be assured that hundreds if not thousands of people have probably already done this on your behalf. You can do nothing and still be assured that others are looking out for everyone's interests. So the best course of action is to shrug off the message. You might want to get into the practice of recognizing a spam by its subject line and deleting it unread.

Here are some constructive steps you can take to discourage inappropriate use of Internet resources.

http://www.glr.com/nojunk.html and http://kenjen.com/nospam/ are two sites that you can register at to notify bulk e-mailers that you do not wish to receive commercial e-mail. Some of the more "responsible" bulk e-mailers work with these sites to clean their address lists. Note that while some e-mail advertisements offer a way to remove your e-mail address from their list, there are some reports that doing this might actually increase the amount of spam that you get (see the CAUCE web site for more details).

net-abuse@nocs.insp.irs.gov is an e-mail address within the United States Internal Revenue Service. Because of the volume of e-mails that this address has been getting, the owner has asked that this site be restricted to instances of off-shore money laundering, "cheat the IRS" type UCE mailings and anything dealing with "hate mail" directed towards the IRS and its employees. They cannot investigate spam, unsolicited commercial e-mail, or e-mail pyramid schemes.

http://www.usps.gov/websites/depart/inspect/ is a page from the web site for the U.S. Postal Service. This particular page explains why chain letters (including Internet chain letters) are illegal and who to notify. The U.S. Postal Service has the power to impound all incoming mail to an address or post office box that is listed on a chain letter.

http://www.fraud.com/ is the web site of the National Fraud Information Center. They investigate reports of fraudulent uses of the Internet. They also have a toll free number 1-800-876-7060.

http://www.clark.net/pub/rolf/mmf/ is a humorous web site that publishes the name, address, phone, and e-mail accounts of people who foolishly participated in Internet chain letters like "Make Money Fast."

http://www.cco.caltech.edu/~cbrown/BL/ is a blacklist of Internet advertisers. Find out how to get someone added to the blacklist and ways that you can show your displeasure to advertisers on the blacklist. Be cautious, however, of some of the suggestions made at this site which, in my opinion, go beyond a constructive approach. The author, himself, notes that some of his suggestions may not be legal in some jurisdictions.

http://www.cm.org/nocem.html, http://www.compulink.co.uk/~net-services/spam/, and http://www.mmgco.com/nospam/ offer different software solutions to filter out spams.

news://news.admin.net-abuse.usenet and news://news.admin.net-abuse.email are two USENET newsgroups with information about abuse of the Internet.

12 What are some of the problems with stepwise regression?

All of this material is quoted from various e-mails that appeared on STAT-L/SCI.STAT.CONSULT in 1996. Thanks go to Ira Bernstein, Ronan Conroy, and Frank Harrell for their detailed explanations and to Richard Ulrich, who originally compiled these comments. I have done some very minor editing (mostly adding and changing line breaks) but have tried to avoid any substantive changes to these well written explanations.

Frank Harrell's comments:

Here are SOME of the problems with stepwise variable selection.

1. It yields R-squared values that are badly biased high.

2. The F and chi-squared tests quoted next to each variable on the printout do not have the claimed distribution.

3. The method yields confidence intervals for effects and predicted values that are falsely narrow (see Altman and Andersen, Statistics in Medicine).

4. It yields P-values that do not have the proper meaning and the proper correction for them is a very difficult problem.

5. It gives biased regression coefficients that need shrinkage (the coefficients for remaining variables are too large; see Tibshirani, 1996).

6. It has severe problems in the presence of collinearity.

7. It is based on methods (e.g. F tests for nested models) that were intended to be used to test pre-specified hypotheses.

8. Increasing the sample size doesn't help very much (see Derksen and Keselman)

9. It allows us to not think about the problem.

10. It uses a lot of paper.

Note that 'all possible subsets' regression does not solve any of these problems.
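
The inflated R-squared in point 1, and the misleading tests in points 2 and 4, can be seen in a small simulation. The following is only a minimal sketch in Python and is not taken from the original e-mails. It mimics just the first step of forward selection: from a set of pure-noise candidate predictors it picks the one most correlated with a pure-noise response. The function names, sample size, number of candidates, number of trials, and seed are all illustrative assumptions.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def simulate(n_obs=30, n_candidates=20, n_trials=500, seed=1):
    """Average R-squared of a prespecified vs. a data-selected noise predictor.

    Every predictor is independent of the response, so any apparent
    fit is due to chance alone.
    """
    rng = random.Random(seed)
    prespecified_total = 0.0
    selected_total = 0.0
    for _ in range(n_trials):
        y = [rng.gauss(0, 1) for _ in range(n_obs)]
        candidates = [
            [rng.gauss(0, 1) for _ in range(n_obs)]
            for _ in range(n_candidates)
        ]
        fits = [r_squared(x, y) for x in candidates]
        prespecified_total += fits[0]   # predictor chosen before seeing data
        selected_total += max(fits)     # predictor chosen because it fit best
    return prespecified_total / n_trials, selected_total / n_trials


mean_prespecified, mean_selected = simulate()
print("mean R-squared, prespecified predictor:", round(mean_prespecified, 3))
print("mean R-squared, selected predictor:    ", round(mean_selected, 3))
```

Because every candidate is noise, the R-squared of a predictor chosen in advance averages close to 1/(n-1), while the R-squared of the predictor chosen by looking at the data is noticeably larger. A test statistic computed for the selected predictor as if it had been prespecified inherits the same optimism.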


@article{alt89,author = "Altman, D. G. and Andersen, P. K.",journal = "Statistics in Medicine",pages = "771-783",title = "Bootstrap investigation of the stability of a {C}ox regression model",volume = "8",year = "1989" Shows that stepwise methods yield confidence limits that are far too narrow.}

@article{der92bac,author = {Derksen, S. and Keselman, H. J.},journal = {British Journal of Mathematical and Statistical Psychology},pages = {265-282},title = {Backward, forward and stepwise automated subset selection algorithms: {F}requency of obtaining authentic and noise variables},volume = {45},year = {1992},annote = {variable selection} Conclusions: "The degree of correlation between the predictor variables affected the frequency with which authentic predictor variables found their way into the final model. The number of candidate predictor variables affected the number of noise variables that gained entry to the model. The size of the sample was of little practical importance in determining the number of authentic variables contained in the final model. The population multiple coefficient of determination could be faithfully estimated by adopting a statistic that is adjusted by the total number of candidate predictor variables rather than the number of variables in the final model."}

@article{roe91pre,author = {Roecker, Ellen B.},journal = {Technometrics},pages = {459-468},title = {Prediction error and its estimation for subset--selected models},volume = {33},year = {1991} Shows that all-possible regression can yield models that are "too small".}

@article{man70why,author = {Mantel, Nathan},journal = {Technometrics},pages = {621-625},title = {Why stepdown procedures in variable selection},volume = {12},year = {1970},annote = {variable selection; collinearity}}

@article{hur90,author = "Hurvich, C. M. and Tsai, C. L.",journal = "American Statistician",pages = "214-217",title = "The impact of model selection on inference in linear regression",volume = "44",year = "1990"}

@article{cop83reg,author = {Copas, J. B.},journal = "Journal of the Royal Statistical Society B",pages = {311-354},title = {Regression, prediction and shrinkage (with discussion)},volume = {45},year = {1983},annote = {shrinkage; validation; logistic model} Shows why the number of CANDIDATE variables and not the number in the final model is the number of d.f. to consider.}

@article{tib96reg,author = {Tibshirani, Robert},journal = "Journal of the Royal Statistical Society B",pages = {267-288},title = {Regression shrinkage and selection via the lasso},volume = {58},year = {1996},annote = {shrinkage; variable selection; penalized MLE; ridge regression}}

Ira Bernstein's comments:

I think that there are two distinct questions here: (a) _when_ is stepwise selection appropriate and (b) _why_ is it so popular.

Since I have seen some variation in usage of the term "stepwise", I define it as any of a number of _data_ driven variable selection schemes used in regression and discriminant analysis, among other applications. Some, inappropriately IMHO (since there is no official body to define "appropriate"), use it to describe what I would call hierarchical (_hypothesis_ driven) selection. Like I would assume many, I would discourage stepwise selection and encourage hierarchical selection. I, of course, assume the researcher does not "cheat" by defining his/her "hierarchy" given the data but does so by considering alternatives in advance of analysis and, preferably, replicates the study (dream on).

I would probably only argue slightly with "never" as an answer to the use of stepwise selection since I don't know what knowledge we would lose if all papers using stepwise regression were to vanish from journals at the same time programs providing their use were to become terminally virus-laden. However, I have been in situations that looked like "I have good reason to look at variables A, B, and C; then look at D, and E, but I have no basis to favor F over G or vice versa past that point." Older versions of SPSS (I haven't used newer versions since switching to SAS a decade ago) allowed this mixture, and I would personally not object to it as long as the strategy were defined in advance and made clear to readers.

As to part (b), I think that there are two groups that are inclined to favor its usage. One consists of individuals with little formal training in data analysis who confuse knowledge of data analysis with knowledge of the syntax of SAS, SPSS, etc. They seem to figure that "if it's there in a program, it's gotta be good and better than actually thinking about what my data might look like". They are fairly easy to spot and to condemn in a right-thinking group of well-trained data analysts (like ourselves). However, there is also a second group who are often well trained (and may be here in this group ready to flame me). They believe in statistics uber alles--given any properly obtained data base, a suitable computer program can objectively make substantive inferences without active consideration of the underlying hypotheses. If stepwise selection is the parent of this line of blind data analysis, then automatic variable respecification in confirmatory factor analysis is the child.

Ronan Conroy's comments:

I am struck by the fact that Judd and McClelland in their excellent book "Data Analysis: A Model Comparison Approach" (Harcourt Brace Jovanovich, ISBN 0-15-516765-0) devote less than 2 pages to stepwise methods. What they do say, however, is worth repeating:

1. Stepwise methods will not necessarily produce the best model if there are redundant predictors (common problem).

2. All-possible-subset methods produce the best model for each possible number of terms, but larger models need not necessarily be subsets of smaller ones, causing serious conceptual problems about the underlying logic of the investigation.

3. Models identified by stepwise methods have an inflated risk of capitalising on chance features of the data. They frequently fail when applied to new datasets. They are rarely tested in this way.

4. Since the interpretation of coefficients in a model depends on the other terms included, "it seems unwise," to quote J and McC, "to let an automatic algorithm determine the questions we do and do not ask about our data". RC adds that stepwise methods abusers frequently would rather not think about their data, for reasons that are funny to describe over a second Guinness.

5. I quote this last point directly, as it is sane and succinct: "It is our experience and strong belief that better models and a better understanding of one's data result from focussed data analysis, guided by substantive theory." (p 204)

They end with a quote from Henderson and Velleman's paper "Building multiple regression models interactively" (Biometrics 1981;37:391-411): "The data analyst knows more than the computer," and add "failure to use that knowledge produces inadequate data analysis."

Personally, I would no more let an automatic routine select my model than I would let some best-fit procedure pack my suitcase.
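
Point 3 above (capitalising on chance) can be illustrated with a small simulation. The sketch below is not a full stepwise procedure. It mimics only the first step of forward selection: from many candidate predictors, keep the one with the largest absolute correlation with the response. All of the data are pure noise, so any apparent relationship is due to chance. The function names, sample size, and number of predictors are assumptions chosen for illustration.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def selection_optimism(n_obs=30, n_predictors=50, n_reps=200, seed=0):
    """Average |r| of the selected noise predictor, in-sample and on new data."""
    rng = random.Random(seed)

    def noise():
        return [rng.gauss(0.0, 1.0) for _ in range(n_obs)]

    total_in = total_out = 0.0
    for _ in range(n_reps):
        y = noise()
        # "Select" the predictor that happens to correlate best with y.
        total_in += max(abs(pearson(noise(), y)) for _ in range(n_predictors))
        # New data set: the selected predictor is unrelated to the response,
        # so fresh values of both are simply independent noise.
        total_out += abs(pearson(noise(), noise()))
    return total_in / n_reps, total_out / n_reps


if __name__ == "__main__":
    in_sample, new_data = selection_optimism()
    print("average |r| of selected predictor, original data:", round(in_sample, 2))
    print("average |r| of selected predictor, new data:     ", round(new_data, 2))
```

Under these assumptions the selected predictor looks far more strongly related to the response in the data used to select it than in new data, even though no real relationship exists.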

13 What is the answer to the Monty Hall, Envelope, or Birthday problem?

There is a classic probability puzzle, which is called the Monty Hall problem. Here's a nice description from the rec.puzzles FAQ. "The Monty Hall problem can be stated as follows: A gameshow host displays three closed doors. Behind one of the doors is a car. The other two doors have goats behind them. You are then asked to choose a door. After you have made your choice, one of the remaining two doors is then opened by the host (who knows what's behind the doors), revealing a goat. Will switching your initial guess to the remaining door increase your chances of guessing the door with the car?"

The general consensus is that the probability of winning the car is 1/3 if you don't switch and 2/3 if you do switch. But there are some implicit assumptions in this problem that cause a raging debate every time it appears on STAT-L. For example, the host may be perversely trying to goad you into a bad switch and may reveal a door only when your current door has a car behind it. There are at least thirty web sites that discuss this problem. Here are three good sites:

http://www.smartpages.com/faqs/sci-math-faq/montyhall/faq.html SCI.MATH FAQ
http://www.cs.ruu.nl/wais/html/na-dir/puzzles/archive/decision.html REC.PUZZLES FAQ
http://www.ram.org/computing/monty_hall/monty_hall.html has a simulation model based on this problem.

You can also read about this problem in Engel, E. and Venetoulias, A. (1991). Monty Hall's probability puzzle. Chance, Vol 4, # 2, 6-9. and Selvin, S. (1975). A problem in probability, in "Letters to the Editor," The American Statistician, 29, 67 and 134.
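
The 1/3 versus 2/3 answer can be checked with a short simulation. The sketch below assumes the standard version of the game: the car is placed at random, and the host always opens a door that hides a goat and is not the player's door. It does not cover the "perverse host" variant mentioned above. The function names are made up for illustration.

```python
import random


def play(switch, rng):
    """Play one game; return True if the player wins the car."""
    doors = [0, 1, 2]
    car = rng.randrange(3)
    pick = rng.randrange(3)
    # Assumption: the host always opens a goat door other than the player's.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one door that is neither the first pick nor opened.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials=100_000, seed=1):
    """Estimate the probability of winning for a fixed strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    print("stay:  ", win_rate(switch=False))
    print("switch:", win_rate(switch=True))
```

With a large number of trials the two estimates should come out close to 1/3 and 2/3. Changing the rule that the host uses to open a door changes the answer, which is what the debate is about.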

The envelope exchange problem goes something like this (again from the rec.puzzles FAQ). "Someone has prepared two envelopes containing money. One contains twice as much money as the other. You have decided to pick one envelope, but then the following argument occurs to you: Suppose my chosen envelope contains $X, then the other envelope either contains $X/2 or $2X. Both cases are equally likely, so my expectation if I take the other envelope is .5 * $X/2 + .5 * $2X = $1.25X, which is higher than my current $X, so I should change my mind and take the other envelope. But then I can apply the argument all over again. Something is wrong here! Where did I go wrong? In a variant of this problem, you are allowed to peek into the envelope you chose before finally settling on it. Suppose that when you peek you see $100. Should you switch now?"

Again, there are some subtle assumptions in this problem that cause a lot of commentary. A good reference to the problem is Christensen, R. and Utts, J. (1992) "Bayesian Resolution of the 'Exchange Paradox,'" The American Statistician, 46(4), 274-276. Note also the comments in the Letters to the Editor column in two separate issues of The American Statistician in 1993 (pages 160, 311).

http://www.cs.ruu.nl/wais/html/na-dir/puzzles/archive/decision.html, the rec.puzzles FAQ, contains a nice discussion of this problem.
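
For the version of the problem without peeking, a simulation shows that the 1.25X argument cannot be right. The sketch below assumes that the two amounts are fixed in advance (a hypothetical $100 and $200) and that you pick an envelope at random. It says nothing about the peeking variant, where the answer depends on what you believe about how the amounts were chosen. The function name is made up for illustration.

```python
import random


def average_payoffs(small=100.0, trials=100_000, seed=2):
    """Average payoff for always keeping versus always switching."""
    rng = random.Random(seed)
    keep_total = switch_total = 0.0
    for _ in range(trials):
        envelopes = [small, 2 * small]
        rng.shuffle(envelopes)         # envelopes[0] is the one you picked
        keep_total += envelopes[0]     # payoff if you keep your envelope
        switch_total += envelopes[1]   # payoff if you take the other one
    return keep_total / trials, switch_total / trials


if __name__ == "__main__":
    keep, switch = average_payoffs()
    print("keep:  ", keep)
    print("switch:", switch)
```

Both averages should be close to 1.5 times the smaller amount, so blind switching gains nothing. The flaw in the argument is that X is not the same quantity in the two cases: when the other envelope holds $2X you are holding the smaller amount, and when it holds $X/2 you are holding the larger one.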

The birthday problem goes something like this. There are "r" people in a room. What is the probability that two or more people have the same birthday?

Assuming uniform probabilities for each birthdate, the probability of a match is 1 - n!/((n^r)*(n-r)!), where n equals the number of days in a year and r equals the number of people in the group. For n=365 and r=23, the probability exceeds 0.5. A nice summary of this problem with extensions into non-uniform birthdates is Nunnikhoven, T.S. (1992) "A Birthday Problem Solution for Nonuniform Birth Frequencies," The American Statistician, 46(4), 270-274.

http://pascal.dartmouth.edu/~zhu/applets/Birthday/Birthday.java is a Java applet for computing these probabilities.
http://www.mste.uiuc.edu/reese/birthday/intro.html has a simulation of the birthday problem.
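
The uniform-birthdate probability is also easy to compute directly. The sketch below multiplies (n-k)/n for k = 0, ..., r-1, which is algebraically the same as n!/((n^r)*(n-r)!) but avoids very large factorials. The function name is made up for illustration, and n=365 ignores leap years.

```python
def birthday_match_probability(r, n=365):
    """Probability that at least two of r people share a birthday."""
    if r > n:
        return 1.0  # more people than days: a match is certain
    p_no_match = 1.0
    for k in range(r):
        # Person k+1 must avoid the k birthdays already taken.
        p_no_match *= (n - k) / n
    return 1.0 - p_no_match


if __name__ == "__main__":
    for r in (10, 22, 23, 30, 50):
        print(r, round(birthday_match_probability(r), 4))
```

For r=22 the probability is about 0.476 and for r=23 it is about 0.507, so 23 is the smallest group for which a match is more likely than not.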

14 Can someone provide me with references and/or books about [topic]?

Before you post a question like this, it would be nice if you did a little work yourself. The best resource for finding references about a statistical topic is the Current Index to Statistics Extended Database (CISED), a CD-ROM with 180,000 references in statistics journals since 1974, with coverage of selected journals dating back as far as 1940. Many university libraries have this product, and some make it available to their students through a web browser. Licensing agreements, however, prevent libraries from making this product available to the general public. If you want to purchase an individual license, it is available for as little as $95.

http://www.stat.uchicago.edu/~cis/ is a web site that contains more information about CISED. Two e-mail contacts at IMS and ASA are kmkims@stat.berkeley.edu and cised@amstat.org, respectively.

http://www.stat.wisc.edu/statistics/consult/statbook.html is Glen McPherson's Essential Book List. Back in 1993, Glen McPherson polled the members of STAT-L/SCI.STAT.CONSULT to create a list of books essential to anyone in the statistical consulting field. The list is organized by major topic areas. Brian Yandell has put this list up on his web site.

http://www.stat.wisc.edu/statistics/consult/book.html is another interesting booklist that can be found at the same web site.

15 Can you recommend a good statistics software package?

If you want a good answer to this question, you need to be specific about your needs. Be sure to tell us which of the following factors are important to you:

  1. Ease of learning
  2. Quality of help files
  3. Extensibility/programmability
  4. Data entry and validation
  5. Data manipulation
  6. Data importing
  7. Real-time graphics (scatterplot brushing, 3D rotation)
  8. Cost

Let us know what statistical procedures you need and what level of user the software is intended for. Tell us what type of computer you plan to run this on.

Also, you can visit some of the web sites listed above to see what the manufacturers have to say about their software packages.

Finally, many statistics journals (e.g., The American Statistician) provide regular software reviews. You might find better answers to this question at your library.

16 Acknowledgments

This list has grown thanks to the small and large contributions of many people. Part of it was shamelessly stolen from well written messages on STAT-L. Here is a partial list of people who you should thank for directly or indirectly contributing to this FAQ: Gary Ash, Kenneth Benoit, Grant Blank, Jim Box, Benjamin Chan, Ronan Conroy, Tony Corso, Donald Cram, Byron Davis, Barry DeCicco, Joe Dolgos, Steven Dubnoff, Rick Engberg, Emil Friedman, Mike Fuller, Steve Goodman, Bill Gould, Timothy Green, Duane Griffin, Clay Helberg, Tim Hesterberg, Charles Kincaid, Melvin Klassen, Warren Kovach, Jan de Leeuw, Lloyd Lubet, Haiko Luepsen, Hans Mittelmann, Brian Monsell, John Nash, Jonathan Newman, Michael Palij, Dennis Roberts, David Ronis, Warren Sarle, Ronald Schoenberg, Russell Schulz, Karsten Self, Jim Steiger, Len Thomas, Richard Ulrich, Vittorio Viaggi, Michael Walsh, Meredith Warshaw, Mitchell Watnik, Bob Wheeler, Will Wheeler, John Whittington, Forest Young, Sara Young, Stuart Young, Craig Ziegler.

If there are errors in this FAQ, they are probably my fault; it is difficult to accurately transcribe all of the information I have received, even with cut and paste. Please send any corrections and additions. Complaints are appreciated also, but please realize that I am doing this as a volunteer effort, mostly during lunch breaks and after work hours.

*** End of FAQ for STAT-L/SCI.STAT.CONSULT ***