USING PDQ EXPLORE FOR PUMS 5% DATA

PDQ Explore is a high-speed computer program developed by Public Data Queries, Inc. and used with permission by the Population Studies Center for manipulating 1990 Census microdata. It includes both the PUMS 1% and 5% files. Instructions for accessing the program are available on the web at: http://www.pdq.com/pdq_explore.txt. UMich affiliates can access it through their login accounts; those outside the University may telnet to pdq.psc.lsa.umich.edu (login: demo; password: demo).

A 24-page listing of Michigan PUMA geography appears at: http://www.lib.umich.edu/govdocs/pumasmi.html

Below is a very simple search using the 5% PUMS file. The question: number of people born in China who immigrated to the U.S. between 1980 and 1990. (In the search below, pob=207 means place of birth is China; immigr=1..4 means immigration between 1980 and 1990; state=26 means Michigan; puma=3100..4400 are the public use microdata areas in southeast Michigan.)

1. Access login.itd.umich.edu
   [Sign into login account]

2. At the % sign, TYPE ~pops/bin/explore
   [Brings up Explore from IFS space]

pdq_explore:
3. TYPE load pums_1990_5pct
   [Loads the 5% PUMS nationally]

pdq_explore:
4. TYPE select immigr:1..4
   [Universe limited to people immigrating 1980-90, i.e. 1 or 2 or 3 or 4. A colon must be used when giving a range.]

Query type tabulate:
------------------------------------------------------------------------------
Select- 0: immigr:1..4
------------------------------------------------------------------------------
------------------------------------------------------------------------------
query tabulate on dataset pums_1990_5pct at server query.pdq.com:6968.
Table has 1 elements.
==============================================================================

pdq_explore:
5. TYPE select pob=207
   [Universe limited to people born in China. China, Taiwan and Hong Kong would be pob=207|209|238. The vertical bar (|) means OR.
   An = sign can be used to select a single value or several OR-ed values, but not a range.]

Query type tabulate:
------------------------------------------------------------------------------
Select- 0: pob=207
Select- 1: immigr:1..4
------------------------------------------------------------------------------
------------------------------------------------------------------------------
query tabulate on dataset pums_1990_5pct at server query.pdq.com:6968.
Table has 1 elements.
==============================================================================

pdq_explore:
6. TYPE select state=26
   [This ensures Michigan]

Query type tabulate:
------------------------------------------------------------------------------
Select- 0: state=26
Select- 1: pob=207
Select- 2: immigr:1..4
------------------------------------------------------------------------------
------------------------------------------------------------------------------
query tabulate on dataset pums_1990_5pct at server query.pdq.com:6968.
Table has 1 elements.
==============================================================================

pdq_explore:
7. TYPE row puma:3100..4400
   [Row indicates a range of PUMAs in southeast Michigan. Scattered PUMAs cannot be specified as a row; they must be recoded. See the recoded search at the end of this example. To break this down by sex, type column sex]

Query type tabulate:
------------------------------------------------------------------------------
Select- 0: state=26
Select- 1: pob=207
Select- 2: immigr:1..4
------------------------------------------------------------------------------
Row [ 3100, 4400] : puma:3100..4400
------------------------------------------------------------------------------
query tabulate on dataset pums_1990_5pct at server query.pdq.com:6968.
Table has 1301 elements.
==============================================================================

pdq_explore:
8. TYPE weight pwgt1
   [Add population weight]

Query type tabulate:
------------------------------------------------------------------------------
Select- 0: state=26
Select- 1: pob=207
Select- 2: immigr:1..4
------------------------------------------------------------------------------
Row [ 3100, 4400] : puma:3100..4400
Weight: pwgt1
------------------------------------------------------------------------------
query tabulate on dataset pums_1990_5pct at server query.pdq.com:6968.
Table has 1301 elements.
==============================================================================

pdq_explore:
9. TYPE run
   [Execute program]

query took 11.422574 seconds to run (including communications overhead and compile time).
Table of frequency counts
N: 12501046   Selected: 144
==============================================================================
[Only selected portions of printout]

<<<<<<   1470
3100      392   Ann Arbor
3196        0
3197        0
3198        0
3199        0
3200      273
3201        0
3202        0
3301      231
3302        0
3303        0
3400        0
3401        0
3402       19
3403      119
3404       23
3405        0
3406        0
3599        0
3600       38
3601        0
3602        0
3798        0
3799        0
3800       49
3801        0
3901        0
3902       91
3903        0
4000      183
4001        0
4002        0
4102      133
4103      281
4104      256
4105       87
4106       45
4107       95
4108        0
4399        0
4400        0
>>>>>>      0
Total    3785

pdq_explore:
10. TYPE quit

galaxian%
11. TYPE logout


RECODED SEARCH

The following search recodes the PUMAs so that just Ann Arbor, the remainder of Washtenaw County, and the City of Detroit are selected. Note that the range of PUMAs must also be selected in the universe.

1. Access login.itd.umich.edu
2. % sign: ~pops/bin/explore
3. pdq_explore: load pums_1990_5pct
4. pdq_explore: select immigr:1..4
5. pdq_explore: select pob=207
6. pdq_explore: select state=26
7. pdq_explore: select puma:3100..3308
   [New element for recode]
8. pdq_explore: recode new_puma

At this point the computer loads a new program. Delete the introductory information (CNTRL-K) and also the #s. Write over the template.
R: new_puma
r: 3100 3100 1 AnnArb
r: 3200 3200 2 Washtenaw
r: 3301 3308 3 Detroit

CNTRL-X to save. You will return to the main Explore menu.

9. pdq_explore: row new_puma(puma)
10. pdq_explore: weight pwgt1
------------------------------------------------------------------------------
11. pdq_explore: run

<<<<<<       0
AnnArb     392
Washtenaw  273
Detroit    231
>>>>>>       0
Total      896

12. pdq_explore: quit
13. tempest% logout


Alternative Recode

R: new_puma
r: 0 3099 0 Other
r: 3100 3100 1 AnnArb
r: 3200 3200 2 Washtenaw
r: 3301 3308 3 Detroit
r: 3309 9999 0


FUNCTIONS

"Recode" groups together existing values. "Function" is an algebraic formula, which I don't entirely understand.

EXAMPLE of Function:

F: pums_1990_5pct.housing.hus_race
f: sum ((sex==0 && relat1<=1)*race):0..327

F: pums_1990_5pct.housing.wif_race
f: sum ((sex==1 && relat1<=1)*race):0..327
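

ILLUSTRATION OF A RECODE (PYTHON)

To make the idea of "Recode groups together existing values" concrete, here is a minimal sketch in Python. It is not PDQ Explore syntax and does not touch the PUMS data. It takes the weighted counts printed in the first search for PUMAs 3100 through 3303 and groups them with the three-row recode table from the recoded search. The names RECODE_ROWS, PUMA_COUNTS, recode and tabulate are made up for this illustration. It assumes the PUMAs in 3100..3308 that were left out of the partial printout had zero counts, which agrees with the total of 896 reported by the recoded search.

```python
from collections import defaultdict

# Recode table copied from the recoded search: (low, high, code, label).
RECODE_ROWS = [
    (3100, 3100, 1, "AnnArb"),
    (3200, 3200, 2, "Washtenaw"),
    (3301, 3308, 3, "Detroit"),
]

# Weighted counts by PUMA, copied from the printed rows of the first search
# that fall inside the selected universe puma:3100..3308.
PUMA_COUNTS = {
    3100: 392, 3196: 0, 3197: 0, 3198: 0, 3199: 0,
    3200: 273, 3201: 0, 3202: 0,
    3301: 231, 3302: 0, 3303: 0,
}


def recode(puma):
    """Return the label of the first recode row whose range contains puma, else None."""
    for low, high, _code, label in RECODE_ROWS:
        if low <= puma <= high:
            return label
    return None


def tabulate(counts):
    """Sum the counts of all PUMAs that share a recode label."""
    totals = defaultdict(int)
    for puma, n in counts.items():
        label = recode(puma)
        if label is not None:  # PUMAs outside every range are left out
            totals[label] += n
    return dict(totals)


result = tabulate(PUMA_COUNTS)
for label, n in result.items():
    print(label, n)
print("Total", sum(result.values()))
```

The printed rows (AnnArb 392, Washtenaw 273, Detroit 231, Total 896) agree with the output of the recoded search. In the Alternative Recode, the extra rows 0..3099 and 3309..9999 would send every other PUMA to code 0 instead of leaving it out.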