Sometimes you want to take a random sample of your data while respecting the stratification that was used when the data set was created. Other times you want to maintain certain proportions in the sampled data set; for example, drawing a sample in which the proportions of males and females correspond to current census figures. To draw these types of samples from your data set, you can use proc surveyselect. We will use the hsb2.sas7bdat data set for our examples. Note that the code on this page works with SAS 8.x. For the examples below, assume that you have imported this data set into the work library.
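As a minimal sketch of that setup, assuming hsb2.sas7bdat has been saved in a folder of your choosing, the data set can be copied into the work library as shown here. The libref mydata and the path are placeholders of ours, not part of the original example.

```sas
/* Hypothetical folder: replace with the location of hsb2.sas7bdat. */
libname mydata 'C:\mydata';

/* Copy the permanent data set into the temporary work library. */
data work.hsb2;
  set mydata.hsb2;
run;
```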
Example 1: Taking a 50% sample from each stratum using simple random sampling (srs)
Before we take our sample, let’s look at the data set using proc means. Because we will use a by statement, we need to sort the data first. We will use the variable female as our stratification variable. We also use an options statement to suppress the variable labels in the output.
proc sort data = hsb2;
  by female;
run;

options nolabel;

proc means data = hsb2;
  by female;
run;

female=0

The MEANS Procedure

Variable          N         Mean      Std Dev      Minimum      Maximum
-----------------------------------------------------------------------
id               91  106.0109890   60.3122421    3.0000000  200.0000000
race             91    3.4285714    1.0867163    1.0000000    4.0000000
ses              91    2.1538462    0.6818777    1.0000000    3.0000000
schtyp           91    1.1538462    0.3628001    1.0000000    2.0000000
prog             91    2.0219780    0.6988566    1.0000000    3.0000000
read             91   52.8241758   10.5067105   31.0000000   76.0000000
write            91   50.1208791   10.3051607   31.0000000   67.0000000
math             91   52.9450549    9.6647845   35.0000000   75.0000000
science          91   53.2307692   10.7321707   26.0000000   74.0000000
socst            91   51.7912088   11.3338397   26.0000000   71.0000000
-----------------------------------------------------------------------

female=1

Variable          N         Mean      Std Dev      Minimum      Maximum
-----------------------------------------------------------------------
id              109   95.8990826   55.6275553    1.0000000  198.0000000
race            109    3.4311927    1.0033921    1.0000000    4.0000000
ses             109    1.9724771    0.7510328    1.0000000    3.0000000
schtyp          109    1.1651376    0.3730197    1.0000000    2.0000000
prog            109    2.0275229    0.6866278    1.0000000    3.0000000
read            109   51.7339450   10.0578348   28.0000000   76.0000000
write           109   54.9908257    8.1337152   35.0000000   67.0000000
math            109   52.3944954    9.1510153   33.0000000   72.0000000
science         109   50.6972477    9.0385026   29.0000000   69.0000000
socst           109   52.9174312   10.2344086   26.0000000   71.0000000
-----------------------------------------------------------------------
In the command below we use several options. The data = option specifies the data set from which we wish to draw the sample, and the out = option names the data set that will hold the sample. The method = option indicates how the sample is drawn; here srs requests simple random sampling. SAS offers a range of other methods, including probability-proportional-to-size and systematic sampling. The samprate = option specifies the sampling rate; here .5 means 50%. The seed = option sets the seed so that our results are replicable. On the strata statement we specify the variable (or variables) that define the strata.
proc surveyselect data = hsb2 out = samp1 method = srs
  samprate = .5 seed = 9876;
  strata female;
run;

proc sort data = samp1;
  by female;
run;

proc means data = samp1;
  by female;
run;

female=0

The MEANS Procedure

Variable          N         Mean      Std Dev      Minimum      Maximum
-----------------------------------------------------------------------
id               46   94.5869565   60.1141788    5.0000000  197.0000000
race             46    3.1956522    1.2582098    1.0000000    4.0000000
ses              46    2.1521739    0.6981688    1.0000000    3.0000000
schtyp           46    1.0652174    0.2496374    1.0000000    2.0000000
prog             46    2.2173913    0.7276459    1.0000000    3.0000000
read             46   50.6956522   10.5848310   31.0000000   73.0000000
write            46   47.4565217   10.2473986   31.0000000   65.0000000
math             46   53.0869565    9.5657400   38.0000000   75.0000000
science          46   51.3043478   11.7735477   26.0000000   74.0000000
socst            46   48.9565217   12.4185462   26.0000000   71.0000000
SelectionProb    46    0.5054945            0    0.5054945    0.5054945
SamplingWeight   46    1.9782609            0    1.9782609    1.9782609
-----------------------------------------------------------------------

female=1

Variable          N         Mean      Std Dev      Minimum      Maximum
-----------------------------------------------------------------------
id               55   82.2727273   54.4056964    1.0000000  194.0000000
race             55    3.2545455    1.1420933    1.0000000    4.0000000
ses              55    1.9636364    0.7444520    1.0000000    3.0000000
schtyp           55    1.1090909    0.3146266    1.0000000    2.0000000
prog             55    2.1636364    0.7139778    1.0000000    3.0000000
read             55   50.4545455   10.3705748   28.0000000   76.0000000
write            55   54.3636364    8.5729195   35.0000000   67.0000000
math             55   51.9818182    9.9712381   33.0000000   72.0000000
science          55   50.4727273   10.2791673   31.0000000   69.0000000
socst            55   52.3272727   10.2885311   31.0000000   71.0000000
SelectionProb    55    0.5045872            0    0.5045872    0.5045872
SamplingWeight   55    1.9818182            0    1.9818182    1.9818182
-----------------------------------------------------------------------
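The two variables that proc surveyselect adds to the output data set, SelectionProb and SamplingWeight, can be checked by hand from the N columns of the two proc means listings. As a sketch, write N_h for the number of cases in stratum h of hsb2 and n_h for the number sampled (this notation is ours, not SAS output):

```latex
\begin{align*}
\text{SelectionProb}_h &= \frac{n_h}{N_h}, &
\text{SamplingWeight}_h &= \frac{N_h}{n_h} \\
\text{female}=0:\quad \frac{46}{91} &\approx 0.5054945, &
\frac{91}{46} &\approx 1.9782609 \\
\text{female}=1:\quad \frac{55}{109} &\approx 0.5045872, &
\frac{109}{55} &\approx 1.9818182
\end{align*}
```

The selection probabilities are slightly above .5 because both stratum sizes are odd, so exactly half of a stratum cannot be drawn; 46 of 91 and 55 of 109 cases were selected.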
If you want to know which cases were not selected, or if you want to use the two samples for validation purposes, you need to combine the sampled data set with the original data set. An example is given below. Note that we need to sort both the original data set and the sampled data set on the same variable, and that this variable must uniquely identify each case in the data set. Because the data step below uses a set statement with a by statement, the two data sets are interleaved rather than merged: every case appears once from hsb2, and each sampled case appears a second time from samp1. You can tell which cases were selected into the sample because their second record has values for SelectionProb and SamplingWeight. These variables were created by proc surveyselect, and hence are missing on the records that come from the original data file. If you want to create three or more data sets from your original data set, you can use Enterprise Miner.
proc sort data = hsb2;
  by id;
run;

proc sort data = samp1;
  by id;
run;

data merge1;
  set hsb2 samp1;
  by id;
run;

proc print data = merge1 (obs = 25);
run;

                                                                              Selection  Sampling
Obs   id  female  race  ses  schtyp  prog  read  write  math  science  socst       Prob    Weight
  1    1       1     1    1       1     3    34     44    40       39     41          .         .
  2    1       1     1    1       1     3    34     44    40       39     41    0.50459   1.98182
  3    2       1     1    2       1     3    39     41    33       42     41          .         .
  4    2       1     1    2       1     3    39     41    33       42     41    0.50459   1.98182
  5    3       0     1    1       1     2    63     65    48       63     56          .         .
  6    4       1     1    1       1     2    44     50    41       39     51          .         .
  7    4       1     1    1       1     2    44     50    41       39     51    0.50459   1.98182
  8    5       0     1    1       1     2    47     40    43       45     31          .         .
  9    5       0     1    1       1     2    47     40    43       45     31    0.50549   1.97826
 10    6       1     1    1       1     2    47     41    46       40     41          .         .
 11    6       1     1    1       1     2    47     41    46       40     41    0.50459   1.98182
 12    7       0     1    2       1     2    57     54    59       47     51          .         .
 13    7       0     1    2       1     2    57     54    59       47     51    0.50549   1.97826
 14    8       1     1    1       1     2    39     44    52       44     48          .         .
 15    9       0     1    2       1     3    48     49    52       44     51          .         .
 16   10       1     1    2       1     1    47     54    49       53     61          .         .
 17   10       1     1    2       1     1    47     54    49       53     61    0.50459   1.98182
 18   11       0     1    2       1     2    34     46    45       39     36          .         .
 19   11       0     1    2       1     2    34     46    45       39     36    0.50549   1.97826
 20   12       0     1    2       1     3    37     44    45       39     46          .         .
 21   13       1     1    2       1     3    47     46    39       47     61          .         .
 22   13       1     1    2       1     3    47     46    39       47     61    0.50459   1.98182
 23   14       0     1    3       1     2    47     41    54       42     56          .         .
 24   14       0     1    3       1     2    47     41    54       42     56    0.50549   1.97826
 25   15       0     1    3       1     3    39     39    44       26     42          .         .
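If you would rather have exactly one record per case, a merge statement with the in = data set option is a common alternative to the set statement used above. The following is a minimal sketch under that assumption; the names merge2, in_samp, selected, train and valid are ours and do not appear in the original example.

```sas
/* Sort both data sets by the unique case identifier. */
proc sort data = hsb2;  by id; run;
proc sort data = samp1; by id; run;

/* One record per id; in_samp is 1 when the id is found in samp1. */
data merge2;
  merge hsb2 samp1 (in = in_samp);
  by id;
  selected = in_samp;   /* keep the flag: 1 = sampled, 0 = not sampled */
run;

/* Optionally split the cases into two data sets, e.g. for validation. */
data train valid;
  set merge2;
  if selected then output train;
  else output valid;
run;
```

With the sample drawn in this example, merge2 should hold one record for each of the 200 cases, with selected equal to 1 for the 101 sampled cases (46 + 55) and SelectionProb and SamplingWeight missing for the rest.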
Example 2: Using more than one stratification variable
In this example, we will use two stratification variables. The variable female has two levels and the variable ses has three levels, so together they define six strata. As before, we sort the original data set on the strata variables (the sort below also lists prog, which is not required), and then run proc means to see what the variables look like before we draw the sample.
proc sort data = hsb2;
  by female ses prog;
run;

proc means data = hsb2;
  by female ses;
run;

female=0 ses=1

Variable          N         Mean      Std Dev      Minimum      Maximum
-----------------------------------------------------------------------
id               15   79.2000000   57.7262010    3.0000000  169.0000000
race             15    3.1333333    1.2459458    1.0000000    4.0000000
schtyp           15    1.0000000            0    1.0000000    1.0000000
prog             15    1.8000000    0.8618916    1.0000000    3.0000000
read             15   49.3333333    9.0999738   36.0000000   63.0000000
write            15   46.6000000    9.0301084   31.0000000   65.0000000
math             15   47.6000000    6.7802233   39.0000000   63.0000000
science          15   49.8000000   12.9735996   31.0000000   69.0000000
socst            15   43.3333333    9.9618319   26.0000000   57.0000000
-----------------------------------------------------------------------

female=0 ses=2

Variable          N         Mean      Std Dev      Minimum      Maximum
-----------------------------------------------------------------------
id               47  109.6808511   64.3256343    7.0000000  200.0000000
race             47    3.3829787    1.1142007    1.0000000    4.0000000
schtyp           47    1.2127660    0.4136881    1.0000000    2.0000000
prog             47    2.1063830    0.7293250    1.0000000    3.0000000
read             47   52.1702128   10.6185219   31.0000000   73.0000000
write            47   49.5531915   10.1570462   31.0000000   67.0000000
math             47   53.4680851   10.5662528   35.0000000   75.0000000
science          47   53.4042553   10.2780043   34.0000000   74.0000000
socst            47   50.7872340   10.8826471   26.0000000   71.0000000
-----------------------------------------------------------------------

female=0 ses=3

Variable          N         Mean      Std Dev      Minimum      Maximum
-----------------------------------------------------------------------
id               29  113.9310345   52.4934901   14.0000000  199.0000000
race             29    3.6551724    0.9364012    1.0000000    4.0000000
schtyp           29    1.1379310    0.3509312    1.0000000    2.0000000
prog             29    2.0000000    0.5345225    1.0000000    3.0000000
read             29   55.6896552   10.6035824   34.0000000   76.0000000
write            29   52.8620690   10.7760453   33.0000000   67.0000000
math             29   54.8620690    8.6177729   38.0000000   71.0000000
science          29   54.7241379   10.1906699   26.0000000   69.0000000
socst            29   57.7931034    9.5595103   31.0000000   71.0000000
-----------------------------------------------------------------------

female=1 ses=1

Variable          N         Mean      Std Dev      Minimum      Maximum
-----------------------------------------------------------------------
id               32   72.3750000   51.6444107    1.0000000  173.0000000
race             32    3.0312500    1.1495967    1.0000000    4.0000000
schtyp           32    1.0625000    0.2459347    1.0000000    2.0000000
prog             32    1.9687500    0.7398507    1.0000000    3.0000000
read             32   47.7812500    9.5570760   28.0000000   68.0000000
write            32   52.5000000    9.2387682   35.0000000   65.0000000
math             32   49.9062500    9.7164954   39.0000000   72.0000000
science          32   46.7187500    9.0419609   29.0000000   63.0000000
socst            32   49.1875000   10.8700165   26.0000000   71.0000000
-----------------------------------------------------------------------

female=1 ses=2

Variable          N         Mean      Std Dev      Minimum      Maximum
-----------------------------------------------------------------------
id               48  105.6666667   55.6372496    2.0000000  193.0000000
race             48    3.5833333    0.9415545    1.0000000    4.0000000
schtyp           48    1.1875000    0.3944428    1.0000000    2.0000000
prog             48    2.1250000    0.7329625    1.0000000    3.0000000
read             48   51.0000000    8.1632284   36.0000000   71.0000000
write            48   54.2500000    7.3296251   39.0000000   67.0000000
math             48   50.9791667    7.9157521   33.0000000   72.0000000
science          48   50.0416667    7.0889736   36.0000000   66.0000000
socst            48   53.2500000    8.9407030   31.0000000   71.0000000
-----------------------------------------------------------------------

female=1 ses=3

Variable          N         Mean      Std Dev      Minimum      Maximum
-----------------------------------------------------------------------
id               29  105.6896552   53.7720742   26.0000000  198.0000000
race             29    3.6206897    0.8200084    1.0000000    4.0000000
schtyp           29    1.2413793    0.4354942    1.0000000    2.0000000
prog             29    1.9310345    0.5298945    1.0000000    3.0000000
read             29   57.3103448   11.2348420   36.0000000   76.0000000
write            29   58.9655172    6.7901334   36.0000000   67.0000000
math             29   57.4827586    8.7162438   42.0000000   71.0000000
science          29   56.1724138    9.5058965   31.0000000   69.0000000
socst            29   56.4827586   10.4765749   31.0000000   71.0000000
-----------------------------------------------------------------------
The same options are used as in Example 1; the only change is that the strata statement now lists both female and ses.
proc surveyselect data = hsb2 out = samp2 method = srs
  samprate = .5 seed = 9876;
  strata female ses;
run;

proc sort data = samp2;
  by female ses prog;
run;

proc means data = samp2;
  by female ses;
run;

female=0 ses=1

Variable          N         Mean      Std Dev      Minimum      Maximum
-----------------------------------------------------------------------
id                8   74.0000000   43.0614179   16.0000000  134.0000000
race              8    3.2500000    1.1649647    1.0000000    4.0000000
schtyp            8    1.0000000            0    1.0000000    1.0000000
prog              8    1.6250000    0.9161254    1.0000000    3.0000000
read              8   49.2500000    7.5545634   42.0000000   63.0000000
write             8   42.8750000    6.3569422   31.0000000   52.0000000
math              8   45.6250000    6.2549980   39.0000000   59.0000000
science           8   47.3750000   10.1409706   34.0000000   65.0000000
socst             8   43.3750000   10.1409706   26.0000000   57.0000000
SelectionProb     8    0.5333333            0    0.5333333    0.5333333
SamplingWeight    8    1.8750000            0    1.8750000    1.8750000
-----------------------------------------------------------------------

female=0 ses=2

Variable          N         Mean      Std Dev      Minimum      Maximum
-----------------------------------------------------------------------
id               24  120.0000000   69.1626941    9.0000000  200.0000000
race             24    3.3750000    1.0959411    1.0000000    4.0000000
schtyp           24    1.2916667    0.4643056    1.0000000    2.0000000
prog             24    1.9166667    0.7172815    1.0000000    3.0000000
read             24   52.3750000    9.3799625   34.0000000   68.0000000
write            24   49.5833333    9.5549517   31.0000000   62.0000000
math             24   52.1666667   10.6185509   35.0000000   75.0000000
science          24   52.3333333    9.5492621   36.0000000   74.0000000
socst            24   51.4166667   11.2207752   26.0000000   71.0000000
SelectionProb    24    0.5106383            0    0.5106383    0.5106383
SamplingWeight   24    1.9583333            0    1.9583333    1.9583333
-----------------------------------------------------------------------

female=0 ses=3

Variable          N         Mean      Std Dev      Minimum      Maximum
-----------------------------------------------------------------------
id               15  112.4666667   58.7632862   15.0000000  199.0000000
race             15    3.5333333    1.0600988    1.0000000    4.0000000
schtyp           15    1.2000000    0.4140393    1.0000000    2.0000000
prog             15    2.0666667    0.4577377    1.0000000    3.0000000
read             15   56.5333333    9.8913141   39.0000000   76.0000000
write            15   54.1333333    9.4405407   38.0000000   67.0000000
math             15   55.2666667    7.4782224   39.0000000   64.0000000
science          15   54.6666667   10.8210553   26.0000000   66.0000000
socst            15   57.4666667    8.5345237   42.0000000   71.0000000
SelectionProb    15    0.5172414            0    0.5172414    0.5172414
SamplingWeight   15    1.9333333            0    1.9333333    1.9333333
-----------------------------------------------------------------------

female=1 ses=1

Variable          N         Mean      Std Dev      Minimum      Maximum
-----------------------------------------------------------------------
id               16   75.5000000   48.5166638    1.0000000  161.0000000
race             16    3.1250000    1.1474610    1.0000000    4.0000000
schtyp           16    1.0625000    0.2500000    1.0000000    2.0000000
prog             16    2.0625000    0.8539126    1.0000000    3.0000000
read             16   45.0000000    9.9866578   28.0000000   61.0000000
write            16   49.8125000    8.5496101   35.0000000   62.0000000
math             16   46.9375000    8.3224095   39.0000000   72.0000000
science          16   45.0000000    8.5634884   29.0000000   61.0000000
socst            16   47.7500000   11.9749739   26.0000000   66.0000000
SelectionProb    16    0.5000000            0    0.5000000    0.5000000
SamplingWeight   16    2.0000000            0    2.0000000    2.0000000
-----------------------------------------------------------------------

female=1 ses=2

Variable          N         Mean      Std Dev      Minimum      Maximum
-----------------------------------------------------------------------
id               24  121.9583333   57.7404079   13.0000000  193.0000000
race             24    3.7083333    0.7506036    1.0000000    4.0000000
schtyp           24    1.2916667    0.4643056    1.0000000    2.0000000
prog             24    2.2916667    0.7506036    1.0000000    3.0000000
read             24   49.7500000    6.6217691   36.0000000   65.0000000
write            24   53.6666667    7.2090925   41.0000000   67.0000000
math             24   49.2500000    7.2306714   37.0000000   63.0000000
science          24   49.8750000    6.4626855   39.0000000   61.0000000
socst            24   52.0833333    8.9632907   31.0000000   71.0000000
SelectionProb    24    0.5000000            0    0.5000000    0.5000000
SamplingWeight   24    2.0000000            0    2.0000000    2.0000000
-----------------------------------------------------------------------

female=1 ses=3

Variable          N         Mean      Std Dev      Minimum      Maximum
-----------------------------------------------------------------------
id               15  116.0000000   55.5504918   26.0000000  194.0000000
race             15    3.7333333    0.5936168    2.0000000    4.0000000
schtyp           15    1.2666667    0.4577377    1.0000000    2.0000000
prog             15    1.8666667    0.3518658    1.0000000    2.0000000
read             15   58.8000000    9.6599320   36.0000000   68.0000000
write            15   58.2000000    7.7015768   36.0000000   67.0000000
math             15   58.9333333    9.8522417   42.0000000   71.0000000
science          15   56.2000000    9.9871346   31.0000000   69.0000000
socst            15   57.5333333    8.8790819   39.0000000   71.0000000
SelectionProb    15    0.5172414            0    0.5172414    0.5172414
SamplingWeight   15    1.9333333            0    1.9333333    1.9333333
-----------------------------------------------------------------------
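One quick way to confirm how many cases were drawn from each of the six strata, without reading through six proc means tables, is to cross-tabulate the strata variables in the sampled data set. This is a minimal sketch that reuses samp2 from above and the same proc freq options used elsewhere on this page.

```sas
/* Count the sampled cases in each female-by-ses stratum. */
proc freq data = samp2;
  tables female*ses / nopercent norow nocol;
run;
```

The cell counts should agree with the N column of the proc means output above: 8, 24 and 15 for female=0, and 16, 24 and 15 for female=1.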
Example 3: Using a different sampling rate for each stratum
In the examples above, we sampled from every stratum at the same rate. However, sometimes you want to sample more heavily from one stratum than from another. You can specify a different sampling rate for each stratum by listing the rates in parentheses on the samprate option. The rates are matched to the strata in the order in which the strata appear in the sorted data set. Let’s first take a look at the cell counts for the strata variables female and ses.
proc freq data = hsb2;
  table female*ses / nopercent norow nocol;
run;

Table of FEMALE by SES

FEMALE     SES

Frequency|       1|       2|       3|  Total
---------+--------+--------+--------+
       0 |     15 |     47 |     29 |     91
---------+--------+--------+--------+
       1 |     32 |     48 |     29 |    109
---------+--------+--------+--------+
Total          47       95       58      200
The table below gives sampling rates we will use for each of the cells above.
            ses=1   ses=2   ses=3
female=0     .70     .50     .70
female=1     .70     .50     .70

proc sort data = hsb2;
  by female ses;
run;

proc surveyselect data = hsb2 out = samp3 method = srs
  samprate = (.7 .5 .7 .7 .5 .7) seed = 9876;
  strata female ses;
run;

proc sort data = samp3;
  by female ses;
run;

proc means data = samp3;
  by female ses;
run;

Note that the SelectionProb values in the listing below (0.2, 0.81, 0.41, 0.125, 0.5 and 0.31) do not correspond to the rates of .7 and .5 requested in the code, so this listing appears to come from a run that used different sampling rates. Running the code as written should give selection probabilities close to .7 and .5.

female=0 ses=1

The MEANS Procedure

Variable          N         Mean      Std Dev      Minimum      Maximum
-----------------------------------------------------------------------
id                3  126.6666667   42.4774450   81.0000000  165.0000000
race              3    4.0000000            0    4.0000000    4.0000000
schtyp            3    1.0000000            0    1.0000000    1.0000000
prog              3    2.0000000    1.0000000    1.0000000    3.0000000
read              3   47.6666667   13.8684294   36.0000000   63.0000000
write             3   45.3333333    3.2145503   43.0000000   49.0000000
math              3   50.6666667   10.4083300   39.0000000   59.0000000
science           3   53.3333333   16.8621865   34.0000000   65.0000000
socst             3   42.0000000    5.2915026   36.0000000   46.0000000
SelectionProb     3    0.2000000            0    0.2000000    0.2000000
SamplingWeight    3    5.0000000            0    5.0000000    5.0000000
-----------------------------------------------------------------------

female=0 ses=2

Variable          N         Mean      Std Dev      Minimum      Maximum
-----------------------------------------------------------------------
id               38   97.1052632   63.5396405    7.0000000  195.0000000
race             38    3.2368421    1.1953561    1.0000000    4.0000000
schtyp           38    1.1315789    0.3425700    1.0000000    2.0000000
prog             38    2.1052632    0.7636850    1.0000000    3.0000000
read             38   50.6315789   10.3019837   31.0000000   73.0000000
write            38   48.9473684   10.2111422   31.0000000   67.0000000
math             38   52.0789474   10.0629950   35.0000000   75.0000000
science          38   52.5526316   10.6128361   34.0000000   74.0000000
socst            38   49.8157895   10.8698822   26.0000000   66.0000000
SelectionProb    38    0.8085106            0    0.8085106    0.8085106
SamplingWeight   38    1.2368421            0    1.2368421    1.2368421
-----------------------------------------------------------------------

female=0 ses=3

Variable          N         Mean      Std Dev      Minimum      Maximum
-----------------------------------------------------------------------
id               12   94.4166667   55.1896867   14.0000000  192.0000000
race             12    3.2500000    1.3568011    1.0000000    4.0000000
schtyp           12    1.0833333    0.2886751    1.0000000    2.0000000
prog             12    2.0833333    0.5149287    1.0000000    3.0000000
read             12   57.1666667   12.7552722   34.0000000   76.0000000
write            12   53.8333333    8.5899871   39.0000000   67.0000000
math             12   56.5000000    8.5652575   39.0000000   71.0000000
science          12   54.0000000   12.0075734   26.0000000   66.0000000
socst            12   59.0000000    8.7282196   42.0000000   71.0000000
SelectionProb    12    0.4137931            0    0.4137931    0.4137931
SamplingWeight   12    2.4166667            0    2.4166667    2.4166667
-----------------------------------------------------------------------

female=1 ses=1

Variable          N         Mean      Std Dev      Minimum      Maximum
-----------------------------------------------------------------------
id                4   81.7500000   67.3170360   19.0000000  173.0000000
race              4    3.0000000    1.4142136    1.0000000    4.0000000
schtyp            4    1.0000000            0    1.0000000    1.0000000
prog              4    1.7500000    0.9574271    1.0000000    3.0000000
read              4   39.5000000    9.8826447   28.0000000   50.0000000
write             4   49.5000000   11.6761866   35.0000000   62.0000000
math              4   47.0000000    9.4868330   40.0000000   61.0000000
science           4   48.0000000   12.1928941   34.0000000   63.0000000
socst             4   44.0000000    8.7177979   33.0000000   51.0000000
SelectionProb     4    0.1250000            0    0.1250000    0.1250000
SamplingWeight    4    8.0000000            0    8.0000000    8.0000000
-----------------------------------------------------------------------

female=1 ses=2

Variable          N         Mean      Std Dev      Minimum      Maximum
-----------------------------------------------------------------------
id               24   97.0833333   52.4867823    2.0000000  186.0000000
race             24    3.5000000    1.0632191    1.0000000    4.0000000
schtyp           24    1.1250000    0.3378320    1.0000000    2.0000000
prog             24    2.2500000    0.6756639    1.0000000    3.0000000
read             24   53.9166667    8.9438668   36.0000000   71.0000000
write            24   56.0416667    7.0924281   41.0000000   65.0000000
math             24   51.2916667    8.4774851   33.0000000   72.0000000
science          24   51.3750000    8.0369392   39.0000000   66.0000000
socst            24   54.9583333    8.3378611   41.0000000   71.0000000
SelectionProb    24    0.5000000            0    0.5000000    0.5000000
SamplingWeight   24    2.0000000            0    2.0000000    2.0000000
-----------------------------------------------------------------------

female=1 ses=3

Variable          N         Mean      Std Dev      Minimum      Maximum
-----------------------------------------------------------------------
id                9   83.0000000   62.3959133   26.0000000  188.0000000
race              9    3.1111111    1.1666667    1.0000000    4.0000000
schtyp            9    1.3333333    0.5000000    1.0000000    2.0000000
prog              9    2.3333333    0.5000000    2.0000000    3.0000000
read              9   61.7777778    9.8460370   50.0000000   73.0000000
write             9   61.0000000    6.6520673   49.0000000   67.0000000
math              9   60.0000000    7.4161985   45.0000000   69.0000000
science           9   57.6666667    5.2440442   47.0000000   66.0000000
socst             9   57.6666667   11.4564392   36.0000000   71.0000000
SelectionProb     9    0.3103448            0    0.3103448    0.3103448
SamplingWeight    9    3.2222222            0    3.2222222    3.2222222
-----------------------------------------------------------------------
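A list of rates in parentheses has to follow the sort order of the strata, which is easy to get wrong when there are many cells. As an alternative sketch, proc surveyselect also accepts a data set name on samprate =, with the rates stored in a variable named _RATE_ alongside the strata variables. The names rates and samp3b are ours, and we assume female and ses are numeric in hsb2 with the same length as in the rates data set; check the documentation for your SAS release before relying on this form.

```sas
/* One row per stratum; _rate_ holds the sampling rate for that stratum. */
/* The rates are the ones given in the table of rates for this example.  */
data rates;
  input female ses _rate_;
  datalines;
0 1 .7
0 2 .5
0 3 .7
1 1 .7
1 2 .5
1 3 .7
;
run;

/* Both data sets must list the strata in the same order. */
proc sort data = hsb2;
  by female ses;
run;

proc surveyselect data = hsb2 out = samp3b method = srs
  samprate = rates seed = 9876;
  strata female ses;
run;
```

Keeping the rates in a data set makes each rate's stratum explicit, at the cost of one extra data step.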
Example 4: Specifying the number of observations to be sampled
You can instead specify the number of observations to be sampled from each stratum. Rather than using the samprate option, use the n = option and list the sample sizes in parentheses, in the same order as the strata.
proc sort data = hsb2;
  by female ses;
run;

proc surveyselect data = hsb2 out = samp4 method = srs
  n = (11 24 21 23 24 21) seed = 9876;
  strata female ses;
run;

proc sort data = samp4;
  by female ses;
run;

proc means data = samp4;
  by female ses;
run;

female=0 ses=1

The MEANS Procedure

Variable          N         Mean      Std Dev      Minimum      Maximum
-----------------------------------------------------------------------
id               11   65.0000000   57.5742998    3.0000000  169.0000000
race             11    2.8181818    1.3280197    1.0000000    4.0000000
schtyp           11    1.0000000            0    1.0000000    1.0000000
prog             11    1.5454545    0.6875517    1.0000000    3.0000000
read             11   52.3636364    8.3339394   42.0000000   63.0000000
write            11   48.1818182    9.7551851   31.0000000   65.0000000
math             11   48.3636364    7.0038950   41.0000000   63.0000000
science          11   52.4545455   12.4929071   31.0000000   69.0000000
socst            11   46.3636364    8.9361371   31.0000000   57.0000000
SelectionProb    11    0.7333333            0    0.7333333    0.7333333
SamplingWeight   11    1.3636364            0    1.3636364    1.3636364
-----------------------------------------------------------------------

female=0 ses=2

Variable          N         Mean      Std Dev      Minimum      Maximum
-----------------------------------------------------------------------
id               24  100.2500000   66.1107437    7.0000000  195.0000000
race             24    3.2916667    1.1601786    1.0000000    4.0000000
schtyp           24    1.2083333    0.4148511    1.0000000    2.0000000
prog             24    2.0000000    0.7801895    1.0000000    3.0000000
read             24   50.6250000    8.5201169   34.0000000   63.0000000
write            24   48.5833333    9.8241790   31.0000000   65.0000000
math             24   51.0416667    8.3690900   35.0000000   66.0000000
science          24   50.7083333    8.3222027   36.0000000   66.0000000
socst            24   48.7083333   10.4235485   26.0000000   66.0000000
SelectionProb    24    0.5106383            0    0.5106383    0.5106383
SamplingWeight   24    1.9583333            0    1.9583333    1.9583333
-----------------------------------------------------------------------

female=0 ses=3

Variable          N         Mean      Std Dev      Minimum      Maximum
-----------------------------------------------------------------------
id               21  125.9523810   49.3786150   20.0000000  199.0000000
race             21    3.8095238    0.6796358    1.0000000    4.0000000
schtyp           21    1.1904762    0.4023739    1.0000000    2.0000000
prog             21    1.9523810    0.4976134    1.0000000    3.0000000
read             21   55.0476190    8.6340963   39.0000000   73.0000000
write            21   52.1428571   10.8778937   33.0000000   67.0000000
math             21   54.5714286    7.8394606   38.0000000   71.0000000
science          21   55.6190476    8.4289750   36.0000000   69.0000000
socst            21   57.2380952   10.1828521   31.0000000   71.0000000
SelectionProb    21    0.7241379            0    0.7241379    0.7241379
SamplingWeight   21    1.3809524            0    1.3809524    1.3809524
-----------------------------------------------------------------------

female=1 ses=1

Variable          N         Mean      Std Dev      Minimum      Maximum
-----------------------------------------------------------------------
id               23   77.5652174   52.5537180    1.0000000  173.0000000
race             23    3.1739130    1.0724727    1.0000000    4.0000000
schtyp           23    1.0869565    0.2881041    1.0000000    2.0000000
prog             23    2.1304348    0.7570486    1.0000000    3.0000000
read             23   48.9130435    9.0599671   34.0000000   65.0000000
write            23   53.7391304    9.7057499   35.0000000   65.0000000
math             23   50.9565217   10.7765666   40.0000000   72.0000000
science          23   47.4782609    9.4670220   29.0000000   63.0000000
socst            23   49.2608696   11.7055441   26.0000000   71.0000000
SelectionProb    23    0.7187500            0    0.7187500    0.7187500
SamplingWeight   23    1.3913043            0    1.3913043    1.3913043
-----------------------------------------------------------------------

female=1 ses=2

Variable          N         Mean      Std Dev      Minimum      Maximum
-----------------------------------------------------------------------
id               24  108.2083333   53.9234394   13.0000000  193.0000000
race             24    3.7083333    0.8064504    1.0000000    4.0000000
schtyp           24    1.2083333    0.4148511    1.0000000    2.0000000
prog             24    2.0833333    0.7172815    1.0000000    3.0000000
read             24   51.0833333    7.1439647   42.0000000   71.0000000
write            24   55.1666667    7.2751314   41.0000000   67.0000000
math             24   51.8333333    8.8251945   38.0000000   72.0000000
science          24   49.0416667    7.4043417   36.0000000   66.0000000
socst            24   53.5000000    9.2077472   31.0000000   71.0000000
SelectionProb    24    0.5000000            0    0.5000000    0.5000000
SamplingWeight   24    2.0000000            0    2.0000000    2.0000000
-----------------------------------------------------------------------

female=1 ses=3

Variable          N         Mean      Std Dev      Minimum      Maximum
-----------------------------------------------------------------------
id               21  109.5238095   60.1361946   26.0000000  198.0000000
race             21    3.5238095    0.9283883    1.0000000    4.0000000
schtyp           21    1.3333333    0.4830459    1.0000000    2.0000000
prog             21    1.9047619    0.4364358    1.0000000    3.0000000
read             21   57.8571429   12.0842282   36.0000000   76.0000000
write            21   60.7619048    4.7634521   52.0000000   67.0000000
math             21   58.5238095    9.0808537   42.0000000   71.0000000
science          21   57.0952381    8.5726586   34.0000000   69.0000000
socst            21   57.4761905    9.9880881   31.0000000   71.0000000
SelectionProb    21    0.7241379            0    0.7241379    0.7241379
SamplingWeight   21    1.3809524            0    1.3809524    1.3809524
-----------------------------------------------------------------------
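The text does not say where the sample sizes 11, 24, 21, 23, 24 and 21 come from. They equal the stratum counts from the proc freq table of female by ses multiplied by rates of .7, .5 and .7 and rounded up to the next whole number. The sketch below reproduces that arithmetic in a data step; the data set name alloc and the variables N_h, rate and n_h are ours, and the rounding-up rule is an assumption that happens to match these numbers.

```sas
/* Stratum counts are taken from the proc freq table of female by ses. */
data alloc;
  input female ses N_h rate;
  n_h = ceil(N_h * rate);   /* round up to a whole number of cases */
  datalines;
0 1 15 .7
0 2 47 .5
0 3 29 .7
1 1 32 .7
1 2 48 .5
1 3 29 .7
;
run;

proc print data = alloc;
run;
```

The n_h column should read 11, 24, 21, 23, 24 and 21, matching the values passed to the n = option above. Dividing each by its stratum count gives the SelectionProb values in the output, for example 11/15 = 0.7333333.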