Sampling with replacement is easy to do, while sampling without replacement can be a bit trickier. We will perform sampling with replacement using several Mata functions. The dataset used in our examples has two variables: 1) y, the variable to be sampled, and 2) grp, which can be treated as a stratum or cluster identifier.
Sampling with replacement for an entire sample
We will begin by randomly sampling from the entire sample. Here’s how the code fragment works. First, we move a copy of y to Mata using putmata. Next, we set the random seed and compute the number of observations in y using rows(). Then we create a vector, p, that holds an equal probability for each of the observations. The rdiscrete() function is the key to this example: it randomly generates integer values between 1 and n, where n is the number of observations, with the probabilities given in p. Finally, we use index as the row subscript in the matrix assignment statement to pick out the sampled values of y.
clear
input grp y
1 107
1 115
1 132
1 118
1 151
2 219
2 287
2 212
2 235
2 241
2 280
3 333
3 321
3 372
3 316
3 345
3 338
end

list, sep(0)

     +-----------+
     | grp     y |
     |-----------|
  1. |   1   107 |
  2. |   1   115 |
  3. |   1   132 |
  4. |   1   118 |
  5. |   1   151 |
  6. |   2   219 |
  7. |   2   287 |
  8. |   2   212 |
  9. |   2   235 |
 10. |   2   241 |
 11. |   2   280 |
 12. |   3   333 |
 13. |   3   321 |
 14. |   3   372 |
 15. |   3   316 |
 16. |   3   345 |
 17. |   3   338 |
     +-----------+

mata: mata clear
putmata y

mata:
rseed(76543219)
yrows = rows(y)
p = J(yrows,1,1/yrows)
index = rdiscrete(yrows,1,p)
index

         1
      +------+
    1 |  15  |
    2 |  11  |
    3 |   2  |
    4 |   6  |
    5 |   3  |
    6 |  17  |
    7 |  10  |
    8 |   6  |
    9 |   1  |
   10 |   8  |
   11 |   9  |
   12 |  16  |
   13 |   3  |
   14 |  14  |
   15 |   8  |
   16 |  16  |
   17 |  17  |
      +------+

ywr = y[index[,1],1]
end

getmata ywr
list, sep(0)

     +-----------------+
     | grp     y   ywr |
     |-----------------|
  1. |   1   107   316 |
  2. |   1   115   280 |
  3. |   1   132   115 |
  4. |   1   118   219 |
  5. |   1   151   132 |
  6. |   2   219   338 |
  7. |   2   287   241 |
  8. |   2   212   219 |
  9. |   2   235   107 |
 10. |   2   241   212 |
 11. |   2   280   235 |
 12. |   3   333   345 |
 13. |   3   321   132 |
 14. |   3   372   372 |
 15. |   3   316   212 |
 16. |   3   345   345 |
 17. |   3   338   338 |
     +-----------------+
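When every observation has the same probability of selection, as in the example above, the row indices can also be drawn directly as uniform random integers, without building a probability vector. The following is a minimal sketch of that alternative, not the method used above. It assumes Stata 14 or later, where Mata's runiformint() is available. The names index2 and ywr2 are illustrative, and the draws will generally differ from those produced by rdiscrete() even with the same seed.

```stata
clear
input grp y
1 107
1 115
1 132
1 118
1 151
2 219
2 287
2 212
2 235
2 241
2 280
3 333
3 321
3 372
3 316
3 345
3 338
end

putmata y, replace

mata:
rseed(76543219)
yrows = rows(y)
// assumption: runiformint() requires Stata 14 or later
// draw yrows integers between 1 and yrows, each equally likely
index2 = runiformint(yrows, 1, 1, yrows)
// use the drawn integers as row subscripts into y
ywr2 = y[index2, 1]
end

getmata ywr2, replace
list, sep(0)
```

rdiscrete() remains the more general tool, because the vector p does not have to contain equal probabilities.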
Sampling with replacement within strata or clusters
For sampling with replacement within strata or clusters, we will make use of moremata, a user-written collection of Mata functions (Jann, 2005). To get moremata, just type ssc install moremata in Stata’s command window. The two functions we will use are _mm_panels() and mm_sample().
Here’s how this code fragment works. Note: the sort grp command is very important, because we want all of the observations from a given stratum or cluster to be grouped together. The _mm_panels() function creates a vector holding the size of each stratum or cluster. The mm_sample() function works in a similar manner to the built-in function rdiscrete(), i.e., it creates a vector of row indices, which in this case are generated at random within each stratum or cluster.
We will continue using the same dataset that was used in the first example.
clear
input grp y
1 107
1 115
1 132
1 118
1 151
2 219
2 287
2 212
2 235
2 241
2 280
3 333
3 321
3 372
3 316
3 345
3 338
end

sort grp
list, sep(0)

     +-----------+
     | grp     y |
     |-----------|
  1. |   1   107 |
  2. |   1   115 |
  3. |   1   132 |
  4. |   1   118 |
  5. |   1   151 |
  6. |   2   219 |
  7. |   2   287 |
  8. |   2   212 |
  9. |   2   235 |
 10. |   2   241 |
 11. |   2   280 |
 12. |   3   333 |
 13. |   3   321 |
 14. |   3   372 |
 15. |   3   316 |
 16. |   3   345 |
 17. |   3   338 |
     +-----------+

mata: mata clear
putmata y
putmata grp

mata:
rseed(87654321)
yrows = rows(y)
grpn = _mm_panels(grp)       // moremata function
grpn

        1
     +-----+
   1 |  5  |
   2 |  6  |
   3 |  6  |
     +-----+

index = mm_sample(.,grpn)    // moremata function
ywr = y[index[,1],1]
end

getmata ywr
list, sep(0)

     +-----------------+
     | grp     y   ywr |
     |-----------------|
  1. |   1   107   132 |
  2. |   1   115   151 |
  3. |   1   132   118 |
  4. |   1   118   115 |
  5. |   1   151   132 |
  6. |   2   219   219 |
  7. |   2   287   212 |
  8. |   2   212   241 |
  9. |   2   235   212 |
 10. |   2   241   280 |
 11. |   2   280   219 |
 12. |   3   333   321 |
 13. |   3   321   338 |
 14. |   3   372   333 |
 15. |   3   316   333 |
 16. |   3   345   333 |
 17. |   3   338   345 |
     +-----------------+
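If moremata is not available, the same idea can be sketched with built-in Mata functions only. The sketch below is an illustration under stated assumptions, not a description of how mm_sample() works internally. It uses panelsetup() to find the first and last row of each stratum and runiformint() (assumed to require Stata 14 or later) to draw row numbers with replacement inside that range. The names info, first, last, index2 and ywr2 are illustrative, and the draws will generally differ from those shown above.

```stata
clear
input grp y
1 107
1 115
1 132
1 118
1 151
2 219
2 287
2 212
2 235
2 241
2 280
3 333
3 321
3 372
3 316
3 345
3 338
end

sort grp                     // strata must be contiguous for panelsetup()
putmata y grp, replace

mata:
rseed(87654321)
// one row per stratum: column 1 = first row, column 2 = last row
info = panelsetup(grp, 1)
index2 = J(rows(y), 1, .)
for (i = 1; i <= rows(info); i++) {
    first = info[i, 1]
    last  = info[i, 2]
    // draw as many row numbers as the stratum has observations,
    // with replacement, restricted to the rows of that stratum
    index2[|first \ last|] = runiformint(last - first + 1, 1, first, last)
}
ywr2 = y[index2, 1]
end

getmata ywr2, replace
list, sep(0)
```

Because each block of indices is drawn only from its own stratum's rows, every value of ywr2 comes from the same grp as the observation it is stored next to.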
Jann, B. (2005). moremata: Stata module (Mata) to provide various functions. Available from http://ideas.repec.org/c/boc/bocode/s455001.html.