For a 2-level hierarchical model, HLM requires two data files, one for level-1 and one for level-2. Similarly, for a 3-level hierarchical model, HLM requires three data files. This page shows some examples of how to convert a single Stata file into multiple data files for analysis in HLM.
When a single data file contains both level-1 and level-2 variables, we have to extract two files from it: one with all the level-1 variables of interest and the other with all the level-2 variables of interest. The level-2 unit identifier serves as the linking variable that ties the level-1 and level-2 data files together, so it has to exist in both files.
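The linking requirement can be checked in Stata once the two files exist. The following is only a minimal sketch meant to be run as a do-file; the file names level1.dta and level2.dta and the identifier name id are placeholders chosen for illustration, not names required by HLM.

```stata
* Assumed (hypothetical) file names: level1.dta and level2.dta.
* Assumed name of the linking variable: id.
use level2, clear
isid id                       // level-2 file: exactly one record per level-2 unit
merge 1:m id using level1     // attach the level-1 records to their level-2 unit
assert _merge == 3            // every unit is present in both files
```

If either the isid or the assert line stops with an error, the two files do not link cleanly and should be rebuilt before they are read into HLM.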
Example 1: Two data sets for 2-level modeling
We use HLM's example data set hsball.dta to demonstrate how to extract two data sets from a single data set that contains both level-1 and level-2 variables. This data set consists of student-level variables and school-level variables, so the level-1 data set is the student-level data set and the level-2 data set is the school-level data set. The linking variable is the school identifier, called id. Both the level-1 and the level-2 data sets should be sorted by school. The command "unique" is used here to check that the variables we suspect to be level-2 variables are indeed level-2 variables: if they are constant within each school, the number of unique combinations of id and these variables equals the number of schools (160 in this data set). "unique" is a user-written command; you can download it by following the link shown after typing "search unique". We use two pairs of the Stata commands "preserve" and "restore" so that the original data set is recovered after each extraction. We use the command "collapse" to aggregate the level-2 variables to the school level, and the result is the level-2 data set.
use https://stats.idre.ucla.edu/stat/hlm/faq/hsball, clear
sort id
unique id meanses size sector pracad disclim himinty
  Number of unique values of school meanses size sector pracad disclim himinty is 160
  Number of records is 7185

preserve
drop size sector pracad disclim himinty meanses
save hsb12_level1
  file hsb12_level1.dta saved
restore

preserve
collapse (mean) meanses size sector pracad disclim himinty, by(id)
save hsb12_level2
  file hsb12_level2.dta saved
restore
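If the user-written command "unique" is not installed, the same within-school check can be approximated with built-in commands only. This is a sketch rather than part of the original do file; it assumes the variable names listed above and is meant to be run as a do-file.

```stata
use https://stats.idre.ucla.edu/stat/hlm/faq/hsball, clear

* For each candidate level-2 variable, sort the records within school by
* that variable and compare the first and last value in each school.
* The assert stops with an error if any school has more than one value.
foreach v of varlist meanses size sector pracad disclim himinty {
    bysort id (`v'): assert `v'[1] == `v'[_N]
}
```

When every assert passes, each variable takes a single value per school, so taking its mean in "collapse" simply returns that shared value.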
Click here for the entire do file.
Example 2: Three data sets for 3-level modeling
The data set used in this example is an HLM example data set (Chapter 8). For the purpose of this demonstration, we have combined three separate data sets into a single Stata data set called eg3all.dta. This data set consists of 1721 students nested in 60 schools. The information on students has been collected at multiple time points. Therefore, time is nested in students and students are nested in schools. The school-level variables are size, lowinc and mobility. The student-level variables are female, black and hispanic. The time-level variables are year, grade, math and retained. The variable year is a shifted version of grade (year = grade - 1.5).
use https://stats.idre.ucla.edu/stat/hlm/faq/eg3all, clear
unique childid
  Number of unique values of childid is 1721
  Number of records is 7230
unique schoolid
  Number of unique values of schoolid is 60
  Number of records is 7230
unique grade
  Number of unique values of grade is 6
  Number of records is 7230

preserve
/*extracting level-3 (school level) data*/
drop childid female black hispanic year grade math retained
duplicates drop
  Duplicates in terms of all variables
  (7170 observations deleted)
sort schoolid
save eg3all_level3
  file eg3all_level3.dta saved
restore

preserve
/*extracting level-2 (student level) data*/
drop year grade math retained
duplicates drop
  Duplicates in terms of all variables
  (5509 observations deleted)
sort schoolid childid
count
  1721
save eg3all_level2
  file eg3all_level2.dta saved
restore

preserve
/*extracting level-1 (time level) data*/
drop schoolid female black hispanic size lowinc mobility
sort childid
save eg3all_level1
  file eg3all_level1.dta saved
restore
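Before moving to HLM, it may be worth confirming that the three saved files link to one another as intended. The sketch below is an optional check, not part of the original do file; it assumes the three files saved above are in the current working directory and is meant to be run as a do-file.

```stata
* Level-3 file: one record per school.
use eg3all_level3, clear
isid schoolid

* Level-2 file: one record per child, and every child's school
* must be found in the level-3 file.
use eg3all_level2, clear
isid childid
merge m:1 schoolid using eg3all_level3
assert _merge == 3

* Level-1 file: every time-level record must belong to a child
* that is found in the level-2 file.
use eg3all_level1, clear
merge m:1 childid using eg3all_level2
assert _merge == 3
```

Each assert stops with an error if some record fails to find its match, or if a higher-level unit has no lower-level records, which would point to a problem in the drop or duplicates drop steps.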
Click here for the entire do file.