This module will show how to use date variables, date functions, and date display formats in Stata.
Converting dates from raw data using the "date()" function
The trick to inputting dates in Stata is to forget they are dates, and treat them as character strings, and then later convert them into a Stata date variable. You might have the following date data in your raw data file.
    type dates1.raw

    John  1 Jan 1960
    Mary 11 Jul 1955
    Kate 12 Nov 1962
    Mark  8 Jun 1959
You can read these data by typing:
    infix str name 1-4 str bday 6-17 using dates1.raw
    (4 observations read)
Using the list command, you can see that the date information has been read correctly into bday.
    list

         name         bday
      1. John   1 Jan 1960
      2. Mary  11 Jul 1955
      3. Kate  12 Nov 1962
      4. Mark   8 Jun 1959
Since bday is a string variable, you cannot do any date computations with it until you make a date variable from it. You can generate a date version of bday using the date() function. The example below creates a date variable called birthday from the string variable bday. The syntax differs slightly depending on which version of Stata you are using; the difference is in how the pattern is specified. In Stata 9 it should be lower case (e.g., "dmy"). In Stata 10 it should be upper case for day, month, and year (e.g., "DMY") but lower case for hours, minutes, or seconds (e.g., "DMYhms"). Our data are in the order day, month, year, so we use "DMY" (or "dmy" if you are using Stata 9) within the date() function. (Unless otherwise noted, all other Stata commands on this page are the same for versions 9 and 10.)
In Stata version 9:
generate birthday=date(bday,"dmy")
In Stata version 10:
generate birthday=date(bday,"DMY")
Let’s have a look at both bday and birthday.
    list

         name         bday  birthday
      1. John   1 Jan 1960         0
      2. Mary  11 Jul 1955     -1635
      3. Kate  12 Nov 1962      1046
      4. Mark   8 Jun 1959      -207
The values for birthday may seem confusing. The value of birthday for John is 0 and the value of birthday for Mark is -207. Dates are actually stored as the number of days from Jan 1, 1960, which is convenient for the computer when storing dates and performing date computations, but is difficult for you and me to read.
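As a quick check of this day-count convention, the mdy() and date() functions can be evaluated directly with display. This is a minimal sketch that assumes the "DMY" mask of Stata 10; in Stata 9 the mask would be "dmy".

```stata
* 1 Jan 1960 is day zero
display mdy(1, 1, 1960)
* Mark's birthday, 8 Jun 1959, falls before day zero, so it is negative
display date("8 Jun 1959", "DMY")
* Kate's birthday, 12 Nov 1962, falls after day zero, so it is positive
display date("12 Nov 1962", "DMY")
```

These three lines should print 0, -207, and 1046, the same values listed for John, Mark, and Kate.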
We can tell Stata that birthday should be displayed using the %d format to make it easier for humans to read.
    format birthday %d
    list

         name         bday   birthday
      1. John   1 Jan 1960  01jan1960
      2. Mary  11 Jul 1955  11jul1955
      3. Kate  12 Nov 1962  12nov1962
      4. Mark   8 Jun 1959  08jun1959
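A display format changes only how the stored number is shown, not the number itself. The sketch below illustrates this with display. It assumes Stata 10 or later, where %td is the current name for the daily date format and accepts a detail string such as NN/DD/CCYY.

```stata
* Default daily format
display %td mdy(7, 11, 1955)
* A custom layout: two-digit month, two-digit day, four-digit year
display %tdNN/DD/CCYY mdy(7, 11, 1955)
* With no format, the underlying day count is shown
display mdy(7, 11, 1955)
```

The first line should show 11jul1955 and the last should show -1635, the value stored for Mary.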
The date() function is very flexible and can handle dates written in almost any manner. For example, consider the file dates2.raw.
    type dates2.raw

    John Jan 1 1960
    Mary 07/11/1955
    Kate 11.12.1962
    Mark Jun/8 1959
These dates are messy, but they are consistent. Even though the formats look different, each date is always a month, day, and year separated by a delimiter (e.g., a space, slash, dot, or dash). We can try using the syntax from above to read in our new dates. Note that, as discussed above, for Stata version 10 the order of the date is declared in upper case letters (i.e., "MDY") while for version 9 it is declared in all lower case (i.e., "mdy").
    clear
    infix str name 1-4 str bday 6-17 using dates2.raw
    (4 observations read)
    generate birthday=date(bday,"MDY")
    format birthday %d
    list

         name         bday   birthday
      1. John   Jan 1 1960  01jan1960
      2. Mary   07/11/1955  11jul1955
      3. Kate   11.12.1962  12nov1962
      4. Mark   Jun/8 1959  08jun1959
Stata was able to read those dates without a problem. Let’s try an even tougher set of dates. For example, consider the dates in dates3.raw.
    type dates3.raw

    4-12-1990
    4.12.1990
    Apr 12, 1990
    Apr12,1990
    April 12, 1990
    4/12.1990
    Apr121990
Let’s try reading these dates and see how Stata handles them. Again, remember that for Stata version 10 dates are declared "MDY" while for version 9 they are declared "mdy".
    clear
    infix str bday 1-20 using dates3.raw
    (7 observations read)
    generate birthday=date(bday,"MDY")
    (1 missing value generated)
    format birthday %d
    list

                   bday   birthday
      1.      4-12-1990  12apr1990
      2.      4.12.1990  12apr1990
      3.   Apr 12, 1990  12apr1990
      4.     Apr12,1990  12apr1990
      5. April 12, 1990  12apr1990
      6.      4/12.1990  12apr1990
      7.      Apr121990          .
As you can see, Stata was able to handle almost all of those crazy date formats. It was able to handle Apr12,1990 even though there was no delimiter between the month and day (Stata could figure it out because the month was alphabetic and the day was numeric). The only date that did not work was Apr121990, because there was no delimiter between the day and year. The date() function can handle just about any date as long as there are delimiters separating the month, day, and year. In certain cases Stata can read all-numeric dates entered without delimiters; see help dates for more information.
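Because date() returns a missing value rather than an error when it cannot parse a string, it is worth checking for failed conversions after generating the date variable. The sketch below types in three of the strings from dates3.raw with input instead of reading the file, and uses the Stata 10 mask "MDY".

```stata
clear
input str20 bday
"4-12-1990"
"Apr 12, 1990"
"Apr121990"
end
generate birthday = date(bday, "MDY")
format birthday %td

* Count non-empty strings that failed to convert
count if missing(birthday) & bday != ""
* Show the strings that need attention
list bday if missing(birthday) & bday != ""
```

Going by the output shown earlier, only Apr121990 should be listed.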
Converting dates from raw data using the mdy() function
In some cases, you may have the month, day, and year stored as numeric variables in a dataset. For example, you may have the following data for birth dates from dates4.raw.
    type dates4.raw

     7 11 1948
     1  1 1960
    10 15 1970
    12 10 1971
You can read in this data using the following syntax to create a separate variable for month, day and year.
    clear
    infix month 1-2 day 4-5 year 7-10 using dates4.raw
    (4 observations read)
    list

         month  day  year
      1.     7   11  1948
      2.     1    1  1960
      3.    10   15  1970
      4.    12   10  1971
A Stata date variable can be created using the mdy() function as shown below.
generate birthday=mdy(month,day,year)
Let’s format birthday using the %d format so it displays better.
    format birthday %d
    list

         month  day  year   birthday
      1.     7   11  1948  11jul1948
      2.     1    1  1960  01jan1960
      3.    10   15  1970  15oct1970
      4.    12   10  1971  10dec1971
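The mdy() function also acts as a validity check: when the three arguments do not form a real calendar date, it returns a missing value. A minimal sketch using display:

```stata
* 29 Feb 2000 exists (2000 was a leap year), so a day count is returned
display mdy(2, 29, 2000)
* 30 February never exists, so the result is missing (.)
display mdy(2, 30, 2000)
* 1999 was not a leap year, so 29 Feb 1999 is missing as well
display mdy(2, 29, 1999)
```

Counting missing values in the new date variable after calling mdy() is therefore a simple way to spot data-entry errors in the month, day, or year columns.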
Consider the data in dates5.raw, which is the same as dates4.raw except that only two digits are used to signify the year.
    type dates5.raw

     7 11 48
     1  1 60
    10 15 70
    12 10 71
Let’s try reading these dates just like we read dates4.raw.
    clear
    infix month 1-2 day 4-5 year 7-10 using dates5.raw
    (4 observations read)
    generate birthday=mdy(month,day,year)
    (4 missing values generated)
    format birthday %d
    list

         month  day  year  birthday
      1.     7   11    48         .
      2.     1    1    60         .
      3.    10   15    70         .
      4.    12   10    71         .
As you can see, the values for birthday are all missing. This is because Stata assumes that the years were literally 48, 60, 70 and 71 (it does not assume they are 1948, 1960, 1970 and 1971). You can force Stata to assume the century portion is 1900 by adding 1900 to the year as shown below (note that we use replace instead of generate since the variable birthday already exists).
    replace birthday=mdy(month,day,year+1900)
    (4 real changes made)
    format birthday %d
    list

         month  day  year   birthday
      1.     7   11    48  11jul1948
      2.     1    1    60  01jan1960
      3.    10   15    70  15oct1970
      4.    12   10    71  10dec1971
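Adding 1900 assumes that every two-digit year belongs to the twentieth century. If a file could mix centuries, one option is a cutoff rule built with cond(). The sketch below is an illustration only, not part of the original example: the cutoff of 30, the variable name year4, and the third data row are all made up for the purpose.

```stata
clear
input month day year
7 11 48
1 1 60
3 15 5
end

* Years below the hypothetical cutoff of 30 are read as 20xx, the rest as 19xx
generate year4 = cond(year < 30, 2000 + year, 1900 + year)
generate birthday = mdy(month, day, year4)
format birthday %td
list month day year year4 birthday
```

The right cutoff depends on what the data can plausibly contain, so it should be chosen with the source of the file in mind.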
Computations with elapsed dates
Date variables make computations involving dates very convenient. For example, to calculate everyone’s age on January 1, 2000, simply use the following computation.
    generate age2000=( mdy(1,1,2000) - birthday ) / 365.25
    list

         month  day  year   birthday   age2000
      1.     7   11    48  11jul1948  51.47433
      2.     1    1    60  01jan1960        40
      3.    10   15    70  15oct1970  29.21287
      4.    12   10    71  10dec1971  28.06023
Please note that this formula for age does not work well over very short time spans. For example, the age of a child on his or her first birthday can come out slightly less than one because of the division by 365.25. There are formulas that are more exact but also much more complex. Here is an example courtesy of Dan Blanchette.
generate altage = floor(((ym(2000, 1) - ym(year(birthday), month(birthday))) - (1 < day(birthday))) / 12)
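The same idea can be written for an arbitrary reference date by comparing the month and day directly. The following is one possible sketch, not a formula taken from the text above; the names refdate, notyet, and age_exact are hypothetical, and the data are typed in with input so the block runs on its own.

```stata
clear
input month day year
7 11 1948
1 1 1960
10 15 1970
12 10 1971
end
generate birthday = mdy(month, day, year)
format birthday %td

* Reference date held in a scalar (1 Jan 2000, as in the text)
scalar refdate = mdy(1, 1, 2000)

* 1 if the birthday falls later in the reference year than the reference date
generate byte notyet = (month(birthday) > month(refdate)) | ((month(birthday) == month(refdate)) & (day(birthday) > day(refdate)))

* Completed years of age on the reference date
generate age_exact = year(refdate) - year(birthday) - notyet
list birthday age_exact
```

For these four people the result should be 51, 40, 29, and 28, the whole-number parts of age2000. Under this rule a person born on 29 February is counted as a year older from 1 March in non-leap years.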
Other date functions
Given a date variable, one can have the month, day and year returned separately if desired, using the month(), day() and year() functions, respectively.
    generate m=month(birthday)
    generate d=day(birthday)
    generate y=year(birthday)
    list m d y birthday

          m   d     y   birthday
      1.  7  11  1948  11jul1948
      2.  1   1  1960  01jan1960
      3. 10  15  1970  15oct1970
      4. 12  10  1971  10dec1971
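The extracted parts can be fed back into mdy() to build related dates. As a hedged illustration, the sketch below computes each person's birthday in the year 2000; the variable name bday2000 is hypothetical, and the first two records are typed in with input so the block runs on its own.

```stata
clear
input month day year
7 11 1948
1 1 1960
end
generate birthday = mdy(month, day, year)

* Same month and day as the birth date, but in the year 2000
generate bday2000 = mdy(month(birthday), day(birthday), 2000)
format birthday bday2000 %td
list birthday bday2000
```

If the target year is not a leap year, a 29 February birthday produces a missing value and needs separate handling.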
If you’d like to return the day of the week for a date variable, use the dow() function (where 0=Sunday, 1=Monday etc.).
    gen week_d=dow(birthday)
    list birthday week_d

          birthday  week_d
      1. 11jul1948       0
      2. 01jan1960       5
      3. 15oct1970       4
      4. 10dec1971       5
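The numeric codes from dow() are easier to read once a value label is attached. The sketch below is one way to do this; the label name dowlbl is a hypothetical choice, and two of the birth dates are typed in with input so the block runs on its own.

```stata
clear
input month day year
7 11 1948
1 1 1960
end
generate birthday = mdy(month, day, year)
format birthday %td
generate week_d = dow(birthday)

* Map the codes 0-6 to day names
label define dowlbl 0 "Sunday" 1 "Monday" 2 "Tuesday" 3 "Wednesday" 4 "Thursday" 5 "Friday" 6 "Saturday"
label values week_d dowlbl
list birthday week_d
```

The stored values are still 0 and 5, so numeric comparisons such as week_d == 0 continue to work; only the listing shows Sunday and Friday.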
Summary
The date() function converts strings containing dates to date variables. The syntax varies slightly by version.
In Stata version 9:
gen date2 = date(date, "dmy")
In Stata version 10:
gen date2 = date(date, "DMY")
The mdy() function takes three numeric arguments (month, day, year) and converts them to a date variable.
generate birthday=mdy(month,day,year)
You can display elapsed times as actual dates with display formats such as the %d format.
format birthday %d
Other date functions include the month(), day(), year(), and dow() functions. For online help with dates, type help dates at the command line. For more detailed explanations about how Stata handles dates and date functions, please refer to the Stata User's Guide.