How can I generate a dose response curve SAS?

Measured responses to a set range of doses is a common form of data in medicine and pharmaceutical statistics. Fitting a function to these points allows for estimating expected responses at different doses and are used to determine what levels are considered most effective and what levels might be considered toxic.

When the measured response is binary, the logistic or probit regression can be used to generate a dose response curve. See our Data Analysis Example pages for logistic and probit regression for details on each of these.

If the measured response is continuous (cell counts, hormone concentrations, etc.), then a logistic model with three or more parameters would be appropriate. On this page, we will illustrate the four-parameter logistic model often referred to as an E_max model. There are several slightly different versions of this model. We will demonstrate the parameterization shown in Pharmaceutical Statistics Using SAS: A Practical Guide by Dmitrienko et al. for the expected response, Y, given a dose, X.

Image Emax4

There are four thetas estimated in this model: 3 and 4 are the maximum and minimum response levels, 1 is the concentration where the response is the midpoint between the maximum and minimum (sometimes referred to as ED50), and 2 is the slope parameter, indicating the speed with which the curve rises between the minimum and maximum.

We can look at a small example dataset and fit these four parameters. We will use the nlin procedure in SAS and will need to look at some summary statistics on our dose and response variables to provide reasonable starting values to the procedure. For the starting values of our maximum and minimum response, we can use the actual maximum and minimum values. For the ED50, the median or mean of the doses can be used as a starting value. And 1 is generally a reasonable starting value for the slope parameter.

data dr; input concentration response;
datalines;
.1	21.125
.1	20.575
.25	40.525
.5	26.15
.75	26.35
.75	44.275
1	49.725
1	63.6
10	49.35
10	68.875
100	58.025
100	58.075
1000	68.025
1000	52.3
;


proc means data = dr max min median; 
run;

The MEANS Procedure

Variable              Maximum         Minimum          Median
-------------------------------------------------------------
concentration         1000.00       0.1000000       1.0000000
response           68.8750000      20.5750000      49.5375000
-------------------------------------------------------------

proc nlin data = dr;
  parms top = 68.875 bottom = 20.575 EC50 = 1 hill = 1;
  model response = top + (bottom - top) / 
  (1 + (concentration / EC50)**hill);
run;

                                  Sum of        Mean               Approx
Source                    DF     Squares      Square    F Value    Pr > F

Model                      3      2915.8       971.9      11.64    0.0013
Error                     10       835.2     83.5176
Corrected Total           13      3751.0

The NLIN Procedure

                              Approx
Parameter      Estimate    Std Error    Approximate 95% Confidence Limits

top             59.1022       3.7306     50.7900     67.4144
bottom          27.0807       4.5930     16.8468     37.3146
EC50             0.8171       0.0884      0.6201      1.0141
hill            12.4058      11.8429    -13.9818     38.7933

Using these four parameter estimates, we can find predicted responses for more dose levels than the original 8 and plot the dose response curve.

data a;
  x=.01;
  do i=0 to 200;
    drcurve4 = 59.1022 + ( 27.0807 - 59.1022)/(1 + (x/0.8171)**12.4058);
	output;
	x=x+.01;
  end;
run;

proc sgplot data = a;
  scatter x = x y = drcurve4;
run;

If any of the estimated values are not variable (like a minimum of zero, a maximum of 100, or a slope of one), the proc nlin code can be modified to fit a three parameter model. Most typically, the slope is constrained to one and the minimum, maximum, and ED50 are estimated.

The four parameter E_max model assumes a symmetry to the curve. The estimated value of ED50 is the inflection point of the curve at which this symmetry occurs. However, if you wish to NOT assume a symmetric curve, a five parameter logistic model is an option. In such a model, the denominator of the fraction in the four parameter model is raised to a fifth parameter. Adding a fifth parameter to our proc nlin would result in the code below:

proc nlin data = dr;
  parms top = 68.875 bottom = 20.575 EC50 = 1 hill = 1 theta5 = 1;
  model response = top + (bottom - top) / 
  ((1 + (concentration / EC50)**hill)**theta5);
run;

However, this model is far more difficult to fit than the four parameter model. With our small dataset, SAS is unable to quickly converge on stable estimates. For a thorough comparison on the difference between four and five parameter models, see "The five-parameter logistic: A characterization and comparison with the four parameter logistic", 2005, by Gottschalk and Dunn.

References

Dmitienko, A., Chuang-Stein, C., D’Agostino, B. 2007. Pharmaceutical statistics using SAS: a practical guide. Cary: SAS Publishing.
Gottschalk, P. and Dunn, J. 2005. “The five-parameter logistic: A characterization and comparison with the four-parameter logistic”, Analytical Biochemistry, 343, 1, p54-65.