The chart below shows examples of Generalized Linear Models (GLMs):
In this workshop, we will cover the key concepts of Generalized Linear Models (GLMs) and explore Logistic Regression, Poisson Regression, and briefly, the Negative Binomial model, with examples in R.
Before exploring generalized regression models, it’s helpful to first understand linear regression. This will provide insight into why and how we extend linear regression to generalized linear models (GLMs).
We use \(x\) for the predictor and \(y\) for the response variable.
In a simple regression model we observe \(n\) pairs of data points \((x_i, y_i)\), where \(i = 1, 2, \cdots, n\). These data points are typically randomly and independently sampled from a population of interest. Our goal is to study the relationship between \(x\) and \(y\).
In a linear regression model, we assume that this relationship follows a linear function:
\[y_i = \beta_0 + \beta_1 x_i + \epsilon_i\]
where:
\(\beta_0\) (intercept) and \(\beta_1\) (slope) are the model parameters estimated from the data.
\(\epsilon\) is the error term, representing unexplained variability.
We can extend this to multiple predictors, leading to the multiple linear regression model:
\[y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots +\beta_p x_{ip} + \epsilon_i\]
Here, each \(\beta_j\) represents the effect of predictor \(x_j\) on \(y\).
Linear regression is typically estimated using the Ordinary Least Squares (OLS) method, which finds model parameters by minimizing the sum of squared errors. OLS has an algebraic solution, allowing parameters to be computed analytically.
\[y = \overbrace{\beta_0 + \beta_1 x }^\text{Linear function}+ \overbrace{\epsilon}^\text{Error term}\]
We make the following assumptions about the error term \(\epsilon\):
Mean zero (\(\text{E}(\epsilon)=0\)).
Independence (errors are not correlated).
Constant variance (homoscedasticity).
Normal distribution (\(\epsilon \sim N(0, \sigma ^2)\)).
In other words, we assume that errors are independent and identically distributed (i.i.d.) from a normal distribution with mean zero.
\[\mu = \text{E}(y |x) = \beta_0 + \beta_1 x\]
\[\mu = \text{E}(y | x_1, x_2 , \cdots x_p) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + ... + \beta_p x_p\]
The above expression is also called the linear predictor.
The linear predictor can take any value from \(-\infty\) to \(+\infty\).
While linear regression is a powerful tool, it has limitations. This leads us to more general models:
What if the outcome is not continuous?
What if errors do not have equal variance?
What if predicted values fall outside a plausible range?
These limitations motivate the need for generalized linear models (GLMs), which we will explore next.
For this example, we use elementary school API score data from the OARC website.
We examine the relationship between academic performance (api00) and school enrollment (enroll). The first few rows of the data are:
  api00 enroll
1 693 247
2 570 463
3 546 395
4 571 418
5 478 520
6 858 343
We assume the relationship between \(y\) (the response variable) and \(x\) (the predictor) can be approximated by a straight line. However, the true nature of this relationship is unknown.
In R, we use the lm function to fit a linear model. To extract and interpret the results, we use the summary function:
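A minimal sketch of this step (assuming the API data have already been read into a data frame named dat; m1 is the fitted model referenced below):
# Fit the simple linear regression of api00 on enroll
m1 <- lm(api00 ~ enroll, data = dat)
# Coefficients, standard errors, and R-squared
summary(m1)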
Call:
lm(formula = api00 ~ enroll, data = dat)
Residuals:
Min 1Q Median 3Q Max
-285.50 -112.55 -6.70 95.06 389.15
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 744.25141 15.93308 46.711 < 2e-16 ***
enroll -0.19987 0.02985 -6.695 7.34e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 135 on 398 degrees of freedom
Multiple R-squared: 0.1012, Adjusted R-squared: 0.09898
F-statistic: 44.83 on 1 and 398 DF, p-value: 7.339e-11
The estimated intercept is 744.25, and the estimated slope is -0.2.
The plot function creates the scatter plot, and abline(m1) adds the regression line:
# Plotting the scatter plot
plot(api00 ~ enroll, data = dat,
main = "Scatter Plot: School Performance vs. Enrollment",
ylab = "API Score (2000)",
xlab = "Number of Students Enrolled",
xlim = c(0, 1000))
abline(m1, col = "blue")
This plot provides a clear visual representation of the negative relationship between school enrollment and academic performance.
The goal is to estimate the parameters \(\beta_0\) and \(\beta_1\)—the intercept and the slope—so that the fitted line best represents the data.
The estimated model parameters are denoted as \(\hat{\beta}_0\) and \(\hat{\beta}_1\).
The estimated regression model is:
\[\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x\]
Here, \(\hat{y}\) represents the predicted mean of the outcome for a given value of \(x\). It is also known as the fitted value.
The predicted value for each observation is given by \(\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i\).
The difference between the observed value \(y_i\) and the predicted value \(\hat{y_i}\) is called the residual and is given by:
\[e_i = y_i - \hat{y}_i\]
The parameters are estimated by minimizing the sum of squared residuals:
\[ \min \sum_{i = 1}^{n} (y_i - \hat{y_i}) ^ 2\]
This is known as the Least Squares method.
Let’s visualize the components of uncertainty in the linear model by plotting the fitted line on a scatter plot.
In the plot above,
\(y_i - \bar{y}\) is the overall or total uncertainty, which we try to explain with the linear model.
\(\hat{y}_i - \bar{y}\) is the uncertainty that can be explained by the linear model.
\(y_i - \hat{y}_i\) is the residual, the uncertainty that cannot be explained by the linear model.
In linear regression, the likelihood function (the probability of observing the outcome given the parameters) is determined by the normal distribution. We can therefore use another statistical method to estimate the model parameters, called the maximum likelihood method. Under the normal-error assumptions of OLS, the maximum likelihood method is equivalent to the least squares method.
We will look at model diagnostics plots to evaluate the model assumptions.
The plot in the left corner is the Residuals vs. Fitted plot.
This plot helps assess whether the assumptions of zero mean and constant variance of the error term hold. Based on the plot, there is no strong evidence to reject these assumptions.
If the error has a mean of zero, then as the sample size increases, the estimated parameters will converge to the true values of \(\beta\) (i.e., the estimator is unbiased).
The assumption of constant variance (homoscedasticity) ensures that the Ordinary Least Squares (OLS) method provides parameter estimates with minimal variability.
The Normal Q-Q plot helps assess whether the residuals follow a normal distribution.
The normality assumption is crucial for statistical tests on model parameters, particularly in small samples.
However, for large sample sizes, the normality assumption becomes less critical due to the Central Limit Theorem.
Recall that in linear regression the mean of the outcome is modeled directly by the linear predictor, which can take any real value:
\[ \mu = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + ... + \beta_p x_p\]
However, in some cases, the outcome for a given set of predictors can only take values within a restricted range:
If the outcome follows a count distribution (e.g., Poisson distribution), it can only take non-negative integer values (\(0, 1, 2, \cdots\)), and the mean must be strictly positive (\(\mu > 0\)).
If the outcome is dichotomous (e.g., follows a Bernoulli distribution), it can only take values 0 or 1, meaning the mean (i.e., probability of success) lies within the interval \((0,1)\).
In both of these cases, the standard linear prediction does not work correctly because it does not guarantee a plausible expected outcome.
In generalized linear regression, we use a link function to transform the linear prediction so that it aligns with the possible values of the expected outcome.
For example, when the mean is constrained to the interval \((0,1)\), we can use the logit function to map values between 0 and 1 to the entire real number range (\(-\infty, +\infty\)). The diagram below illustrates this transformation:
In the next slides, we will explore these link functions in more detail.
Suppose the probability of being admitted to graduate school is 0.3.
The odds of admission are defined as:
\[ \tt{odds(admit)} = \dfrac{p}{1 - p} = \dfrac{0.3}{1 - 0.3} = 0.4285714 \]
\[ \tt{odds(not \ admit)} = \dfrac{p}{1 - p} = \dfrac{0.7}{1 - 0.7} = 2.333333 \]
Example: Calculating Probability from Odds
Suppose the odds of the Lakers making the playoffs are 9 (i.e., 9 to 1). Using the formula for odds:
\(odds = \dfrac{p}{1 - p}\)
Solving for \(p\):
\(p = \dfrac{odds} {1 + odds} = 9 / (9 + 1) = 0.9\)
Thus, the probability of the Lakers making the playoffs is 90%.
Another useful transformation is the log function.
The log of a positive value (between 0 and \(+\infty\)) maps it to a range of negative to positive infinity.
The inverse log function is the exponential function.
We often use the natural log (\(\ln\)) and the exponential function (\(e^x\)) in statistical modeling.
Applying the log transformation to odds (i.e., log(odds)) converts a probability (between 0 and 1) into an unrestricted real number (between \(-\infty\) and \(+\infty\)).
This transformation, log(odds), is called the logit function and is a fundamental component of logistic regression.
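As a quick illustration, base R's qlogis() and plogis() compute the logit and its inverse:
# Probabilities strictly between 0 and 1
p <- c(0.1, 0.3, 0.5, 0.9)
# logit: log(p / (1 - p)) maps (0, 1) to the whole real line
log(p / (1 - p))
qlogis(p)           # same values
# inverse logit maps back to (0, 1)
plogis(qlogis(p))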
The exponential function plays a crucial role in generalized linear models (GLMs). Understanding its mathematical properties is essential for grasping the concepts behind GLMs.
For example,
\[10^3 = 1000 \Rightarrow \log_{10}(1000) = 3\] This means the logarithm base 10 of 1000 is 3.
Similarly,
\[2^6 = 64 \Rightarrow \log_{2}(64) = 6 \]
Euler’s number, \(e\), is a fundamental mathematical constant, approximately 2.718.
It can be expressed as an infinite series:
\[\displaystyle e=\sum \limits _{n=0}^{\infty }{\frac {1}{n!}}={\frac {1}{1}}+{\frac {1}{1}}+{\frac {1}{1\times 2}}+{\frac {1}{1\times 2\times 3}}+\cdots \approx 2.718282.\]
If \[e^6 = a\]
Then
\[\log_e(a) = 6\]
\[\exp(x + y) = \exp(x)\exp(y)\]
\[\log(xy) = \log(x) + \log(y)\]
\[\log(x/y) = \log(x) - \log(y)\]
In linear regression models we assumed the errors are distributed from a normal distribution with mean zero and variance \(\sigma^2\).
We can also say the outcome for a given \(x\) (i.e. \(y|x\)) has a normal distribution with mean \(\mu = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + ... + \beta_p x_p\) and variance \(\sigma^2\).
This is not always true; for example, with a \(\{0, 1\}\) outcome, the outcome has a Bernoulli distribution and the variance depends on the mean.
In generalized linear regression models we assume the outcome has a distribution from a family of distributions.
In R, the distribution of the outcome is defined by the family argument in the glm function.
In generalized linear regression models we assume a transformation (by the link function) of the mean of the distribution of the outcome has a linear relationship with predictors.
In the next few slides we review some of the special distributions.
For example, the weight of newborn babies.
The graph below shows the PDF of a normal distribution with \(\mu = 2\) and \(\sigma = 1\).
For example, the number of times we flip a coin until we observe a head for the first time. This value lies in the set \(\{1, 2, 3, ...\}\).
The sum of the outcomes from rolling two dice is a random variable that takes values in the set \(\{2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12\}\).
The number of patients hospitalized daily for COVID-19 in a healthcare facility.
Probability Mass Function
For example, the PMF of the outcome of rolling a fair die, with possible values \(\{1, 2, 3, 4, 5, 6\}\), is:
\(p(X = x) = 1/6\)
for \(x \in \{1, 2, 3, 4, 5, 6\}\).
A Bernoulli random variable is a random variable that can take only two possible outcomes: 1 with probability \(p\) and 0 with probability \(q = 1 - p\).
The PMF of Bernoulli with zero and one outcomes is:
\[f(x;p)=\begin{cases} 1 - p & {\text{if }}x=0,\\p & {\text{if }}x=1.\end{cases}\]
The mean of a Bernoulli random variable is:
\(\mu_X = E(X) = 1 \times p + 0 \times (1 - p) = p\)
and the variance is
\(\sigma^2_X = var(X) = E(X- \mu)^2 = (1 - p) ^ 2 \times p + (0 - p) ^ 2 \times (1 - p) \\ = p(1 - p) ( 1 - p + p) = p(1 - p)\)
The PMF of Binomial(n, p) is: (optional) \[\displaystyle f(x; n,p)=\Pr(x;n,p)=\Pr(X=x)=\frac{n!}{x!(n-x)!} p^{x}(1-p)^{n-x}\] For \(x = 0, 1, 2, 3, \cdots, n\).
The sum of \(n\) independent Bernoulli random variables with the same parameter \(p\) has a Binomial distribution with parameters \(n\) and \(p\).
If we know the PMF of a random variable, we can calculate the probability of observing any given value by plugging that value into the PMF.
For example for a Binomial random variable with \(p = 0.2\) and \(n = 10\) we can calculate probabilities for each value of the random variable:
\(P(x = 0) = [10! / 0!(10 - 0)!] 0.2 ^ 0 (1 - 0.2) ^ {10} = 0.8 ^ {10} \approx 0.107\)
The mean of binomial is \(np\) and the variance is \(np(1 - p)\)
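As a quick check, base R's dbinom() evaluates the Binomial PMF directly:
# P(X = 0) for Binomial(n = 10, p = 0.2)
dbinom(0, size = 10, prob = 0.2)
# The full PMF for x = 0, 1, ..., 10
dbinom(0:10, size = 10, prob = 0.2)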
The graph below shows the plot of PMF of a Binomial random variable with \(p = 0.2\) and \(n = 10\).
The Poisson distribution is one of the most commonly used probability distributions for modeling count data. It describes the probability of a given number of events occurring in a fixed interval of time or space, assuming the events occur independently and at a constant average rate.
Key Characteristics:
Later, in regression for count outcome, we will explore the Poisson distribution in more detail, including its mathematical formulation, and applications.
Logistic regression is used when the outcome variable is binary, meaning it can take only two values (e.g., 1 and 0).
Example: Graduate School Admissions
In this example, we analyze binary data from the OARC statistics website.
A researcher is interested in examining how factors such as: GRE scores (Graduate Record Exam), GPA (Grade Point Average), and Undergraduate institution prestige affect the probability of admission into graduate school.
The response variable (admission status) is binary, meaning:
1 = Admitted
0 = Not admitted
The outcome variable: admit (binary: 1 = admitted, 0 = not admitted).
The predictor variables:
gre (Graduate Record Exam score) – continuous
gpa (Grade Point Average) – continuous
rank (undergraduate institution prestige) – categorical (values 1 to 4):
Rank 1: Highest prestige
Rank 4: Lowest prestige
To summarize the dataset, we can use summary(). If we need the standard deviations for each variable, we can apply the sd function using sapply().
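A minimal sketch of these commands (assuming the admissions data are in a data frame named mydata):
# Summary statistics for each variable
summary(mydata)
# Standard deviation of each variable
sapply(mydata, sd)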
admit gre gpa rank
Min. :0.0000 Min. :220.0 Min. :2.260 Min. :1.000
1st Qu.:0.0000 1st Qu.:520.0 1st Qu.:3.130 1st Qu.:2.000
Median :0.0000 Median :580.0 Median :3.395 Median :2.000
Mean :0.3175 Mean :587.7 Mean :3.390 Mean :2.485
3rd Qu.:1.0000 3rd Qu.:660.0 3rd Qu.:3.670 3rd Qu.:3.000
Max. :1.0000 Max. :800.0 Max. :4.000 Max. :4.000
admit gre gpa rank
0.4660867 115.5165364 0.3805668 0.9444602
This can also be seen as an intercept-only regression model.
The variable admit is binary: it takes the value 1 if a person was admitted and 0 otherwise. Its mean is therefore the proportion of 1s, i.e., the estimated probability of being admitted to graduate school.
\[ \hat{p} = \dfrac{X}{n} = \dfrac{\sum_{i=1}^nx_i}{n} \]
Where \(X\) is the total number of people admitted and \(n\) is total number of students and \(x_i\) is value of admit for the \(i\)-th student.
We can calculate the standard deviation directly from the mean.
Note that this variable is binary; assuming a binary outcome is the result of a Bernoulli trial, \(\hat\mu_{admit} = \hat p = \sum_{i=1}^{n} x_i /n\).
Since in Bernoulli distribution the variance is \(var(x) = p(1-p)\), we can use this formula to calculate the variance.
Recall that in OLS, where the error (and hence the outcome) is assumed to be normally distributed, the variance is assumed to be constant; here, the variance depends on the predicted mean.
Of course, this is not an issue in the intercept-only model, because the expected mean is constant!
Now let's run a regression model of admit predicted by gre, treating admit as a continuous variable.
We are trying to estimate the mean of admit for a given value of gre. Since admit is 0 or 1, this mean can be interpreted as the probability of admission.
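A sketch of this fit (assuming mydata is loaded; lpm is an illustrative name):
# Treat the 0/1 outcome as if it were continuous (a linear probability model)
lpm <- lm(admit ~ gre, data = mydata)
summary(lpm)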
Call:
lm(formula = admit ~ gre, data = mydata)
Residuals:
Min 1Q Median 3Q Max
-0.4755 -0.3415 -0.2522 0.5989 0.8966
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.1198407 0.1190510 -1.007 0.314722
gre 0.0007442 0.0001988 3.744 0.000208 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.4587 on 398 degrees of freedom
Multiple R-squared: 0.03402, Adjusted R-squared: 0.03159
F-statistic: 14.02 on 1 and 398 DF, p-value: 0.0002081
As we can see, the variance of the outcome is systematically related to the predicted value and thus is not constant.
Another issue with this model is that:
In a linear model, the predicted outcome can be any number from negative infinity to positive infinity.
But a probability cannot be negative or greater than one. So, we need a transformation function that relates the outcome to a value between 0 and 1.
To overcome this issue, as well as the issue of unequal variance in OLS, we need to use a different model.
In logistic regression, we model the log odds of the outcome as a linear function of the predictors:
\[ \log(\text{odds} \mid x_1, x_2 , \cdots x_p) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + ... + \beta_p x_p\]
Or,
\[\frac{p}{1-p} = e^{(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + ... + \beta_p x_p)}\]
or,
\[p = \frac{e^{(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + ... + \beta_p x_p)}}{1+e^{(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + ... + \beta_p x_p)}} \] Or,
\[p = \frac{1} {1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + ... + \beta_p x_p)}}\]
Notice that we do not need an \(\epsilon\) term to represent the uncertainty of the outcome, because the probability itself is the expected value of the outcome, and the outcome is a single Bernoulli draw with this given probability.
In R, we use the glm function to fit a generalized linear model and estimate the coefficients of the model. The distribution of the outcome is specified by the family argument, with link = "logit" specifying the link function.
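A minimal call consistent with the output below (assuming the admissions data are loaded in mydata; logit_m is an illustrative name):
# Logistic regression of admit on gre
logit_m <- glm(admit ~ gre, family = binomial(link = "logit"), data = mydata)
summary(logit_m)
# Coefficient table and profile-likelihood confidence intervals
coef(summary(logit_m))
confint(logit_m)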
Call:
glm(formula = admit ~ gre, family = binomial(link = "logit"),
data = mydata)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.901344 0.606038 -4.787 1.69e-06 ***
gre 0.003582 0.000986 3.633 0.00028 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 499.98 on 399 degrees of freedom
Residual deviance: 486.06 on 398 degrees of freedom
AIC: 490.06
Number of Fisher Scoring iterations: 4
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.901344270 0.6060376103 -4.787400 1.689561e-06
gre 0.003582212 0.0009860051 3.633056 2.800845e-04
2.5 % 97.5 %
(Intercept) -4.119988259 -1.739756286
gre 0.001679963 0.005552748
\[ \log(\text{odds of admit}) = -2.90134 + 0.00358 \text{ GRE} \]
This is not very easy to interpret directly. We can only see that this positive relationship is unlikely to be due to chance (p-value = 0.00028).
The intercept is the expected log odds of someone getting admitted to graduate school when their GRE score is zero.
When we exponentiate the intercept then we get the expected odds of someone with GRE equal to zero.
\(exp(-2.901344) = 0.05494932\). Transforming to probability, \(p = 0.05494932/(1+0.05494932)= 0.052\), we get the expected probability of someone with GRE equal to 0.
What if we exponentiate the slope?
Assume someone has a GRE score of 500. Then the predicted odds based on the model will be:
\[odds_{500} = e^{-2.901344 + 0.003582 \times 500}\]
If their GRE score increases by one unit then the odds will be:
\[odds_{501} = e^{-2.901344 + 0.003582 \times (500 + 1)}\]
If we divide the latter by the former we have:
\[\frac{odds_{501}}{odds_{500}} = \frac{e^{-2.901344 + 0.003582 \times (500 + 1)}}{e^{-2.901344 + 0.003582 \times (500)}} = \frac{e^{-2.901344}\times e ^{0.003582\times 500} \times e^{(0.003582)}}{e^{-2.901344}\times e ^{0.003582\times 500}} = e^{0.003582} \]
Thus,
The exponentiated slope is the odds ratio of admission for a one-unit increase in GRE score, which is:
\[exp(0.003582) = 1.003588\] In other words, the odds of getting admitted increase on average by about \(0.36\%\) for a one-point improvement in GRE score. (This is the old GRE scale, on which scores changed in 10-point increments.) What if someone improves their GRE score by 50 points?
\[\frac{odds_{550}}{odds_{500}} = \frac{e^{-2.901344 + 0.003582 \times (500 + 50)}}{e^{-2.901344 + 0.003582 \times (500)}} = \frac{e^{-2.901344}\times e ^{0.003582\times 500} \times e^{50\times(0.003582)}}{e^{-2.901344}\times e ^{0.003582\times 500}} = (e^{0.003582})^{50} \]
The exponential function is multiplicative, not additive. Thus, for a 50-point increase in GRE score, the odds of admission increase by a factor of \((\exp(\beta_1))^{50} = 1.003588 ^{50} = 1.196115\). In other words, the odds of getting admitted increase by about \(20\%\).
Calculate the exponentiated slope and interpret it.
If someone increases their GPA by 0.2, how do the expected odds of their admission change?
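Next, we add gpa as a second predictor. A sketch of the call that produces the output below (the model name logit_m2 is illustrative):
# Logistic regression of admit on gre and gpa
logit_m2 <- glm(admit ~ gre + gpa, family = binomial(link = "logit"), data = mydata)
summary(logit_m2)
# Odds ratios and profile-likelihood confidence intervals (log-odds scale)
exp(coef(logit_m2))
confint(logit_m2)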
Call:
glm(formula = admit ~ gre + gpa, family = binomial(link = "logit"),
data = mydata)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.949378 1.075093 -4.604 4.15e-06 ***
gre 0.002691 0.001057 2.544 0.0109 *
gpa 0.754687 0.319586 2.361 0.0182 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 499.98 on 399 degrees of freedom
Residual deviance: 480.34 on 397 degrees of freedom
AIC: 486.34
Number of Fisher Scoring iterations: 4
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.949378063 1.075092882 -4.603675 4.151004e-06
gre 0.002690684 0.001057491 2.544403 1.094647e-02
gpa 0.754686856 0.319585595 2.361455 1.820340e-02
(Intercept) gre gpa
0.007087816 1.002694307 2.126945379
2.5 % 97.5 %
(Intercept) -7.1092205752 -2.886726963
gre 0.0006373958 0.004791712
gpa 0.1347509288 1.390131131
The odds of getting admitted increase, on average, by a factor of \(2.1269\) for a one-unit improvement in GPA, holding GRE score constant.
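We can also allow the effect of gre to depend on gpa. A sketch of the interaction model that produces the output below (the name logit_m3 is illustrative):
# Logistic regression with a gre-by-gpa interaction
logit_m3 <- glm(admit ~ gre * gpa, family = binomial(link = "logit"), data = mydata)
summary(logit_m3)
exp(coef(logit_m3))   # multiplicative (odds-ratio) scale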
Call:
glm(formula = admit ~ gre * gpa, family = binomial(link = "logit"),
data = mydata)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -13.818700 5.874816 -2.352 0.0187 *
gre 0.017464 0.009606 1.818 0.0691 .
gpa 3.367918 1.722685 1.955 0.0506 .
gre:gpa -0.004327 0.002787 -1.552 0.1205
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 499.98 on 399 degrees of freedom
Residual deviance: 477.85 on 396 degrees of freedom
AIC: 485.85
Number of Fisher Scoring iterations: 4
Estimate Std. Error z value Pr(>|z|)
(Intercept) -13.818700374 5.874815854 -2.352193 0.01866309
gre 0.017464312 0.009606206 1.818024 0.06906048
gpa 3.367918022 1.722685231 1.955040 0.05057838
gre:gpa -0.004326605 0.002786899 -1.552480 0.12054741
(Intercept) gre gpa gre:gpa
9.968153e-07 1.017618e+00 2.901805e+01 9.956827e-01
2.5 % 97.5 %
(Intercept) -2.576285e+01 -2.677081148
gre -9.057462e-04 0.036855984
gpa 8.009849e-02 6.850511379
gre:gpa -9.932612e-03 0.001022485
[1] 3.335476
[1] 3.335407
The interaction term acts as a multiplicative factor: for example, for GPA, it increases/decreases (here decreases) the odds of the outcome by a factor of 0.9956827 for every unit of GRE.
For example, if someone has a GRE score of 500, then for every 1-unit increase in their GPA, their odds of admission change by a factor of \(29.01805 \times 0.9956827 ^ {500} = 3.3354\).
Or we can first calculate \(3.367918022 + 500 \times -0.004326605\) and then exponentiate.
There is more to logistic regression analysis, but we do not have time to cover it here.
The PMF of the Poisson distribution with mean \(\mu\) is:
\[Pr(Y = y) = \dfrac{e^{-\mu}(\mu)^y}{y!}, \text{ for } y = 0, 1, 2, \dots .\]
The mean of Poisson distribution is \(\mu\) and the variance of it is also \(\mu\).
The Poisson distribution has only one parameter; its mean and variance are equal.
In a large number of Bernoulli trials with a very small probability of success, the total number of successes can be approximated by a Poisson distribution. This is called the law of rare events.
It can be shown that a Binomial distribution converges to a Poisson distribution as \(n\) goes to infinity and \(p\) goes to zero such that \(np \rightarrow \mu\).
Suppose the number of accidents on a sunny day on the 405 freeway between Santa Monica Blvd and Wilshire Blvd is Poisson with a rate of 2 accidents per day. Then the probabilities of 0, 1, 2, …, 8 accidents can be calculated using the Poisson PMF:
\(P(Y = 0| \mu = 2) = \dfrac{e^{-2}(2)^0}{0!} = 1/e^2 \approx 0.1353\)
\(P(Y = 1| \mu = 2) = \dfrac{e^{-2}(2)^1}{1!} = 2/e^2 \approx 0.2707\)
\(P(Y = 2| \mu = 2) = \dfrac{e^{-2}(2)^2}{2!} = 2/e^2 \approx 0.2707\)
\(P(Y = 3| \mu = 2) = \dfrac{e^{-2}(2)^3}{3!} = 8/(6e^2) \approx 0.1804\)
\(P(Y = 4| \mu = 2) = \dfrac{e^{-2}(2)^4}{4!} = 16/(24e^2) \approx 0.0902\)
\(P(Y = 5| \mu = 2) = \dfrac{e^{-2}(2)^5}{5!} = 32/(120e^2) \approx 0.0361\)
\(P(Y = 6| \mu = 2) \approx 0.0120\)
\(P(Y = 7| \mu = 2) \approx 0.0034\)
\(P(Y = 8| \mu = 2) \approx 0.0009\)
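These hand calculations can be verified with dpois(), the Poisson PMF in R:
# Poisson(2) probabilities for 0 through 8 accidents
dpois(0:8, lambda = 2)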
The following graph shows the plot of PMF of a Poisson random variable with \(\mu = 2\)
This looks similar to the binomial distribution, but it is not exactly the same!
Now assume the number of accidents on a rainy day on the 405 freeway between Santa Monica Blvd and Wilshire Blvd is Poisson with a rate of 5.5 accidents per day. Then the probability of each of \(x = 0, 1, 2, \cdots, 8\) accidents, computed in R, is:
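A minimal call that produces the output below:
dpois(0:8, lambda = 5.5)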
[1] 0.004086771 0.022477243 0.061812418 0.113322766 0.155818804 0.171400684
[7] 0.157117294 0.123449302 0.084871395
We can also say that the mean number of accidents per day is 2.75 times higher on a rainy day compared to a sunny day.
A sample of 20 numbers will look like this:
[1] 6 2 2 2 2 4 1 1 2 0 2 1 4 2 1 4 1 2 1 0
[1] 2
[1] 2.210526
Now let's add some zeros and some larger values to our sample:
[1] 2.213115
[1] 14.07049
We can see that for sample y2 the variance is larger than the mean. This is called overdispersion.
The way we drew our sample is close to a Poisson with \(\mu = 2\), but not exactly. If we want to sample from a true Poisson distribution, we can use the rpois() function in R.
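A small sketch of this (the random seed is an illustrative choice):
# Draw a true Poisson(2) sample with rpois()
set.seed(1)
y_pois <- rpois(20, lambda = 2)
mean(y_pois)
var(y_pois)   # close to the mean, as expected for a Poisson sample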
Poisson distribution for unit time and time t.
The basic Poisson distribution assumes that each count occurs over one unit of time. However, we often count events over a longer period of time or over different areas. For example, if accidents occur on average 2 times per day, then over the course of a month (30 days) the expected number of accidents will be \(2 \times 30 = 60\); the rate is still 2 accidents per day. The rate parameterization of the Poisson distribution is:
\[Pr(Y = y) = \dfrac{e^{-t\mu}(t\mu)^y}{y!}, \text{ for } y = 0, 1, 2, \dots .\] Where \(t\) represents the length of time. We will talk about this when we talk about exposure in a Poisson regression.
Count data consists of observations that take non-negative integer values (0, 1, 2, …).
The range of counts extends from zero to some unknown upper limit.
Theoretically, counts can range from zero to infinity, but in practice, they are always bounded by a reasonable maximum.
Types of Count Data
The number of occurrences of an event within a given time period.
The number of occurrences of an event within a specific geographic or spatial area.
The count of individuals affected by a particular condition, often adjusted for population size.
Examples of count data:
The number of accidents on a highway in a day.
The number of patients who die within 48 hours of a myocardial infarction in a hospital.
The count of people diagnosed with a specific disease, adjusted for population size.
Can we model the count response as a continuous random variable and estimate the parameters using ordinary least squares?
There are two problems with this approach:
In a linear OLS model we may get a negative value for the mean response, which is not valid for count data.
In a linear OLS model we assume the variance of the response for any given predictor is constant. This may not be correct for count data; in fact, it almost never is.
Poisson regression model is the standard model for count data.
In the Poisson regression model we assume that the response variable, for a given set of predictor variables, follows a Poisson distribution with parameter \(\mu\) that depends on the values of the predictors.
Since the parameter \(\mu\) must be greater than zero, with \(p\) predictors we use the log function as the link function.
Nearly all of the count models have the basic structure of the linear model. The difference is that the left-hand side of the model equation is in the log form.
Formalization of count model
A count model with one predictor is written as:
\[\text{log of mean response} = \ln(\mu) = \beta_0 + \beta_1 x\]
From this model the expected count or mean response will be:
\[\text{mean response} = \mu =E(y|x) = \exp( \beta_0 + \beta_1 x)\] This ensures \(\mu\) will be positive for any value of \(\beta_0\), \(\beta_1\), or \(X\).
The Poisson model
The Poisson model assumes that the count \(y|x\) has a Poisson distribution with the above mean. For an outcome \(y\) and single predictor \(x\) we have:
\[ f(y | x; \beta_0, \beta_1) = \dfrac{e^{-\exp( \beta_0 + \beta_1 x)}(\exp( \beta_0 + \beta_1 x))^{y}}{y!}\]
A plot of the distribution of the response for the Poisson regression model is shown below:
Back to our toy example of the number of accidents on the 405: let \(x\) be a binary variable equal to 0 for a sunny day and 1 for a rainy day.
We have a sample of 40 days, 20 sunny and 20 rainy, drawn from Poisson distributions with rates 2 and 5, respectively.
In fact we can artificially generate this sample using R:
y <- c(rpois(20, lambda = 2), rpois(20, lambda = 5))
x <- rep(c(0,1), each = 20)
print(data.frame(y = y,x = x))
y x
1 2 0
2 3 0
3 0 0
4 4 0
5 2 0
6 5 0
7 1 0
8 3 0
9 4 0
10 1 0
11 3 0
12 2 0
13 2 0
14 0 0
15 2 0
16 6 0
17 2 0
18 1 0
19 2 0
20 1 0
21 9 1
22 10 1
23 1 1
24 4 1
25 6 1
26 5 1
27 5 1
28 7 1
29 8 1
30 6 1
31 6 1
32 7 1
33 6 1
34 5 1
35 5 1
36 6 1
37 6 1
38 4 1
39 8 1
40 5 1
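A minimal call consistent with the output below (the model name pois_toy is illustrative):
# Poisson regression of accident counts on the rain indicator
pois_toy <- glm(y ~ x, family = poisson(link = "log"))
summary(pois_toy)
# Back on the count scale: exp(intercept) is the sunny-day mean,
# exp(slope) is the rate ratio for rainy vs. sunny days
exp(coef(pois_toy))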
Call:
glm(formula = y ~ x, family = poisson(link = "log"))
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.8329 0.1474 5.649 1.61e-08 ***
x 0.9505 0.1736 5.475 4.38e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 69.616 on 39 degrees of freedom
Residual deviance: 36.173 on 38 degrees of freedom
AIC: 160.7
Number of Fisher Scoring iterations: 5
The coefficients are now on the log scale. Exponentiating them gives the estimated mean count for a sunny day (the intercept) and the rate ratio for rainy versus sunny days:
(Intercept) x
2.300000 2.586957
This part is just for your information and is optional.
If we have more than one predictor, then for the \(i\)-th observation we have:
\[\mu_i = E(y_i | x_{i1} , x_{i2}, \dots, x_{ip};\beta) = \exp(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots+ \beta_p x_{ip})\]
For simplicity we use \(\boldsymbol{x'\beta}_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots+ \beta_p x_{ip}\) and therefore we have
\[P(Y = y_i|\boldsymbol{x}_i, \boldsymbol{\beta}) = \dfrac{e^{-\exp(\boldsymbol{x'\beta}_i)}(\exp(\boldsymbol{x'\beta}_i))^y}{y!}\] and log-likelihood function will be:
\[\cal L(\boldsymbol{\beta}) = \sum_{i=1}^n [y_i \boldsymbol{x'\beta}_i - \exp(\boldsymbol{x'\beta}_i) - \ln y_i!]\]
For this part we will use the German national health registry data set as an example of Poisson regression. The data set is available in the R package COUNT. We will use only the year 1984.
Source: German Health Reform Registry, years pre-reform 1984-1988, From Hilbe and Greene (2008)
The data can be accessed by installing the package COUNT.
The response variable is the number of doctor visits (docvis).
We use outwork and age as predictors. outwork is a binary variable with 1 = not working and 0 = working; age is a continuous variable ranging from 25 to 64.
Since ages range from 25 to 64 and age 0 is not a valid value, we center the variable age so that the intercept is interpretable.
R code for loading the data and descriptive statistics
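A minimal sketch of this step (rwm1984 is the data set name used in the model calls below; cage is the centered age variable):
# install.packages("COUNT")   # if not already installed
library(COUNT)
data(rwm1984)
head(rwm1984)
summary(rwm1984)
# Center age so the intercept refers to a person of average age
rwm1984$cage <- rwm1984$age - mean(rwm1984$age)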
docvis hospvis edlevel age outwork female married kids hhninc educ self
1 1 0 3 54 0 0 1 0 3.050 15.0 0
2 0 0 1 44 1 1 1 0 3.050 9.0 0
3 0 0 1 58 1 1 0 0 1.434 11.0 0
4 7 2 1 64 0 0 0 0 1.500 10.5 0
5 6 0 3 30 1 0 0 0 2.400 13.0 0
6 9 0 3 26 1 0 0 0 1.050 13.0 0
edlevel1 edlevel2 edlevel3 edlevel4
1 0 0 1 0
2 1 0 0 0
3 1 0 0 0
4 1 0 0 0
5 0 0 1 0
6 0 0 1 0
docvis hospvis edlevel age
Min. : 0.000 Min. : 0.0000 Min. :1.00 Min. :25
1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.:1.00 1st Qu.:35
Median : 1.000 Median : 0.0000 Median :1.00 Median :44
Mean : 3.163 Mean : 0.1213 Mean :1.38 Mean :44
3rd Qu.: 4.000 3rd Qu.: 0.0000 3rd Qu.:1.00 3rd Qu.:54
Max. :121.000 Max. :17.0000 Max. :4.00 Max. :64
outwork female married kids
Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:1.0000 1st Qu.:0.0000
Median :0.0000 Median :0.0000 Median :1.0000 Median :0.0000
Mean :0.3665 Mean :0.4793 Mean :0.7891 Mean :0.4491
3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
hhninc educ self edlevel1
Min. : 0.015 Min. : 7.00 Min. :0.00000 Min. :0.0000
1st Qu.: 2.000 1st Qu.:10.00 1st Qu.:0.00000 1st Qu.:1.0000
Median : 2.800 Median :10.50 Median :0.00000 Median :1.0000
Mean : 2.969 Mean :11.09 Mean :0.06118 Mean :0.8136
3rd Qu.: 3.590 3rd Qu.:11.50 3rd Qu.:0.00000 3rd Qu.:1.0000
Max. :25.000 Max. :18.00 Max. :1.00000 Max. :1.0000
edlevel2 edlevel3 edlevel4
Min. :0.0000 Min. :0.0000 Min. :0.00000
1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000
Median :0.0000 Median :0.0000 Median :0.00000
Mean :0.0524 Mean :0.0746 Mean :0.05937
3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.00000
Max. :1.0000 Max. :1.0000 Max. :1.00000
We use the glm function to run the Poisson model; for the family argument we use poisson(link = "log"). Using the centered age variable:
#running glm
pois_m <- glm(docvis ~ outwork + cage, family = poisson(link = "log"), data = rwm1984)
#extract the outcome by summary
summary(pois_m)
Call:
glm(formula = docvis ~ outwork + cage, family = poisson(link = "log"),
data = rwm1984)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.9380965 0.0127571 73.53 <2e-16 ***
outwork 0.4079314 0.0188447 21.65 <2e-16 ***
cage 0.0220842 0.0008377 26.36 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 25791 on 3873 degrees of freedom
Residual deviance: 24190 on 3871 degrees of freedom
AIC: 31279
Number of Fisher Scoring iterations: 6
The coefficients are on the log scale, and we can interpret them directly in terms of the log of the response. In general: the log mean of the response variable increases by \(\beta\) for a one-unit increase in the corresponding predictor, holding the other predictors constant.
In our results, the expected log number of visits for patients who are out of work is about 0.41 higher than for working patients of the same age.
For each one-year increase in age, the expected log count of doctor visits increases by 0.022, holding outwork constant.
To express a change in a predictor as a change in the actual number of physician visits, we must exponentiate the coefficients and their confidence intervals.
\[ \log(\mu_1) -\log(\mu_0) = \log\left(\dfrac{\mu_1}{\mu_0}\right) = \beta_0 + \beta_1 (\text{outwork} = 1) + \beta_2 \text{ age} - (\beta_0 + \beta_1 (\text{outwork} = 0) + \beta_2 \text{ age}) = \beta_1\]
So, \(\exp(0.4079314) = 1.503704\) means that, on average, we expect the number of doctor visits for a person who is out of work to be about 50% higher than for a working person of the same age.
The standard error can be approximately calculated by the delta method.
Delta method:
\[se(\exp(\beta)) \approx \exp(\beta) \times se(\beta)\]
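A sketch of these computations in R (pois_m is the Poisson model fitted above; using Wald intervals from confint.default() is an assumption about how the intervals below were obtained):
# Exponentiated coefficients (incidence-rate ratios)
exp(coef(pois_m))
# Delta-method standard errors: se(exp(beta)) ~ exp(beta) * se(beta)
exp(coef(pois_m)) * sqrt(diag(vcov(pois_m)))
# Wald confidence intervals on the rate-ratio scale
exp(confint.default(pois_m))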
(Intercept) outwork cage
2.555113 1.503704 1.022330
(Intercept) outwork cage
0.0325958199 0.0283368154 0.0008564317
2.5 % 97.5 %
(Intercept) 2.492018 2.619805
outwork 1.449178 1.560282
cage 1.020653 1.024010
# sum of fit statistics for poisson model
pr <- sum(residuals(pois_m, type = "pearson") ^ 2)
pr / pois_m$df.resid
[1] 11.34322
[1] 8.074027
$AIC
[1] 31278.78
$AICn
[1] 8.074027
$BIC
[1] 31297.57
$BICqh
[1] 8.07418
Marginal effect at the mean: the number of visits made to a doctor increases by about 0.066 for each additional year of age, when outwork is set at its mean.
Average marginal effect: for each additional year of age, there are on average about 0.07 additional doctor visits, with outwork at its mean value.
Discrete change or partial effect is used to evaluate the change in the predicted response when a binary predictor changes value from 0 to 1.
Sometimes data points do not all occur over the same time period, area, or volume. In this case we can use an offset in the model to adjust counts of events for differing time periods, areas, or volumes.
If counts are observed over an exposure \(t\), we model the rate:
\[ \exp(\boldsymbol{x'\beta}) = \mu / t\]
or, equivalently,
\[ \mu = \exp(\boldsymbol{x'\beta} + \log(t))\] The term \(\log(t)\) is referred to as the offset.
In this data set (fasttrakg, used in the model call below):
die: number of deaths
cases: number of cases
anterior: whether the patient had a previous anterior myocardial infarction
hcabg: whether the patient has a history of CABG procedure
killip: a summary indicator of the cardiovascular health of the patient
R code for exposure
die cases anterior hcabg killip kk1 kk2 kk3 kk4
1 5 19 0 0 4 0 0 0 1
2 10 83 0 0 3 0 0 1 0
3 15 412 0 0 2 0 1 0 0
4 28 1864 0 0 1 1 0 0 0
5 1 1 0 1 4 0 0 0 1
6 0 3 0 1 3 0 0 1 0
summary(fast <- glm(die ~ anterior + hcabg +factor(killip) , family = poisson,
offset = log(cases), data = fasttrakg))
Call:
glm(formula = die ~ anterior + hcabg + factor(killip), family = poisson,
data = fasttrakg, offset = log(cases))
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.0698 0.1459 -27.893 < 2e-16 ***
anterior 0.6749 0.1596 4.229 2.34e-05 ***
hcabg 0.6614 0.3267 2.024 0.0429 *
factor(killip)2 0.9020 0.1724 5.234 1.66e-07 ***
factor(killip)3 1.1133 0.2513 4.430 9.44e-06 ***
factor(killip)4 2.5126 0.2743 9.160 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 116.232 on 14 degrees of freedom
Residual deviance: 10.932 on 9 degrees of freedom
AIC: 73.992
Number of Fisher Scoring iterations: 5
(Intercept) anterior hcabg factor(killip)2 factor(killip)3
0.01708133 1.96376577 1.93746487 2.46463342 3.04434885
factor(killip)4
12.33745552
(Intercept) anterior hcabg factor(killip)2 factor(killip)3
0.002492295 0.313359487 0.632970849 0.424784164 0.765119601
factor(killip)4
3.384215172
2.5 % 97.5 %
(Intercept) 0.0128329 0.02273622
anterior 1.4363585 2.68482829
hcabg 1.0212823 3.67554592
factor(killip)2 1.7581105 3.45508317
factor(killip)3 1.8602297 4.98221265
factor(killip)4 7.2067174 21.12096274
$AIC
[1] 73.9917
$AICn
[1] 4.93278
$BIC
[1] 78.24
$BICqh
[1] 5.566187
Overdispersion is the key problem to consider in count models.
As we noted for the Poisson random variable, overdispersion occurs when the variance is greater than the mean.
In a count model, overdispersion means the variance of the response conditional on the predictors is greater than the conditional mean of the response.
There are a variety of tests to determine whether the model fits the data.
The first and most natural test used in Poisson regression is called the deviance goodness-of-fit test.
The deviance is defined as twice the difference between the saturated-model log-likelihood and the fitted-model log-likelihood.
dev <- deviance(pois_m)
df <- df.residual(pois_m)
p_value <- 1 - pchisq(dev, df)
print(matrix(c("Deviance GOF ", " ",
"D =", round(dev,4) ,
"df = ", df,
"P-value = ", p_value), byrow = T,ncol = 2))
[,1] [,2]
[1,] "Deviance GOF " " "
[2,] "D =" "24190.3581"
[3,] "df = " "3871"
[4,] "P-value = " "0"
With p < 0.05, the deviance GOF test indicates that we reject the hypothesis that the model fits the data well.
The other test statistic is the Pearson goodness-of-fit test, defined as the sum of the squared Pearson residuals.
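A sketch of the Pearson goodness-of-fit computation for the model above:
# Sum of squared Pearson residuals
pr <- sum(residuals(pois_m, type = "pearson")^2)
df <- df.residual(pois_m)
pr / df              # dispersion statistic; values well above 1 suggest overdispersion
1 - pchisq(pr, df)   # approximate goodness-of-fit p-value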
Overdispersion can be modeled in different ways:
Quasi-likelihood models.
Sandwich or robust variance estimators.
Bootstrap standard errors.
Negative binomial 1 (NB1).
Negative binomial 2 (NB2).
In Poisson:
Mean = \(\mu\)
Variance = \(\mu\)
In Negative Binomial 1 (NB1):
Mean = \(\mu\)
Variance = \(\mu + \alpha \mu\), or
Variance = \(\mu (1 + \alpha) = \mu \omega\)
In Negative Binomial 2 (NB2):
Mean = \(\mu\)
Variance = \(\mu + \alpha \mu ^2\)
# Poisson Default standard errors (variance equals mean) based on MLE
poiss_mle <- glm(docvis ~ outwork + age, data = rwm1984, family=poisson())
print(summary(poiss_mle),digits=3)
Call:
glm(formula = docvis ~ outwork + age, family = poisson(), data = rwm1984)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.033517 0.039181 -0.86 0.39
outwork 0.407931 0.018845 21.65 <2e-16 ***
age 0.022084 0.000838 26.36 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 25791 on 3873 degrees of freedom
Residual deviance: 24190 on 3871 degrees of freedom
AIC: 31279
Number of Fisher Scoring iterations: 6
# Robust sandwich based on poiss_mle
library(sandwich)
#this will gives robust covariance matrix of estimated model
cov.robust <- vcovHC(poiss_mle, type="HC0")
#diagonal is the variances
se.robust <- sqrt(diag(cov.robust))
coeffs <- coef(poiss_mle)
z_.025 <- qnorm(.975)
t.robust <- coeffs / se.robust
poiss_mle_robust <- cbind(coeffs, se.robust, t.robust, pvalue=round(2*(1-pnorm(abs(coeffs/se.robust))),5),
lower=coeffs - z_.025 * se.robust, upper=coeffs+ z_.025 * se.robust)
print(poiss_mle_robust,digits=3)
coeffs se.robust t.robust pvalue lower upper
(Intercept) -0.0335 0.12333 -0.272 0.786 -0.2752 0.2082
outwork 0.4079 0.06907 5.906 0.000 0.2726 0.5433
age 0.0221 0.00283 7.808 0.000 0.0165 0.0276
# Poisson Bootstrap standard errors (variance equals mean)
# install.packages("boot")
library(boot)
boot.poiss <- function(data, indices) {
data <- data[indices, ]
model <- glm(docvis ~ outwork + age, family=poisson(), data=data)
coefficients(model)
}
set.seed(10101)
# To keep the run time modest we use only 400 bootstrap replications.
summary.poissboot <- boot(rwm1984, boot.poiss, 400)
print(summary.poissboot,digits=3)
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = rwm1984, statistic = boot.poiss, R = 400)
Bootstrap Statistics :
original bias std. error
t1* -0.0335 0.007078 0.12531
t2* 0.4079 0.000811 0.07318
t3* 0.0221 -0.000168 0.00294
# Poisson with NB1 standard errors (assumes variance is a multiple of the mean) w = theta*mu or w = mu*(1 + alpha)
#also called Poisson Quasi-MLE
poiss_qmle <- glm(docvis ~ outwork + age, data = rwm1984, family=quasipoisson())
print(summary(poiss_qmle),digits=4)
Call:
glm(formula = docvis ~ outwork + age, family = quasipoisson(),
data = rwm1984)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.033517 0.131962 -0.254 0.8
outwork 0.407931 0.063468 6.427 1.46e-10 ***
age 0.022084 0.002821 7.827 6.39e-15 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for quasipoisson family taken to be 11.34324)
Null deviance: 25791 on 3873 degrees of freedom
Residual deviance: 24190 on 3871 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 6
# MASS does just NB2
# install.packages("MASS")
library(MASS)
# Note: MASS reports theta = 1/alpha as the dispersion parameter, not alpha itself
# In this example R reports theta = 0.4353, so alpha = 1/0.4353 = 2.297
# NB2 with default standard errors
NB2.MASS <- glm.nb(docvis ~ outwork + age, data = rwm1984)
print(summary(NB2.MASS),digits=3)
Call:
glm.nb(formula = docvis ~ outwork + age, data = rwm1984, init.theta = 0.4353451729,
link = log)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.0396 0.1067 -0.37 0.71
outwork 0.4147 0.0554 7.48 7.5e-14 ***
age 0.0221 0.0024 9.24 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for Negative Binomial(0.4353) family taken to be 1)
Null deviance: 4100.8 on 3873 degrees of freedom
Residual deviance: 3909.6 on 3871 degrees of freedom
AIC: 16674
Number of Fisher Scoring iterations: 1
Theta: 0.4353
Std. Err.: 0.0135
2 x log-likelihood: -16665.5250
The R code below demonstrates the use of Weighted Least Squares (WLS) and explores its relationship with Generalized Linear Models (GLMs).
Weighted least squares
# Set the random seed for reproducibility
set.seed(101)
# Create predictor variable x
x <- seq(1, 3.5, 0.1)
x <- sort(rep(x, 10)) # Replicate values for simulation
# Define expected values of y
Ey <- 220 - 60 * x
# Generate response variable y from a Poisson distribution
y <- rpois(length(Ey), Ey)
# Fit an Ordinary Least Squares (OLS) model
m <- lm(y ~ x)
# Plot the data and OLS regression line
plot(x, y, xlab = "Price", ylab = "Number Sold")
abline(m)
# Residual plot for OLS model
plot(predict(m), resid(m), xlab = "Predicted Values", ylab = "Residuals")
abline(h = 0)
# Estimate weights for WLS (inverse of predicted values)
W <- 1 / predict(m)
# Fit a Weighted Least Squares (WLS) model
Wm <- lm(y ~ x, weights = W)
# Residual plot for WLS model
plot(predict(Wm), sqrt(W) * resid(Wm), xlab = "Predicted Values", ylab = "Weighted Residuals")
abline(h = 0)
# Fit a Poisson regression model (GLM)
glp.m <- glm(y ~ x, family = poisson(link = "identity"))
# Compare the models
summary(m)
Call:
lm(formula = y ~ x)
Residuals:
Min 1Q Median 3Q Max
-29.2760 -5.1955 -0.3457 4.6567 24.8014
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 222.4431 1.7288 128.67 <2e-16 ***
x -60.8978 0.7289 -83.55 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.815 on 258 degrees of freedom
Multiple R-squared: 0.9644, Adjusted R-squared: 0.9642
F-statistic: 6980 on 1 and 258 DF, p-value: < 2.2e-16
Call:
lm(formula = y ~ x, weights = W)
Weighted Residuals:
Min 1Q Median 3Q Max
-2.83173 -0.66938 -0.02412 0.58847 2.97862
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 223.055 1.751 127.4 <2e-16 ***
x -61.170 0.597 -102.5 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.9548 on 258 degrees of freedom
Multiple R-squared: 0.976, Adjusted R-squared: 0.9759
F-statistic: 1.05e+04 on 1 and 258 DF, p-value: < 2.2e-16
Call:
glm(formula = y ~ x, family = poisson(link = "identity"))
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 223.0681 1.8305 121.86 <2e-16 ***
x -61.1756 0.6227 -98.24 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 7382.88 on 259 degrees of freedom
Residual deviance: 236.76 on 258 degrees of freedom
AIC: 1817.1
Number of Fisher Scoring iterations: 4
The following R code estimates logistic regression coefficients using an iterative re-weighted least squares (IRLS) approach and compares it to a standard GLM.
# Response variable (admission outcome)
y <- mydata$admit
# Predictor variable (GRE scores)
x <- mydata$gre
# Initialize values
mu <- rep(mean(y), nrow(mydata)) # Initial fitted values
eta <- log(mu / (1 - mu)) # Initial linear predictor
# Iterate re-weighted least squares for 10 iterations
for (i in 1:10) {
w <- mu * (1 - mu) # Compute weights (variance)
z <- eta + (y - mu) / w # Adjusted response
mod <- lm(z ~ x, weights = w) # Weighted regression
eta <- mod$fitted.values # Update linear predictor
mu <- 1 / (1 + exp(-eta)) # Update fitted values
cat(i, coef(mod), "\n") # Print iteration results
}
1 -2.783528 0.003434139
2 -2.899506 0.003579592
3 -2.901344 0.003582211
4 -2.901344 0.003582212
5 -2.901344 0.003582212
6 -2.901344 0.003582212
7 -2.901344 0.003582212
8 -2.901344 0.003582212
9 -2.901344 0.003582212
10 -2.901344 0.003582212
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.901344270 0.6068541966 -4.780958 2.459924e-06
x 0.003582212 0.0009873336 3.628167 3.226024e-04
2.5 % 97.5 %
(Intercept) -4.094384619 -1.708303920
x 0.001641171 0.005523253
# Fit standard logistic regression using GLM
glm.mod <- glm(admit ~ gre, family = binomial, data = mydata)
# Model summary and confidence intervals
coef(summary(glm.mod))
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.901344270 0.6060376103 -4.787400 1.689561e-06
gre 0.003582212 0.0009860051 3.633056 2.800845e-04
2.5 % 97.5 %
(Intercept) -4.119988259 -1.739756286
gre 0.001679963 0.005552748