Office of Advanced Research and Computing (OARC), Statistical Methods and Data Analysis

The chart below shows examples of Generalized Linear Models (GLMs).

It is not realistic to cover all of the models in the chart in three hours.

In this workshop we will cover the most important aspects of GLMs, focusing on logistic regression, Poisson regression and, briefly, the negative binomial model, with examples in R.

Before we explore generalized linear models, it is a good idea to start with the linear regression model to understand why and how we extend it to generalized linear models.

- Regression analysis is a statistical tool that explains the relationship between a response (or dependent) variable and one or more predictor (independent) variables.

We use \(x\) for the predictor and \(y\) for the response variable.

In a **simple regression model** we observe \(n\) pairs of observations \((x_i, y_i)\), \(i = 1, 2, \cdots, n\), randomly (and most of the time independently) sampled from our population of interest, one pair per individual or case (unit of observation). We would like to investigate the relationship between \(x\) and \(y\).

In a **linear regression model** we assume this relationship is a **linear** function. We consider a simple linear model (a linear model with only one predictor) as:

\[y = \beta_0 + \beta_1 x + \epsilon\]

- The quantities \(\beta_0\) and \(\beta_1\) are called model parameters and are estimated from the data: \(\beta_0\) is the intercept of the line, \(\beta_1\) is the slope of the line, and \(\epsilon\) is the error term that represents unexplained uncertainty.

We can extend the model above to a model with multiple predictors:

\[y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots +\beta_p x_p + \epsilon\]

Here, \(\beta_0\) is the intercept, and each \(\beta_j\) is the slope for predictor \(x_j\) for \(j = 1,2, \cdots, p\).

\(\beta_0, \beta_1, \cdots, \beta_p\) are also called the coefficients of the model.

We use the index \(j\) for predictors and the index \(i\) for observations.

There is one more model parameter that we will estimate: the variance of the error, \(var(\epsilon)\) or \(\sigma^2\).

Under some assumptions, the linear regression model is estimated by the Ordinary Least Squares (OLS) method. In OLS we estimate the parameters of the model by minimizing the sum of squared errors. OLS has an algebraic solution, and the estimated parameters can be derived analytically.

The regression model has two parts: a linear function and an error term.

\[y = \overbrace{\beta_0 + \beta_1 x }^\text{linear function}+ \overbrace{\epsilon}^\text{error}\]

The error part represents the uncertainty in the observation and is a random variable with mean equal to zero and unknown variance \(\sigma^2\).

In OLS we make some statistical assumptions in order to:

1 - Estimate the model parameters (i.e., the intercept and slope of the fitted line and a measure of uncertainty).

2 - Make inferences about the model parameters.

- In summary, the assumptions of OLS are:

1 - Errors have mean equal to zero.

2 - Errors are independent.

3 - Errors have equal or constant variance (Homoscedasticity or Homogeneity of variance).

4 - Errors follow a normal distribution.

Briefly, OLS assumes that the error terms in the model are independent and identically distributed random variables from a normal distribution with mean equal to zero, for each randomly sampled outcome \(y\) at any given value of \(x\).

The linear function predicts the mean or average value of the outcome for a given value of the predictor. The mean value in mathematical statistics is denoted by E, the expected value. We show this by \(\mu_{y|x} = \text{E}(y | x)\), which reads as the expected value of \(y\) given \(x\).

In summary and for simplicity we write:

\[\mu = \text{E}(y |x) = \beta_0 + \beta_1 x\]

- In general, when we have multiple predictors, \(x_1, x_2, \cdots, x_p\), the predicted mean is:

\[\mu = \text{E}(y | x_1, x_2 , \cdots, x_p) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + ... + \beta_p x_p\]

*Note:* The predictor variables may be continuous, meaning they may assume all values within a range (for example, age or height); dichotomous, meaning they may assume only one of two values (for example, 0 or 1); or categorical with more than two levels. There is only one response or dependent variable, and it is assumed to be continuous.

In a regression model, the linear prediction is a linear combination of the predictors.

The linear prediction can take values from \(-\infty\) to \(+\infty\).

Questions that lead us to a more general linear regression model are:

- What should we do if we do not have a continuous outcome and the error does not have equal variance? What if the predicted value from the linear model is not plausible because of the range of the outcome?

For this example I use the elementary school api score data available on the OARC website.

- We want to study the relationship between academic performance (`api00`) and the number of students enrolled in a school (`enroll`). The first 6 observations look like this:

```
#reading the data from OARC website
dat <- read.csv(
"https://stats.oarc.ucla.edu/wp-content/uploads/2019/02/elemapi2v2.csv")
head(dat[,c("api00", "enroll")])
```

```
## api00 enroll
## 1 693 247
## 2 570 463
## 3 546 395
## 4 571 418
## 5 478 520
## 6 858 343
```

We think the relationship between \(y\) (the response) and \(x\) (the predictor) might be explained by a line.

We do not know the true relationship between \(y\) and \(x\).

In R we use the function `lm` to fit (estimate) the linear model, and we use `summary` to extract the results from the `lm` object.
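The summary output below can be reproduced with a call along these lines (the object name `m1` is an assumption, chosen to match the `abline(m1)` call used later):

```r
# read the elementary school api data from the OARC website
dat <- read.csv(
  "https://stats.oarc.ucla.edu/wp-content/uploads/2019/02/elemapi2v2.csv")

# fit the simple linear regression of api00 on enroll
m1 <- lm(api00 ~ enroll, data = dat)
summary(m1)
```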

```
##
## Call:
## lm(formula = api00 ~ enroll, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -285.50 -112.55 -6.70 95.06 389.15
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 744.25141 15.93308 46.711 < 2e-16 ***
## enroll -0.19987 0.02985 -6.695 7.34e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 135 on 398 degrees of freedom
## Multiple R-squared: 0.1012, Adjusted R-squared: 0.09898
## F-statistic: 44.83 on 1 and 398 DF, p-value: 7.339e-11
```

The estimated intercept is 744.25 and the estimated slope is about -0.2.

The intercept means that for a school with zero enrollment the expected api score is 744.25 on average. Usually we do not interpret the intercept; here, for example, the expected api score for a school with zero students does not make sense. If we want an interpretable intercept, we can center enrollment at its mean.

A slope of -0.2 means that for each additional enrolled student the expected api score decreases by 0.2, or for every 10 additional students, the expected api score decreases by 2. The p-value for the hypothesis that the slope equals zero shows that, if we assume enrollment has no effect on school performance, a decrease of this size is unlikely to occur by chance.

- In R we can use the function `plot` to draw the scatter plot. I used `abline(m1)` after the scatter plot to add the regression line.

The goal is to estimate the parameters \(\beta_0\) and \(\beta_1\), the **intercept** and the **slope** of the line, so that the line fits the data best. The estimated model parameters are denoted \(\hat{\beta}_0\) and \(\hat{\beta}_1\).

The estimated model will be

\[\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x\]

\(\hat{y}\) represents the predicted mean of the outcome for a given value of \(x\). We also call it the fitted value.

The **predicted value** for each observation can be calculated from the fitted line as \(\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i\).

The difference between the observed value \(y_i\) and the predicted value \(\hat{y}_i\) is called the **residual**:

\[ e_i = y_i - \hat{y_i}\]

- In OLS we estimate the parameters of the model (intercept and slope) by minimizing the sum of squared residuals:

\[ \min \sum_{i = 1}^{n} (y_i - \hat{y_i}) ^ 2\]

This method is called Least Squares (LS).
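For the simple linear model this minimization has a closed-form solution, \(\hat{\beta}_1 = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sum_i (x_i - \bar{x})^2\) and \(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\). A quick sketch on simulated data (all names here are illustrative) confirms it matches `lm`:

```r
set.seed(1)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)   # true intercept 1, true slope 2

# closed-form least squares estimates
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# they agree with lm() up to floating-point error
rbind(closed_form = c(b0, b1), lm = unname(coef(lm(y ~ x))))
```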

Let’s look at the components of uncertainty in the linear model by plotting the fitted line in the scatter plot.

In the plot above,

\(y_i - \bar{y}\) is the overall or total uncertainty, which we try to explain with the linear model.

\(\hat{y}_i - \bar{y}\) is the uncertainty that can be explained by the linear model.

\(y_i - \hat{y}_i\) is the residual, the uncertainty that cannot be explained by the linear model.

In linear regression, the likelihood function (the probability of observing the outcome given the parameters) can be determined using the *normal distribution*. In fact, we can use another statistical method to estimate the parameters of the model, called the *Maximum Likelihood Method*. Under the OLS assumptions, the maximum likelihood estimates are equivalent to the least squares estimates.

- We will look at model diagnostic plots to evaluate the model assumptions.

This plot shows that there is no strong evidence against the assumptions of zero mean and constant variance of the error.

- Recall the linear regression model, where the outcome is continuous and the error has a normal distribution; the linear prediction was:

\[ \mu = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + ... + \beta_p x_p\]

There are cases where the outcome for a given set of predictors can only take positive values, or values in the interval \((0, 1)\).

If the outcome is from a count distribution taking values \(0, 1, 2, \cdots\), like the Poisson distribution, the mean is always positive (\(\mu > 0\)).

If the outcome is dichotomous and can take only the values 0 and 1 (Bernoulli distribution), the mean is between 0 and 1.

- In these two examples the linear prediction does not work correctly.

In generalized linear models we use a link function to transform the linear prediction so that we get a plausible expected outcome.

For example, if the mean is in the interval \((0, 1)\), then we can use the **logit** function to map a value between 0 and 1 to a value between \(-\infty\) and \(+\infty\). The chart below shows the steps:
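In R the logit and its inverse are available as `qlogis` and `plogis` (equivalently, `log(p/(1 - p))` and `exp(x)/(1 + exp(x))`). A small sketch:

```r
p <- 0.3
qlogis(p)          # logit: log(0.3 / 0.7), about -0.847
log(p / (1 - p))   # the same value computed by hand

plogis(qlogis(p))  # the inverse logit maps it back to 0.3
```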

- The table below (from wikipedia page for GLM) shows link functions for different distributions and regression models.

In the next slides we look at these two functions a little more.

Suppose the probability of getting admitted to a graduate school on average is 0.3.

Then we define the odds of getting admitted to a graduate school to be:

\[ \tt{odds(admit)} = \dfrac{p}{1 - p} = \dfrac{0.3}{1 - 0.3} = 0.4285714 \]

- and odds of not getting admitted to a graduate school will be:

\[ \tt{odds(not \ admit)} = \dfrac{p}{1 - p} = \dfrac{0.7}{1 - 0.7} = 2.333333 \]

As we can see, the odds is a positive value between 0 and infinity, and the inverse transformation, \(p = \text{odds}/(1 + \text{odds})\), maps it back to a number between 0 and 1.

Question: CNN analysts predict that the Los Angeles Lakers have odds of 9 to 1 of making the playoffs. What is the probability that the Lakers make the playoffs?

\(odds = \dfrac{p}{1 - p}\)

Let's calculate the probability that the Lakers make the playoffs. From the odds formula:

\(p = odds \times (1 - p)\), so \(p + p \times odds = odds\), and

\(p = \dfrac{odds} {1 + odds} = 9 / (9 + 1) = 0.9\)
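The same conversion in R:

```r
odds <- 9              # 9-to-1 odds
p <- odds / (1 + odds)
p                      # 0.9
```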

Another useful function is the log. If we have a value between zero and infinity, then the log of that value is between negative and positive infinity.

The inverse of the log function is the exponential function. We usually use the natural log and exponential function.

We can use the log(odds) transformation to map a value between zero and one to a value between negative and positive infinity. This function, log(odds), is called the logit function.

The exponential function plays a big role in generalized linear models, and understanding its mathematical properties is needed to really understand them.

For example, for \(10^3 = 1000\),

\(\log_{10}(1000) = 3\)

We read this as "the logarithm base 10 of 1000 equals 3."

\(\log_{2}(64) = 6\), so

\(2^6 = 64\).

\[\displaystyle e=\sum \limits _{n=0}^{\infty }{\frac {1}{n!}}={\frac {1}{1}}+{\frac {1}{1}}+{\frac {1}{1\times 2}}+{\frac {1}{1\times 2\times 3}}+\cdots \approx 2.718282.\]

Logarithm with the base equal to \(e\) is called the natural logarithm.

If \(e^6 = a\) Then \(\log_e(a) = ?\)

The answer is 6.

\(e^x\) can also be noted as \(\exp(x)\), and \(\log_e(x)\) is also written \(\ln(x)\).

Since \(\exp(x + y) = \exp(x)\exp(y)\), we have

\(\log(xy) = \log(x) + \log(y)\) and

\(\log(x/y) = \log(x) - \log(y)\).

In linear regression models we assumed the errors are distributed from a normal distribution with mean zero and variance \(\sigma^2\).

We can also say the outcome for a given \(x\) (i.e. \(y|x\)) has a normal distribution with mean \(\mu = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + ... + \beta_p x_p\) and variance \(\sigma^2\).

This is not always true; for example, a \(\{0, 1\}\) outcome has a Bernoulli distribution, and its variance depends on its mean.

In generalized linear regression models we assume a transformation (by the link function) of the mean of the outcome has a linear relationship with predictors.

In the next few slides we review some of the special distributions.

For example, the weight of newborn babies.

The graph below shows the PDF of a normal distribution with \(\mu = 2\) and \(\sigma = 1\).

For example, the number of times we flip a coin until we observe a head for the first time; this value is in the set \(\{1, 2, 3, ...\}\).

The sum of the outcomes from rolling two dice is a random variable that takes values in the set \(\{2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12\}\).

The number of patients hospitalized daily for COVID-19 in a healthcare facility.

**Probability Mass Function**

For example, the PMF of the random variable for the outcome of rolling a die is:

\(p(X = x) = 1/6\)

for \(x \in \{1, 2, 3, 4, 5, 6\}\).

A Bernoulli random variable is a random variable that can take only two possible outcomes: 1 with probability \(p\) and 0 with probability \(q = 1 - p\).

The PMF of a Bernoulli random variable with outcomes zero and one is:

\[f(x;p)=\begin{cases} 1 - p & {\text{if }}x=0,\\p & {\text{if }}x=1.\end{cases}\]

The mean of a Bernoulli random variable is:

\(\mu_X = E(X) = 1 \times p + 0 \times (1 - p) = p\)

and the variance is

\(\sigma^2_X = var(X) = E(X- \mu)^2 = (1 - p) ^ 2 \times p + (0 - p) ^ 2 \times (1 - p) \\ = p(1 - p) ( 1 - p + p) = p(1 - p)\)

The PMF of the Binomial(n, p) distribution is (optional):

\[\displaystyle f(x; n,p)=\Pr(x;n,p)=\Pr(X=x)=\frac{n!}{x!(n-x)!} p^{x}(1-p)^{n-x}\]

for \(x = 0, 1, 2, 3, \cdots, n\).

For example, with \(n = 10\) and \(p = 0.2\), the probability of zero successes is:

\(P(X = 0) = [10! / (0!(10 - 0)!)] \, 0.2^0 (1 - 0.2)^{10} = 0.8^{10} \approx 0.107\)
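The same probability in R, using `dbinom` with \(n = 10\) and \(p = 0.2\):

```r
# P(X = 0) for a Binomial(n = 10, p = 0.2) random variable
dbinom(0, size = 10, prob = 0.2)   # equals 0.8^10, about 0.107
```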

- Logistic regression is used when the outcome takes only the values 0 and 1.

We start with an example:

- The data that we are using here is called binary data from OARC stat website.

A researcher is interested in how variables such as GRE (Graduate Record Exam scores), GPA (grade point average), and the prestige of the undergraduate institution affect admission into graduate school. The response variable, admit/don't admit, is a binary variable.

This data set has a binary response (outcome, dependent) variable called admit.

There are three predictor variables: gre, gpa and rank.

We will treat the variables gre and gpa as continuous.

The variable rank takes on the values 1 through 4. Institutions with a rank of 1 have the highest prestige, while those with a rank of 4 have the lowest.

We can get basic descriptives for the entire data set by using `summary`. To get the standard deviations, we use `sapply` to apply the `sd` function to each variable in the data set.

```
#reading the data from OARC stat website:
mydata <- read.csv("https://stats.oarc.ucla.edu/stat/data/binary.csv")
#summary statistics of variables
summary(mydata)
```

```
## admit gre gpa rank
## Min. :0.0000 Min. :220.0 Min. :2.260 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:520.0 1st Qu.:3.130 1st Qu.:2.000
## Median :0.0000 Median :580.0 Median :3.395 Median :2.000
## Mean :0.3175 Mean :587.7 Mean :3.390 Mean :2.485
## 3rd Qu.:1.0000 3rd Qu.:660.0 3rd Qu.:3.670 3rd Qu.:3.000
## Max. :1.0000 Max. :800.0 Max. :4.000 Max. :4.000
```

```
## admit gre gpa rank
## 0.4660867 115.5165364 0.3805668 0.9444602
```
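The standard deviations printed above come from applying `sd` to each column with `sapply`, as described:

```r
# reading the data from the OARC stat website
mydata <- read.csv("https://stats.oarc.ucla.edu/stat/data/binary.csv")
# standard deviation of each variable
sapply(mydata, sd)
```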

This can also be seen as an intercept-only regression model.

The variable admit is binary: it has the value 1 if a person was admitted and 0 if not. For the variable admit, the mean is the same as the proportion of 1s, i.e., the estimated probability of getting admitted to grad school.

\[ \hat{p} = \dfrac{X}{n} = \dfrac{\sum_{i=1}^nx_i}{n} \]

where \(X\) is the total number of people admitted, \(n\) is the total number of students, and \(x_i\) is the value of admit for the \(i\)-th student.

We can calculate the standard deviation directly from the mean.

Note that this variable is binary; assuming the binary outcome is the result of a Bernoulli trial, \(\hat\mu_{admit} = \hat p = \sum_{i=1}^{n} x_i /n\).

Since for the Bernoulli distribution the variance is \(var(x) = p(1-p)\), we can use this formula to calculate the variance.

Remember that in OLS the error, and therefore the outcome, is assumed to be normally distributed, and the variance is assumed to be constant; here, however, the variance depends on the predicted mean.

Of course, not in the intercept-only model, because there the expected mean is constant!

Now let's run a regression model of admit predicted by gre, treating admit as a continuous variable.

We are trying to estimate the mean of admit for a given value of gre. Since admit is 0 or 1, this can be seen as the probability of admission.
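The output below appears to come from a call like the following (the object name `lm.mod` is an assumption):

```r
mydata <- read.csv("https://stats.oarc.ucla.edu/stat/data/binary.csv")
# linear (OLS) model treating the binary admit as continuous
lm.mod <- lm(admit ~ gre, data = mydata)
summary(lm.mod)
```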

```
##
## Call:
## lm(formula = admit ~ gre, data = mydata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.4755 -0.3415 -0.2522 0.5989 0.8966
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.1198407 0.1190510 -1.007 0.314722
## gre 0.0007442 0.0001988 3.744 0.000208 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4587 on 398 degrees of freedom
## Multiple R-squared: 0.03402, Adjusted R-squared: 0.03159
## F-statistic: 14.02 on 1 and 398 DF, p-value: 0.0002081
```

As we can see, the variance of the outcome is systematically related to the predicted outcome, and thus is not constant.

Another issue with this model is that:

In a linear model we allow the predicted outcome to be any number from negative infinity to positive infinity.

But a probability cannot be a negative number or a number greater than one. So, we need to use a transformation function to relate the outcome to a value between 0 and 1.

To overcome this issue, and also the violation of the equal-variance assumption of OLS, we need to use a different model.

- In logistic regression we model the log odds of the outcome as a linear function of the predictors:

\[ \log(odds \mid x_1, x_2 , \cdots, x_p) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + ... + \beta_p x_p\]

Or,

\[\frac{p}{1-p} = e^{(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + ... + \beta_p x_p)}\]

or,

\[p = \frac{e^{(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + ... + \beta_p x_p)}}{1+e^{(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + ... + \beta_p x_p)}} \]

Or,

\[p = \frac{1} {1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + ... + \beta_p x_p)}}\]

Notice that we do not need an \(\epsilon\) term to indicate the uncertainty of the outcome, because the probability itself is the expected value of the outcome, and the outcome is a single Bernoulli draw with this probability.

In R we use the function `glm` to run a generalized linear model and estimate the coefficients. The distribution of the outcome is defined by the argument `family`, with the option `link = "logit"` for the link function.

```
#to estimate coefficient of the model we use glm with
#the option family = binomial(link = "logit")
glm.mod <- glm(admit ~ gre, family = binomial(link = "logit"), data = mydata)
summary(glm.mod)
```

```
##
## Call:
## glm(formula = admit ~ gre, family = binomial(link = "logit"),
## data = mydata)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.1623 -0.9052 -0.7547 1.3486 1.9879
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.901344 0.606038 -4.787 1.69e-06 ***
## gre 0.003582 0.000986 3.633 0.00028 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 499.98 on 399 degrees of freedom
## Residual deviance: 486.06 on 398 degrees of freedom
## AIC: 490.06
##
## Number of Fisher Scoring iterations: 4
```

```
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.901344270 0.6060376103 -4.787400 1.689561e-06
## gre 0.003582212 0.0009860051 3.633056 2.800845e-04
```

```
## 2.5 % 97.5 %
## (Intercept) -4.119988259 -1.739756286
## gre 0.001679963 0.005552748
```
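The coefficient table and confidence intervals above can be extracted from the fitted object; `confint` on a `glm` uses profile likelihood, which matches the printed interval:

```r
mydata <- read.csv("https://stats.oarc.ucla.edu/stat/data/binary.csv")
glm.mod <- glm(admit ~ gre, family = binomial(link = "logit"), data = mydata)

summary(glm.mod)$coefficients   # the coefficient table
confint(glm.mod)                # profile-likelihood confidence intervals
```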

\[ \log(\text{odds of admit}) = -2.901344 + 0.003582 \times \text{GRE} \]

- The coefficient of GRE (the slope of the model) is interpreted as: for a one-unit increase in GRE, the log odds of admission is expected to increase by 0.003582 on average.

This is not very useful or easy to understand. We can only see that this positive relationship is unlikely to be due to chance (p-value = 0.00028).

The intercept is the expected log odds of someone getting admitted to graduate school when their GRE score is zero.

When we exponentiate the intercept, we get the expected odds for someone with a GRE equal to zero.

\(\exp(-2.901344) = 0.05494932\). Transforming to a probability, \(p = 0.05494932/(1+0.05494932) = 0.052\), we get the expected probability of admission for someone with a GRE of 0.

What if we exponentiate the slope?

Assume someone has a GRE score of 500. Then the predicted odds based on the model will be:

\[odds_{500} = e^{-2.901344 + 0.003582 \times 500}\]

If their GRE score increases by one unit then the odds will be:

\[odds_{501} = e^{-2.901344 + 0.003582 \times (500 + 1)}\]

If we divide the latter by the former we have:

\[\frac{odds_{501}}{odds_{500}} = \frac{e^{-2.901344 + 0.003582 \times (500 + 1)}}{e^{-2.901344 + 0.003582 \times (500)}} = \frac{e^{-2.901344}\times e ^{0.003582\times 500} \times e^{(0.003582)}}{e^{-2.901344}\times e ^{0.003582\times 500}} = e^{0.003582} \]

- This number is constant: it does not depend on the GRE score or the intercept. It is the exponentiated slope.

Thus,

The exponentiated slope is the odds ratio of admission for a one-unit increase in GRE score:

\[\exp(0.003582) = 1.003588\]

In other words, the odds of getting admitted increase on average by about \(0.36\%\) for a one-point improvement in GRE score. This is the old GRE scale, which at the time was scored in 10-point increments. What if someone improves their GRE by 50 points?

- With the same calculations:

\[\frac{odds_{550}}{odds_{500}} = \frac{e^{-2.901344 + 0.003582 \times (500 + 50)}}{e^{-2.901344 + 0.003582 \times (500)}} = \frac{e^{-2.901344}\times e ^{0.003582\times 500} \times e^{50\times(0.003582)}}{e^{-2.901344}\times e ^{0.003582\times 500}} = (e^{0.003582})^{50} \]

The exponential function grows multiplicatively, not additively. Thus, for 50 more GRE points the odds of admission increase by a factor of \((\exp(\beta_1))^{50} = 1.003588^{50} = 1.196115\). In other words, the odds of getting admitted increase by about \(20\%\).
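This multiplicative behavior is easy to verify numerically (the slope value is copied from the model output):

```r
b1 <- 0.003582    # estimated slope for gre
exp(b1)           # odds ratio for a 1-point increase, about 1.003588
exp(b1)^50        # odds ratio for a 50-point increase
exp(50 * b1)      # the same value, since (e^b)^50 = e^(50 b)
```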

- Exercise: Using the same data set, run a logistic regression of admit predicted by gpa.

Calculate the exponentiated slope and interpret it.

If someone increases their gpa by 0.2, how do the expected odds of admission change?

```
glm.mod.3 <- glm(admit ~ gre + gpa, family = binomial(link = "logit"), data = mydata)
summary(glm.mod.3)
```

```
##
## Call:
## glm(formula = admit ~ gre + gpa, family = binomial(link = "logit"),
## data = mydata)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.2730 -0.8988 -0.7206 1.3013 2.0620
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.949378 1.075093 -4.604 4.15e-06 ***
## gre 0.002691 0.001057 2.544 0.0109 *
## gpa 0.754687 0.319586 2.361 0.0182 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 499.98 on 399 degrees of freedom
## Residual deviance: 480.34 on 397 degrees of freedom
## AIC: 486.34
##
## Number of Fisher Scoring iterations: 4
```

```
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.949378063 1.075092882 -4.603675 4.151004e-06
## gre 0.002690684 0.001057491 2.544403 1.094647e-02
## gpa 0.754686856 0.319585595 2.361455 1.820340e-02
```

```
## (Intercept) gre gpa
## 0.007087816 1.002694307 2.126945379
```

```
## 2.5 % 97.5 %
## (Intercept) -7.1092205752 -2.886726963
## gre 0.0006373958 0.004791712
## gpa 0.1347509288 1.390131131
```

The odds of getting admitted increase on average by a factor of \(2.1269\) for a one-unit improvement in GPA, holding GRE score constant.

```
glm.mod.4 <- glm(admit ~ gre * gpa, family = binomial(link = "logit"), data = mydata)
summary(glm.mod.4)
```

```
##
## Call:
## glm(formula = admit ~ gre * gpa, family = binomial(link = "logit"),
## data = mydata)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.1324 -0.9387 -0.7099 1.2969 2.3519
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -13.818700 5.874816 -2.352 0.0187 *
## gre 0.017464 0.009606 1.818 0.0691 .
## gpa 3.367918 1.722685 1.955 0.0506 .
## gre:gpa -0.004327 0.002787 -1.552 0.1205
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 499.98 on 399 degrees of freedom
## Residual deviance: 477.85 on 396 degrees of freedom
## AIC: 485.85
##
## Number of Fisher Scoring iterations: 4
```

```
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -13.818700374 5.874815854 -2.352193 0.01866309
## gre 0.017464312 0.009606206 1.818024 0.06906048
## gpa 3.367918022 1.722685231 1.955040 0.05057838
## gre:gpa -0.004326605 0.002786899 -1.552480 0.12054741
```

```
## (Intercept) gre gpa gre:gpa
## 9.968153e-07 1.017618e+00 2.901805e+01 9.956827e-01
```

```
## 2.5 % 97.5 %
## (Intercept) -2.576285e+01 -2.677081148
## gre -9.057462e-04 0.036855984
## gpa 8.009849e-02 6.850511379
## gre:gpa -9.932612e-03 0.001022485
```

`## [1] 3.335476`

`## [1] 3.335407`

The interaction term acts as a multiplicative factor: for GPA, for example, it increases/decreases (here decreases) the odds of the outcome by a factor of 0.9956827 for every unit of GRE.

For example, if someone has a GRE of 500, then for every 1-unit increase in their GPA the odds of admission change by a factor of \(29.01805 \times 0.9956827^{500} = 3.3354\).

Or we can first calculate \(3.367918022 + 500 \times (-0.004326605)\) and then take the exponential.
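Both routes can be checked numerically (coefficients copied from the model output above); they agree exactly, and they match the printed values up to rounding:

```r
b_gpa <- 3.367918022     # estimated gpa coefficient
b_int <- -0.004326605    # estimated gre:gpa interaction

exp(b_gpa) * exp(b_int)^500   # multiply the exponentiated coefficients
exp(b_gpa + 500 * b_int)      # or sum on the log-odds scale, then exponentiate
```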

There is more to say about logistic regression analysis, but we do not have time to go over it all.

- For the problem of complete or quasi-complete separation in logistic/probit models, see the following page:

To review the assumptions of logistic regression you can visit this page:

The analysis there is in Stata.

The Poisson distribution with mean \(\mu\) has PMF:

\[Pr(Y = y) = \dfrac{e^{-\mu}(\mu)^y}{y!}, \text{ for } y = 0, 1, 2, \dots .\]

It can be shown that a Binomial distribution converges to a Poisson distribution as \(n\) goes to infinity and \(p\) goes to zero such that \(np \rightarrow \mu\).

Suppose the number of accidents on a sunny day on the 405 freeway between Santa Monica and Wilshire Blvd is Poisson with a rate of 2 accidents per day. Then the probabilities of 0, 1, 2, …, 8 accidents can be calculated using the Poisson PMF:

\(P(Y = 0| \mu = 2) = \dfrac{e^{-2}(2)^0}{0!} = 1/e^2 \approx 0.1353\)

\(P(Y = 1| \mu = 2) = \dfrac{e^{-2}(2)^1}{1!} = 2/e^2 \approx 0.2707\)

\(P(Y = 2| \mu = 2) = \dfrac{e^{-2}(2)^2}{2!} = 2/e^2 \approx 0.2707\)

\(P(Y = 3| \mu = 2) = \dfrac{e^{-2}(2)^3}{3!} = 8/(6e^2) \approx 0.1804\)

\(P(Y = 4| \mu = 2) = \dfrac{e^{-2}(2)^4}{4!} = 16/(24e^2) \approx 0.0902\)

\(P(Y = 5| \mu = 2) = \dfrac{e^{-2}(2)^5}{5!} = 32/(120e^2) \approx 0.0361\)

\(P(Y = 6| \mu = 2) \approx 0.0120\)

\(P(Y = 7| \mu = 2) \approx 0.0034\)

\(P(Y = 8| \mu = 2) \approx 0.0009\)
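All nine probabilities can be computed in one call with `dpois`:

```r
# P(Y = y) for y = 0, ..., 8 when the rate is 2 accidents per day
round(dpois(0:8, lambda = 2), 4)
```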

The following graph shows the PMF of a Poisson random variable with \(\mu = 2\).

This looks like a binomial distribution, but it is not exactly the same!

Now assume the number of accidents on a rainy day on the 405 freeway between Santa Monica and Wilshire Blvd is Poisson with a rate of 5.5 accidents per day. Then the probability of each of \(x = 0, 1, 2, \cdots, 8\) accidents, computed in R, is:

```
## [1] 0.004086771 0.022477243 0.061812418 0.113322766 0.155818804 0.171400684
## [7] 0.157117294 0.123449302 0.084871395
```
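The output above is consistent with `dpois` at rate 5.5:

```r
dpois(0:8, lambda = 5.5)
```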

We can also say the mean number of accidents per day on a rainy day is 2.75 times that on a sunny day.

A sample of 20 numbers from (approximately) this distribution will look like this:

```
prob <- c(0.1353, 0.2707, 0.2707, 0.1804, 0.0902, 0.0361, 0.0120, 0.0034, 0.0009)
domain <- seq(0,8,1)
set.seed(1001)
y1 <- sample(x = domain, size = 20, replace = TRUE, prob = prob)
y1
```

`## [1] 6 2 2 2 2 4 1 1 2 0 2 1 4 2 1 4 1 2 1 0`

`## [1] 2`

`## [1] 2.210526`
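The two values printed above appear to be the sample mean and variance of `y1` (an assumption, but hard-coding the printed sample reproduces them exactly):

```r
# the sample exactly as printed above
y1 <- c(6, 2, 2, 2, 2, 4, 1, 1, 2, 0, 2, 1, 4, 2, 1, 4, 1, 2, 1, 0)
mean(y1)   # 2
var(y1)    # 2.210526
```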

Now let's add some zeroes and some larger values to our sample:

```
y2 <- c(y1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 5, 6, 8, 5, 6, 7, 4, 3, 6, 10, 15, 20)
mean(y2)
```

`## [1] 2.213115`

`## [1] 14.07049`
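The value 14.07049 is presumably `var(y2)`. Note how the extra zeroes and large values push the variance far above the mean, a sign of overdispersion relative to the Poisson assumption:

```r
y1 <- c(6, 2, 2, 2, 2, 4, 1, 1, 2, 0, 2, 1, 4, 2, 1, 4, 1, 2, 1, 0)
y2 <- c(y1, rep(0, 29), 5, 6, 8, 5, 6, 7, 4, 3, 6, 10, 15, 20)
c(mean = mean(y2), variance = var(y2))
```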

We can also generate a Poisson sample directly with the `rpois()` function in R; the sample mean and variance below are both close to \(\mu = 2\):
`## [1] 1.958`

`## [1] 2.038274`
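For example (the seed here is arbitrary; the printed values 1.958 and 2.038274 came from a different, unrecorded draw):

```r
set.seed(123)
y <- rpois(1000, lambda = 2)
mean(y)   # close to 2
var(y)    # also close to 2: for a Poisson variable, mean = variance
```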

**Poisson distribution for unit time and time \(t\)**

The basic Poisson distribution assumes that each count occurs over one unit of time. However, we often count events over a period of time or over various areas. For example, if accidents occur on average 2 times per day, then over the course of a month (30 days) the expected number of accidents will be \(2 \times 30 = 60\), while the rate is still 2 accidents per day. The rate parameterization of the Poisson distribution is:

\[Pr(Y = y) = \dfrac{e^{-t\mu}(t\mu)^y}{y!}, \text{ for } y = 0, 1, 2, \dots .\]

where \(t\) represents the length of time. We will come back to this when we talk about exposure in Poisson regression.

Count data are observations that take only non-negative integer values.

Counts can range from zero to some greater, undetermined value.

Theoretically a count can range from zero to infinity, but in practice counts are always limited to some lesser value.

Some types of count data:

**Examples of count data:**

There are two problems with this approach:

**Formalization of the count model**

A count model with one predictor can be formalized as:

\[\text{log of mean response} = \ln(\mu) = \beta_0 + \beta_1 x\]

From this model the expected count or mean response will be:

\[\text{mean response} = \mu = E(y|x) = \exp( \beta_0 + \beta_1 x)\]

This ensures \(\mu\) will be positive for any value of \(\beta_0\), \(\beta_1\), or \(x\).

**The Poisson model**

The Poisson model assumes that the count \(y|x\) has a Poisson distribution with the above mean. For an outcome \(y\) and a single predictor \(x\) we have:

\[ f(y | x; \beta_0, \beta_1) = \dfrac{e^{-\exp( \beta_0 + \beta_1 x)}(\exp( \beta_0 + \beta_1 x))^{y}}{y!}\]

The plot of the distribution of the response for a Poisson regression model looks like this:

- In contrast, the plot below shows the distribution of the responses when the response has a normal distribution for a given predictor.

Back to our toy example of the number of accidents on the 405. Let \(x\) be a binary variable: 0 for a sunny day and 1 for a rainy day.

We have a sample of 40 days, 20 sunny and 20 rainy, drawn from Poisson distributions with rates 2 and 5, respectively.

In fact, we can artificially generate this sample using R:

```
y <- c(rpois(20, lambda = 2), rpois(20, lambda = 5))
x <- rep(c(0,1), each = 20)
print(data.frame(y = y,x = x))
```

```
## y x
## 1 2 0
## 2 3 0
## 3 0 0
## 4 4 0
## 5 2 0
## 6 5 0
## 7 1 0
## 8 3 0
## 9 4 0
## 10 1 0
## 11 3 0
## 12 2 0
## 13 2 0
## 14 0 0
## 15 2 0
## 16 6 0
## 17 2 0
## 18 1 0
## 19 2 0
## 20 1 0
## 21 9 1
## 22 10 1
## 23 1 1
## 24 4 1
## 25 6 1
## 26 5 1
## 27 5 1
## 28 7 1
## 29 8 1
## 30 6 1
## 31 6 1
## 32 7 1
## 33 6 1
## 34 5 1
## 35 5 1
## 36 6 1
## 37 6 1
## 38 4 1
## 39 8 1
## 40 5 1
```
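The summary output below comes from fitting the model with `glm`; the object name `pois_toy` is ours:

```
# fit the Poisson regression of accident counts on the rain indicator
pois_toy <- glm(y ~ x, family = poisson(link = "log"))
summary(pois_toy)
```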

```
##
## Call:
## glm(formula = y ~ x, family = poisson(link = "log"))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.5166 -0.4006 -0.2024 0.4407 2.0264
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.8329 0.1474 5.649 1.61e-08 ***
## x 0.9505 0.1736 5.475 4.38e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 69.616 on 39 degrees of freedom
## Residual deviance: 36.173 on 38 degrees of freedom
## AIC: 160.7
##
## Number of Fisher Scoring iterations: 5
```

The coefficients are now on the log scale.

- Let’s exponentiate the coefficients:

```
## (Intercept) x
## 2.300000 2.586957
```

- We can also find the rate of accidents for each group (sunny or rainy) directly from the group means; the ratio of the rainy-day mean to the sunny-day mean equals the exponentiated coefficient of \(x\).

`## [1] 2.300000 5.950000 2.586957`
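The three printed values are the sunny-day mean, the rainy-day mean, and their ratio, which can be computed directly from the simulated `y` and `x`:

```
# group means and their ratio; the ratio matches exp(coef) for x
c(mean(y[x == 0]), mean(y[x == 1]), mean(y[x == 1]) / mean(y[x == 0]))
```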

This part is just for your information and is optional.

If we have more than one predictor, then for the \(i\)th observation we have:

\[\mu_i = E(y_i | x_{i1} , x_{i2}, \dots, x_{ip};\beta) = \exp(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots+ \beta_p x_{ip})\]

For simplicity we write \(\boldsymbol{x}_i'\boldsymbol{\beta} = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots+ \beta_p x_{ip}\), and therefore we have

\[P(Y = y_i|\boldsymbol{x}_i, \boldsymbol{\beta}) = \dfrac{e^{-\exp(\boldsymbol{x}_i'\boldsymbol{\beta})}(\exp(\boldsymbol{x}_i'\boldsymbol{\beta}))^{y_i}}{y_i!}\] and the log-likelihood function will be:

\[\cal L(\boldsymbol{\beta}) = \sum_{i=1}^n [y_i \boldsymbol{x}_i'\boldsymbol{\beta} - \exp(\boldsymbol{x}_i'\boldsymbol{\beta}) - \ln y_i!]\]
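The log-likelihood above can be coded directly and checked against `logLik` on simulated data (all names here are ours):

```
# Poisson log-likelihood: sum of y_i * x_i'beta - exp(x_i'beta) - log(y_i!)
pois_loglik <- function(beta, X, y) {
  eta <- drop(X %*% beta)
  sum(y * eta - exp(eta) - lgamma(y + 1))
}
set.seed(42)
Xs <- cbind(1, rnorm(100))                     # design matrix with intercept
ys <- rpois(100, exp(Xs %*% c(0.5, 0.2)))      # simulated Poisson counts
fit_sim <- glm(ys ~ Xs[, 2], family = poisson)
all.equal(pois_loglik(coef(fit_sim), Xs, ys), as.numeric(logLik(fit_sim)))  # TRUE
```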

For this part we will use the German national health registry data set as an example of Poisson regression. The data set is available in the R package COUNT. We will use only year 1984.

Source: German Health Reform Registry, years pre-reform 1984-1988, From Hilbe and Greene (2008)

The data can be accessed by installing the package `COUNT`. The response variable is `docvis`, the number of doctor visits.

We use `outwork` and `age` as predictors. `outwork` is a binary variable with 1 = not working and 0 = working. `age` is a continuous variable ranging from 25 to 64.

Since age ranges from 25 to 64, and age 0 is not a valid value, we center the age variable so that the intercept is interpretable.

**R code for loading the data and descriptive statistics**
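A sketch of the loading code (the `rwm1984` data are included in the COUNT package):

```
# install.packages("COUNT")  # if not already installed
library(COUNT)
data(rwm1984)
head(rwm1984)
summary(rwm1984)
```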

```
## docvis hospvis edlevel age outwork female married kids hhninc educ self
## 1 1 0 3 54 0 0 1 0 3.050 15.0 0
## 2 0 0 1 44 1 1 1 0 3.050 9.0 0
## 3 0 0 1 58 1 1 0 0 1.434 11.0 0
## 4 7 2 1 64 0 0 0 0 1.500 10.5 0
## 5 6 0 3 30 1 0 0 0 2.400 13.0 0
## 6 9 0 3 26 1 0 0 0 1.050 13.0 0
## edlevel1 edlevel2 edlevel3 edlevel4
## 1 0 0 1 0
## 2 1 0 0 0
## 3 1 0 0 0
## 4 1 0 0 0
## 5 0 0 1 0
## 6 0 0 1 0
```

```
## docvis hospvis edlevel age
## Min. : 0.000 Min. : 0.0000 Min. :1.00 Min. :25
## 1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.:1.00 1st Qu.:35
## Median : 1.000 Median : 0.0000 Median :1.00 Median :44
## Mean : 3.163 Mean : 0.1213 Mean :1.38 Mean :44
## 3rd Qu.: 4.000 3rd Qu.: 0.0000 3rd Qu.:1.00 3rd Qu.:54
## Max. :121.000 Max. :17.0000 Max. :4.00 Max. :64
## outwork female married kids
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:1.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :1.0000 Median :0.0000
## Mean :0.3665 Mean :0.4793 Mean :0.7891 Mean :0.4491
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## hhninc educ self edlevel1
## Min. : 0.015 Min. : 7.00 Min. :0.00000 Min. :0.0000
## 1st Qu.: 2.000 1st Qu.:10.00 1st Qu.:0.00000 1st Qu.:1.0000
## Median : 2.800 Median :10.50 Median :0.00000 Median :1.0000
## Mean : 2.969 Mean :11.09 Mean :0.06118 Mean :0.8136
## 3rd Qu.: 3.590 3rd Qu.:11.50 3rd Qu.:0.00000 3rd Qu.:1.0000
## Max. :25.000 Max. :18.00 Max. :1.00000 Max. :1.0000
## edlevel2 edlevel3 edlevel4
## Min. :0.0000 Min. :0.0000 Min. :0.00000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000
## Median :0.0000 Median :0.0000 Median :0.00000
## Mean :0.0524 Mean :0.0746 Mean :0.05937
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.00000
## Max. :1.0000 Max. :1.0000 Max. :1.00000
```

- In R we still use `glm` to run a Poisson model. For the `family` option we use `poisson(link = "log")`.
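The centered age variable `cage` used in the model below can be created by subtracting the mean age:

```
# center age at its sample mean so the intercept refers to an average-age person
rwm1984$cage <- rwm1984$age - mean(rwm1984$age)
```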

And if we use centered age:

```
#running glm
pois_m <- glm(docvis ~ outwork + cage, family = poisson(link = "log"), data = rwm1984)
#extract the outcome by summary
summary(pois_m)
```

```
##
## Call:
## glm(formula = docvis ~ outwork + cage, family = poisson(link = "log"),
## data = rwm1984)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.4573 -2.2475 -1.2366 0.2863 25.1320
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.9380965 0.0127571 73.53 <2e-16 ***
## outwork 0.4079314 0.0188447 21.65 <2e-16 ***
## cage 0.0220842 0.0008377 26.36 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 25791 on 3873 degrees of freedom
## Residual deviance: 24190 on 3871 degrees of freedom
## AIC: 31279
##
## Number of Fisher Scoring iterations: 6
```

In order for a change in a predictor value to reflect a change in actual visits to a physician, we must exponentiate the coefficients and their confidence intervals.

- Suppose the predicted mean number of visits when outwork is 0 is \(\mu_0\), and the predicted mean number of visits when outwork is 1 is \(\mu_1\). Then,

\[ \log(\mu_1) - \log(\mu_0) = \log\left(\dfrac{\mu_1}{\mu_0}\right) = \beta_0 + \beta_1 \cdot 1 + \beta_2\, \text{age} - (\beta_0 + \beta_1 \cdot 0 + \beta_2\, \text{age}) = \beta_1\]

So \(\exp(0.4079314) = 1.503704\) means that, on average, we expect the number of doctor visits for a person out of work to be about 50% more than for a working person of the same age.

The standard error can be approximately calculated by the delta method:

\[se(\exp(\beta)) \approx \exp(\beta) \times se(\beta)\]
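The three blocks of output below (exponentiated coefficients, their approximate standard errors, and exponentiated confidence intervals) can be reproduced along these lines (a sketch; `confint.default` gives Wald intervals):

```
# incidence rate ratios (exponentiated coefficients)
exp(coef(pois_m))
# approximate (delta-method) standard errors of the rate ratios
exp(coef(pois_m)) * sqrt(diag(vcov(pois_m)))
# exponentiated Wald confidence intervals
exp(confint.default(pois_m))
```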

```
## (Intercept) outwork cage
## 2.555113 1.503704 1.022330
```

```
## (Intercept) outwork cage
## 0.0325958199 0.0283368154 0.0008564317
```

```
## 2.5 % 97.5 %
## (Intercept) 2.492018 2.619805
## outwork 1.449178 1.560282
## cage 1.020653 1.024010
```

```
# Pearson dispersion statistic for the Poisson model
pr <- sum(residuals(pois_m, type = "pearson")^2)
pr / pois_m$df.residual
```

`## [1] 11.34322`

`## [1] 8.074027`
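The list of fit statistics below (AIC, AICn, BIC, BICqh) matches the output format of the `modelfit` helper in the COUNT package; assuming that function is available, it can be produced by:

```
# AIC/BIC-type fit statistics; modelfit() is from the COUNT package
library(COUNT)
modelfit(pois_m)
```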

```
## $AIC
## [1] 31278.78
##
## $AICn
## [1] 8.074027
##
## $BIC
## [1] 31297.57
##
## $BICqh
## [1] 8.07418
```

```
# marginal effect of age at the mean of the predictors
beta <- coef(pois_m)
mout <- mean(rwm1984$outwork)
mage <- mean(rwm1984$cage)
xb <- sum(beta * c(1, mout, mage))
dfdxb <- exp(xb) * beta[3]
dfdxb
```

```
## cage
## 0.06552846
```
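The value printed below (about 0.0698) is the average marginal effect of age, obtained by averaging \(\mu_i\) over all observations instead of evaluating at the mean; since \(\partial\mu/\partial\text{age} = \mu\,\beta_{\text{age}}\), a sketch is:

```
# average marginal effect of age: mean of the fitted means times the age coefficient
mean(fitted(pois_m)) * coef(pois_m)[3]
```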

```
## cage
## 0.06984968
```

Sometimes the data points do not all occur over the same period of time, area, or volume. In this case we can use an offset in the model to adjust the counts for differing exposures (time periods, areas, or volumes).

\[ \exp(\boldsymbol{x'\beta}) = \mu / t\]

\[ \mu = \exp(\boldsymbol{x'\beta} + \log(t))\] The term \(\log(t)\) is referred to as the offset.

In this data set:

- die: number of deaths
- cases: number of cases
- anterior: whether the patient had a previous anterior myocardial infarction
- hcabg: whether the patient had a history of CABG (coronary artery bypass graft) procedures
- killip: a summary indicator of the cardiovascular health of the patient

**R code for exposure**
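A sketch of the loading code for this example (the `fasttrakg` data ship with the COUNT package):

```
# load the grouped FASTRAK registry data from the COUNT package
library(COUNT)
data(fasttrakg)
head(fasttrakg)
```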

```
## die cases anterior hcabg killip kk1 kk2 kk3 kk4
## 1 5 19 0 0 4 0 0 0 1
## 2 10 83 0 0 3 0 0 1 0
## 3 15 412 0 0 2 0 1 0 0
## 4 28 1864 0 0 1 1 0 0 0
## 5 1 1 0 1 4 0 0 0 1
## 6 0 3 0 1 3 0 0 1 0
```

```
summary(fast <- glm(die ~ anterior + hcabg + factor(killip), family = poisson,
                    offset = log(cases), data = fasttrakg))
```

```
##
## Call:
## glm(formula = die ~ anterior + hcabg + factor(killip), family = poisson,
## data = fasttrakg, offset = log(cases))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.4791 -0.5271 -0.1766 0.4337 2.3317
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.0698 0.1459 -27.893 < 2e-16 ***
## anterior 0.6749 0.1596 4.229 2.34e-05 ***
## hcabg 0.6614 0.3267 2.024 0.0429 *
## factor(killip)2 0.9020 0.1724 5.234 1.66e-07 ***
## factor(killip)3 1.1133 0.2513 4.430 9.44e-06 ***
## factor(killip)4 2.5126 0.2743 9.160 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 116.232 on 14 degrees of freedom
## Residual deviance: 10.932 on 9 degrees of freedom
## AIC: 73.992
##
## Number of Fisher Scoring iterations: 5
```
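The rate ratios, delta-method standard errors, and exponentiated intervals printed below can be obtained in the same way as before (a sketch):

```
# incidence rate ratios for the exposure model
exp(coef(fast))
# delta-method standard errors of the rate ratios
exp(coef(fast)) * sqrt(diag(vcov(fast)))
# exponentiated Wald confidence intervals
exp(confint.default(fast))
```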

```
## (Intercept) anterior hcabg factor(killip)2 factor(killip)3
## 0.01708133 1.96376577 1.93746487 2.46463342 3.04434885
## factor(killip)4
## 12.33745552
```

```
## (Intercept) anterior hcabg factor(killip)2 factor(killip)3
## 0.002492295 0.313359487 0.632970849 0.424784164 0.765119601
## factor(killip)4
## 3.384215172
```

```
## 2.5 % 97.5 %
## (Intercept) 0.0128329 0.02273622
## anterior 1.4363585 2.68482829
## hcabg 1.0212823 3.67554592
## factor(killip)2 1.7581105 3.45508317
## factor(killip)3 1.8602297 4.98221265
## factor(killip)4 7.2067174 21.12096274
```

```
## $AIC
## [1] 73.9917
##
## $AICn
## [1] 4.93278
##
## $BIC
## [1] 78.24
##
## $BICqh
## [1] 5.566187
```

```
# deviance goodness-of-fit test for the Poisson model
dev <- deviance(pois_m)
df <- df.residual(pois_m)
p_value <- 1 - pchisq(dev, df)
print(matrix(c("Deviance GOF ", " ",
               "D =", round(dev, 4),
               "df = ", df,
               "P-value = ", p_value), byrow = TRUE, ncol = 2))
```

```
## [,1] [,2]
## [1,] "Deviance GOF " " "
## [2,] "D =" "24190.3581"
## [3,] "df = " "3871"
## [4,] "P-value = " "0"
```

The Poisson model forces the variance to equal the mean, while the negative binomial models allow extra dispersion:

- In Poisson: Mean \(= \mu\), Variance \(= \mu\)
- In Negative Binomial 1 (NB1): Mean \(= \mu\), Variance \(= \mu + \alpha \mu = \mu (1 + \alpha) = \mu \omega\)
- In Negative Binomial 2 (NB2): Mean \(= \mu\), Variance \(= \mu + \alpha \mu^2\)
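The NB2 mean-variance relationship can be checked by simulation (a sketch; note that R's `rnbinom` parameterizes the dispersion as `size` \(= 1/\alpha\)):

```
# simulate NB2 counts with mu = 4, alpha = 0.5: Var = mu + alpha * mu^2 = 12
mu <- 4; alpha <- 0.5
set.seed(7)
y_nb <- rnbinom(1e6, mu = mu, size = 1 / alpha)
c(mean = mean(y_nb), variance = var(y_nb), nb2 = mu + alpha * mu^2)
```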

```
# Poisson Default standard errors (variance equals mean) based on MLE
poiss_mle <- glm(docvis ~ outwork + age, data = rwm1984, family=poisson())
print(summary(poiss_mle),digits=3)
```

```
##
## Call:
## glm(formula = docvis ~ outwork + age, family = poisson(), data = rwm1984)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.457 -2.248 -1.237 0.286 25.132
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.033517 0.039181 -0.86 0.39
## outwork 0.407931 0.018845 21.65 <2e-16 ***
## age 0.022084 0.000838 26.36 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 25791 on 3873 degrees of freedom
## Residual deviance: 24190 on 3871 degrees of freedom
## AIC: 31279
##
## Number of Fisher Scoring iterations: 6
```

```
# Robust sandwich standard errors based on poiss_mle
library(sandwich)
# robust covariance matrix of the estimated model
cov.robust <- vcovHC(poiss_mle, type = "HC0")
# the diagonal holds the variances
se.robust <- sqrt(diag(cov.robust))
coeffs <- coef(poiss_mle)
z_.025 <- qnorm(.975)
t.robust <- coeffs / se.robust
poiss_mle_robust <- cbind(coeffs, se.robust, t.robust,
                          pvalue = round(2 * (1 - pnorm(abs(coeffs / se.robust))), 5),
                          lower = coeffs - z_.025 * se.robust,
                          upper = coeffs + z_.025 * se.robust)
print(poiss_mle_robust, digits = 3)
```

```
## coeffs se.robust t.robust pvalue lower upper
## (Intercept) -0.0335 0.12333 -0.272 0.786 -0.2752 0.2082
## outwork 0.4079 0.06907 5.906 0.000 0.2726 0.5433
## age 0.0221 0.00283 7.808 0.000 0.0165 0.0276
```

```
# Poisson bootstrap standard errors (variance equals mean)
# install.packages("boot")
library(boot)
boot.poiss <- function(data, indices) {
  data <- data[indices, ]
  model <- glm(docvis ~ outwork + age, family = poisson(), data = data)
  coefficients(model)
}
set.seed(10101)
# To speed things up we use only 400 bootstrap replicates.
summary.poissboot <- boot(rwm1984, boot.poiss, 400)
print(summary.poissboot, digits = 3)
```

```
##
## ORDINARY NONPARAMETRIC BOOTSTRAP
##
##
## Call:
## boot(data = rwm1984, statistic = boot.poiss, R = 400)
##
##
## Bootstrap Statistics :
## original bias std. error
## t1* -0.0335 0.007078 0.12531
## t2* 0.4079 0.000811 0.07318
## t3* 0.0221 -0.000168 0.00294
```

```
# Poisson with NB1 standard errors (assumes variance is a multiple of the mean) w = theta*mu or w = mu*(1 + alpha)
#also called Poisson Quasi-MLE
poiss_qmle <- glm(docvis ~ outwork + age, data = rwm1984, family=quasipoisson())
print(summary(poiss_qmle),digits=4)
```

```
##
## Call:
## glm(formula = docvis ~ outwork + age, family = quasipoisson(),
## data = rwm1984)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.4573 -2.2475 -1.2366 0.2863 25.1320
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.033517 0.131962 -0.254 0.8
## outwork 0.407931 0.063468 6.427 1.46e-10 ***
## age 0.022084 0.002821 7.827 6.39e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasipoisson family taken to be 11.34324)
##
## Null deviance: 25791 on 3873 degrees of freedom
## Residual deviance: 24190 on 3871 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 6
```

```
# MASS fits only NB2
# install.packages("MASS")
library(MASS)
# Note: MASS reports theta = 1/alpha as the dispersion parameter, not alpha
# Here theta = 0.4353, so alpha = 1/0.4353 = 2.297
# NB2 with default standard errors
NB2.MASS <- glm.nb(docvis ~ outwork + age, data = rwm1984)
print(summary(NB2.MASS), digits = 3)
```

```
##
## Call:
## glm.nb(formula = docvis ~ outwork + age, data = rwm1984, init.theta = 0.4353451729,
## link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.532 -1.282 -0.489 0.104 5.047
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.0396 0.1067 -0.37 0.71
## outwork 0.4147 0.0554 7.48 7.5e-14 ***
## age 0.0221 0.0024 9.24 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(0.4353) family taken to be 1)
##
## Null deviance: 4100.8 on 3873 degrees of freedom
## Residual deviance: 3909.6 on 3871 degrees of freedom
## AIC: 16674
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 0.4353
## Std. Err.: 0.0135
##
## 2 x log-likelihood: -16665.5250
```

The R code below is used for weighted least squares and illustrates the relationship between weighted least squares and generalized linear models.

```
set.seed(101)
x <- sort(rep(seq(1, 3.5, .1), 10))   # price values, 10 observations each
Ey <- 220 - 60 * x                    # true mean number sold
y <- rpois(length(Ey), Ey)            # Poisson counts with mean Ey
m <- lm(y ~ x)                        # ordinary least squares fit
plot(x, y, xlab = "Price", ylab = "Number Sold"); abline(m)
```

```
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -29.2760 -5.1955 -0.3457 4.6567 24.8014
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 222.4431 1.7288 128.67 <2e-16 ***
## x -60.8978 0.7289 -83.55 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.815 on 258 degrees of freedom
## Multiple R-squared: 0.9644, Adjusted R-squared: 0.9642
## F-statistic: 6980 on 1 and 258 DF, p-value: < 2.2e-16
```
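The weighted fit reported below uses a weight vector `W` whose construction is not shown; since the Poisson variance equals the mean, one natural choice (an assumption on our part) is the inverse of the OLS fitted means:

```
# weight each observation by the inverse of its estimated variance;
# for Poisson data the variance equals the mean, estimated here by fitted(m)
W <- 1 / fitted(m)
m.wls <- lm(y ~ x, weights = W)
summary(m.wls)
```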

```
##
## Call:
## lm(formula = y ~ x, weights = W)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -2.83173 -0.66938 -0.02412 0.58847 2.97862
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 223.055 1.751 127.4 <2e-16 ***
## x -61.170 0.597 -102.5 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9548 on 258 degrees of freedom
## Multiple R-squared: 0.976, Adjusted R-squared: 0.9759
## F-statistic: 1.05e+04 on 1 and 258 DF, p-value: < 2.2e-16
```
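The output below fits the same linear mean function directly as a Poisson GLM with an identity link, which handles the mean-dependent variance implicitly:

```
# Poisson GLM with identity link: linear mean, Poisson (variance = mean) errors
m.glm <- glm(y ~ x, family = poisson(link = "identity"))
summary(m.glm)
```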

```
##
## Call:
## glm(formula = y ~ x, family = poisson(link = "identity"))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.14035 -0.67749 -0.02331 0.58215 2.81612
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 223.0681 1.8305 121.86 <2e-16 ***
## x -61.1756 0.6227 -98.24 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 7382.88 on 259 degrees of freedom
## Residual deviance: 236.76 on 258 degrees of freedom
## AIC: 1817.1
##
## Number of Fisher Scoring iterations: 4
```

```
# mydata: admissions data with binary admit and continuous gre (loaded earlier)
y <- mydata$admit
x <- mydata$gre
mu <- rep(mean(y), nrow(mydata))          # initialize mu
eta <- log(mu / (1 - mu))                 # initialize eta
for (i in 1:10) {                         # loop for 10 iterations
  w <- mu * (1 - mu)                      # weight = variance
  z <- eta + (y - mu) / (mu * (1 - mu))   # working response
  mod <- lm(z ~ x, weights = w)           # weighted regression
  eta <- mod$fit                          # linear predictor
  mu <- 1 / (1 + exp(-eta))               # fitted values
  cat(i, coef(mod), "\n")                 # display iteration log
}
```

```
## 1 -2.783528 0.003434139
## 2 -2.899506 0.003579592
## 3 -2.901344 0.003582211
## 4 -2.901344 0.003582212
## 5 -2.901344 0.003582212
## 6 -2.901344 0.003582212
## 7 -2.901344 0.003582212
## 8 -2.901344 0.003582212
## 9 -2.901344 0.003582212
## 10 -2.901344 0.003582212
```

```
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.901344270 0.6068541966 -4.780958 2.459924e-06
## x 0.003582212 0.0009873336 3.628167 3.226024e-04
```

```
## 2.5 % 97.5 %
## (Intercept) -4.094384619 -1.708303920
## x 0.001641171 0.005523253
```
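The remaining output comes from fitting the same logistic regression directly with `glm`, whose coefficients agree with the IRLS loop above (a sketch, assuming `mydata` contains `admit` and `gre`):

```
# fit the same logistic regression directly with glm for comparison
logit_m <- glm(admit ~ gre, family = binomial(link = "logit"), data = mydata)
summary(logit_m)$coefficients
confint(logit_m)  # profile-likelihood confidence intervals
```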

```
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.901344270 0.6060376103 -4.787400 1.689561e-06
## gre 0.003582212 0.0009860051 3.633056 2.800845e-04
```

```
## 2.5 % 97.5 %
## (Intercept) -4.119988259 -1.739756286
## gre 0.001679963 0.005552748
```