SPSS Textbook Examples
Regression with Graphics by Lawrence Hamilton
Chapter 7: Logit regression

Limitations of linear regression

Page 218 Figure 7.1 Linear regression of a dichotomous Y variable (0 = open schools, 1 = close schools) on a measurement X variable (years lived in town).

GET FILE 'd:appsrwgdatatoxic.sav'.

IGRAPH
 /X1 = VAR(lived)
 /Y = VAR(close) type=scale
 /FITLINE METHOD = REGRESSION LINEAR LINE = TOTAL
 /SCATTER.

Interactive Graph

Page 219 Figure 7.2 Boxplots and oneway scatterplots of years lived in town, for respondents favoring closed and open schools.

compute const=.01.
execute.

EXAMINE  VARIABLES=lived BY close
 /PLOT=BOXPLOT
 /STATISTICS=NONE.

Explore

Total Sample

**Case Processing Summary**
	Valid N	Valid Percent	Missing N	Missing Percent	Total N	Total Percent
years lived in Williamstown	153	100.0%	0	.0%	153	100.0%

[Figure: boxplot of years lived in Williamstown, total sample]

schools should close

**Case Processing Summary**
	schools should close	Valid N	Valid Percent	Missing N	Missing Percent	Total N	Total Percent
years lived in Williamstown	open	87	100.0%	0	.0%	87	100.0%
years lived in Williamstown	close	66	100.0%	0	.0%	66	100.0%

[Figure: boxplots of years lived in Williamstown, by schools should close]

Page 222 Figure 7.4 Logit regression of school-closing opinion on years lived in town, also showing linear regression line.

NOTE: SPSS will not allow you to graph two regression lines and the data points on the same graph.

Estimation

Page 224 Table 7.1 Logit regression of school-closing opinion on years lived in town.

LOGISTIC REGRESSION VAR=close
  /METHOD=ENTER lived.

Logistic Regression

**Case Processing Summary**
Unweighted Cases(a)		N	Percent
Selected Cases	Included in Analysis	153	100.0
	Missing Cases	0	.0
	Total	153	100.0
Unselected Cases		0	.0
Total		153	100.0
a If weight is in effect, see classification table for the total number of cases.

**Dependent Variable Encoding**
Original Value	Internal Value
open	0
close	1

Block 0: Beginning Block

**Classification Table(a,b)**
	Observed (schools should close)		Predicted open	Predicted close	Percentage Correct
Step 0	schools should close	open	87	0	100.0
	schools should close	close	66	0	.0
	Overall Percentage				56.9
a Constant is included in the model.
b The cut value is .500

**Variables in the Equation**
		B	S.E.	Wald	df	Sig.	Exp(B)
Step 0	Constant	-.276	.163	2.864	1	.091	.759
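The Step 0 row can be checked by hand, because an intercept-only logit model has a closed-form solution. The sketch below is in Python rather than SPSS and uses only the observed counts reported above (87 open, 66 close). The variable names are illustrative and not part of the original example.

```python
import math

n_open, n_close = 87, 66   # observed counts from the classification table

# Intercept-only logit: the constant is the log odds of "close".
b0 = math.log(n_close / n_open)
# Asymptotic standard error of a log odds estimated from two cell counts.
se_b0 = math.sqrt(1 / n_close + 1 / n_open)
wald = (b0 / se_b0) ** 2   # Wald chi-square with 1 df
odds = math.exp(b0)        # Exp(B): odds of "close" versus "open"

print(f"B={b0:.3f}  S.E.={se_b0:.3f}  Wald={wald:.3f}  Exp(B)={odds:.3f}")
```

Up to rounding, these values should agree with the B, S.E., Wald, and Exp(B) columns for the constant.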

**Variables not in the Equation**
			Score	df	Sig.
Step 0	Variables	LIVED	12.683	1	.000
Step 0	Overall Statistics		12.683	1	.000

Block 1: Method = Enter

**Omnibus Tests of Model Coefficients**
		Chi-square	df	Sig.
Step 1	Step	13.944	1	.000
	Block	13.944	1	.000
	Model	13.944	1	.000

**Model Summary**
Step	-2 Log likelihood	Cox & Snell R Square	Nagelkerke R Square
1	195.267	.087	.117

**Classification Table(a)**
	Observed (schools should close)		Predicted open	Predicted close	Percentage Correct
Step 1	schools should close	open	59	28	67.8
	schools should close	close	29	37	56.1
	Overall Percentage				62.7
a The cut value is .500
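The percentages in the classification table follow directly from the four cell counts. As a quick check, here is a small Python sketch (not SPSS syntax) in which the dictionary layout and the function name `percent_correct` are illustrative choices.

```python
# Counts from the Step 1 classification table (outer key = observed, inner key = predicted).
table = {
    "open": {"open": 59, "close": 28},
    "close": {"open": 29, "close": 37},
}

def percent_correct(table):
    """Return per-row and overall percentages of correctly classified cases."""
    rows = {obs: 100 * preds[obs] / sum(preds.values()) for obs, preds in table.items()}
    total = sum(sum(preds.values()) for preds in table.values())
    correct = sum(preds[obs] for obs, preds in table.items())
    return rows, 100 * correct / total

rows, overall = percent_correct(table)
print({k: round(v, 1) for k, v in rows.items()}, round(overall, 1))
```

Classifying every respondent as "open" gets 87 of 153 cases right (56.9%), as in the Step 0 table. That is the baseline against which the 62.7% overall figure is usually judged.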

**Variables in the Equation**
		B	S.E.	Wald	df	Sig.	Exp(B)
Step 1(a)	LIVED	-.041	.012	11.398	1	.001	.960
Step 1(a)	Constant	.460	.263	3.069	1	.080	1.584
a Variable(s) entered on step 1: LIVED.
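To read the coefficients as odds and probabilities, the fitted equation can be evaluated directly. The Python sketch below uses the rounded B values from the table above. The `lived` values in the loop are arbitrary illustrations, not values taken from the data set.

```python
import math

b_lived, b_const = -0.041, 0.460   # rounded coefficients from Table 7.1

def p_close(lived):
    """Predicted probability of favoring closing, given years lived in town."""
    logit = b_const + b_lived * lived
    return 1 / (1 + math.exp(-logit))

odds_ratio_1yr = math.exp(b_lived)         # Exp(B) for LIVED
odds_ratio_10yr = math.exp(10 * b_lived)   # multiplicative change in odds per decade

for years in (1, 10, 30, 60):   # illustrative values only
    print(years, round(p_close(years), 3))
```

Each additional year in town multiplies the odds of favoring closing by about .96, so the predicted probability falls steadily as `lived` increases.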

Hypothesis Tests and Confidence Intervals

Page 226 Table 7.2 Logit regression of school-closing opinion on years lived in town, education, contamination, and HSC meetings.

LOGISTIC REGRESSION VAR=close
  /METHOD=ENTER lived educ contam hsc.

Logistic Regression

**Case Processing Summary**
Unweighted Cases(a)		N	Percent
Selected Cases	Included in Analysis	153	100.0
	Missing Cases	0	.0
	Total	153	100.0
Unselected Cases		0	.0
Total		153	100.0
a If weight is in effect, see classification table for the total number of cases.

**Dependent Variable Encoding**
Original Value	Internal Value
open	0
close	1

Block 0: Beginning Block

**Classification Table(a,b)**
	Observed (schools should close)		Predicted open	Predicted close	Percentage Correct
Step 0	schools should close	open	87	0	100.0
	schools should close	close	66	0	.0
	Overall Percentage				56.9
a Constant is included in the model.
b The cut value is .500

**Variables in the Equation**
		B	S.E.	Wald	df	Sig.	Exp(B)
Step 0	Constant	-.276	.163	2.864	1	.091	.759

**Variables not in the Equation**
			Score	df	Sig.
Step 0	Variables	LIVED	12.683	1	.000
		EDUC	.221	1	.638
		CONTAM	17.292	1	.000
		HSC	39.337	1	.000
	Overall Statistics		52.845	4	.000

Block 1: Method = Enter

**Omnibus Tests of Model Coefficients**
		Chi-square	df	Sig.
Step 1	Step	59.830	4	.000
	Block	59.830	4	.000
	Model	59.830	4	.000

**Model Summary**
Step	-2 Log likelihood	Cox & Snell R Square	Nagelkerke R Square
1	149.382	.324	.434

**Classification Table(a)**
	Observed (schools should close)		Predicted open	Predicted close	Percentage Correct
Step 1	schools should close	open	75	12	86.2
	schools should close	close	24	42	63.6
	Overall Percentage				76.5
a The cut value is .500

**Variables in the Equation**
		B	S.E.	Wald	df	Sig.	Exp(B)
Step 1(a)	LIVED	-.046	.015	9.698	1	.002	.955
	EDUC	-.166	.090	3.404	1	.065	.847
	CONTAM	1.208	.465	6.739	1	.009	3.347
	HSC	2.173	.464	21.919	1	.000	8.784
	Constant	1.731	1.302	1.768	1	.184	5.649
a Variable(s) entered on step 1: LIVED, EDUC, CONTAM, HSC.
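The output above reports Wald tests but no confidence intervals. Approximate 95% Wald intervals can be built from the printed B and S.E. columns as B ± 1.96 × S.E. and then exponentiated to get intervals for Exp(B). The Python sketch below does this from the rounded table values, and the helper name `wald_ci` is illustrative. SPSS can likely print the same intervals by adding `/PRINT=CI(95)` to the command, though that option was not used in this example.

```python
import math

def wald_ci(b, se, z=1.96):
    """Approximate 95% Wald interval for a logit coefficient and for its odds ratio."""
    lo, hi = b - z * se, b + z * se
    return (lo, hi), (math.exp(lo), math.exp(hi))

# Rounded B and S.E. values from the Table 7.2 output.
coefs = {
    "LIVED": (-0.046, 0.015),
    "EDUC": (-0.166, 0.090),
    "CONTAM": (1.208, 0.465),
    "HSC": (2.173, 0.464),
}

intervals = {name: wald_ci(b, se) for name, (b, se) in coefs.items()}
for name, (ci_b, ci_or) in intervals.items():
    print(f"{name:7s} B: {ci_b[0]:7.3f} to {ci_b[1]:7.3f}   Exp(B): {ci_or[0]:7.3f} to {ci_or[1]:7.3f}")
```

The interval for EDUC spans zero, consistent with its Sig. value of .065, while the intervals for the other three predictors exclude zero.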

Page 227 Table 7.3 Logit regression of school-closing opinion on seven background variables.

LOGISTIC REGRESSION VAR=close
  /METHOD=ENTER lived educ contam hsc female kids nodad
  /PRINT=ITER(1) SUMMARY.

Logistic Regression

**Case Processing Summary**
Unweighted Cases(a)		N	Percent
Selected Cases	Included in Analysis	153	100.0
	Missing Cases	0	.0
	Total	153	100.0
Unselected Cases		0	.0
Total		153	100.0
a If weight is in effect, see classification table for the total number of cases.

**Dependent Variable Encoding**
Original Value	Internal Value
open	0
close	1

Block 0: Beginning Block

**Iteration History(a,b,c)**
	Iteration	-2 Log likelihood	Constant
Step 0	1	209.212	-.275
Step 0	2	209.212	-.276
a Constant is included in the model.
b Initial -2 Log Likelihood: 209.212
c Estimation terminated at iteration number 2 because log-likelihood decreased by less than .010 percent.
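The two Block 0 iterations can be mimicked with a few lines of Newton-Raphson for an intercept-only logit model. This is a sketch under two assumptions: that the constant starts at 0 and that plain Newton-Raphson updates are used. It is not a description of SPSS's internal algorithm, and the function name is illustrative.

```python
import math

n, n_close = 153, 66   # sample size and number of "close" responses

def newton_intercept(n, successes, b=0.0, steps=2):
    """Newton-Raphson updates for the constant of an intercept-only logit model."""
    history = []
    for _ in range(steps):
        p = 1 / (1 + math.exp(-b))
        gradient = successes - n * p      # first derivative of the log likelihood
        information = n * p * (1 - p)     # negative second derivative
        b += gradient / information
        p = 1 / (1 + math.exp(-b))        # probability at the updated constant
        neg2ll = -2 * (successes * math.log(p) + (n - successes) * math.log(1 - p))
        history.append((b, neg2ll))
    return history

history = newton_intercept(n, n_close)
for i, (b, neg2ll) in enumerate(history, start=1):
    print(i, round(neg2ll, 3), round(b, 3))
```

Under these assumptions the constant and -2 log likelihood after each step come out very close to the values in the iteration history, and the second step changes the likelihood only marginally, which is why estimation stops there.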

**Classification Table(a,b)**
	Observed (schools should close)		Predicted open	Predicted close	Percentage Correct
Step 0	schools should close	open	87	0	100.0
	schools should close	close	66	0	.0
	Overall Percentage				56.9
a Constant is included in the model.
b The cut value is .500

**Variables in the Equation**
		B	S.E.	Wald	df	Sig.	Exp(B)
Step 0	Constant	-.276	.163	2.864	1	.091	.759

**Variables not in the Equation**
			Score	df	Sig.
Step 0	Variables	LIVED	12.683	1	.000
		EDUC	.221	1	.638
		CONTAM	17.292	1	.000
		HSC	39.337	1	.000
		FEMALE	3.868	1	.049
		KIDS	5.666	1	.017
		NODAD	9.835	1	.002
	Overall Statistics		57.038	7	.000

Block 1: Method = Enter

**Iteration History(a,b,c,d)**
	Iteration	-2 Log likelihood	Constant	LIVED	EDUC	CONTAM	HSC	FEMALE	KIDS	NODAD
Step 1	1	147.028	1.565	-.027	-.130	.782	1.764	-.015	-.365	-1.074
	2	141.482	2.538	-.041	-.187	1.147	2.239	-.037	-.580	-1.844
	3	141.054	2.859	-.046	-.204	1.269	2.401	-.050	-.662	-2.184
	4	141.049	2.893	-.047	-.206	1.282	2.418	-.052	-.671	-2.225
a Method: Enter
b Constant is included in the model.
c Initial -2 Log Likelihood: 209.212
d Estimation terminated at iteration number 4 because log-likelihood decreased by less than .010 percent.

**Omnibus Tests of Model Coefficients**
		Chi-square	df	Sig.
Step 1	Step	68.162	7	.000
	Block	68.162	7	.000
	Model	68.162	7	.000

**Model Summary**
Step	-2 Log likelihood	Cox & Snell R Square	Nagelkerke R Square
1	141.049	.359	.482
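The model chi-square and both pseudo R-square values can be recovered from the two -2 log likelihood figures and the sample size. The Python sketch below uses the standard Cox & Snell and Nagelkerke formulas, with inputs copied from the output above and illustrative variable names.

```python
import math

n = 153
neg2ll_null = 209.212    # "Initial -2 Log Likelihood" from the iteration history
neg2ll_model = 141.049   # -2 Log likelihood from the Model Summary

model_chi2 = neg2ll_null - neg2ll_model          # likelihood-ratio chi-square, 7 df
cox_snell = 1 - math.exp(-model_chi2 / n)
max_cox_snell = 1 - math.exp(-neg2ll_null / n)   # upper bound of Cox & Snell
nagelkerke = cox_snell / max_cox_snell           # rescaled to a 0-1 range

print(round(model_chi2, 3), round(cox_snell, 3), round(nagelkerke, 3))
```

Because the printed inputs are rounded to three decimals, the results can differ from the table in the last digit.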

**Classification Table(a)**
	Observed (schools should close)		Predicted open	Predicted close	Percentage Correct
Step 1	schools should close	open	77	10	88.5
	schools should close	close	25	41	62.1
	Overall Percentage				77.1
a The cut value is .500

**Variables in the Equation**
		B	S.E.	Wald	df	Sig.	Exp(B)
Step 1(a)	LIVED	-.047	.017	7.549	1	.006	.954
	EDUC	-.206	.093	4.886	1	.027	.814
	CONTAM	1.282	.481	7.093	1	.008	3.604
	HSC	2.418	.510	22.507	1	.000	11.221
	FEMALE	-.052	.557	.009	1	.926	.950
	KIDS	-.671	.566	1.405	1	.236	.511
	NODAD	-2.225	.999	4.962	1	.026	.108
	Constant	2.893	1.603	3.258	1	.071	18.054
a Variable(s) entered on step 1: LIVED, EDUC, CONTAM, HSC, FEMALE, KIDS, NODAD.
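Each Sig. value in the table is the upper-tail probability of a chi-square distribution with 1 df evaluated at the Wald statistic. With 1 df this tail can be computed from the complementary error function, so the Python sketch below needs only the standard library. The function name `wald_p` is illustrative.

```python
import math

def wald_p(wald, df=1):
    """Upper-tail chi-square probability for a 1-df Wald statistic."""
    if df != 1:
        raise ValueError("this shortcut only covers 1 df")
    # P(chi2_1 > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2))
    return math.erfc(math.sqrt(wald / 2))

# Wald statistics copied from the Table 7.3 output.
walds = {"LIVED": 7.549, "EDUC": 4.886, "CONTAM": 7.093, "HSC": 22.507,
         "FEMALE": 0.009, "KIDS": 1.405, "NODAD": 4.962}
p_values = {name: wald_p(w) for name, w in walds.items()}

for name, p in p_values.items():
    print(f"{name:7s} {p:.3f}")
```

FEMALE and KIDS have the largest p-values, which is the usual motivation for dropping them in a reduced model.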

Page 228 Table 7.4 Reduced model with male/nonparent interaction term.

LOGISTIC REGRESSION VAR=close
  /METHOD=ENTER lived educ contam hsc nodad.

Logistic Regression

**Case Processing Summary**
Unweighted Cases(a)		N	Percent
Selected Cases	Included in Analysis	153	100.0
	Missing Cases	0	.0
	Total	153	100.0
Unselected Cases		0	.0
Total		153	100.0
a If weight is in effect, see classification table for the total number of cases.

**Dependent Variable Encoding**
Original Value	Internal Value
open	0
close	1

Block 0: Beginning Block

**Classification Table(a,b)**
	Observed (schools should close)		Predicted open	Predicted close	Percentage Correct
Step 0	schools should close	open	87	0	100.0
	schools should close	close	66	0	.0
	Overall Percentage				56.9
a Constant is included in the model.
b The cut value is .500

**Variables in the Equation**
		B	S.E.	Wald	df	Sig.	Exp(B)
Step 0	Constant	-.276	.163	2.864	1	.091	.759

**Variables not in the Equation**
			Score	df	Sig.
Step 0	Variables	LIVED	12.683	1	.000
		EDUC	.221	1	.638
		CONTAM	17.292	1	.000
		HSC	39.337	1	.000
		NODAD	9.835	1	.002
	Overall Statistics		56.279	5	.000

Block 1: Method = Enter

**Omnibus Tests of Model Coefficients**
		Chi-square	df	Sig.
Step 1	Step	66.559	5	.000
	Block	66.559	5	.000
	Model	66.559	5	.000

**Model Summary**
Step	-2 Log likelihood	Cox & Snell R Square	Nagelkerke R Square
1	142.652	.353	.473

**Classification Table(a)**
	Observed (schools should close)		Predicted open	Predicted close	Percentage Correct
Step 1	schools should close	open	76	11	87.4
	schools should close	close	25	41	62.1
	Overall Percentage				76.5
a The cut value is .500

**Variables in the Equation**
		B	S.E.	Wald	df	Sig.	Exp(B)
Step 1(a)	LIVED	-.040	.015	6.559	1	.010	.961
	EDUC	-.197	.093	4.509	1	.034	.821
	CONTAM	1.298	.477	7.422	1	.006	3.664
	HSC	2.278	.490	21.590	1	.000	9.762
	NODAD	-1.731	.725	5.695	1	.017	.177
	Constant	2.182	1.330	2.691	1	.101	8.865
a Variable(s) entered on step 1: LIVED, EDUC, CONTAM, HSC, NODAD.
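Because this five-predictor model is nested within the seven-predictor model of Table 7.3, the two can be compared with a likelihood-ratio test on the difference in -2 log likelihood. The Python sketch below uses the two Model Summary values. With exactly 2 df, the chi-square upper tail has the closed form exp(-x/2), so no statistics library is needed.

```python
import math

neg2ll_full = 141.049      # Table 7.3: seven predictors
neg2ll_reduced = 142.652   # Table 7.4: five predictors
df = 7 - 5                 # FEMALE and KIDS were dropped

lr_chi2 = neg2ll_reduced - neg2ll_full
# For 2 df only: P(chi2_2 > x) = exp(-x / 2).
p_value = math.exp(-lr_chi2 / 2)

print(round(lr_chi2, 3), df, round(p_value, 3))
```

The same chi-square is the difference between the two model chi-squares in the Omnibus Tests tables (68.162 and 66.559). The large p-value suggests that dropping FEMALE and KIDS costs little in fit.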

Interpretation

Page 232 Figure 7.5 Conditional effects of years lived in town, at proclosing (top), average, and anticlosing levels of other X variables.

LOGISTIC REGRESSION VAR=close
  /METHOD=ENTER lived educ contam hsc nodad.

Logistic Regression

**Case Processing Summary**
Unweighted Cases(a)		N	Percent
Selected Cases	Included in Analysis	153	100.0
	Missing Cases	0	.0
	Total	153	100.0
Unselected Cases		0	.0
Total		153	100.0
a If weight is in effect, see classification table for the total number of cases.

**Dependent Variable Encoding**
Original Value	Internal Value
open	0
close	1

Block 0: Beginning Block

**Classification Table(a,b)**
	Observed (schools should close)		Predicted open	Predicted close	Percentage Correct
Step 0	schools should close	open	87	0	100.0
	schools should close	close	66	0	.0
	Overall Percentage				56.9
a Constant is included in the model.
b The cut value is .500

**Variables in the Equation**
		B	S.E.	Wald	df	Sig.	Exp(B)
Step 0	Constant	-.276	.163	2.864	1	.091	.759

**Variables not in the Equation**
			Score	df	Sig.
Step 0	Variables	LIVED	12.683	1	.000
		EDUC	.221	1	.638
		CONTAM	17.292	1	.000
		HSC	39.337	1	.000
		NODAD	9.835	1	.002
	Overall Statistics		56.279	5	.000

Block 1: Method = Enter

**Omnibus Tests of Model Coefficients**
		Chi-square	df	Sig.
Step 1	Step	66.559	5	.000
	Block	66.559	5	.000
	Model	66.559	5	.000

**Model Summary**
Step	-2 Log likelihood	Cox & Snell R Square	Nagelkerke R Square
1	142.652	.353	.473

**Classification Table(a)**
	Observed (schools should close)		Predicted open	Predicted close	Percentage Correct
Step 1	schools should close	open	76	11	87.4
	schools should close	close	25	41	62.1
	Overall Percentage				76.5
a The cut value is .500

**Variables in the Equation**
		B	S.E.	Wald	df	Sig.	Exp(B)
Step 1(a)	LIVED	-.040	.015	6.559	1	.010	.961
	EDUC	-.197	.093	4.509	1	.034	.821
	CONTAM	1.298	.477	7.422	1	.006	3.664
	HSC	2.278	.490	21.590	1	.000	9.762
	NODAD	-1.731	.725	5.695	1	.017	.177
	Constant	2.182	1.330	2.691	1	.101	8.865
a Variable(s) entered on step 1: LIVED, EDUC, CONTAM, HSC, NODAD.


SORT CASES BY
  lived (A).

compute lhat1 = 3.17-.04*lived.
compute phat1 = 1/(1+exp(-lhat1)).
compute lhat2 = .387-.04*(lived).
compute phat2 = 1/(1+exp(-lhat2)).
compute lhat3 = -2.14-.04*(lived).
compute phat3 = 1/(1+exp(-lhat3)).
execute.

GRAPH
  /SCATTERPLOT(OVERLAY)=lived lived lived  WITH phat1 phat2 phat3 (PAIR).

Graph
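The three COMPUTE pairs above define one logistic curve per scenario: each curve shares the LIVED slope and differs only in its intercept, which absorbs the other X variables held at fixed levels. The Python sketch below mirrors that calculation. The scenario labels follow the figure caption, while the grid of years is an arbitrary illustration and not the observed range of `lived`.

```python
import math

def logistic(x):
    """Inverse logit: convert a logit to a probability."""
    return 1 / (1 + math.exp(-x))

# Intercepts copied from the COMPUTE statements; slope is the rounded LIVED coefficient.
intercepts = {"proclosing": 3.17, "average": 0.387, "anticlosing": -2.14}
slope = -0.04

def conditional_curve(intercept, years):
    """Predicted probabilities along a grid of years lived in town."""
    return [logistic(intercept + slope * y) for y in years]

years = list(range(0, 81, 10))   # illustrative grid only
curves = {name: conditional_curve(a, years) for name, a in intercepts.items()}

for name, probs in curves.items():
    print(name, [round(p, 2) for p in probs])
```

All three curves decline with years lived in town, and the proclosing curve sits above the average and anticlosing curves at every point.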

Page 232 Figure 7.6 Conditional effects of contamination, at proclosing, average, and anticlosing levels of other X variables.

NOTE: This graph does not look exactly like the one in the text because of scaling issues on the X-axis.

SORT CASES BY contam (A).

compute lhat4 = 3.22+1.3*(contam).
compute phat4 = 1/(1+exp(-lhat4)).
compute lhat5 = -.7681+1.3*(contam).
compute phat5 = 1/(1+exp(-lhat5)).
compute lhat6 = -6.79+1.3*(contam).
compute phat6 = 1/(1+exp(-lhat6)).
execute.

GRAPH
  /LINE(MULTIPLE)= VALUE( phat4 phat5 phat6 ) BY contam.

Graph

Diagnostic graphs

Page 239 Figure 7.7 Poorness-of-fit statistic delta-chi-square(P) versus predicted probability of favoring closed schools; X patterns 131 and 3 are poorly fit (high delta-chi-square(P) values).

LOGISTIC REGRESSION VAR=close
  /METHOD=ENTER lived educ contam hsc nodad
  /SAVE PRED COOK LEVER ZRESID DEV.

Logistic Regression

**Case Processing Summary**
Unweighted Cases(a)		N	Percent
Selected Cases	Included in Analysis	153	100.0
	Missing Cases	0	.0
	Total	153	100.0
Unselected Cases		0	.0
Total		153	100.0
a If weight is in effect, see classification table for the total number of cases.

**Dependent Variable Encoding**
Original Value	Internal Value
open	0
close	1

Block 0: Beginning Block

**Classification Table(a,b)**
	Observed (schools should close)		Predicted open	Predicted close	Percentage Correct
Step 0	schools should close	open	87	0	100.0
	schools should close	close	66	0	.0
	Overall Percentage				56.9
a Constant is included in the model.
b The cut value is .500

**Variables in the Equation**
		B	S.E.	Wald	df	Sig.	Exp(B)
Step 0	Constant	-.276	.163	2.864	1	.091	.759

**Variables not in the Equation**
			Score	df	Sig.
Step 0	Variables	LIVED	12.683	1	.000
		EDUC	.221	1	.638
		CONTAM	17.292	1	.000
		HSC	39.337	1	.000
		NODAD	9.835	1	.002
	Overall Statistics		56.279	5	.000

Block 1: Method = Enter

**Omnibus Tests of Model Coefficients**
		Chi-square	df	Sig.
Step 1	Step	66.559	5	.000
	Block	66.559	5	.000
	Model	66.559	5	.000

**Model Summary**
Step	-2 Log likelihood	Cox & Snell R Square	Nagelkerke R Square
1	142.652	.353	.473

**Classification Table(a)**
	Observed (schools should close)		Predicted open	Predicted close	Percentage Correct
Step 1	schools should close	open	76	11	87.4
	schools should close	close	25	41	62.1
	Overall Percentage				76.5
a The cut value is .500

**Variables in the Equation**
		B	S.E.	Wald	df	Sig.	Exp(B)
Step 1(a)	LIVED	-.040	.015	6.559	1	.010	.961
	EDUC	-.197	.093	4.509	1	.034	.821
	CONTAM	1.298	.477	7.422	1	.006	3.664
	HSC	2.278	.490	21.590	1	.000	9.762
	NODAD	-1.731	.725	5.695	1	.017	.177
	Constant	2.182	1.330	2.691	1	.101	8.865
a Variable(s) entered on step 1: LIVED, EDUC, CONTAM, HSC, NODAD.


compute deltap=(zre_1)**2/(1-lev_1).
execute.

GRAPH
  /SCATTERPLOT(BIVAR)=pre_1 WITH deltap.

Graph

Page 240 Figure 7.8 Poorness-of-fit statistic delta-chi-square(D) versus predicted probability of favoring closed schools; X patterns 131, 3, 27, 62, 115 are poorly fit (high delta-chi-square(D) values).

compute deltad=(dev_1)**2/(1-lev_1).
execute.

GRAPH
  /SCATTERPLOT(BIVAR)=pre_1 WITH deltad.

Graph

Page 241 Figure 7.9 Influence statistic delta-B versus predicted probability of favoring closed schools; patterns 131, 3, 115, 44, and 94 are most influential (high delta-B values).

NOTE: Delta-B is the Cook's D statistic, which SPSS saves as coo_1 when COOK is listed on the SAVE subcommand.

GRAPH
  /SCATTERPLOT(BIVAR)=pre_1 WITH coo_1.

Graph
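The diagnostics plotted in Figures 7.7 to 7.9 are all functions of a case's residual and leverage. The Python sketch below computes them for a single case from the observed outcome, the predicted probability, and the leverage. It uses the textbook Pearson and deviance residual formulas, with delta-B taken as r²h/(1 − h)². Two assumptions should be noted: whether the saved `coo_1` matches this delta-B formula exactly is not verified here, and these are case-level values, whereas the book's figures refer to X patterns. The example case is hypothetical.

```python
import math

def fit_diagnostics(y, p, h):
    """Case-level delta chi-square(P), delta chi-square(D), and delta-B.

    y: observed 0/1 outcome, p: predicted probability, h: leverage.
    """
    pearson = (y - p) / math.sqrt(p * (1 - p))
    # The deviance residual carries the sign of (y - p).
    deviance = math.copysign(math.sqrt(-2 * math.log(p if y == 1 else 1 - p)), y - p)
    delta_p = pearson ** 2 / (1 - h)
    delta_d = deviance ** 2 / (1 - h)
    delta_b = pearson ** 2 * h / (1 - h) ** 2
    return delta_p, delta_d, delta_b

# Hypothetical case: respondent favored closing (y=1) although the model predicted p=0.05.
dp, dd, db = fit_diagnostics(y=1, p=0.05, h=0.02)
print(round(dp, 3), round(dd, 3), round(db, 3))
```

A badly predicted case like this one has large delta chi-square values even with small leverage. Delta-B becomes large only when poor fit and appreciable leverage coincide.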

Page 242 Figure 7.10 Delta-chi-square(D) versus P-hat with symbols proportional to delta-B; large, high circles indicate influential, poorly fit X patterns.

NOTE: We do not know how to make the bubbles (symbols) proportional to delta-B in SPSS, so the graph below repeats the unscaled scatterplot from Figure 7.8.

GRAPH
  /SCATTERPLOT(BIVAR)=pre_1 WITH deltad.
