Display system information and load tidyverse
and faraway
packages
sessionInfo()
## R version 3.6.2 (2019-12-12)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Catalina 10.15.4
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] compiler_3.6.2 magrittr_1.5 tools_3.6.2 htmltools_0.4.0
## [5] yaml_2.2.1 Rcpp_1.0.4.6 stringi_1.4.6 rmarkdown_2.1
## [9] knitr_1.28 stringr_1.4.0 xfun_0.13 digest_0.6.25
## [13] rlang_0.4.5 evaluate_0.14
library(tidyverse)
## ── Attaching packages ───────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.0 ✓ purrr 0.3.4
## ✓ tibble 3.0.0 ✓ dplyr 0.8.5
## ✓ tidyr 1.0.2 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## ── Conflicts ──────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(faraway)
faraway
package contains the datasets in the ELMR book.
Multinomial regression is a natural extension of the binomial regression to the case when responses are multivariate counts. Let \(Y_{ij}\) be the number of observations falling into category \(j\) for batch \(i\) and let \(n_i = \sum_j Y_{ij}\) be the batch size. Then \[ \mathbb{P}(Y_{i1} = y_{i1}, \ldots, Y_{iJ} = y_{iJ}) = \binom{n_i}{y_{i1} \cdots y_{iJ}} p_{i1}^{y_{i1}} \cdots p_{iJ}^{y_{iJ}}. \]
We distinguish between nominal multinomial data where there is no natural order to the categories and ordinal multinomial data where there is an order. The multinomial logit model is indended for nominal data. It can be used for ordinal data, but the informtion about order will not be used.
Suppose the length of \(\mathbf{x}_i\) is \(p\), then the multinomial-logit model consumes \(p(J-1)\) parameters.
We may estimate the parameters using MLE and the standard methods for inference.
(nes96 <- as_tibble(nes96))
## # A tibble: 944 x 10
## popul TVnews selfLR ClinLR DoleLR PID age educ income vote
## <int> <int> <ord> <ord> <ord> <ord> <int> <ord> <ord> <fct>
## 1 0 7 extCon extLib Con strRep 36 HS $3Kminus Dole
## 2 190 1 sliLib sliLib sliCon weakDem 20 Coll $3Kminus Clinton
## 3 31 7 Lib Lib Con weakDem 24 BAdeg $3Kminus Clinton
## 4 83 4 sliLib Mod sliCon weakDem 28 BAdeg $3Kminus Clinton
## 5 640 7 sliCon Con Mod strDem 68 BAdeg $3Kminus Clinton
## 6 110 3 sliLib Mod Con weakDem 21 Coll $3Kminus Clinton
## 7 100 7 sliCon Con Mod weakDem 77 Coll $3Kminus Clinton
## 8 31 1 sliCon Mod sliCon indRep 21 Coll $3Kminus Clinton
## 9 180 7 Mod Con sliLib indind 31 Coll $3Kminus Clinton
## 10 2800 0 sliLib sliLib extCon strDem 39 HS $3Kminus Clinton
## # … with 934 more rows
summary(nes96)
## popul TVnews selfLR ClinLR DoleLR
## Min. : 0.0 Min. :0.000 extLib: 16 extLib:109 extLib: 13
## 1st Qu.: 1.0 1st Qu.:1.000 Lib :103 Lib :317 Lib : 31
## Median : 22.0 Median :3.000 sliLib:147 sliLib:236 sliLib: 43
## Mean : 306.4 Mean :3.728 Mod :256 Mod :160 Mod : 87
## 3rd Qu.: 110.0 3rd Qu.:7.000 sliCon:170 sliCon: 67 sliCon:195
## Max. :7300.0 Max. :7.000 Con :218 Con : 36 Con :460
## extCon: 34 extCon: 19 extCon:115
## PID age educ income vote
## strDem :200 Min. :19.00 MS : 13 $60K-$75K:103 Clinton:551
## weakDem:180 1st Qu.:34.00 HSdrop: 52 $50K-$60K:100 Dole :393
## indDem :108 Median :44.00 HS :248 $30K-$35K: 70
## indind : 37 Mean :47.04 Coll :187 $25K-$30K: 68
## indRep : 94 3rd Qu.:58.00 CCdeg : 90 $105Kplus: 68
## weakRep:150 Max. :91.00 BAdeg :227 $35K-$40K: 62
## strRep :175 MAdeg :127 (Other) :473
PID
(party identification) has ordered categories strDem
(strong democratic), weakDem
(weak democratic), indDem
(independent democratic), indind
(independent), indRep
(independent republican), weakRep
(weak Republican), and strRep
(strong republican). In this analysis we only consider the coarsened categories: Democratic
, Independent
, and Republican
.nes96 <- nes96 %>%
mutate(sPID = recode(PID,
strDem = "Democrat",
weakDem = "Democrat",
indDem = "Independent",
indind = "Independent",
indRep = "Independent",
weakRep = "Republican",
strRep = "Republican"))
summary(nes96$sPID)
## Democrat Independent Republican
## 380 239 325
table(nes96$income)
##
## $3Kminus $3K-$5K $5K-$7K $7K-$9K $9K-$10K $10K-$11K $11K-$12K
## 19 12 17 19 18 13 11
## $12K-$13K $13K-$14K $14K-$15K $15K-$17K $17K-$20K $20K-$22K $22K-$25K
## 17 10 15 23 35 26 39
## $25K-$30K $30K-$35K $35K-$40K $40K-$45K $45K-$50K $50K-$60K $60K-$75K
## 68 70 62 48 51 100 103
## $75K-$90K $90K-$105K $105Kplus
## 53 47 68
inca <- c(1.5, 4, 6, 8, 9.5, 10.5, 11.5, 12.5, 13.5,
14.5, 16, 18.5, 21, 23.5, 27.5, 32.5, 37.5,
42.5, 47.5, 55, 67.5, 82.5, 97.5, 115)
nes96 <- nes96 %>%
mutate(nincome = inca[unclass(income)])
summary(nes96$nincome)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.50 23.50 37.50 46.58 67.50 115.00
How does party identification changes with education level?
nes96 %>%
count(educ, sPID) %>%
group_by(educ) %>%
mutate(prop = prop.table(n)) %>%
print(width = Inf) %>%
ggplot() +
geom_line(mapping = aes(x = educ, y = prop, group = sPID, color = sPID)) +
scale_color_manual(values = c("blue", "green", "red")) +
labs(x = "Education", y = "Proportion")
## # A tibble: 21 x 4
## # Groups: educ [7]
## educ sPID n prop
## <ord> <ord> <int> <dbl>
## 1 MS Democrat 9 0.692
## 2 MS Independent 3 0.231
## 3 MS Republican 1 0.0769
## 4 HSdrop Democrat 29 0.558
## 5 HSdrop Independent 14 0.269
## 6 HSdrop Republican 9 0.173
## 7 HS Democrat 108 0.435
## 8 HS Independent 63 0.254
## 9 HS Republican 77 0.310
## 10 Coll Democrat 74 0.396
## # … with 11 more rows
How does party identification changes with income level?
nes96 %>%
count(income, sPID) %>%
group_by(income) %>%
mutate(prop = prop.table(n)) %>%
print(width = Inf) %>%
ggplot() +
geom_line(mapping = aes(x = income, y = prop, group = sPID, color = sPID)) +
scale_color_manual(values = c("blue", "green", "red")) +
labs(x = "Income", y = "Proportion") +
theme(axis.text.x = element_text(angle = 90))
## # A tibble: 72 x 4
## # Groups: income [24]
## income sPID n prop
## <ord> <ord> <int> <dbl>
## 1 $3Kminus Democrat 13 0.684
## 2 $3Kminus Independent 3 0.158
## 3 $3Kminus Republican 3 0.158
## 4 $3K-$5K Democrat 7 0.583
## 5 $3K-$5K Independent 4 0.333
## 6 $3K-$5K Republican 1 0.0833
## 7 $5K-$7K Democrat 9 0.529
## 8 $5K-$7K Independent 3 0.176
## 9 $5K-$7K Republican 5 0.294
## 10 $7K-$9K Democrat 11 0.579
## # … with 62 more rows
How does party identification changes with age?
nes96 %>%
mutate(age_grp = cut(age, breaks = seq(from = 10, to = 100, by = 10))) %>%
count(age_grp, sPID) %>%
group_by(age_grp) %>%
mutate(prop = prop.table(n)) %>%
print(width = Inf) %>%
ggplot() +
geom_line(mapping = aes(x = age_grp, y = prop, group = sPID, color = sPID)) +
scale_color_manual(values = c("blue", "green", "red")) +
labs(x = "Age", y = "Proportion") +
theme(axis.text.x = element_text(angle = 90))
## # A tibble: 26 x 4
## # Groups: age_grp [9]
## age_grp sPID n prop
## <fct> <ord> <int> <dbl>
## 1 (10,20] Democrat 3 0.333
## 2 (10,20] Independent 2 0.222
## 3 (10,20] Republican 4 0.444
## 4 (20,30] Democrat 60 0.438
## 5 (20,30] Independent 37 0.270
## 6 (20,30] Republican 40 0.292
## 7 (30,40] Democrat 87 0.348
## 8 (30,40] Independent 65 0.26
## 9 (30,40] Republican 98 0.392
## 10 (40,50] Democrat 89 0.441
## # … with 16 more rows
multinom
function is avaible in the nnet
package.library(nnet)
mmod <- multinom(sPID ~ age + educ + nincome, data = nes96)
## # weights: 30 (18 variable)
## initial value 1037.090001
## iter 10 value 990.568608
## iter 20 value 984.319052
## final value 984.166272
## converged
summary(mmod)
## Call:
## multinom(formula = sPID ~ age + educ + nincome, data = nes96)
##
## Coefficients:
## (Intercept) age educ.L educ.Q educ.C
## Independent -1.197260 0.0001534525 0.06351451 -0.1217038 0.1119542
## Republican -1.642656 0.0081943691 1.19413345 -1.2292869 0.1544575
## educ^4 educ^5 educ^6 nincome
## Independent -0.07657336 0.1360851 0.15427826 0.01623911
## Republican -0.02827297 -0.1221176 -0.03741389 0.01724679
##
## Std. Errors:
## (Intercept) age educ.L educ.Q educ.C educ^4
## Independent 0.3265951 0.005374592 0.4571884 0.4142859 0.3498491 0.2883031
## Republican 0.3312877 0.004902668 0.6502670 0.6041924 0.4866432 0.3605620
## educ^5 educ^6 nincome
## Independent 0.2494706 0.2171578 0.003108585
## Republican 0.2696036 0.2031859 0.002881745
##
## Residual Deviance: 1968.333
## AIC: 2004.333
step(mmod)
## Start: AIC=2004.33
## sPID ~ age + educ + nincome
##
## trying - age
## # weights: 27 (16 variable)
## initial value 1037.090001
## iter 10 value 988.896864
## iter 20 value 985.822223
## final value 985.812737
## converged
## trying - educ
## # weights: 12 (6 variable)
## initial value 1037.090001
## iter 10 value 992.269502
## final value 992.269484
## converged
## trying - nincome
## # weights: 27 (16 variable)
## initial value 1037.090001
## iter 10 value 1009.025560
## iter 20 value 1006.961593
## final value 1006.955275
## converged
## Df AIC
## - educ 6 1996.539
## - age 16 2003.625
## <none> 18 2004.333
## - nincome 16 2045.911
## # weights: 12 (6 variable)
## initial value 1037.090001
## iter 10 value 992.269502
## final value 992.269484
## converged
##
## Step: AIC=1996.54
## sPID ~ age + nincome
##
## trying - age
## # weights: 9 (4 variable)
## initial value 1037.090001
## final value 992.712152
## converged
## trying - nincome
## # weights: 9 (4 variable)
## initial value 1037.090001
## final value 1020.425203
## converged
## Df AIC
## - age 4 1993.424
## <none> 6 1996.539
## - nincome 4 2048.850
## # weights: 9 (4 variable)
## initial value 1037.090001
## final value 992.712152
## converged
##
## Step: AIC=1993.42
## sPID ~ nincome
##
## trying - nincome
## # weights: 6 (2 variable)
## initial value 1037.090001
## final value 1020.636052
## converged
## Df AIC
## <none> 4 1993.424
## - nincome 2 2045.272
## Call:
## multinom(formula = sPID ~ nincome, data = nes96)
##
## Coefficients:
## (Intercept) nincome
## Independent -1.1749331 0.01608683
## Republican -0.9503591 0.01766457
##
## Residual Deviance: 1985.424
## AIC: 1993.424
nincome
only and then compare the deviances.mmodi <- multinom(sPID ~ nincome, data = nes96)
## # weights: 9 (4 variable)
## initial value 1037.090001
## final value 992.712152
## converged
deviance(mmodi) - deviance(mmod)
## [1] 17.09176
mmod$edf - mmodi$edf
## [1] 14
pchisq(deviance(mmodi) - deviance(mmod), mmod$edf - mmodi$edf, lower = F)
## [1] 0.2513209
summary(mmodi)
## Call:
## multinom(formula = sPID ~ nincome, data = nes96)
##
## Coefficients:
## (Intercept) nincome
## Independent -1.1749331 0.01608683
## Republican -0.9503591 0.01766457
##
## Std. Errors:
## (Intercept) nincome
## Independent 0.1536103 0.002849738
## Republican 0.1416859 0.002652532
##
## Residual Deviance: 1985.424
## AIC: 1993.424
- The intercept terms model the probabilities of party identification when income is zero.
cc <- c(0, -1.17493, -0.95036)
exp(cc) / sum(exp(cc))
## [1] 0.5898166 0.1821593 0.2280242
- The slope terms represent the log-odds of moving from the baseline category `Democrat` to `Independent` and `Republican` respectively for a unit change of $1000 in income.
(pp <- predict(mmodi, data.frame(nincome = c(0, 1)), type = "probs"))
## Democrat Independent Republican
## 1 0.5898168 0.1821588 0.2280244
## 2 0.5857064 0.1838228 0.2304708
log(pp[1, 1] * pp[2, 2] / (pp[1, 2] * pp[2, 1]))
## [1] 0.01608683
log(pp[1, 1] * pp[2, 3] / (pp[1, 3] * pp[2, 1]))
## [1] 0.01766457
il <- c(8, 26, 42, 58, 74, 90, 107)
predict(mmodi, data.frame(nincome = il), type = "probs") %>%
as_tibble() %>%
mutate(income = il) %>%
pivot_longer(Democrat:Republican, names_to = "sPID", values_to = "prob") %>%
ggplot() +
geom_line(mapping = aes(x = income, y = prob, group = sPID, color = sPID)) +
scale_color_manual(values = c("blue", "green", "red")) +
labs(x = "Income", y = "Probability")
(cns <- as_tibble(cns))
## # A tibble: 16 x 7
## Area NoCNS An Sp Other Water Work
## <fct> <int> <int> <int> <int> <int> <fct>
## 1 Cardiff 4091 5 9 5 110 NonManual
## 2 Newport 1515 1 7 0 100 NonManual
## 3 Swansea 2394 9 5 0 95 NonManual
## 4 GlamorganE 3163 9 14 3 42 NonManual
## 5 GlamorganW 1979 5 10 1 39 NonManual
## 6 GlamorganC 4838 11 12 2 161 NonManual
## 7 MonmouthV 2362 6 8 4 83 NonManual
## 8 MonmouthOther 1604 3 6 0 122 NonManual
## 9 Cardiff 9424 31 33 14 110 Manual
## 10 Newport 4610 3 15 6 100 Manual
## 11 Swansea 5526 19 30 4 95 Manual
## 12 GlamorganE 13217 55 71 19 42 Manual
## 13 GlamorganW 8195 30 44 10 39 Manual
## 14 GlamorganC 7803 25 28 12 161 Manual
## 15 MonmouthV 9962 36 37 13 83 Manual
## 16 MonmouthOther 3172 8 13 3 122 Manual
Suppose we have \(J\) ordered categories and \(p_{ij} = \mathbb{P}(Y_i = j)\) for \(j=1, \ldots, J\). For ordered responses, it is more convenient to work with the cumulative probabilities \[ \gamma_{ij} = \mathbb{P}(Y_i \le j). \] Since \(\gamma_{iJ}=1\), we only need to model \(J-1\) cumulative probabilities.
Latent variable interpretation.
theta <- c(-2, -1, 2)
tibble(t = seq(-6, 6, 0.1), pdf = dlogis(t, location = 0, scale = 1)) %>%
ggplot() +
geom_line(mapping = aes(x = t, y = pdf)) +
geom_segment(mapping = aes(x = theta[1], y = 0, xend = theta[1], yend = dlogis(theta[1]))) +
geom_segment(mapping = aes(x = theta[2], y = 0, xend = theta[2], yend = dlogis(theta[2]))) +
geom_segment(mapping = aes(x = theta[3], y = 0, xend = theta[3], yend = dlogis(theta[3]))) +
labs(x = "t", y = "Density") +
annotate("text", x = theta[1], y = 0, label = expression(theta[1])) +
annotate("text", x = theta[2], y = 0, label = expression(theta[2])) +
annotate("text", x = theta[3], y = 0, label = expression(theta[3]))
## Warning in is.na(x): is.na() applied to non-(list or vector) of type
## 'expression'
## Warning in is.na(x): is.na() applied to non-(list or vector) of type
## 'expression'
## Warning in is.na(x): is.na() applied to non-(list or vector) of type
## 'expression'
The logit link function dictates \[ \log \frac{\gamma_{ij}}{1 - \gamma_{ij}} = \theta_j - \mathbf{x}_i^T \boldsymbol{\beta}, \quad j = 1,\ldots,J-1. \]
The corresponding inverse link function is \[ \gamma_{ij} = \frac{\exp (\theta_j - \mathbf{x}_i^T \boldsymbol{\beta})}{1 + \exp (\theta_j - \mathbf{x}_i^T \boldsymbol{\beta})}. \] When \(\beta_j > 0\), as \(x_{ij}\) increases, \(\gamma_{ij} = \mathbb{P}(Y_i \le j)\) decreases the same amount for all \(j < J\), thus \(Y_i\) is more likely to take the largest value \(J\). This motivates the minus sign in the definition of the model since it allows easier interpretation of \(\beta\).
It is called the proportional odds model because the relative odds for \(y \le j\) comparing \(\mathbf{x}_1\) and \(\mathbf{x}_2\) are \[ \left( \frac{\gamma_j(\mathbf{x}_1)}{1 - \gamma_j(\mathbf{x}_1)} \right) / \left( \frac{\gamma_j(\mathbf{x}_2)}{1 - \gamma_j(\mathbf{x}_2)} \right) = \exp (- (\mathbf{x}_1 - \mathbf{x}_2)^T \boldsymbol{\beta}), \] which do not dependent on \(j\). Here \(\gamma_j(\mathbf{x}_i) = \gamma_{ij}\).
We can fit a proportional odds model using the polr
function from the MASS library
library(MASS)
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
pomod <- polr(sPID ~ age + educ + nincome, data = nes96)
summary(pomod)
##
## Re-fitting to get Hessian
## Call:
## polr(formula = sPID ~ age + educ + nincome, data = nes96)
##
## Coefficients:
## Value Std. Error t value
## age 0.005775 0.003887 1.48581
## educ.L 0.724087 0.384388 1.88374
## educ.Q -0.781361 0.351173 -2.22500
## educ.C 0.040168 0.291762 0.13767
## educ^4 -0.019925 0.232429 -0.08573
## educ^5 -0.079413 0.191533 -0.41462
## educ^6 -0.061104 0.157747 -0.38735
## nincome 0.012739 0.002140 5.95187
##
## Intercepts:
## Value Std. Error t value
## Democrat|Independent 0.6449 0.2435 2.6479
## Independent|Republican 1.7374 0.2493 6.9694
##
## Residual Deviance: 1984.211
## AIC: 2004.211
nincome
.pomodi <- step(pomod)
## Start: AIC=2004.21
## sPID ~ age + educ + nincome
##
## Df AIC
## - educ 6 2002.8
## <none> 2004.2
## - age 1 2004.4
## - nincome 1 2038.6
##
## Step: AIC=2002.83
## sPID ~ age + nincome
##
## Df AIC
## - age 1 2001.4
## <none> 2002.8
## - nincome 1 2047.2
##
## Step: AIC=2001.36
## sPID ~ nincome
##
## Df AIC
## <none> 2001.4
## - nincome 1 2045.3
summary(pomodi)
##
## Re-fitting to get Hessian
## Call:
## polr(formula = sPID ~ nincome, data = nes96)
##
## Coefficients:
## Value Std. Error t value
## nincome 0.01312 0.001971 6.657
##
## Intercepts:
## Value Std. Error t value
## Democrat|Independent 0.2091 0.1123 1.8627
## Independent|Republican 1.2916 0.1201 10.7526
##
## Residual Deviance: 1995.363
## AIC: 2001.363
nincome
.c(deviance(pomodi), pomodi$edf)
## [1] 1995.363 3.000
which can be compared to the corresponding multinomial logit model
c(deviance(pomod), pomod$edf)
## [1] 1984.211 10.000
We see the proportional odds model is justifiable.
pchisq(deviance(pomodi) - deviance(pomod),
pomod$edf - pomodi$edf,
lower.tail = FALSE)
## [1] 0.1321517
Interpretation of coefficients.
Democratic
to Independent
or from Independent
to Republican
increases by a factor ofexp(pomodi$coef[1])
## nincome
## 1.013206
as income increases by one unit ($1000).
Democrat
isilogit(pomodi$zeta[1])
## Democrat|Independent
## 0.5520865
and that of being an Independent
is
ilogit(pomodi$zeta[2]) - ilogit(pomodi$zeta[1])
## Independent|Republican
## 0.2323241
and that of being a Republican
is
1 - ilogit(pomodi$zeta[2])
## Independent|Republican
## 0.2155895
The predicted probabilities of each category at each income level is
l <- c(8, 26, 42, 58, 74, 90, 107)
predict(pomodi, data.frame(nincome = il), type = "probs")
## Democrat Independent Republican
## 1 0.5260129 0.2401191 0.2338679
## 2 0.4670450 0.2541588 0.2787962
## 3 0.4153410 0.2617693 0.3228897
## 4 0.3654362 0.2641882 0.3703756
## 5 0.3182635 0.2612285 0.4205080
## 6 0.2745456 0.2531189 0.4723355
## 7 0.2324161 0.2395468 0.5280371
opmod <- polr(sPID ~ nincome, method = "probit", data = nes96)
summary(opmod)
##
## Re-fitting to get Hessian
## Call:
## polr(formula = sPID ~ nincome, data = nes96, method = "probit")
##
## Coefficients:
## Value Std. Error t value
## nincome 0.008182 0.001208 6.775
##
## Intercepts:
## Value Std. Error t value
## Democrat|Independent 0.1284 0.0694 1.8510
## Independent|Republican 0.7976 0.0722 11.0399
##
## Residual Deviance: 1994.892
## AIC: 2000.892
l <- c(8, 26, 42, 58, 74, 90, 107)
predict(opmod, data.frame(nincome = il), type = "probs")
## Democrat Independent Republican
## 1 0.5251020 0.2428478 0.2320502
## 2 0.4664030 0.2542673 0.2793298
## 3 0.4147945 0.2602623 0.3249432
## 4 0.3646178 0.2620370 0.3733452
## 5 0.3166610 0.2595041 0.4238349
## 6 0.2716037 0.2527879 0.4756084
## 7 0.2275119 0.2414351 0.5310530
Suppose we use the cloglog link \[ \log (- \log (1 - \gamma_j(\mathbf{x}_i))) = \theta_j - \mathbf{x}_i^T \boldsymbol{\beta}. \]
The hazard of category \(j\) is the probability of falling in category \(j\) given that your category is greater than \(j\) \[ \text{Hazard}(j) = \mathbb{P}(Y_i = j \mid Y_i \ge j) = \frac{\mathbb{P}(Y_i = j)}{\mathbb{P}(Y_i \ge j)} = \frac{\gamma_{ij} - \gamma_{i,j-1}}{1 - \gamma_{i,j-1}}. \] The quantity \(- \log \mathbb{P}(Y > j)\) is called the cumulative hazard function.
Since \[\begin{eqnarray*} 1 - \gamma_j(\mathbf{x}) &=& \mathbb{P}(Y > j) = e^{- e^{\theta_j - \mathbf{x}^T \boldsymbol{\beta}}}, \end{eqnarray*}\] we have \[\begin{eqnarray*} \log \mathbb{P}(Y > j) = - e^{\theta_j - \mathbf{x}^T \boldsymbol{\beta}} \end{eqnarray*}\] and \[ \frac{- \log \mathbb{P}(Y > j \mid \mathbf{x}_1)}{- \log \mathbb{P}(Y > j \mid \mathbf{x}_2)} = e^{(\mathbf{x}_2 - \mathbf{x}_1)^T \boldsymbol{\beta}} \] or \[ \mathbb{P}(Y > j \mid \mathbf{x}_1) = [\mathbb{P}(Y > j \mid \mathbf{x}_2)]^{\exp (\mathbf{x}_2 - \mathbf{x}_1)^T \boldsymbol{\beta}}. \] It is called the proportional hazards model because the ratio of cumulative hazards does not depend on level \(j\).
polr(sPID ~ nincome, method = "cloglog", data = nes96)
## Call:
## polr(formula = sPID ~ nincome, data = nes96, method = "cloglog")
##
## Coefficients:
## nincome
## 0.008271135
##
## Intercepts:
## Democrat|Independent Independent|Republican
## -0.2866703 0.4564409
##
## Residual Deviance: 2003.96
## AIC: 2009.96
We see a relatively worse fit than proportional odds and ordered probit models.