Biostat 200C Homework 2

Q1. Beta-Binomial
- Q1.1
- Q1.2
Q2. Motivation for quasi-binomial
Q3. Concavity of Poisson regression log-likelihood
- Q3.1
- Q3.2
- Q3.3
- Q3.4
Q4. Odds ratios
- Q4.1
- Q4.2

Q1. Beta-Binomial

Let \(Y_i\) be the number of successes in \(n_i\) trials with \[ Y_i \sim \text{Bin}(n_i, \pi_i), \] where the probabilities \(\pi_i\) have a Beta distribution \[ \pi_i \sim \text{Be}(\alpha, \beta) \] with density function \[ f(x; \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha) \Gamma(\beta)} x^{\alpha - 1} (1 - x)^{\beta - 1}, \quad x \in [0, 1], \alpha > 0, \beta > 0. \]

Q1.1

Find the mean and variance of \(\pi\).

Q1.2

Find the mean and variance of \(Y_i\) and show that the variance of \(Y_i\) is always larger than or equal to that of a Binomial random variable with the same number of trials \(n_i\) and the same mean.
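A quick simulation can serve as a sanity check on the overdispersion claim before attempting the proof. The sketch below assumes NumPy is available; the values of `alpha`, `beta`, `n`, and `reps` are illustrative choices, not part of the assignment. It draws \(\pi\) from the Beta distribution, then \(Y\) from the Binomial given \(\pi\), and compares the empirical variance of \(Y\) with the variance of a Binomial that has the same number of trials and the same mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative values only; they are not specified in the assignment.
alpha, beta, n = 2.0, 3.0, 20
reps = 200_000

# Two-stage sampling: pi ~ Be(alpha, beta), then Y | pi ~ Bin(n, pi).
pi = rng.beta(alpha, beta, size=reps)
y = rng.binomial(n, pi)

# Reference Binomial with the same number of trials and the same mean.
mean_pi = alpha / (alpha + beta)
binom_var = n * mean_pi * (1 - mean_pi)

emp_mean = y.mean()
emp_var = y.var()

print(f"empirical mean of Y:     {emp_mean:.3f}")
print(f"n * E[pi]:               {n * mean_pi:.3f}")
print(f"empirical variance of Y: {emp_var:.3f}")
print(f"Bin(n, E[pi]) variance:  {binom_var:.3f}")
```

The simulation only illustrates the inequality for one parameter setting; the exercise still asks for a general argument.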

Q2. Motivation for quasi-binomial

Verify that the log-likelihood \(\ell_i\) of a binomial proportion \(Y_i\), where \(m_i Y_i \sim \text{Bin}(m_i, p_i)\), satisfies
\[\begin{eqnarray*} \mathbb{E} \frac{\partial \ell_i}{\partial \mu_i} &=& 0 \\ \operatorname{Var} \frac{\partial \ell_i}{\partial \mu_i} &=& \frac{1}{\phi V(\mu_i)} \\ \mathbb{E} \frac{\partial^2 \ell_i}{\partial \mu_i^2} &=& - \frac{1}{\phi V(\mu_i)}, \end{eqnarray*}\] with \(\phi = 1\), \(\mu_i = p_i\), and \(V(\mu_i) = p_i (1 - p_i)/m_i\). Therefore, the \(U_i\) in the quasi-binomial method mimics the behavior of a binomial model.
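The three identities can be checked numerically for a single observation by summing over all possible outcomes. The sketch below uses only the Python standard library; `m` and `p` are illustrative values, and the two derivative functions are obtained by differentiating the binomial log-likelihood \(m y \log \mu + m (1 - y) \log (1 - \mu)\) (plus a constant) with respect to \(\mu\), which is an assumption about how \(\ell_i\) is parameterized.

```python
from math import comb

# Illustrative values only; they are not specified in the assignment.
m, p = 12, 0.3

def score(y):
    # First derivative of the log-likelihood in mu, evaluated at mu = p.
    return m * y / p - m * (1 - y) / (1 - p)

def second_derivative(y):
    # Second derivative of the log-likelihood in mu, evaluated at mu = p.
    return -m * y / p**2 - m * (1 - y) / (1 - p) ** 2

# Exact distribution of the proportion Y = K / m with K ~ Bin(m, p).
probs = [comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(m + 1)]
ys = [k / m for k in range(m + 1)]

e_score = sum(w * score(y) for w, y in zip(probs, ys))
var_score = sum(w * score(y) ** 2 for w, y in zip(probs, ys)) - e_score**2
e_second = sum(w * second_derivative(y) for w, y in zip(probs, ys))

# 1 / (phi * V(mu)) with phi = 1 and V(mu) = p (1 - p) / m.
inv_v = m / (p * (1 - p))

print(e_score, var_score, e_second, inv_v)
```

The expectation of the score should be zero up to rounding, and the other two quantities should match \(\pm 1/V(\mu_i)\).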

Q3. Concavity of Poisson regression log-likelihood

Let \(Y_1,\ldots,Y_n\) be independent random variables with \(Y_i \sim \text{Poisson}(\mu_i)\) and \(\log \mu_i = \mathbf{x}_i^T \boldsymbol{\beta}\), \(i = 1,\ldots,n\).

Q3.1

Write down the log-likelihood function.

Q3.2

Derive the gradient vector and Hessian matrix of the log-likelihood function with respect to the regression coefficients \(\boldsymbol{\beta}\).

Q3.3

Show that the log-likelihood function of the log-linear model is a concave function of the regression coefficients \(\boldsymbol{\beta}\). (Hint: show that the negative Hessian is a positive semidefinite matrix.)
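Concavity can be probed numerically in two ways: the midpoint inequality \(\ell((a + b)/2) \ge (\ell(a) + \ell(b))/2\) at random pairs of coefficient vectors, and the sign of the Hessian eigenvalues. The sketch below assumes NumPy and simulated data; the design matrix, the coefficients used to simulate `y`, and the Hessian expression \(-\mathbf{X}^T \operatorname{diag}(\mu) \mathbf{X}\) are assumptions made for illustration and should be compared against your own derivation in Q3.2.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data; sizes and coefficients are illustrative only.
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = rng.poisson(np.exp(X @ np.array([0.2, 0.5, -0.3])))

def loglik(beta):
    # Poisson log-likelihood up to the additive constant -sum(log(y_i!)).
    eta = X @ beta
    return float(np.sum(y * eta - np.exp(eta)))

def hessian(beta):
    # Assumed form of the Hessian: -X^T diag(mu) X with mu = exp(X beta).
    mu = np.exp(X @ beta)
    return -(X.T * mu) @ X

midpoint_ok = True
max_eig = -np.inf
for _ in range(200):
    a, b = rng.normal(size=p), rng.normal(size=p)
    # Midpoint inequality for a concave function.
    gap = loglik((a + b) / 2) - 0.5 * (loglik(a) + loglik(b))
    midpoint_ok = midpoint_ok and gap >= -1e-9
    # Largest Hessian eigenvalue should not be positive.
    max_eig = max(max_eig, np.linalg.eigvalsh(hessian(a)).max())

print("midpoint inequality held at all sampled pairs:", midpoint_ok)
print("largest Hessian eigenvalue seen:", max_eig)
```

Passing these checks at sampled points is evidence, not a proof; the exercise asks for the general argument.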

Q3.4

Show that, for a model that includes an intercept, the fitted values \(\widehat{\mu}_i\) from the maximum likelihood estimates satisfy \[ \sum_i \widehat{\mu}_i = \sum_i y_i. \] Therefore the deviance reduces to \[ D = 2 \sum_i y_i \log \frac{y_i}{\widehat{\mu}_i}. \]
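The identity can be observed on simulated data by fitting the model with a few Newton steps. The sketch below assumes NumPy, a design matrix whose first column is an intercept, and the standard gradient \(\mathbf{X}^T(\mathbf{y} - \boldsymbol{\mu})\) and Hessian \(-\mathbf{X}^T \operatorname{diag}(\mu) \mathbf{X}\); the data-generating values are illustrative. It also compares the reduced deviance with the general Poisson deviance \(2 \sum_i [y_i \log (y_i / \widehat{\mu}_i) - (y_i - \widehat{\mu}_i)]\), using the convention \(0 \log 0 = 0\).

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated data; sample size and true coefficients are illustrative only.
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])   # first column is the intercept
y = rng.poisson(np.exp(0.5 + 0.3 * x))

# Plain Newton iterations, started at the intercept-only fit.
beta = np.array([np.log(y.mean()), 0.0])
for _ in range(25):
    mu = np.exp(X @ beta)
    grad = X.T @ (y - mu)
    if np.max(np.abs(grad)) < 1e-8:
        break
    hess = -(X.T * mu) @ X
    beta = beta - np.linalg.solve(hess, grad)

mu_hat = np.exp(X @ beta)

# y * log(y / mu_hat), with the convention 0 * log(0) = 0.
pos = y > 0
ylog = np.zeros(n)
ylog[pos] = y[pos] * np.log(y[pos] / mu_hat[pos])

dev_full = 2 * np.sum(ylog - (y - mu_hat))
dev_reduced = 2 * np.sum(ylog)

print("sum of fitted values:", mu_hat.sum())
print("sum of responses:    ", y.sum())
print("deviance (general form):", dev_full)
print("deviance (reduced form):", dev_reduced)
```

Dropping the column of ones from `X` in this sketch is a simple way to see that the two sums need not agree without an intercept.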

Q4. Odds ratios

Consider a \(2 \times 2\) contingency table from a prospective study in which people who were or were not exposed to some pollutant are followed up and, after several years, categorized according to the presence or absence of a disease. The following table shows the probabilities for each cell. The odds of disease for either exposure group are \(O_i = \pi_i / (1 - \pi_i)\), for \(i = 1,2\), and so the odds ratio \[ \phi = \frac{O_1}{O_2} = \frac{\pi_1(1 - \pi_2)}{\pi_2 (1 - \pi_1)} \] is a measure of the relative likelihood of disease for the exposed and not exposed groups.

	Diseased	Not diseased
Exposed	\(\pi_1\)	\(1 - \pi_1\)
Not exposed	\(\pi_2\)	\(1 - \pi_2\)

Q4.1

For the simple logistic model \[ \pi_i = \frac{e^{\beta_i}}{1 + e^{\beta_i}}, \] show that if there is no difference between the exposed and not exposed groups (i.e., \(\beta_1 = \beta_2\)), then \(\phi = 1\).

Q4.2

Consider \(J\) \(2 \times 2\) tables, one for each level \(x_j\) of a factor, such as age group, with \(j=1,\ldots, J\). For the logistic model \[ \pi_{ij} = \frac{e^{\alpha_i + \beta_i x_j}}{1 + e^{\alpha_i + \beta_i x_j}}, \quad i = 1,2, \quad j= 1,\ldots, J, \] show that \(\log \phi\) is constant over all tables if \(\beta_1 = \beta_2\).
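A small numerical experiment shows the behavior the exercise asks you to prove. The sketch below uses only the Python standard library; the function names, the coefficient values, and the levels in `xs` are hypothetical choices for illustration. It evaluates the log odds ratio at several levels \(x_j\), once with equal slopes and once with unequal slopes.

```python
import math

def prob(alpha, beta, x):
    # Logistic model: pi = exp(alpha + beta x) / (1 + exp(alpha + beta x)).
    eta = alpha + beta * x
    return math.exp(eta) / (1 + math.exp(eta))

def log_odds_ratio(alpha1, beta1, alpha2, beta2, x):
    # log of phi = pi_1 (1 - pi_2) / (pi_2 (1 - pi_1)) at level x.
    p1 = prob(alpha1, beta1, x)
    p2 = prob(alpha2, beta2, x)
    return math.log(p1 * (1 - p2) / (p2 * (1 - p1)))

# Hypothetical factor levels and coefficients.
xs = [0.0, 1.0, 2.0, 3.0]
equal_slopes = [log_odds_ratio(0.4, 0.7, -0.2, 0.7, x) for x in xs]
unequal_slopes = [log_odds_ratio(0.4, 0.7, -0.2, 0.1, x) for x in xs]

print("beta_1 == beta_2:", [round(v, 6) for v in equal_slopes])
print("beta_1 != beta_2:", [round(v, 6) for v in unequal_slopes])
```

With equal slopes the printed values coincide across levels, while with unequal slopes they change with \(x_j\).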