Let \(Y_i\) be the number of successes in \(n_i\) trials with \[ Y_i \sim \text{Bin}(n_i, \pi_i), \] where the probabilities \(\pi_i\) have a Beta distribution \[ \pi_i \sim \text{Be}(\alpha, \beta) \] with density function \[ f(x; \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha) \Gamma(\beta)} x^{\alpha - 1} (1 - x)^{\beta - 1}, \quad x \in [0, 1], \alpha > 0, \beta > 0. \]
Find the mean and variance of \(\pi\).
Find the mean and variance of \(Y_i\) and show that the variance of \(Y_i\) is always larger than or equal to that of a Binomial random variable with the same batch size and mean.
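As a numerical sanity check (not a substitute for the derivation), the sketch below sums the Beta-Binomial probability mass function exactly and compares the resulting mean and variance with the standard closed forms \(n\mu\) and \(n\mu(1-\mu)\{1 + (n-1)/(\alpha+\beta+1)\}\), where \(\mu = \alpha/(\alpha+\beta)\). The function names and the values of `n`, `a`, and `b` are illustrative assumptions, not part of the exercise.

```python
from math import comb, exp, lgamma, log

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binomial_pmf(k, n, a, b):
    """P(Y = k) when Y | pi ~ Bin(n, pi) and pi ~ Be(a, b)."""
    return exp(log(comb(n, k)) + log_beta(k + a, n - k + b) - log_beta(a, b))

def moments(n, a, b):
    """Mean and variance of Y obtained by summing over the pmf."""
    pmf = [beta_binomial_pmf(k, n, a, b) for k in range(n + 1)]
    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    return mean, var

# Illustrative (assumed) parameter values.
n, a, b = 20, 2.0, 3.0
mean, var = moments(n, a, b)

mu = a / (a + b)                                    # mean of pi
binom_var = n * mu * (1 - mu)                       # variance of Bin(n, mu)
closed_var = binom_var * (1 + (n - 1) / (a + b + 1))

print(mean, n * mu)
print(var, closed_var, binom_var)
```

The inflation factor \(1 + (n-1)/(\alpha+\beta+1)\) is at least one, which is the overdispersion claim in the exercise; it equals one only when \(n = 1\).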
Verify that the log-likelihood \(\ell_i\) of a binomial proportion \(Y_i\), where \(m_i Y_i \sim \text{Bin}(m_i, p_i)\), satisfies
\[\begin{eqnarray*}
\mathbb{E} \frac{\partial \ell_i}{\partial \mu_i} &=& 0 \\
\operatorname{Var} \frac{\partial \ell_i}{\partial \mu_i} &=& \frac{1}{\phi V(\mu_i)} \\
\mathbb{E} \frac{\partial^2 \ell_i}{\partial \mu_i^2} &=& - \frac{1}{\phi V(\mu_i)},
\end{eqnarray*}\] with \(\phi = 1\), \(\mu_i = p_i\), and \(V(\mu_i) = p_i (1 - p_i)/m_i\). Therefore the \(U_i\) in the quasi-binomial method mimics the behavior of the score of a binomial model.
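The three identities can be checked numerically before attempting the algebra. The following sketch, under the assumption that \(\ell_i = m_i\{y_i \log p_i + (1 - y_i)\log(1 - p_i)\}\) up to a constant, computes the exact expectations by summing over the binomial support. The function name `score_moments` and the values of `m` and `p` are hypothetical choices for illustration.

```python
from math import comb

def score_moments(m, p):
    """Exact E[score], Var[score], and E[second derivative] of the
    log-likelihood of a binomial proportion Y = K/m, K ~ Bin(m, p)."""
    e_score = e_score_sq = e_hess = 0.0
    for k in range(m + 1):
        prob = comb(m, k) * p**k * (1 - p) ** (m - k)
        y = k / m
        score = m * (y - p) / (p * (1 - p))              # d ell / d p
        hess = -m * (y / p**2 + (1 - y) / (1 - p) ** 2)  # d^2 ell / d p^2
        e_score += prob * score
        e_score_sq += prob * score**2
        e_hess += prob * hess
    var_score = e_score_sq - e_score**2
    return e_score, var_score, e_hess

# Illustrative (assumed) values.
m, p = 12, 0.3
e_score, var_score, e_hess = score_moments(m, p)

# 1 / (phi * V(mu)) with phi = 1 and V(mu) = p (1 - p) / m.
target = m / (p * (1 - p))

print(e_score, var_score, e_hess, target)
```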
Let \(Y_1,\ldots,Y_n\) be independent random variables with \(Y_i \sim \text{Poisson}(\mu_i)\) and \(\log \mu_i = \mathbf{x}_i^T \boldsymbol{\beta}\), \(i = 1,\ldots,n\).
Write down the log-likelihood function.
Derive the gradient vector and Hessian matrix of the log-likelihood function with respect to the regression coefficients \(\boldsymbol{\beta}\).
Show that the log-likelihood function of the log-linear model is a concave function of the regression coefficients \(\boldsymbol{\beta}\). (Hint: show that the negative Hessian is a positive semidefinite matrix.)
Show that, when the model includes an intercept, the fitted values \(\widehat{\mu}_i\) from the maximum likelihood estimates satisfy \[ \sum_i \widehat{\mu}_i = \sum_i y_i. \] Therefore the deviance reduces to \[ D = 2 \sum_i y_i \log \frac{y_i}{\widehat{\mu}_i}. \]
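The pieces of this exercise can be tried out on a small example. The sketch below fits a log-linear Poisson model with an intercept and one covariate by Newton's method, using the gradient \(\mathbf{X}^T(\mathbf{y} - \boldsymbol{\mu})\) and negative Hessian \(\mathbf{X}^T \operatorname{diag}(\boldsymbol{\mu}) \mathbf{X}\), then compares \(\sum_i \widehat{\mu}_i\) with \(\sum_i y_i\) and the full Poisson deviance with its reduced form. The toy data and the function name `fit_poisson` are assumptions made purely for illustration.

```python
from math import exp, log

# Hypothetical toy data: counts y and a single covariate x.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1, 3, 4, 9, 14]

def fit_poisson(x, y, iters=50):
    """Newton's method for log(mu_i) = b0 + b1 * x_i."""
    b0, b1 = log(sum(y) / len(y)), 0.0
    for _ in range(iters):
        mu = [exp(b0 + b1 * xi) for xi in x]
        # Gradient X^T (y - mu).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Negative Hessian X^T diag(mu) X, a 2 x 2 matrix.
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01**2
        # Newton step: solve the 2 x 2 linear system explicitly.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

b0, b1 = fit_poisson(x, y)
mu_hat = [exp(b0 + b1 * xi) for xi in x]

# Full Poisson deviance and the reduced form from the exercise.
dev_full = 2 * sum(yi * log(yi / mi) - (yi - mi) for yi, mi in zip(y, mu_hat))
dev_reduced = 2 * sum(yi * log(yi / mi) for yi, mi in zip(y, mu_hat))

print(sum(mu_hat), sum(y))
print(dev_full, dev_reduced)
```

The first component of the gradient is \(\sum_i (y_i - \mu_i)\), so setting it to zero at the maximum is what forces the two sums to agree. All counts in the toy data are positive, which avoids the \(0 \log 0\) convention in the deviance.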
Consider a \(2 \times 2\) contingency table from a prospective study in which people who were or were not exposed to some pollutant are followed up and, after several years, categorized according to the presence or absence of a disease. The following table shows the probabilities for each cell. The odds of disease for either exposure group are \(O_i = \pi_i / (1 - \pi_i)\), for \(i = 1,2\), and so the odds ratio \[ \phi = \frac{O_1}{O_2} = \frac{\pi_1(1 - \pi_2)}{\pi_2 (1 - \pi_1)} \] is a measure of the relative likelihood of disease for the exposed and not exposed groups.
| | Diseased | Not diseased |
|---|---|---|
| Exposed | \(\pi_1\) | \(1 - \pi_1\) |
| Not exposed | \(\pi_2\) | \(1 - \pi_2\) |
For the simple logistic model \[ \pi_i = \frac{e^{\beta_i}}{1 + e^{\beta_i}}, \] show that if there is no difference between the exposed and not exposed groups (i.e., \(\beta_1 = \beta_2\)), then \(\phi = 1\).
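A quick numerical illustration of this claim: under the simple logistic model the odds are \(O_i = e^{\beta_i}\), so the odds ratio should equal \(e^{\beta_1 - \beta_2}\). The sketch below checks this for arbitrarily chosen (assumed) coefficient values; `logistic` and `odds_ratio` are hypothetical helper names.

```python
from math import exp

def logistic(b):
    """pi = e^b / (1 + e^b)."""
    return exp(b) / (1 + exp(b))

def odds_ratio(pi1, pi2):
    """phi = pi1 (1 - pi2) / (pi2 (1 - pi1))."""
    return pi1 * (1 - pi2) / (pi2 * (1 - pi1))

# Illustrative (assumed) coefficients.
beta1, beta2 = 0.7, -0.4

phi = odds_ratio(logistic(beta1), logistic(beta2))
phi_equal = odds_ratio(logistic(beta1), logistic(beta1))  # case beta1 = beta2

print(phi, exp(beta1 - beta2))
print(phi_equal)
```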
Consider \(J\) \(2 \times 2\) tables, one for each level \(x_j\) of a factor, such as age group, with \(j=1,\ldots, J\). For the logistic model \[ \pi_{ij} = \frac{e^{\alpha_i + \beta_i x_j}}{1 + e^{\alpha_i + \beta_i x_j}}, \quad i = 1,2, \quad j= 1,\ldots, J, \] show that \(\log \phi\) is constant over all tables if \(\beta_1 = \beta_2\).
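The stratified case can be explored the same way. In the sketch below the log odds ratio in table \(j\) is computed directly from the cell probabilities; with a common slope it should equal \(\alpha_1 - \alpha_2\) at every level \(x_j\), while unequal slopes make it vary with \(x_j\). The levels `xs` and all coefficient values are assumptions chosen for illustration.

```python
from math import exp, log

def prob(alpha, beta, x):
    """pi = e^(alpha + beta x) / (1 + e^(alpha + beta x))."""
    eta = alpha + beta * x
    return exp(eta) / (1 + exp(eta))

def log_odds_ratio(a1, b1, a2, b2, x):
    """log phi for the 2 x 2 table at factor level x."""
    p1, p2 = prob(a1, b1, x), prob(a2, b2, x)
    return log(p1 * (1 - p2) / (p2 * (1 - p1)))

# Hypothetical factor levels (e.g. age-group midpoints) and coefficients.
xs = [20, 30, 40, 50, 60]
a1, a2 = -2.0, -3.0

# Common slope: log phi should not depend on x.
common = [log_odds_ratio(a1, 0.05, a2, 0.05, x) for x in xs]
# Different slopes: log phi changes from table to table.
different = [log_odds_ratio(a1, 0.05, a2, 0.03, x) for x in xs]

print(common)
print(different)
```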