Assume we have $n$ observations $\left(Y_i, x_i\right)$, $i=1, \cdots, n$, where $Y_i$ is the random response and $x_i=\left(x_{i 1}, \cdots, x_{i p}\right)^T$ is a vector of $p$ fixed covariates for the $i$th observation. Let $\beta=\left(\beta_1, \cdots, \beta_p\right)^T$ be an unknown $p$-length vector of regression coefficients. Let $\theta_i=\sum_{j=1}^p x_{i j} \beta_j$, $\mu_i=E\left(Y_i\right)$, and $\sigma_i^2=\operatorname{Var}\left(Y_i\right)$. Assume the density of $Y_i$ belongs to the following exponential family:
$$f\left(y_i ; \theta_i\right)=\exp \left\{\theta_i y_i-b\left(\theta_i\right)\right\}, \tag{1}$$
where $b^{\prime}\left(\theta_i\right)=\mu_i$ and $b^{\prime \prime}\left(\theta_i\right)=\sigma_i^2$. Suppose that all $\theta_i$'s are contained in a compact subset of a space $\Theta$. Let $\ell_n(\beta)$ be the log-likelihood function of the data, and let $H_n(\beta)=-\frac{\partial^2 \ell_n(\beta)}{\partial \beta \partial \beta^T}$.
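
As a worked step, and assuming the $Y_i$ are independent (which the statement above does not say explicitly), the log-likelihood and its derivatives under model (1) can be written out with $\theta_i = x_i^T \beta$:
$$\ell_n(\beta)=\sum_{i=1}^n\left\{\theta_i Y_i-b\left(\theta_i\right)\right\}, \qquad \frac{\partial \ell_n(\beta)}{\partial \beta}=\sum_{i=1}^n x_i\left\{Y_i-b^{\prime}\left(\theta_i\right)\right\}=\sum_{i=1}^n x_i\left(Y_i-\mu_i\right),$$
$$H_n(\beta)=\sum_{i=1}^n b^{\prime \prime}\left(\theta_i\right) x_i x_i^T=\sum_{i=1}^n \sigma_i^2 x_i x_i^T .$$
Under this form $H_n(\beta)$ does not depend on the responses and is positive semidefinite, so the conditions below are conditions on the fixed covariates and on $\beta$ only.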

Let $\mathcal{X}$ be the set of all $p$ covariates under consideration. Let $\alpha_0 \subset \mathcal{X}$ be the subset consisting of exactly the important covariates affecting $Y$ (those whose coefficients $\beta_j$ are nonzero). Let $\alpha$ be any subset of $\mathcal{X}$, and let $\beta(\alpha)$ be the vector of the components of $\beta$ that correspond to the covariates in $\alpha$. Let $A=\left\{\alpha: \alpha_0 \subset \alpha\right\}$ be the collection of models that include all important covariates. We assume:
(I) There exist positive constants $C_1, C_2$ such that for all sufficiently large $n$,
$$C_1 < \lambda_{\min }\left\{\frac{1}{n} H_n(\beta)\right\} \leq \lambda_{\max }\left\{\frac{1}{n} H_n(\beta)\right\} < C_2$$
where $\lambda_{\min }\left\{\frac{1}{n} H_n(\beta)\right\}$ and $\lambda_{\max }\left\{\frac{1}{n} H_n(\beta)\right\}$ are the smallest and largest eigenvalues of $\frac{1}{n} H_n(\beta)$, respectively.
(II) For any given $\varepsilon>0$, there exists a constant $\delta>0$ such that, when $n$ is sufficiently large,
$$(1-\varepsilon) H_n(\beta(\alpha)) \leq H_n(\tilde{\beta}) \leq(1+\varepsilon) H_n(\beta(\alpha))$$
for all $\alpha \in A$ and $\tilde{\beta}$ satisfying $\|\tilde{\beta}-\beta(\alpha)\| \leq \delta$.
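
To make condition (I) concrete, the following is a minimal numerical sketch, not part of the problem itself. It assumes a logistic model, i.e. $b(\theta)=\log(1+e^{\theta})$, with simulated covariates; the function name `hessian_logistic`, the sample size, and the coefficient values are all hypothetical choices for illustration. It computes $\frac{1}{n} H_n(\beta)=\frac{1}{n}\sum_i \sigma_i^2 x_i x_i^T$ and its extreme eigenvalues.

```python
import numpy as np


def hessian_logistic(X, beta):
    """Return H_n(beta) = sum_i sigma_i^2 x_i x_i^T for b(theta) = log(1 + e^theta)."""
    theta = X @ beta                      # theta_i = x_i^T beta
    mu = 1.0 / (1.0 + np.exp(-theta))     # mu_i = b'(theta_i)
    var = mu * (1.0 - mu)                 # sigma_i^2 = b''(theta_i)
    return (X * var[:, None]).T @ X       # weighted Gram matrix


# Hypothetical design: n fixed covariate vectors drawn once from Uniform(-1, 1).
rng = np.random.default_rng(0)
n, p = 500, 4
X = rng.uniform(-1.0, 1.0, size=(n, p))

# Hypothetical coefficients: the first two covariates are "important".
beta = np.array([1.0, -0.5, 0.0, 0.0])

H = hessian_logistic(X, beta)
eigs = np.linalg.eigvalsh(H / n)          # eigenvalues in ascending order
lam_min, lam_max = eigs[0], eigs[-1]
print("lambda_min:", lam_min, "lambda_max:", lam_max)
```

In this sketch $\sigma_i^2 \leq 1/4$ for every $i$, so $\lambda_{\max}\{\frac{1}{n} H_n(\beta)\}$ is at most a quarter of the largest eigenvalue of $\frac{1}{n}\sum_i x_i x_i^T$. The lower bound in (I) additionally needs the $\theta_i$ to stay in a compact set, so that $\sigma_i^2$ is bounded away from zero.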

For any model $\alpha$, let $\hat{\beta}_\alpha$ be the MLE of $\beta(\alpha)$ based on this model. Show that
$$\max _{\alpha \in A}\left\|\hat{\beta}_\alpha-\beta(\alpha)\right\|=O_p\left(n^{-1 / 3}\right)$$