Implementing EM Algorithm for Probit Regression
To implement EM algorithm for probit regression, at first I am going to prepare simulated data for modeling.
x <- rnorm(2000,0,2)
betas <- c(0.1 , 0.2) #True values
z <- rnorm(2000, mean = cbind(1,x) %*% betas, sd=1)
y <- z
y[y>=0] <- 1
y[y<0] <- 0
t <- glm(y~x, family = binomial(link = "probit"))
summary(t)
As the next step, we should calculate \(\gamma_i\):
\[\gamma_i=E(z_i|y=y_i)\]\(z_i\)s conditioned on \(y_i\) have truncated normal distribution. So the expected value can be calculated by the equations below. If \(y_i=1\),
\[E(z_i|z_i \geq 0)=x^T\beta+\frac{\phi(x^T\beta)}{\Phi(x^T\beta)}\]And If \(y_i=0\),
\[E(z_i|z_i<0) =x^T\beta-\frac{\phi(x^T\beta)}{\Phi(-x^T\beta)}\]b <- c(10,2) #initial values for betas
ex <- cbind(1,x)
mu <- ex %*% b
it=1
converged = FALSE
maxits = 100000
tol=0.0001
gammas=array(0,20)
while ((!converged) & (it < maxits)) {
b_old = b
gammas = ifelse(y==1,mu+dnorm(mu)/pnorm(mu), mu-dnorm(mu)/pnorm(-mu)) #E-step
b = solve(t(ex)%*%ex)%*%t(ex)%*%gammas #Mstep
mu = ex %*% b
it = it + 1
converged = max(abs(b_old - b)) <= tol
}
b
it
Note that the estimations are not at all sensitive to initial values.