Probit model question (regression)

Question

I'm reading a thesis and I need your help to understand the equation below.

$$\Pr(\text{failure}=1 \mid X_1,X_3,X_3,X_4)=\int_{-\infty}^z \varphi(k) \, dk\tag{1}$$

$\varphi(k)$ is the standard normal density and $z=\beta_1 X_1+\beta_2 X_2+\beta_3 X_3+\beta_4 X_4$
We estimate regression (1) through probit.

It means,
first we estimate probability of failure ($z$, in here) by probit test,
and then do the normal regression to estimate $\beta_1,\beta_2, \beta_3,\beta_4$?
(by using $z$ from first stage and we already have $X_1,X_2,X_3,X_4$ values)
We have to regress twice in this model?

Michael Hardy · Accepted Answer · 2013-08-25T13:25:45.693

In a probit model the data would typically consist of a list of cases, and for each case one would have observed the values of $X_1,X_2,X_3,X_4$ (the number of such predictors may vary) and whether there was a success or a failure. Let $n$ be the number of cases. One observes $(I_i,x_{1,i},\ldots,x_{4,i})$ for $i=1,\ldots,n$, and the value of $I_i$ would be "success" or "failure", and often this would be represented by $1$ or $0$, the number of successes in the $i$th case. But below I'll do it the other way, for sake of consistency with your notation, i.e. \begin{align} 0 & = \text{success}, \\ 1 & = \text{failure}. \end{align}

One estimates $\beta_1,\beta_2,\beta_3,\beta_4$ by maximum likelihood. That means the estimates $\widehat\beta_1,\widehat\beta_2,\widehat\beta_3,\widehat\beta_4$ would be the values that maximize the probability of the data that were actually observed. To be more explicit: The likelihood function is $$ L(\beta_1,\beta_2,\beta_3,\beta_4) = \left(\prod_{i\,:\,I_i=1} \int_{-\infty}^{\beta_1 x_{1,i}+\cdots+\beta_4 x_{4,i}} \varphi(k)\,dk \right)\left(\prod_{i\,:\,I_i=0} \left(1 - \int_{-\infty}^{\beta_1 x_{1,i}+\cdots+\beta_4 x_{4,i}} \varphi(k)\,dk\right) \right). $$ Each integral is from $-\infty$ to $\beta_1 x_{1,i}+\cdots+\beta_4 x_{4,i}$.

The estimates $\widehat\beta_1,\widehat\beta_2,\widehat\beta_3,\widehat\beta_4$ are the values that maximize this. These would be found by numerical methods; I don't think there's a closed form.

I think typically there would be an additional parameter: one would have $\alpha+\beta_1 x_{1,i}+\cdots+\beta_4 x_{4,i}$ as the upper bound of integration, and the likelihood function would be $L(\alpha,\beta_1,\beta_2,\beta_3,\beta_4)$.

So: No, you don't regress twice.

At some point while in graduate school I got the impression that the use of probit models had been superseded by the use of logit models. Later I got the impression that that may be a somewhat simple view of the matter and a more nuanced view exists. But I am not able to say any more than that now. — Michael Hardy, Aug 25 '13 at 13:19
Thanks a lot. In fact, I'm trying to regress that model another way. In the thesis, they use probit since they give $1$ if it fails, but I use the formula for $z$(failure prob) and then regress $z=\beta_1 X_1beta_2 X_2+\beta_3 X_3+\beta_4 X_4$ with regular way. I don't know much about probit so before I do that I want to know what exatly that means. Thanks again. — name, Aug 25 '13 at 13:33

Probit model question (regression)

1 Answers1