Endogenous sampling with matching (also called "mixed sampling") occurs when the statistician samples from the non-right-censored subset at a predetermined proportion and matches on one or more exogenous variables when sampling from the right-censored subset. This design is widely applied in the duration analysis of firm failures, loan defaults, insurer insolvencies, and similar events, because non-right-censored observations (bankruptcies, defaults, and insolvencies in the respective examples) are rare. However, the common practice of using estimation procedures intended for random sampling or for the qualitative response model yields an estimator that is either inconsistent or inefficient. This paper proposes a consistent and efficient estimator and investigates its asymptotic properties. In addition, it evaluates the magnitude of the asymptotic bias that arises when the model is estimated as if the data were a random sample or an endogenous sample without matching. It also compares the relative efficiency of other commonly used estimators and provides a general guideline for choosing sample designs optimally. A Monte Carlo study with a simple example shows that random sampling yields an estimator with poor finite-sample properties when the population is extremely unbalanced in terms of default and non-default cases, whereas endogenous sampling and mixed sampling are robust in this situation.
Keywords: Duration models; Endogenous sampling with matching; Maximum likelihood estimator; Manski-Lerman estimator; Asymptotic distribution
Views expressed in the paper are those of the authors and do not necessarily reflect those of the Bank of Japan or the Institute for Monetary and Economic Studies.