In many binary classification applications, such as disease diagnosis, practitioners commonly need to control the type I error (i.e., the conditional probability of misclassifying a class 0 observation as class 1) so that it remains below a desired threshold. To address this need, the Neyman-Pearson (NP) classification paradigm is a natural choice; it minimizes the type II error (i.e., the conditional probability of misclassifying a class 1 observation as class 0) while enforcing an upper bound, alpha, on the type I error. Although the NP paradigm has a century-long history in hypothesis testing, it was not recognized and implemented in classification schemes until recently. In contrast to the NP paradigm, common practices that directly control the empirical type I error under alpha are unsatisfactory, because the resulting classifiers are still likely to have a population type I error much larger than alpha.
In this work, I will introduce an umbrella algorithm that implements the NP paradigm for all scoring-type classification methods, such as logistic regression, support vector machines, and random forests. The umbrella algorithm uses sample splitting and order statistics to construct a classifier such that the population type I error is controlled under alpha with high probability.
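As a rough illustration of the order-statistics step described above, the sketch below computes the rank of the left-out class-0 score to use as the classification threshold. It assumes a scoring function has already been trained on one half of the split, and uses a binomial tail bound on the violation probability P(population type I error > alpha); the function name and the tolerance parameter delta are illustrative, not taken from the talk:

```python
import math

def np_threshold_rank(n, alpha=0.05, delta=0.05):
    """Smallest rank k (1-indexed, scores sorted ascending) such that
    thresholding at the k-th smallest of n held-out class-0 scores gives
    P(population type I error > alpha) <= delta.

    For rank k, the violation probability is bounded by the binomial tail
        sum_{j=k}^{n} C(n, j) * (1 - alpha)**j * alpha**(n - j),
    so we return the first k whose bound falls below delta.
    """
    for k in range(1, n + 1):
        violation = sum(
            math.comb(n, j) * (1 - alpha) ** j * alpha ** (n - j)
            for j in range(k, n + 1)
        )
        if violation <= delta:
            return k
    # No rank certifies the guarantee: the class-0 hold-out set is too small.
    return None
```

Given `k = np_threshold_rank(len(class0_scores))`, one would classify an observation as class 1 when its score exceeds `sorted(class0_scores)[k - 1]`. A `None` return signals that the held-out class-0 sample is too small to certify the (alpha, delta) guarantee, which is why sample splitting needs a sufficiently large class-0 sample.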
If time permits, I'll briefly introduce our recent work that links cost-sensitive (CS) learning, another paradigm for asymmetric binary classification, with the NP paradigm. I'll discuss the relative advantages of each paradigm and provide an NP interpretation for classifiers constructed under the CS paradigm.
Topic: Statistics Seminar - Jessica Li (UCLA)
Time: Aug 27, 2020 03:30 PM Central Time (US and Canada)
Meeting ID: 910 4471 0375