Distributed Inference for Degenerate U-Statistics with Application to One and Two Sample Test
Abstract
In many hypothesis testing problems such as one-sample and two-sample test problems, the test statistics are degenerate U-statistics. One of the challenges in practice is the computation of U-statistics for a large sample size. Besides, for degenerate U-statistics, the limiting distribution is a mixture of weighted chi-squares, involving the eigenvalues of the kernel of the U-statistics. As a result, it’s not straightforward to construct the rejection region based on this asymptotic distribution. In this research, we aim to reduce the computation complexity of degenerate U-statistics and propose an easy-to-calibrate test statistic by using the divide-and-conquer method. Specifically, we randomly partition the full n data points into kn even disjoint groups, and compute U-statistics on each group and combine them by averaging to get a statistic Tn. We proved that the statistic Tn has the standard normal distribution as the limiting distribution. In this way, the running time is reduced from O(n^m) to O( n^m/km_n), where m is the order of the one sample U-statistics. Besides, for a given significance level , it’s easy to construct the rejection region. We apply our method to the goodness of fit test and two-sample test. The simulation and real data analysis show that the proposed test can achieve high power and fast running time for both one and two-sample tests.