Distributed Inference for Degenerate U-Statistics with Application to One and Two Sample Test

Atta-Asiamah, Ernest
2021-03-04; 2021-03-04; 2020
https://hdl.handle.net/10365/31777

In many hypothesis testing problems, such as one-sample and two-sample tests, the test statistics are degenerate U-statistics. One practical challenge is computing U-statistics for large sample sizes. Moreover, the limiting distribution of a degenerate U-statistic is a mixture of weighted chi-squares involving the eigenvalues of the kernel of the U-statistic, so it is not straightforward to construct a rejection region from this asymptotic distribution. In this research, we aim to reduce the computational complexity of degenerate U-statistics and propose an easy-to-calibrate test statistic based on the divide-and-conquer method. Specifically, we randomly partition the full n data points into k_n equal-sized disjoint groups, compute the U-statistic on each group, and combine them by averaging to obtain a statistic T_n. We prove that T_n has the standard normal distribution as its limiting distribution. In this way, the running time is reduced from O(n^m) to O(n^m / k_n^m), where m is the order of the one-sample U-statistic. In addition, for a given significance level α, it is easy to construct the rejection region. We apply our method to the goodness-of-fit test and the two-sample test. Simulations and real data analysis show that the proposed test achieves high power and fast running time for both one-sample and two-sample tests.

NDSU policy 190.6.2
https://www.ndsu.edu/fileadmin/policy/190.pdf

Keywords: degenerate and non degenerate; divide-and-conquer; goodness-of-fit test; hypothesis testing; maximum mean discrepancy; U-statistics

Dissertation
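
The partition-and-average step described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation. The function names, the order-2 product kernel h(x, y) = xy (degenerate when the data have mean zero), and the normalization are all assumptions made for illustration. The abstract does not state how T_n is standardized, so this sketch studentizes the group average with the sample standard deviation across groups.

```python
import numpy as np

def u_statistic(x, kernel):
    """Order-2 U-statistic: average of kernel(x_i, x_j) over all pairs i != j."""
    m = len(x)
    h = kernel(x[:, None], x[None, :])  # m x m matrix of kernel values
    return (h.sum() - np.trace(h)) / (m * (m - 1))

def divide_and_conquer_stat(x, k, kernel, rng):
    """Randomly split 1-D data into k equal-sized groups and combine group U-statistics."""
    x = np.asarray(x, dtype=float)
    size = len(x) // k  # leftover points (if any) are dropped
    perm = rng.permutation(len(x))[: size * k]
    groups = x[perm].reshape(k, size)
    u = np.array([u_statistic(g, kernel) for g in groups])
    # Assumed normalization (not taken from the source): studentize the
    # average of the group statistics by their sample standard deviation.
    return np.sqrt(k) * u.mean() / u.std(ddof=1)

def product_kernel(a, b):
    """Hypothetical example kernel; degenerate when E[X] = 0."""
    return a * b

rng = np.random.default_rng(0)
x = rng.standard_normal(2000)  # simulated data with mean zero
t = divide_and_conquer_stat(x, k=40, kernel=product_kernel, rng=rng)
print(t)
```

With this order-2 kernel, each group needs about (n/k)^2 kernel evaluations rather than n^2 for the full sample, and the k groups can be processed independently.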