1. Du et al. Gradient descent provably optimizes over-parameterized neural networks. Proc Int Conf Learn Represent; 2018.
2. Shamir and Zhang. Stochastic gradient descent for non-smooth optimization: Convergence results and optimal averaging schemes. Proc Int Conf Mach Learn; 2013.
3. Menon and Ong. Linking losses for density ratio and class-probability estimation. Proc Int Conf Mach Learn; 2016.
4. Bickel et al. Discriminative learning under covariate shift. J Mach Learn Res; 2009.