Affiliation:
1. UC San Diego, USA. p1xie@ucsd.edu
2. Northeastern University, USA
3. UC Santa Cruz, USA
Abstract
In many NLP applications, source data is collected to mitigate data deficiency in a target task and to help train the target model. Existing transfer learning methods either select a subset of source examples that are close to the target domain or adapt all source examples to the target domain, and then train the target model on the selected or adapted examples. Selection incurs significant information loss, while adaptation carries the risk that source examples already in the target domain end up outside it after adaptation. To address these limitations, we propose a four-level optimization-based framework that simultaneously selects and adapts source data. Our method automatically identifies in-domain and out-of-domain source examples and applies example-specific processing: selection for in-domain examples and adaptation for out-of-domain examples. Experiments on various datasets demonstrate the effectiveness of the proposed method.