Affiliation:
1. School of Chinese Language and Literature, Nanjing Normal University, No. 122, Ninghai Road, Nanjing 210097, China
2. Center of Language Big Data and Computational Humanities, Nanjing Normal University, No. 122, Ninghai Road, Nanjing 210097, China
3. East Asian Languages and Civilizations, Harvard University, Cambridge, MA 02138, USA
Abstract
Kinship is an important topic in historical studies, and a kinship database is the key resource for analyzing the structure, succession, and evolution of families. However, a single kinship relation can be expressed by different words, and a single kinship word can be vague or ambiguous in natural language, especially in pre-modern Chinese. For example, the well-known China Biographical Database contains 484,066 kinship instances expressed with more than 400 kinship words. Thus, the relations extracted from historical texts cannot be used directly to build family networks. In this article, we propose a novel method that normalizes kinship relations using three basic relations: father–descendant, mother–descendant, and husband–wife, together with the gender of each person. All types of kinship are normalized to these three basic relations. In this way, we identified 178,390 basic kinship relations that fully describe the original 462,147 unambiguous kinship instances, while finding 3,989 inconsistencies and inferring 5,805 missing persons. We then generate 29,423 families from the basic kinship relations and analyze their properties, such as size, depth, and intermarriage across families. This type of family analysis was almost impossible before kinship relations were normalized. The technique therefore enables improved construction of family databases and deeper quantitative analysis.
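The normalization idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the term table `TERM_PATHS`, the class `KinshipGraph`, the placeholder naming (`?1`, `?2`, ...), and the toy persons A to E are all assumptions made for illustration. Each kinship word is treated as a path of steps from ego to the relative, every step is stored as one of the three basic relations, unnamed intermediate persons are inferred as placeholders, and an instance that contradicts a known father or mother is recorded as an inconsistency.

```python
from itertools import count

# Hypothetical term table: each kinship word is a path of steps from ego to the relative.
TERM_PATHS = {
    "father": ["father"],
    "son": ["son"],
    "paternal grandfather": ["father", "father"],
    "maternal grandfather": ["mother", "father"],
}

# Gender implied for the person reached by each step.
STEP_GENDER = {"father": "M", "mother": "F", "husband": "M",
               "wife": "F", "son": "M", "daughter": "F"}


class KinshipGraph:
    def __init__(self, gender):
        self.gender = dict(gender)  # person -> "M" or "F"
        self.links = {}             # (child, "father" or "mother") -> parent
        self.relations = set()      # (basic relation, first person, second person)
        self.inferred = []          # placeholders for unnamed intermediate persons
        self.conflicts = []         # instances that contradict a known parent
        self._ids = count(1)

    def _relation(self, step, current, nxt):
        """Express one step as a basic relation triple."""
        if step == "father":
            return ("father-descendant", nxt, current)
        if step == "mother":
            return ("mother-descendant", nxt, current)
        if step == "wife":
            return ("husband-wife", current, nxt)
        if step == "husband":
            return ("husband-wife", nxt, current)
        # "son" or "daughter": the relation type depends on the parent's gender.
        kind = "father-descendant" if self.gender[current] == "M" else "mother-descendant"
        return (kind, current, nxt)

    def add(self, ego, term, relative):
        """Normalize one kinship instance (ego, term, relative)."""
        path = TERM_PATHS[term]
        current = ego
        for i, step in enumerate(path):
            last = i == len(path) - 1
            known = self.links.get((current, step))
            if known is not None:
                if last and known != relative:
                    self.conflicts.append((ego, term, relative))
                    return
                nxt = known  # reuse the parent that is already recorded
            else:
                if last:
                    nxt = relative
                else:
                    nxt = f"?{next(self._ids)}"  # inferred missing person
                    self.inferred.append(nxt)
                self.gender.setdefault(nxt, STEP_GENDER[step])
                kind, first, second = rel = self._relation(step, current, nxt)
                self.relations.add(rel)
                # Remember each person's father and mother so later instances reuse them.
                if kind == "father-descendant":
                    self.links.setdefault((second, "father"), first)
                elif kind == "mother-descendant":
                    self.links.setdefault((second, "mother"), first)
            current = nxt


# Toy data: six instances about made-up persons A to E.
g = KinshipGraph({"A": "M"})
g.add("A", "father", "B")
g.add("A", "paternal grandfather", "C")  # reuses B as the intermediate father
g.add("B", "father", "C")                # redundant: already implied
g.add("C", "son", "B")                   # same relation, different word
g.add("A", "father", "D")                # contradicts B as A's father
g.add("A", "maternal grandfather", "E")  # A's mother is unnamed, so she is inferred
print(sorted(g.relations), g.inferred, g.conflicts)
```

In this toy run, several differently worded instances collapse into a smaller set of basic relations, which mirrors on a small scale how many kinship instances can be described by fewer basic relations. The sketch only checks conflicts on father and mother steps; a fuller treatment would also need to handle ambiguous terms and multiple spouses.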
Publisher
Oxford University Press (OUP)