Cameron Campbell and Bijia Chen have published a paper, “Nominative Linkage of Records of Officials in the China Government Employee Dataset-Qing (CGED-Q)”, in Historical Life Course Studies. The paper shares their experience with nominative linkage in the CGED-Q and is intended to be useful to others engaged in large-scale, automated nominative linkage (disambiguation) of individuals in historical Chinese-language sources.
While the approach that they arrived at after many iterations may be specific to the CGED-Q and its contents, the summary of the challenges will be of broader interest, and the methods should at least offer a roadmap for others with related projects. Major issues the paper documents and then addresses include the use of variant orthographies for the same character in different editions or sources, replacement of characters with ones that look similar but are entirely different characters, replacement of characters with homophones, inconsistencies in the writing of the names of counties, and changes in boundaries that led the same county to be associated with different provinces in different sources or editions.
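To make the first two issues concrete, here is a minimal Python sketch of how variant forms and look-alike characters might be tolerated when comparing two names. It is not the method used in the paper. The lookup tables and the function names (`normalize`, `chars_compatible`, `names_match`) are hypothetical and contain only a handful of illustrative entries; a real project would need much larger tables built from its own sources.

```python
# Illustrative sketch only: NOT the procedure used in the paper.
# Both tables below are tiny, hypothetical examples.

# Variant written forms mapped to one canonical form of the same character.
VARIANT_TO_CANONICAL = {
    "峯": "峰",
    "羣": "群",
    "爲": "為",
}

# Groups of distinct characters that are easily confused by eye.
LOOKALIKE_GROUPS = [
    {"己", "已", "巳"},
    {"戊", "戌", "戍"},
]


def normalize(name: str) -> str:
    """Replace each known variant form with its canonical character."""
    return "".join(VARIANT_TO_CANONICAL.get(ch, ch) for ch in name)


def chars_compatible(a: str, b: str) -> bool:
    """True if two characters are identical or belong to one look-alike group."""
    if a == b:
        return True
    return any(a in group and b in group for group in LOOKALIKE_GROUPS)


def names_match(x: str, y: str) -> bool:
    """Compare two names after normalization, tolerating look-alike characters."""
    nx, ny = normalize(x), normalize(y)
    if len(nx) != len(ny):
        return False
    return all(chars_compatible(a, b) for a, b in zip(nx, ny))


if __name__ == "__main__":
    print(names_match("王雲峯", "王雲峰"))  # variant forms of one character
    print(names_match("李克己", "李克已"))  # visually similar characters
    print(names_match("王雲峰", "王雲山"))  # genuinely different names
```

A tolerant comparison like this can only propose candidate links. Homophone substitutions would need a separate pronunciation table, and in practice name matches would be combined with other fields such as place of origin before two records are treated as the same person.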
The complete tabulations that are the basis of the tables in the paper are also available. These include the frequencies of surnames and given names in the CGED-Q JSL, and the frequencies of discordance across records of the same individual in the recording of surnames, characters in given names, and place of origin. The tabulations can be downloaded at the HKUST and Harvard Dataspaces:
Those not specifically interested in linkage may still be interested in the tabulations of surnames and characters in given names.
Footnote 21 on page 245 states that “Huguang湖廣” refers to “Hunan and Guangdong”. Ma Ziyao has written to point out that “In most cases, however, it has been a legacy term for Hubei and Hunan. The ‘guang’ here originally comes from Guangxi during the Yuan but should not be mistaken for the Qing-era Liangguang兩廣 region to the south of Hunan.” We are grateful to Ma Ziyao for bringing this to our attention.