New publication using process mining to study the careers of Qing officials in the CGED-Q JSL

Adam Burke at the Queensland University of Technology lead-authored a paper “State Snapshot Process Discovery on Career Paths of Qing Dynasty Civil Servants” that introduces a new process mining technique he calls ‘state snapshot process discovery’ and illustrates it by application to our CGED-Q JSL data on the careers of jinshi officials. Cameron Campbell is a co-author. The paper has been accepted for presentation at the 5th International Conference on Process Mining (ICPM2023), in Rome, Italy, in October 2023.

A pre-print of the paper is available at Adam’s website:

Here is a figure from the paper that summarizes the empirical reconstruction of the careers of first- and second-tier (一甲 and 二甲) jinshi in the years after they earned their degree. One of the attractions of the CGED-Q JSL for demonstrating this technique was that regulations specified canonical career pathways for such high-ranked degree holders, so it was possible to assess whether the empirical results derived from the data were consistent with those pathways. We hope that extensions of this technique, and possibly other techniques, can be used to explore the trajectories of officials with more mundane qualifications.

For this paper, Cameron Campbell helped Adam and the other collaborators (Sander Leemans and Moe T. Wynn) understand the data that we provided, advised on adjustments to accommodate undocumented or otherwise unanticipated features of the data in successive iterations, and then assisted in writing the sections on the data and the historical context, the background on social science studies of careers, and the interpretation of the results.

We are happy to collaborate with computer scientists and other researchers developing techniques for understanding careers and trajectories more generally in complex longitudinal data, who need data like the CGED-Q to showcase their approaches.

New 大数据与中国历史 with translation of Cameron Campbell’s and James Lee’s 40 year career retrospective is now available

The 4th edition of the annual 大数据与中国历史 (Big Data and Chinese History), edited by Fu Haiyan at Central China Normal University, is out now from 社会科学文献出版社 (Social Sciences Academic Press (China)). It includes a Chinese translation of my and James Lee’s career retrospective, summarizing our work over the last four decades constructing and analyzing historical population and other databases for China.

The full text is available here.

Here is a link to the volume’s page at Dangdang in case you want to order:

The English language original of our retrospective is available here:

Here is the complete reference for the Chinese language translation:

康文林 (Cameron Campbell), 李中清 (James Lee). 2023. 中国历史量化微观大数据:李中清-康文林团队40年学术回顾. In 付海晏 (Ed.), 大数据与中国历史研究, 第4辑. Beijing: 社会科学文献出版社 Social Sciences Academic Press (China), 74-114.

English version of forthcoming paper on the organizational demography of the Qing civil service

社會科學研究 (Social Science Research) published by the Sichuan Academy of Social Science has accepted our paper “The Organizational Demography of the Qing Civil Service, 1830-1911” and tentatively scheduled it for publication in 2024. In the meantime, they have given permission for us to share the English language version. Here is the PDF:

The Organizational Demography of the Qing Civil Service, 1830-1911

The paper is largely descriptive. It uses the CGED-Q JSL to measure the turnover of officials, career lengths, and years since appointment for currently serving officials. It was inspired by the older literature on organizational demography that sought to relate the performance of organizations to aggregate ‘demographic’ features such as their turnover, length of service and so forth. We hope that it will be a useful reference for anyone studying Qing officialdom. Previous studies of the dynamics of Qing officialdom have focused on the lengths of appointments to specific posts, and turnover in those posts, rather than entire careers.
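
For readers who want a concrete sense of what an exit rate from longitudinally linked records looks like, here is a minimal Python sketch. It is not the paper's procedure. It assumes each quarterly edition has already been reduced to a set of linked person identifiers, and it counts an official as exiting after an edition only if he never appears in any later edition. The function name `exit_rates` and the toy identifiers are made up for illustration.

```python
def exit_rates(editions: list[set[str]]) -> list[float]:
    """Share of officials in each edition who appear in no later edition.

    `editions` is a chronologically ordered list of sets of person IDs
    (hypothetical input format). The last edition is skipped because exits
    after it cannot be observed.
    """
    # Record the index of the last edition in which each person appears.
    last_seen: dict[str, int] = {}
    for i, edition in enumerate(editions):
        for person_id in edition:
            last_seen[person_id] = i

    rates = []
    for i, edition in enumerate(editions[:-1]):
        exits = sum(1 for person_id in edition if last_seen[person_id] == i)
        rates.append(exits / len(edition) if edition else float("nan"))
    return rates


# Toy data: "b" is missing from the third edition but returns in the fourth,
# so he is not counted as an exit.
editions = [{"a", "b", "c", "d"}, {"a", "b", "c"}, {"a", "c"}, {"a", "b"}]
print(exit_rates(editions))
```

Defining exit by the last observed appearance, rather than by absence from the very next edition, keeps a temporary gap in the records from being counted as leaving service. Whether that is the right choice depends on the data.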

Here is the abstract:

We study the organizational demography of the Qing civil service from 1830 to 1911. Before the 20th century, the Qing bureaucracy was one of the largest non-military organizations in the world in terms of numbers of regular employees. At any given time, approximately 13,000 officials held formal appointments. We present the basic features of its organizational demography using data on nearly all civil officials with formal appointments from 1830 to 1912. We make use of longitudinally linked records of officials in the China Government Employee Database – Jinshenlu (CGED-Q JSL) to reconstruct rates of exit from service, the career lengths of officials, and the number of years since first appointment for currently serving officials. While previous studies of the Qing have examined turnover in specific types of posts, they have not considered the dynamics of complete careers. We find that exit rates in the first year of service were high and then low and stable afterward. While most officials only served for a short time, currently serving officials were relatively experienced. We also show that rates of exit from service declined for much of the last half of the 19th century, and then increased in the first decade of the 20th century. Declining turnover in the last half of the 19th century would have reduced opportunities for degree holders seeking posts and for officials seeking promotion at a time when the number of holders of purchased degrees competing for posts was increasing. We also compare different categories of officials. The results not only illuminate basic features of the organizational demography of Qing officialdom, but also provide a baseline for interpreting results from case studies of specific groups of officials or specific time periods.

Here’s a figure from the paper, presenting time trends in rates of exit from service in the next three months for officials with different amounts of experience:

Chinese Translation of Campbell’s and Lee’s Historical Chinese Microdata: 40 Years of Dataset Construction by the Lee-Campbell Research Group

A Chinese translation of Cameron Campbell’s and James Lee’s Historical Life Course Studies paper “Historical Chinese Microdata. 40 Years of Dataset Construction by the Lee-Campbell Research Group” is forthcoming in a volume of Big Data and the Study of Chinese History 大数据与中国历史研究. The title of the translation is 中国历史量化微观大数据李中清康文林团队40年学术回顾. This paper reviews all of our projects since 1979, including construction of datasets and the study of topics in population, family, and social mobility. Pending the appearance of the volume, we are making a PDF of the translation available.

Here is the PDF of the Chinese translation of Historical Chinese Microdata. 40 Years of Dataset Construction by the Lee-Campbell Research Group.

Here is the English language original, in case you missed it.


New paper on nominative linkage in the CGED-Q in Historical Life Course Studies

Cameron Campbell and Bijia Chen published a paper “Nominative Linkage of Records of Officials in the China Government Employee Dataset-Qing (CGED-Q)” in Historical Life Course Studies. It shares their experience with nominative linkage in the CGED-Q. It is intended to be useful to others who are engaged in large-scale, automated nominative linkage (disambiguation) of individuals in historical Chinese-language sources.

While the approach that they arrived at after many iterations may be specific to the CGED-Q and its contents, the summary of the challenges will be of broader interest, and the methods should at least be a roadmap for others with related projects. Major issues the paper documents and then addresses include the use of variant orthographies for the same character in different editions or sources, replacement of characters with ones that look similar but are actually completely different, replacement of characters with homophones, inconsistencies in the writing of the names of counties, and changes in boundaries that led the same county to be associated with different provinces in different sources or editions.
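
To make the variant-character problem concrete, here is a minimal Python sketch of one common way to handle it: map each character to a canonical form before comparing records. This is only an illustration, not the procedure in the paper. The variant table, the record fields (`surname`, `given_name`, `county`) and the sample records are all made up for the example.

```python
# Illustrative variant table (hand-made for this example, not from the paper).
# Each key is a variant form; each value is the canonical form used for matching.
VARIANTS = {
    "峯": "峰",
    "羣": "群",
}


def normalize(text: str) -> str:
    """Replace known variant characters; leave all other characters unchanged."""
    return "".join(VARIANTS.get(ch, ch) for ch in text)


def is_candidate_match(rec_a: dict, rec_b: dict) -> bool:
    """Flag two records as a candidate match if surname, given name and
    county of origin all agree after normalization."""
    fields = ("surname", "given_name", "county")
    return all(normalize(rec_a[f]) == normalize(rec_b[f]) for f in fields)


# Made-up records: the same name written with two forms of one character.
rec_1 = {"surname": "王", "given_name": "秀峯", "county": "大興"}
rec_2 = {"surname": "王", "given_name": "秀峰", "county": "大興"}
print(is_candidate_match(rec_1, rec_2))
```

A lookup table like this only covers substitutions that someone has already catalogued. Homophones, look-alike characters and changes in county boundaries each need their own handling.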

The complete tabulations that are the basis of the tables in the paper are also available. These include the frequencies of surnames and given names in the CGED-Q JSL, and the frequencies of discordance across records of the same individual in the recording of surnames, characters in given names, and place of origin. The tabulations can be downloaded at the HKUST and Harvard Dataspaces:

Those not specifically interested in linkage may still be interested in the tabulations of surnames and characters in given names.


Footnote 21 on page 245 states that “Huguang湖廣” refers to “Hunan and Guangdong”. Ma Ziyao has written to point out that “In most cases, however, it has been a legacy term for Hubei and Hunan. The “guang” here originally comes from Guangxi during the Yuan but should not be mistaken for the Qing-era Liangguang兩廣 region to the south of Hunan.” We are grateful to Ma Ziyao for bringing this to our attention.

New paper recapping 40 years of ‘big data’ research on population, family and social history by James Lee, Cameron Campbell and collaborators

James Lee and Cameron Campbell in Daoyi in 1987

Cameron Campbell and James Lee just published an article on the contributions of the Lee-Campbell Research Group to a new scholarship of discovery in Chinese population history, family history, and socio-economic history based on our construction and analysis from 1979 to 2020 of large historical datasets from largely archival records. Our paper first introduces these datasets, then describes our joint research, and concludes with a summary of our major analytic results. This publication is an invited contribution to a special issue of Historical Life Course Studies which introduces the major historical population databases. Papers on the Quebec BALSAC project and the Historical Sample of the Netherlands are already posted.

Here is the web page:

This is both a career retrospective and a comprehensive summary of everything that James, Cameron and the Lee-Campbell Group have done together over more than four decades. It ties everything together and shows how each project led to the next. It starts from our early efforts in population history using household registers, and proceeds sequentially up to the present day, including our new projects on university students, civil officials, and educated professionals.

In front of the No. 1 Historical Archives in Beijing in 1987

The section on the history of our collaboration will hopefully be the most readable: it starts with James Lee’s visit to China in 1979 to look for records in historical archives that could be turned into databases, then Cameron shows up in 1987 at the end of his sophomore year at Caltech. Later others joined to form what is now the Lee-Campbell Group. We also talk about our involvement in the Eurasia Project in Population and Family History, including what we hope will be interesting anecdotes, reminiscences, and reflections.

The introduction to our databases and summary of results, meanwhile, is the first time we have put almost everything we have done together in one place. We hope that it will help those who are familiar with specific pieces of our work gain a better sense of the larger research agenda into which these pieces fit.

This was a fun paper to write, especially the history section which includes some discussion of our faculty years at Caltech 1982-2002, UCLA 1996-2015, Michigan 1980-1982, 1995-1996, 2002-2009 and most recently HKUST 2009/2013-onwards and the contributions of these institutions and our colleagues to advancing our research projects.

New paper by Cameron Campbell on exam degree holders at the end of the Qing

Cameron Campbell recently published a paper titled 清末科举停废对士人文官群体的影响——基于微观大数据的宏观新视角 in 社会科学辑刊 on the appointment and subsequent careers of exam degree holders and the overall composition of officialdom between 1900 and 1912, that is, at the end of the Qing, before and after the abolition of the examination system in 1905. By analyzing quarterly CGED-Q data on civil officials between 1900 and 1912 linked to rosters of jinshi 进士 and juren 举人 degree recipients for specific exam years, the paper shows that annual chances of appointment of men who already held degrees actually increased after 1905, presumably because they were no longer competing with newly-minted exam degree holders. The number of serving officials who held exam degrees remained stable after the abolition of the exams and their turnover rates remained unchanged. The share of central government officials who were exam degree holders declined mainly because there was an expansion in the total number of officials, driven by officials who held other kinds of degrees. Increases in the numbers and share of officials who held purchased degrees were especially notable. The main takeaway is that the abolition of the examination system had little apparent effect on those who already had degrees. They continued to be appointed at roughly the same rate, and those who had appointments had roughly the same level of turnover as before.

Campbell wrote the first draft of the paper in Chinese in summer 2018. Yuying Shen, Ting Wang and especially Bijia Chen then edited the text substantially.

Here’s the Chinese abstract:

围绕清末科举停废及新政时期官员任命和晋升政策调整这一历史背景,采用新的微观大数据的分析方法,力图为观察清末新政前后清政府文官系统的变化提供新的视角。对1900年以来十余年间清代文官数据的分析,呈现出新政前后文官在人数、组成比例等方面的动态趋势。首先,根据官员出身,分析进士、举人及贡生等群体在整体文官系统中的比例及随时间变化的趋势。新政时期进士出身的官员群体未见受政策调整的影响,京师与地方进士官员的人数、官职分布均相对稳定,不同科年进士的任职机会大体相近。其次,虽然举人与贡生在地方官员中所占人数未变,且不同科年举人的就职机会亦未出现明显变化,但其在京师却显示巨大的变化。随着1907年后京师官员人数的增长,京师举人与贡生官员人数有相当明显的增长,且官职分布也有变化,如小京官所占比例有显著增加。最后,监生与捐纳贡生呈现出与进士和举人不同的另一种模式。1907年后监生与捐纳贡生人数增长了,但是分布的变化与举人和贡生的变化不同。

Rough English translation:

With the abolition of the examination system and the reform of appointment and promotion of officials during the New Government period as a backdrop, this paper offers a new perspective on the changes in Qing government officialdom before and after the New Government period. Analysis of data on officials in the 12 years after 1900 reveals trends and patterns in the number, composition, and other characteristics of officials. First, based on the qualifications of officials, it analyzes time trends in the shares of officials who were jinshi, juren, gongsheng or other degree holders. During the New Government period, jinshi officials were not affected by the adjustment of policies: the number of officials with jinshi degrees remained stable, as did the distribution of positions they held, and the chances of appointment for jinshi from different sittings of the exam were stable. Second, even though the share of local officials who were juren and gongsheng did not change, and there was little change in the chances of appointment for holders of juren degrees, there was a large change in the capital. After the numbers of officials serving in the capital began to increase in 1907, there was an increase in the numbers of juren and gongsheng serving there, and there was a change in the types of positions they held, so that for example there was an increase in the share of ‘minor capital officials’. Finally, jiansheng and purchased gongsheng had very different trends from jinshi and juren. The numbers of jiansheng and purchased gongsheng increased after 1907, but the changes in their distribution differed from those for juren and gongsheng.

Reference: 康文林 (Cameron Campbell). 2020. 清末科举停废对士人文官群体的影响——基于微观大数据的宏观新视角 (The Influence of the Abolition of the Examinations at the End of the Qing on the Holders of Exam Degrees). 社会科学辑刊 (Social Science Journal) 2020:4(249):156–166.

English language paper introducing the CGED-Q published in the Journal of Chinese History

Our paper providing an introduction in English to the China Government Employee Dataset-Qing (CGED-Q) is now available at the Journal of Chinese History. The paper is lead-authored by Bijia Chen and is based on the second chapter of her PhD dissertation, which she defended in 2019. The paper will appear in the July 2020 issue.

Here is the abstract:

We introduce the China Government Employee Database—Qing (CGED-Q), a new resource for the quantitative study of Qing officialdom. The CGED-Q details the backgrounds, characteristics and careers of Qing officials who served between 1760 and 1912, with nearly complete coverage of officials serving after 1830. We draw information on careers from the Roster of Government Personnel (jinshenlu), which in each quarterly edition listed approximately 12,500 regular civil offices and their holders in the central government and the provinces. Information about backgrounds and characteristics comes from such linked sources as lists of exam degree holders. In some years, information on military officials is also available. As of February 2020, the CGED-Q comprises 3,817,219 records, of which 3,354,897 are civil offices and the remainder are military. In this article we review the progress and prospects of the project, introduce the sources, transcription procedures, and constructed variables, and provide examples of results to showcase its potential.

Bijia Chen is now a postdoc at the Renmin University Institute of Qing History.

For more information about the CGED-Q, please see the CGED-Q project page.


Page 2 – Footnote 1 – line 10 – Zhenan should be Zhinan

Page 3 – second line of the paragraph after the heading ‘Origin, current status, and future plans…’ – ‘ongoing of study’ should be ‘ongoing study’

Page 3 – Footnote 6 – line 2 – Lishi Yanjiu should be Qingshi Yanjiu

Page 8 – Footnote 22 – line 1 – Jizhi should be Jiazhi

Page 8 – Footnote 26 – line 1 – Jijie should be Ji


Paper on assortative marriage in rural Shanxi during the mid-20th century published in Research in Social Stratification and Mobility

Our paper “Education, class and assortative marriage in rural Shanxi, China in the mid-twentieth century” has been accepted at Research in Social Stratification and Mobility. The paper is lead-authored by XING Long at Shanxi University and co-authored by group members Cameron Campbell, Xiangning Li, Matthew Noellert and James Lee. A pre-print is now available open access at the site:

This is something we have been working on for a while and it is great to see it coming out. More information about the larger project and the data is available here.

Here is the abstract:
This paper examines the consequences of political, economic and social change in mid-twentieth century China for patterns of assortative mating by both education and class. Traditionally in China, marriages were arranged by parents, and ideally matched families of similar socioeconomic status. However, the Marriage Law passed by the People’s Republic of China in 1950 promoted free choice and forbade arranged marriage and other interference by families in the marriage decisions of their children. Later, Land Reform, Collectivization and other movements had profound impacts on rural household organization and social relations. We investigate their effects on assortative mating by using novel linked administrative data compiled in rural Shanxi Province in North China in the mid-1960s. These data record the education and family class labels (jiating chushen) of spouses for 1459 couples in 30 villages. The class labels were assigned in the 1950s based on family landholding before the Land Reform and became hereditary. We find that class label had effects above and beyond those of education, suggesting that assortative mating studies that only consider education overlook an important dimension of social status in marriage patterns, and thereby overstate the overall permeability of boundaries between social groups. Furthermore, by comparing couples according to whether they married before or after 1949, we find that patterns of homogamy and hypergamy remained highly stable in the face of substantial social transformation after 1949.