Data

I have used a variety of population-level survey data and online text corpora in my research and teaching. I share links to these sources and tools here, along with my published papers that use each dataset.

Social Media Text Data

  1. Reddit monthly-scraped data: Project Arctic Shift by ArthurHeitmann.

  2. Twitter/X historical and longitudinal data: Twitter Archiving Project.

  3. Social Media Archive at ICPSR.

U.S. Data

  1. Digest of Education Statistics, National Center for Education Statistics. (Buchmann, Dwyer, and Yao 2024)

  2. National Postsecondary Student Aid Study (NPSAS), National Center for Education Statistics. (Buchmann, Dwyer, and Yao 2024)

  3. Panel Study of Income Dynamics (PSID), Institute for Social Research at the University of Michigan. (Zheng, Lu, and Yao 2024)

  4. General Social Survey (GSS), NORC at the University of Chicago.

  5. Study on Collegiate Financial Wellness (SCFW), Center for the Study of Student Life at The Ohio State University. (Yao, Rehr, and Regan 2024)

Cross-National Data

  1. Programme for International Student Assessment (PISA), OECD.

  2. World Values Survey (WVS).

  3. East Asian Social Survey (EASS).

China Data

  1. Chinese General Social Survey (CGSS), National Survey Research Center at Renmin University of China. (Yao and Han 2024; Downey, Yao, and Merry 2024)

  2. China Education Panel Survey (CEPS), National Survey Research Center at Renmin University of China.

  3. China Family Panel Studies (CFPS), Institute of Social Science Survey at Peking University.

  4. Beijing College Students Panel Survey (BCSPS), National Survey Research Center at Renmin University of China. (Yao 2023)