Stanford Seminar - Replication strategies for more robust human simulation

Using LLMs for Social Scientific Research

  • LLMs can advance social scientific inquiry and simulate human behavior.
  • LLMs can be used to understand how people make decisions, interact with each other, and form opinions.
  • Using LLMs for social scientific research poses challenges, such as sampling bias and variation in model outputs.
  • Appropriate interfaces and standards are needed for using LLMs in social scientific research.

Validity and Reproducibility of LLM-Generated Findings

  • There are concerns about the validity and reproducibility of social science findings generated using LLMs.
  • Some studies embrace transparency and reproducibility by providing prompting materials and input data.
  • Open-source models and data are advocated to understand biases and ensure reproducibility.
  • The distinct threats to reproducing social science with AI models need to be assessed.
  • Prior work focused on estimating bias and sampling problems in other new research settings.

Threats to Robust Social Scientific Replication and Simulations

  • LLM-specific threats:
    • Prompt sensitivity: Idiosyncrasies in crafting prompts affect generalizability.
    • Stochasticity: Inherent randomness impacts consistency and reliability.
    • Memorization: Reproducing artifacts of training data leads to biased simulations.
  • Sensitivity probes:
    • Perturbation: Observing effects of small changes to prompts or parameters.
    • Data augmentation: Assessing sensitivity to variations in input data.
    • Model comparison: Comparing results across different LLMs or datasets.

Perturbation and Iteration

  • Perturbation: Systematically varying prompts and settings to assess sensitivity.
    • Dimensions of perturbation: study protocol, settings, prompting strategies, model version.
  • Iteration: Drawing multiple samples to understand distributional characteristics.
  • Re-replication: Replicating existing replications to assess consistency.
  • Perturbation and iteration can be combined to understand the sampling distribution of perturbed results.
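
One way to picture combining perturbation and iteration is the sketch below. It is a minimal illustration, not the speaker's implementation: `simulate_choice` is a hypothetical stand-in for a real LLM call, the prompt variants and temperatures are made-up perturbation dimensions, and the response probabilities are arbitrary so the code runs offline.

```python
import itertools
import random
import statistics

# Hypothetical perturbation dimensions; names and values are illustrative only.
PROMPT_VARIANTS = ["original wording", "paraphrase A", "paraphrase B"]
TEMPERATURES = [0.2, 0.7, 1.0]


def simulate_choice(prompt: str, temperature: float, rng: random.Random) -> int:
    """Stand-in for one LLM call; returns 1 if the simulated participant picks option A.

    A real study would send `prompt` to a model here. This stub only fakes
    a response so the sketch runs without any model.
    """
    # Arbitrary made-up response probability that depends on the configuration.
    base = 0.5 + 0.1 * (len(prompt) % 3) - 0.1 * temperature
    return 1 if rng.random() < base else 0


def run_condition(prompt, temperature, n_samples, seed):
    """Iteration: draw n_samples responses for one perturbed configuration."""
    rng = random.Random(seed)
    draws = [simulate_choice(prompt, temperature, rng) for _ in range(n_samples)]
    p = statistics.mean(draws)
    se = (p * (1 - p) / n_samples) ** 0.5  # standard error of a proportion
    return p, se


def perturb_and_iterate(n_samples=200, seed=0):
    """Perturbation: repeat the iteration step over every configuration."""
    results = {}
    grid = itertools.product(PROMPT_VARIANTS, TEMPERATURES)
    for i, (prompt, temp) in enumerate(grid):
        results[(prompt, temp)] = run_condition(prompt, temp, n_samples, seed + i)
    return results


results = perturb_and_iterate()
estimates = [p for p, _ in results.values()]
print(f"configurations: {len(results)}")
print(f"estimate range across perturbations: {min(estimates):.2f} to {max(estimates):.2f}")
```

Each configuration yields a point estimate with its own sampling error, and the spread of estimates across configurations approximates how much the result depends on choices that a single replication would have fixed arbitrarily.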

Replication and Re-replication in Scientific Research

  • Replication: Repeating a study to confirm or refute the original findings.
  • Re-replication: Replicating an existing replication to assess consistency.
  • Re-replication is not as common in social science as replication and meta-analysis.
  • Implications of replications and re-replications in social science.

Example: Overhead Aversion in Donations to Charities

  • Original study by Gneezy et al. (2014) on overhead aversion in donations.
  • Simulation study using a language model to replicate the original study.
  • The model exhibited overhead aversion, but to a more extreme degree than human participants.
  • Questions about whether the model's behavior is a compelling replication.

Probing and Exploring the Space of Settings

  • Study using a language model to simulate a social science experiment.
  • Model's choices compared to human data from the original study.
  • Model's overall patterns resemble human data, but point estimates are extreme.
  • Perturbing prompting and settings produces substantial variations in model output.
  • Probing and exploring settings reveal important information about result sensitivity.
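
To make "result sensitivity" concrete, the sketch below summarizes which perturbation dimension moves the estimate most. It is an assumption-laden illustration: the function `sensitivity_by_dimension`, the dimension names, and the toy numbers are all hypothetical and are not results from the talk.

```python
from collections import defaultdict
from statistics import mean


def sensitivity_by_dimension(estimates, dim_names):
    """For each dimension, return the range of marginal means across its levels.

    `estimates` maps a configuration tuple (one level per dimension) to a
    point estimate, e.g. the share of simulated participants choosing option A.
    """
    spreads = {}
    for idx, name in enumerate(dim_names):
        by_level = defaultdict(list)
        for config, value in estimates.items():
            by_level[config[idx]].append(value)
        marginal = [mean(vals) for vals in by_level.values()]
        spreads[name] = max(marginal) - min(marginal)
    return spreads


# Placeholder numbers for illustration only; they are not results from the talk.
toy_estimates = {
    ("wording 1", "low temp"): 0.60,
    ("wording 1", "high temp"): 0.62,
    ("wording 2", "low temp"): 0.80,
    ("wording 2", "high temp"): 0.84,
}
spreads = sensitivity_by_dimension(toy_estimates, ["prompt wording", "temperature"])
for name, spread in sorted(spreads.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {spread:.2f}")
```

A large spread for one dimension suggests the finding hinges on that setting, which is worth reporting alongside the headline estimate.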
