Hallucination Detection
When and why does AI-assisted writing fabricate citations in submitted and accepted academic papers?
Research Questions
Are hallucinated citations more likely when an author cites outside their primary domain of expertise?
Do fields with faster publication cycles produce higher hallucination rates than slower-moving fields?
Do hallucinated citations cluster in particular sections of a paper (Related Work vs. Methods vs. Discussion)?
Pipeline
The project uses a four-phase pipeline, part fully automated and part semi-automated, applied to a stratified random sample of ~300–400 papers drawn from fields with differing publication velocities.
1. Stratified sampling and PDF acquisition across target venues. Semi-automated with seeded random sampling for reproducibility.
2. Citation extraction from PDFs; CrossRef verification; GPTZero AI-content scoring. Fully automated.
3. Manual coding of author expertise and citation characteristics via an interactive interface. ~20 min/paper.
4. Hypothesis testing and visualization. Logistic regression on citation-level data; mixed models accounting for paper-level clustering.
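As an illustration of the seeded stratified sampling in the first phase, here is a minimal Python sketch. It is not the project's code: the function name `stratified_sample`, the record fields `id` and `field`, the stratum labels, the per-stratum size, and the seed value are all hypothetical and chosen only for the example.

```python
import random
from collections import defaultdict


def stratified_sample(papers, stratum_key, per_stratum, seed):
    """Draw up to `per_stratum` papers from each stratum, reproducibly."""
    rng = random.Random(seed)  # local RNG, so global random state is untouched

    # Group the candidate papers by stratum (e.g. field or venue).
    strata = defaultdict(list)
    for paper in papers:
        strata[paper[stratum_key]].append(paper)

    sample = []
    # Visit strata and papers in a fixed order so that the same seed
    # yields the same draw regardless of the input ordering.
    for name in sorted(strata):
        pool = sorted(strata[name], key=lambda p: p["id"])
        k = min(per_stratum, len(pool))
        sample.extend(rng.sample(pool, k))
    return sample


# Hypothetical candidate pool: two strata with 50 papers each.
candidates = [
    {"id": f"{field}-{i:03d}", "field": field}
    for field in ("fast_field", "slow_field")
    for i in range(50)
]

picked = stratified_sample(candidates, "field", per_stratum=5, seed=2024)
for paper in picked:
    print(paper["id"], paper["field"])
```

Sorting both the strata and the papers within each stratum before drawing means the sample depends only on the seed and the set of candidates, not on the order in which records were collected. Recording the seed alongside the sample is what would make the draw repeatable.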
Stack
Pre-print forthcoming. Citation details will be added upon publication.