Focus on AI: AI Has Joined the Chat
Monday, 06/29/2026
Last Updated: Monday, 06/29/2026
By Noor Hindi
University of Michigan School of Information faculty and PhD students are advancing the field of artificial intelligence through innovative research and impactful contributions. Here are some of their recent publications.
Publications
Patient Perceptions and preferences for the disclosure of artificial intelligence generated draft replies to electronic messages – A qualitative study
International Journal of Medical Informatics, September 2026
Philip D Barrison, Jodyn Platt, Mark S Ackerman, Charles P Friedman, Alexandra H Vinson
Background: Since their public release, artificial intelligence-generated draft replies (GDRs) to patient portal messages have been rapidly adopted across healthcare systems. Concurrently, debate has arisen regarding the extent to which GDRs should be disclosed to patients. This qualitative interview study, using vignettes, examines adult patients’ reactions to and preferences regarding written AI disclosure statements.
Methods: Semi-structured interviews were conducted with 30 adult portal users with no prior experience with GDRs. Participants were recruited from a patient research registry at a large academic health system. Eligible participants were stratified by age and educational attainment and randomly sampled. Participants shared their initial interpretations, reactions, and preferences regarding GDRs with written disclosure statements, using four case-based vignettes. Interview data were analyzed through an interpretivist epistemology and a systematic process of inductive code and category generation and synthesis.
Results: Patient interpretations of disclosure statements suggested that written disclosure statements did not provide sufficient information to clarify how AI was used, with nearly half of participants unable to identify AI as the draft’s writer. GDR disclosure statements elicited participant concerns about error, authorship, and depersonalization. Despite these concerns, most participants expressed a preference for disclosure practices, citing a perceived right to information, agency, and autonomy in their healthcare. Those participants who expressed the most concerns when presented with written disclosure statements argued for preemptive disclosure methods that facilitate education and trust building.
Conclusion: As healthcare systems establish best practices for transparency and disclosure of GDRs, accounting for evolving patient perspectives on AI will be crucial to maintaining patient trust and confidence. The reactions and preferences for disclosure methods outlined by participants in this study suggest that careful deliberation should be given not to whether to disclose, but how to disclose, as healthcare systems balance tensions between AI transparency and patient trust.
Real or Robotic? Assessing Whether LLMs Accurately Simulate Qualities of Human Responses in Dialogue
The 64th Annual Meeting of the Association for Computational Linguistics, July 2026
Jonathan Ivey, Shivani Kumar, Jiayu Liu, Hua Shen, Sushrita Rakshit, Rohan Raju, Haotian Zhang, Aparna Ananthasubramaniam, Junghwan Kim, Bowen Yi, Dustin Wright, Abraham Israeli, Anders Giovanni Møller, Lechen Zhang, David Jurgens
Studying and building datasets for dialogue tasks is both expensive and time-consuming due to the need to recruit, train, and collect data from study participants. In response, much recent work has sought to use large language models (LLMs) to simulate both human-human and human-LLM interactions, as they have been shown to generate convincingly human-like text in many settings. However, to what extent do LLM-based simulations actually reflect human dialogues? In this work, we answer this question by generating a large-scale dataset of 100,000 paired LLM-LLM and human-LLM dialogues from the WildChat dataset and quantifying how well the LLM simulations align with their human counterparts. Overall, we find relatively low alignment between simulations and human interactions, demonstrating a systematic divergence along multiple textual properties, including style and content. Further, in comparisons of English, Chinese, and Russian dialogues, we find that models perform similarly. Our results suggest that LLMs generally perform better when the human themself writes in a way that is more similar to the LLM's own style.
Caring for the Furry Friends in the Smart Home: An Initial Exploration of a Child-Centered Approach to Designing for Pets
IDC '26: Proceedings of the 25th Annual ACM Interaction Design and Children Conference, June 2026
Jade Xiaoyi Li, Jason Yip, Katie Davis, Florian Schaub, Christopher Brooks, Jenny Radesky, Kaiwen Sun
Smart home technologies are often designed to meet the needs of adults, yet children and pets also live with these systems without being meaningfully considered in their design. Child-Computer Interaction (CCI) researchers have shown the value of studying children's experiences and ideation of technologies used in the domestic space. In this pictorial, we explore experiences at the intersection of children, pets, and smart home technologies by analyzing data from an in-home study with 6-to-11-year-olds. Our analysis identifies five themes in how children perceive smart home technologies in the context of pet care: convenience, presence, physical comfort, emotional wellbeing, and responsibility. Grounded in children's everyday routines of playing with and looking after their pets, this work offers design directions for domestic technologies that account for non-human household members.
Designing Workbook Probes for Families: A Smart Home Case Study of Intergenerational Co-Speculation
IDC '26: Proceedings of the 25th Annual ACM Interaction Design and Children Conference, June 2026
Kaiwen Sun, Jade Xiaoyi Li, Irene Chung, Jenny Radesky, Jason Yip, Christopher Brooks, Florian Schaub
Smart home technologies increasingly shape family life, yet research lacks methods helping children and parents jointly articulate experiences and imagine alternatives. We present a family-centered design workbook probe for intergenerational co-speculation about smart home futures, along with a six-step design process model that translates research evidence into narrative scenarios, child-friendly illustrations, and collaborative activities. We deployed the workbook for 2–3 months with nine families (children 6–11), and triangulated our design rationale with families’ completed workbook and interviews. We distill five methodological lessons for designing workbooks that sustain shared participation and co-ideation about otherwise invisible domestic technologies: tangible, unplugged materiality; curated-yet-universal scenarios; fictional narratives that create a safe third space; bounded prompts that support perspective-taking; and a relational deployment infrastructure supporting family coordination. We provide transferable guidance for HCI/CCI researchers, framing workbooks as participation scaffolds that balance adult-child dynamics and center family voices in domestic technology research.
Queer Zineographies: Materializing Tactics for Resisting AI and Data Systems
DIS '26: Proceedings of the 2026 Designing Interactive Systems Conference, June 2026
Alexandra Teixeira Riggs, Louie Søs Meyer, Molly O'Reilly-Kime, Tommaso Armstrong, Kay Kender, Ekat Osipova, Anh-Ton Tran, Jordan Taylor, Annabel Rothschild, Imke Grabe, Irene Kaklopoulou, Caitlin Lustig, Sonja Rattay, Liza Shkirando, Fe Simeoni, Grace Leonora Turtle, Ann Light, Carl DiSalvo, Oliver L. Haimson
As AI and data systems often falter when encountering queer identities and knowledge, reinforcing existing oppressions, queer people have resisted such systems and their normalizing tendencies. This pictorial explores tactics of queering AI through a collaborative zine-making project (i.e. zineography) that challenges generative AI and data systems. We share how we workshopped and materialized queering tactics in zine spreads; analyzed these spreads according to materials, content, and tone; and visualized our analysis as thematic collages. We contribute: (1) tangible characteristics of queering AI and data systems (i.e. materials, tones, and aesthetics); and (2) design opportunities for using zineographies as a radical method for building and collectively sharing knowledge about a marginalized community, including recommendations for enacting queer zineographies. By materializing queering tactics through zine-making, we invite embodied, action-oriented critiques that question dominant techno-solutionist movements and trace queer possibilities outside of their normalizing narratives.
Algorithm Auditing Policies Rest on Flawed Assumptions About Public Sector Systems
FAccT '26: The 2026 ACM Conference on Fairness, Accountability, and Transparency, June 2026
Nel Escher, Nikola Banovic, Ben Green
Acknowledging that automated decision-making systems can generate significant societal harms, policymakers have proposed regulations that require audits of algorithms. However, although these policies are popular, scholars have warned that algorithm audits can provide a veneer of accountability without meaningful impact. In this paper, we evaluate the emerging landscape of algorithm auditing policies. First, we analyze 25 United States policies that require audits of public sector algorithms to distill their core assumptions. Second, we test those assumptions against a representative case of public sector algorithms gone wrong: the Michigan Integrated Data Automated System (MiDAS). We find that auditing policies rely on flawed assumptions that limit their ability to identify and address algorithmic harms. While audit policies conceptualize harm as discriminatory bias, government algorithms can produce a much broader range of issues, including misallocation of public resources and violations of due process. While audit policies investigate algorithms as static and isolated pieces of software, these systems are dynamic and embedded in complex relationships with other technical systems and human practices. While audit policies envision that problems uncovered by audits will be corrected, many public agencies lack the capacity to fix or replace automated systems. In light of these flawed assumptions, we provide recommendations to improve audit policies and the broader governance of public sector algorithms.
Measuring Simulation Fidelity via Statistical Detectability: A Diagnostic Framework for AI-Generated Tutoring Conversations
L@S '26: Proceedings of the Thirteenth ACM Conference on Learning @ Scale, June 2026
Michael Ion, Kevyn Collins-Thompson
Large-scale, realistic simulations of learning interactions powered by large language models (LLMs) have the potential for significant impact in educational research and practice at scale: from allowing fast, inexpensive, safe evaluation, and pretesting of AI-assisted systems, to scalable practice sessions in teacher preparation programs. However, if synthetic conversations do not preserve essential statistical properties of real interactions – especially those related to cognitive aspects of teaching and learning – conclusions drawn from them may fail to generalize to authentic settings. Basic methods to estimate the fidelity of synthetic data, such as nonparametric tests of distributional similarity or comparing marginal distributions for individual features, lack detailed diagnostic ability and interpretability, especially when real data are characterized by complex feature interactions. We propose measuring synthetic conversation quality in terms of interpretable statistical detectability, using as a starting point recent progress in statistics developing the propensity score mean squared error (pMSE) ratio metric, originally introduced for synthetic tabular data validation. We show how to adapt the pMSE approach to conversation data by first developing a feature extraction workflow that maps variable-length natural language dialogues to a rich representation vector capturing both surface patterns (message length, vocabulary, turn structure) and cognitive dynamics (confusion duration, resolution patterns, hint sensitivity). By sampling from a reference dataset of authentic online tutoring dialogues, we create and evaluate a series of increasingly sophisticated synthetic conversational datasets generated by iteratively improved prompt strategies with a state-of-the-art LLM.
We show that our fidelity assessment framework is effective at detecting real vs synthetic differences not only between marginal distributions of surface features (e.g., mean and variance of tutor message lengths), but also joint distributions of cognitive-related features (e.g., confusion rate × confusion duration). These pMSE-based measures also effectively replicate the sophistication levels of our prompt strategies. However, even the most advanced prompt strategy (V4) produces a normalized pMSE of 50.6% (lower is better, zero is ideal), showing that substantial potential for improvement still exists for LLM-based generation of synthetic tutoring conversations.
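For readers unfamiliar with the metric, the sketch below shows one common form of pMSE from the synthetic-data literature: pool real and synthetic records, have a classifier estimate each record's probability of being synthetic, and average the squared distance of those probabilities from the synthetic share c. This is not the authors' code. The function names are hypothetical, and dividing by c(1 - c), the value reached when a classifier separates the two sets perfectly, is an assumed normalization that may differ from the one used in the paper.

```python
def pmse(propensities, is_synthetic):
    """Mean squared distance of propensity scores from the synthetic share c.

    propensities: classifier-estimated P(record is synthetic), one per record.
    is_synthetic: 1 for synthetic records, 0 for real ones (same order).
    """
    n = len(propensities)
    c = sum(is_synthetic) / n  # share of synthetic records in the pooled data
    return sum((p - c) ** 2 for p in propensities) / n


def normalized_pmse(propensities, is_synthetic):
    """pMSE scaled by c * (1 - c), its value under perfect separation.

    Assumption: this scaling is illustrative only; the paper may normalize
    differently. 0 means indistinguishable, 1 means perfectly detectable.
    """
    n = len(propensities)
    c = sum(is_synthetic) / n
    return pmse(propensities, is_synthetic) / (c * (1 - c))


labels = [1, 1, 0, 0]                # two synthetic, two real conversations
undetectable = [0.5, 0.5, 0.5, 0.5]  # classifier cannot tell them apart
detectable = [1.0, 1.0, 0.0, 0.0]    # classifier separates them perfectly

print(normalized_pmse(undetectable, labels))
print(normalized_pmse(detectable, labels))
```

Under this scaling, a score near zero means the classifier cannot distinguish synthetic from real conversations, which is the sense in which lower is better.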
Designing with AI at Work: Designers' Expertise and Pragmatic Decision-Making in Workplace AI Transformation
FAccT '26: The 2026 ACM Conference on Fairness, Accountability, and Transparency, June 2026
Lu Xian, Huiran Yi, Yile Zhang, Jingyan Zeng, Zifan Zhang
Ever-evolving generative AI tools are increasingly embedded in professional practices, especially in creative and knowledge work. While prior research has examined AI supporting creative activities through lenses of automation and tool development, less is known about how designers themselves interpret and negotiate challenges and changes amid AI transformation in workplaces. Through interviews with 23 designers, we show that designers assume responsibility for determining whether and how to use AI in practice. Within corporate contexts, designers selectively adopt AI tools, rely on aesthetic knowledge to judge design quality, and align design outputs with collective pursuits and business values. Additionally, they need to navigate organizational constraints in collaboration with colleagues to ensure that design-with-AI outputs can be produced and materialized. Designers engage in what we term “pragmatic decision-making,” through which designers' invisible labor effectively governs AI in the workplace. Our contribution to FAccT is twofold. First, we provide an empirical account of designers' workplace practices, showing how they rely on aesthetic knowledge to evaluate design qualities and enhance their credibility. We emphasize pragmatic decision-making that shapes how AI outputs can be responsibly integrated into final design products. Second, we extend transparency and accountability discussions in FAccT to everyday production practices by highlighting transparency as the visibility of how expertise and decisions shape designing with AI practices. Designers, in deciding whether and how to adopt AI tools, enable organizations to translate AI capability into accountable, high-quality, and business-relevant work. Our findings underscore the need for a future of work that requires clearer organizational guidance, recognition of human expertise, and labor-aware approaches to AI governance.
Vibe Check: Accessibility Heuristics for Vibe Coding Interfaces
W4A '26: Proceedings of the 23rd International Web for All Conference, June 2026
Shalini Madan, Sreelakshmi Surabiyil Bindu, Venkatesh Potluri
AI coding tools are transforming programming from a highly editorial task into natural language conversations. Popular vibe coding tools such as Replit and Cursor integrate natural language interfaces with traditional development environments, simplifying software development and increasing productivity through automation. These tools, however, introduce new accessibility challenges for blind or low vision (BLV) developers. These accessibility challenges necessitate comprehensive guidelines that account for the complex interactions in these tools. To address this need, we develop a set of heuristics to assess the accessibility of conversational programming tools. Our heuristics combine web accessibility guidelines, best practices to design conversational interfaces, and accessibility needs specific to BLV developers. A validation study with 16 auditors across two phases with 3 conversational programming tools demonstrates that heuristics are applicable to surface both known accessibility issues and a substantial proportion of novel challenges posed by inaccessible human-AI interactions.
The curious case of culturally-aware AI: The need for CSCW methods in AI evaluations
ECSCW 2026 Posters and Demos, June 2026
Nasanbayar Ulzii-Orshikh, Justine Zhang, Mark S. Ackerman
Numerous efforts to design AIs to fit cultural contexts have been attempted. Efforts have been based on very general cultural characteristics or quantitative methods that are heavily reductionist. Recently, there have been calls for “thick alignment” (e.g., Nelson (2023)), where a deep understanding of the cultures would help determine the significant cultural experiences and self-representations that need to be incorporated in AI systems. Our study examined 2264 Facebook posts and comments about generative AI (genAI) images in Mongolia, an under-represented country that is wedged between Russia and China. We show how understanding the ways that Mongolians view their world and culture is crucial to genAI systems that wish to be culturally aware, responsible, and trustworthy. This is especially important for under-represented and under-resourced populations. Our study shows the utility and importance of CSCW methods and sensibilities for culturally-aware AI. It also suggests new research directions for CSCW investigating the inclusion of social and cultural data into AI systems.
Embodied or virtually represented: Navigating the embodiment debate in human-robot interaction
Science Robotics, May 2026
Connor Esterwood, Xin Ye, Ruijia Guan, Lionel Peter Robert
The validity of virtually represented robots in HRI experiments depends on when, where, and for whom it matters.
Integrating AI Literacy in Chemistry Graduate Education: Harnessing the Power of Transformer-Based Models
AI in Education, April 2026
Yulia V. Sevryugina, Kevyn Collins-Thompson, Nils G. Walter
Rapid adoption of general-purpose generative AI (GenAI) tools, such as ChatGPT, is reshaping teaching, learning, and assessment in chemical education. In this study, we expanded the implementation of GenAI tools within an upper-level undergraduate biochemistry course, providing students access to four distinct platforms: commercial chatbots (ChatGPT and LearningClues) and in-house tools developed at the University of Michigan (U-M GPT and U-M Maizey). We analyzed student learning outcomes from GenAI-enhanced writing assignments using pre- and post-surveys. Our results show that integrating GenAI into biochemistry coursework promoted effective and responsible usage, enhanced students’ prompt literacy, built ethical awareness, and increased confidence in utilizing these tools. The study specifically examined factors influencing GenAI acceptance: familiarity, perceived usefulness, ease of use, and trust. Trust emerged as the most significant criterion, with a majority of students recommending in-house chatbots for future cohorts due to strong privacy and ethical standards. Over the last year, we observed a shift in student sentiment from excitement about efficiency to emerging concerns about creativity silencing. This highlights the importance of addressing both the capabilities and the risks of AI tools by teaching AI literacy.
AI Has Joined the Chat: Exploring the Role of Generative AI Assistants in Group Chats
Connective AI (Book) (Routledge), May 2026
Eden Litt, Nicole B. Ellison, Lauren Scissors, Isabelle Giordano, Shreshta Bhat
More and more people are turning to generative artificial intelligence (AI) assistants like ChatGPT and Meta AI for tasks ranging from research help to seeking relational advice. While the overwhelming majority of generative AI experiences today are a one-on-one (or solo human-to-AI) experience, a few platforms have recently introduced the ability for groups of people to engage together with an AI one-to-many—potentially creating the foundation for a communication paradigm shift that could introduce new and critical questions about this technology’s impact on our communication practices and the relationships they support. In this chapter, we offer some early thoughts about this potential shift, drawing on data about group use of generative AI applications from a survey of 1,300 generative AI users in the United States. Although solo and group use of generative AI assistants leverage similar design and technology today, and the experience is still new, our results suggest that group use may entail some unique features and affordances, each representing a collection of benefits and hurdles. For example, our findings suggest that group use may yield a different mental model of the technology (seeing it as a social actor versus a tool), evoke different use cases (more social and entertainment activities versus informational ones), and leverage a different skill set than one-on-one use. While people may frame generative AI experiences as inherently antisocial, this work also highlights that much of what people are doing today is directly and indirectly social during both group and solo usage. We conclude with some speculative thoughts about how these tools may evolve and shape the nature of technology-mediated conversations in the future, surfacing domains for future research to explore.
Beyond Representation and Bias: Mimicry and Distillation Through generative a.i.
Connective AI (Book Chapter), 2026
William R. Frey, Kishonna L. Gray
Instructions for generative a.i. systems often suggest to “play around” with them, evoking a sense of innocence and imagination. But what happens when people attempt to generate Black people and Black culture—Blackness? While some concern themselves with questions of representation, we must interrupt with the question: What cannot be represented in generative a.i.? And we answer: Blackness. Drawing on a series of examples from social media and fashion, this chapter demonstrates a dehumanization of Black people through a sinister consumptive mimicry enacted by generative a.i. systems fed by white playfulness. We argue that generative a.i. operates as a distillation process, reducing multiplicities and complexities of Blackness into tamed, controllable caricatures ready for public extraction and consumption. Further synthetic literacies need to be developed so those who are subjected to the harmful impacts of a.i. systems can resist, and those who wield the power of these systems understand the potential harms.
Pre-prints, Working Papers, Articles, Workshops and Talks
Co-Designing Community-Centered AI Education for Adults: A Midwestern Case Study
arXiv, June 2026
Yao Lyu, Leonymae Aumentado, Holden Winton, Jared Lee Katzman, Sparkle Berry, Zachary Rowe, Kimberly Sanders, Tawanna R. Dillahunt
Artificial Intelligence (AI) education is increasingly important, yet adults outside higher education receive less attention. We report a case study of an AI education session with 54 adults (48 in-person and 6 virtual) in a predominantly African American community on the east side of a major Midwestern city. We ask: "What does AI education for adults outside formal educational systems look like in practice?" and "What does this AI education session reveal about AI literacy at the community level?" Through a co-designed session developed with community partners, we found that concerns about AI persisted but shifted to specific, locally grounded questions about AI design and deployment. We also discuss AI literacy from a community capacity perspective and argue for AI literacy frameworks grounded in local community contexts that strengthen community capacity.
Interpreting Style Representations via Style-Eliciting Prompts
arXiv, June 2026
Junghwan Kim, David Jurgens
Style representation learning is a powerful tool for authorship analysis and modeling writing style, yet the latent nature of learned representations makes them difficult to interpret. Recent work has attempted to explain these representations by generating natural language descriptions with large language models (LLMs) conditioned on input text. However, such descriptions are often prone to the LLM's biases and hallucinations, and they lack an explicit objective and practical utility. In this work, we propose a novel framework for interpreting style representations through style-eliciting prompts: natural language instructions designed to steer LLMs to generate text that reflects specific stylistic attributes. We curate 1,010 distinct style features spanning 26 stylistic categories and construct a dataset by prompting an LLM to generate text conditioned on these features. Using this data, we train a decoder to generate a style prompt from the style representation of the generated text. We evaluate our approach on three tasks: (1) recovering original style prompts from generated text, (2) generating text in the same style using the recovered prompts, and (3) steering LLM outputs to match the style of human-written texts. Experiments demonstrate that our method consistently outperforms strong baselines that directly prompt LLMs with target text, achieving superior performance in both style description and style imitation. These results highlight that style-eliciting prompts can provide a practical and interpretable interface to stylistic information encoded in style representations.
Molecular Lead Optimization via Agentic Tool Planni
arXiv, May 2026
Lingxiao Li, Haobo Zhang, Ruohao Fan, Bin Chen, Jiayu Zhou
Drug discovery is a lengthy and resource-intensive process composed of multiple stages. Among these stages, lead optimization plays a critical role in transforming early hit compounds into viable drug candidates. This stage requires improving ADMET-related properties through subtle structural refinement while preserving key molecular substructures responsible for binding affinity to disease targets. Recent advances in artificial intelligence have shown promise in accelerating various aspects of drug discovery; however, most existing approaches to lead optimization rely on one-step molecular optimization, which fail to account for the long-term consequences of sequential design decisions. To address this limitation, we propose TRACE, a trajectory-aware, LLM-reasoning agent for molecular lead optimization that formulates tool selection as a sequential decision-making problem over action trajectories. Given a lead molecule and an optimization objective, TRACE makes trajectory-aware decisions over molecular optimization tools, enabling forward-looking refinement under structural constraints. Experiments on multiple ADMET optimization tasks show that our agent achieves higher optimization success, larger property improvements, and higher validity, while preserving molecular similarity compared to baseline models.
Generative AI Advertising as a Problem of Trustworthy Commercial Intervention
arXiv, May 2026
Jingyi Qiu, Qiaozhu Mei
Major deployed generative AI advertising systems preserve a visible boundary between commercial content and AI-generated responses. Yet empirical research shows that ads woven directly into large language model (LLM) outputs often go undetected by users. We argue that generative AI fundamentally changes advertising: rather than placing products into discrete slots, it enables interventions on the generative process itself, which induce commercial influence through less observable channels. This reframes generative AI advertising as a problem of trustworthy intervention rather than content placement. We introduce a taxonomy organized by influence tier, corresponding to interventions on progressively more latent variables: product mentions, information framing, behavioral redirection, and long-term preference shaping; and show how these tiers instantiate across modalities and system architectures, including retrieval-augmented generation and agentic pipelines where upstream decisions can sharply constrain downstream outcomes. Both major deployed systems and designed mechanisms concentrate on the most observable and easiest-to-govern tier, while the forms of commercial influence most consequential for user autonomy remain poorly understood and lack frameworks for detection, measurement, or disclosure. The central challenge is whether commercial influence in generative systems can be made trustworthy, i.e., attributable, measurable, contestable, and aligned with user welfare.
Modeling LLM Persuasion via Interactive 2T1L Game
Open Review, April 2026
Large Language Models (LLMs) are increasingly deployed in interactive settings where they shape user beliefs, yet persuasion is rarely evaluated as a dynamic, behavioral process. We introduce a controlled persuasion benchmark inspired by the social game Two Truths and One Lie (2T1L), where users identify a false statement, engage in dialogue with an AI agent, and make a final decision under varying strategic conditions (cooperative, covertly persuasive, or adversarial). Across 100 participants and 500 interaction rounds, we measure belief revision as a function of interaction mode and domain. Results show that LLMs can substantially influence user decisions, with the highest belief-change rates occurring under covert persuasion. Belief revision declines over repeated exposure, suggesting adaptive trust calibration. However, persuasion often reduces epistemic accuracy, as users who changed their answers were less likely to be correct than those who retained their initial choice. These findings highlight both the persuasive power of LLMs and the risks of belief destabilization in cooperative dialogue settings.
Hint-Writing with Deferred AI Assistance: Fostering Critical Engagement in Data Science Education
arXiv, April 2026
Anjali Singh, Christopher Brooks, Warren Li, Juho Kim, Xu Wang
Generating hints for incorrect code is a cognitively demanding task that fosters learning and metacognitive development. This study investigates three designs for personalized, scalable, and reflective hint-writing activities within a data science course: (i) writing a hint independently, (ii) writing a hint with on-demand AI assistance, and (iii) deferred AI assistance, in which students first write a hint independently and then revise it with the help of an AI-generated one. We examine how AI support can scaffold the learning process without diminishing students' productive cognitive effort. Through a randomized controlled experiment with graduate-level students (N=97), we found that deferring AI assistance leads to the highest-quality hints. Further, this design helps students identify a wide range of mistakes they otherwise struggle to identify without any AI assistance. Students valued these activities as opportunities to practice debugging and critically engage with AI outputs--skills that are now critical for learners to acquire as programming becomes increasingly automated and the use of AI for learning grows. Our findings also highlight key considerations for designing student-AI collaborative learning experiences to sustain student engagement, maintain appropriate cognitive load, and mitigate negative effects of AI, such as introducing redundancies and extraneous information into student work.
When Cow Urine Cures Constipation on YouTube: Limits of LLMs in Detecting Culture-specific Health Misinformation
arXiv, April 2026
Anamta Khan, Ratna Kandala, Deepti, Sheza Munir, Joyojeet Pal
Social media platforms have become primary channels for health information in the Global South. Using gomutra (cow urine) discourse on YouTube in India as a case study, we present a post-facto Large Language Model (LLM)-assisted discourse analysis of 30 multilingual transcripts showing that promotional content blends sacred traditional language with pseudo-scientific claims in ways that sophisticated debunking content itself mirrors, creating a rhetorical register that LLMs, trained predominantly on Western corpora, are systematically ill-equipped to analyse. Varying prompt tone across three LLMs (GPT-4o, Gemini 2.5 Pro, DeepSeek-V3.1), we find that culturally embedded health misinformation does not look like ordinary misinformation, and this cultural obfuscation extends to gendered rhetoric and prompt design, compounding analytical unreliability. Our findings argue that cultural competency in LLM-assisted discourse analysis cannot be retrofitted through prompt engineering alone.
Dharma, Data and Deception: An LLM-Powered Rhetorical Analysis of Cow-Urine Health Claims on YouTube
arXiv, April 2026
Sheza Munir, Ratna Kandala, Anamta Khan, Deepti, Joyojeet Pal
Health misinformation remains one of the most pressing challenges on social media, particularly when cultural traditions intersect with scientific-sounding claims. These dynamics are not only global but also deeply local, manifesting in culturally specific controversies that require careful analysis. Motivated by this, we examine 100 YouTube transcripts that promote or debunk cow urine (gomutra) as a health remedy, focusing on rhetorical strategies such as appeals to authority, efficacy appeals, and conspiracy framing. We employ large language models (LLMs) including GPT-4, GPT-4o, GPT-4.1, GPT-5, Gemini 2.5 Pro, and Mistral Medium 3 to annotate transcripts using a 14-category taxonomy of persuasive tactics. Our analysis reveals that promoters predominantly rely on efficacy appeals and social proof, while debunkers emphasize authority and rebuttal. Human evaluation of a subset of annotations yielded 90.1% inter-annotator agreement, confirming the reliability of our taxonomy and validation process. This work advances computational methods for misinformation analysis and demonstrates how LLMs can support large-scale studies of cultural discourse online.
Cooperative Profiles Predict Multi-Agent LLM Team Performance in AI for Science Workflows
arXiv, April 2026
Shivani Kumar, Adarsh Bharathwaj, David Jurgens
Multi-agent systems built from teams of large language models (LLMs) are increasingly deployed for collaborative scientific reasoning and problem-solving. These systems require agents to coordinate under shared constraints, such as GPUs or credit balances, where cooperative behavior matters. Behavioral economics provides a rich toolkit of games that isolate distinct cooperation mechanisms, yet it remains unknown whether a model's behavior in these stylized settings predicts its performance in realistic collaborative tasks. Here, we benchmark 35 open-weight LLMs across six behavioral economics games and show that game-derived cooperative profiles robustly predict downstream performance in AI-for-Science tasks, where teams of LLM agents collaboratively analyze data, build models, and produce scientific reports under shared budget constraints. Models that effectively coordinate games and invest in multiplicative team production (rather than greedy strategies) produce better scientific reports across three outcomes, accuracy, quality, and completion. These associations hold after controlling for multiple factors, indicating that cooperative disposition is a distinct, measurable property of LLMs not reducible to general ability. Our behavioral games framework thus offers a fast and inexpensive diagnostic for screening cooperative fitness before costly multi-agent deployment.