Rakshit Trivedi

I am a computer scientist working on cooperative AI, multi-agent safety, and institutions for collective intelligence. My research asks how we can build, evaluate, and govern AIs that help humans, AIs, and institutions cooperate in complex multi-actor worlds.

I am interested in developing robust and generalizable AI whose capabilities are beneficial to humans, aligned with human values and principles, and safely integrated into society. My research is grounded in a simple premise: our world is inherently a multi-actor ecosystem shaped by incentives, norms, institutions, relationships, and collective action. This perspective motivates my broader goal of building AI systems and surrounding institutions that support cooperation, the hallmark of intelligence that underpins human societies’ ability to coordinate effectively in addressing the challenges of shared existence.

I develop methods that blend innovations in AI for multi-agent systems (e.g., reinforcement learning, generative agents, graph machine learning) with insights from social science, anthropology, game theory, and economics, among others. My recent work increasingly explores how to evaluate AI systems in richer social and institutional environments: how they reason about other agents, respond to incentives and norms, participate in cooperative settings, and affect coordination among humans and AI systems.

I am incredibly fortunate to collaborate closely on these topics with Dylan Hadfield-Menell at MIT, David Parkes at Harvard University, Gillian Hadfield at Johns Hopkins University, and Joel Leibo and other members of the multi-agent team at Google DeepMind.

News

Jun 2026 The paper I have been most excited about lately, Solipsistic Superintelligence is Unlikely to be Cooperative, has been accepted to the ICML 2026 Position Paper track.
Mar 2026 Our full paper on Building AI for the Democratic Matrix: A Technical Research Agenda for Normative Competence and Normative Institutions has been published by the Knight First Amendment Institute as a follow-up to the Knight Symposium on Artificial Intelligence and Democratic Freedoms.
Jan 2026 Our paper on COLD-Steer: Steering Large Language Models via In-Context One-step Learning Dynamics has been accepted at ICLR 2026!
Sep 2025 Our paper on Inner Speech as Behavior Guides has been accepted as a Spotlight paper at NeurIPS 2025!
Sep 2025 Our paper on Evaluating Generalization Capabilities of LLM-Based Agents has been accepted at NeurIPS 2025 Datasets and Benchmarks Track!
Dec 2024 I co-presented the tutorial on Cross-disciplinary Insights into Alignment in Humans and Machines at NeurIPS 2024.
Dec 2024 Co-organized the NeurIPS 2024 Concordia Contest in collaboration with Google DeepMind and the Cooperative AI Foundation. This contest challenged participants to advance the cooperative intelligence of language model (LM) agents in rich, text-based environments. It was built on the recently released Concordia framework, which uses language models to create open-ended worlds similar to tabletop role-playing games.
Nov 2024 Our preliminary investigation into the design of normative frameworks to ensure sociotechnical AI safety was accepted at the Knight Symposium on Artificial Intelligence and Democratic Freedoms.
Aug 2024 Our report “Melting Pot Contest: Charting the Future of Generalized Cooperative Intelligence” was accepted at the NeurIPS 2024 Datasets and Benchmarks Track.
Jul 2024 Our works at the intersection of AI alignment, normative infrastructure, and normative reasoning in AI agents were accepted at the Agentic Markets Workshop at ICML 2024 (this work focuses on reinforcement learning agents) and the Workshop on Foundation Models and Game Theory at EC 2024 (this work focuses on language agents).
Jul 2024 Our work on “Diffuse, Sample, Project: Plug-and-Play Controllable Graph Generation” was accepted at ICML 2024.
Mar 2024 I was selected as a Kavli Fellow at the 34th Annual Kavli Frontiers of Science Symposium by the US National Academy of Sciences.
Dec 2023 Organized the NeurIPS 2023 Melting Pot Contest in collaboration with Google DeepMind and the Cooperative AI Foundation. This contest challenged researchers to push the boundaries of multi-agent reinforcement learning for mixed-motive cooperation by evaluating how well agents can adapt their cooperative skills to interact with novel partners in unforeseen situations.
Jun 2023 Our work on “Plug-and-Play Controllable Graph Generation with Diffusion Models” was accepted at the ICML 2023 Workshop on Structured Probabilistic Inference & Generative Modeling.
May 2023 Our work on “Temporal Dynamics-Aware Adversarial Attacks on Discrete-Time Dynamic Graph Models” was accepted at KDD 2023.
Apr 2023 I gave an invited talk on “Foundations for Learning in Multi-agent Ecosystems: Modeling, Imitation and Equilibria” at the University of Southern California.
Dec 2022 Our work on “Imperceptible Adversarial Attacks on Discrete-Time Dynamic Graph Models” was accepted at the NeurIPS Temporal Graph Learning Workshop.
Aug 2022 I gave an invited talk on “Learning from Interactions in Networked Systems” as part of the Beneficial AI seminar series at the Center for Human-Compatible Artificial Intelligence (CHAI), UC Berkeley.
May 2022 Our work on “Adaptive Incentive Design with Multi-Agent Meta-Gradient Reinforcement Learning” was accepted at AAMAS 2022.
Apr 2022 Our work on “CrowdPlay: Crowdsourcing human demonstration data for offline learning” was accepted at ICLR 2022.