Rakshit Trivedi

I’m a Postdoctoral Associate in the Algorithmic Alignment Group, working with Dylan Hadfield-Menell at the Computer Science and Artificial Intelligence Laboratory (CSAIL), MIT. I study artificial intelligence, with a specific focus on cooperative AI, alignment, and multi-agent safety.

I’m interested in developing robust and generalizable AI systems that have advanced capabilities beneficial to humans, align with human principles, and integrate safely into our daily lives. My research is grounded in a fundamental reality of our world: it is inherently a multi-agent ecosystem, where success relies on the ability to navigate and contribute to complex interactions and networks of relationships. This perspective drives my ambition to endow AI systems with collective intelligence, the ultimate capability underpinning human societies’ ability to cooperate and coordinate effectively in addressing the challenges of shared existence.

I develop methods that blend innovations in AI for multi-agent systems (e.g., reinforcement learning, generative agents, graph machine learning) with insights from interdisciplinary fields such as social science, anthropology, game theory, and economics. I use these methods to understand the social and economic factors driving human behaviors and interactions, with the aim of building AI systems that foster cooperation among humans, AI, and institutions.

I am incredibly fortunate to collaborate closely on these topics with David Parkes at Harvard University, Gillian Hadfield at Johns Hopkins University, and Joel Leibo and other members of the multi-agent team at Google DeepMind. Previously, I completed my PhD at the Georgia Institute of Technology, where I was advised by Hongyuan Zha.

News

Sep 2025	Our paper on Inner Speech as Behavior Guides has been accepted as a Spotlight paper at NeurIPS 2025!
Sep 2025	Our paper on Evaluating Generalization Capabilities of LLM-Based Agents has been accepted at NeurIPS 2025 Datasets and Benchmarks Track!
Dec 2024	I co-presented the tutorial on Cross-Disciplinary Insights into Alignment in Humans and Machines at NeurIPS 2024.
Dec 2024	Co-organized the NeurIPS 2024 Concordia Contest in collaboration with Google DeepMind and the Cooperative AI Foundation. This contest challenged participants to advance the cooperative intelligence of language model (LM) agents in rich, text-based environments, based on the recently released Concordia framework, which uses language models to create open-ended worlds similar to tabletop role-playing games.
Nov 2024	Our preliminary investigation into the design of normative frameworks to ensure sociotechnical AI safety was accepted at the Knight Symposium on Artificial Intelligence and Democratic Freedoms.
Aug 2024	Our report, “Melting Pot Contest: Charting the Future of Generalized Cooperative Intelligence,” was accepted at the NeurIPS 2024 Datasets and Benchmarks Track.
Jul 2024	Our works at the intersection of AI alignment, normative infrastructure, and normative reasoning in AI agents were accepted at the Agentic Markets Workshop at ICML 2024 (focusing on reinforcement learning agents) and the Workshop on Foundation Models and Game Theory at EC 2024 (focusing on language agents).
Jul 2024	Our work on “Diffuse, Sample, Project: Plug-and-Play Controllable Graph Generation” was accepted at ICML 2024.
Mar 2024	I was selected by the US National Academy of Sciences as a Kavli Fellow at the 34th Annual Kavli Frontiers of Science Symposium.
Dec 2023	Organized the NeurIPS 2023 Melting Pot Contest in collaboration with Google DeepMind and the Cooperative AI Foundation. This contest challenged researchers to push the boundaries of multi-agent reinforcement learning for mixed-motive cooperation by evaluating how well agents can adapt their cooperative skills to interact with novel partners in unforeseen situations.
Jun 2023	Our work on “Plug-and-Play Controllable Graph Generation with Diffusion Models” was accepted at the ICML 2023 Workshop on Structured Probabilistic Inference & Generative Modeling.
May 2023	Our work on “Temporal Dynamics-Aware Adversarial Attacks on Discrete-Time Dynamic Graph Models” was accepted at KDD 2023.
Apr 2023	I gave an invited talk on “Foundations for Learning in Multi-agent Ecosystems: Modeling, Imitation and Equilibria” at the University of Southern California.
Dec 2022	Our work on “Imperceptible Adversarial Attacks on Discrete-Time Dynamic Graph Models” was accepted at the NeurIPS Temporal Graph Learning Workshop.
Aug 2022	I gave an invited talk on “Learning from Interactions in Networked Systems” as part of the Beneficial AI Seminar Series at the Center for Human-Compatible Artificial Intelligence (CHAI), UC Berkeley.
May 2022	Our work on “Adaptive Incentive Design with Multi-Agent Meta-Gradient Reinforcement Learning” was accepted at AAMAS 2022.
Apr 2022	Our work on “CrowdPlay: Crowdsourcing human demonstration data for offline learning” was accepted at ICLR 2022.