publications

publications by category in reverse chronological order

View my Google Scholar profile

2025

  1. Dual Process Learning: Controlling Use of In-Context vs. In-Weights Strategies with Weight Forgetting
    Suraj Anand, Michael Lepori, Jack Merullo, and 1 more author
    Apr 2025
    ICLR 2025

2024

  1. Is PPO Hackable
    Suraj Anand and David Getzen
    May 2024
    Capstone

2023

  1. The first New England RLHF Hackers Hackathon
    Suraj Anand, Stephen Casper, Louis Castricato, and 6 more authors
    Sep 2023
    Blog post
  2. Disentangling Causal Mechanisms By Obstructing Classifiers
    Suraj Anand and Neil Xu
    May 2023
    Preprint
  3. Subnetworks and Superpositions
    Suraj Anand, Vignesh Pandiarajan, and Noah Foster
    May 2023
    Preprint