A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations
- Paper
- Feb 6, 2023
- #ComputerScience
 
Universality is a key hypothesis in mechanistic interpretability: the claim that different models learn similar features and circuits when trained on similar tasks. In this work, we study t...