MIT researchers have made a significant breakthrough in understanding the inner workings of protein language models, artificial intelligence models used to predict the structure and function of proteins. These models, which are based on large language models (LLMs), can accurately predict whether a protein is well suited to a given application. Until now, however, there was no way to determine how they arrived at their predictions.
The researchers used a novel technique called sparse autoencoders to shed light on the "black box" of protein language models. By expanding a protein's representation from 480 neurons to 20,000 nodes, they allowed the information to "spread out," making it possible to identify which specific feature each node encodes. With the help of the AI assistant Claude, they analyzed thousands of these expanded representations and found that the features most often encoded by individual nodes were protein families and certain functions, including metabolic and biosynthetic processes.
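To make the idea concrete, here is a minimal PyTorch sketch of a sparse autoencoder. The 480-to-20,000 expansion matches the figures reported here; the architecture, the L1 sparsity penalty, and all names are illustrative assumptions rather than the researchers' actual implementation.

```python
import torch
import torch.nn as nn


class SparseAutoencoder(nn.Module):
    """Expand a dense embedding into a wider, sparsely activated code.

    The 480 -> 20,000 dimensions follow the article; everything else
    is a generic sketch, not the study's implementation.
    """

    def __init__(self, input_dim: int = 480, hidden_dim: int = 20_000):
        super().__init__()
        self.encoder = nn.Linear(input_dim, hidden_dim)
        self.decoder = nn.Linear(hidden_dim, input_dim)

    def forward(self, x: torch.Tensor):
        # ReLU keeps most hidden units at exactly zero, so each
        # embedding activates only a small set of nodes.
        code = torch.relu(self.encoder(x))
        reconstruction = self.decoder(code)
        return reconstruction, code


def loss_fn(x, reconstruction, code, l1_weight: float = 1e-3):
    # Reconstruction error keeps the wide code faithful to the original
    # embedding; the L1 term pushes the code toward sparsity.
    mse = torch.nn.functional.mse_loss(reconstruction, x)
    sparsity = code.abs().mean()
    return mse + l1_weight * sparsity


# Toy usage: a batch of 8 hypothetical 480-dimensional protein embeddings.
embeddings = torch.randn(8, 480)
model = SparseAutoencoder()
reconstruction, code = model(embeddings)
print(loss_fn(embeddings, reconstruction, code))
```

In a setup like this, the sparsity penalty drives most hidden units to zero for any given input, so each protein embedding lights up only a handful of nodes, and the meaning of each node can then be probed individually.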
The findings could make it easier to explain how these models arrive at downstream predictions, and could even reveal novel biological insights. According to Bonnie Berger, the Simons Professor of Mathematics at MIT, "Our work has broad implications for enhanced explainability in downstream tasks that rely on these representations." Onkar Gujral, an MIT graduate student and lead author of the study, adds: "At some point when the models get a lot more powerful, you could learn more biology than you already know, from opening up the models."
The study marks a significant step forward in making protein language models interpretable. By opening up the inner workings of these models, researchers can better understand how they make predictions, which could accelerate progress in fields such as drug discovery and vaccine development, where protein language models are already being used to streamline research.