Researchers Publish Breakthrough Internal Steering Technique for LLMs in the Journal Science
UC San Diego and MIT researchers have published a landmark study in Science demonstrating a scalable method for steering and monitoring AI models by directly manipulating their internal concept representations, revealing both safety vulnerabilities and capability improvements.


