IBM’s Mikhail Yurochkin wants to make AI’s “cool” factor tangible
As a researcher and a mentor, Yurochkin has worked to make AI more inclusive, accessible, and relevant to everyday life.
Probabilistic modeling comes naturally to Mikhail “Misha” Yurochkin. He would argue the same is true for all of us.
“When you get dressed in the morning, you’re using an internal model to decide what to wear,” he said recently. “At the end of the day, you might be glad you dressed for the cold but left the umbrella at home. Your internal model gets updated with this data and affects your decision-making tomorrow.”
As a researcher at the MIT-IBM Watson AI Lab, Yurochkin spends much of his time thinking about data, and how to build models with variables both known and unknown that are constantly in flux. In his six years at the lab, he has explored an eclectic range of topics, all connected by data and what it can tell us about the future if we ask in the right way.
He sat down recently to chat in the café of IBM Research’s Cambridge office. It was a cold day, and the Charles River below glittered under a fresh coat of snow. Yurochkin’s inner model that morning had suggested a heavy sweatshirt, no umbrella, which had apparently been the right call.
He arrived at IBM with a PhD in Bayesian statistics, never suspecting that large statistical models of language (what we call LLMs today) would become all the rage. In hindsight, his choice of field proved fortuitous. At IBM, he has applied his training in Bayesian modeling to everything from federated learning to algorithmic bias to shrinking the amount of computation required to train, tune, and serve LLMs.
In classical statistics, a model’s parameters are treated as fixed quantities, estimated once from the observed data. Under Bayes’ theorem, by contrast, beliefs about the parameters are updated as new evidence comes in. For Yurochkin, this dynamic way of thinking about data has been a source of creativity. He compares building models to playing with Legos; each block represents a “mathematical object” corresponding to things like hidden “latent” variables and probability distributions.
Together, they paint the data in a new light. “I try to understand how LLMs ‘learn,’ and what each piece of the model is doing,” he says. “This helps me come up with new ideas to improve them. It feels a lot like building a Bayesian model.”
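To make the contrast concrete, here is a minimal sketch of Bayesian updating, using the textbook Beta-Bernoulli model to formalize his morning-weather example. The scenario and numbers are illustrative, not drawn from his work:

```python
# A minimal sketch of Bayesian updating (illustrative): estimating the chance
# of rain with a Beta prior that sharpens as new evidence arrives.

from scipy.stats import beta

# Prior belief: rain on roughly 3 of the last 10 mornings.
a, b = 3, 7  # Beta(3, 7) prior over the probability of rain

observations = [1, 0, 0, 1, 0]  # 1 = it rained, 0 = it didn't

for rained in observations:
    # Beta is conjugate to the Bernoulli likelihood, so the posterior is
    # another Beta with updated counts: Bayes' theorem in one line.
    a, b = a + rained, b + (1 - rained)

posterior = beta(a, b)
print(f"Posterior mean P(rain) = {posterior.mean():.3f}")  # ~0.333
```

Each morning’s data shifts the posterior, which becomes tomorrow’s prior, exactly the loop Yurochkin describes.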
Since 2022, and the rise of ChatGPT, he has divided his time between AI evaluation (“understanding what AI can and can’t do”) and AI efficiency (“figuring out how to reduce its costs, so it can go from something that’s cool to something that’s everywhere”).
Borrowing an idea from standardized testing, Yurochkin and his colleagues brought “tiny” benchmarks to Hugging Face, allowing developers to evaluate LLMs with far fewer questions and at much lower cost. These mini benchmarks have been collectively downloaded more than 250,000 times since their release.
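The intuition behind a tiny benchmark can be sketched in a few lines: rather than scoring a model on every question, evaluate it on a small, representative subset. The published work relies on more refined statistical machinery; the clustering below is only a rough illustration of the subset-selection idea, with made-up embeddings standing in for real ones:

```python
# Illustrative sketch of the "tiny benchmark" idea: cluster benchmark questions
# and keep one representative per cluster, so far fewer questions need scoring.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
question_embeddings = rng.normal(size=(2_000, 64))  # stand-in for real embeddings

k = 100  # evaluate on 100 representative questions instead of 2,000
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(question_embeddings)

# Keep the question nearest each cluster center as that cluster's representative.
representatives = [
    int(np.argmin(np.linalg.norm(question_embeddings - c, axis=1)))
    for c in km.cluster_centers_
]
print(f"Reduced {len(question_embeddings)} questions to {len(set(representatives))}.")
```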
He has also come up with innovative ways to lower the cost of AI training and AI inferencing. One method he helped develop routes user queries to the most cost-effective LLM. Another harnesses low-rank adapters, or LoRAs, that can be swapped on and off an LLM, like bits in a multi-bit screwdriver, to customize and serve models faster.
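A cost-aware router of the kind described can be sketched roughly as follows. The models, prices, and difficulty heuristic here are all hypothetical stand-ins, not IBM’s actual system:

```python
# A hedged sketch of cost-aware LLM routing: a lightweight scorer estimates
# whether a cheap model can handle the query; only "hard" queries are
# escalated to the larger, more expensive model.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Model:
    name: str
    cost_per_query: float
    answer: Callable[[str], str]

# Hypothetical stand-ins for deployed models.
small = Model("small-llm", 0.001, lambda q: f"[small-llm] answer to: {q}")
large = Model("large-llm", 0.02, lambda q: f"[large-llm] answer to: {q}")

def difficulty(query: str) -> float:
    # Toy heuristic standing in for a learned difficulty/confidence scorer.
    return min(len(query.split()) / 50, 1.0)

def route(query: str, threshold: float = 0.5) -> str:
    model = large if difficulty(query) > threshold else small
    print(f"Routing to {model.name} (est. cost ${model.cost_per_query})")
    return model.answer(query)

route("What is 2 + 2?")                    # easy -> small model
route("Prove the spectral theorem " * 10)  # hard -> large model
```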
“At any point in time, he has more project ideas than he can possibly explore,” says his boss, Dan Gutfreund, an IBM manager based in Cambridge. “Whenever new interns come in, he pulls another idea out of the drawer for them to work on. It’s why researchers and students like working with him.”
Yurochkin grew up in a city six hours from Moscow, the only child of two professionals. He played chess with his father, and even took classes, but he ended up channeling his competitive drive into Magic: The Gathering, the fantasy trading card game. By his senior year in high school, he was ranked among the top 10 players in Russia.
In college, he majored in applied math and physics, the only major offered at the Moscow Institute of Physics and Technology. Afterward, he came to the U.S. to pursue a PhD in statistics at the University of Michigan, Ann Arbor. There, he was converted to Bayesian thinking by his adviser, Long Nguyen, and learned to code as a way to carry out experiments.
One night in bed, while he was visualizing geometric shapes in his head, a big idea came to him. He was looking for a quicker way to thematically organize large collections of documents, a task commonly handled at the time with probabilistic topic modeling, which groups frequently co-occurring words to infer underlying themes. Yurochkin realized that a geometric solution could be much simpler and faster than the probabilistic one.
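The geometric intuition can be sketched loosely: documents are points inside a simplex whose vertices are the topics, so clustering the documents and pushing each cluster mean outward from the global mean gives rough estimates of those vertices. This is a hedged paraphrase, not the paper’s exact algorithm, and the extension factor below is a made-up constant:

```python
# A rough sketch of geometric topic estimation (assumptions mine, not the
# paper's exact method): cluster documents, then extend cluster means away
# from the global mean toward the simplex vertices (the topics).

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
docs = rng.dirichlet(alpha=np.ones(50), size=500)  # 500 docs over a 50-word vocab

k, gamma = 5, 2.0  # number of topics; outward extension factor (hypothetical)
centroids = KMeans(n_clusters=k, n_init=10, random_state=1).fit(docs).cluster_centers_

center = docs.mean(axis=0)
topics = center + gamma * (centroids - center)  # push means toward the vertices
topics = np.clip(topics, 1e-9, None)
topics /= topics.sum(axis=1, keepdims=True)     # project back to valid distributions
print("Estimated topic matrix:", topics.shape)  # (5, 50)
```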
He translated his insight into equations and presented the work at NeurIPS in 2016. A year later, a follow-up paper at the same conference caught the eye of IBM researchers. A few days later, he was offered a job at the MIT-IBM Watson AI Lab, which had just opened with a $240 million investment from IBM.
In Cambridge, Yurochkin connected geometric modeling to open questions in AI. In his most cited paper, he and his colleagues came up with a better way to merge the weights of two independently trained models. “Simply averaging their weights would wreck the combined model,” he said. “But if we account for their permutations, we can get a ‘smarter’ average, and a higher-performing model.”
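The idea can be sketched for a single hidden layer. Hidden units have no canonical order, so before averaging, the units of one model are matched to their most similar counterparts in the other, here via the Hungarian algorithm. This is an illustration of the principle, not the published method itself:

```python
# A minimal sketch of permutation-aware weight averaging for one layer:
# match model B's hidden units to model A's before averaging.

import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(2)
W_a = rng.normal(size=(16, 8))                     # layer weights, model A
perm = rng.permutation(16)
W_b = W_a[perm] + 0.05 * rng.normal(size=(16, 8))  # model B: shuffled + noise

# Cost of matching unit i of A to unit j of B = distance between weight vectors.
cost = np.linalg.norm(W_a[:, None, :] - W_b[None, :, :], axis=2)
rows, cols = linear_sum_assignment(cost)  # Hungarian algorithm

naive = 0.5 * (W_a + W_b)          # averages mismatched units, wrecking structure
matched = 0.5 * (W_a + W_b[cols])  # aligns units first, then averages

# Since W_b is a noisy permuted copy of W_a, distance to W_a shows the effect.
print("naive error:  ", np.linalg.norm(naive - W_a))
print("matched error:", np.linalg.norm(matched - W_a))  # far smaller
```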
The building that houses the IBM lab where Yurochkin works, at 314 Main Street (nicknamed the pi building), looks out on the MIT campus. Over the years, he has worked closely with MIT professor Justin Solomon and his students, co-authoring nearly 20 papers together.
“Misha is a brilliant, inclusive collaborator with clever and constructive ideas,” said Solomon. “He stands out for his incredible breadth of knowledge and openness to pursuing research in totally new and unfamiliar areas.”
Yurochkin had an early interest in algorithmic fairness, which looks at how to ‘debug’ AI systems trained on biased or unrepresentative data. In 2022, he led the development of InFairness, an open-source Python library to train and audit machine-learning models for individual fairness, as well as a post-processing tool that corrects biased AI outputs to ensure that individuals with similar qualifications are treated similarly.
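One way to picture such post-processing, as a hedged sketch rather than InFairness’s actual API: nudge each individual’s score toward those of their nearest neighbors under a task-relevant similarity metric, so that similar individuals end up with similar outputs:

```python
# An illustrative post-processing sketch in the spirit of individual fairness
# (not the InFairness library's API): smooth each score toward the scores of
# the individual's nearest neighbors under a task-relevant metric.

import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))   # individuals under a task-relevant (fair) metric
scores = rng.uniform(size=200)  # raw model scores, possibly biased

nn = NearestNeighbors(n_neighbors=6).fit(X)
_, idx = nn.kneighbors(X)       # each row: the point itself + its 5 nearest neighbors

lam = 0.5                       # smoothing strength (a made-up knob)
smoothed = (1 - lam) * scores + lam * scores[idx[:, 1:]].mean(axis=1)

gap_before = np.abs(scores - scores[idx[:, 1]]).mean()
gap_after = np.abs(smoothed - smoothed[idx[:, 1]]).mean()
print(f"Mean nearest-neighbor score gap: {gap_before:.3f} -> {gap_after:.3f}")
```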
Despite his belief in engineering solutions, Yurochkin saw a need for more diversity within tech itself. To encourage more women and minorities to join the industry, he started volunteering with an educational nonprofit called Break Through Tech.
This past fall, he finished working with a fifth cohort of Boston-area college students through Break Through Tech. The research projects he has guided them through have ranged from testing the capabilities of ‘small’ language models to deploying LLMs as AP-exam tutors.
“Our project evolved from Misha’s initial proposal, but he still guided us and was constantly teaching us new things,” said Geneva Yang, who helped build the AP tutor as a junior at Boston University.
When the time came for student presentations, Yurochkin invited Stephanie Soetendal, founder of the local ed-tech startup Matrix Holograms. Yang was ultimately offered a summer internship at Matrix Holograms, and the experience convinced her to go into ed tech after graduation.
Some of the students had never set foot in a lab before or done original research. Working with Yurochkin in Cambridge, they got to meet other IBM researchers. They also learned how to go from an initial hypothesis to verifiable results, diligently pushing through each obstacle.
“Misha laid out the timeline with clear, achievable milestones, which allowed our team to systematically tackle each step,” says Ishita Kakkar, a senior at the University of Massachusetts, Amherst. “Through him, I learned the importance of meticulous planning for any research project.”
Many of the students later sought Yurochkin out for career guidance as well as job and graduate school recommendations. “He gave me insights into how to frame my research interests and align them with faculty expertise,” says Kakkar. “That helped me craft a more compelling statement of purpose.” Kakkar was recently accepted to the University of Wisconsin, Madison’s competitive PhD program in computer science.
The benefits go both ways. Working with students clearly energizes Yurochkin. He breaks into a smile as he describes past projects and the jobs and graduate programs his students have gone on to. “The goal is to teach them how to think creatively,” he says, “how to think like a researcher.”
He doesn’t play Magic much anymore, but when asked where his research was headed next, he didn’t miss a beat, as if he had just drawn the winning card that ends the match.
“Reinforcement learning,” he said. “Through reinforcement learning you can teach models new things not by showing them more examples, but by getting them to solve the problem themselves and verify the solution. Scaling reinforcement learning will take LLMs to the next level.”
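That loop of propose, verify, reinforce can be caricatured in a few lines. The toy “policy” below is just a weighted choice among arithmetic strategies, with a checker supplying the verifiable reward; it is a sketch of the principle, not how LLMs are actually trained:

```python
# A toy sketch of reinforcement learning with a verifiable reward: the "model"
# proposes answers, a checker verifies them, and verified successes are
# reinforced. No labeled examples needed, only a way to check solutions.

import random
random.seed(0)

def propose(x, y, weights):
    # Policy: a distribution over candidate "strategies" for computing x + y.
    strategies = [lambda a, b: a + b, lambda a, b: a - b, lambda a, b: a * b]
    i = random.choices(range(3), weights=weights)[0]
    return i, strategies[i](x, y)

def verify(x, y, answer):
    return answer == x + y  # the verifiable reward signal

weights = [1.0, 1.0, 1.0]
for _ in range(500):
    x, y = random.randint(0, 9), random.randint(0, 9)
    i, ans = propose(x, y, weights)
    if verify(x, y, ans):
        weights[i] += 0.1   # reinforce the strategy behind a verified answer

print("Learned strategy weights:", [round(w, 1) for w in weights])
# The additive strategy dominates after training.
```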