While ordinary people around the world are waking up to large language models on the cloud, researchers at New Jersey Institute of Technology want you to know about the power of small models on your own hardware. It’s not unlike fifty years ago, when people were becoming aware of business computers the size and cost of a car, unaware of the imminent personal computing revolution.
The advantages this time are personalization and privacy. When you run a small language model on your own server, laptop or even smartphone, it doesn’t have to be built for general knowledge or made available to anyone else, explained Yingcong Li, assistant data science professor in NJIT’s Ying Wu College of Computing.
Li has already helped address one downside: training efficiency. Small models can’t reason toward an answer nearly as well as large ones with more than 50 billion parameters. As a University of Michigan doctoral student until summer 2025, and then as a new NJIT faculty member in the fall 2025 semester, Li worked on research that largely closes the gap by giving small models mathematical hints from large models about how to answer questions.
If the small model succeeds after receiving its hint, a shorter, more difficult hint is tried to see whether the model still succeeds. If the small model fails, the hint is lengthened to make the task easier. The goal is to find what Li and collaborators call an expert anchor: the shortest possible hint at which the model can still maintain a specific success rate. Their intention is to make small-model training three times faster, and they presented this project, BREAD: Branched Rollouts from Expert Anchors Bridge SFT & RL for Reasoning, at the recent Neural Information Processing Systems conference in San Diego.
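The search described above, shortening the hint on success and lengthening it on failure until the shortest adequate hint is found, resembles a binary search over hint length. The sketch below is an illustration of that idea only, not the paper's implementation: the `succeeds` callable and the idea that a hint is a prefix of a large model's reasoning trace are assumptions made here for demonstration.

```python
def find_expert_anchor(expert_trace, succeeds, target_rate=0.5, trials=4):
    """Binary-search the shortest hint length (a prefix of the expert
    trace) at which the small model still hits `target_rate` success.

    expert_trace: list of tokens from the large model's solution.
    succeeds: callable(hint_tokens) -> bool, one rollout of the small
              model continuing from the hint (hypothetical interface).
    """
    def rate(n):
        # Empirical success rate of the small model given an n-token hint.
        hint = expert_trace[:n]
        return sum(succeeds(hint) for _ in range(trials)) / trials

    lo, hi = 0, len(expert_trace)      # the full trace is assumed "easy enough"
    while lo < hi:
        mid = (lo + hi) // 2
        if rate(mid) >= target_rate:   # success: try a shorter, harder hint
            hi = mid
        else:                          # failure: lengthen the hint
            lo = mid + 1
    return lo                          # shortest adequate hint length


# Toy demo: pretend the small model needs at least a 6-token hint.
trace = list("the quick brown fox jumps")  # stand-in for a reasoning trace
anchor = find_expert_anchor(trace, lambda h: len(h) >= 6)
print(anchor)  # → 6
```

In practice each `succeeds` call would be a full model rollout, so the logarithmic number of probes from binary search, rather than trying every hint length, is what makes the anchor search cheap.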
