I’m experimenting with a different approach to local coding assistants and wanted to get feedback from people who’ve tried similar setups.
Instead of relying on one general-purpose model, I’m thinking of building multiple small, specialized models, each focused on a specific domain:
The idea is:
I’m also considering sharing the results publicly (maybe on **Hugging Face / Transformers**) if this approach works.
Would really appreciate any insights, warnings, or even “this is a bad idea” takes 🙏
Thanks!
i've been down this road and honestly the routing layer is the hardest part.
ran something similar with a few 7b models for different tasks - the domain specialization did help, but the overhead of managing multiple models and figuring out which one to call for what basically erased most of the gains.
for your constraints (16gb ram, no gpu), i'd suggest starting with just RAG on a solid base model before investing in fine-tuning - you can get 80% of the benefit with way less setup pain.
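a minimal sketch of what i mean, assuming sentence-transformers is installed and you point it at a folder of your own code (SNIPPET_DIR and the query are made up for illustration):

```python
from pathlib import Path

import numpy as np
from sentence_transformers import SentenceTransformer

SNIPPET_DIR = Path("snippets")  # hypothetical: a folder of your own code/docs

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small, runs fine on CPU

# index once: embed every snippet, normalized so dot product == cosine similarity
docs = [p.read_text() for p in SNIPPET_DIR.glob("**/*.py")]
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

def retrieve(query: str, k: int = 3) -> list[str]:
    """return the k snippets most similar to the query."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

# stuff the retrieved context into the prompt for whatever base model you run
question = "how does the API client handle pagination?"
context = "\n---\n".join(retrieve(question))
prompt = f"Project context:\n{context}\n\nQuestion: {question}"
```

swap the brute-force similarity loop for a real vector store once the corpus grows, but this is enough to test whether retrieval actually helps you before touching fine-tuning.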
if you do go multi-model, i'd recommend routing at the task level (file type + complexity) rather than trying to do it dynamically. much easier to debug when something breaks.
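and by static task-level routing i mean something as dumb as this (model names are invented, the complexity check is just a line count - the point is that it's deterministic and debuggable):

```python
from pathlib import Path

# hypothetical model names; the point is the deterministic lookup
MODELS_BY_EXT = {
    ".py": "py-coder-7b",
    ".sql": "sql-coder-3b",
    ".sh": "shell-coder-3b",
}
FALLBACK = "general-coder-7b"

def route(path: str, source: str) -> str:
    """pick a model from file extension, bumping big files to the generalist."""
    # crude complexity proxy: very long files tend to need broader context
    if len(source.splitlines()) > 500:
        return FALLBACK
    return MODELS_BY_EXT.get(Path(path).suffix, FALLBACK)

assert route("migrations/001.sql", "SELECT 1;") == "sql-coder-3b"
```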
on that hardware it makes more sense to run one decent model plus good RAG than to build a zoo of models and routing overhead that eats all your gains
With 16GB RAM and no GPU, it’s more or less a lost cause. 16GB is very little, especially when it’s system RAM rather than VRAM.
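For a sense of scale, here is the rough weight-only arithmetic (a back-of-envelope sketch; it ignores the KV cache and runtime overhead, which only make things worse):

```python
def weights_gb(params_billion: float, bits_per_param: int) -> float:
    """weight memory only: params * bits / 8, expressed in GB."""
    return params_billion * bits_per_param / 8

for params in (7, 13):
    for quant, bits in (("fp16", 16), ("q8", 8), ("q4", 4)):
        print(f"{params}B @ {quant}: ~{weights_gb(params, bits):.1f} GB")

# 7B @ q4 is ~3.5 GB of weights, so it loads in 16 GB of system RAM,
# but CPU-only inference is slow, and fp16 (~14 GB) leaves no headroom at all.
```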