i've been down this road, and the routing layer is the hardest part. i ran something similar with a few 7B models for different tasks - the domain specialization did help, but the overhead of managing multiple models and deciding which one to call for which request erased most of the gains. for your constraints (16GB RAM, no GPU), i'd suggest starting with just RAG on a solid base model before investing in fine-tuning. you can get roughly 80% of the benefit with way less setup pain. if you do go multi-model, i'd recommend routing at the task level (file type + complexity) with fixed rules rather than trying to decide dynamically per request. it's much easier to debug when something breaks.
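to make the task-level routing idea concrete, here's a rough sketch of what i mean. everything in it is an assumption for illustration: the model names, the extension map, and the 200-line "complexity" cutoff are all made up, so swap in whatever fits your setup.

```python
from pathlib import Path

# hypothetical mapping from file extension to a coarse task kind
KIND_BY_EXTENSION = {".py": "code", ".js": "code", ".sql": "sql", ".md": "docs"}

# hypothetical static route table: (task kind, complexity) -> model name
ROUTES = {
    ("code", "simple"): "code-small",
    ("code", "complex"): "code-large",
    ("sql", "simple"): "sql-small",
    ("sql", "complex"): "sql-small",
    ("docs", "simple"): "docs-small",
    ("docs", "complex"): "docs-small",
}
FALLBACK_MODEL = "general"      # used for anything not in the table
COMPLEX_LINE_THRESHOLD = 200    # arbitrary cutoff, tune for your data


def classify(filename: str, text: str) -> tuple[str, str]:
    """Return (kind, complexity) using only the file type and input length."""
    kind = KIND_BY_EXTENSION.get(Path(filename).suffix.lower(), "other")
    line_count = len(text.splitlines())
    complexity = "complex" if line_count > COMPLEX_LINE_THRESHOLD else "simple"
    return kind, complexity


def pick_model(filename: str, text: str) -> str:
    """Look up the model in the static table, falling back to the general one."""
    return ROUTES.get(classify(filename, text), FALLBACK_MODEL)
```

the point of keeping it a plain lookup table is that when a request goes to the wrong model, you can log the `(kind, complexity)` pair and see exactly which rule fired, instead of guessing what a learned router was thinking.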
i think the main value is just reducing ops drag early. your stack (clerk + rds + fastapi) is more flexible, but every auth/infra edge case steals feature time while you’re still finding product signal. with supabase/firebase you’re basically paying to remove moving parts for that phase, and you can peel pieces off later if cost/control starts hurting.