In FR #8, Anh mentioned he's working on introducing a two-model setup:
I am a teacher teaching in an offline classroom, where I need low latency for real-time lip sync over the surround speakers. I also record my lectures on Loom (via a DSLR and capture card).
If I could run both models in parallel, I could get the best audio for my recordings and low latency for the live classroom.
Some minor details on how it should be executed (for my use case):
(i) Fixed latency:
While the Low-CPU model strives for the lowest possible latency, the Best-quality model maintains a constant latency. (To sync audio with video, I was thinking of creating a video buffer on my side; a constant latency would let me configure it only once and reuse it forever. Rough sketch further below.)
(ii) For efficiency:
The ‘Best-quality’ model works on top of the ‘Low-CPU’ model, continuing where Low-CPU left off.
In other words, Low-CPU mode would be a half-baked version of the Best-quality model. That way I am pulling audio from a “single model” at two different stages, rather than having to run “two models” at once. This could save some processing (see the sketch below).
(Disclaimer: I am not sure if this is feasible, or if it even makes sense; I am no techie. It would be awesome if it could work.)
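Just to illustrate what I picture for (i) and (ii), here is a very rough sketch. Every function name and number below is made up by me; it is not the real API, only the shape of the idea:

```python
from collections import deque
import numpy as np

# --- (ii) one model, two taps ----------------------------------------------

def low_cpu_pass(chunk: np.ndarray) -> np.ndarray:
    """Hypothetical cheap first pass: good enough for the live speakers."""
    return chunk  # placeholder

def best_quality_pass(first_pass_out: np.ndarray) -> np.ndarray:
    """Hypothetical second pass that continues where the first pass left off,
    instead of re-processing the raw audio from scratch."""
    return first_pass_out  # placeholder

def process_chunk(raw_chunk: np.ndarray):
    live = low_cpu_pass(raw_chunk)       # tap 1 -> surround speakers (lowest latency)
    recorded = best_quality_pass(live)   # tap 2 -> Loom recording (constant latency)
    return live, recorded

# --- (i) a constant latency lets me delay the video once, forever ----------

MODEL_LATENCY_MS = 200   # made-up number; whatever constant the Best-quality model fixes
FRAME_RATE = 30          # my DSLR / capture card
DELAY_FRAMES = round(MODEL_LATENCY_MS / 1000 * FRAME_RATE)  # 6 frames here

video_buffer = deque()

def delay_frame(frame):
    """Hold each video frame back by a constant DELAY_FRAMES so the picture
    lines up with the delayed Best-quality audio in the recording."""
    video_buffer.append(frame)
    if len(video_buffer) > DELAY_FRAMES:
        return video_buffer.popleft()  # the frame captured DELAY_FRAMES ago
    return None  # still filling the buffer during the first few frames
```

The point being: both feeds would come from the same processing chain, and the only thing I maintain on my side is the one-time, constant video delay.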
Users like me running a live setup (classrooms, lectures, churches, event organizers, etc.) would benefit, and so would indie creators such as co-op gamers and podcasters: low latency helps them communicate with the other speakers, while high quality helps with the stream or recording. It could also save them a ton on sound-treating their rooms.