Abstract: Retrieval-based augmentation enhances large language models (LLMs) by grounding responses in external knowledge. However, in voice-driven assistants that rely on remote cloud retrieval, open ...