Overcoming Latency Bottlenecks in On-Device Speech Translation: A Cascaded Approach with Alignment-Based Streaming MT

Source: arXiv AI Papers

Real-time streaming speech translation on devices cascades automatic speech recognition (ASR) with machine translation (MT), and it still faces significant latency bottlenecks despite advances in ASR technology such as Recurrent Neural Network Transducers (RNN-T). This paper proposes a simultaneous translation approach that balances translation quality against latency, enabling more efficient on-device processing. The approach uses linguistic cues from the ASR system to manage context during translation, and it applies beam-search pruning techniques such as time-out and forced finalization to maintain real-time responsiveness. In bilingual conversational settings, the system outperforms baseline models and narrows the quality gap with traditional non-streaming translation systems.

The results suggest that integrating ASR and MT with alignment-based streaming techniques can significantly reduce latency while keeping translation quality close to that of non-streaming systems. This matters for applications that need immediate translation feedback, such as live conversations and accessibility tools. Future work may focus on further optimizing the balance between latency and quality and on extending the approach to additional languages and more complex conversational scenarios.
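To make the idea of time-out and forced finalization concrete, here is a minimal sketch of a streaming commit policy. It is an illustration under stated assumptions, not the paper's algorithm: the class `StreamingCommitter`, the helper `common_prefix`, and the `timeout_s` latency budget are all hypothetical names. The sketch commits target tokens once every beam hypothesis agrees on them, and if no new token has been committed within the time-out, it forces finalization of the best hypothesis.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


def common_prefix(hypotheses: Sequence[Sequence[str]]) -> List[str]:
    """Return the longest token prefix shared by every hypothesis."""
    prefix: List[str] = []
    for tokens in zip(*hypotheses):
        if all(t == tokens[0] for t in tokens):
            prefix.append(tokens[0])
        else:
            break
    return prefix


@dataclass
class StreamingCommitter:
    """Hypothetical commit policy; not taken from the paper."""

    timeout_s: float  # assumed latency budget between commits
    committed: List[str] = field(default_factory=list)
    last_commit_time: float = 0.0

    def step(self, beam: Sequence[Sequence[str]], now: float) -> List[str]:
        """Process one beam (best hypothesis first) and return newly committed tokens."""
        n = len(self.committed)
        # Prune hypotheses that contradict what has already been shown to the user.
        live = [h for h in beam if list(h[:n]) == self.committed]
        if not live:
            return []
        # Commit whatever all surviving hypotheses agree on.
        new = common_prefix(live)[n:]
        if not new and now - self.last_commit_time >= self.timeout_s:
            # Time-out reached: force finalization of the best hypothesis.
            new = list(live[0][n:])
        if new:
            self.committed.extend(new)
            self.last_commit_time = now
        return new


committer = StreamingCommitter(timeout_s=1.0)
first = committer.step([["how", "are"], ["how", "is"]], now=0.2)
second = committer.step([["how", "are", "you"], ["how", "is", "you"]], now=0.5)
third = committer.step([["how", "are", "you"], ["how", "is", "you"]], now=1.5)
```

In this toy run, the first call commits the agreed token, the second commits nothing because the hypotheses disagree and the budget has not elapsed, and the third forces out the rest of the best hypothesis. Timestamps are passed in explicitly so the behavior is deterministic; a real system would read a clock and would operate on model scores rather than plain token lists.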

👉 Read the original: arXiv AI Papers