Abstract
Crosstalk in audio recordings, where multiple speakers' voices overlap, presents a significant challenge for accurate automatic speech recognition and dialogue transcription in real-time communication systems. This paper introduces an audio processing component that achieves effective crosstalk suppression through the application of sidechain compression and noise gating techniques. Implemented within the AI-powered communication assistance platform CoSy (Communication Support System), the system enhances transcription quality while maintaining low computational resource consumption, enabling real-time operation on edge devices. Evaluation using the custom LibriDialogue dataset demonstrates that the CoSy system achieves superior computational efficiency compared to state-of-the-art models Conv-TasNet and MossFormer2, while maintaining competitive transcription accuracy and reasonable signal quality. The results indicate that the proposed approach effectively mitigates crosstalk and is particularly suitable for practical deployment in scenarios requiring real-time operation on CPU-limited edge devices.
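To illustrate the two techniques named in the abstract, the following is a minimal sketch of sidechain compression followed by noise gating for a two-microphone setup. It is not the CoSy implementation: the function names, the one-pole envelope follower, and all parameter values (threshold, ratio, gate level, attack and release times) are assumptions chosen for illustration only.

```python
import math


def envelope(signal, sample_rate, attack_ms=5.0, release_ms=50.0):
    """One-pole peak envelope follower (hypothetical helper)."""
    a_att = math.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    a_rel = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    env, out = 0.0, []
    for x in signal:
        level = abs(x)
        # Rise quickly when the level exceeds the envelope, fall slowly otherwise.
        coeff = a_att if level > env else a_rel
        env = coeff * env + (1.0 - coeff) * level
        out.append(env)
    return out


def suppress_crosstalk(target, sidechain, sample_rate,
                       threshold_db=-30.0, ratio=4.0, gate_db=-50.0):
    """Duck `target` while `sidechain` (the other speaker's mic) is loud,
    then gate whatever low-level residue remains. All defaults are assumed."""
    side_env = envelope(sidechain, sample_rate)
    ducked = []
    for x, s in zip(target, side_env):
        s_db = 20.0 * math.log10(max(s, 1e-10))
        over_db = s_db - threshold_db
        # Sidechain compression: gain reduction is driven by the OTHER channel.
        gain_db = -over_db * (1.0 - 1.0 / ratio) if over_db > 0 else 0.0
        ducked.append(x * 10.0 ** (gain_db / 20.0))
    # Noise gate: mute samples whose own envelope is below the gate level.
    gate_lin = 10.0 ** (gate_db / 20.0)
    own_env = envelope(ducked, sample_rate)
    return [y if e >= gate_lin else 0.0 for y, e in zip(ducked, own_env)]


def sine(freq_hz, amplitude, sample_rate, n_samples):
    """Test tone generator used only for the usage example."""
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]
```

In this sketch, a quiet leak of speaker B in speaker A's microphone is attenuated whenever speaker B's own microphone is active, while speaker A's speech passes unchanged when speaker B is silent. Both stages are per-sample arithmetic with no learned model, which is consistent with the low computational cost the abstract emphasizes.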
| Field | Value |
|---|---|
| Original language | English |
| Title of host publication | Intelligent Systems and Applications |
| Editors | Kohei Arai |
| Number of pages | 15 |
| Place of Publication | Cham |
| Publisher | Springer Nature Switzerland |
| Publication date | 2025 |
| Pages | 433-447 |
| ISBN (Print) | 978-3-032-00071-2 |
| Publication status | Published - 2025 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs):
- SDG 3 Good Health and Well-being
- SDG 9 Industry, Innovation, and Infrastructure
- SDG 11 Sustainable Cities and Communities
- SDG 12 Responsible Consumption and Production