Efficient and Adaptable Speech Enhancement via Pre-trained Generative Audioencoders and Vocoders
Recent advances in speech enhancement (SE) have moved beyond traditional mask-prediction and signal-prediction methods, turning instead to pre-trained audio models for richer, more transferable features. Models such as WavLM extract informative audio embeddings that improve SE performance. Some approaches use these embeddings to predict masks or combine them with spectral data…
