Audio samples - Real-Time and Accurate: Zero-shot High-Fidelity Singing Voice Conversion with Multi-Condition Flow Synthesis


We separately demonstrate the performance of the RASVC model on long English singing conversion, short Chinese singing conversion, and cross-domain conversion (from Chinese to English). Each type of conversion includes both male and female source singers. All audio used in our experiments was sampled at 16 kHz.

Long English singing conversion: male source to male target

No. Target RASVC converted
1
2
3
4

Long English singing conversion: female source to female target

No. Target RASVC converted
1
2
3
4

Short Chinese singing conversion: male source to male target

No. Target RASVC converted
1
2
3
4

Short Chinese singing conversion: female source to female target

No. Target RASVC converted
1
2
3
4

Cross-domain conversion (from Chinese to English): male source to male target

No. Target RASVC converted
1
2
3
4

Cross-domain conversion (from Chinese to English): female source to female target

No. Target RASVC converted
1
2
3
4