Audio samples - Real-Time and Accurate: Zero-shot High-Fidelity Singing Voice Conversion with Multi-Condition Flow Synthesis
We separately demonstrate the performance of the RASVC model on long English singing conversion, short Chinese singing conversion, and cross-domain conversion (from Chinese to English). Each type of conversion includes both male and female source singers. The audio used in our experiment was sampled at 16 kHz.
Long English singing conversion-male source-male
No. | Target | RASVC converted
1
2
3
4
Long English singing conversion-female source-female
No. | Target | RASVC converted
1
2
3
4
Short Chinese singing conversion-male source-male
No. | Target | RASVC converted
1
2
3
4
Short Chinese singing conversion-female source-female
No. | Target | RASVC converted
1
2
3
4
Cross-domain conversion (from Chinese to English)-male source-male
No. | Target | RASVC converted
1
2
3
4
Cross-domain conversion (from Chinese to English)-female source-female