ligw1998.github.io

Welcome to Diverse and Vivid Sound Generation from Text Descriptions

DEMOs are here!

Text-to-Sound Generation:

These samples were generated with our best model. The text descriptions are taken from the AudioCaps test set, so each generated clip can be compared directly with the corresponding real recording from AudioCaps. For each example, we provide both the generated sound and the original sound for comparison.

(1) Motorcycle starting then driving away. generated: reference:

(2) A man speaking as birds are chirping. generated: reference:

(3) A man speaking on a microphone followed by a crowd of people laughing then applauding. generated: reference:

(4) A female voice and then a male voice followed by the female voice again. generated: reference:

(5) Motorcycle starting then driving away. generated: reference:

(6) Food is frying, and a woman talks. generated: reference:

(7) A woman talking followed by a young girl talking while an infant cries. generated: reference:

(8) A man talking as a door slams shut followed by a door creaking. generated: reference:

(9) A motor is revving and changing gears. generated: reference:

(10) Water is trickling, and a man talks. generated: reference:

(11) Very strong wind is blowing, and leaves are rustling on the trees. generated: reference:

(12) A person is snoring. generated: reference: