Voicebox is a generative AI model for speech that can generalize, with state-of-the-art performance, to tasks it was not specifically trained for. Unlike existing speech synthesizers, it can be trained on diverse, unstructured data without requiring carefully labeled inputs. Voicebox uses a new approach called Flow Matching, Meta's latest advancement in non-autoregressive generative models, which can learn a highly non-deterministic mapping between text and speech.
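The paragraph names Flow Matching without showing what it trains or how it generates. The sketch below illustrates the general idea, not Voicebox's implementation. It assumes the conditional flow matching objective with a straight-line (optimal-transport) path from Gaussian noise `x0` to a data sample `x1`, and it omits the text and audio conditioning that a speech model would need. A velocity model is regressed onto the path's velocity at a random time `t`. Generation then integrates that velocity field from noise at `t = 0` to data at `t = 1`, with all frames produced together instead of one step at a time. All function names, the `sigma_min = 1e-4` default, and the toy "oracle" model are hypothetical illustrations.

```python
import random
from typing import Callable, List

Vector = List[float]
# Hypothetical interface: a velocity model maps (x_t, t) to a velocity of the same length as x_t.
VelocityModel = Callable[[Vector, float], Vector]


def ot_interpolate(x0: Vector, x1: Vector, t: float, sigma_min: float) -> Vector:
    """Point on the conditional OT path: x_t = (1 - (1 - sigma_min) * t) * x0 + t * x1."""
    scale = 1.0 - (1.0 - sigma_min) * t
    return [scale * a + t * b for a, b in zip(x0, x1)]


def ot_target_velocity(x0: Vector, x1: Vector, sigma_min: float) -> Vector:
    """Regression target u_t = x1 - (1 - sigma_min) * x0, constant along the straight path."""
    return [b - (1.0 - sigma_min) * a for a, b in zip(x0, x1)]


def cfm_loss(model: VelocityModel, x1: Vector, rng: random.Random,
             sigma_min: float = 1e-4) -> float:
    """Single-sample conditional flow matching loss (mean squared error over dimensions)."""
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # noise sample x0 ~ N(0, I)
    t = rng.random()                          # time t ~ U[0, 1)
    xt = ot_interpolate(x0, x1, t, sigma_min)
    target = ot_target_velocity(x0, x1, sigma_min)
    pred = model(xt, t)
    return sum((p - u) ** 2 for p, u in zip(pred, target)) / len(x1)


def euler_sample(model: VelocityModel, dim: int, steps: int, rng: random.Random) -> Vector:
    """Generate by integrating dx/dt = model(x, t) from t = 0 to t = 1 with fixed-step Euler."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from noise
    h = 1.0 / steps
    for k in range(steps):
        v = model(x, k * h)
        x = [xi + h * vi for xi, vi in zip(x, v)]
    return x


def make_oracle(x1: Vector, sigma_min: float = 1e-4) -> VelocityModel:
    """Toy stand-in for a trained network: the exact velocity field when the data is a single point x1."""
    def oracle(x: Vector, t: float) -> Vector:
        denom = 1.0 - (1.0 - sigma_min) * t
        return [(b - (1.0 - sigma_min) * a) / denom for a, b in zip(x, x1)]
    return oracle


if __name__ == "__main__":
    rng = random.Random(0)
    x1 = [0.5, -1.0, 2.0]
    oracle = make_oracle(x1)
    print("loss:", cfm_loss(oracle, x1, rng))                  # near zero for the exact field
    print("sample:", euler_sample(oracle, len(x1), 10, rng))   # lands within sigma_min-scale noise of x1
```

Because the assumed path is a straight line, the exact velocity is constant along each trajectory, so even a coarse Euler integrator tracks it well in this toy case. A real model would replace `make_oracle` with a neural network trained by minimizing `cfm_loss` over a dataset.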