Adobe's hub for all things AI, Firefly, is central to its latest innovations. The company announced a ton of AI-powered updates at its Max creative conference on Tuesday. While the rest of us have been obsessing (and worrying) over OpenAI's new Sora AI slop app, Adobe is headed in a different direction: Its newest features are for generating AI audio.
Adobe was the second big tech company to introduce AI-generated audio to its AI video model, following Google's Veo 3. Its previous AI audio tool was primarily focused on sound effects: You could record yourself roaring like a monster, and the AI would keep the cadence of your recording while beefing up the sound. Now, Adobe is building on its audio tools and introducing new ones.
Generate soundtrack and generate speech do exactly what they suggest: You can create background music and turn written scripts into narration for your video. But each comes with industry-first perks that make them enticing for any creator. Both are available in beta now.
Adobe is also releasing its latest, fifth-generation Firefly Image Model. It's better at producing photorealistic images, and you can now use prompt-based editing. There's also a new Firefly video editor, a multitrack timeline that's meant to help you manage AI-generated clips. Adobe is expanding its partnerships with two new AI companies, ElevenLabs and Topaz Labs. And with Adobe, you'll also be able to create your own custom AI models. For even more AI news, you can learn about the AI assistants coming to Photoshop and Express.
Generating speech
Generating speech in Firefly is simple, and it includes a lot of features that'll make it useful for nearly any project. The tool is a single window where you type in the words you want the AI voice to read. You can also upload a script of up to 7,500 characters -- roughly enough for a 15- to 20-minute video. Once uploaded, you can choose from 50 voices, each tagged with an approximate age and gender, including nonbinary options. You can generate speech in 20 different languages. But the fun part is what you can do to fine-tune your prompt.
Speech is more than just reading words on a page. When we read long passages or talk with others, we naturally add emphasis, emotion and rhythm to our speech. With the new program, you can do the same, adding pauses where you want the AI to take a breather and highlighting sections where the tone should shift.
If you're like me and nobody pronounces your name right on the first try, you can use the "fix pronunciation" tool to ensure there aren't any flubs. Select the name or proper noun and then add a phonetic breakdown, and the AI will use that to smooth out the pronunciation.
These tools, along with your hands-on ability to adjust specific sections, are meant to give you more control, something other text-to-speech programs don't always offer.
"It's a way for us to provide lifelike speech to creators, to small business owners, to educators, to everybody that really just has a story to tell, and maybe they're not as comfortable as we are just pulling out a mic and talking," Jay LeBoeuf, Adobe's head of AI audio, said in an interview.
Firefly audio is a brand-new AI model. But that's not your only option. Adobe has been steadily adding to its roster of third-party AI models this year, for both AI video and image generation. It's expanding those choices again by including ElevenLabs' multilingual V2 model as an option for generating speech.
Generate music and soundtracks
Music licensing is complicated, especially for commercial use. So let me start with the part that matters most: Any music generated with Firefly's generate soundtrack is given a universal license, which means you can use it for any purpose, indefinitely. Adobe creates its AI tools by using content (in this case, audio) that it has permission to use for AI training. So in theory, you shouldn't have Firefly AI audio removed from YouTube or other platforms or get a dreaded copyright strike.
"This is a unique time in the world where music licensing is on the top of everybody's mind and creators are just either frustrated because they're trying to do the best thing for their content, or they're confused," said LeBoeuf. "So we're just hoping to remove the confusion."
In a demo, Firefly rejected a prompt that included an artist's name, flagging it as a violation of the user guidelines over copyright concerns. Because the model isn't trained on Taylor Swift's music, for example, it can't create music similar to hers.
Now, the fun stuff: Generate soundtrack is the first AI music tool from Adobe, and it's designed to take the guesswork out of what you want. You upload your video, and the AI analyzes it. Based on its assessment, Firefly will write a prompt it thinks may work well for your video. It's a Mad Libs-style prompt, and you can swap out the descriptors as you see fit. The prompt has three parts: describing the general vibe, style (think genre) and purpose (commercial, experimental, etc.). You can also adjust the tempo and energy level.
Once you're happy with your prompt, click generate, and in less than two minutes, four music variations will be ready for you to play. Your audio will match the length of your video, but you can edit that as needed. You can upload videos that are up to five minutes long.
For more, check out how Adobe's Project Indigo camera app works, now with iPhone 17 support.
