Anon's Guide to A.I. Music Remixes Using AudioCraft

Q: What is AudioCraft?

A group of text-to-music models from Meta - basically, the music version of DALL-E or Stable Diffusion. One of the models, musicgen-melody, also accepts input audio, which lets us remix songs.

Q: Is this the thing that lets me make my waifu sing songs?

No, that's RVC (Retrieval-based Voice Conversion).

Google Colab Notebook *(Recommended)*

If you have an Nvidia graphics card with less than 12GB of VRAM, or an AMD graphics card, you'll need to use the Google Colab for now.

Step 1: Running the Colab

Go to the Google Colab URL:

https://colab.research.google.com/github/camenduru/MusicGen-colab/blob/main/MusicGen_ClownOfMadness_plus_colab.ipynb

Then save a copy to your Drive:

File -> Save a copy in Drive

Then run the cell.

After everything is installed, you should see something that looks like this:

Running on local URL:  http://127.0.0.1:7860
Running on public URL: https://af28e4e30f71282200.gradio.live

Visit the public URL to access the Gradio web UI.

Step 2: Picking the right settings

Under the MusicGen -> Settings tab

  • change Output Audio Channels from stereo to stereo effect (this improves audio quality)
  • change the model from large to melody so we can prompt with a base track
  • for Decoder, change Default to MultiBand_Diffusion to get higher-quality outputs

Under the MusicGen -> Audio tab

  • change Input Audio Mode from sample to melody (sample continues the input audio; melody follows its melody)
  • then, upload an input audio prompt

Under the MusicGen -> Generation tab

  • adjust number of prompts
  • type prompts
  • adjust durations

Q: How can I make my output sound closer to the melody I provided?

You can adjust the temperature setting in MusicGen -> Settings to change how strictly the melody is adhered to. A value of 0.1 will try to make the output nearly identical to the melody, while 0.95 or higher will give it more freedom.
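
For intuition, here's a minimal sketch of what a sampling temperature does in general. This is not AudioCraft's own code: the three scores are made up and `softmax_with_temperature` is a hypothetical helper. The idea is that a low temperature piles almost all the probability onto the top-scoring choice, so the output gets more predictable, while a higher one spreads it out.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three candidate next tokens (illustration only).
logits = [2.0, 1.0, 0.5]

for t in (0.1, 0.95):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At 0.1 nearly everything lands on the first candidate; at 0.95 the other two get a real share.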

Step 3: Download and share your best (and worst) gens!

Linux Local Install Guide *(Nvidia Only)*

Requirements

The GitHub repo recommends 16GB of VRAM, but I had no issues running it on my 3060 with 12GB of VRAM. It used about 9-11GB depending on the number of generations and the generation length (~30s x 5 styles). It's possible you could get it working on a 10GB card. Overall, generation length is fairly limited, but you get the obvious benefits of running locally.
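
If you're not sure how much VRAM your card has, here's a rough sketch that asks `nvidia-smi` for it. It assumes `nvidia-smi` is on your PATH (it ships with the Nvidia driver); the helper names and the verdict wording are made up for this guide, and the thresholds are just the 12GB and 10GB figures mentioned above.

```python
import shutil
import subprocess

def parse_vram_mib(text):
    """Parse nvidia-smi output with one memory.total value (MiB) per line."""
    values = []
    for line in text.splitlines():
        line = line.strip()
        if line.isdigit():  # skip blanks and things like "[N/A]"
            values.append(int(line))
    return values

def query_vram_mib():
    """Return total VRAM per GPU in MiB, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    return parse_vram_mib(result.stdout)

def vram_verdict(mib):
    """Rough verdict based only on the numbers quoted in this guide."""
    if mib >= 12 * 1024:
        return "should be fine"
    if mib >= 10 * 1024:
        return "might work"
    return "use the Colab instead"

for index, mib in enumerate(query_vram_mib()):
    print(f"GPU {index}: {mib} MiB, {vram_verdict(mib)}")
```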

Step 1: Clone the Repo

Clone AudioCraft Plus, then move into the repo folder:
git clone https://github.com/GrandaddyShmax/audiocraft_plus.git
cd audiocraft_plus

Step 2: Create a Virtual Environment with Conda or venv

- Using Conda:

Create a conda virtual environment with conda create --name audiocraft python==3.10.9
Then activate it with conda activate audiocraft

- Using venv:

Create a virtual environment with python3 -m venv ./venv/
Then activate it with source ./venv/bin/activate

Step 3: Install Dependencies

AudioCraft requires Python 3.9 or above and PyTorch 2.0.0 or above.
It's best to make sure you have torch installed first, in particular before installing xformers.

# Don't run this if you already have PyTorch installed.
pip install torch torchaudio
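
To double-check that your environment meets the minimums stated above, here's a small sketch using only the standard library. `version_tuple` and `torch_version` are hypothetical helper names; the check only looks at the leading numbers of the version string, so a build suffix like `+cu118` is ignored.

```python
import re
import sys
from importlib import metadata

def version_tuple(version):
    """Extract the leading numeric parts, so '2.1.0+cu118' becomes (2, 1, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())

def torch_version():
    """Return the installed torch version string, or None if torch is missing."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None

python_ok = sys.version_info >= (3, 9)
found = torch_version()
torch_ok = found is not None and version_tuple(found) >= (2, 0, 0)

print(f"Python {sys.version.split()[0]}: {'ok' if python_ok else 'too old'}")
print(f"torch {found}: {'ok' if torch_ok else 'missing or too old'}")
```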

Then we can install the rest of the dependencies:

# If you get an error with xformers, pin a specific version in requirements.txt
# e.g. xformers==0.0.22
pip install -r requirements.txt

Finally, we install audiocraft itself (run this from inside the cloned repo folder):

pip install -e .

If you want to use .mp3 files, you need ffmpeg installed, either through your system or Anaconda:

sudo apt-get install ffmpeg
# Or if you are using Anaconda or Miniconda
conda install "ffmpeg<5" -c conda-forge
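
A quick way to confirm ffmpeg is actually visible from your environment is sketched below. `ffmpeg_path` is a made-up helper name; it only checks that an `ffmpeg` executable is on your PATH, not which version it is.

```python
import shutil

def ffmpeg_path():
    """Return the full path to the ffmpeg executable on PATH, or None if absent."""
    return shutil.which("ffmpeg")

path = ffmpeg_path()
if path is None:
    print("ffmpeg not found on PATH; .mp3 files will probably not load")
else:
    print(f"ffmpeg found at {path}")
```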

Also consider installing Jupyter so you can run the .ipynb notebook files:

pip install jupyter

Step 4: Starting the WebUI

Run the following command:

python app.py

Then go to the local URL: http://127.0.0.1:7860

Step 5: Picking the right settings

Under the MusicGen -> Settings tab

  • change Output Audio Channels from stereo to stereo effect (this improves audio quality)
  • change the model from large to melody so we can prompt with a base track
  • for Decoder, change Default to MultiBand_Diffusion to get higher-quality outputs

Under the MusicGen -> Audio tab

  • change Input Audio Mode from sample to melody (sample continues the input audio; melody follows its melody)
  • then, upload an input audio prompt

Under the MusicGen -> Generation tab

  • adjust number of prompts
  • type prompts
  • adjust durations

Q: How can I make my output sound closer to the melody I provided?

You can adjust the temperature setting in MusicGen -> Settings to change how strictly the melody is adhered to. A value of 0.1 will try to make the output nearly identical to the melody, while 0.95 or higher will give it more freedom.
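
If you'd rather skip the web UI, the same melody-plus-prompt idea can be sketched against the upstream AudioCraft Python API. Treat this as a sketch under assumptions, not what the web UI does internally: the model id `facebook/musicgen-melody` is the upstream name (older versions call it `melody`, and the AudioCraft Plus fork may differ), `generation_kwargs` and `remix` are made-up helper names, and the output file names are arbitrary. It doesn't cover the stereo effect or MultiBand_Diffusion settings.

```python
def generation_kwargs(duration_s, temperature):
    """Build keyword arguments for MusicGen.set_generation_params."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return {"duration": duration_s, "temperature": temperature}

def remix(melody_path, prompts, duration_s=10, temperature=0.95):
    """Generate one clip per prompt, each following the melody in melody_path."""
    # Imported lazily so the helper above works without audiocraft installed.
    import torchaudio
    from audiocraft.models import MusicGen
    from audiocraft.data.audio import audio_write

    # Assumed model id; downloads the weights on first use.
    model = MusicGen.get_pretrained("facebook/musicgen-melody")
    model.set_generation_params(**generation_kwargs(duration_s, temperature))

    melody, sample_rate = torchaudio.load(melody_path)
    # One copy of the melody per prompt: shape (batch, channels, samples).
    melodies = melody[None].expand(len(prompts), -1, -1)
    clips = model.generate_with_chroma(prompts, melodies, sample_rate)

    for index, clip in enumerate(clips):
        # Writes remix_0.wav, remix_1.wav, ... in the current directory.
        audio_write(f"remix_{index}", clip.cpu(), model.sample_rate, strategy="loudness")

# Hypothetical file name and prompts:
# remix("my_song.mp3", ["80s synthwave", "lofi hip hop"])
```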

Step 6: Share your best (and worst) gens!

TODO:

Find someone to write a guide for Windows
Find out whether there's a way to add AMD support
Pub: 12 Oct 2023 23:45 UTC
Edit: 13 Oct 2023 17:51 UTC