Anon's Guide to A.I. Music Remixes Using AudioCraft

Q: What is AudioCraft?

A group of text to music diffusion models - basically, the music version of Dall-E or Stable Diffusion. One of the models, musicgen-melody, lets you use input audio as well, letting us remix songs.

Q: Is this the thing that lets me make my waifu sing songs?

No, that's RVC

Google Collab Notebook * (Recomended) *

If you have an Nvidia graphics card with less than 12gb VRAM, or and AMD graphics card, you'll need to use the Google Collab for now.

Step 1: Running the Collab

Go to the Google Collab URL:

https://colab.research.google.com/github/camenduru/MusicGen-colab/blob/main/MusicGen_ClownOfMadness_plus_colab.ipynb

Then make a local copy in your drive

File -> Save a copy in Drive

Then, run the Cel.

After everthing is installed, you should be given something that looks like this:

⎗

1 2	Running on local URL: http://127.0.0.1:7860 Running on public URL: https://af28e4e30f71282200.gradio.live

Visit the public URL to access the gradio web ui

Step 2: Picking the right settings

Under the MusicGen -> Settings tab

change Output Audio Channels from stereo to stereo effect, this improves audio quality
change the model from large to melody so we can prompt with a base track
for Decoder, change Default to MultiBand_Diffusion to get higher quality outputs

Under the MusicGen -> Audio tab

change Input Audio Mode from sample to melody (change from continue audio mode to follow audio melody mode)
then, upload an input audio prompt

Under the MusicGen -> Generation tab

adjust number of prompts
type prompts
adjust durations

Q: How can I make my prompt sound closer the melody I provided?

You can adjust the temperature setting in MusicGen -> Settings to change how strictly melody is adhered to. A value of 0.1 will try to make it nearly identical to the melody, while 0.95 or higher will give it more freedom.

Linux Local Install Guide (Nvidia Only)

Requirements

The github repo recommends 16GB of VRAM, but I had no issues running it on my 3060 with 12GB of VRAM, though it used about 9-11GB depending on number of generations and generation length (~30s x 5 styles). It's possible you could get it working with a 10GB. Overall, it's fairly length limited, but has the obvious local benefits.

Step 1: Clone the Repo

Clone AudioCraft Plus:
git clone https://github.com/GrandaddyShmax/audiocraft_plus.git

Step 2: Create a Virtual Environment with Conda or venv

- Using Conda:

Create a conda virtual environment with conda create --name audiocraft python==3.10.9
Then activate it with conda activate audiocraft

- Using venv:

Create a virtual environment with venv like python3 -m venv ./venv/
Then activate it with source ./venv/bin/activate

Step 3: Install Dependencies

AudioCraft requires Python 3.9 or above and PyTorch version 2.0.0. or above
It's best to make sure you have torch installed first, in particular before installing xformers.

⎗
✓
#Don't run this if you already have PyTorch installed.
pip install torch torchaudio

Then we can install the rest of the dependencies

⎗
✓
# if you have an error with xformers, specify a specific version number in requirements.txt
# ie: xformers==0.0.22
pip install -r requirements.txt

Finally, we install audiocraft

⎗

1	pip install -e .

If you want to use .mp3 files, you need ffmpeg installed, either through your system or Anaconda:

⎗
✓
sudo apt-get install ffmpeg
# Or if you are using Anaconda or Miniconda
conda install "ffmpeg<5" -c conda-forge

Also consider installing jupyter so you can run the .ipynb notebook files

⎗

1	pip install jupyter

Step 4: Starting the WebUI

Run the following command:

⎗

1	python app.py

Then, go to the local url: http://127.0.0.1:7860

Step 5: Picking the right settings

Under the MusicGen -> Settings tab

change Output Audio Channels from stereo to stereo effect, this improves audio quality
change the model from large to melody so we can prompt with a base track
for Decoder, change Default to MultiBand_Diffusion to get higher quality outputs

Under the MusicGen -> Audio tab

change Input Audio Mode from sample to melody (change from continue audio mode to follow audio melody mode)
then, upload an input audio prompt

Under the MusicGen -> Generation tab

adjust number of prompts
type prompts
adjust durations

Anon's Guide to A.I. Music Remixes Using AudioCraft

Q: What is AudioCraft?

Q: Is this the thing that lets me make my waifu sing songs?

Google Collab Notebook * (Recomended) *

Step 1: Running the Collab

Step 2: Picking the right settings

Q: How can I make my prompt sound closer the melody I provided?

Linux Local Install Guide (Nvidia Only)

Requirements

Step 1: Clone the Repo

Step 2: Create a Virtual Environment with Conda or venv

- Using Conda:

- Using venv:

Step 3: Install Dependencies

Step 4: Starting the WebUI

Step 5: Picking the right settings

Q: How can I make my prompt sound closer the melody I provided?

TODO:

Find someone to write a guide for windows

Find if there's a way to add AMD support

Anon's Guide to A.I. Music Remixes Using AudioCraft

Q: What is AudioCraft?

Q: Is this the thing that lets me make my waifu sing songs?

Google Collab Notebook * (Recomended) *

Step 1: Running the Collab

Step 2: Picking the right settings

Q: How can I make my prompt sound closer the melody I provided?

Step 3: Download and share your best (and worst) gens!

Linux Local Install Guide *(Nvidia Only)*

Requirements

Step 1: Clone the Repo

Step 2: Create a Virtual Environment with Conda or venv

- Using Conda:

- Using venv:

Step 3: Install Dependencies

Step 4: Starting the WebUI

Step 5: Picking the right settings

Q: How can I make my prompt sound closer the melody I provided?

Step 6: Share your best (and worst) gens!

TODO:

Find someone to write a guide for windows

Find if there's a way to add AMD support

Linux Local Install Guide (Nvidia Only)