Anon's Guide to A.I. Music Remixes Using AudioCraft
Q: What is AudioCraft?
A suite of text-to-music generative models from Meta - basically, the music version of DALL-E or Stable Diffusion. One of the models, musicgen-melody, also accepts an input audio track, which lets us remix songs.
Q: Is this the thing that lets me make my waifu sing songs?
No, that's RVC.
Google Colab Notebook *(Recommended)*
If you have an Nvidia graphics card with less than 12GB of VRAM, or an AMD graphics card, you'll need to use the Google Colab for now.
Step 1: Running the Colab
Go to the Google Colab URL:
Then make a local copy in your drive
File -> Save a copy in Drive
Then, run the cell.
After everything is installed, you should see something like this:
Visit the public URL to access the Gradio web UI.
Step 2: Picking the right settings
Under the MusicGen -> Settings tab:
- change Output Audio Channels from stereo to stereo effect; this improves audio quality
- change the model from large to melody so we can prompt with a base track
- for Decoder, change Default to MultiBand_Diffusion to get higher quality outputs
Under the MusicGen -> Audio tab:
- change Input Audio Mode from sample to melody (i.e. from continue-the-audio mode to follow-the-melody mode)
- then, upload an input audio prompt
Under the MusicGen -> Generation tab:
- adjust the number of prompts
- type your prompts
- adjust the durations
Q: How can I make my output sound closer to the melody I provided?
You can adjust the temperature setting in MusicGen -> Settings to change how strictly the melody is adhered to. A value of 0.1 will try to make the output nearly identical to the melody, while 0.95 or higher gives it more freedom.
Step 3: Download and share your best (and worst) gens!
Linux Local Install Guide *(Nvidia Only)*
Requirements
The GitHub repo recommends 16GB of VRAM, but I had no issues running it on my 3060 with 12GB of VRAM; it used about 9-11GB depending on the number of generations and generation length (~30s x 5 styles). It's possible you could get it working with a 10GB card. Overall, it's fairly length limited, but has the obvious local benefits.
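If you're not sure what your card has, nvidia-smi (the standard NVIDIA driver tool, nothing AudioCraft-specific) will report total and in-use VRAM:
nvidia-smi --query-gpu=name,memory.total,memory.used --format=csv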
Step 1: Clone the Repo
Clone AudioCraft Plus:
git clone https://github.com/GrandaddyShmax/audiocraft_plus.git
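Then move into the cloned directory; the remaining install steps assume you're at the repo root:
cd audiocraft_plus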
Step 2: Create a Virtual Environment with Conda or venv
- Using Conda:
Create a conda virtual environment with conda create --name audiocraft python==3.10.9
Then activate it with conda activate audiocraft
- Using venv:
Create a virtual environment with venv: python3 -m venv ./venv/
Then activate it with source ./venv/bin/activate
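Either way, it's worth confirming the environment is active before installing anything:
python --version   # should report 3.9+ (3.10.9 if you used the conda command above)
which python       # should resolve inside your audiocraft/venv environment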
Step 3: Install Dependencies
AudioCraft requires Python 3.9 or above and PyTorch 2.0.0 or above.
It's best to make sure you have torch installed first, in particular before installing xformers.
Then we can install the rest of the dependencies and, finally, audiocraft itself.
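A minimal sketch of that sequence, run from the repo root and assuming the fork keeps a requirements.txt there like upstream audiocraft:
pip install "torch>=2.0.0" torchaudio   # torch first, before anything that builds against it
pip install -r requirements.txt         # the repo's pinned dependencies
pip install -e .                        # audiocraft itself, editable from the clone
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"   # sanity check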
If you want to use .mp3 files, you need ffmpeg installed, either through your system package manager or conda. Also consider installing jupyter so you can run the .ipynb notebook files.
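For example (the conda pin mirrors the upstream audiocraft README's suggestion; the apt line assumes Debian/Ubuntu):
sudo apt install ffmpeg                   # system package manager route
conda install "ffmpeg<5" -c conda-forge   # or the conda route
pip install jupyter                       # optional, for the .ipynb notebooks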
Step 4: Starting the WebUI
Run the following command:
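(The filename is an assumption: in this fork's upstream-style layout, the Gradio UI is launched from app.py at the repo root.)
python app.py   # starts the web UI, listening on port 7860 by default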
Then, go to the local URL: http://127.0.0.1:7860
Step 5: Picking the right settings
Under the MusicGen -> Settings tab:
- change Output Audio Channels from stereo to stereo effect; this improves audio quality
- change the model from large to melody so we can prompt with a base track
- for Decoder, change Default to MultiBand_Diffusion to get higher quality outputs
Under the MusicGen -> Audio tab:
- change Input Audio Mode from sample to melody (i.e. from continue-the-audio mode to follow-the-melody mode)
- then, upload an input audio prompt
Under the MusicGen -> Generation tab:
- adjust the number of prompts
- type your prompts
- adjust the durations
Q: How can I make my output sound closer to the melody I provided?
You can adjust the temperature setting in MusicGen -> Settings to change how strictly the melody is adhered to. A value of 0.1 will try to make the output nearly identical to the melody, while 0.95 or higher gives it more freedom.