Cloning voices with AI, generating AI audio

And AI music covers

Here are the requirements for this guide:

In other words, there is still a chance you can use already-trained RVC files to generate AI voices AND do XTTS quick-training if you have less than 6 GB of VRAM.

If you meet this requirement we can begin, but first, a little disclaimer:

Disclaimer

I am not responsible or liable for any kind of misuse or illegal use of this guide.
At no point here do I encourage any kind of illicit action, nor do the developers of the software used here.
By using this guide and this software you agree not to use them for any illegal or unlawful content.

Leaving that aside, now we can start with the guide!

This is the table of contents for this guide:

Required knowledge used through all the tutorial.
Required Software:
Software to prepare a dataset to clone a voice with AI:
The Audio dataset creation:
Installing RVC Mangio Fork GUI:
Training with RVC Mangio Fork GUI:
Using trained AI voices with XTTS and RVC:
Making covers Automatically and Manually:
1. Making covers automatically with AICoverGen UI
2. Making covers manually with UltimateVocalRemover and Audacity:

There are some things you need to know how to do:

Required knowledge used through all the tutorial.

Opening a command prompt (CMD) window on a specific folder: This is super simple, just click the address bar of your folder, type "cmd" and press Enter to open a command prompt window in that specific location:

Opening notepad: You probably know very well how to do this, but if not just press the windows start button and type "notepad", the application should appear there ready to open.

Getting the full location of a file: Another awfully simple thing to do, just right click a file, select properties, go to the security tab and you will have the full path of a file to copy and paste:

Required Software:

7zip
Python 3.10.x
GIT
FFMPEG (existing in PATH, not just in a folder)
Visual C++ build tools.

Make sure NONE of the software in this guide is installed on OneDrive. Paths like C:\onedrive\user\RVC are no good. This software is critical to unpack and run the AI applications we will use.

First 7zip, which we will use to open compressed 7z files. If you can already open 7z files, ignore this part. Go to https://www.7-zip.org/download.html, download the one that says 64-bit Windows x64 and install it. Once it is installed, open it and go to options. Two columns will appear, one for your username and one for all the PC users. Click on your username column where it says 7z, like in this gif:

You should be able to open 7z files with 7zip now.

Now Python 3.10. Ignore this step if you already have Python on your PC. Go to https://www.python.org/downloads/release/python-3108/ and download the 64-bit installer at the bottom. I personally chose 3.10.8, but any 3.10 version should be fine.

When you install you MUST MAKE SURE that "Add python.exe to PATH" is checked like in the screenshot here, otherwise you will have problems later.

When you are done, press the "Disable Path length limit" option:

Now to install GIT.
GIT allows us to clone the repositories that contain the AI audio tools onto our PC, so we can work locally without needing internet.
Go to https://git-scm.com/download/win and download the 64-bit version:

Now it's time to install GIT. By default it will add GIT to the context menu (right click menu). You will be asked a lot of stuff, just keep pressing "next", the default options are good.

Finally, FFMPEG. FFMPEG is a collection of open source and free libraries to manipulate audio and video. It is arguably the ultimate tool to convert your videos, audio and manipulate them, providing BETTER capabilities than paid software. Go to https://www.gyan.dev/ffmpeg/builds/ and get the FULL ffmpeg 7z file. You need 7zip to open it.

Use 7zip to open it and drag the folder inside to another location on your PC. I prefer the C: drive but it's not necessary. Rename it to "ffmpeg" to simplify things in the following steps:

Inside the "ffmpeg" folder there is a subfolder called "bin", in this folder are the ffmpeg executables, this is the folder you want to add to PATH.
Click the directory bar to copy the address, like this:

There are multiple ways to open the environment variables menu. The quick way is to press the start button and start typing "environment variable", which brings up the menu we want:

From there you have to choose "Environment variables", then on System variables select "Path" and press "Edit", finally you press "New" and add the path to FFMPEG.

There IS a small chance you won't have the "Edit" button enabled if you enter this way. In that case you can get there manually by going to the start menu, writing "Control Panel", then "System and security", then "System", followed by scrolling all the way to the bottom to "Advanced System settings" and clicking on it. Like here:

Either way, once you have put ffmpeg in the PATH, press OK on all the windows and sub-windows opened until they are all closed.
To check if you actually put ffmpeg folder on PATH, you can open a CMD window and write "PATH":

As you can see FFMPEG is there, so it was correctly added to path, if it's not there you either put the path to ffmpeg wrong or you didn't click "ok" to close all the windows related to environment variables.
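If you prefer to check from Python instead of reading the PATH output by eye, here is a small sketch. It assumes Python is already installed as described earlier, and the function name find_tools is just something made up for this example. It relies on shutil.which from the standard library, which looks a program up in PATH and returns None when it can't find it.

```python
import shutil


def find_tools(names):
    """Return {program name: full path, or None if it is not in PATH}."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # The program names below are the ones this guide asks you to install.
    for tool, location in find_tools(["ffmpeg", "git", "python"]).items():
        print(f"{tool}: {location or 'NOT FOUND in PATH'}")
```

Remember to open a NEW CMD window before running it, because windows that were already open keep the old PATH.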

Visual C++ build tools are needed by XTTS and AICoverGen, here is the link: https://visualstudio.microsoft.com/visual-cpp-build-tools/
Here download this:

Then click desktop development with C++ and install:

It should end looking like this:

Now that we have all the required software, it's time to get the software to make an audio dataset AND the AI tools.

Software to prepare a dataset to clone a voice with AI:

To clone a voice you need a dataset that consists of clean, noise-less, echo-less, reverb-less audio of said voice. How many minutes are required differs from option to option. RVC is the most popular option because it's free and offers the best quality compared to other alternatives.
A dataset for RVC should be around 10-13 minutes of your subject speaking, but if your dataset quality is exceptionally high, you can get a great result with as little as 5 minutes. The software we are going to use to create this clean, noise-less, echo-less, reverb-less dataset is the following:

1. Audacity
2. Ultimate Vocal Remover

Audacity is a free software to record and manipulate audio. To download it just go to https://www.audacityteam.org/download/windows/ and install it, there isn't much to say about this for now:

Ultimate Vocal Remover is a free application that uses AI algorithms to remove noise, echo and reverb from audio, among other things. To download it go to https://github.com/Anjok07/ultimatevocalremovergui/releases/tag/v5.6 and download the main link there. Make sure to install it on the C: drive, otherwise you might get errors.

The Audio dataset creation:

Now that both programs are installed, let's check how to use them. Audacity has the ability to record your physical microphone OR every sound you hear on your headphones, like the audio of a twitch stream, the audio of a video call or the audio of a youtube video. Open Audacity, press the Audio Setup button and click the Audio Settings option:

In the host, use WASAPI, which is arguably the best one:

In the recording devices, you might see your microphone (if you have one) and your headphones (loopback), choose the one you will use:

If you choose loopback because you are going to record what you hear on your headphones, make sure the volume on your PC, headphones and video is at the highest, because THAT will be the volume level it will record. If your volume is low, the audio will be low too.
Anyway, here is my audio:

If you already have a video with the audio and want to extract the audio, open a command prompt window in the folder where your video file is and write the following:

ffmpeg -i "your video here.mkv" -vn -ac 2 -acodec pcm_s16le audio.wav

Obviously, replace "your video here.mkv" with the name of your video file, which might have a different extension such as .mp4 or .avi. For example, "video name.mp4". I am going to use a .mkv video file as an example:
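If you have many videos, the same command can be launched from Python. This is only a sketch under a couple of assumptions: ffmpeg is already in PATH, and the function names build_extract_command and extract_audio are made up for this example. The ffmpeg options are exactly the ones from the command above.

```python
import subprocess


def build_extract_command(video_path, wav_path="audio.wav"):
    """Build the ffmpeg argument list used in this guide."""
    return [
        "ffmpeg",
        "-i", str(video_path),   # input video file
        "-vn",                   # ignore the video stream
        "-ac", "2",              # two audio channels (stereo)
        "-acodec", "pcm_s16le",  # uncompressed 16-bit PCM, a regular WAV
        str(wav_path),           # output file
    ]


def extract_audio(video_path, wav_path="audio.wav"):
    """Run ffmpeg and raise CalledProcessError if it reports a failure."""
    subprocess.run(build_extract_command(video_path, wav_path), check=True)
```

Passing the arguments as a list means a file name with spaces, like "your video here.mkv", needs no extra quoting.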

There is the possibility that in your audio there are other people speaking rather than just one person, that is quite normal.
What you have to do now is manually delete all the voices that do not belong to the person's voice you want to clone.
To do this, you have to select on Audacity the section of the audio you want to delete and press the DEL key, or CTRL and K to delete sections of audio until you only have your voice, like on this gif:

After that you are pretty much done with Audacity for now, all that is left is to save the audio, to do this go to File and Export Audio and name your dataset and where it will be saved:

Now, it's necessary to improve the quality of this audio so it becomes fit to use in AI voice cloning. This is where Ultimate Vocal Remover comes in.

Open Ultimate Vocal Remover, this is the user interface:

There are two methods we will use (VR Architecture and MDX-NET) and each has their own models, the models do stuff like remove noise, remove echo, remove music, etc.

Click on Select Input or the empty bar to choose your audio file recorded with Audacity. (If you click on the folder icon you will be asked to choose a folder instead.) By default there are just 1-2 models, so you need to download more. To do so, open the drop-down menu on models, click the option to download more, check the method you want and the model for that method, then press download, like this GIF shows:

These are some models you should get:

1. MDX-Net: Kim Vocal 1
2. MDX-Net: Reverb HQ
3. VR Architecture: UVR-DeNoise
4. VR Architecture: UVR-De-Echo Normal

After you have downloaded them they will appear on the drop menu and you can use them.
Notice that each model comes with its own options. For example, the DeNoise model asks you if you want to save the file with no noise or if you want to save only the noise itself. Obviously you want to save the no-noise version in that case. The same goes for the others:

The idea here is to take the Audacity audio and run it through a model, then the RESULT of that will be run through the next model, the RESULT of that one will be run through another model, and so on. This is how my folder with the audio dataset ended up after going through the 4 models:

Your folder should also look like that. You want to use the file that went through all the models I mentioned and has only the vocals without noise, echo or reverb. All that is left to do is to remove the silence from your dataset. To do this open Audacity again and open that file there (File > Open).
Your audio should look like this:

You can clearly see the voices and the silences there. Now, press ctrl and A at the same time to select all the audio and go to Effect > Special > Truncate Silence.

The default option of -20 dB is fine, just press apply and it will start working. Now the silence in your audio dataset will be removed, leaving only the voice:

And you are DONE with the dataset! You have a good dataset you can use to clone a voice with RVC!
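Before moving on you may want to confirm how long the finished dataset is. Here is a minimal sketch using Python's built-in wave module. It assumes the file was exported as a regular uncompressed PCM WAV, and the function names are made up for this example. The 5 and 10 minute thresholds are the figures mentioned earlier in this guide.

```python
import wave


def wav_duration_seconds(path):
    """Length of a PCM WAV file in seconds (frame count / sample rate)."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def describe_dataset_length(seconds):
    """Compare a duration against the dataset sizes this guide recommends."""
    minutes = seconds / 60
    if minutes < 5:
        return "short: under 5 minutes, probably not enough audio"
    if minutes < 10:
        return "usable: 5-10 minutes, fine if the quality is very high"
    return "good: 10 minutes or more"
```

For example, describe_dataset_length(wav_duration_seconds("dataset.wav")) would describe a file called dataset.wav, which is a placeholder name.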

Installing RVC Mangio Fork GUI:

RVC is currently the best tool to train a voice checkpoint with AI. And the Mangio fork is the best fork of RVC and easy to install. Go to https://huggingface.co/MangioRVC/Mangio-RVC-Huggingface/tree/main and download Mangio-RVC-v23.7.0_INFER_TRAIN.7z:

Open it with 7zip and then drag it to another folder of your choosing. Now it's time to install tensorboard in a virtual environment for RVC. In the RVC folder, open a CMD window and put the following lines one by one:

python -m venv venv
venv\scripts\activate
pip install tensorboard

Should look like this, if you ran them one by one:

Once you are done installing tensorboard just close the CMD window. Now open a notepad and paste the following text:

@echo off
call venv\Scripts\activate
tensorboard --logdir logs
pause

So it looks like this:

Save it as "LaunchTensorboard.bat" on the RVC folder, making sure that you are NOT saving it as a .txt but as "All files" like the image shows. The name is not important as long as it ends in .bat:

Training with RVC Mangio Fork GUI:

First of all, launch "LaunchTensorboard.bat" and go to the address you are given:

It is not REALLY necessary to run tensorboard, but it will help you get a good result when training. Once you opened the tensorboard site you can minimize it, we won't focus on that for now, instead we will open RVC. Run go-web.bat on the Mangio folder and go to the address given to see the RVC user interface. Once you have your dataset ready, go to the train tab in RVC. Here is a quick summary of what you'll find in the RVC train tab:

These settings are good:

Setting	Value
Version	v2
Target sample rate	40k
Number of CPU processes used for pitch extraction and data processing	2
Select the pitch extraction algorithm	rmvpe
Whether to save only the latest .ckpt file to save hard drive space	Checked
Cache all training sets to GPU memory	NOT Checked
Save a small final model to the 'weights' folder at each save point	Checked

The number of epochs, the save frequency and the batch size all depend on your dataset and GPU. Around 300-400 epochs could be fine and saving every 10-20 is good too. Give a name to your project (that will be the name of the trained file). Mine is named "TestingPlum", but choose a fitting name for your own project. Also, enter the path where your dataset is:

That should be where your audio file or files are. In my case I put my audio files in a folder named "DatasetPlum" and put that in RVC's dataset folder.

Now this is important: you CAN get errors if you give your folder a bad name with spaces like "my dataset 123", or a very generic one like "files" that might confuse RVC. Give it a distinctive name that won't cause errors. Likewise, do NOT have spaces anywhere in the path to your dataset, like "F:\ new folder \dataset". Also, if you give your project a name you already used it won't overwrite the old one, instead it will raise errors.
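These path rules are easy to check before you press any button. Here is a small sketch; the function name dataset_path_problems is made up for this example, and it only tests the two things this guide warns about (spaces and OneDrive), nothing that RVC itself documents.

```python
def dataset_path_problems(path):
    """Return a list of warnings for a dataset path, empty if it looks fine."""
    problems = []
    if " " in path:
        problems.append("path contains spaces")
    if "onedrive" in path.lower():
        problems.append("path is inside OneDrive")
    return problems
```

An empty list means the path passed both checks; anything else tells you what to rename or move.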

Once you have your folder path press "Process Data" and wait until it says it is done in the CMD window:

At this point you might get this error:

This happens because inside your dataset folder there are OTHER non-audio files that RVC cannot process. Make sure your dataset only has the audio files to process.
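To spot the offending files quickly, something like the following sketch can list everything in the dataset folder that does not look like audio. The function name and the list of extensions are assumptions made for this example, so adjust the set to whatever formats you actually use.

```python
from pathlib import Path

# Assumption: extensions treated as audio for this check.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}


def non_audio_files(dataset_dir):
    """Names of files in dataset_dir whose extension is not an audio one."""
    return sorted(
        entry.name
        for entry in Path(dataset_dir).iterdir()
        if entry.is_file() and entry.suffix.lower() not in AUDIO_EXTENSIONS
    )
```

Move or delete whatever it lists, then press "Process Data" again.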

Now press the "Feature Extraction" button and wait until it is done:

Before pressing the start training button, press "Train Feature Index" and wait until it says "Successful Index Construction", it shouldn't take more than a few seconds. After that you can finally press the start training button:

Now, if everything went well (which should be the case if you followed the steps) it should start training and after some seconds you will be able to see in the CMD window that it's beginning with epoch 1:

Now let's go to the tensorboard window and press the update button on the top right. Your project name should appear there, along with some graphs. Make sure to check both of your project checkboxes like in the gif here:

Select the "Scalar" tab to see the correct charts. Uncheck "Ignore outliers in chart scaling". Increase the smoothing bar from 0.6 to 0.999 and in the search bar write total/g/loss to search for the graph we want:

This is the chart where we follow the training. The horizontal axis shows the steps and the vertical axis shows the loss value. To update the chart press the update button and then the blue icon on the right to fit all the data inside the chart, like this:

Once your training has finished, your chart should look like this:

But WHAT does this chart even mean? This chart gives us the information to determine how many epochs we should have used for good results and when the overtraining began. When the line hardly shows any change over time we can say that we are overtraining and won't get better results:

So, if we put our mouse over the Tensorboard chart we will get very useful information:

Around this point my voice AI training stopped improving significantly. It is step 3330, at around 13:53:00. But which epoch is THAT? To find out, go to the "Logs" folder in RVC's folder. Inside you will see a folder with your project name, and inside that one you will see lots of files and a train.txt text file. Open it.

Here we can see a detailed log of the training. If we go to step 3330, which was trained at 13:53:00, we can see RVC was training epoch 207 at this point. So, I should have chosen to train for 207 epochs, maybe 220 to be sure, rather than 500. Oh well, I can always try again!
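The "where does the line go flat" judgement can also be roughed out in code. This is just a sketch of the idea, not something TensorBoard or RVC does for you: the function name, the window size and the minimum improvement are all assumptions made for this example, and you would feed it loss values you copied or exported yourself (preferably smoothed ones, since the raw loss is very noisy).

```python
def first_plateau_index(values, window=3, min_improvement=0.01):
    """Index of the first point where the loss stops improving.

    A point counts as a plateau when the value `window` points later is
    not at least `min_improvement` lower. Returns None if the loss keeps
    improving all the way to the end.
    """
    for i in range(len(values) - window):
        if values[i] - values[i + window] < min_improvement:
            return i
    return None
```

The index it returns is a position in your list of values, so you still have to look up which step and epoch that point belongs to in the training log, as described above.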

The training is done, but what now? Where are the trained AI voice files? There are 2 files: the weight .pth file and the .index file (the feature index). The weight is in the "Weights" folder in RVC and the index is in the logs folder, the one named added_IVF etc etc.

The name PlumTesting_e250_s6750 means PlumTesting at epoch 250 and step 6750. So you can choose which epoch to try, but I personally prefer to re-train from scratch up to the correct one. You only need those 2 files to use your trained RVC AI voice checkpoint anywhere that allows RVC checkpoints, like this:
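If you end up with a folder full of saved checkpoints, the naming pattern described above can be read automatically. A minimal sketch, assuming every file follows the name_eEPOCH_sSTEP pattern shown in this guide; the function name parse_checkpoint_name is made up for this example.

```python
import re

# name, then "_e" + epoch number, then "_s" + step number
_CHECKPOINT_RE = re.compile(r"^(?P<name>.+)_e(?P<epoch>\d+)_s(?P<step>\d+)$")


def parse_checkpoint_name(filename):
    """Split 'Name_e250_s6750[.pth]' into (name, epoch, step), else None."""
    stem = filename[:-4] if filename.lower().endswith(".pth") else filename
    match = _CHECKPOINT_RE.match(stem)
    if match is None:
        return None
    return match["name"], int(match["epoch"]), int(match["step"])
```

Files that do not follow the pattern, such as the final model without epoch and step in its name, simply return None.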

If you train another voice with RVC, make sure that you are on the correct tab on tensorboard (which is scalars) and that the only checked boxes are the ones for the project you are currently training, like this:

That covers all the RVC training.

Using trained AI voices with XTTS and RVC:

Installation of XTTS:

Now that we have an RVC trained AI voice, it's time to use it! For this you can just record yourself speaking with Audacity OR, if you don't like that, you can use an AI text-to-speech application. XTTS is one of the best text-to-speech AI tools I have found. Compared to other options, XTTS has a quick voice cloning feature, allowing you to clone a voice with just 7-10 seconds of audio.
Let's install it!

First, open a CMD window where you want to install it and write this: git clone https://github.com/pbanuru/xtts2-ui.git

After that a folder called "xtts2-ui" should have been created. Close the CMD window, enter the xtts2-ui folder and open a CMD window there. Now paste these lines one at a time and press enter:

python -m venv venv
venv\scripts\activate

It should look like in the screenshot here:

Now paste this and press enter: pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Finally, paste this and press enter: pip install -r requirements.txt

And we are done, you can close the CMD window. Open notepad and write the following in it:

Name it run.bat and save it on the xtts2-ui folder, it doesn't matter the name as long as it ends in .bat AND you are saving it as "All files", not as a .txt file:

Running XTTS to make speech from text:

You will use that file to run XTTS. Click it and a CMD window should open, go to that local URL on your web browser. It is a URL that only applies for your PC, and will stop working if you close the CMD window:

Most applications (rvc, xtts, etc) will run on a local host site that exists solely on your PC. Make sure you are only running one of them at a time, because sometimes both want to use the same local host address, which will cause one of them to not be able to run at all.
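If one of the tools refuses to start, you can check whether something is already listening on the address it wants. A small sketch using Python's socket module; the function name port_in_use is made up for this example, and the port number you pass in should be the one at the end of the URL printed in the CMD window.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """True if some program is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(0.5)
        # connect_ex returns 0 when the connection succeeds
        return probe.connect_ex((host, port)) == 0
```

For a URL like http://127.0.0.1:7860 you would call port_in_use(7860). That number is only an example, so use whatever your own window shows.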

Here we have the XTTS ui:

You can write text and a default voice (Rogger) will speak it if you press generate. XTTS generation is non-deterministic, which means you can get VERY different results even if you use the same audio source, so keep generating until you get something decent! You can also add a 7-10 second audio clip and it will mimic that voice instead.

If you will use a downloaded model or got it from another person instead of training one to generate AI audio.... Generate an audio with a quality you like in XTTS with the default voice and go to the 6C section below.

If you trained a model and have a dataset.... Getting an audio clip is very easy. Using Audacity you can open an audio file, zoom in by pressing the magnifying glass icon, select a portion of the audio and go to File > Export Audio. Make VERY SURE you check the current selection option when saving, just like in this gif:

I named mine crop.wav. If you drop the audio in xtts2-ui it will look like this:

Do not use the cropping tools here, they are bugged. If you generate audio this should be the result. You have a download button, use it to download the audio and go on to the next step.

If your audio sounds good you can choose to stop here. If you want it to sound EVEN BETTER then download the audio you generated, close XTTS, go to RVC folder and open RVC using go-web.bat file:

Using RVC to clone the voice generated with XTTS (or make it better):

If you downloaded a model or got it from another person instead of training it.... Mangio RVC saves the .pth models in Mangio-RVC-v23.7.0\weights and the .index (the file named "added_IVF_xxxxxxx") in Mangio-RVC-v23.7.0\logs\Voice-names-folder\ so you have to make a folder inside the logs folder named like your .pth model and put the .index inside it.
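If you install downloaded models often, the copying can be scripted. This is a sketch of the folder layout described above and nothing more: the function name install_rvc_model is made up for this example, and it assumes the weights and logs folders sit directly inside the RVC folder you pass in.

```python
import shutil
from pathlib import Path


def install_rvc_model(rvc_root, pth_file, index_file):
    """Copy a .pth into weights and its .index into logs/<model name>."""
    rvc_root = Path(rvc_root)
    pth_file = Path(pth_file)
    index_file = Path(index_file)

    weights_dir = rvc_root / "weights"
    index_dir = rvc_root / "logs" / pth_file.stem  # folder named like the .pth
    weights_dir.mkdir(parents=True, exist_ok=True)
    index_dir.mkdir(parents=True, exist_ok=True)

    new_pth = weights_dir / pth_file.name
    new_index = index_dir / index_file.name
    shutil.copy2(pth_file, new_pth)
    shutil.copy2(index_file, new_index)
    return new_pth, new_index
```

The originals are copied, not moved, so the downloaded files stay where they were.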

If you trained a model and have a dataset.... You should already have the .pth and .index files in their respective folders. All that is left to do is take the voice generated and pass it through RVC. Choose the model, the index (if your model doesn't appear in the drop-down list, press the refresh models button) and the path to your audio:

There are some settings that will help you get a better result, i have summarized them here:

Once you chose the settings just press generate and in a few seconds you'll have your cloned audio ready!:

By the way, every single audio you generate with either RVC or XTTS will also be saved as a backup in C:\Users\PUT-YOUR-WINDOWS-USERNAME-HERE\AppData\Local\Temp\gradio\ inside folders with long names like 3jmloju43oi15414o5njuo4uj5. You might want to delete them to save some space. You can enter that folder by pasting that path in the directory bar and pressing enter (obviously putting your PC username, mine is Plum):

They are named exactly as the generated audio, so you can also search them!
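To see how much space those backups are taking before deleting anything, here is a small sketch. It assumes the backups live in a "gradio" folder inside your Windows temp folder, as described above, and the function names are made up for this example. It only measures the folder, it does not delete anything.

```python
import tempfile
from pathlib import Path


def gradio_temp_dir():
    """The 'gradio' folder inside the current user's temp folder."""
    return Path(tempfile.gettempdir()) / "gradio"


def folder_size_bytes(folder):
    """Total size of all files under folder, or 0 if it does not exist."""
    folder = Path(folder)
    if not folder.is_dir():
        return 0
    return sum(p.stat().st_size for p in folder.rglob("*") if p.is_file())


if __name__ == "__main__":
    size_mb = folder_size_bytes(gradio_temp_dir()) / (1024 * 1024)
    print(f"{gradio_temp_dir()} uses {size_mb:.1f} MB")
```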

That covers cloning AI voices, now making music covers with AI voices!

Making covers Automatically and Manually:

Making covers automatically with AICoverGen UI

AICoverGen requires Sox to be in PATH, so let's download it and put it in PATH. Go to https://sourceforge.net/projects/sox/files/sox/ and press the green button:

Then simply install it, there is no need for desktop and start menu shortcuts, just copy the path where you installed it:

If you didn't copy it, go to the folder where it was installed and copy the path by clicking on the directory bar:

Just like with FFMPEG, add it to PATH in the environment variables:

Now that the requirements are met, it's time to install AICoverGen. Most AI tools use python 3.10.x in order to work, some REQUIRE specifically 3.10.x and refuse to work with other versions. However, in the case of AICoverGen you are required to use python 3.9.x. But there is a workaround for that limitation.

Open a CMD window where you want to install AICoverGen and paste this: git clone https://github.com/SociallyIneptWeeb/AICoverGen

Now go inside AICoverGen folder, you should see a txt file called "requirements.txt". Open it and erase everything in it, once the .txt file is empty, paste this list:

deemix==3.6.6
fairseq==0.12.2 
faiss-cpu==1.7.3 
ffmpeg-python==0.2.0 
gradio==3.39.0 
lib==4.0.0
librosa==0.9.1 
numpy==1.23.5
onnxruntime_gpu==1.16.3 
praat-parselmouth==0.4.3
pedalboard==0.7.7 
pydub==0.25.1
pyworld==0.3.4 
Requests==2.31.0
scipy==1.11.1
soundfile==0.12.1
torch==2.0.1
torchcrepe==0.0.20
tqdm==4.65.0 
yt_dlp==2023.7.6
sox==1.4.1
pycryptodomex==3.19.1
deezer-py==1.3.7
mutagen==1.47.0
click==8.1.7
sacrebleu==2.4.0
hydra-core==1.0.7
cffi==1.16.0
regex==2023.12.25
bitarray==2.9.1
cython==3.0.7
future==0.18.3
pyyaml==6.0.1
uvicorn==0.25.0
packaging==23.2
altair==5.2.0
gradio-client==0.8.0
matplotlib==3.8.2
markupsafe==2.1.3
pandas==2.1.4
websockets==11.0.3
semantic-version==2.10.0
ffmpy==0.3.1
aiofiles==23.2.1
markdown-it-py<3.0.0
httpx==0.26.0
pydantic==2.5.3
pillow==10.1.0
jinja2==3.1.2
orjson==3.9.10
python-multipart==0.0.6
aiohttp==3.9.1
huggingface-hub==0.20.1
fastapi==0.108.0
mdit-py-plugins==0.3.3
typing-extensions==4.9.0
numba==0.58.1
resampy==0.4.2
decorator==5.1.1
joblib==1.3.2
pooch==1.8.0
scikit-learn==1.3.2
audioread==3.0.1
flatbuffers==23.5.26
coloredlogs==15.0.1
protobuf==4.25.1
sympy==1.12
urllib3==2.1.0
certifi==2023.11.17
idna==3.6
charset-normalizer==3.3.2
networkx==3.2.1
filelock==3.13.1
colorama==0.4.6
brotli==1.1.0
lxml==5.0.0
portalocker==2.8.2
tabulate==0.9.0
antlr4-python3-runtime==4.8
pycparser==2.21
h11==0.14.0
toolz==0.12.0
jsonschema==4.20.0
fsspec==2023.12.2
kiwisolver==1.4.5
fonttools==4.47.0
contourpy==1.2.0
importlib-resources==6.1.1
pyparsing==3.1.1
python-dateutil==2.8.2
cycler==0.12.1
tzdata==2023.4
pytz==2023.3.post1
mdurl==0.1.2
linkify-it-py==2.0.2
anyio==4.2.0
sniffio==1.3.0
httpcore==1.0.2
pydantic-core==2.14.6
annotated-types==0.6.0
yarl==1.9.4
attrs==23.1.0
aiosignal==1.3.1
async-timeout==4.0.3
frozenlist==1.4.1
multidict==6.0.4
starlette==0.32.0.post1
llvmlite==0.41.1
platformdirs==4.1.0
threadpoolctl==3.2.0
humanfriendly==10.0
mpmath==1.3.0
pywin32==306
rpds-py==0.16.2
jsonschema-specifications==2023.12.1
referencing==0.32.0
zipp==3.17.0
six==1.16.0
uc-micro-py==1.0.2
exceptiongroup==1.2.0
pyreadline3==3.4.1

It should end up looking exactly like this:

These are the specific dependencies that AICoverGen uses in python 3.9.x.

Now, we are going to make a virtual environment. Go inside the AICoverGen folder, open a CMD window and type the following, one line at a time:

python -m venv venv
venv\scripts\activate

Now you are going to paste the following while on the (venv):
python -m pip install torch==2.0.1+cu118 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

It might take a while, after you are done and still on (venv) write pip install -r requirements.txt

Once you are done with that it's time to download the AICoverGen required checkpoints. To do so, paste python src/download_models.py while in the (venv):

And we are done! Now let's make a file to actually run AICoverGen. In the AICoverGen folder, make a txt file (right click, new, text document) and write the following in it:

@echo off
call venv\Scripts\activate
src\webui.py
pause

Call this file run.bat and save it on the AICoverGen folder. It can be any name as long as it ends in .bat AND you are saving it as "All files", like the screenshot here shows:

You will use this file to run AICoverGen.

The process is SUPER simple because almost everything is done behind the scenes.

Put your RVC models in the rvc_models folder inside the AICoverGen folder:

Open AICoverGen. All you have to do is pick an RVC voice, paste a youtube link (or the path to your song if you have it on your PC), adjust the settings and press generate. Of course, there are settings you SHOULD change. I will use a song from my PC here, but you can use a youtube link too! HOWEVER, it's likely that youtube won't allow a download and you will get an error.

Now, it's important that the path to your songs does NOT have special symbols and is not too long. A path like F:\Playlist\Mysongs\2024\epicsongs\metal\somefoldername\another_folder_just_because\Super-Cool-Song!!_[Concert~version].mp3
will most likely fail to work because of the symbols like ~ [] !! - _ in the MP3 file name. Try to keep it simple.

Now comes one of the most important settings, the pitch here:

If the gender of your AI voice and of the song's singer match then you don't need to change anything. If not, you might need to change the number. Positive for converting a song originally sung by a male to an AI female cover. Negative for converting a song originally sung by a female to an AI male cover. While the bar moves from 1 to 2 you CAN use values like 0.4 or -0.35 or 1.35.
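To get a feeling for what those numbers do to the voice, here is a tiny sketch of the arithmetic. It rests on an assumption that this guide does not state: that the pitch value is measured in octaves, which is how I remember the slider being labelled in the AICoverGen interface. If your version labels it differently, ignore this. The function names are made up for this example.

```python
def pitch_ratio(octaves):
    """Frequency multiplier for a shift of `octaves` (assumed unit)."""
    return 2.0 ** octaves


def octaves_to_semitones(octaves):
    """An octave has 12 semitones."""
    return 12 * octaves
```

Under that assumption a value of 1 doubles every frequency of the vocals, -1 halves them, and 0.5 is a shift of 6 semitones.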

You can change things like how much of the accent to keep on the cover:

And the individual volumes and reverb of the song:

After that all you need to do is press generate and the application will do the rest of the job.

Making covers manually with UltimateVocalRemover and Audacity:

If you read the entire guide up to this point, you should have a good idea of how to do this. If not, I am going to draw a diagram to show you:

Sometimes you might want the reverb or chorus as another track too, but it's entirely optional, it really depends on the song. I am just mentioning it to show there is the possibility to keep some extra reverb in the AI cover. So, take a song and pass it through Ultimate Vocal Remover to divide it into main vocals, accompanying vocals and instruments.

Here are some models and what they are good at:

They are not ALL the models, but it should give you a good idea that some models are made specifically to get the instrumentals/secondary vocals in amazing quality. Speaking of quality, there is a value to keep in mind: The Window and Segment size.

Remember that Ultimate Vocal Remover uses A LOT of resources, so increasing this value might crash your PC. Now open RVC and run the vocals through your model.

Once you have your AI vocals it's time to join them to the instrumentals and other parts in Audacity. Open audacity and import the tracks just like in this gif:

Notice your vocals are NOT stereo, they are mono. We need to convert them to stereo to improve the quality. To do this you need to select the AI vocal track, duplicate it, select both and join them as stereo by using the track menu, just like this gif shows:
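Audacity is the easy way to do this, but for the curious, here is the same duplicate-the-channel idea sketched with Python's built-in wave module. It assumes the vocals are an uncompressed PCM WAV file, and the function name mono_to_stereo is made up for this example. Every sample is simply written twice, once for the left channel and once for the right.

```python
import wave


def mono_to_stereo(src_path, dst_path):
    """Write a stereo copy of a mono PCM WAV by duplicating each sample."""
    with wave.open(str(src_path), "rb") as src:
        if src.getnchannels() != 1:
            raise ValueError("expected a mono file")
        width = src.getsampwidth()      # bytes per sample
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    stereo = bytearray()
    for i in range(0, len(frames), width):
        sample = frames[i:i + width]
        stereo += sample + sample       # left channel, then right channel

    with wave.open(str(dst_path), "wb") as dst:
        dst.setnchannels(2)
        dst.setsampwidth(width)
        dst.setframerate(rate)
        dst.writeframes(bytes(stereo))
```

Both channels carry identical audio, so the result sounds the same as the mono file but lines up with the stereo instrumental track.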

Now comes the interesting part: Improving things. There is an incredibly large number of plugins and effects for Audacity that can make your voice have more "body", your instrumentals sound more crisp or deeper, and a big big etc. I will absolutely not go through all of that, otherwise this guide would easily be 10 times bigger than it already is, plus there is no one-size-fits-all method to improve things. Sometimes you want louder instrumentals, or more crisp voices, it all depends on your song.

You'll have to google to look for Audacity plugins and how they can improve your song. However, I'll teach you how to do some general optimizations with Audacity. There are MANY settings here you have to play with to get the optimal results. First, you can add some reverb to your AI vocals. Think of reverb as a type of echo that gives depth to sounds. Select the track, go to effects, then delay and reverb, and choose reverb. Like this gif:

You can also normalize the audio. Normalizing applies one volume change to the whole track so that its loudest peak lands on a target level (-1 dB by default in Audacity), which brings a quiet track up to a consistent volume. To do this, select the track, go to effects, volume and compression, and choose normalize, like here:
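As a sketch of the arithmetic behind peak normalization: one gain is worked out from the loudest sample and then applied to every sample. The function name peak_normalize is made up for this example, the samples are assumed to be floats between -1.0 and 1.0, and the -1 dB default target is an assumption you can change.

```python
def peak_normalize(samples, target_db=-1.0):
    """Scale samples so the loudest one reaches target_db (0 dB = 1.0)."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0:
        return list(samples)            # silence or empty: nothing to scale
    target_peak = 10 ** (target_db / 20)  # convert dB to linear amplitude
    gain = target_peak / peak
    return [s * gain for s in samples]
```

Because every sample is multiplied by the same gain, the balance between quiet and loud parts stays exactly as it was.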

If you want to increase the volume you can always use the compressor special effect, as seen on this gif:

You can also apply a treble boost to your audio, basically enhancing the clarity and articulation of your vocals, which is obviously a good thing. You can find this in effects>EQ and filters>filter curve EQ>Presets>Treble boost, just like on this gif:

Probably the most interesting thing you can do is create AI cover duets. Basically running 2 AI voices with RVC so you have 2 AI singers for the same song, copy-pasting where you want singer A to sing, singer B to do so and when you want BOTH.

To do this just import both vocal tracks and copy/paste/delete parts as necessary:

And...that would be pretty much it!