Creating Datasets for RVC using iZotope RX

In this guide I'll explain how to use paid software to clean audio for training models.
iZotope RX is known as the go-to software for denoising audio, and it's the one used by "every" good model maker.
I'll mostly be recommending free VSTs/plugins, plus some paid plugins which aren't needed but can make the process easier.
This guide also has a step that uses Audacity, a free program you'll need for audio labeling, which works better than Truncate Silence.
From what I know, this has only recently gotten support in RVC, so make sure you aren't on an old version.

Also, uhm, sorry: I used Google Drive as a hopefully "permanent" image host, but apparently the images only load if you're logged into your Google account. so yea uwu

Someone also made a "copy" with the files hosted on Imgur this time, but you might get errors like:

{"data":{"error":"Imgur is temporarily over capacity. Please try again later."},"success":false,"status":403}

so here's the link: RVC-dataset-RX-imgur
Imgur may also be banned in your country (as it is for me in Iran).
Google Drive is also a pain to host images on: I have to do goofy Inspect Menu stuff to get the direct link, which apparently can also change after a while, either that or it's just a me issue.
As a last-ditch effort if nothing works, just go to the Google Drive folder itself and check the files there: GoogleDrive folder with rentry images

I'm using Mainline RVC (ver. 1006 at the time of writing).
RVC already provides prebuilt versions, so go to the releases tab and download one of these two, for example:

For Nvidia GPU users:
https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/RVC1006Nvidia.7z

For AMD/Intel GPU users:
https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/RVC1006AMD_Intel.7z

Getting a dataset

My preferred way of getting a dataset is Cobalt.
Make sure to go to settings, change the quality to Best, and select audio.

Cobalt (Website)
I mainly use Cobalt because YouTube is banned in my country.

But if you can download locally, use one of these:

YT-DLP command line (no GUI)
Stacher (GUI)
Both work fine, but if you don't want the hassle of learning commands, Stacher is a good choice.
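
If you do go the command-line route, here's a rough sketch of what a yt-dlp call for audio could look like, wrapped in Python. The output folder name `dataset_raw` and the function names are just my placeholders, and `-x` needs ffmpeg installed. Keep in mind that converting to WAV here doesn't add quality, it only avoids another lossy step later.

```python
import shutil
import subprocess


def build_ytdlp_command(url, out_dir="dataset_raw"):
    """Build a yt-dlp command that grabs the best audio stream and saves it as WAV."""
    return [
        "yt-dlp",
        "-f", "bestaudio",                     # pick the best audio-only stream
        "-x",                                  # extract audio (needs ffmpeg)
        "--audio-format", "wav",               # convert the result to WAV
        "-o", f"{out_dir}/%(title)s.%(ext)s",  # output filename template
        url,
    ]


def download_audio(url, out_dir="dataset_raw"):
    """Run the command; raises if yt-dlp is missing or the download fails."""
    if shutil.which("yt-dlp") is None:
        raise RuntimeError("yt-dlp was not found on PATH")
    subprocess.run(build_ytdlp_command(url, out_dir), check=True)
```

Calling `download_audio("<video url>")` would drop a WAV file into `dataset_raw`, ready to open in RX.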

My preferred file format is 24-bit or 32-bit float WAV, but the files are large, so if you want smaller files, FLAC at compression level 8 is your second choice. The other options aren't good.
Avoid using MP3 for your datasets.
If you end up using WAV, make sure it's either 32-bit float or 24-bit.
In the end, 32-bit float and 24-bit are practically the same, so it doesn't matter much which one you pick.
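
To get a feel for how large those WAV files get, here's a small back-of-the-envelope sketch. It only does the arithmetic for uncompressed audio (it ignores the tiny WAV header), and the 10 minute / 48 kHz / mono numbers are just example values I picked.

```python
def wav_size_mb(seconds, sample_rate=48000, bit_depth=24, channels=1):
    """Approximate size of uncompressed WAV audio in megabytes (header ignored)."""
    bytes_per_sample = bit_depth // 8
    return seconds * sample_rate * channels * bytes_per_sample / 1_000_000


ten_minutes = 10 * 60
for depth in (16, 24, 32):
    size = wav_size_mb(ten_minutes, bit_depth=depth)
    print(f"{depth}-bit: {size:.1f} MB")
```

FLAC sizes depend on the audio itself, so there's no simple formula for those.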

Loading the audio and changing some settings

Open the WAV or FLAC file. FLAC takes time to decode, so use WAV if you can, so you don't have to wait a minute for your dataset to get decoded.

Now that we have the file in RX, make sure to turn this so that only the spectrogram is shown, since you don't need the waveform for now.

Opacity

After that it should look like this:
Spectrogram
This is using Mel scaling. If you right-click the frequency numbers (20k and such), you can change the scaling.
Mel is the best scaling in our case, since it shows vocals better than Linear scaling would.
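
If you're curious why Mel gives vocals more room, here's a tiny sketch using one common Hz-to-mel formula (the 2595 / 700 variant). I don't know which exact formula RX uses internally, so treat this as an illustration only.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (common 2595 * log10(1 + f/700) variant)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def axis_share(freq_hz, top_hz, scale):
    """Fraction of the vertical axis that 0..freq_hz takes up, for a given scale function."""
    return scale(freq_hz) / scale(top_hz)


# How much of a 0-20 kHz axis is spent on 0-4 kHz (where most of the voice sits)
linear_share = axis_share(4000, 20000, lambda f: f)
mel_share = axis_share(4000, 20000, hz_to_mel)
print(f"linear: {linear_share:.0%} of the axis, mel: {mel_share:.0%} of the axis")
```

On a linear axis the 0 to 4 kHz range only gets a fifth of the height, while on the mel axis it gets a much larger share, which is why vocals are easier to read.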

The brighter a color on the spectrogram, the louder it is.

The top of the spectrogram is half the sample rate: if the spectrogram goes up to 20k, the file's sample rate is 40 kHz. Basically, the top of the spectrogram × 2 is the actual sample rate.
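
The same relation as a two-line sketch (function names are mine):

```python
def top_of_spectrogram_hz(sample_rate_hz):
    """Highest frequency a file can contain: half its sample rate."""
    return sample_rate_hz / 2


def sample_rate_from_top_hz(top_hz):
    """Sample rate implied by the highest frequency shown on the spectrogram."""
    return top_hz * 2


print(top_of_spectrogram_hz(32000))    # a 32 kHz file tops out at 16k
print(sample_rate_from_top_hz(20000))  # a spectrogram topping out at 20k means 40 kHz
```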

trying to explain a spectrogram

In my opinion the spectrogram can't really be taught; you have to mess around with it and learn it for yourself.
But take this spectrum as an example: if you rotate it onto its side, you get one slice of the spectrogram.
The top of the spectrogram corresponds to the right-most part of the spectrum.

examples

for example, this is noise:
Noise
and this is breathing:

and this is speech:
Speech
As for the rest, I'm too lazy to try and explain it all.

Modules / VSTs used and their settings

I use a couple of the modules in RX.
First I use these:
1. Adaptive Phase Rotation
Phase
2. Normalize to -3 dB
3. De-Ess
4. De-Crackle
5. De-Click
6. Spectral Denoise
7. Deconstruct
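
To show what "Normalize to -3 dB" means in numbers, here's a minimal sketch that assumes plain peak normalization on float samples in the -1..1 range. It's not RX's code, just the math behind the setting.

```python
def db_to_linear(db):
    """Convert a level in dB to a linear amplitude factor."""
    return 10 ** (db / 20)


def normalize_peak(samples, target_db=-3.0):
    """Scale samples so the loudest one lands exactly at target_db (dBFS)."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0:
        return list(samples)  # pure silence: nothing to scale
    gain = db_to_linear(target_db) / peak
    return [s * gain for s in samples]


quiet_clip = [0.1, -0.5, 0.25]
print(normalize_peak(quiet_clip))
```

-3 dB works out to roughly 0.71 of full scale, so the loudest sample ends up a bit below the maximum and leaves some headroom.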
Plugins
1. Auburn Renegate
  https://www.auburnsounds.com/products/Renegate.html
  (basically a noise gate, but better) the free version is more than enough
2. Kilohearts Dynamics
  https://kilohearts.com/products/dynamics
  used for downward expansion
3. Bertom Denoiser Pro
  https://bertomaudio.com/
  for some reason it doesn't work in RX for me, so I use it in Audacity instead
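
If "downward expansion" sounds abstract, here's a rough sketch of the basic gain curve. The threshold and ratio are made-up example values, not the settings I use in the plugins, and real plugins add attack/release smoothing on top.

```python
def expander_gain_db(level_db, threshold_db=-40.0, ratio=2.0):
    """Static downward-expander curve: how many dB to turn the signal down.

    Above the threshold nothing changes. Below it, every dB under the
    threshold becomes `ratio` dB under the threshold.
    """
    if level_db >= threshold_db:
        return 0.0
    return (level_db - threshold_db) * (ratio - 1.0)


for level in (-20.0, -45.0, -60.0):
    print(f"input {level} dB -> gain change {expander_gain_db(level)} dB")
```

A noise gate is basically the extreme case of this: a very high ratio, so anything under the threshold gets pushed down to (almost) silence.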

Denoising the dataset

This audio is at 48 kHz even though there's only about 24 kHz worth of actual data, most likely because it was recorded on a phone. The process is the same either way; it'll just be noisier even with the 32k pretrain, since RVC has to guess the rest of the frequencies.
For now I'll just denoise at 32 kHz (aka 16k on the spectrogram).

This is the audio before I touched it, only resampled to 32 kHz:
Original
Now we select the noise:
Noise-Selected
In this case one pass of Spectral Denoise wasn't enough:
one-pass
The noise profile for the first pass:
Noise-Profile

Second run of Spectral Denoise:
Second-Pass
and the noise profile for the second pass:
second-pass-profile
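
To give an idea of what a noise profile does, here's a heavily simplified sketch of a spectral gate working on one time slice of the spectrogram. This is NOT how RX's Spectral Denoise actually works internally (that's proprietary and far smarter); the threshold and reduction values are example numbers.

```python
def db_to_linear(db):
    """Convert a level in dB to a linear amplitude factor."""
    return 10 ** (db / 20)


def spectral_gate(frame, noise_profile, threshold_db=6.0, reduction_db=12.0):
    """Turn down every frequency bin that isn't clearly above the learned noise level.

    frame         -- magnitudes of one time slice, one value per frequency bin
    noise_profile -- learned noise magnitude for each of those bins
    """
    if len(frame) != len(noise_profile):
        raise ValueError("frame and noise_profile need the same number of bins")
    margin = db_to_linear(threshold_db)   # how far above the noise counts as signal
    cut = db_to_linear(-reduction_db)     # how much to attenuate noisy bins
    return [
        mag * cut if mag <= noise * margin else mag
        for mag, noise in zip(frame, noise_profile)
    ]


noise_profile = [0.02, 0.02, 0.01]   # learned from a noise-only selection
frame = [0.03, 0.5, 0.012]           # the middle bin contains voice
print(spectral_gate(frame, noise_profile))
```

A second pass is the same idea again: learn a fresh profile from what's left and run the gate once more.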

From here on the rest is manual denoising, which I can't really teach.
Just use RX long enough to learn what noise looks like, then clean those spots manually.

Now that we've also run our plugins (Auburn Renegate, kHs Dynamics, and Bertom Denoiser Pro),
we save our file as 32-bit float.
24-bit is just as good, but I like overkill uwu
Now we open Audacity.
First convert your dataset to mono, since RVC works on mono and not stereo.
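
In case you wonder what a mono conversion does to the samples, here's a minimal sketch of one common way to do it: averaging the left and right channels. I'm assuming an interleaved list of float samples here; Audacity does all of this for you from its menus.

```python
def stereo_to_mono(interleaved):
    """Average left/right pairs of an interleaved [L, R, L, R, ...] sample list."""
    if len(interleaved) % 2:
        raise ValueError("expected an even number of samples (L/R pairs)")
    return [
        (interleaved[i] + interleaved[i + 1]) / 2
        for i in range(0, len(interleaved), 2)
    ]


print(stereo_to_mono([0.5, 0.25, -1.0, 1.0]))
```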
Truncate Silence is the old technique; there's a better way, and it's called audio labeling.
Follow these steps:
Open the menu for labeling:
label
label-settings
You will now have it like this:
Labels
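
Roughly, the idea behind labeling is to mark every stretch that contains sound and skip the silence in between. Here's a simplified sketch of that idea; the threshold and minimum-silence values are example numbers, not Audacity's defaults, and Audacity's own detection is more refined. The output uses Audacity's label text layout (start, end, label, separated by tabs).

```python
def find_sound_regions(samples, sample_rate, threshold=0.01, min_silence_s=0.3):
    """Return (start_s, end_s) pairs for stretches that contain sound."""
    min_gap = max(1, round(min_silence_s * sample_rate))
    regions = []
    start = None      # index where the current region began
    last_loud = None  # index of the most recent loud sample
    for i, sample in enumerate(samples):
        if abs(sample) >= threshold:
            if start is None:
                start = i
            last_loud = i
        elif start is not None and i - last_loud >= min_gap:
            # Silence lasted long enough: close the region at the last loud sample
            regions.append((start / sample_rate, (last_loud + 1) / sample_rate))
            start = None
    if start is not None:
        regions.append((start / sample_rate, (last_loud + 1) / sample_rate))
    return regions


def to_audacity_labels(regions):
    """Format regions as tab-separated label lines: start, end, label."""
    return "\n".join(
        f"{begin:.6f}\t{end:.6f}\t{number}"
        for number, (begin, end) in enumerate(regions, 1)
    )
```

Each labeled region then becomes its own file when you export, so the silence never ends up in the dataset.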
Now we export our audio:
Export
Export-Settings
The output will look like this:
output
Now go to the RVC folder and place all these files in the datasets folder.
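
If you have a lot of clips and don't want to drag them over by hand, here's a small sketch that copies every exported WAV into the datasets folder. The function name and both paths are placeholders, so adjust them to where your export and RVC folders actually are.

```python
import shutil
from pathlib import Path


def copy_clips(export_dir, rvc_dir):
    """Copy every .wav from export_dir into <rvc_dir>/datasets and return their names."""
    target = Path(rvc_dir) / "datasets"
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for clip in sorted(Path(export_dir).glob("*.wav")):
        shutil.copy2(clip, target / clip.name)
        copied.append(clip.name)
    return copied
```

For example, `copy_clips("exported_clips", "RVC1006Nvidia")` would copy everything from a hypothetical `exported_clips` folder.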