Resonance Model:
News:
Resonance R1 Lite is training.
All funds raised across all channels will be used to rent infrastructure for training the best possible version of Resonance,
and to help cover the costs of keeping free public access to the model (and to generations, without involving CivitAI) available to anyone who wishes to use it.
https://www.zeffy.com/en-US/fundraising/ec77a176-672f-4aa9-ba4d-4c74607379fd
Donate in US Dollars:
Donate with Worldwide Currencies:
Socials:
- Discord: https://discord.gg/pBwH4AwYzP
- TreeChan: https://treechan.net / https://board.spectrometer.art
- Booru: https://booru.spectrometer.art
The What:
A Stable Cascade based finetune combining anime and furry, since everyone else seems to screw up model training in some manner.
FAQ:
- What is a "Lite" model?
A "Lite" model is a cut-down 1B parameter version of the 3.6B Stage C, intended for hardware with lower performance and less memory.
It is also incompatible with the currently existing ControlNets, which only work with the larger 3.6B Stage C model.
Models with "Lite" somewhere in the name are 1B models. You can also tell whether a model is "Lite" by its file size: roughly 2GB at 16 bit precision, or 4GB at 32 bit precision.
This size difference also means that LoRAs must be created separately for "Lite" and for the larger 3.6B Stage C model.
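The file-size rule above can be turned into a rough check. This is a hypothetical helper (not part of any official tooling), using only the approximate sizes quoted here:

```python
import os

# Rough heuristic based on the sizes quoted above:
#   Lite (1B):   ~2GB at fp16, ~4GB at fp32
#   Full (3.6B): ~7GB at fp16, ~14GB at fp32
def guess_stage_c_variant(size_bytes: int) -> str:
    gb = size_bytes / (1024 ** 3)
    # Anything well under the ~7GB fp16 full model can only be a Lite checkpoint
    return "lite (1B)" if gb < 5 else "full (3.6B)"

def guess_from_file(path: str) -> str:
    # Convenience wrapper for a checkpoint on disk
    return guess_stage_c_variant(os.path.getsize(path))
```

This only inspects file size, so it cannot distinguish fp16 from fp32, but it is enough to tell Lite from full before loading anything.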
GUIs and Related Software:
ComfyUI:
https://github.com/comfyanonymous/ComfyUI/ (Does this really need explanation?)
CAPGUI:
A GUI that resembles Auto1111/Forge, aiming to preserve as many existing habits as possible while providing a smoother user experience.
Also the front end for the aforementioned free generation infrastructure.
Uses ComfyUI as a backend, and is under heavy development:
https://github.com/Jordach/CAPGUI
If you have a piece of software that works with Cascade, let me know on Discord, or in the thread if I'm around.
Downloads:
Resonance:
Note: The non-Lite models of Resonance will use the text encoder trained during the Lite training runs. It is marked resonance_r1_eX_te
because the Text Encoder is shared by both models.
Auto Complete CSVs:
- Anime/Gelbooru: https://cdn.spectrometer.art/resonance_anime.csv
- Furry/e621: https://cdn.spectrometer.art/resonance_furry.csv
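These CSVs are meant for tag autocomplete. As a sketch of how such a file might be consumed (the exact column layout of the Resonance CSVs is an assumption here; common autocomplete CSVs put the tag in the first column):

```python
import csv
import io

def load_tags(csv_text: str) -> list[str]:
    # Assumes the tag is the first column of each row; the real column
    # layout of the Resonance CSVs may differ.
    return [row[0] for row in csv.reader(io.StringIO(csv_text)) if row]

def complete(tags: list[str], prefix: str, limit: int = 10) -> list[str]:
    # Simple prefix match, as an autocomplete UI might do it.
    return [t for t in tags if t.startswith(prefix)][:limit]
```

Autocomplete extensions for Auto1111-style UIs typically accept files in this shape directly, so no conversion should be needed there.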
Lite:
Epoch 7: https://cdn.spectrometer.art/resonance_lite_r1_e7.safetensors
Epoch 7 Text Model: https://cdn.spectrometer.art/resonance_r1_e7_te.safetensors
Epoch 6: https://cdn.spectrometer.art/resonance_lite_r1_e6.safetensors
Epoch 6 Text Model: https://cdn.spectrometer.art/resonance_r1_e6_te.safetensors
Epoch 5: https://cdn.spectrometer.art/resonance_lite_r1_e5.safetensors
Epoch 5 Text Model: https://cdn.spectrometer.art/resonance_r1_e5_te.safetensors
Epoch 4: https://cdn.spectrometer.art/resonance_lite_r1_e4.safetensors
Epoch 4 Text Model: https://cdn.spectrometer.art/resonance_r1_e4_te.safetensors
Epoch 3: https://cdn.spectrometer.art/resonance_lite_r1_e3.safetensors
Epoch 3 Text Model: https://cdn.spectrometer.art/resonance_r1_e3_te.safetensors
Epoch 2: https://cdn.spectrometer.art/resonance_lite_r1_e2.safetensors
Epoch 2 Text Model: https://cdn.spectrometer.art/resonance_r1_e2_te.safetensors
Epoch 1: https://cdn.spectrometer.art/resonance_lite_r1_e1.safetensors
Epoch 1 Text Model: https://cdn.spectrometer.art/resonance_r1_e1_te.safetensors
Epoch "Zero": https://cdn.spectrometer.art/resonance_lite_r1_e0.safetensors
Epoch "Zero" Text Model: https://cdn.spectrometer.art/resonance_r1_e0_te.safetensors
Resonance Prototype Delta/Epsilon:
Notes:
Epsilon is a 3.6B model trained using Delta's text encoder.
Stage C (Lite) Delta cannot be used with ControlNets, as the official ControlNets are published for the 3.6B model; those will work with Stage C Epsilon.
- Furry/e621 Tags: https://gist.github.com/Jordach/aedc6edaafd8abb139b7ec9b6c3965f4/raw/5c153506294ae292af46cf0733ff62e8e537ff14/reso_proto_delta_furry.csv
- Anime/Gelbooru Tags: https://gist.github.com/Jordach/aedc6edaafd8abb139b7ec9b6c3965f4/raw/5c153506294ae292af46cf0733ff62e8e537ff14/reso_proto_delta_anime.csv
Delta:
- Text Encoder: https://static.spectrometer.art/resonance/models/reso_proto_delta_e5_te.safetensors
- Stage C (Lite): https://pixeldrain.com/u/dXRdaWpr
- Combined Model: soonTM
Epsilon:
Future Plans:
WTF Bruh the Zeffy Goal Changed!?
This is less about me and more about you: who wants to fucking pay rent forever to some shitty company just to train models on a dataset this size? I don't, you definitely don't, and neither of us wants it to take forever - the standard Catch-22. However, the box in question is equipped with 8x 32GB GPUs that allow for some meme tier model training in super short time, while also letting me never indulge in the smooth brained move that is commercial hosting of the model.
The Vision Language Model:
The VLM is the result of having dedicated private hardware with zero rental upkeep - meaning projects like community NL submissions can feed into the VLM, resulting in less flowery and more straight-to-the-point image descriptions. I don't want to write an essay to describe someone in a train station in the rain looking a bit sad, when a few booru tags would've been more efficient for the purpose. This VLM finetune will be fully open sourced under MIT, Apache 2, or some other maximally permissive license. Everything made from publicly accessible resources should also be licensed freely to the general public.
Resonance R2:
A 10m (potentially 12m) total image dataset comprising the following:
- 4m e621
- 4m Gelbooru
- 2m rule34.xxx (deduplicated against e621 and Gelbooru; potentially 4m, storage space permitting.)
Furthermore, the full dataset will be given proper natural language captions, and the training step mixture looks like this, unshuffled:
- 100% Tags
- 10% Unconditional Training
- 10% Natural Language
Example:
For 1 million tagged training steps, 100k unconditional steps and 100k natural language steps are added, for 1.2m total steps. These duplicated batches are shuffled together during training so that learning is spread evenly. For the natural language steps, what would have been the inserted tags is swapped for a natural language caption.
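The mixture above can be sketched as a schedule builder; the function name, seed, and use of Python's random module are illustrative only, not the actual training code:

```python
import random

def build_schedule(tag_steps: int, uncond_frac: float = 0.10,
                   nl_frac: float = 0.10, seed: int = 0) -> list[str]:
    # 100% tag steps, plus 10% unconditional and 10% natural language
    # duplicates, shuffled together so learning is spread evenly.
    schedule = (
        ["tags"] * tag_steps
        + ["uncond"] * int(tag_steps * uncond_frac)
        + ["natural_language"] * int(tag_steps * nl_frac)
    )
    random.Random(seed).shuffle(schedule)
    return schedule

# 1m tagged steps -> 1.2m total steps after adding both 10% mixtures
```

The fixed seed just makes the shuffle reproducible for the sketch; real training would shuffle per epoch.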