Win11 - How to install Automatic1111's Stable Diffusion WebUI for a 4090 and get 40+ it/s

Last updated 9/25/2023
Linux guide: https://rentry.org/installing-automatic1111
4090 AI Discord community: https://discord.gg/zVAvFp3wnU
Tested and written by GuruVirus

Prerequisites
Make folder, clone repo
Run 1200 test to confirm configuration is acceptable (minimum 32-37it/s is the target on Windows currently)
Troubleshooting
Additional customizations
Legacy instructions (Anything below this can be ignored)
Install cuDNN 11.x and CUDA 11.8 (can be skipped if installing a 2nd, etc. time)
Install libs while in (venv)
Set your commandline_args
Run webui-user.bat to load SD
Backup your working config
Extra credit: Negative guidance minimum sigma to reach 50+ it/s
1. Getting results with high sigma
2. Installing Xformers (not better than SDP but may decrease VRAM usage allowing higher resolution maximum output):
Troubleshooting
How to restore from a backup (e.g. you installed an extension and it broke SD or you want to use an updated version of SD)

Prerequisites

Install Python 3.10.6
- IMPORTANT: check "Add Python to PATH" at the bottom of the first page of the install wizard.
- Download: https://www.python.org/ftp/python/3.10.6/python-3.10.6-amd64.exe
Install Git (latest version): https://git-scm.com/download/win
Install CUDA 12.1 (Win 11 version; 3.1GB): https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda_12.1.0_531.14_windows.exe
- Verify your CUDA version (in CMD): nvcc --version
- cuDNN is no longer needed.
Install the Nvidia Game Ready driver 537.34 (released 9/12/2023).
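To double-check the Python and CUDA prerequisites from a script, here is a minimal Python sketch. It assumes `nvcc --version` prints a line such as `Cuda compilation tools, release 12.1, V12.1.66`; `python_ok`, `parse_nvcc_release`, and `installed_cuda_release` are hypothetical helpers, not part of the WebUI.

```python
import re
import shutil
import subprocess
import sys


def python_ok(version_info=sys.version_info):
    """True if the interpreter is a 3.10.x release, as this guide expects."""
    return tuple(version_info[:2]) == (3, 10)


def parse_nvcc_release(output):
    """Extract the CUDA release (e.g. '12.1') from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+\.\d+)", output)
    return match.group(1) if match else None


def installed_cuda_release():
    """Run `nvcc --version` if nvcc is on PATH and return its release string, else None."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    print("Python 3.10.x:", python_ok())
    print("CUDA release reported by nvcc:", installed_cuda_release())
```

Run it in a new CMD window after both installers finish, so PATH includes the new tools.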

Make folder, clone repo

Open CMD (not as administrator; otherwise you'll always have to run it as administrator).
- mkdir C:\sdwebui
- cd C:\sdwebui
- git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git .
  - The trailing . clones the repo into the current folder instead of a new stable-diffusion-webui subfolder.
- Run webui.bat or webui-user.bat to set up the Python virtual environment (venv) folder. A1111 1.6.0 or newer will auto-install everything else you need (for 1.6.0, that is torch 2.0.1+cu118 and torchvision 0.15.2+cu118). It will then auto-launch the WebUI in your browser.

Run 1200 test to confirm configuration is acceptable (minimum 32-37it/s is the target on Windows currently)

Prompt: Cat in the hat
Steps: 150
Batch size: 8
- Test results:
  - Look in the console, take the elapsed time from the left column (ignore the it/s shown there), and calculate 1200 ÷ seconds (150 steps × batch size 8 = 1200). This gives you your real it/s for comparison.
  - No args (remember to compare to baseline) with 4090 and Ryzen 7950X: 26-30.7it/s.
    - RTX 4080: 19-20it/s.
  - With --opt-sdp-no-mem-attention: 44it/s (now 46it/s with the latest Nvidia driver).
    - RTX 4080: 27.9-29.2it/s.
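A minimal Python sketch of the 1200 ÷ seconds calculation. It assumes the console shows elapsed time in tqdm's `MM:SS` or `H:MM:SS` style; `parse_elapsed` and `real_its` are hypothetical helper names. The console's own it/s presumably counts one step of the whole batch as one iteration, which is why the guide says to ignore it.

```python
def parse_elapsed(text):
    """Convert a console time like '00:30' or '1:02:05' into seconds."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def real_its(elapsed_seconds, steps=150, batch_size=8):
    """Effective it/s for the 1200 test: total image-steps divided by wall-clock seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    total_iterations = steps * batch_size  # 150 * 8 = 1200 image-steps
    return total_iterations / elapsed_seconds
```

For example, a run that finishes in `00:30` works out to `real_its(parse_elapsed("00:30"))`, i.e. 40 it/s.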

Troubleshooting

If you have a top CPU (Ryzen 9 7000 series or Intel 13th-gen K series) and you're getting ~37it/s on the 1200 test, make sure you have rebooted after installing the latest Nvidia Game Ready driver.
To make sure you have the correct Torch + CUDA build, open CMD, go into the webui folder's venv\Scripts, run activate.bat, and then run pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118. This also installs torchaudio, which isn't used.
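To check which build is installed without reinstalling anything, here is a hedged Python sketch; run it with the venv's Python. It only reads `torch.__version__` and `torch.cuda.is_available()`. `parse_torch_build` is a hypothetical helper that turns a `+cuXXX` suffix into a CUDA version.

```python
def parse_torch_build(version):
    """Split a torch version like '2.0.1+cu118' into ('2.0.1', '11.8').

    Builds without a '+cuXXX' suffix (plain versions or '+cpu' wheels) give (version, None).
    """
    base, _, local = version.partition("+")
    if local.startswith("cu") and local[2:].isdigit():
        digits = local[2:]
        return base, f"{digits[:-1]}.{digits[-1]}"
    return base, None


if __name__ == "__main__":
    try:
        import torch  # only importable inside the WebUI venv
    except ImportError:
        print("torch not found: activate the venv first")
    else:
        print("torch build (version, CUDA):", parse_torch_build(torch.__version__))
        print("CUDA available:", torch.cuda.is_available())
```

A CPU-only result (`None` for CUDA, or `CUDA available: False`) means the wrong wheel was installed.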

Additional customizations

Tip from Reddit: go to Settings > User interface > Quicksettings list and add "sd_vae" to the end of the line.
- Separate the entries with a comma, like this: sd_model_checkpoint, sd_vae
- Apply settings and restart the UI.
- Add your VAE files to the models\VAE folder inside your WebUI folder (C:\sdwebui\models\VAE in this guide).
- A selector now appears in the WebUI beside the Checkpoint selector that lets you choose a VAE (or none), just as you choose a checkpoint. I prefer this option because it lets you easily disable the VAE or switch to a different one.
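If you prefer editing the settings file directly, here is a hedged Python sketch. It assumes recent WebUI versions store this setting as a list under the key `quicksettings_list` in `config.json` (in the WebUI folder, or in your data dir if you use `--data-dir`). Run it only while the WebUI is closed, and back up the file first; `add_quicksetting` is a hypothetical helper.

```python
import json
from pathlib import Path


def add_quicksetting(config_path, setting="sd_vae"):
    """Append `setting` to quicksettings_list in config.json if missing; return the new list."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    # Assumption: the key name and list format match recent WebUI versions.
    current = list(config.get("quicksettings_list", ["sd_model_checkpoint"]))
    if setting not in current:
        current.append(setting)
    config["quicksettings_list"] = current
    path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return current


# Example (path assumes this guide's layout):
# add_quicksetting(r"C:\sdwebui\config.json")
```

The function is idempotent, so running it twice does not add a second `sd_vae` entry.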
Make a folder in C:\ for all your data files, then add --data-dir "C:\folder" to the args in webui-user.bat (e.g. --xformers --data-dir "C:\folder").
- Be careful if you are considering symlinking (sharing the data dir between installs). The embeddings and models directories can easily be symlinked, and probably the extensions directory as well. Don't symlink the whole C:\folder, because each install writes its own JSON settings files there and they would conflict.
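A minimal Python sketch of symlinking only the safe subfolders into a second install's data dir. It assumes the shared data folder is `C:\folder` (the placeholder above) and that the second install uses its own data dir; `C:\folder2` and `link_shared_dirs` are hypothetical. Creating symlinks on Windows needs Developer Mode or an elevated prompt; `mklink /D` in CMD does the same job.

```python
import os
from pathlib import Path

# Folders the guide considers safe to share; add "extensions" if you want to try it too.
SHARED_SUBDIRS = ("models", "embeddings")


def link_shared_dirs(shared_root, install_data_dir, subdirs=SHARED_SUBDIRS):
    """Create directory symlinks install_data_dir/<name> -> shared_root/<name>.

    Only the listed subfolders are linked, so each install keeps its own JSON
    settings files. Existing folders or links are left untouched.
    Returns the links that were created.
    """
    install = Path(install_data_dir)
    install.mkdir(parents=True, exist_ok=True)
    created = []
    for name in subdirs:
        target = Path(shared_root) / name
        link = install / name
        if link.exists() or link.is_symlink():
            continue  # never replace a real folder or an existing link
        target.mkdir(parents=True, exist_ok=True)
        os.symlink(target, link, target_is_directory=True)
        created.append(link)
    return created


# Example with hypothetical paths:
# link_shared_dirs(r"C:\folder", r"C:\folder2")
```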

Legacy instructions (Anything below this can be ignored)

Install cuDNN 11.x and CUDA 11.8 (can be skipped if installing a 2nd, etc. time)

Set up the virtual environment (venv) first time

This step may not be necessary, but we're doing it just in case.
From C:\sdwebui: cd venv\Scripts
- activate.bat
  - To exit the venv at any time, type: deactivate
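A quick way to confirm the venv is active is to ask Python itself. This minimal sketch relies on standard venv behavior: a venv changes `sys.prefix` but leaves `sys.base_prefix` pointing at the base install. `in_virtualenv` is a hypothetical helper name.

```python
import sys


def in_virtualenv():
    """True when running inside a venv: venv changes sys.prefix but not sys.base_prefix."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Inside venv:", in_virtualenv())
    print("Interpreter:", sys.executable)  # should point into C:\sdwebui\venv
```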

Download the cuDNN build for the CUDA version your torch uses (cu118 in the torch filename means CUDA 11.8).
- https://developer.nvidia.com/rdp/cudnn-download
  - Open the zip file and go to \cudnn-windows-x86_64-8.9.0.131_cuda11-archive.zip\cudnn-windows-x86_64-8.9.0.131_cuda11-archive\bin
    - Copy the DLLs to C:\sdwebui\venv\Lib\site-packages\torch\lib (replace the existing files).
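If you'd rather script the copy, here is a minimal Python sketch. It assumes you have first extracted the zip so the `bin` folder exists on disk (the extract location in the example is hypothetical). It copies only `.dll` files and overwrites any with the same name; `copy_dlls` is a hypothetical helper.

```python
import shutil
from pathlib import Path


def copy_dlls(src_bin, dest_lib):
    """Copy every .dll from the extracted cuDNN bin folder into torch's lib folder.

    Files with the same name are replaced. Returns the copied file names.
    """
    dest = Path(dest_lib)
    copied = []
    for dll in sorted(Path(src_bin).glob("*.dll")):
        shutil.copy2(dll, dest / dll.name)  # copy2 overwrites and keeps timestamps
        copied.append(dll.name)
    return copied


# Example (hypothetical extract location):
# copy_dlls(r"C:\Downloads\cudnn-windows-x86_64-8.9.0.131_cuda11-archive\bin",
#           r"C:\sdwebui\venv\Lib\site-packages\torch\lib")
```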

Download Nvidia CUDA toolkit

You need the installer that matches your Torch build's CUDA version, 11.8 (you will see this later in the guide).
Here’s the Windows 11 local exe URL: https://developer.nvidia.com/cuda-11-8-0-download-archive?target_os=Windows&target_arch=x86_64&target_version=11&target_type=exe_local
- Custom install
  - Uncheck drivers so you don't overwrite newer drivers.
  - If you installed the wrong version, uninstall all of its components from appwiz.cpl first, then install 11.8.

Install libs while in (venv)

Open CMD again and go to C:\sdwebui
- cd venv\Scripts
  - activate.bat
    - To exit the venv at any time, type: deactivate
  - You are now in the virtual environment, and the prompt is prefixed with (venv).
    - Note: the venv keeps your pip changes from affecting any other Python installations on this machine.
    - pip install torchvision==0.15.1
    - pip install https://download.pytorch.org/whl/cu118/torch-2.0.0%2Bcu118-cp310-cp310-win_amd64.whl
      - This is the CUDA 11.8 build, so pair it with cuDNN for CUDA 11.x.
deactivate
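The wheel URL encodes what it targets. In `torch-2.0.0%2Bcu118-cp310-cp310-win_amd64.whl`, `%2B` is a URL-escaped `+`, `cu118` is CUDA 11.8, and `cp310` means CPython 3.10, which is why it pairs with Python 3.10.6. A small Python sketch that decodes these fields; `wheel_tags` is a hypothetical helper that follows the standard `name-version-pythontag-abitag-platformtag.whl` naming and does not handle the optional build tag.

```python
from urllib.parse import unquote


def wheel_tags(url):
    """Decode a wheel URL into its name, version, Python tag and platform tag."""
    filename = unquote(url.rsplit("/", 1)[-1])  # '%2B' -> '+'
    stem = filename[: -len(".whl")]
    name, version, python_tag, _abi_tag, platform_tag = stem.split("-")
    return {"name": name, "version": version, "python": python_tag, "platform": platform_tag}
```

For the URL above, this yields version `2.0.0+cu118`, Python tag `cp310`, and platform `win_amd64`.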

Set your commandline_args

cd C:\sdwebui
- notepad webui-user.bat
  - Add the command-line args you want; at minimum, the line should look like this:
  - set COMMANDLINE_ARGS=--opt-sdp-no-mem-attention
Note: You can back up the installation at this point by copying and pasting C:\sdwebui, which gives you "sdwebui - Copy" (easier to work with if you rename it to something like sdwebui_date_fresh_venvTorchInstalled). Running git pull in a copy updates it later, which makes it quicker to try new features and updated extensions. You can then copy the latest working folder and rename it the same way.

Run webui-user.bat to load SD

C:\sdwebui\webui-user.bat

Backup your working config

Only do this if you have achieved your 40it/s minimum.

Once you have confirmed it works, copy and paste C:\sdwebui and rename the copy to sdwebui_40its_freshinstall_date (with today's date).
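A minimal Python sketch of the same backup. The ISO date format and the `label` default are assumptions, and `backup_install` is a hypothetical helper. `shutil.copytree` refuses to copy into an existing folder, so an existing backup with the same name is never overwritten.

```python
import shutil
from datetime import date
from pathlib import Path


def backup_install(src=r"C:\sdwebui", label="40its_freshinstall", today=None):
    """Copy the whole install folder to a sibling named <folder>_<label>_<YYYY-MM-DD>."""
    src = Path(src)
    stamp = (today or date.today()).isoformat()
    dest = src.with_name(f"{src.name}_{label}_{stamp}")
    shutil.copytree(src, dest)  # raises FileExistsError if dest already exists
    return dest
```

Calling `backup_install()` on 9/25/2023 would create `C:\sdwebui_40its_freshinstall_2023-09-25`.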

Extra credit: Negative guidance minimum sigma to reach 50+ it/s

Pull request: https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/9177
Install GitHub CLI: https://cli.github.com/ (the file I got was gh_2.27.0_windows_amd64.msi)
- Close and re-open CMD so the gh command is on your PATH.
- cd C:\sdwebui
- gh auth login
  - Press Enter to choose GitHub.com.
  - Press the down arrow, then Enter, to choose SSH.
  - Press Y, then Enter, to generate a new SSH key.
  - Create a passphrase for your SSH key.
  - Label the key: Github CLI SSH Key
  - Press Enter to log in through the browser.
    - Copy the one-time code.
    - Press Enter to open the login page.
    - Paste the code and click Submit.
    - When successful, go back to CMD.
- gh pr checkout 9177
  - This checks out the pull request's branch in your current SD WebUI folder.
  - Change ‘Negative Guidance minimum sigma’ in Settings.
    - The practical maximum is ~3.0; above that, quality degrades significantly.
    - Note that extensions that require a value of 0 (i.e. no negative-prompt steps may be skipped) may break.
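To see why raising this value speeds things up, here is a rough Python sketch. It assumes the setting skips the negative (unconditional) UNet pass on every step whose sigma is below the threshold, so those steps cost about half as much; the real implementation may skip less often. It also assumes a Karras-style schedule over SD 1.x's approximate sigma range (0.0292 to 14.6146). Euler a in the WebUI uses a different default schedule, so treat the numbers as illustrative only. Both function names are hypothetical.

```python
def karras_sigmas(n, sigma_min=0.0292, sigma_max=14.6146, rho=7.0):
    """Karras et al. (2022) noise schedule: n sigmas from sigma_max down to sigma_min (n >= 2)."""
    lo, hi = sigma_min ** (1 / rho), sigma_max ** (1 / rho)
    return [(hi + i / (n - 1) * (lo - hi)) ** rho for i in range(n)]


def unet_work_fraction(sigmas, min_uncond_sigma):
    """Relative UNet cost compared with always running both passes.

    Steps with sigma below the threshold run only the conditional pass
    (cost 1 instead of 2). A threshold of 0 disables skipping.
    """
    if min_uncond_sigma <= 0:
        return 1.0
    cost = sum(1 if s < min_uncond_sigma else 2 for s in sigmas)
    return cost / (2 * len(sigmas))


sigmas = karras_sigmas(20)
for threshold in (0.0, 1.0, 3.0, 4.0):
    print(threshold, round(unet_work_fraction(sigmas, threshold), 2))
```

Under these assumptions the fraction can never drop below 0.5, since at most the negative half of each step is skipped.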

Getting results with high sigma

CFG, steps, Hires. fix steps, and Hires. fix denoise behave differently than you are used to, especially at a sigma of 4.0 (for maximum speed).
- A sigma of 4.0 makes things really tricky.
  - If you raise CFG you need to raise steps to avoid blur.
  - Steps (not hires steps) still create variance every ~20 steps with Euler a.
  - I recommend 1.5x-2x as many hires steps as normal steps: the normal steps find good shapes/polygons, and the hires steps then refine the texture on top.
  - Denoise is really sensitive. Below 0.7 it starts to add what appears to be blurry latent noise, which produces lots of crystallization/polygons in my test prompt.
  - Recommended starting point: CFG 20, 60 steps, 80 hires steps, 0.75 denoise, 512x512 with a 2x upscale using the Latent upscaler, on an SD 1.5 model.
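If you run the WebUI with the `--api` flag, the recommended settings can be sent to the `/sdapi/v1/txt2img` endpoint instead of typed into the UI. A hedged Python sketch using only the standard library: the prompt, the sampler name, and the local address `http://127.0.0.1:7860` are assumptions, and the model and sigma setting are whatever is currently selected in the UI. Both function names are hypothetical.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"  # assumed default local address; needs --api


def high_sigma_payload(prompt="Cat in the hat"):
    """CFG 20, 60 steps, 80 hires steps, 0.75 denoise, 512x512 with 2x Latent upscale."""
    return {
        "prompt": prompt,
        "sampler_name": "Euler a",
        "cfg_scale": 20,
        "steps": 60,
        "width": 512,
        "height": 512,
        "enable_hr": True,
        "hr_scale": 2,
        "hr_upscaler": "Latent",
        "hr_second_pass_steps": 80,
        "denoising_strength": 0.75,
    }


def txt2img(payload, url=API_URL):
    """POST the payload and return the decoded JSON response (images come back base64-encoded)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example (only works while the WebUI is running with --api):
# result = txt2img(high_sigma_payload())
```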

Installing Xformers (not better than SDP but may decrease VRAM usage allowing higher resolution maximum output):

cd venv\Scripts
- activate.bat
pip install -U xformers (skip if you are using SDP instead of xformers)
- You may be able to skip this step.
pip install ninja (skip if you are using SDP instead of xformers)
pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
- If you get a file path length error, deactivate the venv and run (from an administrator CMD, since --system changes Git's system-wide config):
  - git config --system core.longpaths true
- If you get a generic compile failure, especially one related to CMake, make sure you have installed the C++ build tools: https://visualstudio.microsoft.com/visual-cpp-build-tools/
Update your commandline args: replace --opt-sdp-no-mem-attention with --xformers.
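If you switch between SDP and xformers often, the edit can be scripted. A minimal Python sketch, assuming your webui-user.bat has a single `set COMMANDLINE_ARGS=` line; `swap_arg` and `update_bat` are hypothetical helpers. It removes the old flag and appends the new one at the end of the line, and because it splits on spaces, runs of multiple spaces between args collapse to one.

```python
import re
from pathlib import Path


def swap_arg(bat_text, old="--opt-sdp-no-mem-attention", new="--xformers"):
    """Replace `old` with `new` on the `set COMMANDLINE_ARGS=` line; other lines are untouched."""
    def fix(match):
        args = [a for a in match.group(2).split() if a != old]
        if new not in args:
            args.append(new)
        return match.group(1) + " ".join(args)

    # [^\r\n]* stops before the line ending, so CRLF files keep their line endings.
    return re.sub(r"(?im)^(set COMMANDLINE_ARGS=)([^\r\n]*)", fix, bat_text)


def update_bat(path):
    """Rewrite webui-user.bat in place (keep a copy of it first)."""
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    path.write_text(swap_arg(text), encoding="utf-8")


# Example: update_bat(r"C:\sdwebui\webui-user.bat")
```

To go back, call `swap_arg(text, old="--xformers", new="--opt-sdp-no-mem-attention")`.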

Troubleshooting

"DefaultCPUAllocator: not enough memory: you tried to allocate 58982400 bytes." Make sure your paging file size is equal to or greater than your VRAM maximum (e.g. 24GB for a 4090). Automatic paging file management should work.
- This may also show up as the error: "torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate x GiB (GPU 0; x GiB total capacity; x GiB already allocated; x bytes free; x GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF"
- Note from Vintro: "I got this torch error a lot before the PageFile fix. I know that error can also mean you are trying to go to hires, but I knew i wasn't close to my limit because I monitored my GPU VRAM and the reported value ALWAYS showed only allocation just below 4GB"
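For scale, the request in the first error is tiny. `DefaultCPUAllocator` is PyTorch's allocator for system RAM, not VRAM, which fits the paging file fix. A one-line Python conversion (the helper name is hypothetical):

```python
def to_mib(n_bytes):
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 * 1024)


print(to_mib(58982400))  # 56.25
```

So the failed allocation was about 56 MiB of system memory, far below a 4090's 24GB.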

How to restore from a backup (e.g. you installed an extension and it broke SD or you want to use an updated version of SD)

Copy out your models, embeddings, outputs, and extensions folders.
If you made a backup copy after installing torch, copy and paste that folder (rename the copy to sdwebui_date) and update it:
- cd C:\sdwebui_date
  - git pull
  - Try running webui-user.bat
Paste your models, embeddings, outputs, and extensions folders back in.
- Try running webui-user.bat
  - Test it.
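A minimal Python sketch of the copy-out and copy-back steps. It assumes the default folder names `models`, `embeddings`, `outputs`, and `extensions` in the WebUI root (adjust them if you use `--data-dir` or custom output paths). The stash folder `C:\sd_stash` and the helper `copy_user_dirs` are hypothetical.

```python
import shutil
from pathlib import Path

USER_DIRS = ("models", "embeddings", "outputs", "extensions")


def copy_user_dirs(src_root, dest_root, names=USER_DIRS):
    """Copy each user folder that exists under src_root into dest_root, merging with existing files.

    Returns the names of the folders that were copied.
    """
    copied = []
    for name in names:
        src = Path(src_root) / name
        if src.is_dir():
            # dirs_exist_ok (Python 3.8+) lets the copy merge into an existing folder.
            shutil.copytree(src, Path(dest_root) / name, dirs_exist_ok=True)
            copied.append(name)
    return copied


# 1) Stash from the broken install:  copy_user_dirs(r"C:\sdwebui", r"C:\sd_stash")
# 2) Restore into the updated copy:  copy_user_dirs(r"C:\sd_stash", r"C:\sdwebui_date")
```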

