/ldg/ Wan 2.2 Install and Optimization Guide
This is a noob's guide to help you install Wan and implement every available optimization to maximize the speed of video generation. Achieving this involves trade-offs in quality, but you can disable any of the optimizations if you prefer to prioritize quality over speed. The idea is to offer the fastest possible generation speed in a single, basic workflow, which you can then tailor to your hardware and needs.
The included guide and workflows were created for NVIDIA GPUs with 24GB of VRAM, typically utilizing 21-23GB during inference. 64GB of system RAM is also recommended, along with having Comfy and the models on an SSD for fast swapping.
There are also options for systems with less than 24GB of VRAM, and the workflows can be tweaked to accommodate them. More info is below.
- /ldg/ Wan 2.2 Install and Optimization Guide
- VRAM Requirements and Model Size (aka, "Can My 4GB GPU Run This?")
- Prerequisite Steps - DO FIRST
- Installation
- Common Errors
- Important Notes
- Kijai's Wan2.2 lightx2v workflow
- Phr00t's Rapid AllInOne workflow
- /ldg/anon's Wan2.2 lightx2v workflow
- Wan2GP
- Bullerwins quants + workflow for the GPU poor
- The Optimizations
- Changelog
VRAM Requirements and Model Size (aka, "Can My 4GB GPU Run This?")
To check if your GPU's VRAM can handle a model, check the model's file size, as the model must fully load into VRAM for inference. For example, wan2.2_t2v_high_noise_14B_Q8_0.gguf (15.4GB) and wan2.2_i2v_high_noise_14B_Q8_0.gguf (15.4GB) each require at least 15.4GB of VRAM, and that's just to load the model.
Additional components like text encoders or CLIP models also use VRAM, and inference can add 2-5+GB more, depending on resolution and context/frame count, with 720p settings being particularly VRAM-intensive.
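If you're not sure how much VRAM your card actually reports, you can check from a cmd.exe prompt (this assumes a standard NVIDIA driver install, which puts nvidia-smi on the PATH):
nvidia-smi --query-gpu=name,memory.total --format=csv
Compare that figure against the model's file size plus a few GB of headroom for the text encoder and inference.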
To manage VRAM limitations, offload to RAM/CPU using the blocks_to_swap setting in the WanVideo Block Swap node. This slows generation, and you can only offload so much before it becomes unusably slow.
If your VRAM is still insufficient, use a lower-quantization model (available in repositories listed in the installation section), which reduces VRAM needs but sacrifices accuracy and quality.
Prerequisite Steps - DO FIRST
- ComfyUI Portable. It's not mandatory, but I recommend a clean install of ComfyUI Portable, one created specifically for Wan.
- GIT. You might already have this. Open a cmd.exe prompt and enter "git". If the command isn't recognized, download it here.
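If you'd rather get a clear yes/no, this prints a version string when Git is installed and errors out when it isn't:
git --version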
Next, download these workflows, which are based on Comfy's original workflows for Wan as well as user workflows and feedback from /ldg/. They're kept relatively barebones by design, so you can easily modify them to suit your needs. They use Alibaba's default settings as a baseline, but include the most important optimizations, a LoRA loading fix, and video interpolation, which uses AI to increase the framerate of the generated videos. The workflows output two videos: the raw 16 fps generation and an interpolated 32 fps version with smoother motion.
- /ldg/ Comfy I2V workflow (Kijai): ldg_2_2_i2v.json (updated 17th November 2025)
- /ldg/ Comfy T2V workflow (bullerwins): wan2_2_14B_t2v_example.png (updated 2nd August 2025) (drag the image into comfy)
Installation
- Ensure that ComfyUI is updated to the very latest version. (update_comfyui.bat in ComfyUI_windows_portable\update)
- Download these models. If you have less than 16GB of VRAM, you could also swap out the base models for Q6/Q5/Q4 GGUFs, though you'll see a progressively larger drop in output quality the lower you go.
Do NOT use any other text encoder files with these models! Using a quantized version of umt5_xxl_fp16.safetensors can lead to errors! Using KJ's version of the text encoder will error out before generating with: Exception during processing !!! mat1 and mat2 shapes cannot be multiplied (77x768 and 4096x5120)
- Wan2_2-I2V-A14B-HIGH_fp8_e4m3fn_scaled_KJ.safetensors goes in ComfyUI\models\diffusion_models\
- Wan2_2-I2V-A14B-LOW_fp8_e4m3fn_scaled_KJ.safetensors goes in ComfyUI\models\diffusion_models\
- Wan2_2-T2V-A14B_HIGH_fp8_e4m3fn_scaled_KJ.safetensors goes in ComfyUI\models\diffusion_models\
- Wan2_2-T2V-A14B-LOW_fp8_e4m3fn_scaled_KJ.safetensors goes in ComfyUI\models\diffusion_models\
- Wan2_2-TI2V-5B_fp8_e4m3fn_scaled_KJ.safetensors goes in ComfyUI\models\diffusion_models\
- Wan_2_2_I2V_A14B_HIGH_lightx2v_4step_lora_v1030 or Wan_2_2_I2V_A14B_HIGH_lightx2v_MoE goes in ComfyUI\models\loras\
- lightx2v_I2V_14B_480p_cfg_step_distill_rank64_bf16.safetensors goes in ComfyUI\models\loras\
- umt5-xxl-enc-bf16.safetensors goes in ComfyUI\models\text_encoders
- clip_vision_h.safetensors goes in ComfyUI\models\clip_vision\
- wan_2.1_vae.safetensors goes in ComfyUI_windows_portable\ComfyUI\models\vae\
- taew2_1.safetensors goes in ComfyUI_windows_portable\ComfyUI\models\vae_approx\
- Download this bat file and save it as wan_autoinstall.bat in ComfyUI_windows_portable\
Run the .bat file and select your GPU type. If you have a 50XX series card, it'll install pytorch 2.8.0dev, while 40XX and below will install 2.7.1 Stable. There's no functional difference in speed between the pytorch versions, however some 40XX and 30XX users have reported issues with 2.8.0dev and Wan. The script also installs other requirements, all of which will drastically speed up your generations. Run the commands through an LLM to confirm it's safe, or run the steps manually if you prefer.
- Make a copy of run_nvidia_gpu.bat in ComfyUI_windows_portable, and call it run_nvidia_gpu_optimizations.bat. Then change the first line to this :
.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --use-sage-attention --fast
- Run ComfyUI with run_nvidia_gpu_optimizations.bat. Look in the cmd.exe console window and make sure pytorch version: 2.7.1 or pytorch version: 2.8.0dev is shown during startup. You should also see Enabled fp16 accumulation and Using sage attention. Make sure that every time you start Comfy, the pytorch version reads either 2.7.1 or 2.8.0dev, otherwise fp16_fast / fp16 accumulation won't work.
- Open one of the provided workflows and run your first gen. The video interpolation model will automatically download once the node is activated.
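If the console doesn't show the expected pytorch version or the sage attention line, you can check what the embedded Python actually has directly, by running these from ComfyUI_windows_portable (assuming the auto-installer put Triton and SageAttention into python_embeded):
.\python_embeded\python.exe -c "import torch; print('torch', torch.__version__, 'cuda', torch.version.cuda)"
.\python_embeded\python.exe -c "import triton, sageattention; print('triton + sage OK')"
If either import fails here, it will also fail inside Comfy.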

Common Errors
IMPORTANT! There's an issue with Comfy where it sometimes boots up an old version of pytorch. It can happen :
- when you first install 2.7.1/2.8.0dev and run Comfy
- when you update Comfy
- when you restart Comfy via Manager's restart button
On booting Comfy up, if the cmd.exe console window displays anything but 2.7.1 (for 30XX & 40XX) or 2.8.0dev (for 50XX), restart Comfy manually. If it still isn't listing 2.7.1/2.8.0dev after you've restarted it once or twice, try to manually install pytorch again by running this in Comfy portable's root directory:
30XX or 40XX
.\python_embeded\python.exe -s -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128 --force-reinstall
50XX
.\python_embeded\python.exe -s -m pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128 --force-reinstall
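After the reinstall finishes, you can double-check which torch build pip actually put into the embedded Python:
.\python_embeded\python.exe -m pip show torch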
If the workflow freezes during model loading with "Press any key to continue" in the cmd.exe window, you need to restart your computer.
If you get this error when running the workflow :
ImportError: DLL load failed while importing cuda_utils: The specified module could not be found.
Go to \users\username\ and open the .triton directory. Delete the cache subdirectory inside of it. Do not delete the entire .triton directory.
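If you'd rather do it from a cmd.exe prompt, this removes only the cache subdirectory and leaves the rest of .triton intact:
rmdir /s /q "%USERPROFILE%\.triton\cache"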
If you get an error about :
SamplerCustomAdvanced returned non-zero exit status 1
Download this and extract it to ComfyUI_windows_portable\python_embeded
Important Notes
It is highly recommended you enable previews during generation. If you followed the guide, you'll have the extension required. Go to ComfyUI Settings (the cog icon at the bottom left) and search for "Display animated previews when sampling". Enable it. Then open Comfy Manager and set Preview method to TAESD (slow). At about step 10, the preview will clear up enough to get a general sense of the composition and movement. This can and will save you a lot of time, as you can cancel gens early if you don't like how they look.
Wan2.2 is more flexible with resolution than 2.1. Don't worry about adhering strictly to 480p or 720p, but try to make sure the width and height are both divisible by 16. For example: 864x560, 720x960, 1072x720, etc.
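A quick way to sanity-check a resolution before queueing a gen (swap in your own width and height; both results should come out as whole numbers):
.\python_embeded\python.exe -c "print(864/16, 560/16)"
If either result has a remainder, nudge that dimension to the nearest multiple of 16.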
TorchCompile needs to compile when running your first gen. You'll see multiple lines in your cmd.exe window as it compiles (DeviceCopy in input program). Once it's finished, subsequent generations will be faster. It needs to recompile every time you restart Comfy or change your LoRA stack.
When a video finishes generating, it will be saved to the output folder.
Kijai's Wan2.2 lightx2v workflow
This is a modified workflow for it with all relevant optimizations and settings. I've only added the I2V version; you should be able to adapt the settings to T2V easily enough :
- /ldg/ Comfy I2V Kijai workflow: ldg_2_2_i2v.json (updated 22nd November 2025)
To test it, make sure both ComfyUI and KJNodes are up to date.
Make sure you have the lightx2v LoRA from here and one of the newer LoRAs for the high noise model.
Add it to your ComfyUI LoRA directory. Any other LoRAs should be connected to the "prev_lora" input of the lightx2v LoRA nodes. Plugging into the low noise LoRA has a stronger effect.
Phr00t's Rapid AllInOne workflow
This is an "all in one" merge of WAN 2.2, 2.1, accelerators, VAE and umt5 text encoder models into 1. FP8, which is a good compromise of VRAM usage and precision.
Super simple and designed for speed with as little sacrifice to quality as possible.
/ldg/anon's Wan2.2 lightx2v workflow
>This is the Wan 2.2 rentry's workflow, with the settings I don't use tucked away inside subgraphs + a tiny bit of cg-use-everywhere (controversial, but very much optional).
Wan2GP
Wan2GP is an interface for Wan that better resembles A1111's WebUI.
Bullerwins quants + workflow for the GPU poor
- /ldg/ Comfy T2V workflow (by bullerwins): ldg_2_2_t2v_14b_480p.json (updated 2nd August 2025)
The Optimizations
Several options in this guide speed up inference time. They are fp16_fast (fp16 accumulation), TeaCache, Torch Compile, AdaptiveGuidance and Sage Attention. If you wish to disable them for testing or to increase quality at the expense of time, do the following :
- fp16_fast : remove --fast from run_nvidia_gpu_optimizations.bat
- Sage Attention : remove --use-sage-attention from run_nvidia_gpu_optimizations.bat (the stock launch line is shown after this list)
- AdaptiveGuidance : set the AdaptiveGuidance node to a threshold of 1
- Torch Compile : right click on the TorchCompileModelWanVideo node and click Bypass
- TeaCache : right click the TeaCache node and click Bypass
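For reference, removing both command-line optimizations takes the first line of run_nvidia_gpu_optimizations.bat back to the stock launch command:
.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build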
Changelog
22/11/25
- Updated the Kijai I2V workflow
- Provided links for new high-noise lightx2v loras
17/06/25
- Added Self Forcing+NAG section
- Updated .bat installer with GPU selection for pytorch, given 2.8.0dev seems to be causing OOM's with batch runs on 3090 GPU's
- Added ComfyUI-Crystools to auto installer for resource monitoring, good to see at a glance if you're using too much VRAM
- Updated TorchCompileModelWanVideo to TorchCompileModelWanVideoV2
03/06/25
- Removed KJ section and workflows given nobody was using them and native still remains the best option
- Added section on RifleXRoPE and VACE
10/05/25
- Fixed .bat file attempting to download a broken version of torchvision
26/04/25
- Changed FILM VFI's clear cache from 20 to 10 to prevent OOM's under certain conditions
- Video files now output to date-formatted directories, with the seed in the filename
22/03/25
- changed requirement from CUDA 12.6 to 12.8
- updated pytorch to 2.8.0.dev20250317
- updated Triton and Sage
- 50XX series should work with this setup
- streamlined install process
21/03/25
- Comfy Workflows: added patch for TorchCompile issue leading to LoRA's being broken in Comfy Native
Other /ldg/ rentries I maintain
- updated collage script for /ldg/ : https://rentry.org/ldgcollage
- local model meta : https://rentry.org/localmodelsmeta