/ldg/ Wan 2.1 Install and Optimization Guide
This is a noob's guide to help you install Wan and implement every available optimization to maximize the speed of video generation. Achieving this involves trade-offs in quality, but you can easily disable any of the optimizations if you prefer to prioritize quality over speed.
The included guide and workflows are tailored for NVIDIA GPUs with 24GB of VRAM, typically utilizing 21-23GB during inference. While it’s possible to use a GPU with less than 24GB, you’ll need to make adjustments. For example, a 16GB GPU can use FP8/Q8 models, provided you increase the virtual_vram_gb or block swapping settings in the provided workflows.
If you're under 16GB, you'll probably want to use the models quantized below Q8, but keep in mind that using a lower quantization level will reduce the quality of your outputs. In general, the lower you go, the lower the quality and less coherent the output.
Prerequisites - INSTALL FIRST
ComfyUI Portable
CUDA 12.8
GIT. Open a cmd.exe prompt and enter "git". If the command isn't recognized, download it here.
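If you'd rather have an explicit check than just typing "git", this prints the installed version (an error here means Git isn't on your PATH yet) :
git --version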
A clean install of ComfyUI Portable is recommended, one created specifically for Wan, for two reasons. First, this guide was tested on multiple PCs with a clean install and worked without errors or node conflicts. Second, the nightly pytorch, Sage, Triton and CUDA 12.8 installs required by this guide could cause issues with existing installations, workflows or nodes meant for image/audio generation in ComfyUI if those nodes/extensions require specific or non-nightly versions of these libraries.
Choose Implementation
Wan 2.1 can be integrated into ComfyUI through two approaches: Native support or Kijai's Wrapper. Native has several advantages unavailable in Kijai's version: support for gguf models (more accurate quantizations, which give you higher quality outputs compared to the fp8s Kijai's offers), Adaptive Guidance (a method to speed up generations at the cost of some quality), and TorchCompile compatibility across not only the 40XX and 50XX GPU series but also the 30XX series, which speeds up generations by an additional 30% or so. So at the present moment, Native is arguably the better option.
Once you've settled on a method and its associated workflow, proceed to the general installation steps.
Option 1 - Comfy Native
Download these modified versions of Comfy's workflows, based on an anon's from /ldg/. Beyond the optimizations and a few extra features, they use Alibaba's default settings as a baseline. The workflows output two videos: a raw 16 fps version and an interpolated 32 fps version.
You can easily adapt these to use the 720P model/setting. See Generating at 720P.
/ldg/ Comfy I2V 480p workflow: ldg_cc_i2v_14b_480p.json
(updated 21st March 2025)
/ldg/ Comfy T2V 480p workflow: ldg_cc_t2v_14b_480p.json
(updated 21st March 2025)
- Ensure that ComfyUI is updated to the very latest version. (update_comfyui.bat in ComfyUI_windows_portable\update)
- Download these models. If you have less than 24GB of VRAM, you could also swap out the Q8 models for Q6/Q5/Q4, though you'll see a progressively larger drop in output quality the lower you go. A quick sketch of where everything should end up follows these steps.
Do NOT use Kijai's text encoder files with these models! You MUST use these text encoders, or it will error out before generating with: Exception during processing !!! mat1 and mat2 shapes cannot be multiplied (77x768 and 4096x5120)
- wan2.1-i2v-14b-480p-Q8_0.gguf goes in ComfyUI\models\diffusion_models\
- wan2.1-i2v-14b-720p-Q8_0.gguf goes in ComfyUI\models\diffusion_models\
- wan2.1-t2v-14b-Q8_0.gguf goes in ComfyUI\models\diffusion_models\
- umt5_xxl_fp16.safetensors goes in ComfyUI\models\text_encoders
- clip_vision_h.safetensors goes in ComfyUI\models\clip_vision\
- wan_2.1_vae.safetensors goes in ComfyUI_windows_portable\ComfyUI\models\vae\
- Move to General Install Steps.
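Before moving on, it can help to sanity-check the downloads. Based on the paths above, the files should end up laid out roughly like this inside ComfyUI_windows_portable:
ComfyUI\models\
├── diffusion_models\
│   ├── wan2.1-i2v-14b-480p-Q8_0.gguf
│   ├── wan2.1-i2v-14b-720p-Q8_0.gguf
│   └── wan2.1-t2v-14b-Q8_0.gguf
├── text_encoders\
│   └── umt5_xxl_fp16.safetensors
├── clip_vision\
│   └── clip_vision_h.safetensors
└── vae\
    └── wan_2.1_vae.safetensors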
Option 2 - Kijai's Wrapper
Download these modified versions of Kijai's default workflows. Beyond the optimizations and a few extra features, they use Alibaba's default settings as a baseline. The workflows output two videos: a raw 16 fps version and an interpolated 32 fps version.
You can easily adapt these to use the 720P model/setting. See Generating at 720P.
/ldg/ KJ I2V 480p workflow: ldg_kj_i2v_14b_480p.json
(updated 17th March 2025)
/ldg/ KJ T2V 480p workflow: ldg_kj_t2v_14b_480p.json
(updated 17th March 2025)
- Ensure that ComfyUI is updated to the very latest version. (update_comfyui.bat in ComfyUI_windows_portable\update)
- Download these models. There are e5m2 quantizations in the same repo, but they are inferior to e4m3fn. They do have the advantage of allowing TorchCompile to work on the 30XX series GPUs, however.
Do NOT use ANY other model files with KJ's! You MUST use these or you will encounter issues!
- Wan2_1-I2V-14B-480P_fp8_e4m3fn.safetensors goes in ComfyUI\models\diffusion_models\WanVideo
- Wan2_1-I2V-14B-720P_fp8_e4m3fn.safetensors goes in ComfyUI\models\diffusion_models\WanVideo
- Wan2_1-T2V-14B_fp8_e4m3fn.safetensors goes in ComfyUI\models\diffusion_models\WanVideo
- umt5-xxl-enc-bf16.safetensors goes in ComfyUI\models\text_encoders
- open-clip-xlm-roberta-large-vit-huge-14_fp16.safetensors goes in ComfyUI\models\text_encoders
- Wan2_1_VAE_bf16.safetensors goes in ComfyUI_windows_portable\ComfyUI\models\vae\wanvideo
- Move to General Install Steps.
General Install Steps
- Save this as auto_installer.bat in ComfyUI_windows_portable\ and run the .bat file to automatically install all the required extensions, along with Triton, Sage and pytorch 2.8.0.dev20250317, which will drastically speed up your generations. Run the commands through an LLM to confirm it's safe, or run the steps manually if you prefer.
- Edit run_nvidia_gpu.bat in ComfyUI_windows_portable and change the first line to this :
.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --use-sage-attention --fast
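For reference, the stock run_nvidia_gpu.bat is only a couple of lines, so after the edit the whole file should look something like this (assuming yours matches the standard portable build; leave any pause line as it was) :
.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --use-sage-attention --fast
pause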
- Run ComfyUI. Look in the cmd.exe console window and make sure
pytorch version: 2.8.0.dev20250317+cu128
is shown during startup. You should also see Enabled fp16 accumulation and Using sage attention. Make sure that every time you start Comfy, the pytorch version reads 2.8.0dev, or fp16_fast / fp16 accumulation won't work.
There's a possible bug where, after you update extensions and restart, Comfy reports an incorrect pytorch version. If that happens, close Comfy manually and start it again. This seems to occur most often when you use the "Restart" button in Comfy after updating extensions, so close and relaunch it manually instead. It can also happen after updating Comfy itself. If upon a second restart it still isn't 2.8.0dev, run this in Comfy portable's root directory:
.\python_embeded\python.exe -s -m pip install torch==2.8.0.dev20250317+cu128 torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128 --force-reinstall
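To confirm what the embedded interpreter actually has installed (independent of what Comfy prints at startup), you can run this from the same root directory; it simply prints torch's reported version and CUDA build, which should come back as 2.8.0.dev20250317+cu128 and 12.8 :
.\python_embeded\python.exe -c "import torch; print(torch.__version__, torch.version.cuda)"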
- Open one of the provided workflows. Run your first gen. The video interpolation model will automatically download once the node is activated. If the workflow freezes during model loading with "Press any key to continue" in the cmd.exe window, you need to restart your computer. If you get this error when running the workflow :
ImportError: DLL load failed while importing cuda_utils: The specified module could not be found.
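In my experience this points at a stale Triton/torchinductor cache rather than a broken install. A commonly reported fix - an assumption on my part, not something the auto installer handles for you - is to close Comfy, delete the caches, then relaunch :
rem default cache locations; adjust if your user/Temp folders differ
rmdir /s /q "%USERPROFILE%\.triton\cache"
rmdir /s /q "%TEMP%\torchinductor_%USERNAME%"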
Important Notes Before You Gen
The initial generation time estimate you get is NOT accurate. TeaCache kicks in partway through the gen, and Adaptive Guidance about midway through if you're on Comfy Native/Core.
When a video finishes generating, you'll get two files in their own i2v or t2v directories and subdirectories. The raw files are the 16 fps outputs, while the int files are interpolated to 32 fps, which gives you much smoother motion.
It is highly recommended you enable previews during generation. If you followed the guide, you'll have the extension required. Go to ComfyUI Settings (the cog icon at the bottom left) and search for "Display animated previews when sampling". Enable it. Then open Comfy Manager and set Preview method to TAESD (slow). The output will become clearer by about step 10, and you'll get a general sense of the composition and movement. This can and will save you a lot of time, as you can cancel gens early if you don't like how they look.
NEVER use the 720p i2v model at 480p resolutions and vice versa. If you use the 720p i2v model and set your res to 832x480 for example, the output you get will be much worse than simply using the 480p i2v model. You won't ever improve quality by genning 480p on the 720p model, so don't do it. The only model which allows you to mix 480p and 720p resolutions is t2v 14B.
Supported Resolutions
Each model in Wan 2.1 is trained and fine-tuned to work best at specific resolutions. Sticking to these supported resolutions generally delivers the sharpest, most reliable results, especially for i2v, where each model was apparently tailored to perform optimally at just two resolutions. Straying from these can, in theory, lead to subpar output.
| Text to Video - 1.3B | Text to Video - 14B | Image to Video - 480p | Image to Video - 720p |
|---|---|---|---|
| 480x832 | 720x1280 | 832x480 | 1280x720 |
| 832x480 | 1280x720 | 480x832 | 720x1280 |
| 624x624 | 960x960 | | |
| 704x544 | 1088x832 | | |
| 544x704 | 832x1088 | | |
| | 480x832 | | |
| | 832x480 | | |
| | 624x624 | | |
| | 704x544 | | |
| | 544x704 | | |
Guide On Using Non-Standard Resolutions on I2V
Though the two I2V models were each trained at two specific resolutions, the main penalty for using non-standard resolutions seems to be lower quality output, due to the reduced pixel space the model has to work with. I haven't noticed any glaring temporal issues or a huge drop in coherence from doing so.
That said, I generally try to keep my own outputs as close to the original resolutions as possible, avoiding extreme shifts from the standard 480p or 720p settings. I prefer to lock one dimension - either 480 for 480p models or 720 for 720p models - and adjust the other dimension downward (never upward) to tweak the aspect ratio as needed.
So with the 480p i2v model, one dimension stays fixed at 480, while the other starts at a maximum of 832 and can be scaled down from there. For the 720p model, one dimension anchors at 720, with the other starting at 1280 and adjustable downward.
This might be a better solution for you if you don't want to crop out vital details just to make the image fit a supported resolution, and it's better than adding black vertical or horizontal bars to force the image into an exact 480p or 720p aspect ratio, given Wan doesn't seem to like letterboxing.
Still, nothing is stopping you from generating videos at, say, 512x512 on the 480p model; it's just that the quality will be worse than if you found a way to crop/scale the image closer to a resolution the model was trained on.
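For example, with the 480p i2v model and a 4:3 landscape source image, I'd keep the height locked at 480 and set the width to 480 x 4/3 = 640, which sits comfortably under the 832 cap (and happens to be a multiple of 16, the granularity these models generally seem to want); a 1:1 image would simply be 480x480.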
Generating at 720P
If you want to use the 720p model in i2v or 720p res on t2v, you'll need to:
- On t2v, you need to increase the resolution to 720p (1280x720 / 720x1280). The single 14B t2v model supports both 480p and 720p.
- When using i2v on Wan, start by selecting the i2v 720P model in the model loader. Next, adjust the width and height settings of your input image to 1280x720 or 720x1280.
- On Comfy Native, set TeaCache coefficients to i2v_720. Kijai's wrapper automatically selects the correct coefficients.
- Set the TeaCache threshold to 0.2, which is the medium setting. Increase it to 0.3 for faster gens at the expense of a hit to output quality.
- Increase virtual_vram_gb (Comfy Native) or block swaps (Kijai's Wrapper) depending on which implementation you use.
On a 24GB GPU, you want to increase it until you're using just under 23GB in total. You never want to reach or exceed 23.5GB use, or you'll either OOM or massively increase gen times.
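An easy way to keep an eye on this while you dial the setting in is to leave nvidia-smi refreshing in a second cmd window during a gen (it's installed with the NVIDIA driver, so it should already be on your PATH) :
nvidia-smi -l 1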
The Optimizations
Several options in this guide speed up inference time. They are fp16_fast (fp16 accumulation), TeaCache, Torch Compile, AdaptiveGuidance (exclusive to Comfy Native) and Sage Attention. If you wish to disable them for testing or to increase quality at the expense of time, do the following :
- fp16_fast : remove --fast from run_nvidia_gpu.bat. If you're using KJ's, you also need to set WanVideo Model Loader's base_precision from fp16_fast to fp16
- Sage Attention : remove --use-sage-attention from run_nvidia_gpu.bat
- AdaptiveGuidance : set the AdaptiveGuidance node to a threshold of 1
- Torch Compile : right click on the TorchCompileModelWanVideo node and click Bypass
- TeaCache : right click the TeaCache node and click Bypass
Changelog
22/03/25
- changed requirement from CUDA 12.6 to 12.8
- updated pytorch to 2.8.0.dev20250317
- updated Triton and Sage
- 50XX series should work with this setup
- streamlined install process
21/03/25
- Comfy Workflows: added patch for TorchCompile issue leading to LoRA's being broken in Comfy Native