If you have a graphics card with enough VRAM to run it, the Chinese ModelScope text-to-video model is surprisingly simple to get running.
I think you need at least 16 GB of VRAM, though the requirement scales with the number of frames: generating more frames needs more VRAM, fewer frames needs less.
https://huggingface.co/spaces/damo-vilab/modelscope-text-to-video-synthesis

This assumes you are on Windows, but it's very similar on Mac and Linux (their smaller market share just means fewer tutorials and updates).

To do so, you will need:
"conda" - miniconda3, look up how you can use conda to manage software dependencies
"git" - git to git clone the software onto your machine, including the code and the trained model
"cuda" - assuming you have nvidia GPU, you have to install cuda on your machine. Install cuda 11.7
Understanding of how to run commands in a command prompt window - I use anaconda powershell prompt
Python code understanding - you need to at least be able to open a text file and edit some code lines. Simple stuff though
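Before starting, you can sanity-check that the tools are on your PATH. A minimal Python sketch (note: `nvcc`, the CUDA compiler, may live outside PATH on Windows even when CUDA is installed, so treat a MISSING there as a hint, not a verdict):

```python
import shutil

def check_tools(names):
    """Return a dict mapping each tool name to whether it is on PATH."""
    return {name: shutil.which(name) is not None for name in names}

if __name__ == "__main__":
    for tool, found in check_tools(["conda", "git", "nvcc"]).items():
        print(f"{tool}: {'found' if found else 'MISSING'}")
```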

Everything below was quick - it took me about 15 minutes from start to finish on a fast connet connection.

  1. Install Miniconda and Git for Windows.
  2. Open up the Miniconda command prompt (it should be a new shortcut in the Start menu called "Anaconda PowerShell Prompt") and navigate to where you want to put this code (e.g. "cd /videogen").
  3. "git clone https://github.com/lopho/sd-video.git" will download the code here
  4. "git clone https://huggingface.co/damo-vilab/modelscope-damo-text-to-video-synthesis.git" will download the model and config.
  5. Go into Windows Explorer and move everything from the model repository into a folder named /model inside the /sd-video folder.
  6. You are now ready to prepare your conda environment.
  7. "conda create -n videogen python=3.10.9" - Create a new miniconda environment called "videogen", which we will install all the required dependencies to
  8. "conda activate videogen" - activate that environment
  9. "conda install cuda -c nvidia/label/cuda-11.7.0 -c nvidia/label/cuda-11.7.1" - install cuda dependencies in python. Press y if prompted
  10. Make sure you are in the "sd-video" folder, then run "pip install -r requirements.txt" - this installs the dependencies specific to this code.
  11. "conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia" make sure you have all the torch dependencies
  12. If you look at the sd-video repository readme, you can see a demonstration of how to use it from Python:
    https://github.com/lopho/sd-video/

This is the code from their readme:

from sd_video import SDVideo, save_vid
model = SDVideo('/path/to/model_and_config', 'cuda')
x = model('arnold schwarzenegger eating a giant cheeseburger')
save_vid(x, 'output') # 0001.png ... 00NN.png

  13. I copied this text and created a new file called "generate.py" in the sd-video folder, then edited generate.py and put this code inside it.

You can see in the code that a couple of things need to be changed. I changed the model path to point to the "model" folder I created earlier and put the model files into, and I also changed the text prompt.

from sd_video import SDVideo, save_vid
model = SDVideo('model', 'cuda')
x = model('camera pan, studio ghibli style, anime, countryside, blue sky, farmhouse')
save_vid(x, 'output') # 0001.png ... 00NN.png

  14. Now, on your command line, type "python .\generate.py" - this executes the code you wrote, reads the prompt you put in there, passes it to the model, and generates the video.
  15. Look inside the /output folder - there you will see the 16 frames of a video.
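save_vid writes individual PNG frames, not an actual video file. If you want a playable video, one option (my own addition, not part of the repo) is to stitch the frames together with ffmpeg, which you would need to install separately. A Python sketch that builds the ffmpeg command and only runs it if ffmpeg and the frames are present:

```python
# Sketch: stitch the numbered frames in /output into an mp4 with ffmpeg.
# Assumes ffmpeg is installed separately -- it is not part of this tutorial.
import glob
import shutil
import subprocess

def ffmpeg_cmd(frame_dir="output", out_file="video.mp4", fps=8):
    """Build an ffmpeg command for frames named 0001.png, 0002.png, ..."""
    return [
        "ffmpeg", "-y",
        "-framerate", str(fps),
        "-i", f"{frame_dir}/%04d.png",  # matches the zero-padded frame names
        "-pix_fmt", "yuv420p",          # widely compatible pixel format
        out_file,
    ]

if __name__ == "__main__":
    if shutil.which("ffmpeg") and glob.glob("output/*.png"):
        subprocess.run(ffmpeg_cmd(), check=True)
    else:
        print("run manually:", " ".join(ffmpeg_cmd()))
```

16 frames at 8 fps gives a 2-second clip; tweak fps to taste.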

Some tips - the configuration.json file in /model contains some parameters you can tweak. Raising '"max_frames": 16,' uses more VRAM; lowering it reduces VRAM usage (and gives you fewer frames).
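If you'd rather change max_frames from a script than by hand, here is a sketch. I haven't verified exactly where the key nests inside configuration.json, so this walks the whole JSON tree and sets every occurrence of it:

```python
# Sketch: change "max_frames" in model/configuration.json without assuming
# exactly where it nests -- walks the whole JSON tree looking for the key.
import json
import os

def set_key(node, key, value):
    """Recursively set every occurrence of `key` in nested dicts/lists."""
    if isinstance(node, dict):
        for k in node:
            if k == key:
                node[k] = value
            else:
                set_key(node[k], key, value)
    elif isinstance(node, list):
        for item in node:
            set_key(item, key, value)

if __name__ == "__main__":
    path = "model/configuration.json"
    if os.path.exists(path):
        with open(path) as f:
            config = json.load(f)
        set_key(config, "max_frames", 8)  # fewer frames -> less VRAM
        with open(path, "w") as f:
            json.dump(config, f, indent=2)
    else:
        print(f"{path} not found; run this from the sd-video folder")
```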

Pub: 19 Mar 2023 22:14 UTC