I had a go at creating an animated AI-generated monster using Disco Diffusion and EBSynth. The idea is to take a video of someone who isn’t moving too much as a base for the monster, then take a single frame from that video and use CLIP-guided diffusion (Disco Diffusion) to generate a new image from it. The new image is then applied throughout the rest of the video using a non-parametric approach (EBSynth). This post walks through the high-level steps of what I did, mainly to remind myself how to do it in the future. If anybody wants further help, just ask and I can write a better guide.
Find a video, split the video into frames, set up Disco
Find a video and follow the steps in this post about EBSynth until you’ve split the video up into frames. There is one extra thing to consider while following the instructions in the link: both the width and the height have to be divisible by 64. You can change the height and width in Adobe Media Encoder if you are using that. Here is the video I used, which was Boris talking about lockdown.
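If you’re not sure what size to export at, here is a minimal sketch for working out the nearest usable dimensions. It rounds each side down to a multiple of 64. The function names are my own, and rounding down is just one choice (it means cropping or slightly squashing the video); you could round up instead.

```python
def snap_to_multiple(value: int, multiple: int = 64) -> int:
    """Round value down to the nearest multiple, but never below one multiple."""
    return max(multiple, (value // multiple) * multiple)


def snap_size(width: int, height: int, multiple: int = 64) -> tuple:
    """Return a (width, height) pair where both sides are divisible by `multiple`."""
    return snap_to_multiple(width, multiple), snap_to_multiple(height, multiple)


if __name__ == "__main__":
    # A 1080p source: the width is already fine, the height has to shrink.
    print(snap_size(1920, 1080))  # (1920, 1024)
```

So a 1080p video would need to be exported at 1920x1024, and a 720p one at 1280x704.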
You should have a load of images like so:
Pick a frame, use Disco to create some monsters
Once you have your frames, open up Disco Diffusion. I used version 4.1, but as I post this the current version is 5.0. Set Disco to create a new image from an init image. I set it to run about 500 steps but skip the first 250. Upload one of the frames from the video to the Colab disk space, and point the init image setting at that frame. This was my frame:
Set up a prompt. I did two separate runs, each with a single text prompt on step 0. My prompts were simple:
["A monster. Trending on artstation"]
["A clown. Trending on artstation"]
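Put together, the settings described above look roughly like this. This is a sketch from memory of the v4.1 notebook rather than a copy of my actual cell: I’m assuming the settings are still called `steps`, `skip_steps`, `init_image` and `text_prompts` in whichever version you use, and the file path is a made-up example.

```python
# Rough sketch of the Disco Diffusion settings cell (names assumed from v4.1).

# Hypothetical path: wherever you uploaded your chosen frame on the Colab disk.
init_image = "/content/init_images/frame.png"

# About 500 diffusion steps in total, skipping the first 250 so that
# the output stays reasonably close to the init image.
steps = 500
skip_steps = 250

# Prompts are keyed by the step they start on; I only used step 0.
text_prompts = {
    0: ["A monster. Trending on artstation"],
    # For the second run I swapped this for:
    # 0: ["A clown. Trending on artstation"],
}
```

Skipping more steps keeps the result closer to the original frame, and skipping fewer gives Disco more freedom to change it.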
Run Disco Diffusion a bunch of times and pick your favourite images. This is what I ended up with: one clown and one monster.
Use EBSynth to paint other frames
Apply the style of the newly generated image to the other frames using EBSynth, following the instructions in the blog post linked above. You should end up with a series of frames similar to your original frames, but this time painted by EBSynth with your Disco image:
Use VirtualDub to stitch them back into a video
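I did this step by hand in VirtualDub, but it could probably be scripted. Here is a minimal sketch that uses ffmpeg instead (not what I actually used), called from Python. The frame pattern, output name and frame rate are all assumptions, so match them to however EBSynth named your output frames and to the frame rate of your source video.

```python
import subprocess


def build_ffmpeg_command(frame_pattern: str, output_path: str, fps: int = 25) -> list:
    """Build an ffmpeg command that turns a numbered image sequence into an H.264 video."""
    return [
        "ffmpeg",
        "-framerate", str(fps),   # frame rate of the input image sequence
        "-i", frame_pattern,      # printf-style pattern, e.g. out/%05d.png
        "-c:v", "libx264",        # encode as H.264
        "-pix_fmt", "yuv420p",    # pixel format most players can handle
        output_path,
    ]


def stitch_frames(frame_pattern: str, output_path: str, fps: int = 25) -> None:
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_ffmpeg_command(frame_pattern, output_path, fps), check=True)


if __name__ == "__main__":
    # Hypothetical paths: print the command first to check it looks right.
    print(" ".join(build_ffmpeg_command("out/%05d.png", "monster.mp4")))
```

Calling `stitch_frames("out/%05d.png", "monster.mp4")` would then write the video, assuming ffmpeg is installed and on your path.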
It’s worth noting that Disco Diffusion also has animation modes, but I haven’t played with them yet, and those animations won’t follow your original picture. Instead, I think they create a new image per frame. Here are short gifs of my clown and monster:
You could do a much better job if you spent more time on this, tweaking Disco, picking better videos and taking your time to get good init images. Still, I think the process is quite easy and interesting and I wonder how much of it I could automate.