How I Make AI Videos Step by Step
A step-by-step breakdown of the workflow, the image/video process, the music, and the weird mistakes I had to cover up.
TL;DR: Today we’ll talk about how to make really good AI videos. I’ll take you through it step by step: the good, the bad, and the weird stuff I had to cleverly cover up. By the end of this post, you’ll be able to make any AI video you want! 🔥
Use the Ultimate Prompt Creator (free, no signup):
Want access to the UPC Supporter Toolkit?
Before We Get Started
This is what we will be making today 👆.
I’ll explain the workflow and show you what went very well, what went not so well, and the weird stuff I had to cleverly cover up. 👀😂 Here’s what we’ll cover in this post:
Crafting the video idea
Making the images
Making the videos
Making the music
Crafting everything together & adding SFX (sound effects)
Weird stuff I had to cover up
The difference between newer and older AI models
Don’t let that list intimidate you, by the way. When you actually do it, it feels way more natural than it looks written out like this. 👍
One more thing I want to mention: making AI videos is something I truly enjoy, but it is a work in progress. Every time you create a video you learn new stuff, and the videos keep getting better. If you are just starting out with AI videos, please don’t treat this as the gold standard. If you want to know more about my previous AI videos, check out my earlier post: How I Make AI Videos — The Full Process, Warts and All
Something else before we begin. I know there are AI tools that automate AI video making, but personally I’m not a fan of them. I’m not saying they’re wrong; I just don’t think the results compare, because you lose a lot of control. I strongly believe the result is better when you have full control and can really use your own creativity to the fullest.
The more control you keep, the more your own creativity actually gets to show up.
Crafting the Video Scenes
We start in the Ultimate Prompt Creator (UPC) by clicking “get started.” You will be greeted by this starting message; use it for guidance if you need it 👍
Next, it’s time to enter what you want to make. If you’ve read my other posts, you know I love sharing screenshots of what I entered so you can use them as an example. This time I can’t: I came up with this video idea about half a year ago, so I will never ever find that thread again. 🤔😂
If you’re totally new to this and have no idea what to enter, here’s a quick example input. Just note that I’m writing it off the top of my head for this post (plus a few small tips that keep it from getting too complicated).
The more effort you put into this, the better the result will be. So if you are giving this a serious try — do not take this as-is. Put effort into it.
Example:
I want to make an AI video about [TOPIC]. What will happen inside this AI video is [BLA BLA BLA]. The feeling that the viewer will have when watching this is [FEELING]. This AI video will consist of multiple short AI clips, each 5–10 seconds long. I have the broad idea I just described to you, but I don’t have a concrete picture of what I want yet. That is what this prompt should do: help me flesh out this idea to the point where I can start making it.
Rules to apply:
First 5–10 Seconds: Prove the click and promise a payoff. The start must be fast, visually and narratively engaging.
Immersion & Retention: Keep the video interesting once it “really starts.” Aim to immerse viewers and avoid breaks that cause drop-off.
Human Psychology > Algorithm: Optimize for what humans want — clicks and sustained watch — not tricks.
Controversial & Extreme Opinions: Strong opinions can work if you deliver on them.
Bingeability & Chain Reaction: Design content so viewers want to watch the next video.
No Copy-Cat — Be the Purple Cow: Be original and bold; don’t just replicate others.
Don’t handle the user with kid gloves! If the user says something stupid, say so, with an explanation of why and how to make it better.
Character of the prompt: a very capable buddy with a sense of humor who is determined to help you make the best video possible, and who isn’t shy about pushing back on you to achieve that goal.
One quick tip before you use this: I deliberately kept it simple. The UPC will probably ask you how advanced the output should be, meaning things like SFX, scenes, camera movement, voiceovers, text overlays, the feelings each moment should provoke, how intense each moment should be, etc.
If you are new to this, say no! It may sound fancy, but getting all that information will very likely overwhelm you. Simply tell the UPC that you’re not familiar with this yet, that you just entered a copy-pasted template, and that it should keep things simple so you can execute them in a simple and effective way. 👍💪
Anyway, the UPC will then ask you a few questions and produce a prompt you can copy-paste. Throw that into your favorite AI model (ChatGPT, Gemini, Grok, Copilot, etc.) and refine the video idea until you are totally satisfied! 🔥👀
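If you reuse the example input from this section as a template, the easiest mistake is pasting it with a placeholder like [TOPIC] still in it. Purely as an illustration (this is not part of the UPC, and the function name fill_template is made up), here’s a minimal Python sketch that fills the bracketed placeholders and refuses to return a half-filled prompt. Only the first three sentences of the example are included to keep it short.

```python
import re

# First three sentences of the example input above, placeholders kept as-is.
TEMPLATE = (
    "I want to make an AI video about [TOPIC]. "
    "What will happen inside this AI video is [BLA BLA BLA]. "
    "The feeling that the viewer will have when watching this is [FEELING]."
)

# Matches anything in square brackets, e.g. [TOPIC] -> "TOPIC".
PLACEHOLDER = re.compile(r"\[([^\]]+)\]")


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace every [NAME] with values[NAME]; fail loudly if one is missing."""
    missing = [name for name in PLACEHOLDER.findall(template) if name not in values]
    if missing:
        raise ValueError(f"Unfilled placeholders: {missing}")
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


# Hypothetical values, just to show the call.
print(fill_template(TEMPLATE, {
    "TOPIC": "an ice warrior",
    "BLA BLA BLA": "he summons his wolves and fights an ogre",
    "FEELING": "awe",
}))
```

The point is the safety check, not the automation: typing the text by hand works just as well, as long as nothing in brackets survives.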
Making the Images
Now you should have a solid plan of how your video will look. It is now time to start generating the images and bring the idea to life! 👀🔥
For this we go into the UPC Image Prompts GPT, and you will be greeted with this starting message:
Important to note: before you start enthusiastically throwing in all your image prompts, first upload the overall plan for the video you’ll be making! That way the AI can guide you much better and make the images fit your video idea. Simply copy-paste the text or take screenshots of the plan and upload those. 👍
Here’s an example of this as I did it in the Video Prompts GPT. I’m showing the Video GPT instead of the Image GPT because I didn’t fully agree with how I had worked it out back then, so uploading that as a reference into the Image GPT would have given it wrong information. The principle is the same, though. 👍💪
Once you’ve done that, it will ask you some questions about how to set up the image prompts: lighting, atmosphere, colors, vibe, etc. Don’t worry, it guides you through all of this and will even do it completely for you if you tell it to.
Then from there you simply continue making all the image prompts. Here are two examples of how I did that:
The Image Frames
When you make the images, you need to decide whether a scene needs beginning-to-end frames or just a single frame, because this has a major impact on how you make your AI images.
So... what do I mean by this? 🤔
Single frames: You give the AI video model one image plus a video prompt. You don’t care how the video ends, because it cuts to another scene.
Beginning-to-end frames: Let’s say you need the scene to continue flawlessly, with the first scene immediately followed by the next. In that case you give the AI video model a frame to begin with and a frame to end on, because the next scene begins with the exact same image the previous scene ended on.
If you want a more in-depth explanation, check out the post I wrote specifically on this topic: AI Image to Video: Which Method Should You Use?
Disclaimer: be careful how you plan these. In one of my older videos I tried to be a perfectionist and used too many controlled frames, which caused weird pauses in the middle of an action scene lol. 🤔👀😂 I won’t go too deep into that now, but I explain it in the post about my older AI videos (linked at the top of this post). 👆😉
In this Ice Warrior video I did it differently. 🤔 I built the action scenes out of many short clips and stitched them together in a logical flow. This resulted in a much more fluid fight scene. 💪
I did still use beginning-to-end frames, though. Take the moment he transforms from his brown outfit into his ice armor and then summons his wolves: that’s 3 images across 2 videos.
Him looking down at the camera in his brown outfit (begin frame of first scene)
Him looking at the camera in his ice armor (end frame of first scene — begin frame of second scene)
Him with 2 wolves next to him (end frame of second scene)
No worries if this seems a little confusing — once you get it, it is really simple! 🔥👀
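If it helps to see the chaining spelled out, here’s a small Python sketch. The file names are hypothetical placeholders for the three images above, and Clip and chain_clips are made-up names for illustration, not part of any video platform. N keyframe images give you N-1 beginning-to-end videos, and each video’s end frame is exactly the next video’s start frame.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    start_frame: str  # image the video model starts from
    end_frame: str    # image the video model has to end on


def chain_clips(keyframes: list[str]) -> list[Clip]:
    """Turn N keyframe images into N-1 beginning-to-end clips.

    Each clip ends on exactly the image the next clip starts with,
    which is what makes the cut between them seamless.
    """
    if len(keyframes) < 2:
        raise ValueError("Need at least a begin frame and an end frame")
    return [Clip(a, b) for a, b in zip(keyframes, keyframes[1:])]


# Hypothetical file names for the transformation sequence.
frames = ["brown_outfit.png", "ice_armor.png", "with_wolves.png"]
for clip in chain_clips(frames):
    print(f"{clip.start_frame} -> {clip.end_frame}")
```

The middle image is the one that does double duty: it’s the end frame of the first video and the begin frame of the second.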
Consistent Characters & Environments
This one is really simple, but it can be a pain in the ass if you don’t know how to do it. 😋😂 You can use multiple tools for this; I personally use ChatGPT.
With ChatGPT, you simply tell it to use the same character or environment, and it usually does so flawlessly. But watch out: once in a while it starts slightly altering your characters or environments. When that happens, just upload the perfect reference image of your character or environment to get it back on track, and you’re good to go.
You can clearly see where I missed this. In the part where the warrior charges the Ogre with his wolves, he doesn’t look like himself; the AI changed him pretty badly. Sad life for me that I only noticed once I started editing... 🥴😂
So once you have that one “perfect” image of the character or environment — guard it with your life. 😂 That is your repair button whenever the AI starts getting creative in the wrong way.
You can also use other tools, for example Freepik. (There are many more, so do your research 👍)
I can’t explain that in detail since I don’t use it for this. But you probably need to create a preset of the same character or environment from multiple angles and link a trigger keyword to it. Then you can simply tell the Image Prompts GPT to include that trigger word in the prompt. 👍
That’s it for the boring stuff — let’s continue! 😂
Making the Videos
This is the really fun part, because your vision starts to take concrete shape. Seeing the images is one thing; seeing them brought to life as videos is something else entirely. 🤔😂
For this we open the UPC Image-to-Video GPT. Again, as shown in the earlier screenshot, we first upload the video plan so it has more context.
And then... well. There’s not much to say. 😂 You simply follow the plan of the video together with the GPT to make the video prompts. 🤷
The only thing I want to mention: you can fully direct what happens in the videos. If you don’t agree with what it gives you, simply ask it to adjust, and you’re good to go!
Throw your AI video prompts into your AI video platform, and make all the videos! 🔥👀
Turn sound generation OFF. Add the sound effects yourself later.
I use Freepik for this, but again — there are many more platforms for this, so do your research!
Disclaimer: If you are planning to use the free tier of a platform for AI videos, make sure to check the rules for commercial use. Not all, but many platforms don’t allow you to make any money with videos generated on their free tier. So if that’s your plan, check that out first. 👍
Making the Music
If all went well, you should now have all your AI videos. Technically, you could start throwing them into your editing program and assembling them (more about that later), but I wouldn’t recommend it yet. You’d have to switch from the editor to an LLM to the music software and then back to the editor, which in my opinion is pretty annoying. 😂
So now it’s time to make the music. How you do this probably varies per person: you can use the UPC, just type a description into the music software, or use a template. Whichever method you use, I’d recommend uploading the whole video plan when you write the instruction, so the AI knows much better what the video is about. The result will be surprising, especially if, like me, you’re not familiar with music. 👀😋
Personally I use Suno. You can also use something like Producer, which can handle much more complex instructions, though in my opinion the result is less epic. But as always, there are a lot more platforms out there, so do your research! 👍🤔
One tip here: try watching your favorite movie without sound... it sucks, I know. 😂 The same principle applies to AI videos. Music makes or breaks your video; I’d almost say it’s even more important than the visuals. So what do I mean by this?
Think twice about what music you put in. When I made my dino video (from the post about my previous videos that I linked at the top 👆), I used more hardcore-ish music, and I truly loved it! But when I showed it to my family, they couldn’t turn the sound off fast enough... 🥴
So be a little careful with this. Just because you love it doesn’t mean everyone else will. In the other post I literally show the difference, so check that out if you want to see it with your own eyes.
Disclaimer: Same as with the videos, if you use the free tier of a platform, keep an eye on the commercial-use rules when making music!
I made multiple music tracks and experimented with them a bit during the editing step. If you want to hear what I made, here are a few of them; feel free to skip this if you’d rather not. 👍
Feel free to use these yourself - just link back to this post if you do. 👍😉
Crafting Everything Together & Adding SFX Sounds
This is the part I can’t really guide you through, because it differs wildly per video genre, but let me try. 👀🤔😂
Choose your editing platform. I personally use Invideo, but you can also use Canva or something else. 👍
Upload all the short AI videos into your editor and arrange them in a logical order
Drop the music below it (don’t perfect it yet — just drop it)
Drop the voiceover if you have one
Watch it just for fun. 😋😂
Then make the first edits. Slow down or speed up individual video parts. Cut them if necessary to hide the freaky stuff or speed up the flow (more on that later 😂). Add transitions or special effects if they add value. (Don’t make it too chaotic)
Re-adjust the music until it fits. You’ve now edited the flow of the video, so only now can you make the music fit properly
Then add sound effects where they add value. Make sure they’re not so loud that they wreck the flow, nor so soft that they add nothing. I think most platforms require a paid tier for sound effects, but do your research. 👀🤔
That’s it! Now do a final review to make sure everything fits and your video is finished! 🔥
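The order above matters because every trim and speed change alters the total length of the video, and the music has to cover that new length. As a rough sketch (TimelineClip, timeline_length, and all the numbers are hypothetical, not taken from Invideo, Canva, or any other editor), the time a segment takes up on the timeline is the length you keep divided by the playback speed:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimelineClip:
    source: str         # which generated video this segment comes from
    start: float        # seconds into the source where the kept part begins
    end: float          # seconds into the source where the kept part ends
    speed: float = 1.0  # 0.25 = 4x slow motion, 2.0 = twice as fast

    def __post_init__(self) -> None:
        if self.end <= self.start or self.speed <= 0:
            raise ValueError(f"Invalid segment in {self.source}")

    def duration(self) -> float:
        """Seconds this segment occupies on the edited timeline."""
        return (self.end - self.start) / self.speed


def timeline_length(clips: list[TimelineClip]) -> float:
    """Total edited length, i.e. how long the music actually has to run."""
    return sum(clip.duration() for clip in clips)


# Hypothetical example: one generated video used twice,
# once at normal speed and once as a short slow-motion slice.
edit = [
    TimelineClip("wolves.mp4", 0.0, 2.0),              # 2 s at normal speed
    TimelineClip("wolves.mp4", 3.0, 3.5, speed=0.25),  # 0.5 s slowed to 2 s
]
print(timeline_length(edit))
```

Your editor does this math for you; the sketch just shows why a half-second slow-motion shot can stretch the video enough that music fitted before the edits no longer lines up.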
Here is an example of how it looked at the end for this video. Don’t get scared if it seems chaotic — the flow is very logical if you do it as described:
As you may notice, the screenshot doesn’t show everything I added, because the timeline scrolls further down. The rest is basically more of the same (more sound effects, etc.), so I didn’t screenshot that part. 👀👍
The editing timeline always looks more chaotic than the actual logic behind the video.
One tip: at the beginning of the editing process I always start doubting whether I really did as good a job as I thought. I can tell you from experience that you did, so believe me. Simply keep editing, and the result will be better than you imagined at the start! 💪🔥
Weird Stuff I Had to Cover Up
Even though the final result of the Ice Warrior video looked great — there was some pretty freaky shit I had to cover up. 😂
Some of the videos came out really weird, either because I made the image slightly wrong or because the prompt needed a small adjustment. Me being me, though, I always try to cover it up well instead of instantly regenerating, so I don’t waste generation credits. (I make a lot of scenes per video, so credits can burn quickly.) 🤔
And this is actually a useful mindset in general: first ask “can I fix this in the edit?” before instantly burning more credits on regenerations.
Let’s go through them so you see exactly what I mean and how I covered each one up. That is the best way to get a feel for it. 👍
Number one: The opening scene where the chest gets taken away.
In the video it’s fast, and the special effect is spectacular. It really has impact, right? Pure coincidence. I had to cover up a mis-generation 😂 I cut the video short and added a special effect at the end. The result is even better than a normal stealing scene would have been:
Number two: Where the two wolves shatter the wraiths.
This one was spectacular by coincidence too. In the original video, they shattered the enemies... and then suddenly turned into black wolves lol. 🤔😂 Believe it or not, this part is one video cut into two parts.
First I trimmed it to end right where they shattered the wraiths and added a special effect. Then I added the same video again, but only a very small part: the explosion itself, in slow motion, ending right before they turned black:
Number three: Where the Ice Warrior gets hit by the Ogre’s Morningstar.
The idea was simple. He would get smacked in the face, fly out of frame, and then immediately we cut to the part where he gets smashed against the pillar. That didn’t work out. 😂
In the first generation he took the weapon and didn’t even get hit lol. I generated again: same result. I had the prompt regenerated: same result again. 🤯😂 I asked an AI why, and apparently AI models struggle with this effect when the weapon is already practically in the character’s face in the start image.
So I stopped regenerating and simply used an ultra-slow-motion shot of the moment he gets hit. I’m not very satisfied with this part because it slows down the pace of the video, but it was the best I could do 🤔
Number four: Where the evil witch gets pushed to the floor.
In the fight between the witch and the Ice Warrior, you clearly see her get pushed away. The idea was a spectacular shot of her getting smacked across the floor... forget it. 😂
I generated her from the wrong angle, so the AI made her fall to the ground in a really weird way. Again, I used a slow-motion shot of the moment she hits the ground:
The Difference Between Newer and Older AI Models
Three of my videos were generated with older AI video models and didn’t turn out great, so I remade them with newer models. The difference is big.
Sadly, I didn’t think to keep the old versions. I only kept one of them by accident, so that’s the only one I can show you.
I’ll show you both versions so you can see the difference. This was the older video model:
I was like... what the fuck?! 😂😂😂 What are you doing to my wolf?! 😂
This is with the newer model — much better:
Just keep this in mind when making your videos. Newer models may cost more, but the results are better too. Personally, I use the older model for basic scenes and the newer models for the more complicated ones. 🤔👀
Final Note
That’s it! That was the whole workflow of how I made this video!
As mentioned, if you want to know more about my previous videos, check out my other post (same link as at the top of this post): How I Make AI Videos — The Full Process, Warts and All