Google has joined the trend toward generative neural networks. Its new system, Imagen Video, creates videos from text descriptions at fairly high quality.
The system can create videos up to 5 seconds long at a resolution of 1280 × 768 pixels. For now, the company is neither publishing the source code nor opening access to the system, in order to prevent it from being used to generate shocking or obscene content.
The neural network was trained on tens of millions of images, videos and text descriptions. Given a text prompt, the system first generates a rough 16-frame video at a resolution of 24 × 48 pixels and a rate of 3 frames per second. A cascade of further models then raises the resolution to 1280 × 768 pixels and the frame rate to 24 frames per second.
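
The arithmetic behind the clip length can be checked with a few lines of Python. This is only an illustrative sketch, not Google's code: the function names `clip_duration` and `temporal_upsample` are made up for this example, and the upsampling factor of 2 is an assumed value chosen to show the principle. It shows that 16 frames at 3 frames per second last a little over 5 seconds, and that multiplying the frame count and the frame rate by the same factor leaves the duration unchanged.

```python
from fractions import Fraction
from typing import Tuple


def clip_duration(frames: int, fps: int) -> Fraction:
    """Return the playback length in seconds of `frames` frames shown at `fps`."""
    if frames <= 0 or fps <= 0:
        raise ValueError("frames and fps must be positive")
    return Fraction(frames, fps)


def temporal_upsample(frames: int, fps: int, factor: int) -> Tuple[int, int]:
    """Return (frames, fps) after raising the frame rate by an integer `factor`.

    Hypothetical helper: it only models the bookkeeping of adding
    in-between frames, not how a network would actually generate them.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return frames * factor, fps * factor


# Figures for the initial low-resolution video, taken from the article.
base_frames, base_fps = 16, 3
base_seconds = clip_duration(base_frames, base_fps)
print(f"{float(base_seconds):.2f} s")  # 5.33 s

# Assumed factor of 2, for illustration only: more frames, same duration.
up_frames, up_fps = temporal_upsample(base_frames, base_fps, factor=2)
print(up_frames, up_fps)  # 32 6
print(clip_duration(up_frames, up_fps) == base_seconds)  # True
```

Exact fractions are used instead of floating-point numbers so that the equality check on durations is not affected by rounding.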