A new frontier of AI: from text to video in one click
In recent days, OpenAI has unveiled a new artificial intelligence application: Sora, whose goal is to generate videos from textual descriptions. For now, the app is a research product and is being made available only to a group of creators and security experts.
As announced on OpenAI’s website: “Sora is an AI model that can create realistic and imaginative scenes from textual instructions.” Several videos generated by Sora are presented on the site. The application can generate videos up to one minute long while maintaining visual quality and adherence to the user’s request. Sora can produce complex scenes with multiple characters, specific types of motion, and accurate details of both subject and background. The model understands not only what the user asks for, but also how those things exist in the physical world. It has weaknesses, however: it may struggle to accurately simulate the physics of a complex scene. For example, it might show a person biting into a cookie, yet afterwards the cookie may bear no bite mark. It can also confuse spatial details, such as mixing up left and right.
On the safety side, before making the application available, OpenAI is putting Sora through several rounds of testing, in addition to applying the existing safety methods already used for DALL-E 3. Policymakers, educators, and artists will then be involved to understand concerns and identify positive uses of this new technology.
Sora, like GPT, uses a transformer architecture. Videos are represented as collections of smaller units of data called patches, each akin to a token in GPT. In addition to generating a video solely from textual instructions, the model can take an existing still image and generate a video from it.
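To make the “patches” idea more concrete, here is a minimal sketch in Python using PyTorch: a small random video tensor is cut into spacetime patches, each patch is flattened into a vector, and the resulting sequence is fed to a standard transformer encoder, much as a sequence of text tokens would be. All dimensions, and the helper `video_to_patches`, are illustrative assumptions for this sketch; OpenAI has not released Sora’s actual implementation.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: cut a video into spacetime patches and feed the
# patch sequence to a transformer encoder. Sizes are made up, not Sora's.

def video_to_patches(video, pt=2, ph=16, pw=16):
    """Split a (frames, channels, height, width) tensor into flat patches."""
    f, c, h, w = video.shape
    # Carve out blocks of pt frames x ph x pw pixels along each axis.
    blocks = video.unfold(0, pt, pt).unfold(2, ph, ph).unfold(3, pw, pw)
    # blocks: (f/pt, c, h/ph, w/pw, pt, ph, pw) -> one flat vector per patch.
    return blocks.permute(0, 2, 3, 1, 4, 5, 6).reshape(-1, c * pt * ph * pw)

video = torch.randn(16, 3, 64, 64)         # a tiny random 16-frame "clip"
tokens = video_to_patches(video)           # (128 patches, 1536 values each)

embed = nn.Linear(tokens.shape[1], 256)    # project each patch to model width
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True),
    num_layers=2,
)
out = encoder(embed(tokens).unsqueeze(0))  # (1, 128, 256): one vector per patch
print(out.shape)
```

A real text-to-video system adds far more than this, of course (among other things, training the transformer to denoise the patches and conditioning it on the text prompt), but the sketch shows why the same architecture that processes sequences of text tokens can also process video: once a clip is cut into patches, it is just another sequence.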
Here too, we can see how far artificial intelligence has come. Naturally, there will be criticism of it, especially over the possibility that it could overshadow the creators working in this field.
Written by Sara Pia Votta