After two meetings with the team's supervisor, Dr. Dirk Schnieders, the team has settled on the project direction, which is generating comics from natural language descriptions.
Background
As creation needs a thorough understanding of the subject being created as well as techniques such as designing and drawing [2], it is difficult for people who have a great idea but no prior design experience to turn their ideas into real artwork. It is a common problem among online writers that they have finished their works but cannot create their own cover images due to unfamiliarity with design or drawing. Computers, however, have both the ability to understand natural language and deep knowledge in many fields, so generating images from natural language descriptions, which can be regarded as a form of creation, is a feasible task for them and can help people who have the passion to create but lack the skills.
There has been an emergent trend in computer vision combined with machine learning related to this possible solution: constructing scenes from sentence descriptions based on provided references, known as Text-to-Image synthesis [5]. It requires natural language processing as well as image processing. Additionally, a bridge that fuses the vision and language modalities is indispensable to good performance. Some recent works utilizing methods based on Generative Adversarial Networks (GANs) have presented admirable results.
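To make this architecture concrete, below is a minimal sketch of a text-conditioned GAN in PyTorch, assuming precomputed sentence embeddings of size 256 (e.g. from a pretrained text encoder) and 64x64 RGB output images; all layer sizes and dimensions are illustrative assumptions, not the design of any specific published model.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps (noise, sentence embedding) to a 64x64 RGB image."""
    def __init__(self, noise_dim=100, text_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            # Project the concatenated (noise, text) vector to a 4x4 feature map
            nn.ConvTranspose2d(noise_dim + text_dim, 512, 4, 1, 0), nn.BatchNorm2d(512), nn.ReLU(True),
            nn.ConvTranspose2d(512, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(True),
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),  # 64x64 RGB output
        )

    def forward(self, noise, text_emb):
        # Condition generation on the sentence embedding by concatenation
        z = torch.cat([noise, text_emb], dim=1).unsqueeze(-1).unsqueeze(-1)
        return self.net(z)

class Discriminator(nn.Module):
    """Judges real/fake jointly from image features and the text embedding."""
    def __init__(self, text_dim=256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2, True),
            nn.Conv2d(128, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.LeakyReLU(0.2, True),
            nn.Conv2d(256, 512, 4, 2, 1), nn.BatchNorm2d(512), nn.LeakyReLU(0.2, True),
        )
        self.classifier = nn.Sequential(
            nn.Conv2d(512 + text_dim, 1, 4, 1, 0), nn.Sigmoid(),
        )

    def forward(self, image, text_emb):
        h = self.features(image)                                  # (B, 512, 4, 4)
        t = text_emb.unsqueeze(-1).unsqueeze(-1).expand(-1, -1, 4, 4)
        return self.classifier(torch.cat([h, t], dim=1)).view(-1)
```

The sentence embedding acts as the "bridge" between the language and vision modalities: both the generator and the discriminator receive it, so the adversarial game rewards images that not only look realistic but also match the description.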
However, current works regarding GANs analyze sentences on a general basis; problems may arise when a word-level detail determines a requirement of the picture [5], and as a result the generated picture may fail to fit the description well. Additionally, recent studies have mainly focused on simple objects such as birds, which do not require modeling the objects' interactions with one another or attending to the objects' actions. As a consequence, the generation of complex scenes, such as those in the COCO dataset [6], has not been well developed in current works.
Besides, there is still room for improvement in image layout control. Most models based on Generative Adversarial Networks (GANs) perform well on one-object-in-the-center or single-domain image problems, but cannot achieve the expected results when the image to be generated contains multiple objects that have complicated relationships with each other and occupy different locations in the image [7].
A full version of the proposal and the references is available at: First Proposal