It is difficult to be perfect in any form of translation, and there is always information loss in translation. That loss leads to misunderstanding. Whenever we talk about translation, the most common example is language translation. For humans, one of the difficulties of translation is synonyms: there are many synonyms for one meaning, but each synonym carries its own nuance. For a computer, it is easy to misinterpret the intended meaning.
What about a series of translations? The first thing that comes to my mind is the Telephone Game, in which several people line up in a row and pass a message from the first person to the last, and the message at the end is almost always completely different from the one at the beginning.
Translation is difficult for us; how about for a computer? Machine learning has been the buzzword of recent years, and it has accomplished many tasks that we used to think only humans could do. However, we rarely see a machine learning model with 100% accuracy; a model may fail to generalize to new data or overfit its training dataset. Accordingly, something must be lost during the process. Machine learning itself can be treated as a process of translation, so I want to see how a computer would do in a series of machine learning translations.
Doing translation can be treated as the process of interpreting the target, and that interpretation is based on your knowledge and experience. If you ask different people to describe an image or an object, or to repeat a story told by others, the answers are always slightly different, since everyone interprets in their own way. Still, for something we all know, even if our descriptions are not 100% the same, we can reach a consensus on what it is. For computers, we give them a training dataset to serve as their knowledge and experience, but it is difficult to perfectly simulate human thinking and experience. That is why I made this project: to see how differently a computer and a human interpret.
The process starts from a random image, and the user describes the image in a sentence. The computer draws a sketch based on the description, comes up with a sentence from the sketch, and then generates a new image from that sentence. The user describes the new image, and the computer repeats the same process. The whole process runs for three rounds. At the end, the user sees the whole translation story and can trace how the computer interpreted each description into a sketch and then into a new image.
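The loop above can be sketched in code. This is a minimal, hypothetical outline: `draw_sketch`, `caption_sketch`, and `generate_image` are placeholder stand-ins for the real text-to-sketch, sketch-captioning, and text-to-image models, and only the three-round loop structure mirrors the project.

```python
def draw_sketch(description):
    # placeholder: a real system would run a text-to-sketch model here
    return f"sketch({description})"

def caption_sketch(sketch):
    # placeholder: a real system would run a sketch-captioning model here
    return f"caption({sketch})"

def generate_image(caption):
    # placeholder: a real system would run a text-to-image model here
    return f"image({caption})"

def run_translation(start_image, describe, rounds=3):
    """Run the translation game: in each round the user describes the
    current image, the computer draws a sketch from the description,
    captions the sketch, and generates a new image from the caption.
    Every intermediate step is recorded so the whole 'translation
    story' can be shown at the end."""
    story = [("image", start_image)]
    image = start_image
    for _ in range(rounds):
        description = describe(image)       # user writes a sentence
        sketch = draw_sketch(description)   # computer draws a sketch
        caption = caption_sketch(sketch)    # computer re-describes it
        image = generate_image(caption)     # computer makes a new image
        story += [("description", description), ("sketch", sketch),
                  ("caption", caption), ("image", image)]
    return story
```

With three rounds, the story contains the starting image plus four entries per round, so the user can walk through each translation step in order.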
There are some approaches that I tried but did not use in the end. I tried Detectron to detect objects in an image, but since I ultimately involve the user in the process, I don't need the computer to interpret the original image; the user describes it instead.
Another technique is SketchyGAN, which can generate an image from a sketch. However, it can currently only generate an image from a sketch containing a single object, and the sketches generated in my process contain multiple objects.