* Based on the article “Turning Design Mockups Into Code With Deep Learning” by Emil Wallner, published on the FloydHub blog: https://blog.floydhub.com/turning-design-mockups-into-code-with-deep-learning/
Within the next few years, deep learning will strongly affect front-end development, i.e. creating HTML / CSS / JS mockups. It should speed up prototyping and make building software easier. Currently, the biggest barrier to automating front-end development is computing power. Despite that, we can start exploring front-end automation right now, using deep learning algorithms together with synthesized training data. This article describes how a neural network is trained to code a basic HTML and CSS page based on a mock-up image of the design.
The description is based on Emil Wallner's tutorial published on the FloydHub blog. FloydHub is a deep learning training platform: installing and running the first model takes about 10 minutes, and it is a good option for running models on a GPU in the cloud.
Emil Wallner's project consists of three steps:
- Feeding the design image into a trained neural network
- Converting the image to HTML markup with the neural network
- Rendering the result
The neural network is built in three iterations.
At the very beginning, you prepare a minimal version to get a hang of the moving parts. Next comes an HTML version, which focuses on automating all the steps and explains the layers of the neural network. The final version is the Bootstrap one: a model that can generalize, and which explores the LSTM layer. The code is written in Python and Keras, a framework built on top of TensorFlow. If you are new to deep learning, it is advisable to get to know Python first and to gain some practice and knowledge of backpropagation and convolutional neural networks.
The beginning – training the neural network
The goal of the project is to build a neural network that generates HTML / CSS markup corresponding to a screenshot. To train the network, you give it screenshots together with the matching HTML code. It learns by predicting the matching HTML tags one by one: when predicting the next tag, it receives the screenshot as well as all the correct tags up to that point. Creating a model that predicts word by word is the most common approach, but not the only one.
First, you need to focus on the input and output of the neural network.
Let's say we are training the network to predict the phrase “I can code.” When it receives “I”, it predicts “can”. Next it receives “I can” and predicts “code”. At each step the network receives all the previous words and only has to predict the next one.
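To make the input/output pairs concrete, here is a tiny illustrative sketch (the sentence and the start/end markers are placeholders, not values from the original tutorial) of how one phrase is expanded into next-word training examples:

```python
# Illustrative only: how "I can code" becomes next-word training pairs.
sentence = ["<start>", "I", "can", "code", "<end>"]

pairs = []
for i in range(1, len(sentence)):
    context = sentence[:i]   # everything the network has seen so far
    target = sentence[i]     # the word it has to predict next
    pairs.append((context, target))

for context, target in pairs:
    print(context, "->", target)
# ['<start>'] -> I
# ['<start>', 'I'] -> can
# ['<start>', 'I', 'can'] -> code
# ['<start>', 'I', 'can', 'code'] -> <end>
```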
The neural network creates features from the data: features that link the input data with the output data. It has to build representations to understand what is in each screenshot and what HTML syntax it has predicted so far. This is the knowledge it uses to predict the next tag.
Using the trained model in the real world is similar to training it. The markup is generated one token at a time, with the same screenshot fed in each time. Instead of receiving the correct HTML tags, the model receives the tags it has generated so far and then predicts the next one. Prediction starts with a “start marker” and stops when the model predicts the “end marker” or reaches a maximum length.
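A minimal sketch of such a sampling loop, assuming a trained Keras model, a fitted tokenizer and pre-computed image features (all names, the marker tokens and the maximum length are placeholders):

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def generate_markup(model, tokenizer, image_features, max_length=150):
    """Greedy decoding: start from the start marker and predict tag by tag."""
    text = "<start>"
    for _ in range(max_length):
        # Encode the tags generated so far and pad them to a fixed length
        sequence = tokenizer.texts_to_sequences([text])[0]
        sequence = pad_sequences([sequence], maxlen=max_length)
        # The same screenshot features are passed in at every step
        probabilities = model.predict([image_features, sequence], verbose=0)
        next_id = int(np.argmax(probabilities))
        next_word = tokenizer.index_word.get(next_id, "<end>")
        text += " " + next_word
        if next_word == "<end>":   # stop at the end marker
            break
    return text
```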
Hello World version
First we build the Hello World version. We feed the neural network a screenshot of a page displaying “Hello World!” and teach it to generate the corresponding markup. The network can predict one character, word or sentence at a time; the character-level version requires a smaller vocabulary, but it also constrains the network.
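A minimal sketch of such a Hello World model in Keras (the image size, layer sizes and vocabulary size are placeholder values, not the ones from the original tutorial): the screenshot is flattened into a feature vector, repeated for every time step, concatenated with an embedding of the characters generated so far, and an LSTM predicts the next character.

```python
from tensorflow.keras.layers import (Input, Dense, LSTM, Embedding,
                                     RepeatVector, Flatten, concatenate)
from tensorflow.keras.models import Model

vocab_size = 20          # tiny character vocabulary for the "Hello World!" markup
max_length = 48          # longest tag sequence we expect

# Image branch: flatten the screenshot and repeat its features per time step
image_input = Input(shape=(64, 64, 3))
image_features = Dense(64, activation="relu")(Flatten()(image_input))
image_features = RepeatVector(max_length)(image_features)

# Text branch: embed the characters generated so far
text_input = Input(shape=(max_length,))
text_features = Embedding(vocab_size, 32)(text_input)

# Combine both branches and predict the next character
decoder = concatenate([image_features, text_features])
decoder = LSTM(128)(decoder)
output = Dense(vocab_size, activation="softmax")(decoder)

model = Model(inputs=[image_input, text_input], outputs=output)
model.compile(loss="categorical_crossentropy", optimizer="adam")
```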
Watch out for:
- Building the first working version before collecting data.
- Working with terabyte-sized data requires good hardware or a lot of patience.
- Nothing makes sense before you understand the input and output data.
HTML version
This version automates many of the steps from the Hello World model. It focuses on creating a scalable implementation and on the moving parts inside the neural network. This version will not be able to predict HTML from arbitrary websites, but it is still a great setup for studying the dynamics of the problem. There are two main parts. First, the encoder: here we create image features and features for the previous markup. Features are the building blocks the network creates in order to connect the design mock-ups to the markup. At the end of the encoder, we attach the image features to each word of the previous markup. The decoder then combines the design and markup features and creates a feature for the next tag. This feature is run through a fully connected neural network to predict the next tag.
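A sketch of this encoder/decoder split in Keras; the layer sizes, image resolution and sequence length are illustrative, not the values used in the article:

```python
from tensorflow.keras.layers import (Input, Dense, LSTM, Embedding, RepeatVector,
                                     concatenate, Conv2D, MaxPooling2D, Flatten)
from tensorflow.keras.models import Model

vocab_size = 300
max_length = 48

# Encoder, image side: a small CNN turns the screenshot into a feature vector,
# which is repeated so it can be attached to every word of the previous markup.
image_input = Input(shape=(256, 256, 3))
x = Conv2D(16, (3, 3), strides=2, activation="relu")(image_input)
x = Conv2D(32, (3, 3), strides=2, activation="relu")(x)
x = MaxPooling2D()(x)
x = Flatten()(x)
image_features = Dense(128, activation="relu")(x)
image_features = RepeatVector(max_length)(image_features)

# Encoder, markup side: an LSTM builds features from the tags generated so far.
markup_input = Input(shape=(max_length,))
markup_features = Embedding(vocab_size, 64)(markup_input)
markup_features = LSTM(128, return_sequences=True)(markup_features)

# Decoder: image and markup features are concatenated at every time step,
# passed through another LSTM, and a fully connected softmax predicts the next tag.
decoder = concatenate([image_features, markup_features])
decoder = LSTM(256)(decoder)
output = Dense(vocab_size, activation="softmax")(decoder)

model = Model(inputs=[image_input, markup_input], outputs=output)
model.compile(loss="categorical_crossentropy", optimizer="adam")
```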
Watch out for:
- Building vocabulary from scratch is much easier than narrowing down a huge vocabulary.
- Most libraries are created to analyze text documents instead of code.
- You can extract features using a model pre-trained on ImageNet (see the sketch below).
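For the last point, a short sketch of extracting image features with a model pre-trained on ImageNet, here VGG16 from Keras (the screenshot file name is a placeholder):

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing import image

# Load VGG16 without its classification head; average pooling gives one vector per image
feature_extractor = VGG16(weights="imagenet", include_top=False, pooling="avg")

img = image.load_img("screenshot.png", target_size=(224, 224))
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

features = feature_extractor.predict(x)   # shape (1, 512) with average pooling
```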
Bootstrap version
The final version uses a dataset of generated Bootstrap pages from the pix2code paper. Using Bootstrap lets us combine HTML and CSS and reduce the size of the vocabulary. We use the model to generate markup for screenshots it has not seen before, and we also look at how it builds its knowledge of screenshots and markup. Instead of training on the Bootstrap markup directly, 17 simplified tokens are used, which are then translated into HTML and CSS. The dataset includes 1,500 test screenshots and 250 validation images. Each screenshot contains 65 tokens on average, which gives 96,925 training examples.
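As a toy illustration of the idea (these token names and HTML snippets are made up, not the actual pix2code DSL or its compiler), the simplified tokens can be mapped to HTML/CSS roughly like this:

```python
# Toy illustration: mapping simplified layout tokens to HTML/Bootstrap snippets.
TOKEN_TO_HTML = {
    "header":       '<div class="header"></div>',
    "row":          '<div class="row"></div>',
    "btn-active":   '<button class="btn btn-primary">Label</button>',
    "btn-inactive": '<button class="btn btn-secondary">Label</button>',
    "text":         '<p>Lorem ipsum dolor sit amet.</p>',
}

def compile_tokens(tokens):
    """Translate a flat list of tokens into HTML, one snippet per token."""
    return "\n".join(TOKEN_TO_HTML.get(token, "") for token in tokens)

print(compile_tokens(["header", "row", "btn-active", "text"]))
```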
Watch out for:
- Understanding the weaknesses of models instead of testing random models.
- Use only pre-trained models if they are suitable.
- Plan for a slight deviation when you run your model on a remote server.
- Make sure you understand the functions of the library.
- Use lighter models during experiments.
The future of deep learning
Front-end development is an ideal area for applying deep learning: generating data is easy, and current deep learning algorithms can map most of the logic. In the near future, the most important factor will be building a scalable way to synthesize data. Then, step by step, you can add fonts, colours, words and animations.
So far, most of the progress has been in taking sketches and turning them into template applications. In less than two years, we will be able to sketch an application on paper and get the corresponding front-end in less than a second.