December 25, 2020
This is an article that was published on the “DeNA Advent Calendar 2020” for December 25th. To those of you who only read English, I’m sorry the links are mostly Japanese.

Hi, I’m @mazgi. In the last article, I wrote that we were renewing this blog. The blog was renewed in April 2020 and, thanks to you, readership is going up and everything is going well. The internet is flooded with announcements of new releases. However, few articles follow up on what happened to those releases afterwards. Therefore, as the person who was allotted the last day of the advent calendar, I’ll follow up on the results of the blog renewal and on whether it can be considered a success or a failure.

Renewal: What Changed?

Our blog was built ten years ago and powered by Movable Type (MT), which was the standard back then, and it was hosted on a Linux virtual machine on our on-prem infrastructure. However, technology has improved over these ten years, so nowadays hosting a static website such as a blog doesn’t require servers. Therefore, I decided, “I will not use servers anymore!” for this renewal. As a result, we designed a new blog
In this article we will explain how we handle the generation of annotated data for computer-vision-related machine learning at DeNA. We will focus mainly on how we solved our problem by creating our own annotation system, Nota, and how it integrates into the ML workflow. We will describe our current system and some of the decisions we made, as well as the challenges we had to solve to get to the current solution.
My name is Jonatan Alama, and I am a member of the Analytics Solution Engineering and the Machine Learning Engineering groups at DeNA. My team and I design, develop, and operate web applications and other solutions for data-related problems.
The problem of obtaining accurate data

In recent years there have been many advancements in AI systems, and many of them are related to the computer vision field. We can train computers to interpret and try to understand what they see. For the training phase, we use what we call “annotated images and videos”. Image annotation and image classification are done by humans to produce a data set that a computer can learn from through machine learning. Annotation consists of marking objects inside an image, normally using basic shapes, and then categorizing each marked object.
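To make the idea of "marking an object with a basic shape and categorizing it" concrete, here is a minimal Python sketch of what a single annotated image could look like as data. This is not Nota's actual data format, which the text does not describe; the class names `BoxAnnotation` and `AnnotatedImage`, the field names, and the file name `street.jpg` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BoxAnnotation:
    """One marked object: an axis-aligned rectangle plus a category label.

    Hypothetical structure for illustration; not Nota's real schema.
    """
    label: str     # category assigned by the human annotator
    x: float       # left edge, in pixels
    y: float       # top edge, in pixels
    width: float   # box width, in pixels
    height: float  # box height, in pixels

    def area(self) -> float:
        """Return the area of the box in square pixels."""
        return self.width * self.height


@dataclass
class AnnotatedImage:
    """An image reference together with the objects marked in it."""
    image_path: str
    annotations: List[BoxAnnotation] = field(default_factory=list)

    def labels(self) -> List[str]:
        """Return the distinct category labels in this image, sorted."""
        return sorted({a.label for a in self.annotations})


# A hypothetical image with two marked and categorized objects.
example = AnnotatedImage(
    image_path="street.jpg",
    annotations=[
        BoxAnnotation(label="car", x=40, y=120, width=200, height=90),
        BoxAnnotation(label="pedestrian", x=300, y=100, width=50, height=140),
    ],
)
```

A collection of such records, one per image, is the kind of data set a training process can consume. Rectangles are only one of the basic shapes an annotator might use; polygons or points would need their own record types.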