Implement a supervisory system to teach, monitor and guide workers on assembly lines using deep learning technology
The real-life motivation for our study is to implement a supervisory system that teaches, monitors and guides workers on assembly lines. Robot systems commonly learn from images annotated with category labels and bounding boxes, an approach that has been well explored in supervised object detection. However, annotating images with precise bounding boxes takes considerable effort, which makes it infeasible in this scenario. Demonstration videos with phase-based labels, by contrast, are more accessible and more common in this industry.
In light of that, our work makes use of the implicit temporal information in videos instead of focusing only on image-level spatial information. Inspired by works that use temporal approaches [8, 6], we feed videos with only phase-based object labels to the networks. The intuition is to let implicit temporal information serve as a free supervision signal [9, 5] that compensates for the absence of localization ground truth. Ideally, after being trained on simply labeled demonstration videos, the system can recognize all assembly phases and localize the objects of interest in each phase, so that it can guide and monitor assembly work.
In a network architecture such as ResNet, adding a global average pooling layer before the output layer preserves the spatial feature maps detected by the earlier convolutional layers. Weighting those feature maps by the output-layer weights of a given class generates a heat map, known as a class activation map (CAM), in which distinct colors show which image regions the network used for discrimination.
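The heat-map computation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's implementation: the function names, the array shapes and the random stand-in features are all assumptions made for the example, and a real system would take `features` and `fc_weights` from a trained network.

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Weighted sum of the last conv feature maps for one class.

    features:   (C, H, W) feature maps before global average pooling (assumed shape)
    fc_weights: (num_classes, C) weights of the output layer (assumed shape)
    """
    # Contract the channel axis: sum_c w[class_idx, c] * features[c]
    return np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))

def normalize_heatmap(cam):
    """Rescale a map to [0, 1] so it can be rendered with a color map."""
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam

# Random stand-ins for a trained network's activations and weights.
rng = np.random.default_rng(0)
features = rng.random((8, 7, 7))
fc_weights = rng.standard_normal((3, 8))

cam = class_activation_map(features, fc_weights, class_idx=1)
heatmap = normalize_heatmap(cam)

# Global average pooling followed by the linear output layer (bias omitted)
# gives the class score, which equals the spatial mean of the raw map.
logit = fc_weights[1] @ features.mean(axis=(1, 2))
```

Because pooling and the output layer are both linear, the class score is the spatial average of the unnormalized map, which is why the map can be read as a per-location contribution to the prediction.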
The CCT model is a free supervision technique. The deep feature space embedded by φ can be used to propagate masks specified in the first frame throughout the video. We propose to use the localization information generated by CAMs as the mask for propagation. In this way, connectivity between neighboring frames can be enforced through time cycle consistency.
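As a rough sketch of the propagation idea, a mask can be carried from one frame to the next through a softmax affinity between per-location features. Everything below is an assumption made for illustration: the helper names, the temperature value, and the toy features (orthonormal vectors standing in for the output of φ, with the second frame being a shuffled copy of the first). The initial mask stands in for a thresholded CAM.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def affinity(feat_src, feat_dst, temperature=0.07):
    """Row i is a distribution over source locations for target location i.

    feat_src, feat_dst: (N, D) L2-normalized features, one row per location.
    """
    return softmax(feat_dst @ feat_src.T / temperature, axis=1)

def propagate(mask_src, feat_src, feat_dst):
    """Carry a soft mask from the source frame to the target frame."""
    return affinity(feat_src, feat_dst) @ mask_src

# Toy data: 16 locations with orthonormal features; frame 1 is frame 0 shuffled.
rng = np.random.default_rng(0)
feat0, _ = np.linalg.qr(rng.standard_normal((16, 16)))
perm = rng.permutation(16)
feat1 = feat0[perm]

mask0 = np.zeros(16)
mask0[:4] = 1.0  # stand-in for a thresholded CAM on the first frame

mask1 = propagate(mask0, feat0, feat1)      # forward in time
mask_back = propagate(mask1, feat1, feat0)  # backward in time

# Cycle consistency: tracking forward and then backward should recover the start.
cycle_error = np.abs(mask_back - mask0).max()
```

In this toy setting the cycle error is close to zero because the features match exactly across frames. With learned features, a residual of this kind is the sort of quantity a cycle-consistency objective would penalize.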
We will train the model on a self-collected dataset containing videos that demonstrate the procedure of assembling a hard drive, a power supply and a CD-ROM drive into a computer case. Demonstration videos will be labelled with object categories on a per-phase basis.
We plan to compare the localization accuracy of our system against models that rely on CAMs. We also propose to test accuracy in different assembly scenarios. For instance, we will experiment with different manufacturing tasks and distinct working backgrounds to examine how accuracy changes.
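The evaluation metric is not fixed in this plan. One common option, shown here purely as an assumption, is to threshold the heat map into a bounding box and score it by intersection over union (IoU) against a small set of hand-annotated test boxes. The function names, the 0.5 threshold and the toy heat map are hypothetical.

```python
import numpy as np

def heatmap_to_box(heatmap, threshold=0.5):
    """Tightest box (x1, y1, x2, y2) around cells at or above the threshold."""
    ys, xs = np.nonzero(heatmap >= threshold)
    if ys.size == 0:
        return None  # nothing exceeded the threshold
    return (int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1)

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

# Toy heat map with one activated rectangle (rows 2..5, columns 3..7).
heatmap = np.zeros((10, 10))
heatmap[2:6, 3:8] = 0.9

pred_box = heatmap_to_box(heatmap)
gt_box = (3, 2, 8, 7)  # hypothetical hand-annotated box
score = iou(pred_box, gt_box)
```

A localization is often counted as correct when the IoU exceeds a fixed cutoff such as 0.5, which would give a per-scenario accuracy figure for the comparisons described above.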
To be released
● Milestone 1: Detailed project plan submission
● Project webpage goes live
● Literature review
● Test and improve localization with weakly supervised learning on the assembly line dataset, i.e. test the CAM model.
● Test and apply techniques to enforce temporal smoothness of the localization results.
● Milestone 2: Provide a system using weakly supervised learning for localization.
● Submit interim report.
● Evaluate the system
● Milestone 3: Submit final report and poster design
● Prepare presentation.
● Final exhibition
Deliverables for this project
Team Member
Team Member
Project Supervisor
If you are interested in our topic, feel free to reach out to us.