2017-18 Final Year Project | Computer Science | The University of Hong Kong

Deep Learning for Text Classification in Azure Infrastructure

Deep Learning & Natural Language Processing

Background

Endowing computers with the ability to comprehend human languages and communicate with us in our mother tongue has long been a challenging pursuit in the field of artificial intelligence. Today, with increasing computing power and rapidly growing volumes of data, deep learning has emerged as a promising approach to natural language processing (NLP) tasks.

There are many examples of successful academic achievements and real-world applications:

  • Speech recognition;
  • Handwriting recognition;
  • Google Translate;
  • Siri.

Though the outcomes of applying deep learning to NLP are impressive, the majority of current studies focus on processing English text or speech. There are comparatively few research studies and applications of NLP for Chinese. Therefore, we are cooperating with Microsoft, which proposed this industry-based project, to develop a system that analyzes the underlying sentiment of Chinese sentences using deep learning.

The significance of this project is twofold:

  1. Academic Research:

    Different options and possibilities for Chinese NLP are explored, and their advantages and drawbacks are identified. Based on this research, original deep learning models are designed and developed.

  2. Real-world Application:

    Sentiment analysis for domain-specific Chinese corpora may yield significant value. As billions of pieces of Chinese text are generated on the internet every day, sentiment analysis over a large batch of data in a specific domain can extract valuable information that companies can use to determine marketing strategies and improve customer service.

CNTK & Microsoft Azure

Industry-based Project

This is an industry-based project proposed by Microsoft Limited HK. Therefore, this project adopts Microsoft technology as its infrastructure.


Microsoft Cognitive Toolkit

Our project adopts Microsoft Cognitive Toolkit (CNTK) as its deep learning framework.

CNTK is an open-source toolkit for commercial-grade distributed deep learning. It describes neural networks as a series of computational steps via a directed graph. CNTK allows the user to easily realize and combine popular model types such as feed-forward DNNs, convolutional neural networks (CNNs) and recurrent neural networks (RNNs/LSTMs). CNTK implements stochastic gradient descent (SGD, error backpropagation) learning with automatic differentiation and parallelization across multiple GPUs and servers.
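To make the ideas of a directed computation graph, automatic differentiation and SGD more concrete, the following is a minimal illustrative sketch in plain Python. It does not use CNTK and does not reflect CNTK's actual API; the names `Node`, `backward` and `train` are hypothetical and introduced only for this example. It assumes scalar values and a toy linear model, whereas a real toolkit operates on tensors and distributes work across GPUs.

```python
import random


class Node:
    """One vertex of a directed computation graph holding a scalar value."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        # Each entry is (parent_node, local derivative of this node w.r.t. parent).
        self.parents = parents

    def __add__(self, other):
        return Node(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __sub__(self, other):
        return Node(self.value - other.value, ((self, 1.0), (other, -1.0)))

    def __mul__(self, other):
        return Node(self.value * other.value,
                    ((self, other.value), (other, self.value)))


def backward(output):
    """Reverse-mode automatic differentiation (backpropagation) from `output`."""
    order, seen = [], set()

    def visit(node):
        # Depth-first traversal that yields a topological order of the graph.
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    for node in order:
        node.grad = 0.0
    output.grad = 1.0
    # Walk from the output back to the inputs, applying the chain rule.
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad


def train(samples, steps=2000, lr=0.05, seed=0):
    """Fit y = w * x + b with stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    w, b = Node(0.0), Node(0.0)
    for _ in range(steps):
        x, y = rng.choice(samples)       # one random sample per step
        err = w * Node(x) + b - Node(y)  # the graph is rebuilt on every step
        loss = err * err
        backward(loss)
        w.value -= lr * w.grad
        b.value -= lr * b.grad
    return w.value, b.value


if __name__ == "__main__":
    data = [(x, 2.0 * x + 1.0) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
    print(train(data))
```

Each arithmetic operation records its inputs and local derivatives, so the gradient of the loss with respect to every parameter follows from a single backward pass over the graph. This is the same principle, at toy scale, that lets a framework train a network without hand-written gradient code.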

Official Website: Microsoft Cognitive Toolkit


Microsoft Azure

Microsoft Azure provides computing power for this industry-based project.

Microsoft Azure is a cloud computing service created by Microsoft for building, testing, deploying, and managing applications and services through a global network of Microsoft-managed data centers. It provides software as a service (SaaS), platform as a service (PaaS) and infrastructure as a service (IaaS) and supports many different programming languages, tools and frameworks, including both Microsoft-specific and third-party software and systems.

Official Website: Microsoft Azure