Beta's projects for MSc students 2020–2021 semester 2

General

The MSc Project or Dissertation is yours, rather than your supervisor's.
You are expected to communicate actively with your supervisor while you work on the project. The supervisor comments on your work and points you in the right direction if needed.
You, rather than your supervisor, will be the expert at the end of the project.
Please start the discussion of your proposed project with me by emailing clyip at cs dot hku dot hk.
Please email me a statement of why you are interested in the project, with a CV for reference and records, before selection. Students who do not submit these documents will be given low priority in selection.
This year, MSc Project and Dissertation topics should be self-proposed. Check whether your project idea matches one of my areas of interest: meteorological computing, mobile applications, IoT applications, data mining, multimedia, computer music processing, audio processing, artificial intelligence, machine learning, and perception.
This page may be updated before the project selection exercise ends, so please check back from time to time.
In case you are interested, in my decades of undergraduate and postgraduate project supervision, my students have received every subgrade from A+ to F.
You can find some suggested topics below. You should cherry-pick elements of these projects and propose one that is suitable for you.
Slides (PDF) | Video introduction

Quick links to projects

Project	Students	Max groups	Type	Streams
First responder training game	1-2	2	Development	Gen, InfSec, MM
Automatic Cantonese speech translation	1-2	3	Research and Development	Gen, MM
Bird Sound Recognizer	1	3	Development	Gen, MM
Real-time Speaker Recognizer	1	3	Development	Gen, InfSec, MM
Vulnerability Analysis	1	1	Development	Gen, InfSec
Sequential associations	1	3	Research	Gen, InfSec, MM, Fin
Investment recommendation system	1–2	3	Research	Gen, Fin

First responder training game

Project info

1-2 students, maximum 2 groups
Development project
Streams: General, Information Security, Multimedia Computing
Updated 2021-01-11[1]

Description

Read the general instructions in the General section first.

In digital forensics, first responders are responsible for preserving an electronic crime scene and for recognising, collecting, and safeguarding digital evidence. A first responder constantly makes decisions at an electronic crime scene, such as taking photos of objects, switching devices on or off, collecting and labeling evidence, and making appropriate records that start the chain of custody.

A number of principles should be followed in the process. For example, the process of collecting, securing, and transporting digital evidence should not change the evidence. Digital evidence should be examined only by those trained specifically for that purpose. Everything done during the seizure, transportation, and storage of digital evidence should be fully documented, preserved, and available for review. Failure to adhere to these principles risks nullification of the collected evidence, which could in turn affect the case under investigation.
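
One common way to show that evidence has not changed is to record a cryptographic hash when it is seized and to recompute the hash at every later hand-over. The following is a minimal sketch of that idea, which a game could use to score a player's handling of evidence. The `CustodyEntry` record, its fields, and the helper names are hypothetical and are not taken from any forensic standard or tool.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of an evidence image as a hex string."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class CustodyEntry:
    """One documented action on an evidence item (hypothetical layout)."""
    handler: str
    action: str
    digest: str
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def log_action(chain: list, handler: str, action: str, data: bytes) -> CustodyEntry:
    """Append a record of who did what, together with the current hash."""
    entry = CustodyEntry(handler, action, sha256_of(data))
    chain.append(entry)
    return entry


def evidence_unchanged(chain: list) -> bool:
    """True if every recorded digest matches the one taken at seizure."""
    return all(e.digest == chain[0].digest for e in chain)


chain = []
image = b"disk image bytes"  # stand-in for a real evidence image
log_action(chain, "Officer A", "seized", image)
log_action(chain, "Officer B", "transported", image)
print(evidence_unchanged(chain))  # True: the digests agree

log_action(chain, "Lab C", "examined", image + b"!")  # simulated alteration
print(evidence_unchanged(chain))  # False: the last digest differs
```

The log covers the documentation principle as well: each entry records the handler, the action, and a timestamp, so the sequence of actions remains available for review.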

The goal of the project is to develop a game that teaches the player the appropriate steps of digital forensics. This includes building 3D physical crime scenes that players can interact with. A player chooses the equipment, procedures, or actions to be used at the crime scene. Feedback in the form of verbal and visual recommendations should be given to players according to their actions. Multiple scenes can be used to illustrate different steps in the evidence collection procedure.
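
The feedback logic could start as a simple rule table that checks each chosen action against the actions already taken. The sketch below is only an illustration: the action names and their ordering in `PREREQUISITES` are invented placeholders, and real rules should come from a first-responder guide such as the one in the References.

```python
# Hypothetical rule table: each action lists the actions that should come first.
# The orderings are placeholders for illustration, not forensic guidance.
PREREQUISITES = {
    "photograph_scene": [],
    "label_evidence": ["photograph_scene"],
    "collect_evidence": ["photograph_scene", "label_evidence"],
    "record_custody": ["collect_evidence"],
}


def feedback(history: list, action: str) -> str:
    """Return a feedback message for `action`, given the actions done so far."""
    if action not in PREREQUISITES:
        return f"Unknown action: {action}"
    missing = [p for p in PREREQUISITES[action] if p not in history]
    if missing:
        return f"Before '{action}', you should first: {', '.join(missing)}"
    history.append(action)  # only accepted actions enter the history
    return f"Good: '{action}' was done at the right time."


history = []
print(feedback(history, "collect_evidence"))
print(feedback(history, "photograph_scene"))
```

Keeping the rules in data rather than in code makes it easy to give each scene its own rule table.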

There is no restriction on the deployment platform or the implementation languages. It can be a mobile game, a desktop game, or a game where the player interacts with a number of IoT devices in a virtual scene.

Keywords: Digital Forensics, Computer Forensics, Game Design and Development

Requirements

Experience in, or willingness to learn, gamification in education
Experience in, or willingness to learn, 3D programming

Deliverables

A game that teaches the player the appropriate steps of digital forensics in multiple 3D virtual electronic crime scenes.

References

Electronic Crime Scene Investigation: A Guide for First Responders, Second Edition (PDF)

Automatic Cantonese speech translation

Project info

1-2 students, maximum 3 groups
Research and Development project
Streams: General, Multimedia Computing
Updated 2021-01-11[1]

Description

Read the general instructions in the General section first.

Systems for automatic translation from spoken or written Cantonese to written or spoken English are not easy to find. Indeed, for written Cantonese, a system that is accepted by most people does not seem to have been developed.

There are several schemes for Cantonese transcription, such as Jyutping, and there is active research in Cantonese linguistics.

Given that comprehensive Cantonese resources exist, such as the Jyutping input method, the ShefCE Cantonese-English Bilingual Speech Corpus, and machine translation systems such as Moses, there must be some aspects of the language that make automatic translation difficult.

Students working on this project will carry out a comprehensive study of the issues around automatic Cantonese-English translation and build reference implementations that address these issues, working towards a practical automatic Cantonese-to-English or English-to-Cantonese speech translator.

Keywords: Cantonese, English, translation, Jyutping, ShefCE, Moses, speech processing, cepstrum, formants, MFCC

Requirements

Speaker of both Cantonese and English.
Not afraid of linguistics in general, and of the grammars of natural languages in particular.
Interest in and/or knowledge of audio signal processing and its tools and environments.
Interest in and/or knowledge of artificial intelligence methods.
Good programmer.

Deliverables

Reference implementations that address various issues of automatic Cantonese-to-English, or English-to-Cantonese speech translation.

References

Although some resources here are Python libraries, there is no restriction on the languages and tools you use. Indeed, a good data analysis and machine learning project like this one often requires the use of multiple languages.

Real-time Speaker Recognizer

Project info

1 student, maximum 3 groups
Development project
Streams: General, Information Security, Multimedia Computing
Updated 2021-01-11[1]

Description

Read the general instructions in the General section first.

Build a system that recognizes and labels the speakers in a recorded radio talk show or phone-in show, without the need for prior training.

A more advanced version of the system should be language-independent. It should be able to take live speech, generating output as the input is analyzed, and possibly correcting earlier outputs when necessary.
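
One possible starting point for labelling speakers without prior training is online clustering: each incoming segment is assigned to the nearest known speaker if it is close enough, and otherwise starts a new speaker. The sketch below assumes that each segment has already been reduced to a feature vector (for example, averaged MFCCs computed elsewhere). The class name, the `threshold` parameter, and the toy vectors are illustrative assumptions, and the sketch does not revise earlier labels.

```python
import math


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class OnlineSpeakerLabeller:
    """Assign speaker labels to segments as they arrive, with no prior training."""

    def __init__(self, threshold):
        self.threshold = threshold  # assumed tuning parameter
        self.centroids = []         # one running-mean vector per speaker
        self.counts = []            # number of segments seen per speaker

    def label(self, features):
        """Return the speaker index for one segment's feature vector."""
        if self.centroids:
            dists = [distance(features, c) for c in self.centroids]
            best = min(range(len(dists)), key=dists.__getitem__)
            if dists[best] <= self.threshold:
                # Update the running mean of the matched speaker.
                n = self.counts[best] + 1
                self.centroids[best] = [
                    c + (f - c) / n
                    for c, f in zip(self.centroids[best], features)
                ]
                self.counts[best] = n
                return best
        # No centroid is close enough: treat this as a new speaker.
        self.centroids.append(list(features))
        self.counts.append(1)
        return len(self.centroids) - 1


labeller = OnlineSpeakerLabeller(threshold=1.0)
segments = [[0.0, 0.1], [5.0, 5.2], [0.2, 0.0], [5.1, 4.9]]  # toy feature vectors
print([labeller.label(s) for s in segments])  # [0, 1, 0, 1]
```

Because labels are produced one segment at a time, the same structure works on live input. Correcting earlier outputs would need an extra step, such as periodically re-clustering the stored segments.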

Note that the student is expected to build their own collection of training and testing data.

Experimentation using systems such as Audacity, PureData, Octave, Mathematica, or Matlab is expected.

The system should be implemented in an operating-system independent way.

Application: some biometrics systems authenticate the user by speaker recognition. Your study may shed light on the usability of such a system.

Keywords: cepstrum, formants, MFCC, speaker diarisation

Requirements

Interest in and/or knowledge of audio signal processing and its tools and environments.
Interest in and/or knowledge of artificial intelligence methods, including but not limited to search algorithms, neural networks, or genetic algorithms.
Good programmer.

Deliverables

The software system implementing the speaker recognizer.
Sets of data for training, testing, and system parameter tuning.

Bird Sound Recognizer

Project info

1 student, maximum 3 groups
Development project
Streams: General, Multimedia Computing
Updated 2021-01-11[1]

Description

Read the general instructions in the General section first.

Listening to bird sounds turns out to be very important for bird watchers, as it helps them not only to locate the birds but also to identify their species.

The project is about building a system that recognizes sounds of birds.

The project involves cleaning the sound, extracting the relevant sections, identifying features, and matching against a database or recognising through a neural network or other methods.
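
The later stages of this pipeline can be sketched in a few lines. The example below is a toy under stated assumptions: it skips noise cleaning, uses an energy threshold to extract the active section, uses the zero-crossing rate as a single crude feature, and matches against a two-entry "database" built from synthetic sine tones rather than real bird calls. All function names and parameter values are illustrative.

```python
import math


def rms(frame):
    """Root-mean-square energy of one frame of samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def extract_active(samples, frame_len=100, threshold=0.1):
    """Keep the span from the first to the last frame whose RMS exceeds threshold."""
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    active = [i for i, f in enumerate(frames) if rms(f) > threshold]
    if not active:
        return []
    return samples[active[0] * frame_len:(active[-1] + 1) * frame_len]


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs that change sign (a crude pitch cue)."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return crossings / (len(samples) - 1)


def classify(samples, database):
    """Return the database label whose stored feature is nearest to the clip's."""
    feature = zero_crossing_rate(extract_active(samples))
    return min(database, key=lambda name: abs(database[name] - feature))


def tone(freq, n=1000, rate=8000):
    """Synthetic sine tone padded with silence, standing in for a recording."""
    silence = [0.0] * 200
    body = [math.sin(2 * math.pi * freq * t / rate) for t in range(n)]
    return silence + body + silence


# Hypothetical reference features, computed from synthetic tones.
database = {
    "low_call": zero_crossing_rate(extract_active(tone(500))),
    "high_call": zero_crossing_rate(extract_active(tone(2000))),
}
print(classify(tone(1900), database))  # high_call
```

A real recognizer would replace the single feature with a richer vector and the nearest-neighbour match with a trained model, but the stages stay the same.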

Though it seems that standard pattern recognition techniques can be applied to solve the problem, the variation in the sounds of birds of the same species and the difficulty of collecting enough samples make the project difficult in practice.

The problem is so interesting that the AI community has projects such as A.I.Experiments: Bird Sounds, which visualises and clusters similar bird sounds.

Note that the student is expected to build their own collection of training and testing data. The HKBWS Bird Call page is a good starting point. There are CDs of bird calls on the market as well.

It is best to start from a small collection of sounds of a few distinct species, and expand it to 20, 50 or 100.

The much larger collection of sounds of North American birds from the Macaulay Library of the Cornell Lab of Ornithology can be used later to see whether your recognizer can handle a different data set.

Experimentation and visualization using systems such as Audacity, PureData, GNU Octave, Mathematica, or Matlab would be the fun part of the project.

The system should be implemented in an operating-system independent way.

Requirements

Interest in and/or knowledge of audio signal processing and its tools and environments.
Interest in and/or knowledge of artificial intelligence methods, including but not limited to search algorithms, neural networks, or genetic algorithms.
Good programmer.

Deliverables

The software system implementing the bird sound recognizer.
Sets of data for training, testing, and system parameter tuning.

Vulnerability Analysis

Project info

1 student, maximum 1 group
Development project
Streams: General, Information Security
Updated 2021-01-11[1]

Description

Read the general instructions in the General section first.

PoisonTap is an application installed on a Raspberry Pi Zero. It emulates an Ethernet device over USB and exploits the existing trust in various mechanisms of a machine and network, including USB/Thunderbolt, DHCP, DNS, and HTTP, to produce a snowball effect of information exfiltration, network access, and installation of semi-permanent backdoors.

However, the tool was developed about half a decade ago, and current computer systems may no longer be vulnerable to its attacks.

The student is to implement PoisonTap on real hardware and carry out a vulnerability analysis of various victim operating systems against the PoisonTap attacks.

One objective of the project is to analyse the vulnerabilities and propose attack prevention and detection solutions.

Another objective of the project is to make PoisonTap more powerful on various victim operating systems.

Requirements

Knowledge in computer security.
Good scripting and programming skills.

Deliverables

Vulnerability analysis report.
Updated PoisonTap code.

References

PoisonTap — siphons cookies, exposes internal router & installs web backdoor on locked computers
Raspberry Pi Zero

Sequential associations

Project info

1 student, maximum 3 groups
Research project
Streams: General, Information Security, Multimedia Computing, Financial Computing
Updated 2021-01-11[1]

Description

Read the general instructions in the General section first.

Events often happen with a cause. A cause may lead to an effect, but not with certainty. Sometimes it is hard to say which event is the cause and which is the effect. The events may merely be correlated because they are affected by the same cause, or they may just happen together by chance [2].

When many Hang Seng Index (HSI) constituents rise, the Hang Seng Index tends to go up. An earthquake in one place may be followed by earthquakes in other places along a fault. When the subtropical ridge of high pressure is further to the west, more typhoons are expected to enter the South China Sea [1]. A rise in some stock prices may cause a rise in commodity prices, and the most talked-about stocks on social media may be the ones with the greatest volatility. Some of these changes are immediate (e.g., HSI), and some are delayed (e.g., earthquakes). The cause of some delayed events may not be obvious (e.g., typhoons), and some seemingly correlated events may not even have a cause-effect relationship.

The project is about studying and applying algorithms on sequences of data to find out how they are associated with each other. The student is going to choose and acquire sequences of data they are interested in, and study and apply statistical techniques, sequential data mining algorithms, or temporal classification methods to find out how the events in the sequences are associated with each other. Applications include generating warnings (earthquake case), prediction of future stock index values (HSI case), prediction of the range of the number of typhoons entering the South China Sea (typhoon case), or prediction of stock prices given market information (commodity and social media cases).
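
As one simple statistical starting point, the delay between two sequences can be estimated by computing the Pearson correlation at several lags and keeping the strongest one. The sketch below uses toy data in which one sequence repeats the other two steps later. The function names and the data are illustrative, and a strong lagged correlation still does not establish a cause-effect relationship.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # undefined (division by zero) for constant input


def best_lag(first, second, max_lag):
    """Return (lag, r) such that second[t + lag] correlates best with first[t]."""
    scores = {}
    for lag in range(max_lag + 1):
        x = first[:len(first) - lag]  # drop the tail that has no partner
        y = second[lag:]              # shift the second sequence back by `lag`
        scores[lag] = pearson(x, y)
    lag = max(scores, key=lambda k: abs(scores[k]))
    return lag, scores[lag]


# Toy data: `b` repeats `a` two steps later (zeros fill the start).
a = [1, 3, 2, 5, 4, 7, 6, 9, 8, 10]
b = [0, 0] + a[:-2]
lag, r = best_lag(a, b, max_lag=4)
print(lag, round(r, 3))  # 2 1.0
```

With real data, the number of overlapping points shrinks as the lag grows, so correlations at large lags rest on less evidence and deserve more caution.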

Requirements

Interest in and/or knowledge of machine learning, data mining, and artificial intelligence methods, such as search algorithms, neural networks, or genetic algorithms.
Good programmer.

Deliverables

Sets of data for training, testing, and system parameter tuning.
Programs for getting data and generating experimental results.
Programs that apply the results to generate suggestions or predictions.

References

Why Tropical Cyclone Recurves? PAN Chi-kin; Hong Kong Observatory 2011-09.
Spurious correlations

Investment recommendation system

Project info

1–2 students, maximum 3 groups
Research project
Streams: General, Financial Computing
Updated 2021-01-11[1]

Description

Read the general instructions in the General section first.

From historical and current or near-current data on stock or index prices, design an algorithm that makes investment recommendations (e.g., buy, hold, sell, hedge) that would maximize profit in a simulated environment.

Factors the algorithm can take into consideration include the day of the month, day of the week, time of day, various financial indicators, and correlations between data from different time series.
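
To make the idea of a simulated environment concrete, the sketch below backtests a deliberately simple placeholder strategy, a moving-average crossover, on a toy price series. The strategy, the window lengths, the all-in and all-out trading rule, and the absence of transaction costs are all assumptions made for illustration, not a recommended design.

```python
def moving_average(prices, window, t):
    """Mean of the `window` prices ending at index t (inclusive)."""
    return sum(prices[t - window + 1:t + 1]) / window


def recommend(prices, t, short=3, long=5):
    """Return 'buy', 'sell', or 'hold' using only prices up to index t."""
    if t < long - 1:
        return "hold"  # not enough history yet
    fast = moving_average(prices, short, t)
    slow = moving_average(prices, long, t)
    if fast > slow:
        return "buy"
    if fast < slow:
        return "sell"
    return "hold"


def simulate(prices, cash=1000.0):
    """Follow the recommendations day by day and return the final value."""
    shares = 0.0
    for t, price in enumerate(prices):
        action = recommend(prices, t)
        if action == "buy" and cash > 0:
            shares, cash = cash / price, 0.0   # invest everything
        elif action == "sell" and shares > 0:
            cash, shares = shares * price, 0.0  # liquidate everything
    return cash + shares * prices[-1]


prices = [10, 10, 10, 10, 10, 11, 12, 13, 14, 15]  # toy price series
print(round(simulate(prices), 2))  # 1363.64
```

The important property is that `recommend` sees only prices up to the current index, so the simulation cannot accidentally use future data.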

Time series of non-numerical data such as news articles, Twitter feeds, or Facebook posts can be analysed to improve the accuracy of the prediction. Indeed, prior studies have reported this to be quite effective.

Note that the student is expected to build their own collection of training and testing data.

When surveying the literature, be very careful about systems that claim an accuracy of better than 70%, especially when the system uses only historical numerical data or financial indicators.

Students working on this topic are encouraged to take a wider view and analyse prices or indices of many stocks or markets.

Students who produce good results early in the project can be linked up with real investment recommendation firms, which can provide real-time data for testing, and may see their algorithms licensed and deployed in practice.