Beta's projects for MSc students 2020-2021 semester 2

General

First responder training game

Project info

1-2 students, maximum 2 groups
Development project
Streams: General, Information Security, Multimedia Computing
Updated 2021-01-11[1]

Description

Read the general instructions in the General section first.

In digital forensics, first responders are responsible for preserving an electronic crime scene and for recognising, collecting, and safeguarding digital evidence. A first responder constantly makes decisions at an electronic crime scene, such as taking photos of objects, switching devices on or off, collecting and labeling evidence, and making appropriate records that start the chain of custody.

A number of principles should be followed in the process. For example, the process of collecting, securing, and transporting digital evidence should not change the evidence. Digital evidence should be examined only by those trained specifically for that purpose. Everything done during the seizure, transportation, and storage of digital evidence should be fully documented, preserved, and available for review. Failure to adhere to these principles risks nullification of the collected evidence, which could in turn affect the case under investigation.

The goal of the project is to develop a game that teaches the player the appropriate steps of digital forensics. This includes building 3D physical crime scenes that players can interact with. A player chooses the equipment, procedures, or actions to be used in the crime scene. Feedback in the form of verbal and visual recommendations should be given to players according to their actions. Multiple scenes can be used to illustrate different steps in the evidence collection procedure.
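As a rough illustration of how player actions might be mapped to feedback, the sketch below checks each action against two of the principles stated earlier: evidence should not be changed, and everything done should be documented. The action names, the `ALTERS_EVIDENCE` set, and the feedback wording are hypothetical and are not taken from any forensic guideline or from the project requirements.

```python
from dataclasses import dataclass, field

# Hypothetical action names, for illustration only. A real game would derive
# its rules from a guideline such as the one listed in the references.
ALTERS_EVIDENCE = {"browse_files_on_device"}


@dataclass
class Session:
    """Tracks the player's actions and the feedback they triggered."""
    actions: list = field(default_factory=list)
    feedback: list = field(default_factory=list)
    flawed: int = 0  # number of actions that triggered any feedback

    def perform(self, action: str, record: bool) -> list:
        """Apply one action; return the feedback messages it triggered."""
        messages = []
        if not record:
            messages.append(f"'{action}' was not documented.")
        if action in ALTERS_EVIDENCE:
            messages.append(f"'{action}' may change the evidence.")
        self.actions.append(action)
        self.feedback.extend(messages)
        if messages:
            self.flawed += 1
        return messages

    def score(self) -> float:
        """Fraction of actions that triggered no feedback (1.0 is best)."""
        if not self.actions:
            return 1.0
        return 1.0 - self.flawed / len(self.actions)


session = Session()
session.perform("photograph_desk", record=True)
session.perform("browse_files_on_device", record=True)
session.perform("label_usb_drive", record=False)
for message in session.feedback:
    print(message)
print(f"score: {session.score():.2f}")
```

In a real game the same check would drive the verbal and visual recommendations; the rule set would be far richer and could differ from scene to scene.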

There is no restriction on the deployment platform or the implementation languages. It can be a mobile game, a desktop game, or a game where the player interacts with a number of IoT devices in a virtual scene.

Keywords: Digital Forensics, Computer Forensics, Game Design and Development

Requirements

Deliverables

References

  1. Electronic Crime Scene Investigation: A Guide for First Responders, Second Edition (PDF)

Automatic Cantonese speech translation

Project info

1-2 students, maximum 3 groups
Research and Development project
Streams: General, Multimedia Computing
Updated 2021-01-11[1]

Description

Read the general instructions in the General section first.

Systems for automatic translation from spoken or written Cantonese to written or spoken English are not easy to find. Indeed, for written Cantonese, a system that is accepted by most people does not seem to have been developed.

There are variants of Cantonese transcription, such as jyutping, and there is active research in Cantonese linguistics.

Given comprehensive Cantonese resources such as the jyutping input method, the ShefCE Cantonese-English Bilingual Speech Corpus, and machine translation systems such as Moses, there must be some aspects of the language that make automatic translation difficult.

Students working on this project will carry out a comprehensive study of the issues around automatic Cantonese-English translation and build reference implementations that address these issues, working towards a practical automatic Cantonese-to-English or English-to-Cantonese speech translator.

Keywords: Cantonese, English, translation, jyutping, ShefCE, Moses, speech processing, cepstrum, formants, MFCC

Requirements

Deliverables

References

Although some resources here are Python libraries, there is no restriction on the languages and tools you use. Indeed, a good data analysis and machine learning project like this one often requires the use of multiple languages.

  1. Online jyutping input method
  2. ShefCE: A Cantonese-English Bilingual Speech Corpus
  3. Moses: statistical machine translation system
  4. Audacity

Real-time Speaker Recognizer

Project info

1 student, maximum 3 groups
Development project
Streams: General, Information Security, Multimedia Computing
Updated 2021-01-11[1]

Description

Read the general instructions in the General section first.

Build a system that recognizes and labels the speakers in a recorded radio talk show or phone-in show, without the need for prior training.

A more advanced version of the system should be language-independent. It should be able to take live speech, generating output as the input is analyzed, and possibly correcting earlier outputs when necessary.

Note that the student is expected to build their own collection of training and testing data.

Experimentation using systems such as Audacity, PureData, Octave, Mathematica, or Matlab is expected.
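For early experimentation, one of the features named in the keywords, the cepstrum, can be computed in a few lines of NumPy. The sketch below is only a starting point under stated assumptions: the frame length, hop size, number of coefficients, and the synthetic test signal are arbitrary choices, and the result is a plain real cepstrum, not MFCC (there is no mel filter bank).

```python
import numpy as np


def real_cepstrum(frame: np.ndarray) -> np.ndarray:
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    windowed = frame * np.hanning(len(frame))  # reduce spectral leakage
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-10)  # avoid log(0)
    return np.fft.irfft(log_mag, n=len(frame))


def frame_features(signal, frame_len=512, hop=256, n_coeffs=13):
    """Split the signal into overlapping frames; keep the first coefficients."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    return np.array([real_cepstrum(signal[s:s + frame_len])[:n_coeffs]
                     for s in starts])


# Synthetic one-second test signal at an assumed 8 kHz sample rate.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 440 * t)

features = frame_features(signal)
print(features.shape)  # (number of frames, number of coefficients)
```

One possible next step is to cluster the per-frame vectors so that frames from the same speaker fall into the same group, which needs no speaker-specific training data.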

The system should be implemented in an operating-system-independent way.

Application: some biometric systems authenticate the user by speaker recognition. Your study may shed light on the usability of such systems.

Keywords: cepstrum, formants, MFCC, speaker diarisation

Requirements

Deliverables

References

Although some resources here are Python libraries, there is no restriction on the languages and tools you use. Indeed, a good data analysis and machine learning project like this one often requires the use of multiple languages.

  1. Audacity
  2. PureData
  3. GNU Octave
  4. SciPy
  5. NumPy
  6. matplotlib
  7. Archive of Speaker Diarization project at UC Berkeley

Bird Sound Recognizer

Project info

1 student, maximum 3 groups
Development project
Streams: General, Multimedia Computing
Updated 2021-01-11[1]

Description

Read the general instructions in the General section first.

Listening to bird sounds is very important for bird watchers, as it helps them not only to locate the birds but also to identify their species.

The project is about building a system that recognizes sounds of birds.

The project involves cleaning the sound, extracting the relevant section, identifying features, and matching against a database or recognising through a neural network or other methods.
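The steps above can be sketched end to end on synthetic data. This is a minimal illustration, not a recommended design: the energy threshold, the coarse band-averaged spectrum used as a feature, the nearest-neighbour match, and the labels `species_a` and `species_b` are all assumptions, and pure tones stand in for real bird calls.

```python
import numpy as np


def active_section(signal, frame_len=256, threshold_ratio=0.1):
    """Keep the span from the first to the last frame with high energy."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).sum(axis=1)
    active = np.flatnonzero(energy > threshold_ratio * energy.max())
    return signal[active[0] * frame_len:(active[-1] + 1) * frame_len]


def spectral_feature(signal, n_bands=32):
    """Coarse magnitude spectrum, normalised to unit length."""
    magnitude = np.abs(np.fft.rfft(signal))
    feature = np.array([band.mean()
                        for band in np.array_split(magnitude, n_bands)])
    return feature / (np.linalg.norm(feature) + 1e-12)


def nearest_label(feature, database):
    """Label of the stored feature closest to the query (Euclidean)."""
    return min(database,
               key=lambda name: np.linalg.norm(database[name] - feature))


sample_rate = 16000  # assumed sample rate
rng = np.random.default_rng(0)


def tone(freq, seconds):
    """Pure tone used as a stand-in for a bird call."""
    t = np.arange(int(sample_rate * seconds)) / sample_rate
    return np.sin(2 * np.pi * freq * t)


database = {"species_a": spectral_feature(tone(1200, 0.25)),
            "species_b": spectral_feature(tone(3200, 0.25))}

# A noisy recording: silence, a call, then silence again.
recording = np.concatenate([np.zeros(4000), tone(3200, 0.25), np.zeros(4000)])
recording = recording + 0.01 * rng.standard_normal(len(recording))

section = active_section(recording)
label = nearest_label(spectral_feature(section), database)
print(len(recording), len(section), label)
```

Real recordings vary far more than these tones, which is where better features and a larger reference collection come in.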

Although it seems that standard pattern recognition techniques can be applied to the problem, the variation among sounds of birds of the same species and the difficulty of collecting enough samples make the project difficult in practice.

The problem is interesting enough that the AI community has projects such as A.I.Experiments: Bird Sounds, which visualises and clusters similar bird sounds.

Note that the student is expected to build their own collection of training and testing data. The HKBWS Bird Call page is a good starting point. There are CDs of bird calls on the market as well.

It is best to start from a small collection of sounds of a few distinct species, and expand it to 20, 50 or 100.

The much larger collection of North American bird sounds from the Macaulay Library of the Cornell Lab of Ornithology can be used later to see whether your recognizer can handle a different data set.

Experimentation and visualization using systems such as Audacity, PureData, GNU Octave, Mathematica, or Matlab would be the fun part of the project.

The system should be implemented in an operating-system-independent way.

Requirements

Deliverables

References

Although some resources here are Python libraries, there is no restriction on the languages and tools you use. Indeed, a good data analysis and machine learning project like this one often requires the use of multiple languages.

  1. HKBWS Bird Call page
  2. A.I.Experiments: Bird Sounds
  3. The Macaulay Library of the Cornell Lab of Ornithology
  4. Audacity
  5. PureData
  6. GNU Octave
  7. SciPy
  8. NumPy
  9. matplotlib

Vulnerability Analysis

Project info

1 student, maximum 1 group
Development project
Streams: General, Information Security
Updated 2021-01-11[1]

Description

Read the general instructions in the General section first.

PoisonTap is an application installed on a Raspberry Pi Zero that emulates an Ethernet device over USB. It exploits the existing trust in various mechanisms of a machine and network, including USB/Thunderbolt, DHCP, DNS, and HTTP, to produce a snowball effect of information exfiltration, network access, and installation of semi-permanent backdoors.

However, the tool was released in 2016, and current computer systems may no longer be vulnerable to its attacks.

The student is to implement PoisonTap on real hardware and carry out a vulnerability analysis of various victim operating systems against the PoisonTap attacks.

One objective of the project is to analyse the vulnerabilities and propose attack prevention and detection solutions.

Another objective of the project is to make PoisonTap more powerful on various victim operating systems.

Requirements

Deliverables

References

  1. PoisonTap — siphons cookies, exposes internal router & installs web backdoor on locked computers
  2. Raspberry Pi Zero

Sequential associations

Project info

1 student, maximum 3 groups
Research project
Streams: General, Information Security, Multimedia Computing, Financial Computing
Updated 2021-01-11[1]

Description

Read the general instructions in the General section first.

Events often happen with a cause. A cause may lead to an effect, but not always with certainty. Sometimes it is hard to say which event is the cause and which is the effect. Events may merely be correlated because they are affected by the same cause, or they may just happen together by chance [2].

When many Hang Seng Index (HSI) constituents rise, the Hang Seng Index tends to go up. An earthquake in one place may be followed by earthquakes in other places along a fault. When the subtropical ridge of high pressure lies further west, more typhoons are expected to enter the South China Sea [1]. A rise in some stock prices may cause a rise in commodity prices, and the most talked-about stocks on social media may be the ones with the greatest volatility. Some of these changes are immediate (e.g., HSI), and some are delayed (e.g., earthquakes). The cause of some delayed events may not be obvious (e.g., typhoons), and some seemingly correlated events may not even have a cause-effect relationship.

The project is about studying and applying algorithms to sequences of data to find out how they are associated with each other. The student will choose and acquire sequences of data they are interested in, and study and apply statistical techniques, sequential data mining algorithms, or temporal classification methods to find out how the events in the sequences are associated. Applications include generating warnings (earthquake case), predicting future stock index values (HSI case), predicting the range of the number of typhoons entering the South China Sea (typhoon case), and predicting stock prices given market information (commodity and social media cases).
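One of the simplest statistical techniques for delayed associations is lagged correlation: shift one sequence against the other and see at which delay the two line up best. The sketch below uses only the standard library and synthetic data; the function names `pearson` and `best_lag`, the three-step delay, and the noise level are assumptions chosen for illustration.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y) if dev_x and dev_y else 0.0


def best_lag(first, second, max_lag):
    """Return (lag, r) with the largest |r| between first[t] and second[t + lag]."""
    scores = {}
    for lag in range(max_lag + 1):
        scores[lag] = pearson(first[:len(first) - lag], second[lag:])
    lag = max(scores, key=lambda k: abs(scores[k]))
    return lag, scores[lag]


# Synthetic data: the second sequence repeats the first three steps later,
# with a little noise added.
rng = random.Random(42)
first = [rng.gauss(0, 1) for _ in range(200)]
second = [0.0, 0.0, 0.0] + first[:-3]
second = [value + 0.1 * rng.gauss(0, 1) for value in second]

lag, r = best_lag(first, second, max_lag=10)
print(lag, round(r, 3))
```

A strong lagged correlation only shows that one sequence tends to follow the other. As the description notes, both may be driven by a common cause, or the pattern may be a coincidence.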

Requirements

Deliverables

References

  1. Why Tropical Cyclone Recurves? PAN Chi-kin; Hong Kong Observatory 2011-09.
  2. Spurious correlations

Investment recommendation system

Project info

1-2 students, maximum 3 groups
Research project
Streams: General, Financial Computing
Updated 2021-01-11[1]

Description

Read the general instructions in the General section first.

From historical and current or near-current price data of a stock or index, design an algorithm that makes investment recommendations (e.g., buy, hold, sell, hedge) that would maximize profit in a simulated environment.
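To make the idea of a simulated environment concrete, here is a minimal sketch of a recommender and a simulator that replays a price series. The moving-average crossover rule, the window lengths, the all-in/all-out trading, the absence of fees, and the price list are all simplifying assumptions, not a suggested strategy.

```python
def moving_average(prices, window, t):
    """Mean of the `window` most recent prices up to and including time t."""
    return sum(prices[t - window + 1:t + 1]) / window


def recommend(prices, t, short=3, long=5):
    """Return 'buy', 'sell' or 'hold' using only prices known at time t."""
    if t < long - 1:
        return "hold"  # not enough history yet
    short_avg = moving_average(prices, short, t)
    long_avg = moving_average(prices, long, t)
    if short_avg > long_avg:
        return "buy"
    if short_avg < long_avg:
        return "sell"
    return "hold"


def simulate(prices, cash=1000.0):
    """Replay the series, trading all-in or all-out with no fees."""
    units = 0.0
    for t, price in enumerate(prices):
        action = recommend(prices, t)
        if action == "buy" and cash > 0:
            units, cash = cash / price, 0.0
        elif action == "sell" and units > 0:
            cash, units = units * price, 0.0
    return cash + units * prices[-1]  # value the holdings at the last price


prices = [10, 10, 10, 10, 10, 11, 12, 13, 14, 15]  # made-up prices
final_value = simulate(prices)
print(round(final_value, 2))
```

The recommendation at time `t` looks only at prices up to `t`, which avoids look-ahead bias, an easy way to overstate results in a simulation.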

Some factors the algorithm can take into consideration include the day of the month, day of the week, time of day, various financial indicators, and correlations between data from different time series.

Time series of non-numerical data such as news articles, Twitter feeds, or Facebook posts can be analysed to improve the accuracy of the prediction. Prior studies have reported this to be quite effective.

Note that the student is expected to build their own collection of training and testing data.

When you survey the literature on how well existing systems perform, be very careful about accuracy claims of better than 70%, especially when the system uses only historical numerical data or financial indicators.
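One way to put an accuracy figure in context is to compare it with a trivial baseline on the same test period. The sketch below, on made-up prices, shows that a predictor that always guesses the majority direction seen in training can already exceed 70% when the series trends. The helper names and the data are assumptions for illustration.

```python
def directions(prices):
    """1 if the price rose from one step to the next, else 0."""
    return [1 if later > earlier else 0
            for earlier, later in zip(prices, prices[1:])]


def accuracy(predicted, actual):
    """Fraction of predictions that match the actual directions."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def majority_baseline(train, test):
    """Accuracy of always predicting the majority direction seen in training."""
    majority = 1 if 2 * sum(train) >= len(train) else 0
    return accuracy([majority] * len(test), test)


# Made-up trending series: three rises followed by one small dip, repeated.
prices = [100.0]
for step in [2, 2, 2, -1] * 5:
    prices.append(prices[-1] + step)

moves = directions(prices)
train, test = moves[:12], moves[12:]
baseline = majority_baseline(train, test)
print(baseline)
```

A reported accuracy is only informative relative to a baseline like this one, measured on the same test data.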

Students working on this topic are encouraged to take a wider view and analyse prices or indices of many stocks or markets.

Students who produce good results early in the project can be put in touch with real investment recommendation firms, which can provide real-time data for testing, and may see their algorithms licensed and deployed in practice.

Requirements

Deliverables

References

Although some resources here are Python libraries, there is no restriction on the languages and tools you use. Indeed, a good data analysis and machine learning project like this one often requires the use of multiple languages.

  1. Apache Spark
  2. Apache Spark MLlib
  3. SciPy
  4. NumPy
  5. matplotlib
  6. Weka
  7. scikit-learn