Beta's projects for MSc students 2016-2017

General

Sequential associations

Project info

1 student, maximum 5 groups | Research project | Streams: General, Information Security, Multimedia Computing, Financial Computing | Project span: At least 1 year | Updated 2016-06-01[3]

Description

Events often happen with a cause. A cause may lead to an effect, but not with certainty. Sometimes it is hard to say which event is the cause and which is the effect. The events may merely be correlated because they are affected by the same cause, or they may just happen together by chance.

When many Hang Seng Index (HSI) constituents rise, the Hang Seng Index tends to go up. Getting straight A's in exams may lead you to get a degree with distinction. When the subtropical ridge of high pressure is further to the west, more typhoons are expected to enter the South China Sea [1]. A rise in some stock prices may cause a rise in commodity prices, and the most talked-about stocks on social media are the ones with the greatest volatility. Some of these changes are immediate (e.g., HSI), and some are delayed (e.g., distinction). The cause of some delayed events may not be obvious (e.g., typhoon), and some seemingly correlated events may not even have a cause-effect relationship (e.g., commodity, forum) [2].

The project is about studying and applying algorithms to sequences of data to find out how they are associated with each other. The student will choose and acquire sequences of data they are interested in, then study and apply statistical techniques, sequential data mining algorithms, or temporal classification methods to find out how the sequences are associated. Applications include generating academic advice (distinction case), predicting future stock index values (HSI case), predicting the range of the number of typhoons entering the South China Sea (typhoon case), and predicting stock prices given market information (commodity and forum cases).
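As a rough illustration of one of the simplest statistical techniques in this space, the sketch below measures how strongly one sequence is associated with a time-shifted copy of another (lagged Pearson correlation). The function names and the toy data are hypothetical, and a high correlation at some lag only suggests an association, not a cause-effect relationship.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant sequence carries no association signal
    return cov / (var_x * var_y) ** 0.5


def lagged_correlations(leader, follower, max_lag):
    """Correlate leader[t] with follower[t + lag] for lag = 0..max_lag.

    Assumes both sequences have the same length and sampling interval.
    """
    result = {}
    for lag in range(max_lag + 1):
        n = len(leader) - lag
        result[lag] = pearson(leader[:n], follower[lag:lag + n])
    return result


def best_lag(leader, follower, max_lag):
    """Lag with the largest absolute correlation."""
    corr = lagged_correlations(leader, follower, max_lag)
    return max(corr, key=lambda lag: abs(corr[lag]))


# Toy data (made up): the follower repeats the leader two steps later.
leader = [1, 3, 2, 5, 4, 7, 6, 9, 8, 11]
follower = [0, 0] + leader[:-2]
print(best_lag(leader, follower, max_lag=4))
```

A delayed association such as the distinction or typhoon case would show up as a peak at a non-zero lag, while an immediate one such as the HSI case would peak at lag 0.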

Requirements

Deliverables

References

  1. Why Tropical Cyclone Recurves? PAN Chi-kin; Hong Kong Observatory 2011-09.
  2. Spurious correlations

Bird Sound Recognizer

Project info

1 student, maximum 3 groups | Development project | Streams: General, Multimedia Computing | Project span: At least 1 year | Updated 2016-06-01[3]

Description

Listening to bird sounds is very important for bird watchers, as it not only locates the birds but also identifies their species.

Build a system that recognizes bird songs.

Note that the student is expected to build their own collection of training and testing data. The HKBWS Bird Call page is a good starting point. There are CDs of bird calls on the market as well.

The project involves cleaning the sound, extracting the relevant sections, identifying features, and matching against a database or recognizing through a neural network or other methods.
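The feature-and-matching steps can be sketched in a few lines. The example below is only a toy, assuming synthetic sine tones in place of real recordings: it computes the share of spectral energy in a few frequency bands for one frame, then picks the closest stored template. All names, the band layout, and the two "species" labels are hypothetical. A real system would more likely use an FFT from NumPy or SciPy and richer features.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (O(N^2), fine for one short frame)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def band_energies(frame, n_bands):
    """Feature vector: fraction of spectral energy in n_bands equal-width bands."""
    mags = magnitude_spectrum(frame)[1:]  # drop the DC bin
    width = len(mags) // n_bands
    bands = [sum(m * m for m in mags[i * width:(i + 1) * width]) for i in range(n_bands)]
    total = sum(bands) or 1.0  # avoid division by zero on silence
    return [b / total for b in bands]


def nearest_label(features, templates):
    """Label of the template with the smallest Euclidean distance to features."""
    return min(templates, key=lambda label: math.dist(features, templates[label]))


def tone(freq_hz, sample_rate=8000, n=256):
    """Synthetic stand-in for a cleaned, extracted section of a recording."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


# Hypothetical template database: one feature vector per species.
templates = {
    "species_a": band_energies(tone(500), 4),
    "species_b": band_energies(tone(2500), 4),
}
print(nearest_label(band_energies(tone(520), 4), templates))
```

Nearest-template matching is the simplest form of "matching against a database"; swapping `nearest_label` for a trained classifier is where a neural network or another method would fit in.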

It is best to start from a small collection of sounds of a few distinct species, and expand it to 20, 50 or 100.

Experimentation and visualization using systems such as Audacity, PureData, GNU Octave, Mathematica, or Matlab would be the fun part of the project.

The system should be implemented in an operating-system independent way.

Requirements

Deliverables

References

Although some resources here are Python libraries, there is no restriction on the languages and tools you use. Indeed, a good data analysis and machine learning project like this one often requires the use of multiple languages.

  1. HKBWS Bird Call page
  2. Audacity
  3. PureData
  4. GNU Octave
  5. SciPy
  6. NumPy
  7. matplotlib

Speaker Recognizer

Project info

1 student, maximum 2 groups | Development project | Streams: General, Multimedia Computing | Project span: At least 1 year | Updated 2016-06-01[3]

Description

Build a system that recognizes and labels the speakers in a recorded radio talk show or phone-in show.

Preferably, the system should be language-independent.

A more advanced version of the system should be able to run in stream mode and take live speech, generating output as the input is analyzed and possibly correcting earlier outputs when necessary.
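One possible shape for such a stream mode is sketched below. It assumes that some upstream classifier (not shown, and hypothetical here) produces one raw speaker guess per short audio segment. Each guess is emitted immediately as a provisional label, and an earlier label is revised once later segments show it was an isolated outlier. The class name, the majority-vote rule, and the window radius are illustrative assumptions.

```python
from collections import Counter


class StreamingLabeler:
    """Emit provisional speaker labels and revise them as more input arrives."""

    def __init__(self, radius=1):
        self.radius = radius  # segments of context on each side of a label
        self.raw = []         # raw per-segment guesses from the classifier
        self.labels = []      # current best labels, possibly revised

    def push(self, guess):
        """Accept one raw guess and return a list of (index, label) updates."""
        self.raw.append(guess)
        self.labels.append(guess)
        updates = [(len(self.raw) - 1, guess)]  # provisional output

        # The segment `radius` steps back now has full context on both sides.
        center = len(self.raw) - 1 - self.radius
        if center >= self.radius:
            window = self.raw[center - self.radius:center + self.radius + 1]
            majority, count = Counter(window).most_common(1)[0]
            # Revise only when a strict majority disagrees with the current label.
            if count > len(window) // 2 and majority != self.labels[center]:
                self.labels[center] = majority
                updates.append((center, majority))
        return updates


labeler = StreamingLabeler(radius=1)
for guess in ["A", "A", "B", "A", "A", "B", "B", "B"]:
    print(labeler.push(guess))
print(labeler.labels)
```

In this run the lone "B" at index 2 is first emitted as is, then corrected to "A" when the next segment arrives, while the later run of "B" segments is kept.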

Note that the student is expected to build their own collection of training and testing data.

Experimentation using systems such as Audacity, PureData, Octave, Mathematica, or Matlab is expected.

The system should be implemented in an operating-system independent way.

Requirements

Deliverables

References

Although some resources here are Python libraries, there is no restriction on the languages and tools you use. Indeed, a good data analysis and machine learning project like this one often requires the use of multiple languages.

  1. Audacity
  2. PureData
  3. GNU Octave
  4. SciPy
  5. NumPy
  6. matplotlib

Distributed Financial Data Forecaster

Project info

1 student, maximum 2 groups | Research project | Streams: General, Financial Computing | Project span: At least 1 year | Updated 2016-06-01[3]

Description

Given the historical data of a number of stock or index prices at different points in time, design an algorithm that would predict their values in the future.

Some factors the algorithm can take into consideration include the day of the month, the day of the week, the time of day, various financial indicators, and correlations between data from different time series.
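As a small illustration, the calendar-based factors and one simple indicator could be derived as follows. The feature names and the choice of a simple moving average as the example indicator are assumptions made for this sketch, not a prescribed feature set.

```python
from datetime import datetime


def calendar_features(ts):
    """Calendar factors for one timestamped price observation."""
    return {
        "day_of_month": ts.day,
        "weekday": ts.weekday(),  # Monday = 0, Sunday = 6
        "minute_of_day": ts.hour * 60 + ts.minute,
    }


def simple_moving_average(prices, window):
    """One value per full window; the first window - 1 prices yield no output."""
    return [
        sum(prices[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(prices))
    ]


print(calendar_features(datetime(2016, 6, 1, 9, 30)))
print(simple_moving_average([10, 11, 12, 13], window=2))
```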

Time series of non-numerical data such as news articles, Twitter feeds, or Facebook posts can be analysed to improve the accuracy of the prediction. Prior studies have reported this to be quite effective.

Note that the student is expected to build their own collection of training and testing data.

The prediction system should be run and tested in a distributed environment, using packages and libraries such as MLlib on PySpark, Hadoop, or MPI.

Also, note that since the non-distributed version of this project has been done in the past, students taking it up should show how their approaches are better than past approaches.

When you survey the literature to see how good existing systems are, be very careful about accuracy claims of better than 70%, especially when a system uses only historical numerical data or financial indicators.
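One way to put a reported accuracy figure in context is to compare it with a trivial baseline evaluated on the same test period. The sketch below is only an illustrative sanity check, assuming the task is up/down direction prediction; the function names and the toy labels are made up.

```python
from collections import Counter


def directions(prices):
    """Label each step 'up' or 'down' (an unchanged price counts as 'down')."""
    return ["up" if b > a else "down" for a, b in zip(prices, prices[1:])]


def accuracy(predicted, actual):
    """Fraction of positions where the prediction matches the actual label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def majority_baseline_accuracy(train, test):
    """Accuracy of always predicting the most common training label."""
    majority = Counter(train).most_common(1)[0][0]
    return accuracy([majority] * len(test), test)


# Toy labels (made up): a mostly rising test period.
train = ["up", "up", "down"]
test = ["up", "up", "up", "down"]
print(majority_baseline_accuracy(train, test))
```

If a test period happens to be trending, this do-nothing baseline can already score well, so a claimed accuracy is informative only relative to it.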

Requirements

Deliverables

References

Although some resources here are Python libraries, there is no restriction on the languages and tools you use. Indeed, a good data analysis and machine learning project like this one often requires the use of multiple languages.

  1. Apache Spark
  2. Apache Spark MLlib
  3. SciPy
  4. NumPy
  5. matplotlib
  6. Weka
  7. scikit-learn