Objectives

Firstly, we will write a canonicalizer that canonicalizes the original KB into a less redundant, better-linked KB. We will investigate more effective similarity functions to make canonicalization more accurate. We will also try out several approximation methods to simplify the computation of the similarity function, which we hope will improve the efficiency of the clustering process.
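To make the canonicalization step concrete, here is a minimal sketch, not our final design, that assumes token-level Jaccard similarity as the similarity function and single-link clustering with a fixed threshold. The names tokens, jaccard and canonicalize, the threshold of 0.5, and the rule of choosing the shortest phrase as the canonical form are all illustrative assumptions. It also shows one cheap way to cut the cost of pairwise comparison: "blocking", where only phrases that share a token are compared. For token Jaccard this loses nothing, because phrases with no shared token have similarity 0; for other similarity functions, blocking would be a genuine approximation.

```python
from collections import defaultdict
from itertools import combinations


def tokens(phrase: str) -> frozenset[str]:
    """Lower-cased word set of a noun or relation phrase."""
    return frozenset(phrase.lower().split())


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def canonicalize(phrases: list[str], threshold: float = 0.5) -> dict[str, str]:
    """Map each phrase to a canonical representative of its cluster.

    Assumed design: single-link clustering via union-find, so any pair
    whose Jaccard similarity reaches `threshold` ends up in one cluster.
    Blocking: only phrases sharing at least one token are compared,
    instead of all O(n^2) pairs.
    """
    parent = list(range(len(phrases)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    toks = [tokens(p) for p in phrases]

    # Inverted index: token -> indices of phrases containing it.
    blocks: dict[str, list[int]] = defaultdict(list)
    for i, t in enumerate(toks):
        for word in t:
            blocks[word].append(i)

    # Compare only within blocks and merge similar pairs.
    for members in blocks.values():
        for i, j in combinations(members, 2):
            if jaccard(toks[i], toks[j]) >= threshold:
                parent[find(i)] = find(j)

    # Group indices by cluster root.
    clusters: dict[int, list[int]] = defaultdict(list)
    for i in range(len(phrases)):
        clusters[find(i)].append(i)

    # Hypothetical choice: the shortest phrase (ties broken
    # alphabetically) becomes the canonical form of its cluster.
    canon: dict[str, str] = {}
    for members in clusters.values():
        rep = min((phrases[i] for i in members), key=lambda p: (len(p), p))
        for i in members:
            canon[phrases[i]] = rep
    return canon
```

For example, canonicalize(["Barack Obama", "barack obama", "President Barack Obama", "New York City", "New York"]) maps the first three phrases to "Barack Obama" and the last two to "New York". Single-link clustering is simple but can chain loosely related phrases together, which is one reason better similarity functions matter.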

Secondly, we will inspect the implementation details of current QA systems and try to adapt them to work on the canonicalized KB. As the feature vectors will be totally different from those of traditional KBs, we will have to train a new machine learning model. We will try different models, tuning the parameters of each, to increase matching accuracy.

Thirdly, we will also work on automatic population of KBs. Sometimes several assertions have to be linked together to answer a question. Since it is difficult to implement a QA system that can link assertions together, we can instead add a new assertion to the KB. In the example above, a program, which we will call an automatic populator, could add the assertion ("John", "is the father of", "Henry") to the KB. If this can be done automatically, our QA system will be able to answer more complex questions.
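One simple way an automatic populator could work is forward chaining over composition rules: whenever (x, r1, y) and (y, r2, z) are both in the KB and a rule says r1 followed by r2 implies r3, add (x, r3, z), and repeat until nothing new appears. The sketch below assumes exactly that; the rule table RULES, the function populate, and the premise assertions (John "is the husband of" Mary, Mary "is the mother of" Henry) are hypothetical and need not match the earlier example. In practice such rules would have to be written by hand or learned.

```python
from collections import defaultdict

# An assertion is a (subject, relation, object) triple.
Triple = tuple[str, str, str]

# Hypothetical composition rules: (r1, r2) -> r3 means that
# (x, r1, y) and (y, r2, z) together imply (x, r3, z).
RULES: dict[tuple[str, str], str] = {
    ("is the husband of", "is the mother of"): "is the father of",
}


def populate(kb: set[Triple], rules: dict[tuple[str, str], str] = RULES) -> set[Triple]:
    """Return kb plus every assertion derivable by repeatedly applying rules.

    Forward chaining to a fixpoint; it terminates because only finitely
    many triples can be built from the existing entities and relations.
    """
    kb = set(kb)
    while True:
        # Index assertions by subject so (y, r2, z) can be looked up from y.
        by_subject: dict[str, list[tuple[str, str]]] = defaultdict(list)
        for s, r, o in kb:
            by_subject[s].append((r, o))

        new: set[Triple] = set()
        for x, r1, y in kb:
            for r2, z in by_subject.get(y, []):
                r3 = rules.get((r1, r2))
                if r3 is not None and (x, r3, z) not in kb:
                    new.add((x, r3, z))

        if not new:
            return kb
        kb |= new
```

With kb = {("John", "is the husband of", "Mary"), ("Mary", "is the mother of", "Henry")}, populate(kb) additionally contains ("John", "is the father of", "Henry"), so a question about Henry's father can be answered from a single assertion.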

Finally, we will investigate how to apply a combined parsing technique to achieve high accuracy in question parsing. The aforementioned OQA system combines handwritten templates with a machine learning model in one particular way, and there are many other ways to build an ensemble parser. We will spend some time investigating these alternatives.
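As one illustration of an ensemble parser, the sketch below uses a simple cascade: hand-written templates are tried first, and a learned parser is consulted only when no template matches, with its output accepted only above a confidence threshold. This is an assumed design for illustration, not a description of how OQA combines its components. The template, the function names template_parse and ensemble_parse, the (parse, confidence) interface of the model, and the 0.7 threshold are all hypothetical.

```python
import re
from typing import Callable, Optional

# A parsed question: (subject, relation, object), with "?" marking the unknown slot.
Parse = tuple[str, str, str]

# Hypothetical hand-written templates: a regex plus a function that builds the parse.
TEMPLATES = [
    (
        re.compile(r"who is the (\w+) of (.+?)\??$", re.IGNORECASE),
        lambda m: ("?", f"is the {m.group(1).lower()} of", m.group(2)),
    ),
]


def template_parse(question: str) -> Optional[Parse]:
    """Return the parse from the first matching template, or None."""
    for pattern, build in TEMPLATES:
        m = pattern.match(question.strip())
        if m:
            return build(m)
    return None


def ensemble_parse(
    question: str,
    model: Callable[[str], tuple[Parse, float]],
    min_confidence: float = 0.7,
) -> Optional[Parse]:
    """Cascade ensemble: templates first (high precision), then the model.

    `model` is a stand-in for a trained parser that returns a parse and
    a confidence score; low-confidence parses are rejected.
    """
    parse = template_parse(question)
    if parse is not None:
        return parse
    parse, confidence = model(question)
    return parse if confidence >= min_confidence else None
```

Other combinations, such as letting the model rerank candidate parses produced by several templates, or voting among parsers, fit the same interface; the cascade is just the simplest to reason about.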
