A Smart Email Client to Help Users Identify Malicious URLs




Introduction


Email spam is a serious problem in the cyber world. More than 50% of global mail traffic in 2019 was spam [1]. Many of these spam emails contain malicious URLs (Uniform Resource Locators). Many URL checkers are available on the internet, but email clients rarely integrate a function for checking the safety of a URL. This project plans to build a smart email client that helps users identify malicious URLs.


Background


A malicious URL is a URL that points to a malicious website, that is, a website hosting malicious content. Users who visit such websites are exposed to different kinds of attacks, including drive-by download attacks and phishing [2]. In a drive-by download attack, malware is downloaded to the user's device, and hackers may then control the device to perform other cyber attacks [3]. Phishing sites are websites that imitate the appearance of legitimate websites in order to lure users into revealing private information such as phone numbers and credit card numbers [4]. Users may suffer monetary loss in phishing attacks.

Some hackers hide malicious URLs using URL shortening services. A URL shortening service is a web service that creates a short URL as an alias for any URL submitted by a user. A user visiting the short URL is redirected to the page that the original URL points to [5]. The purpose of the service is to convert long URLs into reasonably short ones for easy sharing. Some URL shortening services can also track the number of visitors to the short URL and provide the statistics to the user who created the link [6]. Hackers use these services to hide their malicious URLs, and users can hardly tell whether a short URL will redirect them to a malicious website without accessing it [5].
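One way a client could reveal where a short URL leads, without loading the destination page, is to follow the redirect chain using HEAD requests. The sketch below is only an illustration under stated assumptions: the function names `fetch_location` and `expand_short_url` and the hop limit are hypothetical and not part of the project plan, and a real service may behave differently (for example by redirecting through JavaScript rather than HTTP status codes).

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def fetch_location(url, timeout=5.0):
    """Send one HEAD request and return the redirect target, or None.

    Hypothetical helper: only the response headers are requested,
    so the page body is never downloaded.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        if response.status in REDIRECT_CODES:
            location = response.getheader("Location")
            # Location may be relative, so resolve it against the current URL.
            return urljoin(url, location) if location else None
        return None
    finally:
        conn.close()


def expand_short_url(url, fetch=fetch_location, max_hops=5):
    """Return the list of URLs visited, starting with the short URL.

    `fetch` is injectable so the logic can be exercised without
    network access; `max_hops` guards against redirect loops.
    """
    chain = [url]
    for _ in range(max_hops):
        target = fetch(chain[-1])
        if target is None or target in chain:
            break
        chain.append(target)
    return chain
```

The last element of the returned list is the URL that would actually be handed to a classifier. Passing a custom `fetch` callable makes it possible to try the loop against a fixed mapping instead of live services.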


Progress


Date         Task performed             Notes
26/9/2019    Constructed project plan   -

Methodology


This project is divided into two parts: developing a malicious URL classification model, and developing an email client that integrates the model.


Malicious URL Classification Model

This project will use a machine learning approach to train a classification model. Logistic regression, a common algorithm for binary classification problems, will be used to analyze the data, and the Python library scikit-learn will be used to implement the model. Lists of malicious and benign URLs can be collected from data sets available on the internet. The URLs will be divided into a training set and a testing set and processed to extract features for classification. In the early stage of the project, the model will focus on lexical features of the URL for simplicity. Later, more features, such as features of the HTML file that the URL points to, will be added to improve the accuracy of the model. The training set will be used to train the classification model, and the testing set will then be used to measure the accuracy of the trained model.
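The pipeline described above could look roughly like the following sketch. The particular lexical features (URL length, number of dots in the host, digit count, and so on), the function names `lexical_features` and `train_and_evaluate`, and the 80/20 split are illustrative assumptions rather than the project's final design; only the use of logistic regression and scikit-learn comes from the plan itself.

```python
import ipaddress
from urllib.parse import parse_qsl, urlsplit


def lexical_features(url):
    """Turn a URL string into a list of simple numeric features.

    The feature choice here is an assumption made for illustration.
    """
    # Add a scheme if it is missing so that the host is parsed correctly.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""
    try:
        ipaddress.ip_address(host)
        host_is_ip = 1
    except ValueError:
        host_is_ip = 0
    return [
        len(url),                         # total URL length
        len(host),                        # host name length
        host.count("."),                  # dots in the host name
        sum(ch.isdigit() for ch in url),  # number of digits
        url.count("-"),                   # number of hyphens
        int("@" in url),                  # "@" present in the URL
        host_is_ip,                       # host is a raw IP address
        len(parts.path),                  # path length
        len(parse_qsl(parts.query)),      # number of query parameters
    ]


def train_and_evaluate(urls, labels, test_size=0.2, seed=0):
    """Train logistic regression on lexical features and report accuracy.

    `labels` holds 1 for malicious and 0 for benign URLs.
    """
    # Imported here so the feature extractor above can be used
    # even when scikit-learn is not installed.
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    features = [lexical_features(u) for u in urls]
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, test_size=test_size, random_state=seed, stratify=labels
    )
    model = LogisticRegression(max_iter=1000)
    model.fit(x_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(x_test))
    return model, accuracy
```

Calling `train_and_evaluate(urls, labels)` with a labelled data set returns the fitted model together with its accuracy on the held-out testing set. Features with very different scales may benefit from standardization before fitting, which this sketch leaves out.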


Email Client that integrates the model

This project aims to develop a smart email client, so providing a comfortable user interface is also important. The backend is written in Python, since Python provides various libraries, including smtplib, that simplify communicating with email services. In the early stage of the project, Gmail will be used to test backend functions such as sending mail through SMTP and receiving mail through a retrieval protocol such as IMAP (SMTP itself only handles sending). More email providers will be supported in the future. For security reasons, users' accounts, passwords, and messages retrieved from the mail server will not be stored in the database. As a result, loading the details of an email takes more time, which affects efficiency. For the frontend, React JS is used to build the web interface, since it supports most of the functions the interface needs, such as animation and handling click, move, and drag events. Basic functions, such as showing a list of emails and showing the details of an email, will be available in the early stage. Further improvements, for example seasonal themes and the ability to send emails, will be added in the future.
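To connect the email client with the classification model, the backend needs to pull the URLs out of each message it retrieves. The following is a minimal sketch using Python's standard `email` package; the function name `extract_urls` and the regular expression are assumptions made for illustration, and the pattern is deliberately simple (it will miss links without an `http://` or `https://` prefix).

```python
import re
from email import message_from_string
from email.policy import default

# Simple pattern: scheme followed by anything up to whitespace or a quote/bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(raw_message):
    """Return the distinct URLs found in the text parts of a raw email.

    `raw_message` is the full message source (headers and body) as a string.
    """
    message = message_from_string(raw_message, policy=default)
    urls = []
    for part in message.walk():
        # Skip multipart containers and attachments such as images.
        if part.get_content_maintype() != "text":
            continue
        text = part.get_content()
        for match in URL_PATTERN.findall(text):
            # Drop trailing punctuation that belongs to the sentence, not the URL.
            cleaned = match.rstrip(".,;:!?)")
            if cleaned not in urls:
                urls.append(cleaned)
    return urls
```

Each URL returned by `extract_urls` could then be passed to the classifier before the message is shown to the user. Because the function works on the raw message text, it does not depend on how the message was fetched.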


Documents


- Project Plan

Contact Us



Leung Tak Fung
Student
Tel: +852 55465117
Email: tfleung6@connect.hku.hk


Cheng Ngai Tong
Student
Tel: +852 64857231
Email: ntcheng@cs.hku.hk


Professor Dr. T.W. Chim
Supervisor
Tel: +852 28578272
Email: twchim@cs.hku.hk

References


[1] M. Vergelis, T. Shcherbakova, and T. Sidorina, “Spam and phishing in Q2 2019,” Spam and phishing in Q2 2019, 15-May-2019. [Online]. Available: https://securelist.com/spam-and-phishing-in-q2-2019/92379/.

[2] D. R. Patil and J. Patil, “Survey on Malicious Web Pages Detection Techniques,” International Journal of u- and e-Service, Science and Technology, vol. 8, pp. 195-206, May 2015.

[3] M. Cova, C. Krügel, and G. Vigna, “Detection and analysis of drive-by-download attacks and malicious JavaScript code,” Proceedings of the 19th International Conference on World Wide Web (WWW), Raleigh, North Carolina, 2010.

[4] D. Sahoo, C. Liu, and C. H. Hoi, “Malicious URL Detection using Machine Learning: A Survey,” arXiv preprint arXiv:1701.07179, vol. 1

[5] S. Zanero and G. Stringhini, “Two years of short URLs internet measurement: security threats and countermeasures,” Proceedings of the 22nd International Conference on World Wide Web, May 2013.

[6] N. Nikiforakis, F. Maggi, G. Stringhini, M. Zubair Rafique, W. Joosen, C. Kruegel, F. Piessens, G. Vigna, and S. Zanero, “Stranger danger: exploring the ecosystem of ad-based URL shortening services,” Proceedings of the 23rd International Conference on World Wide Web (WWW '14), April 7-11, 2014.

[7] B. Eshete, A. Villafiorita, and K. Weldemariam, “BINSPECT: Holistic Analysis and Detection of Malicious Web Pages,” in A. D. Keromytis and R. Di Pietro (eds), Security and Privacy in Communication Networks, SecureComm 2012, Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol. 106. Springer, Berlin, Heidelberg.

Image source: https://pixabay.com/illustrations/email-newsletter-marketing-online-3249062/