In recent years, artificial intelligence (AI) has become part of many aspects of daily life by making the devices people use smarter. Fueled by data, AI programs imitate human intelligence in how they learn and behave.

With such widespread usage, users demand richer functionality and faster response times, pushing developers and data scientists to make their programs smarter amid industry competition. Smarter programs, however, bring added complexity, forcing developers into a dilemma between prioritizing features and prioritizing performance.

This project proposes designing an AI application with a distributed architecture rather than the centralized architecture that is more common today, with the goal of improving latency, efficiency, and throughput. As a proof of concept, the project examines a complex image analysis service.

The project’s objective is to develop the tooling and foundation needed to automatically instantiate and compare distributed systems across a variety of specifications and scheduling algorithms. The project has three milestones.
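To give a concrete sense of what such tooling might consume, the sketch below shows one possible way to describe a system specification and its scheduling policy in Python. The class names, fields, and scheduler labels are illustrative assumptions, not the project's actual configuration format.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical specification types: the field names and scheduler labels
# are assumptions for illustration, not the project's real schema.
@dataclass
class WorkerSpec:
    host: str   # address of a model-serving worker
    gpu: bool   # whether the worker has a GPU available


@dataclass
class SystemSpec:
    workers: List[WorkerSpec]   # the set of serving nodes to instantiate
    scheduler: str              # e.g. "round_robin" or "least_loaded"


# Two example configurations that the comparison tooling could instantiate
# side by side and benchmark against the same workload.
specs = [
    SystemSpec(workers=[WorkerSpec("10.0.0.1", True)],
               scheduler="round_robin"),
    SystemSpec(workers=[WorkerSpec("10.0.0.1", True),
                        WorkerSpec("10.0.0.2", False)],
               scheduler="least_loaded"),
]
```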

First, the machine learning stage, in which the test models are developed. Second, modifying the model serving to work in a distributed manner. Lastly, comparing the distributed implementations, which is the most crucial aspect of the project.
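For the comparison milestone, the sketch below is a minimal, hypothetical benchmark harness in Python: it drives concurrent requests against any serving function and reports latency and throughput. The `serve` callables, payloads, and sleep-based stand-ins are assumptions for illustration, not the project's actual test setup.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def benchmark(serve, requests, concurrency=8):
    """Send payloads to `serve` concurrently; report latency and throughput."""
    latencies = []

    def timed_call(payload):
        start = time.perf_counter()
        serve(payload)
        latencies.append(time.perf_counter() - start)

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(timed_call, requests))  # block until all requests finish
    wall = time.perf_counter() - wall_start

    return {
        "mean_latency_s": statistics.mean(latencies),
        "p95_latency_s": sorted(latencies)[int(0.95 * len(latencies)) - 1],
        "throughput_rps": len(requests) / wall,
    }


if __name__ == "__main__":
    # time.sleep stands in for real inference; in practice these would be
    # calls to the centralized and distributed image analysis endpoints.
    centralized = lambda img: time.sleep(0.05)
    distributed = lambda img: time.sleep(0.02)
    dummy_requests = [b"image-bytes"] * 100

    print("centralized:", benchmark(centralized, dummy_requests))
    print("distributed:", benchmark(distributed, dummy_requests))
```

Running the same workload against each configuration keeps the comparison fair: only the serving architecture and scheduling policy change between runs.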

The project shows that distributed implementations of AI applications achieve lower latency, greater efficiency, and higher throughput than their centralized counterparts.

The Final Project Report can be accessed here.

The interim presentation of the project (recorded on Feb 10, 2020) can be accessed here (slides available here).