Deploying machine learning models the serverless way

Join the MCubed webcast for tips on getting machine learning production-ready without breaking the bank

In machine learning disciplines like natural language processing (NLP), models with insane amounts of parameters currently get practitioners swooning when it comes to precision. However, enthusiasm fades when the question of getting them into production comes up, as their sheer size turns deployments into quite a struggle. Add the costs of setting up and maintaining the infrastructure needed to make that step happen, and you begin to understand why approaches like BERT and GPT aren’t more widely used.

Serverless computing, with its promise of automatically scalable, managed computing capabilities that satisfy changing demands and keep costs low, might be the solution here. The reason it hasn’t caught on yet can be found in the limitations cloud providers impose on their users. Conventional deployment package size limits, for example, don’t exactly lend themselves to NLP scenarios, which is why researchers around the world are working on ways to shrink their models to fit serverless specifications.

Marek Šuppa is one of them and will join episode 4 of our MCubed web lecture series for machine learning practitioners on December 2nd to talk about recent developments. Šuppa is head of data at Q&A and polling app Slido, where he and some colleagues spent the last year investigating ways to modify BERT models for sentiment analysis and classification so that they can be used in serverless environments, without the dreaded performance degradation.

In his talk, Šuppa will speak a bit about his team’s use case, the troubles they encountered during their studies, and the approaches they found most promising for reaching latency levels appropriate for production environments.