Using Apache Spark on Amazon EMR with SageMaker for End-to-End ML and Data Science Workflows

Webinar

Published May 2022

Amazon SageMaker Studio is the first fully integrated development environment (IDE) for machine learning (ML). It provides a single, web-based visual interface where you can perform all the ML development steps required to prepare data and to build, train, and deploy models. Analyzing, transforming, and preparing large amounts of data is a foundational step of any data science and ML workflow. Amazon EMR is a cloud big data platform for running large-scale distributed data processing jobs, interactive SQL queries, and ML applications using open-source analytics frameworks such as Apache Spark, Apache Hive, and Presto. In this talk, we demonstrate recent integrations between the two services that make it simple for data scientists and ML engineers to use distributed big data frameworks such as Spark in their ML workflows.

Learning Objectives:

How to use a unified, notebook-centric experience to create and manage EMR clusters, run analytics on those clusters, and train and deploy SageMaker models.
How to use a one-click interface to debug and monitor Amazon EMR jobs through the Spark UI.
How data workers can discover, connect to, create, and stop clusters in a multi-account setup.

