Data Ingestion to Delta Lake

Learning Track

Data Engineering

Delivery methods

On-Site, Virtual

Duration

1 day

This course prepares data professionals to productionize ETL pipelines on the Databricks Data Intelligence Platform. Students will use Delta Live Tables with Spark SQL and Python to define and schedule pipelines that incrementally process new data from a variety of data sources into the Lakehouse. Students will also orchestrate tasks with Databricks Workflows and promote code with Databricks Repos.
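
For a sense of what the hands-on pipeline work looks like, here is a minimal Delta Live Tables sketch in Python. The storage path, table names, and schema are hypothetical; Auto Loader (cloudFiles) is assumed as the incremental ingestion source, and spark is provided by the pipeline runtime.

    import dlt
    from pyspark.sql.functions import col

    @dlt.table(comment="Orders ingested incrementally from cloud storage.")
    def orders_raw():
        # Auto Loader (cloudFiles) processes only files it has not seen before.
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/Volumes/main/default/orders_landing")  # hypothetical path
        )

    @dlt.table(comment="Orders with types enforced and invalid rows dropped.")
    @dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")
    def orders_clean():
        return dlt.read_stream("orders_raw").select(
            col("order_id").cast("bigint"),
            col("order_ts").cast("timestamp"),
            col("amount").cast("double"),
        )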

Objectives

By the end of this course, attendees should be able to:
  • Navigate and use the Databricks Data Science and Engineering Workspace for code development tasks.

  • Utilize Spark SQL and PySpark to extract data from various sources (see the sketch after this list).

  • Apply common data cleaning transformations using Spark SQL and PySpark.

  • Manipulate complex data structures with advanced functions in Spark SQL and PySpark.
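
As a rough illustration of the last three objectives, the PySpark sketch below extracts JSON data, applies common cleaning transformations, and flattens a nested array column. The path, column names, and schema are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, explode, lower, trim

    spark = SparkSession.builder.getOrCreate()

    # Extract: read raw JSON files into a DataFrame (hypothetical path).
    events = spark.read.json("/tmp/events")

    # Clean: normalize casing and whitespace, then drop duplicate users.
    cleaned = (
        events
        .withColumn("email", lower(trim(col("email"))))
        .dropDuplicates(["user_id"])
    )

    # Complex structures: explode an array-of-structs column into one row
    # per element, then pull out individual struct fields.
    flattened = cleaned.select(
        "user_id",
        explode("items").alias("item"),
    ).select("user_id", col("item.sku"), col("item.qty"))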

Prerequisites

  • Beginner familiarity with basic cloud concepts (virtual machines, object storage, identity management)
  • Ability to perform basic code development tasks (create compute, run code in notebooks, use basic notebook operations, import repos from git, etc.)
  • Intermediate familiarity with basic SQL concepts (CREATE, SELECT, INSERT, UPDATE, DELETE, WHERE, GROUP BY, JOIN, etc.)

Course outline

  • Set Up and Load Delta Tables
  • Basic Transformations
  • Load Data Lab
  • Cleaning Data
  • Complex Transformations
  • SQL UDFs (see the sketch after this outline)
  • Advanced Delta Lake Features
  • Manipulate Delta Tables Lab
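
To preview the final modules, the sketch below registers a SQL UDF and exercises two Delta Lake features (table history and time travel) from Python. The function, table name, and version number are illustrative only.

    # SQL UDF: a named function written in SQL, callable from any query.
    spark.sql("""
        CREATE OR REPLACE FUNCTION yelling(text STRING)
        RETURNS STRING
        RETURN concat(upper(text), '!!!')
    """)
    spark.sql("SELECT yelling('hello') AS shout").show()

    # Advanced Delta Lake features: inspect the table's transaction log
    # and query an earlier version of the data (time travel).
    spark.sql("DESCRIBE HISTORY orders_clean").show()
    spark.sql("SELECT * FROM orders_clean VERSION AS OF 1").show()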
