Data Ingestion to Delta Lake

Learning Track

Data Engineering

Delivery methods

On-Site, Virtual

Duration

1 day

This course prepares data professionals to productionize ETL pipelines on the Databricks Data Intelligence Platform. Students will use Delta Live Tables with Spark SQL and Python to define and schedule pipelines that incrementally process new data from a variety of data sources into the Lakehouse. Students will also orchestrate tasks with Databricks Workflows and promote code with Databricks Repos.
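
For a sense of what the hands-on pipeline work looks like, here is a minimal Delta Live Tables sketch in Python. The storage path, table names, and schema are hypothetical; Auto Loader (cloudFiles) is assumed as the incremental ingestion source, and spark is provided by the pipeline runtime.

    import dlt
    from pyspark.sql.functions import col

    @dlt.table(comment="Orders ingested incrementally from cloud storage.")
    def orders_raw():
        # Auto Loader (cloudFiles) processes only files it has not seen before.
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/Volumes/main/default/orders_landing")  # hypothetical path
        )

    @dlt.table(comment="Orders with types enforced and invalid rows dropped.")
    @dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")
    def orders_clean():
        return dlt.read_stream("orders_raw").select(
            col("order_id").cast("bigint"),
            col("order_ts").cast("timestamp"),
            col("amount").cast("double"),
        )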

Objectives

By the end of this course, attendees should be able to:
  • Navigate and use the Databricks Data Science and Engineering Workspace for code development tasks.

  • Utilize Spark SQL and PySpark to extract data from various sources (see the sketch after this list).

  • Apply common data cleaning transformations using Spark SQL and PySpark.

  • Manipulate complex data structures with advanced functions in Spark SQL and PySpark.
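
As a rough illustration of the last three objectives, the PySpark sketch below extracts JSON data, applies common cleaning transformations, and flattens a nested array column. The path, column names, and schema are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, explode, lower, trim

    spark = SparkSession.builder.getOrCreate()

    # Extract: read raw JSON files into a DataFrame (hypothetical path).
    events = spark.read.json("/tmp/events")

    # Clean: normalize casing and whitespace, then drop duplicate users.
    cleaned = (
        events
        .withColumn("email", lower(trim(col("email"))))
        .dropDuplicates(["user_id"])
    )

    # Complex structures: explode an array-of-structs column into one row
    # per element, then pull out individual struct fields.
    flattened = cleaned.select(
        "user_id",
        explode("items").alias("item"),
    ).select("user_id", col("item.sku"), col("item.qty"))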

Prerequisites

  • Beginner familiarity with basic cloud concepts (virtual machines, object storage, identity management)
  • Ability to perform basic code development tasks (create compute, run code in notebooks, use basic notebook operations, import repos from git, etc.)
  • Intermediate familiarity with basic SQL concepts (CREATE, SELECT, INSERT, UPDATE, DELETE, WHERE, GROUP BY, JOIN, etc.)

Course outline

  • Set Up and Load Delta Tables
  • Basic Transformations
  • Load Data Lab
  • Cleaning Data
  • Complex Transformations
  • SQL UDFs (see the sketch after this outline)
  • Advanced Delta Lake Features
  • Manipulate Delta Tables Lab
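
To preview the final modules, the sketch below registers a SQL UDF and exercises two Delta Lake features (table history and time travel) from Python. The function, table name, and version number are illustrative only.

    # SQL UDF: a named function written in SQL, callable from any query.
    spark.sql("""
        CREATE OR REPLACE FUNCTION yelling(text STRING)
        RETURNS STRING
        RETURN concat(upper(text), '!!!')
    """)
    spark.sql("SELECT yelling('hello') AS shout").show()

    # Advanced Delta Lake features: inspect the table's transaction log
    # and query an earlier version of the data (time travel).
    spark.sql("DESCRIBE HISTORY orders_clean").show()
    spark.sql("SELECT * FROM orders_clean VERSION AS OF 1").show()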
