Databricks Streaming and Delta Live Tables

Learning Track

Data Engineering

Delivery methods

On-Site, Virtual

Duration

1 day

The Databricks Streaming and Delta Live Tables (SDLT) course prepares students for the Databricks Certified Professional Data Engineer certification exam. The course content consists of the Professional-level modules of the Data Engineer Learning Path and can be delivered as instructor-led training (ILT).

Objectives

  • Describe the computation model used by Spark Structured Streaming
  • Configure required options to perform a streaming read on a source
  • Perform aggregation on a streaming dataset and describe watermarking
  • Ingest raw streaming data into a multiplex bronze table and apply metadata
  • Enforce quality with expectations and quarantine tables
  • Explore and tune state information using streaming joins
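
As a rough illustration of the watermarking objective above, the following plain-Python sketch simulates a tumbling-window count with a watermark over a few micro-batches. It is a simplified toy model, not Spark's implementation and not taken from the course material: the window length, the delay threshold, the function names, and the sample event times are all hypothetical, and event times are plain integer seconds.

```python
from collections import defaultdict

WINDOW_SECONDS = 10  # tumbling window length (hypothetical value)
DELAY_SECONDS = 5    # watermark delay threshold (hypothetical value)


def window_start(event_time):
    """Return the start of the tumbling window that contains event_time."""
    return event_time - (event_time % WINDOW_SECONDS)


def run_stream(batches):
    """Simulate a windowed count with a watermark over micro-batches.

    Each batch is a list of integer event times (seconds).
    Returns (emitted, dropped, open_state):
      emitted    - finalized (window_start, count) pairs, in emission order
      dropped    - event times discarded because their window was already closed
      open_state - windows still held in state when the input ends
    """
    state = defaultdict(int)  # window start -> running count
    watermark = 0
    max_event_time = 0
    emitted = []
    dropped = []

    for batch in batches:
        for event_time in batch:
            max_event_time = max(max_event_time, event_time)
            start = window_start(event_time)
            # A record is late if its window ended at or before the watermark.
            if start + WINDOW_SECONDS <= watermark:
                dropped.append(event_time)
                continue
            state[start] += 1

        # In this toy model the watermark advances only between batches.
        watermark = max(watermark, max_event_time - DELAY_SECONDS)

        # Finalize and evict windows that the watermark has passed.
        for start in sorted(state):
            if start + WINDOW_SECONDS <= watermark:
                emitted.append((start, state.pop(start)))

    return emitted, dropped, dict(state)


if __name__ == "__main__":
    emitted, dropped, open_state = run_stream([[1, 4, 12], [3, 16, 21], [2, 8, 30]])
    print(emitted)     # [(0, 3), (10, 2)]
    print(dropped)     # [2, 8]
    print(open_state)  # {20: 1, 30: 1}
```

In the sample run, event time 3 arrives one batch late but is still counted because the watermark (7) has not yet passed the end of its window, while event times 2 and 8 arrive after the watermark has reached 16 and are dropped. Evicting finalized windows is what keeps the state bounded, which is the trade-off a watermark delay controls.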

Prerequisites

  • Ability to perform basic code development tasks using the Databricks Data Engineering and Data Science workspace (create clusters, run code in notebooks, use basic notebook operations, import repos from Git, etc.)
  • Intermediate programming experience with PySpark

    * Extract data from a variety of file formats and data sources

    * Apply a number of common transformations to clean data

    * Reshape and manipulate complex data using advanced built-in functions

  • Intermediate programming experience with Delta Lake (create tables, perform complete and incremental updates, compact files, restore previous versions, etc.)
  • Beginner experience configuring and scheduling data pipelines using the Delta Live Tables (DLT) UI
  • Beginner experience defining Delta Live Tables pipelines using PySpark

    * Ingest and process data using Auto Loader and PySpark syntax

    * Process Change Data Capture feeds with APPLY CHANGES INTO syntax

    * Review pipeline event logs and results to troubleshoot DLT syntax

Course outline

  • Streaming Data Concepts
  • Introduction to Apache Spark™ Structured Streaming
  • Reading from a Streaming Query
  • Aggregations, Time Windows, Watermarks
  • Windowed Aggregation with Watermark
  • Data Ingestion Patterns
  • Auto Load to Bronze
  • Stream from Multiplex Bronze
  • Quality Enforcement Patterns
  • Quality Enforcement
