Databricks Data Privacy

Learning Track

Data Engineering

Delivery methods

On-Site, Virtual

Duration

1/2 day

This course covers managing data privacy in Databricks. Topics include Delta Lake architecture, regional data isolation, GDPR/CCPA compliance, and Change Data Feed (CDF) usage. Through demos and hands-on labs, participants learn to use Unity Catalog features to secure sensitive data and meet compliance requirements.

Objectives

- Store sensitive data appropriately to simplify granting access and processing deletes.
- Perform data masking and configure fine-grained access control to grant appropriate privileges on sensitive data.
- Process deletes to ensure compliance with the right to be forgotten.

Prerequisites

- Ability to perform basic code development tasks using the Databricks Data Engineering & Data Science workspace (create clusters, run code in notebooks, use basic notebook operations, import repos from Git, etc.)
- Intermediate programming experience with PySpark
  - Extract data from a variety of file formats and data sources
  - Apply a number of common transformations to clean data
  - Reshape and manipulate complex data using advanced built-in functions
- Intermediate programming experience with Delta Lake (create tables, perform complete and incremental updates, compact files, restore previous versions, etc.)
- Beginner experience configuring and scheduling data pipelines using the Delta Live Tables (DLT) UI
- Beginner experience defining Delta Live Tables pipelines using PySpark
  - Ingest and process data using Auto Loader and PySpark syntax
  - Process Change Data Capture feeds with APPLY CHANGES INTO syntax
  - Review pipeline event logs and results to troubleshoot DLT syntax

Course outline

  • Regulatory Compliance
  • Data Privacy
  • Key Concepts and Components
  • Audit Your Data
  • Data Isolation
  • Securing Data in Unity Catalog
  • Pseudonymization & Anonymization
  • Summary & Best Practices
  • PII Data Security
  • Capturing Changed Data
  • Deleting Data in Databricks
  • Processing Records from CDF and Propagating Changes
  • Propagating Changes with CDF Lab
