Databricks Data Privacy
Data Engineering
On-Site, Virtual
1/2 day
This course provides a comprehensive guide to managing data privacy within Databricks. It covers key topics such as Delta Lake architecture, regional data isolation, GDPR/CCPA compliance, and Change Data Feed (CDF) usage. Through practical demos and hands-on labs, participants learn to use Unity Catalog features to secure sensitive data and ensure compliance, equipping them to safeguard data integrity effectively.
Objectives
Prerequisites
- Ability to perform basic code development tasks using the Databricks Data Engineering & Data Science workspace (create clusters, run code in notebooks, use basic notebook operations, import repos from Git, etc.)
- Intermediate programming experience with PySpark, including the ability to:
  - Extract data from a variety of file formats and data sources
  - Apply a number of common transformations to clean data
  - Reshape and manipulate complex data using advanced built-in functions
- Intermediate programming experience with Delta Lake (create tables, perform complete and incremental updates, compact files, restore previous versions, etc.; a refresher sketch follows this list)
- Beginner experience configuring and scheduling data pipelines using the Delta Live Tables (DLT) UI
- Beginner experience defining Delta Live Tables pipelines using PySpark, including the ability to:
  - Ingest and process data using Auto Loader and PySpark syntax (see the Auto Loader sketch after this list)
  - Process Change Data Capture feeds with APPLY CHANGES INTO syntax (see the apply_changes sketch after this list)
  - Review pipeline event logs and results to troubleshoot DLT syntax
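
For the Delta Lake prerequisite, a minimal refresher sketch of file compaction and version restore; the table name and version number are hypothetical, and `spark` is the session predefined in Databricks notebooks:

```python
# Hypothetical Delta Lake maintenance on a table named 'sales'.
spark.sql("OPTIMIZE sales")                                # compact small files
spark.sql("DESCRIBE HISTORY sales").show(truncate=False)   # inspect prior table versions
spark.sql("RESTORE TABLE sales TO VERSION AS OF 12")       # roll back; version 12 is illustrative
```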
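
For the Auto Loader prerequisite, a minimal sketch of an incremental file ingest; the paths, file format, and table name are all hypothetical:

```python
# Hypothetical incremental ingest of JSON files into a bronze table with Auto Loader.
(spark.readStream
    .format("cloudFiles")                                        # Auto Loader source
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/tmp/schemas/orders")  # schema inference/evolution state
    .load("/tmp/raw/orders")
    .writeStream
    .option("checkpointLocation", "/tmp/checkpoints/orders")
    .trigger(availableNow=True)                                  # process pending files, then stop
    .toTable("bronze_orders"))
```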
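
APPLY CHANGES INTO is the DLT SQL form; its Python counterpart is dlt.apply_changes. A minimal sketch, assuming a hypothetical CDC source named orders_cdc, keyed by order_id and carrying an operation flag (this code only runs inside a DLT pipeline):

```python
import dlt
from pyspark.sql.functions import col, expr

# Target streaming table kept in sync with the change feed.
dlt.create_streaming_table("orders_silver")

dlt.apply_changes(
    target="orders_silver",                         # table to keep current
    source="orders_cdc",                            # upstream CDC feed, assumed defined elsewhere in the pipeline
    keys=["order_id"],                              # primary key for row matching
    sequence_by=col("event_ts"),                    # orders late or out-of-order events
    apply_as_deletes=expr("operation = 'DELETE'"),  # rows flagged as deletes
    except_column_list=["operation"],               # drop CDC metadata from the target
)
```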
Course outline
- Regulatory Compliance
- Data Privacy
- Key Concepts and Components
- Audit Your Data
- Data Isolation
- Securing Data in Unity Catalog
- Pseudonymization & Anonymization (a pseudonymization sketch follows this outline)
- Summary & Best Practices
- PII Data Security
- Capturing Changed Data
- Deleting Data in Databricks
- Processing Records from CDF and Propagating Changes (a CDF propagation sketch follows this outline)
- Propagating Changes with CDF Lab
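
As a taste of the Pseudonymization & Anonymization topic, a minimal sketch of one common approach, salted hashing of a PII column; all names are hypothetical, and in practice the salt would come from a secret scope:

```python
from pyspark.sql.functions import sha2, concat, lit, col

# Hypothetical salt; in practice, fetch it via dbutils.secrets.get(scope, key).
salt = "<salt-from-secret-scope>"

users = spark.table("raw_users")                   # hypothetical table with an 'email' PII column
pseudonymized = (users
    .withColumn("email_pseudonym", sha2(concat(lit(salt), col("email")), 256))
    .drop("email"))                                # retain only the keyed hash, not the raw value
pseudonymized.write.mode("overwrite").saveAsTable("users_pseudonymized")
```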
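
For the CDF modules, a sketch of reading a Delta table's Change Data Feed and propagating updates and deletes downstream; the table names, key column, and starting version are hypothetical, and the feed must already be enabled on the source:

```python
from pyspark.sql.functions import col
from delta.tables import DeltaTable

# One-time setting: enable the Change Data Feed on the source table.
spark.sql("ALTER TABLE orders_silver SET TBLPROPERTIES (delta.enableChangeDataFeed = true)")

# Read committed changes since an illustrative starting version.
changes = (spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 5)
    .table("orders_silver")
    .filter(col("_change_type") != "update_preimage"))   # keep inserts, post-images, deletes

# Merge into the downstream table, assuming at most one change per key in this range.
gold = DeltaTable.forName(spark, "orders_gold")
(gold.alias("t")
    .merge(changes.alias("s"), "t.order_id = s.order_id")
    .whenMatchedDelete(condition="s._change_type = 'delete'")      # propagate deletes
    .whenMatchedUpdateAll(condition="s._change_type != 'delete'")
    .whenNotMatchedInsertAll(condition="s._change_type != 'delete'")
    .execute())
```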