
Data Management and Governance with Unity Catalog

Learning Track

Data Engineering

Delivery methods

On-Site, Virtual

Duration

Half day

Provides an introduction to the Databricks Data Intelligence Platform from a Data Engineer perspective.

In this course, you will learn about data governance and management using Databricks Unity Catalog. The course begins with the foundational concepts of data governance, covering the challenges of managing data lakes and the key capabilities of Unity Catalog. It then examines Unity Catalog's architecture, including metastores, schemas, tables, and external storage access. The security and administration topics cover Databricks roles, identity management, and the security model. Advanced topics include fine-grained access control and privilege management, so that learners can implement data governance and security measures in Unity Catalog. Practical demos and labs reinforce the theory.

Objectives

By the end of this course, attendees should be able to:

  • Explain the importance of data governance and challenges in traditional data lake environments.

  • Differentiate between managed and external tables, and evaluate the architecture of Unity Catalog.

  • Utilize SQL commands to navigate and inspect metastore components, and assess data segregation strategies.

  • Identify query lifecycle steps and Databricks roles for effective data governance and security within Unity Catalog.

  • Implement privilege assignments and fine-grained access control strategies using SQL syntax and dynamic views in Databricks.

  • Assess the effectiveness and implications of different privilege scenarios, inheritance models, and access control mechanisms in Unity Catalog.
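
As an illustration of the privilege-assignment objective, the sketch below builds the SQL `GRANT` statements that a principal typically needs before it can query a single table in Unity Catalog's three-level namespace: `USE CATALOG` on the catalog, `USE SCHEMA` on the schema, and `SELECT` on the table. It is not taken from the course materials. The helper name `grant_read_access` and the catalog, schema, table, and group names are hypothetical, and the code only produces statement strings; it does not connect to a workspace.

```python
def quote(name: str) -> str:
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def grant_read_access(catalog: str, schema: str, table: str, principal: str) -> list[str]:
    """Return the GRANT statements needed to read one table.

    Hypothetical helper for illustration: it only builds SQL text.
    """
    cat, sch, tbl, who = (quote(x) for x in (catalog, schema, table, principal))
    return [
        # Access to the catalog level of the namespace.
        f"GRANT USE CATALOG ON CATALOG {cat} TO {who}",
        # Access to the schema level of the namespace.
        f"GRANT USE SCHEMA ON SCHEMA {cat}.{sch} TO {who}",
        # Read access to the table itself.
        f"GRANT SELECT ON TABLE {cat}.{sch}.{tbl} TO {who}",
    ]


# Example with made-up names: a group called "analysts" reading main.sales.orders.
for statement in grant_read_access("main", "sales", "orders", "analysts"):
    print(statement)
```

Each returned string could be run in a SQL editor or notebook by a user who is allowed to grant those privileges. Listing all three levels explicitly makes it easier to see why a `SELECT` grant alone is not enough to query a table.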

Prerequisites

  • Beginner-level familiarity with cloud computing concepts (virtual machines, object storage, etc.).

  • Intermediate experience with SQL, including basic commands, aggregate functions, filters and sorting, indexes, tables, and views.

  • Basic knowledge of Python programming, the Jupyter notebook interface, and PySpark fundamentals.

Course outline

  • Demo: Populating the Metastore
  • Lab: Navigating the Metastore
  • Organization and Access Patterns
  • Demo: Upgrading Tables to Unity Catalog
  • Security and Administration in Unity Catalog
  • Databricks Marketplace Overview
  • Privileges in Unity Catalog
  • Demo: Controlling Access to Data
  • Fine-grained Access Control
  • Lab: Securing Data in Unity Catalog
  • Lakehouse Monitoring
  • Demo: Lakehouse Monitoring
