Microsoft Purview: A System for Central Governance of Data

Author:

Ahmad Shafi¹, Arumugam Dillidorai¹, Bozovic Srdan¹, Degefa Elnata¹, Duvvuri Sailesh¹, Gott Steven¹, Gupta Nitish¹, Hammer Joachim¹, Kaluskar Nivedita¹, Kaushik Raghav¹, Khanduja Rakesh¹, Mujumdar Prasad¹, Malhotra Gaurav¹, Naik Pankaj¹, Ogg Nikolas¹, Parthasarthy Krishna Kumar¹, Ramakrishnan Raghu¹, Rodriguez Vlad¹, Sharma Rahul¹, Szymaszek Jakub¹, Wolter Andreas¹

Affiliation:

1. Microsoft Corporation

Abstract

Modern data estates span data located on premises, on the edge, and in one or more public clouds, across sources such as relational databases, file and storage systems, and NoSQL systems, both operational and analytic; this phenomenon is referred to as data sprawl. Data administrators who wish to enforce compliance across the entire organization have to inventory their data, identify which parts of it are sensitive, and govern the sensitive data appropriately across the entirety of their sprawling data estate. Today, governance of data is completely siloed; each data subsystem has its own (and varied) governance features. Policies on sensitive data are applied piecemeal by iterating over all the data sources, using a custom language specific to each source. This makes data governance cumbersome, error-prone (because a given policy must be manually enforced across different subsystems, inconsistencies can easily arise), and expensive. This paper presents Microsoft Purview, a service for unified governance of the entire data estate of an organization from a single central pane of glass. The Purview service consists of three parts: (1) a Data Map, or metadata catalog, that is populated by automated scanning of data sources in the organization, (2) a system to store and manage sensitivity classification of data, and (3) a policy system that enables data security officers to author and implement policies that span the entire organization, e.g., a policy that says, "Non-full-time employees should be denied access to data classified as PII (Personally Identifiable Information)." Purview transforms data governance across a complex data estate by offering the ability to govern centrally and by automating data discovery, classification, and policy enforcement. While other commercial catalog systems also build a global catalog, Purview is unique in its support for policies.
It is also distinguished by covering both structured and unstructured data, thanks to its deep integration with Office 365 and its governance framework; indeed, "Microsoft Purview" represents a new unified offering that combines Office 365 governance and what was formerly a service for governing structured data called "Azure Purview". By integrating with Office 365's Rights Management Service, Purview offers central governance over structured data stored in databases and stores, reports in systems such as Power BI, and document data stored in Office 365. The Purview vision is to make the metadata in the Data Map progressively richer through further automation and curation support, and to use this 360-degree view of the data estate to support a wide range of governance policies, ranging from access control to lifecycle management (e.g., retention, deletion, restricting data movement). This paper covers the design and implementation challenges in building the Purview service for Attribute-Based Access Control (ABAC) policies, focusing specifically on a detailed description of its integration with Azure SQL Database. We illustrate the power of unifying Office 365 governance with structured data governance through Purview policies that enforce consistent access control even as data flows between Office 365 and structured data engines like Azure SQL Database. We also describe the results of our empirical evaluation of the performance overheads imposed by Purview.
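
The example policy quoted in the abstract can be illustrated with a minimal sketch of an attribute-based deny rule evaluated against a classification catalog. This is not Purview's API or policy language: the names `Principal`, `DenyPolicy`, `DATA_MAP`, `is_access_allowed`, the `employment_type` attribute, and the asset paths are all hypothetical, and allowing access when no deny rule matches is a simplifying assumption made only for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Principal:
    """A user plus the attributes a policy may inspect (hypothetical model)."""
    name: str
    attributes: Dict[str, str]


@dataclass
class DenyPolicy:
    """Deny access to assets carrying `classification` when `applies_to` holds."""
    description: str
    classification: str
    applies_to: Callable[[Principal], bool]


# Hypothetical stand-in for a metadata catalog: asset path -> classifications.
DATA_MAP: Dict[str, Set[str]] = {
    "sqldb.sales.customers.email": {"PII"},
    "sqldb.sales.orders.total": set(),
}

# The policy from the abstract, written once and independent of any data source.
POLICIES: List[DenyPolicy] = [
    DenyPolicy(
        description="Non-full-time employees are denied access to PII",
        classification="PII",
        applies_to=lambda p: p.attributes.get("employment_type") != "full_time",
    )
]


def is_access_allowed(principal: Principal, asset: str) -> bool:
    """Return False if any deny policy matches the asset's classifications."""
    labels = DATA_MAP.get(asset, set())
    for policy in POLICIES:
        if policy.classification in labels and policy.applies_to(principal):
            return False
    # Assumption for this sketch only: no matching deny rule means allow.
    return True


contractor = Principal("casey", {"employment_type": "contractor"})
employee = Principal("erin", {"employment_type": "full_time"})
print(is_access_allowed(contractor, "sqldb.sales.customers.email"))
print(is_access_allowed(employee, "sqldb.sales.customers.email"))
```

The point of the sketch is that the rule refers only to a classification label and a user attribute, so the same rule covers every asset the catalog marks as PII, rather than being restated per data source.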

Publisher

Association for Computing Machinery (ACM)

Subject

General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development

