This chapter discussed data‐related security concepts. The number one takeaway is that although you now know much about data security and some security design and implementation principles, you should still engage with a security professional before, during, and after the design and implementation of your data security solution. Before beginning any data security design, you must perform an audit to find out what data sources you have and what data exists within them. The best tool for this is Microsoft Purview. Once you know what you have, you should classify the data by type and tag it using sensitivity labels. Both encryption‐at‐rest and encryption‐in‐transit are required aspects of data protection. Encryption‐at‐rest is enabled by default for most Azure data products, whereas encryption‐in‐transit requires the use of TLS when the data is transmitted over HTTP. Row‐level security, column‐level security, and data masking are useful methods for displaying data to a consumer in a privacy‐compliant manner. Some rows may be viewable by their owner but not by other users, and certain columns in a row, like an employee’s salary, may have a similar viewability constraint, whereas columns like an email address, government ID, or credit card number can be masked, exposing only a few characters of the value.
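The masking idea recapped above can be sketched in a few lines of plain Python. This is only an illustration of the concept, not the masking functions built into any Azure product; the helper names `mask_value` and `mask_email`, the `X` mask character, and the sample record are all assumptions made for the example.

```python
def mask_value(value: str, visible: int = 4, mask_char: str = "X") -> str:
    """Hide every character except the last `visible` ones (hypothetical helper)."""
    # If the value is too short to partially reveal, mask it completely.
    if visible <= 0 or len(value) <= visible:
        return mask_char * len(value)
    return mask_char * (len(value) - visible) + value[-visible:]


def mask_email(email: str, mask_char: str = "X") -> str:
    """Keep the first character and the domain, mask the rest (hypothetical helper)."""
    local, sep, domain = email.partition("@")
    # Fall back to full masking if the value does not look like an email address.
    if not sep or not local:
        return mask_char * len(email)
    return local[0] + mask_char * (len(local) - 1) + "@" + domain


# Made-up sample record, used only to show the effect of masking.
record = {"name": "Alex", "email": "alex@example.com", "card": "4111111111111111"}
masked = {
    **record,
    "email": mask_email(record["email"]),
    "card": mask_value(record["card"]),
}
print(masked)
```

The consumer still sees enough of each value to recognize it (the last four digits of the card, the first letter and domain of the email) without being able to read the full value.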
This chapter also introduced Azure Key Vault and managed identities. You know that password leakage is a risk and that using Azure Key Vault and managed identities together is a way to help prevent it. Creating a group in Azure Active Directory and assigning permissions to it is the recommended approach for granting permissions to users. Because implementing security is complex, applying permissions to a group means the configuration needs to happen only once: you add anyone who needs those permissions to the group instead of granting the permissions to each user individually. Granting the group access to an Azure product is achieved using RBAC roles. An additional method for data protection is to use ACLs in an Azure data lake. VNets, NSGs, private endpoints, and firewalls are networking features that can be used to protect data at the network layer. The chapter ended with some examples of masking and encrypting data in Parquet files using Python running on an Apache Spark pool.
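The reasoning behind group-based permissions can be made concrete with a toy model. The sketch below is not the Azure RBAC API; the `AccessModel` class, the group name, the user names, and the scope string are hypothetical, and the role name is used only as a label. It shows that the role is assigned to the group once and that every member then inherits it.

```python
from collections import defaultdict


class AccessModel:
    """Toy model of group-based role assignment (hypothetical, not an Azure API)."""

    def __init__(self) -> None:
        self.members = defaultdict(set)           # group name -> user names
        self.role_assignments = defaultdict(set)  # group name -> {(role, scope)}

    def assign_role(self, group: str, role: str, scope: str) -> None:
        # The permission is granted once, to the group.
        self.role_assignments[group].add((role, scope))

    def add_member(self, group: str, user: str) -> None:
        # New users only need to be added to the group.
        self.members[group].add(user)

    def roles_for(self, user: str) -> set:
        # A user's effective roles are the union of the roles of their groups.
        roles = set()
        for group, users in self.members.items():
            if user in users:
                roles |= self.role_assignments.get(group, set())
        return roles


model = AccessModel()
model.assign_role("data-readers", "Storage Blob Data Reader", "/example/scope")
model.add_member("data-readers", "alex")
model.add_member("data-readers", "sam")
print(model.roles_for("alex") == model.roles_for("sam"))
```

Removing a user from the group revokes the access in the same single step, which is what keeps this approach manageable as the number of users grows.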