Data Masking
The Data Masking solution transforms sensitive data contained in databases into less sensitive data according to your own specification and needs.
The solution allows companies to use this data while reducing data breach impact. It offers a response to security and regulatory issues (e.g. GDPR).
A wide range of masking techniques and pseudonymisation parameters can be applied while keeping the data consistent.
The solution is adaptable and covers a wide variety of database types, such as Oracle, DB2, SQL Server, MySQL, MongoDB, Sybase and Teradata.
To benefit from the service, please contact your TAM/SDM, who will assess the complexity of the database to pseudonymise and provide the related quotation.
Data Masking Service is built on top of TDM (Test Data Management).
A dictionary is a flat file or relational table that contains substitute data and a serial number. Dictionaries can be used to replace sensitive data in a table.
Various dictionaries already exist in TDM (first names, last names, country names, job positions, etc.). Dictionaries can be used in masking rules to create substitution rules.
Custom dictionaries can be imported or created:
If the results need to contain specific data. Example: for first names, only a subset of the name list; for countries, only EU countries, etc.
If the existing dictionaries do not fit a project's specifications.
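To make the dictionary idea concrete, here is a minimal Python sketch, assuming a dictionary stored as a flat file with one serial number and one substitute value per row. The file layout, the column names (`serial`, `value`) and the helper functions are hypothetical and do not reflect the TDM file format. The sketch loads the flat file and then derives a custom dictionary restricted to EU countries.

```python
import csv
import io

# Hypothetical flat-file dictionary: one serial number and one substitute value per row.
COUNTRY_FILE = """serial,value
1,France
2,Germany
3,Japan
4,Spain
5,Brazil
"""

def load_dictionary(text):
    """Parse the flat file into {serial number: substitute value}."""
    reader = csv.DictReader(io.StringIO(text))
    return {int(row["serial"]): row["value"] for row in reader}

def restrict(dictionary, allowed):
    """Build a custom dictionary from a subset of values, renumbering serials from 1."""
    kept = [value for _, value in sorted(dictionary.items()) if value in allowed]
    return dict(enumerate(kept, start=1))

countries = load_dictionary(COUNTRY_FILE)
eu_countries = restrict(countries, {"France", "Germany", "Spain"})
print(eu_countries)  # {1: 'France', 2: 'Germany', 3: 'Spain'}
```

Renumbering the serial numbers keeps them contiguous, which makes it easy to pick an entry by index later on.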
Masking rules are tools used to mask data. Different types of masking rules can be defined and many masking rules exist by default.
Examples of masking rule properties:
Repeatable output: returns the same (deterministic) value whenever the source value and the seed are the same. The result depends on the seed, which can be changed at will.
Unique substitution data: returns a unique value for each unique entry and seed.
Dictionary: dictionary used in the rule
Masked value: determines the value that will be masked in the database (and the column to which the rule is applied)
Lookup Column: for more advanced rules, defines a lookup condition that can be used to trigger conditional results
Serial Number column: ID used by the application (transparent to the user)
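The repeatable output and unique substitution properties can be illustrated with a short Python sketch. This is only one possible way to obtain such behaviour, assuming a keyed hash (HMAC) for repeatability and a seeded shuffle for uniqueness; the function names and the sample dictionary are hypothetical, and the sketch does not describe how TDM implements these properties.

```python
import hashlib
import hmac
import random

# Hypothetical substitution dictionary.
NAMES = ["Alice", "Bruno", "Chloe", "David", "Emma", "Farid"]

def repeatable(value, seed, dictionary):
    """Same source value and same seed always select the same substitute."""
    digest = hmac.new(seed.encode(), value.encode(), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(dictionary)
    return dictionary[index]

def unique_substitution(values, seed, dictionary):
    """Give each distinct source value its own substitute; no two share one."""
    distinct = sorted(set(values))
    if len(distinct) > len(dictionary):
        raise ValueError("dictionary too small for unique substitution")
    shuffled = list(dictionary)
    random.Random(seed).shuffle(shuffled)  # the seed fixes the assignment order
    return dict(zip(distinct, shuffled))

masked = repeatable("John", "seed-1", NAMES)
mapping = unique_substitution(["John", "Mary", "John", "Paul"], "seed-1", NAMES)
```

Note that a repeatable rule alone may map two different source values to the same substitute, which is why uniqueness is shown as a separate property that needs a dictionary at least as large as the set of distinct source values.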
Advanced custom masking rules can be created if needed.
Examples:
Masking rule based on several entries. Example: concatenation of first name and last name.
Advanced masking rule combining several masking rules. Example: masking a field using different dictionaries depending on the content of another field in the database.
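The two examples above can be sketched in Python as follows. This is an illustration under stated assumptions, not TDM rule syntax: the dictionaries, the column names (`country`, `city`, `first_name`, `last_name`) and the helper `pick` are all hypothetical.

```python
import hashlib

# Hypothetical dictionaries, selected by the value of a lookup column (country).
CITIES_BY_COUNTRY = {
    "FR": ["Lyon", "Nantes", "Lille"],
    "DE": ["Bonn", "Kiel", "Jena"],
}
DEFAULT_CITIES = ["Springfield", "Riverton"]
FIRST_NAMES = ["Alice", "Bruno", "Chloe"]
LAST_NAMES = ["Martin", "Keller", "Rossi"]

def pick(value, dictionary):
    """Deterministically select a substitute for a source value."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return dictionary[int.from_bytes(digest[:8], "big") % len(dictionary)]

def mask_city(row):
    """Choose the dictionary from another column, then substitute the city."""
    dictionary = CITIES_BY_COUNTRY.get(row["country"], DEFAULT_CITIES)
    return pick(row["city"], dictionary)

def mask_full_name(row):
    """Rule based on several entries: mask each part, then concatenate."""
    first = pick(row["first_name"], FIRST_NAMES)
    last = pick(row["last_name"], LAST_NAMES)
    return f"{first} {last}"

row = {"first_name": "John", "last_name": "Smith", "country": "FR", "city": "Paris"}
masked_row = {**row, "city": mask_city(row), "full_name": mask_full_name(row)}
```

Rows whose country has no dedicated dictionary fall back to `DEFAULT_CITIES`, so the conditional rule always produces a value.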
Below is a non-exhaustive list of masking techniques:
Substitution: replaces a column of data with similar data from a dictionary
Randomisation: produces random results for the same source data and masking rules
Blurring: returns a random value that is close to the original value
Key: produces deterministic results for the same source data, masking rule and seed value
Expression: applies an expression to the data and returns the masked or changed data. Example: concatenation
Nullification: replaces a column of data with null values
Encryption: transforms data into unintelligible data using a cryptographic algorithm and a defined key
Credit Card: applies a built-in technique to disguise credit card numbers
IP Address: applies a built-in technique to disguise IP addresses
Phone: applies a built-in technique to disguise phone numbers
Email: applies a built-in technique to disguise email addresses
Shuffle: randomly reassigns sensitive column values from one row to another row in the same table
Advanced: applies a customisable masking technique to multiple input and output columns
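Three of the techniques listed above (blurring, nullification and shuffle) are simple enough to sketch in a few lines of Python. The functions below are generic illustrations written for this page, assuming rows are represented as dictionaries; they are not the TDM implementations, and the names `blur`, `nullify` and `shuffle_column` are hypothetical.

```python
import random

def blur(value, spread, rng):
    """Blurring: return a random value within +/- spread of the original."""
    return value + rng.uniform(-spread, spread)

def nullify(rows, column):
    """Nullification: replace every value in the column with None."""
    return [{**row, column: None} for row in rows]

def shuffle_column(rows, column, rng):
    """Shuffle: redistribute the column's values across rows of the same table."""
    values = [row[column] for row in rows]
    rng.shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]

rng = random.Random(42)  # fixed seed so the sketch is reproducible
rows = [
    {"id": 1, "salary": 3000, "email": "a@corp.test"},
    {"id": 2, "salary": 4200, "email": "b@corp.test"},
    {"id": 3, "salary": 5100, "email": "c@corp.test"},
]
blurred = [{**row, "salary": blur(row["salary"], 200, rng)} for row in rows]
shuffled = shuffle_column(rows, "salary", rng)
nulled = nullify(rows, "email")
```

Shuffling keeps the overall distribution of the column intact (the same set of values remains in the table), whereas blurring changes each value but keeps it close to the original.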
A project is an entity that defines the link between masking rules and databases.
Project connections: each project is associated with one database (for in-place masking) or several databases (for in-stream masking) through connections (e.g. an Oracle JDBC string).
Rule mapping: once created, rules can be mapped to database entries in "Projects". A masking rule is added for each column that needs to be masked.
Once all mapping attributes have been entered according to the client's needs, the project can be launched.
Job execution: each project can be executed to perform the masking. The masking operation is called a "job". During a job, the TDM application masks the database "on the fly". The database is not stored locally on the worker node, nor is it retained in the application.
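As a rough mental model, a project can be thought of as a set of connections plus a mapping from columns to masking rules. The Python sketch below expresses that model; the `Project` class, its field names, the rule names and the connection strings are hypothetical and are not the TDM data model or API.

```python
from dataclasses import dataclass, field

@dataclass
class Project:
    """Hypothetical model of a project: connections plus column-to-rule mapping."""
    name: str
    mode: str                      # "in-place" or "in-stream"
    connections: list              # e.g. JDBC connection strings
    rule_mapping: dict = field(default_factory=dict)  # (table, column) -> rule name

    def map_rule(self, table, column, rule):
        """Attach one masking rule to one column that needs to be masked."""
        self.rule_mapping[(table, column)] = rule

    def validate(self):
        """Check the project is complete enough to be launched."""
        if self.mode not in ("in-place", "in-stream"):
            raise ValueError("unknown masking mode: " + self.mode)
        if self.mode == "in-place" and len(self.connections) != 1:
            raise ValueError("in-place masking uses exactly one connection")
        if self.mode == "in-stream" and len(self.connections) < 2:
            raise ValueError("in-stream masking needs a source and a destination")
        if not self.rule_mapping:
            raise ValueError("map at least one column to a masking rule")

project = Project(
    name="crm-masking",
    mode="in-stream",
    connections=[
        "jdbc:oracle:thin:@//source.example:1521/SRC",  # placeholder source
        "jdbc:oracle:thin:@//dest.example:1521/DST",    # placeholder destination
    ],
)
project.map_rule("CUSTOMERS", "FIRST_NAME", "first_name_substitution")
project.map_rule("CUSTOMERS", "EMAIL", "email_masking")
project.validate()  # raises ValueError if the project is not ready
```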
A node is the virtual machine on which a job is performed. One node is created for each client. The performance of the node depends on the complexity of the masking job.
Different types of masking can be performed depending on the databases used:
In-stream: this is the most commonly used masking type. It involves two databases: a "source" database containing the data you want to mask, and a destination database, which must be empty. Masking is performed "on the fly" while data is copied from the source to the destination.
In-place: involves a single database. Masking is performed directly on that database.
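The difference between the two masking types can be sketched with two in-memory SQLite databases. This is a toy illustration only: the `customers` table, the `mask_email` rule and the helper functions are hypothetical, and the sketch says nothing about how TDM connects to or processes real databases.

```python
import sqlite3

def mask_email(email):
    """Toy masking rule: hide the local part, keep the domain."""
    _, _, domain = email.partition("@")
    return "masked@" + domain

def new_db(rows=()):
    """Create an in-memory database with a customers table and optional rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
    db.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    db.commit()
    return db

def in_stream(source, destination):
    """Copy rows from source to an empty destination, masking each row in transit."""
    count = destination.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
    if count:
        raise ValueError("destination database must be empty")
    for row_id, email in source.execute("SELECT id, email FROM customers"):
        destination.execute(
            "INSERT INTO customers (id, email) VALUES (?, ?)",
            (row_id, mask_email(email)),
        )
    destination.commit()

def in_place(database):
    """Overwrite the sensitive column directly in the single database."""
    rows = database.execute("SELECT id, email FROM customers").fetchall()
    database.executemany(
        "UPDATE customers SET email = ? WHERE id = ?",
        [(mask_email(email), row_id) for row_id, email in rows],
    )
    database.commit()

source = new_db([(1, "ann@corp.test"), (2, "bob@corp.test")])
destination = new_db()
in_stream(source, destination)  # source is left untouched
```

With in-stream masking the original data remains available in the source, whereas in-place masking replaces it, so the in-place variant is typically run on a copy that is meant to be masked.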