Timothy Smith
Data Masking or Altering Behavioral Information

June 26, 2020 by Timothy Smith

As tracking behavioral data becomes increasingly popular, firms may overlook cases where they can collect the same information while masking details that could be used in a compromise. Behavioral data collection can be extremely dangerous because it enables a wide range of attacks, from spoofing targets to automating custom attacks against them. Since behavior can reveal key details about us, this information may be as costly to lose as personally identifiable information. When tracking behavioral data, we want to weigh the risks, and in some cases we can accomplish the same result without specific details. In other cases, we may want to mask specific behavioral information on generated reports, even if we retain the specific time. We’ll look at a method that accomplishes either: updating data to remove the time, or data masking the specific time while still returning the information we want.

An example involving Behavioral Consistency

One popular metric that involves behavioral data is consistency: the number of times a person performs a behavior over a period. Attackers value behavioral data because knowing when a person does something is useful when preparing an attack. For instance, in a sim-swapping attack, knowing when a person isn’t on their phone helps the attacker proceed before the attack can be stopped. The same applies to attacking a bank account while a customer is on vacation. Behavioral data involving time is often tracked by the specific time of the activity and the length of the activity during the day.

For our example, we’ll only look at a scenario where we track the behavior by day and time of day, and at how we can use data masking or altering to get the same result without tracking specific times that may help attackers. We’ll start by creating a table with 11 records, each holding a random time on one of 11 sequential days, to mimic an 11-day behavioral streak of a user. The result shows 11 days in a row, each with a different time at which the user did an activity. Because we use the RAND() function to create these times, your time values will differ.
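
The original script for this step is not reproduced here, so the following is only a minimal T-SQL sketch of how such a table could be built. The table name dbo.tbBehaviorStreak, the column names, and the single UserId value are assumptions made for illustration.

```sql
-- Hypothetical table and column names (not taken from the original article).
CREATE TABLE dbo.tbBehaviorStreak (
    UserId INT NOT NULL,
    ActivityDate DATETIME NOT NULL
);

-- Midnight of the day 10 days ago, so the 11-day streak ends today.
DECLARE @Start DATETIME = DATEADD(DAY, -10, CAST(CAST(GETDATE() AS DATE) AS DATETIME));
DECLARE @Day INT = 0;

WHILE @Day < 11
BEGIN
    -- RAND() returns a float between 0 and 1, so the offset is a
    -- random number of seconds within a single day (0 to 86399).
    INSERT INTO dbo.tbBehaviorStreak (UserId, ActivityDate)
    VALUES (1, DATEADD(SECOND, CAST(RAND() * 86400 AS INT), DATEADD(DAY, @Day, @Start)));

    SET @Day = @Day + 1;
END;

-- One row per day, each with a different time of day.
SELECT UserId, ActivityDate
FROM dbo.tbBehaviorStreak
ORDER BY ActivityDate;
```

A loop is used here only because RAND() is evaluated once per statement; a set-based insert would need a different source of per-row randomness.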

Our example data set we’ll be altering and masking

What we see in our example is unmasked data of the time a user completed a task for a specific day. This, along with other behavioral data, would uncover the activities of a user, which could be useful to an attacker.

Before data masking in this example, we should ask, “What are we trying to accomplish by tracking this behavior?” In this example, we may want to identify the number of days on which a user has completed a task and track the days of the activity over time. Or we may want to know how many times the user has done a task over the past month. Unless we have other uses for more detailed information (while factoring in risks), we can accomplish the same result without tracking as many details. This follows the least data principle for risk scenarios: in situations where we may be liable for data, we should track the least amount of data needed to accomplish the task. In addition to saving us resources, this reduces our risk of being liable for data exposure if an attack exposes information.
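
As a rough illustration of why the time of day is not needed for these questions, the sketch below counts a user’s active days over the past month from date-only values. The table variable, its columns, the sample rows, and the fixed @AsOf date are all assumptions made for the example.

```sql
-- Hypothetical date-only activity data: no time of day is stored.
DECLARE @Activity TABLE (
    UserId INT NOT NULL,
    ActivityDay DATE NOT NULL
);

INSERT INTO @Activity (UserId, ActivityDay)
VALUES (1, '2020-06-01'),
       (1, '2020-06-02'),
       (1, '2020-06-03'),
       (1, '2020-06-10');

-- A fixed "as of" date keeps the example repeatable;
-- a real report would more likely use the current date.
DECLARE @AsOf DATE = '2020-06-26';

-- Number of distinct days with activity in the month before @AsOf.
SELECT UserId,
       COUNT(DISTINCT ActivityDay) AS DaysActive
FROM @Activity
WHERE ActivityDay >= DATEADD(MONTH, -1, @AsOf)
  AND ActivityDay <= @AsOf
GROUP BY UserId;
```

With these sample rows, all four days fall inside the window, so the query reports four active days for user 1 without any time value being stored.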

To mask detailed information while returning the information we want, we can format our date without the time, reset the time of day to midnight, or track only the month of the latest activity. The query below shows three ways to mask a date into alternative values that accomplish the same task:
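
The query itself is not reproduced in this text, so here is one possible T-SQL sketch of the three alternatives. The table variable, the sample rows, and the column aliases SimpleDate, ComplexDate, and MonthOnly are assumptions chosen to match the descriptions that follow.

```sql
-- Hypothetical sample data standing in for the 11-day streak table.
DECLARE @Streak TABLE (ActivityDate DATETIME NOT NULL);

INSERT INTO @Streak (ActivityDate)
VALUES ('2020-06-24T09:41:07'),
       ('2020-06-25T18:02:33'),
       ('2020-06-26T06:15:49');

SELECT
    -- 1. Simple date: convert to DATE, which drops the time entirely.
    CAST(ActivityDate AS DATE) AS SimpleDate,
    -- 2. Complex date: keep DATETIME but reset the time to midnight by
    --    counting whole days since day 0 and adding them back to day 0.
    DATEADD(DAY, DATEDIFF(DAY, 0, ActivityDate), 0) AS ComplexDate,
    -- 3. Month only: keep just the month number of the activity.
    MONTH(ActivityDate) AS MonthOnly
FROM @Streak
ORDER BY ActivityDate;
```

The second form is useful when a downstream column or report still expects a DATETIME value; the first is simpler when a DATE is acceptable.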

Three example alternatives to tracking behavior by data masking the specific time

Depending on what we find most appropriate, we would choose the solution that accomplishes the same task while masking a user’s behavior information involving specific times of the day.

  • The simple date tells us the streak and the latest date without the specific time
  • The complex date tells us the streak and the latest date, with the time set to its earliest possible value (midnight)
  • The month only tells us the latest month of activity

From these example outputs, we could return these values in a report to hide the specific time (masking), or we could update the values and remove the specific time (altering).
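
The difference between the two options can be sketched as follows, under the assumption of a temporary table named #BehaviorStreak with a few sample rows (both hypothetical). Masking leaves the stored value untouched and hides the time only in the output, while altering overwrites the stored value.

```sql
-- Hypothetical temporary table standing in for the streak table.
CREATE TABLE #BehaviorStreak (
    UserId INT NOT NULL,
    ActivityDate DATETIME NOT NULL
);

INSERT INTO #BehaviorStreak (UserId, ActivityDate)
VALUES (1, '2020-06-24T09:41:07'),
       (1, '2020-06-25T18:02:33'),
       (1, '2020-06-26T06:15:49');

-- Masking: the report shows only the day; the time is still stored.
SELECT UserId,
       CAST(ActivityDate AS DATE) AS ActivityDay
FROM #BehaviorStreak
ORDER BY ActivityDay;

-- Altering: reset the stored time to midnight, so the original
-- time of day is no longer retained anywhere in the table.
UPDATE #BehaviorStreak
SET ActivityDate = DATEADD(DAY, DATEDIFF(DAY, 0, ActivityDate), 0);

SELECT UserId, ActivityDate
FROM #BehaviorStreak
ORDER BY ActivityDate;

DROP TABLE #BehaviorStreak;
```

Altering cannot be undone from the table itself, so it fits cases where the specific time has no remaining use; masking fits cases where the time must be retained but should not appear on reports.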

Tracking Specific Data by Feature

Before we solve for data masking or altering of behavioral data, consider that if users want detailed information in our software, we can offer it as optional features that they add on top of the standard ones. Because some of these features carry risk (like the specific times in our example), we can both caution users and charge for them; a compromise may result in litigation for our firm, and the charge helps prepare for that cost. Unfortunately, some users may not be aware of the risks of behavioral data. Cautioning them before they add a paid feature would alert them to these risks. In general, a good software principle for optional data features is: don’t create data features that users haven’t requested and that may add risk if compromised.
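
One way this could look in a schema is sketched below: a per-user feature table whose flags default to off, so nothing behavioral is recorded unless the user has opted in. The table names, column names, and sample users are hypothetical and are not taken from the original article.

```sql
-- Hypothetical opt-in flags; both default to 0 (no tracking).
CREATE TABLE #UserFeature (
    UserId INT NOT NULL PRIMARY KEY,
    TrackActivityDay BIT NOT NULL DEFAULT (0),
    TrackActivityTime BIT NOT NULL DEFAULT (0)
);

CREATE TABLE #BehaviorStreak (
    UserId INT NOT NULL,
    ActivityDate DATETIME NOT NULL
);

INSERT INTO #UserFeature (UserId) VALUES (1);                         -- defaults: nothing tracked
INSERT INTO #UserFeature (UserId, TrackActivityDay) VALUES (2, 1);    -- day only
INSERT INTO #UserFeature (UserId, TrackActivityDay, TrackActivityTime)
VALUES (3, 1, 1);                                                     -- day and time

DECLARE @Now DATETIME = GETDATE();

-- Record an activity for every user, honoring each user's flags:
-- no row unless day tracking is on, and midnight unless time tracking is on.
INSERT INTO #BehaviorStreak (UserId, ActivityDate)
SELECT f.UserId,
       CASE WHEN f.TrackActivityTime = 1 THEN @Now
            ELSE DATEADD(DAY, DATEDIFF(DAY, 0, @Now), 0)
       END
FROM #UserFeature AS f
WHERE f.TrackActivityDay = 1;

-- User 1 has no row, user 2 has a midnight value, user 3 has the full time.
SELECT UserId, ActivityDate
FROM #BehaviorStreak
ORDER BY UserId;

DROP TABLE #BehaviorStreak;
DROP TABLE #UserFeature;
```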

Example of using optional features where the default doesn’t track any behavioral data.

We should also consider that sometimes detailed information may not be required even for the user. In these situations, we can avoid data masking altogether because we don’t need to store the data. Consider an example with orders where an email confirms an order: if the user needs the specific day and time of the order, the email confirmation provides it outside our system, without us keeping the specific time in our database.

Summary

The principle of tracking as little behavioral data as required also applies to personally identifiable data. For example, never ask for information that is not required and would be costly if compromised. In most cases, our application may need very little information from users. The more we ask for, the more we may be responsible for in the long run as data breaches increase. When we do store information, we should use data masking techniques that accomplish the same task with the least amount of data.

About Timothy Smith

Tim manages hundreds of SQL Server and MongoDB instances, and focuses primarily on designing the appropriate architecture for the business model. He has spent a decade working in FinTech, along with a few years in BioTech and Energy Tech. He hosts the West Texas SQL Server Users' Group, as well as teaches courses and writes articles on SQL Server, ETL, and PowerShell. In his free time, he is a contributor to the decentralized financial industry.
