AdTech Explained: Data Leakage


Ah, data. The lifeblood pumping through digital advertising. Data leakage? A potential hemorrhage, causing data value depreciation and brand safety risks to creep up on publishers and marketers, until... beeeeeeeeeeeeeeeeep. 

Definition: Data Leakage

According to DigiDay:
"Data leakage typically occurs when an ... ad tech company collects data about a website’s audience and subsequently uses that data without the initial publisher’s permission." 

The piggybacking problem 
While data leakage can happen in several ways, the most common is piggybacking (see the full definition of piggybacking). When a user lands on a website, multiple tech tags fire, targeting the user and loading ad content. These tags then load additional tags: the piggybacking tech.

As of October 2018, the average advertising-driven website loads 62 separate advertising and marketing technologies on its home page. In turn, those 62 technologies load other technologies (on average, an additional 55 others{1}). These additional technologies are called "piggybackers" because each one loads on the back of another technology. These tag loads may or may not be part of a contract between publisher and technology provider, and are often unknown to the site administrators. While this is a rampant practice within programmatic advertising, it is not limited to programmatic: any time a first-party tech tag is loaded (e.g., via a direct ad buy), beware, because piggybacking may be present.
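To make the distinction between directly loaded tags and piggybackers concrete, here is a minimal sketch in Python. The tag names and the `TAG_LOADS` list are hypothetical illustrations, not StackFinder data; the only assumption is that you have some record of which tag loaded which.

```python
from collections import defaultdict

# Hypothetical observations from one page view: (loader, loaded_tag).
# "page" stands for tags the publisher placed directly on the site.
TAG_LOADS = [
    ("page", "ad-server.example"),
    ("page", "analytics.example"),
    ("ad-server.example", "dmp.example"),
    ("ad-server.example", "retargeter.example"),
    ("dmp.example", "data-broker.example"),
]


def split_direct_and_piggybacked(tag_loads, root="page"):
    """Return (direct, piggybacked) sets of tags reachable from the page."""
    children = defaultdict(list)
    for loader, loaded in tag_loads:
        children[loader].append(loaded)

    direct = set(children.get(root, []))
    piggybacked = set()
    seen = set()
    to_visit = list(direct)
    while to_visit:
        tag = to_visit.pop()
        if tag in seen:
            continue  # guard against tags that load each other in a loop
        seen.add(tag)
        for child in children.get(tag, []):
            if child not in direct:
                piggybacked.add(child)  # loaded on the back of another tag
            to_visit.append(child)
    return direct, piggybacked


direct, piggybacked = split_direct_and_piggybacked(TAG_LOADS)
print(f"{len(direct)} direct tags, {len(piggybacked)} piggybackers")
print(sorted(piggybacked))
```

In this toy example the publisher placed two tags, yet three more technologies end up on the page, one of them two hops away from anything the publisher installed.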

Why all the Leaking?
There are many reasons for piggybacking, many of which are both beneficial to, and requested by, publishers (passing data to help fulfill a bid request, for instance). On the other hand, there are also more nefarious tactics in play (passing along data to create behavioral profiles that allow a buyer to target that same user on different, less expensive websites). Uncool, yes. Clever? Absolutely.

How common is it?
The countless publishers we have spoken with are almost always surprised by the breadth and scope of leakage activity on their domains. Those familiar with AdTech are aware that piggybacking occurs; they are simply shocked by the number of companies with access to their data.

As an example, in reviewing Industry Index Top 5,000 Publishers data, we found that 2,394 publishers work directly with Krux (a Data Management Platform now owned by Salesforce). In 76% of these relationships, Krux shares data with no other party, but in the remaining 24% (582 publishers), Krux leaks data to an average of 18.4 other vendors. (BTW, not picking on Krux specifically... We found one DMP that leaks data to an average of 82(!) other companies.){2}
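The figures above can be sanity-checked with a few lines of arithmetic. This is only a sketch: the three inputs come from the paragraph above, while the total number of publisher-to-vendor data flows is a derived estimate (average times count), not a figure reported in the source data.

```python
# Inputs quoted in the paragraph above.
publishers_with_krux = 2394
publishers_with_leakage = 582
avg_vendors_per_leaking_publisher = 18.4

leak_share = publishers_with_leakage / publishers_with_krux
no_leak_share = 1 - leak_share

# Derived estimate, not a reported figure: average vendors x publishers.
estimated_data_flows = round(
    publishers_with_leakage * avg_vendors_per_leaking_publisher
)

print(f"Leaking relationships:     {leak_share:.1%}")
print(f"Non-leaking relationships: {no_leak_share:.1%}")
print(f"Estimated publisher-to-vendor data flows: {estimated_data_flows:,}")
```

The shares come out at roughly 24% and 76%, matching the rounded percentages quoted above.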

What type of data is leaked?
In an illuminating article about this type of data leakage, Brian O'Kelley, CEO of AppNexus, explains:

"Every one of these third-party vendors gets information about the user. Here's a subset of what they can find out when the browser makes the third-party call:

  • The URL of the page
  • The referring URL (where the user came from before this)
  • The user's IP address
  • The user's browser information (user agent)
  • The latitude and longitude of the user
  • The user's cookie ID
  • The user's gender and year of birth
  • Publisher-specific user data

What's scary about this is that we have no idea who this data is being sent to. It could be good actors who are only using this data in order to decide whether to bid on this request. What's stopping the bidders from storing this data and creating behavioral profiles?"

More info on the above can be found in our post on cookie syncing.
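To illustrate how much of the list above can ride along on a single third-party call, here is a rough sketch. The request, the vendor domain, and every parameter name (`uid`, `ref`, `lat`, `lon`, `gender`, `yob`, `seg`) are invented for illustration; real vendors use their own formats, and which fields are actually sent depends on the integration.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical third-party call as the vendor's server would see it.
REQUEST = {
    "client_ip": "203.0.113.7",  # visible from the network connection itself
    "url": (
        "https://sync.vendor.example/pixel"
        "?uid=abc123&ref=https%3A%2F%2Fsearch.example%2F"
        "&lat=40.71&lon=-74.01&gender=f&yob=1985&seg=sports-fan"
    ),
    "headers": {
        "Referer": "https://publisher.example/sports/article-42",
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
    },
}


def exposed_fields(request):
    """Collect what a vendor could read from one request (illustrative)."""
    query = parse_qs(urlsplit(request["url"]).query)
    params = {name: values[0] for name, values in query.items()}
    headers = request["headers"]
    return {
        "page_url": headers.get("Referer"),
        "referring_url": params.get("ref"),
        "ip_address": request["client_ip"],
        "user_agent": headers.get("User-Agent"),
        "lat_lon": (params.get("lat"), params.get("lon")),
        "cookie_id": params.get("uid"),
        "gender": params.get("gender"),
        "year_of_birth": params.get("yob"),
        "publisher_data": params.get("seg"),
    }


exposed = exposed_fields(REQUEST)
for name, value in exposed.items():
    print(f"{name}: {value}")
```

Note that the IP address, user agent, and page URL arrive with ordinary browser requests without anyone passing them on purpose; the remaining fields only leak if a tag on the page chooses to append them.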

When third-party advertising tools collect data about a website's audience, said data is no longer owned by the website. While this is not illegal, and is often within the T&Cs of a contract, the reality is that publishers have no idea of the extent of this leakage, have no idea whether their leakage is within industry norms, and almost never know which companies receive the data. At least, not without a lot of work.

Discovering your own data leakage
You can find out if data leakage is affecting your site by finding piggybackers on your domains at – just enter your company's URL. 
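For a do-it-yourself first pass, one option is to export a HAR file from your browser's developer tools while loading your home page, then list every third-party host that is not a vendor you knowingly work with. The sketch below assumes the standard HAR layout (`log.entries[].request.url`); the domains and the `known_vendor_domains` allowlist are hypothetical, and a host showing up here is a candidate piggybacker to investigate, not proof of leakage.

```python
import json
from urllib.parse import urlsplit


def load_har(path):
    """Read a HAR export (JSON) from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def matches_domain(host, domain):
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def unexpected_hosts(har, site_domain, known_vendor_domains):
    """List third-party hosts that are not on the publisher's allowlist."""
    hosts = set()
    for entry in har["log"]["entries"]:
        host = urlsplit(entry["request"]["url"]).hostname
        if not host or matches_domain(host, site_domain):
            continue  # skip data: URLs and first-party requests
        if any(matches_domain(host, d) for d in known_vendor_domains):
            continue  # a vendor the publisher knowingly works with
        hosts.add(host)
    return sorted(hosts)


# Hypothetical miniature HAR, standing in for load_har("homepage.har").
SAMPLE_HAR = {"log": {"entries": [
    {"request": {"url": "https://www.publisher.example/"}},
    {"request": {"url": "https://cdn.publisher.example/site.js"}},
    {"request": {"url": "https://tags.ad-server.example/tag.js"}},
    {"request": {"url": "https://sync.data-broker.example/pixel?uid=abc123"}},
]}}

suspects = unexpected_hosts(
    SAMPLE_HAR, "publisher.example", {"ad-server.example"}
)
print(suspects)
```

A single page load only shows the tags that fired for that visit, so repeating the capture across several pages and sessions gives a fuller picture.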


{1}, {2} Industry Index StackFinder(TM) data, October 2018.