BeLibre Leak Detector: What your website tells strangers about your visitors

Some fancy new socks to go with my new outfit. That’s what I need to finish it off! I look up a Belgian online shop that specializes in socks and has a great reputation. A colleague recommended it; I have never been there before. Despite the exciting assortment, I decide to buy nothing: all the good ones are sold out in my size. When I close my tab, little do I know that around ten companies have been told that I exist, and what my soft spots are. Several of them aren’t even European.

And you know what? It seems those big American ones have known me for a while already. Google and Meta were riding along for a large slice of my past week: the same handful of companies sat quietly on most of the sites I visited. The socks are not the whole story. There was the perfume I looked at yesterday, the barber I booked for next Tuesday, and the table for two at the cosy place around the corner. Any one of those visits is trivia. Stitched together across thousands of sites that all phone the same few companies, it becomes a fairly complete picture of my life, assembled by firms I have no relationship with and cannot ring up to ask what they have on file.

Meta’s share is more intimate, because the pixel on the shop links this visit to my social account, the one carrying my real name and friends. On their platform, they log whatever they want: where I stopped scrolling and lingered, the message I started typing into a box and then deleted before posting, every photo I paused on long enough to like. I don’t want to imagine what they already know and have deduced about me, and what the socks tell them on top.

I can’t blame the shop owner. He just wanted the insights, and Google and Meta offer those for free. He isn’t aware of what he’s spilling about his customers. He’s not an IT guy, he just wants to sell great socks to a lot of people and make an honest living.

So is there something I can do to gain more insight into what’s going on on a site? Is there something I can tell the owner of the sock shop?

Well, there is a new tool from BeLibre, and the code lives on Codeberg. This article explains what it does, what you can do with it, and the bit that matters most: how to read what it tells you.

# What it does, in one breath

You browse a website normally. The tool records the session, then reads it back to you and reports exactly which third parties received data about the visit, and what kind of data they got. See something worth saving? Alt-Ctrl-S adds a screenshot to your capture.

The report comes in three layers. A summary at the top scores its findings red, yellow, or green and tells you what to do about each one. Below that, a breakdown per vendor. Below that, a request-by-request drill-down for anyone who wants to see the receipts.

Two things make it more than a tracker counter. First, every vendor it finds is tagged with its jurisdiction and its exposure to extra-territorial law, so a US analytics provider gets flagged against the CLOUD Act, FISA 702, and the Schrems II ruling. The output reads as a digital-sovereignty audit, not just a list of cookies. Second, it catches the cases that hide: trackers disguised behind first-party-looking hostnames, cookies with lifetimes measured in presidential terms, and identifiers that quietly follow a visitor from site to site.
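To make the jurisdiction tagging concrete, here is a minimal sketch of the idea. It is not the tool's actual code: the two-entry catalogue, the `VendorTag` fields, and the `tag_host` helper are all assumptions made up for illustration, and the real detector modules will be richer.

```python
from dataclasses import dataclass

# Hypothetical mini-catalogue; the tool's real vendor data may look different.
LEGAL_EXPOSURE = {
    "US": ["CLOUD Act", "FISA 702", "Schrems II"],
}

VENDOR_CATALOGUE = {
    "google-analytics.com": ("Google Analytics 4", "US"),
    "facebook.com": ("Meta (Facebook) Pixel", "US"),
}


@dataclass
class VendorTag:
    host: str
    vendor: str
    jurisdiction: str
    legal_exposure: list[str]


def tag_host(host: str) -> VendorTag:
    """Match a request host against the catalogue by domain suffix."""
    host = host.lower().rstrip(".")
    for domain, (vendor, country) in VENDOR_CATALOGUE.items():
        if host == domain or host.endswith("." + domain):
            return VendorTag(host, vendor, country, LEGAL_EXPOSURE.get(country, []))
    # No catalogue entry matched: report it as unclassified rather than guess.
    return VendorTag(host, "unclassified", "unknown", [])
```

Calling `tag_host("www.google-analytics.com")` in this sketch yields a US-tagged vendor with the three legal references attached, while a host that is not in the catalogue comes back as unclassified.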

It also looks past the browser at the back office. Alongside the page capture it inspects the site's DNS records: where the authoritative name servers live, which provider handles inbound mail, whether the zone is DNSSEC-signed, whether DMARC actually blocks spoofed mail or only watches it, and which vendors the site has quietly admitted using through its DNS records. A shop can host its pages in Belgium and still run its analytics, its mail, and its DNS through providers on another continent. The report shows both halves.
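The difference between DMARC that blocks and DMARC that only watches comes down to the `p=` tag in the domain's DMARC TXT record. The sketch below shows one way to read it. It is an illustration under simple assumptions, not the tool's own check: the function names are ours, the record text is passed in as a string (no DNS lookup happens here), and refinements such as the `pct` and `sp` tags are ignored.

```python
def parse_dmarc(record: str) -> dict[str, str]:
    """Split a DMARC TXT record such as 'v=DMARC1; p=none' into tag/value pairs."""
    tags = {}
    for part in record.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags


def dmarc_enforcement(record: str) -> str:
    """Classify a DMARC record by what receivers are asked to do with failing mail."""
    tags = parse_dmarc(record)
    if tags.get("v") != "DMARC1":
        return "no DMARC record"
    policy = tags.get("p", "").lower()
    if policy == "reject":
        return "blocks"
    if policy == "quarantine":
        return "quarantines"
    if policy == "none":
        return "monitors only"
    return "invalid policy"
```

A record of `v=DMARC1; p=none; rua=mailto:...` is the "only watches" case: reports are collected, but spoofed mail is still delivered.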

There is also a bulk mode for scanning a whole list of websites in one go, which we will come back to.

The tool is released under the GPLv3. If you want to know precisely how the sausage is recorded, the source is right there and you are welcome to read every line.

# Installing it

We will not reprint the install steps here, because the README already does it well and nobody reads installation prose for pleasure. You need Python and a recent Firefox. The rest is three commands and a coffee.

# Running it: four things to try

This is where it gets interesting. Here are four ways to point it at the world.

# 1. Your favourite online shop

Start with the sock scene from the top, except now you have the report open. We ran it against a real Belgian sock shop, browsed for about eighty seconds, and bought nothing.

The first command creates the capture. The moment you close the browser window, the capture is saved at the given location:

python -m leak_inspector capture https://example.com --out captures/example.zip

The next command generates a report. The output format can be text, html, markdown_summary or markdown_detailed, or even json if you intend to feed it into another tool. Here the report is saved to a file named report.html:

python -m leak_inspector analyze captures/example.zip --format html > report.html
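If you want the same capture rendered in several formats, a few lines of Python can wrap the command above. This is a convenience sketch of ours, not part of the tool: `analyze_command` and `render` are hypothetical helper names, and the format list is simply the one given in this article, so check the README for the current set.

```python
import subprocess
import sys
from pathlib import Path

# Format names as listed in this article; the README is the authority.
FORMATS = ("text", "html", "markdown_summary", "markdown_detailed", "json")


def analyze_command(capture_path: str, fmt: str) -> list[str]:
    """Build the 'analyze' command line for one capture and one output format."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt}")
    return [sys.executable, "-m", "leak_inspector", "analyze",
            capture_path, "--format", fmt]


def render(capture_path: str, fmt: str, out_path: str) -> None:
    """Run the analysis and write whatever it prints to out_path."""
    result = subprocess.run(analyze_command(capture_path, fmt),
                            capture_output=True, text=True, check=True)
    Path(out_path).write_text(result.stdout, encoding="utf-8")
```

For example, `render("captures/example.zip", "json", "report.json")` would produce the machine-readable variant next to the HTML one, assuming the tool is installed in the same Python environment.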

Here is a trimmed summary from our sock experiment:

VERDICT
  10 external vendors contacted. 11 personal-data fields observed leaving
  via 4 trackers: Google Ads / DoubleClick, Google Analytics 4,
  Google Tag Manager, Meta (Facebook) Pixel.
  Nothing requiring action was found.

KEY FINDINGS — WEBSITE
  🔴 1 persistent cross-site tracking cookie set. Vendor: Google Ads / DoubleClick.
  🔴 2 vendors under extra-territorial jurisdiction. 2× 🇺🇸 US (Google, Meta).
  🟡 4 unclassified third-party hosts.

KEY FINDINGS — BACK-OFFICE
  🟡 Authoritative DNS hosted under extra-territorial jurisdiction. 🇺🇸 US (Cloudflare).
  🟡 Inbound mail handled by an extra-territorial provider. 🇺🇸 US (Google).
  🔴 4 third-party SaaS relationships self-disclosed via DNS. US: Google, Meta.

This is the report of a tidy, competently run shop, not infested with hundreds of trackers or even a screen recorder. The pages are hosted in the Netherlands, so nicely inside the EU. By the standards of commercial browsing, it behaves well.

And yet. Its analytics, its advertising pixel, its inbound mail, and its DNS all sit under US jurisdiction. The shop almost certainly did not decide this on purpose. Someone added Google Tag Manager years ago and it brought friends, the mail went to Google because that is what everyone does, and the DNS landed at Cloudflare without a second thought. None of it is malicious. All of it is exactly the kind of quiet default that a sovereignty audit exists to make visible. That is the real lesson of the sock shop: the leaks are not dramatic, they are ordinary, and ordinary is harder to notice.

# 2. Your school or your employer

Now run it against a site you are more or less obliged to use. A school portal, the staff intranet, the platform your child logs into every single day.

The finding often looks similar to the webshop. The difference is the power relationship. You can stop shopping at a webshop. You cannot really stop your kid from using the homework platform the school chose. That is what makes this the uncomfortable case, and the useful one.

It is worth noting, gently, that a footer reading “we take your privacy seriously” and a login page that loads fourteen US trackers before you have typed anything are not mutually exclusive. They coexist constantly. The tool simply makes the second half visible.

# 3. Every school in your municipality

One report is an anecdote. Thirty reports is evidence.

This is what bulk mode is for. Give it a list of every school website in your municipality, set it running, and come back to a ranked overview: who leaks the least, who leaks the most, and which trackers show up across the whole set. Suddenly you are not complaining about one site, you are holding a comparison you can take to a school board, a municipal council, or a journalist who has been looking for exactly this kind of concrete number.

This is the BeLibre advocacy use case in miniature. Sovereignty arguments tend to drown in abstraction. A table that says “nine of the twelve schools in our town send pupil data to the same US vendor” does not drown in anything.
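The arithmetic behind such a table is simple. As a rough sketch, and assuming you have already reduced each report to a set of vendor names per site (a shape we made up for this example, not the tool's bulk output format), the ranking and the "how many sites use this vendor" count look like this:

```python
from collections import Counter


def rank_sites(results: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Order sites from fewest to most third-party vendors (ties by name)."""
    pairs = [(site, len(vendors)) for site, vendors in results.items()]
    return sorted(pairs, key=lambda pair: (pair[1], pair[0]))


def vendor_reach(results: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Count on how many sites each vendor appears, most widespread first."""
    counts = Counter(vendor for vendors in results.values() for vendor in vendors)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Invented sample data: three placeholder schools, not real scan results.
sample = {
    "school-a.example": {"Google Analytics 4", "Meta (Facebook) Pixel"},
    "school-b.example": {"Google Analytics 4"},
    "school-c.example": set(),
}
```

With the invented sample, `vendor_reach(sample)` puts Google Analytics 4 on top with two of three sites, which is exactly the kind of sentence a school board can act on.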

# 4. Cookies off, then cookies on

The last one is less a use case than a trick that turns the tool into an experiment.

Visit a site and refuse all cookies. Capture it. Visit the same site again, accept everything, and capture it a second time. Now compare the two reports.

What you are really testing is whether the consent banner does anything at all. Sometimes the “refuse” capture is reassuringly empty and the banner is doing its job. Sometimes the two reports are nearly identical, which means the tracking fired before you ever clicked, and the banner was decorative. The second result is more common than it should be, and it is precisely the kind of finding a data protection authority cares about.

To compare the two captures, have a look at the diff command in the README. This one is a bit more technical, but if you made it through the first exercises, it will work out just fine.
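The logic of the comparison fits in a few lines. The sketch below is not the README's diff command. It is a hand-rolled illustration that assumes you have extracted the set of third-party hosts from each report yourself, and the function name and dictionary keys are our own.

```python
def compare_captures(refused: set[str], accepted: set[str]) -> dict[str, list[str]]:
    """Compare third-party hosts seen after refusing vs. accepting cookies."""
    return {
        # Contacted even though consent was refused: the interesting ones.
        "in_both": sorted(refused & accepted),
        # Appeared only once consent was given: the banner did its job here.
        "only_after_consent": sorted(accepted - refused),
        # Seen only in the refusal run; often noise, but worth a look.
        "only_when_refused": sorted(refused - accepted),
    }
```

If `in_both` holds nearly everything and `only_after_consent` is almost empty, you are looking at the decorative-banner case described above.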

Whichever case you run, the tool can render the report as plain terminal text, as HTML with hover tooltips, as JSON for feeding into something else, or as markdown for sharing. And it always reflects the session you actually captured, consent choices and all. It reports what happened, not what should have happened.

# Reading the results without scaring yourself

Here is the honest part. The tool hands you evidence. Evidence is not yet a conclusion, and the main risk in using it is reading it wrong in either direction: panicking at a red flag that turns out to be a vendor you already have a contract with, or shrugging at one that genuinely matters.

A bit of background closes that gap. You do not need a law degree, but a few things help:

- The colour ratings measure what was observed on the wire. They are sensible heuristics, not a verdict from a lawyer.
- There is a real difference between a piece of personal data (your IP, a hashed email), an identifier that links your visits together, and plain technical noise. They carry very different weight under the GDPR, and the report distinguishes them.
- “Vendor in the US” is a sentence with legal consequences. It is shorthand for a body of law (FISA 702, the CLOUD Act, and the reasoning behind Schrems II) that can reach data held by US companies regardless of where the server sits. That is why a US flag in the report is not the same as a Belgian one.
- CNAME cloaking is a tracker wearing a fake first-party moustache. A hostname like stats.yourschool.be looks local and trustworthy, then quietly resolves to a vendor’s infrastructure elsewhere. The tool unmasks it; ordinary blocklists and quick code reviews usually miss it.
- Session replay and analytics are different instruments. One counts visits, the other records them like a camera over your shoulder. Only the recording drags a formal impact assessment (GDPR Article 35) behind it.
- An unclassified host is not the same as a guilty one. It just means no detector module recognised it yet. In the sock-shop report, the four unclassified hosts were a marketing-automation SDK worth a closer look, a Google Analytics endpoint, the Apple Pay script, and the shop’s own sister site serving a product image. One of those deserves investigation. Three are noise. Telling them apart is the work, and a host showing up here is a good reason to open an issue so the next person does not have to guess.
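The CNAME cloaking point is easier to grasp with the check written out. What follows is a simplified sketch, not the tool's detector: it assumes the caller already has the CNAME chain for a hostname (from a DNS lookup done elsewhere) and knows the site's own domain, and the vendor name in the example is a placeholder.

```python
def is_under(host: str, domain: str) -> bool:
    """True if host equals domain or is a subdomain of it."""
    host = host.rstrip(".").lower()
    domain = domain.rstrip(".").lower()
    return host == domain or host.endswith("." + domain)


def cloaked_targets(host: str, site_domain: str, cname_chain: list[str]) -> list[str]:
    """Return CNAME targets that leave the site's domain for a first-party-looking host."""
    if not is_under(host, site_domain):
        # Openly third-party: nothing is being disguised.
        return []
    return [target.rstrip(".").lower()
            for target in cname_chain
            if not is_under(target, site_domain)]
```

Here `cloaked_targets("stats.yourschool.be", "yourschool.be", ["collect.vendor.example."])` returns the vendor hostname. A non-empty result is a lead rather than a verdict, since ordinary hosting and CDN setups also point CNAMEs outside the site's domain.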

Finally, the blind spots. The big one is server-side tagging: when a website collects data in the browser, sends it to its own server, and forwards it onward from there, the onward hop never touches the browser and the tool cannot see it. What looks clean on the wire can still be feeding a vendor out the back. The DNS and mail checks catch some of that indirectly, but a determined server-side setup stays invisible by design.

If a finding has you squinting, ask. The BeLibre online community is the place for it: real people, considerably fewer trackers than this paragraph.

# Try it on something you care about

So, whether you’re the one buying socks or the one selling them: this tool can help you better understand what’s going on right under your nose and under the hood of your browser. Trying out the Leak Detector takes only a few minutes.

The code, the README, and the issue tracker: codeberg.org/BeLibre/Leak_Detector
Questions, findings, or help reading a report: the BeLibre Matrix channel
Want to stay up to date on BeLibre activities? Follow us on Mastodon
Spotted an unclassified host worth recognising? New detector modules are how the catalogue grows. Proposals welcome.

Released under the GPLv3: ♥️ run it ♥️ study it ♥️ share it ♥️ improve it ♥️

BeLibre

Digital Autonomy