30% Of The World's Biggest Websites Track Visitors Using Web Browser Fingerprinting
Many of the world's biggest websites have begun using web browser fingerprinting techniques to track users even if their web browsers have a strict cookie policy. A combination of screen resolution, graphics card capabilities and other device-specific information is often enough to create a unique fingerprint that can be used to track you across the web. The larger the site, the more likely it is to try to track you using dirty tricks.
written by 林慧 (Wai Lin) 2020-09-02 - last edited 2020-09-03. © CC BY
A heavily obscured piece of JavaScript used to do web browser fingerprinting.
A research paper titled "Fingerprinting the Fingerprinters: Learning to Detect Browser Fingerprinting Behaviors", created as Fpinspector-sp2021.pdf on a macOS 10.15.1 laptop on August 4th, reveals that the more traffic and revenue a website has, the more likely it is to use stealthy, very invasive tracking techniques to monitor its visitors.
The paper, written by Umar Iqbal of The University of Iowa, Zubair Shafiq of the University of California and Steven Englehardt from the Mozilla Corporation, claims that 30% of the world's top 1000 websites use dirty, immoral web browser fingerprinting techniques to track their visitors. That figure drops to 7.7% in the bottom half of the top hundred thousand.
Rank | Websites (count) | Websites (%) |
---|---|---|
1 to 1K | 266 | 30.60% |
1K to 10K | 2,010 | 24.45% |
10K to 20K | 981 | 11.10% |
20K to 50K | 2,378 | 8.92% |
50K to 100K | 3,405 | 7.70% |
1 to 100K | 9,040 | 10.18% |
These numbers do include some sites that, according to our brief close-up inspection of the data, look very much like false positives. The numbers are slightly higher than they should be, but they are still a very good indicator of how widespread this problem is.
Gathering and using information specific to someone's web browser to track them is not a new concept. For quite some time it has been possible to find out which JavaScript and graphics capabilities are present, what screen resolution is used and a lot of other things when someone visits a website. A few of the bigger websites began tracking and spying on users using browser fingerprinting techniques years ago. The sheer number of top 10000 Internet websites that have begun participating in this pure evil is a new development. Stealth tracking has never been as bad as it is today, and it is getting worse as "small tech" races to achieve the same level of sophisticated surveillance "big tech" is (ab)using.
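To make the idea concrete, here is a minimal sketch of how a handful of device attributes could be folded into one identifier. It is an illustration only: the function names, the attribute names and the example values are all made up, and the hash is a simple FNV-1a variant over UTF-16 code units, not something taken from the paper or from any real tracking script.

```javascript
// Hypothetical sketch: no name or value here comes from the paper or a real tracker.

// 32-bit FNV-1a style hash over UTF-16 code units, returned as 8 hex digits.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Serialize attributes with sorted keys so that the same device
// always produces the same string, whatever order it was probed in.
function canonicalize(attributes) {
  return Object.keys(attributes)
    .sort()
    .map((key) => `${key}=${JSON.stringify(attributes[key])}`)
    .join(";");
}

function fingerprint(attributes) {
  return fnv1a(canonicalize(attributes));
}

// Made-up values; in a browser they would come from screen, navigator, WebGL and so on.
const visitor = {
  screen: "1920x1080x24",
  timezoneOffset: -120,
  language: "en-US",
  webglRenderer: "Example GPU 1000",
};
const sameVisitorLater = {
  language: "en-US",
  webglRenderer: "Example GPU 1000",
  screen: "1920x1080x24",
  timezoneOffset: -120,
};

console.log(fingerprint(visitor) === fingerprint(sameVisitorLater)); // true
```

No cookie is involved at any point. As long as the attributes stay the same, the identifier can be recomputed on every visit and on every site that runs the same script.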
Why Web Browser Fingerprinting Has Become Such A Huge Problem
When Rome was still a functioning republic, Roman judges would always ask two simple questions:
Latin | English |
---|---|
Cui Bono? | To whom is it a benefit? |
Cui Prodest? | Whom does it profit? |
Answering those questions is easy once you have the necessary background and context.
Big web browser vendors have begun removing support for third-party cookies, causing a lot of frustration in the ad-tech industry. Apple has removed support for third-party cookies from their Safari web browser, recent Mozilla Firefox versions are restricting them, and Chrome and Chromium-based browsers stopped supporting third-party cookies over HTTP in Chromium version 84 (third-party cookies set using HTTPS will be removed from Chromium in the future).
"Digital content is to a large degree funded by advertising, which means that rather than paying for services with money, companies monetize our behaviour, attention and personal data."
published January 14th, 2020
The removal of third-party cookie support in major web browsers is a problem for both the ad-tech industry and large website publishers (mostly one and the same when you look at the top 100 websites) because targeted advertising is a lot more profitable than contextual advertising. Tracking someone across the web is very easy if you can set a third-party cross-site cookie and use that to identify someone on any website with JavaScript provided by you. Targeting specific users without cross-site cookies requires some kind of alternative solution. It would appear that web browser fingerprinting is, in the eyes of the ad-tech industry, the answer.
Fraud prevention is also a somewhat valid concern for the ad-tech industry. Click-fraud is quite rampant.
Measuring The Problem
The "Fingerprinting the Fingerprinters: Learning to Detect Browser Fingerprinting Behaviors" (Fpinspector-sp2021.pdf) paper describes a tool called FP-INSPECTOR, a "machine learning based syntactic-semantic approach to accurately detect browser fingerprinting". They have not published the complete source code for this tool. The researchers used a fork of the OpenWPM web privacy measurement framework to get script execution traces from the web pages they scraped. They have set up a GitHub page at github.com/uiowa-irl/FP-Inspector with their OpenWPM fork but not much else. It is therefore impossible to easily reproduce and verify their results.
The researchers have published a 2.4 MiB file named fingerprinting_domains.json on GitHub with a list of links to scripts they claim are used for web browser fingerprinting and the sites they are used on. A close-up inspection of the sites and scripts in that file shows that the vast majority of the scripts they have identified really are used for web browser fingerprinting. A few sites are most likely false positives: there are a few sites with games written in JavaScript and a few sites with video players on that list that seem to use JavaScript for completely harmless and totally legitimate purposes. It is quite understandable that a game would use WebGLContextEvent and WEBGL_draw_buffers.
The researchers' data appears to be mostly accurate even though it would appear that the false positive rate is a bit higher than the researchers claim. It is safe to conclude that a lot of the top 1000 websites are using web browser fingerprinting even if the paper's 30.60% figure is off by a few percent.
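The false positives are easier to understand with a toy detector in hand. The sketch below is emphatically not FP-INSPECTOR, whose classifier has not been published in full. It is a crude, hypothetical heuristic that counts how many fingerprinting-associated API names appear in a script's source. The API names are real Web API identifiers, but the list, the threshold and the two sample scripts are assumptions made for illustration.

```javascript
// Hypothetical watch-list of API names that fingerprinting scripts often touch.
const SUSPICIOUS_APIS = [
  "toDataURL",
  "getLayoutMap",
  "AudioContext",
  "isTypeSupported",
  "WEBGL_draw_buffers",
  "getBattery",
  "enumerateDevices",
];

// Return the watch-list entries that occur anywhere in the script source.
function apiSignals(source) {
  return SUSPICIOUS_APIS.filter((name) => source.includes(name));
}

// Flag a script only when several different signals show up together.
function looksLikeFingerprinting(source, threshold = 3) {
  return apiSignals(source).length >= threshold;
}

// Made-up one-line stand-ins for a game script and a tracking script.
const gameScript = 'gl.getExtension("WEBGL_draw_buffers"); canvas.toDataURL();';
const trackerScript =
  "c.toDataURL(); MediaSource.isTypeSupported(t); navigator.getBattery(); new AudioContext();";

console.log(apiSignals(gameScript).length, looksLikeFingerprinting(gameScript)); // 2 false
console.log(apiSignals(trackerScript).length, looksLikeFingerprinting(trackerScript)); // 4 true
```

A game that also plays sound would cross the threshold and be flagged, which is exactly the kind of false positive described above. Plain substring matching is also defeated by obfuscated scripts, which is presumably one reason the researchers worked from execution traces instead.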
Government Complicity
"Fascism should more appropriately be called Corporatism because it is a merger of state and corporate power"
It is interesting to note that a lot of the sites the researchers have correctly identified as using web browser fingerprinting scripts belong to the American government. Those sites include fbi.gov (Federal Bureau of Investigation), weather.gov (National Weather Service), uspto.gov (US Patent Office), nhtsa.gov (National Highway Traffic Safety Administration), irs.gov (The Internal Revenue Service) and sec.gov (US Securities and Exchange Commission). All of those sites embed the exact same JavaScript spyware libraries provided by an American outfit called ForeSee, who describe themselves as providing products that let their clients "Listen to all customer signals — across web, mobile, location, and contact center — then connect data for deeper insights".
Creative Techniques
The research paper identified several new ways of doing web browser fingerprinting. Researchers have long known that it is possible to uniquely identify someone using WebRTC, canvas properties, font availability and AudioContext properties. The researchers discovered that several sites are also using lesser-known metrics like keyboard layout (getLayoutMap), residual data in the browser's cache (using the Performance API), browser permissions (for things like Notifications, Geolocation and Camera API), the presence of peripherals and sensors (gamepads, proximity sensors, etc) and browser and platform-specific differences in API behaviour (AudioWorklet, setTimeout, etc) to identify users across the web.
Many of the scripts used to do web browser fingerprinting are heavily obfuscated. Good luck trying to make sense of this piece of code:
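The simplest version of these lesser-known signals is a plain presence check: which APIs does the browser expose at all? The sketch below assumes a `navigator`-like object is passed in, so that it also runs outside a browser. The property names are real Web API entry points, but the probe list, the function name and the two stub "browsers" are hypothetical and say nothing about what any particular browser actually supports.

```javascript
// Each probe answers one yes/no question about a navigator-like object.
const PROBES = [
  ["keyboard", (nav) => "keyboard" in nav], // navigator.keyboard.getLayoutMap()
  ["permissions", (nav) => "permissions" in nav], // navigator.permissions.query()
  ["gamepads", (nav) => typeof nav.getGamepads === "function"],
  ["battery", (nav) => typeof nav.getBattery === "function"],
  ["mediaDevices", (nav) => "mediaDevices" in nav],
];

// Encode the answers as a string of 1s and 0s, one digit per probe.
function capabilityVector(nav) {
  return PROBES.map(([, probe]) => (probe(nav) ? "1" : "0")).join("");
}

// Made-up stand-ins for two different browsers.
const browserA = {
  keyboard: {},
  permissions: {},
  getGamepads() { return []; },
  getBattery() { return null; },
  mediaDevices: {},
};
const browserB = {
  permissions: {},
  getGamepads() { return []; },
  mediaDevices: {},
};

console.log(capabilityVector(browserA)); // 11111
console.log(capabilityVector(browserB)); // 01101
```

In a real browser the same function could be called as `capabilityVector(navigator)`. Each digit on its own says very little, but it narrows down the browser family and version, and it combines with every other signal a script collects.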
function(_0x38108e,_0x25f99e){var _0x5138c2=function(_0x40649a){while(--_0x40649a){_0x38108e['\x70\x75\x73\x68'](_0x38108e['\x73\x68\x69\x66\x74']());}};_0x5138c2(++_0x25f99e);}(_0xd604,0x15b));var _0x4d60=function(_0x4f0c2d,_0x181af7){_0x4f0c2d=_0x4f0c2d-0x0;var _0x5e4665=_0xd604[_0x4f0c2d];if(_0x4d60['\x69\x6e\x69\x74\x69\x61\x6c\x69\x7a\x65\x64']===undefined){(function(){var _0x2e8507=Function('\x72\x65\x74\x75\x72\x6e\x20\x28\x66\x75\x6e\x63\x74\x69\x6f\x6e\x20\x28\x29\x20'+'\x7b\x7d\x2e\x63\x6f\x6e\x73\x74\x72\x75\x63\x74\x6f\x72\x28\x22\x72\x65\x74\x75\x72\x6e\x20\x74\x68\x69\x73\x22\x29\x28\x29'+'\x29\x3b');var _0x5dc16e=_0x2e8507();
Some scripts are easier to read. It is easy to see that the American NoFraud corporation, who sell "Real-time Fraud Screening for eCommerce", like to identify website visitors by the audio and video formats their web browser supports.
function(){var e=['video/mp4; codecs="avc1.42c00d"','video/ogg; codecs="theora"','video/webm; codecs="vorbis,vp8"','video/webm; codecs="vorbis,vp9"','video/mp2t; codecs="avc1.42E01E,mp4a.40.2"'],t=["audio/mpeg",'audio/mp4; codecs="mp4a.40.2"','audio/ogg; codecs="vorbis"','audio/ogg; codecs="opus"','audio/webm; codecs="vorbis"','audio/wav; codecs="1"'],n=function(e){for(var t={},n=0;n<e.length;n++){var r=e[n];window.MediaSource?t[r]=window.MediaSource.isTypeSupported(r):window.WebKitMediaSource&&(t[r]=window.WebKitMediaSource.isTypeSupported(r))}return t};
return{audio:n(t),video:n(e)}}
Knowing what audio and video formats a web browser supports is not enough for the NoFraud corporation. They are also very curious about the web browser's Canvas support:
function(){
var e,t;
try{e=document.createElement("canvas"),t=e.getContext("2d")}catch(n){}return t?(t.fillStyle="red",t.fillRect(30,10,200,100),t.strokeStyle="#1a3bc1",
t.lineWidth=6,t.lineCap="round",t.arc(50,50,20,0,Math.PI,!1),t.stroke(),
t.fillStyle="#42e1a2",t.font="15.4px 'Arial'",t.textBaseline="alphabetic",
t.fillText("PR flacks quiz gym: TV DJ box when?â˜",15,60),t.shadowOffsetX=1,
t.shadowOffsetY=2,t.shadowColor="white",
t.fillStyle="rgba(0, 0, 200, 0.5)",t.font="60px 'Not a real font'",
t.fillText("No骗",40,80),i(e.toDataURL())):null}
The NoFraud script will also check if WebGL is supported and identify users by the way WebGL behaves:
function(e,t){
var n="attribute vec2 attrVertex; varying vec2 varyinTexCoordinate; uniform vec2 uniformOffset; void main() { varyinTexCoordinate = attrVertex + uniformOffset; gl_Position = vec4(attrVertex, 0, 1); }",r="precision mediump float; varying vec2 varyinTexCoordinate; void main() { gl_FragColor = vec4(varyinTexCoordinate, 0, 1); }",o=e.createBuffer();
e.bindBuffer(e.ARRAY_BUFFER,o);
var a=new Float32Array([-.2,-.9,0,.4,-.26,0,0,.732134444,0]);
e.bufferData(e.ARRAY_BUFFER,a,e.STATIC_DRAW),o.itemSize=3,o.numItems=3;
var l=e.createProgram(),c=e.createShader(e.VERTEX_SHADER);e.shaderSource(c,n),e.compileShader(c);
var u=e.createShader(e.FRAGMENT_SHADER);
return e.shaderSource(u,r),e.compileShader(u),e.attachShader(l,c),
e.attachShader(l,u),e.linkProgram(l),e.useProgram(l),
l.vertexPosAttrib=e.getAttribLocation(l,"attrVertex"),
l.offsetUniform=e.getUniformLocation(l,"uniformOffset"),
e.enableVertexAttribArray(l.vertexPosArray),
e.vertexAttribPointer(l.vertexPosAttrib,o.itemSize,e.FLOAT,!1,0,0),
e.uniform2f(l.offsetUniform,1,1),e.drawArrays(e.TRIANGLE_STRIP,0,o.numItems),
i(t.toDataURL())}
The NoFraud browser fingerprinting script will, additionally, check what web browser extensions and plugins are present, the battery status (if available) and several other things.
Web browser fingerprinting scripts do not rely on just one, two or three methods. The majority collect dozens of data points in order to create a profile.
NoFraud is just a random example; we are not picking on their script because it is unusually invasive. A whole lot of other actors use even more invasive fingerprinting techniques. The sole reason we decided to use NoFraud as an example is that their JavaScript code is simple and readable and just 13405 bytes large. Who knows what a 200 KiB large JavaScript file filled with functions like this one actually does:
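A little arithmetic shows why dozens of weak signals are enough. If a signal has N equally likely values it contributes log2(N) bits of identifying information, and independent signals add up. The sketch below uses made-up value counts, and both assumptions (equal likelihood and independence) are simplifications, since real-world signals are skewed and correlated. Treat the result as an upper bound for illustration only.

```javascript
// Bits of identifying information carried by a signal with
// `distinctValues` equally likely outcomes.
function bitsFor(distinctValues) {
  return Math.log2(distinctValues);
}

// Independent signals add up; 2 ** bits is the number of visitors
// the combination could tell apart in the best case.
function combinedBits(signals) {
  return signals.reduce((total, signal) => total + bitsFor(signal.distinctValues), 0);
}

// Hypothetical value counts, chosen as powers of two to keep the sums easy to follow.
const signals = [
  { name: "screen resolution", distinctValues: 32 }, // 5 bits
  { name: "canvas rendering", distinctValues: 1024 }, // 10 bits
  { name: "installed fonts", distinctValues: 4096 }, // 12 bits
  { name: "audio stack", distinctValues: 64 }, // 6 bits
];

const bits = combinedBits(signals); // 5 + 10 + 12 + 6, about 33 bits
console.log(bits.toFixed(1), "bits");
console.log(Math.round(2 ** bits).toLocaleString("en-US"), "distinguishable visitors");
```

Under these assumptions, four signals already reach roughly 33 bits, which corresponds to more combinations than there are people on the planet.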
function(_0x40649a){
while(--_0x40649a){
_0x38108e['\x70\x75\x73\x68'](_0x38108e['\x73\x68\x69\x66\x74']());
}
}
A scarily large percentage of the commonly used tracking scripts have been processed by some obfuscation tool.
Resistance Is Not Entirely Futile
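The hex escapes in the helper above are at least easy to undo. The sketch below decodes `\xNN` sequences in a piece of source text and rewrites the helper in readable form. The function names `decodeHexEscapes` and `rotate` are our own. This only peels off one layer: the strings such scripts actually use appear to live in a shuffled lookup array (`_0xd604` in the first example), and recovering those takes more work.

```javascript
// Turn \xNN escape sequences in obfuscated source text back into characters.
function decodeHexEscapes(source) {
  return source.replace(/\\x([0-9a-fA-F]{2})/g, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

// Readable equivalent of the obfuscated helper: move the first element
// to the end of the list `count - 1` times. `count` must be at least 1.
function rotate(list, count) {
  while (--count) {
    list.push(list.shift());
  }
  return list;
}

// String.raw keeps the backslashes, so this is the text as it appears in the script.
const obfuscated = String.raw`_0x38108e['\x70\x75\x73\x68'](_0x38108e['\x73\x68\x69\x66\x74']());`;

console.log(decodeHexEscapes(obfuscated)); // _0x38108e['push'](_0x38108e['shift']());
console.log(rotate(["a", "b", "c", "d"], 3)); // [ 'c', 'd', 'a', 'b' ]
```

So the mysterious function is just `array.push(array.shift())` in a loop. It rotates an array of strings so that the indexes used elsewhere in the script do not match the order in which the strings appear in the file.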
Entertainer Alex Jones tried to warn you about government tracking in his movie "9/11: The Road To Tyranny" released in 2002.
The maintainers of the EasyList easyprivacy.txt filter list for web browser extensions like uBlock Origin and filter features built into web browsers like Falkon have added the script files identified by the research paper to their blocklist. Those who maintain other browser filter lists, and hosts file blacklists, will likely follow suit.
uBlock Origin can be added to Mozilla Firefox and all the Chromium-based browsers. The EasyPrivacy list is enabled in uBlock Origin v1.29.2 by default. The ad-blocker built into Falkon enables the "primary" EasyList filter list by default but does not enable EasyPrivacy by default. Falkon users have to manually add and enable it.
TIP: uBlock Origin has a </> button in the pop-up menu that appears when you click the fine red UB button. The </> button lets you disable JavaScript on a per-site basis. JavaScript can also be disabled using NoScript and similar web browser extensions. Disabling JavaScript will magically disable fingerprinting scripts, cookie warnings and a large percentage of all advertisements on the modern web.
Disabling JavaScript on this site will remove some buttons from the page editor. Everything else of importance will work just fine. The Google AdSense advertisements shown to international users, and to EU users dumb enough to accept the tracking/cookie warning, will not show up when JavaScript is disabled. Most smaller sites are like this one: they work just fine when JavaScript is disabled. How well sites work with JavaScript disabled varies; too many bigger sites like YouTube and Instagram are either broken or partially broken if JavaScript is disabled.
Using blocklists will help, but it will not prevent or stop all the web browser fingerprinting the modern web will subject you to. It is an arms race: the ad-tech industry will come up with new and clever ways to identify and track you, those who maintain web browsers and filter lists will adapt, and the ad-tech industry will react by coming up with even more ways to uniquely identify web surfers.
Regular average people who live regular conforming lives and don't care about or know much about technology do not stand a chance. Average people tend to use the default web browser on whatever device they happened to buy, with default settings. The pervasiveness of web browser fingerprinting ensures that those people will be tracked, that they end up with shadow profiles at all the big and small ad-tech companies and that they will be bombarded with targeted advertising even if every single web browser vendor disables cross-site cookies. Tech-savvy GNU/Linux users are not that much better off: using a web filter like uBlock Origin helps, but it is not a magic bullet.
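For a rough idea of what the hosts file variant of such a blacklist does, here is a minimal sketch that parses hosts-style text and checks a script URL against it. The function names and the `.example` hostnames are made up. Filter lists like EasyPrivacy use a much richer pattern syntax that this sketch does not attempt to implement.

```javascript
// Parse hosts-file style text ("0.0.0.0 hostname") into a set of blocked hostnames.
function parseHostsBlocklist(text) {
  const blocked = new Set();
  for (const rawLine of text.split("\n")) {
    const line = rawLine.split("#")[0].trim(); // drop comments and blank lines
    if (!line) continue;
    const [address, ...hostnames] = line.split(/\s+/);
    if (address !== "0.0.0.0" && address !== "127.0.0.1") continue;
    for (const name of hostnames) {
      if (name !== "localhost") blocked.add(name.toLowerCase());
    }
  }
  return blocked;
}

// A hosts file only matches exact hostnames, never parent domains or paths.
function isBlocked(url, blocked) {
  return blocked.has(new URL(url).hostname);
}

// Made-up blocklist using reserved .example hostnames.
const hostsText = [
  "# fingerprinting hosts (hypothetical)",
  "127.0.0.1 localhost",
  "0.0.0.0 tracker.example",
  "0.0.0.0 fp.example # inline comment",
].join("\n");
const blockedHosts = parseHostsBlocklist(hostsText);

console.log(isBlocked("https://tracker.example/fp.js", blockedHosts)); // true
console.log(isBlocked("https://cdn.tracker.example/fp.js", blockedHosts)); // false
```

The second result shows the main weakness of hosts-based blocking: a tracker only has to serve the same script from a new subdomain to slip through until the list is updated.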