- cross-posted to:
- technology@lemmy.world
- cross-posted to:
- technology@lemmy.world
Even if you have encrypted your traffic with a VPN (or the Tor Network), advanced traffic analysis is a growing threat against your privacy. Therefore, we now introduce DAITA.
Through constant packet sizes, random background traffic and data pattern distortion we are taking the first step in our battle against sophisticated traffic analysis.
Good for Mullvad. Long overdue, as they say. Wish I could still be a customer. Whoever induced them to turn off port forwarding did the world a disservice.
The possibility that some of those bad customers might’ve had the specific objective of permanently crippling the service as they successfully did is often under-appreciated.
Oh, I know AI can’t identify me because I constantly get banned from my accounts by machine learning algos
Anyone have an eli5 explanation of how AITA works? What patterns could be captured and how would that lead to identification or data siphoning?
One example:
By observing that when someone visits site X, it loads resources A, B, C, etc in a specific order with specific sizes, then with enough distinguishable resources loaded like that someone would be able to determine that you’re loading that site, even if it’s loaded inside a VPN connection. Think about when you load Lemmy.world, it loads the main page, then specific images and style sheets that may be recognizable sizes and are generally loaded in a particular order as they’re encountered in the main page, scripts, and things included in scripts. With enough data, instead of writing static rules to say x of size n was loaded, y of size m was loaded, etc, it can instead be used with an AI model trained on what connections to specific sites typically look like. They could even generate their own data for sites in both normal traffic and the VPN encrypted forms and correlate them together to better train their model for what it might look like when a site is accessed over a VPN. Overall, AI allows them to simplify and automate the identification process when given enough samples.
Mullvad is working on enabling their VPN apps to: 1. pad the data to a single size so that the different resources are less identifiable and 2. send random data in the background so that there is more noise that has to be filtered out when matching patterns. I’m not sure about 3 to be honest.
Thanks very much, I believe I understand that part now, like a fingerprint to associate to site components like pulled in js, css, etc. I still don’t understand, though, how they associate that to a particular user of a VPN. Does each request done through a VPN include some sort of identifier for each of us or is AI also doing something to put these requests in a particular user’s bucket?
I think it was more targeting the client ISP side, than the VPN provider side. So something like having your ISP monitor your connection (voluntarily or forced to with a warrant/law) and report if your connection activity matches that of someone accessing a certain site that your local government might not like for example. In that scenario they would be able to isolate it to at least individual customer accounts of an ISP, which usually know who you are or where to find you in order to provide service. I may be misunderstanding it though.
Edit: On second reading, it looks like they might just be able to buy that info directly from monitoring companies and get much of what they need to do correlation at various points along a VPN-protected connection’s route. The Mullvad post has links to Vice articles describing the data that is being purchased by governments.
lol. and yet they wont support multi hop on Android. how quaint.