There’s been a lot of talk about Meta’s new Twitter clone, Threads, because it will federate with other ActivityPub apps. I’ve seen several posts about Meta possibly using the app as a way to embrace, extend, and extinguish ActivityPub.
Another, more immediate concern I have is that Meta will now be able to harvest data from users of other ActivityPub social networks like Lemmy and Mastodon. If Alice on Threads follows Bob on Mastodon, for example, Bob’s Mastodon instance will send information about all of Bob’s posts and everyone who interacts with them to Meta so that Alice can see it.
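To make the Alice and Bob example concrete, here is a rough Python sketch of the kind of Create activity Bob’s server would deliver to a follower’s inbox, and what the receiving server can read from it. The domain mastodon.example, the post ID, and the helper names are made up for illustration, and the "/followers" collection path is an assumption based on Mastodon’s usual URL layout. Real deliveries carry more fields and are signed.

```python
import json

# The special ActivityStreams address that marks a post as public.
AS_PUBLIC = "https://www.w3.org/ns/activitystreams#Public"


def build_create_activity(actor_url, note_id, content, published):
    """Build a simplified Create activity wrapping a public Note."""
    note = {
        "id": note_id,
        "type": "Note",
        "attributedTo": actor_url,
        "content": content,
        "published": published,
        "to": [AS_PUBLIC],
        # Assumption: the followers collection lives at <actor>/followers.
        "cc": [actor_url + "/followers"],
    }
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "id": note_id + "/activity",
        "type": "Create",
        "actor": actor_url,
        "to": note["to"],
        "cc": note["cc"],
        "object": note,
    }


def visible_to_receiver(activity):
    """Return the fields any receiving server can read from the delivery."""
    note = activity["object"]
    return {
        "author": note["attributedTo"],
        "text": note["content"],
        "timestamp": note["published"],
        "public": AS_PUBLIC in note["to"],
    }


# Hypothetical actor and post on a made-up instance.
activity = build_create_activity(
    "https://mastodon.example/users/bob",
    "https://mastodon.example/users/bob/statuses/1",
    "Hello, fediverse",
    "2023-07-06T12:00:00Z",
)
print(json.dumps(visible_to_receiver(activity), indent=2))
```

Nothing in the payload is hidden from the receiving server: author, text, and timestamp all arrive in the clear, which is the point being made above.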
This is a concern specifically with Meta and other big tech companies running ActivityPub-enabled servers, because their primary motive is to harvest user data to use for advertising. The scariest part to me is that users on networks like Mastodon specifically migrated to Mastodon to get away from big tech, and Meta is still able to harvest their data with Threads.
Meta doesn’t need threads.net to harvest data from ActivityPub instances. In fact, literally anyone can do it without much effort. You’re not protecting your data by being here. It’s as public as old phpBB forums used to be, even more so. That bit is a non-issue (well, it is an issue…)
My issue is just with the quality of content from Meta-owned instances and what it’ll bring to other federated instances.
“Protecting your data” is probably a nonsense thing to say. You can’t at the same time make a post public and “protect” it from someone copying, reading, or learning from it.
It is theoretically possible to publish in public while protecting information, but it requires the added step of maintaining authorization for those you wish to see that data.
You could create a protocol where poster info is encrypted and only the appropriate parties are given the keys to access that information. To the general public it’s posted by an anonymous user.
The question to ask is who really cares that much about your posts? Use a VPN and make an alt account, and Bob’s your uncle.
There’s also the question of the value of that metadata. If something like an email address or phone number were tied to your account in a publicly available fashion, that would be one thing, as those are relatively unique and can be used to tie back to an individual person. But if all you’re getting is a username with nothing else attached, that’s a lot more nebulous. If you care to obfuscate your activity, you could easily just create an account with a name you’ve never used before. That’s probably far more effective at preventing your “data” from being harvested by Meta than some kind of encryption scheme would be, unless you’re willing to go all in on E2E encryption, which creates a ton of other problems that would need to be solved.
Exactly. But you can now get burner numbers much more easily, so obfuscating the metadata outside the service is getting easier. Not perfect, but easier.
If you are doing things where you need full-on, clean-room-level obfuscation, you can never use any type of service outside of your full control. At that point you’re talking one-time pads and posting messages in the wanted ads of newspapers.
I’ve always assumed every part of the internet to be tracked and logged somehow. What I take issue with is the use of that data to shove targeted ads down my throat at every opportunity.
I don’t care about data harvesting. This stuff’s public.
What I don’t want is for the community I joined to be impacted by a larger community filled with bigots, fascists, the hateful. That’s going to be my cue to exit. So I hope that the community I’m a part of values me and others like me more than tolerating the above for the sake of growth or what have you.
Eh, small instances like these aren’t into the growth-at-all-costs thing, because we have to shell out for the costs of hosting more content.
So as long as you don’t join a corpo instance, you shouldn’t have that issue, hopefully.
This is from the first few hours of Threads; you’re clearly right to be concerned:
and a link, since the embedded images from kbin apparently don’t show on Lemmy
🎶 who let the trolls in! (Who, who-who-who)🎶
If Meta, Reddit, Twitter, etc. aren’t already harvesting all of the fediverse, it’s only because they don’t see it as big enough to be of value.
It’s a trivial task.
Since ActivityPub is open, it will be practically impossible to stop outside actors from harvesting the content. All they would need to do is deploy a custom instance, and that way they would have access.
I think this will be an interesting struggle as it evolves, protecting against a giant like Meta who won’t hesitate to spend serious money if it suits them.
Or simply query the API from an existing instance, like third party clients do. Though I suppose then it’d have to account for rate limits.
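A minimal sketch of what that polling could look like while staying under a rate limit. The host lemmy.example is made up, and the "/api/v3/post/list" path with its "type_", "page", and "limit" parameters is an assumption modeled on Lemmy’s HTTP API, so check the instance’s own API docs before relying on it. The fetcher here is a stand-in, so the example never touches the network.

```python
import time
from urllib.parse import urlencode


class MinIntervalLimiter:
    """Allow at most one call every `interval` seconds."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        now = self.clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now


def page_url(base, page, limit=20):
    # Assumed endpoint and parameter names; not verified against any instance.
    query = urlencode({"type_": "All", "page": page, "limit": limit})
    return f"{base}/api/v3/post/list?{query}"


def crawl(base, pages, fetch, limiter):
    """Fetch `pages` pages in order, pausing between requests."""
    results = []
    for page in range(1, pages + 1):
        limiter.wait()
        results.append(fetch(page_url(base, page)))
    return results


# Stand-in fetcher that records URLs instead of making HTTP requests.
requested = []


def fake_fetch(url):
    requested.append(url)
    return {"posts": []}


crawl("https://lemmy.example", 3, fake_fetch, MinIntervalLimiter(0.01))
print(requested)
```

Swapping fake_fetch for a real HTTP call is the only change needed, which is roughly how little stands between a third-party client and a harvester.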
Anyone can set up a Lemmy instance, write a small script/bot to find and follow all the communities on all the instances in the Fediverse and store all that data. It’s not even hard, maybe a day of work for a proof of concept if you start from zero. (Then you have to figure out how to scale it properly, how to detect you’re getting defederated and how to change domains to restart without the defederations. Maybe a week’s worth of effort.)
Threads would be way overkill to achieve this goal. You don’t need any users. You don’t want any users. Just your one account that follows everything.
Edit: or you can just set up a web crawler like Google Search uses to find and store all the data you’re looking for; you don’t necessarily need to be federated or use ActivityPub.
I think it would be interesting to explore an API affordance for attaching licenses to fediverse content, with the admins being able to set a default license for their server.
So if Meta wants to get content from the fediverse, it has to check the headers of each post and make sure it’s licensed for commercial use.
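Here is a rough sketch of how that check might work on the consuming side. Everything in it is hypothetical: the "license" field on a post, the server-wide default, and the allowlist are invented for illustration and are not part of any existing fediverse API that I know of. The license names are SPDX-style identifiers.

```python
# Hypothetical allowlist of licenses that permit commercial use.
COMMERCIAL_OK = {"CC0-1.0", "CC-BY-4.0", "CC-BY-SA-4.0"}


def effective_license(post, server_default=None):
    """Use the post's own license if set, else the admin's server default."""
    return post.get("license") or server_default


def may_use_commercially(post, server_default=None):
    """Treat unlicensed or unknown-license posts as not usable commercially."""
    return effective_license(post, server_default) in COMMERCIAL_OK


# Made-up posts: one permissive, one non-commercial, one with no license set.
posts = [
    {"id": 1, "license": "CC-BY-4.0"},
    {"id": 2, "license": "CC-BY-NC-4.0"},
    {"id": 3},
]

allowed = [
    p["id"]
    for p in posts
    if may_use_commercially(p, server_default="CC-BY-NC-SA-4.0")
]
print(allowed)
```

Defaulting to "not allowed" when no license is present is a design choice in this sketch. It would also only bind actors who choose to honor it, since the content itself stays publicly readable.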