Scala compiler engineer for embedded HDLs by profession.

I also trickjump in Quake III Arena as a hobby.

  • 0 Posts
  • 23 Comments
Joined 1 year ago
Cake day: June 13th, 2023


  • Right. The legality of just recording everything in a room, without any consent, is already incredibly dubious at best, so companies aren’t going to risk it. At least with voice dictation or wake words, you have to voluntarily say something or push a button, which signifies your consent to the device recording you.

    There’s another problem with the idea of on-device conversion to a keyword that is sent to Google or Amazon: with constant recording from millions of devices, even the text forms of keywords would still be an infeasible amount of data to process. Discord’s ~200 million active users send almost a billion text messages each day, yet Discord can’t use algorithmic AI to detect hate speech from Nazis or pedophiles approaching vulnerable children; it is simply far too much data to process in a timely manner.

    Amazon has sold 500 million Echo devices, and that’s just Amazon. From an infrastructure standpoint, how is Amazon supposed to process near-24/7 keyword spam from 500 million Echos every single day? Such a solution would also have to be, in theory, infinitely scalable, since the amount of traffic is directly proportional to the number of devices sold and actively used.
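
    To put rough numbers on that, here’s a back-of-envelope sketch in Scala; the per-device event rate and payload size are invented purely for illustration:

    ```scala
    // Back-of-envelope estimate of "keyword spam" traffic from always-on devices.
    // Device count is the Echo figure above; event rate and payload are assumptions.
    object KeywordTrafficEstimate extends App {
      val devices         = 500000000L // Echo devices sold
      val keywordsPerDay  = 2000L      // assumed keyword events per device per day
      val bytesPerKeyword = 64L        // assumed payload: keyword text + metadata

      val eventsPerDay = devices * keywordsPerDay       // 1e12 events/day
      val bytesPerDay  = eventsPerDay * bytesPerKeyword // ~64 TB/day

      println(f"events/day: $eventsPerDay%,d")
      println(f"ingest/day: ${bytesPerDay / 1e12}%.1f TB")
      println(f"sustained:  ${eventsPerDay / 86400.0 / 1e6}%.1f M events/s")
    }
    ```

    Even with these toy numbers, that’s on the order of ten million events per second, sustained, around the clock, and growing with every device sold.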

    It’s just technologically infeasible.


  • Do you have any evidence for this claim? Voice recognition and processing are very power- and energy-intensive, so things like power consumption and heat dissipation should be readily measurable, especially if an app like Google or Amazon is doing it on an effectively constant basis.

    If keywords are being sent to Google, have you sniffed this traffic with Wireshark and analyzed the packets being sent?

    Phones have dedicated hardware for voice processing, true, but that’s when you voluntarily enable it through voice dictation or train it with very specific and optimally chosen key phrases (“Okay Google,” “Hey Siri,” …). Apps that allegedly listen to voice audio constantly would need to be utilizing this hardware continuously. Do you have any evidence that apps like Google continuously utilize this hardware (knowing that it is a power-intensive, heat-inducing process)?

    I’m not trying to argue in bad faith. As an engineer, I’m having trouble architecting, in my head, a surveillance system that would not leave blatantly obvious evidence behind on the device for researchers to collect. These are all questions that I naturally came up with while thinking through the ramifications of your statement. I want to keep an open mind and consider the facts here.


  • What was your methodology? Are you absolutely sure you eliminated all variables that could signal to Google that you wanted whatever you were talking about? Maybe you were talking in the car with your wife about buying something, and she decided to look up prices for it on Google, which triggered their algorithms into associating that thing-of-want with her identity, and then with your identity, since Google likely knows you two are married.

    Mitchollow tried to demonstrate exactly what you’re claiming with a controlled experiment: he would prove that Google listens in on him by saying “dog toys” without clicking on or searching for anything related to dog toys beforehand. What he failed to realize was that:

    1. He livestreamed the whole thing to YouTube, owned by the very conspirators he claimed were listening to him in the first place, so they were already processing his speech (including his repeated claims of needing dog toys) and likely correlating all of that data with his identity.
    2. He directly clicked on a (very likely coincidental, or driven by the data from #1) ad corresponding to his phrase of choice (“dog toys”), triggering the algorithm to exclusively show him dog toy ads on every other page that used AdSense.

    After these flaws were pointed out, he admitted the test was effectively worthless and retracted his claims. The point here is that it’s important to eliminate every variable that could lead to confirmation bias.

    I’ve heard other similar stories of friends allegedly receiving ads after saying specific keywords. One of the best examples of how silly this entire notion is: an avid Magic: The Gathering player was surprised to receive MTG ads after talking about MTG with his MTG-playing friends. He was spooked and claimed that Amazon was listening to his everyday speech.


  • “the point of fluffy blanket is to show you an ad for fluffy blankets, so it can be poorly trained and wildly inaccurate”

    Doesn’t that defeat the whole purpose of listening for key ad words?

    “We’ll come up with dedicated hardware to detect when someone says ‘fluffy blankets,’ but it’s really poor at that, so when someone is talking about cat food, our processing will detect ‘cat food’ as ‘fluffy blanket’ and serve them an ad for fluffy blankets. Oh wait… that means there’s hardly any correlation between what a person says and what ads we serve them. Why don’t we just serve them randomized ads? Why bother with advanced technology to listen to them in the first place?”
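
    To make the absurdity concrete, here’s a toy Scala simulation; the keyword list and accuracy figures are made up, but they show how a sloppy detector collapses into serving effectively random ads:

    ```scala
    import scala.util.Random

    // Toy model of the scenario above: the worse the keyword detector,
    // the less the served ad correlates with what was actually said.
    object SloppyDetectorDemo extends App {
      val rng      = new Random(42)
      val keywords = Vector("fluffy blanket", "cat food", "dog toys", "mtg cards")

      // With probability `accuracy` the detector reports the spoken phrase;
      // otherwise it reports a random keyword ("cat food" -> "fluffy blanket").
      def detect(spoken: String, accuracy: Double): String =
        if (rng.nextDouble() < accuracy) spoken
        else keywords(rng.nextInt(keywords.size))

      for (accuracy <- List(0.95, 0.5, 0.0)) {
        val trials  = 100000
        val matches = (1 to trials).count { _ =>
          val spoken = keywords(rng.nextInt(keywords.size))
          detect(spoken, accuracy) == spoken
        }
        println(f"accuracy=$accuracy%.2f -> ad matches speech ${100.0 * matches / trials}%.1f%% of the time")
      }
    }
    ```

    At 0% accuracy the match rate bottoms out at 25%, exactly what you’d get by picking one of the four keywords at random; the microphone contributed nothing.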


  • In my personal experience, this is blatantly untrue, because now I can’t even log into my Google/YouTube account on Librewolf anymore. I get a prompt saying “this browser may be insecure” and requesting that I use Chrome instead. This is exactly what the Web Environment Integrity API was intended for: maybe they did decide to shelve it for general use, but Google is still absolutely trying to push this bullshit for its own services.

    I never had this issue in the two years I’ve been using Librewolf, until, coincidentally, Google “decided” to “sunset” its browser DRM.


  • OP’s “evidence” is that Kagi internally uses Sentry.io (a FOSS crash-report aggregation service for developers) to report crash logs, which they take as proof that Kagi is aggregating personal data and sending it to Sentry. Their “proof”: they took an Android tool that reports whether an APK contains Java classes whose fully qualified names match a “tracker” name filter (a filter which, coincidentally, cherry-picks Sentry.io as a tracker), ran it on some completely irrelevant Android APK, and concluded that because those classes show up under their cherry-picked filter, Sentry.io is a tracker, ergo Kagi is tracking personal data. Q.E.D.
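
    For context, the entire “detection” step in tools like that boils down to a class-name prefix match. A minimal sketch of the idea (the filter list and class names here are hypothetical stand-ins, not the actual tool’s data):

    ```scala
    // Minimal sketch of how class-name "tracker" scanners work: they only
    // test whether classes matching a name filter are present in the APK.
    object TrackerNameScan extends App {
      // The scanner's heuristic: one fully qualified name prefix per "tracker".
      val trackerFilters = Map(
        "Sentry" -> "io.sentry." // crash reporting, flagged as a "tracker"
      )

      // Class names extracted from some APK's dex files (stand-in data).
      val apkClasses = List(
        "io.sentry.SentryClient",
        "io.sentry.android.core.SentryAndroid",
        "com.example.app.MainActivity"
      )

      // Presence of a matching name is the *entire* test.
      for ((name, prefix) <- trackerFilters)
        if (apkClasses.exists(_.startsWith(prefix)))
          println(s"flagged: $name (classes under $prefix present)")
    }
    ```

    A hit proves only that the library’s classes are bundled in the APK; it says nothing about what data, if any, is actually collected or sent.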

    In short, it’s complete nonsense. I did a thorough debunking of their methodology in a previous comment of mine. You can safely ignore anything they have to say.


  • Anonymity is not the same as privacy, because the latter fundamentally entails a measure of trust between two parties over the control of personally identifying information. Note that this is contingent on whether that personal information is exchanged.

    In the situation you described, privacy is irrelevant in either case, whether you access a SearXNG instance with a VPN/Tor or use a pseudonym and Monero payments to access Kagi, because no personal information is exchanged in the first place.

    The “privacy” in both situations then becomes how difficult it is for a bad actor to deanonymize you, which comes down to whether you can trust that the VPN service you’re using isn’t logging your traffic, that the email service your pseudonym is on won’t just give up your data… or that Tor isn’t being actively deanonymized via malicious exit nodes controlled by certain three-letter government agencies. This isn’t a fault of either search engine, IMO.