Hey everyone,

This isn’t an announcement, I just wanted people’s thoughts on this.

I think everyone knows searching the fediverse could be better. Googling doesn’t work too well, etc. So I wanted to do my part and help out.

Indexing all posts, etc. is quite a lot to handle, so I wanted to start small and just focus on video search. I’ve started indexing videos from PeerTube and other video websites. (Even YouTube, but this could be removed to focus on independent sites.)

I know PeerTube has its own search engine for videos. I will be reaching out to them. Compared to theirs, I’m planning for my site to have other video sources and be easier to use.

So that leads to feedback from you guys.

  • What do you think about indexing videos posted on the fediverse and other independent platforms?
  • Are there similar services?
  • Am I just wasting my time?
  • gabe [he/him]@literature.cafe
    English · 32 upvotes · 8 downvotes · edited · 11 months ago

    Well, please make sure it respects post privacy at least, but also realize that on the microblogging side of the fediverse, they may not take kindly to this prospect at all. People who start these kinds of projects are often harassed or at least receive passive hostility. Making it opt-in instead of opt-out in some capacity is best.

    • Scrubbles@poptalk.scrubbles.tech
      English · 32 upvotes · 3 downvotes · edited · 11 months ago

      I disagree. Post privacy, sure, but the internet is by definition public. Anything you put out there can be used for pretty much anything; the original rules of the internet apply. I’d be happy to see an easy opt-out on the engine to remove yourself, but if everything is opt-in it’ll never get off the ground.

      • TimLovesTech (AuDHD)(he/him)@badatbeing.social
        English · 8 upvotes · 4 downvotes · 11 months ago

        As the fediverse is almost exclusively run by volunteers who pay the server bills and act as admins, I could see some larger instances not taking kindly to this, especially depending on how much stress it would put on servers that are already at capacity.

        • loobkoob@kbin.social
          15 upvotes · 11 months ago

          Ideally, OP’s crawlers will just come from their own instance that other instance owners can defederate from if they want to opt out.

            • Scrubbles@poptalk.scrubbles.tech
              English · 3 upvotes · 11 months ago

              That’s a good idea. Listen to public data being broadcast out, then you aren’t worrying people with scraping or anything. It would only cover posts from go-live onward, but you would just be listening to the protocol.

        • TrickDacy@lemmy.world
          English · 1 upvote · 1 downvote · 11 months ago

          How much bandwidth do you suppose a crawler would use? I’d guess very little.

      • gabe [he/him]@literature.cafe
        English · 6 upvotes · 4 downvotes · 11 months ago

        That’s not how the fediverse functions, and approaching it that way is a problem waiting to happen. I’m saying this as a warning to be mindful of the culture of the fediverse and the way it functions. This is not Reddit. We share the fediverse with other software that has different uses and features, and we need to be mindful of that, especially when building these kinds of tools. Making it opt-out not only places a burden on smaller instances but also presents a potential harassment risk for instances with vulnerable people on other fediverse platforms. It is also contrary to the way certain other ActivityPub instances operate. The fediverse is like a city we share with others; if Lemmy is not mindful of that city’s culture, then people will promptly give it the boot.

        I’m not saying user-by-user opt-in either, but instance by instance. Lemmy especially needs an archiving tool. There are already cultural clashes I see occurring with the rest of the fediverse. Posts like these about potential tools, where it seems like the creator doesn’t know the messy history behind previous projects like them in the fediverse, make me fearful of those clashes coming to fruition.

        • lautan@lemmy.ca (OP)
          English · 12 upvotes · 11 months ago

          Well, that’s why I’m asking for input. And I won’t launch this on every instance without letting them know. Baby steps.

          • Kierunkowy74@kbin.social
            2 upvotes · 11 months ago

            Since version 4.2, Mastodon allows its users to opt into appearing in search results. Just respect this flag for Mastodon users, and you will be fine, IMHO.
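
As a rough illustration of respecting that flag: as far as I know, Mastodon 4.2 and later expose the opt-in as a boolean `indexable` property on the user's ActivityPub actor document. A minimal sketch, assuming the actor JSON has already been fetched and parsed into a dict; the function name `may_index` is hypothetical, and treating a missing flag as "not opted in" is my own conservative choice:

```python
from typing import Any, Mapping


def may_index(actor: Mapping[str, Any]) -> bool:
    """Return True only when the actor explicitly opted into search indexing.

    Assumption: the opt-in is published as a boolean `indexable` property
    on the actor document. Anything else (missing key, string, None) is
    treated as "not opted in".
    """
    return actor.get("indexable") is True
```

A search indexer could call `may_index(actor)` before storing anything from that account and skip the post when it returns False.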

          • gabe [he/him]@literature.cafe
            English · 4 upvotes · 4 downvotes · edited · 11 months ago

            My Matrix is open if you are actually interested in doing this in a way that won’t make the rest of the fediverse flip shit. I support this tool’s creation, especially for Lemmy, but if it isn’t done the right way it’ll be received poorly. Making it behave differently on Lemmy compared to other software might be an idea too.

        • Scrubbles@poptalk.scrubbles.tech
          English · 7 upvotes · 1 downvote · edited · 11 months ago

          But ActivityPub already publishes all of the data out. I don’t think this is going out to servers asking for data, it’s listening to public data being broadcast out. If people are broadcasting over ActivityPub then they’re okay with it being shared.

          If they don’t want it shared then they don’t have to publish ActivityPub to anyone. They can defederate from the search federation. Those tools already exist.

        • 0x1C3B00DA@kbin.social
          0 upvotes · 1 downvote · 11 months ago

          > That’s not how the fediverse functions

          That is how the fediverse functions. Instances send posts to anyone who requests them, unless a block is in place. ActivityPub is opt-out and the web has always worked this way.

          > be mindful of the culture

          There is no “the culture” on the fediverse. You’re talking about a subgroup, which has a different opinion from other subgroups. They don’t get to define “culture” on the fediverse.

      • Skull giver@popplesburger.hilciferous.nl
        English · 2 upvotes · 11 months ago

        Technically speaking, little is stopping anyone from doing this.

        I’m not sure what you’d gain by this, though. You just end up being flagged as a malicious bot and blocked by most instances, and probably hated by most of your audience.

        Post privacy is mostly “please be nice and respect my wishes”. Not doing that is just a dick thing to do.

        Also, depending on how the search engine is set up (personal project? independent organisation? receiving donations?) this could end up falling afoul of the GDPR and other data protection laws.

        Unless you’re in it to make money by selling data, getting rid of the goodwill people have for useful Fediverse services just sounds counterproductive. The whole Fediverse network is a privacy compliance nightmare (why do you think Threads is doing one-way ActivityPub traffic at the moment?) and the only reason half the servers aren’t shut down or fined is that most people don’t dislike any of them enough to take proper legal action.

        • Scrubbles@poptalk.scrubbles.tech
          English · 1 upvote · 11 months ago

          Again, it’s not going to servers and scraping data, it would be sitting somewhere receiving public data that is pushed out. There’s no malicious getting around privacy settings; if it’s pushed out then it’s fair game. I agree about post privacy, but again, ActivityPub already takes care of that.
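
For anyone wondering what "public data that is pushed out" could look like in practice: ActivityPub marks public delivery by addressing an object to the special collection `https://www.w3.org/ns/activitystreams#Public`. A minimal sketch of a filter a listener might apply to what it receives, assuming the object is already parsed into a dict; the names `is_public` and `include_cc` are hypothetical. My understanding is that Mastodon puts the public collection in `to` for public posts and in `cc` for unlisted ones, so a more cautious indexer could pass `include_cc=False`:

```python
from typing import Any, List, Mapping

# The spec also permits the compact forms "as:Public" and "Public".
PUBLIC_IDS = {
    "https://www.w3.org/ns/activitystreams#Public",
    "as:Public",
    "Public",
}


def _as_list(value: Any) -> List[Any]:
    """Addressing fields may be absent, a single string, or a list."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def is_public(obj: Mapping[str, Any], include_cc: bool = True) -> bool:
    """Return True if the object is addressed to the public collection."""
    recipients = _as_list(obj.get("to"))
    if include_cc:
        recipients += _as_list(obj.get("cc"))
    return any(r in PUBLIC_IDS for r in recipients)
```

Anything addressed only to followers or to specific people would fail this check and never reach the index.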