Ideas for a federated anime tracker

asudox@lemmy.world · 6 months ago


muhyb@programming.dev · 6 months ago

Never go for a system where entries are user-submitted and approved by moderators. I was in a project like that years ago, and it’s not gonna work.

I’ll make another comment once I get up.

asudox@lemmy.world · 6 months ago

Is there a good reason why? Even MAL takes user submissions.

muhyb@programming.dev · 6 months ago

It’s a slow system. MAL takes user submissions because they already have a big database, and those submissions help fill the cracks. If you don’t have a database to begin with, it slows things down, especially if submissions are accepted on a trust basis. Accepting them from random users is also very risky.

However, I’m not completely opposed to that idea, because it can be helpful in some areas. My suggestion is to start with a basic dog tag system: anime name, alternative names, status (airing, completed), season and start date, studios (also licensors and producers), age rating, etc. This information should be scraped, since that’s the fastest way to build a database quickly. It’s publicly available (even on Wikipedia), so it should be fine to scrape. Once you have the names, you could even go full Wikipedia for the rest. At this stage, user submissions could be useful for the introduction / summary parts of the titles.

For names only (and basic tags), you can scrape AniDB from this list. It’s just a search query, so it shouldn’t be against their ToS.
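To make the "dog tag" idea concrete, here is a minimal sketch of such a record in Python, with one field per item listed above. The field names, the `AiringStatus` values, and the `all_names` helper are hypothetical choices for illustration, not a schema anyone in the thread agreed on.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Optional


class AiringStatus(Enum):
    # Only the two statuses mentioned above; a real tracker may need more.
    AIRING = "airing"
    COMPLETED = "completed"


@dataclass
class TitleTag:
    """Hypothetical minimal "dog tag" for one title."""
    name: str
    status: AiringStatus
    alternative_names: list[str] = field(default_factory=list)
    season: Optional[str] = None          # e.g. "Spring 2024"
    start_date: Optional[date] = None
    studios: list[str] = field(default_factory=list)
    licensors: list[str] = field(default_factory=list)
    producers: list[str] = field(default_factory=list)
    age_rating: Optional[str] = None      # free-form rating label
    summary: Optional[str] = None         # left empty for later user submissions

    def all_names(self) -> list[str]:
        """Primary name followed by alternative names (useful for search)."""
        return [self.name, *self.alternative_names]
```

Keeping the summary optional matches the suggestion that scraped tags come first and user-written introductions are filled in later.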

You can also check Kitsu for ideas; I like their DB request system. It’s pretty basic, but it could be done differently with the power of ActivityPub.

asudox@lemmy.world · 6 months ago (edited)

I see. Then I can use the Jikan API to scrape anime and manga, which should take approximately 1-2 days once I get approval. Oh, and I forgot to mention: the federation part isn’t really how people think it will be, I guess. The only things that will be federated are reviews, threads, and the comments in them, with every anime/manga/VN, etc. being its own community containing those threads and reviews. Because of that, I don’t really know if this project is something people would want to self-host. I guess I could provide full dumps of the database every month or so, but I suppose that would be expensive. Then there are images as well, which will easily take hundreds of GBs, even compressed.
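A minimal sketch of a paginated scrape like the one described. It assumes Jikan’s v4 `/anime?page=N` listing and a `pagination.has_next_page` flag in each JSON response; check both against the current Jikan docs. The fetch function is injectable so the paging logic can be tested offline, and the one-second delay is a conservative guess, not Jikan’s documented rate limit.

```python
import json
import time
import urllib.request
from typing import Callable, Iterator

# Assumption: Jikan v4 paginated listing endpoint; verify against the docs.
JIKAN_ANIME_URL = "https://api.jikan.moe/v4/anime?page={page}"


def fetch_json(url: str) -> dict:
    """Fetch one page over HTTP and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_entries(
    fetch: Callable[[str], dict] = fetch_json,
    url_template: str = JIKAN_ANIME_URL,
    delay_seconds: float = 1.0,
) -> Iterator[dict]:
    """Yield every entry across all pages, pausing between requests."""
    page = 1
    while True:
        payload = fetch(url_template.format(page=page))
        yield from payload.get("data", [])
        # Stop when the response says there is no further page.
        if not payload.get("pagination", {}).get("has_next_page", False):
            break
        page += 1
        time.sleep(delay_seconds)
```

Because it is a generator, entries can be written to the database as they arrive, so an interrupted multi-day run does not lose everything fetched so far.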

muhyb@programming.dev · 6 months ago

Noticed the edit:

For hosting images, you could use an alternative like some Lemmy sites do: automatically mirror everything to the Internet Archive.

I think people would want to self-host, because they would get all the anime/manga titles with their communities out of the box and could moderate their own sites, while their users interact with any other community via federation.

asudox@lemmy.world · 6 months ago

Wouldn’t the Internet Archive be a bit slow? Also, I don’t want to stress their servers.

muhyb@programming.dev · 6 months ago

That’s a noble concern. I only gave it as an example since some communities do that, but yeah, it would probably be better not to.

I also mentioned this in another reply, but you can ask about this in a selfhosted community and will probably get a well-optimized answer on that issue.

asudox@lemmy.world · 6 months ago

Thanks.

muhyb@programming.dev · 6 months ago

Didn’t know about the Jikan API. After a quick look at their docs, I think it should be a steady source for scraping.

As for features, a lot can be done with ActivityPub. Of course, the most wanted features would be a watchlist / episode tracker (and possibly importing from the lists people already have; I switched to Kitsu from MAL that way), but just thinking about federating all the anime/manga titles, each with its own community out of the box, sounds great. Good luck with the project!
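Importing existing lists is mostly a parsing problem. Below is a minimal sketch that reads a MAL-style XML list export into watchlist entries. The element names (`anime`, `series_animedb_id`, `series_title`, `my_watched_episodes`, `my_status`) are an assumption about MAL’s export format and should be checked against a real export file; `WatchlistEntry` is a hypothetical type.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class WatchlistEntry:
    mal_id: int
    title: str
    watched_episodes: int
    status: str  # kept as the raw export string, e.g. "Completed"


def parse_mal_export(xml_text: str) -> list[WatchlistEntry]:
    """Parse a MAL-style XML export (assumed element names) into entries."""
    root = ET.fromstring(xml_text)
    entries = []
    for anime in root.findall("anime"):
        entries.append(
            WatchlistEntry(
                # `or` guards against missing or empty elements.
                mal_id=int(anime.findtext("series_animedb_id") or 0),
                title=(anime.findtext("series_title") or "").strip(),
                watched_episodes=int(anime.findtext("my_watched_episodes") or 0),
                status=(anime.findtext("my_status") or "").strip(),
            )
        )
    return entries
```

Keeping the MAL ID on each entry lets imported rows be matched directly against a database scraped through Jikan, which uses the same IDs.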

asudox@lemmy.world · 6 months ago (edited)

Yeah, though I am not sure how admins of federated instances would react. I am planning for every anime, manga, VN, etc. to have its own community. That means over 100k communities being created at once. Maybe creating each community when the first user activity happens would be a better fit. But that would also prevent users on other platforms from commenting on obscure or new entries until someone on the anime tracking platform comments on or reviews them.
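The lazy option can be sketched as a get-or-create on first activity. This is a hypothetical in-memory stand-in (`CommunityRegistry`, `post_thread`) for what would really be a database table; a real server would also announce the new community over ActivityPub at the moment it is created.

```python
from dataclasses import dataclass, field


@dataclass
class Community:
    title_id: int
    name: str
    threads: list[str] = field(default_factory=list)


class CommunityRegistry:
    """Creates a title's community only when the first thread is posted."""

    def __init__(self, titles: dict[int, str]):
        self._titles = titles                      # title_id -> display name
        self._communities: dict[int, Community] = {}

    def get_or_create(self, title_id: int) -> Community:
        if title_id not in self._titles:
            raise KeyError(f"unknown title {title_id}")
        community = self._communities.get(title_id)
        if community is None:
            # First activity for this title: create (and, in a real
            # server, federate) the community now.
            community = Community(title_id, self._titles[title_id])
            self._communities[title_id] = community
        return community

    def post_thread(self, title_id: int, text: str) -> Community:
        community = self.get_or_create(title_id)
        community.threads.append(text)
        return community

    def created_count(self) -> int:
        return len(self._communities)
```

With 100k+ titles in the database, only the handful that see activity ever become communities, which is the trade-off described above.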

Thanks btw.

muhyb@programming.dev · 6 months ago

Both have ups and downs. Assuming these lists will be in the code, do you have an estimate of how big they would be? If you think they won’t bloat the code, just go with it. Storing them in JSON and loading them when needed could be better for optimization, though.
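One reading of "store them in JSON and load them when needed" is a file outside the source code that is read on first use and cached afterwards. A minimal sketch, assuming a hypothetical `titles.json` holding a JSON array of title objects:

```python
import json
from functools import lru_cache
from pathlib import Path

# Hypothetical location of the title list, kept out of the source code.
TITLES_PATH = Path("titles.json")


@lru_cache(maxsize=None)
def load_titles(path: str = str(TITLES_PATH)) -> tuple:
    """Read the title list on first call; later calls reuse the cached result."""
    with open(path, encoding="utf-8") as f:
        # A tuple so the cached value is not accidentally appended to.
        return tuple(json.load(f))
```

The list stays out of the code entirely, and nothing is read from disk until some request actually needs it.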

You could also get the best of both worlds: don’t create the communities beforehand, but make the titles in the database searchable by all users. That might mean more traffic on the hosting side, but it should be OK since it will be spread across all the self-hosted instances.
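Searching titles before any community exists only needs the names. The sketch below matches a query against primary and alternative names, ignoring case and accents. The linear scan is for illustration only; a real deployment would presumably use the database’s full-text index. `search_titles` and its input shape are hypothetical.

```python
import unicodedata


def normalize(text: str) -> str:
    """Case- and accent-insensitive form used for matching."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


def search_titles(titles: dict[int, list[str]], query: str, limit: int = 10) -> list[int]:
    """Return IDs of titles whose primary or alternative name contains the query."""
    q = normalize(query)
    hits = []
    for title_id, names in titles.items():
        if any(q in normalize(name) for name in names):
            hits.append(title_id)
            if len(hits) >= limit:
                break
    return hits
```

A hit could then be turned into a community with a get-or-create step, so communities still only appear once someone actually uses them.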

I think you can also ask some of your questions in selfhosted communities.

asudox@lemmy.world · 6 months ago

What do you mean by “lists in the code”? Which lists? And by “in the code”, do you mean hardcoded?

iopq@lemmy.world · 6 months ago

I think moderator approval is a bit slow

asudox@lemmy.world · 6 months ago

It is, but I have no other choice if I can’t scrape websites.

iopq@lemmy.world · 6 months ago

It could work like this: anyone can post, but if a post gets downvoted, it’s hidden, Reddit style.
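The Reddit-style rule amounts to a score threshold. A minimal sketch; the net-score formula and the `-3` cutoff are illustrative assumptions, not values anyone proposed.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    text: str
    upvotes: int = 0
    downvotes: int = 0

    @property
    def score(self) -> int:
        return self.upvotes - self.downvotes


def visible(submissions: list[Submission], hide_below: int = -3) -> list[Submission]:
    """Keep submissions whose net score has not dropped below the threshold.

    The default of -3 is an arbitrary example value.
    """
    return [s for s in submissions if s.score >= hide_below]
```

New submissions start at a score of zero and stay visible, so nothing waits on a moderator; the community hides bad entries after the fact.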

asudox@lemmy.world · 6 months ago

That seems like a good idea. I’ll keep it in mind.

Rimu@piefed.social · 6 months ago

You might want to discuss this in one of the communities at https://ani.social.

asudox@lemmy.world · 6 months ago

I’ll crosspost there later.

gomp@lemmy.ml · 6 months ago

“but this most likely is against the ToS of every anime tracking website”

AFAIK, scraping publicly accessible websites is fine in most countries (IANAL, look into it).
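Separate from the legal question, a polite scraper usually honours a site’s robots.txt. A minimal sketch using Python’s standard `urllib.robotparser` on an already-downloaded file; the user agent string is hypothetical. Note that robots.txt only says what a site asks crawlers to do; it does not settle copyright or ToS questions.

```python
from urllib import robotparser


def allowed_by_robots(robots_lines: list[str], user_agent: str, url: str) -> bool:
    """Check a URL against the rules of an already-downloaded robots.txt."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)
```

For example, with the rules `User-agent: *` and `Disallow: /private/`, a page under `/anime/` is allowed and anything under `/private/` is not.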

asudox@lemmy.world · 6 months ago

OK, cool. So with that out of the way, are the images copyrighted? I’d like to download the images and host them on my own servers.

Fisch@discuss.tchncs.de · 6 months ago

Somehow those anime trackers are allowed to use the images too

asudox@lemmy.world · 6 months ago

I guess I can download them and host them in the EU, since this project won’t be commercial.

unknowing8343@discuss.tchncs.de · 6 months ago

If you did this for all audiovisual content and not just anime I’d be all in.

asudox@lemmy.world · 6 months ago

I’m pretty sure that I or someone else could just fork this project once it’s done and make a few changes to turn it into an audiovisual content tracker.

Temperche@slrpnk.net · 6 months ago

I think the biggest challenge will be standardizing the scraping process, since databases for audiovisual content are structured very differently. It will probably require API access and negotiations database by database, so that you know how each one is structured; completely FOSS audiovisual content databases are rare.
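A common way to cope with differently structured sources is one small adapter per source, each mapping raw records onto a shared schema. In the sketch below, `source_a`, `source_b`, and the raw keys each adapter reads are hypothetical placeholders, not real API fields.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class NormalizedEntry:
    source: str
    source_id: str
    title: str
    start_year: Optional[int]


def from_source_a(raw: dict) -> NormalizedEntry:
    # Hypothetical source with a flat record and an ISO date string.
    aired = raw.get("aired_from") or ""
    year = int(aired[:4]) if aired[:4].isdigit() else None
    return NormalizedEntry("source_a", str(raw["id"]), raw["title"], year)


def from_source_b(raw: dict) -> NormalizedEntry:
    # Hypothetical source with nested names and a bare year field.
    return NormalizedEntry("source_b", str(raw["uid"]), raw["name"]["primary"], raw.get("year"))


ADAPTERS: dict[str, Callable[[dict], NormalizedEntry]] = {
    "source_a": from_source_a,
    "source_b": from_source_b,
}


def normalize_record(source: str, raw: dict) -> NormalizedEntry:
    """Dispatch a raw record to the adapter for its source."""
    return ADAPTERS[source](raw)
```

Supporting a new database then means writing one more adapter, while everything downstream only ever sees `NormalizedEntry`.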

asudox@lemmy.world · 6 months ago

I can see that being a problem.

pedroapero@lemmy.ml · 6 months ago (edited)

I had a try at Bookwyrm, which seems similar to what you intend to build. I was disappointed because the database itself is federated, which means it’s full of duplicate entries.

asudox@lemmy.world · 6 months ago (edited)

Correct. I thought federation would have some way to prevent those duplicates, but apparently no such check is done, so it’s all duplicates. For that reason, I won’t put any effort into making the database federated. Only threads, reviews, and comments will be federated. This might change as I build it, though. There are also clubs on MAL, which I might copy and implement in this project, and they could be federated as well.

typhoon@lemmy.world · 6 months ago

Decentralizing the database in a federated structure for anime tracking is a very good idea. Right now I’m using Anitrend, which is open source but is only an interface to Anilist.

Like you pointed out, I think the major challenge will be establishing solid policies for adding new shows to the database. Not sure how we could manage that effectively. Governance will be key, but you also don’t want to be held hostage by this project for the rest of your life.

One aspect to take into consideration is privacy. I think a lot of people would appreciate having access to the new tracker without having to share anything about themselves.

asudox@lemmy.world · 6 months ago (edited)

Oh, well, no. The database won’t be decentralized; that just invites chaos. Bookwyrm did that, and now there are lots of duplicates in their library, which I definitely want to avoid. The things that will be federated are each anime’s forums (as communities, like in Lemmy), and those communities will have threads that are either forum threads or reviews, which others can comment on from any federated platform. Some other things might get federated in the future.

I decided to just scrape either MAL alone or multiple sources.

If you have any ideas on how the database could be decentralized while efficiently avoiding duplicates and spam, please share them.
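One possible answer for the duplicate half of the problem (not spam) is a deterministic dedup key that every instance computes the same way before accepting a federated entry. The sketch prefers a shared external ID (a MAL ID, as an assumption) and falls back to an aggressively normalized title plus start year. `FederatedCatalog` is hypothetical.

```python
import re
import unicodedata
from typing import Optional


def title_key(title: str) -> str:
    """Title with accents, case, punctuation and spacing removed."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[\W_]+", "", no_accents.casefold())


def dedup_key(title: str, start_year: Optional[int], mal_id: Optional[int] = None) -> str:
    """Prefer a shared external ID; fall back to normalized title + start year."""
    if mal_id is not None:
        return f"mal:{mal_id}"
    return f"title:{title_key(title)}:{start_year}"


class FederatedCatalog:
    """Accepts an incoming entry only if no existing entry has the same key."""

    def __init__(self) -> None:
        self.entries: dict[str, dict] = {}

    def accept(self, entry: dict) -> bool:
        key = dedup_key(entry["title"], entry.get("start_year"), entry.get("mal_id"))
        if key in self.entries:
            return False  # duplicate: keep the entry we already have
        self.entries[key] = entry
        return True
```

One known gap: an entry sent with an ID and the same title sent without one produce different keys, so instances would need to agree on always attaching IDs when they exist.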

Temperche@slrpnk.net · 6 months ago

https://www.kenmei.co/ may be close to what you want for manga.

asudox@lemmy.world · 6 months ago

No, have you read the body text? I am planning to create an anime/manga tracker, I’m not looking for one.

Temperche@slrpnk.net · 6 months ago

I meant that you could take UX/UI ideas from it.

asudox@lemmy.world · 6 months ago

Oh sorry.