Yesterday, I finished the process of moving all my active repositories to my Codeberg page. For something like five years, I had been using GitHub (where, by the way, all my old repositories are still archived, but will no longer be updated or maintained as mirrors). I figured I'd write a post on tirimid.net about why I did this, how I did it, the difficulties I faced, and some opinions related to the whole ordeal.
There was one main factor that particularly irked me about GitHub and the direction it's been headed for the past few years. Namely, I didn't want all my content to be chewed and regurgitated as training fodder for Large Language Models (LLMs). Generally, my stance on AI has been quite anti- for a considerable while, and I do find it bothersome when my code and writing are used in this manner. While — obviously — I never expect to solve this problem, I still prefer to reduce its surface area as much as possible.
Previously, I used a static site setup of GitHub pages (for hosting) with
Cloudflare (for a custom domain). The compromising part of this setup — with
regards to LLM training — is GitHub. Cloudflare actually has quite good AI
crawl control, although I didn't realize that until quite recently. With a
Cloudflare-managed robots.txt and AI crawl control, the detected number of AI
crawlers accessing tirimid.net has noticeably decreased; the number went from
dozens within a 24-hour period to the low single digits. As of writing, only
one crawler (GPTBot) has tried to access this website in the last 24 hours. And,
by the way, it was rejected.
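For anyone who wants to see what a robots.txt actually tells a given crawler, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content below is a made-up example, not the real Cloudflare-managed file, and `blocked_agents` is a hypothetical helper name. Keep in mind that robots.txt is only a request to crawlers; actually rejecting one, as described above, happens on Cloudflare's side.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only; the file that
# Cloudflare manages for a real site will look different.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def blocked_agents(robots_txt: str, agents: list[str], url: str = "/") -> list[str]:
    """Return the user agents that the given robots.txt forbids from fetching url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    # GPTBot is disallowed by the sample file; the generic browser agent is not.
    print(blocked_agents(SAMPLE_ROBOTS_TXT, ["GPTBot", "Mozilla/5.0"]))
```

To check a live site instead, `RobotFileParser` also offers `set_url()` and `read()`, which fetch the file over the network.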
So, Cloudflare, well done on this matter. The same cannot be said for GitHub, which has not only failed to give users control over AI access, but has effectively mandated that access through its Terms of Service, which make this clear throughout without ever stating it explicitly. The ToS say things like: "You grant us and our legal successors the right to store, archive, parse [note the use of that word], and display Your Content, and make incidental copies, as necessary to provide the Service, including improving the Service over time" (D.5); "You retain all moral rights to Your Content that you upload, publish, or submit to any part of the Service, including the rights of integrity and attribution. However, you waive these rights and agree not to assert them against us" (D.7); and so on. It's quite clever, to be honest, but reading these terms gives you the clear impression that they'll definitely be able to wriggle out of any potential legal action regarding AI training. Oh, and also, GitHub Copilot is a thing, and obviously it's trained on GitHub users' code, whether they explicitly consented or not.
This means that textual content hosted on GitHub Pages, even if not accessible to LLM crawlers directly through the internet, is probably read by GitHub anyway, since they have direct access to your public repositories (the legal terms are a bit different for private ones). That makes it a bad place to host things that you'd rather keep for human eyes.
In comparison, Codeberg has at least signalled its willingness to keep these crawlers/scrapers in check through traffic control and other means. Also, the non-profit nature of the organization and its donation-oriented funding model are considerably less perverse (in terms of incentives) than those of GitHub, a for-profit private sub-corporation (part of Microslop).
Apart from that main factor, there's also the whole privacy invasion thing. GitHub is proprietary and part of one of the most wretched, invasive, capitalistically monopolistic organs that the Americans have ever created (again, Microslop). All this I oppose politically and personally. Anything that invades my privacy gets significant reconsideration, and I usually excise it from my life if an alternative exists (unfortunately, I haven't de-Googled as of now, but that's definitely on the prospective docket).
Basically, I hate everything that GitHub has come to stand for.
Maybe it is. I'm under no illusion that ditching GitHub for future development and blogging is a cure-all against the AI scourge and general invasion of privacy. We already live in the epoch of corporate "do first; beg for forgiveness later". However, short of significant, systemic revolution, this will not be resolved; the best we can do in the meantime is guard ourselves in whatever minuscule ways are possible.
There were some difficulties, but, honestly, I think I made it harder than I needed to. In particular, I had some considerable challenges with getting the tirimid.net domain to work with the Codeberg static site. But, before that, what did I do?
When you log in to Codeberg, there's actually a "New migration" button behind the "+" at the top.
I literally just used this to migrate all my repositories (except for a few,
which I wouldn't even bother hosting online if I could avoid it, like school
projects). Among the last repositories I migrated was tirimid/tirimid.net (the
name of the GitHub repository, prior to its deletion). When I did, I
struggled (to be honest, it probably didn't help that I was at school when I
decided to do all this). First I couldn't figure out what to call it:
tirimid/tirimid-net? tirimid/pages? Something else? Do I make a pages
branch? When that was settled, I just could not, for the life of me, figure out how
to set up the DNS from the Cloudflare dashboard. In the end, I went with this
setup:
| Record type | Name | Content | Proxy status | TTL |
| A | tirimid.net | 217.197.91.145 | DNS only | Auto |
| AAAA | tirimid.net | 2001:67c:1401:20f0::1 | DNS only | Auto |
| CNAME | www | tirimid.net | DNS only | Auto |
| TXT | tirimid.net | "tirimid.codeberg.page" | DNS only | Auto |
<revision>
My website went down for a little while, and I was curious why. It turns out that the configuration I use (A, AAAA, and TXT records) becomes invalid whenever the Codeberg pages server changes IPs. I fixed this by setting the IPs to 217.197.84.141 and 2a0a:4580:103f:c0de::2. If you are having similar issues, try looking up the current Codeberg pages server IPs, as they may have changed again.
</revision>
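Since hard-coded A and AAAA records can silently go stale like this, a small check can flag the problem before the site goes down. The following is only a sketch under stated assumptions: it assumes the pages server's current addresses can be found by resolving the hostname `codeberg.page` (check Codeberg's documentation to confirm), and `resolve_ips` and `stale_records` are hypothetical helper names.

```python
import ipaddress
import socket
from typing import Callable, Iterable

# Assumption: this hostname resolves to the current Codeberg pages server.
PAGES_HOST = "codeberg.page"


def normalize(addresses: Iterable[str]) -> set[str]:
    """Put IPv4/IPv6 strings into canonical form so they compare reliably."""
    return {str(ipaddress.ip_address(addr)) for addr in addresses}


def resolve_ips(host: str) -> set[str]:
    """Resolve host to its current addresses using the system resolver."""
    infos = socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return {info[4][0] for info in infos}


def stale_records(
    configured: Iterable[str],
    resolver: Callable[[str], Iterable[str]] = resolve_ips,
    host: str = PAGES_HOST,
) -> set[str]:
    """Return the configured addresses that host no longer resolves to."""
    return normalize(configured) - normalize(resolver(host))
```

Calling `stale_records({"217.197.84.141", "2a0a:4580:103f:c0de::2"})` with the default resolver performs a real DNS lookup and returns an empty set if both records are still current. Passing the resolver as a parameter keeps the comparison logic testable without network access. Note that a machine without IPv6 connectivity may not get AAAA results back, in which case an IPv6 record could be reported as stale even though it is fine.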
I also had some trouble with the .domains file, and finally decided upon:
Keep in mind that it took this setup about 15-20 minutes to actually start working. I think one of the main reasons I was suffering so much (despite the above being a commonly suggested setup) is that I kept changing something, not giving it time to update or do whatever, and assuming it was failing. To be fair, many of the configurations I tried were probably invalid, so maybe I was justified? Well, it doesn't really matter anymore. It works now.
Principally, nothing. tirimid.net now runs on Codeberg pages rather than GitHub pages, I've elaborated some of my opinions, and hopefully you've enjoyed reading them. Thanks for reading — and what do you think?
This work by tirimid is licensed under CC BY-SA 4.0