Arcology Project Scaffolding
Dev Environment
Python Project
The pyproject.toml file is slowly starting to consume all of the different configuration files a Python project needs. That's nice.
[project]
name = "arcology"
version = "0.0.1"
description = "org-mode metadata query engine, publishing platform, and computer metaprogrammer"
# license = "Hey Smell This"
readme = "README.md"
dependencies = [
"django ~= 4.2", "django-stubs", "django-prometheus",
"click ~=8.1", "polling", "arrow ~= 1.3.0", "gunicorn ~= 21.0", "django-htmx ~= 1.17",
"arroyo"
]
requires-python = ">=3.10"
authors = [
{ name = "Ryan Rix", email = "code@whatthefuck.computer" }
]
[project.scripts]
"arcology" = "arcology:django_manage"
[tool.setuptools]
package-dir = {"" = "."}
[tool.setuptools.package-data]
arcology = [
'settings/sites.json',
'static/arcology/js/*',
'static/arcology/css/*',
'static/arcology/fonts/*',
'templates/arcology/*',
'templates/*',
]
sitemap = [
'static/sitemap/js/*',
'static/sitemap/css/*',
'templates/sitemap/*',
]
[tool.setuptools.packages.find]
where = ["."]
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"
Nix package for the service
nix build will spit out a Python project that can be used in a NixOS definition. Now where would we get one of those…? It's marked with licenses.unfree right now because I don't think Hey Smell This will pass the OSI sniff-test.
{
pkgs ? import <nixpkgs> {},
lib ? pkgs.lib,
python3,
arroyo_rs,
}:
python3.pkgs.buildPythonPackage rec {
pname = "arcology";
version = "0.0.1";
format = "pyproject";
src = ./.;
nativeBuildInputs = with pkgs; [];
propagatedBuildInputs = (with pkgs; [
arroyo_rs
]) ++ (with python3.pkgs; [
arrow
click
django_4
django-prometheus
django-htmx
(django-stubs-ext.override { django = django_4; })
(django-stubs.override { django = django_4; })
gunicorn
polling
setuptools
]);
passthru.gunicorn = python3.pkgs.gunicorn;
meta = with lib; {
description = "An org-mode site engine";
homepage = "https://engine.arcology.garden/";
license = licenses.unfree;
maintainers = with maintainers; [ rrix ];
};
}
Dev Environment
nix develop or nix-shell will set you up with an environment that has the Python programming dependencies available.
{ pkgs ? import <nixpkgs> {},
python3 ? pkgs.python3,
arroyo_rs ? pkgs.callPackage /home/rrix/org/arroyo/default.nix {},
}:
let
myPython = python3.withPackages( pp: with pp; [
pip
pytest
mypy
arrow
arroyo_rs
django_4
django-prometheus
django-htmx
(django-stubs-ext.override { django = django_4; })
(django-stubs.override { django = django_4; })
gunicorn
polling
]);
in pkgs.mkShell {
packages = (with pkgs; [
maturin
myPython
pyright
black]);
RUST_SRC_PATH = "${pkgs.rust.packages.stable.rustPlatform.rustLibSrc}";
NIX_CONFIG = "builders =";
shellHook = ''
PYTHONPATH=${myPython}/${myPython.sitePackages}
'';
}
A Flake to tie everything together and make it possible to run remotely
Nix is really going in this direction. I'm not sure it's worthwhile, but I'm going to see how to adapt to this world. It should be possible to nix run a few apps to be able to operate the arcology.
{
description = "Arcology Site Engine, Django Edition";
inputs.nixpkgs.follows = "arroyo_rs/nixpkgs";
inputs.flake-utils.url = "github:numtide/flake-utils";
inputs.arroyo_rs.url = "git+https://code.rix.si/rrix/arroyo";
outputs = { self, nixpkgs, flake-utils, arroyo_rs }:
flake-utils.lib.eachDefaultSystem (system:
let
pkgs = import nixpkgs {
inherit system;
config.allowUnfree = true;
};
python3 = pkgs.python3;
arroyo = arroyo_rs.packages.${system}.default;
in
{
devShells.default = pkgs.callPackage ./shell.nix {
inherit python3;
arroyo_rs = arroyo;
};
packages = rec {
arcology = pkgs.callPackage ./default.nix {
inherit python3;
arroyo_rs = arroyo;
};
inherit arroyo;
default = arcology;
};
apps = rec {
arcology = flake-utils.lib.mkApp {
drv = self.packages.${system}.arcology;
exePath = "/bin/arcology";
};
# he he he
arroyo = flake-utils.lib.mkApp {
drv = self.packages.${system}.arroyo;
exePath = "/bin/arroyo";
};
default = arcology;
};
}
);
}
NEXT Expose NixOS modules and home-manager modules here to aid re-bootstrap
Direnv
direnv fucking rules.
use flake
Gitignore
arcology.egg-info
__pycache__
venv
result
.direnv
db.sqlite3
Django bootstraps
import os
import sys

def django_manage():
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "arcology.settings")
    from django.core.management import execute_from_command_line
    execute_from_command_line(sys.argv)
This, plus the [project.scripts] entry in pyproject.toml, lets you just type arcology watchfiles to invoke a manage.py command.
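As a rough sketch of what that [project.scripts] line means: an installer turns the string "arcology:django_manage" into "import the module arcology, then call its django_manage attribute". The standard library's importlib.metadata.EntryPoint can show the same resolution. The resolve_script helper and the json:dumps target are assumptions made only so the example runs without the arcology package installed.

```python
from importlib.metadata import EntryPoint


def resolve_script(name: str, target: str):
    """Resolve a 'module:attribute' script target to the callable it names.

    Hypothetical helper for illustration; installers generate a small
    launcher script that does the equivalent import and call.
    """
    entry = EntryPoint(name=name, value=target, group="console_scripts")
    return entry.load()


# A stdlib target stands in for "arcology:django_manage" here.
dumps = resolve_script("demo", "json:dumps")
print(dumps({"ok": True}))

# The project's own entry splits into a module and an attribute the same way.
arcology_entry = EntryPoint(
    name="arcology",
    value="arcology:django_manage",
    group="console_scripts",
)
print(arcology_entry.module, arcology_entry.attr)
```

The launcher only needs django_manage to be importable from the top-level arcology package, which is why the function lives in the package itself.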
These are generated scaffolds for now. Basically, manage.py and -m arcology do the same thing, and that is annoying, but I'll fix it some day.
#!/nix/store/c3cjxhn73xa5s8fm79w95d0879bijp04-python3-3.10.13/bin/python
"""Django's command-line utility for administrative tasks."""
import os
import sys


def main():
    """Run administrative tasks."""
    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'arcology.settings')
    try:
        from django.core.management import execute_from_command_line
    except ImportError as exc:
        raise ImportError(
            "Couldn't import Django. Are you sure it's installed and "
            "available on your PYTHONPATH environment variable? Did you "
            "forget to activate a virtual environment?"
        ) from exc
    execute_from_command_line(sys.argv)


if __name__ == '__main__':
    main()
"""
ASGI config for arcology project.
It exposes the ASGI callable as a module-level variable named ``application``.
For more information on this file, see
https://docs.djangoproject.com/en/3.2/howto/deployment/asgi/
"""
import os
from django.core.asgi import get_asgi_application
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "arcology.settings")
application = get_asgi_application()
"""
WSGI config for arcology project.
It exposes the WSGI callable as a module-level variable named ``application``.
For more information on this file, see
https://docs.djangoproject.com/en/3.2/howto/deployment/wsgi/
"""
import os
from django.core.wsgi import get_wsgi_application
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "arcology.settings")
application = get_wsgi_application()
Middlewares
User-Agent break-down
The AgentClassification enumeration class takes a User-Agent header and maps it to one of a handful of groups, which a user can extend by adding rows to the table. AgentClassification.from_request(request) returns a member of the enumeration whose value is a string; this is probably useful for labeling metrics or site statistics. Rows are checked in table order, so the first matching substring wins.
User Agent Substring | Enumeration |
---|---|
prometheus | INTERNAL |
feediverse | INTERNAL |
Synapse | MATRIX |
Element | MATRIX |
SubwayTooter | APP |
Dalvik | APP |
Nextcloud-android | APP |
Pleroma | FEDIVERSE |
Mastodon/ | FEDIVERSE |
Akkoma | FEDIVERSE |
Friendica | FEDIVERSE |
FoundKey | FEDIVERSE |
MissKey | FEDIVERSE |
CalcKey | FEDIVERSE |
gotosocial | FEDIVERSE |
Epicyon | FEDIVERSE |
feedparser | FEED |
granary | FEED |
Tiny Tiny RSS | FEED |
Go_NEB | FEED |
Gwene | FEED |
Feedbin | FEED |
NetNewsWire | FEED |
FreshRSS | FEED |
SimplePie | FEED |
Elfeed | FEED |
inoreader | FEED |
Reeder | FEED |
Miniflux | FEED |
Bot | BOT |
bot | BOT |
Poduptime | BOT |
aiohttp | AUTOMATION |
python-requests | AUTOMATION |
Go-http-client | AUTOMATION |
curl/ | AUTOMATION |
wget/ | AUTOMATION |
keybase-proofs/ | AUTOMATION |
InternetMeasurement | CRAWLER |
CensysInspect | CRAWLER |
scaninfo@paloaltonetworks.com | CRAWLER |
SEOlyt/ | CRAWLER |
Sogou web spider/ | CRAWLER |
Chrome/ | BROWSER |
Firefox/ | BROWSER |
DuckDuckGo/ | BROWSER |
Safari/ | BROWSER |
Opera/ | BROWSER |
ddg_android/ | BROWSER |
from __future__ import annotations
import logging
from typing import List
from enum import Enum

logger = logging.getLogger(__name__)


class AgentClassification(str, Enum):
    NO_UA = "no-ua"
    UNKNOWN = "unknown"
    <<make_enum()>>

    def __str__(self):
        return self.value

    @classmethod
    def from_request(cls, request) -> AgentClassification:
        user_agent = request.headers.get("User-Agent")
        # A missing header and an empty header both count as "no user agent".
        if not user_agent:
            return cls.NO_UA

        <<agent_classifier()>>

        # logger.warn is a deprecated alias; use logger.warning.
        logger.warning(f"Unknown User-Agent: {user_agent}")
        return cls.UNKNOWN
(thread-last
  tbl
  (mapcar (pcase-lambda (`(,substring ,enum)) enum))
  (-uniq)
  (mapcar (lambda (enum) (format "%s = \"%s\"\n" enum (downcase enum))))
  (apply #'concat))

(thread-last
  tbl
  (mapcar (pcase-lambda (`(,substring ,enum))
            (concat "if '" substring "' in user_agent:" "\n"
                    "    return cls." enum "\n")))
  (apply #'concat))
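For illustration, here is roughly what the tangled module might look like once the two noweb references are expanded, using only four rows of the table (prometheus, Miniflux, bot, Chrome/). The fake_request helper is hypothetical and only stands in for a Django request object; the real class would carry every row.

```python
import logging
from enum import Enum
from types import SimpleNamespace

logger = logging.getLogger(__name__)


class AgentClassification(str, Enum):
    NO_UA = "no-ua"
    UNKNOWN = "unknown"
    # What make_enum would emit for the four rows used in this sketch.
    INTERNAL = "internal"
    FEED = "feed"
    BOT = "bot"
    BROWSER = "browser"

    def __str__(self):
        return self.value

    @classmethod
    def from_request(cls, request) -> "AgentClassification":
        user_agent = request.headers.get("User-Agent")
        if not user_agent:
            return cls.NO_UA
        # What agent_classifier would emit, in table order: first match wins.
        if 'prometheus' in user_agent:
            return cls.INTERNAL
        if 'Miniflux' in user_agent:
            return cls.FEED
        if 'bot' in user_agent:
            return cls.BOT
        if 'Chrome/' in user_agent:
            return cls.BROWSER
        logger.warning("Unknown User-Agent: %s", user_agent)
        return cls.UNKNOWN


def fake_request(user_agent=None):
    """Hypothetical stand-in for a Django request; only .headers is used."""
    headers = {} if user_agent is None else {"User-Agent": user_agent}
    return SimpleNamespace(headers=headers)


crawler = fake_request("Mozilla/5.0 (compatible; Googlebot/2.1) Chrome/120.0")
print(AgentClassification.from_request(crawler))
```

Because the bot rows come before the browser rows in the table, a crawler that also advertises Chrome/ is labeled as a bot and not as a browser.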
File-backed HTML/Atom cache
I got away with using functools.lru_cache with the FastAPI prototype because uvicorn was single-process, but now we're deploying a WSGI app on multi-process gunicorn, so the memory that the lru_cache writes to is not shared between the processes[1]. I don't think getting the Arcology to work as ASGI Django is worth the trouble; there would be too many multi-colored functions duplicated between the sync workers and the async workers.
There are currently a handful of hot cache points in the code-base, and they're all caching big huge strings. Django's cache framework solves all of this handily, but it doesn't provide a memoizing decorator. It's easy enough to write our own, let's see:
I want to do this:
from arcology.cache_decorator import cache

@cache(key_prefix="local_test")
def gimme(hk):
    return "hello, world!"

gimme(1)
Writing a wrapper like this is sort of funny to look at, so let's step through it.
Consider the @cache(key_prefix="local_test") invocation above.
- That calls the outer-most function cache below, which returns the un-evaluated function return_decoration with some configuration variables in scope.
- The decorator system then invokes that function, passing the gimme function in to it.
- That returns a wrapper function when evaluated, which is the thing that is actually invoked when gimme(1) is invoked.
- The inner wrapper function calculates a cache key similarly to functools.lru_cache and checks the Django cache to see if there's anything matching that key. On a hit it returns the cached value; otherwise it stores and returns the value of the original gimme function.
If it makes more sense, it may be helpful to think of the @ in the code as applying whatever the expression after it evaluates to. If that expression is a naked function, gimme is passed straight to it, but if you say @cache(), then cache() is called first and gimme is decorated with its return value, which is another wrapper function.
All this nesting is necessary to pass arguments in to the decorator, and to have access to the inner function's arguments to calculate the hash key.
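The layering can be made visible with a toy version that records when each layer runs. This is only a sketch: a plain dict stands in for the Django cache, and the calls list and the gimme_plain / gimme_explicit names are assumptions added for the demonstration.

```python
calls = []  # records the order in which each layer runs


def cache(key_prefix=""):
    calls.append("cache() called")

    def return_decoration(func):
        calls.append(f"return_decoration({func.__name__}) called")
        store = {}  # stand-in for the Django cache

        def wrapper(*args):
            calls.append(f"wrapper{args} called")
            key = f"{key_prefix}/{hash(args)}"
            if key not in store:
                store[key] = func(*args)
            return store[key]

        return wrapper

    return return_decoration


@cache(key_prefix="local_test")
def gimme(hk):
    return "hello, world!"


# The decorated definition above is equivalent to this spelled-out form.
def gimme_plain(hk):
    return "hello, world!"


gimme_explicit = cache(key_prefix="local_test")(gimme_plain)

result = gimme(1)
for line in calls:
    print(line)
```

Both cache() and return_decoration run once at definition time; only wrapper runs on each call to gimme.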
import logging

from django.core.cache import caches

logger = logging.getLogger(__name__)


def cache(key_prefix="", cache_connection="default", expire_secs=600):
    def return_decoration(func):
        def wrapper(*args, **kwargs):
            # Look up the requested backend instead of hard-coding "default".
            backend = caches[cache_connection]
            # Fold keyword arguments into the key; every argument must be hashable.
            key = args
            for k, v in kwargs.items():
                key += (k, v)
            cache_key = f"{key_prefix}/{hash(key)}"
            ret = backend.get(cache_key)
            if ret is None:
                logger.debug(f"cache_miss {cache_key}")
                ret = func(*args, **kwargs)
                backend.set(cache_key, ret, expire_secs)
            else:
                logger.debug(f"cache_hit {cache_key}")
            return ret
        return wrapper
    return return_decoration
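One caveat with hash(): Python randomizes string hashes each time an interpreter starts unless PYTHONHASHSEED is set. As far as I understand it, keys built with hash() match among workers forked from one gunicorn master, but entries in a persistent cache stop being found after a restart. A possible alternative, sketched under the assumption that every argument has a deterministic repr, is to digest the arguments with hashlib. The stable_cache_key name is hypothetical.

```python
import hashlib


def stable_cache_key(key_prefix, args, kwargs):
    """Build a cache key that is identical in every process and after restarts.

    Hypothetical helper: assumes repr() of each argument is deterministic,
    which holds for strings, numbers and tuples of them.
    """
    # Sort keyword arguments so f(a=1, b=2) and f(b=2, a=1) share a key.
    material = repr((args, sorted(kwargs.items())))
    digest = hashlib.sha256(material.encode("utf-8")).hexdigest()
    return f"{key_prefix}/{digest}"


key_one = stable_cache_key("local_test", (1,), {"a": 2, "b": 3})
key_two = stable_cache_key("local_test", (1,), {"b": 3, "a": 2})
print(key_one == key_two)
```

Inside wrapper this would replace the hash(key) line, at the cost of a slightly longer key.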
NEXT I need to make sure to write some code to do cache-invalidation before it becomes a problem, too…
could also just use systemd-tmpfiles..!
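If the cache lives on disk through Django's file-based backend, part of the cleanup can be left to the backend's own options. The fragment below is a sketch of such a settings entry, not the project's actual configuration: the LOCATION path and the numbers are assumptions.

```python
# settings.py fragment: a sketch, not the project's real settings.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.filebased.FileBasedCache",
        # Hypothetical directory; it must be writable by every gunicorn worker.
        "LOCATION": "/var/cache/arcology/django_cache",
        # Default lifetime in seconds when set() is called without a timeout.
        "TIMEOUT": 600,
        "OPTIONS": {
            # Once this many entries exist, the backend culls some of them.
            "MAX_ENTRIES": 1000,
            # A value of 3 culls one third of the entries at a time.
            "CULL_FREQUENCY": 3,
        },
    }
}
```

This bounds the size of the cache directory but does not invalidate entries whose source page changed, so explicit invalidation is still needed.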
[1] Maybe some day the GIL won't get in the way, alas.