arcology/arcology.org

1371 lines
44 KiB
Org Mode

:PROPERTIES:
:ID: arcology/django/arcology-models
:END:
#+TITLE: The Arcology's Data Models and Web Server
#+filetags: :Project:Arcology:
#+ARCOLOGY_KEY: arcology/webserver
#+ARCOLOGY_ALLOW_CRAWL: t
* Data Models for Sites, Web Features, and Feeds
:PROPERTIES:
:ID: 20240204T234334.762591
:END:
#+begin_src python :tangle arcology/models.py
from __future__ import annotations
from typing import Optional, List
from django.db import models
from django.conf import settings
from django_prometheus.models import ExportModelOperationsMixin as EMOM
import arrow
import arroyo.arroyo_rs as native
from arcology.cache_decorator import cache
import roam.models
import logging
logger = logging.getLogger(__name__)
logger.setLevel(logging.WARN)
# used for some memoization
class hashabledict(dict):
def __hash__(self):
return hash(tuple(sorted(self.items())))
#+end_src
** Site
A =Site= has many =SiteDomain='s. It has a routing key, and a title, and some CSS and customization. There are a few helper classmethods to take an input request or routing key and output a Site object based on the SiteDomain or whatnot. I'm not sure I want the =async= definitions to stick around, there needs to be some consideration of *what* should be =async= in this system and where asgi can be relied on for concurrency.
Sites are created in the [[id:20231217T154835.232283][Arcology Seed Command]].
#+begin_src python :tangle arcology/models.py
# Sites and SiteDomains are created in django-admin or a seed rather than from arroyo parser, no create_from_arroyo..!
class Site(EMOM('site'), models.Model):
key = models.CharField(max_length=512, primary_key=True)
title = models.CharField(max_length=512)
# add choices
css_file = models.CharField(max_length=512, blank=True, default=None)
# this is used in sitemap, and maybe links..
link_color = models.CharField(max_length=8, blank=True, default=None)
def urlize_page(self, page: Page, heading: Optional[roam.models.Heading] = None):
domain = self.sitedomain_set.first().domain
key_rest = page.route_key.split("/", 1)[1]
url = f"https://{domain}/{key_rest}"
if heading is not None:
url = url + f"#{heading.node_id}"
return url
def urlize_feed(self, feed: Feed):
domain = self.sitedomain_set.first().domain
key_rest = feed.route_key.split("/", 1)[1]
url = f"https://{domain}/{key_rest}"
return url
@classmethod
def from_route(cls: Site, route_key: str) -> Site:
site_key = route_key.split("/")[0]
site = cls.objects.get(key=site_key)
assert site is not None
return site
@classmethod
def from_request(cls: Site, request) -> Site:
host = request.headers.get("Host")
site = cls.objects.filter(sitedomain__domain=host).first()
assert site is not None
return site
class SiteDomain(EMOM('site_domain'), models.Model):
site = models.ForeignKey(
Site,
on_delete=models.CASCADE,
)
domain = models.CharField(max_length=512)
#+end_src
*** Base migration
#+name: migration-site
#+begin_src python
migrations.CreateModel(
name="Site",
fields=[
(
"key",
models.CharField(max_length=512, primary_key=True, serialize=False),
),
("title", models.CharField(max_length=512)),
(
"css_file",
models.CharField(blank=True, default=None, max_length=512),
),
(
"link_color",
models.CharField(blank=True, default=None, max_length=8),
),
],
),
migrations.CreateModel(
name="SiteDomain",
fields=[
(
"id",
models.BigAutoField(
auto_created=True,
primary_key=True,
serialize=False,
verbose_name="ID",
),
),
(
"site",
models.ForeignKey(
on_delete=django.db.models.deletion.CASCADE, to="arcology.site"
),
),
("domain", models.CharField(default="localhost", max_length=512)),
],
),
#+end_src
** Page
A site has many pages. Pages have a routing key defined by the =ARCOLOGY_KEY= keyword, a title based on the level-0 heading, and some metadata besides that.
These are created using the [[roam:create_from_arroyo][=create_from_arroyo=]] pattern which makes it easy for the [[id:20231217T154857.983742][Arcology ingest_files Command]] to include new functionality in to the system.
#+begin_src python :tangle arcology/models.py
class Page(EMOM('page'), models.Model):
file = models.ForeignKey(
roam.models.File,
on_delete=models.CASCADE,
)
route_key = models.CharField(max_length=512, primary_key=True)
root_heading = models.ForeignKey(roam.models.Heading, on_delete=models.CASCADE)
site = models.ForeignKey(
Site,
on_delete=models.CASCADE,
)
title = models.CharField(max_length=512)
allow_crawl = models.BooleanField(default=False)
def to_url(self):
site = self.site
return site.urlize_page(self)
def to_url_path(self):
key_rest = self.route_key.split("/", 1)[1]
return f"/{key_rest}"
def collect_keywords(self):
return self.file.keyword_set
def collect_tags(self):
return [
tag
for heading in self.file.heading_set.all()
for tag in heading.tag_set.all()
]
def collect_references(self):
return [
reference
for heading in self.file.heading_set.all()
for reference in heading.reference_set.all()
]
def collect_links(self):
my_headings = self.file.heading_set.all()
link_objs = self.file.outbound_links.all()
ret = {
h.node_id: h.to_url() for h in my_headings
}
for el in link_objs:
try:
h = el.dest_heading
url = h.to_url()
ret[h.node_id] = url
logger.info(f"link {url} from {el}")
except roam.models.Heading.DoesNotExist:
logger.info(f"{el} does not have dest")
return ret
def collect_backlinks(self) -> List[Link]:
my_headings = self.file.heading_set.all()
return set(roam.models.Link.objects.filter(dest_heading__in=my_headings))
def to_html(self, links, heading=None, include_subheadings=False):
return self._to_html_memoized(hashabledict(links), heading, include_subheadings, self.file.digest)
@cache(key_prefix="page_html", expire_secs=60*60*24*7)
def _to_html_memoized(self, links, heading, include_subheadings, _file_digest):
if heading is not None:
headings = [heading]
else:
headings = []
opts = native.ExportOptions(
link_retargets=links,
limit_headings=headings,
include_subheadings=include_subheadings,
ignore_tags=settings.IGNORED_ROAM_TAGS,
)
return native.htmlize_file(self.file.path, opts)
@classmethod
def create_from_arroyo(cls, doc: native.Document) -> Page:
f = roam.models.File.objects.get(path=doc.path)
route_key = next(iter(doc.collect_keywords("ARCOLOGY_KEY")), "")
allow_crawl = (
next(iter(doc.collect_keywords("ARCOLOGY_ALLOW_CRAWL")), False) is not False
)
site = Site.from_route(route_key)
root_heading = f.heading_set.filter(level=0)[0]
title = root_heading.title or ""
return cls.objects.get_or_create(
file=f,
route_key=route_key,
allow_crawl=allow_crawl,
site=site,
root_heading=root_heading,
title=title,
)[0]
#+end_src
*** Base migration:
#+name: migration-page
#+begin_src python
migrations.CreateModel(
name="Page",
fields=[
(
"route_key",
models.CharField(max_length=512, primary_key=True, serialize=False),
),
("title", models.CharField(max_length=512)),
("allow_crawl", models.BooleanField(default=False)),
(
"file",
models.ForeignKey(
on_delete=django.db.models.deletion.CASCADE, to="roam.file"
),
),
(
"root_heading",
models.ForeignKey(
on_delete=django.db.models.deletion.CASCADE, to="roam.heading"
),
),
(
"site",
models.ForeignKey(
on_delete=django.db.models.deletion.CASCADE, to="arcology.site"
),
),
],
),
#+end_src
** Feed
Pages can define an Atom feed + [[id:20230125T143144.011175][Feediverse]] feeds by tagging a page with =ARCOLOGY_FEED= keyword and making sure the headings have a =PUBDATE= an =ID= property. This feature relies on Pandoc right now, I'll need to write a custom Atom exporter in [[id:20231023T115950.248543][The arroyo_rs Native Org Parser]] when it comes time to implement these feeds.
These are also created using the [[roam:create_from_arroyo][=create_from_arroyo=]] pattern which makes it easy for the [[id:20231217T154857.983742][Arcology ingest_files Command]] to include new functionality in to the system.
#+begin_src python :tangle arcology/models.py
class Feed(EMOM('feed'), models.Model):
POST_VISIBILITY = [
("unlisted", "Unlisted"),
("private", "Private"),
("public", "Public"),
("direct", "direct"), # might be different, XXX
]
file = models.ForeignKey(
roam.models.File,
on_delete=models.CASCADE,
)
route_key = models.CharField(max_length=512, primary_key=True)
site = models.ForeignKey(
Site,
on_delete=models.CASCADE,
)
title = models.CharField(max_length=512)
visibility = models.CharField(max_length=512, choices=POST_VISIBILITY)
def url(self):
return self.site.urlize_feed(self)
@classmethod
def create_from_arroyo(cls, doc: native.Document) -> Feed | None:
route_key = next(iter(doc.collect_keywords("ARCOLOGY_FEED")), None)
if not route_key:
return None
visibility = next(
iter(doc.collect_keywords("ARCOLOGY_TOOT_VISIBILITY")), "private"
)
f = roam.models.File.objects.get(path=doc.path)
site = Site.from_route(route_key)
root_heading = f.heading_set.filter(level=0)[0]
title = root_heading.title
return cls.objects.get_or_create(
file=f,
route_key=route_key,
title=title,
visibility=visibility,
site=site,
)[0]
@classmethod
async def aget(cls, **kwargs):
return await cls.objects.prefetch_related("file", "site").aget(
**kwargs
)
#+end_src
*** Base migration
#+name: migration-feed
#+begin_src python
migrations.CreateModel(
name="Feed",
fields=[
(
"route_key",
models.CharField(max_length=512, primary_key=True, serialize=False),
),
("title", models.CharField(max_length=512)),
(
"visibility",
models.CharField(
choices=[
("unlisted", "Unlisted"),
("private", "Private"),
("public", "Public"),
("direct", "direct"),
],
max_length=512,
),
),
(
"file",
models.ForeignKey(
on_delete=django.db.models.deletion.CASCADE, to="roam.file"
),
),
(
"site",
models.ForeignKey(
on_delete=django.db.models.deletion.CASCADE, to="arcology.site"
),
),
],
),
#+end_src
** FeedEntry
A FeedEntry is a Heading with a PUBDATE property that exists on a page w/ ARCOLOGY_FEED Keyword. These are used to construct =Feeds=
#+begin_src python :tangle arcology/models.py
class FeedEntry(EMOM('feed_entry'), models.Model):
POST_VISIBILITY = [
("unlisted", "Unlisted"),
("private", "Private"),
("public", "Public"),
("direct", "direct"), # might be different, XXX
]
heading = models.ForeignKey(
roam.models.Heading,
on_delete=models.CASCADE,
)
feed = models.ForeignKey(
Feed,
on_delete=models.CASCADE,
)
route_key = models.CharField(max_length=512)
site = models.ForeignKey(
Site,
on_delete=models.CASCADE,
)
title = models.CharField(max_length=512)
visibility = models.CharField(max_length=512, choices=POST_VISIBILITY)
pubdate = models.DateTimeField(auto_now=False)
def to_html(self, links):
return self._to_html_memoized(hashabledict(links), self.heading.path.digest)
@cache(key_prefix="feedentry_html", expire_secs=60*60*24*7)
def _to_html_memoized(self, links, _file_digest):
opts = native.ExportOptions(
link_retargets=links,
limit_headings=[self.heading.node_id],
include_subheadings=True,
ignore_tags=settings.IGNORED_ROAM_TAGS,
)
return native.htmlize_file(self.heading.path.path, opts)
@classmethod
def create_from_arroyo(cls, doc: native.Document) -> List[Feed] | None:
route_key = next(iter(doc.collect_keywords("ARCOLOGY_FEED")), None)
if not route_key:
return None
visibility = next(
iter(doc.collect_keywords("ARCOLOGY_TOOT_VISIBILITY")), "private"
)
site = Site.from_route(route_key)
# f = roam.models.File.objects.get(path=doc.path)
feed = Feed.objects.get(route_key=route_key)
rets = []
for nheading in doc.headings:
if nheading.id is not None:
heading = roam.models.Heading.objects.get(node_id=nheading.id)
pdqs = heading.headingproperty_set.filter(keyword="PUBDATE")
if not pdqs.exists():
continue
v = pdqs.first().value
pubdate = arrow.get(v, "YYYY-MM-DD ddd H:mm").format(arrow.FORMAT_RFC3339)
title = heading.title
rets += [cls.objects.get_or_create(
heading=heading,
feed=feed,
route_key=route_key,
title=title,
pubdate=pubdate,
visibility=visibility,
site=site,
)[0]]
# root_heading = f.heading_set.filter(level=0)[0]
# title = root_heading.title
return rets
#+end_src
** Database Migrations
#+begin_src python :tangle arcology/migrations/__init__.py
#+end_src
*** =0001_base=
These are assembled from the snippets described in the models above.
#+begin_src python :tangle arcology/migrations/0001_base.py :noweb yes
# Generated by Django 4.2.6 on 2023-12-18 02:46
from django.db import migrations, models
import django.db.models.deletion
class Migration(migrations.Migration):
replaces = [("arcology", "0001_initial"), ("arcology", "0002_sitedomain_domain")]
dependencies = [
("roam", "0005_alter_link_dest_heading"),
]
operations = [
<<migration-site>>
<<migration-page>>
<<migration-feed>>
]
#+end_src
** NEXT admin
dont worry too much about these;
they are just used to validate that the data is ingested properly, to be honest.
#+begin_src python :tangle arcology/admin.py
from django.contrib import admin
import arcology.models
class DomainInline(admin.TabularInline):
model = arcology.models.SiteDomain
@admin.register(arcology.models.Site)
class SiteAdmin(admin.ModelAdmin):
inlines = [DomainInline]
@admin.register(arcology.models.Page)
class PageAdmin(admin.ModelAdmin):
pass
@admin.register(arcology.models.Feed)
class FeedAdmin(admin.ModelAdmin):
pass
@admin.register(arcology.models.FeedEntry)
class FeedEntryAdmin(admin.ModelAdmin):
list_display = ["heading", "route_key", "pubdate", "title"]
#+end_src
* The Web Server
These are the route [[https://docs.djangoproject.com/en/3.2/topics/http/urls/][urlpatterns]]:
#+begin_src python :tangle arcology/urls.py
from django.contrib import admin
from django.urls import path, re_path, include
from django.conf import settings
from arcology import views
urlpatterns = [
path("admin/", admin.site.urls),
path("", views.index),
path("robots.txt", views.robots, name="robots_txt"),
path("404", views.unpublished, name="page_not_found"),
path("sites.css", views.site_css, name="site-css"),
path("feeds.json", views.feed_list, name="feed-list"),
path("", include("django_prometheus.urls")),
path("", include("sitemap.urls")),
# ensure these ones are last because they're greedy!
re_path("(?P<key>[0-9a-zA-Z/_\-]+\.xml)", views.feed, name="feed"),
re_path("(?P<key>[0-9a-zA-Z/_\-\.]+)", views.org_page, name="org-page"),
]
if settings.ARCOLOGY_ENVIRONMENT != "production":
urlpatterns = [
path("api/v1/", include("localapi.urls")),
] + urlpatterns
#+end_src
This is the topmatter for the views described below:
#+begin_src python :tangle arcology/views.py
import logging
from django.http import HttpResponse, HttpResponseNotFound, Http404
from django.shortcuts import render, get_object_or_404
from arcology.models import Page, Feed, Site
from roam.models import Link
from prometheus_client import Counter, Histogram
logger = logging.getLogger(__name__)
#+end_src
** =GET /= site index
this will just call the Org Page rendering function for the site's index page. =render_page= is defined below.
#+begin_src python :tangle arcology/views.py
def index(request):
site = Site.from_request(request)
full_key = f"{site.key}/index"
return render_page(request, site, full_key)
#+end_src
** Arcology Org Page handler
:PROPERTIES:
:ID: 20240202T144002.656093
:END:
:LOGBOOK:
- State "INPROGRESS" from [2023-12-20 Wed 17:48]
:END:
This constructs a page key from the request, tries to load that page and its HTML, and renders that along with a bunch of other metadata stored in relation to the =Page= object in the DB.
#+begin_src python :tangle arcology/views.py
def org_page(request, key):
site = Site.from_request(request)
if site.key == "localhost":
full_key = key
new_site_key = key.split("/")[0]
site = Site.objects.filter(key=new_site_key).first()
else:
full_key = f"{site.key}/{key}"
return render_page(request, site, full_key)
#+end_src
This =render_page= function is shared between the =index= request and the more complicated route handler.
It's manually instrumented with a few [[https://prometheus.github.io/client_python/][Prometheus Client]] counters and gauges to be emitted on top of what comes out of =django-prometheus= already. This extra instrumentation is just enough to make a per-site and per-page hit chart, along with some very rudimentary [[id:20240213T120603.921365][User-Agent break-down]] to filter out most of the automated traffic.
#+begin_src python :tangle arcology/views.py
page_counter = Counter("arcology_page", "Hit counter for each page", ["site", "page", "status", "agent_type"])
render_latency = Histogram("arcology_page_render_seconds", "Latency for render_page func.", ["page", "site", "agent_type"])
from arcology.agent_utils import AgentClassification
from django.template import loader
def render_page(request, site, full_key):
agent = AgentClassification.from_request(request)
with render_latency.labels(page=full_key, site=site.key, agent_type=agent).time():
try:
the_page = Page.objects.get(route_key=full_key)
except Page.DoesNotExist:
page_counter.labels(page=full_key, status=404, site=site.key, agent_type=agent).inc()
template = loader.get_template("404.html")
context = dict(
missing_key=full_key
)
return HttpResponseNotFound(
template.render(context, request)
)
links = the_page.collect_links()
page_html = the_page.to_html(links)
feeds = site.feed_set.all()
page_counter.labels(page=full_key, status=200, site=site.key, agent_type=agent).inc()
return render(request, "arcology/page.html", dict(
site=site,
page=the_page,
feeds=feeds,
head_title=f"{the_page.title} - {site.title}",
html_content=page_html,
backlinks=the_page.collect_backlinks(),
keywords=the_page.collect_keywords().all(),
references=the_page.collect_references(),
tags=the_page.collect_tags(),
))
#+end_src
*** =arcology/page.html= extends =app.html= to embed the Org page and its metadata
:PROPERTIES:
:ID: 20240226T174503.655394
:ROAM_ALIASES: "Arcology Page HTML Template"
:END:
The =page= template extends the app template defined below, which provides four blocks to inject content in to:
#+begin_src jinja2 :tangle arcology/templates/arcology/page.html
{% extends "arcology/app.html" %}
#+end_src
The tab title is assembled from the page and site title:
#+begin_src jinja2 :tangle arcology/templates/arcology/page.html
{% block title %}{{ head_title }}{% endblock %}
#+end_src
If the site has any feeds, they're injected in to the =<head>= along with any particular web-crawler rules.
#+begin_src jinja2 :tangle arcology/templates/arcology/page.html
{% block extra_head %}
{% for feed in feeds %}
<link rel="alternate" type="application/atom+xml" href="{{ feed.url }}" title="{{ feed.title }}" />
{% endfor %}
{% if page.allow_crawl is none or page.allow_crawl is '"nil"' %}
<meta name="robots" content="noarchive noimageindex noindex nofollow"/>
{% else %}
<meta name="robots" content=""/>
{% endif %}
{% endblock %}
#+end_src
The main =content= block contains the =<main>= generated by the native parser, and a sidebar containing backlinks, and page metadata, and other crap.
#+begin_src jinja2 :tangle arcology/templates/arcology/page.html
{% load cache %}
{% block content %}
{# HTML is sent through without HTML Escaping via | safe #}
{{ html_content | safe }}
{% cache 604800 sidebar page.file.digest %}
<section class="sidebar">
{% if backlinks|length > 0 %}
<div class="backlinks">
<h3>Pages Linking Here</h3>
<ul class="backlinks">
{% for backlink in backlinks %}
<li>{{ backlink.to_backlink_html|safe }}</li>
{% endfor %}
</ul>
</div>
{% endif %}
{% if tags|length > 0 %}
<div class="tags">
<h3>Page Tags</h3>
<ul class="tags">
{% for tag in tags %}
<li><a href="/tags/{{ tag.tag }}">{{tag.tag}}</a></li>
{% endfor %}
</ul>
</div>
{% endif %}
{% if references|length > 0 %}
<div class="references">
<h3>External References</h3>
<ul class="references">
{% for ref in references %}
<li><a target="_blank" href="{{ ref.ref }}">{{ref.ref}}</a></li>
{% endfor %}
</ul>
</div>
{% endif %}
{% if keywords|length > 0 %}
<div class="keywords">
<h3>Page Metadata Keywords</h3>
<ul class="keywords">
{% for keyword in keywords %}
<pre>#+{{ keyword.keyword }}: {{ keyword.value }}</pre>
{% endfor %}
</ul>
</div>
{% endif %}
</section>
{% endcache %}
{% endblock %}
#+end_src
Here's a really simple 404 template, too.
#+begin_src jinja2 :tangle arcology/templates/404.html
{% extends "arcology/app.html" %}
{% block title %}Page Not Found{% endblock %}
{% block h1 %}<h1>Page Not Found</h1>{% endblock %}
{% block content %}
<section>
<p>
The page you tried to open either has not been written by the
author or the author has chosen to not publish it at this
time. Please contact the author and include the URL of both the
page you clicked the link on, as well as the link you&apos;d like
to read. You may just want
to <a href="javascript:history.back()">Go Back</a>, too.
</p>
<p>
If you&apos;re interested in a particular reference, you might of
course have more luck using a public search engine
like <a href="https://duckduckgo.com">DuckDuckGo</a>
or <a href="https://kagi.com">Kagi</a>.
</p>
<pre>MISSING KEY = {{ missing_key }}</pre>
</section>
{% endblock %}
#+end_src
*** Org Page-specific CSS Stylings
:PROPERTIES:
:ID: 20240226T174517.235275
:ROAM_ALIASES: "Arcology Page CSS Files"
:END:
Most of the page CSS is defined below as part of the =app.html=, but the content-specific CSS is here, nearer the actual implementation of the flexbox above.
#+begin_src css :tangle arcology/static/arcology/css/app.css :mkdirp yes
.content {
margin-left: auto;
margin-right: auto;
padding: 1em;
padding-top: 0;
display: flex;
flex-flow: row wrap;
max-width: 120ch;
}
.content > section, main {
display: inline-block;
flex-grow: 1;
flex-shrink: 1;
flex-basis: 40em;
padding: 1em;
overflow: auto;
}
.content > section.sidebar {
flex-grow: 0;
flex-shrink: 1;
flex-basis: 30ch;
}
#+end_src
The sidebar itself is a vertical flexbox, pushing everything but the backlinks towards the bottom of the page.
#+begin_src css :tangle arcology/static/arcology/css/app.css :mkdirp yes
section.sidebar {
display: flex;
flex-flow: column wrap;
}
section.sidebar > div.backlinks {
flex-grow: 1;
}
#+end_src
Here are some [[https://medium.com/@massimo.cassandro/flexbox-separators-b284d6d7b747][hacks]] to put a line between the main content flexbox and the sidebar. I'm not sure I'll keep this, but it's nice to have a delimeter.
#+begin_src css :tangle arcology/static/arcology/css/app.css :mkdirp yes
.content::before {
align-self: stretch;
content: '';
border: 1px dotted var(--medium-gray);
margin-top: 1em;
margin-bottom: 1em;
}
.content > *:first-child {
order: -1;
}
#+end_src
And some simple image wrangling:
#+begin_src css :tangle arcology/static/arcology/css/app.css :mkdirp yes
.content img {
display: block;
width: 80%;
margin: 0 auto;
}
#+end_src
These rules annotate task headings by inserting an icon before them.
#+begin_src css :tangle arcology/static/arcology/css/app.css :mkdirp yes
.task.task-DONE::before {content: '☑️ ';}
.task.task-NEXT::before {content: '🆕 ';}
.task.task-INPROGRESS::before {content: '🔜 ';}
.task.task-WAITING::before {content: '⌚ ';}
.task.task-CANCELLED::before {content: '☒ ';}
#+end_src
This will display the header arguments to =org-babel= source blocks: You're staring right at one!
#+begin_src css :tangle arcology/static/arcology/css/app.css :mkdirp yes
span.babel-args {
text-align: right;
display: block;
background: var(--light-gray);
margin-bottom: 0;
}
pre.src {
border-top: 1px solid var(--black);
background-color: var(--light-gray);
font-style: normal;
overflow: scroll;
margin-top: 0;
tab-size: 3ch;
padding-top: 1em;
padding-left: 0.5em;
padding-bottom: 1em;
padding-right: 0.5em;
}
#+end_src
** Atom Feed Handler
:PROPERTIES:
:ID: 20240204T234814.612917
:END:
:LOGBOOK:
- State "INPROGRESS" from "NEXT" [2024-02-04 Sun 23:48]
:END:
This uses the sub-feature of the HTML exporter to export only certain sub-headings in [[id:20231023T115950.248543][The arroyo_rs Native Org Parser]]. The =FeedEntry='s defined above are used to construct the feed. I do some gnarly stuff including just stuffing a custom Django template filter in to there so that I can keep a bunch of =node ID= -> =$thing= maps so that when I make the feed entries I can just reach in to a few dicts instead of shaping that all on the handler. But [[roam:仕方がない][仕方がない]]...
#+begin_src python :tangle arcology/views.py
import arrow
import roam.models
def feed(request, key):
# Get the site and construct the route key
site = Site.from_request(request)
if site.key == "localhost":
full_key = key
new_site_key = key.split("/")[0]
site = Site.objects.filter(key=new_site_key).first()
else:
full_key = f"{site.key}/{key}"
# Fetch page metadata
the_feed = get_object_or_404(Feed, route_key=full_key)
entries = the_feed.feedentry_set.order_by("-pubdate").all()[:10]
if len(entries) == 0:
return Http404()
try:
page_author = roam.models.Keyword.objects.get(keyword="AUTHOR", path=the_feed.file).value
except roam.models.Keyword.DoesNotExist:
logger.warn(f"Feed {key} does not have an AUTHOR!")
page_author = "Arcology User"
page_url = the_feed.file.page_set.first().to_url()
updated_at = arrow.get(entries[0].pubdate).format(arrow.FORMAT_RFC3339) # entries is already sorted
# node-id -> URL
links = the_feed.file.page_set.first().collect_links()
# node-id -> HTML
html_map = {
entry.heading.node_id: entry.to_html(links=links) for entry in entries
}
# node-id -> PUBDATE heading property
pubdate_map = {
entry.heading.node_id: arrow.get(entry.pubdate).format(arrow.FORMAT_RFC3339) for entry in entries
}
# return HttpResponse("",content_type="application/atom+xml")
return render(request, "arcology/feed.xml", dict(
title=the_feed.title,
page_url=page_url,
author=page_author,
updated_at=updated_at,
feed_entries=entries,
htmls=html_map,
pubdates=pubdate_map,
links=links,
), content_type="application/atom+xml")
#+end_src
An Atom feed is pretty simple, it's an XML document with multiple =<entry>='s and the metadata we collected above. For once i'm glad that Python templating treats strings as HTML-Unsafe and escapes the generated HTML used in the Summary for me. This bit me in the past, with the FastAPI version -- the stuff that goes inside of =type = "html"= elements isn't necessarily valid XML so it needs to get escaped.
#+begin_src jinja2 :tangle arcology/templates/arcology/feed.xml :mkdirp yes
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>{{ title }}</title>
<link href="{{ page_url }}"/>
<updated>{{ updated_at }}</updated>
<author>
<name>{{ author }}</name>
</author>
<id>{{ page_url }}</id>
{% for entry in feed_entries %}
<entry>
<title>{{ entry.title }}</title>
<link href="{{ links | get_item:entry.heading.node_id }}"/>
<id>urn:uid:{{ entry.heading.node_id }}</id>
<updated>{{ pubdates | get_item:entry.heading.node_id }}</updated>
<summary type="html">{{ htmls | get_item:entry.heading.node_id }}</summary>
</entry>
{% endfor %}
</feed>
#+end_src
*** NEXT add category/tags to the entries
*** NEXT move this function to somewhere else more reasonable
This template relies on this custom Django template i [[https://stackoverflow.com/questions/8000022/django-template-how-to-look-up-a-dictionary-value-with-a-variable][nicked from StackOverflow]] to access a dict with a variable key.
#+begin_src python :tangle arcology/views.py
from django.template.defaulttags import register
@register.filter
def get_item(dictionary, key):
return dictionary.get(key)
#+end_src
*** CANCELLED [#A] see if the IDs are consistent with the old generator
:LOGBOOK:
- State "CANCELLED" from "NEXT" [2024-02-26 Mon 17:46]
:END:
** 404 unpublished/not found endpoint
There are plenty of links inside the Arcology which aren't meant to be clicked. =roam:= stub links will of course
#+begin_src python :tangle arcology/views.py
def unpublished(request):
key = request.GET.get("key")
if key is None:
key = "NOT_SUPPLIED"
# query links etc to create a JSON doc for SigmaJS
template = loader.get_template("404.html")
context = dict(
missing_key=key
)
return HttpResponseNotFound(
template.render(context, request)
)
#+end_src
** =GET /robots.txt= Endpoint
[[https://en.wikipedia.org/wiki/Robots.txt][robots.txt]] is the [[roam:Robots Exclusion Protocol]], a standard used by websites to indicate to visiting web crawlers and other web robots which portions of the website they are allowed to visit.
- Disallow all GPT-alikes on all pages, I will add more to this list as necessary. Probably will pull these in to [[id:arcology/django/config][Arcology Project Configuration]] sooner or later.
- Show all pages with a truthy =ARCOLOGY_ALLOW_CRAWL= [[id:20240204T234111.701754][=roam.models.Keyword=]]
- If we're on local development, it will show all pages, otherwise only ones for the site being queried.
#+begin_src python :tangle arcology/views.py
def robots(request):
site = Site.from_request(request)
public_pages = Page.objects \
.filter(allow_crawl=True)
if site.key != "localhost":
public_pages = public_pages \
.filter(site=site)
public_pages = public_pages.all()
return render(request, "arcology/robots.txt", dict(
disallow_all_agents=["GPTBot", "ChatGPT-User", "Google-Extended", "CCBot", "anthropic-ai"],
pages=public_pages,
), content_type="text/plain")
#+end_src
Those values are passed to the Jinja template:
#+begin_src jinja2 :tangle arcology/templates/arcology/robots.txt :mkdirp yes
{% for agent in disallow_all_agents %}
User-agent: {{ agent }}
Disallow: /
{% endfor %}
User-agent: *
Disallow: /
{% for page in pages %}Allow: {{ page.to_url_path }}
{% endfor %}
#+end_src
** =GET /feeds.json= Feed discovery endpoint
:LOGBOOK:
CLOCK: [2024-02-15 Thu 14:17]--[2024-02-15 Thu 14:41] => 0:24
:END:
#+begin_src python :tangle arcology/views.py
import json
def feed_list(request):
site = Site.from_request(request)
feeds = Feed.objects.all()
ret = [
dict(
key=feed.route_key,
url=feed.site.urlize_feed(feed),
title=feed.title,
site=feed.site.key,
visibility=feed.visibility,
)
for feed in feeds
]
return HttpResponse(json.dumps(ret), content_type="application/json")
#+end_src
** =GET /sites.css= Per-Site link color dynamic CSS endpoint
:PROPERTIES:
:ID: 20231229T215425.830707
:END:
This endpoint generates a dynamic CSS file that colorizes internal URLs based on the [[id:20231229T164611.256424][The Arcology's Site List]] which is stored in the database. It does something [[https://twitter.com/gotMLK7/status/1675994399086641152][extremely wicked]] to make the page links less jarring until you hover over them by faking an alpha-channel in to the color.
#+begin_src python :tangle arcology/views.py
def site_css(request):
sites = Site.objects.all()
stanzas = []
for site in sites:
for domain in site.sitedomain_set.all():
stanzas.append(f'''
a[href*="//{domain.domain}"] {{
border-radius: 0.25em;
padding: 0.1em;
background-color: {site.link_color}66;
}}
a[href*="//{domain.domain}"]:hover {{
background-color: {site.link_color}FF !important;
}}
''')
stanzas.append(f'''
a[href*="/404"] {{
color: var(--alert);
/* text-decoration: line-through; */
}}
a[href*="/404"]::after {{
content: "";
}}
a[href*="/404"]::before {{
content: "";
}}
''')
return HttpResponse(stanzas, content_type="text/css")
#+end_src
** =app.html= Arcology Site Templates
In short, there are four blocks that the page template and other templates will use to embed content in the rendered web page:
- =title= is the =<title>= element, the name of the tab.
- =h1= is the displayed site/page title and only needs to be extended if some page wants to do something strange (like site index pages only showing the site title)
- =extra_head= is inside =<head>= and can be used to stuff more metadata in there
- =content= is where the content goes.
for now it's largely lifted from [[id:arcology/fastapi/base.html.j2][Base HTML Template]] and [[id:arcology/fastapi/page.html.j2][Page HTML Templates]] from the FastAPI prototype with some nips and tucks to make it more streamlined and legible.
#+begin_src jinja2 :tangle arcology/templates/arcology/app.html :mkdirp yes
<!DOCTYPE html>
<html>
<head>
#+end_src
The base template provides some basic information and loads the CSS sheets necessary to make things look nice, along with some page and author metadata. It provides a template block =extra_head= so that child templates can shove more =<head>= elements in here.
#+begin_src jinja2 :tangle arcology/templates/arcology/app.html :mkdirp yes
{% load static %}
{% load django_htmx %}
<link rel="stylesheet" href="{% static 'arcology/css/app.css' %}"/>
<link rel="stylesheet" href="{% static 'arcology/css/vulf.css' %}"/>
<link rel="stylesheet" href="{% static 'arcology/css/default-colors.css' %}"/>
<link rel="stylesheet" href="{% url 'site-css' %}"/>
{% if site and site.css_file %}
<link rel="stylesheet" href="{% static site.css_file %}"/>
{% endif %}
<meta name="author" content="Ryan Rix"/>
<meta name="generator" content="Arcology Site Engine https://engine.arcology.garden/"/>
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>{% block title %}{{head_title | default:"The Arcology Project" }}{% endblock %}</title>
{% block extra_head %}{% endblock %}
</head>
#+end_src
The body consists of a header which has the site and page title (which can be overridden for example in the =index= handler to only show the site title) and links to the other sites. These should be loaded from the DB eventually.
#+begin_src jinja2 :tangle arcology/templates/arcology/app.html :mkdirp yes
<body>
<header>
<div class="header-content">
{% block h1 %}
<h1><a href='/'>{{ site.title }}</a></h1>
<h2>{{ page.title }}</h2>
{% endblock %}
<div>
&bull; <a class="internal" href="https://thelionsrear.com">Life</a>
&bull; <a class="internal" href="https://arcology.garden">Tech</a>
&bull; <a class="internal" href="https://cce.whatthefuck.computer">Emacs</a>
&bull; <a class="internal" href="https://engine.arcology.garden">Arcology</a>
&bull;
</div>
</div>
</header>
#+end_src
The =content= block is used in child templates to hide a =<main>=; the =content= div *should* be a main element instead but [[id:20231023T115950.248543][The arroyo_rs Native Org Parser]] wants to output a =<main>= and i'm not going to stop it, so the div is there to make the body's flexbox layout work.
#+begin_src jinja2 :tangle arcology/templates/arcology/app.html :mkdirp yes
<div class="content">
{% block content %}{% endblock %}
</div>
#+end_src
A footer contains the oh-so-important copyright notice and a limited privacy policy which I should update before I ship this, along with links to the sitemap and to [[https://fediring.net][my fediring neighbors]].
#+begin_src jinja2 :tangle arcology/templates/arcology/app.html :mkdirp yes
<footer>
<hr/>
&copy; 02024 <a href="https://arcology.garden/people/rrix">Ryan Rix</a> &lt;<a href="mailto:site@whatthefuck.computer">site@whatthefuck.computer</a>&gt;
<br/>
<p>
Care has been taken to publish accurate information to
long-lived URLs, but context and content as well as URLs may
change without notice.
</p>
<p>
This site collects no personal information from visitors, nor
stores any identifying tokens. If you or your personal
information ended up in public notes please email me for
correction or removal. A single bit cookie may be stored on
your device if you choose to change appearance settings below.
</p>
<p>
Email me with questions, comments, insights, kind criticism.
blow horn, good luck.
</p>
<p>
View the <a href="/sitemap">Site Map</a> or the <a href="/tags">Tag Index</a>.
</p>
<p>
<a href="https://fediring.net/previous?host=arcology.garden">&larr;</a>
<a href="https://fediring.net/">Fediring</a>
<a href="https://fediring.net/next?host=arcology.garden">&rarr;</a>
</p>
#+end_src
The FastaAPI site had a "boredom mode" which would disable fonts and colors because some nerds were mean to me. This one will not have that until some nerds are mean to me.
#+begin_src jinja2 :tangle arcology/templates/arcology/app.html :mkdirp yes
<!--
<p>
<input type="checkbox" id="boredom-mode"><label for="boredom-mode">I do not like your aesthetic sensibilities!!</label>
</p>
<script type="text/javascript">
<<boredom>>
</script>
-->
</footer>
</body>
</html>
#+end_src
*** CSS
:PROPERTIES:
:ID: 20231229T164608.815737
:END:
this will be extended.
rather than using emoji for each site, it would be nice to subtly color them based on the link_color... will need to Do Some Bullshit to make that work though maybe.
#+begin_src css :tangle arcology/static/arcology/css/app.css :mkdirp yes
body {
font-family: "Vulf Mono", monospace;
font-style: italic;
font-size: medium;
background-color: var(--white);
color: var(--black);
margin: 0;
}
#+end_src
#+begin_src css :tangle arcology/static/arcology/css/app.css :mkdirp yes
header {
background-color: var(--light-gray);
border-radius: 0.25em;
margin-top: 0;
border-bottom: 2px solid var(--dark-gray);
}
header > .header-content {
padding: 1em;
max-width: 120ch;
margin-left: auto;
margin-right: auto;
}
header h1, header h2 {
margin-top: 0;
display: inline;
}
header h2:before {
content: " — ";
}
#+end_src
#+begin_src css :tangle arcology/static/arcology/css/app.css :mkdirp yes
footer {
margin-left: auto;
margin-right: auto;
max-width: 120ch;
font-size: smaller;
text-align: center;
}
footer a {
font-weight: 500;
}
#+end_src
#+begin_src css :tangle arcology/static/arcology/css/app.css :mkdirp yes
a {
color: var(--primary);
}
a::visited {
color: var(--secondary);
}
code {
font-style: normal;
}
#+end_src
There are per-site CSS in [[id:20231229T164611.256424][The Arcology's Site List]].
*** Generating =@font-face= rules for a bunch of fonts
[[id:cce/vulfpeck_fonts_are_fun][Vulfpeck Fonts]] are pulled in with this code-gen because writing =@font-face= rules does not bring joy and I don't have the right to redistribute these files, so I won't check it in at all.
#+NAME: font-face-tbl
| VulfSans | Regular | 500 | |
| VulfMono | Regular | 500 | |
| VulfSans | Bold | 800 | |
| VulfMono | Bold | 800 | |
| VulfSans | Italic | 500 | italic |
| VulfMono | Italic | 500 | italic |
| VulfSans | Bold_Italic | 800 | italic |
| VulfMono | Bold_Italic | 800 | italic |
| VulfSans | Light | 300 | |
| VulfMono | Light | 300 | |
| VulfSans | Light_Italic | 500 | italic |
| VulfMono | Light_Italic | 500 | italic |
#+NAME: gen_font_faces
#+begin_src elisp :var tbl=font-face-tbl :results none
(with-temp-buffer
(-map (pcase-lambda (`(,first ,second ,weight ,style))
(insert
(s-join "\n" (list
"@font-face {"
"font-family: " (if (equal first "VulfMono")
"\"Vulf Mono\""
"\"Vulf Sans\"")
"; src:"
(concat "url('/static/arcology/fonts/" first "-" second ".woff') format('woff'),")
(concat "url('/static/arcology/fonts/" first "-" second ".woff2') format('woff2'),")
(concat "url('/static/arcology/fonts/" first "-" second ".ttf') format('truetype');")
"font-weight: " (number-to-string weight) ";"
(unless (equal style "")
(concat "font-style: " style ";"))
"}"))))
tbl)
(write-file "~/org/arcology-django/arcology/static/arcology/css/vulf.css"))
#+end_src
*** NEXT this is a lever for restructuring the arcology
=app.html= template would be provided by a configuration-module repo that a user should set up on a template that depends on arroyo, arcology, roam modules. It would be the one responsible for setting up =gunicorn= etc, and also provide the command line wrapper
* NEXT Testing
- site from_request and from_key need to be tested
- site urlize page function needs to be tested too
- page collect functions at least need type annotations...
- =to_html= instance method needs to be tested (and the memoization too)
- =create_from_arroyo= too
- =feed= and =feedentry=
- both the =create_from_arroyo=, =to_html=
- the feed generator stuff in the view probably should go in to a model class, but test it.
- page handler view logic, test that 404s work, check that localhost loads work
- check optional sidebar stuff in the view logic
- sitemap when i write it
- per-site link color css endpoint