
Arcology Python Prototype

I learned a lot in building The Arcology Project the first time around, and now that I have Migrated to org-roam v2 I need to evaluate the project, fix it, and get it running again.

Over the last few months, I have been playing with a Python package called FastAPI and loving the "batteries included" approach with modern Python 3 and a Flask- or Express-like router model, rather than the full MVC framework I was working with in Elixir.

The Arcology is a FastAPI Web App

What we have here is a really simple FastAPI application.

Run it: uvicorn arcology.server:app --reload --host 0.0.0.0 &

import uvicorn
from fastapi import FastAPI, Request
from sqlmodel import Session

from arcology.arroyo import Page, engine
import arcology.html as html
from arcology.parse import parse_sexp

app = FastAPI()

#@click.command(help="start the Arcology web server")
#@click.option("--host", "-h", help="the host IP to listen on, defaults to all IPs/interfaces", default="0.0.0.0")
#@click.option("--port", "-p", help="port to listen on", default=8000)
def start(host="0.0.0.0", port=8000):
    # Run the app programmatically (without --reload), e.g. from an entry point.
    uvicorn.run("arcology.server:app", host=host, port=port)
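The click decorators above are commented out. If a command-line entry point is wanted without pulling in click, a minimal stdlib sketch with argparse could look like this. build_parser and main are hypothetical names, not part of the Arcology. argparse reserves -h for --help, so the short host flag here is -H, and uvicorn is imported inside main so the parser can be exercised without it.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI mirroring the commented-out click options."""
    parser = argparse.ArgumentParser(description="Start the Arcology web server.")
    # argparse reserves -h for --help, so the short host flag is -H instead.
    parser.add_argument("--host", "-H", default="0.0.0.0",
                        help="the host IP to listen on, defaults to all IPs/interfaces")
    parser.add_argument("--port", "-p", type=int, default=8000,
                        help="port to listen on")
    return parser


def main(argv: Optional[Sequence[str]] = None) -> None:
    args = build_parser().parse_args(argv)
    # Imported lazily so build_parser() works even where uvicorn isn't installed.
    import uvicorn
    uvicorn.run("arcology.server:app", host=args.host, port=args.port)
```

For example, build_parser().parse_args(["--port", "9000"]) yields host 0.0.0.0 and port 9000 as an int, thanks to type=int.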

Arcology's FastAPI Instrumentation and Observability

It's instrumented with prometheus-fastapi-instrumentator. There's not much to observe yet; I guess I'll want to include things about the pandoc generator, cache size, etc.

from prometheus_fastapi_instrumentator import Instrumentator

prometheus_instrumentor = Instrumentator()
# instrument() and expose() are called further down, once the custom metrics are added:
# prometheus_instrumentor.instrument(app).expose(app)

Request counts broken down by Arcology Sites, exposed as http_request_by_site_total for Grafana

  • Arcology Tags themselves!? (for example, to see whether sensitive topics are being linked to). This requires a DB call though, lol.

https://github.com/trallnag/prometheus-fastapi-instrumentator#creating-new-metrics
https://github.com/trallnag/prometheus-fastapi-instrumentator/blob/master/prometheus_fastapi_instrumentator/metrics.py

This instrument loads the site and arcology.key.ArcologyKey to emit counter metrics for each page and each site.

from typing import Callable
from prometheus_fastapi_instrumentator.metrics import Info
from prometheus_client import Counter
from arcology.sites import host_to_site
from arcology.key import ArcologyKey

import logging
logger = logging.getLogger(__name__)
logger.setLevel("INFO")

def http_request_sites_total() -> Callable[[Info], None]:
    METRIC = Counter(
        "http_request_by_site_total",
        "Number of times a site or page has been requested.",
        labelnames=("site", "key", "method", "status", "ua_type")
    )

    def instrumentation(info: Info) -> None:
        <<shortcircuits>>

        key = ArcologyKey.from_request(info.request)

        # Default to "" so a missing User-Agent header is bucketed as "no-ua".
        user_agent = info.request.headers.get("User-Agent", "")
        agent_type = get_agent_type(user_agent)

        if agent_type == "unknown":
            # logging uses lazy %-style formatting, not str.format placeholders
            logger.info("Detected unknown user agent: %s", user_agent)

        METRIC.labels(key.site.key, key.key, info.method, info.modified_status, agent_type).inc()

    return instrumentation

prometheus_instrumentor.add(http_request_sites_total())

prometheus_instrumentor.instrument(app).expose(app)
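Each METRIC.labels(...).inc() call bumps one time series per unique combination of label values. The Instrumentator groups status codes by default, so info.modified_status is normally something like 2xx rather than 200. A rough stdlib imitation of what ends up on /metrics, for illustration only: the site and key values below are made up, and real prometheus_client output also carries # HELP and # TYPE comment lines.

```python
from collections import Counter

# Label names from the Counter definition above.
LABELS = ("site", "key", "method", "status", "ua_type")

# One running count per unique tuple of label values, which is how a labelled
# Prometheus counter behaves: every new label combination is a new time series.
series = Counter()


def inc(*label_values: str) -> None:
    series[label_values] += 1


def exposition(name: str = "http_request_by_site_total") -> str:
    """Render the samples roughly as they appear in the Prometheus text format."""
    lines = []
    for values, count in sorted(series.items()):
        labels = ",".join(f'{label}="{value}"' for label, value in zip(LABELS, values))
        lines.append(f"{name}{{{labels}}} {float(count)}")
    return "\n".join(lines)


# Made-up requests: two browser hits on one page, one feed-reader hit on another site.
inc("site-a", "site-a/index", "GET", "2xx", "browser")
inc("site-a", "site-a/index", "GET", "2xx", "browser")
inc("site-b", "site-b/feed", "GET", "2xx", "feed")
print(exposition())
```

Because key is a label, every page becomes its own set of series (times method, status, and UA type), which is fine at personal-site scale but is worth keeping in mind as the number of pages grows.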

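get_agent_type makes some educated guesses to sort callers into buckets (browser, feed reader, fediverse server, Matrix, app, bot, and so on) by looking for telltale substrings in the User-Agent header. The checks run in order and the first match wins, which is why the generic browser checks come last: plenty of bots and apps also put Chrome/ or Safari/ in their user agent strings.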

def get_agent_type(user_agent: str) -> str:

    if not user_agent:  # empty string, or None if the header was missing
        return "no-ua"

    if "Synapse" in user_agent:
        return "matrix"
    if "Element" in user_agent:
        return "matrix"

    if "SubwayTooter" in user_agent:
        return "app"
    if "Dalvik" in user_agent:
        return "app"
    if "Nextcloud-android" in user_agent:
        return "app"

    if "prometheus" in user_agent:
        return "internal"
    if "feediverse" in user_agent:
        return "internal"

    if "Pleroma" in user_agent:
        return "fedi"
    if "Mastodon/" in user_agent:
        return "fedi"
    if "Akkoma" in user_agent:
        return "fedi"
    if "Friendica" in user_agent:
        return "fedi"
    if "FoundKey" in user_agent:
        return "fedi"
    if "MissKey" in user_agent:
        return "fedi"
    if "CalcKey" in user_agent:
        return "fedi"
    if "gotosocial" in user_agent:
        return "fedi"
    if "Epicyon" in user_agent:
        return "fedi"

    if "feedparser" in user_agent:
        return "feed"
    if "granary" in user_agent:
        return "feed"
    if "Tiny Tiny RSS" in user_agent:
        return "feed"
    if "Go-NEB" in user_agent:
        return "feed"
    if "Gwene" in user_agent:
        return "feed"
    if "Feedbin" in user_agent:
        return "feed"
    if "SimplePie" in user_agent:
        return "feed"
    if "Elfeed" in user_agent:
        return "feed"
    if "inoreader" in user_agent:
        return "feed"
    if "Reeder" in user_agent:
        return "feed"
    if "Miniflux" in user_agent:
        return "feed"

    if "Bot" in user_agent:
        return "bot"
    if "bot" in user_agent:
        return "bot"
    if "Poduptime" in user_agent:
        return "bot"

    if "Chrome/" in user_agent:
        return "browser"
    if "Firefox/" in user_agent:
        return "browser"
    if "DuckDuckGo/" in user_agent:
        return "browser"
    if "Safari/" in user_agent:
        return "browser"

    return "unknown"
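The if-chain above could equally be written as an ordered table of (substring, bucket) rules, which keeps the ordering visible in one place. This is a sketch assuming the same rules and the same first-match-wins semantics; AGENT_RULES is a made-up name, and this version also treats a missing header (None) as no-ua.

```python
from typing import Optional

# Ordered (substring, bucket) rules. The first matching rule wins, exactly as in
# the if-chain above, so specific rules must stay ahead of generic ones.
AGENT_RULES = [
    ("Synapse", "matrix"), ("Element", "matrix"),
    ("SubwayTooter", "app"), ("Dalvik", "app"), ("Nextcloud-android", "app"),
    ("prometheus", "internal"), ("feediverse", "internal"),
    ("Pleroma", "fedi"), ("Mastodon/", "fedi"), ("Akkoma", "fedi"),
    ("Friendica", "fedi"), ("FoundKey", "fedi"), ("MissKey", "fedi"),
    ("CalcKey", "fedi"), ("gotosocial", "fedi"), ("Epicyon", "fedi"),
    ("feedparser", "feed"), ("granary", "feed"), ("Tiny Tiny RSS", "feed"),
    ("Go-NEB", "feed"), ("Gwene", "feed"), ("Feedbin", "feed"),
    ("SimplePie", "feed"), ("Elfeed", "feed"), ("inoreader", "feed"),
    ("Reeder", "feed"), ("Miniflux", "feed"),
    ("Bot", "bot"), ("bot", "bot"), ("Poduptime", "bot"),
    ("Chrome/", "browser"), ("Firefox/", "browser"),
    ("DuckDuckGo/", "browser"), ("Safari/", "browser"),
]


def get_agent_type(user_agent: Optional[str]) -> str:
    # A missing header is treated the same as an empty one.
    if not user_agent:
        return "no-ua"
    for needle, bucket in AGENT_RULES:
        if needle in user_agent:
            return bucket
    return "unknown"
```

Adding a new crawler then becomes a one-line change, with the same caveat as before: a new rule has to go ahead of any broader rule that would otherwise catch it first.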

Some URLs shouldn't be counted at all, and this bit of code in <<shortcircuits>> ensures those requests aren't recorded by the per-site counter. Note that paths aren't actually verified as existing in the database: a "normal" page that doesn't exist is simply recorded with a 4xx status. Static assets, the favicon, and Prometheus's own scrapes of /metrics, however, would generate a lot of "chatter", which I simply short-circuit out here.

if info.request.url.path.startswith("/metrics"):
    return
if info.request.url.path.startswith("/static"):
    return
if info.request.url.path.startswith("/favicon.ico"):
    return
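The three checks above can be collapsed into one call, since str.startswith accepts a tuple of prefixes. A minimal sketch; IGNORED_PREFIXES and should_count are hypothetical names, and the expression could equally be inlined in <<shortcircuits>>.

```python
# Prefixes whose requests are not counted by the per-site metric,
# matching the three checks in <<shortcircuits>>.
IGNORED_PREFIXES = ("/metrics", "/static", "/favicon.ico")


def should_count(path: str) -> bool:
    # str.startswith returns True if the path starts with any prefix in the tuple.
    return not path.startswith(IGNORED_PREFIXES)
```

One caveat of plain prefix matching, in either form: a page whose path merely begins with one of these strings, say /staticky, would be skipped too. Matching "/static/" with a trailing slash would avoid that.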

Arcology Static Files and appearance

  • State "INPROGRESS" from "NEXT" [2021-12-18 Sat 17:53]

I can't be fucked to care about asset pipelines right now/these days. There isn't a complex enough set of assets in this context to need one (the Arcology Media Store and exposing attachment files are a separate problem). This is just enough to make it look naisu, and to give each site a bit of flavor through the Arcology Sites customization module.

from fastapi.staticfiles import StaticFiles
import os

static_directory = os.environ.get('STATIC_FILE_DIR', "arcology/static")

app.mount("/static", StaticFiles(directory=static_directory), name="static")

Base HTML Template

<html>
  <head>
    <meta name="author" content="Ryan Rix"/>
    <meta name="generator" content="Arcology Site Engine https://engine.arcology.garden/"/>
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <link rel="stylesheet" href="/static/css/app.css"/>
    <link rel="stylesheet" href="/static/css/vulf.css"/>
    <link rel="stylesheet" href="/static/css/default-colors.css"/>
    {% if site and site.css_file %}
    <link rel="stylesheet" href="{{ site.css_file }}"/>
    {% endif %}
    {% block head %}
      <title>{{ site.title }}</title>
    {% endblock %}
  </head>
  <body>
    <header>
      {% block h1 %}
      <h1><a href='/'>{{ site.title }}</a></h1>
      <h2>{{ page.get_title() }}</h2>
      {% endblock %}
      <div>
        &bull; <a class="internal" href="https://thelionsrear.com">Life</a>
        &bull; <a class="internal" href="https://arcology.garden">Tech</a>
        &bull; <a class="internal" href="https://cce.whatthefuck.computer">Emacs</a>
        &bull; <a class="internal" href="https://doc.rix.si/topics">Topics</a>
        &bull; <a class="internal" href="https://engine.arcology.garden">Arcology</a>
        &bull;
      </div>
    </header>

    {% block body %}
    {% endblock %}

    <footer>
      <hr/>
      &copy; 02023 <a href="https://arcology.garden/people/rrix">Ryan Rix</a> &lt;<a href="mailto:site@whatthefuck.computer">site@whatthefuck.computer</a>&gt;

      <br/>

      <p>
        Care has been taken to publish accurate information to
        long-lived URLs, but context and content as well as URLs may
        change without notice.
      </p>

      <p>
        This site collects no personal information from visitors, nor
        stores any identifying tokens. If you or your personal
        information ended up in public notes, please email me for
        correction or removal. A single-bit setting may be stored in
        your browser's localStorage if you choose to change appearance
        settings below.
      </p>

      <p>
        Email me with questions, comments, insights, kind criticism.
        Blow horn, good luck.
      </p>

      <p>
        <a href="/sitemap/">View the Site Map</a>
      </p>

      <p>
        <a class="internal" href="https://fediring.net/previous?host=arcology.garden">&larr;</a>
        <a class="internal" href="https://fediring.net/">Fediring</a>
        <a class="internal" href="https://fediring.net/next?host=arcology.garden">&rarr;</a>
      </p>

      <p>
        <input type="checkbox" id="boredom-mode"><label for="boredom-mode">I do not like your aesthetic sensibilities!!</label>
      </p>

      <script type="text/javascript">
        <<boredom>>
      </script>
    </footer>
  </body>
</html>

Some people don't like my aesthetic choices, and that's fine. A checkbox at the bottom of the page toggles a boredom class on the <body> that makes the site more palatable for them; the choice is remembered in localStorage. The CSS for that class is defined below.

var boredomCheckbox = document.querySelector('#boredom-mode');
var body = document.querySelector('body');

var setBoredom = function (enabled) {
  if (enabled) {
    body.classList.add("boredom");
  } else {
    body.classList.remove("boredom");
  }
  // localStorage only stores strings, so this is saved as "true" or "false"
  localStorage.setItem("boredom", enabled);
  boredomCheckbox.checked = enabled;
};

var updateClass = function () {
  var checked = boredomCheckbox.checked;
  setBoredom(checked);
};

boredomCheckbox.addEventListener('click', updateClass);
setBoredom(localStorage.getItem("boredom") == "true"); // fucking stringly typed DOM APIs

Page HTML Templates

{% extends "base.html.j2" %}

{% block h1 %}
  <h1><a href='/'>{{ site.title }}</a></h1>
  <h2>{{ page.get_title() }}</h2>
{% endblock %}

{% block head %}
  <title>{{ page.get_title() }} - {{ site.title }}</title>
  {% for feed in feeds %}
    <link rel="alternate" type="application/atom+xml" href="{{ feed[0] }}" title="{{ feed[1] }}" />
  {% endfor %}
  {% if page.allow_crawl is none or page.allow_crawl=='"nil"' %}
    <meta name="robots" content="noarchive, noimageindex, noindex, nofollow"/>
  {% else %}
    <meta name="robots" content="noarchive, noimageindex"/>
  {% endif %}
{% endblock %}

{% block body %}
  <main>
    <section class="body">
      {{ document | safe}}
    </section>
  </main>

  <section class="backlinks">
    {% if page.references %}
      <h3>
        See:
        {% for ref in page.references %}
          [<a href="{{ref.url()}}">ref</a>]&nbsp;
        {% endfor %}
      </h3>
    {% endif %}

    {% if backlink %}
      <h2>Pages which Link Here</h2>

      {{ backlink | safe}}
    {% endif %}
  </section>
{% endblock %}
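The crawl decision in the head block is easy to misread because allow_crawl holds the raw property value as it comes out of the org-roam database, presumably still s-expression encoded, so an org property set to nil arrives as the quoted string '"nil"' rather than as nil. A hypothetical helper mirroring the template's condition:

```python
from typing import Optional


def is_crawlable(allow_crawl: Optional[str]) -> bool:
    """Hypothetical mirror of the page template's robots condition.

    A missing property (None) or the still-quoted elisp string '"nil"'
    means the page gets the noindex/nofollow robots meta tag.
    """
    return not (allow_crawl is None or allow_crawl == '"nil"')
```

Doing the comparison in Python like this (or decoding the value with parse_sexp first) would keep the quoting detail out of the template.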

Arcology Site CSS

Look, there's not a lot of "there there". The default color variables are "nice to have".

:root {
  --alert: #CC6960;
  --primary: #707231;
  --secondary: #ebbe7b;
  --success: #67b4f8;
  --warning: #7e5c41;


  --white: #fcf6ed;
  --light-gray: #f6e5cb;
  --medium-gray: #BAAD9B;
  --dark-gray: #82796C;
  --black: #211F1C;
}

Dead links will be annotated by the HTML Rewriter and Hydrater with this class if they're internal links to pages which are not marked.

.dead-link::after {
    content: '🔗⚠';
}
.dead-link {
    color: var(--alert) !important;
}

Experimental: Mark external and internal URLs with an emoji.

/* a.internal::after {
     content: '';
} */
body.boredom a::before {
    content: '' !important;
}

a[href*="arcology.garden"]::before,
a[href*="dev.arcology.garden"]::before {
    content: '🌱 ';
    font-style: normal;
}

a[href*="thelionsrear.com"]::before,
a[href*="dev.thelionsrear.com"]::before {
    content: '🐲 ';
    font-style: normal;
}

a[href*="engine.arcology.garden"]::before {
    content: '🧑‍🔧 ';
    font-style: normal;
}

a[href*="dev.cce"]::before,
a[href*="cce.whatthefuck.computer"]::before,
a[href*="cce.rix.si"]::before {
    content: '♾️ ';
    font-style: normal;
}

a[href*="doc.rix.si"]::before {
    content: '✒️️ ';
    font-style: normal;
}

a[href*="localhost"]::before {
    content: '📚️️ ';
    font-style: normal;
}

a[href*="//"]:not(.internal)::before {
    content: '🌏 ';
    font-style: normal;
}

Color these things. The defaults are specified above; Sites can override them (and add other rules entirely, of course!).

a {
    color: var(--primary);
    font-weight: 500;
}

a:visited {
    color: var(--warning);
}

pre, code {
    background-color: var(--light-gray);
}

.tags .tag {
    background-color: var(--success);
    color: var(--light-gray);
}

Configure the body, the headers, the footers, the whole dang lot! Note that I use the Vulf Mono font; make sure to bring your own!

The <body> is the "root" of the rendered elements.

body {
    font-family: "Vulf Mono", monospace;
    font-style: italic;
    font-size: 14px;
    background-color: var(--white);
    color: var(--black);
}

People seem to really dislike Vulf Mono, so the boredom-mode checkbox in the footer disables it by adding this CSS class to the body. If they don't like that, they can use reader mode or browse a different web site.

body.boredom {
    font-family: "Comic Sans MS", "Comic Sans", Helvetica, sans-serif !important;
    font-style: normal;
}

body.boredom main, body.boredom .backlinks, body.boredom .verbatim, body.boredom .sourceCode {
    background-color: var(--white);
    color: var(--black);
}

All headings are italic; the headings inside <header> are displayed inline with each other rather than as separate blocks.

h1,h2,h3,h4,h5,h6 {
    font-style: italic;
}

h1 code.verbatim, h2 code.verbatim, h3 code.verbatim,
h4 code.verbatim, h5 code.verbatim, h6 code.verbatim {
    font-style: normal;
}


header > h1, header > h2 {
  display: inline;
}

header > h1:after {
  content: " —";
}

It's important that things have room to breathe.

header {
    padding: 0.5em;
    border-radius: 1em;
    background-color: var(--light-gray);
    margin-bottom: 2em;
}

main, section.backlinks {
    padding: 0.5em;
    border-radius: 1em;
    background-color: var(--light-gray);
    border: 1px var(--medium-gray) solid;
    font-weight: 300;
}

main strong {
    font-weight: 700;
}

main, header :first-child {
    margin-top: 0 !important;
}

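Margins must be set. This centers the major text sections on the page and caps their width at 80em. Eighty characters is the holy and correct number for text to be displayed at, I guess, lol, but an em is wider than a monospace character, so this actually allows rather more than 80 characters per line; in a monospace font, 80ch would mean exactly 80 characters.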

footer, section.backlinks, main {
    margin: 1em auto;
    max-width: 80em;
}

footer {
    text-align: center;
}

footer a {
    font-weight: 500;
}
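The max-width: 80em rule above allows noticeably more than 80 characters per line. A back-of-the-envelope check, assuming the 14px body font-size from the body rule and a monospace glyph advance of about 0.6em (typical, but it varies from font to font, so the 0.6 is an assumption, not a measurement of Vulf Mono):

```python
# Back-of-the-envelope check of how wide max-width: 80em is.
FONT_SIZE_PX = 14        # from the body rule
MAX_WIDTH_EM = 80        # from the footer, section.backlinks, main rule
GLYPH_ADVANCE_EM = 0.6   # assumed width of one monospace character, in em

max_width_px = MAX_WIDTH_EM * FONT_SIZE_PX        # content width in pixels
chars_per_line = MAX_WIDTH_EM / GLYPH_ADVANCE_EM  # characters that fit per line
print(max_width_px, round(chars_per_line))
```

Under those assumptions that's 1120px, or roughly 133 characters. The ch unit sidesteps the guesswork, since 1ch is the advance width of the "0" glyph in the element's own font.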

Experimental: code blocks try to show you what file they're tangled to, their noweb setting, or the noweb reference they're interpolated as, by rendering their data- attributes as a label.

pre.sourceCode {
    padding-left: 2em;
    padding-bottom: 1em;
    overflow: scroll;
}

.sourceCode[data-noweb]::before {
  content: "noweb setting " attr(data-noweb);
}

.sourceCode[data-noweb-ref]::before {
  content: "noweb interpolated as " attr(data-noweb-ref);
}

.sourceCode[data-tangle]::before {
  content: "write file to " attr(data-tangle);
}

.sourceCode, code {
    font-style: normal;
}

Various tweaks for SRS and friends. I should do this in Rewriting and Hydrating the Pandoc HTML instead.

.tag .smallcaps {
    float: right;
    font-variant-caps:  small-caps;
    padding: 0.25em;
}

.REVIEW_DATA.drawer {
    display: none;
}

.fc-cloze {
    font-style: normal;
    text-decoration: underline;
}

Sitemap should have a height:

#sitemap-container {
  height: 100%;
}

Print media should look boring:

@media print {
    header {
        display: none;
    }
    main {
        border: none;
    }
    section.backlinks {
        display: none;
    }
    footer {
        display: none;
    }
    p {
        break-before: avoid;
    }
    
    body {
        font-family: "Comic Sans MS", "Comic Sans", Helvetica, sans-serif !important;
        font-style: normal;
    }

    body main, body .backlinks, body .verbatim, body .sourceCode {
        background-color: var(--white);
        color: var(--black);
    }
}

Generating @font-face rules for a bunch of fonts

Vulfpeck Fonts are pulled in with this code-gen because writing @font-face rules by hand does not bring joy. I don't have the right to redistribute the font files, so I won't check them in at all.

| VulfSans | Regular      | 500 |        |
| VulfMono | Regular      | 500 |        |
| VulfSans | Bold         | 800 |        |
| VulfMono | Bold         | 800 |        |
| VulfSans | Italic       | 500 | italic |
| VulfMono | Italic       | 500 | italic |
| VulfSans | Bold_Italic  | 800 | italic |
| VulfMono | Bold_Italic  | 800 | italic |
| VulfSans | Light        | 300 |        |
| VulfMono | Light        | 300 |        |
| VulfSans | Light_Italic | 300 | italic |
| VulfMono | Light_Italic | 300 | italic |
(with-temp-buffer
  ;; -each rather than -map: only the side effect of inserting text is wanted
  (-each tbl
    (pcase-lambda (`(,first ,second ,weight ,style))
      (insert
       (s-join "\n" (list
                     "@font-face {"
                     "font-family: "  (if (equal first "VulfMono")
                                          "\"Vulf Mono\""
                                        "\"Vulf Sans\"")
                     "; src:"
                     ;; browsers use the first source format they support, so woff2 goes first
                     (concat "url('/static/fonts/" first "-" second ".woff2') format('woff2'),")
                     (concat "url('/static/fonts/" first "-" second ".woff') format('woff'),")
                     (concat "url('/static/fonts/" first "-" second ".ttf') format('truetype');")
                     "font-weight: " (number-to-string weight) ";"
                     (unless (equal style "")
                       (concat "font-style: " style ";"))
                     "}\n")))))
  (write-file "~/org/arcology-fastapi/arcology/static/css/vulf.css"))
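For comparison, the same generation as a plain Python sketch; it is not what the Arcology uses (the elisp above is), and it only builds the CSS string, leaving writing it to arcology/static/css/vulf.css to the caller. FONTS and font_face are hypothetical names. Light_Italic is given weight 300 here so it does not share a family, weight, and style with Italic at 500, which would make one rule shadow the other.

```python
# (file prefix, style name, weight, font-style) rows, as in the table above.
FONTS = [
    ("VulfSans", "Regular", 500, ""), ("VulfMono", "Regular", 500, ""),
    ("VulfSans", "Bold", 800, ""), ("VulfMono", "Bold", 800, ""),
    ("VulfSans", "Italic", 500, "italic"), ("VulfMono", "Italic", 500, "italic"),
    ("VulfSans", "Bold_Italic", 800, "italic"), ("VulfMono", "Bold_Italic", 800, "italic"),
    ("VulfSans", "Light", 300, ""), ("VulfMono", "Light", 300, ""),
    ("VulfSans", "Light_Italic", 300, "italic"), ("VulfMono", "Light_Italic", 300, "italic"),
]


def font_face(prefix: str, style_name: str, weight: int, style: str) -> str:
    family = "Vulf Mono" if prefix == "VulfMono" else "Vulf Sans"
    base = f"/static/fonts/{prefix}-{style_name}"
    rules = [
        "@font-face {",
        f'  font-family: "{family}";',
        # Browsers pick the first source whose format they support, so woff2 leads.
        f"  src: url('{base}.woff2') format('woff2'),",
        f"       url('{base}.woff') format('woff'),",
        f"       url('{base}.ttf') format('truetype');",
        f"  font-weight: {weight};",
    ]
    if style:
        rules.append(f"  font-style: {style};")
    rules.append("}")
    return "\n".join(rules)


css = "\n\n".join(font_face(*row) for row in FONTS)
```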

NEXT [#C] Tufte sidenotes for the backlinks -> HTML should inject sidenotes during rewrite_html?
NEXT [#C] page template for a backlink buffer like Topic Index

Wiring up Arcology Routing Logic

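The Arcology Routing Logic needs to be wired up to the server after the static asset routes are defined: Starlette matches routes in the order they were registered, so the /static mount has to come first to take precedence over any broader page routes.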

import arcology.routing.domains as domains
app = domains.decorate_app(app)
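Why the order matters, in a toy model: assuming the routing module registers something broad enough to match any path (a guess, since the domains module isn't shown here), whichever route is tried first wins. dispatch and the handlers are hypothetical and only model the first-match behavior; they are not Starlette's API.

```python
from typing import Callable, List, Optional, Tuple

# (path prefix, handler) pairs, tried in order like registered routes.
Route = Tuple[str, Callable[[str], str]]


def dispatch(routes: List[Route], path: str) -> Optional[str]:
    # The first route whose prefix matches handles the request.
    for prefix, handler in routes:
        if path.startswith(prefix):
            return handler(path)
    return None


static: Route = ("/static", lambda p: "static file " + p)
catch_all: Route = ("/", lambda p: "org page " + p)  # hypothetical catch-all page route

print(dispatch([static, catch_all], "/static/css/app.css"))  # the static mount wins
print(dispatch([catch_all, static], "/static/css/app.css"))  # shadowed by "/"
```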

NEXT [#B] Org pre-processing

  • remove org-fc drawers
  • strip :NOEXPORT: headings (??)
  • rewrite org-fc clozes

    import re

    # org-fc clozes look like {{text}@0} or {{text}{hint}@0}
    FC_CLOZE_RE = re.compile(r'{{([^}]+)}(?:{([^}]+)})?@([0-9]+)}')

    def fc_cloze_replacement_fn(match):
        main = match.group(1)
        hint = match.group(2) or ""  # the hint is optional
        num = match.group(3)
        return f'<span cloze="{num}" alt="{hint}">{main}</span>'

    # output_html = FC_CLOZE_RE.sub(fc_cloze_replacement_fn, output_html)
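For the first bullet, a minimal regex sketch of dropping org-fc's REVIEW_DATA drawers (the same drawers the CSS currently hides with .REVIEW_DATA.drawer) from the org source before it is rendered. strip_review_drawers is a hypothetical helper, and the regex assumes the :REVIEW_DATA: and :END: markers each sit on a line of their own, as org-fc writes them.

```python
import re

# An org drawer named REVIEW_DATA, from its opening line through the next
# :END: line, inclusive. DOTALL lets the lazy .*? span the review table's lines.
REVIEW_DRAWER_RE = re.compile(
    r"^[ \t]*:REVIEW_DATA:[ \t]*\n.*?^[ \t]*:END:[ \t]*\n?",
    re.MULTILINE | re.DOTALL,
)


def strip_review_drawers(org_text: str) -> str:
    return REVIEW_DRAWER_RE.sub("", org_text)
```

The lazy .*? matters: a greedy match would run from the first drawer to the last :END: in the file and swallow everything in between.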

Arcology BaseSettings Configuration Class

Ref FastAPI Settings and Pydantic Settings management.

This is mostly used to coordinate the Arcology Batch Commands but will eventually contain all configurable elements of the web server and inotify worker.

from pydantic import BaseSettings
from enum import Enum
from functools import lru_cache
from pathlib import Path

class Environment(str, Enum):
    prod = "prod"
    dev  = "dev"

class Settings(BaseSettings):
    arcology_directory: Path = Path("~/org")

    arcology_src: Path = Path("~/org/arcology-fastapi")
    arroyo_src: Path = Path("~/org/arroyo")
    arroyo_emacs: Path = Path("emacs")

    arcology_db: Path = Path("~/org/arcology-fastapi/arcology.db")
    org_roam_db: Path = Path("~/org/arcology-fastapi/org-roam.db")

    db_generation_debounce: int = 15
    db_generation_cooldown: int = 300

    arcology_env: Environment = Environment.dev

@lru_cache
def get_settings():
    return Settings()
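Two things are easy to trip over here. The pydantic v1 BaseSettings imported above fills each field from an environment variable with the field's name, matched case-insensitively by default, so ARCOLOGY_DB=/srv/arcology.db overrides arcology_db. And pathlib never expands ~ on its own, so these defaults need .expanduser() before use. A stdlib-only sketch of both behaviors; setting_path is hypothetical, only checks the upper-cased name, and is not how pydantic is implemented.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def setting_path(name: str, default: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Hypothetical helper: env override by upper-cased field name, then ~ expansion."""
    env = os.environ if env is None else env
    raw = env.get(name.upper(), default)
    # Path("~/org") is literally a directory named "~"; expanduser() maps it to $HOME/org.
    return Path(raw).expanduser()
```

So get_settings().arcology_db.expanduser() (or a validator doing the same) is what callers would want before opening the database.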

Translate in/out of s-expression forms with sexpdata

Use sexpdata to decode some of the keys which come out of the org-roam EmacSQL. At some point I could do some hackery-pokery to monkeypatch this into a few places to magically unwrap fields. For now it'll be great in __str__ and some property access methods.

import sexpdata as sexp

def parse_sexp(in_sexp: str):
    return sexp.loads(in_sexp)

def print_sexp(in_obj) -> str:
    return sexp.dumps(in_obj)
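For the most common case, a string property that EmacSQL stored quoted (which is why the page template compares allow_crawl against '"nil"'), decoding amounts to stripping the outer quotes and unescaping. A simplified stdlib stand-in, for illustration only: unwrap_elisp_string is hypothetical and handles just the \" and \\ escapes, while the real code should keep using sexpdata, which also handles symbols, numbers, and lists.

```python
import re


def unwrap_elisp_string(value: str) -> str:
    """Hypothetical, simplified stand-in for sexp.loads on one elisp string literal."""
    if len(value) < 2 or not (value.startswith('"') and value.endswith('"')):
        raise ValueError(f"not an elisp string literal: {value!r}")
    body = value[1:-1]
    # Turn \" into " and \\ into \; other escapes are left untouched.
    return re.sub(r'\\(["\\])', r"\1", body)
```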