arcology-fastapi/arcology-arroyo.org


Arroyo Arcology Generator

shell:ln -s arroyo-arcology.el ~/org/cce/arroyo-arcology.el (this needs to be in the CCE directory for Arroyo Emacs to automatically load it).

This can be set up to automatically load in an Arroyo Emacs environment.

The Arcology is fundamentally about rendering and sharing entire org-mode documents on the web. This made the direct usage of org-roam's database a pretty straight-forward endeavor, until the migration to a Node-centered model with org-roam v2. This model has made my note-taking much better but it's forced me to rethink the data model of the Arcology pretty significantly.

This ultimately has developed over 2021 as Arroyo Systems Management, a set of sidecar metadata tables for my notes and the org-mode meta applications built on top of them. The Arcology's database is a set of tables derived from the metadata in my org-mode files. This database is generated inside of Emacs and mounted read-only by my FastAPI session via SQLModel. I would love to generate this database another way, but there is still only one high-quality org parser: org-mode.

The "entry point" of this API is the arcology.arroyo.Page below. It has some class methods hanging off it which can instantiate Pages from the database by filename or routing key.

A page doesn't require much metadata to render or be found, really: the org-mode source file, its ARCOLOGY_KEY routing key, and the root arcology.roam.Node object's primary ID. Most of this can be gleaned from the arcology.roam.File object and my Keyword sidecar.

(add-to-list 'arroyo-db-keywords "ARCOLOGY_KEY")
(add-to-list 'arroyo-db-keywords "ARCOLOGY_FEED")
(add-to-list 'arroyo-db-keywords "ARCOLOGY_TOOT_VISIBILITY")
(add-to-list 'arroyo-db-keywords "ARCOLOGY_ALLOW_CRAWL")

The ARCOLOGY_KEY is a file property which contains the page's "routing key": a string with at least one / in it, which separates the site it'll publish to from the path it'll be published on. This maps to a URL in the form of localhost:3000/$ARCOLOGY_KEY, or the first part will map to one of the public domains. This will make more sense later on.

The ARCOLOGY_FEED is a file property which contains a routing key to an RSS feed.

This is assembled using noweb syntax because Page relies on Link being defined for the link_model relationship. There is some more code that makes it in to arcology.arroyo for setting up the session and engine, down below under Arcology SQLModel Database Bindings.

from typing import Optional, List
from sqlmodel import Field, Relationship, SQLModel

from arcology.parse import parse_sexp, print_sexp

<<arcology.arroyo.Link>>
<<arcology.arroyo.Page>>
<<arcology.arroyo.Tag>>
<<arcology.arroyo.Node>>
<<arcology.arroyo.Ref>>
<<arcology.arroyo.Keyword>>
<<arcology.arroyo.Feed>>

Anyways.

NEXT document schemas

explain inter-relations between these classes, maybe a relationship graph

explain columns and link to where specialized columns like allow_crawl go and come from?

Arcology Page

A Page represents the minimal metadata required to find and render an org-mode document and generate links to it. I would love to someday not have to wire up all these relationships by hand, and I'll have to remodel this at some point, but for now specifying all the primaryjoin characteristics is enough.

from sqlmodel import Session, select
import hashlib

from arcology.key import ArcologyKey, id_to_arcology_key
import arcology.html as html

class Page(SQLModel, table=True):
  __tablename__ = "arcology_pages"
  file: str = Field(primary_key=True)
  key: str = Field(description="The ARCOLOGY_KEY for the page")
  title: str = Field(description="Primary title of the page")
  hash: str = Field(description="The hash of the file when it was indexed")
  root_id: str = Field(description="The ID for the page itself", foreign_key="arcology_nodes.node_id")
  site: str = Field(description="Maps to an arcology.Site key.")
  allow_crawl: str = Field(description="Lisp boolean for whether this page should go in robots.txt")

  nodes: List["Node"] = Relationship(
    back_populates="page",
    sa_relationship_kwargs=dict(
      primaryjoin="Node.file==Page.file"
    )
  )
  tags: List["Tag"] = Relationship(
    sa_relationship_kwargs=dict(
      primaryjoin="Tag.file==Page.file"
    )
  )
  references: List["Reference"] = Relationship(
    sa_relationship_kwargs=dict(
      primaryjoin="Reference.file==Page.file"
    )
  )

  def get_title(self):
    return parse_sexp(self.title)

  def get_key(self):
    return parse_sexp(self.key)

  def get_file(self):
    return parse_sexp(self.file)

  def get_arcology_key(self):
    return ArcologyKey(self.get_key())

  def get_site(self):
    return self.get_arcology_key().site

  <<page_link_relationships>>
  <<page_classmethods>>
  <<page_html_generators>>

@classmethod
def from_file(cls, path: str, session: Session):
  q = select(cls).where(cls.file==print_sexp(path))
  return session.exec(q).one()

@classmethod
def from_key(cls, key: str, session: Session):
  q = select(cls).where(cls.key==print_sexp(key))
  try:
    return next(session.exec(q))
  except StopIteration:
    return None

The Page carries bi-directional link relationships to both the Link and the Page on the other side of it.

backlinks: List["Link"] = Relationship(
    back_populates="dest_page",
    sa_relationship_kwargs=dict(
        primaryjoin="Page.file==Link.dest_file"
    )
)

outlinks: List["Link"] = Relationship(
    back_populates="source_page",
    sa_relationship_kwargs=dict(
        primaryjoin="Page.file==Link.source_file"
    )
)

backlink_pages: List["Page"] = Relationship(
    link_model=Link,
    back_populates="outlink_pages",
    sa_relationship_kwargs=dict(
        foreign_keys="[Link.dest_file]",
        viewonly=True,
    )
)

outlink_pages: List["Page"] = Relationship(
    link_model=Link,
    back_populates="backlink_pages",
    sa_relationship_kwargs=dict(
        foreign_keys="[Link.source_file]",
        viewonly=True,
    )
)

The code to insert a page relies on a bunch of stuff pulled out of the page and out of the Arcology Keywords store. Be sure the arguments line up; maybe I should switch these to use &key eventually so that it's less foot-gun-shaped.

(add-to-list 'arroyo-db--schemata
             '(arcology-pages
                [(file :not-null)
                 (key :not-null)
                 (site :not-null)
                 (title :not-null)
                 (root-id :not-null)
                 (allow-crawl)
                 (hash :not-null)]))

(defun arroyo-arcology--insert-page (file kw site title root-id allow-crawl hash)
  (arroyo-db-query [:delete :from arcology-pages
                    :where (= file $s1)]
                   file)
  (arroyo-db-query [:insert :into arcology-pages :values $v1]
                   (vector file kw site title root-id allow-crawl hash)))

Generating HTML from Arcology Pages

Arcology pages have two "documents" attached to them on render: the Org doc itself, and a document constituted from the backlinks.

The backlink document is generated dynamically using Page.make_backlinks_org which just generates a string from the Link relationships.

def make_backlinks_org(self):
    if self.backlinks is None:
        return ''

    def to_org(link: Link):
        return \
            """
            ,* [[id:{path}][{title}]]
            """.format(
                path=parse_sexp(link.source_id),
                title=link.get_source_title()
            )

    return '\n'.join([ to_org(link) for link in self.backlinks ])

async def document_html(self):
    cache_key = parse_sexp(self.hash)
    return html.gen_html(parse_sexp(self.file), cache_key)

async def backlink_html(self):
    org = self.make_backlinks_org()
    cache_key = hashlib.sha224(org.encode('utf-8')).hexdigest()
    return html.gen_html_text(org, cache_key)
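
A minimal, self-contained sketch of the backlink document idea, assuming the backlinks arrive as plain (source_id, title) tuples rather than Link rows. The names backlinks_to_org and cache_key_for are hypothetical and not part of arcology; the point is only that the cache key is derived from the generated text, so it changes whenever the set of backlinks does.

```python
import hashlib

def backlinks_to_org(backlinks):
    """Render (source_id, title) pairs as one org heading per backlink."""
    # Hypothetical stand-in for Page.make_backlinks_org: tuples instead of Link rows.
    return "\n".join(
        f"* [[id:{source_id}][{title}]]" for source_id, title in backlinks
    )

def cache_key_for(org_text):
    """Hash the generated text so the key follows the content."""
    return hashlib.sha224(org_text.encode("utf-8")).hexdigest()

one = backlinks_to_org([("abc-123", "Arroyo Systems Management")])
two = backlinks_to_org([
    ("abc-123", "Arroyo Systems Management"),
    ("def-456", "Topic Index"),
])

print(one)
print(cache_key_for(one) != cache_key_for(two))
```

Since the key is a hash of the rendered org text, adding or removing a backlink produces a new key without the page's own file hash having to change.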

Invoking Pandoc

Pandoc is used to generate the HTML for a page. It's a versatile kit and I do some fair bit to extend it in other places, for example in the

The HTML generation is done using PyPandoc, which I guess is just a shell wrapper around it. Caching is cheated with a functools.lru_cache; for this to work out well I need to bring the file's hash in to the arcology.arroyo.Page so that the cache can bust when the document is updated.

import functools
import pypandoc

@functools.lru_cache(maxsize=128)
def gen_html(input_path: str, extra_cache_key: str = '', input_format: str = 'org'):
    # extra_cache_key is unused in the body; it only takes part in the lru_cache key
    return pypandoc.convert_file(input_path, 'html', format=input_format)

@functools.lru_cache(maxsize=128)
def gen_html_text(input_text: str, extra_cache_key: str = '', input_format: str = 'org'):
    return pypandoc.convert_text(input_text, 'html', format=input_format)
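
The cache-busting trick can be seen in isolation with a stand-in for the Pandoc call. This is a sketch, not the arcology code: render and the calls list are hypothetical, and the hash values are made up.

```python
import functools

calls = []  # records every real (uncached) render

@functools.lru_cache(maxsize=128)
def render(path: str, extra_cache_key: str = "") -> str:
    # Hypothetical stand-in for the Pandoc call. extra_cache_key is unused in
    # the body and exists only to take part in the lru_cache lookup key.
    calls.append((path, extra_cache_key))
    return f"<p>rendered {path}</p>"

render("page.org", "hash-1")  # miss: does the work
render("page.org", "hash-1")  # hit: same arguments, nothing recorded
render("page.org", "hash-2")  # miss: the file hash changed

print(len(calls))
print(render.cache_info())
```

Because lru_cache keys on all arguments, passing the file hash along is enough to force a fresh render when the document changes, at the cost of stale entries lingering until they are evicted.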

Rewriting and Hydrating the Pandoc HTML

So the HTML that comes out of Pandoc is smart but doesn't understand, for example, ID links; I could of course use Emacs and its org-html-export-as-html but that shit is gonna be really slow. Instead I'll do the work myself (lol).

from arcology.parse import print_sexp, parse_sexp
import arcology.arroyo as arroyo

import sqlmodel
import re
from typing import Optional

from arcology.key import id_to_arcology_key, file_to_arcology_key

class HTMLRewriter():
  def __init__(self, session):
    self.res_404 = 'href="/404?missing={key}" class="dead-link"'
    self.session = session

  def replace(self, match):
    raise NotImplementedError()

  def re(self):
    raise NotImplementedError()

  def do(self, output_html):
    return re.sub(self.re(), self.replace, output_html)

Rewriting the HTML is a pretty straightforward affair using re.sub with callbacks rather than static replacements, with some abstraction sprinkled on top in the form of the HTMLRewriter superclass defined above. Each implementation of it provides a function which accepts the match object and pulls the node's ARCOLOGY_KEY, with an optional node-id anchor attached to it. This is then farmed out to arcology_key_to_url or so to be turned in to a URL. In this fashion, each href is replaced with a URL that will route to the target page, or a 404 page link with a CSS class attached.

I'm pretty sure this is all quite inefficient but as always I invoke Personal Software Can Be Shitty.
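
The callback style of re.sub described above looks roughly like this on its own. It is a sketch under assumptions: the KNOWN_IDS dict is a hypothetical stand-in for the database lookup, and the IDs and URL are invented.

```python
import re

# Hypothetical lookup table standing in for the database query.
KNOWN_IDS = {"abc-123": "/arroyo"}

def replace_id_link(match: re.Match) -> str:
    """Called once per match; the return value replaces the matched text."""
    node_id = match.group(1)
    url = KNOWN_IDS.get(node_id)
    if url is None:
        return f'href="/404?missing={node_id}" class="dead-link"'
    return f'class="internal" href="{url}"'

html_in = '<a href="id:abc-123">Arroyo</a> <a href="id:nope">Gone</a>'
html_out = re.sub(r'href="id:([^"]+)"', replace_id_link, html_in)
print(html_out)
```

The regex only matches the href attribute, so the rest of each anchor tag passes through untouched.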

So ID links can be rewritten like:

class IDReplacementRewriter(HTMLRewriter):
  def replace(self, match):
    id = match.group(1)
    key = id_to_arcology_key(id, self.session)
    if key is None:
      return self.res_404.format(key=id)
    else:
      return 'class="internal" href="{url}"'.format(url=arcology_key_to_url(key))

  def re(self):
    return r'href="id:([^"]+)"'

File links can be rewritten like:

class FileReplacementRewriter(HTMLRewriter):
  def replace(self, match):
    file = match.group(1)
    if file is None:
      return self.res_404.format(key=file)
    key = file_to_arcology_key(file, self.session)
    if key is None:
      return self.res_404.format(key=file)
    else:
      return 'class="file" href="{url}"'.format(url=arcology_key_to_url(key))

  def re(self):
    return r'href="file://([^"]+)"'

org-roam stub links can be rewritten like this. This one is a little wonky because res_404 and the other regexen only want to operate on the anchor's attributes, while this one also wants to strip the roam: text from the [[roam:Stub]] links.

class RoamReplacementRewriter(HTMLRewriter):
  def replace(self, match):
    return self.res_404.format(key=match.group(1)) + ">"

  def re(self):
    return r'href="roam:([^"]+)">roam:'

I also make some quality-of-life rewrites of my org-fc cloze cards in to simple <span> elements with the hint embedded in them.

class FCClozeReplacementRewriter(HTMLRewriter):
  def replace(self, match):
    main = match.group(1) or ""
    hint = match.group(2) or ""
    hint = re.sub(r"</?[^>]+>", "", hint)
    return f"<span class='fc-cloze' title='{hint}'>{main}</span>"
  
  def re(self):
    return r'{{([^}]+)}{?([^}]+)?}?@[0-9]+}'
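
To see what that regex does, here is the same pattern run over two invented sample strings, one with a hint and one without. This is a standalone sketch: cloze_to_span is a hypothetical free function mirroring the replace method, minus the tag stripping on the hint.

```python
import re

# Same pattern as the rewriter: {{main}{hint}@N} with the hint part optional.
CLOZE_RE = r'{{([^}]+)}{?([^}]+)?}?@[0-9]+}'

def cloze_to_span(match: re.Match) -> str:
    main = match.group(1) or ""
    hint = match.group(2) or ""  # group 2 is None when there is no hint
    return f"<span class='fc-cloze' title='{hint}'>{main}</span>"

with_hint = re.sub(
    CLOZE_RE, cloze_to_span, "The {{mitochondria}{organelle}@0} makes ATP."
)
without_hint = re.sub(CLOZE_RE, cloze_to_span, "The capital is {{Paris}@1}.")

print(with_hint)
print(without_hint)
```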

Invoke all these in a simple little harness:

def rewrite_html(input_html: str, session: sqlmodel.Session) -> str:
  """
  Run a series of replacement functions on the input HTML and return a new string.
  """

  output_html = input_html

  rewriters = [
    IDReplacementRewriter(session),
    FileReplacementRewriter(session),
    RoamReplacementRewriter(session),
    FCClozeReplacementRewriter(session),
  ]

  for rewriter in rewriters:
    output_html = rewriter.do(output_html)

  return output_html

It's logical that at some point this will have a "pluggable" URL engine, and in fact the production URLs will be hosted under different domains so deconstructing a URL to an ARCOLOGY_KEY … all of this can happen later, I am just playing jazz right now!

from arcology.key import ArcologyKey

def arcology_key_to_url(key: ArcologyKey) -> str:
  return key.to_url()

arcology.key.ArcologyKey encapsulates parsing and rendering URLs

The ArcologyKey is a simple dataclass encapsulating the things which the ARCOLOGY_KEY page keyword represents.

For example the key ArcologyKey(key=arcology/arroyo#arcology/arroyo/key) will contain the following properties:

  • key: the key passed in
  • site_key: this is everything up to the first slash. It points to objects defined and fetchable through Arcology Sites.
  • site: I typed the line above, and said "oh", and added this resolution of the arcology.sites.Site object.
  • rest: "rest" is everything after the slash, but up to an optional anchor.
  • anchor_id: said optional anchor. Pandoc headings within the page will have the ID property as the anchor, which is handy!
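
The split described by the list above can be sketched with str.partition. This is a simplification and not the ArcologyKey implementation: KeyParts and split_key are hypothetical names, there is no Site lookup, and a second # stays part of the anchor.

```python
from typing import NamedTuple, Optional

class KeyParts(NamedTuple):
    site_key: str
    rest: str
    anchor_id: Optional[str]

def split_key(key: str) -> KeyParts:
    # Everything before the first "/" names the site.
    site_key, _, remainder = key.partition("/")
    # Everything after the first "#" in the remainder is the anchor, if any.
    rest, has_anchor, anchor = remainder.partition("#")
    return KeyParts(site_key, rest, anchor if has_anchor else None)

print(split_key("arcology/arroyo#arcology/arroyo/key"))
print(split_key("doc/archive"))
```

Note that slashes inside the anchor are left alone, since only the first / is treated as the site separator.
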
from dataclasses import dataclass
from typing import Optional

from fastapi import Request
from starlette import routing
import sqlmodel

from arcology.parse import parse_sexp, print_sexp
from arcology.sites import sites, Site
from arcology.config import get_settings, Environment
from arcology.sites import host_to_site

route_regexp, _, _ = routing.compile_path("/{sub_key:path}/")
route_regexp2, _, _ = routing.compile_path("/{sub_key:path}")

import logging
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

@dataclass
class ArcologyKey():
  key: str
  site_key: str
  site: Site
  rest: str = ""
  anchor_id: Optional[str] = None

  def __init__(self, key: str, site_key="", rest="", anchor_id = None):
    self.key = key
    self.site_key=site_key
    self.rest = rest
    self.anchor_id = anchor_id
    
    stop = '/'
    idx = 0
    collector = [""]
    for char in key:
      if char == stop:
        stop = '#'
        idx += 1
        collector = collector + [""]
        continue
      collector[idx] += char

    if len(collector) > 0:
      self.site_key = collector[0]
      self.site = sites.get(self.site_key, None)
    if len(collector) > 1:
      self.rest = collector[1]
    if len(collector) > 2:
      self.anchor_id = collector[2]

  def to_url(self) -> str:
    env = get_settings().arcology_env
    domains = self.site.domains.get(env, None)

    url = ""
    if domains is not None:
      url = "https://{domain}/{rest}".format(domain=domains[0], rest=self.rest)
    else:
      url = "http://localhost:8000/{key}".format(key=self.key)
    if self.anchor_id is not None:
      url = url + "#" + self.anchor_id

    return url

  @staticmethod
  def from_request(request: Request):
    path = request.url.path
    host = request.headers.get('host')
    return ArcologyKey.from_host_and_path(host, path)

  @staticmethod
  def from_host_and_path(host: str, path: str):
    m = route_regexp.match(path) or route_regexp2.match(path) or None
    if m is None:
      logger.debug("no path match: %s", path)
      return None
    sub_key = m.group("sub_key")

    site = host_to_site(host)
    if site is None:
      logger.debug("no host match: %s", host)
      return None

    if len(sub_key) == 0:
      sub_key = "index"
    key = "{site_key}/{sub_key}".format(
      site_key=site.key,
      sub_key=sub_key,
    )
    return ArcologyKey(key)

Retrieving the ARCOLOGY_KEY given an ID is a pretty straightforward SQLModel query, actually. If the referenced node is in the Arroyo database, by definition it's got a published arcology.arroyo.Page, and so it's a matter of going and fetching it. If the Node is the root node (a direct link to the document), simply return the key, otherwise append the node-id to it so that a URL can link directly to the heading's anchor.

def id_to_arcology_key(id: str, session: sqlmodel.Session) -> Optional[ArcologyKey]:
  """
  Given a node ID, return the ARCOLOGY_KEY for the node.
  """
  from .arroyo import Node

  linked_node_query = sqlmodel.select(Node) \
                              .where(Node.node_id==print_sexp(id))
  res = session.exec(linked_node_query)

  linked_node = res.all()
  if len(linked_node) == 1:
    linked_node = linked_node[0]
    linked_page = linked_node.page

    if linked_page is None:
      return None

    page_key = parse_sexp(linked_page.key)
    ret = ArcologyKey(key=page_key)
    if linked_node.level != 0:
      ret.anchor_id = id
    return ret

  elif len(linked_node) != 0:
    raise Exception(f"more than one key for node? {id}")
  else:
    return None
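
The same lookup can be sketched against plain sqlite3 to show the root-versus-heading rule without SQLModel. Everything here is an assumption for illustration: the two tables are cut down to the columns that matter, values are stored as plain strings instead of printed s-expressions, and key_for_id is a hypothetical helper returning a string rather than an ArcologyKey.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table arcology_pages (file text primary key, key text);
    create table arcology_nodes (node_id text primary key, file text, level integer);
    insert into arcology_pages values ('/org/arroyo.org', 'arcology/arroyo');
    insert into arcology_nodes values ('root-id', '/org/arroyo.org', 0);
    insert into arcology_nodes values ('heading-id', '/org/arroyo.org', 2);
""")

def key_for_id(node_id: str):
    row = conn.execute(
        "select p.key, n.level from arcology_nodes n"
        " join arcology_pages p on p.file = n.file where n.node_id = ?",
        (node_id,),
    ).fetchone()
    if row is None:
        return None  # unknown node: the caller renders a dead link
    key, level = row
    # Root nodes link to the page itself; headings get their ID as an anchor.
    return key if level == 0 else f"{key}#{node_id}"

print(key_for_id("root-id"))
print(key_for_id("heading-id"))
print(key_for_id("missing"))
```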

Looking up by file is even simpler:

def file_to_arcology_key(file: str, session: sqlmodel.Session) -> Optional[ArcologyKey]:
  """
  Given a file path, return the ARCOLOGY_KEY for the page generated from it.
  """
  from .arroyo import Page
  key_q = sqlmodel.select(Page).where(Page.file == print_sexp(file))
  page = session.exec(key_q).first()

  if page is None:
    return None
  page_key = parse_sexp(page.key)
  return ArcologyKey(key=page_key)

NEXT HTML should inject sidenotes in during rewrite_html?

this would be slow and maybe janky but that's probably fine once it's memoized.

but this would mean that node backlinks would appear in-line, things like Topic Index have some trouble otherwise.

Arcology Tags

class Tag(SQLModel, table=True):
  __tablename__ = "arcology_tags"
  file: str = Field(primary_key=True, foreign_key="arcology_pages.file")
  tag: str = Field(primary_key=True, description="The tag itself.")
  node_id: str = Field(description="A heading ID which the tag applies to")

  def get_tag(self):
    # named get_tag so the method does not shadow the `tag` column above
    return parse_sexp(self.tag)

A page has any number of tags according to the file primary key:

(add-to-list 'arroyo-db--schemata
             '(arcology-tags
               [(file :not-null)
                (tag :not-null)
                (node-id :not-null)]))

(defun arroyo-arcology--insert-tags (file node-tags)
  (arroyo-db-query [:delete :from arcology-tags
                    :where (= file $s1)]
                   file)
  (pcase-dolist (`(,tag ,node-id) node-tags)
    (arroyo-db-query [:insert :into arcology-tags
                      :values $v1]
                     (vector file tag node-id))))

Arcology Links

And for rewriting the links to point to their routing key, two tables:

A links table which contains the file and node ID references, as well as the title of the source file which can be used to quickly generate backlink listings for a given page (and its sub-heading nodes):

class Link(SQLModel, table=True):
  __tablename__ = "arcology_links"
  source_title: Optional[str] = Field(default="", description="The title of the page the link is written in.")

  def get_source_title(self):
    return parse_sexp(self.source_title)

  source_id: str = Field(primary_key=True, foreign_key="arcology_nodes.node_id")
  source_node: Optional["Node"] = Relationship(
    sa_relationship_kwargs=dict(
      # back_populates="outlinks",
      primaryjoin="Node.node_id == Link.source_id"
    )
  )

  dest_id:   str = Field(primary_key=True, foreign_key="arcology_nodes.node_id")
  dest_node: Optional["Node"] = Relationship(
    sa_relationship_kwargs=dict(
      # back_populates="backlinks",
      primaryjoin="Node.node_id == Link.dest_id"
    )
  )

  source_file: str = Field(primary_key=True, foreign_key="arcology_pages.file")
  source_page: Optional["Page"] = Relationship(
    back_populates="outlinks",
    sa_relationship_kwargs=dict(
      primaryjoin="Page.file==Link.source_file"
    )
  )

  dest_file: str = Field(primary_key=True, foreign_key="arcology_pages.file")
  dest_page: Optional["Page"] = Relationship(
    back_populates="backlinks",
    sa_relationship_kwargs=dict(
      primaryjoin="Page.file==Link.dest_file"
    )
  )
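
As a rough illustration of why source_title lives on the link row, a backlink listing becomes a single-table query. This sketch uses plain sqlite3 with invented rows and unquoted string values; the column order follows the arcology-links schema, and backlink_titles is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table arcology_links (
        source_title text, source_file text, source_id text,
        dest_file text, dest_id text
    );
    insert into arcology_links values
        ('Arroyo -> Database', '/org/arroyo.org', 'n1', '/org/arcology.org', 'n9'),
        ('Topic Index', '/org/index.org', 'n2', '/org/arcology.org', 'n9'),
        ('Topic Index', '/org/index.org', 'n2', '/org/arroyo.org', 'n1');
""")

def backlink_titles(dest_file: str):
    # No join against pages or nodes is needed: the title is already on the row.
    rows = conn.execute(
        "select source_title from arcology_links"
        " where dest_file = ? order by source_title",
        (dest_file,),
    )
    return [title for (title,) in rows]

print(backlink_titles("/org/arcology.org"))
```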

Links in the org-roam database have a useful type column. We only store ID Links for now… probably can support file links easily enough but other "unidirectional" links I would like to store elsewhere I think.

(add-to-list 'arroyo-db--schemata
             '(arcology-links
               [source-title
                (source-file :not-null)
                (source-id :not-null)
                (dest-file :not-null)
                (dest-id :not-null)]))

(defun arcology--published-page? (file)
  (not (not (arroyo-db-get "ARCOLOGY_KEY" file))))

(defun arroyo-arcology--insert-links (file source-title links)
  (arroyo-db-query [:delete :from arcology-links
                    :where (= source-file $s1)]
                   file)
  (pcase-dolist (`(,source ,dest ,type ,props) links)
    (cond ((equal type "id")
           (pcase-let* ((dest-file (caar (org-roam-db-query
                                    [:select file :from nodes
                                     :where (= id $s1)]
                                    dest)))
                        (`(,immediate-source-title ,immediate-source-level)
                         (car (org-roam-db-query
                               [:select [title level] :from nodes
                                 :where (= id $s1)]
                                source)))
                        ;; "level 0 -> level n" unless n == 0
                        (composed-node-title
                         (if (= 0 immediate-source-level)
                             source-title
                             (concat source-title " -> " immediate-source-title))))
             (when (and dest-file (arcology--published-page? dest-file))
               (arroyo-db-query [:insert :into arcology-links
                                 :values $v1]
                                (vector composed-node-title file source dest-file dest)))))
          ;; insert https link?
          ((equal type "https") nil)
          ((equal type "http") nil)
          ((equal type "roam") nil)
          (t nil))))

INPROGRESS source_title should populate with the immediate parent header's title, not level 0

  • State "INPROGRESS" from "NEXT" [2022-08-05 Fri 14:03]

It's passed in to arroyo-arcology--insert-links below. Not sure of a better way to do that; query org-roam-db in the insert function itself? Good enough for now, prolly.

Deal with the title being fetched and populated in that function below if necessary.

Arcology Nodes

A nodes table will help in reassembling links in to HREFs, in theory, but I don't think it's necessary? Maybe? There is a bunch of other metadata on this that I would like to pull across from org-roam eventually.

class Node(SQLModel, table=True):
  __tablename__ = "arcology_nodes"
  node_id: str = Field(primary_key=True, description="The heading ID property")
  file: str = Field(description="File in which this Node appears", foreign_key="arcology_pages.file")
  level: int = Field(description="Outline depth of the heading. 0 is top-level")

  page: Optional["Page"] = Relationship(
    back_populates="nodes",
    sa_relationship_kwargs=dict(
      viewonly=True,
      primaryjoin="Node.file==Page.file"
    )
  )

(add-to-list 'arroyo-db--schemata
             '(arcology-nodes
               [(node-id :not-null)
                (file :not-null)
                (level :not-null)]))

(defun arroyo-arcology--insert-nodes (file nodes)
  (arroyo-db-query [:delete :from arcology-nodes
                    :where (= file $s1)]
                   file)
  (pcase-dolist (`(,file ,id ,level) nodes)
    (arroyo-db-query [:insert :into arcology-nodes
                      :values $v1]
                     (vector id file level))))

Arcology References

Each org-roam node can have a set of "references" attached to it; I use these URIs to point to a "canonical" resource which the node is referencing.

class Reference(SQLModel, table=True):
  __tablename__ = "arcology_refs"
  file: str = Field(primary_key=True, foreign_key="arcology_pages.file")
  ref: str = Field(primary_key=True, description="The full URI of the reference itself.")
  node_id: str = Field(description="A heading ID which the ref applies to")

  def url(self):
    return parse_sexp(self.ref)

A page has any number of refs according to the file primary key:

(add-to-list 'arroyo-db--schemata
             '(arcology-refs
               [(file :not-null)
                (ref :not-null)
                (node-id :not-null)]))

(defun arroyo-arcology--insert-refs (file node-refs)
  (arroyo-db-query [:delete :from arcology-refs
                    :where (= file $s1)]
                   file)
  (pcase-dolist (`(,ref ,type ,node-id) node-refs)
    (arroyo-db-query [:insert :into arcology-refs
                      :values $v1]
                     (vector file (format "%s:%s" type ref) node-id))))

INPROGRESS Arcology Feeds

  • State "INPROGRESS" from [2023-01-24 Tue 23:33]

class Feed(SQLModel, table=True):
  __tablename__ = "arcology_feeds"
  file: str = Field(primary_key=True, foreign_key="arcology_pages.file")
  key: str = Field(primary_key=True, description="The routing key for the feed.")
  title: str = Field(description="Title of the page which the feed is embedded in")
  site: str = Field(description="Arcology Site which the feed resides on.")
  post_visibility: str = Field(description="Visibility of the feed's posts in feed2toot, etc")

  def get_key(self):
    return parse_sexp(self.key)

  def get_arcology_key(self):
    return ArcologyKey(self.get_key())

  def get_title(self):
    return parse_sexp(self.title)

  def get_site(self):
    return parse_sexp(self.site)

  def get_post_visibility(self):
    return parse_sexp(self.post_visibility)

  def dict(self, **kwargs):
    return dict(
      key=self.get_key(),
      url=self.get_arcology_key().to_url(),
      title=self.get_title(),
      site=self.get_site(),
      visibility=self.get_post_visibility(),
    )

A page has any number of feeds according to the file primary key:

(add-to-list 'arroyo-db--schemata
             '(arcology-feeds
               [(file :not-null)
                (key :not-null)
                (title :not-null)
                (site :not-null)
                (post-visibility :not-null)]))

(defun arroyo-arcology--insert-feeds (file)
  (arroyo-db-query [:delete :from arcology-feeds
                    :where (= file $s1)]
                   file)
  (if-let* ((key (car (arroyo-db-get "ARCOLOGY_FEED" file)))
            (site (replace-regexp-in-string "/.*" "" key)))
      (let* ((title (arroyo-db--get-file-title-from-org-roam file))
             (post-visibility (car (arroyo-db-get "ARCOLOGY_TOOT_VISIBILITY" file))))
        (arroyo-db-query [:insert :into arcology-feeds
                          :values $v1]
                         (vector file key title site post-visibility)))))

Arcology Keywords

All of these models are generated below from the ARCOLOGY_KEY entities embedded on each page. These are Keywords: a 3-tuple of file, keyword, and value (a threeple).

class Keyword(SQLModel, table=True):
  __tablename__ = "keywords"
  file: str = Field(primary_key=True, foreign_key="arcology_pages.file")
  keyword: str = Field(primary_key=True, description="")
  value: str = Field(description="The value of the keyword")

  def filename(self):
    return parse_sexp(self.file)

  # get_ prefixes keep these accessors from shadowing the keyword and value columns
  def get_keyword(self):
    return parse_sexp(self.keyword)

  def get_value(self):
    return parse_sexp(self.value)

  @classmethod
  def get(cls, key: str, value: str, session: Session):
    q = select(cls).where(cls.keyword==print_sexp(key)).where(cls.value==print_sexp(value))
    try:
      return next(session.exec(q))
    except StopIteration:
      return None

Arcology Arroyo System Database Generator

Putting all those update functions together in an arroyo-db update function. This has to run after the org-roam and Arroyo System Cache keyword database is built; this is annoying and I need to rethink it.

(defun arroyo-arcology-update-file (&optional file)
  (interactive)
  (when-let* ((file (or file (buffer-file-name)))
              (page-keyword (first (arroyo-db-get "ARCOLOGY_KEY" file)))
              (site-key (first (split-string page-keyword "/")))
              (page-nodes (org-roam-db-query [:select [file id level title] :from nodes
                                              :where (= file $s1)]
                                             file))
              (file-hash (caar (org-roam-db-query [:select [hash] :from files :where (= file $s1)]
                                                  file)))
              (page-node-ids (apply #'vector (--map (second it) page-nodes)))
              (level-0-node (--first (eq 0 (third it)) page-nodes))
              (level-0-id (elt level-0-node 1)) 
              (level-0-title (elt level-0-node 3)))
                                        ; remove the map here -- there will only ever be one level-0 node hopefully but this is hard to understand
    (let* ((allow-crawl (first (arroyo-db-get "ARCOLOGY_ALLOW_CRAWL" file)))
           (allow-crawl (and allow-crawl
                             (not (equal allow-crawl "nil")))) ; make sure writing "nil" in the key is respected
           (all-node-refs (org-roam-db-query [:select [ref type node_id] :from refs
                                                      :where (in node_id $v1)]
                                             page-node-ids))
           (all-node-tags (org-roam-db-query [:select [tag node_id] :from tags
                                                      :where (in node_id $v1)]
                                             page-node-ids))
           (links (org-roam-db-query [:select [source dest type properties] :from links
                                              :where (in source $v1)]
                                     page-node-ids)))
      (arroyo-arcology--insert-page file page-keyword site-key level-0-title level-0-id allow-crawl file-hash)
      (arroyo-arcology--insert-nodes file page-nodes)
      (arroyo-arcology--insert-tags file all-node-tags)
      (arroyo-arcology--insert-refs file all-node-refs)
      (arroyo-arcology--insert-feeds file)
      (arroyo-arcology--insert-links file level-0-title links))))

(defun arroyo-arcology-update-db (&optional _wut)
  (interactive)
  (->>
   (arroyo-db-get "ARCOLOGY_KEY")
   (-map #'car)
   (-uniq)
   ;; this runs *after* db is updated... what to do here?
   ;; (-filter #'arroyo-db-file-updated-p)
   (-map #'arroyo-arcology-update-file)
   )
  )

(add-function :after (symbol-function 'arroyo-db-update-all-roam-files) #'arroyo-arcology-update-db)
;; (add-to-list 'arroyo-db-update-functions #'arroyo-arcology-update-file)

(provide 'arroyo-arcology)

Arcology SQLModel Database Bindings

The engine looks like this, and it's pretty easy to attach my org-roam database here using the SQLAlchemy Events System. You can munge a SQLModel's __table__.schema to query and map against the org-roam metadatabase.

from sqlmodel import create_engine
from sqlalchemy import event

from arcology.config import get_settings

from pathlib import Path

settings = get_settings()
org_roam_sqlite_file_name = Path(settings.org_roam_db).expanduser().resolve()
arroyo_sqlite_file_name = Path(settings.arcology_db).expanduser().resolve()

def make_engine():
    engine = create_engine('sqlite:///{path}'.format(path=arroyo_sqlite_file_name), echo=False)

    @event.listens_for(engine, "connect")
    def do_connect(dbapi_connection, _connection_record):
        dbapi_connection.execute("attach database '{orgdb}' as orgroam;".format(orgdb=org_roam_sqlite_file_name))

    return engine


engine = make_engine()
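
What the attach statement buys can be shown with the standard library alone. In this sketch a second in-memory database stands in for the real org-roam file, and the table layouts and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attach a second database under the schema name "orgroam"; a second
# in-memory database stands in for the real org-roam file here.
conn.execute("attach database ':memory:' as orgroam")
conn.execute("create table arcology_pages (file text, key text)")
conn.execute("create table orgroam.nodes (id text, file text, title text)")
conn.execute("insert into arcology_pages values ('/org/arroyo.org', 'arcology/arroyo')")
conn.execute("insert into orgroam.nodes values ('root-id', '/org/arroyo.org', 'Arroyo')")

# Once attached, tables from both databases can be joined in one query.
rows = conn.execute(
    "select p.key, n.title from arcology_pages p"
    " join orgroam.nodes n on n.file = p.file"
).fetchall()
print(rows)
```

Tables in the attached database are addressed with the orgroam. prefix, which is why setting a model's schema to that name is enough to map against them.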

An interactive testing session could look like this, and indeed C-c C-c in here will run it in an Inferior Python session:

from sqlmodel import select, SQLModel, Session

import arcology.arroyo as arroyo
from arcology.parse import *

engine = arroyo.engine
session = Session(engine)

first_link = next(session.exec(select(arroyo.Link)))

from_file = arroyo.Page.from_file("/home/rrix/org/arroyo/arroyo.org", session)
from_key = arroyo.Page.from_key("doc/archive", session)

ht = await from_key.document_html()

Invoking the Arroyo generator from Python

Since the Arcology Arroyo System is written in Emacs Lisp, it's not exactly simple to update the database. When implemented as part of a long-running user-controlled Emacs environment, Arroyo uses Emacs's Hooks to update the database when org-mode files change.

Instead of doing that, we find ourselves implementing some scaffolding to replace it:

Org-mode files are put on the server with Syncthing

"Batch" commands for running Emacs with the Arroyo generators from a shell

This little Emacs Lisp script sets up some of the minimal CCE scaffolding to make the Arroyo-DB functions available to an environment.

(unless (boundp 'org-roam-directory)
  (setq org-roam-directory (file-truename "~/org/")))

(load-file (expand-file-name "cce/packaging.el" org-roam-directory))

(add-to-list 'load-path default-directory)
(add-to-list 'load-path arroyo-source-directory)

(use-package dash)
(use-package f)
(use-package s)
(use-package emacsql)
;; (use-package emacsql-sqlite3)
(require 'subr-x)
(require 'cl)

(require 'org-roam)
(require 'arroyo-db)
(require 'arroyo-arcology)

That script is loaded by this script which isn't a script, but a template for a Python module so that the locations and variables can be customized at run time, loaded from the Arcology BaseSettings.

(lord help me)

set -ex
export DBPATH=$(mktemp $(dirname {arcology_db})/arcology.XXXXXXXXXX.db)
pushd {arcology_src};

cp {arcology_db} $DBPATH || echo "no existing db found, will be created from scratch"
{emacs} -Q --batch \
      --eval '(setq org-roam-directory "{arcology_dir}")' \
      --eval '(setq arcology-source-directory "{arcology_src}/lisp")' \
      --eval '(setq arroyo-source-directory "{arroyo_src}")' \
      --eval '(setq arroyo-db-location "'$DBPATH'")' \
      --eval '(setq org-roam-db-location "{org_roam_db}")' \
      -l lisp/arcology-batch.el \
      --eval '(org-roam-db-sync)' # \
# --eval '(arroyo-db-update-all-roam-files)' \
# --eval '(arroyo-db-update-all-roam-files)' \
# --eval '(arroyo-arcology-update-db)'

mv $DBPATH {arcology_db}
echo "rebuild done"

The Python extracts stuff from that FastAPI/Pydantic BaseSettings module and templates it in with format(). Sorry for Literate Programming (sorry for party rocking).

from .config import get_settings

COMMAND_TMPL = """
<<arcology-batch-shell>>
"""

def build_command():
   settings = get_settings()

   return COMMAND_TMPL.format(
      arcology_dir = settings.arcology_directory,
      arcology_src = settings.arcology_src,
      arroyo_src = settings.arroyo_src,
      arcology_db = settings.arcology_db,
      org_roam_db = settings.org_roam_db,
      emacs = settings.arroyo_emacs,
   )
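
A cut-down sketch of the same templating, with an invented two-line template and made-up paths, shows why this works for a shell script: str.format only touches {name} fields, so $DBPATH passes through untouched. Any literal braces the shell needs, such as ${HOME}, would have to be doubled.

```python
# Hypothetical, cut-down stand-in for the batch script template.
TEMPLATE = """
cp {arcology_db} "$DBPATH"
{emacs} -Q --batch --eval '(setq org-roam-directory "{arcology_dir}")'
"""

command = TEMPLATE.format(
    arcology_db="/data/arcology.db",
    emacs="/usr/bin/emacs",
    arcology_dir="/data/org",
)
print(command)

# Doubled braces survive format() as single literal braces.
escaped = "echo ${{HOME}}".format()
print(escaped)
```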

This is executed by Arcology Automated Database Builder.