:PROPERTIES:
:ID:       arcology/arroyo-page
:END:
#+TITLE: Arroyo Arcology Generator
#+filetags: :Project:Arcology:
#+ARCOLOGY_KEY: arcology/arroyo
#+ARCOLOGY_ALLOW_CRAWL: t
#+AUTO_TANGLE: t

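[[shell:ln -s $(pwd)/arroyo-arcology.el ~/org/cce/arroyo-arcology.el]] links the tangled file into the CCE directory, which is where [[id:arroyo/emacs][Arroyo Emacs]] needs it to be to load it automatically. The target is given as an absolute path because a relative symlink target resolves against the link's own directory.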

This can be set up to load automatically in an [[id:arroyo/emacs][Arroyo Emacs]] environment:
#+ARROYO_EMACS_MODULE: arroyo-arcology
#+ARROYO_MODULE_WANTS: arroyo/arroyo.org

The Arcology is fundamentally about rendering and sharing entire org-mode documents on the web. This made direct use of [[id:cce/org-roam][org-roam]]'s database a pretty straightforward endeavor, until the migration to a Node-centered model with org-roam v2. That model has made my note-taking much better, but it's forced me to rethink the data model of the Arcology pretty significantly.

This ultimately developed over 2021 into [[id:arroyo/arroyo][Arroyo Systems Management]] -- a set of sidecar metadata tables for my notes and the [[id:128ab0e8-a1c7-48bf-9efe-0c23ce906a48][org-mode meta applications]] built on top of them. The Arcology's database is a set of tables derived from the metadata in my org-mode files. It is generated inside of Emacs and mounted read-only by my FastAPI session via [[id:20210925T182140.388493][SQLModel]]. I would love to generate this database another way, but there is still only one high-quality org parser: org-mode.

The "entry point" of this API is the =arcology.arroyo.Page= below. It has some class methods hanging off it which can instantiate Pages from the database by filename or routing key.

A page doesn't really require much metadata to render or be found: the org-mode source file, its =ARCOLOGY_KEY= routing key, and the primary ID of the root [[id:20211203T142533.902422][arcology.roam.Node]] object. Most of this can be gleaned from the [[id:20211203T142617.812313][arcology.roam.File]] object and my Keyword sidecar.

#+begin_src emacs-lisp
(add-to-list 'arroyo-db-keywords "ARCOLOGY_KEY")
(add-to-list 'arroyo-db-keywords "ARCOLOGY_FEED")
(add-to-list 'arroyo-db-keywords "ARCOLOGY_TOOT_VISIBILITY")
(add-to-list 'arroyo-db-keywords "ARCOLOGY_ALLOW_CRAWL")
#+end_src

The =ARCOLOGY_KEY= is a file keyword which contains the page's "routing key": a string with at least one =/= in it, separating the site it'll publish to from the path it'll be published on. Locally this maps to a URL of the form =localhost:8000/$ARCOLOGY_KEY=; on the public sites, the first part instead selects one of the public domains and the rest becomes the path. This will make more sense later on.

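The =ARCOLOGY_FEED= is a file keyword which contains a routing key for an RSS feed. =ARCOLOGY_TOOT_VISIBILITY= sets the visibility of that feed's posts when they are tooted via feed2toot, and =ARCOLOGY_ALLOW_CRAWL= is a Lisp boolean for whether the page should go in =robots.txt=.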
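To make the routing-key mapping concrete, here is a minimal sketch in Python. It is not the Arcology's code: =routing_key_to_url= and its =domains= table (site key to public domain) are hypothetical stand-ins for the [[id:20211219T144255.001827][Arcology Sites]] lookup, and the =localhost:8000= fallback just mirrors what =ArcologyKey.to_url= does further down.

#+begin_src python
from typing import Dict, Optional


def routing_key_to_url(key: str, domains: Optional[Dict[str, str]] = None) -> str:
    """Map an ARCOLOGY_KEY such as "arcology/arroyo" to a URL (illustrative only)."""
    site_key, slash, path = key.partition("/")
    if not slash:
        # A routing key needs at least one "/" between the site and the path.
        raise ValueError(f"routing key has no '/': {key!r}")
    # Hypothetical site-key -> public-domain table; unknown sites fall back to local.
    domain = (domains or {}).get(site_key)
    if domain is None:
        # Local development: the whole key becomes the path on the local server.
        return f"http://localhost:8000/{key}"
    # Public site: the site part picks the domain, the rest becomes the path.
    return f"https://{domain}/{path}"
#+end_src

So =routing_key_to_url("arcology/arroyo")= gives =http://localhost:8000/arcology/arroyo=, while passing a made-up ={"arcology": "example.com"}= table gives =https://example.com/arroyo=.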

#+PROPERTY: header-args:emacs-lisp :tangle arroyo-arcology.el :results none :mkdirp yes :comments link

This is assembled using [[id:09779ac0-4d5f-40db-a340-49595c717e03][noweb syntax]] because Page relies on Link being defined for the =link_model= relationship. Some more code for setting up the session and engine also makes it into =arcology.arroyo=; that lives down below under [[id:arcology/arroyo/sqlmodel][Arcology SQLModel Database Bindings]].

#+begin_src python :tangle arcology/arroyo.py :noweb yes
from typing import Optional, List
from sqlmodel import Field, Relationship, SQLModel

from arcology.parse import parse_sexp, print_sexp

<<arcology.arroyo.Link>>
<<arcology.arroyo.Page>>
<<arcology.arroyo.Tag>>
<<arcology.arroyo.Node>>
<<arcology.arroyo.Ref>>
<<arcology.arroyo.Keyword>>
<<arcology.arroyo.Feed>>
#+end_src

Anyways.

* NEXT document schemas

explain inter-relations between these classes, maybe a relationship graph

explain columns and link to where specialized columns like =allow_crawl= go and come from?

* Arcology Page
:PROPERTIES:
:ID:       arcology/arroyo/page
:ROAM_ALIASES: arcology.arroyo.Page
:END:

A Page represents the minimal metadata required to find and render an [[id:1fb8fb45-fac5-4449-a347-d55118bb377e][org-mode]] document and generate links to it. I would love to someday not have to wire up all these relationships by hand, and I'll have to remodel this at some point, but for now spelling out every =primaryjoin= condition explicitly is enough.

#+NAME: arcology.arroyo.Page
#+begin_src python :noweb yes
from sqlmodel import Session, select
import hashlib

from arcology.key import ArcologyKey, id_to_arcology_key
import arcology.html as html

class Page(SQLModel, table=True):
  __tablename__ = "arcology_pages"
  file: str = Field(primary_key=True)
  key: str = Field(description="The ARCOLOGY_KEY for the page")
  title: str = Field(description="Primary title of the page")
  hash: str = Field(description="The hash of the file when it was indexed")
  root_id: str = Field(description="The ID for the page itself", foreign_key="arcology_nodes.node_id")
  site: str = Field(description="Maps to an arcology.Site key.")
  allow_crawl: str = Field(description="Lisp boolean for whether this page should go in robots.txt")

  nodes: List["Node"] = Relationship(
    back_populates="page",
    sa_relationship_kwargs=dict(
      primaryjoin="Node.file==Page.file"
    )
  )
  tags: List["Tag"] = Relationship(
    sa_relationship_kwargs=dict(
      primaryjoin="Tag.file==Page.file"
    )
  )
  references: List["Reference"] = Relationship(
    sa_relationship_kwargs=dict(
      primaryjoin="Reference.file==Page.file"
    )
  )

  def get_title(self):
    return parse_sexp(self.title)

  def get_key(self):
    return parse_sexp(self.key)

  def get_file(self):
    return parse_sexp(self.file)

  def get_arcology_key(self):
    return ArcologyKey(self.get_key())

  def get_site(self):
    return self.get_arcology_key().site

  <<page_link_relationships>>
  <<page_classmethods>>
  <<page_html_generators>>
#+end_src

#+NAME: page_classmethods
#+begin_src python
@classmethod
def from_file(cls, path: str, session: Session):
  q = select(cls).where(cls.file==print_sexp(path))
  return session.exec(q).one()

@classmethod
def from_key(cls, key: str, session: Session):
  q = select(cls).where(cls.key==print_sexp(key))
  # first() returns None when no page has this routing key
  return session.exec(q).first()
#+end_src
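The =print_sexp= calls in those =where= clauses exist because the database is written from Emacs via emacsql, which (as far as I can tell) stores each value in its printed Lisp form: a string column holds ="arcology/arroyo"= with the double quotes included. Here is a strings-only sketch of that round trip using Python's =sqlite3=; =print_string_sexp= and =parse_string_sexp= are hypothetical simplifications of =arcology.parse=, which handles more than strings.

#+begin_src python
import sqlite3


def print_string_sexp(value: str) -> str:
    """Quote VALUE the way Emacs prints a string, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def parse_string_sexp(stored: str) -> str:
    """Inverse of print_string_sexp; strings only, no other Lisp types."""
    if len(stored) < 2 or not (stored.startswith('"') and stored.endswith('"')):
        raise ValueError(f"not a printed Lisp string: {stored!r}")
    body, out, i = stored[1:-1], [], 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            i += 1  # skip the backslash, keep the escaped character
        out.append(body[i])
        i += 1
    return "".join(out)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE arcology_pages (file TEXT PRIMARY KEY, key TEXT)")
conn.execute(
    "INSERT INTO arcology_pages VALUES (?, ?)",
    (print_string_sexp("arcology-arroyo.org"), print_string_sexp("arcology/arroyo")),
)

query = "SELECT file FROM arcology_pages WHERE key = ?"
# The bare string doesn't match, because the column holds the quoted form...
bare = conn.execute(query, ("arcology/arroyo",)).fetchone()
# ...while the printed form does, which is the job print_sexp does above.
row = conn.execute(query, (print_string_sexp("arcology/arroyo"),)).fetchone()
#+end_src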

The Page carries bi-directional link relationships, both to the Link rows themselves and, through them, to the Page on the other side of each link.

#+NAME: page_link_relationships
#+begin_src python
backlinks: List["Link"] = Relationship(
    back_populates="dest_page",
    sa_relationship_kwargs=dict(
        primaryjoin="Page.file==Link.dest_file"
    )
)

outlinks: List["Link"] = Relationship(
    back_populates="source_page",
    sa_relationship_kwargs=dict(
        primaryjoin="Page.file==Link.source_file"
    )
)

backlink_pages: List["Page"] = Relationship(
    link_model=Link,
    back_populates="outlink_pages",
    sa_relationship_kwargs=dict(
        foreign_keys="[Link.dest_file]",
        viewonly=True,
    )
)

outlink_pages: List["Page"] = Relationship(
    link_model=Link,
    back_populates="backlink_pages",
    sa_relationship_kwargs=dict(
        foreign_keys="[Link.source_file]",
        viewonly=True,
    )
)
#+end_src

The code to insert a page relies on a bunch of values pulled out of the page and out of the [[id:arcology/arroyo/keyword][Arcology Keywords]] store, so be sure the arguments line up. Maybe I should switch these to =cl-defun= with =&key= arguments eventually so that it's less foot-gun-shaped.

#+begin_src emacs-lisp
(add-to-list 'arroyo-db--schemata
             '(arcology-pages
                [(file :not-null)
                 (key :not-null)
                 (site :not-null)
                 (title :not-null)
                 (root-id :not-null)
                 (allow-crawl)
                 (hash :not-null)]))

(defun arroyo-arcology--insert-page (file kw site title root-id allow-crawl hash)
  (arroyo-db-query [:delete :from arcology-pages
                    :where (= file $s1)]
                   file)
  (arroyo-db-query [:insert :into arcology-pages :values $v1]
                   (vector file kw site title root-id allow-crawl hash)))
#+end_src
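The delete-then-insert shape is what makes re-indexing idempotent: saving a file twice leaves one row, not two. As a sketch of the same pattern combined with the keyword-argument idea above, here it is in Python's =sqlite3=. =insert_page= and the schema are illustrative only, and the values are plain strings rather than emacsql's printed forms. Keyword-only parameters mean a call can't silently shift columns by passing arguments in the wrong order.

#+begin_src python
import sqlite3
from typing import Optional

SCHEMA = (
    "CREATE TABLE arcology_pages (file TEXT NOT NULL, key TEXT NOT NULL, "
    "site TEXT NOT NULL, title TEXT NOT NULL, root_id TEXT NOT NULL, "
    "allow_crawl TEXT, hash TEXT NOT NULL)"
)


def insert_page(conn: sqlite3.Connection, *, file: str, key: str, site: str,
                title: str, root_id: str, allow_crawl: Optional[str],
                file_hash: str) -> None:
    """Replace whatever row FILE had with a fresh one (hypothetical helper)."""
    with conn:  # one transaction: commit both statements or neither
        conn.execute("DELETE FROM arcology_pages WHERE file = :file", {"file": file})
        conn.execute(
            "INSERT INTO arcology_pages VALUES "
            "(:file, :key, :site, :title, :root_id, :allow_crawl, :hash)",
            {"file": file, "key": key, "site": site, "title": title,
             "root_id": root_id, "allow_crawl": allow_crawl, "hash": file_hash},
        )


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
# Index the same file twice, as a re-save would; only the latest row survives.
for title in ("Draft Title", "Arroyo Arcology Generator"):
    insert_page(conn, file="arcology-arroyo.org", key="arcology/arroyo",
                site="arcology", title=title, root_id="arcology/arroyo-page",
                allow_crawl="t", file_hash="abc123")
rows = conn.execute("SELECT title FROM arcology_pages").fetchall()
#+end_src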

** Generating HTML from Arcology Pages
:PROPERTIES:
:ID:       arcology/arroyo/gen_html
:END:

Arcology pages have two "documents" attached to them on render: the Org doc itself, and a document constituted from the backlinks.

The backlink document is generated dynamically using =Page.make_backlinks_org= which just generates a string from the Link relationships.

#+NAME: page_html_generators
#+begin_src python
def make_backlinks_org(self):
    if self.backlinks is None:
        return ''

    def to_org(link: Link):
        return \
            """
            ,* [[id:{path}][{title}]]
            """.format(
                path=parse_sexp(link.source_id),
                title=link.get_source_title()
            )

    return '\n'.join([ to_org(link) for link in self.backlinks ])

async def document_html(self):
    cache_key = parse_sexp(self.hash)
    return html.gen_html(parse_sexp(self.file), cache_key)

async def backlink_html(self):
    org = self.make_backlinks_org()
    cache_key = hashlib.sha224(org.encode('utf-8')).hexdigest()
    return html.gen_html_text(org, cache_key)
#+end_src

** Invoking Pandoc

[[https://pandoc.org/][Pandoc]] is used to generate the HTML for a page. It's a versatile kit, and I do a fair bit to extend it in other places, for example in the

The HTML generation is done using [[https://pypi.org/project/pypandoc/][PyPandoc]], which I guess is just a shell wrapper around it. Caching is cheated with a [[https://docs.python.org/3/library/functools.html#functools.lru_cache][functools.lru_cache]]; for this to work well, the file's hash is carried on the [[id:arcology/arroyo/page][arcology.arroyo.Page]] and passed in as an extra cache key, so that the cache busts when the document is updated.

#+begin_src python :tangle arcology/html.py
import functools
import pypandoc

@functools.lru_cache(maxsize=128)
def gen_html(input_path: str, extra_cache_key: str = '', input_format: str = 'org'):
    # extra_cache_key is unused in the body; it only exists so that lru_cache
    # re-renders when the file's hash changes.
    return pypandoc.convert_file(input_path, 'html', format=input_format)

@functools.lru_cache(maxsize=128)
def gen_html_text(input_text: str, extra_cache_key: str = '', input_format: str = 'org'):
    return pypandoc.convert_text(input_text, 'html', format=input_format)
#+end_src
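Note that =extra_cache_key= is never read inside =gen_html= or =gen_html_text=; it only matters because =functools.lru_cache= hashes every argument into its cache key. A small sketch of that effect, with a hypothetical =fake_convert= and a list of recorded calls standing in for pandoc:

#+begin_src python
import functools

conversions = []  # records each time the "expensive" conversion actually runs


@functools.lru_cache(maxsize=128)
def fake_convert(input_path: str, extra_cache_key: str = "") -> str:
    # extra_cache_key is unused in the body; lru_cache still includes it
    # in the cache key, so a new file hash forces a fresh conversion.
    conversions.append((input_path, extra_cache_key))
    return f"<p>rendered {input_path}</p>"


fake_convert("notes.org", "hash-v1")
fake_convert("notes.org", "hash-v1")  # cache hit, no new conversion
fake_convert("notes.org", "hash-v2")  # the file changed, so its hash changed
#+end_src

One caveat: =lru_cache= keys positional and keyword calls separately, so passing the hash positionally in one place and as a keyword in another would create two cache entries for the same render.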

** Rewriting and Hydrating the Pandoc HTML
:PROPERTIES:
:ID:       arcology/arroyo/hydrate
:END:

So the HTML that comes out of Pandoc is smart, but it doesn't understand ID links, for example. I could of course use Emacs and its =org-html-export-as-html=, but that shit is gonna be really slow. Instead I'll do the work myself (lol).

#+begin_src python :tangle arcology/html.py
from arcology.parse import print_sexp, parse_sexp
import arcology.arroyo as arroyo

import sqlmodel
import re
from typing import Optional

from arcology.key import id_to_arcology_key, file_to_arcology_key

class HTMLRewriter():
  def __init__(self, session):
    self.res_404 = 'href="/404?missing={key}" class="dead-link"'
    self.session = session

  def replace(self, match):
    raise NotImplementedError()

  def re(self):
    raise NotImplementedError()

  def do(self, output_html):
    return re.sub(self.re(), self.replace, output_html)
#+end_src

Rewriting the HTML is a pretty straightforward affair using [[https://docs.python.org/3/library/re.html#re.sub][re.sub]] with callbacks rather than static replacements, with some abstraction sprinkled on top in the form of the =HTMLRewriter= superclass defined above. Each implementation provides a =re= method returning the pattern to match, and a =replace= method which accepts the match object and looks up the target's [[id:arcology/arroyo/key][=ARCOLOGY_KEY=]], with an optional node-ID anchor attached to it. That key is then handed to [[id:arcology/arroyo/key][=arcology_key_to_url=]] to be turned into a URL. In this fashion, each =href= is replaced with a URL that will route to the target page, or with a link to the 404 page carrying a =dead-link= CSS class.
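Stripped of the database and the class hierarchy, the callback form of =re.sub= looks like this. The =known= dictionary is a hypothetical stand-in for =id_to_arcology_key= plus =arcology_key_to_url=; the replacement strings copy the ones the rewriters below emit.

#+begin_src python
import re
from typing import Dict

ID_HREF = re.compile(r'href="id:([^"]+)"')


def rewrite_id_links(html: str, known: Dict[str, str]) -> str:
    """Rewrite id: hrefs; KNOWN maps node IDs to URLs (hypothetical lookup)."""

    def replace(match: re.Match) -> str:
        node_id = match.group(1)
        url = known.get(node_id)
        if url is None:
            # Same dead-link shape as HTMLRewriter.res_404.
            return f'href="/404?missing={node_id}" class="dead-link"'
        return f'class="internal" href="{url}"'

    # re.sub calls replace() once per match and splices in its return value.
    return ID_HREF.sub(replace, html)


known = {"arcology/arroyo/key": "http://localhost:8000/arcology/arroyo#arcology/arroyo/key"}
#+end_src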

I'm pretty sure this is all quite inefficient but as always I invoke [[id:personal_software_can_be_shitty][Personal Software Can Be Shitty]].

So ID links can be rewritten like:

#+begin_src python :tangle arcology/html.py
class IDReplacementRewriter(HTMLRewriter):
  def replace(self, match):
    id = match.group(1)
    key = id_to_arcology_key(id, self.session)
    if key is None:
      return self.res_404.format(key=id)
    else:
      return 'class="internal" href="{url}"'.format(url=arcology_key_to_url(key))

  def re(self):
    return r'href="id:([^"]+)"'
#+end_src

File links can be rewritten like:

#+begin_src python :tangle arcology/html.py
class FileReplacementRewriter(HTMLRewriter):
  def replace(self, match):
    file = match.group(1)
    if file is None:
      return self.res_404.format(key=file)
    key = file_to_arcology_key(file, self.session)
    if key is None:
      return self.res_404.format(key=file)
    else:
      return 'class="file" href="{url}"'.format(url=arcology_key_to_url(key))

  def re(self):
    return r'href="file://([^"]+)"'
#+end_src

[[id:cce/org-roam][org-roam]] stub links can be rewritten like this. This one is a little wonky: the other regexen only operate on the anchor's =href= attribute, but this one also consumes the closing =>= and the =roam:= prefix of the link text, stripping =roam:= from =[[roam:Stub]]= links, so its replacement has to put the =>= back.

#+begin_src python :tangle arcology/html.py
class RoamReplacementRewriter(HTMLRewriter):
  def replace(self, match):
    return self.res_404.format(key=match.group(1)) + ">"

  def re(self):
    return r'href="roam:([^"]+)">roam:'
#+end_src

I also make some quality-of-life rewrites of my [[id:2e31b385-a003-4369-a136-c6b78c0917e1][org-fc]] cloze cards into simple =<span>= elements with the hint embedded in them.

#+begin_src python :tangle arcology/html.py
class FCClozeReplacementRewriter(HTMLRewriter):
  def replace(self, match):
    main = match.group(1) or ""
    hint = match.group(2) or ""
    hint = re.sub(r"</?[^>]+>", "", hint)
    return f"<span class='fc-cloze' title='{hint}'>{main}</span>"

  def re(self):
    return r'{{([^}]+)}{?([^}]+)?}?@[0-9]+}'
#+end_src
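To see what that pattern does on its own, here is the same regex applied outside the class, assuming the usual =org-fc= cloze syntax of ={{answer}{hint}@N}= with an optional hint. The sample sentence is made up.

#+begin_src python
import re

# Same pattern as FCClozeReplacementRewriter.re(): {{answer}{hint}@N}, hint optional.
CLOZE = re.compile(r'{{([^}]+)}{?([^}]+)?}?@[0-9]+}')


def render_cloze(match: re.Match) -> str:
    main = match.group(1) or ""
    # Strip any HTML tags pandoc may have wrapped around the hint text.
    hint = re.sub(r"</?[^>]+>", "", match.group(2) or "")
    return f"<span class='fc-cloze' title='{hint}'>{main}</span>"


with_hint = CLOZE.sub(render_cloze, "The capital is {{Paris}{city}@0}.")
without_hint = CLOZE.sub(render_cloze, "The capital is {{Paris}@0}.")
#+end_src

Without a hint, the =title= attribute is simply left empty.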

Invoke all these in a simple little harness:

#+begin_src python :tangle arcology/html.py
def rewrite_html(input_html: str, session: sqlmodel.Session) -> str:
  """
  Run a series of replacement functions on the input HTML and return a new string.
  """

  output_html = input_html

  rewriters = [
    IDReplacementRewriter(session),
    FileReplacementRewriter(session),
    RoamReplacementRewriter(session),
    FCClozeReplacementRewriter(session),
  ]

  for rewriter in rewriters:
    output_html = rewriter.do(output_html)

  return output_html
#+end_src

It's logical that at some point this will have a "pluggable" URL engine. The production URLs are hosted under different domains, so turning a URL back into an =ARCOLOGY_KEY= needs the request's host as well as its path. A fancier version of all of this can happen later; for now this just defers to =ArcologyKey.to_url=. I am just playing jazz right now!

#+begin_src python :tangle arcology/html.py
from arcology.key import ArcologyKey

def arcology_key_to_url(key: ArcologyKey) -> str:
  return key.to_url()
#+end_src

** =arcology.key.ArcologyKey= encapsulates parsing and rendering URLs
:PROPERTIES:
:ID:       arcology/arroyo/key
:ROAM_ALIASES: arcology.key.ArcologyKey arcology.key.file_to_arcology_key arcology.key.id_to_arcology_key
:END:

The =ArcologyKey= is a simple =dataclass= encapsulating the things which the =ARCOLOGY_KEY= page keyword represents.

For example, =ArcologyKey(key="arcology/arroyo#arcology/arroyo/key")= will contain the following properties:
- =key=: the key passed in
- =site_key=: everything up to the first slash, here =arcology=. It points to objects defined and fetchable through [[id:20211219T144255.001827][Arcology Sites]].
- =site=: I typed the line above, said "oh", and added this: the =arcology.sites.Site= object that =site_key= resolves to.
- =rest=: everything after the first slash, up to an optional anchor, here =arroyo=
- =anchor_id=: said optional anchor, here =arcology/arroyo/key=. Pandoc headings within the page will have the =ID= property as the anchor, which is handy!
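The splitting rule behind those fields can be restated with =str.partition= as a standalone sketch; =split_routing_key= is a hypothetical helper, not part of =arcology.key=. The site ends at the first =/=, and the anchor starts at the first =#= after it.

#+begin_src python
from typing import Optional, Tuple


def split_routing_key(key: str) -> Tuple[str, str, Optional[str]]:
    """Split KEY into (site_key, rest, anchor_id) following ArcologyKey's loop."""
    site_key, slash, remainder = key.partition("/")
    if not slash:
        # No "/" at all: the whole key is the site key.
        return site_key, "", None
    rest, hash_mark, anchor = remainder.partition("#")
    if not hash_mark:
        return site_key, rest, None
    # The loop starts a new segment at every "#" after the first "/",
    # and only the first of those segments becomes anchor_id.
    return site_key, rest, anchor.split("#")[0]
#+end_src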

#+begin_src python :tangle arcology/key.py
from dataclasses import dataclass
from typing import Optional

from fastapi import Request
from starlette import routing
import sqlmodel

from arcology.parse import parse_sexp, print_sexp
from arcology.sites import sites, Site
from arcology.config import get_settings, Environment
from arcology.sites import host_to_site

route_regexp, _, _ = routing.compile_path("/{sub_key:path}/")
route_regexp2, _, _ = routing.compile_path("/{sub_key:path}")

import logging
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

@dataclass
class ArcologyKey():
  key: str
  site_key: str
  site: Site
  rest: str = ""
  anchor_id: Optional[str] = None

  def __init__(self, key: str, site_key="", rest="", anchor_id = None):
    self.key = key
    self.site_key=site_key
    self.rest = rest
    self.anchor_id = anchor_id

    stop = '/'
    idx = 0
    collector = [""]
    for char in key:
      if char == stop:
        stop = '#'
        idx += 1
        collector = collector + [""]
        continue
      collector[idx] += char

    if len(collector) > 0:
      self.site_key = collector[0]
      self.site = sites.get(self.site_key, None)
    if len(collector) > 1:
      self.rest = collector[1]
    if len(collector) > 2:
      self.anchor_id = collector[2]

  def to_url(self) -> str:
    env = get_settings().arcology_env
    domains = self.site.domains.get(env, None)

    url = ""
    if domains is not None:
      url = "https://{domain}/{rest}".format(domain=domains[0], rest=self.rest)
    else:
      url = "http://localhost:8000/{key}".format(key=self.key)
    if self.anchor_id is not None:
      url = url + "#" + self.anchor_id

    return url

  @staticmethod
  def from_request(request: Request):
    path = request.url.path
    host = request.headers.get('host')
    return ArcologyKey.from_host_and_path(host, path)

  @staticmethod
  def from_host_and_path(host: str, path: str):
    m = route_regexp.match(path) or route_regexp2.match(path) or None
    if m is None:
      logger.debug("no path match: %s", path)
      return None
    sub_key = m.group("sub_key")

    site = host_to_site(host)
    if site is None:
      logger.debug("no host match: %s", host)
      return None

    if len(sub_key) == 0:
      sub_key = "index"
    key = "{site_key}/{sub_key}".format(
      site_key=site.key,
      sub_key=sub_key,
    )
    return ArcologyKey(key)
#+end_src

Retrieving the =ARCOLOGY_KEY= given an ID is a pretty straightforward SQLModel query, actually. If the referenced node is in the Arroyo database, by definition it's got a published [[id:arcology/arroyo/page][arcology.arroyo.Page]], and so it's a matter of going and fetching it. If the Node is the root node (a direct link to the document), simply return the key, otherwise append the node-id to it so that a URL can link directly to the heading's anchor.

#+begin_src python :tangle arcology/key.py
def id_to_arcology_key(id: str, session: sqlmodel.Session) -> Optional[ArcologyKey]:
  """
  Given a node ID, return the ARCOLOGY_KEY for the node.
  """
  from .arroyo import Node

  linked_node_query = sqlmodel.select(Node) \
                              .where(Node.node_id==print_sexp(id))
  res = session.exec(linked_node_query)

  linked_node = res.all()
  if len(linked_node) == 1:
    linked_node = linked_node[0]
    linked_page = linked_node.page

    if linked_page is None:
      return None

    page_key = parse_sexp(linked_page.key)
    ret = ArcologyKey(key=page_key)
    if linked_node.level != 0:
      ret.anchor_id = id
    return ret

  elif len(linked_node) != 0:
    raise Exception(f"more than one key for node? {id}")
  else:
    return None
#+end_src
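The root-versus-heading rule can be exercised without a database. In this sketch, two dictionaries stand in for the =arcology_nodes= and =arcology_pages= tables (the names =NODES=, =PAGE_KEYS= and =id_to_key_string= are hypothetical), using two IDs from this very file. Like =id_to_arcology_key=, it compares the level against the integer =0=, so it assumes levels are stored as integers.

#+begin_src python
from typing import Dict, Optional, Tuple

# Hypothetical stand-ins: node ID -> (file, outline level), file -> routing key.
NODES: Dict[str, Tuple[str, int]] = {
    "arcology/arroyo-page": ("arcology-arroyo.org", 0),
    "arcology/arroyo/key": ("arcology-arroyo.org", 2),
}
PAGE_KEYS: Dict[str, str] = {"arcology-arroyo.org": "arcology/arroyo"}


def id_to_key_string(node_id: str) -> Optional[str]:
    """Return the routing key for NODE_ID, with an anchor for non-root nodes."""
    node = NODES.get(node_id)
    if node is None:
        return None  # unknown ID: the rewriter renders a dead link
    file, level = node
    page_key = PAGE_KEYS.get(file)
    if page_key is None:
        return None  # the node's file has no published page
    # Level 0 is the file-level node, so link to the page itself;
    # headings get their ID appended as an anchor.
    return page_key if level == 0 else f"{page_key}#{node_id}"
#+end_src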

Looking up the key by file is even simpler:

#+begin_src python :tangle arcology/key.py
def file_to_arcology_key(file: str, session: sqlmodel.Session) -> Optional[ArcologyKey]:
  """
  Given a file path, return the ARCOLOGY_KEY for the page published from it.
  """
  from .arroyo import Page
  key_q = sqlmodel.select(Page).where(Page.file == print_sexp(file))
  page = session.exec(key_q).first()

  if page is None:
    return None
  page_key = parse_sexp(page.key)
  return ArcologyKey(key=page_key)
#+end_src

** NEXT HTML should inject sidenotes in during rewrite_html?
:PROPERTIES:
:ID:       20211219T165357.962899
:END:

This would be slow and maybe janky, but that's probably fine once it's memoized.

But this would mean that node backlinks appear inline; pages like [[id:6b306fe3-fbc4-4ba7-bfcb-089c0564f9c3][Topic Index]] have some trouble otherwise.

* Arcology Tags
:PROPERTIES:
:ID:       arcology/arroyo/tag
:ROAM_ALIASES: arcology.arroyo.Tag
:END:

#+NAME: arcology.arroyo.Tag
#+begin_src python
class Tag(SQLModel, table=True):
  __tablename__ = "arcology_tags"
  file: str = Field(primary_key=True, foreign_key="arcology_pages.file")
  tag: str = Field(primary_key=True, description="The tag itself.")
  node_id: str = Field(description="A heading ID which the tag applies to")

  # Named get_tag so the accessor doesn't shadow the tag column above.
  def get_tag(self):
    return parse_sexp(self.tag)
#+end_src

A page has any number of tags according to the file primary key:

#+begin_src emacs-lisp
(add-to-list 'arroyo-db--schemata
             '(arcology-tags
               [(file :not-null)
                (tag :not-null)
                (node-id :not-null)]))

(defun arroyo-arcology--insert-tags (file node-tags)
  (arroyo-db-query [:delete :from arcology-tags
                    :where (= file $s1)]
                   file)
  (pcase-dolist (`(,tag ,node-id) node-tags)
    (arroyo-db-query [:insert :into arcology-tags
                      :values $v1]
                     (vector file tag node-id))))
#+end_src

* Arcology Links
:PROPERTIES:
:ID:       arcology/arroyo/link
:ROAM_ALIASES: arcology.arroyo.Link
:END:

Rewriting the links to point to their routing keys takes two tables: the =links= table here, and the [[id:arcology/arroyo/node][nodes]] table below.

The =links= table contains the file *and* node ID references, as well as the title of the source file, which can be used to quickly generate backlink listings for a given page (and its sub-heading nodes):

#+NAME: arcology.arroyo.Link
#+begin_src python
class Link(SQLModel, table=True):
  __tablename__ = "arcology_links"
  source_title: Optional[str] = Field(default="", description="The title of the page the link is written in.")

  def get_source_title(self):
    return parse_sexp(self.source_title)

  source_id: str = Field(primary_key=True, foreign_key="arcology_nodes.node_id")
  source_node: Optional["Node"] = Relationship(
    sa_relationship_kwargs=dict(
      # back_populates="outlinks",
      primaryjoin="Node.node_id == Link.source_id"
    )
  )

  dest_id:   str = Field(primary_key=True, foreign_key="arcology_nodes.node_id")
  dest_node: Optional["Node"] = Relationship(
    sa_relationship_kwargs=dict(
      # back_populates="backlinks",
      primaryjoin="Node.node_id == Link.dest_id"
    )
  )

  source_file: str = Field(primary_key=True, foreign_key="arcology_pages.file")
  source_page: Optional["Page"] = Relationship(
    back_populates="outlinks",
    sa_relationship_kwargs=dict(
      primaryjoin="Page.file==Link.source_file"
    )
  )

  dest_file: str = Field(primary_key=True, foreign_key="arcology_pages.file")
  dest_page: Optional["Page"] = Relationship(
    back_populates="backlinks",
    sa_relationship_kwargs=dict(
      primaryjoin="Page.file==Link.dest_file"
    )
  )
#+end_src

Links in the [[id:cce/org-roam][org-roam]] database have a useful =type= column. We only store ID links for now. File links could probably be supported easily enough, but other "unidirectional" links I think I would rather store elsewhere.

#+begin_src emacs-lisp
(add-to-list 'arroyo-db--schemata
             '(arcology-links
               [source-title
                (source-file :not-null)
                (source-id :not-null)
                (dest-file :not-null)
                (dest-id :not-null)]))

(defun arcology--published-page? (file)
  (not (not (arroyo-db-get "ARCOLOGY_KEY" file))))

(defun arroyo-arcology--insert-links (file source-title links)
  (arroyo-db-query [:delete :from arcology-links
                    :where (= source-file $s1)]
                   file)
  (pcase-dolist (`(,source ,dest ,type ,props) links)
    (cond ((equal type "id")
           (pcase-let* ((dest-file (caar (org-roam-db-query
                                    [:select file :from nodes
                                     :where (= id $s1)]
                                    dest)))
                        (`(,immediate-source-title ,immediate-source-level)
                         (car (org-roam-db-query
                               [:select [title level] :from nodes
                                 :where (= id $s1)]
                                source)))
                        ;; "level 0 -> level n" unless n == 0
                        (composed-node-title
                         (if (= 0 immediate-source-level)
                             source-title
                             (concat source-title " -> " immediate-source-title))))
             (when (and dest-file (arcology--published-page? dest-file))
               (arroyo-db-query [:insert :into arcology-links
                                 :values $v1]
                                (vector composed-node-title file source dest-file dest)))))
          ;; insert https link?
          ((equal type "https") nil)
          ((equal type "http") nil)
          ((equal type "roam") nil)
          (t nil))))
#+end_src
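The two decisions made in that loop, which links are kept and what title they carry, can be sketched in Python as pure functions. =compose_source_title= and =published_id_links= are hypothetical names, and the plain tuples are a simplification of the rows =org-roam-db-query= returns.

#+begin_src python
from typing import Dict, Iterable, List, Set, Tuple


def compose_source_title(file_title: str, heading_title: str, level: int) -> str:
    """Mirror the "level 0 -> level n unless n == 0" rule from the Lisp above."""
    return file_title if level == 0 else f"{file_title} -> {heading_title}"


def published_id_links(
    links: Iterable[Tuple[str, str, str]],
    dest_files: Dict[str, str],
    published: Set[str],
) -> List[Tuple[str, str, str]]:
    """Keep (source, dest, type) links that are id links into published files."""
    kept = []
    for source, dest, link_type in links:
        if link_type != "id":
            continue  # http, https, roam, etc. are skipped for now
        dest_file = dest_files.get(dest)
        if dest_file is not None and dest_file in published:
            kept.append((source, dest, dest_file))
    return kept
#+end_src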

** INPROGRESS =source_title= should populate with the immediate parent header's title, not level 0
:LOGBOOK:
- State "INPROGRESS" from "NEXT"       [2022-08-05 Fri 14:03]
:END:

It's passed in to =arroyo-arcology--insert-links= [[id:arcology/arroyo][below]]. Not sure of the better way to do that -- query =org-roam-db= in the insert function itself? Good enough for now, prolly.

Deal with the title being fetched and populated in that function below if necessary.

* Arcology Nodes
:PROPERTIES:
:ID:       arcology/arroyo/node
:ROAM_ALIASES: arcology.arroyo.Node
:END:

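A =nodes= table maps each heading ID to the file it appears in and its outline level. [[id:arcology/arroyo/key][=id_to_arcology_key=]] relies on it to reassemble =id:= links into =href= attributes, attaching a heading anchor when the level isn't 0. There is a bunch of other metadata on this that I would like to pull across from [[id:cce/org-roam][org-roam]] eventually.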

#+NAME: arcology.arroyo.Node
#+begin_src python
class Node(SQLModel, table=True):
  __tablename__ = "arcology_nodes"
  node_id: str = Field(primary_key=True, description="The heading ID property")
  file: str = Field(description="File in which this Node appears", foreign_key="arcology_pages.file")
  level: int = Field(description="Outline depth of the heading. 0 is top-level")

  page: Optional["Page"] = Relationship(
    back_populates="nodes",
    sa_relationship_kwargs=dict(
      viewonly=True,
      primaryjoin="Node.file==Page.file"
    )
  )
#+end_src

#+begin_src emacs-lisp
(add-to-list 'arroyo-db--schemata
             '(arcology-nodes
               [(node-id :not-null)
                (file :not-null)
                (level :not-null)]))

(defun arroyo-arcology--insert-nodes (file nodes)
  (arroyo-db-query [:delete :from arcology-nodes
                    :where (= file $s1)]
                   file)
  (pcase-dolist (`(,file ,id ,level) nodes)
    (arroyo-db-query [:insert :into arcology-nodes
                      :values $v1]
                     (vector id file level))))
#+end_src

* Arcology References
:PROPERTIES:
:ID:       arcology/arroyo/ref
:ROAM_ALIASES: arcology.arroyo.Reference
:END:

Each [[id:cce/org-roam][org-roam]] node can have a set of "references" attached to it; I use these URIs to point to a "canonical" resource which the node is referencing.

#+NAME: arcology.arroyo.Ref
#+begin_src python
class Reference(SQLModel, table=True):
  __tablename__ = "arcology_refs"
  file: str = Field(primary_key=True, foreign_key="arcology_pages.file")
  ref: str = Field(primary_key=True, description="The full URI of the reference itself.")
  node_id: str = Field(description="A heading ID which the ref applies to")

  def url(self):
    return parse_sexp(self.ref)
#+end_src

A page has any number of refs according to the file primary key:

#+begin_src emacs-lisp
(add-to-list 'arroyo-db--schemata
             '(arcology-refs
               [(file :not-null)
                (ref :not-null)
                (node-id :not-null)]))

(defun arroyo-arcology--insert-refs (file node-refs)
  (arroyo-db-query [:delete :from arcology-refs
                    :where (= file $s1)]
                   file)
  (pcase-dolist (`(,ref ,type ,node-id) node-refs)
    (arroyo-db-query [:insert :into arcology-refs
                      :values $v1]
                     (vector file (format "%s:%s" type ref) node-id))))
#+end_src

* INPROGRESS Arcology Feeds
:PROPERTIES:
:ID:       arcology/arroyo/feed
:ROAM_ALIASES: arcology.arroyo.Feed
:END:
:LOGBOOK:
- State "INPROGRESS" from              [2023-01-24 Tue 23:33]
:END:

#+NAME: arcology.arroyo.Feed
#+begin_src python
class Feed(SQLModel, table=True):
  __tablename__ = "arcology_feeds"
  file: str = Field(primary_key=True, foreign_key="arcology_pages.file")
  key: str = Field(primary_key=True, description="The routing key for the feed.")
  title: str = Field(description="Title of the page which the feed is embedded in")
  site: str = Field(description="Arcology Site which the feed resides on.")
  post_visibility: str = Field(description="Visibility of the feed's posts in feed2toot, etc")

  def get_key(self):
    return parse_sexp(self.key)

  def get_arcology_key(self):
    return ArcologyKey(self.get_key())

  def get_title(self):
    return parse_sexp(self.title)

  def get_site(self):
    return parse_sexp(self.site)

  def get_post_visibility(self):
    return parse_sexp(self.post_visibility)

  def dict(self, **kwargs):
    return dict(
      key=self.get_key(),
      url=self.get_arcology_key().to_url(),
      title=self.get_title(),
      site=self.get_site(),
      visibility=self.get_post_visibility(),
    )
#+end_src

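A page can have any number of feeds according to the file primary key, though =arroyo-arcology--insert-feeds= currently only inserts the first =ARCOLOGY_FEED= value of a file: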

#+begin_src emacs-lisp
(add-to-list 'arroyo-db--schemata
             '(arcology-feeds
               [(file :not-null)
                (key :not-null)
                (title :not-null)
                (site :not-null)
                (post-visibility :not-null)]))

(defun arroyo-arcology--insert-feeds (file)
  (arroyo-db-query [:delete :from arcology-feeds
                    :where (= file $s1)]
                   file)
  (if-let* ((key (car (arroyo-db-get "ARCOLOGY_FEED" file)))
            (site (replace-regexp-in-string "/.*" "" key)))
      (let* ((title (arroyo-db--get-file-title-from-org-roam file))
             (post-visibility (car (arroyo-db-get "ARCOLOGY_TOOT_VISIBILITY" file))))
        (arroyo-db-query [:insert :into arcology-feeds
                          :values $v1]
                         (vector file key title site post-visibility)))))
#+end_src

* Arcology Keywords
:PROPERTIES:
:ID:       arcology/arroyo/keyword
:ROAM_ALIASES: arcology.arroyo.Keyword
:END:

All of these models are generated below from the =ARCOLOGY_KEY= and related keywords embedded in each page. These are *Keywords*: a 3-tuple of file, keyword, and value, a *threeple*.

#+NAME: arcology.arroyo.Keyword
#+begin_src python
class Keyword(SQLModel, table=True):
  __tablename__ = "keywords"
  file: str = Field(primary_key=True, foreign_key="arcology_pages.file")
  keyword: str = Field(primary_key=True, description="The keyword name, e.g. ARCOLOGY_KEY")
  value: str = Field(description="The value of the keyword")

  def filename(self):
    return parse_sexp(self.file)

  # get_ prefixes keep these accessors from shadowing the keyword and value columns
  def get_keyword(self):
    return parse_sexp(self.keyword)

  def get_value(self):
    return parse_sexp(self.value)

  @classmethod
  def get(cls, key: str, value: str, session: Session):
    q = select(cls).where(cls.keyword==print_sexp(key)).where(cls.value==print_sexp(value))
    # first() returns None when nothing matches
    return session.exec(q).first()
#+end_src
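A toy version of the keyword sidecar, to show the threeple shape and the two lookups it gets used for. The table is simplified (plain strings, no constraints), the file names are made up, and =values_for= only approximates =arroyo-db-get=, which returns a list of values (hence the =car= and =first= calls elsewhere in this file); =file_for= goes the other direction, like =Keyword.get=.

#+begin_src python
import sqlite3
from typing import List, Optional

conn = sqlite3.connect(":memory:")
# Simplified sidecar table: one (file, keyword, value) "threeple" per row,
# stored as plain strings rather than emacsql's printed Lisp forms.
conn.execute("CREATE TABLE keywords (file TEXT, keyword TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO keywords VALUES (?, ?, ?)",
    [
        ("arcology-arroyo.org", "ARCOLOGY_KEY", "arcology/arroyo"),
        ("arcology-arroyo.org", "ARCOLOGY_ALLOW_CRAWL", "t"),
        ("other.org", "ARCOLOGY_KEY", "arcology/other"),
    ],
)


def values_for(file: str, keyword: str) -> List[str]:
    """Forward lookup: every value of KEYWORD in FILE."""
    rows = conn.execute(
        "SELECT value FROM keywords WHERE file = ? AND keyword = ?", (file, keyword)
    )
    return [value for (value,) in rows]


def file_for(keyword: str, value: str) -> Optional[str]:
    """Reverse lookup in the spirit of Keyword.get: which file has KEYWORD=VALUE?"""
    row = conn.execute(
        "SELECT file FROM keywords WHERE keyword = ? AND value = ? LIMIT 1",
        (keyword, value),
    ).fetchone()
    return row[0] if row else None
#+end_src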

* Arcology [[id:arroyo/arroyo][Arroyo System]] Database Generator
:PROPERTIES:
:ID:       arcology/arroyo
:ROAM_ALIASES: arroyo-arcology-update-file
:END:

All of those update functions are tied together in an [[id:arroyo/system-cache][arroyo-db]] update function. It has to run after the [[id:cce/org-roam][org-roam]] database and the [[id:arroyo/system-cache][Arroyo System Cache]] keyword database are built, which is annoying and something I need to rethink.

#+begin_src emacs-lisp
(defun arroyo-arcology-update-file (&optional file)
  "Update the Arcology tables for FILE, defaulting to the current buffer's file.
Does nothing unless FILE has an ARCOLOGY_KEY keyword and a level-0 node."
  (interactive)
  (when-let* ((file (or file (buffer-file-name)))
              (page-keyword (car (arroyo-db-get "ARCOLOGY_KEY" file)))
              ;; the site key is the routing key's first path segment
              (site-key (car (split-string page-keyword "/")))
              (page-nodes (org-roam-db-query [:select [file id level title] :from nodes
                                              :where (= file $s1)]
                                             file))
              (file-hash (caar (org-roam-db-query [:select [hash] :from files
                                                   :where (= file $s1)]
                                                  file)))
              (page-node-ids (apply #'vector (--map (nth 1 it) page-nodes)))
              ;; there should only ever be one level-0 (file-level) node, but
              ;; this lookup is hard to follow and deserves simplifying
              (level-0-node (--first (eql 0 (nth 2 it)) page-nodes))
              (level-0-id (nth 1 level-0-node))
              (level-0-title (nth 3 level-0-node)))
    (let* ((allow-crawl (car (arroyo-db-get "ARCOLOGY_ALLOW_CRAWL" file)))
           ;; make sure writing "nil" in the keyword is respected
           (allow-crawl (and allow-crawl
                             (not (equal allow-crawl "nil"))))
           (all-node-refs (org-roam-db-query [:select [ref type node_id] :from refs
                                              :where (in node_id $v1)]
                                             page-node-ids))
           (all-node-tags (org-roam-db-query [:select [tag node_id] :from tags
                                              :where (in node_id $v1)]
                                             page-node-ids))
           (links (org-roam-db-query [:select [source dest type properties] :from links
                                      :where (in source $v1)]
                                     page-node-ids)))
      (arroyo-arcology--insert-page file page-keyword site-key level-0-title level-0-id allow-crawl file-hash)
      (arroyo-arcology--insert-nodes file page-nodes)
      (arroyo-arcology--insert-tags file all-node-tags)
      (arroyo-arcology--insert-refs file all-node-refs)
      (arroyo-arcology--insert-feeds file)
      (arroyo-arcology--insert-links file level-0-title links))))

(defun arroyo-arcology-update-db (&rest _)
  "Run `arroyo-arcology-update-file' on every file with an ARCOLOGY_KEY keyword.
Accepts and ignores any arguments so it can be used as :after advice."
  (interactive)
  (->> (arroyo-db-get "ARCOLOGY_KEY")
       (-map #'car)
       (-uniq)
       ;; this runs *after* the db is updated... what to do here?
       ;; (-filter #'arroyo-db-file-updated-p)
       (-map #'arroyo-arcology-update-file)))

(add-function :after (symbol-function 'arroyo-db-update-all-roam-files) #'arroyo-arcology-update-db)
;; (add-to-list 'arroyo-db-update-functions #'arroyo-arcology-update-file)

(provide 'arroyo-arcology)
#+end_src
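The page-level bookkeeping in =arroyo-arcology-update-file= boils down to a few decisions: the site key is the first segment of the routing key, the page's identity comes from its single level-0 node, and =ARCOLOGY_ALLOW_CRAWL= is true unless it is missing or the literal string ="nil"=. Here is a hypothetical Python restatement of just those decisions; =NodeRow=, =PageSummary= and =summarize_page= are illustrative names, and the sketch skips the file-hash lookup and all of the table inserts.

#+begin_src python :tangle no
from typing import List, NamedTuple, Optional, Sequence


class NodeRow(NamedTuple):
    """One row of the [file id level title] query against org-roam's nodes."""
    file: str
    id: str
    level: int
    title: str


class PageSummary(NamedTuple):
    site_key: str
    level_0_id: str
    level_0_title: str
    node_ids: List[str]
    allow_crawl: bool


def summarize_page(page_keyword: Optional[str], nodes: Sequence[NodeRow],
                   allow_crawl_kw: Optional[str]) -> Optional[PageSummary]:
    if page_keyword is None:
        return None  # no ARCOLOGY_KEY, so not an Arcology page
    # the file-level node is the one at level 0
    level_0 = next((n for n in nodes if n.level == 0), None)
    if level_0 is None:
        return None  # when-let* bails out in the same situation
    return PageSummary(
        site_key=page_keyword.split("/")[0],
        level_0_id=level_0.id,
        level_0_title=level_0.title,
        node_ids=[n.id for n in nodes],
        # a missing keyword or a literal "nil" both mean "don't allow crawling"
        allow_crawl=bool(allow_crawl_kw) and allow_crawl_kw != "nil",
    )
#+end_src

For example, a page keyed =arcology/arroyo= lands under the =arcology= site, and every node ID in the file, headings included, is collected so refs, tags and links can be fetched in one =IN= query each.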

* Arcology SQLModel Database Bindings
:PROPERTIES:
:ID:       arcology/arroyo/sqlmodel
:END:

The engine looks like this. Attaching my org-roam database to it is pretty easy with the [[https://docs.sqlalchemy.org/en/14/core/event.html][SQLAlchemy Events System]], and once it's attached you can munge a =SQLModel='s =__table__.schema= to query and map against the org-roam metadatabase.

#+begin_src python :tangle arcology/arroyo.py
from pathlib import Path

from sqlalchemy import event
from sqlmodel import create_engine

from arcology.config import get_settings

settings = get_settings()
org_roam_sqlite_file_name = Path(settings.org_roam_db).expanduser().resolve()
arroyo_sqlite_file_name = Path(settings.arcology_db).expanduser().resolve()

def make_engine():
    engine = create_engine('sqlite:///{path}'.format(path=arroyo_sqlite_file_name), echo=False)

    # Every time SQLAlchemy opens a new DBAPI connection, attach the
    # org-roam database to it under the schema name "orgroam".
    @event.listens_for(engine, "connect")
    def do_connect(dbapi_connection, _connection_record):
        # Bind the path as a parameter instead of splicing it into the SQL,
        # so a path containing a quote can't break the statement.
        dbapi_connection.execute(
            "ATTACH DATABASE ? AS orgroam",
            (str(org_roam_sqlite_file_name),),
        )

    return engine


engine = make_engine()
#+end_src
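Under the hood, the =connect= listener just runs SQLite's =ATTACH DATABASE= on each raw connection; after that, tables in the org-roam file are reachable through the =orgroam.= prefix from the same connection. Here is a stdlib-only sketch of that idea. The =pages= and =nodes= tables are simplified, made-up stand-ins for the real schemata, and =open_with_orgroam= and =demo= are hypothetical helpers.

#+begin_src python :tangle no
import sqlite3
import tempfile
from pathlib import Path


def open_with_orgroam(main_db: Path, org_roam_db: Path) -> sqlite3.Connection:
    """Open MAIN_DB and attach ORG_ROAM_DB as the "orgroam" schema."""
    conn = sqlite3.connect(main_db)
    conn.execute("ATTACH DATABASE ? AS orgroam", (str(org_roam_db),))
    return conn


def demo(tmpdir: Path) -> list:
    # A stand-in org-roam database with a simplified nodes table.
    roam_path = tmpdir / "org-roam.db"
    roam = sqlite3.connect(roam_path)
    roam.execute("CREATE TABLE nodes (id TEXT, title TEXT)")
    roam.execute("INSERT INTO nodes VALUES ('n1', 'Arroyo Arcology Generator')")
    roam.commit()
    roam.close()

    # A stand-in Arcology database whose pages point at org-roam node IDs.
    conn = open_with_orgroam(tmpdir / "arcology.db", roam_path)
    conn.execute("CREATE TABLE pages (key TEXT, root_id TEXT)")
    conn.execute("INSERT INTO pages VALUES ('arcology/arroyo', 'n1')")
    # One query spanning two files: orgroam.nodes lives in the attached one.
    rows = conn.execute(
        "SELECT p.key, n.title FROM pages AS p "
        "JOIN orgroam.nodes AS n ON n.id = p.root_id"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(demo(Path(d)))
#+end_src

A schema-qualified name like =orgroam.nodes= is what SQLAlchemy should render for a table whose schema is set to =orgroam=, which is why munging =__table__.schema= is enough to point a model at the attached database.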

An interactive testing session could look like this, and indeed =C-c C-c= in here will run it in an [[elisp:(run-python)][Inferior Python]] session:

#+begin_src python :session *Python* :results none
import asyncio

from sqlmodel import select, SQLModel, Session

import arcology.arroyo as arroyo
from arcology.parse import *

engine = arroyo.engine
session = Session(engine)

first_link = session.exec(select(arroyo.Link)).first()

from_file = arroyo.Page.from_file("/home/rrix/org/arroyo/arroyo.org", session)
from_key = arroyo.Page.from_key("doc/archive", session)

# a plain Python REPL can't `await` at the top level, so drive the coroutine with asyncio
ht = asyncio.run(from_key.document_html())
#+end_src

* Invoking the Arroyo generator from Python
:PROPERTIES:
:ID:       20220117T162800.337943
:ROAM_ALIASES: "Arcology Batch Commands"
:END:

Since the [[id:arcology/arroyo][Arcology Arroyo System]] is written in [[id:cce/programming_lisp_in_emacs][Emacs Lisp]], updating the database from outside of Emacs is not exactly simple. When it runs inside a long-running, user-controlled [[id:cce/emacs][Emacs]] environment, Arroyo uses Emacs's [[https://www.gnu.org/software/emacs/manual/html_node/elisp/Hooks.html][Hooks]] to update the database whenever org-mode files change.

On the server there is no such Emacs session, so some scaffolding has to stand in for those hooks:

** Org-mode files are put on the server with [[id:cce/syncthing][Syncthing]]

** "Batch" commands for running Emacs with the Arroyo generators from a shell

This little [[id:cce/programming_lisp_in_emacs][Emacs Lisp]] script sets up the minimal [[id:cce/cce][CCE]] scaffolding needed to make the Arroyo-DB functions available in a batch Emacs session.

#+begin_src emacs-lisp :tangle lisp/arcology-batch.el :mkdirp yes
(unless (boundp 'org-roam-directory)
  (setq org-roam-directory (file-truename "~/org/")))

(load-file (expand-file-name "cce/packaging.el" org-roam-directory))

(add-to-list 'load-path default-directory)
(add-to-list 'load-path arroyo-source-directory)

(use-package dash)
(use-package f)
(use-package s)
(use-package emacsql)
;; (use-package emacsql-sqlite3)
(require 'subr-x)
(require 'cl)

(require 'org-roam)
(require 'arroyo-db)
(require 'arroyo-arcology)
#+end_src

That script is loaded by this one, which isn't really a script but a shell template embedded in a Python module, so that the paths and variables can be customized at run time from the [[id:20220117T162655.535047][Arcology BaseSettings]].

(lord help me)

#+NAME: arcology-batch-shell
#+begin_src shell
set -ex
export DBPATH=$(mktemp $(dirname {arcology_db})/arcology.XXXXXXXXXX.db)
pushd {arcology_src};

cp {arcology_db} $DBPATH || echo "no existing db found, will be created from scratch"
{emacs} -Q --batch \
      --eval '(setq org-roam-directory "{arcology_dir}")' \
      --eval '(setq arcology-source-directory "{arcology_src}/lisp")' \
      --eval '(setq arroyo-source-directory "{arroyo_src}")' \
      --eval '(setq arroyo-db-location "'$DBPATH'")' \
      --eval '(setq org-roam-db-location "{org_roam_db}")' \
      -l lisp/arcology-batch.el \
      --eval '(org-roam-db-sync)' # \
# --eval '(arroyo-db-update-all-roam-files)' \
# --eval '(arroyo-db-update-all-roam-files)' \
# --eval '(arroyo-arcology-update-db)'

mv $DBPATH {arcology_db}
echo "rebuild done"
#+end_src

The Python below pulls values out of that [[id:20220117T162655.535047][FastAPI/Pydantic =BaseSettings= module]] and templates them into the shell script with =str.format()=, which also means the script can't contain a literal ={= or =}= unless it is doubled. Sorry for [[id:cce/literate_programming][Literate Programming]] ([[https://www.youtube.com/watch?v=SkTt9k4Y-a8][sorry for party rocking]]).

#+begin_src python :tangle arcology/batch.py :noweb yes
from .config import get_settings

COMMAND_TMPL = """
<<arcology-batch-shell>>
"""

def build_command():
    """Render the batch shell script with paths from the Arcology settings."""
    settings = get_settings()

    return COMMAND_TMPL.format(
        arcology_dir=settings.arcology_directory,
        arcology_src=settings.arcology_src,
        arroyo_src=settings.arroyo_src,
        arcology_db=settings.arcology_db,
        org_roam_db=settings.org_roam_db,
        emacs=settings.arroyo_emacs,
    )
#+end_src
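Since the whole shell script passes through =str.format()=, the placeholder rules are worth spelling out. Here is a tiny hypothetical template in the same style as =COMMAND_TMPL= (=DEMO_TMPL= and =render= are illustrative and not part of =arcology.batch=), showing a filled-in placeholder next to a shell expansion whose braces are doubled so =format()= leaves them alone:

#+begin_src python :tangle no
# Named {placeholders} are filled in by str.format(); literal braces, such
# as a shell ${VAR} expansion, have to be written doubled as ${{VAR}}.
DEMO_TMPL = """
set -ex
echo "building {arcology_db}"
echo "${{HOME}} is left for the shell to expand"
"""


def render(**values: str) -> str:
    return DEMO_TMPL.format(**values)
#+end_src

A placeholder that isn't passed to =format()= raises =KeyError=, so adding a new ={name}= to the shell block also means adding a matching keyword argument to =build_command()=.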

This is executed by [[id:20211218T222408.578567][Arcology Automated Database Builder]].
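How the builder actually invokes the command isn't shown on this page. One plausible way, sketched under that assumption: hand the rendered string to =bash -c= (the template relies on =pushd=, a bash builtin, so plain =/bin/sh= may not work) and let =check=True= turn a failing =set -e= step into an exception. =run_batch= is a hypothetical helper.

#+begin_src python :tangle no
import subprocess


def run_batch(command: str) -> str:
    """Run a rendered batch script with bash and return its stdout."""
    result = subprocess.run(
        ["bash", "-c", command],
        check=True,           # raise CalledProcessError on a non-zero exit
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,
    )
    return result.stdout
#+end_src

=run_batch(build_command())= would then return the script's stdout, ending with "rebuild done"; the =set -x= trace goes to stderr.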