:PROPERTIES:
:ID: arcology/arroyo-page
:END:
#+TITLE: Arroyo Arcology Generator
#+filetags: :Project:Arcology:
#+ARCOLOGY_KEY: arcology/arroyo
#+ARCOLOGY_ALLOW_CRAWL: t
#+AUTO_TANGLE: t
[[shell:ln -s arroyo-arcology.el ~/org/cce/arroyo-arcology.el]] this needs to be in the CCE directory for [[id:arroyo/emacs][Arroyo Emacs]] to automatically load it.
This can be set up to automatically load in an [[id:arroyo/emacs][Arroyo Emacs]] environment.
#+ARROYO_EMACS_MODULE: arroyo-arcology
#+ARROYO_MODULE_WANTS: arroyo/arroyo.org
The Arcology is fundamentally about rendering and sharing entire org-mode documents on the web. This made the direct usage of [[id:cce/org-roam][org-roam]]'s database a pretty straight-forward endeavor, until the migration to a Node-centered model with org-roam v2. This model has made my note-taking much better but it's forced me to rethink the data model of the Arcology pretty significantly.
This ultimately has developed over 2021 as [[id:arroyo/arroyo][Arroyo Systems Management]] -- a set of sidecar metadata tables for my notes and the [[id:128ab0e8-a1c7-48bf-9efe-0c23ce906a48][org-mode meta applications]] built on top of them. The Arcology's database is a set of tables derived from the metadata in my org-mode files. This database is generated inside of Emacs and mounted read-only by my FastAPI session via [[id:20210925T182140.388493][SQLModel]]. I would love to generate this database another way, but there is still only one high-quality org parser: org-mode.
The "entry point" of this API is the =arcology.arroyo.Page= below. It has some class methods hanging off it which can instantiate Pages from the database by filename or routing key.
A page doesn't require much metadata to render or be found, really: the org-mode source file, its =ARCOLOGY_KEY= routing key, and the root [[id:20211203T142533.902422][arcology.roam.Node]] object's primary ID. Most of this can be gleaned from the [[id:20211203T142617.812313][arcology.roam.File]] object and my Keyword sidecar.
#+begin_src emacs-lisp
(add-to-list 'arroyo-db-keywords "ARCOLOGY_KEY")
(add-to-list 'arroyo-db-keywords "ARCOLOGY_FEED")
(add-to-list 'arroyo-db-keywords "ARCOLOGY_TOOT_VISIBILITY")
(add-to-list 'arroyo-db-keywords "ARCOLOGY_ALLOW_CRAWL")
#+end_src
The =ARCOLOGY_KEY= is a file property which contains the page's "routing key" -- a string with at least one =/= in it which separates the site it'll publish to from the path it'll be published on. This maps to a URL in the form of =localhost:8000/$ARCOLOGY_KEY=, or the first part will map to one of the public domains. This will make more sense later on.
The =ARCOLOGY_FEED= is a file property which contains a routing key to an RSS feed.
#+PROPERTY: header-args:emacs-lisp :tangle arroyo-arcology.el :results none :mkdirp yes :comments link
This is assembled using [[id:09779ac0-4d5f-40db-a340-49595c717e03][noweb syntax]] because Page relies on Link being defined for the =link_model= relationship... And there is some more code that makes it in to =arcology.arroyo= for setting up the session and engine down below under [[id:arcology/arroyo/sqlmodel][Arcology SQLModel Database Bindings]] ...
#+begin_src python :tangle arcology/arroyo.py :noweb yes
from typing import Optional, List
from sqlmodel import Field, Relationship, SQLModel
from arcology.parse import parse_sexp, print_sexp
<<arcology.arroyo.Link>>
<<arcology.arroyo.Page>>
<<arcology.arroyo.Tag>>
<<arcology.arroyo.Node>>
<<arcology.arroyo.Ref>>
<<arcology.arroyo.Keyword>>
<<arcology.arroyo.Feed>>
#+end_src
Anyways.
* NEXT document schemas
explain inter-relations between these classes, maybe a relationship graph
explain columns and link to where specialized columns like =allow_crawl= go and come from?
* Arcology Page
:PROPERTIES:
:ID: arcology/arroyo/page
:ROAM_ALIASES: arcology.arroyo.Page
:END:
A Page represents the minimal metadata required to find and render an [[id:1fb8fb45-fac5-4449-a347-d55118bb377e][org-mode]] document and generate links to it. I would love to someday not have to wire up all these relationships by hand, and I'll have to remodel this at some point, but for now specifying all the =primaryjoin= characteristics is enough.
#+NAME: arcology.arroyo.Page
#+begin_src python :noweb yes
from sqlmodel import Session, select
import hashlib
from arcology.key import ArcologyKey, id_to_arcology_key
import arcology.html as html


class Page(SQLModel, table=True):
    __tablename__ = "arcology_pages"

    file: str = Field(primary_key=True)
    key: str = Field(description="The ARCOLOGY_KEY for the page")
    title: str = Field(description="Primary title of the page")
    hash: str = Field(description="The hash of the file when it was indexed")
    root_id: str = Field(description="The ID for the page itself", foreign_key="nodes.node_id")
    site: str = Field(description="Maps to an arcology.Site key.")
    allow_crawl: str = Field(description="Lisp boolean for whether this page should go in robots.txt")

    nodes: List["Node"] = Relationship(
        back_populates="page",
        sa_relationship_kwargs=dict(
            primaryjoin="Node.file==Page.file"
        )
    )
    tags: List["Tag"] = Relationship(
        sa_relationship_kwargs=dict(
            primaryjoin="Tag.file==Page.file"
        )
    )
    references: List["Reference"] = Relationship(
        sa_relationship_kwargs=dict(
            primaryjoin="Reference.file==Page.file"
        )
    )

    # Columns hold printed s-expressions; these accessors decode them.
    def get_title(self):
        return parse_sexp(self.title)

    def get_key(self):
        return parse_sexp(self.key)

    def get_file(self):
        return parse_sexp(self.file)

    def get_arcology_key(self):
        return ArcologyKey(self.get_key())

    def get_site(self):
        return self.get_arcology_key().site

    <<page_link_relationships>>
    <<page_classmethods>>
    <<page_html_generators>>
#+end_src
#+NAME: page_classmethods
#+begin_src python
@classmethod
def from_file(cls, path: str, session: Session):
    # Raises if there is not exactly one page for this file.
    q = select(cls).where(cls.file==print_sexp(path))
    return session.exec(q).one()

@classmethod
def from_key(cls, key: str, session: Session):
    # Returns the first matching page, or None if the key is unknown.
    q = select(cls).where(cls.key==print_sexp(key))
    return session.exec(q).first()
#+end_src
The Page carries bi-directional link relationships to both the Link and the Page on the other side of it.
#+NAME: page_link_relationships
#+begin_src python
backlinks: List["Link"] = Relationship(
    back_populates="dest_page",
    sa_relationship_kwargs=dict(
        primaryjoin="Page.file==Link.dest_file"
    )
)
outlinks: List["Link"] = Relationship(
    back_populates="source_page",
    sa_relationship_kwargs=dict(
        primaryjoin="Page.file==Link.source_file"
    )
)
backlink_pages: List["Page"] = Relationship(
    link_model=Link,
    back_populates="outlink_pages",
    sa_relationship_kwargs=dict(
        foreign_keys="[Link.dest_file]",
        viewonly=True,
    )
)
outlink_pages: List["Page"] = Relationship(
    link_model=Link,
    back_populates="backlink_pages",
    sa_relationship_kwargs=dict(
        foreign_keys="[Link.source_file]",
        viewonly=True,
    )
)
#+end_src
The code to insert a page relies on a bunch of stuff pulled out of the page and out of the [[id:arcology/arroyo/keyword][Arcology Keywords]] store -- be sure the arguments line up. Maybe I should switch these to use =&key= arguments eventually so that it's less foot-gun-shaped.
#+begin_src emacs-lisp
(add-to-list 'arroyo-db--schemata
'(arcology-pages
[(file :not-null)
(key :not-null)
(site :not-null)
(title :not-null)
(root-id :not-null)
(allow-crawl)
(hash :not-null)]))
(defun arroyo-arcology--insert-page (file kw site title root-id allow-crawl hash)
(arroyo-db-query [:delete :from arcology-pages
:where (= file $s1)]
file)
(arroyo-db-query [:insert :into arcology-pages :values $v1]
(vector file kw site title root-id allow-crawl hash)))
#+end_src
** Generating HTML from Arcology Pages
:PROPERTIES:
:ID: arcology/arroyo/gen_html
:END:
Arcology pages have two "documents" attached to them on render: the Org doc itself, and a document constituted from the backlinks.
The backlink document is generated dynamically using =Page.make_backlinks_org= which just generates a string from the Link relationships.
#+NAME: page_html_generators
#+begin_src python
def make_backlinks_org(self):
    if self.backlinks is None:
        return ''

    def to_org(link: Link):
        return \
"""
,* [[id:{path}][{title}]]
""".format(
            path=parse_sexp(link.source_id),
            title=link.get_source_title()
        )

    return '\n'.join([ to_org(link) for link in self.backlinks ])

async def document_html(self):
    cache_key = parse_sexp(self.hash)
    return html.gen_html(parse_sexp(self.file), cache_key)

async def backlink_html(self):
    org = self.make_backlinks_org()
    cache_key = hashlib.sha224(org.encode('utf-8')).hexdigest()
    return html.gen_html_text(org, cache_key)
#+end_src
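To make the shape of the backlink document concrete, here is a minimal standalone sketch. =BacklinkRow= and =backlinks_to_org= are hypothetical stand-ins for the =Link= rows and =Page.make_backlinks_org=; the real method decodes its columns with =parse_sexp= and pads each heading with blank lines, both of which are skipped here.

#+begin_src python :tangle no
from typing import List, NamedTuple


class BacklinkRow(NamedTuple):
    """Hypothetical stand-in for a Link row with already-decoded columns."""
    source_id: str
    source_title: str


def backlinks_to_org(rows: List[BacklinkRow]) -> str:
    # One top-level Org heading per backlink, each an ID link to the source node.
    return "\n".join(
        "* [[id:{id}][{title}]]".format(id=row.source_id, title=row.source_title)
        for row in rows
    )


rows = [
    BacklinkRow("abc", "Arroyo -> Feeds"),
    BacklinkRow("def", "Topic Index"),
]
org = backlinks_to_org(rows)
print(org)
#+end_src

The resulting string is what gets handed to Pandoc as if it were a second, very small Org file.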
** Invoking Pandoc
[[https://pandoc.org/][Pandoc]] is used to generate the HTML for a page. It's a versatile kit and I do some fair bit to extend it in other places, for example in the
The HTML generation is done using [[https://pypi.org/project/pypandoc/][PyPandoc]], which I guess is just a shell wrapper around it. Caching is cheated with a [[https://docs.python.org/3/library/functools.html#functools.lru_cache][functools.lru_cache]]; for this to work out well I need to bring the file's hash in to the [[id:arcology/arroyo/page][arcology.arroyo.Page]] so that the cache can bust when the document is updated.
#+begin_src python :tangle arcology/html.py
import functools
import pypandoc


# extra_cache_key is never read; it only takes part in the lru_cache lookup
# so that a changed file hash forces a fresh render.
@functools.lru_cache(maxsize=128)
def gen_html(input_path: str, extra_cache_key: str = '', input_format: str = 'org'):
    return pypandoc.convert_file(input_path, 'html', format=input_format)


@functools.lru_cache(maxsize=128)
def gen_html_text(input_text: str, extra_cache_key: str = '', input_format: str = 'org'):
    return pypandoc.convert_text(input_text, 'html', format=input_format)
#+end_src
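The cache-busting trick is easier to see without Pandoc in the way. This is a minimal sketch, assuming a hypothetical =render= function in place of =gen_html=: the extra argument is ignored by the body, but because =functools.lru_cache= keys on every argument, a new hash means a new cache entry.

#+begin_src python :tangle no
import functools

# Records every time the function body really runs (i.e. every cache miss).
calls = []


@functools.lru_cache(maxsize=128)
def render(path: str, extra_cache_key: str = "") -> str:
    # Hypothetical stand-in for the expensive Pandoc call.
    calls.append((path, extra_cache_key))
    return "<p>rendered {}</p>".format(path)


render("page.org", "hash-1")  # miss: body runs
render("page.org", "hash-1")  # hit: same path and same hash
render("page.org", "hash-2")  # miss: the file hash changed
print(len(calls), render.cache_info().hits)
#+end_src

The cost of this approach is that stale renders stay in the cache until they are evicted by =maxsize=, which seems acceptable at this scale.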
** Rewriting and Hydrating the Pandoc HTML
:PROPERTIES:
:ID: arcology/arroyo/hydrate
:END:
So the HTML that comes out of Pandoc is smart but doesn't understand, for example, ID links; I could of course use Emacs and its =org-html-export-as-html= but that shit is gonna be really slow. Instead I'll do the work myself (lol).
#+begin_src python :tangle arcology/html.py
from arcology.parse import print_sexp, parse_sexp
import arcology.arroyo as arroyo
import sqlmodel
import re
from typing import Optional
from arcology.key import id_to_arcology_key, file_to_arcology_key


class HTMLRewriter():
    def __init__(self, session):
        self.res_404 = 'href="/404?missing={key}" class="dead-link"'
        self.session = session

    def replace(self, match):
        # Subclasses return the replacement text for one regex match.
        raise NotImplementedError()

    def re(self):
        # Subclasses return the pattern to search for.
        raise NotImplementedError()

    def do(self, output_html):
        return re.sub(self.re(), self.replace, output_html)
#+end_src
Rewriting the HTML is a pretty straightforward affair using [[https://docs.python.org/3/library/re.html#re.sub][re.sub]] with callbacks rather than static replacements, with some abstraction sprinkled on top in the form of the =HTMLRewriter= superclass defined above. Each implementation of it provides a function which accepts the match object, and pulls the node's [[id:arcology/arroyo/key][=ARCOLOGY_KEY=]] with an optional node-id anchor attached to it. This is then farmed out to [[id:arcology/arroyo/key][=arcology_key_to_url=]] or so to be turned in to a URL. In this fashion, each =href= is replaced with a URL that will route to the target page, or a 404 page link with a CSS class attached.
I'm pretty sure this is all quite inefficient but as always I invoke [[id:personal_software_can_be_shitty][Personal Software Can Be Shitty]].
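For readers who haven't used =re.sub= with a callable, here is a minimal self-contained sketch of the technique. The =KNOWN_IDS= dictionary is a hypothetical stand-in for the database lookup, and the URLs in it are made up for illustration.

#+begin_src python :tangle no
import re

# Hypothetical lookup table standing in for the node-ID -> URL database query.
KNOWN_IDS = {
    "arcology/arroyo/page": "/arcology/arroyo#arcology/arroyo/page",
}


def replace_id_href(match: re.Match) -> str:
    # re.sub calls this once per match and splices in whatever it returns.
    node_id = match.group(1)
    url = KNOWN_IDS.get(node_id)
    if url is None:
        return 'href="/404?missing={key}" class="dead-link"'.format(key=node_id)
    return 'class="internal" href="{url}"'.format(url=url)


html_in = '<a href="id:arcology/arroyo/page">Page</a> <a href="id:nope">gone</a>'
html_out = re.sub(r'href="id:([^"]+)"', replace_id_href, html_in)
print(html_out)
#+end_src

Only the =href= attribute is matched and replaced, so the rest of each anchor tag passes through untouched.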
So ID links can be rewritten like:
#+begin_src python :tangle arcology/html.py
class IDReplacementRewriter(HTMLRewriter):
    def replace(self, match):
        id = match.group(1)
        key = id_to_arcology_key(id, self.session)
        if key is None:
            return self.res_404.format(key=id)
        else:
            return 'class="internal" href="{url}"'.format(url=arcology_key_to_url(key))

    def re(self):
        return r'href="id:([^"]+)"'
#+end_src
File links can be rewritten like:
#+begin_src python :tangle arcology/html.py
class FileReplacementRewriter(HTMLRewriter):
    def replace(self, match):
        file = match.group(1)
        if file is None:
            return self.res_404.format(key=file)
        key = file_to_arcology_key(file, self.session)
        if key is None:
            return self.res_404.format(key=file)
        else:
            return 'class="file" href="{url}"'.format(url=arcology_key_to_url(key))

    def re(self):
        return r'href="file://([^"]+)"'
#+end_src
[[id:cce/org-roam][org-roam]] stub links can be rewritten too. This one is a little wonky because =res_404= and the other regexen only want to operate on the anchor's attributes, while this one also wants to strip the =roam:= text from the =[[roam:Stub]]= links.
#+begin_src python :tangle arcology/html.py
class RoamReplacementRewriter(HTMLRewriter):
    def replace(self, match):
        # The match swallows the closing ">" and the "roam:" prefix, so put the ">" back.
        return self.res_404.format(key=match.group(1)) + ">"

    def re(self):
        return r'href="roam:([^"]+)">roam:'
#+end_src
I also make some quality-of-life rewrites of my [[id:2e31b385-a003-4369-a136-c6b78c0917e1][org-fc]] cloze cards in to simple =<span>= elements with the hint embedded in them.
#+begin_src python :tangle arcology/html.py
class FCClozeReplacementRewriter(HTMLRewriter):
    def replace(self, match):
        main = match.group(1) or ""
        hint = match.group(2) or ""
        # Strip any markup Pandoc rendered inside the hint before using it as a title.
        hint = re.sub(r"</?[^>]+>", "", hint)
        return f"<span class='fc-cloze' title='{hint}'>{main}</span>"

    def re(self):
        return r'{{([^}]+)}{?([^}]+)?}?@[0-9]+}'
#+end_src
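As a quick check of what that pattern does, here is a standalone sketch that applies the same regular expression outside of the rewriter class. The sample sentences are made up, and I'm assuming cloze markup of the form ={{text}{hint}@N}= or ={{text}@N}= survives Pandoc verbatim.

#+begin_src python :tangle no
import re

# Same pattern as FCClozeReplacementRewriter.re() above.
CLOZE_RE = r'{{([^}]+)}{?([^}]+)?}?@[0-9]+}'


def cloze_to_span(match):
    main = match.group(1) or ""
    hint = match.group(2) or ""  # group 2 is None when the card has no hint
    hint = re.sub(r"</?[^>]+>", "", hint)
    return f"<span class='fc-cloze' title='{hint}'>{main}</span>"


with_hint = re.sub(CLOZE_RE, cloze_to_span, "The capital is {{Paris}{a city}@0}.")
without_hint = re.sub(CLOZE_RE, cloze_to_span, "The capital is {{Paris}@1}.")
print(with_hint)
print(without_hint)
#+end_src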
Invoke all these in a simple little harness:
#+begin_src python :tangle arcology/html.py
def rewrite_html(input_html: str, session: sqlmodel.Session) -> str:
    """
    Run a series of replacement functions on the input HTML and return a new string.
    """
    output_html = input_html
    rewriters = [
        IDReplacementRewriter(session),
        FileReplacementRewriter(session),
        RoamReplacementRewriter(session),
        FCClozeReplacementRewriter(session),
    ]
    for rewriter in rewriters:
        output_html = rewriter.do(output_html)
    return output_html
#+end_src
It's logical that at some point this will have a "pluggable" URL engine, and in fact the production URLs will be hosted under different domains so deconstructing a URL to an ARCOLOGY_KEY ... all of this can happen later, I am just playing jazz right now!
#+begin_src python :tangle arcology/html.py
from arcology.key import ArcologyKey


def arcology_key_to_url(key: ArcologyKey) -> str:
    return key.to_url()
#+end_src
** =arcology.key.ArcologyKey= encapsulates parsing and rendering URLs
:PROPERTIES:
:ID: arcology/arroyo/key
:ROAM_ALIASES: arcology.key.ArcologyKey arcology.key.file_to_arcology_key arcology.key.id_to_arcology_key
:END:
The =ArcologyKey= is a simple =dataclass= encapsulating the things which the =ARCOLOGY_KEY= page keyword represents.
For example, the key =ArcologyKey(key="arcology/arroyo#arcology/arroyo/key")= will contain the following properties:
- =key=: the key passed in
- =site_key=: this is everything up to the first slash. It points to objects defined and fetchable through [[id:20211219T144255.001827][Arcology Sites]].
- =site=: I typed the line above, and said "oh", and added this resolution of the =arcology.sites.Site= object.
- =rest=: "rest" is everything after the slash, but up to an optional anchor
- =anchor_id=: said optional anchor -- Pandoc headings within the page will have the =ID= property as the anchor, this is handy!
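The split described in the list above can be sketched in a few lines. =KeyParts= and =split_key= are hypothetical names used only for illustration; this version leans on =str.partition= and skips the =Site= lookup, so it is not the real class, and it treats everything after the first =#= as the anchor.

#+begin_src python :tangle no
from typing import NamedTuple, Optional


class KeyParts(NamedTuple):
    """Hypothetical, Site-free view of the pieces of a routing key."""
    site_key: str
    rest: str
    anchor_id: Optional[str]


def split_key(key: str) -> KeyParts:
    # Everything before the first "/" names the site.
    site_key, _, remainder = key.partition("/")
    # After that, an optional "#" separates the path from the heading anchor.
    rest, hash_mark, anchor = remainder.partition("#")
    return KeyParts(site_key, rest, anchor if hash_mark else None)


parts = split_key("arcology/arroyo#arcology/arroyo/key")
print(parts)
#+end_src

Note that the anchor may itself contain slashes, which is why the first slash and the first hash are the only separators that matter.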
#+begin_src python :tangle arcology/key.py
from dataclasses import dataclass
from typing import Optional
from fastapi import Request
from starlette import routing
import sqlmodel

from arcology.parse import parse_sexp, print_sexp
from arcology.sites import sites, Site
from arcology.config import get_settings, Environment
from arcology.sites import host_to_site

route_regexp, _, _ = routing.compile_path("/{sub_key:path}/")
route_regexp2, _, _ = routing.compile_path("/{sub_key:path}")

import logging
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)


@dataclass
class ArcologyKey():
    key: str
    site_key: str
    site: Site
    rest: str = ""
    anchor_id: Optional[str] = None

    def __init__(self, key: str, site_key="", rest="", anchor_id = None):
        self.key = key
        self.site_key=site_key
        self.rest = rest
        self.anchor_id = anchor_id

        # Split on the first "/" and then on "#": site_key / rest # anchor_id
        stop = '/'
        idx = 0
        collector = [""]
        for char in key:
            if char == stop:
                stop = '#'
                idx += 1
                collector = collector + [""]
                continue
            collector[idx] += char

        if len(collector) > 0:
            self.site_key = collector[0]
            self.site = sites.get(self.site_key, None)
        if len(collector) > 1:
            self.rest = collector[1]
        if len(collector) > 2:
            self.anchor_id = collector[2]

    def to_url(self) -> str:
        env = get_settings().arcology_env
        domains = self.site.domains.get(env, None)
        url = ""
        if domains is not None:
            url = "https://{domain}/{rest}".format(domain=domains[0], rest=self.rest)
        else:
            url = "http://localhost:8000/{key}".format(key=self.key)
        if self.anchor_id is not None:
            url = url + "#" + self.anchor_id
        return url

    @staticmethod
    def from_request(request: Request):
        path = request.url.path
        host = request.headers.get('host')
        return ArcologyKey.from_host_and_path(host, path)

    @staticmethod
    def from_host_and_path(host: str, path: str):
        m = route_regexp.match(path) or route_regexp2.match(path) or None
        if m is None:
            logger.debug("no path match: %s", path)
            return None
        sub_key = m.group("sub_key")

        site = host_to_site(host)
        if site is None:
            logger.debug("no host match: %s", host)
            return None

        # A bare "/" routes to the site's index page.
        if len(sub_key) == 0:
            sub_key = "index"
        key = "{site_key}/{sub_key}".format(
            site_key=site.key,
            sub_key=sub_key,
        )
        return ArcologyKey(key)
#+end_src
Retrieving the =ARCOLOGY_KEY= given an ID is a pretty straightforward SQLModel query, actually. If the referenced node is in the Arroyo database, by definition it's got a published [[id:arcology/arroyo/page][arcology.arroyo.Page]], and so it's a matter of going and fetching it. If the Node is the root node (a direct link to the document), simply return the key, otherwise append the node-id to it so that a URL can link directly to the heading's anchor.
#+begin_src python :tangle arcology/key.py
def id_to_arcology_key(id: str, session: sqlmodel.Session) -> Optional[ArcologyKey]:
    """
    Given a node ID, return the ARCOLOGY_KEY for the node.
    """
    from .arroyo import Node
    linked_node_query = sqlmodel.select(Node) \
                                .where(Node.node_id==print_sexp(id))
    res = session.exec(linked_node_query)
    linked_node = res.all()
    if len(linked_node) == 1:
        linked_node = linked_node[0]
        linked_page = linked_node.page
        if linked_page is None:
            return None
        page_key = parse_sexp(linked_page.key)
        ret = ArcologyKey(key=page_key)
        # Only sub-headings get an anchor; the root node links to the page itself.
        if linked_node.level != 0:
            ret.anchor_id = id
        return ret
    elif len(linked_node) != 0:
        raise Exception(f"more than one key for node? {id}")
    else:
        return None
#+end_src
Looking it up by file is even simpler:
#+begin_src python :tangle arcology/key.py
def file_to_arcology_key(file: str, session: sqlmodel.Session) -> Optional[ArcologyKey]:
    """
    Given a file path, return the ARCOLOGY_KEY for the page in that file.
    """
    from .arroyo import Page
    key_q = sqlmodel.select(Page).where(Page.file == print_sexp(file))
    page = session.exec(key_q).first()
    if page is None:
        return None
    page_key = parse_sexp(page.key)
    return ArcologyKey(key=page_key)
#+end_src
** NEXT HTML should inject sidenotes in during rewrite_html?
:PROPERTIES:
:ID: 20211219T165357.962899
:END:
this would be slow and maybe janky but that's probably fine once it's memoized.
but this would mean that node backlinks would appear in-line, things like [[id:6b306fe3-fbc4-4ba7-bfcb-089c0564f9c3][Topic Index]] have some trouble otherwise.
* Arcology Tags
:PROPERTIES:
:ID: arcology/arroyo/tag
:ROAM_ALIASES: arcology.arroyo.Tag
:END:
#+NAME: arcology.arroyo.Tag
#+begin_src python
class Tag(SQLModel, table=True):
    __tablename__ = "arcology_tags"

    file: str = Field(primary_key=True, foreign_key="arcology_pages.file")
    tag: str = Field(primary_key=True, description="The tag itself.")
    node_id: str = Field(description="A heading ID which the tag applies to")

    # Named get_tag so the accessor does not shadow the tag column above.
    def get_tag(self):
        return parse_sexp(self.tag)
#+end_src
A page has any number of tags according to the file primary key:
#+begin_src emacs-lisp
(add-to-list 'arroyo-db--schemata
'(arcology-tags
[(file :not-null)
(tag :not-null)
(node-id :not-null)]))
(defun arroyo-arcology--insert-tags (file node-tags)
(arroyo-db-query [:delete :from arcology-tags
:where (= file $s1)]
file)
(pcase-dolist (`(,tag ,node-id) node-tags)
(arroyo-db-query [:insert :into arcology-tags
:values $v1]
(vector file tag node-id))))
#+end_src
* Arcology Links
:PROPERTIES:
:ID: arcology/arroyo/link
:ROAM_ALIASES: arcology.arroyo.Link
:END:
And for rewriting the links to point to their routing key, two tables:
A =links= table which contains the file *and* node ID references, as well as the title of the source file which can be used to quickly generate backlink listings for a given page (and its sub-heading nodes):
#+NAME: arcology.arroyo.Link
#+begin_src python
class Link(SQLModel, table=True):
    __tablename__ = "arcology_links"

    source_title: Optional[str] = Field(default="", description="The title of the page the link is written in.")

    def get_source_title(self):
        return parse_sexp(self.source_title)

    source_id: str = Field(primary_key=True, foreign_key="arcology_nodes.node_id")
    source_node: Optional["Node"] = Relationship(
        sa_relationship_kwargs=dict(
            # back_populates="outlinks",
            primaryjoin="Node.node_id == Link.source_id"
        )
    )

    dest_id: str = Field(primary_key=True, foreign_key="arcology_nodes.node_id")
    dest_node: Optional["Node"] = Relationship(
        sa_relationship_kwargs=dict(
            # back_populates="backlinks",
            primaryjoin="Node.node_id == Link.dest_id"
        )
    )

    source_file: str = Field(primary_key=True, foreign_key="arcology_pages.file")
    source_page: Optional["Page"] = Relationship(
        back_populates="outlinks",
        sa_relationship_kwargs=dict(
            primaryjoin="Page.file==Link.source_file"
        )
    )

    dest_file: str = Field(primary_key=True, foreign_key="arcology_pages.file")
    dest_page: Optional["Page"] = Relationship(
        back_populates="backlinks",
        sa_relationship_kwargs=dict(
            primaryjoin="Page.file==Link.dest_file"
        )
    )
#+end_src
Links in the [[id:cce/org-roam][org-roam]] database have a useful =type= column. We only store ID =Links= for now... probably can support file links easily enough but other "unidirectional" links I would like to store elsewhere I think.
#+begin_src emacs-lisp
(add-to-list 'arroyo-db--schemata
'(arcology-links
[source-title
(source-file :not-null)
(source-id :not-null)
(dest-file :not-null)
(dest-id :not-null)]))
(defun arcology--published-page? (file)
(not (not (arroyo-db-get "ARCOLOGY_KEY" file))))
(defun arroyo-arcology--insert-links (file source-title links)
  (arroyo-db-query [:delete :from arcology-links
                    :where (= source-file $s1)]
                   file)
  (pcase-dolist (`(,source ,dest ,type ,props) links)
    (cond ((equal type "id")
           (pcase-let* ((dest-file (caar (org-roam-db-query
                                          [:select file :from nodes
                                           :where (= id $s1)]
                                          dest)))
                        (`(,immediate-source-title ,immediate-source-level)
                         (car (org-roam-db-query
                               [:select [title level] :from nodes
                                :where (= id $s1)]
                               source)))
                        ;; "level 0 -> level n" unless n == 0
                        (composed-node-title
                         (if (= 0 immediate-source-level)
                             source-title
                           (concat source-title " -> " immediate-source-title))))
             ;; only store links whose destination is a published page
             (when (and dest-file (arcology--published-page? dest-file))
               (arroyo-db-query [:insert :into arcology-links
                                 :values $v1]
                                (vector composed-node-title file source dest-file dest)))))
          ;; insert https link?
          ((equal type "https") nil)
          ((equal type "http") nil)
          ((equal type "roam") nil)
          (t nil))))
#+end_src
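The title composition in =arroyo-arcology--insert-links= is the part that shows up in backlink listings, so here is the same rule as a tiny Python sketch. =compose_source_title= is a hypothetical name; the real logic lives in the Emacs Lisp above.

#+begin_src python :tangle no
def compose_source_title(page_title: str, node_title: str, node_level: int) -> str:
    # A link written at level 0 is attributed to the page itself;
    # anything deeper is shown as "page -> heading".
    if node_level == 0:
        return page_title
    return page_title + " -> " + node_title


print(compose_source_title("Arroyo Arcology Generator", "Arroyo Arcology Generator", 0))
print(compose_source_title("Arroyo Arcology Generator", "Arcology Links", 1))
#+end_src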
** INPROGRESS =source_title= should populate with the immediate parent header's title, not level 0
:LOGBOOK:
- State "INPROGRESS" from "NEXT" [2022-08-05 Fri 14:03]
:END:
It's passed in to =arroyo-arcology--insert-links= [[id:arcology/arroyo][Below]]. Not sure the better way to do that -- query =org-roam-db= in the insert function itself? good enough for now prolly.
deal with the title being fetched and populated in that function below if necessary.
* Arcology Nodes
:PROPERTIES:
:ID: arcology/arroyo/node
:ROAM_ALIASES: arcology.arroyo.Node
:END:
A =nodes= table will help in reassembling links in to =HREFs=, in theory, but I don't think it's necessary? Maybe? There is a bunch of other metadata on this that I would like to pull across from [[id:cce/org-roam][org-roam]] eventually.
#+NAME: arcology.arroyo.Node
#+begin_src python
class Node(SQLModel, table=True):
    __tablename__ = "arcology_nodes"

    node_id: str = Field(primary_key=True, description="The heading ID property")
    file: str = Field(description="File in which this Node appears", foreign_key="arcology_pages.file")
    level: str = Field(description="Outline depth of the heading. 0 is top-level")

    page: Optional["Page"] = Relationship(
        back_populates="nodes",
        sa_relationship_kwargs=dict(
            viewonly=True,
            primaryjoin="Node.file==Page.file"
        )
    )
#+end_src
#+begin_src emacs-lisp
(add-to-list 'arroyo-db--schemata
'(arcology-nodes
[(node-id :not-null)
(file :not-null)
(level :not-null)]))
(defun arroyo-arcology--insert-nodes (file nodes)
(arroyo-db-query [:delete :from arcology-nodes
:where (= file $s1)]
file)
(pcase-dolist (`(,file ,id ,level) nodes)
(arroyo-db-query [:insert :into arcology-nodes
:values $v1]
(vector id file level))))
#+end_src
* Arcology References
:PROPERTIES:
:ID: arcology/arroyo/ref
:ROAM_ALIASES: arcology.arroyo.Reference
:END:
Each [[id:cce/org-roam][org-roam]] node can have a set of "references" attached to it. I use these URIs to point to a "canonical" resource which the node is referencing.
#+NAME: arcology.arroyo.Ref
#+begin_src python
class Reference(SQLModel, table=True):
    __tablename__ = "arcology_refs"

    file: str = Field(primary_key=True, foreign_key="arcology_pages.file")
    ref: str = Field(primary_key=True, description="The full URI of the reference itself.")
    node_id: str = Field(description="A heading ID which the ref applies to")

    def url(self):
        return parse_sexp(self.ref)
#+end_src
A page has any number of refs according to the file primary key:
#+begin_src emacs-lisp
(add-to-list 'arroyo-db--schemata
'(arcology-refs
[(file :not-null)
(ref :not-null)
(node-id :not-null)]))
(defun arroyo-arcology--insert-refs (file node-refs)
(arroyo-db-query [:delete :from arcology-refs
:where (= file $s1)]
file)
(pcase-dolist (`(,ref ,type ,node-id) node-refs)
(arroyo-db-query [:insert :into arcology-refs
:values $v1]
(vector file (format "%s:%s" type ref) node-id))))
#+end_src
* INPROGRESS Arcology Feeds
:PROPERTIES:
:ID: arcology/arroyo/feed
:ROAM_ALIASES: arcology.arroyo.Feed
:END:
:LOGBOOK:
- State "INPROGRESS" from [2023-01-24 Tue 23:33]
:END:
#+NAME: arcology.arroyo.Feed
#+begin_src python
class Feed(SQLModel, table=True):
    __tablename__ = "arcology_feeds"

    file: str = Field(primary_key=True, foreign_key="arcology_pages.file")
    key: str = Field(primary_key=True, description="The routing key for the feed.")
    title: str = Field(description="Title of the page which the feed is embedded in")
    site: str = Field(description="Arcology Site which the feed resides on.")
    post_visibility: str = Field(description="Visibility of the feed's posts in feed2toot, etc")

    def get_key(self):
        return parse_sexp(self.key)

    def get_arcology_key(self):
        return ArcologyKey(self.get_key())

    def get_title(self):
        return parse_sexp(self.title)

    def get_site(self):
        return parse_sexp(self.site)

    def get_post_visibility(self):
        return parse_sexp(self.post_visibility)

    def dict(self, **kwargs):
        return dict(
            key=self.get_key(),
            url=self.get_arcology_key().to_url(),
            title=self.get_title(),
            site=self.get_site(),
            visibility=self.get_post_visibility(),
        )
#+end_src
A page has any number of feeds according to the file primary key:
#+begin_src emacs-lisp
(add-to-list 'arroyo-db--schemata
'(arcology-feeds
[(file :not-null)
(key :not-null)
(title :not-null)
(site :not-null)
(post-visibility :not-null)]))
(defun arroyo-arcology--insert-feeds (file)
(arroyo-db-query [:delete :from arcology-feeds
:where (= file $s1)]
file)
(if-let* ((key (car (arroyo-db-get "ARCOLOGY_FEED" file)))
(site (replace-regexp-in-string "/.*" "" key)))
(let* ((title (arroyo-db--get-file-title-from-org-roam file))
(post-visibility (car (arroyo-db-get "ARCOLOGY_TOOT_VISIBILITY" file))))
(arroyo-db-query [:insert :into arcology-feeds
:values $v1]
(vector file key title site post-visibility)))))
#+end_src
* Arcology Keywords
:PROPERTIES:
:ID: arcology/arroyo/keyword
:ROAM_ALIASES: arcology.arroyo.Keyword
:END:
All of these models are generated below from the =ARCOLOGY_KEY= entities embedded on each page. These are *Keywords*, a 3-tuple of file, keyword, value: a *threeple*.
#+NAME: arcology.arroyo.Keyword
#+begin_src python
class Keyword(SQLModel, table=True):
    __tablename__ = "keywords"

    file: str = Field(primary_key=True, foreign_key="arcology_pages.file")
    keyword: str = Field(primary_key=True, description="")
    value: str = Field(description="The value of the page")

    def filename(self):
        return parse_sexp(self.file)

    # Named get_* so the accessors do not shadow the columns of the same name.
    def get_keyword(self):
        return parse_sexp(self.keyword)

    def get_value(self):
        return parse_sexp(self.value)

    @classmethod
    def get(cls, key: str, value: str, session: Session):
        # Returns the first matching keyword row, or None if there is none.
        q = select(cls).where(cls.keyword==print_sexp(key)).where(cls.value==print_sexp(value))
        return session.exec(q).first()
#+end_src
* Arcology [[id:arroyo/arroyo][Arroyo System]] Database Generator
:PROPERTIES:
:ID: arcology/arroyo
:ROAM_ALIASES: arroyo-arcology-update-file
:END:
Putting all those update functions together in an [[id:arroyo/system-cache][arroyo-db]] update function. This has to run after the [[id:cce/org-roam][org-roam]] and [[id:arroyo/system-cache][Arroyo System Cache]] keyword database is built. This is annoying and I need to rethink it.
#+begin_src emacs-lisp
(defun arroyo-arcology-update-file (&optional file)
(interactive)
(when-let* ((file (or file (buffer-file-name)))
(page-keyword (first (arroyo-db-get "ARCOLOGY_KEY" file)))
(site-key (first (split-string page-keyword "/")))
(page-nodes (org-roam-db-query [:select [file id level title] :from nodes
:where (= file $s1)]
file))
(file-hash (caar (org-roam-db-query [:select [hash] :from files :where (= file $s1)]
file)))
(page-node-ids (apply #'vector (--map (second it) page-nodes)))
(level-0-node (--first (eq 0 (third it)) page-nodes))
(level-0-id (elt level-0-node 1))
(level-0-title (elt level-0-node 3)))
; remove the map here -- there will only ever be one level-0 node hopefully but this is hard to understand
(let* ((allow-crawl (first (arroyo-db-get "ARCOLOGY_ALLOW_CRAWL" file)))
(allow-crawl (and allow-crawl
(not (equal allow-crawl "nil")))) ; make sure writing "nil" in the key is respected
(all-node-refs (org-roam-db-query [:select [ref type node_id] :from refs
:where (in node_id $v1)]
page-node-ids))
(all-node-tags (org-roam-db-query [:select [tag node_id] :from tags
:where (in node_id $v1)]
page-node-ids))
(links (org-roam-db-query [:select [source dest type properties] :from links
:where (in source $v1)]
page-node-ids)))
(arroyo-arcology--insert-page file page-keyword site-key level-0-title level-0-id allow-crawl file-hash)
(arroyo-arcology--insert-nodes file page-nodes)
(arroyo-arcology--insert-tags file all-node-tags)
(arroyo-arcology--insert-refs file all-node-refs)
(arroyo-arcology--insert-feeds file)
(arroyo-arcology--insert-links file level-0-title links))))
(defun arroyo-arcology-update-db (&optional _wut)
(interactive)
(->>
(arroyo-db-get "ARCOLOGY_KEY")
(-map #'car)
(-uniq)
;; this runs *after* db is updated... what to do here?
;; (-filter #'arroyo-db-file-updated-p)
(-map #'arroyo-arcology-update-file)
)
)
(add-function :after (symbol-function 'arroyo-db-update-all-roam-files) #'arroyo-arcology-update-db)
;; (add-to-list 'arroyo-db-update-functions #'arroyo-arcology-update-file)
(provide 'arroyo-arcology)
#+end_src
* Arcology SQLModel Database Bindings
:PROPERTIES:
:ID: arcology/arroyo/sqlmodel
:END:
The engine looks like this, and it's pretty easy to attach my org-roam database here using the [[https://docs.sqlalchemy.org/en/14/core/event.html][SQLAlchemy Events System]] -- you can munge a =SQLModel='s =__table__.schema= to query and map against the org-roam metadatabase.
#+begin_src python :tangle arcology/arroyo.py
from sqlmodel import create_engine
from sqlalchemy import event
from arcology.config import get_settings
from pathlib import Path

settings = get_settings()
org_roam_sqlite_file_name = Path(settings.org_roam_db).expanduser().resolve()
arroyo_sqlite_file_name = Path(settings.arcology_db).expanduser().resolve()


def make_engine():
    engine = create_engine('sqlite:///{path}'.format(path=arroyo_sqlite_file_name), echo=False)

    # Attach the org-roam database on every new connection under the "orgroam" schema.
    @event.listens_for(engine, "connect")
    def do_connect(dbapi_connection, _connection_record):
        dbapi_connection.execute("attach database '{orgdb}' as orgroam;".format(orgdb=org_roam_sqlite_file_name))

    return engine


engine = make_engine()
#+end_src
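To make the =attach database= step concrete, here is a minimal sketch using only the standard library's =sqlite3= module rather than SQLModel. The table layouts, the =route_key= column, and the =attach_demo= helper are made up for illustration and are not the real Arcology or org-roam schemas. It only shows that, once a second file is attached as =orgroam=, its tables can be addressed as =orgroam.<table>= and joined against the main database, which is what a schema-qualified model relies on.

#+begin_src python :tangle no
import sqlite3
import tempfile
from pathlib import Path


def attach_demo(workdir: Path):
    """Join a toy 'pages' table against a toy attached org-roam 'nodes' table."""
    main_db = workdir / "arcology.db"
    roam_db = workdir / "org-roam.db"

    # Stand-in for the org-roam database: one toy "nodes" table (hypothetical layout).
    roam = sqlite3.connect(roam_db)
    roam.execute("create table nodes (id text primary key, title text)")
    roam.execute("insert into nodes values ('arcology/arroyo-page', 'Arroyo Arcology Generator')")
    roam.commit()
    roam.close()

    conn = sqlite3.connect(main_db)
    # Attach right after connecting, as the connect listener does for the engine.
    conn.execute("attach database ? as orgroam", (str(roam_db),))
    conn.execute("create table pages (route_key text, root_id text)")
    conn.execute("insert into pages values ('arcology/arroyo', 'arcology/arroyo-page')")
    conn.commit()

    # Tables in the attached file are reached through the "orgroam." prefix.
    rows = conn.execute(
        "select pages.route_key, n.title "
        "from pages join orgroam.nodes as n on n.id = pages.root_id"
    ).fetchall()
    conn.close()
    return rows


rows = attach_demo(Path(tempfile.mkdtemp()))
print(rows)
#+end_src

Because the attachment belongs to a connection rather than to the database file, it has to be repeated for each new connection, which is presumably why the engine does it from a =connect= event listener.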
An interactive testing session could look like this, and indeed =C-c C-c= in here will run it in an [[elisp:(run-python)][Inferior Python]] session:
#+begin_src python :session *Python* :results none
from sqlmodel import select, SQLModel, Session
import arcology.arroyo as arroyo
from arcology.parse import *
engine = arroyo.engine
session = Session(engine)
first_link = next(session.exec(select(arroyo.Link)))
from_file = arroyo.Page.from_file("/home/rrix/org/arroyo/arroyo.org", session)
from_key = arroyo.Page.from_key("doc/archive", session)
import asyncio
ht = asyncio.run(from_key.document_html())  # a bare top-level `await` only works in an asyncio-aware REPL
#+end_src
* Invoking the Arroyo generator from Python
:PROPERTIES:
:ID: 20220117T162800.337943
:ROAM_ALIASES: "Arcology Batch Commands"
:END:
Since the [[id:arcology/arroyo][Arcology Arroyo System]] is written in [[id:cce/programming_lisp_in_emacs][Emacs Lisp]], it's not exactly simple to update the database. When implemented as part of a long-running user-controlled [[id:cce/emacs][Emacs]] environment, Arroyo uses Emacs's [[https://www.gnu.org/software/emacs/manual/html_node/elisp/Hooks.html][Hooks]] to update the database when org-mode files change.
Instead of relying on those hooks on the server, we find ourselves implementing some scaffolding to replace them:
** Org-mode files are put on the server with [[id:cce/syncthing][Syncthing]]
** "Batch" commands for running Emacs with the Arroyo generators from a shell
This little [[id:cce/programming_lisp_in_emacs][Emacs Lisp]] script sets up some of the minimal [[id:cce/cce][CCE]] scaffolding to make the Arroyo-DB functions available to an environment.
#+begin_src emacs-lisp :tangle lisp/arcology-batch.el :mkdirp yes
(unless (boundp 'org-roam-directory)
  (setq org-roam-directory (file-truename "~/org/")))
(load-file (expand-file-name "cce/packaging.el" org-roam-directory))
(add-to-list 'load-path default-directory)
(add-to-list 'load-path arroyo-source-directory)
(use-package dash)
(use-package f)
(use-package s)
(use-package emacsql)
;; (use-package emacsql-sqlite3)
(require 'subr-x)
(require 'cl)
(require 'org-roam)
(require 'arroyo-db)
(require 'arroyo-arcology)
#+end_src
That script is loaded by the shell snippet below, which isn't really a script but a template for a Python module, so that the locations and variables can be customized at run time from the [[id:20220117T162655.535047][Arcology BaseSettings]].
(lord help me)
#+NAME: arcology-batch-shell
#+begin_src shell
set -ex
export DBPATH=$(mktemp $(dirname {arcology_db})/arcology.XXXXXXXXXX.db)
pushd {arcology_src};
cp {arcology_db} $DBPATH || echo "no existing db found, will be created from scratch"
{emacs} -Q --batch \
--eval '(setq org-roam-directory "{arcology_dir}")' \
--eval '(setq arcology-source-directory "{arcology_src}/lisp")' \
--eval '(setq arroyo-source-directory "{arroyo_src}")' \
--eval '(setq arroyo-db-location "'$DBPATH'")' \
--eval '(setq org-roam-db-location "{org_roam_db}")' \
-l lisp/arcology-batch.el \
--eval '(org-roam-db-sync)' # \
# --eval '(arroyo-db-update-all-roam-files)' \
# --eval '(arroyo-db-update-all-roam-files)' \
# --eval '(arroyo-arcology-update-db)'
mv $DBPATH {arcology_db}
echo "rebuild done"
#+end_src
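The copy, rebuild, and =mv= steps in the shell template amount to a build-then-swap pattern: the generator works on a scratch copy next to the real database, and the copy is moved into place only when the run finishes. The following is a minimal sketch of the same idea in Python, not the code the Arcology runs. The =rebuild_and_swap= and =toy_rebuild= names are hypothetical, with =toy_rebuild= standing in for the Emacs batch run.

#+begin_src python :tangle no
import os
import shutil
import sqlite3
import tempfile
from pathlib import Path


def rebuild_and_swap(db_path: Path, rebuild) -> None:
    """Run rebuild(tmp_path) on a scratch copy of db_path, then move it into place."""
    # Create the scratch file in the database's own directory, like the
    # mktemp call in the shell template, so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(prefix="arcology.", suffix=".db", dir=db_path.parent)
    os.close(fd)
    tmp_path = Path(tmp_name)
    try:
        if db_path.exists():
            shutil.copyfile(db_path, tmp_path)  # start from the existing database
        rebuild(tmp_path)
        os.replace(tmp_path, db_path)  # same job as the final `mv` in the template
    except BaseException:
        # Unlike the shell version, this sketch cleans up the scratch file on failure.
        tmp_path.unlink(missing_ok=True)
        raise


def toy_rebuild(path: Path) -> None:
    # Hypothetical stand-in for the Emacs batch run: add one row per rebuild.
    conn = sqlite3.connect(path)
    conn.execute("create table if not exists pages (route_key text)")
    conn.execute("insert into pages values ('arcology/arroyo')")
    conn.commit()
    conn.close()


workdir = Path(tempfile.mkdtemp())
db = workdir / "arcology.db"
rebuild_and_swap(db, toy_rebuild)  # no existing db: built from scratch
rebuild_and_swap(db, toy_rebuild)  # existing db: copied, updated, swapped in
#+end_src

A reader holding the old file open keeps seeing the old contents until it reconnects, so a failed or half-finished rebuild should never be visible through the live path.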
The Python module below pulls the values out of that [[id:20220117T162655.535047][FastAPI/Pydantic =BaseSettings= module]] and templates them into the shell snippet with =format()=. Sorry for [[id:cce/literate_programming][Literate Programming]] ([[https://www.youtube.com/watch?v=SkTt9k4Y-a8][sorry for party rocking]]).
#+begin_src python :tangle arcology/batch.py :noweb yes
from .config import get_settings

COMMAND_TMPL = """
<<arcology-batch-shell>>
"""


def build_command():
    settings = get_settings()
    return COMMAND_TMPL.format(
        arcology_dir = settings.arcology_directory,
        arcology_src = settings.arcology_src,
        arroyo_src = settings.arroyo_src,
        arcology_db = settings.arcology_db,
        org_roam_db = settings.org_roam_db,
        emacs = settings.arroyo_emacs,
    )
#+end_src
This is executed by [[id:20211218T222408.578567][Arcology Automated Database Builder]].
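One thing to keep in mind when templating shell through =format()= is that every literal brace in the shell text has to be doubled, or =format()= will try to treat it as a placeholder. Here is a small sketch of that behaviour with a shortened stand-in template. =FakeSettings=, =DEMO_TMPL=, =build_demo_command=, and the paths in them are hypothetical and only loosely mirror the real settings.

#+begin_src python :tangle no
from dataclasses import dataclass


@dataclass
class FakeSettings:
    """Hypothetical stand-in for the Arcology BaseSettings; the values are made up."""
    arcology_db: str = "/tmp/arcology-demo/arcology.db"
    arroyo_emacs: str = "emacs"


# A shortened stand-in for the noweb-expanded shell template.
# Single braces are format() placeholders; the shell's own ${VAR} syntax
# needs doubled braces to come out as ${VAR}.
DEMO_TMPL = """
export DBPATH=$(mktemp $(dirname {arcology_db})/arcology.XXXXXXXXXX.db)
{emacs} -Q --batch --eval '(setq arroyo-db-location "'$DBPATH'")'
echo "${{DBPATH}}"
mv $DBPATH {arcology_db}
"""


def build_demo_command(settings: FakeSettings) -> str:
    return DEMO_TMPL.format(
        arcology_db=settings.arcology_db,
        emacs=settings.arroyo_emacs,
    )


command = build_demo_command(FakeSettings())
print(command)
#+end_src

The template in this file sidesteps the issue by writing =$DBPATH= without braces, so it only matters if the shell snippet grows constructs like =${VAR}= or brace expansion.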