:PROPERTIES:
:ID: arcology/arroyo-page
:END:
#+TITLE: Arroyo Arcology Generator
#+filetags: :Project:Arcology:
#+ARCOLOGY_KEY: arcology/arroyo
#+ARCOLOGY_ALLOW_CRAWL: t
#+AUTO_TANGLE: t

[[shell:ln -s arroyo-arcology.el ~/org/cce/arroyo-arcology.el]] This needs to be in the CCE directory for [[id:arroyo/emacs][Arroyo Emacs]] to load it automatically.

This can be set up to load automatically in an [[id:arroyo/emacs][Arroyo Emacs]] environment.

#+ARROYO_EMACS_MODULE: arroyo-arcology
#+ARROYO_MODULE_WANTS: arroyo/arroyo.org

The Arcology is fundamentally about rendering and sharing entire org-mode documents on the web. This made direct use of [[id:cce/org-roam][org-roam]]'s database a pretty straightforward endeavor, until the migration to a Node-centered model with org-roam v2. That model has made my note-taking much better, but it has forced me to rethink the data model of the Arcology pretty significantly.

This ultimately developed over 2021 into [[id:arroyo/arroyo][Arroyo Systems Management]] -- a set of sidecar metadata tables for my notes and the [[id:128ab0e8-a1c7-48bf-9efe-0c23ce906a48][org-mode meta applications]] built on top of them. The Arcology's database is a set of tables derived from the metadata in my org-mode files. This database is generated inside of Emacs and mounted read-only by my FastAPI session via [[id:20210925T182140.388493][SQLModel]]. I would love to generate this database another way, but there is still only one high-quality org parser: org-mode.

The "entry point" of this API is the =arcology.arroyo.Page= below. It has some class methods hanging off it which can instantiate Pages from the database by filename or routing key.

A page doesn't require much metadata to render or be found, really: the org-mode source file, its =ARCOLOGY_KEY= routing key, and the root [[id:20211203T142533.902422][arcology.roam.Node]] object's primary ID. Most of this can be gleaned from the [[id:20211203T142617.812313][arcology.roam.File]] object and my Keyword sidecar.
#+begin_src emacs-lisp
(add-to-list 'arroyo-db-keywords "ARCOLOGY_KEY")
(add-to-list 'arroyo-db-keywords "ARCOLOGY_FEED")
(add-to-list 'arroyo-db-keywords "ARCOLOGY_TOOT_VISIBILITY")
(add-to-list 'arroyo-db-keywords "ARCOLOGY_ALLOW_CRAWL")
#+end_src

The =ARCOLOGY_KEY= is a file property which contains the page's "routing key": a string with at least one =/= in it, separating the site it'll publish to from the path it'll be published on. Locally this maps to a URL of the form =localhost:8000/$ARCOLOGY_KEY=; in production the first part maps to one of the public domains. This will make more sense later on.

The =ARCOLOGY_FEED= is a file property which contains a routing key to an RSS feed. =ARCOLOGY_TOOT_VISIBILITY= sets the visibility of that feed's posts in feed2toot and friends, and =ARCOLOGY_ALLOW_CRAWL= is a Lisp boolean for whether the page should go in robots.txt.
#+PROPERTY: header-args:emacs-lisp :tangle arroyo-arcology.el :results none :mkdirp yes :comments link

This is assembled using [[id:09779ac0-4d5f-40db-a340-49595c717e03][noweb syntax]] because Page relies on Link being defined for the =link_model= relationship. There's also some more code that makes it into =arcology.arroyo= for setting up the session and engine, down below under [[id:arcology/arroyo/sqlmodel][Arcology SQLModel Database Bindings]].

#+begin_src python :tangle arcology/arroyo.py :noweb yes
from typing import Optional, List
from sqlmodel import Field, Relationship, SQLModel

from arcology.parse import parse_sexp, print_sexp

<<arcology.arroyo.Link>>
<<arcology.arroyo.Page>>
<<arcology.arroyo.Tag>>
<<arcology.arroyo.Node>>
<<arcology.arroyo.Ref>>
<<arcology.arroyo.Keyword>>
<<arcology.arroyo.Feed>>
#+end_src

Anyways.
* NEXT document schemas

Explain the inter-relations between these classes, maybe with a relationship graph.

Explain the columns, and link to where specialized columns like =allow_crawl= go and come from?

* Arcology Page
:PROPERTIES:
:ID: arcology/arroyo/page
:ROAM_ALIASES: arcology.arroyo.Page
:END:

A Page represents the minimal metadata required to find and render an [[id:1fb8fb45-fac5-4449-a347-d55118bb377e][org-mode]] document and to generate links to it. I would love to someday not have to wire up all these relationships by hand, and I'll have to remodel this at some point, but for now specifying all the =primaryjoin= characteristics is enough.
#+NAME: arcology.arroyo.Page
#+begin_src python :noweb yes
from sqlmodel import Session, select
import hashlib

from arcology.key import ArcologyKey, id_to_arcology_key
import arcology.html as html

class Page(SQLModel, table=True):
    __tablename__ = "arcology_pages"
    file: str = Field(primary_key=True)
    key: str = Field(description="The ARCOLOGY_KEY for the page")
    title: str = Field(description="Primary title of the page")
    hash: str = Field(description="The hash of the file when it was indexed")
    root_id: str = Field(description="The ID for the page itself", foreign_key="arcology_nodes.node_id")
    site: str = Field(description="Maps to an arcology.Site key.")
    allow_crawl: str = Field(description="Lisp boolean for whether this page should go in robots.txt")

    nodes: List["Node"] = Relationship(
        back_populates="page",
        sa_relationship_kwargs=dict(
            primaryjoin="Node.file==Page.file"
        )
    )
    tags: List["Tag"] = Relationship(
        sa_relationship_kwargs=dict(
            primaryjoin="Tag.file==Page.file"
        )
    )
    references: List["Reference"] = Relationship(
        sa_relationship_kwargs=dict(
            primaryjoin="Reference.file==Page.file"
        )
    )

    def get_title(self):
        return parse_sexp(self.title)

    def get_key(self):
        return parse_sexp(self.key)

    def get_file(self):
        return parse_sexp(self.file)

    def get_arcology_key(self):
        return ArcologyKey(self.get_key())

    def get_site(self):
        return self.get_arcology_key().site

    <<page_link_relationships>>
    <<page_classmethods>>
    <<page_html_generators>>
#+end_src
#+NAME: page_classmethods
#+begin_src python
@classmethod
def from_file(cls, path: str, session: Session):
    q = select(cls).where(cls.file==print_sexp(path))
    return session.exec(q).one()

@classmethod
def from_key(cls, key: str, session: Session):
    q = select(cls).where(cls.key==print_sexp(key))
    return session.exec(q).first()
#+end_src
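Both lookups run the Python-side value through =print_sexp= before comparing, because the database is written by Emacs and, judging by how every column here is read back through =parse_sexp=, string values land in it in their printed Lisp form, quotes included. =arcology.parse= itself isn't on this page, so here's a hypothetical strings-only stand-in (=print_sexp_str= and =parse_sexp_str= are made-up names) to show the round trip; it only escapes double quotes and backslashes, which is enough for paths and routing keys.

#+begin_src python
def print_sexp_str(value: str) -> str:
    """Hypothetical: print a Python str the way Emacs prints a string.

    Only backslashes and double quotes are escaped here."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'

def parse_sexp_str(printed: str) -> str:
    """Hypothetical inverse of print_sexp_str for simple printed strings."""
    if len(printed) < 2 or printed[0] != '"' or printed[-1] != '"':
        raise ValueError(f"not a printed Lisp string: {printed!r}")
    body = printed[1:-1]
    out, i = [], 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            out.append(body[i + 1])  # drop the escaping backslash
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)

stored = print_sexp_str("arcology/arroyo")
print(stored)                  # "arcology/arroyo"  (quotes included)
print(parse_sexp_str(stored))  # arcology/arroyo
#+end_src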
The Page carries bi-directional relationships both to the Links themselves and to the Pages on the other side of them.

#+NAME: page_link_relationships
#+begin_src python
backlinks: List["Link"] = Relationship(
    back_populates="dest_page",
    sa_relationship_kwargs=dict(
        primaryjoin="Page.file==Link.dest_file"
    )
)

outlinks: List["Link"] = Relationship(
    back_populates="source_page",
    sa_relationship_kwargs=dict(
        primaryjoin="Page.file==Link.source_file"
    )
)

backlink_pages: List["Page"] = Relationship(
    link_model=Link,
    back_populates="outlink_pages",
    sa_relationship_kwargs=dict(
        foreign_keys="[Link.dest_file]",
        viewonly=True,
    )
)

outlink_pages: List["Page"] = Relationship(
    link_model=Link,
    back_populates="backlink_pages",
    sa_relationship_kwargs=dict(
        foreign_keys="[Link.source_file]",
        viewonly=True,
    )
)
#+end_src
The code to insert a page relies on a bunch of values pulled out of the page and out of the [[id:arcology/arroyo/keyword][Arcology Keywords]] store. Be sure the arguments line up; maybe I should switch these to use =&key= arguments eventually so that it's less foot-gun-shaped.

#+begin_src emacs-lisp
(add-to-list 'arroyo-db--schemata
             '(arcology-pages
               [(file :not-null)
                (key :not-null)
                (site :not-null)
                (title :not-null)
                (root-id :not-null)
                (allow-crawl)
                (hash :not-null)]))

(defun arroyo-arcology--insert-page (file kw site title root-id allow-crawl hash)
  (arroyo-db-query [:delete :from arcology-pages
                    :where (= file $s1)]
                   file)
  (arroyo-db-query [:insert :into arcology-pages :values $v1]
                   (vector file kw site title root-id allow-crawl hash)))
#+end_src
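Every =arroyo-arcology--insert-*= function follows the same shape: delete whatever rows the file produced last time, then insert fresh ones, so re-indexing a file never leaves stale rows behind. Here's a rough Python/=sqlite3= sketch of that pattern, not the Emacs code path; the cut-down three-column =arcology_pages= table, the =upsert_page= name, and the sample values are all hypothetical. Running both statements inside one transaction means a reader never sees the file missing between the delete and the insert.

#+begin_src python
import sqlite3

def upsert_page(conn: sqlite3.Connection, file: str, key: str, title: str) -> None:
    """Replace all rows for `file` in a hypothetical, cut-down arcology_pages."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("DELETE FROM arcology_pages WHERE file = ?", (file,))
        conn.execute(
            "INSERT INTO arcology_pages (file, key, title) VALUES (?, ?, ?)",
            (file, key, title),
        )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE arcology_pages (file TEXT NOT NULL, key TEXT NOT NULL, title TEXT NOT NULL)"
)

# Indexing the same file twice leaves exactly one row behind.
upsert_page(conn, '"/org/arroyo.org"', '"arcology/arroyo"', '"Arroyo Arcology Generator"')
upsert_page(conn, '"/org/arroyo.org"', '"arcology/arroyo"', '"Arroyo Arcology Generator (v2)"')
print(conn.execute("SELECT COUNT(*) FROM arcology_pages").fetchone())  # (1,)
#+end_src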
** Generating HTML from Arcology Pages
:PROPERTIES:
:ID: arcology/arroyo/gen_html
:END:

Arcology pages have two "documents" attached to them on render: the Org doc itself, and a document constituted from the backlinks.

The backlink document is generated dynamically using =Page.make_backlinks_org=, which just builds an org string from the Link relationships: one top-level heading per backlink, linking to the source node's ID and labeled with the source's title.

#+NAME: page_html_generators
#+begin_src python
def make_backlinks_org(self):
    if not self.backlinks:
        return ''

    def to_org(link: Link):
        # One top-level org heading per backlink, linking to the source node
        return "* [[id:{path}][{title}]]".format(
            path=parse_sexp(link.source_id),
            title=link.get_source_title()
        )

    return '\n'.join([to_org(link) for link in self.backlinks])

async def document_html(self):
    cache_key = parse_sexp(self.hash)
    return html.gen_html(parse_sexp(self.file), cache_key)

async def backlink_html(self):
    org = self.make_backlinks_org()
    cache_key = hashlib.sha224(org.encode('utf-8')).hexdigest()
    return html.gen_html_text(org, cache_key)
#+end_src
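To see what the backlink document looks like without a database, here's a standalone sketch of the same formatting over plain =(source_id, source_title)= tuples, a hypothetical stand-in for the Link rows with the sexp decoding skipped, plus the content-derived cache key that =backlink_html= computes. Because the key is a hash of the rendered org text, adding or removing a backlink yields a new key and therefore a fresh Pandoc run. The node IDs and titles are made up.

#+begin_src python
import hashlib
from typing import Iterable, Tuple

def backlinks_org(links: Iterable[Tuple[str, str]]) -> str:
    """Render (source_id, source_title) pairs as top-level org headings."""
    return "\n".join(f"* [[id:{node_id}][{title}]]" for node_id, title in links)

links = [
    ("arcology/arroyo", "Arroyo Arcology Generator"),
    ("example-node-id", "Some Page -> A Heading"),
]
org = backlinks_org(links)
print(org)
# * [[id:arcology/arroyo][Arroyo Arcology Generator]]
# * [[id:example-node-id][Some Page -> A Heading]]

# The cache key is derived from the rendered text itself.
key = hashlib.sha224(org.encode("utf-8")).hexdigest()
print(len(key))  # 56 hex characters
#+end_src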
** Invoking Pandoc

[[https://pandoc.org/][Pandoc]] is used to generate the HTML for a page. It's a versatile kit and I do a fair bit to extend it in other places, for example in the

The HTML generation is done using [[https://pypi.org/project/pypandoc/][PyPandoc]], which as far as I can tell is just a wrapper that shells out to the =pandoc= binary. Caching is cheated with a [[https://docs.python.org/3/library/functools.html#functools.lru_cache][functools.lru_cache]]; for this to work out well, the file's hash is stored on the [[id:arcology/arroyo/page][arcology.arroyo.Page]] and passed along as =extra_cache_key=, so that the cache busts when the document is updated.

#+begin_src python :tangle arcology/html.py
import functools
import pypandoc

@functools.lru_cache(maxsize=128)
def gen_html(input_path: str, extra_cache_key: str = '', input_format: str = 'org'):
    # extra_cache_key is unused in the body; it only exists so that a changed
    # file hash produces a new lru_cache entry
    return pypandoc.convert_file(input_path, 'html', format=input_format)

@functools.lru_cache(maxsize=128)
def gen_html_text(input_text: str, extra_cache_key: str = '', input_format: str = 'org'):
    return pypandoc.convert_text(input_text, 'html', format=input_format)
#+end_src
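The cache-busting trick works because =functools.lru_cache= keys on every argument, including the otherwise unused =extra_cache_key=. Here's a small sketch with a fake converter that records each real conversion instead of shelling out to Pandoc; =render=, =notes.org=, and the hash strings are made up. Entries for old hashes are never invalidated explicitly; they just age out of the 128-entry LRU.

#+begin_src python
import functools

calls = []

@functools.lru_cache(maxsize=128)
def render(path: str, extra_cache_key: str = "") -> str:
    """Stand-in for gen_html: records each real 'conversion'."""
    calls.append((path, extra_cache_key))
    return f"<p>rendered {path}</p>"

render("notes.org", "hash-1")
render("notes.org", "hash-1")   # cache hit, no new conversion
render("notes.org", "hash-2")   # file changed -> new hash -> cache miss
print(len(calls))                # 2
print(render.cache_info().hits)  # 1
#+end_src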
** Rewriting and Hydrating the Pandoc HTML
:PROPERTIES:
:ID: arcology/arroyo/hydrate
:END:

So the HTML that comes out of Pandoc is smart, but it doesn't understand ID links, for example. I could of course use Emacs and its =org-html-export-as-html=, but that shit is gonna be really slow. Instead I'll do the work myself (lol).

#+begin_src python :tangle arcology/html.py
from arcology.parse import print_sexp, parse_sexp
import arcology.arroyo as arroyo

import sqlmodel
import re
from typing import Optional

from arcology.key import id_to_arcology_key, file_to_arcology_key

class HTMLRewriter():
    def __init__(self, session):
        self.res_404 = 'href="/404?missing={key}" class="dead-link"'
        self.session = session

    def replace(self, match):
        raise NotImplementedError()

    def re(self):
        raise NotImplementedError()

    def do(self, output_html):
        return re.sub(self.re(), self.replace, output_html)
#+end_src

Rewriting the HTML is a pretty straightforward affair using [[https://docs.python.org/3/library/re.html#re.sub][re.sub]] with callbacks rather than static replacements, with some abstraction sprinkled on top in the form of the =HTMLRewriter= superclass defined above. Each implementation of it provides a regular expression and a function which accepts the match object and looks up the target's [[id:arcology/arroyo/key][=ARCOLOGY_KEY=]], with an optional node-ID anchor attached to it. This is then farmed out to [[id:arcology/arroyo/key][=arcology_key_to_url=]] to be turned into a URL. In this fashion, each =href= is replaced with a URL that will route to the target page, or with a link to the 404 page carrying a =dead-link= CSS class.

I'm pretty sure this is all quite inefficient, but as always I invoke [[id:personal_software_can_be_shitty][Personal Software Can Be Shitty]].
So ID links can be rewritten like:

#+begin_src python :tangle arcology/html.py
class IDReplacementRewriter(HTMLRewriter):
    def replace(self, match):
        id = match.group(1)
        key = id_to_arcology_key(id, self.session)
        if key is None:
            return self.res_404.format(key=id)
        else:
            return 'class="internal" href="{url}"'.format(url=arcology_key_to_url(key))

    def re(self):
        return r'href="id:([^"]+)"'
#+end_src

File links can be rewritten like:

#+begin_src python :tangle arcology/html.py
class FileReplacementRewriter(HTMLRewriter):
    def replace(self, match):
        file = match.group(1)
        if file is None:
            return self.res_404.format(key=file)
        key = file_to_arcology_key(file, self.session)
        if key is None:
            return self.res_404.format(key=file)
        else:
            return 'class="file" href="{url}"'.format(url=arcology_key_to_url(key))

    def re(self):
        return r'href="file://([^"]+)"'
#+end_src
[[id:cce/org-roam][org-roam]] stub links can be rewritten like this. This one is a little wonky: the other regexen only operate on the anchor's =href= attribute, but this one also wants to strip the =roam:= prefix from the link text of =[[roam:Stub]]= links, so it matches past the end of the opening tag and has to put the closing =>= back after =res_404=.

#+begin_src python :tangle arcology/html.py
class RoamReplacementRewriter(HTMLRewriter):
    def replace(self, match):
        return self.res_404.format(key=match.group(1)) + ">"

    def re(self):
        return r'href="roam:([^"]+)">roam:'
#+end_src

I also make some quality-of-life rewrites of my [[id:2e31b385-a003-4369-a136-c6b78c0917e1][org-fc]] cloze cards into simple =<span>= elements with the hint embedded in them.

#+begin_src python :tangle arcology/html.py
class FCClozeReplacementRewriter(HTMLRewriter):
    def replace(self, match):
        main = match.group(1) or ""
        hint = match.group(2) or ""
        # strip any HTML tags from the hint, since it ends up in an attribute
        hint = re.sub(r"</?[^>]+>", "", hint)
        return f"<span class='fc-cloze' title='{hint}'>{main}</span>"

    def re(self):
        return r'{{([^}]+)}{?([^}]+)?}?@[0-9]+}'
#+end_src
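To check what the cloze pattern actually matches, here it is applied to two made-up strings in the shape the regex expects, one with a hint and one without; =CLOZE_RE= and =cloze_to_span= are hypothetical names wrapping the same regex and replacement logic as =FCClozeReplacementRewriter=. When there's no hint, the optional second group is =None= and the title attribute ends up empty.

#+begin_src python
import re

# Same pattern as FCClozeReplacementRewriter.re()
CLOZE_RE = r'{{([^}]+)}{?([^}]+)?}?@[0-9]+}'

def cloze_to_span(match: re.Match) -> str:
    main = match.group(1) or ""
    hint = re.sub(r"</?[^>]+>", "", match.group(2) or "")
    return f"<span class='fc-cloze' title='{hint}'>{main}</span>"

with_hint = "The capital is {{Paris}{a city}@0}."
without_hint = "The capital is {{Paris}@0}."
print(re.sub(CLOZE_RE, cloze_to_span, with_hint))
# The capital is <span class='fc-cloze' title='a city'>Paris</span>.
print(re.sub(CLOZE_RE, cloze_to_span, without_hint))
# The capital is <span class='fc-cloze' title=''>Paris</span>.
#+end_src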
Invoke all these in a simple little harness:

#+begin_src python :tangle arcology/html.py
def rewrite_html(input_html: str, session: sqlmodel.Session) -> str:
    """
    Run a series of replacement functions on the input HTML and return a new string.
    """

    output_html = input_html

    rewriters = [
        IDReplacementRewriter(session),
        FileReplacementRewriter(session),
        RoamReplacementRewriter(session),
        FCClozeReplacementRewriter(session),
    ]

    for rewriter in rewriters:
        output_html = rewriter.do(output_html)

    return output_html
#+end_src
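Stripped of the database, the harness is just a fold of =re.sub= calls over the HTML string. Here's a sketch where a plain dict stands in for =id_to_arcology_key= and URL generation; =make_id_rewriter=, =roam_rewriter=, =rewrite=, and the example.com URL are all hypothetical. Order matters in the real list too: each rewriter sees the output of the previous one, so a later pattern must not accidentally match an =href= an earlier one produced.

#+begin_src python
import re
from typing import Callable, Dict, List, Tuple

# Same dead-link attribute string HTMLRewriter uses.
DEAD = 'href="/404?missing={key}" class="dead-link"'
Rewriter = Tuple[str, Callable[[re.Match], str]]

def make_id_rewriter(urls: Dict[str, str]) -> Rewriter:
    """Hypothetical stand-in for IDReplacementRewriter: a dict replaces the DB lookup."""
    def replace(m: re.Match) -> str:
        url = urls.get(m.group(1))
        if url is None:
            return DEAD.format(key=m.group(1))
        return f'class="internal" href="{url}"'
    return r'href="id:([^"]+)"', replace

def roam_rewriter() -> Rewriter:
    """Stand-in for RoamReplacementRewriter: re-emits the '>' its pattern consumed."""
    return r'href="roam:([^"]+)">roam:', lambda m: DEAD.format(key=m.group(1)) + ">"

def rewrite(page_html: str, rewriters: List[Rewriter]) -> str:
    # Each rewriter sees the output of the previous one.
    for pattern, replace in rewriters:
        page_html = re.sub(pattern, replace, page_html)
    return page_html

urls = {"arcology/arroyo/key": "https://example.com/arroyo#arcology/arroyo/key"}
page = ('<a href="id:arcology/arroyo/key">keys</a> '
        '<a href="id:missing-node">gone</a> '
        '<a href="roam:Stub">roam:Stub</a>')
print(rewrite(page, [make_id_rewriter(urls), roam_rewriter()]))
#+end_src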
It's logical that at some point this will have a "pluggable" URL engine, and in fact the production URLs will be hosted under different domains, so deconstructing a URL back into an ARCOLOGY_KEY ... all of this can happen later. I am just playing jazz right now!

#+begin_src python :tangle arcology/html.py
from arcology.key import ArcologyKey

def arcology_key_to_url(key: ArcologyKey) -> str:
    return key.to_url()
#+end_src
** =arcology.key.ArcologyKey= encapsulates parsing and rendering URLs
:PROPERTIES:
:ID: arcology/arroyo/key
:ROAM_ALIASES: arcology.key.ArcologyKey arcology.key.file_to_arcology_key arcology.key.id_to_arcology_key
:END:

The =ArcologyKey= is a simple =dataclass= encapsulating the things which the =ARCOLOGY_KEY= page keyword represents.

For example, the key =ArcologyKey(key=arcology/arroyo#arcology/arroyo/key)= will contain the following properties:
- =key=: the key passed in, =arcology/arroyo#arcology/arroyo/key=
- =site_key=: everything up to the first slash, here =arcology=. It points to objects defined and fetchable through [[id:20211219T144255.001827][Arcology Sites]].
- =site=: I typed the line above, said "oh", and added this: the =arcology.sites.Site= object that =site_key= resolves to.
- =rest=: everything after the first slash, up to an optional anchor, here =arroyo=
- =anchor_id=: said optional anchor, here =arcology/arroyo/key=. Pandoc headings within the page will have the =ID= property as their anchor, which is handy!
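The splitting rule described in the list can be sketched with =str.partition=: split at the first =/=, then split the remainder at the first =#=. This is a simplified stand-in, not the loop used below (=KeyParts= and =split_key= are made-up names); unlike that loop, it keeps any second =#= as part of the anchor instead of dropping what follows it.

#+begin_src python
from typing import NamedTuple, Optional

class KeyParts(NamedTuple):
    site_key: str
    rest: str
    anchor_id: Optional[str]

def split_key(key: str) -> KeyParts:
    """Split a routing key at the first '/', then the rest at the first '#'."""
    site_key, _, remainder = key.partition("/")
    rest, hash_sign, anchor = remainder.partition("#")
    return KeyParts(site_key, rest, anchor if hash_sign else None)

print(split_key("arcology/arroyo#arcology/arroyo/key"))
# KeyParts(site_key='arcology', rest='arroyo', anchor_id='arcology/arroyo/key')
print(split_key("arcology/index"))
# KeyParts(site_key='arcology', rest='index', anchor_id=None)
#+end_src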
#+begin_src python :tangle arcology/key.py
from dataclasses import dataclass
from typing import Optional

from fastapi import Request
from starlette import routing
import sqlmodel

from arcology.parse import parse_sexp, print_sexp
from arcology.sites import sites, Site
from arcology.config import get_settings, Environment
from arcology.sites import host_to_site

route_regexp, _, _ = routing.compile_path("/{sub_key:path}/")
route_regexp2, _, _ = routing.compile_path("/{sub_key:path}")

import logging
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

@dataclass
class ArcologyKey():
    key: str
    site_key: str
    site: Site
    rest: str = ""
    anchor_id: Optional[str] = None

    def __init__(self, key: str, site_key="", rest="", anchor_id = None):
        self.key = key
        self.site_key=site_key
        self.rest = rest
        self.anchor_id = anchor_id

        # Walk the key once: the first '/' ends the site key, and after that
        # each '#' starts a new segment (only the first anchor is kept).
        stop = '/'
        idx = 0
        collector = [""]
        for char in key:
            if char == stop:
                stop = '#'
                idx += 1
                collector = collector + [""]
                continue
            collector[idx] += char

        if len(collector) > 0:
            self.site_key = collector[0]
            self.site = sites.get(self.site_key, None)
        if len(collector) > 1:
            self.rest = collector[1]
        if len(collector) > 2:
            self.anchor_id = collector[2]

    def to_url(self) -> str:
        env = get_settings().arcology_env
        domains = self.site.domains.get(env, None)

        url = ""
        if domains is not None:
            url = "https://{domain}/{rest}".format(domain=domains[0], rest=self.rest)
        else:
            url = "http://localhost:8000/{key}".format(key=self.key)
        if self.anchor_id is not None:
            url = url + "#" + self.anchor_id

        return url

    @staticmethod
    def from_request(request: Request):
        path = request.url.path
        host = request.headers.get('host')
        return ArcologyKey.from_host_and_path(host, path)

    @staticmethod
    def from_host_and_path(host: str, path: str):
        m = route_regexp.match(path) or route_regexp2.match(path) or None
        if m is None:
            logger.debug("no path match: %s", path)
            return None
        sub_key = m.group("sub_key")

        site = host_to_site(host)
        if site is None:
            logger.debug("no host match: %s", host)
            return None

        if len(sub_key) == 0:
            sub_key = "index"
        key = "{site_key}/{sub_key}".format(
            site_key=site.key,
            sub_key=sub_key,
        )
        return ArcologyKey(key)
#+end_src
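Going the other way, =from_host_and_path= turns a request's host and path into a key, and =to_url= turns a key back into a URL. Here's a dependency-free sketch of both directions. The two regexes are hand-written approximations of what starlette's =compile_path= produces for the route templates above (an assumption), the =DOMAINS= table, example.com domain, and function names are hypothetical, and the environment lookup is reduced to a =production= flag. The local fallback rebuilds the path from the site key and rest, so an anchor already embedded in the key isn't appended twice.

#+begin_src python
import re
from typing import Dict, List, Optional

# Hand-written equivalents of the two starlette route templates (assumed).
WITH_SLASH = re.compile(r"^/(?P<sub_key>.*)/$")
WITHOUT_SLASH = re.compile(r"^/(?P<sub_key>.*)$")

# Hypothetical site table: site key -> production domains.
DOMAINS: Dict[str, List[str]] = {"arcology": ["arcology.example.com"]}
HOSTS = {domain: site for site, domains in DOMAINS.items() for domain in domains}

def key_from_host_and_path(host: str, path: str) -> Optional[str]:
    m = WITH_SLASH.match(path) or WITHOUT_SLASH.match(path)
    site_key = HOSTS.get(host)
    if m is None or site_key is None:
        return None
    # An empty path ("/") maps to the site's index page.
    return f"{site_key}/{m.group('sub_key') or 'index'}"

def key_to_url(key: str, production: bool) -> str:
    site_key, _, remainder = key.partition("/")
    rest, hash_sign, anchor = remainder.partition("#")
    domains = DOMAINS.get(site_key) if production else None
    if domains:
        url = f"https://{domains[0]}/{rest}"
    else:
        url = f"http://localhost:8000/{site_key}/{rest}"
    return url + (f"#{anchor}" if hash_sign else "")

print(key_from_host_and_path("arcology.example.com", "/"))         # arcology/index
print(key_from_host_and_path("arcology.example.com", "/arroyo/"))  # arcology/arroyo
print(key_to_url("arcology/arroyo#arcology/arroyo/key", production=True))
# https://arcology.example.com/arroyo#arcology/arroyo/key
#+end_src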
Retrieving the =ARCOLOGY_KEY= given an ID is a pretty straightforward SQLModel query, actually. If the referenced node is in the Arroyo database, by definition it's got a published [[id:arcology/arroyo/page][arcology.arroyo.Page]], so it's a matter of going and fetching it. If the Node is the root node (a direct link to the document), simply return the key; otherwise attach the node ID as the anchor so that the URL links directly to the heading.

#+begin_src python :tangle arcology/key.py
def id_to_arcology_key(id: str, session: sqlmodel.Session) -> Optional[ArcologyKey]:
    """
    Given a node ID, return the ARCOLOGY_KEY for the node.
    """
    from .arroyo import Node

    linked_node_query = sqlmodel.select(Node) \
        .where(Node.node_id==print_sexp(id))
    res = session.exec(linked_node_query)

    linked_node = res.all()
    if len(linked_node) == 1:
        linked_node = linked_node[0]
        linked_page = linked_node.page

        if linked_page is None:
            return None

        page_key = parse_sexp(linked_page.key)
        ret = ArcologyKey(key=page_key)
        # Node.level is declared as str, so normalise it before comparing;
        # level 0 is the file-level node, anything deeper gets an anchor.
        if int(linked_node.level) != 0:
            ret.anchor_id = id
        return ret

    elif len(linked_node) != 0:
        raise Exception(f"more than one key for node? {id}")
    else:
        return None
#+end_src
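The root-versus-heading decision is easier to see without SQLModel in the way. In this sketch two dicts (hypothetical, with made-up rows) stand in for the =arcology_nodes= and =arcology_pages= tables, and =id_to_key= returns a plain key string rather than an =ArcologyKey=.

#+begin_src python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory stand-ins for arcology_nodes and arcology_pages rows.
NODES: Dict[str, Tuple[str, int]] = {          # node_id -> (file, level)
    "arcology/arroyo-page": ("arroyo.org", 0),
    "arcology/arroyo/key": ("arroyo.org", 2),
}
PAGE_KEYS: Dict[str, str] = {"arroyo.org": "arcology/arroyo"}  # file -> ARCOLOGY_KEY

def id_to_key(node_id: str) -> Optional[str]:
    node = NODES.get(node_id)
    if node is None:
        return None                    # unknown node: caller renders a dead link
    file, level = node
    page_key = PAGE_KEYS.get(file)
    if page_key is None:
        return None                    # node's file has no page row
    # The file-level node (level 0) links to the page itself; headings get an anchor.
    return page_key if level == 0 else f"{page_key}#{node_id}"

print(id_to_key("arcology/arroyo-page"))  # arcology/arroyo
print(id_to_key("arcology/arroyo/key"))   # arcology/arroyo#arcology/arroyo/key
print(id_to_key("nope"))                  # None
#+end_src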
Looking up the key by file is even simpler:

#+begin_src python :tangle arcology/key.py
def file_to_arcology_key(file: str, session: sqlmodel.Session) -> Optional[ArcologyKey]:
    """
    Given a file path, return the ARCOLOGY_KEY for the page in that file.
    """
    from .arroyo import Page
    key_q = sqlmodel.select(Page).where(Page.file == print_sexp(file))
    page = session.exec(key_q).first()

    if page is None:
        return None
    page_key = parse_sexp(page.key)
    return ArcologyKey(key=page_key)
#+end_src
** NEXT HTML should inject sidenotes in during rewrite_html?
:PROPERTIES:
:ID: 20211219T165357.962899
:END:

This would be slow and maybe janky, but that's probably fine once it's memoized. :Project: :Project:

But this would mean that node backlinks would appear in-line; things like [[id:6b306fe3-fbc4-4ba7-bfcb-089c0564f9c3][Topic Index]] have some trouble otherwise.
* Arcology Tags
:PROPERTIES:
:ID: arcology/arroyo/tag
:ROAM_ALIASES: arcology.arroyo.Tag
:END:

#+NAME: arcology.arroyo.Tag
#+begin_src python
class Tag(SQLModel, table=True):
    __tablename__ = "arcology_tags"
    file: str = Field(primary_key=True, foreign_key="arcology_pages.file")
    tag: str = Field(primary_key=True, description="The tag itself.")
    node_id: str = Field(description="A heading ID which the tag applies to")

    # Named get_tag so it doesn't shadow the `tag` column above.
    def get_tag(self):
        return parse_sexp(self.tag)
#+end_src
A page has any number of tags according to the file primary key:

#+begin_src emacs-lisp
(add-to-list 'arroyo-db--schemata
             '(arcology-tags
               [(file :not-null)
                (tag :not-null)
                (node-id :not-null)]))

(defun arroyo-arcology--insert-tags (file node-tags)
  (arroyo-db-query [:delete :from arcology-tags
                    :where (= file $s1)]
                   file)
  (pcase-dolist (`(,tag ,node-id) node-tags)
    (arroyo-db-query [:insert :into arcology-tags
                      :values $v1]
                     (vector file tag node-id))))
#+end_src
* Arcology Links
:PROPERTIES:
:ID: arcology/arroyo/link
:ROAM_ALIASES: arcology.arroyo.Link
:END:

And for rewriting the links to point to their routing key, two tables:

A =links= table which contains the file *and* node ID references, as well as the title of the source file, which can be used to quickly generate backlink listings for a given page (and its sub-heading nodes):

#+NAME: arcology.arroyo.Link
#+begin_src python
class Link(SQLModel, table=True):
    __tablename__ = "arcology_links"
    source_title: Optional[str] = Field(default="", description="The title of the page the link is written in.")

    def get_source_title(self):
        return parse_sexp(self.source_title)

    source_id: str = Field(primary_key=True, foreign_key="arcology_nodes.node_id")
    source_node: Optional["Node"] = Relationship(
        sa_relationship_kwargs=dict(
            # back_populates="outlinks",
            primaryjoin="Node.node_id == Link.source_id"
        )
    )

    dest_id: str = Field(primary_key=True, foreign_key="arcology_nodes.node_id")
    dest_node: Optional["Node"] = Relationship(
        sa_relationship_kwargs=dict(
            # back_populates="backlinks",
            primaryjoin="Node.node_id == Link.dest_id"
        )
    )

    source_file: str = Field(primary_key=True, foreign_key="arcology_pages.file")
    source_page: Optional["Page"] = Relationship(
        back_populates="outlinks",
        sa_relationship_kwargs=dict(
            primaryjoin="Page.file==Link.source_file"
        )
    )

    dest_file: str = Field(primary_key=True, foreign_key="arcology_pages.file")
    dest_page: Optional["Page"] = Relationship(
        back_populates="backlinks",
        sa_relationship_kwargs=dict(
            primaryjoin="Page.file==Link.dest_file"
        )
    )
#+end_src
Links in the [[id:cce/org-roam][org-roam]] database have a useful =type= column. We only store ID links for now; file links could probably be supported easily enough, but I think I'd like to store other "unidirectional" links elsewhere.

#+begin_src emacs-lisp
(add-to-list 'arroyo-db--schemata
             '(arcology-links
               [source-title
                (source-file :not-null)
                (source-id :not-null)
                (dest-file :not-null)
                (dest-id :not-null)]))

(defun arcology--published-page? (file)
  (not (not (arroyo-db-get "ARCOLOGY_KEY" file))))

(defun arroyo-arcology--insert-links (file source-title links)
  (arroyo-db-query [:delete :from arcology-links
                    :where (= source-file $s1)]
                   file)
  (pcase-dolist (`(,source ,dest ,type ,props) links)
    (cond ((equal type "id")
           (pcase-let* ((dest-file (caar (org-roam-db-query
                                          [:select file :from nodes
                                           :where (= id $s1)]
                                          dest)))
                        (`(,immediate-source-title ,immediate-source-level)
                         (car (org-roam-db-query
                               [:select [title level] :from nodes
                                :where (= id $s1)]
                               source)))
                        ;; "level 0 -> level n" unless n == 0
                        (composed-node-title
                         (if (= 0 immediate-source-level)
                             source-title
                           (concat source-title " -> " immediate-source-title))))
             ;; only record links whose destination is itself a published page
             (when (and dest-file (arcology--published-page? dest-file))
               (arroyo-db-query [:insert :into arcology-links
                                 :values $v1]
                                (vector composed-node-title file source dest-file dest)))))
          ;; insert https link?
          ((equal type "https") nil)
          ((equal type "http") nil)
          ((equal type "roam") nil)
          (t nil))))
#+end_src
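The =composed-node-title= binding above decides what label a backlink gets. Here's a Python mirror of that one expression (=compose_source_title= is a hypothetical name): links written directly at file level get the file's title, and links under a heading get =file title -> heading title=.

#+begin_src python
def compose_source_title(file_title: str, heading_title: str, heading_level: int) -> str:
    """Mirror of composed-node-title: "level 0 -> level n" unless n == 0."""
    if heading_level == 0:
        return file_title
    return f"{file_title} -> {heading_title}"

print(compose_source_title("Arroyo Arcology Generator", "Arroyo Arcology Generator", 0))
# Arroyo Arcology Generator
print(compose_source_title("Arroyo Arcology Generator", "Arcology Links", 1))
# Arroyo Arcology Generator -> Arcology Links
#+end_src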
** INPROGRESS =source_title= should populate with the immediate parent header's title, not level 0
:LOGBOOK:
- State "INPROGRESS" from "NEXT" [2022-08-05 Fri 14:03]
:END:

It's passed in to =arroyo-arcology--insert-links= [[id:arcology/arroyo][below]]. Not sure of the better way to do that -- query =org-roam-db= in the insert function itself? Good enough for now, prolly.

Deal with the title being fetched and populated in that function below, if necessary.

* Arcology Nodes
:PROPERTIES:
:ID: arcology/arroyo/node
:ROAM_ALIASES: arcology.arroyo.Node
:END:

A =nodes= table helps in reassembling links into =HREFs=: =id_to_arcology_key= uses it to go from a node ID to its page, and to decide whether the link needs a heading anchor. There is a bunch of other metadata on this that I would like to pull across from [[id:cce/org-roam][org-roam]] eventually.
#+NAME: arcology.arroyo.Node
#+begin_src python
class Node(SQLModel, table=True):
    __tablename__ = "arcology_nodes"
    node_id: str = Field(primary_key=True, description="The heading ID property")
    file: str = Field(description="File in which this Node appears", foreign_key="arcology_pages.file")
    level: str = Field(description="Outline depth of the heading. 0 is top-level")

    page: Optional["Page"] = Relationship(
        back_populates="nodes",
        sa_relationship_kwargs=dict(
            viewonly=True,
            primaryjoin="Node.file==Page.file"
        )
    )
#+end_src

#+begin_src emacs-lisp
(add-to-list 'arroyo-db--schemata
             '(arcology-nodes
               [(node-id :not-null)
                (file :not-null)
                (level :not-null)]))

(defun arroyo-arcology--insert-nodes (file nodes)
  (arroyo-db-query [:delete :from arcology-nodes
                    :where (= file $s1)]
                   file)
  (pcase-dolist (`(,file ,id ,level) nodes)
    (arroyo-db-query [:insert :into arcology-nodes
                      :values $v1]
                     (vector id file level))))
#+end_src
* Arcology References
:PROPERTIES:
:ID: arcology/arroyo/ref
:ROAM_ALIASES: arcology.arroyo.Reference
:END:

Each [[id:cce/org-roam][org-roam]] node can have a set of "references" attached to it; I use these URIs to point to a "canonical" resource which the node is referencing.

#+NAME: arcology.arroyo.Ref
#+begin_src python
class Reference(SQLModel, table=True):
    __tablename__ = "arcology_refs"
    file: str = Field(primary_key=True, foreign_key="arcology_pages.file")
    ref: str = Field(primary_key=True, description="The full URI of the reference itself.")
    node_id: str = Field(description="A heading ID which the ref applies to")

    def url(self):
        return parse_sexp(self.ref)
#+end_src
A page has any number of refs according to the file primary key:

#+begin_src emacs-lisp
(add-to-list 'arroyo-db--schemata
             '(arcology-refs
               [(file :not-null)
                (ref :not-null)
                (node-id :not-null)]))

(defun arroyo-arcology--insert-refs (file node-refs)
  (arroyo-db-query [:delete :from arcology-refs
                    :where (= file $s1)]
                   file)
  (pcase-dolist (`(,ref ,type ,node-id) node-refs)
    (arroyo-db-query [:insert :into arcology-refs
                      :values $v1]
                     (vector file (format "%s:%s" type ref) node-id))))
#+end_src
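=arroyo-arcology--insert-refs= glues each ref back together as =type:ref=. That assumes org-roam stores a URL ref split into its scheme and the remainder, e.g. =https= and =//pandoc.org/=; the rows and the =join_refs= name below are hypothetical.

#+begin_src python
from typing import List, Tuple

def join_refs(rows: List[Tuple[str, str, str]]) -> List[Tuple[str, str]]:
    """Turn (ref, type, node_id) rows into (full URI, node_id) pairs,
    like (format "%s:%s" type ref) does on the Emacs side."""
    return [(f"{ref_type}:{ref}", node_id) for ref, ref_type, node_id in rows]

rows = [("//pandoc.org/", "https", "some-node-id")]
print(join_refs(rows))  # [('https://pandoc.org/', 'some-node-id')]
#+end_src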
* INPROGRESS Arcology Feeds
:PROPERTIES:
:ID: arcology/arroyo/feed
:ROAM_ALIASES: arcology.arroyo.Feed
:END:
:LOGBOOK:
- State "INPROGRESS" from [2023-01-24 Tue 23:33]
:END:

#+NAME: arcology.arroyo.Feed
#+begin_src python
class Feed(SQLModel, table=True):
    __tablename__ = "arcology_feeds"
    file: str = Field(primary_key=True, foreign_key="arcology_pages.file")
    key: str = Field(primary_key=True, description="The routing key for the feed.")
    title: str = Field(description="Title of the page which the feed is embedded in")
    site: str = Field(description="Arcology Site which the feed resides on.")
    post_visibility: str = Field(description="Visibility of the feed's posts in feed2toot, etc")

    def get_key(self):
        return parse_sexp(self.key)

    def get_arcology_key(self):
        return ArcologyKey(self.get_key())

    def get_title(self):
        return parse_sexp(self.title)

    def get_site(self):
        return parse_sexp(self.site)

    def get_post_visibility(self):
        return parse_sexp(self.post_visibility)

    def dict(self, **kwargs):
        return dict(
            key=self.get_key(),
            url=self.get_arcology_key().to_url(),
            title=self.get_title(),
            site=self.get_site(),
            visibility=self.get_post_visibility(),
        )
#+end_src
A page declares a feed through its =ARCOLOGY_FEED= keyword. The table is keyed on the file and the feed key, but the insert function below only picks up the first =ARCOLOGY_FEED= value in a file, and derives the site from everything before the first =/= of that key:

#+begin_src emacs-lisp
(add-to-list 'arroyo-db--schemata
             '(arcology-feeds
               [(file :not-null)
                (key :not-null)
                (title :not-null)
                (site :not-null)
                (post-visibility :not-null)]))

(defun arroyo-arcology--insert-feeds (file)
  (arroyo-db-query [:delete :from arcology-feeds
                    :where (= file $s1)]
                   file)
  (if-let* ((key (car (arroyo-db-get "ARCOLOGY_FEED" file)))
            (site (replace-regexp-in-string "/.*" "" key)))
      (let* ((title (arroyo-db--get-file-title-from-org-roam file))
             (post-visibility (car (arroyo-db-get "ARCOLOGY_TOOT_VISIBILITY" file))))
        (arroyo-db-query [:insert :into arcology-feeds
                          :values $v1]
                         (vector file key title site post-visibility)))))
#+end_src
* Arcology Keywords
|
|
:PROPERTIES:
|
|
:ID: arcology/arroyo/keyword
|
|
:ROAM_ALIASES: arcology.arroyo.Keyword
|
|
:END:
|
|
|
|
All of these models are generated below from the =ARCOLOGY_KEY= entities embedded on each page. these are *Keywords*, a 3-tuple of file, keyword, value, a *threeple*
|
|
|
|
#+NAME: arcology.arroyo.Keyword
|
|
#+begin_src python
|
|
class Keyword(SQLModel, table=True):
|
|
__tablename__ = "keywords"
|
|
file: str = Field(primary_key=True, foreign_key="arcology_pages.file")
|
|
keyword: str = Field(primary_key=True, description="")
|
|
value: str = Field(description="The value of the page")
|
|
|
|
def filename(self):
|
|
return parse_sexp(self.file)
|
|
|
|
def keyword(self):
|
|
return parse_sexp(self.keyword)
|
|
|
|
def value(self):
|
|
return parse_sexp(self.value)
|
|
|
|
@classmethod
|
|
def get(cls, key: str, value: str, session: Session):
|
|
q = select(cls).where(cls.keyword==print_sexp(key)).where(cls.value==print_sexp(value))
|
|
try:
|
|
return next(session.exec(q))
|
|
except StopIteration:
|
|
return None
|
|
#+end_src
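=Keyword.get= wraps both arguments in =print_sexp= because emacsql stores values in their printed Lisp form. The string =arcology/arroyo= therefore lands in the column as ="arcology/arroyo"=, quotes included. This stdlib-only sketch shows why the encoding matters. =prin1_string= is a hypothetical stand-in for =arcology.parse.print_sexp= that only handles strings, and the file path is made up.

#+begin_src python :tangle no
import sqlite3


def prin1_string(s: str) -> str:
    # Hypothetical stand-in for print_sexp, strings only: Emacs's prin1
    # wraps a string in double quotes and backslash-escapes " and \.
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE keywords (file TEXT, keyword TEXT, value TEXT, PRIMARY KEY (file, keyword))"
)
threeples = [
    ("/org/arroyo-arcology.org", "ARCOLOGY_KEY", "arcology/arroyo"),
    ("/org/arroyo-arcology.org", "ARCOLOGY_ALLOW_CRAWL", "t"),
]
# Store every cell the way emacsql would: as a printed s-expression.
conn.executemany(
    "INSERT INTO keywords VALUES (?, ?, ?)",
    [tuple(prin1_string(cell) for cell in row) for row in threeples],
)

QUERY = "SELECT file FROM keywords WHERE keyword = ? AND value = ?"


def lookup(key: str, value: str):
    # Mirrors Keyword.get: encode both sides, return the first match or None.
    return conn.execute(QUERY, (prin1_string(key), prin1_string(value))).fetchone()


print(lookup("ARCOLOGY_KEY", "arcology/arroyo"))  # ('"/org/arroyo-arcology.org"',)
# Without the encoding the same query matches nothing:
print(conn.execute(QUERY, ("ARCOLOGY_KEY", "arcology/arroyo")).fetchone())  # None
#+end_src

The same quoting is why every accessor on these models runs the column through =parse_sexp= before handing it back.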

* Arcology [[id:arroyo/arroyo][Arroyo System]] Database Generator
:PROPERTIES:
:ID: arcology/arroyo
:ROAM_ALIASES: arroyo-arcology-update-file
:END:

This puts all of those update functions together in an [[id:arroyo/system-cache][arroyo-db]] update function. It has to run after the [[id:cce/org-roam][org-roam]] and [[id:arroyo/system-cache][Arroyo System Cache]] keyword databases are built; this is annoying and I need to rethink it.

#+begin_src emacs-lisp
(defun arroyo-arcology-update-file (&optional file)
  (interactive)
  (when-let* ((file (or file (buffer-file-name)))
              (page-keyword (first (arroyo-db-get "ARCOLOGY_KEY" file)))
              (site-key (first (split-string page-keyword "/")))
              (page-nodes (org-roam-db-query [:select [file id level title] :from nodes
                                              :where (= file $s1)]
                                             file))
              (file-hash (caar (org-roam-db-query [:select [hash] :from files :where (= file $s1)]
                                                  file)))
              (page-node-ids (apply #'vector (--map (second it) page-nodes)))
              (level-0-node (--first (eq 0 (third it)) page-nodes))
              (level-0-id (elt level-0-node 1))
              (level-0-title (elt level-0-node 3)))
    ;; remove the map here -- there will only ever be one level-0 node hopefully but this is hard to understand
    (let* ((allow-crawl (first (arroyo-db-get "ARCOLOGY_ALLOW_CRAWL" file)))
           ;; make sure writing "nil" in the key is respected
           (allow-crawl (and allow-crawl
                             (not (equal allow-crawl "nil"))))
           (all-node-refs (org-roam-db-query [:select [ref type node_id] :from refs
                                              :where (in node_id $v1)]
                                             page-node-ids))
           (all-node-tags (org-roam-db-query [:select [tag node_id] :from tags
                                              :where (in node_id $v1)]
                                             page-node-ids))
           (links (org-roam-db-query [:select [source dest type properties] :from links
                                      :where (in source $v1)]
                                     page-node-ids)))
      (arroyo-arcology--insert-page file page-keyword site-key level-0-title level-0-id allow-crawl file-hash)
      (arroyo-arcology--insert-nodes file page-nodes)
      (arroyo-arcology--insert-tags file all-node-tags)
      (arroyo-arcology--insert-refs file all-node-refs)
      (arroyo-arcology--insert-feeds file)
      (arroyo-arcology--insert-links file level-0-title links))))

(defun arroyo-arcology-update-db (&optional _wut)
  "Rebuild the Arcology tables for every file with an ARCOLOGY_KEY.
The optional argument is ignored; it absorbs whatever the :after
advice below passes through."
  (interactive)
  (->> (arroyo-db-get "ARCOLOGY_KEY")
       (-map #'car)
       (-uniq)
       ;; this runs *after* db is updated... what to do here?
       ;; (-filter #'arroyo-db-file-updated-p)
       (-map #'arroyo-arcology-update-file)))

;; advice-add survives a later redefinition of the advised function,
;; unlike add-function on its symbol-function.
(advice-add 'arroyo-db-update-all-roam-files :after #'arroyo-arcology-update-db)
;; (add-to-list 'arroyo-db-update-functions #'arroyo-arcology-update-file)

(provide 'arroyo-arcology)
#+end_src
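The trickiest parts of =arroyo-arcology-update-file= are the small decisions buried in its =when-let*= and =let*= bindings. This Python sketch restates three of them. It assumes org-roam's =nodes= rows arrive as =[file, id, level, title]= lists, as selected above, and that keyword rows carry the file in their first column. All of the function names are hypothetical.

#+begin_src python :tangle no
from typing import List, Optional, Sequence


def pick_level0(nodes: Sequence[list]) -> Optional[list]:
    # Like (--first (eq 0 (third it)) page-nodes): the first file-level
    # node, or None when the file has none.
    return next((node for node in nodes if node[2] == 0), None)


def allow_crawl(raw: Optional[str]) -> bool:
    # Like (and allow-crawl (not (equal allow-crawl "nil"))): a missing
    # keyword and the literal string "nil" both disable crawling.
    return raw is not None and raw != "nil"


def files_to_update(keyword_rows: Sequence[list]) -> List[str]:
    # Like (->> rows (-map #'car) (-uniq)): the file column, de-duplicated,
    # keeping first-seen order.
    return list(dict.fromkeys(row[0] for row in keyword_rows))


nodes = [
    ["/org/a.org", "id-a", 0, "Page A"],
    ["/org/a.org", "id-b", 1, "A heading"],
]
print(pick_level0(nodes))  # ['/org/a.org', 'id-a', 0, 'Page A']
print(allow_crawl("t"), allow_crawl("nil"), allow_crawl(None))  # True False False
print(files_to_update([["/org/a.org", "x"], ["/org/a.org", "y"], ["/org/b.org", "z"]]))
# ['/org/a.org', '/org/b.org']
#+end_src

One consequence of =when-let*= is worth noting. When a file has no level-0 node, =(elt level-0-node 1)= is =nil=, so =level-0-id= is =nil= too. The whole update is then skipped for that file without any error.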

* Arcology SQLModel Database Bindings
:PROPERTIES:
:ID: arcology/arroyo/sqlmodel
:END:

The engine looks like this. Attaching my org-roam database is easy with the [[https://docs.sqlalchemy.org/en/14/core/event.html][SQLAlchemy Events System]]: a =connect= listener runs =ATTACH DATABASE= on each new connection. You can then munge a =SQLModel='s =__table__.schema= to query and map against the org-roam metadatabase.

#+begin_src python :tangle arcology/arroyo.py
from pathlib import Path

from sqlalchemy import event
from sqlmodel import create_engine

from arcology.config import get_settings

settings = get_settings()
org_roam_sqlite_file_name = Path(settings.org_roam_db).expanduser().resolve()
arroyo_sqlite_file_name = Path(settings.arcology_db).expanduser().resolve()

def make_engine():
    engine = create_engine('sqlite:///{path}'.format(path=arroyo_sqlite_file_name), echo=False)

    # ATTACH is per-connection state, so it has to run on every new DBAPI
    # connection the pool opens, not just once.
    @event.listens_for(engine, "connect")
    def do_connect(dbapi_connection, _connection_record):
        # Bind the path as a parameter rather than splicing it into the SQL,
        # so a path containing a quote can't break the statement.
        dbapi_connection.execute("ATTACH DATABASE ? AS orgroam", (str(org_roam_sqlite_file_name),))

    return engine


engine = make_engine()
#+end_src
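In the end, the =connect= listener runs one =ATTACH DATABASE= per connection. This stdlib-only sketch shows the effect using two throwaway SQLite files. The single-column =nodes= table is a made-up stand-in, not org-roam's real schema. Once the file is attached, its tables are reachable under the =orgroam.= prefix. That prefix is the name a model's =__table__.schema= would have to be set to.

#+begin_src python :tangle no
import sqlite3
import tempfile
from pathlib import Path


def attached_rows() -> list:
    with tempfile.TemporaryDirectory() as tmp:
        roam_path = Path(tmp) / "org-roam.db"
        arcology_path = Path(tmp) / "arcology.db"

        # Stand-in org-roam database with a toy nodes table.
        roam = sqlite3.connect(roam_path)
        roam.execute("CREATE TABLE nodes (id TEXT)")
        roam.execute("INSERT INTO nodes VALUES ('abc')")
        roam.commit()
        roam.close()

        conn = sqlite3.connect(arcology_path)
        # The equivalent of what the "connect" listener runs on each new connection.
        conn.execute("ATTACH DATABASE ? AS orgroam", (str(roam_path),))
        # Tables in the attached file are addressed with the schema prefix.
        rows = conn.execute("SELECT id FROM orgroam.nodes").fetchall()
        conn.close()
        return rows


print(attached_rows())  # [('abc',)]
#+end_src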

An interactive testing session could look like this, and indeed =C-c C-c= in here will run it in an [[elisp:(run-python)][Inferior Python]] session:

#+begin_src python :session *Python* :results none
import asyncio

from sqlmodel import select, SQLModel, Session

import arcology.arroyo as arroyo
from arcology.parse import *

engine = arroyo.engine
session = Session(engine)

first_link = next(session.exec(select(arroyo.Link)))

from_file = arroyo.Page.from_file("/home/rrix/org/arroyo/arroyo.org", session)
from_key = arroyo.Page.from_key("doc/archive", session)

# A bare top-level `await` only parses in IPython or `python -m asyncio`;
# asyncio.run() also works in a plain Python REPL.
ht = asyncio.run(from_key.document_html())
#+end_src

* Invoking the Arroyo generator from Python
:PROPERTIES:
:ID: 20220117T162800.337943
:ROAM_ALIASES: "Arcology Batch Commands"
:END:

Since the [[id:arcology/arroyo][Arcology Arroyo System]] is written in [[id:cce/programming_lisp_in_emacs][Emacs Lisp]], it's not exactly simple to update the database outside of an interactive Emacs session. In a long-running, user-controlled [[id:cce/emacs][Emacs]] environment, Arroyo uses Emacs's [[https://www.gnu.org/software/emacs/manual/html_node/elisp/Hooks.html][Hooks]] to update the database whenever org-mode files change.

On the server there is no such Emacs session, so instead we find ourselves implementing some scaffolding to replace it:

** Org-mode files are put on the server with [[id:cce/syncthing][Syncthing]]

** "Batch" commands for running Emacs with the Arroyo generators from a shell

This little [[id:cce/programming_lisp_in_emacs][Emacs Lisp]] script sets up some of the minimal [[id:cce/cce][CCE]] scaffolding to make the Arroyo-DB functions available to an environment.

#+begin_src emacs-lisp :tangle lisp/arcology-batch.el :mkdirp yes
(unless (boundp 'org-roam-directory)
  (setq org-roam-directory (file-truename "~/org/")))

(load-file (expand-file-name "cce/packaging.el" org-roam-directory))

(add-to-list 'load-path default-directory)
;; arroyo-source-directory is set with --eval by the batch shell command
(add-to-list 'load-path arroyo-source-directory)

(use-package dash)
(use-package f)
(use-package s)
(use-package emacsql)
;; (use-package emacsql-sqlite3)
(require 'subr-x)
;; provides `first', `second' and `third', which arroyo-arcology uses
(require 'cl)

(require 'org-roam)
(require 'arroyo-db)
(require 'arroyo-arcology)
#+end_src

That script is loaded by this shell script. It isn't really a script but a template inside a Python module, so that the locations and variables can be customized at run time from the [[id:20220117T162655.535047][Arcology BaseSettings]].

(lord help me)

#+NAME: arcology-batch-shell
#+begin_src shell
set -ex
export DBPATH=$(mktemp $(dirname {arcology_db})/arcology.XXXXXXXXXX.db)
pushd {arcology_src};

cp {arcology_db} "$DBPATH" || echo "no existing db found, will be created from scratch"
{emacs} -Q --batch \
  --eval '(setq org-roam-directory "{arcology_dir}")' \
  --eval '(setq arcology-source-directory "{arcology_src}/lisp")' \
  --eval '(setq arroyo-source-directory "{arroyo_src}")' \
  --eval '(setq arroyo-db-location "'$DBPATH'")' \
  --eval '(setq org-roam-db-location "{org_roam_db}")' \
  -l lisp/arcology-batch.el \
  --eval '(org-roam-db-sync)' # \
  # --eval '(arroyo-db-update-all-roam-files)' \
  # --eval '(arroyo-db-update-all-roam-files)' \
  # --eval '(arroyo-arcology-update-db)'

mv "$DBPATH" {arcology_db}
echo "rebuild done"
#+end_src
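The shell above never writes to the live database directly. It copies it to a =mktemp= file in the same directory, rebuilds that copy, and =mv='s it back, presumably so the FastAPI process reading the database never sees a half-built file. Here is a rough Python sketch of the same pattern. =rebuild_then_swap= and the toy =build= callback are hypothetical; =build= stands in for the =emacs --batch= run.

#+begin_src python :tangle no
import os
import shutil
import sqlite3
import tempfile
from pathlib import Path
from typing import Callable


def rebuild_then_swap(live_db: Path, build: Callable[[Path], None]) -> None:
    # Put the scratch file next to the live one: os.replace is only an atomic
    # rename when both paths are on the same filesystem.
    fd, name = tempfile.mkstemp(prefix="arcology.", suffix=".db", dir=live_db.parent)
    os.close(fd)
    scratch = Path(name)
    try:
        if live_db.exists():
            shutil.copyfile(live_db, scratch)  # cp {arcology_db} $DBPATH
        build(scratch)                         # the emacs --batch run
        os.replace(scratch, live_db)           # mv $DBPATH {arcology_db}
    except BaseException:
        # Unlike the shell version under `set -e`, don't leave the scratch file behind.
        scratch.unlink(missing_ok=True)
        raise


def build(path: Path) -> None:
    # Toy build step: append one row, creating the table on first use.
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS pages (key TEXT)")
    conn.execute("INSERT INTO pages VALUES ('arcology/arroyo')")
    conn.commit()
    conn.close()


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        live = Path(tmp) / "arcology.db"
        rebuild_then_swap(live, build)  # no live db yet: starts from an empty file
        rebuild_then_swap(live, build)  # starts from a copy of the first result
        conn = sqlite3.connect(live)
        count = conn.execute("SELECT COUNT(*) FROM pages").fetchone()[0]
        conn.close()
        leftovers = sorted(p.name for p in Path(tmp).iterdir())
        return count, leftovers


print(demo())  # (2, ['arcology.db'])
#+end_src

As in the shell version, a missing live database is no problem: SQLite treats the empty file =mkstemp= creates as a fresh database.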

The Python module pulls values out of that [[id:20220117T162655.535047][FastAPI/Pydantic =BaseSettings= module]] and templates them into the shell script with =format()=. Sorry for [[id:cce/literate_programming][Literate Programming]] ([[https://www.youtube.com/watch?v=SkTt9k4Y-a8][sorry for party rocking]]).

#+begin_src python :tangle arcology/batch.py :noweb yes
from .config import get_settings

COMMAND_TMPL = """
<<arcology-batch-shell>>
"""

def build_command():
    settings = get_settings()

    return COMMAND_TMPL.format(
        arcology_dir=settings.arcology_directory,
        arcology_src=settings.arcology_src,
        arroyo_src=settings.arroyo_src,
        arcology_db=settings.arcology_db,
        org_roam_db=settings.org_roam_db,
        emacs=settings.arroyo_emacs,
    )
#+end_src
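The shell block is spliced in with =str.format()=, so every ={name}= in it is a replacement field, and any brace meant for the shell itself has to be doubled. That is presumably why the script writes =$DBPATH= rather than =${DBPATH}=. Here is a small sketch, with made-up setting values in a =SimpleNamespace= standing in for =get_settings()=:

#+begin_src python :tangle no
from types import SimpleNamespace

# Hypothetical settings; the real values come from arcology.config.get_settings().
settings = SimpleNamespace(arcology_db="/srv/arcology/arcology.db", arroyo_emacs="emacs")

TMPL = """cp {arcology_db} "$DBPATH"
{emacs} -Q --batch --eval '(message "rebuilt")'
echo "${{DBPATH}} done"
"""


def render() -> str:
    return TMPL.format(arcology_db=settings.arcology_db, emacs=settings.arroyo_emacs)


print(render())

# An undoubled shell brace is read as a missing format field:
try:
    "echo ${DBPATH}".format()
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'DBPATH'
#+end_src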

This is executed by [[id:20211218T222408.578567][Arcology Automated Database Builder]].