arcology-fastapi/backup.org

15 KiB

Modeling the org-roam schema in SQLModel

Getting these to all fit together is gonna hopefully not be too fiddly. I am happy that I don't have to coerce all these primary keys and shit, I just need to be able to read from the database and walk among ORM objects to extract an Arcology Page out of it

from typing import Optional, List
from sqlmodel import Field, Relationship, SQLModel

from arcology.parse import parse_sexp

Nodes have-many Links and Nodes have-many-to-many Nodes through Links

  • State "INPROGRESS" from "NEXT" [2021-09-21 Tue 19:35]

Setting up the links here is a bit fussy; the class key source_ref is the database key source and dest_ref similarly maps to dest; this requires some extra bits to be defined to properly de-reference these, especially when using it as the LinkModel: the link models need to know to query against source_ref and =dest_ref!

CREATE TABLE links (pos NOT NULL, source NOT NULL, dest NOT NULL, type NOT NULL, properties NOT NULL,
                    FOREIGN KEY (source) REFERENCES nodes (id) ON DELETE CASCADE);
class Link(SQLModel, table=True):
    __tablename__ = 'links'
    pos: str
    source_ref: Optional[str] = Field(primary_key=True, foreign_key="nodes.id",
                                      sa_column_kwargs=dict(name='source'))
    dest_ref: Optional[str] = Field(primary_key=True, foreign_key="nodes.id",
                                    sa_column_kwargs=dict(name='dest'))
    type: str
    properties: str

    source: Optional["Node"] = Relationship(back_populates="outlinks",
                                            sa_relationship_kwargs=dict(
                                                foreign_keys="[Link.source_ref]",
                                                lazy="joined",
                                            ))
    dest: Optional["Node"] = Relationship(back_populates="backlinks", 
                                          sa_relationship_kwargs=dict(
                                              foreign_keys="[Link.dest_ref]",
                                              lazy="joined",
                                          ))

Node is the "root" object

  • State "INPROGRESS" from [2021-09-21 Tue 18:31]

org-roam v2 is node-oriented, but Arcology is file-oriented.

CREATE TABLE nodes (id NOT NULL PRIMARY KEY, file NOT NULL, level NOT NULL, pos NOT NULL,
                    todo , priority , scheduled text, deadline text, title , properties , olp ,
                    FOREIGN KEY (file) REFERENCES files (file) ON DELETE CASCADE);

file_ref is defined strangely so that the relationship can be the reasonable one to access. Similarly, there are source and dest hackery-pokery here, as well.

class Node(SQLModel, table=True):
    __tablename__ = 'nodes'
    id: Optional[int] = Field(default=None, primary_key=True)
    level: int
    pos: int
    title: str

    todo: str
    priority: str

    scheduled: str
    deadline: str

    properties: str
    olp: str

    <<relationships>>
    <<link-relationships>>

It has a bunch of relationships defined against it that map to the classes defined below:

relationships
file_ref: Optional[str] = Field(foreign_key="files.file",
                                sa_column_kwargs={'name': 'file'})
file: List["File"] = Relationship(back_populates="nodes")
aliases: List["Alias"] = Relationship(back_populates="node")
tags: List["Tag"] = Relationship(back_populates="node")
references: List["Reference"] = Relationship(back_populates="node")

There are four linking relationships, and I would like to add a fifth some time (select child nodes):

  • outlinks and outlink_nodes are out-bound links in that the text of the link is in the current node
  • backlinks and backlink_noes are in-bound links which exist in other nodes and link to this one.
link-relationships
outlinks: List["Link"] = Relationship(
    back_populates="dest",
    sa_relationship_kwargs=dict(
        viewonly=True,
        foreign_keys="[Link.dest_ref]",
        primaryjoin="Node.id==Link.dest_ref",
    ))

outlink_nodes: List["Node"] = Relationship(
    back_populates="backlink_nodes",
    link_model=Link,
    sa_relationship_kwargs=dict(
        viewonly=True,
        primaryjoin="Node.id==Link.dest_ref",
        secondaryjoin="Node.id==Link.source_ref",
    )
)

backlinks: List["Link"] = Relationship(
    back_populates="source",
    sa_relationship_kwargs=dict(
        viewonly=True,
        foreign_keys="[Link.source_ref]",
        primaryjoin="Node.id==Link.source_ref",
    ))

# https://github.com/tiangolo/sqlmodel/issues/89
backlink_nodes: List["Node"] = Relationship(
    back_populates="outlink_nodes",
    link_model=Link,
    sa_relationship_kwargs=dict(
        viewonly=True,
        primaryjoin="Node.id==Link.source_ref",
        secondaryjoin="Node.id==Link.dest_ref",
    )
)

This thing is load-bearing, make sure it works:

res = session.exec(select(roam.Node).where(roam.Node.id=='"cce/cce"'))
node = res.first()

assert set([parse_sexp(bl.dest_ref) for bl in node.backlinks])
assert set([parse_sexp(bl.source_ref) for bl in node.backlinks])

Nodes have-one file

  • State "INPROGRESS" from [2021-09-21 Tue 18:31]

The node at level-0 is the File Properties node.

CREATE TABLE files (file UNIQUE PRIMARY KEY, hash NOT NULL,
                    atime NOT NULL, mtime NOT NULL);
class File(SQLModel, table=True):
    __tablename__ = 'files'
    file: Optional[str] = Field(default=None, primary_key=True)
    hash: str
    atime: str
    mtime: str

    nodes: List["Node"] = Relationship(back_populates="file")
    keywords: List["Keyword"] = Relationship(back_populates="file")

Nodes have-many Aliases

  • State "INPROGRESS" from "NEXT" [2021-09-21 Tue 19:30]
CREATE TABLE aliases (node_id NOT NULL, alias ,
                      FOREIGN KEY (node_id) REFERENCES nodes (id) ON DELETE CASCADE);
class Alias(SQLModel, table=True):
    __tablename__ = 'aliases'
    node_id: str = Field(primary_key=True, foreign_key='nodes.id')
    alias: str

    node: Optional["Node"] = Relationship(back_populates="aliases")

    def __str__(self):
        return parse_sexp(self.alias)

DONE un-wrap Alias.__str__

  • State "DONE" from "NEXT" [2021-09-21 Tue 21:22]

Nodes have-many Tags

  • State "INPROGRESS" from "NEXT" [2021-09-21 Tue 19:31]
CREATE TABLE tags (node_id NOT NULL, tag ,
                   FOREIGN KEY (node_id) REFERENCES nodes (id) ON DELETE CASCADE);
class Tag(SQLModel, table=True):
    __tablename__ = 'tags'
    node_id: str = Field(primary_key=True, foreign_key='nodes.id')
    tag: str

    node: Optional["Node"] = Relationship(back_populates="tags")

    def __str__(self):
        return parse_sexp(self.tag)

DONE un-wrap Tag.__str__

  • State "DONE" from "NEXT" [2021-09-21 Tue 21:22]

Nodes have-many References

  • State "INPROGRESS" from "NEXT" [2021-09-21 Tue 19:32]
CREATE TABLE refs (node_id NOT NULL, ref NOT NULL, type NOT NULL,
                   FOREIGN KEY (node_id) REFERENCES nodes (id) ON DELETE CASCADE);
class Reference(SQLModel, table=True):
    __tablename__ = 'refs'
    node_id: str = Field(primary_key=True, foreign_key='nodes.id')
    ref: str

    node: Optional["Node"] = Relationship(back_populates="references")

    def __str__(self):
        return parse_sexp(self.ref)

DONE Unwrap Reference.__str__

  • State "DONE" from "NEXT" [2021-09-21 Tue 21:17]

Attach arroyo-db for Keywords

Keywords are simple file/key/value triples which are extracted from org-mode documents' snippets matching help:org-re-property's regexp. Basically: #+ARROYO_MODULE: true in test.org will create a row with the following emacsql encoded/quoted strings: [ test.org, ARROYO_MODULE, true ]

CREATE TABLE keywords (file NOT NULL, keyword NOT NULL, value NOT NULL);
class Keyword(SQLModel, table=True):
    __tablename__ = 'keywords'
    keyword: str
    value: str = Field(str, primary_key=True)

    file_ref: Optional[str] = Field(foreign_key="files.file",
                                    sa_column_kwargs={'name': 'file'},
                                    primary_key=True)
    file: List["File"] = Relationship(back_populates="keywords")

    <<dunder_init>>

    def __str__(self):
        return {keyword: value}

This data is in Arroyo's database, rather then in Org Roam.

It's attached in the Arcology SQLModel Setup below and thus has its own schema namespace which will be crammed in by overriding it during __init__. It's pretty simple, and I do hope that it works.

def __init__(self):
    super(self)
    self.__table__.schema = "arroyo"

Arcology "just" reads the org-roam and Arroyo Systems Cache databases

With this, we can cram the arroyo DB alongside the org-roam DB which was harder to figure out than I thought:

from sqlmodel import create_engine
from sqlalchemy import event

org_roam_sqlite_file_name = "/home/rrix/.emacs.d/org-roam.db"
arroyo_sqlite_file_name = "/home/rrix/.emacs.d/arroyo.db"

def make_engine():
    engine = create_engine('sqlite:///{path}'.format(path=org_roam_sqlite_file_name), echo=True)

    @event.listens_for(engine, "connect")
    def do_connect(dbapi_connection, _connection_record):
        dbapi_connection.execute("attach database '{ar}' as arroyo;".format(ar=arroyo_sqlite_file_name))

    return engine


engine = make_engine()

NEXT make path to DB configurable

bring in a configuration framework in this?

Validating the DB API

here's a testing session; C-c C-c in here and then find your *arcology-testing* buffer to play around:

from sqlmodel import select, SQLModel, Session

import arcology.roam as roam
from arcology.parse import *

engine = roam.engine
session = Session(engine)

INPROGRESS higher-level arcology.Page class

  • State "INPROGRESS" from [2021-09-25 Sat 18:07]

The arcology.page.Page (aliased as arcology.Page elsewhere=. it's a SQLModel but Arcology will only use it as a dataclass and hopefully to do some smart mapping.

from sqlmodel import select, Session
from pydantic import BaseModel
from typing import List, Optional

import arcology.html as html
import arcology.roam as roam
from arcology.parse import *

import itertools

from dataclasses import dataclass

@dataclass 
class Page():
    file_path: str
    root_node: roam.Node
    title: str

    nodes: Optional[List[roam.Node]]
    backlinks: Optional[List[roam.Link]]
    outlinks: Optional[List[roam.Link]]

    html: Optional[str]
    backlink_html: Optional[str]

    @classmethod
    def from_file(cls, path: str):
        keyed_path = print_sexp(path)

        with Session(roam.engine) as session:
            # top "root" node has level=0
            q = select(roam.Node) \
                .where(roam.Node.level==0) \
                .where(roam.Node.file_ref==keyed_path)
            root_node = session.exec(q).first()
            title = root_node.title if root_node else ""

            # get all the nodes on the page, and their IDs
            q = select(roam.Node) \
                .where(roam.Node.file_ref==keyed_path)
            nodes = session.exec(q).all()
            node_ids = [node.id for node in nodes]

            # collect links to all nodes on page
            outlinks = next(itertools.chain([node.outlinks for node in nodes]))
            # collect links *from* all nodes on page
            backlinks = next(itertools.chain([node.backlinks for node in nodes]))
            # (ref:backlink_instantiation)
            
        # bundle it together
        return Page(
            file_path=path,
            title=title,
            root_node=root_node,

            backlinks=backlinks,
            outlinks=outlinks,
            nodes=nodes,

            html='',
            backlink_html='',
        )

    def make_backlinks_org(self):
        if self.backlinks is None:
            return ''

        def to_org(link: roam.Link):                  # (ref:backlinks_to_org_fn)
            return \
                """
                ,* [[{path}][{title}]]
                """.format(
                    path=link.source.file_ref,
                    title=link.source.title
                    )

        return '\n'.join([ to_org(link) for link in self.backlinks ])

    def load_html(self):
        self.html = html.gen_html(self.file_path)
        self.backlink_html = html.gen_html_text(self.make_backlinks_org())

HTML generation and caching

A very simple wrapper around pypandoc; this will also need to do some pre- and post-transformations.

import functools
import pypandoc

@functools.lru_cache(maxsize=128)
def gen_html(input_path: str, input_format: str = 'org'):
    return pypandoc.convert_file(input_path, 'html', format='org').encode('utf-8')

@functools.lru_cache(maxsize=128)
def gen_html_text(input_text: str, input_format: str = 'org'):
    return pypandoc.convert_text(input_text, 'html', format='org').encode('utf-8')
import arcology.roam as roam
from arcology.page import Page
from sqlmodel import Session, select

session = Session(roam.engine)
q = select(roam.Link) \
    .where(roam.Link.dest_ref.in_(['"cce/cce"']))
backlinks = session.exec(q).all()
print(backlinks[0].source.file_ref)

page = Page.from_file(path="/home/rrix/org/cce/cce.org")
page.load_html()

print(page.html)
print(page.backlink_html)

NEXT Backlink collection, generation and caching

NEXT page HTML re-writing