15 KiB
- Modeling the org-roam schema in SQLModel
- Nodes have-many Links and Nodes have-many-to-many Nodes through Links
- Node is the "root" object
- Nodes have-one file
- Nodes have-many Aliases
- Nodes have-many Tags
- Nodes have-many References
- Attach arroyo-db for Keywords
- Arcology "just" reads the org-roam and Arroyo Systems Cache databases
- Validating the DB API
- INPROGRESS higher-level
arcology.Page
class
Modeling the org-roam schema in SQLModel
Getting these to all fit together is gonna hopefully not be too fiddly. I am happy that I don't have to coerce all these primary keys and shit, I just need to be able to read from the database and walk among ORM objects to extract an Arcology Page out of it
from typing import Optional, List
from sqlmodel import Field, Relationship, SQLModel
from arcology.parse import parse_sexp
Nodes have-many Links and Nodes have-many-to-many Nodes through Links
- State "INPROGRESS" from "NEXT" [2021-09-21 Tue 19:35]
Setting up the links here is a bit fussy; the class key source_ref
is the database key source
and dest_ref
similarly maps to dest
; this requires some extra bits to be defined to properly de-reference these, especially when using it as the LinkModel: the link models need to know to query against source_ref
and =dest_ref!
CREATE TABLE links (pos NOT NULL, source NOT NULL, dest NOT NULL, type NOT NULL, properties NOT NULL,
FOREIGN KEY (source) REFERENCES nodes (id) ON DELETE CASCADE);
class Link(SQLModel, table=True):
__tablename__ = 'links'
pos: str
source_ref: Optional[str] = Field(primary_key=True, foreign_key="nodes.id",
sa_column_kwargs=dict(name='source'))
dest_ref: Optional[str] = Field(primary_key=True, foreign_key="nodes.id",
sa_column_kwargs=dict(name='dest'))
type: str
properties: str
source: Optional["Node"] = Relationship(back_populates="outlinks",
sa_relationship_kwargs=dict(
foreign_keys="[Link.source_ref]",
lazy="joined",
))
dest: Optional["Node"] = Relationship(back_populates="backlinks",
sa_relationship_kwargs=dict(
foreign_keys="[Link.dest_ref]",
lazy="joined",
))
Node is the "root" object
- State "INPROGRESS" from [2021-09-21 Tue 18:31]
org-roam v2 is node-oriented, but Arcology is file-oriented.
CREATE TABLE nodes (id NOT NULL PRIMARY KEY, file NOT NULL, level NOT NULL, pos NOT NULL,
todo , priority , scheduled text, deadline text, title , properties , olp ,
FOREIGN KEY (file) REFERENCES files (file) ON DELETE CASCADE);
file_ref
is defined strangely so that the relationship can be the reasonable one to access. Similarly, there are source
and dest
hackery-pokery here, as well.
class Node(SQLModel, table=True):
__tablename__ = 'nodes'
id: Optional[int] = Field(default=None, primary_key=True)
level: int
pos: int
title: str
todo: str
priority: str
scheduled: str
deadline: str
properties: str
olp: str
<<relationships>>
<<link-relationships>>
It has a bunch of relationships defined against it that map to the classes defined below:
relationships
file_ref: Optional[str] = Field(foreign_key="files.file",
sa_column_kwargs={'name': 'file'})
file: List["File"] = Relationship(back_populates="nodes")
aliases: List["Alias"] = Relationship(back_populates="node")
tags: List["Tag"] = Relationship(back_populates="node")
references: List["Reference"] = Relationship(back_populates="node")
There are four linking relationships, and I would like to add a fifth some time (select child nodes):
- outlinks and outlink_nodes are out-bound links in that the text of the link is in the current node
- backlinks and backlink_noes are in-bound links which exist in other nodes and link to this one.
link-relationships
outlinks: List["Link"] = Relationship(
back_populates="dest",
sa_relationship_kwargs=dict(
viewonly=True,
foreign_keys="[Link.dest_ref]",
primaryjoin="Node.id==Link.dest_ref",
))
outlink_nodes: List["Node"] = Relationship(
back_populates="backlink_nodes",
link_model=Link,
sa_relationship_kwargs=dict(
viewonly=True,
primaryjoin="Node.id==Link.dest_ref",
secondaryjoin="Node.id==Link.source_ref",
)
)
backlinks: List["Link"] = Relationship(
back_populates="source",
sa_relationship_kwargs=dict(
viewonly=True,
foreign_keys="[Link.source_ref]",
primaryjoin="Node.id==Link.source_ref",
))
# https://github.com/tiangolo/sqlmodel/issues/89
backlink_nodes: List["Node"] = Relationship(
back_populates="outlink_nodes",
link_model=Link,
sa_relationship_kwargs=dict(
viewonly=True,
primaryjoin="Node.id==Link.source_ref",
secondaryjoin="Node.id==Link.dest_ref",
)
)
This thing is load-bearing, make sure it works:
res = session.exec(select(roam.Node).where(roam.Node.id=='"cce/cce"'))
node = res.first()
assert set([parse_sexp(bl.dest_ref) for bl in node.backlinks])
assert set([parse_sexp(bl.source_ref) for bl in node.backlinks])
Nodes have-one file
- State "INPROGRESS" from [2021-09-21 Tue 18:31]
The node at level-0 is the File Properties node.
CREATE TABLE files (file UNIQUE PRIMARY KEY, hash NOT NULL,
atime NOT NULL, mtime NOT NULL);
class File(SQLModel, table=True):
__tablename__ = 'files'
file: Optional[str] = Field(default=None, primary_key=True)
hash: str
atime: str
mtime: str
nodes: List["Node"] = Relationship(back_populates="file")
keywords: List["Keyword"] = Relationship(back_populates="file")
Nodes have-many Aliases
- State "INPROGRESS" from "NEXT" [2021-09-21 Tue 19:30]
CREATE TABLE aliases (node_id NOT NULL, alias ,
FOREIGN KEY (node_id) REFERENCES nodes (id) ON DELETE CASCADE);
class Alias(SQLModel, table=True):
__tablename__ = 'aliases'
node_id: str = Field(primary_key=True, foreign_key='nodes.id')
alias: str
node: Optional["Node"] = Relationship(back_populates="aliases")
def __str__(self):
return parse_sexp(self.alias)
DONE un-wrap Alias.__str__
- State "DONE" from "NEXT" [2021-09-21 Tue 21:22]
Nodes have-many Tags
- State "INPROGRESS" from "NEXT" [2021-09-21 Tue 19:31]
CREATE TABLE tags (node_id NOT NULL, tag ,
FOREIGN KEY (node_id) REFERENCES nodes (id) ON DELETE CASCADE);
class Tag(SQLModel, table=True):
__tablename__ = 'tags'
node_id: str = Field(primary_key=True, foreign_key='nodes.id')
tag: str
node: Optional["Node"] = Relationship(back_populates="tags")
def __str__(self):
return parse_sexp(self.tag)
DONE un-wrap Tag.__str__
- State "DONE" from "NEXT" [2021-09-21 Tue 21:22]
Nodes have-many References
- State "INPROGRESS" from "NEXT" [2021-09-21 Tue 19:32]
CREATE TABLE refs (node_id NOT NULL, ref NOT NULL, type NOT NULL,
FOREIGN KEY (node_id) REFERENCES nodes (id) ON DELETE CASCADE);
class Reference(SQLModel, table=True):
__tablename__ = 'refs'
node_id: str = Field(primary_key=True, foreign_key='nodes.id')
ref: str
node: Optional["Node"] = Relationship(back_populates="references")
def __str__(self):
return parse_sexp(self.ref)
DONE Unwrap Reference.__str__
- State "DONE" from "NEXT" [2021-09-21 Tue 21:17]
Attach arroyo-db for Keywords
Keywords are simple file/key/value triples which are extracted from org-mode documents' snippets matching help:org-re-property's regexp. Basically: #+ARROYO_MODULE: true
in test.org
will create a row with the following emacsql encoded/quoted strings: [ test.org, ARROYO_MODULE, true ]
CREATE TABLE keywords (file NOT NULL, keyword NOT NULL, value NOT NULL);
class Keyword(SQLModel, table=True):
__tablename__ = 'keywords'
keyword: str
value: str = Field(str, primary_key=True)
file_ref: Optional[str] = Field(foreign_key="files.file",
sa_column_kwargs={'name': 'file'},
primary_key=True)
file: List["File"] = Relationship(back_populates="keywords")
<<dunder_init>>
def __str__(self):
return {keyword: value}
This data is in Arroyo's database, rather then in Org Roam.
It's attached in the Arcology SQLModel Setup below and thus has its own schema namespace which will be crammed in by overriding it during __init__
. It's pretty simple, and I do hope that it works.
def __init__(self):
super(self)
self.__table__.schema = "arroyo"
Arcology "just" reads the org-roam and Arroyo Systems Cache databases
With this, we can cram the arroyo DB alongside the org-roam DB which was harder to figure out than I thought:
from sqlmodel import create_engine
from sqlalchemy import event
org_roam_sqlite_file_name = "/home/rrix/.emacs.d/org-roam.db"
arroyo_sqlite_file_name = "/home/rrix/.emacs.d/arroyo.db"
def make_engine():
engine = create_engine('sqlite:///{path}'.format(path=org_roam_sqlite_file_name), echo=True)
@event.listens_for(engine, "connect")
def do_connect(dbapi_connection, _connection_record):
dbapi_connection.execute("attach database '{ar}' as arroyo;".format(ar=arroyo_sqlite_file_name))
return engine
engine = make_engine()
NEXT make path to DB configurable
bring in a configuration framework in this?
Validating the DB API
here's a testing session; C-c C-c
in here and then find your *arcology-testing*
buffer to play around:
from sqlmodel import select, SQLModel, Session
import arcology.roam as roam
from arcology.parse import *
engine = roam.engine
session = Session(engine)
INPROGRESS higher-level arcology.Page
class
- State "INPROGRESS" from [2021-09-25 Sat 18:07]
The arcology.page.Page
(aliased as arcology.Page
elsewhere=. it's a SQLModel
but Arcology will only use it as a dataclass and hopefully to do some smart mapping.
from sqlmodel import select, Session
from pydantic import BaseModel
from typing import List, Optional
import arcology.html as html
import arcology.roam as roam
from arcology.parse import *
import itertools
from dataclasses import dataclass
@dataclass
class Page():
file_path: str
root_node: roam.Node
title: str
nodes: Optional[List[roam.Node]]
backlinks: Optional[List[roam.Link]]
outlinks: Optional[List[roam.Link]]
html: Optional[str]
backlink_html: Optional[str]
@classmethod
def from_file(cls, path: str):
keyed_path = print_sexp(path)
with Session(roam.engine) as session:
# top "root" node has level=0
q = select(roam.Node) \
.where(roam.Node.level==0) \
.where(roam.Node.file_ref==keyed_path)
root_node = session.exec(q).first()
title = root_node.title if root_node else ""
# get all the nodes on the page, and their IDs
q = select(roam.Node) \
.where(roam.Node.file_ref==keyed_path)
nodes = session.exec(q).all()
node_ids = [node.id for node in nodes]
# collect links to all nodes on page
outlinks = next(itertools.chain([node.outlinks for node in nodes]))
# collect links *from* all nodes on page
backlinks = next(itertools.chain([node.backlinks for node in nodes]))
# (ref:backlink_instantiation)
# bundle it together
return Page(
file_path=path,
title=title,
root_node=root_node,
backlinks=backlinks,
outlinks=outlinks,
nodes=nodes,
html='',
backlink_html='',
)
def make_backlinks_org(self):
if self.backlinks is None:
return ''
def to_org(link: roam.Link): # (ref:backlinks_to_org_fn)
return \
"""
,* [[{path}][{title}]]
""".format(
path=link.source.file_ref,
title=link.source.title
)
return '\n'.join([ to_org(link) for link in self.backlinks ])
def load_html(self):
self.html = html.gen_html(self.file_path)
self.backlink_html = html.gen_html_text(self.make_backlinks_org())
HTML generation and caching
A very simple wrapper around pypandoc
; this will also need to do some pre- and post-transformations.
import functools
import pypandoc
@functools.lru_cache(maxsize=128)
def gen_html(input_path: str, input_format: str = 'org'):
return pypandoc.convert_file(input_path, 'html', format='org').encode('utf-8')
@functools.lru_cache(maxsize=128)
def gen_html_text(input_text: str, input_format: str = 'org'):
return pypandoc.convert_text(input_text, 'html', format='org').encode('utf-8')
import arcology.roam as roam
from arcology.page import Page
from sqlmodel import Session, select
session = Session(roam.engine)
q = select(roam.Link) \
.where(roam.Link.dest_ref.in_(['"cce/cce"']))
backlinks = session.exec(q).all()
print(backlinks[0].source.file_ref)
page = Page.from_file(path="/home/rrix/org/cce/cce.org")
page.load_html()
print(page.html)
print(page.backlink_html)