Arcology Roam Models
- model support functions Arcology.Roam
- files table Arcology.Roam.File
- keywords table Arcology.Roam.Keyword
- links table Arcology.Roam.Link
- references table Arcology.Roam.Reference
- titles table Arcology.Roam.Title
- tags table Arcology.Roam.Tag
- Footnotes
These are the Ecto models and support functionality for the database tables provided by Arcology DB. They are all read-only concerns; the write path lives in Arcology DB right now, alongside the org-mode parser. Any database I intend to write to should probably be a postgrex/Ecto 3 model in its own OTP application, and is out of scope for this file.
The Roam entities are meant to be worked with in a few ways:
- The higher-level interface in Arcology.Page.
- Arcology.Roam.File.get and Arcology.Roam.File.all return "hydrated" Ecto models. They have all their links, references, tags, and titles preloaded, though these are still the un-massaged structures rather than a compact or ergonomic data structure.
- Arcology.Roam.Keyword.from_file, Arcology.Roam.Keyword.all, and Arcology.Roam.Keyword.get make up the API for looking up Keywords. They're pretty self-explanatory, and return strings (from Arcology.Roam.Keyword.from_file) and Keyword structs with their File preloaded as :f (from the others).
- Arcology.Roam.Link.all and Arcology.Roam.Link.files return links.
- Arcology.Roam.Reference.get_by(ref: reference) looks up a file from a reference URL (its ROAM_KEY keyword), and Arcology.Roam.Reference.get_by(file: file) does the opposite, returning the reference for a file.
- Arcology.Roam.Tag.get and Arcology.Roam.Tag.all return a de-duplicated list of tag strings. The file name -> list of tags map they used to build is on hold until the tags table is normalized again (see merge_tags/1). The tags are also on any File object we're likely working with, but not in this normalized, preferable state. I need to think about a better interface than "uhh, when you destructure %Arcology.Roam.File{tags: tags}, don't forget to run tags through process_tags_sexp/1!" 'cause that kind of sucks to me.
- Arcology.Roam.Title.get does what it says on the tin. Simple stuff: feed it a file name string, get a list of titles out.
Other use cases oughta be considered and documented here. Again, I'm kind of nervous about this %Arcology.Roam.File{} interface. Having these as unprocessed structures seems like it might be a mistake, and I may want to make my own proxy structure that distills the relatively heavy Ecto models down to lists of strings once I no longer care about the associations. Not having to care about write concerns makes life so easy here; I hope I don't regret designing around that later on.
model support functions Arcology.Roam
Okay, so Arcology.Roam has a bunch of functionality designed to be imported into the other models, mostly for transforming the emacsql strings into useful shapes:
Arcology.Roam.parse_sexp takes a binary containing an s-expression string and returns a "real" data object. I use this in some pretty shameful ways, mostly to trick data in and out of the emacsql printed form. Arcology.Roam.dequote, for example, uses it to pull strings out of the database and turn them back into their objects. It wraps the stored form in parentheses, parses that as a one-element list, and returns the element. I "invented" a pattern for post-processing query results with these functions. It's hacky and I'm not super proud of it, but alas, here we are, splatting forms into strings and parsing them back out.
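def parse_sexp(form) do
  SymbolicExpression.Parser.parse(form)
end

def dequote(form) do
  # Wrap the stored form in parens so it parses as a one-element list,
  # then unwrap that single element.
  {:ok, [parse]} = parse_sexp("(#{form})")
  parse
end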
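Here is the stored shape in concrete terms. The where clauses further down all wrap their arguments in quote_string/1, which implies that emacsql keeps strings with their surrounding double quotes. A path therefore comes back as ~s("/home/rrix/org/Bioregionalism.org"), and dequote/1 turns it back into a plain binary. The sketch below is a minimal, dependency-free version of only that string case. NaiveDequote is a hypothetical name and is not what Arcology.Roam does (that goes through SymbolicExpression). It assumes the input is one complete string literal.

```elixir
defmodule NaiveDequote do
  # Hypothetical stand-in for Arcology.Roam.dequote/1 that only understands
  # a single emacsql-printed string literal such as ~s("/home/rrix/org/foo.org").
  # Symbols, numbers, and lists still need a real s-expression parser.
  def dequote("\"" <> rest) do
    # Drop the closing double quote (assumes the literal is complete)...
    inner = binary_part(rest, 0, byte_size(rest) - 1)
    # ...then turn every backslash escape (\" or \\) back into the
    # character it protects.
    Regex.replace(~r/\\(.)/s, inner, "\\1")
  end
end
```

For plain file paths this is just "strip the quotes". The escape handling only matters for titles or values that contain quotes.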
Sometimes the things coming out of the DB are charlists instead of binaries. Arcology.Roam.charlist_to_string will try to join them, but will fail if invalid UTF-8 or Unicode is passed through:
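def charlist_to_string(raw) when is_list(raw) do
  # A real charlist: List.to_string/1 raises UnicodeConversionError on
  # invalid codepoints.
  List.to_string(raw)
end

def charlist_to_string(raw) when is_binary(raw) do
  # Binary comprehension: walks the *bytes* of raw and re-encodes each byte
  # as a UTF-8 codepoint, i.e. it treats the input as Latin-1.
  Enum.join(for <<c <- raw>>, do: <<c::utf8>>)
end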
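The binary comprehension deserves a closer look. <<c <- raw>> walks a binary one byte at a time, and <<c::utf8>> re-encodes each byte as a codepoint of its own, so in effect the input is decoded as Latin-1. The small demonstration module below (CharlistDemo is a hypothetical name) shows ASCII passing through untouched and input that is already UTF-8 getting double-encoded. For comparison, it also runs List.to_string/1 on a real charlist:

```elixir
defmodule CharlistDemo do
  # Same body as the binary clause of charlist_to_string/1: one codepoint
  # per input byte.
  def bytes_as_codepoints(raw) when is_binary(raw) do
    Enum.join(for <<c <- raw>>, do: <<c::utf8>>)
  end

  # A real charlist is a list of codepoints, so List.to_string/1 is enough.
  def from_charlist(raw) when is_list(raw), do: List.to_string(raw)
end
```

If the DB driver already returns UTF-8 binaries, they can be used as-is. The byte-wise conversion only makes sense for one-byte-per-character data.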
A plist is an s-expression of the form (:key1 value1 :key2 value2), and Arcology.Roam.plist_to_keywords returns a keyword list parsed from it.
def plist_to_keywords(plist) do
  plist
  |> Enum.chunk_every(2, 2)
  |> Enum.map(&List.to_tuple(&1))
  |> Enum.map(&clean_tuple_key(&1))
end

def clean_tuple_key(tuple) do
  k =
    elem(tuple, 0)
    |> Atom.to_string()
    |> String.trim_leading(":")
    |> String.to_atom()

  {k, elem(tuple, 1)}
end
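The example below runs this on the kind of plist that get_content/1 (in the links section) reads out of a link's properties. It assumes the s-expression parser returns :content as an atom whose name still carries the leading colon, which is what the String.trim_leading(":") step implies. PlistDemo is a hypothetical copy of the two functions above with one change: Enum.chunk_every/4 gets :discard. With it, an odd-length plist drops its dangling key instead of crashing clean_tuple_key/1 on a one-element tuple.

```elixir
defmodule PlistDemo do
  # Copy of plist_to_keywords/1; :discard throws away a trailing key that
  # has no value instead of producing a one-element chunk.
  def plist_to_keywords(plist) do
    plist
    |> Enum.chunk_every(2, 2, :discard)
    |> Enum.map(&List.to_tuple/1)
    |> Enum.map(&clean_tuple_key/1)
  end

  # Copy of clean_tuple_key/1, matching on the pair directly:
  # :":content" -> ":content" -> "content" -> :content
  def clean_tuple_key({key, value}) do
    k =
      key
      |> Atom.to_string()
      |> String.trim_leading(":")
      |> String.to_atom()

    {k, value}
  end
end
```

Because the result is an ordinary keyword list, Keyword.get(result, :content) is all get_content/1 needs afterwards.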
And Arcology.Roam.quote_string wraps file names in double quotes, matching the emacsql printed form, when building queries:
@doc "XXX: This is only for filenames."
def quote_string(string) do
  # Wraps the string in double quotes without escaping any embedded " or \,
  # hence the XXX above.
  ~s("#{string}")
end
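The XXX matters because a file name containing a double quote or a backslash turns into something a Lisp reader would split apart. Below is a sketch of a hypothetical escaping variant. quote_string_escaped/1 is not part of Arcology.Roam. It backslash-escapes those two characters, as Emacs's prin1 does when printing strings. Whether emacsql's stored values match this exactly is an assumption worth checking against the DB.

```elixir
defmodule QuoteDemo do
  # quote_string/1 as defined above: wrap in double quotes, nothing else.
  def quote_string(string), do: ~s("#{string}")

  # Hypothetical variant: escape backslashes first (so the escapes added
  # for quotes aren't doubled), then double quotes, then wrap.
  def quote_string_escaped(string) do
    escaped =
      string
      |> String.replace("\\", "\\\\")
      |> String.replace("\"", "\\\"")

    "\"" <> escaped <> "\""
  end
end
```

For ordinary paths the two functions agree. They differ only when the name contains " or \.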
Assemblage
defmodule Arcology.Roam do
  <<roam_parse_sexp>>
  <<roam_charlist_to_string>>
  <<roam_plist_to_keywords>>
  <<roam_clean_file_name>>
  <<roam_quote_string>>
end
files
table Arcology.Roam.File
The Arcology.Roam.File module holds the associations to the other entities defined in this file. It is the root of a short model hierarchy, expressed in the code below. org-roam's parser caches the hash of each file, which conveniently gives us a cache key to reference later on. The Arcology Page module uses this module to provide a high-level interface to single Pages in the document graph.
The Ecto schema provides associative access to all relevant metadata, as well as to the files on the far ends of links. I need to move from to/from names to inbound/outbound or sender/receiver; I always find myself confused by to/from, somehow. (For the record: links_to holds the links whose dest is this file, i.e. inbound links, and links_from holds the outbound ones.) The Arcology.Roam.File.preloads function loads all relevant associations for the model or list passed in, like Ecto.Repo.preload/2. This'll be nice to have on any of the accessor functions defined in the module itself.
use Ecto.Schema
alias Arcology.Repo

@primary_key {:file, :string, []}
schema "files" do
  # field :file, :string
  field :hash, :string
  field :meta, :string

  has_many :titles, Arcology.Roam.Title,
    foreign_key: :file,
    references: :file

  # This table is not normalized in org-roam upstream; tags.tags is an s-expression
  has_one :tags, Arcology.Roam.Tag,
    foreign_key: :file,
    references: :file

  has_many :keywords, Arcology.Roam.Keyword,
    foreign_key: :file,
    references: :file

  has_one :reference, Arcology.Roam.Reference,
    foreign_key: :file,
    references: :file

  has_many :links_to, Arcology.Roam.Link, references: :file, foreign_key: :dest
  has_many :links_from, Arcology.Roam.Link, references: :file, foreign_key: :source
  has_many :files_to, Arcology.Roam.File, references: :file, foreign_key: :dest
  has_many :files_from, Arcology.Roam.File, references: :file, foreign_key: :source
end

def preloads(q) do
  file_links = from(l in Arcology.Roam.Link, where: l.type == ~s("file"))

  Repo.preload(q, [
    :titles,
    :reference,
    :tags,
    :keywords,
    :links_to,
    :links_from,
    files_to: file_links,
    files_from: file_links
  ])
end
There are get and all fetchers, plus accessor functions that dequote the hash and file path. "def get file file equals file do file pipe file fetch file pipe dequote end" has to be some of my least-legible code, but it does roll off the tongue pretty well. My UNIX brain lights up at the pipe operator; I want to design everything around it, perhaps to the detriment of better patterns.
defmodule Arcology.Roam.File do
  import Ecto.Query
  import Arcology.Roam
  alias Arcology.Roam.File

  <<files_ecto>>

  def all do
    Repo.all(File) |> preloads()
  end

  def get(filename) do
    Repo.get_by(File, file: quote_string(filename))
    |> preloads()
  end

  def get_hash(%File{} = file) do
    file |> Map.get(:hash) |> dequote()
  end

  def get_name(%File{} = file) do
    file |> Map.get(:file) |> dequote()
  end
end
keywords
table Arcology.Roam.Keyword
The keywords table exposes the key/value settings embedded in files in the form #+KEY: value, at least those added to the arcology-batch configuration in arcology-db. The shape of the association is:
defmodule Arcology.Roam.Keyword do
  use Ecto.Schema
  import Ecto.Query
  import Arcology.Roam
  alias Arcology.Repo
  alias Arcology.Roam.{File, Keyword}

  @primary_key false
  schema "keywords" do
    field(:file, :string)
    field(:keyword, :string)
    field(:value, :string)

    belongs_to :f, Arcology.Roam.File,
      type: :string,
      references: :file,
      foreign_key: :file,
      define_field: false
  end

  <<keyword_all>>
  <<keyword_get>>
  <<keyword_from_file>>
  <<keyword_from_files>>
  <<keyword_accessors>>
end
file | keyword | value |
---|---|---|
/home/rrix/org/2018_08_state_of_the_art.org | ARCOLOGY_KEY | garden/2018-state-of-the-art |
/home/rrix/org/2018_08_state_of_the_art.org | ARCOLOGY_TWITTER | https://twitter.com/rrrrrrrix/status/1033934998724796416 |
/home/rrix/org/2018_08_state_of_the_art.org | ARCOLOGY_OLD_PERMALINK | 1535364960.0-note.html |
/home/rrix/org/2019_san_diego_and_phoenix_trip.org | ARCOLOGY_KEY | lionsrear/2019-ca-phx-trip |
/home/rrix/org/2019_san_diego_and_phoenix_trip.org | ARCOLOGY_TWITTER | https://twitter.com/rrrrrrrix/status/1202778552875339776 |
/home/rrix/org/2019_san_diego_and_phoenix_trip.org | ARCOLOGY_OLD_PERMALINK | 1575596160.0-note.html |
/home/rrix/org/2019_westfalia_vanagon_trip.org | ARCOLOGY_KEY | lionsrear/2019-westfalia-vanagon |
/home/rrix/org/2019_westfalia_vanagon_trip.org | ARCOLOGY_TWITTER | https://twitter.com/rrrrrrrix/status/1175924011550871557 |
/home/rrix/org/2019_westfalia_vanagon_trip.org | ARCOLOGY_OLD_PERMALINK | 1569194520.0-note.html |
/home/rrix/org/Bioregionalism.org | ARCOLOGY_KEY | lionsrear/bioregionalism |
There are two forms of Arcology.Roam.Keyword.all. One returns all the keywords in the DB; the other returns all rows with the keyword that is passed in.
defp preloads(results) do
  Repo.preload(results, [:f])
end

def all(keyword) do
  Repo.all(
    Keyword
    |> where(keyword: ^quote_string(keyword))
  )
  |> preloads()
end

def all do
  Repo.all(Keyword)
  |> preloads()
end
Arcology.Roam.Keyword.get/2 returns all keywords of a given type and value; it's used to map URLs to ARCOLOGY_KEY entities. Keywords are not unique. Each file can only have one value for a given keyword, but many files carry the same keyword, each with its own value, and nothing stops two files from sharing a value. So this returns a list.
@doc """
returns all keyword rows whose KEYWORD is set to VALUE, with their file preloaded as :f
"""
def get(keyword, value) do
  Repo.all(
    Keyword
    |> where(keyword: ^quote_string(keyword))
    |> where(value: ^quote_string(value))
  )
  |> preloads()
end
Arcology.Roam.Keyword.from_file/2 returns the value of a keyword in a file, and Arcology.Roam.Keyword.from_file/1 returns a keyword list of all the file's keywords:
I really need to find a better pattern for switching between %File structures and file names. Having the stubs isn't the worst thing, and I like that the explicit guards signal misuse. But having to write file.file |> dequote is going to lead to some really difficult-to-debug query problems. Arcology.Roam.File.get_name(file) (appropriately aliased) isn't much longer, and it's less likely to be forgotten. It just feels really "old-school" or anachronistic, I guess, especially in a stub function.
@doc """
returns the value of a keyword from a particular file
"""
def from_file(%File{} = file, keyword) do
  Keyword.from_file(File.get_name(file), keyword)
end

def from_file(file_name, keyword) when is_binary(file_name) do
  # Repo.one/1 returns nil when no row matches. Swapping that for ["nil"]
  # makes dequote/1 parse the Lisp symbol nil (presumably :nil, which is
  # Elixir's nil) instead of crashing in Enum.at/2.
  not_found_trapdoor = &(&1 || ["nil"]) # ha-ha-ha

  Repo.one(
    Keyword
    |> select([keyword], [keyword.value])
    |> where(file: ^quote_string(file_name))
    |> where(keyword: ^quote_string(keyword))
  )
  |> not_found_trapdoor.()
  |> Enum.at(0)
  |> dequote()
end

@doc """
returns all keywords in a file
"""
def from_file(%File{} = file) do
  Keyword.from_file(File.get_name(file))
end

def from_file(file) when is_binary(file) do
  Repo.all(
    Keyword
    |> select([keyword], [keyword.keyword, keyword.value])
    |> where(file: ^quote_string(file))
  )
  |> Enum.map(fn [keyword, value] ->
    {keyword |> dequote() |> String.to_atom(), dequote(value)}
  end)
end
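The following sketch shows what from_file/1 and from_file/2 return, without needing a database. It runs the same post-processing over rows shaped like the sample keywords table above. It assumes the stored values carry emacsql's surrounding double quotes, and it swaps dequote/1 for a hypothetical strip_quotes/1 that only removes those quotes. first_value/1 is a hypothetical, explicit alternative to the nil trapdoor.

```elixir
defmodule FromFileDemo do
  # Hypothetical stand-in for dequote/1: strips the surrounding double
  # quotes only (no escape handling, which is enough for these rows).
  defp strip_quotes("\"" <> rest), do: binary_part(rest, 0, byte_size(rest) - 1)

  # The Enum.map step of from_file/1, applied to [keyword, value] rows
  # like the ones select([keyword], [keyword.keyword, keyword.value]) returns.
  def to_keywords(rows) do
    Enum.map(rows, fn [keyword, value] ->
      {keyword |> strip_quotes() |> String.to_atom(), strip_quotes(value)}
    end)
  end

  # Repo.one/1 on the from_file/2 query yields either nil or [value];
  # matching both shapes replaces the trapdoor plus Enum.at/2.
  def first_value(nil), do: nil
  def first_value([value]), do: strip_quotes(value)
end
```

The result of to_keywords/1 is a plain keyword list, so Keyword.get(kws, :ARCOLOGY_KEY) works on it directly.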
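Given a list of files, from_files/2 loads every row whose keyword is, well, keyword: "give me the ARCOLOGY_KEY for all these files". Unlike the functions above, it does not run its arguments through quote_string, so files and keyword must be passed in the quoted form they have in the database.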
def from_files(files, keyword) when is_list(files) do
  Arcology.Repo.all(
    Arcology.Roam.Keyword
    |> where([kw], kw.file in ^files)
    |> where(keyword: ^keyword)
  )
end
Some simple accessor functions:
def get_value(nil), do: nil
def get_value(%Keyword{value: value}), do: value |> dequote()
def get_keyword(nil), do: nil
def get_keyword(%Keyword{keyword: keyword}), do: keyword |> dequote()
links
table Arcology.Roam.Link
The links table contains, well, links between files. The schema is pretty straightforward, but this is where the queries start to get more complicated.
defmodule Arcology.Roam.Link do
  alias Arcology.Repo
  use Ecto.Schema
  import Ecto.Query
  import Arcology.Roam

  @primary_key false
  schema "links" do
    field(:source, :string)
    field(:dest, :string)
    field(:type, :string)
    field(:properties, :string)

    belongs_to :from_file, Arcology.Roam.File,
      define_field: false,
      references: :file,
      foreign_key: :source

    belongs_to :to_file, Arcology.Roam.File,
      define_field: false,
      references: :file,
      foreign_key: :dest
  end

  <<links_public>>
  <<links_all>>
  <<links_files>>
  <<links_to_list>>
  <<links_get_content>>
end
There are a few forms of functions here for extracting useful links from the database. They're used all over the place, for example to extract the backlinks for a file or to build the network graph. all/1 returns links of all types, and files/1 returns links of type "file" (quoted, because arcology-db is technical debt). I don't like having to define all these different forms; eventually I will build something that mimics how System.cmd parses its options[1].
scope | direction |
---|---|
all | to |
all | from |
all | |
files | to |
files | from |
files | |
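Here is one hypothetical shape for that System.cmd-style options parser. A single function takes the scope and the to:/from: option and returns the where filters plus the association to preload, so the six rows of the table become data instead of clauses. LinkFilters and build/2 are made-up names, and nothing here touches Ecto. The filters use the same quoting as quote_string/1.

```elixir
defmodule LinkFilters do
  # Same quoting as Arcology.Roam.quote_string/1.
  defp quote_string(string), do: ~s("#{string}")

  # Returns {filters, preload}: filters is a keyword list of column == value
  # conditions on the links table; preload is the association for the far
  # end of the link (nil when no direction is given).
  def build(scope, opts \\ []) when scope in [:all, :files] do
    {direction, preload} =
      case opts do
        [to: dest] -> {[dest: quote_string(dest)], :from_file}
        [from: source] -> {[source: quote_string(source)], :to_file}
        [] -> {[], nil}
      end

    type = if scope == :files, do: [type: ~s("file")], else: []
    {direction ++ type, preload}
  end
end
```

Ecto's where/3 accepts an interpolated keyword list (where(query, ^filters)), so the result could be applied to public_query/0 in one place, followed by Repo.preload/2 whenever preload isn't nil.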
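Public interfaces should use public_query/0 rather than querying the module directly. It's more expensive, but it returns only links whose source and destination pages both have ARCOLOGY_KEY keywords set.
def public_query do
  # Inner-join each end of the link against its ARCOLOGY_KEY keyword row so
  # that links touching an unpublished page drop out. Putting the keyword
  # test in the join condition keeps a file's other keywords from
  # multiplying the rows.
  Arcology.Roam.Link
  |> join(:inner, [l], kfrom in Arcology.Roam.Keyword,
    on: l.source == kfrom.file and kfrom.keyword == ~s("ARCOLOGY_KEY")
  )
  |> join(:inner, [l, kfrom], kto in Arcology.Roam.Keyword,
    on: l.dest == kto.file and kto.keyword == ~s("ARCOLOGY_KEY")
  )
end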
I have a bunch of different forms of all/1 here, because I am bad at programming. Wait 'til you see files/1! Given a File object, it extracts the file name and feeds that to the clause with the is_binary guard. That clause runs the query and returns the results.
def all do
  Repo.all(public_query())
end

def all(to: %Arcology.Roam.File{} = destination) do
  all(to: destination.file |> dequote())
end

def all(from: %Arcology.Roam.File{} = source) do
  all(from: source.file |> dequote())
end

def all(to: destination) when is_binary(destination) do
  Repo.all(
    public_query()
    |> where(dest: ^quote_string(destination))
  )
  |> Arcology.Repo.preload(:from_file)
end

def all(from: source) when is_binary(source) do
  Repo.all(
    public_query()
    |> where(source: ^quote_string(source))
  )
  |> Arcology.Repo.preload(:to_file)
end
And would you look at this: files/1 does the same thing with an extra where clause. I'll clean this up eventually. It can't be that hard; I'm just tired and being lazy.
def files do
  # Like all/0, but only links of type "file".
  Repo.all(
    public_query()
    |> where(type: ^~s("file"))
  )
end

def files(to: destination) when is_binary(destination) do
  Repo.all(
    public_query()
    |> where(dest: ^quote_string(destination))
    |> where(type: ^~s("file"))
  )
  |> Arcology.Repo.preload(:from_file)
end

def files(from: source) when is_binary(source) do
  Repo.all(
    public_query()
    |> where(source: ^quote_string(source))
    |> where(type: ^~s("file"))
  )
  |> Arcology.Repo.preload(:to_file)
end

def files(to: %Arcology.Roam.File{} = destination) do
  files(to: destination.file |> dequote())
end

def files(from: %Arcology.Roam.File{} = source) do
  files(from: source.file |> dequote())
end
get_content/1 pulls the content out of the properties s-expression.
def get_content(%Arcology.Roam.Link{} = link) do
  {:ok, sexp} = link.properties |> Arcology.Roam.parse_sexp()
  sexp |> Arcology.Roam.plist_to_keywords() |> Keyword.get(:content)
end
references
table Arcology.Roam.Reference
References store a "canonical URL" for a resource, usually external, where more information or the referenced document itself lives. Another simple model, for the most part. At the boundaries I expose maps, and to_map/1 builds one.
defmodule Arcology.Roam.Reference do
  alias Arcology.Repo
  use Ecto.Schema
  import Ecto.Query
  import Arcology.Roam

  @primary_key {:ref, :string, []}
  schema "refs" do
    # field(:ref, :string, unique: true)
    field(:file, :string)
    field(:type, :string)

    belongs_to :f, Arcology.Roam.File,
      type: :string,
      references: :file,
      foreign_key: :file,
      define_field: false
  end

  @doc """
  get_by(file: ...) returns a reference map for an Arcology.Roam.File or a
  file path; get_by(ref: ...) returns the Arcology.Roam.File for a reference.
  """
  def get_by(file: %Arcology.Roam.File{} = file), do: get_by(file: file.file |> dequote())

  # Reference map for a file path
  def get_by(file: filename) when is_binary(filename) do
    Repo.one(
      Arcology.Roam.Reference
      |> where(file: ^quote_string(filename))
    )
    |> Repo.preload(:f)
    |> to_map()
  end

  # Arcology.Roam.File for a reference, or nil if the reference is unknown
  def get_by(ref: reference) do
    Repo.one(
      Arcology.Roam.Reference
      |> where(ref: ^quote_string(reference))
    )
    |> Repo.preload(:f)
    |> case do
      nil -> nil
      ref -> ref.f
    end
  end

  def all do
    Repo.all(Arcology.Roam.Reference)
    |> Repo.preload(:f)
    |> Enum.map(&to_map/1)
  end

  def to_map(nil), do: nil

  def to_map(ref) do
    %{
      ref: ref.ref |> dequote(),
      type: ref.type |> dequote(),
      file: ref.file |> dequote()
    }
  end
end
titles
table Arcology.Roam.Title
Files can have multiple titles: one specified by #+TITLE, and any number in #+ROAM_ALIAS, each double-quoted. The main function provided here is get, which takes a file name and returns a list of strings. Easy stuff; I don't really want to need to do more than that with this schema.
defmodule Arcology.Roam.Title do
  alias Arcology.Repo
  use Ecto.Schema
  import Ecto.Query
  import Arcology.Roam

  @primary_key false
  schema "titles" do
    field(:file, :string)
    field(:title, :string)

    # foreign_key: :file is needed; otherwise Ecto assumes a :f_id column
    belongs_to :f, Arcology.Roam.File,
      type: :string,
      define_field: false,
      references: :file,
      foreign_key: :file
  end

  def to_list(titles) do
    titles |> Enum.map(&(Map.get(&1, :title) |> dequote()))
  end

  def get(filename) do
    Repo.all(
      Arcology.Roam.Title
      |> where(file: ^quote_string(filename))
    )
    |> to_list()
  end

  def from_files(files) when is_list(files) do
    Arcology.Repo.all(
      Arcology.Roam.Title
      |> where([title], title.file in ^files)
    )
  end

  def all() do
    Arcology.Repo.all(Arcology.Roam.Title)
  end
end
tags
table Arcology.Roam.Tag
When I had a forked arcology-db, this table was normalized: a query for a file would return multiple rows. org-roam upstream, however, stores all the tags in a single column, (print)'d like link properties. This makes the data far less useful for querying than it used to be, and I intend to fix that one way or another; maybe I will add a custom tags table like I did for #+KEYWORD caching in org-roam-db.
This table contains file/tags rows, and every file can have many tags. The only code here that is really more interesting than the code above is merge_tags, which takes a list of Tag rows returned from the database and flattens them into a map of file -> list of tags.
The module exposes a very minimal API: if this code is being called, the caller gets to work with strings, which is nice. Getting tags through Arcology.Roam.File's association will inevitably leak the structures, and that's probably fine. I hope I don't end up needing a bunch of different type-matching functions to make coding "safe". I think I just need to use guards and structure matching more often; the approach is sound if I can stick to it. Haha, what a gotcha!
defmodule Arcology.Roam.Tag do
  alias Arcology.Repo
  use Ecto.Schema
  import Ecto.Query
  import Arcology.Roam
  alias Arcology.Roam.Tag

  @primary_key false
  schema "tags" do
    field(:file, :string)
    field(:tags, :string)

    # foreign_key: :file is needed; otherwise Ecto assumes a :f_id column
    belongs_to :f, Arcology.Roam.File,
      type: :string,
      define_field: false,
      references: :file,
      foreign_key: :file
  end

  def get(filename) do
    Repo.all(
      Tag |> where(file: ^quote_string(filename))
    )
    |> process_tags_sexp()
    # |> merge_tags()
    # |> Map.get(filename)
  end

  def all do
    Repo.all(Tag)
    |> process_tags_sexp()
    # |> merge_tags()
  end
process_tags_sexp takes a list of Tag objects, parses each tags s-expression into a list, then flattens and de-duplicates the results. I really want to fix this as described at the top of the heading; using the s-expression parser is not something I want to do this often!
  @doc "extract and process tags from s-expression"
  def process_tags_sexp(%Arcology.Roam.Tag{} = tag), do: process_tags_sexp([tag])
  def process_tags_sexp(_results = []), do: []
  def process_tags_sexp(results) when is_nil(results), do: []

  def process_tags_sexp(results) do
    for tag <- results do
      {:ok, parsed} = Arcology.Roam.parse_sexp(tag.tags)
      parsed
    end
    |> Enum.flat_map(& &1)
    |> MapSet.new()
    |> MapSet.to_list()
  end
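Once each row's tags column has been parsed, the rest of process_tags_sexp/1 is a flatten-and-dedupe. Below is a pure sketch of that step, assuming the parser returns each tag as a string (TagDemo is a hypothetical name). It uses Enum.uniq/1 instead of the MapSet round-trip, because Enum.uniq/1 keeps first-seen order while MapSet.to_list/1 makes no promise about ordering.

```elixir
defmodule TagDemo do
  # parsed_rows stands in for the lists parse_sexp/1 returns (inside
  # {:ok, parsed}) for each row's tags column.
  def flatten_unique(parsed_rows) do
    parsed_rows
    |> Enum.flat_map(& &1)
    # Drop repeats, keeping each tag's first occurrence in place.
    |> Enum.uniq()
  end
end
```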
The only thing here that needs much explanation is merge_tags/1, which is currently unused but maintained for when my tags are properly normalized. merge_tags/1 takes a list of Arcology.Roam.Tag objects and returns a map of filename -> list of strings. It matches on a singular tag field, while the current schema's column is tags, so it will need that rename when it comes back.
Simple stuff, but a bit obtuse. It's recursive: it uses Map.update to either create a list containing the tag string or append the tag to the existing map entry. It uses the nifty head/tail decomposition pattern that Elixir has[2].
  # @doc "Convert a list of Ecto results to a map of file -> list of tags"
  # def merge_tags(the_list) do
  #   merge_tags(the_list, %{})
  # end
  #
  # defp merge_tags([%Tag{file: file, tag: tag} | rest], accumulator) do
  #   merge_tags(
  #     rest,
  #     Map.update(
  #       accumulator,
  #       file |> dequote,
  #       [tag |> dequote],
  #       fn existing ->
  #         existing ++ [tag |> dequote]
  #       end
  #     )
  #   )
  # end
  #
  # defp merge_tags([], accumulator), do: accumulator
end
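For when the table is normalized again, here is a runnable sketch of merge_tags/1 over plain %{file: ..., tag: ...} maps, standing in for already-dequoted rows (MergeTagsDemo is a hypothetical name). It keeps the head/tail recursion and Map.update/4 described above. It also adds Enum.group_by/3, which builds the same map in one call and keeps each file's tags in input order.

```elixir
defmodule MergeTagsDemo do
  def merge_tags(rows), do: merge_tags(rows, %{})

  # Head/tail recursion: start a one-element list for a file we haven't
  # seen yet, or append the tag to that file's existing list.
  defp merge_tags([%{file: file, tag: tag} | rest], acc) do
    merge_tags(rest, Map.update(acc, file, [tag], fn existing -> existing ++ [tag] end))
  end

  defp merge_tags([], acc), do: acc

  # Same result in one call: group rows by file, keeping only the tag.
  def group_tags(rows), do: Enum.group_by(rows, & &1.file, & &1.tag)
end
```

existing ++ [tag] copies the list on every append, which is fine for a handful of tags per file. group_by/3 avoids that bookkeeping entirely.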