#+TITLE: Arcology Roam Models
#+ROAM_TAGS: Arcology
#+ROAM_ALIAS: "Arcology.Roam" "Arcology.Roam.File" "Arcology.Roam.Keyword" "Arcology.Roam.Link" "Arcology.Roam.Reference" "Arcology.Roam.Tag" "Arcology.Roam.Title"
#+ROAM_KEY: https://code.rix.si/rrix/arcology/src/branch/main/arcology_roam.org
#+ARCOLOGY_KEY: arcology/roam
These are the Ecto models and support functionality for the database tables provided by [[file:arcology_db.org][Arcology DB]]. These are all *read-only* concerns; the write path lives in Arcology DB right now, where the org-mode parser is. Any database which I intend to write to should probably be a =postgrex= =ecto 3= model in its own OTP application, and is out of scope for this file.
Working with the Roam entities is designed to be handled in a few fashions:
- Higher Level Interface in [[file:arcology_page.org][Arcology.Page]]
- =Arcology.Roam.File.get= and =Arcology.Roam.File.all=, which return "hydrated" Ecto models: they have all their links, references, tags, and titles preloaded, though they're still in the un-massaged structures rather than a compact or ergonomic data structure.
- =Arcology.Roam.Keyword.from_file=, =Arcology.Roam.Keyword.all=, and =Arcology.Roam.Keyword.get= constitute the API for looking up Keywords. They're pretty self-explanatory, and return strings (in =Arcology.Roam.Keyword.from_file=) and keyword rows with their =File= preloaded (elsewhere).
- =Arcology.Roam.Link.all= and =Arcology.Roam.Link.files= return links, optionally narrowed with =to:= or =from:= a file.
- =Arcology.Roam.Reference.get_by= with =ref:= lets you look up a file based on a reference URL's =ROAM_KEY= keyword, and with =file:= it does the opposite, returning the reference for a file.
- =Arcology.Roam.Tag.get= and =Arcology.Roam.Tag.all= currently return a flat, de-duplicated list of tag strings; the map of file name -> list of tags is parked in =merge_tags= until the table is normalized again. The tags are also on any File object we're likely working with, but not in this preferable state; I need to think about a better interface than "uhh when you destructure =%Arcology.Roam.File{tags: tags}= don't forget to run tags through a =to_map= function!" 'cause that kind of sucks to me.
- =Arcology.Roam.Title.get= does what it says on the tin, simple stuff. Feed it a file name string, get a list of titles out.
Other use cases oughta be considered and documented here. Again, I'm kind of nervous about this =%Arcology.Roam.File{}= interface, having these as un-processed structures seems like it might be a mistake, I may want to make my own proxy structure that distills the relatively heavy Ecto models down to lists of strings where I no longer care about the associations. Not having to care about write concerns makes life so easy here, I hope I don't regret designing around that later on.
* model support functions =Arcology.Roam=
Okay so in =Arcology.Roam= there is a bunch of functionality designed to be pulled in to other models with =import=, mostly around transforming the emacsql strings in to useful shapes:
=Arcology.Roam.parse_sexp= takes a binary containing an s-expression string and returns a "real" data object. I use this in some pretty shameful ways, mostly to trick data in and out of the emacsql printed form. =Arcology.Roam.dequote=, for example, wraps a printed value in parentheses, parses it, and unwraps the single result, which is how strings get pulled out of the database and reassembled in to plain objects. I "invented" a pattern for post-processing the results that uses these functions. It's hacky and I'm not super proud of it, but alas here we are splatting forms in to strings and parsing them back out.
#+begin_src elixir :noweb-ref roam_parse_sexp
def parse_sexp(form) do
  SymbolicExpression.Parser.parse(form)
end

def dequote(form) do
  {:ok, [parse]} = parse_sexp("(#{form})")
  parse
end
#+end_src
Sometimes the things coming out of the DB are charlists instead of binaries, =Arcology.Roam.charlist_to_string= will try to join them but will fail if invalid UTF-8 or unicode is passed through:
#+begin_src elixir :noweb-ref roam_charlist_to_string
def charlist_to_string(raw) do
Enum.join(for <<c <- raw>>, do: <<c::utf8>>)
end
#+end_src
A =plist= is an s-expression of the form =(:key1 value1 :key2 value2)=, and =Arcology.Roam.plist_to_keywords= will return a keyword list parsed from this.
#+begin_src elixir :noweb-ref roam_plist_to_keywords
def plist_to_keywords(plist) do
  plist
  |> Enum.chunk_every(2, 2)
  |> Enum.map(&List.to_tuple(&1))
  |> Enum.map(&clean_tuple_key(&1))
end

def clean_tuple_key(tuple) do
  k =
    elem(tuple, 0)
    |> Atom.to_string()
    |> String.trim_leading(":")
    |> String.to_atom()

  {k, elem(tuple, 1)}
end
#+end_src
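As a rough illustration of the transformation, here is a dependency-free sketch. It assumes the s-expression parser hands plist keys back as atoms that still carry their leading colon (which is what =clean_tuple_key= implies); =PlistSketch= and the sample values are made up for the example.

#+begin_src elixir :tangle no
defmodule PlistSketch do
  # Pair up the flat list, then strip the leading ":" from each key atom.
  def plist_to_keywords(plist) do
    plist
    |> Enum.chunk_every(2)
    |> Enum.map(fn [key, value] -> {clean_key(key), value} end)
  end

  defp clean_key(key) do
    key
    |> Atom.to_string()
    |> String.trim_leading(":")
    |> String.to_atom()
  end
end

# Assumed parser output for the plist (:content "some link text" :point 1234)
parsed = [:":content", "some link text", :":point", 1234]

IO.inspect(PlistSketch.plist_to_keywords(parsed))
# => [content: "some link text", point: 1234]
#+end_src

Because the result is an ordinary keyword list, =Keyword.get/2= works on it directly.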
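And =Arcology.Roam.quote_string= wraps file names in the double quotes that emacsql prints around strings, for use in query building. It does no escaping of its own, hence the warning in its =@doc=: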
#+begin_src elixir :noweb-ref roam_quote_string
@doc "XXX: This is only for filenames."
def quote_string(string) do
~s("#{string}")
end
#+end_src
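To make the quoted form concrete, a small standalone sketch follows. =QuoteSketch= and =strip_quotes/1= are invented for the example; =strip_quotes/1= is only a stand-in for =dequote=, which goes through the s-expression parser instead.

#+begin_src elixir :tangle no
defmodule QuoteSketch do
  # Same wrapping as quote_string/1 above.
  def quote_string(string), do: ~s("#{string}")

  # Hypothetical stand-in for dequote/1: drops the opening quote and the
  # last byte, assuming the value is wrapped in exactly one pair of quotes.
  def strip_quotes("\"" <> rest) when byte_size(rest) > 0 do
    binary_part(rest, 0, byte_size(rest) - 1)
  end
end

quoted = QuoteSketch.quote_string("/home/rrix/org/Bioregionalism.org")
IO.inspect(quoted)
# => "\"/home/rrix/org/Bioregionalism.org\""

IO.inspect(QuoteSketch.strip_quotes(quoted))
# => "/home/rrix/org/Bioregionalism.org"

# Nothing inside the string is escaped, so an embedded quote passes through:
IO.inspect(QuoteSketch.quote_string(~s(a"b)))
# => "\"a\"b\""
#+end_src

The last call is the reason to keep this helper away from anything but file names: a value with an embedded quote no longer round-trips.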
** Assemblage
#+begin_src elixir :mkdirp yes :tangle lib/arcology/roam.ex :noweb yes
defmodule Arcology.Roam do
<<roam_parse_sexp>>
<<roam_charlist_to_string>>
<<roam_plist_to_keywords>>
<<roam_clean_file_name>>
<<roam_quote_string>>
end
#+end_src
* =files= table =Arcology.Roam.File=
The =Arcology.Roam.File= module contains associations to the rest of the entities defined in this file; it's the root of a short model hierarchy that is expressed in the code below. =org-roam='s parser caches the hash of the file, helpfully providing a cache key to be easily referenced later on. This module is used by the [[file:arcology_page.org][Arcology Page Module]] to provide a high-level interface to single Pages in the document graph.
The Ecto schema provides associative access to all relevant metadata, as well as files on the far ends of links. I need to move from to/from names to inbound/outbound or sender/receiver; I always find myself confused by to/from, somehow. The =Arcology.Roam.File.preloads= function will load all relevant associations for the single model or list passed in, the same way =Repo.preload= accepts either. This'll be nice to have on any of the accessor functions defined in the module itself.
#+begin_src elixir :noweb-ref files_ecto
use Ecto.Schema
alias Arcology.Repo

@primary_key {:file, :string, []}
schema "files" do
  # field :file, :string
  field :hash, :string
  field :meta, :string

  has_many :titles, Arcology.Roam.Title,
    foreign_key: :file,
    references: :file

  # This table is not normalized in org-roam upstream; tags.tags is an s-expression
  has_one :tags, Arcology.Roam.Tag,
    foreign_key: :file,
    references: :file

  has_many :keywords, Arcology.Roam.Keyword,
    foreign_key: :file,
    references: :file

  has_one :reference, Arcology.Roam.Reference,
    foreign_key: :file,
    references: :file

  has_many :links_to, Arcology.Roam.Link, references: :file, foreign_key: :dest
  has_many :links_from, Arcology.Roam.Link, references: :file, foreign_key: :source
  has_many :files_to, Arcology.Roam.File, references: :file, foreign_key: :dest
  has_many :files_from, Arcology.Roam.File, references: :file, foreign_key: :source
end

def preloads q do
  file_links = from(l in Arcology.Roam.Link, where: l.type == ~s("file"))

  Repo.preload(q, [
    :titles,
    :reference,
    :tags,
    :keywords,
    :links_to,
    :links_from,
    files_to: file_links,
    files_from: file_links
  ])
end
#+end_src
#+begin_export html
<a id="def_get_file"/>
#+end_export
There is a =get= and =all= fetcher, and access functions which =dequote= the hash and file path. "def get file file equals file do file pipe file fetch file pipe dequote end" has to be some of my least-legible code, but it does roll off the tongue pretty well. My UNIX brain is activated with the pipe operator, I want to design everything around and with it to the detriment of better patterns, perhaps.
#+begin_src elixir :mkdirp yes :tangle lib/arcology/roam/file.ex :noweb yes
defmodule Arcology.Roam.File do
  import Ecto.Query
  import Arcology.Roam
  alias Arcology.Roam.File

  <<files_ecto>>

  def all do
    Repo.all(File) |> preloads()
  end

  def get filename do
    Repo.get_by(File,
      file: quote_string(filename)
    ) |> preloads()
  end

  def get_hash(%File{} = file) do
    file |> Map.get(:hash) |> dequote
  end

  def get_name(%File{} = file) do
    file |> Map.get(:file) |> dequote
  end
end
#+end_src
* =keywords= table =Arcology.Roam.Keyword=
The =keywords= table exposes Key/Value settings embedded in files of the form =#+KEY: value=, at least those which are added to the =arcology-batch= configuration in [[file:arcology_db.org][arcology-db]]. The shape of the association is:
#+begin_src elixir :noweb yes :tangle lib/arcology/roam/keyword.ex
defmodule Arcology.Roam.Keyword do
use Ecto.Schema
import Ecto.Query
import Arcology.Roam
alias Arcology.Repo
alias Arcology.Roam.{File, Keyword}
@primary_key false
schema "keywords" do
field(:file, :string)
field(:keyword, :string)
field(:value, :string)
belongs_to :f, Arcology.Roam.File,
type: :string,
references: :file,
foreign_key: :file,
define_field: false
end
<<keyword_all>>
<<keyword_get>>
<<keyword_from_file>>
<<keyword_from_files>>
<<keyword_accessors>>
end
#+end_src
#+begin_src emacs-lisp :exports results
(org-roam-db-query [:select * :from keywords :order-by file :limit 10])
#+end_src
#+results:
| /home/rrix/org/2018_08_state_of_the_art.org | ARCOLOGY_KEY | garden/2018-state-of-the-art |
| /home/rrix/org/2018_08_state_of_the_art.org | ARCOLOGY_TWITTER | https://twitter.com/rrrrrrrix/status/1033934998724796416 |
| /home/rrix/org/2018_08_state_of_the_art.org | ARCOLOGY_OLD_PERMALINK | 1535364960.0-note.html |
| /home/rrix/org/2019_san_diego_and_phoenix_trip.org | ARCOLOGY_KEY | lionsrear/2019-ca-phx-trip |
| /home/rrix/org/2019_san_diego_and_phoenix_trip.org | ARCOLOGY_TWITTER | https://twitter.com/rrrrrrrix/status/1202778552875339776 |
| /home/rrix/org/2019_san_diego_and_phoenix_trip.org | ARCOLOGY_OLD_PERMALINK | 1575596160.0-note.html |
| /home/rrix/org/2019_westfalia_vanagon_trip.org | ARCOLOGY_KEY | lionsrear/2019-westfalia-vanagon |
| /home/rrix/org/2019_westfalia_vanagon_trip.org | ARCOLOGY_TWITTER | https://twitter.com/rrrrrrrix/status/1175924011550871557 |
| /home/rrix/org/2019_westfalia_vanagon_trip.org | ARCOLOGY_OLD_PERMALINK | 1569194520.0-note.html |
| /home/rrix/org/Bioregionalism.org | ARCOLOGY_KEY | lionsrear/bioregionalism |
There are two forms of =Arcology.Roam.Keyword.all=, one of which returns all the keywords in the DB, one returns all rows which have the keyword which is passed in.
#+begin_src elixir :noweb-ref keyword_all
defp preloads results do
Repo.preload(results, [:f])
end
def all(keyword) do
Repo.all(
Keyword
|> where(keyword: ^quote_string(keyword))
) |> preloads
end
def all do
Repo.all(Keyword)
|> preloads
end
#+end_src
=Arcology.Roam.Keyword.get/2= returns all keyword rows of a certain type and value; it's used to map URLs to =ARCOLOGY_KEY= entities. Keyword rows are *not* unique on the keyword alone: a file can only have one value for a certain keyword, but many files can each carry their own value for it.
#+begin_src elixir :noweb-ref keyword_get
@doc """
returns all files whose KEYWORD is set to VALUE
"""
def get(keyword, value) do
Repo.all(
Keyword
|> where(keyword: ^quote_string(keyword))
|> where(value: ^quote_string(value))
) |> preloads
end
#+end_src
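The lookup that =get/2= performs can be mimicked over in-memory rows, which makes the role of the quoting easier to see. This is only a sketch: =KeywordLookupSketch= is a made-up module, the rows are plain maps rather than Ecto structs, and it assumes the database stores every column in emacsql's quoted form (as =quote_string= implies).

#+begin_src elixir :tangle no
defmodule KeywordLookupSketch do
  def quote_string(string), do: ~s("#{string}")

  # In-memory equivalent of the two `where` clauses in get/2.
  def get(rows, keyword, value) do
    Enum.filter(rows, fn row ->
      row.keyword == quote_string(keyword) and row.value == quote_string(value)
    end)
  end
end

# Two rows taken from the query results above, in their quoted form.
rows = [
  %{
    file: ~s("/home/rrix/org/Bioregionalism.org"),
    keyword: ~s("ARCOLOGY_KEY"),
    value: ~s("lionsrear/bioregionalism")
  },
  %{
    file: ~s("/home/rrix/org/2019_westfalia_vanagon_trip.org"),
    keyword: ~s("ARCOLOGY_KEY"),
    value: ~s("lionsrear/2019-westfalia-vanagon")
  }
]

rows
|> KeywordLookupSketch.get("ARCOLOGY_KEY", "lionsrear/bioregionalism")
|> Enum.map(& &1.file)
|> IO.inspect()
# => ["\"/home/rrix/org/Bioregionalism.org\""]
#+end_src

Note that the file name comes back still quoted, which is why callers end up reaching for =dequote=.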
=Arcology.Roam.Keyword.from_file/2= will return the value of a keyword in a file, and =Arcology.Roam.Keyword.from_file/1= will return a keyword-list:
I really need to find a better pattern for swapping between =%File{}= structures and file-names; having the stubs isn't the worst thing, I like that there are explicit guards to signal mis-use, but I think that having to =file.file |> dequote= is going to lead to some really difficult to debug query problems. =Arcology.Roam.File.get_name(file)= (appropriately aliased) doesn't end up much longer, and it's less liable to be forgotten. It just feels really "old-school" or anachronistic, I guess, especially in a stub function.
#+begin_src elixir :noweb-ref keyword_from_file
@doc """
returns the value of a keyword from a particular file
"""
def from_file(%File{} = file, keyword) do
Keyword.from_file(File.get_name(file), keyword)
end
def from_file(file_name, keyword) when is_binary(file_name) do
not_found_trapdoor = &(&1 || ["nil"]) # ha-ha-ha
Repo.one(
Keyword
|> select([keyword], [keyword.value])
|> where(file: ^quote_string(file_name))
|> where(keyword: ^quote_string(keyword))
)
|> not_found_trapdoor.()
|> Enum.at(0)
|> dequote
end
@doc """
returns all keywords in a file
"""
def from_file(%File{} = file) do
Keyword.from_file(File.get_name(file))
end
def from_file(file) when is_binary(file) do
Repo.all(
Keyword
|> select([keyword], [keyword.keyword, keyword.value])
|> where(file: ^quote_string(file))
)
|> Enum.map(fn [keyword, value] -> {keyword|>dequote|>String.to_atom, value|>dequote} end)
end
#+end_src
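The row-to-keyword-list step of =from_file/1= can be tried without a database. In this sketch =FromFileSketch= and =strip_quotes/1= are hypothetical; =strip_quotes/1= stands in for the parser-backed =dequote=, and the rows are assumed to arrive as =[keyword, value]= pairs in quoted form, as the =select= above requests.

#+begin_src elixir :tangle no
defmodule FromFileSketch do
  # Stand-in for dequote/1: assumes exactly one pair of surrounding quotes.
  defp strip_quotes("\"" <> rest), do: binary_part(rest, 0, byte_size(rest) - 1)

  # Same shape as the Enum.map at the end of from_file/1.
  def to_keywords(rows) do
    Enum.map(rows, fn [keyword, value] ->
      {keyword |> strip_quotes() |> String.to_atom(), strip_quotes(value)}
    end)
  end
end

rows = [
  [~s("ARCOLOGY_KEY"), ~s("garden/2018-state-of-the-art")],
  [~s("ARCOLOGY_OLD_PERMALINK"), ~s("1535364960.0-note.html")]
]

keywords = FromFileSketch.to_keywords(rows)
IO.inspect(keywords)
# => [ARCOLOGY_KEY: "garden/2018-state-of-the-art", ARCOLOGY_OLD_PERMALINK: "1535364960.0-note.html"]

IO.inspect(Keyword.get(keywords, :ARCOLOGY_KEY))
# => "garden/2018-state-of-the-art"
#+end_src

One design caveat: =String.to_atom/1= creates a new atom for every distinct keyword name, which is fine for a small, curated set of keywords but worth remembering if the set ever becomes open-ended.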
From the list of files provided, =from_files/2= loads all entities whose keyword is, well, =keyword=: "give me the =ARCOLOGY_KEY= for all these files".
#+begin_src elixir :noweb-ref keyword_from_files
def from_files(files, keyword) when is_list(files) do
Arcology.Repo.all(
Arcology.Roam.Keyword
|> where([kw], kw.file in ^files)
|> where(keyword: ^keyword)
)
end
#+end_src
Some simple accessor functions:
#+begin_src elixir :noweb-ref keyword_accessors
def get_value(nil), do: nil
def get_value(%Keyword{value: value}), do: value |> dequote()
def get_keyword(nil), do: nil
def get_keyword(%Keyword{keyword: keyword}), do: keyword |> dequote()
#+end_src
* =links= table =Arcology.Roam.Link=
The =links= table contains, well, links between files. The schema is pretty straightforward, but this is where queries start to get more complicated.
#+begin_src elixir :noweb yes :tangle lib/arcology/roam/link.ex
defmodule Arcology.Roam.Link do
alias Arcology.Repo
use Ecto.Schema
import Ecto.Query
import Arcology.Roam
@primary_key false
schema "links" do
field(:source, :string)
field(:dest, :string)
field(:type, :string)
field(:properties, :string)
belongs_to :from_file, Arcology.Roam.File,
define_field: false,
references: :file,
foreign_key: :source
belongs_to :to_file, Arcology.Roam.File,
define_field: false,
references: :file,
foreign_key: :dest
end
<<links_public>>
<<links_all>>
<<links_files>>
<<links_to_list>>
<<links_get_content>>
end
#+end_src
There are a few forms of functions here to extract useful links from the database, they're used all over the place in extracting the backlinks for a file, or building the network graph. =all/1= returns links of all types, =files/1= returns links of type ="file"= -- quoted because the [[file:arcology_db.org][arcology-db]] is technical debt. I don't like having to define all these different forms, I eventually will build a thing that parrots how =System.cmd= parses its options[fn:1].
| scope | direction |
|-------+-----------|
| all | to |
| all | from |
| all | |
| files | to |
| files | from |
| files | |
|-------+-----------|
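The six combinations in the table could collapse into a single function that reads scope and direction from a keyword list, in the spirit of the =System.cmd= options mentioned above. What follows is only a sketch over plain maps rather than Ecto queries; =LinkOptsSketch=, =select/2= and the =:scope= option are names invented here, not part of the module.

#+begin_src elixir :tangle no
defmodule LinkOptsSketch do
  # opts: scope: :all | :files, to: destination, from: source (all optional)
  def select(links, opts \\ []) do
    links
    |> filter_scope(Keyword.get(opts, :scope, :all))
    |> filter_field(:dest, Keyword.get(opts, :to))
    |> filter_field(:source, Keyword.get(opts, :from))
  end

  defp filter_scope(links, :all), do: links
  defp filter_scope(links, :files), do: Enum.filter(links, &(&1.type == "file"))

  # A missing option means "do not filter on this field".
  defp filter_field(links, _field, nil), do: links

  defp filter_field(links, field, value) do
    Enum.filter(links, &(Map.get(&1, field) == value))
  end
end

links = [
  %{source: "a.org", dest: "b.org", type: "file"},
  %{source: "a.org", dest: "https://example.com", type: "https"},
  %{source: "c.org", dest: "b.org", type: "file"}
]

links
|> LinkOptsSketch.select(scope: :files, to: "b.org")
|> Enum.map(& &1.source)
|> IO.inspect()
# => ["a.org", "c.org"]

links |> LinkOptsSketch.select(from: "a.org") |> length() |> IO.inspect()
# => 2
#+end_src

In an Ecto version each =filter_*= step would add a =where= to the query instead of filtering a list, so every row of the table becomes one call with different options.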
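Public interfaces should use =public_query/0= rather than querying the module directly. It's more expensive, but it returns only links whose source page has an =ARCOLOGY_KEY= keyword set. The destination's =ARCOLOGY_KEY= row is joined in as =kto= but not filtered on, so links to external URLs survive.
#+begin_src elixir :noweb-ref links_public
def public_query do
  Arcology.Roam.Link
  |> join(:left, [l], kfrom in Arcology.Roam.Keyword, l.source == kfrom.file)
  # The destination side joins on its own binding (kto, not kfrom). Matching
  # the keyword inside the join keeps it to at most one row per link.
  |> join(
    :left,
    [l, kfrom],
    kto in Arcology.Roam.Keyword,
    l.dest == kto.file and kto.keyword == ~s("ARCOLOGY_KEY")
  )
  |> where([l, kfrom, kto], kfrom.keyword == ~s("ARCOLOGY_KEY"))
end
#+end_src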
I have a bunch of different forms of =all/1= here, because I am bad at programming. Wait 'til you see =files/1=! Given a File object, it'll extract the file name and feed that to the one matching the =is_binary= guard. That form will run the query and return the results.
#+begin_src elixir :noweb-ref links_all
def all do
Repo.all(public_query())
end
def all(to: %Arcology.Roam.File{} = destination) do
all(to: destination.file|>dequote())
end
def all(from: %Arcology.Roam.File{} = source) do
all(from: source.file|>dequote())
end
def all(to: destination) when is_binary(destination) do
Repo.all(
public_query()
|> where(dest: ^quote_string(destination))
) |> Arcology.Repo.preload(:from_file)
end
def all(from: source) when is_binary(source) do
Repo.all(
public_query()
|> where(source: ^quote_string(source))
) |> Arcology.Repo.preload(:to_file)
end
#+end_src
And would you look at this, =files/1= does the same thing with an extra =where= clause. I'll clean this up eventually, it can't be that hard, I'm just tired and being lazy.
#+begin_src elixir :noweb-ref links_files
def files do
  # Without the type filter this would return exactly what all/0 returns.
  Repo.all(
    public_query()
    |> where(type: ^~s("file"))
  )
end

def files(to: destination) when is_binary(destination) do
  Repo.all(
    public_query()
    |> where(dest: ^quote_string(destination))
    |> where(type: ^~s("file"))
  ) |> Arcology.Repo.preload(:from_file)
end

def files(from: source) when is_binary(source) do
  Repo.all(
    public_query()
    |> where(source: ^quote_string(source))
    |> where(type: ^~s("file"))
  ) |> Arcology.Repo.preload(:to_file)
end

def files(to: %Arcology.Roam.File{} = destination) do
  files(to: destination.file|>dequote())
end

def files(from: %Arcology.Roam.File{} = source) do
  files(from: source.file|>dequote())
end
#+end_src
=get_content/1= pulls the content out of the properties s-expression.
#+begin_src elixir :noweb-ref links_get_content
def get_content(%Arcology.Roam.Link{} = link) do
{:ok, sexp} = link.properties |> Arcology.Roam.parse_sexp()
sexp |> Arcology.Roam.plist_to_keywords |> Keyword.get(:content)
end
#+end_src
* =references= table =Arcology.Roam.Reference=
References are used to store a "canonical url" for a resource, usually external, where more information or the referenced doc itself lives. Another simple model, for the most part. At the boundaries, I expose maps, and there is =to_map/1= that'll make one.
#+begin_src elixir :tangle lib/arcology/roam/reference.ex
defmodule Arcology.Roam.Reference do
  alias Arcology.Repo
  use Ecto.Schema
  import Ecto.Query
  import Arcology.Roam

  @primary_key {:ref, :string, []}
  schema "refs" do
    # field(:ref, :string, unique: true)
    field(:file, :string)
    field(:type, :string)

    belongs_to :f, Arcology.Roam.File,
      type: :string,
      references: :file,
      foreign_key: :file,
      define_field: false
  end

  @doc """
  With `file:` (an Arcology.Roam.File or a file path), returns a reference map.
  With `ref:`, returns the Arcology.Roam.File for that reference, or nil.
  """
  def get_by(file: %Arcology.Roam.File{} = file) do
    # Re-enter through the keyword form; a bare string matches no clause.
    get_by(file: file.file |> dequote)
  end

  def get_by(file: filename) when is_binary(filename) do
    Repo.one(
      Arcology.Roam.Reference
      |> where(file: ^quote_string(filename))
    )
    |> Repo.preload(:f)
    |> to_map
  end

  def get_by(ref: reference) do
    Repo.one(
      Arcology.Roam.Reference
      |> where(ref: ^quote_string(reference))
    )
    |> Repo.preload(:f)
    |> case do
      # Map.get/2 raises on nil, so an unknown reference is handled here.
      nil -> nil
      ref -> ref.f
    end
  end

  def all do
    Repo.all(
      Arcology.Roam.Reference
    )
    |> Repo.preload(:f)
    |> Enum.map(&(to_map(&1)))
  end

  def to_map(ref) when is_nil(ref), do: nil

  def to_map(ref) do
    %{
      ref: ref.ref|>dequote,
      type: ref.type|>dequote,
      file: ref.file|>dequote,
    }
  end
end
#+end_src
* =titles= table =Arcology.Roam.Title=
Files can have multiple titles: one specified by =#+TITLE=, and any number more, each double-quoted, in =#+ROAM_ALIAS=. The main function provided here is =get=, which given a file name returns a list of strings. Easy stuff, I don't really want to need to do more than that with this schema.
#+begin_src elixir :tangle lib/arcology/roam/title.ex
defmodule Arcology.Roam.Title do
  alias Arcology.Repo
  use Ecto.Schema
  import Ecto.Query
  import Arcology.Roam

  @primary_key false
  schema "titles" do
    field(:file, :string)
    field(:title, :string)

    # foreign_key added to match the Keyword and Reference schemas; without
    # it Ecto falls back to :f_id, which this table does not have.
    belongs_to :f, Arcology.Roam.File,
      define_field: false,
      foreign_key: :file,
      references: :file
  end

  def to_list(titles) do
    titles |> Enum.map(&(Map.get(&1, :title) |> dequote))
  end

  def get(filename) do
    Repo.all(
      Arcology.Roam.Title
      |> where(file: ^quote_string(filename))
    )
    |> to_list
  end

  def from_files(files) when is_list(files) do
    Arcology.Repo.all(
      Arcology.Roam.Title
      |> where([title], title.file in ^files)
    )
  end

  def all() do
    Arcology.Repo.all(
      Arcology.Roam.Title
    )
  end
end
#+end_src
* =tags= table =Arcology.Roam.Tag=
When I had a forked [[file:arcology_db.org][arcology-db]], this table was normalized, a query for the file would return multiple rows, but [[file:../cce/org-roam.org][org-roam]] upstream stores all the tags in a single column, =(print)='d like link properties. This makes the data far less useful for *querying* than it used to be and I intend to fix that one way or another, maybe I will add a custom =tags= table like I did for [[file:../keyword_caching_in_org_roam_db_issue_672_org_roam_org_roam.org][#+KEYWORD caching in org-roam-db]].
This table contains file/tag tuples, every file can have many tags. The only code here that is really more interesting than the code above is =merge_tags= which will take a list of maps returned from the database and flatten them in to a map of =file -> list of tags=.
The module exposes a very minimal API -- if this code is being called, the caller gets to work with strings, this is nice. Getting tags through =Arcology.Roam.File='s association will inevitably leak the structures, and that's probably fine. I hope that I don't end up having to have a bunch of different type-matching functions to make coding "safe", I think I just need to use guards and structure matching more often, I think it's sound if I can adhere to it. haha, what a gotcha!
#+begin_src elixir :tangle lib/arcology/roam/tag.ex
defmodule Arcology.Roam.Tag do
  alias Arcology.Repo
  use Ecto.Schema
  import Ecto.Query
  import Arcology.Roam
  alias Arcology.Roam.Tag

  @primary_key false
  schema "tags" do
    field(:file, :string)
    field(:tags, :string)

    # foreign_key added to match the Keyword and Reference schemas; without
    # it Ecto falls back to :f_id, which this table does not have.
    belongs_to :f, Arcology.Roam.File,
      define_field: false,
      foreign_key: :file,
      references: :file
  end

  def get(filename) do
    Repo.all(
      Tag |> where(file: ^quote_string(filename))
    )
    |> process_tags_sexp()
    #|> merge_tags()
    #|> Map.get(filename)
  end

  def all do
    Repo.all(Tag)
    |> process_tags_sexp()
    # |> merge_tags
  end
#+END_SRC
=process_tags_sexp= takes a list of =Tag= objects, parses their tags s-expression in to a list, and then flattens that. I really want to fix this as described at the top of the heading; using the s-expression parser is not something I want to do this often!
#+begin_src elixir :tangle lib/arcology/roam/tag.ex
  @doc "extract and process tags from s-expression"
  def process_tags_sexp(%Arcology.Roam.Tag{} = tag), do: process_tags_sexp([tag])
  def process_tags_sexp(_results = []), do: []
  def process_tags_sexp(results) when is_nil(results), do: []

  def process_tags_sexp(results) do
    for tag <- results do
      {:ok, parsed} = Arcology.Roam.parse_sexp(tag.tags)
      parsed
    end
    |> Enum.flat_map(& &1)
    |> MapSet.new()
    |> MapSet.to_list()
  end
#+end_src
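The flatten-and-de-duplicate tail of that pipeline can be tried on its own. The sample data below is made up, and assumes the parser turns each row's =tags= column into a list of strings.

#+begin_src elixir :tangle no
# Assumed parser output for two rows whose tags overlap.
parsed = [["Arcology", "Project"], ["Arcology", "Elixir"]]

# The route taken above: flatten one level, then go through a MapSet.
via_mapset =
  parsed
  |> Enum.flat_map(& &1)
  |> MapSet.new()
  |> MapSet.to_list()

# An alternative that keeps first-seen order.
via_uniq =
  parsed
  |> Enum.flat_map(& &1)
  |> Enum.uniq()

# A MapSet makes no promise about order, so sort before comparing.
IO.inspect(Enum.sort(via_mapset))
# => ["Arcology", "Elixir", "Project"]

IO.inspect(via_uniq)
# => ["Arcology", "Project", "Elixir"]
#+end_src

Both give the same set of tags; =Enum.uniq/1= is the one to reach for if the order the tags were written in ever matters.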
The only thing really in need of a lot of explanation is this =merge_tags/1=, currently unused but maintained for when my tags are properly normalized. =merge_tags/1= takes a list of =Arcology.Roam.Tag= objects and returns a map of filename -> list of strings.
Simple stuff, but a bit obtuse. It's a recursive thing, it uses =Map.update= to either create a list with a tag string in it, or update the map entry with the tag appended. It uses the nifty head/tail decomposition pattern that Elixir has[fn:2].
#+begin_src elixir :tangle lib/arcology/roam/tag.ex
# @doc "Convert a list of Ecto results to a keyword list"
# def merge_tags(the_list) do
# merge_tags(the_list, %{})
# end
#
# defp merge_tags([%Tag{file: file, tag: tag} | rest], accumulator) do
# merge_tags(
# rest,
# Map.update(
# accumulator,
# file|>dequote,
# [tag|>dequote],
# fn existing ->
# existing ++ [tag|>dequote]
# end
# )
# )
# end
#
# defp merge_tags([], accumulator), do: accumulator
end
#+end_src
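Since the real thing is commented out, here is a runnable sketch of the same head/tail recursion with =Map.update/4=. It works on plain maps with =file= and =tag= keys instead of =%Tag{}= structs and skips the =dequote= step; =MergeTagsSketch= and the sample rows are invented for the example.

#+begin_src elixir :tangle no
defmodule MergeTagsSketch do
  # Public entry point: start the recursion with an empty accumulator.
  def merge_tags(rows), do: merge_tags(rows, %{})

  # Head/tail decomposition: fold one row into the map, recurse on the rest.
  defp merge_tags([%{file: file, tag: tag} | rest], accumulator) do
    merge_tags(
      rest,
      # First tag for a file creates the list, later ones are appended.
      Map.update(accumulator, file, [tag], fn existing -> existing ++ [tag] end)
    )
  end

  # Empty list: the recursion is done.
  defp merge_tags([], accumulator), do: accumulator
end

rows = [
  %{file: "a.org", tag: "x"},
  %{file: "b.org", tag: "z"},
  %{file: "a.org", tag: "y"}
]

IO.inspect(MergeTagsSketch.merge_tags(rows))
# => %{"a.org" => ["x", "y"], "b.org" => ["z"]}
#+end_src

The same result falls out of =Enum.group_by/3=, but the explicit recursion matches the pattern described above.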
* Footnotes
[fn:1] https://github.com/elixir-lang/elixir/blob/87710cd49521fc396f3f04fa8615f05e7f57ecc0/lib/elixir/lib/system.ex#L578-L603
[fn:2] https://elixir-lang.org/getting-started/recursion.html#reduce-and-map-algorithms