Arcology Roam Models

These are the Ecto models and support functionality for the database tables provided by Arcology DB. These are all read-only concerns; the write path lives in Arcology DB for now, alongside the org-mode parser. Any database which I intend to write to should probably be a Postgrex Ecto 3 model in its own OTP application, and is out of scope for this file.

Working with the Roam entities is designed to be handled in a few fashions:

  • Higher Level Interface in Arcology.Page
  • Arcology.Roam.File.get and Arcology.Roam.File.all, which return "hydrated" Ecto models: they have all their links, references, tags, and titles preloaded, though they're still in the un-massaged structures rather than a compact or ergonomic data structure.
  • Arcology.Roam.Keyword.from_file and Arcology.Roam.Keyword.all and Arcology.Roam.Keyword.get constitute the API for looking up Keywords. They're pretty self-explanatory, and return strings (in Arcology.Roam.Keyword.from_file) and File objects (elsewhere).
  • Arcology.Roam.Link.all and Arcology.Roam.Link.files return links
  • Arcology.Roam.Reference.ref_file lets you look up a file based on a reference URL's ROAM_KEY keyword, and Arcology.Roam.Reference.file_ref does the opposite, returning the reference key for a file.
  • Arcology.Roam.Tag.get and Arcology.Roam.Tag.all can be used to get a map of file name -> list of tags, which is probably useful; these are in a File object we're likely working with, but not in this normalized preferable state; I need to think about a better interface than "uhh when you destructure %Arcology.Roam.File{tags: tags} don't forget to run tags through a to_map function!" 'cause that kind of sucks to me.
  • Arcology.Roam.Title.get does what it says on the tin, simple stuff. Feed it a file name string, get a list of titles out.

Other use cases oughta be considered and documented here. Again, I'm kind of nervous about this %Arcology.Roam.File{} interface, having these as un-processed structures seems like it might be a mistake, I may want to make my own proxy structure that distills the relatively heavy Ecto models down to lists of strings where I no longer care about the associations. Not having to care about write concerns makes life so easy here, I hope I don't regret designing around that later on.

model support functions Arcology.Roam

Okay so in Arcology.Roam there is a bunch of functionality designed to be use'd in to other models, mostly around transforming the emacsql strings in to useful shapes:

Arcology.Roam.parse_sexp takes a binary containing an s-expression string and returns a "real" data object. I use this in some pretty shameful ways, mostly to trick data in and out of the emacsql printed form. Arcology.Roam.dequote, for example, uses this to pull strings out of the database and reassemble them into their objects. I "invented" a pattern for post-processing the results that uses these functions. It's hacky and I'm not super proud of it, but alas, here we are, splatting forms in to strings and parsing them back out.

def parse_sexp(form) do
  SymbolicExpression.Parser.parse(form)
end

def dequote(form) do
  {:ok, [parse]} = parse_sexp("(#{form})")
  parse
end
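To make the stored form concrete: a string value such as foo sits in the column in its printed form, quotes included. Here is a minimal sketch of what dequote undoes, assuming only double-quoted strings with backslash escapes. DequoteSketch and its regex approach are hypothetical stand-ins for illustration; the real code goes through the s-expression parser instead.

```elixir
defmodule DequoteSketch do
  # Hypothetical stand-in for Arcology.Roam.dequote/1: handles only
  # double-quoted strings, not general s-expressions.
  def dequote(form) when is_binary(form) do
    case Regex.run(~r/\A"(.*)"\z/s, form) do
      # Strip the surrounding quotes, then drop the escaping backslashes.
      [_, inner] -> Regex.replace(~r/\\(.)/s, inner, "\\1")
      # Anything that is not a quoted string is returned untouched here.
      nil -> form
    end
  end
end

DequoteSketch.dequote(~s("/home/rrix/org/Bioregionalism.org"))
```

Unlike the parser-backed version, this sketch hands back non-string forms unchanged rather than turning them into data.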

Sometimes the things coming out of the DB are charlists instead of binaries. Arcology.Roam.charlist_to_string will try to join them, but will fail if invalid UTF-8 or Unicode is passed through:

def charlist_to_string(raw) do
  Enum.join(for <<c <- raw>>, do: <<c::utf8>>)
end
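One thing worth seeing in isolation: the comprehension above pattern-matches on a binary and re-encodes each byte as a UTF-8 codepoint. The sketch below shows that behaviour next to List.to_string/1, which is the standard-library conversion for a true charlist. CharlistSketch is a hypothetical module name, and which of the two shapes the database driver actually hands back is an assumption to verify.

```elixir
defmodule CharlistSketch do
  # Same comprehension as above: walks the bytes of a binary and
  # re-encodes each one as a UTF-8 codepoint.
  def bytes_to_string(raw) when is_binary(raw) do
    Enum.join(for <<c <- raw>>, do: <<c::utf8>>)
  end

  # For a real charlist (a list of codepoints), the standard library
  # conversion is List.to_string/1.
  def charlist_to_string(raw) when is_list(raw), do: List.to_string(raw)
end

CharlistSketch.bytes_to_string(<<104, 105>>)
CharlistSketch.charlist_to_string([104, 105])
```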

A plist is an s-expression of the form (:key1 value1 :key2 value2), and Arcology.Roam.plist_to_keywords will return a keyword list parsed from this.

def plist_to_keywords(plist) do
  plist
  |> Enum.chunk_every(2, 2)
  |> Enum.map(&List.to_tuple(&1))
  |> Enum.map(&clean_tuple_key(&1))
end

def clean_tuple_key(tuple) do
  k =
    elem(tuple, 0)
    |> Atom.to_string()
    |> String.trim_leading(":")
    |> String.to_atom()

  {k, elem(tuple, 1)}
end
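A small usage sketch of the plist conversion, written against an already-parsed list so it runs without the parser. It assumes the keys arrive as atoms whose names still carry the leading colon (which is what the String.trim_leading call above suggests); PlistSketch is a hypothetical module name.

```elixir
defmodule PlistSketch do
  # Pair up [key, value, key, value, ...] and clean each key.
  def plist_to_keywords(plist) do
    plist
    |> Enum.chunk_every(2)
    |> Enum.map(fn [key, value] -> {clean_key(key), value} end)
  end

  # Assumption: keys are atoms that still include the leading colon,
  # e.g. :":content".
  defp clean_key(key) do
    key |> Atom.to_string() |> String.trim_leading(":") |> String.to_atom()
  end
end

PlistSketch.plist_to_keywords([:":content", "a link", :":point", 42])
|> Keyword.get(:content)
```

Once the plist is a keyword list, the usual Keyword functions apply, which is how get_content/1 reaches the :content entry later in this file.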

And Arcology.Roam.quote_string is used to escape file names in query building:

@doc "XXX: This is only for filenames."
def quote_string(string) do
  ~s("#{string}")
end
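The "only for filenames" warning is easier to see with an example. This sketch copies the function into a hypothetical QuoteSketch module and shows that an embedded double quote passes through unescaped:

```elixir
defmodule QuoteSketch do
  # Copied from above: wraps the string in double quotes, nothing more.
  def quote_string(string), do: ~s("#{string}")
end

# A plain file name round-trips cleanly into the stored, quoted form.
QuoteSketch.quote_string("/home/rrix/org/Bioregionalism.org")

# An embedded double quote is not escaped, so the result is no longer
# a single well-formed printed string.
QuoteSketch.quote_string(~s(my "file".org))
```

That is fine for file paths without quotes in them, and a reason not to reuse the function for arbitrary user-supplied values.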

Assemblage

defmodule Arcology.Roam do
  <<roam_parse_sexp>>
  <<roam_charlist_to_string>>
  <<roam_plist_to_keywords>>
  <<roam_clean_file_name>>
  <<roam_quote_string>>
end

files table Arcology.Roam.File

The Arcology.Roam.File module contains associations to the other entities defined in this file; it's the root of a short model hierarchy that is expressed in the code below. org-roam's parser caches the hash of the file, helpfully providing a cache key to be easily referenced later on. This module is used by the Arcology Page Module to provide a high-level interface to single Pages in the document graph.

The Ecto schema provides associative access to all relevant metadata, as well as files on the far ends of links. I need to move from to/from names to inbound/outbound or sender/receiver. I always find myself confused by to/from, somehow. The Arcology.Roam.File.preloads function will load all relevant associations for the model or list passed in like Ecto.Repo. This'll be nice to have on any of the accessor functions defined in the module itself.

use Ecto.Schema
alias Arcology.Repo

@primary_key {:file, :string, []}

schema "files" do
  # field :file, :string
  field :hash, :string
  field :meta, :string

  has_many :titles, Arcology.Roam.Title,
           foreign_key: :file,
           references: :file

  # This table is not normalized in org-roam upstream; tags.tags is an s-expression
  has_one  :tags, Arcology.Roam.Tag,
           foreign_key: :file,
           references: :file

  has_many :keywords, Arcology.Roam.Keyword,
           foreign_key: :file,
           references: :file

  has_one :reference, Arcology.Roam.Reference,
          foreign_key: :file,
          references: :file

  has_many :links_to, Arcology.Roam.Link, references: :file, foreign_key: :dest
  has_many :links_from, Arcology.Roam.Link, references: :file, foreign_key: :source

  has_many :files_to, Arcology.Roam.File, references: :file, foreign_key: :dest
  has_many :files_from, Arcology.Roam.File, references: :file, foreign_key: :source
end

def preloads(q) do
  file_links = from(l in Arcology.Roam.Link, where: l.type == ~s("file"))
  Repo.preload(q, [
    :titles,
    :reference,
    :tags,
    :keywords,
    :links_to,
    :links_from,
    files_to: file_links,
    files_from: file_links
  ])
end

There is a get and all fetcher, and access functions which dequote the hash and file path. "def get file file equals file do file pipe file fetch file pipe dequote end" has to be some of my least-legible code, but it does roll off the tongue pretty well. My UNIX brain is activated with the pipe operator, I want to design everything around and with it to the detriment of better patterns, perhaps.

defmodule Arcology.Roam.File do
  import Ecto.Query
  import Arcology.Roam
  alias Arcology.Roam.File

  <<files_ecto>>

  def all do
    Repo.all(File) |> preloads()
  end

  def get(filename) do
    Repo.get_by(File,
      file: quote_string(filename)
    ) |> preloads()
  end

  def get_hash(%File{} = file) do
    file |> Map.get(:hash) |> dequote
  end

  def get_name(%File{} = file) do
    file |> Map.get(:file) |> dequote
  end
end

keywords table Arcology.Roam.Keyword

The keywords table exposes Key/Value settings embedded in files of the form #+KEY: value, at least those which are added to the arcology-batch configuration in arcology-db. The shape of the association is:

defmodule Arcology.Roam.Keyword do
  use Ecto.Schema
  import Ecto.Query
  import Arcology.Roam

  alias Arcology.Repo
  alias Arcology.Roam.{File, Keyword}

  @primary_key false

  schema "keywords" do
    field(:file, :string)
    field(:keyword, :string)
    field(:value, :string)

    belongs_to :f, Arcology.Roam.File,
               type: :string,
               references: :file,
               foreign_key: :file,
               define_field: false
  end

  <<keyword_all>>
  <<keyword_get>>
  <<keyword_from_file>>
  <<keyword_from_files>>

  <<keyword_accessors>>
end

| file | keyword | value |
| /home/rrix/org/2018_08_state_of_the_art.org | ARCOLOGY_KEY | garden/2018-state-of-the-art |
| /home/rrix/org/2018_08_state_of_the_art.org | ARCOLOGY_TWITTER | https://twitter.com/rrrrrrrix/status/1033934998724796416 |
| /home/rrix/org/2018_08_state_of_the_art.org | ARCOLOGY_OLD_PERMALINK | 1535364960.0-note.html |
| /home/rrix/org/2019_san_diego_and_phoenix_trip.org | ARCOLOGY_KEY | lionsrear/2019-ca-phx-trip |
| /home/rrix/org/2019_san_diego_and_phoenix_trip.org | ARCOLOGY_TWITTER | https://twitter.com/rrrrrrrix/status/1202778552875339776 |
| /home/rrix/org/2019_san_diego_and_phoenix_trip.org | ARCOLOGY_OLD_PERMALINK | 1575596160.0-note.html |
| /home/rrix/org/2019_westfalia_vanagon_trip.org | ARCOLOGY_KEY | lionsrear/2019-westfalia-vanagon |
| /home/rrix/org/2019_westfalia_vanagon_trip.org | ARCOLOGY_TWITTER | https://twitter.com/rrrrrrrix/status/1175924011550871557 |
| /home/rrix/org/2019_westfalia_vanagon_trip.org | ARCOLOGY_OLD_PERMALINK | 1569194520.0-note.html |
| /home/rrix/org/Bioregionalism.org | ARCOLOGY_KEY | lionsrear/bioregionalism |

There are two forms of Arcology.Roam.Keyword.all: all/0 returns every keyword row in the DB, and all/1 returns only the rows whose keyword matches the one passed in.

defp preloads(results) do
  Repo.preload(results, [:f])
end

def all(keyword) do
  Repo.all(
    Keyword
    |> where(keyword: ^quote_string(keyword))
  ) |> preloads
end

def all do
  Repo.all(Keyword)
  |> preloads
end

Arcology.Roam.Keyword.get/2 returns all keyword rows of a certain type and value; it's used to map URLs to ARCOLOGY_KEY entities. Keywords are not unique across the table: a file can only have one value for a given keyword, but many files can each set their own value for it.

@doc """
returns all files whose KEYWORD is set to VALUE
"""
def get(keyword, value) do
  Repo.all(
    Keyword
    |> where(keyword: ^quote_string(keyword))
    |> where(value: ^quote_string(value))
  ) |> preloads
end

Arcology.Roam.Keyword.from_file/2 will return the value of a keyword in a file, and Arcology.Roam.Keyword.from_file/1 will return a keyword-list:

I really need to find a better pattern for swapping between %File structures and file names. Having the stubs isn't the worst thing, and I like that there are explicit guards to signal misuse, but I think that having to write file.file |> dequote is going to lead to some really difficult-to-debug query problems. Arcology.Roam.File.get_name(file) (appropriately aliased) doesn't end up much longer, and it's less liable to be forgotten. It just feels really "old-school" or anachronistic, I guess, especially in a stub function.

@doc """
returns the value of a keyword from a particular file
"""
def from_file(%File{} = file, keyword) do
  Keyword.from_file(File.get_name(file), keyword)
end

def from_file(file_name, keyword) when is_binary(file_name) do
  not_found_trapdoor = &(&1 || ["nil"]) # ha-ha-ha
  Repo.one(
    Keyword
    |> select([keyword], [keyword.value])
    |> where(file: ^quote_string(file_name))
    |> where(keyword: ^quote_string(keyword))
  )
  |> not_found_trapdoor.()
  |> Enum.at(0)
  |> dequote
end

@doc """
returns all keywords in a file
"""
def from_file(%File{} = file) do
  Keyword.from_file(File.get_name(file))
end

def from_file(file) when is_binary(file) do
  Repo.all(
    Keyword
    |> select([keyword], [keyword.keyword, keyword.value])
    |> where(file: ^quote_string(file))
  )
  |> Enum.map(fn [keyword, value] ->
    {keyword |> dequote() |> String.to_atom(), value |> dequote()}
  end)
end

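From the list of files provided, from_files/2 loads all entities whose keyword is, well, keyword: "give me the ARCOLOGY_KEY for all these files". Unlike the lookups above, it does not run its arguments through quote_string, so callers pass values in the stored, quoted form.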

def from_files(files, keyword) when is_list(files) do
  Arcology.Repo.all(
    Arcology.Roam.Keyword
    |> where([kw], kw.file in ^files)
    |> where(keyword: ^keyword)
  )
end

Some simple accessor functions:

def get_value(nil), do: nil
def get_value(%Keyword{value: value}), do: value |> dequote()
def get_keyword(nil), do: nil
def get_keyword(%Keyword{keyword: keyword}), do: keyword |> dequote()

links table Arcology.Roam.Link

The links table contains, well, links between files. The schema is pretty straightforward, but this is where queries start to get more complicated.

defmodule Arcology.Roam.Link do
  alias Arcology.Repo
  use Ecto.Schema
  import Ecto.Query
  import Arcology.Roam

  @primary_key false

  schema "links" do
    field(:source, :string)
    field(:dest, :string)
    field(:type, :string)
    field(:properties, :string)

    belongs_to :from_file, Arcology.Roam.File,
               define_field: false,
               references: :file,
               foreign_key: :source
    belongs_to :to_file, Arcology.Roam.File,
               define_field: false,
               references: :file,
               foreign_key: :dest
  end

  <<links_public>>
  <<links_all>>
  <<links_files>>
  <<links_to_list>>
  <<links_get_content>>
end

There are a few forms of functions here to extract useful links from the database; they're used all over the place, in extracting the backlinks for a file or building the network graph. all/1 returns links of all types, while files/1 returns only links of type "file" (stored quoted, because the arcology-db is technical debt). I don't like having to define all these different forms; eventually I will build a thing that parrots how System.cmd parses its options[1].

| scope | direction |
| all   | to        |
| all   | from      |
| all   | (none)    |
| files | to        |
| files | from      |
| files | (none)    |
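The options-driven interface wished for above might start from something like the following sketch. Everything in it is hypothetical: the LinkOptionsSketch module, the :type, :to and :from option names, and the idea of reducing the options to a list of column filters are my assumptions, not anything that exists in Arcology. It deliberately stops short of touching Ecto, so it only shows how six function heads could collapse into one.

```elixir
defmodule LinkOptionsSketch do
  # Hypothetical defaults: every link type, no direction filter.
  @defaults [type: :all, to: nil, from: nil]

  # Turn caller options into a list of {column, value} filters that a
  # single query builder could fold over.
  def filters(opts \\ []) do
    opts = Keyword.merge(@defaults, opts)

    [
      # Only the :files scope adds a type filter, in the stored quoted form.
      type: if(opts[:type] == :files, do: ~s("file")),
      dest: opts[:to],
      source: opts[:from]
    ]
    |> Enum.reject(fn {_column, value} -> is_nil(value) end)
  end
end

LinkOptionsSketch.filters(type: :files, to: "/home/rrix/org/Bioregionalism.org")
```

A real version would still need to quote the file names and pick which side to preload, but each row of the table above becomes one combination of options.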

Public interfaces should use public_query/0 rather than querying the module directly. It's more expensive, but it returns only links which are to pages that have ARCOLOGY_KEY keywords set.

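def public_query do
  Arcology.Roam.Link
  # kfrom: keywords of the source file; kto: keywords of the destination file
  |> join(:left, [l], kfrom in Arcology.Roam.Keyword, l.source == kfrom.file)
  |> join(:left, [l, kfrom], kto in Arcology.Roam.Keyword, l.dest == kto.file)
  # keep only links whose endpoints both carry an ARCOLOGY_KEY
  |> where([l, kfrom, kto], kfrom.keyword == ~s("ARCOLOGY_KEY"))
  |> where([l, kfrom, kto], kto.keyword == ~s("ARCOLOGY_KEY"))
end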

I have a bunch of different forms of all/1 here, because I am bad at programming. Wait 'til you see files/1! Given a File object, it'll extract the file name and feed that to the clause matching the is_binary guard. That form will run the query and return the results.

def all do
  Repo.all(public_query())
end

def all(to: %Arcology.Roam.File{} = destination) do
  all(to: destination.file|>dequote())
end

def all(from: %Arcology.Roam.File{} = source) do
  all(from: source.file|>dequote())
end

def all(to: destination) when is_binary(destination) do
  Repo.all(
    public_query()
    |> where(dest: ^quote_string(destination))
  ) |> Arcology.Repo.preload(:from_file)
end

def all(from: source) when is_binary(source) do
  Repo.all(
    public_query()
    |> where(source: ^quote_string(source))
  ) |> Arcology.Repo.preload(:to_file)
end

And would you look at this, files/1 does the same thing with an extra where clause. I'll clean this up eventually; it can't be that hard, I'm just tired and being lazy.

def files do
  Repo.all(
    public_query()
    # restrict to file links, like the other files/1 clauses
    |> where(type: ^~s("file"))
  )
end

def files(to: destination) when is_binary(destination) do
  Repo.all(
    public_query()
    |> where(dest: ^quote_string(destination))
    |> where(type: ^~s("file"))
  ) |> Arcology.Repo.preload(:from_file)
end

def files(from: source) when is_binary(source) do
  Repo.all(
    public_query()
    |> where(source: ^quote_string(source))
    |> where(type: ^~s("file"))
  ) |> Arcology.Repo.preload(:to_file)
end

def files(to: %Arcology.Roam.File{} = destination) do
  files(to: destination.file|>dequote())
end

def files(from: %Arcology.Roam.File{} = source) do
  files(from: source.file|>dequote())
end

get_content/1 pulls the content out of the properties s-expression.

def get_content(%Arcology.Roam.Link{} = link) do
  {:ok, sexp} = link.properties |> Arcology.Roam.parse_sexp()
  sexp |> Arcology.Roam.plist_to_keywords |> Keyword.get(:content)
end

references table Arcology.Roam.Reference

References are used to store a "canonical url" for a resource, usually external, where more information or the referenced doc itself lives. Another simple model, for the most part. At the boundaries, I expose maps, and there is to_map/1 that'll make one.

defmodule Arcology.Roam.Reference do
  alias Arcology.Repo
  use Ecto.Schema
  import Ecto.Query
  import Arcology.Roam

  @primary_key {:ref, :string, []}

  schema "refs" do
    # field(:ref, :string, unique: true)
    field(:file, :string)
    field(:type, :string)

    belongs_to :f, Arcology.Roam.File,
               type: :string,
               references: :file,
               foreign_key: :file,
               define_field: false
  end

  @doc "Get a reference map for a Arcology.Roam.File"
  def get_by(file: %Arcology.Roam.File{} = file), do: get_by(file: file.file |> dequote())

  @doc "Get a reference map for a file path"
  def get_by(file: filename) when is_binary(filename) do
    Repo.one(
      Arcology.Roam.Reference
      |> where(file: ^quote_string(filename))
    )
    |> Repo.preload(:f)
    |> to_map
  end

  @doc "Get an Arcology.Roam.File given a reference"
  def get_by(ref: reference) do
    Repo.one(
      Arcology.Roam.Reference
      |> where(ref: ^quote_string(reference))
    )
    |> Repo.preload(:f)
    |> Map.get(:f)
  end

  def all do
    Repo.all(
      Arcology.Roam.Reference
    )
    |> Repo.preload(:f)
    |> Enum.map(&(to_map(&1)))
  end
  
  def to_map(ref) when is_nil(ref), do: nil
  def to_map(ref) do
    %{
      ref: ref.ref|>dequote,
      type: ref.type|>dequote,
      file: ref.file|>dequote,
    }
  end
end

titles table Arcology.Roam.Title

Files can have multiple titles: one specified by #+TITLE, and any number more, each double-quoted, in #+ROAM_ALIAS. The only function described here is get, which given a file name returns a list of strings. Easy stuff, I don't really want to need to do more than that with this schema.

defmodule Arcology.Roam.Title do
  alias Arcology.Repo
  use Ecto.Schema
  import Ecto.Query
  import Arcology.Roam

  @primary_key false

  schema "titles" do
    field(:file, :string)
    field(:title, :string)

    belongs_to :f, Arcology.Roam.File,
               type: :string,
               define_field: false,
               references: :file,
               foreign_key: :file
  end

  def to_list(titles) do
    titles |> Enum.map(&(Map.get(&1, :title) |> dequote))
  end

  def get(filename) do
    Repo.all(
      Arcology.Roam.Title
      |> where(file: ^quote_string(filename))
    )
    |> to_list
  end

  def from_files(files) when is_list(files) do
    Arcology.Repo.all(
      Arcology.Roam.Title
      |> where([title], title.file in ^files)
    )
  end

  def all() do
    Arcology.Repo.all(
      Arcology.Roam.Title
    )
  end
end

tags table Arcology.Roam.Tag

When I had a forked arcology-db, this table was normalized: a query for the file would return multiple rows. org-roam upstream, however, stores all the tags in a single column, (print)'d like link properties. This makes the data far less useful for querying than it used to be, and I intend to fix that one way or another; maybe I will add a custom tags table like I did for #+KEYWORD caching in org-roam-db.

This table contains file/tag tuples, every file can have many tags. The only code here that is really more interesting than the code above is merge_tags which will take a list of maps returned from the database and flatten them in to a map of file -> list of tags.

The module exposes a very minimal API: if this code is being called, the caller gets to work with strings, which is nice. Getting tags through Arcology.Roam.File's association will inevitably leak the structures, and that's probably fine. I hope that I don't end up having to have a bunch of different type-matching functions to make coding "safe"; I think I just need to use guards and structure matching more often, and I think it's sound if I can adhere to it. Haha, what a gotcha!

defmodule Arcology.Roam.Tag do
  alias Arcology.Repo
  use Ecto.Schema
  import Ecto.Query
  import Arcology.Roam
  alias Arcology.Roam.Tag

  @primary_key false

  schema "tags" do
    field(:file, :string)
    field(:tags, :string)

    belongs_to :f, Arcology.Roam.File,
      type: :string,
      define_field: false,
      references: :file,
      foreign_key: :file
  end

  def get(filename) do
    Repo.all(
      Tag |> where(file: ^quote_string(filename))
    )
    |> process_tags_sexp()
    #|> merge_tags()
    #|> Map.get(filename)
  end

  def all do
    Repo.all(Tag)
    |> process_tags_sexp()
    # |> merge_tags
  end

process_tags_sexp takes a list of Tag objects, parses their tags s-expression in to a list, and then flattens that. I really want to fix this as described at the top of the heading; using the s-expression parser is not something I want to do this often!

def process_tags_sexp(%Arcology.Roam.Tag{} = tag), do: process_tags_sexp([tag])
def process_tags_sexp(_results = []), do: []
def process_tags_sexp(results) when is_nil(results), do: []

@doc "extract and process tags from s-expression"
def process_tags_sexp(results) do
  for tag <- results do
    {:ok, parsed} = Arcology.Roam.parse_sexp(tag.tags)
    parsed
  end
  |> Enum.flat_map(& &1)
  |> MapSet.new()
  |> MapSet.to_list()
end
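The flatten-and-deduplicate step can be tried on its own with already-parsed data. In this sketch TagsSketch is a hypothetical module name, the input is assumed to be one list of tag strings per row, and Enum.uniq/1 is swapped in for the MapSet round-trip so that the output order is predictable:

```elixir
defmodule TagsSketch do
  # Assumption: each element is the already-parsed list of tags for one row.
  def flatten_tags(parsed_rows) do
    parsed_rows
    |> Enum.flat_map(& &1)
    # Enum.uniq/1 replaces MapSet.new/1 |> MapSet.to_list/1 here: it also
    # drops duplicates, and it keeps first-seen order.
    |> Enum.uniq()
  end
end

TagsSketch.flatten_tags([["emacs", "elixir"], ["elixir", "garden"]])
```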

The only thing really in need of a lot of explanation is this merge_tags/1, currently unused but maintained for when my tags are properly normalized. merge_tags/1 takes a list of Arcology.Roam.Tag objects and returns a map of filename -> list of strings.

Simple stuff, but a bit obtuse. It's recursive: it uses Map.update to either create a list with a tag string in it, or update the map entry with the tag appended. It uses the nifty head/tail decomposition pattern that Elixir has[2].

#  @doc "Convert a list of Ecto results to a keyword list"
#  def merge_tags(the_list) do
#    merge_tags(the_list, %{})
#  end
#
#  defp merge_tags([%Tag{file: file, tag: tag} | rest], accumulator) do
#    merge_tags(
#      rest,
#      Map.update(
#        accumulator,
#        file|>dequote,
#        [tag|>dequote],
#        fn existing ->
#          existing ++ [tag|>dequote]
#        end
#      )
#    )
#  end
#
#  defp merge_tags([], accumulator), do: accumulator
end
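Since the real merge_tags/1 is commented out, here is a runnable sketch of the same head/tail recursion over plain maps. It assumes a normalized row shape of %{file: ..., tag: ...} with values already dequoted; MergeTagsSketch and that row shape are hypothetical, chosen so the example runs without Ecto or the parser.

```elixir
defmodule MergeTagsSketch do
  # Entry point: start the recursion with an empty accumulator map.
  def merge_tags(rows), do: merge_tags(rows, %{})

  # Head/tail decomposition: peel one row off the list, fold its tag
  # into the map under its file name, then recurse on the rest.
  defp merge_tags([%{file: file, tag: tag} | rest], acc) do
    merge_tags(rest, Map.update(acc, file, [tag], fn existing -> existing ++ [tag] end))
  end

  # Empty list: the accumulator is the finished map.
  defp merge_tags([], acc), do: acc
end

MergeTagsSketch.merge_tags([
  %{file: "a.org", tag: "emacs"},
  %{file: "a.org", tag: "elixir"},
  %{file: "b.org", tag: "garden"}
])
```

Map.update/4 takes care of both cases described above: the default [tag] is used the first time a file is seen, and the function appends on every later occurrence.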

Footnotes