Arcology Page Module

The Arcology Roam Models provide Ecto support for the arcology-db, including Arcology.Roam.File, which provides associations to the full data-model but is not designed to be programmed against. Since these are read-only concerns, a "smarter" structure can be built without having to worry about moving between the friendly interface and the data models. This structure exposes the relations in easy-to-use shapes: the titles are a list of strings instead of Arcology.Roam.Title structs, y'know, stuff like that. I do wonder how to implement some sort of "proxy" interface like this which can present better views of the data while (somehow) supporting write-backs.

defmodule Arcology.Page do
  alias Arcology.Roam.{Keyword, Reference, Tag, Title}
  require Logger

  defstruct [
    :file, :file_path, :route, :key,
    :keywords, :backlinks, :reference, :tags, :titles,
    :html, :html_status, :backlinks_html, :backlinks_status
  ]

  <<page_from_file>>

  <<page_resolve_route>>
  <<page_resolve_path>>

  <<page_html>>
  <<page_rewrite_local>>
end

Creating Page Structures

Getting an Arcology.Page from an Arcology.Roam.File is done with Arcology.Page.from_file/1 here. I tried as much as possible to lean on the pre-loaded entities on the File object; backlinks are loaded by a second query in [[file:arcology_roam.org][Arcology.Roam.Link]].files/1. It would be really nice to make this function accept more than a File object, but adding features to this is multiplicative until I re-implement it in a way that can be extended more easily.

@doc "return struct from Arcology.Roam.File"
def from_file(%Arcology.Roam.File{} = file) do
  # file_name = File.get_name(file)
  preloaded = file |> Arcology.Roam.File.preloads() 
  %Arcology.Page{
    file: preloaded,
    key: Keyword.from_file(preloaded, "ARCOLOGY_KEY"),
    keywords: Keyword.from_file(preloaded),
    backlinks: Arcology.Roam.Link.files(to: preloaded),
    reference: preloaded.reference |> Reference.to_map,
    tags: preloaded.tags |> Tag.process_tags_sexp,
    # tags: preloaded.tags |> Tag.merge_tags |> Map.get(file_name),
    titles: preloaded.titles |> Title.to_list,
  }
  |> Arcology.Page.resolve_path
  |> Arcology.Page.resolve_route
end

These functions break an ARCOLOGY_KEY into its site and page constituents. This is used by the page router.

def resolve_route(%Arcology.Page{} = page) do
  %Arcology.Page{page | route: split_route(page.key)}
end

def split_route(key) when is_nil(key), do: []
def split_route(key) do
  [site, path] = String.split(key, "/", parts: 2)
  [site: site, path: path]
end
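
For example, splitting this page's own key produces the keyword list the router consumes (the values match the assertions in the test below):

iex> Arcology.Page.split_route("arcology/page")
[site: "arcology", path: "page"]
iex> Arcology.Page.split_route(nil)
[]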

resolve_path/1 sticks the full file path into a Page, based on the dynamic configuration entry in Arcology Phoenix.

def resolve_path(%Arcology.Page{} = page) do
  %Arcology.Page{page | file_path: Arcology.Roam.File.get_name(page.file)}
end

Tests for the Arcology Pages use this page itself, and probably need to be updated when the syntax or structure of the project changes. That's fine. I'm going to sort of "paper over" the Arcology Roam Models, I think, and focus on testing this module and the other things which use the Ecto models. At the end of the day, the Ecto code is mostly automatically generated boiler-plate. Obviously, I need to test more than the "loading to Page model" code-paths, but I think I will do those in the Phoenix layers rather than the Ecto layers. The pages in this project are the test suite in the same way the code is, but the tests also use the metadata of the pages themselves, so they're going to be really sensitive to the overall project architecture, and this will probably be considered a mistake later on…[1]

defmodule ArcologyPageTestFromFile do
  use ExUnit.Case

  setup do
    :ok = Ecto.Adapters.SQL.Sandbox.checkout(Arcology.Repo)
  end

  test "from_file simple loading" do
    this_page =
      "arcology_page.org"
      |> Path.expand()
      |> Arcology.Roam.File.get()
      |> Arcology.Page.from_file()

    assert this_page.key == "arcology/page"
    assert length(this_page.keywords) == 1
    assert this_page.tags |> Enum.at(0) == "Arcology"
    assert length(this_page.backlinks) == 5
    assert length(this_page.titles) == 2

    assert this_page.route[:site] == "arcology"
    assert this_page.route[:path] == "page"

    assert this_page.file_path == "/home/rrix/org/arcology/arcology_page.org"

    file = this_page.file |> Arcology.Roam.File.preloads()
    assert length(file.links_from) == 13
  end
end

HTML Rendering

Here is where I start to hit a problem: I need to build a URL rewriter, and an interface for it. Pandoc does not rewrite URLs when it compiles the documents, so I choose to do this myself in Elixir, in spite of Elixir's bad reputation for string processing. It's happening in regular expressions and using IO lists under the hood, so it's not unreasonable to do it this way.[2]

I'm not really sure this should live here as opposed to in a view module or something? It seems weird to be rendering HTML so far from the edge, but it's "kind of" by design, after all. I guess I could push it all the way out to the edge and use Phoenix's view template caching instead, but golly.

But I'm still at an impasse on how to structure all of this. I need a router that lives in its own module, most likely, so that I can encapsulate the complexity involved in having subdomain-based routing in production but not in local development. I guess I could fake it with DNS and only have a single router, relying on my VPN DNS for development resolution… probably bad ideas, but largely feasible!

I still have only built the "local rewriter" in the existing MVP, which I pull in here.

This code uses Memoize to cache the result of a Panpipe call, because it's a fairly expensive process involving an external Linux process, dark arts, lazy evaluation, and a functional Pandoc installation. Arcology.Page.resolve_html returns a Page with the page HTML included, memoizing the HTML output with the Arcology.Roam.File hash for invalidation. Arcology.Page.resolve_backlinks_html returns a Page with the page's backlinks, but does not do proper cache invalidation right now: it needs to key on the hashes of all the files included in the backlink template, not the hash of the "target" file.

use Memoize

# `_hash` is unused in the body, but it is part of the memoization key:
# when the file's hash changes, the stale cache entry is simply never hit again.
defmemo compiled_html(path, _hash) do
  Panpipe.pandoc(
    input: path,
    to: :html,
    from: :org
  )
end

def resolve_html(%Arcology.Page{} = page) do
  page = pre_process_page_for_pandoc(page)
  case res = compiled_html(page.file_path, Arcology.Roam.File.get_hash(page.file)) do
    {:ok, html} -> %Arcology.Page{page | html_status: :raw, html: html}
    {:error, _} -> res
  end
end

# not sure entirely what the args should be here yet.
defmemo compiled_backlinks(links, _hash) do
  preloaded = links |> Arcology.Repo.preload(from_file: :titles)
  content =
    for link <- preloaded do
      path = Arcology.Roam.File.get_name(link.from_file)
      title = Enum.at(link.from_file.titles, 0).title
      content = Arcology.Roam.Link.get_content(link)

      """
      ,*** in [[file:#{path}][#{title}]]
      ,#+begin_quote
      #{content}
      ,#+end_quote
      """
    end
    |> Enum.join("\n")

  Panpipe.pandoc(content,
    from: :org,
    to: :html,
    metadata: "pagetitle=''",
    standalone: true
  )
end

def collect_link_hashes(%Arcology.Page{backlinks: backlinks}) do
  backlinks
  # each backlink carries a :from_file, an Arcology.Roam.File
  |> Enum.map(&Map.get(&1, :from_file))
  |> Enum.map(&Arcology.Roam.File.get_hash(&1))
  # de-duplicate the hashes by round-tripping through a MapSet
  |> MapSet.new()
  |> MapSet.to_list()
end

def resolve_backlinks_html(%Arcology.Page{} = page) do
  backlink_hashes = collect_link_hashes(page)

  case res = compiled_backlinks(page.backlinks, backlink_hashes) do
    {:ok, html} -> %Arcology.Page{page | backlinks_status: :raw, backlinks_html: html}
    {:error, _} -> res
  end
end

pre_process_page_for_pandoc/1 is a last-ditch effort to make changes to the org-mode source before rendering in Pandoc; it just calls into the clean_up_org_fc/1 function, which tries to make my SRS cards legible.

@doc "this works by returning a modified Page with a new file_path!"
def pre_process_page_for_pandoc(%Arcology.Page{} = page) do
  tmp_file_name = "/tmp/arcology-" <> (:crypto.hash(:sha256, page.key) |> Base.url_encode64()) <> ".org"
  File.open(page.file_path, [:read], fn file ->
    org_string = IO.read(file, :all) |> clean_up_org_fc()
    File.open(tmp_file_name, [:write], fn tmpfile ->
      IO.write(tmpfile, org_string)
    end)
  end)
  %Arcology.Page{page | file_path: tmp_file_name}
end

The thing I am most interested in checking here is cache eviction, and that's gonna be a fucking pain in the ass, I guess. This implicitly tests the arcology-db codepaths that generate hashes, too. I'm sure that using System.cmd in tests to shell out to a fucking shell pipeline is a pattern that is fraught with chaos, but for now it'll do. As long as I'm reaching for the system hashing utility, I might as well reach for a string-processing wrench while I'm in there! I do like these sorts of functional "reach in to the system" types of tests, validating against the actual state of the files on-disk wherever possible. Ultimately, the cost of adding these utilities to every development environment is not worth spending a lot of time worrying about.

defmodule ArcologyPageTestPandocCompiler do
  use ExUnit.Case

  setup do
    :ok = Ecto.Adapters.SQL.Sandbox.checkout(Arcology.Repo)
  end

  test "collect_link_hashes returns reasonable data" do
    this_page =
      "arcology_page.org"
      |> Path.expand()
      |> Arcology.Roam.File.get()
      |> Arcology.Page.from_file()

    hashes = Arcology.Page.collect_link_hashes(this_page)
    cmd = ~s(git ls-files | grep 'org$' | xargs sha1sum | awk '{print $1}')
    {cmd_out, 0} = System.cmd("bash", ["-c", cmd])
    system_hashes = cmd_out |> String.split("\n")

    assert Enum.all?(hashes, fn hash -> Enum.member?(system_hashes, hash) end)
  end
end

Now, this provides the basic HTML, but it doesn't have the "smart" links in it, the ones based on ARCOLOGY_KEY keywords in the document rather than local file paths. This string is of the format site/path, where site is one of a number of simple mnemonics I use, which map one-to-one with domains I own.

localize_urls/1 implements a simple state-machine around Page objects; an html_status key in the Page tracks whether the links have already been localized. The is_binary implementation of localize_urls runs a regexp search-and-replace, calling into rewrite_local/2, which does the actual rewrite. Right now this has to do a database query, but I intend to rewrite it to not need that; perhaps by passing in all of the keywords pulled from the database at once, a single query that can be cached between the possibly numerous calls to rewrite_local/2.

rewrite_local/2 is the point where I will swap in a "production" URL generator eventually. This code creates a domain-less absolute URL of the form /${ARCOLOGY_KEY}.html; in the production case it'll be ${DOMAIN_FOR_KEY}/${path}.html, where the domain comes from the one-to-one mapping mentioned above (keyed on the site part of the arcology key) and the rest is the path portion. I add html suffixes, but I may not in the future; this is largely an aesthetic choice. Oh, and if there is not an ARCOLOGY_KEY for a linked org-mode file, a stub link is generated with a CSS class on it.
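
rewrite_local/2 itself isn't reproduced in this section, so here is a rough sketch of the shape it takes; the two-clause signature and the "missing-page" class name are assumptions for illustration, not the real code:

# hypothetical sketch of the local rewriter described above
def rewrite_local(nil, link_text) do
  # no ARCOLOGY_KEY on the linked file: emit a stub link with a CSS class
  ~s(<a class="missing-page">#{link_text}</a>)
end

def rewrite_local(arcology_key, link_text) do
  # local MVP case: a domain-less absolute URL of the form /${ARCOLOGY_KEY}.html
  ~s(<a href="/#{arcology_key}.html">#{link_text}</a>)
end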

def with_localized_html(%Arcology.Page{html_status: nil} = page), do: localize_urls(page)

@doc "This function changes the pandoc-output URLs in to site/key URLs for local wiki"
def localize_urls(html, relative_to) when is_binary(html) do
  Logger.debug("string")

  Arcology.Page.expand_link_paths(html, relative_to)
  |> Arcology.LinkRouter.Local.normalize_urls()
  |> Arcology.Page.clean_up_org_fc()
end

def localize_urls(%Arcology.Page{html_status: nil} = page) do
  Logger.debug("nil")

  page
  |> resolve_html()
  |> resolve_backlinks_html()
  |> localize_urls()
end

def localize_urls(%Arcology.Page{html_status: :localized, backlinks_status: :localized} = page) do
  Logger.debug("pass")
  page
end

def localize_urls(%Arcology.Page{html: html, html_status: :raw} = page) when is_binary(html) do
  Logger.debug("localized")
  %Arcology.Page{page |
     html: html |> localize_urls(page.file_path),
     html_status: :localized,
     backlinks_html: page.backlinks_html |> localize_urls(page.file_path),
     backlinks_status: :localized
  }
end
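
Putting the state machine together, rendering a page end-to-end might look like this (a usage sketch built from the functions defined above):

page =
  "arcology_page.org"
  |> Path.expand()
  |> Arcology.Roam.File.get()
  |> Arcology.Page.from_file()
  |> Arcology.Page.with_localized_html()

# page.html_status       => :localized
# page.backlinks_status  => :localized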

expand_link_paths/2 is responsible for re-writing links from relative file URIs to absolute paths for the link rewriter. The use of the sigil strings is a bit unfortunate; I chose to go this way because escaping quote marks is somehow less aesthetically pleasing to me. Sorry.

This works, for the most part. Right now the backlinks HTML can sometimes render incorrectly, where links in the content will not resolve properly. This is fine: you can click the title to click through, and then the links work. When this code works, it should be moved into the memoize calls for resolve_html and resolve_backlinks_html defined above.

def expand_link_paths(html, relative_path) do
  Regex.replace(
    ~r/<a href="([~\.0-9a-zA-Z_\- \/]+\.org)">/,
    html,
    fn _match, path ->
      expanded_path =
        Path.expand(
          path,
          Path.dirname(relative_path)
        )

      ~s(<a href=") <> expanded_path <> ~s(">)
    end
  )
end

defmodule TestExpandLinkPaths do
  use ExUnit.Case

  setup do
    :ok = Ecto.Adapters.SQL.Sandbox.checkout(Arcology.Repo)
  end

  test "best-case expand_link_paths validations" do
    html = """
    <a href="bingus.org">bingus</a>
    <a href="../bangus.org">bangus</a>
    <a href="../beep/bongus.org">bongus</a>
    """

    relative_path = "/home/wonka/factory/"

    expanded = Arcology.Page.expand_link_paths(html, relative_path)

    assert expanded =~ "/home/wonka/factory/bingus.org"
    assert expanded =~ "/home/wonka/bangus.org"
    assert expanded =~ "/home/wonka/beep/bongus.org"
  end
end

clean_up_org_fc/1 takes an org-mode input and removes all the SRS metadata from it. org-fc uses a specialized markup for "clozing" parts of the text for quizzing, and stores metadata in a drawer under the entry, which Pandoc renders by default. It would be nice to do something fancy with the clozes, but for now I want to just make them legible. This is a pretty awful soup of escapes and regular expressions, though. The @@html syntax is used by Org to escape the HTML.[3]

def clean_up_org_fc(input_org) do
  # strip the :REVIEW_DATA: drawers that org-fc stores under each card
  without_drawers = Regex.replace(
    ~r/:REVIEW_DATA:.*:END:/smU,
    input_org,
    &normalize_individual_org_fc(&1, &2)
  )

  # rewrite {{text}{hint}@position} clozes into styled spans
  without_clozes = Regex.replace(
    ~r/{{([^}]+)}({.*})?@([0-9])}/uU,
    without_drawers,
    &normalize_cloze(&1, &2, &3, &4)
  )

  without_clozes
end

def normalize_individual_org_fc(_match, _capture), do: ""

def normalize_cloze(_match, first, optional_hint, position) do
  ~s(@@html:<span class="cloze" data-cloze=#{position} title="#{optional_hint}">#{first}</span>@@)
end
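
For illustration, here is what the cloze rewrite does to a hint-less cloze (a made-up snippet, not a real card):

input = "Lorem {{ipsum}@0} dolor"
Arcology.Page.clean_up_org_fc(input)
# => Lorem @@html:<span class="cloze" data-cloze=0 title="">ipsum</span>@@ dolor

Note that when a hint is present, the optional capture group includes its surrounding braces, so a cloze like {{ipsum}{a hint}@0} ends up with title="{a hint}" in the span.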

NEXT paragraph anchors within text bodies

NAME keywords may do this, but make sure.

Footnotes


[1] open thread on whether this idea of Literate Programming meta-programming is good or not. might defeat the purpose, making the tests really brittle and making me unwilling to move code around or re-structure the doc to be more accessible.