phextv1.knowledge: phext-based-research

first things first - we need to add a category: foundations. this lays the groundwork for new knowledge to be presented in the results block.

foundations
-----------

first: titles should be written by humans and easily accessible, as they provide a succinct author-chosen name for the blob of research. no other process is going to be better at summarizing the research than this person or group. the title is the first line of the metadata block.

an abstract can be easily generated and helps readers determine if they want to spend time studying the paper in detail. it is clearly just metadata. ideally our gpt4-era tools should synthesize it.

the introduction should establish the context required for understanding it. although important, this is also metadata. we expect authors to insert research in relation to existing results, so our choice of hashing algorithm will be essential.

methods assist other researchers with reproducing the results. in a modern context, this is probably code. if there's no software available for your area of expertise, you should probably be building it.

results are the core output of research, and help the reader understand the impact of what was studied. ideally, all research should extend our knowledge graph. this is the value we seek. we thus call results by a simpler name: knowledge.

discussion helps orient research within the context of related / contemporary work. this could include feedback from reviewers and surveys of the knowledge graph conducted before and after research was done (prior to publishing). as useful as it is, discussion is just metadata.

tables and figures help keep the text stream free of obstacles to parsing and study. in the context of phext, which is a text-only format by design, these categories will often contain hyperlinks to media stored alongside phext. we could potentially use a base64 data encoding for binary information, though. we'll bundle these items as resources.

an appendix helps organize all of the new information presented. new terms and context that is helpful, but not directly related to the research, is given here. we will include the appendix within metadata.

citations are probably the worst part of 1970s era research. with modern hypertext systems, we have the opportunity to do better. the world uses links, so citations will be done via both intralinks within the context of phext and normal hyperlinks to external content. citations become a research for phexters to mine and collect.

thus, we will use the following 3-chunk of scrolls to describe all phext-based research. simply put, this means that we will treat scroll numbers as being written with base-3 indexing. as additional papers are slotted into a section, they will use scrolls in groups of 3 in this order. the particular format of your scrolls is left to you. we may revise this format in the future to standardize processing.

i suggest using something easy to parse, though, which is what this baseline document uses. We need to calculate 6 phext coordinates (A, B, C, D, E, and F). I've annotated which parts of the document are used for this generation process below.

* metadata (0)
  * title - A
  * abstract - B
  * intro
  * methods - C
* knowledge (1)
  * appendix
  * code - D
  * results - E
  * resources - F
* links (2)
  * discussion
  * references
  * citations

this is a reference implementation of phexty.ps1 in the code section of this phext. you can invoke it with the path to your raap file to produce a unique fingerprint of your research.

> ./phexty.ps1 rapp.phext
Input: raap.phext
Scrolls: 3
Title: raap - research as a phext
Phext Address: 1.1.2/31.98.7/44.95.4

Note that you can't refer to a phext address while editing a component that the address depends upon - so that will take some getting used to.

as a reminder, you can edit the following areas of a phext research document without impacting the fingerprint:
* intro
* foundations
* appendix
* discussion
* references
* citations

phextv1.code
------------

note: you can place comments in the code region prior to the start of a script. they will be ignored.
once you start extracting files, you should assume that all content will be exported.

# phextv1.file: extract.ps1
# Copyright: (c) 2024 Will Bickford
# License: MIT
# Purpose: extracts scripts from phext research documents
param(
  [string] $file,
  [switch] $force
)
if (-not (Test-Path $file)) {
    Write-Error "Unable to locate $file."
    exit(1)
}
$SCROLL_BREAK = [char]0x17
$scrolls = (Get-Content -raw $file) -split $SCROLL_BREAK
if ($scrolls.Count -ne 3) {
  Write-Error "Invalid phext research file"
  exit 1
}
$scroll1 = $scrolls[1] -split '\n'
$lines = $scroll1.Count
Write-Host "Scanning $lines for code..."

if ($lines -gt 1000000) {
  Write-Error "refusing to parse your spew"
  exit 1
}

$parsing = $false
$script = ""
$output = @{}
foreach ($line in $scroll1) {
  if ($line -match "^phextv1.code$") {
    $parsing = $true
  }
  if ($line -match "^phextv1.results$") {
    $parsing = $false
  }

  if ($line -match "^# phextv1.file: ") {
    $script = $line -replace "^# phextv1\.file: ",""    
    $output[$script] = @()    
    continue
  }

  if ($parsing) {
    $output[$script] += $line
    continue
  }
}

if ($output.Keys.Count -gt 0) {
  foreach ($script in $output.Keys|Sort-Object) {    
    $data = $output[$script]
    if ($force -or (-not (Test-Path $script))) {
      $data |Out-File -Encoding utf8 $script
      Write-Host "Wrote $($data.Count) lines to $script."
    } else {
      if (-not $force) {
        Write-Host "Warning: $script already exists. Use -force to overwrite."
      }
    }
  }
}

# phextv1.file: phexty.ps1
# ----------
# Copyright: (c) 2024 Will Bickford
# License: MIT
# Purpose: assists with calculating phext research addresses
#
# the phext address space for research is rooted at library=1, shelf=1, and series=1 (1.1.1/yyy.xxx).
#
# Dimensions: 1.1.1/A.B.C/D.E.F
#   A: phext-soundex(title)
#   B: phext-soundex(abstract)
#   C: phext-soundex(methods)
#   D: phext-soundex(code)
#   E: phext-soundex(results)
#   F: phext-soundex(resources)

param(
  [string] $file,
  [switch] $verbose
)
$raap = "1.1.2" # reserved by @wbic16
$SCROLL_BREAK = [char]0x17

if (-not (Test-Path $file)) {
    Write-Error "Unable to locate $file."
    exit(1)
}

# inspired by https://sites.rootsweb.com/~nedodge/transfer/soundexlist.htm
function soundex($byte, $lookup, $increment) {
  for ($i = 0; $i -lt $lookup.Length; ++$i)
  {
    if ($byte -eq $lookup[$i]) { return $increment }
  }
  return 0
}

# computes a phext sub-coordinate (recursive soundex modulo 100)
function phexty-soundex-v1($text) {
  if (-not $text) { return 1 }

  $trim = $text.trim()
  $max = $trim.Length
  if ($max -eq 0) { return 1 }

  $lower = $trim.ToLower()
  $value = 0
  $letter1 = "bpfv"
  $letter2 = "cskgjqxz"
  $letter3 = "dt"
  $letter4 = "l"
  $letter5 = "mn"
  $letter6 = "r"
  for ($i = 0; $i -lt $max; ++$i) {
    $byte = $lower[$i]
    $value += soundex $byte $letter1 1
    $value += soundex $byte $letter2 2
    $value += soundex $byte $letter3 3
    $value += soundex $byte $letter4 4
    $value += soundex $byte $letter5 5
    $value += soundex $byte $letter6 6
  }
  $value = $value % 100
  return $value
}

$scrolls = (Get-Content -raw $file) -split $SCROLL_BREAK
$scroll_count = $scrolls.Count

if ($scroll_count -ne 3) {
    Write-Error "Invalid phext research stream - you must provide exactly 3 scrolls (found: $($scroll.Count))."
    exit 1;
}

# ------------------------------------------------------------------------------------------------------------

Write-Host "input: $file"
Write-Host "scrolls: $scroll_count"

$have_title = $false
$have_abstract = $false
$have_methods = $false
$have_code = $false
$have_results = $false
$have_resources = $false

# ------------------------------------------------------------------------------------------------------------

$title = ""
$abstract = @()
$methods = @()

$parsing_abstract = 0
$parsing_intro = 0
$parsing_methods = 0
$BLOB_HEADER = "^----"

$scroll0 = $scrolls[0] -split '\n'

foreach ($line in $scroll0) {
  $trim = $line.trim()
  if ($trim.Length -eq 0) { continue }

  # parse title
  if (-not $have_title -and $line -match "^phextv1\.title:") {
    $have_title = $true
    $title = $line -replace "^.*title:","" -split '\n'
    $title = $title[0]
    continue
  }

  # parse abstract
  if (-not $have_abstract) {
    if ($line -match "^phextv1\.abstract") {
      $parsing_abstract = 1
      continue
    }
    if ($line -match $BLOB_HEADER) {
      $parsing_abstract = 2
      continue
    }

    if ($line -match "^phextv1\.intro") {
        $have_abstract = $true
        $parsing_intro = 1
        continue
    }

    if ($parsing_abstract -eq 2) {
        $abstract += $line
        continue
    }
  }

  # parse methods
  if (-not $have_methods) {
    if ($line -match "^phextv1\.methods") {
        $parsing_methods = 1
        continue
    }
    if ($line -match $BLOB_HEADER) {
      $parsing_methods = 2
      continue
    }

    if ($parsing_methods -eq 2) {
        $methods += $line
        continue
    }
  }
}
$have_methods = $methods.Count -gt 0

# ------------------------------------------------------------------------------------------------------------
$parsing_code = 0
$parsing_results = 0
$parsing_resources = 0

$code = @()
$results = @()
$resources = @()

$scroll1 = $scrolls[1] -split '\n'
foreach ($line in $scroll1) {
  $trim = $line.trim()
  if ($trim.Length -eq 0) { continue }

  if (-not $have_code) {
    if ($line -match "^phextv1\.code") {
      $parsing_code = 1
      continue
    }

    if ($parsing_code -eq 1 -and $line -match $BLOB_HEADER) {
      $parsing_code = 2
      continue
    }

    if ($line -match "^phextv1\.results$") {
      $have_code = $true
      $parsing_results = 1
      continue
    }

    if ($parsing_code -eq 2) {
      $code += $line
      continue
    }
  }

  if (-not $have_results) {
    if ($parsing_results -and $line -match $BLOB_HEADER) {
      $parsing_results = 2
      continue
    }

    if ($line -match "^phextv1\.resources$") {
      $have_results = $true
      $parsing_resources = 1
      continue
    }

    if ($parsing_results -eq 2) {
      $results += $line
    }
  }

  if (-not $have_resources) {
    if ($parsing_resources -eq 1 -and $line -match $BLOB_HEADER) {
        $parsing_resources = 2
        continue
    }

    if ($parsing_resources -eq 2) {
      $resources += $line
    }
  }
}
$have_results = $results.Count -gt 0
$have_resources = $resources.Count -gt 0

# phext research addresses are of the form:
# 1.1.2/collection.volume.book/chapter.section.scroll
# cn.vm.bk/ch.sn.sc
$ok = $true
if (-not $have_title) {
  Write-Error "missing title"
  $ok = $false
}
if (-not $have_abstract) {
  Write-Error "missing abstract"
  $ok = $false
}
if (-not $have_methods) {
  Write-Error "missing methods"
  $ok = $false
}
if (-not $have_code) {
  Write-Error "missing code"
  $ok = $false
}
if (-not $have_results) {
  Write-Error "missing results"
  $ok = $false
}
if (-not $have_resources) {
  Write-Host "warning: no resources found"
}
if (-not $ok) {
  exit 1
}

$cn = phexty-soundex-v1 $title

$abstract_text = $abstract -join '\n'
$vm = phexty-soundex-v1 $abstract_text

$book_text = $methods -join '\n'
$bk = phexty-soundex-v1 $book_text

$code_text = $code -join '\n'
$ch = phexty-soundex-v1 $code_text

$results_text = $results -join '\n'
$sn = phexty-soundex-v1 $results_text

$resources_text = $resources -join '\n'
$sc = phexty-soundex-v1 $resources_text

Write-Host "title: $title"
Write-Host "phext address: $raap/$cn.$vm.$bk/$ch.$sn.$sc"

if ($verbose) {
  Write-Host $abstract
}

phextv1.results
---------------

now that we have a blueprint for research formatting in phext, what we need next is a hashing algorithm that fits the bill. for now, i'm using a simple deterministic process. there is surely a better algorithm, but we don't need phext addresses to have any particular meaning - in fact it may help research to explicitly avoid existing forms of organization (to find novel connections we weren't aware of before).

here is the recipe - encoded in phexty.ps1 (above).

1. take all the words in each scroll, and convert them to lowercase text, then run them through a soundex (after removing all punctuation).
2. collapse those symbols into a single soundex by removing the commas and running it again (modulo arithmetic)
3. run this process for each of the main categories listed in the APA style guide.
4. use that information to generate a coordinate in phext subspace rooted at 1.1.2/*/*.
5. place your research there.

note: this reference phext exists in the 1.1.1/*/* root space. as we have not started exocortex construction yet, it is a pre-merged phext.

QED

phextv1.resources
-----------------

* see phexty.ps1 in the code section for a self-extracting powershell script :)

