doc/unitproc/bkshuf/article.tm

   1 <TeXmacs|2.1.1>
   2
   3 <style|generic>
   4
   5 <\body>
   6   <doc-data|<doc-title|bkshuf: A Shuf-Like Utility with Pre-Image Resistance
   7   and Relative Order Preservation for Random Sampling of Long
   8   Lists>|<doc-author|<author-data|<author-name|Steven Baltakatei
   9   Sandoval>>>|<doc-date|2023-02-14T13:56+00>|<doc-misc|CC BY-SA 4.0>>
  10
  11   <section|Summary>
  12
  13   <shell|bkshuf> is a <shell|shuf>-like utility designed to output randomly
  14   sized groups of lines with a group size distribution centered around some
  15   characteristic value.
  16
  17   <section|Objective>
  18
  19   The author desires to create a <shell|shuf>-like utility named
  20   <shell|bkshuf> to mix line lists in order to produce output line lists with
  21   the following somewhat conflicting properties:
  22
  23   <\description>
  24     <item*|Pre-image resistance (PIR)>An output line's position should not
  25     contain information about its input line position.
  26
  27     <item*|Relative order preservation (ROP)>Two neighboring lines in the
  28     input stream should have a high probability of remaining neighbors in the
  29     output stream.
  30   </description>
  31
  32   The objective is to improve the value of a short random scan of a small
  33   fraction of a potentially large input list; output that demonstrates ROP as
  34   well as some degree of PIR may achieve this objective. In contrast, the
  35   <shell|shuf> utility provides PIR but no ROP: a line's neighbor in the
  36   output of <shell|shuf> is equally likely to be any other random line from
  37   the input.\
  38
  39   In other words, output produced by <shell|bkshuf> should group together
  40   sequential segments of the input lines in order to partially preserve
  41   relationships that may exist between sequential files. For example, this
  42   could be done by jumping to a random position in the input lines, consuming
  43   (i.e. reading, outputting, and marking a line not to be read again) some
  44   amount of sequential lines, then repeating the process until every line is
  45   consumed. The amount of sequential lines to read between jumps affects how
  46   well the above desired properties are satisfied.
  47
  48   The objective of <shell|bkshuf> is not to completely prevent the
  49   possibility of reassembling the input given the output. Additionally, a
  50   valuable property desired of <shell|bkshuf> is output which demonstrates
  51   sufficiently high PIR compared to ROP such that only a short (compared to
  52   the logarithm of the input list size) sequential scan of the output list
  53   from a random starting position is required before a jump to a new group is
  54   is encountered; this would permit the overal contents of very large input
  55   line lists to be sampled.
  56
  57   <section|Design>
  58
  59   <subsection|Definitions>
  60
  61   <\eqnarray*>
  62     <tformat|<table|<row|<cell|l>|<cell|:>|<cell|<text|number of
  63     lines>>>|<row|<cell|l<rsub|<text|in>>>|<cell|:>|<cell|<text|input line
  64     count>>>|<row|<cell|l<rsub|<text|out>>>|<cell|:>|<cell|<text|output line
  65     count>>>|<row|<cell|c>|<cell|:>|<cell|<text|target group
  66     count>>>|<row|<cell|s>|<cell|:>|<cell|<text|target group
  67     size>>>|<row|<cell|p<rsub|<text|seq>><rsub|<text|>>>|<cell|:>|<cell|<text|probability
  68     to include next sequential line>>>|<row|<cell|s<around*|(|l<rsub|<text|in>,0>|)>>|<cell|:>|<cell|<text|<math|target
  69     group size parameter>>>>|<row|<cell|l<rsub|<text|in>,0>>|<cell|:>|<cell|<text|input
  70     line count parameter>>>>>
  71   </eqnarray*>
  72
  73   <subsection|Process>
  74
  75   <\enumerate-numeric>
  76     <item>Acquire and count input lines (via <shell|/dev/stdin> or positional
  77     arguments).
  78
  79     <item>Calculate line count <math|l<rsub|<text|in>>> .
  80
  81     <item>Calculate target group size <math|s>.
  82
  83     <item>Select random unconsumed input line and consume it to
  84     output.<label|jump-to-random>
  85
  86     <item>Consume the next sequential line with probability
  87     <math|p<rsub|<text|seq>>>. Otherwise if some input lines remain
  88     unconsumed, go to step <reference|jump-to-random>. Otherwise, exit.
  89   </enumerate-numeric>
  90
  91   <subsection|Parameter analysis>
  92
  93   <subsubsection|Target group size calculation>
  94
  95   The simultaneous presence of ROP and PIR properties in the output depends
  96   upon the amount of sequential lines that are read before <shell|bkshuf>
  97   jumps to a new random position in the input list. This amount is the
  98   <em|target group size>, <math|s>; it is the \Ptarget\Q since <math|s>
  99   represents the average of a distribution of group sizes that may be
 100   selected, not a single group size. In this analysis, the total number of
 101   lines in the input list is <math|l<rsub|<text|in>>>. For small input line
 102   counts, (e.g. <math|l<rsub|<text|in>>\<cong\>10>) the target group size
 103   should be nearly one (e.g. <math|s\<cong\>1>) since group sizes any larger
 104   than this would have almost no PIR (e.g. a group size of <math|s=8> for
 105   <math|l<rsub|<text|in>>=10> would be 80% identical to the input). For
 106   modest line input counts (e.g. <math|l<rsub|<text|in>>\<cong\>100>), the
 107   target group size may be allowed to be larger, such as a tenth of the input
 108   line count (e.g. <math|s\<cong\>10>); this would provide some PIR
 109   (approximately <math|10!> permutations between the approximately
 110   <math|<frac|l<rsub|<text|in>>|s>\<cong\><frac|100|10>\<cong\>10> groups)
 111   while each line in groups around size <math|10> would have a low
 112   probability of not being next to its neighbor (<math|8> of the 10 lines
 113   would retain the same two neighbors while the two ends would retain one
 114   each). For very large input line counts (e.g.
 115   <math|l<rsub|<text|in>>\<cong\>1<separating-space|0.2em>000<separating-space|0.2em>000>),
 116   however, breaking up and randomizing the input into ten groups of
 117   <math|100<separating-space|0.2em>000> offers very little PIR; the benefit
 118   of the very high ROP is also lost since sequential scanning of tens of
 119   thousands of lines is required before a random jump to a new group may be
 120   encountered; therefore, the target group size should be a much smaller
 121   fraction of <math|l<rsub|<text|in>>>, (e.g. <math|s\<cong\>20>) while still
 122   increasing. The relationship between a desireable target group size
 123   <math|s> and the input line count <math|l<rsub|<text|in>>> is non-linear.
 124   The author believes a reasonable approach is to scale the group size to the
 125   logarithm of input line count.
 126
 127   <hgroup|Figure <reference|fig ex-plot-s>> shows an example plot of
 128   <math|s<around*|(|l<rsub|<text|in>>|)>> that is tuned to achieve a target
 129   group size of <math|s<around*|(|l<rsub|<text|in>>=10<rsup|6>|)>=25> for an
 130   input list length of <math|l<rsub|<text|in>>=10<rsup|6>> lines.\
 131
 132   <\big-figure|<with|gr-mode|<tuple|edit|point>|gr-frame|<tuple|scale|1.18922cm|<tuple|0.299593gw|0.120812gh>>|gr-geometry|<tuple|geometry|1par|0.6par>|gr-grid|<tuple|cartesian|<point|0|0>|1>|gr-grid-old|<tuple|cartesian|<point|0|0>|1>|gr-edit-grid-aspect|<tuple|<tuple|axes|none>|<tuple|1|none>|<tuple|10|none>>|gr-edit-grid|<tuple|cartesian|<point|0|0>|1>|gr-edit-grid-old|<tuple|cartesian|<point|0|0>|1>|gr-grid-aspect-props|<tuple|<tuple|axes|#808080>|<tuple|1|#c0c0c0>|<tuple|10|pastel
 133   blue>>|gr-grid-aspect|<tuple|<tuple|axes|#808080>|<tuple|1|#c0c0c0>|<tuple|10|pastel
 134   blue>>|magnify|1.18920711463847|gr-auto-crop|false|<graphics||<math-at|5|<point|-0.221848749616356|1.0>>|<math-at|10|<point|-0.397940008672038|2.0>>|<math-at|15|<point|-0.397940008672038|3.0>>|<math-at|20|<point|-0.397940008672038|4.0>>|<math-at|25|<point|-0.397940008672038|5.0>>|<math-at|30|<point|-0.397940008672038|6.0>>|<math-at|s|<point|0.0719460474170896|6.34360008343183>>|<point|0|0.2>|<point|6|5>|<math-at|10<rsup|1>|<point|1.0|-0.4>>|<math-at|10<rsup|2>|<point|2.0|-0.4>>|<math-at|10<rsup|3>|<point|3.0|-0.4>>|<math-at|10<rsup|4>|<point|4.0|-0.4>>|<math-at|10<rsup|5>|<point|5.0|-0.4>>|<math-at|10<rsup|6>|<point|6.0|-0.4>>|<math-at|1|<point|1.0|-0.8>>|<math-at|2|<point|2.0|-0.8>>|<math-at|3|<point|3.0|-0.8>>|<math-at|4|<point|4.0|-0.8>>|<math-at|5|<point|5.0|-0.8>>|<math-at|6|<point|6.0|-0.8>>|<math-at|x|<point|7.0|-0.8>>|<math-at|l<rsub|<text|in>>|<point|7.0|-0.4>>|<point|5.0|3.5>|<point|4.0|2.3>|<point|3.0|1.4>|<point|2.0|0.7>|<point|1.0|0.3>>>>
 135     <label|fig ex-plot-s>A plot of a possible function that relates target
 136     group size <math|s> and input lines <math|l<rsub|<text|in>>> that provide
 137     some ROP and PIR. The function is tuned to achieve
 138     <math|s<around*|(|l<rsub|<text|in>>=10<rsup|6>|)>=25>.\
 139   </big-figure>
 140
 141   The following is a set of equations that are used to derive a definition
 142   for <math|s<around*|(|l<rsub|<text|in>>|)>> that satisfies the plot in
 143   <hgroup|Figure <reference|fig ex-plot-s>>.\
 144
 145   <\eqnarray*>
 146     <tformat|<table|<row|<cell|x>|<cell|=>|<cell|<text|log>
 147     <around*|(|l<rsub|<text|in>>|)>=<frac|ln
 148     <around*|(|l<rsub|<text|in>>|)>|<text|ln>
 149     <around*|(|10|)>><eq-number><label|eq
 150     rel-x-lin>>>|<row|<cell|10<rsup|x>>|<cell|=>|<cell|l<rsub|<text|in>>>>|<row|<cell|x<rsub|0>>|<cell|=>|<cell|<text|log>
 151     <around*|(|l<rsub|<text|in>,0>|)>=<frac|ln
 152     <around*|(|l<rsub|<text|in>,0>|)>|<text|ln>
 153     <around*|(|10|)>><eq-number><label|eq
 154     rel-x0-lin0>>>|<row|<cell|10<rsup|x<rsub|0>>>|<cell|=>|<cell|l<rsub|<text|in>,0>>>|<row|<cell|>|<cell|>|<cell|>>|<row|<cell|s<around*|(|x|)>>|<cell|=>|<cell|<around*|(|k**x|)><rsup|2>+1<eq-number><label|eq
 155     gsize-model>>>|<row|<cell|s<around*|(|x=6|)>=25>|<cell|=>|<cell|k<rsup|2>\<cdot\><around*|(|6|)><rsup|2>+1>>|<row|<cell|25>|<cell|=>|<cell|k<rsup|2>\<cdot\><around*|(|36|)>+1>>|<row|<cell|s<around*|(|x<rsub|0>|)>>|<cell|=>|<cell|<around*|(|k*x<rsub|0>|)><rsup|2>+1<eq-number><label|eq
 156     gsize-param-rel>>>|<row|<cell|<frac|s<around*|(|x<rsub|0>|)>-1|x<rsub|0><rsup|2>>>|<cell|=>|<cell|k<rsup|2>>>|<row|<cell|k>|<cell|=>|<cell|<sqrt|<frac|s<around*|(|x<rsub|0>|)>-1|x<rsub|0><rsup|2>>>>>|<row|<cell|k>|<cell|=>|<cell|<frac|<sqrt|s<around*|(|x<rsub|0>|)>-1>|x<rsub|0>>>>|<row|<cell|k<rsup|2>>|<cell|=>|<cell|<frac|s<around*|(|x<rsub|0>|)>-1|x<rsub|0><rsup|2>><eq-number><label|eq
 157     gsize-const-ksq>>>|<row|<cell|s<around*|(|l<rsub|<text|in>>|)>>|<cell|=>|<cell|<around*|(|<frac|s<around*|(|x<rsub|0>|)>-1|x<rsub|0><rsup|2>>|)>\<cdot\><around*|(|<frac|<text|ln><around*|(|l<rsub|<text|in>>|)>|<text|ln>
 158     <around*|(|10|)>>|)><rsup|2>+1>>|<row|<cell|>|<cell|=>|<cell|<around*|(|<frac|<text|ln>
 159     <around*|(|10|)>|ln <around*|(|l<rsub|<text|in>,0>|)>>|)><rsup|2>\<cdot\><around*|(|s<around*|(|x<rsub|0>|)>-1|)>\<cdot\><around*|(|<frac|<text|ln><around*|(|l<rsub|<text|in>>|)>|<text|ln>
 160     <around*|(|10|)>>|)><rsup|2>+1>>|<row|<cell|s<around*|(|l<rsub|<text|in>>|)>>|<cell|=>|<cell|<around*|(|s<around*|(|l<rsub|<text|in>,0>|)>-1|)>\<cdot\><around*|(|<frac|<text|ln><around*|(|l<rsub|<text|in>>|)>|ln
 161     <around*|(|l<rsub|<text|in>,0>|)>>|)><rsup|2>+1>>|<row|<cell|s<around*|(|l<rsub|<text|in>>|)>>|<cell|=>|<cell|<around*|(|<frac|s<around*|(|l<rsub|<text|in>,0>|)>-1|<around*|[|ln
 162     <around*|(|l<rsub|<text|in>,0>|)>|]><rsup|2>>|)>\<cdot\><around*|[|<text|ln><around*|(|l<rsub|<text|in>>|)>|]><rsup|2>+1<eq-number><label|eq
 163     gsize-lin>>>>>
 164   </eqnarray*>
 165
 166   Equation <reference|eq gsize-model> defines a quadratic equation for the
 167   linear range <math|s> and the logarithmic domain <math|x>. <math|x> is
 168   defined in terms of <math|l<rsub|<text|in>>> via a domain transformation
 169   defined by <hgroup|Equation <reference|eq rel-x-lin>>. The result is
 170   <hgroup|Equation <reference|eq gsize-lin>> which defines
 171   <math|s<around*|(|l<rsub|<text|in>>|)>> as a function of
 172   <math|l<rsub|<text|in>>> and parameters
 173   <math|s<around*|(|l<rsub|<text|in>,0>|)>> and <math|l<rsub|<text|in>,0>>.
 174   The parameters define how quickly or slowly the quadratic equation grows.
 175   In other words, if a user wishes for a <math|1<separating-space|0.2em>000<separating-space|0.2em>000>
 176   line input to be split into groups each containing, on average, <math|25>
 177   lines, then they should plug in <math|l<rsub|<text|in>,0>=1<separating-space|0.2em>000<separating-space|0.2em>000>
 178   and <math|s<around*|(|l<rsub|<text|in>,0>|)>=25> into <hgroup|Equation
 179   <reference|eq gsize-lin>> as is done in <hgroup|Equation <reference|eq
 180   gsize-ex-1>>. This equation can then be used to calculate target group
 181   sizes <math|s> as a function of other input line counts
 182   <math|l<rsub|<text|in>>> besides <math|l<rsub|<text|in>>=1<separating-space|0.2em>000<separating-space|0.2em>000>.
 183   For example, plugging <math|l<rsub|<text|in>>=500> into <hgroup|Equation
 184   <reference|eq gsize-ex-1>> yields <hgroup|Equation <reference|eq
 185   gsize-ex-1-lin500>> which specifies a target group size of
 186   <math|5.85629\<cong\>6>.
 187
 188   <\eqnarray*>
 189     <tformat|<table|<row|<cell|s<around*|(|l<rsub|<text|in>>|)>>|<cell|=>|<cell|<around*|(|<frac|s<around*|(|l<rsub|<text|in>,0>|)>-1|<around*|[|ln
 190     <around*|(|l<rsub|<text|in>,0>|)>|]><rsup|2>>|)>\<cdot\><around*|[|<text|ln><around*|(|l<rsub|<text|in>>|)>|]><rsup|2>+1>>|<row|<cell|s<around*|(|l<rsub|<text|in>>|)>>|<cell|=>|<cell|<around*|(|<frac|25-1|<around*|[|<text|ln
 191     ><around*|(|1<separating-space|0.2em>000<separating-space|0.2em>000|)>|]><rsup|2>>|)>\<cdot\><around*|[|<text|ln><around*|(|l<rsub|<text|in>>|)>|]><rsup|2>+1<eq-number><label|eq
 192     gsize-ex-1>>>|<row|<cell|>|<cell|>|<cell|>>|<row|<cell|s<around*|(|l<rsub|<text|in>>=500|)>>|<cell|=>|<cell|<around*|(|<frac|25-1|<around*|[|<text|ln
 193     ><around*|(|1<separating-space|0.2em>000<separating-space|0.2em>000|)>|]><rsup|2>>|)>\<cdot\><around*|[|<text|ln><around*|(|500|)>|]><rsup|2>+1<eq-number><label|eq
 194     gsize-ex-1-lin500>>>|<row|<cell|s<around*|(|l<rsub|<text|in>>=500|)>>|<cell|=>|<cell|5.85629>>>>
 195   </eqnarray*>
 196
 197   \;
 198
 199   <subsubsection|Jump from expected value>
 200
 201   A method <shell|bkshuf> may employ to decide when read the next sequential
 202   unconsumed input line is to simply do so with probability
 203   <math|p<rsub|<text|seq>>> such that the expected value of the average group
 204   size trends towards <math|s>.
 205
 206   <\eqnarray*>
 207     <tformat|<table|<row|<cell|p<rsub|<text|seq>>>|<cell|=>|<cell|<around*|(|1-p<rsub|<text|jump>>|)>>>|<row|<cell|p<rsub|<text|jump>>>|<cell|=>|<cell|1-p<rsub|<text|seq>>>>|<row|<cell|s>|<cell|=>|<cell|<frac|1|p<rsub|<text|jump>>>=<frac|1|1-p<rsub|<text|seq>>><eq-number>>>|<row|<cell|s>|<cell|=>|<cell|<frac|1|1-p<rsub|<text|seq>>>>>|<row|<cell|1-p<rsub|<text|seq>>>|<cell|=>|<cell|<frac|1|s>>>|<row|<cell|p<rsub|<text|seq>>-1>|<cell|=>|<cell|<frac|-1|s>>>|<row|<cell|p<rsub|<text|seq>>>|<cell|=>|<cell|1-<frac|1|s<around*|(|l<rsub|<text|in>>|)>><eq-number><label|eq
 208     pseq-from-s-lin>>>|<row|<cell|p<rsub|<text|jump>>>|<cell|=>|<cell|<frac|1|s<around*|(|l<rsub|<text|in>>|)>><eq-number><label|eq
 209     pjump-from-s-lin>>>|<row|<cell|>|<cell|>|<cell|>>|<row|<cell|p<rsub|<text|seq>><around*|(|l<rsub|<text|in>>|)>>|<cell|=>|<cell|1-<around*|[|<around*|(|<frac|s<around*|(|l<rsub|<text|in>,0>|)>-1|<around*|[|ln
 210     <around*|(|l<rsub|<text|in>,0>|)>|]><rsup|2>>|)>\<cdot\><around*|[|<text|ln><around*|(|l<rsub|<text|in>>|)>|]><rsup|2>+1|]><rsup|-1><eq-number><label|eq
 211     pseq-from-s-lin-exp>>>>>
 212   </eqnarray*>
 213
 214   <subsubsection|Jump from random variate of inverse gaussian distribution>
 215
 216   Another method <shell|bkshuf> may employ to decide when to read the next
 217   sequential unconsumed input line is to use an inverse gaussian
 218   distribution. This may be done by generating from the distribution a float
 219   sampled from the inverse gaussian with range 0 to infinity with mean
 220   <math|\<mu\>> whenever a new random position in the input list is selected;
 221   the float is rounded to the nearest integer.<\footnote>
 222     See <name|Michael, John R.> "Generating Random Variates Using
 223     Transformations with Multiple Roots". 1976.
 224     <hlink|https://doi.org/10.2307/2683801|https://doi.org/10.2307/2683801> .
 225   </footnote> Then, after consuming an input line, this integer is
 226   decremented by one and another sequential line is consumed provided the
 227   integer does not become less than or equal to zero. The inverse gaussian
 228   distribution requires specifying the mean <math|\<mu\>> and the shape
 229   parameter <math|\<lambda\>>; a higher <math|\<lambda\>> results in a
 230   greater spread. An upper bound may also be specified since the distribution
 231   has none except for that imposed by its programming implementation.
 232
 233   The result of using an inverse gaussian distribution is an output with
 234   potentially much more regular group sizes than using the previously
 235   mentioned expected value method. However, the implementation of the inverse
 236   gaussian sampling operation described by (Michael, 1976) requires several
 237   exponent calculations and a square root calculation in addition to various
 238   multiplication and division operations. If sufficient processing power is
 239   available, this may not necessarily be an issue.
 240
 241   <subsubsection|Output structure>
 242
 243   Regardless of whether group sizes are determined by the expected value
 244   method or using variates of an inverse gaussian distribution, mimicking the
 245   <shell|shuf> property of all input lines being present in the output,
 246   albeit rearranged, results in a side effect: the first output lines are
 247   more likely to contain groups with uninterrupted sequence runs (high ROP)
 248   while groups in the last output lines are almost certain to contain
 249   sequence jumps within a group (less ROP). The reason for this is that
 250   <shell|bkshuf>, when it encounters an input line that has already been
 251   consumed, will skip to the next available input line. The decision could be
 252   made to skip to a new random line, but, it is simpler to simply read the
 253   next available input line. The author's original intention of sampling only
 254   a short initial portion of the output is compatible with the behavior that
 255   ROP is preserved mostly at the beginning of the output.
 256
 257   <section|Version History>
 258
 259   <\big-table|<tabular|<tformat|<table|<row|<cell|Version
 260   No.>|<cell|Date>|<cell|Path>|<cell|Description>>|<row|<cell|<verbatim|0.0.1>>|<cell|2023-02-14>|<cell|<verbatim|unitproc/bkshuf>>|<cell|Initial
 261   draft implemented in <name|Bash>.>>>>>>
 262     A table listing versions of <shell|bkshuf>.
 263   </big-table>
 264
 265   <\description>
 266     <item*|v0.0.1>Initial implementation in <shell|bash> <verbatim|5.1.16>
 267     with <shell|bc> <verbatim|1.07.1> and <name|GNU Coreutils>
 268     <verbatim|8.32> and tested on <name|Pop!_OS> <verbatim|22.04 LTS>. Saved
 269     to the author's <name|BK-2020-03> repository<\footnote>
 270       See commit <hlink|<verbatim|080ea4c>|https://gitlab.com/baltakatei/baltakatei-exdev/-/blob/080ea4c0ff0d4e6b5ce86f664fa6645c1cb02bf0/unitproc/bkshuf>
 271       at <hlink|https://gitlab.com/baltakatei/baltakatei-exdev|https://gitlab.com/baltakatei/baltakatei-exdev>
 272       .
 273     </footnote>.
 274   </description>
 275 </body>
 276
 277 <initial|<\collection>
 278 </collection>>
 279
 280 <\references>
 281   <\collection>
 282     <associate|auto-1|<tuple|1|1|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
 283     <associate|auto-10|<tuple|3.3.3|5|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
 284     <associate|auto-11|<tuple|3.3.4|5|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
 285     <associate|auto-12|<tuple|4|5|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
 286     <associate|auto-13|<tuple|1|5|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
 287     <associate|auto-14|<tuple|1|?|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
 288     <associate|auto-2|<tuple|2|1|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
 289     <associate|auto-3|<tuple|3|2|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
 290     <associate|auto-4|<tuple|3.1|2|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
 291     <associate|auto-5|<tuple|3.2|2|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
 292     <associate|auto-6|<tuple|3.3|2|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
 293     <associate|auto-7|<tuple|3.3.1|2|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
 294     <associate|auto-8|<tuple|1|3|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
 295     <associate|auto-9|<tuple|3.3.2|4|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
 296     <associate|eq gsize-const-ksq|<tuple|5|3|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
 297     <associate|eq gsize-ex-1|<tuple|7|4|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
 298     <associate|eq gsize-ex-1-lin500|<tuple|8|4|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
 299     <associate|eq gsize-lin|<tuple|6|4|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
 300     <associate|eq gsize-model|<tuple|3|3|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
 301     <associate|eq gsize-param-rel|<tuple|4|3|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
 302     <associate|eq pjump-from-s-lin|<tuple|11|?|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
 303     <associate|eq pseq-from-s-lin|<tuple|10|?|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
 304     <associate|eq pseq-from-s-lin-exp|<tuple|12|?|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
 305     <associate|eq rel-x-lin|<tuple|1|3|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
 306     <associate|eq rel-x0-lin0|<tuple|2|3|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
 307     <associate|fig ex-plot-s|<tuple|1|3|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
 308     <associate|footnote-1|<tuple|1|5|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
 309     <associate|footnote-2|<tuple|2|5|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
 310     <associate|footnr-1|<tuple|1|5|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
 311     <associate|footnr-2|<tuple|2|5|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
 312     <associate|jump-to-random|<tuple|4|2|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
 313   </collection>
 314 </references>
 315
 316 <\auxiliary>
 317   <\collection>
 318     <\associate|figure>
 319       <tuple|normal|<\surround|<hidden-binding|<tuple>|1>|>
 320         A plot of a possible function that relates target group size
 321         <with|mode|<quote|math>|s> and input lines
 322         <with|mode|<quote|math>|l<rsub|<with|mode|<quote|text>|in>>> that
 323         provide some ROP and PIR. The function is tuned to achieve
 324         <with|mode|<quote|math>|s<around*|(|l<rsub|<with|mode|<quote|text>|in>>=10<rsup|6>|)>=25>.
 325         </surround>|<pageref|auto-8>>
 326     </associate>
 327     <\associate|table>
 328       <tuple|normal|<\surround|<hidden-binding|<tuple>|1>|>
 329         A table listing versions of <with|mode|<quote|prog>|prog-language|<quote|shell>|font-family|<quote|rm>|bkshuf>.
 330       </surround>|<pageref|auto-13>>
 331     </associate>
 332     <\associate|toc>
 333       <vspace*|1fn><with|font-series|<quote|bold>|math-font-series|<quote|bold>|1<space|2spc>Summary>
 334       <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
 335       <no-break><pageref|auto-1><vspace|0.5fn>
 336
 337       <vspace*|1fn><with|font-series|<quote|bold>|math-font-series|<quote|bold>|2<space|2spc>Objective>
 338       <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
 339       <no-break><pageref|auto-2><vspace|0.5fn>
 340
 341       <vspace*|1fn><with|font-series|<quote|bold>|math-font-series|<quote|bold>|3<space|2spc>Design>
 342       <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
 343       <no-break><pageref|auto-3><vspace|0.5fn>
 344
 345       <with|par-left|<quote|1tab>|3.1<space|2spc>Definitions
 346       <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
 347       <no-break><pageref|auto-4>>
 348
 349       <with|par-left|<quote|1tab>|3.2<space|2spc>Process
 350       <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
 351       <no-break><pageref|auto-5>>
 352
 353       <with|par-left|<quote|1tab>|3.3<space|2spc>Parameter analysis
 354       <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
 355       <no-break><pageref|auto-6>>
 356
 357       <with|par-left|<quote|2tab>|3.3.1<space|2spc>Target group size
 358       calculation <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
 359       <no-break><pageref|auto-7>>
 360
 361       <with|par-left|<quote|2tab>|3.3.2<space|2spc>Jump from expected value
 362       <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
 363       <no-break><pageref|auto-9>>
 364
 365       <with|par-left|<quote|2tab>|3.3.3<space|2spc>Jump from random variate
 366       of inverse gaussian distribution <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
 367       <no-break><pageref|auto-10>>
 368
 369       <with|par-left|<quote|2tab>|3.3.4<space|2spc>Output structure
 370       <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
 371       <no-break><pageref|auto-11>>
 372
 373       <vspace*|1fn><with|font-series|<quote|bold>|math-font-series|<quote|bold>|4<space|2spc>Version
 374       History> <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
 375       <no-break><pageref|auto-12><vspace|0.5fn>
 376     </associate>
 377   </collection>
 378 </auxiliary>