| 1 | <TeXmacs|2.1.1> |
| 2 | |
| 3 | <style|generic> |
| 4 | |
| 5 | <\body> |
| 6 | <doc-data|<doc-title|bkshuf: A Shuf-Like Utility with Pre-Image Resistance |
| 7 | and Relative Order Preservation for Random Sampling of Long |
| 8 | Lists>|<doc-author|<author-data|<author-name|Steven Baltakatei |
| 9 | Sandoval>>>|<doc-date|2023-02-14T13:56+00>|<doc-misc|CC BY-SA 4.0>> |
| 10 | |
| 11 | <section|Summary> |
| 12 | |
| 13 | <shell|bkshuf> is a <shell|shuf>-like utility designed to output randomly |
| 14 | sized groups of lines with a group size distribution centered around some |
| 15 | characteristic value. |
| 16 | |
| 17 | <section|Objective> |
| 18 | |
| 19 | The author desires to create a <shell|shuf>-like utility named |
| 20 | <shell|bkshuf> to mix line lists in order to produce output line lists with |
| 21 | the following somewhat conflicting properties: |
| 22 | |
| 23 | <\description> |
| 24 | <item*|Pre-image resistance (PIR)>An output line's position should not |
| 25 | contain information about its input line position. |
| 26 | |
| 27 | <item*|Relative order preservation (ROP)>Two neighboring lines in the |
| 28 | input stream should have a high probability of remaining neighbors in the |
| 29 | output stream. |
| 30 | </description> |
| 31 | |
| 32 | The objective is to improve the value of a short random scan of a small |
| 33 | fraction of a potentially large input list; output that demonstrates ROP as |
| 34 | well as some degree of PIR may achieve this objective. In contrast, the |
| 35 | <shell|shuf> utility provides PIR but no ROP: a line's neighbor in the |
| 36 | output of <shell|shuf> is equally likely to be any other random line from |
| 37 | the input.\ |
| 38 | |
| 39 | In other words, output produced by <shell|bkshuf> should group together |
| 40 | sequential segments of the input lines in order to partially preserve |
| 41 | relationships that may exist between sequential files. For example, this |
| 42 | could be done by jumping to a random position in the input lines, consuming |
| 43 | (i.e. reading, outputting, and marking a line not to be read again) some |
| 44 | amount of sequential lines, then repeating the process until every line is |
| 45 | consumed. The amount of sequential lines to read between jumps affects how |
| 46 | well the above desired properties are satisfied. |
| 47 | |
| 48 | The objective of <shell|bkshuf> is not to completely prevent the |
| 49 | possibility of reassembling the input given the output. Additionally, a |
| 50 | valuable property desired of <shell|bkshuf> is output which demonstrates |
| 51 | sufficiently high PIR compared to ROP such that only a short (compared to |
| 52 | the logarithm of the input list size) sequential scan of the output list |
| 53 | from a random starting position is required before a jump to a new group is |
| 54 | is encountered; this would permit the overal contents of very large input |
| 55 | line lists to be sampled. |
| 56 | |
| 57 | <section|Design> |
| 58 | |
| 59 | <subsection|Definitions> |
| 60 | |
| 61 | <\eqnarray*> |
| 62 | <tformat|<table|<row|<cell|l>|<cell|:>|<cell|<text|number of |
| 63 | lines>>>|<row|<cell|l<rsub|<text|in>>>|<cell|:>|<cell|<text|input line |
| 64 | count>>>|<row|<cell|l<rsub|<text|out>>>|<cell|:>|<cell|<text|output line |
| 65 | count>>>|<row|<cell|c>|<cell|:>|<cell|<text|target group |
| 66 | count>>>|<row|<cell|s>|<cell|:>|<cell|<text|target group |
| 67 | size>>>|<row|<cell|p<rsub|<text|seq>><rsub|<text|>>>|<cell|:>|<cell|<text|probability |
| 68 | to include next sequential line>>>|<row|<cell|s<around*|(|l<rsub|<text|in>,0>|)>>|<cell|:>|<cell|<text|<math|target |
| 69 | group size parameter>>>>|<row|<cell|l<rsub|<text|in>,0>>|<cell|:>|<cell|<text|input |
| 70 | line count parameter>>>>> |
| 71 | </eqnarray*> |
| 72 | |
| 73 | <subsection|Process> |
| 74 | |
| 75 | <\enumerate-numeric> |
| 76 | <item>Acquire and count input lines (via <shell|/dev/stdin> or positional |
| 77 | arguments). |
| 78 | |
| 79 | <item>Calculate line count <math|l<rsub|<text|in>>> . |
| 80 | |
| 81 | <item>Calculate target group size <math|s>. |
| 82 | |
| 83 | <item>Select random unconsumed input line and consume it to |
| 84 | output.<label|jump-to-random> |
| 85 | |
| 86 | <item>Consume the next sequential line with probability |
| 87 | <math|p<rsub|<text|seq>>>. Otherwise if some input lines remain |
| 88 | unconsumed, go to step <reference|jump-to-random>. Otherwise, exit. |
| 89 | </enumerate-numeric> |
| 90 | |
| 91 | <subsection|Parameter analysis> |
| 92 | |
| 93 | <subsubsection|Target group size calculation> |
| 94 | |
| 95 | The simultaneous presence of ROP and PIR properties in the output depends |
| 96 | upon the amount of sequential lines that are read before <shell|bkshuf> |
| 97 | jumps to a new random position in the input list. This amount is the |
| 98 | <em|target group size>, <math|s>; it is the \Ptarget\Q since <math|s> |
| 99 | represents the average of a distribution of group sizes that may be |
| 100 | selected, not a single group size. In this analysis, the total number of |
| 101 | lines in the input list is <math|l<rsub|<text|in>>>. For small input line |
| 102 | counts, (e.g. <math|l<rsub|<text|in>>\<cong\>10>) the target group size |
| 103 | should be nearly one (e.g. <math|s\<cong\>1>) since group sizes any larger |
| 104 | than this would have almost no PIR (e.g. a group size of <math|s=8> for |
| 105 | <math|l<rsub|<text|in>>=10> would be 80% identical to the input). For |
| 106 | modest line input counts (e.g. <math|l<rsub|<text|in>>\<cong\>100>), the |
| 107 | target group size may be allowed to be larger, such as a tenth of the input |
| 108 | line count (e.g. <math|s\<cong\>10>); this would provide some PIR |
| 109 | (approximately <math|10!> permutations between the approximately |
| 110 | <math|<frac|l<rsub|<text|in>>|s>\<cong\><frac|100|10>\<cong\>10> groups) |
| 111 | while each line in groups around size <math|10> would have a low |
| 112 | probability of not being next to its neighbor (<math|8> of the 10 lines |
| 113 | would retain the same two neighbors while the two ends would retain one |
| 114 | each). For very large input line counts (e.g. |
| 115 | <math|l<rsub|<text|in>>\<cong\>1<separating-space|0.2em>000<separating-space|0.2em>000>), |
| 116 | however, breaking up and randomizing the input into ten groups of |
| 117 | <math|100<separating-space|0.2em>000> offers very little PIR; the benefit |
| 118 | of the very high ROP is also lost since sequential scanning of tens of |
| 119 | thousands of lines is required before a random jump to a new group may be |
| 120 | encountered; therefore, the target group size should be a much smaller |
| 121 | fraction of <math|l<rsub|<text|in>>>, (e.g. <math|s\<cong\>20>) while still |
| 122 | increasing. The relationship between a desireable target group size |
| 123 | <math|s> and the input line count <math|l<rsub|<text|in>>> is non-linear. |
| 124 | The author believes a reasonable approach is to scale the group size to the |
| 125 | logarithm of input line count. |
| 126 | |
| 127 | <hgroup|Figure <reference|fig ex-plot-s>> shows an example plot of |
| 128 | <math|s<around*|(|l<rsub|<text|in>>|)>> that is tuned to achieve a target |
| 129 | group size of <math|s<around*|(|l<rsub|<text|in>>=10<rsup|6>|)>=25> for an |
| 130 | input list length of <math|l<rsub|<text|in>>=10<rsup|6>> lines.\ |
| 131 | |
| 132 | <\big-figure|<with|gr-mode|<tuple|edit|point>|gr-frame|<tuple|scale|1.18922cm|<tuple|0.299593gw|0.120812gh>>|gr-geometry|<tuple|geometry|1par|0.6par>|gr-grid|<tuple|cartesian|<point|0|0>|1>|gr-grid-old|<tuple|cartesian|<point|0|0>|1>|gr-edit-grid-aspect|<tuple|<tuple|axes|none>|<tuple|1|none>|<tuple|10|none>>|gr-edit-grid|<tuple|cartesian|<point|0|0>|1>|gr-edit-grid-old|<tuple|cartesian|<point|0|0>|1>|gr-grid-aspect-props|<tuple|<tuple|axes|#808080>|<tuple|1|#c0c0c0>|<tuple|10|pastel |
| 133 | blue>>|gr-grid-aspect|<tuple|<tuple|axes|#808080>|<tuple|1|#c0c0c0>|<tuple|10|pastel |
| 134 | blue>>|magnify|1.18920711463847|gr-auto-crop|false|<graphics||<math-at|5|<point|-0.221848749616356|1.0>>|<math-at|10|<point|-0.397940008672038|2.0>>|<math-at|15|<point|-0.397940008672038|3.0>>|<math-at|20|<point|-0.397940008672038|4.0>>|<math-at|25|<point|-0.397940008672038|5.0>>|<math-at|30|<point|-0.397940008672038|6.0>>|<math-at|s|<point|0.0719460474170896|6.34360008343183>>|<point|0|0.2>|<point|6|5>|<math-at|10<rsup|1>|<point|1.0|-0.4>>|<math-at|10<rsup|2>|<point|2.0|-0.4>>|<math-at|10<rsup|3>|<point|3.0|-0.4>>|<math-at|10<rsup|4>|<point|4.0|-0.4>>|<math-at|10<rsup|5>|<point|5.0|-0.4>>|<math-at|10<rsup|6>|<point|6.0|-0.4>>|<math-at|1|<point|1.0|-0.8>>|<math-at|2|<point|2.0|-0.8>>|<math-at|3|<point|3.0|-0.8>>|<math-at|4|<point|4.0|-0.8>>|<math-at|5|<point|5.0|-0.8>>|<math-at|6|<point|6.0|-0.8>>|<math-at|x|<point|7.0|-0.8>>|<math-at|l<rsub|<text|in>>|<point|7.0|-0.4>>|<point|5.0|3.5>|<point|4.0|2.3>|<point|3.0|1.4>|<point|2.0|0.7>|<point|1.0|0.3>>>> |
| 135 | <label|fig ex-plot-s>A plot of a possible function that relates target |
| 136 | group size <math|s> and input lines <math|l<rsub|<text|in>>> that provide |
| 137 | some ROP and PIR. The function is tuned to achieve |
| 138 | <math|s<around*|(|l<rsub|<text|in>>=10<rsup|6>|)>=25>.\ |
| 139 | </big-figure> |
| 140 | |
| 141 | The following is a set of equations that are used to derive a definition |
| 142 | for <math|s<around*|(|l<rsub|<text|in>>|)>> that satisfies the plot in |
| 143 | <hgroup|Figure <reference|fig ex-plot-s>>.\ |
| 144 | |
| 145 | <\eqnarray*> |
| 146 | <tformat|<table|<row|<cell|x>|<cell|=>|<cell|<text|log> |
| 147 | <around*|(|l<rsub|<text|in>>|)>=<frac|ln |
| 148 | <around*|(|l<rsub|<text|in>>|)>|<text|ln> |
| 149 | <around*|(|10|)>><eq-number><label|eq |
| 150 | rel-x-lin>>>|<row|<cell|10<rsup|x>>|<cell|=>|<cell|l<rsub|<text|in>>>>|<row|<cell|x<rsub|0>>|<cell|=>|<cell|<text|log> |
| 151 | <around*|(|l<rsub|<text|in>,0>|)>=<frac|ln |
| 152 | <around*|(|l<rsub|<text|in>,0>|)>|<text|ln> |
| 153 | <around*|(|10|)>><eq-number><label|eq |
| 154 | rel-x0-lin0>>>|<row|<cell|10<rsup|x<rsub|0>>>|<cell|=>|<cell|l<rsub|<text|in>,0>>>|<row|<cell|>|<cell|>|<cell|>>|<row|<cell|s<around*|(|x|)>>|<cell|=>|<cell|<around*|(|k**x|)><rsup|2>+1<eq-number><label|eq |
| 155 | gsize-model>>>|<row|<cell|s<around*|(|x=6|)>=25>|<cell|=>|<cell|k<rsup|2>\<cdot\><around*|(|6|)><rsup|2>+1>>|<row|<cell|25>|<cell|=>|<cell|k<rsup|2>\<cdot\><around*|(|36|)>+1>>|<row|<cell|s<around*|(|x<rsub|0>|)>>|<cell|=>|<cell|<around*|(|k*x<rsub|0>|)><rsup|2>+1<eq-number><label|eq |
| 156 | gsize-param-rel>>>|<row|<cell|<frac|s<around*|(|x<rsub|0>|)>-1|x<rsub|0><rsup|2>>>|<cell|=>|<cell|k<rsup|2>>>|<row|<cell|k>|<cell|=>|<cell|<sqrt|<frac|s<around*|(|x<rsub|0>|)>-1|x<rsub|0><rsup|2>>>>>|<row|<cell|k>|<cell|=>|<cell|<frac|<sqrt|s<around*|(|x<rsub|0>|)>-1>|x<rsub|0>>>>|<row|<cell|k<rsup|2>>|<cell|=>|<cell|<frac|s<around*|(|x<rsub|0>|)>-1|x<rsub|0><rsup|2>><eq-number><label|eq |
| 157 | gsize-const-ksq>>>|<row|<cell|s<around*|(|l<rsub|<text|in>>|)>>|<cell|=>|<cell|<around*|(|<frac|s<around*|(|x<rsub|0>|)>-1|x<rsub|0><rsup|2>>|)>\<cdot\><around*|(|<frac|<text|ln><around*|(|l<rsub|<text|in>>|)>|<text|ln> |
| 158 | <around*|(|10|)>>|)><rsup|2>+1>>|<row|<cell|>|<cell|=>|<cell|<around*|(|<frac|<text|ln> |
| 159 | <around*|(|10|)>|ln <around*|(|l<rsub|<text|in>,0>|)>>|)><rsup|2>\<cdot\><around*|(|s<around*|(|x<rsub|0>|)>-1|)>\<cdot\><around*|(|<frac|<text|ln><around*|(|l<rsub|<text|in>>|)>|<text|ln> |
| 160 | <around*|(|10|)>>|)><rsup|2>+1>>|<row|<cell|s<around*|(|l<rsub|<text|in>>|)>>|<cell|=>|<cell|<around*|(|s<around*|(|l<rsub|<text|in>,0>|)>-1|)>\<cdot\><around*|(|<frac|<text|ln><around*|(|l<rsub|<text|in>>|)>|ln |
| 161 | <around*|(|l<rsub|<text|in>,0>|)>>|)><rsup|2>+1>>|<row|<cell|s<around*|(|l<rsub|<text|in>>|)>>|<cell|=>|<cell|<around*|(|<frac|s<around*|(|l<rsub|<text|in>,0>|)>-1|<around*|[|ln |
| 162 | <around*|(|l<rsub|<text|in>,0>|)>|]><rsup|2>>|)>\<cdot\><around*|[|<text|ln><around*|(|l<rsub|<text|in>>|)>|]><rsup|2>+1<eq-number><label|eq |
| 163 | gsize-lin>>>>> |
| 164 | </eqnarray*> |
| 165 | |
| 166 | Equation <reference|eq gsize-model> defines a quadratic equation for the |
| 167 | linear range <math|s> and the logarithmic domain <math|x>. <math|x> is |
| 168 | defined in terms of <math|l<rsub|<text|in>>> via a domain transformation |
| 169 | defined by <hgroup|Equation <reference|eq rel-x-lin>>. The result is |
| 170 | <hgroup|Equation <reference|eq gsize-lin>> which defines |
| 171 | <math|s<around*|(|l<rsub|<text|in>>|)>> as a function of |
| 172 | <math|l<rsub|<text|in>>> and parameters |
| 173 | <math|s<around*|(|l<rsub|<text|in>,0>|)>> and <math|l<rsub|<text|in>,0>>. |
| 174 | The parameters define how quickly or slowly the quadratic equation grows. |
| 175 | In other words, if a user wishes for a <math|1<separating-space|0.2em>000<separating-space|0.2em>000> |
| 176 | line input to be split into groups each containing, on average, <math|25> |
| 177 | lines, then they should plug in <math|l<rsub|<text|in>,0>=1<separating-space|0.2em>000<separating-space|0.2em>000> |
| 178 | and <math|s<around*|(|l<rsub|<text|in>,0>|)>=25> into <hgroup|Equation |
| 179 | <reference|eq gsize-lin>> as is done in <hgroup|Equation <reference|eq |
| 180 | gsize-ex-1>>. This equation can then be used to calculate target group |
| 181 | sizes <math|s> as a function of other input line counts |
| 182 | <math|l<rsub|<text|in>>> besides <math|l<rsub|<text|in>>=1<separating-space|0.2em>000<separating-space|0.2em>000>. |
| 183 | For example, plugging <math|l<rsub|<text|in>>=500> into <hgroup|Equation |
| 184 | <reference|eq gsize-ex-1>> yields <hgroup|Equation <reference|eq |
| 185 | gsize-ex-1-lin500>> which specifies a target group size of |
| 186 | <math|5.85629\<cong\>6>. |
| 187 | |
| 188 | <\eqnarray*> |
| 189 | <tformat|<table|<row|<cell|s<around*|(|l<rsub|<text|in>>|)>>|<cell|=>|<cell|<around*|(|<frac|s<around*|(|l<rsub|<text|in>,0>|)>-1|<around*|[|ln |
| 190 | <around*|(|l<rsub|<text|in>,0>|)>|]><rsup|2>>|)>\<cdot\><around*|[|<text|ln><around*|(|l<rsub|<text|in>>|)>|]><rsup|2>+1>>|<row|<cell|s<around*|(|l<rsub|<text|in>>|)>>|<cell|=>|<cell|<around*|(|<frac|25-1|<around*|[|<text|ln |
| 191 | ><around*|(|1<separating-space|0.2em>000<separating-space|0.2em>000|)>|]><rsup|2>>|)>\<cdot\><around*|[|<text|ln><around*|(|l<rsub|<text|in>>|)>|]><rsup|2>+1<eq-number><label|eq |
| 192 | gsize-ex-1>>>|<row|<cell|>|<cell|>|<cell|>>|<row|<cell|s<around*|(|l<rsub|<text|in>>=500|)>>|<cell|=>|<cell|<around*|(|<frac|25-1|<around*|[|<text|ln |
| 193 | ><around*|(|1<separating-space|0.2em>000<separating-space|0.2em>000|)>|]><rsup|2>>|)>\<cdot\><around*|[|<text|ln><around*|(|500|)>|]><rsup|2>+1<eq-number><label|eq |
| 194 | gsize-ex-1-lin500>>>|<row|<cell|s<around*|(|l<rsub|<text|in>>=500|)>>|<cell|=>|<cell|5.85629>>>> |
| 195 | </eqnarray*> |
| 196 | |
| 197 | \; |
| 198 | |
| 199 | <subsubsection|Jump from expected value> |
| 200 | |
| 201 | A method <shell|bkshuf> may employ to decide when read the next sequential |
| 202 | unconsumed input line is to simply do so with probability |
| 203 | <math|p<rsub|<text|seq>>> such that the expected value of the average group |
| 204 | size trends towards <math|s>. |
| 205 | |
| 206 | <\eqnarray*> |
| 207 | <tformat|<table|<row|<cell|p<rsub|<text|seq>>>|<cell|=>|<cell|<around*|(|1-p<rsub|<text|jump>>|)>>>|<row|<cell|p<rsub|<text|jump>>>|<cell|=>|<cell|1-p<rsub|<text|seq>>>>|<row|<cell|s>|<cell|=>|<cell|<frac|1|p<rsub|<text|jump>>>=<frac|1|1-p<rsub|<text|seq>>><eq-number>>>|<row|<cell|s>|<cell|=>|<cell|<frac|1|1-p<rsub|<text|seq>>>>>|<row|<cell|1-p<rsub|<text|seq>>>|<cell|=>|<cell|<frac|1|s>>>|<row|<cell|p<rsub|<text|seq>>-1>|<cell|=>|<cell|<frac|-1|s>>>|<row|<cell|p<rsub|<text|seq>>>|<cell|=>|<cell|1-<frac|1|s<around*|(|l<rsub|<text|in>>|)>><eq-number><label|eq |
| 208 | pseq-from-s-lin>>>|<row|<cell|p<rsub|<text|jump>>>|<cell|=>|<cell|<frac|1|s<around*|(|l<rsub|<text|in>>|)>><eq-number><label|eq |
| 209 | pjump-from-s-lin>>>|<row|<cell|>|<cell|>|<cell|>>|<row|<cell|p<rsub|<text|seq>><around*|(|l<rsub|<text|in>>|)>>|<cell|=>|<cell|1-<around*|[|<around*|(|<frac|s<around*|(|l<rsub|<text|in>,0>|)>-1|<around*|[|ln |
| 210 | <around*|(|l<rsub|<text|in>,0>|)>|]><rsup|2>>|)>\<cdot\><around*|[|<text|ln><around*|(|l<rsub|<text|in>>|)>|]><rsup|2>+1|]><rsup|-1><eq-number><label|eq |
| 211 | pseq-from-s-lin-exp>>>>> |
| 212 | </eqnarray*> |
| 213 | |
| 214 | <subsubsection|Jump from random variate of inverse gaussian distribution> |
| 215 | |
| 216 | Another method <shell|bkshuf> may employ to decide when to read the next |
| 217 | sequential unconsumed input line is to use an inverse gaussian |
| 218 | distribution. This may be done by generating from the distribution a float |
| 219 | sampled from the inverse gaussian with range 0 to infinity with mean |
| 220 | <math|\<mu\>> whenever a new random position in the input list is selected; |
| 221 | the float is rounded to the nearest integer.<\footnote> |
| 222 | See <name|Michael, John R.> "Generating Random Variates Using |
| 223 | Transformations with Multiple Roots". 1976. |
| 224 | <hlink|https://doi.org/10.2307/2683801|https://doi.org/10.2307/2683801> . |
| 225 | </footnote> Then, after consuming an input line, this integer is |
| 226 | decremented by one and another sequential line is consumed provided the |
| 227 | integer does not become less than or equal to zero. The inverse gaussian |
| 228 | distribution requires specifying the mean <math|\<mu\>> and the shape |
| 229 | parameter <math|\<lambda\>>; a higher <math|\<lambda\>> results in a |
| 230 | greater spread. An upper bound may also be specified since the distribution |
| 231 | has none except for that imposed by its programming implementation. |
| 232 | |
| 233 | The result of using an inverse gaussian distribution is an output with |
| 234 | potentially much more regular group sizes than using the previously |
| 235 | mentioned expected value method. However, the implementation of the inverse |
| 236 | gaussian sampling operation described by (Michael, 1976) requires several |
| 237 | exponent calculations and a square root calculation in addition to various |
| 238 | multiplication and division operations. If sufficient processing power is |
| 239 | available, this may not necessarily be an issue. |
| 240 | |
| 241 | <subsubsection|Output structure> |
| 242 | |
| 243 | Regardless of whether group sizes are determined by the expected value |
| 244 | method or using variates of an inverse gaussian distribution, mimicking the |
| 245 | <shell|shuf> property of all input lines being present in the output, |
| 246 | albeit rearranged, results in a side effect: the first output lines are |
| 247 | more likely to contain groups with uninterrupted sequence runs (high ROP) |
| 248 | while groups in the last output lines are almost certain to contain |
| 249 | sequence jumps within a group (less ROP). The reason for this is that |
| 250 | <shell|bkshuf>, when it encounters an input line that has already been |
| 251 | consumed, will skip to the next available input line. The decision could be |
| 252 | made to skip to a new random line, but, it is simpler to simply read the |
| 253 | next available input line. The author's original intention of sampling only |
| 254 | a short initial portion of the output is compatible with the behavior that |
| 255 | ROP is preserved mostly at the beginning of the output. |
| 256 | |
| 257 | <section|Version History> |
| 258 | |
| 259 | <\big-table|<tabular|<tformat|<table|<row|<cell|Version |
| 260 | No.>|<cell|Date>|<cell|Path>|<cell|Description>>|<row|<cell|<verbatim|0.0.1>>|<cell|2023-02-14>|<cell|<verbatim|unitproc/bkshuf>>|<cell|Initial |
| 261 | draft implemented in <name|Bash>.>>>>>> |
| 262 | A table listing versions of <shell|bkshuf>. |
| 263 | </big-table> |
| 264 | |
| 265 | <\description> |
| 266 | <item*|v0.0.1>Initial implementation in <shell|bash> <verbatim|5.1.16> |
| 267 | with <shell|bc> <verbatim|1.07.1> and <name|GNU Coreutils> |
| 268 | <verbatim|8.32> and tested on <name|Pop!_OS> <verbatim|22.04 LTS>. Saved |
| 269 | to the author's <name|BK-2020-03> repository<\footnote> |
| 270 | See commit <hlink|<verbatim|080ea4c>|https://gitlab.com/baltakatei/baltakatei-exdev/-/blob/080ea4c0ff0d4e6b5ce86f664fa6645c1cb02bf0/unitproc/bkshuf> |
| 271 | at <hlink|https://gitlab.com/baltakatei/baltakatei-exdev|https://gitlab.com/baltakatei/baltakatei-exdev> |
| 272 | . |
| 273 | </footnote>. |
| 274 | </description> |
| 275 | </body> |
| 276 | |
| 277 | <initial|<\collection> |
| 278 | </collection>> |
| 279 | |
| 280 | <\references> |
| 281 | <\collection> |
| 282 | <associate|auto-1|<tuple|1|1|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>> |
| 283 | <associate|auto-10|<tuple|3.3.3|5|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>> |
| 284 | <associate|auto-11|<tuple|3.3.4|5|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>> |
| 285 | <associate|auto-12|<tuple|4|5|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>> |
| 286 | <associate|auto-13|<tuple|1|5|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>> |
| 287 | <associate|auto-14|<tuple|1|?|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>> |
| 288 | <associate|auto-2|<tuple|2|1|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>> |
| 289 | <associate|auto-3|<tuple|3|2|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>> |
| 290 | <associate|auto-4|<tuple|3.1|2|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>> |
| 291 | <associate|auto-5|<tuple|3.2|2|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>> |
| 292 | <associate|auto-6|<tuple|3.3|2|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>> |
| 293 | <associate|auto-7|<tuple|3.3.1|2|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>> |
| 294 | <associate|auto-8|<tuple|1|3|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>> |
| 295 | <associate|auto-9|<tuple|3.3.2|4|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>> |
| 296 | <associate|eq gsize-const-ksq|<tuple|5|3|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>> |
| 297 | <associate|eq gsize-ex-1|<tuple|7|4|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>> |
| 298 | <associate|eq gsize-ex-1-lin500|<tuple|8|4|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>> |
| 299 | <associate|eq gsize-lin|<tuple|6|4|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>> |
| 300 | <associate|eq gsize-model|<tuple|3|3|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>> |
| 301 | <associate|eq gsize-param-rel|<tuple|4|3|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>> |
| 302 | <associate|eq pjump-from-s-lin|<tuple|11|?|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>> |
| 303 | <associate|eq pseq-from-s-lin|<tuple|10|?|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>> |
| 304 | <associate|eq pseq-from-s-lin-exp|<tuple|12|?|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>> |
| 305 | <associate|eq rel-x-lin|<tuple|1|3|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>> |
| 306 | <associate|eq rel-x0-lin0|<tuple|2|3|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>> |
| 307 | <associate|fig ex-plot-s|<tuple|1|3|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>> |
| 308 | <associate|footnote-1|<tuple|1|5|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>> |
| 309 | <associate|footnote-2|<tuple|2|5|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>> |
| 310 | <associate|footnr-1|<tuple|1|5|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>> |
| 311 | <associate|footnr-2|<tuple|2|5|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>> |
| 312 | <associate|jump-to-random|<tuple|4|2|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>> |
| 313 | </collection> |
| 314 | </references> |
| 315 | |
| 316 | <\auxiliary> |
| 317 | <\collection> |
| 318 | <\associate|figure> |
| 319 | <tuple|normal|<\surround|<hidden-binding|<tuple>|1>|> |
| 320 | A plot of a possible function that relates target group size |
| 321 | <with|mode|<quote|math>|s> and input lines |
| 322 | <with|mode|<quote|math>|l<rsub|<with|mode|<quote|text>|in>>> that |
| 323 | provide some ROP and PIR. The function is tuned to achieve |
| 324 | <with|mode|<quote|math>|s<around*|(|l<rsub|<with|mode|<quote|text>|in>>=10<rsup|6>|)>=25>. |
| 325 | </surround>|<pageref|auto-8>> |
| 326 | </associate> |
| 327 | <\associate|table> |
| 328 | <tuple|normal|<\surround|<hidden-binding|<tuple>|1>|> |
| 329 | A table listing versions of <with|mode|<quote|prog>|prog-language|<quote|shell>|font-family|<quote|rm>|bkshuf>. |
| 330 | </surround>|<pageref|auto-13>> |
| 331 | </associate> |
| 332 | <\associate|toc> |
| 333 | <vspace*|1fn><with|font-series|<quote|bold>|math-font-series|<quote|bold>|1<space|2spc>Summary> |
| 334 | <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>> |
| 335 | <no-break><pageref|auto-1><vspace|0.5fn> |
| 336 | |
| 337 | <vspace*|1fn><with|font-series|<quote|bold>|math-font-series|<quote|bold>|2<space|2spc>Objective> |
| 338 | <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>> |
| 339 | <no-break><pageref|auto-2><vspace|0.5fn> |
| 340 | |
| 341 | <vspace*|1fn><with|font-series|<quote|bold>|math-font-series|<quote|bold>|3<space|2spc>Design> |
| 342 | <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>> |
| 343 | <no-break><pageref|auto-3><vspace|0.5fn> |
| 344 | |
| 345 | <with|par-left|<quote|1tab>|3.1<space|2spc>Definitions |
| 346 | <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>> |
| 347 | <no-break><pageref|auto-4>> |
| 348 | |
| 349 | <with|par-left|<quote|1tab>|3.2<space|2spc>Process |
| 350 | <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>> |
| 351 | <no-break><pageref|auto-5>> |
| 352 | |
| 353 | <with|par-left|<quote|1tab>|3.3<space|2spc>Parameter analysis |
| 354 | <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>> |
| 355 | <no-break><pageref|auto-6>> |
| 356 | |
| 357 | <with|par-left|<quote|2tab>|3.3.1<space|2spc>Target group size |
| 358 | calculation <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>> |
| 359 | <no-break><pageref|auto-7>> |
| 360 | |
| 361 | <with|par-left|<quote|2tab>|3.3.2<space|2spc>Jump from expected value |
| 362 | <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>> |
| 363 | <no-break><pageref|auto-9>> |
| 364 | |
| 365 | <with|par-left|<quote|2tab>|3.3.3<space|2spc>Jump from random variate |
| 366 | of inverse gaussian distribution <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>> |
| 367 | <no-break><pageref|auto-10>> |
| 368 | |
| 369 | <with|par-left|<quote|2tab>|3.3.4<space|2spc>Output structure |
| 370 | <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>> |
| 371 | <no-break><pageref|auto-11>> |
| 372 | |
| 373 | <vspace*|1fn><with|font-series|<quote|bold>|math-font-series|<quote|bold>|4<space|2spc>Version |
| 374 | History> <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>> |
| 375 | <no-break><pageref|auto-12><vspace|0.5fn> |
| 376 | </associate> |
| 377 | </collection> |
| 378 | </auxiliary> |