zdv2.bktei.com Git - BK-2020-03.git/blame_incremental

... / ...

Commit	Line	Data
	1	<TeXmacs\|2.1.1>
	2
	3	<style\|generic>
	4
	5	<\body>
	6	<doc-data\|<doc-title\|bkshuf: A Shuf-Like Utility with Pre-Image Resistance
	7	and Relative Order Preservation for Random Sampling of Long
	8	Lists>\|<doc-author\|<author-data\|<author-name\|Steven Baltakatei
	9	Sandoval>>>\|<doc-date\|2023-02-14T13:56+00>\|<doc-misc\|CC BY-SA 4.0>>
	10
	11	<section\|Summary>
	12
	13	<shell\|bkshuf> is a <shell\|shuf>-like utility designed to output randomly
	14	sized groups of lines with a group size distribution centered around some
	15	characteristic value.
	16
	17	<section\|Objective>
	18
	19	The author desires to create a <shell\|shuf>-like utility named
	20	<shell\|bkshuf> to mix line lists in order to produce output line lists with
	21	the following somewhat conflicting properties:
	22
	23	<\description>
	24	<item*\|Pre-image resistance (PIR)>An output line's position should not
	25	contain information about its input line position.
	26
	27	<item*\|Relative order preservation (ROP)>Two neighboring lines in the
	28	input stream should have a high probability of remaining neighbors in the
	29	output stream.
	30	</description>
	31
	32	The objective is to improve the value of a short random scan of a small
	33	fraction of a potentially large input list; output that demonstrates ROP as
	34	well as some degree of PIR may achieve this objective. In contrast, the
	35	<shell\|shuf> utility provides PIR but no ROP: a line's neighbor in the
	36	output of <shell\|shuf> is equally likely to be any other random line from
	37	the input.\
	38
	39	In other words, output produced by <shell\|bkshuf> should group together
	40	sequential segments of the input lines in order to partially preserve
	41	relationships that may exist between sequential files. For example, this
	42	could be done by jumping to a random position in the input lines, consuming
	43	(i.e. reading, outputting, and marking a line not to be read again) some
	44	amount of sequential lines, then repeating the process until every line is
	45	consumed. The amount of sequential lines to read between jumps affects how
	46	well the above desired properties are satisfied.
	47
	48	The objective of <shell\|bkshuf> is not to completely prevent the
	49	possibility of reassembling the input given the output. Additionally, a
	50	valuable property desired of <shell\|bkshuf> is output which demonstrates
	51	sufficiently high PIR compared to ROP such that only a short (compared to
	52	the logarithm of the input list size) sequential scan of the output list
	53	from a random starting position is required before a jump to a new group is
	54	is encountered; this would permit the overal contents of very large input
	55	line lists to be sampled.
	56
	57	<section\|Design>
	58
	59	<subsection\|Definitions>
	60
	61	<\eqnarray*>
	62	<tformat\|<table\|<row\|<cell\|l>\|<cell\|:>\|<cell\|<text\|number of
	63	lines>>>\|<row\|<cell\|l<rsub\|<text\|in>>>\|<cell\|:>\|<cell\|<text\|input line
	64	count>>>\|<row\|<cell\|l<rsub\|<text\|out>>>\|<cell\|:>\|<cell\|<text\|output line
	65	count>>>\|<row\|<cell\|c>\|<cell\|:>\|<cell\|<text\|target group
	66	count>>>\|<row\|<cell\|s>\|<cell\|:>\|<cell\|<text\|target group
	67	size>>>\|<row\|<cell\|p<rsub\|<text\|seq>><rsub\|<text\|>>>\|<cell\|:>\|<cell\|<text\|probability
	68	to include next sequential line>>>\|<row\|<cell\|s<around*\|(\|l<rsub\|<text\|in>,0>\|)>>\|<cell\|:>\|<cell\|<text\|<math\|target
	69	group size parameter>>>>\|<row\|<cell\|l<rsub\|<text\|in>,0>>\|<cell\|:>\|<cell\|<text\|input
	70	line count parameter>>>>>
	71	</eqnarray*>
	72
	73	<subsection\|Process>
	74
	75	<\enumerate-numeric>
	76	<item>Acquire and count input lines (via <shell\|/dev/stdin> or positional
	77	arguments).
	78
	79	<item>Calculate line count <math\|l<rsub\|<text\|in>>> .
	80
	81	<item>Calculate target group size <math\|s>.
	82
	83	<item>Select random unconsumed input line and consume it to
	84	output.<label\|jump-to-random>
	85
	86	<item>Consume the next sequential line with probability
	87	<math\|p<rsub\|<text\|seq>>>. Otherwise if some input lines remain
	88	unconsumed, go to step <reference\|jump-to-random>. Otherwise, exit.
	89	</enumerate-numeric>
	90
	91	<subsection\|Parameter analysis>
	92
	93	<subsubsection\|Target group size calculation>
	94
	95	The simultaneous presence of ROP and PIR properties in the output depends
	96	upon the amount of sequential lines that are read before <shell\|bkshuf>
	97	jumps to a new random position in the input list. This amount is the
	98	<em\|target group size>, <math\|s>; it is the \Ptarget\Q since <math\|s>
	99	represents the average of a distribution of group sizes that may be
	100	selected, not a single group size. In this analysis, the total number of
	101	lines in the input list is <math\|l<rsub\|<text\|in>>>. For small input line
	102	counts, (e.g. <math\|l<rsub\|<text\|in>>\<cong\>10>) the target group size
	103	should be nearly one (e.g. <math\|s\<cong\>1>) since group sizes any larger
	104	than this would have almost no PIR (e.g. a group size of <math\|s=8> for
	105	<math\|l<rsub\|<text\|in>>=10> would be 80% identical to the input). For
	106	modest line input counts (e.g. <math\|l<rsub\|<text\|in>>\<cong\>100>), the
	107	target group size may be allowed to be larger, such as a tenth of the input
	108	line count (e.g. <math\|s\<cong\>10>); this would provide some PIR
	109	(approximately <math\|10!> permutations between the approximately
	110	<math\|<frac\|l<rsub\|<text\|in>>\|s>\<cong\><frac\|100\|10>\<cong\>10> groups)
	111	while each line in groups around size <math\|10> would have a low
	112	probability of not being next to its neighbor (<math\|8> of the 10 lines
	113	would retain the same two neighbors while the two ends would retain one
	114	each). For very large input line counts (e.g.
	115	<math\|l<rsub\|<text\|in>>\<cong\>1<separating-space\|0.2em>000<separating-space\|0.2em>000>),
	116	however, breaking up and randomizing the input into ten groups of
	117	<math\|100<separating-space\|0.2em>000> offers very little PIR; the benefit
	118	of the very high ROP is also lost since sequential scanning of tens of
	119	thousands of lines is required before a random jump to a new group may be
	120	encountered; therefore, the target group size should be a much smaller
	121	fraction of <math\|l<rsub\|<text\|in>>>, (e.g. <math\|s\<cong\>20>) while still
	122	increasing. The relationship between a desireable target group size
	123	<math\|s> and the input line count <math\|l<rsub\|<text\|in>>> is non-linear.
	124	The author believes a reasonable approach is to scale the group size to the
	125	logarithm of input line count.
	126
	127	<hgroup\|Figure <reference\|fig ex-plot-s>> shows an example plot of
	128	<math\|s<around*\|(\|l<rsub\|<text\|in>>\|)>> that is tuned to achieve a target
	129	group size of <math\|s<around*\|(\|l<rsub\|<text\|in>>=10<rsup\|6>\|)>=25> for an
	130	input list length of <math\|l<rsub\|<text\|in>>=10<rsup\|6>> lines.\
	131
	132	<\big-figure\|<with\|gr-mode\|<tuple\|edit\|point>\|gr-frame\|<tuple\|scale\|1.18922cm\|<tuple\|0.299593gw\|0.120812gh>>\|gr-geometry\|<tuple\|geometry\|1par\|0.6par>\|gr-grid\|<tuple\|cartesian\|<point\|0\|0>\|1>\|gr-grid-old\|<tuple\|cartesian\|<point\|0\|0>\|1>\|gr-edit-grid-aspect\|<tuple\|<tuple\|axes\|none>\|<tuple\|1\|none>\|<tuple\|10\|none>>\|gr-edit-grid\|<tuple\|cartesian\|<point\|0\|0>\|1>\|gr-edit-grid-old\|<tuple\|cartesian\|<point\|0\|0>\|1>\|gr-grid-aspect-props\|<tuple\|<tuple\|axes\|#808080>\|<tuple\|1\|#c0c0c0>\|<tuple\|10\|pastel
	133	blue>>\|gr-grid-aspect\|<tuple\|<tuple\|axes\|#808080>\|<tuple\|1\|#c0c0c0>\|<tuple\|10\|pastel
	134	blue>>\|magnify\|1.18920711463847\|gr-auto-crop\|false\|<graphics\|\|<math-at\|5\|<point\|-0.221848749616356\|1.0>>\|<math-at\|10\|<point\|-0.397940008672038\|2.0>>\|<math-at\|15\|<point\|-0.397940008672038\|3.0>>\|<math-at\|20\|<point\|-0.397940008672038\|4.0>>\|<math-at\|25\|<point\|-0.397940008672038\|5.0>>\|<math-at\|30\|<point\|-0.397940008672038\|6.0>>\|<math-at\|s\|<point\|0.0719460474170896\|6.34360008343183>>\|<point\|0\|0.2>\|<point\|6\|5>\|<math-at\|10<rsup\|1>\|<point\|1.0\|-0.4>>\|<math-at\|10<rsup\|2>\|<point\|2.0\|-0.4>>\|<math-at\|10<rsup\|3>\|<point\|3.0\|-0.4>>\|<math-at\|10<rsup\|4>\|<point\|4.0\|-0.4>>\|<math-at\|10<rsup\|5>\|<point\|5.0\|-0.4>>\|<math-at\|10<rsup\|6>\|<point\|6.0\|-0.4>>\|<math-at\|1\|<point\|1.0\|-0.8>>\|<math-at\|2\|<point\|2.0\|-0.8>>\|<math-at\|3\|<point\|3.0\|-0.8>>\|<math-at\|4\|<point\|4.0\|-0.8>>\|<math-at\|5\|<point\|5.0\|-0.8>>\|<math-at\|6\|<point\|6.0\|-0.8>>\|<math-at\|x\|<point\|7.0\|-0.8>>\|<math-at\|l<rsub\|<text\|in>>\|<point\|7.0\|-0.4>>\|<point\|5.0\|3.5>\|<point\|4.0\|2.3>\|<point\|3.0\|1.4>\|<point\|2.0\|0.7>\|<point\|1.0\|0.3>>>>
	135	<label\|fig ex-plot-s>A plot of a possible function that relates target
	136	group size <math\|s> and input lines <math\|l<rsub\|<text\|in>>> that provide
	137	some ROP and PIR. The function is tuned to achieve
	138	<math\|s<around*\|(\|l<rsub\|<text\|in>>=10<rsup\|6>\|)>=25>.\
	139	</big-figure>
	140
	141	The following is a set of equations that are used to derive a definition
	142	for <math\|s<around*\|(\|l<rsub\|<text\|in>>\|)>> that satisfies the plot in
	143	<hgroup\|Figure <reference\|fig ex-plot-s>>.\
	144
	145	<\eqnarray*>
	146	<tformat\|<table\|<row\|<cell\|x>\|<cell\|=>\|<cell\|<text\|log>
	147	<around*\|(\|l<rsub\|<text\|in>>\|)>=<frac\|ln
	148	<around*\|(\|l<rsub\|<text\|in>>\|)>\|<text\|ln>
	149	<around*\|(\|10\|)>><eq-number><label\|eq
	150	rel-x-lin>>>\|<row\|<cell\|10<rsup\|x>>\|<cell\|=>\|<cell\|l<rsub\|<text\|in>>>>\|<row\|<cell\|x<rsub\|0>>\|<cell\|=>\|<cell\|<text\|log>
	151	<around*\|(\|l<rsub\|<text\|in>,0>\|)>=<frac\|ln
	152	<around*\|(\|l<rsub\|<text\|in>,0>\|)>\|<text\|ln>
	153	<around*\|(\|10\|)>><eq-number><label\|eq
	154	rel-x0-lin0>>>\|<row\|<cell\|10<rsup\|x<rsub\|0>>>\|<cell\|=>\|<cell\|l<rsub\|<text\|in>,0>>>\|<row\|<cell\|>\|<cell\|>\|<cell\|>>\|<row\|<cell\|s<around\|(\|x\|)>>\|<cell\|=>\|<cell\|<around\|(\|k**x\|)><rsup\|2>+1<eq-number><label\|eq
	155	gsize-model>>>\|<row\|<cell\|s<around\|(\|x=6\|)>=25>\|<cell\|=>\|<cell\|k<rsup\|2>\<cdot\><around\|(\|6\|)><rsup\|2>+1>>\|<row\|<cell\|25>\|<cell\|=>\|<cell\|k<rsup\|2>\<cdot\><around\|(\|36\|)>+1>>\|<row\|<cell\|s<around\|(\|x<rsub\|0>\|)>>\|<cell\|=>\|<cell\|<around\|(\|kx<rsub\|0>\|)><rsup\|2>+1<eq-number><label\|eq
	156	gsize-param-rel>>>\|<row\|<cell\|<frac\|s<around\|(\|x<rsub\|0>\|)>-1\|x<rsub\|0><rsup\|2>>>\|<cell\|=>\|<cell\|k<rsup\|2>>>\|<row\|<cell\|k>\|<cell\|=>\|<cell\|<sqrt\|<frac\|s<around\|(\|x<rsub\|0>\|)>-1\|x<rsub\|0><rsup\|2>>>>>\|<row\|<cell\|k>\|<cell\|=>\|<cell\|<frac\|<sqrt\|s<around\|(\|x<rsub\|0>\|)>-1>\|x<rsub\|0>>>>\|<row\|<cell\|k<rsup\|2>>\|<cell\|=>\|<cell\|<frac\|s<around\|(\|x<rsub\|0>\|)>-1\|x<rsub\|0><rsup\|2>><eq-number><label\|eq
	157	gsize-const-ksq>>>\|<row\|<cell\|s<around\|(\|l<rsub\|<text\|in>>\|)>>\|<cell\|=>\|<cell\|<around\|(\|<frac\|s<around\|(\|x<rsub\|0>\|)>-1\|x<rsub\|0><rsup\|2>>\|)>\<cdot\><around\|(\|<frac\|<text\|ln><around*\|(\|l<rsub\|<text\|in>>\|)>\|<text\|ln>
	158	<around\|(\|10\|)>>\|)><rsup\|2>+1>>\|<row\|<cell\|>\|<cell\|=>\|<cell\|<around\|(\|<frac\|<text\|ln>
	159	<around\|(\|10\|)>\|ln <around\|(\|l<rsub\|<text\|in>,0>\|)>>\|)><rsup\|2>\<cdot\><around\|(\|s<around\|(\|x<rsub\|0>\|)>-1\|)>\<cdot\><around\|(\|<frac\|<text\|ln><around\|(\|l<rsub\|<text\|in>>\|)>\|<text\|ln>
	160	<around\|(\|10\|)>>\|)><rsup\|2>+1>>\|<row\|<cell\|s<around\|(\|l<rsub\|<text\|in>>\|)>>\|<cell\|=>\|<cell\|<around\|(\|s<around\|(\|l<rsub\|<text\|in>,0>\|)>-1\|)>\<cdot\><around\|(\|<frac\|<text\|ln><around\|(\|l<rsub\|<text\|in>>\|)>\|ln
	161	<around\|(\|l<rsub\|<text\|in>,0>\|)>>\|)><rsup\|2>+1>>\|<row\|<cell\|s<around\|(\|l<rsub\|<text\|in>>\|)>>\|<cell\|=>\|<cell\|<around\|(\|<frac\|s<around\|(\|l<rsub\|<text\|in>,0>\|)>-1\|<around*\|[\|ln
	162	<around\|(\|l<rsub\|<text\|in>,0>\|)>\|]><rsup\|2>>\|)>\<cdot\><around\|[\|<text\|ln><around*\|(\|l<rsub\|<text\|in>>\|)>\|]><rsup\|2>+1<eq-number><label\|eq
	163	gsize-lin>>>>>
	164	</eqnarray*>
	165
	166	Equation <reference\|eq gsize-model> defines a quadratic equation for the
	167	linear range <math\|s> and the logarithmic domain <math\|x>. <math\|x> is
	168	defined in terms of <math\|l<rsub\|<text\|in>>> via a domain transformation
	169	defined by <hgroup\|Equation <reference\|eq rel-x-lin>>. The result is
	170	<hgroup\|Equation <reference\|eq gsize-lin>> which defines
	171	<math\|s<around*\|(\|l<rsub\|<text\|in>>\|)>> as a function of
	172	<math\|l<rsub\|<text\|in>>> and parameters
	173	<math\|s<around*\|(\|l<rsub\|<text\|in>,0>\|)>> and <math\|l<rsub\|<text\|in>,0>>.
	174	The parameters define how quickly or slowly the quadratic equation grows.
	175	In other words, if a user wishes for a <math\|1<separating-space\|0.2em>000<separating-space\|0.2em>000>
	176	line input to be split into groups each containing, on average, <math\|25>
	177	lines, then they should plug in <math\|l<rsub\|<text\|in>,0>=1<separating-space\|0.2em>000<separating-space\|0.2em>000>
	178	and <math\|s<around*\|(\|l<rsub\|<text\|in>,0>\|)>=25> into <hgroup\|Equation
	179	<reference\|eq gsize-lin>> as is done in <hgroup\|Equation <reference\|eq
	180	gsize-ex-1>>. This equation can then be used to calculate target group
	181	sizes <math\|s> as a function of other input line counts
	182	<math\|l<rsub\|<text\|in>>> besides <math\|l<rsub\|<text\|in>>=1<separating-space\|0.2em>000<separating-space\|0.2em>000>.
	183	For example, plugging <math\|l<rsub\|<text\|in>>=500> into <hgroup\|Equation
	184	<reference\|eq gsize-ex-1>> yields <hgroup\|Equation <reference\|eq
	185	gsize-ex-1-lin500>> which specifies a target group size of
	186	<math\|5.85629\<cong\>6>.
	187
	188	<\eqnarray*>
	189	<tformat\|<table\|<row\|<cell\|s<around\|(\|l<rsub\|<text\|in>>\|)>>\|<cell\|=>\|<cell\|<around\|(\|<frac\|s<around\|(\|l<rsub\|<text\|in>,0>\|)>-1\|<around\|[\|ln
	190	<around\|(\|l<rsub\|<text\|in>,0>\|)>\|]><rsup\|2>>\|)>\<cdot\><around\|[\|<text\|ln><around\|(\|l<rsub\|<text\|in>>\|)>\|]><rsup\|2>+1>>\|<row\|<cell\|s<around\|(\|l<rsub\|<text\|in>>\|)>>\|<cell\|=>\|<cell\|<around\|(\|<frac\|25-1\|<around\|[\|<text\|ln
	191	><around\|(\|1<separating-space\|0.2em>000<separating-space\|0.2em>000\|)>\|]><rsup\|2>>\|)>\<cdot\><around\|[\|<text\|ln><around*\|(\|l<rsub\|<text\|in>>\|)>\|]><rsup\|2>+1<eq-number><label\|eq
	192	gsize-ex-1>>>\|<row\|<cell\|>\|<cell\|>\|<cell\|>>\|<row\|<cell\|s<around\|(\|l<rsub\|<text\|in>>=500\|)>>\|<cell\|=>\|<cell\|<around\|(\|<frac\|25-1\|<around*\|[\|<text\|ln
	193	><around\|(\|1<separating-space\|0.2em>000<separating-space\|0.2em>000\|)>\|]><rsup\|2>>\|)>\<cdot\><around\|[\|<text\|ln><around*\|(\|500\|)>\|]><rsup\|2>+1<eq-number><label\|eq
	194	gsize-ex-1-lin500>>>\|<row\|<cell\|s<around*\|(\|l<rsub\|<text\|in>>=500\|)>>\|<cell\|=>\|<cell\|5.85629>>>>
	195	</eqnarray*>
	196
	197	\;
	198
	199	<subsubsection\|Jump from expected value>
	200
	201	A method <shell\|bkshuf> may employ to decide when read the next sequential
	202	unconsumed input line is to simply do so with probability
	203	<math\|p<rsub\|<text\|seq>>> such that the expected value of the average group
	204	size trends towards <math\|s>.
	205
	206	<\eqnarray*>
	207	<tformat\|<table\|<row\|<cell\|p<rsub\|<text\|seq>>>\|<cell\|=>\|<cell\|<around\|(\|1-p<rsub\|<text\|jump>>\|)>>>\|<row\|<cell\|p<rsub\|<text\|jump>>>\|<cell\|=>\|<cell\|1-p<rsub\|<text\|seq>>>>\|<row\|<cell\|s>\|<cell\|=>\|<cell\|<frac\|1\|p<rsub\|<text\|jump>>>=<frac\|1\|1-p<rsub\|<text\|seq>>><eq-number>>>\|<row\|<cell\|s>\|<cell\|=>\|<cell\|<frac\|1\|1-p<rsub\|<text\|seq>>>>>\|<row\|<cell\|1-p<rsub\|<text\|seq>>>\|<cell\|=>\|<cell\|<frac\|1\|s>>>\|<row\|<cell\|p<rsub\|<text\|seq>>-1>\|<cell\|=>\|<cell\|<frac\|-1\|s>>>\|<row\|<cell\|p<rsub\|<text\|seq>>>\|<cell\|=>\|<cell\|1-<frac\|1\|s<around\|(\|l<rsub\|<text\|in>>\|)>><eq-number><label\|eq
	208	pseq-from-s-lin>>>\|<row\|<cell\|p<rsub\|<text\|jump>>>\|<cell\|=>\|<cell\|<frac\|1\|s<around*\|(\|l<rsub\|<text\|in>>\|)>><eq-number><label\|eq
	209	pjump-from-s-lin>>>\|<row\|<cell\|>\|<cell\|>\|<cell\|>>\|<row\|<cell\|p<rsub\|<text\|seq>><around\|(\|l<rsub\|<text\|in>>\|)>>\|<cell\|=>\|<cell\|1-<around\|[\|<around\|(\|<frac\|s<around\|(\|l<rsub\|<text\|in>,0>\|)>-1\|<around*\|[\|ln
	210	<around\|(\|l<rsub\|<text\|in>,0>\|)>\|]><rsup\|2>>\|)>\<cdot\><around\|[\|<text\|ln><around*\|(\|l<rsub\|<text\|in>>\|)>\|]><rsup\|2>+1\|]><rsup\|-1><eq-number><label\|eq
	211	pseq-from-s-lin-exp>>>>>
	212	</eqnarray*>
	213
	214	<subsubsection\|Jump from random variate of inverse gaussian distribution>
	215
	216	Another method <shell\|bkshuf> may employ to decide when to read the next
	217	sequential unconsumed input line is to use an inverse gaussian
	218	distribution. This may be done by generating from the distribution a float
	219	sampled from the inverse gaussian with range 0 to infinity with mean
	220	<math\|\<mu\>> whenever a new random position in the input list is selected;
	221	the float is rounded to the nearest integer.<\footnote>
	222	See <name\|Michael, John R.> "Generating Random Variates Using
	223	Transformations with Multiple Roots". 1976.
	224	<hlink\|https://doi.org/10.2307/2683801\|https://doi.org/10.2307/2683801> .
	225	</footnote> Then, after consuming an input line, this integer is
	226	decremented by one and another sequential line is consumed provided the
	227	integer does not become less than or equal to zero. The inverse gaussian
	228	distribution requires specifying the mean <math\|\<mu\>> and the shape
	229	parameter <math\|\<lambda\>>; a higher <math\|\<lambda\>> results in a
	230	greater spread. An upper bound may also be specified since the distribution
	231	has none except for that imposed by its programming implementation.
	232
	233	The result of using an inverse gaussian distribution is an output with
	234	potentially much more regular group sizes than using the previously
	235	mentioned expected value method. However, the implementation of the inverse
	236	gaussian sampling operation described by (Michael, 1976) requires several
	237	exponent calculations and a square root calculation in addition to various
	238	multiplication and division operations. If sufficient processing power is
	239	available, this may not necessarily be an issue.
	240
	241	<subsubsection\|Output structure>
	242
	243	Regardless of whether group sizes are determined by the expected value
	244	method or using variates of an inverse gaussian distribution, mimicking the
	245	<shell\|shuf> property of all input lines being present in the output,
	246	albeit rearranged, results in a side effect: the first output lines are
	247	more likely to contain groups with uninterrupted sequence runs (high ROP)
	248	while groups in the last output lines are almost certain to contain
	249	sequence jumps within a group (less ROP). The reason for this is that
	250	<shell\|bkshuf>, when it encounters an input line that has already been
	251	consumed, will skip to the next available input line. The decision could be
	252	made to skip to a new random line, but, it is simpler to simply read the
	253	next available input line. The author's original intention of sampling only
	254	a short initial portion of the output is compatible with the behavior that
	255	ROP is preserved mostly at the beginning of the output.
	256
	257	<section\|Version History>
	258
	259	<\big-table\|<tabular\|<tformat\|<table\|<row\|<cell\|Version
	260	No.>\|<cell\|Date>\|<cell\|Path>\|<cell\|Description>>\|<row\|<cell\|<verbatim\|0.0.1>>\|<cell\|2023-02-14>\|<cell\|<verbatim\|unitproc/bkshuf>>\|<cell\|Initial
	261	draft implemented in <name\|Bash>.>>>>>>
	262	A table listing versions of <shell\|bkshuf>.
	263	</big-table>
	264
	265	<\description>
	266	<item*\|v0.0.1>Initial implementation in <shell\|bash> <verbatim\|5.1.16>
	267	with <shell\|bc> <verbatim\|1.07.1> and <name\|GNU Coreutils>
	268	<verbatim\|8.32> and tested on <name\|Pop!_OS> <verbatim\|22.04 LTS>. Saved
	269	to the author's <name\|BK-2020-03> repository<\footnote>
	270	See commit <hlink\|<verbatim\|080ea4c>\|https://gitlab.com/baltakatei/baltakatei-exdev/-/blob/080ea4c0ff0d4e6b5ce86f664fa6645c1cb02bf0/unitproc/bkshuf>
	271	at <hlink\|https://gitlab.com/baltakatei/baltakatei-exdev\|https://gitlab.com/baltakatei/baltakatei-exdev>
	272	.
	273	</footnote>.
	274	</description>
	275	</body>
	276
	277	<initial\|<\collection>
	278	</collection>>
	279
	280	<\references>
	281	<\collection>
	282	<associate\|auto-1\|<tuple\|1\|1\|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
	283	<associate\|auto-10\|<tuple\|3.3.3\|5\|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
	284	<associate\|auto-11\|<tuple\|3.3.4\|5\|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
	285	<associate\|auto-12\|<tuple\|4\|5\|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
	286	<associate\|auto-13\|<tuple\|1\|5\|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
	287	<associate\|auto-14\|<tuple\|1\|?\|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
	288	<associate\|auto-2\|<tuple\|2\|1\|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
	289	<associate\|auto-3\|<tuple\|3\|2\|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
	290	<associate\|auto-4\|<tuple\|3.1\|2\|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
	291	<associate\|auto-5\|<tuple\|3.2\|2\|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
	292	<associate\|auto-6\|<tuple\|3.3\|2\|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
	293	<associate\|auto-7\|<tuple\|3.3.1\|2\|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
	294	<associate\|auto-8\|<tuple\|1\|3\|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
	295	<associate\|auto-9\|<tuple\|3.3.2\|4\|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
	296	<associate\|eq gsize-const-ksq\|<tuple\|5\|3\|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
	297	<associate\|eq gsize-ex-1\|<tuple\|7\|4\|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
	298	<associate\|eq gsize-ex-1-lin500\|<tuple\|8\|4\|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
	299	<associate\|eq gsize-lin\|<tuple\|6\|4\|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
	300	<associate\|eq gsize-model\|<tuple\|3\|3\|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
	301	<associate\|eq gsize-param-rel\|<tuple\|4\|3\|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
	302	<associate\|eq pjump-from-s-lin\|<tuple\|11\|?\|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
	303	<associate\|eq pseq-from-s-lin\|<tuple\|10\|?\|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
	304	<associate\|eq pseq-from-s-lin-exp\|<tuple\|12\|?\|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
	305	<associate\|eq rel-x-lin\|<tuple\|1\|3\|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
	306	<associate\|eq rel-x0-lin0\|<tuple\|2\|3\|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
	307	<associate\|fig ex-plot-s\|<tuple\|1\|3\|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
	308	<associate\|footnote-1\|<tuple\|1\|5\|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
	309	<associate\|footnote-2\|<tuple\|2\|5\|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
	310	<associate\|footnr-1\|<tuple\|1\|5\|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
	311	<associate\|footnr-2\|<tuple\|2\|5\|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
	312	<associate\|jump-to-random\|<tuple\|4\|2\|../../../../../wr/20230213..bkshuf_draft/src/doc.tm>>
	313	</collection>
	314	</references>
	315
	316	<\auxiliary>
	317	<\collection>
	318	<\associate\|figure>
	319	<tuple\|normal\|<\surround\|<hidden-binding\|<tuple>\|1>\|>
	320	A plot of a possible function that relates target group size
	321	<with\|mode\|<quote\|math>\|s> and input lines
	322	<with\|mode\|<quote\|math>\|l<rsub\|<with\|mode\|<quote\|text>\|in>>> that
	323	provide some ROP and PIR. The function is tuned to achieve
	324	<with\|mode\|<quote\|math>\|s<around*\|(\|l<rsub\|<with\|mode\|<quote\|text>\|in>>=10<rsup\|6>\|)>=25>.
	325	</surround>\|<pageref\|auto-8>>
	326	</associate>
	327	<\associate\|table>
	328	<tuple\|normal\|<\surround\|<hidden-binding\|<tuple>\|1>\|>
	329	A table listing versions of <with\|mode\|<quote\|prog>\|prog-language\|<quote\|shell>\|font-family\|<quote\|rm>\|bkshuf>.
	330	</surround>\|<pageref\|auto-13>>
	331	</associate>
	332	<\associate\|toc>
	333	<vspace*\|1fn><with\|font-series\|<quote\|bold>\|math-font-series\|<quote\|bold>\|1<space\|2spc>Summary>
	334	<datoms\|<macro\|x\|<repeat\|<arg\|x>\|<with\|font-series\|medium\|<with\|font-size\|1\|<space\|0.2fn>.<space\|0.2fn>>>>>\|<htab\|5mm>>
	335	<no-break><pageref\|auto-1><vspace\|0.5fn>
	336
	337	<vspace*\|1fn><with\|font-series\|<quote\|bold>\|math-font-series\|<quote\|bold>\|2<space\|2spc>Objective>
	338	<datoms\|<macro\|x\|<repeat\|<arg\|x>\|<with\|font-series\|medium\|<with\|font-size\|1\|<space\|0.2fn>.<space\|0.2fn>>>>>\|<htab\|5mm>>
	339	<no-break><pageref\|auto-2><vspace\|0.5fn>
	340
	341	<vspace*\|1fn><with\|font-series\|<quote\|bold>\|math-font-series\|<quote\|bold>\|3<space\|2spc>Design>
	342	<datoms\|<macro\|x\|<repeat\|<arg\|x>\|<with\|font-series\|medium\|<with\|font-size\|1\|<space\|0.2fn>.<space\|0.2fn>>>>>\|<htab\|5mm>>
	343	<no-break><pageref\|auto-3><vspace\|0.5fn>
	344
	345	<with\|par-left\|<quote\|1tab>\|3.1<space\|2spc>Definitions
	346	<datoms\|<macro\|x\|<repeat\|<arg\|x>\|<with\|font-series\|medium\|<with\|font-size\|1\|<space\|0.2fn>.<space\|0.2fn>>>>>\|<htab\|5mm>>
	347	<no-break><pageref\|auto-4>>
	348
	349	<with\|par-left\|<quote\|1tab>\|3.2<space\|2spc>Process
	350	<datoms\|<macro\|x\|<repeat\|<arg\|x>\|<with\|font-series\|medium\|<with\|font-size\|1\|<space\|0.2fn>.<space\|0.2fn>>>>>\|<htab\|5mm>>
	351	<no-break><pageref\|auto-5>>
	352
	353	<with\|par-left\|<quote\|1tab>\|3.3<space\|2spc>Parameter analysis
	354	<datoms\|<macro\|x\|<repeat\|<arg\|x>\|<with\|font-series\|medium\|<with\|font-size\|1\|<space\|0.2fn>.<space\|0.2fn>>>>>\|<htab\|5mm>>
	355	<no-break><pageref\|auto-6>>
	356
	357	<with\|par-left\|<quote\|2tab>\|3.3.1<space\|2spc>Target group size
	358	calculation <datoms\|<macro\|x\|<repeat\|<arg\|x>\|<with\|font-series\|medium\|<with\|font-size\|1\|<space\|0.2fn>.<space\|0.2fn>>>>>\|<htab\|5mm>>
	359	<no-break><pageref\|auto-7>>
	360
	361	<with\|par-left\|<quote\|2tab>\|3.3.2<space\|2spc>Jump from expected value
	362	<datoms\|<macro\|x\|<repeat\|<arg\|x>\|<with\|font-series\|medium\|<with\|font-size\|1\|<space\|0.2fn>.<space\|0.2fn>>>>>\|<htab\|5mm>>
	363	<no-break><pageref\|auto-9>>
	364
	365	<with\|par-left\|<quote\|2tab>\|3.3.3<space\|2spc>Jump from random variate
	366	of inverse gaussian distribution <datoms\|<macro\|x\|<repeat\|<arg\|x>\|<with\|font-series\|medium\|<with\|font-size\|1\|<space\|0.2fn>.<space\|0.2fn>>>>>\|<htab\|5mm>>
	367	<no-break><pageref\|auto-10>>
	368
	369	<with\|par-left\|<quote\|2tab>\|3.3.4<space\|2spc>Output structure
	370	<datoms\|<macro\|x\|<repeat\|<arg\|x>\|<with\|font-series\|medium\|<with\|font-size\|1\|<space\|0.2fn>.<space\|0.2fn>>>>>\|<htab\|5mm>>
	371	<no-break><pageref\|auto-11>>
	372
	373	<vspace*\|1fn><with\|font-series\|<quote\|bold>\|math-font-series\|<quote\|bold>\|4<space\|2spc>Version
	374	History> <datoms\|<macro\|x\|<repeat\|<arg\|x>\|<with\|font-series\|medium\|<with\|font-size\|1\|<space\|0.2fn>.<space\|0.2fn>>>>>\|<htab\|5mm>>
	375	<no-break><pageref\|auto-12><vspace\|0.5fn>
	376	</associate>
	377	</collection>
	378	</auxiliary>