
Intelligence Services

PDGREPPE Quantity or Repetitions

A quantity is a number of matches to a pattern segment.

 <previous>+	one or more of <previous> atom or group
 <previous>*	any or zero of <previous> atom or group
 <previous>?	one or zero of <previous> atom or group

 <previous>!	nil or zero of <previous> atom or group

 <previous>-	one or more bytes NOT <previous> atom or group
 <previous>/	any or zero bytes NOT <previous> atom or group
 <previous>;	one or zero bytes NOT <previous> atom or group

A previous pattern is either a group OR the smallest PREVIOUS
piece (atom) of pattern.

Use of PREVIOUS data is the traditional post-fix way of
specifying REPEATED parts of a pattern in Regular Expressions.

The smallest previous chunk of pattern can be a single character
match pattern like "x", ".", "[\a]", or "\a", or a
multi-character match pattern like "\N" for the PC-DOS newline
sequence, explained later, or a position pattern like "`", "'",
"<", ">", "^" or "$".

A previous pattern can also be a group or entire expression with
the {<group>} or (<group>) nomenclature that can contain many
small chunks within itself as a conglomerate expression.

With <previous>[-/;?*+] operators the match returned will be the
match with the GREATEST allowable width.  The ! no match
operator has no width since no match has no width.

Also a match to a quantity of <previous> areas only succeeds if
the last match of a <previous> area comes at an end of file
(EOF) or before the end of the data area being surveyed.

The DEFAULT maximum span length for an ENTIRE pattern match,
including repeats, is 1024 characters.

This can be shortened or lengthened by the PDGREPPE Window-Width
option "-w<d>".  If the match length of a <previous> area BEYOND
THE FIRST MATCH exceeds this length of "<d>" characters,
NO-match occurs.  Thus the number of Repeat matches can be limited
to an exact amount by this Window-Width option or by the Repeat
Range specification "<prev>[-/;?*+]{min,max}", to be explained
later.

Also, since a text file is similar to a database whose records
are its LINES, a Magic search usually confines any results found
by some of its special operators, like Repeat and ANY (.)
factors, to a SINGLE LINE.

This means that areas of text that span more than one line will
NOT be easily found UNLESS a special option is used:

-x	repeat (/,-,;,?,*,+) and any (.) can X (cross) at newline

This allows searches that use Quantity to easily match across
the DOS new-line.

See PDGREPPE Options for more info.
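
As a rough analogy only, the single-line default resembles the
way "." stops at a newline in Python's "re" module unless the
DOTALL flag is given.  Python's "re" is not PDGREPPE and its
syntax differs; the sample text below is made up for
illustration.

```python
import re

# Made-up sample data: two lines of text.
data = "alpha beta\ngamma delta"

# By default "." does not match a newline, so the repeat stays on one line.
single_line = re.search(r"beta.*gamma", data)

# With DOTALL, "." may cross the newline (loosely comparable to -x).
cross_line = re.search(r"beta.*gamma", data, re.DOTALL)

print(single_line)                 # None: the match would need two lines
print(repr(cross_line.group(0)))   # the span from "beta" through "gamma"
```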

So here are the Repeat Operators from

	+ * and ?	(positive repeat)

on through

	! ; / and -	(negative repeat)

+ Operator:

<previous>+	one or more of <previous> atom or group

This One-Or-More repeat operator matches continuous sequences:

	c+

will match

	c

or

	cc

or

	ccc

or

	ccc...


	pdgreppe -Hjc "el+" file_id.diz


..."el+" finds any "e" in file_id.diz followed by "l+", ONE OR
MORE "l" characters.

* Operator:

<previous>*	any or zero of <previous> atom or group

This Zero-Or-More operator matches no sequence or continuous
sequences:

	c*

will match NOTHING

or

	c

or

	cc

or

	ccc...


	pdgreppe -Hjc "el*" file_id.diz


..."el*" finds any "e" in file_id.diz followed by "l*",
ZERO OR MORE "l" characters.

? Operator:

<previous>?	one or zero of <previous> atom or group

This Zero-Or-One operator matches NO sequence (Zero) or a single
sequence:

	c?

will match NOTHING

or

	c

but NOT the first c of

	cc

and NOT the first two c's of

	ccc

Zero or One Repetition ? ...


	pdgreppe -Hjc "el?" file_id.diz


..."el?" finds any "e" in file_id.diz followed by "l?", ZERO OR
JUST ONE "l" character.  (NO "ell"... with 2 or more l's).
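
For readers who know other regular expression engines, the three
positive repeats can be compared with Python's "re" module.  This
is only a sketch on made-up words, not PDGREPPE output.  Note one
difference: Python's "el?" happily matches "el" inside "ell",
whereas the text above says PDGREPPE does not.  The negative
lookahead "(?!l)" in the last pattern is an assumption about how
that stricter behaviour could be approximated.

```python
import re

# Made-up sample words; PDGREPPE itself is not involved here.
words = ["he", "hel", "hell"]

def matched(pattern):
    """Return the text the pattern matches in each sample word (or None)."""
    results = []
    for word in words:
        m = re.search(pattern, word)
        results.append(m.group(0) if m else None)
    return results

one_or_more = matched(r"el+")              # needs at least one "l"
zero_or_more = matched(r"el*")             # any number of "l", including none
zero_or_one = matched(r"el?")              # Python takes "el" even inside "ell"
strict_zero_or_one = matched(r"el?(?!l)")  # refuses a run of 2 or more "l"

print(one_or_more)         # [None, 'el', 'ell']
print(zero_or_more)        # ['e', 'el', 'ell']
print(zero_or_one)         # ['e', 'el', 'el']
print(strict_zero_or_one)  # ['e', 'el', None]
```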

! Operator:

<previous>!	nil or zero of <previous> atom or group

This Zero-Or-NOT operator matches NO sequence (Zero)

	c!

will match an empty area or NOTHING

but NOT at any c in

	c or cc or ccc...

The pattern <pat>! is tested against the current area, and if it
does NOT match, then the search PASSES this point and continues
with more matching on any further pattern.

No further advance in data by the Not operator ! is made.

So if it were desired to test using Not like "\a!", then for
this test to match a character and then move on to match more
data, the pattern would have to be something like "\a!." because
the Any factor "." would cause the pattern search to move
one more character beyond the Not-matched area.

A single-character alternative to the NOT operator ! that does
advance by one character is the negative character class
[^<set>].  A pattern like "\a!." is equivalent to "[^a]" and takes
up less memory space, but is SLOWER for searching than "[^a]".
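
The zero-width nature of the Not test can be illustrated with a
negative lookahead in Python's "re" module.  This is an analogy
under stated assumptions, not PDGREPPE syntax: a literal "a" and a
literal "s" stand in for the patterns, and the sample strings are
made up.

```python
import re

# Made-up sample text for illustration.
text = "test eel edge"

# Zero-width "not followed by s" test: only the "e" itself is consumed.
not_s = [m.start() for m in re.finditer(r"e(?!s)", text)]

# "Not a, then any one character" versus a negated class:
# both pick out the same single characters.
via_lookahead = re.findall(r"(?!a).", "banana", re.DOTALL)
via_class = re.findall(r"[^a]", "banana")

print(not_s)          # [5, 6, 9, 12]
print(via_lookahead)  # ['b', 'n', 'n']
print(via_class)      # ['b', 'n', 'n']
```

The lookahead alone never advances; it is the following "." that
moves on by one character, just as described for "\a!." above.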


	pdgreppe -Hjc "es!" file_id.diz


..."es!" finds any "e" in file_id.diz followed by "s!", NOT an
"s" character.  (NO "es"... with 1 or more s's).

NOTE : In later DOS CMD.EXE shells, it may be necessary to precede
"!" with a caret ^ character to escape shell interpretation.

- Operator:

<previous>-	one or more bytes NOT <previous> atom or group

This One-Or-More NOT Repeat operator matches ONE or MORE
continuous bytes that are NOT equal to the Repeat Pattern:

This is the equivalent of

{<something>{0}.}+{1,}

Example:

	s-

will match and advance the search pointer for each X in

	Xs

or

	XXs

or

	XXXs

but will NOT match any of

	s or ss or sss...


	pdgreppe -1 -Hjc "s-" file_id.diz


..."s-" finds ONE OR MORE leading bytes that are NOT a "s" in
file_id.diz up to the first place where there IS a "s" or an end
of line.

If the '-' quantity does not find any match at EOF or
End-of-Line (EOL) then it will remain there because there are NO
more bytes after an EOF or EOL marker.

/ Operator:

<previous>/	any or zero bytes NOT <previous> atom or group

This Zero-Or-More NOT Repeat operator matches continuous bytes
or NO bytes that are NOT equal to the Repeat Pattern:

This is the equivalent of

{<something>{0}.}+{0,}

Example:

	s/

will match Zero or more bytes before any area that has or does
not have a "s".

If the pattern is not matched, then NO advance by one byte is
made, and the pattern matcher stays at the initial point.


	pdgreppe -1 -Hjc "s/" file_id.diz


..."s/" finds some leading bytes or NO leading bytes in
file_id.diz up to the first "s".

; Operator:

<previous>;	one or zero bytes NOT <previous> atom or group

This Zero-Or-One NOT repeat operator matches Zero bytes or JUST
a single byte (One) that is NOT equal to the Repeat Pattern:

This is the equivalent of

{<something>{0}.}+{0,1}

Example:

	s;

will match any position or a single byte that comes before a possible
"s" character,

or the point before s in

	s

or the X before s in

	Xs

but not the first "X" of "XX" in

	XXs


	pdgreppe -1 -Hjc "s;" file_id.diz


..."s;" finds some areas in file_id.diz followed by "s;",
ZERO OR just ONE NOT "s" character.

If the pattern is not matched, then NO advance by one byte is
made.

In summary the NOT Repeat operators of

	"-/;"

are almost exactly opposite the corresponding

	"+*?"

Repeat operators.
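
The "not <something>, then any one byte" equivalence given for the
NOT Repeat operators can be sketched in Python's "re" module with
a negative lookahead.  This is a hedged illustration only: the
helper name not_then_any is hypothetical, its pattern argument is
assumed to be a valid Python regular expression, and Python's
greedy matching is not guaranteed to agree with PDGREPPE in every
case (for instance, the Python form of "s;" will still take the
first X of "XXs").

```python
import re

def not_then_any(pattern, lo, hi=""):
    """Build '(not <pattern>, then any one char)' repeated lo..hi times."""
    return re.compile(r"(?:(?!%s).){%s,%s}" % (pattern, lo, hi), re.DOTALL)

one_or_more_not = not_then_any("s", 1)     # loosely like  s-
zero_or_more_not = not_then_any("s", 0)    # loosely like  s/
zero_or_one_not = not_then_any("s", 0, 1)  # loosely like  s;

for sample in ["XXs", "Xs", "s"]:
    first = one_or_more_not.match(sample)
    print(sample,
          repr(first.group(0)) if first else None,
          repr(zero_or_more_not.match(sample).group(0)),
          repr(zero_or_one_not.match(sample).group(0)))
```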

Repeat Patterns with LIMITS of the form

	<SomePattern>*{<n>,<m>}

give exact numerical boundaries to Quantity tests.

 <previous>*{<n>}	exactly	 <n>	    matches of <previous>
 <previous>*{<n>,<x>}   between	 <n> to <x> matches of <previous>
 <previous>*{<n>,}	at least <n>	    matches of <previous>
 <previous>*{,<n>}	zero to	 <n>	    matches of <previous>

 <previous>+{<n>,<x>}	same as <previous>*{<n>,<x>}
 <previous>?{<n>,<x>}	same as <previous>*{<n>,<x>}

 <previous>-{<n>,<m>}   like <previous>*{<n>,<m>}, if NOT match, advance bytes
 <previous>/{<n>,<m>}   like <previous>*{<n>,<m>}, if NOT match, advance bytes
 <previous>;{<n>,<m>}   like <previous>*{<n>,<m>}, if NOT match, advance bytes

 <previous>!{<n>,<m>}   if <n> or  <m> NOT ZERO, do NOT match, like <previous>!
 <previous>!{<n>,<m>}   if <n> and <m> are ZERO, just match one time (also -/;)

With patterns like

<previous>*{<min>,<max>}

<min> and <max> MUST BE

	between 0

	AND

	a maximum of 65535

and the format must exactly follow one of the 4 forms:

	{<min>} or {<min>,<max>} or {<min>,} or {,<max>}

otherwise an error will result.

PDGREPPE will automatically insert the Maximum of 65535 for <max>
if the form is {<min>,} or will insert the Minimum of 0 for <min>
if the form is {,<max>}.
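
The four accepted forms and their defaults can be sketched as a
small Python parser.  This is not PDGREPPE's own code: the
function name parse_repeat_range is hypothetical, and whether
PDGREPPE also rejects a <min> greater than <max> is not stated
here, so that case is left unchecked.

```python
import re

MAX_REPEAT = 65535  # upper limit stated in the text

# Either {n} or {n,m} with one side of the comma optionally empty.
_FORMS = re.compile(r"\{(?:(\d+)|(\d*),(\d*))\}")

def parse_repeat_range(spec):
    """Return (min, max) for '{n}', '{n,m}', '{n,}' or '{,m}'; raise otherwise."""
    m = _FORMS.fullmatch(spec)
    if m is None or m.group(0) == "{,}":
        raise ValueError("not one of the four forms: %r" % spec)
    if m.group(1) is not None:
        lo = hi = int(m.group(1))                            # {n}: exactly n
    else:
        lo = int(m.group(2)) if m.group(2) else 0            # {,m}: min is 0
        hi = int(m.group(3)) if m.group(3) else MAX_REPEAT   # {n,}: max is 65535
    if lo > MAX_REPEAT or hi > MAX_REPEAT:
        raise ValueError("limits must lie between 0 and %d" % MAX_REPEAT)
    return lo, hi

print(parse_repeat_range("{3}"))    # (3, 3)
print(parse_repeat_range("{2,3}"))  # (2, 3)
print(parse_repeat_range("{3,}"))   # (3, 65535)
print(parse_repeat_range("{,3}"))   # (0, 3)
```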

These {<min>,<max>} repeat forms always test that at least the <min>
MINimum number of repeats is present.

However the <max> MAXimum limit can be complicated.  Normally if
there are more than the <max> number of repeats of something then
the repeat test will fail.

However, there can be further repeats of the same thing beyond the
<max> counted ones, provided they are covered by later matching in
the pattern, which then helps to limit the repeats to <max>.

This is because each and every match repeat depends on what happens
to the ENTIRE remainder of the search pattern!

So if the pattern and data areas beyond a section of repeat matching
become limited, then that will reflect upon the repeat matching and
numbers of repeat matches will be limited.

For example, if some data is 7 x's

	xxxxxxx

then it is quite possible for repeat patterns such as

	 x+{2} and x+{3}

of

	x+{2}x+{3}x+{2}

to PASS even though they match at areas of the 7 x's where
there are more than 2 and 3 x's in sequence!

Each time a repeat test done by the first x+{2} pattern passes,
a thorough check is made to determine if the next x+{3}x+{2}
patterns also match.  If they don't then the repeat matching
continues on up the x's until finally all three of the x+{2},
x+{3}, and x+{2} patterns PASS in succession.

This means that the numeric {Min,Max} parameters to Repeat
operators can be used to select certain fields in a list,
sequence or range, as shown above.

Example:

	(x+{3})=first3(x+{1})=middle1(x+{3})=last3

This would allow (Marked Groups) operators to pick the first3,
middle1 and last3 x's of the 7 x's.  (Where there are exactly 7 x's
and no more.)  This can be very useful for picking and choosing
in a field of similar consecutive parameters.
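
The same field-picking idea can be sketched with named groups in
Python's "re" module.  This is an analogy, not PDGREPPE syntax:
the group names are taken from the example above, "fullmatch" is
used to stand in for the "exactly 7 x's and no more" condition,
and Python's "x{3}" has no "+" or "*" in front of the braces.

```python
import re

data = "x" * 7  # the seven x's from the text

# Three bounded repeats that must together account for all seven x's.
fields = re.fullmatch(
    r"(?P<first3>x{3})(?P<middle1>x{1})(?P<last3>x{3})", data)

print(fields.span("first3"))   # (0, 3)
print(fields.span("middle1"))  # (3, 4)
print(fields.span("last3"))    # (4, 7)

# With one x too many the whole pattern fails, so no field is selected.
too_long = re.fullmatch(r"(x{3})(x{1})(x{3})", "x" * 8)
print(too_long)                # None
```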

Other:

To use a pattern like "{min,max}" EXACTLY AS IT IS for an {unmarked
group pattern}, use the Separator operator "," like

	<previous>*,{min,max}

to separate a pattern like

	"<previous>*"

from a group like

	"{min,max}"

of a numeric {min,max} form that otherwise would have been
interpreted as one of the number forms.  More on the Separator
"," operator later.

A repeat sequence that searches for the last match can use the
before/after cut Control operators ":b" and ":a" to limit the
match area to the last match in the sequence.  This is a way to
select a last area in a repeated sequence.

Examples of *{<min>,<max>}:


	pdgreppe -Hjc "\C*{3}" file_id.diz


..."\C*{3}" finds any sequence of EXACTLY 3 Consonants "\C" from
[bcdfghjklmnpqrstvwxzB-DF-HJ-NP-TV-XZ].

Consonant areas less than or greater than 3 are ignored.


	pdgreppe -Hjc "\u*{2,3}" file_id.diz


..."\u*{2,3}" finds sequences of 2 to 3 Upper-case "\u" letters.

Areas with less than 2 or greater than 3 Upper-case "\u" letters
are ignored.
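
The rule that shorter and longer runs are ignored can be
approximated in Python's "re" module by guarding a bounded repeat
with lookarounds.  This is a sketch under assumptions: the
consonant set is the one quoted above, "[A-Z]" stands in for "\u",
and the sample text is made up.

```python
import re

CONSONANTS = "bcdfghjklmnpqrstvwxzB-DF-HJ-NP-TV-XZ"  # set quoted in the text

# Exactly 3 consonants: the run must not be extended on either side.
exactly3 = re.compile(r"(?<![%s])[%s]{3}(?![%s])" % ((CONSONANTS,) * 3))

# 2 to 3 upper-case letters, again refusing longer runs.
upper2to3 = re.compile(r"(?<![A-Z])[A-Z]{2,3}(?![A-Z])")

sample = "script FTP HTTPS at"  # made-up text

print(exactly3.findall(sample))   # ['scr', 'FTP']
print(upper2to3.findall(sample))  # ['FTP']
```

Without the lookarounds, Python would match the first three
letters of a longer run such as "HTTPS", which is not what the
description above calls for.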


	pdgreppe -1 -Hjc "<\u*{,3}" file_id.diz


..."<\u*{,3}" finds some areas with 0 to 3 Upper case "\u"
letters at the start of any word "<".

Upper case letter sequences longer than 3 repetitions are NOT
matched.


	pdgreppe -Hjc "<\u*{3,}" file_id.diz


..."<\u*{3,}" finds 3 or more Upper case "\u" letters at the
start of any word "<".

Upper case letter sequences less than 3 repetitions are NOT
matched.

But with the ONE or MORE bytes NOT operator "-"...


	pdgreppe -Hjc "<\u-{3,}" file_id.diz


..."<\u-{3,}" finds 3 or more bytes that are NOT Upper case "\u"
letters at the start of any word "<".

For Quantity operators "/-;!?+*", at least two rules must be
followed:

<previous>	as :[abci] OR :f[<EMPTY-file>] matches ONE or MORE TIMES

If the repeat pattern <previous> contains just a combination of
:a, :b, :c, :i or an :f[<EMPTY-file>] that measure for matching
anywhere, then the match succeeds just ONE TIME for ANY
current data point.

Also, POSITIONS, like any of "`'^$<>", to be explained in the
next section, MATCH ANY NUMBER OF TIMES if found, so using
patterns like

	^*{1,}
	^*{65535,}
	^*{2,3}
	^*{,70}

will pass at the start of any line (^) each and every time for any
non-zero {<min>,<max>}.

Other patterns like

	^+
	^*
and
	^?

will also pass at a SOL.

But patterns like

	^!
	^-

will NOT pass at a SOL since they rely on a definite NO-MATCH
condition.

Stacked Repeat operators like

	<pat>++

or

	<pat>+-

are equivalent to

	<pat>+{}+

or

	<pat>+{}-

where the second or further stacked operation is done on the
Empty Pattern {} that matches EVERY point in the data.



© Intelligence Services 1987 - 2008   GPO Box 9,   ADELAIDE SA 5001,   AUSTRALIA
EMAIL   :   intlsvs@gmail.com