[ < < < Home ]
[ < < Reference Start ]
[ < Reference Contents ]
[ < Previous=PDGREPPE Groups ]
[ Next=PDGREPPE Position > ]
A quantity is a number of matches to a pattern segment.

  <previous>+   one or more of <previous> atom or group
  <previous>*   any or zero of <previous> atom or group
  <previous>?   one or zero of <previous> atom or group
  <previous>!   nil or zero of <previous> atom or group
  <previous>-   one or more bytes NOT <previous> atom or group
  <previous>/   any or zero bytes NOT <previous> atom or group
  <previous>;   one or zero bytes NOT <previous> atom or group

A previous pattern is either a group OR the smallest PREVIOUS piece (atom) of pattern. Use of PREVIOUS data is the traditional post-fix way of specifying REPEATED parts of a pattern in Regular Expressions.

The smallest previous chunk of pattern can be a single character match pattern like "x", ".", "[\a]", or "\a", or a multi-character match pattern like "\N" for the PC-DOS newline sequence, explained later, or a position pattern like "`", "'", "<", ">", "^" or "$". A previous pattern can also be a group or entire expression with the {<group>} or (<group>) nomenclature that can contain many small chunks within itself as a conglomerate expression.

With <previous>[-/;?*+] operators the match returned will be the match with the GREATEST allowable width. The ! no match operator has no width since no match has no width. Also a match to a quantity of <previous> areas only succeeds if the last match of a <previous> area comes at an end of file (EOF) or before the end of the data area being surveyed.

The DEFAULT maximum span length for an ENTIRE pattern match, including repeats, is 1024 characters. This can be shortened or lengthened by the PDGREPPE Window-Width option "-w<d>". If the match length of a <previous> area BEYOND THE FIRST MATCH exceeds this length of "<d>" characters, NO-match occurs. Thus a number of Repeat matches can be limited to an exact amount by this Window-Width option or by the Repeat Range specification "<prev>[-/;?*+]{min,max}", to be explained later.
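For readers more used to conventional regex engines, some of the operators summarised above can be approximated in Python. This is only a sketch under stated assumptions: it uses Python's "re" module, not PDGREPPE, and the helper names "approx" and "match_at_start" are hypothetical. The NOT Repeat forms are modelled as "a point where the atom does not match, followed by one byte", repeated. The "?" and ";" operators are left out because a plain Python "?" does not reject longer runs. Window-Width and EOF/EOL handling are not modelled.

```python
import re

def approx(atom, op):
    """Return a Python regex that roughly mirrors <atom><op>.

    `atom` must already be a valid Python regex fragment
    (an assumption of this sketch, not a PDGREPPE rule).
    """
    # One byte taken at a point where the atom does NOT match.
    not_atom = "(?:(?!%s).)" % atom
    table = {
        "+": "(?:%s)+" % atom,   # one or more of atom
        "*": "(?:%s)*" % atom,   # zero or more of atom
        "!": "(?!%s)" % atom,    # zero width: passes only where atom fails
        "-": not_atom + "+",     # one or more bytes NOT atom
        "/": not_atom + "*",     # zero or more bytes NOT atom
    }
    return table[op]

def match_at_start(atom, op, data, cross_lines=False):
    """Widest match of <atom><op> at the start of data, or None.

    cross_lines=True lets "." cross a newline (re.DOTALL); without it
    the NOT Repeat forms stop at the end of the line.
    """
    flags = re.DOTALL if cross_lines else 0
    m = re.match(approx(atom, op), data, flags)
    return None if m is None else m.group(0)

print(match_at_start("c", "+", "cccd"))   # ccc
print(match_at_start("s", "-", "XXs"))    # XX
print(match_at_start("s", "-", "sss"))    # None
```

Python's repeat operators are greedy, so the widest match is returned, which is in line with the GREATEST allowable width rule stated above.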
Also, since each LINE of data in a text file is similar to a database with its records being like LINES, a Magic search usually confines any results found by some of its special operators like Repeat and ANY (.) factors to a SINGLE LINE. This means that areas of text that overlap lines will NOT be easily found UNLESS a special option is used:

  -x   repeat (/,-,;,?,*,+) and any (.) can X (cross) at newline

This allows searches that use Quantity to easily match across the DOS new-line. See PDGREPPE Options for more info.

So here are the Repeat Operators, from + * and ? (positive repeat) on through ! ; / and - (negative repeat).

+ Operator: <previous>+   one or more of <previous> atom or group

This One-Or-More repeat operator matches continuous sequences:

  c+ will match c or cc or ccc or cccc...

  pdgreppe -Hjc "el+" file_id.diz

..."el+" finds any "e" in file_id.diz followed by "l+", ONE OR MORE "l" characters.

* Operator: <previous>*   any or zero of <previous> atom or group

This Zero-Or-More operator matches no sequence or continuous sequences:

  c* will match NOTHING or c or cc or ccc...

  pdgreppe -Hjc "el*" file_id.diz

..."el*" finds any "e" in file_id.diz followed by "l*", ZERO OR MORE "l" characters.

? Operator: <previous>?   one or zero of <previous> atom or group

This Zero-Or-One operator matches NO sequence (Zero) or a single sequence:

  c? will match NOTHING or c but NOT the first c of cc and NOT the first two c's of ccc

  pdgreppe -Hjc "el?" file_id.diz

..."el?" finds any "e" in file_id.diz followed by "l?", ZERO OR JUST ONE "l" character. (NO "ell"... with 2 or more l's).

! Operator: <previous>!   nil or zero of <previous> atom or group

This Zero-Or-NOT operator matches NO sequence (Zero):

  c! will match an empty area or NOTHING but NOT at any c in c or cc or ccc...

The pattern <pat>! is tested against the current area, and if it does NOT match, then the search PASSES this point and continues with more matching on any further pattern.
No further advance in data is made by the Not operator !. So if it were desired to test using Not like "\a!", then for this test to match a character and then move on to match more data, the pattern would have to be something like "\a!." because the Any factor "." would cause the pattern search to move one more character beyond the Not-matched area.

A single character alternative to the NOT operator ! that does advance by one character is the negative character class [^<set>]. A pattern like "\a!." is equivalent to "[^a]"; it takes up less memory space but is SLOWER for searching than "[^a]".

  pdgreppe -Hjc "es!" file_id.diz

..."es!" finds any "e" in file_id.diz followed by "s!", NOT an "s" character. (NO "es"... with 1 or more s's).

NOTE : In later DOS CMD.EXE shells, it may be necessary to precede "!" with a caret ^ character to escape shell interpretation.

- Operator: <previous>-   one or more bytes NOT <previous> atom or group

This One-Or-More NOT Repeat operator matches ONE or MORE continuous bytes that are NOT equal to the Repeat Pattern. This is the equivalent of {<something>{0}.}+{1,}

Example: s- will match and advance the search pointer for each X in Xs or XXs or XXXs but will NOT match any of s or ss or sss...

  pdgreppe -1 -Hjc "s-" file_id.diz

..."s-" finds ONE OR MORE leading bytes that are NOT an "s" in file_id.diz up to the first place where there IS an "s" or an end of line. If the '-' quantity does not find any match at EOF or End-of-Line (EOL) then it will remain there because there are NO more bytes after an EOF or EOL marker.

/ Operator: <previous>/   any or zero bytes NOT <previous> atom or group

This Zero-Or-More NOT Repeat operator matches continuous bytes or NO bytes that are NOT equal to the Repeat Pattern. This is the equivalent of {<something>{0}.}+{0,}

Example: s/ will match Zero or more bytes before any area that has or does not have an "s".
If the pattern is not matched, then NO advance by one byte is made, and the pattern matcher stays at the initial point.

  pdgreppe -1 -Hjc "s/" file_id.diz

..."s/" finds some leading bytes or NO leading bytes in file_id.diz up to the first "s".

; Operator: <previous>;   one or zero bytes NOT <previous> atom or group

This Zero-Or-One NOT repeat operator matches Zero bytes or JUST a single byte (One) that is NOT equal to the Repeat Pattern. This is the equivalent of {<something>{0}.}+{0,1}

Example: s; will match any position or a single byte that comes before a possible "s" character: the point before s in s, or the X before s in Xs, but not the first "X" of "XX" in XXs.

  pdgreppe -1 -Hjc "s;" file_id.diz

..."s;" finds some areas in file_id.diz followed by "s;", ZERO OR just ONE NOT "s" character. If the pattern is not matched, then NO advance by one byte is made.

In summary, the NOT Repeat operators "-/;" are almost exactly opposite to the corresponding "+*?" Repeat operators.

Repeat Patterns with LIMITS of the form <SomePattern>*{<n>,<m>} give exact numerical boundaries to Quantity tests.

  <previous>*{<n>}       exactly <n> matches of <previous>
  <previous>*{<n>,<x>}   between <n> and <x> matches of <previous>
  <previous>*{<n>,}      at least <n> matches of <previous>
  <previous>*{,<n>}      zero to <n> matches of <previous>
  <previous>+{<n>,<x>}   same as <previous>*{<n>,<x>}
  <previous>?{<n>,<x>}   same as <previous>*{<n>,<x>}
  <previous>-{<n>,<m>}   like <previous>*{<n>,<m>}, if NOT match, advance bytes
  <previous>/{<n>,<m>}   like <previous>*{<n>,<m>}, if NOT match, advance bytes
  <previous>;{<n>,<m>}   like <previous>*{<n>,<m>}, if NOT match, advance bytes
  <previous>!{<n>,<m>}   if <n> or <m> NOT ZERO, do NOT match, like <previous>!
  <previous>!{<n>,<m>}   if <n> and <m> are ZERO, just match one time (also -/;)

With patterns like <previous>*{<min>,<max>}, <min> and <max> MUST BE between 0 AND a maximum of 65535, and the format must exactly follow one of the 4 forms:

  {<min>}   or   {<min>,<max>}   or   {<min>,}   or   {,<max>}

otherwise an error will result. PDGREPPE will automatically insert the Maximum of 65535 for <max> if the form is {<min>,} or will insert the Minimum of 0 for <min> if the form is {,<max>}.

These {<min>,<max>} repeat forms definitely test for there being the <min> MINimum number of repeats. However the <max> MAXimum limit can be complicated. Normally, if there are more than the <max> number of repeats of something, then the repeat test will fail. However it is possible that the repeats beyond the <max> count are themselves covered by further matching later in the pattern, and that later matching is what limits the repeats to <max>. This is because each and every repeat match depends on what happens to the ENTIRE remainder of the search pattern! So if the pattern and data areas beyond a section of repeat matching become limited, then that will reflect upon the repeat matching, and the number of repeat matches will be limited.

For example, if some data is 7 x's

  xxxxxxx

then it is quite possible for repeat patterns such as x+{2} and x+{3} of

  x+{2}x+{3}x+{2}

to PASS even though they match at areas of the 7 x's where there are more than 2 and 3 x's in sequence! Each time a repeat test done by the first x+{2} pattern passes, a thorough check is made to determine if the next x+{3}x+{2} patterns also match. If they don't, then the repeat matching continues on up the x's until finally all three of the x+{2}, x+{3}, and x+{2} patterns PASS in succession.

This means that the numeric {Min,Max} parameters to Repeat operators can be used to match and select certain fields in a list, sequence or range as shown above.
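The 7 x's case above can be tried out in a conventional regex engine. The sketch below uses Python's "re" module, not PDGREPPE, so it only approximates the behaviour described; "x{2}" is the Python spelling assumed here for "x+{2}", and the helper name "split_runs" is hypothetical. It shows that each counted repeat is accepted only when the rest of the pattern also matches the rest of the data.

```python
import re

def split_runs(data, counts, atom="x"):
    """Split data into consecutive runs of atom with the given exact counts.

    Returns a list of (start, end) spans, one per run, or None when the
    runs cannot cover the whole of data.
    """
    # Build one capturing group per counted repeat, e.g. ((?:x){2}).
    pattern = "".join("((?:%s){%d})" % (atom, n) for n in counts)
    m = re.fullmatch(pattern, data)
    if m is None:
        return None
    return [m.span(i) for i in range(1, len(counts) + 1)]

print(split_runs("x" * 7, (2, 3, 2)))  # [(0, 2), (2, 5), (5, 7)]
print(split_runs("x" * 8, (2, 3, 2)))  # None
```

The first run of 2 is accepted at the start of a longer run of x's because the following runs of 3 and 2 account for the remainder; with 8 x's nothing accounts for the extra byte, so the whole test fails.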
Example:

  (x+{3})=first3(x+{1})=middle1(x+{3})=last3

This would allow (Marked Groups) operators to pick the first3, middle1 and last3 x's of the 7 x's. (Where there are exactly 7 x's and no more.) This can be very useful for picking and choosing in a field of similar consecutive parameters.

Other: To use a pattern like "{min,max}" EXACTLY AS IT IS for an {unmarked group pattern}, use the Separator operator ",", as in

  <previous>*,{min,max}

to separate a pattern like "<previous>*" from a group like "{min,max}" that otherwise would have been interpreted as one of the numeric {min,max} forms. More on the Separator "," operator later.

A repeat sequence that searches for the last match can use the before/after cut Control operators ":b" and ":a" to limit the match area to the last match in the sequence. This is a way to select a last area in a repeated sequence.

Examples of *{<min>,<max>}:

  pdgreppe -Hjc "\C*{3}" file_id.diz

..."\C*{3}" finds any sequence of EXACTLY 3 Consonants "\C" from [bcdfghjklmnpqrstvwxzB-DF-HJ-NP-TV-XZ]. Consonant areas shorter or longer than 3 are ignored.

  pdgreppe -Hjc "\u*{2,3}" file_id.diz

..."\u*{2,3}" finds sequences of 2 to 3 Upper-case "\u" letters. Areas with fewer than 2 or more than 3 Upper-case "\u" letters are ignored.

  pdgreppe -1 -Hjc "<\u*{,3}" file_id.diz

..."<\u*{,3}" finds some areas with 0 to 3 Upper case "\u" letters at the start of any word "<". Upper case letter sequences longer than 3 repetitions are NOT matched.

  pdgreppe -Hjc "<\u*{3,}" file_id.diz

..."<\u*{3,}" finds 3 or more Upper case "\u" letters at the start of any word "<". Upper case letter sequences of fewer than 3 repetitions are NOT matched.

But with the ONE or MORE bytes NOT operator "-"...

  pdgreppe -Hjc "<\u-{3,}" file_id.diz

..."<\u-{3,}" finds 3 or more bytes that are NOT Upper case "\u" letters at the start of any word "<".
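The four limit forms and their defaults described in this section can be sketched as a small parser. This is an illustration in Python, not PDGREPPE's own code: the function name "parse_limit" and the use of ValueError for the error case are assumptions, and only the 0 to 65535 range and the two automatic defaults come from the text.

```python
import re

MAX_REPEAT = 65535  # upper bound for <min> and <max> stated in the text

def parse_limit(spec):
    """Parse {<min>}, {<min>,<max>}, {<min>,} or {,<max>} into (min, max)."""
    m = re.fullmatch(r"\{(\d+)?(,)?(\d+)?\}", spec)
    if m is None:
        raise ValueError("not a repeat limit: %r" % spec)
    lo_s, comma, hi_s = m.groups()
    if comma is None:
        # Form {<min>}: an exact count. "{}" has no number and is rejected.
        if lo_s is None:
            raise ValueError("empty repeat limit: %r" % spec)
        lo = hi = int(lo_s)
    else:
        # "{,}" matches none of the four accepted forms.
        if lo_s is None and hi_s is None:
            raise ValueError("no bounds given: %r" % spec)
        lo = int(lo_s) if lo_s is not None else 0           # {,<max>} -> min 0
        hi = int(hi_s) if hi_s is not None else MAX_REPEAT  # {<min>,} -> max 65535
    if lo > MAX_REPEAT or hi > MAX_REPEAT:
        raise ValueError("bound above %d: %r" % (MAX_REPEAT, spec))
    return lo, hi

print(parse_limit("{3}"))    # (3, 3)
print(parse_limit("{2,3}"))  # (2, 3)
print(parse_limit("{3,}"))   # (3, 65535)
print(parse_limit("{,3}"))   # (0, 3)
```

The sketch does not check that <min> is no greater than <max>, because the text does not say how PDGREPPE treats that case.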
For Quantity operators "/-;!?+*", at least two rules must be followed:

<previous> as :[abci] OR :f[<EMPTY-file>] matches ONE or MORE TIMES

If the repeat pattern <previous> contains just a combination of :a, :b, :c, :i or an :f[<EMPTY-file>] that measure for matching anywhere, then the match succeeds just ONE TIME for ANY current data point.

Also, POSITIONS, like any of "`'^$<>", to be explained in the next section, MATCH ANY NUMBER OF TIMES if found, so using patterns like

  ^*{1,}   ^*{65535,}   ^*{2,3}   ^*{,70}

will pass at the start of any line (^) each and every time for any non-zero {<min>,<max>}. Other patterns like ^+ ^* and ^? will also pass at a start of line (SOL). But patterns like ^! and ^- will NOT pass at a SOL since they rely on a definite NO-MATCH condition.

Stacked Repeat operators like <pat>++ or <pat>+- are equivalent to <pat>+{}+ or <pat>+{}- where the second or further stacked operation is done on the Empty Pattern {} that matches EVERY point in the data.
© Intelligence Services 1987 - 2008
GPO Box 9, ADELAIDE SA 5001, AUSTRALIA
EMAIL : intlsvs@gmail.com