A group is a complete sub-pattern, expression or section within a pattern. A group can gather things together (collect) so that they are worked on as a single unit. A group can also exclude (isolate) an inner expression from external activities. Both functions, collection and isolation, are very important to effective pattern use.

Unmarked Groups

  {ME0}...{MEn}      Unmarked Groups of ME's
  x{ME0|ME1|...}y    isolate x and y from OR group
  x{ME0&ME1&...}y    isolate x and y from AND group
  {ME}*              select wider area of ME for repeat test
  {}                 empty group matches ANY point or position

An unmarked group is a group containing a pattern, written {<pat>}, where <pat> represents the pattern.

  pdgreppe -Hjc "{F|PD}GREPPE" file_id.diz

..."{F|PD}GREPPE" ISOLATES the OR sequence "{F|PD}" from the common string "GREPPE" and finds "FGREPPE" OR "PDGREPPE".

Groups also allow a Quantity or Repeat reference to cover a larger reference when that reference is enclosed in a Marked or Unmarked Group. For example, the pattern {abc}+ means the reference "abc" repeated 1 or more times ("+"), matching the LARGER units "abc", "abcabc" or "abcabcabc". This pattern can select a larger area than the similar pattern abc+, which means the string "ab" followed by "c" repeated 1 or more times ("+"), matching "abc", "abcc" or "abccc". Repeat or Quantity specifiers with symbols like "+" and "*" are explained in a following section.

  pdgreppe -Hjc "T&{TextTools}!." file_id.diz

...Unmarked Group "{TextTools}" is a sub-expression used by a NOT "!" operator. "T&{TextTools}!." finds any "T" in file_id.diz AND, provided it is NOT ("!") the starting "T" of "TextTools" (like the "TextTools" on the line starting with "README"), then advances over the next character, matched by "." (ANY character).

A valuable component of PDGREPPE is the ability to match each and every position in a data area.
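As a loose analogy only (this is Python's standard `re` module, not PDGREPPE), an empty pattern in Python also matches at every position of its input, so three bytes of data give four match points. The helper name `data_points` is hypothetical.

```python
import re

def data_points(data: bytes) -> list[int]:
    """Offsets at which an empty pattern matches, i.e. every position."""
    return [m.start() for m in re.finditer(b"", data)]

print(data_points(b"abc"))   # [0, 1, 2, 3]

# One point just before each byte, plus one more at the end of the data.
for data in (b"", b"abc", b"ab\ncd"):
    print(len(data), "bytes ->", len(data_points(data)), "points")
```

The "bytes plus one" count falls out naturally: the extra point is the one after the last byte.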
PDGREPPE does this with the Empty Group:

  {}                 empty group matches ANY point or position

  pdgreppe -Hjc "{}" abc

...Unmarked Group Empty Pattern "{}" matches each and every data point in file "abc", including End-of-File.

The number of data points in a file is the number of bytes in the file plus one. The first point is the Start-of-File (SOF) and the last point is the End-of-File (EOF). The first point of every line is a Start-of-Line (SOL), and the last point of every line is an End-of-Line (EOL). Each data point occurs just BEFORE a possible byte, except the last point at EOF.

Marked Groups (MG's)

  (ME0)...(MEn)      Marked Groups of ME's
  x(ME0|ME1|...)y    isolate x and y from OR group
  x(ME0&ME1&...)y    isolate x and y from AND group
  (ME)*              select wider area of ME for repeat test
  ()                 empty group matches ANY point or position
  (ME)=<VName>       use variable name <VName> to identify Marked Group (ME)
  (ME)::<VName>      use variable name <VName> to identify Marked Group (ME)
  #<VName>           previous Marked Group of pattern identified by <VName>
  %<VName>           previous Marked Group of data match identified by <VName>
  #<n>               previous <n>th Marked Group of pattern
  %<n>               previous <n>th Marked Group of data match

A marked group is a group containing a pattern, written (<pat>), where <pat> represents the pattern. A marked group is used when a previous section of the pattern is to be used again, perhaps to avoid typing it in again, or when a previous section of matched data is to be referenced. Marked groups give MAGIC expressions MEMORY (data retention and remembrance), a key element of well-planned, thoughtful searches.

Marked group numbers, used by the "#" and "%" operators, select a previous group of PATTERN ("#") or of pattern match DATA ("%"). These numbers are usually 0-based: the first group (<grp1>) is number 0, the second group (<grp2>) is number 1, and so on.
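Python's `re` can also list the Start-of-Line and End-of-Line points described above, and it shows a numbering difference worth keeping in mind: Python numbers capture groups from 1, while PDGREPPE marked groups are described as usually 0-based. This sketch assumes lines end in `\n`; how PDGREPPE treats CR/LF line endings is not covered here, and the helper names are hypothetical.

```python
import re

def line_starts(data: bytes) -> list[int]:
    # With MULTILINE, ^ matches at the start of the data and after each b"\n".
    return [m.start() for m in re.finditer(rb"^", data, re.MULTILINE)]

def line_ends(data: bytes) -> list[int]:
    # With MULTILINE, $ matches before each b"\n" and at the end of the data.
    return [m.start() for m in re.finditer(rb"$", data, re.MULTILINE)]

sample = b"ab\ncd"
print(line_starts(sample))   # [0, 3]
print(line_ends(sample))     # [2, 5]

# Python numbers capture groups from 1, not from 0.
m = re.match(r"(a)(b)", "ab")
print(m.group(1), m.group(2))   # a b
```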
  pdgreppe -Hjc "(F|PD)GREPPE" file_id.diz

..."(F|PD)GREPPE" works like "{F|PD}GREPPE" above, but ALSO allows the group "(F|PD)" to be reused by later "%<n>" DATA and "#<n>" PATTERN operators.

Previous DATA operators: %<n> and %<VName>

These operators check for repeated data. For example, the pattern "(elephant|frog).*%0" finds a match in areas like "That elephant is a big elephant" or "My frog is a Big frog", but NOT in an area like "An elephant is a frog". This is because "(elephant|frog).*%0" matches only "elephant"<possibly-something-else>"elephant" or "frog"<possibly-something-else>"frog".

  pdgreppe -1 -Hjc "(\u)%0" file_id.diz

...the marked group Re-Match "%" operator in "(\u)%0" finds a capital letter "(\u)" followed by an identical capital letter "%0".

  pdgreppe -1 -Hjc "(\u)=up.*%up" file_id.diz

...the marked group Re-Match "%" operator in "(\u)=up.*%up" finds a capital letter "(\u)", followed by ANY other characters ".*", followed by an identical capital letter "%up". "%up" refers to the DATA found by "(\u)", which "(\u)=up" identifies with the name "up".

  pdgreppe -1 -Hjc "(\u)::up.*%up" file_id.diz

...the same, but using "::" to assign the marked group to the name "up".

Case Sensitivity Info: Case Sensitivity Controls ":i" and ":c" are discussed later in the Control section. They govern whether lower case letters like "abc" are matched the same as UPPER case letters like "ABC".

Previous PATTERN operators: #<n> and #<VName>

These operators allow REUSE of patterns. For example, the pattern "(tiger|fox).*#0" finds a match in areas like "A tiger is like a fox", "My tiger is a big tiger", "A fox can outfox a fox" or "Foxes and tigers have orange hair". The previous pattern operator "#0" in "(tiger|fox).*#0" re-uses the PATTERN rather than the data, so it matches "tiger...tiger", "tiger...fox", "fox...fox" or "fox...tiger", because the original pattern "(tiger|fox)" had both possibilities.
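The grouping behaviour described above (OR isolation, repeat scope, and the "%" data re-match) can be loosely mirrored with Python's standard `re` module. The mapping is an assumption made for illustration: `(?:...)` plays the part of an unmarked group, `(...)` with `\1` or `(?P=name)` plays the part of a marked group with "%", and `[A-Z]` is an ASCII-only stand-in for `\u`. PDGREPPE syntax differs from Python's in many other details.

```python
import re

# (?:...) groups without capturing, like an unmarked group {...}.
unmarked = re.compile(r"(?:F|PD)GREPPE")
ungrouped = re.compile(r"F|PDGREPPE")     # without a group, OR spans everything
text = "FGREPPE and PDGREPPE"
print(unmarked.findall(text))    # ['FGREPPE', 'PDGREPPE']
print(ungrouped.findall(text))   # ['F', 'PDGREPPE']

# A quantifier after a group repeats the whole group, not one character.
print(bool(re.fullmatch(r"(?:abc)+", "abcabc")))   # True, like {abc}+
print(bool(re.fullmatch(r"abc+", "abcabc")))       # False, like abc+

# \1 re-matches the DATA captured by the first group, much like %0.
same_animal = re.compile(r"(elephant|frog).*\1")
for line in ("That elephant is a big elephant",
             "My frog is a Big frog",
             "An elephant is a frog"):
    print(bool(same_animal.search(line)), "|", line)

# (?P<up>...) names the group and (?P=up) re-matches its data.
doubled_cap = re.compile(r"([A-Z])\1")                   # like (\u)%0
named_cap = re.compile(r"(?P<up>[A-Z]).*(?P=up)")        # like (\u)=up.*%up
print(doubled_cap.search("ISBN and PDGREPPE").group(0))  # PP
print(named_cap.search("PDGREPPE").group(0))             # PDGREPP
```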
Another example:

  pdgreppe -Hjc "(\u+)\S+#0" file_id.diz

...the marked group Re-Use "#" operator in "(\u+)\S+#0" finds capital letter(s) "(\u+)", followed by white Space character(s) "\S+", followed by more capital letter(s), because "#0" references "(\u+)".

Likewise, with a (Marked Group) reference variable name:

  pdgreppe -Hjc "(\u+)=UPS\S+#UPS" file_id.diz

...the marked group Re-Use "#" operator in "(\u+)=UPS\S+#UPS" finds capital letter(s) "(\u+)", tagged and identified as a Marked Group by "=UPS", followed by white Space character(s) "\S+", followed by capital letter(s), because "#UPS" references "(\u+)" in "(\u+)=UPS".

These operators allow easy REUSE of a pattern segment in a larger pattern. Also, if the segment changes, it only needs to be changed in one place. (The use of quantity operators like "+*?!" is discussed later in the section on PDGREPPE Quantity.)

  pdgreppe -Hjc "([\u^\O]+)\S+#0" file_id.diz

...the Marked Group Re-Use "#" operator in "([\u^\O]+)\S+#0" finds capital letter(s) "\u" that are NOT ("^") vOwels "\O" in the (MarkedGroup) "([\u^\O]+)", followed by white Space character(s) "\S+" after the (MarkedGroup), followed AGAIN by capital letter(s) that are NOT vOwels, because "#0" references "[\u^\O]+" INSIDE the (MarkedGroup) "([\u^\O]+)". (The use of [Ranges] is discussed later in the section about PDGREPPE Single Characters.)

Similarly,

  pdgreppe -Hjc "([\u^\O]+)=UpsNotVowels\S+#UpsNotVowels" file_id.diz

...works with a (MarkedGroup) reference variable name like "=UpsNotVowels" rather than a numeric reference like "#0".

With "#", duplicated sections of a pattern can be changed in one place, where the section is defined. Note that the "#" Re-Use operator merely inserts another copy of the original user pattern section where it appears.

Marked Groups can also match each and every position in a data area, by using the Empty Group:

  ()                 empty group matches ANY point or position

  pdgreppe -Hjc "()" abc

...Marked Group Empty Pattern "()" matches each and every data point in file "abc", including End-of-File, as explained in the previous section on Unmarked {Groups}. This point can also be retested by "#", or its data referenced by "%", Marked Group symbols.

Important: Option "-jg" allows all references to a previous (MarkedGroup), like (isMG)=isSportsCar or (isMG)::isSportsCar, to be GLOBAL. All group names in the pattern must be UNIQUE with this option. Also, any use of the %<number> data matching function references any possible (MarkedGroups) in the current pattern, including those built with the (MarkedGroup) #<number> reuse function.

Marked Group ReUse "#" does not allow reuse of pattern sections that include Named marked groups (e.g. ((pat)=Name)#0), because this would create duplicate names, when all MG names must be unique in a pattern. Also, any MG "#" pattern that refers to a previous MG will add copies of those MG's, and this must be taken into account for the remainder of the pattern. For example, "(([xy])P)#0(I)%2" will become "(([xy])P)([xy])(I)%2", with %2 referring to the data from the second "([xy])" and NOT the MG "(I)", which is the third MG. Use option "-s" to view the final expanded MG Re-Use pattern. For example, "(([xy])P)#0%2" becomes "(([xy])P)#0={([xy])}%2" in the expanded pattern.

Also, a pattern or sub-pattern that consists of an empty plain group "{}" or marked group "()" matches ANY data point with zero width.

See also Options "-jg" and "-kg" in PDGREPPE OPTIONS.

Groups Summary

Groups shield parts of a pattern from other parts of the pattern. Groups also gather parts of a pattern together into a larger expression. Marked Groups (Expression) go even further and let the ME pattern use a pattern or data later, for more matching or in external activities (like edit functions). Apart from the empty group, all Groups contain one or more atoms.
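The "#" pattern re-use rules described above have no direct operator in Python's standard `re`, but since "#" is described as inserting a copy of the pattern section, the same effect can be sketched by building a pattern from a string variable. The sketch also shows why copying a section with groups shifts later group numbers, and why a named group cannot be copied. The class standing in for `[\u^\O]` assumes the vowels are A, E, I, O, U, `\s+` stands in for PDGREPPE's white-space class (written `\S` in the examples), and the helper `compiles` is hypothetical. Python numbers groups from 1, so its numbers differ from PDGREPPE's.

```python
import re

# Re-use by textual copy, like "(tiger|fox).*#0"; animal is defined once.
animal = r"(tiger|fox)"
reuse = re.compile(animal + r".*" + animal)
print(bool(reuse.search("A tiger is like a fox")))    # True
print(bool(reuse.search("A fox can outfox a fox")))   # True

# Hypothetical ASCII stand-in for [\u^\O]: capitals except A, E, I, O, U.
caps_no_vowels = r"[B-DF-HJ-NP-TV-Z]+"
spaced = re.compile("(" + caps_no_vowels + r")\s+" + caps_no_vowels)
print(spaced.search("see PDG GREPPE").group(0))       # PDG GR

# A copied section brings its group with it, so later group numbers shift
# (compare "(([xy])P)#0(I)%2" expanding to "(([xy])P)([xy])(I)%2").
expanded = re.compile(r"(([xy])P)" + r"([xy])" + r"(I)")
print(expanded.groups)                    # 4
print(expanded.match("xPyI").groups())    # ('xP', 'x', 'y', 'I')

def compiles(pattern: str) -> bool:
    """Report whether Python's re accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Copying a NAMED group defines the same name twice, which re rejects.
print(compiles(r"(?P<Name>pat)(?P<Name>pat)"))   # False
```

Keeping `animal` in a single variable gives the same "change it in one place" benefit that "#" provides.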
See PDGREPPE Pattern Definition Language.
© Intelligence Services 1987 - 2008
GPO Box 9, ADELAIDE SA 5001, AUSTRALIA
EMAIL : intlsvs@gmail.com