[ < < < Home ]
[ < < Reference Start ]
[ < Reference Contents ]
[ < Previous=PDGREPPE Search Operators ]
[ Next=PDGREPPE Quantity or Repetitions > ]
A group is a complete sub-pattern, expression or section within
a pattern.
A group can gather things together (collect) to be worked on as
a single unit.
A group can exclude (isolate) an inner expression from external
activities.
Both of these functions of collection and/or isolation are very
important to effective pattern use.
Unmarked Groups
    {ME0}...{MEn}      Unmarked Groups of ME's
    x{ME0|ME1|...}y    isolate x and y from OR group
    x{ME0&ME1&...}y    isolate x and y from AND group
    {ME}*              select wider area of ME for repeat test
    {}                 empty group matches ANY point or position
An unmarked group is a group containing a pattern like {<pat>}
where <pat> represents the pattern.
pdgreppe -Hjc "{F|PD}GREPPE" file_id.diz
..."{F|PD}GREPPE" ISOLATES OR sequence "{F|PD}" from common
string "GREPPE" and finds "FGREPPE" OR "PDGREPPE".
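For readers who know conventional regular expressions, the
isolating effect of a group can be sketched in Python. This is
only an analogy: Python's "re" module is not PDGREPPE, and the
non-capturing group "(?:...)" is assumed here to be the nearest
counterpart of an unmarked group "{...}".

```python
import re

# Assumed analogue of the PDGREPPE pattern "{F|PD}GREPPE":
# (?:...) groups the alternatives without remembering the match.
grouped = re.compile(r"(?:F|PD)GREPPE")

# Without the group, Python's "|" splits the WHOLE pattern,
# giving "F" OR "PDGREPPE".
ungrouped = re.compile(r"F|PDGREPPE")

def found(pattern, text):
    """Return the matched text, or None when nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None
```

In this sketch the grouped pattern finds the whole word
"FGREPPE", while the ungrouped one stops after the single
letter "F".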
A Quantity or Repeat specifier applies to a LARGER reference
when that reference is enclosed in a Marked or Unmarked Group.
For example, the pattern
{abc}+
means:
the whole reference "abc" repeated 1 or more times "+",
matching "abc", "abcabc" or "abcabcabc".
This pattern can select a larger area than the similar pattern
abc+
which means:
"ab" followed by the single reference "c" repeated 1 or more
times "+", matching "abc", "abcc" or "abccc".
Repeat or Quantity specifiers with symbols like "+" and "*" are
explained in a following section.
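The difference between repeating a group and repeating a single
character can be checked with a short Python sketch. It assumes
that "(?:abc)+" in Python's "re" module behaves like "{abc}+";
the helper name is hypothetical.

```python
import re

group_repeat = re.compile(r"(?:abc)+")  # assumed analogue of "{abc}+"
char_repeat = re.compile(r"abc+")       # "+" applies to "c" only

def whole_match(pattern, text):
    """True when the pattern matches the entire text."""
    return pattern.fullmatch(text) is not None
```

Here "abcabc" is accepted only by the group form, and "abccc"
only by the single character form.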
pdgreppe -Hjc "T&{TextTools}!." file_id.diz
...Unmarked Group "{TextTools}" is a sub-expression used by a
NOT "!" operator.
"T&{TextTools}!." finds any "T" in file_id.diz AND, if that "T"
is NOT "!" the starting "T" of "TextTools" (like the "TextTools"
on the line starting with "README"), then advances past the next
ANY "." character.
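One possible reading of this pattern can be sketched with
Python lookaheads. This is an assumption about how AND "&" and
NOT "!" combine, not a statement of PDGREPPE's exact rules, and
the helper name is hypothetical.

```python
import re

# One possible reading of "T&{TextTools}!.", using lookaheads:
#   (?=T)          the position must start with "T"         (AND)
#   (?!TextTools)  but must NOT start the word "TextTools"  (NOT)
#   .              then consume one character
t_not_texttools = re.compile(r"(?=T)(?!TextTools).")

def t_offsets(text):
    """Offsets of every "T" that does not begin "TextTools"."""
    return [m.start() for m in t_not_texttools.finditer(text)]
```

With the text "TextTools by Tom" this sketch skips the leading
"T" but reports the "T" of "Tools" and the "T" of "Tom".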
A valuable component of PDGREPPE is the ability to match each
and every position in a data area. This can be done by using
the Empty Group:
{} empty group matches ANY point or position
pdgreppe -Hjc "{}" abc
...Unmarked Group Empty Pattern "{}" matches each and every data
point, including End-of-File in file "abc".
The number of data points in a file is the number of bytes in a
file plus one.
The first point represents the Start-of-File (SOF).
The last point is at the End-of-File (EOF).
The first point of every line is a Start-of-Line (SOL).
The last point of every line is an End-of-Line (EOL).
Each data point occurs just BEFORE a possible byte, except the
last point at the EOF.
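The "bytes plus one" rule can be illustrated in Python, where an
empty pattern also matches at every position. The sketch uses
the characters of a string in place of the bytes of a file,
which is an assumption made to keep it short.

```python
import re

def data_points(data):
    """Offsets at which an empty pattern matches: one before each
    character, plus one at the end of the data."""
    return [m.start() for m in re.finditer(r"", data)]

points = data_points("abc")
```

For the three characters "abc" this gives four points: offsets
0, 1 and 2 sit just before "a", "b" and "c", and offset 3 is the
end of the data.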
Marked Groups (MG's)
    (ME0)...(MEn)      Marked Groups of ME's
    x(ME0|ME1|...)y    isolate x and y from OR group
    x(ME0&ME1&...)y    isolate x and y from AND group
    (ME)*              select wider area of ME for repeat test
    ()                 empty group matches ANY point or position
    (ME)=<VName>       use variable name <VName> to identify Marked Group (ME)
    (ME)::<VName>      use variable name <VName> to identify Marked Group (ME)
    #<VName>           previous Marked Group of pattern identified by <VName>
    %<VName>           previous Marked Group of data match identified by <VName>
    #<n>               previous <n>th Marked Group of pattern
    %<n>               previous <n>th Marked Group of data match
A marked group is a group containing a pattern like (<pat>)
where <pat> represents the pattern.
A marked group is used when a previous section of pattern is to
be used, perhaps to avoid typing the pattern in again, or when a
previous section of matched data is to be referenced.
Marked groups give MAGIC expressions use of MEMORY (data
retention and remembrance), a key element to well-planned
thoughtful searches.
Marked group numbers are used by the "#" and "%" operators to
refer to a previous group: "#" reuses the group's PATTERN and "%"
re-matches the DATA that the group matched.
These marked group numbers are usually 0-based: the number
assigned to the first group (<grp1>) is 0, to the second group
(<grp2>) it is 1, etc.
pdgreppe -Hjc "(F|PD)GREPPE" file_id.diz
..."(F|PD)GREPPE" like "{F|PD}GREPPE" above,
but ALSO allows re-use of pattern "(F|PD)" by previous "%<n>"
DATA and "#<n>" PATTERN operators...
Previous DATA operators:
%<n>
and
%<VName>
These operators check for repeated data.
Example:
A pattern like
"(elephant|frog).*%0"
will find a match in areas like
"That elephant is a big elephant"
or
"My frog is a Big frog"
but will NOT find a match in an area like
"An elephant is a frog"
because
"(elephant|frog).*%0"
matches only something like
"elephant"<possible-something-else>"elephant",
or
"frog"<possible-something-else>"frog".
pdgreppe -1 -Hjc "(\u)%0" file_id.diz
...marked group Re-Match "%" operator in "(\u)%0" finds capital
letter "(\u)" followed by identical capital letter "%0".
pdgreppe -1 -Hjc "(\u)=up.*%up" file_id.diz
...marked group Re-Match "%" operator in "(\u)=up.*%up" finds
capital letter "(\u)" followed by ANY other characters ".*"
followed by an identical capital letter "%up" that refers to the
DATA found "(\u)" and identified "=up" by "(\u)=up".
pdgreppe -1 -Hjc "(\u)::up.*%up" file_id.diz
...same but using "::" for marked group assignment to name "up".
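The previous DATA operators resemble backreferences in
conventional regular expressions. The Python sketch below is an
analogy only: it assumes "\1" plays the role of "%0" (Python
counts groups from 1), that "(?P<up>...)" and "(?P=up)" play the
roles of "=up" and "%up", and that "[A-Z]" stands in for "\u".

```python
import re

# Assumed analogue of "(elephant|frog).*%0".
same_animal = re.compile(r"(elephant|frog).*\1")

# Assumed analogue of "(\u)=up.*%up" (and of the "::" spelling).
same_capital = re.compile(r"(?P<up>[A-Z]).*(?P=up)")

def remembered(pattern, text):
    """Return the data remembered by the first group, or None."""
    m = pattern.search(text)
    return m.group(1) if m else None
```

The second occurrence must repeat the DATA that was found, so
"An elephant is a frog" gives no match in this sketch.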
Case Sensitivity Info:
(Case Sensitivity Controls ":i", ":c" are discussed later in the
Control section. They govern whether lower case letters like
"abc" are matched the same as UPPER case letters like "ABC".)
Previous PATTERN operators:
#<n>
and
#<VName>
These operators allow REUSE of patterns.
Example:
A pattern like
"(tiger|fox).*#0"
will find a match in areas like
"A tiger is like a fox"
or
"My tiger is a big tiger"
or
"A fox can outfox a fox"
or
"Foxes and tigers have orange hair"
because the USE of a previous pattern operator "#0" in
"(tiger|fox).*#0"
matches anything like
"tiger...tiger"
or
"tiger...fox",
or
"fox...fox"
or
"fox...tiger"
because the original pattern "(tiger|fox)" had both possibilities.
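Python's "re" module has no direct counterpart of "#", but the
idea of reusing the PATTERN rather than the DATA can be sketched
by building the pattern from a shared string. The names below
are hypothetical. Ignoring letter case is an assumption, made
because the example "Foxes and tigers have orange hair" only
matches when case is ignored.

```python
import re

ANIMAL = r"(?:tiger|fox)"  # the sub-pattern, written once

# Assumed analogue of "(tiger|fox).*#0": the same PATTERN is used
# twice, so the two animals found may differ.
animal_twice = re.compile(ANIMAL + r".*" + ANIMAL, re.IGNORECASE)

def has_two_animals(text):
    """True when the text contains two animals, alike or not."""
    return animal_twice.search(text) is not None
```

Because only the pattern is shared, "A tiger is like a fox"
matches here, which a previous DATA reference would reject.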
Another example:
pdgreppe -Hjc "(\u+)\S+#0" file_id.diz
...marked group Re-Use "#" operator in "(\u+)\S+#0" finds
capital letter(s) "(\u+)" followed by white Space character(s)
"\S+" followed by more capital letter(s) because "#0" references
"(\u+)".
Likewise, with a (Marked Group) reference variable name used:
pdgreppe -Hjc "(\u+)=UPS\S+#UPS" file_id.diz
...marked group Re-Use "#" operator in "(\u+)=UPS\S+#UPS" finds
capital letter(s) "(\u+)", tagged and identified as a Marked
Group by "=UPS" followed by white Space character(s) "\S+"
followed by capital letter(s) because "#UPS" references "(\u+)"
in "(\u+)=UPS".
These allow easy REUSE of a pattern segment in a larger pattern.
Also if the pattern changes it only needs to be changed in one
place.
(The use of quantity operators like "+*?!" is discussed later
in the section on PDGREPPE Quantity.)
pdgreppe -Hjc "([\u^\O]+)\S+#0" file_id.diz
...Marked Group Re-Use "#" operator in
"([\u^\O]+)\S+#0"
finds capital letter(s) "\u" but NOT "^" vOwels "\O" in the
(MarkedGroup)
"([\u^\O]+)"
followed by white Space character(s) past the (MarkedGroup)
"\S+"
AGAIN followed by capital letter(s) but NOT vOwels because
"#0"
references "[\u^\O]+" INSIDE the (MarkedGroup) "([\u^\O]+)".
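The "capital letters but NOT vowels" range can be approximated
in Python with a negative lookahead. The sketch assumes that
"\u" means A-Z, that "\O" means the vowels AEIOU, and that "\S"
means white space; all names in it are hypothetical.

```python
import re

# Assumptions: "\u" is A-Z, "\O" is the vowels AEIOU, "\S" is white space.
# The lookahead (?![AEIOU]) removes the vowels from the A-Z range.
UPPER_NOT_VOWEL = r"(?:(?![AEIOU])[A-Z])+"

# The sub-pattern is written once and used twice, as "#0" does.
two_runs = re.compile(UPPER_NOT_VOWEL + r"\s+" + UPPER_NOT_VOWEL)

def first_match(text):
    """Return the first matching text, or None."""
    m = two_runs.search(text)
    return m.group(0) if m else None
```

With the text "PD GREPPE" this sketch returns "PD GR", because
the second run of capitals stops at the vowel "E".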
(The use of [Ranges] is discussed later in the section about
PDGREPPE Single Characters.)
Similarly,
pdgreppe -Hjc "([\u^\O]+)=UpsNotVowels\S+#UpsNotVowels" file_id.diz
...can work with (MarkedGroup) reference variable names
like "=UpsNotVowels" rather than numeric references like "#0".
With "#", duplicate sections of a pattern can be changed
in one place, where the section is defined.
Note that the "#" Re-Use operator merely inserts another copy of
the original user pattern section where it appears.
A valuable component of PDGREPPE is the ability to match each
and every position in a data area. This can be done by using
the Empty Group:
() empty group matches ANY point or position
pdgreppe -Hjc "()" abc
...Marked Group Empty Pattern "()" matches each and every data
point, including End-of-File in file "abc", as explained in previous
section on Unmarked {Groups}.
This point can also be retested by "#" or data referenced by "%"
Marked Group symbols.
Important:
Option "-jg" allows all references to a previous
(MarkedGroup) like
(isMG)=isSportsCar
or
(isMG)::isSportsCar
to be GLOBAL.
All names for groups in the pattern must be UNIQUE with this
option. Also, any use of the %<number> data matching function
can reference any (MarkedGroup) in the current pattern,
including those built with the (MarkedGroup) #<number> reuse
function.
Marked Group ReUse "#" does not allow reuse of pattern sections
that include Named marked groups (e.g. ((pat)=Name)#0) because
the copy would duplicate a name, and all MG names must be unique
in a pattern.
Also, any MG "#" reference to a previous section that itself
contains MG's adds copies of those MG's to the pattern, and this
must be taken into account when numbering groups in the
remainder of the pattern.
e.g.
"(([xy])P)#0(I)%2" will become "(([xy])P)([xy])(I)%2",
with %2 referring to the data from the second "([xy])"
and NOT the MG "(I)" which is the third MG.
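The shift in group numbers can be seen in a Python sketch that
performs the expansion by hand. Python numbers groups from 1, in
order of their opening parenthesis, which may differ from
PDGREPPE's numbering, so only the shift itself carries over. The
names below are hypothetical.

```python
import re

INNER = r"([xy])"
# Pattern sections as written by the user, before "#0" is expanded.
before = "(" + INNER + "P)" + "(I)"
# After expansion a second copy of the inner group sits before "(I)".
after = "(" + INNER + "P)" + INNER + "(I)"

def group_texts(pattern, text):
    """Return the text remembered by every group, in Python's order."""
    m = re.fullmatch(pattern, text)
    return list(m.groups()) if m else None
```

The expanded pattern has one more group than the written one,
so "(I)" moves from third to fourth place in the list returned
by this sketch.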
Use option -s to view the final expanded MG Re-Use pattern.
e.g.
"(([xy])P)#0%2" becomes "(([xy])P)#0={([xy])}%2"
in the expanded pattern.
Also, a pattern or sub-pattern that consists of an
empty plain group "{}" or marked group "()" matches
ANY data point for zero width.
See Also Options "-jg" and "-kg" in PDGREPPE OPTIONS.
Groups Summary:
Groups shield parts of a pattern from other parts of pattern.
Groups also gather parts of a pattern together into a larger
expression.
Marked Groups (Expression) go even further: they let the ME
pattern use a previous section of pattern, or the data it
matched, later for more matching or in external activities
(like edit functions).
All Groups contain one or more atoms.
See PDGREPPE Pattern Definition Language.
© Intelligence Services 1987 - 2008
GPO Box 9, ADELAIDE SA 5001, AUSTRALIA
EMAIL : intlsvs@gmail.com