Intelligence Services

PDGREPPE Groups

A group is a complete sub-pattern, expression or section within
a pattern.

A group can gather things together (collect) to be worked on as
a single unit.

A group can exclude (isolate) an inner expression from external
activities.

Both of these functions of collection and/or isolation are very
important to effective pattern use.

Unmarked Groups

 {ME0}...{MEn}	Unmarked Groups of ME's

 x{ME0|ME1|...}y	isolate x and y from OR  group
 x{ME0&ME1&...}y	isolate x and y from AND group
 {ME}*			select wider area of ME  for repeat test
 {}			empty  group matches ANY point or position

An unmarked group is a group containing a pattern like {<pat>}
where <pat> represents the pattern.


	pdgreppe -Hjc "{F|PD}GREPPE" file_id.diz


..."{F|PD}GREPPE" ISOLATES OR sequence "{F|PD}" from common
string "GREPPE" and finds "FGREPPE" OR "PDGREPPE".
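For readers used to conventional regular expressions, the isolating effect of a group can be sketched with Python's "re" module. This is an analogy only, not PDGREPPE syntax: Python writes a non-capturing group as (?:...), which here stands in for the unmarked {...} group, and the sample strings are made up for illustration.

```python
import re

# Analogy only: Python's (?:...) plays the role of PDGREPPE's unmarked {...} group.
grouped = re.compile(r"(?:F|PD)GREPPE")   # OR is confined to "F" or "PD"
ungrouped = re.compile(r"F|PDGREPPE")     # in Python, OR now splits the whole pattern

samples = ["FGREPPE", "PDGREPPE", "F"]    # hypothetical test strings

# Keep only the samples that each pattern matches in full.
grouped_hits = [s for s in samples if grouped.fullmatch(s)]
ungrouped_hits = [s for s in samples if ungrouped.fullmatch(s)]
```

With the group, both product names match in full. Without it, Python reads the pattern as "F" OR "PDGREPPE", so "FGREPPE" no longer matches as a whole.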

Groups also allow a Quantity or Repeat specifier to apply to a
larger reference, if that reference is enclosed in a Marked or
Unmarked Group, as in the pattern

	{abc}+

that means:

the whole reference "abc" repeated 1 or more times "+", matching
LARGER areas like "abc", "abcabc" or "abcabcabc".

This pattern can select a larger area than a similar pattern

	abc+

that means:

the literal "ab", followed by the single reference "c" repeated
1 or more times "+", like "abc", "abcc" or "abccc".

Repeat or Quantity specifiers with symbols like "+" and "*" are
explained in a following section.
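The difference in repeat scope can be checked with a conventional regular expression engine. The sketch below uses Python's "re" module as a stand-in, assuming (?:abc)+ as the equivalent of {abc}+; the helper name "full" is hypothetical.

```python
import re

# Analogy only: (?:abc)+ stands in for PDGREPPE's {abc}+.
group_repeat = re.compile(r"(?:abc)+")  # "+" applies to the whole group "abc"
char_repeat = re.compile(r"abc+")       # "+" applies only to the final "c"

def full(pattern, text):
    """Return True when the pattern matches the entire text."""
    return pattern.fullmatch(text) is not None

group_ok = [s for s in ("abc", "abcabc", "abcc") if full(group_repeat, s)]
char_ok = [s for s in ("abc", "abcabc", "abcc") if full(char_repeat, s)]
```

Here "abcabc" is accepted only by the grouped form and "abcc" only by the ungrouped form, while "abc" is accepted by both.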


	pdgreppe -Hjc "T&{TextTools}!." file_id.diz


...Unmarked Group "{TextTools}" is a sub-expression used by a
NOT "!" operator.

"T&{TextTools}!." finds any "T" in file_id.diz AND, if that "T"
is NOT "!" the starting "T" of "TextTools" (like the "TextTools"
on the line starting with "README"), then advances by the next
ANY "." character.

A valuable component of PDGREPPE is the ability to match each
and every position in a data area.  This can be done by using
the Empty Group:

{}		empty  group matches ANY point or position


	pdgreppe -Hjc "{}" abc


...Unmarked Group Empty Pattern "{}" matches each and every data
point, including End-of-File in file "abc".

The number of data points in a file is the number of bytes in a
file plus one.

The first point represents the Start-of-File (SOF).

The last point is at the End-of-File (EOF).

The first point of every line is a Start-of-Line (SOL).

The last point of every line is an End-of-Line (EOL).

Each data point occurs just BEFORE a possible byte, except the
last point at the EOF.
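The "bytes plus one" count can be illustrated with any engine that allows an empty pattern. The sketch below uses Python's "re" module on a made-up 4-byte buffer; it shows the counting rule only and does not reproduce PDGREPPE's output.

```python
import re

data = b"abc\n"  # hypothetical file contents: 4 bytes

# An empty pattern matches at every position, including the one
# after the last byte (the End-of-File point).
points = [m.start() for m in re.finditer(b"", data)]

point_count = len(points)  # one more than the number of bytes
```

Each entry in "points" is the offset of the byte that the point sits just BEFORE; the final entry equals the length of the data and has no byte after it.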

Marked Groups (MG's)

 (ME0)...(MEn)	Marked Groups of ME's

 x(ME0|ME1|...)y	isolate x and y from OR  group
 x(ME0&ME1&...)y	isolate x and y from AND group
 (ME)*			select wider area of ME  for repeat test
 ()			empty  group matches ANY point or position

 (ME)=<VName>	use variable name <VName> to identify Marked Group (ME)
 (ME)::<VName>	use variable name <VName> to identify Marked Group (ME)

 #<VName>	previous Marked Group of pattern    identified by <VName>
 %<VName>	previous Marked Group of data match identified by <VName>

 #<n>		previous <n>th Marked Group of pattern
 %<n>		previous <n>th Marked Group of data match

A marked group is a group containing a pattern like (<pat>)
where <pat> represents the pattern.

A marked group is used when a previous section of pattern is to
be used, perhaps to avoid typing the pattern in again, or when a
previous section of matched data is to be referenced.

Marked groups give MAGIC expressions use of MEMORY (data
retention and remembrance), a key element to well-planned
thoughtful searches.

Marked group numbers are used by the "#" and "%" operators to
refer to a previous group: "#" reuses the group's PATTERN and
"%" re-matches the DATA that the group matched.

These marked group numbers are usually 0-based: the number
assigned to the first group (<grp1>) is 0, to (<grp2>) it is 1,
etc.


	pdgreppe -Hjc "(F|PD)GREPPE" file_id.diz


..."(F|PD)GREPPE" like "{F|PD}GREPPE" above,

but ALSO allows re-use of pattern "(F|PD)" by previous "%<n>"
DATA and "#<n>" PATTERN operators...

Previous DATA operators:

	%<n>

and

	%<VName>

These operators check for repeated data.

Example:

A pattern like

	"(elephant|frog).*%0"

will find a match in areas like

	"That elephant is a big elephant"

or

	"My frog is a Big frog"

but will NOT find a match in an area like

	"An elephant is a frog"

because

	"(elephant|frog).*%0"

matches only something like

	"elephant"<possible-something-else>"elephant",

or

	"frog"<possible-something-else>"frog".
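The same "repeated data" behaviour exists in conventional regular expressions as a backreference. The sketch below is a Python analogy, not PDGREPPE syntax; note that Python numbers groups from 1, so \1 stands in for %0.

```python
import re

# Analogy only: Python's \1 plays the role of PDGREPPE's %0 (0-based).
same_animal = re.compile(r"(elephant|frog).*\1")

texts = [
    "That elephant is a big elephant",
    "My frog is a Big frog",
    "An elephant is a frog",
]

# True where the SAME word that matched the group appears again later.
results = [same_animal.search(t) is not None for t in texts]
```

The third text contains both words, but the backreference demands the same matched data twice, so it is rejected.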


	pdgreppe -1 -Hjc "(\u)%0" file_id.diz


...marked group Re-Match "%" operator in "(\u)%0" finds capital
letter "(\u)" followed by identical capital letter "%0".


	pdgreppe -1 -Hjc "(\u)=up.*%up" file_id.diz


...marked group Re-Match "%" operator in "(\u)=up.*%up" finds
capital letter "(\u)" followed by ANY other characters ".*"
followed by an identical capital letter "%up" that refers to the
DATA found "(\u)" and identified "=up" by "(\u)=up".


	pdgreppe -1 -Hjc "(\u)::up.*%up" file_id.diz


...same but using "::" for marked group assignment to name "up".
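Named references have a close counterpart in Python's "re" module, sketched below as an analogy. The character class [A-Z] is an assumption standing in for PDGREPPE's "\u", and the sample strings are made up.

```python
import re

# Analogy only: (?P<up>...) names the group, (?P=up) re-matches its DATA,
# much like "(\u)=up.*%up".  [A-Z] is assumed to approximate "\u".
named = re.compile(r"(?P<up>[A-Z]).*(?P=up)")

hit = named.search("PDGREPPE")   # "P" ... "P"
miss = named.search("Abc def")   # only one capital letter, nothing to repeat

found_letter = hit.group("up") if hit else None
```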

Case Sensitivity Info:

(Case Sensitivity Controls ":i", ":c" are discussed later in the
Control section.  They govern whether lower case letters like
"abc" are matched the same as UPPER case letters like "ABC".)

Previous PATTERN operators:

	#<n>

and

	#<VName>

These operators allow REUSE of patterns.

Example:

A pattern like

	"(tiger|fox).*#0"

will find a match in areas like

	"A tiger is like a fox"

or

	"My tiger is a big tiger"

or

	"A fox can outfox a fox"

or

	"Foxes and tigers have orange hair"

because the USE of a previous pattern operator "#0" in

	"(tiger|fox).*#0"

matches anything like

	"tiger...tiger"

or

	"tiger...fox",

or

	"fox...fox"

or

	"fox...tiger"

because the original pattern "(tiger|fox)" had both possibilities.
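Python's standard "re" module has no direct pattern-reuse operator, but the effect of "#0" can be approximated by defining the sub-pattern once as a string and inserting it twice. This is a sketch of the idea under that assumption; the variable names are hypothetical.

```python
import re

animal = r"(?:tiger|fox)"  # sub-pattern defined in one place

# The second copy is inserted textually, much as "#0" re-inserts the
# PATTERN (not the matched data) of the first marked group.
two_animals = re.compile(animal + r".*" + animal)

texts = [
    "A tiger is like a fox",
    "My tiger is a big tiger",
    "A fox can outfox a fox",
    "A tiger alone",
]
results = [two_animals.search(t) is not None for t in texts]
```

Because the pattern is reused rather than the data, "tiger...fox" is accepted as readily as "tiger...tiger"; only the text with a single animal fails.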

Another example:


	pdgreppe -Hjc "(\u+)\S+#0" file_id.diz


...marked group Re-Use "#" operator in "(\u+)\S+#0" finds
capital letter(s) "(\u+)" followed by white Space character(s)
"\S+" followed by more capital letter(s) because "#0" references
"(\u+)".

Likewise, with a (Marked Group) reference variable name used:


	pdgreppe -Hjc "(\u+)=UPS\S+#UPS" file_id.diz


...marked group Re-Use "#" operator in "(\u+)=UPS\S+#UPS" finds
capital letter(s) "(\u+)", tagged and identified as a Marked
Group by "=UPS" followed by white Space character(s) "\S+"
followed by capital letter(s) because "#UPS" references "(\u+)"
in "(\u+)=UPS".

These allow easy REUSE of a pattern segment in a larger pattern.

Also if the pattern changes it only needs to be changed in one
place.

(The use of quantity operators like "+*?!" is discussed later
in the section on PDGREPPE Quantity.)


	pdgreppe -Hjc "([\u^\O]+)\S+#0" file_id.diz


...Marked Group Re-Use "#" operator in

	"([\u^\O]+)\S+#0"

finds capital letter(s) "\u" but NOT "^" vOwels "\O" in the
(MarkedGroup)

	"([\u^\O]+)"

followed by white Space character(s) past the (MarkedGroup)

	"\S+"

AGAIN followed by capital letter(s) but NOT vOwels because

	"#0"

references "[\u^\O]+" INSIDE the (MarkedGroup) "([\u^\O]+)".

(The use of [Ranges] is discussed later in the section about
PDGREPPE Single Characters.)

Similarly,


	pdgreppe -Hjc "([\u^\O]+)=UpsNotVowels\S+#UpsNotVowels" file_id.diz


...can work with (MarkedGroup) reference variable names
like "=UpsNotVowels" rather than numeric references like "#0".

With "#", duplicate sections of a pattern can be changed
in one place, where the section is defined.

Note that the "#" Re-Use operator merely inserts another copy of
the original user pattern section where it appears.

A valuable component of PDGREPPE is the ability to match each
and every position in a data area.  This can be done by using
the Empty Group:

()		empty  group matches ANY point or position


	pdgreppe -Hjc "()" abc


...Marked Group Empty Pattern "()" matches each and every data
point, including End-of-File in file "abc", as explained in previous
section on Unmarked {Groups}.

This point can also be retested by "#" or data referenced by "%"
Marked Group symbols.

Important:

Option "-jg" allows all references to a previous
(MarkedGroup) like

	(isMG)=isSportsCar
or
	(isMG)::isSportsCar

to be GLOBAL.

All names for groups in the pattern must be UNIQUE with this
option.  Also, any use of the %<number> data matching function
references any possible (MarkedGroups) in the current pattern,
including those built with the (MarkedGroup) #<number> reuse
function.

Marked Group ReUse "#" does not allow reuse of pattern sections
that include Named marked groups (e.g. ((pat)=Name)#0) because
this tries to use duplicate names, when all MG names are supposed
to be unique in a pattern.

Also, any MG "#" reference to a previous MG adds copies of those
MG's to the pattern, and these must be taken into account when
numbering groups in the remainder of the pattern.

e.g.

"(([xy])P)#0(I)%2" will become "(([xy])P)([xy])(I)%2",

with %2 referring to the data from the second "([xy])"
and NOT the MG "(I)", which was the third MG in the pattern
as typed.
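The renumbering effect can be checked on the hand-expanded form with a conventional engine. The sketch below is a Python analogy only: Python counts groups from 1, so \3 stands in for %2, and the test strings are made up.

```python
import re

# Expanded form of the example, written out by hand.
# Python groups: 1 = (([xy])P), 2 = first ([xy]), 3 = second ([xy]), 4 = (I).
expanded = re.compile(r"(([xy])P)([xy])(I)\3")

m = expanded.fullmatch("xPyIy")         # trailing "y" repeats the second ([xy])
rejected = expanded.fullmatch("xPyII")  # trailing "I" would repeat (I) instead

third_group = m.group(3) if m else None
```

The trailing reference follows the inserted copy of "([xy])", not "(I)", which is the behaviour the example describes.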

Use option -s to examine the final expanded MG Re-Use pattern.

e.g.

"(([xy])P)#0%2" becomes "(([xy])P)#0={([xy])}%2"
in the expanded pattern.

Also, a pattern or sub-pattern that consists of an
empty plain group "{}" or marked group "()" matches
ANY data point, with zero width.

See Also Options "-jg" and "-kg" in PDGREPPE OPTIONS.

Groups Summary:

Groups shield parts of a pattern from other parts of pattern.

Groups also gather parts of a pattern together into a larger
expression.

Marked Groups go even further and let the ME pattern reuse a
group's pattern or matched data later, for more matching or in
external activities (like edit functions).

All Groups contain one or more atoms.


See PDGREPPE Pattern Definition Language.


© Intelligence Services 1987 - 2008   GPO Box 9,   ADELAIDE SA 5001,   AUSTRALIA
EMAIL   :   intlsvs@gmail.com