PDGREPPE offers an advanced feature over most other GREP's:
Pattern Definitions or PDEF's
and
Pattern Definition Dictionaries
These help You to formulate and re-use Magic Expressions
for both Search and Replace.
This document explains the basis for PDEF's in PDGREPPE, and the
syntax (rules) and semantics (activities) that together form the
Pattern Definition Language or PDL.
The Basis for Pattern Definitions
A major limitation of magic expressions in most data search and
modification tools is that there is no storage from which magic
pattern definitions can be retrieved and referenced.
Let the hard work You have done to create Magic Expressions
save You from reinventing the Magic Wheel.
Storage
If storage for magic expressions exists, then a pattern can be
more clearly constructed and comments given along with the
pattern to explain what the pattern does and consists of.
PDL allows use of multiple files for storage and reference so
files can be categorised and used for separate functions.
Storage allows REUSE of a magic pattern:
(After constructing a successful pattern, from either the command
line or a file in an editor or perhaps some other process, You
can store it so it can be reused.)
PDL Options
PDGREPPE has options for storage such as
-@<d> file(s) for Search pattern @Definitions.
that specifies the "wells" or sources that pattern definitions
are drawn from
and similar options
-k@<d> file(s) for Replace pattern @Definitions.
-j@<d> @Definition file(s) default directories.++ (.;:HOME:)
that allow PDEF's in Pattern Definition Files (PDFILE's) to be
referenced from "standard" places that You specify, such as a
directory or folder named "DEFS".
Another special option is
-jO Optimise for many pattern @definitions.
that causes patterns with many repeated definitions and/or frequent reuse
of definition(s) to be built more quickly.
Creation of Definitions
A successful magic pattern is typically formed by opening a PDL
file in your favourite editor and expressing the pattern as a
definition in that file.
Then You edit the pattern until You think it will match or
replace as intended.
After this You jump out of the editor, perhaps into a separate
window and give the pattern a test on some data from a command
shell or other process. PDGREPPE is given a pattern as part of
its arguments or command line parameters that references the
PDEF. This could be a simple batch file that calls a PDGREPPE
command line.
The output from PDGREPPE, on the screen or through an editor or
some other monitor, is then examined to compare the actual result
to the intended result.
If the results compare favourably, the PDEF is available for
further use, PDEF dictionary/file maintenance or modification.
If not, this sequence is repeated until You have a stable
pattern that does a correct search or replace activity.
Development and Debug of PDEF's
A Summary option
-s show Summary of Pattern, File and Results per file.
or an interactive edit command
P
displays the Full Pattern that results from using special magic
such as definitions, and Marked Group pattern/data operators.
This Full Pattern display can also show the PDFILE name and line
number of the definitions incorporated in a pattern by options
-jD show Search @Definition=File:LineNumber={Pattern} in FullPattern.
and
-kD show Replace @Definition=File:LineNumber={Pattern} in FullPattern.
Perhaps one of the best PDL Debug options is
-jW show Warnings for pattern @definitions.
It shows
1. Repeated
2. Misnamed
3. Empty
4. Indent-broken
or
5. White-space terminated
PDEF's in PDFILE's.
The integrity of PDFILE's is enhanced by using options such as
these.
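The kind of checking behind these warnings can be pictured with a short Python sketch. It is only one plausible reading of three of the five categories (Repeated, Empty and White-space terminated). The function "check_definitions" and its tuple layout are hypothetical, and the checks PDGREPPE itself makes for "-jW" may differ.

```python
def check_definitions(entries):
    """entries: list of (file, line_number, name, pattern) tuples.

    Returns warning strings for repeated, empty and white-space
    terminated definitions (an assumed reading of those categories).
    """
    warnings = []
    first_seen = {}
    for file, line_no, name, pattern in entries:
        where = f"{file}:{line_no}"
        if name in first_seen:
            warnings.append(f"{where}: '{name}' repeated (first at {first_seen[name]})")
        else:
            first_seen[name] = where
        if pattern == "":
            warnings.append(f"{where}: '{name}' is empty")
        elif pattern[-1] in " \t":
            warnings.append(f"{where}: '{name}' ends in white space")
    return warnings


# Made-up entries, standing in for definitions read from two PDFILE's.
entries = [
    ("ere.def", 1, "fruit", "apple|orange"),
    ("ere.def", 5, "fruit", "pear"),
    ("my.def", 2, "veg", ""),
    ("my.def", 3, "nut", "almond "),
]
for warning in check_definitions(entries):
    print(warning)
```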
Customisation
When creating a PDEF, You have a number of options that allow even
more flexibility and ease of use such as options
-jg allow Global (Marked Group) Search references (% or #).
-kg allow Global (Marked Group) Replace references (%+-?).
These allow a "basic" type GLOBAL access to all PDEF's in all
specified PDFILE's, so any Definition can be de-encapsulated.
See PDGREPPE Options.
Syntax
PDL lets symbolic names be used for patterns of Literal and
Magic Expressions.
And these can be used in either SEARCH or REPLACE patterns.
Collection or compilation of a definition of the form
@<mydef>
or
@@<MyMacro>(<MyPattern>)
searches one of several possible Pattern Definition Files
(PDFILE's) for a complete Pattern Definition (PDEF).
A symbolic name might be something like
@quack
that represents a Search pattern like
duck|goose|gosling
that matches "duck" or "goose" or "gosling".
PDL permits pattern storage in standard or optional locations
and is an easy way to use pattern dictionaries.
So, if the definition for "@quack" is stored in a PDFILE like
"ERE.DEF" as a PDEF of
quack duck|goose|gosling
then a command line like
PDGREPPE "@quack" birdtale.txt
can be used to search for any text occurrence of "duck" or
"goose" or "gosling" in the file "birdtale.txt".
Pattern Definition File Syntax
A PDFILE is a file You create with a text editor or word
processor (WP). It must be a pure text file, so if making it
with a WP, then be sure and have the WP save it as a pure TEXT
document or it probably will not work.
Pattern Definition File Location
PDEF's are collected from Pattern Definition Files (PDFILE's)
that are located by the combination of the -@<d> (Search) and
-k@<d> (Replace) options, plus the -j@<d> path selector option.
Any PDFILE given on the PDGREPPE command line as an
Absolute file name is used.
eg
For an Absolute path on the CL
PDGREPPE -@"c:\defs\mydefs.def" @xyz bits
PDGREPPE will use the attached data <d> of option "-@<d>"
"c:\defs\mydefs.def"
to specify the file that will be used as the first PDFILE to look
for PDEF's.
If the PDFILE is located at this point and a PDEF of "@xyz" is
found in it, collection for that PDEF is finished and the PDEF
is incorporated into the pattern.
If the file is not located, or the PDEF has not been found in
the file, the next step is skipped, since "c:\defs\mydefs.def"
is an Absolute path.
However if the path was a Relative path like
-@"doit.def"
or
-@"action\doit.def"
then the next step would be taken.
If option "-j@<d>" is enabled:
Another attempt is made to find a PDFILE with the PDEF with any
directories given by option
-j@<d>
Example:
A file like
doit.def
is appended to a PDEF default path like
d:\defs
of -j@"d:\defs" to become
d:\defs\doit.def
Or if the file were given as a composite path like
action\doit.def
that is both a directory "action" and a file "doit.def"
then the resulting path would be
d:\defs\action\doit.def
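The lookup order described above can be sketched in Python as follows. This is an illustration under stated assumptions, not PDGREPPE's code: "candidate_paths" is a hypothetical helper, and Python's "ntpath" module is used only so that the Windows-style paths from the examples behave the same on any platform.

```python
import ntpath  # Windows path rules, whatever platform runs the sketch


def candidate_paths(pdfile, default_dirs):
    """List the places a PDFILE name would be tried, in order."""
    candidates = [pdfile]             # the name exactly as given to -@<d>
    if not ntpath.isabs(pdfile):      # Absolute names skip the -j@<d> step
        for directory in default_dirs:
            candidates.append(ntpath.join(directory, pdfile))
    return candidates


print(candidate_paths(r"c:\defs\mydefs.def", [r"d:\defs"]))
print(candidate_paths("doit.def", [r"d:\defs"]))
print(candidate_paths(r"action\doit.def", [r"d:\defs"]))
```

An Absolute name yields a single candidate, while each Relative name is followed by one extra candidate per default directory.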
Summary of PDFILE Access:
The first place searched for a PDEF is any PDFILE list
specified in a command line argument
e.g. -@mydefs.def;thisthat.def
with the PDFILE names used exactly as given.
If a definition is not found in any command-line PDFILE list,
then the same list
-@<d> for Search PDEF's
or
-k@<d> for Replace PDEF's
is tried again together with the alternate PDFILE directory
option of "-j@<d>".
If still not located, attempts are made to locate it
in any of the -j@<d> directories.
An error message is given if no definition can be found.
However, when found, a definition is placed into the pattern.
And the exact file name and line location can be seen in the
1. Pattern summaries (by option -s)
or, if interactively editing, by
2. The interactive edit command "P" that displays the full patterns
being used for editing.
Pattern Definition Syntax
A PDEF consists of:
a Definition Name (DNAME) <mydef>
at the start of a line and wholly on that one line
followed by
1. Spaces or Tabs
or
2. an escaped newline (the escape character "\" followed by a newline)
(followed by possible indentation space)
and then
the Definition Pattern (DPAT).
The DPAT replaces the DNAME or @<mydef> in the original pattern.
The PDEF should take the form of a DNAME at the start of the
line followed by White Space (spaces and/or tabs and/or escaped
new lines(with possible indentation)) followed by the DPAT.
Back-Slash "\" NewLine Continuation
A DPAT or the intervening White Space can be made longer than a
single line by ending or "escaping" a line with the Back-Slash
character "\" followed by a New Line.
When this is done, the start of the next line can also be
indented with White Space.
(The Back-Slash does NOT become part of the final pattern.)
The DPAT ends with the first unescaped New Line or EOF (End of
File).
Indentation of Definition segments that start with White Space
To start a DPAT with tab or space, the ERE pattern special
character tokens "\t" or "\s", or operator separator character
"," can be specified as the first characters in the DPAT,
followed by the leading tab or space.
Zero Bytes
PDL ignores any ASCII 0 characters. These however can be
specified in a pattern with the string "\0".
Comment Sequences
PDL, like some languages, uses the semicolon character ";"
for comments.
And these comments can be continued to the next line(s) by use
of the Back-Slash noted above. Thus, all it takes to ignore an
entire definition, that may be continued onto following lines by
"\" Back-Slash characters, is to put a single comment byte ";"
at the very front of the definition.
Also, invalid DNAME's count as comments and skip any
further interpretation as a PDEF.
Definition Names
The first character of a DNAME should be a non-digit member of
the VARIABLE character class, such as one of [a-zA-Z_] in the
default variable character range, or in the range specified by
option -jv<d>.
Tab, space, semicolon ";", ASCII 0 and Back-Slash escape "\"
characters, and new line sequences, CANNOT be used in a DNAME.
(As soon as Tab(s) or Space(s), or escaped NewLine(s) followed
by possible indentation, are encountered after a DNAME, the
search or replace pattern builder begins looking for the first
character of the definition pattern.)
Any misnamed DNAME and its DPAT is skipped over.
More on @Definition Names
DNAMES should be some meaningful word or combination of words
(eg @advanceForward).
DNAMES are case sensitive: upper and lower case letters must
match in a pattern and in the PDFILE.
A DNAME must occur on just one line, no continuation
escape-sequence like in a DPAT is allowed.
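The PDFILE rules above (DNAME at the start of a line, White Space, DPAT, Back-Slash continuation, ";" comments, ignored ASCII 0) can be drawn together in a small Python reader. It is a minimal sketch, not PDGREPPE's parser. "parse_pdfile" is a hypothetical name, the name rule assumes letters, digits and "_" with no "-jv<d>" override, only a ";" at the very front of a line is treated as a comment, and keeping the first of any repeated definitions is a guess.

```python
import re

JOIN = "\0"  # free to use as a marker once real ASCII 0 characters are dropped


def parse_pdfile(text):
    """Return {DNAME: DPAT} for the definitions found in PDFILE text."""
    text = text.replace("\0", "").replace("\r\n", "\n")
    # Mark each escaped newline, swallowing any indentation that follows it.
    text = re.sub(r"\\\n[ \t]*", JOIN, text)
    defs = {}
    for line in text.split("\n"):
        if line.startswith(";"):
            continue  # comment, including any continuation lines
        # Assumed name rule: a letter or "_" first, then letters, digits or "_".
        match = re.match(r"([A-Za-z_][A-Za-z0-9_]*)[ \t\x00]+(.+)", line)
        if match is None:
            continue  # blank, misnamed or empty definition: skipped
        name, dpat = match.groups()
        defs.setdefault(name, dpat.replace(JOIN, ""))  # assumption: first one wins
    return defs


sample = r""";my favourite fruits
fruit apple|\
      orange|\
      banana
quack \
      duck|goose|gosling
"""
print(parse_pdfile(sample))
```

Joining continued lines before looking for ";" is what lets a single comment character switch off a whole multi-line definition.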
Definition Patterns
Each DPAT is expected to be a complete expression that can be
fully evaluated, even if it leads to other DPAT's by way of
other PDEF's.
No definition is allowed to call itself directly or indirectly,
through itself or another definition, because this results in an
endless loop.
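A check for such loops could look like the Python sketch below. It is an illustration only: "find_cycle" is a hypothetical helper, the "@name" reference syntax in the regular expression is an assumption, and how PDGREPPE itself detects or reports the loop is not described here.

```python
import re

REF = re.compile(r"@([A-Za-z_][A-Za-z0-9_]*)")  # assumed reference syntax


def find_cycle(defs, name, stack=()):
    """Return the chain of names that loops back on itself, or None."""
    if name in stack:
        return list(stack[stack.index(name):]) + [name]
    for ref in REF.findall(defs.get(name, "")):
        cycle = find_cycle(defs, ref, stack + (name,))
        if cycle:
            return cycle
    return None


# Made-up definitions: a, b and c call each other in a circle.
defs = {"a": "x@b", "b": "@c|y", "c": "@a", "d": "@e", "e": "plain"}
print(find_cycle(defs, "a"))
print(find_cycle(defs, "d"))
```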
Marked group reference numbers after "#" and "%" characters
retain their point of reference or position number WITHIN THEIR
OWN PATTERN; they are NOT VISIBLE TO, OR AFFECTED BY, OTHER
DEFINITIONS (unless globalisation options like "-jg" or "-kg"
are used).
This isolation gives ENCAPSULATION (an ingredient of object
oriented technology) to PDEF's. It provides a way of limiting
the scope of definitions to immediate areas for safety.
Marked group reference NAMES after "#" and "%" are also
ENCAPSULATED by default.
Global @Definitions
With options "-jg" and "-kg" Definition names become GLOBAL.
For this globalised format, all Marked group names in the main
User Pattern and any Definition Patterns used by it or others
must be UNIQUE.
See Options "-jg" and "-kg" in PDGREPPE OPTIONS.
The General Form of Basic Use for PDL @Definitions
@<d> insert definition <d> from pattern definition file here
A PDL definition in a pattern should have the form @<def-name>.
In a PDL file there should be the definition <def-name> followed
by its definition.
An example: In a pattern there could be the string "my @fruit"
and then in the PDL file there would be something like:
fruit apple|orange|banana
or
fruit apple|\
orange|\
banana
or possibly with comments
;my favourite fruits
fruit apple|\
orange|\
banana
Then to use it just give a command like:
pdgreppe -@<PDFILE> @fruit <files-to-search-through>
where <PDFILE> has Pattern Definition File name(s) where the
definition @fruit is stored. And <files-to-search-through> are
the files of interest to look through for apples, oranges or
bananas.
For example:
pdgreppe -Hjc "@fruit" ere.def
or perhaps tropical fruits
pdgreppe -Hjc "@tropFruit" donate.frm
ERE.DEF has a definition for these as
tropFruit :i{pineapple|\
coconut|\
papaya}
and ":i" causes any search for them to be case insensitive.
To have Tabs or Spaces start a DPAT, either begin it with a
Separation character sequence like "," or \<space> or \<tab>
followed by the Tabs or Spaces needed, or use "\t" or "\s" to
represent the Tabs or Spaces at the front of the definition.
Pattern Definition Macros Syntax
The General Form of a Definition Macro is like:
@@<d>(<MyPattern>)
that inserts a definition <d> from a PDFILE as a Macro.
It uses the first non-space item after a Definition Name as a
unique text string.
This string is then replaced, throughout the remainder of the
definition in the PDL file, by the variable in the User pattern:
<MyPattern>
An example: In a user pattern there could be the string
"@@bread(Rye)"
and then in the PDL PDFILE there would be a line like:
bread type I will have it on type bread please
that will become, by substitution,
I will have it on Rye bread please
in the user pattern.
pdgreppe -Hjc "@@bread(Rye)" ere.def
What could be more simple?
Simple plain text Macro substitution in PDL can
1) give further meaning to your patterns by allowing You
to use meaningful names that also work as variables
2) provide shielding between complex pattern specifications
and your simple insertions of relevant pattern data
and
3) save You much typing, or recalculation, when making or
reusing Magic patterns, especially those that are similar
to ones used before.
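The "@@bread(Rye)" example can be mimicked in a few lines of Python. This is a sketch of plain text substitution only, not PDGREPPE's macro processor: "expand_macros" and the "bodies" table are hypothetical, and the call syntax recognised by the regular expression is an assumption based on the General Form shown earlier.

```python
import re

# Assumed call syntax: @@name(argument), with no ")" inside the argument.
MACRO_CALL = re.compile(r"@@([A-Za-z_][A-Za-z0-9_]*)\(([^)]*)\)")


def expand_macros(pattern, bodies):
    """bodies maps a DNAME to everything that follows it in the PDFILE line."""
    def substitute(match):
        name, argument = match.groups()
        # The first non-space item is the placeholder, the rest is the text.
        parameter, template = bodies[name].split(None, 1)
        return template.replace(parameter, argument)

    return MACRO_CALL.sub(substitute, pattern)


bodies = {"bread": "type I will have it on type bread please"}
print(expand_macros("@@bread(Rye)", bodies))
```

Because the substitution is plain text, the placeholder ("type" here) should be a string that appears in the definition only where the argument is meant to go.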
Definition-File Relationship
To see the PDFILE names and lines where Definitions are located
and what they are composed of, use option "-jD" along with
option "-s" for a summary Full Pattern, or if in interactive
edit, use the "P" command to show the search/replace patterns.
PDL processing gives information when there are errors. It will
give the name of the file in question that contains a faulty
definition, and a stack of information of any PDFILE's and
definitions that were also being processed at the time of error.
Repeated definitions are reported with option "-jW" so if You
are unable to make a pattern with definitions correctly, try
this to see if a definition is in some other part of the
intended file or other files that are being used for
definitions.
Summary
PDL or Pattern Definition Language, used with PDGREPPE, is an
attempt to make Magic Expressions (ME's) easier to use and
reuse.
(Especially since having to "cook up" ME's from scratch
can be difficult.)
It becomes easier when there is a PDL dictionary or "tool
box" to help.
Let your Magic aspire to reusability! Use PDL!
Sometimes referred to as Pagan Definition Language...
Pagans are the ones who have it right!
© Intelligence Services 1987 - 2008
GPO Box 9, ADELAIDE SA 5001, AUSTRALIA
EMAIL : intlsvs@gmail.com