Intelligence Services

PDL - Pattern Definition Language

PDGREPPE offers an advanced feature over most other GREP's:

	Pattern Definitions or PDEF's

and

	Pattern Definition Dictionaries

These help You to formulate and re-use Magic Expressions
for both Search and Replace.

This document explains the basis for PDEF's in PDGREPPE and
their syntax (rules) and semantics (activities) that form
Pattern Definition Language or PDL.

The Basis for Pattern Definitions

One of the major limitations to most magic expressions in data
search and modification tools is the unavailability of magic
pattern definition storage for retrieval and reference.

Let the hard work You have done to create Magic Expressions
save You from reinventing the Magic Wheel.

Storage

If storage for magic expressions exists, then a pattern can be
more clearly constructed and comments given along with the
pattern to explain what the pattern does and consists of.

PDL allows use of multiple files for storage and reference so
files can be categorised and used for separate functions.

Storage allows REUSE of a magic pattern:

(After constructing a successful pattern, from either the command
line or a file in an editor or perhaps some other process, You
can store it so it can be reused.)

PDL Options

PDGREPPE has options for storage such as

-@<d>	file(s) for Search pattern @Definitions.

that specifies the "wells" or sources that pattern definitions
are drawn from

and similar options

-k@<d>	file(s) for Replace pattern @Definitions.
-j@<d>	  @Definition file(s) default directories.++ (.;:HOME:)

that allow PDEF's in Pattern Definition Files (PDFILE's) to be
referenced from "standard" places that You specify, such as a
directory or folder named "DEFS".

Another special option is

-jO	Optimise for many pattern @definitions.

that causes patterns with many repeated definitions and/or frequent reuse
of definition(s) to be built more quickly.

Creation of Definitions

A successful magic pattern is typically formed by opening a copy
of a PDL file in your favourite editor, with the pattern
expressed as a definition in that file.

Then You edit the pattern until You think it will match or
replace as intended.

After this You jump out of the editor, perhaps into a separate
window and give the pattern a test on some data from a command
shell or other process. PDGREPPE is given a pattern as part of
its arguments or command line parameters that references the
PDEF.  This could be a simple batch file that calls a PDGREPPE
command line.

An output from PDGREPPE on the screen or through an editor or
some other monitor is examined to compare the actual result to
the intended result.

If the results compare favourably, the PDEF is available for
further use, PDEF dictionary/file maintenance or modification.

If not, this sequence is repeated until You have a stable
pattern that does a correct search or replace activity.

Development and Debug of PDEF's

A Summary option

	-s show Summary of Pattern, File and Results per file.

or an interactive edit command

	P

displays the Full Pattern that results from using special magic
such as definitions, and Marked Group pattern/data operators.

This Full Pattern display can also show the PDFILE name and line
number of the definitions incorporated in a pattern by options

-jD show Search  @Definition=File:LineNumber={Pattern} in FullPattern.

and

-kD show Replace @Definition=File:LineNumber={Pattern} in FullPattern.

Perhaps one of the best PDL Debug options is

-jW show Warnings for pattern @definitions.

It shows

1. Repeated
2. Misnamed
3. Empty
4. Indent-broken
or
5. White-space terminated

PDEF's in PDFILE's.

The integrity of PDFILE's is enhanced by using options such as
these.
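
The kind of checking that "-jW" performs can be pictured with a
small Python sketch.  It is an illustration only: the helper
`check_pdefs`, the tuple layout and the wording of the warnings
are all assumptions, and only three of the five conditions
(Repeated, Empty, White-space terminated) are attempted.

```python
def check_pdefs(entries):
    """Return warning strings for (file, line_number, dname, dpat) tuples."""
    warnings = []
    first_seen = {}
    for file_name, line_number, dname, dpat in entries:
        where = "%s:%d" % (file_name, line_number)
        if dname in first_seen:
            # Repeated: the same name was already defined somewhere else.
            warnings.append("%s: @%s repeats %s" % (where, dname, first_seen[dname]))
        else:
            first_seen[dname] = where
        if dpat == "":
            warnings.append("%s: @%s is empty" % (where, dname))
        elif dpat[-1] in " \t":
            warnings.append("%s: @%s ends in white space" % (where, dname))
    return warnings

# Hypothetical definitions, as if already read from two PDFILE's.
entries = [
    ("ere.def", 1, "quack", "duck|goose|gosling"),
    ("ere.def", 7, "fruit", "apple|orange|banana "),
    ("mydefs.def", 3, "quack", "duck"),
    ("mydefs.def", 9, "nothing", ""),
]
for warning in check_pdefs(entries):
    print(warning)
```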

Customisation

When creating a PDEF, You have a number of options that allow even
more flexibility and ease of use such as options

-jg	  allow Global (Marked Group) Search references (% or #).
-kg	  allow Global (Marked Group) Replace references (%+-?).

These allow a "basic" type GLOBAL access to all PDEF's in all
specified PDFILE's, so any Definition can be de-encapsulated.

See PDGREPPE Options.

Syntax

PDL lets symbolic names be used for patterns of Literal and
Magic Expressions.

And these can be used in either SEARCH or REPLACE patterns.

Collection or compilation of a definition of the form

	@<mydef>
or

	@@<MyMacro>(<MyPattern>)

searches one of several possible Pattern Definition Files
(PDFILE's) for a complete Pattern Definition (PDEF).

A symbolic name might be something like

	@quack

that represents a Search pattern like

	duck|goose|gosling

that matches "duck" or "goose" or "gosling".
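
The replacement of a symbolic name by its pattern can be
sketched in Python.  This is only an illustration under
assumptions: the dictionary `pdefs`, the helper
`expand_definitions` and the choice to wrap the inserted pattern
in a group are hypothetical, and Python's `re` module stands in
for PDGREPPE's own expression engine.

```python
import re

# Hypothetical in-memory dictionary standing in for stored pattern definitions.
pdefs = {"quack": "duck|goose|gosling"}

def expand_definitions(pattern, definitions):
    """Replace each @name in pattern with its stored pattern."""
    def substitute(match):
        name = match.group(1)
        if name not in definitions:
            raise KeyError("no definition found for @" + name)
        # Grouping keeps the alternation together (an assumption of this sketch).
        return "(?:" + definitions[name] + ")"
    return re.sub(r"@([A-Za-z_][A-Za-z0-9_]*)", substitute, pattern)

full_pattern = expand_definitions("@quack", pdefs)
text = "The goose chased a duck past the swan."
matches = re.findall(full_pattern, text)
print(full_pattern)   # (?:duck|goose|gosling)
print(matches)        # ['goose', 'duck']
```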

PDL permits pattern storage in standard or optional locations
and is an easy way to use pattern dictionaries.

So, if the "@quack" definition is stored in a PDFILE like
"ERE.DEF" as a PDEF of

	@quack	duck|goose|gosling

then a command line like

	PDGREPPE "@quack" birdtale.txt

can be used to search for any text occurrence of "duck" or
"goose" or "gosling" in the file "birdtale.txt".

Pattern Definition File Syntax

A PDFILE is a file You create with a text editor or word
processor (WP).  It must be a pure text file, so if making it
with a WP, then be sure and have the WP save it as a pure TEXT
document or it probably will not work.

Pattern Definition File Location

Collections of PDEF's in Pattern Definition Files (PDFILE's)
are located by the combination of the -@<d> (Search) and -k@<d>
(Replace) options, plus the -j@<d> path selector option.

Any PDFILE given on the PDGREPPE command line as an
Absolute file name is used.

eg

For an Absolute path on the CL

	PDGREPPE -@"c:\defs\mydefs.def" @xyz bits

PDGREPPE will use the attached data <d> of option "-@<d>"

	"c:\defs\mydefs.def"

to specify the file that will be used as the first PDFILE to look
for PDEF's.

If a PDFILE has been located at this point and a PDEF of "@xyz"
found in that PDFILE, the collection for that PDEF is finished
and incorporated into the pattern.

If the file is not located, or the PDEF has not been found in
the file, the next step is skipped, since "c:\defs\mydefs.def"
is an Absolute path.

However if the path was a Relative path like

	-@"doit.def"

or

	-@"action\doit.def"

then the next step would be taken.

If option "-j@<d>" is enabled:

Another attempt is made to find a PDFILE containing the PDEF,
using any directories given by option

	-j@<d>

Example:

A file like

	doit.def

is appended to a PDEF default path like

	d:\defs

of -j@"d:\defs" to become

	d:\defs\doit.def

Or if the file were given as a composite path like

	action\doit.def

that is both a directory "action" and a file "doit.def"
then the PDEF default path could be

	d:\defs\action\doit.def
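
The Absolute and Relative cases above can be sketched in Python.
The helper `candidate_paths` is hypothetical and only lists the
places that would be tried, in order.  It uses the standard
`ntpath` module so that the DOS-style paths of this document
behave the same on any system.  How PDGREPPE itself splits and
orders a list of directories is not modelled here.

```python
import ntpath  # Windows-style path rules, matching the examples in this document

def candidate_paths(pdfile, default_dirs):
    """Return the places a PDFILE would be looked for, in order."""
    candidates = [pdfile]
    # An Absolute path is used as given; the default directories are skipped.
    if ntpath.isabs(pdfile):
        return candidates
    # A Relative path is also tried under each default directory.
    for directory in default_dirs:
        candidates.append(ntpath.join(directory, pdfile))
    return candidates

print(candidate_paths(r"c:\defs\mydefs.def", [r"d:\defs"]))
print(candidate_paths(r"doit.def", [r"d:\defs"]))
print(candidate_paths(r"action\doit.def", [r"d:\defs"]))
```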

Summary of PDFILE Access:

The first place searched for a PDEF is in any PDFILE list
specified in a command line argument

e.g. -@mydefs.def;thisthat.def

with its absolute paths to PDFILE's.

If a definition is not found in any command-line PDFILE list,
then the PDFILE access is tried with the alternate PDFILE
directory option of "-j@<d>" along with the same list

	-@<d>  for Search PDEF's

or

	-k@<d> for Replace PDEF's

If still not located, attempts are made to locate it
in any of the -j@<d> directories.

An error message is given if no definition can be found.

However, when found, a definition is placed into the pattern.

And the exact file name and line location can be seen in the

1. Pattern summaries (by option -s)

or, if interactively editing, by

2. The interactive edit command "P" that displays the full patterns
   being used for editing.

Pattern Definition Syntax

A PDEF consists of:

	a Definition Name (DNAME) <mydef>

at the start of a line and for an entire line

followed by

1. Spaces or Tabs

or

2. an escaped newline (the escape character "\" followed by a newline)
  (followed by possible indentation space)

and then

	the Definition Pattern (DPAT).

The DPAT replaces the DNAME or @<mydef> in the original pattern.

The PDEF should take the form of a DNAME at the start of the
line followed by White Space (spaces and/or tabs and/or escaped
new lines, with possible indentation) followed by the DPAT.

Back-Slash "\" NewLine Continuation

A DPAT or the intervening White Space can be made longer than a
single line by ending or "escaping" a line with the Back-Slash
character "\" followed by a New Line.

When this is done, the start of the next line can also be
indented with White Space.

(The Back-Slash does NOT become part of the final pattern.)

The DPAT ends with the first unescaped New Line or EOF (End of
File).
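
A minimal Python sketch of this continuation rule follows.  The
helper `join_continued_lines` is hypothetical, and it assumes
that any line whose last character is a Back-Slash is continued;
a Back-Slash that is itself escaped is not handled.

```python
def join_continued_lines(text):
    """Join lines ended by a Back-Slash, dropping the Back-Slash and next-line indentation."""
    logical_lines = []
    current = ""
    continuing = False
    for line in text.splitlines():
        if continuing:
            line = line.lstrip(" \t")  # indentation after an escaped New Line is dropped
        if line.endswith("\\"):
            current += line[:-1]       # the Back-Slash itself is not kept
            continuing = True
        else:
            logical_lines.append(current + line)
            current = ""
            continuing = False
    if continuing:
        logical_lines.append(current)  # EOF also ends a definition
    return logical_lines

# A hypothetical PDFILE holding one definition spread over three lines.
pdfile_text = "quack\tduck|\\\n\t\tgoose|\\\n\t\tgosling\n"
print(join_continued_lines(pdfile_text))
```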

Indentation of Definition segments that start with White Space

To start a DPAT with tab or space, the ERE pattern special
character tokens "\t" or "\s", or operator separator character
"," can be specified as the first characters in the DPAT,
followed by the leading tab or space.

Zero Bytes

PDL ignores any ASCII 0 characters.  These however can be
specified in a pattern with the string "\0".

Comment Sequences

PDL, like some languages, uses the semicolon character ";"
for comments.

And these comments can be continued to the next line(s) by use
of the Back-Slash noted above.  Thus, all it takes to ignore an
entire definition, that may be continued onto following lines by
"\" Back-Slash characters, is to put a single comment byte ";"
at the very front of the definition.

Also, invalid DNAME's count as comments and skip any
further interpretation as a PDEF.
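
The effect of a single leading ";" on a continued definition can
be sketched in Python.  The helper `active_lines` is
hypothetical: it first joins Back-Slash continued lines and then
drops blank lines and lines that start with ";".

```python
import re

def active_lines(pdfile_text):
    """Join continued lines, then drop comment lines and blank lines."""
    # An escaped New Line, plus any indentation after it, is removed.
    joined = re.sub(r"\\\n[ \t]*", "", pdfile_text)
    return [line for line in joined.splitlines()
            if line.strip() and not line.startswith(";")]

# The three-line "fruit" definition is commented out by one ";".
pdfile_text = (
    ";fruit\tapple|\\\n"
    "\t\torange|\\\n"
    "\t\tbanana\n"
    "quack\tduck|goose|gosling\n"
)
print(active_lines(pdfile_text))  # ['quack\tduck|goose|gosling']
```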

Definition Names

The first character of a DNAME should be a non-digit member of
the VARIABLE character class like one of [a-zA-Z_], default
variable characters range, or as specified by option -jv<d>.

Tab, space, semicolon ";", ASCII 0 and Back-Slash escape "\"
characters, and new line sequences, CANNOT be used in a DNAME.

(As soon as Tab(s) or Space(s), or escaped NewLine(s) followed
by possible indentation, are encountered after a DNAME, the
search or replace pattern builder begins looking for a
definition character.)

Any misnamed DNAME and its DPAT are skipped over.
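
The naming rules can be sketched in Python as a check applied to
one already-joined PDFILE line.  The helper `split_pdef` and the
expression `DNAME_RE` are hypothetical.  They assume the default
VARIABLE characters are letters, digits and underscore; option
-jv<d> could change that set.

```python
import re

# Assumed default: a non-digit first character, then letters, digits or "_".
DNAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def split_pdef(logical_line):
    """Split one joined PDFILE line into (DNAME, DPAT), or return None to skip it."""
    if logical_line.startswith(";"):
        return None                      # comment
    parts = logical_line.split(None, 1)  # DNAME, then White Space, then DPAT
    if not parts or logical_line[0] in " \t":
        return None                      # blank line, or no DNAME at the start of the line
    name = parts[0]
    if not DNAME_RE.fullmatch(name):
        return None                      # misnamed: treated like a comment
    dpat = parts[1] if len(parts) > 1 else ""
    return name, dpat

print(split_pdef("quack\tduck|goose|gosling"))  # ('quack', 'duck|goose|gosling')
print(split_pdef("9lives\tcat"))                # None
print(split_pdef(";my favourite fruits"))       # None
```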

More on @Definition Names

DNAMES should be some meaningful word or combination of words

(eg @advanceForward).

DNAMES are case sensitive: upper and lower case letters must
match in a pattern and in the PDFILE.

A DNAME must occur on just one line; no continuation
escape-sequence, like that in a DPAT, is allowed.

Definition Patterns

Each DPAT is expected to be a complete expression that can be
fully evaluated, even if it leads to other DPAT's by way of
other PDEF's.

No definition is allowed to call itself directly or indirectly,
through itself or another definition, because this results in an
endless loop.
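
Why such a definition must be refused can be seen in a short
Python sketch that expands definitions while remembering which
names are currently being expanded.  The helper `expand` and the
wording of its error are hypothetical; this is not a description
of how PDGREPPE detects the condition.

```python
import re

REF_RE = re.compile(r"@([A-Za-z_][A-Za-z0-9_]*)")

def expand(pattern, definitions, stack=()):
    """Expand @name references recursively, refusing definitions that reach themselves."""
    def substitute(match):
        name = match.group(1)
        if name in stack:
            # The name is already being expanded further up: an endless loop.
            chain = " -> ".join(stack + (name,))
            raise ValueError("definition calls itself: " + chain)
        if name not in definitions:
            raise KeyError("no definition found for @" + name)
        return expand(definitions[name], definitions, stack + (name,))
    return REF_RE.sub(substitute, pattern)

good = {"bird": "@quack|swan", "quack": "duck|goose|gosling"}
print(expand("@bird", good))          # duck|goose|gosling|swan

bad = {"a": "x@b", "b": "y@a"}        # "a" reaches itself through "b"
try:
    expand("@a", bad)
except ValueError as error:
    print(error)                      # definition calls itself: a -> b -> a
```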

Marked group reference numbers after "#" and "%" characters
retain their point of reference or position number WITHIN THEIR
OWN PATTERN, NOT VISIBLE OR AFFECTED BY OTHER DEFINITIONS
(unless globalisation options like "-jg" or "-kg" are used).

This isolation gives ENCAPSULATION (an ingredient of object
oriented technology) to PDEF's.  It provides a way of limiting
the scope of definitions to immediate areas for safety.

Marked group reference NAMES after "#" and "%" are by default
ENCAPSULATED.

Global @Definitions

With options "-jg" and "-kg" Definition names become GLOBAL.

For this globalised format, all Marked group names in the main
User Pattern and any Definition Patterns used by it or others
must be UNIQUE.

See Options "-jg" and "-kg" in PDGREPPE OPTIONS.

The General Form of Basic Use for PDL @Definitions

@<d>	insert definition <d> from pattern definition file here

A PDL definition in a pattern should have the form @<def-name>.
In a PDL file there should be the Definition Name <def-name>
followed by its Definition Pattern.

An example: In a pattern there could be the string "my @fruit"
and then in the PDL file there would be something like:

	fruit	apple|orange|banana

or

	fruit	apple|\
		orange|\
		banana

or possibly with comments

	;my favourite fruits
	fruit	apple|\
		orange|\
		banana

Then to use it just give a command like:

	pdgreppe -@<PDFILE> @fruit <files-to-search-through>

where <PDFILE> has Pattern Definition File name(s) where the
definition @fruit is stored.  And <files-to-search-through> are
the files of interest to look through for apples, oranges or
bananas.


	pdgreppe -Hjc "@fruit" ere.def


or perhaps tropical fruits


	pdgreppe -Hjc "@tropFruit" donate.frm


ERE.DEF has a definition for these as

	tropFruit	:i{pineapple|\
			coconut|\
			papaya}

and ":i" causes any search for them to be case insensitive.

To have Tabs or Spaces start a DPAT, just use a Separation
character sequence like "," or \<space> or \<tab> followed by
the Tabs or Spaces needed or use "\s" or "\t" to represent Tabs
or Spaces at the front of a definition.

Pattern Definition Macros Syntax

The General Form of a Definition Macro is like:

	@@<d>(<MyPattern>)

that inserts a definition <d> from a PDFILE as a Macro.

It uses the first non-space item after a Definition Name as a
unique text string.

This string is then substituted, throughout the remainder of the
definition in the PDL file, by the variable in the User pattern:

	 <MyPattern>

An example: In a user pattern there could be the string

	"@@bread(Rye)"

and then in the PDL PDFILE there would be a line like:

	bread	type	I will have it on type bread please

that will become, by substitution,

	I will have it on Rye bread please

in the user pattern.
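
The substitution above can be sketched in Python as plain text
replacement.  The helper `expand_macro` is hypothetical.  Being
plain text, it would also replace the placeholder where it occurs
inside a longer word, which may or may not match what PDGREPPE
does.

```python
def expand_macro(pdef_line, argument):
    """Expand one macro PDEF line, given the argument from @@name(argument)."""
    # DNAME, then the placeholder (first non-space item), then the rest of the definition.
    name, placeholder, body = pdef_line.split(None, 2)
    return body.replace(placeholder, argument)

line = "bread\ttype\tI will have it on type bread please"
print(expand_macro(line, "Rye"))  # I will have it on Rye bread please
```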


	pdgreppe -Hjc "@@bread(Rye)" ere.def


What could be more simple?

Simple plain text Macro substitution in PDL can

1)	give further meaning to your patterns by allowing You
	to use meaningful names that also work as variables

2)	provide shielding between complex pattern specifications
	and your simple insertions of relevant pattern data

and

3)	save You much typing, or recalculation, when making or
	reusing Magic patterns, especially those that are similar
	to ones used before.

Definition-File Relationship

To see the PDFILE names and lines where Definitions are located
and what they are composed of, use option "-jD" along with
option "-s" for a summary Full Pattern, or if in interactive
edit, use the "P" command to show search/replace patterns.

PDL processing gives information when there are errors.  It will
give the name of the file in question that contains a faulty
definition, and a stack of information of any PDFILE's and
definitions that were also being processed at the time of error.

Repeated definitions are reported with option "-jW" so if You
are unable to make a pattern with definitions correctly, try
this to see if a definition is in some other part of the
intended file or other files that are being used for
definitions.

Summary

PDL, or Pattern Definition Language, used with PDGREPPE is an
attempt to make Magic Expressions (ME's) easier to use and
to reuse.

(Especially since having to "cook up" ME's from scratch
can be difficult.)

It becomes easier when there is a PDL dictionary or "tool
box" to help.

Let your Magic aspire to reusability!  Use PDL!

Sometimes referred to as Pagan Definition Language...

Pagans are the ones who have it right!

© Intelligence Services 1987 - 2008   GPO Box 9,   ADELAIDE SA 5001,   AUSTRALIA
EMAIL   :   intlsvs@gmail.com