[ < < < Home ]
[ < < Reference Start ]
[ < Reference Contents ]
[ < Previous=PDGREPPE Limits ]
[ Next=PDGREPPE Options > ]
PDGREPPE offers an advanced feature over most other GREP's: Pattern Definitions (PDEF's) and Pattern Definition Dictionaries. These help You to formulate and re-use Magic Expressions for both Search and Replace. This document explains the basis for PDEF's in PDGREPPE, and the syntax (rules) and semantics (activities) that form the Pattern Definition Language, or PDL.

The Basis for Pattern Definitions

One of the major limitations of most magic expressions in data search and modification tools is that there is nowhere to store magic pattern definitions for later retrieval and reference. Let the hard work You have done to create Magic Expressions save You from reinventing the Magic Wheel.

Storage

If storage for magic expressions exists, a pattern can be constructed more clearly, and comments can be given along with the pattern to explain what it does and what it consists of. PDL allows multiple files to be used for storage and reference, so files can be categorised and used for separate functions.

Storage allows REUSE of a magic pattern. After constructing a successful pattern, from the command line, from a file in an editor, or perhaps from some other process, You can store it so it can be reused.

PDL Options

PDGREPPE has storage options such as

    -@<d>    file(s) for Search pattern @Definitions.

which specifies the "wells" or sources that pattern definitions are drawn from, and the similar options

    -k@<d>   file(s) for Replace pattern @Definitions.
    -j@<d>   @Definition file(s) default directories.++ (.;:HOME:)

that allow PDEF's in Pattern Definition Files (PDFILE's) to be referenced from "standard" places that You specify, such as a directory or folder named "DEFS". Another special option is

    -jO      Optimise for many pattern @definitions.

which causes patterns with many repeated definitions and/or frequent reuse of definition(s) to be built more quickly.
Creation of Definitions

A successful magic pattern is typically formed by placing a copy of a PDL file in your favourite editor, with the pattern expressed as a definition in that file. You edit the pattern until You think it will match or replace as intended. Then You jump out of the editor, perhaps into a separate window, and test the pattern on some data from a command shell or other process. PDGREPPE is given a pattern that references the PDEF as part of its arguments or command line parameters. This could be done from a simple batch file that calls a PDGREPPE command line. The output from PDGREPPE, on the screen or through an editor or some other monitor, is examined to compare the actual result with the intended result. If the results compare favourably, the PDEF is available for further use, PDEF dictionary/file maintenance or modification. If not, the sequence is repeated until You have a stable pattern that performs the correct search or replace activity.

Development and Debug of PDEF's

The Summary option

    -s    show Summary of Pattern, File and Results per file.

or the interactive edit command P displays the Full Pattern that results from using special magic such as definitions and Marked Group pattern/data operators. This Full Pattern display can also show the PDFILE name and line number of the definitions incorporated in a pattern, by options

    -jD   show Search @Definition=File:LineNumber={Pattern} in FullPattern.
    -kD   show Replace @Definition=File:LineNumber={Pattern} in FullPattern.

Perhaps one of the best PDL Debug options is

    -jW   show Warnings for pattern @definitions.

It shows PDEF's in PDFILE's that are
1. Repeated
2. Misnamed
3. Empty
4. Indent-broken or
5. White-space terminated

The integrity of PDFILE's is enhanced by using options such as these.

Customisation

When creating a PDEF, You have a number of options that allow even more flexibility and ease of use, such as

    -jg   allow Global (Marked Group) Search references (% or #).
    -kg   allow Global (Marked Group) Replace references (%+-?).

These allow a "basic" type GLOBAL access to all PDEF's in all specified PDFILE's, so any Definition can be de-encapsulated. See PDGREPPE Options.

Syntax

PDL lets symbolic names be used for patterns of Literal and Magic Expressions, and these names can be used in either SEARCH or REPLACE patterns. Collection or compilation of a definition of the form

    @<mydef>

or

    @@<MyMacro>(<MyPattern>)

searches one of several possible Pattern Definition Files (PDFILE's) for a complete Pattern Definition (PDEF). A symbolic name might be something like

    @quack

that represents a Search pattern like

    duck|goose|gosling

that matches "duck" or "goose" or "gosling". PDL permits pattern storage in standard or optional locations and is an easy way to use pattern dictionaries. So, if the "@quack" definition is stored in a PDFILE like "ERE.DEF" as a PDEF of

quack duck|goose|gosling

then a command line like

    PDGREPPE "@quack" birdtale.txt

can be used to search for any text occurrence of "duck" or "goose" or "gosling" in the file "birdtale.txt".

Pattern Definition File Syntax

A PDFILE is a file You create with a text editor or word processor (WP). It must be a pure text file, so if You make it with a WP, be sure to have the WP save it as a pure TEXT document, or it probably will not work.

Pattern Definition File Location

PDEF's are collected from PDFILE's that are located by the combination of the -@<d> (Search) and -k@<d> (Replace) options, plus the -j@<d> path selector option. Any PDFILE given on the PDGREPPE command line as an Absolute file name is used. For example, with an Absolute path on the command line

    PDGREPPE -@"c:\defs\mydefs.def" @xyz bits

PDGREPPE will use the attached data <d> of option "-@<d>", here "c:\defs\mydefs.def", to specify the file that will be used as the first PDFILE to look in for PDEF's.
If a PDFILE has been located at this point and a PDEF of "@xyz" is found in that PDFILE, the collection for that PDEF is finished and it is incorporated into the pattern. If the file is not located, or the PDEF has not been found in the file, the next step is skipped, since "c:\defs\mydefs.def" is an Absolute path. However, if the path were a Relative path like

    -@"doit.def"

or

    -@"action\doit.def"

then the next step would be taken.

If option "-j@<d>" is enabled, another attempt is made to find a PDFILE containing the PDEF, using any directories given by option -j@<d>. Example: a file like

    doit.def

is appended to a PDEF default path like d:\defs, given by

    -j@"d:\defs"

to become

    d:\defs\doit.def

Or, if the file were given as a composite path like

    action\doit.def

that is both a directory "action" and a file "doit.def", then the resulting path would be

    d:\defs\action\doit.def

Summary of PDFILE Access:

The first place searched for a PDEF is any PDFILE list specified in a command line argument, e.g.

    -@mydefs.def;thisthat.def

with its absolute paths to PDFILE's. If a definition is not found in any command-line PDFILE list, then PDFILE access is tried with the alternate PDFILE directory option "-j@<d>", along with the same list: -@<d> for Search PDEF's or -k@<d> for Replace PDEF's. If still not located, attempts are made to locate it in any of the -j@<d> directories. An error message is given if no definition can be found. When a definition is found, it is placed into the pattern, and the exact file name and line location can be seen in
1. Pattern summaries (by option -s) or, if interactively editing, by
2. The interactive edit command "p" that displays the full patterns being used for editing.

Pattern Definition Syntax

A PDEF consists of a Definition Name (DNAME) <mydef> at the start of a line, contained entirely on that line, followed by
1. Spaces or Tabs or
2. an escaped newline (the escape character "\" followed by a newline, followed by possible indentation space)
and then the Definition Pattern (DPAT). The DPAT replaces the DNAME, or @<mydef>, in the original pattern.

In short, a PDEF should take the form of a DNAME at the start of the line, followed by White Space (spaces and/or tabs and/or escaped new lines, with possible indentation), followed by the DPAT.

Back-Slash "\" NewLine Continuation

A DPAT, or the intervening White Space, can be made longer than a single line by ending or "escaping" a line with the Back-Slash character "\" followed by a New Line. When this is done, the start of the next line can also be indented with White Space. (The Back-Slash does NOT become part of the final pattern.) The DPAT ends with the first unescaped New Line or EOF (End of File).

Indentation of Definition segments that start with White Space

To start a DPAT with a tab or space, the ERE pattern special character tokens "\t" or "\s", or the operator separator character ",", can be specified as the first characters in the DPAT, followed by the leading tab or space.

Zero Bytes

PDL ignores any ASCII 0 characters. These can, however, be specified in a pattern with the string "\0".

Comment Sequences

PDL, like some languages, uses the semi-colon character ";" for comments. These comments can be continued onto the next line(s) by use of the Back-Slash noted above. Thus, all it takes to ignore an entire definition, even one continued onto following lines by "\" Back-Slash characters, is to put a single comment byte ";" at the very front of the definition. Also, invalid DNAME's count as comments and skip any further interpretation as a PDEF.

Definition Names

The first character of a DNAME should be a non-digit member of the VARIABLE character class: one of [a-zA-Z_], the default variable characters range, or as specified by option -jv<d>.
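As a rough illustration of the file syntax described so far, the following Python sketch reads PDFILE-style text into a dictionary of DNAME to DPAT. It is not PDGREPPE's own code, and several points are assumptions made only for the example: the names `logical_lines` and `parse_pdfile` are hypothetical, only the default [a-zA-Z_] first character is handled (later DNAME characters are assumed to be letters, digits or underscore, and option -jv<d> is not modelled), only a ";" at the very front of an entry is treated as a comment, and the first of any repeated definitions is kept (the text says only that repeats are reported by -jW).

```python
import re

# DNAME, then separating white space (spaces, tabs, or an escaped newline
# with optional indentation), then the DPAT up to the end of the entry.
# The characters allowed after the first DNAME character are an assumption.
PDEF_RE = re.compile(
    r"([A-Za-z_][A-Za-z0-9_]*)((?:[ \t]|\\\n[ \t]*)+)(.*)", re.S
)
# Back-Slash + newline + optional indentation: removed from the final DPAT.
CONTINUATION_RE = re.compile(r"\\\n[ \t]*")


def logical_lines(text):
    """Yield entries, joining any line that ends in a Back-Slash to the next."""
    buf = ""
    for raw in text.split("\n"):
        buf += raw
        if raw.endswith("\\"):
            buf += "\n"  # escaped newline: the entry continues
            continue
        yield buf
        buf = ""
    if buf:
        yield buf


def parse_pdfile(text):
    """Return {DNAME: DPAT} for PDFILE-style text (illustrative sketch only)."""
    # ASCII 0 characters are ignored; line endings are normalised first.
    text = text.replace("\r\n", "\n").replace("\0", "")
    defs = {}
    for entry in logical_lines(text):
        if entry.startswith(";"):
            continue  # comment, including any continued lines
        match = PDEF_RE.match(entry)
        if match is None:
            continue  # an invalid DNAME counts as a comment
        dpat = CONTINUATION_RE.sub("", match.group(3))
        defs.setdefault(match.group(1), dpat)  # assumption: first one wins
    return defs


SAMPLE = (
    ";my favourite fruits\n"
    "fruit apple|\\\n"
    "      orange|\\\n"
    "      banana\n"
    "quack\\\n"
    "    duck|goose|gosling\n"
    ";off x|\\\n"
    "    y\n"
    "9bad nope\n"
)

print(parse_pdfile(SAMPLE))
```

In this sample the commented-out entry ";off" and the misnamed entry "9bad" are both skipped, and the two continued definitions are joined into single-line patterns.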
Tab, space, semi-colon ";", ASCII 0 and Back-Slash escape "\" characters, and new line sequences, CANNOT be used in a DNAME. (As soon as Tab(s) or Space(s), or escaped NewLine(s) followed by possible indentation, are encountered after a DNAME, the search or replace pattern builder begins looking for a definition character.) Any misnamed DNAME is skipped over, together with its DPAT.

More on @Definition Names

DNAME's should be some meaningful word or combination of words (eg @advanceForward). DNAME's are case sensitive: upper and lower case letters must match between the pattern and the PDFILE. A DNAME must occur on just one line; unlike a DPAT, it cannot use a continuation escape sequence.

Definition Patterns

Each DPAT is expected to be a complete expression that can be fully evaluated, even if it leads to other DPAT's by way of other PDEF's. No definition is allowed to call itself, either directly or indirectly through another definition, because this results in an endless loop.

Marked group reference numbers after "#" and "%" characters retain their point of reference or position number WITHIN THEIR OWN PATTERN, NOT VISIBLE TO OR AFFECTED BY OTHER DEFINITIONS (unless globalisation options like "-jg" or "-kg" are used). This isolation gives ENCAPSULATION (an ingredient of object oriented technology) to PDEF's. It provides a way of limiting the scope of definitions to immediate areas for safety. Marked group reference NAMES after "#" and "%" are ENCAPSULATED by default.

Global @Definitions

With options "-jg" and "-kg", Definition names become GLOBAL. In this globalised format, all Marked group names in the main User Pattern, and in any Definition Patterns used by it or by others, must be UNIQUE. See Options "-jg" and "-kg" in PDGREPPE OPTIONS.

The General Form of Basic Use for PDL @Definitions

    @<d>    insert definition <d> from pattern definition file here

A PDL definition in a pattern should have the form @<def-name>.
In a PDL file there should be the definition name <def-name> followed by its definition.

An example: in a pattern there could be the string "my @fruit", and then in the PDL file there would be something like:

fruit apple|orange|banana

or

fruit apple|\
      orange|\
      banana

or possibly with comments

;my favourite fruits
fruit apple|\
      orange|\
      banana

Then, to use it, just give a command like:

    pdgreppe -@<PDFILE> @fruit <files-to-search-through>

where <PDFILE> is the Pattern Definition File name(s) where the definition @fruit is stored, and <files-to-search-through> are the files of interest to look through for apples, oranges or bananas. For example:

    pdgreppe -Hjc "@fruit" ere.def

or perhaps tropical fruits

    pdgreppe -Hjc "@tropFruit" donate.frm

ERE.DEF has a definition for these as

tropFruit :i{pineapple|\
          coconut|\
          papaya}

and ":i" causes any search for them to be case insensitive.

To have Tabs or Spaces start a DPAT, just use a Separation character sequence like "," or \<space> or \<tab> followed by the Tabs or Spaces needed, or use "\s" or "\t" to represent Tabs or Spaces at the front of a definition.

Pattern Definition Macros Syntax

The General Form of a Definition Macro is:

    @@<d>(<MyPattern>)

This inserts a definition <d> from a PDFILE as a Macro. It uses the first non-space item after the Definition Name as a unique text string. Every occurrence of this string in the remainder of the definition in the PDL file is then replaced by the variable given in the User pattern: <MyPattern>.

An example: in a user pattern there could be the string "@@bread(Rye)", and then in the PDL PDFILE there would be a line like:

bread type I will have it on type bread please

that will become, by substitution,

    I will have it on Rye bread please

in the user pattern.

    pdgreppe -Hjc "@@bread(Rye)" ere.def

What could be more simple?
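The two substitutions above, @fruit and @@bread(Rye), can be modelled as plain text replacement. The Python sketch below is only an illustration under stated assumptions, not how PDGREPPE is implemented: the `expand` function and the `DEFS` dictionary are hypothetical, a reference name is assumed to run over letters, digits and underscore, and the DPAT is inserted as-is without any added grouping (the text does not say whether the real tool groups an inserted pattern). It also refuses a definition that refers back to itself, in line with the rule that no definition may call itself directly or indirectly.

```python
import re

# Hypothetical in-memory dictionary, as if already read from a PDFILE.
DEFS = {
    "fruit": "apple|orange|banana",
    "bread": "type I will have it on type bread please",
}

# Assumed reference shapes: @@name(arg) for macros, @name for plain use.
REF_RE = re.compile(r"@@(\w+)\(([^()]*)\)|@(\w+)")


def expand(pattern, defs, active=()):
    """Replace @name and @@name(arg) references with their DPAT text."""

    def substitute(match):
        macro, arg, plain = match.groups()
        name = macro or plain
        if name not in defs:
            raise KeyError("no definition found for @" + name)
        if name in active:
            # A definition reached again while it is still being expanded.
            raise ValueError("definition @" + name + " refers to itself")
        body = defs[name]
        if macro:
            # The first non-space item is the placeholder string; every
            # occurrence of it in the remainder is replaced by the argument.
            parts = body.split(None, 1)
            placeholder = parts[0] if parts else ""
            body = parts[1] if len(parts) > 1 else ""
            if placeholder:
                body = body.replace(placeholder, arg)
        # Definitions may lead to other definitions, so expand the result.
        return expand(body, defs, active + (name,))

    return REF_RE.sub(substitute, pattern)


print(expand("@@bread(Rye)", DEFS))  # I will have it on Rye bread please
print(expand("my @fruit", DEFS))     # my apple|orange|banana
```

Tracking the names currently being expanded in `active` is one simple way to detect the endless loop that a self-referencing definition would otherwise cause.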
Simple plain text Macro substitution in PDL can
1) give further meaning to your patterns by allowing You to use meaningful names that also work as variables,
2) provide shielding between complex pattern specifications and your simple insertions of relevant pattern data, and
3) save You much typing, or recalculation, when making or reusing Magic patterns, especially those that are similar to ones used before.

Definition-File Relationship

To see the PDFILE names and lines where Definitions are located and what they are composed of, use option "-jD" along with option "-s" for a summary Full Pattern or, if in interactive edit, use the "P" command to show the search/replace patterns.

PDL processing gives information when there are errors. It will give the name of the file that contains a faulty definition, and a stack of information about any PDFILE's and definitions that were also being processed at the time of the error.

Repeated definitions are reported with option "-jW". So if You are unable to make a pattern with definitions work correctly, try this option to see if a definition also appears in some other part of the intended file, or in other files that are being used for definitions.

Summary

PDL, or Pattern Definition Language, used with PDGREPPE is an attempt to make Magic Expressions (ME's) easier to use and reusable. (Especially since having to "cook up" ME's from scratch can be difficult.) It becomes easier when there is a PDL dictionary or "tool box" to help. Let your Magic aspire to reusability! Use PDL!

Sometimes referred to as Pagan Definition Language... Pagans are the ones who have it right!
© Intelligence Services 1987 - 2008
GPO Box 9, ADELAIDE SA 5001, AUSTRALIA
EMAIL : intlsvs@gmail.com