RegExp Filters

PAW supports two kinds of regular expression filters: Replace-Filters and Size-Filters.
Replace-Filters inspect the HTML page returned from the web server and perform substitutions according to the rules of the active Replace-Filters.
Size-Filters can additionally include height and width definitions, which are read from a specified file, for example to filter images.
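
As a rough illustration of what a Replace-Filter does, the sketch below applies a list of substitution rules to a page in order. It uses Python's re module as a stand-in; the rule list, the function name and the two sample rules are hypothetical and do not reflect how PAW stores or evaluates its filters.

```python
import re

# Hypothetical rule list: (pattern, replacement) pairs. How PAW actually
# stores its filters is not described here, so this shape is an assumption.
RULES = [
    (r"<blink>(.*?)</blink>", r"\1"),    # strip blink tags, keep the text
    (r"(?i)<script\b.*?</script>", ""),  # drop inline scripts
]

def apply_replace_filters(html, rules=RULES):
    """Apply each rule to the page in order and return the filtered HTML."""
    for pattern, replacement in rules:
        # DOTALL lets '.' span line breaks inside a tag body.
        html = re.sub(pattern, replacement, html, flags=re.DOTALL)
    return html

page = "<p><blink>Hi</blink></p><SCRIPT>alert(1)</SCRIPT>"
filtered = apply_replace_filters(page)
```

Rules run in sequence, so a later rule sees the output of the earlier ones.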


Information for developers

Special instructions known by Replace- and Size-Filters are:

@START
Insert at start of HTML file.
@END
Append to end of HTML file.

In addition Size-Filters can include the following instructions:

@HEIGHT
Height read from file.
@WIDTH
Width read from file.
The width and height values used in Size-Filters are read from a file.
Each line of that file contains one width and height definition of the form:

WIDTHxHEIGHT
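
The file format above can be read with a few lines of code. The following Python sketch parses such a file and then substitutes each pair into a pattern template. The function names, the sample sizes and the template are hypothetical, and the assumption that @WIDTH and @HEIGHT are replaced textually is a guess at the behaviour, not a description of PAW's implementation.

```python
import re

def parse_size_definitions(text):
    """Parse lines of the form WIDTHxHEIGHT into (width, height) tuples."""
    sizes = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        width, height = line.lower().split("x", 1)
        sizes.append((int(width), int(height)))
    return sizes

def expand_size_pattern(template, sizes):
    """Assumed behaviour: substitute each pair for @WIDTH and @HEIGHT."""
    return [
        template.replace("@WIDTH", str(w)).replace("@HEIGHT", str(h))
        for w, h in sizes
    ]

sample = "468x60\n120x600\n"  # illustrative sizes only
sizes = parse_size_definitions(sample)
template = r'<img[^>]*width="@WIDTH"[^>]*height="@HEIGHT"[^>]*>'
patterns = expand_size_pattern(template, sizes)
banner = '<img src="ad.gif" width="468" height="60">'
is_banner = any(re.search(p, banner) for p in patterns)
```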

Perl 5 Regular Expressions

PAW Regular Expression Filters support basic Perl 5 regular expressions (including back references).
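
A back reference reuses text captured by an earlier group. The short sketch below shows the idea with Python's re module, used here only as a stand-in for Perl 5 syntax. Perl writes $1 in the replacement part where Python writes \1; which form PAW expects in a replacement is not stated here.

```python
import re

# \1 inside the pattern refers back to whatever group 1 matched,
# so this finds a word that is immediately repeated.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test")

# \1 and \2 in the replacement re-insert the captured text.
swapped = re.sub(r"(\w+)=(\w+)", r"\2=\1", "width=468")
```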

Regular Expression Syntax
Metacharacter Meaning
\ Escapes the character(s) immediately following it
. Matches any single character except a newline (unless /s is used)
^ Matches at the beginning of the string (or line, if /m used)
$ Matches at the end of the string (or line, if /m used)
* Matches the preceding element 0 or more times
+ Matches the preceding element 1 or more times
? Matches the preceding element 0 or 1 times
{...} Specifies a range of occurrences for the element preceding it
[...] Matches any one of the class of characters contained within the brackets
(...) Groups regular expressions
| Matches either the expression preceding or following it
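
A few of the metacharacters above in action. This is a minimal sketch using Python's re module as a stand-in for Perl 5; the sample strings are invented, and re.DOTALL plays the role of Perl's /s modifier.

```python
import re

text = "a.gif a1gif <b>x</b> <i>y</i>"

# An escaped dot matches only a literal '.'; a bare dot matches any character.
literal_dot = re.findall(r"a\.gif", text)
any_char = re.findall(r"a.gif", text)

# Grouping plus alternation: capture either tag name.
either_tag = re.findall(r"<(b|i)>", text)

# '.' does not match a newline unless the /s equivalent is set.
no_newline = re.search(r"x.y", "x\ny")
with_s_flag = re.search(r"x.y", "x\ny", flags=re.DOTALL)
```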


Escaped Sequence
Code Matches
\a Alarm (beep)
\n Newline
\r Carriage return
\t Tab
\f Formfeed
\e Escape
\007 Any octal ASCII value
\x7f Any hexadecimal ASCII value
\cx Control-x

Character Classes

Code Matches
\d A digit, same as [0-9]
\D A nondigit, same as [^0-9]
\w A word character (alphanumeric), same as [a-zA-Z_0-9]
\W A nonword character, [^a-zA-Z_0-9]
\s A whitespace character, same as [ \t\n\r\f]
\S A non-whitespace character, same as [^ \t\n\r\f]
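
The character classes can be tried out as follows. The sketch uses Python's re module as a stand-in; the re.ASCII flag restricts \d and \w to the ASCII ranges listed in the table, and the sample string is invented.

```python
import re

tag = 'width="468"  height="60"'

digits = re.findall(r"\d+", tag, flags=re.ASCII)       # runs of digits
words = re.findall(r"\w+", tag, flags=re.ASCII)        # runs of word characters
non_space = re.findall(r"\S+", tag, flags=re.ASCII)    # runs of non-whitespace
collapsed = re.sub(r"\s+", " ", tag, flags=re.ASCII)   # squeeze whitespace runs
```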

Anchors
Assertion Meaning
^ Matches at the beginning of the string (or line, if /m used)
$ Matches at the end of the string (or line, if /m used)
\b Matches at word boundary (between \w and \W)
\B Matches except at word boundary
\A Matches at the beginning of the string
\Z Matches at the end of the string or before a newline
\z Matches only at the end of the string
\G Matches where previous m//g left off
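
The sketch below illustrates ^, \b, \B and \A using Python's re module as a stand-in, with re.MULTILINE playing the role of Perl's /m. Python treats \Z differently from Perl and has no \G, so those anchors are left out; the sample page is invented.

```python
import re

page = "<html>\n<body>\ncat concat\n</body>\n"

# Without /m, ^ matches only at the very start of the string.
first_only = re.findall(r"^<\w+>", page)

# With /m (re.MULTILINE), ^ also matches at the start of each line.
every_line = re.findall(r"^<\w+>", page, flags=re.MULTILINE)

# \b requires a word boundary, so 'cat' inside 'concat' is skipped.
whole_word = [m.start() for m in re.finditer(r"\bcat\b", page)]

# \B requires the absence of a boundary, so only the 'cat' in 'concat' matches.
inside_word = [m.start() for m in re.finditer(r"\Bcat\b", page)]

# \A matches only at the start of the string, regardless of /m.
starts_with_html = re.search(r"\A<html>", page, flags=re.MULTILINE) is not None
```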


Quantifiers
Maximal Minimal Allowed Range
{n,m} {n,m}? Must occur at least n times but no more than m times
{n,} {n,}? Must occur at least n times
{n} {n}? Must match exactly n times
* *? 0 or more times (same as {0,})
+ +? 1 or more times (same as {1,})
? ?? 0 or 1 time (same as {0,1})
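
The difference between maximal and minimal quantifiers matters most when filtering HTML, where a maximal match can swallow everything between the first opening tag and the last closing tag. A minimal sketch, again using Python's re module as a stand-in for Perl 5 syntax, with invented sample strings:

```python
import re

html = "<b>one</b> and <b>two</b>"

# Maximal: .* grabs as much as possible, spanning both tags.
maximal = re.findall(r"<b>.*</b>", html)

# Minimal: .*? stops at the first closing tag it can reach.
minimal = re.findall(r"<b>.*?</b>", html)

numbers = "1 22 333 4444"
ranged_maximal = re.findall(r"\d{2,3}", numbers)   # prefers 3 digits
ranged_minimal = re.findall(r"\d{2,3}?", numbers)  # settles for 2 digits
```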