Revert "[WIN32] removed the pcre sources from our repo"

diff --git a/lib/win32/pcre/doc/pcre.txt b/lib/win32/pcre/doc/pcre.txt

new file mode 100644

index 0000000..2ccc7bb
--- /dev/null
+++ b/lib/win32/pcre/doc/pcre.txt
@@ -0,0 +1,7074 @@
+-----------------------------------------------------------------------------
+This file contains a concatenation of the PCRE man pages, converted to plain
+text format for ease of searching with a text editor, or for use on systems
+that do not have a man page processor. The small individual files that give
+synopses of each function in the library have not been included. Neither has
+the pcredemo program. There are separate text files for the pcregrep and
+pcretest commands.
+-----------------------------------------------------------------------------
+
+
+PCRE(3)                                                                PCRE(3)
+
+
+NAME
+       PCRE - Perl-compatible regular expressions
+
+
+INTRODUCTION
+
+       The  PCRE  library is a set of functions that implement regular expres-
+       sion pattern matching using the same syntax and semantics as Perl, with
+       just  a few differences. Some features that appeared in Python and PCRE
+       before they appeared in Perl are also available using the  Python  syn-
+       tax,  there  is  some  support for one or two .NET and Oniguruma syntax
+       items, and there is an option for requesting some  minor  changes  that
+       give better JavaScript compatibility.
+
+       The  current implementation of PCRE corresponds approximately with Perl
+       5.10, including support for UTF-8 encoded strings and  Unicode  general
+       category  properties.  However,  UTF-8  and  Unicode  support has to be
+       explicitly enabled; it is not the default. The  Unicode  tables  corre-
+       spond to Unicode release 5.1.
+
+       In  addition to the Perl-compatible matching function, PCRE contains an
+       alternative function that matches the same compiled patterns in a  dif-
+       ferent way. In certain circumstances, the alternative function has some
+       advantages.  For a discussion of the two matching algorithms,  see  the
+       pcrematching page.
+
+       PCRE  is  written  in C and released as a C library. A number of people
+       have written wrappers and interfaces of various kinds.  In  particular,
+       Google  Inc.   have  provided  a comprehensive C++ wrapper. This is now
+       included as part of the PCRE distribution. The pcrecpp page has details
+       of  this  interface.  Other  people's contributions can be found in the
+       Contrib directory at the primary FTP site, which is:
+
+       ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre
+
+       Details of exactly which Perl regular expression features are  and  are
+       not supported by PCRE are given in separate documents. See the pcrepat-
+       tern and pcrecompat pages. There is a syntax summary in the  pcresyntax
+       page.
+
+       Some  features  of  PCRE can be included, excluded, or changed when the
+       library is built. The pcre_config() function makes it  possible  for  a
+       client  to  discover  which  features are available. The features them-
+       selves are described in the pcrebuild page. Documentation about  build-
+       ing  PCRE  for various operating systems can be found in the README and
+       NON-UNIX-USE files in the source distribution.
+
+       The library contains a number of undocumented  internal  functions  and
+       data  tables  that  are  used by more than one of the exported external
+       functions, but which are not intended  for  use  by  external  callers.
+       Their  names  all begin with "_pcre_", which hopefully will not provoke
+       any name clashes. In some environments, it is possible to control which
+       external  symbols  are  exported when a shared library is built, and in
+       these cases the undocumented symbols are not exported.
+
+
+USER DOCUMENTATION
+
+       The user documentation for PCRE comprises a number  of  different  sec-
+       tions.  In the "man" format, each of these is a separate "man page". In
+       the HTML format, each is a separate page, linked from the  index  page.
+       In  the  plain  text format, all the sections, except the pcredemo sec-
+       tion, are concatenated, for ease of searching. The sections are as fol-
+       lows:
+
+         pcre              this document
+         pcre-config       show PCRE installation configuration information
+         pcreapi           details of PCRE's native C API
+         pcrebuild         options for building PCRE
+         pcrecallout       details of the callout feature
+         pcrecompat        discussion of Perl compatibility
+         pcrecpp           details of the C++ wrapper
+         pcredemo          a demonstration C program that uses PCRE
+         pcregrep          description of the pcregrep command
+         pcrematching      discussion of the two matching algorithms
+         pcrepartial       details of the partial matching facility
+         pcrepattern       syntax and semantics of supported
+                             regular expressions
+         pcreperform       discussion of performance issues
+         pcreposix         the POSIX-compatible C API
+         pcreprecompile    details of saving and re-using precompiled patterns
+         pcresample        discussion of the pcredemo program
+         pcrestack         discussion of stack usage
+         pcresyntax        quick syntax reference
+         pcretest          description of the pcretest testing command
+
+       In  addition,  in the "man" and HTML formats, there is a short page for
+       each C library function, listing its arguments and results.
+
+
+LIMITATIONS
+
+       There are some size limitations in PCRE but it is hoped that they  will
+       never in practice be relevant.
+
+       The  maximum  length of a compiled pattern is 65539 (sic) bytes if PCRE
+       is compiled with the default internal linkage size of 2. If you want to
+       process  regular  expressions  that are truly enormous, you can compile
+       PCRE with an internal linkage size of 3 or 4 (see the  README  file  in
+       the  source  distribution and the pcrebuild documentation for details).
+       In these cases the limit is substantially larger.  However,  the  speed
+       of execution is slower.
+
+       All values in repeating quantifiers must be less than 65536.
+
+       There is no limit to the number of parenthesized subpatterns, but there
+       can be no more than 65535 capturing subpatterns.
+
+       The maximum length of name for a named subpattern is 32 characters, and
+       the maximum number of named subpatterns is 10000.
+
+       The  maximum  length of a subject string is the largest positive number
+       that an integer variable can hold. However, when using the  traditional
+       matching function, PCRE uses recursion to handle subpatterns and indef-
+       inite repetition.  This means that the available stack space may  limit
+       the size of a subject string that can be processed by certain patterns.
+       For a discussion of stack issues, see the pcrestack documentation.
+
+
+UTF-8 AND UNICODE PROPERTY SUPPORT
+
+       From release 3.3, PCRE has  had  some  support  for  character  strings
+       encoded  in the UTF-8 format. For release 4.0 this was greatly extended
+       to cover most common requirements, and in release 5.0  additional  sup-
+       port for Unicode general category properties was added.
+
+       In order to process UTF-8 strings, you must build PCRE to include UTF-8
+       support in the code, and, in addition,  you  must  call  pcre_compile()
+       with  the  PCRE_UTF8  option  flag,  or the pattern must start with the
+       sequence (*UTF8). When either of these is the case,  both  the  pattern
+       and  any  subject  strings  that  are matched against it are treated as
+       UTF-8 strings instead of strings of 1-byte characters.
+
+       If you compile PCRE with UTF-8 support, but do not use it at run  time,
+       the  library will be a bit bigger, but the additional run time overhead
+       is limited to testing the PCRE_UTF8 flag occasionally, so should not be
+       very big.
+
+       If PCRE is built with Unicode character property support (which implies
+       UTF-8 support), the escape sequences \p{..}, \P{..}, and  \X  are  sup-
+       ported.  The available properties that can be tested are limited to the
+       general category properties such as Lu for an upper case letter  or  Nd
+       for  a  decimal number, the Unicode script names such as Arabic or Han,
+       and the derived properties Any and L&. A full  list  is  given  in  the
+       pcrepattern documentation. Only the short names for properties are sup-
+       ported. For example, \p{L} matches a letter. Its Perl synonym,
+       \p{Letter}, is not supported. Furthermore, in Perl, many properties may
+       optionally be prefixed by "Is", for compatibility with Perl  5.6.  PCRE
+       does not support this.
+
+   Validity of UTF-8 strings
+
+       When  you  set  the  PCRE_UTF8 flag, the strings passed as patterns and
+       subjects are (by default) checked for validity on entry to the relevant
+       functions. From release 7.3 of PCRE, the check is according to the rules
+       of RFC 3629, which are themselves derived from the  Unicode  specifica-
+       tion.  Earlier  releases  of PCRE followed the rules of RFC 2279, which
+       allows the full range of 31-bit values (0 to 0x7FFFFFFF).  The  current
+       check allows only values in the range U+0 to U+10FFFF, excluding U+D800
+       to U+DFFF.
+
+       The excluded code points are the "Low Surrogate Area"  of  Unicode,  of
+       which  the Unicode Standard says this: "The Low Surrogate Area does not
+       contain any  character  assignments,  consequently  no  character  code
+       charts or namelists are provided for this area. Surrogates are reserved
+       for use with UTF-16 and then must be used in pairs."  The  code  points
+       that  are  encoded  by  UTF-16  pairs are available as independent code
+       points in the UTF-8 encoding. (In  other  words,  the  whole  surrogate
+       thing is a fudge for UTF-16 which unfortunately messes up UTF-8.)
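The RFC 3629 rule described above can be sketched in C as a standalone check. It accepts only code points U+0 to U+10FFFF and rejects the surrogates U+D800 to U+DFFF. It also rejects overlong encodings, which RFC 3629 forbids even though the text above does not mention them. The function name utf8_valid_rfc3629 is hypothetical. This is an illustration of the rules, not PCRE's internal validation routine.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if the n bytes at s are valid UTF-8
   under RFC 3629. A sketch only, not PCRE's own check. */
bool utf8_valid_rfc3629(const unsigned char *s, size_t n)
{
    /* Smallest code point that each sequence length may encode. */
    static const unsigned long min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t i = 0;

    while (i < n) {
        unsigned char c = s[i];
        unsigned long cp;
        size_t len;

        if (c < 0x80) { i++; continue; }                  /* ASCII byte */
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;   /* stray continuation byte, or an RFC 2279
                                5- or 6-byte lead byte */

        if (n - i < len) return false;                    /* truncated */
        for (size_t k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return false;  /* not 10xxxxxx */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        if (cp < min_cp[len]) return false;               /* overlong form */
        if (cp > 0x10FFFF) return false;                  /* beyond Unicode */
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   /* surrogate area */
        i += len;
    }
    return true;
}
```

With PCRE_NO_UTF8_CHECK set, a caller that wants the same guarantees would have to run a check like this itself.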
+
+       If  an  invalid  UTF-8  string  is  passed  to  PCRE,  an  error return
+       (PCRE_ERROR_BADUTF8) is given. In some situations, you may already know
+       that your strings are valid, and therefore want to skip these checks in
+       order to improve performance. If you set the PCRE_NO_UTF8_CHECK flag at
+       compile  time  or at run time, PCRE assumes that the pattern or subject
+       it is given (respectively) contains only valid  UTF-8  codes.  In  this
+       case, it does not diagnose an invalid UTF-8 string.
+
+       If  you  pass  an  invalid UTF-8 string when PCRE_NO_UTF8_CHECK is set,
+       what happens depends on why the string is invalid. If the  string  con-
+       forms to the "old" definition of UTF-8 (RFC 2279), it is processed as a
+       string of characters in the range 0  to  0x7FFFFFFF.  In  other  words,
+       apart from the initial validity test, PCRE (when in UTF-8 mode) handles
+       strings according to the more liberal rules of RFC  2279.  However,  if
+       the  string does not even conform to RFC 2279, the result is undefined.
+       Your program may crash.
+
+       If you want to process strings  of  values  in  the  full  range  0  to
+       0x7FFFFFFF,  encoded in a UTF-8-like manner as per the old RFC, you can
+       set PCRE_NO_UTF8_CHECK to bypass the more restrictive test. However, in
+       this situation, you will have to apply your own validity check.
+
+   General comments about UTF-8 mode
+
+       1.  An  unbraced  hexadecimal  escape sequence (such as \xb3) matches a
+       two-byte UTF-8 character if the value is greater than 127.
+
+       2. Octal numbers up to \777 are recognized, and  match  two-byte  UTF-8
+       characters for values greater than \177.
+
+       3.  Repeat quantifiers apply to complete UTF-8 characters, not to indi-
+       vidual bytes, for example: \x{100}{3}.
+
+       4. The dot metacharacter matches one UTF-8 character instead of a  sin-
+       gle byte.
+
+       5.  The  escape sequence \C can be used to match a single byte in UTF-8
+       mode, but its use can lead to some strange effects.  This  facility  is
+       not available in the alternative matching function, pcre_dfa_exec().
+
+       6.  The  character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly
+       test characters of any code value, but the characters that PCRE  recog-
+       nizes  as  digits,  spaces,  or  word characters remain the same set as
+       before, all with values less than 256. This remains true even when PCRE
+       includes  Unicode  property support, because to do otherwise would slow
+       down PCRE in many common cases. If you really want to test for a  wider
+       sense  of,  say,  "digit",  you must use Unicode property tests such as
+       \p{Nd}. Note that this also applies to \b, because  it  is  defined  in
+       terms of \w and \W.
+
+       7.  Similarly,  characters that match the POSIX named character classes
+       are all low-valued characters.
+
+       8. However, the Perl 5.10 horizontal and vertical  whitespace  matching
+       escapes (\h, \H, \v, and \V) do match all the appropriate Unicode char-
+       acters.
+
+       9. Case-insensitive matching applies only to  characters  whose  values
+       are  less than 128, unless PCRE is built with Unicode property support.
+       Even when Unicode property support is available, PCRE  still  uses  its
+       own  character  tables when checking the case of low-valued characters,
+       so as not to degrade performance.  The Unicode property information  is
+       used only for characters with higher values. Even when Unicode property
+       support is available, PCRE supports case-insensitive matching only when
+       there  is  a  one-to-one  mapping between a letter's cases. There are a
+       small number of many-to-one mappings in Unicode;  these  are  not  sup-
+       ported by PCRE.
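Points 1, 3 and 4 above say that UTF-8 mode works in characters rather than bytes. The sketch below shows what that means at the byte level. It encodes a code point and counts characters in a byte string. Both function names are hypothetical and are not part of the PCRE API.

```c
#include <stddef.h>

/* Hypothetical helper: encode code point cp (assumed <= 0x10FFFF and not a
   surrogate) into buf and return the number of bytes written (1 to 4). */
size_t utf8_encode(unsigned long cp, unsigned char buf[4])
{
    if (cp < 0x80) { buf[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {                        /* e.g. \xb3 becomes C2 B3 */
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    buf[0] = (unsigned char)(0xF0 | (cp >> 18));
    buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Count characters, not bytes, in valid UTF-8: every byte that is not a
   continuation byte (10xxxxxx) starts a new character. This is the unit
   that "." and repeat quantifiers operate on in UTF-8 mode. */
size_t utf8_count_chars(const unsigned char *s, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if ((s[i] & 0xC0) != 0x80)
            count++;
    return count;
}
```

Under this sketch, the subject matched by \x{100}{3} is six bytes long but counts as three characters.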
+
+
+AUTHOR
+
+       Philip Hazel
+       University Computing Service
+       Cambridge CB2 3QH, England.
+
+       Putting  an actual email address here seems to have been a spam magnet,
+       so I've taken it away. If you want to email me, use  my  two  initials,
+       followed by the two digits 10, at the domain cam.ac.uk.
+
+
+REVISION
+
+       Last updated: 28 September 2009
+       Copyright (c) 1997-2009 University of Cambridge.
+------------------------------------------------------------------------------
+
+
+PCREBUILD(3)                                                      PCREBUILD(3)
+
+
+NAME
+       PCRE - Perl-compatible regular expressions
+
+
+PCRE BUILD-TIME OPTIONS
+
+       This  document  describes  the  optional  features  of PCRE that can be
+       selected when the library is compiled. It assumes use of the  configure
+       script,  where the optional features are selected or deselected by pro-
+       viding options to configure before running the make  command.  However,
+       the  same  options  can be selected in both Unix-like and non-Unix-like
+       environments using the GUI facility of cmake-gui if you are using CMake
+       instead of configure to build PCRE.
+
+       There  is  a  lot more information about building PCRE in non-Unix-like
+       environments in the file called NON-UNIX-USE, which is part of the PCRE
+       distribution.  You  should consult this file as well as the README file
+       if you are building in a non-Unix-like environment.
+
+       The complete list of options for configure (which includes the standard
+       ones  such  as  the  selection  of  the  installation directory) can be
+       obtained by running
+
+         ./configure --help
+
+       The following sections include  descriptions  of  options  whose  names
+       begin with --enable or --disable. These settings specify changes to the
+       defaults for the configure command. Because of the way  that  configure
+       works,  --enable  and --disable always come in pairs, so the complemen-
+       tary option always exists as well, but as it specifies the default,  it
+       is not described.
+
+
+C++ SUPPORT
+
+       By default, the configure script will search for a C++ compiler and C++
+       header files. If it finds them, it automatically builds the C++ wrapper
+       library for PCRE. You can disable this by adding
+
+         --disable-cpp
+
+       to the configure command.
+
+
+UTF-8 SUPPORT
+
+       To build PCRE with support for UTF-8 Unicode character strings, add
+
+         --enable-utf8
+
+       to  the  configure  command.  Of  itself, this does not make PCRE treat
+       strings as UTF-8. As well as compiling PCRE with this option, you  also
+       have to set the PCRE_UTF8 option when you call the pcre_compile()
+       or pcre_compile2() functions.
+
+       If you set --enable-utf8 when compiling in an EBCDIC environment,  PCRE
+       expects its input to be either ASCII or UTF-8 (depending on the runtime
+       option). It is not possible to support both EBCDIC and UTF-8  codes  in
+       the  same  version  of  the  library.  Consequently,  --enable-utf8 and
+       --enable-ebcdic are mutually exclusive.
+
+
+UNICODE CHARACTER PROPERTY SUPPORT
+
+       UTF-8 support allows PCRE to process character values greater than  255
+       in  the  strings that it handles. On its own, however, it does not pro-
+       vide any facilities for accessing the properties of such characters. If
+       you  want  to  be able to use the pattern escapes \P, \p, and \X, which
+       refer to Unicode character properties, you must add
+
+         --enable-unicode-properties
+
+       to the configure command. This implies UTF-8 support, even if you  have
+       not explicitly requested it.
+
+       Including  Unicode  property  support  adds around 30K of tables to the
+       PCRE library. Only the general category properties such as  Lu  and  Nd
+       are supported. Details are given in the pcrepattern documentation.
+
+
+CODE VALUE OF NEWLINE
+
+       By  default,  PCRE interprets the linefeed (LF) character as indicating
+       the end of a line. This is the normal newline  character  on  Unix-like
+       systems.  You  can compile PCRE to use carriage return (CR) instead, by
+       adding
+
+         --enable-newline-is-cr
+
+       to the  configure  command.  There  is  also  a  --enable-newline-is-lf
+       option, which explicitly specifies linefeed as the newline character.
+
+       Alternatively, you can specify that line endings are to be indicated by
+       the two character sequence CRLF. If you want this, add
+
+         --enable-newline-is-crlf
+
+       to the configure command. There is a fourth option, specified by
+
+         --enable-newline-is-anycrlf
+
+       which causes PCRE to recognize any of the three sequences  CR,  LF,  or
+       CRLF as indicating a line ending. Finally, a fifth option, specified by
+
+         --enable-newline-is-any
+
+       causes PCRE to recognize any Unicode newline sequence.
+
+       Whatever  line  ending convention is selected when PCRE is built can be
+       overridden when the library functions are called. At build time  it  is
+       conventional to use the standard for your operating system.
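The five newline conventions above differ only in which byte sequences end a line. The sketch below models them for single-byte (non-UTF-8) text. The enum and function names are hypothetical. For the "any" convention it covers only the Unicode newlines that fit in one byte (LF, VT, FF, CR, NEL) plus CRLF. It leaves out U+2028 and U+2029, which would need UTF-8 decoding.

```c
#include <stddef.h>

/* Hypothetical names for the five build-time conventions. */
enum newline_conv { NL_CR, NL_LF, NL_CRLF, NL_ANYCRLF, NL_ANY };

/* Return the length in bytes of the newline sequence starting at p, or 0
   if there is none. Sketch for single-byte text only. */
size_t newline_length(const unsigned char *p, const unsigned char *end,
                      enum newline_conv conv)
{
    if (p >= end) return 0;
    int crlf = (p[0] == '\r' && p + 1 < end && p[1] == '\n');

    switch (conv) {
    case NL_CR:   return p[0] == '\r';
    case NL_LF:   return p[0] == '\n';
    case NL_CRLF: return crlf ? 2 : 0;
    case NL_ANYCRLF:                      /* CR, LF, or CRLF */
        if (crlf) return 2;
        return p[0] == '\r' || p[0] == '\n';
    case NL_ANY:                          /* single-byte Unicode newlines */
        if (crlf) return 2;
        return p[0] == '\n' || p[0] == 0x0B || p[0] == 0x0C ||
               p[0] == '\r' || p[0] == 0x85;
    }
    return 0;
}
```

In this model CRLF is always tried before a lone CR. That way, under the CRLF, ANYCRLF and ANY conventions, the pair counts as one line ending rather than two.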
+
+
+WHAT \R MATCHES
+
+       By  default,  the  sequence \R in a pattern matches any Unicode newline
+       sequence, whatever has been selected as the line  ending  sequence.  If
+       you specify
+
+         --enable-bsr-anycrlf
+
+       the  default  is changed so that \R matches only CR, LF, or CRLF. What-
+       ever is selected when PCRE is built can be overridden when the  library
+       functions are called.
+
+
+BUILDING SHARED AND STATIC LIBRARIES
+
+       The  PCRE building process uses libtool to build both shared and static
+       Unix libraries by default. You can suppress one of these by adding  one
+       of
+
+         --disable-shared
+         --disable-static
+
+       to the configure command, as required.
+
+
+POSIX MALLOC USAGE
+
+       When PCRE is called through the POSIX interface (see the pcreposix doc-
+       umentation), additional working storage is  required  for  holding  the
+       pointers  to capturing substrings, because PCRE requires three integers
+       per substring, whereas the POSIX interface provides only  two.  If  the
+       number of expected substrings is small, the wrapper function uses space
+       on the stack, because this is faster than using malloc() for each call.
+       The default threshold above which the stack is no longer used is 10; it
+       can be changed by adding a setting such as
+
+         --with-posix-malloc-threshold=20
+
+       to the configure command.
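The stack-versus-heap decision described above can be sketched as follows. This is a hypothetical illustration modeled on the description, not the code of PCRE's POSIX wrapper. It reserves three ints per substring and uses a caller-supplied stack buffer up to the threshold. Beyond the threshold it calls malloc().

```c
#include <stdlib.h>

/* Mirrors the default threshold of 10 described above. */
#define POSIX_MALLOC_THRESHOLD 10

/* Hypothetical helper: return scratch space for nmatch substrings at three
   ints each. Uses the caller's stack buffer when nmatch is within the
   threshold, otherwise malloc(). *on_heap tells ovector_release() which. */
int *ovector_acquire(size_t nmatch,
                     int stackbuf[POSIX_MALLOC_THRESHOLD * 3], int *on_heap)
{
    if (nmatch <= POSIX_MALLOC_THRESHOLD) {
        *on_heap = 0;
        return stackbuf;                  /* fast path: no allocation */
    }
    *on_heap = 1;
    return malloc(nmatch * 3 * sizeof(int));   /* may return NULL */
}

void ovector_release(int *ovec, int on_heap)
{
    if (on_heap)
        free(ovec);
}
```

Raising the threshold (as with --with-posix-malloc-threshold=20) would enlarge the fixed stack buffer in exchange for fewer malloc() calls.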
+
+
+HANDLING VERY LARGE PATTERNS
+
+       Within a compiled pattern, offset values are used  to  point  from  one
+       part  to another (for example, from an opening parenthesis to an alter-
+       nation metacharacter). By default, two-byte values are used  for  these
+       offsets,  leading  to  a  maximum size for a compiled pattern of around
+       64K. This is sufficient to handle all but the most  gigantic  patterns.
+       Nevertheless, some people do want to process truly enormous patterns,
+       so it is possible to compile PCRE to use three-byte or  four-byte  off-
+       sets by adding a setting such as
+
+         --with-link-size=3
+
+       to  the  configure  command.  The value given must be 2, 3, or 4. Using
+       longer offsets slows down the operation of PCRE because it has to  load
+       additional bytes when handling them.
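A minimal sketch of why the link size matters. An offset stored in N bytes can reach at most 2^(8N) - 1, and reading it back takes one byte load per byte. The most-significant-byte-first order and the function names below are assumptions for illustration only. They are not taken from PCRE's source.

```c
#include <stddef.h>

/* Hypothetical: store value in link_size bytes (2, 3, or 4), high byte first. */
void put_link(unsigned char *p, unsigned long value, int link_size)
{
    for (int i = link_size - 1; i >= 0; i--) {
        p[i] = (unsigned char)(value & 0xFF);
        value >>= 8;
    }
}

/* Hypothetical: read it back; each extra byte of link size is one more load. */
unsigned long get_link(const unsigned char *p, int link_size)
{
    unsigned long value = 0;
    for (int i = 0; i < link_size; i++)
        value = (value << 8) | p[i];
    return value;
}

/* Largest offset representable in link_size bytes (unsigned long long
   so that the shift by 32 for link_size 4 is well defined). */
unsigned long long max_link(int link_size)
{
    return (1ULL << (8 * link_size)) - 1;
}
```

For the default link size of 2 this gives 65535, which is consistent with the "around 64K" limit mentioned above.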
+
+
+AVOIDING EXCESSIVE STACK USAGE
+
+       When matching with the pcre_exec() function, PCRE implements backtrack-
+       ing by making recursive calls to an internal function  called  match().
+       In  environments  where  the size of the stack is limited, this can se-
+       verely limit PCRE's operation. (The Unix environment does  not  usually
+       suffer from this problem, but it may sometimes be necessary to increase
+       the maximum stack size.  There is a discussion in the  pcrestack  docu-
+       mentation.)  An alternative approach to recursion that uses memory from
+       the heap to remember data, instead of using recursive  function  calls,
+       has  been  implemented to work round the problem of limited stack size.
+       If you want to build a version of PCRE that works this way, add
+
+         --disable-stack-for-recursion
+
+       to the configure command. With this configuration, PCRE  will  use  the
+       pcre_stack_malloc  and pcre_stack_free variables to call memory manage-
+       ment functions. By default these point to malloc() and free(), but  you
+       can replace the pointers so that your own functions are used instead.
+
+       Separate  functions  are  provided  rather  than  using pcre_malloc and
+       pcre_free because the  usage  is  very  predictable:  the  block  sizes
+       requested  are  always  the  same,  and  the blocks are always freed in
+       reverse order. A calling program might be able to  implement  optimized
+       functions  that  perform  better  than  malloc()  and free(). PCRE runs
+       noticeably more slowly when built in this way. This option affects only
+       the pcre_exec() function; it is not relevant for pcre_dfa_exec().
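The predictable usage pattern described above (fixed-size blocks, freed in reverse order) is what makes a simple bump allocator viable. The sketch below is a hypothetical LIFO arena. It assumes a 1 MiB static buffer and is not thread-safe. The names lifo_malloc and lifo_free are invented. Their signatures match the function-pointer types of pcre_stack_malloc and pcre_stack_free.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed arena size: 1 MiB. Size it for the deepest match you expect. */
#define LIFO_ARENA_BYTES (1u << 20)
#define LIFO_ALIGN sizeof(max_align_t)

static max_align_t lifo_store[LIFO_ARENA_BYTES / sizeof(max_align_t)];
static size_t lifo_top;   /* bytes currently handed out */

/* Allocate by bumping an offset. Each block begins with a header slot that
   records the block's total size so that lifo_free() can pop it. */
void *lifo_malloc(size_t size)
{
    unsigned char *base = (unsigned char *)lifo_store;
    if (size > sizeof lifo_store) return NULL;
    size_t total = LIFO_ALIGN + (size + LIFO_ALIGN - 1) / LIFO_ALIGN * LIFO_ALIGN;
    if (total > sizeof lifo_store - lifo_top) return NULL;
    unsigned char *block = base + lifo_top;
    memcpy(block, &total, sizeof total);   /* header: block size */
    lifo_top += total;
    return block + LIFO_ALIGN;
}

/* Free by popping. Valid only for the most recently allocated live block,
   which is the reverse-order pattern the text describes. */
void lifo_free(void *p)
{
    unsigned char *base = (unsigned char *)lifo_store;
    unsigned char *block = (unsigned char *)p - LIFO_ALIGN;
    size_t total;
    memcpy(&total, block, sizeof total);
    assert(block + total == base + lifo_top);   /* must be the top block */
    lifo_top -= total;
}
```

In a library configured with --disable-stack-for-recursion, such functions would be installed by assigning them to pcre_stack_malloc and pcre_stack_free before matching. That step is not shown here because it needs the PCRE headers.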
+
+
+LIMITING PCRE RESOURCE USAGE
+
+       Internally,  PCRE has a function called match(), which it calls repeat-
+       edly  (sometimes  recursively)  when  matching  a  pattern   with   the
+       pcre_exec()  function.  By controlling the maximum number of times this
+       function may be called during a single matching operation, a limit  can
+       be  placed  on  the resources used by a single call to pcre_exec(). The
+       limit can be changed at run time, as described in the pcreapi  documen-
+       tation.  The default is 10 million, but this can be changed by adding a
+       setting such as
+
+         --with-match-limit=500000
+
+       to  the  configure  command.  This  setting  has  no  effect   on   the
+       pcre_dfa_exec() matching function.
+
+       In  some  environments  it is desirable to limit the depth of recursive
+       calls of match() more strictly than the total number of calls, in order
+       to restrict the maximum amount of stack (or heap, if
+       --disable-stack-for-recursion is specified) that is used. A second
+       limit controls this;
+       it  defaults  to  the  value  that is set for --with-match-limit, which
+       imposes no additional constraints. However, you can set a  lower  limit
+       by adding, for example,
+
+         --with-match-limit-recursion=10000
+
+       to  the  configure  command.  This  value can also be overridden at run
+       time.
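The effect of a match limit can be sketched with a toy backtracking matcher in the style of the well-known Kernighan and Pike example. Every entry into its match_here() function is counted, and the match is abandoned once the count passes a limit. This is not PCRE code, and the result codes are hypothetical. It supports only literals, ".", "x*" and a trailing "$", and it always matches at the start of the subject.

```c
#include <stddef.h>

/* Result codes for this sketch (hypothetical, not PCRE's error numbers). */
enum { MINI_LIMIT = -1, MINI_NOMATCH = 0, MINI_MATCH = 1 };

struct mini_ctx {
    unsigned long calls;   /* entries into match_here() so far */
    unsigned long limit;   /* abandon the match once calls exceeds this */
};

static int match_here(struct mini_ctx *ctx, const char *re, const char *s);

/* c* : greedy, so consume as many c as possible, then back off one at a
   time; every back-off costs another call of match_here(). */
static int match_star(struct mini_ctx *ctx, int c, const char *re, const char *s)
{
    const char *t = s;
    while (*t != '\0' && (*t == c || c == '.'))
        t++;
    for (;;) {
        int rc = match_here(ctx, re, t);
        if (rc != MINI_NOMATCH) return rc;   /* a match, or the limit hit */
        if (t == s) return MINI_NOMATCH;
        t--;
    }
}

static int match_here(struct mini_ctx *ctx, const char *re, const char *s)
{
    if (++ctx->calls > ctx->limit) return MINI_LIMIT;
    if (re[0] == '\0') return MINI_MATCH;
    if (re[1] == '*') return match_star(ctx, re[0], re + 2, s);
    if (re[0] == '$' && re[1] == '\0')
        return *s == '\0' ? MINI_MATCH : MINI_NOMATCH;
    if (*s != '\0' && (re[0] == '.' || re[0] == *s))
        return match_here(ctx, re + 1, s + 1);
    return MINI_NOMATCH;
}

/* Anchored match of re against s, giving up after `limit` calls. */
int mini_match(const char *re, const char *s, unsigned long limit)
{
    struct mini_ctx ctx = { 0, limit };
    return match_here(&ctx, re, s);
}
```

A pattern such as a*a*a*a*a*b applied to a long run of "a" characters shows why the limit exists. The number of calls grows rapidly even though the answer is simply "no match".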
+
+
+CREATING CHARACTER TABLES AT BUILD TIME
+
+       PCRE uses fixed tables for processing characters whose code values  are
+       less  than 256. By default, PCRE is built with a set of tables that are
+       distributed in the file pcre_chartables.c.dist. These  tables  are  for
+       ASCII codes only. If you add
+
+         --enable-rebuild-chartables
+
+       to  the  configure  command, the distributed tables are no longer used.
+       Instead, a program called dftables is compiled and  run.  This  outputs
+       the source for a new set of tables, created in the default locale of your
+       C runtime system. (This method of replacing the tables does not work if
+       you  are cross compiling, because dftables is run on the local host. If
+       you need to create alternative tables when cross  compiling,  you  will
+       have to do so "by hand".)
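The idea behind generating tables at build time can be sketched as follows. Ask the C library about each of the 256 byte values in the current locale, then print the results as C source to compile in. The function names are hypothetical. Only a lower-case table is built here, whereas the real dftables output contains further tables as well.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical: fill a lower-case table from the C library's notion of
   case in the current locale. */
void build_lcc_table(unsigned char lcc[256])
{
    for (int i = 0; i < 256; i++)
        lcc[i] = (unsigned char)tolower(i);
}

/* Hypothetical: emit the table as a C array definition, 16 values a line. */
void print_table(FILE *out, const char *name, const unsigned char t[256])
{
    fprintf(out, "static const unsigned char %s[256] = {\n", name);
    for (int i = 0; i < 256; i++)
        fprintf(out, "%3u%s", (unsigned)t[i],
                i == 255 ? "\n" : (i % 16 == 15 ? ",\n" : ","));
    fprintf(out, "};\n");
}
```

Unless setlocale() is called first (for example setlocale(LC_CTYPE, "")), a C program runs in the "C" locale, so the generated table would contain ASCII case mappings only. This also shows why running such a generator on the build host gives the wrong tables when cross compiling.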
+
+
+USING EBCDIC CODE
+
+       PCRE  assumes  by  default that it will run in an environment where the
+       character code is ASCII (or Unicode, which is  a  superset  of  ASCII).
+       This  is  the  case for most computer operating systems. PCRE can, how-
+       ever, be compiled to run in an EBCDIC environment by adding
+
+         --enable-ebcdic
+
+       to the configure command. This setting implies
+       --enable-rebuild-chartables. You should only use it if you know that
+       you are in an EBCDIC
+       environment (for example,  an  IBM  mainframe  operating  system).  The
+       --enable-ebcdic option is incompatible with --enable-utf8.
+
+
+PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT
+
+       By default, pcregrep reads all files as plain text. You can build it so
+       that it recognizes files whose names end in .gz or .bz2, and reads them
+       with libz or libbz2, respectively, by adding one or both of
+
+         --enable-pcregrep-libz
+         --enable-pcregrep-libbz2
+
+       to the configure command. These options naturally require that the rel-
+       evant libraries are installed on your system. Configuration  will  fail
+       if they are not.
+
+
+PCRETEST OPTION FOR LIBREADLINE SUPPORT
+
+       If you add
+
+         --enable-pcretest-libreadline
+
+       to  the  configure  command,  pcretest  is  linked with the libreadline
+       library, and when its input is from a terminal, it reads it  using  the
+       readline() function. This provides line-editing and history facilities.
+       Note that libreadline is GPL-licensed, so if you distribute a binary of
+       pcretest linked in this way, there may be licensing issues.
+
+       Setting  this  option  causes  the -lreadline option to be added to the
+       pcretest build. In many operating environments with a system-installed
+       libreadline this is sufficient. However, in some environments (e.g.  if
+       an unmodified distribution version of readline is in use),  some  extra
+       configuration  may  be necessary. The INSTALL file for libreadline says
+       this:
+
+         "Readline uses the termcap functions, but does not link with the
+         termcap or curses library itself, allowing applications which link
+         with readline the to choose an appropriate library."
+
+       If your environment has not been set up so that an appropriate  library
+       is automatically included, you may need to add something like
+
+         LIBS="-lncurses"
+
+       immediately before the configure command.
+
+
+SEE ALSO
+
+       pcreapi(3), pcre_config(3).
+
+
+AUTHOR
+
+       Philip Hazel
+       University Computing Service
+       Cambridge CB2 3QH, England.
+
+
+REVISION
+
+       Last updated: 29 September 2009
+       Copyright (c) 1997-2009 University of Cambridge.
+------------------------------------------------------------------------------
+
+
+PCREMATCHING(3)                                                PCREMATCHING(3)
+
+
+NAME
+       PCRE - Perl-compatible regular expressions
+
+
+PCRE MATCHING ALGORITHMS
+
+       This document describes the two different algorithms that are available
+       in PCRE for matching a compiled regular expression against a given sub-
+       ject  string.  The  "standard"  algorithm  is  the  one provided by the
+       pcre_exec() function. This works in the same way as Perl's matching
+       function, and provides a Perl-compatible matching operation.
+
+       An  alternative  algorithm is provided by the pcre_dfa_exec() function;
+       this operates in a different way, and is not  Perl-compatible.  It  has
+       advantages  and disadvantages compared with the standard algorithm, and
+       these are described below.
+
+       When there is only one possible way in which a given subject string can
+       match  a pattern, the two algorithms give the same answer. A difference
+       arises, however, when there are multiple possibilities. For example, if
+       the pattern
+
+         ^<.*>
+
+       is matched against the string
+
+         <something> <something else> <something further>
+
+       there are three possible answers. The standard algorithm finds only one
+       of them, whereas the alternative algorithm finds all three.
+
+
+REGULAR EXPRESSIONS AS TREES
+
+       The set of strings that are matched by a regular expression can be rep-
+       resented  as  a  tree structure. An unlimited repetition in the pattern
+       makes the tree of infinite size, but it is still a tree.  Matching  the
+       pattern  to a given subject string (from a given starting point) can be
+       thought of as a search of the tree.  There are two  ways  to  search  a
+       tree:  depth-first  and  breadth-first, and these correspond to the two
+       matching algorithms provided by PCRE.
+
+
+THE STANDARD MATCHING ALGORITHM
+
+       In the terminology of Jeffrey Friedl's book "Mastering Regular  Expres-
+       sions",  the  standard  algorithm  is an "NFA algorithm". It conducts a
+       depth-first search of the pattern tree. That is, it  proceeds  along  a
+       single path through the tree, checking that the subject matches what is
+       required. When there is a mismatch, the algorithm  tries  any  alterna-
+       tives  at  the  current point, and if they all fail, it backs up to the
+       previous branch point in the  tree,  and  tries  the  next  alternative
+       branch  at  that  level.  This often involves backing up (moving to the
+       left) in the subject string as well.  The  order  in  which  repetition
+       branches  are  tried  is controlled by the greedy or ungreedy nature of
+       the quantifier.
+
+       If a leaf node is reached, a matching string has  been  found,  and  at
+       that  point the algorithm stops. Thus, if there is more than one possi-
+       ble match, this algorithm returns the first one that it finds.  Whether
+       this  is the shortest, the longest, or some intermediate length depends
+       on the way the greedy and ungreedy repetition quantifiers are specified
+       in the pattern.
+
+       Because  it  ends  up  with a single path through the tree, it is rela-
+       tively straightforward for this algorithm to keep  track  of  the  sub-
+       strings  that  are  matched  by portions of the pattern in parentheses.
+       This provides support for capturing parentheses and back references.
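+
+       The depth-first idea can be illustrated with a toy matcher. This is
+       not PCRE's code; it is a hypothetical sketch that understands only
+       literal characters, "." and a greedy postfix "*", with the pattern
+       anchored at a given subject offset. It stops at the first leaf it
+       reaches, and backs up in the subject when a branch fails.
+
```c
/* Toy depth-first matcher (not PCRE's implementation). Supports literal
   characters, '.', and a greedy postfix '*'. Returns the end offset of the
   first match found with the pattern anchored at s[pos], or -1. */
static int dfs_match(const char *re, const char *s, int pos)
{
  int n;
  if (re[0] == '\0')
    return pos;                     /* leaf reached: stop at this match */
  if (re[1] == '*') {
    /* Greedy: find the longest run the starred item can consume... */
    for (n = 0; s[pos + n] != '\0' &&
                (re[0] == '.' || re[0] == s[pos + n]); n++)
      ;
    /* ...then try the longest branch first, backing up one character
       (moving left in the subject) each time a branch fails. */
    for (; n >= 0; n--) {
      int end = dfs_match(re + 2, s, pos + n);
      if (end >= 0) return end;
    }
    return -1;
  }
  if (s[pos] != '\0' && (re[0] == '.' || re[0] == s[pos]))
    return dfs_match(re + 1, s, pos + 1);
  return -1;                        /* mismatch: caller tries next branch */
}
```
+
+       Called as dfs_match("<.*>", "<a> <b>", 0), it returns 7: the greedy
+       star tries the longest branch first, so the first leaf reached is the
+       longest match. An ungreedy quantifier would try the shortest branch
+       first and stop at offset 3.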
+
+
+THE ALTERNATIVE MATCHING ALGORITHM
+
+       This algorithm conducts a breadth-first search of  the  tree.  Starting
+       from  the  first  matching  point  in the subject, it scans the subject
+       string from left to right, once, character by character, and as it does
+       this,  it remembers all the paths through the tree that represent valid
+       matches. In Friedl's terminology, this is a kind  of  "DFA  algorithm",
+       though  it is not implemented as a traditional finite state machine (it
+       keeps multiple states active simultaneously).
+
+       Although the general principle of this matching algorithm  is  that  it
+       scans  the subject string only once, without backtracking, there is one
+       exception: when a lookaround assertion is encountered,  the  characters
+       following  or  preceding  the  current  point  have to be independently
+       inspected.
+
+       The scan continues until either the end of the subject is  reached,  or
+       there  are  no more unterminated paths. At this point, terminated paths
+       represent the different matching possibilities (if there are none,  the
+       match  has  failed).   Thus,  if there is more than one possible match,
+       this algorithm finds all of them, and in particular, it finds the long-
+       est.  There  is  an  option to stop the algorithm after the first match
+       (which is necessarily the shortest) is found.
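+
+       The breadth-first scan can be illustrated with a toy that is not
+       PCRE's implementation: a hypothetical matcher for literal characters,
+       "." and postfix "*", anchored at the start of the subject. It keeps a
+       set of live pattern positions, advances all of them together over
+       each subject character, and records every point at which a path
+       terminates.
+
```c
#include <string.h>

#define MAX_ITEMS 32

/* Toy breadth-first matcher (not PCRE's implementation). A state is the
   index of the next pattern item to match; all live states advance
   together, one subject character at a time, with no backtracking. Every
   offset at which a path terminates is stored in ends[] (shortest first);
   the return value is how many were stored. */
static int bfs_all_matches(const char *re, const char *s,
                           int *ends, int max_ends)
{
  char item[MAX_ITEMS];
  int star[MAX_ITEMS];
  int live[MAX_ITEMS + 1] = {0}, next[MAX_ITEMS + 1];
  int n = 0, count = 0, pos, k, i;

  /* Split the pattern into items, noting which are starred. */
  for (i = 0; re[i] != '\0' && n < MAX_ITEMS; i++) {
    item[n] = re[i];
    star[n] = (re[i + 1] == '*');
    if (star[n]) i++;
    n++;
  }

  live[0] = 1;
  for (pos = 0; ; pos++) {
    int any = 0;
    /* A starred item may match zero times, so it can be skipped. */
    for (k = 0; k < n; k++)
      if (live[k] && star[k]) live[k + 1] = 1;
    if (live[n] && count < max_ends)
      ends[count++] = pos;          /* a path has terminated here */
    if (s[pos] == '\0') break;      /* end of subject reached */
    memset(next, 0, sizeof next);
    for (k = 0; k < n; k++) {
      if (live[k] && (item[k] == '.' || item[k] == s[pos])) {
        next[star[k] ? k : k + 1] = 1;
        any = 1;
      }
    }
    if (!any) break;                /* no unterminated paths remain */
    memcpy(live, next, sizeof live);
  }
  return count;
}
```
+
+       For "<.*>" against "<a> <b>" it records the end offsets 3 and 7.
+       Unlike pcre_dfa_exec(), which returns the longest match first, this
+       sketch lists the ends in the order it finds them, shortest first.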
+
+       Note that all the matches that are found start at the same point in the
+       subject. If the pattern
+
+         cat(er(pillar)?)?
+
+       is matched against the string "the caterpillar catchment", the result
+       will be the three strings "cat", "cater", and "caterpillar" that start
+       at the fifth character of the subject. The algorithm does not
+       automatically move on to find matches that start at later positions.
+
+       There are a number of features of PCRE regular expressions that are not
+       supported by the alternative matching algorithm. They are as follows:
+
+       1.  Because  the  algorithm  finds  all possible matches, the greedy or
+       ungreedy nature of repetition quantifiers is not relevant.  Greedy  and
+       ungreedy quantifiers are treated in exactly the same way. However, pos-
+       sessive quantifiers can make a difference when what follows could  also
+       match what is quantified, for example in a pattern like this:
+
+         ^a++\w!
+
+       This  pattern matches "aaab!" but not "aaa!", which would be matched by
+       a non-possessive quantifier. Similarly, if an atomic group is  present,
+       it  is matched as if it were a standalone pattern at the current point,
+       and the longest match is then "locked in" for the rest of  the  overall
+       pattern.
+
+       2. When dealing with multiple paths through the tree simultaneously, it
+       is not straightforward to keep track of  captured  substrings  for  the
+       different  matching  possibilities,  and  PCRE's implementation of this
+       algorithm does not attempt to do this. This means that no captured sub-
+       strings are available.
+
+       3.  Because no substrings are captured, back references within the pat-
+       tern are not supported, and cause errors if encountered.
+
+       4. For the same reason, conditional expressions that use  a  backrefer-
+       ence  as  the  condition or test for a specific group recursion are not
+       supported.
+
+       5. Because many paths through the tree may be active, the \K escape
+       sequence, which resets the start of the match when encountered (and
+       may be encountered on some paths but not on others), is not sup-
+       ported. It causes an error if encountered.
+
+       6.  Callouts  are  supported, but the value of the capture_top field is
+       always 1, and the value of the capture_last field is always -1.
+
+       7. The \C escape sequence, which (in the standard algorithm) matches  a
+       single  byte, even in UTF-8 mode, is not supported because the alterna-
+       tive algorithm moves through the subject  string  one  character  at  a
+       time, for all active paths through the tree.
+
+       8.  Except for (*FAIL), the backtracking control verbs such as (*PRUNE)
+       are not supported. (*FAIL) is supported, and  behaves  like  a  failing
+       negative assertion.
+
+
+ADVANTAGES OF THE ALTERNATIVE ALGORITHM
+
+       Using  the alternative matching algorithm provides the following advan-
+       tages:
+
+       1. All possible matches (at a single point in the subject) are automat-
+       ically  found,  and  in particular, the longest match is found. To find
+       more than one match using the standard algorithm, you have to do kludgy
+       things with callouts.
+
+       2.  Because  the  alternative  algorithm  scans the subject string just
+       once, and never needs to backtrack, it is possible to  pass  very  long
+       subject  strings  to  the matching function in several pieces, checking
+       for partial matching each time.  The  pcrepartial  documentation  gives
+       details of partial matching.
+
+
+DISADVANTAGES OF THE ALTERNATIVE ALGORITHM
+
+       The alternative algorithm suffers from a number of disadvantages:
+
+       1.  It  is  substantially  slower  than the standard algorithm. This is
+       partly because it has to search for all possible matches, but  is  also
+       because it is less susceptible to optimization.
+
+       2. Capturing parentheses and back references are not supported.
+
+       3. Although atomic groups are supported, their use does not provide the
+       performance advantage that it does for the standard algorithm.
+
+
+AUTHOR
+
+       Philip Hazel
+       University Computing Service
+       Cambridge CB2 3QH, England.
+
+
+REVISION
+
+       Last updated: 29 September 2009
+       Copyright (c) 1997-2009 University of Cambridge.
+------------------------------------------------------------------------------
+
+
+PCREAPI(3)                                                          PCREAPI(3)
+
+
+NAME
+       PCRE - Perl-compatible regular expressions
+
+
+PCRE NATIVE API
+
+       #include <pcre.h>
+
+       pcre *pcre_compile(const char *pattern, int options,
+            const char **errptr, int *erroffset,
+            const unsigned char *tableptr);
+
+       pcre *pcre_compile2(const char *pattern, int options,
+            int *errorcodeptr,
+            const char **errptr, int *erroffset,
+            const unsigned char *tableptr);
+
+       pcre_extra *pcre_study(const pcre *code, int options,
+            const char **errptr);
+
+       int pcre_exec(const pcre *code, const pcre_extra *extra,
+            const char *subject, int length, int startoffset,
+            int options, int *ovector, int ovecsize);
+
+       int pcre_dfa_exec(const pcre *code, const pcre_extra *extra,
+            const char *subject, int length, int startoffset,
+            int options, int *ovector, int ovecsize,
+            int *workspace, int wscount);
+
+       int pcre_copy_named_substring(const pcre *code,
+            const char *subject, int *ovector,
+            int stringcount, const char *stringname,
+            char *buffer, int buffersize);
+
+       int pcre_copy_substring(const char *subject, int *ovector,
+            int stringcount, int stringnumber, char *buffer,
+            int buffersize);
+
+       int pcre_get_named_substring(const pcre *code,
+            const char *subject, int *ovector,
+            int stringcount, const char *stringname,
+            const char **stringptr);
+
+       int pcre_get_stringnumber(const pcre *code,
+            const char *name);
+
+       int pcre_get_stringtable_entries(const pcre *code,
+            const char *name, char **first, char **last);
+
+       int pcre_get_substring(const char *subject, int *ovector,
+            int stringcount, int stringnumber,
+            const char **stringptr);
+
+       int pcre_get_substring_list(const char *subject,
+            int *ovector, int stringcount, const char ***listptr);
+
+       void pcre_free_substring(const char *stringptr);
+
+       void pcre_free_substring_list(const char **stringptr);
+
+       const unsigned char *pcre_maketables(void);
+
+       int pcre_fullinfo(const pcre *code, const pcre_extra *extra,
+            int what, void *where);
+
+       int pcre_info(const pcre *code, int *optptr, int *firstcharptr);
+
+       int pcre_refcount(pcre *code, int adjust);
+
+       int pcre_config(int what, void *where);
+
+       char *pcre_version(void);
+
+       void *(*pcre_malloc)(size_t);
+
+       void (*pcre_free)(void *);
+
+       void *(*pcre_stack_malloc)(size_t);
+
+       void (*pcre_stack_free)(void *);
+
+       int (*pcre_callout)(pcre_callout_block *);
+
+
+PCRE API OVERVIEW
+
+       PCRE has its own native API, which is described in this document. There
+       are also some wrapper functions that correspond to  the  POSIX  regular
+       expression  API.  These  are  described in the pcreposix documentation.
+       Both of these APIs define a set of C function calls. A C++  wrapper  is
+       distributed with PCRE. It is documented in the pcrecpp page.
+
+       The  native  API  C  function prototypes are defined in the header file
+       pcre.h, and on Unix systems the library itself is called  libpcre.   It
+       can normally be accessed by adding -lpcre to the command for linking an
+       application  that  uses  PCRE.  The  header  file  defines  the  macros
+       PCRE_MAJOR  and  PCRE_MINOR to contain the major and minor release num-
+       bers for the library.  Applications can use these  to  include  support
+       for different releases of PCRE.
+
+       The   functions   pcre_compile(),  pcre_compile2(),  pcre_study(),  and
+       pcre_exec() are used for compiling and matching regular expressions  in
+       a  Perl-compatible  manner. A sample program that demonstrates the sim-
+       plest way of using them is provided in the file  called  pcredemo.c  in
+       the PCRE source distribution. A listing of this program is given in the
+       pcredemo documentation, and the pcresample documentation describes  how
+       to compile and run it.
+
+       A second matching function, pcre_dfa_exec(), which is not Perl-compati-
+       ble, is also provided. This uses a different algorithm for  the  match-
+       ing.  The  alternative algorithm finds all possible matches (at a given
+       point in the subject), and scans the subject just once (unless there
+       are lookaround assertions). However, this algorithm does not return
+       captured substrings. A description of the two matching  algorithms  and
+       their  advantages  and disadvantages is given in the pcrematching docu-
+       mentation.
+
+       In addition to the main compiling and  matching  functions,  there  are
+       convenience functions for extracting captured substrings from a subject
+       string that is matched by pcre_exec(). They are:
+
+         pcre_copy_substring()
+         pcre_copy_named_substring()
+         pcre_get_substring()
+         pcre_get_named_substring()
+         pcre_get_substring_list()
+         pcre_get_stringnumber()
+         pcre_get_stringtable_entries()
+
+       pcre_free_substring() and pcre_free_substring_list() are also provided,
+       to free the memory used for extracted strings.
+
+       The  function  pcre_maketables()  is  used  to build a set of character
+       tables  in  the  current  locale   for   passing   to   pcre_compile(),
+       pcre_exec(),  or  pcre_dfa_exec(). This is an optional facility that is
+       provided for specialist use.  Most  commonly,  no  special  tables  are
+       passed,  in  which case internal tables that are generated when PCRE is
+       built are used.
+
+       The function pcre_fullinfo() is used to find out  information  about  a
+       compiled  pattern; pcre_info() is an obsolete version that returns only
+       some of the available information, but is retained for  backwards  com-
+       patibility.   The function pcre_version() returns a pointer to a string
+       containing the version of PCRE and its date of release.
+
+       The function pcre_refcount() maintains a  reference  count  in  a  data
+       block  containing  a compiled pattern. This is provided for the benefit
+       of object-oriented applications.
+
+       The global variables pcre_malloc and pcre_free  initially  contain  the
+       entry  points  of  the  standard malloc() and free() functions, respec-
+       tively. PCRE calls the memory management functions via these variables,
+       so  a  calling  program  can replace them if it wishes to intercept the
+       calls. This should be done before calling any PCRE functions.
+
+       The global variables pcre_stack_malloc  and  pcre_stack_free  are  also
+       indirections  to  memory  management functions. These special functions
+       are used only when PCRE is compiled to use  the  heap  for  remembering
+       data, instead of recursive function calls, when running the pcre_exec()
+       function. See the pcrebuild documentation for  details  of  how  to  do
+       this.  It  is  a non-standard way of building PCRE, for use in environ-
+       ments that have limited stacks. Because of the greater  use  of  memory
+       management,  it  runs  more  slowly. Separate functions are provided so
+       that special-purpose external code can be  used  for  this  case.  When
+       used,  these  functions  are always called in a stack-like manner (last
+       obtained, first freed), and always for memory blocks of the same  size.
+       There  is  a discussion about PCRE's stack usage in the pcrestack docu-
+       mentation.
+
+       The global variable pcre_callout initially contains NULL. It can be set
+       by  the  caller  to  a "callout" function, which PCRE will then call at
+       specified points during a matching operation. Details are given in  the
+       pcrecallout documentation.
+
+
+NEWLINES
+
+       PCRE  supports five different conventions for indicating line breaks in
+       strings: a single CR (carriage return) character, a  single  LF  (line-
+       feed) character, the two-character sequence CRLF, any of the three pre-
+       ceding, or any Unicode newline sequence. The Unicode newline  sequences
+       are  the  three just mentioned, plus the single characters VT (vertical
+       tab, U+000B), FF (formfeed, U+000C), NEL (next line, U+0085), LS  (line
+       separator, U+2028), and PS (paragraph separator, U+2029).
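+
+       The five conventions can be summarized in code. The sketch below is
+       plain C and does not call PCRE; the enum and function names are
+       hypothetical. newline_len() returns the length of the newline at a
+       given offset under each convention. For simplicity it treats the sub-
+       ject as single bytes, so LS and PS (which need UTF-8 decoding) are
+       omitted and NEL is taken to be the byte 0x85.
+
```c
#include <stddef.h>

/* Hypothetical names for the five conventions described above. */
enum newline_convention { NL_CR, NL_LF, NL_CRLF, NL_ANYCRLF, NL_ANY };

/* Length in bytes of the newline starting at s[i], or 0 if none. */
static size_t newline_len(const unsigned char *s, size_t len, size_t i,
                          enum newline_convention nl)
{
  unsigned char c;
  int crlf;
  if (i >= len) return 0;
  c = s[i];
  crlf = (c == '\r' && i + 1 < len && s[i + 1] == '\n');
  switch (nl) {
    case NL_CR:      return c == '\r';
    case NL_LF:      return c == '\n';
    case NL_CRLF:    return crlf ? 2 : 0;
    case NL_ANYCRLF: return crlf ? 2 : (c == '\r' || c == '\n');
    case NL_ANY:     /* CR, LF, CRLF, plus VT, FF and (byte) NEL */
      if (crlf) return 2;
      return c == '\r' || c == '\n' || c == 0x0B || c == 0x0C || c == 0x85;
  }
  return 0;
}
```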
+
+       Each  of  the first three conventions is used by at least one operating
+       system as its standard newline sequence. When PCRE is built, a default
+       can be specified; if it is not, LF (the Unix standard) is used. When
+       PCRE is run, the default can be overridden, either when a
+       pattern is compiled, or when it is matched.
+
+       At compile time, the newline convention can be specified by the options
+       argument of pcre_compile(), or it can be specified by special  text  at
+       the start of the pattern itself; this overrides any other settings. See
+       the pcrepattern page for details of the special character sequences.
+
+       In the PCRE documentation the word "newline" is used to mean "the char-
+       acter  or pair of characters that indicate a line break". The choice of
+       newline convention affects the handling of  the  dot,  circumflex,  and
+       dollar metacharacters, the handling of #-comments in /x mode, and, when
+       CRLF is a recognized line ending sequence, the match position  advance-
+       ment for a non-anchored pattern. There is more detail about this in the
+       section on pcre_exec() options below.
+
+       The choice of newline convention does not affect the interpretation  of
+       the  \n  or  \r  escape  sequences, nor does it affect what \R matches,
+       which is controlled in a similar way, but by separate options.
+
+
+MULTITHREADING
+
+       The PCRE functions can be used in  multi-threading  applications,  with
+       the  proviso  that  the  memory  management  functions  pointed  to  by
+       pcre_malloc, pcre_free, pcre_stack_malloc, and pcre_stack_free, and the
+       callout function pointed to by pcre_callout, are shared by all threads.
+
+       The  compiled form of a regular expression is not altered during match-
+       ing, so the same compiled pattern can safely be used by several threads
+       at once.
+
+
+SAVING PRECOMPILED PATTERNS FOR LATER USE
+
+       The compiled form of a regular expression can be saved and re-used at a
+       later time, possibly by a different program, and even on a  host  other
+       than  the  one  on  which  it  was  compiled.  Details are given in the
+       pcreprecompile documentation. However, compiling a  regular  expression
+       with  one version of PCRE for use with a different version is not guar-
+       anteed to work and may cause crashes.
+
+
+CHECKING BUILD-TIME OPTIONS
+
+       int pcre_config(int what, void *where);
+
+       The function pcre_config() makes it possible for a PCRE client to  dis-
+       cover which optional features have been compiled into the PCRE library.
+       The pcrebuild documentation has more details about these optional  fea-
+       tures.
+
+       The  first  argument  for pcre_config() is an integer, specifying which
+       information is required; the second argument is a pointer to a variable
+       into  which  the  information  is  placed. The following information is
+       available:
+
+         PCRE_CONFIG_UTF8
+
+       The output is an integer that is set to one if UTF-8 support is  avail-
+       able; otherwise it is set to zero.
+
+         PCRE_CONFIG_UNICODE_PROPERTIES
+
+       The  output  is  an  integer  that is set to one if support for Unicode
+       character properties is available; otherwise it is set to zero.
+
+         PCRE_CONFIG_NEWLINE
+
+       The output is an integer whose value specifies  the  default  character
+       sequence that is recognized as meaning "newline". The five values that
+       are supported are: 10 for LF, 13 for CR, 3338 for CRLF, -2 for ANYCRLF,
+       and  -1  for  ANY.  Though they are derived from ASCII, the same values
+       are returned in EBCDIC environments. The default should normally corre-
+       spond to the standard sequence for your operating system.
+
+         PCRE_CONFIG_BSR
+
+       The output is an integer whose value indicates what character sequences
+       the \R escape sequence matches by default. A value of 0 means  that  \R
+       matches  any  Unicode  line ending sequence; a value of 1 means that \R
+       matches only CR, LF, or CRLF. The default can be overridden when a pat-
+       tern is compiled or matched.
+
+         PCRE_CONFIG_LINK_SIZE
+
+       The  output  is  an  integer that contains the number of bytes used for
+       internal linkage in compiled regular expressions. The value is 2, 3, or
+       4.  Larger  values  allow larger regular expressions to be compiled, at
+       the expense of slower matching. The default value of  2  is  sufficient
+       for  all  but  the  most massive patterns, since it allows the compiled
+       pattern to be up to 64K in size.
+
+         PCRE_CONFIG_POSIX_MALLOC_THRESHOLD
+
+       The output is an integer that contains the threshold  above  which  the
+       POSIX  interface  uses malloc() for output vectors. Further details are
+       given in the pcreposix documentation.
+
+         PCRE_CONFIG_MATCH_LIMIT
+
+       The output is a long integer that gives the default limit for the  num-
+       ber  of  internal  matching  function calls in a pcre_exec() execution.
+       Further details are given with pcre_exec() below.
+
+         PCRE_CONFIG_MATCH_LIMIT_RECURSION
+
+       The output is a long integer that gives the default limit for the depth
+       of   recursion  when  calling  the  internal  matching  function  in  a
+       pcre_exec() execution.  Further  details  are  given  with  pcre_exec()
+       below.
+
+         PCRE_CONFIG_STACKRECURSE
+
+       The  output is an integer that is set to one if internal recursion when
+       running pcre_exec() is implemented by recursive function calls that use
+       the  stack  to remember their state. This is the usual way that PCRE is
+       compiled. The output is zero if PCRE was compiled to use blocks of data
+       on  the  heap  instead  of  recursive  function  calls.  In  this case,
+       pcre_stack_malloc and  pcre_stack_free  are  called  to  manage  memory
+       blocks on the heap, thus avoiding the use of the stack.
+
+
+COMPILING A PATTERN
+
+       pcre *pcre_compile(const char *pattern, int options,
+            const char **errptr, int *erroffset,
+            const unsigned char *tableptr);
+
+       pcre *pcre_compile2(const char *pattern, int options,
+            int *errorcodeptr,
+            const char **errptr, int *erroffset,
+            const unsigned char *tableptr);
+
+       Either of the functions pcre_compile() or pcre_compile2() can be called
+       to compile a pattern into an internal form. The only difference between
+       the  two interfaces is that pcre_compile2() has an additional argument,
+       errorcodeptr, via which a numerical error  code  can  be  returned.  To
+       avoid  too  much repetition, we refer just to pcre_compile() below, but
+       the information applies equally to pcre_compile2().
+
+       The pattern is a C string terminated by a binary zero, and is passed in
+       the  pattern  argument.  A  pointer to a single block of memory that is
+       obtained via pcre_malloc is returned. This contains the  compiled  code
+       and related data. The pcre type is defined for the returned block; this
+       is a typedef for a structure whose contents are not externally defined.
+       It is up to the caller to free the memory (via pcre_free) when it is no
+       longer required.
+
+       Although the compiled code of a PCRE regex is relocatable, that is,  it
+       does not depend on memory location, the complete pcre data block is not
+       fully relocatable, because it may contain a copy of the tableptr  argu-
+       ment, which is an address (see below).
+
+       The options argument contains various bit settings that affect the com-
+       pilation. It should be zero if no options are required.  The  available
+       options  are  described  below. Some of them (in particular, those that
+       are compatible with Perl, but some others as well) can also be set  and
+       unset  from  within  the  pattern  (see the detailed description in the
+       pcrepattern documentation). For those options that can be different  in
+       different parts of the pattern, the contents of the options argument
+       specify their settings at the start of compilation and execution. The
+       PCRE_ANCHORED, PCRE_BSR_xxx, and PCRE_NEWLINE_xxx options can be set at
+       the time of matching as well as at compile time.
+
+       If errptr is NULL, pcre_compile() returns NULL immediately.  Otherwise,
+       if  compilation  of  a  pattern fails, pcre_compile() returns NULL, and
+       sets the variable pointed to by errptr to point to a textual error mes-
+       sage. This is a static string that is part of the library. You must not
+       try to free it. The byte offset from the start of the  pattern  to  the
+       character  that  was  being  processed when the error was discovered is
+       placed in the variable pointed to by erroffset, which must not be NULL.
+       If  it  is,  an  immediate error is given. Some errors are not detected
+       until checks are carried out when the whole pattern has  been  scanned;
+       in this case the offset is set to the end of the pattern.
+
+       If  pcre_compile2()  is  used instead of pcre_compile(), and the error-
+       codeptr argument is not NULL, a non-zero error code number is  returned
+       via  this argument in the event of an error. This is in addition to the
+       textual error message. Error codes and messages are listed below.
+
+       If the final argument, tableptr, is NULL, PCRE uses a  default  set  of
+       character  tables  that  are  built  when  PCRE  is compiled, using the
+       default C locale. Otherwise, tableptr must be an address  that  is  the
+       result  of  a  call to pcre_maketables(). This value is stored with the
+       compiled pattern, and used again by pcre_exec(), unless  another  table
+       pointer is passed to it. For more discussion, see the section on locale
+       support below.
+
+       This code fragment shows a typical straightforward  call  to  pcre_com-
+       pile():
+
+         pcre *re;
+         const char *error;
+         int erroffset;
+         re = pcre_compile(
+           "^A.*Z",          /* the pattern */
+           0,                /* default options */
+           &error,           /* for error message */
+           &erroffset,       /* for error offset */
+           NULL);            /* use default character tables */
+
+       The  following  names  for option bits are defined in the pcre.h header
+       file:
+
+         PCRE_ANCHORED
+
+       If this bit is set, the pattern is forced to be "anchored", that is, it
+       is  constrained to match only at the first matching point in the string
+       that is being searched (the "subject string"). This effect can also  be
+       achieved  by appropriate constructs in the pattern itself, which is the
+       only way to do it in Perl.
+
+         PCRE_AUTO_CALLOUT
+
+       If this bit is set, pcre_compile() automatically inserts callout items,
+       all  with  number  255, before each pattern item. For discussion of the
+       callout facility, see the pcrecallout documentation.
+
+         PCRE_BSR_ANYCRLF
+         PCRE_BSR_UNICODE
+
+       These options (which are mutually exclusive) control what the \R escape
+       sequence  matches.  The choice is either to match only CR, LF, or CRLF,
+       or to match any Unicode newline sequence. The default is specified when
+       PCRE is built. It can be overridden from within the pattern, or by set-
+       ting an option when a compiled pattern is matched.
+
+         PCRE_CASELESS
+
+       If this bit is set, letters in the pattern match both upper  and  lower
+       case  letters.  It  is  equivalent  to  Perl's /i option, and it can be
+       changed within a pattern by a (?i) option setting. In UTF-8 mode,  PCRE
+       always  understands the concept of case for characters whose values are
+       less than 128, so caseless matching is always possible. For  characters
+       with  higher  values,  the concept of case is supported if PCRE is com-
+       piled with Unicode property support, but not otherwise. If you want  to
+       use  caseless  matching  for  characters 128 and above, you must ensure
+       that PCRE is compiled with Unicode property support  as  well  as  with
+       UTF-8 support.
+
+         PCRE_DOLLAR_ENDONLY
+
+       If  this bit is set, a dollar metacharacter in the pattern matches only
+       at the end of the subject string. Without this option,  a  dollar  also
+       matches  immediately before a newline at the end of the string (but not
+       before any other newlines). The PCRE_DOLLAR_ENDONLY option  is  ignored
+       if  PCRE_MULTILINE  is  set.   There is no equivalent to this option in
+       Perl, and no way to set it within a pattern.
+
+         PCRE_DOTALL
+
+       If this bit is set, a dot metacharacter in the pattern matches all
+       characters, including those that indicate newline. Without it, a dot does
+       not match when the current position is at a  newline.  This  option  is
+       equivalent  to Perl's /s option, and it can be changed within a pattern
+       by a (?s) option setting. A negative class such as [^a] always  matches
+       newline characters, independent of the setting of this option.
+
+         PCRE_DUPNAMES
+
+       If  this  bit is set, names used to identify capturing subpatterns need
+       not be unique. This can be helpful for certain types of pattern when it
+       is  known  that  only  one instance of the named subpattern can ever be
+       matched. There are more details of named subpatterns  below;  see  also
+       the pcrepattern documentation.
+
+         PCRE_EXTENDED
+
+       If  this  bit  is  set,  whitespace  data characters in the pattern are
+       totally ignored except when escaped or inside a character class. White-
+       space does not include the VT character (code 11). In addition, charac-
+       ters between an unescaped # outside a character class and the next new-
+       line,  inclusive,  are  also  ignored.  This is equivalent to Perl's /x
+       option, and it can be changed within a pattern by a  (?x)  option  set-
+       ting.
+
+       This  option  makes  it possible to include comments inside complicated
+       patterns.  Note, however, that this applies only  to  data  characters.
+       Whitespace   characters  may  never  appear  within  special  character
+       sequences in a pattern, for  example  within  the  sequence  (?(  which
+       introduces a conditional subpattern.
+
+         PCRE_EXTRA
+
+       This  option  was invented in order to turn on additional functionality
+       of PCRE that is incompatible with Perl, but it  is  currently  of  very
+       little  use. When set, any backslash in a pattern that is followed by a
+       letter that has no special meaning  causes  an  error,  thus  reserving
+       these  combinations  for  future  expansion.  By default, as in Perl, a
+       backslash followed by a letter with no special meaning is treated as  a
+       literal.  (Perl can, however, be persuaded to give a warning for this.)
+       There are at present no other features controlled by  this  option.  It
+       can also be set by a (?X) option setting within a pattern.
+
+         PCRE_FIRSTLINE
+
+       If  this  option  is  set,  an  unanchored pattern is required to match
+       before or at the first  newline  in  the  subject  string,  though  the
+       matched text may continue over the newline.
+
+         PCRE_JAVASCRIPT_COMPAT
+
+       If this option is set, PCRE's behaviour is changed in some ways so that
+       it is compatible with JavaScript rather than Perl. The changes  are  as
+       follows:
+
+       (1)  A  lone  closing square bracket in a pattern causes a compile-time
+       error, because this is illegal in JavaScript (by default it is  treated
+       as a data character). Thus, the pattern AB]CD becomes illegal when this
+       option is set.
+
+       (2) At run time, a back reference to an unset subpattern group  matches
+       an  empty  string (by default this causes the current matching alterna-
+       tive to fail). A pattern such as (\1)(a) succeeds when this  option  is
+       set  (assuming  it can find an "a" in the subject), whereas it fails by
+       default, for Perl compatibility.
+
+         PCRE_MULTILINE
+
+       By default, PCRE treats the subject string as consisting  of  a  single
+       line  of characters (even if it actually contains newlines). The "start
+       of line" metacharacter (^) matches only at the  start  of  the  string,
+       while  the  "end  of line" metacharacter ($) matches only at the end of
+       the string, or before a terminating newline (unless PCRE_DOLLAR_ENDONLY
+       is set). This is the same as Perl.
+
+       When PCRE_MULTILINE is set, the "start of line" and "end of line"
+       constructs match immediately following or immediately  before  internal
+       newlines  in  the  subject string, respectively, as well as at the very
+       start and end. This is equivalent to Perl's /m option, and  it  can  be
+       changed within a pattern by a (?m) option setting. If there are no new-
+       lines in a subject string, or no occurrences of ^ or $  in  a  pattern,
+       setting PCRE_MULTILINE has no effect.
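+
+       The following sketch shows roughly where "^" can match. It lists the
+       offsets at which "^" could match in a subject, with and without
+       multiline semantics, assuming that LF is the only newline character.
+       It is not PCRE code, and the name caret_positions() is hypothetical.
+
```c
#include <stddef.h>

/* Hypothetical helper: store in out[] the offsets at which "^" could
   match, assuming "\n" is the only newline. Returns how many were stored. */
size_t caret_positions(const char *subject, size_t length, int multiline,
                       size_t *out, size_t max_out)
{
    size_t n = 0;
    if (max_out == 0) return 0;
    out[n++] = 0;                    /* the very start always qualifies */
    if (!multiline) return n;        /* without multiline, nothing else */
    for (size_t i = 0; i < length && n < max_out; i++) {
        /* only internal newlines count: a newline that ends the
           subject does not add a further candidate position */
        if (subject[i] == '\n' && i + 1 < length)
            out[n++] = i + 1;
    }
    return n;
}
```
+
+       For the subject "ab\ncd\n" in multiline mode, the candidates are
+       offsets 0 and 3. Without multiline, only offset 0 is a candidate.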
+
+         PCRE_NEWLINE_CR
+         PCRE_NEWLINE_LF
+         PCRE_NEWLINE_CRLF
+         PCRE_NEWLINE_ANYCRLF
+         PCRE_NEWLINE_ANY
+
+       These  options  override the default newline definition that was chosen
+       when PCRE was built. Setting the first or the second specifies  that  a
+       newline  is  indicated  by a single character (CR or LF, respectively).
+       Setting PCRE_NEWLINE_CRLF specifies that a newline is indicated by  the
+       two-character  CRLF  sequence.  Setting  PCRE_NEWLINE_ANYCRLF specifies
+       that any of the three preceding sequences should be recognized. Setting
+       PCRE_NEWLINE_ANY  specifies that any Unicode newline sequence should be
+       recognized. The Unicode newline sequences are the three just mentioned,
+       plus  the  single  characters  VT (vertical tab, U+000B), FF (formfeed,
+       U+000C), NEL (next line, U+0085), LS (line separator, U+2028),  and  PS
+       (paragraph  separator,  U+2029).  The  last  two are recognized only in
+       UTF-8 mode.
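+
+       The PCRE_NEWLINE_ANY rules can be expressed as a function that reports
+       how many bytes of newline start at a given position. This model is for
+       illustration only and is not PCRE's internal code. The name
+       any_newline_length() is hypothetical. In UTF-8 mode, NEL is the
+       two-byte sequence C2 85, and LS and PS are three-byte sequences. In
+       non-UTF-8 mode, NEL is taken to be the single byte 0x85.
+
```c
#include <stddef.h>

/* Sketch: length in bytes of the PCRE_NEWLINE_ANY style newline starting
   at p, or 0 if there is none. utf8 selects UTF-8 decoding of NEL/LS/PS. */
size_t any_newline_length(const unsigned char *p, const unsigned char *end,
                          int utf8)
{
    if (p >= end) return 0;
    switch (p[0]) {
    case 0x0d:                                /* CR, or CRLF as one unit */
        return (p + 1 < end && p[1] == 0x0a) ? 2 : 1;
    case 0x0a:                                /* LF */
    case 0x0b:                                /* VT */
    case 0x0c:                                /* FF */
        return 1;
    case 0x85:                                /* NEL as a single byte */
        return utf8 ? 0 : 1;
    case 0xc2:                                /* NEL encoded as C2 85 */
        return (utf8 && p + 1 < end && p[1] == 0x85) ? 2 : 0;
    case 0xe2:                                /* LS E2 80 A8, PS E2 80 A9 */
        if (utf8 && p + 2 < end && p[1] == 0x80 &&
            (p[2] == 0xa8 || p[2] == 0xa9))
            return 3;
        return 0;
    default:
        return 0;
    }
}
```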
+
+       The newline setting in the  options  word  uses  three  bits  that  are
+       treated as a number, giving eight possibilities. Currently only six are
+       used (default plus the five values above). This means that if  you  set
+       more  than one newline option, the combination may or may not be sensi-
+       ble. For example, PCRE_NEWLINE_CR with PCRE_NEWLINE_LF is equivalent to
+       PCRE_NEWLINE_CRLF,  but other combinations may yield unused numbers and
+       cause an error.
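+
+       The small decoder below shows how the three-bit field behaves. The
+       numeric values are those that appear in pcre.h for the 8.x releases.
+       They are copied here so that the sketch compiles on its own. Check
+       them against the header you actually build with, and drop the local
+       definitions if you include pcre.h. NEWLINE_FIELD_MASK and
+       newline_setting() are hypothetical names.
+
```c
#include <stddef.h>

/* Values as found in pcre.h for 8.x; verify against your own header. */
#define PCRE_NEWLINE_CR       0x00100000
#define PCRE_NEWLINE_LF       0x00200000
#define PCRE_NEWLINE_CRLF     0x00300000
#define PCRE_NEWLINE_ANY      0x00400000
#define PCRE_NEWLINE_ANYCRLF  0x00500000

#define NEWLINE_FIELD_MASK    0x00700000  /* hypothetical: the 3-bit field */

/* Hypothetical helper: describe the newline field of an options word,
   or return NULL for the two values that are not in use. */
const char *newline_setting(unsigned long options)
{
    switch (options & NEWLINE_FIELD_MASK) {
    case 0:                    return "build default";
    case PCRE_NEWLINE_CR:      return "CR";
    case PCRE_NEWLINE_LF:      return "LF";
    case PCRE_NEWLINE_CRLF:    return "CRLF";
    case PCRE_NEWLINE_ANY:     return "ANY";
    case PCRE_NEWLINE_ANYCRLF: return "ANYCRLF";
    default:                   return NULL;   /* 0x00600000, 0x00700000 */
    }
}
```
+
+       With these values, PCRE_NEWLINE_CR combined with PCRE_NEWLINE_LF gives
+       PCRE_NEWLINE_CRLF. PCRE_NEWLINE_CR combined with PCRE_NEWLINE_ANY
+       silently gives PCRE_NEWLINE_ANYCRLF. PCRE_NEWLINE_LF combined with
+       PCRE_NEWLINE_ANY gives one of the unused values.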
+
+       The only time that a line break is specially recognized when  compiling
+       a pattern is when PCRE_EXTENDED is set, and an unescaped # outside a
+       character class is encountered. This indicates  a  comment  that  lasts
+       until  after the next line break sequence. In other circumstances, line
+       break  sequences  are  treated  as  literal  data,   except   that   in
+       PCRE_EXTENDED mode, both CR and LF are treated as whitespace characters
+       and are therefore ignored.
+
+       The newline option that is set at compile time becomes the default that
+       is used for pcre_exec() and pcre_dfa_exec(), but it can be overridden.
+
+         PCRE_NO_AUTO_CAPTURE
+
+       If this option is set, it disables the use of numbered capturing paren-
+       theses in the pattern. Any opening parenthesis that is not followed  by
+       ?  behaves as if it were followed by ?: but named parentheses can still
+       be used for capturing (and they acquire  numbers  in  the  usual  way).
+       There is no equivalent of this option in Perl.
+
+         PCRE_UNGREEDY
+
+       This  option  inverts  the "greediness" of the quantifiers so that they
+       are not greedy by default, but become greedy if followed by "?". It  is
+       not  compatible  with Perl. It can also be set by a (?U) option setting
+       within the pattern.
+
+         PCRE_UTF8
+
+       This option causes PCRE to regard both the pattern and the  subject  as
+       strings  of  UTF-8 characters instead of single-byte character strings.
+       However, it is available only when PCRE is built to include UTF-8  sup-
+       port.  If not, the use of this option provokes an error. Details of how
+       this option changes the behaviour of PCRE are given in the  section  on
+       UTF-8 support in the main pcre page.
+
+         PCRE_NO_UTF8_CHECK
+
+       When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is
+       automatically checked. There is a  discussion  about  the  validity  of
+       UTF-8  strings  in  the main pcre page. If an invalid UTF-8 sequence of
+       bytes is found, pcre_compile() returns an error. If  you  already  know
+       that your pattern is valid, and you want to skip this check for perfor-
+       mance reasons, you can set the PCRE_NO_UTF8_CHECK option.  When  it  is
+       set,  the  effect  of  passing  an invalid UTF-8 string as a pattern is
+       undefined. It may cause your program to crash. Note  that  this  option
+       can  also be passed to pcre_exec() and pcre_dfa_exec(), to suppress the
+       UTF-8 validity checking of subject strings.
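+
+       The following sketch illustrates the kind of check involved. It is an
+       RFC 3629 style UTF-8 validator written for illustration, not PCRE's
+       own routine. PCRE's exact rules and error reporting may differ in
+       detail, for example in how surrogates are treated. The name
+       utf8_first_invalid() is hypothetical.
+
```c
#include <stddef.h>

/* Sketch of an RFC 3629 style UTF-8 check. Returns -1 if the buffer is
   valid, otherwise the byte offset of the first invalid sequence. */
long utf8_first_invalid(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t extra;
        unsigned long cp;
        if (c < 0x80) { i++; continue; }                 /* ASCII */
        else if (c >= 0xc2 && c <= 0xdf) { extra = 1; cp = c & 0x1f; }
        else if (c >= 0xe0 && c <= 0xef) { extra = 2; cp = c & 0x0f; }
        else if (c >= 0xf0 && c <= 0xf4) { extra = 3; cp = c & 0x07; }
        else return (long)i;        /* stray continuation or bad lead byte */
        if (len - i <= extra) return (long)i;            /* truncated */
        for (size_t k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xc0) != 0x80) return (long)i;
            cp = (cp << 6) | (s[i + k] & 0x3f);
        }
        if ((extra == 2 && cp < 0x800) ||        /* overlong 3-byte form */
            (extra == 3 && cp < 0x10000) ||      /* overlong 4-byte form */
            (cp >= 0xd800 && cp <= 0xdfff) ||    /* UTF-16 surrogates */
            cp > 0x10ffff)                       /* beyond Unicode */
            return (long)i;
        i += extra + 1;
    }
    return -1;
}
```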
+
+
+COMPILATION ERROR CODES
+
+       The following table lists the error codes that may be returned by
+       pcre_compile2(), along with the error messages that may be returned by
+       both compiling functions. As PCRE has developed, some error codes have
+       fallen out of use. To avoid confusion, they have not been re-used.
+
+          0  no error
+          1  \ at end of pattern
+          2  \c at end of pattern
+          3  unrecognized character follows \
+          4  numbers out of order in {} quantifier
+          5  number too big in {} quantifier
+          6  missing terminating ] for character class
+          7  invalid escape sequence in character class
+          8  range out of order in character class
+          9  nothing to repeat
+         10  [this code is not in use]
+         11  internal error: unexpected repeat
+         12  unrecognized character after (? or (?-
+         13  POSIX named classes are supported only within a class
+         14  missing )
+         15  reference to non-existent subpattern
+         16  erroffset passed as NULL
+         17  unknown option bit(s) set
+         18  missing ) after comment
+         19  [this code is not in use]
+         20  regular expression is too large
+         21  failed to get memory
+         22  unmatched parentheses
+         23  internal error: code overflow
+         24  unrecognized character after (?<
+         25  lookbehind assertion is not fixed length
+         26  malformed number or name after (?(
+         27  conditional group contains more than two branches
+         28  assertion expected after (?(
+         29  (?R or (?[+-]digits must be followed by )
+         30  unknown POSIX class name
+         31  POSIX collating elements are not supported
+         32  this version of PCRE is not compiled with PCRE_UTF8 support
+         33  [this code is not in use]
+         34  character value in \x{...} sequence is too large
+         35  invalid condition (?(0)
+         36  \C not allowed in lookbehind assertion
+         37  PCRE does not support \L, \l, \N, \U, or \u
+         38  number after (?C is > 255
+         39  closing ) for (?C expected
+         40  recursive call could loop indefinitely
+         41  unrecognized character after (?P
+         42  syntax error in subpattern name (missing terminator)
+         43  two named subpatterns have the same name
+         44  invalid UTF-8 string
+         45  support for \P, \p, and \X has not been compiled
+         46  malformed \P or \p sequence
+         47  unknown property name after \P or \p
+         48  subpattern name is too long (maximum 32 characters)
+         49  too many named subpatterns (maximum 10000)
+         50  [this code is not in use]
+         51  octal value is greater than \377 (not in UTF-8 mode)
+         52  internal error: overran compiling workspace
+         53  internal error: previously-checked referenced subpattern
+               not found
+         54  DEFINE group contains more than one branch
+         55  repeating a DEFINE group is not allowed
+         56  inconsistent NEWLINE options
+         57  \g is not followed by a braced, angle-bracketed, or quoted
+               name/number or by a plain number
+         58  a numbered reference must not be zero
+         59  (*VERB) with an argument is not supported
+         60  (*VERB) not recognized
+         61  number is too big
+         62  subpattern name expected
+         63  digit expected after (?+
+         64  ] is an invalid data character in JavaScript compatibility mode
+
+       The numbers 32 and 10000 in errors 48 and 49  are  defaults;  different
+       values may be used if the limits were changed when PCRE was built.
+
+
+STUDYING A PATTERN
+
+       pcre_extra *pcre_study(const pcre *code, int options,
+            const char **errptr);
+
+       If a compiled pattern is going to be used several times, it is worth
+       spending more time analyzing it in order to reduce the time taken for
+       matching.  The function pcre_study() takes a pointer to a compiled pat-
+       tern as its first argument. If studying the pattern produces additional
+       information  that  will  help speed up matching, pcre_study() returns a
+       pointer to a pcre_extra block, in which the study_data field points  to
+       the results of the study.
+
+       The  returned  value  from  pcre_study()  can  be  passed  directly  to
+       pcre_exec() or pcre_dfa_exec(). However, a pcre_extra block  also  con-
+       tains  other  fields  that can be set by the caller before the block is
+       passed; these are described below in the section on matching a pattern.
+
+       If studying the  pattern  does  not  produce  any  useful  information,
+       pcre_study() returns NULL. In that circumstance, if the calling program
+       wants  to  pass  any  of   the   other   fields   to   pcre_exec()   or
+       pcre_dfa_exec(), it must set up its own pcre_extra block.
+
+       The  second  argument of pcre_study() contains option bits. At present,
+       no options are defined, and this argument should always be zero.
+
+       The third argument for pcre_study() is a pointer for an error  message.
+       If  studying  succeeds  (even  if no data is returned), the variable it
+       points to is set to NULL. Otherwise it is set to  point  to  a  textual
+       error message. This is a static string that is part of the library. You
+       must not try to free it. You should test the  error  pointer  for  NULL
+       after calling pcre_study(), to be sure that it has run successfully.
+
+       This is a typical call to pcre_study():
+
+         const char *error;
+         pcre_extra *pe;
+         pe = pcre_study(
+           re,             /* result of pcre_compile() */
+           0,              /* no options exist */
+           &error);        /* set to NULL or points to a message */
+
+       Studying a pattern does two things: first, a lower bound for the length
+       of subject string that is needed to match the pattern is computed. This
+       does not mean that there are any strings of that length that match, but
+       it does guarantee that no shorter strings match. The value is  used  by
+       pcre_exec()  and  pcre_dfa_exec()  to  avoid  wasting time by trying to
+       match strings that are shorter than the lower bound. You can  find  out
+       the value in a calling program via the pcre_fullinfo() function.
+
+       Studying a pattern is also useful for non-anchored patterns that do not
+       have a single fixed starting character. A bitmap of  possible  starting
+       bytes  is  created. This speeds up finding a position in the subject at
+       which to start matching.
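+
+       The sketch below shows how these two pieces of study data can reduce
+       work. A subject shorter than the minimum length is rejected at once.
+       Otherwise, only positions whose first byte is present in a 256-bit
+       start bitmap are tried. The function names are hypothetical. The
+       bitmap layout is also an assumption made for the sketch: bit c%8 of
+       byte c/8 stands for byte value c.
+
```c
#include <stddef.h>

/* Test whether byte value c is set in a 256-bit (32-byte) bitmap. */
static int bit_is_set(const unsigned char map[32], unsigned char c)
{
    return (map[c / 8] >> (c % 8)) & 1;
}

/* Hypothetical prefilter: return the first offset at which a match could
   start, or -1 if none can. minlength is treated as a byte count. */
long first_candidate(const unsigned char *subject, size_t length,
                     size_t minlength, const unsigned char map[32])
{
    if (length < minlength) return -1;       /* too short to ever match */
    for (size_t i = 0; i < length && length - i >= minlength; i++) {
        if (bit_is_set(map, subject[i]))     /* possible starting byte */
            return (long)i;
    }
    return -1;
}
```
+
+       PCRE counts the minimum length in characters. Using that value as a
+       byte count, as this sketch does, is still safe in UTF-8 mode, because
+       a subject never contains more characters than bytes.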
+
+
+LOCALE SUPPORT
+
+       PCRE handles caseless matching, and determines whether  characters  are
+       letters,  digits, or whatever, by reference to a set of tables, indexed
+       by character value. When running in UTF-8 mode, this  applies  only  to
+       characters  with  codes  less than 128. Higher-valued codes never match
+       escapes such as \w or \d, but can be tested with \p if  PCRE  is  built
+       with Unicode character property support. The use of locales with
+       Unicode is discouraged. If you are handling characters with codes
+       greater than 127, you should either use UTF-8 and Unicode, or use
+       locales, but not try to mix the two.
+
+       PCRE contains an internal set of tables that are used  when  the  final
+       argument  of  pcre_compile()  is  NULL.  These  are sufficient for many
+       applications.  Normally, the internal tables recognize only ASCII char-
+       acters. However, when PCRE is built, it is possible to cause the inter-
+       nal tables to be rebuilt in the default "C" locale of the local system,
+       which may cause them to be different.
+
+       The  internal tables can always be overridden by tables supplied by the
+       application that calls PCRE. These may be created in a different locale
+       from  the  default.  As more and more applications change to using Uni-
+       code, the need for this locale support is expected to die away.
+
+       External tables are built by calling  the  pcre_maketables()  function,
+       which  has no arguments, in the relevant locale. The result can then be
+       passed to pcre_compile() or pcre_exec()  as  often  as  necessary.  For
+       example,  to  build  and use tables that are appropriate for the French
+       locale (where accented characters with values greater than 127 are
+       treated as letters), the following code could be used:
+
+         setlocale(LC_CTYPE, "fr_FR");
+         tables = pcre_maketables();
+         re = pcre_compile(..., tables);
+
+       The  locale  name "fr_FR" is used on Linux and other Unix-like systems;
+       if you are using Windows, the name for the French locale is "french".
+
+       When pcre_maketables() runs, the tables are built  in  memory  that  is
+       obtained  via  pcre_malloc. It is the caller's responsibility to ensure
+       that the memory containing the tables remains available for as long  as
+       it is needed.
+
+       The pointer that is passed to pcre_compile() is saved with the compiled
+       pattern, and the same tables are used via this pointer by  pcre_study()
+       and normally also by pcre_exec(). Thus, by default, for any single pat-
+       tern, compilation, studying and matching all happen in the same locale,
+       but different patterns can be compiled in different locales.
+
+       It  is  possible to pass a table pointer or NULL (indicating the use of
+       the internal tables) to pcre_exec(). Although  not  intended  for  this
+       purpose,  this facility could be used to match a pattern in a different
+       locale from the one in which it was compiled. Passing table pointers at
+       run time is discussed below in the section on matching a pattern.
+
+
+INFORMATION ABOUT A PATTERN
+
+       int pcre_fullinfo(const pcre *code, const pcre_extra *extra,
+            int what, void *where);
+
+       The pcre_fullinfo() function returns information about a compiled
+       pattern. It replaces the obsolete pcre_info() function, which is
+       nevertheless retained for backwards compatibility (and is documented
+       below).
+
+       The  first  argument  for  pcre_fullinfo() is a pointer to the compiled
+       pattern. The second argument is the result of pcre_study(), or NULL  if
+       the  pattern  was not studied. The third argument specifies which piece
+       of information is required, and the fourth argument is a pointer  to  a
+       variable  to  receive  the  data. The yield of the function is zero for
+       success, or one of the following negative numbers:
+
+         PCRE_ERROR_NULL       the argument code was NULL, or
+                               the argument where was NULL
+         PCRE_ERROR_BADMAGIC   the "magic number" was not found
+         PCRE_ERROR_BADOPTION  the value of what was invalid
+
+       The "magic number" is placed at the start of each compiled pattern as
+       a simple check against passing an arbitrary memory pointer. Here is a
+       typical call of pcre_fullinfo(), to obtain the length of the compiled
+       pattern:
+
+         int rc;
+         size_t length;
+         rc = pcre_fullinfo(
+           re,               /* result of pcre_compile() */
+           pe,               /* result of pcre_study(), or NULL */
+           PCRE_INFO_SIZE,   /* what is required */
+           &length);         /* where to put the data */
+
+       The  possible  values for the third argument are defined in pcre.h, and
+       are as follows:
+
+         PCRE_INFO_BACKREFMAX
+
+       Return the number of the highest back reference  in  the  pattern.  The
+       fourth  argument  should  point to an int variable. Zero is returned if
+       there are no back references.
+
+         PCRE_INFO_CAPTURECOUNT
+
+       Return the number of capturing subpatterns in the pattern.  The  fourth
+       argument should point to an int variable.
+
+         PCRE_INFO_DEFAULT_TABLES
+
+       Return  a pointer to the internal default character tables within PCRE.
+       The fourth argument should point to an unsigned char *  variable.  This
+       information call is provided for internal use by the pcre_study() func-
+       tion. External callers can cause PCRE to use  its  internal  tables  by
+       passing a NULL table pointer.
+
+         PCRE_INFO_FIRSTBYTE
+
+       Return  information  about  the first byte of any matched string, for a
+       non-anchored pattern. The fourth argument should point to an int  vari-
+       able.  (This option used to be called PCRE_INFO_FIRSTCHAR; the old name
+       is still recognized for backwards compatibility.)
+
+       If there is a fixed first byte, for example, from  a  pattern  such  as
+       (cat|cow|coyote), its value is returned. Otherwise, if either
+
+       (a)  the pattern was compiled with the PCRE_MULTILINE option, and every
+       branch starts with "^", or
+
+       (b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not
+       set (if it were set, the pattern would be anchored),
+
+       -1  is  returned, indicating that the pattern matches only at the start
+       of a subject string or after any newline within the  string.  Otherwise
+       -2 is returned. For anchored patterns, -2 is returned.
+
+         PCRE_INFO_FIRSTTABLE
+
+       If  the pattern was studied, and this resulted in the construction of a
+       256-bit table indicating a fixed set of bytes for the first byte in any
+       matching  string, a pointer to the table is returned. Otherwise NULL is
+       returned. The fourth argument should point to an unsigned char *  vari-
+       able.
+
+         PCRE_INFO_HASCRORLF
+
+       Return  1  if  the  pattern  contains any explicit matches for CR or LF
+       characters, otherwise 0. The fourth argument should  point  to  an  int
+       variable.  An explicit match is either a literal CR or LF character, or
+       \r or \n.
+
+         PCRE_INFO_JCHANGED
+
+       Return 1 if the (?J) or (?-J) option setting is used  in  the  pattern,
+       otherwise  0. The fourth argument should point to an int variable. (?J)
+       and (?-J) set and unset the local PCRE_DUPNAMES option, respectively.
+
+         PCRE_INFO_LASTLITERAL
+
+       Return the value of the rightmost literal byte that must exist  in  any
+       matched  string,  other  than  at  its  start,  if such a byte has been
+       recorded. The fourth argument should point to an int variable. If there
+       is  no such byte, -1 is returned. For anchored patterns, a last literal
+       byte is recorded only if it follows something of variable  length.  For
+       example, for the pattern /^a\d+z\d+/ the returned value is "z", but for
+       /^a\dz\d/ the returned value is -1.
+
+         PCRE_INFO_MINLENGTH
+
+       If the pattern was studied and a minimum length  for  matching  subject
+       strings  was  computed,  its  value is returned. Otherwise the returned
+       value is -1. The value is a number of characters, not bytes  (this  may
+       be  relevant in UTF-8 mode). The fourth argument should point to an int
+       variable. A non-negative value is a lower bound to the  length  of  any
+       matching  string.  There  may not be any strings of that length that do
+       actually match, but every string that does match is at least that long.
+
+         PCRE_INFO_NAMECOUNT
+         PCRE_INFO_NAMEENTRYSIZE
+         PCRE_INFO_NAMETABLE
+
+       PCRE supports the use of named as well as numbered capturing  parenthe-
+       ses.  The names are just an additional way of identifying the parenthe-
+       ses, which still acquire numbers. Several convenience functions such as
+       pcre_get_named_substring()  are  provided  for extracting captured sub-
+       strings by name. It is also possible to extract the data  directly,  by
+       first  converting  the  name to a number in order to access the correct
+       pointers in the output vector (described with pcre_exec() below). To do
+       the  conversion,  you  need  to  use  the  name-to-number map, which is
+       described by these three values.
+
+       The map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT
+       gives the number of entries, and PCRE_INFO_NAMEENTRYSIZE gives the size
+       of each entry; both of these  return  an  int  value.  The  entry  size
+       depends  on the length of the longest name. PCRE_INFO_NAMETABLE returns
+       a pointer to the first entry of the table  (a  pointer  to  char).  The
+       first two bytes of each entry are the number of the capturing parenthe-
+       sis, most significant byte first. The rest of the entry is  the  corre-
+       sponding name, zero terminated.
+
+       The  names are in alphabetical order. Duplicate names may appear if (?|
+       is used to create multiple groups with the same number, as described in
+       the  section  on  duplicate subpattern numbers in the pcrepattern page.
+       Duplicate names for subpatterns with different  numbers  are  permitted
+       only  if  PCRE_DUPNAMES  is  set. In all cases of duplicate names, they
+       appear in the table in the order in which they were found in  the  pat-
+       tern.  In  the  absence  of (?| this is the order of increasing number;
+       when (?| is used this is not necessarily the case because later subpat-
+       terns may have lower numbers.
+
+       As  a  simple  example of the name/number table, consider the following
+       pattern (assume PCRE_EXTENDED is set, so white space -  including  new-
+       lines - is ignored):
+
+         (?<date> (?<year>(\d\d)?\d\d) -
+         (?<month>\d\d) - (?<day>\d\d) )
+
+       There are four named subpatterns, so the table has four entries, and
+       each entry in the table is eight bytes long. The table is as follows,
+       with non-printing bytes shown in hexadecimal, and undefined bytes shown
+       as ??:
+
+         00 01 d  a  t  e  00 ??
+         00 05 d  a  y  00 ?? ??
+         00 04 m  o  n  t  h  00
+         00 02 y  e  a  r  00 ??
+
+       When writing code to extract data  from  named  subpatterns  using  the
+       name-to-number  map,  remember that the length of the entries is likely
+       to be different for each compiled pattern.
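+
+       The sketch below looks up a name in this map. In a real program,
+       table, count and entrysize would come from pcre_fullinfo() with
+       PCRE_INFO_NAMETABLE, PCRE_INFO_NAMECOUNT and PCRE_INFO_NAMEENTRYSIZE.
+       Here they are plain parameters, so the example does not need the
+       library. The name name_to_number() is hypothetical. PCRE itself
+       provides pcre_get_stringnumber() for this job.
+
```c
#include <stddef.h>
#include <string.h>

/* Look up name in a name-to-number map of count entries, each entrysize
   bytes: a 2-byte group number (most significant byte first) followed by
   the zero-terminated name. Returns the group number, or -1 if absent.
   With duplicate names, the first matching entry wins. */
int name_to_number(const unsigned char *table, int count, int entrysize,
                   const char *name)
{
    for (int i = 0; i < count; i++) {
        const unsigned char *entry = table + (size_t)i * entrysize;
        if (strcmp((const char *)(entry + 2), name) == 0)
            return (entry[0] << 8) | entry[1];
    }
    return -1;
}
```
+
+       Called with the table above, "day" yields 5 and "month" yields 4.
+       Because the entries are sorted by name, a binary search would also
+       work; the sketch uses a linear scan to stay short.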
+
+         PCRE_INFO_OKPARTIAL
+
+       Return 1  if  the  pattern  can  be  used  for  partial  matching  with
+       pcre_exec(),  otherwise  0.  The fourth argument should point to an int
+       variable. From  release  8.00,  this  always  returns  1,  because  the
+       restrictions  that  previously  applied  to  partial matching have been
+       lifted. The pcrepartial documentation gives details of  partial  match-
+       ing.
+
+         PCRE_INFO_OPTIONS
+
+       Return  a  copy of the options with which the pattern was compiled. The
+       fourth argument should point to an unsigned long  int  variable.  These
+       option bits are those specified in the call to pcre_compile(), modified
+       by any top-level option settings at the start of the pattern itself. In
+       other  words,  they are the options that will be in force when matching
+       starts. For example, if the pattern /(?im)abc(?-i)d/ is  compiled  with
+       the  PCRE_EXTENDED option, the result is PCRE_CASELESS, PCRE_MULTILINE,
+       and PCRE_EXTENDED.
+
+       A pattern is automatically anchored by PCRE if  all  of  its  top-level
+       alternatives begin with one of the following:
+
+         ^     unless PCRE_MULTILINE is set
+         \A    always
+         \G    always
+         .*    if PCRE_DOTALL is set and there are no back
+                 references to the subpattern in which .* appears
+
+       For such patterns, the PCRE_ANCHORED bit is set in the options returned
+       by pcre_fullinfo().
+
+         PCRE_INFO_SIZE
+
+       Return the size of the compiled pattern, that is, the  value  that  was
+       passed as the argument to pcre_malloc() when PCRE was getting memory in
+       which to place the compiled data. The fourth argument should point to a
+       size_t variable.
+
+         PCRE_INFO_STUDYSIZE
+
+       Return the size of the data block pointed to by the study_data field in
+       a pcre_extra block. That is,  it  is  the  value  that  was  passed  to
+       pcre_malloc() when PCRE was getting memory into which to place the data
+       created by pcre_study(). If pcre_extra is NULL, or there  is  no  study
+       data,  zero  is  returned. The fourth argument should point to a size_t
+       variable.
+
+
+OBSOLETE INFO FUNCTION
+
+       int pcre_info(const pcre *code, int *optptr, int *firstcharptr);
+
+       The pcre_info() function is now obsolete because its interface  is  too
+       restrictive  to return all the available data about a compiled pattern.
+       New  programs  should  use  pcre_fullinfo()  instead.  The   yield   of
+       pcre_info()  is the number of capturing subpatterns, or one of the fol-
+       lowing negative numbers:
+
+         PCRE_ERROR_NULL       the argument code was NULL
+         PCRE_ERROR_BADMAGIC   the "magic number" was not found
+
+       If the optptr argument is not NULL, a copy of the  options  with  which
+       the  pattern  was  compiled  is placed in the integer it points to (see
+       PCRE_INFO_OPTIONS above).
+
+       If the pattern is not anchored and the  firstcharptr  argument  is  not
+       NULL,  it is used to pass back information about the first character of
+       any matched string (see PCRE_INFO_FIRSTBYTE above).
+
+
+REFERENCE COUNTS
+
+       int pcre_refcount(pcre *code, int adjust);
+
+       The pcre_refcount() function is used to maintain a reference  count  in
+       the data block that contains a compiled pattern. It is provided for the
+       benefit of applications that  operate  in  an  object-oriented  manner,
+       where different parts of the application may be using the same compiled
+       pattern, but you want to free the block when they are all done.
+
+       When a pattern is compiled, the reference count field is initialized to
+       zero.   It is changed only by calling this function, whose action is to
+       add the adjust value (which may be positive or  negative)  to  it.  The
+       yield of the function is the new value. However, the value of the count
+       is constrained to lie between 0 and 65535, inclusive. If the new  value
+       is outside these limits, it is forced to the appropriate limit value.
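+
+       The arithmetic described above amounts to a clamped addition, as in
+       the model below. It is not the library's code, and the name
+       refcount_adjust() is hypothetical.
+
```c
/* Model of the documented pcre_refcount() arithmetic: add adjust to
   *count, clamp the result to 0..65535, store it and return it. */
int refcount_adjust(int *count, int adjust)
{
    long long v = (long long)*count + adjust;   /* avoid int overflow */
    if (v < 0) v = 0;
    else if (v > 65535) v = 65535;
    *count = (int)v;
    return *count;
}
```
+
+       In an application, each part that takes a share of a compiled pattern
+       would typically call pcre_refcount(re, 1), and each part that releases
+       it would call pcre_refcount(re, -1). The part that sees the returned
+       value reach zero frees the pattern with pcre_free().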
+
+       Except  when it is zero, the reference count is not correctly preserved
+       if a pattern is compiled on one host and then  transferred  to  a  host
+       whose byte-order is different. (This seems a highly unlikely scenario.)
+
+
+MATCHING A PATTERN: THE TRADITIONAL FUNCTION
+
+       int pcre_exec(const pcre *code, const pcre_extra *extra,
+            const char *subject, int length, int startoffset,
+            int options, int *ovector, int ovecsize);
+
+       The  function pcre_exec() is called to match a subject string against a
+       compiled pattern, which is passed in the code argument. If the  pattern
+       was  studied,  the  result  of  the study should be passed in the extra
+       argument. This function is the main matching facility of  the  library,
+       and it operates in a Perl-like manner. For specialist use there is also
+       an alternative matching function, which is described below in the  sec-
+       tion about the pcre_dfa_exec() function.
+
+       In  most applications, the pattern will have been compiled (and option-
+       ally studied) in the same process that calls pcre_exec().  However,  it
+       is possible to save compiled patterns and study data, and then use them
+       later in different processes, possibly even on different hosts.  For  a
+       discussion about this, see the pcreprecompile documentation.
+
+       Here is an example of a simple call to pcre_exec():
+
+         int rc;
+         int ovector[30];
+         rc = pcre_exec(
+           re,             /* result of pcre_compile() */
+           NULL,           /* we didn't study the pattern */
+           "some string",  /* the subject string */
+           11,             /* the length of the subject string */
+           0,              /* start at offset 0 in the subject */
+           0,              /* default options */
+           ovector,        /* vector of integers for substring information */
+           30);            /* number of elements (NOT size in bytes) */
+
+   Extra data for pcre_exec()
+
+       If  the  extra argument is not NULL, it must point to a pcre_extra data
+       block. The pcre_study() function returns such a block (when it  doesn't
+       return  NULL), but you can also create one for yourself, and pass addi-
+       tional information in it. The pcre_extra block contains  the  following
+       fields (not necessarily in this order):
+
+         unsigned long int flags;
+         void *study_data;
+         unsigned long int match_limit;
+         unsigned long int match_limit_recursion;
+         void *callout_data;
+         const unsigned char *tables;
+
+       The  flags  field  is a bitmap that specifies which of the other fields
+       are set. The flag bits are:
+
+         PCRE_EXTRA_STUDY_DATA
+         PCRE_EXTRA_MATCH_LIMIT
+         PCRE_EXTRA_MATCH_LIMIT_RECURSION
+         PCRE_EXTRA_CALLOUT_DATA
+         PCRE_EXTRA_TABLES
+
+       Other flag bits should be set to zero. The study_data field is  set  in
+       the  pcre_extra  block  that is returned by pcre_study(), together with
+       the appropriate flag bit. You should not set this yourself, but you may
+       add  to  the  block by setting the other fields and their corresponding
+       flag bits.
+
+       The match_limit field provides a means of preventing PCRE from using up
+       a  vast amount of resources when running patterns that are not going to
+       match, but which have a very large number  of  possibilities  in  their
+       search  trees. The classic example is a pattern that uses nested unlim-
+       ited repeats.
+
+       Internally, PCRE uses a function called match() which it calls  repeat-
+       edly  (sometimes  recursively). The limit set by match_limit is imposed
+       on the number of times this function is called during  a  match,  which
+       has  the  effect  of  limiting the amount of backtracking that can take
+       place. For patterns that are not anchored, the count restarts from zero
+       for each position in the subject string.
+
+       The  default  value  for  the  limit can be set when PCRE is built; the
+       default default is 10 million, which handles all but the  most  extreme
+       cases.  You  can  override  the  default by supplying pcre_exec() with a
+       pcre_extra    block    in    which    match_limit    is    set,     and
+       PCRE_EXTRA_MATCH_LIMIT  is  set  in  the  flags  field. If the limit is
+       exceeded, pcre_exec() returns PCRE_ERROR_MATCHLIMIT.
+
+       The match_limit_recursion field is similar to match_limit, but  instead
+       of limiting the total number of times that match() is called, it limits
+       the depth of recursion. The recursion depth is a  smaller  number  than
+       the  total number of calls, because not all calls to match() are recur-
+       sive.  This limit is of use only if it is set smaller than match_limit.
+
+       Limiting the recursion depth limits the amount of  stack  that  can  be
+       used, or, when PCRE has been compiled to use memory on the heap instead
+       of the stack, the amount of heap memory that can be used.
+
+       The default value for match_limit_recursion can be  set  when  PCRE  is
+       built;  the  default  default  is  the  same  value  as the default for
+       match_limit. You can override the default by supplying pcre_exec()  with
+       a   pcre_extra   block  in  which  match_limit_recursion  is  set,  and
+       PCRE_EXTRA_MATCH_LIMIT_RECURSION is set in  the  flags  field.  If  the
+       limit is exceeded, pcre_exec() returns PCRE_ERROR_RECURSIONLIMIT.
+
+       The  callout_data  field is used in conjunction with the "callout" fea-
+       ture, and is described in the pcrecallout documentation.
+
+       The tables field  is  used  to  pass  a  character  tables  pointer  to
+       pcre_exec();  this overrides the value that is stored with the compiled
+       pattern. A non-NULL value is stored with the compiled pattern  only  if
+       custom  tables  were  supplied to pcre_compile() via its tableptr argu-
+       ment.  If NULL is passed to pcre_exec() using this mechanism, it forces
+       PCRE's  internal  tables  to be used. This facility is helpful when re-
+       using patterns that have been saved after compiling  with  an  external
+       set  of  tables,  because  the  external tables might be at a different
+       address when pcre_exec() is called. See the  pcreprecompile  documenta-
+       tion for a discussion of saving compiled patterns for later use.
+
+   Option bits for pcre_exec()
+
+       The  unused  bits of the options argument for pcre_exec() must be zero.
+       The only bits that may  be  set  are  PCRE_ANCHORED,  PCRE_NEWLINE_xxx,
+       PCRE_NOTBOL,    PCRE_NOTEOL,    PCRE_NOTEMPTY,   PCRE_NOTEMPTY_ATSTART,
+       PCRE_NO_START_OPTIMIZE,  PCRE_NO_UTF8_CHECK,   PCRE_PARTIAL_SOFT,   and
+       PCRE_PARTIAL_HARD.
+
+         PCRE_ANCHORED
+
+       The  PCRE_ANCHORED  option  limits pcre_exec() to matching at the first
+       matching position. If a pattern was  compiled  with  PCRE_ANCHORED,  or
+       turned  out to be anchored by virtue of its contents, it cannot be made
+       unanchored at matching time.
+
+         PCRE_BSR_ANYCRLF
+         PCRE_BSR_UNICODE
+
+       These options (which are mutually exclusive) control what the \R escape
+       sequence  matches.  The choice is either to match only CR, LF, or CRLF,
+       or to match any Unicode newline sequence. These  options  override  the
+       choice that was made or defaulted when the pattern was compiled.
+
+         PCRE_NEWLINE_CR
+         PCRE_NEWLINE_LF
+         PCRE_NEWLINE_CRLF
+         PCRE_NEWLINE_ANYCRLF
+         PCRE_NEWLINE_ANY
+
+       These  options  override  the  newline  definition  that  was chosen or
+       defaulted when the pattern was compiled. For details, see the  descrip-
+       tion  of  pcre_compile()  above.  During  matching,  the newline choice
+       affects the behaviour of the dot, circumflex,  and  dollar  metacharac-
+       ters.  It may also alter the way the match position is advanced after a
+       match failure for an unanchored pattern.
+
+       When PCRE_NEWLINE_CRLF, PCRE_NEWLINE_ANYCRLF,  or  PCRE_NEWLINE_ANY  is
+       set,  and a match attempt for an unanchored pattern fails when the cur-
+       rent position is at a  CRLF  sequence,  and  the  pattern  contains  no
+       explicit  matches  for  CR  or  LF  characters,  the  match position is
+       advanced by two characters instead of one, in other words, to after the
+       CRLF.
+
+       The above rule is a compromise that makes the most common cases work as
+       expected. For example, if the  pattern  is  .+A  (and  the  PCRE_DOTALL
+       option is not set), it does not match the string "\r\nA" because, after
+       failing at the start, it skips both the CR and the LF before  retrying.
+       However,  the  pattern  [\r\n]A does match that string, because it con-
+       tains an explicit CR or LF reference, and so advances only by one char-
+       acter after the first failure.
+
+       An explicit match for CR or LF is either a literal appearance of one of
+       those characters, or one of the \r or  \n  escape  sequences.  Implicit
+       matches  such  as [^X] do not count, nor does \s (which includes CR and
+       LF in the characters that it matches).
+
+       Notwithstanding the above, anomalous effects may still occur when  CRLF
+       is a valid newline sequence and explicit \r or \n escapes appear in the
+       pattern.
+
+         PCRE_NOTBOL
+
+       This option specifies that the first character of the subject string is
+       not the beginning of a line, so the circumflex metacharacter should not
+       match before it. Setting this without PCRE_MULTILINE (at compile  time)
+       causes  circumflex  never to match. This option affects only the behav-
+       iour of the circumflex metacharacter. It does not affect \A.
+
+         PCRE_NOTEOL
+
+       This option specifies that the end of the subject string is not the end
+       of  a line, so the dollar metacharacter should not match it nor (except
+       in multiline mode) a newline immediately before it. Setting this  with-
+       out PCRE_MULTILINE (at compile time) causes dollar never to match. This
+       option affects only the behaviour of the dollar metacharacter. It  does
+       not affect \Z or \z.
+
+         PCRE_NOTEMPTY
+
+       An empty string is not considered to be a valid match if this option is
+       set. If there are alternatives in the pattern, they are tried.  If  all
+       the  alternatives  match  the empty string, the entire match fails. For
+       example, if the pattern
+
+         a?b?
+
+       is applied to a string not beginning with "a" or  "b",  it  matches  an
+       empty  string at the start of the subject. With PCRE_NOTEMPTY set, this
+       match is not valid, so PCRE searches further into the string for occur-
+       rences of "a" or "b".
+
+         PCRE_NOTEMPTY_ATSTART
+
+       This  is  like PCRE_NOTEMPTY, except that an empty string match that is
+       not at the start of  the  subject  is  permitted.  If  the  pattern  is
+       anchored, such a match can occur only if the pattern contains \K.
+
+       Perl     has    no    direct    equivalent    of    PCRE_NOTEMPTY    or
+       PCRE_NOTEMPTY_ATSTART, but it does make a special  case  of  a  pattern
+       match  of  the empty string within its split() function, and when using
+       the /g modifier. It is  possible  to  emulate  Perl's  behaviour  after
+       matching a null string by first trying the match again at the same off-
+       set with PCRE_NOTEMPTY_ATSTART and  PCRE_ANCHORED,  and  then  if  that
+       fails, by advancing the starting offset (see below) and trying an ordi-
+       nary match again. There is some code that demonstrates how to  do  this
+       in the pcredemo sample program.
+
+         PCRE_NO_START_OPTIMIZE
+
+       There  are a number of optimizations that pcre_exec() uses at the start
+       of a match, in order to speed up the process. For  example,  if  it  is
+       known  that  a  match must start with a specific character, it searches
+       the subject for that character, and fails immediately if it cannot find
+       it,  without actually running the main matching function. When callouts
+       are in use, these optimizations can cause  them  to  be  skipped.  This
+       option  disables  the  "start-up" optimizations, causing performance to
+       suffer, but ensuring that the callouts do occur.
+
+         PCRE_NO_UTF8_CHECK
+
+       When PCRE_UTF8 is set at compile time, the validity of the subject as a
+       UTF-8  string is automatically checked when pcre_exec() is subsequently
+       called.  The value of startoffset is also checked  to  ensure  that  it
+       points  to  the start of a UTF-8 character. There is a discussion about
+       the validity of UTF-8 strings in the section on UTF-8  support  in  the
+       main  pcre  page.  If  an  invalid  UTF-8  sequence  of bytes is found,
+       pcre_exec() returns the error PCRE_ERROR_BADUTF8. If  startoffset  con-
+       tains an invalid value, PCRE_ERROR_BADUTF8_OFFSET is returned.
+
+       If  you  already  know that your subject is valid, and you want to skip
+       these   checks   for   performance   reasons,   you   can    set    the
+       PCRE_NO_UTF8_CHECK  option  when calling pcre_exec(). You might want to
+       do this for the second and subsequent calls to pcre_exec() if  you  are
+       making  repeated  calls  to  find  all  the matches in a single subject
+       string. However, you should be  sure  that  the  value  of  startoffset
+       points  to  the  start of a UTF-8 character. When PCRE_NO_UTF8_CHECK is
+       set, the effect of passing an invalid UTF-8 string as a subject,  or  a
+       value  of startoffset that does not point to the start of a UTF-8 char-
+       acter, is undefined. Your program may crash.
+
+         PCRE_PARTIAL_HARD
+         PCRE_PARTIAL_SOFT
+
+       These options turn on the partial matching feature. For backwards  com-
+       patibility,  PCRE_PARTIAL is a synonym for PCRE_PARTIAL_SOFT. A partial
+       match occurs if the end of the subject string is reached  successfully,
+       but  there  are not enough subject characters to complete the match. If
+       this happens when PCRE_PARTIAL_HARD  is  set,  pcre_exec()  immediately
+       returns  PCRE_ERROR_PARTIAL.  Otherwise,  if  PCRE_PARTIAL_SOFT is set,
+       matching continues by testing any other alternatives. Only if they  all
+       fail  is  PCRE_ERROR_PARTIAL  returned (instead of PCRE_ERROR_NOMATCH).
+       The portion of the string that was inspected when the partial match was
+       found  is  set  as  the first matching string. There is a more detailed
+       discussion in the pcrepartial documentation.
+
+   The string to be matched by pcre_exec()
+
+       The subject string is passed to pcre_exec() as a pointer in subject,  a
+       length (in bytes) in length, and a starting byte offset in startoffset.
+       In UTF-8 mode, the byte offset must point to the start of a UTF-8 char-
+       acter.  Unlike  the pattern string, the subject may contain binary zero
+       bytes. When the starting offset is zero, the search for a match  starts
+       at  the  beginning  of  the subject, and this is by far the most common
+       case.
+
+       A non-zero starting offset is useful when searching for  another  match
+       in  the same subject by calling pcre_exec() again after a previous suc-
+       cess.  Setting startoffset differs from just passing over  a  shortened
+       string  and  setting  PCRE_NOTBOL  in the case of a pattern that begins
+       with any kind of lookbehind. For example, consider the pattern
+
+         \Biss\B
+
+       which finds occurrences of "iss" in the middle of  words.  (\B  matches
+       only  if  the  current position in the subject is not a word boundary.)
+       When applied to the string "Mississippi" the first call  to  pcre_exec()
+       finds  the  first  occurrence. If pcre_exec() is called again with just
+       the remainder of the subject,  namely  "issippi",  it  does  not  match,
+       because \B is always false at the start of the subject, which is deemed
+       to be a word boundary. However, if pcre_exec()  is  passed  the  entire
+       string again, but with startoffset set to 4, it finds the second occur-
+       rence of "iss" because it is able to look behind the starting point  to
+       discover that it is preceded by a letter.
+
+       If  a  non-zero starting offset is passed when the pattern is anchored,
+       one attempt to match at the given offset is made. This can only succeed
+       if  the  pattern  does  not require the match to be at the start of the
+       subject.
+
+   How pcre_exec() returns captured substrings
+
+       In general, a pattern matches a certain portion of the subject, and  in
+       addition,  further  substrings  from  the  subject may be picked out by
+       parts of the pattern. Following the usage  in  Jeffrey  Friedl's  book,
+       this  is  called "capturing" in what follows, and the phrase "capturing
+       subpattern" is used for a fragment of a pattern that picks out  a  sub-
+       string.  PCRE  supports several other kinds of parenthesized subpattern
+       that do not cause substrings to be captured.
+
+       Captured substrings are returned to the caller via a vector of integers
+       whose  address is passed in ovector. The number of elements in the vec-
+       tor is passed in ovecsize, which must be a non-negative  number.  Note:
+       this argument is NOT the size of ovector in bytes.
+
+       The  first  two-thirds of the vector is used to pass back captured sub-
+       strings, each substring using a pair of integers. The  remaining  third
+       of  the  vector is used as workspace by pcre_exec() while matching cap-
+       turing subpatterns, and is not available for passing back  information.
+       The  number passed in ovecsize should always be a multiple of three. If
+       it is not, it is rounded down.
+
+       When a match is successful, information about  captured  substrings  is
+       returned  in  pairs  of integers, starting at the beginning of ovector,
+       and continuing up to two-thirds of its length at the  most.  The  first
+       element  of  each pair is set to the byte offset of the first character
+       in a substring, and the second is set to the byte offset of  the  first
+       character  after  the end of a substring. Note: these values are always
+       byte offsets, even in UTF-8 mode. They are not character counts.
+
+       The first pair of integers, ovector[0]  and  ovector[1],  identify  the
+       portion  of  the subject string matched by the entire pattern. The next
+       pair is used for the first capturing subpattern, and so on.  The  value
+       returned by pcre_exec() is one more than the highest numbered pair that
+       has been set.  For example, if two substrings have been  captured,  the
+       returned  value is 3. If there are no capturing subpatterns, the return
+       value from a successful match is 1, indicating that just the first pair
+       of offsets has been set.
+
+       If a capturing subpattern is matched repeatedly, it is the last portion
+       of the string that it matched that is returned.
+
+       If the vector is too small to hold all the captured substring  offsets,
+       it is used as far as possible (up to two-thirds of its length), and the
+       function returns a value of zero. If the substring offsets are  not  of
+       interest,  pcre_exec()  may  be  called with ovector passed as NULL and
+       ovecsize as zero. However, if the pattern contains back references  and
+       the  ovector is not big enough to remember the related substrings, PCRE
+       has to get additional memory for use during matching. Thus it  is  usu-
+       ally advisable to supply an ovector.
+
+       The pcre_fullinfo() function can be used to find out how many capturing
+       subpatterns there are in a compiled  pattern.  The  smallest  size  for
+       ovector  that  will allow for n captured substrings, in addition to the
+       offsets of the substring matched by the whole pattern, is (n+1)*3.
+
+       It is possible for capturing subpattern number n+1 to match  some  part
+       of the subject when subpattern n has not been used at all. For example,
+       if the string "abc" is matched  against  the  pattern  (a|(z))(bc)  the
+       return from the function is 4, and subpatterns 1 and 3 are matched, but
+       2 is not. When this happens, both values in  the  offset  pairs  corre-
+       sponding to unused subpatterns are set to -1.
+
+       Offset  values  that correspond to unused subpatterns at the end of the
+       expression are also set to -1. For example,  if  the  string  "abc"  is
+       matched  against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are not
+       matched. The return from the function is 2, because  the  highest  used
+       capturing subpattern number is 1. However, you can refer to the offsets
+       for the second and third capturing subpatterns if  you  wish  (assuming
+       the vector is large enough, of course).
+
+       Some  convenience  functions  are  provided for extracting the captured
+       substrings as separate strings. These are described below.
+
+   Error return values from pcre_exec()
+
+       If pcre_exec() fails, it returns a negative number. The  following  are
+       defined in the header file:
+
+         PCRE_ERROR_NOMATCH        (-1)
+
+       The subject string did not match the pattern.
+
+         PCRE_ERROR_NULL           (-2)
+
+       Either  code  or  subject  was  passed as NULL, or ovector was NULL and
+       ovecsize was not zero.
+
+         PCRE_ERROR_BADOPTION      (-3)
+
+       An unrecognized bit was set in the options argument.
+
+         PCRE_ERROR_BADMAGIC       (-4)
+
+       PCRE stores a 4-byte "magic number" at the start of the compiled  code,
+       to catch the case when it is passed a junk pointer and to detect when a
+       pattern that was compiled in an environment of one endianness is run in
+       an  environment  with the other endianness. This is the error that PCRE
+       gives when the magic number is not present.
+
+         PCRE_ERROR_UNKNOWN_OPCODE (-5)
+
+       While running the pattern match, an unknown item was encountered in the
+       compiled  pattern.  This  error  could be caused by a bug in PCRE or by
+       overwriting of the compiled pattern.
+
+         PCRE_ERROR_NOMEMORY       (-6)
+
+       If a pattern contains back references, but the ovector that  is  passed
+       to pcre_exec() is not big enough to remember the referenced substrings,
+       PCRE gets a block of memory at the start of matching to  use  for  this
+       purpose.  If the call via pcre_malloc() fails, this error is given. The
+       memory is automatically freed at the end of matching.
+
+         PCRE_ERROR_NOSUBSTRING    (-7)
+
+       This error is used by the pcre_copy_substring(),  pcre_get_substring(),
+       and  pcre_get_substring_list()  functions  (see  below).  It  is  never
+       returned by pcre_exec().
+
+         PCRE_ERROR_MATCHLIMIT     (-8)
+
+       The backtracking limit, as specified by  the  match_limit  field  in  a
+       pcre_extra  structure  (or  defaulted) was reached. See the description
+       above.
+
+         PCRE_ERROR_CALLOUT        (-9)
+
+       This error is never generated by pcre_exec() itself. It is provided for
+       use  by  callout functions that want to yield a distinctive error code.
+       See the pcrecallout documentation for details.
+
+         PCRE_ERROR_BADUTF8        (-10)
+
+       A string that contains an invalid UTF-8 byte sequence was passed  as  a
+       subject.
+
+         PCRE_ERROR_BADUTF8_OFFSET (-11)
+
+       The UTF-8 byte sequence that was passed as a subject was valid, but the
+       value of startoffset did not point to the beginning of a UTF-8  charac-
+       ter.
+
+         PCRE_ERROR_PARTIAL        (-12)
+
+       The  subject  string did not match, but it did match partially. See the
+       pcrepartial documentation for details of partial matching.
+
+         PCRE_ERROR_BADPARTIAL     (-13)
+
+       This code is no longer in  use.  It  was  formerly  returned  when  the
+       PCRE_PARTIAL  option  was used with a compiled pattern containing items
+       that were  not  supported  for  partial  matching.  From  release  8.00
+       onwards, there are no restrictions on partial matching.
+
+         PCRE_ERROR_INTERNAL       (-14)
+
+       An  unexpected  internal error has occurred. This error could be caused
+       by a bug in PCRE or by overwriting of the compiled pattern.
+
+         PCRE_ERROR_BADCOUNT       (-15)
+
+       This error is given if the value of the ovecsize argument is negative.
+
+         PCRE_ERROR_RECURSIONLIMIT (-21)
+
+       The internal recursion limit, as specified by the match_limit_recursion
+       field  in  a  pcre_extra  structure (or defaulted) was reached. See the
+       description above.
+
+         PCRE_ERROR_BADNEWLINE     (-23)
+
+       An invalid combination of PCRE_NEWLINE_xxx options was given.
+
+       Error numbers -16 to -20 and -22 are not used by pcre_exec().
+
+
+EXTRACTING CAPTURED SUBSTRINGS BY NUMBER
+
+       int pcre_copy_substring(const char *subject, int *ovector,
+            int stringcount, int stringnumber, char *buffer,
+            int buffersize);
+
+       int pcre_get_substring(const char *subject, int *ovector,
+            int stringcount, int stringnumber,
+            const char **stringptr);
+
+       int pcre_get_substring_list(const char *subject,
+            int *ovector, int stringcount, const char ***listptr);
+
+       Captured substrings can be  accessed  directly  by  using  the  offsets
+       returned  by  pcre_exec()  in  ovector.  For convenience, the functions
+       pcre_copy_substring(),    pcre_get_substring(),    and    pcre_get_sub-
+       string_list()  are  provided for extracting captured substrings as new,
+       separate, zero-terminated strings. These functions identify  substrings
+       by  number.  The  next section describes functions for extracting named
+       substrings.
+
+       A substring that contains a binary zero is correctly extracted and  has
+       a  further zero added on the end, but the result is not, of course, a C
+       string.  However, you can process such a string  by  referring  to  the
+       length  that  is  returned  by  pcre_copy_substring() and pcre_get_sub-
+       string().  Unfortunately, the interface to pcre_get_substring_list() is
+       not  adequate for handling strings containing binary zeros, because the
+       end of the final string is not independently indicated.
+
+       The first three arguments are the same for all  three  of  these  func-
+       tions:  subject  is  the subject string that has just been successfully
+       matched, ovector is a pointer to the vector of integer offsets that was
+       passed to pcre_exec(), and stringcount is the number of substrings that
+       were captured by the match, including the substring  that  matched  the
+       entire regular expression. This is the value returned by pcre_exec() if
+       it is greater than zero. If pcre_exec() returned zero, indicating  that
+       it  ran out of space in ovector, the value passed as stringcount should
+       be the number of elements in the vector divided by three.
+
+       The functions pcre_copy_substring() and pcre_get_substring() extract  a
+       single  substring,  whose  number  is given as stringnumber. A value of
+       zero extracts the substring that matched the  entire  pattern,  whereas
+       higher  values  extract  the  captured  substrings.  For pcre_copy_sub-
+       string(), the string is placed in buffer,  whose  length  is  given  by
+       buffersize,  while  for  pcre_get_substring()  a new block of memory is
+       obtained via pcre_malloc, and its address is  returned  via  stringptr.
+       The  yield  of  the function is the length of the string, not including
+       the terminating zero, or one of these error codes:
+
+         PCRE_ERROR_NOMEMORY       (-6)
+
+       The buffer was too small for pcre_copy_substring(), or the  attempt  to
+       get memory failed for pcre_get_substring().
+
+         PCRE_ERROR_NOSUBSTRING    (-7)
+
+       There is no substring whose number is stringnumber.
+
+       The  pcre_get_substring_list()  function  extracts  all  available sub-
+       strings and builds a list of pointers to them. All this is  done  in  a
+       single block of memory that is obtained via pcre_malloc. The address of
+       the memory block is returned via listptr, which is also  the  start  of
+       the  list  of  string pointers. The end of the list is marked by a NULL
+       pointer. The yield of the function is zero if all  went  well,  or  the
+       error code
+
+         PCRE_ERROR_NOMEMORY       (-6)
+
+       if the attempt to get the memory block failed.
+
+       When  any of these functions encounters a substring that is unset, which
+       can happen when capturing subpattern number n+1 matches  some  part  of
+       the  subject, but subpattern n has not been used at all, they return an
+       empty string. This can be distinguished from a genuine zero-length sub-
+       string  by inspecting the appropriate offset in ovector, which is nega-
+       tive for unset substrings.
+
+       The two convenience functions pcre_free_substring() and  pcre_free_sub-
+       string_list()  can  be  used  to free the memory returned by a previous
+       call  of  pcre_get_substring()  or  pcre_get_substring_list(),  respec-
+       tively.  They  do  nothing  more  than  call the function pointed to by
+       pcre_free, which of course could be called directly from a  C  program.
+       However,  PCRE is used in some situations where it is linked via a spe-
+       cial  interface  to  another  programming  language  that  cannot   use
+       pcre_free  directly;  it is for these cases that the functions are pro-
+       vided.
+
+
+EXTRACTING CAPTURED SUBSTRINGS BY NAME
+
+       int pcre_get_stringnumber(const pcre *code,
+            const char *name);
+
+       int pcre_copy_named_substring(const pcre *code,
+            const char *subject, int *ovector,
+            int stringcount, const char *stringname,
+            char *buffer, int buffersize);
+
+       int pcre_get_named_substring(const pcre *code,
+            const char *subject, int *ovector,
+            int stringcount, const char *stringname,
+            const char **stringptr);
+
+       To extract a substring by name, you first have to find the associated
+       number.  For example, for this pattern
+
+         (a+)b(?<xxx>\d+)...
+
+       the number of the subpattern called "xxx" is 2. If the name is known to
+       be unique (PCRE_DUPNAMES was not set), you can find the number from the
+       name by calling pcre_get_stringnumber(). The first argument is the com-
+       piled pattern, and the second is the name. The yield of the function is
+       the  subpattern  number,  or PCRE_ERROR_NOSUBSTRING (-7) if there is no
+       subpattern of that name.
+
+       Given the number, you can extract the substring directly, or use one of
+       the functions described in the previous section. For convenience, there
+       are also two functions that do the whole job.
+
+       Most   of   the   arguments    of    pcre_copy_named_substring()    and
+       pcre_get_named_substring()  are  the  same  as  those for the similarly
+       named functions that extract by number. As these are described  in  the
+       previous  section,  they  are not re-described here. There are just two
+       differences:
+
+       First, instead of a substring number, a substring name is  given.  Sec-
+       ond, there is an extra argument, given at the start, which is a pointer
+       to the compiled pattern. This is needed in order to gain access to  the
+       name-to-number translation table.
+
+       These  functions call pcre_get_stringnumber(), and if it succeeds, they
+       then call pcre_copy_substring() or pcre_get_substring(),  as  appropri-
+       ate.  NOTE:  If PCRE_DUPNAMES is set and there are duplicate names, the
+       behaviour may not be what you want (see the next section).
+
+       Warning: If the pattern uses the (?| feature to set up multiple subpat-
+       terns  with  the  same number, as described in the section on duplicate
+       subpattern numbers in the pcrepattern page, you  cannot  use  names  to
+       distinguish  the  different subpatterns, because names are not included
+       in the compiled code. The matching process uses only numbers. For  this
+       reason,  the  use of different names for subpatterns of the same number
+       causes an error at compile time.
+
+
+DUPLICATE SUBPATTERN NAMES
+
+       int pcre_get_stringtable_entries(const pcre *code,
+            const char *name, char **first, char **last);
+
+       When a pattern is compiled with the  PCRE_DUPNAMES  option,  names  for
+       subpatterns  are not required to be unique. (Duplicate names are always
+       allowed for subpatterns with the same number, created by using the  (?|
+       feature.  Indeed,  if  such subpatterns are named, they are required to
+       use the same names.)
+
+       Normally, patterns with duplicate names are such that in any one match,
+       only  one of the named subpatterns participates. An example is shown in
+       the pcrepattern documentation.
+
+       When   duplicates   are   present,   pcre_copy_named_substring()    and
+       pcre_get_named_substring()  return the first substring corresponding to
+       the given name that is set. If  none  are  set,  PCRE_ERROR_NOSUBSTRING
+       (-7)  is  returned;  no  data  is returned. The pcre_get_stringnumber()
+       function returns one of the numbers that are associated with the  name,
+       but it is not defined which it is.
+
+       If  you want to get full details of all captured substrings for a given
+       name, you must use  the  pcre_get_stringtable_entries()  function.  The
+       first argument is the compiled pattern, and the second is the name. The
+       third and fourth are pointers to variables which  are  updated  by  the
+       function. After it has run, they point to the first and last entries in
+       the name-to-number table  for  the  given  name.  The  function  itself
+       returns  the  length  of  each entry, or PCRE_ERROR_NOSUBSTRING (-7) if
+       there are none. The format of the table is described above in the  sec-
+       tion  entitled  Information  about  a  pattern.  Given all the relevant
+       entries for the name, you can extract each of their numbers, and  hence
+       the captured data, if any.
+
+
+FINDING ALL POSSIBLE MATCHES
+
+       The  traditional  matching  function  uses a similar algorithm to Perl,
+       which stops when it finds the first match, starting at a given point in
+       the  subject.  If you want to find all possible matches, or the longest
+       possible match, consider using the alternative matching  function  (see
+       below)  instead.  If you cannot use the alternative function, but still
+       need to find all possible matches, you can kludge it up by  making  use
+       of the callout facility, which is described in the pcrecallout documen-
+       tation.
+
+       What you have to do is to insert a callout right at the end of the pat-
+       tern.   When your callout function is called, extract and save the cur-
+       rent matched substring. Then return  1,  which  forces  pcre_exec()  to
+       backtrack  and  try other alternatives. Ultimately, when it runs out of
+       matches, pcre_exec() will yield PCRE_ERROR_NOMATCH.
+
+
+MATCHING A PATTERN: THE ALTERNATIVE FUNCTION
+
+       int pcre_dfa_exec(const pcre *code, const pcre_extra *extra,
+            const char *subject, int length, int startoffset,
+            int options, int *ovector, int ovecsize,
+            int *workspace, int wscount);
+
+       The function pcre_dfa_exec()  is  called  to  match  a  subject  string
+       against  a  compiled pattern, using a matching algorithm that scans the
+       subject string just once, and does not backtrack.  This  has  different
+       characteristics  to  the  normal  algorithm, and is not compatible with
+       Perl. Some of the features of PCRE patterns are not  supported.  Never-
+       theless,  there are times when this kind of matching can be useful. For
+       a discussion of the two matching algorithms, and  a  list  of  features
+       that  pcre_dfa_exec() does not support, see the pcrematching documenta-
+       tion.
+
+       The arguments for the pcre_dfa_exec() function  are  the  same  as  for
+       pcre_exec(), plus two extras. The ovector argument is used in a differ-
+       ent way, and this is described below. The other  common  arguments  are
+       used  in  the  same way as for pcre_exec(), so their description is not
+       repeated here.
+
+       The two additional arguments provide workspace for  the  function.  The
+       workspace  vector  should  contain at least 20 elements. It is used for
+       keeping  track  of  multiple  paths  through  the  pattern  tree.  More
+       workspace  will  be  needed for patterns and subjects where there are a
+       lot of potential matches.
+
+       Here is an example of a simple call to pcre_dfa_exec():
+
+         int rc;
+         int ovector[10];
+         int wspace[20];
+         rc = pcre_dfa_exec(
+           re,             /* result of pcre_compile() */
+           NULL,           /* we didn't study the pattern */
+           "some string",  /* the subject string */
+           11,             /* the length of the subject string */
+           0,              /* start at offset 0 in the subject */
+           0,              /* default options */
+           ovector,        /* vector of integers for substring information */
+           10,             /* number of elements (NOT size in bytes) */
+           wspace,         /* working space vector */
+           20);            /* number of elements (NOT size in bytes) */
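       When the workspace is too small, pcre_dfa_exec() returns
       PCRE_ERROR_DFA_WSSIZE (described below). One way to cope, sketched
       here under stated assumptions, is to retry with a larger vector. The
       dfa_matcher type and run_with_growing_workspace() are hypothetical:
       in real code the matcher would wrap a call such as the one above,
       passing on the workspace and wscount it is given. The literals -19
       and -6 stand in for the PCRE_ERROR_DFA_WSSIZE and PCRE_ERROR_NOMEMORY
       macros so that the sketch compiles without pcre.h.

```c
#include <stdlib.h>

#define DFA_ERROR_WSSIZE   (-19)  /* PCRE_ERROR_DFA_WSSIZE */
#define DFA_ERROR_NOMEMORY (-6)   /* PCRE_ERROR_NOMEMORY */

/* Hypothetical callback: runs one match with the given workspace. */
typedef int (*dfa_matcher)(int *workspace, int wscount, void *ctx);

/* Run match() with a workspace of `initial` elements, doubling the size
   each time it reports that the workspace was too small, but never
   beyond `max` elements. Returns the matcher's last result, or
   DFA_ERROR_NOMEMORY if a workspace could not be allocated. */
static int run_with_growing_workspace(dfa_matcher match, void *ctx,
                                      int initial, int max)
{
    int wscount = initial;
    for (;;) {
        int *ws = malloc((size_t)wscount * sizeof *ws);
        if (ws == NULL)
            return DFA_ERROR_NOMEMORY;
        int rc = match(ws, wscount, ctx);
        free(ws);
        if (rc != DFA_ERROR_WSSIZE || wscount > max / 2)
            return rc;
        wscount *= 2;
    }
}
```

       Because this sketch frees the workspace after every call, it is not
       suitable when a partial match is to be continued with
       PCRE_DFA_RESTART, which requires the same workspace vector as the
       call that returned the partial match.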
+
+   Option bits for pcre_dfa_exec()
+
+       The unused bits of the options argument  for  pcre_dfa_exec()  must  be
+       zero.  The  only  bits  that  may  be  set are PCRE_ANCHORED, PCRE_NEW-
+       LINE_xxx,        PCRE_NOTBOL,        PCRE_NOTEOL,        PCRE_NOTEMPTY,
+       PCRE_NOTEMPTY_ATSTART, PCRE_NO_UTF8_CHECK, PCRE_PARTIAL_HARD, PCRE_PAR-
+       TIAL_SOFT, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All  but  the  last
+       four  of  these  are  exactly  the  same  as  for pcre_exec(), so their
+       description is not repeated here.
+
+         PCRE_PARTIAL_HARD
+         PCRE_PARTIAL_SOFT
+
+       These have the same general effect as they do for pcre_exec(), but  the
+       details  are  slightly  different.  When  PCRE_PARTIAL_HARD  is set for
+       pcre_dfa_exec(), it returns PCRE_ERROR_PARTIAL if the end of  the  sub-
+       ject  is  reached  and there is still at least one matching possibility
+       that requires additional characters. This happens even if some complete
+       matches have also been found. When PCRE_PARTIAL_SOFT is set, the return
+       code PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if the end
+       of  the  subject  is  reached, there have been no complete matches, but
+       there is still at least one matching possibility. The  portion  of  the
+       string  that  was inspected when the longest partial match was found is
+       set as the first matching string in both cases.
+
+         PCRE_DFA_SHORTEST
+
+       Setting the PCRE_DFA_SHORTEST option causes the matching  algorithm  to
+       stop as soon as it has found one match. Because of the way the alterna-
+       tive algorithm works, this is necessarily the shortest  possible  match
+       at the first possible matching point in the subject string.
+
+         PCRE_DFA_RESTART
+
+       When pcre_dfa_exec() returns a partial match, it is possible to call it
+       again, with additional subject characters, and have  it  continue  with
+       the  same match. The PCRE_DFA_RESTART option requests this action; when
+       it is set, the workspace and wscount arguments must reference the same
+       vector  as  before  because data about the match so far is left in them
+       after a partial match. There is more discussion of this facility in the
+       pcrepartial documentation.
+
+   Successful returns from pcre_dfa_exec()
+
+       When  pcre_dfa_exec()  succeeds, it may have matched more than one sub-
+       string in the subject. Note, however, that all the matches from one run
+       of  the  function  start  at the same point in the subject. The shorter
+       matches are all initial substrings of the longer matches. For  example,
+       if the pattern
+
+         <.*>
+
+       is matched against the string
+
+         This is <something> <something else> <something further> no more
+
+       the three matched strings are
+
+         <something>
+         <something> <something else>
+         <something> <something else> <something further>
+
+       On  success,  the  yield of the function is a number greater than zero,
+       which is the number of matched substrings.  The  substrings  themselves
+       are  returned  in  ovector. Each string uses two elements; the first is
+       the offset to the start, and the second is the offset to  the  end.  In
+       fact,  all  the  strings  have the same start offset. (Space could have
+       been saved by giving this only once, but it was decided to retain  some
+       compatibility  with  the  way pcre_exec() returns data, even though the
+       meaning of the strings is different.)
+
+       The strings are returned in reverse order of length; that is, the long-
+       est  matching  string is given first. If there were too many matches to
+       fit into ovector, the yield of the function is zero, and the vector  is
+       filled with the longest matches.
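       A minimal sketch of reading these results back, assuming (as stated
       above) that a zero return means every complete pair in ovector was
       filled. The helper names are hypothetical, not PCRE functions; they
       only interpret an ovector that pcre_dfa_exec() has already filled.

```c
#include <stddef.h>
#include <string.h>

/* Number of matches described by ovector after pcre_dfa_exec() returned
   rc, for an ovector of ovecsize ints. A negative rc is an error and
   describes no matches; zero means ovector was filled completely. */
static int dfa_match_count(int rc, int ovecsize)
{
    if (rc < 0)
        return 0;
    return rc > 0 ? rc : ovecsize / 2;
}

/* Copy match i (0 is the longest) into buf as a C string, truncating it
   if buf is too small. Returns the full length of the match. */
static int dfa_copy_match(const char *subject, const int *ovector, int i,
                          char *buf, size_t bufsize)
{
    int start = ovector[2 * i];
    int end = ovector[2 * i + 1];
    size_t len = (size_t)(end - start);
    if (bufsize > 0) {
        size_t n = len < bufsize - 1 ? len : bufsize - 1;
        memcpy(buf, subject + start, n);
        buf[n] = '\0';
    }
    return end - start;
}
```

       For the <.*> example above, ovector would hold 8, 56, 8, 36, 8, 19,
       and copying matches 0, 1, and 2 yields the three listed strings,
       longest first.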
+
+   Error returns from pcre_dfa_exec()
+
+       The  pcre_dfa_exec()  function returns a negative number when it fails.
+       Many of the errors are the same  as  for  pcre_exec(),  and  these  are
+       described  above.   There are in addition the following errors that are
+       specific to pcre_dfa_exec():
+
+         PCRE_ERROR_DFA_UITEM      (-16)
+
+       This return is given if pcre_dfa_exec() encounters an item in the  pat-
+       tern  that  it  does not support, for instance, the use of \C or a back
+       reference.
+
+         PCRE_ERROR_DFA_UCOND      (-17)
+
+       This return is given if pcre_dfa_exec()  encounters  a  condition  item
+       that  uses  a back reference for the condition, or a test for recursion
+       in a specific group. These are not supported.
+
+         PCRE_ERROR_DFA_UMLIMIT    (-18)
+
+       This return is given if pcre_dfa_exec() is called with an  extra  block
+       that contains a setting of the match_limit field. This is not supported
+       (it is meaningless).
+
+         PCRE_ERROR_DFA_WSSIZE     (-19)
+
+       This return is given if  pcre_dfa_exec()  runs  out  of  space  in  the
+       workspace vector.
+
+         PCRE_ERROR_DFA_RECURSE    (-20)
+
+       When  a  recursive subpattern is processed, the matching function calls
+       itself recursively, using private vectors for  ovector  and  workspace.
+       This  error  is  given  if  the output vector is not large enough. This
+       should be extremely rare, as a vector of size 1000 is used.
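       For diagnostics it can help to turn these codes into names. The
       function below is a sketch that copies the values listed above; with
       pcre.h available, the case labels would normally be written as the
       PCRE_ERROR_DFA_xxx macros rather than as literals. The function name
       is hypothetical.

```c
#include <stddef.h>

/* Name of a pcre_dfa_exec()-specific error code, or NULL for any other
   value (the codes shared with pcre_exec() are not covered). */
static const char *dfa_error_name(int rc)
{
    switch (rc) {
    case -16: return "PCRE_ERROR_DFA_UITEM";   /* unsupported pattern item */
    case -17: return "PCRE_ERROR_DFA_UCOND";   /* unsupported condition */
    case -18: return "PCRE_ERROR_DFA_UMLIMIT"; /* match_limit was set */
    case -19: return "PCRE_ERROR_DFA_WSSIZE";  /* workspace too small */
    case -20: return "PCRE_ERROR_DFA_RECURSE"; /* recursion ovector too small */
    default:  return NULL;
    }
}
```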
+
+
+SEE ALSO
+
+       pcrebuild(3), pcrecallout(3), pcrecpp(3), pcrematching(3), pcrepartial(3),
+       pcreposix(3), pcreprecompile(3), pcresample(3), pcrestack(3).
+
+
+AUTHOR
+
+       Philip Hazel
+       University Computing Service
+       Cambridge CB2 3QH, England.
+
+
+REVISION
+
+       Last updated: 03 October 2009
+       Copyright (c) 1997-2009 University of Cambridge.
+------------------------------------------------------------------------------
+
+
+PCRECALLOUT(3)                                                  PCRECALLOUT(3)
+
+
+NAME
+       PCRE - Perl-compatible regular expressions
+
+
+PCRE CALLOUTS
+
+       int (*pcre_callout)(pcre_callout_block *);
+
+       PCRE provides a feature called "callout", which is a means of temporar-
+       ily passing control to the caller of PCRE  in  the  middle  of  pattern
+       matching.  The  caller of PCRE provides an external function by putting
+       its entry point in the global variable pcre_callout. By  default,  this
+       variable contains NULL, which disables all calling out.
+
+       Within  a  regular  expression,  (?C) indicates the points at which the
+       external function is to be called.  Different  callout  points  can  be
+       identified  by  putting  a number less than 256 after the letter C. The
+       default value is zero.  For  example,  this  pattern  has  two  callout
+       points:
+
+         (?C1)abc(?C2)def
+
+       If  the  PCRE_AUTO_CALLOUT  option  bit  is  set when pcre_compile() or
+       pcre_compile2() is called, PCRE  automatically  inserts  callouts,  all
+       with  number  255,  before  each  item  in the pattern. For example, if
+       PCRE_AUTO_CALLOUT is used with the pattern
+
+         A(\d{2}|--)
+
+       it is processed as if it were
+
+       (?C255)A(?C255)((?C255)\d{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255)
+
+       Notice that there is a callout before and after  each  parenthesis  and
+       alternation  bar.  Automatic  callouts  can  be  used  for tracking the
+       progress of pattern matching. The pcretest command has an  option  that
+       sets  automatic callouts; when it is used, the output indicates how the
+       pattern is matched. This is useful information when you are  trying  to
+       optimize the performance of a particular pattern.
+
+
+MISSING CALLOUTS
+
+       You  should  be  aware  that,  because of optimizations in the way PCRE
+       matches patterns by default, callouts  sometimes  do  not  happen.  For
+       example, if the pattern is
+
+         ab(?C4)cd
+
+       PCRE knows that any matching string must contain the letter "d". If the
+       subject string is "abyz", the lack of "d" means that  matching  doesn't
+       ever  start,  and  the  callout is never reached. However, with "abyd",
+       though the result is still no match, the callout is obeyed.
+
+       If the pattern is studied, PCRE knows the minimum length of a  matching
+       string,  and will immediately give a "no match" return without actually
+       running a match if the subject is not long enough, or,  for  unanchored
+       patterns, if it has been scanned far enough.
+
+       You  can disable these optimizations by passing the PCRE_NO_START_OPTI-
+       MIZE option to pcre_exec() or  pcre_dfa_exec().  This  slows  down  the
+       matching  process,  but  does  ensure that callouts such as the example
+       above are obeyed.
+
+
+THE CALLOUT INTERFACE
+
+       During matching, when PCRE reaches a callout point, the external  func-
+       tion  defined by pcre_callout is called (if it is set). This applies to
+       both the pcre_exec() and the pcre_dfa_exec()  matching  functions.  The
+       only  argument  to  the callout function is a pointer to a pcre_callout
+       block. This structure contains the following fields:
+
+         int          version;
+         int          callout_number;
+         int         *offset_vector;
+         const char  *subject;
+         int          subject_length;
+         int          start_match;
+         int          current_position;
+         int          capture_top;
+         int          capture_last;
+         void        *callout_data;
+         int          pattern_position;
+         int          next_item_length;
+
+       The version field is an integer containing the version  number  of  the
+       block  format. The initial version was 0; the current version is 1. The
+       version number will change again in future  if  additional  fields  are
+       added, but the intention is never to remove any of the existing fields.
+
+       The  callout_number  field  contains the number of the callout, as com-
+       piled into the pattern (that is, the number after ?C for  manual  call-
+       outs, and 255 for automatically generated callouts).
+
+       The  offset_vector field is a pointer to the vector of offsets that was
+       passed  by  the  caller  to  pcre_exec()   or   pcre_dfa_exec().   When
+       pcre_exec()  is used, the contents can be inspected in order to extract
+       substrings that have been matched so  far,  in  the  same  way  as  for
+       extracting  substrings after a match has completed. For pcre_dfa_exec()
+       this field is not useful.
+
+       The subject and subject_length fields contain copies of the values that
+       were passed to pcre_exec() or pcre_dfa_exec().
+
+       The  start_match  field normally contains the offset within the subject
+       at which the current match attempt  started.  However,  if  the  escape
+       sequence  \K has been encountered, this value is changed to reflect the
+       modified starting point. If the pattern is not  anchored,  the  callout
+       function may be called several times from the same point in the pattern
+       for different starting points in the subject.
+
+       The current_position field contains the offset within  the  subject  of
+       the current match pointer.
+
+       When  the  pcre_exec() function is used, the capture_top field contains
+       one more than the number of the highest numbered captured substring  so
+       far.  If  no substrings have been captured, the value of capture_top is
+       one. This is always the case when pcre_dfa_exec() is used,  because  it
+       does not support captured substrings.
+
+       The  capture_last  field  contains the number of the most recently cap-
+       tured substring. If no substrings have been captured, its value is  -1.
+       This is always the case when pcre_dfa_exec() is used.
+
+       The callout_data field contains a value that is passed to pcre_exec()
+       or pcre_dfa_exec() specifically so that it can be passed back in
+       callouts. It is passed in the callout_data field of the pcre_extra data
+       structure, with the PCRE_EXTRA_CALLOUT_DATA bit set in its flags field.
+       If no such data was passed, the value of callout_data in a pcre_callout
+       block is NULL. There is a description of the pcre_extra structure in
+       the pcreapi documentation.
+
+       The pattern_position field is present from version 1 of the  pcre_call-
+       out structure. It contains the offset to the next item to be matched in
+       the pattern string.
+
+       The next_item_length field is present from version 1 of the  pcre_call-
+       out structure. It contains the length of the next item to be matched in
+       the pattern string. When the callout immediately precedes  an  alterna-
+       tion  bar, a closing parenthesis, or the end of the pattern, the length
+       is zero. When the callout precedes an opening parenthesis,  the  length
+       is that of the entire subpattern.
+
+       The  pattern_position  and next_item_length fields are intended to help
+       in distinguishing between different automatic callouts, which all  have
+       the same callout number. However, they are set for all callouts.
+
+
+RETURN VALUES
+
+       The  external callout function returns an integer to PCRE. If the value
+       is zero, matching proceeds as normal. If  the  value  is  greater  than
+       zero,  matching  fails  at  the current point, but the testing of other
+       matching possibilities goes ahead, just as if a lookahead assertion had
+       failed.  If  the  value  is less than zero, the match is abandoned, and
+       pcre_exec() or pcre_dfa_exec() returns the negative value.
+
+       Negative  values  should  normally  be   chosen   from   the   set   of
+       PCRE_ERROR_xxx values. In particular, PCRE_ERROR_NOMATCH forces a stan-
+       dard "no  match"  failure.   The  error  number  PCRE_ERROR_CALLOUT  is
+       reserved  for  use  by callout functions; it will never be used by PCRE
+       itself.
+
+
+AUTHOR
+
+       Philip Hazel
+       University Computing Service
+       Cambridge CB2 3QH, England.
+
+
+REVISION
+
+       Last updated: 29 September 2009
+       Copyright (c) 1997-2009 University of Cambridge.
+------------------------------------------------------------------------------
+
+
+PCRECOMPAT(3)                                                    PCRECOMPAT(3)
+
+
+NAME
+       PCRE - Perl-compatible regular expressions
+
+
+DIFFERENCES BETWEEN PCRE AND PERL
+
+       This  document describes the differences in the ways that PCRE and Perl
+       handle regular expressions. The differences  described  here  are  with
+       respect to Perl 5.10.
+
+       1.  PCRE has only a subset of Perl's UTF-8 and Unicode support. Details
+       of what it does have are given in the section on UTF-8 support  in  the
+       main pcre page.
+
+       2. PCRE does not allow repeat quantifiers on lookahead assertions. Perl
+       permits them, but they do not mean what you might think.  For  example,
+       (?!a){3} does not assert that the next three characters are not "a". It
+       just asserts that the next character is not "a" three times.
+
+       3. Capturing subpatterns that occur inside  negative  lookahead  asser-
+       tions  are  counted,  but their entries in the offsets vector are never
+       set. Perl sets its numerical variables from any such patterns that  are
+       matched before the assertion fails to match something (thereby succeed-
+       ing), but only if the negative lookahead assertion  contains  just  one
+       branch.
+
+       4.  Though  binary zero characters are supported in the subject string,
+       they are not allowed in a pattern string because it is passed as a nor-
+       mal C string, terminated by zero. The escape sequence \0 can be used in
+       the pattern to represent a binary zero.
+
+       5. The following Perl escape sequences are not supported: \l,  \u,  \L,
+       \U, and \N. In fact these are implemented by Perl's general string-han-
+       dling and are not part of its pattern matching engine. If any of  these
+       are encountered by PCRE, an error is generated.
+
+       6.  The Perl escape sequences \p, \P, and \X are supported only if PCRE
+       is built with Unicode character property support. The  properties  that
+       can  be tested with \p and \P are limited to the general category prop-
+       erties such as Lu and Nd, script names such as Greek or  Han,  and  the
+       derived  properties  Any  and  L&. PCRE does support the Cs (surrogate)
+       property, which Perl does not; the  Perl  documentation  says  "Because
+       Perl hides the need for the user to understand the internal representa-
+       tion of Unicode characters, there is no need to implement the  somewhat
+       messy concept of surrogates."
+
+       7. PCRE does support the \Q...\E escape for quoting substrings. Charac-
+       ters in between are treated as literals.  This  is  slightly  different
+       from  Perl  in  that  $  and  @ are also handled as literals inside the
+       quotes. In Perl, they cause variable interpolation (but of course  PCRE
+       does not have variables). Note the following examples:
+
+           Pattern            PCRE matches      Perl matches
+
+           \Qabc$xyz\E        abc$xyz           abc followed by the
+                                                  contents of $xyz
+           \Qabc\$xyz\E       abc\$xyz          abc\$xyz
+           \Qabc\E\$\Qxyz\E   abc$xyz           abc$xyz
+
+       The  \Q...\E  sequence  is recognized both inside and outside character
+       classes.
+
+       8. Fairly obviously, PCRE does not support the (?{code}) and (??{code})
+       constructions.  However,  there is support for recursive patterns. This
+       is not available in Perl 5.8, but it is in Perl 5.10.  Also,  the  PCRE
+       "callout"  feature allows an external function to be called during pat-
+       tern matching. See the pcrecallout documentation for details.
+
+       9. Subpatterns that are called  recursively  or  as  "subroutines"  are
+       always  treated  as  atomic  groups  in  PCRE. This is like Python, but
+       unlike Perl. There is a discussion of an example that explains this  in
+       more  detail  in  the section on recursion differences from Perl in the
+       pcrepattern page.
+
+       10. There are some differences that are concerned with the settings  of
+       captured  strings  when  part  of  a  pattern is repeated. For example,
+       matching "aba" against the  pattern  /^(a(b)?)+$/  in  Perl  leaves  $2
+       unset, but in PCRE it is set to "b".
+
+       11.  PCRE  does  support  Perl  5.10's  backtracking  verbs  (*ACCEPT),
+       (*FAIL), (*F), (*COMMIT), (*PRUNE), (*SKIP), and (*THEN), but  only  in
+       the forms without an argument. PCRE does not support (*MARK).
+
+       12. PCRE's handling of duplicate subpattern numbers and duplicate sub-
+       pattern names is not as general as Perl's. This is a consequence of the
+       fact that PCRE works internally just with numbers, using an external
+       table to translate between numbers and names. In particular, a pattern
+       such as (?|(?<a>A)|(?<b>B)), where the two capturing parentheses have
+       the same number but different names, is not supported. If it were
+       allowed, it would not be possible to distinguish which parentheses
+       matched, because both names map to capturing subpattern number 1, so
+       an error is given at compile time.
+
+       13. PCRE provides some extensions to the Perl regular expression facil-
+       ities.   Perl  5.10  includes new features that are not in earlier ver-
+       sions of Perl, some of which (such as named parentheses) have  been  in
+       PCRE for some time. This list is with respect to Perl 5.10:
+
+       (a)  Although  lookbehind  assertions  in  PCRE must match fixed length
+       strings, each alternative branch of a lookbehind assertion can match  a
+       different  length  of  string.  Perl requires them all to have the same
+       length.
+
+       (b) If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not set, the  $
+       meta-character matches only at the very end of the string.
+
+       (c) If PCRE_EXTRA is set, a backslash followed by a letter with no spe-
+       cial meaning is faulted. Otherwise, like Perl, the backslash is quietly
+       ignored.  (Perl can be made to issue a warning.)
+
+       (d)  If  PCRE_UNGREEDY is set, the greediness of the repetition quanti-
+       fiers is inverted, that is, by default they are not greedy, but if fol-
+       lowed by a question mark they are.
+
+       (e) PCRE_ANCHORED can be used at matching time to force a pattern to be
+       tried only at the first matching position in the subject string.
+
+       (f) The PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, and
+       PCRE_NOTEMPTY_ATSTART options for pcre_exec(), and the
+       PCRE_NO_AUTO_CAPTURE option for pcre_compile(), have no Perl
+       equivalents.
+
+       (g) The \R escape sequence can be restricted to match only CR,  LF,  or
+       CRLF by the PCRE_BSR_ANYCRLF option.
+
+       (h) The callout facility is PCRE-specific.
+
+       (i) The partial matching facility is PCRE-specific.
+
+       (j) Patterns compiled by PCRE can be saved and re-used at a later time,
+       even on different hosts that have the other endianness.
+
+       (k) The alternative matching function (pcre_dfa_exec())  matches  in  a
+       different way and is not Perl-compatible.
+
+       (l)  PCRE  recognizes some special sequences such as (*CR) at the start
+       of a pattern that set overall options that cannot be changed within the
+       pattern.
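       Item 7 means that an arbitrary C string can be matched literally by
       wrapping it in \Q...\E. The only catch is a "\E" inside the string,
       which would end the quoted section early. A common workaround, the
       one Java's Pattern.quote() uses, is to close the quote, match the
       backslash and the E outside it, and reopen the quote. The function
       below is a hypothetical sketch of that technique, not part of PCRE.

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated pattern that matches the bytes of lit
   literally, by wrapping it in \Q...\E. Each "\E" inside lit is emitted
   as \E\\E\Q: end the quote, match a backslash and an E, reopen the
   quote. The caller frees the result; NULL means allocation failed. */
static char *quote_literal(const char *lit)
{
    size_t extra = 0;
    for (const char *p = lit; (p = strstr(p, "\\E")) != NULL; p += 2)
        extra += 5;  /* "\E" (2 bytes) becomes "\E\\E\Q" (7 bytes) */
    char *out = malloc(strlen(lit) + extra + 5);  /* + \Q, \E and NUL */
    if (out == NULL)
        return NULL;
    char *o = out;
    memcpy(o, "\\Q", 2);
    o += 2;
    for (const char *p = lit; *p != '\0'; ) {
        if (p[0] == '\\' && p[1] == 'E') {
            memcpy(o, "\\E\\\\E\\Q", 7);
            o += 7;
            p += 2;
        } else {
            *o++ = *p++;
        }
    }
    memcpy(o, "\\E", 2);
    o += 2;
    *o = '\0';
    return out;
}
```

       The pattern \Qabc\E\$\Qxyz\E in the table under item 7 follows the
       same idea, stepping outside the quotes to escape a single character.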
+
+
+AUTHOR
+
+       Philip Hazel
+       University Computing Service
+       Cambridge CB2 3QH, England.
+
+
+REVISION
+
+       Last updated: 04 October 2009
+       Copyright (c) 1997-2009 University of Cambridge.
+------------------------------------------------------------------------------
+
+
+PCREPATTERN(3)                                                  PCREPATTERN(3)
+
+
+NAME
+       PCRE - Perl-compatible regular expressions
+
+
+PCRE REGULAR EXPRESSION DETAILS
+
+       The  syntax and semantics of the regular expressions that are supported
+       by PCRE are described in detail below. There is a quick-reference  syn-
+       tax summary in the pcresyntax page. PCRE tries to match Perl syntax and
+       semantics as closely as it can. PCRE  also  supports  some  alternative
+       regular  expression  syntax (which does not conflict with the Perl syn-
+       tax) in order to provide some compatibility with regular expressions in
+       Python, .NET, and Oniguruma.
+
+       Perl's  regular expressions are described in its own documentation, and
+       regular expressions in general are covered in a number of  books,  some
+       of  which  have  copious  examples. Jeffrey Friedl's "Mastering Regular
+       Expressions", published by  O'Reilly,  covers  regular  expressions  in
+       great  detail.  This  description  of  PCRE's  regular  expressions  is
+       intended as reference material.
+
+       The original operation of PCRE was on strings of  one-byte  characters.
+       However,  there is now also support for UTF-8 character strings. To use
+       this, PCRE must be built to include UTF-8 support, and  you  must  call
+       pcre_compile()  or  pcre_compile2() with the PCRE_UTF8 option. There is
+       also a special sequence that can be given at the start of a pattern:
+
+         (*UTF8)
+
+       Starting a pattern with this sequence  is  equivalent  to  setting  the
+       PCRE_UTF8  option.  This  feature  is  not Perl-compatible. How setting
+       UTF-8 mode affects pattern matching  is  mentioned  in  several  places
+       below.  There  is  also  a  summary of UTF-8 features in the section on
+       UTF-8 support in the main pcre page.
+
+       The remainder of this document discusses the  patterns  that  are  sup-
+       ported  by  PCRE when its main matching function, pcre_exec(), is used.
+       From  release  6.0,   PCRE   offers   a   second   matching   function,
+       pcre_dfa_exec(),  which matches using a different algorithm that is not
+       Perl-compatible. Some of the features discussed below are not available
+       when  pcre_dfa_exec()  is used. The advantages and disadvantages of the
+       alternative function, and how it differs from the normal function,  are
+       discussed in the pcrematching page.
+
+
+NEWLINE CONVENTIONS
+
+       PCRE  supports five different conventions for indicating line breaks in
+       strings: a single CR (carriage return) character, a  single  LF  (line-
+       feed) character, the two-character sequence CRLF, any of the three pre-
+       ceding, or any Unicode newline sequence. The pcreapi page  has  further
+       discussion  about newlines, and shows how to set the newline convention
+       in the options arguments for the compiling and matching functions.
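       The five conventions can be sketched as a function that reports the
       length of the newline, if any, starting at a given subject offset.
       This illustrates the definitions and is not PCRE's own code; the
       names are hypothetical. For the "any Unicode newline" case it assumes
       a UTF-8 subject and uses the sequences PCRE recognizes (CR, LF, CRLF,
       VT, FF, NEL, LS, and PS). In non-UTF-8 mode NEL is the single byte
       0x85, which this sketch does not handle.

```c
#include <stddef.h>

/* The five newline conventions described above. */
enum newline_convention { NL_CR, NL_LF, NL_CRLF, NL_ANYCRLF, NL_ANY };

/* Length in bytes of the newline that starts at s[i] under convention
   conv, or 0 if there is none there. For NL_ANY, s is taken to be UTF-8,
   so NEL (U+0085), LS (U+2028) and PS (U+2029) are multi-byte. */
static size_t newline_length(const unsigned char *s, size_t len, size_t i,
                             enum newline_convention conv)
{
    if (i >= len)
        return 0;
    size_t rest = len - i;
    const unsigned char *p = s + i;
    int crlf = rest >= 2 && p[0] == '\r' && p[1] == '\n';
    switch (conv) {
    case NL_CR:
        return p[0] == '\r';
    case NL_LF:
        return p[0] == '\n';
    case NL_CRLF:
        return crlf ? 2 : 0;
    case NL_ANYCRLF:
        if (crlf)
            return 2;
        return p[0] == '\r' || p[0] == '\n';
    case NL_ANY:
        if (crlf)
            return 2;
        if (p[0] >= 0x0a && p[0] <= 0x0d)      /* LF, VT, FF, CR */
            return 1;
        if (rest >= 2 && p[0] == 0xc2 && p[1] == 0x85)  /* NEL */
            return 2;
        if (rest >= 3 && p[0] == 0xe2 && p[1] == 0x80 &&
            (p[2] == 0xa8 || p[2] == 0xa9))    /* LS, PS */
            return 3;
        return 0;
    }
    return 0;
}
```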
+
+       It is also possible to specify a newline convention by starting a  pat-
+       tern string with one of the following five sequences:
+
+         (*CR)        carriage return
+         (*LF)        linefeed
+         (*CRLF)      carriage return, followed by linefeed
+         (*ANYCRLF)   any of the three above
+         (*ANY)       all Unicode newline sequences
+
+       These  override  the default and the options given to pcre_compile() or
+       pcre_compile2(). For example, on a Unix system where LF is the  default
+       newline sequence, the pattern
+
+         (*CR)a.b
+
+       changes the convention to CR. That pattern matches "a\nb" because LF is
+       no longer a newline. Note that these special settings,  which  are  not
+       Perl-compatible,  are  recognized  only at the very start of a pattern,
+       and that they must be in upper case.  If  more  than  one  of  them  is
+       present, the last one is used.
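       The "last one wins" and "upper case only" rules can be illustrated
       with a small scanner over the start of a pattern. It is a sketch
       under simplifying assumptions, not PCRE's parser: it recognizes only
       the five newline items above and stops at anything else, whereas
       PCRE also accepts other leading items such as (*UTF8). The function
       name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Scan the newline items at the very start of pattern and return the one
   that takes effect (the last), or NULL if there is none. *skip receives
   the number of bytes those items occupy. Only the exact upper-case
   spellings are recognized; anything else ends the scan. */
static const char *leading_newline_setting(const char *pattern, size_t *skip)
{
    static const char *const items[] = {
        "(*CR)", "(*LF)", "(*CRLF)", "(*ANYCRLF)", "(*ANY)"
    };
    const size_t nitems = sizeof items / sizeof items[0];
    const char *last = NULL;
    const char *p = pattern;
    for (;;) {
        size_t i;
        for (i = 0; i < nitems; i++) {
            /* Comparing the full item, including ")", keeps (*CR) from
               matching the start of (*CRLF), and (*ANY) from (*ANYCRLF). */
            if (strncmp(p, items[i], strlen(items[i])) == 0)
                break;
        }
        if (i == nitems)
            break;
        last = items[i];
        p += strlen(items[i]);
    }
    *skip = (size_t)(p - pattern);
    return last;
}
```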
+
+       The  newline  convention  does  not  affect what the \R escape sequence
+       matches. By default, this is any Unicode  newline  sequence,  for  Perl
+       compatibility.  However, this can be changed; see the description of \R
+       in the section entitled "Newline sequences" below. A change of \R  set-
+       ting can be combined with a change of newline convention.
+
+
+CHARACTERS AND METACHARACTERS
+
+       A  regular  expression  is  a pattern that is matched against a subject
+       string from left to right. Most characters stand for  themselves  in  a
+       pattern,  and  match  the corresponding characters in the subject. As a
+       trivial example, the pattern
+
+         The quick brown fox
+
+       matches a portion of a subject string that is identical to itself. When
+       caseless  matching is specified (the PCRE_CASELESS option), letters are
+       matched independently of case. In UTF-8 mode, PCRE  always  understands
+       the  concept  of case for characters whose values are less than 128, so
+       caseless matching is always possible. For characters with  higher  val-
+       ues,  the concept of case is supported if PCRE is compiled with Unicode
+       property support, but not otherwise.   If  you  want  to  use  caseless
+       matching  for  characters  128  and above, you must ensure that PCRE is
+       compiled with Unicode property support as well as with UTF-8 support.
+
+       The power of regular expressions comes  from  the  ability  to  include
+       alternatives  and  repetitions in the pattern. These are encoded in the
+       pattern by the use of metacharacters, which do not stand for themselves
+       but instead are interpreted in some special way.
+
+       There  are  two different sets of metacharacters: those that are recog-
+       nized anywhere in the pattern except within square brackets, and  those
+       that  are  recognized  within square brackets. Outside square brackets,
+       the metacharacters are as follows:
+
+         \      general escape character with several uses
+         ^      assert start of string (or line, in multiline mode)
+         $      assert end of string (or line, in multiline mode)
+         .      match any character except newline (by default)
+         [      start character class definition
+         |      start of alternative branch
+         (      start subpattern
+         )      end subpattern
+         ?      extends the meaning of (
+                also 0 or 1 quantifier
+                also quantifier minimizer
+         *      0 or more quantifier
+         +      1 or more quantifier
+                also "possessive quantifier"
+         {      start min/max quantifier
+
+       Part of a pattern that is in square brackets  is  called  a  "character
+       class". In a character class the only metacharacters are:
+
+         \      general escape character
+         ^      negate the class, but only if the first character
+         -      indicates character range
+         [      POSIX character class (only if followed by POSIX
+                  syntax)
+         ]      terminates the character class
+
+       The following sections describe the use of each of the metacharacters.
+
+
+BACKSLASH
+
+       The backslash character has several uses. Firstly, if it is followed by
+       a non-alphanumeric character, it takes away any  special  meaning  that
+       character  may  have.  This  use  of  backslash  as an escape character
+       applies both inside and outside character classes.
+
+       For example, if you want to match a * character, you write  \*  in  the
+       pattern.   This  escaping  action  applies whether or not the following
+       character would otherwise be interpreted as a metacharacter, so  it  is
+       always  safe  to  precede  a non-alphanumeric with backslash to specify
+       that it stands for itself. In particular, if you want to match a  back-
+       slash, you write \\.
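+
+       Because any non-alphanumeric character may safely be preceded by a
+       backslash, arbitrary text can be turned into a pattern that matches
+       it literally. The following is a minimal C sketch, not part of the
+       PCRE API; the name quote_literal() is hypothetical. It escapes only
+       ASCII non-alphanumerics and copies bytes of 128 and above unchanged,
+       so that UTF-8 sequences are never split.
+
```c
/* Hypothetical helper: writes into out a pattern that matches the
   NUL-terminated string in literally. Every ASCII non-alphanumeric
   character is preceded by a backslash; letters, digits, and bytes of
   128 and above are copied as-is. out needs room for 2 * strlen(in) + 1
   bytes. */
void quote_literal(const char *in, char *out)
{
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        int alnum = (c >= '0' && c <= '9') ||
                    (c >= 'A' && c <= 'Z') ||
                    (c >= 'a' && c <= 'z');
        if (c < 0x80 && !alnum)
            *out++ = '\\';
        *out++ = (char)c;
    }
    *out = '\0';
}
```
+
+       Since space and # are escaped as well, the result stays literal
+       under PCRE_EXTENDED. Wrapping the text in \Q...\E is an alternative,
+       but only when the text cannot itself contain \E.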
+
+       If  a  pattern is compiled with the PCRE_EXTENDED option, whitespace in
+       the pattern (other than in a character class) and characters between  a
+       # outside a character class and the next newline are ignored. An escap-
+       ing backslash can be used to include a whitespace  or  #  character  as
+       part of the pattern.
+
+       If  you  want  to remove the special meaning from a sequence of charac-
+       ters, you can do so by putting them between \Q and \E. This is  differ-
+       ent  from  Perl  in  that  $  and  @ are handled as literals in \Q...\E
+       sequences in PCRE, whereas in Perl, $ and @ cause  variable  interpola-
+       tion. Note the following examples:
+
+         Pattern            PCRE matches   Perl matches
+
+         \Qabc$xyz\E        abc$xyz        abc followed by the
+                                             contents of $xyz
+         \Qabc\$xyz\E       abc\$xyz       abc\$xyz
+         \Qabc\E\$\Qxyz\E   abc$xyz        abc$xyz
+
+       The  \Q...\E  sequence  is recognized both inside and outside character
+       classes.
+
+   Non-printing characters
+
+       A second use of backslash provides a way of encoding non-printing char-
+       acters  in patterns in a visible manner. There is no restriction on the
+       appearance of non-printing characters, apart from the binary zero  that
+       terminates  a  pattern,  but  when  a pattern is being prepared by text
+       editing, it is  often  easier  to  use  one  of  the  following  escape
+       sequences than the binary character it represents:
+
+         \a        alarm, that is, the BEL character (hex 07)
+         \cx       "control-x", where x is any character
+         \e        escape (hex 1B)
+         \f        formfeed (hex 0C)
+         \n        linefeed (hex 0A)
+         \r        carriage return (hex 0D)
+         \t        tab (hex 09)
+         \ddd      character with octal code ddd, or backreference
+         \xhh      character with hex code hh
+         \x{hhh..} character with hex code hhh..
+
+       The  precise  effect of \cx is as follows: if x is a lower case letter,
+       it is converted to upper case. Then bit 6 of the character (hex 40)  is
+       inverted.   Thus  \cz becomes hex 1A, but \c{ becomes hex 3B, while \c;
+       becomes hex 7B.
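+
+       As a sketch of this rule, assuming ASCII and non-UTF-8 mode (the
+       function name is illustrative, not part of PCRE):
+
```c
/* Value of the escape \cx: upper-case x if it is a lower case letter,
   then invert bit 6 (hex 40). */
unsigned char control_escape(unsigned char x)
{
    if (x >= 'a' && x <= 'z')
        x = (unsigned char)(x - 'a' + 'A');
    return (unsigned char)(x ^ 0x40);
}
```
+
+       For example, control_escape('z') gives 0x1A, control_escape('{')
+       gives 0x3B, and control_escape(';') gives 0x7B, matching the values
+       quoted above.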
+
+       After \x, from zero to two hexadecimal digits are read (letters can  be
+       in  upper  or  lower case). Any number of hexadecimal digits may appear
+       between \x{ and }, but the value of the character  code  must  be  less
+       than 256 in non-UTF-8 mode, and less than 2**31 in UTF-8 mode. That is,
+       the maximum value in hexadecimal is 7FFFFFFF. Note that this is  bigger
+       than the largest Unicode code point, which is 10FFFF.
+
+       If  characters  other than hexadecimal digits appear between \x{ and },
+       or if there is no terminating }, this form of escape is not recognized.
+       Instead,  the  initial  \x  will  be interpreted as a basic hexadecimal
+       escape, with no following digits, giving a  character  whose  value  is
+       zero.
+
+       Characters whose value is less than 256 can be defined by either of the
+       two syntaxes for \x. There is no difference in the way  they  are  han-
+       dled. For example, \xdc is exactly the same as \x{dc}.
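+
+       The following C sketch shows how the two \x forms and the fallback
+       described above fit together. parse_hex_escape() is a hypothetical
+       name, not a PCRE function. It takes a pointer to the text just after
+       \x and returns the number of bytes consumed. A return of -1 stands in
+       for the compile-time error that an over-large \x{...} value produces;
+       the exact error handling is an assumption.
+
```c
#include <stdint.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* p points just after "\x". Stores the character value in *value and
   returns how many bytes after the 'x' were consumed, or -1 if a
   \x{...} value is too large (256 or more without UTF-8, 2**31 or more
   with UTF-8). */
int parse_hex_escape(const char *p, int utf8, uint32_t *value)
{
    if (p[0] == '{') {
        int i = 1;
        uint32_t v = 0;
        while (hex_digit_value(p[i]) >= 0) {
            /* Saturate instead of overflowing on very long digit runs. */
            v = (v <= 0x0FFFFFFFu) ? v * 16 + (uint32_t)hex_digit_value(p[i])
                                   : 0xFFFFFFFFu;
            i++;
        }
        if (p[i] == '}') {
            uint32_t limit = utf8 ? 0x80000000u : 0x100u;
            if (v >= limit)
                return -1;
            *value = v;
            return i + 1;
        }
        /* Not a valid \x{...}: fall through to the basic form, which
           reads no digits because '{' is not a hex digit. */
    }
    int n = 0;
    uint32_t v = 0;
    while (n < 2 && hex_digit_value(p[n]) >= 0) {
        v = v * 16 + (uint32_t)hex_digit_value(p[n]);
        n++;
    }
    *value = v;
    return n;
}
```
+
+       For example, both "dc" and "{dc}" yield 0xDC, while "{zz}" consumes
+       nothing and yields zero, leaving "{zz}" to be read as literal text.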
+
+       After \0, up to two further octal digits are read. If there are fewer
+       than two digits, just  those  that  are  present  are  used.  Thus  the
+       sequence \0\x\07 specifies two binary zeros followed by a BEL character
+       (code value 7). Make sure you supply two digits after the initial  zero
+       if the pattern character that follows is itself an octal digit.
+
+       The handling of a backslash followed by a digit other than 0 is compli-
+       cated.  Outside a character class, PCRE reads it and any following dig-
+       its  as  a  decimal  number. If the number is less than 10, or if there
+       have been at least that many previous capturing left parentheses in the
+       expression,  the  entire  sequence  is  taken  as  a  back reference. A
+       description of how this works is given later, following the  discussion
+       of parenthesized subpatterns.
+
+       Inside  a  character  class, or if the decimal number is greater than 9
+       and there have not been that many capturing subpatterns, PCRE  re-reads
+       up to three octal digits following the backslash, and uses them to gen-
+       erate a data character. Any subsequent digits stand for themselves.  In
+       non-UTF-8  mode,  the  value  of a character specified in octal must be
+       less than \400. In UTF-8 mode, values up to  \777  are  permitted.  For
+       example:
+
+         \040   is another way of writing a space
+         \40    is the same, provided there are fewer than 40
+                   previous capturing subpatterns
+         \7     is always a back reference
+         \11    might be a back reference, or another way of
+                   writing a tab
+         \011   is always a tab
+         \0113  is a tab followed by the character "3"
+         \113   might be a back reference, otherwise the
+                   character with octal code 113
+         \377   might be a back reference, otherwise
+                   the byte consisting entirely of 1 bits
+         \81    is either a back reference, or a binary zero
+                   followed by the two characters "8" and "1"
+
+       Note  that  octal  values of 100 or greater must not be introduced by a
+       leading zero, because no more than three octal digits are ever read.
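+
+       The decision procedure for a backslash followed by a digit can be
+       sketched in C as below. This illustrates the rules in this section
+       and is not PCRE's own code. classify_digit_escape() and its
+       parameters are hypothetical: captures is the number of capturing
+       left parentheses seen so far, and the \400 limit for non-UTF-8 mode
+       is not checked.
+
```c
#include <stddef.h>

enum digit_escape { ESC_BACKREF, ESC_CHAR };

/* p points at the first digit after the backslash. On return, *value is
   the back reference number or the character code, and *consumed is the
   number of digits used; any remaining digits stand for themselves. */
enum digit_escape classify_digit_escape(const char *p, int in_class,
                                        int captures, unsigned *value,
                                        size_t *consumed)
{
    size_t i = 0;

    /* Outside a class, a digit other than 0 starts a decimal number. */
    if (!in_class && p[0] != '0') {
        unsigned long n = 0;
        while (p[i] >= '0' && p[i] <= '9') {
            if (n < 100000)               /* avoid overflow on long runs */
                n = n * 10 + (unsigned long)(p[i] - '0');
            i++;
        }
        if (n < 10 || (captures >= 0 && n <= (unsigned long)captures)) {
            *value = (unsigned)n;
            *consumed = i;
            return ESC_BACKREF;
        }
    }

    /* Otherwise re-read up to three octal digits (for \0 this is the
       zero plus up to two more). */
    unsigned v = 0;
    i = 0;
    while (i < 3 && p[i] >= '0' && p[i] <= '7') {
        v = v * 8 + (unsigned)(p[i] - '0');
        i++;
    }
    *value = v;
    *consumed = i;
    return ESC_CHAR;
}
```
+
+       With no capturing subpatterns, "113" yields the character with code
+       75 ("K"), while "81" consumes no digits and yields a binary zero,
+       leaving "8" and "1" to be matched literally.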
+
+       All the sequences that define a single character value can be used both
+       inside  and  outside character classes. In addition, inside a character
+       class, the sequence \b is interpreted as the backspace  character  (hex
+       08),  and the sequences \R and \X are interpreted as the characters "R"
+       and "X", respectively. Outside a character class, these sequences  have
+       different meanings (see below).
+
+   Absolute and relative back references
+
+       The  sequence  \g followed by an unsigned or a negative number, option-
+       ally enclosed in braces, is an absolute or relative back  reference.  A
+       named back reference can be coded as \g{name}. Back references are dis-
+       cussed later, following the discussion of parenthesized subpatterns.
+
+   Absolute and relative subroutine calls
+
+       For compatibility with Oniguruma, the non-Perl syntax \g followed by  a
+       name or a number enclosed either in angle brackets or single quotes is
+       an alternative syntax for referencing a subpattern as  a  "subroutine".
+       Details  are  discussed  later.   Note  that  \g{...} (Perl syntax) and
+       \g<...> (Oniguruma syntax) are not synonymous. The  former  is  a  back
+       reference; the latter is a subroutine call.
+
+   Generic character types
+
+       Another use of backslash is for specifying generic character types. The
+       following are always recognized:
+
+         \d     any decimal digit
+         \D     any character that is not a decimal digit
+         \h     any horizontal whitespace character
+         \H     any character that is not a horizontal whitespace character
+         \s     any whitespace character
+         \S     any character that is not a whitespace character
+         \v     any vertical whitespace character
+         \V     any character that is not a vertical whitespace character
+         \w     any "word" character
+         \W     any "non-word" character
+
+       Each pair of escape sequences partitions the complete set of characters
+       into  two disjoint sets. Any given character matches one, and only one,
+       of each pair.
+
+       These character type sequences can appear both inside and outside char-
+       acter  classes.  They each match one character of the appropriate type.
+       If the current matching point is at the end of the subject string,  all
+       of them fail, since there is no character to match.
+
+       For  compatibility  with Perl, \s does not match the VT character (code
+       11). This makes it different from the POSIX "space" class. The \s
+       characters  are  HT  (9), LF (10), FF (12), CR (13), and space (32). If
+       "use locale;" is included in a Perl script, \s may match the VT charac-
+       ter. In PCRE, it never does.
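+
+       A minimal sketch of the \s test just described, assuming PCRE's
+       default character tables (locale-specific tables may add other
+       characters); the function name is illustrative:
+
```c
/* 1 if byte c matches \s with the default tables: HT, LF, FF, CR and
   space. VT (11) is deliberately excluded, unlike C's isspace(). */
int pcre_is_space(unsigned char c)
{
    return c == 0x09 || c == 0x0A || c == 0x0C || c == 0x0D || c == 0x20;
}
```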
+
+       In  UTF-8 mode, characters with values greater than 128 never match \d,
+       \s, or \w, and always match \D, \S, and \W. This is true even when Uni-
+       code  character  property  support is available. These sequences retain
+       their original meanings from before UTF-8 support was available, mainly
+       for  efficiency  reasons. Note that this also affects \b, because it is
+       defined in terms of \w and \W.
+
+       The sequences \h, \H, \v, and \V are Perl 5.10 features. In contrast to
+       the  other  sequences, these do match certain high-valued codepoints in
+       UTF-8 mode.  The horizontal space characters are:
+
+         U+0009     Horizontal tab
+         U+0020     Space
+         U+00A0     Non-break space
+         U+1680     Ogham space mark
+         U+180E     Mongolian vowel separator
+         U+2000     En quad
+         U+2001     Em quad
+         U+2002     En space
+         U+2003     Em space
+         U+2004     Three-per-em space
+         U+2005     Four-per-em space
+         U+2006     Six-per-em space
+         U+2007     Figure space
+         U+2008     Punctuation space
+         U+2009     Thin space
+         U+200A     Hair space
+         U+202F     Narrow no-break space
+         U+205F     Medium mathematical space
+         U+3000     Ideographic space
+
+       The vertical space characters are:
+
+         U+000A     Linefeed
+         U+000B     Vertical tab
+         U+000C     Formfeed
+         U+000D     Carriage return
+         U+0085     Next line
+         U+2028     Line separator
+         U+2029     Paragraph separator
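+
+       The two lists above translate directly into predicates over code
+       points. The sketch below is illustrative (the function names are not
+       PCRE's); in non-UTF-8 mode only values below 256 can occur. \H and
+       \V are simply the negations.
+
```c
#include <stdint.h>

/* 1 if code point c matches \h. */
int is_hspace(uint32_t c)
{
    switch (c) {
    case 0x0009: case 0x0020: case 0x00A0: case 0x1680:
    case 0x180E: case 0x202F: case 0x205F: case 0x3000:
        return 1;
    default:
        return c >= 0x2000 && c <= 0x200A;   /* En quad .. Hair space */
    }
}

/* 1 if code point c matches \v: LF, VT, FF, CR, NEL, LS, PS. */
int is_vspace(uint32_t c)
{
    return (c >= 0x000A && c <= 0x000D) || c == 0x0085 ||
           c == 0x2028 || c == 0x2029;
}
```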
+
+       A "word" character is an underscore or any character less than 256 that
+       is  a  letter  or  digit.  The definition of letters and digits is con-
+       trolled by PCRE's low-valued character tables, and may vary if  locale-
+       specific  matching is taking place (see "Locale support" in the pcreapi
+       page). For example, in a French locale such  as  "fr_FR"  in  Unix-like
+       systems,  or "french" in Windows, some character codes greater than 128
+       are used for accented letters, and these are matched by \w. The use  of
+       locales with Unicode is discouraged.
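+
+       To illustrate how locale-dependent tables affect \w, the sketch
+       below builds a 256-entry "word" table from the rule above:
+       underscore, or any letter or digit. It assumes the classification
+       comes from the C library's isalnum() in the current LC_CTYPE locale;
+       build_word_table() is a hypothetical name, not a PCRE function.
+
```c
#include <ctype.h>

/* table[c] is set to 1 if byte c counts as a \w character under the
   current LC_CTYPE locale, otherwise 0. */
void build_word_table(unsigned char table[256])
{
    for (int i = 0; i < 256; i++)
        table[i] = (unsigned char)(i == '_' || isalnum(i));
}
```
+
+       In the default "C" locale, only ASCII letters, digits, and underscore
+       are marked. After switching to a single-byte Latin-1 French locale
+       with setlocale(), accented letters such as 0xE9 may be marked too.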
+
+   Newline sequences
+
+       Outside  a  character class, by default, the escape sequence \R matches
+       any Unicode newline sequence. This is a Perl 5.10 feature. In non-UTF-8
+       mode, \R is equivalent to the following:
+
+         (?>\r\n|\n|\x0b|\f|\r|\x85)
+
+       This  is  an  example  of an "atomic group", details of which are given
+       below.  This particular group matches either the two-character sequence
+       CR  followed  by  LF,  or  one  of  the single characters LF (linefeed,
+       U+000A), VT (vertical tab, U+000B), FF (formfeed, U+000C), CR (carriage
+       return, U+000D), or NEL (next line, U+0085). The two-character sequence
+       is treated as a single unit that cannot be split.
+
+       In UTF-8 mode, two additional characters whose codepoints  are  greater
+       than 255 are added: LS (line separator, U+2028) and PS (paragraph sepa-
+       rator, U+2029).  Unicode character property support is not  needed  for
+       these characters to be recognized.
+
+       It is possible to restrict \R to match only CR, LF, or CRLF (instead of
+       the complete set  of  Unicode  line  endings)  by  setting  the  option
+       PCRE_BSR_ANYCRLF either at compile time or when the pattern is matched.
+       (BSR is an abbreviation for "backslash R".) This can be made the default
+       when  PCRE  is  built;  if this is the case, the other behaviour can be
+       requested via the PCRE_BSR_UNICODE option.   It  is  also  possible  to
+       specify  these  settings  by  starting a pattern string with one of the
+       following sequences:
+
+         (*BSR_ANYCRLF)   CR, LF, or CRLF only
+         (*BSR_UNICODE)   any Unicode newline sequence
+
+       These override the default and the options given to  pcre_compile()  or
+       pcre_compile2(),  but  they  can  be  overridden  by  options  given to
+       pcre_exec() or pcre_dfa_exec(). Note that these special settings, which
+       are  not  Perl-compatible,  are  recognized only at the very start of a
+       pattern, and that they must be in upper case. If more than one of  them
+       is present, the last one is used. They can be combined with a change of
+       newline convention, for example, a pattern can start with:
+
+         (*ANY)(*BSR_ANYCRLF)
+
+       Inside a character class, \R matches the letter "R".
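+
+       In non-UTF-8 mode, the behaviour of \R outside a character class can
+       be modelled by a small function that reports how many bytes the
+       newline sequence at a given offset occupies. This is an illustrative
+       sketch; match_bsr() is not a PCRE function, and its anycrlf flag
+       stands in for PCRE_BSR_ANYCRLF or (*BSR_ANYCRLF).
+
```c
#include <stddef.h>

/* Length of the sequence \R would match at s[pos] (0 if none), in
   non-UTF-8 mode. CRLF is always taken as a two-byte unit, mirroring the
   atomic group: \R never matches just the CR of a CRLF pair. */
size_t match_bsr(const unsigned char *s, size_t len, size_t pos, int anycrlf)
{
    if (pos >= len)
        return 0;
    unsigned char c = s[pos];
    if (c == '\r')
        return (pos + 1 < len && s[pos + 1] == '\n') ? 2 : 1;
    if (c == '\n')
        return 1;
    if (anycrlf)
        return 0;                       /* CR, LF, or CRLF only */
    return (c == 0x0B || c == 0x0C || c == 0x85) ? 1 : 0;
}
```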
+
+   Unicode character properties
+
+       When PCRE is built with Unicode character property support, three addi-
+       tional  escape sequences that match characters with specific properties
+       are available.  When not in UTF-8 mode, these sequences are  of  course
+       limited  to  testing characters whose codepoints are less than 256, but
+       they do work in this mode.  The extra escape sequences are:
+
+         \p{xx}   a character with the xx property
+         \P{xx}   a character without the xx property
+         \X       an extended Unicode sequence
+
+       The property names represented by xx above are limited to  the  Unicode
+       script names, the general category properties, and "Any", which matches
+       any character (including newline). Other properties such as "InMusical-
+       Symbols"  are  not  currently supported by PCRE. Note that \P{Any} does
+       not match any characters, so always causes a match failure.
+
+       Sets of Unicode characters are defined as belonging to certain scripts.
+       A  character from one of these sets can be matched using a script name.
+       For example:
+
+         \p{Greek}
+         \P{Han}
+
+       Those that are not part of an identified script are lumped together  as
+       "Common". The current list of scripts is:
+
+       Arabic,  Armenian,  Balinese,  Bengali,  Bopomofo,  Braille,  Buginese,
+       Buhid,  Canadian_Aboriginal,  Cherokee,  Common,   Coptic,   Cuneiform,
+       Cypriot, Cyrillic, Deseret, Devanagari, Ethiopic, Georgian, Glagolitic,
+       Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew,  Hira-
+       gana,  Inherited,  Kannada,  Katakana,  Kharoshthi,  Khmer, Lao, Latin,
+       Limbu,  Linear_B,  Malayalam,  Mongolian,  Myanmar,  New_Tai_Lue,  Nko,
+       Ogham,  Old_Italic,  Old_Persian, Oriya, Osmanya, Phags_Pa, Phoenician,
+       Runic,  Shavian,  Sinhala,  Syloti_Nagri,  Syriac,  Tagalog,  Tagbanwa,
+       Tai_Le, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh, Ugaritic, Yi.
+
+       Each  character has exactly one general category property, specified by
+       a two-letter abbreviation. For compatibility with Perl, negation can be
+       specified  by  including a circumflex between the opening brace and the
+       property name. For example, \p{^Lu} is the same as \P{Lu}.
+
+       If only one letter is specified with \p or \P, it includes all the gen-
+       eral  category properties that start with that letter. In this case, in
+       the absence of negation, the curly brackets in the escape sequence  are
+       optional; these two examples have the same effect:
+
+         \p{L}
+         \pL
+
+       The following general category property codes are supported:
+
+         C     Other
+         Cc    Control
+         Cf    Format
+         Cn    Unassigned
+         Co    Private use
+         Cs    Surrogate
+
+         L     Letter
+         Ll    Lower case letter
+         Lm    Modifier letter
+         Lo    Other letter
+         Lt    Title case letter
+         Lu    Upper case letter
+
+         M     Mark
+         Mc    Spacing mark
+         Me    Enclosing mark
+         Mn    Non-spacing mark
+
+         N     Number
+         Nd    Decimal number
+         Nl    Letter number
+         No    Other number
+
+         P     Punctuation
+         Pc    Connector punctuation
+         Pd    Dash punctuation
+         Pe    Close punctuation
+         Pf    Final punctuation
+         Pi    Initial punctuation
+         Po    Other punctuation
+         Ps    Open punctuation
+
+         S     Symbol
+         Sc    Currency symbol
+         Sk    Modifier symbol
+         Sm    Mathematical symbol
+         So    Other symbol
+
+         Z     Separator
+         Zl    Line separator
+         Zp    Paragraph separator
+         Zs    Space separator
+
+       The  special property L& is also supported: it matches a character that
+       has the Lu, Ll, or Lt property, in other words, a letter  that  is  not
+       classified as a modifier or "other".
+
+       The  Cs  (Surrogate)  property  applies only to characters in the range
+       U+D800 to U+DFFF. Such characters are not valid in UTF-8  strings  (see
+       RFC 3629) and so cannot be tested by PCRE, unless UTF-8 validity check-
+       ing has been turned off (see the discussion  of  PCRE_NO_UTF8_CHECK  in
+       the pcreapi page). Perl does not support the Cs property.
+
+       The  long  synonyms  for  property  names  that  Perl supports (such as
+       \p{Letter}) are not supported by PCRE, nor is it  permitted  to  prefix
+       any of these properties with "Is".
+
+       No character that is in the Unicode table has the Cn (unassigned) prop-
+       erty.  Instead, this property is assumed for any code point that is not
+       in the Unicode table.
+
+       Specifying  caseless  matching  does not affect these escape sequences.
+       For example, \p{Lu} always matches only upper case letters.
+
+       The \X escape matches any number of Unicode  characters  that  form  an
+       extended Unicode sequence. \X is equivalent to
+
+         (?>\PM\pM*)
+
+       That  is,  it matches a character without the "mark" property, followed
+       by zero or more characters with the "mark"  property,  and  treats  the
+       sequence  as  an  atomic group (see below).  Characters with the "mark"
+       property are typically accents that  affect  the  preceding  character.
+       None  of  them  have  codepoints less than 256, so in non-UTF-8 mode \X
+       matches any one character.
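+
+       The structure of (?>\PM\pM*) can be shown with a sketch that works
+       on an array of code points. Real \X consults PCRE's Unicode tables;
+       to stay self-contained, the sketch assumes that only the Combining
+       Diacritical Marks block (U+0300 to U+036F) has the "mark" property,
+       which is a deliberate simplification. The function names are
+       hypothetical.
+
```c
#include <stddef.h>
#include <stdint.h>

/* Simplification: treat only U+0300..U+036F as marks. */
static int is_mark_subset(uint32_t c)
{
    return c >= 0x0300 && c <= 0x036F;
}

/* Number of code points \X would consume at cps[pos], or 0 if it fails. */
size_t match_extended_sequence(const uint32_t *cps, size_t n, size_t pos)
{
    if (pos >= n || is_mark_subset(cps[pos]))
        return 0;                              /* \PM must come first */
    size_t i = pos + 1;
    while (i < n && is_mark_subset(cps[i]))    /* then \pM* */
        i++;
    return i - pos;                            /* taken as one atomic unit */
}
```
+
+       For example, "e" followed by U+0301 and U+0302 is consumed as one
+       unit of three code points, while an attempt starting at a mark fails.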
+
+       Matching characters by Unicode property is not fast, because  PCRE  has
+       to  search  a  structure  that  contains data for over fifteen thousand
+       characters. That is why the traditional escape sequences such as \d and
+       \w do not use Unicode properties in PCRE.
+
+   Resetting the match start
+
+       The escape sequence \K, which is a Perl 5.10 feature, causes any previ-
+       ously matched characters not  to  be  included  in  the  final  matched
+       sequence. For example, the pattern:
+
+         foo\Kbar
+
+       matches  "foobar",  but reports that it has matched "bar". This feature
+       is similar to a lookbehind assertion (described  below).   However,  in
+       this case, the part of the subject before the real match need not be of
+       fixed length, as it must be in a lookbehind assertion. The use of \K does
+       not  interfere  with  the setting of captured substrings.  For example,
+       when the pattern
+
+         (foo)\Kbar
+
+       matches "foobar", the first substring is still set to "foo".
+
+   Simple assertions
+
+       The final use of backslash is for certain simple assertions. An  asser-
+       tion  specifies a condition that has to be met at a particular point in
+       a match, without consuming any characters from the subject string.  The
+       use  of subpatterns for more complicated assertions is described below.
+       The backslashed assertions are:
+
+         \b     matches at a word boundary
+         \B     matches when not at a word boundary
+         \A     matches at the start of the subject
+         \Z     matches at the end of the subject
+                 also matches before a newline at the end of the subject
+         \z     matches only at the end of the subject
+         \G     matches at the first matching position in the subject
+
+       These assertions may not appear in character classes (but note that  \b
+       has a different meaning, namely the backspace character, inside a char-
+       acter class).
+
+       A word boundary is a position in the subject string where  the  current
+       character  and  the previous character do not both match \w or \W (i.e.
+       one matches \w and the other matches \W), or the start or  end  of  the
+       string if the first or last character matches \w, respectively. Neither
+       PCRE nor Perl has a separate "start of word" or "end of word"
+       metasequence. However, whatever follows \b normally determines which it is.
+       For example, the fragment \ba matches "a" at the start of a word.
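+
+       The word boundary definition can be sketched in C as follows. The
+       sketch assumes the default tables, so that \w is ASCII letters,
+       digits, and underscore; at_word_boundary() is an illustrative name.
+       \B is simply its negation.
+
```c
#include <stddef.h>

/* \w under the default tables (assumption: ASCII only). */
static int is_word(unsigned char c)
{
    return c == '_' || (c >= '0' && c <= '9') ||
           (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/* 1 if \b matches at offset pos (0 <= pos <= len) of s. Positions
   outside the string count as non-word, which yields the start and end
   of string rule. */
int at_word_boundary(const unsigned char *s, size_t len, size_t pos)
{
    int before = pos > 0 && is_word(s[pos - 1]);
    int after = pos < len && is_word(s[pos]);
    return before != after;
}
```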
+
+       The \A, \Z, and \z assertions differ from  the  traditional  circumflex
+       and dollar (described in the next section) in that they only ever match
+       at the very start and end of the subject string, whatever  options  are
+       set.  Thus,  they are independent of multiline mode. These three asser-
+       tions are not affected by the PCRE_NOTBOL or PCRE_NOTEOL options, which
+       affect  only the behaviour of the circumflex and dollar metacharacters.
+       However, if the startoffset argument of pcre_exec() is non-zero,  indi-
+       cating that matching is to start at a point other than the beginning of
+       the subject, \A can never match. The difference between \Z  and  \z  is
+       that \Z matches before a newline at the end of the string as well as at
+       the very end, whereas \z matches only at the end.
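+
+       Treating each assertion as a test on an offset into the subject makes
+       the differences concrete. This sketch assumes that the newline
+       convention is a single LF; start_offset stands for the startoffset
+       argument of pcre_exec(), and the function names are illustrative.
+       None of them depends on PCRE_MULTILINE, PCRE_NOTBOL, or PCRE_NOTEOL.
+
```c
#include <stddef.h>

/* \A: only at the very start, and never when matching began at a
   non-zero startoffset. */
int at_A(size_t pos, size_t start_offset)
{
    return start_offset == 0 && pos == 0;
}

/* \Z: at the end, or before a newline that is the last character. */
int at_Z(const char *s, size_t len, size_t pos)
{
    return pos == len || (len > 0 && pos == len - 1 && s[pos] == '\n');
}

/* \z: only at the very end. */
int at_z(size_t len, size_t pos)
{
    return pos == len;
}
```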
+
+       The \G assertion is true only when the current matching position is  at
+       the  start point of the match, as specified by the startoffset argument
+       of pcre_exec(). It differs from \A when the  value  of  startoffset  is
+       non-zero.  By calling pcre_exec() multiple times with appropriate argu-
+       ments, you can mimic Perl's /g option, and it is in this kind of imple-
+       mentation where \G can be useful.
+
+       Note,  however,  that  PCRE's interpretation of \G, as the start of the
+       current match, is subtly different from Perl's, which defines it as the
+       end  of  the  previous  match. In Perl, these can be different when the
+       previously matched string was empty. Because PCRE does just  one  match
+       at a time, it cannot reproduce this behaviour.
+
+       If  all  the alternatives of a pattern begin with \G, the expression is
+       anchored to the starting match position, and the "anchored" flag is set
+       in the compiled regular expression.
+
+
+CIRCUMFLEX AND DOLLAR
+
+       Outside a character class, in the default matching mode, the circumflex
+       character is an assertion that is true only  if  the  current  matching
+       point  is  at the start of the subject string. If the startoffset argu-
+       ment of pcre_exec() is non-zero, circumflex  can  never  match  if  the
+       PCRE_MULTILINE  option  is  unset. Inside a character class, circumflex
+       has an entirely different meaning (see below).
+
+       Circumflex need not be the first character of the pattern if  a  number
+       of  alternatives are involved, but it should be the first thing in each
+       alternative in which it appears if the pattern is ever  to  match  that
+       branch.  If all possible alternatives start with a circumflex, that is,
+       if the pattern is constrained to match only at the start  of  the  sub-
+       ject,  it  is  said  to be an "anchored" pattern. (There are also other
+       constructs that can cause a pattern to be anchored.)
+
+       A dollar character is an assertion that is true  only  if  the  current
+       matching  point  is  at  the  end of the subject string, or immediately
+       before a newline at the end of the string (by default). Dollar need not
+       be  the  last  character of the pattern if a number of alternatives are
+       involved, but it should be the last item in  any  branch  in  which  it
+       appears. Dollar has no special meaning in a character class.
+
+       The  meaning  of  dollar  can be changed so that it matches only at the
+       very end of the string, by setting the  PCRE_DOLLAR_ENDONLY  option  at
+       compile time. This does not affect the \Z assertion.
+
+       The meanings of the circumflex and dollar characters are changed if the
+       PCRE_MULTILINE option is set. When  this  is  the  case,  a  circumflex
+       matches  immediately after internal newlines as well as at the start of
+       the subject string. It does not match after a  newline  that  ends  the
+       string.  A dollar matches before any newlines in the string, as well as
+       at the very end, when PCRE_MULTILINE is set. When newline is  specified
+       as  the  two-character  sequence CRLF, isolated CR and LF characters do
+       not indicate newlines.
+
+       For example, the pattern /^abc$/ matches the subject string  "def\nabc"
+       (where  \n  represents a newline) in multiline mode, but not otherwise.
+       Consequently, patterns that are anchored in single  line  mode  because
+       all  branches  start  with  ^ are not anchored in multiline mode, and a
+       match for circumflex is  possible  when  the  startoffset  argument  of
+       pcre_exec()  is  non-zero. The PCRE_DOLLAR_ENDONLY option is ignored if
+       PCRE_MULTILINE is set.
+
+       Note that the sequences \A, \Z, and \z can be used to match  the  start
+       and  end of the subject in both modes, and if all branches of a pattern
+       start with \A it is always anchored, whether or not  PCRE_MULTILINE  is
+       set.
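+
+       The following sketch restates these rules as predicates on an offset
+       into the whole subject string. It assumes that a newline is a single
+       LF and ignores PCRE_NOTBOL and PCRE_NOTEOL; the function names are
+       illustrative.
+
```c
#include <stddef.h>

/* ^ at offset pos: at the start, or (multiline) after an internal
   newline, but not after a newline that ends the string. */
int circumflex_matches(const char *s, size_t len, size_t pos, int multiline)
{
    if (pos == 0)
        return 1;
    return multiline && pos < len && s[pos - 1] == '\n';
}

/* $ at offset pos. endonly (PCRE_DOLLAR_ENDONLY) is ignored when
   multiline is set. */
int dollar_matches(const char *s, size_t len, size_t pos,
                   int multiline, int endonly)
{
    if (pos == len)
        return 1;
    if (multiline)
        return s[pos] == '\n';                 /* before any newline */
    if (endonly)
        return 0;
    return pos == len - 1 && s[pos] == '\n';   /* before a final newline */
}
```
+
+       With multiline set, ^ holds at offset 4 and $ at offset 7 of
+       "def\nabc", which is why /^abc$/ matches it only in multiline mode.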
+
+
+FULL STOP (PERIOD, DOT)
+
+       Outside a character class, a dot in the pattern matches any one charac-
+       ter in the subject string except (by default) a character  that  signi-
+       fies  the  end  of  a line. In UTF-8 mode, the matched character may be
+       more than one byte long.
+
+       When a line ending is defined as a single character, dot never  matches
+       that  character; when the two-character sequence CRLF is used, dot does
+       not match CR if it is immediately followed  by  LF,  but  otherwise  it
+       matches  all characters (including isolated CRs and LFs). When any Uni-
+       code line endings are being recognized, dot does not match CR or LF  or
+       any of the other line ending characters.
+
+       The  behaviour  of  dot  with regard to newlines can be changed. If the
+       PCRE_DOTALL option is set, a dot matches  any  one  character,  without
+       exception. If the two-character sequence CRLF is present in the subject
+       string, it takes two dots to match it.
+
+       The handling of dot is entirely independent of the handling of  circum-
+       flex  and  dollar,  the  only relationship being that they both involve
+       newlines. Dot has no special meaning in a character class.
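+
+       The dependence of dot on the newline convention can be summarized by
+       the sketch below, for non-UTF-8 mode. The enum values are
+       illustrative stand-ins for the newline settings, not PCRE constants,
+       and dotall corresponds to PCRE_DOTALL.
+
```c
#include <stddef.h>

enum newline_mode { NL_LF, NL_CR, NL_CRLF, NL_ANY };

/* 1 if '.' matches the byte at s[pos] (non-UTF-8 mode). */
int dot_matches(const unsigned char *s, size_t len, size_t pos,
                enum newline_mode nl, int dotall)
{
    if (pos >= len)
        return 0;                 /* no character left to consume */
    if (dotall)
        return 1;                 /* any one byte, without exception */
    unsigned char c = s[pos];
    switch (nl) {
    case NL_LF:
        return c != '\n';
    case NL_CR:
        return c != '\r';
    case NL_CRLF:                 /* only a CR that starts CRLF is excluded */
        return !(c == '\r' && pos + 1 < len && s[pos + 1] == '\n');
    case NL_ANY:                  /* LF, VT, FF, CR, NEL */
        return !(c == '\n' || c == 0x0B || c == 0x0C ||
                 c == '\r' || c == 0x85);
    }
    return 1;
}
```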
+
+
+MATCHING A SINGLE BYTE
+
+       Outside a character class, the escape sequence \C matches any one byte,
+       both  in  and  out  of  UTF-8 mode. Unlike a dot, it always matches any
+       line-ending characters. The feature is provided in  Perl  in  order  to
+       match  individual bytes in UTF-8 mode. Because it breaks up UTF-8 char-
+       acters into individual bytes, what remains in the string may be a  mal-
+       formed  UTF-8  string.  For this reason, the \C escape sequence is best
+       avoided.
+
+       PCRE does not allow \C to appear in  lookbehind  assertions  (described
+       below),  because  in UTF-8 mode this would make it impossible to calcu-
+       late the length of the lookbehind.
+
+
+SQUARE BRACKETS AND CHARACTER CLASSES
+
+       An opening square bracket introduces a character class, terminated by a
+       closing square bracket. A closing square bracket on its own is not spe-
+       cial by default.  However, if the PCRE_JAVASCRIPT_COMPAT option is set,
+       a lone closing square bracket causes a compile-time error. If a closing
+       square bracket is required as a member of the class, it should  be  the
+       first  data  character  in  the  class (after an initial circumflex, if
+       present) or escaped with a backslash.
+
+       A character class matches a single character in the subject.  In  UTF-8
+       mode, the character may be more than one byte long. A matched character
+       must be in the set of characters defined by the class, unless the first
+       character  in  the  class definition is a circumflex, in which case the
+       subject character must not be in the set defined by  the  class.  If  a
+       circumflex  is actually required as a member of the class, ensure it is
+       not the first character, or escape it with a backslash.
+
+       For example, the character class [aeiou] matches any lower case  vowel,
+       while  [^aeiou]  matches  any character that is not a lower case vowel.
+       Note that a circumflex is just a convenient notation for specifying the
+       characters  that  are in the class by enumerating those that are not. A
+       class that starts with a circumflex is not an assertion; it still  con-
+       sumes  a  character  from the subject string, and therefore it fails if
+       the current pointer is at the end of the string.
+
+       In UTF-8 mode, characters with values greater than 255 can be  included
+       in  a  class as a literal string of bytes, or by using the \x{ escaping
+       mechanism.
+
+       When caseless matching is set, any letters in a  class  represent  both
+       their  upper  case  and lower case versions, so for example, a caseless
+       [aeiou] matches "A" as well as "a", and a caseless  [^aeiou]  does  not
+       match  "A", whereas a caseful version would. In UTF-8 mode, PCRE always
+       understands the concept of case for characters whose  values  are  less
+       than  128, so caseless matching is always possible. For characters with
+       higher values, the concept of case is supported  if  PCRE  is  compiled
+       with  Unicode  property support, but not otherwise.  If you want to use
+       caseless matching in UTF-8 mode for characters 128 and above, you must
+       ensure  that  PCRE is compiled with Unicode property support as well as
+       with UTF-8 support.
+
+       Characters that might indicate line breaks are  never  treated  in  any
+       special  way  when  matching  character  classes,  whatever line-ending
+       sequence is in  use,  and  whatever  setting  of  the  PCRE_DOTALL  and
+       PCRE_MULTILINE options is used. A class such as [^a] always matches one
+       of these characters.
+
+       The minus (hyphen) character can be used to specify a range of  charac-
+       ters  in  a  character  class.  For  example,  [d-m] matches any letter
+       between d and m, inclusive. If a  minus  character  is  required  in  a
+       class,  it  must  be  escaped  with a backslash or appear in a position
+       where it cannot be interpreted as indicating a range, typically as  the
+       first or last character in the class.
+
+       It is not possible to have the literal character "]" as the end charac-
+       ter of a range. A pattern such as [W-]46] is interpreted as a class  of
+       two  characters ("W" and "-") followed by a literal string "46]", so it
+       would match "W46]" or "-46]". However, if the "]"  is  escaped  with  a
+       backslash  it is interpreted as the end of range, so [W-\]46] is inter-
+       preted as a class containing a range followed by two other  characters.
+       The  octal or hexadecimal representation of "]" can also be used to end
+       a range.
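+
+       A short Python sketch of the hyphen and "]" rules. It uses the
+       standard re module as a stand-in for PCRE and assumes re parses these
+       class forms the same way. is_match is a hypothetical helper.
+
```python
import re


def is_match(pattern, subject):
    """Hypothetical helper: True if pattern matches the whole subject."""
    return re.fullmatch(pattern, subject) is not None


# [d-m] is a range: any letter from d to m inclusive.
print(is_match(r"[d-m]", "g"), is_match(r"[d-m]", "n"))            # True False
# A hyphen first or last in a class is an ordinary member.
print(is_match(r"[-az]", "-"), is_match(r"[az-]", "-"))            # True True
# "]" straight after "-" closes the class: [W-] is {"W", "-"}, then "46]".
print(is_match(r"[W-]46]", "W46]"), is_match(r"[W-]46]", "-46]"))  # True True
# An escaped "]" ends the range instead: W..] plus the characters 4 and 6.
print(is_match(r"[W-\]46]", "Z"), is_match(r"[W-\]46]", "6"))      # True True
```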
+
+       Ranges operate in the collating sequence of character values. They  can
+       also   be  used  for  characters  specified  numerically,  for  example
+       [\000-\037]. In UTF-8 mode, ranges can include characters whose  values
+       are greater than 255, for example [\x{100}-\x{2ff}].
+
+       If a range that includes letters is used when caseless matching is set,
+       it matches the letters in either case. For example, [W-c] is equivalent
+       to  [][\\^_`wxyzabc],  matched  caselessly,  and  in non-UTF-8 mode, if
+       character tables for a French locale are in  use,  [\xc8-\xcb]  matches
+       accented  E  characters in both cases. In UTF-8 mode, PCRE supports the
+       concept of case for characters with values greater than 128  only  when
+       it is compiled with Unicode property support.
+
+       The  character types \d, \D, \p, \P, \s, \S, \w, and \W may also appear
+       in a character class, and add the characters that  they  match  to  the
+       class. For example, [\dABCDEF] matches any hexadecimal digit. A circum-
+       flex can conveniently be used with the upper case  character  types  to
+       specify  a  more  restricted  set of characters than the matching lower
+       case type. For example, the class [^\W_] matches any letter  or  digit,
+       but not underscore.
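+
+       Both classes from this paragraph can be checked with Python's re
+       module. This assumes ASCII subjects, because for str patterns
+       Python's \d and \w also match non-ASCII digits and letters.
+
```python
import re

# \d inside a class adds every digit; together with A-F this matches one
# hex digit (upper case letters only, since only those are listed).
hex_digit = re.compile(r"[\dABCDEF]")
# [^\W_]: not a non-word character and not underscore, so letters and digits.
letter_or_digit = re.compile(r"[^\W_]")

print([ch for ch in "7Cgc" if hex_digit.fullmatch(ch)])        # ['7', 'C']
print([ch for ch in "a5_-" if letter_or_digit.fullmatch(ch)])  # ['a', '5']
```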
+
+       The  only  metacharacters  that are recognized in character classes are
+       backslash, hyphen (only where it can be  interpreted  as  specifying  a
+       range),  circumflex  (only  at the start), opening square bracket (only
+       when it can be interpreted as introducing a POSIX class name - see  the
+       next  section),  and  the  terminating closing square bracket. However,
+       escaping other non-alphanumeric characters does no harm.
+
+
+POSIX CHARACTER CLASSES
+
+       Perl supports the POSIX notation for character classes. This uses names
+       enclosed  by  [: and :] within the enclosing square brackets. PCRE also
+       supports this notation. For example,
+
+         [01[:alpha:]%]
+
+       matches "0", "1", any alphabetic character, or "%". The supported class
+       names are
+
+         alnum    letters and digits
+         alpha    letters
+         ascii    character codes 0 - 127
+         blank    space or tab only
+         cntrl    control characters
+         digit    decimal digits (same as \d)
+         graph    printing characters, excluding space
+         lower    lower case letters
+         print    printing characters, including space
+         punct    printing characters, excluding letters and digits
+         space    white space (not quite the same as \s)
+         upper    upper case letters
+         word     "word" characters (same as \w)
+         xdigit   hexadecimal digits
+
+       The  "space" characters are HT (9), LF (10), VT (11), FF (12), CR (13),
+       and space (32). Notice that this list includes the VT  character  (code
+       11). This makes "space" different to \s, which does not include VT (for
+       Perl compatibility).
+
+       The name "word" is a Perl extension, and "blank"  is  a  GNU  extension
+       from  Perl  5.8. Another Perl extension is negation, which is indicated
+       by a ^ character after the colon. For example,
+
+         [12[:^digit:]]
+
+       matches "1", "2", or any non-digit. PCRE (and Perl) also recognize  the
+       POSIX syntax [.ch.] and [=ch=] where "ch" is a "collating element", but
+       these are not supported, and an error is given if they are encountered.
+
+       In UTF-8 mode, characters with values greater than 128 do not match any
+       of the POSIX character classes.
+
+
+VERTICAL BAR
+
+       Vertical  bar characters are used to separate alternative patterns. For
+       example, the pattern
+
+         gilbert|sullivan
+
+       matches either "gilbert" or "sullivan". Any number of alternatives  may
+       appear,  and  an  empty  alternative  is  permitted (matching the empty
+       string). The matching process tries each alternative in turn, from left
+       to  right, and the first one that succeeds is used. If the alternatives
+       are within a subpattern (defined below), "succeeds" means matching  the
+       rest of the main pattern as well as the alternative in the subpattern.
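+
+       The left-to-right rule can be seen with Python's re module, which
+       is assumed here to backtrack through alternatives the same way PCRE
+       does:
+
```python
import re

print(re.fullmatch(r"gilbert|sullivan", "sullivan") is not None)  # True
# Alternatives are tried left to right and the first success is kept.
print(re.match(r"a|ab", "ab").group())          # a
# Inside a group, success includes the rest of the pattern: "a" is tried
# first, but "c" then fails against "b", so the "ab" branch is used.
print(re.match(r"(a|ab)c", "abc").group(1))     # ab
# An empty alternative is allowed and matches the empty string.
print(re.fullmatch(r"x(y|)", "x") is not None)  # True
```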
+
+
+INTERNAL OPTION SETTING
+
+       The  settings  of  the  PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, and
+       PCRE_EXTENDED options (which are Perl-compatible) can be  changed  from
+       within  the  pattern  by  a  sequence  of  Perl option letters enclosed
+       between "(?" and ")".  The option letters are
+
+         i  for PCRE_CASELESS
+         m  for PCRE_MULTILINE
+         s  for PCRE_DOTALL
+         x  for PCRE_EXTENDED
+
+       For example, (?im) sets caseless, multiline matching. It is also possi-
+       ble to unset these options by preceding the letter with a hyphen, and a
+       combined setting and unsetting such as (?im-sx), which sets  PCRE_CASE-
+       LESS  and PCRE_MULTILINE while unsetting PCRE_DOTALL and PCRE_EXTENDED,
+       is also permitted. If a  letter  appears  both  before  and  after  the
+       hyphen, the option is unset.
+
+       The  PCRE-specific options PCRE_DUPNAMES, PCRE_UNGREEDY, and PCRE_EXTRA
+       can be changed in the same way as the Perl-compatible options by  using
+       the characters J, U and X respectively.
+
+       When  one  of  these  option  changes occurs at top level (that is, not
+       inside subpattern parentheses), the change applies to the remainder  of
+       the pattern that follows. If the change is placed right at the start of
+       a pattern, PCRE extracts it into the global options (and it will there-
+       fore show up in data extracted by the pcre_fullinfo() function).
+
+       An  option  change  within a subpattern (see below for a description of
+       subpatterns) affects only that part of the current pattern that follows
+       it, so
+
+         (a(?i)b)c
+
+       matches abc and aBc and no other strings (assuming PCRE_CASELESS is not
+       used).  By this means, options can be made to have  different  settings
+       in  different parts of the pattern. Any changes made in one alternative
+       do carry on into subsequent branches within the  same  subpattern.  For
+       example,
+
+         (a(?i)b|c)
+
+       matches  "ab",  "aB",  "c",  and "C", even though when matching "C" the
+       first branch is abandoned before the option setting.  This  is  because
+       the  effects  of option settings happen at compile time. There would be
+       some very weird behaviour otherwise.
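+
+       Python's re module cannot switch an option on part-way through a
+       group (recent versions accept (?i) only at the very start of a pat-
+       tern), so this sketch translates the two examples into scoped groups
+       of the form (?i:...). The translation is our own assumption of an
+       equivalent pattern, not PCRE syntax.
+
```python
import re


def is_match(pattern, subject):
    """Hypothetical helper: True if pattern matches the whole subject."""
    return re.fullmatch(pattern, subject) is not None


# PCRE (a(?i)b)c: only the "b" is caseless.
abc = r"(a(?i:b))c"
print([s for s in ["abc", "aBc", "Abc", "abC"] if is_match(abc, s)])
# ['abc', 'aBc']

# PCRE (a(?i)b|c): the setting carries on into the "c" branch.
alt = r"(a(?i:b)|(?i:c))"
print([s for s in ["ab", "aB", "c", "C", "Ab"] if is_match(alt, s)])
# ['ab', 'aB', 'c', 'C']
```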
+
+       Note: There are other PCRE-specific options that  can  be  set  by  the
+       application  when  the  compile  or match functions are called. In some
+       cases the pattern can contain special leading sequences such as (*CRLF)
+       to  override  what  the application has set or what has been defaulted.
+       Details are given in the section entitled  "Newline  sequences"  above.
+       There  is  also  the  (*UTF8)  leading sequence that can be used to set
+       UTF-8 mode; this is equivalent to setting the PCRE_UTF8 option.
+
+
+SUBPATTERNS
+
+       Subpatterns are delimited by parentheses (round brackets), which can be
+       nested.  Turning part of a pattern into a subpattern does two things:
+
+       1. It localizes a set of alternatives. For example, the pattern
+
+         cat(aract|erpillar|)
+
+       matches  one  of the words "cat", "cataract", or "caterpillar". Without
+       the parentheses, it would match  "cataract",  "erpillar"  or  an  empty
+       string.
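+
+       The difference the parentheses make can be checked with Python's re
+       module, used here as a stand-in for PCRE:
+
```python
import re

with_group = r"cat(aract|erpillar|)"
without_group = r"cataract|erpillar|"

words = ["cat", "cataract", "caterpillar", "erpillar", ""]
print([w for w in words if re.fullmatch(with_group, w)])
# ['cat', 'cataract', 'caterpillar']
print([w for w in words if re.fullmatch(without_group, w)])
# ['cataract', 'erpillar', '']
```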
+
+       2.  It  sets  up  the  subpattern as a capturing subpattern. This means
+       that, when the whole pattern  matches,  that  portion  of  the  subject
+       string that matched the subpattern is passed back to the caller via the
+       ovector argument of pcre_exec(). Opening parentheses are  counted  from
+       left  to  right  (starting  from 1) to obtain numbers for the capturing
+       subpatterns.
+
+       For example, if the string "the red king" is matched against  the  pat-
+       tern
+
+         the ((red|white) (king|queen))
+
+       the captured substrings are "red king", "red", and "king", and are num-
+       bered 1, 2, and 3, respectively.
+
+       The fact that plain parentheses fulfil  two  functions  is  not  always
+       helpful.   There are often times when a grouping subpattern is required
+       without a capturing requirement. If an opening parenthesis is  followed
+       by  a question mark and a colon, the subpattern does not do any captur-
+       ing, and is not counted when computing the  number  of  any  subsequent
+       capturing  subpatterns. For example, if the string "the white queen" is
+       matched against the pattern
+
+         the ((?:red|white) (king|queen))
+
+       the captured substrings are "white queen" and "queen", and are numbered
+       1 and 2. The maximum number of capturing subpatterns is 65535.
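+
+       A sketch of the numbering rules using Python's re module. Its
+       groups() method plays roughly the role that the ovector plays for
+       pcre_exec(); the two APIs are otherwise unrelated.
+
```python
import re

# Each plain "(" opens a capturing group, numbered from the left from 1.
m = re.match(r"the ((red|white) (king|queen))", "the red king")
print(m.groups())  # ('red king', 'red', 'king')

# "(?:" groups without capturing, so it takes no number.
m = re.match(r"the ((?:red|white) (king|queen))", "the white queen")
print(m.groups())  # ('white queen', 'queen')
```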
+
+       As  a  convenient shorthand, if any option settings are required at the
+       start of a non-capturing subpattern,  the  option  letters  may  appear
+       between the "?" and the ":". Thus the two patterns
+
+         (?i:saturday|sunday)
+         (?:(?i)saturday|sunday)
+
+       match exactly the same set of strings. Because alternative branches are
+       tried from left to right, and options are not reset until  the  end  of
+       the  subpattern is reached, an option setting in one branch does affect
+       subsequent branches, so the above patterns match "SUNDAY"  as  well  as
+       "Saturday".
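+
+       Python's re module accepts only the first of these two forms, so
+       the check below uses it and assumes re treats the scoped flag the way
+       PCRE does:
+
```python
import re

pattern = re.compile(r"(?i:saturday|sunday)")
print([d for d in ["SUNDAY", "Saturday", "sunday", "Monday"] if pattern.fullmatch(d)])
# ['SUNDAY', 'Saturday', 'sunday']
```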
+
+
+DUPLICATE SUBPATTERN NUMBERS
+
+       Perl 5.10 introduced a feature whereby each alternative in a subpattern
+       uses the same numbers for its capturing parentheses. Such a  subpattern
+       starts  with (?| and is itself a non-capturing subpattern. For example,
+       consider this pattern:
+
+         (?|(Sat)ur|(Sun))day
+
+       Because the two alternatives are inside a (?| group, both sets of  cap-
+       turing  parentheses  are  numbered one. Thus, when the pattern matches,
+       you can look at captured substring number  one,  whichever  alternative
+       matched.  This  construct  is useful when you want to capture part, but
+       not all, of one of a number of alternatives. Inside a (?| group, paren-
+       theses  are  numbered as usual, but the number is reset at the start of
+       each branch. The numbers of any capturing buffers that follow the  sub-
+       pattern  start after the highest number used in any branch. The follow-
+       ing example is taken from the Perl documentation.  The  numbers  under-
+       neath show in which buffer the captured content will be stored.
+
+         # before  ---------------branch-reset----------- after
+         / ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
+         # 1            2         2  3        2     3     4
+
+       A  backreference  to  a  numbered subpattern uses the most recent value
+       that is set for that number by any subpattern.  The  following  pattern
+       matches "abcabc" or "defdef":
+
+         /(?|(abc)|(def))\1/
+
+       In  contrast, a recursive or "subroutine" call to a numbered subpattern
+       always refers to the first one in the pattern with  the  given  number.
+       The following pattern matches "abcabc" or "defabc":
+
+         /(?|(abc)|(def))(?1)/
+
+       If  a condition test for a subpattern's having matched refers to a non-
+       unique number, the test is true if any of the subpatterns of that  num-
+       ber have matched.
+
+       An  alternative approach to using this "branch reset" feature is to use
+       duplicate named subpatterns, as described in the next section.
+
+
+NAMED SUBPATTERNS
+
+       Identifying capturing parentheses by number is simple, but  it  can  be
+       very  hard  to keep track of the numbers in complicated regular expres-
+       sions. Furthermore, if an  expression  is  modified,  the  numbers  may
+       change.  To help with this difficulty, PCRE supports the naming of sub-
+       patterns. This feature was not added to Perl until release 5.10. Python
+       had  the  feature earlier, and PCRE introduced it at release 4.0, using
+       the Python syntax. PCRE now supports both the Perl and the Python  syn-
+       tax.  Perl  allows  identically  numbered subpatterns to have different
+       names, but PCRE does not.
+
+       In PCRE, a subpattern can be named in one of three  ways:  (?<name>...)
+       or  (?'name'...)  as in Perl, or (?P<name>...) as in Python. References
+       to capturing parentheses from other parts of the pattern, such as back-
+       references,  recursion,  and conditions, can be made by name as well as
+       by number.
+
+       Names consist of up to  32  alphanumeric  characters  and  underscores.
+       Named  capturing  parentheses  are  still  allocated numbers as well as
+       names, exactly as if the names were not present. The PCRE API  provides
+       function calls for extracting the name-to-number translation table from
+       a compiled pattern. There is also a convenience function for extracting
+       a captured substring by name.
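+
+       The sketch below uses the Python (?P<name>...) syntax in Python's
+       own re module. Its groupindex attribute is a name-to-number table
+       comparable to the one the PCRE API exposes. The date pattern is an
+       illustrative example only.
+
```python
import re

date = re.compile(r"(?P<year>\d{4})-(?P<month>\d\d)-(?P<day>\d\d)")
m = date.match("2009-04-11")

# Named groups are still numbered, exactly as if the names were absent.
print(m.group("month"), m.group(2))  # 04 04
# Name-to-number translation table for the compiled pattern.
print(dict(date.groupindex))         # {'year': 1, 'month': 2, 'day': 3}
```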
+
+       By  default, a name must be unique within a pattern, but it is possible
+       to relax this constraint by setting the PCRE_DUPNAMES option at compile
+       time.  (Duplicate  names are also always permitted for subpatterns with
+       the same number, set up as described in the previous  section.)  Dupli-
+       cate  names  can  be useful for patterns where only one instance of the
+       named parentheses can match. Suppose you want to match the  name  of  a
+       weekday,  either as a 3-letter abbreviation or as the full name, and in
+       both cases you want to extract the abbreviation. This pattern (ignoring
+       the line breaks) does the job:
+
+         (?<DN>Mon|Fri|Sun)(?:day)?|
+         (?<DN>Tue)(?:sday)?|
+         (?<DN>Wed)(?:nesday)?|
+         (?<DN>Thu)(?:rsday)?|
+         (?<DN>Sat)(?:urday)?
+
+       There  are  five capturing substrings, but only one is ever set after a
+       match.  (An alternative way of solving this problem is to use a "branch
+       reset" subpattern, as described in the previous section.)
+
+       The  convenience  function  for extracting the data by name returns the
+       substring for the first (and in this example, the only)  subpattern  of
+       that  name  that  matched.  This saves searching to find which numbered
+       subpattern it was.
+
+       If you make a backreference to a non-unique named subpattern from else-
+       where  in the pattern, the one that corresponds to the first occurrence
+       of the name is used. In the absence of duplicate numbers (see the  pre-
+       vious  section)  this  is  the one with the lowest number. If you use a
+       named reference in a condition test (see the section  about  conditions
+       below),  either  to check whether a subpattern has matched, or to check
+       for recursion, all subpatterns with the same name are  tested.  If  the
+       condition  is  true for any one of them, the overall condition is true.
+       This is the same behaviour as testing by number. For further details of
+       the interfaces for handling named subpatterns, see the pcreapi documen-
+       tation.
+
+       Warning: You cannot use different names to distinguish between two sub-
+       patterns  with  the same number because PCRE uses only the numbers when
+       matching. For this reason, an error is given at compile time if differ-
+       ent  names  are given to subpatterns with the same number. However, you
+       can give the same name to subpatterns with the same number,  even  when
+       PCRE_DUPNAMES is not set.
+
+
+REPETITION
+
+       Repetition  is  specified  by  quantifiers, which can follow any of the
+       following items:
+
+         a literal data character
+         the dot metacharacter
+         the \C escape sequence
+         the \X escape sequence (in UTF-8 mode with Unicode properties)
+         the \R escape sequence
+         an escape such as \d that matches a single character
+         a character class
+         a back reference (see next section)
+         a parenthesized subpattern (unless it is an assertion)
+         a recursive or "subroutine" call to a subpattern
+
+       The general repetition quantifier specifies a minimum and maximum  num-
+       ber  of  permitted matches, by giving the two numbers in curly brackets
+       (braces), separated by a comma. The numbers must be  less  than  65536,
+       and the first must be less than or equal to the second. For example:
+
+         z{2,4}
+
+       matches  "zz",  "zzz",  or  "zzzz". A closing brace on its own is not a
+       special character. If the second number is omitted, but  the  comma  is
+       present,  there  is  no upper limit; if the second number and the comma
+       are both omitted, the quantifier specifies an exact number of  required
+       matches. Thus
+
+         [aeiou]{3,}
+
+       matches at least 3 successive vowels, but may match many more, while
+
+         \d{8}
+
+       matches  exactly  8  digits. An opening curly bracket that appears in a
+       position where a quantifier is not allowed, or one that does not  match
+       the  syntax of a quantifier, is taken as a literal character. For exam-
+       ple, {,6} is not a quantifier, but a literal string of four characters.
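+
+       The counted forms can be tried with Python's re module. Note one
+       difference: Python reads {,6} as {0,6} rather than as literal text,
+       so that case is left out of this sketch.
+
```python
import re


def is_match(pattern, subject):
    """Hypothetical helper: True if pattern matches the whole subject."""
    return re.fullmatch(pattern, subject) is not None


print([s for s in ["z", "zz", "zzz", "zzzz", "zzzzz"] if is_match(r"z{2,4}", s)])
# ['zz', 'zzz', 'zzzz']
print(is_match(r"[aeiou]{3,}", "aeiouae"))                        # True
print(is_match(r"\d{8}", "20090411"), is_match(r"\d{8}", "2009041"))  # True False
```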
+
+       In UTF-8 mode, quantifiers apply to UTF-8  characters  rather  than  to
+       individual bytes. Thus, for example, \x{100}{2} matches two UTF-8 char-
+       acters, each of which is represented by a two-byte sequence. Similarly,
+       when Unicode property support is available, \X{3} matches three Unicode
+       extended sequences, each of which may be several bytes long  (and  they
+       may be of different lengths).
+
+       The quantifier {0} is permitted, causing the expression to behave as if
+       the previous item and the quantifier were not present. This may be use-
+       ful  for  subpatterns that are referenced as subroutines from elsewhere
+       in the pattern. Items other than subpatterns that have a {0} quantifier
+       are omitted from the compiled pattern.
+
+       For  convenience, the three most common quantifiers have single-charac-
+       ter abbreviations:
+
+         *    is equivalent to {0,}
+         +    is equivalent to {1,}
+         ?    is equivalent to {0,1}
+
+       It is possible to construct infinite loops by  following  a  subpattern
+       that can match no characters with a quantifier that has no upper limit,
+       for example:
+
+         (a?)*
+
+       Earlier versions of Perl and PCRE used to give an error at compile time
+       for  such  patterns. However, because there are cases where this can be
+       useful, such patterns are now accepted, but if any  repetition  of  the
+       subpattern  does in fact match no characters, the loop is forcibly bro-
+       ken.
+
+       By default, the quantifiers are "greedy", that is, they match  as  much
+       as  possible  (up  to  the  maximum number of permitted times), without
+       causing the rest of the pattern to fail. The classic example  of  where
+       this gives problems is in trying to match comments in C programs. These
+       appear between /* and */ and within the comment,  individual  *  and  /
+       characters  may  appear. An attempt to match C comments by applying the
+       pattern
+
+         /\*.*\*/
+
+       to the string
+
+         /* first comment */  not comment  /* second comment */
+
+       does not do what is wanted: because of the greediness of the .* item,
+       it matches the entire string, from the first /* to the last */.
+
+       However,  if  a quantifier is followed by a question mark, it ceases to
+       be greedy, and instead matches the minimum number of times possible, so
+       the pattern
+
+         /\*.*?\*/
+
+       does  the  right  thing with the C comments. The meaning of the various
+       quantifiers is not otherwise changed,  just  the  preferred  number  of
+       matches.   Do  not  confuse this use of question mark with its use as a
+       quantifier in its own right. Because it has two uses, it can  sometimes
+       appear doubled, as in
+
+         \d??\d
+
+       which matches one digit by preference, but can match two if that is the
+       only way the rest of the pattern matches.
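+
+       The C comment example and the doubled question mark behave the same
+       way in Python's re module, which serves as a stand-in here:
+
```python
import re

text = "/* first comment */  not comment  /* second comment */"

# Greedy .* runs on to the last "*/", swallowing both comments.
print(re.search(r"/\*.*\*/", text).group() == text)  # True
# Lazy .*? stops at the first "*/" that lets the match succeed.
print(re.findall(r"/\*.*?\*/", text))
# ['/* first comment */', '/* second comment */']

# \d?? prefers to match nothing, so one digit is taken unless two are needed.
print(re.match(r"\d??\d", "12").group())      # 1
print(re.fullmatch(r"\d??\d", "12").group())  # 12
```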
+
+       If the PCRE_UNGREEDY option is set (an option that is not available  in
+       Perl),  the  quantifiers are not greedy by default, but individual ones
+       can be made greedy by following them with a  question  mark.  In  other
+       words, it inverts the default behaviour.
+
+       When  a  parenthesized  subpattern  is quantified with a minimum repeat
+       count that is greater than 1 or with a limited maximum, more memory  is
+       required  for  the  compiled  pattern, in proportion to the size of the
+       minimum or maximum.
+
+       If a pattern starts with .* or .{0,} and the PCRE_DOTALL option (equiv-
+       alent  to  Perl's  /s) is set, thus allowing the dot to match newlines,
+       the pattern is implicitly anchored, because whatever  follows  will  be
+       tried  against every character position in the subject string, so there
+       is no point in retrying the overall match at  any  position  after  the
+       first.  PCRE  normally treats such a pattern as though it were preceded
+       by \A.
+
+       In cases where it is known that the subject  string  contains  no  new-
+       lines,  it  is  worth setting PCRE_DOTALL in order to obtain this opti-
+       mization, or alternatively using ^ to indicate anchoring explicitly.
+
+       However, there is one situation where the optimization cannot be  used.
+       When  .*   is  inside  capturing  parentheses that are the subject of a
+       backreference elsewhere in the pattern, a match at the start  may  fail
+       where a later one succeeds. Consider, for example:
+
+         (.*)abc\1
+
+       If  the subject is "xyz123abc123" the match point is the fourth charac-
+       ter. For this reason, such a pattern is not implicitly anchored.
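+
+       Python's re.search(), which retries at each starting position in
+       turn, shows why anchoring this pattern at the start would be wrong:
+
```python
import re

m = re.search(r"(.*)abc\1", "xyz123abc123")
print(m.start(), m.group(1))  # 3 123
```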
+
+       When a capturing subpattern is repeated, the value captured is the sub-
+       string that matched the final iteration. For example, after
+
+         (tweedle[dume]{3}\s*)+
+
+       has matched "tweedledum tweedledee" the value of the captured substring
+       is "tweedledee". However, if there are  nested  capturing  subpatterns,
+       the  corresponding captured values may have been set in previous itera-
+       tions. For example, after
+
+         /(a|(b))+/
+
+       matches "aba" the value of the second captured substring is "b".
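+
+       Both examples can be reproduced with Python's re module, assumed
+       here to keep captured values across iterations as PCRE does:
+
```python
import re

m = re.match(r"(tweedle[dume]{3}\s*)+", "tweedledum tweedledee")
print(m.group(1))   # tweedledee

# Group 2 keeps "b" from the second iteration; the third matches "a" only.
m = re.match(r"(a|(b))+", "aba")
print(m.groups())   # ('a', 'b')
```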
+
+
+ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS
+
+       With both maximizing ("greedy") and minimizing ("ungreedy"  or  "lazy")
+       repetition,  failure  of what follows normally causes the repeated item
+       to be re-evaluated to see if a different number of repeats  allows  the
+       rest  of  the pattern to match. Sometimes it is useful to prevent this,
+       either to change the nature of the match, or to cause it to fail earlier
+       than  it otherwise might, when the author of the pattern knows there is
+       no point in carrying on.
+
+       Consider, for example, the pattern \d+foo when applied to  the  subject
+       line
+
+         123456bar
+
+       After matching all 6 digits and then failing to match "foo", the normal
+       action of the matcher is to try again with only 5 digits  matching  the
+       \d+  item,  and  then  with  4,  and  so on, before ultimately failing.
+       "Atomic grouping" (a term taken from Jeffrey  Friedl's  book)  provides
+       the  means for specifying that once a subpattern has matched, it is not
+       to be re-evaluated in this way.
+
+       If we use atomic grouping for the previous example, the  matcher  gives
+       up  immediately  on failing to match "foo" the first time. The notation
+       is a kind of special parenthesis, starting with (?> as in this example:
+
+         (?>\d+)foo
+
+       This kind of parenthesis "locks up" the  part of the  pattern  it  con-
+       tains  once  it  has matched, and a failure further into the pattern is
+       prevented from backtracking into it. Backtracking past it  to  previous
+       items, however, works as normal.
+
+       An  alternative  description  is that a subpattern of this type matches
+       the string of characters that an  identical  standalone  pattern  would
+       match, if anchored at the current point in the subject string.
+
+       Atomic grouping subpatterns are not capturing subpatterns. Simple cases
+       such as the above example can be thought of as a maximizing repeat that
+       must  swallow  everything  it can. So, while both \d+ and \d+? are pre-
+       pared to adjust the number of digits they match in order  to  make  the
+       rest of the pattern match, (?>\d+) can only match an entire sequence of
+       digits.
+
+       Atomic groups in general can of course contain arbitrarily  complicated
+       subpatterns,  and  can  be  nested. However, when the subpattern for an
+       atomic group is just a single repeated item, as in the example above, a
+       simpler notation, called a "possessive quantifier", can be used. This
+       consists of an additional + character  following  a  quantifier.  Using
+       this notation, the previous example can be rewritten as
+
+         \d++foo
+
+       Note that a possessive quantifier can be used with an entire group, for
+       example:
+
+         (abc|xyz){2,3}+
+
+       Possessive  quantifiers  are  always  greedy;  the   setting   of   the
+       PCRE_UNGREEDY option is ignored. They are a convenient notation for the
+       simpler forms of atomic group. However, there is no difference  in  the
+       meaning  of  a  possessive  quantifier and the equivalent atomic group,
+       though there may be a performance  difference;  possessive  quantifiers
+       should be slightly faster.
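+
+       Python's re module gained atomic groups and possessive quantifiers
+       in version 3.11. The sketch below assumes that version and skips
+       those lines on older interpreters. On the subject "123" it shows the
+       difference in meaning: a plain \d+ gives back a digit so that the
+       final \d can match, while the atomic and possessive forms refuse to.
+
```python
import re
import sys

# Atomic groups and possessive quantifiers need Python 3.11 or later.
HAS_ATOMIC = sys.version_info >= (3, 11)


def whole(pattern, subject="123"):
    """Hypothetical helper: True if pattern matches all of subject."""
    return re.fullmatch(pattern, subject) is not None


print(whole(r"\d+\d"))                          # True: \d+ backs off one digit
if HAS_ATOMIC:
    print(whole(r"(?>\d+)\d"))                  # False: group keeps every digit
    print(whole(r"\d++\d"))                     # False: same, possessive form
    print(whole(r"(abc|xyz){2,3}+", "abcxyz"))  # True: possessive on a group
```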
+
+       The  possessive  quantifier syntax is an extension to the Perl 5.8 syn-
+       tax.  Jeffrey Friedl originated the idea (and the name)  in  the  first
+       edition of his book. Mike McCloskey liked it, so implemented it when he
+       built Sun's Java package, and PCRE copied it from there. It  ultimately
+       found its way into Perl at release 5.10.
+
+       PCRE has an optimization that automatically "possessifies" certain sim-
+       ple pattern constructs. For example, the sequence  A+B  is  treated  as
+       A++B  because  there is no point in backtracking into a sequence of A's
+       when B must follow.
+
+       When a pattern contains an unlimited repeat inside  a  subpattern  that
+       can  itself  be  repeated  an  unlimited number of times, the use of an
+       atomic group is the only way to avoid some  failing  matches  taking  a
+       very long time indeed. The pattern
+
+         (\D+|<\d+>)*[!?]
+
+       matches  an  unlimited number of substrings that either consist of non-
+       digits, or digits enclosed in <>, followed by either ! or  ?.  When  it
+       matches, it runs quickly. However, if it is applied to
+
+         aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+
+       it  takes  a  long  time  before reporting failure. This is because the
+       string can be divided between the internal \D+ repeat and the  external
+       *  repeat  in  a  large  number of ways, and all have to be tried. (The
+       example uses [!?] rather than a single character at  the  end,  because
+       both  PCRE  and  Perl have an optimization that allows for fast failure
+       when a single character is used. They remember the last single  charac-
+       ter  that  is required for a match, and fail early if it is not present
+       in the string.) If the pattern is changed so that  it  uses  an  atomic
+       group, like this:
+
+         ((?>\D+)|<\d+>)*[!?]
+
+       sequences of non-digits cannot be broken, and failure happens quickly.
+
+
+BACK REFERENCES
+
+       Outside a character class, a backslash followed by a digit greater than
+       0 (and possibly further digits) is a back reference to a capturing sub-
+       pattern  earlier  (that is, to its left) in the pattern, provided there
+       have been that many previous capturing left parentheses.
+
+       However, if the decimal number following the backslash is less than 10,
+       it  is  always  taken  as a back reference, and causes an error only if
+       there are not that many capturing left parentheses in the  entire  pat-
+       tern.  In  other words, the parentheses that are referenced need not be
+       to the left of the reference for numbers less than 10. A "forward  back
+       reference"  of  this  type can make sense when a repetition is involved
+       and the subpattern to the right has participated in an  earlier  itera-
+       tion.
+
+       It  is  not  possible to have a numerical "forward back reference" to a
+       subpattern whose number is 10 or  more  using  this  syntax  because  a
+       sequence  such  as  \50 is interpreted as a character defined in octal.
+       See the subsection entitled "Non-printing characters" above for further
+       details  of  the  handling of digits following a backslash. There is no
+       such problem when named parentheses are used. A back reference  to  any
+       subpattern is possible using named parentheses (see below).
+
+       Another  way  of  avoiding  the ambiguity inherent in the use of digits
+       following a backslash is to use the \g escape sequence, which is a fea-
+       ture  introduced  in  Perl  5.10.  This  escape  must be followed by an
+       unsigned number or a negative number, optionally  enclosed  in  braces.
+       These examples are all identical:
+
+         (ring), \1
+         (ring), \g1
+         (ring), \g{1}
+
+       An  unsigned number specifies an absolute reference without the ambigu-
+       ity that is present in the older syntax. It is also useful when literal
+       digits follow the reference. A negative number is a relative reference.
+       Consider this example:
+
+         (abc(def)ghi)\g{-1}
+
+       The sequence \g{-1} is a reference to the most recently started captur-
+       ing subpattern before \g, that is, it is equivalent to \2. Similarly,
+       \g{-2} would be equivalent to \1. The use of relative references can be
+       helpful  in  long  patterns,  and  also in patterns that are created by
+       joining together fragments that contain references within themselves.
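       The counting rule for relative references can be modelled in a few
       lines. The Python sketch below uses a hypothetical helper
       (resolve_relative_ref is not part of PCRE) and a simplifying
       assumption: every "(" that is not escaped and not followed by "?" is
       taken to open a capturing group. Character classes, named groups and
       other constructs that a real parser handles are ignored.

```python
def resolve_relative_ref(pattern: str, ref_pos: int, n: int) -> int:
    """Map \\g{-n} at index ref_pos of pattern to an absolute group number.

    Simplification (assumption): a capturing group is any "(" that is not
    escaped and not followed by "?"; character classes are not handled.
    """
    opened = 0  # capturing groups opened before the reference
    i = 0
    while i < ref_pos:
        if pattern[i] == "\\":
            i += 2  # skip a backslash and the character it escapes
            continue
        if pattern[i] == "(" and not pattern.startswith("?", i + 1):
            opened += 1
        i += 1
    # \g{-1} is the most recently opened group, \g{-2} the one before it, ...
    return opened - n + 1


pattern = r"(abc(def)ghi)\g{-1}"
ref = pattern.index(r"\g")
print(resolve_relative_ref(pattern, ref, 1))  # 2, i.e. \g{-1} is \2
print(resolve_relative_ref(pattern, ref, 2))  # 1, i.e. \g{-2} would be \1
```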
+
+       A back reference matches whatever actually matched the  capturing  sub-
+       pattern  in  the  current subject string, rather than anything matching
+       the subpattern itself (see "Subpatterns as subroutines" below for a way
+       of doing that). So the pattern
+
+         (sens|respons)e and \1ibility
+
+       matches  "sense and sensibility" and "response and responsibility", but
+       not "sense and responsibility". If caseful matching is in force at  the
+       time  of the back reference, the case of letters is relevant. For exam-
+       ple,
+
+         ((?i)rah)\s+\1
+
+       matches "rah rah" and "RAH RAH", but not "RAH  rah",  even  though  the
+       original capturing subpattern is matched caselessly.
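       The following sketch uses Python's re module as a stand-in for PCRE to
       show that a back reference repeats the captured text rather than the
       subpattern. It assumes re behaves like PCRE for plain numbered back
       references. The caseless group is written with re's scoped-flag form
       (?i:rah), because a bare (?i) in mid-pattern is not handled the same
       way by re.

```python
import re

# \1 repeats the text group 1 captured ("sens" or "respons"), not the alternation.
sens = re.compile(r"(sens|respons)e and \1ibility")
print(bool(sens.fullmatch("sense and sensibility")))        # True
print(bool(sens.fullmatch("response and responsibility")))  # True
print(bool(sens.fullmatch("sense and responsibility")))     # False

# Group 1 is caseless, but \1 sits outside it, so the repeat is case-sensitive.
rah = re.compile(r"((?i:rah))\s+\1")
print(bool(rah.fullmatch("rah rah")))  # True
print(bool(rah.fullmatch("RAH RAH")))  # True
print(bool(rah.fullmatch("RAH rah")))  # False
```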
+
+       There  are  several  different ways of writing back references to named
+       subpatterns. The .NET syntax \k{name} and the Perl syntax  \k<name>  or
+       \k'name'  are supported, as is the Python syntax (?P=name). Perl 5.10's
+       unified back reference syntax, in which \g can be used for both numeric
+       and  named  references,  is  also supported. We could rewrite the above
+       example in any of the following ways:
+
+         (?<p1>(?i)rah)\s+\k<p1>
+         (?'p1'(?i)rah)\s+\k{p1}
+         (?P<p1>(?i)rah)\s+(?P=p1)
+         (?<p1>(?i)rah)\s+\g{p1}
+
+       A subpattern that is referenced by  name  may  appear  in  the  pattern
+       before or after the reference.
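       Of the named-reference spellings listed above, Python's re module
       accepts the Python form, (?P<name>...) with (?P=name). A minimal
       sketch, again using re only as a stand-in for PCRE:

```python
import re

# Python spelling of (?P<p1>(?i)rah)\s+(?P=p1); the caseless part is scoped with (?i:...).
named_rah = re.compile(r"(?P<p1>(?i:rah))\s+(?P=p1)")
print(bool(named_rah.fullmatch("RAH RAH")))        # True
print(bool(named_rah.fullmatch("RAH rah")))        # False
print(named_rah.fullmatch("rah rah").group("p1"))  # rah
```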
+
+       There  may be more than one back reference to the same subpattern. If a
+       subpattern has not actually been used in a particular match,  any  back
+       references to it always fail by default. For example, the pattern
+
+         (a|(bc))\2
+
+       always  fails  if  it starts to match "a" rather than "bc". However, if
+       the PCRE_JAVASCRIPT_COMPAT option is set at compile time, a back refer-
+       ence to an unset value matches an empty string.
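       Python's re shares the default behaviour described here: a back
       reference to a group that did not take part in the match fails. A
       small check, with re standing in for PCRE (the PCRE_JAVASCRIPT_COMPAT
       variant is not modelled):

```python
import re

unset_ref = re.compile(r"(a|(bc))\2")
# Starting with "a" leaves group 2 unset, so \2 fails instead of matching "".
print(unset_ref.search("aa"))              # None
print(unset_ref.search("abc"))             # None
print(bool(unset_ref.fullmatch("bcbc")))   # True: group 2 is "bc"
```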
+
+       Because  there may be many capturing parentheses in a pattern, all dig-
+       its following a backslash are taken as part of a potential back  refer-
+       ence  number.   If  the  pattern continues with a digit character, some
+       delimiter must  be  used  to  terminate  the  back  reference.  If  the
+       PCRE_EXTENDED option is set, this can be whitespace. Otherwise, the braced
+       \g{n} syntax or an empty comment (see "Comments" below) can be used.
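       Both delimiting techniques can be tried in Python's re, whose
       re.VERBOSE flag plays the role of PCRE_EXTENDED here (an assumed
       correspondence for this example only):

```python
import re

# Group 1 followed by a literal "0". Without a delimiter the digits run together.
verbose_ref = re.compile(r"(a)\1 0", re.VERBOSE)  # whitespace ends the reference
commented_ref = re.compile(r"(a)\1(?#)0")          # so does an empty comment
print(bool(verbose_ref.fullmatch("aa0")))    # True
print(bool(commented_ref.fullmatch("aa0")))  # True
```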
+
+       A back reference that occurs inside the parentheses to which it  refers
+       fails  when  the subpattern is first used, so, for example, (a\1) never
+       matches.  However, such references can be useful inside  repeated  sub-
+       patterns. For example, the pattern
+
+         (a|b\1)+
+
+       matches any number of "a"s and also "aba", "ababbaa" etc. At each iter-
+       ation of the subpattern,  the  back  reference  matches  the  character
+       string  corresponding  to  the previous iteration. In order for this to
+       work, the pattern must be such that the first iteration does  not  need
+       to  match the back reference. This can be done using alternation, as in
+       the example above, or by a quantifier with a minimum of zero.
+
+
+ASSERTIONS
+
+       An assertion is a test on the characters  following  or  preceding  the
+       current  matching  point that does not actually consume any characters.
+       The simple assertions coded as \b, \B, \A, \G, \Z,  \z,  ^  and  $  are
+       described above.
+
+       More  complicated  assertions  are  coded as subpatterns. There are two
+       kinds: those that look ahead of the current  position  in  the  subject
+       string,  and  those  that  look  behind  it. An assertion subpattern is
+       matched in the normal way, except that it does not  cause  the  current
+       matching position to be changed.
+
+       Assertion  subpatterns  are  not  capturing subpatterns, and may not be
+       repeated, because it makes no sense to assert the  same  thing  several
+       times.  If  any kind of assertion contains capturing subpatterns within
+       it, these are counted for the purposes of numbering the capturing  sub-
+       patterns in the whole pattern.  However, substring capturing is carried
+       out only for positive assertions, because it does not  make  sense  for
+       negative assertions.
+
+   Lookahead assertions
+
+       Lookahead assertions start with (?= for positive assertions and (?! for
+       negative assertions. For example,
+
+         \w+(?=;)
+
+       matches a word followed by a semicolon, but does not include the  semi-
+       colon in the match, and
+
+         foo(?!bar)
+
+       matches  any  occurrence  of  "foo" that is not followed by "bar". Note
+       that the apparently similar pattern
+
+         (?!foo)bar
+
+       does not find an occurrence of "bar"  that  is  preceded  by  something
+       other  than "foo"; it finds any occurrence of "bar" whatsoever, because
+       the assertion (?!foo) is always true when the next three characters are
+       "bar". A lookbehind assertion is needed to achieve the other effect.
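       The lookahead examples, and the lookbehind that the last paragraph
       calls for, can be checked with Python's re as a stand-in for PCRE:

```python
import re

print(re.search(r"\w+(?=;)", "abc; def").group())  # abc (the ";" is not consumed)
print(re.findall(r"foo(?!bar)", "foobar foobaz"))  # ['foo'] (only the second one)

# (?!foo)bar is satisfied by any "bar", including the one in "foobar".
print(re.search(r"(?!foo)bar", "foobar").start())  # 3
# A negative lookbehind expresses "bar not preceded by foo".
print(re.search(r"(?<!foo)bar", "foobar"))          # None
print(re.search(r"(?<!foo)bar", "fobar").start())   # 2
```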
+
+       If you want to force a matching failure at some point in a pattern, the
+       most convenient way to do it is  with  (?!)  because  an  empty  string
+       always  matches, so an assertion that requires there not to be an empty
+       string must always fail.   The  Perl  5.10  backtracking  control  verb
+       (*FAIL) or (*F) is essentially a synonym for (?!).
+
+   Lookbehind assertions
+
+       Lookbehind  assertions start with (?<= for positive assertions and (?<!
+       for negative assertions. For example,
+
+         (?<!foo)bar
+
+       does find an occurrence of "bar" that is not  preceded  by  "foo".  The
+       contents  of  a  lookbehind  assertion are restricted such that all the
+       strings it matches must have a fixed length. However, if there are sev-
+       eral  top-level  alternatives,  they  do  not all have to have the same
+       fixed length. Thus
+
+         (?<=bullock|donkey)
+
+       is permitted, but
+
+         (?<!dogs?|cats?)
+
+       causes an error at compile time. Branches that match  different  length
+       strings  are permitted only at the top level of a lookbehind assertion.
+       This is an extension compared with Perl (5.8 and 5.10), which  requires
+       all branches to match the same length of string. An assertion such as
+
+         (?<=ab(c|de))
+
+       is  not  permitted,  because  its single top-level branch can match two
+       different lengths, but it is acceptable to PCRE if rewritten to use two
+       top-level branches:
+
+         (?<=abc|abde)
+
+       In some cases, the Perl 5.10 escape sequence \K (see above) can be used
+       instead of  a  lookbehind  assertion  to  get  round  the  fixed-length
+       restriction.
+
+       The  implementation  of lookbehind assertions is, for each alternative,
+       to temporarily move the current position back by the fixed  length  and
+       then try to match. If there are insufficient characters before the cur-
+       rent position, the assertion fails.
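       The fixed-length strategy can be sketched as follows. This is a
       hypothetical model, not PCRE's code: each top-level branch is assumed
       to be a literal string, so its length is simply len(alt), whereas PCRE
       computes the fixed length of arbitrary branches at compile time.

```python
from typing import Sequence


def lookbehind_holds(subject: str, pos: int, alternatives: Sequence[str]) -> bool:
    """Model of (?<=alt1|alt2|...) where every branch is a literal string.

    Each branch has its own fixed length: step back by that length and
    compare. If there are not enough characters before pos, the branch fails.
    """
    for alt in alternatives:
        start = pos - len(alt)
        if start < 0:
            continue  # insufficient characters before the current position
        if subject[start:pos] == alt:
            return True
    return False


subject = "a donkey"
print(lookbehind_holds(subject, len(subject), ["bullock", "donkey"]))  # True
print(lookbehind_holds("key", 3, ["bullock", "donkey"]))               # False
```

       Different branch lengths are handled naturally because each branch
       steps back independently, which mirrors why PCRE can allow
       (?<=bullock|donkey) but not a single branch of variable length.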
+
+       PCRE does not allow the \C escape (which matches a single byte in UTF-8
+       mode)  to appear in lookbehind assertions, because it makes it impossi-
+       ble to calculate the length of the lookbehind. The \X and  \R  escapes,
+       which can match different numbers of bytes, are also not permitted.
+
+       "Subroutine"  calls  (see below) such as (?2) or (?&X) are permitted in
+       lookbehinds, as long as the subpattern matches a  fixed-length  string.
+       Recursion, however, is not supported.
+
+       Possessive  quantifiers  can  be  used  in  conjunction with lookbehind
+       assertions to specify efficient matching of fixed-length strings at the
+       end of subject strings. Consider a simple pattern such as
+
+         abcd$
+
+       when  applied  to  a  long string that does not match. Because matching
+       proceeds from left to right, PCRE will look for each "a" in the subject
+       and  then  see  if what follows matches the rest of the pattern. If the
+       pattern is specified as
+
+         ^.*abcd$
+
+       the initial .* matches the entire string at first, but when this  fails
+       (because there is no following "a"), it backtracks to match all but the
+       last character, then all but the last two characters, and so  on.  Once
+       again  the search for "a" covers the entire string, from right to left,
+       so we are no better off. However, if the pattern is written as
+
+         ^.*+(?<=abcd)
+
+       there can be no backtracking for the .*+ item; it can  match  only  the
+       entire  string.  The subsequent lookbehind assertion does a single test
+       on the last four characters. If it fails, the match fails  immediately.
+       For  long  strings, this approach makes a significant difference to the
+       processing time.
+
+   Using multiple assertions
+
+       Several assertions (of any sort) may occur in succession. For example,
+
+         (?<=\d{3})(?<!999)foo
+
+       matches "foo" preceded by three digits that are not "999". Notice  that
+       each  of  the  assertions is applied independently at the same point in
+       the subject string. First there is a  check  that  the  previous  three
+       characters  are  all  digits,  and  then there is a check that the same
+       three characters are not "999". This pattern does not match "foo" pre-
+       ceded by six characters, the first three of which are digits and the
+       last three of which are not "999". For example, it doesn't match
+       "123abcfoo". A pattern to do that is
+
+         (?<=\d{3}...)(?<!999)foo
+
+       This  time  the  first assertion looks at the preceding six characters,
+       checking that the first three are digits, and then the second assertion
+       checks that the preceding three characters are not "999".
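       A quick check of both patterns, using Python's re as a stand-in for
       PCRE (re also requires fixed-length lookbehinds, so these patterns
       compile unchanged):

```python
import re

# Three digits, and those same three characters are not "999".
digits_not_999 = re.compile(r"(?<=\d{3})(?<!999)foo")
print(bool(digits_not_999.search("123foo")))     # True
print(bool(digits_not_999.search("999foo")))     # False
print(bool(digits_not_999.search("123abcfoo")))  # False: "abc" precedes foo

# Six characters back: three digits, then three characters that are not "999".
six_back = re.compile(r"(?<=\d{3}...)(?<!999)foo")
print(bool(six_back.search("123abcfoo")))        # True
```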
+
+       Assertions can be nested in any combination. For example,
+
+         (?<=(?<!foo)bar)baz
+
+       matches  an occurrence of "baz" that is preceded by "bar" which in turn
+       is not preceded by "foo", while
+
+         (?<=\d{3}(?!999)...)foo
+
+       is another pattern that matches "foo" preceded by three digits and  any
+       three characters that are not "999".
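       The nested forms also compile unchanged in Python's re, which is used
       here only as a stand-in for PCRE:

```python
import re

bar_not_after_foo = re.compile(r"(?<=(?<!foo)bar)baz")
print(bool(bar_not_after_foo.search("xbarbaz")))    # True
print(bool(bar_not_after_foo.search("foobarbaz")))  # False

digits_then_not_999 = re.compile(r"(?<=\d{3}(?!999)...)foo")
print(bool(digits_then_not_999.search("123abcfoo")))  # True
print(bool(digits_then_not_999.search("123999foo")))  # False
```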
+
+
+CONDITIONAL SUBPATTERNS
+
+       It  is possible to cause the matching process to obey a subpattern con-
+       ditionally or to choose between two alternative subpatterns,  depending
+       on  the result of an assertion, or whether a specific capturing subpat-
+       tern has already been matched. The two possible  forms  of  conditional
+       subpattern are:
+
+         (?(condition)yes-pattern)
+         (?(condition)yes-pattern|no-pattern)
+
+       If  the  condition is satisfied, the yes-pattern is used; otherwise the
+       no-pattern (if present) is used. If there are more  than  two  alterna-
+       tives in the subpattern, a compile-time error occurs.
+
+       There  are  four  kinds of condition: references to subpatterns, refer-
+       ences to recursion, a pseudo-condition called DEFINE, and assertions.
+
+   Checking for a used subpattern by number
+
+       If the text between the parentheses consists of a sequence  of  digits,
+       the condition is true if a capturing subpattern of that number has pre-
+       viously matched. If there is more than one  capturing  subpattern  with
+       the  same  number  (see  the earlier section about duplicate subpattern
+       numbers), the condition is true if any of them have been set. An alter-
+       native  notation is to precede the digits with a plus or minus sign. In
+       this case, the subpattern number is relative rather than absolute.  The
+       most  recently opened parentheses can be referenced by (?(-1), the next
+       most recent by (?(-2), and so on. In looping  constructs  it  can  also
+       make  sense  to  refer  to  subsequent  groups  with constructs such as
+       (?(+2).
+
+       Consider the following pattern, which  contains  non-significant  white
+       space to make it more readable (assume the PCRE_EXTENDED option) and to
+       divide it into three parts for ease of discussion:
+
+         ( \( )?    [^()]+    (?(1) \) )
+
+       The first part matches an optional opening  parenthesis,  and  if  that
+       character is present, sets it as the first captured substring. The sec-
+       ond part matches one or more characters that are not  parentheses.  The
+       third part is a conditional subpattern that tests whether the first set
+       of parentheses matched or not. If they did, that is, if the subject
+       started with an opening parenthesis, the condition is true, and so the
+       yes-pattern is executed and a closing parenthesis is required. Otherwise,
+       since  no-pattern  is  not  present, the subpattern matches nothing. In
+       other words,  this  pattern  matches  a  sequence  of  non-parentheses,
+       optionally enclosed in parentheses.
+
+       If  you  were  embedding  this pattern in a larger one, you could use a
+       relative reference:
+
+         ...other stuff... ( \( )?    [^()]+    (?(-1) \) ) ...
+
+       This makes the fragment independent of the parentheses  in  the  larger
+       pattern.
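       The numbered form can be tried in Python's re, which supports
       (?(1)...) conditions; re.VERBOSE stands in for PCRE_EXTENDED so that
       the spacing is ignored. re is not assumed to support the relative
       (?(-1)...) form, so only the absolute version is shown.

```python
import re

# Optional "(", then non-parentheses, then ")" only if group 1 matched.
optional_parens = re.compile(r"( \( )?    [^()]+    (?(1) \) )", re.VERBOSE)
for subject in ["(abc)", "abc", "(abc", "abc)"]:
    print(subject, bool(optional_parens.fullmatch(subject)))
# (abc) True
# abc True
# (abc False
# abc) False
```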
+
+   Checking for a used subpattern by name
+
+       Perl  uses  the  syntax  (?(<name>)...) or (?('name')...) to test for a
+       used subpattern by name. For compatibility  with  earlier  versions  of
+       PCRE,  which  had this facility before Perl, the syntax (?(name)...) is
+       also recognized. However, there is a possible ambiguity with this  syn-
+       tax,  because  subpattern  names  may  consist entirely of digits. PCRE
+       looks first for a named subpattern; if it cannot find one and the  name
+       consists  entirely  of digits, PCRE looks for a subpattern of that num-
+       ber, which must be greater than zero. Using subpattern names that  con-
+       sist entirely of digits is not recommended.
+
+       Rewriting the above example to use a named subpattern gives this:
+
+         (?<OPEN> \( )?    [^()]+    (?(<OPEN>) \) )
+
+       If  the  name used in a condition of this kind is a duplicate, the test
+       is applied to all subpatterns of the same name, and is true if any  one
+       of them has matched.
+
+   Checking for pattern recursion
+
+       If the condition is the string (R), and there is no subpattern with the
+       name R, the condition is true if a recursive call to the whole  pattern
+       or any subpattern has been made. If digits or a name preceded by amper-
+       sand follow the letter R, for example:
+
+         (?(R3)...) or (?(R&name)...)
+
+       the condition is true if the most recent recursion is into a subpattern
+       whose number or name is given. This condition does not check the entire
+       recursion stack. If the name used in a condition  of  this  kind  is  a
+       duplicate, the test is applied to all subpatterns of the same name, and
+       is true if any one of them is the most recent recursion.
+
+       At "top level", all these recursion test  conditions  are  false.   The
+       syntax for recursive patterns is described below.
+
+   Defining subpatterns for use by reference only
+
+       If  the  condition  is  the string (DEFINE), and there is no subpattern
+       with the name DEFINE, the condition is  always  false.  In  this  case,
+       there  may  be  only  one  alternative  in the subpattern. It is always
+       skipped if control reaches this point  in  the  pattern;  the  idea  of
+       DEFINE  is that it can be used to define "subroutines" that can be ref-
+       erenced from elsewhere. (The use of "subroutines" is described  below.)
+       For  example,  a pattern to match an IPv4 address could be written like
+       this (ignore whitespace and line breaks):
+
+         (?(DEFINE) (?<byte> 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) )
+         \b (?&byte) (\.(?&byte)){3} \b
+
+       The first part of the pattern is a DEFINE group inside which another
+       group named "byte" is defined. This matches an individual component of
+       an IPv4 address (a number less than 256). When  matching  takes  place,
+       this  part  of  the pattern is skipped because DEFINE acts like a false
+       condition. The rest of the pattern uses references to the  named  group
+       to  match the four dot-separated components of an IPv4 address, insist-
+       ing on a word boundary at each end.
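       Python's re module has neither DEFINE groups nor (?&name) calls. A
       rough equivalent, sketched under that assumption, keeps the byte
       sub-expression in a string (BYTE and IPV4 are illustrative names) and
       pastes it in wherever the PCRE pattern would call it:

```python
import re

# Same alternatives as the "byte" group: 200-249, 250-255, 100-199, 0-99.
BYTE = r"(?:2[0-4]\d|25[0-5]|1\d\d|[1-9]?\d)"
# {{3}} in the f-string produces the quantifier {3}.
IPV4 = re.compile(rf"\b{BYTE}(?:\.{BYTE}){{3}}\b")

print(bool(IPV4.fullmatch("192.168.0.1")))  # True
print(bool(IPV4.fullmatch("256.1.1.1")))    # False
```

       The pasted copies are expanded when the pattern string is built, so the
       pattern grows with every use; DEFINE avoids that duplication.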
+
+   Assertion conditions
+
+       If the condition is not in any of the above  formats,  it  must  be  an
+       assertion.   This may be a positive or negative lookahead or lookbehind
+       assertion. Consider  this  pattern,  again  containing  non-significant
+       white space, and with the two alternatives on the second line:
+
+         (?(?=[^a-z]*[a-z])
+         \d{2}-[a-z]{3}-\d{2}  |  \d{2}-\d{2}-\d{2} )
+
+       The  condition  is  a  positive  lookahead  assertion  that  matches an
+       optional sequence of non-letters followed by a letter. In other  words,
+       it  tests  for the presence of at least one letter in the subject. If a
+       letter is found, the subject is matched against the first  alternative;
+       otherwise  it  is  matched  against  the  second.  This pattern matches
+       strings in one of the two forms dd-aaa-dd or dd-dd-dd,  where  aaa  are
+       letters and dd are digits.
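       Python's re does not accept an assertion as a condition, but the same
       logic can be written with plain alternation: (?(?=A)yes|no) behaves
       like (?=A)yes|(?!A)no, because exactly one of the two assertions holds
       at any position. A sketch, using re as a stand-in for PCRE:

```python
import re

# (?(?=[^a-z]*[a-z]) yes | no ) rewritten as (?=A)yes|(?!A)no
date = re.compile(
    r"(?=[^a-z]*[a-z])\d{2}-[a-z]{3}-\d{2}"  # a letter is present: dd-aaa-dd
    r"|(?![^a-z]*[a-z])\d{2}-\d{2}-\d{2}"    # no letter present: dd-dd-dd
)
print(bool(date.fullmatch("12-jan-05")))  # True
print(bool(date.fullmatch("12-01-05")))   # True
print(bool(date.fullmatch("12-01-05x")))  # False: a letter selects the first form
```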
+
+
+COMMENTS
+
+       The  sequence (?# marks the start of a comment that continues up to the
+       next closing parenthesis. Nested parentheses  are  not  permitted.  The
+       characters  that make up a comment play no part in the pattern matching
+       at all.
+
+       If the PCRE_EXTENDED option is set, an unescaped # character outside  a
+       character  class  introduces  a  comment  that continues to immediately
+       after the next newline in the pattern.
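       Both comment forms are also accepted by Python's re (with re.VERBOSE
       standing in for PCRE_EXTENDED), which gives a quick way to try them:

```python
import re

inline = re.compile(r"abc(?#this text is ignored)def")
extended = re.compile(
    r"""
    abc   # from an unescaped # to the end of the line is a comment
    def
    """,
    re.VERBOSE,
)
print(bool(inline.fullmatch("abcdef")))    # True
print(bool(extended.fullmatch("abcdef")))  # True
```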
+
+
+RECURSIVE PATTERNS
+
+       Consider the problem of matching a string in parentheses, allowing  for
+       unlimited  nested  parentheses.  Without the use of recursion, the best
+       that can be done is to use a pattern that  matches  up  to  some  fixed
+       depth  of  nesting.  It  is not possible to handle an arbitrary nesting
+       depth.
+
+       For some time, Perl has provided a facility that allows regular expres-
+       sions  to recurse (amongst other things). It does this by interpolating
+       Perl code in the expression at run time, and the code can refer to  the
+       expression itself. A Perl pattern using code interpolation to solve the
+       parentheses problem can be created like this:
+
+         $re = qr{\( (?: (?>[^()]+) | (?p{$re}) )* \)}x;
+
+       The (?p{...}) item interpolates Perl code at run time, and in this case
+       refers recursively to the pattern in which it appears.
+
+       Obviously, PCRE cannot support the interpolation of Perl code. Instead,
+       it supports special syntax for recursion of  the  entire  pattern,  and
+       also  for  individual  subpattern  recursion. After its introduction in
+       PCRE and Python, this kind of  recursion  was  subsequently  introduced
+       into Perl at release 5.10.
+
+       A  special  item  that consists of (? followed by a number greater than
+       zero and a closing parenthesis is a recursive call of the subpattern of
+       the  given  number, provided that it occurs inside that subpattern. (If
+       not, it is a "subroutine" call, which is described  in  the  next  sec-
+       tion.)  The special item (?R) or (?0) is a recursive call of the entire
+       regular expression.
+
+       This PCRE pattern solves the nested  parentheses  problem  (assume  the
+       PCRE_EXTENDED option is set so that white space is ignored):
+
+         \( ( [^()]++ | (?R) )* \)
+
+       First  it matches an opening parenthesis. Then it matches any number of
+       substrings which can either be a  sequence  of  non-parentheses,  or  a
+       recursive  match  of the pattern itself (that is, a correctly parenthe-
+       sized substring).  Finally there is a closing parenthesis. Note the use
+       of a possessive quantifier to avoid backtracking into sequences of non-
+       parentheses.
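       Python's re has no recursion, so the sketch below writes the same
       grammar as a hand-coded recursive-descent function. It illustrates
       what \( ( [^()]++ | (?R) )* \) accepts when anchored at position i; it
       is not how PCRE implements recursion, and match_balanced is a
       hypothetical name.

```python
from typing import Optional


def match_balanced(s: str, i: int = 0) -> Optional[int]:
    """Match \\( ( [^()]++ | (?R) )* \\) anchored at s[i].

    Returns the index just past the closing parenthesis, or None on failure.
    """
    if i >= len(s) or s[i] != "(":
        return None
    i += 1  # the opening parenthesis
    while i < len(s):
        if s[i] == "(":
            end = match_balanced(s, i)  # the (?R) alternative
            if end is None:
                return None
            i = end
        elif s[i] == ")":
            return i + 1  # the closing parenthesis
        else:
            i += 1  # one character of [^()]++; never given back
    return None  # subject ended before the closing parenthesis


print(match_balanced("(ab(cd)ef)"))  # 10
print(match_balanced("(aaa()"))      # None
```

       Because the function never gives back characters it has consumed, it
       behaves like the possessive version of the pattern and fails in linear
       time on inputs such as "(aaaa()".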
+
+       If this were part of a larger pattern, you would not  want  to  recurse
+       the entire pattern, so instead you could use this:
+
+         ( \( ( [^()]++ | (?1) )* \) )
+
+       We  have  put the pattern into parentheses, and caused the recursion to
+       refer to them instead of the whole pattern.
+
+       In a larger pattern,  keeping  track  of  parenthesis  numbers  can  be
+       tricky.  This  is made easier by the use of relative references (a Perl
+       5.10 feature).  Instead of (?1) in the  pattern  above  you  can  write
+       (?-2) to refer to the second most recently opened parentheses preceding
+       the recursion. In other  words,  a  negative  number  counts  capturing
+       parentheses leftwards from the point at which it is encountered.
+
+       It  is  also  possible  to refer to subsequently opened parentheses, by
+       writing references such as (?+2). However, these  cannot  be  recursive
+       because  the  reference  is  not inside the parentheses that are refer-
+       enced. They are always "subroutine" calls, as  described  in  the  next
+       section.
+
+       An  alternative  approach is to use named parentheses instead. The Perl
+       syntax for this is (?&name); PCRE's earlier syntax  (?P>name)  is  also
+       supported. We could rewrite the above example as follows:
+
+         (?<pn> \( ( [^()]++ | (?&pn) )* \) )
+
+       If  there  is more than one subpattern with the same name, the earliest
+       one is used.
+
+       This particular example pattern that we have been looking  at  contains
+       nested unlimited repeats, and so the use of a possessive quantifier for
+       matching strings of non-parentheses is important when applying the pat-
+       tern  to  strings  that do not match. For example, when this pattern is
+       applied to
+
+         (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()
+
+       it yields "no match" quickly. However, if a  possessive  quantifier  is
+       not  used, the match runs for a very long time indeed because there are
+       so many different ways the + and * repeats can carve  up  the  subject,
+       and all have to be tested before failure can be reported.
+
+       At  the  end  of a match, the values of capturing parentheses are those
+       from the outermost level. If you want to obtain intermediate values,  a
+       callout  function can be used (see below and the pcrecallout documenta-
+       tion). If the pattern above is matched against
+
+         (ab(cd)ef)
+
+       the value for the inner capturing parentheses  (numbered  2)  is  "ef",
+       which  is the last value taken on at the top level. If a capturing sub-
+       pattern is not matched at the top level, its final value is unset, even
+       if it is (temporarily) set at a deeper level.
+
+       If  there are more than 15 capturing parentheses in a pattern, PCRE has
+       to obtain extra memory to store data during a recursion, which it  does
+       by using pcre_malloc, freeing it via pcre_free afterwards. If no memory
+       can be obtained, the match fails with the PCRE_ERROR_NOMEMORY error.
+
+       Do not confuse the (?R) item with the condition (R),  which  tests  for
+       recursion.   Consider  this pattern, which matches text in angle brack-
+       ets, allowing for arbitrary nesting. Only digits are allowed in  nested
+       brackets  (that is, when recursing), whereas any characters are permit-
+       ted at the outer level.
+
+         < (?: (?(R) \d++  | [^<>]*+) | (?R)) * >
+
+       In this pattern, (?(R) is the start of a conditional  subpattern,  with
+       two  different  alternatives for the recursive and non-recursive cases.
+       The (?R) item is the actual recursive call.
+
+   Recursion difference from Perl
+
+       In PCRE (like Python, but unlike Perl), a recursive subpattern call  is
+       always treated as an atomic group. That is, once it has matched some of
+       the subject string, it is never re-entered, even if it contains untried
+       alternatives  and  there  is a subsequent matching failure. This can be
+       illustrated by the following pattern, which purports to match a  palin-
+       dromic  string  that contains an odd number of characters (for example,
+       "a", "aba", "abcba", "abcdcba"):
+
+         ^(.|(.)(?1)\2)$
+
+       The idea is that it either matches a single character, or two identical
+       characters  surrounding  a sub-palindrome. In Perl, this pattern works;
+       in PCRE it does not if the subject is longer than three characters.
+       Consider the subject string "abcba":
+
+       At the top level, the first character is matched, but as it is not at
+       the end of the string, the first alternative fails; the second alterna-
+       tive is taken and the recursion kicks in. The recursive call to subpat-
+       tern 1 successfully matches the next character ("b"). (Note that the
+       beginning and end of line tests are not part of the recursion.)
+
+       Back  at  the top level, the next character ("c") is compared with what
+       subpattern 2 matched, which was "a". This fails. Because the  recursion
+       is  treated  as  an atomic group, there are now no backtracking points,
+       and so the entire match fails. (Perl is able, at  this  point,  to  re-
+       enter  the  recursion  and try the second alternative.) However, if the
+       pattern is written with the alternatives in the other order, things are
+       different:
+
+         ^((.)(?1)\2|.)$
+
+       This  time,  the recursing alternative is tried first, and continues to
+       recurse until it runs out of characters, at which point  the  recursion
+       fails.  But  this  time  we  do  have another alternative to try at the
+       higher level. That is the big difference:  in  the  previous  case  the
+       remaining alternative is at a deeper recursion level, which PCRE cannot
+       use.
+
+       To change the pattern so that it matches all palindromic strings, not just
+       those  with  an  odd number of characters, it is tempting to change the
+       pattern to this:
+
+         ^((.)(?1)\2|.?)$
+
+       Again, this works in Perl, but not in PCRE, and for  the  same  reason.
+       When  a  deeper  recursion has matched a single character, it cannot be
+       entered again in order to match an empty string.  The  solution  is  to
+       separate  the two cases, and write out the odd and even cases as alter-
+       natives at the higher level:
+
+         ^(?:((.)(?1)\2|)|((.)(?3)\4|.))$
+
+       If you want to match typical palindromic phrases, the  pattern  has  to
+       ignore all non-word characters, which can be done like this:
+
+         ^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$
+
+       If run with the PCRE_CASELESS option, this pattern matches phrases such
+       as "A man, a plan, a canal: Panama!" and it works well in both PCRE and
+       Perl.  Note the use of the possessive quantifier *+ to avoid backtrack-
+       ing into sequences of non-word characters. Without this, PCRE  takes  a
+       great  deal  longer  (ten  times or more) to match typical phrases, and
+       Perl takes so long that you think it has gone into a loop.
+
+       WARNING: The palindrome-matching patterns above work only if  the  sub-
+       ject  string  does not start with a palindrome that is shorter than the
+       entire string.  For example, although "abcba" is correctly matched,  if
+       the  subject  is "ababa", PCRE finds the palindrome "aba" at the start,
+       then fails at top level because the end of the string does not  follow.
+       Once  again, it cannot jump back into the recursion to try other alter-
+       natives, so the entire match fails.
+
+
+SUBPATTERNS AS SUBROUTINES
+
+       If the syntax for a recursive subpattern reference (either by number or
+       by  name)  is used outside the parentheses to which it refers, it oper-
+       ates like a subroutine in a programming language. The "called"  subpat-
+       tern may be defined before or after the reference. A numbered reference
+       can be absolute or relative, as in these examples:
+
+         (...(absolute)...)...(?2)...
+         (...(relative)...)...(?-1)...
+         (...(?+1)...(relative)...
+
+       An earlier example pointed out that the pattern
+
+         (sens|respons)e and \1ibility
+
+       matches "sense and sensibility" and "response and responsibility",  but
+       not "sense and responsibility". If instead the pattern
+
+         (sens|respons)e and (?1)ibility
+
+       is  used, it does match "sense and responsibility" as well as the other
+       two strings. Another example is  given  in  the  discussion  of  DEFINE
+       above.
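       The difference between the back reference and the subroutine form can
       be checked in Python's re, with one assumption: re has no (?1) call,
       so the subroutine is approximated by pasting the alternation again as
       a non-capturing group. Unlike a PCRE subroutine call, the pasted copy
       is not atomic, although that makes no difference for this pattern.

```python
import re

# Back reference: must repeat the text that group 1 captured.
backref = re.compile(r"(sens|respons)e and \1ibility")
# Subroutine-style: re-apply the subpattern itself (pasted copy, see above).
subroutine = re.compile(r"(sens|respons)e and (?:sens|respons)ibility")

s = "sense and responsibility"
print(bool(backref.fullmatch(s)))     # False
print(bool(subroutine.fullmatch(s)))  # True
```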
+
+       Like  recursive  subpatterns, a subroutine call is always treated as an
+       atomic group. That is, once it has matched some of the subject  string,
+       it  is  never  re-entered, even if it contains untried alternatives and
+       there is a subsequent matching failure. Any capturing parentheses  that
+       are  set  during  the  subroutine  call revert to their previous values
+       afterwards.
+
+       When a subpattern is used as a subroutine, processing options  such  as
+       case-independence are fixed when the subpattern is defined. They cannot
+       be changed for different calls. For example, consider this pattern:
+
+         (abc)(?i:(?-1))
+
+       It matches "abcabc". It does not match "abcABC" because the  change  of
+       processing option does not affect the called subpattern.
+
+
+ONIGURUMA SUBROUTINE SYNTAX
+
+       For  compatibility with Oniguruma, the non-Perl syntax \g followed by a
+       name or a number enclosed either in angle brackets or single quotes, is
+       an  alternative  syntax  for  referencing a subpattern as a subroutine,
+       possibly recursively. Here are two of the examples used above,  rewrit-
+       ten using this syntax:
+
+         (?<pn> \( ( (?>[^()]+) | \g<pn> )* \) )
+         (sens|respons)e and \g'1'ibility
+
+       PCRE  supports  an extension to Oniguruma: if a number is preceded by a
+       plus or a minus sign it is taken as a relative reference. For example:
+
+         (abc)(?i:\g<-1>)
+
+       Note that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are  not
+       synonymous.  The former is a back reference; the latter is a subroutine
+       call.
+
+
+CALLOUTS
+
+       Perl has a feature whereby using the sequence (?{...}) causes arbitrary
+       Perl  code to be obeyed in the middle of matching a regular expression.
+       This makes it possible, amongst other things, to extract different sub-
+       strings that match the same pair of parentheses when there is a repeti-
+       tion.
+
+       PCRE provides a similar feature, but of course it cannot obey arbitrary
+       Perl code. The feature is called "callout". The caller of PCRE provides
+       an external function by putting its entry point in the global  variable
+       pcre_callout.   By default, this variable contains NULL, which disables
+       all calling out.
+
+       Within a regular expression, (?C) indicates the  points  at  which  the
+       external  function  is  to be called. If you want to identify different
+       callout points, you can put a number less than 256 after the letter  C.
+       The  default  value is zero.  For example, this pattern has two callout
+       points:
+
+         (?C1)abc(?C2)def
+
+       If the PCRE_AUTO_CALLOUT flag is passed to pcre_compile(), callouts are
+       automatically  installed  before each item in the pattern. They are all
+       numbered 255.
+
+       During matching, when PCRE reaches a callout point (and pcre_callout is
+       set),  the  external function is called. It is provided with the number
+       of the callout, the position in the pattern, and, optionally, one  item
+       of  data  originally supplied by the caller of pcre_exec(). The callout
+       function may cause matching to proceed, to backtrack, or to fail  alto-
+       gether. A complete description of the interface to the callout function
+       is given in the pcrecallout documentation.
+
+
+BACKTRACKING CONTROL
+
+       Perl 5.10 introduced a number of "Special Backtracking Control  Verbs",
+       which are described in the Perl documentation as "experimental and sub-
+       ject to change or removal in a future version of Perl". It goes  on  to
+       say:  "Their usage in production code should be noted to avoid problems
+       during upgrades." The same remarks apply to the PCRE features described
+       in this section.
+
+       Since  these  verbs  are  specifically related to backtracking, most of
+       them can be  used  only  when  the  pattern  is  to  be  matched  using
+       pcre_exec(), which uses a backtracking algorithm. With the exception of
+       (*FAIL), which behaves like a failing negative assertion, they cause an
+       error if encountered by pcre_dfa_exec().
+
+       If any of these verbs are used in an assertion or subroutine subpattern
+       (including recursive subpatterns), their effect  is  confined  to  that
+       subpattern;  it  does  not extend to the surrounding pattern. Note that
+       such subpatterns are processed as anchored at the point where they  are
+       tested.
+
+       The  new verbs make use of what was previously invalid syntax: an open-
+       ing parenthesis followed by an asterisk. In Perl, they are generally of
+       the form (*VERB:ARG) but PCRE does not support the use of arguments, so
+       its general form is just (*VERB). Any number of these verbs  may  occur
+       in a pattern. There are two kinds:
+
+   Verbs that act immediately
+
+       The following verbs act as soon as they are encountered:
+
+          (*ACCEPT)
+
+       This  verb causes the match to end successfully, skipping the remainder
+       of the pattern. When inside a recursion, only the innermost pattern  is
+       ended  immediately.  If  (*ACCEPT) is inside capturing parentheses, the
+       data so far is captured. (This feature was added  to  PCRE  at  release
+       8.00.) For example:
+
+         A((?:A|B(*ACCEPT)|C)D)
+
+       This  matches  "AB", "AAD", or "ACD"; when it matches "AB", "B" is cap-
+       tured by the outer parentheses.
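+
+       The C fragment below is an illustrative sketch, not part of PCRE. It
+       hand-codes the unanchored pattern A((?:A|B(*ACCEPT)|C)D) so that the
+       effect of (*ACCEPT) on capturing group 1 is explicit. The function
+       name and the layout of the offsets array are assumptions made for
+       this example.
```c
#include <string.h>

/* Illustrative sketch (not PCRE code): hand-coded equivalent of the
   unanchored pattern A((?:A|B(*ACCEPT)|C)D). On success it returns 1 and
   stores offsets in m[]: m[0..1] for the whole match and m[2..3] for
   capturing group 1. It returns 0 if there is no match. */
static int match_accept_example(const char *s, int m[4])
{
    int len = (int)strlen(s);

    for (int start = 0; start < len; start++) {
        if (s[start] != 'A')
            continue;
        int g1 = start + 1;                /* group 1 begins after the A */
        char c = s[g1];                    /* may be the terminating NUL */

        if (c == 'B') {
            /* (*ACCEPT): the match ends here, and group 1 keeps the data
               captured so far, which is just the "B". */
            m[0] = start; m[1] = g1 + 1;
            m[2] = g1;    m[3] = g1 + 1;
            return 1;
        }
        if ((c == 'A' || c == 'C') && s[g1 + 1] == 'D') {
            m[0] = start; m[1] = g1 + 2;
            m[2] = g1;    m[3] = g1 + 2;
            return 1;
        }
    }
    return 0;
}
```
+       For "AB" it reports group 1 at offsets 1 to 2 (the "B"); for "AAD"
+       and "ACD" group 1 covers offsets 1 to 3.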
+
+         (*FAIL) or (*F)
+
+       This verb causes the match to fail, forcing backtracking to  occur.  It
+       is  equivalent to (?!) but easier to read. The Perl documentation notes
+       that it is probably useful only when combined  with  (?{})  or  (??{}).
+       Those  are,  of course, Perl features that are not present in PCRE. The
+       nearest equivalent is the callout feature, as for example in this  pat-
+       tern:
+
+         a+(?C)(*FAIL)
+
+       A  match  with the string "aaaa" always fails, but the callout is taken
+       before each backtrack happens (in this example, 10 times).
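+
+       As a rough check of that count, the following C sketch (not PCRE
+       code; the function name is hypothetical) reproduces the order in
+       which a backtracking matcher tries a+ before (*FAIL) forces the next
+       backtrack. The callout is modelled as a simple counter.
```c
#include <stddef.h>

/* Illustrative sketch (not PCRE code): counts how often the callout in
   the unanchored pattern a+(?C)(*FAIL) would be taken by a backtracking
   matcher. At each starting position, the greedy a+ first takes every
   'a' it can, then gives them back one at a time; every length it tries
   reaches (?C) before (*FAIL) forces the next backtrack. */
static int count_callouts(const char *subject)
{
    int callouts = 0;

    for (size_t start = 0; subject[start] != '\0'; start++) {
        size_t run = 0;
        while (subject[start + run] == 'a')
            run++;
        /* Lengths run, run-1, ..., 1 are tried: one callout each. */
        callouts += (int)run;
    }
    return callouts;
}
```
+       For "aaaa" it returns 4 + 3 + 2 + 1 = 10.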
+
+   Verbs that act after backtracking
+
+       The following verbs do nothing when they are encountered. Matching con-
+       tinues  with what follows, but if there is no subsequent match, a fail-
+       ure is forced.  The verbs  differ  in  exactly  what  kind  of  failure
+       occurs.
+
+         (*COMMIT)
+
+       This  verb  causes  the whole match to fail outright if the rest of the
+       pattern does not match. Even if the pattern is unanchored,  no  further
+       attempts  to  find  a match by advancing the starting point take place.
+       Once (*COMMIT) has been passed, pcre_exec() is committed to  finding  a
+       match at the current starting point, or not at all. For example:
+
+         a+(*COMMIT)b
+
+       This  matches  "xxaab" but not "aacaab". It can be thought of as a kind
+       of dynamic anchor, or "I've started, so I must finish."
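+
+       The following C sketch (not PCRE code; the function name is hypo-
+       thetical) hand-codes a+(*COMMIT)b next to plain a+b, to show where
+       (*COMMIT) cuts the search short.
```c
#include <string.h>

/* Illustrative sketch (not PCRE code): unanchored matching of
   a+(*COMMIT)b when commit is nonzero, or of plain a+b when it is zero.
   Returns the offset where the match starts, or -1 if there is none. */
static int match_commit_example(const char *s, int commit)
{
    size_t len = strlen(s);

    for (size_t start = 0; start < len; start++) {
        size_t run = 0;
        while (s[start + run] == 'a')
            run++;
        if (run == 0)
            continue;              /* a+ failed; (*COMMIT) was never reached */

        /* Greedy a+: try the longest run first, then shorter ones. */
        for (size_t k = run; k > 0; k--) {
            if (s[start + k] == 'b')
                return (int)start;
            if (commit)
                return -1;         /* backtracking reached (*COMMIT): give up */
        }
    }
    return -1;
}
```
+       With commit set, "xxaab" matches at offset 2 and "aacaab" does not
+       match; without it, "aacaab" matches at offset 3.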
+
+         (*PRUNE)
+
+       This verb causes the match to fail at the current position if the  rest
+       of the pattern does not match. If the pattern is unanchored, the normal
+       "bumpalong" advance to the next starting character then happens.  Back-
+       tracking  can  occur as usual to the left of (*PRUNE), or when matching
+       to the right of (*PRUNE), but if there is no match to the right,  back-
+       tracking  cannot  cross (*PRUNE).  In simple cases, the use of (*PRUNE)
+       is just an alternative to an atomic group or possessive quantifier, but
+       there  are  some uses of (*PRUNE) that cannot be expressed in any other
+       way.
+
+         (*SKIP)
+
+       This verb is like (*PRUNE), except that if the pattern  is  unanchored,
+       the  "bumpalong" advance is not to the next character, but to the posi-
+       tion in the subject where (*SKIP) was  encountered.  (*SKIP)  signifies
+       that  whatever  text  was  matched leading up to it cannot be part of a
+       successful match. Consider:
+
+         a+(*SKIP)b
+
+       If the subject is "aaaac...",  after  the  first  match  attempt  fails
+       (starting  at  the  first  character in the string), the starting point
+       skips on to start the next attempt at "c". Note that a possessive quan-
+       tifier does not have the same effect as this example; although it would
+       suppress backtracking  during  the  first  match  attempt,  the  second
+       attempt  would  start at the second character instead of skipping on to
+       "c".
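+
+       To compare (*SKIP) with (*PRUNE), the C sketch below (not PCRE code;
+       the enum and function names are hypothetical) hand-codes a+(*VERB)b
+       and counts the starting positions that are tried. The count is
+       naive: PCRE's own start-of-match optimizations are not modelled.
```c
#include <string.h>

enum verb { VERB_PRUNE, VERB_SKIP };

/* Illustrative sketch (not PCRE code): unanchored matching of
   a+(*PRUNE)b or a+(*SKIP)b. Returns the start offset of the match, or
   -1. *attempts receives the number of starting positions tried. */
static int match_verb_example(const char *s, enum verb v, int *attempts)
{
    int len = (int)strlen(s);
    int start = 0;

    *attempts = 0;
    while (start < len) {
        (*attempts)++;
        int run = 0;
        while (s[start + run] == 'a')
            run++;
        if (run == 0) {            /* a+ fails before the verb is reached */
            start++;
            continue;
        }
        /* The verb has been passed. Giving back characters from a+ would
           mean backtracking across it, so only the greedy length counts. */
        if (s[start + run] == 'b')
            return start;
        /* (*PRUNE): normal bumpalong by one character.
           (*SKIP): resume where (*SKIP) was encountered. */
        start = (v == VERB_SKIP) ? start + run : start + 1;
    }
    return -1;
}
```
+       For "aaaac", (*PRUNE) tries all five starting positions, whereas
+       (*SKIP) tries only the first "a" and then the "c".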
+
+         (*THEN)
+
+       This verb causes a skip to the next alternative if the rest of the pat-
+       tern does not match. That is, it cancels pending backtracking, but only
+       within the current alternation. Its name  comes  from  the  observation
+       that it can be used for a pattern-based if-then-else block:
+
+         ( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ...
+
+       If  the COND1 pattern matches, FOO is tried (and possibly further items
+       after the end of the group if FOO succeeds);  on  failure  the  matcher
+       skips  to  the second alternative and tries COND2, without backtracking
+       into COND1. If (*THEN) is used outside  of  any  alternation,  it  acts
+       exactly like (*PRUNE).
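+
+       The C sketch below (not PCRE code) uses a made-up pattern,
+       (?:a+(*THEN)ab|x), to show what (*THEN) prevents. For simplicity it
+       tries only the first starting position; the function name is hypo-
+       thetical.
```c
#include <string.h>

/* Illustrative sketch (not PCRE code) using a made-up pattern: matches
   (?:a+(*THEN)ab|x) when use_then is nonzero, or (?:a+ab|x) when it is
   zero, at the start of the subject only. Returns the length matched,
   or -1 if neither alternative matches there. */
static int match_then_example(const char *s, int use_then)
{
    size_t run = strspn(s, "a");          /* greedy a+ takes every leading 'a' */

    /* First alternative: a+ then "ab", longest run first. */
    for (size_t k = run; k > 0; k--) {
        if (strncmp(s + k, "ab", 2) == 0)
            return (int)(k + 2);
        if (use_then)
            break;   /* (*THEN): no backtracking into a+; try the next alternative */
    }
    /* Second alternative: a single "x". */
    return s[0] == 'x' ? 1 : -1;
}
```
+       On "aaab", the version without (*THEN) lets a+ give back an "a" so
+       that "ab" can follow, and matches 4 characters. With (*THEN), the
+       first alternative fails as soon as "ab" does not follow the greedy
+       a+, "x" fails too, and the result is -1.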
+
+
+SEE ALSO
+
+       pcreapi(3), pcrecallout(3), pcrematching(3), pcresyntax(3), pcre(3).
+
+
+AUTHOR
+
+       Philip Hazel
+       University Computing Service
+       Cambridge CB2 3QH, England.
+
+
+REVISION
+
+       Last updated: 18 October 2009
+       Copyright (c) 1997-2009 University of Cambridge.
+------------------------------------------------------------------------------
+
+
+PCRESYNTAX(3)                                                    PCRESYNTAX(3)
+
+
+NAME
+       PCRE - Perl-compatible regular expressions
+
+
+PCRE REGULAR EXPRESSION SYNTAX SUMMARY
+
+       The  full syntax and semantics of the regular expressions that are sup-
+       ported by PCRE are described in  the  pcrepattern  documentation.  This
+       document contains just a quick-reference summary of the syntax.
+
+
+QUOTING
+
+         \x         where x is non-alphanumeric is a literal x
+         \Q...\E    treat enclosed characters as literal
+
+
+CHARACTERS
+
+         \a         alarm, that is, the BEL character (hex 07)
+         \cx        "control-x", where x is any character
+         \e         escape (hex 1B)
+         \f         formfeed (hex 0C)
+         \n         newline (hex 0A)
+         \r         carriage return (hex 0D)
+         \t         tab (hex 09)
+         \ddd       character with octal code ddd, or backreference
+         \xhh       character with hex code hh
+         \x{hhh..}  character with hex code hhh..
+
+
+CHARACTER TYPES
+
+         .          any character except newline;
+                      in dotall mode, any character whatsoever
+         \C         one byte, even in UTF-8 mode (best avoided)
+         \d         a decimal digit
+         \D         a character that is not a decimal digit
+         \h         a horizontal whitespace character
+         \H         a character that is not a horizontal whitespace character
+         \p{xx}     a character with the xx property
+         \P{xx}     a character without the xx property
+         \R         a newline sequence
+         \s         a whitespace character
+         \S         a character that is not a whitespace character
+         \v         a vertical whitespace character
+         \V         a character that is not a vertical whitespace character
+         \w         a "word" character
+         \W         a "non-word" character
+         \X         an extended Unicode sequence
+
+       In PCRE, \d, \D, \s, \S, \w, and \W recognize only ASCII characters.
+
+
+GENERAL CATEGORY PROPERTY CODES FOR \p and \P
+
+         C          Other
+         Cc         Control
+         Cf         Format
+         Cn         Unassigned
+         Co         Private use
+         Cs         Surrogate
+
+         L          Letter
+         Ll         Lower case letter
+         Lm         Modifier letter
+         Lo         Other letter
+         Lt         Title case letter
+         Lu         Upper case letter
+         L&         Ll, Lu, or Lt
+
+         M          Mark
+         Mc         Spacing mark
+         Me         Enclosing mark
+         Mn         Non-spacing mark
+
+         N          Number
+         Nd         Decimal number
+         Nl         Letter number
+         No         Other number
+
+         P          Punctuation
+         Pc         Connector punctuation
+         Pd         Dash punctuation
+         Pe         Close punctuation
+         Pf         Final punctuation
+         Pi         Initial punctuation
+         Po         Other punctuation
+         Ps         Open punctuation
+
+         S          Symbol
+         Sc         Currency symbol
+         Sk         Modifier symbol
+         Sm         Mathematical symbol
+         So         Other symbol
+
+         Z          Separator
+         Zl         Line separator
+         Zp         Paragraph separator
+         Zs         Space separator
+
+
+SCRIPT NAMES FOR \p AND \P
+
+       Arabic,  Armenian,  Balinese,  Bengali,  Bopomofo,  Braille,  Buginese,
+       Buhid, Canadian_Aboriginal, Carian, Cham, Cherokee, Common, Coptic, Cu-
+       neiform,  Cypriot,  Cyrillic,  Deseret, Devanagari, Ethiopic, Georgian,
+       Glagolitic, Gothic, Greek, Gujarati, Gurmukhi,  Han,  Hangul,  Hanunoo,
+       Hebrew,  Hiragana,  Inherited, Kannada, Katakana, Kayah_Li, Kharoshthi,
+       Khmer, Lao, Latin, Lepcha, Limbu, Linear_B, Lycian, Lydian,  Malayalam,
+       Mongolian,  Myanmar,  New_Tai_Lue, Nko, Ogham, Old_Italic, Old_Persian,
+       Ol_Chiki, Oriya, Osmanya, Phags_Pa, Phoenician, Rejang, Runic, Saurash-
+       tra,  Shavian,  Sinhala,  Sudanese, Syloti_Nagri, Syriac, Tagalog, Tag-
+       banwa,  Tai_Le,  Tamil,  Telugu,  Thaana,  Thai,   Tibetan,   Tifinagh,
+       Ugaritic, Vai, Yi.
+
+
+CHARACTER CLASSES
+
+         [...]       positive character class
+         [^...]      negative character class
+         [x-y]       range (can be used for hex characters)
+         [[:xxx:]]   positive POSIX named set
+         [[:^xxx:]]  negative POSIX named set
+
+         alnum       alphanumeric
+         alpha       alphabetic
+         ascii       0-127
+         blank       space or tab
+         cntrl       control character
+         digit       decimal digit
+         graph       printing, excluding space
+         lower       lower case letter
+         print       printing, including space
+         punct       printing, excluding alphanumeric
+         space       whitespace
+         upper       upper case letter
+         word        same as \w
+         xdigit      hexadecimal digit
+
+       In PCRE, POSIX character set names recognize only ASCII characters. You
+       can use \Q...\E inside a character class.
+
+
+QUANTIFIERS
+
+         ?           0 or 1, greedy
+         ?+          0 or 1, possessive
+         ??          0 or 1, lazy
+         *           0 or more, greedy
+         *+          0 or more, possessive
+         *?          0 or more, lazy
+         +           1 or more, greedy
+         ++          1 or more, possessive
+         +?          1 or more, lazy
+         {n}         exactly n
+         {n,m}       at least n, no more than m, greedy
+         {n,m}+      at least n, no more than m, possessive
+         {n,m}?      at least n, no more than m, lazy
+         {n,}        n or more, greedy
+         {n,}+       n or more, possessive
+         {n,}?       n or more, lazy
+
+
+ANCHORS AND SIMPLE ASSERTIONS
+
+         \b          word boundary (only ASCII letters recognized)
+         \B          not a word boundary
+         ^           start of subject
+                      also after internal newline in multiline mode
+         \A          start of subject
+         $           end of subject
+                      also before newline at end of subject
+                      also before internal newline in multiline mode
+         \Z          end of subject
+                      also before newline at end of subject
+         \z          end of subject
+         \G          first matching position in subject
+
+
+MATCH POINT RESET
+
+         \K          reset start of match
+
+
+ALTERNATION
+
+         expr|expr|expr...
+
+
+CAPTURING
+
+         (...)           capturing group
+         (?<name>...)    named capturing group (Perl)
+         (?'name'...)    named capturing group (Perl)
+         (?P<name>...)   named capturing group (Python)
+         (?:...)         non-capturing group
+         (?|...)         non-capturing group; reset group numbers for
+                          capturing groups in each alternative
+
+
+ATOMIC GROUPS
+
+         (?>...)         atomic, non-capturing group
+
+
+COMMENT
+
+         (?#....)        comment (not nestable)
+
+
+OPTION SETTING
+
+         (?i)            caseless
+         (?J)            allow duplicate names
+         (?m)            multiline
+         (?s)            single line (dotall)
+         (?U)            default ungreedy (lazy)
+         (?x)            extended (ignore white space)
+         (?-...)         unset option(s)
+
+       The following is recognized only at the start of a pattern or after one
+       of the newline-setting options with similar syntax:
+
+         (*UTF8)         set UTF-8 mode
+
+
+LOOKAHEAD AND LOOKBEHIND ASSERTIONS
+
+         (?=...)         positive look ahead
+         (?!...)         negative look ahead
+         (?<=...)        positive look behind
+         (?<!...)        negative look behind
+
+       Each top-level branch of a look behind must be of a fixed length.
+
+
+BACKREFERENCES
+
+         \n              reference by number (can be ambiguous)
+         \gn             reference by number
+         \g{n}           reference by number
+         \g{-n}          relative reference by number
+         \k<name>        reference by name (Perl)
+         \k'name'        reference by name (Perl)
+         \g{name}        reference by name (Perl)
+         \k{name}        reference by name (.NET)
+         (?P=name)       reference by name (Python)
+
+
+SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)
+
+         (?R)            recurse whole pattern
+         (?n)            call subpattern by absolute number
+         (?+n)           call subpattern by relative number
+         (?-n)           call subpattern by relative number
+         (?&name)        call subpattern by name (Perl)
+         (?P>name)       call subpattern by name (Python)
+         \g<name>        call subpattern by name (Oniguruma)
+         \g'name'        call subpattern by name (Oniguruma)
+         \g<n>           call subpattern by absolute number (Oniguruma)
+         \g'n'           call subpattern by absolute number (Oniguruma)
+         \g<+n>          call subpattern by relative number (PCRE extension)
+         \g'+n'          call subpattern by relative number (PCRE extension)
+         \g<-n>          call subpattern by relative number (PCRE extension)
+         \g'-n'          call subpattern by relative number (PCRE extension)
+
+
+CONDITIONAL PATTERNS
+
+         (?(condition)yes-pattern)
+         (?(condition)yes-pattern|no-pattern)
+
+         (?(n)...        absolute reference condition
+         (?(+n)...       relative reference condition
+         (?(-n)...       relative reference condition
+         (?(<name>)...   named reference condition (Perl)
+         (?('name')...   named reference condition (Perl)
+         (?(name)...     named reference condition (PCRE)
+         (?(R)...        overall recursion condition
+         (?(Rn)...       specific group recursion condition
+         (?(R&name)...   specific recursion condition
+         (?(DEFINE)...   define subpattern for reference
+         (?(assert)...   assertion condition
+
+
+BACKTRACKING CONTROL
+
+       The following act as soon as they are reached:
+
+         (*ACCEPT)       force successful match
+         (*FAIL)         force backtrack; synonym (*F)
+
+       The  following  act only when a subsequent match failure causes a back-
+       track to reach them. They all force a match failure, but they differ in
+       what happens afterwards. Those that advance the start-of-match point do
+       so only if the pattern is not anchored.
+
+         (*COMMIT)       overall failure, no advance of starting point
+         (*PRUNE)        advance to next starting character
+         (*SKIP)         advance start to current matching position
+         (*THEN)         local failure, backtrack to next alternation
+
+
+NEWLINE CONVENTIONS
+
+       These are recognized only at the very start of the pattern or  after  a
+       (*BSR_...) or (*UTF8) option.
+
+         (*CR)           carriage return only
+         (*LF)           linefeed only
+         (*CRLF)         carriage return followed by linefeed
+         (*ANYCRLF)      all three of the above
+         (*ANY)          any Unicode newline sequence
+
+
+WHAT \R MATCHES
+
+       These  are  recognized only at the very start of the pattern or after a
+       (*...) option that sets the newline convention or UTF-8 mode.
+
+         (*BSR_ANYCRLF)  CR, LF, or CRLF
+         (*BSR_UNICODE)  any Unicode newline sequence
+
+
+CALLOUTS
+
+         (?C)      callout
+         (?Cn)     callout with data n
+
+
+SEE ALSO
+
+       pcrepattern(3), pcreapi(3), pcrecallout(3), pcrematching(3), pcre(3).
+
+
+AUTHOR
+
+       Philip Hazel
+       University Computing Service
+       Cambridge CB2 3QH, England.
+
+
+REVISION
+
+       Last updated: 11 April 2009
+       Copyright (c) 1997-2009 University of Cambridge.
+------------------------------------------------------------------------------
+
+
+PCREPARTIAL(3)                                                  PCREPARTIAL(3)
+
+
+NAME
+       PCRE - Perl-compatible regular expressions
+
+
+PARTIAL MATCHING IN PCRE
+
+       In  normal  use  of  PCRE,  if  the  subject  string  that is passed to
+       pcre_exec() or pcre_dfa_exec() matches as far as it goes,  but  is  too
+       short  to  match  the  entire  pattern, PCRE_ERROR_NOMATCH is returned.
+       There are circumstances where it might be helpful to  distinguish  this
+       case from other cases in which there is no match.
+
+       Consider, for example, an application where a human is required to type
+       in data for a field with specific formatting requirements.  An  example
+       might be a date in the form ddmmmyy, defined by this pattern:
+
+         ^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$
+
+       If the application sees the user's keystrokes one by one, and can check
+       that what has been typed so far is potentially valid,  it  is  able  to
+       raise  an  error  as  soon  as  a  mistake  is made, by beeping and not
+       reflecting the character that has been typed, for example. This immedi-
+       ate  feedback is likely to be a better user interface than a check that
+       is delayed until the entire string has been entered.  Partial  matching
+       can  also  sometimes be useful when the subject string is very long and
+       is not all available at once.
+
+       PCRE supports partial matching by means of  the  PCRE_PARTIAL_SOFT  and
+       PCRE_PARTIAL_HARD options, which can be set when calling pcre_exec() or
+       pcre_dfa_exec(). For backwards compatibility, PCRE_PARTIAL is a synonym
+       for PCRE_PARTIAL_SOFT. The essential difference between the two options
+       is whether or not a partial match is preferred to an  alternative  com-
+       plete  match,  though the details differ between the two matching func-
+       tions. If both options are set, PCRE_PARTIAL_HARD takes precedence.
+
+       Setting a partial matching option disables two of PCRE's optimizations.
+       PCRE  remembers the last literal byte in a pattern, and abandons match-
+       ing immediately if such a byte is not present in  the  subject  string.
+       This  optimization cannot be used for a subject string that might match
+       only partially. If the pattern was  studied,  PCRE  knows  the  minimum
+       length  of  a  matching string, and does not bother to run the matching
+       function on shorter strings. This optimization  is  also  disabled  for
+       partial matching.
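+
+       The two pre-checks can be pictured with this C sketch. It is not
+       PCRE's implementation; the function and its parameters (the last
+       literal byte, and the minimum length found by studying) are assump-
+       tions chosen to mirror the description above.
```c
#include <stdbool.h>
#include <string.h>

/* Illustrative sketch (not PCRE's implementation). last_byte is the last
   literal byte the pattern requires and min_len the minimum match length
   found by studying; both are assumed to be known already. Returns true
   when the matcher need not be run because no complete match is
   possible. */
static bool can_skip_matching(const char *subject, char last_byte,
                              size_t min_len, bool partial)
{
    if (partial)
        return false;   /* a partial match may be short and lack last_byte */
    if (strlen(subject) < min_len)
        return true;
    return strchr(subject, last_byte) == NULL;
}
```
+       For the pattern /abc/ (last literal "c", minimum length 3), the sub-
+       ject "xab" would be rejected without matching, although it ends with
+       the partial match "ab".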
+
+
+PARTIAL MATCHING USING pcre_exec()
+
+       A partial match occurs during a call to pcre_exec() whenever the end of
+       the subject string is reached successfully, but  matching  cannot  con-
+       tinue because more characters are needed. However, at least one charac-
+       ter must have been matched. (In other words, a partial match can  never
+       be an empty string.)
+
+       If  PCRE_PARTIAL_SOFT  is  set,  the  partial  match is remembered, but
+       matching continues as normal, and other alternatives in the pattern are
+       tried.   If  no  complete  match  can  be  found,  pcre_exec()  returns
+       PCRE_ERROR_PARTIAL instead of PCRE_ERROR_NOMATCH. If there are at least
+       two slots in the offsets vector, the first of them is set to the offset
+       of the earliest character that was inspected when the partial match was
+       found.  For  convenience,  the  second  offset points to the end of the
+       string so that a substring can easily be identified.
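+
+       For a pattern that is just a literal string, the offsets described
+       above can be computed by the following C sketch. It is not PCRE
+       code; the function name and result codes are hypothetical, and
+       lookbehind assertions, \K, and \b are not modelled.
```c
#include <string.h>

enum { LIT_NOMATCH, LIT_MATCH, LIT_PARTIAL };   /* hypothetical result codes */

/* Illustrative sketch (not PCRE code): partial matching of a non-empty
   literal string lit against s. A partial match is a starting position
   at which the end of s is reached after at least one character of lit
   has matched. Offsets go into ov[0] and ov[1]. */
static int match_literal_partial(const char *s, const char *lit, int ov[2])
{
    size_t n = strlen(s), m = strlen(lit);

    for (size_t start = 0; start < n; start++) {
        size_t k = 0;
        while (k < m && start + k < n && s[start + k] == lit[k])
            k++;
        if (k == m) {                        /* a complete match */
            ov[0] = (int)start;
            ov[1] = (int)(start + m);
            return LIT_MATCH;
        }
        if (k > 0 && start + k == n) {
            /* A partial match. For a single literal, no complete match can
               start later in s, so the first partial match is the result. */
            ov[0] = (int)start;              /* first character inspected */
            ov[1] = (int)n;                  /* end of the subject */
            return LIT_PARTIAL;
        }
    }
    return LIT_NOMATCH;
}
```
+       Matching "dog" against "xyzdo" gives a partial match with offsets 3
+       and 5.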
+
+       For the majority of patterns, the first offset identifies the start  of
+       the  partially matched string. However, for patterns that contain look-
+       behind assertions, or \K, or begin with \b or  \B,  earlier  characters
+       have been inspected while carrying out the match. For example:
+
+         /(?<=abc)123/
+
+       This pattern matches "123", but only if it is preceded by "abc". If the
+       subject string is "xyzabc12", the offsets after a partial match are for
+       the  substring  "abc12",  because  all  these  characters are needed if
+       another match is tried with extra characters added.
+
+       If there is more than one partial match, the first one that  was  found
+       provides the data that is returned. Consider this pattern:
+
+         /123\w+X|dogY/
+
+       If  this is matched against the subject string "abc123dog", both alter-
+       natives fail to match, but the end of the  subject  is  reached  during
+       matching,    so    PCRE_ERROR_PARTIAL    is    returned    instead   of
+       PCRE_ERROR_NOMATCH. The  offsets  are  set  to  3  and  9,  identifying
+       "123dog"  as  the first partial match that was found. (In this example,
+       there are two partial matches,  because  "dog"  on  its  own  partially
+       matches the second alternative.)
+
+       If PCRE_PARTIAL_HARD is set for pcre_exec(), it returns PCRE_ERROR_PAR-
+       TIAL as soon as a partial match is found, without continuing to  search
+       for  possible  complete matches. The difference between the two options
+       can be illustrated by a pattern such as:
+
+         /dog(sbody)?/
+
+       This matches either "dog" or "dogsbody", greedily (that is, it  prefers
+       the  longer  string  if  possible). If it is matched against the string
+       "dog" with PCRE_PARTIAL_SOFT, it yields a  complete  match  for  "dog".
+       However, if PCRE_PARTIAL_HARD is set, the result is PCRE_ERROR_PARTIAL.
+       On the other hand, if the pattern is made ungreedy the result  is  dif-
+       ferent:
+
+         /dog(sbody)??/
+
+       In  this case the result is always a complete match because pcre_exec()
+       finds that first, and it never continues  after  finding  a  match.  It
+       might  be easier to follow this explanation by thinking of the two pat-
+       terns like this:
+
+         /dog(sbody)?/    is the same as  /dogsbody|dog/
+         /dog(sbody)??/   is the same as  /dog|dogsbody/
+
+       The second pattern will never  match  "dogsbody"  when  pcre_exec()  is
+       used, because it will always find the shorter match first.
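+
+       The equivalences above suggest a way to model the two options. In
+       the C sketch below (not PCRE code; all names are hypothetical), each
+       pattern is a list of literal alternatives tried in order at the
+       start of the subject, and the hard flag selects between
+       PCRE_PARTIAL_HARD and PCRE_PARTIAL_SOFT behaviour.
```c
#include <stdbool.h>
#include <string.h>

enum alt_result { ALT_NOMATCH, ALT_MATCH, ALT_PARTIAL };   /* hypothetical */

/* Illustrative sketch (not PCRE code): tries literal alternatives in
   order at the start of s, the way a backtracking matcher would, and
   applies hard or soft partial-match rules. */
static enum alt_result match_alts(const char *s, const char *const alts[],
                                  size_t nalts, bool hard)
{
    size_t n = strlen(s);
    bool seen_partial = false;

    for (size_t i = 0; i < nalts; i++) {
        size_t m = strlen(alts[i]), k = 0;
        while (k < m && k < n && s[k] == alts[i][k])
            k++;
        if (k == m)
            return ALT_MATCH;         /* the first complete match ends the search */
        if (k > 0 && k == n) {        /* the subject ran out part-way */
            if (hard)
                return ALT_PARTIAL;   /* hard: report the partial match at once */
            seen_partial = true;      /* soft: remember it and keep trying */
        }
    }
    return seen_partial ? ALT_PARTIAL : ALT_NOMATCH;
}
```
+       With {"dogsbody", "dog"} and the subject "dog", the soft setting
+       yields a complete match and the hard setting a partial match; with
+       {"dog", "dogsbody"} both yield a complete match.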
+
+
+PARTIAL MATCHING USING pcre_dfa_exec()
+
+       The  pcre_dfa_exec()  function moves along the subject string character
+       by character, without backtracking, searching for all possible  matches
+       simultaneously.  If the end of the subject is reached before the end of
+       the pattern, there is the possibility of a partial  match,  again  pro-
+       vided that at least one character has matched.
+
+       When  PCRE_PARTIAL_SOFT  is set, PCRE_ERROR_PARTIAL is returned only if
+       there have been no complete matches. Otherwise,  the  complete  matches
+       are  returned.   However,  if PCRE_PARTIAL_HARD is set, a partial match
+       takes precedence over any complete matches. The portion of  the  string
+       that  was  inspected when the longest partial match was found is set as
+       the first matching string, provided there are at least two slots in the
+       offsets vector.
+
+       Because  pcre_dfa_exec()  always searches for all possible matches, and
+       there is no difference between greedy and ungreedy repetition, its  be-
+       haviour is different from pcre_exec() when PCRE_PARTIAL_HARD is set.
+       Consider the string "dog" matched against the ungreedy pattern shown
+       above:
+
+         /dog(sbody)??/
+
+       Whereas  pcre_exec()  stops  as soon as it finds the complete match for
+       "dog", pcre_dfa_exec() also finds the partial match for "dogsbody", and
+       so returns that when PCRE_PARTIAL_HARD is set.
+
+
+PARTIAL MATCHING AND WORD BOUNDARIES
+
+       If a pattern ends with one of the sequences \b or \B, which test for word
+       boundaries, partial matching with PCRE_PARTIAL_SOFT can  give  counter-
+       intuitive results. Consider this pattern:
+
+         /\bcat\b/
+
+       This matches "cat", provided there is a word boundary at either end. If
+       the subject string is "the cat", the comparison of the final "t" with a
+       following  character  cannot  take  place, so a partial match is found.
+       However, pcre_exec() carries on with normal matching, which matches  \b
+       at  the  end  of  the subject when the last character is a letter, thus
+       finding a complete match. The result, therefore, is not PCRE_ERROR_PAR-
+       TIAL.  The  same  thing  happens  with pcre_dfa_exec(), because it also
+       finds the complete match.
+
+       Using PCRE_PARTIAL_HARD in this  case  does  yield  PCRE_ERROR_PARTIAL,
+       because then the partial match takes precedence.
+
+
+FORMERLY RESTRICTED PATTERNS
+
+       For releases of PCRE prior to 8.00, because of the way certain internal
+       optimizations  were  implemented  in  the  pcre_exec()  function,   the
+       PCRE_PARTIAL  option  (predecessor  of  PCRE_PARTIAL_SOFT) could not be
+       used with all patterns. From release 8.00 onwards, the restrictions  no
+       longer  apply,  and  partial matching with pcre_exec() can be requested
+       for any pattern.
+
+       Items that were formerly restricted were repeated single characters and
+       repeated  metasequences. If PCRE_PARTIAL was set for a pattern that did
+       not conform to the restrictions, pcre_exec() returned  the  error  code
+       PCRE_ERROR_BADPARTIAL  (-13).  This error code is no longer in use. The
+       PCRE_INFO_OKPARTIAL call to pcre_fullinfo() to find out if  a  compiled
+       pattern can be used for partial matching now always returns 1.
+
+
+EXAMPLE OF PARTIAL MATCHING USING PCRETEST
+
+       If  the  escape  sequence  \P  is  present in a pcretest data line, the
+       PCRE_PARTIAL_SOFT option is used for  the  match.  Here  is  a  run  of
+       pcretest that uses the date example quoted above:
+
+           re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
+         data> 25jun04\P
+          0: 25jun04
+          1: jun
+         data> 25dec3\P
+         Partial match: 25dec3
+         data> 3ju\P
+         Partial match: 3ju
+         data> 3juj\P
+         No match
+         data> j\P
+         No match
+
+       The  first  data  string  is  matched completely, so pcretest shows the
+       matched substrings. The remaining four strings do not  match  the  com-
+       plete pattern, but the first two are partial matches. Similar output is
+       obtained when pcre_dfa_exec() is used.
+
+       If the escape sequence \P is present more than once in a pcretest  data
+       line, the PCRE_PARTIAL_HARD option is set for the match.
+
+
+MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()
+
+       When a partial match has been found using pcre_dfa_exec(), it is possi-
+       ble to continue the match by  providing  additional  subject  data  and
+       calling  pcre_dfa_exec()  again  with the same compiled regular expres-
+       sion, this time setting the PCRE_DFA_RESTART option. You must pass  the
+       same working space as before, because this is where details of the pre-
+       vious partial match are stored. Here  is  an  example  using  pcretest,
+       using  the  \R  escape  sequence to set the PCRE_DFA_RESTART option (\D
+       specifies the use of pcre_dfa_exec()):
+
+           re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
+         data> 23ja\P\D
+         Partial match: 23ja
+         data> n05\R\D
+          0: n05
+
+       The first call has "23ja" as the subject, and requests  partial  match-
+       ing;  the  second  call  has  "n05"  as  the  subject for the continued
+       (restarted) match.  Notice that when the match is  complete,  only  the
+       last  part  is  shown;  PCRE  does not retain the previously partially-
+       matched string. It is up to the calling program to do that if it  needs
+       to.
+
+       You  can  set  the  PCRE_PARTIAL_SOFT or PCRE_PARTIAL_HARD options with
+       PCRE_DFA_RESTART to continue partial matching over  multiple  segments.
+       This  facility  can  be  used  to  pass  very  long  subject strings to
+       pcre_dfa_exec().
+
+
+MULTI-SEGMENT MATCHING WITH pcre_exec()
+
+       From release 8.00, pcre_exec() can also be  used  to  do  multi-segment
+       matching.  Unlike  pcre_dfa_exec(),  it  is not possible to restart the
+       previous match with a new segment of data. Instead, new  data  must  be
+       added  to  the  previous  subject  string, and the entire match re-run,
+       starting from the point where the partial match occurred. Earlier  data
+       can be discarded.  Consider an unanchored pattern that matches dates:
+
+           re> /\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d/
+         data> The date is 23ja\P
+         Partial match: 23ja
+
+       At  this stage, an application could discard the text preceding "23ja",
+       add on text from the next segment, and call pcre_exec()  again.  Unlike
+       pcre_dfa_exec(),  the  entire matching string must always be available,
+       and the complete matching process occurs for each call, so more  memory
+       and more processing time is needed.
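+
+       The buffer handling just described can be sketched in plain C. This is
+       a minimal illustration. It assumes that keep_from is the offset at
+       which the partial match started, taken from the offsets that
+       pcre_exec() returns for the partial match. As the note below explains,
+       that offset may lie before the partially matched text itself. The name
+       rebuild_subject() is a hypothetical helper and is not part of PCRE.
```c
#include <string.h>

/* Hypothetical helper (not part of PCRE): drop buf[0..keep_from) and
   append next[0..next_len). buf has room for cap bytes and holds *len
   bytes; it is not NUL-terminated, because pcre_exec() is given an
   explicit subject length. Returns 0 on success, -1 if the offsets are
   invalid or the result would not fit. */
int rebuild_subject(char *buf, size_t cap, size_t *len, size_t keep_from,
                    const char *next, size_t next_len)
{
  size_t kept;

  if (keep_from > *len || *len > cap) return -1;
  kept = *len - keep_from;
  if (next_len > cap - kept) return -1;
  memmove(buf, buf + keep_from, kept);  /* source and target may overlap */
  memcpy(buf + kept, next, next_len);
  *len = kept + next_len;
  return 0;
}
```
+       Suppose the buffer holds "The date is 23ja", keep_from is 12 and the
+       next segment is "n05". The buffer then ends up holding the 7 bytes
+       "23jan05", and that whole string is what pcre_exec() would be given
+       next.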
+
+       Note:  If  the pattern contains lookbehind assertions, or \K, or starts
+       with \b or \B, the string that is returned for  a  partial  match  will
+       include  characters  that  precede the partially matched string itself,
+       because these must be retained when adding on  more  characters  for  a
+       subsequent matching attempt.
+
+
+ISSUES WITH MULTI-SEGMENT MATCHING
+
+       Certain types of pattern may give problems with multi-segment matching,
+       whichever matching function is used.
+
+       1. If the pattern contains tests for the beginning or end  of  a  line,
+       you  need  to pass the PCRE_NOTBOL or PCRE_NOTEOL options, as appropri-
+       ate, when the subject string for any call does not contain  the  begin-
+       ning or end of a line.
+
+       2.  Lookbehind  assertions at the start of a pattern are catered for in
+       the offsets that are returned for a partial match. However, in  theory,
+       a  lookbehind assertion later in the pattern could require even earlier
+       characters to be inspected, and it might not have been reached  when  a
+       partial  match occurs. This is probably an extremely unlikely case; you
+       could guard against it to a certain extent by  always  including  extra
+       characters at the start.
+
+       3.  Matching  a subject string that is split into multiple segments may
+       not always produce exactly the same result as matching over one  single
+       long  string,  especially  when  PCRE_PARTIAL_SOFT is used. The section
+       "Partial Matching and Word Boundaries" above describes  an  issue  that
+       arises  if  the  pattern ends with \b or \B. Another kind of difference
+       may occur when there are multiple  matching  possibilities,  because  a
+       partial match result is given only when there are no completed matches.
+       This means that as soon as the shortest match has been found, continua-
+       tion  to  a  new subject segment is no longer possible.  Consider again
+       this pcretest example:
+
+           re> /dog(sbody)?/
+         data> dogsb\P
+          0: dog
+         data> do\P\D
+         Partial match: do
+         data> gsb\R\P\D
+          0: g
+         data> dogsbody\D
+          0: dogsbody
+          1: dog
+
+       The first data line passes the string "dogsb" to  pcre_exec(),  setting
+       the  PCRE_PARTIAL_SOFT  option.  Although the string is a partial match
+       for "dogsbody", the  result  is  not  PCRE_ERROR_PARTIAL,  because  the
+       shorter  string  "dog" is a complete match. Similarly, when the subject
+       is presented to pcre_dfa_exec() in several parts ("do" and "gsb"  being
+       the first two) the match stops when "dog" has been found, and it is not
+       possible to continue. On the other hand, if "dogsbody" is presented  as
+       a single string, pcre_dfa_exec() finds both matches.
+
+       Because of these problems, it is probably best to use PCRE_PARTIAL_HARD
+       when matching multi-segment data. The example above then  behaves  dif-
+       ferently:
+
+           re> /dog(sbody)?/
+         data> dogsb\P\P
+         Partial match: dogsb
+         data> do\P\D
+         Partial match: do
+         data> gsb\R\P\P\D
+         Partial match: gsb
+
+
+       4. Patterns that contain alternatives at the top level which do not all
+       start with the  same  pattern  item  may  not  work  as  expected  when
+       PCRE_DFA_RESTART  is  used  with pcre_dfa_exec(). For example, consider
+       this pattern:
+
+         1234|3789
+
+       If the first part of the subject is "ABC123", a partial  match  of  the
+       first  alternative  is found at offset 3. There is no partial match for
+       the second alternative, because such a match does not start at the same
+       point  in  the  subject  string. Attempting to continue with the string
+       "7890" does not yield a match  because  only  those  alternatives  that
+       match  at  one  point in the subject are remembered. The problem arises
+       because the start of the second alternative matches  within  the  first
+       alternative.  There  is  no  problem with anchored patterns or patterns
+       such as:
+
+         1234|ABCD
+
+       where no string can be a partial match for both alternatives.  This  is
+       not  a  problem if pcre_exec() is used, because the entire match has to
+       be rerun each time:
+
+           re> /1234|3789/
+         data> ABC123\P
+         Partial match: 123
+         data> 1237890
+          0: 3789
+
+       Of course, instead of using PCRE_DFA_RESTART, the same technique of
+       re-running the entire match can also be used with pcre_dfa_exec(). Another
+       possibility is to work with two buffers. If a partial match at offset n
+       in  the first buffer is followed by "no match" when PCRE_DFA_RESTART is
+       used on the second buffer, you can then try a  new  match  starting  at
+       offset n+1 in the first buffer.
+
+
+AUTHOR
+
+       Philip Hazel
+       University Computing Service
+       Cambridge CB2 3QH, England.
+
+
+REVISION
+
+       Last updated: 19 October 2009
+       Copyright (c) 1997-2009 University of Cambridge.
+------------------------------------------------------------------------------
+
+
+PCREPRECOMPILE(3)                                            PCREPRECOMPILE(3)
+
+
+NAME
+       PCRE - Perl-compatible regular expressions
+
+
+SAVING AND RE-USING PRECOMPILED PCRE PATTERNS
+
+       If  you  are running an application that uses a large number of regular
+       expression patterns, it may be useful to store them  in  a  precompiled
+       form  instead  of  having to compile them every time the application is
+       run.  If you are not  using  any  private  character  tables  (see  the
+       pcre_maketables()  documentation),  this is relatively straightforward.
+       If you are using private tables, it is a little bit more complicated.
+
+       If you save compiled patterns to a file, you can copy them to a differ-
+       ent  host  and  run them there. This works even if the new host has the
+       opposite endianness to the one on which  the  patterns  were  compiled.
+       There  may  be a small performance penalty, but it should be insignifi-
+       cant. However, compiling regular expressions with one version  of  PCRE
+       for  use  with  a  different  version is not guaranteed to work and may
+       cause crashes.
+
+
+SAVING A COMPILED PATTERN
+
+       The value returned by pcre_compile() points to a single block of memory
+       that  holds  the compiled pattern and associated data. You can find the
+       length of this block in bytes by calling pcre_fullinfo() with an  argu-
+       ment  of  PCRE_INFO_SIZE. You can then save the data in any appropriate
+       manner. Here is sample code that compiles a pattern and writes it to  a
+       file. It assumes that fd is a FILE * that is open for binary output:
+
+         #include <stdio.h>
+         #include <pcre.h>
+
+         /* Compile a pattern and write its bytes to fd; 0 on success. */
+         int save_pattern(FILE *fd)
+         {
+           int erroroffset, rc;
+           size_t size;                /* PCRE_INFO_SIZE yields a size_t */
+           const char *error;
+           pcre *re;
+
+           re = pcre_compile("my pattern", 0, &error, &erroroffset, NULL);
+           if (re == NULL) return -1;  /* error, erroroffset describe it */
+           rc = pcre_fullinfo(re, NULL, PCRE_INFO_SIZE, &size);
+           if (rc == 0 && fwrite(re, 1, size, fd) != size) rc = -1;
+           pcre_free(re);
+           return (rc < 0)? -1 : 0;
+         }
+
+       In this example, the bytes  that  comprise  the  compiled  pattern  are
+       copied  exactly.  Note that this is binary data that may contain any of
+       the 256 possible byte  values.  On  systems  that  make  a  distinction
+       between binary and non-binary data, be sure that the file is opened for
+       binary output.
+
+       If you want to write more than one pattern to a file, you will have  to
+       devise  a  way of separating them. For binary data, preceding each pat-
+       tern with its length is probably  the  most  straightforward  approach.
+       Another  possibility is to write out the data in hexadecimal instead of
+       binary, one pattern to a line.
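+
+       As an illustration of the length-prefix approach, here is a minimal
+       sketch in plain C. The 4-byte big-endian length field is an assumption
+       made here so that a file can be read back on a host of either
+       endianness. The name write_blob() is a hypothetical helper. For a
+       compiled pattern, data would be the value returned by pcre_compile()
+       and size would be the PCRE_INFO_SIZE value.
```c
#include <stdio.h>

/* Hypothetical helper (not part of PCRE): write size bytes from data to
   fp, preceded by size as a 4-byte big-endian length. fp must be open
   for binary output. Returns 0 on success, -1 on error. */
int write_blob(FILE *fp, const void *data, size_t size)
{
  unsigned char hdr[4];

  if (size > 0xFFFFFFFFul) return -1;   /* length must fit in 4 bytes */
  hdr[0] = (unsigned char)(size >> 24);
  hdr[1] = (unsigned char)(size >> 16);
  hdr[2] = (unsigned char)(size >> 8);
  hdr[3] = (unsigned char)size;
  if (fwrite(hdr, 1, 4, fp) != 4) return -1;
  if (fwrite(data, 1, size, fp) != size) return -1;
  return 0;
}
```
+       To store several patterns, call write_blob() once per pattern on the
+       same file. A reader then takes four length bytes, followed by that many
+       bytes of pattern.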
+
+       Saving compiled patterns in a file is only one possible way of  storing
+       them  for later use. They could equally well be saved in a database, or
+       in the memory of some daemon process that passes them  via  sockets  to
+       the processes that want them.
+
+       If  the pattern has been studied, it is also possible to save the study
+       data in a similar way to the compiled  pattern  itself.  When  studying
+       generates  additional  information, pcre_study() returns a pointer to a
+       pcre_extra data block. Its format is defined in the section on matching
+       a  pattern in the pcreapi documentation. The study_data field points to
+       the binary study data,  and  this  is  what  you  must  save  (not  the
+       pcre_extra  block itself). The length of the study data can be obtained
+       by calling pcre_fullinfo() with  an  argument  of  PCRE_INFO_STUDYSIZE.
+       Remember  to check that pcre_study() did return a non-NULL value before
+       trying to save the study data.
+
+
+RE-USING A PRECOMPILED PATTERN
+
+       Re-using a precompiled pattern is straightforward. Having  reloaded  it
+       into   main   memory,   you   pass   its   pointer  to  pcre_exec()  or
+       pcre_dfa_exec() in the usual way. This  should  work  even  on  another
+       host,  and  even  if  that  host has the opposite endianness to the one
+       where the pattern was compiled.
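+
+       How the pattern is reloaded depends on how it was saved. Suppose each
+       pattern was written preceded by its length, which is one of the
+       approaches suggested above. Reloading might then look like this
+       minimal sketch in plain C. The 4-byte big-endian length field and the
+       name read_blob() are assumptions for illustration. The returned pointer
+       would be cast to pcre * before being passed to pcre_exec().
```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of PCRE): read one blob stored as a
   4-byte big-endian length followed by that many bytes. Returns a
   malloc()ed copy and sets *size, or returns NULL on error or end of
   file. malloc() memory is aligned for any object type, which matters
   if the data is then accessed as a structure. Release it with free(). */
void *read_blob(FILE *fp, size_t *size)
{
  unsigned char hdr[4];
  unsigned long n;
  void *data;

  if (fread(hdr, 1, 4, fp) != 4) return NULL;
  n = ((unsigned long)hdr[0] << 24) | ((unsigned long)hdr[1] << 16) |
      ((unsigned long)hdr[2] << 8) | (unsigned long)hdr[3];
  data = malloc(n > 0 ? n : 1);
  if (data == NULL) return NULL;
  if (fread(data, 1, n, fp) != n) {   /* truncated file */
    free(data);
    return NULL;
  }
  *size = n;
  return data;
}
```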
+
+       However, if you passed a pointer to custom character  tables  when  the
+       pattern  was  compiled  (the  tableptr argument of pcre_compile()), you
+       must now pass a similar  pointer  to  pcre_exec()  or  pcre_dfa_exec(),
+       because  the  value  saved  with the compiled pattern will obviously be
+       nonsense. A field in a pcre_extra() block is used to pass this data, as
+       described  in the section on matching a pattern in the pcreapi documen-
+       tation.
+
+       If you did not provide custom character tables  when  the  pattern  was
+       compiled,  the  pointer  in  the compiled pattern is NULL, which causes
+       pcre_exec() to use PCRE's internal tables. Thus, you  do  not  need  to
+       take any special action at run time in this case.
+
+       If  you  saved study data with the compiled pattern, you need to create
+       your own pcre_extra data block and set the study_data field to point to
+       the  reloaded  study  data. You must also set the PCRE_EXTRA_STUDY_DATA
+       bit in the flags field to indicate that study  data  is  present.  Then
+       pass  the  pcre_extra  block  to  pcre_exec() or pcre_dfa_exec() in the
+       usual way.
+
+
+COMPATIBILITY WITH DIFFERENT PCRE RELEASES
+
+       In general, it is safest to  recompile  all  saved  patterns  when  you
+       update  to  a new PCRE release, though not all updates actually require
+       this. Recompiling is definitely needed for release 7.2.
+
+
+AUTHOR
+
+       Philip Hazel
+       University Computing Service
+       Cambridge CB2 3QH, England.
+
+
+REVISION
+
+       Last updated: 13 June 2007
+       Copyright (c) 1997-2007 University of Cambridge.
+------------------------------------------------------------------------------
+
+
+PCREPERFORM(3)                                                  PCREPERFORM(3)
+
+
+NAME
+       PCRE - Perl-compatible regular expressions
+
+
+PCRE PERFORMANCE
+
+       Two  aspects  of performance are discussed below: memory usage and pro-
+       cessing time. The way you express your pattern as a regular  expression
+       can affect both of them.
+
+
+MEMORY USAGE
+
+       Patterns are compiled by PCRE into a reasonably efficient byte code, so
+       that most simple patterns do not use much memory. However, there is one
+       case where memory usage can be unexpectedly large. When a parenthesized
+       subpattern has a quantifier with a minimum greater than 1 and/or a lim-
+       ited  maximum,  the  whole subpattern is repeated in the compiled code.
+       For example, the pattern
+
+         (abc|def){2,4}
+
+       is compiled as if it were
+
+         (abc|def)(abc|def)((abc|def)(abc|def)?)?
+
+       (Technical aside: It is done this way so that backtrack  points  within
+       each of the repetitions can be independently maintained.)
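+
+       The textual form of this expansion can be generated mechanically. This
+       plain-C sketch writes min copies of a parenthesized group, followed by
+       nested optional copies up to max. The name expand_repeat() is
+       hypothetical. The sketch models only the equivalent pattern shown
+       above; it is not how PCRE produces its byte code.
```c
#include <stdlib.h>
#include <string.h>

/* Append the optional part for k extra copies of group g at out and
   return the new end: k == 1 gives "g?", k > 1 gives "(g" part(k-1) ")?". */
static char *opt_tail(char *out, const char *g, size_t glen, int k)
{
  if (k <= 0) return out;
  if (k == 1) {
    memcpy(out, g, glen);
    out += glen;
    *out++ = '?';
    return out;
  }
  *out++ = '(';
  memcpy(out, g, glen);
  out += glen;
  out = opt_tail(out, g, glen, k - 1);
  *out++ = ')';
  *out++ = '?';
  return out;
}

/* Hypothetical helper (not part of PCRE): return a malloc()ed pattern
   equivalent to g{min,max}, where g is a parenthesized group. Returns
   NULL if min < 0, max < min, or memory is short. */
char *expand_repeat(const char *g, int min, int max)
{
  size_t glen, need;
  int extra, i;
  char *buf, *out;

  if (min < 0 || max < min) return NULL;
  glen = strlen(g);
  extra = max - min;
  /* max copies of g, one '?' on the innermost copy, "()?" for each
     outer nesting level, and the terminating NUL */
  need = (size_t)max * glen + (extra > 0 ? 1 : 0) +
         (extra > 1 ? (size_t)(extra - 1) * 3 : 0) + 1;
  buf = malloc(need);
  if (buf == NULL) return NULL;
  out = buf;
  for (i = 0; i < min; i++) {
    memcpy(out, g, glen);
    out += glen;
  }
  out = opt_tail(out, g, glen, extra);
  *out = '\0';
  return buf;
}
```
+       expand_repeat("(abc|def)", 2, 4) returns the string shown above. Each
+       increase in max adds another full copy of the group, so the size of
+       the expanded pattern grows in step with the repeat bounds.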
+
+       For  regular expressions whose quantifiers use only small numbers, this
+       is not usually a problem. However, if the numbers are large,  and  par-
+       ticularly  if  such repetitions are nested, the memory usage can become
+       an embarrassment. For example, the very simple pattern
+
+         ((ab){1,1000}c){1,3}
+
+       uses 51K bytes when compiled. When PCRE is compiled  with  its  default
+       internal  pointer  size of two bytes, the size limit on a compiled pat-
+       tern is 64K, and this is reached with the above pattern  if  the  outer
+       repetition is increased from 3 to 4. PCRE can be compiled to use larger
+       internal pointers and thus handle larger compiled patterns, but  it  is
+       better to try to rewrite your pattern to use less memory if you can.
+
+       One  way  of reducing the memory usage for such patterns is to make use
+       of PCRE's "subroutine" facility. Re-writing the above pattern as
+
+         ((ab)(?2){0,999}c)(?1){0,2}
+
+       reduces the memory requirements to 18K, and indeed it remains under 20K
+       even  with the outer repetition increased to 100. However, this pattern
+       is not exactly equivalent, because the "subroutine" calls  are  treated
+       as  atomic groups into which there can be no backtracking if there is a
+       subsequent matching failure. Therefore, PCRE cannot  do  this  kind  of
+       rewriting  automatically.   Furthermore,  there is a noticeable loss of
+       speed when executing the modified pattern. Nevertheless, if the  atomic
+       grouping  is  not  a  problem and the loss of speed is acceptable, this
+       kind of rewriting will allow you to process patterns that  PCRE  cannot
+       otherwise handle.
+
+
+PROCESSING TIME
+
+       Certain  items  in regular expression patterns are processed more effi-
+       ciently than others. It is more efficient to use a character class like
+       [aeiou]   than   a   set   of  single-character  alternatives  such  as
+       (a|e|i|o|u). In general, the simplest construction  that  provides  the
+       required behaviour is usually the most efficient. Jeffrey Friedl's book
+       Mastering Regular Expressions contains a lot of useful general
+       discussion about optimizing regular expressions for efficient
+       performance. This document contains a few observations about PCRE.
+
+       Using Unicode character properties (the \p,  \P,  and  \X  escapes)  is
+       slow,  because PCRE has to scan a structure that contains data for over
+       fifteen thousand characters whenever it needs a  character's  property.
+       If  you  can  find  an  alternative pattern that does not use character
+       properties, it will probably be faster.
+
+       When a pattern begins with .* not in  parentheses,  or  in  parentheses
+       that are not the subject of a backreference, and the PCRE_DOTALL option
+       is set, the pattern is implicitly anchored by PCRE, since it can  match
+       only  at  the start of a subject string. However, if PCRE_DOTALL is not
+       set, PCRE cannot make this optimization, because  the  .  metacharacter
+       does  not then match a newline, and if the subject string contains new-
+       lines, the pattern may match from the character  immediately  following
+       one of them instead of from the very start. For example, the pattern
+
+         .*second
+
+       matches  the subject "first\nand second" (where \n stands for a newline
+       character), with the match starting at the seventh character. In  order
+       to do this, PCRE has to retry the match starting after every newline in
+       the subject.
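+
+       The choice of starting points can be sketched in plain C. The sketch
+       models only where the attempts begin when PCRE_DOTALL is not set, not
+       the matching itself. The name restart_points() is hypothetical. For
+       the subject above it reports offsets 0 and 6, and offset 6 is the
+       seventh character.
```c
#include <stddef.h>

/* Hypothetical helper (not part of PCRE): store in starts[] (room for
   max entries) the offsets at which a pattern beginning with .* must be
   tried when . does not match newline: offset 0 and each offset just
   after a '\n'. Returns the total number of such offsets, which may
   exceed max; offsets beyond max are counted but not stored. */
size_t restart_points(const char *s, size_t len, size_t *starts, size_t max)
{
  size_t n = 1, i;

  if (max > 0) starts[0] = 0;
  for (i = 0; i < len; i++) {
    if (s[i] == '\n') {
      if (n < max) starts[n] = i + 1;
      n++;
    }
  }
  return n;
}
```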
+
+       If you are using such a pattern with subject strings that do  not  con-
+       tain newlines, the best performance is obtained by setting PCRE_DOTALL,
+       or starting the pattern with ^.* or ^.*? to indicate  explicit  anchor-
+       ing.  That saves PCRE from having to scan along the subject looking for
+       a newline to restart at.
+
+       Beware of patterns that contain nested indefinite  repeats.  These  can
+       take  a  long time to run when applied to a string that does not match.
+       Consider the pattern fragment
+
+         ^(a+)*
+
+       This can match "aaaa" in 16 different ways, and this  number  increases
+       very  rapidly  as the string gets longer. (The * repeat can match 0, 1,
+       2, 3, or 4 times, and for each of those cases other than 0 or 4, the  +
+       repeats  can  match  different numbers of times.) When the remainder of
+       the pattern is such that the entire match is going to fail, PCRE has in
+       principle  to  try  every  possible  variation,  and  this  can take an
+       extremely long time, even for relatively short strings.
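+
+       The figure of 16 counts every way of matching some prefix of "aaaa",
+       including the empty one. A short calculation confirms it. In this
+       plain-C sketch, ways[k] is the number of ways (a+)* can match exactly
+       k characters. It is 1 for k = 0, and otherwise the sum of ways[k - j]
+       for j = 1 to k, where j is the length taken by the last a+ iteration.
+       The name count_ways() is hypothetical.
```c
/* Hypothetical helper (not part of PCRE): number of distinct ways
   ^(a+)* can match some prefix of a string of n "a" characters, i.e.
   the sum of ways[k] for k = 0..n. Valid for 0 <= n <= 63. */
unsigned long long count_ways(int n)
{
  unsigned long long ways[64], total = 0;
  int k, j;

  if (n < 0 || n > 63) return 0;
  for (k = 0; k <= n; k++) {
    ways[k] = (k == 0) ? 1 : 0;   /* zero iterations match the empty string */
    for (j = 1; j <= k; j++)
      ways[k] += ways[k - j];     /* last a+ iteration consumed j chars */
    total += ways[k];
  }
  return total;
}
```
+       count_ways(4) is 16. In general the total is 2 to the power n, so every
+       extra character doubles the number of possibilities.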
+
+       An optimization catches some of the simpler cases such as
+
+         (a+)*b
+
+       where a literal character follows. Before  embarking  on  the  standard
+       matching  procedure,  PCRE checks that there is a "b" later in the sub-
+       ject string, and if there is not, it fails the match immediately.  How-
+       ever,  when  there  is no following literal this optimization cannot be
+       used. You can see the difference by comparing the behaviour of
+
+         (a+)*\d
+
+       with the pattern above. The pattern above, (a+)*b, fails almost
+       instantly when applied to a whole line of "a" characters, whereas
+       (a+)*\d takes an appreciable time with strings longer than about 20
+       characters.
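+
+       The effect of the required-character check can be seen with a toy
+       backtracking matcher. This plain-C sketch models only the anchored
+       pattern ^(a+)*b; PCRE would also retry an unanchored pattern at later
+       starting points. The sketch tries the longest a+ first, as a greedy
+       matcher does, and counts recursive calls in a global counter. The
+       names match_aplus_b(), star() and steps are hypothetical, and PCRE's
+       own matcher differs in detail.
```c
#include <string.h>

static unsigned long steps;   /* calls to star(), for comparison only */

/* Toy model (not PCRE code): try to match (a+)*b starting at s[pos]. */
static int star(const char *s, size_t pos)
{
  size_t run = 0, j;

  steps++;
  while (s[pos + run] == 'a') run++;
  for (j = run; j >= 1; j--)          /* one more a+ iteration, longest first */
    if (star(s, pos + j)) return 1;
  return s[pos] == 'b';               /* no more iterations: need the b */
}

/* Hypothetical helper: match ^(a+)*b against s. With precheck set, fail
   at once if the subject contains no 'b', as the optimization does. */
int match_aplus_b(const char *s, int precheck)
{
  steps = 0;
  if (precheck && strchr(s, 'b') == NULL) return 0;
  return star(s, 0);
}
```
+       On a string of 20 "a" characters, the check returns at once with steps
+       at 0. Without the check, steps reaches 1048576 (2 to the power 20). A
+       pattern ending in \d has no literal character to look for, so only the
+       slow path is available.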
+
+       In many cases, the solution to this kind of performance issue is to use
+       an atomic group or a possessive quantifier.
+
+
+AUTHOR
+
+       Philip Hazel
+       University Computing Service
+       Cambridge CB2 3QH, England.
+
+
+REVISION
+
+       Last updated: 06 March 2007
+       Copyright (c) 1997-2007 University of Cambridge.
+------------------------------------------------------------------------------
+
+
+PCREPOSIX(3)                                                      PCREPOSIX(3)
+
+
+NAME
+       PCRE - Perl-compatible regular expressions.
+
+
+SYNOPSIS OF POSIX API
+
+       #include <pcreposix.h>
+
+       int regcomp(regex_t *preg, const char *pattern,
+            int cflags);
+
+       int regexec(regex_t *preg, const char *string,
+            size_t nmatch, regmatch_t pmatch[], int eflags);
+
+       size_t regerror(int errcode, const regex_t *preg,
+            char *errbuf, size_t errbuf_size);
+
+       void regfree(regex_t *preg);
+
+
+DESCRIPTION
+
+       This  set  of  functions provides a POSIX-style API to the PCRE regular
+       expression package. See the pcreapi documentation for a description  of
+       PCRE's native API, which contains much additional functionality.
+
+       The functions described here are just wrapper functions that ultimately
+       call  the  PCRE  native  API.  Their  prototypes  are  defined  in  the
+       pcreposix.h  header  file,  and  on  Unix systems the library itself is
+       called pcreposix.a, so it can be accessed by adding -lpcreposix to the
+       command  for  linking  an application that uses them. Because the POSIX
+       functions call the native ones, it is also necessary to add -lpcre.
+
+       I have implemented only those POSIX option bits that can be  reasonably
+       mapped  to PCRE native options. In addition, the option REG_EXTENDED is
+       defined with the value zero. This has no  effect,  but  since  programs
+       that  are  written  to  the POSIX interface often use it, this makes it
+       easier to slot in PCRE as a replacement library.  Other  POSIX  options
+       are not even defined.
+
+       There  are also some other options that are not defined by POSIX. These
+       have been added at the request of users who want to make use of certain
+       PCRE-specific features via the POSIX calling interface.
+
+       When  PCRE  is  called  via these functions, it is only the API that is
+       POSIX-like in style. The syntax and semantics of  the  regular  expres-
+       sions  themselves  are  still  those of Perl, subject to the setting of
+       various PCRE options, as described below. "POSIX-like in  style"  means
+       that  the  API  approximates  to  the POSIX definition; it is not fully
+       POSIX-compatible, and in multi-byte encoding  domains  it  is  probably
+       even less compatible.
+
+       The  header for these functions is supplied as pcreposix.h to avoid any
+       potential clash with other POSIX  libraries.  It  can,  of  course,  be
+       renamed or aliased as regex.h, which is the "correct" name. It provides
+       two structure types, regex_t for  compiled  internal  forms,  and  reg-
+       match_t  for  returning  captured substrings. It also defines some con-
+       stants whose names start  with  "REG_";  these  are  used  for  setting
+       options and identifying error codes.
+
+
+COMPILING A PATTERN
+
+       The  function regcomp() is called to compile a pattern into an internal
+       form. The pattern is a C string terminated by a  binary  zero,  and  is
+       passed  in  the  argument  pattern. The preg argument is a pointer to a
+       regex_t structure that is used as a base for storing information  about
+       the compiled regular expression.
+
+       The argument cflags is either zero, or contains one or more of the bits
+       defined by the following macros:
+
+         REG_DOTALL
+
+       The PCRE_DOTALL option is set when the regular expression is passed for
+       compilation to the native function. Note that REG_DOTALL is not part of
+       the POSIX standard.
+
+         REG_ICASE
+
+       The PCRE_CASELESS option is set when the regular expression  is  passed
+       for compilation to the native function.
+
+         REG_NEWLINE
+
+       The  PCRE_MULTILINE option is set when the regular expression is passed
+       for compilation to the native function. Note that this does  not  mimic
+       the  defined  POSIX  behaviour  for REG_NEWLINE (see the following sec-
+       tion).
+
+         REG_NOSUB
+
+       The PCRE_NO_AUTO_CAPTURE option is set when the regular  expression  is
+       passed for compilation to the native function. In addition, when a pat-
+       tern that is compiled with this flag is passed to regexec() for  match-
+       ing,  the  nmatch  and  pmatch  arguments  are ignored, and no captured
+       strings are returned.
+
+         REG_UNGREEDY
+
+       The PCRE_UNGREEDY option is set when the regular expression  is  passed
+       for  compilation  to the native function. Note that REG_UNGREEDY is not
+       part of the POSIX standard.
+
+         REG_UTF8
+
+       The PCRE_UTF8 option is set when the regular expression is  passed  for
+       compilation  to the native function. This causes the pattern itself and
+       all data strings used for matching it to be treated as  UTF-8  strings.
+       Note that REG_UTF8 is not part of the POSIX standard.
+
+       In  the  absence  of  these  flags, no options are passed to the native
+       function. This means the regex is compiled with PCRE default
+       semantics.  In particular, the way it handles newline characters in the
+       subject string is the Perl way, not the POSIX way.  Note  that  setting
+       PCRE_MULTILINE  has only some of the effects specified for REG_NEWLINE.
+       It does not affect the way newlines are matched by . (they are not)  or
+       by a negative class such as [^a] (they are).
+
+       The  yield of regcomp() is zero on success, and non-zero otherwise. The
+       preg structure is filled in on success, and one member of the structure
+       is  public: re_nsub contains the number of capturing subpatterns in the
+       regular expression. Various error codes are defined in the header file.
+
+       NOTE: If the yield of regcomp() is non-zero, you must  not  attempt  to
+       use the contents of the preg structure. If, for example, you pass it to
+       regexec(), the result is undefined and your program is likely to crash.
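+
+       A minimal sketch of the compile-and-check sequence follows. It is
+       written against the system <regex.h> so that it builds without PCRE.
+       Switching to PCRE is assumed to need only #include <pcreposix.h> and
+       linking with -lpcreposix -lpcre, because that header declares the same
+       functions. REG_EXTENDED is passed because system libraries need it for
+       this syntax; PCRE defines it as zero. The name compile_or_report() is
+       hypothetical.
```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper (not part of PCRE): compile pattern into *preg.
   On failure, print a message to stderr and return regcomp()'s non-zero
   code; *preg must then not be used for matching or passed to regfree(). */
int compile_or_report(regex_t *preg, const char *pattern)
{
  char msg[256];
  int rc = regcomp(preg, pattern, REG_EXTENDED);

  if (rc != 0) {
    /* Passing the preg from the failed call to regerror() is permitted. */
    regerror(rc, preg, msg, sizeof msg);
    fprintf(stderr, "regcomp(\"%s\") failed: %s\n", pattern, msg);
  }
  return rc;
}
```
+       Compiling "([a-z]+):([0-9]+)" succeeds with re_nsub equal to 2, and the
+       structure is later released with regfree(). Compiling the unbalanced
+       pattern "([a-z]+" fails, and in that case the structure is neither used
+       nor freed.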
+
+
+MATCHING NEWLINE CHARACTERS
+
+       This area is not simple, because POSIX and Perl take different views of
+       things.   It  is  not possible to get PCRE to obey POSIX semantics, but
+       then PCRE was never intended to be a POSIX engine. The following  table
+       lists  the  different  possibilities for matching newline characters in
+       PCRE:
+
+                                 Default   Change with
+
+         . matches newline          no     PCRE_DOTALL
+         newline matches [^a]       yes    not changeable
+         $ matches \n at end        yes    PCRE_DOLLAR_ENDONLY
+         $ matches \n in middle     no     PCRE_MULTILINE
+         ^ matches \n in middle     no     PCRE_MULTILINE
+
+       This is the equivalent table for POSIX:
+
+                                 Default   Change with
+
+         . matches newline          yes    REG_NEWLINE
+         newline matches [^a]       yes    REG_NEWLINE
+         $ matches \n at end        no     REG_NEWLINE
+         $ matches \n in middle     no     REG_NEWLINE
+         ^ matches \n in middle     no     REG_NEWLINE
+
+       PCRE's behaviour is the same as Perl's, except that there is no equiva-
+       lent  for  PCRE_DOLLAR_ENDONLY in Perl. In both PCRE and Perl, there is
+       no way to stop newline from matching [^a].
+
+       The  default  POSIX  newline  handling  can  be  obtained  by   setting
+       PCRE_DOTALL  and  PCRE_DOLLAR_ENDONLY, but there is no way to make PCRE
+       behave exactly as for the REG_NEWLINE action.
+
+
+MATCHING A PATTERN
+
+       The function regexec() is called  to  match  a  compiled  pattern  preg
+       against  a  given string, which is by default terminated by a zero byte
+       (but see REG_STARTEND below), subject to the options in  eflags.  These
+       can be:
+
+         REG_NOTBOL
+
+       The PCRE_NOTBOL option is set when calling the underlying PCRE matching
+       function.
+
+         REG_NOTEMPTY
+
+       The PCRE_NOTEMPTY option is set when calling the underlying PCRE match-
+       ing function. Note that REG_NOTEMPTY is not part of the POSIX standard.
+       However, setting this option can give more POSIX-like behaviour in some
+       situations.
+
+         REG_NOTEOL
+
+       The PCRE_NOTEOL option is set when calling the underlying PCRE matching
+       function.
+
+         REG_STARTEND
+
+       The string is considered to start at string +  pmatch[0].rm_so  and  to
+       have  a terminating NUL located at string + pmatch[0].rm_eo (there need
+       not actually be a NUL at that location), regardless  of  the  value  of
+       nmatch.  This  is a BSD extension, compatible with but not specified by
+       IEEE Standard 1003.2 (POSIX.2), and should  be  used  with  caution  in
+       software intended to be portable to other systems. Note that a non-zero
+       rm_so does not imply REG_NOTBOL; REG_STARTEND affects only the location
+       of the string, not how it is matched.
+
+       If  the pattern was compiled with the REG_NOSUB flag, no data about any
+       matched strings  is  returned.  The  nmatch  and  pmatch  arguments  of
+       regexec() are ignored.
+
+       If the value of nmatch is zero, or if the value of pmatch is NULL, no data
+       about any matched strings is returned.
+
+       Otherwise, the portion of the string that was matched, and also any
+       captured substrings, are returned via the pmatch argument, which points to
+       an array of nmatch structures of type regmatch_t, containing  the  mem-
+       bers  rm_so  and rm_eo. These contain the offset to the first character
+       of each substring and the offset to the first character after  the  end
+       of  each substring, respectively. The 0th element of the vector relates
+       to the entire portion of string that was matched;  subsequent  elements
+       relate  to  the capturing subpatterns of the regular expression. Unused
+       entries in the array have both structure members set to -1.
+
+       A successful match yields  a  zero  return;  various  error  codes  are
+       defined  in  the  header  file,  of which REG_NOMATCH is the "expected"
+       failure code.
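+
+       This sketch extracts one captured substring using the rm_so and rm_eo
+       offsets. It is written against the system <regex.h> so that it builds
+       without PCRE; pcreposix.h is assumed to declare the same regexec() and
+       regmatch_t. The name capture() and its limit of ten pmatch entries are
+       hypothetical choices made for illustration.
```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper (not part of PCRE): match subject against re and
   copy capture group n (0 = whole match) into out, truncated to
   outsize - 1 bytes. Returns 0 on success, the regexec() code (such as
   REG_NOMATCH) on failure, or -1 if the group is out of range or did
   not take part in the match. */
int capture(const regex_t *re, const char *subject, size_t n,
            char *out, size_t outsize)
{
  regmatch_t pm[10];                     /* arbitrary limit for this sketch */
  size_t len;
  int rc;

  if (n >= 10 || n > re->re_nsub || outsize == 0) return -1;
  rc = regexec(re, subject, 10, pm, 0);
  if (rc != 0) return rc;
  if (pm[n].rm_so == -1) return -1;      /* unused entry: both are -1 */
  len = (size_t)(pm[n].rm_eo - pm[n].rm_so);
  if (len >= outsize) len = outsize - 1;
  memcpy(out, subject + pm[n].rm_so, len);
  out[len] = '\0';
  return 0;
}
```
+       With "([a-z]+):([0-9]+)" compiled using REG_EXTENDED, group 2 of
+       "ruby:1234" is "1234". A subject containing no colon gives REG_NOMATCH.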
+
+
+ERROR MESSAGES
+
+       The regerror() function maps a non-zero errcode from either regcomp()
+       or  regexec()  to  a  printable message. If preg is not NULL, the error
+       should have arisen from the use of that structure. A message terminated
+       by  a  binary  zero  is  placed  in  errbuf. The length of the message,
+       including the zero, is limited to errbuf_size. The yield of  the  func-
+       tion is the size of buffer needed to hold the whole message.
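+
+       Because the yield is the size needed, a caller can size the buffer
+       exactly with two calls. This sketch assumes, as POSIX specifies, that
+       a zero errbuf_size makes regerror() report only the size without
+       writing to errbuf. It uses the system <regex.h>, and error_message()
+       is a hypothetical name.
```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper (not part of PCRE): return a malloc()ed message
   for errcode, which came from regcomp() or regexec() on preg, or NULL
   if memory is short. Release it with free(). */
char *error_message(int errcode, const regex_t *preg)
{
  size_t need = regerror(errcode, preg, NULL, 0);  /* includes the NUL */
  char *msg = malloc(need);

  if (msg != NULL)
    regerror(errcode, preg, msg, need);
  return msg;
}
```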
+
+
+MEMORY USAGE
+
+       Compiling  a regular expression causes memory to be allocated and asso-
+       ciated with the preg structure. The function regfree() frees  all  such
+       memory,  after  which  preg may no longer be used as a compiled expres-
+       sion.
+
+
+AUTHOR
+
+       Philip Hazel
+       University Computing Service
+       Cambridge CB2 3QH, England.
+
+
+REVISION
+
+       Last updated: 02 September 2009
+       Copyright (c) 1997-2009 University of Cambridge.
+------------------------------------------------------------------------------
+
+
+PCRECPP(3)                                                          PCRECPP(3)
+
+
+NAME
+       PCRE - Perl-compatible regular expressions.
+
+
+SYNOPSIS OF C++ WRAPPER
+
+       #include <pcrecpp.h>
+
+
+DESCRIPTION
+
+       The  C++  wrapper  for PCRE was provided by Google Inc. Some additional
+       functionality was added by Giuseppe Maxia. This brief man page was con-
+       structed  from  the  notes  in the pcrecpp.h file, which should be con-
+       sulted for further details.
+
+
+MATCHING INTERFACE
+
+       The "FullMatch" operation checks that supplied text matches a  supplied
+       pattern  exactly.  If pointer arguments are supplied, it copies matched
+       sub-strings that match sub-patterns into them.
+
+         Example: successful match
+            pcrecpp::RE re("h.*o");
+            re.FullMatch("hello");
+
+         Example: unsuccessful match (requires full match):
+            pcrecpp::RE re("e");
+            !re.FullMatch("hello");
+
+         Example: creating a temporary RE object:
+            pcrecpp::RE("h.*o").FullMatch("hello");
+
+       You can pass in a "const char*" or a "string" for "text". The  examples
+       below  tend to use a const char*. You can, as in the different examples
+       above, store the RE object explicitly in a variable or use a  temporary
+       RE  object.  The  examples below use one mode or the other arbitrarily.
+       Either could correctly be used for any of these examples.
+
+       You must supply extra pointer arguments to extract matched subpieces.
+
+         Example: extracts "ruby" into "s" and 1234 into "i"
+            int i;
+            string s;
+            pcrecpp::RE re("(\\w+):(\\d+)");
+            re.FullMatch("ruby:1234", &s, &i);
+
+         Example: does not try to extract any extra sub-patterns
+            re.FullMatch("ruby:1234", &s);
+
+         Example: does not try to extract into NULL
+            re.FullMatch("ruby:1234", NULL, &i);
+
+         Example: integer overflow causes failure
+            !re.FullMatch("ruby:1234567891234", NULL, &i);
+
+         Example: fails because there aren't enough sub-patterns:
+            !pcrecpp::RE("\\w+:\\d+").FullMatch("ruby:1234", &s);
+
+         Example: fails because string cannot be stored in integer
+            !pcrecpp::RE("(.*)").FullMatch("ruby", &i);
+
+       The provided pointer arguments can be pointers to  any  scalar  numeric
+       type, or one of:
+
+          string        (matched piece is copied to string)
+          StringPiece   (StringPiece is mutated to point to matched piece)
+          T             (where "bool T::ParseFrom(const char*, int)" exists)
+          NULL          (the corresponding matched sub-pattern is not copied)
+
+       The  function returns true iff all of the following conditions are sat-
+       isfied:
+
+         a. "text" matches "pattern" exactly;
+
+         b. The number of matched sub-patterns is >= number of supplied
+            pointers;
+
+         c. The "i"th argument has a suitable type for holding the
+            string captured as the "i"th sub-pattern. If you pass in
+            void * NULL for the "i"th argument, or a non-void * NULL
+            of the correct type, or pass fewer arguments than the
+            number of sub-patterns, the "i"th captured sub-pattern is
+            ignored.
+
+       CAVEAT: An optional sub-pattern that does  not  exist  in  the  matched
+       string  is  assigned  the  empty  string. Therefore, the following will
+       return false (because the empty string is not a valid number):
+
+          int number;
+          pcrecpp::RE("[a-z]+(\\d+)?").FullMatch("abc", &number);
+
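The argument-conversion rules and the CAVEAT above can be mimicked outside pcrecpp. The sketch below is only an illustration. It uses std::regex (ECMAScript syntax) in place of PCRE, and the helpers parse_int, full_match_word_int and optional_digits_match are hypothetical names. An empty capture and an out-of-range number both make the call fail, as the text describes.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <regex>
#include <string>

// Hypothetical converter: the whole captured text must be a decimal number
// that fits in an int. An empty capture is rejected, as the CAVEAT says.
bool parse_int(const std::string& piece, int* out) {
  if (piece.empty()) return false;
  errno = 0;
  char* end = nullptr;
  long v = std::strtol(piece.c_str(), &end, 10);
  if (*end != '\0') return false;  // not entirely digits
  if (errno == ERANGE || v < INT_MIN || v > INT_MAX) return false;  // overflow
  *out = static_cast<int>(v);
  return true;
}

// Like re.FullMatch(text, &s, &i) with "(\w+):(\d+)". The pattern must cover
// the whole text. In this sketch nothing is stored unless both captures convert.
bool full_match_word_int(const std::string& text, std::string* word, int* num) {
  static const std::regex re(R"((\w+):(\d+))");
  std::smatch m;
  if (!std::regex_match(text, m, re)) return false;
  int value = 0;
  if (!parse_int(m[2].str(), &value)) return false;
  *word = m[1].str();
  *num = value;
  return true;
}

// The CAVEAT case: when the optional group does not take part in the match,
// its captured text is empty, so the int conversion fails.
bool optional_digits_match(const std::string& text, int* number) {
  static const std::regex re(R"([a-z]+(\d+)?)");
  std::smatch m;
  if (!std::regex_match(text, m, re)) return false;
  return parse_int(m[1].str(), number);
}
```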
+       The matching interface supports at most 16 arguments per call.  If  you
+       need    more,    consider    using    the    more   general   interface
+       pcrecpp::RE::DoMatch. See pcrecpp.h for the signature for DoMatch.
+
+       NOTE: Do not pass no_arg as a placeholder for a missing argument. It is
+       used internally to mark the end of the list of optional arguments, and
+       using it as a placeholder can lead to segfaults.
+
+
+QUOTING METACHARACTERS
+
+       You can use the "QuoteMeta" operation to insert backslashes before  all
+       potentially  meaningful  characters  in  a string. The returned string,
+       used as a regular expression, will exactly match the original string.
+
+         Example:
+            string quoted = RE::QuoteMeta(unquoted);
+
+       Note that it's legal to escape a character even if it  has  no  special
+       meaning  in  a  regular expression -- so this function does that. (This
+       also makes it identical to the perl function  of  the  same  name;  see
+       "perldoc    -f    quotemeta".)    For   example,   "1.5-2.0?"   becomes
+       "1\.5\-2\.0\?".
+
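The QuoteMeta rule can be sketched as a small function. This is a hypothetical re-implementation, not pcrecpp's source. It escapes every byte other than ASCII letters, digits and underscore. Two details are assumptions: bytes with the high bit set are copied unchanged so that UTF-8 sequences stay intact, and a zero byte is written as \x00 because the C API takes patterns as zero-terminated strings.

```cpp
#include <string>

// Hypothetical QuoteMeta-style escaper (see assumptions in the lead-in).
std::string quote_meta(const std::string& unquoted) {
  std::string out;
  out.reserve(unquoted.size() * 2);
  for (unsigned char c : unquoted) {
    bool word_char = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                     (c >= '0' && c <= '9') || c == '_';
    if (word_char || c >= 0x80) {
      out += static_cast<char>(c);  // safe as-is (or part of a UTF-8 sequence)
    } else if (c == '\0') {
      out += "\\x00";               // cannot appear raw in a C-string pattern
    } else {
      out += '\\';                  // escaping is legal even if not needed
      out += static_cast<char>(c);
    }
  }
  return out;
}
```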
+
+PARTIAL MATCHES
+
+       You can use the "PartialMatch" operation when you want the  pattern  to
+       match any substring of the text.
+
+         Example: simple search for a string:
+            pcrecpp::RE("ell").PartialMatch("hello");
+
+         Example: find first number in a string:
+            int number;
+            pcrecpp::RE re("(\\d+)");
+            re.PartialMatch("x*100 + 20", &number);
+            assert(number == 100);
+
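In standard C++ terms, FullMatch corresponds roughly to std::regex_match and PartialMatch to std::regex_search. The following sketch uses those standard functions as stand-ins, not pcrecpp, to reproduce the examples in this section and in the matching-interface section. The helper names are hypothetical.

```cpp
#include <regex>
#include <string>

// FullMatch-like: succeeds only if the pattern covers the entire text.
bool full_match(const std::string& pattern, const std::string& text) {
  return std::regex_match(text, std::regex(pattern));
}

// PartialMatch-like: succeeds if the pattern matches any substring.
bool partial_match(const std::string& pattern, const std::string& text) {
  return std::regex_search(text, std::regex(pattern));
}

// Mirrors the "find first number" example: group 1 of the leftmost match.
// std::stoi throws on overflow; a real wrapper would report failure instead.
bool first_number(const std::string& text, int* number) {
  static const std::regex re(R"((\d+))");
  std::smatch m;
  if (!std::regex_search(text, m, re)) return false;
  *number = std::stoi(m[1].str());
  return true;
}
```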
+
+UTF-8 AND THE MATCHING INTERFACE
+
+       By  default,  pattern  and text are plain text, one byte per character.
+       The UTF8 flag, passed to  the  constructor,  causes  both  pattern  and
+       string to be treated as UTF-8 text, still a byte stream but potentially
+       multiple bytes per character. In practice, the text is likelier  to  be
+       UTF-8  than  the pattern, but the match returned may depend on the UTF8
+       flag, so always use it when matching UTF-8 text. For example, "."  will
+       match one byte normally, but with UTF8 set it may match up to four bytes
+       (one complete multi-byte character).
+
+         Example:
+            pcrecpp::RE_Options options;
+            options.set_utf8();
+            pcrecpp::RE re(utf8_pattern, options);
+            re.FullMatch(utf8_string);
+
+         Example: using the convenience function UTF8():
+            pcrecpp::RE re(utf8_pattern, pcrecpp::UTF8());
+            re.FullMatch(utf8_string);
+
+       NOTE: The UTF8 flag is ignored if PCRE was not configured with the
+             --enable-utf8 flag.
+
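The difference between bytes and characters comes from UTF-8's variable-length encoding: the lead byte of a character says how many bytes the character occupies. The hypothetical helpers below compute that length, which is how far "." advances in UTF8 mode. They assume well-formed input and do not validate continuation bytes.

```cpp
#include <cstddef>
#include <string>

// Bytes occupied by the UTF-8 character whose lead byte is s[i].
std::size_t utf8_char_len(const std::string& s, std::size_t i) {
  unsigned char lead = static_cast<unsigned char>(s[i]);
  if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
  if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
  if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
  if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
  return 1;                             // stray continuation or invalid byte
}

// Counts characters rather than bytes by hopping from lead byte to lead byte.
std::size_t utf8_char_count(const std::string& s) {
  std::size_t n = 0;
  for (std::size_t i = 0; i < s.size(); i += utf8_char_len(s, i)) ++n;
  return n;
}
```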
+
+PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE
+
+       PCRE defines some modifiers to  change  the  behavior  of  the  regular
+       expression   engine.  The  C++  wrapper  defines  an  auxiliary  class,
+       RE_Options, as a vehicle to pass such modifiers to  a  RE  class.  Cur-
+       rently, the following modifiers are supported:
+
+          modifier              description               Perl corresponding
+
+          PCRE_CASELESS         case insensitive match      /i
+          PCRE_MULTILINE        multiple lines match        /m
+          PCRE_DOTALL           dot matches newlines        /s
+          PCRE_DOLLAR_ENDONLY   $ matches only at end       N/A
+          PCRE_EXTRA            strict escape parsing       N/A
+          PCRE_EXTENDED         ignore whitespace           /x
+          PCRE_UTF8             handles UTF8 chars          built-in
+          PCRE_UNGREEDY         reverses * and *?           N/A
+          PCRE_NO_AUTO_CAPTURE  disables capturing parens   N/A (*)
+
+       (*)  Both Perl and PCRE allow non-capturing parentheses by means of the
+       "?:" modifier within the pattern itself. For example, (?:ab|cd) does not
+       capture, while (ab|cd) does.
+
+       For a full account of how each modifier works, please check the PCRE
+       API reference page.
+
+       For each modifier, there are two member functions whose  name  is  made
+       out  of  the  modifier  in  lowercase,  without the "PCRE_" prefix. For
+       instance, PCRE_CASELESS is handled by
+
+         bool caseless()
+
+       which returns true if the modifier is set, and
+
+         RE_Options & set_caseless(bool)
+
+       which sets or unsets the modifier. Moreover, PCRE_EXTRA_MATCH_LIMIT can
+       be  accessed  through  the  set_match_limit()  and match_limit() member
+       functions. Setting match_limit to a non-zero value limits the execution
+       of PCRE, to keep it from doing bad things like blowing the stack or
+       taking an eternity to return a result. A value of 5000 is good enough
+       to stop stack blowup in a 2MB thread stack. Setting match_limit to zero
+       disables match limiting. Alternatively, you can call
+       set_match_limit_recursion(), which uses PCRE_EXTRA_MATCH_LIMIT_RECURSION
+       to limit how deeply PCRE recurses. The match limit caps the total number
+       of times PCRE's internal match() function is called, while the recursion
+       limit caps the depth of that recursion, and therefore the amount of stack
+       that is used.
+
+       Normally, to pass one or more modifiers to a RE class,  you  declare  a
+       RE_Options object, set the appropriate options, and pass this object to
+       a RE constructor. Example:
+
+          RE_Options opt;
+          opt.set_caseless(true);
+          if (RE("HELLO", opt).PartialMatch("hello world")) ...
+
+       RE_Options has two constructors. The default constructor takes no argu-
+       ments and creates a set of flags that are off by default. The other
+       takes an option_flags parameter, to make it easier to port legacy code
+       from C programs. This lets you do
+
+          RE(pattern,
+            RE_Options(PCRE_CASELESS|PCRE_MULTILINE)).PartialMatch(str);
+
+       However, new code is better off doing
+
+          RE(pattern,
+            RE_Options().set_caseless(true).set_multiline(true))
+              .PartialMatch(str);
+
+       If you are going to pass one of the most commonly used modifiers, there
+       are some convenience functions that return an RE_Options object with the
+       appropriate modifier already set: CASELESS(), UTF8(), MULTILINE(),
+       DOTALL(), and EXTENDED().
+
+       If you need to set several options at once, and you don't want to go
+       through the pain of declaring a RE_Options object and setting several
+       options, you can do it on the fly instead: each set_xxxxx() member
+       function returns a reference to its object, so the calls can be
+       chained. For example, to pass PCRE_CASELESS, PCRE_EXTENDED, and
+       PCRE_MULTILINE to a RE with one statement, you may write:
+
+          RE(" ^ xyz \\s+ .* blah$",
+            RE_Options()
+              .set_caseless(true)
+              .set_extended(true)
+              .set_multiline(true)).PartialMatch(sometext);
+
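Chaining works because every setter returns a reference to the object it modified. The class below is a hypothetical, cut-down imitation of that design, not RE_Options itself. Its bit values are local stand-ins and are not taken from pcre.h.

```cpp
#include <cstdint>

// Hypothetical options holder with chainable setters.
class OptionsSketch {
 public:
  static constexpr std::uint32_t kCaseless  = 1u << 0;
  static constexpr std::uint32_t kMultiline = 1u << 1;
  static constexpr std::uint32_t kExtended  = 1u << 3;

  OptionsSketch() = default;                                      // all flags off
  explicit OptionsSketch(std::uint32_t flags) : flags_(flags) {}  // legacy C style

  bool caseless() const { return flags_ & kCaseless; }
  OptionsSketch& set_caseless(bool on) { return set(kCaseless, on); }
  bool multiline() const { return flags_ & kMultiline; }
  OptionsSketch& set_multiline(bool on) { return set(kMultiline, on); }
  bool extended() const { return flags_ & kExtended; }
  OptionsSketch& set_extended(bool on) { return set(kExtended, on); }

  std::uint32_t flags() const { return flags_; }

 private:
  OptionsSketch& set(std::uint32_t bit, bool on) {
    if (on) flags_ |= bit; else flags_ &= ~bit;
    return *this;  // returning a reference is what makes chaining possible
  }
  std::uint32_t flags_ = 0;
};
```

Returning `*this` by reference, rather than a copy, means every call in a chain modifies the same object. The integer constructor plays the part of the option_flags constructor described above.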
+
+SCANNING TEXT INCREMENTALLY
+
+       The "Consume" operation may be useful if you want to  repeatedly  match
+       regular expressions at the front of a string and skip over them as they
+       match. This requires use of the "StringPiece" type, which represents  a
+       sub-range  of  a  real  string.  Like RE, StringPiece is defined in the
+       pcrecpp namespace.
+
+         Example: read lines of the form "var = value" from a string.
+            string contents = ...;                 // Fill string somehow
+            pcrecpp::StringPiece input(contents);  // Wrap in a StringPiece
+
+            string var;
+            int value;
+            pcrecpp::RE re("(\\w+) = (\\d+)\n");
+            while (re.Consume(&input, &var, &value)) {
+              ...;
+            }
+
+       Each successful call  to  "Consume"  will  set  "var/value",  and  also
+       advance "input" so it points past the matched text.
+
+       The  "FindAndConsume"  operation  is  similar to "Consume" but does not
+       anchor your match at the beginning of  the  string.  For  example,  you
+       could extract all words from a string by repeatedly calling
+
+         pcrecpp::RE("(\\w+)").FindAndConsume(&input, &word)
+
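Consume and FindAndConsume differ only in whether the match must start at the current position. The sketch below uses std::regex as a stand-in and makes that distinction explicit with the match_continuous flag. The helper names are hypothetical, and a std::size_t offset plays the role of the StringPiece.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Search for `re` in `input` from offset `pos`. If `anchored` (Consume), the
// match must begin exactly at `pos`; otherwise (FindAndConsume) it may begin
// anywhere after it. On success `pos` moves past the match. A pattern that can
// match the empty string would never advance, so avoid such patterns here.
bool match_and_advance(const std::string& input, std::size_t& pos,
                       const std::regex& re, std::smatch& m, bool anchored) {
  namespace rc = std::regex_constants;
  rc::match_flag_type flags = anchored ? rc::match_continuous : rc::match_default;
  if (pos > 0) flags = flags | rc::match_prev_avail;  // ^ and \b see the previous char
  if (!std::regex_search(input.begin() + pos, input.end(), m, re, flags))
    return false;
  pos = static_cast<std::size_t>(m[0].second - input.begin());
  return true;
}

// Consume-style loop: read "var = value" lines from the front of the text,
// stopping at the first line that does not fit the pattern.
std::vector<std::pair<std::string, int>> read_assignments(const std::string& contents) {
  static const std::regex re(R"((\w+) = (\d+)\n)");
  std::vector<std::pair<std::string, int>> out;
  std::size_t pos = 0;
  std::smatch m;
  while (match_and_advance(contents, pos, re, m, /*anchored=*/true))
    out.emplace_back(m[1].str(), std::stoi(m[2].str()));
  return out;
}

// FindAndConsume-style loop: collect every word, skipping whatever lies between.
std::vector<std::string> all_words(const std::string& text) {
  static const std::regex re(R"((\w+))");
  std::vector<std::string> out;
  std::size_t pos = 0;
  std::smatch m;
  while (match_and_advance(text, pos, re, m, /*anchored=*/false))
    out.push_back(m[1].str());
  return out;
}
```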
+
+PARSING HEX/OCTAL/C-RADIX NUMBERS
+
+       By default, if you pass a pointer to a numeric value, the corresponding
+       text is interpreted as a base-10  number.  You  can  instead  wrap  the
+       pointer with a call to one of the operators Hex(), Octal(), or CRadix()
+       to interpret the text in another base. The CRadix  operator  interprets
+       C-style  "0"  (base-8)  and  "0x"  (base-16)  prefixes, but defaults to
+       base-10.
+
+         Example:
+           int a, b, c, d;
+           pcrecpp::RE re("(.*) (.*) (.*) (.*)");
+           re.FullMatch("100 40 0100 0x40",
+                        pcrecpp::Octal(&a), pcrecpp::Hex(&b),
+                        pcrecpp::CRadix(&c), pcrecpp::CRadix(&d));
+
+       will leave 64 in a, b, c, and d.
+
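The C library's strtol() already provides this kind of base selection. Base 8 or 16 corresponds to Octal() or Hex(), and base 0 applies the C-style prefix rules described for CRadix(). The sketch below is a hypothetical helper. Whether pcrecpp itself uses strtol() internally is an assumption.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Parse `text` in the given base (0 = infer from a "0" or "0x" prefix).
// The whole text must be consumed and the value must fit in an int.
bool parse_in_base(const std::string& text, int base, int* out) {
  if (text.empty()) return false;
  errno = 0;
  char* end = nullptr;
  long v = std::strtol(text.c_str(), &end, base);
  if (*end != '\0') return false;  // stray characters, e.g. '9' in octal
  if (errno == ERANGE || v < INT_MIN || v > INT_MAX) return false;
  *out = static_cast<int>(v);
  return true;
}
```

Applied to the four captures in the example, the helper stores 64 in each variable, matching the result stated above.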
+
+REPLACING PARTS OF STRINGS
+
+       You can replace the first match of "pattern" in "str"  with  "rewrite".
+       Within "rewrite", backslash-escaped digits (\1 to \9) can be used to
+       insert the text matching the corresponding parenthesized group from the
+       pattern. \0 in "rewrite" refers to the entire matching text. For example:
+
+         string s = "yabba dabba doo";
+         pcrecpp::RE("b+").Replace("d", &s);
+
+       will  leave  "s" containing "yada dabba doo". The result is true if the
+       pattern matches and a replacement occurs, false otherwise.
+
+       GlobalReplace is like Replace except that it replaces  all  occurrences
+       of  the  pattern  in  the string with the rewrite. Replacements are not
+       subject to re-matching. For example:
+
+         string s = "yabba dabba doo";
+         pcrecpp::RE("b+").GlobalReplace("d", &s);
+
+       will leave "s" containing "yada dada doo". It  returns  the  number  of
+       replacements made.
+
+       Extract is like Replace, except that if the pattern matches, "rewrite"
+       is copied into "out" (an additional argument) with substitutions. The
+       non-matching portions of "text" are ignored. Returns true iff a match
+       occurred and the extraction happened successfully; if no match occurs,
+       "out" is left unaffected.
+
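The three operations have close analogues in <regex>. The sketch below reproduces the "yabba dabba doo" results, using std::regex as a stand-in for PCRE. The helper names are hypothetical. One difference matters: std::regex's default rewrite syntax uses $1 to $9 for groups and $& for the whole match, whereas pcrecpp uses \1 to \9 and \0.

```cpp
#include <iterator>
#include <regex>
#include <string>

// Replace-style: rewrite only the first match. True if a replacement was made.
bool replace_first(const std::regex& re, const std::string& rewrite,
                   std::string* str) {
  if (!std::regex_search(*str, re)) return false;
  *str = std::regex_replace(*str, re, rewrite,
                            std::regex_constants::format_first_only);
  return true;
}

// GlobalReplace-style: rewrite every non-overlapping match. The inserted
// text is not searched again. Returns the number of replacements.
int replace_all(const std::regex& re, const std::string& rewrite,
                std::string* str) {
  int count = static_cast<int>(std::distance(
      std::sregex_iterator(str->cbegin(), str->cend(), re),
      std::sregex_iterator()));
  if (count > 0) *str = std::regex_replace(*str, re, rewrite);
  return count;
}

// Extract-style: on a match, write the expanded rewrite to *out and ignore
// the rest of `text`; on no match, leave *out untouched.
bool extract(const std::regex& re, const std::string& rewrite,
             const std::string& text, std::string* out) {
  std::smatch m;
  if (!std::regex_search(text, m, re)) return false;
  *out = m.format(rewrite);
  return true;
}
```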
+
+AUTHOR
+
+       The C++ wrapper was contributed by Google Inc.
+       Copyright (c) 2007 Google Inc.
+
+
+REVISION
+
+       Last updated: 17 March 2009
+------------------------------------------------------------------------------
+
+
+PCRESAMPLE(3)                                                    PCRESAMPLE(3)
+
+
+NAME
+       PCRE - Perl-compatible regular expressions
+
+
+PCRE SAMPLE PROGRAM
+
+       A simple, complete demonstration program, to get you started with using
+       PCRE, is supplied in the file pcredemo.c in the  PCRE  distribution.  A
+       listing  of this program is given in the pcredemo documentation. If you
+       do not have a copy of the PCRE distribution, you can save this  listing
+       to re-create pcredemo.c.
+
+       The program compiles the regular expression that is its first argument,
+       and matches it against the subject string in its  second  argument.  No
+       PCRE  options are set, and default character tables are used. If match-
+       ing succeeds, the program outputs  the  portion  of  the  subject  that
+       matched, together with the contents of any captured substrings.
+
+       If the -g option is given on the command line, the program then goes on
+       to check for further matches of the same regular expression in the same
+       subject  string. The logic is a little bit tricky because of the possi-
+       bility of matching an empty string. Comments in the code  explain  what
+       is going on.
+
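The tricky case that pcredemo's comments deal with can be summarized as a rule. After an empty match at some position, first look for a non-empty match anchored at that same position. Only if there is none, move on by one character. A sketch of that loop follows. It uses std::regex rather than the PCRE C API, so it illustrates the idea rather than reproducing pcredemo's code. It also advances by one byte, whereas a UTF-8-aware version would advance by a whole character.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Returns the [start, end) byte offsets of every match of `re` in `s`.
std::vector<std::pair<std::size_t, std::size_t>>
find_all(const std::string& s, const std::regex& re) {
  namespace rc = std::regex_constants;
  std::vector<std::pair<std::size_t, std::size_t>> out;
  std::size_t pos = 0;
  bool last_was_empty = false;
  while (pos <= s.size()) {
    rc::match_flag_type flags = pos > 0 ? rc::match_prev_avail : rc::match_default;
    if (last_was_empty) {
      // Retry at the same spot: anchored there, and an empty match is not allowed.
      flags = flags | rc::match_continuous | rc::match_not_null;
    }
    std::smatch m;
    if (!std::regex_search(s.begin() + pos, s.end(), m, re, flags)) {
      if (!last_was_empty || pos == s.size()) break;  // nothing more to find
      ++pos;  // no non-empty match here: step past one character
      last_was_empty = false;
      continue;
    }
    std::size_t start = static_cast<std::size_t>(m[0].first - s.begin());
    std::size_t end = static_cast<std::size_t>(m[0].second - s.begin());
    out.emplace_back(start, end);
    last_was_empty = (start == end);
    pos = end;
  }
  return out;
}
```

For example, a* against "baaac" yields the matches [0,0), [1,4), [4,4) and [5,5). Without the retry rule, the empty match at offset 0 would be found over and over.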
+       If  PCRE  is  installed in the standard include and library directories
+       for your operating system, you should be able to compile the demonstra-
+       tion program using this command:
+
+         gcc -o pcredemo pcredemo.c -lpcre
+
+       If  PCRE is installed elsewhere, you may need to add additional options
+       to the command line. For example, on a Unix-like system that  has  PCRE
+       installed  in  /usr/local,  you  can  compile the demonstration program
+       using a command like this:
+
+         gcc -o pcredemo -I/usr/local/include pcredemo.c \
+             -L/usr/local/lib -lpcre
+
+       Once you have compiled the demonstration program, you  can  run  simple
+       tests like this:
+
+         ./pcredemo 'cat|dog' 'the cat sat on the mat'
+         ./pcredemo -g 'cat|dog' 'the dog sat on the cat'
+
+       Note  that  there  is  a  much  more comprehensive test program, called
+       pcretest, which supports  many  more  facilities  for  testing  regular
+       expressions and the PCRE library. The pcredemo program is provided as a
+       simple coding example.
+
+       If you try to run pcredemo when PCRE is not installed in the standard
+       library directory, you may get an error like this on some operating
+       systems (e.g. Solaris):
+
+         ld.so.1: a.out: fatal: libpcre.so.0: open failed: No such file or directory
+
+       This  is  caused  by the way shared library support works on those sys-
+       tems. You need to add
+
+         -R/usr/local/lib
+
+       (for example) to the compile command to get round this problem.
+
+
+AUTHOR
+
+       Philip Hazel
+       University Computing Service
+       Cambridge CB2 3QH, England.
+
+
+REVISION
+
+       Last updated: 30 September 2009
+       Copyright (c) 1997-2009 University of Cambridge.
+------------------------------------------------------------------------------
+
+
+PCRESTACK(3)                                                      PCRESTACK(3)
+
+
+NAME
+       PCRE - Perl-compatible regular expressions
+
+
+PCRE DISCUSSION OF STACK USAGE
+
+       When  you call pcre_exec(), it makes use of an internal function called
+       match(). This calls itself recursively at branch points in the pattern,
+       in  order to remember the state of the match so that it can back up and
+       try a different alternative if the first one fails.  As  matching  pro-
+       ceeds  deeper  and deeper into the tree of possibilities, the recursion
+       depth increases.
+
+       Not all calls of match() increase the recursion depth; for an item such
+       as  a* it may be called several times at the same level, after matching
+       different numbers of a's. Furthermore, in a number of cases  where  the
+       result  of  the  recursive call would immediately be passed back as the
+       result of the current call (a "tail recursion"), the function  is  just
+       restarted instead.
+
+       The pcre_dfa_exec() function operates in an entirely different way, and
+       hardly uses recursion at all. The limit on its complexity is the amount
+       of  workspace  it  is  given.  The comments that follow do NOT apply to
+       pcre_dfa_exec(); they are relevant only for pcre_exec().
+
+       You can set limits on the number of times that match() is called,  both
+       in  total  and  recursively. If the limit is exceeded, an error occurs.
+       For details, see the section on  extra  data  for  pcre_exec()  in  the
+       pcreapi documentation.
+
+       Each  time  that match() is actually called recursively, it uses memory
+       from the process stack. For certain kinds of  pattern  and  data,  very
+       large  amounts of stack may be needed, despite the recognition of "tail
+       recursion".  You can often reduce the amount of recursion,  and  there-
+       fore  the  amount of stack used, by modifying the pattern that is being
+       matched. Consider, for example, this pattern:
+
+         ([^<]|<(?!inet))+
+
+       It matches from wherever it starts until it encounters "<inet"  or  the
+       end  of  the  data,  and is the kind of pattern that might be used when
+       processing an XML file. Each iteration of the outer parentheses matches
+       either  one  character that is not "<" or a "<" that is not followed by
+       "inet". However, each time a  parenthesis  is  processed,  a  recursion
+       occurs, so this formulation uses a stack frame for each matched charac-
+       ter. For a long string, a lot of stack is required. Consider  now  this
+       rewritten pattern, which matches exactly the same strings:
+
+         ([^<]++|<(?!inet))+
+
+       This  uses very much less stack, because runs of characters that do not
+       contain "<" are "swallowed" in one item inside the parentheses.  Recur-
+       sion  happens  only when a "<" character that is not followed by "inet"
+       is encountered (and we assume this is relatively  rare).  A  possessive
+       quantifier  is  used  to stop any backtracking into the runs of non-"<"
+       characters, but that is not related to stack usage.
+
+       This example shows that one way of avoiding stack problems when  match-
+       ing long subject strings is to write repeated parenthesized subpatterns
+       to match more than one character whenever possible.
+
+   Compiling PCRE to use heap instead of stack
+
+       In environments where stack memory is constrained, you  might  want  to
+       compile  PCRE to use heap memory instead of stack for remembering back-
+       up points. This makes it run a lot more slowly, however. Details of how
+       to do this are given in the pcrebuild documentation. When built in this
+       way, instead of using the stack, PCRE obtains and frees memory by call-
+       ing  the  functions  that  are  pointed to by the pcre_stack_malloc and
+       pcre_stack_free variables. By default,  these  point  to  malloc()  and
+       free(),  but you can replace the pointers to cause PCRE to use your own
+       functions. Since the block sizes are always the same,  and  are  always
+       freed in reverse order, it may be possible to implement customized mem-
+       ory handlers that are more efficient than the standard functions.
+
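Every block has the same size and blocks are released in reverse order, so a simple free list that hands back the most recently released block first is sufficient. The sketch below is hypothetical. Its two entry points have the malloc()/free() shapes that pcre_stack_malloc and pcre_stack_free point to by default. Installing them in a PCRE built for heap use is not shown here and is not exercised by the test.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

namespace lifo_pool {

// Each block carries a header recording its size, so stack_free() can tell
// whether the block has the pool's fixed size.
struct alignas(std::max_align_t) Header {
  std::size_t size;
};

std::size_t block_size = 0;      // fixed by the first request
std::vector<Header*> free_list;  // released blocks; most recent at the back

// malloc()-shaped entry point.
void* stack_malloc(std::size_t size) {
  if (block_size == 0) block_size = size;
  if (size == block_size && !free_list.empty()) {
    Header* h = free_list.back();  // reuse the most recently released block
    free_list.pop_back();
    return h + 1;
  }
  void* raw = std::malloc(sizeof(Header) + size);
  if (raw == nullptr) return nullptr;
  Header* h = new (raw) Header{size};
  return h + 1;                    // payload starts right after the header
}

// free()-shaped entry point: keep same-sized blocks for reuse.
void stack_free(void* p) {
  if (p == nullptr) return;
  Header* h = static_cast<Header*>(p) - 1;
  if (h->size == block_size) free_list.push_back(h);
  else std::free(h);
}

// Return every cached block to the heap (e.g. at shutdown).
void release_all() {
  for (Header* h : free_list) std::free(h);
  free_list.clear();
}

}  // namespace lifo_pool
```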
+   Limiting PCRE's stack usage
+
+       PCRE has an internal counter that can be used to  limit  the  depth  of
+       recursion,  and  thus cause pcre_exec() to give an error code before it
+       runs out of stack. By default, the limit is very  large,  and  unlikely
+       ever  to operate. It can be changed when PCRE is built, and it can also
+       be set when pcre_exec() is called. For details of these interfaces, see
+       the pcrebuild and pcreapi documentation.
+
+       As a very rough rule of thumb, you should reckon on about 500 bytes per
+       recursion. Thus, if you want to limit your  stack  usage  to  8Mb,  you
+       should  set  the  limit at 16000 recursions. A 64Mb stack, on the other
+       hand, can support around 128000 recursions. The pcretest  test  program
+       has a command line option (-S) that can be used to increase the size of
+       its stack.
+
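The rule of thumb reduces to a single division, sketched below as compile-time arithmetic. The constant and function names are hypothetical. The figures in the text come out exactly if Mb is read as a decimal megabyte.

```cpp
#include <cstddef>

// Rough stack cost of one recursive match() call, per the text.
constexpr std::size_t kBytesPerRecursion = 500;

// Largest recursion count that fits in `stack_bytes` under that estimate.
constexpr std::size_t recursion_limit_for(std::size_t stack_bytes) {
  return stack_bytes / kBytesPerRecursion;
}

// Decimal megabytes reproduce the figures quoted above.
static_assert(recursion_limit_for(8000000) == 16000, "8Mb stack");
static_assert(recursion_limit_for(64000000) == 128000, "64Mb stack");
```

With binary megabytes (8*1024*1024 bytes), the quotient is 16777, so a limit of 16000 leaves a little headroom.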
+   Changing stack size in Unix-like systems
+
+       In Unix-like environments, there is not often a problem with the  stack
+       unless  very  long  strings  are  involved, though the default limit on
+       stack size varies from system to system. Values from 8Mb  to  64Mb  are
+       common. You can find your default limit by running the command:
+
+         ulimit -s
+
+       Unfortunately,  the  effect  of  running out of stack is often SIGSEGV,
+       though sometimes a more explicit error message is given. You  can  nor-
+       mally increase the limit on stack size by code such as this:
+
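+         #include <sys/resource.h>
+
+         struct rlimit rlim;
+         getrlimit(RLIMIT_STACK, &rlim);
+         rlim.rlim_cur = 100*1024*1024;      /* request a 100Mb soft limit */
+         if (setrlimit(RLIMIT_STACK, &rlim) != 0) {
+           /* refused, e.g. because the hard limit rlim_max is lower */
+         }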
+
+       This  reads  the current limits (soft and hard) using getrlimit(), then
+       attempts to increase the soft limit to  100Mb  using  setrlimit().  You
+       must do this before calling pcre_exec().
+
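The setrlimit() snippet above always writes 100Mb. That can lower a larger existing limit, and it is refused if it exceeds the hard limit. A slightly more careful sketch only ever raises the soft limit and clamps the request to the hard limit. The function name is hypothetical.

```cpp
#include <sys/resource.h>

// Raise the soft stack limit toward `target` bytes, never lowering it and
// never asking for more than the hard limit. Call before pcre_exec().
bool raise_stack_soft_limit(rlim_t target) {
  struct rlimit rlim;
  if (getrlimit(RLIMIT_STACK, &rlim) != 0) return false;
  if (rlim.rlim_cur == RLIM_INFINITY || rlim.rlim_cur >= target)
    return true;  // already large enough
  if (rlim.rlim_max != RLIM_INFINITY && target > rlim.rlim_max)
    target = rlim.rlim_max;  // the soft limit cannot exceed the hard limit
  rlim.rlim_cur = target;
  return setrlimit(RLIMIT_STACK, &rlim) == 0;
}
```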
+   Changing stack size in Mac OS X
+
+       Using setrlimit(), as described above, should also work on Mac OS X. It
+       is also possible to set a stack size when linking a program. There is a
+       discussion   about   stack  sizes  in  Mac  OS  X  at  this  web  site:
+       http://developer.apple.com/qa/qa2005/qa1419.html.
+
+
+AUTHOR
+
+       Philip Hazel
+       University Computing Service
+       Cambridge CB2 3QH, England.
+
+
+REVISION
+
+       Last updated: 09 July 2008
+       Copyright (c) 1997-2008 University of Cambridge.
+------------------------------------------------------------------------------
+
+