If we just want to integrate regular expressions into SQL Server
using a “quick and dirty” approach, then we could use the VBScript
regular expression COM objects in conjunction with the “sp_OA*”
functions. This is the path taken by Cory Koski in his article on
using regular expressions in SQL Server at http://sqlteam.com/item.asp?ItemID=13947.
An XP implementation of a regular expression on the other hand
should be much faster as it avoids the inherent overhead in the
The first thing to decide is what regular expression engine to use
to implement the XP in. There are a number of choices we could pick
from including reusing the VBScript regular expression COM object.
Unfortunately this COM object does not provide an early bound
interface meaning that we would still have the overhead involved in
the “sp_OA*” function calls. Another alternative is PCRE with PCRE++
(a C++ wrapper-class for the Perl Compatible Regular Expressions
The choice we finally plump for is the ATL Regular expression parser
that is included in the ATL Server framework provided in Visual
Studio .NET 2003. Unlike most of the other XP’s in this book, this
is one XP, which we must compile in this latest version of Visual
Studio because the regular expression support we use was added in
this version. Another nice advantage, which the ATL regular
expression classes have, is that it handles MBCS and Unicode data as
well as ASCII. Doing searches on MBCS strings is a very handy
feature as many Asian languages use these types of strings. The
classes we use are “ATL::CAtlRegExp" and “ATL::CAtlREMatchContext”.
They provide a very complete set of regular expression operators,
support case sensitivity, has a small memory footprint and it is
very fast as it is implemented as raw C++ template based code.
As is usual we use the XP++ framework to provide the skeleton to the
XP DLL code. As this XP requires Visual Studio .NET 2003, it is also
a good test of the XP++ framework in compilers other than Visual
Studio 6, which is used by most of the other XP++ samples.
The XP namely XP_REGEXP exposes 3 XP’s using the following XP++
The module (XP_REGEXPProc.cpp) contains the code for the individual
XP’s. The code is divided in three blocks, with very similar code,
one for ASCII, one for MBCS and another one for Unicode. This is
required due to the way the ATL classes are designed. The
“CAtlRegExp” class takes a template traits class, which specifies
what character set is to be used. Each one of these XP’s is made up
of four basic steps for matching a regular expression.
First, the parameter data is obtained (string to be searched,
regular expression and boolean to determine case sensitivity) as
ATL::CAtlRECharTraitsMB::RECHARTYPE* pszSourceText = (ATL::CAtlRECharTraitsMB::RECHARTYPE*)
ATL::CAtlRECharTraitsMB::RECHARTYPE* pszRegularExpression = (ATL::CAtlRECharTraitsMB::RECHARTYPE*)
DBBOOL bCaseSensitive =
Second, an instance of the regular expression parser is created:
Third, the regular expression is parsed ready for it to be used:
ATL::REParseError reParseError = regexp.Parse(pszRegularExpression,
Finally, the match is performed:
BOOL bMatch = regexp.Match(pszSourceText, &mcRegExp);
The “bMatch” variable is then returned as an OUTPUT parameter in the
XP. Note that the code snippets above were based on the ASCII XP.
Each XP in XP_REGEXP has four parameters, namely @SourceText, @RegExp,
@CaseSensitive and @Match. “SourceText” is the string being
searched, “RegExp” is the regular expression, “CaseSensitive” is the
case sensitivity flag and “Match” is an output parameter boolean
that returns the result of the match.
The above book excerpt is from:
Turbocharge Database Performance with C++ External Procedures
Joseph Gama, P. J. Naughter