=====Using regular expressions=====
//by Richard Russell, December 2006//\\ \\ Regular expressions provide a means to specify a pattern of characters, or syntax rule, which a string (or part of a string) must match. Certain //metacharacters// have special significance. For example, a dot (.) matches any single character, square brackets [] enclose a list of matching characters, and a caret (^) as the first character inside square brackets negates the list (at the start of a pattern it instead anchors the match to the beginning of the string). Here are some simple examples:\\ \\
| a..d\\ | matches "abcd", "axyd", "a12d" etc.\\ |
| [abc]\\ | matches "a", "b" or "c"\\ |
| [a-z]\\ | matches any lowercase letter\\ |
| [^b]at\\ | matches "cat", "fat", "hat" etc. but not "bat"\\ |
\\ For more information on the syntax of regular expressions see this [[http://en.wikipedia.org/wiki/Regular_expression|Wikipedia article]].\\ \\ You can make use of regular expressions in your BBC BASIC program by means of the **gnu_regex** DLL which can be downloaded from [[http://people.delphiforums.com/gjc/gnu_regex.html|here]][[/Using%20regular%20expressions#footnote|[1]]]. To start with you must load the DLL in the usual way:\\ \\
SYS "LoadLibrary", "gnu_regex.dll" TO gnu_regex%
IF gnu_regex% = 0 ERROR 100, "Cannot load gnu_regex.dll"
SYS "GetProcAddress", gnu_regex%, "regcomp" TO regcomp%
SYS "GetProcAddress", gnu_regex%, "regexec" TO regexec%
For this to work **gnu_regex.dll** needs to be in the current directory, the Windows directory (often C:\WINDOWS), the Windows system directory (often C:\WINDOWS\SYSTEM32) or one of the directories specified in the PATH environment variable. Alternatively you can copy the file to your BBC BASIC for Windows library folder and load it explicitly from there:\\ \\
SYS "LoadLibrary", @lib$+"gnu_regex.dll" TO gnu_regex%
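If you are not sure where the DLL will be installed, one possibility is to try the normal search path first and fall back to the library folder. It is also worth checking that both functions were found, because **GetProcAddress** returns zero if a name is not exported. This is only a sketch, and the error messages are arbitrary:

```bbcbasic
REM Try the standard DLL search path first, then the BB4W library folder
SYS "LoadLibrary", "gnu_regex.dll" TO gnu_regex%
IF gnu_regex% = 0 THEN SYS "LoadLibrary", @lib$+"gnu_regex.dll" TO gnu_regex%
IF gnu_regex% = 0 ERROR 100, "Cannot load gnu_regex.dll"
SYS "GetProcAddress", gnu_regex%, "regcomp" TO regcomp%
SYS "GetProcAddress", gnu_regex%, "regexec" TO regexec%
REM GetProcAddress returns zero if the requested function is not exported
IF regcomp% = 0 OR regexec% = 0 ERROR 100, "gnu_regex.dll lacks regcomp or regexec"
```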
The code below illustrates a very simple example of setting up a pattern and inputting strings from the user which are tested against this pattern:\\ \\
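DIM buffer% 255 : REM memory for the compiled pattern
pattern$ = "[abcxyz]"
REM regcomp returns zero if the pattern compiled successfully
SYS regcomp%, buffer%, pattern$, 0 TO result%
IF result% ERROR 101, "Failed to compile regular expression"
REPEAT
INPUT "Enter a string: " test$
REM Arguments: compiled pattern, string, no offsets wanted (0, 0), no flags
REM regexec returns zero if the string matches
SYS regexec%, buffer%, test$, 0, 0, 0 TO result%
IF result% PRINT "Not matched" ELSE PRINT "Matched"
UNTIL FALSE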
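The **ERROR 101** above does not say //why// a pattern was rejected. The GNU regex library also provides a **regerror** function which converts the non-zero code returned by **regcomp** into a readable message. The sketch below, which would take the place of the **regcomp** call and error check above, assumes that **gnu_regex.dll** exports this function under that name and falls back to the plain message if it does not:

```bbcbasic
REM Assumes gnu_regex.dll also exports regerror (as the GNU regex library does)
SYS "GetProcAddress", gnu_regex%, "regerror" TO regerror%
DIM errbuf% 255
SYS regcomp%, buffer%, pattern$, 0 TO result%
IF result% <> 0 AND regerror% <> 0 THEN
  REM Arguments: error code, compiled pattern, message buffer, buffer size
  SYS regerror%, result%, buffer%, errbuf%, 256
  REM $$ reads the NUL-terminated message from the buffer
  ERROR 101, "Failed to compile regular expression: " + $$errbuf%
ENDIF
IF result% ERROR 101, "Failed to compile regular expression"
```

With this in place a faulty pattern, such as "[abc" with its closing bracket missing, should be reported using the library's own description of the problem.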
You should ensure that **buffer%** points to a memory buffer large enough to hold the compiled regular expression. Strictly, what **regcomp** writes there is a small fixed-size //regex_t// structure (a few tens of bytes in the 32-bit GNU implementation); the compiled pattern itself is stored in memory which the DLL allocates internally, so the 256 bytes reserved here are ample. As always, make sure you execute the **DIM** statement only once, or use **DIM LOCAL**, to avoid a memory leak and an eventual **No room** error.\\ \\ In this example a string matches if it contains any of the characters "a", "b", "c", "x", "y" or "z" anywhere within it. The program as listed provides no information on //where// in the string the match occurred. You can discover that information by amending the program as follows:\\ \\
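DIM offsets{start%, finish%} : REM same layout as the C regmatch_t structure
REPEAT
INPUT "Enter a string: " test$
REM Request one set of match offsets, to be written into offsets{}
SYS regexec%, buffer%, test$, 1, offsets{}, 0 TO result%
IF result% PRINT "Not matched" ELSE PRINT "Matched at ";offsets.start%
UNTIL FALSE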
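Because **offsets.finish%** receives the offset of the character just //after// the match (as in the C //regmatch_t// structure), the matched text itself can be extracted with **MID$**. The function below is one possible way to package this; the name **FNmatched** is purely illustrative, and it uses a local 8-byte buffer (start offset at m%!0, end offset at m%!4) in place of the **offsets{}** structure:

```bbcbasic
REM Assumes regexec% was obtained as above and buffer% holds the compiled pattern
REPEAT
  INPUT "Enter a string: " test$
  match$ = FNmatched(test$)
  IF match$ = "" PRINT "Not matched" ELSE PRINT "Matched """; match$; """"
UNTIL FALSE
END

REM Hypothetical helper: returns the text of the first match, or "" if none
DEF FNmatched(test$)
LOCAL m%, result%
DIM m% LOCAL 7 : REM room for one start/end offset pair (two 4-byte integers)
SYS regexec%, buffer%, test$, 1, m%, 0 TO result%
IF result% THEN = ""
REM Offsets count from zero and the end offset is one past the match, hence +1
= MID$(test$, m%!0 + 1, m%!4 - m%!0)
```

Note that an empty result is ambiguous if the pattern is able to match an empty string (for example "a*"); in that case test the value returned by **regexec** instead.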
Here **offsets.start%** is set to the offset, counting from zero, of the first character of the first match, and **offsets.finish%** to the offset of the character immediately following that match.\\ \\ You can specify that the matching is //case insensitive// by changing the final parameter of **regcomp** from 0 to 2 as follows:\\ \\
_REG_ICASE = 2
SYS regcomp%, buffer%, pattern$, _REG_ICASE TO result%
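Each call to **regcomp** allocates memory inside the DLL which is only released by calling **regfree**, so a program that compiles many patterns should free each one when it has finished with it. The following sketch wraps the whole sequence in a hypothetical **FNmatch** function using **DIM LOCAL**. It assumes that **gnu_regex.dll** exports **regfree**, as the GNU regex library does:

```bbcbasic
REM Assumes regcomp% and regexec% were obtained as above, and that
REM gnu_regex.dll also exports regfree (as the GNU regex library does)
SYS "GetProcAddress", gnu_regex%, "regfree" TO regfree%
IF regfree% = 0 ERROR 102, "Cannot find regfree in gnu_regex.dll"
_REG_ICASE = 2
IF FNmatch("[abcxyz]", _REG_ICASE, "XMAS") PRINT "Matched" ELSE PRINT "Not matched"
END

REM Hypothetical helper: compiles pattern$, tests test$ against it, then frees it
DEF FNmatch(pattern$, flags%, test$)
LOCAL re%, result%
DIM re% LOCAL 255 : REM temporary storage for the regex_t structure
SYS regcomp%, re%, pattern$, flags% TO result%
IF result% ERROR 101, "Failed to compile regular expression"
SYS regexec%, re%, test$, 0, 0, 0 TO result%
SYS regfree%, re% : REM releases the memory regcomp allocated inside the DLL
= (result% = 0)
```

With the case-insensitive flag "XMAS" matches (because of its "X" and "A"); with **flags%** set to 0 it would not.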
You can also specify the use of **extended regular expressions** by setting the final parameter to 1:\\ \\
_REG_EXTENDED = 1
SYS regcomp%, buffer%, pattern$, _REG_EXTENDED TO result%
In this mode additional //metacharacters// are recognised, for example the vertical bar (|) signifies alternatives:\\ \\
| %%abc|def%%\\ | matches "abc" or "def"\\ |
\\
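The flags are separate bits, so they can be combined using **OR**. In extended mode parentheses also group part of a pattern, and if you give **regexec** room for more than one pair of offsets, the first pair describes the whole match and each subsequent pair the text matched by the corresponding parenthesised group. Here is a sketch (the pattern and test string are arbitrary) which uses a plain 16-byte block rather than a structure:

```bbcbasic
REM Assumes regcomp% and regexec% were obtained as above
_REG_EXTENDED = 1
_REG_ICASE = 2
DIM re% 255
DIM match% 15 : REM two start/end offset pairs of 8 bytes each
REM The flags are separate bits, so they can be combined with OR
SYS regcomp%, re%, "(cat|dog)s", _REG_EXTENDED OR _REG_ICASE TO result%
IF result% ERROR 101, "Failed to compile regular expression"
test$ = "Raining Cats and Dogs"
REM Ask for two pairs: the whole match, then parenthesised group 1
SYS regexec%, re%, test$, 2, match%, 0 TO result%
IF result% = 0 THEN
  whole$ = MID$(test$, match%!0 + 1, match%!4 - match%!0)
  group1$ = MID$(test$, match%!8 + 1, match%!12 - match%!8)
  PRINT "Whole match: "; whole$
  PRINT "Group 1: "; group1$
ELSE
  PRINT "Not matched"
ENDIF
```

This should print "Whole match: Cats" followed by "Group 1: Cat".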
----
[1] When last checked, the file **gnu_regex.exe** was corrupted (missing the last byte). To repair it you can use this simple BBC BASIC program:\\ \\
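REM Run this once only: each run appends another zero byte
F% = OPENUP("gnu_regex.exe")
IF F% = 0 ERROR 102, "Cannot open gnu_regex.exe"
PTR#F% = EXT#F% : REM move to the end of the file
BPUT #F%,0
CLOSE #F%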