=====Using regular expressions=====

//by Richard Russell, December 2006//

Regular expressions provide a means to specify a pattern of characters, or syntax rule, which a string (or part of a string) must match. Certain //metacharacters// have special significance; for example a dot (.) matches any single character, square brackets [] enclose a list of matching characters, a caret (^) at the start of such a list signifies negation, and so on. Here are some simple examples:

| a..d | matches "abcd", "axyd", "a12d" etc. |
| [abc] | matches "a", "b" or "c" |
| [a-z] | matches any lowercase letter |
| [^b]at | matches "cat", "fat", "hat" etc. but not "bat" |

For more information on the syntax of regular expressions see this [[http://en.wikipedia.org/wiki/Regular_expression|Wikipedia article]].

You can make use of regular expressions in your BBC BASIC program by means of the **gnu_regex** DLL which can be downloaded from [[http://people.delphiforums.com/gjc/gnu_regex.html|here]][[/Using%20regular%20expressions#footnote|[1]]]. To start with you must load the DLL in the usual way:

<code>
REM Load the DLL and look up the addresses of the two functions used below
SYS "LoadLibrary", "gnu_regex.dll" TO gnu_regex%
IF gnu_regex% = 0 ERROR 100, "Cannot load gnu_regex.dll"
SYS "GetProcAddress", gnu_regex%, "regcomp" TO regcomp%
SYS "GetProcAddress", gnu_regex%, "regexec" TO regexec%
</code>

For this to work **gnu_regex.dll** needs to be in the current directory, the Windows directory (often C:\WINDOWS), the Windows system directory (often C:\WINDOWS\SYSTEM32) or one of the directories specified in the PATH environment variable.
Alternatively you can copy the file to your BBC BASIC for Windows library folder and load it explicitly from there:

<code>
SYS "LoadLibrary", @lib$+"gnu_regex.dll" TO gnu_regex%
</code>

The code below is a very simple example which sets up a pattern and then tests strings entered by the user against it:

<code>
REM Compile the pattern once into buffer%
DIM buffer% 255
pattern$ = "[abcxyz]"
SYS regcomp%, buffer%, pattern$, 0 TO result%
IF result% ERROR 101, "Failed to compile regular expression"
REPEAT
  INPUT "Enter a string: " test$
  REM regexec returns zero when the string matches the pattern
  SYS regexec%, buffer%, test$, 0, 0, 0 TO result%
  IF result% PRINT "Not matched" ELSE PRINT "Matched"
UNTIL FALSE
</code>

You should ensure that **buffer%** points to a memory buffer large enough to hold the //compiled// regular expression. The library offers no obvious way to ask how much space it needs; in the standard POSIX interface this buffer is a fixed-size **regex_t** structure and **regcomp** allocates any further storage itself, so if the DLL follows that convention the required size does not depend on the pattern. As always, make sure you execute the **DIM** statement only once, or use **DIM LOCAL**, to avoid a memory leak and an eventual **No room** error.

In this example the pattern matches any of the characters "a", "b", "c", "x", "y" or "z" anywhere in the string. The program as listed provides no information on //where// in the string the match occurred.
You can discover that information by amending the program as follows:

<code>
REM Structure to receive the start and end offsets of the match
DIM offsets{start%, finish%}
REPEAT
  INPUT "Enter a string: " test$
  SYS regexec%, buffer%, test$, 1, offsets{}, 0 TO result%
  IF result% PRINT "Not matched" ELSE PRINT "Matched at ";offsets.start%
UNTIL FALSE
</code>

Here **offsets.start%** is set to the offset of the first match from the beginning of the string (zero if the match starts at the first character), and **offsets.finish%** to the offset of the character following the match.

You can specify that the matching is //case insensitive// by changing the final parameter of **regcomp** from 0 to 2 as follows:

<code>
_REG_ICASE = 2
SYS regcomp%, buffer%, pattern$, _REG_ICASE TO result%
</code>

You can also specify the use of **extended regular expressions** by setting the final parameter to 1:

<code>
_REG_EXTENDED = 1
SYS regcomp%, buffer%, pattern$, _REG_EXTENDED TO result%
</code>

In this mode additional //metacharacters// are recognised, for example the vertical bar (|) signifies alternatives:

| %%abc|def%% | matches "abc" or "def" |

----

[1] When last checked, the file **gnu_regex.exe** was corrupted (missing the last byte). To repair it you can use this simple BBC BASIC program:

<code>
REM Append a single zero byte to the end of the file
F% = OPENUP("gnu_regex.exe")
IF F% = 0 ERROR 102, "Cannot open gnu_regex.exe"
PTR#F% = EXT#F%
BPUT #F%,0
CLOSE #F%
</code>