Posts Tagged ‘regular expressions’

Regex Challenge

Posted in Troubleshooting on February 25th, 2011 by Jamie

A friend asked for a regex that matches a paragraph whose text is entirely upper-case, even when that text is wrapped in a nested hierarchy of tags. Some examples:

Matches:


<p class="abcdefg"><a href="1.htm"><span>HELLO THERE</span></a></p>
<p class="c8"><span class="c7">BY ERIC D. JAMES, MD</span></p>
<p style="border:1px solid red">HELLO DARLING</p>

Fail:


<p class="c8"><span class="c7">BY Eric James, MD</span></p>
<p style="border:1px solid red">Hello Darling</p>
<p class="abcdefg"><a href="1.htm"><span>HELLO THeRE</span></a></p>

I came up with the following expression:

/<p[^>]*>(<[^>]*>)*[^a-z<]+(<\/[^p][^>]*>)*<\/p[^>]*>/

It doesn’t handle tags interspersed with text or nested paragraph tags.

Here’s a sample on Rubular.
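To try the expression outside Rubular, here is a minimal Python sketch that runs it against the six examples above. The /.../ delimiters are dropped and the escaped slashes become plain / inside a Python raw string; re.search keeps the match unanchored, like the original. The helper name is_upper_paragraph is mine, for illustration only.

```python
import re

# The expression from the post, without the /.../ delimiters.
UPPER_PARAGRAPH = re.compile(r'<p[^>]*>(<[^>]*>)*[^a-z<]+(</[^p][^>]*>)*</p[^>]*>')

def is_upper_paragraph(html):
    """Return True if the pattern finds an all-upper-case paragraph in html."""
    return UPPER_PARAGRAPH.search(html) is not None

should_match = [
    '<p class="abcdefg"><a href="1.htm"><span>HELLO THERE</span></a></p>',
    '<p class="c8"><span class="c7">BY ERIC D. JAMES, MD</span></p>',
    '<p style="border:1px solid red">HELLO DARLING</p>',
]
should_fail = [
    '<p class="c8"><span class="c7">BY Eric James, MD</span></p>',
    '<p style="border:1px solid red">Hello Darling</p>',
    '<p class="abcdefg"><a href="1.htm"><span>HELLO THeRE</span></a></p>',
]

for html in should_match + should_fail:
    print(is_upper_paragraph(html), html)

# The known limitation: a tag in the middle of the text breaks the match.
print(is_upper_paragraph('<p>HELLO <b>THERE</b></p>'))  # False
```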

Word Jumble Game: Part 5

Posted in Software on March 22nd, 2010 by Jamie

I used jQuery for the UI. I am a recent convert to jQuery, having mostly used Prototype + Scriptaculous.

The word list is embedded in the page script as a JavaScript array. On document ready, HTML is generated that writes the first and last words to the page and creates blank input boxes for the intermediate words.

A keyup handler is bound to each input box and checks whether the entered word is correct. If it is, a CSS class is added that shows a green underline beneath the box; otherwise, a red underline is shown.

Finally, dynamically created buttons on the page provide hints or reveal all of the answers.

Word Jumble Filled

Word Jumble Game: Part 4

Posted in Software on March 20th, 2010 by Jamie

Search

Generating the chain of clues is a straightforward search problem. I used depth-first search: the algorithm follows one path as deep as it can and only backtracks to explore another branch if the chain it generated was not long enough.

Another tactic would have been breadth-first search. To use it, we could have modified the regex pattern to find all words that differ from the base word by exactly one letter.

Using water as the base word, that regular expression looks something like: /([^w]ater|w[^a]ter|wa[^t]er|wat[^e]r|wate[^r])/. This would find all words in the dictionary that differ from water by one letter (let's call this word set B).
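The alternation above can be generated mechanically, one alternative per letter position. Here is a minimal Python sketch, assuming the words contain only lower-case letters (so no regex escaping is needed); the function names and the small dictionary are hypothetical. It uses fullmatch so the pattern has to cover the whole word, which the unanchored /.../ form relies on the 5-letter dictionary to guarantee.

```python
import re

def one_letter_pattern(base_word):
    """Build e.g. ([^w]ater|w[^a]ter|wa[^t]er|wat[^e]r|wate[^r]) for 'water'."""
    alternatives = [
        base_word[:i] + "[^" + base_word[i] + "]" + base_word[i + 1:]
        for i in range(len(base_word))
    ]
    return "(" + "|".join(alternatives) + ")"

def words_one_letter_away(base_word, dictionary):
    """Word set B: every dictionary word that differs from base_word by one letter."""
    pattern = re.compile(one_letter_pattern(base_word))
    return [w for w in dictionary if pattern.fullmatch(w)]

dictionary = ["water", "hater", "later", "wafer", "wader", "waste", "tater"]
print(one_letter_pattern("water"))
print(words_one_letter_away("water", dictionary))
# ['hater', 'later', 'wafer', 'wader', 'tater']
```

The base word itself never matches, because every alternative excludes the original letter at its position.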

If we were using breadth-first search, we would then repeat the process with all of the words we just found (word set B).

If you were to visualize the difference between breadth-first and depth-first search, breadth-first would look like a tree with wide but shallow roots. Depth-first search would look like a tree with few but deep roots.
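For comparison, here is a minimal breadth-first sketch in Python. It is not the app's code: to keep it short it tests "differs by one letter" with a position-by-position comparison instead of the regex, and it keeps a parent pointer for each word so a chain can be rebuilt once the search reaches the requested depth. All names are hypothetical.

```python
from collections import deque

def differs_by_one(a, b):
    """True if a and b have the same length and differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

def bfs_chain(seed, dictionary, length):
    """Breadth-first search for a chain of `length` words starting at seed.

    Visits every word one step away, then two steps away, and so on.
    Returns None if no word sits that many steps from the seed.
    """
    parent = {seed: None}
    queue = deque([(seed, 1)])  # (word, length of the chain ending at word)
    while queue:
        word, depth = queue.popleft()
        if depth == length:
            chain = []
            while word is not None:  # walk the parent pointers back to the seed
                chain.append(word)
                word = parent[word]
            return chain[::-1]
        for candidate in dictionary:
            if candidate not in parent and differs_by_one(word, candidate):
                parent[candidate] = word
                queue.append((candidate, depth + 1))
    return None

dictionary = ["water", "wader", "wades", "fades", "later"]
print(bfs_chain("water", dictionary, 4))  # ['water', 'wader', 'wades', 'fades']
```

One consequence of the visited set: breadth-first search only finds chains whose last word is that many steps away by the shortest route, whereas depth-first search can also wander along longer paths.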

Query Params

Two optional query parameters make the puzzle more flexible. The word param sets the starting (seed) word, and the length param sets the maximum length of the puzzle.
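The app itself reads these in ColdFusion; as a language-neutral illustration, here is a minimal Python sketch of parsing the two optional parameters. The example URL and DEFAULT_LENGTH are hypothetical (the post does not state a default), and falling back to a random seed word when word is absent is an assumption.

```python
from urllib.parse import parse_qs, urlparse

DEFAULT_LENGTH = 5  # hypothetical default; not specified in the post

def puzzle_options(url):
    """Return (word, length) from the optional word and length query params."""
    params = parse_qs(urlparse(url).query)
    word = params.get("word", [None])[0]  # None means: pick a random seed word
    try:
        length = int(params.get("length", [DEFAULT_LENGTH])[0])
    except ValueError:
        length = DEFAULT_LENGTH  # ignore a non-numeric length
    return word, length

print(puzzle_options("http://example.com/jumble.cfm?word=water&length=6"))
# ('water', 6)
```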

Recursion

The program uses recursion to perform the search. This almost goes without saying, for it is difficult to do general search without recursion (although you could do so with macros and similar programming constructs). Search may be done using loop control structures but I can’t imagine an elegant solution using loops.

The pseudocode for the recursion is basically:

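// initial call: build(seed, [seed], maxLength)
function build(baseWord, chainWords, maxLength)
    if Length(chainWords) >= maxLength
        return chainWords
    regex = generateRandomRegex(baseWord)
    wordSetB = getPossibleWords(regex, notIn=chainWords)
    best = chainWords
    for(word in wordSetB)
        chain = build(word, chainWords+word, maxLength)
        if Length(chain) > Length(best)
            best = chain
        if Length(best) >= maxLength
            break
    return best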
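Here is a minimal runnable Python sketch of that recursion. It is a reconstruction, not the app's ColdFusion code: it assumes lower-case words, picks one random position per base word (like generateRandomRegex), and keeps the longest chain found so far so that a dead-end branch never discards a usable chain. The dictionary is example data.

```python
import random
import re

def generate_random_regex(base_word):
    """Replace one randomly chosen letter L with [^L], e.g. water -> w[^a]ter."""
    i = random.randrange(len(base_word))
    return re.compile(base_word[:i] + "[^" + base_word[i] + "]" + base_word[i + 1:])

def get_possible_words(regex, dictionary, not_in):
    """Dictionary words that match the regex and are not already in the chain."""
    return [w for w in dictionary if regex.fullmatch(w) and w not in not_in]

def build(base_word, chain_words, max_length, dictionary):
    """Depth-first search: return the longest chain found, up to max_length words."""
    if len(chain_words) >= max_length:
        return chain_words
    regex = generate_random_regex(base_word)
    best = chain_words
    for word in get_possible_words(regex, dictionary, chain_words):
        chain = build(word, chain_words + [word], max_length, dictionary)
        if len(chain) > len(best):
            best = chain
        if len(best) >= max_length:
            break  # long enough; stop exploring other branches
    return best

dictionary = ["water", "hater", "later", "wafer", "wader", "wades", "fades", "tater"]
print(build("water", ["water"], 4, dictionary))
```

The output varies from run to run because the replaced position is random.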

Word Jumble Game: Part 3

Posted in Software on March 18th, 2010 by Jamie

The first thing I did was make sure that the word list would be cached on application start. This was as simple as creating an Application.cfc component and implementing its onApplicationStart function. This function reads in the dictionary (described in the last entry) and caches the word list in a ColdFusion array. There are other options for storing this data, but this one had the best mix of speed and functionality for the search method I wanted to use against it.

Although the dictionary was only 52K, this caching probably helped performance a great deal.

To generate the word list, I decided on the following algorithm:

  1. Choose an initial starting word (at random, or via user entry)
  2. Use the word to generate a regular expression.
    Replace a single, randomly chosen letter with the regex pattern [^L], where L is the letter being replaced.

    Example:

    word: water
    regex: w[^a]ter

  3. Next, iterate through all of the words, testing each word against the regular expression. Store all matches.
  4. With each match, one-by-one, repeat Step 2 until we get a chain of N words. (Where N is the maximum length of the chain.)
  5. Obviously, if we have no more matches, we stop. If we have at least a 3-word chain, we can use it.
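Steps 2 and 3 can be sketched in Python as follows. To make the example reproducible, the position is passed in explicitly rather than chosen at random; the function names and the word list are hypothetical, and the words are assumed to be lower-case letters only.

```python
import re

def letter_pattern(word, position):
    """Step 2: replace the letter at `position` with [^L], e.g. ('water', 1) -> 'w[^a]ter'."""
    return word[:position] + "[^" + word[position] + "]" + word[position + 1:]

def matching_words(pattern, dictionary):
    """Step 3: test every word against the pattern and store the matches."""
    regex = re.compile(pattern)
    return [w for w in dictionary if regex.fullmatch(w)]

dictionary = ["water", "hater", "later", "wafer", "eater"]
print(letter_pattern("water", 1))                       # w[^a]ter
print(matching_words(letter_pattern("water", 0), dictionary))
# ['hater', 'later', 'eater']
```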

There are a few considerations not discussed above in generating the puzzle:

  • If we match a word that is already in the chain, we should ignore that word to avoid duplicates.
  • Not implemented: we should not replace a letter in the same position twice. For example, if we replace the “w” in water, don’t replace the “h” in hater (if hater is the 2nd word).
  • Depth-first versus breadth-first searching: to be discussed
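The unimplemented rule about positions could be handled by tracking which positions have already been replaced. Here is a minimal Python sketch, assuming the caller keeps a set of used positions for the current chain; pick_position is a hypothetical helper, not part of the app.

```python
import random

def pick_position(word, used_positions, rng=random):
    """Pick a random letter position that has not been replaced yet in this chain.

    Returns None once every position is used; with this rule a chain of
    n-letter words can have at most n + 1 words.
    """
    free = [i for i in range(len(word)) if i not in used_positions]
    return rng.choice(free) if free else None

used = {0}  # water -> hater replaced position 0 (the "w")
print(pick_position("hater", used))  # never 0, so the "h" in hater is left alone
```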

Word Jumble Game: Part 2

Posted in Software on March 16th, 2010 by Jamie

In my last entry, I described the concept behind the Word Jumble game. In this entry, I will describe the initial steps in creating the game.

First, I needed a dictionary of words. Most Unix flavors ship with a built-in word list, and I develop on OS X, so I Googled the location of its dictionary:

/usr/share/dict/words

I knew I wanted to do puzzles of only 5-letter words, so I used the grep command to create a file of just these words.

grep '^.....$' /usr/share/dict/words > dictionary-5letterwords.txt

Notice the regular expression I used. I wanted to demonstrate an actual use of regular expressions for this project. The regular expression

/^.....$/

matches a line of exactly 5 characters: ^ anchors the start of the line, $ anchors the end, and each period matches any single character. I assumed there would be no words in the dictionary with a space or other punctuation, although that was perhaps a faulty assumption.
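One way to drop that assumption is to require letters explicitly. Here is a minimal Python sketch comparing the grep pattern with a stricter [a-z]{5} pattern; the sample lines are made up, and excluding capitalized entries (proper nouns) is my assumption about what the puzzle wants.

```python
import re

FIVE_CHARS = re.compile(r"^.....$")        # the grep pattern: any 5 characters
FIVE_LOWERCASE = re.compile(r"^[a-z]{5}$")  # stricter: exactly 5 lower-case letters

def five_letter_words(lines, pattern=FIVE_LOWERCASE):
    """Return the stripped lines that match pattern."""
    return [line.strip() for line in lines if pattern.match(line.strip())]

sample = ["water", "Aaron", "hater", "can't", "wafers"]
print(five_letter_words(sample, FIVE_CHARS))  # ['water', 'Aaron', 'hater', "can't"]
print(five_letter_words(sample))              # ['water', 'hater']
```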

Next, I started working on the code. Since we use mostly ColdFusion at Wharton, that’s what I wrote the app in.

Word Jumble Filled

Tech Talk: Regular Expression Recap

Posted in Programming on March 5th, 2010 by Jamie

Yesterday I presented a tech talk on regular expressions. Overall, the feedback was very helpful! One thing I need to work on is making sure that my examples work 100%. I know it can be very confusing when someone stumbles through an example.

Another thing I need to work on is pacing. Unfortunately, I did not word my questionnaire in a way that told me whether people thought I was too fast or too slow. I will definitely correct this the next time I get to speak. I suspect that for anyone who already knew regular expressions, the pace was very slow.

Overall, it was a great experience. I value every chance I get to speak. I don’t particularly enjoy it, but I find that speaking in a clear, engaging way is an important skill to cultivate.

For anyone who is interested, here is the presentation.
http://docs.google.com/present/view?id=aqfbzmmn589_2817gbrbx6g6

I have also linked to the questionnaire I handed out (it still needs work), in case you’d like to see it or use it yourself.
http://docs.google.com/View?id=aqfbzmmn589_2899d8kw2wcz

Outlook Rules VBA, to Bypass Exchange’s Rule Limit

Posted in Programming on April 20th, 2009 by Jamie

This is custom Outlook VBA that applies mail rules on the client, bypassing Exchange’s 32K rule limit. To add more rules, add entries to the array returned by Jam_GetRules. Each entry is itself an array: the first element is a comma-delimited list of the properties to check (To, From, and/or Subject); the second is a regular expression supported by Microsoft’s VBScript RegExp class, matched case-insensitively; the third is the name of the folder to move the item to, searched for recursively under the Inbox.

Note that when using Exchange, the address is not an SMTP address like example@example.com but an Exchange directory path containing the user’s domain ID. For that reason, the rule also tests against the display name associated with the address.

Public WithEvents myOlItems As Outlook.Items

Private Sub Application_Startup()
    Jam_Init
End Sub

Private Sub Jam_Init()
    Set myOlItems = Outlook.Session.GetDefaultFolder(olFolderInbox).Items
End Sub

Private Function Jam_GetRules()
    Jam_GetRules = Array( _
        Array("To,From", "domainId", "AP"), _
        Array("To,From", "jacob", "IT"), _
        Array("Subject", "(approval chg|Ticket #)", "Help Desk"), _
        Array("Subject", "weekly job postings", "HR") _
        )

End Function

Private Sub myOlItems_ItemAdd(ByVal item As Object)
    Jam_ItemAdd item
End Sub

Private Sub Jam_ItemAdd(ByRef item As Object)
   ' Check to make sure it is an Outlook mail message, otherwise
   ' subsequent code will probably fail depending on what type
   ' of item it is.
   If TypeName(item) = "MailItem" Then
        Jam_HandleMailItem item
   End If

End Sub

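Private Sub Jam_ProcessInbox()
    ' Walk the Inbox backwards: moving an item removes it from the collection,
    ' so a forward For Each would skip the item after each one that is moved.
    ' Non-mail items (meeting requests, reports) are skipped.
    Dim inboxItems As Outlook.Items, i As Long
    Set inboxItems = Outlook.Session.GetDefaultFolder(olFolderInbox).Items
    For i = inboxItems.Count To 1 Step -1
        If TypeName(inboxItems(i)) = "MailItem" Then
            Jam_HandleMailItem inboxItems(i)
        End If
    Next
End Sub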

Private Sub Jam_HandleMailItem(ByVal item As MailItem)
    ' All recipients as "Name <address>", comma-separated
    Dim itemTo: itemTo = Jam_AddressListToString(item.Recipients, "Address", ",")

    For Each rule In Jam_GetRules
        Dim ruleProps: ruleProps = Split(rule(0), ",")
        Dim rulePattern: rulePattern = rule(1)
        Dim folderName: folderName = rule(2)

        For Each p In ruleProps
            Dim toTest: toTest = ""
            Select Case p
                Case "To"
                    toTest = itemTo
                Case "Subject"
                    toTest = item.Subject
                Case "From"
                    toTest = item.SenderName & " <" & item.SenderEmailAddress & ">"
            End Select
            If RE_TestInsensitive(toTest, rulePattern) Then
                ' Move the item to the rule's folder, if that folder exists
                Dim folder
                Set folder = Jam_GetFolder(folderName)
                If Not folder Is Nothing Then
                    'MsgBox "move " & item.Subject & " to " & folderName
                    item.Move folder
                    ' The item has left the Inbox; stop checking further rules
                    Exit Sub
                End If
            End If
        Next
    Next
End Sub

Private Function Jam_AddressListToString(ByRef list, ByVal prop, ByVal delim)
    Dim rtn: rtn = Array()
    For Each item In list
        Array_Append rtn, CStr(item.name & " <" & item.Address & ">")
    Next
    Jam_AddressListToString = Join(rtn, delim)
End Function

Public Function Jam_GetFolder(ByVal folderName As String) As MAPIFolder
    Set Jam_GetFolder = Jam_GetFolderHelper(folderName, _
        Outlook.Session.GetDefaultFolder(olFolderInbox))

End Function

Private Function Jam_GetFolderHelper(ByVal folderName As String, ByRef parent As MAPIFolder) As MAPIFolder
    Set Jam_GetFolderHelper = Nothing
    Dim f As MAPIFolder, rtnFolder As MAPIFolder

    For Each f In parent.Folders
        If f.name = folderName Then
            Set Jam_GetFolderHelper = f
            Exit Function
        End If
    Next

    For Each f In parent.Folders
        Set rtnFolder = Jam_GetFolderHelper(folderName, f)
        If Not rtnFolder Is Nothing Then
            Set Jam_GetFolderHelper = rtnFolder
            Exit Function
        End If
    Next
End Function

''
' Appends a value onto the end of an array.
' @param    myList  The target array
' @param    myItem  The item to append
' @todo     Add support for appending objects
Function Array_Append(ByRef myList, ByRef myItem)
    If Not IsArray(myList) Then
        Exit Function
    End If

    ReDim Preserve myList(UBound(myList) + 1)

    myIndex = UBound(myList)

    If IsObject(myItem) Then
        Set myList(myIndex) = myItem
    Else
        myList(myIndex) = myItem
    End If

    Array_Append = myList
End Function

''
' Tests whether a string matches a pattern
' @return       Returns true if pattern matches string
'
Function RE_Test(ByVal str, ByVal pattern, ByVal caseSensitive)
    Dim reBase: Set reBase = CreateObject("VBScript.RegExp")
    reBase.pattern = pattern
    reBase.IgnoreCase = Not caseSensitive
    RE_Test = reBase.Test(str)

    Set reBase = Nothing
End Function

''
' Tests whether a string matches a pattern case-insensitively
Function RE_TestInsensitive(ByVal str, ByVal pattern)
    RE_TestInsensitive = RE_Test(str, pattern, False)
End Function
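To see how the rule table behaves without Outlook, here is a minimal Python sketch of the same idea. It is an illustration, not a port: the message is a plain dict with "To", "From" and "Subject" strings (a hypothetical shape), and Python's re dialect is close to, but not guaranteed identical to, VBScript's RegExp. Like RE_TestInsensitive, it searches case-insensitively anywhere in the string.

```python
import re

# Same shape as Jam_GetRules: (properties, pattern, folder). Example data only.
RULES = [
    ("To,From", "domainId", "AP"),
    ("To,From", "jacob", "IT"),
    ("Subject", "(approval chg|Ticket #)", "Help Desk"),
    ("Subject", "weekly job postings", "HR"),
]

def target_folder(message, rules=RULES):
    """Return the folder of the first rule whose pattern matches, or None."""
    for props, pattern, folder in rules:
        for prop in props.split(","):
            if re.search(pattern, message.get(prop, ""), re.IGNORECASE):
                return folder
    return None  # no rule matched: the item stays in the Inbox

msg = {"To": "Me <me@example.com>", "From": "Jacob Smith <js@example.com>", "Subject": "Lunch"}
print(target_folder(msg))  # IT
```

The first matching rule wins, so entries are best ordered from most to least specific.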
