Posts Tagged ‘haskell’

Contest Fail

Posted in Programming on January 16th, 2012 by Jamie – Be the first to comment

Last weekend I had a few hours to spare, so I decided to work on one of the problems from CodeSprint2, InterviewStreet’s contest to find the best programming talent.

Since I didn’t have long to work, I decided to try to complete the algorithm problem with the highest value. I chose to work on Count Strings (closed now). Given a regular expression, and a string length N, count the number of distinct strings of length N which the regular expression can match.

I looked at the sample test case, spent some time thinking about the problem and decided to try it. Although bounds for test cases were provided, I gave them only a cursory glance. This would turn out to be a mistake.

In my daily work, I typically focus on getting things working rather than spending time on determining the optimal way to do things. It is with this mindset that I decided that I would actually enumerate every string that a regular expression could produce, and then count them. I failed even before I began.

The regular expression language was limited to the alphabet {a, b}, with operators star, union, and concatenation.

I decided to use Haskell, which has the terseness of Ruby or Python, but the performance and type checking of a compiled language. I created a data type to represent the regular expressions. Recursive data types made this representation very straightforward.

data RegularExpression = Symbol Char
  | Concat RegularExpression RegularExpression
  | Union RegularExpression RegularExpression
  | Star RegularExpression
  deriving (Show, Eq)

I had the enumeration part complete fairly quickly. Pattern matching makes everything really intuitive and readable. The star operation required the most code.

listPossibilities (Symbol c) limit
  | limit > 0 = [c]
  | otherwise = []

listPossibilities (Concat r1 r2) limit = combos where
  o1 = listPossibilities r1 limit
  o2 = listPossibilities r2 limit
  combos = [ a ++ b | a <- o1, b <- o2, length (a++b) <= limit ]

listPossibilities (Union r1 r2) limit = possibilities where
  o1 = listPossibilities r1 limit
  o2 = listPossibilities r2 limit
  possibilities = o1 ++ o2

listPossibilities (Star _) 0 = [""]
listPossibilities (Star r1) limit = possibilities where
  opt = listPossibilities r1 limit
  possibilities = "": whileLimit opt limit optSet
  optSet = Set.fromList opt 

  -- | Uses nub right now, really inefficient. probably should use some sort of memoization
  whileLimit :: [String] -> Int -> [String] -> [String]
  whileLimit base lim acc = pos where
    new = List.nub [a++b | a <- base, b <- acc, length (a++b) <= lim]
    pos = if null new || length new < length acc then acc
          else whileLimit base lim (List.nub $ acc ++ new)

The next component I needed was a parser to convert input into these regular expression objects. I used the Parsec parser module to construct the parser. This part actually took a bit longer, as I tried to refine the parser to accept deeply nested expressions.

The regular expression union parser below parses a parenthesized expression or symbol, followed by a pipe, and another parenthesized expression or symbol, returning a RegularExpression object.

reUnion :: GenParser Char st RegularExpression
reUnion = do
  re1 <- pexpr <|> reSymbol
  char '|'
  re2 <- pexpr <|> reSymbol
  return (Union re1 re2)

Once I got the parser working, I glued it together with the enumeration portion, and submitted it to the contest. It immediately failed to compiled due to a dependency on Test.HUnit I had in my code. I reorganized the code so that I could easily remove this dependency, and resubmitted.

This time, it took, but when I went back after a few minutes to check the submission status, it had failed all but 1 of the test cases. It had taken too much time! I went back to the submission guidelines and noted that there was a time limit of 5 minutes.

Originally, I had used an algorithm which enumerated by checking distinctness using a List type. I knew at the time that this was very inefficient, but I focused on finishing the implementation at the time. I went back to this part and used a Set instead. I was optimistic that this would fix my problems and I would be able to knockout at least another test case.

I resubmitted and I failed 10 tests again. Then, I reread the problem, and looks at the boundaries for N. N could be up to 100,000 (IIRC)! Of course enumeration (especially the way I was doing it which would enumerate many strings of less than length N) would fail–I had taken a totally incorrect approach to this problem. I did not need to enumerate–I only needed to give a number of potential strings. This could’ve been done without enumeration. For example, given the expression a*, and a number N, there is only 1 possibility. Given (a|b)^*, 2^N. For example, one of the test cases was (a|b)^*, N=5, which has 32 possibilities. Another of the expressions was a^*ba^*. Since b can only appear once, the possible strings include strings of  (N-1) a’s and a single b. There are N such combinations (the number of positions b can be in, in an N length string): baa…, aba…, aab…, … . This is the approach I should’ve taken.

I left the solution there, and acknowledged my defeat. Source here: https://github.com/jamiely/count-strings

Haskell Poetry Parser

Posted in Programming on December 23rd, 2011 by Jamie – Be the first to comment

For our final project for CIS552: Advanced Programming, my partner and I implemented a poetry parser. The program identifies a body of text piped from STDIN as one of: haiku, rhyming poem (aba, or aabba), sonnet, or limerick.

The project source is located at: https://github.com/jamiely/poetry-parser.

The project involved using the CMU Pronouncing Dictionary found here, creating parsers for the dictionary, and base parsers that could be combined for various forms.

For example, we implemented a parser for syllables that allows us to create a haiku parser.

-- | Parses a Haiku
haiku :: PoemParser RhymeMap
haiku = do
  ns 5
  ns 7
  ns 5
  return Map.empty where
  ns = nSyllables

and a rhyming scheme parser that allowed us to recognize limericks:

aabba :: PoemParser RhymeMap
aabba = rhymeScheme "aabba" Map.empty 

-- | The stress pattern of a limerick
-- note: syllable requirement is implicit
limerickStress :: PoemParser RhymeMap
limerickStress = do
  stressLine [D,U, D,D,U, D,D,U]
  stressLine [D,U, D,D,U, D,D,U]
  stressLine [D,U, D,D,U]
  stressLine [D,U, D,D,U]
  stressLine [D,U, D,D,U, D,D,U] 

limerick :: PoemParser RhymeMap
limerick = limerickStress <&> aabba

For example, we can determine the type of the following poem:

An old silent pond…
A frog jumps into the pond,
splash! Silence again.
- Basho

by running the program thusly:

> ./Main < extra/basho.haiku

with the result:

Opened file
Poem:
An old silent pond...
A frog jumps into the pond,
splash! Silence again.

Dictionary: 

A  AH0
AGAIN  AH0 G EH1 N
AN  AE1 N
FROG  F R AA1 G
INTO  IH0 N T UW1
JUMPS  JH AH1 M P S
OLD  OW1 L D
POND  P AA1 N D
POND'S  P AA1 N D Z
SILENCE  S AY1 L AH0 N S
SILENT  S AY1 L AH0 N T
SPLASH  S P L AE1 SH
THE  DH AH0

Type: Haiku

And the following poem:

something something fox
apple jacks
word here pox

by running the program thusly:

> ./Main < extra/fox.rhyming

with the result:

Opened file
Poem:
something something fox
apple jacks
word here pox

Dictionary: 

APPLE  AE1 P AH0 L
FOX  F AA1 K S
HERE  HH IY1 R
JACKS  JH AE1 K S
POX  P AA1 K S
SOMETHING  S AH1 M TH IH0 NG
WORD  W ER1 D

Type: Rhyming poem: aba

Pattern Matching in Haskell

Posted in Programming on November 14th, 2011 by Jamie – Be the first to comment

I’ve only been taking one class this semester. It’s an advanced programming course focusing on the functional language Haskell. While I was familiar with some functional concepts such as map, fold, and currying from using Javascript, and lazy-evaluation from dabbling with Clojure, this class has introduced a number of new, intriguing concepts.

Not necessarily a functional feature, pattern matching in Haskell is an incredible feature that has really come to make a lot of sense to me. Given a type constructor with two definitions:


-- Given the type below
data Shape = Rectangle Float Float Float Float | Circle Float Float Float

area :: Shape -> Float
-- We can pattern match below, to change the defn depending on the constructor
area (Rectangle x1 y1 x2 y2) = (y1 - y2) * (x1 - x2) :: Float
-- Use _ to discard matches
area (Circle _ _ radius) = pi * radius * radius

In Haskell, tuples are a basic data type. You can pattern match to unpack a tuple, like this:

tmp = (1, (2, (3, 4)))

unpackTmp1 :: (Int, (Int, (Int, Int)))
unpackTmp1 (_, (_, rtn)) = rtn
unpackTmp1 tmp
-- rtn = (3,4)

unpackTmp2 :: (Int, (Int, (Int, Int)))
unpackTmp2 (_, (_, (_, rtn))) = rtn
unpackTmp2 tmp
-- rtn = 4

So, pattern is something really new to me that I haven’t seen in other languages before. It makes dealing with tuples easy, and you can also use it with lists:

x = [1..10]
secondItem (_:i:_) = i
secondItem x
-- returns 2

thirdItem (_:_:i:_) = i
thirdItem x
-- return 3

tail' (_:rest) = rest
tail' x
-- returns [2..10]

Switch to our mobile site