The number of syllables for words in the text

I have the following code snippet to find the number of syllables for all words in a given input file "sample.txt" using NLTK:

    import re
    import nltk
    from curses.ascii import isdigit
    from nltk.corpus import cmudict
    import nltk.data
    import pprint

    d = cmudict.dict()

    tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
    fp = open("sample.txt")
    data = fp.read()
    tokens = nltk.wordpunct_tokenize(data)
    text = nltk.Text(tokens)
    words = [w.lower() for w in text]
    print words  # print all the words in the input text
    regexp = "[A-Za-z]+"
    exp = re.compile(regexp)

    def nsyl(word):
        # Count the phonemes that end in a stress digit (the vowels) in each
        # pronunciation; raises KeyError if word is not in the dictionary.
        return max([len([y for y in x if isdigit(y[-1])]) for x in d[word]])

    sum1 = 0
    count = 0
    count1 = 0
    for a in words:
        if exp.match(a):
            print a
            print "no of syllables:", nsyl(a)
            sum1 = sum1 + nsyl(a)
            print "sum of syllables:", sum1
            if nsyl(a) < 3:
                count = count + 1
            else:
                count1 = count1 + 1

    print "no of words with syll count less than 3:", count
    print "no of complex words:", count1

This code looks up each input word in the CMU dictionary and reports the number of syllables for that word. But it fails with an error if a word is not found in the dictionary, for example when I use my own name in the input file. I want to check whether the word exists in the dictionary and, if it does not, skip it and continue with the next word. How can I do that?

1 answer

I assume the problem is a KeyError. Replace your definition of nsyl with:

    def nsyl(word):
        lowercase = word.lower()
        if lowercase not in d:
            # -1 signals that the word is not in the dictionary
            return -1
        else:
            return max([len([y for y in x if isdigit(y[-1])]) for x in d[lowercase]])

Then, in your loop, call nsyl once per word, and skip the word if nsyl returns -1.
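Skipping words that are missing from the dictionary could look roughly like the sketch below. It is only an illustration: the small `d` here is a hand-written stand-in for `cmudict.dict()` so the snippet runs without the NLTK data, and `count_syllables` is a hypothetical helper name, not something from the question or from NLTK. It also uses `str.isdigit` instead of `curses.ascii.isdigit`, which is a swap on my part.

```python
# Stand-in for cmudict.dict() (assumption for illustration only): each word
# maps to a list of pronunciations, and each pronunciation is a list of
# phonemes in which the vowels end in a stress digit.
d = {
    "hello": [["HH", "AH0", "L", "OW1"], ["HH", "EH0", "L", "OW1"]],
    "dictionary": [["D", "IH1", "K", "SH", "AH0", "N", "EH2", "R", "IY0"]],
    "word": [["W", "ER1", "D"]],
}


def nsyl(word):
    """Return the syllable count of word, or -1 if it is not in d."""
    lowercase = word.lower()
    if lowercase not in d:
        return -1
    # Take the largest vowel count over all listed pronunciations.
    return max(len([y for y in x if y[-1].isdigit()]) for x in d[lowercase])


def count_syllables(words):
    """Hypothetical helper: tally syllables, skipping unknown words."""
    total = 0          # sum of syllables over all known words
    simple = 0         # words with fewer than 3 syllables
    complex_words = 0  # words with 3 or more syllables
    skipped = []       # words that were not found in the dictionary
    for w in words:
        n = nsyl(w)    # call nsyl once and reuse the result
        if n == -1:
            skipped.append(w)
            continue
        total += n
        if n < 3:
            simple += 1
        else:
            complex_words += 1
    return total, simple, complex_words, skipped
```

Calling `count_syllables(words)` with the list built from "sample.txt" would then return the totals along with the words that were skipped, so you can still see which names or unknown words were left out.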
