I have the following code snippet that uses NLTK to count the syllables of every word in a given text file, "sample.txt":
import re
import nltk
from curses.ascii import isdigit
from nltk.corpus import cmudict
import nltk.data
import pprint

# CMU Pronouncing Dictionary: word -> list of pronunciations
d = cmudict.dict()
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')

fp = open("sample.txt")
data = fp.read()
tokens = nltk.wordpunct_tokenize(data)
text = nltk.Text(tokens)
words = [w.lower() for w in text]
print words

# Only purely alphabetic tokens are treated as words
regexp = "[A-Za-z]+"
exp = re.compile(regexp)

def nsyl(word):
    # Vowel phonemes end in a stress digit, so counting them counts syllables
    return max([len([y for y in x if isdigit(y[-1])]) for x in d[word]])

sum1 = 0
count = 0
count1 = 0
for a in words:
    if exp.match(a):
        print a
        print "no of syllables:", nsyl(a)
        sum1 = sum1 + nsyl(a)
        print "sum of syllables:", sum1
        if nsyl(a) < 3:
            count = count + 1
        else:
            count1 = count1 + 1
print "no of words with syll count less than 3:", count
print "no of complex words:", count1
This code looks up each input word in the CMU dictionary and prints the number of syllables for that word. However, it fails with an error when a word is not found in the dictionary, for example when I put my own name in the input file. I want to check whether the word exists in the dictionary and, if it does not, skip it and continue with the next word. How can I do that?
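To make the desired behaviour concrete, here is a minimal sketch of the skip-if-missing logic. It assumes that `cmudict.dict()` behaves like an ordinary Python dict, so a membership test with `in` is enough. The small `d` below is a hypothetical stand-in for the real dictionary so the example runs without NLTK, and the names `skipped`, `simple` and `complex_words` are illustrative only. The `print` calls are written so that they should work under both Python 2 and Python 3.

```python
# Hypothetical stand-in for cmudict.dict(): word -> list of pronunciations,
# each pronunciation a list of phonemes; vowel phonemes end in a stress digit.
d = {
    "hello": [["HH", "AH0", "L", "OW1"], ["HH", "EH0", "L", "OW1"]],
    "dictionary": [["D", "IH1", "K", "SH", "AH0", "N", "EH2", "R", "IY0"]],
    "word": [["W", "ER1", "D"]],
}

def nsyl(word):
    """Return the largest syllable count for word, or None if it is not in d."""
    if word not in d:
        return None
    return max(len([ph for ph in pron if ph[-1].isdigit()]) for pron in d[word])

# "zyxwv" is deliberately not in the dictionary (think of a personal name)
words = ["hello", "zyxwv", "dictionary", "word"]

total = 0          # running sum of syllables
simple = 0         # words with fewer than 3 syllables
complex_words = 0  # words with 3 or more syllables
skipped = []       # words that were not found in the dictionary

for w in words:
    n = nsyl(w)
    if n is None:
        skipped.append(w)
        continue  # unknown word: skip it and move on to the next one
    total += n
    if n < 3:
        simple += 1
    else:
        complex_words += 1

print("sum of syllables: %d" % total)
print("no of words with syll count less than 3: %d" % simple)
print("no of complex words: %d" % complex_words)
print("skipped: %s" % ", ".join(skipped))
```

With the real dictionary, the same check would replace the direct `d[word]` lookup. Calling `nsyl` once per word and reusing the result also avoids repeating the lookup three times per iteration.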