I am trying to find a way to negate sentences based on their POS tags. Please consider:
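include_once 'class.postagger.php';

function negate($sentence) {
    $tagger = new PosTagger('includes/lexicon.txt');
    $tags = $tagger->tag($sentence);

    // Build a "token/TAG token/TAG ..." string from the tagger output.
    $input = [];
    foreach ($tags as $t) {
        $input[] = trim($t['token']) . "/" . trim($t['tag']);
    }
    $sentence = implode(" ", $input);
    $postagged = $sentence;

    // Prefix "not" to adjectives, modals, adverbs and verbs.
    // Problem: this also negates words that are already negative (never -> notnever).
    $sentence = preg_replace("/(\w+)\/(JJ|MD|RB|VB|VBD|VBN)\b/", "not$1/$2", $sentence);

    // Strip the POS tags again.
    $sentence = preg_replace("/\/[A-Z$]+/", "", $sentence);
    return "$postagged<br>$sentence";
}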
BTW: in this example I use Ian Barber's POS tagger and lexicon. A sample run of this code:
echo negate("I will never go to their place again");
I/NN will/MD never/RB go/VB to/TO their/PRP$ place/NN again/RB
I notwill notnever notgo to their place notagain
As you can see (and this problem is also noted in the code), words that are already negative get negated as well: never becomes notnever, which obviously should not happen. Since my regular expression skills are not that great, is there a way to exclude such words from the regular expression?
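To make the request concrete, here is a minimal sketch of the kind of exclusion I have in mind, using a negative lookahead in front of the word capture. Everything in it is an assumption for illustration: the function name negateTagged and the list of excluded words (never, not, no) are hypothetical. It works on an already tagged string so that it runs without the tagger.

```php
<?php
// Hypothetical helper (not part of the original code): negates tagged words
// but leaves words that are already negative untouched.
function negateTagged(string $tagged): string {
    // \b anchors the match at the start of a word, so the lookahead cannot be
    // bypassed by starting in the middle of "never".
    // (?!(?i:never|not|no)\/) rejects a word from the (assumed) exclusion list
    // when it is immediately followed by the tag separator "/".
    $pattern = '/\b(?!(?i:never|not|no)\/)(\w+)\/(JJ|MD|RB|VBD|VBN|VB)\b/';
    $negated = preg_replace($pattern, 'not$1/$2', $tagged);

    // Strip the POS tags again, as in the original function.
    return preg_replace('/\/[A-Z$]+/', '', $negated);
}

echo negateTagged('I/NN will/MD never/RB go/VB to/TO their/PRP$ place/NN again/RB');
// prints: I notwill never notgo to their place notagain
```

The exclusion list would have to be extended by hand for other negative words, so this only addresses the regular expression part of the question, not the quality of the negation itself.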