I sent this message to the Solr mailing list, but I'm also asking here in case there is a Solr expert around.
I'm trying to use the regex fragmenter and am having trouble getting the results I want. I want fragments that begin with a word character and end with punctuation, but the fragments returned seem very inflexible, even though I've provided a large slop. Here are the relevant parameters I'm using; maybe someone can point out where I've gone wrong:
<str name="hl.fragsize">500</str>
<str name="hl.fragmenter">regex</str>
<str name="hl.regex.slop">0.8</str>
<str name="hl.regex.pattern">[\w].*{400,600}[.!?]</str>
<str name="hl">true</str>
<str name="q">chinese</str>
This should match between 400 and 600 characters, beginning with a word character and ending with one of . ! ? (period, exclamation mark, or question mark). Here is an example of a typical result:
. Check out these pictures. Nine panda kids on display for the first time Thursday in southwestern China. They are less than a year old. They just recently stopped breastfeeding. There are only 1,600 of these guys who went into the mountain forests of central China, another 120 people in Chinese breeding facilities and zoos. And they are about 20 who live outside of China in zoos. They almost completely exist on bamboo. They can live up to 30 years. And also these little guys will end up with a lot more. They will grow
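To make the intent concrete outside Solr, here is a minimal Python sketch of the fragment shape I'm after. The pattern in it is only my reading of the goal (one word character, 400-600 arbitrary characters, then closing punctuation); it is not the pattern from the config above, and `first_fragment` and the sample text are hypothetical names and data used purely for illustration. It says nothing about how Solr's fragmenter applies the pattern or the slop.

```python
import re

# Assumed shape of a "good" fragment: a word character, then 400-600 of any
# characters (lazy, so it stops at the earliest closing punctuation), then . ! or ?
INTENDED = re.compile(r"\w.{400,600}?[.!?]", re.DOTALL)

def first_fragment(text):
    """Return the first substring with the intended shape, or None."""
    m = INTENDED.search(text)
    return m.group(0) if m else None

# Hypothetical sample text: one sentence repeated to get enough length.
sample = "They almost completely exist on bamboo. " * 20
frag = first_fragment(sample)
print(len(frag), repr(frag[0]), repr(frag[-1]))
```

A fragment like the one above, which starts with ". Check out", would not satisfy this shape, which is why the result surprises me.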