Split text into sentences even when titles such as "Mr." or "Ms." appear in the text

I have a problem: I want to split text into sentences on the full stop (.).

For instance:

Mr. Bean is a British comedy series of 14 half-hour episodes with Rowan Atkinson as the main character. Various episodes were written by Atkinson, Robin Driscoll, Richard Curtis and, for one of them, Ben Elton.

If I split the above text, I get 3 sentences:

1. Mr.

2. Bean is a British comedy series of 14 half-hour episodes with Rowan Atkinson as the main character.

3. Various episodes were written by Atkinson, Robin Driscoll, Richard Curtis and, for one of them, Ben Elton.


I want "Mr." to stay attached to the sentence that follows it, so that the text is split into two sentences, not three:

1. Mr. Bean is a British comedy series of 14 half-hour episodes with Rowan Atkinson as the main character.

2. Various episodes were written by Atkinson, Robin Driscoll, Richard Curtis and, for one of them, Ben Elton.

Please help me. I appreciate the instant feedback from the community.

Thanks.

+3
3 answers

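Without a rule for telling abbreviations (Mr., a.m.) apart from sentence endings, this cannot be solved in general.

If you only need to handle a known set of abbreviations, and there is a character that never occurs in the text (say, *), you can: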

  • Replace Mr. and Mrs. with Mr* and Mrs*
  • Split on .
  • Replace Mr* and Mrs* back with Mr. and Mrs.

For example, with NUL as the placeholder character:

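using System;
using System.Collections.Generic;
using System.Linq;

static IEnumerable<string> Splitter(string sentences)
{
    // NUL temporarily stands in for the dot of each abbreviation.
    char sentinel = '\0';
    return sentences.Replace("Mr.", "Mr" + sentinel)
        .Replace("Mrs.", "Mrs" + sentinel)
        // Split on sentence ends; the ". " delimiter itself is dropped.
        .Split(new[] { ". " }, StringSplitOptions.None)
        // Restore the abbreviations in each sentence.
        .Select(s => s.Replace("Mr" + sentinel, "Mr.")
                        .Replace("Mrs" + sentinel, "Mrs."));
}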

If NUL could appear in your input, use something else as the placeholder, such as a GUID.
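As a rough sketch of the GUID variant: a freshly generated GUID string replaces the dot of each abbreviation, so it should not collide with anything in the input. The class name `SentenceSplitter` and the `Abbreviations` list are hypothetical, chosen only for illustration; the splitting rule (". ") is the same as in the NUL version.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SentenceSplitter
{
    // Hypothetical list of abbreviations to protect; extend as needed.
    static readonly string[] Abbreviations = { "Mr.", "Mrs." };

    public static IEnumerable<string> Split(string text)
    {
        // 32 hex digits with no dots; practically never present in real input.
        string token = Guid.NewGuid().ToString("N");

        // Swap the trailing dot of each abbreviation for the token.
        foreach (string abbr in Abbreviations)
            text = text.Replace(abbr, abbr.TrimEnd('.') + token);

        // Split on sentence ends, then put the dots back.
        return text
            .Split(new[] { ". " }, StringSplitOptions.RemoveEmptyEntries)
            .Select(s => s.Replace(token, "."))
            .ToList();
    }
}
```

As in the NUL version, the ". " delimiter is consumed by the split, so only the final sentence keeps its full stop.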

+6

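A more robust approach than a bare split is to escape the abbreviations first, split, and then undo the escaping.

The steps:

  • Replace every literal <dot> with <dot><dot>.
  • Replace (every) Mr. with Mr<dot>.
  • Split the text on the period.
  • Replace Mr<dot> (and so on ...) back with Mr.
  • Replace <dot><dot> with <dot>.

In other words, this is an escape/unescape scheme.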

+3
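using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static IEnumerable<string> Splitter(string sentences)
{
    // Split on every period that is not directly preceded by the word "mr" or "mrs".
    foreach (string s in
        Regex.Split(sentences, @"(?<!\b(?:mr|mrs))\.", RegexOptions.IgnoreCase))
    {
        // Skip empty fragments and restore the period removed by the split.
        if (!String.IsNullOrWhiteSpace(s)) yield return s.Trim() + ".";
    }
}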

A simple regex-based answer using a negative lookbehind.
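A possible variation, sketched here under the assumption that sentences are separated by whitespace after the period: split on that whitespace instead of on the period itself, so each sentence keeps its own punctuation and nothing has to be re-appended. The class name `LookbehindSplitter` is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class LookbehindSplitter
{
    // Match whitespace that follows a period, unless that period
    // belongs to "Mr." or "Mrs." (checked by the negative lookbehind).
    static readonly Regex Boundary =
        new Regex(@"(?<!\b(?:Mr|Mrs)\.)(?<=\.)\s+", RegexOptions.IgnoreCase);

    // Trim first so trailing whitespace does not yield an empty last element.
    public static string[] Split(string text) => Boundary.Split(text.Trim());
}
```

Because the pattern contains no capturing groups, `Regex.Split` returns only the sentences themselves.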

0