Convert OCRed Unstructured Text to Correct Text

I use Microsoft MODIin VB6for OCR images. (I know about other OCR tools like tesseract etc., but I find MODI more accurate than others)

The image, which should be OCRed, is as follows

enter image description here

and the text that I get after OCR looks below

Text1
Text2
Text3
Number1
Number2
Number3

The problem is that the corresponding text from the opposite column is not supported. How can I map Number1 to Text1?

I can only think of such a decision.

MODI provides the coordinates of all OCRed words like this.

LeftPos = Img.Layout.Words(0).Rects(0).Left
TopPos = Img.Layout.Words(0).Rects(0).Top

, , TopPos , LeftPos. . , , mysql.

SELECT group_concat(word ORDER BY `left` SEPARATOR ' ')
FROM test_copy
GROUP BY `top`

, . , .

DIV 5 , 5 , . node.js, , LeftPos, , .

: js , , Number1 5 , Text2 .

?

+3
1

100%, , "" , , , , ( , ). () . , .

Horizontal projection

, , , - . - , , , . , , 50% `Text1, , , .


SQL, , .

select 
    word.id, word.Top, word.Left, word.Right, word.Bottom 
from 
    word
where 
    (word.Top >= @leftColWordTop and word.Top <= @leftColWordBottom)
    or (word.Bottom >= @leftColWordTop  and word.Bottom <= @leftColWordBottom)

psuedo VB6 .

'assume words is a collection of WordInfo objects with an Id, Top, 
'   Left, Bottom, Right properties filled in, and a LineAnchorWordId 
'   property that has not been set yet.

'get the words in left-to-right order
wordsLeftToRight = SortLeftToRight(words) 

'also get the words in top-to-bottom order
wordsTopToBottom = SortTopToBottom(words) 

'pass through identifying a line "anchor", that being the left-most 
'   word that starts (and defines) a line
for each anchorWord in wordsLeftToRight

    'check if the word has been mapped to aline yet by checking if 
    '   its anchor property has been set yet.  This assumes 0 is not 
    '   a valid id, use -1 instead if needed
    if anchorWord.LineAnchorWordId = 0 then 

        'not locate every word on this line, as bounded by the 
        '   anchorWord.  every word determined to be on this line 
        '   gets its LineAnchorWordId property set to the Id of the 
        '   anchorWord
        for each lineWord in wordsTopToBottom

            if lineWord.Bottom < anchorWord.Top Then

                'skip it,it is above the line (but keep searching down
                '   because we haven't reached the anchorWord location yet)

            else if lineWord.Top > anchorWord.Bottom Then

                'skip it,it is below the line, and exit the search 
                '   early since all the rest will also be below the line
                exit for

            else if OverlapsWithinTolerance(anchorWord, lineWord) then

                lineWord.LineAnchorWordId = anchorWord.Id

            endif

        next

    end if

next anchorWord

'at this point, every word has been assigned a LineAnchorWordId, 
'   and every word on the same line will have a matching LineAnchorWordId
'   value.  If stored in a DB you can now group them by LineAnchorWordId 
' and sort them by their Left coord to get your output.
+4

All Articles