Programmatically reading a Microsoft Word document

My students submit their Microsoft Word assignments to a ColdFusion 10 server. I would like to write an error-checking tool that looks for common errors such as a missing page number on the title page, a missing school name on the title page, a missing student name on the title page, and so on. I specify many APA rules. For example, the phrase "Running head:" should appear in the header of page 1, but not on the rest of the paper. I assign a point value to each rule.

Ideally, this check would run when students submit the assignment and report the results immediately. This may require the use of

parser.parseFromString(str, "text/xml");

Alternatively, I could write a program that I run myself to check for errors, for example with Microsoft Access or Visual Studio, which would help automate my grading. I would rather not do this, because then I would need Visual Studio on the server, and I do not think that will be possible.

The last option is to download all the documents from the server and run the program locally, which is one step better than grading everything manually.
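
For the local option, it may help that a .docx file is a ZIP archive of XML parts, so the header text can be read with a general-purpose ZIP reader and XML parser. Below is a minimal sketch in Python using only the standard library. The function name `read_header_texts` is hypothetical, and the code assumes the conventional `word/header*.xml` part names that Word writes.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the w:p (paragraph) and w:t (text) elements
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_header_texts(docx_file):
    """Return {part name: header text} for every word/header*.xml part."""
    headers = {}
    with zipfile.ZipFile(docx_file) as archive:
        for name in sorted(archive.namelist()):
            if name.startswith("word/header") and name.endswith(".xml"):
                root = ET.fromstring(archive.read(name))
                # Join the text runs of each paragraph, one paragraph per line
                paragraphs = [
                    "".join(t.text or "" for t in p.iter(W_NS + "t"))
                    for p in root.iter(W_NS + "p")
                ]
                headers[name] = "\n".join(paragraphs)
    return headers
```

Calling `read_header_texts("mydoc.docx")` returns one entry per header part. This sketch does not say which part is the first-page header; that mapping comes from the `w:headerReference` elements (with `w:type="first"` or `w:type="default"`) in the section properties of `word/document.xml`, resolved through `word/_rels/document.xml.rels`.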

3 answers

I did this a few years ago using VBA; see this article. Here is an excerpt that loops over each paragraph of a document:

Public Sub ParseLines()
    Dim singleLine As Paragraph
    Dim lineText As String

    For Each singleLine In ActiveDocument.Paragraphs
        lineText = singleLine.Range.Text

        '// parse the text here...

    Next singleLine
End Sub

I'd use Apache POI. For example:

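// Load the .docx with Apache POI (the XWPF classes read .docx, not legacy .doc)
fis = createObject("java", "java.io.FileInputStream").init(expandPath("./mydoc.docx"));
document = createObject("java", "org.apache.poi.xwpf.usermodel.XWPFDocument").init(fis);
fis.close();

// Header text for page 1 and for the remaining pages.
// Either getter may return null if the document defines no such header.
policy = document.getHeaderFooterPolicy();
firstHeader = policy.getFirstPageHeader().getText();
defaultHeader = policy.getDefaultHeader().getText();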

APACHE POI
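
Once the two header strings (`firstHeader` and `defaultHeader` above) are available, the "Running head:" rule from the question comes down to two substring checks. Here is a rough sketch of how such a rule might be scored, written in Python only for illustration; the same checks could be expressed in CFML. The function name `check_running_head` and the default point value are hypothetical.

```python
def check_running_head(first_header, default_header, points=2):
    """Return (points_lost, messages) for the "Running head:" rule.

    The function name and the default point value are hypothetical.
    A missing header (None) is treated as empty text.
    """
    label = "Running head:"
    messages = []
    if label not in (first_header or ""):
        messages.append('Page 1 header is missing "Running head:".')
    if label in (default_header or ""):
        messages.append('"Running head:" must not appear after page 1.')
    # Each violation costs the same number of points
    return points * len(messages), messages
```

Each further APA rule could follow the same shape (take the extracted text, return points lost plus messages), so the per-rule results can be summed into a report for the student.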

Try:

http://docxextractor.riaforge.org/

It extracts all the plain text and some of the formatting.

Disclaimer: I wrote it
