Read a Word document in PHP

I am working on a project and I am stuck on reading Word documents.

Word file content.

This is a test word file in PHP.

Thank you.

PHP code.

    $myFile = "wordfile.docx";
    $fh = fopen($myFile, 'r');
    $theData = fread($fh, 1000);
    fclose($fh);
    echo $theData;

output:

PK!éQ°Â[Content_Types].xml ¢( ´"MOÂ@†ï&þ‡f¯¦]ð`Œ¡pP<*‰Ïëv
 «Ýì,_ÿÞiI¡(ziÒNß÷}fÚÞ`©‹h•5)ë&‘6Sf’²×ñc|Ë"Âd¢°R¶dƒþåEo
 ¼r€© ¦l‚»ãå´ÀÄ:0TÉ­×"ЭŸp'䧘¿îtn¸´&€  q(=X¿÷¹˜!.éñ
 š„ä,º_¿WF¥L8W()ò²Êu <"œ›l.Þ%¤¬Ìqª^Nøp0ÙKPºl­*Õ3Ó
 «¢‘ðáIhbçë3žY9ÓÔwr¼¹F›çJB­/Ýœ·é;é"©+Z(³e?ÈaUþ=ÅÚ÷Ä
 ø7¦Ã<I?Hû<4ÆeÓÉ:bGÛž!ÐN    ùþÛÆmCÇs+ÂÞ_þbǼ$§ó4ïœ
 0ñ£¶n…´#€W×îٕͱH:#oÒÎñ¿h{»JuLGÎ êõÐtÄêDZXg÷åFÌ kÈæÕîÿÿPK
 !ÇÂ'¼ß_rel

Is there any way to read a Word document in PHP?

+5
5 answers

For docx files, use this function:

    function read_docx($filename) {
        if (!$filename || !file_exists($filename)) {
            return false;
        }

        // ZipArchive is used here because the procedural zip_* functions
        // are deprecated as of PHP 8.0.
        $zip = new ZipArchive();
        if ($zip->open($filename) !== true) {
            return false;
        }

        // The main body of a docx is stored in word/document.xml.
        $content = $zip->getFromName('word/document.xml');
        $zip->close();
        if ($content === false) {
            return false;
        }

        // Cell boundaries become spaces, paragraph ends become line breaks.
        $content = str_replace('</w:r></w:p></w:tc><w:tc>', ' ', $content);
        $content = str_replace('</w:r></w:p>', "\r\n", $content);

        // Strip the remaining XML tags, leaving only the text.
        return strip_tags($content);
    }

It returns the text of the docx file as a string, or false if the file cannot be read.

+10

"PHPWord is a library written in pure PHP that provides a set of classes for writing and reading from various document file formats." (PHPOffice, 2016)

This open-source PHP library should solve your problem. You can download it or install it with Composer:

https://github.com/PHPOffice/PHPWord
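
A rough sketch of what reading a file with PHPWord might look like, assuming the package was installed with `composer require phpoffice/phpword` and that `vendor/autoload.php` has already been required. The helper names `phpword_collect_text` and `phpword_extract_text` are made up for this example, and tables are not walked here.

```php
<?php
// Hypothetical helpers, not part of PHPWord itself.
// Assumes Composer's autoloader has already been loaded.

// Recursively gather text from an element and anything nested inside it.
function phpword_collect_text($element): string
{
    // Containers such as text runs expose their children via getElements().
    if (method_exists($element, 'getElements')) {
        $text = '';
        foreach ($element->getElements() as $child) {
            $text .= phpword_collect_text($child);
        }
        return $text;
    }
    // Text-like elements expose their content via getText(),
    // which may return a string or another element.
    if (method_exists($element, 'getText')) {
        $value = $element->getText();
        if (is_string($value)) {
            return $value;
        }
        return is_object($value) ? phpword_collect_text($value) : '';
    }
    return '';
}

// Load a document and return its text, one line per top-level element.
function phpword_extract_text(string $path): string
{
    $phpWord = \PhpOffice\PhpWord\IOFactory::load($path);
    $lines = [];
    foreach ($phpWord->getSections() as $section) {
        foreach ($section->getElements() as $element) {
            $lines[] = phpword_collect_text($element);
        }
    }
    return implode("\n", $lines);
}
```

With the autoloader in place, `echo phpword_extract_text('wordfile.docx');` would be the call for the file from the question.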

+3
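
There are two formats to consider: "docx" and "doc". Docx is a zip file containing XML (see Wikipedia). Doc is a binary format.

There are PHP libraries for docx (for example Phpdocx). Since a docx is just a zip of XML files, you can also handle it yourself: use ZipArchive to open the docx, then DOMDocument, SimpleXML, XMLReader, or XSLTProcessor to process the XML.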
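
A minimal sketch of the ZipArchive plus DOMDocument route, assuming the zip and dom extensions are enabled. The function name `docx_paragraphs` is made up for this example. It only reads the main body in `word/document.xml` and ignores headers, footers, and footnotes.

```php
<?php
// Hypothetical helper: returns one array item per paragraph, or false on failure.
function docx_paragraphs(string $path)
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        return false;
    }
    // The main document body is stored under this entry name.
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        return false;
    }

    $dom = new DOMDocument();
    if (!@$dom->loadXML($xml)) {
        return false;
    }

    // WordprocessingML elements live in this namespace, usually prefixed "w".
    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace(
        'w',
        'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
    );

    $paragraphs = [];
    // w:p is a paragraph; w:t holds the text of each run inside it.
    foreach ($xpath->query('//w:p') as $p) {
        $text = '';
        foreach ($xpath->query('.//w:t', $p) as $t) {
            $text .= $t->textContent;
        }
        $paragraphs[] = $text;
    }
    return $paragraphs;
}
```

Walking the XML this way keeps paragraph boundaries without relying on string replacement, at the cost of a little more code.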

+2

A Word document is not stored as plain text (it is closer to an XML or binary file), so you can't just echo the raw file contents and expect to see the readable part of the docx file.
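
To see this for yourself, here is a small sketch that checks for the zip signature (the "PK" at the start of the output in the question) and lists what is packed inside. The helper names `looks_like_zip` and `list_zip_entries` are made up for this example, and the zip extension is assumed to be enabled.

```php
<?php
// Hypothetical helper: true when the file starts with the zip
// local file header signature "PK\x03\x04".
function looks_like_zip(string $path): bool
{
    $fh = @fopen($path, 'rb');
    if ($fh === false) {
        return false;
    }
    $magic = fread($fh, 4);
    fclose($fh);
    return $magic === "PK\x03\x04";
}

// Hypothetical helper: names of the files packed inside the archive,
// or an empty array if it cannot be opened as a zip.
function list_zip_entries(string $path): array
{
    $names = [];
    $zip = new ZipArchive();
    if ($zip->open($path) === true) {
        for ($i = 0; $i < $zip->numFiles; $i++) {
            $names[] = $zip->getNameIndex($i);
        }
        $zip->close();
    }
    return $names;
}
```

For a docx such as the `wordfile.docx` from the question, the list should include `[Content_Types].xml`, which is the name visible near the start of the garbled output.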

There is a library that can do what you want, but it requires a doc file:

Docvert

+1
