@Beta public class WordToTextConverter extends AbstractWordConverter
Fields inherited from class AbstractWordConverter: UNICODECHAR_NO_BREAK_SPACE, UNICODECHAR_NONBREAKING_HYPHEN, UNICODECHAR_ZERO_WIDTH_SPACE
Constructor and Description |
---|
WordToTextConverter(): Creates a new instance of WordToTextConverter. |
WordToTextConverter(org.w3c.dom.Document document): Creates a new instance of WordToTextConverter. |
WordToTextConverter(TextDocumentFacade textDocumentFacade) |
Modifier and Type | Method and Description |
---|---|
protected void | afterProcess(): Special actions that need to be called after processing is complete, such as updating stylesheets or building the document notes list. |
org.w3c.dom.Document | getDocument() |
java.lang.String | getText() |
static java.lang.String | getText(DirectoryNode root) |
static java.lang.String | getText(java.io.File docFile) |
static java.lang.String | getText(HWPFDocumentCore wordDocument) |
boolean | isOutputSummaryInformation() |
static void | main(java.lang.String[] args): Java main() interface to interact with WordToTextConverter. |
protected void | outputCharacters(org.w3c.dom.Element block, CharacterRun characterRun, java.lang.String text) |
protected void | processBookmarks(HWPFDocumentCore wordDocument, org.w3c.dom.Element currentBlock, Range range, int currentTableLevel, java.util.List<Bookmark> rangeBookmarks): Wraps the range into bookmark(s) and processes it. |
protected void | processDocumentInformation(SummaryInformation summaryInformation) |
void | processDocumentPart(HWPFDocumentCore wordDocument, Range range) |
protected void | processDrawnObject(HWPFDocument doc, CharacterRun characterRun, OfficeDrawing officeDrawing, java.lang.String path, org.w3c.dom.Element block) |
protected void | processEndnoteAutonumbered(HWPFDocument wordDocument, int noteIndex, org.w3c.dom.Element block, Range endnoteTextRange) |
protected void | processFootnoteAutonumbered(HWPFDocument wordDocument, int noteIndex, org.w3c.dom.Element block, Range footnoteTextRange) |
protected void | processHyperlink(HWPFDocumentCore wordDocument, org.w3c.dom.Element currentBlock, Range textRange, int currentTableLevel, java.lang.String hyperlink) |
protected void | processImage(org.w3c.dom.Element currentBlock, boolean inlined, Picture picture) |
protected void | processImage(org.w3c.dom.Element currentBlock, boolean inlined, Picture picture, java.lang.String url) |
protected void | processImageWithoutPicturesManager(org.w3c.dom.Element currentBlock, boolean inlined, Picture picture) |
protected void | processLineBreak(org.w3c.dom.Element block, CharacterRun characterRun) |
protected boolean | processOle2(HWPFDocument wordDocument, org.w3c.dom.Element block, Entry entry) |
protected void | processPageBreak(HWPFDocumentCore wordDocument, org.w3c.dom.Element flow) |
protected void | processPageref(HWPFDocumentCore wordDocument, org.w3c.dom.Element currentBlock, Range textRange, int currentTableLevel, java.lang.String pageref) |
protected void | processParagraph(HWPFDocumentCore wordDocument, org.w3c.dom.Element parentElement, int currentTableLevel, Paragraph paragraph, java.lang.String bulletText) |
protected void | processSection(HWPFDocumentCore wordDocument, Section section, int s) |
protected void | processTable(HWPFDocumentCore wordDocument, org.w3c.dom.Element flow, Table table) |
void | setOutputSummaryInformation(boolean outputDocumentInformation) |
Methods inherited from class AbstractWordConverter: getCharacterRunTriplet, getFontReplacer, getNumberColumnsSpanned, getNumberRowsSpanned, getPicturesManager, processCharacters, processDeadField, processDocument, processDrawnObject, processDropDownList, processField, processNoteAnchor, processParagraphes, processSingleSection, processSymbol, setFontReplacer, setPicturesManager, tryDeadField
public WordToTextConverter() throws javax.xml.parsers.ParserConfigurationException
Creates a new instance of WordToTextConverter. Can be used to output several HWPFDocuments into a single text document.
Throws: javax.xml.parsers.ParserConfigurationException - if an internal DocumentBuilder cannot be created
public WordToTextConverter(org.w3c.dom.Document document)
Creates a new instance of WordToTextConverter. Can be used to output several HWPFDocuments into a single text document.
Parameters: document - XML DOM Document used as storage for text pieces
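The document passed to this constructor is an ordinary XML DOM Document. As a minimal sketch, assuming the standard JAXP API is used to create it, an empty storage document could be built as follows. The class name `DomStorageSketch` and the helper `newStorageDocument` are hypothetical names chosen for illustration, and the call into WordToTextConverter is shown only as a comment because it requires Apache POI on the classpath.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

public class DomStorageSketch {

    // Hypothetical helper: builds an empty DOM document that could serve
    // as storage for the text pieces produced by a converter.
    static Document newStorageDocument() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // newDocumentBuilder() is the call that may throw ParserConfigurationException.
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.newDocument();
    }

    public static void main(String[] args) throws Exception {
        Document storage = newStorageDocument();

        // Assumption: with Apache POI available, the document would be handed
        // to the converter like this (not compiled here):
        // WordToTextConverter converter = new WordToTextConverter(storage);

        // A freshly created document has no root element yet.
        System.out.println("Has root element: " + (storage.getDocumentElement() != null));
    }
}
```

Supplying the document explicitly, rather than using the no-argument constructor, leaves the caller in control of how the underlying DocumentBuilder is configured and where a ParserConfigurationException is handled.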
public WordToTextConverter(TextDocumentFacade textDocumentFacade)
public static java.lang.String getText(DirectoryNode root) throws java.lang.Exception
Throws: java.lang.Exception
public static java.lang.String getText(java.io.File docFile) throws java.lang.Exception
Throws: java.lang.Exception
public static java.lang.String getText(HWPFDocumentCore wordDocument) throws java.lang.Exception
Throws: java.lang.Exception
public static void main(java.lang.String[] args) throws java.lang.Exception
Java main() interface to interact with WordToTextConverter.
Usage: WordToTextConverter infile outfile
Where infile is an input .doc file (Word 95-2007) which will be rendered as plain text into outfile.
Throws: java.lang.Exception
protected void afterProcess()
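Description copied from class: AbstractWordConverter
Special actions that need to be called after processing is complete, such as updating stylesheets or building the document notes list.
afterProcess in class AbstractWordConverter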
public org.w3c.dom.Document getDocument()
getDocument in class AbstractWordConverter
public java.lang.String getText() throws java.lang.Exception
Throws: java.lang.Exception
public boolean isOutputSummaryInformation()
protected void outputCharacters(org.w3c.dom.Element block, CharacterRun characterRun, java.lang.String text)
outputCharacters in class AbstractWordConverter
protected void processBookmarks(HWPFDocumentCore wordDocument, org.w3c.dom.Element currentBlock, Range range, int currentTableLevel, java.util.List<Bookmark> rangeBookmarks)
Description copied from class: AbstractWordConverter
Wraps the range into bookmark(s) and processes it.
processBookmarks in class AbstractWordConverter
protected void processDocumentInformation(SummaryInformation summaryInformation)
processDocumentInformation in class AbstractWordConverter
public void processDocumentPart(HWPFDocumentCore wordDocument, Range range)
processDocumentPart in class AbstractWordConverter
protected void processDrawnObject(HWPFDocument doc, CharacterRun characterRun, OfficeDrawing officeDrawing, java.lang.String path, org.w3c.dom.Element block)
processDrawnObject in class AbstractWordConverter
protected void processEndnoteAutonumbered(HWPFDocument wordDocument, int noteIndex, org.w3c.dom.Element block, Range endnoteTextRange)
processEndnoteAutonumbered in class AbstractWordConverter
protected void processFootnoteAutonumbered(HWPFDocument wordDocument, int noteIndex, org.w3c.dom.Element block, Range footnoteTextRange)
processFootnoteAutonumbered in class AbstractWordConverter
protected void processHyperlink(HWPFDocumentCore wordDocument, org.w3c.dom.Element currentBlock, Range textRange, int currentTableLevel, java.lang.String hyperlink)
processHyperlink in class AbstractWordConverter
protected void processImage(org.w3c.dom.Element currentBlock, boolean inlined, Picture picture)
processImage in class AbstractWordConverter
protected void processImage(org.w3c.dom.Element currentBlock, boolean inlined, Picture picture, java.lang.String url)
processImage in class AbstractWordConverter
protected void processImageWithoutPicturesManager(org.w3c.dom.Element currentBlock, boolean inlined, Picture picture)
processImageWithoutPicturesManager in class AbstractWordConverter
protected void processLineBreak(org.w3c.dom.Element block, CharacterRun characterRun)
processLineBreak in class AbstractWordConverter
protected boolean processOle2(HWPFDocument wordDocument, org.w3c.dom.Element block, Entry entry) throws java.lang.Exception
processOle2 in class AbstractWordConverter
Throws: java.lang.Exception
protected void processPageBreak(HWPFDocumentCore wordDocument, org.w3c.dom.Element flow)
processPageBreak in class AbstractWordConverter
protected void processPageref(HWPFDocumentCore wordDocument, org.w3c.dom.Element currentBlock, Range textRange, int currentTableLevel, java.lang.String pageref)
processPageref in class AbstractWordConverter
protected void processParagraph(HWPFDocumentCore wordDocument, org.w3c.dom.Element parentElement, int currentTableLevel, Paragraph paragraph, java.lang.String bulletText)
processParagraph in class AbstractWordConverter
protected void processSection(HWPFDocumentCore wordDocument, Section section, int s)
processSection in class AbstractWordConverter
protected void processTable(HWPFDocumentCore wordDocument, org.w3c.dom.Element flow, Table table)
processTable in class AbstractWordConverter
public void setOutputSummaryInformation(boolean outputDocumentInformation)
Copyright 2021 The Apache Software Foundation or its licensors, as applicable.