public final class WordExtractor extends java.lang.Object implements POIOLE2TextExtractor
Constructor and Description |
---|
WordExtractor(DirectoryNode dir) |
WordExtractor(HWPFDocument doc): Create a new Word Extractor |
WordExtractor(java.io.InputStream is): Create a new Word Extractor |
WordExtractor(POIFSFileSystem fs): Create a new Word Extractor |
Modifier and Type | Method and Description |
---|---|
java.lang.String[] | getCommentsText() |
HWPFDocument | getDocument(): Return the underlying POIDocument |
java.lang.String[] | getEndnoteText() |
HWPFDocument | getFilesystem() |
java.lang.String | getFooterText(): Deprecated since 3.8 beta 4 |
java.lang.String[] | getFootnoteText() |
java.lang.String | getHeaderText(): Deprecated since 3.8 beta 4 |
java.lang.String[] | getMainTextboxText() |
java.lang.String[] | getParagraphText(): Get the text from the Word file, as an array with one String per paragraph |
java.lang.String | getText(): Grab the text, based on the WordToTextConverter. |
java.lang.String | getTextFromPieces(): Grab the text out of the text pieces. |
boolean | isCloseFilesystem() |
void | setCloseFilesystem(boolean doCloseFilesystem) |
static java.lang.String | stripFields(java.lang.String text): Removes any fields (e.g. macros, page markers, etc.) from the string. |
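Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface POIOLE2TextExtractor: getDocSummaryInformation, getMetadataTextExtractor, getRoot, getSummaryInformation
Methods inherited from interface POITextExtractor: close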
public WordExtractor(java.io.InputStream is) throws java.io.IOException
Parameters: is - InputStream containing the Word file
Throws: java.io.IOException
public WordExtractor(POIFSFileSystem fs) throws java.io.IOException
Parameters: fs - POIFSFileSystem containing the Word file
Throws: java.io.IOException
public WordExtractor(DirectoryNode dir) throws java.io.IOException
Throws: java.io.IOException
public WordExtractor(HWPFDocument doc)
Parameters: doc - The HWPFDocument to extract from
public java.lang.String[] getParagraphText()
public java.lang.String[] getFootnoteText()
public java.lang.String[] getMainTextboxText()
public java.lang.String[] getEndnoteText()
public java.lang.String[] getCommentsText()
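The array-returning getters above hand back one String per entry, and those strings may still carry their trailing line terminators. The following is a minimal sketch in plain Java, not part of POI, of tidying such an array before further processing. The class name ParagraphCleanupSketch and the method nonBlankWithoutLineEndings are hypothetical, and the sample array is made up rather than extracted from a real document.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not a POI class. */
final class ParagraphCleanupSketch {

    /** Strips trailing CR/LF characters from each entry and drops entries that are blank afterwards. */
    static List<String> nonBlankWithoutLineEndings(String[] paragraphs) {
        List<String> result = new ArrayList<>();
        for (String paragraph : paragraphs) {
            int end = paragraph.length();
            // Walk back over any trailing carriage returns or line feeds.
            while (end > 0 && (paragraph.charAt(end - 1) == '\r' || paragraph.charAt(end - 1) == '\n')) {
                end--;
            }
            String stripped = paragraph.substring(0, end);
            if (!stripped.trim().isEmpty()) {
                result.add(stripped);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up stand-in for an array such as the one returned by getParagraphText().
        String[] sample = {"First paragraph\r\n", "\r\n", "Second\r"};
        System.out.println(nonBlankWithoutLineEndings(sample)); // prints "[First paragraph, Second]"
    }
}
```

In real use, the sample array would be replaced by the result of one of the getters, for example getParagraphText().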
@Deprecated public java.lang.String getHeaderText()
@Deprecated public java.lang.String getFooterText()
public java.lang.String getTextFromPieces()
public java.lang.String getText()
Specified by: getText in interface POITextExtractor
public static java.lang.String stripFields(java.lang.String text)
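To illustrate what removing fields from a string can involve: in the Word binary format a field is delimited by the control characters U+0013 (field begin), U+0014 (separator between field code and field result) and U+0015 (field end). The sketch below is plain Java written under those assumptions and is not the POI implementation. It drops the markers and the field code and keeps the result text, which may differ from what stripFields returns for nested or malformed fields. FieldStripSketch and stripFieldCodes are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical illustration, not the POI implementation of stripFields. */
final class FieldStripSketch {
    private static final char FIELD_BEGIN = '\u0013';
    private static final char FIELD_SEPARATOR = '\u0014';
    private static final char FIELD_END = '\u0015';

    /** Removes field markers and field codes, keeping text outside fields and field result text. */
    static String stripFieldCodes(String text) {
        StringBuilder out = new StringBuilder(text.length());
        // One entry per open field: true once its separator has been seen.
        Deque<Boolean> pastSeparator = new ArrayDeque<>();
        // Number of open fields that are still in their code part.
        int hiddenDepth = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == FIELD_BEGIN) {
                pastSeparator.push(false);
                hiddenDepth++;
            } else if (c == FIELD_SEPARATOR) {
                // Switch the innermost open field from code to result; stray separators are dropped.
                if (!pastSeparator.isEmpty() && !pastSeparator.peek()) {
                    pastSeparator.pop();
                    pastSeparator.push(true);
                    hiddenDepth--;
                }
            } else if (c == FIELD_END) {
                // Close the innermost field; a field without a separator was hidden until now.
                if (!pastSeparator.isEmpty() && !pastSeparator.pop()) {
                    hiddenDepth--;
                }
            } else if (hiddenDepth == 0) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String withPageField = "Page \u0013PAGE\u00143\u0015 of 10";
        System.out.println(stripFieldCodes(withPageField)); // prints "Page 3 of 10"
    }
}
```

The depth counter means text inside the code part of an enclosing field stays hidden even when it is the result of a nested field.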
public HWPFDocument getDocument()
Description copied from interface: POIOLE2TextExtractor
Return the underlying POIDocument
Specified by: getDocument in interface POIOLE2TextExtractor
Specified by: getDocument in interface POITextExtractor
public void setCloseFilesystem(boolean doCloseFilesystem)
Specified by: setCloseFilesystem in interface POITextExtractor
Parameters: doCloseFilesystem - true (default), if underlying resources/filesystem should be closed on POITextExtractor.close()
public boolean isCloseFilesystem()
Specified by: isCloseFilesystem in interface POITextExtractor
Returns: true, if resources/filesystem should be closed on POITextExtractor.close()
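The close-filesystem flag described above follows a common ownership pattern: the wrapper closes the resource it wraps only while the flag is true. The sketch below shows that pattern in plain Java under that reading of the description. It is not POI code, and CloseFlagSketch, setCloseUnderlying, isCloseUnderlying and underlyingClosedAfterClose are hypothetical names.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical wrapper, not a POI class: closes the wrapped resource only while the flag is true. */
final class CloseFlagSketch implements Closeable {
    private final Closeable underlying;
    // Mirrors the documented default of true.
    private boolean closeUnderlying = true;

    CloseFlagSketch(Closeable underlying) {
        this.underlying = underlying;
    }

    void setCloseUnderlying(boolean closeUnderlying) {
        this.closeUnderlying = closeUnderlying;
    }

    boolean isCloseUnderlying() {
        return closeUnderlying;
    }

    @Override
    public void close() throws IOException {
        if (closeUnderlying) {
            underlying.close();
        }
    }

    /** Closes a wrapper configured with the given flag and reports whether the wrapped resource was closed. */
    static boolean underlyingClosedAfterClose(boolean flag) {
        boolean[] closed = {false};
        CloseFlagSketch wrapper = new CloseFlagSketch(() -> closed[0] = true);
        wrapper.setCloseUnderlying(flag);
        try {
            wrapper.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return closed[0];
    }

    public static void main(String[] args) {
        System.out.println(underlyingClosedAfterClose(true));  // prints "true"
        System.out.println(underlyingClosedAfterClose(false)); // prints "false"
    }
}
```

Setting the flag to false suits callers that opened the underlying resource themselves and want to keep using it after the wrapper is closed.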
public HWPFDocument getFilesystem()
Specified by: getFilesystem in interface POITextExtractor
Copyright 2021 The Apache Software Foundation or its licensors, as applicable.