public final class ExtractorFactory
extends java.lang.Object
Figures out the correct POIOLE2TextExtractor for your supplied document, and returns it.
Note 1 - will fail for many file formats if the POI Scratchpad jar is not present on the runtime classpath.
Note 2 - for text extractor creation across all formats, use POIXMLExtractorFactory contained within the OOXML jar.
Note 3 - rather than using this, for most cases you would be better off switching to Apache Tika instead!

Modifier and Type | Field and Description |
---|---|
static java.lang.String | OOXML_PACKAGE - Some OPCPackages are packed inside an OLE2 container. |
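
Before any extractor can be chosen, the factory has to recognise what kind of container it was handed: an OLE2 (Compound File Binary) file or a ZIP-based OOXML package. The sketch below illustrates that first step in plain Java by checking the well-known header signatures. `ContainerSniffer` and its `Kind` enum are hypothetical names; POI performs its own detection internally and does not use this code. Note that a ZIP signature only says "this is a ZIP archive", not that it is a valid OOXML package.

```java
import java.util.Arrays;

/** Hypothetical sketch: classify a document by its leading "magic" bytes. */
final class ContainerSniffer {

    /** Container kinds this sketch can tell apart. */
    enum Kind { OLE2, ZIP, UNKNOWN }

    // OLE2 / Compound File Binary header signature (8 bytes).
    private static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header "PK\3\4"; OOXML packages are ZIP archives.
    private static final byte[] ZIP_SIGNATURE = { 0x50, 0x4B, 0x03, 0x04 };

    /** Classifies a document from its first bytes (eight are enough). */
    static Kind sniff(byte[] firstBytes) {
        if (startsWith(firstBytes, OLE2_SIGNATURE)) {
            return Kind.OLE2;
        }
        if (startsWith(firstBytes, ZIP_SIGNATURE)) {
            return Kind.ZIP;
        }
        return Kind.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) {
        byte[] ole2Header = Arrays.copyOf(OLE2_SIGNATURE, 512); // signature + zero padding
        byte[] zipHeader = { 0x50, 0x4B, 0x03, 0x04, 0x14, 0x00 };
        System.out.println(sniff(ole2Header)); // OLE2
        System.out.println(sniff(zipHeader));  // ZIP
        System.out.println(sniff(new byte[] { 0x25, 0x50, 0x44, 0x46 })); // UNKNOWN ("%PDF")
    }
}
```

In a real pipeline the first bytes would be read from the file or stream before deciding which extractor family to try.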

Modifier and Type | Method and Description |
---|---|
static void | addProvider(ExtractorProvider provider) |
static POITextExtractor | createExtractor(DirectoryNode root) - Create the Extractor, if possible. |
static POITextExtractor | createExtractor(DirectoryNode root, java.lang.String password) |
static POITextExtractor | createExtractor(java.io.File file) |
static POITextExtractor | createExtractor(java.io.File file, java.lang.String password) |
static POITextExtractor | createExtractor(java.io.InputStream input) |
static POITextExtractor | createExtractor(java.io.InputStream input, java.lang.String password) |
static POITextExtractor | createExtractor(POIFSFileSystem fs) |
static POITextExtractor | createExtractor(POIFSFileSystem fs, java.lang.String password) |
static java.lang.Boolean | getAllThreadsPreferEventExtractors() - Should all threads prefer event-based over usermodel-based extractors? (Usermodel extractors tend to be more accurate, but use more memory.) Default is to use the thread-level setting, which defaults to false. |
static POITextExtractor[] | getEmbeddedDocsTextExtractors(POIOLE2TextExtractor ext) - Returns an array of text extractors, one for each of the embedded documents in the file (if there are any). |
static boolean | getPreferEventExtractor() - Should this thread use event-based extractors, if available? Checks the all-threads setting first, then the thread-specific one. |
static boolean | getThreadPrefersEventExtractors() - Should this thread prefer event-based over usermodel-based extractors? (Usermodel extractors tend to be more accurate, but use more memory.) Default is false. |
static void | removeProvider(java.lang.Class<? extends ExtractorProvider> provider) |
static void | setAllThreadsPreferEventExtractors(java.lang.Boolean preferEventExtractors) - Should all threads prefer event-based over usermodel-based extractors? If set, this takes precedence over the thread-level setting. |
static void | setThreadPrefersEventExtractors(boolean preferEventExtractors) - Should this thread prefer event-based over usermodel-based extractors? Only used if the all-threads setting is null. |

public static final java.lang.String OOXML_PACKAGE
Some OPCPackages are packed inside an OLE2 container. If encrypted, the DirectoryNode is called "EncryptedPackage"; otherwise the node is called "Package".
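The `OOXML_PACKAGE` naming rule can be expressed as a lookup over the entry names at the root of the OLE2 container. This is a stdlib-only model: `WrappedPackageLocator` is hypothetical, and a plain `Set<String>` stands in for the real `DirectoryNode` (with POI you would presumably query the directory with `hasEntry(...)` instead). Checking the encrypted name before the plain one is an assumption of this sketch.

```java
import java.util.Optional;
import java.util.Set;

/** Hypothetical model of the OOXML_PACKAGE naming rule for OPC packages wrapped in OLE2. */
final class WrappedPackageLocator {

    // Entry names taken from the OOXML_PACKAGE documentation.
    static final String ENCRYPTED_ENTRY = "EncryptedPackage";
    static final String PLAIN_ENTRY = "Package";

    /**
     * Returns the name of the root entry holding the wrapped OPC package,
     * or empty if the container does not appear to wrap one.
     */
    static Optional<String> findPackageEntry(Set<String> rootEntryNames) {
        // Assumption of this sketch: the encrypted form is checked first.
        if (rootEntryNames.contains(ENCRYPTED_ENTRY)) {
            return Optional.of(ENCRYPTED_ENTRY);
        }
        if (rootEntryNames.contains(PLAIN_ENTRY)) {
            return Optional.of(PLAIN_ENTRY);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(findPackageEntry(Set.of("EncryptionInfo", "EncryptedPackage"))); // Optional[EncryptedPackage]
        System.out.println(findPackageEntry(Set.of("Package")));  // Optional[Package]
        System.out.println(findPackageEntry(Set.of("Workbook"))); // Optional.empty
    }
}
```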
public static boolean getThreadPrefersEventExtractors()
public static java.lang.Boolean getAllThreadsPreferEventExtractors()
public static void setThreadPrefersEventExtractors(boolean preferEventExtractors)
Parameters:
preferEventExtractors - whether this thread should prefer event-based extractors.
public static void setAllThreadsPreferEventExtractors(java.lang.Boolean preferEventExtractors)
Parameters:
preferEventExtractors - whether all threads should prefer event-based extractors.
public static boolean getPreferEventExtractor()
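The documented lookup order (the all-threads setting first, the thread-level setting only when the all-threads value is null) can be modelled with a `volatile Boolean` and a `ThreadLocal`. `EventPreference` is a hypothetical class that reuses the method names above for readability; it is not POI's implementation, and the choice of `ThreadLocal` and `volatile` is an assumption about how per-thread and global settings would typically be stored.

```java
/** Hypothetical model of the documented event-vs-usermodel preference lookup. */
final class EventPreference {

    // Thread-level preference; the documented default is false.
    private static final ThreadLocal<Boolean> THREAD_PREFERS =
        ThreadLocal.withInitial(() -> Boolean.FALSE);

    // All-threads preference; null means "defer to the thread-level setting".
    private static volatile Boolean allThreadsPrefer = null;

    static void setThreadPrefersEventExtractors(boolean prefer) {
        THREAD_PREFERS.set(prefer);
    }

    static void setAllThreadsPreferEventExtractors(Boolean prefer) {
        allThreadsPrefer = prefer;
    }

    /** All-threads setting first; thread-level setting only if that is null. */
    static boolean getPreferEventExtractor() {
        Boolean all = allThreadsPrefer; // read the volatile field once
        return all != null ? all : THREAD_PREFERS.get();
    }

    public static void main(String[] args) {
        setThreadPrefersEventExtractors(true);
        System.out.println(getPreferEventExtractor()); // true (thread setting)
        setAllThreadsPreferEventExtractors(Boolean.FALSE);
        System.out.println(getPreferEventExtractor()); // false (all-threads override)
        setAllThreadsPreferEventExtractors(null);
        System.out.println(getPreferEventExtractor()); // true (back to thread setting)
    }
}
```

Because the thread-level value lives in a `ThreadLocal`, setting it in one thread leaves other threads untouched, whereas a non-null all-threads value overrides every thread at once.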
public static POITextExtractor createExtractor(POIFSFileSystem fs) throws java.io.IOException
Throws:
java.io.IOException
public static POITextExtractor createExtractor(POIFSFileSystem fs, java.lang.String password) throws java.io.IOException
Throws:
java.io.IOException
public static POITextExtractor createExtractor(java.io.InputStream input) throws java.io.IOException
Throws:
java.io.IOException
public static POITextExtractor createExtractor(java.io.InputStream input, java.lang.String password) throws java.io.IOException
Throws:
java.io.IOException
public static POITextExtractor createExtractor(java.io.File file) throws java.io.IOException
Throws:
java.io.IOException
public static POITextExtractor createExtractor(java.io.File file, java.lang.String password) throws java.io.IOException
Throws:
java.io.IOException
public static POITextExtractor createExtractor(DirectoryNode root) throws java.io.IOException
Create the Extractor, if possible. Generally needs the Scratchpad jar. Note that this won't check for embedded OOXML resources either; use POIXMLExtractorFactory for that.
Parameters:
root - The DirectoryNode pointing to a document.
Returns:
The resulting POITextExtractor; an exception is thrown if no TextExtractor can be created for some reason.
Throws:
java.io.IOException - If converting the DirectoryNode into an HSSFWorkbook fails.
OldFileFormatException - If the DirectoryNode points to a format of an unsupported version of Excel.
java.lang.IllegalArgumentException - If creating the Extractor fails.
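One plausible way a factory with pluggable providers ends up either returning an extractor or throwing `IllegalArgumentException` is first-match dispatch: ask each provider in turn and return the first result. The generic sketch below assumes that model. `FirstMatchDispatcher` is hypothetical, the `Function<I, Optional<O>>` values stand in for `ExtractorProvider`, and nothing here claims that POI orders or queries its providers exactly this way.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical first-match dispatch: the first provider that accepts the input wins. */
final class FirstMatchDispatcher<I, O> {

    // Each "provider" either produces a result or declines with Optional.empty().
    private final List<Function<I, Optional<O>>> providers;

    FirstMatchDispatcher(List<Function<I, Optional<O>>> providers) {
        this.providers = List.copyOf(providers); // defensive, immutable copy
    }

    /** Returns the first non-empty result, or throws if every provider declines. */
    O create(I input) {
        for (Function<I, Optional<O>> provider : providers) {
            Optional<O> result = provider.apply(input);
            if (result.isPresent()) {
                return result.get();
            }
        }
        // Mirrors the documented failure mode: IllegalArgumentException when nothing fits.
        throw new IllegalArgumentException("No provider could handle: " + input);
    }

    public static void main(String[] args) {
        FirstMatchDispatcher<String, String> dispatcher = new FirstMatchDispatcher<String, String>(List.of(
            name -> name.endsWith(".xls") ? Optional.of("spreadsheet extractor") : Optional.empty(),
            name -> name.endsWith(".doc") ? Optional.of("word-processing extractor") : Optional.empty()));
        System.out.println(dispatcher.create("report.doc")); // word-processing extractor
        try {
            dispatcher.create("slides.ppt");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // No provider could handle: slides.ppt
        }
    }
}
```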
public static POITextExtractor createExtractor(DirectoryNode root, java.lang.String password) throws java.io.IOException
Throws:
java.io.IOException
public static POITextExtractor[] getEmbeddedDocsTextExtractors(POIOLE2TextExtractor ext) throws java.io.IOException
Returns an array of text extractors, one for each of the embedded documents in the file (if there are any). If there are no embedded documents, you'll get back an empty array. Otherwise, you'll get one open POITextExtractor for each embedded file.
Parameters:
ext - The extractor to look at for embedded documents.
Throws:
java.io.IOException - If converting the DirectoryNode into an HSSFWorkbook fails.
OldFileFormatException - If the DirectoryNode points to a format of an unsupported version of Excel.
java.lang.IllegalArgumentException - If creating the Extractor fails.
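Because each element returned by `getEmbeddedDocsTextExtractors` is an open extractor, the caller is responsible for closing all of them, including when one `close()` call fails. The helper below, `CloseAll` (hypothetical), closes every element of an `AutoCloseable[]` and attaches later failures as suppressed exceptions. It assumes `POITextExtractor` is `Closeable`, as it is in recent POI releases, so that a `POITextExtractor[]` can be passed in directly through array covariance.

```java
/** Hypothetical helper: close every resource in an array, even if some close() calls fail. */
final class CloseAll {

    /**
     * Closes each non-null element in order. The first failure is rethrown after
     * all elements have been attempted; later failures are attached as suppressed.
     */
    static void closeAll(AutoCloseable[] resources) throws Exception {
        Exception first = null;
        for (AutoCloseable resource : resources) {
            if (resource == null) {
                continue;
            }
            try {
                resource.close();
            } catch (Exception e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }

    public static void main(String[] args) throws Exception {
        StringBuilder log = new StringBuilder();
        AutoCloseable[] resources = {
            () -> log.append("a "),
            () -> { throw new IllegalStateException("close failed"); },
            () -> log.append("c ")
        };
        try {
            closeAll(resources);
        } catch (IllegalStateException e) {
            System.out.println(log + "| " + e.getMessage()); // a c | close failed
        }
    }
}
```

A caller might keep the returned array in a local variable, read the text from each element, and then call `CloseAll.closeAll(...)` on that array in a `finally` block.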
public static void addProvider(ExtractorProvider provider)
public static void removeProvider(java.lang.Class<? extends ExtractorProvider> provider)
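`addProvider` takes a provider instance while `removeProvider` takes a provider class, which suggests a registry that is searched by type on removal. The sketch below, `ProviderRegistry` (hypothetical), models that with a `CopyOnWriteArrayList` so that lookups can iterate safely while another thread registers a provider. It assumes `removeProvider` drops providers whose runtime class is exactly the given class; whether POI also removes subclasses is not stated here.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical registry modelling addProvider / removeProvider. */
final class ProviderRegistry<P> {

    // Copy-on-write: iteration sees a stable snapshot even if another thread registers a provider.
    private final List<P> providers = new CopyOnWriteArrayList<>();

    void addProvider(P provider) {
        providers.add(provider);
    }

    /** Removes every provider whose runtime class is exactly {@code type} (an assumption). */
    void removeProvider(Class<? extends P> type) {
        providers.removeIf(p -> p.getClass() == type);
    }

    /** Immutable copy of the current registrations, in insertion order. */
    List<P> snapshot() {
        return List.copyOf(providers);
    }

    public static void main(String[] args) {
        ProviderRegistry<Object> registry = new ProviderRegistry<>();
        registry.addProvider("text provider");
        registry.addProvider(42);
        registry.removeProvider(String.class);
        System.out.println(registry.snapshot()); // [42]
    }
}
```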
Copyright 2021 The Apache Software Foundation or its licensors, as applicable.