public final class POIXMLExtractorFactory extends java.lang.Object implements ExtractorProvider
Figures out the correct POITextExtractor for your supplied document, and returns it.
Note 1 - This will fail for many file formats if the POI Scratchpad jar is not present on the runtime classpath.
Note 2 - Rather than using this, in most cases you would be better off switching to Apache Tika instead!
Constructor and Description |
---|
POIXMLExtractorFactory() |
Modifier and Type | Method and Description |
---|---|
boolean | accepts(FileMagic fm) |
POITextExtractor | create(DirectoryNode poifsDir, java.lang.String password): Create Extractor from POIFS node. |
POITextExtractor | create(java.io.File f, java.lang.String password): Create Extractor via file. |
POITextExtractor | create(java.io.InputStream inp, java.lang.String password): Create Extractor via InputStream. |
POIXMLTextExtractor | create(OPCPackage pkg): Tries to determine the actual type of file and produces a matching text extractor for it. |
POITextExtractor | create(POIFSFileSystem fs) |
static java.lang.Boolean | getAllThreadsPreferEventExtractors(): Should all threads prefer event-based over usermodel-based extractors? (Usermodel extractors tend to be more accurate, but use more memory.) Default is to use the thread-level setting, which defaults to false. |
static boolean | getPreferEventExtractor(): Should this thread use event-based extractors if available? Checks the all-threads setting first, then the thread-specific one. |
static boolean | getThreadPrefersEventExtractors(): Should this thread prefer event-based over usermodel-based extractors? (Usermodel extractors tend to be more accurate, but use more memory.) Default is false. |
static void | setAllThreadsPreferEventExtractors(java.lang.Boolean preferEventExtractors): Should all threads prefer event-based over usermodel-based extractors? If set, takes precedence over the thread-level setting. |
static void | setThreadPrefersEventExtractors(boolean preferEventExtractors): Should this thread prefer event-based over usermodel-based extractors? Only used if the all-threads setting is null. |
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
identifyEmbeddedResources
public boolean accepts(FileMagic fm)
Specified by: accepts in interface ExtractorProvider
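This page does not say what accepts checks. As background only: a file "magic" is normally derived from the first bytes of a file, and OOXML documents are ZIP containers, so a factory like this one would presumably accept the ZIP-based signature. The sketch below illustrates that kind of sniffing with plain Java. `MagicSniffer`, `Kind`, and `sniff` are hypothetical names for illustration and are not part of POI.

```java
import java.util.Arrays;

// Hypothetical illustration of magic-byte sniffing; not POI's FileMagic.
final class MagicSniffer {
    enum Kind { OOXML, OLE2, UNKNOWN }

    // ZIP local file header "PK\3\4"; OOXML files are ZIP containers,
    // so any ZIP archive matches this signature.
    private static final byte[] ZIP_HEADER = {0x50, 0x4B, 0x03, 0x04};

    // OLE2 compound document header, used by the older binary Office formats.
    private static final byte[] OLE2_HEADER = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Classifies a file by its leading bytes.
    public static Kind sniff(byte[] head) {
        if (startsWith(head, ZIP_HEADER)) {
            return Kind.OOXML;
        }
        if (startsWith(head, OLE2_HEADER)) {
            return Kind.OLE2;
        }
        return Kind.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data != null
            && data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) {
        byte[] zipLike = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00};
        System.out.println(sniff(zipLike));      // OOXML
        System.out.println(sniff(new byte[0]));  // UNKNOWN
    }
}
```

A signature match is only a first filter: the ZIP header says nothing about which Office format is inside.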
public static boolean getThreadPrefersEventExtractors()
public static java.lang.Boolean getAllThreadsPreferEventExtractors()
public static void setThreadPrefersEventExtractors(boolean preferEventExtractors)
public static void setAllThreadsPreferEventExtractors(java.lang.Boolean preferEventExtractors)
public static boolean getPreferEventExtractor()
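The five preference methods above describe a two-level setting: a nullable all-threads value that wins when set, and a per-thread value (default false) that is used otherwise. The sketch below models that precedence rule in plain Java so it can be run without POI. `EventPreferenceModel` and its method names are hypothetical stand-ins, and the use of `ThreadLocal` is an assumption about how a per-thread value could be stored, not a statement about POI's implementation.

```java
// Hypothetical model of the documented precedence rule; not POI code.
final class EventPreferenceModel {
    // null means "not set": fall back to the per-thread value.
    private static volatile Boolean allThreads = null;

    // Per-thread preference, defaulting to false as documented.
    private static final ThreadLocal<Boolean> perThread =
        ThreadLocal.withInitial(() -> Boolean.FALSE);

    public static void setAllThreads(Boolean prefer) {
        allThreads = prefer;
    }

    public static void setThread(boolean prefer) {
        perThread.set(prefer);
    }

    // Checks the all-threads value first, then the thread-specific one.
    public static boolean effective() {
        Boolean global = allThreads;
        return global != null ? global : perThread.get();
    }

    public static void main(String[] args) {
        System.out.println(effective()); // false: nothing set yet
        setThread(true);
        System.out.println(effective()); // true: thread-level value applies
        setAllThreads(Boolean.FALSE);
        System.out.println(effective()); // false: all-threads value wins
        setAllThreads(null);
        System.out.println(effective()); // true: back to the thread-level value
    }
}
```

Passing null to the all-threads setter is what hands control back to the per-thread value, which is why that parameter is a `java.lang.Boolean` rather than a primitive.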
public POITextExtractor create(java.io.File f, java.lang.String password) throws java.io.IOException
Create Extractor via file.
Specified by: create in interface ExtractorProvider
Parameters:
f - the file
password - the password, or null if not encrypted
Throws:
java.io.IOException - if the file can't be read or parsed
public POITextExtractor create(java.io.InputStream inp, java.lang.String password) throws java.io.IOException
Create Extractor via InputStream.
Specified by: create in interface ExtractorProvider
Parameters:
inp - the stream
password - the password, or null if not encrypted
Throws:
java.io.IOException - if the stream can't be read or parsed
public POIXMLTextExtractor create(OPCPackage pkg) throws java.io.IOException
Tries to determine the actual type of file and produces a matching text extractor for it.
Parameters:
pkg - An OPCPackage.
Returns:
A POIXMLTextExtractor for the given file.
Throws:
java.io.IOException - If an error occurs while reading the file.
java.lang.IllegalArgumentException - If no matching file type could be found.
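To illustrate "determine the actual type and produce a matching extractor", here is a minimal sketch that maps the content type of a package's main document part to a document kind and throws IllegalArgumentException when nothing matches. It assumes the decision is keyed on that content type, which this page does not confirm. `OoxmlKindResolver`, `Kind`, and `resolve` are hypothetical names, and the table covers only the three basic OOXML main-part content types.

```java
import java.util.Map;

// Hypothetical dispatch table; not POI's implementation.
final class OoxmlKindResolver {
    enum Kind { WORD, SPREADSHEET, PRESENTATION }

    private static final String PREFIX =
        "application/vnd.openxmlformats-officedocument.";

    // Main-part content types for .docx, .xlsx and .pptx packages.
    private static final Map<String, Kind> KINDS = Map.of(
        PREFIX + "wordprocessingml.document.main+xml", Kind.WORD,
        PREFIX + "spreadsheetml.sheet.main+xml", Kind.SPREADSHEET,
        PREFIX + "presentationml.presentation.main+xml", Kind.PRESENTATION);

    // Returns the matching kind, or throws if no file type matches.
    public static Kind resolve(String contentType) {
        Kind kind = contentType == null ? null : KINDS.get(contentType);
        if (kind == null) {
            throw new IllegalArgumentException(
                "No matching file type for content type: " + contentType);
        }
        return kind;
    }

    public static void main(String[] args) {
        System.out.println(resolve(PREFIX + "spreadsheetml.sheet.main+xml")); // SPREADSHEET
    }
}
```

Failing fast with an unchecked exception keeps "unsupported format" distinct from the java.io.IOException raised for read errors.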
public POITextExtractor create(POIFSFileSystem fs) throws java.io.IOException
Throws:
java.io.IOException
public POITextExtractor create(DirectoryNode poifsDir, java.lang.String password) throws java.io.IOException
Create Extractor from POIFS node.
Specified by: create in interface ExtractorProvider
Parameters:
poifsDir - the node
password - the password, or null if not encrypted
Throws:
java.io.IOException - if the node can't be parsed
Copyright 2021 The Apache Software Foundation or its licensors, as applicable.