StringUtil (POI API Documentation)

java.lang.Object
- org.apache.poi.util.StringUtil

```
@Internal
public final class StringUtil
extends java.lang.Object
```
Collection of string handling utilities.

Field Summary

Fields
Modifier and Type	Field and Description
`static java.nio.charset.Charset`	`UTF16LE`
`static java.nio.charset.Charset`	`UTF8`
`static java.nio.charset.Charset`	`WIN_1252`

Method Summary

Modifier and Type	Method and Description
`static int`	`countMatches(java.lang.CharSequence haystack, char needle)` Counts the number of occurrences of needle in haystack. Has the same signature as org.apache.commons.lang3.StringUtils#countMatches.
`static boolean`	`endsWithIgnoreCase(java.lang.String haystack, java.lang.String suffix)` Tests if the string ends with the specified suffix, ignoring case.
`static int`	`getEncodedSize(java.lang.String value)`
`static java.lang.String`	`getFromCompressedUnicode(byte[] string, int offset, int len)` Reads 8-bit data (in the ISO-8859-1 codepage) into a (Unicode) Java String and returns it.
`static java.lang.String`	`getFromUnicodeLE(byte[] string)` Given a byte array of 16-bit Unicode characters in little-endian format (most significant byte last), returns a Java String representation of it.
`static java.lang.String`	`getFromUnicodeLE(byte[] string, int offset, int len)` Given a byte array of 16-bit Unicode characters in little-endian format (most significant byte last), returns a Java String representation of it.
`static java.lang.String`	`getFromUnicodeLE0Terminated(byte[] string, int offset, int len)` Given a byte array of 16-bit Unicode characters in little-endian format (most significant byte last), returns a Java String representation of it, stopping at the first 0-terminator.
`static java.lang.String`	`getPreferredEncoding()`
`static byte[]`	`getToUnicodeLE(java.lang.String string)` Converts a String to 16-bit Unicode characters in little-endian format.
`static boolean`	`hasMultibyte(java.lang.String value)` Checks whether the parameter contains a multibyte character.
`static boolean`	`isUpperCase(char c)`
`static java.lang.String`	`join(java.lang.Object[] array)`
`static java.lang.String`	`join(java.lang.Object[] array, java.lang.String separator)`
`static java.lang.String`	`join(java.lang.String separator, java.lang.Object... array)`
`static java.lang.String`	`mapMsCodepointString(java.lang.String string)` Some strings may contain encoded characters of the Unicode Private Use Area.
`static void`	`putCompressedUnicode(java.lang.String input, byte[] output, int offset)` Takes a Unicode (Java) string and writes it as 8-bit data (in the ISO-8859-1 codepage).
`static void`	`putCompressedUnicode(java.lang.String input, LittleEndianOutput out)`
`static void`	`putUnicodeLE(java.lang.String input, byte[] output, int offset)` Takes a Unicode string and writes it as little-endian (most significant byte last) bytes into the supplied byte array.
`static void`	`putUnicodeLE(java.lang.String input, LittleEndianOutput out)`
`static java.lang.String`	`readCompressedUnicode(LittleEndianInput in, int nChars)`
`static java.lang.String`	`readUnicodeLE(LittleEndianInput in, int nChars)`
`static java.lang.String`	`readUnicodeString(LittleEndianInput in)` InputStream `in` is expected to contain: ushort nChars, byte is16BitFlag, byte[]/char[] characterData. For this encoding, the is16BitFlag is always present even if nChars==0.
`static java.lang.String`	`readUnicodeString(LittleEndianInput in, int nChars)` InputStream `in` is expected to contain: byte is16BitFlag, byte[]/char[] characterData. For this encoding, the is16BitFlag is always present even if nChars==0.
`static boolean`	`startsWithIgnoreCase(java.lang.String haystack, java.lang.String prefix)` Tests if the string starts with the specified prefix, ignoring case.
`static java.lang.String`	`toLowerCase(char c)`
`static java.lang.String`	`toUpperCase(char c)`
`static void`	`writeUnicodeString(LittleEndianOutput out, java.lang.String value)` OutputStream `out` will get: ushort nChars, byte is16BitFlag, byte[]/char[] characterData. For this encoding, the is16BitFlag is always present even if nChars==0.
`static void`	`writeUnicodeStringFlagAndData(LittleEndianOutput out, java.lang.String value)` OutputStream `out` will get: byte is16BitFlag, byte[]/char[] characterData. For this encoding, the is16BitFlag is always present even if nChars==0.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - UTF16LE
```
public static final java.nio.charset.Charset UTF16LE
```
  - UTF8
```
public static final java.nio.charset.Charset UTF8
```
  - WIN_1252
```
public static final java.nio.charset.Charset WIN_1252
```
- Method Detail
  - getFromUnicodeLE
```
public static java.lang.String getFromUnicodeLE(byte[] string,
                                                int offset,
                                                int len)
                                         throws java.lang.ArrayIndexOutOfBoundsException,
                                                java.lang.IllegalArgumentException
```
    Given a byte array of 16-bit Unicode characters in little-endian format (most significant byte last), return a Java String representation of it.
    For example, the byte pair { 0x16, 0x00 } decodes to the character 0x16.
    
    Parameters:
    
    string - the byte array to be converted
    
    offset - the initial offset into the byte array. It is assumed that string[offset] and string[offset + 1] contain the first 16-bit Unicode character.
    
    len - the length of the final string, in characters (two bytes per character)
    
    Returns:
    
    the converted string, never null.
    
    Throws:
    
    java.lang.ArrayIndexOutOfBoundsException - if offset is out of bounds for the byte array (i.e., is negative or is greater than or equal to string.length)
    
    java.lang.IllegalArgumentException - if len is too large (i.e., there is not enough data in string to create a String of that length)
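    The offset/len contract above can be illustrated with a JDK-only sketch. It is not POI's implementation: it assumes, as the Throws clauses suggest, that len counts characters (two bytes each) and decodes with java.nio.charset.StandardCharsets.UTF_16LE. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating getFromUnicodeLE(byte[], int, int); not org.apache.poi.util.StringUtil.
final class Utf16LeSliceDecoder {

    static String fromUnicodeLE(byte[] data, int offset, int len) {
        // Documented offset check: negative or >= data.length is rejected.
        if (offset < 0 || offset >= data.length) {
            throw new ArrayIndexOutOfBoundsException("Illegal offset " + offset);
        }
        // Assumption: len counts characters, and each UTF-16LE code unit occupies two bytes.
        if (len < 0 || (data.length - offset) / 2 < len) {
            throw new IllegalArgumentException("Illegal length " + len);
        }
        return new String(data, offset, len * 2, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        // Two filler bytes, then "Hi" as little-endian pairs: 'H' = 0x48, 'i' = 0x69.
        byte[] data = {0x7F, 0x7F, 0x48, 0x00, 0x69, 0x00};
        System.out.println(fromUnicodeLE(data, 2, 2)); // prints "Hi"
    }
}
```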
  - getFromUnicodeLE
```
public static java.lang.String getFromUnicodeLE(byte[] string)
```
    Given a byte array of 16-bit Unicode characters in little-endian format (most significant byte last), return a Java String representation of it.
    For example, the byte pair { 0x16, 0x00 } decodes to the character 0x16.
    
    Parameters:
    
    string - the byte array to be converted
    
    Returns:
    
    the converted string, never null
  - getToUnicodeLE
```
public static byte[] getToUnicodeLE(java.lang.String string)
```
    Converts a String to 16-bit Unicode characters in little-endian format.
    
    Parameters:
    
    string - the string to convert
    
    Returns:
    
    the byte array of 16-bit Unicode characters
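    A round trip between getToUnicodeLE and getFromUnicodeLE(byte[]) can be sketched with the JDK alone, assuming both conversions match the JDK's UTF-16LE charset (which writes no byte-order mark). The class and method names are hypothetical stand-ins, not POI's code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helpers mirroring getToUnicodeLE / getFromUnicodeLE(byte[]).
final class Utf16LeRoundTrip {

    // Encode as UTF-16LE: low (least significant) byte of each char first, no BOM.
    static byte[] toUnicodeLE(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE);
    }

    // Decode a whole array of little-endian byte pairs back to a String.
    static String fromUnicodeLE(byte[] b) {
        return new String(b, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        byte[] encoded = toUnicodeLE("A\u20AC"); // 'A' and the euro sign U+20AC
        System.out.println(Arrays.toString(encoded)); // [65, 0, -84, 32]
        System.out.println(fromUnicodeLE(encoded).equals("A\u20AC")); // true
    }
}
```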
  - getFromCompressedUnicode
```
public static java.lang.String getFromCompressedUnicode(byte[] string,
                                                        int offset,
                                                        int len)
```
    Reads 8-bit data (in the ISO-8859-1 codepage) into a (Unicode) Java String and returns it. (In Excel terms, reads compressed 8-bit Unicode as a string.)
    
    Parameters:
    
    string - the byte array to read
    
    offset - the offset in the byte array at which to start reading
    
    len - the number of bytes to read
    
    Returns:
    
    the String generated by reading the byte array
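    "Compressed" Unicode stores one byte per character, so decoding is a plain ISO-8859-1 conversion. The sketch below assumes that len is clamped to the bytes remaining after offset; the text above does not say what happens when len overruns the array, so treat that clamp as an assumption. Names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating getFromCompressedUnicode; not POI's implementation.
final class CompressedUnicodeDecoder {

    // Each byte maps to one character in U+0000..U+00FF (ISO-8859-1).
    // Assumption: len is clamped to the bytes actually available after offset.
    static String fromCompressedUnicode(byte[] data, int offset, int len) {
        int available = Math.min(len, data.length - offset);
        return new String(data, offset, available, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] data = {0x43, 0x61, 0x66, (byte) 0xE9}; // "Caf" + 0xE9 (e-acute in ISO-8859-1)
        System.out.println(fromCompressedUnicode(data, 0, 4).equals("Caf\u00E9")); // true
    }
}
```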
  - readCompressedUnicode
```
public static java.lang.String readCompressedUnicode(LittleEndianInput in,
                                                     int nChars)
```
  - readUnicodeString
```
public static java.lang.String readUnicodeString(LittleEndianInput in)
```
    InputStream in is expected to contain:
    1. ushort nChars
    2. byte is16BitFlag
    3. byte[]/char[] characterData
    For this encoding, the is16BitFlag is always present even if nChars==0.
    This structure is also known as an XLUnicodeString.
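    The three-field layout listed above can be parsed with a java.nio.ByteBuffer standing in for LittleEndianInput. This is a sketch, not POI's code; it assumes that bit 0 of the flag byte (0x01) selects 16-bit character data and that 8-bit data is ISO-8859-1, consistent with the compressed-Unicode methods on this page. Names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical XLUnicodeString reader using ByteBuffer instead of LittleEndianInput.
final class XLUnicodeStringReader {

    static String readUnicodeString(ByteBuffer in) {
        in.order(ByteOrder.LITTLE_ENDIAN);                // note: changes the caller's buffer order
        int nChars = Short.toUnsignedInt(in.getShort()); // 1. ushort nChars
        byte flag = in.get();                             // 2. is16BitFlag, present even if nChars == 0
        boolean is16Bit = (flag & 0x01) != 0;             // assumption: bit 0 selects 16-bit data
        byte[] data = new byte[is16Bit ? nChars * 2 : nChars];
        in.get(data);                                     // 3. characterData
        return new String(data, is16Bit ? StandardCharsets.UTF_16LE : StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // nChars = 2, flag = 0 (8-bit), then "OK"
        ByteBuffer compressed = ByteBuffer.wrap(new byte[]{0x02, 0x00, 0x00, 0x4F, 0x4B});
        // nChars = 1, flag = 1 (16-bit), then U+03A9 (Greek capital omega) as A9 03
        ByteBuffer wide = ByteBuffer.wrap(new byte[]{0x01, 0x00, 0x01, (byte) 0xA9, 0x03});
        System.out.println(readUnicodeString(compressed));            // prints "OK"
        System.out.println(readUnicodeString(wide).equals("\u03A9")); // prints "true"
    }
}
```

    The empty-string case shows why the flag matters: the bytes 00 00 00 still decode to "" and consume all three header bytes.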
  - readUnicodeString
```
public static java.lang.String readUnicodeString(LittleEndianInput in,
                                                 int nChars)
```
    InputStream in is expected to contain:
    1. byte is16BitFlag
    2. byte[]/char[] characterData
    For this encoding, the is16BitFlag is always present even if nChars==0.
    This method should be used when the nChars field is not stored as a ushort immediately before the is16BitFlag. Otherwise, readUnicodeString(LittleEndianInput) can be used.
  - writeUnicodeString
```
public static void writeUnicodeString(LittleEndianOutput out,
                                      java.lang.String value)
```
    OutputStream out will get:
    1. ushort nChars
    2. byte is16BitFlag
    3. byte[]/char[] characterData
    For this encoding, the is16BitFlag is always present even if nChars==0.
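    The writing side of the same layout can be sketched with a ByteArrayOutputStream in place of LittleEndianOutput. The sketch assumes the flag is 0x01 exactly when some character lies above U+00FF (the hasMultibyte rule assumed elsewhere on this page) and that nChars fits in an unsigned 16-bit value. Names are hypothetical, not POI's API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical XLUnicodeString writer; ByteArrayOutputStream stands in for LittleEndianOutput.
final class XLUnicodeStringWriter {

    // Assumption: any char above 0xFF forces 16-bit character data.
    static boolean needs16Bit(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) return true;
        }
        return false;
    }

    static byte[] writeUnicodeString(String value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int nChars = value.length();           // assumption: nChars <= 0xFFFF
        out.write(nChars & 0xFF);              // 1. ushort nChars, low byte first
        out.write((nChars >>> 8) & 0xFF);
        boolean is16Bit = needs16Bit(value);
        out.write(is16Bit ? 0x01 : 0x00);      // 2. is16BitFlag, written even for ""
        byte[] data = value.getBytes(is16Bit ? StandardCharsets.UTF_16LE : StandardCharsets.ISO_8859_1);
        out.write(data, 0, data.length);       // 3. characterData
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(writeUnicodeString("OK"))); // [2, 0, 0, 79, 75]
        System.out.println(writeUnicodeString("").length);            // 3
    }
}
```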
  - writeUnicodeStringFlagAndData
```
public static void writeUnicodeStringFlagAndData(LittleEndianOutput out,
                                                 java.lang.String value)
```
    OutputStream out will get:
    1. byte is16BitFlag
    2. byte[]/char[] characterData
    For this encoding, the is16BitFlag is always present even if nChars==0.
    This method should be used when the nChars field is not stored as a ushort immediately before the is16BitFlag. Otherwise, writeUnicodeString(LittleEndianOutput, String) can be used.
  - getEncodedSize
```
public static int getEncodedSize(java.lang.String value)
```
    Returns:
    
    the number of bytes that would be written by writeUnicodeString(LittleEndianOutput, String)
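    Since writeUnicodeString emits a 2-byte nChars, a 1-byte flag, and then the character data, the encoded size works out to 3 + nChars × (1 or 2). The sketch below computes that, assuming the 2-byte width applies whenever any character is above U+00FF. The helper name is hypothetical.

```java
// Hypothetical helper illustrating the getEncodedSize arithmetic; not POI's implementation.
final class EncodedSize {

    // 2 bytes for ushort nChars + 1 byte for is16BitFlag + 1 or 2 bytes per char.
    static int encodedSize(String value) {
        boolean multibyte = false;
        for (int i = 0; i < value.length(); i++) {
            if (value.charAt(i) > 0xFF) { multibyte = true; break; }
        }
        return 2 + 1 + value.length() * (multibyte ? 2 : 1);
    }

    public static void main(String[] args) {
        System.out.println(encodedSize("Sheet1"));      // 3 + 6 * 1 = 9
        System.out.println(encodedSize("\u03A9mega")); // 3 + 5 * 2 = 13
    }
}
```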
  - putCompressedUnicode
```
public static void putCompressedUnicode(java.lang.String input,
                                        byte[] output,
                                        int offset)
```
    Takes a Unicode (Java) string and writes it as 8-bit data (in the ISO-8859-1 codepage). (In Excel terms, writes compressed 8-bit Unicode.)
    
    Parameters:
    
    input - the String containing the data to be written
    
    output - the byte array to which the data is to be written
    
    offset - the offset into the byte array at which writing starts
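    Writing compressed Unicode into an existing array at an offset can be sketched with String.getBytes and System.arraycopy. How POI handles characters above U+00FF is not stated here; in this sketch the JDK replaces them with '?', which is an artifact of the chosen call, not documented POI behavior. Names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper illustrating putCompressedUnicode(String, byte[], int).
final class CompressedUnicodeWriter {

    // Writes one ISO-8859-1 byte per char starting at offset. In this sketch,
    // chars above U+00FF are unmappable and String.getBytes substitutes '?'.
    static void putCompressedUnicode(String input, byte[] output, int offset) {
        byte[] bytes = input.getBytes(StandardCharsets.ISO_8859_1);
        System.arraycopy(bytes, 0, output, offset, bytes.length);
    }

    public static void main(String[] args) {
        byte[] buf = new byte[5];
        putCompressedUnicode("abc", buf, 2);
        System.out.println(Arrays.toString(buf)); // [0, 0, 97, 98, 99]
    }
}
```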
  - putCompressedUnicode
```
public static void putCompressedUnicode(java.lang.String input,
                                        LittleEndianOutput out)
```
  - putUnicodeLE
```
public static void putUnicodeLE(java.lang.String input,
                                byte[] output,
                                int offset)
```
    Takes a Unicode string and writes it as little-endian (most significant byte last) bytes into the supplied byte array. (In Excel terms, writes uncompressed Unicode.)
    
    Parameters:
    
    input - the String containing the Unicode data to be written
    
    output - the byte array to hold the uncompressed Unicode; it should be at least twice the length of the String
    
    offset - the offset at which to start writing into the byte array
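    The "twice the length" rule follows from writing each char as two bytes, least significant first. This hand-rolled sketch spells out the byte order explicitly rather than relying on a charset; it is an illustration, not POI's implementation, and its names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper illustrating putUnicodeLE(String, byte[], int).
final class UnicodeLEWriter {

    // Writes each char as two bytes, low (least significant) byte first.
    // output must have room for offset + 2 * input.length() bytes.
    static void putUnicodeLE(String input, byte[] output, int offset) {
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            output[offset + 2 * i] = (byte) (c & 0xFF);  // low byte
            output[offset + 2 * i + 1] = (byte) (c >>> 8); // high byte
        }
    }

    public static void main(String[] args) {
        String s = "A\u20AC";
        byte[] buf = new byte[2 * s.length()];
        putUnicodeLE(s, buf, 0);
        System.out.println(Arrays.toString(buf)); // [65, 0, -84, 32]
    }
}
```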
  - putUnicodeLE
```
public static void putUnicodeLE(java.lang.String input,
                                LittleEndianOutput out)
```
  - readUnicodeLE
```
public static java.lang.String readUnicodeLE(LittleEndianInput in,
                                             int nChars)
```
  - getPreferredEncoding
```
public static java.lang.String getPreferredEncoding()
```
    Returns:
    
    the encoding we want to use, currently hardcoded to ISO-8859-1
  - hasMultibyte
```
public static boolean hasMultibyte(java.lang.String value)
```
    Checks whether the parameter contains a multibyte character.
    
    Parameters:
    
    value - the string to check
    
    Returns:
    
    true if the string contains at least one multibyte character
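    In the context of this class, "multibyte" most plausibly means a character that cannot be stored as one compressed ISO-8859-1 byte. The sketch below encodes that assumption (any char above U+00FF) and also assumes a null argument yields false; neither detail is stated above. Names are hypothetical.

```java
// Hypothetical helper illustrating hasMultibyte; the > 0xFF rule is an assumption.
final class MultibyteCheck {

    // Assumption: "multibyte" = a char outside U+0000..U+00FF. null is treated as false.
    static boolean hasMultibyte(String value) {
        if (value == null) return false;
        for (int i = 0; i < value.length(); i++) {
            if (value.charAt(i) > 0xFF) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasMultibyte("Caf\u00E9"));     // false: U+00E9 still fits in one byte
        System.out.println(hasMultibyte("\u65E5\u672C")); // true: CJK characters
    }
}
```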
  - startsWithIgnoreCase
```
public static boolean startsWithIgnoreCase(java.lang.String haystack,
                                           java.lang.String prefix)
```
    Tests if the string starts with the specified prefix, ignoring case.
  - endsWithIgnoreCase
```
public static boolean endsWithIgnoreCase(java.lang.String haystack,
                                         java.lang.String suffix)
```
    Tests if the string ends with the specified suffix, ignoring case.
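    Both case-insensitive affix tests map naturally onto java.lang.String.regionMatches with ignoreCase = true. This is a sketch of one way to get the behavior described above, not necessarily how POI implements it; the class name is hypothetical.

```java
// Hypothetical helpers illustrating startsWithIgnoreCase / endsWithIgnoreCase via regionMatches.
final class IgnoreCaseAffixes {

    static boolean startsWithIgnoreCase(String haystack, String prefix) {
        return haystack.regionMatches(true, 0, prefix, 0, prefix.length());
    }

    static boolean endsWithIgnoreCase(String haystack, String suffix) {
        int start = haystack.length() - suffix.length();
        // regionMatches returns false for a negative start, so an over-long suffix fails cleanly.
        return haystack.regionMatches(true, start, suffix, 0, suffix.length());
    }

    public static void main(String[] args) {
        System.out.println(startsWithIgnoreCase("Workbook.XLSX", "work")); // true
        System.out.println(endsWithIgnoreCase("Workbook.XLSX", ".xlsx"));  // true
        System.out.println(endsWithIgnoreCase("x", ".xlsx"));              // false
    }
}
```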
  - toLowerCase
```
@Internal
public static java.lang.String toLowerCase(char c)
```
  - toUpperCase
```
@Internal
public static java.lang.String toUpperCase(char c)
```
  - isUpperCase
```
@Internal
public static boolean isUpperCase(char c)
```
  - mapMsCodepointString
```
public static java.lang.String mapMsCodepointString(java.lang.String string)
```
    Some strings may contain encoded characters of the Unicode Private Use Area. Currently, the characters of the symbol fonts are mapped to the corresponding characters in the normal Unicode range.
    
    Parameters:
    
    string - the original string
    
    Returns:
    
    the string with mapped characters
    
    See Also:
    
    Private Use Area (symbol), Symbol font - Unicode alternatives for Greek and special characters in HTML
  - join
```
@Internal
public static java.lang.String join(java.lang.Object[] array,
                                    java.lang.String separator)
```
  - join
```
@Internal
public static java.lang.String join(java.lang.Object[] array)
```
  - join
```
@Internal
public static java.lang.String join(java.lang.String separator,
                                    java.lang.Object... array)
```
  - countMatches
```
public static int countMatches(java.lang.CharSequence haystack,
                               char needle)
```
    Counts the number of occurrences of needle in haystack. Has the same signature as org.apache.commons.lang3.StringUtils#countMatches.
    
    Parameters:
    
    haystack - the CharSequence to check, may be null
    
    needle - the character to count
    
    Returns:
    
    the number of occurrences, 0 if the CharSequence is null
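    The documented contract (count a single char, return 0 for a null CharSequence) fits in a short loop. This is a sketch of that contract, not POI's source; the class name is hypothetical.

```java
// Hypothetical helper illustrating the countMatches contract.
final class CountMatches {

    // Returns 0 for a null haystack, as documented above.
    static int countMatches(CharSequence haystack, char needle) {
        if (haystack == null) return 0;
        int count = 0;
        for (int i = 0; i < haystack.length(); i++) {
            if (haystack.charAt(i) == needle) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMatches("A1:B2:C3", ':')); // 2
        System.out.println(countMatches(null, ':'));       // 0
    }
}
```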
  - getFromUnicodeLE0Terminated
```
public static java.lang.String getFromUnicodeLE0Terminated(byte[] string,
                                                           int offset,
                                                           int len)
                                                    throws java.lang.ArrayIndexOutOfBoundsException,
                                                           java.lang.IllegalArgumentException
```
    Given a byte array of 16-bit Unicode characters in little-endian format (most significant byte last), return a Java String representation of it. Scans the byte array for two consecutive 0 bytes and returns the string before them.
    #61881: some programs appear to write the 0-terminator at the beginning of the string as well. In that case, check whether the next two bytes contain a valid ASCII character and, if so, replace the leading terminator in _recdata with a '?' character.
    
    Parameters:
    
    string - the byte array to be converted
    
    offset - the initial offset into the byte array. It is assumed that string[offset] and string[offset + 1] contain the first 16-bit Unicode character.
    
    len - the maximum length of the final string, in characters
    
    Returns:
    
    the converted string, never null.
    
    Throws:
    
    java.lang.ArrayIndexOutOfBoundsException - if offset is out of bounds for the byte array (i.e., is negative or is greater than or equal to string.length)
    
    java.lang.IllegalArgumentException - if len is too large (i.e., there is not enough data in string to create a String of that length)
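    The core scan can be sketched as follows. Two assumptions are worth naming: the 0x00 0x00 terminator is only recognized at character-aligned positions (so a zero high byte followed by a zero low byte of the next character does not end the string), and the #61881 leading-terminator workaround is deliberately omitted. This is a hypothetical illustration, not POI's implementation.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating getFromUnicodeLE0Terminated; omits the #61881 workaround.
final class ZeroTerminatedUtf16Le {

    // Decodes up to maxChars UTF-16LE chars from offset, stopping at the first
    // char-aligned 0x00 0x00 pair.
    static String fromUnicodeLE0Terminated(byte[] data, int offset, int maxChars) {
        if (offset < 0 || offset >= data.length) {
            throw new ArrayIndexOutOfBoundsException("Illegal offset " + offset);
        }
        if (maxChars < 0 || (data.length - offset) / 2 < maxChars) {
            throw new IllegalArgumentException("Illegal length " + maxChars);
        }
        int n = 0;
        while (n < maxChars && (data[offset + 2 * n] != 0 || data[offset + 2 * n + 1] != 0)) {
            n++;
        }
        return new String(data, offset, 2 * n, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        // "Hi", then a 0x0000 terminator, then leftover padding.
        byte[] data = {0x48, 0x00, 0x69, 0x00, 0x00, 0x00, 0x58, 0x00};
        System.out.println(fromUnicodeLE0Terminated(data, 0, 4)); // prints "Hi"
    }
}
```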
