U8_TEXTPREP_STR(3C) | Standard C Library Functions | U8_TEXTPREP_STR(3C) |
u8_textprep_str - string-based UTF-8 text preparation function
#include <sys/u8_textprep.h>

size_t u8_textprep_str(char *inarray, size_t *inlen,
    char *outarray, size_t *outlen, int flag,
    size_t unicode_version, int *errnum);
inarray
inlen
outarray
outlen
flag
U8_TEXTPREP_IGNORE_NULL
With this option, a null byte does not stop the preparation. Preparation continues until all inlen bytes of inarray have been consumed or an error occurs.
U8_TEXTPREP_IGNORE_INVALID
When this option is set, u8_textprep_str() does not stop the preparation at illegal or incomplete characters and instead treats such characters as needing no preparation.
U8_TEXTPREP_TOUPPER
U8_TEXTPREP_TOLOWER
U8_TEXTPREP_NFD
U8_TEXTPREP_NFC
U8_TEXTPREP_NFKD
U8_TEXTPREP_NFKC
Only one case folding option is allowed. Only one Unicode Normalization option is allowed.
When a case folding option and a Unicode Normalization option are specified together, case folding is performed first, followed by Unicode Normalization.
If no option is specified, no processing occurs except the simple copying of bytes from input to output.
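The "only one of each kind" rule above can be checked before a call is made. The following is a minimal sketch, not part of the library: the DEMO_* constants are hypothetical stand-ins, and the check assumes each option is a distinct single bit, which may not hold for the real U8_TEXTPREP_* macros in <sys/u8_textprep.h>.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in bits for illustration only. Real code should use
 * the U8_TEXTPREP_* macros from <sys/u8_textprep.h>; their actual values
 * and bit layout are not assumed here.
 */
enum {
	DEMO_TOUPPER = 1 << 0,
	DEMO_TOLOWER = 1 << 1,
	DEMO_NFD     = 1 << 2,
	DEMO_NFC     = 1 << 3,
	DEMO_NFKD    = 1 << 4,
	DEMO_NFKC    = 1 << 5
};

/* True when at most one bit of mask is set in flag. */
static bool
at_most_one(int flag, int mask)
{
	int bits = flag & mask;

	return ((bits & (bits - 1)) == 0);
}

/* Check that flag names at most one case folding and one normalization. */
static bool
demo_flags_ok(int flag)
{
	int fold = DEMO_TOUPPER | DEMO_TOLOWER;
	int norm = DEMO_NFD | DEMO_NFC | DEMO_NFKD | DEMO_NFKC;

	return (at_most_one(flag, fold) && at_most_one(flag, norm));
}
```

With these stand-ins, demo_flags_ok(DEMO_TOUPPER | DEMO_NFD) is true, while demo_flags_ok(DEMO_TOUPPER | DEMO_TOLOWER) is false.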
unicode_version
U8_UNICODE_320
U8_UNICODE_500
U8_UNICODE_LATEST
errnum
E2BIG
EBADF
EILSEQ
EINVAL
ERANGE
The u8_textprep_str() function prepares the sequence of UTF-8 characters in the array specified by inarray and stores the corresponding prepared UTF-8 characters in the array specified by outarray. The inarray argument points to the first character of the input byte array, and inlen points to the number of bytes from there to the end of the array to be prepared. The outarray argument points to the first available byte of the output byte array, and outlen points to the number of bytes available from there to the end of the array. Unless flag includes U8_TEXTPREP_IGNORE_NULL, u8_textprep_str() stops when it encounters a null byte in the input array, regardless of the current inlen value.
If flag does not include U8_TEXTPREP_IGNORE_INVALID and a sequence of input bytes does not form a valid UTF-8 character, preparation stops after the previous successfully prepared character. If flag does not include U8_TEXTPREP_IGNORE_INVALID and the input array ends with an incomplete UTF-8 character, preparation stops after the previous successfully prepared bytes. If the output array is not large enough to hold the entire prepared text, preparation stops just prior to the input bytes that would cause the output array to overflow. The value pointed to by inlen is decremented to reflect the number of bytes still not prepared in the input array. The value pointed to by outlen is decremented to reflect the number of bytes still available in the output array.
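Because inlen and outlen are decremented in place, a caller that wants to know how much was consumed or produced has to save the starting values first. The sketch below illustrates only that bookkeeping. demo_copy_prep() is a hypothetical mock, not u8_textprep_str(): it copies bytes unchanged and stops when either length reaches zero.

```c
#include <stddef.h>

/*
 * Hypothetical mock, NOT u8_textprep_str(): copies bytes unchanged and
 * stops when the input is consumed or the output is full, decrementing
 * *inlen and *outlen as the text above describes.
 */
static void
demo_copy_prep(const char *in, size_t *inlen, char *out, size_t *outlen)
{
	while (*inlen > 0 && *outlen > 0) {
		*out++ = *in++;
		(*inlen)--;
		(*outlen)--;
	}
}

/* Bytes consumed or produced: length before the call minus length after. */
static size_t
demo_used(size_t before, size_t after)
{
	return (before - after);
}
```

For a 6-byte input and a 4-byte output buffer, the mock leaves 2 in the input length and 0 in the output length, so demo_used() reports 4 bytes on each side.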
The u8_textprep_str() function updates the values pointed to by the inlen and outlen arguments to reflect the extent of the preparation. When U8_TEXTPREP_IGNORE_INVALID is specified, u8_textprep_str() returns the number of illegal or incomplete characters found during the text preparation. When U8_TEXTPREP_IGNORE_INVALID is not specified and the text preparation is entirely successful, the function returns 0. If the entire string in the input array is prepared, the value pointed to by inlen will be 0. If the text preparation is stopped due to any of the conditions mentioned above, the value pointed to by inlen will be non-zero and errnum is set to indicate the error. If this or any other error occurs, u8_textprep_str() returns (size_t)-1 and sets errnum to indicate the error.
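One way a caller might separate the three kinds of return value described above is sketched here. The enum and both helper names are hypothetical, introduced only for illustration. The error names are those listed for errnum on this page.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical outcome categories, introduced only for this sketch. */
enum demo_outcome {
	DEMO_CLEAN,	/* returned 0 */
	DEMO_SKIPPED,	/* returned a count of illegal/incomplete characters */
	DEMO_FAILED	/* returned (size_t)-1; errnum holds the cause */
};

/* Classify a return value; check for (size_t)-1 before treating it as a count. */
static enum demo_outcome
demo_classify(size_t ret)
{
	if (ret == (size_t)-1)
		return (DEMO_FAILED);
	return (ret == 0 ? DEMO_CLEAN : DEMO_SKIPPED);
}

/* Symbolic name of an errnum value listed on this page, else "unknown". */
static const char *
demo_errname(int errnum)
{
	if (errnum == E2BIG)
		return ("E2BIG");
	if (errnum == EBADF)
		return ("EBADF");
	if (errnum == EILSEQ)
		return ("EILSEQ");
	if (errnum == EINVAL)
		return ("EINVAL");
	if (errnum == ERANGE)
		return ("ERANGE");
	return ("unknown");
}
```

The order of the tests matters: (size_t)-1 is also a very large count, so it has to be ruled out before a non-zero return is read as the number of skipped characters.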
Example 1 Simple UTF-8 text preparation
#include <sys/u8_textprep.h>
.
.
.
size_t ret;
char ib[MAXPATHLEN];
char ob[MAXPATHLEN];
size_t il, ol;
int err;
.
.
.
/*
 * We got a UTF-8 pathname from somewhere.
 *
 * Calculate the length of input string including the terminating
 * NULL byte and prepare other arguments.
 */
(void) strlcpy(ib, pathname, MAXPATHLEN);
il = strlen(ib) + 1;
ol = MAXPATHLEN;

/*
 * Do toupper case folding, apply Unicode Normalization Form D,
 * ignore NULL byte, and ignore any illegal/incomplete characters.
 */
ret = u8_textprep_str(ib, &il, ob, &ol,
    (U8_TEXTPREP_IGNORE_NULL|U8_TEXTPREP_IGNORE_INVALID|
    U8_TEXTPREP_TOUPPER|U8_TEXTPREP_NFD), U8_UNICODE_LATEST, &err);
if (ret == (size_t)-1) {
        if (err == E2BIG)
                return (-1);
        if (err == EBADF)
                return (-2);
        if (err == ERANGE)
                return (-3);
        return (-4);
}
See attributes(7) for descriptions of the following attributes:
ATTRIBUTE TYPE | ATTRIBUTE VALUE |
Interface Stability | Committed |
MT-Level | MT-Safe |
u8_strcmp(3C), u8_validate(3C), attributes(7), u8_strcmp(9F), u8_textprep_str(9F), u8_validate(9F)
The Unicode Standard (http://www.unicode.org)
After text preparation, both the number of UTF-8 characters and the total number of bytes in the output may be smaller or larger than in the input buffer.
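A concrete case of such growth is canonical decomposition: U+00E9 (LATIN SMALL LETTER E WITH ACUTE) is one character in two bytes, while its decomposed form, U+0065 followed by U+0301, is two characters in three bytes. The sketch below hard-codes both byte sequences and counts them. It does not call u8_textprep_str(), and the demo_* names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Count UTF-8 characters by skipping continuation bytes (10xxxxxx). */
static size_t
demo_u8_count(const char *s)
{
	size_t n = 0;

	for (; *s != '\0'; s++) {
		if (((unsigned char)*s & 0xC0) != 0x80)
			n++;
	}
	return (n);
}

/* U+00E9 precomposed, and its canonical decomposition U+0065 U+0301. */
static const char demo_composed[] = "\xC3\xA9";
static const char demo_decomposed[] = "e\xCC\x81";
```

Here strlen() gives the byte counts (2 and 3) and demo_u8_count() gives the character counts (1 and 2), so an output buffer sized exactly to the input can be too small.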
Case conversions are performed using Unicode data of the corresponding version. Locale-specific case conversions are not supported.
September 18, 2007 | OmniOS |