C16RTOMB(3C) | Standard C Library Functions | C16RTOMB(3C) |
c16rtomb, c32rtomb, wcrtomb, wcrtomb_l - convert wide-characters to character sequences
#include <uchar.h>
size_t c16rtomb(char *restrict str, char16_t c16, mbstate_t *restrict ps);
size_t c32rtomb(char *restrict str, char32_t c32, mbstate_t *restrict ps);
#include <wchar.h>
size_t wcrtomb(char *restrict str, wchar_t wc, mbstate_t *restrict ps);
#include <wchar.h>
#include <xlocale.h>
size_t wcrtomb_l(char *restrict str, wchar_t wc, mbstate_t *restrict ps, locale_t loc);
The c16rtomb(), c32rtomb(), wcrtomb(), and wcrtomb_l() functions convert wide-character sequences into a series of multi-byte characters. The functions work in the following formats:
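c16rtomb()
    char16_t values (c16), treated as UTF-16 code units. A code point outside the Basic Multilingual Plane arrives as a surrogate pair across two calls.
c32rtomb()
    char32_t values (c32), treated as UTF-32, one code point per value.
wcrtomb(), wcrtomb_l()
    wchar_t wide characters (wc).
Each function takes the wide character passed in (c16, c32, or wc) and adds it to the current conversion state, ps. Once the state holds a code point that is valid in the current locale, that code point is converted into a multi-byte sequence and stored in str. Up to MB_CUR_MAX bytes will be stored in str. It is the caller's responsibility to ensure that there is sufficient space in str.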
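The buffer requirement above can be illustrated with a minimal sketch. The helper name wide_to_multibyte() is hypothetical and not part of any library. It assumes a scratch buffer of MB_LEN_MAX bytes, which is never smaller than MB_CUR_MAX, and copies each converted character only when the destination has room.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: convert the NUL-terminated wide string ws into
 * out (capacity outlen), one wcrtomb() call per wide character.
 * Returns the number of bytes written, excluding the terminating NUL,
 * or (size_t)-1 on an encoding error or if out is too small.
 */
size_t
wide_to_multibyte(const wchar_t *ws, char *out, size_t outlen)
{
    mbstate_t mbs;
    char scratch[MB_LEN_MAX];   /* never smaller than MB_CUR_MAX */
    size_t used = 0;

    assert(ws != NULL && out != NULL);
    if (outlen == 0)
        return ((size_t)-1);

    /* A zeroed mbstate_t is the initial conversion state. */
    (void) memset(&mbs, 0, sizeof (mbs));
    for (; *ws != L'\0'; ws++) {
        size_t n = wcrtomb(scratch, *ws, &mbs);

        if (n == (size_t)-1)
            return ((size_t)-1);
        /* Always keep one byte free for the terminating NUL. */
        if (outlen - used <= n)
            return ((size_t)-1);
        (void) memcpy(out + used, scratch, n);
        used += n;
    }
    out[used] = '\0';
    return (used);
}
```

Converting into a scratch buffer first means that a short destination is reported as an error instead of being overrun.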
The functions are all influenced by the LC_CTYPE category of the current locale, which determines what is considered a valid character. For example, in the C locale, only ASCII characters are recognized, while in a UTF-8 based locale like en_US.UTF-8, all valid Unicode code points are recognized and will be converted into the corresponding multi-byte sequence. The wcrtomb_l() function uses the locale passed in loc rather than the locale of the current thread.
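The effect of LC_CTYPE can be sketched with a small helper that converts one code point under a named locale and then restores the previous one. The name encode_in_locale() is hypothetical. The sketch uses setlocale(), which changes the locale for the whole process, so it is only suitable for single-threaded use.

```c
#include <assert.h>
#include <limits.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <uchar.h>

/*
 * Hypothetical helper: encode c32 under the LC_CTYPE locale called
 * name, then restore the previous locale. Returns the number of bytes
 * stored in out, or (size_t)-1 if the locale cannot be selected or the
 * code point is not valid in it.
 */
size_t
encode_in_locale(const char *name, char32_t c32, char out[MB_LEN_MAX])
{
    mbstate_t mbs;
    size_t ret;
    const char *cur;
    char *saved;

    assert(name != NULL && out != NULL);

    /* setlocale() may reuse its return buffer, so keep a private copy. */
    cur = setlocale(LC_CTYPE, NULL);
    if (cur == NULL || (saved = malloc(strlen(cur) + 1)) == NULL)
        return ((size_t)-1);
    (void) strcpy(saved, cur);

    if (setlocale(LC_CTYPE, name) == NULL) {
        free(saved);
        return ((size_t)-1);
    }

    (void) memset(&mbs, 0, sizeof (mbs));
    ret = c32rtomb(out, c32, &mbs);

    (void) setlocale(LC_CTYPE, saved);
    free(saved);
    return (ret);
}
```

Called with "C" and an ASCII code point, the helper stores a single byte. Called with a UTF-8 locale such as en_US.UTF-8, assuming that locale is installed, the same helper would be expected to accept any valid Unicode code point.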
The ps argument represents a multi-byte conversion state which can be used across multiple calls to a given function (but not mixed between functions). This allows characters to be consumed from subsequent buffers, e.g. different values of str. The functions may be called from multiple threads as long as they use unique values for ps. If ps is NULL, then a function-specific buffer will be used for the conversion state; however, this buffer is shared between all threads and its use is not recommended.
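A caller-owned conversion state can be sketched as follows. The helper name utf16_to_multibyte() is hypothetical. It assumes the caller passes a zeroed mbstate_t that no other thread uses, and it tolerates a return value of 0, which an implementation may produce when it only buffers the input (for example, the first half of a surrogate pair).

```c
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <uchar.h>

/*
 * Hypothetical helper: feed a NUL-terminated char16_t string through
 * c16rtomb() using the caller's conversion state *ps, so that the
 * shared internal state selected by a NULL ps is never touched.
 * Returns the number of bytes written, excluding the terminating NUL,
 * or (size_t)-1 on an encoding error or if out is too small.
 */
size_t
utf16_to_multibyte(const char16_t *s, char *out, size_t outlen,
    mbstate_t *ps)
{
    char scratch[MB_LEN_MAX];
    size_t used = 0;

    assert(s != NULL && out != NULL && ps != NULL);
    if (outlen == 0)
        return ((size_t)-1);

    for (; *s != 0; s++) {
        /* n may be 0 if the unit was only recorded in *ps. */
        size_t n = c16rtomb(scratch, *s, ps);

        if (n == (size_t)-1 || outlen - used <= n)
            return ((size_t)-1);
        (void) memcpy(out + used, scratch, n);
        used += n;
    }
    out[used] = '\0';
    return (used);
}
```

Because the state lives in the caller, the same mbstate_t can be carried across several input buffers, and each thread can keep its own.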
The functions all behave specially when NULL is passed for str. They act as though the null wide-character had been passed in c16, c32, or wc, and write the results of the conversion to an internal buffer (buf). In other words, the functions would be called as:
    c16rtomb(buf, L'\0', ps)
    c32rtomb(buf, L'\0', ps)
    wcrtomb(buf, L'\0', ps)
    wcrtomb_l(buf, L'\0', ps, loc)
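One use of the NULL str form is to return a conversion state to its initial state without producing output. The following is a minimal sketch. The helper name reset_conversion_state() is hypothetical, and mbsinit() is used only to check the result.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: pass NULL for str so that wcrtomb() behaves as
 * if the null wide-character were converted into an internal buffer.
 * Returns 1 if *ps is in the initial conversion state afterwards,
 * 0 otherwise.
 */
int
reset_conversion_state(mbstate_t *ps)
{
    assert(ps != NULL);

    /* The wc argument is ignored when str is NULL. */
    if (wcrtomb(NULL, L'\0', ps) == (size_t)-1)
        return (0);
    return (mbsinit(ps) != 0);
}
```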
Not all locales in the system are Unicode based locales. For example, ISO 8859 family locales have code points with values that do not match their counterparts in Unicode. When using these functions with non-Unicode based locales, the code points returned will be those determined by the locale. They will not be converted from the corresponding Unicode code point. For example, if using the Euro sign in ISO 8859-15, these functions will not encode the Unicode value 0x20ac into the ISO 8859-15 value 0xa4.
Regardless of the locale, the characters returned will be encoded as though the code point were the corresponding value in Unicode. This means that when using UTF-16, if the corresponding code point were in the range for surrogate pairs, then the c16rtomb() function will expect to receive that code point as a surrogate pair. This behavior of the c16rtomb() and c32rtomb() functions should not be relied upon, is not portable, and is subject to change for non-Unicode locales.
Upon successful completion, the c16rtomb(), c32rtomb(), wcrtomb(), and wcrtomb_l() functions return the number of bytes stored in str. Otherwise, (size_t)-1 is returned to indicate an encoding error and errno is set.
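Checking the return value can be sketched as follows. The helper name encode_or_fallback() is hypothetical. On failure it records errno, substitutes a '?' byte, and reports one byte stored. Substituting '?' is a choice made for this illustration, not behavior of the functions themselves.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: convert wc into out. On success *errp is 0 and
 * the byte count from wcrtomb() is returned. On (size_t)-1, *errp is
 * set to errno, out[0] is set to '?', and 1 is returned.
 */
size_t
encode_or_fallback(wchar_t wc, char out[MB_LEN_MAX], int *errp)
{
    mbstate_t mbs;
    size_t n;

    assert(out != NULL && errp != NULL);

    /* A fresh state per call, so a failed call leaves nothing behind. */
    (void) memset(&mbs, 0, sizeof (mbs));
    *errp = 0;
    errno = 0;
    n = wcrtomb(out, wc, &mbs);
    if (n == (size_t)-1) {
        *errp = errno;
        out[0] = '?';
        n = 1;
    }
    return (n);
}
```

In the C locale, where only ASCII characters are recognized, an ASCII character passes through unchanged and a character such as 0x5149 takes the fallback path.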
Example 1 Converting a UTF-32 character into a multi-byte character sequence.
    #include <limits.h>
    #include <locale.h>
    #include <stdlib.h>
    #include <string.h>
    #include <err.h>
    #include <stdio.h>
    #include <uchar.h>

    int
    main(void)
    {
        mbstate_t mbs;
        size_t ret;
        /*
         * Sized with MB_LEN_MAX: MB_CUR_MAX is evaluated in the C locale
         * at this point, before setlocale() runs, and would be too small.
         */
        char buf[MB_LEN_MAX];
        char32_t val = 0x5149;
        const char *uchar_exp = "\xe5\x85\x89";

        (void) memset(&mbs, 0, sizeof (mbs));
        (void) setlocale(LC_CTYPE, "en_US.UTF-8");
        ret = c32rtomb(buf, val, &mbs);
        if (ret != strlen(uchar_exp)) {
            errx(EXIT_FAILURE, "failed to convert string, got %zd", ret);
        }

        if (strncmp(buf, uchar_exp, ret) != 0) {
            errx(EXIT_FAILURE, "converted char32_t does not match "
                "expected value");
        }

        return (0);
    }
The c16rtomb(), c32rtomb(), wcrtomb(), and wcrtomb_l() functions will fail if:
EINVAL
    The conversion state in ps is invalid.
EILSEQ
    An invalid character sequence has been detected.
The c16rtomb(), c32rtomb(), wcrtomb(), and wcrtomb_l() functions are MT-Safe as long as different mbstate_t structures are passed in ps. If ps is NULL or different threads use the same value for ps, then the functions are Unsafe.
mbrtoc16(3C), mbrtoc32(3C), mbrtowc(3C), newlocale(3C), setlocale(3C), uselocale(3C), uchar.h(3HEAD), environ(7)
December 2, 2023 | OmniOS |