| MBRTOC16(3C) | Standard C Library Functions | MBRTOC16(3C) | 
mbrtoc16, mbrtoc32, mbrtowc, mbrtowc_l - convert characters to wide characters
#include <wchar.h>

size_t
mbrtowc(wchar_t *restrict pwc, const char *restrict str, size_t len,
    mbstate_t *restrict ps);

#include <wchar.h>
#include <xlocale.h>

size_t
mbrtowc_l(wchar_t *restrict pwc, const char *restrict str, size_t len,
    mbstate_t *restrict ps, locale_t loc);

#include <uchar.h>

size_t
mbrtoc16(char16_t *restrict p16c, const char *restrict str, size_t len,
    mbstate_t *restrict ps);

size_t
mbrtoc32(char32_t *restrict p32c, const char *restrict str, size_t len,
    mbstate_t *restrict ps);
The
    mbrtoc16(),
    mbrtoc32(),
    mbrtowc(),
    and mbrtowc_l() functions convert character
    sequences, which may contain multi-byte characters, into different character
    formats. The functions work in the following formats:
mbrtoc16()
    UTF-16, where each code point is stored as one or two char16_t values.
mbrtoc32()
    UTF-32, where each code point is stored as a single char32_t value.
mbrtowc(), mbrtowc_l()
    Wide characters, where each character is stored as a single wchar_t value.
The functions consume up to len bytes from the string str and accumulate
    them in ps until a valid character is found. What counts as a valid
    character is determined by the LC_CTYPE category of the current
    locale. For example, in the C locale, only ASCII characters are
    recognized, while in a UTF-8 based locale like en_US.UTF-8, UTF-8
    multi-byte character sequences that represent Unicode code points are
    recognized. The mbrtowc_l() function uses the locale passed in loc
    rather than the locale of the current thread.
When a valid character sequence has been found, it is converted to a
    16-bit character sequence by mbrtoc16() and stored in p16c, or to a
    32-bit character sequence by mbrtoc32() and stored in p32c. The
    mbrtowc() and mbrtowc_l() functions store the resulting wide
    character in pwc.
The ps argument represents a multi-byte conversion state which can be
    used across multiple calls to a given function (but not mixed between
    functions). This allows a character to be consumed from subsequent
    buffers, e.g. different values of str. The functions may be called
    from multiple threads as long as each thread uses a unique value for
    ps. If ps is NULL, then a function-specific buffer will be used for
    the conversion state; however, this buffer is shared between all
    threads and its use is not recommended.
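As a minimal sketch of carrying conversion state across buffers, the following feeds one three-byte UTF-8 character to mbrtoc32() in two pieces through the same mbstate_t. The helper name convert_split() is hypothetical, and the locale names it tries are assumptions about what is installed on the system.

```c
#include <locale.h>
#include <string.h>
#include <uchar.h>

/*
 * Hypothetical helper: feed one UTF-8 character split across two buffers
 * through a single mbstate_t. Returns 0 on success, -1 if no UTF-8 locale
 * could be selected or the conversion did not behave as expected.
 */
int
convert_split(char32_t *out)
{
	mbstate_t mbs;
	size_t ret;

	/* Assumption: at least one of these UTF-8 locales is installed. */
	if (setlocale(LC_CTYPE, "en_US.UTF-8") == NULL &&
	    setlocale(LC_CTYPE, "C.UTF-8") == NULL)
		return (-1);

	(void) memset(&mbs, 0, sizeof (mbs));

	/* First buffer: only two of the three bytes that encode U+5149. */
	ret = mbrtoc32(out, "\xe5\x85", 2, &mbs);
	if (ret != (size_t)-2)
		return (-1);

	/* Second buffer: the final byte completes the character. */
	ret = mbrtoc32(out, "\x89", 1, &mbs);
	if (ret != 1)
		return (-1);

	return (0);
}
```

The first call is expected to report an incomplete sequence, and the second reports only the bytes consumed by that call, so a caller tracking offsets should advance by each call's own return value.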
When using these functions, more than one character may be output for a given set of consumed input bytes. For example, some code points are represented in UTF-16 as a surrogate pair, which requires two 16-bit characters. When this occurs, the call that consumes the input stores the first character and returns the number of bytes consumed; the next call stores the second character without consuming any input and returns the special value (size_t)-3.
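The arithmetic behind a surrogate pair can be sketched independently of the library. The helper below, with the hypothetical name to_utf16(), applies the standard UTF-16 encoding rule; it illustrates the encoding itself and is not a description of how these functions are implemented. It does not validate that the input is a legal code point.

```c
#include <uchar.h>

/*
 * Hypothetical helper: encode a code point as UTF-16.
 * Returns the number of char16_t values written to out (1 or 2).
 */
int
to_utf16(char32_t cp, char16_t out[2])
{
	if (cp < 0x10000) {
		out[0] = (char16_t)cp;
		return (1);
	}

	cp -= 0x10000;
	out[0] = (char16_t)(0xd800 + (cp >> 10));	/* high surrogate */
	out[1] = (char16_t)(0xdc00 + (cp & 0x3ff));	/* low surrogate */
	return (2);
}
```

For U+1F4A9 this yields 0xd83d followed by 0xdca9.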
The functions all have a special behavior when NULL is passed for str.
    They behave as though pwc, p16c, or p32c were NULL, str were the
    empty string "", and len were 1. In other words, the functions behave
    as if they had been called as:
mbrtowc(NULL, "", 1, ps)
mbrtowc_l(NULL, "", 1, ps, loc)
mbrtoc16(NULL, "", 1, ps)
mbrtoc32(NULL, "", 1, ps)
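A small sketch of the NULL str case follows. The helper name null_str_call() is hypothetical; it starts from a zeroed state, so it shows only the simple case where no partial character is pending.

```c
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/*
 * Hypothetical helper: call mbrtoc32() with a NULL str on a zeroed state.
 * Returns 1 if the call reported a null character (return value 0) and
 * left the state in its initial form, 0 otherwise.
 */
int
null_str_call(void)
{
	mbstate_t mbs;
	size_t ret;

	(void) memset(&mbs, 0, sizeof (mbs));

	/* Treated like mbrtoc32(NULL, "", 1, &mbs); len is ignored here. */
	ret = mbrtoc32(NULL, NULL, 0, &mbs);

	return (ret == 0 && mbsinit(&mbs) != 0);
}
```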
Not all locales in the system are Unicode based locales. For example, ISO 8859 family locales have code points with values that do not match their counterparts in Unicode. When using these functions with non-Unicode based locales, the code points returned will be those determined by the locale. They will not be converted to the corresponding Unicode code point. For example, if using the Euro sign in ISO 8859-15, these functions might return the code point 0xa4 and not the Unicode value 0x20ac.
Regardless of the locale, the characters returned will be encoded as though the code point were the corresponding value in Unicode. This means that if a locale returns a value that would be a surrogate pair in the UTF-16 encoding, it will still be encoded as a UTF-16 character.
This behavior of the mbrtoc16() and mbrtoc32() functions should not be
    relied upon, is not portable, and is subject to change for
    non-Unicode locales.
The mbrtoc16(),
    mbrtoc32(), mbrtowc(), and
    mbrtowc_l() functions return the following
  values:
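0
      A null character was converted. If the wide character buffer
      (pwc, p16c, p32c) is not NULL, the null character was stored in it.
between 1 and len
      That number of bytes was consumed from str and a single character
      was written into the wide character buffer.
(size_t)-1
      An encoding error occurred and errno is set to EILSEQ. No data was
      written into the wide character buffer (pwc, p16c, p32c).
(size_t)-2
      len bytes were consumed, but they do not yet form a complete
      multi-byte character. No data was written into the wide character
      buffer.
(size_t)-3
      A character left over from a previous call (such as the second half
      of a surrogate pair) was written into the wide character buffer and
      no input was consumed. This value is limited to the mbrtoc16() and
      mbrtoc32() functions.
Example 1 Using the mbrtoc32() function to convert a multibyte string.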
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <err.h>
#include <stdio.h>
#include <uchar.h>
int
main(void)
{
	mbstate_t mbs;
	char32_t out;
	size_t ret;
	const char *uchar_str = "\xe5\x85\x89";
	(void) memset(&mbs, 0, sizeof (mbs));
	(void) setlocale(LC_CTYPE, "en_US.UTF-8");
	ret = mbrtoc32(&out, uchar_str, strlen(uchar_str), &mbs);
	if (ret != strlen(uchar_str)) {
		errx(EXIT_FAILURE, "failed to convert string, got %zd",
		    ret);
	}
	(void) printf("Converted %zu bytes into UTF-32 character "
	    "0x%x\n", ret, out);
	return (0);
}
When compiled and run, this produces:
$ ./a.out
Converted 3 bytes into UTF-32 character 0x5149
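Beyond a single call, a caller usually loops over a whole buffer and has to react to each class of return value. The following is a sketch of one way to do that with mbrtoc32(); the helper name decode_all() and its error convention are hypothetical, and it treats a truncated trailing sequence as an error, which may not suit callers that stream data across buffers.

```c
#include <string.h>
#include <uchar.h>

/*
 * Hypothetical helper: decode up to cap characters from the len bytes at
 * str using the current locale. Returns the number of characters stored,
 * or (size_t)-1 on an encoding error or a truncated trailing sequence.
 */
size_t
decode_all(const char *str, size_t len, char32_t *out, size_t cap)
{
	mbstate_t mbs;
	size_t n = 0;

	(void) memset(&mbs, 0, sizeof (mbs));

	while (len > 0 && n < cap) {
		size_t ret = mbrtoc32(&out[n], str, len, &mbs);

		if (ret == (size_t)-1 || ret == (size_t)-2)
			return ((size_t)-1);	/* invalid or incomplete */
		if (ret == (size_t)-3) {
			n++;			/* output only, no input used */
			continue;
		}
		if (ret == 0)
			break;			/* null character: stop */

		str += ret;
		len -= ret;
		n++;
	}

	return (n);
}
```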
Example 2 Handling surrogate pairs from the
    mbrtoc16() function.
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <err.h>
#include <stdio.h>
#include <uchar.h>
int
main(void)
{
        mbstate_t mbs;
        char16_t first, second;
        size_t ret;
        const char *uchar_str = "\xf0\x9f\x92\xa9";
        (void) memset(&mbs, 0, sizeof (mbs));
        (void) setlocale(LC_CTYPE, "en_US.UTF-8");
        ret = mbrtoc16(&first, uchar_str, strlen(uchar_str), &mbs);
        if (ret != strlen(uchar_str)) {
                errx(EXIT_FAILURE, "failed to convert string, got %zd",
                    ret);
        }
        ret = mbrtoc16(&second, "", 0, &mbs);
        if (ret != (size_t)-3) {
                errx(EXIT_FAILURE, "didn't get second surrogate pair, "
                    "got %zd", ret);
        }
        (void) printf("UTF-16 surrogates: 0x%x 0x%x\n", first, second);
        return (0);
}
When compiled and run, this produces:
$ ./a.out
UTF-16 surrogates: 0xd83d 0xdca9
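The mbrtoc16(),
    mbrtoc32(), mbrtowc(), and
    mbrtowc_l() functions will fail if:
EILSEQ
      An invalid multi-byte character sequence was detected.
EINVAL
      The conversion state in ps is invalid.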
The
    mbrtoc16(),
    mbrtoc32(),
    mbrtowc(),
    and mbrtowc_l() functions are
    MT-Safe
    as long as different mbstate_t structures are passed
    in ps. If ps is
    NULL or different threads use the same value for
    ps, then the functions are
    Unsafe.
c16rtomb(3C), c32rtomb(3C), newlocale(3C), setlocale(3C), uselocale(3C), wcrtomb(3C), uchar.h(3HEAD), environ(7)
| June 5, 2023 | OmniOS |