idn2.conf(5) File Formats and Configurations idn2.conf(5)

idn2.conf, .idn2rc - configuration files for idnkit version 2

/etc/idn2.conf
~/.idn2rc

idn2.conf and .idn2rc are the default configuration files for version 2 of the idnkit library, a toolkit for handling internationalized domain names. idnkit version 2 supports IDNA2008 only; for IDNA2003, use idnkit version 1. idnkit version 2 also supports UTS #46, but that support is limited and experimental (see ``UTS #46 SUPPORT'').

If an application explicitly specifies a path to a configuration file, the idnkit library tries to read that file. Otherwise, the library first tries to load the user's configuration file ~/.idn2rc, and then tries the system configuration file /etc/idn2.conf. Note that the library loads only one of the two files, never both.

If no default configuration file exists on the system, the idnkit library assumes the configuration file is empty.

The configuration file is a simple text file, and each line in the file (other than comment lines, which begin with ``#'', and empty lines) forms an entry of the following format:

keyword  value ...
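As a small illustration of this format, the fragment below contains a comment line, an empty line and a single entry. The keyword and value are taken from the sample configuration later in this page and are only an example.

```
# This line is a comment and is ignored.

language ja
```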

The ``language'' entry specifies the ``current language''. The current language is used when the idnkit library performs lowercase conversion (see ``MAP ENTRY'') and language-based local mapping (see ``LANGUAGE-LOCAL ENTRY'').

The entry can be specified only once. If the entry is not specified, the library determines the current language from locale information.

syntax)

language  language

language must be an ISO639 language code. Both ISO639-1 (e.g. ``en'' for English) and ISO639-2 (e.g. ``eng'') codes are recognized.
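For example, the following fragment sets the current language to Japanese using its ISO639-1 code. Because the entry can be specified only once, the ISO639-2 form is shown as a comment rather than as a second entry.

```
# ISO639-1 code for Japanese.
language ja
# The ISO639-2 equivalent would be: language jpn
```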

The ``map'' entry specifies mapping procedures. Unlike IDNA2003, IDNA2008 doesn't define explicit mapping procedures. The idnkit library performs the mapping procedures listed in this entry.

syntax)

map  procedure ...

The following procedures are currently available:

Map uppercase letters to lowercase.
Decompose full-width and half-width characters.
Unicode Normalization Form C.
Unicode Normalization Form KC.
Map specific characters to ``.'' (U+002E; FULL STOP).
UTS #46 non-transitional mapping.
UTS #46 transitional mapping.
UTS #46 non-transitional validation.
UTS #46 transitional validation.
Language based local mapping.
TLD based local mapping.
Apply ``lowercase'', ``width'', ``nfc'' and ``delimitermap'' in that order.
Same as ``rfc5895''.
Apply ``tr46-map'', ``nfc'' and ``tr46-check'' in that order.
Apply ``tr46-map-deviation'', ``nfc'' and ``tr46-check-deviation'' in that order.

The procedures are executed in the order listed in the entry. The same procedure can be specified more than once. For example, if ``map nfc language-local nfc'' is specified, idnkit performs Unicode Normalization Form C, then language-based local mapping, and then Unicode Normalization Form C again.

The entry can be specified only once. If the entry is not specified, the library assumes that:

map rfc5895 language-local nfc

is specified.
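Since ``rfc5895'' is described above as applying ``lowercase'', ``width'', ``nfc'' and ``delimitermap'' in that order, the default entry should be equivalent to the expanded entry below. This is a reading of the procedure descriptions, not a statement about the library's internals. Only one ``map'' entry is allowed, so the shorthand form is left as a comment.

```
# Default, written with the rfc5895 shorthand:
# map rfc5895 language-local nfc
# The same sequence with rfc5895 expanded:
map lowercase width nfc delimitermap language-local nfc
```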

The ``delimiters'' entry specifies the code points that the ``delimitermap'' procedure maps to ``.'' (U+002E; FULL STOP).

The mapping is applied only when ``delimitermap'' is specified in a ``map'' entry.

syntax)

delimiters  code-point ...

code-point is the Unicode code point of a delimiter, written as a hexadecimal integer (e.g. 3002) and optionally preceded by ``U+'' (e.g. U+3002).

The entry can be specified only once. If the entry is not specified, the library assumes ``3002'' is specified.
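As a sketch, the fragment below registers three delimiters using the ``U+'' form and makes sure ``delimitermap'' is part of the mapping. The code points mirror those in the sample configuration later in this page; the ``map'' line is just one possible choice of procedures.

```
# Run delimitermap as part of the mapping.
map lowercase width nfc delimitermap
# U+3002 IDEOGRAPHIC FULL STOP, U+FF0E FULLWIDTH FULL STOP,
# U+FF61 HALFWIDTH IDEOGRAPHIC FULL STOP
delimiters U+3002 U+ff0e U+ff61
```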

The ``language-local'' entry specifies language-based local mapping.

The mapping procedure is applied only when ``language-local'' is specified in a ``map'' entry.

syntax)

language-local  language  map-file

If the current language matches language, mapping specified by map-file is performed. Otherwise no mappings are performed.

language must be an ISO639 language code. Both ISO639-1 (e.g. ``en'' for English) and ISO639-2 (e.g. ``eng'') codes are recognized.

A local mapping with ``*'' as language is the default mapping. When the current language does not match the language of any ``language-local'' entry, the default mapping is applied.
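The fragment below sketches a Turkish-specific mapping combined with a default mapping. The path of the Turkish map file is borrowed from the sample configuration later in this page; the file name default.map is hypothetical and used only for illustration.

```
# language-local takes effect only if it appears in the map entry.
map rfc5895 language-local nfc
# Applied when the current language is Turkish.
language-local tr /usr/local/share/idnkit/map/tr.map
# Applied for every other language (hypothetical file name).
language-local * /usr/local/share/idnkit/map/default.map
```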

The ``tld-local'' entry specifies TLD (top level domain) based local mapping.

The mapping is applied only when ``tld-local'' is specified in a ``map'' entry.

syntax)

tld-local  tld  map-file

If a TLD of a domain name matches tld, mapping specified by map-file is performed on the domain name. Otherwise no mappings are performed.

If tld is ``*'', mapping is applied to domain names whose TLD does not match any TLDs specified in ``tld-local'' entries. If tld is ``-'', the mapping is applied to domain names without any dots.

For backward compatibility with idnkit version 1.0, the entry name ``local-map'' can be used instead of ``tld-local''. The entry can be specified multiple times.
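As a minimal sketch, the fragment below uses only the two special values of tld described above. Both map file names are hypothetical, and the ``map'' line is just one possible choice of procedures.

```
# tld-local takes effect only if it appears in the map entry.
map rfc5895 tld-local nfc
# Fallback for names whose TLD matches no other tld-local entry
# (hypothetical file name).
tld-local * /usr/local/share/idnkit/map/default-tld.map
# Names that contain no dots at all (hypothetical file name).
tld-local - /usr/local/share/idnkit/map/nodot.map
```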

Neither idn2.conf nor ~/.idn2rc has an entry for specifying the local encoding, since the encoding is determined from the application's current locale information. In other words, each application can use a different local encoding.

Although idnkit tries hard to determine the local encoding, it sometimes fails. For example, some applications use a non-ASCII encoding but run in the C locale. In this case, you can specify the application's local encoding with the environment variable ``IDN_LOCAL_CODESET''. Set the variable to the encoding name (or an alias of it), and idnkit will use that encoding as the local one, regardless of the locale setting.

idnkit version 2 also supports UTS (Unicode Technical Standard) #46, but that support is limited, since the goal of idnkit version 2 is to support IDNA2008.

idnkit version 2 provides four mapping procedures for UTS #46: ``tr46-map'', ``tr46-map-deviation'', ``tr46-check'' and ``tr46-check-deviation''.

The input to each of these procedures is a whole domain name, not a list of labels, and the domain name may contain A-labels. ``tr46-check'' and ``tr46-check-deviation'' themselves don't split the domain name into labels or convert the A-labels in it to U-labels. Consequently, idnkit cannot apply ``tr46-check'' or ``tr46-check-deviation'' to A-labels.

The following shows a sample configuration file.

#
# a sample configuration.
#
# The current language.
language ja
# Mapping procedures.
map lowercase width nfc delimitermap language-local nfc
# Register delimiters
delimiters 3002 ff0e ff61
# Register language-specific mappings for Japanese and Turkish.
language-local ja /usr/local/share/idnkit/map/ja.map
language-local tr /usr/local/share/idnkit/map/tr.map

/etc/idn2.conf
~/.idn2rc
/etc/idn2.conf.sample - sample configuration with comments

idncheck(1), idncmp(1), idnconv2(1), iconv(3), libidnkit(3), idnalias.conf(5), idnlang.conf(5)

September 21, 2012 OmniOS