|INTRO(9F)||Kernel Functions for Drivers||INTRO(9F)|
Most manual pages are similar to those in other sections. They have common fields such as the NAME, a SYNOPSIS to show which header files to include and prototypes, an extended DESCRIPTION discussing its use, and the common combination of RETURN VALUES and ERRORS. Some manuals will have examples and additional manuals to reference in the SEE ALSO section.
DDI_FAILURE, indicating success and failure respectively. Some functions will return additional error codes to indicate why something failed. In general, when checking a response code, it is always preferred to compare whether the value equals or does not equal
DDI_SUCCESS, as there can be many different error cases and additional ones can be added over time.
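The advice about comparing against the success value can be illustrated with a small userland model. This is only a sketch: the MODEL_ constants and the model_op() function are hypothetical stand-ins with illustrative values, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a success code and several failure codes. */
enum {
	MODEL_SUCCESS = 0,
	MODEL_FAILURE = -1,
	MODEL_EINVAL = -4
};

/* Hypothetical operation that can fail in more than one way. */
static int
model_op(int arg)
{
	if (arg < 0)
		return (MODEL_EINVAL);
	if (arg == 0)
		return (MODEL_FAILURE);
	return (MODEL_SUCCESS);
}

/* Preferred: anything other than success is treated as an error. */
static bool
op_ok(int arg)
{
	return (model_op(arg) == MODEL_SUCCESS);
}

/* Fragile: only the generic failure code is recognized as an error. */
static bool
op_ok_fragile(int arg)
{
	return (model_op(arg) != MODEL_FAILURE);
}
```

With these definitions, op_ok_fragile(-1) reports success even though the operation returned MODEL_EINVAL, which is the kind of mistake that comparing against the success value avoids.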
When executing high-level interrupts, the thread may only execute a limited number of functions. In particular, it may call ddi_intr_trigger_softint(9F), mutex_enter(9F), and mutex_exit(9F). It is critical that the mutex being used be properly initialized with the driver's interrupt priority. The system will transparently pick the correct implementation of a mutex based on the interrupt type. Aside from the above, one must not block while in high-level interrupt context.
On the other hand, when a thread is not in high-level
interrupt context, most of these restrictions are lifted. Kernel memory
may be allocated (if using a non-blocking allocation such as
KM_NOSLEEP_LAZY), and many of the other
documented functions may be called.
Regardless of whether a thread is in high-level or low-level interrupt context, it will never have a user context associated with it and therefore cannot use routines like ddi_copyin(9F) or ddi_copyout(9F).
Every function listed below has its own manual page in section 9F and can be read with man(1). In addition, some corresponding concepts are documented in section 9 and some groups of functions are present to support a specific type of device driver, which is discussed more in section 9E.
The console should be used sparingly. While a notice may be found there, one should assume that it may be missed, whether due to overflow, a serial console not being connected at the time, or some other reason. While the system log is better than the console, take care not to spam the log. Imagine if someone logged every time a network packet was generated or received: the log would quickly run out of space and it would become harder to find useful messages about bizarre behavior. It's also important to remember that only system administrators and privileged users can actually see this log. Where possible and appropriate, use programmatic errors in routines that allow it.
The system also supports a structured event log called a system event that is processed by syseventd(8). This is used by the OS to provide notifications for things like device insertion and removal or the change of a data link. These are driven by the ddi_log_sysevent(9F) function and allow arbitrary additional structured metadata in the form of an nvlist_t.
When allocating memory, an important choice must be made: whether
or not to block for memory. If one opts to perform a sleeping allocation,
then the caller can be guaranteed that the allocation will succeed, but it
may take some time and the thread will be blocked during that entire
duration. This is the
KM_SLEEP flag. On the other
hand, there are many circumstances where this is not appropriate, especially
because a thread that is inside a memory allocation function cannot
currently be cancelled. If the thread corresponds to a user process, then it
will not be killable.
Given that there are many situations where this is not
appropriate, the kernel offers an allocation mode where it will not block
for memory to be available:
KM_NOSLEEP_LAZY. These allocations can fail and
return NULL when they do fail. Even though these are
said to be no sleep operations, that does not mean that the caller may not
end up temporarily blocked due to mutex contention or due to trying a bit
more aggressively to reclaim memory in the case of
KM_NOSLEEP. Unless operating in special
circumstances, KM_NOSLEEP_LAZY should be preferred.
If a device driver has its own complex object that has more significant set up and tear down costs, then the kmem cache function family should be considered. To use a kmem cache, it must first be created using the kmem_cache_create(9F) function, which requires specifying the size, alignment, and constructors and destructors. Individual objects are allocated from the cache with the kmem_cache_alloc(9F) function. An important constraint when using the caches is that when an object is freed with kmem_cache_free(9F), it is the caller's responsibility to ensure that the object is returned to its constructed state prior to freeing it. If the object is reused prior to the kernel reclaiming the memory for other uses, then the constructor will not be called again. Most device drivers do not need to create a kmem cache for their own allocations.
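The constructed-state rule can be seen in a small userland model of an object cache. This is a sketch under stated assumptions: every model_ name is hypothetical, the free list is a simple stack, and none of it reflects how the kernel's kmem caches are actually implemented.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical cached object. */
typedef struct model_obj {
	int mo_ready;			/* set once by the "constructor" */
	int mo_refcnt;			/* per-use state; must be 0 when freed */
	struct model_obj *mo_free_next;	/* free-list linkage */
} model_obj_t;

/* Hypothetical cache: a free list plus a count of constructor calls. */
typedef struct model_cache {
	model_obj_t *mc_free;
	unsigned int mc_nconstructed;
} model_cache_t;

/* Reuse a constructed object when one is available, else construct one. */
static model_obj_t *
model_cache_alloc(model_cache_t *mc)
{
	model_obj_t *obj = mc->mc_free;

	if (obj != NULL) {
		mc->mc_free = obj->mo_free_next;
		obj->mo_free_next = NULL;
		return (obj);		/* the constructor is NOT run again */
	}

	obj = calloc(1, sizeof (*obj));
	if (obj == NULL)
		return (NULL);
	obj->mo_ready = 1;		/* the one-time "constructor" work */
	mc->mc_nconstructed++;
	return (obj);
}

/* The caller must have restored the constructed state before freeing. */
static void
model_cache_free(model_cache_t *mc, model_obj_t *obj)
{
	assert(obj->mo_refcnt == 0);
	obj->mo_free_next = mc->mc_free;
	mc->mc_free = obj;
}
```

In this model, allocating, freeing, and allocating again hands back the same object while the constructor count stays at one, so any per-use state left behind by the first user would be visible to the second.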
If you are writing a device driver that is trying to interact with the networking, STREAMS, or USB subsystems, then they are generally using the mblk_t data structure which is managed through a different set of APIs, though they are leveraging kmem under the hood.
The vmem set of interfaces allows for the management of abstract regions of integers, generally representing memory or some other object, each with an offset and length. While it is not common that a device driver needs to do its own such management, vmem_create(9F) and vmem_alloc(9F) are what to reach for when the need arises. If one instead needs to model a set of integers where each is a valid identifier, for example allocating every integer between 0 and 1000 as a distinct identifier, then rather than using vmem, use id_space_create(9F), which is discussed in Identifier Management. For more information on vmem, see vmem(9).
An nvlist_t structure is initialized with the nvlist_alloc(9F) function and can operate with two different degrees of uniqueness: a mode where only names are unique, or a mode where every name is qualified by its type. The former means that if there is an integer named “foo” and one then adds a string, array, or any other value with the same name, the existing pair will be replaced. However, if the name and type together are treated as unique, then the value would only be replaced if both the pair's type and the name “foo” matched a pair that was already present. Otherwise, the two different entries would co-exist.
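The difference between the two uniqueness modes can be sketched with a tiny userland model that tracks only names and types. Everything here is hypothetical (the model_ types, the fixed-size array, and the in-place replacement); it is not the nvlist implementation and it stores no values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { MT_INT, MT_STRING } model_type_t;
typedef enum { MODEL_UNIQUE_NAME, MODEL_UNIQUE_NAME_TYPE } model_unique_t;

typedef struct model_pair {
	const char *mp_name;
	model_type_t mp_type;
} model_pair_t;

#define	MODEL_MAX_PAIRS	8

typedef struct model_list {
	model_unique_t ml_mode;
	size_t ml_npairs;
	model_pair_t ml_pairs[MODEL_MAX_PAIRS];
} model_list_t;

static void
model_list_init(model_list_t *ml, model_unique_t mode)
{
	memset(ml, 0, sizeof (*ml));
	ml->ml_mode = mode;
}

/* Add a pair, replacing an existing one that matches under the list's mode. */
static bool
model_add(model_list_t *ml, const char *name, model_type_t type)
{
	for (size_t i = 0; i < ml->ml_npairs; i++) {
		model_pair_t *mp = &ml->ml_pairs[i];

		if (strcmp(mp->mp_name, name) != 0)
			continue;
		if (ml->ml_mode == MODEL_UNIQUE_NAME_TYPE &&
		    mp->mp_type != type)
			continue;
		mp->mp_type = type;	/* replace the matching pair */
		return (true);
	}

	if (ml->ml_npairs == MODEL_MAX_PAIRS)
		return (false);
	ml->ml_pairs[ml->ml_npairs].mp_name = name;
	ml->ml_pairs[ml->ml_npairs].mp_type = type;
	ml->ml_npairs++;
	return (true);
}
```

Adding an integer “foo” and then a string “foo” leaves one pair in the name-only mode and two pairs in the name-and-type mode.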
When constructing an nvlist, it is normally backed by the normal kmem allocator and may either use sleeping or non-sleeping allocations. It is also possible to use a custom allocator, though that generally has not been necessary in the kernel.
Specific keys and values can be looked up directly with the nvlist_lookup family of functions, but the entire list can be iterated as well, which is especially useful when trying to validate that no unknown keys are present in the list. The iteration API nvlist_next_nvpair(9F) allows one to get the key's name, the type of the pair's value, and then the value itself.
Due to the current implementation, callers should generally prefer the non-sleeping variants because the sleeping ones are not cancellable (currently this is backed by vmem, but this should not be assumed and may change in the future).
In addition, condition variables provide means for waiting and detecting that a signal has been delivered. These variants are particularly useful when writing character device operations for device drivers as it allows users the chance to cancel an operation and not be blocked indefinitely on something that may not occur. These _sig variants should generally be preferred where applicable.
The kernel also provides memory barrier primitives. See the Memory Barriers section for more information. There is no need to use manual memory barriers when using the synchronization primitives. The synchronization primitives ensure that the appropriate barriers are present to maintain coherency while the lock is held.
The mod_install(9F) and mod_remove(9F) functions are used during a driver's _init(9E) and _fini(9E) functions.
There are two different ways that drivers often manage their instance state which is created during attach(9E). The first is the use of ddi_set_driver_private(9F) and ddi_get_driver_private(9F). This stores a driver-specific value on the dev_info_t structure which allows it to be used during other operations. Some device driver frameworks may use this themselves, making this unavailable to the driver.
The other path is to use the soft state suite of functions which dynamically grows to cover the number of instances of a device that exist. The soft state is generally initialized in the _init(9E) entry point with ddi_soft_state_init(9F) and then instances are allocated and freed during attach(9E) and detach(9E) with ddi_soft_state_zalloc(9F) and ddi_soft_state_free(9F), and then retrieved with ddi_get_soft_state(9F).
There are many different informational properties about a device driver. For example, ddi_driver_name(9F) returns the name of the device driver, ddi_get_name(9F) returns the name of the node in the tree, ddi_get_parent(9F) returns a node's parent, and ddi_get_instance(9F) returns the instance number of a specific driver.
There are a series of properties that exist on the tree, the exact set of which depend on the class of the device and are often documented in a specific device class's manual. For example, the “reg” property is used for PCI and PCIe devices to describe the various base address registers, their types, and related, which are documented in pci(5).
When getting a property, one can either constrain the lookup to the current instance or ask for a parent to try to look up the property. Which mode is appropriate depends on the specific class of driver, its parent, and the property.
Using a dev_info_t * pointer has to be done carefully. When a device driver is in any of its dev_ops(9S), cb_ops(9S), or similar callback functions that it has registered with the kernel, then it can always safely use its own dev_info_t and those of any parents it discovers through ddi_get_parent(9F). However, it cannot assume the validity of any siblings or children unless there are other circumstances that guarantee that they will not disappear. In the broader kernel, one should not assume that it is safe to use a given dev_info_t * structure without the appropriate NDI (nexus driver interface) hold having been applied.
To facilitate accessing memory, the kernel provides a few routines that can be used. In most contexts the main things to use are ddi_copyin(9F) and ddi_copyout(9F). These will safely dereference addresses and ensure that the address is appropriate depending on whether this is coming from the user or kernel. When operating with the kernel's uio_t structure, which is mostly used when processing read and write requests, uiomove(9F) is the go-to function instead.
When reading data from userland into the kernel, there is another concern: the data model. The most common place this comes up is in an ioctl(9E) handler or other places where the kernel is operating on data that isn't fixed size. Particularly in C, though this applies to other languages, structures and unions vary in the size and alignment requirements between 32-bit and 64-bit processes. The same even applies if one uses pointers or the long, size_t, or similar types in C. In supported 32-bit and 64-bit environments these types are 4 and 8 bytes respectively. To account for this, when data is not fixed size between all data models, the driver must look at the data model of the process it is copying data from.
The simplest way to solve this problem is to try to make the data
structure the same across the different models. It's not sufficient to just
use the same structure definition and fixed size types as the alignment and
padding between the two can vary. For example, the alignment of a 64-bit
integer like a uint64_t can change between a 32-bit
and 64-bit data model. One way to check for the data structures being
identical is to leverage the
ctfdiff(1) program, generally with
However, there are times when a structure simply can't be the same, such as when we're encoding a pointer into the structure or a type like the size_t. When this happens, the most natural way to accomplish this is to use the ddi_model_convert_from(9F) function which can determine the appropriate model from the ioctl's arguments. This provides a natural way to copy a structure in and out in the appropriate data model and convert it at those points to the kernel's native form.
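The convert-at-the-boundary approach can be sketched in userland as follows. This is a model only: the model_req structures, the model_t enumeration, and model_req_copyin() are hypothetical, and plain memcpy() stands in for copying data from the caller. A real driver would determine the caller's data model with ddi_model_convert_from(9F) and copy with ddi_copyin(9F).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical request in the kernel's native (64-bit) form. */
typedef struct model_req {
	uint64_t mr_buf;	/* pointer-sized user address */
	uint64_t mr_len;	/* size_t-sized length */
	uint32_t mr_flags;
} model_req_t;

/* The same request as a 32-bit caller would lay it out. */
typedef struct model_req32 {
	uint32_t mr_buf;
	uint32_t mr_len;
	uint32_t mr_flags;
} model_req32_t;

/* Hypothetical data model tag for the calling process. */
typedef enum { MODEL_ILP32, MODEL_LP64 } model_t;

/*
 * Copy a request in according to the caller's data model and widen it
 * to native form. Returns 0 on success and -1 if the buffer is too short.
 */
static int
model_req_copyin(const void *raw, size_t rawlen, model_t model,
    model_req_t *out)
{
	if (model == MODEL_ILP32) {
		model_req32_t r32;

		if (rawlen < sizeof (r32))
			return (-1);
		memcpy(&r32, raw, sizeof (r32));
		out->mr_buf = r32.mr_buf;
		out->mr_len = r32.mr_len;
		out->mr_flags = r32.mr_flags;
		return (0);
	}

	if (rawlen < sizeof (*out))
		return (-1);
	memcpy(out, raw, sizeof (*out));
	return (0);
}
```

The rest of the driver then only ever sees the native model_req_t; the reverse conversion, including checks that values still fit in the 32-bit fields, would happen when copying results back out.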
An alternate way to approach the data model is to use the STRUCT_DECL(9F) functions, but as this requires wrapping every access to every member, often times the ddi_model_convert_from(9F) approach and taking care of converting values and ensuring that limits aren't exceeded at the end is preferred.
To begin with register setup, one often first looks at the number
of register sets that exist and their size. Most PCI-based device drivers
will skip calling
ddi_dev_nregs(9F) and will
just move straight to calling
determine the size of a register set that they are interested in. To
actually map the registers, a device driver will call
which requires both a register set and a series of attributes and returns an
access handle that is used to actually read and write the registers. When
setting up registers, one must have a corresponding
ddi_device_acc_attr_t structure which is used to
define what endianness the register set is in, whether any kind of
reordering is allowed (if in doubt specify
DDI_STRICTORDER_ACC), and whether any particular
error handling is being used. The structure and all of its different options
are described in
Once a register handle is obtained, it's easy to read and write the register space. Functions are organized based on the size of the access. Most situations call for the use of the ddi_get8(9F), ddi_get16(9F), ddi_get32(9F), and ddi_get64(9F) functions to read a register and the ddi_put8(9F), ddi_put16(9F), ddi_put32(9F), and ddi_put64(9F) functions to set a register value. While there are the ddi_io_ and ddi_mem_ families of functions below, these are not generally needed and are mostly present for compatibility. The kernel will automatically perform the appropriate type of register read for the device type in question.
Once a register set is no longer being used, the ddi_regs_map_free(9F) function should be used to release resources. In most cases, this happens while executing the detach(9E) entry point.
The first thing that a driver needs to do to set up DMA is to understand the constraints of the device and bus. These constraints are described in a series of attributes in the ddi_dma_attr_t structure which is defined in ddi_dma_attr(9S). Attributes exist because different devices, and sometimes different memory uses with a device, have different requirements for memory. A simple example of this is that not all devices can accept memory addresses that are 64 bits wide and may have to be constrained to the lower 32 bits of memory. Another common constraint is how this memory is chunked up. Some devices may require that all of the DMA memory be contiguous, while others can allow that to be broken up into, say, up to 4 or 8 different regions.
When memory is allocated for DMA it isn't immediately mapped into the kernel's address space. The addresses that describe a DMA address are defined in a DMA cookie, several of which may make up a request. However, those addresses are always physical addresses or addresses that are virtualized by an IOMMU. There are some cases where the kernel or a driver needs to be able to access that memory, such as memory that represents a networking packet. The IP stack will expect to be able to actually read the data it's given.
To begin with allocating DMA memory, a driver first fills out its attribute structure. Once that's ready, the DMA allocation process can begin. This starts off by a driver calling ddi_dma_alloc_handle(9F). This handle is used through the lifetime of a given DMA memory buffer, but it can be used across multiple operations that a device or the kernel may perform. The next step is to actually request that the kernel allocate some amount of memory in the kernel for this DMA request. This phase actually allocates addresses in virtual address space for the activity and also requires a register attribute object that is discussed in Device Register Setup and Access. Armed with this a driver can now call ddi_dma_mem_alloc(9F) to specify how much memory they are looking for. If this is successful, a virtual address, the actual length of the region, and an access handle will be returned.
At this point, the virtual address region is present. Most drivers will access this virtual address range directly and will ignore the register access handle. The side effect of this is that they will handle all endianness issues with the memory region themselves. If the driver would prefer to go through the handle, then it can use the register access functions discussed earlier.
Before the memory can be programmed into the device, it must be bound to a series of physical addresses or addresses virtualized by an IOMMU. While the kernel presents the illusion of a single consistent virtual address range for applications, the physical reality can be quite different. When the driver is ready it calls ddi_dma_addr_bind_handle(9F) to create the mapping to well known physical addresses.
These addresses are stored in a series of cookies. A driver can determine the number of cookies for a given request by utilizing its DMA handle and calling ddi_dma_ncookies(9F) and then pairing that with ddi_dma_cookie_get(9F). These DMA cookies will not change and can be used time and time again until ddi_dma_unbind_handle(9F) is called. With this information in hand, a physical device can be programmed with these addresses and let loose to perform I/O.
When performing I/O to and from a device, synchronization is vitally important: it ensures that the actual state in memory is coherent with the rest of the CPU's internal structures, such as caches. In general, a given DMA request only goes in one direction, either for the device or for the local CPU. In both cases the ddi_dma_sync(9F) function must be called. When the kernel writes to a region of DMA memory, it must call the function after it is done writing and before it triggers the device. When the device writes, the kernel must call it after the device has indicated that some activity has completed and before the kernel checks the results.
Some DMA operations utilize what are called DMA windows. The most common consumer is something like a disk device where DMA operations to a given series of sectors can be split up into different chunks where as long as all the transfers are performed, the intermediate states are acceptable. Put another way, because of how SCSI and SAS commands are designed, block devices can basically take a given I/O request and break it into multiple independent I/Os that will equate to the same final item.
When a device supports this mode of operation and it is opted into, then a DMA allocation may result in the use of DMA windows. This allows for cases where the kernel can't perform a DMA allocation for the entire request, but instead can allocate a partial region and then walk through each part one at a time. This is uncommon outside of block devices and usually also is related to calling ddi_dma_buf_bind_handle(9F).
Drivers first need to know how many interrupts they require. For example, a networking driver may want to have an interrupt made available for each ring that it has. To discover the number of interrupts available, the driver should call ddi_intr_get_navail(9F). If there are sufficient interrupts, it can proceed to actually allocate the interrupts with ddi_intr_alloc(9F). When allocating interrupts, callers need to check to see how many interrupts the system actually gave them. Just because an interrupt is allocated does not mean that it will fire or be ready to use; there are a series of additional steps that the driver must take.
To enable the interrupt, the driver should first get the interrupt capabilities with ddi_intr_get_cap(9F) and the priority of the interrupt with ddi_intr_get_pri(9F). The priority must be used while creating mutexes and related synchronization primitives that will be used during the interrupt handler. At this point, the driver can go ahead and register the functions that will be called with each allocated interrupt with the ddi_intr_add_handler(9F) function. The arguments can vary for each allocated interrupt. It is common to have an interrupt-specific data structure or an interrupt number passed in one of the arguments, while the other argument is generally the driver's instance-specific data structure.
At this point, the last step for the interrupt to be made active
from the kernel's perspective is to enable it. This will use either the
functions depending on the interrupt's capabilities. The reason that these
are different is because some interrupt types (MSI) require that all
interrupts in a group be enabled and disabled at the same time. This is
indicated with the
DDI_INTR_FLAG_BLOCK flag found in
the interrupt's capabilities. Once that is called, interrupts that are
generated by a device will be delivered to the registered function.
It's important to note that there is often device-specific interrupt setup that is required. While the kernel takes care of updating any pieces of the processor's interrupt controller, I/O crossbar, or the PCI MSI and MSI-X capabilities, many devices have device-specific registers that are used to manage, set up, and acknowledge interrupts. These registers or other controls are often capable of separately masking interrupts and are generally what should be used if there are times that you need to separately enable or disable interrupts such as to poll an I/O ring.
When unwinding interrupts, one needs to work in the reverse order here. Until ddi_intr_block_disable(9F) or ddi_intr_disable(9F) is called, one should assume that their interrupt handler will be called. Due to cases where an interrupt is shared between multiple devices, this can happen even if the device is quiesced! Only after that is done is it safe to then free the interrupts with a call to ddi_intr_free(9F).
In UNIX tradition, character, block, and STREAMS device special files are identified by a major and minor number. All instances of a given driver share the same major number, which means that a device driver must coordinate the minor number space across all instances. While a minor node is created with a fixed minor number, it is possible to change the minor number while processing an open(9E) call, allowing subsequent character device operations to uniquely identify a particular caller. This is usually referred to as a driver that “clones”.
When drivers aren't performing cloning, then usually the minor
number used when creating the minor node is some fixed offset or multiple of
the driver's instance number. When cloning and a driver needs to allocate
and manage a minor number space, usually an ID space is leveraged whose IDs
are usually in the range from 0 through
There are several different strategies for tracking data structures as they
relate to minor numbers. Sometimes, the soft state functionality is used.
Others might keep an AVL tree around or tie the data to some other data
structure. The method chosen often varies on the specifics of the
implementation and its broader context.
The dev_t structure represents the combined major and minor number. It can be taken apart with the getmajor(9F) and getminor(9F) functions and then reconstructed with the makedevice(9F) function.
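The relationship between the combined device number, its major and minor parts, and a minor number derived from an instance number can be sketched in userland. This is a model only: the model_ functions are hypothetical stand-ins, and the 32-bit split and the 16 units per instance are assumptions made for illustration, not a statement of the kernel's actual layout. Drivers should always use the real functions rather than shifting bits themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this model: 32 bits of major above 32 bits of minor. */
#define	MODEL_MINOR_BITS	32
#define	MODEL_MINOR_MASK	0xffffffffULL

/* Hypothetical: each instance owns a block of 16 minor numbers. */
#define	MODEL_UNITS_PER_INST	16

typedef uint64_t model_dev_t;

static model_dev_t
model_makedevice(uint32_t major, uint32_t minor)
{
	return (((model_dev_t)major << MODEL_MINOR_BITS) | minor);
}

static uint32_t
model_getmajor(model_dev_t dev)
{
	return ((uint32_t)(dev >> MODEL_MINOR_BITS));
}

static uint32_t
model_getminor(model_dev_t dev)
{
	return ((uint32_t)(dev & MODEL_MINOR_MASK));
}

/* Minor number as a multiple of the instance plus a per-instance unit. */
static uint32_t
model_minor(uint32_t instance, uint32_t unit)
{
	assert(unit < MODEL_UNITS_PER_INST);
	return (instance * MODEL_UNITS_PER_INST + unit);
}

/* Recover the instance number from a minor number built by model_minor(). */
static uint32_t
model_minor_to_instance(uint32_t minor)
{
	return (minor / MODEL_UNITS_PER_INST);
}
```

A non-cloning driver following this scheme can map any minor number it is handed back to its instance, and from there to its per-instance state.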
The high-resolution clock is implemented using an architecture and platform-specific means. For example, on x86 it is generally backed by the TSC (time stamp counter).
In general, this time should not be used by drivers for any purpose. It can jump around, drift, and most aspects in the kernel are not based on the real-time clock. For any device timing activities, the high-resolution clock should be used.
In general, drivers should prefer the high-resolution monotonic clock for tracking events internally.
With these different timing mechanisms, the kernel provides a few different ways to delay execution or to get a callback after some amount of time passes.
The delay(9F) and drv_usecwait(9F) functions are used to block the execution of the current thread. delay(9F) can be used in conditions where sleeping and blocking is allowed, whereas drv_usecwait(9F) is a busy-wait, which is appropriate for some device drivers, particularly when in high-level interrupt context.
The kernel also allows a function to be called after some time has elapsed. This callback occurs on a different thread and will be executed in kernel context. A timeout can be scheduled in the future with the timeout(9F) function and cancelled with the untimeout(9F) function. There is also a STREAMS-specific version, the qtimeout(9F) function, that can be used if circumstances require it.
These are all considered one-shot events. That is, they will only happen once after being scheduled. If instead, a driver requires periodic behavior, such as needing something to occur every second, then it should use the ddi_periodic_add(9F) function to establish that.
While task queues are a flexible mechanism for handling and processing events that occur in a well defined context, they do not have an inherent backpressure mechanism built in. This means it is possible to add events to a task queue faster than they can be processed. For high-volume events, this must be considered before just dispatching an event. Do not rely on a non-sleeping allocation in the task queue dispatch context.
Most operations that device drivers implement are given a credential. However, from within the kernel, a credential can be obtained that refers to a specific zone, the current process, or a generic kernel credential.
It is up to drivers and the kernel writ large to check whether a
given credential is authorized to perform a given operation. This is
encapsulated by the various privilege checks that exist. The most common
check used is drv_priv(9F) which
For device drivers, particularly those that represent block devices, they should first call ddi_devid_init(9F) to initialize the device ID data structure. After that is done, it is then safe to call ddi_devid_register(9F) to notify the kernel about the ID.
Message blocks are chained together by a series of two different
pointers: b_cont and b_next.
When a message is split across multiple data buffers, they are linked by the
b_cont pointer. However, multiple distinct messages
can be chained together and linked by the b_next
pointer. Let's look at this in the context of a series of networking
packets. If we had a chain of say 10 UDP packets that we were given, each
UDP packet is considered an independent message and would be linked from one
to the next based on the order they should be transmitted with the
b_next pointer. However, an individual message may be
entirely in one message block, in which case its
b_cont pointer would be
but if say the packet were split into a 100 byte data buffer that contained
the headers and then a 1000 byte data buffer that contained the actual
packet data, those two would be linked together by
b_cont. A continued message would never have its next
pointer used to link it to a wholly different message. Visually you might
see this as:
+---------------+
| UDP Message 0 |
| Bytes 0-1100  |
| b_cont     ---+--> NULL
| b_next +      |
+---------|-----+
          |
          v
+---------------+    +----------------+
| UDP Message 1 |    | UDP Message 1+ |
| Bytes 0-100   |    | Bytes 100-1100 |
| b_cont     ---+--> | b_cont     ----+->NULL
| b_next +      |    | b_next     ----+->NULL
+---------|-----+    +----------------+
          |
         ...
          |
          v
+---------------+
| UDP Message 9 |
| Bytes 0-1100  |
| b_cont     ---+--> NULL
| b_next     ---+--> NULL
+---------------+
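The two kinds of linkage can also be shown with a simplified userland model. The model_mblk_t structure below is hypothetical: it keeps only the b_next and b_cont pointers and replaces the real read and write pointers with a single length field, so it is a sketch of the chaining rules rather than the actual mblk_t.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified message block. */
typedef struct model_mblk {
	struct model_mblk *b_next;	/* next independent message */
	struct model_mblk *b_cont;	/* next block of the same message */
	size_t b_len;			/* stand-in for the block's data length */
} model_mblk_t;

/* Bytes in one message: follow b_cont only. */
static size_t
model_msgsize(const model_mblk_t *mp)
{
	size_t total = 0;

	for (; mp != NULL; mp = mp->b_cont)
		total += mp->b_len;
	return (total);
}

/* Number of independent messages in a chain: follow b_next only. */
static size_t
model_chain_count(const model_mblk_t *mp)
{
	size_t count = 0;

	for (; mp != NULL; mp = mp->b_next)
		count++;
	return (count);
}

/* Bytes in a whole chain: follow b_next, sizing each message via b_cont. */
static size_t
model_chain_bytes(const model_mblk_t *mp)
{
	size_t total = 0;

	for (; mp != NULL; mp = mp->b_next)
		total += model_msgsize(mp);
	return (total);
}
```

Note that only the first block of each message is ever reached through b_next; continuation blocks are reached solely through b_cont, matching the rule that a continued block never uses its next pointer.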
Message blocks all have an associated data block which contains
the actual data that is present. Multiple message blocks can share the same
data block as well. The data block has a notion of a type, which is
M_DATA which signifies that they operate
To allocate message blocks, one generally uses the allocb(9F) function to create one; however, you can also create message blocks using your own source of data through functions like desballoc(9F). This is generally used when one wants to use memory that was originally used for DMA to pass data back into the kernel, such as in a networking device driver. When this happens, a callback function will be called once the last user of the data block is done with it.
The functions listed below often end in either “msg” or “b” to indicate that they will operate on an entire message and follow the b_cont pointer or they will not respectively.
The ddi_ufm_init(9F) and ddi_ufm_fini(9F) functions are used to indicate support of the subsystem to the kernel. The driver is required to use the ddi_ufm_update(9F) function to indicate both that it is ready to receive UFM requests and to indicate that any data that the kernel may have previously received has changed. Once that's completed, then the other functions listed here are generally used as part of implementing specific callback functions that are registered.
A driver can then go through and perform arbitrary reads of the firmware file through the firmware_read(9F) interface until they have read everything that they need. Once complete, the corresponding handle needs to be released through the firmware_close(9F) function.
To begin, a driver must declare which capabilities it implements during its attach(9E) function by calling ddi_fm_init(9F). The set of capabilities it receives back may be less than what was requested because the capabilities are dependent on the overall chain of drivers present.
DDI_FM_EREPORT_CAPABLE was negotiated,
then the driver is expected to generate error events when certain conditions
occur using the
function or the more specific
function. If a caller has negotiated
DDI_FM_ACCCHK_CAPABLE, then it is allowed to set up
its register attributes to indicate that it will check for errors on the
register handle after using functions like
ddi_put8(9F) by calling
and reacting accordingly. Similarly, if a driver has negotiated
DDI_FM_DMACHK_CAPABLE, then it will use
to check the results of DMA activity and handle the results appropriately.
Similar to register accesses, the DMA attributes must be updated to set that
error handling is anticipated on this handle. The
ddi_fm_init(9F) manual page has
an overview of the other types of flags that can be negotiated and how they
Device drivers register initially with the kernel by using the scsi_hba_init(9F) function and then, in their attach routine, register specific instances, using functions like scsi_hba_iport_register(9F) or instead scsi_hba_tran_alloc(9F) and scsi_hba_attach_setup(9F). New drivers are encouraged to use the target map and iports framework to simplify the device driver writing process.
To initially obtain a struct buf, drivers should begin by calling getrbuf(9F), at which point the caller can fill in the structure. Once that's done, the physio(9F) function can be used to actually perform the I/O and wait until it's complete.
Once a given configuration, sometimes the default, is selected, then the driver can proceed to opening up what the USB architecture calls a pipe, which provides a way to send requests to a specific USB endpoint. First, specific endpoints can be looked up using the usb_lookup_ep_data(9F) function which gets information from the parsed descriptors and then that gets filled into an extended descriptor with usb_ep_xdescr_fill(9F). With that in hand, a pipe can be opened with usb_pipe_xopen(9F).
Once a pipe has been opened, which most often happens in a driver's attach(9E) entry point, then requests can be allocated and submitted. There is a different allocation for each type of request (e.g. usb_alloc_bulk_req(9F)) and a different submission function for each type as well. Each request structure has a corresponding page in section 9S that describes the structure, its members, and how to work with it.
One other major concern for USB devices, which isn't as common with other types of devices, is that they can be yanked out and reinserted at any time. To help determine when this happens, the kernel offers the usb_register_event_cbs(9F) function which allows a driver to register for callbacks when a device is disconnected, reconnected, or around checkpoint suspend/resume behavior.
To access PCI configuration space, a device driver should first call pci_config_setup(9F). Generally, drivers will call this in their attach(9E) entry point and then tear down the configuration space access with the pci_config_teardown(9F) function in detach(9E). After setting up access to configuration space, the returned handle can be used in all of the various configuration space routines to get and set specifically sized values in configuration space.
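What it means to get and set sized values can be illustrated with a userland model of configuration space. This is a sketch, not kernel code: cfg_model_t and the cfg_model_*() functions are hypothetical stand-ins for the handle and the sized accessors (such as pci_config_get16(9F)), and the model is simply a 256-byte array. It relies only on the facts that PCI configuration space is little-endian and that the 16-bit vendor and device IDs sit at offsets 0x00 and 0x02.

```c
#include <stdint.h>
#include <stddef.h>

#define	CFG_VENDOR_ID	0x00	/* offset of the 16-bit vendor ID */
#define	CFG_DEVICE_ID	0x02	/* offset of the 16-bit device ID */

/* Hypothetical stand-in for a configuration space handle. */
typedef struct cfg_model {
	uint8_t cm_space[256];
} cfg_model_t;

/* Read a single byte. The caller must keep off within the array. */
static uint8_t
cfg_model_get8(const cfg_model_t *cm, size_t off)
{
	return (cm->cm_space[off]);
}

/* Read a 16-bit little-endian value starting at off. */
static uint16_t
cfg_model_get16(const cfg_model_t *cm, size_t off)
{
	return ((uint16_t)(cm->cm_space[off] | (cm->cm_space[off + 1] << 8)));
}

/* Write a 16-bit value at off in little-endian byte order. */
static void
cfg_model_put16(cfg_model_t *cm, size_t off, uint16_t val)
{
	cm->cm_space[off] = (uint8_t)(val & 0xff);
	cm->cm_space[off + 1] = (uint8_t)(val >> 8);
}
```

In a real driver the byte ordering is handled by the configuration space routines themselves, so the driver only chooses the accessor whose width matches the register being read or written.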
STREAMS messages are passed around using message blocks, which use the mblk_t type. See Message Block Functions for more about the data structure and the functions that manipulate message blocks.
These functions should generally not be used when implementing a networking device driver today. See mac(9E) instead.
M_IOCTL, then these routines can often be used to convert the structure into one that asks for data to be copied in, copied out, or to finally acknowledge the ioctl as successful or to terminate the processing in error.
Kernel statistics are grouped using a tuple of four identifiers,
separated by colons when using
kstat(8). These are, in order, the
statistic module name, instance, a name which covers a group of statistics,
and an individual name for a statistic. In addition, kernel statistics have
a class which is used to group similar named groups of statistics together
across devices. When creating a kstat, drivers
specify the first three parts of the tuple and the class. The naming of
individual statistics, the last part of the tuple, varies based upon the
type of the statistic. For the most part, drivers will use the kstat type
KSTAT_TYPE_NAMED, which allows multiple name-value
pairs to exist within the statistic. For example, the kernel's layer 2
networking framework, mac(9E), creates a
kstat with the driver's name and instance and names it “mac”.
Within this named group, there are statistics for all of the different
individual stats that the kernel and devices track such as bytes transmitted
and received, the state and speed of the link, and advertised and enabled
A device driver can initialize a kstat with the kstat_create(9F) function. It will not be made accessible to users until the kstat_install(9F) function is called. The device driver must perform additional initialization of the kstat before proceeding and calling kstat_install(9F). The kstat structure that drivers see is discussed in kstat(9S).
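As an illustration of the four-part naming described above, the following userland sketch builds the colon-separated form that users see. The function kstat_model_name() is hypothetical and is not part of the kstat interfaces, and the driver name used in the usage note is made up.

```c
#include <stdio.h>

/*
 * Hypothetical helper: format module, instance, name, and statistic as
 * "module:instance:name:statistic". Returns the snprintf(3C) result.
 */
static int
kstat_model_name(char *buf, size_t len, const char *module, int instance,
    const char *name, const char *stat)
{
	return (snprintf(buf, len, "%s:%d:%s:%s", module, instance, name,
	    stat));
}
```

For example, calling it with a made-up driver named "exampledrv", instance 0, the group name "mac", and a statistic named "obytes" yields "exampledrv:0:mac:obytes".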
The ddi_cb_register(9F) function is used to register a callback for other classes of events, such as those that occur when participating in dynamic interrupt sharing.
Before opening a device itself, callers must obtain a notion of their identity which is used when making subsequent calls. The simplest form is often to use the device's dev_info_t and call ldi_ident_from_dip(9F); however, there are also methods available based upon having a dev_t or a STREAMS struct queue.
Once that identity is established, there are several ways to open a device such as ldi_open_by_dev(9F), ldi_open_by_devid(9F), or ldi_open_by_name(9F). Once an LDI device has been opened, then all of the other functions may be used to operate on the device; however, consumers of the LDI must think carefully about what kind of device they are opening. While a kernel pseudo-device driver cannot disappear while it is open, when the device represents an actual piece of hardware, it is possible for it to be physically removed and no longer be accessible. Consumers should not assume that a layered device will always be present.
The primary other locales that the system supports are generally UTF-8 based and so the kernel provides a set of routines to deal with UTF-8 and Unicode normalization. However, there are still cases where different character encodings are required or conversion between UTF-8 and some other type is required. This is provided by the kernel iconv framework, which provides a subset of the traditional userland iconv conversions.
To get started, drivers generally will need to first use net_protocol_lookup(9F) to get a handle to say that they're interested in looking at IPv4 or IPv6 traffic and then can allocate an actual hook object with hook_alloc(9F). After filling out the hook, the hook can be inserted into the actual system with net_hook_register(9F).
Hooks operate in the context of a networking stack. Every networking stack in the system is independent and therefore has its own set of interfaces, routing tables, settings, and related state. Most zones have their own networking stack. This is the exclusive-IP option that is described in zoneadm(8).
Drivers can register to get a callback for every netstack in the system and be notified when they are created and destroyed. This is done by calling the net_instance_alloc(9F) function, filling out its data structure, and then finally calling net_instance_register(9F). Like other callback interfaces, the moment the callback functions are registered, drivers need to expect that they are going to be called.
illumos Developer's Guide, https://www.illumos.org/books/dev/.
Writing Device Drivers, https://www.illumos.org/books/wdd/.
|January 26, 2023||OmniOS|