MHD(4I) | Ioctl Requests | MHD(4I) |
mhd - multihost disk control operations
#include <sys/mhd.h>
The mhd ioctl(2) requests control the access rights of a multihost disk, using disk reservations on the disk device.
The stability level of this interface (see attributes(7)) is evolving. As a result, the interface is subject to change and you should limit your use of it.
The mhd ioctls fall into two major categories: (1) ioctls for non-shared multihost disks and (2) ioctls for shared multihost disks.
One ioctl, MHIOCENFAILFAST, is applicable to both non-shared and shared multihost disks. It is described after the first two categories.
All the ioctls require root privilege.
For all of the ioctls, the caller should obtain the file descriptor for the device by calling open(2) with the O_NDELAY flag; without the O_NDELAY flag, the open may fail because another host already holds a conflicting reservation on the device. Some of the ioctls below permit the caller to forcibly clear a conflicting reservation held by another host; however, to call such an ioctl, the caller must first obtain the open file descriptor.
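The open step described above might look like the following minimal sketch. The helper name open_mhd_device is hypothetical, and the choice of O_RDWR is an assumption; this page only requires O_NDELAY.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Open a disk device for later use with the mhd ioctls.
 * O_NDELAY lets the open succeed even when another host holds a
 * conflicting reservation.  O_RDWR is an assumption of this sketch.
 * Returns the file descriptor, or -1 with errno set.
 */
int
open_mhd_device(const char *path)
{
	int fd = open(path, O_RDWR | O_NDELAY);

	if (fd == -1)
		perror(path);	/* report why the open failed */
	return (fd);
}
```

The caller keeps the descriptor for the subsequent ioctl calls and releases it with close(2) when finished.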
The non-shared multihost disk ioctls are MHIOCTKOWN, MHIOCRELEASE, MHIOCSTATUS, and MHIOCQRESERVE. These ioctl requests control the access rights of non-shared multihost disks. A non-shared multihost disk is one that supports serialized, mutually exclusive I/O mastery by the connected hosts. This is in contrast to the shared-disk model, in which concurrent access is allowed from more than one host (see below).
A non-shared multihost disk can be in one of two states:
Each multihost disk driver views the machine on which it's running as the “local host”; each views all other machines as “remote hosts”. For each I/O or ioctl request, the requesting host is the local host.
Note that the non-shared ioctls are designed to work with SCSI-2 disks. The SCSI-2 RESERVE/RELEASE command set is the underlying hardware facility in the device that supports the non-shared ioctls.
The function prototypes for the non-shared ioctls are:
ioctl(fd, MHIOCTKOWN);
ioctl(fd, MHIOCRELEASE);
ioctl(fd, MHIOCSTATUS);
ioctl(fd, MHIOCQRESERVE);
MHIOCTKOWN
Implementation Note: Reservations (exclusive access rights) broken via random resets should be reinstated by the driver upon their detection, for example, in the automatic probe function described below.
MHIOCRELEASE
MHIOCSTATUS
EIO if the probe failed for some other reason.
MHIOCQRESERVE
EACCES. The MHIOCQRESERVE ioctl does NOT issue a bus device reset or bus reset prior to attempting the SCSI-2 reserve command. It also does not take care of reinstating reservations that disappear due to bus resets or bus device resets; if that behavior is desired, the caller can call MHIOCTKOWN after MHIOCQRESERVE has returned success. If the device does not support the SCSI-2 Reserve command, the ioctl returns -1 with errno set to ENOTSUP. The MHIOCQRESERVE ioctl is intended to be used by high-availability or clustering software for a “quorum” disk; hence the “Q” in the name of the ioctl.
Shared multihost disk ioctls control access to shared multihost disks. The ioctls are merely a veneer on the SCSI-3 Persistent Reservation facility. Therefore, the underlying semantic model is not described in detail here; see the SCSI-3 standard instead. SCSI-3 Persistent Reservations support the concept of a group of hosts all sharing access to a disk.
The function prototypes and descriptions for the shared multihost ioctls are as follows:
ioctl(fd, MHIOCGRP_INKEYS, (mhioc_inkeys_t *)k)
Issues the SCSI-3 command Persistent Reserve In Read Keys to the device. On input, the field k->li should be initialized by the caller with k->li.listsize reflecting the size of the array the caller has allocated for the k->li.list field and with k->li.listlen == 0. On return, the field k->li.listlen is updated to indicate the number of reservation keys the device currently has; if this value is larger than k->li.listsize, the caller should have passed a bigger k->li.list array with a bigger k->li.listsize. The number of array elements actually written by the callee into k->li.list is the minimum of k->li.listlen and k->li.listsize. The field k->generation is updated with the generation information returned by the SCSI-3 Read Keys query. If the device does not support SCSI-3 Persistent Reservations, this ioctl returns -1 with errno set to ENOTSUP.
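The listsize/listlen contract described above suggests a grow-and-retry loop. The following is a sketch only: the types resv_key_t, key_list_t, and inkeys_t are stand-ins written to match the field names used on this page, not the definitions in <sys/mhd.h> (which may differ, for example in whether li is a pointer). The callback inkeys_fn stands in for ioctl(fd, MHIOCGRP_INKEYS, k), and fake_inkeys is a simulated device so the example can run anywhere.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define	KEY_SIZE	8	/* a SCSI-3 reservation key is 8 bytes */

/* Stand-in types for this sketch; NOT the <sys/mhd.h> definitions. */
typedef struct { unsigned char key[KEY_SIZE]; } resv_key_t;
typedef struct {
	uint32_t	listsize;	/* in: capacity of list[] */
	uint32_t	listlen;	/* in: 0; out: keys held by device */
	resv_key_t	*list;
} key_list_t;
typedef struct {
	uint32_t	generation;
	key_list_t	li;
} inkeys_t;

/* Hypothetical stand-in for ioctl(fd, MHIOCGRP_INKEYS, k). */
typedef int (*inkeys_fn)(void *dev, inkeys_t *k);

/* Simulated device: holds nkeys keys; key i is filled with byte i+1. */
typedef struct { uint32_t nkeys; uint32_t generation; } fake_dev_t;

int
fake_inkeys(void *dev, inkeys_t *k)
{
	fake_dev_t *d = dev;
	/* Write only min(keys held, listsize) elements. */
	uint32_t n = (d->nkeys < k->li.listsize) ? d->nkeys : k->li.listsize;

	for (uint32_t i = 0; i < n; i++)
		memset(k->li.list[i].key, (int)(i + 1), KEY_SIZE);
	k->li.listlen = d->nkeys;
	k->generation = d->generation;
	return (0);
}

/*
 * Read every key, enlarging the array whenever listlen > listsize.
 * Returns the key count (caller frees *out), or -1 on failure.
 */
int
read_all_keys(inkeys_fn query, void *dev, resv_key_t **out, uint32_t *gen)
{
	uint32_t cap = 4;	/* arbitrary first guess */

	for (;;) {
		resv_key_t *buf = calloc(cap, sizeof (*buf));
		inkeys_t k;

		if (buf == NULL)
			return (-1);
		memset(&k, 0, sizeof (k));
		k.li.listsize = cap;
		k.li.listlen = 0;
		k.li.list = buf;
		if (query(dev, &k) == -1) {
			free(buf);
			return (-1);
		}
		if (k.li.listlen <= cap) {	/* everything fit */
			*out = buf;
			*gen = k.generation;
			return ((int)k.li.listlen);
		}
		free(buf);
		cap = k.li.listlen;	/* retry with the reported count */
	}
}
```

Looping rather than retrying once covers the case where more keys are registered between two queries.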
ioctl(fd, MHIOCGRP_INRESV, (mhioc_inresvs_t *)r)
Issues the SCSI-3 command Persistent Reserve In Read Reservations to the device. Remarks similar to those for MHIOCGRP_INKEYS apply to the array manipulation. If the device does not support SCSI-3 Persistent Reservations, this ioctl returns -1 with errno set to ENOTSUP.
ioctl(fd, MHIOCGRP_REGISTER, (mhioc_register_t *)r)
Issues the SCSI-3 command Persistent Reserve Out Register. The fields of structure r are all inputs; none of the fields are modified by the ioctl. The field r->aptpl should be set to true to specify that registrations and reservations should persist across device power failures, or to false to specify that registrations and reservations should be cleared upon device power failure; true is the recommended setting. The field r->oldkey is the key that the caller believes the device may already have for this host initiator; if the caller believes that this host initiator is not already registered with this device, it should pass the special key of all zeros. To achieve the effect of unregistering with the device, the caller should pass its current key for the r->oldkey field and an r->newkey field containing the special key of all zeros. If the device returns the SCSI error code Reservation Conflict, this ioctl returns -1 with errno set to EACCES.
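The oldkey/newkey conventions for registering and unregistering can be sketched as follows. The structure register_req_t is a stand-in that mirrors the field names used above (oldkey, newkey, aptpl); it is not the <sys/mhd.h> definition, and the helper names are hypothetical. The filled request would then be passed to ioctl(fd, MHIOCGRP_REGISTER, ...).

```c
#include <stdbool.h>
#include <string.h>

#define	KEY_SIZE	8	/* a SCSI-3 reservation key is 8 bytes */

/* Stand-in types for this sketch; NOT the <sys/mhd.h> definitions. */
typedef struct { unsigned char key[KEY_SIZE]; } resv_key_t;
typedef struct {
	resv_key_t	oldkey;
	resv_key_t	newkey;
	bool		aptpl;
} register_req_t;

/* True if the key is the special all-zeros key. */
bool
key_is_zero(const resv_key_t *k)
{
	static const resv_key_t zero;	/* static storage: all zeros */

	return (memcmp(k->key, zero.key, KEY_SIZE) == 0);
}

/* Request for a host that believes it is not yet registered. */
void
make_register(register_req_t *r, const resv_key_t *mykey)
{
	memset(r, 0, sizeof (*r));	/* oldkey stays all zeros */
	r->newkey = *mykey;
	r->aptpl = true;		/* the recommended setting */
}

/* Request that removes this host's registration. */
void
make_unregister(register_req_t *r, const resv_key_t *mykey)
{
	memset(r, 0, sizeof (*r));	/* newkey stays all zeros */
	r->oldkey = *mykey;
	r->aptpl = true;
}
```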
ioctl(fd, MHIOCGRP_RESERVE, (mhioc_resv_desc_t *)r)
Issues the SCSI-3 command Persistent Reserve Out Reserve. The fields of structure r are all inputs; none of the fields are modified by the ioctl. If the device returns the SCSI error code Reservation Conflict, this ioctl returns -1 with errno set to EACCES.
ioctl(fd, MHIOCGRP_PREEMPTANDABORT, (mhioc_preemptandabort_t *)r)
Issues the SCSI-3 command Persistent Reserve Out Preempt-And-Abort. The fields of structure r are all inputs; none of the fields are modified by the ioctl. The key of the victim host is specified by the field r->victim_key. The field r->resvdesc supplies the preempter's key and the reservation that it is requesting as part of the SCSI-3 Preempt-And-Abort command. If the device returns the SCSI error code Reservation Conflict, this ioctl returns -1 with errno set to EACCES.
ioctl(fd, MHIOCGRP_PREEMPT, (mhioc_preemptandabort_t *)r)
Similar to MHIOCGRP_PREEMPTANDABORT, but instead issues the SCSI-3 command Persistent Reserve Out Preempt. (Note: This command is not implemented.)
ioctl(fd, MHIOCGRP_CLEAR, (mhioc_resv_key_t *)r)
MHIOCGRP_REGISTER.
For each device, the non-shared ioctls should not be mixed with the Persistent Reserve Out shared ioctls, and vice versa; otherwise, the underlying device is likely to return errors, because SCSI does not permit SCSI-2 reservations to be mixed with SCSI-3 reservations on a single device. It is, however, legitimate to call the Persistent Reserve In ioctls, because these are query only. Issuing the MHIOCGRP_INKEYS ioctl is the recommended way for a caller to determine if the device supports SCSI-3 Persistent Reservations (the ioctl will return -1 with errno set to ENOTSUP if the device does not).
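The support check described above comes down to interpreting the return value and errno of one MHIOCGRP_INKEYS call. A minimal sketch, in which the enum and function names are hypothetical and the ioctl call itself is left to the caller:

```c
#include <errno.h>

typedef enum { PR_SUPPORTED, PR_UNSUPPORTED, PR_ERROR } pr_support_t;

/*
 * Classify the outcome of ioctl(fd, MHIOCGRP_INKEYS, ...).
 * rc is the ioctl return value; err is errno saved right after it.
 */
pr_support_t
classify_inkeys_result(int rc, int err)
{
	if (rc != -1)
		return (PR_SUPPORTED);	/* the query itself worked */
	if (err == ENOTSUP)
		return (PR_UNSUPPORTED); /* no SCSI-3 Persistent Reservations */
	return (PR_ERROR);		/* some other failure, e.g. EIO */
}
```

A caller would save errno immediately after the ioctl returns, before any other library call can overwrite it.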
The MHIOCENFAILFAST ioctl is applicable for both non-shared and shared disks, and may be used with either the non-shared or shared ioctls.
ioctl(fd, MHIOCENFAILFAST, (unsigned int *)millisecs)
Enables or disables the failfast option in the multihost disk driver and enables or disables automatic probing of a multihost disk, described below. The argument is an unsigned integer specifying the number of milliseconds to wait between executions of the automatic probe function. An argument of zero disables the failfast option and disables automatic probing. If the MHIOCENFAILFAST ioctl is never called, the effect is defined to be that both the failfast option and automatic probing are disabled.
The MHIOCENFAILFAST ioctl sets up a timeout in the driver to periodically schedule automatic probes of the disk. The automatic probe function works in this manner: The driver is scheduled to probe the multihost disk every n milliseconds, rounded up to the next integral multiple of the system clock's resolution. If
the driver immediately panics the machine to comply with the failfast model.
If the driver makes this discovery outside the timeout function, especially during a read or write operation, it is imperative that it panic the system then as well.
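The rounding of the probe interval mentioned above can be illustrated with a short arithmetic sketch. The function name and the tick_ms parameter (the clock resolution in milliseconds) are hypothetical. The sketch assumes that an interval which is already an exact multiple of the resolution is left unchanged, and it does not guard against overflow for intervals near UINT_MAX.

```c
#include <assert.h>

/*
 * Round a probe interval up to a multiple of the clock resolution.
 * Both values are in milliseconds; tick_ms must be nonzero.
 */
unsigned int
round_up_to_tick(unsigned int millisecs, unsigned int tick_ms)
{
	unsigned int rem;

	assert(tick_ms > 0);
	rem = millisecs % tick_ms;
	/* Exact multiples are kept as-is (an assumption of this sketch). */
	return ((rem == 0) ? millisecs : millisecs + (tick_ms - rem));
}
```

With a 10 millisecond clock resolution, a requested interval of 25 milliseconds would therefore be probed every 30 milliseconds.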
Each request returns -1 on failure and sets errno to indicate the error.
EPERM
EACCES
EIO
EOPNOTSUP
Uncommitted
March 13, 2022 | OmniOS |