lunes, 18 de julio de 2022

Oracle ASM devices pointing to multipath devices and not scsi paths

 

How to make sure Oracle ASM devices pointing to multipath devices and not scsi paths, sd devices when using ASMLib to manage ASM disks?

 SOLUTION VERIFICADA - Actualizado  - 

Red Hat Insights can detect this issue

Proactively detect and remediate issues impacting your systems.

Medio Ambiente

  • Red Hat Enterprise Linux (RHEL) 5, 6, 7, 8
  • device-mapper-multipath
  • Oracle ASM using ASMLib

Cuestión

  • Oracle application crashes when a single path in multipath fails. The application should be unaware of underlying path failures.
  • ASM crashed and with that the Oracle DB
  • Using Device Mapper Multipathing for Oracle database, and expect Oracle LUNs to see multipath, not sd devices?
  • How to make sure Oracle ASM devices pointing to multipath devices and not scsi paths, sd devices when using ASMLib to manage ASM disks?
  • I/O's to the SAN are not shared across all the paths of multipath in Oracle application server configured with Oracleasm.

Resolución

ORACLEASM_SCANORDER should be configured to force the use of the multipath pseudo-device. Since ASM uses entries from /proc/partition, a filter would need to be set to exclude underlying paths.

  1. Edit /etc/sysconfig/oracleasm and add dm to the SCANORDER, and sd to SCANEXCLUDE as follows:

    # ORACLEASM_SCANORDER: Matching patterns to order disk scanning
    ORACLEASM_SCANORDER="dm"
    
    # ORACLEASM_SCANEXCLUDE: Matching patterns to exclude disks from scan
    ORACLEASM_SCANEXCLUDE="sd"
    
  2. This would require that the oracleasm configuration to be updated:

    # oracleasm configure
    # oracleasm scandisks
    
  3. File /etc/sysconfig/oracleasm is soft-linked to /etc/sysconfig/oracleasm-_dev_oracleasm which is the file used by OracleASM. Verify the soft-link exists.

    # ls -al /etc/sysconfig/oracleasm
    lrwxrwxrwx 1 root root 39 Feb 22 15:54 /etc/sysconfig/oracleasm -> /etc/  sysconfig/oracleasm-_dev_oracleasm
    
  4. Changes to the oracleasm configuration file requires a restart of OracleASM service to take effect. This can be disruptive in a production environment.

    Note: It is recommended to schedule a reboot after setting SCANORDER and SCANEXCLUDE in /etc/sysconfig/oracleasm. Normally a system reboot is not required for oracleasm to start using the multipath devices. However, in (private) RHBZ#1683606, it has been noticed that, while oracleasm was still allowed to detect single paths (before the configuration change and the restart of oracleasm) it could change the value of counters used in device structures within the kernel (block_device.bd_holders) to invalid (negative) values and make the paths appear as being in use. If this happens, restarting only oracleasm will not clear the counters and the devices will continue appearing as being in use. In this case, multipath will still be unable to add the paths to the corresponding maps until the system is rebooted. If this happens, messages similar to the following will be appearing in the system logs whenever multipath tries to add one of those paths to the corresponding map:

    device-mapper: table: 253:<dm_num>: multipath: error getting device
    device-mapper: ioctl: error adding target to table
    

    The problem can appear even when multipath is using the paths (i.e. the counters can be "silently" changed while the paths are in use by multipath). In such a scenario, the problem will appear in case of an outage, which will cause the paths to be removed from the maps. When the paths return, multipath will be failing to add them to the corresponding maps.

    For this reason, it is recommended to schedule a reboot after setting SCANORDER and SCANEXCLUDE in /etc/sysconfig/oracleasm.

  5. Once restarted, verify the multipath device is being used, a major of 253 should be returned:

    # oracleasm querydisk -d <ASM_DISK_NAME>
    

Causa Raíz

When devices were added to the DISKGROUP, the underlying sd* device was used instead of the multipath pseudo device.

The dm-* devices are intended for internal use and are not persistent. However, once the DISKGROUP is created this writes metadata to the device which ASM is then able to check the header regardless of the dm- assignment. The intention here is to force ASM to read from multipath devices.

Procedimientos para el Diagnóstico

  • Query the disk to obtain the major:minor number of the disk being used by the disk group:

    # /etc/init.d/oracleasm querydisk -d ASM_DATA1
    Disk "ASM_Data1" is a valid ASM disk on device [8,16]
    
  • We can see that 8:16 is the underlying sdb path, not the ASM_DATA1 multipath pseudo device. Failover would not occur with this configuration.

    ASM_DATA1 (3600500000000000001) dm-24 IBM,2107900
    [size=100G][features=1 queue_if_no_path][hwhandler=0][rw]
    \_ round-robin 0 [prio=0][active]
    \_ 3:0:1:1 sdb 8:16   [failed][faulty]
    \_ 5:0:0:1 sdc 8:32   [active][ready] 
    \_ 5:0:1:1 sdd 8:48   [active][ready] 
    \_ 3:0:0:1 sde  8:64  [failed][faulty]
    
  • This can also be see in /proc/partitions:

     8     0  142577664 sda
     8     1     514048 sda1
     8     2   24579450 sda2
     8     3   12289725 sda3
     8    16   52428800 sdb
     8    32   52428800 sdc
     8    48   52428800 sdd
     8    64   52428800 sde
    
  • The major:minor of the multipath ASM_DATA1 pseudo device would be 253:24, or dm-24. This is the device that should be used:

    253    24   52428800 dm-24
    
  • Note: To check if an oracleasm device is mapped correctly in a vmcore, please see How to map an oracleasm path in a vmcore.

No hay comentarios: