Ursachenforschung
Die Preisfrage lautet jetzt: ” Was ist mit der Tape Library los? Liegt das Problem auf Seite des Hosts oder Jukebox?”
Dazu erst einmal das Networker-Log in ein brauchbares Format konvertieren:
root # nsr_render_log /nsr/logs/daemon.raw > /tmp/daemon.log
Nun mit dem Editor der Wahl o.ä. zum Ende des Protokolls und dann zum Beginn des Fehlers zurückblättern:
Ende des Logs:
…
7224 08/15/08 19:05:12 2 0 0 1 17996 0 senbackup1 nsrd [Jukebox `senlib3', operation # 3547]. Jukebox `senlib3′ is not ready or it is busy.
39078 08/15/08 19:05:13 0 0 2 3 18486 0 senbackup1 nsrlcpd#1 Jukebox error: Jukebox:senlib3 access:/dev/scsi/changer/c3t500104F0007E9B64d0 failed:MOVE MEDIUM key:2 status:CHECK CONDITION No Additional Sense, Logical Unit Not Ready, Manual Intervention Required
0 08/15/08 19:05:14 2 0 0 1 18471 0 senbackup1 nsrmmgd lcpd 1 at host senbackup1 reported error ‘Jukebox `senlib3′ is not ready or it is busy.’ for the command `4′.
7224 08/15/08 19:05:14 2 0 0 1 17996 0 senbackup1 nsrd [Jukebox `senlib3', operation # 3548]. Jukebox `senlib3′ is not ready or it is busy.
4690 08/15/08 19:05:14 2 0 0 1 18471 0 senbackup1 nsrmmgd Jukebox `senlib3′ is not ready or it is busy.
7224 08/15/08 19:05:14 2 0 0 1 17996 0 senbackup1 nsrd [Jukebox `senlib3', operation # 3548]. Jukebox `senlib3′ is not ready or it is busy.
12361 08/15/08 19:05:15 2 0 0 1 17996 0 senbackup1 nsrd [Jukebox `senlib3', operation # 3547]. Finished with status: retryable
…
Beginn des Fehlers:
…
42506 08/14/08 22:33:05 2 0 0 1 17996 0 senbackup1 nsrd write completion notice: Writing to volume 000061 complete
0 08/14/08 22:34:10 2 0 0 1 17996 0 senbackup1 nsrd Operation 277 started : Unload jukebox device `/dev/rmt/2cbn’..
38752 08/14/08 22:34:12 2 0 0 1 17996 0 senbackup1 nsrd /dev/rmt/2cbn Eject operation in progress
39078 08/14/08 22:36:55 0 0 2 3 18486 0 senbackup1 nsrlcpd#1 Jukebox error: Jukebox:senlib3 access:/dev/scsi/changer/c3t500104F0007E9B64d0 failed:MOVE MEDIUM key:4 status:CHECK CONDITION No Additional Sense, Diagnostic Failure on Component 0×01
0 08/14/08 22:36:56 2 0 0 1 18471 0 senbackup1 nsrmmgd lcpd 1 at host senbackup1 reported error ‘Jukebox:senlib3 access:/dev/scsi/changer/c3t500104F0007E9B6
4d0 failed:MOVE MEDIUM key:4 status:CHECK CONDITION No Additional Sense, Diagnostic Failure on Component 0×01′ for the command `5′.
7224 08/14/08 22:36:56 2 0 0 1 17996 0 senbackup1 nsrd [Jukebox `senlib3', operation # 277]. Jukebox:senlib3 access:/dev/scsi/changer/c3t500104F0007E9B64d0
failed:MOVE MEDIUM key:4 status:CHECK CONDITION No Additional Sense, Diagnostic Failure on Component 0×01
4690 08/14/08 22:36:56 2 0 0 1 18471 0 senbackup1 nsrmmgd Jukebox:senlib3 access:/dev/scsi/changer/c3t500104F0007E9B64d0 failed:MOVE MEDIUM key:4 status:CHECK CONDITION No Additional Sense, Diagnostic Failure on Component 0×01
7224 08/14/08 22:36:56 2 0 0 1 17996 0 senbackup1 nsrd [Jukebox `senlib3', operation # 277]. Jukebox `senlib3′ failed to unload the requested volume from de
vice `/dev/rmt/2cbn’.
7224 08/14/08 22:36:56 2 0 0 1 17996 0 senbackup1 nsrd [Jukebox `senlib3', operation # 277]. Jukebox:senlib3 access:/dev/scsi/changer/c3t500104F0007E9B64d0
failed:MOVE MEDIUM key:4 status:CHECK CONDITION No Additional Sense, Diagnostic Failure on Component 0×01
12361 08/14/08 22:37:00 2 0 0 1 17996 0 senbackup1 nsrd [Jukebox `senlib3', operation # 277]. Finished with status: retryable
42506 08/14/08 22:37:28 2 0 0 1 17996 0 senbackup1 nsrd media warning: Jukebox `senlib3′ Hardware status of jukebox ‘senlib3′ changed from ‘ready’ to ‘needs
attention’
…
Aha. Scheinbar ist das Entladen eines Tapes fehlgeschlagen. Seitdem reagiert die Jukebox nicht mehr.
Es handelt sich hier um eine Sun/STK SL500 Tape-Library, die per Fibrechannel-Fabric angebunden ist. Jetzt also erst mal prüfen, ob es ggf. Probleme im SAN gibt.
Zunächst einmal gilt es Infos zur Jukebox-Konfiguration in Erfahrung zu bringen. Aus den obigen Fehlermeldungen wissen wir bereits, dass das Jukebox-Changer-Device /dev/scsi/changer/c3t500104F0007E9B64d0 heißt und somit am Controller 3 angeschlossen ist.
root@senbackup1 # ls -l /dev/scsi/changer
total 2
lrwxrwxrwx 1 root root 89 Sep 11 2007 c3t500104F0007E9B64d0 -> ../../../devices/pci@7c0/pci@0/pci@8/SUNW,qlc@0,1/fp@0,0/sgen@w500104f0007e9b64,0:changer
root # ls -l /dev/rmt/*cbn
lrwxrwxrwx 1 root root 80 Sep 13 2007 /dev/rmt/0cbn -> ../../devices/pci@7c0/pci@0/pci@8/SUNW,qlc@0,1/fp@0,0/st@w500104f0007e9b71,0:cbn
lrwxrwxrwx 1 root root 80 Sep 13 2007 /dev/rmt/1cbn -> ../../devices/pci@7c0/pci@0/pci@8/SUNW,qlc@0,1/fp@0,0/st@w500104f0007e9b6e,0:cbn
lrwxrwxrwx 1 root root 80 Sep 13 2007 /dev/rmt/2cbn -> ../../devices/pci@7c0/pci@0/pci@8/SUNW,qlc@0,1/fp@0,0/st@w500104f0007e9b74,0:cbn
lrwxrwxrwx 1 root root 80 Sep 13 2007 /dev/rmt/3cbn -> ../../devices/pci@7c0/pci@0/pci@9/SUNW,qlc@0,1/fp@0,0/st@w500104f0007e9b77,0:cbn
lrwxrwxrwx 1 root root 80 Sep 13 2007 /dev/rmt/4cbn -> ../../devices/pci@7c0/pci@0/pci@9/SUNW,qlc@0,1/fp@0,0/st@w500104f0007e9b7d,0:cbn
lrwxrwxrwx 1 root root 80 Apr 30 13:34 /dev/rmt/5cbn -> ../../devices/pci@7c0/pci@0/pci@9/SUNW,qlc@0,1/fp@0,0/st@w500104f0007e9b7a,0:cbn
Aus dem Hardwarepfad ist ersichtlich, daß je 3 Laufwerke an 2 seperaten FC-HBAs angeschlossen sind.
root # ls -l /dev/cfg/ | grep devices/pci@7c0/pci@0/pci@8/SUNW,qlc@0,1/fp@0,0
lrwxrwxrwx 1 root root 56 Aug 24 2007 c3 -> ../../devices/pci@7c0/pci@0/pci@8/SUNW,qlc@0,1/fp@0,0:fc
root # ls -l /dev/cfg/ | grep devices/pci@7c0/pci@0/pci@9/SUNW,qlc@0,1/fp@0,0
lrwxrwxrwx 1 root root 56 Aug 24 2007 c5 -> ../../devices/pci@7c0/pci@0/pci@9/SUNW,qlc@0,1/fp@0,0:fc
Tape 0,1,2 und die Robotik hängen also am Controller 3, Tape 3,4,5 am Controller 5.
root # fcinfo hba-port -l
…
HBA Port WWN: 2101001b323f0a1d
OS Device Name: /dev/cfg/c3
Manufacturer: QLogic Corp.
Model: 375-3356-01
Firmware Version: 4.2.2
FCode/BIOS Version: BIOS: 1.24; fcode: 1.26; EFI: 1.8;
Type: N-port
State: online
Supported Speeds: 1Gb 2Gb 4Gb
Current Speed: 4Gb
Node WWN: 2001001b323f0a1d
…
HBA Port WWN: 210100e08bb2d018
OS Device Name: /dev/cfg/c5
Manufacturer: QLogic Corp.
Model: 375-3356-01
Firmware Version: 4.2.2
FCode/BIOS Version: BIOS: 1.24; fcode: 1.26; EFI: 1.8;
Type: N-port
State: online
Supported Speeds: 1Gb 2Gb 4Gb
Current Speed: 4Gb
Node WWN: 200100e08bb2d018
…
Nun sind also die WWPNs der Controller bekannt. Damit lässt sich abfragen, welche LUNs der Server auf diesen Ports sieht:
root # fcinfo remote-port -p 2101001b323f0a1d -s
Remote Port WWN: 500104f0007e9b64
Active FC4 Types: SCSI
SCSI Target: yes
Node WWN: 500104f0007e9b63
LUN: 0
Vendor: STK
Product: SL500
OS Device Name: /devices/pci@7c0/pci@0/pci@8/SUNW,qlc@0,1/fp@0,0/sgen@w500104f0007e9b64,0
…
Remote Port WWN: 500104f0007e9b6e
Active FC4 Types: SCSI
SCSI Target: yes
Node WWN: 500104f0007e9b6d
LUN: 0
Vendor: HP
Product: Ultrium 3-SCSI
OS Device Name: /dev/rmt/1n
Remote Port WWN: 500104f0007e9b71
Active FC4 Types: SCSI
SCSI Target: yes
Node WWN: 500104f0007e9b70
LUN: 0
Vendor: HP
Product: Ultrium 3-SCSI
OS Device Name: /dev/rmt/0n
Remote Port WWN: 500104f0007e9b74
Active FC4 Types: SCSI
SCSI Target: yes
Node WWN: 500104f0007e9b73
LUN: 0
Vendor: HP
Product: Ultrium 3-SCSI
OS Device Name: /dev/rmt/2n
Am Controller 3 sind 3 HP LTO-3 Tape Drives und der Changer der SL500 (über sgen-Treiber) sichtbar.
root # fcinfo remote-port -p 210100e08bb2d018 -s
…
Remote Port WWN: 500104f0007e9b77
Active FC4 Types: SCSI
SCSI Target: yes
Node WWN: 500104f0007e9b76
LUN: 0
Vendor: HP
Product: Ultrium 3-SCSI
OS Device Name: /dev/rmt/3n
Remote Port WWN: 500104f0007e9b7a
Active FC4 Types: SCSI
SCSI Target: yes
Node WWN: 500104f0007e9b79
LUN: 0
Vendor: HP
Product: Ultrium 3-SCSI
OS Device Name: /dev/rmt/5n
Remote Port WWN: 500104f0007e9b7d
Active FC4 Types: SCSI
SCSI Target: yes
Node WWN: 500104f0007e9b7c
LUN: 0
Vendor: HP
Product: Ultrium 3-SCSI
OS Device Name: /dev/rmt/4n
Am Controller 5 sind 3 HP LTO-3 Tape Drives sichtbar.
Fazit: Im SAN scheint es zumindest auf den ersten Blick kein Problem zu geben.
Der Networker bietet auch noch Befehle zum Thema an:
root # inquire -l
-l flag found: searching all LUNs, which may take over 10 minutes per adapter
for some fibre channel adapters. Please be patient.
…
scsidev@3.3.0:HP Ultrium 3-SCSI M21S|Tape, /dev/rmt/1cbn
S/N: HU107120CY
ATNN=HP Ultrium 3-SCSI HU107120CY
WWNN=50060B000036F6EA
WWPN=500104F0007E9B6E
WW2N=500104F0007E9B6D
PORT=00000001
scsidev@3.4.0:HP Ultrium 3-SCSI M21S|Tape, /dev/rmt/0cbn
S/N: HU107120BG
ATNN=HP Ultrium 3-SCSI HU107120BG
WWNN=50060B000036F686
WWPN=500104F0007E9B71
WW2N=500104F0007E9B70
PORT=00000001
scsidev@3.7.0:HP Ultrium 3-SCSI L58S|Tape, /dev/rmt/2cbn
S/N: HU10704J3U
ATNN=HP Ultrium 3-SCSI HU10704J3U
WWNN=500110A00058B456
WWPN=500104F0007E9B74
WW2N=500104F0007E9B73
PORT=00000001
…
scsidev@5.1.0:HP Ultrium 3-SCSI L58S|Tape, /dev/rmt/3cbn
S/N: HU107070AD
ATNN=HP Ultrium 3-SCSI HU107070AD
WWNN=500110A00058D182
WWPN=500104F0007E9B77
WW2N=500104F0007E9B76
PORT=00000001
scsidev@5.3.0:HP Ultrium 3-SCSI L58S|Tape, /dev/rmt/5cbn
S/N: HU10704J4A
ATNN=HP Ultrium 3-SCSI HU10704J4A
WWNN=500110A00058B446
WWPN=500104F0007E9B7A
WW2N=500104F0007E9B79
PORT=00000001
scsidev@5.6.0:HP Ultrium 3-SCSI L58S|Tape, /dev/rmt/4cbn
S/N: HU10704J4R
ATNN=HP Ultrium 3-SCSI HU10704J4R
WWNN=500110A00058B42A
WWPN=500104F0007E9B7D
WW2N=500104F0007E9B7C
PORT=00000001
Verdächtig: inquire findet die Robotik nicht…
root # jbverify
14866:jbverify:
Jbverify is running on host senbackup1, SunOS 5.10
….
14917:jbverify:
Finished processing stand-alone devices.
14918:jbverify:
Processing jukebox devices…
14920:jbverify:
Processing jukebox senlib3:
39078:jbverify: SYSTEM error: Device busy
39077:jbverify: error, Cannot initialize jukebox! Driver may not be loaded!
14926:jbverify:
Failed while processing jukebox senlib3 on senbackup1
14929:jbverify:
Finished processing jukebox devices.
**********************************************************************
Summary report of jbverify
======= ====== == ========
Hostname Device Handle Blocksize Jukebox Drv No. Status
——– ————- ——— ——- ——- ——
senbackup1 /AFTD/Exchange/_AF_readonly 131072 N/A N/A Pass
senbackup1 /AFTD/Arriba/_AF_readonly 131072 N/A N/A Pass
senbackup1 /AFTD/Windows/_AF_readonly 131072 N/A N/A Pass
senbackup1 /AFTD/UNIX/_AF_readonly 131072 N/A N/A Pass
senbackup1 /AFTD/SQL/_AF_readonly 131072 N/A N/A Pass
senbackup1 /AFTD/Exchange 131072 N/A N/A Pass
senbackup1 /AFTD/default/_AF_readonly 131072 N/A N/A Pass
senbackup1 /AFTD/Arriba 131072 N/A N/A Pass
senbackup1 /AFTD/UNIX 131072 N/A N/A Pass
senbackup1 /AFTD/SQL 131072 N/A N/A Pass
senbackup1 /AFTD/default 131072 N/A N/A Pass
senbackup1 /AFTD/Windows 131072 N/A N/A Pass
senbackup1 Unknown N/A senlib3 N/A Fail
**********************************************************************
14938:jbverify:
Exiting jbverify with failures.
root # jbverify -j -v -v
14866:jbverify:
Jbverify is running on host senbackup1, SunOS 5.10
14890:jbverify:
turning verbose mode on.
14900:jbverify:
No. of retries set by default to 10.
14902:jbverify:
Using senbackup1 as NetWorker server.
14918:jbverify:
Processing jukebox devices…
14920:jbverify:
Processing jukebox senlib3:
39078:jbverify: SYSTEM error: Device busy
39077:jbverify: error, Cannot initialize jukebox! Driver may not be loaded!
14926:jbverify:
Failed while processing jukebox senlib3 on senbackup1
14929:jbverify:
Finished processing jukebox devices.
**********************************************************************
Summary report of jbverify
======= ====== == ========
Hostname Device Handle Blocksize Jukebox Drv No. Status
——– ————- ——— ——- ——- ——
senbackup1 Unknown N/A senlib3 N/A Fail
**********************************************************************
14938:jbverify:
Exiting jbverify with failures.
Damit wird es nun offensichtlich, dass wohl ein Problem in der Tape-Library selbst besteht. Deshalb nun der Griff zur Library Console:
Oh je, das sieht verdächtig nach Hardware-Problem aus…


