FreeNAS Fibre Channel LUN Error / Timeout

FreeNAS Server - Fibre Connections (Ethernet and FC)

TL;DR

I’m a bit long-winded, I know. If you’re pulling your hair out, scroll right down to the solution (which of course may or may not match your actual problem).

Background

I’m posting this because there is very little information on Google regarding these particular error messages, and I’m hoping that this will help point you in the right direction if you’re having the same issue.

I have a new FreeNAS server that’s going to be serving my home network via SMB / NFS, but I also wanted to give my dev and production environments access to its main storage pool for non-critical file systems (backup staging, test environments, etc). My TCP/IP networks (home/dev/prod/management) are rigorously firewalled off from each other, and putting the NAS on all of them via Ethernet would introduce a potential point of transit between those networks, which is undesirable.

Thus I decided to use Fibre Channel as the method of access for the non-“private” networks.

aran kaspar did a great writeup on how to set up an FC target on FreeNAS 9.3 which worked for me right off the bat.

The Problem

I began by testing out a backup script on one of my CentOS/Xen VM servers against a LUN exposed by the FreeNAS box. It was dumping about 850GB to the LUN, initially at an average of 200MB/s. This was over a switched 4Gbps FC connection — the transfer was a bit slow, but there are a lot of layers to this whole setup, I figured.

But after about 65GB it failed, with the following error messages in /var/log/messages:

Feb 28 03:48:54 hp-vm02 kernel: lpfc 0000:07:00.0: 1:(0):0713 SCSI layer issued Device Reset (0, 1) return x2002
Feb 28 03:50:39 hp-vm02 kernel: INFO: task jbd2/sdc1-8:32506 blocked for more than 120 seconds.
Feb 28 03:50:39 hp-vm02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 28 03:50:39 hp-vm02 kernel: jbd2/sdc1-8 D ffff88007f433580 0 32506 2 0x00000080
Feb 28 03:50:39 hp-vm02 kernel: ffff880023199bc0 0000000000000246 ffff880023199fd8 0000000000013580
Feb 28 03:50:39 hp-vm02 kernel: ffff880023198010 0000000000013580 0000000000013580 0000000000013580
Feb 28 03:50:39 hp-vm02 kernel: ffff880023199fd8 0000000000013580 ffff880078036040 ffff8800744180c0
Feb 28 03:50:39 hp-vm02 kernel: Call Trace:
Feb 28 03:50:39 hp-vm02 kernel: [] ? __wait_on_buffer+0x30/0x30
Feb 28 03:50:39 hp-vm02 kernel: [] schedule+0x29/0x70
Feb 28 03:50:39 hp-vm02 kernel: [] io_schedule+0x8c/0xd0
Feb 28 03:50:39 hp-vm02 kernel: [] sleep_on_buffer+0xe/0x20
Feb 28 03:50:39 hp-vm02 kernel: [] __wait_on_bit+0x5f/0x90
Feb 28 03:50:39 hp-vm02 kernel: [] ? __wait_on_buffer+0x30/0x30
Feb 28 03:50:39 hp-vm02 kernel: [] out_of_line_wait_on_bit+0x78/0x90
Feb 28 03:50:39 hp-vm02 kernel: [] ? autoremove_wake_function+0x40/0x40
Feb 28 03:50:39 hp-vm02 kernel: [] __wait_on_buffer+0x2e/0x30
Feb 28 03:50:39 hp-vm02 kernel: [] jbd2_journal_commit_transaction+0xcc7/0x14d0 [jbd2]
Feb 28 03:50:39 hp-vm02 kernel: [] ? wake_up_bit+0x40/0x40
Feb 28 03:50:39 hp-vm02 kernel: [] kjournald2+0xcd/0x250 [jbd2]
Feb 28 03:50:39 hp-vm02 kernel: [] ? wake_up_bit+0x40/0x40
Feb 28 03:50:39 hp-vm02 kernel: [] ? commit_timeout+0x10/0x10 [jbd2]
Feb 28 03:50:39 hp-vm02 kernel: [] kthread+0x9e/0xb0
Feb 28 03:50:39 hp-vm02 kernel: [] kernel_thread_helper+0x4/0x10
Feb 28 03:50:39 hp-vm02 kernel: [] ? retint_restore_args+0x5/0x6
Feb 28 03:50:39 hp-vm02 kernel: [] ? gs_change+0x13/0x13
Feb 28 03:55:43 hp-vm02 kernel: lpfc 0000:07:00.0: 1:(0):0713 SCSI layer issued Device Reset (0, 1) return x2002

And on the FreeNAS side, I saw these error messages (both servers’ clocks are closely synchronized, so the timestamps correspond):

Feb 28 03:48:54 champ ctlfedone: got XPT_IMMEDIATE_NOTIFY status 0x36 tag 0x26 seq 0x115e68
Feb 28 03:48:54 champ ctlfedone: got XPT_IMMEDIATE_NOTIFY status 0x36 tag 0xffffffff seq 0x127fd8
Feb 28 03:48:54 champ (ctl2:isp0:0:1024:1): ctlfe_done: returning task I/O tag 0x26 seq 0x115e68
Feb 28 03:48:54 champ (ctl2:isp0:0:1024:1): ctlfe_done: returning task I/O tag 0xffffffff seq 0x127fd8
Feb 28 03:48:58 champ isp0: isp_handle_platform_ctio: CTIO7[127fd8] seq 0 nc 1 sts 0x2 flg 0xa060 sns 0 resid 0 FIN
Feb 28 03:54:41 champ (1:3:1/1): WRITE(10). CDB: 2a 08 3f c4 b1 a7 00 00 08 00
Feb 28 03:54:41 champ (1:3:1/1): Tag: 0x12894c, type 1
Feb 28 03:54:41 champ (1:3:1/1): ctl_process_done: 107 seconds
Feb 28 03:55:43 champ ctlfedone: got XPT_IMMEDIATE_NOTIFY status 0x36 tag 0x27 seq 0x11b484
Feb 28 03:55:43 champ ctlfedone: got XPT_IMMEDIATE_NOTIFY status 0x36 tag 0xffffffff seq 0x123e88
Feb 28 03:55:43 champ (ctl2:isp0:0:1024:1): ctlfe_done: returning task I/O tag 0x27 seq 0x11b484
Feb 28 03:55:43 champ (ctl2:isp0:0:1024:1): ctlfe_done: returning task I/O tag 0xffffffff seq 0x123e88

The initiator (VM server) was reporting SCSI layer issued Device Reset followed by a terrible-looking exception.

The target (FreeNAS server) was giving messages that were nigh indecipherable to me. In particular, Googling XPT_IMMEDIATE_NOTIFY did not yield much.

After a number of those errors, the initiator’s SCSI subsystem decided the LUN was bad and the filesystem thought it best to enter read-only mode. That’s what caused the failure of the backup script at the high level.
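
Incidentally, once the underlying problem is fixed the filesystem doesn’t come back on its own. Here’s a minimal sketch of the recovery, assuming the ext4 (jbd2) filesystem on /dev/sdc1 from the logs above and a hypothetical mountpoint of /mnt/backup:

# Confirm the kernel really did force the filesystem read-only.
dmesg | grep -i 'read-only'

# Unmount, let fsck replay the ext4 journal, then remount read-write.
# /mnt/backup is a placeholder mountpoint; adjust to your setup.
umount /mnt/backup
fsck.ext4 -f /dev/sdc1
mount /dev/sdc1 /mnt/backup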

I found a variety of posts with similar-but-not-quite messages alleged to have been caused by everything from faulty drivers to incompatible HBAs, bad FC switches, bad cabling, and tons of other possibilities. Because I’d previously used the same hardware for a Solaris based SAN storage server, I knew the hardware and drivers (on the Linux side, anyway) were good.

Eventually someone recommended increasing the timeout on the initiator HBA (I can’t remember where I saw it, or I’d give them credit where due). The default timeout was 30 seconds, so I raised it to 120:

echo 120 > /sys/devices/pci0000:00/0000:00:09.0/0000:07:00.0/host1/rport-1:0-4/target1:0:0/1:0:0:1/timeout

Note that I’m using Emulex cards on a CentOS distribution. The details of the above command will vary widely depending upon your card, its position in the PCIe bus, your flavor of OS, number of LUNs, the phase of the moon, and the vagaries of all non-existent deities. However, a find . -name timeout run from under /sys/devices is probably the fastest way to get yourself to the appropriate path.
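
If you have more than one LUN behind the HBA, a quick loop saves some typing. This is only a sketch, assuming the standard Linux /sys/class/scsi_device layout and that host1 is the FC HBA in question (check lsscsi to confirm the host:channel:target:lun numbering on your system):

# Raise the SCSI command timeout to 120 seconds for every LUN behind host1.
# The 1:*:*:* glob is an assumption; substitute your own host number.
for t in /sys/class/scsi_device/1:*:*:*/device/timeout; do
    echo 120 > "$t"
done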

The eagle-eyed among you may notice that in the log segments above, it says task jbd2/sdc1-8:32506 blocked for more than 120 seconds. That’s because I copied those error messages after I raised the timeout.

Increasing the timeout is a kludgey fix at best. Under normal operation, your LUN should not take more than 30 seconds to respond to a command. Something else is wrong.

The rest of the log files yielded nothing.

The Solution

It was a bad freaking disk on the NAS.

It was that simple.

FreeNAS Server - Bad Disk Removed

The culprit.

The LUN was backed by a file extent on the FreeNAS server, stored on a pool that’s made of two 10 x 3TB RAIDZ2 vdevs. That’s twenty drives, and it didn’t occur to me that one might be a problem because why should it be? There were no SMART errors, no log messages, no nothing. The disk seemed to be fine.
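
For the record, you can sweep SMART health from the FreeNAS shell instead of clicking through the GUI. A rough sketch, assuming kern.disks lists all of the pool members (smartmontools ships with FreeNAS). In my case it all came back clean, which was exactly the problem:

# Print the SMART overall-health verdict for every disk the kernel knows about.
for d in $(sysctl -n kern.disks); do
    echo "== /dev/$d =="
    smartctl -H /dev/"$d"
done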

I found the problem visually, by staring vacantly at the server as I thought about the problem. All the drives’ activity lights were flashing intermittently as data was being written, but only one drive’s light was off constantly. (My hardware is such that an “off” LED indicates activity, and an “on” LED indicates idle. Ass-backwards if you ask me, but that’s how it is.)

I checked the disk activity graphs in FreeNAS’s GUI, and sure enough the misbehaving disk was writing constantly at a slow speed while all the other disks showed identical spikes of writes followed by troughs of nothing as they waited for that one disk to finish. It’s a classic problem in RAID subsystems (and their kin), but here it percolated up through the hardware to the kernel to ZFS to CTL/CAM to the initiator, where the target couldn’t complete the initiator’s commands in a timely fashion.
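
You don’t need the GUI graphs to spot this, either. From the FreeNAS shell, gstat and zpool iostat tell the same story: one member sits pegged with high per-write latency while its RAIDZ2 siblings burst and then wait. (The pool name tank below is a placeholder; use your own.)

# Per-disk latency and %busy, refreshed continuously (-p = physical providers only).
gstat -p

# Per-member throughput for the pool, sampled every second.
zpool iostat -v tank 1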

You can also see this problem in action with the following command / output:

# ctladm dumpooa

Dumping OOA queues
LUN 1 tag 0x129aa8 RTR: WRITE(10). CDB: 2a 00 02 ea ec 3f 00 02 00 00 (94812 ms)
LUN 1 tag 0x129ad4 RTR: WRITE(10). CDB: 2a 00 02 ea ee 3f 00 02 00 00 (93 ms)
LUN 1 tag 0x129e9c RTR: WRITE(10). CDB: 2a 00 02 eb 1a 3f 00 02 00 00 (35 ms)
LUN 1 tag 0x129ec8 RTR: WRITE(10). CDB: 2a 00 02 eb 1c 3f 00 02 00 00 (35 ms)
LUN 1 tag 0x129ef4 RTR: WRITE(10). CDB: 2a 00 02 eb 1e 3f 00 02 00 00 (34 ms)
LUN 1 tag 0x129f20 RTR: WRITE(10). CDB: 2a 00 02 eb 20 3f 00 02 00 00 (34 ms)
LUN 1 tag 0x129f4c RTR: WRITE(10). CDB: 2a 00 02 eb 22 3f 00 02 00 00 (34 ms)
LUN 1 tag 0x129f78 RTR: WRITE(10). CDB: 2a 00 02 eb 24 3f 00 02 00 00 (33 ms)
LUN 1 tag 0x129fa4 RTR: WRITE(10). CDB: 2a 00 02 eb 26 3f 00 02 00 00 (33 ms)
LUN 1 tag 0x129fd0 RTR: WRITE(10). CDB: 2a 00 02 eb 28 3f 00 02 00 00 (33 ms)
LUN 1 tag 0x129ffc RTR: WRITE(10). CDB: 2a 00 02 eb 2a 3f 00 02 00 00 (31 ms)
LUN 1 tag 0x12a054 RTR: WRITE(10). CDB: 2a 00 02 eb 2e 3f 00 02 00 00 (21 ms)
LUN 1 tag 0x12a080 RTR: WRITE(10). CDB: 2a 00 02 eb 30 3f 00 02 00 00 (21 ms)
LUN 1 tag 0x12a0ac RTR: WRITE(10). CDB: 2a 00 02 eb 32 3f 00 02 00 00 (21 ms)
LUN 1 tag 0x12a0d8 RTR: WRITE(10). CDB: 2a 00 02 eb 34 3f 00 02 00 00 (21 ms)
LUN 1 tag 0x12a104 RTR: WRITE(10). CDB: 2a 00 02 eb 36 3f 00 02 00 00 (19 ms)
LUN 1 tag 0x12a130 RTR: WRITE(10). CDB: 2a 00 02 eb 38 3f 00 02 00 00 (19 ms)
LUN 1 tag 0x12a15c RTR: WRITE(10). CDB: 2a 00 02 eb 3a 3f 00 02 00 00 (18 ms)
LUN 1 tag 0x12a188 RTR: WRITE(10). CDB: 2a 00 02 eb 3c 3f 00 02 00 00 (15 ms)
LUN 1 tag 0x12a1b4 RTR: WRITE(10). CDB: 2a 00 02 eb 3e 3f 00 02 00 00 (15 ms)
LUN 1 tag 0x12a1e0 RTR: WRITE(10). CDB: 2a 00 02 eb 40 3f 00 02 00 00 (8 ms)
LUN 1 tag 0x12a238 RTR: WRITE(10). CDB: 2a 00 02 eb 42 3f 00 02 00 00 (0 ms)
LUN 1 tag 0x12a264 RTR: WRITE(10). CDB: 2a 00 02 eb 44 3f 00 02 00 00 (0 ms)
LUN 1 tag 0x12a290 RTR: WRITE(10). CDB: 2a 00 02 eb 46 3f 00 02 00 00 (0 ms)
LUN 1 tag 0x12a2bc RTR: WRITE(10). CDB: 2a 00 02 eb 48 3f 00 02 00 00 (0 ms)
LUN 1 tag 0x12a2e8 RTR: WRITE(10). CDB: 2a 00 02 eb 4a 3f 00 02 00 00 (0 ms)
LUN 1 tag 0x12a314 RTR: WRITE(10). CDB: 2a 00 02 eb 4c 3f 00 02 00 00 (0 ms)
LUN 1 tag 0x12a340 RTR: WRITE(10). CDB: 2a 00 02 eb 4e 3f 00 02 00 00 (0 ms)
OOA queues dump done

That shows the SCSI commands that are pending. You can see in this sample output that there’s one bad egg that’s been waiting for over 90 seconds. Once that wait time on the target exceeds the timeout period on the initiator, the errors occur.
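
If you want to catch it in the act without reading the whole dump, a quick-and-dirty loop like this (my own sketch, nothing official) will flag any command that has been outstanding for ten seconds or more:

# Poll CTL's OOA queue every 5 seconds and print commands pending >= 10,000 ms,
# i.e. any "(NNNNN ms)" entry with five or more digits.
while sleep 5; do
    ctladm dumpooa | grep -E '\([0-9]{5,} ms\)'
done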

Though write commands were causing timeouts, as shown above, the most common command to time out was SYNCHRONIZE CACHE. You may be tempted to issue a ctladm realsync off command, but Google that carefully and consider the ramifications. It may leave you with uncommitted writes on the target that are reported as committed to the initiator. (Is “COMMIT” the right word, or is that my RDBMS showing?)

I replaced the bum disk (which still appears to be good by all diagnostics; it’s just slow), and now there are no more timeouts. And needless to say the write speed of my pool has increased.

Conclusion

In retrospect this seems like a really simple problem, but I mistakenly assumed that CTL and the initiator would play nicely at whatever the max speed of the pool was, and also that the write performance of my pool exceeded the 4Gbps of the SAN. (In local testing I had been getting ~850MB/s linear write performance after cache saturation! But the disk probably started dying after I took that metric.)
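
For the curious, the back-of-the-envelope arithmetic that should have tipped me off (my numbers, not anything official): 4Gb FC uses 8b/10b encoding, so roughly ten bits cross the wire for every payload byte.

# 4 Gbit/s line rate / 10 wire-bits per payload byte ~= 400 MB/s usable,
# well under the ~850 MB/s the pool could sustain locally when healthy.
echo "$((4000 / 10)) MB/s"    # prints 400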

The resilvering during the disk replacement was reasonably fast, so the disk subsystem should be OK: 18.2T scanned out of 18.9T at 982M/s.

If anyone can shed more light on this matter in the comments, I’d greatly appreciate it. I am the farthest thing from an expert on FreeBSD / FreeNAS.

One Comment

  1. Thanks. I was having a similar problem with “synchronize cache” timing out. Couldn’t figure out why either. Turns out one of the drives was bad (slow) and I could see the difference by looking at the LEDs. No SMART errors either in FreeNAS.
