Discussion:
"device busy" error when registering device
Ian Pilcher
2014-07-19 00:11:50 UTC
Permalink
I just finished moving my existing Fedora 20 root filesystem onto a
bcache device (actually LVM on top of a bcache physical volume).

The bcache cache device is /dev/sda2, a partition on my SSD; the backing
device is /dev/md126p5, a partition on an Intel RAID (imsm) volume.

This configuration only boots successfully about 50% of the time. The
other 50% of the time, the bcache device is not created, and dracut
times out and dumps me into an emergency shell.

After changing the bcache-register script to use /sys/fs/bcache/register
(instead of register_quiet), I see a "device busy" error when udev
attempts to register the backing device:

[ 2.105581] bcache: register_bcache() error opening /dev/md126p5:
device busy

This is kernel 3.5.15, so this doesn't mean that the device is already
registered; something else has it (temporarily) opened. I say that it's
opened temporarily, because I am able to register the backing device
manually from the dracut shell -- which starts the the bcache device.

Looking at /usr/lib/udev/bcache-register and the bcache_register source
in drivers/md/bcache/super.c, I notice 2 things.

(1) bcache-register gives up immediately when an error occurs because of
a (possibly temporary) conflict.

(2) Although the driver logs a different message in the already
registered case ("device already registered" instead of "device
busy"), it doesn't provide userspace with any way to distinguish the
two cases; it always returns -EINVAL.

Suggested fix:

(1) Change bcache_register to return -EBUSY in the device busy case
(while still returning -EINVAL in the already registered case).

(2) Change bcache-register to check the exit code of the registration
attempt and retry in the EBUSY case.

Does this make sense?
--
========================================================================
Ian Pilcher ***@gmail.com
-------- "I grew up before Mark Zuckerberg invented friendship" --------
========================================================================
Josep Lladonosa
2014-07-19 07:23:59 UTC
Permalink
Post by Ian Pilcher
I just finished moving my existing Fedora 20 root filesystem onto a
bcache device (actually LVM on top of a bcache physical volume).
The bcache cache device is /dev/sda2, a partition on my SSD; the back=
ing
Post by Ian Pilcher
device is /dev/md126p5, a partition on an Intel RAID (imsm) volume.
This configuration only boots successfully about 50% of the time. Th=
e
Post by Ian Pilcher
other 50% of the time, the bcache device is not created, and dracut
times out and dumps me into an emergency shell.
After changing the bcache-register script to use /sys/fs/bcache/regis=
ter
Post by Ian Pilcher
(instead of register_quiet), I see a "device busy" error when udev
[ 2.105581] bcache: register_bcache() error opening /dev/md126p5=
device busy
This is kernel 3.5.15, so this doesn't mean that the device is alread=
y
Post by Ian Pilcher
registered; something else has it (temporarily) opened. I say that i=
t's
Post by Ian Pilcher
opened temporarily, because I am able to register the backing device
manually from the dracut shell -- which starts the the bcache device.
Looking at /usr/lib/udev/bcache-register and the bcache_register sour=
ce
Post by Ian Pilcher
in drivers/md/bcache/super.c, I notice 2 things.
(1) bcache-register gives up immediately when an error occurs because=
of
Post by Ian Pilcher
a (possibly temporary) conflict.
(2) Although the driver logs a different message in the already
registered case ("device already registered" instead of "device
busy"), it doesn't provide userspace with any way to distinguish =
the
Post by Ian Pilcher
two cases; it always returns -EINVAL.
(1) Change bcache_register to return -EBUSY in the device busy case
(while still returning -EINVAL in the already registered case).
(2) Change bcache-register to check the exit code of the registration
attempt and retry in the EBUSY case.
Does this make sense?
--
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D
com
Post by Ian Pilcher
-------- "I grew up before Mark Zuckerberg invented friendship" -----=
---
Post by Ian Pilcher
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D
Post by Ian Pilcher
--
To unsubscribe from this list: send the line "unsubscribe linux-bcach=
e" in
Post by Ian Pilcher
More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello,

I am using bcache for a boot device and have found a "rare" behaviour, =
too.
In my case, I think bcache device is tried to be registered twice,
first incorrectly (perhaps first readonly system mount?), or tries to
register whole devices (in my case bcache0 is made of sda1 and sdb7)
instead of partitions. It could be related to initramfs scripts.

My lsblk:

=2E# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 238,5G 0 disk
=E2=94=9C=E2=94=80sda1 8:1 0 201,8G 0 part
=E2=94=82 =E2=94=94=E2=94=80bcache0 254:0 0 898G 0 disk /
=E2=94=9C=E2=94=80sda2 8:2 0 226M 0 part
=E2=94=94=E2=94=80sda3 8:3 0 36,5G 0 part
sdb 8:16 0 931,5G 0 disk
=E2=94=9C=E2=94=80sdb1 8:17 0 1,9G 0 part
=E2=94=9C=E2=94=80sdb2 8:18 0 1M 0 part
=E2=94=9C=E2=94=80sdb5 8:21 0 953M 0 part /boot
=E2=94=9C=E2=94=80sdb6 8:22 0 30,8G 0 part [SWAP]
=E2=94=94=E2=94=80sdb7 8:23 0 898G 0 part
=E2=94=94=E2=94=80bcache0 254:0 0 898G 0 disk /


I tracked down the code by adding some variables log and found that
flash_dev_run is sometimes called with values higher than
0x9000000000000000 and makes bcache lock system boot:

=46eb 26 16:40:02 minijep kernel: [ 3.062304] sd 1:0:0:0: [sda]
500118192 512-byte logical blocks: (256 GB/238 GiB)
=46eb 26 16:40:02 minijep kernel: [ 3.062342] sd 1:0:0:0: [sda] Writ=
e
Protect is off
=46eb 26 16:40:02 minijep kernel: [ 3.062357] sd 1:0:0:0: [sda] Writ=
e
cache: enabled, read cache: enabled, doesn't support DPO or FUA
=46eb 26 16:40:02 minijep kernel: [ 3.063053] sda: sda1 sda2 sda3
=46eb 26 16:40:02 minijep kernel: [ 3.063305] sd 1:0:0:0: [sda]
Attached SCSI disk
=46eb 26 16:40:02 minijep kernel: [ 3.065135] sr0: scsi3-mmc drive:
24x/24x writer dvd-ram cd/rw xa/form2 cdda tray
=46eb 26 16:40:02 minijep kernel: [ 3.065136] cdrom: Uniform CD-ROM
driver Revision: 3.20
=46eb 26 16:40:02 minijep kernel: [ 3.065351] sd 4:0:0:0: [sdb]
1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
=46eb 26 16:40:02 minijep kernel: [ 3.065355] sd 4:0:0:0: [sdb]
4096-byte physical blocks
=46eb 26 16:40:02 minijep kernel: [ 3.065394] sd 4:0:0:0: [sdb] Writ=
e
Protect is off
=46eb 26 16:40:02 minijep kernel: [ 3.065409] sd 4:0:0:0: [sdb] Writ=
e
cache: enabled, read cache: enabled, doesn't support DPO or FUA
=46eb 26 16:40:02 minijep kernel: [ 3.066631] sd 1:0:0:0: Attached
scsi generic sg0 type 0
=46eb 26 16:40:02 minijep kernel: [ 3.066669] sr 2:0:0:0: Attached
scsi generic sg1 type 5
=46eb 26 16:40:02 minijep kernel: [ 3.066710] sd 4:0:0:0: Attached
scsi generic sg2 type 0
=46eb 26 16:40:02 minijep kernel: [ 3.144996] bcache:
bch_cache_set_alloc() P7000: bch_cache_set_alloc: c->nr_uuids 4096
=46eb 26 16:40:02 minijep kernel: [ 3.145035] bio: create slab <bio-=
1> at 1
=46eb 26 16:40:02 minijep kernel: [ 3.357056] sdb: sdb1 sdb2 sdb5 s=
db6 sdb7
=46eb 26 16:40:02 minijep kernel: [ 3.357972] sd 4:0:0:0: [sdb]
Attached SCSI disk
=46eb 26 16:40:02 minijep kernel: [ 3.386779] bcache: uuid_io() read
UUIDs at 0:0 len 1024 -> [0:60275712 gen 0]
=46eb 26 16:40:02 minijep kernel: [ 3.499528] Switched to clocksourc=
e tsc
=46eb 26 16:40:02 minijep kernel: [ 3.928562] bcache:
bch_journal_replay() journal replay done, 354 keys in 31 entries, seq
1076068
=46eb 26 16:40:02 minijep kernel: [ 3.928717] bcache:
_debug_show_uuids_of_cache_set() probes17-302: flash_dev_run: u
ffff880801300000 c->nr_uuids 1000 u + c->nr_uuids ffff880801380000,
sizeof(struct uuid_entry) 128, c->nr_uuids*sizeof(struct uuid_entry)
524288
=46eb 26 16:40:02 minijep kernel: [ 3.928720] bcache:
_debug_show_uuids_of_cache_set() probes17-9999: flash_dev_run: u
ffff880801300000 c->nr_uuids 1000 u + c->nr_uuids ffff880801380000,
sizeof(struct uuid_entry) 128, c->nr_uuids*sizeof(struct uuid_entry)
524288
=46eb 26 16:40:02 minijep kernel: [ 3.928723] bcache:
flash_devs_run() P00: flash_dev_run: uuids[0] u: ffff880801300000,
u->sectors: 1504f6b8
=46eb 26 16:40:02 minijep kernel: [ 3.928724] bcache:
flash_devs_run() P00: flash_dev_run: uuids[1] u: ffff880801300080,
u->sectors: aed46001
=46eb 26 16:40:02 minijep kernel: [ 3.928726] bcache:
flash_devs_run() P00: flash_dev_run: uuids[2] u: ffff880801300100,
u->sectors: 9000000000800000
=46eb 26 16:40:02 minijep kernel: [ 3.928727] bcache:
flash_devs_run() P00: flash_dev_run: uuids[3] u: ffff880801300180,
u->sectors: 15050838
=46eb 26 16:40:02 minijep kernel: [ 3.928729] bcache:
flash_devs_run() P00: flash_dev_run: uuids[4] u: ffff880801300200,
u->sectors: 13e9fe800
=46eb 26 16:40:02 minijep kernel: [ 3.928730] bcache:
flash_devs_run() P00: flash_dev_run: uuids[5] u: ffff880801300280,
u->sectors: 9000000040000000
=46eb 26 16:40:02 minijep kernel: [ 3.928732] bcache:
flash_devs_run() P00: flash_dev_run: uuids[6] u: ffff880801300300,
u->sectors: 15051ff0
=46eb 26 16:40:02 minijep kernel: [ 3.928733] bcache:
flash_devs_run() P00: flash_dev_run: uuids[7] u: ffff880801300380,
u->sectors: 13ec38800
=46eb 26 16:40:02 minijep kernel: [ 3.928734] bcache:
flash_devs_run() P00: flash_dev_run: uuids[8] u: ffff880801300400,
u->sectors: 9000000037000000
=46eb 26 16:40:02 minijep kernel: [ 3.928736] bcache:
flash_devs_run() P00: flash_dev_run: uuids[9] u: ffff880801300480,
u->sectors: 1505a648
=46eb 26 16:40:02 minijep kernel: [ 3.928738] bcache:
flash_devs_run() P00: flash_dev_run: uuids[10] u: ffff880801300500,
u->sectors: 948d8800
=46eb 26 16:40:02 minijep kernel: [ 3.928740] bcache:
flash_devs_run() P00: flash_dev_run: uuids[11] u: ffff880801300580,
u->sectors: 9000000002000000
=46eb 26 16:40:02 minijep kernel: [ 3.928741] bcache:
flash_devs_run() P00: flash_dev_run: uuids[12] u: ffff880801300600,
u->sectors: 15400798
=46eb 26 16:40:02 minijep kernel: [ 3.928743] bcache:
flash_devs_run() P00: flash_dev_run: uuids[13] u: ffff880801300680,
u->sectors: 2da2c000
=46eb 26 16:40:02 minijep kernel: [ 3.928744] bcache:
flash_devs_run() P00: flash_dev_run: uuids[14] u: ffff880801300700,
u->sectors: 9000000000800000

Those enormous values (incorrectly read from device?) make no sense,
compared to normal values. That is why, in order to be able to boot, I
add an if filter in flash_devs_run() in super.c:

-------in super.c---------------------------------------------
static int flash_devs_run(struct cache_set *c)
{
int ret =3D 0;
struct uuid_entry *u;

for (u =3D c->uuids;
u < c->uuids + c->nr_uuids && !ret;
u++)
if (u->sectors < 0x9000000000000000) //line added to be
able to root boot!
if (UUID_FLASH_ONLY(u))
ret =3D flash_dev_run(c, u);

return ret;
}
-------------------------------------------------------------

Even with this "patch", system boots but it has a 30 second wait
during boot process. See wait from time 5.7 to 35.66 log lines:

=46eb 27 21:05:33 minijep kernel: [ 5.758655] [drm] Enabling RC6
states: RC6 on, RC6p off, RC6pp off
=46eb 27 21:05:33 minijep kernel: [ 35.663535] Adding 32225276k swap
on /dev/sdb6. Priority:-1 extents:1 across:32225276k
=46eb 27 21:05:33 minijep kernel: [ 35.686366] EXT4-fs (bcache0):
re-mounted. Opts: (null)
=46eb 27 21:05:33 minijep kernel: [ 35.796078] EXT4-fs (bcache0):
re-mounted. Opts: errors=3Dremount-ro
=46eb 27 21:05:33 minijep kernel: [ 36.026857] fuse init (API version=
7.22)
=46eb 27 21:05:33 minijep kernel: [ 36.035753] loop: module loaded



--=20
--
Salutacions...Josep
--
Ian Pilcher
2014-07-22 04:46:45 UTC
Permalink
Post by Josep Lladonosa
I tracked down the code by adding some variables log and found that
flash_dev_run is sometimes called with values higher than
This does not appear to have anything to do with the "device busy"
error. Please don't hijack threads.

Thanks!
--
========================================================================
Ian Pilcher ***@gmail.com
-------- "I grew up before Mark Zuckerberg invented friendship" --------
========================================================================
Ian Pilcher
2014-07-22 05:13:00 UTC
Permalink
Post by Ian Pilcher
(1) Change bcache_register to return -EBUSY in the device busy case
(while still returning -EINVAL in the already registered case).
(2) Change bcache-register to check the exit code of the registration
attempt and retry in the EBUSY case.
Does this make sense?
I've gone ahead and implemented an initial version of this approach.
See the attached files:

* linux-bcache-register-retval.patch - Makes bcache_register return
-EBUSY when it is unable to get exclusive access to the device, but
the device is not already registered. It still returns -EINVAL in all
other error cases.

This allows userspace to distinguish the "device busy" case from other
errors. I couldn't find an easy way to make this determination from a
shell script, though, so I created ...

* bcreg.c - This does most of the work that was previously done in the
bcache-register script. Specifically, it will wait for the sysfs
register file to be created and then write the name of the device to
that file -- retrying if it encounters an EBUSY.

* bcache-register - bcreg doesn't call modprobe, so this script is still
required. It now calls bcreg to register the device.

Thoughts?
--
========================================================================
Ian Pilcher ***@gmail.com
-------- "I grew up before Mark Zuckerberg invented friendship" --------
========================================================================
Ian Pilcher
2014-07-23 03:43:05 UTC
Permalink
Filed as https://bugzilla.kernel.org/show_bug.cgi?id=80961.
--
========================================================================
Ian Pilcher ***@gmail.com
-------- "I grew up before Mark Zuckerberg invented friendship" --------
========================================================================
Rolf Fokkens
2014-08-13 20:24:33 UTC
Permalink
Hi Ian,

Sorry for the late response. Due to holidays it escaped my attention.

I'm running a very similar setup, but my system boots 100% of the time.
So it may be useful to find out what's causing the problems at your
system. You're using Intel RAID and I'm using Linux software RAID. It
may be relevant, I don't know.

These are the details of my system, maybe you can spot a significant
difference:

[***@home07 ~]# cat /proc/version
Linux version 3.15.6-200.fc20.x86_64
(***@bkernel02.phx2.fedoraproject.org) (gcc version 4.8.3 20140624
(Red Hat 4.8.3-1) (GCC) ) #1 SMP Fri Jul 18 02:36:27 UTC 2014
[***@home07 ~]#
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/BCACHE-ROOTFS 79G 56G 20G 75% /
devtmpfs 3.9G 0 3.9G 0% /dev
tmpfs 3.9G 212K 3.9G 1% /dev/shm
tmpfs 3.9G 9.2M 3.9G 1% /run
tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
tmpfs 3.9G 888K 3.9G 1% /tmp
/dev/md0 462M 383M 56M 88% /boot
[***@home07 ~]# vgdisplay
--- Volume group ---
VG Name BCACHE
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 18
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 3
Open LV 2
Max PV 0
Cur PV 1
Act PV 1
VG Size 139.91 GiB
PE Size 4.00 MiB
Total PE 35816
Alloc PE / Size 35328 / 138.00 GiB
Free PE / Size 488 / 1.91 GiB
VG UUID jIxLKK-ASqT-hlHy-D87m-lVLu-TFFc-7Tncp6

[***@home07 ~]# pvdisplay
--- Physical volume ---
PV Name /dev/bcache0
VG Name BCACHE
PV Size 139.91 GiB / not usable 2.87 MiB
Allocatable yes
PE Size 4.00 MiB
Total PE 35816
Free PE 488
Allocated PE 35328
PV UUID McXfNf-PEn1-DFEl-pAsX-3aIz-C2y6-xf75QV
[***@home07 ~]# bcache-status -s
--- bcache ---
UUID bc9e13cb-b50d-4016-bb52-1e20390ce248
Block Size 512 B
Bucket Size 512.00 KiB
Congested? False
Read Congestion 0.0ms
Write Congestion 0.0ms
Total Cache Size 30 GiB
Total Cache Used 23 GiB (77%)
Total Cache Unused 7 GiB (23%)
Evictable Cache 28 GiB (94%)
Replacement Policy [lru] fifo random
Cache Mode writethrough [writeback] writearound none
Total Hits 155910 (95%)
Total Misses 7204
Total Bypass Hits 5230 (100%)
Total Bypass Misses 0
Total Bypassed 4.0 MiB
--- Backing Device ---
Device File /dev/md2 (9:2)
bcache Device File /dev/bcache0 (252:0)
Size 140 GiB
Cache Mode writethrough [writeback] writearound none
Readahead 0
Sequential Cutoff 0 B
Merge sequential? False
State dirty
Writeback? True
Dirty Data 2 GiB
Total Hits 155910 (95%)
Total Misses 7204
Total Bypass Hits 5230 (100%)
Total Bypass Misses 0
Total Bypassed 4.0 MiB
--- Cache Device ---
Device File /dev/sdd1 (8:49)
Size 30 GiB
Block Size 512 B
Bucket Size 512.00 KiB
Replacement Policy [lru] fifo random
Discard? False
I/O Errors 0
Metadata Written 43.9 MiB
Data Written 4 GiB
Buckets 61440
Cache Used 23 GiB (77%)
Cache Unused 7 GiB (23%)
[***@home07 ~]# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md1 : active raid5 sdc3[0] sda3[1] sdb3[2]
1027968 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

md0 : active raid1 sdc1[0] sda1[1] sdb1[2]
496896 blocks [3/3] [UUU]

md2 : active raid5 sda5[1] sdc5[0] sdb5[2]
146705280 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

unused devices: <none>
[***@home07 ~]#

sda, sdb and sdc are SAMSUNG HD160JJ disks
sdd is a SanDisk SDSSDP06

The following may also be relevant, your device may be loked due to
misidentification:

[***@home07 ~]# for i in /dev/sd[abc]1 /dev/sd[abc]3 /dev/md2 /dev/sdd1
/dev/bcache0 ; do echo $i; wipefs "$i" | sed 's/^/ /'; done
/dev/sda1
offset type
----------------------------------------------------------------
0x438 ext3 [filesystem]
LABEL: BOOT
UUID: a3768dfd-37ec-45d1-a01b-76280ed390d0

0x1e540000 linux_raid_member [raid]
UUID: b7036aaf-3c8d-e714-bfe7-8010bc810f04

/dev/sdb1
offset type
----------------------------------------------------------------
0x438 ext3 [filesystem]
LABEL: BOOT
UUID: a3768dfd-37ec-45d1-a01b-76280ed390d0

0x1e540000 linux_raid_member [raid]
UUID: b7036aaf-3c8d-e714-bfe7-8010bc810f04

/dev/sdc1
offset type
----------------------------------------------------------------
0x438 ext3 [filesystem]
LABEL: BOOT
UUID: a3768dfd-37ec-45d1-a01b-76280ed390d0

0x1e540000 linux_raid_member [raid]
UUID: b7036aaf-3c8d-e714-bfe7-8010bc810f04

/dev/sda3
offset type
----------------------------------------------------------------
0x1f5f0000 linux_raid_member [raid]
UUID: 59d3d229-892d-7dae-e109-537ecd2580d5

/dev/sdb3
offset type
----------------------------------------------------------------
0x218 LVM2_member [raid]
UUID: 12Zw7I-EFzj-hX5g-MXyM-0LTu-rg9d-vi25QE

0x1f5f0000 linux_raid_member [raid]
UUID: 59d3d229-892d-7dae-e109-537ecd2580d5

/dev/sdc3
offset type
----------------------------------------------------------------
0x218 LVM2_member [raid]
UUID: 12Zw7I-EFzj-hX5g-MXyM-0LTu-rg9d-vi25QE

0x1f5f0000 linux_raid_member [raid]
UUID: 59d3d229-892d-7dae-e109-537ecd2580d5

/dev/md2
offset type
----------------------------------------------------------------
0x1018 bcache [other]
UUID: 63aef7ae-d550-4ca6-8063-0b7d0cd63ad5

/dev/sdd1
offset type
----------------------------------------------------------------
0x1018 bcache [other]
UUID: 0d553929-3ef5-4f65-8479-2868bbba7329

/dev/bcache0
offset type
----------------------------------------------------------------
0x218 LVM2_member [raid]
UUID: McXfNf-PEn1-DFEl-pAsX-3aIz-C2y6-xf75QV

[***@home07 ~]#


Note the single (bcache) signature on md2.Check if your md126p2 RAID
device also has single signature.

Also note the double signatures on sdb3 and sdc3. I wasn't aware of
this, these double signatures might get me into trouble if LVM would
claim them before linux raid. But I've been lucky apparently.

Rolf
Post by Ian Pilcher
I just finished moving my existing Fedora 20 root filesystem onto a
bcache device (actually LVM on top of a bcache physical volume).
The bcache cache device is /dev/sda2, a partition on my SSD; the backing
device is /dev/md126p5, a partition on an Intel RAID (imsm) volume.
This configuration only boots successfully about 50% of the time. The
other 50% of the time, the bcache device is not created, and dracut
times out and dumps me into an emergency shell.
After changing the bcache-register script to use /sys/fs/bcache/register
(instead of register_quiet), I see a "device busy" error when udev
device busy
This is kernel 3.5.15, so this doesn't mean that the device is already
registered; something else has it (temporarily) opened. I say that it's
opened temporarily, because I am able to register the backing device
manually from the dracut shell -- which starts the the bcache device.
Looking at /usr/lib/udev/bcache-register and the bcache_register source
in drivers/md/bcache/super.c, I notice 2 things.
(1) bcache-register gives up immediately when an error occurs because of
a (possibly temporary) conflict.
(2) Although the driver logs a different message in the already
registered case ("device already registered" instead of "device
busy"), it doesn't provide userspace with any way to distinguish the
two cases; it always returns -EINVAL.
(1) Change bcache_register to return -EBUSY in the device busy case
(while still returning -EINVAL in the already registered case).
(2) Change bcache-register to check the exit code of the registration
attempt and retry in the EBUSY case.
Does this make sense?
Ian Pilcher
2014-08-20 01:38:44 UTC
Permalink
Post by Rolf Fokkens
I'm running a very similar setup, but my system boots 100% of the time.
So it may be useful to find out what's causing the problems at your
system. You're using Intel RAID and I'm using Linux software RAID. It
may be relevant, I don't know.
I suspect that the most significant difference is that my backing device
is a partition on a RAID device, rather than the device itself.

In reality, however, I don't think it's worth spending a lot of time
figuring out exactly what else is accessing the device. If I understand
udev correctly, this sort of "conflict" is an inevitable consequence of
its asynchronous nature, and helper programs like bcache-register need
to be robust enough to deal with them.

(If I'm wrong about this, then I guess this is a udev bug?)

BTW, this is what the bcache-related dmesg entries look like when I hit
Post by Rolf Fokkens
[ 1.761932] bcache: bch_journal_replay() journal replay done, 1972 keys in 75 entries, seq 312536
[ 1.762043] bcache: register_cache() registered cache device sda2
[ 1.762631] bcache: register_bcache() error opening /dev/md126p5: device busy
[ 1.862949] bcache: register_bdev() registered backing device md126p5
[ 1.864268] bcache: bch_cached_dev_attach() Caching md126p5 as bcache0 on set fb8d253b-6144-4b48-8ed6-7b1724ed83a7
[ 3.553053] bcache: register_bcache() error opening /dev/sda2: device already registered
[ 4.182180] bcache: register_bcache() error opening /dev/md126p5: device already registered
You can see that the backing device is busy when it first tries to
register it, but it succeeds 1/10th of a second later. (I'm assuming
the last two "already registered" messages occur when udev replays
its events later in the boot process.)

Thanks!
--
========================================================================
Ian Pilcher ***@gmail.com
-------- "I grew up before Mark Zuckerberg invented friendship" --------
========================================================================
Rolf Fokkens
2014-09-06 07:34:58 UTC
Permalink
Post by Ian Pilcher
I suspect that the most significant difference is that my backing device
is a partition on a RAID device, rather than the device itself.
In reality, however, I don't think it's worth spending a lot of time
figuring out exactly what else is accessing the device. If I understand
udev correctly, this sort of "conflict" is an inevitable consequence of
its asynchronous nature, and helper programs like bcache-register need
to be robust enough to deal with them.
(If I'm wrong about this, then I guess this is a udev bug?)
BTW, this is what the bcache-related dmesg entries look like when I hit
Post by Rolf Fokkens
[ 1.761932] bcache: bch_journal_replay() journal replay done, 1972
keys in 75 entries, seq 312536
Post by Ian Pilcher
Post by Rolf Fokkens
[ 1.762043] bcache: register_cache() registered cache device sda2
[ 1.762631] bcache: register_bcache() error opening /dev/md126p5: device busy
[ 1.862949] bcache: register_bdev() registered backing device md126p5
[ 1.864268] bcache: bch_cached_dev_attach() Caching md126p5 as
bcache0 on set fb8d253b-6144-4b48-8ed6-7b1724ed83a7
Post by Ian Pilcher
Post by Rolf Fokkens
[ 3.553053] bcache: register_bcache() error opening /dev/sda2: device already registered
device already registered
Post by Ian Pilcher
You can see that the backing device is busy when it first tries to
register it, but it succeeds 1/10th of a second later. (I'm assuming
the last two "already registered" messages occur when udev replays
its events later in the boot process.)
To me your assumption makes sense. I would be good though if Gabriel (bcache
userspace) and/or Kent (bcache kernel) would respond to this.

I'll add some retries to bcache tools (in case of any error) and see if that
works and add a pull request here: https://github.com/g2p/bcache-tools
Loading...