Bug 581590

Summary: lvm2 lvremove on snapshots works only sometimes
Product: [openSUSE] openSUSE 11.2
Reporter: flo gleixner <gleixner>
Component: Basesystem
Assignee: Xin Wei Hu <xwhu>
Status: VERIFIED UPSTREAM
QA Contact: E-mail List <qa-bugs>
Severity: Normal
Priority: P5 - None
CC: kuehne.stefan
Version: Final
Target Milestone: ---
Hardware: x86-64
OS: openSUSE 11.2
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---

Description flo gleixner 2010-02-20 15:51:19 UTC
User-Agent:       Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.7) Gecko/20091222 SUSE/3.5.7-1.1.1 Firefox/3.5.7

When I try to remove an LVM2 snapshot, lvremove sometimes succeeds and sometimes fails.
This is probably related to bug #556177.

Reproducible: Sometimes

Steps to Reproduce:
1. Create the snapshot:
lvcreate -s -L 5g -n test-snap /dev/vg_system/testvol

(This step already exhibits bug #556177.)

2. Check the snapshot:

lvs |grep test-snap
  test-snap    vg_system swi-a-   5.00G encrypted   0.00                        
lvdisplay /dev/vg_system/test-snap
  --- Logical volume ---
  LV Name                /dev/vg_system/test-snap
  VG Name                vg_system
  LV UUID                7Mk7e9-Qifn-WBS0-jA0G-z8AF-0H6g-gyUXdS
  LV Write Access        read/write
  LV snapshot status     active destination for /dev/vg_system/encrypted
  LV Status              available
  # open                 0
  LV Size                32.00 GB
  Current LE             8192
  COW-table size         5.00 GB
  COW-table LE           1280
  Allocated to snapshot  0.00% 
  Snapshot chunk size    4.00 KB
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:25
   

3. Try to remove it:
willie:~ # lvremove /dev/vg_system/test-snap 
  Can't remove open logical volume "test-snap"
willie:~ # lvremove /dev/vg_system/test-snap 
  Can't remove open logical volume "test-snap"
willie:~ # lvremove /dev/vg_system/test-snap 
  Can't remove open logical volume "test-snap"
willie:~ # lvremove /dev/vg_system/test-snap 
  Can't remove open logical volume "test-snap"
willie:~ # lvremove /dev/vg_system/test-snap 
  Can't remove open logical volume "test-snap"
willie:~ # lvremove /dev/vg_system/test-snap 
  Can't remove open logical volume "test-snap"
willie:~ # lvremove /dev/vg_system/test-snap 
  Can't remove open logical volume "test-snap"
willie:~ # lvremove /dev/vg_system/test-snap 
  Can't remove open logical volume "test-snap"
willie:~ # lvremove /dev/vg_system/test-snap 
Do you really want to remove active logical volume "test-snap"? [y/n]: y
  Logical volume "test-snap" successfully removed
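The `lvs` output in step 2 shows the attribute string `swi-a-`. In the lv_attr column printed by `lvs`, the sixth character is `o` when the device is open, and `lvdisplay` likewise reports `# open 0`, so nothing appeared to hold the snapshot open even though lvremove refused to remove it. A small helper that checks that character could look like this (a sketch; the name `lv_attr_is_open` is hypothetical):

```shell
# Return success if an LVM2 lv_attr string (the Attr column of `lvs`)
# marks the device as open. The 6th character is 'o' for an open device.
# lv_attr_is_open is a hypothetical helper name.
lv_attr_is_open() {
    # cut -c6 extracts the sixth character of the attribute string
    open_flag=$(printf '%s\n' "$1" | cut -c6)
    [ "$open_flag" = "o" ]
}

# Possible use (needs root and an existing LV):
#   attr=$(lvs --noheadings -o lv_attr vg_system/test-snap | tr -d ' ')
#   if lv_attr_is_open "$attr"; then echo open; else echo "not open"; fi
```

With the attribute string from step 2, `lv_attr_is_open "swi-a-"` fails, i.e. the snapshot was not reported as open at the time of the check.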
Comment 1 Stefan Kuehne 2010-09-22 08:55:16 UTC
Hi, 

Same behaviour on openSUSE 11.3. (We didn't use 11.2; the error didn't occur on 11.1.)
We use snapshots to back up Xen DomUs, so a patch would be desirable.
Comment 2 Xin Wei Hu 2010-09-23 02:44:15 UTC
(In reply to comment #1)
> Hi, 
> 
> same behaviour on OpenSUSE 11.3. (We didn't use 11.2. The error didn't occur in
> 11.1)
> We use snapshots to backup Xen-DomU's, so a patch would be desirable.

Hi,

  I tried once to reproduce this on my machine, but failed.
  Could you provide step-by-step instructions for reproducing it?

  Thanks.
Comment 3 Stefan Kuehne 2010-09-23 07:38:36 UTC
Hi,

Thanks for the help.
I also tried to reproduce the error in a test environment, but failed. In our production environment the error occurs intermittently. Since last week we have been running 11.3 on one server; migrating the other servers is planned.

We do the following:

We have several physical servers running Xen paravirtualized machines.
These DomUs have their partitions attached like this:
disk = [ 'phy:/dev/system/helios-dns_root,hda1,w',
         'phy:/dev/system/helios-dns_swap,hdb1,w']

We back up the DomUs with the following procedure:
1. Shutdown the DomU
  xm shutdown -w ${DomU}
2. Create a snapshot
  lvcreate --snapshot --size 2000M --name ${Partition}.backup ${Partition}
3. Start the DomU
  xm start ${DomU}
4. Mount the snapshot
5. Create a tar archive of the mounted snapshot
6. Remove the snapshot
  lvremove -f ${Partition}.backup
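The six steps above could be wrapped in a shell function roughly as follows. This is a sketch, not the reporter's actual script: the function name `backup_domu`, the mount point `/mnt/backup`, the archive directory `/srv/backup`, the gzip-compressed tar and the read-only mount are assumptions (steps 4 and 5 give no commands), and the second argument is assumed to be a full LV path such as `/dev/system/helios-dns_root`, so only its base name is passed to `--name`.

```shell
# Hypothetical wrapper for the backup procedure described above.
# $1: DomU name, $2: full LV path of the partition to back up.
# BACKUP_MNT and BACKUP_DEST are assumed settings with assumed defaults.
backup_domu() {
    domu=$1
    partition=$2
    mnt=${BACKUP_MNT:-/mnt/backup}
    dest=${BACKUP_DEST:-/srv/backup}
    snap="${partition}.backup"

    # 1. Shut down the DomU and wait until it is down
    xm shutdown -w "$domu" || return 1
    # 2. Create a 2000 MB snapshot next to the origin LV
    lvcreate --snapshot --size 2000M --name "$(basename "$partition").backup" "$partition" || return 1
    # 3. Start the DomU again; it keeps running on the origin LV
    xm start "$domu" || return 1
    # 4. Mount the snapshot read-only
    mkdir -p "$mnt" && mount -o ro "$snap" "$mnt" || return 1
    # 5. Archive the snapshot contents, remembering tar's exit status
    tar -czf "$dest/$(basename "$partition").tar.gz" -C "$mnt" .
    tar_status=$?
    umount "$mnt"
    # 6. Remove the snapshot (the step that fails intermittently)
    lvremove -f "$snap" || return 1
    return "$tar_status"
}

# Example: backup_domu helios-dns /dev/system/helios-dns_root
```

The snapshot is unmounted before `lvremove`; if it were still mounted, "Can't remove open logical volume" would be the expected result rather than a sign of this bug.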


Since yesterday our backup script has used a workaround (it tries to remove the snapshot up to 20 times :-). I will do some more testing and wait until we migrate the other servers. Maybe the error depends on the server. If I have new information, I will post a comment.
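The retry workaround mentioned above might look like the following sketch. The function name `remove_snapshot_with_retry` and the pause between attempts (`RETRY_DELAY`, default 1 second) are assumptions; the default of 20 attempts and `lvremove -f` mirror the report.

```shell
# Hypothetical retry loop: call lvremove up to $2 times (default 20)
# and stop at the first success.
remove_snapshot_with_retry() {
    snap=$1
    max_tries=${2:-20}
    try=1
    while [ "$try" -le "$max_tries" ]; do
        if lvremove -f "$snap"; then
            echo "removed $snap after $try attempt(s)"
            return 0
        fi
        try=$((try + 1))
        # Give whatever briefly holds the device open time to release it
        sleep "${RETRY_DELAY:-1}"
    done
    echo "giving up on $snap after $max_tries attempts" >&2
    return 1
}

# Example: remove_snapshot_with_retry /dev/system/helios-dns_root.backup
```

This only papers over the race; it does not address its cause.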

regards Stefan
Comment 4 Stefan Kuehne 2010-09-23 12:05:14 UTC
It depends on the server: I can reproduce the error on only one of two servers.

Redhat has the same problem: https://bugzilla.redhat.com/show_bug.cgi?id=577798

When I stop boot.udev, the problem doesn't occur.
Comment 5 Xin Wei Hu 2010-09-24 03:19:22 UTC
Hi Stefan,
  This is very helpful info.
  Could you run
"""
grep -r watch /etc/udev/rules.d/ /lib/udev/rules.d/
"""
  and check whether it lists an 80-udisks.rules?
If so, could you delete it and retest?

  Thanks.

(In reply to comment #4)
> It depends from the server. I can reproduce the error only on one of two
> servers. 
> 
> Redhat has the same problem: https://bugzilla.redhat.com/show_bug.cgi?id=577798
> 
> When i stop boot.udev the problem doesn't occur.
Comment 6 Stefan Kuehne 2010-09-24 07:39:52 UTC
Hi Wei Hu,

I found the file /lib/udev/rules.d/80-udisks.rules.
If I move it away and restart /etc/init.d/boot.udev, the error still occurs.
If I move some other rules (1*) away, lvcreate hangs.

regards
Stefan
Comment 7 Xin Wei Hu 2010-09-24 07:45:12 UTC
(In reply to comment #6)
> Hi Wei Hu,
> 
> i found the file /lib/udev/rules.d/80-udisks.rules.
> If i move it away and restart /etc/init.d/boot.udev , the error occurs anyway.

Can you move 80-udisks.rules back to where it was and comment out the following line?
"""
KERNEL=="dm-*", OPTIONS+="watch"
"""
Someone has reported that this worked for them.

> If i move some other rules (1*) away, then lvcreate hangs.

10-dm.rules       11-dm-lvm.rules   13-dm-disk.rules  
These three files are critical for lvm2 to work properly on 11.3.
You should not remove any of them.

Thanks
Comment 9 Stefan Kuehne 2010-09-24 07:55:18 UTC
No change in behaviour.
Comment 10 Xin Wei Hu 2010-09-24 08:43:04 UTC
(In reply to comment #9)
> No change in behaviour.

If you run
"""
grep -r watch /etc/udev/rules.d/ /lib/udev/rules.d/
"""
do you find any other rules that watch device-mapper devices?
Comment 11 Stefan Kuehne 2010-09-24 13:06:33 UTC
keto-0:~ # grep -r watch /etc/udev/rules.d/ /lib/udev/rules.d/
/lib/udev/rules.d/80-udisks.rules:KERNEL=="dm-*", OPTIONS+="watch"
/lib/udev/rules.d/60-persistent-storage.rules:# watch for future changes
/lib/udev/rules.d/60-persistent-storage.rules:KERNEL!="sr*", OPTIONS+="watch"
/lib/udev/rules.d/13-dm-disk.rules:OPTIONS+="watch"
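A plain `grep -r watch` also matches comments such as the "# watch for future changes" line in 60-persistent-storage.rules. A slightly stricter search that lists only active rule lines setting the watch option could look like this sketch; the function name is hypothetical, and it assumes the rules use the exact spelling `OPTIONS+="watch"` seen in the output above.

```shell
# Hypothetical helper: print file:line for every non-comment udev rule
# line that sets OPTIONS+="watch". Lines where a '#' precedes the
# OPTIONS key (comments, commented-out rules) are skipped.
find_watch_rules() {
    # Default to the two rule directories searched in this report
    [ "$#" -gt 0 ] || set -- /etc/udev/rules.d /lib/udev/rules.d
    grep -rHE '^[^#]*OPTIONS\+="watch"' "$@"
}

# Example: find_watch_rules
```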
Comment 12 Stefan Kuehne 2010-09-24 13:13:10 UTC
If I comment out the watch lines in 80-udisks.rules and 13-dm-disk.rules, the error no longer occurs. :-)

Is this a possible workaround?
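The workaround can be applied without editing the packaged files, assuming that a rules file in /etc/udev/rules.d takes precedence over a file with the same name in /lib/udev/rules.d. The following sketch (hypothetical function name `disable_dm_watch`) writes copies of the two files with their active `OPTIONS+="watch"` lines commented out:

```shell
# Hypothetical helper: copy 80-udisks.rules and 13-dm-disk.rules from
# $1 (default /lib/udev/rules.d) to $2 (default /etc/udev/rules.d),
# prefixing every non-comment line containing OPTIONS+="watch" with '#'.
disable_dm_watch() {
    src_dir=${1:-/lib/udev/rules.d}
    dst_dir=${2:-/etc/udev/rules.d}
    for rules in 80-udisks.rules 13-dm-disk.rules; do
        [ -f "$src_dir/$rules" ] || continue
        sed -E 's/^([^#]*OPTIONS\+="watch".*)$/#\1/' \
            "$src_dir/$rules" > "$dst_dir/$rules" || return 1
    done
}

# Example (as root): disable_dm_watch
```

After running it, udev has to pick up the new rules, for example by restarting /etc/init.d/boot.udev as in comment 6. The copies should be deleted again once a fixed upstream version is installed, otherwise they keep masking the packaged files.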
Comment 13 Xin Wei Hu 2010-09-26 06:39:11 UTC
(In reply to comment #12)
> if i comment out the lines in 80-udisks.rules and 13-dm-disk.rules, then the
> error doesn't occur. :-)
> 
> Is this a possible workaround?

Good to know ;)
Yes, I think this can be a workaround.

The interaction between device-mapper and the udev watch option is a long-standing problem.
We will follow upstream to resolve this issue.

Thanks for testing this.