Mailing List Archive

BUG: Uncatchable DRBD out-of-sync issue
Beginning is here: http://www.gossamer-threads.com/lists/drbd/users/25146

Hello everybody,

Finally I think I can reproduce the issue. When it happens:

Linux kernel: 2.6.32-19-pve (based on vzkernel-2.6.32-042stab075.2.src.rpm)
DRBD Version: 8.3.13
DRBD Mode: dual primary + LVM on top of DRBD
Virtual Machine on top of LVM: Ubuntu 10.04 or 12.04
Virtual Machine hard drive: VIRTIO
Out of sync location: SWAP partition of a Virtual Machine
Notes: the out-of-sync blocks do not appear only during verification,
because the issue can be reproduced even if the VM is stopped while
verification is running.

New out-of-sync blocks appear about three times a week.
If I change VIRTIO to IDE, there are no issues.
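
(For context, the weekly verification referred to throughout this thread is
DRBD's online verify. A minimal sketch of such a run - the commands are
illustrative, assuming a resource named r0 with verify-alg configured:)
----------------------------------------------------------
# start an online verify of resource r0 on one node
drbdadm verify r0

# watch progress; the "oos:" counter shows out-of-sync sectors found so far
cat /proc/drbd

# the affected ranges are logged by the kernel
grep 'Out of sync' /var/log/kern.log

# verify only marks blocks as out of sync; to resynchronize them,
# disconnect and reconnect the resource on one node
drbdadm disconnect r0 && drbdadm connect r0
----------------------------------------------------------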

Best regards,
Stanislav
_______________________________________________
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user
Re: BUG: Uncatchable DRBD out-of-sync issue [ In reply to ]
On 04/18/2013 08:26 AM, Stanislav German-Evtushenko wrote:
> If I change VIRTIO to IDE then no issues.

Fascinating. Thanks for sharing!

Note that your kernel (and hence kvm/virtio) can be considered rather
old by now. You may see better mileage with the more recent longterm
kernels such as 3.4 or 3.2 (with native DRBD support, also).

Cheers,
Felix
Re: BUG: Uncatchable DRBD out-of-sync issue [ In reply to ]
> Note that your kernel (and hence kvm/virtio) can be considered rather old by now.
This is a stable RHEL 6 kernel at the moment.


On Thu, Apr 18, 2013 at 1:16 PM, Felix Frank <ff@mpexnet.de> wrote:
> On 04/18/2013 08:26 AM, Stanislav German-Evtushenko wrote:
>> If I change VIRTIO to IDE then no issues.
>
> Fascinating. Thanks for sharing!
>
> Note that your kernel (and hence kvm/virtio) can be considered rather
> old by now. You may see better mileage with the more recent longterm
> kernels such as 3.4 or 3.2 (with native DRBD support, also).
>
> Cheers,
> Felix



--
www.helplinux.ru - Find yourself a Guru
Re: BUG: Uncatchable DRBD out-of-sync issue [ In reply to ]
On 04/18/2013 12:20 PM, Stanislav German-Evtushenko wrote:
>> Note that your kernel (and hence kvm/virtio) can be considered rather old by now.
> This is a stable RHEL 6 kernel at the moment.

Exactly ;-)

Same for Debian 6, which I no longer consider fit for KVM setups
(without backports and such).
Re: BUG: Uncatchable DRBD out-of-sync issue [ In reply to ]
No choice so far :)
http://pve.proxmox.com/wiki/Roadmap#Proxmox_VE_2.3

I don't think this is a kernel bug. Anyway, it would be nice if somebody
could investigate and fix it, or at least find a workaround. IDE is slow
compared to VIRTIO.

On Thu, Apr 18, 2013 at 2:31 PM, Felix Frank <ff@mpexnet.de> wrote:
> On 04/18/2013 12:20 PM, Stanislav German-Evtushenko wrote:
>>> Note that your kernel (and hence kvm/virtio) can be considered rather old by now.
>> This is a stable RHEL 6 kernel at the moment.
>
> Exactly ;-)
>
> Same for Debian 6, which I no longer consider fit for KVM setups
> (without backports and such).
Re: BUG: Uncatchable DRBD out-of-sync issue [ In reply to ]
On Thu, Apr 18, 2013 at 4:21 PM, Stanislav German-Evtushenko <
ginermail@gmail.com> wrote:

> No choice so far :)
> http://pve.proxmox.com/wiki/Roadmap#Proxmox_VE_2.3
>
> I don't think this is a kernel bug. Anyway would be nice if sombody
> can investigate and fix or at least find work around. IDE is slow in
> compare to VIRTIO.
>
> On Thu, Apr 18, 2013 at 2:31 PM, Felix Frank <ff@mpexnet.de> wrote:
> > On 04/18/2013 12:20 PM, Stanislav German-Evtushenko wrote:
> >>> Note that your kernel (and hence kvm/virtio) can be considered rather
> old by now.
> >> This is a stable RHEL 6 kernel at the moment.
> >
> > Exactly ;-)
> >
> > Same for Debian 6, which I no longer consider fit for KVM setups
> > (without backports and such).
>

I have replaced all hard drives on the first server and upgraded the DRBD
kernel modules to 8.3.15. I run verification every week. It usually finds new
out-of-sync sectors; I then check whether they are false positives (with
md5sum) and find that 95% of them are real.
Could anybody suggest a way to debug this? Could it be a DRBD + RAID problem,
or a problem between DRBD and one specific RAID setup?
Re: BUG: Uncatchable DRBD out-of-sync issue [ In reply to ]
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hi,

Just jumping in, unaware of the history of this thread...

Stanislav German-Evtushenko wrote, on 27-1-2014 7:08:
>
> On Thu, Apr 18, 2013 at 4:21 PM, Stanislav German-Evtushenko
> <ginermail@gmail.com <mailto:ginermail@gmail.com>> wrote:
>
> No choice so far :)
> http://pve.proxmox.com/wiki/Roadmap#Proxmox_VE_2.3
>
> I don't think this is a kernel bug. Anyway would be nice if sombody
> can investigate and fix or at least find work around. IDE is slow in
> compare to VIRTIO.
>
> On Thu, Apr 18, 2013 at 2:31 PM, Felix Frank <ff@mpexnet.de
> <mailto:ff@mpexnet.de>> wrote:
> > On 04/18/2013 12:20 PM, Stanislav German-Evtushenko wrote:
> >>> Note that your kernel (and hence kvm/virtio) can be considered
> rather old by now.
> >> This is a stable RHEL 6 kernel at the moment.
> >
> > Exactly ;-)
> >
> > Same for Debian 6, which I no longer consider fit for KVM setups
> > (without backports and such).
>
>
> I have replaced all hard-drives on the first server and upgraded DRBD kernel
> modules to 8.3.15. I do verifying every week. It usually founds new
> out-of-sync sectors, then I check if they are false-positive or not (with
> md5sum) and find that 95% of them are real.
> Could anybody suggest a way to debug? Can it be DRBD + RAID problem? Or DRBD
> + one specific RAID problem?

Have you figured out on which one of the servers the data is correct? And is
it always the same server? This assumes a primary/secondary setup.
If you know on which server the data is correct then you know - IF it's a
hardware problem - which server is at fault. If it's a software problem,
then you still can't tell.

Do you run a weekly/monthly RAID verification job? On both servers? Linux sw
raid has this, and presumably hw raid has this option as well.
This would pick up (most) RAID / disk issues.
Silent disk corruption on RAID arrays can occur and disk verification would
be the only way to tell (well, apart from using a filesystem like ZFS).
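
(For reference, on Linux software RAID such a verification run can be kicked
off by hand - a sketch, assuming the array is /dev/md0:)
----------------------------------------------------------
# start a consistency check ("scrub") of /dev/md0
echo check > /sys/block/md0/md/sync_action

# progress is visible here
cat /proc/mdstat

# after completion: count of 512-byte sectors that did not match
cat /sys/block/md0/md/mismatch_cnt
----------------------------------------------------------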

Good luck,

Bram.


- --
Bram Matthys
Software developer/IT consultant syzop@vulnscan.org
Website: www.vulnscan.org
PGP key: www.vulnscan.org/pubkey.asc
PGP fp: EBCA 8977 FCA6 0AB0 6EDB 04A7 6E67 6D45 7FE1 99A6
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (MingW32)

iF4EAREIAAYFAlLmToAACgkQbmdtRX/hmabbewD9HEaFbFw1j91AgDiAbgWcDari
qZ/fYOYBw/qyMMempbMA/iCKM5Y2Oa3XAUApPWc05cTZ+W9FyOGdOmNgIl4FMGE0
=z7Jn
-----END PGP SIGNATURE-----
Re: BUG: Uncatchable DRBD out-of-sync issue [ In reply to ]
On Mon, Jan 27, 2014 at 4:18 PM, Bram Matthys <syzop@vulnscan.org> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> Hi,
>
> Just jumping in, unaware of the history of this thread...
>
> Stanislav German-Evtushenko wrote, on 27-1-2014 7:08:
> >
> > On Thu, Apr 18, 2013 at 4:21 PM, Stanislav German-Evtushenko
> > <ginermail@gmail.com <mailto:ginermail@gmail.com>> wrote:
> >
> > No choice so far :)
> > http://pve.proxmox.com/wiki/Roadmap#Proxmox_VE_2.3
> >
> > I don't think this is a kernel bug. Anyway would be nice if sombody
> > can investigate and fix or at least find work around. IDE is slow in
> > compare to VIRTIO.
> >
> > On Thu, Apr 18, 2013 at 2:31 PM, Felix Frank <ff@mpexnet.de
> > <mailto:ff@mpexnet.de>> wrote:
> > > On 04/18/2013 12:20 PM, Stanislav German-Evtushenko wrote:
> > >>> Note that your kernel (and hence kvm/virtio) can be considered
> > rather old by now.
> > >> This is a stable RHEL 6 kernel at the moment.
> > >
> > > Exactly ;-)
> > >
> > > Same for Debian 6, which I no longer consider fit for KVM setups
> > > (without backports and such).
> >
> >
> > I have replaced all hard-drives on the first server and upgraded DRBD
> kernel
> > modules to 8.3.15. I do verifying every week. It usually founds new
> > out-of-sync sectors, then I check if they are false-positive or not (with
> > md5sum) and find that 95% of them are real.
> > Could anybody suggest a way to debug? Can it be DRBD + RAID problem? Or
> DRBD
> > + one specific RAID problem?
>
> Have you figured out on which one of the servers the data is correct? And
> is
> it always the same server? This assumes a primary/secondary setup.
> If you know on which server the data is correct then you know - IF it's a
> hardware problem - which server is at fault. If it's a software problem,
> then you still can't tell.
>
> Do you run a weekly/monthly RAID verification job? On both servers? Linux
> sw
> raid has this, and presumably hw raid has this option as well.
> This would pick up (most) RAID / disk issues.
> Silent disk corruption on RAID arrays can occur and disk verification would
> be the only way to tell (well, apart from using a filesystem like ZFS).
>
> Good luck,
>
> Bram.
>
>
> - --
> Bram Matthys
> Software developer/IT consultant syzop@vulnscan.org
> Website: www.vulnscan.org
> PGP key: www.vulnscan.org/pubkey.asc
> PGP fp: EBCA 8977 FCA6 0AB0 6EDB 04A7 6E67 6D45 7FE1 99A6
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.17 (MingW32)
>
> iF4EAREIAAYFAlLmToAACgkQbmdtRX/hmabbewD9HEaFbFw1j91AgDiAbgWcDari
> qZ/fYOYBw/qyMMempbMA/iCKM5Y2Oa3XAUApPWc05cTZ+W9FyOGdOmNgIl4FMGE0
> =z7Jn
> -----END PGP SIGNATURE-----
> _______________________________________________
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>


> Have you figured out on which one of the servers the data is correct?
> And is it always the same server?
It depends on which server is writing: on the server that does the writing,
the data is always correct.
The servers are identical and their firmware is up to date.

> Do you run a weekly/monthly RAID verification job? On both servers?
That is a nice point to try. I had been thinking I'd already tried everything.

> This would pick up (most) RAID / disk issues.
This is very unlikely; however, I'll run a RAID verification job on both
nodes and come back with the results.

Stanislav
Re: BUG: Uncatchable DRBD out-of-sync issue [ In reply to ]
Are you using iSCSI to access your volumes? It might be worth activating iSCSI digests on both sides and seeing how it behaves then, wouldn't it? You'd probably lose some performance, but I guess it would also help you identify the root cause of your problem…
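
(If iSCSI were in the IO path, the digests could be enabled on the initiator
side roughly like this with open-iscsi - the target and portal names are
placeholders:)
----------------------------------------------------------
# enable CRC32C header and data digests for one node record
iscsiadm -m node -T iqn.2014-01.example:target0 -p 192.168.0.10 \
    -o update -n node.conn[0].iscsi.HeaderDigest -v CRC32C
iscsiadm -m node -T iqn.2014-01.example:target0 -p 192.168.0.10 \
    -o update -n node.conn[0].iscsi.DataDigest -v CRC32C
# log the session out and back in for the change to take effect
----------------------------------------------------------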



Regards,



Pascal.



From: drbd-user-bounces@lists.linbit.com [mailto:drbd-user-bounces@lists.linbit.com] On behalf of Stanislav German-Evtushenko
Sent: Monday, 27 January 2014 13:51
To: Bram Matthys
Cc: drbd-user
Subject: Re: [DRBD-user] BUG: Uncatchable DRBD out-of-sync issue





Re: BUG: Uncatchable DRBD out-of-sync issue [ In reply to ]
On Mon, Jan 27, 2014 at 6:03 PM, Pascal BERTON <pascal.berton3@free.fr>wrote:

> Are you using iSCSI to access your volumes ? Might worth it activating
> iSCSI digests on both sides and see how it behaves then, wouldn’t it ?
> You’d probably lose some perfs but it would probably too help you identify
> the root cause of your problems I guess…
>
>
>
> Regards,
>
>
>
> Pascal.
>
I don't use iSCSI, I use LVM on top of DRBD.

Stanislav
Re: BUG: Uncatchable DRBD out-of-sync issue [ In reply to ]
>
>
> Have you figured out on which one of the servers the data is correct? And
> is
> it always the same server? This assumes a primary/secondary setup.
> If you know on which server the data is correct then you know - IF it's a
> hardware problem - which server is at fault. If it's a software problem,
> then you still can't tell.
>
> Do you run a weekly/monthly RAID verification job? On both servers? Linux
> sw
> raid has this, and presumably hw raid has this option as well.
> This would pick up (most) RAID / disk issues.
> Silent disk corruption on RAID arrays can occur and disk verification would
> be the only way to tell (well, apart from using a filesystem like ZFS).
>
> Good luck,
>
> Bram.


I've done a RAID consistency check - both nodes are consistent.

Stanislav
Re: BUG: Uncatchable DRBD out-of-sync issue [ In reply to ]
Just to make things clearer: these results are not false positives, they are
real. False positives also happen, but rarely. I check for false positives
using the following script:
----------------------------------------------------------
#!/bin/bash

# Usage: cat /var/log/kern.log | drbd_out_of_sync_compare.sh

#echo 'Mar 31 10:24:04 virt1 kernel: block drbd0: Out of sync: start=1036171232, size=8 (sectors)'
while read line; do
    if [[ $line =~ Out\ of\ sync:\ start=([0-9]+),\ size=([0-9]+) ]]; then
        start=${BASH_REMATCH[1]}
        size=${BASH_REMATCH[2]}
        echo $start - $size
        sum1=$(ssh 10.10.10.1 dd iflag=direct if=/dev/drbd0 bs=512 skip=$start count=$size 2>/dev/null < /dev/null | md5sum | awk '{print $1}')
        sum2=$(ssh 10.10.10.2 dd iflag=direct if=/dev/drbd0 bs=512 skip=$start count=$size 2>/dev/null < /dev/null | md5sum | awk '{print $1}')
        if [[ $sum1 = $sum2 ]]; then
            echo OK: $sum1 - $sum2
        else
            echo ERR: $sum1 - $sum2
            ssh 10.10.10.1 dd iflag=direct if=/dev/drbd0 bs=512 skip=$start count=$size 2>/dev/null < /dev/null > /tmp/${start}_${size}_1
            ssh 10.10.10.2 dd iflag=direct if=/dev/drbd0 bs=512 skip=$start count=$size 2>/dev/null < /dev/null > /tmp/${start}_${size}_2
        fi
    fi
done
----------------------------------------------------------

And the results look like:
----------------------------------------------------------
253182888 - 16
OK: 0829f71740aab1ab98b33eae21dee122 - 0829f71740aab1ab98b33eae21dee122
253182904 - 8
OK: 620f0b67a91f7f74151bc5be745b7110 - 620f0b67a91f7f74151bc5be745b7110
253182952 - 8
OK: 620f0b67a91f7f74151bc5be745b7110 - 620f0b67a91f7f74151bc5be745b7110
253250344 - 8
OK: 620f0b67a91f7f74151bc5be745b7110 - 620f0b67a91f7f74151bc5be745b7110
253259336 - 8
OK: 620f0b67a91f7f74151bc5be745b7110 - 620f0b67a91f7f74151bc5be745b7110
719214256 - 8
OK: 0132ffdc961a93ab39f3687b2168b326 - 0132ffdc961a93ab39f3687b2168b326
719214264 - 8
OK: e824f6f1a60c23fea04cfb5d080747c2 - e824f6f1a60c23fea04cfb5d080747c2
719299576 - 8
OK: a969c6562450baa0c5306fe89fe6d4f9 - a969c6562450baa0c5306fe89fe6d4f9
1085832880 - 8
OK: 9da8849288dcaa863b96d6cf5d9fee09 - 9da8849288dcaa863b96d6cf5d9fee09
1085972048 - 8
ERR: 708d5019b36d8bc6ef68fbdf431efbb3 - bffe661e808e1b42a4c5e1cad490ec0c
1085972072 - 8
ERR: a381fea0de0a34d01db0e4d7a9f9e824 - d1d6d30932ba15611cfac831e337e634
1086079632 - 8
ERR: 75e2d49f51a691998d1e9023b252aa51 - d367cfbd482fde9827ccef063b4b55a9
1086079528 - 8
ERR: 08637fb9b63c59db91c8179a22c9e4f7 - e421dc61e71d95bc63d9ab3fd834aa3e
1086079592 - 8
ERR: 483c379d346769f711721b4df154415b - 132f45d198b603e3f33cee750a21602d
1086079440 - 16
ERR: a8cb7da5e9da13d910b5afdbdb2721d9 - 717acb3d5b7403690f4f33340855a14c
1086128384 - 8
ERR: 31a1740dce4b305eb7a888a35de48ac8 - 53bc81a20e9ab5bddba35d742d3a7551
----------------------------------------------------------

Most of the time (99%) I see ERR for the swap space of virtual machines.
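
(A sketch of how such an "Out of sync: start=..." sector can be traced back
to the owning LV when /dev/drbd0 is the LVM PV - the numbers below are
examples, not taken from the output above:)
----------------------------------------------------------
# one reported start sector on /dev/drbd0
SECTOR=1085972048

# data-area offset (pe_start) and extent size of the PV, in 512-byte sectors
pvs --units s -o pv_name,pe_start,vg_extent_size /dev/drbd0

# e.g. with pe_start of 2048 sectors and 4MiB (8192-sector) extents:
echo $(( (SECTOR - 2048) / 8192 ))      # physical extent number

# see which LV owns that physical extent range
pvdisplay --maps /dev/drbd0
----------------------------------------------------------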

Best regards,
Stanislav
Re: BUG: Uncatchable DRBD out-of-sync issue [ In reply to ]
On Thu, Jan 30, 2014 at 11:26:43AM +0400, Stanislav German-Evtushenko wrote:
> Just to make things clearer. These results are not false-positive, they are
> real. False-positive also happen but rarely.

Since you re-opened this after about one year,
allow me to paste my answer from back then as well.

.----------
| On Mon, Mar 25, 2013 at 12:20:20PM +0400, Stanislav German-Evtushenko
| wrote:
| > Futher investigations...
| >
| > First vefification went well but then strange things started to
| > happen.
| > Full logs are here: http://pastebin.com/ntbQcaNz
|
| ... "Digest mismatch, buffer modified by upper layers during write" ...
|
| You may want to read this (and following; or even the whole thread):
| http://www.gossamer-threads.com/lists/drbd/users/21069#21069
|
| as well as the links mentioned there
| | The Problem:
| | http://lwn.net/Articles/429305/
| | http://thread.gmane.org/gmane.linux.kernel/1103571
| | http://thread.gmane.org/gmane.linux.scsi/59259
|
| So you *possibly* have ongoing data corruption
| caused by hardware, or layers above DRBD.
|
| Or you may just have "normal behaviour",
| and if DRBD was not that paranoid, you'd not even notice, ever.
`---------------

[...]

> Most of the time (99%) I see ERR for the swap space of virtual machines.

If you enable "integrity-alg", do you still see those "buffer modified
by upper layers during write"?

Well, then that is your problem,
and that problem can *NOT* be fixed with DRBD "config tuning".

What does that mean?

Upper layer submits write to DRBD.
DRBD calculates checksum over data buffer.
DRBD sends that checksum.
DRBD submits data buffer to "local" backend block device.
Meanwhile, upper layer changes data buffer.
DRBD sends data buffer to peer.
DRBD receives local completion.
DRBD receives remote ACK.
DRBD completes this write to upper layer.
*only now* would the upper layer be "allowed"
to change that data buffer again.

Misbehaving upper layer results in potentially divergent blocks
on the DRBD peers. Or would result in potentially divergent blocks on
a local software RAID 1. Which is why the mdadm maintenance script
in rhel, "raid-check", intended to be run periodically from cron,
has this tell-tale chunk:
    mismatch_cnt=`cat /sys/block/$dev/md/mismatch_cnt`
    # Due to the fact that raid1/10 writes in the kernel are unbuffered,
    # a raid1 array can have non-0 mismatch counts even when the
    # array is healthy. These non-0 counts will only exist in
    # transient data areas where they don't pose a problem. However,
    # since we can't tell the difference between a non-0 count that
    # is just in transient data or a non-0 count that signifies a
    # real problem, simply don't check the mismatch_cnt on raid1
    # devices as it's providing far too many false positives. But by
    # leaving the raid1 device in the check list and performing the
    # check, we still catch and correct any bad sectors there might
    # be in the device.
    raid_lvl=`cat /sys/block/$dev/md/level`
    if [ "$raid_lvl" = "raid1" -o "$raid_lvl" = "raid10" ]; then
        continue
    fi

Anyways.
Point being: Either have those upper layers stop modifying buffers
while they are in-flight (keyword: "stable pages").
Kernel upgrade within the VMs may do it. Changing something in the
"virtual IO path configuration" may do it. Or not.

Or live with the results, which are
potentially not identical blocks on the DRBD peers.
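
(For reference, the "integrity-alg" mentioned above is the data-integrity-alg
net option; a minimal configuration sketch for DRBD 8.3, assuming a resource
named r0:)
----------------------------------------------------------
# /etc/drbd.d/r0.res (fragment)
resource r0 {
    net {
        # checksum every replicated write; a mismatch is logged as
        # "Digest mismatch, buffer modified by upper layers during write"
        data-integrity-alg md5;
    }
    syncer {
        # needed for "drbdadm verify" (in 8.3 this lives in the syncer section)
        verify-alg md5;
    }
    ...
}
----------------------------------------------------------
(And on kernels new enough to expose it - roughly 3.9 and later, so not the
2.6.32 host in this thread - whether a device asks writers for stable pages
can be checked with "cat /sys/block/drbd0/bdi/stable_pages_required".)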


--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed
Re: BUG: Uncatchable DRBD out-of-sync issue [ In reply to ]

Hello Lars,

Thank you for the detailed explanation. I've done some more tests and found
that "out of sync" sectors appear for master-slave setups as well, not only
for master-master.

Can you share your thoughts on what could cause upper-layer changes in the
following scheme (LVM snapshots are not used)?
KVM (usually virtio) -> LVM -> DRBD -> RAID10 -> physical drives

Can LVM cause these OOS? Could it help if we switched to the following
scheme instead (again without LVM snapshots)?
KVM (usually virtio) -> DRBD -> LVM -> RAID10 -> physical drives

Stanislav
Re: BUG: Uncatchable DRBD out-of-sync issue [ In reply to ]
On Mon, Feb 24, 2014 at 01:28:58PM +0400, Stanislav German-Evtushenko wrote:
>
> Hello Lars,
>
> Thank you for the detailed explanation. I've done some more tests and found
> that "out of sync" sectors appear for master-slave also, not only for
> master-master.
>
> Can you share your thoughts about what can cause upper layer changes in the
> following schema?
> KVM (usually virtio) -> LVM -> DRBD -> RAID10 -> Physical drives, while LVM
> snapshots are not used.

The virtual machine itself is most likely doing "it".

> Can LVM cause these OOS?

Very unlikely.

> Could it help if we replace by the following schema?
> KVM (usually virtio) -> DRBD -> LVM -> RAID10 -> Physical drives,
> while LVM snapshots are not used.


--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed
Re: BUG: Uncatchable DRBD out-of-sync issue [ In reply to ]
>> Can LVM cause these OOS?
> Very unlikely.

I have another idea here. I'll try to switch the KVM drive options from
cache=none to cache=directsync or cache=writethrough. In that case KVM has
to ensure that data is on disk. I suppose this means (with disk-flushes and
disk-barrier disabled) that KVM will make sure the data has been written to
the DRBD layer, and DRBD will make sure it is in the RAID cache.
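
(For what it's worth, that is the qemu/KVM drive cache option; a sketch of
what it looks like - the LV path, VM id and storage name are placeholders:)
----------------------------------------------------------
# raw qemu-kvm invocation (fragment)
-drive file=/dev/vg0/vm-101-disk-1,if=virtio,format=raw,cache=directsync

# or, on Proxmox VE, roughly:
qm set 101 --virtio0 <storage>:vm-101-disk-1,cache=directsync
----------------------------------------------------------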

Will come back with results in 2-3 weeks.

Stanislav
Re: BUG: Uncatchable DRBD out-of-sync issue [ In reply to ]
Hello Lars,

> Upper layer submits write to DRBD.
> DRBD calculates checksum over data buffer.
> DRBD sends that checksum.
> DRBD submits data buffer to "local" backend block device.
> Meanwhile, upper layer changes data buffer.
> DRBD sends data buffer to peer.
> DRBD receives local completion.
> DRBD receives remote ACK.
> DRBD completes this write to upper layer.
> *only now* would the upper layer be "allowed"
> to change that data buffer again.

I think you were right and the upper layer misbehaves. I've turned write
caching off for the Linux KVM guests, and the last check found only one OOS
(it was probably caused before I turned caching off, so I'll wait one more
week). Thank you for pointing me in the right direction.

So far I see the following ways to avoid OOS:
1. Disabling write caching.
2. Using barriers in the guest OSes - they are enabled by default for ext4
and can be enabled for ext3 (see the fstab sketch below), but:
- they can't be enabled for swap;
- I am not sure what to do with Windows guests (NTFS supposedly supports
barriers, but I've seen OOS on Windows partitions several times; maybe I
need to disable write caching inside Windows).

The first way can cause slowdowns. The second way is too difficult,
especially when you can't control the guest OSes.
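
(The ext3 case from point 2 above is just a mount option; a sketch of what
the guest's /etc/fstab line would look like - the device name is a
placeholder:)
----------------------------------------------------------
# enable write barriers on an ext3 filesystem inside the guest
/dev/vda1   /   ext3   defaults,barrier=1   0   1
----------------------------------------------------------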

After all, I wonder why DRBD can't copy the buffer before writing and then
submit/send the copy rather than the original (which can be changed at any
time)?

Best regards,
Stanislav
Re: BUG: Uncatchable DRBD out-of-sync issue [ In reply to ]
Hello Lars,

Usually I need to wait a week for a new out-of-sync block, so the
investigation is going slowly.
Could you suggest a reliable way to simulate "Digest mismatch, *buffer
modified* by upper layers during write"? That would help a lot.
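
(Not a guaranteed reproducer, but a common way to try to provoke it is to
push a virtio guest hard into swap while watching the host logs - a sketch;
the "stress" tool and the sizes are assumptions:)
----------------------------------------------------------
# inside the guest: allocate more anonymous memory than the guest has RAM,
# so it swaps heavily through the virtio disk
stress --vm 4 --vm-bytes 512M --timeout 600

# on both DRBD nodes: watch for the tell-tale messages
grep -E 'Digest mismatch|Out of sync' /var/log/kern.log
----------------------------------------------------------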

Best regards,
Stanislav