Mailing List Archive

Digest integrity check FAILED (was: ASSERT FAILED: drbd_al_read_log)
Well, I've updated both nodes to kernel 2.6.35.11 and drbd 8.3.10, running CentOS 5.5.

The good news:
I did a create-md and a full resync on the secondary node.
No more ASSERTs logged. :-)

The bad news:
The integrity check still fails. Here are just the logs for today:
Feb 15 02:49:35 drbd0: Digest integrity check FAILED: 713210072s +4096
Feb 15 06:16:33 drbd0: Digest integrity check FAILED: 713142808s +4096
Feb 15 07:10:39 drbd0: Digest integrity check FAILED: 713049088s +4096
Feb 15 08:47:22 drbd0: Digest integrity check FAILED: 713119656s +4096
Feb 15 09:15:24 drbd0: Digest integrity check FAILED: 713215448s +4096
Feb 15 10:11:01 drbd0: Digest integrity check FAILED: 713232072s +4096
Feb 15 11:12:44 drbd0: Digest integrity check FAILED: 713239944s +4096
Feb 15 11:30:40 drbd0: Digest integrity check FAILED: 713106328s +4096
Feb 15 11:36:22 drbd0: Digest integrity check FAILED: 713151800s +4096
Feb 15 11:40:22 drbd0: Digest integrity check FAILED: 713166384s +4096
Feb 15 13:55:41 drbd0: Digest integrity check FAILED: 713138680s +4096
Feb 15 15:11:14 drbd0: Digest integrity check FAILED: 713189472s +4096

Searching the list shows these possible reasons:
http://lists.linbit.com/pipermail/drbd-user/2008-January/008343.html

- bit flip (in either sha1 or data) on the way from main memory to NIC
(which would go undetected by tcp checksum when you have offloading
enabled)
- bit flip on the way from NIC to main memory (the same)
- any form of corruption due to a race condition or bug
in NIC firmware or driver
- bit flip/random corruption by some reassembling network component
along the way
(not in your case, as I understand you use a direct passive link)
- the application (when using direct-io),
respectively the file system, re-using (modifying) the write buffer
while it is in flight, without waiting for the write to complete first
(unlikely, but we start to believe that we may have evidence
this does indeed happen under certain circumstances)
- bug in drbd miscalculating stuff
(would show up more often)

Now, I can probably rule out any NIC problems after transferring a
couple of TiB using nc (over plain TCP, just like DRBD) in both directions.
Every sha1 checksum of the 100 GiB test file was OK...
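
For reference, the test was essentially this (a rough sketch from memory;
the port number and file names below are placeholders, and netcat option
syntax differs between variants):

# on the receiving node (dedicated drbd link, 192.168.10.2):
nc -l -p 5001 > /data/test.recv
sha1sum /data/test.recv

# on the sending node:
sha1sum /data/test.100g
nc 192.168.10.2 5001 < /data/test.100g

# then the same again in the opposite direction; all sha1 sums matched.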

The NICs are directly connected by a crossover-cable, so no switch involved.

This leaves just the last two possibilities, right?
How can I test or debug them further?

If you need any information regarding my setup, please let me know.

Regards,
Walter
Re: Digest integrity check FAILED (was: ASSERT FAILED: drbd_al_read_log)
Btw, I've logged the output of:
drbdsetup /dev/drbd0 events -a -u

Somebody interested in it?

Walter
Re: Digest integrity check FAILED (was: ASSERT FAILED: drbd_al_read_log)
On Wed, Feb 16, 2011 at 08:24:37AM +0100, Walter Haidinger wrote:
> Btw, I've logged the output of:
> drbdsetup /dev/drbd0 events -a -u
>
> Somebody interested in it?

kernel logs, config, and meta data dump are more interesting.

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed
Re: Digest integrity check FAILED (was: ASSERT FAILED: drbd_al_read_log)
I've now replaced the onboard NICs used for the drbd link with PCIe models. The integrity checks still fail every couple of hours.
This is hardly surprising, though, because I was unable to reproduce any transmission errors other than with drbd.

Is it therefore safe to rule out the network hardware?

> kernel logs, config, and meta data dump are more interesting.

All right. Please tell me if anything else would be interesting too.
Any hints on how to diagnose this problem are highly appreciated!

Please note that the system is otherwise stable, no problems except the
failed integrity checks of drbd.

/proc/drbd:
version: 8.3.10 (api:88/proto:86-96)
GIT-hash: 5c0b0469666682443d4785d90a2c603378f9017b build by @build.k9, 2011-02-25 09:08:11
0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
ns:0 nr:9220020 dw:9220016 dr:3065364 al:0 bm:881 lo:1 pe:0 ua:1 ap:0 ep:1 wo:b oos:0
resync: used:0/61 hits:51 misses:11 starving:0 dirty:0 changed:11
act_log: used:0/3389 hits:0 misses:0 starving:0 dirty:0 changed:0

drbdadm dump:
# /etc/drbd.conf
global { minor-count 16; }

common {
net {
data-integrity-alg md5;
sndbuf-size 1M;
rcvbuf-size 1M;
}
syncer {
rate 100M;
c-plan-ahead 30;
c-fill-target 4k;
c-max-rate 120M;
c-min-rate 1024;
verify-alg sha1;
csums-alg sha1;
}
}

# resource md3 on prod1b.k9: not ignored, not stacked
resource md3 {
protocol C;
on prod1a.k9 {
device /dev/drbd0 minor 0;
disk /dev/md3;
address ipv4 192.168.10.1:7788;
flexible-meta-disk /dev/sys/drbd_meta0;
}
on prod1b.k9 {
device /dev/drbd0 minor 0;
disk /dev/md3;
address ipv4 192.168.10.2:7788;
flexible-meta-disk /dev/sys/drbd_meta0;
}
net {
timeout 100;
connect-int 10;
ping-int 10;
ping-timeout 5;
max-buffers 4096;
unplug-watermark 2048;
max-epoch-size 4096;
ko-count 5;
cram-hmac-alg sha256;
shared-secret secret;
after-sb-0pri discard-younger-primary;
after-sb-1pri consensus;
after-sb-2pri disconnect;
rr-conflict disconnect;
}
disk {
on-io-error detach;
fencing dont-care;
}
syncer {
al-extents 3389;
}
startup {
wfc-timeout 120;
degr-wfc-timeout 30;
}
handlers {
pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
fence-peer /usr/sbin/drbd-peer-outdater;
}

kernel dmesg output of an error:
drbd0: Digest integrity check FAILED: 182846680s +4096
drbd0: error receiving Data, l: 4136!
drbd0: peer( Primary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )
drbd0: asender terminated
drbd0: Terminating asender thread
drbd0: Connection closed
drbd0: conn( ProtocolError -> Unconnected )
drbd0: receiver terminated
drbd0: Restarting receiver thread
drbd0: receiver (re)started
drbd0: conn( Unconnected -> WFConnection )
drbd0: Handshake successful: Agreed network protocol version 96
drbd0: Peer authenticated using 32 bytes of 'sha256' HMAC
drbd0: conn( WFConnection -> WFReportParams )
drbd0: Starting asender thread (from drbd0_receiver [5679])
drbd0: data-integrity-alg: md5
drbd0: max BIO size = 130560
drbd0: drbd_sync_handshake:
drbd0: self 232D95BBCD88356C:0000000000000000:9FD19CF528E7A53A:9FD09CF528E7A53B bits:0 flags:0
drbd0: peer 5C46D84FC9C15C7D:232D95BBCD88356D:9FD19CF528E7A53B:9FD09CF528E7A53B bits:1 flags:0
drbd0: uuid_compare()=-1 by rule 50
drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) disk( UpToDate -> Outdated ) pdsk( DUnknown -> UpToDate )
drbd0: conn( WFBitMapT -> WFSyncUUID )
drbd0: updated sync uuid 232E95BBCD88356C:0000000000000000:9FD19CF528E7A53A:9FD09CF528E7A53B
drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
drbd0: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
drbd0: Began resync as SyncTarget (will sync 736 KB [184 bits set]).
drbd0: Resync done (total 1 sec; paused 0 sec; 736 K/sec)
drbd0: 0 % had equal check sums, eliminated: 0K; transferred 736K total 736K
drbd0: updated UUIDs 5C46D84FC9C15C7C:0000000000000000:232E95BBCD88356C:232D95BBCD88356D
drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
drbd0: helper command: /sbin/drbdadm after-resync-target minor-0
drbd0: helper command: /sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0)
drbd0: bitmap WRITE of 5924 pages took 13 jiffies
drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.

# drbdadm dump-md md3
# DRBD meta data dump
# 2011-02-25 11:58:33 +0100 [1298631513]
# prod1b.k9> drbdmeta 0 v08 /dev/sys/drbd_meta0 flex-external dump-md
#

version "v08";

# md_size_sect 139264
# md_offset 0
# al_offset 4096
# bm_offset 36864

uuid {
0x5C46D84FC9C15C7C; 0x0000000000000000; 0x232E95BBCD88356C; 0x232D95BBCD88356D;
flags 0x00000011;
}
# al-extents 3389;
la-size-sect 1555043584;
bm-byte-per-bit 4096;
device-uuid 0x95962A5B877A5C33;
# bm-bytes 24297560;
bm {
# at 0kB
3037248 times 0x0000000000000000;
}
# bits-set 0;

Last but not least the system configuration:
Two nodes, identical hardware, running as a simple active/passive heartbeat v1 cluster (no CRM).
OS: CentOS 5.5 x86_64 with vanilla 2.6.35.11 kernel and drbd 8.3.10.
HW: Asus M3A-H mainboard, Phenom X4 965, 8G DDR2-800 ECC (EDAC enabled).
NICs (all Gigabit): Onboard Atheros L1, PCIe Intel 82572EI, PCIe Intel 82574L (used as dedicated drbd link, directly connected, no switch)
Storage: drbd on top of 3-way raid-1 (Linux md software-raid of SATA drives), LVM on top of drbd, all filesystems ext3.

Again, if anything else is interesting (lsmod, lspci?), just tell me.

Regards,
Walter
Re: Digest integrity check FAILED (was: ASSERT FAILED: drbd_al_read_log)
On Fri, Feb 25, 2011 at 12:12:15PM +0100, Walter Haidinger wrote:
> I've now replaced the onboard NICs used for the drbd link with PCIe models. The integrity checks still fail every couple of hours.
> This is hardly surprising, though, because I was unable to reproduce any transmission errors other than with drbd.
>
> Is it therefore safe to rule out the network hardware?
>
> > kernel logs, config, and meta data dump are more interesting.
>
> All right. Please tell me if anything else would be interesting too.
> Any hints on how to diagnose this problem are highly appreciated!
>
> Please note that the system is otherwise stable, no problems except the
> failed integrity checks of drbd.

So you no longer have any problems/ASSERTs regarding drbd_al_read_log?

> /proc/drbd:
> version: 8.3.10 (api:88/proto:86-96)
> GIT-hash: 5c0b0469666682443d4785d90a2c603378f9017b build by @build.k9, 2011-02-25 09:08:11
> 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
> ns:0 nr:9220020 dw:9220016 dr:3065364 al:0 bm:881 lo:1 pe:0 ua:1 ap:0 ep:1 wo:b oos:0
> resync: used:0/61 hits:51 misses:11 starving:0 dirty:0 changed:11
> act_log: used:0/3389 hits:0 misses:0 starving:0 dirty:0 changed:0
>
> drbdadm dump:
> # /etc/drbd.conf
> global { minor-count 16; }
>
> common {
> net {
> data-integrity-alg md5;
> sndbuf-size 1M;
> rcvbuf-size 1M;
> }
> syncer {
> rate 100M;
> c-plan-ahead 30;
> c-fill-target 4k;
> c-max-rate 120M;
> c-min-rate 1024;
> verify-alg sha1;
> csums-alg sha1;
> }
> }
>
> # resource md3 on prod1b.k9: not ignored, not stacked
> resource md3 {
> protocol C;
> on prod1a.k9 {
> device /dev/drbd0 minor 0;
> disk /dev/md3;
> address ipv4 192.168.10.1:7788;
> flexible-meta-disk /dev/sys/drbd_meta0;
> }
> on prod1b.k9 {
> device /dev/drbd0 minor 0;
> disk /dev/md3;
> address ipv4 192.168.10.2:7788;
> flexible-meta-disk /dev/sys/drbd_meta0;
> }
> net {
> timeout 100;
> connect-int 10;
> ping-int 10;
> ping-timeout 5;
> max-buffers 4096;
> unplug-watermark 2048;
> max-epoch-size 4096;
> ko-count 5;
> cram-hmac-alg sha256;
> shared-secret secret;
> after-sb-0pri discard-younger-primary;
> after-sb-1pri consensus;
> after-sb-2pri disconnect;
> rr-conflict disconnect;
> }
> disk {
> on-io-error detach;
> fencing dont-care;
> }
> syncer {
> al-extents 3389;
> }
> startup {
> wfc-timeout 120;
> degr-wfc-timeout 30;
> }
> handlers {
> pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
> pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
> local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
> fence-peer /usr/sbin/drbd-peer-outdater;
> }
>
> kernel dmesg output of an error:
> drbd0: Digest integrity check FAILED: 182846680s +4096
> drbd0: error receiving Data, l: 4136!
> drbd0: peer( Primary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )

Well, what does the other (Primary) side say?

I'd expect it to say
"Digest mismatch, buffer modified by upper layers during write: ..."

If it does not, your link corrupts data.
If it does, well, then that's what happens.
(note: this double check on the sending side
has only been introduced with 8.3.10)

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed
Re: Digest integrity check FAILED
On 02/25/2011 12:33 PM, Lars Ellenberg wrote:
>> kernel dmesg output of an error:
>> > drbd0: Digest integrity check FAILED: 182846680s +4096
>> > drbd0: error receiving Data, l: 4136!
>> > drbd0: peer( Primary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )
> Well, what does the other (Primary) side say?
>
> I'd expect it to say
> "Digest mismatch, buffer modified by upper layers during write: ..."
>
> If it does not, your link corrupts data.
> If it does, well, then that's what happens.
> (note: this double check on the sending side
> has only been introduced with 8.3.10)

I'm curious whether the primary shows Connected/DUnknown ... after reconnect.

cheers,
raoul
--
____________________________________________________________________
DI (FH) Raoul Bhatia M.Sc. email. r.bhatia@ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OG web. http://www.ipax.at
Barawitzkagasse 10/2/2/11 email. office@ipax.at
1190 Wien tel. +43 1 3670030
FN 277995t HG Wien fax. +43 1 3670030 15
____________________________________________________________________
Re: Digest integrity check FAILED (was: ASSERT FAILED: drbd_al_read_log)
Hi Lars, thanks for the reply.

> So you no longer have any problems/ASSERTs regarding drbd_al_read_log?

No, those are gone. I did a create-md on the secondary node and a full resync. Don't know if that was "the fix", but I suppose so.
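
For the record, what I ran on the secondary was roughly the following
(a sketch from memory; resource name as in my config):

# on the secondary node, with the resource taken down:
drbdadm down md3
drbdadm create-md md3     # prompts before overwriting the existing meta data
drbdadm up md3
drbdadm invalidate md3    # marks local data inconsistent, forces a full resync from the primary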

> Well, what does the other (Primary) side say?
> I'd expect it to say
> "Digest mismatch, buffer modified by upper layers during write: ..."

Yes, it does (see the kernel logs below).

> If it does not, your link corrupts data.
> If it does, well, then that's what happens.
> (note: this double check on the sending side
> has only been introduced with 8.3.10)

Now where do I go from here?
Any way to tell who or what is responsible for the data corruption?

Trimmed kernel logs from today's (Feb 26th) corruption:
-- primary node --
14:42:11 Digest mismatch, buffer modified by upper layers during write: 713118272s +4096
14:42:11 sock was shut down by peer
14:42:11 peer( Secondary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
14:42:11 short read expecting header on sock: r=0
14:42:11 new current UUID B9F806E19286A6F7:7015E2DDD8881707:9682DC6D3721EFE1:9681DC6D3721EFE1
14:42:11 meta connection shut down by peer.
14:42:11 asender terminated
14:42:11 Terminating asender thread
14:42:11 Connection closed
14:42:11 conn( BrokenPipe -> Unconnected )
14:42:11 receiver terminated
14:42:11 Restarting receiver thread
14:42:11 receiver (re)started
14:42:11 conn( Unconnected -> WFConnection )
14:42:11 Handshake successful: Agreed network protocol version 96
14:42:11 Peer authenticated using 32 bytes of 'sha256' HMAC
14:42:11 conn( WFConnection -> WFReportParams )
14:42:11 Starting asender thread (from drbd0_receiver [11524])
14:42:11 data-integrity-alg: md5
14:42:11 max BIO size = 130560
14:42:12 drbd_sync_handshake:
14:42:12 self B9F806E19286A6F7:7015E2DDD8881707:9682DC6D3721EFE1:9681DC6D3721EFE1 bits:54 flags:0
14:42:12 peer 7015E2DDD8881706:0000000000000000:9682DC6D3721EFE0:9681DC6D3721EFE1 bits:0 flags:0
14:42:12 uuid_compare()=1 by rule 70
14:42:12 peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent )
14:42:12 helper command: /sbin/drbdadm before-resync-source minor-0
14:42:12 helper command: /sbin/drbdadm before-resync-source minor-0 exit code 0 (0x0)
14:42:12 conn( WFBitMapS -> SyncSource ) pdsk( Consistent -> Inconsistent )
14:42:12 Began resync as SyncSource (will sync 240 KB [60 bits set]).
14:42:12 updated sync UUID B9F806E19286A6F7:7016E2DDD8881707:7015E2DDD8881707:9682DC6D3721EFE1
14:42:12 Resync done (total 1 sec; paused 0 sec; 240 K/sec)
14:42:12 1 % had equal check sums, eliminated: 4K; transferred 236K total 240K
14:42:12 updated UUIDs B9F806E19286A6F7:0000000000000000:7016E2DDD8881707:7015E2DDD8881707
14:42:12 conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
14:42:12 bitmap WRITE of 5931 pages took 13 jiffies
14:42:13 0 KB (0 bits) marked out-of-sync by on disk bit-map.

-- secondary node --
14:42:11 Digest integrity check FAILED: 713118272s +4096
14:42:11 error receiving Data, l: 4136!
14:42:11 peer( Primary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )
14:42:11 asender terminated
14:42:11 Terminating asender thread
14:42:11 Connection closed
14:42:11 conn( ProtocolError -> Unconnected )
14:42:11 receiver terminated
14:42:11 Restarting receiver thread
14:42:11 receiver (re)started
14:42:11 conn( Unconnected -> WFConnection )
14:42:11 Handshake successful: Agreed network protocol version 96
14:42:11 Peer authenticated using 32 bytes of 'sha256' HMAC
14:42:11 conn( WFConnection -> WFReportParams )
14:42:11 Starting asender thread (from drbd0_receiver [9650])
14:42:11 data-integrity-alg: md5
14:42:11 max BIO size = 130560
14:42:11 drbd_sync_handshake:
14:42:11 self 7015E2DDD8881706:0000000000000000:9682DC6D3721EFE0:9681DC6D3721EFE1 bits:0 flags:0
14:42:11 peer B9F806E19286A6F7:7015E2DDD8881707:9682DC6D3721EFE1:9681DC6D3721EFE1 bits:54 flags:0
14:42:11 uuid_compare()=-1 by rule 50
14:42:11 peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) disk( UpToDate -> Outdated ) pdsk( DUnknown -> UpToDate )
14:42:12 conn( WFBitMapT -> WFSyncUUID )
14:42:12 updated sync uuid 7016E2DDD8881706:0000000000000000:9682DC6D3721EFE0:9681DC6D3721EFE1
14:42:12 helper command: /sbin/drbdadm before-resync-target minor-0
14:42:12 helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
14:42:12 conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
14:42:12 Began resync as SyncTarget (will sync 240 KB [60 bits set]).
14:42:12 Resync done (total 1 sec; paused 0 sec; 240 K/sec)
14:42:12 1 % had equal check sums, eliminated: 4K; transferred 236K total 240K
14:42:12 updated UUIDs B9F806E19286A6F6:0000000000000000:7016E2DDD8881706:7015E2DDD8881707
14:42:12 conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
14:42:12 helper command: /sbin/drbdadm after-resync-target minor-0
14:42:12 helper command: /sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0)
14:42:13 bitmap WRITE of 5931 pages took 15 jiffies
14:42:13 0 KB (0 bits) marked out-of-sync by on disk bit-map.



Re: Digest integrity check FAILED
> i'm curious if the primary shows Connected/DUnknown ... after reconnect.

No, it shows:
0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----

Walter
Re: Explained: Digest integrity check FAILED
On Sat, Feb 26, 2011 at 07:31:03PM +0100, Walter Haidinger wrote:
> Hi Lars, thanks for the reply.
>
> > So you no longer have any problems/ASSERTs regarding drbd_al_read_log?
>
> No, those are gone. I did a create-md on the secondary node and a full resync. Don't know if that was "the fix", but I suppose so.
>
> > Well, what does the other (Primary) side say?
> > I'd expect it to say
> > "Digest mismatch, buffer modified by upper layers during write: ..."
>
> Yes, it does (see the kernel logs below).
>
> > If it does not, your link corrupts data.
> > If it does, well, then that's what happens.
> > (note: this double check on the sending side
> > has only been introduced with 8.3.10)
>
> Now where do I go from here?
> Any way to tell who or what is responsible for the data corruption?

There is just "buffers modified during writeout".
That's not necessarily the same as data corruption.

Quoting the DRBD User's Guide:

Notes on data integrity

There are two independent methods in DRBD to ensure the integrity of the
mirrored data. The online-verify mechanism and the data-integrity-alg of
the network section.

Both mechanisms might deliver false positives if the user of DRBD
modifies the data which gets written to disk while the transfer goes on.
This may happen for swap, or for certain append while global sync, or
truncate/rewrite workloads, and not necessarily poses a problem for the
integrity of the data. Usually when the initiator of the data transfer
does this, it already knows that that data block will not be part of an
on disk data structure, or will be resubmitted with correct data soon
enough.

...


If you don't want to know about that, disable that check.
If the replication link interruptions caused by that check
are bad for your setup (particularly so in dual primary setups),
disable that check.
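
(Disabling it just means dropping the option from the net section and
re-adjusting the resource. A sketch, based on the config posted earlier
in this thread; the exact steps may vary:

net {
# data-integrity-alg md5;   <- remove or comment out to disable the check
sndbuf-size 1M;
rcvbuf-size 1M;
}

then run "drbdadm adjust md3" on both nodes; a disconnect/connect cycle
may be needed before the peers renegotiate without the digest.)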

If you want to use it anyways: that's great, do so, and live with it.

If you want to have DRBD do "end-to-end" data checksums, even if the
data buffers may be modified while being in flight, and still want it to
be efficient, sponsor feature development.

The Problem:
http://lwn.net/Articles/429305/
http://thread.gmane.org/gmane.linux.kernel/1103571
http://thread.gmane.org/gmane.linux.scsi/59259

And many, many more older threads on various mailing lists,
some of them misleading, some of them mixing
this issue of in-flight modifications
up with actual (hardware-caused) data corruption.


Possible Solutions:
- DRBD first copies all submitted data to some
private pages, then calculates the checksum.
As this is now a checksum over *private* pages, if it does not
match, that's always a sign of data corruption.
It also is a significant performance hit. Potentially, we could
optimistically try to get away without copying, and only take the
performance hit once we see a mismatch, in which case we'd need to
copy it still anyways, and send it again -- if we still have it.

- Linux generic write-out path is fixed to not allow
modifications of data during write-out.

- The Linux generic block integrity framework is fixed in whatever
way is deemed most useful, and DRBD switches to use that instead,
or simply forwards integrity information which may
already have been generated by some layer above DRBD.

The "generic write out path" people seem to be on it, this time.
Not sure if it will help much with VMs on top of DRBD, as they will run
older kernels or different operating systems doing things differently,
potentially screwing things up.

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
Re: Explained: Digest integrity check FAILED
Lars, first of all, thanks a _lot_ for this detailed explanation!
I guess this post will help many people in the future.

Could you add the warning about false positives to the Users Guide at
http://www.drbd.org/users-guide/s-integrity-check.html too?

> Quoting the DRBD User's Guide:
> Notes on data integrity
>
> This may happen for swap, or for certain append while global sync, or
> truncate/rewrite workloads, and not necessarily poses a problem for the
> integrity of the data. Usually when the initiator of the data transfer
> does this, it already knows that that data block will not be part of an
> on disk data structure, or will be resubmitted with correct data soon
> enough.

This is new in drbd.conf(5) of 8.3.10. Unfortunately I was only aware
of 8.3.9 which just mentions swap and ReiserFS. Having neither of those
two, I thought I was "free" of false positives...

> If you want to have DRBD do "end-to-end" data checksums, even if the
> data buffers may be modified while being in flight, and still want it to
> be efficient, sponsor feature development.

Fair enough! ;-)

> The Problem:
> http://lwn.net/Articles/429305/
> http://thread.gmane.org/gmane.linux.kernel/1103571
> http://thread.gmane.org/gmane.linux.scsi/59259
>
> And many many more older threads on various ML,
> some of them misleading, some of them mixing
> this issue of in-flight modifications
> with actual (hardware caused) data corruption.

This and the (outdated) notes of drbd.conf(5) probably got me
on the "wrong" track. It also explains why I was unable to reproduce
any data corruption.

> Possible Solutions:
[...]

One last question for clarification, though:
Given the above, even online verify isn't free of false positives, right?
Then some out-of-sync blocks are to be expected. If so, a warning in the
manpage and guide too would probably avoid some gray hair. ;-)

Thanks again and best regards,
Walter

Re: Explained: Digest integrity check FAILED
On Mon, Feb 28, 2011 at 11:26:04AM +0100, Walter Haidinger wrote:
> Lars, first of all, thanks a _lot_ for this detailed explanation!
> I guess this post will help many people in the future.
>
> Could you add the warning about false positives to the Users Guide at
> http://www.drbd.org/users-guide/s-integrity-check.html too?
>
> > Quoting the DRBD User's Guide:
> > Notes on data integrity
> >
> > This may happen for swap, or for certain append while global sync, or
> > truncate/rewrite workloads, and not necessarily poses a problem for the
> > integrity of the data. Usually when the initiator of the data transfer
> > does this, it already knows that that data block will not be part of an
> > on disk data structure, or will be resubmitted with correct data soon
> > enough.
>
> This is new in drbd.conf(5) of 8.3.10. Unfortunately I was only aware
> of 8.3.9 which just mentions swap and ReiserFS. Having neither of those
> two, I thought I was "free" of false positives...
>
> > If you want to have DRBD do "end-to-end" data checksums, even if the
> > data buffers may be modified while being in flight, and still want it to
> > be efficient, sponsor feature development.
>
> Fair enough! ;-)
>
> > The Problem:
> > http://lwn.net/Articles/429305/
> > http://thread.gmane.org/gmane.linux.kernel/1103571
> > http://thread.gmane.org/gmane.linux.scsi/59259
> >
> > And many many more older threads on various ML,
> > some of them misleading, some of them mixing
> > this issue of in-flight modifications
> > with actual (hardware caused) data corruption.
>
> This and the (outdated) notes of drbd.conf(5) probably got me
> on the "wrong" track. It also explains why I was unable to reproduce
> any data corruption.
>
> > Possible Solutions:
> [...]
>
> One last question for clarification, though:
> Given the above, even online verify isn't free of false positives, right?
> Then some out-of-sync blocks are to be expected. If so, a warning in the
> manpage and guide too would probably avoid some gray hair. ;-)

Uhm. Well.
Online verify does _not_ suffer from _this_ problem.
The pages used to read in and compare data are private to DRBD in this
case, and we won't modify them while we are calculating checksums.
Application IO is locked out while we are reading the data for checksums.

So if DRBD Online Verify finds out-of-sync blocks,
they have been out of sync at that point in time.

They may be unlucky enough to see blocks where data has been modified in
flight, and as a result local and remote disk contain differing blocks.

Such differences should be very short lived, though,
as modification means re-dirtying, and that means
the page will be resubmitted by upper layers "soon".

Unless these blocks belong to files that are unlinked before the
respective re-dirtied page is written out again (and thus the
re-dirtied page is simply discarded, before being re-submitted).

Conclusion:
out-of-sync blocks found by online verify, or software raid1
"resilvering", any similar procedures, do not necessarily mean
broken hardware or memory corruption.

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
Re: Explained: Digest integrity check FAILED
> > One last question for clarification, though:
> > Given the above, even online verify isn't free of false positives,
> right?
> > Then some out-of-sync blocks are to be expected. If so, a warning in the
> > manpage and guide too would probably avoid some gray hair. ;-)
>
> Uhm. Well.
> Online verify does _not_ suffer from _this_ problem.

OK. The problem is that the weekly verify usually finds several blocks out of sync (about 20-30 for a 700 GiB drbd device).
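
For context, the weekly job boils down to this (a sketch; the real cron
wrapper only adds some logging, and the exact kernel messages may differ
between versions):

# start an online verify (verify-alg sha1 is configured, as shown earlier in this thread)
drbdadm verify md3

# when it finishes, look for reported differences in the kernel log,
# e.g. "Out of sync: start=..., size=..." lines and the summary
grep -i "out of sync" /var/log/messages

# verify only reports differences; to actually resync them,
# disconnect and reconnect the resource:
drbdadm disconnect md3 && drbdadm connect md3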

> The pages used to read in and compare data are private to DRBD in this
> case, and we won't modify them while we are calculating checksums.
> Application IO is locked out while we are reading the data for checksums.
>
> So if DRBD Online Verify finds out-of-sync blocks,
> they have been out of sync at that point in time.
>
> They may be unlucky enough to see blocks where data has been modified in
> flight, and as a result local and remote disk contain differing blocks.

So, not necessarily _this_ problem but a similar one...

> Such differences should be very short lived, though,
> as modification means re-dirtying, and that means
> the page will be resubmitted by upper layers "soon".
>
> Unless these blocks belong to files that are unlinked before the
> respective re-dirtied page is written out again (and thus the
> re-dirtied page is simply discarded, before being re-submitted).

Hence swap is listed first in the man page.
Databases are probably also more likely to cause this, right?

> Conclusion:
> out-of-sync blocks found by online verify, or software raid1
> "resilvering", any similar procedures, do not necessarily mean
> broken hardware or memory corruption.

The conclusion from the user's perspective then is:
verify can only be expected to find _no_ out-of-sync blocks if everything
on top of drbd is idle (i.e. read-only or unmounted).

Lars, thanks again for clarifying this issue!
IMHO you should add this to the data integrity section of the documentation.

Best regards,
Walter

