Lars Ellenberg wrote:
> > Also as Tom said "What information should we supply, debug information, to
> > help you debug the problem", and how do we trap the data, the next time it
> > happens?
<SNIP>
> if "it" happens the next time, i.e.
> you think drbd should become secondary, but it refuses with
> "somebody has still opened me for write access" or something like
> that, and neither fuser nor lsof can tell you who.
I did a failover tonight so I could update the machine, and tried to capture
a little info for you.
Sorry, I missed catching if it was write or read access.
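For the next time it happens, here is a minimal sketch of a capture helper, assuming a POSIX shell. The function name snapshot_busy_device and the output file are made up for illustration; it only runs tools already mentioned in this thread (fuser, lsof, lsmod) plus reads of /proc/drbd and /proc/mounts, and it skips anything that is not installed or not readable.

```shell
# Hypothetical helper: snapshot_busy_device DEVICE OUTFILE
# Writes whatever the usual tools can see about DEVICE into OUTFILE.
snapshot_busy_device() {
    sbd_dev=$1
    sbd_out=$2
    {
        echo "=== date ==="
        date

        echo "=== /proc/drbd ==="
        if [ -r /proc/drbd ]; then cat /proc/drbd; else echo "(not readable)"; fi

        echo "=== /proc/mounts entries for $sbd_dev ==="
        grep -F -- "$sbd_dev " /proc/mounts 2>/dev/null || echo "(none)"

        echo "=== fuser -v -m $sbd_dev ==="
        if command -v fuser >/dev/null 2>&1; then
            fuser -v -m "$sbd_dev" 2>&1 || echo "(no users reported)"
        else
            echo "(fuser not installed)"
        fi

        echo "=== lsof $sbd_dev ==="
        if command -v lsof >/dev/null 2>&1; then
            lsof "$sbd_dev" 2>&1 || echo "(nothing reported)"
        else
            echo "(lsof not installed)"
        fi

        echo "=== lsmod ==="
        if command -v lsmod >/dev/null 2>&1; then
            lsmod 2>&1 || echo "(lsmod failed)"
        else
            echo "(lsmod not installed)"
        fi
    } >"$sbd_out" 2>&1
}

# Example (not run here): snapshot_busy_device /dev/nb1 /tmp/nb1-busy.txt
```

Running it right after the first failed umount, before killing anything, should give a before picture to compare against later.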
when I issued `service heartbeat stop` I got the following in the log:
all the expected services shutting down
Nov 18 17:35:19 foo xinetd: Reconfigured: new=0 old=4 dropped=0
Nov 18 17:35:23 foo kernel: lockd: couldn't shutdown host module!
Nov 18 17:35:23 foo kernel: nfsd: last server has exited
Nov 18 17:35:23 foo kernel: nfsd: unexporting all filesystems
Nov 18 17:35:23 foo nfs: nfsd shutdown succeeded
Nov 18 17:35:23 foo nfs: rpc.rquotad shutdown succeeded
Nov 18 17:35:23 foo nfs: Shutting down NFS services: succeeded
Nov 18 17:35:23 foo rpc.statd: Caught signal 15, un-registering and
Nov 18 17:35:23 foo nfslock: rpc.statd shutdown succeeded
Nov 18 17:35:26 foo datadisk: ===> datadisk devnb1 stop <===
Nov 18 17:35:26 foo datadisk: 'devnb1' /dev/nb1 is mounted on /devnb1,
trying to unmount
Nov 18 17:35:26 foo datadisk: umount -v /dev/nb1
Nov 18 17:35:26 foo datadisk: ERROR: umount -v /dev/nb1 :
Nov 18 17:35:26 foo datadisk: ERROR: umount: /devnb1: device is busy
Nov 18 17:35:26 foo datadisk: 'devnb1' trying to kill users of /dev/nb1
Nov 18 17:35:26 foo datadisk: fuser -k -m /dev/nb1
Nov 18 17:35:26 foo datadisk: ERROR: fuser -k -m /dev/nb1 :
Nov 18 17:35:26 foo datadisk: ERROR: NO OUTPUT
Nov 18 17:35:29 foo datadisk: umount -v /dev/nb1
... rinse and repeat the errors and commands.
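The retry pattern visible in the log (try umount, try to kill users, wait about 3 seconds, try again) can be sketched roughly as below. This is not the datadisk script's actual code, only an illustration of a bounded retry; the function name retry and its arguments are hypothetical.

```shell
# retry MAX DELAY CMD...
# Runs CMD until it succeeds or MAX attempts have been made,
# sleeping DELAY seconds between attempts.
# Returns 0 on success, 1 if every attempt failed.
retry() {
    retry_max=$1
    retry_delay=$2
    shift 2
    retry_attempt=1
    while :; do
        "$@" && return 0
        [ "$retry_attempt" -ge "$retry_max" ] && return 1
        retry_attempt=$((retry_attempt + 1))
        sleep "$retry_delay"
    done
}

# Example (not run here):
#   retry 5 3 umount -v /dev/nb1 || echo "still busy after 5 attempts"
```

Capping the number of attempts means a stuck device produces one clear failure instead of an endless loop in the log.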
fuser -a -v -k -m /dev/nb1
showed no processes.
> try to reduce the process list.
> have a look at it: something in there that somewhen in its lifetime
> might have accessed the device?
I killed (service ... stop) everything but syslog,
klog, login and all the [k*] (kernel???) processes.
> if yes: kill it, if possible.
> does drbd still refuse to become secondary?
Still, when I issued `umount /devnb1` it failed to unmount.
I ran lsmod, and `modprobe -r`ed anything that I knew I did not need to keep
the disks & keyboard running; this included the modules nfsd & lockd**.
still when I issued `umount /devnb1` it would fail to unmount, so I could
never push it to secondary.
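One thing fuser cannot report is another filesystem mounted somewhere below the mount point, which also makes umount say the device is busy. Nothing in the log says that was the case here, so the following is only a quick way to rule it out. The function name mounts_under is hypothetical, and it assumes the usual /proc/mounts layout with the mount point in the second column.

```shell
# mounts_under DIR [FILE]
# Prints every mount point that is DIR itself or lies below DIR.
# FILE defaults to /proc/mounts; the second field is the mount point.
mounts_under() {
    awk -v d="${1%/}" '$2 == d || index($2, d "/") == 1 { print $2 }' \
        "${2:-/proc/mounts}"
}

# Example (not run here): mounts_under /devnb1
# More than one line of output would mean something is mounted below /devnb1.
```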
I finally did a `umount -r /devnb1`,
then a `umount -l /devnb1`,
and issued `drbdsetup /dev/nb1 secondary`,
but it still failed to become secondary.
After `shutdown -h now` and power down, I made the other machine primary on
/dev/nb1 and did an e2fsck, but it said the device was clean (which was good,
I really did not want to wait the 2 hours for the fsck).
**I don't think that the nfsd & lockd modules should have been
loaded by that point because their services were shut down a long time
previously. The lockd message on heartbeat stop and the lockd module still in
the kernel were the only strange things I noticed.
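To record whether a module such as lockd is still referenced, its use count can be read from the third column of /proc/modules (the same number lsmod shows). A minimal sketch, with a hypothetical function name module_refcount:

```shell
# module_refcount NAME [FILE]
# Prints the use count of kernel module NAME.
# FILE defaults to /proc/modules (columns: name, size, use count, ...).
# Exits non-zero if the module is not listed.
module_refcount() {
    awk -v m="$1" '$1 == m { print $3; found = 1 } END { exit !found }' \
        "${2:-/proc/modules}"
}

# Example (not run here):
#   module_refcount lockd || echo "lockd is not loaded"
```

A non-zero count for lockd or nfsd after the NFS services have stopped would be worth including in a report.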
Crane Division, Naval Surface Warfare Center (NSWC Crane)
Harnessing the Power of Technology for the Warfighter
drbd-user mailing list