SLES11 SP3: warning: crm_find_peer: Node 'h01' and 'h01' share the same cluster nodeid: 739512321
Hi!

When a SLES11 SP3 node joined a 3-node cluster after a reboot (and a preceding update), a node with up-to-date software logged these messages (I feel they should not appear):

Jan 20 17:12:38 h10 corosync[13220]: [MAIN ] Completed service synchronization, ready to provide service.
Jan 20 17:12:38 h10 cib[13257]: warning: crm_find_peer: Node 'h01' and 'h01' share the same cluster nodeid: 739512321
Jan 20 17:12:38 h10 cib[13257]: warning: crm_find_peer: Node 'h01' and 'h01' share the same cluster nodeid: 739512321
Jan 20 17:12:38 h10 cib[13257]: warning: crm_find_peer: Node 'h01' and 'h01' share the same cluster nodeid: 739512321

### So why may the same node not have the same nodeid?

Jan 20 17:12:38 h10 attrd[13260]: warning: crm_dump_peer_hash: crm_find_peer: Node 84939948/h05 = 0x61ae90 - b6cabbb3-8332-4903-85be-0c06272755ac
Jan 20 17:12:38 h10 attrd[13260]: warning: crm_dump_peer_hash: crm_find_peer: Node 17831084/h01 = 0x61e300 - 11693f38-8125-45f2-b397-86136d5894a4
Jan 20 17:12:38 h10 attrd[13260]: warning: crm_dump_peer_hash: crm_find_peer: Node 739512330/h10 = 0x614400 - 302e33d8-7cee-4f3b-97da-b38f0d51b0f6

### above are the three nodes of the cluster

Jan 20 17:12:38 h10 attrd[13260]: crit: crm_find_peer: Node 739512321 and 17831084 share the same name 'h01'

### Now there are different nodeids it seems...

Jan 20 17:12:38 h10 attrd[13260]: warning: crm_find_peer: Node 'h01' and 'h01' share the same cluster nodeid: 739512321
Jan 20 17:12:38 h10 cib[13257]: warning: crm_find_peer: Node 'h01' and 'h01' share the same cluster nodeid: 739512321

### The same again...

(pacemaker-1.1.11-0.7.53, corosync-1.4.7-0.19.6)
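
Note on the nodeids above: this is a rough sketch and not part of the original post. It assumes the nodeids were auto-generated by corosync from each node's IPv4 ring address (the usual behaviour when no `nodeid` is set in corosync.conf) and that the cluster network is 172.20.16.x, which the thread never states. Under those assumptions the two ids logged for h01 look like one address in two encodings: 739512321 matches 172.20.16.1 in network byte order with the top bit cleared (as `clear_node_high_bit: yes` would do), while 17831084 matches 172.20.16.1 with its bytes swapped. The helper name `nodeid_readings` is made up for illustration.

```python
import ipaddress


def nodeid_readings(nodeid):
    """Return possible IPv4 readings of a 32-bit cluster nodeid.

    Hypothetical helper: it only reinterprets the integer, it does not
    query corosync or pacemaker.
    """
    raw = nodeid.to_bytes(4, "big")
    swapped = int.from_bytes(raw, "little")  # same four bytes, opposite order
    return {
        # nodeid taken as-is in network (big-endian) byte order
        "network_order": str(ipaddress.IPv4Address(nodeid)),
        # same, with the top bit restored (assumes it was cleared)
        "network_order_high_bit_set": str(ipaddress.IPv4Address(nodeid | 0x80000000)),
        # nodeid taken as a byte-swapped (little-endian) address
        "byte_swapped": str(ipaddress.IPv4Address(swapped)),
    }


if __name__ == "__main__":
    # The four nodeids that appear in the log excerpts of this thread.
    for nodeid in (739512321, 17831084, 739512330, 84939948):
        print(nodeid, nodeid_readings(nodeid))
```

If the assumptions hold, both ids for h01 decode to 172.20.16.1, which would mean h10 and h05 are seeing one node announced under an old and a new nodeid encoding, not two different machines.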

As a result, node h01 is offline now. Before the software update, the node was a member of the cluster.

On node h01 I see messages like these:
cib[7439]: notice: get_node_name: Could not obtain a node name for classic openais (with plugin) nodeid 84939948
cib[7439]: notice: crm_update_peer_state: plugin_handle_membership: Node (null)[84939948] - state is now member (was (null))
cib[7439]: notice: get_node_name: Could not obtain a node name for classic openais (with plugin) nodeid 739512330
cib[7439]: notice: crm_update_peer_state: plugin_handle_membership: Node (null)[739512330] - state is now member (was (null))
crmd[7444]: warning: crmd_cs_dispatch: Receiving messages from a node we think is dead: rksaph05[84939948]
crmd[7444]: notice: get_node_name: Could not obtain a node name for classic openais (with plugin) nodeid 739512330
corosync[7402]: [MAIN ] Completed service synchronization, ready to provide service.
crmd[7444]: notice: do_state_transition: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]

An attempt to restart openais hung with these messages:
attrd[7442]: notice: attrd_perform_update: Sent update 7: shutdown=1421771193
corosync[7402]: [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=7444, seq=6) to terminate...
[message repeats]

So I killed crmd (pid 7444), and openais shut down.
Unfortunately the problem still persists...

Regards,
Ulrich


_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
Re: SLES11 SP3: warning: crm_find_peer: Node 'h01' and 'h01' share the same cluster nodeid: 739512321
> On 21 Jan 2015, at 3:38 am, Ulrich Windl <Ulrich.Windl@rz.uni-regensburg.de> wrote:
>
> Hi!
>
> When a SLES11 SP3 node joined a 3-node cluster after a reboot (and a preceding update), a node with up-to-date software logged these messages (I feel they should not appear):
>
> Jan 20 17:12:38 h10 corosync[13220]: [MAIN ] Completed service synchronization, ready to provide service.
> Jan 20 17:12:38 h10 cib[13257]: warning: crm_find_peer: Node 'h01' and 'h01' share the same cluster nodeid: 739512321
> Jan 20 17:12:38 h10 cib[13257]: warning: crm_find_peer: Node 'h01' and 'h01' share the same cluster nodeid: 739512321
> Jan 20 17:12:38 h10 cib[13257]: warning: crm_find_peer: Node 'h01' and 'h01' share the same cluster nodeid: 739512321

We fixed this upstream a little while back.
I think the fix even came from SUSE.
