Mailing List Archive

Re: [Openstack] [Ops] OpenStack and Operations: Input from the Wild
This is a really great list! With regard to cluster health and
monitoring, I did a fair amount of work with Swift before turning to Nova
and really appreciated the way each Swift service has a "healthcheck"
call that a monitoring system can use. While I don't think providing a
production-ready monitoring system should be part of core OpenStack, it
is the core architects who really know what needs to be checked to
ensure that a system is healthy. Tools such as Crowbar and Zenoss set up
various probes of ports, process lists and so on, but it would be a big
improvement for deployers if each OpenStack service provided healthcheck
APIs based on expert knowledge of what is supposed to be happening
inside. That would also insulate deployers from code changes that might
alter what it means for a service to be running properly. Looking
forward to the discussion.

-David



On 4/6/2012 1:06 AM, Andrew Clay Shafer wrote:
> Interested in DevOps.
>
> Off the top of my head:
>
> live upgrades
> API-queryable indications of cluster health
> API-queryable cluster version and configuration info
> enabling monitoring as a first-class concern in OpenStack (either as a
> cross-cutting concern or as its own project)
> a framework for gathering and sharing performance benchmarks together
> with architecture and configuration details
>
>
> On Thu, Apr 5, 2012 at 1:52 PM, Duncan McGreggor <duncan@dreamhost.com>
> wrote:
>
> For anyone interested in DevOps, Ops, cloud hosting management, etc.,
> there's a proposed session we could use your feedback on for topics of
> discussion:
> http://summit.openstack.org/sessions/view/57
>
> Respond with your thoughts and ideas, and I'll be sure to add them
> to the list.
>
> Thanks!
>
> d
>
> _______________________________________________
> Mailing list: https://launchpad.net/~openstack
> Post to : openstack@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~openstack
> More help : https://help.launchpad.net/ListHelp
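The healthcheck idea raised above (each service reporting its own view of whether it is healthy, queryable over its API) can be illustrated with a small WSGI middleware. This is a hypothetical sketch, not Swift's actual healthcheck middleware: the `/healthcheck` path, the `check_service_health()` hook and the 200/503 convention are assumptions chosen for illustration.

```python
def check_service_health():
    """Hypothetical hook: a real service would verify its own internals here
    (database connection, message queue, worker threads, ...).
    Returns a (healthy, detail) tuple."""
    return True, "OK"


def healthcheck_middleware(app, checker=check_service_health, path="/healthcheck"):
    """Wrap a WSGI application so that requests to `path` are answered with
    the service's own view of its health instead of being passed to `app`."""

    def wrapped(environ, start_response):
        if environ.get("PATH_INFO") == path:
            healthy, detail = checker()
            # 200 when healthy, 503 otherwise: a status code any monitoring
            # system can act on without knowing OpenStack internals.
            status = "200 OK" if healthy else "503 Service Unavailable"
            body = detail.encode("utf-8")
            start_response(status, [("Content-Type", "text/plain"),
                                    ("Content-Length", str(len(body)))])
            return [body]
        return app(environ, start_response)

    return wrapped


def demo_app(environ, start_response):
    """Stand-in for the real service's WSGI application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello from the service\n"]


application = healthcheck_middleware(demo_app)
```

For a quick local try, `application` can be served with the standard library's `wsgiref.simple_server.make_server("127.0.0.1", 8080, application).serve_forever()` and probed with `curl http://127.0.0.1:8080/healthcheck`. Because the check logic lives inside the service, deployers' monitoring keeps working when the internals change.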
Re: [Openstack] [Ops] OpenStack and Operations: Input from the Wild
Splitting monitoring into three parts:

1. Gathering of metrics (availability, performance) and reporting them in
a standard fashion should be part of OpenStack.

2. Best-practice sensors should sample the metrics and raise alarms for
issues that could cause service impacts. Posting of these alarms to a
monitoring system should be based on plug-ins.

3. Reference implementations for standard monitoring systems such as
Nagios should be available; these query the data above and feed it into
the selected package.

Individual sites do not want to be involved in defining best practice.
Equally, each monitoring system should not need an intimate
understanding of OpenStack to produce a red/green light. Components 1
and 2 belong with the associated OpenStack component; component 3
belongs with the monitoring solution provider.



Tim
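The three-part split above can be sketched as follows: a metrics dictionary stands in for part 1, `Sensor` rules for part 2, and an `AlarmPublisher` plug-in interface for the hand-off to a monitoring system in part 3. All class names, metric names and thresholds are hypothetical; the only borrowed convention is the Nagios plugin exit status (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

# Nagios plugin exit-status convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}
# Ordering used to pick the "worst" result; ranking UNKNOWN below WARNING
# is a choice made for this sketch.
SEVERITY = {OK: 0, UNKNOWN: 1, WARNING: 2, CRITICAL: 3}


@dataclass
class Sensor:
    """Part 2: a best-practice rule that would ship with the OpenStack
    component. Higher metric values are treated as worse."""
    metric: str
    warn: float
    crit: float

    def evaluate(self, metrics):
        value = metrics.get(self.metric)
        if value is None:
            return UNKNOWN, f"{self.metric} not reported"
        if value >= self.crit:
            return CRITICAL, f"{self.metric}={value}"
        if value >= self.warn:
            return WARNING, f"{self.metric}={value}"
        return OK, f"{self.metric}={value}"


class AlarmPublisher(ABC):
    """Plug-in point: one subclass per monitoring system (part 3)."""

    @abstractmethod
    def publish(self, service, state, detail):
        ...


class NagiosStylePublisher(AlarmPublisher):
    """Render the result as a one-line status message plus exit code."""

    def publish(self, service, state, detail):
        return f"{service} {STATE_NAMES[state]} - {detail}", state


def run_checks(service, metrics, sensors, publisher):
    """Evaluate every sensor against the gathered metrics (part 1) and hand
    the worst state, with all details, to the chosen plug-in."""
    results = [sensor.evaluate(metrics) for sensor in sensors]
    worst = max((state for state, _ in results), key=SEVERITY.__getitem__, default=OK)
    detail = "; ".join(text for _, text in results)
    return publisher.publish(service, worst, detail)
```

For example, with made-up metrics `{"queue_backlog": 120, "api_5xx_per_minute": 0}` and sensors `Sensor("queue_backlog", 100, 500)` and `Sensor("api_5xx_per_minute", 5, 20)`, `run_checks("nova-api", ..., NagiosStylePublisher())` returns `("nova-api WARNING - queue_backlog=120; api_5xx_per_minute=0", 1)`. The monitoring system only sees a state and a message; the knowledge of what to check stays with the component.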



From: openstack-bounces+tim.bell=cern.ch@lists.launchpad.net
[mailto:openstack-bounces+tim.bell=cern.ch@lists.launchpad.net] On Behalf Of
David Kranz
Sent: 06 April 2012 16:44
To: Andrew Clay Shafer
Cc: openstack-operators@lists.openstack.org; openstack; Duncan McGreggor
Subject: Re: [Openstack] [Ops] OpenStack and Operations: Input from the Wild

Re: [Openstack] [Ops] OpenStack and Operations: Input from the Wild
I love the idea of providing a documented monitoring API for the
various components, so other tools would know what is exposed and why.
Closely related would be standardized logging and documented error
conditions, so various tools could be applied to the logs (Splunk,
syslog, Logstash, etc.). Making OpenStack operationally consistent
would be a boon to anyone building tooling for it, rather than everyone
having to rediscover what to look for. I'm not sure it calls for a
separate project per se because of the cross-cutting concerns, but I
could be convinced if OpenStack operations were given more visibility
and forethought.

Thanks,
Matt Ray
Senior Technical Evangelist | Opscode Inc.
matt@opscode.com | (512) 731-2218
Twitter, IRC, GitHub: mattray
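Standardized logging of the kind described above could take the form of one JSON object per log line, which tools such as Splunk or Logstash can index without per-site parsing rules. This is a minimal sketch using only Python's standard `logging` module; the field names and the `error_code` convention are assumptions, not an existing OpenStack standard.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line so that log tools can index the
    fields directly instead of relying on per-site regular expressions."""

    def format(self, record):
        entry = {
            "time": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical convention: documented error conditions carry a stable
        # code, passed as logger.error(..., extra={"error_code": "..."}).
        error_code = getattr(record, "error_code", None)
        if error_code is not None:
            entry["error_code"] = error_code
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def get_json_logger(name):
    """Return a logger that writes JSON lines to stderr (handler attached once)."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger
```

Usage might look like `get_json_logger("nova.compute").error("Failed to spawn instance %s", instance_id, extra={"error_code": "NOVA-SPAWN-001"})`, where the error code is purely illustrative. The value of documenting such codes is that every site's tooling can match on the same field.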



On Fri, Apr 6, 2012 at 1:13 PM, Tim Bell <Tim.Bell@cern.ch> wrote:
_______________________________________________
Openstack-operators mailing list
Openstack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
Re: [Openstack] [Ops] OpenStack and Operations: Input from the Wild
Availability metrics, for me, are the ones that tell me whether a service is
up, degraded or down. As each of us starts production monitoring, we need to
work out how many Nova, Glance and Swift processes of each type should be
running. Furthermore, we need to add basic 'ping'-style probes to check that
the services are responding as expected.
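The availability check described above combines two observations: whether the expected number of processes is running, and whether a simple probe gets an answer. A minimal sketch, assuming an HTTP endpoint for the probe and the up/degraded/down rules stated in the `classify()` docstring (both are illustrative choices, not taken from any OpenStack project):

```python
import urllib.request
from dataclasses import dataclass


@dataclass
class Observation:
    service: str             # e.g. "nova-api"
    expected_processes: int  # how many processes of this type should run
    running_processes: int   # how many were found, e.g. in the process list
    probe_ok: bool           # did a basic 'ping'-style request succeed?


def http_probe(url, timeout=5.0):
    """'Ping'-style probe: True if the endpoint answers with a 2xx status.
    HTTP errors, refused connections and timeouts all surface as OSError
    subclasses (urllib.error.URLError/HTTPError, timeouts)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        return False


def classify(obs):
    """Map raw observations to up / degraded / down. Rules assumed here:
    down if the probe fails or nothing runs, degraded if fewer processes
    than expected run, up otherwise."""
    if not obs.probe_ok or obs.running_processes == 0:
        return "down"
    if obs.running_processes < obs.expected_processes:
        return "degraded"
    return "up"
```

For instance, `Observation("nova-api", expected_processes=4, running_processes=2, probe_ok=True)` classifies as `"degraded"`. The expected count of 4 is made up; working out the real counts per process type is exactly the per-site effort a shared, standard set would avoid.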



Performance metrics are for cases where we want to record how well the
system is running, for example the number of REST calls per second or VMs
created per second. These are the kinds of metrics that feed into capacity
planning, bottleneck identification and trending.
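Rates such as REST calls per second or VMs created per second are usually derived from ever-increasing counters sampled over time. The sketch below assumes a service exposes such a counter (a hypothetical assumption; the counter source is not specified here) and computes the average rate over a sliding window.

```python
import time
from collections import deque


class RateMeter:
    """Turn an ever-increasing counter (e.g. total REST calls served or total
    VMs created) into an average per-second rate over a sliding window."""

    def __init__(self, window_seconds=60.0, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock
        self.samples = deque()  # (timestamp, counter value), oldest first

    def record(self, counter_value, now=None):
        now = self.clock() if now is None else now
        self.samples.append((now, counter_value))
        # Discard samples older than the window, always keeping two points.
        while len(self.samples) > 2 and now - self.samples[0][0] > self.window:
            self.samples.popleft()

    def rate(self):
        """Events per second between the oldest and newest retained sample."""
        if len(self.samples) < 2:
            return 0.0
        (t0, c0), (t1, c1) = self.samples[0], self.samples[-1]
        if t1 <= t0 or c1 < c0:
            # No elapsed time, or the counter was reset (e.g. by a service
            # restart); this sketch simply reports no rate in that case.
            return 0.0
        return (c1 - c0) / (t1 - t0)
```

Recording a REST-call counter of 1000 at t=0 s and 1600 at t=30 s gives `rate() == 20.0` calls per second. Keeping raw counters in the service and computing rates outside it leaves the choice of window to each site's capacity-planning or trending tools.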



Building up an open, standard and consistent set of metrics will avoid
duplicated effort as sites move into production, and will allow us to keep
the monitoring up to date when the internals of OpenStack change.



Tim



From: Huang Zhiteng [mailto:winston.d@gmail.com]
Sent: 09 April 2012 05:42
To: Tim Bell
Cc: David Kranz; Andrew Clay Shafer;
openstack-operators@lists.openstack.org; Duncan McGreggor; openstack
Subject: Re: [Openstack] [Ops] OpenStack and Operations: Input from the Wild



Hi Tim,

Could you elaborate on 'performance metrics'? What kinds of metrics
count as performance metrics? Thanks.

--
Regards
Huang Zhiteng