Mailing List Archive

Reallocation and VMware
Hi all,

I'd like to pick your collective brains about your experiences with
reallocate, specifically when reallocating luns under VMware.

For background, we're running ONTAP 8.0.1 on a 3170 that's over three
years old. I've been going through measuring reallocation, and most
of the volumes are over 3. We have no snapshots, and only a
relatively small number of volumes are de-duplicated. All our volumes
and luns are thin-provisioned, and no aggregate is more than 76% full
(most are ~65%). We regularly have huge latency spikes (worst I've
seen so far is 5000000ms, and there are far too many to even track
over 50000ms daily), and on one filer head, but not its partner, I
regularly see disk utilisation go to 100% or more. I'm hoping
reallocate will help here.

I have a brief note from a NetApp support person who says "It’s very
important that you complete the reallocation in the following order:
1:OS 2:LUN 3: Volume".

I have two questions about this:
- is it absolutely necessary to defrag the OS before you reallocate
the lun? I'm sure I've run reallocate without defragging the OS and
still seen performance improvements. I'm also assuming that this is
only relevant to Windows VMs, not Linux (in our case, Red Hat/CentOS)
ones.
- if you only have one lun per volume, do you still need to run
reallocate on both the lun and the volume? If only one, which is
preferable?

All advice appreciated.

Thanks,
Peta

_______________________________________________
Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters
Re: Reallocation and VMware
I'd love to see a statit taken during one of those high latency spikes.

You can Email me separately, if you like. (Maybe even a hostname too)



On Wed, Apr 25, 2012 at 9:49 PM, Peta Thames <petathames@gmail.com> wrote:

--
---
Gustatus Similis Pullus
Re: Reallocation and VMware
Have you checked the alignment of the VMDKs?

Jack

-----Original Message-----
From: Peta Thames <petathames@gmail.com>
Sender: toasters-bounces@teaparty.net
Date: Thu, 26 Apr 2012 14:49:43
To: <Toasters@teaparty.net>
Subject: Reallocation and VMware

Re: Reallocation and VMware
Hi Jack,

You're right, and I should have mentioned it before. Large numbers of
the VMDKs are misaligned. I'd estimate about 33%, but I don't know
exactly how many as the shiny new VSC scanner got stuck halfway
through the scan I ran, leaving several VMs in a "being scanned"
state. I have a case open with NetApp to find out how to get those
VMs out of that state so I can a) continue the scan and b) schedule
fixing the misaligned luns.

Not all the luns that have large latency spikes are misaligned,
however. Mind you, by the same token, not all of them are fragmented,
although so far (I'm still getting through measuring them all) there's
definitely a strong correlation.

I also have to admit that I read the scale wrong in perf advisor, and
the numbers I'm seeing are in microseconds, not milliseconds. Still
way more than the 10ms I would like, but three orders of magnitude better!
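
To put numbers on that correction, here is a quick sketch in Python. It
only redoes the arithmetic on the two figures quoted earlier in the
thread, read as microseconds; the helper name us_to_ms is made up for
illustration and is not part of any NetApp tool.

```python
US_PER_MS = 1000  # microseconds per millisecond


def us_to_ms(microseconds):
    """Convert a latency in microseconds to milliseconds."""
    return microseconds / US_PER_MS


# Figures quoted earlier in the thread, now read as microseconds.
worst_spike_ms = us_to_ms(5_000_000)  # worst spike seen so far
daily_spike_ms = us_to_ms(50_000)     # the "too many to track" level
target_ms = 10                        # the latency target mentioned above

print(worst_spike_ms)              # 5000.0 ms, i.e. 5 seconds
print(daily_spike_ms)              # 50.0 ms
print(worst_spike_ms / target_ms)  # 500.0 times the 10ms target
```

So the worst case is about 5 seconds rather than 5000 seconds, and the
everyday spikes sit around 50ms.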

Peta

On 26 April 2012 15:52, Jack Lyons <jack1729@gmail.com> wrote:
> Have you checked the alignment of the VMDK's?
Re: Reallocation and VMware
Running 7.3.5.stuff here so please take with 8.0 grains of salt :)

1) File system alignment is the most important thing to do (you may
already be aligned, just making sure others reading this are aware).
2) Am I correct in assuming you are using iSCSI for your data stores?
Make sure the VMFS file systems are also aligned with the NetApp blocks.
3) Are you using LVMs? We had a problem with our CentOS boxes where
crontab had a job running weekly at 4:22 AM on Mondays that did a
raid check, which occasionally brought our 3070 to its knees.
4) Are your latency spikes being measured from the vNIC to the filer
interface? "stats show -i 3 iscsi" (nfsv3 for NFS) will give you a good
overview, and "stats show -i 3 lun" will give a per-LUN view of the same
kind of thing. Are the spikes in read or write times, or are they network
specific? If they are due to the network itself you may want to look at
your hypervisor's network config.
5) If it's specifically slow on writes instead of reads you may need to
run an aggregate reallocate to get your free space in contiguous blocks.
If this is something you need to do it's probably because your filer was
either *really* full or you added disks to an aggr late in the game.
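
For anyone reading along who wants to see what "aligned" means in
points 1 and 2, here is a minimal sketch in Python. It assumes 512-byte
sectors and the 4 KiB WAFL block size; the function name is_aligned and
the sample starting sectors are illustrative only, and it does not read
anything from a real VMDK or LUN.

```python
SECTOR_BYTES = 512       # assumed logical sector size
WAFL_BLOCK_BYTES = 4096  # WAFL works in 4 KiB blocks


def is_aligned(start_sector, sector_bytes=SECTOR_BYTES,
               block_bytes=WAFL_BLOCK_BYTES):
    """True if a partition starting at start_sector begins on a block boundary."""
    return (start_sector * sector_bytes) % block_bytes == 0


# Sample starting sectors: 63 is the old MBR default, the rest are
# multiples of 8 sectors (8 * 512 = 4096 bytes).
for start in (63, 64, 128, 2048):
    state = "aligned" if is_aligned(start) else "MISALIGNED"
    print(f"start sector {start}: {state}")
```

A partition starting at sector 63 begins 32256 bytes in, which is not a
multiple of 4096, so every guest block straddles two filer blocks.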

statit will give you a better picture of what your disks are doing
individually than sysstat's %utilized, especially on a filer with a ton
of disks. sysstat shows the busiest disk during its interval, which is
not always the best metric. If you have disks in the same raid group with
drastically different IO times then maybe a reallocate is worthwhile.
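
As a rough way to spot "drastically different" disks within one raid
group, something like the Python sketch below could work. Everything in
it is an assumption for illustration: the disk names and times are
made-up values typed in by hand (it does not parse statit output), and
the 2x-the-median threshold is an arbitrary choice, not a NetApp
guideline.

```python
from statistics import median


def skewed_disks(times_ms, factor=2.0):
    """Return disks whose IO time exceeds factor x the raid group median."""
    mid = median(times_ms.values())
    return sorted(disk for disk, t in times_ms.items() if t > factor * mid)


# Hypothetical per-disk IO times (ms) for one raid group.
rg0 = {"0a.16": 6.1, "0a.17": 5.8, "0a.18": 6.4, "0a.19": 19.7}

print(skewed_disks(rg0))
```

With these made-up numbers the median is 6.25ms, so only 0a.19 is
flagged as an outlier.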

Finally if you're using VMDK files inside of VMFS and not mounting your
iSCSI LUNs as RDMs or something you may want to consider reallocating
the VMDK files as well.

If you're using EXT3 I doubt the host file system is the problem,
although if it's an LVM all bets are off, since we don't use them in our
environment.




Jeremy Page | Senior Technical Architect | Gilbarco Veeder-Root, A Danaher Company
Office: 336-547-5399 | Cell: 336-601-7274 | 24x7 Emergency: 336-430-8151

On 04/26/2012 12:49 AM, Peta Thames wrote:
> - is it absolutely necessary to defrag the OS before you reallocate
> the lun? I'm sure I've run reallocate without defraging the OS and
> still seen performance improvements. I'm also assuming that this is
> only relevant to Windows VMs, not Linux (in our case, Red Hat/CentOS)
> ones.


Re: Reallocation and VMware
On Wed, Apr 25, 2012 at 9:49 PM, Peta Thames <petathames@gmail.com> wrote:
> Hi all,
>  We regularly have huge latency spikes (worst I've
> seen so far is 5000000ms, and there are far too many to even track
> over 50000ms daily), and on one filer head,

Quick question: the latencies you are seeing are being measured in
microseconds (µs) and not milliseconds (ms), am I correct? 500,000ms
is a lot of seconds ;)

-net

Re: Reallocation and VMware
Hi Jeremy,

1) Yeah, file alignment is a pain here. I'm keen to fix it but
internal change processes mean it'll take a while.

2) Sorry, I should have said this is all FC.

3) Yeah we do have about 300 LVMs, and over 1000 Windows VMs of
various versions. I do need to check crons and DBs and anything that
could be running scheduled jobs as we do have spikes at regular times
(as well as irregular). check_raid is one I'd forgotten about though,
thanks. We do have a daily spike shortly after 4am, hmm...

4) The spikes are a mix of writes and reads, I suspect it's dependent
on what the application is doing at the time.

5) I've only been here 3 weeks so I don't really know the history of
the filer. It certainly may have been much fuller than it is now in
the past.

I'm teaching myself to read statit, there's a lot in there that I
haven't been paying attention to. Jeff Mohler has been really helpful
here looking at statits from my filers, and pointed out that we spend
as much time writing stripes only 1 disk wide as we do writing full-width
stripes and all other possible stripe widths combined, which explains a
lot. I think I need to get both the misalignment and the fragmentation
issues fixed, which is easily done on a technical level but harder on an
internal process level!

Thanks for your help.

Peta

On 26 April 2012 23:21, Jeremy Page <jeremy.page@gilbarco.com> wrote:
Re: Reallocation and VMware
Peta - we were dealing with this very issue (unexplained latency spikes NetApp blamed on VM misalignment)
back in 2010 - I wrote up how we deconstructed the IOPS after many wasted perfstat iterations
to solve it pretty much on our own:

http://www.vmadmin.info/2010/07/vmware-and-netapp-deconstructing.html

It was maddening to me back in 2010 how NetApp support could blockade support cases with a
blanket "must align VMs first" without a real quantification of the impact of misalignment - see

http://www.vmadmin.info/2010/07/quantifying-vmdk-misalignment.html

We ended up taking the downtime back then to align all VMs.
But now, I would be one to encourage you to make the leap to 8.x - we are on 8.1GA and we are not looking back.
The data motion of vFilers is allowing us to upgrade clusters with no downtime:

http://www.vmadmin.info/2012/04/meta-storage-vmotion-netapp-datamotion.html

They have me almost believing in cluster mode for scale out...







On Apr 25, 2012, at 11:40 PM, Peta Thames wrote:
