Mailing List Archive

RE: [PATCH 0/4] promote zcache from staging
> From: Seth Jennings [mailto:sjenning@linux.vnet.ibm.com]

Hi Seth --

Good discussion. Even though we disagree, I appreciate
your enthusiasm and your good work on the kernel!

> Subject: Re: [PATCH 0/4] promote zcache from staging
>
> On 08/07/2012 04:47 PM, Dan Magenheimer wrote:
> > I notice your original published benchmarks [1] include
> > N=24, N=28, and N=32, but these updated results do not. Are you planning
> > on completing the runs? Second, I now see the numbers I originally
> > published for what I thought was the same benchmark as yours are actually
> > an order of magnitude larger (in sec) than yours. I didn't notice
> > this in March because we were focused on the percent improvement, not
> > the raw measurements. Since the hardware is highly similar, I suspect
> > it is not a hardware difference but instead that you are compiling
> > a much smaller kernel. In other words, your test case is much
> > smaller, and so exercises zcache much less. My test case compiles
> > a full enterprise kernel... what is yours doing?
>
> I am doing a minimal kernel build for my local hardware
> configuration.
>
> With the reduction in RAM, 1GB to 512MB, I didn't need to do
> test runs with >20 threads to find the peak of the benefit
> curve at 16 threads. Past that, zcache is saturated and I'd
> just be burning up my disk.

I think that's exactly what I said in a snippet of my response
that you deleted. A cache needs to work well both when it
is non-full and when it is full. You are only demonstrating
that it works well when it is non-full. When it is
"saturated", bad things can happen. Finding the "peak of the
benefit" is only half the work of benchmarking.

So it appears you are trying to prove your point by showing
the workloads that look good, while _not_ showing the workloads
that look bad, and then claiming you don't care about those
bad workloads anyway.

> Also, I provide the magnitude numbers (pages, seconds) just
> to show my source data. The %change numbers are the real
> results as they remove build size as a factor.

You'll have to explain what you mean because, if I understand
correctly, this is just not true. Different build sizes
definitely affect memory management differently, just as
different values of N (for make -jN) have an effect.

> > At LSFMM, Andrea
> > Arcangeli pointed out that zcache, for frontswap pages, has no "writeback"
> > capabilities and, when it is full, it simply rejects further attempts
> > to put data in its cache. He said this is unacceptable for KVM and I
> > agreed that it was a flaw that needed to be fixed before zcache should
> > be promoted.
>
> KVM (in-tree) is not a current user of zcache. While the
> use cases of possible future zcache users should be
> considered, I don't think they can be used to prevent promotion.

That wasn't my point. Andrea identified the flaw as an issue
of zcache.

> > A second flaw is that the "demo" zcache has no concept of LRU for
> > either cleancache or frontswap pages, or ability to reclaim pageframes
> > at all for frontswap pages.
> ...
> >
> > A third flaw is that the "demo" version has a very poor policy to
> > determine what pages are "admitted".
> ...
> >
> > I can add more issues to the list, but will stop here.
>
> All of the flaws you list do not prevent zcache from being
> beneficial right now, as my results demonstrate. Therefore,
> the flaws listed are really potential improvements and can
> be done in mainline after promotion. Even if large changes
> are required to make these improvements, they can be made in
> mainline in an incremental and public way.

Your results only demonstrate that zcache is beneficial on
the workloads that you chose to present. But using the same
workload with slightly different parameters (-jN or compiling
a larger kernel), zcache can be _detrimental_, and you've chosen
to not measure or present those cases, even though you did
measure and present some of those cases in your first benchmark
runs posted in March (on an earlier kernel).

I can only speak for myself, but this appears disingenuous to me.

Sorry, but FWIW my vote is still a NACK. IMHO zcache needs major
work before it should be promoted, and I think we should be spending
the time fixing the known flaws rather than arguing about promoting
"demo" code.

Dan
Re: [PATCH 0/4] promote zcache from staging
On 08/07/2012 03:23 PM, Seth Jennings wrote:
> On 07/27/2012 01:18 PM, Seth Jennings wrote:
>> Some benchmarking numbers demonstrating the I/O saving that can be had
>> with zcache:
>>
>> https://lkml.org/lkml/2012/3/22/383
>
> There was concern that kernel changes external to zcache since v3.3 may
> have mitigated the benefit of zcache. So I re-ran my kernel building
> benchmark and confirmed that zcache is still providing I/O and runtime
> savings.

There was a request to test under even greater memory pressure to
determine whether, at some point, zcache runs into real problems.
So I continued out to 32 threads:

N=4..20 is the same data as before except for the pswpin values.
I found a mistake in the way I computed pswpin that changed those
values slightly. However, this didn't change the overall trend.

I also inverted the sign of the %change fields, since they are percent
changes relative to the normal (non-zcache) case.

I/O (in pages)
              normal                              zcache               change
N   pswpin  pswpout  majflt  I/O sum   pswpin  pswpout  majflt  I/O sum   %I/O
4   0       2        2116    2118      0       0        2125    2125      0%
8   0       575      2244    2819      0       4        2219    2223      -21%
12  1979    4038     3226    9243      1269    2519     3871    7659      -17%
16  21568   47278    9426    78272     7770    15598    9372    32740     -58%
20  50307   127797   15039   193143    20224   40634    17975   78833     -59%
24  186278  364809   45052   596139    47406   90489    30877   168772    -72%
28  274734  777815   53112   1105661   134981  307346   63480   505807    -54%
32  988530  2002087  168662  3159279   324801  723385   140288  1188474   -62%

Runtime (in seconds)
N   normal  zcache  %change
4   126     127     1%
8   124     124     0%
12  131     133     2%
16  189     156     -17%
20  261     235     -10%
24  513     288     -44%
28  556     434     -22%
32  1463    745     -49%

%CPU utilization (out of 400% on 4 cpus)
N   normal  zcache  %change
4   254     253     0%
8   261     263     1%
12  250     248     -1%
16  173     211     22%
20  124     140     13%
24  64      114     78%
28  59      76      29%
32  23      45      96%
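
To make the %change convention above concrete, here is a minimal
Python sketch of the arithmetic (an illustration only, not part of
the benchmark harness), using the N=16 row:

    # Illustration: reproduce the N=16 row of the tables above.
    # I/O sum = pswpin + pswpout + majflt (in pages); %change is
    # relative to the normal (non-zcache) case.
    normal_io = 21568 + 47278 + 9426      # 78272 pages
    zcache_io = 7770 + 15598 + 9372       # 32740 pages
    io_change = 100.0 * (zcache_io - normal_io) / normal_io    # ~ -58%

    normal_rt, zcache_rt = 189, 156       # runtime in seconds
    rt_change = 100.0 * (zcache_rt - normal_rt) / normal_rt    # ~ -17%

    print("I/O change: %d%%, runtime change: %d%%"
          % (round(io_change), round(rt_change)))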

The ~60% I/O savings holds even out to 32 threads, at which point the
non-zcache case has about 12GB of I/O (3,159,279 pages at 4KiB per
page) and is taking 12x longer to complete.
Additionally, the runtime savings increases significantly beyond 20
threads, even though the absolute runtime is suboptimal due to the
extreme memory pressure.

Seth

RE: [PATCH 0/4] promote zcache from staging
> From: Seth Jennings [mailto:sjenning@linux.vnet.ibm.com]
> Subject: Re: [PATCH 0/4] promote zcache from staging
>
> On 08/07/2012 03:23 PM, Seth Jennings wrote:
> > On 07/27/2012 01:18 PM, Seth Jennings wrote:
> >> Some benchmarking numbers demonstrating the I/O saving that can be had
> >> with zcache:
> >>
> >> https://lkml.org/lkml/2012/3/22/383
> >
> > There was concern that kernel changes external to zcache since v3.3 may
> > have mitigated the benefit of zcache. So I re-ran my kernel building
> > benchmark and confirmed that zcache is still providing I/O and runtime
> > savings.
>
> There was a request to test under even greater memory pressure to
> determine whether, at some point, zcache runs into real problems.
> So I continued out to 32 threads:

Hi Seth --

Thanks for continuing with running the 24-32 thread benchmarks.

> Runtime (in seconds)
> N   normal  zcache  %change
> 4   126     127     1%

> threads, even though the absolute runtime is suboptimal due to the
> extreme memory pressure.

I am not in a position right now to reproduce your results or
mine (due to a house move which is limiting my time and access
to my test machines, plus two presentations later this month at
Linuxcon NA and Plumbers) but I still don't think you've really
saturated the cache, which is when the extreme memory pressure
issues will show up in zcache. I suspect that adding more threads
to a minimal kernel compile doesn't increase the memory pressure as
much as I was seeing, so you're not seeing what I was seeing: the
zcache numbers climbing to as much as 150% WORSE than non-zcache.
In various experiments with different parameters, I have seen
four-fold degradations and worse.

My test case is a kernel compile using a full OL kernel config
file, which is roughly equivalent to a RHEL6 config. Compiling
this kernel, using similar hardware, I have never seen a runtime
less than ~800 seconds for any value of N. I suspect that my
test case, having much more source to compile, causes the N threads
in a "make -jN" to each have more work to do in parallel.

Since your test harness is obviously all set up, would you be
willing to reproduce your/my non-zcache/zcache runs with a RHEL6
config file and publish the results (using a 3.5 zcache)?

IIRC, the really bad zcache results started showing up at N=24.
I also wonder if you have anything else unusual in your
test setup, such as a fast swap disk (mine is a partition
on the same rotating disk as source and target of the kernel build,
the default install for a RHEL6 system)? Or have you disabled
cleancache? Or have you changed any sysfs parameters or
other kernel files? Also, whether zcache or non-zcache,
I've noticed that the runtime of this workload when swapping
can vary by as much as 30-40%, so it would be wise to take at
least three samples to ensure a statistically valid comparison.
And are you using 512M of physical memory or relying on
kernel boot parameters to reduce visible memory... and
if the latter have you confirmed with /proc/meminfo?
Obviously, I'm baffled at the difference in our observations.
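
(For reference, the pswpin/pswpout numbers being compared here are the
system-wide counters exposed in /proc/vmstat; below is a minimal Python
sketch of how such counters are typically sampled around a run. This is
just an illustration, not a claim about how your harness actually works.)

    # Sample the system-wide swap counters before and after a
    # benchmark run; the deltas are reported in pages.
    def swap_counters():
        counts = {}
        with open('/proc/vmstat') as f:
            for line in f:
                key, value = line.split()
                if key in ('pswpin', 'pswpout'):
                    counts[key] = int(value)
        return counts

    before = swap_counters()
    # ... run "make -jN" here ...
    after = swap_counters()
    print('pswpin delta:  %d' % (after['pswpin'] - before['pswpin']))
    print('pswpout delta: %d' % (after['pswpout'] - before['pswpout']))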

While I am always willing to admit that my numbers may be wrong,
I still can't imagine why you are in such a hurry to promote
zcache when these questions are looming. Would you care to
explain why? It seems reckless to me, and unlike the IBM
behavior I expect, so I really wonder about the motivation.

My goal is very simple: "First do no harm". I don't think
zcache should be enabled for distros (and users) until we can
reasonably demonstrate that running a workload with zcache
is never substantially worse than running the same workload
without zcache. If you can tell your customer: "Yes, always enable
zcache", great! But if you have to tell your customer: "It
depends on the workload, enable it if it works for you, disable
it otherwise", then zcache will get a bad reputation, and
will/should never be enabled in a reputable non-hobbyist distro.
I fear the "demo" zcache will get a bad reputation,
so I prefer to delay promotion while there is serious doubt
about whether "harm" may occur.

Last, you've never explained what problems zcache solves
for you that zram does not. With Minchan pushing for
the promotion of zram+zsmalloc, does zram solve your problem?
Another alternative might be to promote zcache as "demozcache"
(i.e. fork it for now).

It's hard to identify a reasonable compromise when you
are just saying "Gotta promote zcache NOW!" and not
explaining the problem you are trying to solve or motivations
behind it.

OK, Seth, I think all my cards are on the table. Where's yours?
(And, hello, is anyone else following this anyway? :-)

Thanks,
Dan
Re: [PATCH 0/4] promote zcache from staging
On 08/09/2012 03:20 PM, Dan Magenheimer wrote:
> I also wonder if you have anything else unusual in your
> test setup, such as a fast swap disk (mine is a partition
> on the same rotating disk as source and target of the kernel build,
> the default install for a RHEL6 system)?

I'm using a normal SATA HDD with two partitions, one for
swap and the other an ext3 filesystem with the kernel source.

> Or have you disabled cleancache?

Yes, I _did_ disable cleancache. I could see where having
cleancache enabled could explain the difference in results.

> Or have you changed any sysfs parameters or
> other kernel files?

No.

> And are you using 512M of physical memory or relying on
> kernel boot parameters to reduce visible memory

Limited with mem=512M boot parameter.

> ... and
> if the latter have you confirmed with /proc/meminfo?

Yes, confirmed.
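
For reference, a minimal sketch of that check (illustration only):

    # Confirm that the mem=512M boot parameter took effect.
    # MemTotal reads somewhat below 524288 kB because the kernel
    # reserves some memory for its own use.
    with open('/proc/meminfo') as f:
        for line in f:
            if line.startswith('MemTotal:'):
                kb = int(line.split()[1])
                print('MemTotal: %d kB (~%d MiB)' % (kb, kb // 1024))
                break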

Seth

Re: [PATCH 0/4] promote zcache from staging
On 07/27/2012 01:18 PM, Seth Jennings wrote:
> zcache is the remaining piece of code required to support in-kernel
> memory compression. The other two features, cleancache and frontswap,
> have been promoted to mainline in 3.0 and 3.5. This patchset
> promotes zcache from the staging tree to mainline.
>
> Based on the level of activity and contributions we're seeing from a
> diverse set of people and interests, I think zcache has matured to the
> point where it makes sense to promote this out of staging.

I am wondering if there is any more discussion to be had on
the topic of promoting zcache. The discussion got dominated
by performance concerns, but hopefully my latest performance
metrics have alleviated those concerns for most and shown
the continuing value of zcache in both I/O and runtime savings.

I'm not saying that zcache development is complete by any
means. There are still many improvements that can be made.
I'm just saying that I believe it is stable and beneficial
enough to leave the staging tree.

Seth

Re: [PATCH 0/4] promote zcache from staging
Hi Seth,

On Tue, Aug 14, 2012 at 05:18:57PM -0500, Seth Jennings wrote:
> On 07/27/2012 01:18 PM, Seth Jennings wrote:
> > zcache is the remaining piece of code required to support in-kernel
> > memory compression. The other two features, cleancache and frontswap,
> > have been promoted to mainline in 3.0 and 3.5. This patchset
> > promotes zcache from the staging tree to mainline.
> >
> > Based on the level of activity and contributions we're seeing from a
> > diverse set of people and interests, I think zcache has matured to the
> > point where it makes sense to promote this out of staging.
>
> I am wondering if there is any more discussion to be had on
> the topic of promoting zcache. The discussion got dominated
> by performance concerns, but hopefully my latest performance
> metrics have alleviated those concerns for most and shown
> the continuing value of zcache in both I/O and runtime savings.
>
> I'm not saying that zcache development is complete by any
> means. There are still many improvements that can be made.
> I'm just saying that I believe it is stable and beneficial
> enough to leave the staging tree.
>
> Seth

I want to do some cleanup on zcache, but I'm okay with doing that after
it is promoted, if Andrew merges it. I'm just not sure he won't object
to the current code quality (not enough comments, poor variable/function
names, and a lot of code duplicated from ramster).
Anyway, I think we should at least unify the common code between zcache
and ramster before promoting. Otherwise it makes refactoring hard,
because we always have to touch both sides for even a simple cleanup.
That means a zcache contributor doing cleanup would also have to know
ramster well, which is not desirable.


Re: [PATCH 0/4] promote zcache from staging
On Fri, Aug 10, 2012 at 01:14:01PM -0500, Seth Jennings wrote:
> On 08/09/2012 03:20 PM, Dan Magenheimer wrote:
> > I also wonder if you have anything else unusual in your
> > test setup, such as a fast swap disk (mine is a partition
> > on the same rotating disk as source and target of the kernel build,
> > the default install for a RHEL6 system)?
>
> I'm using a normal SATA HDD with two partitions, one for
> swap and the other an ext3 filesystem with the kernel source.
>
> > Or have you disabled cleancache?
>
> Yes, I _did_ disable cleancache. I could see where having
> cleancache enabled could explain the difference in results.

Why did you disable cleancache? Having both cleancache (to compress
fs data) and frontswap (to compress swap data) is the goal, yet you
turned one of the sources off.
Re: [PATCH 0/4] promote zcache from staging
On 08/15/2012 04:38 AM, Konrad Rzeszutek Wilk wrote:
> On Fri, Aug 10, 2012 at 01:14:01PM -0500, Seth Jennings wrote:
> > On 08/09/2012 03:20 PM, Dan Magenheimer wrote:
>>> I also wonder if you have anything else unusual in your
>>> test setup, such as a fast swap disk (mine is a partition
>>> on the same rotating disk as source and target of the kernel build,
>>> the default install for a RHEL6 system)?
>>
>> I'm using a normal SATA HDD with two partitions, one for
>> swap and the other an ext3 filesystem with the kernel source.
>>
>>> Or have you disabled cleancache?
>>
>> Yes, I _did_ disable cleancache. I could see where having
>> cleancache enabled could explain the difference in results.
>
> Why did you disable cleancache? Having both cleancache (to compress
> fs data) and frontswap (to compress swap data) is the goal, yet you
> turned one of the sources off.

I excluded cleancache to reduce interference/noise from the
benchmarking results. For this particular workload,
cleancache doesn't make a lot of sense since it will steal
pages that could otherwise be used for storing frontswap
pages to prevent swapin/swapout I/O.

In a test run with both enabled, I found that it didn't make
much difference under moderate to extreme memory pressure:
both configurations resulted in about a 55% I/O reduction.
However, under light memory pressure (8 and 12 threads),
enabling cleancache lowered zcache's I/O reduction to roughly
zero, compared to ~20% without cleancache.

In short, cleancache only had the power to harm in this
case, so I didn't enable it.

Seth

RE: [PATCH 0/4] promote zcache from staging
> From: Seth Jennings [mailto:sjenning@linux.vnet.ibm.com]
> Subject: Re: [PATCH 0/4] promote zcache from staging
>
> On 08/09/2012 03:20 PM, Dan Magenheimer wrote:
> > I also wonder if you have anything else unusual in your
> > test setup, such as a fast swap disk (mine is a partition
> > on the same rotating disk as source and target of the kernel build,
> > the default install for a RHEL6 system)?
>
> I'm using a normal SATA HDD with two partitions, one for
> swap and the other an ext3 filesystem with the kernel source.
>
> > Or have you disabled cleancache?
>
> Yes, I _did_ disable cleancache. I could see where having
> cleancache enabled could explain the difference in results.

Sorry to beat a dead horse, but I meant to report this
earlier in the week and got tied up by other things.

I finally got my test scaffold set up earlier this week
to try to reproduce my "bad" numbers with the RHEL6-ish
config file.

I found that with "make -j28" and "make -j32" I experienced
__DATA CORRUPTION__. This was repeatable.

The type of error led me to believe that the problem was
due to concurrency of cleancache reclaim. I did not try
with cleancache disabled to prove/support this theory
but it is consistent with the fact that you (Seth) have not
seen a similar problem and have disabled cleancache.

While this problem is most likely in my code and I am
suitably chagrined, it re-emphasizes the fact that
the current zcache in staging is 20-month old "demo"
code. The proposed new zcache codebase handles concurrency
much more effectively.

I'll be away from email for a few days now.

Dan
Re: [PATCH 0/4] promote zcache from staging
On 08/17/2012 05:21 PM, Dan Magenheimer wrote:
>> From: Seth Jennings [mailto:sjenning@linux.vnet.ibm.com]
>> Subject: Re: [PATCH 0/4] promote zcache from staging
>>
>> On 08/09/2012 03:20 PM, Dan Magenheimer wrote:
>>> I also wonder if you have anything else unusual in your
>>> test setup, such as a fast swap disk (mine is a partition
>>> on the same rotating disk as source and target of the kernel build,
>>> the default install for a RHEL6 system)?
>>
>> I'm using a normal SATA HDD with two partitions, one for
>> swap and the other an ext3 filesystem with the kernel source.
>>
>>> Or have you disabled cleancache?
>>
>> Yes, I _did_ disable cleancache. I could see where having
>> cleancache enabled could explain the difference in results.
>
> Sorry to beat a dead horse, but I meant to report this
> earlier in the week and got tied up by other things.
>
> I finally got my test scaffold set up earlier this week
> to try to reproduce my "bad" numbers with the RHEL6-ish
> config file.
>
> I found that with "make -j28" and "make -j32" I experienced
> __DATA CORRUPTION__. This was repeatable.

I actually hit this for the first time a few hours ago when
I was running performance tests for your rewrite. I don't know
what to make of it yet. The 24-thread kernel build failed
when both frontswap and cleancache were enabled.

> The type of error led me to believe that the problem was
> due to concurrency of cleancache reclaim. I did not try
> with cleancache disabled to prove/support this theory
> but it is consistent with the fact that you (Seth) have not
> seen a similar problem and have disabled cleancache.
>
> While this problem is most likely in my code and I am
> suitably chagrined, it re-emphasizes the fact that
> the current zcache in staging is 20-month old "demo"
> code. The proposed new zcache codebase handles concurrency
> much more effectively.

I imagine this can be solved without rewriting the entire
codebase. If your new code contains a fix for this, can we
just pull it as a single patch?

Seth

RE: [PATCH 0/4] promote zcache from staging
> From: Seth Jennings [mailto:sjenning@linux.vnet.ibm.com]
> Sent: Friday, August 17, 2012 5:33 PM
> To: Dan Magenheimer
> Cc: Greg Kroah-Hartman; Andrew Morton; Nitin Gupta; Minchan Kim; Konrad Wilk; Robert Jennings; linux-
> mm@kvack.org; linux-kernel@vger.kernel.org; devel@driverdev.osuosl.org; Kurt Hackel
> Subject: Re: [PATCH 0/4] promote zcache from staging
>
> >
> > Sorry to beat a dead horse, but I meant to report this
> > earlier in the week and got tied up by other things.
> >
> > I finally got my test scaffold set up earlier this week
> > to try to reproduce my "bad" numbers with the RHEL6-ish
> > config file.
> >
> > I found that with "make -j28" and "make -j32" I experienced
> > __DATA CORRUPTION__. This was repeatable.
>
> I actually hit this for the first time a few hours ago when
> I was running performance tests for your rewrite. I don't know
> what to make of it yet. The 24-thread kernel build failed
> when both frontswap and cleancache were enabled.
>
> > The type of error led me to believe that the problem was
> > due to concurrency of cleancache reclaim. I did not try
> > with cleancache disabled to prove/support this theory
> > but it is consistent with the fact that you (Seth) have not
> > seen a similar problem and have disabled cleancache.
> >
> > While this problem is most likely in my code and I am
> > suitably chagrined, it re-emphasizes the fact that
> > the current zcache in staging is 20-month old "demo"
> > code. The proposed new zcache codebase handles concurrency
> > much more effectively.
>
> I imagine this can be solved without rewriting the entire
> codebase. If your new code contains a fix for this, can we
> just pull it as a single patch?

Hi Seth --

I didn't even observe this before this week, let alone fix this
as an individual bug. The redesign takes LRU ordering and zombie
pageframes (which still hold valid pointers to the contained zbuds,
and possibly valid data, so they can't be recycled yet) into account,
and handles races and concurrency carefully.

The demo codebase is pretty dumb about concurrency, really
a hack that seemed to work. Given the above, I guess the
hack only works _most_ of the time... when it doesn't
data corruption can occur.

It would be an interesting challenge, but likely very
time-consuming, to fix this one bug while minimizing other
changes so that the fix could be delivered as a self-contained
incremental patch. I suspect if you try, you will learn why
the rewrite was preferable and necessary.

(Away from email for a few days very soon now.)
Dan