Mailing List Archive: gettimeofday() jumping into the future

gettimeofday() jumping into the future

Aug 23, 2007, 4:08 AM

Post #1 of 23 (40811 views)

Hi,

We've been seeing some strange behaviour on some of our applications
recently. I've tracked this down to gettimeofday() returning spurious
values occasionally.

Specifically, gettimeofday() will suddenly, for a single call, return
a value about 4398 seconds (~1 hour 13 minutes) in the future. The
following call goes back to a normal value.

This seems to be occurring when the clock source goes slightly
backwards for a single call. In
kernel/time/timekeeping.c:__get_nsec_offset(), we have this:
cycle_delta = (cycle_now - clock->cycle_last) & clock->mask;

So a small decrease in time here will (this is all unsigned
arithmetic) give us a very large cycle_delta. cyc2ns() then multiplies
this by some value, then right shifts by 22. The resulting value (in
nanoseconds) is approximately 4398 seconds; this gets added on to the
xtime value, giving us our jump into the future. The next call to
gettimeofday() returns to normal as we don't have this huge nanosecond
offset.

This system is a 2-socket core 2 quad machine (8 cpus), running 32 bit
mode. It's a dell poweredge 1950. The kernel selects the TSC as the
clock source, having determined that the tsc runs synchronously on
this system. Switching the systems to use a different time source
seems to make the problem go away (which is fine for us, but we'd like
to get this fixed properly upstream).

We've also seen this behaviour with a synthetic test program (which
just runs 4 threads all calling gettimeofday() in a loop as fast as
possible and testing that it doesn't jump) on an older machine, a dell
poweredge SC1425 with two p4 hyperthreaded xeons.

Can anyone advise on what's going wrong here? I can't find much in the
way of documentation on whether the TSC is guaranteed to be
monotonically increasing on intel systems. Should the code choose not
to use the TSC? Or should the TSC reading code ensure that the
returned values are monotonic?

Is there any more information that would be useful? I'll be on a plane
for most of tomorrow, so might be a little slow responding.

Thanks,

Mike
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: gettimeofday() jumping into the future [ In reply to ]

gbritton at doomcom

Aug 23, 2007, 4:36 AM

Post #2 of 23 (40468 views)

On Thu, Aug 23, 2007 at 01:08:27PM +0200, Michael Smith wrote:
> Hi,
>
> We've been seeing some strange behaviour on some of our applications
> recently. I've tracked this down to gettimeofday() returning spurious
> values occasionally.
>
> Specifically, gettimeofday() will suddenly, for a single call, return
> a value about 4398 seconds (~1 hour 13 minutes) in the future. The
> following call goes back to a normal value.

I have seen this as well (on a 2.6.20.4 kernel). The value returned was
always identical each time the glitch occured (just a little over 4398
seconds). I saw it watching packet receive timestamps and on the system in
question, it would generally hit this problem around once a minute. When
moving forward to a 2.6.21 kernel, the problem seemed to go away (also back
to 2.6.17, unfortunately I didn't have any sample points inbetween).
I didn't have free time to spend bisecting attempting to find when the
behavior started or stopped.

The hardware in this case was an HP Proliant DL380 G5 with two dueal-core
Core2 processors and was using the tsc as timesource.

-- Gerald

> This seems to be occurring when the clock source goes slightly
> backwards for a single call. In
> kernel/time/timekeeping.c:__get_nsec_offset(), we have this:
> cycle_delta = (cycle_now - clock->cycle_last) & clock->mask;
>
> So a small decrease in time here will (this is all unsigned
> arithmetic) give us a very large cycle_delta. cyc2ns() then multiplies
> this by some value, then right shifts by 22. The resulting value (in
> nanoseconds) is approximately 4398 seconds; this gets added on to the
> xtime value, giving us our jump into the future. The next call to
> gettimeofday() returns to normal as we don't have this huge nanosecond
> offset.
>
> This system is a 2-socket core 2 quad machine (8 cpus), running 32 bit
> mode. It's a dell poweredge 1950. The kernel selects the TSC as the
> clock source, having determined that the tsc runs synchronously on
> this system. Switching the systems to use a different time source
> seems to make the problem go away (which is fine for us, but we'd like
> to get this fixed properly upstream).
>
> We've also seen this behaviour with a synthetic test program (which
> just runs 4 threads all calling gettimeofday() in a loop as fast as
> possible and testing that it doesn't jump) on an older machine, a dell
> poweredge SC1425 with two p4 hyperthreaded xeons.
>
> Can anyone advise on what's going wrong here? I can't find much in the
> way of documentation on whether the TSC is guaranteed to be
> monotonically increasing on intel systems. Should the code choose not
> to use the TSC? Or should the TSC reading code ensure that the
> returned values are monotonic?
>
> Is there any more information that would be useful? I'll be on a plane
> for most of tomorrow, so might be a little slow responding.
>
> Thanks,
>
> Mike
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: gettimeofday() jumping into the future [ In reply to ]

peterz at infradead

Aug 23, 2007, 4:47 AM

Post #3 of 23 (40464 views)

[ CCs added ]

On Thu, 2007-08-23 at 13:08 +0200, Michael Smith wrote:
> Hi,
>
> We've been seeing some strange behaviour on some of our applications
> recently. I've tracked this down to gettimeofday() returning spurious
> values occasionally.
>
> Specifically, gettimeofday() will suddenly, for a single call, return
> a value about 4398 seconds (~1 hour 13 minutes) in the future. The
> following call goes back to a normal value.
>
> This seems to be occurring when the clock source goes slightly
> backwards for a single call. In
> kernel/time/timekeeping.c:__get_nsec_offset(), we have this:
> cycle_delta = (cycle_now - clock->cycle_last) & clock->mask;
>
> So a small decrease in time here will (this is all unsigned
> arithmetic) give us a very large cycle_delta. cyc2ns() then multiplies
> this by some value, then right shifts by 22. The resulting value (in
> nanoseconds) is approximately 4398 seconds; this gets added on to the
> xtime value, giving us our jump into the future. The next call to
> gettimeofday() returns to normal as we don't have this huge nanosecond
> offset.
>
> This system is a 2-socket core 2 quad machine (8 cpus), running 32 bit
> mode. It's a dell poweredge 1950. The kernel selects the TSC as the
> clock source, having determined that the tsc runs synchronously on
> this system. Switching the systems to use a different time source
> seems to make the problem go away (which is fine for us, but we'd like
> to get this fixed properly upstream).
>
> We've also seen this behaviour with a synthetic test program (which
> just runs 4 threads all calling gettimeofday() in a loop as fast as
> possible and testing that it doesn't jump) on an older machine, a dell
> poweredge SC1425 with two p4 hyperthreaded xeons.
>
> Can anyone advise on what's going wrong here? I can't find much in the
> way of documentation on whether the TSC is guaranteed to be
> monotonically increasing on intel systems. Should the code choose not
> to use the TSC? Or should the TSC reading code ensure that the
> returned values are monotonic?
>
> Is there any more information that would be useful? I'll be on a plane
> for most of tomorrow, so might be a little slow responding.

The exact version of the kernel you're using might be good thing to
start with :-)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: gettimeofday() jumping into the future [ In reply to ]

Aug 23, 2007, 5:20 AM

Post #4 of 23 (40435 views)

On 8/23/07, Peter Zijlstra <peterz@infradead.org> wrote:
> [ CCs added ]
>
> On Thu, 2007-08-23 at 13:08 +0200, Michael Smith wrote:
> > Hi,
> >
> > We've been seeing some strange behaviour on some of our applications
> > recently. I've tracked this down to gettimeofday() returning spurious
> > values occasionally.
> >
> > Specifically, gettimeofday() will suddenly, for a single call, return
> > a value about 4398 seconds (~1 hour 13 minutes) in the future. The
> > following call goes back to a normal value.
> >
> > This seems to be occurring when the clock source goes slightly
> > backwards for a single call. In
> > kernel/time/timekeeping.c:__get_nsec_offset(), we have this:
> > cycle_delta = (cycle_now - clock->cycle_last) & clock->mask;
> >
> > So a small decrease in time here will (this is all unsigned
> > arithmetic) give us a very large cycle_delta. cyc2ns() then multiplies
> > this by some value, then right shifts by 22. The resulting value (in
> > nanoseconds) is approximately 4398 seconds; this gets added on to the
> > xtime value, giving us our jump into the future. The next call to
> > gettimeofday() returns to normal as we don't have this huge nanosecond
> > offset.
> >
> > This system is a 2-socket core 2 quad machine (8 cpus), running 32 bit
> > mode. It's a dell poweredge 1950. The kernel selects the TSC as the
> > clock source, having determined that the tsc runs synchronously on
> > this system. Switching the systems to use a different time source
> > seems to make the problem go away (which is fine for us, but we'd like
> > to get this fixed properly upstream).
> >
> > We've also seen this behaviour with a synthetic test program (which
> > just runs 4 threads all calling gettimeofday() in a loop as fast as
> > possible and testing that it doesn't jump) on an older machine, a dell
> > poweredge SC1425 with two p4 hyperthreaded xeons.
> >
> > Can anyone advise on what's going wrong here? I can't find much in the
> > way of documentation on whether the TSC is guaranteed to be
> > monotonically increasing on intel systems. Should the code choose not
> > to use the TSC? Or should the TSC reading code ensure that the
> > returned values are monotonic?
> >
> > Is there any more information that would be useful? I'll be on a plane
> > for most of tomorrow, so might be a little slow responding.
>
> The exact version of the kernel you're using might be good thing to
> start with :-)

My apologies. It's always the obvious things one forgets to put in...

This was originally seen on an FC5 system running:
2.6.18-1.2257.fc5smp

We've reproduced it on the same system with a kernel built from
linus's git tree yesterday: 2.6.23-rc3.

Mike
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: gettimeofday() jumping into the future [ In reply to ]

Aug 23, 2007, 6:03 AM

Post #5 of 23 (40444 views)

Gerald Britton wrote:
> On Thu, Aug 23, 2007 at 01:08:27PM +0200, Michael Smith wrote:
>
>> Hi,
>>
>> We've been seeing some strange behaviour on some of our applications
>> recently. I've tracked this down to gettimeofday() returning spurious
>> values occasionally.
>>
>> Specifically, gettimeofday() will suddenly, for a single call, return
>> a value about 4398 seconds (~1 hour 13 minutes) in the future. The
>> following call goes back to a normal value.
>>
>
> I have seen this as well (on a 2.6.20.4 kernel). The value returned was
> always identical each time the glitch occured (just a little over 4398
> seconds). I saw it watching packet receive timestamps and on the system in
> question, it would generally hit this problem around once a minute. When
> moving forward to a 2.6.21 kernel, the problem seemed to go away (also back
> to 2.6.17, unfortunately I didn't have any sample points inbetween).
> I didn't have free time to spend bisecting attempting to find when the
> behavior started or stopped.
>
>

That value, in nanoseconds, is 0x3fffd3a4c00. The next second is
0x40038d51600. Is the wraparound at (0x400 << 32) significant?

--
error compiling committee.c: too many arguments to function

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: gettimeofday() jumping into the future [ In reply to ]

Aug 23, 2007, 11:47 AM

Post #6 of 23 (40433 views)

On Thu, 2007-08-23 at 14:20 +0200, Michael Smith wrote:
> On 8/23/07, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Thu, 2007-08-23 at 13:08 +0200, Michael Smith wrote:
> > > We've been seeing some strange behaviour on some of our applications
> > > recently. I've tracked this down to gettimeofday() returning spurious
> > > values occasionally.
> > >
> > > Specifically, gettimeofday() will suddenly, for a single call, return
> > > a value about 4398 seconds (~1 hour 13 minutes) in the future. The
> > > following call goes back to a normal value.
> > >
> > > This seems to be occurring when the clock source goes slightly
> > > backwards for a single call. In
> > > kernel/time/timekeeping.c:__get_nsec_offset(), we have this:
> > > cycle_delta = (cycle_now - clock->cycle_last) & clock->mask;
> > >
> > > So a small decrease in time here will (this is all unsigned
> > > arithmetic) give us a very large cycle_delta. cyc2ns() then multiplies
> > > this by some value, then right shifts by 22. The resulting value (in
> > > nanoseconds) is approximately 4398 seconds; this gets added on to the
> > > xtime value, giving us our jump into the future. The next call to
> > > gettimeofday() returns to normal as we don't have this huge nanosecond
> > > offset.
> > >
> > > This system is a 2-socket core 2 quad machine (8 cpus), running 32 bit
> > > mode. It's a dell poweredge 1950. The kernel selects the TSC as the
> > > clock source, having determined that the tsc runs synchronously on
> > > this system. Switching the systems to use a different time source
> > > seems to make the problem go away (which is fine for us, but we'd like
> > > to get this fixed properly upstream).

Hmm. That does sound like unsycned TSCs. Normally Intel systems don't
skew unless they are NUMA systems or you're entering low power states.
We try to catch both of those cases, so I'm not sure how you box is
slipping through.

Can you run the following test to verify that the TSCs are skewed?

thanks
-john

Re: gettimeofday() jumping into the future [ In reply to ]

Aug 23, 2007, 1:07 PM

Post #7 of 23 (40429 views)

Gerald Britton wrote:
> I have seen this as well (on a 2.6.20.4 kernel). The value returned was
> always identical each time the glitch occured (just a little over 4398
> seconds).

I would guess an off by 2^32 error is at play.

-hpa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: gettimeofday() jumping into the future [ In reply to ]

Aug 23, 2007, 1:09 PM

Post #8 of 23 (40445 views)

Avi Kivity wrote:
> Gerald Britton wrote:
>> On Thu, Aug 23, 2007 at 01:08:27PM +0200, Michael Smith wrote:
>>
>>> Hi,
>>>
>>> We've been seeing some strange behaviour on some of our applications
>>> recently. I've tracked this down to gettimeofday() returning spurious
>>> values occasionally.
>>>
>>> Specifically, gettimeofday() will suddenly, for a single call, return
>>> a value about 4398 seconds (~1 hour 13 minutes) in the future. The
>>> following call goes back to a normal value.
>>>
>>
>> I have seen this as well (on a 2.6.20.4 kernel). The value returned was
>> always identical each time the glitch occured (just a little over 4398
>> seconds). I saw it watching packet receive timestamps and on the
>> system in
>> question, it would generally hit this problem around once a minute. When
>> moving forward to a 2.6.21 kernel, the problem seemed to go away (also
>> back
>> to 2.6.17, unfortunately I didn't have any sample points inbetween).
>> I didn't have free time to spend bisecting attempting to find when the
>> behavior started or stopped.
>
> That value, in nanoseconds, is 0x3fffd3a4c00. The next second is
> 0x40038d51600. Is the wraparound at (0x400 << 32) significant?
>

This would be consistent with an off-by-2^32 error. In particular, if
either a CPU bug or a code bug could at some point produce 0x400fxxxxxxx
in between then that would produce exactly the observed characteristics.

-hpa

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: gettimeofday() jumping into the future [ In reply to ]

Aug 25, 2007, 9:44 AM

Post #9 of 23 (40430 views)

>
> Hmm. That does sound like unsycned TSCs. Normally Intel systems don't
> skew unless they are NUMA systems or you're entering low power states.
> We try to catch both of those cases, so I'm not sure how you box is
> slipping through.
>
> Can you run the following test to verify that the TSCs are skewed?

I ran this for the past two days, multiple copies to put some load on
the system (the initial problem with gettimeofday() was easier to
reproduce under load). Nothing.

So, I guess that rules out unsynced TSCs as the cause? Or perhaps it
only happens in other conditions, for some reason. I'm on holidays
right now, but my coworkers should be able to continue investigating
other possible causes.

Thanks for the advice/test program.

Mike
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: gettimeofday() jumping into the future [ In reply to ]

Mar 30, 2008, 2:17 PM

Post #10 of 23 (40350 views)

On Thu, 23 Aug 2007, Michael Smith wrote:

> We've been seeing some strange behaviour on some of our applications
> recently. I've tracked this down to gettimeofday() returning spurious
> values occasionally.
>
> Specifically, gettimeofday() will suddenly, for a single call, return
> a value about 4398 seconds (~1 hour 13 minutes) in the future. The
> following call goes back to a normal value.

I have also seen this.

> This seems to be occurring when the clock source goes slightly
> backwards for a single call. In
> kernel/time/timekeeping.c:__get_nsec_offset(), we have this:
> cycle_delta = (cycle_now - clock->cycle_last) & clock->mask;
>
> So a small decrease in time here will (this is all unsigned
> arithmetic) give us a very large cycle_delta. cyc2ns() then multiplies
> this by some value, then right shifts by 22. The resulting value (in
> nanoseconds) is approximately 4398 seconds; this gets added on to the
> xtime value, giving us our jump into the future. The next call to
> gettimeofday() returns to normal as we don't have this huge nanosecond
> offset.

Indeed. I don't know where the suggestion of off by 2^32us came in
later in this thread. As you've already pointed out, it's off by
2^42ns.

I've no idea why the TSC might go backwards, but perhaps we should not
break horribly if it does. How about treating it as zero?

diff -urN linux-2.6.24.4/include/linux/clocksource.h linux/include/linux/clocksource.h
--- linux-2.6.24.4/include/linux/clocksource.h 2008-03-24 18:49:18.000000000 +0000
+++ linux/include/linux/clocksource.h 2008-03-28 11:15:02.000000000 +0000
@@ -176,7 +176,7 @@
*
* XXX - This could use some mult_lxl_ll() asm optimization
*/
-static inline s64 cyc2ns(struct clocksource *cs, cycle_t cycles)
+static inline u64 cyc2ns(struct clocksource *cs, cycle_t cycles)
{
u64 ret = (u64)cycles;
ret = (ret * cs->mult) >> cs->shift;
diff -urN linux-2.6.24.4/kernel/time/timekeeping.c linux/kernel/time/timekeeping.c
--- linux-2.6.24.4/kernel/time/timekeeping.c 2008-03-24 18:49:18.000000000 +0000
+++ linux/kernel/time/timekeeping.c 2008-03-28 11:15:01.000000000 +0000
@@ -64,14 +64,17 @@
* called. Returns the number of nanoseconds since the
* last call to update_wall_time() (adjusted by NTP scaling)
*/
-static inline s64 __get_nsec_offset(void)
+static inline u64 __get_nsec_offset(void)
{
cycle_t cycle_now, cycle_delta;
- s64 ns_offset;
+ u64 ns_offset;

/* read clocksource: */
cycle_now = clocksource_read(clock);

+ if (cycle_now < clock->cycle_last)
+ return 0;
+
/* calculate the delta since the last update_wall_time: */
cycle_delta = (cycle_now - clock->cycle_last) & clock->mask;

@@ -91,7 +94,7 @@
static inline void __get_realtime_clock_ts(struct timespec *ts)
{
unsigned long seq;
- s64 nsecs;
+ u64 nsecs;

do {
seq = read_seqbegin(&xtime_lock);
@@ -207,7 +210,7 @@
}
#else
static inline void change_clocksource(void) { }
-static inline s64 __get_nsec_offset(void) { return 0; }
+static inline u64 __get_nsec_offset(void) { return 0; }
#endif

/**
@@ -272,7 +275,7 @@
/* time in seconds when suspend began */
static unsigned long timekeeping_suspend_time;
/* xtime offset when we went into suspend */
-static s64 timekeeping_suspend_nsecs;
+static u64 timekeeping_suspend_nsecs;

/**
* timekeeping_resume - Resumes the generic timekeeping subsystem.

Alternatively, we could try to make it work and have gettimeofday jump
back slightly in this case, but I don't like this as much, because I
think it's more complicated, slower and unnecessary.

diff -urN linux-2.6.24.4/arch/x86/vdso/vclock_gettime.c linux/arch/x86/vdso/vclock_gettime.c
--- linux-2.6.24.4/arch/x86/vdso/vclock_gettime.c 2008-03-24 18:49:18.000000000 +0000
+++ linux/arch/x86/vdso/vclock_gettime.c 2008-03-28 12:15:24.000000000 +0000
@@ -43,7 +43,8 @@

static noinline int do_realtime(struct timespec *ts)
{
- unsigned long seq, ns;
+ unsigned long seq;
+ long ns;
do {
seq = read_seqbegin(&gtod->lock);
ts->tv_sec = gtod->wall_time_sec;
diff -urN linux-2.6.24.4/include/linux/clocksource.h linux/include/linux/clocksource.h
--- linux-2.6.24.4/include/linux/clocksource.h 2008-03-24 18:49:18.000000000 +0000
+++ linux/include/linux/clocksource.h 2008-03-28 11:59:05.000000000 +0000
@@ -176,11 +176,9 @@
*
* XXX - This could use some mult_lxl_ll() asm optimization
*/
-static inline s64 cyc2ns(struct clocksource *cs, cycle_t cycles)
+static inline s64 cyc2ns(struct clocksource *cs, s64 cycles)
{
- u64 ret = (u64)cycles;
- ret = (ret * cs->mult) >> cs->shift;
- return ret;
+ return (cycles * cs->mult) >> cs->shift;
}

/**
diff -urN linux-2.6.24.4/include/linux/time.h linux/include/linux/time.h
--- linux-2.6.24.4/include/linux/time.h 2008-03-24 18:49:18.000000000 +0000
+++ linux/include/linux/time.h 2008-03-28 11:59:06.000000000 +0000
@@ -169,13 +169,17 @@
* @a: pointer to timespec to be incremented
* @ns: unsigned nanoseconds value to be added
*/
-static inline void timespec_add_ns(struct timespec *a, u64 ns)
+static inline void timespec_add_ns(struct timespec *a, s64 ns)
{
ns += a->tv_nsec;
while(unlikely(ns >= NSEC_PER_SEC)) {
ns -= NSEC_PER_SEC;
a->tv_sec++;
}
+ while(unlikely(ns < 0)) {
+ ns += NSEC_PER_SEC;
+ a->tv_sec--;
+ }
a->tv_nsec = ns;
}
#endif /* __KERNEL__ */
diff -urN linux-2.6.24.4/kernel/time/timekeeping.c linux/kernel/time/timekeeping.c
--- linux-2.6.24.4/kernel/time/timekeeping.c 2008-03-24 18:49:18.000000000 +0000
+++ linux/kernel/time/timekeeping.c 2008-03-28 12:31:48.000000000 +0000
@@ -47,7 +47,7 @@
static unsigned long total_sleep_time; /* seconds */

static struct timespec xtime_cache __attribute__ ((aligned (16)));
-static inline void update_xtime_cache(u64 nsec)
+static inline void update_xtime_cache(s64 nsec)
{
xtime_cache = xtime;
timespec_add_ns(&xtime_cache, nsec);
@@ -66,8 +66,8 @@
*/
static inline s64 __get_nsec_offset(void)
{
- cycle_t cycle_now, cycle_delta;
- s64 ns_offset;
+ cycle_t cycle_now;
+ s64 ns_offset, cycle_delta;

/* read clocksource: */
cycle_now = clocksource_read(clock);
@@ -75,6 +75,12 @@
/* calculate the delta since the last update_wall_time: */
cycle_delta = (cycle_now - clock->cycle_last) & clock->mask;

+ /* Sign-extend. */
+ if (cycle_now < clock->cycle_last)
+ {
+ cycle_delta |= ~clock->mask;
+ }
+
/* convert to nanoseconds: */
ns_offset = cyc2ns(clock, cycle_delta);

@@ -182,7 +188,7 @@
{
struct clocksource *new;
cycle_t now;
- u64 nsec;
+ s64 nsec;

new = clocksource_get_next();

@@ -455,7 +461,17 @@
return;

#ifdef CONFIG_GENERIC_TIME
- offset = (clocksource_read(clock) - clock->cycle_last) & clock->mask;
+ offset = clocksource_read(clock) - clock->cycle_last;
+
+ /* Mask but preserving sign. */
+ if (offset < 0)
+ {
+ offset = (offset & clock->mask) | ~clock->mask;
+ }
+ else
+ {
+ offset &= clock->mask;
+ }
#else
offset = clock->cycle_interval;
#endif
@@ -464,7 +480,7 @@
/* normally this loop will run just once, however in the
* case of lost or late ticks, it will accumulate correctly.
*/
- while (offset >= clock->cycle_interval) {
+ while (offset >= (s64)clock->cycle_interval) {
/* accumulate one interval */
clock->xtime_nsec += clock->xtime_interval;
clock->cycle_last += clock->cycle_interval;

--
Tim
Quidquid latine dictum sit, altum viditur.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: gettimeofday() jumping into the future [ In reply to ]

andi at firstfloor

Mar 31, 2008, 12:18 AM

Post #11 of 23 (40263 views)

Tim Ricketts <tr@earth.li> writes:

[.adding right people to cc just in case it slipped past their filters,
keeping enough quote for context]

> On Thu, 23 Aug 2007, Michael Smith wrote:
>
> > We've been seeing some strange behaviour on some of our applications
> > recently. I've tracked this down to gettimeofday() returning spurious
> > values occasionally.
> >
> > Specifically, gettimeofday() will suddenly, for a single call, return
> > a value about 4398 seconds (~1 hour 13 minutes) in the future. The
> > following call goes back to a normal value.
>
> I have also seen this.
>
> > This seems to be occurring when the clock source goes slightly
> > backwards for a single call. In
> > kernel/time/timekeeping.c:__get_nsec_offset(), we have this:
> > cycle_delta = (cycle_now - clock->cycle_last) & clock->mask;
> >
> > So a small decrease in time here will (this is all unsigned
> > arithmetic) give us a very large cycle_delta. cyc2ns() then multiplies
> > this by some value, then right shifts by 22. The resulting value (in
> > nanoseconds) is approximately 4398 seconds; this gets added on to the
> > xtime value, giving us our jump into the future. The next call to
> > gettimeofday() returns to normal as we don't have this huge nanosecond
> > offset.
>
> Indeed. I don't know where the suggestion of off by 2^32us came in
> later in this thread. As you've already pointed out, it's off by
> 2^42ns.

[...]
> +static inline u64 __get_nsec_offset(void)
> {
> cycle_t cycle_now, cycle_delta;
> - s64 ns_offset;
> + u64 ns_offset;
>
> /* read clocksource: */
> cycle_now = clocksource_read(clock);
>
> + if (cycle_now < clock->cycle_last)
> + return 0;

The old x86-64 pre-clocksource gettimeofday() implementation had a similar
check. It came from painful experience.

-Andi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: gettimeofday() jumping into the future [ In reply to ]

tglx at linutronix

Mar 31, 2008, 1:55 AM

Post #12 of 23 (40282 views)

On Sun, 30 Mar 2008, Tim Ricketts wrote:
> On Thu, 23 Aug 2007, Michael Smith wrote:
>
> > We've been seeing some strange behaviour on some of our applications
> > recently. I've tracked this down to gettimeofday() returning spurious
> > values occasionally.
> >
> > Specifically, gettimeofday() will suddenly, for a single call, return
> > a value about 4398 seconds (~1 hour 13 minutes) in the future. The
> > following call goes back to a normal value.
>
> I have also seen this.
>
> > This seems to be occurring when the clock source goes slightly
> > backwards for a single call. In
> > kernel/time/timekeeping.c:__get_nsec_offset(), we have this:
> > cycle_delta = (cycle_now - clock->cycle_last) & clock->mask;
> >
> > So a small decrease in time here will (this is all unsigned
> > arithmetic) give us a very large cycle_delta. cyc2ns() then multiplies
> > this by some value, then right shifts by 22. The resulting value (in
> > nanoseconds) is approximately 4398 seconds; this gets added on to the
> > xtime value, giving us our jump into the future. The next call to
> > gettimeofday() returns to normal as we don't have this huge nanosecond
> > offset.
>
> Indeed. I don't know where the suggestion of off by 2^32us came in
> later in this thread. As you've already pointed out, it's off by
> 2^42ns.
>
> I've no idea why the TSC might go backwards, but perhaps we should not
> break horribly if it does. How about treating it as zero?

> + if (cycle_now < clock->cycle_last)
> + return 0;
> +

No, this breaks wrapping clocksources e.g. pmtimer. We need a
different sanity check for that TSC crap.

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: gettimeofday() jumping into the future [ In reply to ]

Mar 31, 2008, 9:03 AM

Post #13 of 23 (40270 views)

On Mon, 2008-03-31 at 10:55 +0200, Thomas Gleixner wrote:
>
> > + if (cycle_now < clock->cycle_last)
> > + return 0;
> > +
>
> No, this breaks wrapping clocksources e.g. pmtimer. We need a
> different sanity check for that TSC crap.

Yea. If you're going to do that sort of logic (which requires a costly
cmpxchg as well), it should be done in the read() function of the
clocksource.

thanks
-john

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: gettimeofday() jumping into the future [ In reply to ]

mihai.dontu at gmail

Apr 1, 2008, 9:26 PM

Post #14 of 23 (40237 views)

On Monday 31 March 2008, Tim Ricketts wrote:
> On Thu, 23 Aug 2007, Michael Smith wrote:
>
> > We've been seeing some strange behaviour on some of our applications
> > recently. I've tracked this down to gettimeofday() returning spurious
> > values occasionally.
> >
> > Specifically, gettimeofday() will suddenly, for a single call, return
> > a value about 4398 seconds (~1 hour 13 minutes) in the future. The
> > following call goes back to a normal value.
>
> I have also seen this.

I don't know if gettimeofday() and dmesg's logging facility use the same
time source, but I'm seeing jumps into the past:

[ 11.010300] ACPI: SSDT 3F6D3F27, 01C6 (r1 PmRef Cpu0Cst 3001 INTL 20050624)
[ 9.344283] ACPI: CPU0 (power states: C1[C1] C2[C2] C3[C3])
[ 9.344528] ACPI: Processor [CPU0] (supports 8 throttling states)
[ 9.344989] ACPI: SSDT 3F6D4374, 00C4 (r1 PmRef Cpu1Ist 3000 INTL 20050624)
[ 9.345458] ACPI: SSDT 3F6D40ED, 0085 (r1 PmRef Cpu1Cst 3000 INTL 20050624)
[ 11.013089] ACPI: CPU1 (power states: C1[C1] C2[C2] C3[C3])

and the future:

[ 11.081332] usb 1-2: new high speed USB device using ehci_hcd and address 2
[ 12.748489] usb usb4: configuration #1 chosen from 1 choice
[ 12.748653] hub 4-0:1.0: USB hub found
[ 12.748772] hub 4-0:1.0: 2 ports detected
[ 12.849090] ACPI: PCI Interrupt 0000:00:1d.3[D] -> GSI 23 (level, low) -> IRQ 23
[ 12.849338] PCI: Setting latency timer of device 0000:00:1d.3 to 64
[ 12.849341] uhci_hcd 0000:00:1d.3: UHCI Host Controller
[ 12.849505] uhci_hcd 0000:00:1d.3: new USB bus registered, assigned bus number 5
[ 12.849734] uhci_hcd 0000:00:1d.3: irq 23, io base 0x0000bf20
[ 12.850020] usb usb5: configuration #1 chosen from 1 choice
[ 12.850185] hub 5-0:1.0: USB hub found
[ 12.850311] hub 5-0:1.0: 2 ports detected
[ 11.194688] usb 1-2: configuration #1 chosen from 1 choice

I have a 2.6.24, 64bit, SMP kernel (Intel Core 2 Duo).

--
Mihai DonÈ›u
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: gettimeofday() jumping into the future [ In reply to ]

mihai.dontu at gmail

Apr 1, 2008, 9:27 PM

Post #15 of 23 (40256 views)

On Wednesday 02 April 2008, Mihai DonÈ›u wrote:
> On Monday 31 March 2008, Tim Ricketts wrote:
> > On Thu, 23 Aug 2007, Michael Smith wrote:
> >
> > > We've been seeing some strange behaviour on some of our applications
> > > recently. I've tracked this down to gettimeofday() returning spurious
> > > values occasionally.
> > >
> > > Specifically, gettimeofday() will suddenly, for a single call, return
> > > a value about 4398 seconds (~1 hour 13 minutes) in the future. The
> > > following call goes back to a normal value.
> >
> > I have also seen this.
>
> I don't know if gettimeofday() and dmesg's logging facility use the same
> time source, but I'm seeing jumps into the past:
>
> [ 11.010300] ACPI: SSDT 3F6D3F27, 01C6 (r1 PmRef Cpu0Cst 3001 INTL 20050624)
> [ 9.344283] ACPI: CPU0 (power states: C1[C1] C2[C2] C3[C3])
> [ 9.344528] ACPI: Processor [CPU0] (supports 8 throttling states)
> [ 9.344989] ACPI: SSDT 3F6D4374, 00C4 (r1 PmRef Cpu1Ist 3000 INTL 20050624)
> [ 9.345458] ACPI: SSDT 3F6D40ED, 0085 (r1 PmRef Cpu1Cst 3000 INTL 20050624)
> [ 11.013089] ACPI: CPU1 (power states: C1[C1] C2[C2] C3[C3])
>
> and the future:
>
> [ 11.081332] usb 1-2: new high speed USB device using ehci_hcd and address 2
> [ 12.748489] usb usb4: configuration #1 chosen from 1 choice
> [ 12.748653] hub 4-0:1.0: USB hub found
> [ 12.748772] hub 4-0:1.0: 2 ports detected
> [ 12.849090] ACPI: PCI Interrupt 0000:00:1d.3[D] -> GSI 23 (level, low) -> IRQ 23
> [ 12.849338] PCI: Setting latency timer of device 0000:00:1d.3 to 64
> [ 12.849341] uhci_hcd 0000:00:1d.3: UHCI Host Controller
> [ 12.849505] uhci_hcd 0000:00:1d.3: new USB bus registered, assigned bus number 5
> [ 12.849734] uhci_hcd 0000:00:1d.3: irq 23, io base 0x0000bf20
> [ 12.850020] usb usb5: configuration #1 chosen from 1 choice
> [ 12.850185] hub 5-0:1.0: USB hub found
> [ 12.850311] hub 5-0:1.0: 2 ports detected
> [ 11.194688] usb 1-2: configuration #1 chosen from 1 choice
>
> I have a 2.6.24, 64bit, SMP kernel (Intel Core 2 Duo).

The full dmesg, attached.

--
Mihai DonÈ›u

Re: gettimeofday() jumping into the future [ In reply to ]

tglx at linutronix

Apr 2, 2008, 4:22 AM

Post #16 of 23 (40263 views)

On Mon, 31 Mar 2008, John Stultz wrote:
>
> On Mon, 2008-03-31 at 10:55 +0200, Thomas Gleixner wrote:
> >
> > > + if (cycle_now < clock->cycle_last)
> > > + return 0;
> > > +
> >
> > No, this breaks wrapping clocksources e.g. pmtimer. We need a
> > different sanity check for that TSC crap.
>
> Yea. If you're going to do that sort of logic (which requires a costly
> cmpxchg as well), it should be done in the read() function of the
> clocksource.

We can get away without a cmpxchg(). See patch below.

Tim, Michael, does this solve your problems ?

Thanks,

tglx

----------->

Subject: x86: tsc prevent time going backwards
From: Thomas Gleixner <tglx@linutronix.de>
Date: Tue, 01 Apr 2008 19:45:18 +0200

We already catch most of the TSC problems by sanity checks, but there
is a subtle bug which has been in the code for ever. This can cause
time jumps in the range of hours.

This was reported in:
http://lkml.org/lkml/2007/8/23/96
and
http://lkml.org/lkml/2008/3/31/23

I was able to reproduce the problem with a gettimeofday loop test on a
dual core and a quad core machine which both have sychronized
TSCs. The TSCs seems not to be perfectly in sync though, but the
kernel is not able to detect the slight delta in the bootup sync
check. There exists an extremly small window where this delta can be
observed with a real big time jump. So far I was only able to
reproduce this with the vsyscall gettimeofday implementation, but in
theory this might be observable with the syscall based version as
well.

CPU 0 updates the clock source variables under xtime/vyscall lock and
CPU1, where the TSC is slighty behind CPU0, is reading the time right
after the seqlock was unlocked.

The clocksource reference data was updated with the TSC from CPU0 and
the value which is read from TSC on CPU1 is less than the reference
data. This results in a huge delta value due to the unsigned
subtraction of the TSC value and the reference value. This algorithm
can not be changed due to the support of wrapping clock sources like
pm timer.

The huge delta is converted to nanoseconds and added to xtime, which
is then observable by the caller. The next gettimeofday call on CPU1
will show the correct time again as now the TSC has advanced above the
reference value.

To prevent this TSC specific wreckage we need to compare the TSC value
against the reference value and return the latter when it is larger
than the actual TSC value.

I pondered to mark the TSC unstable when the readout is smaller than
the reference value, but this would render an otherwise good and fast
clocksource unusable without a real good reason.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
arch/x86/kernel/tsc_32.c | 15 ++++++++++++++-
arch/x86/kernel/tsc_64.c | 23 ++++++++++++++++++++---
2 files changed, 34 insertions(+), 4 deletions(-)

Index: linux-2.6/arch/x86/kernel/tsc_32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/tsc_32.c
+++ linux-2.6/arch/x86/kernel/tsc_32.c
@@ -287,14 +287,27 @@ core_initcall(cpufreq_tsc);
/* clock source code */

static unsigned long current_tsc_khz = 0;
+static struct clocksource clocksource_tsc;

+/*
+ * We compare the TSC to the cycle_last value in the clocksource
+ * structure to avoid a nasty time-warp issue. This can be observed in
+ * a very small window right after one CPU updated cycle_last under
+ * xtime lock and the other CPU reads a TSC value which is smaller
+ * than the cycle_last reference value due to a TSC which is slighty
+ * behind. This delta is nowhere else observable, but in that case it
+ * results in a forward time jump in the range of hours due to the
+ * unsigned delta calculation of the time keeping core code, which is
+ * necessary to support wrapping clocksources like pm timer.
+ */
static cycle_t read_tsc(void)
{
cycle_t ret;

rdtscll(ret);

- return ret;
+ return ret >= clocksource_tsc.cycle_last ?
+ ret : clocksource_tsc.cycle_last;
}

static struct clocksource clocksource_tsc = {
Index: linux-2.6/arch/x86/kernel/tsc_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/tsc_64.c
+++ linux-2.6/arch/x86/kernel/tsc_64.c
@@ -11,6 +11,7 @@
#include <asm/hpet.h>
#include <asm/timex.h>
#include <asm/timer.h>
+#include <asm/vgtod.h>

static int notsc __initdata = 0;

@@ -290,18 +291,34 @@ int __init notsc_setup(char *s)

__setup("notsc", notsc_setup);

+static struct clocksource clocksource_tsc;

-/* clock source code: */
+/*
+ * We compare the TSC to the cycle_last value in the clocksource
+ * structure to avoid a nasty time-warp. This can be observed in a
+ * very small window right after one CPU updated cycle_last under
+ * xtime/vsyscall_gtod lock and the other CPU reads a TSC value which
+ * is smaller than the cycle_last reference value due to a TSC which
+ * is slighty behind. This delta is nowhere else observable, but in
+ * that case it results in a forward time jump in the range of hours
+ * due to the unsigned delta calculation of the time keeping core
+ * code, which is necessary to support wrapping clocksources like pm
+ * timer.
+ */
static cycle_t read_tsc(void)
{
cycle_t ret = (cycle_t)get_cycles();
- return ret;
+
+ return ret >= clocksource_tsc.cycle_last ?
+ ret : clocksource_tsc.cycle_last;
}

static cycle_t __vsyscall_fn vread_tsc(void)
{
cycle_t ret = (cycle_t)vget_cycles();
- return ret;
+
+ return ret >= __vsyscall_gtod_data.clock.cycle_last ?
+ ret : __vsyscall_gtod_data.clock.cycle_last;
}

static struct clocksource clocksource_tsc = {

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: gettimeofday() jumping into the future [ In reply to ]

Apr 2, 2008, 3:40 PM

Post #17 of 23 (40271 views)

I'm pleased to see this change. I was dumbfounded by the approach
that was previously taken: run a brief experiment at bootup to see if
we can observe the TSCs being out of sync, and if we can't, conclude
that they are so perfectly in sync that no future observation will ever
see them as out of sync and so it's OK to write code that will blow up
if that conclusion is wrong. To me, this kind of thing is no better
than removing locks because in a brief experiment we couldn't provoke
the race condition that they guard against.

> From: Thomas Gleixner <tglx@linutronix.de>
> Date: Wed, Apr 2, 2008 at 4:22 AM
> Subject: Re: gettimeofday() jumping into the future
> To: John Stultz <johnstul@us.ibm.com>
> Cc: Tim Ricketts <tr@earth.li>, Michael Smith <msmith@xiph.org>, LKML <
> linux-kernel@vger.kernel.org>, Andy Wingo <wingo@fluendo.com>, Ingo Molnar <
> mingo@elte.hu>
>
>
> On Mon, 31 Mar 2008, John Stultz wrote:
> >
> > On Mon, 2008-03-31 at 10:55 +0200, Thomas Gleixner wrote:
> > >
> > > > + if (cycle_now < clock->cycle_last)
> > > > + return 0;
> > > > +
> > >
> > > No, this breaks wrapping clocksources e.g. pmtimer. We need a
> > > different sanity check for that TSC crap.
> >
> > Yea. If you're going to do that sort of logic (which requires a costly
> > cmpxchg as well), it should be done in the read() function of the
> > clocksource.
>
> We can get away without a cmpxchg(). See patch below.
>
> Tim, Michael, does this solve your problems ?
>
> Thanks,
>
> tglx
>
> ----------->
>
> Subject: x86: tsc prevent time going backwards
> From: Thomas Gleixner <tglx@linutronix.de>
> Date: Tue, 01 Apr 2008 19:45:18 +0200
>
> We already catch most of the TSC problems by sanity checks, but there
> is a subtle bug which has been in the code for ever. This can cause
> time jumps in the range of hours.
>
> This was reported in:
> http://lkml.org/lkml/2007/8/23/96
> and
> http://lkml.org/lkml/2008/3/31/23
>
> I was able to reproduce the problem with a gettimeofday loop test on a
> dual core and a quad core machine which both have sychronized
> TSCs. The TSCs seems not to be perfectly in sync though, but the
> kernel is not able to detect the slight delta in the bootup sync
> check. There exists an extremly small window where this delta can be
> observed with a real big time jump. So far I was only able to
> reproduce this with the vsyscall gettimeofday implementation, but in
> theory this might be observable with the syscall based version as
> well.
>
> CPU 0 updates the clock source variables under xtime/vyscall lock and
> CPU1, where the TSC is slighty behind CPU0, is reading the time right
> after the seqlock was unlocked.
>
> The clocksource reference data was updated with the TSC from CPU0 and
> the value which is read from TSC on CPU1 is less than the reference
> data. This results in a huge delta value due to the unsigned
> subtraction of the TSC value and the reference value. This algorithm
> can not be changed due to the support of wrapping clock sources like
> pm timer.
>
> The huge delta is converted to nanoseconds and added to xtime, which
> is then observable by the caller. The next gettimeofday call on CPU1
> will show the correct time again as now the TSC has advanced above the
> reference value.
>
> To prevent this TSC specific wreckage we need to compare the TSC value
> against the reference value and return the latter when it is larger
> than the actual TSC value.
>
> I pondered to mark the TSC unstable when the readout is smaller than
> the reference value, but this would render an otherwise good and fast
> clocksource unusable without a real good reason.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>
> ---
> arch/x86/kernel/tsc_32.c | 15 ++++++++++++++-
> arch/x86/kernel/tsc_64.c | 23 ++++++++++++++++++++---
> 2 files changed, 34 insertions(+), 4 deletions(-)
>
> Index: linux-2.6/arch/x86/kernel/tsc_32.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/tsc_32.c
> +++ linux-2.6/arch/x86/kernel/tsc_32.c
> @@ -287,14 +287,27 @@ core_initcall(cpufreq_tsc);
> /* clock source code */
>
> static unsigned long current_tsc_khz = 0;
> +static struct clocksource clocksource_tsc;
>
> +/*
> + * We compare the TSC to the cycle_last value in the clocksource
> + * structure to avoid a nasty time-warp issue. This can be observed in
> + * a very small window right after one CPU updated cycle_last under
> + * xtime lock and the other CPU reads a TSC value which is smaller
> + * than the cycle_last reference value due to a TSC which is slighty
> + * behind. This delta is nowhere else observable, but in that case it
> + * results in a forward time jump in the range of hours due to the
> + * unsigned delta calculation of the time keeping core code, which is
> + * necessary to support wrapping clocksources like pm timer.
> + */
> static cycle_t read_tsc(void)
> {
> cycle_t ret;
>
> rdtscll(ret);
>
> - return ret;
> + return ret >= clocksource_tsc.cycle_last ?
> + ret : clocksource_tsc.cycle_last;
> }
>
> static struct clocksource clocksource_tsc = {
> Index: linux-2.6/arch/x86/kernel/tsc_64.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/tsc_64.c
> +++ linux-2.6/arch/x86/kernel/tsc_64.c
> @@ -11,6 +11,7 @@
> #include <asm/hpet.h>
> #include <asm/timex.h>
> #include <asm/timer.h>
> +#include <asm/vgtod.h>
>
> static int notsc __initdata = 0;
>
> @@ -290,18 +291,34 @@ int __init notsc_setup(char *s)
>
> __setup("notsc", notsc_setup);
>
> +static struct clocksource clocksource_tsc;
>
> -/* clock source code: */
> +/*
> + * We compare the TSC to the cycle_last value in the clocksource
> + * structure to avoid a nasty time-warp. This can be observed in a
> + * very small window right after one CPU updated cycle_last under
> + * xtime/vsyscall_gtod lock and the other CPU reads a TSC value which
> + * is smaller than the cycle_last reference value due to a TSC which
> + * is slighty behind. This delta is nowhere else observable, but in
> + * that case it results in a forward time jump in the range of hours
> + * due to the unsigned delta calculation of the time keeping core
> + * code, which is necessary to support wrapping clocksources like pm
> + * timer.
> + */
> static cycle_t read_tsc(void)
> {
> cycle_t ret = (cycle_t)get_cycles();
> - return ret;
> +
> + return ret >= clocksource_tsc.cycle_last ?
> + ret : clocksource_tsc.cycle_last;
> }
>
> static cycle_t __vsyscall_fn vread_tsc(void)
> {
> cycle_t ret = (cycle_t)vget_cycles();
> - return ret;
> +
> + return ret >= __vsyscall_gtod_data.clock.cycle_last ?
> + ret : __vsyscall_gtod_data.clock.cycle_last;
> }
>
> static struct clocksource clocksource_tsc = {
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

--
Tim Mann work: mann@vmware.com home: tim@tim-mann.org
http://www.vmware.com http://tim-mann.org
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: gettimeofday() jumping into the future [ In reply to ]

fzu at wemgehoertderstaat

Apr 2, 2008, 4:57 PM

Post #18 of 23 (40280 views)

Am Mittwoch, 2. April 2008 schrieb Thomas Gleixner:
>
> Subject: x86: tsc prevent time going backwards
> From: Thomas Gleixner <tglx@linutronix.de>
> Date: Tue, 01 Apr 2008 19:45:18 +0200
>
> We already catch most of the TSC problems by sanity checks, but there
> is a subtle bug which has been in the code for ever. This can cause
> time jumps in the range of hours.
>
> This was reported in:
> http://lkml.org/lkml/2007/8/23/96
> and
> http://lkml.org/lkml/2008/3/31/23
>
> I was able to reproduce the problem with a gettimeofday loop test on a
> dual core and a quad core machine which both have sychronized
> TSCs. The TSCs seems not to be perfectly in sync though, but the
> kernel is not able to detect the slight delta in the bootup sync
> check. There exists an extremly small window where this delta can be
> observed with a real big time jump. So far I was only able to
> reproduce this with the vsyscall gettimeofday implementation, but in
> theory this might be observable with the syscall based version as
> well.
>
> CPU 0 updates the clock source variables under xtime/vyscall lock and
> CPU1, where the TSC is slighty behind CPU0, is reading the time right
> after the seqlock was unlocked.
>
> The clocksource reference data was updated with the TSC from CPU0 and
> the value which is read from TSC on CPU1 is less than the reference
> data. This results in a huge delta value due to the unsigned
> subtraction of the TSC value and the reference value. This algorithm
> can not be changed due to the support of wrapping clock sources like
> pm timer.
>
> The huge delta is converted to nanoseconds and added to xtime, which
> is then observable by the caller. The next gettimeofday call on CPU1
> will show the correct time again as now the TSC has advanced above the
> reference value.
>
> To prevent this TSC specific wreckage we need to compare the TSC value
> against the reference value and return the latter when it is larger
> than the actual TSC value.

Last paragraph IMO should be:
To prevent this TSC specific wreckage we need to substract the
reference value from the TSC value, interpret the result as
signed. If the interpreted result is negative, return the reference
value, else the TSC Value.
Your patch misses the corner case where TSC value is < reference _and_ valid
at TSC wrap around.

The access to the reference value needs a (the xtime ?) lock on 32bit, no?

Thanks,
Karsten
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: gettimeofday() jumping into the future [ In reply to ]

tglx at linutronix

Apr 2, 2008, 11:28 PM

Post #19 of 23 (40277 views)

On Thu, 3 Apr 2008, Karsten Wiese wrote:
> > To prevent this TSC specific wreckage we need to compare the TSC value
> > against the reference value and return the latter when it is larger
> > than the actual TSC value.
>
> Last paragraph IMO should be:
> To prevent this TSC specific wreckage we need to substract the
> reference value from the TSC value, interpret the result as
> signed. If the interpreted result is negative, return the reference
> value, else the TSC Value.
> Your patch misses the corner case where TSC value is < reference _and_ valid
> at TSC wrap around.

In about 200 years on a 3Ghz box.

> The access to the reference value needs a (the xtime ?) lock on 32bit, no?

The readout is under xtime lock anyway.

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: gettimeofday() jumping into the future [ In reply to ]

James at superbug

Apr 3, 2008, 4:29 AM

Post #20 of 23 (40235 views)

>> On Thu, 23 Aug 2007, Michael Smith wrote:
>>
>>
> [...]
>
>> +static inline u64 __get_nsec_offset(void)
>> {
>> cycle_t cycle_now, cycle_delta;
>> - s64 ns_offset;
>> + u64 ns_offset;
>>
>> /* read clocksource: */
>> cycle_now = clocksource_read(clock);
>>
>> + if (cycle_now < clock->cycle_last)
>> + return 0;
>>

That does not work as cycle_now can go backwards as well as wrap around.
The only way to catch all edge cases is to be able to make an assumption
on the maximum acceptable difference between cycle_now and cycle_last,
where difference is the shortest distance between values if they were
pointers into a ring buffer. If this maximum difference value is less
than half of the max value cycle_now can have, one can take a very
simple approach.
Treat the cycle_now, cycle_last value as signed.
If they have same sign, do the comparison as unsigned.
If they have different signs, do the comparision as signed.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: gettimeofday() jumping into the future [ In reply to ]

James at superbug

Apr 3, 2008, 4:29 AM

Post #21 of 23 (40260 views)

James Courtier-Dutton wrote:
> than half of the max value cycle_now can have, one can take a very
> simple approach.
> Treat the cycle_now, cycle_last value as signed.
> If they have same sign, do the comparison as unsigned.
> If they have different signs, do the comparision as signed.
>
> James
>
Sorry,
>Treat the cycle_now, cycle_last value as signed.
>If they have same sign, do the comparison as unsigned.
>If they have different signs, do the comparision as signed.
is wrong.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: gettimeofday() jumping into the future [ In reply to ]

James at superbug

Apr 3, 2008, 5:44 AM

Post #22 of 23 (40236 views)

James Courtier-Dutton wrote:
> James Courtier-Dutton wrote:
>> than half of the max value cycle_now can have, one can take a very
>> simple approach.
>> Treat the cycle_now, cycle_last value as signed.
>> If they have same sign, do the comparison as unsigned.
>> If they have different signs, do the comparision as signed.
>>
>> James
>>
> Sorry,
> >Treat the cycle_now, cycle_last value as signed.
> >If they have same sign, do the comparison as unsigned.
> >If they have different signs, do the comparision as signed.
> is wrong.
>
It should be:
treat cycle_now and cycle_last as unsigned.
unsigned tmp;
tmp = cycle_now - cycle_last;
if (tmp > max_difference_threshold) cycle_now = cycle_last;

This correctly handles cycle_now going backwards as well as wrap around.
The only way to catch all edge cases is to be able to make an assumption
on the maximum acceptable difference between cycle_now and cycle_last,
where difference is the shortest distance between values if they were
pointers into a ring buffer. The other assumption made here is that the
wrap around only happens at max_uint of cycle_now.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: gettimeofday() jumping into the future [ In reply to ]

Apr 11, 2008, 4:11 PM

Post #23 of 23 (40236 views)

On Thu, Apr 3, 2008 at 5:44 AM, James Courtier-Dutton
<James@superbug.co.uk> wrote:
> It should be:
> treat cycle_now and cycle_last as unsigned.
> unsigned tmp;
> tmp = cycle_now - cycle_last;
> if (tmp > max_difference_threshold) cycle_now = cycle_last;
>
> This correctly handles cycle_now going backwards as well as wrap around.
>
> The only way to catch all edge cases is to be able to make an assumption
> on the maximum acceptable difference between cycle_now and cycle_last,
> where difference is the shortest distance between values if they were
> pointers into a ring buffer. The other assumption made here is that the
> wrap around only happens at max_uint of cycle_now.

Yea. I see Thomas' patch was applied then reverted as it caused
problems with the TSC reseting over suspend and resume (since the
timekeeping core doesn't have a hook into the tsc clocksource to
inform it that the comparision is invalid, so the resume-time read
will always return the cycle_last value and not the actual smaller TSC
value).

This suggested bounding of how much a negative value is considered
valid would resolve this issue, since really the scope of this fix is
very small negative offsets due to the TSC being ever so slightly out
of sync between processors (but not enough to warrent throwing the TSC
out as a valid clock source).

-john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/