Disappearing root on 2.6.36-hardened-r6 upgrade
I've got (at least) two servers that lose their root partition after
this upgrade. One of them has an HP cciss SCSI RAID controller; the
other has a single IDE hard drive. Assuming the problem is something
common, I'll stick to describing the one with the array for now.

First of all, I didn't touch /etc/fstab:

/dev/cciss/c0d0p2 /boot ext3 noauto,noatime 1 2
/dev/cciss/c0d0p3 / ext4 acl,noatime 0 1
/dev/cciss/c0d0p1 none swap sw 0 0

I built the kernel after a make oldconfig, and updated grub.conf:

title Gentoo Linux 2.6.36-hardened-r6
root (hd0,1)
kernel /kernel-2.6.36-hardened-r6

It's actually there:

# /bin/ls /boot/kernel-2.6.36-hardened-r6
/boot/kernel-2.6.36-hardened-r6

But upon reboot, this happens:

http://michael.orlitzky.com/images/untouched.jpg

So, I tried it with root=/dev/cciss/c0d0p3:

http://michael.orlitzky.com/images/with_root_param.jpg

It clearly sees my partitions, since it lists them all. The root is
ext4, which is compiled into the kernel:

# grep EXT4 .config
CONFIG_EXT4_FS=y
CONFIG_EXT4_FS_XATTR=y
CONFIG_EXT4_FS_POSIX_ACL=y
# CONFIG_EXT4_FS_SECURITY is not set
# CONFIG_EXT4_DEBUG is not set

Now I'm at a loss. There must have been something else that I did during
the make oldconfig that broke it. I keep my kernel configs in git, so
here's the diff (with context stripped) from my previous kernel,
2.6.32-hardened-r22. If anyone has any ideas, I'd appreciate it:

+CONFIG_INSTRUCTION_DECODER=y
-CONFIG_GENERIC_TIME=y
+CONFIG_NEED_SG_DMA_LENGTH=y
+CONFIG_HAVE_EARLY_RES=y
+CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-ecx -fcall-saved-edx"
+CONFIG_CROSS_COMPILE=""
+CONFIG_HAVE_KERNEL_LZO=y
+CONFIG_PERF_EVENTS=y
+CONFIG_HAVE_OPTPROBES=y
+CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
+CONFIG_HAVE_HW_BREAKPOINT=y
+CONFIG_HAVE_MIXED_BREAKPOINTS_REGS=y
+CONFIG_HAVE_USER_RETURN_NOTIFIER=y
+CONFIG_HAVE_PERF_EVENTS_NMI=y
+CONFIG_INLINE_SPIN_UNLOCK=y
+CONFIG_INLINE_SPIN_UNLOCK_IRQ=y
+CONFIG_INLINE_READ_UNLOCK=y
+CONFIG_INLINE_READ_UNLOCK_IRQ=y
+CONFIG_INLINE_WRITE_UNLOCK=y
+CONFIG_INLINE_WRITE_UNLOCK_IRQ=y
+CONFIG_MUTEX_SPIN_ON_OWNER=y
+CONFIG_NO_BOOTMEM=y
-CONFIG_X86_L1_CACHE_BYTES=64
-CONFIG_X86_INTERNODE_CACHE_BYTES=64
+CONFIG_X86_INTERNODE_CACHE_SHIFT=7
-CONFIG_HAVE_MLOCK=y
-CONFIG_HAVE_MLOCKED_PAGE_BIT=y
-CONFIG_ACPI_DOCK=y
+CONFIG_ACPI_HED=m
+CONFIG_ACPI_APEI=y
+CONFIG_ACPI_APEI_GHES=m
+CONFIG_INTEL_IDLE=y
+CONFIG_PCIEASPM=y
+CONFIG_PCI_IOAPIC=y
-CONFIG_PACKET_MMAP=y
+
+
+
+CONFIG_NETFILTER_XT_MATCH_OSF=m
-CONFIG_NETFILTER_XT_MATCH_OSF=m
+CONFIG_RPS=y
+
+CONFIG_SCSI_MOD=y
+
+
-
+CONFIG_VGA_ARB_MAX_GPUS=2
+CONFIG_USB_EHCI_TT_NEWSCHED=y
-
-CONFIG_INOTIFY=y
+CONFIG_PAX_ELFRELOCS=y
+CONFIG_DEFAULT_SECURITY_DAC=y
+CONFIG_DEFAULT_SECURITY=""
-CONFIG_CRYPTO_FIPS=y
-CONFIG_CRYPTO_AEAD2=y
-CONFIG_CRYPTO_BLKCIPHER=m
-CONFIG_CRYPTO_BLKCIPHER2=y
-CONFIG_CRYPTO_RNG=m
-CONFIG_CRYPTO_RNG2=y
-CONFIG_CRYPTO_PCOMP=y
-CONFIG_CRYPTO_MANAGER=y
-CONFIG_CRYPTO_MANAGER2=y
-CONFIG_CRYPTO_WORKQUEUE=y
-CONFIG_CRYPTO_ECB=m
-CONFIG_CRYPTO_CRC32C_INTEL=m
-CONFIG_CRYPTO_MD5=m
-CONFIG_CRYPTO_SHA512=m
+CONFIG_CRYPTO_AES_586=m
-CONFIG_CRYPTO_DES=m
-CONFIG_CRYPTO_ZLIB=y
-CONFIG_CRYPTO_ANSI_CPRNG=m
-CONFIG_ZLIB_INFLATE=y
-CONFIG_ZLIB_DEFLATE=y
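
A context-stripped diff like the one above can also be produced without git. A minimal sketch, assuming two saved config files and GNU diff; the function name `config_changes` and the example paths are made up for illustration:

```shell
# Hypothetical helper: print only the added/removed lines between two
# kernel .config files, without any surrounding context.
config_changes() {
    old=$1
    new=$2
    # -U0 asks for zero context lines; the grep drops the "---"/"+++" file
    # headers and the "@@" hunk markers, keeping just the +/- option lines.
    # Bare "+" or "-" lines (changed blank lines) are dropped as well.
    diff -U0 "$old" "$new" | grep '^[+-][^+-]'
}

# Example (paths are placeholders):
#   config_changes config-2.6.32-hardened-r22 config-2.6.36-hardened-r6
```

Filtering the output this way keeps the list short enough to scan for storage- or filesystem-related options by eye.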
Re: Disappearing root on 2.6.36-hardened-r6 upgrade
On 26 Dec 2010 at 1:59, Michael Orlitzky wrote:

> I've got (at least) two servers that lose their root partition after
> this upgrade. One of them has an HP cciss SCSI RAID controller; the
> other has a single IDE hard drive. Assuming the problem is something
> common, I'll stick to describing the one with the array for now.

which grsec is this ebuild based on? my guess is that it's a recent PaX/UDEREF
hardening that's causing this and should be mostly fixed now except for the
IP checksum code fix which i'll release soon. in the meantime you can disable
UDEREF. if you don't have it enabled then i don't know what it is, we'll need
more debugging, let me know.
Re: Disappearing root on 2.6.36-hardened-r6 upgrade
On 12/26/2010 03:46 AM, pageexec@freemail.hu wrote:
> On 26 Dec 2010 at 1:59, Michael Orlitzky wrote:
>
>> I've got (at least) two servers that lose their root partition after
>> this upgrade. One of them has an HP cciss SCSI RAID controller; the
>> other has a single IDE hard drive. Assuming the problem is something
>> common, I'll stick to describing the one with the array for now.
>
> which grsec is this ebuild based on? my guess is that it's a recent PaX/UDEREF
> hardening that's causing this and should be mostly fixed now except for the
> IP checksum code fix which i'll release soon. in the meantime you can disable
> UDEREF. if you don't have it enabled then i don't know what it is, we'll need
> more debugging, let me know.

The hardened-patches contains the following:

4423_grsec-remove-protected-paths.patch
4420_grsecurity-2.2.1-2.6.36.2-201012121726.patch
4435_grsec-kconfig-gentoo.patch
4421_grsec-remove-localversion-grsec.patch
4425_grsec-pax-without-grsec.patch
4430_grsec-kconfig-default-gids.patch
4422_grsec-mute-warnings.patch

I do have UDEREF enabled:

# grep UDEREF .config
CONFIG_PAX_MEMORY_UDEREF=y

I can try disabling it when I'd be willing to drive to work and reboot
the thing.
Re: Disappearing root on 2.6.36-hardened-r6 upgrade
On 12/26/2010 03:46 AM, pageexec@freemail.hu wrote:
> On 26 Dec 2010 at 1:59, Michael Orlitzky wrote:
>
>> I've got (at least) two servers that lose their root partition after
>> this upgrade. One of them has an HP cciss SCSI RAID controller; the
>> other has a single IDE hard drive. Assuming the problem is something
>> common, I'll stick to describing the one with the array for now.
>
> which grsec is this ebuild based on? my guess is that it's a recent PaX/UDEREF
> hardening that's causing this and should be mostly fixed now except for the
> IP checksum code fix which i'll release soon. in the meantime you can disable
> UDEREF. if you don't have it enabled then i don't know what it is, we'll need
> more debugging, let me know.
>

I'll repeat what I said in the bug report here
(See https://bugs.gentoo.org/show_bug.cgi?id=349705)

hardened-sources-2.6.32-r31 has grsecurity-2.2.1-2.6.32.27-201012121726

hardened-sources-2.6.36-r6 has grsecurity-2.2.1-2.6.36.2-201012121726


What's even stranger is that I have six HP Proliant DL 385 G7, all with
the following (partial) fstab:

/dev/cciss/c0d0p1 /boot ext2 noauto,noatime 1 2
/dev/cciss/c0d0p3 / ext4 noatime 0 1
/dev/cciss/c0d0p2 none swap sw 0 0

None of which showed a panic.


--
Anthony G. Basile, Ph.D.
Gentoo Developer
Re: Disappearing root on 2.6.36-hardened-r6 upgrade
On 26 Dec 2010 at 12:06, Michael Orlitzky wrote:

> I do have UDEREF enabled:
>
> # grep UDEREF .config
> CONFIG_PAX_MEMORY_UDEREF=y
>
> I can try disabling it when I'd be willing to drive to work and reboot
> the thing.

ok, in this case don't worry about it as i'm sure it's a known bug.
if the next grsec patch (after 12.22) still fails on you then do let
us know though.
Re: Disappearing root on 2.6.36-hardened-r6 upgrade
On 12/26/2010 03:46 AM, pageexec@freemail.hu wrote:
> On 26 Dec 2010 at 1:59, Michael Orlitzky wrote:
>
>> I've got (at least) two servers that lose their root partition after
>> this upgrade. One of them has an HP cciss SCSI RAID controller; the
>> other has a single IDE hard drive. Assuming the problem is something
>> common, I'll stick to describing the one with the array for now.
>
> which grsec is this ebuild based on? my guess is that it's a recent PaX/UDEREF
> hardening that's causing this and should be mostly fixed now except for the
> IP checksum code fix which i'll release soon. in the meantime you can disable
> UDEREF. if you don't have it enabled then i don't know what it is, we'll need
> more debugging, let me know.
>

Within 24 hours I'll have the following ebuilds on the tree marked ~arch:

hardened-sources-2.6.32-r32
hardened-sources-2.6.36-r7

They are based on the very latest grsec patches. Can users who hit the
panic test them?

--
Anthony G. Basile, Ph.D.
Gentoo Developer
Re: Disappearing root on 2.6.36-hardened-r6 upgrade
On 12/26/2010 12:57 PM, pageexec@freemail.hu wrote:
> On 26 Dec 2010 at 12:06, Michael Orlitzky wrote:
>
>> I do have UDEREF enabled:
>>
>> # grep UDEREF .config
>> CONFIG_PAX_MEMORY_UDEREF=y
>>
>> I can try disabling it when I'd be willing to drive to work and reboot
>> the thing.
>
> ok, in this case don't worry about it as i'm sure it's a known bug.
> if the next grsec patch (after 12.22) still fails on you then do let
> us know though.

Challenge accepted. I'm dressed, the car's cleaned off, and I'm
recompiling with UDEREF=n.
Re: Disappearing root on 2.6.36-hardened-r6 upgrade
On 12/26/2010 12:57 PM, pageexec@freemail.hu wrote:
> On 26 Dec 2010 at 12:06, Michael Orlitzky wrote:
>
>> I do have UDEREF enabled:
>>
>> # grep UDEREF .config
>> CONFIG_PAX_MEMORY_UDEREF=y
>>
>> I can try disabling it when I'd be willing to drive to work and reboot
>> the thing.
>
> ok, in this case don't worry about it as i'm sure it's a known bug.
> if the next grsec patch (after 12.22) still fails on you then do let
> us know though.

To my mild surprise, the box came back up. Disabling UDEREF fixed it.
I'll give the new ~arch ebuilds a try, too, when they become available.
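
Recompiling with UDEREF off amounts to flipping one option in .config and rebuilding. A minimal sketch, assuming GNU sed; `disable_option` is a made-up helper, not part of the kernel build system:

```shell
# Hypothetical helper: mark a kernel option as unset in a .config file.
# Usage: disable_option CONFIG_PAX_MEMORY_UDEREF /usr/src/linux/.config
disable_option() {
    opt=$1
    cfg=$2
    # Rewrite "OPT=y" (or "OPT=m") into the "# OPT is not set" form
    # that kconfig itself writes for disabled options.
    sed -i "s/^${opt}=[ym]\$/# ${opt} is not set/" "$cfg"
}
```

Running `make oldconfig` afterwards lets kconfig sort out any dependent options before the rebuild.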
Re: Disappearing root on 2.6.36-hardened-r6 upgrade
On 26 Dec 2010 at 14:09, Michael Orlitzky wrote:

> Challenge accepted. I'm dressed, the car's cleaned off, and I'm
> recompiling with UDEREF=n.

passing pax_nouderef on the kernel cmdline should be enough ;)
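
For the grub.conf stanza shown in the first message, that would presumably mean appending the parameter to the kernel line. This is only a sketch: the title suffix is invented, and the root= value is the one tried in the first message.

```
title Gentoo Linux 2.6.36-hardened-r6 (pax_nouderef)
root (hd0,1)
kernel /kernel-2.6.36-hardened-r6 root=/dev/cciss/c0d0p3 pax_nouderef
```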
Re: Disappearing root on 2.6.36-hardened-r6 upgrade
On 26 Dec 2010 at 19:59, "Tóth Attila" wrote:

> I don't know if it is related or not. I don't use ext4 and have no
> symptoms of disappearing root. I attach a photo taken using a recent
> kernel. The latest crashes I've experienced for the past few months
> prevented syncing, so didn't get logged. The other screen capture is
> older, may not be relevant nowadays.

it's a different issue, the UDEREF changes haven't been incorporated into
grsec's .32 series yet. looks like some null deref in the filesystem sync
code, but i can't tell what may be causing this. is this something you can
reproduce at will? if so, can you try vanilla?
Re: Disappearing root on 2.6.36-hardened-r6 upgrade
On 26/12/10 21:06, pageexec@freemail.hu wrote:
> On 26 Dec 2010 at 19:59, "Tóth Attila" wrote:
>
>> I don't know if it is related or not. I don't use ext4 and have no
>> symptoms of disappearing root. I attach a photo taken using a recent
>> kernel. The latest crashes I've experienced for the past few months
>> prevented syncing, so didn't get logged. The other screen capture is
>> older, may not be relevant nowadays.
> it's a different issue, the UDEREF changes haven't been incorporated into
> grsec's .32 series yet. looks like some null deref in the filesystem sync
> code, but i can't tell what may be causing this. is this something you can
> reproduce at will? if so, can you try vanilla?
I recall somebody telling blueness to include a patch to fix that (was
before exams so everything is blurry :P). Can you try with 2.6.32-r31?
IIRC the patch to fix that ext4 issue was included by blueness there.
Re: Disappearing root on 2.6.36-hardened-r6 upgrade
On 26/12/10 21:00, pageexec@freemail.hu wrote:
> On 26 Dec 2010 at 14:09, Michael Orlitzky wrote:
>
>> Challenge accepted. I'm dressed, the car's cleaned off, and I'm
>> recompiling with UDEREF=n.
> passing pax_nouderef on the kernel cmdline should be enough ;)
This should be documented in the FAQ, as it can be very useful for
tracking down certain hardened kernel issues (for example when virtualizing).
Can you point me to a document with similar parameters for the PaX
kernel features?
Re: Disappearing root on 2.6.36-hardened-r6 upgrade
On 26/12/10 21:00, pageexec@freemail.hu wrote:
> On 26 Dec 2010 at 14:09, Michael Orlitzky wrote:
>
>> Challenge accepted. I'm dressed, the car's cleaned off, and I'm
>> recompiling with UDEREF=n.
> passing pax_nouderef on the kernel cmdline should be enough ;)
Looking at ./Documentation/kernel-parameters.txt, I only found these:
pax_nouderef [X86-32] disables UDEREF. Most likely needed under certain
virtualization environments that don't cope well with the
expand down segment used by UDEREF on X86-32.

pax_softmode= [X86-32] 0/1 to disable/enable PaX softmode on boot already.

Are you sure I'm not missing any? A similar feature for KERNEXEC would
be very useful.
Re: Disappearing root on 2.6.36-hardened-r6 upgrade
On 12/26/2010 03:00 PM, pageexec@freemail.hu wrote:
> On 26 Dec 2010 at 14:09, Michael Orlitzky wrote:
>
>> Challenge accepted. I'm dressed, the car's cleaned off, and I'm
>> recompiling with UDEREF=n.
>
> passing pax_nouderef on the kernel cmdline should be enough ;)
>

This doesn't seem to work. At least, it doesn't prevent my panics,
whereas recompiling the same kernel with UDEREF=n does.
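
Before concluding that the parameter has no effect, it may be worth confirming that it actually reached the kernel. A minimal sketch; `has_kernel_param` is a made-up helper that looks for an exact word in a command-line string such as the contents of /proc/cmdline:

```shell
# Hypothetical helper: succeed if a bare parameter appears as a whole
# word on a kernel command line passed in as a string.
has_kernel_param() {
    param=$1
    cmdline=$2
    # Unquoted expansion is intentional: split the command line on whitespace.
    for word in $cmdline; do
        if [ "$word" = "$param" ]; then
            return 0
        fi
    done
    return 1
}

# Example, run on the booted system:
#   has_kernel_param pax_nouderef "$(cat /proc/cmdline)" && echo "parameter was passed"
```

If the parameter is present and the panic persists, that points at the parameter not covering this case rather than at a bootloader typo.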
Re: Disappearing root on 2.6.36-hardened-r6 upgrade
On 27 Dec 2010 at 1:05, klondike wrote:

> looking at ./Documentation/kernel-parameters.txt only found these:
> pax_nouderef [X86-32] disables UDEREF. Most likely needed under certain
> virtualization environments that don't cope well with the
> expand down segment used by UDEREF on X86-32.
>
> pax_softmode= [X86-32] 0/1 to disable/enable PaX softmode on boot already.
>
> Are you sure I'm not missing any, a similar feature for KERNEXEC would
> be very useful.

as you found out, all the PaX specific kernel command line parameters
are documented where other such parameters are ;). as for KERNEXEC, it's
not possible (well, with reasonable effort) to turn it off at runtime.
Re: Disappearing root on 2.6.36-hardened-r6 upgrade
There were two screen shots attached. The older one was outdated related
to 2.6.32 kernel.

But the other was a recent panic.

So here is another one. This time I could paste it from the log:

last sysfs file: /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:00/PNP0C09:00/PNP0C0A:00/power_supply/BAT0/energy_full
Modules linked in: i2c_dev tp_smapi thinkpad_ec lib80211_crypt_wep lib80211_crypt_tkip lib80211_crypt_ccmp radeon ttm drm_kms_helper ehci_hcd ipw2200 libipw yenta_socket i2c_i801 uhci_hcd

Pid: 1400, comm: kjournald Not tainted 2.6.36-hardened-r6 #1 1830W7F/1830W7F
EIP: 0060:[<0014d697>] EFLAGS: 00010216 CPU: 0
EIP is at journal_commit_transaction+0x6f7/0xd00
EAX: 00e89222 EBX: cc7ae76d ECX: 00000000 EDX: 00000000
ESI: 00000005 EDI: 00000000 EBP: e9c1c4c0 ESP: f695bf04
DS: 0068 ES: 0068 FS: 0000 GS: 00e0 SS: 0068
Process kjournald (pid: 1400, ti=f695a000 task=f70bb0b0 task.ti=f695a000)
Stack:
000028cd 26626a70 00029eaf f6ae6800 00000000 00000005 e3834c1c e10ed03c
<0> 00000fc4 00000001 f55b1000 f6ae68c0 ebe65e9e 0000681e f7076064 00000000
<0> 10fe2a49 e62673c0 000293ec 000028ce e3e13910 f70bb0b0 005bb206 00000003
Call Trace:
[<000028cd>] ? copy_thread+0x1d/0x140
[<00029eaf>] ? switched_to_idle+0x1f/0x60
[<0000681e>] ? write_ldt+0x10e/0x2d0
[<000293ec>] ? finish_task_switch.clone.120.clone.124+0x2c/0x90
[<000028ce>] ? copy_thread+0x1e/0x140
[<005bb206>] ? schedule+0x146/0x3e0
[<0014fb99>] ? kjournald+0x99/0x1b0
[<00046ad0>] ? autoremove_wake_function+0x0/0x40
[<0014fb00>] ? kjournald+0x0/0x1b0
[<000466a4>] ? kthread+0x74/0x80
[<00046630>] ? kthread+0x0/0x80
[<0000455e>] ? kernel_thread_helper+0x6/0x18
Code: 00 b9 03 00 00 00 89 ea e8 67 f7 ff ff 89 d8 ba 17 00 00 00 e8 1b 94 ef ff 89 d8 e8 f4 f3 f7 ff e9 ef fc ff ff 83 6d 40 01 8b 03 <ff> 40 34 71 04 ff 48 34 ce 8b 03 80 48 02 01 8b 44 24 4c 89 da
EIP: [<0014d697>] journal_commit_transaction+0x6f7/0xd00 SS:ESP 0068:f695bf04
---[ end trace 0f9efa514b41f93a ]---

It happens during IO activity. I wouldn't say heavy IO. The memory is OK,
the harddrive is perfect.
I can dd the whole hdd to my backup booting on a gentoo CD.

Regards:
Dw.
--
dr Tóth Attila, Radiológus, 06-20-825-8057
Attila Toth MD, Radiologist, +36-20-825-8057

On 26 December 2010 (Sun) at 21:06, pageexec@freemail.hu wrote:
> On 26 Dec 2010 at 19:59, "Tóth Attila" wrote:
>
>> I don't know if it is related or not. I don't use ext4 and have no
>> symptoms of disappearing root. I attach a photo taken using a recent
>> kernel. The latest crashes I've experienced for the past few months
>> prevented syncing, so didn't get logged. The other screen capture is
>> older, may not be relevant nowadays.
>
> it's a different issue, the UDEREF changes haven't been incorporated into
> grsec's .32 series yet. looks like some null deref in the filesystem sync
> code, but i can't tell what may be causing this. is this something you can
> reproduce at will? if so, can you try vanilla?
>
>
>
Re: Disappearing root on 2.6.36-hardened-r6 upgrade
On 30 Dec 2010 at 20:29, "Tóth Attila" wrote:

> There were two screen shots attached. The older one was outdated related
> to 2.6.32 kernel.
>
> But the other was a recent panic.

unfortunately this one had the first oops scroll away already, so i can't tell
much about it...

> So here is another one. This time I could paste it from the log:

this is again some fs/journaling code trying to increment some seemingly invalid
pointer (in eax), there's probably some memory corruption going on here and it'd
be important to try both vanilla and -r7.

> It happens during IO activity. I wouldn't say heavy IO. The memory is OK,
> the harddrive is perfect.
> I can dd the whole hdd to my backup booting on a gentoo CD.

is the filesystem ok as well (fsck)?
Re: Disappearing root on 2.6.36-hardened-r6 upgrade
On 30 December 2010 (Thu) at 21:35, pageexec@freemail.hu wrote:
> On 30 Dec 2010 at 20:29, "Tóth Attila" wrote:
>
>> There were two screen shots attached. The older one was outdated related
>> to 2.6.32 kernel.
>>
>> But the other was a recent panic.
>
> unfortunately this one had the first oops scroll away already, so i can't tell
> much about it...

I took a look at the logs again. You are right. First came this:

Dec 30 19:43:33 szk-simor kernel: PAX: suspicious general protection fault: 0000 [#1] DEBUG_PAGEALLOC
Dec 30 19:43:33 szk-simor kernel: last sysfs file: /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:00/PNP0C09:00/PNP0C0A:00/power_supply/BAT0/energy_full
Dec 30 19:43:33 szk-simor kernel: Modules linked in: i2c_dev tp_smapi thinkpad_ec lib80211_crypt_wep lib80211_crypt_tkip lib80211_crypt_ccmp radeon ttm drm_kms_helper ehci_hcd ipw2200 libipw yenta_socket i2c_i801 uhci_hcd
Dec 30 19:43:33 szk-simor kernel:
Dec 30 19:43:33 szk-simor kernel: Pid: 1400, comm: kjournald Not tainted 2.6.36-hardened-r6 #1 1830W7F/1830W7F
Dec 30 19:43:33 szk-simor kernel: EIP: 0060:[<0014d697>] EFLAGS: 00010216 CPU: 0
Dec 30 19:43:33 szk-simor kernel: EIP is at journal_commit_transaction+0x6f7/0xd00
Dec 30 19:43:33 szk-simor kernel: EAX: 00e89222 EBX: cc7ae76d ECX: 00000000 EDX: 00000000
Dec 30 19:43:33 szk-simor kernel: ESI: 00000005 EDI: 00000000 EBP: e9c1c4c0 ESP: f695bf04
Dec 30 19:43:33 szk-simor kernel: DS: 0068 ES: 0068 FS: 0000 GS: 00e0 SS: 0068
Dec 30 19:43:34 szk-simor kernel: Process kjournald (pid: 1400, ti=f695a000 task=f70bb0b0 task.ti=f695a000)
Dec 30 19:43:34 szk-simor kernel: Stack:
Dec 30 19:43:34 szk-simor kernel: 000028cd 26626a70 00029eaf f6ae6800 00000000 00000005 e3834c1c e10ed03c
Dec 30 19:43:34 szk-simor kernel: <0> 00000fc4 00000001 f55b1000 f6ae68c0 ebe65e9e 0000681e f7076064 00000000
Dec 30 19:43:34 szk-simor kernel: <0> 10fe2a49 e62673c0 000293ec 000028ce e3e13910 f70bb0b0 005bb206 00000003
Dec 30 19:43:34 szk-simor kernel: Call Trace:
Dec 30 19:43:34 szk-simor kernel: [<000028cd>] ? copy_thread+0x1d/0x140
Dec 30 19:43:34 szk-simor kernel: [<00029eaf>] ? switched_to_idle+0x1f/0x60
Dec 30 19:43:34 szk-simor kernel: [<0000681e>] ? write_ldt+0x10e/0x2d0
Dec 30 19:43:34 szk-simor kernel: [<000293ec>] ? finish_task_switch.clone.120.clone.124+0x2c/0x90
Dec 30 19:43:34 szk-simor kernel: [<000028ce>] ? copy_thread+0x1e/0x140
Dec 30 19:43:34 szk-simor kernel: [<005bb206>] ? schedule+0x146/0x3e0
Dec 30 19:43:34 szk-simor kernel: [<0014fb99>] ? kjournald+0x99/0x1b0
Dec 30 19:43:34 szk-simor kernel: [<00046ad0>] ? autoremove_wake_function+0x0/0x40
Dec 30 19:43:34 szk-simor kernel: [<0014fb00>] ? kjournald+0x0/0x1b0
Dec 30 19:43:34 szk-simor kernel: [<000466a4>] ? kthread+0x74/0x80
Dec 30 19:43:34 szk-simor kernel: [<00046630>] ? kthread+0x0/0x80
Dec 30 19:43:34 szk-simor kernel: [<0000455e>] ? kernel_thread_helper+0x6/0x18
Dec 30 19:43:34 szk-simor kernel: Code: 00 b9 03 00 00 00 89 ea e8 67 f7 ff ff 89 d8 ba 17 00 00 00 e8 1b 94 ef ff 89 d8 e8 f4 f3 f7 ff e9 ef fc ff ff 83 6d 40 01 8b 03 <ff> 40 34 71 04 ff 48 34 ce 8b 03 80 48 02 01 8b 44 24 4c 89 da
Dec 30 19:43:34 szk-simor kernel: EIP: [<0014d697>] journal_commit_transaction+0x6f7/0xd00 SS:ESP 0068:f695bf04
Dec 30 19:43:34 szk-simor kernel: ---[ end trace 0f9efa514b41f93a ]---

and then came this:

Dec 30 19:49:30 szk-simor kernel: PAX: suspicious general protection fault: 0000 [#2] DEBUG_PAGEALLOC
Dec 30 19:49:30 szk-simor kernel: last sysfs file: /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:00/PNP0C09:00/PNP0C0A:00/power_supply/BAT0/energy_full
Dec 30 19:49:30 szk-simor kernel: Modules linked in: i2c_dev tp_smapi thinkpad_ec lib80211_crypt_wep lib80211_crypt_tkip lib80211_crypt_ccmp radeon ttm drm_kms_helper ehci_hcd ipw2200 libipw yenta_socket i2c_i801 uhci_hcd
Dec 30 19:49:30 szk-simor kernel:
Dec 30 19:49:30 szk-simor kernel: Pid: 12897, comm: shutdown Tainted: G D 2.6.36-hardened-r6 #1 1830W7F/1830W7F
Dec 30 19:49:30 szk-simor kernel: EIP: 0060:[<000bdf8e>] EFLAGS: 00010206 CPU: 0
Dec 30 19:49:30 szk-simor kernel: EIP is at iput+0x4e/0x1f0
Dec 30 19:49:30 szk-simor kernel: EAX: e393467c EBX: e393467c ECX: 00000001 EDX: 48000200
Dec 30 19:49:30 szk-simor kernel: ESI: f68ab201 EDI: e3934abc EBP: e3934a14 ESP: e6ccfef0
Dec 30 19:49:30 szk-simor kernel: DS: 0068 ES: 0068 FS: 0000 GS: 00e0 SS: 0068
Dec 30 19:49:30 szk-simor kernel: Process shutdown (pid: 12897, ti=e6cce000 task=f70c37f0 task.ti=e6cce000)
Dec 30 19:49:30 szk-simor kernel: Stack:
Dec 30 19:49:30 szk-simor kernel: f68ab270 e3934a14 000c7217 7fffffff f68ab200 00000001 00000000 e6ccff0c
Dec 30 19:49:30 szk-simor kernel: <0> e6ccff0c e6ccff18 00000000 e6ccff1c e6ccff1c f68ab200 00000001 000e6f00
Dec 30 19:49:30 szk-simor kernel: <0> 000cae20 000cadff f68ab200 f7205400 f68ab23c 000ab9f0 e6ccff60 11ee9e94
Dec 30 19:49:30 szk-simor kernel: Call Trace:
Dec 30 19:49:30 szk-simor kernel: [<000c7217>] ? sync_inodes_sb+0xb7/0x100
Dec 30 19:49:30 szk-simor kernel: [<000e6f00>] ? dquot_quota_sync+0x0/0x2a0
Dec 30 19:49:30 szk-simor kernel: [<000cae20>] ? sync_one_sb+0x0/0x20
Dec 30 19:49:30 szk-simor kernel: [<000cadff>] ? __sync_filesystem+0x7f/0xa0
Dec 30 19:49:30 szk-simor kernel: [<000ab9f0>] ? iterate_supers+0x50/0x90
Dec 30 19:49:30 szk-simor kernel: [<000cad42>] ? sync_filesystems+0x12/0x20
Dec 30 19:49:30 szk-simor kernel: [<000caea8>] ? sys_sync+0x18/0x40
Dec 30 19:49:30 szk-simor kernel: [<005bce39>] ? syscall_call+0x7/0xb
Dec 30 19:49:30 szk-simor kernel: Code: c0 75 0a 5b 5e c3 8d b4 26 00 00 00 00 8b b3 9c 00 00 00 8b 46 20 85 c0 0f 84 3f 01 00 00 8b 50 10 85 d2 0f 84 34 01 00 00 89 d8 <ff> d2 85 c0 0f 85 9c 00 00 00 f6 83 10 01 00 00 87 75 26 8b 53
Dec 30 19:49:30 szk-simor kernel: EIP: [<000bdf8e>] iput+0x4e/0x1f0 SS:ESP 0068:e6ccfef0
Dec 30 19:49:30 szk-simor kernel: ---[ end trace 0f9efa514b41f93b ]---
Dec 30 19:49:35 szk-simor kernel: SysRq : Emergency Sync
Dec 30 19:49:35 szk-simor kernel: Emergency Sync complete
Dec 30 19:49:39 szk-simor kernel: SysRq : Emergency Remount R/O

>
>> So here is another one. This time I could paste it from the log:
>
> this is again some fs/journaling code trying to increment some seemingly invalid
> pointer (in eax), there's probably some memory corruption going on here and it'd
> be important to try both vanilla and -r7.

Now I'm running -r7. I may have time for vanilla. But I cannot reliably
reproduce it.

I'll give memtest a spin overnight. Last time it was OK. I also suspect
memory corruption, but why would it always result in a file system
error? I have no other symptoms.

>
>> It happens during IO activity. I wouldn't say heavy IO. The memory is OK,
>> the harddrive is perfect.
>> I can dd the whole hdd to my backup booting on a gentoo CD.
>
> is the filesystem ok as well (fsck)?
>

Because of these recurrent fs problems I reverted my mount options to
data=journal and barrier=1. That is the most conservative and the slowest
setting. Fortunately I'm not a speed freak. That way the system survives
these events without losing fs consistency. But it has happened before
that I had to restore some of my partitions from backup.
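
In /etc/fstab, those options would look roughly like the line below. This is a sketch: the device and mount point are placeholders, and ext3 is an assumption based on the kjournald thread in the trace.

```
# <fs>      <mountpoint>  <type>  <opts>                          <dump/pass>
/dev/sdXN   /home         ext3    noatime,data=journal,barrier=1  0 2
```

For a root filesystem the journaling mode usually has to be given at boot instead (for example rootflags=data=journal on the kernel command line), since as far as I know it cannot be changed on remount.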

It is interesting to note that hardened-sources-2.6.32-r20 was more
stable than the other versions I've tried since then. It used
grsec-2.2.0-2.6.32.24-201010021153. Maybe the memory handling of that
kernel is different, and that keeps it from triggering some memory
problems...

Thx:
Dw.
--
dr Tóth Attila, Radiológus, 06-20-825-8057
Attila Toth MD, Radiologist, +36-20-825-8057
Re: Disappearing root on 2.6.36-hardened-r6 upgrade
On 4 Jan 2011 at 14:52, "Tóth Attila" wrote:

> Forgotten attachment

ok, i think it's time to try vanilla if you can as this seems to be
a problem in code we don't really touch directly...
Re: Disappearing root on 2.6.36-hardened-r6 upgrade
No errors were found after 12 hours of memtest.

However some serious crashes still occur.

I attach snippets of kern.log.

Does this still suggest a hardware error?

I have to try out another laptop. That is not convenient...

Dw.
--
dr Tóth Attila, Radiológus, 06-20-825-8057
Attila Toth MD, Radiologist, +36-20-825-8057

On 30 December 2010 (Thu) at 21:35, pageexec@freemail.hu wrote:
> On 30 Dec 2010 at 20:29, "Tóth Attila" wrote:
>
>> There were two screen shots attached. The older one was outdated related
>> to 2.6.32 kernel.
>>
>> But the other was a recent panic.
>
> unfortunately this one had the first oops scroll away already, so i can't tell
> much about it...
>
>> So here is another one. This time I could paste it from the log:
>
> this is again some fs/journaling code trying to increment some seemingly invalid
> pointer (in eax), there's probably some memory corruption going on here and it'd
> be important to try both vanilla and -r7.
>
>> It happens during IO activity. I wouldn't say heavy IO. The memory is OK,
>> the harddrive is perfect.
>> I can dd the whole hdd to my backup booting on a gentoo CD.
>
> is the filesystem ok as well (fsck)?
>
>
>
Re: Disappearing root on 2.6.36-hardened-r6 upgrade
Forgotten attachment
--
dr Tóth Attila, Radiológus, 06-20-825-8057
Attila Toth MD, Radiologist, +36-20-825-8057

On 30 December 2010 (Thu) at 21:35, pageexec@freemail.hu wrote:
> On 30 Dec 2010 at 20:29, "Tóth Attila" wrote:
>
>> There were two screen shots attached. The older one was outdated related
>> to 2.6.32 kernel.
>>
>> But the other was a recent panic.
>
> unfortunately this one had the first oops scroll away already, so i can't tell
> much about it...
>
>> So here is another one. This time I could paste it from the log:
>
> this is again some fs/journaling code trying to increment some seemingly invalid
> pointer (in eax), there's probably some memory corruption going on here and it'd
> be important to try both vanilla and -r7.
>
>> It happens during IO activity. I wouldn't say heavy IO. The memory is OK,
>> the harddrive is perfect.
>> I can dd the whole hdd to my backup booting on a gentoo CD.
>
> is the filesystem ok as well (fsck)?
>
>
>
Re: Disappearing root on 2.6.36-hardened-r6 upgrade
On 4 Jan 2011 at 14:52, "Tóth Attila" wrote:

> No errors were found after 12 hours of memtest.
>
> However some serious crashes still occur.
>
> I attach snippets of kern.log.
>
> Does this still suggest a hardware error?

when i said memory corruption, i didn't mean a hw error but a sw one
that causes it ;). and i wonder whether the buggy code is in vanilla
already or not since we don't really touch the failing code directly.
Re: Disappearing root on 2.6.36-hardened-r6 upgrade
On 4 Jan 2011 at 19:38, "Tóth Attila" wrote:

> Would it be possible that the CPU itself is actually failing (opcode 0000)?

not in this case, always look at the first problem, everything else may very
well be just collateral damage. and that's a BUG_ON so it's the kernel that
detects some bad condition. and since that code and condition are fs related,
it's probably best to let the fs guys debug it but they'll deal with it only
if you can reproduce it with vanilla.
Re: Disappearing root on 2.6.36-hardened-r6 upgrade
I see. I have now fired up my spare notebook and transferred the system in
the meantime. :P

I'm currently suffering from crashes occurring while I'm transcoding a
scientific event's DVD content. It has become very frustrating.

Would it be possible that the CPU itself is actually failing (opcode 0000)?
The temperature is absolutely within normal limits even during heavy
usage, so I'm sure it's not because of overheating. It's a Pentium M
1.8 GHz, and the notebook's fan is OK.

I'll give vanilla a spin, nevertheless. How could I get closer to the
failing code in case of a kernel problem? Are there any useful suggestions,
besides changing architecture (which is not possible at the moment)?
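
One common way to get closer to the failing code is to take the symbol+offset from the "EIP is at" line of the first oops and look it up in a kernel image built with debug info. A minimal sketch of the first step; `oops_location` is a made-up helper, and it expects the symbol on the same line as "EIP is at":

```shell
# Hypothetical helper: read an oops on stdin and print the "symbol+offset"
# of the faulting instruction from the first "EIP is at" line.
oops_location() {
    # Capture everything between "EIP is at " and the "/0x<size>" suffix.
    sed -n 's|.*EIP is at \([^ ]*\)/0x[0-9a-f]*.*|\1|p' | head -n 1
}

# Example (the log path is a placeholder):
#   oops_location < /var/log/kern.log
```

With a vmlinux built from the same source and config with CONFIG_DEBUG_INFO enabled, running `gdb vmlinux` and then `list *(journal_commit_transaction+0x6f7)` should point at the corresponding source line. The kernel tree's scripts/decodecode can likewise disassemble the "Code:" line of an oops fed to it on stdin.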

Thanks:
Dw.
--
dr Tóth Attila, Radiológus, 06-20-825-8057
Attila Toth MD, Radiologist, +36-20-825-8057

On 4 January 2011 (Tue) at 17:46, pageexec@freemail.hu wrote:
> On 4 Jan 2011 at 14:52, "Tóth Attila" wrote:
>
>> No errors were found after 12 hours of memtest.
>>
>> However some serious crashes still occur.
>>
>> I attach snippets of kern.log.
>>
>> Does this still suggest a hardware error?
>
> when i said memory corruption, i didn't mean a hw error but a sw one
> that causes it ;). and i wonder whether the buggy code is in vanilla
> already or not since we don't really touch the failing code directly.
>
>
>
Re: Disappearing root on 2.6.36-hardened-r6 upgrade
I'd like to give some feedback regarding the crashes I've reported.
I transferred my system to my spare laptop (exactly the same model). I
haven't experienced any hangups or file system problems so far, using the
same kernel (hardened-sources-2.6.36-r7) and performing the same tasks,
including a regular weekly upgrade (at least xulrunner).
That leads me to think that my problems may have been caused by some
sort of hardware glitch. I would rather repair my laptop than order
another spare device. Since there were no problems running memtest for
12+ hours, I suspect a problem with either the CPU or the motherboard.
Replacing the motherboard lies beyond my resources, so I'll replace the
CPU. That is pretty convenient, especially because I have a spare CPU in
my drawer.

What would you guys suggest for testing the system, besides emerging
qt-gui? Is there any memtest equivalent for checking the CPU?

Thx:
Dw.
--
dr Tóth Attila, Radiológus, 06-20-825-8057
Attila Toth MD, Radiologist, +36-20-825-8057

On 4 January 2011 (Tue) at 19:18, pageexec@freemail.hu wrote:
> On 4 Jan 2011 at 19:38, "Tóth Attila" wrote:
>
>> Would it be possible that the CPU itself is actually failing (opcode 0000)?
>
> not in this case, always look at the first problem, everything else may very
> well be just collateral damage. and that's a BUG_ON so it's the kernel that
> detects some bad condition. and since that code and condition are fs related,
> it's probably best to let the fs guys debug it but they'll deal with it only
> if you can reproduce it with vanilla.
>
>
>
