Mailing List Archive

Charset for logs
Ever since we started to use RT (before 3.8.7, now 4.0.4), it doesn't seem to use the correct charset for logging. All Norwegian characters (æøå) become: �. I can see this because we have scrips that contain Norwegian characters, and every time a scrip is launched, it is logged to the Apache log. Today I also noticed that if I subscribe to a dashboard with Norwegian characters in its name, the subject of the email sent out also has this problem (� instead of æ, ø or å). The email body, however, has the correct charset. There are no charset problems in the web UI. How can this be fixed?
Thanks
Re: Charset for logs
On Fri, Jul 20, 2012 at 09:24:22AM +0200, Ole Jon Bjørkum wrote:
> Ever since we started to use RT (before 3.8.7, now 4.0.4), it doesn't seem to use the correct
> charset for logging. All Norwegian characters (æøå) become: �. I can see this because we
> have scrips that contain Norwegian characters, and every time a scrip is launched, it is
> logged to the Apache log.

How are you logging, Syslog, Screen, File? RT has several different ways
to log and it's impossible to test without knowing.

> Today I also noticed that if I subscribe to a dashboard with
> Norwegian characters in its name, the subject of the email sent out also has this problem (�
> instead of æ, ø or å). The email body, however, has the correct charset. There are no charset
> problems in the web UI. How can this be fixed?

Please provide a raw Subject: line so we can see what's going on.
Re: Charset for logs
RT is installed from the Ubuntu repository, and the installation seems to log to /var/log/syslog and /var/log/apache2/error.log. However, I just discovered that it is only the Apache log that has charset problems. The syslog shows all characters correctly. Also, the Apache log logs in GMT while the syslog logs in the correct timezone, but I guess that is how it's supposed to be.
I'm not quite sure what you mean by raw subject line.
This is what shows up in Outlook's internet headers: Alle nye og ?pne saker
This is how the subject line looks in Outlook: Alle nye og �pne saker jeg eier
The question mark should be the character "å", so the word should be "åpne".
The message body uses the correct charset (I can see that UTF-8 is specified in the HTML).
Thanks
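
One plausible way a subject like the one above ends up with a "?" is that the text was forced into ASCII somewhere along the way instead of being RFC 2047 encoded. The sketch below only illustrates that mechanism with Python's standard `email.header` module; it is an assumption about the cause, not a description of what RT or Outlook actually does.

```python
from email.header import Header, decode_header, make_header

subject = "Alle nye og åpne saker jeg eier"

# Forcing the text into ASCII replaces each unrepresentable character with "?".
lossy = subject.encode("ascii", errors="replace").decode("ascii")
print(lossy)  # Alle nye og ?pne saker jeg eier

# RFC 2047 instead wraps the UTF-8 bytes in an ASCII-only "encoded word".
encoded = Header(subject, charset="utf-8").encode()
print(encoded)

# A compliant mail client reverses the encoding and recovers the original text.
decoded = str(make_header(decode_header(encoded)))
print(decoded)
```

If the raw header on the wire looks like the `lossy` value rather than the `encoded` one, the character was lost before the mail client ever saw it.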

> Date: Fri, 20 Jul 2012 08:17:45 -0700
> From: falcone@bestpractical.com
> To: rt-users@lists.bestpractical.com
> Subject: Re: [rt-users] Charset for logs
>
> On Fri, Jul 20, 2012 at 09:24:22AM +0200, Ole Jon Bjørkum wrote:
> > Ever since we started to use RT (before 3.8.7, now 4.0.4), it doesn't seem to use the correct
> > charset for logging. All Norwegian characters (æøå) become: �. I can see this because we
> > have scrips that contain Norwegian characters, and every time a scrip is launched, it is
> > logged to the Apache log.
>
> How are you logging, Syslog, Screen, File? RT has several different ways
> to log and it's impossible to test without knowing.
>
> > Today I also noticed that if I subscribe to a dashboard with
> > Norwegian characters in its name, the subject of the email sent out also has this problem (�
> > instead of æ, ø or å). The email body, however, has the correct charset. There are no charset
> > problems in the web UI. How can this be fixed?
>
> Please provide a raw Subject: line so we can see what's going on.
Re: Charset for logs
On Mon, Jul 23, 2012 at 09:36:27AM +0200, Ole Jon Bjørkum wrote:
> RT is installed from the Ubuntu repository, and the installation seems to log to
> /var/log/syslog and /var/log/apache2/error.log. However, I just discovered that it is only the
> Apache log that has charset problems. The syslog shows all characters correctly. Also, the
> Apache log logs in GMT while the syslog logs in the correct timezone, but I guess that is how
> it's supposed to be.

RT prints logs in GMT, when those pass through syslog, syslog will add
an additional timestamp. Apache however keeps the RT timestamps.
Is it just RT's messages in the apache logs that are corrupt, or is
something as simple as a request to /Test/latin1pagename.html
corrupted in the access/error log? RT should be pushing out UTF-8 but
I'm not sure if RT is doing something wrong or if apache is corrupting
it.

> I'm not quite sure what you mean by raw subject line.
> This is what shows up in Outlook's internet headers: Alle nye og ?pne saker
> This is how the subject line looks in Outlook: Alle nye og �pne saker jeg eier
> The question mark should be the character "å", so the word should be "åpne"
> The message body uses the correct charset (I can see that UTF-8 is specified in the HTML).

I mean the raw on-disk header. Subject: lines are encoded if they
contain UTF-8, so something like this:
Subject: =?UTF-8?B?4pyIVEhSRUUgQ29vbCBEZWFscyBGcm9tIEFtZXJpY2FuIEFpcmxpbmVz?=
If you have an email that is consistently corrupted when passing
through RT, if you can capture a raw version of the email (so not the
.msg file from Outlook, but something caught further upstream, before
it gets to rt-mailgate preferably) please zip it up and send it into
the RT bug tracker, along with your System Configuration page which
contains a ton of information such as perl module versions, some of
which are known-bad.

-kevin
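
To check what an encoded Subject: line like the example above actually contains, it can be decoded with Python's standard `email.header` module. This is a small sketch for inspecting a captured header; the variable names are illustrative only.

```python
from email.header import decode_header, make_header

raw_subject = "=?UTF-8?B?4pyIVEhSRUUgQ29vbCBEZWFscyBGcm9tIEFtZXJpY2FuIEFpcmxpbmVz?="

# decode_header splits the value into (bytes, charset) pairs;
# make_header joins them back into one readable string.
parts = decode_header(raw_subject)
readable = str(make_header(parts))
print(readable)
```

A header that decodes cleanly this way was encoded correctly by the sender, which points the investigation at whatever handles the message afterwards.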

Re: Charset for logs
If I create an HTML file named æøå.html and access it through Apache, the access log says "GET /%C3%A6%C3%B8%C3%A5.html". It seems to URL-encode it or something. If I then delete the HTML file and try to access it, the error log says "File does not exist: /var/www/\xc3\xa6\xc3\xb8\xc3\xa5.html". Not at all readable either... Maybe it's Apache's fault.
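
Both log entries are escaped forms of the same UTF-8 bytes, so they can be turned back into readable text. The following Python sketch assumes the log lines use exactly the `%XX` and `\xNN` notations shown above; the variable names are made up for illustration.

```python
from urllib.parse import unquote

access_log_path = "/%C3%A6%C3%B8%C3%A5.html"
error_log_path = r"/var/www/\xc3\xa6\xc3\xb8\xc3\xa5.html"

# Percent-encoding: each %XX is one byte; unquote decodes the bytes as UTF-8.
print(unquote(access_log_path))  # /æøå.html

# Error-log style: each \xNN is one byte. Turn the escapes back into
# raw bytes first, then decode those bytes as UTF-8.
raw_bytes = error_log_path.encode("ascii").decode("unicode_escape").encode("latin-1")
print(raw_bytes.decode("utf-8"))  # /var/www/æøå.html
```

Since both forms decode to æøå, the bytes reaching the log are valid UTF-8 and are only being escaped when written.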
I don't know how I can capture the raw e-mails that RT sends out for dashboard subscriptions. They are sent by RT through Postfix on the local server. Please tell me if you know how.

> Date: Tue, 31 Jul 2012 12:09:57 -0400
> From: falcone@bestpractical.com
> To: rt-users@lists.bestpractical.com
> Subject: Re: [rt-users] Charset for logs
>
> On Mon, Jul 23, 2012 at 09:36:27AM +0200, Ole Jon Bjørkum wrote:
> > RT is installed from the Ubuntu repository, and the installation seems to log to
> > /var/log/syslog and /var/log/apache2/error.log. However, I just discovered that it is only the
> > Apache log that has charset problems. The syslog shows all characters correctly. Also, the
> > Apache log logs in GMT while the syslog logs in the correct timezone, but I guess that is how
> > it's supposed to be.
>
> RT prints logs in GMT, when those pass through syslog, syslog will add
> an additional timestamp. Apache however keeps the RT timestamps.
> Is it just RT's messages in the apache logs that are corrupt, or is
> something as simple as a request to /Test/latin1pagename.html
> corrupted in the access/error log? RT should be pushing out UTF-8 but
> I'm not sure if RT is doing something wrong or if apache is corrupting
> it.
>
> > I'm not quite sure what you mean by raw subject line.
> > This is what shows up in Outlook's internet headers: Alle nye og ?pne saker
> > This is how the subject line looks in Outlook: Alle nye og �pne saker jeg eier
> > The question mark should be the character "å", so the word should be "åpne"
> > The message body uses the correct charset (I can see that UTF-8 is specified in the HTML).
>
> I mean the raw on-disk header. Subject: lines are encoded if they
> contain UTF-8, so something like this:
> Subject: =?UTF-8?B?4pyIVEhSRUUgQ29vbCBEZWFscyBGcm9tIEFtZXJpY2FuIEFpcmxpbmVz?=
> If you have an email that is consistently corrupted when passing
> through RT, if you can capture a raw version of the email (so not the
> .msg file from Outlook, but something caught further upstream, before
> it gets to rt-mailgate preferably) please zip it up and send it into
> the RT bug tracker, along with your System Configuration page which
> contains a ton of information such as perl module versions, some of
> which are known-bad.
>
> -kevin