Mailing List Archive

XML expat error
Hi,

I have written a piece of code that reads all XML files in a directory
in order to retrieve one element from each of them. All files
have the same XML structure. After file 123 I receive the following
error:

xml.parsers.expat.ExpatError: not well-formed (invalid token): line 554, column 20

I guess that either the element I try to read or the XML itself (which
would be strange, since all files were created with the same code) can't
be retrieved.

Is there a way to:
1. fix this problem so that I can retrieve the element, and
2. skip the invalid file after such an error, so that the program
continues reading the subsequent files? Some sort of error handling?

Here is the code I use:

from xml.dom import minidom
import os
path = "/Documents/programming/data/xml/"


dirList = os.listdir(path)
url_file=open('/Documents/programming/data/xml/test.txt','w')
for file in dirList:
    xmldoc = minidom.parse('/Documents/programming/data/xml/'+file)
    xml_elem = xmldoc.getElementsByTagName('webpage')
    web_elem = xml_elem[0]
    url = web_elem.attributes['uri']
    url_file.write(url.value + '\n')
url_file.close()
--
http://mail.python.org/mailman/listinfo/python-list
Re: XML expat error
"dirkheld" <dirkheld@gmail.com> wrote in message
news:babb6775-311d-4f7a-bc03-90f249e34180@s19g2000prg.googlegroups.com...

> xml.parsers.expat.ExpatError: not well-formed (invalid token): line
> 554, column 20
>
> I guess that the element I try to read or the XML (which would be
> strange since they have been created with the same code) can't be
> retrieved.

It's fairly easy to write non-robust XML generating code, and also
quick to test if one file is always bad. Drop it into a text editor or
Firefox, and take a quick look at line 554. Most likely some random
control character has sneaked in; it only takes (for example) one NUL
to make the document ill-formed.
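
If eyeballing the file does not turn anything up, a small script can look for such characters. The following is only a sketch, written for a current Python 3 rather than the Python 2.5 used in this thread; the names `find_illegal_bytes` and `check_file` are made up for illustration. It reports every control byte that XML 1.0 forbids, that is, everything below 0x20 except tab, line feed and carriage return.

```python
import re

# Control bytes that XML 1.0 does not allow anywhere in a document:
# everything below 0x20 except tab (0x09), LF (0x0A) and CR (0x0D).
ILLEGAL_XML_BYTES = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def find_illegal_bytes(data):
    """Return (line, column, byte value) for each forbidden control byte.

    Lines are counted from 1 and columns from 0; lines are split on LF only.
    """
    hits = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        for match in ILLEGAL_XML_BYTES.finditer(line):
            hits.append((lineno, match.start(), line[match.start()]))
    return hits


def check_file(filename):
    """Hypothetical helper: scan one file on disk, read as raw bytes."""
    with open(filename, "rb") as handle:
        return find_illegal_bytes(handle.read())
```

An empty list from `check_file` means none of these bytes are present, in which case the cause lies elsewhere (an unescaped `&` or `<`, for instance, which this scan does not look for).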



--
Re: XML expat error
On 27 feb, 17:18, "Richard Brodie" <R.Bro...@rl.ac.uk> wrote:
> "dirkheld" <dirkh...@gmail.com> wrote in message
>
> news:babb6775-311d-4f7a-bc03-90f249e34180@s19g2000prg.googlegroups.com...
>
> > xml.parsers.expat.ExpatError: not well-formed (invalid token): line
> > 554, column 20
>
> > I guess that the element I try to read or the XML (which would be
> > strange since they have been created with the same code) can't be
> > retrieved.
>
> It's fairly easy to write non-robust XML generating code, and also
> quick to test if one file is always bad. Drop it into a text editor or
> Firefox, and take a quick look at line 554. Most likely some random
> control character has sneaked in; it only takes (for example) one NUL
> to make the document ill-formed.

Something strange here. The XML file causing the problem has only 361
lines. Isn't there a way to catch this error, ignore it and continue
with the rest of the files?
This is the full error report:

Traceback (most recent call last):
  File "xmltest.py", line 10, in <module>
    xmldoc = minidom.parse('/Documents/programming/data/xml/'+file)
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/xml/dom/minidom.py", line 1913, in parse
    return expatbuilder.parse(file)
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/xml/dom/expatbuilder.py", line 924, in parse
    result = builder.parseFile(fp)
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/xml/dom/expatbuilder.py", line 207, in parseFile
    parser.Parse(buffer, 0)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 554, column 20
--
Re: XML expat error
On Wed, 27 Feb 2008 14:02:25 -0800, dirkheld wrote:

> Something strange here. The XML file causing the problem has only 361
> lines. Isn't there a way to catch this error, ignore it and continue
> with the rest of the files?

Yes of course: handle the exception instead of letting it propagate to the
top level and ending the program.
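
A minimal sketch of what that could look like for the loop in the original post, assuming a current Python 3 (on Python 2.5 the clause would be spelled `except ExpatError, error:`). The function name `collect_uris` and the `.xml` filename filter are assumptions added for illustration; the `webpage` element and `uri` attribute come from the posted code.

```python
import os
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def collect_uris(directory):
    """Return (uris, skipped) for the .xml files found in `directory`."""
    uris, skipped = [], []
    for name in sorted(os.listdir(directory)):
        # Assumption: only *.xml files are wanted, so the output file
        # test.txt and other stray files in the directory are ignored.
        if not name.endswith(".xml"):
            continue
        filename = os.path.join(directory, name)
        try:
            xmldoc = minidom.parse(filename)
        except ExpatError as error:
            # Remember the broken file and carry on with the next one.
            skipped.append((name, str(error)))
            continue
        elements = xmldoc.getElementsByTagName("webpage")
        if elements and elements[0].hasAttribute("uri"):
            uris.append(elements[0].getAttribute("uri"))
    return uris, skipped
```

Returning the skipped names together with the parser message makes it easy to see afterwards which files need a closer look, instead of silently dropping them.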

Ciao,
Marc 'BlackJack' Rintsch
--
Re: XML expat error
On 28 feb, 08:18, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
> On Wed, 27 Feb 2008 14:02:25 -0800, dirkheld wrote:
> > Something strange here. The XML file causing the problem has only 361
> > lines. Isn't there a way to catch this error, ignore it and continue
> > with the rest of the files?
>
> Yes of course: handle the exception instead of letting it propagate to the
> top level and ending the program.
>
> Ciao,
> Marc 'BlackJack' Rintsch

Ehm, maybe a stupid question... how? I'm rather new to Python and I
have never used error handling.
--
Re: XML expat error
dirkheld wrote:
> On 28 feb, 08:18, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
>> On Wed, 27 Feb 2008 14:02:25 -0800, dirkheld wrote:
>>> Something strange here. The XML file causing the problem has only 361
>>> lines. Isn't there a way to catch this error, ignore it and continue
>>> with the rest of the files?
>> Yes of course: handle the exception instead of letting it propagate to the
>> top level and ending the program.
>>
>> Ciao,
>> Marc 'BlackJack' Rintsch
>
> Ehm, maybe a stupid question... how? I'm rather new to Python and I
> have never used error handling.

Care to read the tutorial?

Stefan
--
Re: XML expat error
On Thu, 28 Feb 2008 12:37:10 -0800, dirkheld wrote:

>> Yes of course: handle the exception instead of letting it propagate to the
>> top level and ending the program.
>
> Ehm, maybe a stupid question... how? I'm rather new to Python and I
> have never used error handling.

Then you should work through the tutorial in the docs, at least until
section 8.3 Handling Exceptions:

http://docs.python.org/tut/node10.html#SECTION0010300000000000000000

Ciao,
Marc 'BlackJack' Rintsch