Mailing List Archive

how to extract columns like awk $1 $5
Hi

Is there a simple way to extract words separated by a space in Python
the way I do it in awk '{print $4 $5}'? I am sure there should be some,
but I don't know it.

Thanks
n00b


--
http://mail.python.org/mailman/listinfo/python-list
Re: how to extract columns like awk $1 $5
On Sat, 2005-01-08 at 01:15, Anand S Bisen wrote:
> Hi
>
> Is there a simple way to extract words separated by a space in Python
> the way I do it in awk '{print $4 $5}'? I am sure there should be some,
> but I don't know it.

The 'str.split' method is probably what you want:

>>> x = "The confused frog mumbled something about foxes"
>>> x.split()
['The', 'confused', 'frog', 'mumbled', 'something', 'about', 'foxes']
>>> x.split(" ")[4:6]
['something', 'about']

so if 'x' is your string, the rough equivalent of that awk statement
(awk numbers fields from 1, so $4 and $5 are indexes 3 and 4) is:

>>> x_words = x.split()
>>> print x_words[3], x_words[4]

or perhaps

>>> print "%s %s" % tuple(x.split()[3:5])

--
Craig Ringer

Re: how to extract columns like awk $1 $5
It takes a few more lines in Python, but you can do something like

for text in open("file.txt","r"):
    words = text.split()
    print words[3],words[4]

(awk numbers its fields from 1, so $4 and $5 are words[3] and words[4]
in Python.)
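
For anyone reading this later, here is a minimal sketch of the same loop in Python 3 syntax. The function name `columns` and its parameters are made up for illustration, and `io.StringIO` stands in for the real file so the example runs on its own. Field numbers are 1-based, as in awk. A line with too few words still raises IndexError here.

```python
import io

def columns(lines, first=4, second=5):
    """Yield two awk-style (1-based) fields from each line, joined by a space."""
    for line in lines:
        words = line.split()
        # awk's $4 is words[3]: subtract 1 to get Python's 0-based index.
        yield words[first - 1] + " " + words[second - 1]

# io.StringIO is a stand-in for open("file.txt") in this sketch.
sample = io.StringIO("a b c d e f\n1 2 3 4 5 6\n")
for out in columns(sample):
    print(out)
```

With the sample data above this prints "d e" and then "4 5".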

Re: how to extract columns like awk $1 $5
On Fri, 07 Jan 2005 12:15:48 -0500, Anand S Bisen wrote:

> Is there a simple way to extract words separated by a space in Python
> the way I do it in awk '{print $4 $5}'? I am sure there should be some,
> but I don't know it.

mystr = '1 2 3 4 5 6'
parts = mystr.split()
print parts[3:5]

Jeremy

Re: how to extract columns like awk $1 $5
In article <mailman.310.1105118153.22381.python-list@python.org>,
Anand S Bisen <vmlinuz@abisen.com> wrote:
>Hi
>
>Is there a simple way to extract words separated by a space in Python
>the way I do it in awk '{print $4 $5}'? I am sure there should be some,
>but I don't know it.

Something along the lines of:

words = input.split()
print words[4], words[5]
Re: how to extract columns like awk $1 $5
roy@panix.com (Roy Smith) writes:
> Something along the lines of:
>
> words = input.split()
> print words[4], words[5]

That throws an exception if there are fewer than 6 fields, which might
or might not be what you want.
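
If the exception is not wanted, one option is to slice instead of index, since a slice that runs past the end is simply empty. A small sketch in Python 3 syntax follows. The helper name `field` is hypothetical, not something from the standard library.

```python
def field(words, n):
    """Return awk-style field n (1-based) from a list of words, or '' if absent."""
    if n < 1:
        return ""
    # words[n-1:n] is a list of zero or one items; it never raises IndexError.
    return "".join(words[n - 1:n])

words = "a b c".split()
print(repr(field(words, 2)))  # 'b'
print(repr(field(words, 5)))  # ''
```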
Re: how to extract columns like awk $1 $5
On Fri, 07 Jan 2005 12:15:48 -0500, Anand S Bisen wrote:

> Is there a simple way to extract words separated by a space in Python
> the way I do it in awk '{print $4 $5}'? I am sure there should be some,
> but I don't know it.

i guess it depends on how faithfully you want to reproduce awk's behavior
and options.

as several people have mentioned, strings have the split() method for
simple tokenization, but blindly indexing into the resulting sequence
can give you an out-of-range exception. out of range indexes are no
problem for awk; it would just return an empty string without complaint.

note that the index bases are slightly different: python sequences
start with index 0, while awk's fields begin with $1. there IS a $0,
but it means the entire unsplit line.

the split() method accepts a separator argument, which can be used to
replicate awk's -F option / FS variable.
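
a quick illustration of the separator argument, in Python 3 syntax. the sample strings are made up. note that split() with no argument is the form that behaves like awk's default field splitting (runs of whitespace), while split(" ") treats every single space as a separator:

```python
line = "alpha  beta\tgamma"   # two spaces, then a tab

# No argument: split on runs of any whitespace, like awk's default FS.
print(line.split())        # ['alpha', 'beta', 'gamma']

# Explicit " ": each single space separates, so an empty string appears
# and the tab is not treated as a separator at all.
print(line.split(" "))     # ['alpha', '', 'beta\tgamma']

# An explicit single-character separator, roughly what awk -F: would do.
record = "root:x:0:0"
print(record.split(":"))   # ['root', 'x', '0', '0']
```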

so, if you want to closely approximate awk's behavior without fear of
exceptions, you could try a small function like this:


def awk_it(instring,index,delimiter=" "):
    # index 0 (or less) selects the whole line, like awk's $0
    try:
        return [instring,instring.split(delimiter)[index-1]][max(0,min(1,index))]
    except IndexError:
        return ""


>>> print awk_it("a b c d e",0)
a b c d e

>>> print awk_it("a b c d e",1)
a

>>> print awk_it("a b c d e",5)
e

>>> print awk_it("a b c d e",6)


- dan
Re: how to extract columns like awk $1 $5
Dan Valentine <nobody@invalid.domain> wrote:

> On Fri, 07 Jan 2005 12:15:48 -0500, Anand S Bisen wrote:
>
> > Is there a simple way to extract words separated by a space in Python
> > the way I do it in awk '{print $4 $5}'? I am sure there should be some,
> > but I don't know it.
>
> i guess it depends on how faithfully you want to reproduce awk's behavior
> and options.
>
> as several people have mentioned, strings have the split() method for
> simple tokenization, but blindly indexing into the resulting sequence
> can give you an out-of-range exception. out of range indexes are no
> problem for awk; it would just return an empty string without complaint.

It's pretty easy to create a list type which has awk-ish behavior:

class awkList (list):
    def __getitem__ (self, key):
        try:
            return list.__getitem__ (self, key)
        except IndexError:
            return ""

l = awkList ("foo bar baz".split())
print "l[0] = ", repr (l[0])
print "l[5] = ", repr (l[5])

-----------

Roy-Smiths-Computer:play$ ./awk.py
l[0] = 'foo'
l[5] = ''

Hmmm. There's something going on here I don't understand. The ref
manual (3.3.5 Emulating container types) says for __getitem__(), "Note:
for loops expect that an IndexError will be raised for illegal indexes
to allow proper detection of the end of the sequence." I expected my
little demo class to therefore break for loops, but they seem to work
fine:

>>> import awk
>>> l = awk.awkList ("foo bar baz".split())
>>> l
['foo', 'bar', 'baz']
>>> for i in l:
...     print i
...
foo
bar
baz
>>> l[5]
''

Given that I've caught the IndexError, I'm not sure how that's working.
Re: how to extract columns like awk $1 $5
Roy Smith wrote:
> Hmmm. There's something going on here I don't understand. The ref
> manual (3.3.5 Emulating container types) says for __getitem__(), "Note:
> for loops expect that an IndexError will be raised for illegal indexes
> to allow proper detection of the end of the sequence." I expected my
> little demo class to therefore break for loops, but they seem to work
> fine:
>
> >>> import awk
> >>> l = awk.awkList ("foo bar baz".split())
> >>> l
> ['foo', 'bar', 'baz']
> >>> for i in l:
> ...     print i
> ...
> foo
> bar
> baz
> >>> l[5]
> ''
>
> Given that I've caught the IndexError, I'm not sure how that's working.


The title of that particular section is "Emulating container types",
which is not what you're doing, so it doesn't apply here. For built-in
types, iterators are at work. The list iterator probably doesn't even
call getitem, but accesses the items directly from the C structure.
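
A small experiment along these lines, in Python 3 syntax, seems to bear that out on CPython. The class names `SpyList` and `GetitemOnly` are made up for the demonstration. Both record every `__getitem__` call, so you can see which kind of iteration actually goes through it.

```python
class SpyList(list):
    """list subclass that records __getitem__ calls and never raises IndexError."""
    def __init__(self, *args):
        super().__init__(*args)
        self.calls = []

    def __getitem__(self, key):
        self.calls.append(key)
        try:
            return super().__getitem__(key)
        except IndexError:
            return ""

class GetitemOnly:
    """Plain class with only __getitem__, so iteration has to call it with 0, 1, 2, ..."""
    def __init__(self, items):
        self.items = list(items)
        self.calls = []

    def __getitem__(self, key):
        self.calls.append(key)
        return self.items[key]  # IndexError past the end is what stops the loop

spy = SpyList("foo bar baz".split())
seen = [w for w in spy]
print(seen, spy.calls)        # the list iterator never calls __getitem__

plain = GetitemOnly("foo bar baz".split())
seen_plain = [w for w in plain]
print(seen_plain, plain.calls)  # the fallback protocol calls it until IndexError
```

The first line of output shows an empty call list, and the second shows calls for indexes 0 through 3. So the note in the manual matters for classes like `GetitemOnly`, where swallowing the IndexError would make the loop run forever, but not for a subclass of list that inherits the built-in iterator.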
--
CARL BANKS
