Mailing List Archive

python ETL
Hi,
My company is involved in the development of many data marts and
data-warehouses, and I am currently looking into migrating our old set of
tools (written in Korn) to a new, more dynamic and robust one. I am
looking into python as I have heard that it could be a good contestant
for the job, and wanted to know if anyone knew of an existing open
source project which implements ETL using python, or any libraries that
may ease the production of such tools.

Thanks.

--
http://mail.python.org/mailman/listinfo/python-list
Re: python ETL
arielgr@gmail.com wrote:
> Hi,
> My company is involved in the development of many data marts and
> data-warehouses, and I am currently looking into migrating our old set of
> tools (written in Korn) to a new, more dynamic and robust one. I am
> looking into python as I have heard that it could be a good contestant
> for the job, and wanted to know if anyone knew of an existing open
> source project which implements ETL using python, or any libraries that
> may ease the production of such tools.

I'm not an expert in such matters; I had to Google for the definition of
ETL ("extract, transform, and load", which appears to just be a buzzword
for "data munging"). But it seems to me that "ETL" is so utterly broad
in scope that we can't tell you anything until you give us some more
information.

What are your sources of data? What kind of data are you dealing with?
What kinds of munging do you want to do? What formats are the data going to?

However, given that your current toolset is written as Korn shell
scripts, I'm pretty confident that Python will be up to the task.
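
To make "extract, transform, and load" concrete, here is a minimal sketch
using only the Python standard library. Everything specific in it is an
assumption made up for illustration: the CSV layout with `customer` and
`amount` columns, the `sales` table, and the function names. It is not
based on the original poster's actual sources or targets, which are unknown.

```python
import csv
import io
import sqlite3


def extract(source):
    """Extract: yield one dict per row from a CSV file-like object."""
    yield from csv.DictReader(source)


def transform(rows):
    """Transform: tidy customer names, convert amounts, skip unusable rows."""
    for row in rows:
        try:
            customer = row["customer"].strip().title()
            amount = float(row["amount"])
        except (KeyError, AttributeError, TypeError, ValueError):
            # Missing column, short row, or non-numeric amount: drop the row.
            continue
        yield customer, amount


def load(records, conn):
    """Load: insert cleaned records into a hypothetical `sales` table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO sales VALUES (?, ?)", records)


def run_etl(source, conn):
    """Chain the three stages; rows stream through without being buffered."""
    load(transform(extract(source)), conn)


if __name__ == "__main__":
    # Assumed sample input; real data would come from files or other systems.
    sample = io.StringIO("customer,amount\n alice ,10.5\nbob,n/a\nCAROL,3\n")
    conn = sqlite3.connect(":memory:")
    run_etl(sample, conn)
    for row in conn.execute("SELECT customer, amount FROM sales ORDER BY customer"):
        print(row)
```

Each stage is a generator feeding the next, much like a shell pipeline, so
swapping in a different source or target only touches one function.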

--
Robert Kern
rkern@ucsd.edu

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter

Re: python ETL
arielgr@gmail.com wrote:
> Hi,
> My company is involved in the development of many data marts and
> data-warehouses, and I am currently looking into migrating our old set of
> tools (written in Korn) to a new, more dynamic and robust one. I am
> looking into python as I have heard that it could be a good contestant
> for the job, and wanted to know if anyone knew of an existing open
> source project which implements ETL using python, or any libraries that
> may ease the production of such tools.
>
> Thanks.

Robert is right; you have not really given much information.

However, I would have to assume that if homebrew shell scripts have been
doing the work adequately, then the marts and warehouses are not very
large and the datasets are primarily text rather than binary.

If this is the case and you are only seeking incremental improvement,
then Python would be a very good choice. Perl would also do the job.
Just about any language would work. Yes, there are many reasons to
choose Python. However, you would have to build any scalability and
metadata management.

If you seek a radical improvement, it is available, but I do not know of
any free tools that will do it. A question like this will probably not
be answered in a newsgroup post or even the exchange of a few emails.

Choosing an effective tool for the organization is not a trivial
process. It requires knowledge of both the tools and the organization's
methodologies and processes. If you do not have staff who can do this,
most companies find it is much cheaper and faster to pay someone who
does know (a consultant) to assist them in assessing their requirements,
tool selection, and forming an implementation plan.

Yes, your company staff can learn a lot by experimenting and playing
with several tools, but shareholders might not view that approach as the
most effective.
Re: python ETL
On Mon, 01 Aug 2005 10:49:36 -0500, Paul Watson <pwatson@redlinepy.com> wrote:
> arielgr@gmail.com wrote:
>> Hi,
>> My company is involved in the development of many data marts and
>> data-warehouses, and I am currently looking into migrating our old set of
>> tools (written in Korn) to a new, more dynamic and robust one.
...
> However, I would have to assume that if homebrew shell scripts have been
> doing the work adequately, then the marts and warehouses are not very
> large and the datasets are primarily text rather than binary.
>
> If this is the case and you are only seeking incremental improvement,
> then Python would be a very good choice. Perl would also do the job.
> Just about any language would work. Yes, there are many reasons to
> choose Python. However, you would have to build any scalability and
> metadata management.
>
> If you seek a radical improvement, it is available, but I do not know of
> any free tools that will do it. A question like this will probably not
> be answered in a newsgroup post or even the exchange of a few emails.
>
> Choosing an effective tool for the organization is not a trivial
> process. It requires knowledge of both the tools and the organization's
> methodologies and processes. If you do not have staff who can do this,
> most companies find it is much cheaper and faster to pay someone who
> does know (a consultant) to assist them in assessing their requirements,
> tool selection, and forming an implementation plan.

But remember: sometimes, a bunch of shell scripts or a Python script is the
right tool for the problem.

Sometimes, I think a bunch of shell scripts is the right tool for a lot of
the problems people throw XMLthis, XMLthat, .NET, SQL servers, consultants
and money at.

There is no real reason (with the little information we have[1]) to believe
that the original poster is doing his employer a disservice by looking at
doing things himself, in plain old Python, instead of letting someone tear
down and rebuild whatever workflow/methodology/process stuff they have right
now.

/Jorgen
[1] Unless "ETL" and "data mart" carry some deep meaning which
I've missed, that is.

--
// Jorgen Grahn <jgrahn@ Ph'nglui mglw'nafh Cthulhu
\X/ algonet.se> R'lyeh wgah'nagl fhtagn!