


Posted Jan 3, 2014 16:47 UTC (Fri) by meyert (subscriber, #32097)
Parent article: OpenRefine

Why not use an ETL tool like Pentaho Kettle for this kind of data processing, or even Apache Hadoop, which is also great at handling "unstructured" data?!



Posted Jan 9, 2014 22:58 UTC (Thu) by Wol (guest, #4433)

Or use a tool that's as old as relational (actually, older).

Of course, I'm going to suggest Pick :-)

But that "complex cell transformation" stuff is something I've been doing with Pick ever since I first met it in the mid-80s.

It's INCREDIBLY simple with Pick. First import the data into Pick (it doesn't enforce structure, so that's easy). Then create virtual columns, using Pick BASIC code, that mirror your raw columns, just cleaned up.

Then you can do SELECT FILE WITH RAW_DATA NE CLEANED_DATA and you'll find all the records where the data is duff.
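To make the idea concrete outside Pick, here is a minimal Python sketch of the same pattern: a "virtual column" computed on read from the raw field, and a filter that keeps only records where raw and cleaned values differ. The cleaning rule (`clean_name`, collapse whitespace and title-case), the `NAME` field and the sample records are all hypothetical, standing in for whatever Pick BASIC code you would actually attach to the virtual column.

```python
def clean_name(raw: str) -> str:
    # Hypothetical cleaning rule: collapse runs of whitespace, then title-case.
    return " ".join(raw.split()).title()


# Raw records imported as-is; nothing enforces structure on the values.
records = {
    "1": {"NAME": "Alice Smith"},
    "2": {"NAME": "  bob   jones "},
    "3": {"NAME": "Carol White"},
    "4": {"NAME": "dave brown"},
}


def virtual_column(record: dict) -> str:
    # Computed each time it is read, never stored: the cleaned view of NAME.
    return clean_name(record["NAME"])


# Rough analogue of: SELECT FILE WITH RAW_DATA NE CLEANED_DATA
duff_ids = [rid for rid, rec in records.items() if rec["NAME"] != virtual_column(rec)]
print(duff_ids)  # ['2', '4']
```

Because the cleaned value is derived rather than stored, changing the cleaning rule immediately changes which records get flagged, with no data rewrite.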

And it's far faster than SQL :-)

Mind you, one of my colleagues (probably in the early 2000s) amazed the instructor on a SQL training course by pulling the same stunt in MS SQL Server. Some of the delegates, and the instructor, twigged what was happening and were delighted to learn this new trick. Others just couldn't understand it or see the point of it.
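We don't know exactly how the colleague set it up, but here is one way to sketch the SQL version, using SQLite through Python's `sqlite3` module as a stand-in for MS SQL Server. The cleaning logic is registered as a SQL-callable function with `create_function` and compared against the raw column in the `WHERE` clause. The table, column and function names (`people`, `name`, `clean_name`) are hypothetical.

```python
import sqlite3


def clean_name(raw):
    # Hypothetical cleaning rule: collapse whitespace and title-case; pass NULLs through.
    return None if raw is None else " ".join(raw.split()).title()


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL can call it like a built-in (1 argument).
conn.create_function("clean_name", 1, clean_name)

conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO people (id, name) VALUES (?, ?)",
    [(1, "Alice Smith"), (2, "  bob   jones "), (3, "Carol White"), (4, "dave brown")],
)

# The "virtual column" is clean_name(name), derived at query time.
# Keep only the rows whose raw value differs from the cleaned one.
duff = conn.execute(
    "SELECT id, name, clean_name(name) FROM people "
    "WHERE name <> clean_name(name) ORDER BY id"
).fetchall()

for row in duff:
    print(row)
```

In SQL Server itself, a computed column on the table would play the role that the inline `clean_name(name)` expression plays here.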



Posted May 9, 2016 11:29 UTC (Mon) by diegor (subscriber, #1967)

Pick is quite a generic name. Do you have a link I could look at?


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds