Surprisingly relevant?

Posted Nov 19, 2020 11:57 UTC (Thu) by motiejus (subscriber, #92837)
In reply to: Surprisingly relevant? by warrax
Parent article: The state of the AWK

Recently I used awk to filter a few hundred gigabytes of LIDAR data, clipping it to a bounding box I was interested in:

#!/usr/bin/awk -f
BEGIN { FS = "," }
$1 > ymin && $1 < ymax && $2 > xmin && $2 < xmax {print $2 "," $1 "," $3}
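
The program above never sets ymin, ymax, xmin or xmax itself, so they have to come from outside. A minimal sketch of one way to do that, using awk's standard -v option; the sample rows and the limits are made up, and the y,x,z column order is inferred from the comparisons in the filter:

```shell
# Hypothetical sample: columns are y,x,z, as the filter above implies.
sample='10,20,1.5
50,60,2.5
30,40,3.5'

# Bounding-box limits passed with -v; the values are made up for illustration.
clipped=$(printf '%s\n' "$sample" |
  awk -F, -v ymin=5 -v ymax=40 -v xmin=15 -v xmax=45 \
    '$1 > ymin && $1 < ymax && $2 > xmin && $2 < xmax {print $2 "," $1 "," $3}')

# Rows inside the box come out reordered as x,y,z.
printf '%s\n' "$clipped"
```

Only the first and third rows fall inside the box, so two lines are printed. The comparisons are strict, so points lying exactly on an edge are dropped.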

I just ran this again on a subset of the data (100M data points, 2.7GB uncompressed) to have numbers for this comment. My 8-core laptop did the whole operation in 29 seconds:
1. each file: unzip to memory.
2. each file: run through the program above for the bounding box.
3. each file: sort.
4. all files: merge sort.
5. all files: compress.
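
The five steps above can be sketched as a small shell pipeline. This is not the Makefile from the post, only an illustration under assumptions: gzip as the compression format, plain byte-wise sort order, made-up file names, sample rows and bounding-box limits.

```shell
export LC_ALL=C                      # byte-wise ordering, same for sort and sort -m
work=$(mktemp -d)

# Two tiny made-up input tiles in y,x,z order, gzip-compressed (format assumed).
printf '30,40,3.5\n10,20,1.5\n' | gzip > "$work/a.csv.gz"
printf '50,60,2.5\n20,30,0.5\n' | gzip > "$work/b.csv.gz"

# Steps 1-3, per file: decompress into a pipe, clip to the box, sort.
for f in "$work"/*.csv.gz; do
  gzip -dc "$f" |
    awk -F, -v ymin=5 -v ymax=40 -v xmin=15 -v xmax=45 \
      '$1 > ymin && $1 < ymax && $2 > xmin && $2 < xmax {print $2 "," $1 "," $3}' |
    sort > "${f%.csv.gz}.sorted"
done

# Steps 4-5, all files: merge the already sorted pieces, then compress.
sort -m "$work"/*.sorted | gzip > "$work/clipped.csv.gz"

# Read the result back to check it, then clean up the scratch directory.
merged=$(gzip -dc "$work/clipped.csv.gz")
rm -rf "$work"
printf '%s\n' "$merged"
```

The loop here runs one file at a time. Expressing the per-file steps as Make rules instead would let `make -j` spread them across cores, while `sort -m` only has to interleave inputs that are already sorted.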

Combined with GNU Make, `sort` and `sort -m`, I can't imagine a more powerful combination of tools for this simple "big data"(?) task.

No, awk is not dead, and spending half an hour[1] learning it is enough to use it for life. :)

[1]: https://ferd.ca/awk-in-20-minutes.html


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds