[Novalug] Best programming language for "munging?" Not Perl? Ruby? Python?

William Sutton william at trilug.org
Wed Oct 28 11:03:55 EDT 2009


I've been using Perl 5 professionally, personally, and on pet projects 
since '98.  With the occasional exception of forays into server-side 
JavaScript, Progress 4GL (eww), and client-side JavaScript/HTML/CSS, it 
has been my primary tool in my career.  My first self-taught programming 
language was C (long since forgotten), I was exposed to C++, Lisp, Scheme, 
x86 assembler and Java in college, and the main programming language in 
college was Ada 95 (which I still remembered enough of a few years ago 
to implement the HP printer hack in).

My professional career has included Fortune 500 manufacturing, news CMS 
systems, and, of late, litigation support.

My Perl experience is extensive; procedural, OO, regex, etc.  I'm not 
Larry Wall, but I've done a thing or two of great complexity.

Having thus established my eminence ;-) we turn to the real issues:

1. Perl was designed to perform text processing.  It has also been 
designed to work the way humans think.  I find Java horribly clunky (not 
as much so as Ada 95) because my brain is wired for Perl.  I don't "think" 
about what the various syntax or operators are called; I just use them 
like so many Lego blocks.  Of course, from the interpreter side, it has 
become increasingly complex because of sacrifices to maintain backward 
compatibility, but from the programming side, it's incredibly easy to use.

2. Perl 6 is a response to all the various criticisms leveled at Perl over 
the years.  IMHO, it's on the Duke Nukem Forever development path. 
Besides which, there is an enormous amount of code in Perl 5 that will 
NEVER be rewritten.  People can talk about, evaluate, and rhapsodize over 
Perl 6 all day long, but it has a very steep road ahead to overcome the 
Perl 5 legacy.

3. Closely related to #2 is the fact that many small businesses are 
running Perl/Win-32 applications for their business processes.  Most of 
them will never be rewritten because you just don't break what already 
works.

4. Litigation support is about providing attorneys and paralegals with the 
information they need, when they need it.  How you go about that is up to 
you.  There are, of course, some old standards in the business: 
Concordance, Summation, IPro.  There are some new attempts at 
standardization (EDRM).  There are also a number of large third-party 
service providers out there (some of whom resell other products).  You 
could write your own document search and retrieval tool if you wanted.  At 
the root level, what you are providing are documents (with tif/jpeg 
page renders, document-level data, page-level or document-level extracted 
text, redaction capability, and various other associated services).  You 
could write a tool to process these sorts of things in C, or Java, or 
whatever you like.  For the extensive text processing involved during the 
import/export process (or to massage data into a different format), you 
probably want to use something designed for text processing.  I use Perl 
on a daily basis for these tasks.  I can work wonders in 15 minutes with a 
quick Perl script, doing what would take a Java or C programmer hours or 
days.  And I wouldn't want to mess with CPL for Concordance; I've heard 
horror stories from some very capable co-workers with even more extensive 
résumés than mine about CPL.
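
To make the import/export massaging concrete, here is a minimal sketch, 
written in Python only because it is one of the languages this thread is 
weighing.  It assumes a Concordance-style delimited load file that uses 
ASCII 20 as the field separator and the thorn character as the text 
qualifier; the field names, sample values, and function names are made up 
for illustration, and a real load file would also need handling for 
embedded newlines and character encodings.

```python
import csv
import io

FIELD_SEP = "\x14"    # assumed field separator (ASCII 20)
QUALIFIER = "\u00fe"  # assumed text qualifier (thorn)


def parse_line(line):
    """Split one delimited record into a list of unqualified field values."""
    values = []
    for field in line.rstrip("\r\n").split(FIELD_SEP):
        # Drop exactly one qualifier from each end when both are present.
        if len(field) >= 2 and field.startswith(QUALIFIER) and field.endswith(QUALIFIER):
            field = field[1:-1]
        values.append(field)
    return values


def dat_to_csv(dat_text):
    """Convert delimited load-file text to CSV text, skipping blank lines."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in dat_text.split("\n"):
        if line.strip():
            writer.writerow(parse_line(line))
    return out.getvalue()


# Hypothetical two-line load file: a header row and one document record.
sample = (
    "\u00feBEGDOC\u00fe\x14\u00feENDDOC\u00fe\x14\u00feCUSTODIAN\u00fe\n"
    "\u00feABC0001\u00fe\x14\u00feABC0004\u00fe\x14\u00feSmith, Jane\u00fe\n"
)
result = dat_to_csv(sample)
print(result)
```

The csv module handles the quoting on the way out, so a value containing 
a comma (the custodian name here) survives the round trip intact.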

Bottom line:  Perl 5 isn't dead, and won't be for a long time.  Sure, a 
lot of people who have written Perl have moved on to the next language in 
the Language-of-the-Month-Club, but its utility hasn't diminished. 
Finally, I find it highly useful for daily litigation support work.

William Sutton


On Wed, 28 Oct 2009, Paul D. Bain wrote:

> James Ewing Cottrell 3rd wrote:
>>>
>>> PROS:
>>>
>>> When Perl was made, it was a nice unification of sed/awk/sh/C. Easy to
>>> understand and Get Things Done. But it has turned into a monster, mostly
>>> because it pretended to be a Language when it stretched that term to the
>>> breaking point. It's a collection of Special Cases.
>>>
>>> Perl did get some things right. The idea of \special always being
>>> literal in regexps is nice.
>>>
>>> And taking the left or right values from || and && is WAY better than
>>> the stupid 0 or 1 of C, altho making them boolean results as in Java is
>>> useful.
>>>
>>> And it IS totally cool that you can create references to hashes of arrays
>>> of arrays of hashes on the fly all in one statement.
>>>
>>> CONS:
>>>
>>> The grammar is Context Sensitive and Ambiguous. Cardinal Sins.
>>>
>>> OK, let me ask you folks, what would you call the string =~ in the
>>> following statement: $var =~ /regexp/;
>>>
>>> Is it an operator? NO! OK, then what is it? Well, it's a Glorified
>>> Comma. The operator is "m" here, and "=~m" is the infix notation for
>>> "$var" and /regexp/ if you will.
>>>
>>> After using Perl for about 5 years I was calling for a Perl II language
>>> back in the early 90s. Well, they finally got to it with Perl 6, but
>>> they are going in the wrong direction ... it's even MORE complex.
>>>
>>> JIM
>
> Jim,
>
> 	Thank you for giving us the opinion of a long-time Perl user. I had
> suspected that Perl 6 was headed in the wrong direction, but I was not
> certain because I am not a long-time Perl user and because I have not
> been following Perl 6 developments closely.
>
> 	My question is this: If Perl is deficient in some respects, then which
> language is now best for "munging," the transforming of messy input into
> cleaner, culled data? In yesteryear, Perl held this distinction -- does
> it still? Or does that distinction now belong to Ruby? I understand that
> Ruby processes strings just as well as Perl does because, for example,
> it supports Perl-style regular expressions.
>
> 	One of Perl's huge advantages in this respect is CPAN.org, which has,
> AFAIK, no Ruby counterpart. CPAN has many good modules for processing
> and munging email, which is important to me. Furthermore, Ruby's
> libraries are still considered inferior to Python's. Moreover, the
> number of mature Perl and Python frameworks is huge, probably far larger
> than the number of Ruby frameworks. In this regard, Ruby-on-Rails
> (RoRails) does not count. RoRails is a Web application development
> framework only, and I am interested in munging tools _only_.
>
> 	BTW, this is an important question in the field of legal technology.
> That is why I am asking.
>
>
>>> American Dave wrote:
>>>
>>>> On Fri, Oct 23, 2009 at 01:07:39PM +1300, Mark Smith wrote:
>>>>> On Thu, Oct 22, 2009 at 04:42:52PM -0400, James Ewing Cottrell 3rd wrote:
>>>>>> I've pretty much been writing in Perl (and shell) for 20 years now.
>>>>>
>>>>> even today, it's a really good language for just about everything.
>>>>> sadly, some folks have negative connotations associated with perl.
>>>>> not sure why it gets a poor review from those people.
>>>>
>>>> The complaints usually revolve around the following:
>>>>
>>>>   * Vars like $_ are horrid to read, especially when implied
>>>>   * No out-of-the-box GUI support
>>>>   * Perl's base is much smaller than many other modern languages,
>>>> meaning you've got module soup after a while.  This is also a feature.
>>>>   * Without 'use strict' code isn't maintainable.
>>>>   * Perl 6 is very, very late.
>>>>
>>>> I should note that I like Perl.
>>>> -A. Dave
>
>
> _______________________________________________
> Novalug mailing list
> Novalug at calypso.tux.org
> http://calypso.tux.org/mailman/listinfo/novalug
>

