[Novalug] Converting HTML with sed and regular expressions
Matt Good
matt at matt-good.net
Sun Apr 8 23:18:48 EDT 2007
I'd recommend trying Tidy, which supports a lot of features for cleaning
up bad HTML:
http://tidy.sourceforge.net/
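A rough sketch of a Tidy configuration file aimed at the comment and FONT parts of the question (the file name tidy.conf is only an example). "hide-comments" tells Tidy not to write comments to its output. "clean" replaces legacy presentational tags such as FONT with CSS style rules and structural markup rather than simply deleting them, so the result will probably not be the bare <p> tags asked for and is worth checking by hand.

```
clean: yes
hide-comments: yes
```

It would be run as "tidy -q -config tidy.conf -o out.html in.html".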
-- Matt Good
On Sun, 2007-04-08 at 20:12 -0700, Jim Ide wrote:
> Hello -
>
> I have several HTML files that contain lines similar to these:
>
> <P CLASS="western" STYLE="margin-bottom: 0in"><FONT FACE="Comic Sans MS, cursive">
> <!--
> here is a
> multi line
> comment
> -->
>
> I want to:
> 1. change the <P *> lines to <p>
> 2. delete the <FONT> elements
> 3. remove the comments
>
> I am using sed as follows:
>
> sed -f fix.sed.txt < in.html > out.html
>
> fix.sed.txt contains the following:
>
> s/<FONT*>//g
> s/<P*>/<p>/g
> s/<!--*-->//g
>
> These sed regexps have no effect. What am I doing wrong?
>
> Thanks for your help.
>
> _______________________________________________
> Novalug mailing list
> Novalug at calypso.tux.org
> http://calypso.tux.org/cgi-bin/mailman/listinfo/novalug
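The rules in fix.sed.txt have no effect because "*" in a sed regular expression repeats only the single character before it. It is not a shell-style wildcard. "<FONT*>" therefore matches "<FON>", "<FONT>", "<FONTT>" and so on, but never "<FONT FACE=...>". "<P*>" matches "<>", "<P>", "<PP>". "<!--*-->" matches only "<!" followed by three or more dashes and ">". sed also works one line at a time, so a single s command cannot match a comment that spans several lines. The usual way to write "anything up to the end of the tag" is "[^>]*".

Below is a minimal sketch of a corrected script. It assumes uppercase tag names as in the sample, tags that do not wrap across lines, and comments that either fit on one line or start and end on lines of their own (the range command deletes those lines whole). The sed program between the quotes is the text that would go in fix.sed.txt. It is wrapped here in a shell function with the hypothetical name fix_html so the example is self-contained.

```shell
#!/bin/sh
# fix_html (hypothetical name): reads HTML on stdin, writes the cleaned
# version to stdout. The sed program between the quotes is what would go
# into fix.sed.txt for use with "sed -f fix.sed.txt".
fix_html() {
    sed '
    # Comments that open and close on one line: remove them in place.
    # (.* is greedy, so text between two comments on one line goes too.)
    s/<!--.*-->//g
    # Comments spanning lines: delete every line from "<!--" through "-->".
    /<!--/,/-->/d
    # <P> and <P attributes...> both become a bare <p>. The required space
    # keeps tags such as <PRE> untouched.
    s/<P>/<p>/g
    s/<P [^>]*>/<p>/g
    # Drop opening FONT tags with any attributes, then the closing tags.
    s/<FONT[^>]*>//g
    s/<\/FONT>//g
    '
}
```

With the function defined, "fix_html < in.html > out.html" does the same job as the original sed -f command. Using two P rules instead of "<P[^>]*>" keeps tags such as <PRE> and <PARAM> from being rewritten.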