[Novalug] Converting HTML with sed and regular expressions

Matt Good matt at matt-good.net
Sun Apr 8 23:18:48 EDT 2007


I'd recommend trying Tidy, which supports a lot of features for cleaning
up bad HTML:
http://tidy.sourceforge.net/
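
If you would rather stay with sed: the patterns have no effect because
"*" in a sed regexp is not a shell-style wildcard. It means "zero or more
of the preceding character", so <FONT*> matches "<FON" followed by any
number of "T"s and then ">", which never occurs in your input. sed also
reads one line at a time, so a comment spread over several lines is never
seen whole. Here is a rough sketch of a corrected script, run against a
small sample file. The file names fix.sed and sample.html and the
"some text" line are made up for illustration, and it assumes the tags
are uppercase as in your example:

```shell
# Sample input based on the lines in the question (last line is invented).
cat > sample.html <<'EOF_HTML'
<P CLASS="western" STYLE="margin-bottom: 0in"><FONT FACE="Comic Sans MS, cursive">
<!--
here is a
multi line
comment
-->
some text</FONT>
EOF_HTML

# Hypothetical corrected sed script.
cat > fix.sed <<'EOF_SED'
# Comments: on a line containing "<!--", keep appending the next line
# until "-->" is in the pattern space, then delete the whole comment.
/<!--/{
:a
/-->/!{
N
ba
}
s/<!--.*-->//
}
# FONT elements: drop the opening and closing tags, keep the text between.
s/<FONT[^>]*>//g
s/<\/FONT>//g
# P tags: replace <P> or <P attributes...> with a bare <p>.
# [^>]* means "any run of characters other than >".
s/<P>/<p>/g
s/<P [^>]*>/<p>/g
EOF_SED

sed -f fix.sed < sample.html > out.html
cat out.html
```

This leaves an empty line where the comment was, and the greedy ".*"
would swallow any text sitting between two comments on the same line,
which is one reason a real HTML tool like Tidy is the safer choice.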

-- Matt Good

On Sun, 2007-04-08 at 20:12 -0700, Jim Ide wrote:
> Hello -
> 
> I have several HTML files that contain lines similar to these:
> 
> <P CLASS="western" STYLE="margin-bottom: 0in"><FONT FACE="Comic Sans MS, cursive">
> <!--
> here is a
> multi line
> comment
> -->
> 
> I want to:
> 1. change the <P *> lines to <p>
> 2. delete the <FONT> elements
> 3. remove the comments
> 
> I am using sed as follows:
> 
> sed -f fix.sed.txt < in.html > out.html
> 
> fix.sed.txt contains the following:
> 
> s/<FONT*>//g
> s/<P*>/<p>/g
> s/<!--*-->//g
> 
> These sed regexps have no effect.  What am I doing wrong?
> 
> Thanks for your help.