[Novalug] Converting HTML with sed and regular expressions

Jim Ide jimsmaillists at yahoo.com
Sun Apr 8 23:12:29 EDT 2007


Hello -

I have several HTML files that contain lines similar to these:

<P CLASS="western" STYLE="margin-bottom: 0in"><FONT FACE="Comic Sans MS, cursive">
<!--
here is a
multi line
comment
-->

I want to:
1. change each <P ...> opening tag, whatever its attributes, to a plain <p>
2. delete the <FONT ...> tags
3. remove the comments, including the multi-line ones

I am using sed as follows:

sed -f fix.sed.txt < in.html > out.html

fix.sed.txt contains the following:

s/<FONT*>//g
s/<P*>/<p>/g
s/<!--*-->//g

These sed regexps have no effect.  What am I doing wrong?
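
For reference, here is a minimal way to reproduce the problem from scratch. It is only a sketch: it assumes a POSIX-style shell with mktemp and cmp available, and the temporary directory is made up for the example (my real files are named as above).

```shell
# Reproduction sketch; the temporary directory is illustrative only.
tmp=$(mktemp -d)

# Sample input, copied from the lines quoted above.
cat > "$tmp/in.html" <<'EOF'
<P CLASS="western" STYLE="margin-bottom: 0in"><FONT FACE="Comic Sans MS, cursive">
<!--
here is a
multi line
comment
-->
EOF

# The sed script exactly as I am using it.
cat > "$tmp/fix.sed.txt" <<'EOF'
s/<FONT*>//g
s/<P*>/<p>/g
s/<!--*-->//g
EOF

sed -f "$tmp/fix.sed.txt" < "$tmp/in.html" > "$tmp/out.html"

# cmp -s exits 0 when the files are byte-identical, i.e. sed changed nothing.
if cmp -s "$tmp/in.html" "$tmp/out.html"; then
    echo "unchanged"
else
    echo "changed"
fi
```

On the sample above this reports "unchanged", which matches what I see with the real files.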

Thanks for your help.



 
