[Novalug] how to output parts of files that are common

Mark Smith mark at winksmith.com
Wed May 13 18:01:42 EDT 2009


i haven't been following so someone else might have mentioned it.
you can use sdiff, which is a side-by-side diff.  if the order of the
lines doesn't matter, sort both files first:

sort -o /tmp/1 file1
sort -o /tmp/2 file2
sdiff -w200 /tmp/[12]

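lines that are the same show up on both sides with an empty gutter between
them; lines that differ are flagged with |, < or >.  so you can easily see
which lines are common, though in sorted order rather than the original order.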
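
a slightly safer sketch of the same commands, in case you'd rather not write
to fixed names like /tmp/1.  the function name sdiff_sorted and the use of
mktemp are my own choices for illustration, nothing standard:

```shell
# sdiff_sorted FILE1 FILE2: side-by-side diff of sorted copies of both files.
sdiff_sorted() {
    a=$(mktemp) || return 1
    b=$(mktemp) || { rm -f "$a"; return 1; }
    # LC_ALL=C keeps the sort order byte-wise and predictable
    LC_ALL=C sort -o "$a" "$1"
    LC_ALL=C sort -o "$b" "$2"
    # sdiff exits 1 when the files differ, so remember its status for the caller
    rc=0
    sdiff -w200 "$a" "$b" || rc=$?
    rm -f "$a" "$b"
    return "$rc"
}
```

run it as: sdiff_sorted file1 file2 | less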

also, i don't have too much experience with it, but vimdiff helps
you visualize the diffs in two or more files.


On Wed, May 13, 2009 at 07:08:36PM +0000, jecottrell3 at comcast.net wrote:
> Indeed you could!
> 
> Simply use "diff -U 987654 files... | grep '^ '"
> 
> That will give you 987,654 lines of context, which is most likely bigger than each file.
> 
> Then grep for the lines that are the same, which will start with a space.
> 
> NOTE: if you want the exact lines back, change the grep to: sed -ne '/^ /{s///;p}'
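
a minimal sketch of that pipeline wrapped in a function.  the name
common_in_order is made up, and the cmp check is my addition: diff prints
nothing at all for identical files, so without it you'd get no output in
exactly the case where every line is common.

```shell
# common_in_order FILE1 FILE2: print the lines diff treats as unchanged,
# in their original order.
common_in_order() {
    # identical files produce no diff output, so handle that case first
    if cmp -s "$1" "$2"; then
        cat "$1"
        return
    fi
    # diff exits 1 when the files differ; that is not an error here
    { diff -U 987654 "$1" "$2" || [ "$?" -eq 1 ]; } | sed -n 's/^ //p'
}
```

keep in mind that "common" here means whatever diff decides is unchanged.
a line that moved to a different spot shows up as a delete plus an add, so
it will not be printed.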
> 
> Also note that if you don't care about the order you can simply sort and run comm.
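
the sort-then-comm route could look something like this.  common_sorted is a
hypothetical name, and pinning LC_ALL=C is my assumption about how to keep
sort and comm agreeing on what "sorted" means:

```shell
# common_sorted FILE1 FILE2: lines present in both files, in sorted order.
common_sorted() {
    a=$(mktemp) || return 1
    b=$(mktemp) || { rm -f "$a"; return 1; }
    # sort and comm must agree on collation, so pin the locale for both
    LC_ALL=C sort "$1" > "$a"
    LC_ALL=C sort "$2" > "$b"
    # -1 and -2 drop the "only in FILE1" and "only in FILE2" columns
    LC_ALL=C comm -12 "$a" "$b"
    rm -f "$a" "$b"
}
```

this sidesteps the problem mentioned further down the thread, where comm
mismatches lines because its input was not sorted.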
> 
> Finally, if you don't care about order, you can use uniq: cat file1 file2 | sort | uniq -d
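
one catch with the uniq -d version: a line repeated inside a single file is
also reported, even if the other file never has it.  a sketch that guards
against that (common_uniq is a made-up name) sorts each file with -u first:

```shell
# common_uniq FILE1 FILE2: lines present in both files, via uniq -d.
common_uniq() {
    # sort -u removes repeats inside each file, so any line that uniq -d
    # still sees twice must have come from both files
    { LC_ALL=C sort -u "$1"; LC_ALL=C sort -u "$2"; } | LC_ALL=C sort | uniq -d
}
```

each common line is printed once, in sorted order.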
> 
> JIM
> 
> ----- Original Message -----
> From: "Jon LaBadie" <novalugml at jgcomp.com>
> To: "NOVALUG" <novalug at calypso.tux.org>
> Sent: Wednesday, May 13, 2009 2:34:32 PM GMT -05:00 US/Canada Eastern
> Subject: Re: [Novalug] how to output parts of files that are common
> 
> On Wed, May 13, 2009 at 08:19:55AM -0400, Raul Parra wrote:
> > Jon is right - I just tested comm on two files and messed with the order and
> > spacing but left the exact same lines in place and comm did not correctly
> > find the strings that were unique and common between the files.
> > 
> > RP
> > 
> > On Tue, May 12, 2009 at 5:02 PM, Jon LaBadie <novalugml at jgcomp.com> wrote:
> > 
> > > On Tue, May 12, 2009 at 03:44:15PM -0400, Bob Copeland wrote:
> > > > On Sun, May 10, 2009 at 11:05 AM, Nino Pereira <pereira at speakeasy.net> wrote:
> > > > > I think I know how to do the reverse, viz., find the parts of
> > > > > the files that differ (with diff or xxdiff). But, how do you get
> > > > > only the sections of files that are equal?
> > > >
> > > > In case you haven't found it yet -- the opposite of diff is comm(1)!
> > >
> > > Not really similar commands.  comm expects things in an ordered sequence
> > > and extra lines can throw its matching off.
> > >
> > > jl
> > > --
> 
> I wonder if you could take the diff output and write a script to use
> the line numbers and generate the inverse?
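
one way that idea might be sketched: read the command lines of plain diff
output (3c3, 2,4d1, 5a6,7 and so on), note which line numbers of the first
file were changed or deleted, then print every other line of that file.
unchanged_lines and the awk variable names are hypothetical, and this
assumes the default diff output format rather than -u or -c.

```shell
# unchanged_lines FILE1 FILE2: print the lines of FILE1 that plain diff
# does not report as changed or deleted.
unchanged_lines() {
    # diff exits 1 when the files differ; only 2 and up is a real failure
    { diff "$1" "$2" || [ "$?" -eq 1 ]; } |
    awk '
        # pass 1 reads diff output: command lines look like 3c3, 2,4d1, 5a6,7
        pass == 1 {
            if ($0 ~ /^[0-9]+(,[0-9]+)?[cd][0-9]+(,[0-9]+)?$/) {
                range = $0
                sub(/[cd].*$/, "", range)      # keep only the FILE1 side
                n = split(range, r, ",")
                first = r[1] + 0
                last = (n == 2 ? r[2] : r[1]) + 0
                for (i = first; i <= last; i++) skip[i] = 1
            }
            next
        }
        # pass 2 reads FILE1: print lines whose number was never flagged
        !(FNR in skip)
    ' pass=1 - pass=2 "$1"
}
```

"a" (append) commands are ignored on purpose, since they only add lines from
the second file and leave the first file's lines untouched.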
> 
> jl
> -- 
> Jon H. LaBadie                  jon at jgcomp.com
>  JG Computing
>  12027 Creekbend Drive		(703) 787-0884
>  Reston, VA  20194		(703) 787-0922 (fax)
> _______________________________________________
> Novalug mailing list
> Novalug at calypso.tux.org
> http://calypso.tux.org/cgi-bin/mailman/listinfo/novalug

-- 
Mark Smith
mark at winksmith.com
mark at tux.org


