[Novalug] how to output parts of files that are common
Mark Smith
mark at winksmith.com
Wed May 13 18:01:42 EDT 2009
i haven't been following so someone else might have mentioned it.
you can use sdiff which is a side-by-side diff. if the files are
sortable:
sort -o /tmp/1 file1
sort -o /tmp/2 file2
sdiff -w200 /tmp/[12]
you can then easily see which lines are the same: sdiff shows them on
both sides with nothing in the middle column, while lines that differ
are marked with <, > or |.
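a rough sketch of the steps above, in case it helps. the sample data and the temp directory are made up for illustration (they stand in for file1, file2 and /tmp), and the comm -12 line at the end is the comm idea from the quoted thread, not part of sdiff itself:

```shell
# hypothetical sample data; names and contents are placeholders
workdir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$workdir/file1"
printf '%s\n' banana cherry date > "$workdir/file2"

# sort each file into its own output file
sort -o "$workdir/1" "$workdir/file1"
sort -o "$workdir/2" "$workdir/file2"

# side-by-side view; common lines have no <, > or | marker between them.
# sdiff exits non-zero when the files differ, so the status is ignored here.
sdiff -w200 "$workdir/1" "$workdir/2" || true

# print only the lines present in both sorted files
common=$(comm -12 "$workdir/1" "$workdir/2")
printf '%s\n' "$common"
```

sdiff is for eyeballing; comm -12 is the one to use if you want the common lines as plain output for another command.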
also, i don't have too much experience with it, but vimdiff helps
you visualize the diffs in two or more files.
On Wed, May 13, 2009 at 07:08:36PM +0000, jecottrell3 at comcast.net wrote:
> Indeed you could!
>
> Simply use "diff -U 987654 files... | grep '^ '"
>
> That will give you 987,654 lines of context, which is most likely bigger than each file.
>
> Then grep keeps only the lines that are the same, which will start with a space.
>
> NOTE: if you want the exact lines back, change the grep to: sed -ne '/^ /{s///;p}'
>
> Also note that if you don't care about the order you can simply sort and run comm.
>
> Finally, if you don't care about order, you can use uniq: cat file1 file2 | sort | uniq -d
>
> JIM
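to make the two approaches above concrete, here is a minimal sketch with made-up sample data (the file contents and temp directory are assumptions for illustration). it uses sed -n 's/^ //p', which should behave the same as the sed command quoted above, and it runs sort -u on each file first so that a line repeated inside one file is not reported as common by uniq -d:

```shell
# hypothetical sample data; names and contents are placeholders
workdir=$(mktemp -d)
printf '%s\n' alpha beta gamma delta > "$workdir/file1"
printf '%s\n' alpha gamma epsilon delta > "$workdir/file2"

# order-preserving: unchanged lines in unified diff output start with a
# single space; print only those, with the leading space removed.
# diff exits with status 1 when the files differ, hence the "|| true".
# note: if the two files are identical, diff prints nothing at all.
in_order=$( { diff -U 987654 "$workdir/file1" "$workdir/file2" || true; } | sed -n 's/^ //p')
printf '%s\n' "$in_order"

# order-independent: de-duplicate each file, then keep lines seen twice
any_order=$( { sort -u "$workdir/file1"; sort -u "$workdir/file2"; } | sort | uniq -d)
printf '%s\n' "$any_order"
```

the first form keeps the lines in file order; the second returns them sorted.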
>
> ----- Original Message -----
> From: "Jon LaBadie" <novalugml at jgcomp.com>
> To: "NOVALUG" <novalug at calypso.tux.org>
> Sent: Wednesday, May 13, 2009 2:34:32 PM GMT -05:00 US/Canada Eastern
> Subject: Re: [Novalug] how to output parts of files that are common
>
> On Wed, May 13, 2009 at 08:19:55AM -0400, Raul Parra wrote:
> > Jon is right - I just tested comm on two files and messed with the order and
> > spacing but left the exact same lines in place and comm did not correctly
> > find the strings that were unique and common between the files.
> >
> > RP
> >
> > On Tue, May 12, 2009 at 5:02 PM, Jon LaBadie <novalugml at jgcomp.com> wrote:
> >
> > > On Tue, May 12, 2009 at 03:44:15PM -0400, Bob Copeland wrote:
> > > > On Sun, May 10, 2009 at 11:05 AM, Nino Pereira <pereira at speakeasy.net>
> > > wrote:
> > > > > I think I know how to do the reverse, viz., find the parts of
> > > > > the files that differ (with diff or xxdiff). But, how do you get
> > > > > only the sections of files that are equal?
> > > >
> > > > In case you haven't found it yet -- the opposite of diff is comm(1)!
> > >
> > > Not really similar commands. comm expects things in an ordered sequence
> > > and extra lines can throw its matching off.
> > >
> > > jl
> > > --
>
> I wonder if you could take the diff output and write a script to use
> the line numbers and generate the inverse?
>
> jl
> --
> Jon H. LaBadie jon at jgcomp.com
> JG Computing
> 12027 Creekbend Drive (703) 787-0884
> Reston, VA 20194 (703) 787-0922 (fax)
> _______________________________________________
> Novalug mailing list
> Novalug at calypso.tux.org
> http://calypso.tux.org/cgi-bin/mailman/listinfo/novalug
--
Mark Smith
mark at winksmith.com
mark at tux.org