[Novalug] [OT] programming question PERL for PERL GURU

jecottrell3 at comcast.net jecottrell3 at comcast.net
Fri May 29 21:52:41 EDT 2009


You have a good point. Maybe.

If G>F and F>S, I will have no info for G?S even tho we all know that a Grandfather is Two Generations older than his son.

But it may not matter. IIRC, the actual sorts are done in such a way as to only care about one of the inequalities, either less than, or both. You test whether the observed relation jives with the implied ordering (ascending or descending) based on their position in the list.

If you ar looking for "less than", you swap when you see a "greater than", but do nothing for already existing "less than"s. You also do nothing for "equality". So the "I don't know" function simply maps into equality.

But a topological is just a sort with imperfect ordering.

I claim (based only on gut feeling, so this needs to be proved) that even tho we may have no direct history between G and S, their relationship with F will clearly establish that G must precede S.

Bonnie, I used Generation Number as an abstract term. It doesn't necessarily correspond to actual generations. And while I realized that breeding can get rather complicated, the good news is that the parentage graph is NOT cyclical. "I'm My Own Grandpa" is a funnory song, but it Can't Happen.

To REALLY simplify this algorithm, assuming my table has been set up, do this:

older[x][y]; # true if x is older than y

until no dogs left
{ for each dog D
  { if D has no ancestors # older[x][D] is false for all x
    { print D or transfer to a result collection array
      forget about D # set older[D][x] for all x
    }
  }
}

I don't see how that algorithm can possibly fail.

Heck, you can even do it in place, altho I suspect that may lead to N-cubed behavior.

OK, we keep the older[x][y] table, but we also keep ancestors[y], which a count of how many times y has appeared on the right side of X>Y.

Now we just scan until we find ancestor[y] = 0, output y, and decrement the ancestor count for all of y's descendents. Much Better!

JIM

----- Original Message -----
From: "Jay Berkenbilt" <ejb at ql.org>
To: "Bonnie Dalzell" <bdalzell at qis.net>
Cc: "novalug mailing list" <NOVALUG at calypso.tux.org>
Sent: Friday, May 29, 2009 6:11:36 PM GMT -05:00 US/Canada Eastern
Subject: Re: [Novalug] [OT] programming question PERL for PERL GURU

jecottrell3 at comcast.net wrote:

> Sounds like what you are doing is the same thing that tsort(1)
> does. Check out Knuth's algorithm for Topological Sorts.
>
> . . .

Yes, this is exactly a topological sort.  If you're not up on your
graph theory [:-)], a graph is a collection of nodes (the dogs) with
edges between them.  A directed graph is a graph in which the edges
point from one node to another and not necessarily back.  In your
case, there is an edge between two nodes if one is the parent of the
other.  A directed acyclic graph is a directed graph that contains no
cycles; i.e., no dog is going to be a parent of one of its
descendents!

> But there is probably a better way. The sort function in C and perl
> will take a comparison function, which simply says A is
> {less,equal,greater} than B.
>
> my $g_g_generation = []; # not trying to cause a big sensation
>
> sub agecmp
> { $g_g_generation->[$a][$b]; }
>
> while (($L, $R] = next_pair)
> {
>   $g_g_generation->[$L][$R] = -1;
>   $g_g_generation->[$R][$L] =  1;
> }
>
> print(join ' ', sort(agecmp @$g_g_generation), "\n");
>
> JIM

I don't believe this will work reliably because you don't have entries
for all pairs.  I don't think you can take a short cut from doing an
actual topological sort.

As luck would have it, I happen to have a topological sort algorithm
in perl sitting around.  It even does error checking.  It is my
prototype of the core dependency management algorithm in my open
source build system, http://www.abuild.org.

I've attached the full code here.  I've plugged in the values for your
graph.  In this graph, the keys are the dogs on the left hand and the
values are the dogs on the right hand of each left hand.  You have to
explicitly list each dog even if it has no children to use my script.
Hopefully it should be clear how this is a representation of the same
data.  I've added the special node 'all' that depends on all the
dogs.  A traversal of the graph starting from 'all' will give you the
topological sort of your graph in reverse.

    my %graph =
	(
	 'I' => ['G', 'E', 'D'],
	 'H' => ['G', 'E', 'D'],
	 'G' => ['C', 'C'],
	 'F' => [],
	 'E' => ['F', 'B'],
	 'D' => ['B'],
	 'C' => ['A'],
	 'B' => ['F', 'A'],
	 'A' => [],
	 'all' => [qw(A B C D E F G H I)],
	);

Running the attached program generates the following output:

sort(A) = A
sort(B) = F A B
sort(C) = A C
sort(D) = F A B D
sort(E) = F A B E
sort(F) = F
sort(G) = A C G
sort(H) = A C G F B E D H
sort(I) = A C G F B E D I
sort(all) = A F B C D E G H I all

If you lop "all" off the end of sort(all) and reverse, it will be a
correct sorting of your dogs.  It's not exactly the same sorting that
you came up with by hand because there are some cases in which the
order doesn't matter (like between F and A, which could appear in
either order).

You could actually use the tsort(1) program that JIM mentions too.  If
you run

tsort <<EOF
I G
I E
I D
H G
H E
H D
G C
F C
E F
E B
D B
C A
B F
B A
EOF

you get output

H
I
D
E
G
B
F
C
A

which is also a valid ordering.  You could open a pipe to tsort from
perl and just use it, but it's probably cleaner to have your own
implementation, especially since the code is here.

You can use/modify this code with any open source licensing terms you
want.

-- 
Jay Berkenbilt <ejb at ql.org>


_______________________________________________
Novalug mailing list
Novalug at calypso.tux.org
http://calypso.tux.org/cgi-bin/mailman/listinfo/novalug



More information about the Novalug mailing list