[Bioperl-l] New package to compare two SeqI-implementing objects
Daniel Renfro
bluecurio at gmail.com
Mon Feb 1 03:22:37 UTC 2010
Hello all,
A colleague and I have been working on a (Bio)Perl package to compare two
Seq objects. This is in response to a need we found in our lab: we wanted
to track changes to GenBank files over time, and we wanted an automated
way to do this. This led to what I'm calling the SeqDiff.pm package. I
thought it would be a good idea to inform the community and get some
feedback.
The package takes two Seq objects as arguments, arbitrarily called "old" and
"new." It then matches the features of the old object with those of the new
object. The matching is based on a set of criteria; in our case we decided the
features must be of the same type (have the same primary_tag) and have at
least one database cross-reference (db_xref) in common. The left-over
features (ones that did not find a match) are placed into arrays called
"lost" and "gained." Each matched pair is removed from subsequent searches,
so later features are compared against a shrinking pool; we estimate the
matching at roughly N log N time.
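To make the matching criterion concrete, here is a minimal sketch in Python (used only for illustration; SeqDiff itself is Perl). It assumes a hypothetical `Feature` record holding a `primary_tag` and a set of `db_xrefs`, and a hypothetical `match_features` function. Rather than removing matched pairs from a linear search, it swaps in a different technique: the new features are indexed by `(primary_tag, db_xref)`, so finding candidate partners is a dictionary lookup instead of a scan. This is not necessarily how SeqDiff does it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Feature:
    """Hypothetical stand-in for a BioPerl feature: a primary_tag plus db_xrefs."""
    primary_tag: str
    db_xrefs: frozenset = field(default_factory=frozenset)
    name: str = ""


def match_features(old, new):
    """Pair old/new features that share a primary_tag and at least one db_xref.

    Returns (matched_pairs, lost, gained). Each new feature is used at most once.
    """
    # Index new features by (primary_tag, db_xref) so candidates come from a dict lookup
    index = {}
    for i, feat in enumerate(new):
        for xref in feat.db_xrefs:
            index.setdefault((feat.primary_tag, xref), []).append(i)

    used = set()  # indices of new features already paired
    matched, lost = [], []
    for old_feat in old:
        partner = None
        for xref in sorted(old_feat.db_xrefs):  # sorted for deterministic results
            for i in index.get((old_feat.primary_tag, xref), []):
                if i not in used:
                    partner = i
                    break
            if partner is not None:
                break
        if partner is None:
            lost.append(old_feat)  # no unused new feature satisfies the criteria
        else:
            used.add(partner)
            matched.append((old_feat, new[partner]))

    # Whatever was never paired on the new side counts as gained
    gained = [f for i, f in enumerate(new) if i not in used]
    return matched, lost, gained
```

Under this criterion, a feature whose type changed between releases, or that shares no db_xref with any counterpart, shows up twice: its old version in `lost` and its new version in `gained`.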
The matched features are then iterated through and their differences are
calculated. Each feature is examined recursively and any differences are
reported. Optionally, you can pass the new() method a flag so that everything
is returned (differences and similarities). You can set callbacks for
different types of objects (like anything that isa('Bio::LocationI')) if you
want a custom comparison for specific BioPerl objects. This comparison step
is the computationally slow part, and currently everything is held in
memory. I think it would be better to do this piecemeal, using the BioPerl-ish
next() and last() methods.
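The recursive comparison, the "return everything" flag, the type-keyed callbacks, and the piecemeal idea can be sketched together in Python. Everything here is hypothetical: `diff`, its `callbacks` and `report_all` parameters, and the `Location` class (standing in for a `Bio::LocationI` implementation) are illustrative names, not SeqDiff's API. `isinstance` plays the role of Perl's `isa()`, and the function is a generator, so differences are produced one at a time as the caller asks for them, roughly what a `next()`-based interface would provide.

```python
from dataclasses import dataclass


@dataclass
class Location:
    """Hypothetical stand-in for an object that isa('Bio::LocationI')."""
    start: int
    end: int
    strand: int = 1


def diff(old, new, path="", callbacks=None, report_all=False):
    """Lazily yield (path, old_value, new_value, status) tuples.

    status is "changed" or "same"; "same" entries appear only if report_all is True.
    callbacks maps a type to a function(old, new) -> bool, where True means equal.
    """
    callbacks = callbacks or {}
    # The first registered type that both objects are instances of decides equality
    for cls, equal in callbacks.items():
        if isinstance(old, cls) and isinstance(new, cls):
            same = equal(old, new)
            if report_all or not same:
                yield (path, old, new, "same" if same else "changed")
            return
    if isinstance(old, dict) and isinstance(new, dict):
        # Missing keys compare as None in this sketch
        for key in sorted(old.keys() | new.keys(), key=str):
            yield from diff(old.get(key), new.get(key), f"{path}/{key}", callbacks, report_all)
    elif isinstance(old, (list, tuple)) and isinstance(new, (list, tuple)):
        for i in range(max(len(old), len(new))):
            a = old[i] if i < len(old) else None
            b = new[i] if i < len(new) else None
            yield from diff(a, b, f"{path}[{i}]", callbacks, report_all)
    elif hasattr(old, "__dict__") and type(old) is type(new):
        # Recurse into plain objects attribute by attribute
        yield from diff(vars(old), vars(new), path, callbacks, report_all)
    elif old != new:
        yield (path, old, new, "changed")
    elif report_all:
        yield (path, old, new, "same")
```

Calling `next(diff(old, new))` walks only as far as the first difference instead of building the full list of differences up front; passing `callbacks={Location: lambda a, b: a.start == b.start}` would, for example, treat two locations as equal whenever their starts agree.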
Maybe this was a little verbose, but that is the SeqDiff package in a
nutshell. I hope to release v1.0 soon. If you have any questions or comments,
I'd love to hear them.
-Daniel Renfro
Hu Lab Research Associate
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4055