[Bioperl-l] CONTIG dealing

Nikki Appleby n.appleby at uq.edu.au
Wed Oct 18 21:58:06 UTC 2006


I have just entered the wonderful new world of BioPerl, so the answer to my
question may be obvious to any of the gurus reading this.

I need to collect sequence features and ontology annotations. Here goes.

I am retrieving sequences from SwissProt via Bio::DB::SwissProt and
get_Seq_by_id, for this example Q8RZV7. Once I have parsed it into an RDBMS
format that I am happy with, I can get at the xref ids. In this case, they
are:

AP003451; BAB86144.1; -; Genomic_DNA. 
AP008207; BAF07116.1; -; Genomic_DNA. 
AB103395; BAC81207.1; -; mRNA. 
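
To make the column layout concrete, here is a minimal sketch of how the
three xref lines above split into fields. It is plain Python text handling
rather than BioPerl, and the names EmblXref and parse_embl_xref, as well as
the field names, are my own labels for illustration, not part of any
library. I am assuming every line has exactly four semicolon-separated
fields and ends with a full stop, as in the three lines shown.

```python
from typing import NamedTuple


class EmblXref(NamedTuple):
    """One xref line; the field names are illustrative labels only."""
    nucleotide_acc: str  # first column, the id looked up in GenBank
    protein_id: str      # second column, the id looked up in GenPept
    status: str          # third column, "-" in all three lines above
    molecule_type: str   # fourth column, e.g. Genomic_DNA or mRNA


def parse_embl_xref(line: str) -> EmblXref:
    """Split one 'ACC; PROTEIN_ID; STATUS; MOL_TYPE.' line into fields."""
    # Remove surrounding whitespace and the trailing full stop first, so the
    # dot inside a versioned id such as BAB86144.1 is left untouched.
    fields = [f.strip() for f in line.strip().rstrip(".").split(";")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    return EmblXref(*fields)


lines = [
    "AP003451; BAB86144.1; -; Genomic_DNA. ",
    "AP008207; BAF07116.1; -; Genomic_DNA. ",
    "AB103395; BAC81207.1; -; mRNA. ",
]

xrefs = [parse_embl_xref(line) for line in lines]
for x in xrefs:
    print(x.nucleotide_acc, x.protein_id, x.molecule_type)
```

Each parsed record keeps the nucleotide accession and the protein id
side by side, so the two lookups for one xref stay paired.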

I can happily go off and fetch those from Bio::DB::GenBank (first column),
and Bio::DB::GenPept (second). All good, except...

AP008207 is a contig. I don't want to get all of the features for the entire
thing, just the single contig that actually matches the original sequence.
Fetching it takes a couple of hours, and then it gives me way too much.

I will come across this problem with other sequences. How do I (a) find out
if it is a contig without downloading it in its entirety and (b) extract
the list of sequences that are about to be contigged together?

I have searched the web for answers, including this list, but see nothing.
Help!
 
Nikki Appleby.





