[Bioperl-l] extract feature seq when split between 2 GenBank accessions

Jason Stajich jason at cgt.duhs.duke.edu
Wed Aug 27 13:05:08 EDT 2003


If you are getting the seq via spliced_seq, you can pass a
Bio::DB::RandomAccessI db handle (either a [local] Bio::Index::Fasta or a
[remote] Bio::DB::GenBank, etc.) to the spliced_seq method, so that it can
fetch the parts of the feature that live in the other accession.

I think there is currently a bug, though: spliced_seq sorts the locations
before processing them. This has been reported but not yet fixed
(I am really hoping for some more bugfixing developers out there, folks!),
but this approach should work once that bug is fixed.
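
To show why the sorting matters, here is a minimal sketch in Python of
splicing a join() location that spans two records. It is not BioPerl code
and not how spliced_seq is implemented; parse_join, splice, and the toy
records A.1 and B.1 are hypothetical names made up for illustration, and
only plain start..end segments are handled (no complement(), no fuzzy
ends).

```python
import re

# One join() segment: an optional "ACCESSION.VERSION:" prefix, then start..end.
SEGMENT = re.compile(r"(?:([A-Za-z0-9_.]+):)?(\d+)\.\.(\d+)")


def parse_join(location, default_acc):
    """Return (accession, start, end) tuples in the order they are listed."""
    inner = re.sub(r"\s+", "", location)  # drop line wraps and indentation
    if inner.startswith("join(") and inner.endswith(")"):
        inner = inner[len("join("):-1]
    segments = []
    for part in inner.split(","):
        m = SEGMENT.fullmatch(part)
        if m is None:
            raise ValueError(f"unsupported location segment: {part!r}")
        # Segments without a prefix belong to the record that owns the feature.
        segments.append((m.group(1) or default_acc, int(m.group(2)), int(m.group(3))))
    return segments


def splice(segments, fetch):
    """Concatenate segments as given; coordinates are 1-based and inclusive."""
    return "".join(fetch(acc)[start - 1:end] for acc, start, end in segments)


# Toy records standing in for two linked accessions (hypothetical data).
toy_db = {"A.1": "aaaaCCCCaaaa", "B.1": "ttGGGGtt"}
segments = parse_join("join(5..8,B.1:3..6)", default_acc="A.1")

in_order = splice(segments, toy_db.__getitem__)  # "CCCCGGGG"

# Sorting by start coordinate mixes the two coordinate systems and puts the
# remote segment first, which is the kind of breakage described above.
wrong_order = splice(sorted(segments, key=lambda s: s[1]), toy_db.__getitem__)  # "GGGGCCCC"
```

The fetch argument plays the role of the db handle: anything that maps an
accession to its sequence will do, so an indexed file or a remote lookup
can be swapped in without touching the splicing logic.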

To keep the memory requirements down, I would just use a Bio::DB::Fasta or
Bio::Index::Fasta in which the accessions are indexed, instead of reading
in all the possible seqs and storing them in a hash.  You can also use
Bio::DB::Failover + Bio::DB::FileCache to cache local/remote calls if you
need to mix local and remote dbs.
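
The failover-plus-cache idea can be sketched in a few lines of Python.
This only illustrates the pattern; it is not the Bio::DB::Failover or
Bio::DB::FileCache API, and FailoverCache, local_lookup, and remote_lookup
are hypothetical names with toy data. The cache here is an in-memory dict,
whereas a file-backed cache would persist between runs.

```python
class FailoverCache:
    """Try each source in order and remember whatever was found."""

    def __init__(self, sources):
        # Each source is a callable: accession -> sequence string, or None.
        self.sources = list(sources)
        self.cache = {}

    def get(self, acc):
        if acc in self.cache:
            return self.cache[acc]
        for source in self.sources:
            seq = source(acc)
            if seq is not None:
                self.cache[acc] = seq  # later requests skip every source
                return seq
        raise KeyError(acc)


# Toy stand-ins for a local index and a remote database (hypothetical data).
local_records = {"A.1": "ACGT"}
remote_records = {"B.1": "TTGG"}
remote_calls = []


def local_lookup(acc):
    return local_records.get(acc)


def remote_lookup(acc):
    remote_calls.append(acc)  # record each (expensive) remote request
    return remote_records.get(acc)


db = FailoverCache([local_lookup, remote_lookup])
first = db.get("B.1")   # not found locally, so this falls through to remote
second = db.get("B.1")  # answered from the cache, no second remote request
local = db.get("A.1")   # found locally, remote is never consulted
```

Putting the local source first means the remote one is only asked for
accessions the local index lacks, and only once per accession.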

-jason

On Wed, 27 Aug 2003, Charles Hauser wrote:

> All,
>
> I'd like to extract the CDS from genbank records and have found that in
> some instances these are distributed among >1 genbank accession (see
> below).
>
> I have a script which does fine if CDS is fully contained within 1
> accession, other than storing all accession seqs in a hash is there a
> good way to deal with these?
>
> Charles
>
>
> LOCUS       AY095303S1              2375 bp    DNA     linear   PLN 21-JAN-2003
> DEFINITION  Chlamydomonas reinhardtii c-type cytochrome synthesis 1 (CCS1)
>             gene, ccs1-ac206 allele, 5'UTR and exons 1 through 6.
> ACCESSION   AY095303
> VERSION     AY095303.1  GI:25986619
>
>      CDS             join(207..330,512..825,1045..1233,1418..1798,2000..2131,
>                      2253..2345,AY095304.1:6..303,AY095304.1:495..677,
>                      AY095304.1:863..1098)
>                      /gene="CCS1"
>
>
>
>
> LOCUS       AY095303S2              1505 bp    DNA     linear   PLN 21-JAN-2003
> DEFINITION  Chlamydomonas reinhardtii c-type cytochrome synthesis 1 (CCS1)
>             gene, ccs1-ac206 allele, exons 7, 8 and 9, 3'UTR and complete cds.
> ACCESSION   AY095304
> VERSION     AY095304.1  GI:25986620
>      CDS             join(AY095303.1:207..330,AY095303.1:512..825,
>                      AY095303.1:1045..1233,AY095303.1:1418..1798,
>                      AY095303.1:2000..2131,AY095303.1:2253..2345,6..303,
>                      495..677,863..1098)
>                      /gene="CCS1"
>
>
>
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

