[Bioperl-l] Problem reading EMBL format file

Hans Rudolf Hotz hrh at sanger.ac.uk
Wed Aug 10 04:18:14 EDT 2005


Iain

This is one of the features of SRS. If you search EMBL with a ProteinID,
you are not searching EMBL itself but EMBL_features. Hence, the output is
only one feature. Depending on your SRS installation, this might look
more or less like an EMBL entry, but it is not an EMBL entry.

In order to get an EMBL entry (with all the features, of course) you
can do:


getz "[EMBL-ProteinID:AAA46567] > embl" -e

or

getz "[EMBL-ProteinID:AAA46567] > parent" -e


Then you get the proper EMBL entry (M23021), which you can feed into SeqIO.
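
As a minimal sketch (not tested against your SRS installation): once the
full entry is saved to a file, something like the following should pull the
matching CDS feature back out with SeqIO. The command-line arguments, the
default protein ID and the tab-separated output are assumptions made for
illustration; the Bio::SeqIO and feature calls are standard BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Assumed input: the output of
#   getz "[EMBL-ProteinID:AAA46567] > parent" -e
# saved to a file whose name is passed as the first argument.
my $filename   = shift(@ARGV) or die "usage: $0 <embl-file> [protein_id]\n";
my $protein_id = shift(@ARGV) || 'AAA46567';   # illustrative default

my $in = Bio::SeqIO->new(-file => $filename, -format => 'EMBL');

while (my $seq = $in->next_seq) {
    for my $feat ($seq->get_SeqFeatures) {
        # Only CDS features carry a /protein_id qualifier.
        next unless $feat->primary_tag eq 'CDS';
        next unless $feat->has_tag('protein_id');

        # The qualifier may include a version suffix (e.g. AAA46567.1).
        my ($match) = grep { /^\Q$protein_id\E(?:\.\d+)?$/ }
                      $feat->get_tag_values('protein_id');
        next unless defined $match;

        # Print accession, feature location and the matched protein ID.
        printf "%s\tCDS\t%s\t%s\n",
            $seq->accession_number,
            $feat->location->to_FTstring,
            $match;
    }
}
```

This keeps the whole entry available (sequence and all features), so the
CDS can be located by its protein_id rather than by parsing the single
feature that the EMBL_features search returns.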

Hope this helps,

Hans


On Tue, 9 Aug 2005, Iain Wallace wrote:

> Hi all,
>
> Hopefully somebody will be able to help me. I am having some difficulty
> reading a file that looks to me very much like EMBL format.
>
> I am trying to read some sequence files using SeqIO. Both files were obtained
> using the getz program with the following commands:
> getz "[EMBL-ProteinID:AAA46567]" -e >COAT_SBMV.AAA46567.cds
> getz "[EMBL-Acc:M23021]" -e > COAT_SBMV.M23021.embl
>
> The embl file is read fine, and I am able to extract the features I want. I
> am having problems with the CDS file; it doesn't appear to be read properly.
> I guess the CDS file isn't in proper EMBL format. Does anyone know what
> format it is, how I could convert it to proper EMBL format, or
> alternatively how to make getz return the file in the proper format? The two
> files look very similar to me.
>
> I tried the following little conversion program which worked fine on the
> EMBL file, but failed on the cds file with the error: No whitespace allowed
> in EMBL display id [unknown id]
>
> use Bio::SeqIO;
>
> $filename = $ARGV[0];
> $in  = Bio::SeqIO->new(-file   => $filename,
>                        -format => 'EMBL');
> $out = Bio::SeqIO->new(-file   => ">outputfilename",
>                        -format => 'EMBL');
>
> while ( my $seq = $in->next_seq() ) {
>     $out->write_seq($seq);
> }
>
>
> Thanks for all your help,
>
> Iain
>

