[Bioperl-l] EMBL format field

Zhi-Qiang Ye yezhiqiang at gmail.com
Thu Jun 12 09:06:32 UTC 2008


Hi, Jason

     I used exactly your code, and the result is still 'unknown id'.
How can I check which version of BioPerl I have?
I am using Ubuntu Gutsy; the version in Ubuntu's package management system is 1.4-1.

     I also installed BioPerl 1.4 on another computer (IA64, Red Hat
Linux), and it has the same problem.
When installing via CPAN, 'make test' always failed, so
I used 'force install ....'.
I am not sure whether that is the cause.

Thanks.
Zhi-Qiang Ye
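
To check what the flat file itself says, independent of any BioPerl release, here is a minimal plain-Perl sketch. It is not part of BioPerl, and the helper names embl_id_from_line and embl_id_from_fh are hypothetical. It assumes the ID line uses either the newer layout quoted in this thread (ID   CB271253; SV 1; ...) or an older layout in which the entry name is followed by whitespace and a data class such as 'standard;'. A parser that only understands one of these layouts could plausibly be what yields 'unknown id', but that is a guess, not a confirmed diagnosis.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: return the first token of an
# EMBL ID line. Handles the newer "ID   CB271253; SV 1; ..." layout and,
# assuming the older layout starts with the entry name followed by
# whitespace, "ID   NAME   standard; ..." as well.
sub embl_id_from_line {
    my ($line) = @_;
    return unless defined $line && $line =~ /^ID\s+([^;\s]+)/;
    return $1;
}

# Read lines from $fh and return the identifier from the first ID line
# (only the first record in the file is considered).
sub embl_id_from_fh {
    my ($fh) = @_;
    while (my $line = <$fh>) {
        my $id = embl_id_from_line($line);
        return $id if defined $id;
    }
    return;
}

# Usage (file name is hypothetical): perl embl_id.pl CB271253.embl
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    my $id = embl_id_from_fh($fh);
    print defined $id ? "$id\n" : "no ID line found\n";
    close $fh;
}
```

For the CB271253 record quoted below, this should print CB271253. If it does while Bio::SeqIO still reports 'unknown id', the problem lies in the installed parser rather than in the file.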

2008/6/11 Jason Stajich <jason at bioperl.org>:
> What version of BioPerl? It works for me: with this code I get 'CB271253'
> printed out.
>
> #!/usr/bin/perl -w
> use strict;
> use Bio::SeqIO;
> my $in = Bio::SeqIO->new(-format => 'embl', -file => shift);
> while( my $seq = $in->next_seq ) {
>  print $seq->id,"\n";
> }
>
> On Jun 10, 2008, at 4:43 AM, Zhi-Qiang Ye wrote:
>
>> That's weird. I have run into the same problem. I tried an EMBL-format file like
>> this:
>>
>> ID   CB271253; SV 1; linear; mRNA; EST; INV; 591 BP.
>> XX
>> AC   CB271253;
>> XX
>> DT   24-FEB-2003 (Rel. 74, Created)
>> DT   24-FEB-2003 (Rel. 74, Last updated, Version 1)
>> XX
>> DE   taa17c02.x2 Hydra EST -II Hydra magnipapillata cDNA 3' similar to
>> DE   SW:OPSD_RABIT P49912 RHODOPSIN. ;, mRNA sequence.
>>
>> (from: http://www.ebi.ac.uk/cgi-bin/dbfetch?db=embl&id=CB271253&style=raw)
>>
>> The $seq object's ->id and ->display_id both return "unknown id".
>>
>>
>>
>> ZQ Ye
>>
>> 2008/6/9 Hilmar Lapp <hlapp at gmx.net>:
>>>
>>> If this is the case with the latest version of BioPerl, it should be filed
>>> as a bug report for the EMBL parser. The PA accession ought to be reported in
>>> $seq->get_secondary_accessions() (which returns an array). If it isn't,
>>> that sounds like a bug to me.
>>>
>>>       -hilmar
>>>
>>> On Jun 9, 2008, at 4:47 AM, Marc Logghe wrote:
>>>>
>>>> Hi Wen,
>>>> A dump of that sequence object (Data::Dumper is your friend!) reveals
>>>> that the PA EMBL field is not stored in the object. However, you will
>>>> find the string 'AB000170.1' in the embedded CDS feature, more precisely
>>>> as the seq_id of its location object. I don't know whether that is always
>>>> the case, but it is in your particular example.
>>>> So, to get at that value you can do:
>>>>
>>>> my ($cds) = grep {$_->primary_tag eq 'CDS'} $seq->get_SeqFeatures;
>>>> my $parent_id = $cds->location->seq_id;
>>>>
>>>> HTH,
>>>> Marc
>>>>
>>>> Marc Logghe
>>>> Senior Bioinformatician
>>>> Ablynx nv
>>>>>
>>>>> -----Original Message-----
>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Wen Huang
>>>>> Sent: Monday, June 09, 2008 5:28 AM
>>>>> To: bioperl-l at lists.open-bio.org
>>>>> Subject: [Bioperl-l] EMBL format field
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I have an EMBL file from which I want to extract one of the lines:
>>>>>
>>>>> ###file###
>>>>> ID   BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP.
>>>>> XX
>>>>> PA   AB000170.1
>>>>> XX
>>>>> DE   Sus scrofa (pig) endopeptidase 24.16 type M1
>>>>> XX
>>>>> OS   Sus scrofa (pig)
>>>>> OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
>>>>> OC   Eutheria; Laurasiatheria; Cetartiodactyla; Suina; Suidae; Sus.
>>>>> OX   NCBI_TaxID=9823;
>>>>> .........
>>>>>
>>>>> I want the accession number in the line that starts with PA, AB000170
>>>>> in this example.
>>>>>
>>>>> Can anybody kindly help and tell me which module and method I should use?
>>>>> I tried various things like $seq_obj->primary_id, display_id,
>>>>> get_secondary_id, etc., but they did not work.
>>>>>
>>>>> Thanks a lot!
>>>>>
>>>>> Wen
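
As a stopgap while the parser does not keep the PA field, the parent accession can also be read straight from the flat file. The following is a minimal plain-Perl sketch, independent of BioPerl; embl_parent_accessions is a hypothetical helper, not a library call. It assumes PA lines look like the one in Wen's example (PA   AB000170.1) and that each record ends with a // line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: collect the parent accession(s)
# from the PA lines of the first EMBL record read from $fh, splitting off
# the ".version" suffix (e.g. "PA   AB000170.1" -> AB000170, version 1).
sub embl_parent_accessions {
    my ($fh) = @_;
    my @parents;
    while (my $line = <$fh>) {
        last if $line =~ m{^//};                  # "//" ends the record
        next unless $line =~ /^PA\s+([^;\s]+)/;   # keep only PA lines
        my $pa = $1;                              # copy $1 before running another regex
        my ($acc, $version) = split /\./, $pa, 2;
        push @parents, { accession => $acc, version => $version };
    }
    return @parents;
}

# Usage (file name is hypothetical): perl embl_pa.pl BAA19060.embl
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    print "$_->{accession}\n" for embl_parent_accessions($fh);
    close $fh;
}
```

For Wen's record this should print AB000170. Unlike the CDS-location workaround above, it does not depend on a CDS feature being present, but it bypasses the Bio::Seq object entirely.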