[Bioperl-l] EMBL format field

Sun Jun 15 02:25:26 UTC 2008

See this FAQ question, as well as the one following it.

chris

On Jun 14, 2008, at 8:39 AM, Zhi-Qiang Ye wrote:

>  Thanks to all of you.  I finally got the newest version of bioperl
> installed, and that solved the problem.
>
>  I noticed that the ensembl API still uses bioperl-1.2.3, which
> misled me into thinking that bioperl-1.4 was very up-to-date ...
>
>
> Regards,
> Zhi-Qiang
>
>
> 2008/6/12 Kevin Brown <Kevin.M.Brown at asu.edu>:
>> See the following links for where to get a more current version. 1.4 is
>> years old and lots of parts are non-functional due to website and file
>> format changes.
>>
>> http://www.bioperl.org/wiki/Installing_BioPerl
>>
>> http://www.bioperl.org/wiki/Installing_BioPerl_on_Ubuntu_Server
>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org
>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>> Zhi-Qiang Ye
>>> Sent: Thursday, June 12, 2008 2:07 AM
>>> To: Jason Stajich
>>> Cc: bioperl list
>>> Subject: Re: [Bioperl-l] EMBL format field
>>>
>>> Hi, Jason
>>>
>>>    I used exactly your code, and the result is still 'unknown id'.
>>> Where can I get the version of bioperl?
>>> I used ubuntu gutsy, the version in ubuntu's package
>>> management system is 1.4-1.
>>>
>>>    I also installed BioPerl 1.4 on another computer, an IA64 with redhat
>>> linux.  It has the same problem.
>>> During installation using CPAN, make test always failed, so
>>> I used 'force install ....'.
>>> I am not sure whether that is the reason.
>>>
>>> Thanks.
>>> Zhi-Qiang Ye
>>>
>>> 2008/6/11 Jason Stajich <jason at bioperl.org>:
>>>> What version of bioperl? It works for me: using this code I get
>>>> 'CB271253' printed out.
>>>>
>>>> #!/usr/bin/perl -w
>>>> use strict;
>>>> use Bio::SeqIO;
>>>> my $in = Bio::SeqIO->new(-format => 'embl', -file => shift);
>>>> while( my $seq = $in->next_seq ) {
>>>> print $seq->id,"\n";
>>>> }
>>>>
>>>> On Jun 10, 2008, at 4:43 AM, Zhi-Qiang Ye wrote:
>>>>
>>>>> That's weird. I also ran into this problem. I tried an embl-format
>>>>> file like this:
>>>>>
>>>>> ID   CB271253; SV 1; linear; mRNA; EST; INV; 591 BP.
>>>>> XX
>>>>> AC   CB271253;
>>>>> XX
>>>>> DT   24-FEB-2003 (Rel. 74, Created)
>>>>> DT   24-FEB-2003 (Rel. 74, Last updated, Version 1)
>>>>> XX
>>>>> DE   taa17c02.x2 Hydra EST -II Hydra magnipapillata cDNA 3' similar to
>>>>> DE   SW:OPSD_RABIT P49912 RHODOPSIN. ;, mRNA sequence.
>>>>>
>>>>> from:
>>>>> http://www.ebi.ac.uk/cgi-bin/dbfetch?db=embl&id=CB271253&style=raw
>>>>>
>>>>> the $seq object's ->id and ->display_id are both "unknown id" ...
>>>>>
>>>>>
>>>>>
>>>>> ZQ Ye
>>>>>
>>>>> 2008/6/9 Hilmar Lapp <hlapp at gmx.net>:
>>>>>>
>>>>>> If this is the case with the latest version of BioPerl it should be
>>>>>> filed as a bug report for the embl parser. The ID ought to be reported
>>>>>> in $seq->get_secondary_accessions() (which returns an array). If it
>>>>>> doesn't, it sounds like a bug to me.
>>>>>>
>>>>>>     -hilmar
>>>>>>
>>>>>> On Jun 9, 2008, at 4:47 AM, Marc Logghe wrote:
>>>>>>>
>>>>>>> Hi Wen,
>>>>>>> A dump of that sequence object (Data::Dumper is your friend !) reveals
>>>>>>> that the PA EMBL field is not saved into the object. However, you will
>>>>>>> find the string 'AB000170.1' in the embedded CDS feature, more precisely
>>>>>>> the seqid of the location object. I don't know whether that is always
>>>>>>> the case, but it is in your particular example.
>>>>>>> So, to get your hands on that value you have to do:
>>>>>>>
>>>>>>> my ($cds) = grep {$_->primary_tag eq 'CDS'} $seq->get_SeqFeatures;
>>>>>>> my $parent_id = $cds->location->seq_id;
>>>>>>>
>>>>>>> HTH,
>>>>>>> Marc
>>>>>>>
>>>>>>> Marc Logghe
>>>>>>> Senior Bioinformatician
>>>>>>> Ablynx nv
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Wen Huang
>>>>>>>> Sent: Monday, June 09, 2008 5:28 AM
>>>>>>>> To: bioperl-l at lists.open-bio.org
>>>>>>>> Subject: [Bioperl-l] EMBL format field
>>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I have an EMBL file from which I want to extract one of the lines
>>>>>>>>
>>>>>>>> ###file###
>>>>>>>> ID   BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP.
>>>>>>>> XX
>>>>>>>> PA   AB000170.1
>>>>>>>> XX
>>>>>>>> DE   Sus scrofa (pig) endopeptidase 24.16 type M1
>>>>>>>> XX
>>>>>>>> OS   Sus scrofa (pig)
>>>>>>>> OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
>>>>>>>> OC   Eutheria; Laurasiatheria; Cetartiodactyla; Suina; Suidae; Sus.
>>>>>>>> OX   NCBI_TaxID=9823;
>>>>>>>> .........
>>>>>>>>
>>>>>>>> I want the accession number in the line that starts with PA, AB000170
>>>>>>>> in this example.
>>>>>>>>
>>>>>>>> Can anybody kindly help and tell me which module and method I should use?
>>>>>>>> I tried various things like $seq_obj->primary_id, display_id,
>>>>>>>> get_secondary_id, etc., but they did not work...
>>>>>>>>
>>>>>>>> Thanks a lot!
>>>>>>>>
>>>>>>>> Wen
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign
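
For the original question in this thread (reading the PA line), one fallback that does not depend on the installed BioPerl version is to scan the raw EMBL text directly. The sketch below is plain Perl, not BioPerl API: the helper name parent_accessions is hypothetical, and it assumes each PA line holds a single accession with an optional ".N" version suffix, as in the AB000170.1 example quoted above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): collect the accession on every
# PA line of raw EMBL text, splitting off an optional ".N" version suffix.
sub parent_accessions {
    my ($embl_text) = @_;
    my @found;
    for my $line (split /\n/, $embl_text) {
        # A PA line is assumed to look like "PA   AB000170.1"
        next unless $line =~ /^PA\s+(\S+?)(?:\.(\d+))?\s*$/;
        push @found, { accession => $1, version => $2 };
    }
    return @found;
}

# Sample record taken from the first message in the thread.
my $record = <<'EMBL';
ID   BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP.
XX
PA   AB000170.1
XX
DE   Sus scrofa (pig) endopeptidase 24.16 type M1
XX
EMBL

for my $pa (parent_accessions($record)) {
    my $suffix = defined $pa->{version} ? " (version $pa->{version})" : "";
    print $pa->{accession}, $suffix, "\n";   # prints "AB000170 (version 1)"
}
```

To run this against a file instead of the inline sample, read the file into a string and pass it to parent_accessions. Because this bypasses the EMBL parser entirely, it only makes sense as a workaround when the sequence object does not expose the PA field.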