[Bioperl-l] EMBL format field

Wed Jun 11 02:33:45 UTC 2008

Hi, Hilmar,

Thank you for your advice.

I am a BioPerl user and I step in only when there is no
efficient/effective BioPerl method to solve specific problems.

Please forgive us for providing free solutions.

Bill at genenformics.com

> Bill,
>
> This mailing list is about BioPerl. There are many programs and
> websites out there that convert between IDs; that wasn't the question.
>
> We welcome your participation in helping to solve Bioperl-related
> problems, and sometimes the easiest solution is to use other, cross-
> platform open-source tools.
>
> For peddling commercial products, no matter how useful they are and
> how little the cost, please use other forums.
>
> 	-hilmar
>
> On Jun 10, 2008, at 9:43 PM, bill at genenformics.com wrote:
>> This can be accomplished using IdConvert if the protein accession or
>> GI is known:
>>
>> $> ./IdConvert.exe BAA19060
>> #Input  Nuc_GI  Nuc_Acc Pro_GI  Pro_Acc Desc
>> BAA19060        1783121 AB000170.1      1783123 BAA19061.1      endopeptidase 24.16 type M3 [Sus scrofa]
>>
>> Download IdConvert from http://www.genenformics.com/download.html
>> for free.
>>
>> Bill at genenformics.com
>>
>>
>>>
>>> On Jun 10, 2008, at 8:36 PM, Jason Stajich wrote:
>>>> I agree if it isn't the accession # it shouldn't be stored there.
>>>> I guess it is a DBlink, but it is going to be hacky to round-trip
>>>> this as you'll have to have a special case for records that are
>>>> mRNAs...
>>>
>>> I think I agree with that - didn't realize it is the accession of the
>>> (translated) protein. It would be ideal to convert this into a DBLink
>>> annotation indeed, but that's an opinion and an interpretation of the
>>> file (even if a very useful one). As such I believe it should be the
>>> matter of a SeqProcessor.
>>>
>>> Hmm - except that at that point the information has been lost already
>>> so there's actually nothing that the SeqProcessor could massage.
>>>
>>> So what if the line were simply stored as a Bio::Annotation::SimpleValue
>>> with 'PA' as key and the accession# as value? That wouldn't be an
>>> interpretation, and yet would make the value available to a
>>> SeqProcessor for converting into a DBLink.
>>>
>>> 	-hilmar
>>>
>>>>
>>>> -jason
>>>> On Jun 10, 2008, at 5:19 PM, Chris Fields wrote:
>>>>
>>>>> PA is an odd field; it isn't described in the EMBL user manual:
>>>>>
>>>>> http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html
>>>>>
>>>>> but appears in mRNA files, so I'm guessing it stands for the
>>>>> (p)rotein (a)ccession.  I don't think this should be stored as
>>>>> primary/secondary accession, but maybe as a DBLink annotation?
>>>>>
>>>>> chris
>>>>>
>>>>> On Jun 10, 2008, at 6:57 PM, Jason Stajich wrote:
>>>>>
>>>>>> PA is a field that we don't currently parse, something that
>>>>>> should be filed as a bug on bugzilla.
>>>>>> Would you be able to do this?
>>>>>>
>>>>>> -jason
>>>>>> On Jun 9, 2008, at 7:07 AM, Wen Huang wrote:
>>>>>>
>>>>>>> Hilmar,
>>>>>>>
>>>>>>> I tried that, it did not work. Marc's way can work.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Wen
>>>>>>>
>>>>>>> On Jun 9, 2008, at 7:30 AM, Hilmar Lapp wrote:
>>>>>>>
>>>>>>>> If this is the case with the latest version of BioPerl it
>>>>>>>> should be filed as a bug report for the embl parser. The ID
>>>>>>>> ought to be reported in $seq->get_secondary_accessions() (which
>>>>>>>> returns an array). If it doesn't, it sounds like a bug to me.
>>>>>>>>
>>>>>>>> 	-hilmar
>>>>>>>>
>>>>>>>> On Jun 9, 2008, at 4:47 AM, Marc Logghe wrote:
>>>>>>>>> Hi Wen,
>>>>>>>>> A dump of that sequence object (Data::Dumper is your friend!) reveals
>>>>>>>>> that the PA EMBL field is not saved into the object. However, you will
>>>>>>>>> find the string 'AB000170.1' in the embedded CDS feature, more precisely
>>>>>>>>> in the seqid of the location object. I don't know whether that is always
>>>>>>>>> the case, but it is in your particular example.
>>>>>>>>> So, to get your hands on that value you have to do:
>>>>>>>>>
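>>>>>>>>> # take the first CDS feature; in this record its location's seq_id
>>>>>>>>> # holds the string from the PA line
>>>>>>>>> my ($cds) = grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures;
>>>>>>>>> my $parent_id = $cds->location->seq_id;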
>>>>>>>>>
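A minimal sketch of a fallback that does not go through the BioPerl parser at all, assuming the record is available as plain text and that the PA line always has the form shown in the example file (`PA   AB000170.1`). It scans the header lines and captures the accession. The helper name `extract_pa` and its `$strip_version` flag are hypothetical, made up for illustration only.

```perl
use strict;
use warnings;

# Return the accession found on the first PA line of an EMBL record,
# or nothing if the record has no PA line.
# $strip_version is a hypothetical option: when true, the trailing
# ".N" version suffix is removed (AB000170.1 -> AB000170).
sub extract_pa {
    my ($record, $strip_version) = @_;
    for my $line (split /\n/, $record) {
        last if $line =~ m{^//};          # "//" terminates an EMBL record
        if ($line =~ /^PA\s+(\S+)/) {     # two-letter line code, then the value
            my $acc = $1;
            $acc =~ s/\.\d+$// if $strip_version;
            return $acc;
        }
    }
    return;
}

# Header lines copied from the example file in this thread.
my $record = <<'EMBL';
ID   BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP.
XX
PA   AB000170.1
XX
DE   Sus scrofa (pig) endopeptidase 24.16 type M1
EMBL

print extract_pa($record), "\n";       # prints AB000170.1
print extract_pa($record, 1), "\n";    # prints AB000170
```

This sidesteps the question of whether the CDS location always carries the same value, at the cost of reading the flat file a second time outside the sequence object.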
>>>>>>>>> HTH,
>>>>>>>>> Marc
>>>>>>>>>
>>>>>>>>> Marc Logghe
>>>>>>>>> Senior Bioinformatician
>>>>>>>>> Ablynx nv
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Wen Huang
>>>>>>>>>> Sent: Monday, June 09, 2008 5:28 AM
>>>>>>>>>> To: bioperl-l at lists.open-bio.org
>>>>>>>>>> Subject: [Bioperl-l] EMBL format field
>>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> I have an EMBL file from which I want to extract one line:
>>>>>>>>>>
>>>>>>>>>> ###file###
>>>>>>>>>> ID   BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP.
>>>>>>>>>> XX
>>>>>>>>>> PA   AB000170.1
>>>>>>>>>> XX
>>>>>>>>>> DE   Sus scrofa (pig) endopeptidase 24.16 type M1
>>>>>>>>>> XX
>>>>>>>>>> OS   Sus scrofa (pig)
>>>>>>>>>> OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
>>>>>>>>>> OC   Eutheria; Laurasiatheria; Cetartiodactyla; Suina; Suidae; Sus.
>>>>>>>>>> OX   NCBI_TaxID=9823;
>>>>>>>>>> .........
>>>>>>>>>>
>>>>>>>>>> I want the accession number in the line that starts with PA,
>>>>>>>>>> AB000170 in this example.
>>>>>>>>>>
>>>>>>>>>> Can anybody kindly help and tell me which module and method I should
>>>>>>>>>> use? I tried various things like $seq_obj->primary_id, display_id,
>>>>>>>>>> get_secondary_id, etc., but they did not work...
>>>>>>>>>>
>>>>>>>>>> Thanks a lot!
>>>>>>>>>>
>>>>>>>>>> Wen
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Bioperl-l mailing list
>>>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>
>>>>>>>> --
>>>>>>>> ===========================================================
>>>>>>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>>>>>>>> ===========================================================
>>>>>
>>>>> Christopher Fields
>>>>> Postdoctoral Researcher
>>>>> Lab of Dr. Marie-Claude Hofmann
>>>>> College of Veterinary Medicine
>>>>> University of Illinois Urbana-Champaign