[Bioperl-l] EMBL format field

bill at genenformics.com bill at genenformics.com
Wed Jun 11 01:43:55 UTC 2008


This can be accomplished using IdConvert if the protein accession or GI is known:

$> ./IdConvert.exe BAA19060
#Input  Nuc_GI  Nuc_Acc Pro_GI  Pro_Acc Desc
BAA19060        1783121 AB000170.1      1783123 BAA19061.1      endopeptidase 24.16 type M3 [Sus scrofa]

Download IdConvert from http://www.genenformics.com/download.html for free.

Bill at genenformics.com


>
> On Jun 10, 2008, at 8:36 PM, Jason Stajich wrote:
>> I agree if it isn't the accession # it shouldn't be stored there.
>> I guess it is a DBlink, but it is going to be hacky to round-trip
>> this as you'll have to have a special case for records that are
>> mRNAs...
>
> I think I agree with that - didn't realize it is the accession of the
> (translated) protein. It would be ideal to convert this into a DBLink
> annotation indeed, but that's an opinion and an interpretation of the
> file (even if a very useful one). As such I believe it should be the
> matter of a SeqProcessor.
>
> Hmm - except that at that point the information has been lost already
> so there's actually nothing that the SeqProcessor could massage.
>
> So what if the line would simply be a Bio::Annotation::SimpleValue with
> 'PA' as key and the accession# as value? That wouldn't be an
> interpretation, and yet would make the value available to a
> SeqProcessor for converting into a DBLink.
>
> 	-hilmar
>
>>
>> -jason
>> On Jun 10, 2008, at 5:19 PM, Chris Fields wrote:
>>
>>> PA is an odd field; it isn't described in the EMBL user manual:
>>>
>>> http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html
>>>
>>> but appears in mRNA files, so I'm guessing it stands for the
>>> (p)rotein (a)ccession.  I don't think this should be stored as
>>> primary/secondary accession, but maybe as a DBLink annotation?
>>>
>>> chris
>>>
>>> On Jun 10, 2008, at 6:57 PM, Jason Stajich wrote:
>>>
>>>> PA is a field that we don't currently parse, something that
>>>> should be filed as a bug on bugzilla.
>>>> Would you be able to do this?
>>>>
>>>> -jason
>>>> On Jun 9, 2008, at 7:07 AM, Wen Huang wrote:
>>>>
>>>>> Hilmar,
>>>>>
>>>>> I tried that; it did not work. Marc's way does work.
>>>>>
>>>>> Thanks,
>>>>> Wen
>>>>>
>>>>> On Jun 9, 2008, at 7:30 AM, Hilmar Lapp wrote:
>>>>>
>>>>>> If this is the case with the latest version of BioPerl it
>>>>>> should be filed as a bug report for the embl parser. The ID
>>>>>> ought to be reported in $seq->get_secondary_accessions() (which
>>>>>> returns an array). If it doesn't, it sounds like a bug to me.
>>>>>>
>>>>>> 	-hilmar
>>>>>>
>>>>>> On Jun 9, 2008, at 4:47 AM, Marc Logghe wrote:
>>>>>>> Hi Wen,
>>>>>>> A dump of that sequence object (Data::Dumper is your friend!)
>>>>>>> reveals that the PA EMBL field is not saved into the object.
>>>>>>> However, you will find the string 'AB000170.1' in the embedded
>>>>>>> CDS feature, more precisely the seqid of the location object.
>>>>>>> I don't know whether that is always the case, but it is in your
>>>>>>> particular example.
>>>>>>> So, to get your hands on that value you have to do:
>>>>>>>
>>>>>>> my ($cds) = grep {$_->primary_tag eq 'CDS'} $seq->get_SeqFeatures;
>>>>>>> my $parent_id = $cds->location->seq_id;
>>>>>>>
>>>>>>> HTH,
>>>>>>> Marc
>>>>>>>
>>>>>>> Marc Logghe
>>>>>>> Senior Bioinformatician
>>>>>>> Ablynx nv
>>>>>>>> -----Original Message-----
>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Wen Huang
>>>>>>>> Sent: Monday, June 09, 2008 5:28 AM
>>>>>>>> To: bioperl-l at lists.open-bio.org
>>>>>>>> Subject: [Bioperl-l] EMBL format field
>>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I have an EMBL file from which I want to extract one line:
>>>>>>>>
>>>>>>>> ###file###
>>>>>>>> ID   BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP.
>>>>>>>> XX
>>>>>>>> PA   AB000170.1
>>>>>>>> XX
>>>>>>>> DE   Sus scrofa (pig) endopeptidase 24.16 type M1
>>>>>>>> XX
>>>>>>>> OS   Sus scrofa (pig)
>>>>>>>> OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
>>>>>>>> OC   Eutheria; Laurasiatheria; Cetartiodactyla; Suina; Suidae; Sus.
>>>>>>>> OX   NCBI_TaxID=9823;
>>>>>>>> .........
>>>>>>>>
>>>>>>>> I want the accession number in the line that starts with PA,
>>>>>>>> AB000170 in this example.
>>>>>>>>
>>>>>>>> Can anybody kindly help and tell me which module and method I
>>>>>>>> should use? I tried various things like $seq_obj->primary_id,
>>>>>>>> display_id, get_secondary_id, etc., but they did not work.
>>>>>>>>
>>>>>>>> Thanks a lot!
>>>>>>>>
>>>>>>>> Wen
>>>>>>>> _______________________________________________
>>>>>>>> Bioperl-l mailing list
>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> ===========================================================
>>>>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>>>>>> ===========================================================
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher
>>> Lab of Dr. Marie-Claude Hofmann
>>> College of Veterinary Medicine
>>> University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>>
>>
>
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
>

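For readers who only need the value of the PA line and cannot rely on the
EMBL parser storing it, the discussion above suggests reading the line
directly. What follows is a minimal sketch in core Perl only (no BioPerl
calls), assuming the record is available as a single string and that the PA
line has the usual two-letter code followed by spaces. The function name
extract_pa is hypothetical and is not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): return the value of the
# first PA line in an EMBL flat-file record, or undef if there is none.
sub extract_pa {
    my ($record) = @_;
    # /m lets ^ match at the start of every line inside the record.
    return $record =~ /^PA[ \t]+(\S+)/m ? $1 : undef;
}

# Sample record copied from the original question in this thread.
my $record = <<'EMBL';
ID   BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP.
XX
PA   AB000170.1
XX
DE   Sus scrofa (pig) endopeptidase 24.16 type M1
XX
EMBL

my $pa = extract_pa($record);

# Strip the ".1" sequence version to get the bare accession.
(my $unversioned = $pa) =~ s/\.\d+$//;

print "$pa $unversioned\n";    # prints: AB000170.1 AB000170
```

This bypasses the sequence object entirely, so it only helps when the raw
record text is at hand. It does not address the round-tripping question
raised in the thread.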



