[Bioperl-l] EMBL format field

Chris Fields cjfields at uiuc.edu
Wed Jun 11 02:59:43 UTC 2008


Bill,

It's okay to suggest solutions to problems, particularly if no one
answers, but I have to agree with Hilmar in this case.

The specific problem: your 'solution' is tied to commercial software
(albeit free), which appears to be closed-source and to have
questionable licensing.  I couldn't find documentation on your website
addressing either issue.  Therefore, I can't recommend using it unless
both issues are clarified, preferably by making it open-source.

chris

On Jun 10, 2008, at 9:33 PM, bill at genenformics.com wrote:

> Hi, Hilmar,
>
> Thank you for your advice.
>
> I am a BioPerl user and I step in only when there is no
> efficient/effective BioPerl method to solve specific problems.
>
> Please forgive us for providing free solutions.
>
> Bill at genenformics.com
>
>> Bill,
>>
>> this mailing list is about BioPerl. There are many programs and
>> websites out there that convert between IDs, but that wasn't the
>> question.
>>
>> We welcome your participation in helping to solve BioPerl-related
>> problems, and sometimes the easiest solution is to use other,
>> cross-platform open-source tools.
>>
>> For peddling commercial products, no matter how useful they are and
>> how little the cost, please use other forums.
>>
>> 	-hilmar
>>
>> On Jun 10, 2008, at 9:43 PM, bill at genenformics.com wrote:
>>> This can be accomplished using IdConvert if protein accession/gi is
>>> known:
>>>
>>> $> ./IdConvert.exe BAA19060
>>> #Input    Nuc_GI   Nuc_Acc     Pro_GI   Pro_Acc     Desc
>>> BAA19060  1783121  AB000170.1  1783123  BAA19061.1  endopeptidase 24.16 type M3 [Sus scrofa]
>>>
>>> Download IdConvert from http://www.genenformics.com/download.html
>>> for free.
>>>
>>> Bill at genenformics.com
>>>
>>>
>>>>
>>>> On Jun 10, 2008, at 8:36 PM, Jason Stajich wrote:
>>>>> I agree if it isn't the accession # it shouldn't be stored there.
>>>>> I guess it is a DBlink, but it is going to be hacky to round-trip
>>>>> this as you'll have to have a special case for records that are
>>>>> mRNAs...
>>>>
>>>> I think I agree with that - didn't realize it is the accession of
>>>> the (translated) protein. It would indeed be ideal to convert this
>>>> into a DBLink annotation, but that's an opinion and an
>>>> interpretation of the file (even if a very useful one). As such I
>>>> believe it should be the matter of a SeqProcessor.
>>>>
>>>> Hmm - except that at that point the information has already been
>>>> lost, so there's actually nothing that the SeqProcessor could
>>>> massage.
>>>>
>>>> So what if the line were simply stored as a
>>>> Bio::Annotation::SimpleValue with 'PA' as key and the accession# as
>>>> value? That wouldn't be an interpretation, and yet would make the
>>>> value available to a SeqProcessor for converting into a DBLink.
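
The conversion step proposed above could look roughly like the following sketch. It assumes the EMBL parser already stores the PA line as a Bio::Annotation::SimpleValue under the key 'PA', which is the proposal in this thread and not something the parser did at the time. The function name pa_to_dblink and the database name 'EMBL' are illustrative assumptions, and this is a plain function, not a full SeqProcessor implementation.

```perl
use strict;
use warnings;

# Turn every 'PA' SimpleValue annotation on a sequence into a DBLink
# annotation and return the links that were added.
# ASSUMPTION: the parser has stored the PA line under the key 'PA';
# pa_to_dblink is a hypothetical helper, not part of BioPerl.
sub pa_to_dblink {
    my ($seq) = @_;
    require Bio::Annotation::DBLink;    # loaded lazily; needs BioPerl

    my $collection = $seq->annotation;
    my @links;
    for my $pa ($collection->get_Annotations('PA')) {
        my $link = Bio::Annotation::DBLink->new(
            -database   => 'EMBL',      # assumed database name
            -primary_id => $pa->value,  # e.g. 'AB000170.1'
        );
        $collection->add_Annotation('dblink', $link);
        push @links, $link;
    }
    return @links;
}
```

The original 'PA' annotation is left in place, so the uninterpreted value stays available alongside the derived DBLink.
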
>>>>
>>>> 	-hilmar
>>>>
>>>>>
>>>>> -jason
>>>>> On Jun 10, 2008, at 5:19 PM, Chris Fields wrote:
>>>>>
>>>>>> PA is an odd field; it isn't described in the EMBL user manual:
>>>>>>
>>>>>> http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html
>>>>>>
>>>>>> but appears in mRNA files, so I'm guessing it stands for the
>>>>>> (p)rotein (a)ccession.  I don't think this should be stored as
>>>>>> primary/secondary accession, but maybe as a DBLink annotation?
>>>>>>
>>>>>> chris
>>>>>>
>>>>>> On Jun 10, 2008, at 6:57 PM, Jason Stajich wrote:
>>>>>>
>>>>>>> PA is a field that we don't currently parse, something that
>>>>>>> should be filed as a bug on bugzilla.
>>>>>>> Would you be able to do this?
>>>>>>>
>>>>>>> -jason
>>>>>>> On Jun 9, 2008, at 7:07 AM, Wen Huang wrote:
>>>>>>>
>>>>>>>> Hilmar,
>>>>>>>>
>>>>>>>> I tried that, it did not work. Marc's way can work.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Wen
>>>>>>>>
>>>>>>>> On Jun 9, 2008, at 7:30 AM, Hilmar Lapp wrote:
>>>>>>>>
>>>>>>>>> If this is the case with the latest version of BioPerl it
>>>>>>>>> should be filed as a bug report for the EMBL parser. The ID
>>>>>>>>> ought to be reported in $seq->get_secondary_accessions()
>>>>>>>>> (which returns an array). If it doesn't, it sounds like a bug
>>>>>>>>> to me.
>>>>>>>>>
>>>>>>>>> 	-hilmar
>>>>>>>>>
>>>>>>>>> On Jun 9, 2008, at 4:47 AM, Marc Logghe wrote:
>>>>>>>>>> Hi Wen,
>>>>>>>>>> A dump of that sequence object (Data::Dumper is your friend!)
>>>>>>>>>> reveals that the PA EMBL field is not saved into the object.
>>>>>>>>>> However, you will find the string 'AB000170.1' in the embedded
>>>>>>>>>> CDS feature, more precisely the seq_id of the location object.
>>>>>>>>>> I don't know whether that is always the case, but it is in
>>>>>>>>>> your particular example.
>>>>>>>>>> So, to get your hands on that value you have to do:
>>>>>>>>>>
>>>>>>>>>> my ($cds) = grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures;
>>>>>>>>>> my $parent_id = $cds->location->seq_id;
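
A slightly more defensive version of the two lines above, as a sketch only. The helper name cds_parent_id and the file name in the usage comment are hypothetical. It relies only on the methods already used in the snippet (get_SeqFeatures, primary_tag, location, seq_id) and returns undef when the record has no CDS feature, since it is not known whether the value is always present.

```perl
use strict;
use warnings;

# Return the seq_id of the first CDS feature's location, or undef if
# the sequence has no CDS feature.
# ASSUMPTION: cds_parent_id is a hypothetical helper; $seq is any
# object offering get_SeqFeatures, such as a Bio::Seq read by Bio::SeqIO.
sub cds_parent_id {
    my ($seq) = @_;
    my ($cds) = grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures;
    return unless $cds;
    return $cds->location->seq_id;
}

# Usage with BioPerl installed (the file name is made up):
#   use Bio::SeqIO;
#   my $in  = Bio::SeqIO->new(-file => 'BAA19060.embl', -format => 'embl');
#   my $seq = $in->next_seq;
#   my $id  = cds_parent_id($seq);
#   print defined $id ? "$id\n" : "no CDS feature found\n";
```
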
>>>>>>>>>>
>>>>>>>>>> HTH,
>>>>>>>>>> Marc
>>>>>>>>>>
>>>>>>>>>> Marc Logghe
>>>>>>>>>> Senior Bioinformatician
>>>>>>>>>> Ablynx nv
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>>>>>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Wen Huang
>>>>>>>>>>> Sent: Monday, June 09, 2008 5:28 AM
>>>>>>>>>>> To: bioperl-l at lists.open-bio.org
>>>>>>>>>>> Subject: [Bioperl-l] EMBL format field
>>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> I have an EMBL file from which I want to extract one line:
>>>>>>>>>>>
>>>>>>>>>>> ###file###
>>>>>>>>>>> ID   BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP.
>>>>>>>>>>> XX
>>>>>>>>>>> PA   AB000170.1
>>>>>>>>>>> XX
>>>>>>>>>>> DE   Sus scrofa (pig) endopeptidase 24.16 type M1
>>>>>>>>>>> XX
>>>>>>>>>>> OS   Sus scrofa (pig)
>>>>>>>>>>> OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
>>>>>>>>>>> OC   Eutheria; Laurasiatheria; Cetartiodactyla; Suina; Suidae; Sus.
>>>>>>>>>>> OX   NCBI_TaxID=9823;
>>>>>>>>>>> .........
>>>>>>>>>>>
>>>>>>>>>>> I want the accession number in the line that starts with PA,
>>>>>>>>>>> AB000170 in this example.
>>>>>>>>>>>
>>>>>>>>>>> Can anybody kindly tell me which module and method I should use?
>>>>>>>>>>> I tried various things like $seq_obj->primary_id, display_id,
>>>>>>>>>>> get_secondary_id, etc., but they did not work.
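
Since the thread establishes that the EMBL parser did not read the PA line at the time, one possible fallback is to read the line straight from the record text. This is a minimal sketch in plain Perl, not a BioPerl method; the helper names pa_accession and strip_version are made up for illustration, and the record is the excerpt quoted above.

```perl
use strict;
use warnings;

# Return the accession on the first PA line of an EMBL-format record,
# or undef if the record has no PA line.
sub pa_accession {
    my ($record) = @_;
    return $1 if $record =~ /^PA\s+(\S+)/m;
    return;
}

# Drop a trailing ".N" sequence version, e.g. AB000170.1 -> AB000170.
sub strip_version {
    my ($acc) = @_;
    (my $bare = $acc) =~ s/\.\d+\z//;
    return $bare;
}

my $record = <<'EMBL';
ID   BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP.
XX
PA   AB000170.1
XX
EMBL

my $acc = pa_accession($record);
print "$acc\n";                     # prints AB000170.1
print strip_version($acc), "\n";    # prints AB000170
```
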
>>>>>>>>>>>
>>>>>>>>>>> Thanks a lot!
>>>>>>>>>>>
>>>>>>>>>>> Wen
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Bioperl-l mailing list
>>>>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> ===========================================================
>>>>>>>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>>>>>>>>> ===========================================================
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>> Christopher Fields
>>>>>> Postdoctoral Researcher
>>>>>> Lab of Dr. Marie-Claude Hofmann
>>>>>> College of Veterinary Medicine
>>>>>> University of Illinois Urbana-Champaign
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>> --
>>>> ===========================================================
>>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>>>> ===========================================================
>>>>
>>>>
>>>>
>>>>
>>>
>>
>> --
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>>
>>
>>
>>
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign






