[Bioperl-l] EMBL format field

Wen Huang whuang.ustc at gmail.com
Wed Jun 11 00:51:51 UTC 2008


Hi Everybody,

Thank you for your thoughtful discussion and help. I have found  
another way to get around it (by grep and awk), but not so perl-ish.
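
A minimal sketch of what such an awk-based workaround could look like (an illustration only, not necessarily the exact commands used). The temporary sample file is hypothetical and simply reproduces the header lines from the original question; stripping the ".1" version suffix is an assumption based on the accession asked for there.

```shell
# Hypothetical sample file with the header lines from the original question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
ID   BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP.
XX
PA   AB000170.1
XX
DE   Sus scrofa (pig) endopeptidase 24.16 type M1
XX
EOF

# Remember the ID of each record, then print it next to the PA accession
# with the trailing ".version" removed (tab-separated).
parent_ids=$(awk '
    $1 == "ID" { id = $2; sub(/;$/, "", id) }
    $1 == "PA" { acc = $2; sub(/\.[0-9]+$/, "", acc); print id "\t" acc }
' "$sample")

printf '%s\n' "$parent_ids"
rm -f "$sample"
```

Because the ID is remembered until the next PA line is seen, the same awk program should also work on a file containing many records.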

I don't think I know how to submit a bug report to bugzilla, but I do  
think that it is not a good idea to include the parent id in a PA  
line, or even in the file...

The file I got is from the EMBL-CDS databank, and I wanted to get the mRNA
from which the records are derived. I guess it is better to include it as a
DBlink, as Jason pointed out.

Thanks,
Wen

On Jun 10, 2008, at 7:36 PM, Jason Stajich wrote:

> I agree if it isn't the accession # it shouldn't be stored there.  I  
> guess it is a DBlink, but it is going to be hacky to round-trip this  
> as you'll have to have a special case for records that are mRNAs...
>
> -jason
> On Jun 10, 2008, at 5:19 PM, Chris Fields wrote:
>
>> PA is an odd field; it isn't described in the EMBL user manual:
>>
>> http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html
>>
>> but appears in mRNA files, so I'm guessing it stands for the  
>> (p)rotein (a)ccession.  I don't think this should be stored as  
>> primary/secondary accession, but maybe as a DBLink annotation?
>>
>> chris
>>
>> On Jun 10, 2008, at 6:57 PM, Jason Stajich wrote:
>>
>>> PA is a field that we don't currently parse, something that should  
>>> be filed as a bug on bugzilla.
>>> Would you be able to do this?
>>>
>>> -jason
>>> On Jun 9, 2008, at 7:07 AM, Wen Huang wrote:
>>>
>>>> Hilmar,
>>>>
>>>> I tried that, it did not work. Marc's way can work.
>>>>
>>>> Thanks,
>>>> Wen
>>>>
>>>> On Jun 9, 2008, at 7:30 AM, Hilmar Lapp wrote:
>>>>
>>>>> If this is the case with the latest version of BioPerl it should  
>>>>> be filed as a bug report for the embl parser. The ID ought to be  
>>>>> reported in $seq->get_secondary_accessions() (which returns an  
>>>>> array). If it doesn't, it sounds like a bug to me.
>>>>>
>>>>> 	-hilmar
>>>>>
>>>>> On Jun 9, 2008, at 4:47 AM, Marc Logghe wrote:
>>>>>> Hi Wen,
>>>>>> A dump of that sequence object (Data::Dumper is your friend!) reveals
>>>>>> that the PA EMBL field is not saved into the object. However, you will
>>>>>> find the string 'AB000170.1' in the embedded CDS feature, more precisely
>>>>>> the seqid of the location object. I don't know whether that is always
>>>>>> the case, but it is in your particular example.
>>>>>> So, to get your hands on that value you have to do:
>>>>>>
>>>>>> my ($cds) = grep {$_->primary_tag eq 'CDS'} $seq->get_SeqFeatures;
>>>>>> my $parent_id = $cds->location->seq_id;
>>>>>>
>>>>>> HTH,
>>>>>> Marc
>>>>>>
>>>>>> Marc Logghe
>>>>>> Senior Bioinformatician
>>>>>> Ablynx nv
>>>>>>> -----Original Message-----
>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>> bounces at lists.open-bio.org] On Behalf Of Wen Huang
>>>>>>> Sent: Monday, June 09, 2008 5:28 AM
>>>>>>> To: bioperl-l at lists.open-bio.org
>>>>>>> Subject: [Bioperl-l] EMBL format field
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I have an EMBL file from which I want to extract one of the lines:
>>>>>>>
>>>>>>> ###file###
>>>>>>> ID   BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP.
>>>>>>> XX
>>>>>>> PA   AB000170.1
>>>>>>> XX
>>>>>>> DE   Sus scrofa (pig) endopeptidase 24.16 type M1
>>>>>>> XX
>>>>>>> OS   Sus scrofa (pig)
>>>>>>> OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
>>>>>>> OC   Eutheria; Laurasiatheria; Cetartiodactyla; Suina; Suidae; Sus.
>>>>>>> OX   NCBI_TaxID=9823;
>>>>>>> .........
>>>>>>>
>>>>>>> I want the accession number in the line that starts with PA,
>>>>>>> AB000170 in this example.
>>>>>>>
>>>>>>> Can anybody kindly help and tell me which module and method I
>>>>>>> should use? I tried various things like $seq_obj->primary_id,
>>>>>>> display_id, get_secondary_id, etc., but they did not work...
>>>>>>>
>>>>>>> Thanks a lot!
>>>>>>>
>>>>>>> Wen
>>>>>>> _______________________________________________
>>>>>>> Bioperl-l mailing list
>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>
>>>>>
>>>>> -- 
>>>>> ===========================================================
>>>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>>>>> ===========================================================
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Marie-Claude Hofmann
>> College of Veterinary Medicine
>> University of Illinois Urbana-Champaign
>>
>>
>>
>>
>



