[Bioperl-l] Get variation included in genbank file

Dave Messina David.Messina at sbc.su.se
Wed Jun 9 17:20:12 UTC 2010


Hi again Jessica,

Forgive my slowness here, but is this what you want to do?

1) Start with an NM_ mRNA record

	in your example, NM_001110556.1

2) Obtain the corresponding NG_ genomic locus record in GenBank format

	which would correspond to the example file you attached (accession number NG_011506)


Is that right?

There are probably more clever ways to do this, but here's how I would approach it:

1) Extract the GeneID dbxref from the NM_ mRNA record using Bio::SeqIO.

	See http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Getting_the_Features
	for details.

2) Use that to query the Gene database and get the related NG_ record

	I don't know exactly what the field name is for the NG_ record, but you can list all the available fields using this example:
	http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#What_information_is_available_for_database_.27x.27.3F

	and work out which one you need by trial and error.

3) Once you have the NG_ id, you can retrieve the genbank record

	Here's the relevant example:
	http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#Retrieve_raw_data_records_from_GenBank.2C_save_raw_data_to_file.2C_then_parse_via_Bio::SeqIO
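
A minimal sketch of step 1, assuming the NM_ record has already been saved locally in GenBank format and that its features carry a db_xref of the form "GeneID:<number>". The subroutine names and the command-line handling are made up for illustration; only Bio::SeqIO and the feature methods (get_SeqFeatures, has_tag, get_tag_values) come from BioPerl.

```perl
use strict;
use warnings;

# Pick the numeric GeneID out of a list of db_xref values,
# e.g. ("HGNC:3754", "GeneID:2316") gives "2316".
# Returns undef (in scalar context) if no GeneID entry is present.
sub gene_id_from_dbxrefs {
    my @dbxrefs = @_;
    for my $xref (@dbxrefs) {
        return $1 if $xref =~ /^GeneID:(\d+)$/;
    }
    return;
}

# Read the first record of a GenBank file and return the first GeneID
# found on any of its features.
sub gene_id_from_genbank_file {
    my ($file) = @_;
    require Bio::SeqIO;    # loaded lazily; the helper above needs no BioPerl
    my $in  = Bio::SeqIO->new(-file => $file, -format => 'genbank');
    my $seq = $in->next_seq or die "No sequence found in $file\n";
    for my $feat ($seq->get_SeqFeatures) {
        next unless $feat->has_tag('db_xref');
        my $id = gene_id_from_dbxrefs($feat->get_tag_values('db_xref'));
        return $id if defined $id;
    }
    return;
}

# Hypothetical usage: perl get_geneid.pl NM_001110556.gb
if (@ARGV) {
    my $gene_id = gene_id_from_genbank_file($ARGV[0]);
    print defined $gene_id ? "GeneID: $gene_id\n" : "No GeneID db_xref found\n";
}
```

The db_xref parsing is kept in its own subroutine so it can be checked without a GenBank file or a BioPerl installation.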




So, by now it should be obvious that I'm presenting a general strategy. You'll have to do some legwork to get exactly what you want.
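
As a starting point for that legwork, here is a rough sketch of steps 2 and 3 with Bio::DB::EUtilities, modelled on the cookbook examples linked above. It makes several assumptions: that elink from the 'gene' database to 'nuccore' returns a link set leading to the NG_ record (the script only prints what comes back, so you still have to pick the right link name yourself), that efetch accepts the accession as an ID, and that the subroutine names, script name, and output file name are placeholders. It does not show whether the fetched record includes the variation features.

```perl
use strict;
use warnings;

# Constructor arguments for an elink query from Gene to nuccore.
sub elink_args {
    my ($gene_id, $email) = @_;
    return (
        -eutil  => 'elink',
        -email  => $email,
        -dbfrom => 'gene',
        -db     => 'nuccore',
        -id     => [$gene_id],
    );
}

# Constructor arguments for fetching one nucleotide record in GenBank format.
sub efetch_args {
    my ($accession, $email) = @_;
    return (
        -eutil   => 'efetch',
        -email   => $email,
        -db      => 'nuccore',
        -rettype => 'gb',
        -id      => [$accession],
    );
}

# Hypothetical usage: perl gene2ng.pl <GeneID> <your email> [NG_ accession]
if (@ARGV >= 2) {
    require Bio::DB::EUtilities;    # needs BioPerl and network access
    my ($gene_id, $email, $accession) = @ARGV;

    # Step 2: print every link name returned for this gene and the IDs behind it.
    my $elink = Bio::DB::EUtilities->new(elink_args($gene_id, $email));
    while (my $linkset = $elink->next_LinkSet) {
        print "Link name: ", $linkset->get_link_name, "\n";
        print "      IDs: ", join(',', $linkset->get_ids), "\n";
    }

    # Step 3: once the accession is known, save the raw GenBank record to a file.
    if (defined $accession) {
        my $efetch = Bio::DB::EUtilities->new(efetch_args($accession, $email));
        $efetch->get_Response(-file => "$accession.gb");
        print "Saved $accession.gb\n";
    }
}
```

The saved file can then be read back with Bio::SeqIO using -format => 'genbank', as in the cookbook example for step 3.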


Good luck, and if you come up with a nice solution, please add it to the wiki!

Dave





> I need to automatically get a gbk file like this one, with Variation (dbSNP) features included and correct mRNA/CDS regions. Can it be done automatically using EUtilities? I am not sure.
> 
> thx
> 
> 
> On Mon, Jun 7, 2010 at 5:18 PM, Dave Messina <David.Messina at sbc.su.se> wrote:
> Hi Jessica,
> 
> 
> > Does anyone know how to include variation (dbSNP) in the GenBank file format
> > automatically, starting from an NM_ accession number, using BioPerl?
> 
> I'm not sure I understand the question.
> 
> As far as I know, Genbank records don't include SNP information. See for example the record for human p53 (which has SNPs):
> 
>        http://www.ncbi.nlm.nih.gov/nuccore/NM_000546.4
> 
> 
> I think though you should be able to get to a dbSNP record if you have a NM_ accession number using the BioPerl interface to NCBI's EUtilities.
> 
> More information here:
> 
>        http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook
> 
> 
> If that's not what you're after, could you clarify what you want to do?
> 
> 
> Dave
> 
> 
> 
> 
> -- 
> Jessica Jingping Sun
> <FLNA.gbk>




