[Bioperl-l] SNP reference file download

Chris Fields cjfields at uiuc.edu
Fri Jul 21 15:50:20 UTC 2006


You'll need the latest code from CVS. You could try the (highly
experimental) Bio::DB::EUtilities to get the raw flatfile XML data, then
pass everything through Bio::ClusterIO.  EUtilities currently has no
tempfile, file, or filehandle support, but I plan on adding this soon.
You could also pipe STDOUT from a SNP retrieval script into STDIN of a
second script that runs ClusterIO.
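
A minimal sketch of the receiving end of such a pipe. It assumes the
retrieval script prints the SNP XML to STDOUT and that the dbsnp parser can
read from a non-seekable handle; the script names are hypothetical.

```perl
# parse_snp.pl (hypothetical name): parse SNP data arriving on STDIN
use strict;
use warnings;
use Bio::ClusterIO;

# -fh takes any open filehandle; here it is the piped-in STDIN
my $cio = Bio::ClusterIO->new(-format => 'dbsnp',
                              -fh     => \*STDIN);

while (my $snp = $cio->next_cluster) {
    print "ID : ", $snp->id, "\n";
}
```

It would be run as something like: perl fetch_snp.pl | perl parse_snp.pl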

BTW, the EFetch object below also accepts an array reference of primary IDs,
so you don't need to run an ESearch query first if you already have them.
In that case you'll need to set the database parameter (-db => 'snp')
yourself; the example doesn't set it because the database from the ESearch
query is passed to EFetch via the Cookie object.
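
A rough sketch of that ID-based variant, with no ESearch step. Two things
here are assumptions rather than anything stated above: that the ID list is
passed with an -id parameter (check the EUtilities perldoc), and that the
primary ID for rs4986950 is the number without the 'rs' prefix.

```perl
use strict;
use warnings;
use Bio::DB::EUtilities;

# primary IDs; assumed to be the rs number with the 'rs' prefix removed
my @ids = (4986950);

# no cookie is used, so the database must be named explicitly;
# the -id parameter name is an assumption
my $efetch = Bio::DB::EUtilities->new(-eutil   => 'efetch',
                                      -db      => 'snp',
                                      -id      => \@ids,
                                      -rettype => 'flt'); # SNP flatfile

print $efetch->get_response->content;
```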

Chris

use Bio::DB::EUtilities;
use Bio::ClusterIO;

# save XML to tempfile for read/write
open my $XMLDATA, '+>', 'tempfile.xml'
    or die "Cannot open tempfile.xml: $!";

# ESearch for term, place data in search history
my $esearch= Bio::DB::EUtilities->new(-eutil       => 'esearch',
                                      -term        => 'dihydroorotase',
                                      -db          => 'snp',
                                      -usehistory  => 'y');

$esearch->get_response; 
print STDERR "Count: ", $esearch->count,"\n";

# efetch is the default EUtility, so -eutil is not set here
my $efetch = Bio::DB::EUtilities->new(-cookie   => $esearch->next_cookie,
                                      -rettype  => 'flt'); # SNP flatfile

print $XMLDATA $efetch->get_response->content; 

seek ($XMLDATA, 0, 0); # don't forget to rewind...

my $cio = Bio::ClusterIO->new(-format   => 'dbsnp',
                              -fh       => $XMLDATA);

# $snp is a Bio::Variation::SNP object, see perldoc for methods
while (my $snp = $cio->next_cluster) {
    print "ID : ",$snp->id,"\n";
}

close $XMLDATA;

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> vrramnar at student.cs.uwaterloo.ca
> Sent: Thursday, July 20, 2006 6:18 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] SNP reference file download
> 
> 
> Hello All,
> 
> I was wondering if anyone knew how to download an entire SNP reference file
> from NCBI?? Or even downloading the sequence data for a particular SNP.
> 
> I know how to do this via Bio::DB::GenBank, Bio::DB::SwissP, etc.. when
> referring to NM_##### but when I try to access rs###### files I am unsure of
> what Bio::DB to point to, if there is one.
> 
> For example, if I had the accession number: rs4986950 How could I retrieve
> NCBI's entire reference file for this SNP record OR just the SNP sequence
> relating to this accession number.
> 
> Any help on this subject would greatly be appreciated,
> 
> Rohan
> 
> 



