[Bioperl-l] get CDS start site for entry in NCBI

Matthew McCormack mccormack at molbio.mgh.harvard.edu
Wed Apr 17 19:08:22 EDT 2013


I am not much of a Perl coder and I have a few questions.

      First, I would like to write a script that queries NCBI GenBank
and returns the base position at which the CDS region starts, e.g. 235,
for a given accession number. I have looked at the HOWTOs and the
documentation for Bio::SeqIO and Bio::DB::GenBank, and I can cut and
paste the examples and they work, but I cannot figure out how to get
what I want: the CDS start site. I have difficulty finding out what all
the methods and their options are for the seqio object and seq_object.
Most of the examples seem to read their information from a file rather
than from a website.
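
To make the goal concrete, here is a minimal sketch (Python, for
illustration only) of pulling the CDS coordinates out of the feature table
of a GenBank flat-file record once its text has been retrieved. The sample
record, its coordinates, and the function name cds_bounds are hypothetical.
The sketch assumes the CDS location fits on one line and contains no remote
accession references.

```python
import re

# Hypothetical excerpt of a GenBank feature table; coordinates are made up.
SAMPLE_RECORD = """\
FEATURES             Location/Qualifiers
     source          1..2000
     gene            1..2000
     CDS             join(235..400,520..1800)
                     /codon_start=1
"""


def cds_bounds(record_text):
    """Return (start, end) of the first CDS feature, or None if there is none.

    Assumes a single-line location string such as '235..1800' or
    'join(235..400,520..1800)'.
    """
    for line in record_text.splitlines():
        # A feature line is indented, then the feature key, then the location.
        match = re.match(r"\s+CDS\s+(\S+)", line)
        if match:
            coords = [int(n) for n in re.findall(r"\d+", match.group(1))]
            if coords:
                return min(coords), max(coords)
    return None


print(cds_bounds(SAMPLE_RECORD))
```

The smallest coordinate is taken as the CDS start, which is only the
biological start for features on the forward strand.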

    Actually, what I have to start with is a TAIR locus number such as
AT4g08500, but I cannot search on this at NCBI and come up with a
unique entry. I may need a conversion table from TAIR locus numbers to
accession numbers.
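
A conversion table of this kind could be as simple as a two-column,
tab-delimited file loaded into a dictionary. The sketch below (Python, for
illustration only) assumes that layout; the accession values are
placeholders, not real GenBank identifiers, and the function names are
hypothetical.

```python
import csv
import io

# Hypothetical table: locus <TAB> accession. The accessions are placeholders
# and the second row is included only as an example.
TABLE_TEXT = "AT4G08500\tACC_PLACEHOLDER_1\nAT1G01010\tACC_PLACEHOLDER_2\n"


def load_locus_table(handle):
    """Read locus/accession pairs into a dict keyed by upper-cased locus."""
    table = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 2:
            table[row[0].strip().upper()] = row[1].strip()
    return table


def accession_for(locus, table):
    """Look up a locus case-insensitively; return None if it is not listed."""
    return table.get(locus.strip().upper())


table = load_locus_table(io.StringIO(TABLE_TEXT))
print(accession_for("AT4g08500", table))
```

Upper-casing the key means that AT4g08500 and AT4G08500 resolve to the
same row.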

   Also, I was looking for a bit of advice. I am getting data from
another web site. I have a script using the WWW::Mechanize module that
takes a link, goes to that web page, and then works down a list of
links (over 100), collecting information from each one. Part of that
information is the base position of a binding site, and I want to know
whether that binding site is in the CDS. Positions are counted from the
start of the gene, so if the binding site is at 235, I want to know
whether position 235 falls within the CDS. The website does not provide
this, which is why I want to go to NCBI and get the start of the CDS.
The 'gene' record at NCBI has the same length as the sequence on the
first web page, but it also gives the beginning of the CDS, say 299, so
with this information I can tell whether the binding site is in the
CDS. Do you think the best way to do this is to extract the info from a
link on the first web page, then go to NCBI and extract the CDS, then
go back to the original web page for the next link, and so on, for a
couple of hundred links? Or is there a better way? I am concerned about
a script that keeps going back to NCBI.
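
One possible alternative to alternating between the two sites is to collect
all binding sites first and then look up each distinct accession only once.
The sketch below (Python, for illustration only) shows that idea together
with the containment check. The fetch function is a stand-in that returns
made-up coordinates instead of contacting NCBI, and all names (in_cds,
CdsCache, fake_fetch, ACC_A) are hypothetical.

```python
def in_cds(site, cds_start, cds_end):
    """True if a 1-based position lies within the CDS interval (inclusive)."""
    return cds_start <= site <= cds_end


class CdsCache:
    """Remember CDS bounds per accession so each one is fetched only once."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable: accession -> (cds_start, cds_end)
        self._cache = {}
        self.calls = 0        # number of real lookups performed

    def bounds(self, accession):
        if accession not in self._cache:
            self.calls += 1
            self._cache[accession] = self._fetch(accession)
        return self._cache[accession]


def fake_fetch(accession):
    # Stand-in for a remote lookup; the coordinates are made up.
    return {"ACC_A": (299, 1500)}[accession]


cache = CdsCache(fake_fetch)
# (accession, binding-site position) pairs as they might come from the scrape.
sites = [("ACC_A", 235), ("ACC_A", 640)]
for accession, site in sites:
    start, end = cache.bounds(accession)
    print(accession, site, in_cds(site, start, end))
print("lookups made:", cache.calls)
```

The check treats the CDS as a single interval from its first to its last
base, so a site that falls in an intron of a spliced CDS would still count
as inside.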

Matthew


