[BioPython] accessing "data quality" Phrap records in Genbank

Fri Aug 3 01:20:38 UTC 2007

Thanks.  urllib does the job quite well.

--- Peter <biopython at maubp.freeserve.co.uk> wrote:

> Emanuel Hey wrote:
> > For some sequence records, NCBI has a record of the Phrap scores
> > corresponding to the sequence (i.e. one score for each base).
> > 
> > These are typically records containing draft sequences from genome
> > projects.
> > 
> > To see an example, try this link:
> >
> > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&qty=1&c_start=1&list_uids=153792835&uids=&dopt=qual&dispmax=5&sendto=&fmt_mask=0&from=begin&to=end&extrafeatpresent=1&ef_CDD=8&ef_MGC=16&ef_HPRD=32&ef_STS=64&ef_tRNA=128&ef_microRNA=256&ef_Exon=512
> > 
> > How could I go about downloading these sequence quality scores?
> 
> One option for getting the data would be to construct the URL and then
> download it using standard Python tools, e.g. the urllib.urlretrieve
> function. Alternatively, Biopython has some NCBI/Entrez code you might
> be able to use...
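
A rough sketch of the "construct the URL, then download it" route described above. It uses the Python 3 module layout, where urlretrieve lives in urllib.request (in 2007 it was urllib.urlretrieve). The base URL and the db, list_uids and dopt=qual parameters are copied from the example link in this thread; treating the remaining parameters as optional is an assumption, and the viewer address may well have changed since. The helper names build_quality_url and download_quality are made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlretrieve

# Base address taken from the example link in this thread; it may no
# longer be served by NCBI.
VIEWER_URL = "http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi"


def build_quality_url(uid, db="nuccore"):
    """Build a viewer URL asking for the quality display of one record.

    Assumption: only db, list_uids and dopt are needed; the original
    link carried many more parameters.
    """
    params = {"db": db, "list_uids": uid, "dopt": "qual"}
    return VIEWER_URL + "?" + urlencode(params)


def download_quality(uid, filename):
    """Fetch the quality page for `uid` and save it to `filename`."""
    urlretrieve(build_quality_url(uid), filename)
    return filename
```

For the record in the example link this would be called as download_quality(153792835, "153792835.qual"), which needs network access and a server that still answers at that address.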
> 
> The second step is actually parsing the data file into a usable form.
> The "Base Quality" format looks very easy to parse, with a FASTA-like
> header followed by space-separated decimal scores. Their XML format
> also looks fairly simple: the core data looks like it's held as a
> string where each two characters represent one score in hex. As far
> as I could see based on the URL you gave, none of the other format
> options actually contain the "data quality" information.
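
A minimal sketch of the parsing step, going only by the description above: a FASTA-like ">" header line followed by space-separated decimal scores, and, for the XML variant, a string in which every two characters encode one score in hex. The function names and the sample record are invented for illustration, and the real files may contain details (extra header fields, XML wrapping) that this does not handle.

```python
import io


def parse_qual(handle):
    """Yield (header, scores) pairs from a FASTA-like quality file.

    Assumes each record starts with a '>' header line and that all
    following lines hold space-separated decimal scores.
    """
    header, scores = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, scores
            header, scores = line[1:], []
        elif header is not None:
            scores.extend(int(token) for token in line.split())
    if header is not None:
        yield header, scores


def decode_hex_scores(text):
    """Decode a string where each two hex characters are one score."""
    text = "".join(text.split())  # drop any embedded whitespace
    if len(text) % 2:
        raise ValueError("hex score string has an odd number of characters")
    return [int(text[i:i + 2], 16) for i in range(0, len(text), 2)]


# Made-up sample data, not taken from a real NCBI record.
SAMPLE = ">example_record draft contig\n10 20 30\n40 50\n"
records = list(parse_qual(io.StringIO(SAMPLE)))
```

With the scores in a plain list, filtering by a threshold is then an ordinary list comprehension over the (position, score) pairs.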
> 
> I'm not aware of any code in Biopython to cope with either of these
> file formats.
> 
> > I need to filter the data by a certain score
> 
> Are you trying to select parts of the associated sequence?
> 
> Peter
> 
> 
