[BioPython] accessing "data quality" Phrap records in Genbank
Emanuel Hey
jodyhey at yahoo.com
Fri Aug 3 01:20:38 UTC 2007
Thanks. urllib does the job quite well.
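
A minimal sketch of that approach, for anyone finding this thread later. It assumes Python 3, where the urllib.urlretrieve function Peter mentions lives at urllib.request.urlretrieve. The helper names build_qual_url and download_qual are made up for illustration. It is only an assumption that the db, list_uids and dopt=qual parameters from the link quoted below are enough on their own, and the viewer.fcgi address may have changed since.

```python
from urllib.parse import urlencode
from urllib.request import urlretrieve

# Base address taken from the example link in the quoted message.
VIEWER_URL = "http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi"


def build_qual_url(uid):
    """Build a quality-score URL for one record (hypothetical helper).

    Assumption: only db, list_uids and dopt=qual are needed; the other
    parameters in the original link are left out.
    """
    params = {"db": "nuccore", "list_uids": str(uid), "dopt": "qual"}
    return VIEWER_URL + "?" + urlencode(params)


def download_qual(uid, filename):
    """Save the quality page for uid to filename and return filename."""
    urlretrieve(build_qual_url(uid), filename)
    return filename
```

Calling download_qual(153792835, "153792835.qual") would then save the page for the example record to a local file.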
--- Peter <biopython at maubp.freeserve.co.uk> wrote:
> Emanuel Hey wrote:
> > for some sequence records, NCBI has a record of the
> > Phrap scores corresponding to the sequence (i.e. one
> > score for each base).
> >
> > These are typically records containing draft sequences
> > from genome projects
> >
> > to see an example, try this link
> >
> > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&qty=1&c_start=1&list_uids=153792835&uids=&dopt=qual&dispmax=5&sendto=&fmt_mask=0&from=begin&to=end&extrafeatpresent=1&ef_CDD=8&ef_MGC=16&ef_HPRD=32&ef_STS=64&ef_tRNA=128&ef_microRNA=256&ef_Exon=512
> >
> > How could I go about downloading these sequence
> > quality scores?
>
> One option for getting the data would be to construct the URL then
> download it using standard python tools, e.g. the urllib.urlretrieve
> function. Alternatively Biopython has some NCBI/Entrez code you might be
> able to use...
>
> The second step is actually parsing the data file into a usable form.
> The "Base Quality" format looks very easy to parse, with a FASTA-like
> header followed by space separated decimal scores. Their XML format
> also looks fairly simple - the core data looks like it's held as a string
> where each two characters represents one score in hex. As far as I
> could see based on the URL you gave, none of the other format options
> actually contain the "data quality" information.
>
> I'm not aware of any code in Biopython to cope with either of these file
> formats.
>
> > I need to filter the data by a certain score
>
> Are you trying to select parts of the associated sequence?
>
> Peter
>
>
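
The parsing step Peter describes could be sketched roughly as follows. This is based only on his description above (a FASTA-like header line starting with ">" followed by space separated decimal scores, and an XML string with two hex characters per score), not on any format specification. The function names are hypothetical and none of this is Biopython code.

```python
def parse_base_quality(lines):
    """Yield (header, scores) pairs from "Base Quality" style text.

    Assumption: each record starts with a '>' header line and is
    followed by lines of whitespace separated decimal scores.
    """
    header = None
    scores = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, scores
            header = line[1:]
            scores = []
        else:
            scores.extend(int(token) for token in line.split())
    if header is not None:
        yield header, scores


def decode_hex_scores(text):
    """Decode a string holding one score per two hex characters."""
    return [int(text[i:i + 2], 16) for i in range(0, len(text), 2)]


def low_quality_positions(scores, threshold):
    """Return 0-based positions whose score is below threshold."""
    return [i for i, score in enumerate(scores) if score < threshold]
```

With a file saved locally, list(parse_base_quality(open(filename))) gives one (header, scores) pair per record, and low_quality_positions(scores, 20) lists the bases to mask or drop when filtering by a score of 20.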