[BioPython] biopython tutorial

Tue Aug 5 22:55:30 UTC 2008

>>> 4.
>>> the 814 hits are now 816 throughout
>>
>> That number is always going to increase - maybe we can reword things
>> slightly to make it clear that may not be exactly what the user will
>> see.
>
> Yeah, I figured it was this, no worries.

I might do that tomorrow (along with the links below...)

> If you want to be OCD like I apparently am you could add a note to this effect.

Having a perfectionist looking after documentation or a website can be
a good thing.

>>> 5.
>>> add links for prosite & swissprot db downloads
>>
>> Where would you add these, and which URLs did you have in mind?
>
>
> I was thinking in this section:
>
> ========
> To parse a file that contains more than one Swiss-Prot record, we use the
> parse function instead. This function allows us to iterate over the records
> in the file. For example, let's parse the full Swiss-Prot database and
> collect all the descriptions. The full Swiss-Prot database, downloaded from
> ExPASy on 4 December 2007, contains 290484 Swiss-Prot records in a single
> gzipped-file uniprot_sprot.dat.gz.
> ========
>
> ...it could link to:
> ftp://ca.expasy.org/databases/uniprot/current_release/knowledgebase/complete
>
> ...and in this section:
>
> ========
> In general, a Prosite file can contain more than one Prosite record. For
> example, the full set of Prosite records, which can be downloaded as a
> single file (prosite.dat) from ExPASy, contains 2073 records (version
> 20.24, released on 4 December 2007). To parse such a file, we again make use
> of an iterator:
> ========
>
> ...it could link to:
> ftp://ftp.expasy.org/databases/prosite/
>
> I found these without too much trouble on my own of course but might be
> handy for newbies.

That looks sensible...

> Also, the tutorial might give an estimate of how long it will take to parse
> the full Swiss-Prot DB; I waited a few minutes & then decided to move on.
> Maybe a smaller file or subset with just e.g. 100 records would be
> appropriate for the tutorial?

It will depend very much on the computer (mostly the hard drive). As I
recall, somewhere between 2 and 10 minutes sounds about right.
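
For anyone who wants a smaller file to experiment with, here is a rough
sketch of how a 100 record subset could be cut from the full download
using only the standard library. It relies on each Swiss-Prot flat-file
record ending with a line containing just "//". The function names and
the output file name are made up for illustration; this is not part of
Biopython or the tutorial.

```python
import gzip
from itertools import islice


def iter_raw_records(lines):
    """Yield each flat-file record as one string, ending at its '//' line.

    `lines` is any iterable of text lines (for example an open file).
    A trailing partial record with no '//' terminator is dropped.
    """
    current = []
    for line in lines:
        current.append(line)
        if line.rstrip() == "//":
            yield "".join(current)
            current = []


def write_subset(source_path, target_path, count=100):
    """Copy the first `count` records of a gzipped flat file to a text file."""
    with gzip.open(source_path, "rt") as source, open(target_path, "w") as target:
        for record in islice(iter_raw_records(source), count):
            target.write(record)


# Hypothetical usage, assuming the download sits in the current directory:
# write_subset("uniprot_sprot.dat.gz", "sprot_first100.dat", count=100)
```

Since islice stops after the requested number of records, only the start
of the gzipped file is read, so this should finish quickly.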

>>> 7.
>>> ============
>>> As the name suggests, this is a really simple consensus calculator, and
>>> will ...
>>
>> The default is 0.7 for any sequence type (DNA, protein, etc).  Do you
>> mean which way round is the percentage counted (the letter has to be
>> above 70% I think)?
>
> I meant that this sentence in the above para: "if the most common value is
> higher than some threshold value (the default is .3)" should probably just
> say 0.7 I think.

I see it now, fixed in CVS.
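
To make the threshold idea concrete, here is a minimal stand-alone sketch
of a consensus calculation with a 0.7 default. It is not Biopython's own
code: the function name is invented, and both the "greater than or equal"
comparison and counting every character (gaps included) are assumptions
made for illustration.

```python
from collections import Counter


def simple_consensus(sequences, threshold=0.7, ambiguous="X"):
    """Return a consensus string for equal-length aligned sequences.

    For each column the most common letter is kept if its fraction of
    the column reaches `threshold`; otherwise `ambiguous` is used.
    """
    consensus = []
    for column in zip(*sequences):
        letter, count = Counter(column).most_common(1)[0]
        if count / len(column) >= threshold:
            consensus.append(letter)
        else:
            consensus.append(ambiguous)
    return "".join(consensus)


alignment = ["ACGT", "ACGA", "ACTA", "ACGC"]
print(simple_consensus(alignment))                 # ACGX
print(simple_consensus(alignment, threshold=0.3))  # ACGA
```

In the last column the most common letter (A) covers only half the
sequences, so it passes a 0.3 threshold but not 0.7.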

Thanks!

Peter