[Bioperl-l] Refseq Version

Smithies, Russell Russell.Smithies at agresearch.co.nz
Sun Feb 7 21:05:17 UTC 2010


AAArrrgg, what is it with Outlook this morning!!!
Formatting kaput again but I'm sure you can work it out from there!

--Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
> Sent: Monday, 8 February 2010 9:59 a.m.
> To: 'shalu sharma'
> Cc: 'bioperl-l at lists.open-bio.org'
> Subject: Re: [Bioperl-l] Refseq Version
>
> I should have known it would break the formatting :-(
>
> Try this:
>
> Release 1:June 30, 2003;Release Size: 4672871949 bases, 263588685 amino
> acids, 1061675 records
> Release 2:October 21, 2003;Release Size: 2124 organisms, 7745398573
> nucleotide bases, 286957682 amino acids, 1097404 records
> Release 3:January 13, 2004;Release Size: 2218 organisms, 7992741222
> nucleotide bases, 294647847 amino acids, 1101244 records
> Release 4:March 24, 2004;Release Size: 2358 organisms, 8175128887
> nucleotide bases, 318253841 amino acids, 1193457 records
> Release 5:May 2, 2004;Release Size: 2395 organisms, 8325515623 nucleotide
> bases, 337229387 amino acids, 1255613 records
> Release 6:July 5, 2004;Release Size: 2467 organisms, 8696371716 nucleotide
> bases, 365446682 amino acids, 1367206 records
> Release 7:September 12, 2004;Release Size: 2558 organisms, 21072808460
> nucleotide bases, 405233619 amino acids, 1579579 records
> Release 8:October 31, 2004;Release Size: 2645 organisms, 26814386658
> nucleotide bases, 430300369 amino acids, 1709723 records
> Release 9:January 9, 2005;Release Size:  2780 organisms, 36786975473
> nucleotide bases, 470534907 amino acids, 1843944 records
> Release 10:March 6, 2005;Release Size:2827 organisms, 36893741150
> nucleotide bases, 482862858 amino acids, 1893478 records
> Release 11:May 8, 2005;Release Size:2928 organisms, 39731702362 nucleotide
> bases, 507980644 amino acids, 2477893 records
> Release 12:July 10, 2005;Release Size:2969 organisms, 43043256058
> nucleotide bases, 608493108 amino acids, 2869675 records
> Release 13:September 11, 2005;Release Size:3060 organisms, 44727484853
> nucleotide bases, 686768902 amino acids, 3400773 records
> Release 14:November 20, 2005;Release Size:3198 organisms, 47364955367
> nucleotide bases, 763761075 amino acids, 3272776 records
> Release 15:January 1, 2006;Release Size:3244 organisms, 52645441913
> nucleotide bases, 810009733 amino acids, 3436263 records
> Release 16:March 11, 2006;Release Size:3397 organisms, 56175443059
> nucleotide bases, 887509001 amino acids, 3715260 records
> Release 17:May 1, 2006;Release Size:3497 organisms, 62130037371 nucleotide
> bases, 927587669 amino acids, 3999859 records
> Release 18:July 11, 2006;Release Size:3695 organisms, 70474041999
> nucleotide bases, 974374765 amino acids, 4186692 records
> Release 19:September 10, 2006;Release Size: 3774 organisms, 70694879544
> nucleotide bases, 1012985077 amino acids, 4311543 records
> Release 20:November 5, 2006;Release Size:3919 organisms, 72679681505
> nucleotide bases, 1061797276 amino acids, 4567569 records
> Release 21:January 6, 2007;Release Size:4079 organisms, 73864990566
> nucleotide bases, 1144795927 amino acids, 4742335 records
> Release 22:March 5, 2007;Release Size:4187 organisms, 82441128546
> nucleotide bases, 1215085694 amino acids, 5207865 records
> Release 23:May 8, 2007;Release Size:4300 organisms, 83148327110 nucleotide
> bases, 1291050995 amino acids, 5503385 records
> Release 24:July 10, 2007;Release Size:4511 organisms, 89856995521
> nucleotide bases, 1365916222 amino acids, 6073814 records
> Release 25:September 11, 2007;Release Size:4646 organisms, 91265840843
> nucleotide bases, 1470475398 amino acids, 6515132 records
> Release 26:November 4, 2007;Release Size:4737 organisms, 99105705485
> nucleotide bases, 1495032507 amino acids, 6698250 records
> Release 27:January 6, 2008;Release Size:4926 organisms, 101059552113
> nucleotide bases, 1556356987 amino acids, 7025715 records
> Release 28:March 9, 2008;Release Size: 5059 organisms, 102051350525
> nucleotide bases, 1770627427 amino acids, 7914560 records
> Release 29:May 4, 2008;Release Size:5168 organisms, 104671101150
> nucleotide bases, 1870214220 amino acids, 8376141 records
> Release 30:July 7, 2008;Release Size:5395 organisms, 105074486709
> nucleotide bases, 1913447691 amino acids, 8572852 records
> Release 31:August 30, 2008;Release Size: 5513 organisms, 109214348591
> nucleotide bases, 2026768719 amino acids, 9145702 records
> Release 32:November 10, 2008;Release Size: 5726 organisms, 111122203221
> nucleotide bases, 2089596746 amino acids, 9501764 records
> Release 33:January 16, 2009;Release Size:7773 organisms, 116001583818
> nucleotide bases, 2204073443 amino acids, 10325282 records
> Release 34:March 6, 2009;Release Size: 8054 organisms, 111792574830
> nucleotide bases, 2299682138 amino acids, 10021870 records
> Release 35:May 4, 2009;Release Size: 8393 organisms, 113210655336
> nucleotide bases, 2565199170 amino acids, 10993891 records
> Release 36:July 2, 2009;Release Size: 8665 organisms, 117013741530
> nucleotide bases, 2756884219 amino acids, 12141825 records
> Release 37:September 3, 2009;Release Size: 9005 organisms, 119151229820
> nucleotide bases, 2965450333 amino acids, 12941750 records
> Release 38:November 7, 2009;Release Size: 9166 organisms, 119196622435
> nucleotide bases, 3115246540 amino acids, 13436447 records
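
For anyone who wants to tabulate the "Release N:..." lines above, here is a minimal Python sketch. It assumes the lines keep exactly the layout shown (quote markers, mail-client wrapping and all). The names Release, PATTERN and parse_releases are made up for illustration and are not part of BioPerl or any NCBI tool.

```python
import re
from typing import List, NamedTuple, Optional


class Release(NamedTuple):
    number: int
    date: str
    organisms: Optional[int]  # Release 1 lists no organism count
    bases: int
    amino_acids: int
    records: int


# One record: "Release N:<date>;Release Size: [X organisms,] Y [nucleotide] bases,
# Z amino acids, W records". The organism count is optional.
PATTERN = re.compile(
    r"Release (\d+):\s*(.+?)\s*;\s*Release Size:\s*"
    r"(?:(\d+)\s+organisms,\s*)?"
    r"(\d+)\s+(?:nucleotide\s+)?bases,\s*"
    r"(\d+)\s+amino\s+acids,\s*"
    r"(\d+)\s+records"
)


def parse_releases(text: str) -> List[Release]:
    # Strip leading ">" quote markers, then rejoin the wrapped lines into one string.
    flat = " ".join(line.lstrip("> ").strip() for line in text.splitlines())
    flat = re.sub(r"\s+", " ", flat)
    releases = []
    for m in PATTERN.finditer(flat):
        number, date, organisms, bases, amino, records = m.groups()
        releases.append(
            Release(
                number=int(number),
                date=re.sub(r"\s+,", ",", date),  # tidy "May 2 , 2004"
                organisms=int(organisms) if organisms else None,
                bases=int(bases),
                amino_acids=int(amino),
                records=int(records),
            )
        )
    return releases


SAMPLE = """\
> Release 1:June 30, 2003;Release Size: 4672871949 bases, 263588685 amino
> acids, 1061675 records
> Release 2:October 21, 2003;Release Size: 2124 organisms, 7745398573
> nucleotide bases, 286957682 amino acids, 1097404 records
"""

for rel in parse_releases(SAMPLE):
    print(rel)
```

Pasting the full quoted list into SAMPLE should give one Release tuple per release, provided the layout assumption holds.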
>
>
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org
> > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
> > Sent: Monday, 8 February 2010 9:47 a.m.
> > To: 'shalu sharma'
> > Cc: 'bioperl-l at lists.open-bio.org'
> > Subject: Re: [Bioperl-l] Refseq Version
> >
> > Release 39 was Jan 30 and, according to the README, releases only come out
> > in odd months (January, March, May, July, September, November).
> > The stats file is here:
> > ftp://ftp.ncbi.nih.gov/refseq/release/release-statistics/RefSeq-release39.01232010.stats.txt
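
The stats file name itself appears to carry the release number. A minimal Python sketch that pulls it out, assuming the name always follows the RefSeq-release<N>.<8 digits>.stats.txt pattern seen above. Reading the eight digits as MMDDYYYY is a guess, not something the README is quoted as saying, and parse_stats_name is a hypothetical helper.

```python
import re
from datetime import date, datetime
from typing import Tuple


def parse_stats_name(url_or_name: str) -> Tuple[int, date]:
    """Return (release number, date stamp) from a RefSeq stats file name or URL."""
    m = re.search(r"RefSeq-release(\d+)\.(\d{8})\.stats\.txt$", url_or_name.strip())
    if m is None:
        raise ValueError(f"not a recognised stats file name: {url_or_name!r}")
    # Assumption: the 8-digit stamp is MMDDYYYY.
    stamp = datetime.strptime(m.group(2), "%m%d%Y").date()
    return int(m.group(1)), stamp


number, stamp = parse_stats_name(
    "ftp://ftp.ncbi.nih.gov/refseq/release/release-statistics/"
    "RefSeq-release39.01232010.stats.txt"
)
print(number, stamp.isoformat())
```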
> >
> > The numbers of sequences in the fasta release and the pre-built blast
> > databases seem to differ, but I guess only NCBI can explain that.
> > I can't see any way of extracting the release number from the pre-built
> > blast databases (apart from the build date), but it might be worth asking
> > NCBI if they'd include the information in future releases.
> >
> >
> > FYI, here's the old release stats.
> > (I wget'ed and grep'ed all the stats files)
> >
> > Release  Date    Year  Organisms  Nucleotide Bases    Amino Acids     Records
> >       1  Jun-30  2003                4,672,871,949    263,588,685   1,061,675
> >       2  Oct-21  2003      2,124     7,745,398,573    286,957,682   1,097,404
> >       3  Jan-13  2004      2,218     7,992,741,222    294,647,847   1,101,244
> >       4  Mar-24  2004      2,358     8,175,128,887    318,253,841   1,193,457
> >       5  May-02  2004      2,395     8,325,515,623    337,229,387   1,255,613
> >       6  Jul-05  2004      2,467     8,696,371,716    365,446,682   1,367,206
> >       7  Sep-12  2004      2,558    21,072,808,460    405,233,619   1,579,579
> >       8  Oct-31  2004      2,645    26,814,386,658    430,300,369   1,709,723
> >       9  Jan-09  2005      2,780    36,786,975,473    470,534,907   1,843,944
> >      10  Mar-06  2005      2,827    36,893,741,150    482,862,858   1,893,478
> >      11  May-08  2005      2,928    39,731,702,362    507,980,644   2,477,893
> >      12  Jul-10  2005      2,969    43,043,256,058    608,493,108   2,869,675
> >      13  Sep-11  2005      3,060    44,727,484,853    686,768,902   3,400,773
> >      14  Nov-20  2005      3,198    47,364,955,367    763,761,075   3,272,776
> >      15  Jan-01  2006      3,244    52,645,441,913    810,009,733   3,436,263
> >      16  Mar-11  2006      3,397    56,175,443,059    887,509,001   3,715,260
> >      17  May-01  2006      3,497    62,130,037,371    927,587,669   3,999,859
> >      18  Jul-11  2006      3,695    70,474,041,999    974,374,765   4,186,692
> >      19  Sep-10  2006      3,774    70,694,879,544  1,012,985,077   4,311,543
> >      20  Nov-05  2006      3,919    72,679,681,505  1,061,797,276   4,567,569
> >      21  Jan-06  2007      4,079    73,864,990,566  1,144,795,927   4,742,335
> >      22  Mar-05  2007      4,187    82,441,128,546  1,215,085,694   5,207,865
> >      23  May-08  2007      4,300    83,148,327,110  1,291,050,995   5,503,385
> >      24  Jul-10  2007      4,511    89,856,995,521  1,365,916,222   6,073,814
> >      25  Sep-11  2007      4,646    91,265,840,843  1,470,475,398   6,515,132
> >      26  Nov-04  2007      4,737    99,105,705,485  1,495,032,507   6,698,250
> >      27  Jan-06  2008      4,926   101,059,552,113  1,556,356,987   7,025,715
> >      28  Mar-09  2008      5,059   102,051,350,525  1,770,627,427   7,914,560
> >      29  May-04  2008      5,168   104,671,101,150  1,870,214,220   8,376,141
> >      30  Jul-07  2008      5,395   105,074,486,709  1,913,447,691   8,572,852
> >      31  Aug-30  2008      5,513   109,214,348,591  2,026,768,719   9,145,702
> >      32  Nov-10  2008      5,726   111,122,203,221  2,089,596,746   9,501,764
> >      33  Jan-16  2009      7,773   116,001,583,818  2,204,073,443  10,325,282
> >      34  Mar-06  2009      8,054   111,792,574,830  2,299,682,138  10,021,870
> >      35  May-04  2009      8,393   113,210,655,336  2,565,199,170  10,993,891
> >      36  Jul-02  2009      8,665   117,013,741,530  2,756,884,219  12,141,825
> >      37  Sep-03  2009      9,005   119,151,229,820  2,965,450,333  12,941,750
> >      38  Nov-07  2009      9,166   119,196,622,435  3,115,246,540  13,436,447
> >
> >
> >
> > --Russell
> >
> >
> > From: shalu sharma [mailto:sharmashalu.bio at gmail.com]
> > Sent: Saturday, 6 February 2010 3:56 a.m.
> > To: Smithies, Russell
> > Cc: bioperl-l at lists.open-bio.org
> > Subject: Re: [Bioperl-l] Refseq Version
> >
> > Hi Russell,
> >                Thanks for your response.
> > I am getting the number of sequences in the database but not the release
> > number (like 38, 39).
> > This is what I did:
> >
> > $ fastacmd -I -d /db/ncbiblast/refseq/refseq_protein
> > Database: NCBI Protein Reference Sequences
> >            7,585,993 sequences; 2,644,770,521 total letters
> >
> > File names:
> > /db/ncbiblast/refseq/refseq_protein.00
> >    Date: Jan 30, 2010  8:34 PM    Version: 4    Longest sequence: 36,805 res
> > /db/ncbiblast/refseq/refseq_protein.01
> >    Date: Jan 30, 2010  8:34 PM    Version: 4    Longest sequence: 33,403 res
> > /db/ncbiblast/refseq/refseq_protein.02
> >    Date: Jan 30, 2010  8:34 PM    Version: 4    Longest sequence: 15,830 res
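
Since the fastacmd output carries a build date but no release number, one rough approach is to compare that date with the known release dates and take the latest release on or before it. A minimal Python sketch, assuming the "Date: Mon DD, YYYY" layout shown above and English month abbreviations. RELEASE_DATES holds only the two dates mentioned in this thread, and build_dates / guess_release are hypothetical helper names. As noted elsewhere in the thread, the blast databases and the fasta release do not match exactly, so treat the result as a guess, not a confirmation.

```python
import re
from datetime import date, datetime
from typing import Dict, Optional, Set

# Release dates quoted in this thread (38 and 39 only, for brevity).
RELEASE_DATES: Dict[int, date] = {
    38: date(2009, 11, 7),
    39: date(2010, 1, 30),
}


def build_dates(fastacmd_output: str) -> Set[date]:
    """Collect the distinct 'Date:' values printed for the database volumes."""
    found = set()
    for m in re.finditer(r"Date:\s*([A-Z][a-z]{2} \d{1,2}, \d{4})", fastacmd_output):
        found.add(datetime.strptime(m.group(1), "%b %d, %Y").date())
    return found


def guess_release(build: date, releases: Dict[int, date]) -> Optional[int]:
    """Latest release dated on or before the build date, or None if there is none."""
    candidates = [n for n, d in releases.items() if d <= build]
    return max(candidates, key=lambda n: releases[n], default=None)


OUTPUT = """\
/db/ncbiblast/refseq/refseq_protein.00
   Date: Jan 30, 2010  8:34 PM    Version: 4    Longest sequence: 36,805 res
/db/ncbiblast/refseq/refseq_protein.01
   Date: Jan 30, 2010  8:34 PM    Version: 4    Longest sequence: 33,403 res
"""

for built in sorted(build_dates(OUTPUT)):
    print(built.isoformat(), "->", guess_release(built, RELEASE_DATES))
```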
> >
> > I am still confused about how I can get the release number. I know refseq 39
> > was released on Jan 30, 2010, but I don't know how to confirm this. I also
> > tried looking at the refseq release file but was not able to find anything.
> >
> > I would really appreciate it if anyone can help me out with this.
> >
> > Thanks
> > Shalu
> >
> > On Thu, Feb 4, 2010 at 6:39 PM, Smithies, Russell
> > <Russell.Smithies at agresearch.co.nz> wrote:
> > If you have access to the blast database, use fastacmd -I -d databasename
> > Otherwise, it's usually at the bottom of your blast result.
> >
> > --Russell
> >
> > > -----Original Message-----
> > > From: bioperl-l-bounces at lists.open-bio.org
> > > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of shalu sharma
> > > Sent: Friday, 5 February 2010 11:02 a.m.
> > > To: bioperl-l at lists.open-bio.org
> > > Subject: [Bioperl-l] Refseq Version
> > >
> > > Hi All,
> > >       This is not a bioperl query.
> > > Is there any way to check the refseq version (release)? Actually, I am using
> > > a server to blast my sequences (blastall) against refseq. Is there any way I
> > > can get the version information for the refseq database (from the blast
> > > file or directly from the database)?
> > >
> > > Thanks
> > > Shalu
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>



