[Bioperl-l] Refseq Version
    Smithies, Russell 
    Russell.Smithies at agresearch.co.nz
       
    Sun Feb  7 20:47:00 UTC 2010
    
    
  
Release 39 came out on Jan 30, and according to the README, releases only come out in odd-numbered months (January, March, May, July, September, November).
The stats file is here: ftp://ftp.ncbi.nih.gov/refseq/release/release-statistics/RefSeq-release39.01232010.stats.txt
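The stats file name itself appears to carry the release number and a date stamp, so a small script could pull both out. This is only a sketch: the pattern is inferred from the single file name above, the MMDDYYYY reading of the date stamp is an assumption, and parse_stats_name is a made-up helper name.

```python
import re
from datetime import date

# Assumed naming pattern, inferred from the one file name quoted in this post:
#   RefSeq-release<NUMBER>.<MMDDYYYY>.stats.txt
STATS_NAME = re.compile(r"RefSeq-release(\d+)\.(\d{2})(\d{2})(\d{4})\.stats\.txt$")


def parse_stats_name(url_or_name):
    """Return (release_number, date) parsed from a stats file name, or None."""
    match = STATS_NAME.search(url_or_name)
    if match is None:
        return None
    release, month, day, year = (int(group) for group in match.groups())
    return release, date(year, month, day)


url = ("ftp://ftp.ncbi.nih.gov/refseq/release/release-statistics/"
       "RefSeq-release39.01232010.stats.txt")
print(parse_stats_name(url))
```

If the MMDDYYYY reading is right, the stamp on the release 39 file is Jan 23, 2010, a week earlier than the Jan 30 release date, so the stamp may not be the release date itself.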
The sequence counts in the FASTA release and the prebuilt BLAST databases seem to differ, but I guess only NCBI can explain that.
I can't see any way of extracting the release number from the prebuilt BLAST databases (apart from going by the build date), but it might be worth asking NCBI whether they'd include that information in future releases.
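Going by the build date could be scripted roughly as follows. This is a heuristic sketch, not a confirmed method: it reads the "Date:" lines from the fastacmd -I output (format as quoted further down in this thread) and picks the latest release dated on or before the build date. The RELEASE_DATES table only holds the dates mentioned in this thread, and the function names are made up for illustration.

```python
import re
from datetime import date, datetime

# Release dates as given in this thread (releases 37-39 only); extend as needed.
RELEASE_DATES = {
    37: date(2009, 9, 3),
    38: date(2009, 11, 7),
    39: date(2010, 1, 30),
}

# Matches e.g. "Date: Jan 30, 2010  8:34 PM" and captures "Jan 30, 2010".
DATE_LINE = re.compile(r"Date:\s+([A-Z][a-z]{2} \d{1,2}, \d{4})")


def build_dates(fastacmd_info):
    """Return the distinct volume build dates found in `fastacmd -I` text."""
    found = DATE_LINE.findall(fastacmd_info)
    # %b parses English month abbreviations in the default C/English locale.
    return sorted({datetime.strptime(d, "%b %d, %Y").date() for d in found})


def guess_release(build_date, release_dates=RELEASE_DATES):
    """Return the latest release dated on or before build_date, else None."""
    candidates = [r for r, d in release_dates.items() if d <= build_date]
    if not candidates:
        return None
    return max(candidates, key=lambda r: release_dates[r])


sample = """/db/ncbiblast/refseq/refseq_protein.00
   Date: Jan 30, 2010  8:34 PM    Version: 4    Longest sequence: 36,805 res
/db/ncbiblast/refseq/refseq_protein.01
   Date: Jan 30, 2010  8:34 PM    Version: 4    Longest sequence: 33,403 res
"""
for built in build_dates(sample):
    print(built, "-> release", guess_release(built))
```

A build date on or after a release date doesn't prove the database was built from that release, so treat the result as a guess to check with NCBI.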
FYI, here are the stats for the older releases.
(I wget'ed and grep'ed all the stats files)
Release  Date    Year  Organisms  Nucleotide Bases    Amino Acids     Records
      1  Jun-30  2003                4,672,871,949    263,588,685   1,061,675
      2  Oct-21  2003      2,124     7,745,398,573    286,957,682   1,097,404
      3  Jan-13  2004      2,218     7,992,741,222    294,647,847   1,101,244
      4  Mar-24  2004      2,358     8,175,128,887    318,253,841   1,193,457
      5  May-02  2004      2,395     8,325,515,623    337,229,387   1,255,613
      6  Jul-05  2004      2,467     8,696,371,716    365,446,682   1,367,206
      7  Sep-12  2004      2,558    21,072,808,460    405,233,619   1,579,579
      8  Oct-31  2004      2,645    26,814,386,658    430,300,369   1,709,723
      9  Jan-09  2005      2,780    36,786,975,473    470,534,907   1,843,944
     10  Mar-06  2005      2,827    36,893,741,150    482,862,858   1,893,478
     11  May-08  2005      2,928    39,731,702,362    507,980,644   2,477,893
     12  Jul-10  2005      2,969    43,043,256,058    608,493,108   2,869,675
     13  Sep-11  2005      3,060    44,727,484,853    686,768,902   3,400,773
     14  Nov-20  2005      3,198    47,364,955,367    763,761,075   3,272,776
     15  Jan-01  2006      3,244    52,645,441,913    810,009,733   3,436,263
     16  Mar-11  2006      3,397    56,175,443,059    887,509,001   3,715,260
     17  May-01  2006      3,497    62,130,037,371    927,587,669   3,999,859
     18  Jul-11  2006      3,695    70,474,041,999    974,374,765   4,186,692
     19  Sep-10  2006      3,774    70,694,879,544  1,012,985,077   4,311,543
     20  Nov-05  2006      3,919    72,679,681,505  1,061,797,276   4,567,569
     21  Jan-06  2007      4,079    73,864,990,566  1,144,795,927   4,742,335
     22  Mar-05  2007      4,187    82,441,128,546  1,215,085,694   5,207,865
     23  May-08  2007      4,300    83,148,327,110  1,291,050,995   5,503,385
     24  Jul-10  2007      4,511    89,856,995,521  1,365,916,222   6,073,814
     25  Sep-11  2007      4,646    91,265,840,843  1,470,475,398   6,515,132
     26  Nov-04  2007      4,737    99,105,705,485  1,495,032,507   6,698,250
     27  Jan-06  2008      4,926   101,059,552,113  1,556,356,987   7,025,715
     28  Mar-09  2008      5,059   102,051,350,525  1,770,627,427   7,914,560
     29  May-04  2008      5,168   104,671,101,150  1,870,214,220   8,376,141
     30  Jul-07  2008      5,395   105,074,486,709  1,913,447,691   8,572,852
     31  Aug-30  2008      5,513   109,214,348,591  2,026,768,719   9,145,702
     32  Nov-10  2008      5,726   111,122,203,221  2,089,596,746   9,501,764
     33  Jan-16  2009      7,773   116,001,583,818  2,204,073,443  10,325,282
     34  Mar-06  2009      8,054   111,792,574,830  2,299,682,138  10,021,870
     35  May-04  2009      8,393   113,210,655,336  2,565,199,170  10,993,891
     36  Jul-02  2009      8,665   117,013,741,530  2,756,884,219  12,141,825
     37  Sep-03  2009      9,005   119,151,229,820  2,965,450,333  12,941,750
     38  Nov-07  2009      9,166   119,196,622,435  3,115,246,540  13,436,447
--Russell
From: shalu sharma [mailto:sharmashalu.bio at gmail.com]
Sent: Saturday, 6 February 2010 3:56 a.m.
To: Smithies, Russell
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Refseq Version
Hi Russell,
               Thanks for your response.
I am getting the number of sequences in the database but not the release number (like 38, 39).
This is what I did:
$ fastacmd -I -d /db/ncbiblast/refseq/refseq_protein
Database: NCBI Protein Reference Sequences
           7,585,993 sequences; 2,644,770,521 total letters
File names:
/db/ncbiblast/refseq/refseq_protein.00
   Date: Jan 30, 2010  8:34 PM    Version: 4    Longest sequence: 36,805 res
/db/ncbiblast/refseq/refseq_protein.01
   Date: Jan 30, 2010  8:34 PM    Version: 4    Longest sequence: 33,403 res
/db/ncbiblast/refseq/refseq_protein.02
   Date: Jan 30, 2010  8:34 PM    Version: 4    Longest sequence: 15,830 res
I am still confused about how I can get the release number. I know RefSeq 39 was released on Jan 30, 2010, but I don't know how to confirm this. I also tried looking at the RefSeq release files but was not able to find anything.
I would really appreciate it if anyone could help me out with this.
Thanks
Shalu
On Thu, Feb 4, 2010 at 6:39 PM, Smithies, Russell <Russell.Smithies at agresearch.co.nz> wrote:
If you have access to the BLAST database, use fastacmd -I -d databasename
Otherwise, the database details are usually at the bottom of your BLAST result.
--Russell
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of shalu sharma
> Sent: Friday, 5 February 2010 11:02 a.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Refseq Version
>
> Hi All,
>       This is not a bioperl query.
> Is there any way to check the RefSeq version (release)? I am using a
> server to BLAST my sequences (blastall) against RefSeq. Is there any way I
> can get the version information for the RefSeq database (from the BLAST
> file or directly from the database)?
>
> Thanks
> Shalu