[Bioperl-l] Refseq Version
Smithies, Russell
Russell.Smithies at agresearch.co.nz
Sun Feb 7 20:47:00 UTC 2010
Release 39 came out on Jan 30, and according to the README, releases only come out in odd-numbered months (January, March, May, July, September, November).
The stats file is here: ftp://ftp.ncbi.nih.gov/refseq/release/release-statistics/RefSeq-release39.01232010.stats.txt
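If you are scripting this, the release number is also encoded in the stats file name itself. Below is a minimal Python sketch that pulls it out. The pattern is an assumption inferred from this single file name: it treats the digits after "release" as the release number and the next eight digits as MMDDYYYY. Note that this gives Jan 23, 2010 for release 39, which differs from the Jan 30 release date above, so read it as the date embedded in the file name rather than the release date.

```python
import re
from datetime import date

# Assumed pattern, inferred from RefSeq-release39.01232010.stats.txt only:
# release number, then MMDDYYYY. Other releases may not follow it exactly.
STATS_NAME = re.compile(r"RefSeq-release(\d+)\.(\d{2})(\d{2})(\d{4})\.stats\.txt$")

def parse_stats_filename(name):
    """Return (release_number, date_in_name) parsed from a RefSeq stats file name or URL."""
    m = STATS_NAME.search(name)
    if m is None:
        raise ValueError(f"unrecognised stats file name: {name!r}")
    release, month, day, year = (int(g) for g in m.groups())
    return release, date(year, month, day)

url = ("ftp://ftp.ncbi.nih.gov/refseq/release/release-statistics/"
       "RefSeq-release39.01232010.stats.txt")
print(parse_stats_filename(url))
```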
The numbers of sequences in the FASTA release and in the pre-built BLAST databases seem to differ, but I guess only NCBI can explain that.
I can't see any way of extracting the release number from the pre-built BLAST databases (apart from the build date), but it might be worth asking NCBI whether they'd include that information in future releases.
FYI, here are the stats for the earlier releases.
(I wget'ed and grep'ed all the stats files.)
Release  Date    Year  Organisms  Nucleotide Bases  Amino Acids    Records
1        Jun-30  2003  -          4,672,871,949     263,588,685    1,061,675
2        Oct-21  2003  2,124      7,745,398,573     286,957,682    1,097,404
3        Jan-13  2004  2,218      7,992,741,222     294,647,847    1,101,244
4        Mar-24  2004  2,358      8,175,128,887     318,253,841    1,193,457
5        May-02  2004  2,395      8,325,515,623     337,229,387    1,255,613
6        Jul-05  2004  2,467      8,696,371,716     365,446,682    1,367,206
7        Sep-12  2004  2,558      21,072,808,460    405,233,619    1,579,579
8        Oct-31  2004  2,645      26,814,386,658    430,300,369    1,709,723
9        Jan-09  2005  2,780      36,786,975,473    470,534,907    1,843,944
10       Mar-06  2005  2,827      36,893,741,150    482,862,858    1,893,478
11       May-08  2005  2,928      39,731,702,362    507,980,644    2,477,893
12       Jul-10  2005  2,969      43,043,256,058    608,493,108    2,869,675
13       Sep-11  2005  3,060      44,727,484,853    686,768,902    3,400,773
14       Nov-20  2005  3,198      47,364,955,367    763,761,075    3,272,776
15       Jan-01  2006  3,244      52,645,441,913    810,009,733    3,436,263
16       Mar-11  2006  3,397      56,175,443,059    887,509,001    3,715,260
17       May-01  2006  3,497      62,130,037,371    927,587,669    3,999,859
18       Jul-11  2006  3,695      70,474,041,999    974,374,765    4,186,692
19       Sep-10  2006  3,774      70,694,879,544    1,012,985,077  4,311,543
20       Nov-05  2006  3,919      72,679,681,505    1,061,797,276  4,567,569
21       Jan-06  2007  4,079      73,864,990,566    1,144,795,927  4,742,335
22       Mar-05  2007  4,187      82,441,128,546    1,215,085,694  5,207,865
23       May-08  2007  4,300      83,148,327,110    1,291,050,995  5,503,385
24       Jul-10  2007  4,511      89,856,995,521    1,365,916,222  6,073,814
25       Sep-11  2007  4,646      91,265,840,843    1,470,475,398  6,515,132
26       Nov-04  2007  4,737      99,105,705,485    1,495,032,507  6,698,250
27       Jan-06  2008  4,926      101,059,552,113   1,556,356,987  7,025,715
28       Mar-09  2008  5,059      102,051,350,525   1,770,627,427  7,914,560
29       May-04  2008  5,168      104,671,101,150   1,870,214,220  8,376,141
30       Jul-07  2008  5,395      105,074,486,709   1,913,447,691  8,572,852
31       Aug-30  2008  5,513      109,214,348,591   2,026,768,719  9,145,702
32       Nov-10  2008  5,726      111,122,203,221   2,089,596,746  9,501,764
33       Jan-16  2009  7,773      116,001,583,818   2,204,073,443  10,325,282
34       Mar-06  2009  8,054      111,792,574,830   2,299,682,138  10,021,870
35       May-04  2009  8,393      113,210,655,336   2,565,199,170  10,993,891
36       Jul-02  2009  8,665      117,013,741,530   2,756,884,219  12,141,825
37       Sep-03  2009  9,005      119,151,229,820   2,965,450,333  12,941,750
38       Nov-07  2009  9,166      119,196,622,435   3,115,246,540  13,436,447
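Since the build date is the only thing the BLAST database tells you, one rough way to map it to a release is to pick the latest release dated on or before the build date. This is a minimal Python sketch of that heuristic, not anything NCBI provides. The dates are copied from the last few rows of the table, plus release 39 (Jan 30, 2010) from the top of this message. It assumes the database was built from the newest release available on its build date, which may not hold if your server lags behind.

```python
import bisect
from datetime import date

# (release, release date) pairs from the table above, plus release 39
# (Jan 30, 2010). Extend with more rows as needed; keep them sorted by date.
RELEASES = [
    (33, date(2009, 1, 16)),
    (34, date(2009, 3, 6)),
    (35, date(2009, 5, 4)),
    (36, date(2009, 7, 2)),
    (37, date(2009, 9, 3)),
    (38, date(2009, 11, 7)),
    (39, date(2010, 1, 30)),
]

def guess_release(build_date):
    """Return the latest release dated on or before build_date, or None if earlier than all."""
    dates = [d for _, d in RELEASES]
    i = bisect.bisect_right(dates, build_date)  # count of releases dated <= build_date
    return RELEASES[i - 1][0] if i else None

# The fastacmd build date reported elsewhere in this thread was Jan 30, 2010.
print(guess_release(date(2010, 1, 30)))
```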
--Russell
From: shalu sharma [mailto:sharmashalu.bio at gmail.com]
Sent: Saturday, 6 February 2010 3:56 a.m.
To: Smithies, Russell
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Refseq Version
Hi Russell,
Thanks for your response.
I am getting the number of sequences in the database but not the release number (e.g. 38 or 39).
This is what I did:
$ fastacmd -I -d /db/ncbiblast/refseq/refseq_protein
Database: NCBI Protein Reference Sequences
7,585,993 sequences; 2,644,770,521 total letters
File names:
/db/ncbiblast/refseq/refseq_protein.00
Date: Jan 30, 2010 8:34 PM Version: 4 Longest sequence: 36,805 res
/db/ncbiblast/refseq/refseq_protein.01
Date: Jan 30, 2010 8:34 PM Version: 4 Longest sequence: 33,403 res
/db/ncbiblast/refseq/refseq_protein.02
Date: Jan 30, 2010 8:34 PM Version: 4 Longest sequence: 15,830 res
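The useful fields in this output (sequence count, total letters and the per-volume build date) can be pulled out programmatically. Below is a minimal Python sketch; the regular expressions are written against the output quoted here only and are an assumption about the general format, so other fastacmd versions may need adjusting.

```python
import re
from datetime import datetime

def parse_fastacmd_info(text):
    """Return (sequences, total letters, sorted unique volume dates) from `fastacmd -I` text."""
    summary = re.search(r"([\d,]+) sequences; ([\d,]+) total letters", text)
    if summary is None:
        raise ValueError("no 'sequences; total letters' line found")
    n_seqs, n_letters = (int(g.replace(",", "")) for g in summary.groups())
    # Each volume (.00, .01, ...) has its own "Date: ... Version:" line.
    stamps = re.findall(r"Date: (.+?)\s+Version:", text)
    dates = sorted({datetime.strptime(s, "%b %d, %Y %I:%M %p") for s in stamps})
    return n_seqs, n_letters, dates

# Excerpt of the output quoted above.
sample = """Database: NCBI Protein Reference Sequences
           7,585,993 sequences; 2,644,770,521 total letters
File names:
/db/ncbiblast/refseq/refseq_protein.00
Date: Jan 30, 2010 8:34 PM Version: 4 Longest sequence: 36,805 res
/db/ncbiblast/refseq/refseq_protein.01
Date: Jan 30, 2010 8:34 PM Version: 4 Longest sequence: 33,403 res
"""
print(parse_fastacmd_info(sample))
```

All volumes share one build date here, so the set collapses them to a single entry.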
I am still confused about how I can get the release number. I know RefSeq release 39 came out on Jan 30, 2010, but I don't know how to confirm that this database is release 39. I also tried looking at the RefSeq release files but could not find anything.
I would really appreciate it if anyone could help me out with this.
Thanks
Shalu
On Thu, Feb 4, 2010 at 6:39 PM, Smithies, Russell <Russell.Smithies at agresearch.co.nz> wrote:
If you have access to the BLAST database, use fastacmd -I -d databasename.
Otherwise, the database details are usually at the bottom of your BLAST output.
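If you want to run that check from a script rather than by hand, here is a minimal Python sketch that wraps the same fastacmd -I -d call with subprocess. It assumes the legacy BLAST fastacmd binary is on your PATH; the function names are illustrative, not part of any BLAST or BioPerl API.

```python
import shutil
import subprocess

def fastacmd_info_command(db_path):
    """Build the `fastacmd -I -d <db>` command line as a list of arguments."""
    return ["fastacmd", "-I", "-d", db_path]

def fastacmd_info(db_path):
    """Run fastacmd and return its text output; raises if fastacmd is missing or fails."""
    if shutil.which("fastacmd") is None:
        raise FileNotFoundError("fastacmd not found on PATH")
    result = subprocess.run(
        fastacmd_info_command(db_path),
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# Example (path taken from this thread; adjust for your installation):
# print(fastacmd_info("/db/ncbiblast/refseq/refseq_protein"))
```

Passing the arguments as a list avoids shell quoting problems with unusual database paths.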
--Russell
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of shalu sharma
> Sent: Friday, 5 February 2010 11:02 a.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Refseq Version
>
> Hi All,
> This is not a bioperl query.
> Is there any way to check the RefSeq version (release)? I am using a
> server to BLAST my sequences (blastall) against RefSeq. Is there any way I
> can get the version information for the RefSeq database (from the BLAST
> output file or directly from the database)?
>
> Thanks
> Shalu
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l