[Bioperl-l] getting data from ncbi

Sean Davis sdavis2 at mail.nih.gov
Sun Jun 13 19:51:09 EDT 2004


Aditi,

If you want gene names, I would suggest blasting against RefSeq or
some other database of transcripts so that you can actually retrieve genes.
Blasting against nr will not reliably give you genes (most of the time
it won't).  As for chromosome positions, once you know the genes,
you could use the UCSC Genome Browser to look them up.  Alternatively, you
could blast (or blat) your sequences against the genome and obtain the
chromosome positions directly.  I'm not sure that feeding the gi number into
Entrez will get you useful information if your goal is to find genes
and chromosome positions.
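
As a rough sketch of the "blast against the genome" route: if the hits come
back in the same tabular format as in the quoted message, the subject id is
the chromosome (or contig) and the s. start / s. end columns give the
position. The subroutine name hit_positions and the "chr7" example line are
made up for illustration; only the column order is taken from the
"# Fields:" comment in the report.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn tabular BLAST lines (12 columns, as listed in
# the "# Fields:" comment) into subject, start, end and strand.
sub hit_positions {
    my ($report) = @_;
    my @hits;
    for my $line (split /\n/, $report) {
        next if $line =~ /^#/ or $line !~ /\S/;   # skip comments and blank lines
        my @f = split ' ', $line;
        next unless @f == 12;                     # ignore malformed lines
        my ($subject, $s_start, $s_end) = @f[1, 8, 9];
        # s. start greater than s. end means the hit is on the minus strand.
        my $strand = $s_start <= $s_end ? '+' : '-';
        my ($from, $to) = sort { $a <=> $b } ($s_start, $s_end);
        push @hits, { subject => $subject, start => $from,
                      end => $to, strand => $strand };
    }
    return @hits;
}

# Made-up example line; "chr7" stands in for whatever ids the genome
# database actually uses.
my $example = "# BLASTN\nquery1 chr7 100.00 17 0 0 345 361 242 226 1.1 34.19\n";
for my $hit (hit_positions($example)) {
    print join("\t", @{$hit}{qw(subject start end strand)}), "\n";
}
```

Short, low-scoring hits like the ones in your report would still need to be
filtered by e-value or alignment length before the positions mean much.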

Sean

----- Original Message -----
From: "aditi gupta" <aditi9783 at yahoo.co.in>
To: <bioperl-l at portal.open-bio.org>
Sent: Sunday, June 13, 2004 6:14 AM
Subject: [Bioperl-l] getting data from ncbi


> hi to all,
>
> I have a file containing the following data:
>
> # BLASTN 2.2.9 [May-01-2004]
> # Query: gi|37182815|gb|AY358849.1| Homo sapiens clone DNA180287 ALTE
> (UNQ6508) mRNA, complete cds
> # Database: nr
> # Fields: Query id, Subject id, % identity, alignment length,
> mismatches, gap openings, q. start, q. end, s. start, s. end, e-value,
> bit score
> gi|37182815|gb|AY358849.1| gi|28592069|gb|U63637.2|BTU63637 100.00 17 0
> 0 552 568 3218 3234   1.1 34.19
> gi|37182815|gb|AY358849.1| gi|14318385|gb|AC089993.2| 95.24 21 1 0 435
> 455 56604 56624   1.1 34.19
> gi|37182815|gb|AY358849.1| gi|14318385|gb|AC089993.2| 100.00 16 0 0 260
> 275 89982 89967   4.2 32.21
> gi|37182815|gb|AY358849.1| gi|7385112|gb|AF222766.1|AF222766 100.00 17 0
> 0 345 361 242 226   1.1 34.19
>
> but I needed only some of the fields, and with the help of members of
> this mailing list, I obtained the following output:
>
> gi|28592069|gb|U63637.2|BTU63637   100.00   17   0   552   568
> gi|14318385|gb|AC089993.2|   95.24   21  1  435  455
> gi|14318385|gb|AC089993.2|  100.00   16  0  260  275
> gi|7385112|gb|AF222766.1|AF222766  100.00  17  0  345  361
>
> the code is:
>
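> #!/usr/bin/perl
> use strict;
> use warnings;
> use Getopt::Long;
>
> my $file;
> GetOptions("f|filename=s" => \$file) or die "Usage: $0 -f <file>\n";
> defined $file or die "Usage: $0 -f <file>\n";
>
> open(my $in, '<', $file) or die "Error opening $file: $!\n";
> open(my $out, '>>', "$file.txt") or die "Error opening $file.txt: $!\n";
>
> # Slurp the whole report so that hits wrapped across lines still match.
> my $list = do { local $/; <$in> };
> close $in;
>
> # Every hit line starts with the query id, so split the report on it.
> my @seqs = split(/gi\|37182815\|gb\|AY358849\.1\|/, $list);
> foreach my $seq (@seqs) {
>   if ($seq =~ /(gi\|\d+\|gb\|[0-9A-Z.]+\|(?:[0-9A-Z.]+)?)  # subject id
>     \s* ([0-9.]+)   # percent identity
>     \s+ (\d+)       # alignment length
>     \s+ (\d+)       # mismatches
>     \s+ \d+         # gap openings (not kept)
>     \s+ (\d+)       # q. start
>     \s+ (\d+)       # q. end
>     /x)
>   {
>     # Print only when the chunk matched, so header chunks are skipped.
>     print $out join("\t", $1, $2, $3, $4, $5, $6), "\n";
>   }
> }
> close $out;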
>
>
>
> but I also have to feed the gi number (the first field) into the NCBI
> Entrez nucleotide site:
> http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Nucleotide
> and retrieve the gene and chromosome name, if available, from the
> resulting web page.
> Is it possible to get the gene and chromosome info in the output along
> with the other fields? What changes to the code are required?
>
> please help!! i don't have any idea of using internet with perl......
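
If you do want to try the Entrez route anyway, a minimal sketch might look
like the following. It assumes NCBI's E-utilities EFetch service and the
LWP::Simple module for the download; the subroutine names and the sample
fragment are made up for illustration. Many records (BAC clones, for
instance) carry no /gene or /chromosome qualifier at all, in which case the
lists simply come back empty.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pull the gi number out of a subject id such as
# "gi|7385112|gb|AF222766.1|AF222766"; returns undef if there is none.
sub gi_from_id {
    my ($id) = @_;
    return $id =~ /^gi\|(\d+)\|/ ? $1 : undef;
}

# Build an EFetch URL asking for the GenBank flat file of one record.
sub efetch_url {
    my ($gi) = @_;
    return 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'
         . "?db=nucleotide&id=$gi&rettype=gb&retmode=text";
}

# Fetch the record over the network. LWP::Simple is loaded only when this
# is called, so the rest of the sketch runs without it.
sub fetch_genbank {
    my ($gi) = @_;
    require LWP::Simple;
    return LWP::Simple::get(efetch_url($gi));   # undef on failure
}

# Collect the distinct /gene="..." and /chromosome="..." qualifiers
# from a GenBank flat file, as two array references.
sub gene_and_chromosome {
    my ($flatfile) = @_;
    my (%gene, %chr);
    $gene{$1} = 1 while $flatfile =~ m{/gene="([^"]+)"}g;
    $chr{$1}  = 1 while $flatfile =~ m{/chromosome="([^"]+)"}g;
    return ([sort keys %gene], [sort keys %chr]);
}

# Made-up fragment in GenBank feature-table style, for illustration only.
my $fragment = <<'END';
     source          1..100
                     /organism="Homo sapiens"
                     /chromosome="7"
     gene            10..90
                     /gene="EXAMPLE1"
END
my $gi = gi_from_id('gi|7385112|gb|AF222766.1|AF222766');
my ($genes, $chrs) = gene_and_chromosome($fragment);
print join("\t", $gi, join(',', @$genes), join(',', @$chrs)), "\n";
```

In the real script you would call fetch_genbank($gi) for each hit, check
for undef, and append the two lists to the tab-separated line you already
print.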
>
> thanks a lot in advance,
>
> regards,
> aditi.