[Bioperl-l] getting data from ncbi

aditi gupta aditi9783 at yahoo.co.in
Sun Jun 13 06:14:48 EDT 2004


Hi all,
 
I have a file containing the following data:
 
# BLASTN 2.2.9 [May-01-2004]
# Query: gi|37182815|gb|AY358849.1| Homo sapiens clone DNA180287 ALTE (UNQ6508) mRNA, complete cds
# Database: nr
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
gi|37182815|gb|AY358849.1| gi|28592069|gb|U63637.2|BTU63637 100.00 17 0 0 552 568 3218 3234   1.1 34.19
gi|37182815|gb|AY358849.1| gi|14318385|gb|AC089993.2| 95.24 21 1 0 435 455 56604 56624   1.1 34.19
gi|37182815|gb|AY358849.1| gi|14318385|gb|AC089993.2| 100.00 16 0 0 260 275 89982 89967   4.2 32.21
gi|37182815|gb|AY358849.1| gi|7385112|gb|AF222766.1|AF222766 100.00 17 0 0 345 361 242 226   1.1 34.19
 
But I need only some of the fields. With the help of members of this mailing list, I obtained the following output:
 
gi|28592069|gb|U63637.2|BTU63637   100.00   17   0   552   568
gi|14318385|gb|AC089993.2|   95.24   21  1  435  455
gi|14318385|gb|AC089993.2|  100.00   16  0  260  275
gi|7385112|gb|AF222766.1|AF222766  100.00  17  0  345  361
 
The code is:
 
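#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;

# Input file is given with -f or --filename.
my $file;
GetOptions("f|filename=s" => \$file) or die "Usage: $0 -f FILE\n";
defined $file or die "Usage: $0 -f FILE\n";

open(my $in,  '<',  $file)       or die "Error opening $file: $!\n";
open(my $out, '>>', "$file.txt") or die "Error opening $file.txt: $!\n";

while (my $line = <$in>) {
    next if $line =~ /^#/;    # skip BLAST header and comment lines
    # Match the subject id (the second gi|...| field) and the numeric columns.
    if ($line =~ /\s(gi\|\d+\|gb\|[0-9A-Z.]+\|[0-9A-Z.]*)  # subject id
                  \s+ ([0-9.]+)   # % identity
                  \s+ (\d+)       # alignment length
                  \s+ (\d+)       # mismatches
                  \s+ \d+         # gap openings (not kept)
                  \s+ (\d+)       # q. start
                  \s+ (\d+)       # q. end
                 /x) {
        my ($id, $identity_percentage, $align_length,
            $mismatches, $q_start, $q_end) = ($1, $2, $3, $4, $5, $6);
        # Print only when the line matched, so stale values are never written.
        print $out join("\t", $id, $identity_percentage, $align_length,
                        $mismatches, $q_start, $q_end), "\n";
    }
}

close($in);
close($out) or die "Error closing $file.txt: $!\n";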

 
 
But I also have to feed the GI number (from the first field) into the NCBI Entrez Nucleotide site:
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Nucleotide
and retrieve the gene and chromosome name, if available, from the resulting web page.
Is it possible to get the gene and chromosome info in the output along with the other fields? What changes to the code are required?
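
For the lookup step, the numeric GI first has to be separated from the rest of the id. A minimal sketch of that part only, assuming every id begins with gi|<digits>| as in the lines above; gi_from_id is a made-up helper name, and fetching the record from Entrez is not shown here:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): return the numeric GI
# from an id such as "gi|28592069|gb|U63637.2|BTU63637",
# or undef if the string does not start with "gi|<digits>|".
sub gi_from_id {
    my ($id) = @_;
    return $id =~ /^gi\|(\d+)\|/ ? $1 : undef;
}

for my $id ('gi|28592069|gb|U63637.2|BTU63637', 'gi|14318385|gb|AC089993.2|') {
    my $gi = gi_from_id($id);
    print defined $gi ? "$gi\n" : "no GI found in $id\n";
}
```

The same subject id can appear on several hit lines (gi|14318385 appears twice above), so it may be worth remembering which GIs were already looked up before querying again.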
 
Please help! I have no experience using the internet from Perl.
 
Thanks a lot in advance,
 
regards,
aditi.

