[Bioperl-l] Genbank seq CODE

Jason Stajich jason@cgt.mc.duke.edu
Mon, 10 Jun 2002 14:27:38 -0400 (EDT)


[Please try not to use all caps in messages; it feels as if one is being shouted
at.]

On Mon, 10 Jun 2002, Melissa L. Kimball wrote:

> I THINK I AM DOING IT RIGHT????? MAYBE SINCE I OPENED AN "FTP" FILE HANDLE,
> DO I HAVE TO SPECIFY "STDOUT" BEFORE I USE  write->seq() ?
>
> #!/usr/bin/perl -v
>
> use Bio::SeqIO;
> use Bio::DB::GenBank;
> use Bio::Seq;
> use Bio::DB::NCBIHelper;
> use Bio::Annotation::Collection;
> use diagnostics;
>
> my $ftp = "/usr/bin/ftp";
> my $tmp = "genbankflatfile.txt";
> my $remotefile = "gbcu.flat.gz";
> my $localfile = "gbcu.flat.gz";
> my $host = "ftp.ncbi.nih.gov";
> my $dir = "/genbank/daily";
>
> open(FTP,"| $ftp -n -v $host > $tmp");
>
> print FTP "user anonymous mkimball\@med.unc.edu\n";
> print FTP "cd $dir\n";
> print FTP "binary\n";
> print FTP "get $remotefile $localfile\n";
> print FTP "quit\n";
>
> #close(FTP);
>
> #`gzip -d gbcu.flat.gz`
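Two things in this part will bite before SeqIO ever runs. With close(FTP) commented out, your commands sit in Perl's output buffer and nothing waits for ftp to finish. Calling close() on a piped open flushes the buffer, sends EOF, and waits for the child to exit, with the child's exit status ending up in $?. And with the gzip line commented out, gbcu.flat is never created. Below is a small offline sketch of the close() behaviour. A child perl ($^X) that copies its input to a file stands in for /usr/bin/ftp, and the file name is made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Scratch location; 'downloaded.txt' is a made-up name.
my $dir  = tempdir(CLEANUP => 1);
my $path = File::Spec->catfile($dir, 'downloaded.txt');

# Stand-in for /usr/bin/ftp: a child perl that copies its STDIN into $path.
my @child = ($^X, '-e',
    'open(my $o, ">", $ARGV[0]) or die $!; print {$o} <STDIN>; close($o) or die $!',
    $path);

open(my $pipe, '|-', @child) or die "Cannot start child: $!";
print {$pipe} "user anonymous\n", "quit\n";

# close() flushes Perl's buffer, sends EOF to the child and waits for it;
# a false return means the child failed (its exit status is in $?).
close($pipe) or die "Child exited with status $?";

my $size = -s $path;
print "after close: $path has ", ($size // 0), " bytes\n";
```

In your script that means uncommenting close(FTP), checking its return value, and actually running the decompression before opening gbcu.flat. For example: system('gzip', '-d', 'gbcu.flat.gz') == 0 or die "gzip failed: $?";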
>
> $genbankfile = Bio::SeqIO->new('-file' => "gbcu.flat",'-format' =>
> 'genbank');
> $fastafile = Bio::SeqIO->new('-file' => "gbcu.fsa", '-format' => 'Fasta');
                                          ^^^^^^^^^^
                                          ">gbcu.fsa"
One correction: this is why you aren't able to write. You haven't told
the program that you want a writable filehandle; for that the -file
argument needs a leading '>', as in ">filename.fsa".

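Bio::SeqIO passes the -file string on to Perl's open(), so the usual two-argument prefixes apply. A bare name is opened read-only, '>' creates or truncates the file for writing, and '>>' appends. The plain-Perl sketch below needs no BioPerl. The scratch directory and the dummy record are only for illustration. It shows print() succeeding on a '>' handle and failing on a handle opened without a prefix.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Throwaway directory; 'gbcu.fsa' only mirrors the name in the script.
my $dir  = tempdir(CLEANUP => 1);
my $path = File::Spec->catfile($dir, 'gbcu.fsa');

# Leading '>' = open for writing, creating or truncating the file.
open(my $out, ">$path") or die "Cannot write $path: $!";
my $wrote = print {$out} ">example dummy record\nACGT\n";
close($out) or die "Cannot close $path: $!";

# No prefix = read-only, so print() on this handle fails and returns false.
open(my $in, $path) or die "Cannot read $path: $!";
my $wrote_readonly = do { no warnings 'io'; print {$in} "ignored\n" } ? 1 : 0;
my @lines = <$in>;
close($in);

printf "print on '>' handle: %s\n",       $wrote          ? 'ok' : 'failed';
printf "print on read-only handle: %s\n", $wrote_readonly ? 'ok' : 'failed';
printf "lines in file: %d\n", scalar @lines;
```

In your script the fix is just: Bio::SeqIO->new('-file' => '>gbcu.fsa', '-format' => 'Fasta');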
>
> while (my $sequence = $genbankfile->next_seq())
> {
>         my $thespecies = $sequence->species();   //YOUR WAY IS MUCH BETTER!!
>         my $specsci = $thespecies->species();
>
>         chop($specsci);
>
>        if ($specsci =~ /^gonorrhoea\b/i) {
>
>                 print "$specsci\n\n";
>
>                 $fastafile->write_seq($sequence);
>         }
> }
>
>
> IN THE CONDITION, I CHECK FOR ALL THOSE ENTRIES THAT ARE "gonorrhoea."  WHEN
> I ACTUALLY LOOK AT A *.seq FILE IT IS SPELLED "gonorrhoeae."  ALL OTHER
> SCIENTIFIC LITERATURE SPELLS IT THIS WAY.  STRANGE.
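About the spelling: the record really does say gonorrhoeae. Your test only matches 'gonorrhoea' because chop() strips the last character of whatever species() returns. chomp() is the function that removes only a trailing newline. The plain-Perl sketch below assumes species() hands back the epithet 'gonorrhoeae' for this record.

```perl
use strict;
use warnings;

# Assumed value: what species() returns for "Neisseria gonorrhoeae".
my $epithet = 'gonorrhoeae';

my $chopped = $epithet;
chop($chopped);      # removes the last character, whatever it is

my $chomped = $epithet;
chomp($chomped);     # removes only a trailing newline (there is none)

# The original test matches the chopped string but not the real epithet,
# because \b needs a word boundary right after the final 'a'.
my $old_on_chopped = $chopped =~ /^gonorrhoea\b/i ? 1 : 0;
my $old_on_raw     = $epithet =~ /^gonorrhoea\b/i ? 1 : 0;

# Testing the real spelling needs no chop at all.
my $exact = lc($epithet) eq 'gonorrhoeae' ? 1 : 0;

print "chopped=$chopped chomped=$chomped old_on_chopped=$old_on_chopped ",
      "old_on_raw=$old_on_raw exact=$exact\n";
```

So you can drop the chop() and test for 'gonorrhoeae' directly, e.g. with eq or /^gonorrhoeae$/i.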
>
> HERE IS A CHUNK OF ANNOTATION.  I WILL DEFINITELY NEED THE DEFINITION LINE,
> SOURCE LINE, AND ORGANISM LINE.  POSSIBLY KEYWORDS, TITLE, AND FEATURES.
> THE QUERY WOULD BE ON THE STRING "gonorrhoeae":
>
>
> LOCUS       AB032563                1407 bp    DNA     linear   BCT 23-SEP-2000
> DEFINITION  Neisseria gonorrhoeae gene for efflux transporter membrane protein
>             AgrA, complete cds.
> ACCESSION   AB032563
> VERSION     AB032563.1  GI:10280997
> KEYWORDS    AgrA.
> SOURCE      Neisseria gonorrhoeae (strain:ATCC19424) DNA.
>   ORGANISM  Neisseria gonorrhoeae
>             Bacteria; Proteobacteria; beta subdivision; Neisseriaceae;
>             Neisseria.
> REFERENCE   1  (bases 1 to 1407)
>   AUTHORS   Murata,T., Gotoh,N., Sakota,E., Otsuki,M. and Nishino,T.
>   TITLE     agrA gene involving to aminoglycoside resistance in Neisseria
>             gonorrhoeae
>   JOURNAL   Published Only in DataBase (2000) In press
> REFERENCE   2  (bases 1 to 1407)
>   AUTHORS   Murata,T., Gotoh,N., Sakota,E., Otsuki,M. and Nishino,T.
>   TITLE     Direct Submission
>   JOURNAL   Submitted (20-SEP-1999) Takeshi Murata, Kyoto Phamaceutical
>             University, Microbiology; Misasagi Yamashina, Kyoto, Kyoto
>             607-8414, Japan (E-mail:murata@mb.kyoto-phu.ac.jp,
>             Tel:81-75-595-4642)
> FEATURES             Location/Qualifiers
>      source          1..1407
>                      /organism="Neisseria gonorrhoeae"
>                      /strain="ATCC19424"
>                      /db_xref="taxon:485"
>      gene            1..1407
>                      /gene="agrA"
>
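For the DEFINITION, SOURCE and ORGANISM lines: with Bio::SeqIO, the DEFINITION text comes back from $sequence->desc and the organism from $sequence->species, so you rarely need to touch the raw text. Still, the plain-Perl sketch below shows the layout those fields follow in the flat file. It is not how BioPerl's parser works internally. The keyword sits in the first 12 columns, sub-keywords such as ORGANISM and TITLE are indented two spaces, and continuation lines are indented 12. The helper name header_fields is made up. The record is a trimmed copy of the one quoted above, with the mail-wrapped lines put back in GenBank layout.

```perl
use strict;
use warnings;

# Hypothetical helper: map each header keyword to a list of entries,
# each entry being the list of text lines that belong to it.
sub header_fields {
    my ($record) = @_;
    my (%field, $key);
    for my $line (split /\n/, $record) {
        last if $line =~ /^FEATURES\b/;            # header section ends here
        if ($line =~ /^\s{0,2}([A-Z]+)\s+(.*)$/) { # keyword or sub-keyword line
            $key = $1;
            push @{ $field{$key} }, [$2];
        }
        elsif (defined $key && $line =~ /^\s{12}(\S.*)$/) {
            push @{ $field{$key}[-1] }, $1;        # continuation of current entry
        }
    }
    return \%field;
}

# Trimmed copy of the AB032563 record quoted above.
my $record = <<'END';
LOCUS       AB032563                1407 bp    DNA     linear   BCT 23-SEP-2000
DEFINITION  Neisseria gonorrhoeae gene for efflux transporter membrane protein
            AgrA, complete cds.
ACCESSION   AB032563
KEYWORDS    AgrA.
SOURCE      Neisseria gonorrhoeae (strain:ATCC19424) DNA.
  ORGANISM  Neisseria gonorrhoeae
            Bacteria; Proteobacteria; beta subdivision; Neisseriaceae;
            Neisseria.
REFERENCE   1  (bases 1 to 1407)
  TITLE     agrA gene involving to aminoglycoside resistance in Neisseria
            gonorrhoeae
FEATURES             Location/Qualifiers
END

my $f = header_fields($record);

my $definition = join ' ', @{ $f->{DEFINITION}[0] };
my $source     = $f->{SOURCE}[0][0];
my $organism   = $f->{ORGANISM}[0][0];   # first line = name, rest = lineage
my $keywords   = $f->{KEYWORDS}[0][0];
my @titles     = map { join ' ', @$_ } @{ $f->{TITLE} || [] };

print "DEFINITION: $definition\n";
print "SOURCE:     $source\n";
print "ORGANISM:   $organism\n";
print "KEYWORDS:   $keywords\n";
print "TITLE:      $_\n" for @titles;
print "matches gonorrhoeae\n" if $organism =~ /\bgonorrhoeae\b/i;
```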
>
>
> THANK YOU! THANK YOU! THANK YOU!
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

-- 
Jason Stajich
Duke University
jason at cgt.mc.duke.edu