[Bioperl-l] extract ncDNA

Chris Fields cjfields at uiuc.edu
Sun Feb 26 14:12:57 UTC 2006


You're not using bioperl.  See:

http://www.bioperl.org/wiki/HOWTO:Beginners

then go to:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation

Chris


On Feb 26, 2006, at 5:51 AM, perlmails at gmail.com wrote:

> Dear Bioperl group,
>
> I have been working on extracting non-coding DNA (ncDNA) sequences
> from an organism.
>
> I tried extracting the intergenic sequences from the sense strand
> after filtering out the features (CDS, gene, mRNA, tRNA, rRNA, etc.)
> listed in the EMBL feature table, using BioPerl and the additional
> script below.
>
> Now I realise there is a problem extracting the ncDNA sequences
> from the negative strand. Any ideas?
>
> To extract the ncDNAs from the negative strand, I thought of
> converting the negative-strand coordinates to sense-strand
> coordinates and adding these to the sense-strand coordinates. Then I
> would filter out all the features (discarding every region listed in
> the EMBL FT) so that only the ncDNAs remain.
>
> Is there anything in the BioPerl toolkit that I am missing?
>
> ##<<<code start>>
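> use strict;
> use warnings;
>
> my $EMBL_cord_file = "Organism.feature.cords";  # feature co-ordinates: start \t end
> my $RAW_file       = "Organism.raw";
> my $ncDNA_file     = "Organism.ncDNA";
>
> open(my $cord_fh, '<', $EMBL_cord_file) or die "Cannot open $EMBL_cord_file: $!";
> open(my $raw_fh,  '<', $RAW_file)       or die "Cannot open $RAW_file: $!";
> open(my $out_fh,  '>', $ncDNA_file)     or die "Cannot write $ncDNA_file: $!";
>
> # Slurp the raw sequence and strip all whitespace, including newlines.
> my $dna = join('', <$raw_fh>);
> $dna =~ s/\s//g;
>
> # Replace each feature region with a "\n>start..end" marker so that
> # the intergenic sequence is left between the markers.
> # NOTE: each substitution changes the length of $dna, so the
> # coordinates of any later feature lying downstream of a replaced
> # region generally no longer point at the right bases.
> while (my $line = <$cord_fh>) {
> 	chomp $line;
> 	my ($start, $end) = split /\t/, $line;
> 	my $replaceString = "\n>$start..$end";
> 	substr($dna, $start-1, $end-$start+1, $replaceString);
> }
> print {$out_fh} $dna, "\n";
> close($out_fh) or die "Cannot close $ncDNA_file: $!";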
> ##<<<code end>>
>
> Another thing: since I am reading the whole file into a scalar, the
> script does not complete the extraction of all ncDNAs from the
> sense strand. It seems the features are parsed before the
> 266,000 nt sequence is flattened into a single string.
>
> Any help would be appreciated.
>
> -PO
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign
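
For readers following the thread, here is a minimal plain-Perl sketch of the interval arithmetic behind "extract everything that is not a feature". It assumes the same start/end pairs as the quoted script and takes ncDNA to mean sequence not covered by any feature on either strand. In EMBL feature tables a complement(x..y) location still gives x and y in forward-strand numbering, so features from both strands can be pooled without any coordinate conversion: merge overlapping intervals (a gene and its CDS, for example), take the gaps between them, and report each gap's reverse complement for the negative strand. The subroutine names `merge_intervals`, `intergenic_regions` and `revcom` and the toy sequence are hypothetical, chosen for illustration only; this is not BioPerl's API.

```perl
use strict;
use warnings;

# Hypothetical helper: merge overlapping or adjacent [start, end]
# intervals (1-based, inclusive). Strand is ignored because EMBL
# complement() locations use forward-strand coordinates.
sub merge_intervals {
    my @sorted = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @_;
    my @merged;
    for my $iv (@sorted) {
        if (@merged && $iv->[0] <= $merged[-1][1] + 1) {
            # Overlaps or touches the previous interval: extend it.
            $merged[-1][1] = $iv->[1] if $iv->[1] > $merged[-1][1];
        } else {
            push @merged, [ @$iv ];
        }
    }
    return @merged;
}

# Hypothetical helper: return the gaps between merged features as
# [start, end] pairs, including the regions before the first feature
# and after the last one.
sub intergenic_regions {
    my ($seq_len, @features) = @_;
    my @gaps;
    my $next = 1;    # first position not yet covered by a feature
    for my $iv (merge_intervals(@features)) {
        push @gaps, [ $next, $iv->[0] - 1 ] if $iv->[0] > $next;
        $next = $iv->[1] + 1;
    }
    push @gaps, [ $next, $seq_len ] if $next <= $seq_len;
    return @gaps;
}

# Hypothetical helper: reverse complement, used to report the same
# gap as read on the negative strand.
sub revcom {
    my $s = reverse shift;
    $s =~ tr/ACGTacgt/TGCAtgca/;
    return $s;
}

# Toy data: a 20 nt sequence with a gene (3..6), an overlapping
# CDS (5..8) and a complement() feature (12..15).
my $dna      = 'AACCGGTTAACCGGTTAACC';
my @features = ([3, 6], [5, 8], [12, 15]);

for my $gap (intergenic_regions(length $dna, @features)) {
    my ($s, $e) = @$gap;
    # Read-only substr on the untouched string, so offsets never shift.
    my $plus = substr($dna, $s - 1, $e - $s + 1);
    printf "%d..%d\t%s\t%s\n", $s, $e, $plus, revcom($plus);
}
```

On the toy data this prints the gaps 1..2, 9..11 and 16..20 with their plus- and minus-strand sequences. The design choice is to extract each gap with a read-only `substr` from the unmodified sequence rather than editing the string in place, so every coordinate from the feature table stays valid. With real data, the feature start/end pairs would come from the EMBL entry, for example via BioPerl's feature objects as described in the Feature-Annotation HOWTO.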





