[Biojava-l] Reading sequence identifier from file in FASTA format
Alan Acosta
zagato.gekko at gmail.com
Thu Nov 2 21:37:58 UTC 2006
Hello everybody, this is my first post to the list. I'm a computer science
student and very new to this subject, so I'm experimenting with BioJava
and comparing it against BioPerl...
My problem is that I'm trying to read a FASTA file with
///////////////////
BufferedReader input = new BufferedReader(new FileReader("NC_008009.fna"));
RichSequenceIterator seqIter = RichSequence.IOTools.readFastaDNA(
        input, RichObjectFactory.getDefaultNamespace());
RichSequence rseq = null;
if (seqIter.hasNext())
{
    rseq = seqIter.nextRichSequence();
    System.out.println("Identifier: " + rseq.getIdentifier());
    System.out.println("Description: " + rseq.getDescription());
    System.out.println("SubList: " + rseq.subList(10, 20).seqString());
}
///////////////////
but I'm getting "Identifier: null" in the output when it should be
"Identifier: 94967031". The description and sublist work fine. The docs
(http://biojava.org/wiki/BioJava:BioJavaXDocs#Reading) say that for a header like
>gi|<identifier>|<namespace>|<accession>.<version>|<name> <description>
the identifier is set with setIdentifier() and is then available through
getIdentifier(), but I get null.
So, is this file really in FASTA format? Or am I doing this the wrong
way?
The test file is:
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Acidobacteria_bacterium_Ellin345/NC_008009.fna
And the header is:
>gi|94967031|ref|NC_008009.1| Acidobacteria bacterium Ellin345, complete genome
CCGTGTGTTGCGCGGCCAGATGAGAAATTTCTATGTCCCTCTCGACCACGACTCCACCAGCTCCGAACCC
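To make the expected field layout concrete, here is a minimal plain-Java sketch that splits a header of the documented form into its parts. It does not use BioJava and is not how BioJava parses headers internally; the class name GiHeaderSketch and the method parse are hypothetical, made up only for this illustration. It assumes the header starts with ">gi|" and has all four "|" separators.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only (not part of BioJava).
public class GiHeaderSketch {

    // Splits a header of the form
    // >gi|<identifier>|<namespace>|<accession>.<version>|<name> <description>
    // into named fields.
    public static Map<String, String> parse(String header) {
        if (!header.startsWith(">gi|")) {
            throw new IllegalArgumentException("Not a gi-style FASTA header: " + header);
        }
        // A limit of 5 keeps any '|' inside the description intact.
        String[] parts = header.substring(1).split("\\|", 5);
        if (parts.length < 5) {
            throw new IllegalArgumentException("Too few fields in header: " + header);
        }

        Map<String, String> fields = new LinkedHashMap<String, String>();
        fields.put("identifier", parts[1]);
        fields.put("namespace", parts[2]);

        // <accession>.<version>: the version is whatever follows the last dot.
        String accVer = parts[3];
        int dot = accVer.lastIndexOf('.');
        fields.put("accession", dot < 0 ? accVer : accVer.substring(0, dot));
        fields.put("version", dot < 0 ? "" : accVer.substring(dot + 1));

        // <name> <description>: the name is the text before the first space.
        // In the NC_008009 header the name is empty because a space
        // immediately follows the last '|'.
        String rest = parts[4];
        int space = rest.indexOf(' ');
        fields.put("name", space < 0 ? rest : rest.substring(0, space));
        fields.put("description", space < 0 ? "" : rest.substring(space + 1).trim());
        return fields;
    }

    public static void main(String[] args) {
        String header = ">gi|94967031|ref|NC_008009.1| Acidobacteria bacterium Ellin345, complete genome";
        Map<String, String> fields = parse(header);
        System.out.println("Identifier: " + fields.get("identifier"));
        System.out.println("Accession: " + fields.get("accession"));
        System.out.println("Version: " + fields.get("version"));
    }
}
```

Under that reading of the format, the second field of the header above, 94967031, is the value expected from getIdentifier().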
Where can I find a beginner's tutorial for getting started with BioJava?
I appreciate any help... thanks.
Farewell
Alan Acosta
Cali - Colombia