From aumanga at biggjapan.com  Thu Jan  8 04:04:38 2009
From: aumanga at biggjapan.com (Ashika Umanga Umagiliya)
Date: Thu, 08 Jan 2009 18:04:38 +0900
Subject: [Biojava-l] Genebank Webservices (correct result page)
Message-ID: <4965C1A6.1030306@biggjapan.com>

Sorry, the correct result page is:

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&id=4558277

From aumanga at biggjapan.com  Thu Jan  8 04:09:22 2009
From: aumanga at biggjapan.com (Ashika Umanga Umagiliya)
Date: Thu, 08 Jan 2009 18:09:22 +0900
Subject: [Biojava-l] Genebank Webservices (correct result page)
In-Reply-To: <4965C1A6.1030306@biggjapan.com>
References: <4965C1A6.1030306@biggjapan.com>
Message-ID: <4965C2C2.2030801@biggjapan.com>

Greetings all,

Sorry if this is reposted!

I come from a computer science background and have only a little knowledge
of bioinformatics.
In the application I am developing, I want to search NCBI for a GenBank ID
(like 4558277) and retrieve the relevant PDB_ID.

For example:
Say for ID '4558277', I get the result

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&id=4558277

I can see the value '1F58_L', which is the only thing significant to me.

I want to know whether there is any web service to retrieve this
information.

That is, I send '4558277' as a SOAP input parameter and the result
should contain the value '1F58_L'.

I found the following web services and want to know whether I can use the
one for 'Gene':

http://www.ncbi.nlm.nih.gov/entrez/query/static/esoap_help.html

thanks in advance,
umanga




From aumanga at biggjapan.com  Thu Jan  8 03:58:09 2009
From: aumanga at biggjapan.com (Ashika Umanga Umagiliya)
Date: Thu, 08 Jan 2009 17:58:09 +0900
Subject: [Biojava-l] Genebank Webservices ?
Message-ID: <4965C021.2060109@biggjapan.com>

Greetings all,

I come from a computer science background and have only a little knowledge
of bioinformatics.
In the application I am developing, I want to search NCBI for a GenBank ID
(like 4558277) and retrieve the relevant PDB_ID.

For example:
Say for ID '4558277', I get the result

http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&cmd=search&term=4558277

I can see the value '1F58_L', which is the only thing significant to me.

I want to know whether there is any web service to retrieve this information.

That is, I send '4558277' as a SOAP input parameter and the result
should contain the value '1F58_L'.

I found the following web services and want to know whether I can use the
one for 'Gene':

http://www.ncbi.nlm.nih.gov/entrez/query/static/esoap_help.html

thanks in advance,
umanga


From holland at eaglegenomics.com  Thu Jan  8 06:07:43 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Thu, 08 Jan 2009 11:07:43 +0000
Subject: [Biojava-l] Genebank Webservices ?
In-Reply-To: <4965C021.2060109@biggjapan.com>
References: <4965C021.2060109@biggjapan.com>
Message-ID: <4965DE7F.8020209@eaglegenomics.com>

There is no generic interface to NCBI eUtils in BioJava, but one is planned.

In the meantime, take a look at this existing BioJava 1.6 package, which
will query GenBank for a sequence and return a BioJava RichSequence
object containing the result. You can then search through the
annotations and features of the sequence to find the result you need.

This is for GenBank (nucleotide) records:

http://biojava.org/docs/api/org/biojavax/bio/db/ncbi/GenbankRichSequenceDB.html

Or the equivalent for GenPept (protein) records:

http://biojava.org/docs/api/org/biojavax/bio/db/ncbi/GenpeptRichSequenceDB.html

cheers,
Richard
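A minimal JDK-only sketch of the lookup Ashika describes, using NCBI's E-utilities EFetch service over plain HTTP rather than SOAP or BioJava. It assumes that EFetch with db=protein, rettype=gp and retmode=text returns a GenPept flat file, and that for a PDB-derived chain such as 4558277 the ACCESSION line carries the PDB_chain identifier ('1F58_L' above). The class and method names are hypothetical, and the record layout should be checked against a real response before relying on it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of BioJava or an official NCBI client.
class GenpeptAccessionLookup {

    // NCBI E-utilities EFetch endpoint (plain HTTP GET, no SOAP).
    static final String EFETCH =
        "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi";

    // Builds an EFetch URL requesting one protein record as a GenPept flat file.
    static String efetchUrl(String proteinId) {
        return EFETCH + "?db=protein&id=" + proteinId + "&rettype=gp&retmode=text";
    }

    // Returns the first token after "ACCESSION" in a GenPept record, or null if absent.
    static String extractAccession(String genpeptRecord) {
        for (String line : genpeptRecord.split("\\R")) {
            if (line.startsWith("ACCESSION")) {
                String[] fields = line.trim().split("\\s+");
                return fields.length > 1 ? fields[1] : null;
            }
        }
        return null;
    }

    // Downloads the record and extracts its accession (requires network access).
    static String fetchAccession(String proteinId) throws IOException {
        StringBuilder record = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                URI.create(efetchUrl(proteinId)).toURL().openStream(),
                StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                record.append(line).append('\n');
            }
        }
        return extractAccession(record.toString());
    }

    public static void main(String[] args) throws IOException {
        String id = args.length > 0 ? args[0] : "4558277";
        System.out.println(id + " -> " + fetchAccession(id));
    }
}
```

With network access, `java GenpeptAccessionLookup 4558277` would print the ID followed by whatever the ACCESSION line contains. A RichSequence obtained from GenpeptRichSequenceDB should expose the same information through BioJava objects instead of string parsing.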


-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From andreas.prlic at gmail.com  Mon Jan 12 04:28:22 2009
From: andreas.prlic at gmail.com (Andreas Prlic)
Date: Mon, 12 Jan 2009 10:28:22 +0100
Subject: [Biojava-l] BioJava
In-Reply-To: <496ac2075c8cf4.53511495@wp.pl>
References: <496ac2075c8cf4.53511495@wp.pl>
Message-ID: <59a41c430901120128l53f2a5c8le0a122a0a73515@mail.gmail.com>

Hi Michal,

the code you sent looks fine to me. Still, I am not sure I fully
understand what you are trying to say. What do you mean by "each
hit"?

From our previous discussion I understand that you work with two sets
of atoms (residues) where each position in one set corresponds to a
position in the other set. This means you know that all atoms are on
structurally equivalent positions and the two sets of atoms are of the
same size. If this is the case, then the SVDSuperimposer is the right
tool and you would include all atoms in the two sets for the RMSD
calculation.

If you work with 2 proteins where you do NOT know the structurally
equivalent positions at the start, then StructurePairAligner provides
an algorithm to align two proteins (of different length) and find
pairs of atoms (residues) on structurally equivalent positions. In
this case, the RMSD calculation considers the positions that are
equivalent and ignores the unaligned regions. Guess I should create a
wiki page for explaining this difference between SVDSuperimposer and
StructurePairAligner...

Andreas
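To make the distinction concrete, here is a minimal sketch that takes two equally sized coordinate sets, assumed to be already superimposed (for example after Calc.rotate and Calc.shift, as in Michal's message) and paired position by position, and reports both the per-pair deviation d_i and the overall RMSD, sqrt(sum of d_i^2 / N). The same formula applies whether the pairs come from a fixed correspondence (the SVDSuperimposer case) or from the equivalent positions found by StructurePairAligner; only the choice of pairs differs. It works on plain double[3] arrays rather than BioJava Atom objects, and the class name is hypothetical.

```java
// Hypothetical helper working on plain {x, y, z} arrays, not BioJava Atom objects.
// Both sets are assumed to be already superimposed and paired position by position.
class PairwiseDeviation {

    // Euclidean distance between two points given as {x, y, z}.
    static double distance(double[] p, double[] q) {
        double dx = p[0] - q[0], dy = p[1] - q[1], dz = p[2] - q[2];
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }

    // Per-pair deviation d_i between equivalent positions i of the two sets.
    static double[] perPairDeviations(double[][] set1, double[][] set2) {
        if (set1.length != set2.length) {
            throw new IllegalArgumentException("both sets must list the same equivalent positions");
        }
        double[] d = new double[set1.length];
        for (int i = 0; i < set1.length; i++) {
            d[i] = distance(set1[i], set2[i]);
        }
        return d;
    }

    // Overall RMSD = sqrt(sum of d_i^2 / N) over all N equivalent pairs.
    static double rmsd(double[][] set1, double[][] set2) {
        double[] d = perPairDeviations(set1, set2);
        if (d.length == 0) {
            throw new IllegalArgumentException("no equivalent positions");
        }
        double sumSq = 0.0;
        for (double di : d) {
            sumSq += di * di;
        }
        return Math.sqrt(sumSq / d.length);
    }

    public static void main(String[] args) {
        double[][] set1 = { {0, 0, 0}, {1, 0, 0} };
        double[][] set2 = { {0, 0, 0}, {1, 0, 2} };
        double[] d = perPairDeviations(set1, set2);
        for (int i = 0; i < d.length; i++) {
            System.out.printf("pair %d: %.3f%n", i, d[i]);
        }
        System.out.printf("RMSD: %.3f%n", rmsd(set1, set2));
    }
}
```

For the toy coordinates in main, the pair deviations are 0 and 2, giving an RMSD of sqrt(2), about 1.414. Note that the RMS value of a single pair is just its distance d_i, so a per-hit "RMS" in this sense is the per-pair deviation.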


2009/1/12 Michał Lorenc <m.t.lorenc at wp.pl>:
> Dear Andreas,
> I used the SVDSuperimposer class, but after Calc.rotate and Calc.shift I
> would like to know which Atom is close to which other Atom.
>
> SVDSuperimposer.getRMS(caAtoms1, caAtoms2) only gives me the RMS value
> for the whole protein structure, but how could I get an RMS value for
> each hit?
>
> I attached you my code. Thank you in advance!
>
> Best regards,
>
> Michal
>


From aumanga at biggjapan.com  Thu Jan 15 20:03:24 2009
From: aumanga at biggjapan.com (Ashika Umanga Umagiliya)
Date: Fri, 16 Jan 2009 10:03:24 +0900
Subject: [Biojava-l] [Off the Topic ] Bioinformatics for software engineers?
Message-ID: <496FDCDC.5010805@biggjapan.com>

Greetings all,

I come from a computer science background and at the moment I am working on
bioinformatics software. I really see the need to learn more about
bioinformatics, quickly :)
I hear (and use blindly) all these words - "sequence alignment", "epitopes",
"CDR", "homology modeling", "docking", "amino acids", etc. - and at the moment
I don't care much about them, since I've been told what should happen and I
implement it.
Where can I learn about these concepts easily, I mean for a guy coming
from a mathematical and IT background?

Best regards,
umanga


From holland at eaglegenomics.com  Fri Jan 16 05:50:35 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 16 Jan 2009 10:50:35 +0000
Subject: [Biojava-l] [Off the Topic ] Bioinformatics for software
	engineers?
In-Reply-To: <496FDCDC.5010805@biggjapan.com>
References: <496FDCDC.5010805@biggjapan.com>
Message-ID: <4970667B.9030601@eaglegenomics.com>

Your best bet is a good old fashioned book. ;)

A quick search on Amazon threw up this one which looks like a very
helpful intro to cell biology for people like you (and me!) who have
come to bioinformatics from a computer science background:

http://www.amazon.com/Bioinformatics-Genes-Proteins-Computers-Advanced/dp/1859960545/ref=sr_1_4?ie=UTF8&s=books&qid=1232102747&sr=1-4

Hopefully this is a good starting point. I'm sure everyone on this list
has their own favourite books which they could recommend to you as well.

cheers,
Richard



-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From markjschreiber at gmail.com  Fri Jan 16 07:27:23 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Fri, 16 Jan 2009 20:27:23 +0800
Subject: [Biojava-l] [Off the Topic ] Bioinformatics for software
	engineers?
In-Reply-To: <4970667B.9030601@eaglegenomics.com>
References: <496FDCDC.5010805@biggjapan.com>
	<4970667B.9030601@eaglegenomics.com>
Message-ID: <93b45ca50901160427l5941f82dy18b68f5000c32722@mail.gmail.com>

Wikipedia is always a good place to get a very rapid overview of some
unfamiliar biological term.

- Mark


From koen.bruynseels at cropdesign.com  Fri Jan 16 08:09:58 2009
From: koen.bruynseels at cropdesign.com (koen.bruynseels at cropdesign.com)
Date: Fri, 16 Jan 2009 14:09:58 +0100
Subject: [Biojava-l] Koen Bruynseels is out of the office.
Message-ID: <OFF96A90E2.94FBF795-ONC1257540.00485300-C1257540.004852FF@basf-c-s.be>


I will be out of the office starting 01/14/2009 and will not return until
01/25/2009.

I will respond to your message when I return.


From aumanga at biggjapan.com  Sun Jan 18 19:43:31 2009
From: aumanga at biggjapan.com (Ashika Umanga Umagiliya)
Date: Mon, 19 Jan 2009 09:43:31 +0900
Subject: [Biojava-l] [Off the Topic ] Bioinformatics for software
	engineers?
In-Reply-To: <4970667B.9030601@eaglegenomics.com>
References: <496FDCDC.5010805@biggjapan.com>
	<4970667B.9030601@eaglegenomics.com>
Message-ID: <4973CCB3.4000008@biggjapan.com>

Thanks, everyone, for the tips.
I've started reading "Bioinformatics For Dummies" to get the basics, and then I
hope to move on to the book Richard recommended.

Thank you again,
Best regards,
umanga




From marcel.huntemann at gmail.com  Tue Jan 20 21:42:14 2009
From: marcel.huntemann at gmail.com (Marcel Huntemann)
Date: Tue, 20 Jan 2009 18:42:14 -0800
Subject: [Biojava-l] How to get translated sequence out of blast result
Message-ID: <49768B86.20707@Gmail.com>

Hi!

I have a multi-FASTA file with a lot of nucleotide sequences in it. I
ran blastx with this file against a database. Now I want to parse the
BLAST result. To be more precise: I want to get the translated protein
query sequence with its start and stop positions for each hit.
I am using the example code from the BioJava cookbook
(http://biojava.org/wiki/BioJava:CookBook:Blast:Parser). The parsing works
fine so far, except for one problem. I am able to get the start and stop
positions for the query sequence via hit.getQueryStart() and
hit.getQueryEnd(). But I haven't figured out yet how to get the
translated protein query sequence out of the BLAST result. I couldn't find
anything like hit.getQuerySequence() or similar. I would guess that
something like that already exists somewhere, or am I wrong and do I have
to implement it myself?

Thanks,
Marcel
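One way to recover the translated query for a blastx hit, sketched under assumptions: re-translate the original nucleotide query between the hit's query coordinates using the standard genetic code (NCBI table 1, blastx's default query code). It assumes 1-based inclusive coordinates and that a minus-strand hit is reported with start > end, as BLAST's text output does for negative frames; whether hit.getQueryStart()/getQueryEnd() preserve that ordering should be checked. The class and method names are hypothetical, and alignment gaps are not reproduced, so for the aligned sequence itself the "Query:" lines of the alignment (or Hsp_qseq in XML output) remain the authoritative source.

```java
// Hypothetical helper, not a BioJava API: recovers the translated query segment of a
// blastx hit from the original nucleotide query and the hit's query coordinates.
class BlastxQueryTranslation {

    // NCBI translation table 1 (standard code); codons ordered T, C, A, G at each position.
    private static final String STANDARD_CODE =
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";
    private static final String BASES = "TCAG";

    // Reverse complement of a DNA sequence; anything other than A, C, G, T becomes N.
    static String reverseComplement(String dna) {
        StringBuilder sb = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            switch (Character.toUpperCase(dna.charAt(i))) {
                case 'A': sb.append('T'); break;
                case 'C': sb.append('G'); break;
                case 'G': sb.append('C'); break;
                case 'T': sb.append('A'); break;
                default: sb.append('N');
            }
        }
        return sb.toString();
    }

    // Translates complete codons; codons containing a non-ACGT base become 'X'.
    static String translate(String dna) {
        StringBuilder protein = new StringBuilder(dna.length() / 3);
        for (int i = 0; i + 3 <= dna.length(); i += 3) {
            int b1 = BASES.indexOf(Character.toUpperCase(dna.charAt(i)));
            int b2 = BASES.indexOf(Character.toUpperCase(dna.charAt(i + 1)));
            int b3 = BASES.indexOf(Character.toUpperCase(dna.charAt(i + 2)));
            protein.append(b1 < 0 || b2 < 0 || b3 < 0
                ? 'X' : STANDARD_CODE.charAt(16 * b1 + 4 * b2 + b3));
        }
        return protein.toString();
    }

    // queryStart/queryEnd: assumed 1-based, inclusive; start > end means minus strand.
    static String translatedQuery(String query, int queryStart, int queryEnd) {
        int lo = Math.min(queryStart, queryEnd);
        int hi = Math.max(queryStart, queryEnd);
        String segment = query.substring(lo - 1, hi);
        return translate(queryStart <= queryEnd ? segment : reverseComplement(segment));
    }

    public static void main(String[] args) {
        System.out.println(translatedQuery("ATGGCCTAA", 1, 9));
    }
}
```

Stop codons appear as '*', matching the convention BLAST uses in translated alignments.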

From markjschreiber at gmail.com  Wed Jan 21 21:30:54 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Thu, 22 Jan 2009 10:30:54 +0800
Subject: [Biojava-l] Off topic: JDK6 and JAX-WS 2.1
Message-ID: <93b45ca50901211830i5af3e213p9db9a6d10f42fa75@mail.gmail.com>

Sorry for the off-topic post, but this is something that has caused me to
lose quite a bit of hair recently. If you're planning on doing web service
development with JAX-WS, don't use JDK6 unless you use a version more recent
than update 3.
I'll spare you the gory details, but JDK6u4 and later versions use
JAX-WS 2.1, which removes the need to play with endorsed directories etc.,
something that is very tricky in IDEs and not uncomplicated with Ant.

- Mark
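A small sketch of the version check implied by this advice, assuming JDK 6 version strings of the form "1.6.0" or "1.6.0_NN" (optionally followed by a suffix such as "-rc"), as reported by the java.version system property. The class name is hypothetical, and the check only encodes the update-4 threshold stated above; it says nothing about other JDK versions.

```java
// Hypothetical helper: decides from a JDK 6 version string whether the runtime
// is update 4 or later, the threshold given in the message above.
class Jdk6JaxWsCheck {

    // True for "1.6.0_NN" with NN >= 4; false for "1.6.0" and updates 1 to 3.
    // Non-JDK-6 strings are rejected because this sketch only covers JDK 6.
    static boolean isJdk6Update4OrLater(String javaVersion) {
        if (!javaVersion.startsWith("1.6.0")) {
            throw new IllegalArgumentException("not a JDK 6 version string: " + javaVersion);
        }
        int underscore = javaVersion.indexOf('_');
        if (underscore < 0) {
            return false; // plain "1.6.0" is the initial release, with no update
        }
        // Keep only the digits of the update number, dropping suffixes like "-rc".
        String update = javaVersion.substring(underscore + 1).split("[^0-9]")[0];
        return Integer.parseInt(update) >= 4;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        if (version.startsWith("1.6.0")) {
            System.out.println(version + (isJdk6Update4OrLater(version)
                ? ": update 4 or later" : ": update 3 or earlier, consider upgrading"));
        } else {
            System.out.println(version + ": not JDK 6, this check does not apply");
        }
    }
}
```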

From marcel.huntemann at gmail.com  Thu Jan 22 19:48:57 2009
From: marcel.huntemann at gmail.com (Marcel Huntemann)
Date: Thu, 22 Jan 2009 16:48:57 -0800
Subject: [Biojava-l] Problem with blast file parser
Message-ID: <497913F9.70009@Gmail.com>

Hi!

I am experiencing a strange problem with the BLAST parser. I am using the
code from the BioJava CookBook
(http://biojava.org/wiki/BioJava:CookBook:Blast:Parser). The BLAST file
contains the results for 20 contigs. The problem is that the parser only gives
me the results for every other sequence. I get the results for contigs
#1, 3, 5, 7, 9 and 11, and then it continues with the even ones: 12, 14, 16, 18
and 20. Has anyone experienced the same problem, or does anyone know what
causes it?

Thanks,
Marcel


From charles at imbusch.net  Fri Jan 23 11:17:32 2009
From: charles at imbusch.net (Charles Imbusch)
Date: Fri, 23 Jan 2009 17:17:32 +0100
Subject: [Biojava-l] Problem with blast file parser
In-Reply-To: <497913F9.70009@Gmail.com>
References: <497913F9.70009@Gmail.com>
Message-ID: <4979ED9C.6040207@imbusch.net>

Hello Marcel,

I am also experiencing the problem that the parser skips
the even-numbered results. I have not found a satisfactory solution
for that, so I gave up on parsing a BLAST result file containing
multiple results. Instead, I split the big FASTA file into
several smaller ones, so that I get exactly one result per FASTA file.
That works, even if it's not the best solution.

Let me know if you find another solution for that problem.

Cheers,
 Charles
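Charles's workaround can be scripted. Below is a minimal JDK-only sketch that splits a multi-FASTA file into one file per record, so that each BLAST run sees a single query; it is a workaround for the skipped results, not a fix for the parser. It assumes every record starts with a '>' header line and ignores any text before the first header; the class name and the query_N.fasta naming scheme are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits multi-FASTA text into single-record chunks.
class FastaSplitter {

    // Each returned record is its '>' header line plus the following sequence lines.
    static List<String> splitRecords(String fasta) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null;
        for (String line : fasta.split("\\R")) {
            if (line.startsWith(">")) {
                if (current != null) {
                    records.add(current.toString());
                }
                current = new StringBuilder();
            }
            if (current != null) { // skip anything before the first header
                current.append(line).append('\n');
            }
        }
        if (current != null) {
            records.add(current.toString());
        }
        return records;
    }

    // Writes each record to its own file (query_1.fasta, query_2.fasta, ...) in outDir.
    static List<Path> writeRecords(String fasta, Path outDir) throws IOException {
        Files.createDirectories(outDir);
        List<Path> written = new ArrayList<>();
        List<String> records = splitRecords(fasta);
        for (int i = 0; i < records.size(); i++) {
            Path file = outDir.resolve("query_" + (i + 1) + ".fasta");
            Files.write(file, records.get(i).getBytes(StandardCharsets.UTF_8));
            written.add(file);
        }
        return written;
    }

    // Usage: java FastaSplitter input.fasta output-directory
    public static void main(String[] args) throws IOException {
        String fasta = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        for (Path p : writeRecords(fasta, Paths.get(args[1]))) {
            System.out.println(p);
        }
    }
}
```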


From markjschreiber at gmail.com  Fri Jan 23 21:20:36 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Sat, 24 Jan 2009 10:20:36 +0800
Subject: [Biojava-l] Problem with blast file parser
In-Reply-To: <4979ED9C.6040207@imbusch.net>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net>
Message-ID: <93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com>

Is this XML parsing or blast text output?
- Mark


From marcel.huntemann at gmail.com  Fri Jan 23 22:54:02 2009
From: marcel.huntemann at gmail.com (Marcel Huntemann)
Date: Fri, 23 Jan 2009 19:54:02 -0800
Subject: [Biojava-l] Problem with blast file parser
In-Reply-To: <93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net>
	<93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com>
Message-ID: <497A90DA.5070104@Gmail.com>

As I said, I am using the code from
http://biojava.org/wiki/BioJava:CookBook:Blast:Parser.
I have a normal text file that was created by BLAST. I thought that the
given code converts the input stream from the file into SAX events.
Do I have to do another step before I use the code from that example?

Cheers,
Marcel



From charles at imbusch.net  Sun Jan 25 06:54:53 2009
From: charles at imbusch.net (Charles Imbusch)
Date: Sun, 25 Jan 2009 12:54:53 +0100
Subject: [Biojava-l] Problem with blast file parser
In-Reply-To: <93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net>
	<93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com>
Message-ID: <497C530D.7090006@imbusch.net>

Hello Mark,

same here. I'm parsing plain text output.

Cheers,
  Charles

Mark Schreiber schrieb:
> Is this XML parsing or blast text output?
>
> - Mark

From markjschreiber at gmail.com  Sun Jan 25 21:45:25 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Mon, 26 Jan 2009 10:45:25 +0800
Subject: [Biojava-l] Problem with blast file parser
In-Reply-To: <497C530D.7090006@imbusch.net>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net>
	<93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com>
	<497C530D.7090006@imbusch.net>
Message-ID: <93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com>

Have you tried parsing the XML output? While the text parser does
'fake' XML by emitting SAX events, I think that the XML parser may be a
lot more stable (the XML output of BLAST is more stable, at least
recently anyway).

I know this isn't the best solution to your problem, but the default
text output of BLAST is not the most parseable. In fact, it is probably
the least machine-readable of all the BLAST outputs and definitely the
least stable.

- Mark
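To illustrate this suggestion, here is a minimal sketch that reads BLAST XML output (blastall -m 7) with the JDK's built-in DOM parser instead of BioJava's text parser. It assumes a single well-formed document in which each query is an Iteration element, and uses the NCBI element names Iteration_query-def, Hsp_query-from, Hsp_query-to, Hsp_query-frame and Hsp_qseq; the class name is hypothetical. For blastx, Hsp_qseq holds the translated query as aligned, gaps included, which also bears on the translated-query question asked earlier on this list.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: one tab-separated row per HSP of a BLAST XML report.
class BlastXmlSummary {

    // Row = query definition, query from, query to, query frame, aligned query sequence.
    static List<String> summarize(InputStream xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Do not try to download the NCBI DTD referenced by the DOCTYPE line.
        factory.setFeature(
            "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(xml);

        List<String> rows = new ArrayList<>();
        NodeList iterations = doc.getElementsByTagName("Iteration");
        for (int i = 0; i < iterations.getLength(); i++) {
            Element iteration = (Element) iterations.item(i);
            String queryDef = text(iteration, "Iteration_query-def");
            NodeList hsps = iteration.getElementsByTagName("Hsp");
            for (int j = 0; j < hsps.getLength(); j++) {
                Element hsp = (Element) hsps.item(j);
                rows.add(queryDef + "\t" + text(hsp, "Hsp_query-from")
                    + "\t" + text(hsp, "Hsp_query-to")
                    + "\t" + text(hsp, "Hsp_query-frame")
                    + "\t" + text(hsp, "Hsp_qseq"));
            }
        }
        return rows;
    }

    // Trimmed text of the first descendant element with the given tag, or "" if none.
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream(args[0])) {
            for (String row : summarize(in)) {
                System.out.println(row);
            }
        }
    }
}
```

DOM loads the whole report into memory; for very large outputs a streaming SAX or StAX parser would scale better, at the cost of more bookkeeping.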


From ahmed.elmasri at gmail.com  Mon Jan 26 01:35:20 2009
From: ahmed.elmasri at gmail.com (Hamed, Ahmed A.)
Date: Mon, 26 Jan 2009 01:35:20 -0500
Subject: [Biojava-l] Depreciated methods
Message-ID: <5cdd31570901252235o419eb0b1ydf511c59c1bf71b3@mail.gmail.com>

Hello list,
I am new to BioJava and I have been trying some of its examples. I came
across some deprecated methods and I am not sure whether they will be
removed entirely any time soon. I also found that some of the examples were
problematic and didn't run properly, even though I followed the
instructions stated in the comment section.
Please let me know if you have answers to my questions.
Best wishes,
Ahmed

-- 
Ahmed Abdeen Hamed
Scientific Informatics Project Leader
Marine Biological Laboratory Woods Hole, MA
--
Ph.D. student, Complex Systems
School of Informatics, Indiana University

From holland at eaglegenomics.com  Mon Jan 26 03:10:42 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 26 Jan 2009 08:10:42 +0000
Subject: [Biojava-l] Depreciated methods
In-Reply-To: <5cdd31570901252235o419eb0b1ydf511c59c1bf71b3@mail.gmail.com>
References: <5cdd31570901252235o419eb0b1ydf511c59c1bf71b3@mail.gmail.com>
Message-ID: <497D7002.3010101@eaglegenomics.com>

> I am new to BioJava and I have been trying some of its examples. I came
> across some of the depreciated methods and I am not sure if they will be
> removed entirely any time soon? 

Deprecated = may be removed without notice in any future release. I
couldn't say for any individual method, but in general it's a bad idea
to use anything that is marked deprecated when writing new code.

> I also found some of the examples
> problematic and didn't run properly even though I am following the
> instructions stated in the comment section.

Could you specify exactly which examples didn't work, and the exact
problems you had with them?

thanks,
Richard

> Please let me know if you have answers for my questions.
> Best wishes,
> Ahmed
> 

-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From charles at imbusch.net  Mon Jan 26 06:04:09 2009
From: charles at imbusch.net (Charles Imbusch)
Date: Mon, 26 Jan 2009 12:04:09 +0100
Subject: [Biojava-l] Problem with blast file parser
In-Reply-To: <93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net>	
	<93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com>	
	<497C530D.7090006@imbusch.net>
	<93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com>
Message-ID: <497D98A9.6010904@imbusch.net>

Hello Mark,

No, I haven't tried to parse XML output from BLAST,
simply because plain text output can be
viewed with any editor, which is still very convenient.

But I'm curious: is there actually an easy-to-use program for viewing
XML output?
Another option would be to generate XML and plain text
output from BLAST at the same time (in one run), but I couldn't find a
way to do so. Maybe I missed something?

Cheers,
  Charles



From holland at eaglegenomics.com  Mon Jan 26 06:21:26 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 26 Jan 2009 11:21:26 +0000
Subject: [Biojava-l] Problem with blast file parser
In-Reply-To: <497D98A9.6010904@imbusch.net>
References: <497913F9.70009@Gmail.com>
	<4979ED9C.6040207@imbusch.net>		<93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com>		<497C530D.7090006@imbusch.net>	<93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com>
	<497D98A9.6010904@imbusch.net>
Message-ID: <497D9CB6.4010403@eaglegenomics.com>

This app looks promising for viewing BLAST output, though I haven't
tried it:

http://www.korilog.com/index.php/BlastViewer.html

Otherwise, no, there's no way of making BLAST output more than one
format at once. It's either text or XML, but it won't do both.

cheers,
Richard


-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From ahmed.elmasri at gmail.com  Mon Jan 26 12:24:20 2009
From: ahmed.elmasri at gmail.com (Hamed, Ahmed A.)
Date: Mon, 26 Jan 2009 12:24:20 -0500
Subject: [Biojava-l] Depreciated methods
In-Reply-To: <497D7002.3010101@eaglegenomics.com>
References: <5cdd31570901252235o419eb0b1ydf511c59c1bf71b3@mail.gmail.com>
	<497D7002.3010101@eaglegenomics.com>
Message-ID: <5cdd31570901260924i24c5c9a1meb995929f6502482@mail.gmail.com>

Thanks Richard!

For the deprecated methods, is there a reference or a wiki page that maps the
deprecated methods to the ones that should be used instead?

As for the examples I am having trouble with: WriteToFasta is one. Here is
the error that I am getting:
java.lang.IllegalArgumentException: No alphabet was set in the identifier
    at org.biojava.bio.seq.io.SeqIOTools.fileToBiojava(SeqIOTools.java:928)
    at examples.WriteToFasta.main(WriteToFasta.java:43)

And here is my parameter value:
      //get the int constant for the file type
      int fileType = Integer.parseInt("2");

I would appreciate any help.
Best wishes,
Ahmed




-- 
Ahmed Abdeen Hamed
Scientific Informatics Project Leader
Marine Biological Laboratory Woods Hole, MA
--
Ph.D. student, Complex Systems
School of Informatics, Indiana University

From holland at eaglegenomics.com  Mon Jan 26 12:30:00 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 26 Jan 2009 17:30:00 +0000
Subject: [Biojava-l] Depreciated methods
In-Reply-To: <5cdd31570901260924i24c5c9a1meb995929f6502482@mail.gmail.com>
References: <5cdd31570901252235o419eb0b1ydf511c59c1bf71b3@mail.gmail.com>	
	<497D7002.3010101@eaglegenomics.com>
	<5cdd31570901260924i24c5c9a1meb995929f6502482@mail.gmail.com>
Message-ID: <497DF318.8070703@eaglegenomics.com>

Most methods include a note saying which method should be used instead.
For those that don't, take a look in the org.biojavax packages to see if
there are suitable alternative classes.

In the case of the deprecated SeqIOTools.fileToBiojava, a much better
version of the FASTA parser/writer exists in the org.biojavax packages.
Instructions on how to use it are here:

http://www.biojava.org/wiki/BioJava:BioJavaXDocs#Example

cheers,
Richard

Hamed, Ahmed A. wrote:
> Thanks Richard!
> 
> For the deprecated methods, is there a reference or a wiki that maps the
> deprecated ones to ones that should be used instead?
> 
> As for the examples I am having trouble with: WriteToFasta is one. Here
> is the error that I am getting:
> java.lang.IllegalArgumentException: No alphabet was set in the identifier
>     at org.biojava.bio.seq.io.SeqIOTools.fileToBiojava(SeqIOTools.java:928)
>     at examples.WriteToFasta.main(WriteToFasta.java:43)
> 
> And here is my parameter value:
>       //get the int constant for the file type
>       int fileType = Integer.parseInt("2");
> 
> I would appreciate any help.
> Best wishes,
> Ahmed
> 
> 
> 
> 
> On Mon, Jan 26, 2009 at 3:10 AM, Richard Holland
> <holland at eaglegenomics.com> wrote:
> 
>     > I am new to BioJava and I have been trying some of its examples. I came
>     > across some of the deprecated methods and I am not sure whether they
>     > will be removed entirely any time soon.
> 
>     Deprecated = may be removed without notice in any future release. I
>     couldn't say for any individual method, but in general it's a bad idea
>     to use anything that is marked deprecated when writing new code.
> 
>     > I also found some of the examples
>     > problematic and didn't run properly even though I am following the
>     > instructions stated in the comment section.
> 
>     Could you specify exactly which examples didn't work, and the exact
>     problems you had with them?
> 
>     thanks,
>     Richard
> 
>     > Please let me know if you have answers for my questions.
>     > Best wishes,
>     > Ahmed
>     >

-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From markjschreiber at gmail.com  Mon Jan 26 19:55:32 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 27 Jan 2009 08:55:32 +0800
Subject: [Biojava-l] Problem with blast file parser
In-Reply-To: <497D98A9.6010904@imbusch.net>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net>
	<93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com>
	<497C530D.7090006@imbusch.net>
	<93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com>
	<497D98A9.6010904@imbusch.net>
Message-ID: <93b45ca50901261655n727219cdnd1ee27bf3b0d31e6@mail.gmail.com>

You can generate plain text from XML using an XSLT stylesheet. The result
probably won't be identical to the BLAST text output, but that format is
not very stable anyway.
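
Here is a minimal sketch of the XSLT route that uses only the JDK's javax.xml.transform API. The stylesheet is a hypothetical example and does not reproduce BLAST's text report. It prints one tab-separated line per Hit, giving the Hit_id, the Hit_def and the first HSP's Hsp_evalue. Those element names come from NCBI's BLAST XML output. Real BLAST XML starts with a DOCTYPE that names NCBI_BlastOutput.dtd, so depending on your setup the transformer may try to load that file.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical example: turn BLAST XML into a tab-separated text summary via XSLT. */
public class BlastXmlToText {

    // Illustrative stylesheet: one line per Hit (id, description, first HSP e-value)
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select='//Hit'>"
        + "<xsl:value-of select='Hit_id'/><xsl:text>&#9;</xsl:text>"
        + "<xsl:value-of select='Hit_def'/><xsl:text>&#9;</xsl:text>"
        + "<xsl:value-of select='Hit_hsps/Hsp[1]/Hsp_evalue'/><xsl:text>&#10;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String toText(String blastXml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(blastXml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.print(toText(xml));
    }
}
```

The same stylesheet could also be run outside Java with any XSLT 1.0 processor, so the XML file stays the single source of truth and text is produced on demand.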


On Mon, Jan 26, 2009 at 7:04 PM, Charles Imbusch <charles at imbusch.net> wrote:
> Hello Mark,
>
> No, I haven't tried parsing the XML output from BLAST,
> simply because plain text output can be viewed with any
> editor, which is still very convenient.
>
> But I'm curious: is there an easy-to-use program for viewing
> XML output?
> Another option would be to generate XML and plain text
> output from BLAST at the same time (in one run), but I couldn't find a
> way to do so. Maybe I missed something?
>
> Cheers,
>  Charles
>
> Mark Schreiber schrieb:
>>
>> Have you tried parsing the XML output? While the text parser does
>> 'fake' XML by emitting SAX events I think that the XML parser may be a
>> lot more stable (the XML output of blast is more stable, at least
>> recently anyway).
>>
>> I know this isn't the best solution to your problem but the default
>> text output of BLAST is not the most parseable. In fact it is probably
>> the least machine readable of all the blast outputs and definitely the
>> least stable.
>>
>> - Mark
>>
>
>
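
The JDK's DOM parser also keeps it short to parse the XML output directly, as suggested above. This sketch is not BioJava's own BLAST XML parser. The BlastXmlHits class and its Hit holder are hypothetical. The element names (Hit, Hit_id, Hit_def, Hsp_evalue) are the ones in NCBI's BLAST XML. The load-external-dtd feature is switched off so the parser does not try to fetch the DTD named in the DOCTYPE. DOM loads the whole document into memory, which is fine for typical reports, but very large ones may need a streaming (SAX or StAX) parser instead.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical example: extract hit ids, descriptions and best e-values from BLAST XML. */
public class BlastXmlHits {

    static final class Hit {
        final String id;
        final String def;
        final double bestEvalue; // smallest Hsp_evalue among this hit's HSPs

        Hit(String id, String def, double bestEvalue) {
            this.id = id;
            this.def = def;
            this.bestEvalue = bestEvalue;
        }
    }

    static List<Hit> parse(InputStream xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // Don't try to load NCBI_BlastOutput.dtd referenced by the DOCTYPE
        dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        Document doc = dbf.newDocumentBuilder().parse(xml);
        List<Hit> hits = new ArrayList<>();
        NodeList hitNodes = doc.getElementsByTagName("Hit");
        for (int i = 0; i < hitNodes.getLength(); i++) {
            Element hit = (Element) hitNodes.item(i);
            double best = Double.POSITIVE_INFINITY;
            NodeList evalues = hit.getElementsByTagName("Hsp_evalue");
            for (int j = 0; j < evalues.getLength(); j++) {
                best = Math.min(best, Double.parseDouble(evalues.item(j).getTextContent().trim()));
            }
            hits.add(new Hit(firstText(hit, "Hit_id"), firstText(hit, "Hit_def"), best));
        }
        return hits;
    }

    // Text of the first descendant with the given tag, or "" if absent
    private static String firstText(Element parent, String tag) {
        NodeList nl = parent.getElementsByTagName(tag);
        return nl.getLength() == 0 ? "" : nl.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream(args[0])) {
            for (Hit h : parse(in)) {
                System.out.println(h.id + "\t" + h.def + "\t" + h.bestEvalue);
            }
        }
    }
}
```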

From nir at rosettadesigngroup.com  Tue Jan 27 08:08:34 2009
From: nir at rosettadesigngroup.com (Nir London)
Date: Tue, 27 Jan 2009 15:08:34 +0200
Subject: [Biojava-l] Rosetta Academic Training Workshop
Message-ID: <2F3C5E9F-418B-489F-A852-3A99248D03AC@rosettadesigngroup.com>

Due to popular demand, "Rosetta Design Group" is organizing a "Rosetta"
software training workshop aimed at academic groups. The workshop will
take the form of a "webinar" (a web seminar), enabling more groups to
attend while avoiding the annoying jet lag and accommodation troubles.
Would you be interested in participating? If so, please fill in the form
at http://rosettadesigngroup.com/blog/rosetta-academic-workshop/ and we
will contact you when the details are finalized.*

Nir London | Rosetta Design Group
http://rosettadesigngroup.com/

* If you're not from an academic group, don't worry, write to us anyway.

From gwu at molbio.mgh.harvard.edu  Wed Jan 28 23:51:28 2009
From: gwu at molbio.mgh.harvard.edu (gang wu)
Date: Wed, 28 Jan 2009 23:51:28 -0500
Subject: [Biojava-l] Genbank file parser error
In-Reply-To: <497D9CB6.4010403@eaglegenomics.com>
References: <497913F9.70009@Gmail.com>	<4979ED9C.6040207@imbusch.net>		<93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com>		<497C530D.7090006@imbusch.net>	<93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com>	<497D98A9.6010904@imbusch.net>
	<497D9CB6.4010403@eaglegenomics.com>
Message-ID: <498135D0.9060103@molbio.mgh.harvard.edu>

Hi Everyone,

I have a piece of code that parses a GenBank file and retrieves gene
sequences and related information. It works well with sequences from
Arabidopsis thaliana, C. elegans and Bos taurus, but it fails on Mus
musculus chromosome 2. The contig it fails on is the largest one in my
test: contig NT_039207 has 116366104 bp, but the code reports it cut to
100000020 bp, which puts some gene coordinates out of range. The code is
below. Can anyone give some suggestions?

The Mus musculus GenBank file can be downloaded from:
ftp://ftp.ncbi.nih.gov/genomes/M_musculus/CHR_02/mm_alt_chr2.gbk.gz

Thanks in advance

Gang
==========================================
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Iterator;
import java.util.NoSuchElementException;

import org.biojava.bio.Annotation;
import org.biojava.bio.BioException;
import org.biojavax.Namespace;
import org.biojavax.RichObjectFactory;
import org.biojavax.bio.seq.RichFeature;
import org.biojavax.bio.seq.RichSequence;
import org.biojavax.bio.seq.RichSequenceIterator;

// Prints record information and every "gene" feature for each contig
// in a GenBank file, using the BioJavaX (org.biojavax) parser.
public class TestMus {
    public void testMusChr2() throws IOException, NoSuchElementException, BioException {
        String fp = "/tmp/mm_alt_chr2.gbk";
        System.out.println("File: " + fp);
        BufferedReader gReader = new BufferedReader(
                new InputStreamReader(new FileInputStream(new File(fp))));
        Namespace ns = (Namespace) RichObjectFactory.getDefaultNamespace();
        RichSequenceIterator seqI = RichSequence.IOTools.readGenbankDNA(gReader, ns);
        while (seqI.hasNext()) {
            RichSequence seq = seqI.nextRichSequence();
            String organism = seq.getTaxon().getDisplayName();
            String accession = seq.getAccession();
            String identifier = seq.getIdentifier();
            int taxonID = seq.getTaxon().getNCBITaxID();
            String division = seq.getDivision();
            String seqVersion = "" + seq.getSeqVersion();
            int seqLength = seq.length();
            String description = seq.getDescription();
            System.out.println("Organism: " + organism
                    + "\nAccession: " + accession
                    + "\nIdentifier: " + identifier
                    + "\nTaxonID: " + taxonID
                    + "\nDivision: " + division
                    + "\nSeqVersion: " + seqVersion
                    + "\nLength: " + seqLength);
            System.out.println("2041-2101: " + seq.subStr(2041, 2101));
            for (Iterator i = seq.features(); i.hasNext();) {
                RichFeature f = (RichFeature) i.next();
                int rank = f.getRank();
                String fType = f.getType();
                if (fType.equalsIgnoreCase("gene")) {
                    // Feature coordinates are 1-based and inclusive
                    int startPos = f.getLocation().getMin();
                    int endPos = f.getLocation().getMax();
                    int geneLen = endPos - startPos + 1;
                    // Throws if endPos > seq.length(), e.g. when the record was read truncated
                    String sequence = seq.subStr(startPos, endPos);
                    String strand = f.getStrand().getToken() + "";
                    Annotation ann = (Annotation) f.getAnnotation();
                    // Prefer /locus_tag, fall back to /gene; no exception if neither is present
                    String geneIdentifier = "";
                    if (ann.containsProperty("locus_tag")) {
                        geneIdentifier = ann.getProperty("locus_tag") + "";
                    } else if (ann.containsProperty("gene")) {
                        geneIdentifier = ann.getProperty("gene") + "";
                    }
                    String alternativeIdentifiers = "";
                    if (ann.containsProperty("gene")) {
                        alternativeIdentifiers = ann.getProperty("gene") + "";
                    }
                    System.out.println(rank + "\t" + geneIdentifier + "\t" + alternativeIdentifiers
                            + "\t" + startPos + "\t" + endPos + "\t" + geneLen + "\t" + strand);
                }
            }
        }
        gReader.close();
    }

    public static void main(String[] args) throws Exception {
        TestMus tm = new TestMus();
        tm.testMusChr2();
    }
}

From markjschreiber at gmail.com  Thu Jan 29 00:43:35 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Thu, 29 Jan 2009 13:43:35 +0800
Subject: [Biojava-l] Genbank file parser error
In-Reply-To: <498135D0.9060103@molbio.mgh.harvard.edu>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net>
	<93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com>
	<497C530D.7090006@imbusch.net>
	<93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com>
	<497D98A9.6010904@imbusch.net> <497D9CB6.4010403@eaglegenomics.com>
	<498135D0.9060103@molbio.mgh.harvard.edu>
Message-ID: <93b45ca50901282143q54d69018q55ef005cfb8e38ca@mail.gmail.com>

I assume that the downloaded file has the complete sequence in it? Probably
worth checking that it has the complete sequence block (all 116366104 bp).

- Mark
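
You can automate this check. Below is a minimal sketch in plain Java (no BioJava). For each record, it compares the length on the LOCUS line with the number of residues between ORIGIN and //. It assumes the LOCUS name and length are separated by whitespace, which holds for NT_039207 but is not guaranteed for every record. The class and method names are hypothetical. The sketch counts letters instead of relying on the position-number column. The reason: 100000020 = 60 × 1,666,667, so the read stopped just before the ORIGIN line numbered 100000021, which is the first line whose nine-digit position fills the whole number column. That fits a parser that expects leading whitespace on sequence lines. This is only an observation, not a diagnosis of the actual BioJava fix.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical checker: declared LOCUS length vs. residues actually in ORIGIN. */
public class GenbankLengthCheck {

    static final class Result {
        final String locus;
        final long declared; // length from the LOCUS line
        final long counted;  // letters found between ORIGIN and //

        Result(String locus, long declared, long counted) {
            this.locus = locus;
            this.declared = declared;
            this.counted = counted;
        }
    }

    static List<Result> check(BufferedReader in) throws IOException {
        List<Result> results = new ArrayList<>();
        String locus = null;
        long declared = -1;
        long counted = 0;
        boolean inOrigin = false;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.startsWith("LOCUS")) {
                // Assumes whitespace-separated fields: LOCUS <name> <length> bp ...
                String[] fields = line.trim().split("\\s+");
                locus = fields[1];
                declared = Long.parseLong(fields[2]);
                counted = 0;
                inOrigin = false;
            } else if (line.startsWith("ORIGIN")) {
                inOrigin = true;
            } else if (line.startsWith("//")) {
                results.add(new Result(locus, declared, counted));
                inOrigin = false;
            } else if (inOrigin) {
                // Count letters only, so the position number is ignored whether
                // or not it is preceded by whitespace
                for (int i = 0; i < line.length(); i++) {
                    if (Character.isLetter(line.charAt(i))) {
                        counted++;
                    }
                }
            }
        }
        return results;
    }

    public static void main(String[] args) throws IOException {
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
            for (Result r : check(in)) {
                System.out.println(r.locus + "\tdeclared=" + r.declared + "\tcounted=" + r.counted
                        + (r.declared == r.counted ? "" : "\tMISMATCH"));
            }
        }
    }
}
```

Running it on the uncompressed mm_alt_chr2.gbk prints one line per record and flags any record whose ORIGIN block is shorter or longer than its LOCUS line claims.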

On Thu, Jan 29, 2009 at 12:51 PM, gang wu <gwu at molbio.mgh.harvard.edu> wrote:

> Hi Everyone,
>
> I have a piece of code to parse Genbank file and retrieve gene sequence and
> related information. It works well with sequences such as Arabidopsis
> thaliana, C. elegans, Bos taurus. But it failed with Mus musculus chromosome
> 2. The contig that the code failed on is the largest one in my test. Contig
> NT_039207 has 116366104 bp, but the code shows it's cut to 100000020 bp.
> That causes some gene coordinates out of range. Attached is the code. Can
> anyone give some suggestions?
>
> The Mus musculus Genbank file can be downloaded at :
> ftp://ftp.ncbi.nih.gov/genomes/M_musculus/CHR_02/mm_alt_chr2.gbk.gz
>
> Thanks in advance
>
> Gang
>

From holland at eaglegenomics.com  Thu Jan 29 02:25:10 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Thu, 29 Jan 2009 07:25:10 +0000
Subject: [Biojava-l] Genbank file parser error
In-Reply-To: <93b45ca50901282143q54d69018q55ef005cfb8e38ca@mail.gmail.com>
References: <497913F9.70009@Gmail.com>
	<4979ED9C.6040207@imbusch.net>	<93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com>	<497C530D.7090006@imbusch.net>	<93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com>	<497D98A9.6010904@imbusch.net>
	<497D9CB6.4010403@eaglegenomics.com>	<498135D0.9060103@molbio.mgh.harvard.edu>
	<93b45ca50901282143q54d69018q55ef005cfb8e38ca@mail.gmail.com>
Message-ID: <498159D6.8010906@eaglegenomics.com>

Gabrielle Doan posted a solution to this a while back and I believe the
changes have been committed already:

http://www.mail-archive.com/biojava-l at lists.open-bio.org/msg01036.html

How old is the copy of BioJava that you're using? Have you tried
checking out the trunk from Subversion to see if that works?

cheers,
Richard

Mark Schreiber wrote:
> I assume that the downloaded file has the complete sequence in it? Probably
> worth checking that it has the complete sequence block (all 116366104 bp).
> 
> - Mark
> 
> On Thu, Jan 29, 2009 at 12:51 PM, gang wu <gwu at molbio.mgh.harvard.edu> wrote:
> 
>> Hi Everyone,
>>
>> I have a piece of code to parse Genbank file and retrieve gene sequence and
>> related information. It works well with sequences such as Arabidopsis
>> thaliana, C. elegans, Bos taurus. But it failed with Mus musculus chromosome
>> 2. The contig that the code failed on is the largest one in my test. Contig
>> NT_039207 has 116366104 bp, but the code shows it's cut to 100000020 bp.
>> That causes some gene coordinates out of range. Attached is the code. Can
>> anyone give some suggestions?
>>
>> The Mus musculus Genbank file can be downloaded at :
>> ftp://ftp.ncbi.nih.gov/genomes/M_musculus/CHR_02/mm_alt_chr2.gbk.gz
>>
>> Thanks in advance
>>
>> Gang
> 

-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From jw12 at sanger.ac.uk  Thu Jan 29 06:20:47 2009
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Thu, 29 Jan 2009 11:20:47 +0000
Subject: [Biojava-l] closing this week: Registrations for DAS workshop.
Message-ID: <207B56E3-C65C-41A4-800E-AF0B9F158CA6@sanger.ac.uk>

DAS is currently being used to share annotations on genomes, protein
alignments, structural and interaction information.

If you are interested in sharing biological information, the DAS workshop
below may be of interest to you.

Registration is open for the 2009 DAS workshop (8-10 March) at the
Genome Campus, Hinxton UK. If you are interested in attending, please
find out more at http://www.dasregistry.org/course.jsp and
register via the web link at the bottom of the page. The workshop
caters for everyone from novice to expert DAS users, as each day is optional.

Closing date for registration is 1st Feb 2009. If you register now you
can change the details of your registration any time up until this
closing date. Please register early as places will be limited.


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

From holland at eaglegenomics.com  Thu Jan 29 11:10:42 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Thu, 29 Jan 2009 16:10:42 +0000
Subject: [Biojava-l] Eagle Genomics is hiring
Message-ID: <4981D502.1000905@eaglegenomics.com>

Hi all,

Apologies if this is inappropriate for the list, but I thought it would
be a good way to reach the kind of people we're looking for.

Richard

=====

Senior Bioinformatics Software Developer
Eagle Genomics Ltd., Cambridge, UK
http://www.eaglegenomics.com/

We are a young and exciting bioinformatics company looking to
revolutionise the way in which industry and academia work together. We
are based at the heart of Europe's largest biotech cluster in Cambridge,
UK. As we expand our client base, we're looking to build a talented and
committed team of experts. We are currently looking for a software
developer to work on a wide range of complex projects, and who is happy
to work face-to-face with our customers. Ideally you will have
substantial prior experience working in a life science company or
research institute; however, we will also consider graduates with a track
record in bioinformatics.

In addition to your superb technical skills, you will also:
* have the ability to quickly translate scientific problems into real
software solutions,
* be able to put technical concepts into simple language for end users
to understand,
* be able to pick up new skills and techniques in record time,
* work well in a collaborative team environment,
* be creative, innovative, and forward-thinking.

You will have hands-on experience in some of the following:
* Java,
* Perl,
* SQL query design,
* Relational database schema design,
* Open-source bioinformatics toolkits such as BioJava, BioPerl, BioSQL,
etc.,
* Ensembl,
* BioMart,
* DAS,
* Taverna,
* Oracle Life Sciences Platform,
* Oracle database administration,
* MySQL database administration,
* VMware virtual machines,
* Grid computing and parallelisation.

The preferred candidate will be able to work from our offices in
Cambridge, but we would also consider telecommuting arrangements.

We offer a competitive salary and a range of company benefits.

To apply, please send your CV and cover letter as PDF documents to
jobs at eaglegenomics.com. If you have any questions about the position or
would like to discuss it further before applying, please use the same
email address. We are only able to offer positions to EEA citizens and
permanent residents, or Tier 1 migrants under the new UK points-based
immigration scheme.

Individual contracting arrangements could be considered, but we will
prefer candidates who can work with us as employees. No agencies,
please.

-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From gwu at molbio.mgh.harvard.edu  Thu Jan 29 13:40:06 2009
From: gwu at molbio.mgh.harvard.edu (gwu)
Date: Thu, 29 Jan 2009 13:40:06 -0500
Subject: [Biojava-l] Genbank file parser error
In-Reply-To: <93b45ca50901282143q54d69018q55ef005cfb8e38ca@mail.gmail.com>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net>	
	<93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com>	
	<497C530D.7090006@imbusch.net>	
	<93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com>	
	<497D98A9.6010904@imbusch.net>
	<497D9CB6.4010403@eaglegenomics.com>	
	<498135D0.9060103@molbio.mgh.harvard.edu>
	<93b45ca50901282143q54d69018q55ef005cfb8e38ca@mail.gmail.com>
Message-ID: <4981F806.7070100@molbio.mgh.harvard.edu>

Thanks Mark. I did parse out the sequence block with sed, and its length
agrees with what the GenBank record says.

Gang

Mark Schreiber wrote:
> I assume that the downloaded file has the complete sequence in it? 
> Probably worth checking that it has the complete sequence block (all 
> 116366104 bp).
>  
> - Mark
>
> On Thu, Jan 29, 2009 at 12:51 PM, gang wu <gwu at molbio.mgh.harvard.edu> wrote:
>
>     Hi Everyone,
>
>     I have a piece of code to parse Genbank file and retrieve gene
>     sequence and related information. It works well with sequences
>     such as Arabidopsis thaliana, C. elegans, Bos taurus. But it
>     failed with Mus musculus chromosome 2. The contig that the code
>     failed on is the largest one in my test. Contig NT_039207 has
>     116366104 bp, but the code shows it's cut to 100000020 bp. That
>     causes some gene coordinates out of range. Attached is the code.
>     Can anyone give some suggestions?
>
>     The Mus musculus Genbank file can be downloaded at :
>     ftp://ftp.ncbi.nih.gov/genomes/M_musculus/CHR_02/mm_alt_chr2.gbk.gz
>
>     Thanks in advance
>
>     Gang
>
>


From gwu at molbio.mgh.harvard.edu  Thu Jan 29 14:28:42 2009
From: gwu at molbio.mgh.harvard.edu (gang wu)
Date: Thu, 29 Jan 2009 14:28:42 -0500
Subject: [Biojava-l] Genbank file parser error
In-Reply-To: <498159D6.8010906@eaglegenomics.com>
References: <497913F9.70009@Gmail.com>
	<4979ED9C.6040207@imbusch.net>	<93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com>	<497C530D.7090006@imbusch.net>	<93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com>	<497D98A9.6010904@imbusch.net>
	<497D9CB6.4010403@eaglegenomics.com>	<498135D0.9060103@molbio.mgh.harvard.edu>
	<93b45ca50901282143q54d69018q55ef005cfb8e38ca@mail.gmail.com>
	<498159D6.8010906@eaglegenomics.com>
Message-ID: <4982036A.7070302@molbio.mgh.harvard.edu>

Thanks Richard. That is exactly the same issue, and the latest Subversion
trunk fixes the problem.

Thanks again for the quick response.

Gang

Richard Holland wrote:
> Gabrielle Doan posted a solution to this a while back and I believe the
> changes have been committed already:
>
> http://www.mail-archive.com/biojava-l at lists.open-bio.org/msg01036.html
>
> How old is the copy of BioJava that you're using? Have you tried
> checking out the trunk from Subversion to see if that works?
>
> cheers,
> Richard
>
> Mark Schreiber wrote:
>   
>> I assume that the downloaded file has the complete sequence in it? Probably
>> worth checking that it has the complete sequence block (all 116366104 bp).
>>
>> - Mark
>>
>> On Thu, Jan 29, 2009 at 12:51 PM, gang wu <gwu at molbio.mgh.harvard.edu> wrote:
>>
>>     
>>> Hi Everyone,
>>>
>>> I have a piece of code to parse Genbank file and retrieve gene sequence and
>>> related information. It works well with sequences such as Arabidopsis
>>> thaliana, C. elegans, Bos taurus. But it failed with Mus musculus chromosome
>>> 2. The contig that the code failed on is the largest one in my test. Contig
>>> NT_039207 has 116366104 bp, but the code shows it's cut to 100000020 bp.
>>> That causes some gene coordinates out of range. Attached is the code. Can
>>> anyone give some suggestions?
>>>
>>> The Mus musculus Genbank file can be downloaded at :
>>> ftp://ftp.ncbi.nih.gov/genomes/M_musculus/CHR_02/mm_alt_chr2.gbk.gz
>>>
>>> Thanks in advance
>>>
>>> Gang
>>> ==========================================
>>> public class TestMus {
>>>   public void testMusChr2() throws FileNotFoundException,
>>> NoSuchElementException, BioException {
>>>       String fp="/tmp/mm_alt_chr2.gbk";
>>>       System.out.println("File: " + fp);
>>>       BufferedReader gReader = new BufferedReader(new InputStreamReader(new
>>> FileInputStream(new File(fp))));
>>>       Namespace ns = (Namespace) RichObjectFactory.getDefaultNamespace();
>>>       RichSequenceIterator seqI =
>>> RichSequence.IOTools.readGenbankDNA(gReader, ns);
>>>       while (seqI.hasNext()) {
>>>           RichSequence seq = seqI.nextRichSequence();
>>>           String organism = seq.getTaxon().getDisplayName();
>>>           String accession = seq.getAccession();
>>>           String identifier = seq.getIdentifier();
>>>           int taxonID = seq.getTaxon().getNCBITaxID();
>>>           String division = seq.getDivision();
>>>           String seqVersion = "" + seq.getSeqVersion();
>>>           int seqLength = seq.length();
>>>           String description = seq.getDescription();
>>>           System.out.println("Organism: " + organism
>>>                   + "\nAccession: " + accession
>>>                   + "\nIdentifier: " + identifier
>>>                   + "\nTaxonID: " + taxonID
>>>                   + "\nDivision: " + division
>>>                   + "\nSeqVersion: " + seqVersion
>>>                   + "\nLength: " + seqLength);
>>>           System.out.println("2041-2101: " + seq.subStr(2041, 2101));
>>>           for (Iterator i = seq.features(); i.hasNext();) {
>>>               RichFeature f = (RichFeature) i.next();
>>>               int rank = f.getRank();
>>>               String fType = f.getType();
>>>               if (fType.toLowerCase().equals("gene")) {
>>>                   int startPos=f.getLocation().getMin();
>>>                   int endPos=f.getLocation().getMax();
>>>                   int geneLen=endPos-startPos+1;
>>>                   String sequence=seq.subStr(startPos, endPos);
>>>                   String strand = f.getStrand().getToken() + "";
>>>                   Annotation ann = (Annotation) f.getAnnotation();
>>>                   String geneIdentifier ="";
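>>>                   if (ann.containsProperty("locus_tag")) {
>>>                       geneIdentifier=ann.getProperty("locus_tag") + "";
>>>                   } else if (ann.containsProperty("gene")) {
>>>                       // getProperty throws NoSuchElementException if the key is absent
>>>                       geneIdentifier=ann.getProperty("gene") + "";
>>>                   }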
>>>
>>>                   String alternativeIdentifiers="";
>>>                   try {
>>>                       alternativeIdentifiers =
>>>                               (String) ann.getProperty("gene");
>>>                   } catch(NoSuchElementException e) {
>>>                       // no "gene" qualifier on this feature; keep ""
>>>                   }
>>>                   String annotation="";
>>>                   System.out.println(rank + "\t" + geneIdentifier + "\t"
>>>                           + alternativeIdentifiers + "\t"
>>>                           + startPos + "\t" + endPos + "\t" + geneLen
>>>                           + "\t" + strand);
>>>               }
>>>           }
>>>       }
>>>   }
>>>   public static void main(String [] args) throws Exception {
>>>      TestMus tm=new TestMus();
>>>       tm.testMusChr2();
>>>   }
>>> }
>>>
>>>       
>>
>>     
>
>   
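A minimal sketch of the check Mark suggests, using only the JDK (no BioJava): it streams a GenBank flat file and, for each record, compares the length declared on the LOCUS line with the number of residues actually present in the ORIGIN block. The class name is hypothetical. It assumes the common LOCUS layout in which the length is the token immediately before "bp" (or "aa"); records that give a CONTIG line instead of an ORIGIN block will show 0 counted residues.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: compares declared and actual sequence length per GenBank record. */
class GenbankLengthCheck {

    /** Returns one line per record: "name declared counted OK|MISMATCH". */
    static List<String> check(BufferedReader in) {
        List<String> report = new ArrayList<>();
        String name = "?";
        long declared = -1;
        long counted = 0;
        boolean inOrigin = false;
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith("LOCUS")) {
                    String[] tok = line.trim().split("\\s+");
                    name = tok.length > 1 ? tok[1] : "?";
                    declared = -1;
                    counted = 0;
                    inOrigin = false;
                    // Assumption: the length is the token just before the "bp"/"aa" unit.
                    for (int i = 2; i < tok.length; i++) {
                        if (tok[i].equals("bp") || tok[i].equals("aa")) {
                            try {
                                declared = Long.parseLong(tok[i - 1]);
                            } catch (NumberFormatException e) {
                                declared = -1; // unusual LOCUS layout; leave undetermined
                            }
                            break;
                        }
                    }
                } else if (line.startsWith("ORIGIN")) {
                    inOrigin = true;
                } else if (line.startsWith("//")) {
                    report.add(name + " " + declared + " " + counted + " "
                            + (declared == counted ? "OK" : "MISMATCH"));
                    inOrigin = false;
                } else if (inOrigin) {
                    // Sequence lines look like "        1 gatcaagctt ..."; count letters only.
                    for (int i = 0; i < line.length(); i++) {
                        if (Character.isLetter(line.charAt(i))) counted++;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return report;
    }
}
```

Running `GenbankLengthCheck.check(new BufferedReader(new FileReader("/tmp/mm_alt_chr2.gbk")))` prints nothing by itself; iterate over the returned list. If the download is complete, the NT_039207 line should show the same number in both length columns.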


From marcel.huntemann at gmail.com  Thu Jan 29 15:55:15 2009
From: marcel.huntemann at gmail.com (Marcel Huntemann)
Date: Thu, 29 Jan 2009 12:55:15 -0800
Subject: [Biojava-l] Problem with blast file parser
In-Reply-To: <4979ED9C.6040207@imbusch.net>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net>
Message-ID: <498217B3.4010703@Gmail.com>

Hi Charles!

I've "found" a solution now. After spending a couple of days on the
terrible XML output of BLAST and BioJava's BlastXMLParser (which also
wasn't working properly), I decided to look at the source code and
try to figure out for myself what was wrong with the BlastLikeSAXParser. So I
checked out the current source code via anonymous SVN
checkout (howto here: http://biojava.org/wiki/CVS_to_SVN_Migration). After
a couple of hours without finding an error that could cause this
behavior, I thought I'd just give it a try and compiled the checked-out
source with Ant. I then used the newly created biojava.jar and suddenly
everything worked perfectly!
So, whatever the error was (unfortunately I don't have the old source code
to diff certain files against), it is already fixed in the
latest version in Subversion.
Try it out!

Cheers,
Marcel


Charles Imbusch wrote:
> Hello Marcel,
> 
> I am also experiencing the problem of the parser skipping
> the even-numbered results. I have not found a satisfactory solution
> for that, so I gave up on parsing a BLAST result file containing
> multiple results. Instead I split the big FASTA file into
> several smaller ones, so that I get exactly one result per FASTA file.
> That works, even if it's not the best solution.
> 
> Let me know if you find another solution for that problem.
> 
> Cheers,
> Charles
> 
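Charles's workaround (one query per FASTA file) is easy to script. A minimal JDK-only sketch, assuming every record starts with a line beginning with '>'; the class and the query_N.fasta file naming are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: one FASTA record per string (and optionally per file). */
class FastaSplitter {

    /** Splits multi-FASTA text into records; text before the first '>' is ignored. */
    static List<String> split(Reader source) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null;
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith(">")) {
                    // A header line closes the previous record and opens a new one.
                    if (current != null) records.add(current.toString());
                    current = new StringBuilder();
                }
                if (current != null) current.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (current != null) records.add(current.toString());
        return records;
    }

    /** Writes records to dir/query_1.fasta, dir/query_2.fasta, ... */
    static void writeAll(List<String> records, Path dir) throws IOException {
        for (int i = 0; i < records.size(); i++) {
            Files.write(dir.resolve("query_" + (i + 1) + ".fasta"),
                    records.get(i).getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

Each resulting file can then be run through BLAST on its own, so the parser only ever sees a single result.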
> 

From andreas at sdsc.edu  Thu Jan 29 16:11:51 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 29 Jan 2009 13:11:51 -0800
Subject: [Biojava-l] Problem with blast file parser
In-Reply-To: <498217B3.4010703@Gmail.com>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net>
	<498217B3.4010703@Gmail.com>
Message-ID: <59a41c430901291311w642a98am68b33cc6c78b15f1@mail.gmail.com>

Hi,

We have had a couple of bug reports recently about issues that have already
been fixed in the latest BioJava builds from SVN. I think it is time to
start preparing the next BioJava release (1.7) to make sure
everybody is up to date...

Andreas


On Thu, Jan 29, 2009 at 12:55 PM, Marcel Huntemann
<marcel.huntemann at gmail.com> wrote:
> Hi Charles!
>
> I've "found" a solution now. After spending a couple of days on the
> terrible XML output of BLAST and BioJava's BlastXMLParser (which also
> wasn't working properly), I decided to look at the source code and
> try to figure out for myself what was wrong with the BlastLikeSAXParser. So I
> checked out the current source code via anonymous SVN
> checkout (howto here: http://biojava.org/wiki/CVS_to_SVN_Migration). After
> a couple of hours without finding an error that could cause this
> behavior, I thought I'd just give it a try and compiled the checked-out
> source with Ant. I then used the newly created biojava.jar and suddenly
> everything worked perfectly!
> So, whatever the error was (unfortunately I don't have the old source code
> to diff certain files against), it is already fixed in the
> latest version in Subversion.
> Try it out!
>
> Cheers,
> Marcel
>
>
> Charles Imbusch wrote:
>> Hello Marcel,
>>
>> I am also experiencing the problem of the parser skipping
>> the even-numbered results. I have not found a satisfactory solution
>> for that, so I gave up on parsing a BLAST result file containing
>> multiple results. Instead I split the big FASTA file into
>> several smaller ones, so that I get exactly one result per FASTA file.
>> That works, even if it's not the best solution.
>>
>> Let me know if you find another solution for that problem.
>>
>> Cheers,
>> Charles
>>
>>
>

From marcel.huntemann at gmail.com  Thu Jan 29 16:24:55 2009
From: marcel.huntemann at gmail.com (Marcel Huntemann)
Date: Thu, 29 Jan 2009 13:24:55 -0800
Subject: [Biojava-l] Problem with blast file parser
In-Reply-To: <59a41c430901291311w642a98am68b33cc6c78b15f1@mail.gmail.com>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net>	
	<498217B3.4010703@Gmail.com>
	<59a41c430901291311w642a98am68b33cc6c78b15f1@mail.gmail.com>
Message-ID: <49821EA7.9090603@Gmail.com>

That sounds reasonable. I bet a lot of people would appreciate that!

Andreas Prlic wrote:
> Hi,
> 
> We have had a couple of bug reports recently about issues that have already
> been fixed in the latest BioJava builds from SVN. I think it is time to
> start preparing the next BioJava release (1.7) to make sure
> everybody is up to date...
> 
> Andreas

From marcin.swiatek at mail.mcgill.ca  Thu Jan 29 16:56:29 2009
From: marcin.swiatek at mail.mcgill.ca (Marcin Swiatek)
Date: Thu, 29 Jan 2009 16:56:29 -0500
Subject: [Biojava-l] Problem with blast file parser
In-Reply-To: <49821EA7.9090603@Gmail.com>
References: <497913F9.70009@Gmail.com>
	<4979ED9C.6040207@imbusch.net>	<498217B3.4010703@Gmail.com><59a41c430901291311w642a98am68b33cc6c78b15f1@mail.gmail.com>
	<49821EA7.9090603@Gmail.com>
Message-ID: <176A06E658ED0745965C072C5F2C116A02F87314@EXCHANGE2VS2.campus.mcgill.ca>

I personally would, especially since I have just solved the problem
myself, unaware that someone had already done so.

BTW: the problem I found (which seems similar to the one described)
was that the line starting a new data set (as detected by
checkNewBlastLikeDataSet in BlastSAXParser) was neither picked up by
HitSectionSAXParser nor propagated up to BlastSAXParser, which left the
parser's state machine in an inconsistent state. It would recover by
skipping everything up to the next data set, which is why only every
other result was processed.
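The failure described here, where the line that opens the next data set is swallowed by one sub-parser and never handed back to its parent, is a lookahead bug. Purely as an illustration of the safe pattern (this is not BioJava's parser), a minimal JDK-only sketch that splits a concatenated multi-query BLAST text report into one report per query. It assumes each report begins with a program header line such as "BLASTX 2.2.18"; the class name is hypothetical. The header line both closes the previous chunk and opens the next one, so it is never dropped and no report is skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: splits a concatenated BLAST text report into one string per query. */
class BlastReportSplitter {

    // Assumption: each report starts with BLASTN/BLASTP/BLASTX/TBLASTN/TBLASTX plus a version.
    private static final Pattern HEADER = Pattern.compile("T?BLAST[NPX]\\s.*");

    static List<String> split(List<String> lines) {
        List<String> reports = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            if (HEADER.matcher(line).matches()) {
                // The header ends the previous report AND starts the next one;
                // keeping it (instead of consuming it) is what prevents skipping.
                if (current != null) reports.add(current.toString());
                current = new StringBuilder();
            }
            if (current != null) current.append(line).append('\n');
        }
        if (current != null) reports.add(current.toString());
        return reports;
    }
}
```

Each chunk can then be fed to the parser separately, which also sidesteps the bug in older builds.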

BTW2: the XML parser in 1.6 doesn't handle files from newer BLAST versions
either (2.19, was it?). Has this been fixed in the SVN repository?

Cheers, 

Marcin


-----Original Message-----
From: biojava-l-bounces at lists.open-bio.org
[mailto:biojava-l-bounces at lists.open-bio.org] On Behalf Of Marcel
Huntemann
Sent: Thursday, January 29, 2009 4:25 PM
To: Andreas Prlic
Cc: biojava-dev; biojava-l at biojava.org
Subject: Re: [Biojava-l] Problem with blast file parser

That sounds reasonable. I bet a lot of people would appreciate that!

Andreas Prlic wrote:
> Hi,
> 
> We have had a couple of bug reports recently about issues that have already
> been fixed in the latest BioJava builds from SVN. I think it is time to
> start preparing the next BioJava release (1.7) to make sure
> everybody is up to date...
> 
> Andreas


From umanga.bio at gmail.com  Fri Jan 30 07:00:41 2009
From: umanga.bio at gmail.com (Ashika Umanga Umangiliya)
Date: Fri, 30 Jan 2009 21:00:41 +0900
Subject: [Biojava-l] Reading AB1 files and drawing chromatogram ?
Message-ID: <b4d9edaa0901300400m41f1b19dy764da72db6162067@mail.gmail.com>

Greetings all,

In the application I am developing, I want to draw a chromatogram from an AB1
file. I come from a computer science background and have little knowledge of
this subject. Where can I find information on this?
Can I draw the graph using the data in the AB1 file, or is there a function for it?


thanks in advance,

Umanga

From ayates at ebi.ac.uk  Fri Jan 30 07:57:40 2009
From: ayates at ebi.ac.uk (Andy Yates)
Date: Fri, 30 Jan 2009 12:57:40 +0000
Subject: [Biojava-l] Reading AB1 files and drawing chromatogram ?
In-Reply-To: <b4d9edaa0901300400m41f1b19dy764da72db6162067@mail.gmail.com>
References: <b4d9edaa0901300400m41f1b19dy764da72db6162067@mail.gmail.com>
Message-ID: <4982F944.7080905@ebi.ac.uk>

Hi Umanga,

Fortunately, BioJava has an API for drawing chromatograms, located in
org.biojava.bio.chromatogram and org.biojava.bio.chromatogram.graphic. To
parse an AB1 file and draw it, you can run something like the following code:

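import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.*;
import javax.imageio.ImageIO;
import org.biojava.bio.program.abi.ABIFChromatogram;
import org.biojava.bio.chromatogram.*;
import org.biojava.bio.chromatogram.graphic.*;

File file = new File("chr.ab1");
Chromatogram c = ABIFChromatogram.create(file);
ChromatogramGraphic cg = new ChromatogramGraphic(c);
// One way to get a Graphics2D: draw into an off-screen image
BufferedImage img = new BufferedImage(cg.getWidth(), cg.getHeight(),
                                      BufferedImage.TYPE_INT_RGB);
Graphics2D context = img.createGraphics();
context.setColor(Color.WHITE);
context.fillRect(0, 0, img.getWidth(), img.getHeight());
cg.drawTo(context);
context.dispose();
ImageIO.write(img, "png", new File("chr.png"));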

You can configure the size of the image through the ChromatogramGraphic
object and alter a number of ChromatogramGraphic.Option attributes through
ChromatogramGraphic.setOption(ChromatogramGraphic.Option opt, Object value).

This should be enough to get you going. I should warn you that this class
is quite memory-intensive; an application I wrote ages ago had serious
memory problems because of it (the drawing component, not the file
parsing). An alternative library is available from
http://code.google.com/p/bioview2/ (developed by a former
colleague).

Try the biojava code first and if that serves your purpose then great;
if not then try bioview2.

Regards,

Andy Yates

P.S. The AB1 parser only supports the processed data channels in the AB1
file. If you want the raw data from it then you will have to modify the
source or use another library (probably the C library StadenIO) to
convert the raw data into an SCF file.
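To check which trace channels a given AB1 file actually carries, a minimal JDK-only sketch that lists the directory entries of an ABIF file held in memory. The layout it relies on is my reading of Applied Biosystems' ABIF format description and should be verified against it: big-endian; a 4-byte "ABIF" signature and a 2-byte version; then a 28-byte root entry whose numelements field (byte 18) and dataoffset field (byte 26) locate the directory of 28-byte entries, each starting with a 4-character tag name and a 4-byte tag number. By that convention DATA 1 to 4 hold the raw traces and DATA 9 to 12 the processed ones. The class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists "NAMEnumber" tags (e.g. DATA9, PBAS2) of an ABIF file. */
class AbifDirectory {
    static final int ENTRY_SIZE = 28;

    static List<String> tags(byte[] file) {
        if (file.length < 34 || !new String(file, 0, 4, StandardCharsets.US_ASCII).equals("ABIF")) {
            throw new IllegalArgumentException("not an ABIF file");
        }
        ByteBuffer buf = ByteBuffer.wrap(file).order(ByteOrder.BIG_ENDIAN);
        // Root directory entry starts at byte 6 (after signature + version).
        int numEntries = buf.getInt(6 + 12); // numelements field
        int dirOffset = buf.getInt(6 + 20);  // dataoffset field
        if (dirOffset < 0 || numEntries < 0
                || (long) dirOffset + (long) numEntries * ENTRY_SIZE > file.length) {
            throw new IllegalArgumentException("directory lies outside the file");
        }
        List<String> tags = new ArrayList<>();
        for (int i = 0; i < numEntries; i++) {
            int p = dirOffset + i * ENTRY_SIZE;
            String name = new String(file, p, 4, StandardCharsets.US_ASCII);
            tags.add(name + buf.getInt(p + 4));
        }
        return tags;
    }

    /** True if all of DATA1..DATA4 (the raw channels, by ABIF convention) are present. */
    static boolean hasRawTraces(List<String> tags) {
        for (int n = 1; n <= 4; n++) {
            if (!tags.contains("DATA" + n)) return false;
        }
        return true;
    }
}
```

Read the file with `Files.readAllBytes(Paths.get("chr.ab1"))` and pass the bytes to `tags`.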

Ashika Umanga Umangiliya wrote:
> Greetings all,
> 
> In the application I am developing, I want to draw a chromatogram from an AB1
> file. I come from a computer science background and have little knowledge of
> this subject. Where can I find information on this?
> Can I draw the graph using the data in the AB1 file, or is there a function for it?
> 
> 
> thanks in advance,
> 
> Umanga

From ahmed.elmasri at gmail.com  Fri Jan 30 21:41:31 2009
From: ahmed.elmasri at gmail.com (Hamed, Ahmed A.)
Date: Fri, 30 Jan 2009 21:41:31 -0500
Subject: [Biojava-l] Sequence start/end location
Message-ID: <5cdd31570901301841i5bee9ba7l56d86afdddb665f5@mail.gmail.com>

Hello list,
I am trying to find the start and end location of a gene in a
sequence. I am reading from a gene FASTA database file. Is there a built-in
method that I can use? The alternative is really painful, since I have to
parse a PTT file, and that is not quite working for me.
Thanks very much!
Ahmed

From markjschreiber at gmail.com  Fri Jan 30 23:15:59 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Sat, 31 Jan 2009 12:15:59 +0800
Subject: [Biojava-l] Sequence start/end location
In-Reply-To: <5cdd31570901301841i5bee9ba7l56d86afdddb665f5@mail.gmail.com>
References: <5cdd31570901301841i5bee9ba7l56d86afdddb665f5@mail.gmail.com>
Message-ID: <93b45ca50901302015l4134dce3g684b26cd18c14c21@mail.gmail.com>

Hi -

Unfortunately your FASTA file won't contain any feature information
which could tell you the start and end. If you don't want to get the
info from the PTT file you might want to look at parsing the Genbank
file instead which will have the feature information.

A PTT parser might not be a bad thing for BioJava though. If you write
one please consider adding it.
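For illustration, a minimal JDK-only sketch of what such a parser might look like. The class and field names are hypothetical (this is not an existing BioJava API). It assumes the usual NCBI .ptt layout: a title line, a protein-count line, a tab-separated header row starting with "Location", then one row per protein with columns Location, Strand, Length, PID, Gene, Synonym, Code, COG, Product, where the location is written as start..end (1-based, inclusive).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for NCBI protein table (.ptt) files. */
class PttParser {

    /** One row of a .ptt table; coordinates are 1-based and inclusive. */
    static final class Entry {
        final int start, end;
        final char strand;
        final String gene, synonym, product;

        Entry(int start, int end, char strand, String gene, String synonym, String product) {
            this.start = start; this.end = end; this.strand = strand;
            this.gene = gene; this.synonym = synonym; this.product = product;
        }
    }

    static List<Entry> parse(Reader source) {
        List<Entry> entries = new ArrayList<>();
        boolean inTable = false;
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (!inTable) {
                    // Skip the title and count lines up to and including the column header.
                    inTable = line.startsWith("Location");
                    continue;
                }
                if (line.trim().isEmpty()) continue;
                String[] f = line.split("\t");
                String[] loc = f[0].split("\\.\\.");
                entries.add(new Entry(Integer.parseInt(loc[0]), Integer.parseInt(loc[1]),
                        f[1].charAt(0), f[4], f[5], f.length > 8 ? f[8] : ""));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }
}
```

`PttParser.parse(new FileReader("genome.ptt"))` returns the rows in file order; each entry's `start`, `end` and `strand` are the gene coordinates Ahmed is after.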

- Mark

On Sat, Jan 31, 2009 at 10:41 AM, Hamed, Ahmed A.
<ahmed.elmasri at gmail.com> wrote:
>
> Hello list,
> I am trying to find the start and end location of a gene in a
> sequence. I am reading from a gene FASTA database file. Is there a built-in
> method that I can use? The alternative is really painful, since I have to
> parse a PTT file, and that is not quite working for me.
> Thanks very much!
> Ahmed

From markjschreiber at gmail.com  Sat Jan 31 05:25:59 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Sat, 31 Jan 2009 18:25:59 +0800
Subject: [Biojava-l] Sequence start/end location
In-Reply-To: <5cdd31570901302313l12c490bct6768979ce562d40b@mail.gmail.com>
References: <5cdd31570901301841i5bee9ba7l56d86afdddb665f5@mail.gmail.com>
	<93b45ca50901302015l4134dce3g684b26cd18c14c21@mail.gmail.com>
	<5cdd31570901302313l12c490bct6768979ce562d40b@mail.gmail.com>
Message-ID: <93b45ca50901310225p6676c282m203e8c4e13ba37f1@mail.gmail.com>

Hi Ahmed -

For a first time contribution it would probably be easiest to post
something to the list and someone with a development account can check
it in for you.

Please make sure to add javadoc comments and a basic JUnit test for
any classes you make.

- Mark

On Sat, Jan 31, 2009 at 3:13 PM, Hamed, Ahmed A.
<ahmed.elmasri at gmail.com> wrote:
> Dear Mark,
> Thank you for your response. I would be happy to contribute my PTTParser if
> you point me to where/how to check it in. I am still new to the BioJava
> community and there is so much to learn.
> Best wishes,
> Ahmed
>
> On Fri, Jan 30, 2009 at 11:15 PM, Mark Schreiber <markjschreiber at gmail.com>
> wrote:
>>
>> Hi -
>>
>> Unfortunately your FASTA file won't contain any feature information
>> which could tell you the start and end. If you don't want to get the
>> info from the PTT file you might want to look at parsing the Genbank
>> file instead which will have the feature information.
>>
>> A PTT parser might not be a bad thing for BioJava though. If you write
>> one please consider adding it.
>>
>> - Mark
>>
>> On Sat, Jan 31, 2009 at 10:41 AM, Hamed, Ahmed A.
>> <ahmed.elmasri at gmail.com> wrote:
>> >
>> > Hello list,
>> > I am trying to find the start and end location of a gene in a
>> > sequence. I am reading from a gene FASTA database file. Is there a built-in
>> > method that I can use? The alternative is really painful, since I have to
>> > parse a PTT file, and that is not quite working for me.
>> > Thanks very much!
>> > Ahmed
>
>
>
> --
> Ahmed Abdeen Hamed
> Scientific Informatics Project Leader
> Marine Biological Laboratory Woods Hole, MA
> --
> Ph.D. student, Complex Systems
> School of Informatics, Indiana University
>
>
>
>

From aumanga at biggjapan.com  Thu Jan  8 09:04:38 2009
From: aumanga at biggjapan.com (Ashika Umanga Umagiliya)
Date: Thu, 08 Jan 2009 18:04:38 +0900
Subject: [Biojava-l] Genebank Webservices (corrrect result page)
Message-ID: <4965C1A6.1030306@biggjapan.com>

Sorry, the correct result page is:

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&id=4558277


From aumanga at biggjapan.com  Thu Jan  8 09:09:22 2009
From: aumanga at biggjapan.com (Ashika Umanga Umagiliya)
Date: Thu, 08 Jan 2009 18:09:22 +0900
Subject: [Biojava-l] Genebank Webservices (corrrect result page)
In-Reply-To: <4965C1A6.1030306@biggjapan.com>
References: <4965C1A6.1030306@biggjapan.com>
Message-ID: <4965C2C2.2030801@biggjapan.com>

Greetings all,

Sorry if this is reposted!

I come from a computer science background and have only a little knowledge
of bioinformatics.
In the application I am developing, I want to search NCBI for a GenBank ID
(such as 4558277) and retrieve the relevant PDB_ID.

For example, for ID '4558277' I get the result

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&id=4558277

I can see the value '1F58_L', which is the only value that matters to me.

I want to know whether there is any web service to retrieve this
information.

That is, I send '4558277' as a SOAP input parameter and should get the
value '1F58_L' back in the result.

I found the following web services and want to know whether I can use the one
for 'Gene':

http://www.ncbi.nlm.nih.gov/entrez/query/static/esoap_help.html

thanks in advance,
umanga


Ashika Umanga Umagiliya wrote:
> Sorry, the correct result page is:
>
> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&id=4558277
>


From aumanga at biggjapan.com  Thu Jan  8 08:58:09 2009
From: aumanga at biggjapan.com (Ashika Umanga Umagiliya)
Date: Thu, 08 Jan 2009 17:58:09 +0900
Subject: [Biojava-l] Genebank Webservices ?
Message-ID: <4965C021.2060109@biggjapan.com>

Greetings all,

I come from a computer science background and have only a little knowledge
of bioinformatics.
In the application I am developing, I want to search NCBI for a GenBank ID
(such as 4558277) and retrieve the relevant PDB_ID.

For example, for ID '4558277' I get the result

http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&cmd=search&term=4558277

I can see the value '1F58_L', which is the only value that matters to me.

I want to know whether there is any web service to retrieve this information.

That is, I send '4558277' as a SOAP input parameter and should get the
value '1F58_L' back in the result.

I found the following web services and want to know whether I can use the one
for 'Gene':

http://www.ncbi.nlm.nih.gov/entrez/query/static/esoap_help.html

thanks in advance,
umanga


From holland at eaglegenomics.com  Thu Jan  8 11:07:43 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Thu, 08 Jan 2009 11:07:43 +0000
Subject: [Biojava-l] Genebank Webservices ?
In-Reply-To: <4965C021.2060109@biggjapan.com>
References: <4965C021.2060109@biggjapan.com>
Message-ID: <4965DE7F.8020209@eaglegenomics.com>

There is no generic interface to NCBI eUtils in BioJava, but one is planned.

In the meantime take a look at this existing BioJava 1.6 package, which
will query Genbank for a sequence and return a BioJava RichSequence
object containing the result. You can then search through the
annotations and features of the sequence to find the result you need.

This is for Gene records:

http://biojava.org/docs/api/org/biojavax/bio/db/ncbi/GenbankRichSequenceDB.html

Or the equivalent for Peptide records:

http://biojava.org/docs/api/org/biojavax/bio/db/ncbi/GenpeptRichSequenceDB.html
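If you would rather skip SOAP and BioJava altogether, the E-utilities efetch service can return the GenPept flat file over plain HTTP. A minimal JDK-only sketch with hypothetical helper names. It assumes the efetch URL form shown (db=protein, rettype=gp, retmode=text) and assumes the PDB-derived identifier you want (such as 1F58_L) is the first token on the record's ACCESSION line; please check both against the E-utilities documentation and a real record.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

/** Hypothetical helper: fetch a GenPept flat file by GI and read its ACCESSION. */
class GenpeptAccession {

    static String efetchUrl(String gi) {
        if (!gi.matches("\\d+")) throw new IllegalArgumentException("GI must be numeric: " + gi);
        return "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
                + "?db=protein&rettype=gp&retmode=text&id=" + gi;
    }

    /** Downloads the GenPept text for a GI (needs network access). */
    static String fetch(String gi) throws IOException {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                URI.create(efetchUrl(gi)).toURL().openStream(), StandardCharsets.UTF_8))) {
            return in.lines().collect(Collectors.joining("\n"));
        }
    }

    /** First token after "ACCESSION", or null if the record has no such line. */
    static String accession(String genpeptText) {
        for (String line : genpeptText.split("\n")) {
            if (line.startsWith("ACCESSION")) {
                String[] tok = line.trim().split("\\s+");
                return tok.length > 1 ? tok[1] : null;
            }
        }
        return null;
    }
}
```

For Umanga's example the call would be `GenpeptAccession.accession(GenpeptAccession.fetch("4558277"))`; I have not verified what that record's ACCESSION line contains.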

cheers,
Richard

Ashika Umanga Umagiliya wrote:
> Greetings all,
> 
> I come from a computer science background and have only a little knowledge
> of bioinformatics.
> In the application I am developing, I want to search NCBI for a GenBank ID
> (such as 4558277) and retrieve the relevant PDB_ID.
>
> For example, for ID '4558277' I get the result
>
> http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&cmd=search&term=4558277
>
> I can see the value '1F58_L', which is the only value that matters to me.
>
> I want to know whether there is any web service to retrieve this
> information.
>
> That is, I send '4558277' as a SOAP input parameter and should get the
> value '1F58_L' back in the result.
>
> I found the following web services and want to know whether I can use the one
> for 'Gene':
>
> http://www.ncbi.nlm.nih.gov/entrez/query/static/esoap_help.html
> 
> thanks in advance,
> umanga
> 
> 

-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From andreas.prlic at gmail.com  Mon Jan 12 09:28:22 2009
From: andreas.prlic at gmail.com (Andreas Prlic)
Date: Mon, 12 Jan 2009 10:28:22 +0100
Subject: [Biojava-l] BioJava
In-Reply-To: <496ac2075c8cf4.53511495@wp.pl>
References: <496ac2075c8cf4.53511495@wp.pl>
Message-ID: <59a41c430901120128l53f2a5c8le0a122a0a73515@mail.gmail.com>

Hi Michal,

the code you sent looks fine to me. Still, I am not sure I fully
understand what you are trying to say. What do you mean by "each
hit"?

From our previous discussion I understand that you work with two sets
of atoms (residues) where each position in one set corresponds to a
position in the other set. This means you know that all atoms are on
structurally equivalent positions and the two sets of atoms are of the
same size. If this is the case, then the SVDSuperimposer is the right
tool and you would include all atoms in the two sets for the RMSD
calculation.
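For the per-atom question: once two equal-length, structurally equivalent atom sets have been superimposed (e.g. after Calc.rotate and Calc.shift), the RMSD is the square root of the mean squared distance between corresponding atoms, RMSD = sqrt((1/N) * sum of d_i^2), and the individual distances d_i show how close each atom is to its partner. A minimal JDK-only sketch on plain x/y/z arrays; the helper is hypothetical, not the BioJava API. If I recall correctly, BioJava's Atom exposes its coordinates via getCoords(), which could be used to fill these arrays.

```java
/** Hypothetical helper working on already-superimposed coordinates (x, y, z per atom). */
class PairwiseRmsd {

    /** Distance between atom i of set a and atom i of set b, for every i. */
    static double[] distances(double[][] a, double[][] b) {
        if (a.length != b.length) throw new IllegalArgumentException("sets differ in size");
        double[] d = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            double dx = a[i][0] - b[i][0];
            double dy = a[i][1] - b[i][1];
            double dz = a[i][2] - b[i][2];
            d[i] = Math.sqrt(dx * dx + dy * dy + dz * dz);
        }
        return d;
    }

    /** RMSD over all equivalent pairs: sqrt of the mean squared pair distance. */
    static double rmsd(double[][] a, double[][] b) {
        double sum = 0;
        for (double di : distances(a, b)) sum += di * di;
        return Math.sqrt(sum / a.length);
    }
}
```

Sorting or thresholding the values from `distances` gives a per-residue view that a single `getRMS` value hides.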

If you work with 2 proteins where you do NOT know the structurally
equivalent positions at the start, then StructurePairAligner provides
an algorithm to align two proteins (of different length) and find
pairs of atoms (residues) on structurally equivalent positions. In
this case, the RMSD calculation considers the positions that are
equivalent and ignores the unaligned regions. I guess I should create a
wiki page explaining the difference between SVDSuperimposer and
StructurePairAligner...

Andreas


2009/1/12 Michał Lorenc <m.t.lorenc at wp.pl>:
> Dear Andreas,
> I used the SVDSuperimposer class, but after Calc.rotate and Calc.shift I
> would like to know which Atom is close to which other Atom.
>
> SVDSuperimposer.getRMS(caAtoms1, caAtoms2) only gives me the RMS value for
> the whole protein structure, but how can I get an RMS value for
> each hit?
>
> I have attached my code. Thank you in advance!
>
> Best regards,
>
> Michal
>


From aumanga at biggjapan.com  Fri Jan 16 01:03:24 2009
From: aumanga at biggjapan.com (Ashika Umanga Umagiliya)
Date: Fri, 16 Jan 2009 10:03:24 +0900
Subject: [Biojava-l] [Off the Topic ] Bioinformatics for software engineers?
Message-ID: <496FDCDC.5010805@biggjapan.com>

Greetings all,

I come from a computer science background and at the moment I work on
bioinformatics software. I really see the need to learn more about
bioinformatics, quickly :)
I hear (and use blindly) all these words, such as "sequence alignment,
epitopes, CDR, homology modeling, docking, amino acids", and at the moment
I don't care much about them, since I've been told what should happen and I
implement it.
Where can I learn about these concepts easily, I mean for a guy coming
from a mathematical and IT background?

Best regards,
umanga


From holland at eaglegenomics.com  Fri Jan 16 10:50:35 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 16 Jan 2009 10:50:35 +0000
Subject: [Biojava-l] [Off the Topic ] Bioinformatics for software
	engineers?
In-Reply-To: <496FDCDC.5010805@biggjapan.com>
References: <496FDCDC.5010805@biggjapan.com>
Message-ID: <4970667B.9030601@eaglegenomics.com>

Your best bet is a good old fashioned book. ;)

A quick search on Amazon threw up this one which looks like a very
helpful intro to cell biology for people like you (and me!) who have
come to bioinformatics from a computer science background:

http://www.amazon.com/Bioinformatics-Genes-Proteins-Computers-Advanced/dp/1859960545/ref=sr_1_4?ie=UTF8&s=books&qid=1232102747&sr=1-4

Hopefully this is a good starting point. I'm sure everyone on this list
has their own favourite books which they could recommend to you as well.

cheers,
Richard


Ashika Umanga Umagiliya wrote:
> Greetings all,
> 
> I come from a computer science background and at the moment I work on
> bioinformatics software. I really see the need to learn more about
> bioinformatics, quickly :)
> I hear (and use blindly) all these words, such as "sequence alignment,
> epitopes, CDR, homology modeling, docking, amino acids", and at the moment
> I don't care much about them, since I've been told what should happen and I
> implement it.
> Where can I learn about these concepts easily, I mean for a guy coming
> from a mathematical and IT background?
> 
> Best regards,
> umanga
> 
> 

-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From markjschreiber at gmail.com  Fri Jan 16 12:27:23 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Fri, 16 Jan 2009 20:27:23 +0800
Subject: [Biojava-l] [Off the Topic ] Bioinformatics for software
	engineers?
In-Reply-To: <4970667B.9030601@eaglegenomics.com>
References: <496FDCDC.5010805@biggjapan.com>
	<4970667B.9030601@eaglegenomics.com>
Message-ID: <93b45ca50901160427l5941f82dy18b68f5000c32722@mail.gmail.com>

Wikipedia is always a good place to get a very rapid overview of some
unfamiliar biological term.

- Mark

On Fri, Jan 16, 2009 at 6:50 PM, Richard Holland
<holland at eaglegenomics.com> wrote:
>
> Your best bet is a good old fashioned book. ;)
>
> A quick search on Amazon threw up this one which looks like a very
> helpful intro to cell biology for people like you (and me!) who have
> come to bioinformatics from a computer science background:
>
> http://www.amazon.com/Bioinformatics-Genes-Proteins-Computers-Advanced/dp/1859960545/ref=sr_1_4?ie=UTF8&s=books&qid=1232102747&sr=1-4
>
> Hopefully this is a good starting point. I'm sure everyone on this list
> has their own favourite books which they could recommend to you as well.
>
> cheers,
> Richard
>
>
> Ashika Umanga Umagiliya wrote:
> > Greetings all,
> >
> > I come from a computer science background and at the moment I work on
> > bioinformatics software. I really see the need to learn more about
> > bioinformatics, quickly :)
> > I hear (and use blindly) all these words, such as "sequence alignment,
> > epitopes, CDR, homology modeling, docking, amino acids", and at the moment
> > I don't care much about them, since I've been told what should happen and I
> > implement it.
> > Where can I learn about these concepts easily, I mean for a guy coming
> > from a mathematical and IT background?
> >
> > Best regards,
> > umanga
> >
> >
>
> --
> Richard Holland, BSc MBCS
> Finance Director, Eagle Genomics Ltd
> M: +44 7500 438846 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/


From koen.bruynseels at cropdesign.com  Fri Jan 16 13:09:58 2009
From: koen.bruynseels at cropdesign.com (koen.bruynseels at cropdesign.com)
Date: Fri, 16 Jan 2009 14:09:58 +0100
Subject: [Biojava-l] Koen Bruynseels is out of the office.
Message-ID: <OFF96A90E2.94FBF795-ONC1257540.00485300-C1257540.004852FF@basf-c-s.be>


I will be out of the office starting  01/14/2009 and will not return until
01/25/2009.

I will respond to your message when I return.


From aumanga at biggjapan.com  Mon Jan 19 00:43:31 2009
From: aumanga at biggjapan.com (Ashika Umanga Umagiliya)
Date: Mon, 19 Jan 2009 09:43:31 +0900
Subject: [Biojava-l] [Off the Topic ] Bioinformatics for software
	engineers?
In-Reply-To: <4970667B.9030601@eaglegenomics.com>
References: <496FDCDC.5010805@biggjapan.com>
	<4970667B.9030601@eaglegenomics.com>
Message-ID: <4973CCB3.4000008@biggjapan.com>

Thanks, everyone, for the tips.
I have started reading "Bioinformatics For Dummies" to get the basics, and
then I hope to move on to the book Richard recommended.

Thank you again,
Best regards,
umanga


Richard Holland wrote:
> Your best bet is a good old fashioned book. ;)
>
> A quick search on Amazon threw up this one which looks like a very
> helpful intro to cell biology for people like you (and me!) who have
> come to bioinformatics from a computer science background:
>
> http://www.amazon.com/Bioinformatics-Genes-Proteins-Computers-Advanced/dp/1859960545/ref=sr_1_4?ie=UTF8&s=books&qid=1232102747&sr=1-4
>
> Hopefully this is a good starting point. I'm sure everyone on this list
> has their own favourite books which they could recommend to you as well.
>
> cheers,
> Richard
>
>
> Ashika Umanga Umagiliya wrote:
>   
>> Greetings all,
>>
>> I come from a computer science background and at the moment I work on
>> bioinformatics software. I really see the need to learn more about
>> bioinformatics, quickly :)
>> I hear (and use blindly) all these words, such as "sequence alignment,
>> epitopes, CDR, homology modeling, docking, amino acids", and at the moment
>> I don't care much about them, since I've been told what should happen and I
>> implement it.
>> Where can I learn about these concepts easily, I mean for a guy coming
>> from a mathematical and IT background?
>>
>> Best regards,
>> umanga
>>
>>
>>     
>
>   


From marcel.huntemann at gmail.com  Wed Jan 21 02:42:14 2009
From: marcel.huntemann at gmail.com (Marcel Huntemann)
Date: Tue, 20 Jan 2009 18:42:14 -0800
Subject: [Biojava-l] How to get translated sequence out of blast result
Message-ID: <49768B86.20707@Gmail.com>

Hi!

I have a multi-FASTA file with a lot of nucleotide sequences in it. I
ran blastx with this file against a database, and now I want to parse the
BLAST result. To be more precise: for each hit, I want to get the translated
protein query sequence with its start and stop positions.
I am using the example code from the BioJava cookbook
(http://biojava.org/wiki/BioJava:CookBook:Blast:Parser). The parsing works
fine so far, apart from one problem. I am able to get the start and stop
positions of the query sequence via hit.getQueryStart() and
hit.getQueryEnd(), but I haven't figured out yet how to get the
translated protein query sequence out of the BLAST result. I couldn't find
anything like hit.getQuerySequence(). I would guess that something like that
already exists somewhere, or am I wrong and do I have to implement it myself?

Thanks,
Marcel
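One possible workaround, sketched here under stated assumptions and not a BioJava API: if the original nucleotide query is still available, the region a blastx hit covers can be translated directly from the reported query start and end. The sketch assumes 1-based, inclusive coordinates as printed by BLAST (which the values from hit.getQueryStart() and hit.getQueryEnd() would need to match), with start greater than end for reverse-strand frames, and it uses the standard genetic code (NCBI table 1). The class and method names are hypothetical. Because blastx alignments can contain gaps, the result may differ from the gapped query line shown in the BLAST report.

```java
import java.util.Locale;

/**
 * Hypothetical helper: translate the part of a nucleotide query covered by a
 * blastx hit, using the standard genetic code (NCBI translation table 1).
 * Assumes 1-based, inclusive BLAST coordinates; start > end means the hit
 * is on the reverse strand.
 */
class QueryTranslator {
    private static final String BASES = "TCAG";
    // 64 amino acids indexed by 16*first + 4*second + third base, each in T, C, A, G order.
    private static final String AMINO_ACIDS =
            "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    static String reverseComplement(String dna) {
        StringBuilder sb = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            switch (dna.charAt(i)) {
                case 'A': sb.append('T'); break;
                case 'T': sb.append('A'); break;
                case 'C': sb.append('G'); break;
                case 'G': sb.append('C'); break;
                default:  sb.append('N'); // ambiguity codes are not complemented here
            }
        }
        return sb.toString();
    }

    static String translate(String dna) {
        StringBuilder protein = new StringBuilder(dna.length() / 3);
        for (int i = 0; i + 3 <= dna.length(); i += 3) { // a trailing partial codon is ignored
            int b1 = BASES.indexOf(dna.charAt(i));
            int b2 = BASES.indexOf(dna.charAt(i + 1));
            int b3 = BASES.indexOf(dna.charAt(i + 2));
            protein.append(b1 < 0 || b2 < 0 || b3 < 0
                    ? 'X' // codon contains N or another non-ACGT character
                    : AMINO_ACIDS.charAt(16 * b1 + 4 * b2 + b3));
        }
        return protein.toString();
    }

    static String translateHitRegion(String query, int queryStart, int queryEnd) {
        String q = query.toUpperCase(Locale.ROOT);
        if (queryStart <= queryEnd) {
            return translate(q.substring(queryStart - 1, queryEnd));
        }
        // Reverse-strand hit: take the same span and read it on the other strand.
        return translate(reverseComplement(q.substring(queryEnd - 1, queryStart)));
    }

    public static void main(String[] args) {
        String query = "ccATGGCCTAAgg";
        System.out.println(translateHitRegion(query, 3, 11)); // forward strand
        System.out.println(translateHitRegion(query, 11, 3)); // reverse strand
    }
}
```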


From markjschreiber at gmail.com  Thu Jan 22 02:30:54 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Thu, 22 Jan 2009 10:30:54 +0800
Subject: [Biojava-l] Off topic: JDK6 and JAX-WS 2.1
Message-ID: <93b45ca50901211830i5af3e213p9db9a6d10f42fa75@mail.gmail.com>

Sorry for the off-topic post, but this is something that has caused me to
lose quite a bit of hair recently. If you're planning on doing web service
development with JAX-WS, don't use JDK6 unless you use a version more recent
than update 3.
I'll spare you the gory details, but JDK6u4 and later bundle JAX-WS 2.1,
which removes the need to play with endorsed directories etc. That is very
tricky in IDEs and fiddly with Ant.

- Mark


From marcel.huntemann at gmail.com  Fri Jan 23 00:48:57 2009
From: marcel.huntemann at gmail.com (Marcel Huntemann)
Date: Thu, 22 Jan 2009 16:48:57 -0800
Subject: [Biojava-l] Problem with blast file parser
Message-ID: <497913F9.70009@Gmail.com>

Hi!

I am experiencing a strange problem with the BLAST parser. I am using the
code from the BioJava CookBook
(http://biojava.org/wiki/BioJava:CookBook:Blast:Parser). The BLAST file
contains the results for 20 contigs. The problem is that the parser only
gives me the results for every other sequence. So I get the results for
contigs 1, 3, 5, 7, 9 and 11, and then it continues with the even ones:
12, 14, 16, 18 and 20. Has anyone experienced the same problem, or does
anyone know what causes it?

Thanks,
Marcel


From charles at imbusch.net  Fri Jan 23 16:17:32 2009
From: charles at imbusch.net (Charles Imbusch)
Date: Fri, 23 Jan 2009 17:17:32 +0100
Subject: [Biojava-l] Problem with blast file parser
In-Reply-To: <497913F9.70009@Gmail.com>
References: <497913F9.70009@Gmail.com>
Message-ID: <4979ED9C.6040207@imbusch.net>

Hello Marcel,

I am experiencing the same problem: the parser skips
the even-numbered results. I have not found a satisfactory solution
for it, so I gave up parsing BLAST result files containing
multiple results. Instead, I split the big FASTA file into
several smaller ones, so that I get just one result per FASTA file.
That works, even if it's not the best solution.

Let me know if you find another solution for that problem.

Cheers,
 Charles
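The one-query-per-file workaround described above can be scripted. The following is a minimal sketch using only the Java standard library; the class name and the query_0001.fasta naming scheme are hypothetical choices, and each output file would then be submitted to BLAST separately.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: write each record of a multi-FASTA file to its own file. */
class FastaSplitter {
    static List<Path> split(Path input, Path outDir) throws IOException {
        Files.createDirectories(outDir);
        List<Path> written = new ArrayList<>();
        BufferedWriter out = null;
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith(">")) {
                    // A header line starts a new record: close the previous file, open the next.
                    if (out != null) {
                        out.close();
                    }
                    Path p = outDir.resolve(String.format("query_%04d.fasta", written.size() + 1));
                    out = Files.newBufferedWriter(p, StandardCharsets.UTF_8);
                    written.add(p);
                }
                if (out != null) { // anything before the first header is skipped
                    out.write(line);
                    out.newLine();
                }
            }
        } finally {
            if (out != null) {
                out.close(); // closing an already-closed writer is harmless
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        for (Path p : split(Paths.get(args[0]), Paths.get(args[1]))) {
            System.out.println(p);
        }
    }
}
```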


From markjschreiber at gmail.com  Sat Jan 24 02:20:36 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Sat, 24 Jan 2009 10:20:36 +0800
Subject: [Biojava-l] Problem with blast file parser
In-Reply-To: <4979ED9C.6040207@imbusch.net>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net>
Message-ID: <93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com>

Is this XML parsing or blast text output?
- Mark

On Sat, Jan 24, 2009 at 12:17 AM, Charles Imbusch <charles at imbusch.net> wrote:

> Hello Marcel,
>
> I am experiencing the same problem: the parser skips
> the even-numbered results. I have not found a satisfactory solution
> for it, so I gave up parsing BLAST result files containing
> multiple results. Instead, I split the big FASTA file into
> several smaller ones, so that I get just one result per FASTA file.
> That works, even if it's not the best solution.
>
> Let me know if you find another solution for that problem.
>
> Cheers,
> Charles


From marcel.huntemann at gmail.com  Sat Jan 24 03:54:02 2009
From: marcel.huntemann at gmail.com (Marcel Huntemann)
Date: Fri, 23 Jan 2009 19:54:02 -0800
Subject: [Biojava-l] Problem with blast file parser
In-Reply-To: <93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net>
	<93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com>
Message-ID: <497A90DA.5070104@Gmail.com>

As I said, I am using the code from
http://biojava.org/wiki/BioJava:CookBook:Blast:Parser.
I have a normal text file that was created by blast. I thought that the
given code converts the input stream from the file into SAX events.
Do I have to do another step, before I use the code of that example?

Cheers,
Marcel


Mark Schreiber wrote:
> Is this XML parsing or blast text output?
> 
> - Mark
> 
> On Sat, Jan 24, 2009 at 12:17 AM, Charles Imbusch <charles at imbusch.net
> <mailto:charles at imbusch.net>> wrote:
> 
>     Hello Marcel,
> 
>     I am experiencing the same problem: the parser skips
>     the even-numbered results. I have not found a satisfactory solution
>     for it, so I gave up parsing BLAST result files containing
>     multiple results. Instead, I split the big FASTA file into
>     several smaller ones, so that I get just one result per FASTA file.
>     That works, even if it's not the best solution.
> 
>     Let me know if you find another solution for that problem.
> 
>     Cheers,
>     Charles


From charles at imbusch.net  Sun Jan 25 11:54:53 2009
From: charles at imbusch.net (Charles Imbusch)
Date: Sun, 25 Jan 2009 12:54:53 +0100
Subject: [Biojava-l] Problem with blast file parser
In-Reply-To: <93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net>
	<93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com>
Message-ID: <497C530D.7090006@imbusch.net>

Hello Mark,

same here. I'm parsing plain text output.

Cheers,
  Charles

Mark Schreiber schrieb:
> Is this XML parsing or blast text output?
>
> - Mark


From markjschreiber at gmail.com  Mon Jan 26 02:45:25 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Mon, 26 Jan 2009 10:45:25 +0800
Subject: [Biojava-l] Problem with blast file parser
In-Reply-To: <497C530D.7090006@imbusch.net>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net>
	<93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com>
	<497C530D.7090006@imbusch.net>
Message-ID: <93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com>

Have you tried parsing the XML output? While the text parser does
'fake' XML by emitting SAX events, I think that the XML parser may be a
lot more stable (the XML output of BLAST is more stable, at least
recently anyway).

I know this isn't the best solution to your problem, but the default
text output of BLAST is not the most parseable. In fact, it is probably
the least machine-readable of all the BLAST outputs and definitely the
least stable.

- Mark
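A minimal sketch of reading BLAST XML output (blastall -m 7, or -outfmt 5 in BLAST+) with the JDK's DOM parser instead of BioJava. It assumes the element names of NCBI's BlastOutput DTD (Iteration, Hit, Hsp, Hsp_query-from, Hsp_query-to, Hsp_qseq) and a single well-formed document with one Iteration per query; some older blastall versions wrote one XML document per query, which this sketch does not handle. For blastx, Hsp_qseq holds the aligned, translated query, which is what the earlier question about translated query sequences was after. The class name is hypothetical.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper: list every HSP in a BLAST XML report using only the JDK. */
class BlastXmlHsps {

    static final class Hsp {
        final String queryDef;
        final String hitId;
        final int queryFrom;
        final int queryTo;
        final String querySeq; // for blastx: the translated, possibly gapped, query

        Hsp(String queryDef, String hitId, int queryFrom, int queryTo, String querySeq) {
            this.queryDef = queryDef;
            this.hitId = hitId;
            this.queryFrom = queryFrom;
            this.queryTo = queryTo;
            this.querySeq = querySeq;
        }
    }

    static List<Hsp> parse(InputStream xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // BLAST XML declares NCBI's DTD; resolve it to an empty document so no network access is needed.
        builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        Document doc = builder.parse(xml);

        List<Hsp> result = new ArrayList<>();
        NodeList iterations = doc.getElementsByTagName("Iteration"); // one per query
        for (int i = 0; i < iterations.getLength(); i++) {
            Element iteration = (Element) iterations.item(i);
            String queryDef = firstText(iteration, "Iteration_query-def");
            NodeList hits = iteration.getElementsByTagName("Hit");
            for (int h = 0; h < hits.getLength(); h++) {
                Element hit = (Element) hits.item(h);
                String hitId = firstText(hit, "Hit_id");
                NodeList hsps = hit.getElementsByTagName("Hsp");
                for (int k = 0; k < hsps.getLength(); k++) {
                    Element hsp = (Element) hsps.item(k);
                    result.add(new Hsp(queryDef, hitId,
                            Integer.parseInt(firstText(hsp, "Hsp_query-from")),
                            Integer.parseInt(firstText(hsp, "Hsp_query-to")),
                            firstText(hsp, "Hsp_qseq")));
                }
            }
        }
        return result;
    }

    /** Text of the first descendant element with the given tag, or "" if absent. */
    private static String firstText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream(args[0])) {
            for (Hsp hsp : parse(in)) {
                System.out.println(hsp.queryDef + "\t" + hsp.hitId + "\t"
                        + hsp.queryFrom + "\t" + hsp.queryTo + "\t" + hsp.querySeq);
            }
        }
    }
}
```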

On Sun, Jan 25, 2009 at 7:54 PM, Charles Imbusch <charles at imbusch.net> wrote:
>
> Hello Mark,
>
> same here. I'm parsing plain text output.
>
> Cheers,
>  Charles
>
> Mark Schreiber schrieb:
>>
>> Is this XML parsing or blast text output?
>>
>> - Mark


From ahmed.elmasri at gmail.com  Mon Jan 26 06:35:20 2009
From: ahmed.elmasri at gmail.com (Hamed, Ahmed A.)
Date: Mon, 26 Jan 2009 01:35:20 -0500
Subject: [Biojava-l] Depreciated methods
Message-ID: <5cdd31570901252235o419eb0b1ydf511c59c1bf71b3@mail.gmail.com>

Hello list,
I am new to BioJava and I have been trying some of its examples. I came
across some deprecated methods and I am not sure whether they will be
removed entirely any time soon. I also found that some of the examples
were problematic and didn't run properly, even though I followed the
instructions stated in the comment section.
Please let me know if you have answers for my questions.
Best wishes,
Ahmed

-- 
Ahmed Abdeen Hamed
Scientific Informatics Project Leader
Marine Biological Laboratory Woods Hole, MA
--
Ph.D. student, Complex Systems
School of Informatics, Indiana University


From holland at eaglegenomics.com  Mon Jan 26 08:10:42 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 26 Jan 2009 08:10:42 +0000
Subject: [Biojava-l] Depreciated methods
In-Reply-To: <5cdd31570901252235o419eb0b1ydf511c59c1bf71b3@mail.gmail.com>
References: <5cdd31570901252235o419eb0b1ydf511c59c1bf71b3@mail.gmail.com>
Message-ID: <497D7002.3010101@eaglegenomics.com>

> I am new to BioJava and I have been trying some of its examples. I came
> across some deprecated methods and I am not sure whether they will be
> removed entirely any time soon.

Deprecated = may be removed without notice in any future release. I
couldn't say for any individual method, but in general it's a bad idea
to use anything that is marked deprecated when writing new code.

> I also found that some of the examples
> were problematic and didn't run properly, even though I followed the
> instructions stated in the comment section.

Could you specify exactly which examples didn't work, and the exact
problems you had with them?

thanks,
Richard

> Please let me know if you have answers for my questions.
> Best wishes,
> Ahmed
> 

-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From charles at imbusch.net  Mon Jan 26 11:04:09 2009
From: charles at imbusch.net (Charles Imbusch)
Date: Mon, 26 Jan 2009 12:04:09 +0100
Subject: [Biojava-l] Problem with blast file parser
In-Reply-To: <93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net>	
	<93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com>	
	<497C530D.7090006@imbusch.net>
	<93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com>
Message-ID: <497D98A9.6010904@imbusch.net>

Hello Mark,

No, I haven't tried parsing BLAST's XML output, simply because
plain text output can be viewed in any editor, which is still
very convenient.

But I'm curious: is there an easy-to-use program for viewing
XML output?
Another option would be to generate XML and plain text
output from BLAST at the same time (in one run), but I couldn't find a
way to do so. Maybe I missed something?

Cheers,
  Charles

Mark Schreiber schrieb:
> Have you tried parsing the XML output? While the text parser does
> 'fake' XML by emitting SAX events, I think that the XML parser may be a
> lot more stable (the XML output of BLAST is more stable, at least
> recently anyway).
>
> I know this isn't the best solution to your problem, but the default
> text output of BLAST is not the most parseable. In fact, it is probably
> the least machine-readable of all the BLAST outputs and definitely the
> least stable.
>
> - Mark
>   


From holland at eaglegenomics.com  Mon Jan 26 11:21:26 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 26 Jan 2009 11:21:26 +0000
Subject: [Biojava-l] Problem with blast file parser
In-Reply-To: <497D98A9.6010904@imbusch.net>
References: <497913F9.70009@Gmail.com>
	<4979ED9C.6040207@imbusch.net>		<93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com>		<497C530D.7090006@imbusch.net>	<93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com>
	<497D98A9.6010904@imbusch.net>
Message-ID: <497D9CB6.4010403@eaglegenomics.com>

This app looks promising for viewing BLAST output, though I haven't
tried it:

http://www.korilog.com/index.php/BlastViewer.html

Otherwise, no, there's no way of making BLAST produce more than one
output format at once. It's either text or XML, but it won't do both.

cheers,
Richard

Charles Imbusch wrote:
> Hello Mark,
> 
> No, I haven't tried parsing BLAST's XML output, simply because
> plain text output can be viewed in any editor, which is still
> very convenient.
> 
> But I'm curious: is there an easy-to-use program for viewing
> XML output?
> Another option would be to generate XML and plain text
> output from BLAST at the same time (in one run), but I couldn't find a
> way to do so. Maybe I missed something?
> 
> Cheers,
>  Charles
> 
> Mark Schreiber schrieb:
>> Have you tried parsing the XML output? While the text parser does
>> 'fake' XML by emitting SAX events, I think that the XML parser may be a
>> lot more stable (the XML output of BLAST is more stable, at least
>> recently anyway).
>>
>> I know this isn't the best solution to your problem, but the default
>> text output of BLAST is not the most parseable. In fact, it is probably
>> the least machine-readable of all the BLAST outputs and definitely the
>> least stable.
>>
>> - Mark
>>   

-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From ahmed.elmasri at gmail.com  Mon Jan 26 17:24:20 2009
From: ahmed.elmasri at gmail.com (Hamed, Ahmed A.)
Date: Mon, 26 Jan 2009 12:24:20 -0500
Subject: [Biojava-l] Depreciated methods
In-Reply-To: <497D7002.3010101@eaglegenomics.com>
References: <5cdd31570901252235o419eb0b1ydf511c59c1bf71b3@mail.gmail.com>
	<497D7002.3010101@eaglegenomics.com>
Message-ID: <5cdd31570901260924i24c5c9a1meb995929f6502482@mail.gmail.com>

Thanks Richard!

For the deprecated methods, is there a reference or a wiki that maps the
deprecated ones to ones that should be used instead?

As for the examples I am having trouble with: WriteToFasta is one. Here is
the error that I am getting:
java.lang.IllegalArgumentException: No alphabet was set in the identifier
    at org.biojava.bio.seq.io.SeqIOTools.fileToBiojava(SeqIOTools.java:928)
    at examples.WriteToFasta.main(WriteToFasta.java:43)

And here is my parameter value:
      //get the int constant for the file type
      int fileType = Integer.parseInt("2");

I would appreciate any help.
Best wishes,
Ahmed


On Mon, Jan 26, 2009 at 3:10 AM, Richard Holland
<holland at eaglegenomics.com> wrote:

> > I am new to BioJava and I have been trying some of its examples. I came
> > across some deprecated methods and I am not sure whether they will be
> > removed entirely any time soon.
>
> Deprecated = may be removed without notice in any future release. I
> couldn't say for any individual method, but in general it's a bad idea
> to use anything that is marked deprecated when writing new code.
>
> > I also found that some of the examples
> > were problematic and didn't run properly, even though I followed the
> > instructions stated in the comment section.
>
> Could you specify exactly which examples didn't work, and the exact
> problems you had with them?
>
> thanks,
> Richard
>
> > Please let me know if you have answers for my questions.
> > Best wishes,
> > Ahmed
> >
>
> --
> Richard Holland, BSc MBCS
> Finance Director, Eagle Genomics Ltd
> M: +44 7500 438846 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>


-- 
Ahmed Abdeen Hamed
Scientific Informatics Project Leader
Marine Biological Laboratory Woods Hole, MA
--
Ph.D. student, Complex Systems
School of Informatics, Indiana University


From holland at eaglegenomics.com  Mon Jan 26 17:30:00 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 26 Jan 2009 17:30:00 +0000
Subject: [Biojava-l] Depreciated methods
In-Reply-To: <5cdd31570901260924i24c5c9a1meb995929f6502482@mail.gmail.com>
References: <5cdd31570901252235o419eb0b1ydf511c59c1bf71b3@mail.gmail.com>	
	<497D7002.3010101@eaglegenomics.com>
	<5cdd31570901260924i24c5c9a1meb995929f6502482@mail.gmail.com>
Message-ID: <497DF318.8070703@eaglegenomics.com>

Most deprecated methods include a note saying which method should be used instead.
For those that don't, take a look in the org.biojavax packages to see if
there are suitable alternative classes.

In the case of the deprecated SeqIOTools.fileToBiojava, a much better
version of the FASTA parser/writer exists in the org.biojavax packages.
Instructions on how to use it are here:

http://www.biojava.org/wiki/BioJava:BioJavaXDocs#Example

cheers,
Richard

Hamed, Ahmed A. wrote:
> Thanks Richard!
> 
> For the deprecated methods, is there a reference or a wiki that maps the
> deprecated ones to ones that should be used instead?
> 
> As for the examples I am having trouble with: WriteToFasta is one. Here
> is the error that I am getting:
> java.lang.IllegalArgumentException: No alphabet was set in the identifier
>     at org.biojava.bio.seq.io.SeqIOTools.fileToBiojava(SeqIOTools.java:928)
>     at examples.WriteToFasta.main(WriteToFasta.java:43)
> 
> And here is my parameter value:
>       //get the int constant for the file type
>       int fileType = Integer.parseInt("2");
> 
> I would appreciate any help.
> Best wishes,
> Ahmed
> 
> 
> 
> 
> On Mon, Jan 26, 2009 at 3:10 AM, Richard Holland
> <holland at eaglegenomics.com <mailto:holland at eaglegenomics.com>> wrote:
> 
>     > I am new to BioJava and I have been trying some of its examples. I came
>     > across some deprecated methods and I am not sure whether they will be
>     > removed entirely any time soon.
> 
>     Deprecated = may be removed without notice in any future release. I
>     couldn't say for any individual method, but in general it's a bad idea
>     to use anything that is marked deprecated when writing new code.
> 
>     > I also found that some of the examples
>     > were problematic and didn't run properly, even though I followed the
>     > instructions stated in the comment section.
> 
>     Could you specify exactly which examples didn't work, and the exact
>     problems you had with them?
> 
>     thanks,
>     Richard
> 
>     > Please let me know if you have answers for my questions.
>     > Best wishes,
>     > Ahmed
>     >
> 
>     --
>     Richard Holland, BSc MBCS
>     Finance Director, Eagle Genomics Ltd
>     M: +44 7500 438846 | E: holland at eaglegenomics.com
>     <mailto:holland at eaglegenomics.com>
>     http://www.eaglegenomics.com/
> 
> 
> 
> 
> -- 
> Ahmed Abdeen Hamed
> Scientific Informatics Project Leader
> Marine Biological Laboratory Woods Hole, MA
> --
> Ph.D. student, Complex Systems
> School of Informatics, Indiana University
> 
> 
> 

-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From markjschreiber at gmail.com  Tue Jan 27 00:55:32 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 27 Jan 2009 08:55:32 +0800
Subject: [Biojava-l] Problem with blast file parser
In-Reply-To: <497D98A9.6010904@imbusch.net>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net>
	<93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com>
	<497C530D.7090006@imbusch.net>
	<93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com>
	<497D98A9.6010904@imbusch.net>
Message-ID: <93b45ca50901261655n727219cdnd1ee27bf3b0d31e6@mail.gmail.com>

You can generate plain text from XML using an XSLT. It probably won't
be identical to the BLAST text output but that format is not so stable
anyway.
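A minimal sketch of the XSLT approach using the JDK's built-in javax.xml.transform API. The stylesheet is a hypothetical example that writes one tab-separated line per HSP (query definition, hit ID, E-value, query start, query end), assuming NCBI's BlastOutput element names; it does not try to reproduce BLAST's own text layout. The XMLReader's entity resolver stops the parser from fetching NCBI's DTD over the network. The class name is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

/** Hypothetical helper: turn BLAST XML into tab-separated text with an XSLT stylesheet. */
class BlastXmlToText {

    // One output line per HSP: query definition, hit ID, E-value, query start, query end.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'><xsl:apply-templates select='//Hsp'/></xsl:template>"
        + "<xsl:template match='Hsp'>"
        + "<xsl:value-of select='ancestor::Iteration/Iteration_query-def'/><xsl:text>&#9;</xsl:text>"
        + "<xsl:value-of select='ancestor::Hit/Hit_id'/><xsl:text>&#9;</xsl:text>"
        + "<xsl:value-of select='Hsp_evalue'/><xsl:text>&#9;</xsl:text>"
        + "<xsl:value-of select='Hsp_query-from'/><xsl:text>&#9;</xsl:text>"
        + "<xsl:value-of select='Hsp_query-to'/><xsl:text>&#10;</xsl:text>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String toText(String blastXml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // the transformer expects namespace-aware SAX events
        XMLReader reader = spf.newSAXParser().getXMLReader();
        // Resolve NCBI's DTD reference to an empty document instead of downloading it.
        reader.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));

        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(new SAXSource(reader, new InputSource(new StringReader(blastXml))),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.print(toText(xml));
    }
}
```

Reading the whole report into a String keeps the sketch short; for very large reports a StreamSource over the file would avoid holding it all in memory.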


On Mon, Jan 26, 2009 at 7:04 PM, Charles Imbusch <charles at imbusch.net> wrote:
> Hello Mark,
>
> No, I haven't tried parsing BLAST's XML output, simply because
> plain text output can be viewed in any editor, which is still
> very convenient.
>
> But I'm curious: is there an easy-to-use program for viewing
> XML output?
> Another option would be to generate XML and plain text
> output from BLAST at the same time (in one run), but I couldn't find a
> way to do so. Maybe I missed something?
>
> Cheers,
>  Charles
>
> Mark Schreiber schrieb:
>>
>> Have you tried parsing the XML output? While the text parser does
>> 'fake' XML by emitting SAX events, I think that the XML parser may be a
>> lot more stable (the XML output of BLAST is more stable, at least
>> recently anyway).
>>
>> I know this isn't the best solution to your problem, but the default
>> text output of BLAST is not the most parseable. In fact, it is probably
>> the least machine-readable of all the BLAST outputs and definitely the
>> least stable.
>>
>> - Mark
>>
>
>


From nir at rosettadesigngroup.com  Tue Jan 27 13:08:34 2009
From: nir at rosettadesigngroup.com (Nir London)
Date: Tue, 27 Jan 2009 15:08:34 +0200
Subject: [Biojava-l] Rosetta Academic Training Workshop
Message-ID: <2F3C5E9F-418B-489F-A852-3A99248D03AC@rosettadesigngroup.com>

Due to popular demand, "Rosetta Design Group" is organizing a "Rosetta"
software training workshop aimed at academic groups. The workshop will
take the form of a "webinar" (a web seminar), enabling more groups
to attend while avoiding annoying jet lag and accommodation
troubles. Would you be interested in participating? If so, please fill in
the form at http://rosettadesigngroup.com/blog/rosetta-academic-workshop/
and we will contact you when the details are finalized.*

Nir London | Rosetta Design Group
http://rosettadesigngroup.com/

* If you're not from an academic group, don't worry, write to us anyway.


From gwu at molbio.mgh.harvard.edu  Thu Jan 29 04:51:28 2009
From: gwu at molbio.mgh.harvard.edu (gang wu)
Date: Wed, 28 Jan 2009 23:51:28 -0500
Subject: [Biojava-l] Genbank file parser error
In-Reply-To: <497D9CB6.4010403@eaglegenomics.com>
References: <497913F9.70009@Gmail.com>	<4979ED9C.6040207@imbusch.net>		<93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com>		<497C530D.7090006@imbusch.net>	<93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com>	<497D98A9.6010904@imbusch.net>
	<497D9CB6.4010403@eaglegenomics.com>
Message-ID: <498135D0.9060103@molbio.mgh.harvard.edu>

Hi Everyone,

I have a piece of code that parses a GenBank file and retrieves gene
sequences and related information. It works well with sequences from
Arabidopsis thaliana, C. elegans and Bos taurus, but it fails with Mus
musculus chromosome 2. The contig that the code fails on is the largest
one in my test: contig NT_039207 has 116366104 bp, but the code shows it
truncated to 100000020 bp, which puts some gene coordinates out of range.
The code is attached below. Can anyone give me some suggestions?

The Mus musculus GenBank file can be downloaded at:
ftp://ftp.ncbi.nih.gov/genomes/M_musculus/CHR_02/mm_alt_chr2.gbk.gz

Thanks in advance

Gang
==========================================
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStreamReader;
import java.util.Iterator;
import java.util.NoSuchElementException;

import org.biojava.bio.Annotation;
import org.biojava.bio.BioException;
import org.biojavax.Namespace;
import org.biojavax.RichObjectFactory;
import org.biojavax.bio.seq.RichFeature;
import org.biojavax.bio.seq.RichSequence;
import org.biojavax.bio.seq.RichSequenceIterator;

public class TestMus {
    public void testMusChr2() throws FileNotFoundException, NoSuchElementException, BioException {
        String fp = "/tmp/mm_alt_chr2.gbk";
        System.out.println("File: " + fp);
        BufferedReader gReader = new BufferedReader(
                new InputStreamReader(new FileInputStream(new File(fp))));
        Namespace ns = (Namespace) RichObjectFactory.getDefaultNamespace();
        // Stream the GenBank records one at a time
        RichSequenceIterator seqI = RichSequence.IOTools.readGenbankDNA(gReader, ns);
        while (seqI.hasNext()) {
            RichSequence seq = seqI.nextRichSequence();
            String organism = seq.getTaxon().getDisplayName();
            String accession = seq.getAccession();
            String identifier = seq.getIdentifier();
            int taxonID = seq.getTaxon().getNCBITaxID();
            String division = seq.getDivision();
            String seqVersion = "" + seq.getSeqVersion();
            int seqLength = seq.length();
            String description = seq.getDescription();
            System.out.println("Organism: " + organism
                    + "\nAccession: " + accession
                    + "\nIdentifier: " + identifier
                    + "\nTaxonID: " + taxonID
                    + "\nDivision: " + division
                    + "\nSeqVersion: " + seqVersion
                    + "\nLength: " + seqLength);
            System.out.println("2041-2101: " + seq.subStr(2041, 2101));
            // Walk the features and report every "gene" feature
            for (Iterator i = seq.features(); i.hasNext();) {
                RichFeature f = (RichFeature) i.next();
                int rank = f.getRank();
                String fType = f.getType();
                if (fType.toLowerCase().equals("gene")) {
                    // BioJava locations are 1-based and inclusive
                    int startPos = f.getLocation().getMin();
                    int endPos = f.getLocation().getMax();
                    int geneLen = endPos - startPos + 1;
                    String sequence = seq.subStr(startPos, endPos);
                    String strand = f.getStrand().getToken() + "";
                    Annotation ann = (Annotation) f.getAnnotation();
                    String geneIdentifier = "";
                    if (ann.containsProperty("locus_tag")) {
                        geneIdentifier = ann.getProperty("locus_tag") + "";
                    }
                    else geneIdentifier = ann.getProperty("gene") + "";

                    String alternativeIdentifiers = "";
                    try {
                        alternativeIdentifiers = (String) ann.getProperty("gene");
                    } catch (NoSuchElementException e) {
                        // no /gene qualifier on this feature
                    }
                    String annotation = "";
                    System.out.println(rank + "\t" + geneIdentifier + "\t" + alternativeIdentifiers + "\t"
                            + startPos + "\t" + endPos + "\t" + geneLen + "\t" + strand);
                }
            }
        }
    }
    public static void main(String [] args) throws Exception {
        TestMus tm = new TestMus();
        tm.testMusChr2();
    }
}


From markjschreiber at gmail.com  Thu Jan 29 05:43:35 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Thu, 29 Jan 2009 13:43:35 +0800
Subject: [Biojava-l] Genbank file parser error
In-Reply-To: <498135D0.9060103@molbio.mgh.harvard.edu>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net>
	<93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com>
	<497C530D.7090006@imbusch.net>
	<93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com>
	<497D98A9.6010904@imbusch.net> <497D9CB6.4010403@eaglegenomics.com>
	<498135D0.9060103@molbio.mgh.harvard.edu>
Message-ID: <93b45ca50901282143q54d69018q55ef005cfb8e38ca@mail.gmail.com>

I assume that the downloaded file has the complete sequence in it? Probably
worth checking that it has the complete sequence block (all 116366104 bp).

- Mark
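The check suggested above can be done without BioJava. This minimal sketch streams a GenBank flat file and, for each record, compares the length on the LOCUS line with the number of residues actually present in the ORIGIN block; the class name and report format are hypothetical. Records that carry no ORIGIN block (for example CONTIG-only records) would report 0 counted residues, so a MISMATCH there is expected rather than a sign of truncation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: compare declared LOCUS lengths with residues in each ORIGIN block. */
class GenbankLengthCheck {
    private static final Pattern LOCUS = Pattern.compile("^LOCUS\\s+(\\S+)\\s+(\\d+)\\s+(bp|aa)");

    /** One report line per record: name, declared length, counted residues, OK/MISMATCH. */
    static List<String> check(Reader input) throws IOException {
        List<String> report = new ArrayList<>();
        BufferedReader in = new BufferedReader(input);
        String name = null;
        long declared = 0;
        long counted = 0;
        boolean inOrigin = false;
        String line;
        while ((line = in.readLine()) != null) {
            Matcher m = LOCUS.matcher(line);
            if (m.find()) {
                name = m.group(1);
                declared = Long.parseLong(m.group(2));
                counted = 0;
                inOrigin = false;
            } else if (line.startsWith("ORIGIN")) {
                inOrigin = true;
            } else if (line.startsWith("//")) { // end of record
                if (name != null) {
                    report.add(name + "\t" + declared + "\t" + counted + "\t"
                            + (declared == counted ? "OK" : "MISMATCH"));
                }
                name = null;
                inOrigin = false;
            } else if (inOrigin) {
                // Sequence lines look like "        1 gatcctccat ...": count letters, skip numbers.
                for (int i = 0; i < line.length(); i++) {
                    if (Character.isLetter(line.charAt(i))) {
                        counted++;
                    }
                }
            }
        }
        return report;
    }

    public static void main(String[] args) throws IOException {
        try (Reader r = Files.newBufferedReader(Paths.get(args[0]))) {
            for (String row : check(r)) {
                System.out.println(row);
            }
        }
    }
}
```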

On Thu, Jan 29, 2009 at 12:51 PM, gang wu <gwu at molbio.mgh.harvard.edu> wrote:

> Hi Everyone,
>
> I have a piece of code that parses a GenBank file and retrieves gene
> sequences and related information. It works well with sequences from
> Arabidopsis thaliana, C. elegans and Bos taurus, but it fails with Mus
> musculus chromosome 2. The contig that the code fails on is the largest
> one in my test: contig NT_039207 has 116366104 bp, but the code shows it
> truncated to 100000020 bp, which puts some gene coordinates out of range.
> The code is attached below. Can anyone give me some suggestions?
>
> The Mus musculus GenBank file can be downloaded at:
> ftp://ftp.ncbi.nih.gov/genomes/M_musculus/CHR_02/mm_alt_chr2.gbk.gz
>
> Thanks in advance
>
> Gang
> ==========================================
> public class TestMus {
>   public void testMusChr2() throws FileNotFoundException,
> NoSuchElementException, BioException {
>       String fp="/tmp/mm_alt_chr2.gbk";
>       System.out.println("File: " + fp);
>       BufferedReader gReader = new BufferedReader(new InputStreamReader(new
> FileInputStream(new File(fp))));
>       Namespace ns = (Namespace) RichObjectFactory.getDefaultNamespace();
>       RichSequenceIterator seqI =
> RichSequence.IOTools.readGenbankDNA(gReader, ns);
>       while (seqI.hasNext()) {
>           RichSequence seq = seqI.nextRichSequence();
>           String organism = seq.getTaxon().getDisplayName();
>           String accession = seq.getAccession();
>           String identifier = seq.getIdentifier();
>           int taxonID = seq.getTaxon().getNCBITaxID();
>           String division = seq.getDivision();
>           String seqVersion = "" + seq.getSeqVersion();
>           int seqLength = seq.length();
>           String description = seq.getDescription();
>           System.out.println("Organism: " + organism
>                   + "\nAccession: " + accession
>                   + "\nIdentifier: " + identifier
>                   + "\nTaxonID: " + taxonID
>                   + "\nDivision: " + division
>                   + "\nSeqVersion: " + seqVersion
>                   + "\nLength: " + seqLength);
>           System.out.println("2041-2101: " + seq.subStr(2041, 2101));
>           for (Iterator i = seq.features(); i.hasNext();) {
>               RichFeature f = (RichFeature) i.next();
>               int rank = f.getRank();
>               String fType = f.getType();
>               if (fType.toLowerCase().equals("gene")) {
>                   int startPos=f.getLocation().getMin();
>                   int endPos=f.getLocation().getMax();
>                   int geneLen=endPos-startPos+1;
>                   String sequence=seq.subStr(startPos, endPos);
>                   String strand = f.getStrand().getToken() + "";
>                   Annotation ann = (Annotation) f.getAnnotation();
>                   String geneIdentifier ="";
>                   if (ann.containsProperty("locus_tag")) {
>                       geneIdentifier=ann.getProperty("locus_tag") + "";
>                   }
>                   else geneIdentifier=ann.getProperty("gene") + "";
>
>                   String alternativeIdentifiers="";
>                   try {
>                       alternativeIdentifiers= (String)
> ann.getProperty("gene");
>
>                   } catch(NoSuchElementException e) {}
>                   String annotation="";
>                   System.out.println(rank + "\t" + geneIdentifier + "\t" +
> alternativeIdentifiers + "\t"
>                           + startPos + "\t" + endPos + "\t" + geneLen +
> "\t" + strand);
>               }
>           }
>       }
>   }
>   public static void main(String [] args) throws Exception {
>      TestMus tm=new TestMus();
>       tm.testMusChr2();
>   }
> }


From holland at eaglegenomics.com  Thu Jan 29 07:25:10 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Thu, 29 Jan 2009 07:25:10 +0000
Subject: [Biojava-l] Genbank file parser error
In-Reply-To: <93b45ca50901282143q54d69018q55ef005cfb8e38ca@mail.gmail.com>
References: <497913F9.70009@Gmail.com>
	<4979ED9C.6040207@imbusch.net>	<93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com>	<497C530D.7090006@imbusch.net>	<93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com>	<497D98A9.6010904@imbusch.net>
	<497D9CB6.4010403@eaglegenomics.com>	<498135D0.9060103@molbio.mgh.harvard.edu>
	<93b45ca50901282143q54d69018q55ef005cfb8e38ca@mail.gmail.com>
Message-ID: <498159D6.8010906@eaglegenomics.com>

Gabrielle Doan posted a solution to this a while back and I believe the
changes have been committed already:

http://www.mail-archive.com/biojava-l at lists.open-bio.org/msg01036.html

How old is the copy of BioJava that you're using? Have you tried
checking out the trunk from Subversion to see if that works?

cheers,
Richard

Mark Schreiber wrote:
> I assume that the downloaded file has the complete sequence in it? Probably
> worth checking that it has the complete sequence block (all 116366104 bp).
> 
> - Mark
> 
> On Thu, Jan 29, 2009 at 12:51 PM, gang wu <gwu at molbio.mgh.harvard.edu> wrote:
> 
>> Hi Everyone,
>>
>> I have a piece of code to parse Genbank file and retrieve gene sequence and
>> related information. It works well with sequences such as Arabidopsis
>> thaliana, C. elegans, Bos taurus. But it failed with Mus musculus chromosome
>> 2. The contig that the code failed on is the largest one in my test. Contig
>> NT_039207 has 116366104 bp, but the code shows it's cut to 100000020 bp.
>> That causes some gene coordinates out of range. Attached is the code. Can
>> anyone give some suggestions?
>>
>> The Mus musculus Genbank file can be downloaded at :
>> ftp://ftp.ncbi.nih.gov/genomes/M_musculus/CHR_02/mm_alt_chr2.gbk.gz
>>
>> Thanks in advance
>>
>> Gang
>> ==========================================
>> public class TestMus {
>>   public void testMusChr2() throws FileNotFoundException,
>> NoSuchElementException, BioException {
>>       String fp="/tmp/mm_alt_chr2.gbk";
>>       System.out.println("File: " + fp);
>>       BufferedReader gReader = new BufferedReader(new InputStreamReader(new
>> FileInputStream(new File(fp))));
>>       Namespace ns = (Namespace) RichObjectFactory.getDefaultNamespace();
>>       RichSequenceIterator seqI =
>> RichSequence.IOTools.readGenbankDNA(gReader, ns);
>>       while (seqI.hasNext()) {
>>           RichSequence seq = seqI.nextRichSequence();
>>           String organism = seq.getTaxon().getDisplayName();
>>           String accession = seq.getAccession();
>>           String identifier = seq.getIdentifier();
>>           int taxonID = seq.getTaxon().getNCBITaxID();
>>           String division = seq.getDivision();
>>           String seqVersion = "" + seq.getSeqVersion();
>>           int seqLength = seq.length();
>>           String description = seq.getDescription();
>>           System.out.println("Organism: " + organism
>>                   + "\nAccession: " + accession
>>                   + "\nIdentifier: " + identifier
>>                   + "\nTaxonID: " + taxonID
>>                   + "\nDivision: " + division
>>                   + "\nSeqVersion: " + seqVersion
>>                   + "\nLength: " + seqLength);
>>           System.out.println("2041-2101: " + seq.subStr(2041, 2101));
>>           for (Iterator i = seq.features(); i.hasNext();) {
>>               RichFeature f = (RichFeature) i.next();
>>               int rank = f.getRank();
>>               String fType = f.getType();
>>               if (fType.toLowerCase().equals("gene")) {
>>                   int startPos=f.getLocation().getMin();
>>                   int endPos=f.getLocation().getMax();
>>                   int geneLen=endPos-startPos+1;
>>                   String sequence=seq.subStr(startPos, endPos);
>>                   String strand = f.getStrand().getToken() + "";
>>                   Annotation ann = (Annotation) f.getAnnotation();
>>                   String geneIdentifier ="";
>>                   if (ann.containsProperty("locus_tag")) {
>>                       geneIdentifier=ann.getProperty("locus_tag") + "";
>>                   }
>>                   else geneIdentifier=ann.getProperty("gene") + "";
>>
>>                   String alternativeIdentifiers="";
>>                   try {
>>                       alternativeIdentifiers= (String)
>> ann.getProperty("gene");
>>
>>                   } catch(NoSuchElementException e) {}
>>                   String annotation="";
>>                   System.out.println(rank + "\t" + geneIdentifier + "\t" +
>> alternativeIdentifiers + "\t"
>>                           + startPos + "\t" + endPos + "\t" + geneLen +
>> "\t" + strand);
>>               }
>>           }
>>       }
>>   }
>>   public static void main(String [] args) throws Exception {
>>      TestMus tm=new TestMus();
>>       tm.testMusChr2();
>>   }
>> }

-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From jw12 at sanger.ac.uk  Thu Jan 29 11:20:47 2009
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Thu, 29 Jan 2009 11:20:47 +0000
Subject: [Biojava-l] closing this week: Registrations for DAS workshop.
Message-ID: <207B56E3-C65C-41A4-800E-AF0B9F158CA6@sanger.ac.uk>

DAS is currently being used to share annotations on genomes, protein
alignments, structural and interaction information.

If you are interested in sharing biological information, the DAS workshop
below may be of interest to you.

Registration is open for the 2009 DAS workshop (8, 9 and 10 March) at the
Genome Campus, Hinxton, UK. If you are interested in attending, please
find out more at http://www.dasregistry.org/course.jsp and
register via the web link at the bottom of the page. The workshop
caters for everyone from novice to expert DAS users, as each day is optional.

Closing date for registration is 1st Feb 2009. If you register now you
can change the details of your registration any time up until this
closing date. Please register early as places will be limited.


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From holland at eaglegenomics.com  Thu Jan 29 16:10:42 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Thu, 29 Jan 2009 16:10:42 +0000
Subject: [Biojava-l] Eagle Genomics is hiring
Message-ID: <4981D502.1000905@eaglegenomics.com>

Hi all,

Apologies if this is inappropriate for the list, but I thought it would
be a good way to reach the kind of people we're looking for.

Richard

=====

Senior Bioinformatics Software Developer
Eagle Genomics Ltd., Cambridge, UK
http://www.eaglegenomics.com/

We are a young and exciting bioinformatics company looking to
revolutionise the way in which industry and academia work together. We
are based at the heart of Europe's largest biotech cluster in Cambridge,
UK. As we expand our client base, we're looking to build a talented and
committed team of experts. We are currently looking for a software
developer to work on a wide range of complex projects, and who is happy
to work face-to-face with our customers. Ideally, you will have had
substantial prior experience working in a life science company or
research institute; however, we will also consider graduates with a track
record in bioinformatics.

In addition to your superb technical skills, you will also:
* have the ability to quickly translate scientific problems into real
software solutions,
* be able to put technical concepts into simple language for end users
to understand,
* be able to pick up new skills and techniques in record time,
* work well in a collaborative team environment,
* be creative, innovative, and forward-thinking.

You will have hands-on experience in some of the following:
* Java,
* Perl,
* SQL query design,
* Relational database schema design,
* Open-source bioinformatics toolkits such as BioJava, BioPerl, BioSQL,
etc.,
* Ensembl,
* BioMart,
* DAS,
* Taverna,
* Oracle Life Sciences Platform,
* Oracle database administration,
* MySQL database administration,
* VMware virtual machines,
* Grid computing and parallelisation.

The preferred candidate will be able to work from our offices in
Cambridge, but we would also consider telecommuting arrangements.

We offer a competitive salary and a range of company benefits.

To apply, please send your CV and cover letter as PDF documents to
jobs at eaglegenomics.com. If you have any questions about the position or
would like to discuss it further before applying, please use the same
email address. We are only able to offer positions to EEA citizens and
permanent residents, or Tier 1 migrants under the new UK points-based
immigration scheme.

Individual contracting arrangements could be considered, but we will give
preference to candidates who can work with us as employees. No agencies
please.

-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From gwu at molbio.mgh.harvard.edu  Thu Jan 29 18:40:06 2009
From: gwu at molbio.mgh.harvard.edu (gwu)
Date: Thu, 29 Jan 2009 13:40:06 -0500
Subject: [Biojava-l] Genbank file parser error
In-Reply-To: <93b45ca50901282143q54d69018q55ef005cfb8e38ca@mail.gmail.com>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net>	
	<93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com>	
	<497C530D.7090006@imbusch.net>	
	<93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com>	
	<497D98A9.6010904@imbusch.net>
	<497D9CB6.4010403@eaglegenomics.com>	
	<498135D0.9060103@molbio.mgh.harvard.edu>
	<93b45ca50901282143q54d69018q55ef005cfb8e38ca@mail.gmail.com>
Message-ID: <4981F806.7070100@molbio.mgh.harvard.edu>

Thanks Mark. I extracted the sequence block with sed, and its length
agrees with what the GenBank record says.
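
The same completeness check can be done without sed. The following is a minimal Java sketch (the class name is illustrative, not part of BioJava) that assumes an uncompressed GenBank flat file and counts the letters between each ORIGIN line and the closing //, ignoring position numbers and spaces. Comparing the result with the length on the LOCUS line shows whether the file itself is complete, independently of any parser.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class OriginLengthCheck {

    /** Counts the residues in the ORIGIN block of each GenBank record, in file order. */
    public static List<Long> originLengths(Reader in) {
        List<Long> lengths = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(in)) {
            boolean inOrigin = false;
            long count = 0;
            String line;
            while ((line = br.readLine()) != null) {
                if (line.startsWith("ORIGIN")) {
                    inOrigin = true;               // sequence lines start after this one
                    count = 0;
                } else if (line.startsWith("//")) {
                    if (inOrigin) {
                        lengths.add(count);        // end of the current record
                    }
                    inOrigin = false;
                } else if (inOrigin) {
                    // Lines look like "        1 acgtacgtac gtacgtacgt ..."; count letters only,
                    // skipping the leading position number and the spaces.
                    for (int i = 0; i < line.length(); i++) {
                        if (Character.isLetter(line.charAt(i))) {
                            count++;
                        }
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lengths;
    }

    public static void main(String[] args) {
        String record = "LOCUS       TEST                      25 bp    DNA\n"
                + "ORIGIN\n"
                + "        1 acgtacgtac gtacgtacgt\n"
                + "       21 acgta\n"
                + "//\n";
        System.out.println(originLengths(new StringReader(record))); // prints [25]
        // For a real file: originLengths(new java.io.FileReader("/tmp/mm_alt_chr2.gbk"))
    }
}
```

Because it reads line by line and keeps only a counter, it should cope with a 116 Mbp contig without holding the sequence in memory.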

Gang

Mark Schreiber wrote:
> I assume that the downloaded file has the complete sequence in it? 
> Probably worth checking that it has the complete sequence block (all 
> 116366104 bp).
>  
> - Mark
>
> On Thu, Jan 29, 2009 at 12:51 PM, gang wu <gwu at molbio.mgh.harvard.edu 
> <mailto:gwu at molbio.mgh.harvard.edu>> wrote:
>
>     Hi Everyone,
>
>     I have a piece of code to parse Genbank file and retrieve gene
>     sequence and related information. It works well with sequences
>     such as Arabidopsis thaliana, C. elegans, Bos taurus. But it
>     failed with Mus musculus chromosome 2. The contig that the code
>     failed on is the largest one in my test. Contig NT_039207 has
>     116366104 bp, but the code shows it's cut to 100000020 bp. That
>     causes some gene coordinates out of range. Attached is the code.
>     Can anyone give some suggestions?
>
>     The Mus musculus Genbank file can be downloaded at :
>     ftp://ftp.ncbi.nih.gov/genomes/M_musculus/CHR_02/mm_alt_chr2.gbk.gz
>
>     Thanks in advance
>
>     Gang
>     ==========================================
>     public class TestMus {
>       public void testMusChr2() throws FileNotFoundException,
>     NoSuchElementException, BioException {
>           String fp="/tmp/mm_alt_chr2.gbk";
>           System.out.println("File: " + fp);
>           BufferedReader gReader = new BufferedReader(new
>     InputStreamReader(new FileInputStream(new File(fp))));
>           Namespace ns = (Namespace)
>     RichObjectFactory.getDefaultNamespace();
>           RichSequenceIterator seqI =
>     RichSequence.IOTools.readGenbankDNA(gReader, ns);
>           while (seqI.hasNext()) {
>               RichSequence seq = seqI.nextRichSequence();
>               String organism = seq.getTaxon().getDisplayName();
>               String accession = seq.getAccession();
>               String identifier = seq.getIdentifier();
>               int taxonID = seq.getTaxon().getNCBITaxID();
>               String division = seq.getDivision();
>               String seqVersion = "" + seq.getSeqVersion();
>               int seqLength = seq.length();
>               String description = seq.getDescription();
>               System.out.println("Organism: " + organism
>                       + "\nAccession: " + accession
>                       + "\nIdentifier: " + identifier
>                       + "\nTaxonID: " + taxonID
>                       + "\nDivision: " + division
>                       + "\nSeqVersion: " + seqVersion
>                       + "\nLength: " + seqLength);
>               System.out.println("2041-2101: " + seq.subStr(2041, 2101));
>               for (Iterator i = seq.features(); i.hasNext();) {
>                   RichFeature f = (RichFeature) i.next();
>                   int rank = f.getRank();
>                   String fType = f.getType();
>                   if (fType.toLowerCase().equals("gene")) {
>                       int startPos=f.getLocation().getMin();
>                       int endPos=f.getLocation().getMax();
>                       int geneLen=endPos-startPos+1;
>                       String sequence=seq.subStr(startPos, endPos);
>                       String strand = f.getStrand().getToken() + "";
>                       Annotation ann = (Annotation) f.getAnnotation();
>                       String geneIdentifier ="";
>                       if (ann.containsProperty("locus_tag")) {
>                           geneIdentifier=ann.getProperty("locus_tag")
>     + "";
>                       }
>                       else geneIdentifier=ann.getProperty("gene") + "";
>
>                       String alternativeIdentifiers="";
>                       try {
>                           alternativeIdentifiers= (String)
>     ann.getProperty("gene");
>
>                       } catch(NoSuchElementException e) {}
>                       String annotation="";
>                       System.out.println(rank + "\t" + geneIdentifier
>     + "\t" + alternativeIdentifiers + "\t"
>                               + startPos + "\t" + endPos + "\t" +
>     geneLen + "\t" + strand);
>                   }
>               }
>           }
>       }
>       public static void main(String [] args) throws Exception {
>          TestMus tm=new TestMus();
>           tm.testMusChr2();
>       }
>     }
>
>


From gwu at molbio.mgh.harvard.edu  Thu Jan 29 19:28:42 2009
From: gwu at molbio.mgh.harvard.edu (gang wu)
Date: Thu, 29 Jan 2009 14:28:42 -0500
Subject: [Biojava-l] Genbank file parser error
In-Reply-To: <498159D6.8010906@eaglegenomics.com>
References: <497913F9.70009@Gmail.com>
	<4979ED9C.6040207@imbusch.net>	<93b45ca50901231820k4e5c0cbw530fe3fef6a7ea20@mail.gmail.com>	<497C530D.7090006@imbusch.net>	<93b45ca50901251845s4ede7060x14b1c61a28dbb4a4@mail.gmail.com>	<497D98A9.6010904@imbusch.net>
	<497D9CB6.4010403@eaglegenomics.com>	<498135D0.9060103@molbio.mgh.harvard.edu>
	<93b45ca50901282143q54d69018q55ef005cfb8e38ca@mail.gmail.com>
	<498159D6.8010906@eaglegenomics.com>
Message-ID: <4982036A.7070302@molbio.mgh.harvard.edu>

Thanks Richard. That is exactly the same issue. The latest Subversion 
trunk fixed the problem.

Thanks again for the quick response.

Gang

Richard Holland wrote:
> Gabrielle Doan posted a solution to this a while back and I believe the
> changes have been committed already:
>
> http://www.mail-archive.com/biojava-l at lists.open-bio.org/msg01036.html
>
> How old is the copy of BioJava that you're using? Have you tried
> checking out the trunk from Subversion to see if that works?
>
> cheers,
> Richard
>
> Mark Schreiber wrote:
>   
>> I assume that the downloaded file has the complete sequence in it? Probably
>> worth checking that it has the complete sequence block (all 116366104 bp).
>>
>> - Mark
>>
>> On Thu, Jan 29, 2009 at 12:51 PM, gang wu <gwu at molbio.mgh.harvard.edu> wrote:
>>
>>     
>>> Hi Everyone,
>>>
>>> I have a piece of code to parse Genbank file and retrieve gene sequence and
>>> related information. It works well with sequences such as Arabidopsis
>>> thaliana, C. elegans, Bos taurus. But it failed with Mus musculus chromosome
>>> 2. The contig that the code failed on is the largest one in my test. Contig
>>> NT_039207 has 116366104 bp, but the code shows it's cut to 100000020 bp.
>>> That causes some gene coordinates out of range. Attached is the code. Can
>>> anyone give some suggestions?
>>>
>>> The Mus musculus Genbank file can be downloaded at :
>>> ftp://ftp.ncbi.nih.gov/genomes/M_musculus/CHR_02/mm_alt_chr2.gbk.gz
>>>
>>> Thanks in advance
>>>
>>> Gang
>>> ==========================================
>>> public class TestMus {
>>>   public void testMusChr2() throws FileNotFoundException,
>>> NoSuchElementException, BioException {
>>>       String fp="/tmp/mm_alt_chr2.gbk";
>>>       System.out.println("File: " + fp);
>>>       BufferedReader gReader = new BufferedReader(new InputStreamReader(new
>>> FileInputStream(new File(fp))));
>>>       Namespace ns = (Namespace) RichObjectFactory.getDefaultNamespace();
>>>       RichSequenceIterator seqI =
>>> RichSequence.IOTools.readGenbankDNA(gReader, ns);
>>>       while (seqI.hasNext()) {
>>>           RichSequence seq = seqI.nextRichSequence();
>>>           String organism = seq.getTaxon().getDisplayName();
>>>           String accession = seq.getAccession();
>>>           String identifier = seq.getIdentifier();
>>>           int taxonID = seq.getTaxon().getNCBITaxID();
>>>           String division = seq.getDivision();
>>>           String seqVersion = "" + seq.getSeqVersion();
>>>           int seqLength = seq.length();
>>>           String description = seq.getDescription();
>>>           System.out.println("Organism: " + organism
>>>                   + "\nAccession: " + accession
>>>                   + "\nIdentifier: " + identifier
>>>                   + "\nTaxonID: " + taxonID
>>>                   + "\nDivision: " + division
>>>                   + "\nSeqVersion: " + seqVersion
>>>                   + "\nLength: " + seqLength);
>>>           System.out.println("2041-2101: " + seq.subStr(2041, 2101));
>>>           for (Iterator i = seq.features(); i.hasNext();) {
>>>               RichFeature f = (RichFeature) i.next();
>>>               int rank = f.getRank();
>>>               String fType = f.getType();
>>>               if (fType.toLowerCase().equals("gene")) {
>>>                   int startPos=f.getLocation().getMin();
>>>                   int endPos=f.getLocation().getMax();
>>>                   int geneLen=endPos-startPos+1;
>>>                   String sequence=seq.subStr(startPos, endPos);
>>>                   String strand = f.getStrand().getToken() + "";
>>>                   Annotation ann = (Annotation) f.getAnnotation();
>>>                   String geneIdentifier ="";
>>>                   if (ann.containsProperty("locus_tag")) {
>>>                       geneIdentifier=ann.getProperty("locus_tag") + "";
>>>                   }
>>>                   else geneIdentifier=ann.getProperty("gene") + "";
>>>
>>>                   String alternativeIdentifiers="";
>>>                   try {
>>>                       alternativeIdentifiers= (String)
>>> ann.getProperty("gene");
>>>
>>>                   } catch(NoSuchElementException e) {}
>>>                   String annotation="";
>>>                   System.out.println(rank + "\t" + geneIdentifier + "\t" +
>>> alternativeIdentifiers + "\t"
>>>                           + startPos + "\t" + endPos + "\t" + geneLen +
>>> "\t" + strand);
>>>               }
>>>           }
>>>       }
>>>   }
>>>   public static void main(String [] args) throws Exception {
>>>      TestMus tm=new TestMus();
>>>       tm.testMusChr2();
>>>   }
>>> }
>
>   


From marcel.huntemann at gmail.com  Thu Jan 29 20:55:15 2009
From: marcel.huntemann at gmail.com (Marcel Huntemann)
Date: Thu, 29 Jan 2009 12:55:15 -0800
Subject: [Biojava-l] Problem with blast file parser
In-Reply-To: <4979ED9C.6040207@imbusch.net>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net>
Message-ID: <498217B3.4010703@Gmail.com>

Hi Charles!

I've "found" a solution now. After struggling for a couple of days with the
terrible XML output of BLAST and BioJava's BlastXMLParser (which also
wasn't working properly), I decided to look at the source code and
try to figure out myself what was wrong with the BlastLikeSAXParser. So I
checked out the current source code via anonymous SVN
checkout (howto here: http://biojava.org/wiki/CVS_to_SVN_Migration). After
a couple of hours without finding an error that could cause this
behavior, I decided to just give it a try and compiled the checked-out
source with ant. I then used the newly created biojava.jar and suddenly
everything worked perfectly!
So, whatever the error was (unfortunately I don't have the old source code
to diff against), it has already been fixed in the
latest version in the Subversion repository.
Try it out!

Cheers,
Marcel


Charles Imbusch wrote:
> Hello Marcel,
> 
> I am also experiencing the problem that the parser skips
> the even-numbered results. I have not found a satisfactory solution
> for that, so I gave up on parsing a BLAST result file containing
> multiple results. Instead, I split the big FASTA file into
> several smaller ones, so that I get just one result per FASTA file.
> That works, even if it's not the best solution.
> 
> Let me know if you find another solution for that problem.
> 
> Cheers,
> Charles
> 
> 
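
Charles's workaround of running one query per file needs a small splitter. This is a minimal plain-Java sketch, not a BioJava API: it starts a new record at every line beginning with '>' and ignores any text before the first header. Writing each record to its own file and running BLAST on it is left to the caller.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class FastaSplitter {

    /** Splits multi-FASTA text into one string per record (header line plus sequence lines). */
    public static List<String> split(Reader in) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null;
        try (BufferedReader br = new BufferedReader(in)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.startsWith(">")) {
                    if (current != null) {
                        records.add(current.toString());   // finish the previous record
                    }
                    current = new StringBuilder();
                }
                if (current != null) {                     // ignore text before the first header
                    current.append(line).append('\n');
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (current != null) {
            records.add(current.toString());               // last record
        }
        return records;
    }

    public static void main(String[] args) {
        String fasta = ">q1\nACGT\n>q2\nGGCC\nTTAA\n";
        List<String> records = split(new StringReader(fasta));
        System.out.println(records.size() + " records");  // prints "2 records"
        // Each element could then be written to its own file and searched separately.
    }
}
```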


From andreas at sdsc.edu  Thu Jan 29 21:11:51 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 29 Jan 2009 13:11:51 -0800
Subject: [Biojava-l] Problem with blast file parser
In-Reply-To: <498217B3.4010703@Gmail.com>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net>
	<498217B3.4010703@Gmail.com>
Message-ID: <59a41c430901291311w642a98am68b33cc6c78b15f1@mail.gmail.com>

Hi,

We have had a couple of bug reports recently about issues that have already
been fixed in the latest BioJava builds from SVN. I think it is time to
start preparing the next BioJava release (1.7) to make sure
everybody gets the latest fixes...

Andreas


On Thu, Jan 29, 2009 at 12:55 PM, Marcel Huntemann
<marcel.huntemann at gmail.com> wrote:
> Hi Charles!
>
> I've "found" a solution now. After dealing a couple of days with the
> terrible xml output of blast and BioJava's BlastXMLParser (which also
> wasn't working properly), I decided to have a look at the source code and
> try to figure out myself what was wrong with the BlastLikeSAXParser. So I
>  checked out the present status of the source code via the anonymous svn
> checkout (howto here: http://biojava.org/wiki/CVS_to_SVN_Migration). After
> a couple of hours and me not finding an error that could cause this
> behavior, I thought I'll just give it a try and compiled the checked out
> source via ant. Then used the new created biojava.jar and suddenly
> everything went perfectly!
> So, whatever the error was (unfortunately I don't have the old source code
> to make a diff on certain files), it is already corrected in the
> up-to-the-minute version in the subversion system.
> Try it out!
>
> Cheers,
> Marcel
>
>
> Charles Imbusch wrote:
>> Hello Marcel,
>>
>> I am also experiencing the problem that the parser skips
>> the even-numbered results. I have not found a satisfactory solution
>> for that, so I gave up on parsing a BLAST result file containing
>> multiple results. Instead, I split the big FASTA file into
>> several smaller ones, so that I get just one result per FASTA file.
>> That works, even if it's not the best solution.
>>
>> Let me know if you find another solution for that problem.
>>
>> Cheers,
>> Charles
>>
>>
>


From marcel.huntemann at gmail.com  Thu Jan 29 21:24:55 2009
From: marcel.huntemann at gmail.com (Marcel Huntemann)
Date: Thu, 29 Jan 2009 13:24:55 -0800
Subject: [Biojava-l] Problem with blast file parser
In-Reply-To: <59a41c430901291311w642a98am68b33cc6c78b15f1@mail.gmail.com>
References: <497913F9.70009@Gmail.com> <4979ED9C.6040207@imbusch.net>	
	<498217B3.4010703@Gmail.com>
	<59a41c430901291311w642a98am68b33cc6c78b15f1@mail.gmail.com>
Message-ID: <49821EA7.9090603@Gmail.com>

That sounds reasonable. I bet a lot of people would appreciate that!

Andreas Prlic wrote:
> Hi,
> 
> We had a couple of bug reports recently regarding issues that already
> got fixed in the latest biojava builds from SVN. I think it is time to
> start preparing the next biojava release ( 1.7 )  to make sure
> everybody gets up to the latest status...
> 
> Andreas


From marcin.swiatek at mail.mcgill.ca  Thu Jan 29 21:56:29 2009
From: marcin.swiatek at mail.mcgill.ca (Marcin Swiatek)
Date: Thu, 29 Jan 2009 16:56:29 -0500
Subject: [Biojava-l] Problem with blast file parser
In-Reply-To: <49821EA7.9090603@Gmail.com>
References: <497913F9.70009@Gmail.com>
	<4979ED9C.6040207@imbusch.net>	<498217B3.4010703@Gmail.com><59a41c430901291311w642a98am68b33cc6c78b15f1@mail.gmail.com>
	<49821EA7.9090603@Gmail.com>
Message-ID: <176A06E658ED0745965C072C5F2C116A02F87314@EXCHANGE2VS2.campus.mcgill.ca>

I personally would, especially since I have just solved the problem
myself, unaware that someone had already done so.

BTW: the problem I found (which seems similar to the one described)
was that the line starting a new data set (as detected by checkNewBlastLikeDataSet
in BlastSAXParser) was not handled by HitSectionSAXParser, nor did
it percolate up to BlastSAXParser, leaving the parser's state machine
in an inconsistent state. The parser would recover by skipping everything up
to the next data set, which is why only every other result was
processed.
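
The failure mode described above, where a sub-parser swallows the line that starts the next data set, can be reproduced with a toy line-based parser. This is not BioJava's BlastSAXParser code; the "Query=" marker and the class are assumptions chosen for illustration. Without push-back, the hit-section loop consumes the next header, and the outer loop only resynchronises at the following one, so every other result is lost.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkipEveryOtherDemo {

    private static final String HEADER = "Query=";

    /**
     * Toy parser: each result starts with a "Query=" header followed by hit lines.
     * When pushBack is false, the hit-section loop consumes the next header without
     * handing it back, mimicking the bug described above.
     */
    public static List<String> parse(List<String> lines, boolean pushBack) {
        List<String> queries = new ArrayList<>();
        int i = 0;
        while (i < lines.size()) {
            String line = lines.get(i++);
            if (!line.startsWith(HEADER)) {
                continue;                  // outer loop skips lines until it sees a header
            }
            queries.add(line.substring(HEADER.length()).trim());
            // Hit-section sub-parser: consume hit lines until the next header.
            while (i < lines.size()) {
                String hit = lines.get(i++);
                if (hit.startsWith(HEADER)) {
                    if (pushBack) {
                        i--;               // give the header back to the outer loop
                    }
                    break;                 // without push-back this header is lost
                }
            }
        }
        return queries;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "Query= q1", "hit a", "Query= q2", "hit b",
                "Query= q3", "hit c", "Query= q4", "hit d");
        System.out.println(parse(input, false)); // prints [q1, q3]
        System.out.println(parse(input, true));  // prints [q1, q2, q3, q4]
    }
}
```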

BTW2: the XML parser in 1.6 doesn't handle output from newer BLAST versions
either (2.19, was it?). Has this been fixed in the SVN repository?

Cheers, 

Marcin


-----Original Message-----
From: biojava-l-bounces at lists.open-bio.org
[mailto:biojava-l-bounces at lists.open-bio.org] On Behalf Of Marcel
Huntemann
Sent: Thursday, January 29, 2009 4:25 PM
To: Andreas Prlic
Cc: biojava-dev; biojava-l at biojava.org
Subject: Re: [Biojava-l] Problem with blast file parser

That sounds reasonable. I bet a lot of people would appreciate that!

Andreas Prlic wrote:
> Hi,
> 
> We had a couple of bug reports recently regarding issues that already
> got fixed in the latest biojava builds from SVN. I think it is time to
> start preparing the next biojava release ( 1.7 )  to make sure
> everybody gets up to the latest status...
> 
> Andreas


From umanga.bio at gmail.com  Fri Jan 30 12:00:41 2009
From: umanga.bio at gmail.com (Ashika Umanga Umangiliya)
Date: Fri, 30 Jan 2009 21:00:41 +0900
Subject: [Biojava-l] Reading AB1 files and drawing chromatogram ?
Message-ID: <b4d9edaa0901300400m41f1b19dy764da72db6162067@mail.gmail.com>

Greetings all,

In the application I am developing, I want to draw a chromatogram from an AB1
file. I come from a computer science background and have little knowledge of
this subject. Where can I find information on this?
Can I draw the graph using the data in the AB1 file, or is there an existing function for it?


thanks in advance,

Umanga


From ayates at ebi.ac.uk  Fri Jan 30 12:57:40 2009
From: ayates at ebi.ac.uk (Andy Yates)
Date: Fri, 30 Jan 2009 12:57:40 +0000
Subject: [Biojava-l] Reading AB1 files and drawing chromatogram ?
In-Reply-To: <b4d9edaa0901300400m41f1b19dy764da72db6162067@mail.gmail.com>
References: <b4d9edaa0901300400m41f1b19dy764da72db6162067@mail.gmail.com>
Message-ID: <4982F944.7080905@ebi.ac.uk>

Hi Umanga,

Fortunately, BioJava has an API for drawing chromatograms, located under
org.biojava.bio.chromatogram and org.biojava.bio.chromatogram.graphic. To
parse an AB1 file and draw it, you can run the following code:

import java.awt.Graphics2D;
import java.io.*;
import org.biojava.bio.program.abi.ABIFChromatogram;
import org.biojava.bio.chromatogram.*;
import org.biojava.bio.chromatogram.graphic.*;

File file = new File("chr.ab1");
Chromatogram c = ABIFChromatogram.parse(file);
ChromatogramGraphic cg = new ChromatogramGraphic(c);
// Can't remember how to get this, so you'll have to find out
Graphics2D context = getContextFromSomewhere();
// Draw onto the graphics context (not onto the ChromatogramGraphic itself)
cg.drawTo(context);
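
One common way to obtain a Graphics2D for the snippet above is to draw into an in-memory java.awt.image.BufferedImage and save it with javax.imageio.ImageIO. The sketch below is plain JDK code. The Consumer<Graphics2D> renderer is a stand-in for the BioJava call: with BioJava on the classpath you would pass something like cg::drawTo, assuming drawTo takes a single Graphics2D as in the snippet. The image size and output file name are arbitrary assumptions.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.function.Consumer;
import javax.imageio.ImageIO;

public class RenderToImage {

    /** Creates an RGB image, lets the renderer paint on its Graphics2D, and returns the image. */
    public static BufferedImage render(int width, int height, Consumer<Graphics2D> renderer) {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g2 = image.createGraphics();
        try {
            g2.setColor(Color.WHITE);               // white background
            g2.fillRect(0, 0, width, height);
            renderer.accept(g2);                    // e.g. cg::drawTo with BioJava on the classpath
        } finally {
            g2.dispose();                           // always release the graphics context
        }
        return image;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder renderer standing in for the chromatogram drawing call.
        BufferedImage img = render(400, 100, g -> {
            g.setColor(Color.BLUE);
            g.drawLine(0, 50, 399, 50);
        });
        ImageIO.write(img, "png", new File("output.png"));
    }
}
```

Rendering off-screen like this also works on a headless server, since no window is involved.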

You can configure the size of the image through the ChromatogramGraphic
object & alter a number of ChromatogramGraphic.Option attributes through
ChromatogramGraphic.setOption(ChromatogramGraphic.Option opt, Object value).

This should be enough to get you going. I will warn you that this class
is quite memory intensive, and an application I wrote ages ago had very big
memory problems because of it (the drawing component, not the file
parsing). An alternative library is available from
http://code.google.com/p/bioview2/ (developed by an old
colleague).

Try the biojava code first and if that serves your purpose then great;
if not then try bioview2.

Regards,

Andy Yates

P.S. The AB1 parser only supports the processed data channels in the AB1
file. If you want the raw data from it then you will have to modify the
source or use another library (probably the C library StadenIO) to
convert the raw data into an SCF file.

Ashika Umanga Umangiliya wrote:
> Greetings all,
> 
> In the application I develop ,I  want to draw chromatogram from AB1. I come
> from computer science background have little knowledge this subject.Where
> can I find information on this?
> Can I draw the graph using data in AB1 file? Or is there any function ?
> 
> 
> thanks in advance,
> 
> Umanga


From ahmed.elmasri at gmail.com  Sat Jan 31 02:41:31 2009
From: ahmed.elmasri at gmail.com (Hamed, Ahmed A.)
Date: Fri, 30 Jan 2009 21:41:31 -0500
Subject: [Biojava-l] Sequence start/end location
Message-ID: <5cdd31570901301841i5bee9ba7l56d86afdddb665f5@mail.gmail.com>

Hello list,
I am trying to find the start and end locations of a gene in a gene
sequence. I am reading from a gene FASTA database file. Is there a built-in
method that I can use? The alternative is really painful, since I have to
parse a PTT file, and that is not quite working for me.
Thanks very much!
Ahmed


From markjschreiber at gmail.com  Sat Jan 31 04:15:59 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Sat, 31 Jan 2009 12:15:59 +0800
Subject: [Biojava-l] Sequence start/end location
In-Reply-To: <5cdd31570901301841i5bee9ba7l56d86afdddb665f5@mail.gmail.com>
References: <5cdd31570901301841i5bee9ba7l56d86afdddb665f5@mail.gmail.com>
Message-ID: <93b45ca50901302015l4134dce3g684b26cd18c14c21@mail.gmail.com>

Hi -

Unfortunately your FASTA file won't contain any feature information
which could tell you the start and end. If you don't want to get the
info from the PTT file you might want to look at parsing the Genbank
file instead which will have the feature information.

A PTT parser might not be a bad thing for BioJava though. If you write
one please consider adding it.
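
As a possible starting point for such a parser, here is a minimal plain-Java sketch. It assumes the usual NCBI .ptt layout: a title line, a protein-count line, a tab-separated column header (Location, Strand, Length, PID, Gene, Synonym, Code, COG, Product), then one row per protein with the location written as start..end. PttParser and PttEntry are illustrative names, not existing BioJava classes, and unusual locations (for example genes spanning the origin) are not handled.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class PttParser {

    /** One protein row of a .ptt table; coordinates are 1-based and inclusive. */
    public static final class PttEntry {
        public final int start;
        public final int end;
        public final char strand;
        public final String gene;
        public final String synonym;
        public final String product;

        PttEntry(int start, int end, char strand, String gene, String synonym, String product) {
            this.start = start;
            this.end = end;
            this.strand = strand;
            this.gene = gene;
            this.synonym = synonym;
            this.product = product;
        }
    }

    /** Parses a .ptt table, skipping its three header lines (title, count, column names). */
    public static List<PttEntry> parse(Reader in) {
        List<PttEntry> entries = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(in)) {
            String line;
            int lineNo = 0;
            while ((line = br.readLine()) != null) {
                lineNo++;
                if (lineNo <= 3 || line.trim().isEmpty()) {
                    continue;
                }
                String[] cols = line.split("\t", -1);
                String[] loc = cols[0].split("\\.\\.");
                if (cols.length < 9 || loc.length != 2) {
                    throw new IllegalArgumentException("Malformed PTT line " + lineNo + ": " + line);
                }
                entries.add(new PttEntry(
                        Integer.parseInt(loc[0].trim()),
                        Integer.parseInt(loc[1].trim()),
                        cols[1].trim().charAt(0),   // '+' or '-'
                        cols[4], cols[5], cols[8]));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }

    public static void main(String[] args) {
        // Made-up two-protein table in the .ptt layout
        String ptt = "Example genome - 1..2000\n"
                + "2 proteins\n"
                + "Location\tStrand\tLength\tPID\tGene\tSynonym\tCode\tCOG\tProduct\n"
                + "100..399\t+\t99\t11111\tgeneA\tEX_0001\t-\t-\thypothetical protein\n"
                + "500..1099\t-\t199\t22222\tgeneB\tEX_0002\t-\t-\thypothetical protein\n";
        for (PttEntry e : parse(new StringReader(ptt))) {
            System.out.println(e.synonym + "\t" + e.start + "\t" + e.end + "\t" + e.strand);
        }
    }
}
```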

- Mark

On Sat, Jan 31, 2009 at 10:41 AM, Hamed, Ahmed A.
<ahmed.elmasri at gmail.com> wrote:
>
> Hello list,
> I am trying to get find the start and end location of a gene in a gene
> sequence. I am reading from a gene FASTA database file. Is there a built-in
> method that I can use? The alternative is really painful since I have to
> parse a ptt file and not exactly working for me.
> Thanks very much!
> Ahmed


From markjschreiber at gmail.com  Sat Jan 31 10:25:59 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Sat, 31 Jan 2009 18:25:59 +0800
Subject: [Biojava-l] Sequence start/end location
In-Reply-To: <5cdd31570901302313l12c490bct6768979ce562d40b@mail.gmail.com>
References: <5cdd31570901301841i5bee9ba7l56d86afdddb665f5@mail.gmail.com>
	<93b45ca50901302015l4134dce3g684b26cd18c14c21@mail.gmail.com>
	<5cdd31570901302313l12c490bct6768979ce562d40b@mail.gmail.com>
Message-ID: <93b45ca50901310225p6676c282m203e8c4e13ba37f1@mail.gmail.com>

Hi Ahmed -

For a first-time contribution it would probably be easiest to post
something to the list, and someone with a development account can check
it in for you.

Please make sure to add Javadoc comments and a basic JUnit test for
any classes you make.

- Mark

On Sat, Jan 31, 2009 at 3:13 PM, Hamed, Ahmed A.
<ahmed.elmasri at gmail.com> wrote:
> Dear Mark,
> Thank you for your response. I would be happy to contribute my PTTParser if
> you point me to where/how to check it in. I am still new to the BioJava
> community and there is so much to learn.
> Best wishes,
> Ahmed
>
> On Fri, Jan 30, 2009 at 11:15 PM, Mark Schreiber <markjschreiber at gmail.com>
> wrote:
>>
>> Hi -
>>
>> Unfortunately your FASTA file won't contain any feature information
>> which could tell you the start and end. If you don't want to get the
>> info from the PTT file, you might want to look at parsing the GenBank
>> file instead, which will have the feature information.
>>
>> A PTT parser might not be a bad thing for BioJava though. If you write
>> one please consider adding it.
>>
>> - Mark
>>
>> On Sat, Jan 31, 2009 at 10:41 AM, Hamed, Ahmed A.
>> <ahmed.elmasri at gmail.com> wrote:
>> >
>> > Hello list,
>> > I am trying to find the start and end location of a gene in a gene
>> > sequence. I am reading from a gene FASTA database file. Is there a built-in
>> > method that I can use? The alternative is really painful since I have to
>> > parse a PTT file, which isn't quite working for me.
>> > Thanks very much!
>> > Ahmed
>
>
>
> --
> Ahmed Abdeen Hamed
> Scientific Informatics Project Leader
> Marine Biological Laboratory Woods Hole, MA
> --
> Ph.D. student, Complex Systems
> School of Informatics, Indiana University