From KPetrov at ics.uci.edu  Sat Nov  1 14:44:48 2003
From: KPetrov at ics.uci.edu (Kirill Petrov)
Date: Sat Nov  1 14:41:20 2003
Subject: [Biojava-l] hmm profile comparison
Message-ID: <1067715888.7337.15.camel@kirill.homedns.org>

Hello All,

Is there a way to compare two existing HMM profiles using the BioJava API?
Or perhaps there is another type of profiling system that allows
comparison of profiles rather than sequences?

Kirill

From mark.schreiber at agresearch.co.nz  Sat Nov  1 16:29:06 2003
From: mark.schreiber at agresearch.co.nz (Schreiber, Mark)
Date: Sat Nov  1 16:25:58 2003
Subject: [Biojava-l] hmm profile comparison
Message-ID: <AF026AF0FF4B054590228FD1F1DE516501BF1D1B@inbox.agresearch.co.nz>

If you want to see if two profiles have the same parameters you can get the Distributions from each state and use the DistributionTools.areEmissionSpectraEqual() method to tell you if they are the same. You should also test that the Distribution that holds the transitions is equal.
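A minimal sketch of the state-by-state check described above, written in plain Java rather than BioJava. It assumes each state's emission and transition distributions have already been copied into simple symbol-to-probability maps, and the class and method names are hypothetical. With real BioJava objects the per-distribution test would be DistributionTools.areEmissionSpectraEqual(), as Mark says. The tolerance parameter is a design choice of this sketch, meant to absorb rounding in trained weights.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch, not BioJava code: each distribution is a plain map from
 * symbol (or target state) to probability.
 */
public final class ProfileEquality {
    private ProfileEquality() {}

    /** True if both maps have the same keys and each probability differs by at most eps. */
    public static boolean distributionsEqual(Map<String, Double> a, Map<String, Double> b, double eps) {
        if (!a.keySet().equals(b.keySet())) {
            return false;
        }
        for (Map.Entry<String, Double> e : a.entrySet()) {
            if (Math.abs(e.getValue() - b.get(e.getKey())) > eps) {
                return false;
            }
        }
        return true;
    }

    /**
     * Two profiles match if every state's emission distribution and every
     * state's transition distribution match. Lists are indexed by state, so
     * both profiles must list their states in the same order.
     */
    public static boolean profilesEqual(List<Map<String, Double>> emissionsA,
                                        List<Map<String, Double>> transitionsA,
                                        List<Map<String, Double>> emissionsB,
                                        List<Map<String, Double>> transitionsB,
                                        double eps) {
        return allEqual(emissionsA, emissionsB, eps) && allEqual(transitionsA, transitionsB, eps);
    }

    private static boolean allEqual(List<Map<String, Double>> xs, List<Map<String, Double>> ys, double eps) {
        if (xs.size() != ys.size()) {
            return false;
        }
        for (int i = 0; i < xs.size(); i++) {
            if (!distributionsEqual(xs.get(i), ys.get(i), eps)) {
                return false;
            }
        }
        return true;
    }
}
```

Note that this only answers "are they the same?"; it gives no notion of how far apart two profiles are.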
 
- Mark
 

	-----Original Message----- 
	From: Kirill Petrov [mailto:KPetrov@ics.uci.edu] 
	Sent: Sun 2/11/2003 8:44 a.m. 
	To: biojava-l@biojava.org 
	Cc: 
	Subject: [Biojava-l] hmm profile comparison
	
	
	Hello All,
	
	is there a way to compare 2 existing hmm profiles using biojava api?
	Or probably there is another type of profiling system that allows
	comparisons of the profiles rather than sequences?
	
	Kirill
	
	_______________________________________________
	Biojava-l mailing list  -  Biojava-l@biojava.org
	http://biojava.org/mailman/listinfo/biojava-l
	

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From KPetrov at ics.uci.edu  Sat Nov  1 20:12:40 2003
From: KPetrov at ics.uci.edu (Kirill Petrov)
Date: Sat Nov  1 20:09:10 2003
Subject: [Biojava-l] hmm profile comparison
In-Reply-To: <AF026AF0FF4B054590228FD1F1DE516501BF1D1B@inbox.agresearch.co.nz>
References: <AF026AF0FF4B054590228FD1F1DE516501BF1D1B@inbox.agresearch.co.nz>
Message-ID: <1067735560.7337.99.camel@kirill.homedns.org>

> 	is there a way to compare 2 existing hmm profiles using biojava api?
> 	Or probably there is another type of profiling system that allows
> 	comparisons of the profiles rather than sequences?

On Sun, 2003-11-02 at 02:29, Schreiber, Mark wrote:
> If you want to see if two profiles have the same parameters you can
> get the Distributions from each state and use the 
> DistributionTools.areEmissionSpectraEqual() method to tell you if 
> they are the same. You should also test that the Distribution that
>  holds the transitions is equal.
As far as I understand, that would tell me whether two profiles are equal
or not. The problem, however, is measuring the distance between two
profiles. Basically, I want to use HMMs to separate a group of
sequences into two distinct groups. Is that possible?

Kirill

	
From hr_malmi at hotmail.com  Sun Nov  2 22:07:42 2003
From: hr_malmi at hotmail.com (harald malming)
Date: Sun Nov  2 22:04:28 2003
Subject: [Biojava-l] Dot states in SimpleMarkovModel
Message-ID: <BAY8-F1112qgrZCThXN00019ec5@hotmail.com>

Hi there, can anyone tell me whether it is possible to have more than one dot
state in a SimpleMarkovModel? As soon as I add a second dot state, the
"DP dp = DPFactory.DEFAULT.createDP(...);" call never completes.

I would really appreciate some help,
Harry


From matthew_pocock at yahoo.co.uk  Mon Nov  3 08:08:23 2003
From: matthew_pocock at yahoo.co.uk (Matthew Pocock)
Date: Mon Nov  3 08:11:08 2003
Subject: [Biojava-l] Dot states in SimpleMarkovModel
In-Reply-To: <BAY8-F1112qgrZCThXN00019ec5@hotmail.com>
References: <BAY8-F1112qgrZCThXN00019ec5@hotmail.com>
Message-ID: <3FA65347.6020308@yahoo.co.uk>

Hi,

This is a known (and fixed) bug in the 1.3 release. Guys, could we get
that maintenance release out?

Matthew

harald malming wrote:

> hi there, can anyone tell me if it is possible to have more than one 
> dot state in a simpleMarkovModel. As soon as I add a second dot state, 
> the : "DP dp=DPFactory.DEFAULT.createDP(...)";  call never completes.
>
> I would really appreciate some help,
> Harry
>


From valentin_ruano at yahoo.es  Mon Nov  3 14:55:34 2003
From: valentin_ruano at yahoo.es (Valentin Ruano)
Date: Mon Nov  3 14:52:19 2003
Subject: [Biojava-l] SequencePoster
Message-ID: <20031103195534.58456.qmail@web41902.mail.yahoo.com>

Hi everyone,

I plan to develop a small bioinformatics application
in Java with a Swing UI.

First, I would like to be able to show a single
sequence and multiple sequence alignments.

Since the sequence could be rather long, a
multiline display such as SequencePoster is best.
The problem I am experiencing with SequencePoster is
that apparently it is not possible to control
the number of columns it displays per line, that is,
the "desirable" line length. Moreover, the line length
seems to be chosen so as to minimise the number of
blank positions at the right end of the last line.

I tried to set the maximum size for the poster component
and also for its container panel, but that does not stop it
from spanning beyond the left and right frame limits when
the line length is too big.

Finally, I tried to use setLines(0) for automatic calculation
of the number of lines depending on the space available, as
indicated in the JavaDoc API. But it just throws an
OutOfMemoryError. The same happens with negative
line counts.

output:

No sequence
Fitting to sequence
Initial width: 0
alongDim (pixles needed for sequence only): 48.0
Fitting to sequence
Initial width: 0
alongDim (pixles needed for sequence only): 48.0
java.lang.OutOfMemoryError
Exception in thread "main" 
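The debug output reports "Initial width: 0" just before the OutOfMemoryError. One unconfirmed reading is that the automatic line calculation runs before the component has a usable width. The plain-Java sketch below (hypothetical names, not SequencePoster's code) shows the arithmetic such a calculation needs: whole symbols per line from the available pixel width, then a rounded-up line count. It refuses to run when the width is zero, the case where a naive version could end up with zero symbols per line.

```java
/**
 * Hypothetical sketch, not SequencePoster's implementation: derives a line
 * layout from the available pixel width and the pixel width of one symbol.
 */
public final class LineLayout {
    private LineLayout() {}

    /** Number of whole symbols that fit on one line (at least one). */
    public static int symbolsPerLine(int availableWidthPx, double symbolWidthPx) {
        if (availableWidthPx <= 0 || symbolWidthPx <= 0) {
            // No width yet: refuse rather than compute a layout with 0 symbols per line.
            throw new IllegalStateException("component has no width yet; lay out after it is realized");
        }
        return Math.max(1, (int) Math.floor(availableWidthPx / symbolWidthPx));
    }

    /** Lines needed to show seqLength symbols, rounding up. */
    public static int linesNeeded(int seqLength, int availableWidthPx, double symbolWidthPx) {
        int perLine = symbolsPerLine(availableWidthPx, symbolWidthPx);
        return (seqLength + perLine - 1) / perLine;
    }
}
```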

-------------
  
About the alignment issue: there is another
JComponent for pairwise alignment, so just two
sequences. What about multiple sequence alignments; are
there any plans for this?


    thanks and regards, Valentin.


From mark.schreiber at agresearch.co.nz  Mon Nov  3 17:58:23 2003
From: mark.schreiber at agresearch.co.nz (Schreiber, Mark)
Date: Mon Nov  3 18:00:59 2003
Subject: [Biojava-l] Serialization of SimpleSequences
Message-ID: <AF026AF0FF4B054590228FD1F1DE516501BF1D24@inbox.agresearch.co.nz>

This problem is now resolved in the biojava1.3 branch. Features are serializing nicely and still behaving like features at the end of it all.
 
I'm still working on a solution in biojava-live, which involves delving into the bowels of the ontology Term implementations to find out why they won't deserialize. There is some problem with a HashSet throwing a wobbly when it calls hashCode() on Term$Impl and getting a NullPointerException, which shouldn't be possible as far as I can tell. Matthew, do you know what might be going on? I might need to write a readObject method for Term$Impl to hold its hand, but hopefully not.
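Mark's idea of giving a term class its own readObject method can be sketched in plain Java. The class below is a hypothetical stand-in, not BioJava's Term$Impl. Its hashCode() depends on a field, so the hash set of related terms is marked transient, written out as a plain list, and rebuilt only after defaultReadObject() has restored that field. When terms refer to each other, this keeps the set from hashing a half-restored object. Whether it addresses the actual Term$Impl failure is an assumption. Without the custom writeObject/readObject pair, the transient set would simply come back null with no exception, which resembles the silent loss Vasa saw with FeatureHolder.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

/** Hypothetical stand-in for an ontology term; not BioJava's Term$Impl. */
public class TermSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String name;
    // transient: rebuilt by readObject() once 'name' (used by hashCode) is restored
    private transient Set<TermSketch> related = new HashSet<>();

    public TermSketch(String name) {
        this.name = Objects.requireNonNull(name);
    }

    public String getName() { return name; }
    public Set<TermSketch> getRelated() { return related; }
    public void addRelated(TermSketch other) { related.add(other); }

    @Override
    public boolean equals(Object o) {
        return o instanceof TermSketch && ((TermSketch) o).name.equals(name);
    }

    @Override
    public int hashCode() {
        return name.hashCode(); // would throw NullPointerException if called before 'name' is set
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();                  // writes 'name'
        out.writeObject(new ArrayList<>(related)); // the set travels as a plain list
    }

    @SuppressWarnings("unchecked")
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();                    // restores 'name' before anything is hashed
        List<TermSketch> list = (List<TermSketch>) in.readObject();
        related = new HashSet<>(list);             // hashing happens only now
    }

    /** Serializes and deserializes a term in memory. */
    public static TermSketch roundTrip(TermSketch term) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(term);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (TermSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }
}
```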
 
- Mark
 

	-----Original Message----- 
	From: Schreiber, Mark 
	Sent: Tue 28/10/2003 3:40 p.m. 
	To: Vasa Curcin; biojava-l@biojava.org 
	Cc: 
	Subject: RE: [Biojava-l] Serialization of SimpleSequences
	
	
	Hi -
	
	I thought we had fixed that one, although it turns out the unit test was a bit inadequate. Generally it's not a good idea to make an interface extend Serializable, as there may be a perfectly valid implementation that can't be serializable. It's probably better to make as many of the implementations as possible serializable. I'll have a look at this.
	
	- Mark
	
	
	        -----Original Message-----
	        From: Vasa Curcin [mailto:vc100@doc.ic.ac.uk]
	        Sent: Tue 28/10/2003 2:06 p.m.
	        To: biojava-l@biojava.org
	        Cc:
	        Subject: [Biojava-l] Serialization of SimpleSequences
	       
	       
	        Hello,
	       
	        While transferring some sequences via serialization, I noticed that all
	        my Features are getting lost. After some digging around, it seems as if
	        Java doesn't serialize the FeatureHolder inside the sequence (I was
	        working with SimpleSequence objects). Even though a
	        NotSerializableException is not thrown, the FeatureHolder is missing.
	        After making the FeatureHolder interface in Biojava extend Serializable,
	        the problem disappeared. How much will this change affect the rest of
	        the code - ie. is there a good reason why FeatureHolders are not
	        serializable?
	       
	        Cheers,
	        Vasa
	       
	       
	

From kalle.naslund at genpat.uu.se  Wed Nov  5 04:52:38 2003
From: kalle.naslund at genpat.uu.se (Kalle Näslund)
Date: Wed Nov  5 04:49:32 2003
Subject: [Biojava-l] SequencePoster
In-Reply-To: <20031103195534.58456.qmail@web41902.mail.yahoo.com>
References: <20031103195534.58456.qmail@web41902.mail.yahoo.com>
Message-ID: <3FA8C866.5030609@genpat.uu.se>

Valentin Ruano wrote:

>< TEXT REMOVED BY KALLE >
>  
>About the alignment issue. There is this other
>JComponent for pairwise alignment so just two
>sequences. What about multi sequence alignments, is
>there any plans for this?
>  
>

BioJava already renders multiple sequence alignments; please
look at

http://biojava.org/pipermail/biojava-l/2003-May/003801.html

for a description of how you can do it. Hopefully that
mailing list entry will at least get you started.

regards Kalle


From mark.schreiber at agresearch.co.nz  Wed Nov  5 18:05:47 2003
From: mark.schreiber at agresearch.co.nz (Schreiber, Mark)
Date: Wed Nov  5 18:02:29 2003
Subject: [Biojava-l] Nearing biojava 1.3.1 release
Message-ID: <AF026AF0FF4B054590228FD1F1DE516501BF1D2C@inbox.agresearch.co.nz>

Hi -

There have been a lot of calls for a biojava 1.3.1 maintenance release. The good news is I'm just about ready. I just need to sort out some merging of the DP code from the biojava-live version.

Before I put it out I would really like to get something resolved. Currently DNATools.a() == RNATools.a(). This is due to the way that the Symbols are declared in AlphabetManager.xml. I personally think this is counterintuitive and wrong. It is very easily fixable (if it needs fixing). Unless anybody disagrees, I would like to make it so they are not canonical. Note that this has implications for the NUCLEOTIDE alphabet, which might mean such a move is not desirable, so please speak now or forever hold your peace.

There have been a lot of bug fixes and minimal API breakage; there was a little, although it is unlikely to be noticed by most people. I don't think I will be putting out a version for Java 1.3 either.

- Mark


From vc100 at doc.ic.ac.uk  Thu Nov  6 00:53:12 2003
From: vc100 at doc.ic.ac.uk (Vasa Curcin)
Date: Thu Nov  6 00:49:14 2003
Subject: [Biojava-l] NCBI database
Message-ID: <3FA9E1C8.80607@doc.ic.ac.uk>

Hello,

It seems that the NCBI class is not working anymore. I am using it to
retrieve some annotated sequences, and since this morning it has been
throwing a MalformedURLException. It seems the web interface to
NCBI has changed. Does anyone know anything about this?

Regards,
Vasa

From mark.schreiber at agresearch.co.nz  Thu Nov  6 19:47:41 2003
From: mark.schreiber at agresearch.co.nz (Schreiber, Mark)
Date: Thu Nov  6 19:44:58 2003
Subject: [Biojava-l] NCBI database
Message-ID: <AF026AF0FF4B054590228FD1F1DE5165016E7C92@inbox.agresearch.co.nz>

Hi -

Just noticed that the NCBI Entrez page has a new look and is at the URL http://www.ncbi.nih.gov/gquery/gquery.fcgi

I'm not sure if this is the problem (possibly a new URL). I'll check it out.

- Mark


> -----Original Message-----
> From: Vasa Curcin [mailto:vc100@doc.ic.ac.uk] 
> Sent: Thursday, 6 November 2003 6:53 p.m.
> To: biojava-l@biojava.org
> Subject: [Biojava-l] NCBI database
> 
> 
> Hello,
> 
> It seems that the NCBI class is not working anymore. I am using it to 
> retrieve some annotated sequences, and since this morning it is 
> returning a MalformedURLException. It seems like the web interface to 
> NCBI changed. Anyone knows something about this?
> 
> Regards,
> Vasa
> 

From mark.schreiber at agresearch.co.nz  Mon Nov 10 16:37:20 2003
From: mark.schreiber at agresearch.co.nz (Schreiber, Mark)
Date: Mon Nov 10 16:34:04 2003
Subject: [Biojava-l] Phrap ace format
Message-ID: <AF026AF0FF4B054590228FD1F1DE5165016E7C9C@inbox.agresearch.co.nz>

Hi -

Has anyone ever made a biojava parser for the phrap ace format?

- Mark


From dumontier at mshri.on.ca  Mon Nov 10 22:39:07 2003
From: dumontier at mshri.on.ca (Marc Dumontier)
Date: Mon Nov 10 22:42:05 2003
Subject: [Biojava-l] BLAST through servlets 
Message-ID: <490D0AFAF3D2D3119F6C00508B6FDF1501FA465A@ex.mshri.on.ca>

Hi,

I was wondering if anyone has implemented a blast interface using servlets,
and maybe applied a stylesheet to the XML output.

If anyone has already done this and can share some source code, that would be
greatly appreciated.
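One way to apply a stylesheet to BLAST XML output on the server side is the standard JAXP transformation API (javax.xml.transform). The helper below is a hypothetical sketch, not an existing BioJava or servlet component. It takes the XML document and the XSLT stylesheet as strings and returns the transformed text; inside a servlet you would presumably write the result to the response writer instead.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper: applies an XSLT stylesheet to an XML document. */
public final class XslTransform {
    private XslTransform() {}

    /** Both arguments are complete documents held in memory as strings. */
    public static String transform(String xml, String xslt) {
        try {
            // Compile the stylesheet, then run it over the input document.
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }
}
```

For repeated requests, compiling the stylesheet once with TransformerFactory.newTemplates() and creating a Transformer per request is the usual pattern, since a single Transformer is not thread-safe.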

thanks,
Marc Dumontier
Bioinformatics Software Developer
Blueprint Initiative
Mount Sinai Hospital
http://www.blueprint.org
(416)586-8505 x6311
 
From verhoeff2 at gis.a-star.edu.sg  Tue Nov 11 02:52:45 2003
From: verhoeff2 at gis.a-star.edu.sg (VERHOEF Frans)
Date: Tue Nov 11 08:17:17 2003
Subject: [Biojava-l] BLAST parsing explodes in size
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D560B041C@BIONIC.biopolis.one-north.com>

Hi,

I am having a problem parsing huge BLAST results. Basically I am parsing
the BLAST results in much the same way as in "Biojava in Anger", the only
difference being that I use setModeLazy() on the BlastLikeSAXParser,
since I am using NCBI BLAST version 2.2.4 and that version is not yet
recognised by the parser.

Besides that, the only difference lies in what I do with the data.

The problem is that when I parse a BLAST result that is a few hundred
MB, for example 300MB, the Java application balloons to around 1.6GB
of memory. Sometimes the application even crashes because I only have
2GB to play with.

Does anyone know what's causing this? Is it because I set the lazy mode?
Is there any way to work around it?

Kind regards,

Frans Verhoef
Bioinformatics Specialist
Genome Institute of Singapore
Genome, #02-01, 60 Biopolis Street, Singapore 138672
Tel: +65 6478 8000
DID: +65 6478 8060
HP: +65 9848 4325
Email: verhoeff2@gis.a-star.edu.sg

 
From kdj at sanger.ac.uk  Tue Nov 11 11:21:52 2003
From: kdj at sanger.ac.uk (Keith James)
Date: Tue Nov 11 11:21:53 2003
Subject: [Biojava-l] BLAST parsing explodes in size
In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D560B041C@BIONIC.biopolis.one-north.com>
References: <6D9E9B9DF347EF4385F6271C64FB8D560B041C@BIONIC.biopolis.one-north.com>
Message-ID: <sc48ymmkh3r.fsf@hgs2b.internal.sanger.ac.uk>

>>>>> "FV" == VERHOEF Frans <verhoeff2@gis.a-star.edu.sg> writes:

    FV> Hi, I am having a problem parsing huge blast
    FV> results. Basically I am parsing the blast results pretty much
    FV> the same way as in "Biojava in Anger", with as only difference
    FV> that I use the setModeLazy() of the BlastLikeSAXParser, since
    FV> I am using NCBI Blast version 2.2.4 and that version is not
    FV> recognised by the parser yet.

Using blast 2.2.4 or 2.2.6 is safe in lazy mode - diffs show only
minor whitespace changes in the format.

    FV> Besides that the only difference lays in the things I do with
    FV> the data.

This is likely to be the cause of the problem. See below.

    FV> The problem is that when I parse a blast result that is a few
    FV> hundred MB, for example 300MB, the java application is
    FV> ballooning up to around 1.6GB of memory. Sometimes the
    FV> application even crashes because I only have got 2GB to play
    FV> with.

The parser uses an event driven framework which is designed to handle
very big data - it will handle multi-GB reports. However, if you
create many fine-grained objects for every element of every report you
will quickly run out of resources.

    FV> Does anyone know what's causing this? Is it because I set the
    FV> lazy mode?  Is there any way to work around it?

Either you need to think about which elements of the report you are
interested in and build a filter which captures those events,
discarding the rest. See the demos/ssbind package for an example by
Matthew. Or if you really need all those objects then you should look
at allowing them to be garbage-collected as soon as possible.

It is possible that there is a bug somewhere, but without seeing
any code it isn't possible to say much more. If you need more help,
post a short (working) piece of code illustrating the problem and we
will do our best.

hth

Keith

-- 

- Keith James <kdj@sanger.ac.uk> Microarray Facility, Team 65 -
- The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK -
From ralf.sigmund at ipk-gatersleben.de  Tue Nov 11 14:37:22 2003
From: ralf.sigmund at ipk-gatersleben.de (Ralf Sigmund)
Date: Tue Nov 11 14:33:53 2003
Subject: [Biojava-l] Job / Task Scheduler for Biojava (Webservice)
Message-ID: <000401c3a88b$3a139bf0$7a8a5ec2@PGRC22>

I have been investigating solutions to accurately describe and execute
bioinformatics analysis tasks.

I am interested in an analysis platform which offers the following
possibilities for protocol-based planning, execution and result reporting
of multistep processes.
	Example:
		Find syntenic regions by comparing ESTs and marker
		positions from a species lacking a genomic map
		with the completed genome map of another species.

		This task will be achieved by a sequence of subtasks.
		Each subtask will possess several degrees of freedom:
			- filtering / choice of input data
			- the sanitization of data
			- the choice of the algorithm
			- setting of multiple parameters and thresholds

		A framework could support tasks like this in several ways:

		-- allow unambiguous definition of the steps in a storable
		   format, which could be exchanged with other scientists
		   to allow them to reproduce the experiment

		-- record the relevant execution parameters, such as the
		   dataset versions actually used and start and end times

		-- allow annotation / documentation of intermediate steps
		   and the presentation of these results in a repository, in
		   order to facilitate their reuse in additional in silico
		   experiments (possibly done by different experimenters)

		-- allow concurrent execution by several experimenters,
		   optimizing the utilization of computing resources

		-- allow scheduled reiteration of experiments after
		   source-database updates
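The record-keeping requirement in the list above can be made concrete with a small value object. This is a hypothetical sketch, not part of BioJava, BioPipe or OmniGene: it stores which tool ran, on which dataset version, with which parameters, and when, and renders that as plain key=value text that could be stored or exchanged. Parameters are kept in a TreeMap so the text output is stable.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical execution record for one workflow step. */
public final class StepRecord {
    private final String tool;
    private final String datasetVersion;
    private final Map<String, String> parameters = new TreeMap<>(); // sorted for stable output
    private Instant start;
    private Instant end;

    public StepRecord(String tool, String datasetVersion) {
        this.tool = tool;
        this.datasetVersion = datasetVersion;
    }

    /** Adds a parameter and returns this record, so calls can be chained. */
    public StepRecord param(String key, String value) {
        parameters.put(key, value);
        return this;
    }

    /** Runs the given work, recording start and end times even if it fails. */
    public void run(Runnable work) {
        start = Instant.now();
        try {
            work.run();
        } finally {
            end = Instant.now();
        }
    }

    public Duration elapsed() {
        if (start == null || end == null) {
            throw new IllegalStateException("step has not run");
        }
        return Duration.between(start, end);
    }

    /** Renders the record as plain key=value lines. */
    public String toText() {
        StringBuilder sb = new StringBuilder();
        sb.append("tool=").append(tool).append('\n');
        sb.append("dataset=").append(datasetVersion).append('\n');
        parameters.forEach((k, v) -> sb.append("param.").append(k).append('=').append(v).append('\n'));
        if (start != null) sb.append("start=").append(start).append('\n');
        if (end != null) sb.append("end=").append(end).append('\n');
        return sb.toString();
    }
}
```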
		
		
Starting with L. Stein's commentary in Nature, "Creating a Bioinformatics
Nation", and the available material on the OmniGene Project, one
might have guessed that Java would be an ideal platform for a new generation
of data- and task-integrating middleware software.

However, the OmniGene effort has been transferred into the non-public
corporate space, and even before that there was no widespread adoption of this
platform (judging by the SourceForge traffic and the lack of citations).

Recently I discovered the BioPipe project and its accompanying publication
in Genome Research.
The project is mature, tightly integrated with BioPerl, and almost completely
fulfills the requirements stated above.

However, BioPipe is based on Perl, and now I wonder whether Java would be more
advantageous as a platform of this kind.

I will try to list the advantages of Java and Perl for this application below,
and I hope for your comments:

(1) Compared to Perl, Java has advanced object-orientation support, which
allows for more transparent and modular architectures. Development tools
like Eclipse/Omondo UML increase this advantage even further.
(2) Component transaction monitors like the JBoss application server
(J2EE, EJB) are an ideal platform for managing multiple-user /
multiple-task scenarios. J2EE technology is successfully used in many
similar applications in other industries. Advanced client applications could
really benefit from the object remoting provided by the J2EE platform.
(3) Based on my limited knowledge, the Java platform appears to have a much
tighter (more failsafe?) integration of XML (XML Schema to class binding
with JAXB) and web service technologies (SOAP) (Apache Tomcat/Axis).
(4) There are several workflow design and management tools, some even with
graphical editors. Integrating these J2EE-based projects might offer big
advantages here.

I see two major disadvantages for Java:
(1) Bioinformatics tools are typically command-line tools. Perl on a Unix
platform is the best way to invoke such tools from a program. Java's
platform independence appears to be the source of its weakness in this
field.
(2) The BioPerl project has a far bigger codebase, and more contributors,
than any Java bioinformatics effort such as BioJava and OmniGene.

I wonder whether Java will ever become a significant technology for public /
open-source bioinformatics projects.
It seems that the head start Perl-based projects now have outweighs
any advantages Java technology offers.
 
Thanks for your comments on this ideas...

Regards 
Ralf

---------------------------------
Dr. Ralf Sigmund
Institut für Pflanzengenetik
und Kulturpflanzenforschung (IPK)
Corrensstraße 3
D-06466 Gatersleben
---------------------------------
Tel:   +49/(0)39482/5-659
Fax:   +49/(0)39482/5-595
mailto:ralf.sigmund@ipk-gatersleben.de


From mark.schreiber at agresearch.co.nz  Tue Nov 11 16:46:00 2003
From: mark.schreiber at agresearch.co.nz (Schreiber, Mark)
Date: Tue Nov 11 16:42:43 2003
Subject: [Biojava-l] Job / Task Scheduler for Biojava (Webservice)
Message-ID: <AF026AF0FF4B054590228FD1F1DE516501BF1D33@inbox.agresearch.co.nz>

Hi Ralf,

I think this is an interesting proposal. I definitely think if you want to do this properly you would need to back it with J2EE technology and blend in some biojava where appropriate. We have been doing quite a bit of work recently to make biojava more able to play with J2EE, especially on the serialization side of things. These updates will be available in biojava1.3.1, which will be out soon.

I've made some more comments below.

 		
> Starting with L.Stein's commentary in Nature "Creating a 
> Bioinformatics Nation" and by reading the available material 
> on the OmniGene Project one might have guessed that Java 
> would be an ideal Platform for a new generation of data and 
> task integrating Middleware Software.
> 

I think you're right. You would need something like J2EE to make it bulletproof if you envision multiple transactions with multiple clients, especially if any of them have write access to your data. This is probably beyond Perl. You could use .NET, but then you are tied to one OS and you won't be able to easily use bioperl or biojava.

> However the OmniGene effort has been transferred into the 
> non/public corporate space and even before there was no 
> widespread adoption of this platform (judged by the 
> sourceforge traffic, the lack of citations..)
> 

Are you sure it's no longer open source? I'm surprised.

> Recently I discovered the BioPipe project and its 
> accompanying publication in Genome Research. The project is 
> mature, tightly integrated with Bioperl and allmost completly 
> fullfills the above stated requirements.
>   
> However BioPipe is based on Perl and now I wonder if Java 
> would not be more advantageous as a platform of this kind.
> 

BioPipe is a protocol definition. The core engine is written in Perl/BioPerl. It may be possible to write a BioPipe engine in Java, although I've thought about this and I wonder if the BioPipe schema may be a bit Perl-centric. Even so, if you do make an enterprise bioinformatics system based on Java, then a worthy goal would be making a module that can process and execute BioPipe protocols.


> I will try to list the advantages of JAVA and Perl in this 
> application below and hope for your comments:
> 
> (1)Compared to Perl Java has advanced Object Orientation 
> support which allows for more transparent and modular 
> architectures. Development tools like Eclipse/Omodo-UML even 
> increase this advantage.

True, if you do it right.
 
> (2)Component Transaction Monitors like the Application Server JBOSS
> (j2ee,ejb) are an ideal platform for the Management of 
> multiple user / multiple task scenarios. The j2ee-technology 
> is successfully used in many similar applications in other 
> industries. Advanced client applications could really benefit 
> form Object Remoting provided by the J2ee Platform. 

Very true. I think to do it any other way would be to reinvent the wheel and cause several major headaches. This would be the strongest argument for using Java.

>(3)Based 
> on my limited knowledge the Java Platform appears to have a 
> much tighter (more failsafe?) incorporation of XML 
> (XML-Schema - class binding with JAXB) and Webservice 
> Technologies (SOAP) (Apache Tomcat/AXIS).

Also true; unfortunately the code gets a bit bloated. Compare an Axis SOAP application to a Perl or Python one. Fortunately a lot of this code is boilerplate that is easily autogenerated and doesn't need much maintaining.

> (4) There are several workflow design and management tools 
> even with graphic editors. Integration of this j2ee based 
> projects might allow big advantages to this part. 
> 

I don't have much experience here, so I can't comment.

> I see 2 major disadvantages for Java:
> (1) bioinformatics tools are typically command line tools. 
> The Perl on Unix platform is the best way to invoke such 
> tools from a program. Java's platform independence appears to 
> be the source for its weakness in this field.

True, but BioJava has introduced the org.biojava.utils.ExecRunner classes to execute other applications, and they seem to perform very well. Currently they're in biojava-live. I think they could be transferred to biojava 1.3.1, though.
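The ExecRunner API is not shown in this thread, so the sketch below uses the standard java.lang.ProcessBuilder instead (added in Java 5, after this discussion). The class name is hypothetical. It runs a command given as a program plus argument list, merges stderr into stdout so a single reader drains both streams (a tool that fills an unread stderr pipe can otherwise block), and returns the exit code with the captured output.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical helper around java.lang.ProcessBuilder (not BioJava's ExecRunner). */
public final class RunTool {
    private RunTool() {}

    /** Exit code plus the combined stdout/stderr text of a finished process. */
    public static final class Result {
        public final int exitCode;
        public final String output;

        Result(int exitCode, String output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    /** Runs the command (program followed by its arguments) and waits for it to finish. */
    public static Result run(List<String> command) {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // one stream to drain, so a full stderr pipe cannot block the tool
        try {
            Process process = pb.start();
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (InputStream in = process.getInputStream()) {
                byte[] chunk = new byte[8192];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buffer.write(chunk, 0, n);
                }
            }
            int exitCode = process.waitFor();
            return new Result(exitCode, new String(buffer.toByteArray(), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new IllegalStateException("could not run " + command, e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for " + command, e);
        }
    }
}
```

Passing the arguments as a list, rather than one shell string, avoids quoting problems with file names that contain spaces.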

> (2) the bioperl project has a far bigger codebase, and more 
> contributors than any JAVA Bioinformatics efforts like 
> Biojava and Omnigene.
> 

True, biojava is growing though.

> I wonder if Java will ever become a significant technology 
> for public / open source bioinformatics projects? It seems 
> like the existing headstart perl based projects now have 
> outweighs any advantages the Java Technology offers.
>  

Who knows. Almost everyone who comes through a university computer science or bioinformatics program will be taught Java and possibly Perl. Java is much more attractive for industry, and there have been some useful additions to biojava from industry sources. Perl has had the advantage of being a text processing language that lends itself to bioinformatics. I'm in awe of the people who use Perl for large-scale projects. Seems like a nightmare to me.


- Mark

From zren at amylin.com  Tue Nov 11 17:49:54 2003
From: zren at amylin.com (Ren, Zhen)
Date: Tue Nov 11 17:58:33 2003
Subject: [Biojava-l] building javadocs failed
Message-ID: <C527D913CB1168458AE19E0FECDE58DE015769F9@api-exch-1.amylin.com>

Hi,
 
I followed the instruction at http://cvs.biojava.org/ and successfully downloaded code from the CVS repositories and later built the JAR files.  However, I failed to build javadocs by typing "ant javadocs" at the DOS command prompt.  Here is the error message:
 
C:\Program Files\biojava-live>ant javadocs
Buildfile: build.xml
 
BUILD FAILED
Target `javadocs' does not exist in this project.
 
Total time: 4 seconds
 
Thank you for your suggestion.
 
Zhen

 
From david.huen at ntlworld.com  Tue Nov 11 18:18:26 2003
From: david.huen at ntlworld.com (David Huen)
Date: Tue Nov 11 18:14:56 2003
Subject: [Biojava-l] building javadocs failed
In-Reply-To: <C527D913CB1168458AE19E0FECDE58DE015769F9@api-exch-1.amylin.com>
References: <C527D913CB1168458AE19E0FECDE58DE015769F9@api-exch-1.amylin.com>
Message-ID: <200311112318.26134.david.huen@ntlworld.com>

On Tuesday 11 Nov 2003 10:49 pm, Ren, Zhen wrote:
> Hi,
>
> I followed the instruction at http://cvs.biojava.org/ and successfully
> downloaded code from the CVS repositories and later built the JAR files. 
> However, I failed to build javadocs by typing "ant javadocs" at the DOS
> command prompt.  Here is the error message:
>
> C:\Program Files\biojava-live>ant javadocs
> Buildfile: build.xml
>
I believe that 'ant javadocs-biojava' is what works now. There are separate
docs for various other items like grammars, etc.

Regards,
David

From zren at amylin.com  Tue Nov 11 18:24:36 2003
From: zren at amylin.com (Ren, Zhen)
Date: Tue Nov 11 18:21:07 2003
Subject: [Biojava-l] building javadocs failed
Message-ID: <C527D913CB1168458AE19E0FECDE58DE0109F098@api-exch-1.amylin.com>

Sorry to bug you again. It still doesn't seem to work. Error message:

C:\Program Files\biojava-live>
C:\Program Files\biojava-live>ant javadocs-biojava
Buildfile: build.xml

init:
     [echo] JUnit present:                   true
     [echo] JUnit supported by Ant:          true
     [echo] SableCC supported by Ant:        true

prepare:

prepare-biojava:

prepare-taglets:

compile-taglets:
    [javac] Compiling 3 source files to C:\Program Files\biojava-live\ant-build\classes\taglets
    [javac] C:\Program Files\biojava-live\ant-build\src\taglets\Useage.java:81: cannot resolve symbol
    [javac] symbol  : method holder ()
    [javac] location: interface com.sun.javadoc.Tag
    [javac]         sb.append(((ClassDoc) tags[0].holder()).qualifiedTypeName());
    [javac]                                   ^
    [javac] 1 error

BUILD FAILED
file:C:/Program Files/biojava-live/build.xml:421: Compile failed; see the compiler error output for details.

Total time: 6 seconds

Thanks.

Zhen



From verhoeff2 at gis.a-star.edu.sg  Wed Nov 12 04:37:22 2003
From: verhoeff2 at gis.a-star.edu.sg (VERHOEF Frans)
Date: Wed Nov 12 04:36:28 2003
Subject: [Biojava-l] BLAST parsing explodes in size
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D560B041F@BIONIC.biopolis.one-north.com>

Hi Keith,

Thanks for your response. I have pasted the method that does the
parsing below. I also just ran this method on a BLAST output file of
approximately 350 MB. The output generated is this:

Before parsing: 402280
After parsing: 1043162496

The numbers are the JVM's used memory in bytes. That means that during
parsing (all BioJava) memory use explodes from a mere 402 KB to 1 GB.
After that it doesn't change much.

For your information, I am using the following:
- NCBI Blast 2.2.4
- Java 1.4.2_01
- Linux 
- Biojava from cvs, last updated at 21st of October

Hopefully you will now tell me I am doing something stupid ;-)


private void parseBlastOutput(File file) throws Exception{
      Runtime r = Runtime.getRuntime();
      System.out.println("Before parsing: " +
            (r.totalMemory()-r.freeMemory()));
      InputStream is = new FileInputStream(file);
      BlastLikeSAXParser parser = new BlastLikeSAXParser();
      parser.setModeLazy();
      SeqSimilarityAdapter adapter = new SeqSimilarityAdapter();
      parser.setContentHandler(adapter);
      List results = new ArrayList();
      SearchContentHandler builder = new BlastLikeSearchBuilder(results,
            new DummySequenceDB("queries"), new DummySequenceDBInstallation());
      adapter.setSearchContentHandler(builder);
      parser.parse(new InputSource(is));
      
      for (Iterator i = results.iterator(); i.hasNext(); ){
         System.out.println("Iterating: " +
               (r.totalMemory()-r.freeMemory()));
         SeqSimilaritySearchResult result =
               (SeqSimilaritySearchResult)i.next();
         
         org.biojava.bio.Annotation anno = result.getAnnotation();
         String queryID = (String)anno.getProperty("queryId");
         String database =
               this.parseNameFromDBPath((String)anno.getProperty("databaseId"));
         String lib = this.parseIDForLibrary(queryID);
         BlastSetting bsetting = null;
         if (lib!=null && database!=null) bsetting =
               adaptor.fetchSetting(lib, database);
         if (lib == null || database == null || bsetting == null){
            //means no blast setting can be found for this library and database
            System.out.println("HELP!!!!!");
            throw new Exception("Cannot find Blast Setting in database " +
                  "for library " + lib + " and blastdatabase " + database);
         }
         
         File outFile = new File(destDir, queryID + ".out");
         BufferedWriter out = new BufferedWriter(new FileWriter(outFile));
         out.write("queryID\tqueryStart\tqueryEnd\tdatabase\tsubjectID\t" +
               "subjectStart\tsubjectEnd\tscore\teValue\tDescription\n");
         List hits = result.getHits();
         //System.out.println("Start writing with " + hits.size() + " hits.");
         for (int j=0; j<hits.size(); j++){
            SeqSimilaritySearchHit hit =
                  (SeqSimilaritySearchHit)hits.get(j);
            if (hit.getEValue() > bsetting.getMaxEValue()){
               break;
            }
            //System.out.println("HIT!!!");
            org.biojava.bio.Annotation hitAnno = hit.getAnnotation();
            String description =
                  hitAnno.containsProperty("subjectDescription") ?
                  (String)hitAnno.getProperty("subjectDescription") : "No Description";
            
            out.write(queryID + "\t");
            out.write(hit.getQueryStart() + "\t");
            out.write(hit.getQueryEnd() + "\t");
            out.write(database + "\t");
            out.write(hit.getSubjectID() + "\t");
            out.write(hit.getSubjectStart() + "\t");
            out.write(hit.getSubjectEnd() + "\t");
            out.write(hit.getScore() + "\t");
            out.write(hit.getEValue() + "\t");
            out.write(description + "\n");
            out.flush();
            hitAnno = null;description = null;hit=null;
            System.gc();
         }
         out.close();
         hits = null; out=null; outFile=null; bsetting=null; lib=null;
               database=null; queryID=null; anno=null; result=null;
         System.gc();
      }
      
      file.delete();
   }


> -----Original Message-----
> From: Keith James [mailto:kdj@sanger.ac.uk]
> Sent: Wednesday, November 12, 2003 12:25 AM
> To: VERHOEF Frans
> Cc: biojava-l@biojava.org
> Subject: Re: [Biojava-l] BLAST parsing explodes in size
> 
> >>>>> "FV" == VERHOEF Frans <verhoeff2@gis.a-star.edu.sg> writes:
> 
>     FV> Hi, I am having a problem parsing huge blast
>     FV> results. Basically I am parsing the blast results pretty much
>     FV> the same way as in "Biojava in Anger", the only difference being
>     FV> that I use the setModeLazy() of the BlastLikeSAXParser, since
>     FV> I am using NCBI Blast version 2.2.4 and that version is not
>     FV> recognised by the parser yet.
> 
> Using blast 2.2.4 or 2.2.6 is safe in lazy mode - diffs show only
> minor whitespace changes in the format.
> 
>     FV> Besides that the only difference lies in the things I do with
>     FV> the data.
> 
> This is likely to be the cause of the problem. See below.
> 
>     FV> The problem is that when I parse a blast result that is a few
>     FV> hundred MB, for example 300MB, the java application is
>     FV> ballooning up to around 1.6GB of memory. Sometimes the
>     FV> application even crashes because I only have got 2GB to play
>     FV> with.
> 
> The parser uses an event driven framework which is designed to handle
> very big data - it will handle multi-GB reports. However, if you
> create many fine-grained objects for every element of every report you
> will quickly run out of resources.
> 
>     FV> Does anyone know what's causing this? Is it because I set the
>     FV> lazy mode?  Is there any way to work around it?
> 
> Either you need to think about which elements of the report you are
> interested in and build a filter which captures those events,
> discarding the rest. See the demos/ssbind package for an example by
> Matthew. Or if you really need all those objects then you should look
> at allowing them to be garbage-collected as soon as possible.
> 
> It is possible that there is a bug somewhere, but without seeing any
> code it isn't possible to say much more. If you need more help,
> post a short (working) piece of code illustrating the problem and we
> will do our best.
> 
> hth
> 
> Keith
> 
> --
> 
> - Keith James <kdj@sanger.ac.uk> Microarray Facility, Team 65 -
> - The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK -

From matthew_pocock at yahoo.co.uk  Wed Nov 12 05:25:36 2003
From: matthew_pocock at yahoo.co.uk (Matthew Pocock)
Date: Wed Nov 12 05:30:45 2003
Subject: [Biojava-l] BLAST parsing explodes in size
In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D560B041F@BIONIC.biopolis.one-north.com>
References: <6D9E9B9DF347EF4385F6271C64FB8D560B041F@BIONIC.biopolis.one-north.com>
Message-ID: <3FB20AA0.80509@yahoo.co.uk>

Morning,

I think the problem is that you are populating the results List with 
/all/ of the blast data. This means that all the data from the complete 
report must be in memory in this List. A better approach is to write an 
object to replace builder in adapter.setSearchContentHandler(builder), 
which does all the processing as the data streams in from the parser. 
This will keep memory consumption down to the bare minimum.
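A rough, self-contained sketch of that streaming idea follows. It deliberately does not use BioJava: HitEventHandler is a hypothetical stand-in for SearchContentHandler, and the property keys "subjectId" and "expectValue" are placeholders, not BioJava's real keys. In real code you would implement SearchContentHandler and pass your object to adapter.setSearchContentHandler(...) in place of the BlastLikeSearchBuilder. The point is that each hit is written out and discarded as soon as its end event arrives, so memory use is bounded by one hit rather than by the whole report.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for BioJava's SearchContentHandler; names are illustrative. */
interface HitEventHandler {
    void startHit();
    void addHitProperty(String key, String value);
    void endHit();
}

/** Writes one tab-separated row per hit as soon as the hit ends, then forgets it. */
class StreamingHitWriter implements HitEventHandler {
    private final Writer out;
    private final double maxEValue;
    // Only the properties of the hit currently being parsed are held in memory.
    private final Map<String, String> current = new HashMap<>();

    StreamingHitWriter(Writer out, double maxEValue) {
        this.out = out;
        this.maxEValue = maxEValue;
    }

    @Override
    public void startHit() {
        current.clear();
    }

    @Override
    public void addHitProperty(String key, String value) {
        current.put(key, value);
    }

    @Override
    public void endHit() {
        String eValue = current.get("expectValue"); // placeholder key
        try {
            if (eValue != null && Double.parseDouble(eValue) <= maxEValue) {
                out.write(current.get("subjectId") + "\t" + eValue + "\n");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            current.clear(); // nothing from this hit survives past its end event
        }
    }
}
```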

There is some code that does this sort of thing in demos/ssbind, and it 
may be worth scanning the code for BlastLikeSearchBuilder for ideas.

Best,

Matthew



From matthew_pocock at yahoo.co.uk  Wed Nov 12 05:30:21 2003
From: matthew_pocock at yahoo.co.uk (Matthew Pocock)
Date: Wed Nov 12 05:35:21 2003
Subject: [Biojava-l] building javadocs failed
In-Reply-To: <C527D913CB1168458AE19E0FECDE58DE0109F098@api-exch-1.amylin.com>
References: <C527D913CB1168458AE19E0FECDE58DE0109F098@api-exch-1.amylin.com>
Message-ID: <3FB20BBD.3000901@yahoo.co.uk>

Hi,

I think this is my bad: an incompatibility between the taglets of Java
1.4.2 and earlier versions. You can safely nuke this class if it's causing
problems.

Matthew



From kdj at sanger.ac.uk  Wed Nov 12 05:40:26 2003
From: kdj at sanger.ac.uk (Keith James)
Date: Wed Nov 12 05:40:27 2003
Subject: [Biojava-l] BLAST parsing explodes in size
In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D560B041F@BIONIC.biopolis.one-north.com>
References: <6D9E9B9DF347EF4385F6271C64FB8D560B041F@BIONIC.biopolis.one-north.com>
Message-ID: <sc4brrhyihe.fsf@hgs2e.internal.sanger.ac.uk>

>>>>> " " == VERHOEF Frans <verhoeff2@gis.a-star.edu.sg> writes:

     > Hi Keith, Thanks for your response. I did paste the method
     > that's doing the parsing somewhere below. I also ran just now
     > this method trying to parse a blast output file with a size of
     > approximately 350mb. The output generated is this:

     > Before parsing: 402280 After parsing: 1043162496

     > With the number indicating the memory size of java in
     > bytes. That means that during the parsing (all biojava) the
     > size explodes from a mere 402kb to 1gb. After that the size
     > doesn't do much anymore.

A report of 350mb is sufficient to generate a lot of objects if you
fully represent all hits, HSPs, alignments and annotation.

At the top of your method you create a list to contain all your
results:

 List results = new ArrayList();

and pass it to the builder. Although you make a couple of System.gc()
calls further down, they do not address the cause of the problem:
this list is still in scope, so the objects within it cannot be
garbage collected. Because BlastLikeSearchBuilder stores all of its
results in a List like this, it is not appropriate for your situation.

This is the same as choosing whether to parse XML using SAX or DOM -
only use DOM if you can afford to have the whole lot in memory at
once.
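To make the SAX side of that analogy concrete, here is a sketch using only the JDK's SAX API (javax.xml.parsers and org.xml.sax), not BioJava. It assumes NCBI-style BLAST XML containing Hit_id and Hsp_evalue elements (check these names against your own output), and keeps only a tab-separated row per HSP under an e-value cutoff; the text of every other element is dropped as soon as it ends. With DOM, by contrast, the whole document tree would sit in memory. The class name BlastXmlStreamer is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Streams BLAST XML, retaining only (hit id, e-value) rows below a cutoff. */
class BlastXmlStreamer extends DefaultHandler {
    private final double maxEValue;
    private final List<String> rows = new ArrayList<>();
    // Text of the element currently being read; reset at every start tag.
    private final StringBuilder text = new StringBuilder();
    private String hitId;

    BlastXmlStreamer(double maxEValue) {
        this.maxEValue = maxEValue;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("Hit_id")) {
            hitId = text.toString().trim(); // remember which hit the following HSPs belong to
        } else if (qName.equals("Hsp_evalue")) {
            double e = Double.parseDouble(text.toString().trim());
            if (e <= maxEValue) rows.add(hitId + "\t" + e);
        }
        // Every other element's text is simply discarded.
    }

    static List<String> parse(String xml, double maxEValue) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Real NCBI output declares an external DTD; skip fetching it.
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        BlastXmlStreamer handler = new BlastXmlStreamer(maxEValue);
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.rows;
    }
}
```

Rows are collected in a list only to keep the demo small; for a report of hundreds of MB you would pass an InputSource wrapping a FileInputStream and write each row straight to a Writer inside endElement.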

The data you are saving in your output file are taken from a very
small subset of the objects being created (so you are not using most
of them). You need to extend the event-driven way of handling the data
from the SAXContentHandler right through the SearchContentHandler and
up to the point where you write to your file. Don't collect everything
as objects before you write.

There is a working example in demos/ssbind (ProcessBlastReport) of
using this event and filtering approach.
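The filtering half of that approach can be sketched as a handler that sits between the parser events and your own code and forwards only the properties you need. This is not the ssbind code: ReportEvents is a hypothetical interface standing in for BioJava's SearchContentHandler (whose real method names differ), and any property keys you pass in are placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical event interface; not BioJava's SearchContentHandler. */
interface ReportEvents {
    void startHit();
    void property(String key, Object value);
    void endHit();
}

/** Passes hit boundaries through but forwards only the property keys we care about. */
class PropertyFilter implements ReportEvents {
    private final ReportEvents downstream;
    private final Set<String> wanted;

    PropertyFilter(ReportEvents downstream, String... wantedKeys) {
        this.downstream = downstream;
        this.wanted = new HashSet<>(Arrays.asList(wantedKeys));
    }

    @Override
    public void startHit() { downstream.startHit(); }

    @Override
    public void property(String key, Object value) {
        // Unwanted properties are dropped here, so downstream code never stores them.
        if (wanted.contains(key)) downstream.property(key, value);
    }

    @Override
    public void endHit() { downstream.endHit(); }
}

/** Trivial downstream handler that records what reaches it, for demonstration. */
class RecordingEvents implements ReportEvents {
    final List<String> seen = new ArrayList<>();

    @Override
    public void startHit() { seen.add("start"); }

    @Override
    public void property(String key, Object value) { seen.add(key + "=" + value); }

    @Override
    public void endHit() { seen.add("end"); }
}
```

Because filters and writers share one interface, they can be chained: parser events go into the filter, and the filter's downstream is whatever writes your output file.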

Keith

-- 

- Keith James <kdj@sanger.ac.uk> Microarray Facility, Team 65 -
- The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK -
From tmo at ebi.ac.uk  Wed Nov 12 06:58:27 2003
From: tmo at ebi.ac.uk (Tom Oinn)
Date: Wed Nov 12 06:50:01 2003
Subject: [Biojava-l] Job / Task Scheduler for Biojava (Webservice)
References: <AF026AF0FF4B054590228FD1F1DE516501BF1D33@inbox.agresearch.co.nz>
Message-ID: <3FB22063.90601@ebi.ac.uk>

Hi Ralf, Mark, all...

Are you aware of our project, Taverna? We're working with various 
groups, mostly up at Newcastle and Manchester but also here at the EBI 
to provide workflow based technology for bioinformatics. Specifically, 
we have a system, Soaplab, that can wrap arbitrary command line tools as 
services (currently applies to all the EMBOSS tools but we can easily 
extend it), courtesy of Martin Senger's work here, and the Taverna 
project itself which allows users to create workflows out of both 
Soaplab's services and arbitrary SOAP-based web services (we could add
CORBA, OGSA, RMI, etc. if needed). It's open source (LGPL) and on
sourceforge (taverna.sf.net), and is in use 'in anger' in several 
complex bioinformatics analysis projects.

May I humbly request that you take a look before writing something 
similar, and if possible join our development effort?

Matthew - you're both on this list and working up at Newcastle, does 
this seem reasonable? I'll be up in a few weeks to talk to the 
biologists, perhaps we could get together over a drink or several and 
see how Taverna and Biojava could play together?

Cheers,

Tom

From matthew_pocock at yahoo.co.uk  Wed Nov 12 08:18:57 2003
From: matthew_pocock at yahoo.co.uk (Matthew Pocock)
Date: Wed Nov 12 08:23:53 2003
Subject: [Biojava-l] Job / Task Scheduler for Biojava (Webservice)
In-Reply-To: <3FB22063.90601@ebi.ac.uk>
References: <AF026AF0FF4B054590228FD1F1DE516501BF1D33@inbox.agresearch.co.nz>
	<3FB22063.90601@ebi.ac.uk>
Message-ID: <3FB23341.3090203@yahoo.co.uk>

Tom Oinn wrote:

> Matthew - you're both on this list and working up at Newcastle, does 
> this seem reasonable?

Yes. Very.

> I'll be up in a few weeks to talk to the biologists, perhaps we could 
> get together over a drink or several and see how Taverna and Biojava 
> could play together? 

We should sort something out. On a related note, I'm currently writing 
AXIS web services for biojava sequence & feature objects, which should 
reduce the overhead of this kind of thing a bit.

Matthew



From tmo at ebi.ac.uk  Wed Nov 12 08:48:02 2003
From: tmo at ebi.ac.uk (Tom Oinn)
Date: Wed Nov 12 08:39:30 2003
Subject: [Biojava-l] Job / Task Scheduler for Biojava (Webservice)
References: <AF026AF0FF4B054590228FD1F1DE516501BF1D33@inbox.agresearch.co.nz>
	<3FB22063.90601@ebi.ac.uk> <3FB23341.3090203@yahoo.co.uk>
Message-ID: <3FB23A12.7050308@ebi.ac.uk>


Matthew Pocock wrote:
> Tom Oinn wrote:
> 
>> Matthew - you're both on this list and working up at Newcastle, does 
>> this seem reasonable?
> 
> Yes. Very.

Let's hope the BBSRC agree :)

>> I'll be up in a few weeks to talk to the biologists, perhaps we could 
>> get together over a drink or several and see how Taverna and Biojava 
>> could play together? 
> 
> 
> We should sort something out. On a related note, I'm currently writing 
> AXIS web services for biojava sequence & feature objects, which should 
> reduce the overhead of this kind of thing a bit.

Fantastic. We're also very interested in service interfaces to the DAS
systems (we work with the EnsEMBL guys next door from time to time on
that one). We have some constraints on what kinds of service we can
consume; basically it boils down to 'don't use complex types in Axis',
but there are some exceptions (collection types are fine). I'm assuming
you've followed my various rants on the Axis user list as to exactly
why, but we've fallen into the pattern of passing XML documents around
as strings, so our toolkit doesn't need to know anything about the data
at that level and yet we retain the structured information where
possible. We believe there is no good reason why a web service toolkit
should need to understand the structure of, for example, a sequence
object flowing through it.
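As an illustration of the "XML documents as strings" pattern, here is a JDK-only sketch (no Axis code). The service side serialises a record into an XML string, which a SOAP toolkit would treat as an ordinary xsd:string, and only the client code that cares about the payload parses it back. The sequence element layout and the XmlStringPayload class are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class XmlStringPayload {

    /** Service side: build a small XML document and return it as a plain String. */
    static String toXml(String id, String residues) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element seq = doc.createElement("sequence"); // hypothetical element layout
        seq.setAttribute("id", id);
        seq.setTextContent(residues);
        doc.appendChild(seq);
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter sw = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(sw));
        return sw.toString();
    }

    /** Client side: only code that cares about the payload parses the String back. */
    static String[] fromXml(String xml) throws Exception {
        Element seq = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        return new String[] { seq.getAttribute("id"), seq.getTextContent() };
    }
}
```

The transport layer only ever sees String in and String out, so the schema of the payload can change without regenerating any SOAP type mappings.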

Biojava people - please download Taverna and have a play with it. The
'windows' build is not particularly well named: it's actually all Java,
and you'll just need to have 'dot' from Graphviz installed and on your
path for everything to work. We'll be releasing beta7 for Mac OS X and
hopefully both Red Hat and Debian as well, just to make things a little
more convenient. We have a user mailing list which might be worth
subscribing to if this is of any interest; links are at taverna.sf.net.

Cheers,

Tom

From verhoeff2 at gis.a-star.edu.sg  Wed Nov 12 20:08:37 2003
From: verhoeff2 at gis.a-star.edu.sg (VERHOEF Frans)
Date: Wed Nov 12 20:07:33 2003
Subject: [Biojava-l] BLAST parsing explodes in size
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D560B0420@BIONIC.biopolis.one-north.com>

Hi,

Thanks for the suggestions. I am quite new in the world of Biojava and
basically what I did was copy the example in Biojava in anger and adapt
it to my needs. It seems I now have to adapt it a little more.

One more question. If the blast output is already in XML, how would you
go about it in Biojava?

Kind regards,
Frans

From verhoeff2 at gis.a-star.edu.sg  Thu Nov 13 03:39:08 2003
From: verhoeff2 at gis.a-star.edu.sg (VERHOEF Frans)
Date: Thu Nov 13 03:38:04 2003
Subject: [Biojava-l] BLAST parsing explodes in size
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D560B0421@BIONIC.biopolis.one-north.com>

Thank you guys!

I now have a great looking solution which is lean and fast.
I am definitely a biojava fan.

Regards,
Frans

From vc100 at doc.ic.ac.uk  Tue Nov 18 05:27:53 2003
From: vc100 at doc.ic.ac.uk (Vasa Curcin)
Date: Tue Nov 18 05:23:40 2003
Subject: [Biojava-l] Serialization of SequenceDB obtained from Swiss-Prot
Message-ID: <3FB9F429.8030404@doc.ic.ac.uk>

Hello,

There seems to be some problem with serializing SequenceDB objects 
obtained from SwissProtDatabase. The error is:

java.io.WriteAbortedException: writing aborted; java.io.NotSerializableException: org.biojava.bio.seq.io.SequenceBuilderBase$TemplateWithChildren
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1278)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:324)
        at java.util.HashSet.readObject(HashSet.java:272)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.

I am using Biojava 1.30, with Mark's patches from a few weeks back.
Does anyone have an idea?

Regards,
Vasa

From mark.schreiber at agresearch.co.nz  Tue Nov 18 15:14:48 2003
From: mark.schreiber at agresearch.co.nz (Schreiber, Mark)
Date: Tue Nov 18 15:11:39 2003
Subject: [Biojava-l] Serialization of SequenceDB obtained from Swiss-Prot
Message-ID: <AF026AF0FF4B054590228FD1F1DE5165016E7CBC@inbox.agresearch.co.nz>

Hi -

I'm not sure serializing an entire SequenceDB is a good idea. However, can you tell me whether the serialization is failing on the DB or on one of the sequences in it?
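One way to answer that question is to try serializing each candidate object on its own and see which one throws. The sketch below is plain JDK code, and the helper name SerializationProbe is hypothetical. You could put the database itself under one key and each sequence under its own id; I believe SequenceDB offers ids() and getSequence(id) for that, but check the BioJava 1.3 javadocs. The NotSerializableException message names the offending class, which in the trace above is SequenceBuilderBase$TemplateWithChildren. The WriteAbortedException appears on reading because the original write was aborted, so the real problem is on the writing side.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class SerializationProbe {
    /**
     * Serializes each value on its own into memory and returns "key: className"
     * for every value whose object graph contains a non-Serializable object.
     */
    static List<String> findUnserializable(Map<String, ?> items) throws IOException {
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, ?> e : items.entrySet()) {
            try (ObjectOutputStream oos = new ObjectOutputStream(new ByteArrayOutputStream())) {
                oos.writeObject(e.getValue());
            } catch (NotSerializableException ex) {
                // The exception message is the name of the class that failed.
                bad.add(e.getKey() + ": " + ex.getMessage());
            }
        }
        return bad;
    }
}
```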

- Mark



From phirnee123 at yahoo.com  Wed Nov 19 09:10:14 2003
From: phirnee123 at yahoo.com (bharani kumar)
Date: Wed Nov 19 09:06:27 2003
Subject: [Biojava-l] data cleansing scoring functions
Message-ID: <20031119141014.64425.qmail@web13405.mail.yahoo.com>

Hello everybody,

We are building protein docking software and I would like
your suggestions. We are taking into account 20 scoring
functions, such as hydrophobicity, and then combining all
of them into an optimised total, which we plot against the
RMSD of the various conformations produced by rotating and
translating one protein relative to the other.

Now my question: are all 20 of these scoring functions
equally important? Certainly not. So the data has to be
cleansed, and I hope we will finally be left with a smaller
number of scoring functions, say 12 or 13.

What would be the best way to clean the data (the scoring
functions)? One of my supervisors suggested that it could be
done in MATLAB by applying PCA.

In this regard I would appreciate your suggestions.
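The PCA idea can be sketched in plain Java (the supervisor suggested MATLAB; Java is used here only to match the rest of this list). Assumptions: rows are docked conformations and columns are scoring functions; PCA is done on the correlation matrix, because the scoring functions have different units; and only the first component is computed, by power iteration. Scoring functions with large loadings on the same component carry largely overlapping information. One caveat: PCA looks only at the scores, not at RMSD, so it finds redundant functions rather than the ones that best predict good poses.

```java
import java.util.Arrays;

public class ScorePcaSketch {

    /** Correlation matrix of the columns of data (rows = conformations, columns = scoring functions). */
    static double[][] correlationMatrix(double[][] data) {
        int n = data.length;
        int p = data[0].length;
        double[][] z = new double[n][p];
        for (int j = 0; j < p; j++) {
            double mean = 0;
            for (double[] row : data) mean += row[j];
            mean /= n;
            double ss = 0;
            for (double[] row : data) ss += (row[j] - mean) * (row[j] - mean);
            double sd = Math.sqrt(ss / (n - 1)); // assumes no constant column (sd > 0)
            for (int i = 0; i < n; i++) z[i][j] = (data[i][j] - mean) / sd;
        }
        double[][] r = new double[p][p];
        for (int a = 0; a < p; a++) {
            for (int b = 0; b < p; b++) {
                double s = 0;
                for (int i = 0; i < n; i++) s += z[i][a] * z[i][b];
                r[a][b] = s / (n - 1);
            }
        }
        return r;
    }

    static double[] multiply(double[][] m, double[] v) {
        double[] out = new double[v.length];
        for (int i = 0; i < m.length; i++)
            for (int j = 0; j < v.length; j++)
                out[i] += m[i][j] * v[j];
        return out;
    }

    static double dot(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    /** Dominant eigenvector (unit length) of a symmetric matrix by power iteration. */
    static double[] firstComponent(double[][] r, int iterations) {
        double[] v = new double[r.length];
        Arrays.fill(v, 1.0); // assumes this start vector is not orthogonal to the dominant eigenvector
        for (int k = 0; k < iterations; k++) {
            double[] w = multiply(r, v);
            double norm = Math.sqrt(dot(w, w));
            for (int j = 0; j < v.length; j++) v[j] = w[j] / norm;
        }
        return v;
    }

    /** Share of total standardized variance captured by v: eigenvalue divided by the trace (= p). */
    static double explainedFraction(double[][] r, double[] v) {
        return dot(v, multiply(r, v)) / r.length;
    }

    public static void main(String[] args) {
        // Toy data: column 1 = 2 * column 0 (redundant), column 2 uncorrelated with both.
        double[][] scores = {
            {1, 2, 1},
            {2, 4, -1},
            {3, 6, -1},
            {4, 8, 1},
        };
        double[][] r = correlationMatrix(scores);
        double[] pc1 = firstComponent(r, 200);
        System.out.println("PC1 loadings: " + Arrays.toString(pc1));
        System.out.println("Explained fraction: " + explainedFraction(r, pc1));
    }
}
```

In the toy data the first component loads equally on the two redundant columns and explains about two thirds of the variance, which is the kind of signal that would suggest dropping one of them.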

=====
***********************************************************************************

"The secret of success is to know something nobody else knows." 

 
BHARANI KUMAR.P.S 
CUBIC, 
UNIVERSITÄT ZU KÖLN, 
Zülpicher  Str. 47 
50674 Köln 
Germany 
Fon        +49 221 7212018, +49 176 21000597
phirnee123@yahoo.com 
************************************************************


From vc100 at doc.ic.ac.uk  Wed Nov 19 12:46:59 2003
From: vc100 at doc.ic.ac.uk (Vasa Curcin)
Date: Wed Nov 19 12:43:05 2003
Subject: [Biojava-l] Serialization of SequenceDB obtained from Swiss-Prot
References: <AF026AF0FF4B054590228FD1F1DE5165016E7CBC@inbox.agresearch.co.nz>
Message-ID: <3FBBAC93.4020004@doc.ic.ac.uk>

Hi,

I am still investigating when exactly the problem with writing out the 
object occurs, but this may be related. Here, I am returning matches 
from a SwissProt search from the server to the client. The object is a 
SequenceDB obtained from SwissProt and the entry has the following line:

FT   INIT_MET      0      0       BY SIMILARITY.


This is the exact exception:

17:32:29,654 ERROR [STDERR] got data from http://us.expasy.org/cgi-bin/get-sprot-raw.pl?143B_MOUSE
17:32:30,605 ERROR [STDERR] java.lang.IllegalArgumentException: Location 0 is outside 1..245
        at org.biojava.bio.seq.impl.SimpleFeature.<init>(SimpleFeature.java:306)
        at sun.reflect.GeneratedConstructorAccessor85.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:274)
        at org.biojava.bio.seq.SimpleFeatureRealizer$TemplateImpl.realize(SimpleFeatureRealizer.java:138)
rethrown as org.biojava.bio.BioException: Couldn't realize feature
        at org.biojava.bio.seq.SimpleFeatureRealizer$TemplateImpl.realize(SimpleFeatureRealizer.java:144)
        at org.biojava.bio.seq.SimpleFeatureRealizer.realizeFeature(SimpleFeatureRealizer.java:94)
        at org.biojava.bio.seq.impl.SimpleSequence.realizeFeature(SimpleSequence.java:198)
        at org.biojava.bio.seq.impl.SimpleSequence.createFeature(SimpleSequence.java:204)
        at org.biojava.bio.seq.io.SequenceBuilderBase.makeSequence(SequenceBuilderBase.java:168)
        at org.biojava.bio.seq.io.SmartSequenceBuilder.makeSequence(SmartSequenceBuilder.java:87)
        at org.biojava.bio.seq.io.SequenceBuilderFilter.makeSequence(SequenceBuilderFilter.java:98)
        at org.biojava.bio.seq.io.StreamReader.nextSequence(StreamReader.java:101)
        at org.biojava.bio.seq.db.SwissprotSequenceDB.getSequence(SwissprotSequenceDB.java:93)

Is this 0, 0 location common in Swiss-Prot entries? It seems the 
serialization is failing only on those entries which have this feature.

Regards,
Vasa



From matthew_pocock at yahoo.co.uk  Wed Nov 19 13:30:39 2003
From: matthew_pocock at yahoo.co.uk (Matthew Pocock)
Date: Wed Nov 19 13:37:36 2003
Subject: [Biojava-l] Serialization of SequenceDB obtained from Swiss-Prot
In-Reply-To: <3FBBAC93.4020004@doc.ic.ac.uk>
References: <AF026AF0FF4B054590228FD1F1DE5165016E7CBC@inbox.agresearch.co.nz>
	<3FBBAC93.4020004@doc.ic.ac.uk>
Message-ID: <3FBBB6CF.7090101@yahoo.co.uk>

I think the sp parser needs to special-case this - the location is meant 
to have the semantics of being 'before' the sequence starts (I think). 
Could someone brave fix this special case in the SP parser? Perhaps a 
fuzzy range, with both < 1? Also the sp file writer will need modifying 
to round-trip this.
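Matthew's suggestion can be sketched independently of BioJava. FtLineSketch below is hypothetical and is not BioJava's Swiss-Prot parser. Instead of a BioJava fuzzy location, it clamps a 0 position to 1, so that a bounds check like the one in the trace ("Location 0 is outside 1..245") would pass, and it keeps a flag so that the writer can emit 0 again. The column layout used for writing is assumed from the INIT_MET entry quoted above. Only plain integer positions are handled; '<', '>' and '?' positions, and key names longer than eight characters, are outside this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FtLineSketch {

    // FT key, from, to, description; only plain integer positions are handled.
    private static final Pattern FT_LINE =
            Pattern.compile("^FT   (\\S+)\\s+(\\d+)\\s+(\\d+)\\s*(.*)$");

    static final class FtFeature {
        final String key;
        final int start;               // 1-based, clamped to at least 1
        final int end;                 // 1-based, clamped to at least 1
        final boolean startBeforeSeq;  // entry said 0, i.e. "before residue 1"
        final boolean endBeforeSeq;
        final String description;

        FtFeature(String key, int start, int end,
                  boolean startBeforeSeq, boolean endBeforeSeq, String description) {
            this.key = key;
            this.start = start;
            this.end = end;
            this.startBeforeSeq = startBeforeSeq;
            this.endBeforeSeq = endBeforeSeq;
            this.description = description;
        }

        /** Writes the line back in the fixed columns of the entry above, restoring 0 where flagged. */
        String toFtLine() {
            String from = startBeforeSeq ? "0" : Integer.toString(start);
            String to = endBeforeSeq ? "0" : Integer.toString(end);
            return String.format("FT   %-8s %6s %6s       %s", key, from, to, description);
        }
    }

    static FtFeature parse(String line) {
        Matcher m = FT_LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported FT line: " + line);
        }
        int from = Integer.parseInt(m.group(2));
        int to = Integer.parseInt(m.group(3));
        // Clamp 0 into the sequence, but remember it so the writer can round-trip it.
        return new FtFeature(m.group(1), Math.max(from, 1), Math.max(to, 1),
                from == 0, to == 0, m.group(4).trim());
    }

    public static void main(String[] args) {
        String line = "FT   INIT_MET      0      0       BY SIMILARITY.";
        FtFeature f = parse(line);
        System.out.println(f.key + " " + f.start + ".." + f.end + " before start: " + f.startBeforeSeq);
        System.out.println("round-trips: " + f.toFtLine().equals(line));
    }
}
```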

Grr.

Matthew



From dumontier at mshri.on.ca  Wed Nov 19 14:21:50 2003
From: dumontier at mshri.on.ca (Marc Dumontier)
Date: Wed Nov 19 14:24:46 2003
Subject: [Biojava-l] blast2html
Message-ID: <490D0AFAF3D2D3119F6C00508B6FDF1501FA4669@ex.mshri.on.ca>

hi,

I'm trying to modify the Blast2HTML code in
org.biojava.bio.program.blast2html to add some links to my BLAST output.

In HTMLRenderer, I'm trying to add a link to each row in my summary.
The variable oHitSummary.oHitId.id contains the accession, or rather something
like ref|NP_011554.1|. I was wondering whether Blast2HTMLHandler saves GI
information, since the link I need to create takes the GI as its argument.

Thanks,
Marc
From bioinformatics4suman at yahoo.com  Thu Nov 20 01:12:26 2003
From: bioinformatics4suman at yahoo.com (Suman Kanuganti)
Date: Thu Nov 20 01:08:38 2003
Subject: [Biojava-l] MassCalc question.
Message-ID: <20031120061226.55042.qmail@web60107.mail.yahoo.com>

OK, I am having a problem using the MassCalc class. It
always throws an IllegalSymbolException even though the symbol
list is correct.
I have written this:

    SequenceIterator iter = MySeqTools.myReadFastaAA(args[0]);

    while (iter.hasNext()) {
      Sequence seq = iter.nextSequence();
      SymbolList syml = (SymbolList) seq;
      System.out.println(syml.seqString());
      MassCalc mCalc = new MassCalc(SymbolPropertyTable.MONO_MASS, true);
      double mass = mCalc.getMass(syml);
      System.out.println("Mass: " + seq.getName() + "\t" + mass);
    }


which results in this error:

Exception in thread "main"
org.biojava.bio.symbol.IllegalSymbolException: The
SymbolList was not using the protein alphabet

Can anyone help me with this?

Thanks,
Suman K

=====
Suman K
BioInformatics Associate,
Genomics Research,
Newton Lab,
University of Missouri - Columbia.

From rh4552000 at yahoo.co.uk  Thu Nov 20 04:59:08 2003
From: rh4552000 at yahoo.co.uk (=?iso-8859-1?q?Rich=20Heath?=)
Date: Thu Nov 20 04:55:26 2003
Subject: [Biojava-l] File formats
Message-ID: <20031120095908.58849.qmail@web25207.mail.ukl.yahoo.com>

Hi, 

I am a software developer based in the UK who has
been asked to produce a piece of software that
outputs data from ABI sequencer files in a
more human-readable format. I hope the
org.biojava.bio.program.abi package will let me do
this, but I have some concerns about the legal
implications of using and contributing to this
package. 

Does anyone know what the legal position is with
regard to reverse engineering the Applied Biosystems
file format (and any other file formats, for that
matter)? I would imagine this file format is the
property of Applied Biosystems and that they would not like
me producing applications that read it unless I
paid them a sizable licence fee (although I
guess I am not reverse engineering it if I just use
the above package, only if I contribute to it?). 

Many thanks in advance for your help, 

Rich


From colin.hardman at cambridgeAntibody.com  Thu Nov 20 06:49:31 2003
From: colin.hardman at cambridgeAntibody.com (Colin Hardman)
Date: Thu Nov 20 06:45:42 2003
Subject: [Biojava-l] blast2html
References: <490D0AFAF3D2D3119F6C00508B6FDF1501FA4669@ex.mshri.on.ca>
Message-ID: <3FBCAA47.4A6E06B4@cambridgeAntibody.com>


Marc,

    As I remember it, the summary line from the BLAST output is split on white
space, with the first token put into the hit ID; this makes its way into
oHitSummary.oHitId.id in Blast2HTML.

If you want to change this, you need to implement your own
SummaryLineHelperIF in org.biojava.bio.program.sax, but I don't think you will
need to.

As the actual format of this line depends on how, and from what source, you
built the BLAST indexes, the HTMLRenderer delegates link generation to the
DatabaseURLGenerator interface.
If your BLAST result has a summary line like the following

gi|4557284|ref|NM_000646.1|[4557284]  some text description.....
e.g. from http://www.ncbi.nlm.nih.gov/RefSeq/RSfaq.html

you are going to need to parse it in your own DatabaseURLGenerator.
HTMLRenderer gets hold of these through a URLGeneratorFactory, which supplies the
list; the first one in the list is used to create the link in the summary, and the
others (if they exist) are added as extra links in the details.

For an example, look at NcbiDatabaseURLGenerator and DefaultURLGeneratorFactory in
org.biojava.bio.program.blast2html.

If you write one to parse the above format, it might be useful to add it to the
repository, or even make it the default.
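The delegation Colin describes (an ordered list of URL generators, where the first builds the summary link and the rest become extra links in the details) can be sketched as below. The UrlGenerator interface and both URL patterns are hypothetical illustrations; this is not the signature of BioJava's DatabaseURLGenerator or URLGeneratorFactory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LinkDelegationSketch {

    /** Hypothetical stand-in for a URL generator; returns null if it cannot handle the id. */
    interface UrlGenerator {
        String urlFor(String hitId);
    }

    static final class HitLinks {
        final String summaryLink;        // from the first generator in the list
        final List<String> detailLinks;  // from the remaining generators, where they apply

        HitLinks(String summaryLink, List<String> detailLinks) {
            this.summaryLink = summaryLink;
            this.detailLinks = detailLinks;
        }
    }

    static HitLinks render(String hitId, List<UrlGenerator> generators) {
        String summary = null;
        List<String> extras = new ArrayList<>();
        for (int i = 0; i < generators.size(); i++) {
            String url = generators.get(i).urlFor(hitId);
            if (i == 0) {
                summary = url;           // first generator: summary link
            } else if (url != null) {
                extras.add(url);         // others: extra links in the details
            }
        }
        return new HitLinks(summary, extras);
    }

    public static void main(String[] args) {
        // Both URL patterns are made-up placeholders.
        UrlGenerator primary = id -> "http://example.org/entry?id=" + id;
        UrlGenerator secondary = id -> id.startsWith("gi|") ? "http://example.org/gi?id=" + id : null;
        HitLinks links = render("gi|4557284|ref|NM_000646.1|", Arrays.asList(primary, secondary));
        System.out.println(links.summaryLink);
        System.out.println(links.detailLinks);
    }
}
```

Keeping each generator responsible for a single ID style means a new database format only needs one new generator added to the list.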

Hope that helps,

Colin Hardman


From matthew_pocock at yahoo.co.uk  Thu Nov 20 06:55:00 2003
From: matthew_pocock at yahoo.co.uk (Matthew Pocock)
Date: Thu Nov 20 07:02:07 2003
Subject: [Biojava-l] File formats
In-Reply-To: <20031120095908.58849.qmail@web25207.mail.ukl.yahoo.com>
References: <20031120095908.58849.qmail@web25207.mail.ukl.yahoo.com>
Message-ID: <3FBCAB94.1050506@yahoo.co.uk>

Hi Rich,

We should check this out. This is one of the bizarre things about digital 
IP right now: the data in the ABI file is obviously yours, but 
potentially you are not allowed to access it in non-blessed ways because 
the encoding is proprietary. I have a feeling that we would have been in 
trouble if our code had been based on their serializer/deserializer code 
(which it is not) due to copyright issues. Software patents don't work in the 
EU/UK (yet). Beyond that I don't know. Oh, and IANAL.

Matthew

(goes to speak with someone who may know more)



From dumontier at mshri.on.ca  Thu Nov 20 09:41:07 2003
From: dumontier at mshri.on.ca (Marc Dumontier)
Date: Thu Nov 20 09:44:02 2003
Subject: [Biojava-l] blast2html
Message-ID: <490D0AFAF3D2D3119F6C00508B6FDF1501FA466B@ex.mshri.on.ca>

Hey,

Thanks for your input

I found the path of least resistance was to add the -I parameter when
running BLAST, which then includes the GI from the original FASTA file.

I then just parse out the GI to include in my link.
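Parsing out the GI can be done with a small regular expression applied to the first whitespace-delimited token of the summary line, which is the token Colin says ends up in the hit ID. GiLinkSketch and its URL template are hypothetical. The template is passed in by the caller, so no particular NCBI URL is assumed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GiLinkSketch {

    // "gi|<digits>" at the start of the id or immediately after a '|'.
    private static final Pattern GI = Pattern.compile("(?:^|\\|)gi\\|(\\d+)");

    /** Returns the GI from the first whitespace-delimited token of a summary line, or null. */
    static String extractGi(String summaryLine) {
        String trimmed = summaryLine.trim();
        if (trimmed.isEmpty()) {
            return null;
        }
        String hitId = trimmed.split("\\s+")[0];
        Matcher m = GI.matcher(hitId);
        return m.find() ? m.group(1) : null;
    }

    /** Fills a caller-supplied URL template (containing one %s) with the GI, or returns null. */
    static String linkFor(String summaryLine, String urlTemplate) {
        String gi = extractGi(summaryLine);
        return gi == null ? null : String.format(urlTemplate, gi);
    }

    public static void main(String[] args) {
        String line = "gi|4557284|ref|NM_000646.1|[4557284]  some text description";
        System.out.println(extractGi(line));
        System.out.println(linkFor(line, "http://example.org/lookup?gi=%s"));
    }
}
```

Hit IDs without a GI, such as ref|NP_011554.1|, yield null, so the caller can fall back to an accession-based link.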

Marc


From rhett-sutphin at uiowa.edu  Thu Nov 20 10:05:46 2003
From: rhett-sutphin at uiowa.edu (Rhett Sutphin)
Date: Thu Nov 20 09:58:19 2003
Subject: [Biojava-l] File formats
In-Reply-To: <3FBCAB94.1050506@yahoo.co.uk>
References: <20031120095908.58849.qmail@web25207.mail.ukl.yahoo.com>
	<3FBCAB94.1050506@yahoo.co.uk>
Message-ID: <3FBCD84A.3080803@uiowa.edu>

Hi Rich, Matthew,

I am also not a lawyer.  That in mind, here's my understanding of the topic:

First, the org.biojava.bio.program.abi package is based on a paper by Clark Tibbetts (available online: http://www-2.cs.cmu.edu/afs/cs/project/genome/WWW/Papers/clark.html ).  That paper was published in August 1995 and is a fairly thorough technical description of the ABI 377, including its means of operation, communication protocols, and (of course) data files.  I am unaware of any legal action taken against him or Vanderbilt (his apparent employer at the time).  And, of course, the paper remains available.

Second, the Staden io_lib library can read ABI-formatted chromatograms.  I am unaware of any legal action against its makers and it is currently available (and has been for a while).

Third (and this is, again, just my understanding of current US law), reverse engineering for interoperability is legal.  The only area where this is not true is if the material is (a) copyrighted and (b) protected by an "access control."  If these conditions are met, then the material falls under that most unpleasant of IP laws, the DMCA.  However, (a) whoever wants to read the ABI files with your software will probably own the copyright to them (if they are even copyrightable; they might just be lists of facts and hence uncopyrightable in the US); and (b) I don't think a proprietary file format rises to the level of an "access control."  An ABI file isn't encrypted; you just have to know which offsets to read the bytes from.
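To illustrate "knowing which offsets to read", here is a sketch of reading the fixed header of an ABIF-style chromatogram file. The layout assumed here follows the commonly described ABIF header: a 4-byte "ABIF" tag, a 2-byte version, and then a 28-byte root directory entry, with all numbers big-endian. This layout is an assumption and is not taken from Tibbetts's paper or from org.biojava.bio.program.abi. The values in main are made up.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class AbifHeaderSketch {

    static final class Header {
        final int version;
        final String dirTag;     // name of the root directory entry
        final int numEntries;    // number of directory entries
        final int dirOffset;     // byte offset of the directory in the file

        Header(int version, String dirTag, int numEntries, int dirOffset) {
            this.version = version;
            this.dirTag = dirTag;
            this.numEntries = numEntries;
            this.dirOffset = dirOffset;
        }
    }

    /** Reads the fixed-position header fields (assumed layout, all numbers big-endian). */
    static Header readHeader(byte[] file) {
        if (file.length < 34) {
            throw new IllegalArgumentException("Too short for an ABIF header");
        }
        String magic = new String(file, 0, 4, StandardCharsets.US_ASCII);   // bytes 0-3
        if (!magic.equals("ABIF")) {
            throw new IllegalArgumentException("Not an ABIF file: " + magic);
        }
        ByteBuffer buf = ByteBuffer.wrap(file).order(ByteOrder.BIG_ENDIAN);
        int version = buf.getShort(4) & 0xFFFF;                               // bytes 4-5
        String dirTag = new String(file, 6, 4, StandardCharsets.US_ASCII);  // bytes 6-9
        // bytes 10-13 tag number, 14-15 element type, 16-17 element size (unused here)
        int numEntries = buf.getInt(18);                                     // bytes 18-21
        // bytes 22-25 data size (unused here)
        int dirOffset = buf.getInt(26);                                      // bytes 26-29
        return new Header(version, dirTag, numEntries, dirOffset);
    }

    public static void main(String[] args) {
        // Synthetic header with made-up values, just to exercise the reader.
        ByteBuffer b = ByteBuffer.allocate(128).order(ByteOrder.BIG_ENDIAN);
        b.put("ABIF".getBytes(StandardCharsets.US_ASCII));
        b.putShort((short) 101);
        b.put("tdir".getBytes(StandardCharsets.US_ASCII));
        b.putInt(1);               // tag number
        b.putShort((short) 1023);  // element type
        b.putShort((short) 28);    // element size
        b.putInt(42);              // number of directory entries
        b.putInt(42 * 28);         // data size
        b.putInt(4096);            // data offset
        b.putInt(0);               // data handle
        Header h = readHeader(b.array());
        System.out.println(h.version + " " + h.dirTag + " " + h.numEntries + " @" + h.dirOffset);
    }
}
```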

Rhett



From dag at sonsorol.org  Thu Nov 20 10:54:07 2003
From: dag at sonsorol.org (Chris Dagdigian)
Date: Thu Nov 20 10:53:19 2003
Subject: [Biojava-l] Total OBF server shutdown Saturday November 22nd (all
	day EDT timezone)
Message-ID: <3FBCE39F.6080309@sonsorol.org>

Hi folks,

Apologies for the massive cross-posting.

Our CVS, mailing list and web servers are located in a Cambridge, MA USA 
datacenter belonging to Wyeth Research. Genetics Institute (which became 
part of Wyeth) has supported our significant internet bandwidth and 
hosting needs for many years, since the earliest versions of our open 
source efforts. Since I have to do this massive cross-post anyway, I 
figured it was a good time to thank them again in public.

The real reason for this message is to announce a 1-day period of 
significant server downtime. The office floor & datacenter in the 
building where our servers are hosted is going to have a planned 
electrical shutdown (including emergency and backup power circuits) from 
10am - 6pm on Saturday November 22nd. I'll be manually bringing down our 
servers sometime before the 10am deadline.

The time estimate is conservative. In the event that the facility work 
takes less time than expected, I'll probably take advantage of the window 
to perform some server upgrades and failed disk replacements.

For any questions/concerns or if you notice a server or service that is 
still not available after the 22nd please contact me directly at 
'chris@bioteam.net' or 1-617-877-5498.

Regards,
Chris


From mark.schreiber at agresearch.co.nz  Thu Nov 20 16:06:24 2003
From: mark.schreiber at agresearch.co.nz (Schreiber, Mark)
Date: Thu Nov 20 16:02:46 2003
Subject: [Biojava-l] MassCalc question.
Message-ID: <AF026AF0FF4B054590228FD1F1DE5165016E7CCA@inbox.agresearch.co.nz>

Hi -

This is a bug that has been fixed in biojava-live and in the 1.3 branch on CVS. The fix will also be in the BioJava 1.3.1 release, which will be out very soon (as soon as I solve my FTP problems and put it up on the site!).

- Mark



From wux at mail.cbi.pku.edu.cn  Thu Nov 20 19:49:14 2003
From: wux at mail.cbi.pku.edu.cn (wux@mail.cbi.pku.edu.cn)
Date: Thu Nov 20 19:49:27 2003
Subject: [Biojava-l] chinese version of biojava in anger
Message-ID: <200311210053.hAL0r4AY018996@mail.cbi.pku.edu.cn>

Dear all:

   I have finished translating Biojava In Anger into Simplified Chinese.
Here is the URL: http://wux.cbi.pku.edu.cn/PUMA/biojava/index-cn.html


                                          Yours faithfully,
                                               wux
                                               wux@mail.cbi.pku.edu.cn
                                               2003-11-21
*****************************************************
WuXin  Ph.D student of CBI (Center of Bioinformatics)
Peking University  100871 P.R.China
Email: wux@mail.cbi.pku.edu.cn
Tel: 010-62762409 (dorm)
     010-62755206 (office)
Address: Building 47#2026 Peking University
*****************************************************


From verhoeff2 at gis.a-star.edu.sg  Fri Nov 21 03:02:57 2003
From: verhoeff2 at gis.a-star.edu.sg (VERHOEF Frans)
Date: Fri Nov 21 03:01:42 2003
Subject: [Biojava-l] File formats
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D560B0426@BIONIC.biopolis.one-north.com>

Hi,

Just my take on it, but if developing software that reads ABI files
were illegal, I think Microsoft would already have sued Sun
Microsystems over StarOffice being able to read and write MS Office
formats.
So I do not think you have to worry about it.

Kind regards,
Frans


> -----Original Message-----
> From: biojava-l-bounces@portal.open-bio.org [mailto:biojava-l-
> bounces@portal.open-bio.org] On Behalf Of Rich Heath
> Sent: Thursday, November 20, 2003 5:59 PM
> To: biojava-l@biojava.org
> Subject: [Biojava-l] File formats
> 
> Hi,
> 
> I am a software developer based in the UK that has
> been asked about producing a piece of software that
> outputs data from the files in ABI sequencers in a
> more human readable format. I hope the
> org.biojava.bio.program.abi package will let me do
> this, but I have some concerns about the legal
> implications of using and contributing to this
> package.
> 
> Does anyone know what the legal position is with
> regards reverse engineering the Applied Biosystems
> file format (and any other file formats come to that
> matter)? I would imagine this file format is the
> property of Applied Biosystems and they would not like
> me producing applications that read from it unless I
> provide them with a sizable licence fee (although I
> guess I am not reverse engineering it if I just use
> the above package, just if I contribute to it?).
> 
> Many thanks in advance for your help,
> 
> Rich
> 

From lmorris at ebi.ac.uk  Mon Nov 24 09:48:43 2003
From: lmorris at ebi.ac.uk (Lorna Morris)
Date: Mon Nov 24 09:55:15 2003
Subject: [Biojava-l] code changes to embl parser
Message-ID: <3FC21A4B.6090709@ebi.ac.uk>

I'm trying to submit some biojava files I've changed to the mailing list but I get this message: 

-----------

Your mail to 'Biojava-l' with the subject

    EmblFileFormer

Is being held until the list moderator can review it for approval.

The reason it is being held:

    Message has a suspicious header


------------

I'm sending them as 6 separate java attachments. Should I send the code changes in a different way to avoid getting this message? Thanks,

Lorna


From td2 at sanger.ac.uk  Mon Nov 24 10:10:24 2003
From: td2 at sanger.ac.uk (Thomas Down)
Date: Mon Nov 24 10:16:54 2003
Subject: [Biojava-l] code changes to embl parser
In-Reply-To: <3FC21A4B.6090709@ebi.ac.uk>
References: <3FC21A4B.6090709@ebi.ac.uk>
Message-ID: <20031124151024.GA277532@jabba.sanger.ac.uk>

On Mon, Nov 24, 2003 at 02:48:43PM +0000, Lorna Morris wrote:
> I'm trying to submit some biojava files I've changed to the mailing list 
> but I get this message: 
> -----------
> 
> Your mail to 'Biojava-l' with the subject
> 
>    EmblFileFormer
> 
> Is being held until the list moderator can review it for approval.
> 
> The reason it is being held:
> 
>    Message has a suspicious header
> 
> 
> ------------
> 
> I'm sending them as 6 separate java attachments. Should I send the code 
> changes in a different way to avoid getting this message? Thanks,

Hi Lorna,

I'm afraid that all messages with attachments are currently being
held by the mailing list software -- it was introduced as a
(rather draconian) anti spam/virus measure.  We've now got some
better filtering software on the mailing list, so I think the
no-attachments rule should probably be removed.

But in the meantime, the best solution would be either to put
your changes on a website somewhere and post a link, or just include
them in the body of a message.

Sorry about that,

     Thomas.
From lmorris at ebi.ac.uk  Mon Nov 24 10:32:47 2003
From: lmorris at ebi.ac.uk (Lorna Morris)
Date: Mon Nov 24 10:39:18 2003
Subject: [Biojava-l] EMBLFileFormer changes
Message-ID: <3FC2249F.6010108@ebi.ac.uk>

Hello

I'm using biojava to parse an EMBL flat file, modify it, and dump it out
to file at the end. However, when I used SeqIOTools.writeEmbl, the file
it created did not have correctly ordered and nested RN, RP, RX, RA, RT
and RL lines. These lines should occur in repeated sets, one set for
each reference in the flat file. I've modified some of the biojava
classes and added 2 new classes to correct this. Everything works fine
now. I've put the modified and new classes here:

www.ebi.ac.uk/~lmorris/bioJavaFiles

Files modified:

EmblLikeFormat
EmblFileFormer
SeqIOEventEmitter
GenEmblPropertyComparator

Files added:

ReferenceAnnotation.java
EmblReferenceComparator.java

If you need any more details on the changes I've made let me know. Thanks,

Lorna
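For readers following the thread, the grouping described above can be sketched without BioJava: collect each reference's lines into one block (a new block starts at every RN line), then write the blocks back in the fixed EMBL tag order with an XX separator after each. This is a minimal sketch under stated assumptions, not Lorna's implementation: the class `ReferenceBlocks` and its methods are hypothetical, and the tag order RN, RP, RX, RA, RT, RL is taken from the comparator attached below.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of BioJava: groups EMBL reference lines into
// one block per reference and writes them back in the canonical tag order.
class ReferenceBlocks {
    // Assumed canonical order of tags within one EMBL reference block
    static final List<String> ORDER = Arrays.asList("RN", "RP", "RX", "RA", "RT", "RL");

    // Splits header lines into blocks; each RN line starts a new block.
    // Lines whose tag is not in ORDER (including XX) are dropped here.
    static List<Map<String, List<String>>> group(List<String> lines) {
        List<Map<String, List<String>>> blocks = new ArrayList<>();
        Map<String, List<String>> current = null;
        for (String line : lines) {
            if (line.length() < 2) continue;
            String tag = line.substring(0, 2);
            if (tag.equals("RN")) {
                current = new HashMap<>();
                blocks.add(current);
            }
            if (current != null && ORDER.contains(tag)) {
                // The value starts at column 6, as in EmblLikeFormat.readSequence
                String rest = line.length() > 5 ? line.substring(5) : "";
                current.computeIfAbsent(tag, k -> new ArrayList<>()).add(rest);
            }
        }
        return blocks;
    }

    // Emits each block in ORDER, keeping repeated lines (e.g. multi-line RA)
    // in their original order, and closes each block with an XX separator.
    static List<String> emit(List<Map<String, List<String>>> blocks) {
        List<String> out = new ArrayList<>();
        for (Map<String, List<String>> block : blocks) {
            for (String tag : ORDER) {
                List<String> values = block.get(tag);
                if (values == null) continue;
                for (String v : values) out.add(tag + "   " + v);
            }
            out.add("XX");
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> in = Arrays.asList(
            "RN   [1]", "RA   Smith J.;", "RP   1-100", "XX",
            "RN   [2]", "RT   \"A title\";", "RA   Doe J.;", "XX");
        for (String line : emit(group(in))) System.out.println(line);
    }
}
```

Keying each block by tag is what makes the nesting come out right: however the lines arrive, each reference is written as one contiguous RN...RL set.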

From lmorris at ebi.ac.uk  Wed Nov 19 11:09:41 2003
From: lmorris at ebi.ac.uk (Lorna Morris)
Date: Mon Nov 24 12:01:13 2003
Subject: [Biojava-l] EMBL Parser
Message-ID: <3FBB95C5.9000306@ebi.ac.uk>

Hello

I'm using biojava to parse an EMBL flat file, modify it, and dump it out
to file at the end. However, when I used SeqIOTools.writeEmbl, the file
it created did not have correctly ordered and nested RN, RP, RX, RA, RT
and RL lines. These lines should occur in repeated sets, one set for
each reference in the flat file. I've modified some of the biojava
classes and added 2 new classes to correct this. Everything works fine
now. I'm attaching the classes to this mail.

Files modified:

EmblLikeFormat
EmblFileFormer
SeqIOEventEmitter
GenEmblPropertyComparator

Files added:

ReferenceAnnotation.java
EmblReferenceComparator.java

If you need any more details on the changes I've made let me know. Thanks,

Lorna


-------------- next part --------------
/*
 *                    BioJava development code
 *
 * This code may be freely distributed and modified under the
 * terms of the GNU Lesser General Public Licence.  This should
 * be distributed with the code.  If you do not have a copy,
 * see:
 *
 *      http://www.gnu.org/copyleft/lesser.html
 *
 * Copyright for this code is held jointly by the individual
 * authors.  These should be listed in @author doc comments.
 *
 * For more information on the BioJava project and its aims,
 * or to join the biojava-l mailing list, visit the home page
 * at:
 *
 *      http://www.biojava.org/
 *
 */

package org.biojava.bio.seq.io;


import java.io.PrintStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import org.biojava.bio.seq.Feature;
import org.biojava.bio.seq.StrandedFeature;
import org.biojava.bio.symbol.Alphabet;
import org.biojava.bio.symbol.IllegalAlphabetException;
import org.biojava.bio.symbol.IllegalSymbolException;
import org.biojava.bio.symbol.Symbol;
import org.biojava.bio.taxa.EbiFormat;
import org.biojava.bio.taxa.Taxon;
import org.biojava.bio.BioException;

/**
 * <p><code>EmblFileFormer</code> performs the detailed formatting of
 * EMBL entries for writing to a <code>PrintStream</code>. Currently
 * the formatting of the header is not correct. This really needs to
 * be addressed in the parser which is merging fields which should
 * remain separate.</p>
 *
 * <p>The event generator used to feed events to this class should
 * enforce ordering of those events. This class will stream data
 * directly to the <code>PrintStream</code>.</p>
 *
 * <p>This implementation requires that all the symbols be added in
 * one block, as it does not buffer the tokenized symbols between
 * calls.</p>
 *
 * @author Keith James
 * @author Len Trigg (Taxon output)
 * @since 1.2
 */
public class EmblFileFormer extends AbstractGenEmblFileFormer
    implements SeqFileFormer
{
    // Tags which are special cases, not having "XX" after them
    private static List NON_SEPARATED_TAGS = new ArrayList();

    static
    {
        NON_SEPARATED_TAGS.add(EmblLikeFormat.SOURCE_TAG);
        NON_SEPARATED_TAGS.add(EmblLikeFormat.REFERENCE_TAG);
        NON_SEPARATED_TAGS.add(EmblLikeFormat.COORDINATE_TAG);
        NON_SEPARATED_TAGS.add(EmblLikeFormat.REF_ACCESSION_TAG);
        NON_SEPARATED_TAGS.add(EmblLikeFormat.AUTHORS_TAG);
        NON_SEPARATED_TAGS.add(EmblLikeFormat.TITLE_TAG);
        NON_SEPARATED_TAGS.add(EmblLikeFormat.FEATURE_TAG);
        NON_SEPARATED_TAGS.add(EmblLikeFormat.JOURNAL_TAG);//Lorna: added
        NON_SEPARATED_TAGS.add(EmblLikeFormat.SEPARATOR_TAG);//Lorna: added
    }

    // 19 spaces
    private static String FT_LEADER =
        EmblLikeFormat.FEATURE_TABLE_TAG + "                   ";

    // 3 spaces
    private static String SQ_LEADER = "   ";

    // 80 spaces
    private static String EMPTY_LINE =
        "                                        " +
        "                                        ";

    private PrintStream stream;

    private String idLine;
    private String accLine;

    /**
     * Creates a new <code>EmblFileFormer</code> using
     * <code>System.out</code> stream.
     */
    protected EmblFileFormer()
    {
        this(System.out);
    }

    /**
     * Creates a new <code>EmblFileFormer</code> using the specified
     * stream.
     *
     * @param stream a <code>PrintStream</code>.
     */
    protected EmblFileFormer(PrintStream stream)
    {
        super();
        this.stream = stream;
    }

    public PrintStream getPrintStream()
    {
        return stream;
    }

    public void setPrintStream(PrintStream stream)
    {
        this.stream = stream;
    }

    public void setName(String id) throws ParseException
    {
        idLine = id;
    }

    public void startSequence() throws ParseException
    {
       aCount = 0;
       cCount = 0;
       gCount = 0;
       tCount = 0;
       oCount = 0;
    }

    public void endSequence() throws ParseException
    {
        stream.println(EmblLikeFormat.END_SEQUENCE_TAG);
    }

    public void setURI(String uri) throws ParseException { }

    public void addSymbols(Alphabet  alpha,
                           Symbol [] syms,
                           int       start,
                           int       length)
        throws IllegalAlphabetException
    {
        try
        {
            int end = start + length - 1;

            for (int i = start; i <= end; i++)
            {
                Symbol sym = syms[i];

                if (sym == a)
                    aCount++;
                else if (sym == c)
                    cCount++;
                else if (sym == g)
                    gCount++;
                else if (sym == t)
                    tCount++;
                else
                    oCount++;
            }

            StringBuffer sb = new StringBuffer(EmblLikeFormat.SEPARATOR_TAG);
            sb.append(nl);
            sb.append("SQ   Sequence ");
            sb.append(length + " BP; ");
            sb.append(aCount + " A; ");
            sb.append(cCount + " C; ");
            sb.append(gCount + " G; ");
            sb.append(tCount + " T; ");
            sb.append(oCount + " other;");

            // Print sequence summary header
            stream.println(sb);

            int fullLine = length / 60;
            int partLine = length % 60;

            int lineCount = fullLine;
            if (partLine > 0)
                lineCount++;

            int lineLens [] = new int [lineCount];

            // All lines are 60, except last (if present)
            Arrays.fill(lineLens, 60);

            if (partLine > 0)
                lineLens[lineCount - 1] = partLine;

            for (int i = 0; i < lineLens.length; i++)
            {
                // Prep the whitespace
                StringBuffer sq = new StringBuffer(EMPTY_LINE);

                // How long is this chunk?
                int len = lineLens[i];
                // Prepare a Symbol array same length as chunk
                Symbol [] sa = new Symbol [len];

                // Get symbols and format into blocks of tokens
                System.arraycopy(syms, start + (i * 60), sa, 0, len);

                sb = new StringBuffer();

                String blocks = (formatTokenBlock(sb, sa, 10,
                         alpha.getTokenization("token"))).toString();

                sq.replace(5, blocks.length() + 5, blocks);

                // Calculate the running residue count and add to the line
                String count = Integer.toString((i * 60) + len);
                sq.replace((80 - count.length()), 80, count);

                // Print formatted sequence line
                stream.println(sq);
            }
        }
        catch (BioException ex)
        {
            throw new IllegalAlphabetException(ex, "Alphabet not tokenizing");
        }
    }

    public void addSequenceProperty(Object key, Object value)
        throws ParseException
    {
        StringBuffer sb = new StringBuffer();

        // Ignore separators if they are sent to us. The parser should
        // be ignoring these really (lorna: I've changed this so they are ignored in SeqIOEventEmitter)
        //if (key.equals(EmblLikeFormat.SEPARATOR_TAG))
            //return;

        String tag = key.toString();
        String leader = tag + SQ_LEADER;
        String line = "";
        int wrapWidth = 85 - leader.length();

        // Special case: accession number
        if (key.equals(EmblProcessor.PROPERTY_EMBL_ACCESSIONS))
        {
            accLine = buildPropertyLine((Collection) value, ";", true);
            return;
        }
        else if (key.equals(EmblLikeFormat.ACCESSION_TAG))
        {
            line = accLine;
        } else if (key.equals(OrganismParser.PROPERTY_ORGANISM)) {
            Taxon taxon = (Taxon) value;
            addSequenceProperty(EmblLikeFormat.SOURCE_TAG, taxon);
            addSequenceProperty(EmblLikeFormat.ORGANISM_TAG, taxon.getParent());
            addSequenceProperty(EmblLikeFormat.ORGANISM_XREF_TAG, taxon);
            return;
        }
        if (value instanceof String)
        {
            line = (String) value;
        }
        else if (value instanceof Collection)
        {
            // Special case: date lines
            if (key.equals(EmblLikeFormat.DATE_TAG))
            {
                line = buildPropertyLine((Collection) value, nl + leader, false);
                wrapWidth = Integer.MAX_VALUE;
            }
            //lorna :added 21.08.03, DR lines are another special case. Each one goes onto a separate line.
            else if (key.equals(EmblLikeFormat.DR_TAG))
            {
                line = buildPropertyLine((Collection) value, nl + leader, false);
                wrapWidth = Integer.MAX_VALUE;
            }
            else if (key.equals(EmblLikeFormat.AUTHORS_TAG))
            {
                line = buildPropertyLine((Collection) value, nl + leader, false); //lorna: add space here?
                wrapWidth = Integer.MAX_VALUE;
            }
            else if (key.equals(EmblLikeFormat.REF_ACCESSION_TAG))
            {
                line = buildPropertyLine((Collection) value, nl + leader, false);
                wrapWidth = Integer.MAX_VALUE;
            }
            else
            {
                line = buildPropertyLine((Collection) value, " ", false);
            }
        } else if (value instanceof Taxon) {
            if (key.equals(EmblLikeFormat.ORGANISM_TAG)) {
                line = EbiFormat.getInstance().serialize((Taxon) value);
            } else if (key.equals(EmblLikeFormat.SOURCE_TAG)) {
                line = EbiFormat.getInstance().serializeSource((Taxon) value);
            } else if (key.equals(EmblLikeFormat.ORGANISM_XREF_TAG)) {
                line = EbiFormat.getInstance().serializeXRef((Taxon) value);
            }
        }

        if (line.length() == 0)
        {
            stream.println(tag);
        }
        else
        {
            sb = formatSequenceProperty(sb, line, leader, wrapWidth);
            stream.println(sb);
        }
        // Special case: those which don't get separated
        if (! NON_SEPARATED_TAGS.contains(key))
            stream.println(EmblLikeFormat.SEPARATOR_TAG);
        // Special case: feature header
        if (key.equals(EmblLikeFormat.FEATURE_TAG))
            stream.println(EmblLikeFormat.FEATURE_TAG);
    }


    public void startFeature(Feature.Template templ)
        throws ParseException
    {
        int strand = 0;

        if (templ instanceof StrandedFeature.Template)
            strand = ((StrandedFeature.Template) templ).strand.getValue();

        StringBuffer sb = new StringBuffer(FT_LEADER);
        sb = formatLocationBlock(sb, templ.location, strand, FT_LEADER, 80);
        sb.replace(5, 5 + templ.type.length(), templ.type);
        stream.println(sb);
    }

    public void endFeature() throws ParseException { }

    public void addFeatureProperty(Object key, Object value)
    {
        // Don't print internal data structures
        if (key.equals(Feature.PROPERTY_DATA_KEY))
            return;

        StringBuffer fb;
        StringBuffer sb;

        // The value may be a collection if several qualifiers of the
        // same type are present in a feature
        if (value instanceof Collection)
        {
            for (Iterator vi = ((Collection) value).iterator(); vi.hasNext();)
            {
                fb = new StringBuffer();
                sb = new StringBuffer();

                fb = formatQualifierBlock(fb,
                                          formatQualifier(sb, key, vi.next()).substring(0),
                                          FT_LEADER,
                                          80);
                stream.println(fb);
            }
        }
        else
        {
            fb = new StringBuffer();
            sb = new StringBuffer();

            fb = formatQualifierBlock(fb,
                                      formatQualifier(sb, key, value).substring(0),
                                      FT_LEADER,
                                      80);
            stream.println(fb);
        }
    }

    private String buildPropertyLine(Collection property,
                                     String separator,
                                     boolean terminate)
    {
        StringBuffer sb = new StringBuffer();

        for (Iterator pi = property.iterator(); pi.hasNext();)
        {
            sb.append(pi.next().toString());
            sb.append(separator);
        }

        if (terminate)
        {
            return sb.substring(0);
        }
        else
        {
            return sb.substring(0, sb.length() - separator.length());
        }
    }
}
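The SQ-block layout that `addSymbols` builds above (60 residues per line, in space-separated blocks of 10, a 5-space indent, and a running residue count right-aligned to column 80) can be reproduced in a small standalone sketch. The class `EmblSequenceLines` is hypothetical and works on a plain `String` rather than BioJava `Symbol` arrays. It also assumes that `formatTokenBlock` joins blocks with a single space, which is not shown in this code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone helper, not BioJava API: lays out residues in the
// SQ-block layout that EmblFileFormer.addSymbols appears to produce.
class EmblSequenceLines {
    static final int LINE_WIDTH = 80; // running count ends at this column
    static final int PER_LINE = 60;   // residues per line
    static final int PER_BLOCK = 10;  // residues per space-separated block

    static List<String> format(String residues) {
        List<String> lines = new ArrayList<>();
        for (int start = 0; start < residues.length(); start += PER_LINE) {
            int end = Math.min(start + PER_LINE, residues.length());
            StringBuilder line = new StringBuilder("     "); // 5-space indent
            for (int b = start; b < end; b += PER_BLOCK) {
                if (b > start) line.append(' ');
                line.append(residues, b, Math.min(b + PER_BLOCK, end));
            }
            // Right-align the running residue count so it ends at column 80
            String count = Integer.toString(end);
            while (line.length() < LINE_WIDTH - count.length()) line.append(' ');
            line.append(count);
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        StringBuilder seq = new StringBuilder();
        for (int i = 0; i < 65; i++) seq.append("acgt".charAt(i % 4));
        for (String line : format(seq.toString())) System.out.println(line);
    }
}
```

Because the count is padded against a fixed width, a short final line still produces an 80-column line. This matches how `addSymbols` starts from the 80-space `EMPTY_LINE` and overwrites the count at the end.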
-------------- next part --------------
/*
 *                    BioJava development code
 *
 * This code may be freely distributed and modified under the
 * terms of the GNU Lesser General Public Licence.  This should
 * be distributed with the code.  If you do not have a copy,
 * see:
 *
 *      http://www.gnu.org/copyleft/lesser.html
 *
 * Copyright for this code is held jointly by the individual
 * authors.  These should be listed in @author doc comments.
 *
 * For more information on the BioJava project and its aims,
 * or to join the biojava-l mailing list, visit the home page
 * at:
 *
 *      http://www.biojava.org/
 *
 */

package org.biojava.bio.seq.io;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintStream;
import java.io.Serializable;
import java.util.Vector;
import java.util.ArrayList;

import org.biojava.bio.seq.Sequence;
import org.biojava.bio.symbol.IllegalSymbolException;
import org.biojava.utils.ParseErrorEvent;
import org.biojava.utils.ParseErrorListener;
import org.biojava.utils.ParseErrorSource;
import org.biojava.utils.ChangeVetoException;

/**
 * <p>
 * Format processor for handling EMBL records and similar files.  This
 * takes a very simple approach: all `normal' attribute lines are
 * passed to the listener as a tag (first two characters) and a value
 * (the rest of the line from the 6th character onwards).  Any data
 * between the special `SQ' line and the "//" entry terminator is
 * passed as a SymbolReader.
 * </p>
 *
 * <p>
 * This low-level format processor should normally be used in
 * conjunction with one or more `filter' objects, such as
 * EmblProcessor.
 * </p>
 *
 * <p>
 * Many ideas borrowed from the old EmblFormat processor by Thomas
 * Down and Thad Welch.
 * </p>
 *
 * @author Thomas Down
 * @author Greg Cox
 * @author Keith James
 * @author Len Trigg
 * @since 1.1
 */

public class EmblLikeFormat
    implements
            SequenceFormat,
            Serializable,
            ParseErrorSource,
            ParseErrorListener
{
    public static final String DEFAULT = "EMBL";

    protected static final String ID_TAG = "ID";
    protected static final String SIZE_TAG = "SIZE";
    protected static final String STRAND_NUMBER_TAG = "STRANDS";
    protected static final String TYPE_TAG = "TYPE";
    protected static final String CIRCULAR_TAG = "CIRCULAR";
    protected static final String DIVISION_TAG = "DIVISION";
    protected static final String DR_TAG = "DR"; //Lorna: new tag

    protected static final String ACCESSION_TAG = "AC";
    protected static final String VERSION_TAG = "SV";
    protected static final String DATE_TAG = "DT";
    protected static final String DEFINITION_TAG = "DE";
    protected static final String KEYWORDS_TAG = "KW";
    protected static final String SOURCE_TAG = "OS";
    protected static final String ORGANISM_TAG = "OC";
    protected static final String ORGANISM_XREF_TAG = "OX";
    protected static final String REFERENCE_TAG = "RN";
    protected static final String COORDINATE_TAG = "RP";
    protected static final String REF_ACCESSION_TAG = "RX";
    protected static final String AUTHORS_TAG = "RA";
    protected static final String TITLE_TAG = "RT";
    protected static final String JOURNAL_TAG = "RL";
    protected static final String COMMENT_TAG = "CC";
    protected static final String FEATURE_TAG = "FH";
    protected static final String SEPARATOR_TAG = "XX";
    protected static final String FEATURE_TABLE_TAG = "FT";
    protected static final String START_SEQUENCE_TAG = "SQ";  
    protected static final String END_SEQUENCE_TAG = "//";

    private boolean elideSymbols = false;
    private Vector mListeners = new Vector();

    /**
     * <p>Specifies whether the symbols (SQ) part of the entry should
     * be ignored. If this property is set to <code>true</code>, the
     * parser will never call addSymbols on the
     * <code>SeqIOListener</code>, but parsing will be faster if
     * you're only interested in header information.</p>
     *
     * <p> This property also allows the header to be parsed for files
     * which have invalid sequence data.</p>
     */
    public void setElideSymbols(boolean b)
    {
        elideSymbols = b;
    }

    /**
     * Return a flag indicating if symbol data will be skipped
     * when parsing streams.
     */
    public boolean getElideSymbols()
    {
        return elideSymbols;
    }

    public boolean readSequence(BufferedReader     reader,
                                SymbolTokenization symParser,
                                SeqIOListener      listener)
        throws IllegalSymbolException, IOException, ParseException
    {
        if (listener instanceof ParseErrorSource) {
            ((ParseErrorSource)(listener)).addParseErrorListener(this);
        }

        String            line;
        StreamParser    sparser       = null;
        boolean hasMoreSequence       = true;
        boolean hasInternalWhitespace = false;

        listener.startSequence();

        while ((line = reader.readLine()) != null)
        {
            if (line.startsWith(END_SEQUENCE_TAG))
            {
                if (sparser != null)
                {
                    // End of symbol data
                    sparser.close();
                    sparser = null;
                }

                // Allows us to tolerate trailing whitespace without
                // thinking that there is another Sequence to follow
                while (true)
                {
                    reader.mark(1);
                    int c = reader.read();

                    if (c == -1)
                    {
                        hasMoreSequence = false;
                        break;
                    }

                    if (Character.isWhitespace((char) c))
                    {
                        hasInternalWhitespace = true;
                        continue;
                    }

                    if (hasInternalWhitespace)
                        System.err.println("Warning: whitespace found between sequence entries");

                    reader.reset();
                    break;
                }

                listener.endSequence();
                return hasMoreSequence;
            }
            else if (line.startsWith(START_SEQUENCE_TAG))
            {
                // Adding a null property to flush the last feature;
                // Needed for Swissprot files because there is no gap
                // between the feature table and the sequence data
                listener.addSequenceProperty(SEPARATOR_TAG, "");

                sparser = symParser.parseStream(listener);
            }
            else
            {
                if (sparser == null)
                {
                    // Normal attribute line
                    String tag  = line.substring(0, 2);
                    String rest = null;
                    if (line.length() > 5)
                    {
                        rest = line.substring(5);
                    }

                    //lorna added, tags read in order, when a complete set goes through,
                    //spit out a single annotation event

                    if (tag.equals(REFERENCE_TAG)) { //only 1 reference_tag!
                        ReferenceAnnotation refAnnot = new ReferenceAnnotation();

                        try {
                            refAnnot.setProperty(tag, rest);
                            while (!(tag.equals(SEPARATOR_TAG))) {
                                // Normal attribute line within the reference block
                                line = reader.readLine();

                                // Guard against a truncated entry or a short line
                                if (line == null)
                                    throw new IOException("Premature end of stream in EMBL reference block");
                                if (line.length() < 2)
                                    continue;

                                tag  = line.substring(0, 2);
                                rest = (line.length() > 5) ? line.substring(5) : null; //null for XX lines

                                if (refAnnot.containsProperty(tag)) {
                                    // Repeated tag (e.g. multi-line RA or RT): collect the values in a list
                                    Object property = refAnnot.getProperty(tag);

                                    if (property instanceof ArrayList) {
                                        ((ArrayList) property).add(rest);
                                    } else {
                                        ArrayList properties = new ArrayList();
                                        properties.add(property);
                                        properties.add(rest);
                                        refAnnot.setProperty(tag, properties);
                                    }
                                } else {
                                    refAnnot.setProperty(tag, rest);
                                }
                            }
                            listener.addSequenceProperty(ReferenceAnnotation.class, refAnnot);

                        } catch (ChangeVetoException cve) {
                            cve.printStackTrace();
                        }

                    }
                    // lorna, end
                    else { //lorna
                        listener.addSequenceProperty(tag, rest);
                    } //lorna
                }
                else
                {
                    // Sequence line
                    if (! elideSymbols)
                        processSequenceLine(line, sparser);
                }
            }
        }

        if (sparser != null)
            sparser.close();

        throw new IOException("Premature end of stream or missing end tag '//' for EMBL");
    }


    /**
     * Dispatch symbol data from SQ-block line of an EMBL-like file.
     */
    protected void processSequenceLine(String line, StreamParser parser)
        throws IllegalSymbolException, ParseException
    {
        char[] cline = line.toCharArray();
        int parseStart = 0;
        int parseEnd   = 0;

        while (parseStart < cline.length)
        {
            while (parseStart < cline.length && cline[parseStart] == ' ')
                ++parseStart;
            if (parseStart >= cline.length)
                break;

            if (Character.isDigit(cline[parseStart]))
                return;

            // Start at parseStart so the first character of the token is
            // also checked for gap characters
            parseEnd = parseStart;
            while (parseEnd < cline.length && cline[parseEnd] != ' ') {
                if (cline[parseEnd] == '.' || cline[parseEnd] == '~') {
                   cline[parseEnd] = '-';
                }
                ++parseEnd;
            }

            // Got a segment of read sequence data
            parser.characters(cline, parseStart, parseEnd - parseStart);

            parseStart = parseEnd;
        }
    }

    public void writeSequence(Sequence seq, PrintStream os)
        throws IOException
    {
        writeSequence(seq, getDefaultFormat(), os);
    }

    /**
     * <code>writeSequence</code> writes a sequence to the specified
     * <code>PrintStream</code>, using the specified format.
     *
     * @param seq a <code>Sequence</code> to write out.
     * @param format a <code>String</code> indicating which sub-format
     * of those available from a particular
     * <code>SequenceFormat</code> implementation to use when
     * writing.
     * @param os a <code>PrintStream</code> object.
     *
     * @exception IOException if an error occurs.
     * @deprecated use writeSequence(Sequence seq, PrintStream os)
     */
    public void writeSequence(Sequence seq, String format, PrintStream os)
        throws IOException
    {
        SeqFileFormer former;

        if (format.equalsIgnoreCase("EMBL"))
            former = new EmblFileFormer();
        else if (format.equalsIgnoreCase("SWISSPROT"))
            former = new SwissprotFileFormer();
        else
            throw new IllegalArgumentException("Unknown format '"
                                               + format
                                               + "'");
        former.setPrintStream(os);

        SeqIOEventEmitter emitter =
            new SeqIOEventEmitter(GenEmblPropertyComparator.INSTANCE,
                                  GenEmblFeatureComparator.INSTANCE);

        emitter.getSeqIOEvents(seq, former);
    }

    /**
     * <code>getDefaultFormat</code> returns the String identifier for
     * the default format written by a <code>SequenceFormat</code>
     * implementation.
     *
     * @return a <code>String</code>.
     * @deprecated
     */
    public String getDefaultFormat()
    {
        return DEFAULT;
    }

    /**
     * <p>
     * This method determines the behaviour when a bad line is processed.
     * Some options are to log the error, throw an exception, ignore it
     * completely, or pass the event through.
     * </p>
     *
     * <p>
     * This method should be overridden when different behavior is desired.
     * </p>
     *
     * @param theEvent The event that contains the bad line and token.
     */
    public void BadLineParsed(ParseErrorEvent theEvent)
    {
        notifyParseErrorEvent(theEvent);
    }

    /**
     * Adds a parse error listener to the list of listeners if it isn't already
     * included.
     *
     * @param theListener Listener to be added.
     */
    public synchronized void addParseErrorListener(ParseErrorListener theListener)
    {
        if (mListeners.contains(theListener) == false)
        {
            mListeners.addElement(theListener);
        }
    }

    /**
     * Removes a parse error listener from the list of listeners if it is
     * included.
     *
     * @param theListener Listener to be removed.
     */
    public synchronized void removeParseErrorListener(ParseErrorListener theListener)
    {
        if (mListeners.contains(theListener) == true)
        {
            mListeners.removeElement(theListener);
        }
    }
 
    // Protected methods
    /**
     * Passes the event on to all the listeners registered for ParseErrorEvents.
     *
     * @param theEvent The event to be handed to the listeners.
     */
    protected void notifyParseErrorEvent(ParseErrorEvent theEvent)
    {
        Vector listeners;
        synchronized(this)
        {
            listeners = (Vector)mListeners.clone();
        }

        for (int index = 0; index < listeners.size(); index++)
        {
            ParseErrorListener client = (ParseErrorListener)listeners.elementAt(index);
            client.BadLineParsed(theEvent);
        }
    }
}
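notifyParseErrorEvent above clones the listener Vector while holding the lock, then calls each listener outside it. The sketch below shows that snapshot-then-notify idiom in isolation, assuming Java 8+. ErrorSource and its Listener interface are hypothetical stand-ins for ParseErrorSource/ParseErrorListener, not BioJava API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of snapshot-then-notify: the listener list is copied
 * under the lock and the copy is iterated without it, so a callback may
 * add or remove listeners without a ConcurrentModificationException and
 * without holding the lock while foreign code runs.
 */
class ErrorSource {
    /** Hypothetical callback interface (stands in for ParseErrorListener). */
    interface Listener {
        void badLineParsed(String line);
    }

    private final List<Listener> listeners = new ArrayList<>();

    synchronized void addListener(Listener l) {
        if (!listeners.contains(l)) { // ignore duplicates, as the original does
            listeners.add(l);
        }
    }

    synchronized void removeListener(Listener l) {
        listeners.remove(l);
    }

    /** Copies the list under the lock, then notifies each listener unlocked. */
    void notifyListeners(String line) {
        List<Listener> snapshot;
        synchronized (this) {
            snapshot = new ArrayList<>(listeners);
        }
        for (Listener l : snapshot) {
            l.badLineParsed(line);
        }
    }

    public static void main(String[] args) {
        ErrorSource source = new ErrorSource();
        List<String> seen = new ArrayList<>();
        Listener once = new Listener() {
            public void badLineParsed(String line) {
                seen.add(line);
                source.removeListener(this); // safe: we are iterating a copy
            }
        };
        source.addListener(once);
        source.addListener(once); // duplicate registration is ignored
        source.notifyListeners("bad line 1");
        source.notifyListeners("bad line 2");
        System.out.println(seen); // prints [bad line 1]
    }
}
```

java.util.concurrent.CopyOnWriteArrayList gives the same iteration semantics without an explicit copy. It is a reasonable choice when listeners change rarely compared with how often events fire.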
-------------- next part --------------
/*
 * Created by IntelliJ IDEA.
 * User: lmorris
 * Date: Nov 14, 2003
 * Time: 11:11:52 AM
 */
package org.biojava.bio.seq.io;

import java.util.Comparator;
import java.util.List;
import java.util.ArrayList;

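/**
 * <p><code>EmblReferenceComparator</code> orders the tags within a single
 * EMBL reference block: RN, RP, RX, RA, RT, RL and finally the XX
 * separator. Tags not in this list sort before all of them.</p>
 */
public class EmblReferenceComparator implements Comparator {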

    static final Comparator INSTANCE = new EmblReferenceComparator();

    private List tagOrder;

    {
        tagOrder = new ArrayList();
        tagOrder.add(EmblLikeFormat.REFERENCE_TAG);
        tagOrder.add(EmblLikeFormat.COORDINATE_TAG);
        tagOrder.add(EmblLikeFormat.REF_ACCESSION_TAG);
        tagOrder.add(EmblLikeFormat.AUTHORS_TAG);
        tagOrder.add(EmblLikeFormat.TITLE_TAG);
        tagOrder.add(EmblLikeFormat.JOURNAL_TAG);
        tagOrder.add(EmblLikeFormat.SEPARATOR_TAG);
    }

    public int compare(Object o1, Object o2)
    {
        int index1 = tagOrder.indexOf(o1);
        int index2 = tagOrder.indexOf(o2);

        return (index1 - index2);
    }

}
-------------- next part --------------
/*
 *                    BioJava development code
 *
 * This code may be freely distributed and modified under the
 * terms of the GNU Lesser General Public Licence.  This should
 * be distributed with the code.  If you do not have a copy,
 * see:
 *
 *      http://www.gnu.org/copyleft/lesser.html
 *
 * Copyright for this code is held jointly by the individual
 * authors.  These should be listed in @author doc comments.
 *
 * For more information on the BioJava project and its aims,
 * or to join the biojava-l mailing list, visit the home page
 * at:
 *
 *      http://www.biojava.org/
 *
 */

package org.biojava.bio.seq.io;

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * <p><code>GenEmblPropertyComparator</code> compares Genbank/EMBL
 * file format tags by the order in which they should appear in their
 * respective formats.</p>
 *
 * <p>EMBL tags sort before Genbank tags. This is arbitrary. Given the
 * subtle differences in the values accompanying equivalent tags in
 * these formats the two sets shouldn't be mixed anyway.</p>
 *
 * <p>Any tags which belong to neither set sort before anything
 * else.</p>
 *
 * @author Keith James
 */
final class GenEmblPropertyComparator implements Comparator
{
    static final Comparator INSTANCE = new GenEmblPropertyComparator();

    private List tagOrder;

    private GenEmblPropertyComparator()
    {
        tagOrder = new ArrayList();
        tagOrder.add(EmblLikeFormat.ID_TAG);
        tagOrder.add(EmblLikeFormat.ACCESSION_TAG);
        tagOrder.add(EmblLikeFormat.VERSION_TAG);
        tagOrder.add(EmblLikeFormat.DATE_TAG);
        tagOrder.add(EmblLikeFormat.DEFINITION_TAG);
        tagOrder.add(EmblLikeFormat.KEYWORDS_TAG);
        tagOrder.add(EmblLikeFormat.SOURCE_TAG);
        tagOrder.add(EmblLikeFormat.ORGANISM_TAG);
        /*tagOrder.add(EmblLikeFormat.REFERENCE_TAG);
        tagOrder.add(EmblLikeFormat.COORDINATE_TAG);
        tagOrder.add(EmblLikeFormat.REF_ACCESSION_TAG);
        tagOrder.add(EmblLikeFormat.AUTHORS_TAG);
        tagOrder.add(EmblLikeFormat.TITLE_TAG);
        tagOrder.add(EmblLikeFormat.JOURNAL_TAG);*/
        tagOrder.add(ReferenceAnnotation.class);
        tagOrder.add(EmblLikeFormat.DR_TAG);//lorna:added 21.08.03
        tagOrder.add(EmblLikeFormat.COORDINATE_TAG);
        tagOrder.add(EmblLikeFormat.REF_ACCESSION_TAG);
        tagOrder.add(EmblLikeFormat.AUTHORS_TAG);
        tagOrder.add(EmblLikeFormat.TITLE_TAG);
        tagOrder.add(EmblLikeFormat.JOURNAL_TAG);
        tagOrder.add(EmblLikeFormat.COMMENT_TAG);
        tagOrder.add(EmblLikeFormat.FEATURE_TAG);

        tagOrder.add(GenbankFormat.LOCUS_TAG);
        tagOrder.add(GenbankFormat.SIZE_TAG);
        tagOrder.add(GenbankFormat.STRAND_NUMBER_TAG);
        tagOrder.add(GenbankFormat.TYPE_TAG);
        tagOrder.add(GenbankFormat.CIRCULAR_TAG);
        tagOrder.add(GenbankFormat.DIVISION_TAG);
        tagOrder.add(GenbankFormat.DATE_TAG);
        tagOrder.add(GenbankFormat.DEFINITION_TAG);
        tagOrder.add(GenbankFormat.ACCESSION_TAG);
        tagOrder.add(GenbankFormat.VERSION_TAG);
        tagOrder.add(GenbankFormat.GI_TAG);
        tagOrder.add(GenbankFormat.KEYWORDS_TAG);
        tagOrder.add(GenbankFormat.SOURCE_TAG);
        tagOrder.add(GenbankFormat.ORGANISM_TAG);
        tagOrder.add(GenbankFormat.REFERENCE_TAG);
        tagOrder.add(GenbankFormat.AUTHORS_TAG);
        tagOrder.add(GenbankFormat.TITLE_TAG);
        tagOrder.add(GenbankFormat.JOURNAL_TAG);
        tagOrder.add(GenbankFormat.COMMENT_TAG);
        tagOrder.add(GenbankFormat.FEATURE_TAG);
    }

    public int compare(Object o1, Object o2)
    {
        int index1 = tagOrder.indexOf(o1);
        int index2 = tagOrder.indexOf(o2);

        return (index1 - index2);
    }
}
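Both comparators in this patch rank a tag by its position in a fixed list. Because List.indexOf returns -1 for an unknown tag, such tags sort before every known one. Here is a minimal, self-contained sketch of that technique, assuming Java 8+. TagOrderComparator is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical fixed-order comparator: a key's rank is its index in the
 * reference list; keys missing from the list get -1 and sort first.
 */
final class TagOrderComparator implements Comparator<Object> {
    private final List<Object> tagOrder;

    TagOrderComparator(List<?> order) {
        this.tagOrder = new ArrayList<>(order);
    }

    @Override
    public int compare(Object o1, Object o2) {
        // Integer.compare is the general idiom; plain subtraction is also safe
        // here because list indices are small
        return Integer.compare(tagOrder.indexOf(o1), tagOrder.indexOf(o2));
    }

    public static void main(String[] args) {
        Comparator<Object> byTag = new TagOrderComparator(
            Arrays.asList("RN", "RP", "RX", "RA", "RT", "RL", "XX"));
        List<Object> keys = new ArrayList<>(Arrays.asList("XX", "RL", "RA", "ZZ", "RN"));
        keys.sort(byTag);
        System.out.println(keys); // prints [ZZ, RN, RA, RL, XX]
    }
}
```

Each comparison does two linear indexOf scans. With a few dozen tags that cost is negligible. A Map from tag to rank would do the same job for larger lists.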
-------------- next part --------------
/*
 * Created by IntelliJ IDEA.
 * User: lmorris
 * Date: Nov 14, 2003
 * Time: 11:45:41 AM
 */
package org.biojava.bio.seq.io;

import org.biojava.bio.AbstractAnnotation;
import org.biojava.utils.ChangeVetoException;

import java.util.Map;
import java.util.HashMap;

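/**
 * <p>Holds the properties of a single EMBL reference (its RN, RP, RX,
 * RA, RT and RL lines) so that each reference can be written out as
 * one block, followed by an XX separator line.</p>
 */
public class ReferenceAnnotation extends AbstractAnnotation {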

    /**
     * The properties map. This may be null if no property values have
     * yet been set.
     */
    private Map properties;

    /**
     * Creates a reference annotation. Every EMBL reference block ends
     * with an empty XX line, so the separator is stored up front.
     */
    public ReferenceAnnotation() {
        super();
        try {
            // All references end with an empty XX line
            this.setProperty(EmblLikeFormat.SEPARATOR_TAG, "");
        } catch (ChangeVetoException e) {
            e.printStackTrace();
        }
    }

    protected Map getProperties() {
        if(!propertiesAllocated()) {
            properties = new HashMap();
        }
        return properties;
    }

    protected boolean propertiesAllocated() {
        return properties != null;
    }


}
-------------- next part --------------
/*
 *                    BioJava development code
 *
 * This code may be freely distributed and modified under the
 * terms of the GNU Lesser General Public Licence.  This should
 * be distributed with the code.  If you do not have a copy,
 * see:
 *
 *      http://www.gnu.org/copyleft/lesser.html
 *
 * Copyright for this code is held jointly by the individual
 * authors.  These should be listed in @author doc comments.
 *
 * For more information on the BioJava project and its aims,
 * or to join the biojava-l mailing list, visit the home page
 * at:
 *
 *      http://www.biojava.org/
 *
 */

package org.biojava.bio.seq.io;

import java.util.*;

import org.biojava.bio.Annotation;
import org.biojava.bio.BioError;
import org.biojava.bio.seq.Feature;
import org.biojava.bio.seq.FeatureHolder;
import org.biojava.bio.seq.Sequence;
import org.biojava.bio.symbol.IllegalAlphabetException;
import org.biojava.bio.symbol.Symbol;

/**
 * <code>SeqIOEventEmitter</code> is a utility class which scans a
 * <code>Sequence</code> object and sends events describing its
 * constituent data to a <code>SeqIOListener</code>. The listener
 * should be able to reconstruct the <code>Sequence</code> from these
 * events.
 *
 * @author Keith James
 * @since 1.2
*/
class SeqIOEventEmitter
{
    private static Symbol [] symProto = new Symbol [0];

    private Comparator seqPropComparator;
    private Comparator refPropComparator;
    private Comparator featureComparator;

    SeqIOEventEmitter(Comparator seqPropComparator,
                      Comparator featureComparator)
    {
        this.seqPropComparator = seqPropComparator;
        this.featureComparator = featureComparator;
    }


    /**
     * <code>getSeqIOEvents</code> scans a <code>Sequence</code>
     * object and sends events describing its data to the
     * <code>SeqIOListener</code>.
     *
     * @param seq a <code>Sequence</code>.
     * @param listener a <code>SeqIOListener</code>.
     */
    void getSeqIOEvents(Sequence seq, SeqIOListener listener)
    {
        try
        {
            // Inform listener of sequence start
            listener.startSequence();

            // Pass name to listener
            listener.setName(seq.getName());

            // Pass URN to listener
            listener.setURI(seq.getURN());

            // Pass sequence properties to listener
            Annotation a = seq.getAnnotation();
            List sKeys = new ArrayList(a.keys());
            Collections.sort(sKeys, seqPropComparator);

            for (Iterator ki = sKeys.iterator(); ki.hasNext();)
            {
                Object key = ki.next();

                if ( key.equals(ReferenceAnnotation.class)) {

                    ArrayList references = null;

                    if (a.getProperty(key) instanceof ArrayList) {
                       references = ((ArrayList)a.getProperty(key));
                    }

                    if (references != null) {

                        for ( int i = 0; i < references.size(); i++ ) {
                            ReferenceAnnotation refAnnot = (ReferenceAnnotation)references.get(i);

                            Map referenceLines = refAnnot.getProperties();
                            List refKeys = new ArrayList(referenceLines.keySet());
                            refPropComparator = EmblReferenceComparator.INSTANCE;
                            Collections.sort(refKeys, refPropComparator);

                            for (Iterator kit = refKeys.iterator(); kit.hasNext();)
                            {
                                Object refKey = kit.next();
                                //adds all the R* tags and final XX tag
                                listener.addSequenceProperty(refKey, refAnnot.getProperty(refKey));
                            }
                        }
                    }
                }
                else {

                    if (!(key.equals(EmblLikeFormat.SEPARATOR_TAG)))  {  //lorna: ignore XX

                       listener.addSequenceProperty(key, a.getProperty(key));
                    }

                }
            }

            // Recurse through sub feature tree, flattening it for
            // EMBL
            List subs = getSubFeatures(seq);
            Collections.sort(subs, featureComparator);

            // Put the source features first for EMBL
            for (Iterator fi = subs.iterator(); fi.hasNext();)
            {
                // The template is required to call startFeature
                Feature.Template t = ((Feature) fi.next()).makeTemplate();

                // Inform listener of feature start
                listener.startFeature(t);

                // Pass feature properties (i.e. qualifiers to
                // listener)
                // FIXME: this will drop all non-comparable keys
                List fKeys = comparableList(t.annotation.keys());
                Collections.sort(fKeys);

                for (Iterator ki = fKeys.iterator(); ki.hasNext();)
                {
                    Object key = ki.next();
                    listener.addFeatureProperty(key, t.annotation.getProperty(key));
                }

                // Inform listener of feature end
                listener.endFeature();
            }

            // Add symbols
            listener.addSymbols(seq.getAlphabet(),
                                (Symbol []) seq.toList().toArray(symProto),
                                0,
                                seq.length());

            // Inform listener of sequence end
            listener.endSequence();
        }
        catch (IllegalAlphabetException iae)
        {
            // This should never happen as the alphabet is being used
            // by this Sequence instance
            throw new BioError("An internal error occurred processing symbols",iae);
        }
        catch (ParseException pe)
        {
            throw new BioError("An internal error occurred creating SeqIO events",pe);
        }
    }


    /**
     * <code>getSubFeatures</code> is a recursive method which returns
     * a list of all <code>Feature</code>s within a
     * <code>FeatureHolder</code>.
     *
     * @param fh a <code>FeatureHolder</code>.
     *
     * @return a <code>List</code>.
     */
    private static List getSubFeatures(FeatureHolder fh)
    {
        List subfeat = new ArrayList();

        for (Iterator fi = fh.features(); fi.hasNext();)
        {
            FeatureHolder sfh = (FeatureHolder) fi.next();

            subfeat.addAll((Collection) getSubFeatures(sfh));
            subfeat.add(sfh);
        }
        return subfeat;
    }

    private List comparableList(Collection coll) {
      ArrayList res = new ArrayList();
      for(Iterator i = coll.iterator(); i.hasNext(); ) {
        Object o = i.next();
        if(o instanceof Comparable) {
          res.add(o);
        }
      }
      return res;
    }
}
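getSubFeatures above flattens the feature tree recursively. For each child, the child's own descendants are added first and then the child itself (post-order). The emitter then sorts the flat list with featureComparator. Below is a minimal sketch of that recursion, assuming Java 8+. Node is a hypothetical stand-in for FeatureHolder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the post-order flattening done by getSubFeatures. */
final class FlattenDemo {
    /** Hypothetical tree node standing in for FeatureHolder/Feature. */
    static final class Node {
        final String name;
        final List<Node> children;

        Node(String name, Node... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    /** Returns all descendants of root (not root itself), children before parents. */
    static List<String> flatten(Node root) {
        List<String> out = new ArrayList<>();
        for (Node child : root.children) {
            out.addAll(flatten(child)); // the child's descendants first
            out.add(child.name);        // then the child itself
        }
        return out;
    }

    public static void main(String[] args) {
        Node seq = new Node("seq",
            new Node("gene", new Node("mRNA", new Node("exon1"), new Node("exon2"))),
            new Node("source"));
        System.out.println(flatten(seq)); // prints [exon1, exon2, mRNA, gene, source]
    }
}
```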
From lmorris at ebi.ac.uk  Mon Nov 24 09:37:52 2003
From: lmorris at ebi.ac.uk (Lorna Morris)
Date: Mon Nov 24 12:01:15 2003
Subject: [Biojava-l] EmblFileFormer
Message-ID: <3FC217C0.9050900@ebi.ac.uk>

Hello

I'm using biojava to parse an EMBL flat file, modify it, and dump it out 
to file at the end. However, when I used SeqIOTools.writeEmbl, the file 
created did not have correctly ordered and nested RN, RP, RX, RA, RT 
and RL lines. These lines should occur in repeated sets, one set for 
each reference in the flat file. I've modified some of the biojava 
classes and added 2 new classes to correct this. Everything works fine 
now. I'm attaching the classes to this mail.

Files modified:

EmblLikeFormat
EmblFileFormer
SeqIOEventEmitter
GenEmblPropertyComparator

Files added:

ReferenceAnnotation.java
EmblReferenceComparator.java

If you need any more details on the changes I've made let me know. Thanks,

Lorna
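The idea behind the fix is to keep each reference's R* lines together and then write each reference as one block in canonical order (RN, RP, RX, RA, RT, RL), ending with XX. In the patch, ReferenceAnnotation and EmblReferenceComparator do this job. The sketch below models the same idea with plain maps, assuming Java 8+. ReferenceBlocks and writeReferences are hypothetical names, and the field values are placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: one map per reference, written out as a block in
 * canonical tag order and terminated by an XX separator line.
 */
final class ReferenceBlocks {
    static final List<String> REF_ORDER =
        Arrays.asList("RN", "RP", "RX", "RA", "RT", "RL");

    static List<String> writeReferences(List<Map<String, String>> references) {
        List<String> lines = new ArrayList<>();
        for (Map<String, String> ref : references) {
            for (String tag : REF_ORDER) {
                String value = ref.get(tag);
                if (value != null) {
                    lines.add(tag + "   " + value); // value starts at column 6
                }
            }
            lines.add("XX"); // every reference block ends with a separator
        }
        return lines;
    }

    public static void main(String[] args) {
        // Placeholder values, inserted out of order on purpose
        Map<String, String> first = new LinkedHashMap<>();
        first.put("RL", "Journal A 1:1-10(2003).");
        first.put("RA", "Author A.;");
        first.put("RN", "[1]");
        Map<String, String> second = new LinkedHashMap<>();
        second.put("RA", "Author B.;");
        second.put("RN", "[2]");
        for (String line : writeReferences(Arrays.asList(first, second))) {
            System.out.println(line);
        }
    }
}
```

Run as-is, this prints RN, RA, RL and XX for the first reference, then RN, RA and XX for the second. Insertion order into the maps does not affect the output.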
-------------- next part --------------
/*
 *                    BioJava development code
 *
 * This code may be freely distributed and modified under the
 * terms of the GNU Lesser General Public Licence.  This should
 * be distributed with the code.  If you do not have a copy,
 * see:
 *
 *      http://www.gnu.org/copyleft/lesser.html
 *
 * Copyright for this code is held jointly by the individual
 * authors.  These should be listed in @author doc comments.
 *
 * For more information on the BioJava project and its aims,
 * or to join the biojava-l mailing list, visit the home page
 * at:
 *
 *      http://www.biojava.org/
 *
 */

package org.biojava.bio.seq.io;


import java.io.PrintStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import org.biojava.bio.seq.Feature;
import org.biojava.bio.seq.StrandedFeature;
import org.biojava.bio.symbol.Alphabet;
import org.biojava.bio.symbol.IllegalAlphabetException;
import org.biojava.bio.symbol.IllegalSymbolException;
import org.biojava.bio.symbol.Symbol;
import org.biojava.bio.taxa.EbiFormat;
import org.biojava.bio.taxa.Taxon;
import org.biojava.bio.BioException;

/**
 * <p><code>EmblFileFormer</code> performs the detailed formatting of
 * EMBL entries for writing to a <code>PrintStream</code>. Currently
 * the formatting of the header is not correct. This really needs to
 * be addressed in the parser which is merging fields which should
 * remain separate.</p>
 *
 * <p>The event generator used to feed events to this class should
 * enforce ordering of those events. This class will stream data
 * directly to the <code>PrintStream</code>.</p>
 *
 * <p>This implementation requires that all the symbols be added in
 * one block as it does not buffer the tokenized symbols between
 * calls.</p>
 *
 * @author Keith James
 * @author Len Trigg (Taxon output)
 * @since 1.2
 */
public class EmblFileFormer extends AbstractGenEmblFileFormer
    implements SeqFileFormer
{
    // Tags which are special cases, not having "XX" after them
    private static List NON_SEPARATED_TAGS = new ArrayList();

    static
    {
        NON_SEPARATED_TAGS.add(EmblLikeFormat.SOURCE_TAG);
        NON_SEPARATED_TAGS.add(EmblLikeFormat.REFERENCE_TAG);
        NON_SEPARATED_TAGS.add(EmblLikeFormat.COORDINATE_TAG);
        NON_SEPARATED_TAGS.add(EmblLikeFormat.REF_ACCESSION_TAG);
        NON_SEPARATED_TAGS.add(EmblLikeFormat.AUTHORS_TAG);
        NON_SEPARATED_TAGS.add(EmblLikeFormat.TITLE_TAG);
        NON_SEPARATED_TAGS.add(EmblLikeFormat.FEATURE_TAG);
        NON_SEPARATED_TAGS.add(EmblLikeFormat.JOURNAL_TAG);//Lorna: added
        NON_SEPARATED_TAGS.add(EmblLikeFormat.SEPARATOR_TAG);//Lorna: added
    }

    // 19 spaces
    private static String FT_LEADER =
        EmblLikeFormat.FEATURE_TABLE_TAG + "                   ";

    // 3 spaces
    private static String SQ_LEADER = "   ";

    // 80 spaces
    private static String EMPTY_LINE =
        "                                        " +
        "                                        ";

    private PrintStream stream;

    private String idLine;
    private String accLine;

    /**
     * Creates a new <code>EmblFileFormer</code> using
     * <code>System.out</code> stream.
     */
    protected EmblFileFormer()
    {
        this(System.out);
    }

    /**
     * Creates a new <code>EmblFileFormer</code> using the specified
     * stream.
     *
     * @param stream a <code>PrintStream</code>.
     */
    protected EmblFileFormer(PrintStream stream)
    {
        super();
        this.stream = stream;
    }

    public PrintStream getPrintStream()
    {
        return stream;
    }

    public void setPrintStream(PrintStream stream)
    {
        this.stream = stream;
    }

    public void setName(String id) throws ParseException
    {
        idLine = id;
    }

    public void startSequence() throws ParseException
    {
       aCount = 0;
       cCount = 0;
       gCount = 0;
       tCount = 0;
       oCount = 0;
    }

    public void endSequence() throws ParseException
    {
        stream.println(EmblLikeFormat.END_SEQUENCE_TAG);
    }

    public void setURI(String uri) throws ParseException { }

    public void addSymbols(Alphabet  alpha,
                           Symbol [] syms,
                           int       start,
                           int       length)
        throws IllegalAlphabetException
    {
        try
        {
            int end = start + length - 1;

            for (int i = start; i <= end; i++)
            {
                Symbol sym = syms[i];

                if (sym == a)
                    aCount++;
                else if (sym == c)
                    cCount++;
                else if (sym == g)
                    gCount++;
                else if (sym == t)
                    tCount++;
                else
                    oCount++;
            }

            StringBuffer sb = new StringBuffer(EmblLikeFormat.SEPARATOR_TAG);
            sb.append(nl);
            sb.append("SQ   Sequence ");
            sb.append(length + " BP; ");
            sb.append(aCount + " A; ");
            sb.append(cCount + " C; ");
            sb.append(gCount + " G; ");
            sb.append(tCount + " T; ");
            sb.append(oCount + " other;");

            // Print sequence summary header
            stream.println(sb);

            int fullLine = length / 60;
            int partLine = length % 60;

            int lineCount = fullLine;
            if (partLine > 0)
                lineCount++;

            int lineLens [] = new int [lineCount];

            // All lines are 60, except last (if present)
            Arrays.fill(lineLens, 60);

            if (partLine > 0)
                lineLens[lineCount - 1] = partLine;

            for (int i = 0; i < lineLens.length; i++)
            {
                // Prep the whitespace
                StringBuffer sq = new StringBuffer(EMPTY_LINE);

                // How long is this chunk?
                int len = lineLens[i];
                // Prepare a Symbol array same length as chunk
                Symbol [] sa = new Symbol [len];

                // Get symbols and format into blocks of tokens
                System.arraycopy(syms, start + (i * 60), sa, 0, len);

                sb = new StringBuffer();

                String blocks = (formatTokenBlock(sb, sa, 10,
                         alpha.getTokenization("token"))).toString();

                sq.replace(5, blocks.length() + 5, blocks);

                // Calculate the running residue count and add to the line
                String count = Integer.toString((i * 60) + len);
                sq.replace((80 - count.length()), 80, count);

                // Print formatted sequence line
                stream.println(sq);
            }
        }
        catch (BioException ex)
        {
            throw new IllegalAlphabetException(ex, "Alphabet not tokenizing");
        }
    }

    public void addSequenceProperty(Object key, Object value)
        throws ParseException
    {
        StringBuffer sb = new StringBuffer();

        // Ignore separators if they are sent to us. The parser should
        // be ignoring these really (lorna: I've changed this so they are ignored in SeqIOEventEmitter)
        //if (key.equals(EmblLikeFormat.SEPARATOR_TAG))
            //return;

        String tag = key.toString();
        String leader = tag + SQ_LEADER;
        String line = "";
        int wrapWidth = 85 - leader.length();

        // Special case: accession number
        if (key.equals(EmblProcessor.PROPERTY_EMBL_ACCESSIONS))
        {
            accLine = buildPropertyLine((Collection) value, ";", true);
            return;
        }
        else if (key.equals(EmblLikeFormat.ACCESSION_TAG))
        {
            line = accLine;
        } else if (key.equals(OrganismParser.PROPERTY_ORGANISM)) {
            Taxon taxon = (Taxon) value;
            addSequenceProperty(EmblLikeFormat.SOURCE_TAG, taxon);
            addSequenceProperty(EmblLikeFormat.ORGANISM_TAG, taxon.getParent());
            addSequenceProperty(EmblLikeFormat.ORGANISM_XREF_TAG, taxon);
            return;
        }
        if (value instanceof String)
        {
            line = (String) value;
        }
        else if (value instanceof Collection)
        {
            // Special case: date lines
            if (key.equals(EmblLikeFormat.DATE_TAG))
            {
                line = buildPropertyLine((Collection) value, nl + leader, false);
                wrapWidth = Integer.MAX_VALUE;
            }
            //lorna :added 21.08.03, DR lines are another special case. Each one goes onto a separate line.
            else if (key.equals(EmblLikeFormat.DR_TAG))
            {
                line = buildPropertyLine((Collection) value, nl + leader, false);
                wrapWidth = Integer.MAX_VALUE;
            }
            else if (key.equals(EmblLikeFormat.AUTHORS_TAG))
            {
                line = buildPropertyLine((Collection) value, nl + leader, false); //lorna: add space here?
                wrapWidth = Integer.MAX_VALUE;
            }
            else if (key.equals(EmblLikeFormat.REF_ACCESSION_TAG))
            {
                line = buildPropertyLine((Collection) value, nl + leader, false);
                wrapWidth = Integer.MAX_VALUE;
            }
            else
            {
                line = buildPropertyLine((Collection) value, " ", false);
            }
        } else if (value instanceof Taxon) {
            if (key.equals(EmblLikeFormat.ORGANISM_TAG)) {
                line = EbiFormat.getInstance().serialize((Taxon) value);
            } else if (key.equals(EmblLikeFormat.SOURCE_TAG)) {
                line = EbiFormat.getInstance().serializeSource((Taxon) value);
            } else if (key.equals(EmblLikeFormat.ORGANISM_XREF_TAG)) {
                line = EbiFormat.getInstance().serializeXRef((Taxon) value);
            }
        }

        if (line.length() == 0)
        {
            stream.println(tag);
        }
        else
        {
            sb = formatSequenceProperty(sb, line, leader, wrapWidth);
            stream.println(sb);
        }
        // Special case: those which don't get separated
        if (! NON_SEPARATED_TAGS.contains(key))
            stream.println(EmblLikeFormat.SEPARATOR_TAG);
        // Special case: feature header
        if (key.equals(EmblLikeFormat.FEATURE_TAG))
            stream.println(EmblLikeFormat.FEATURE_TAG);
    }


    public void startFeature(Feature.Template templ)
        throws ParseException
    {
        int strand = 0;

        if (templ instanceof StrandedFeature.Template)
            strand = ((StrandedFeature.Template) templ).strand.getValue();

        StringBuffer sb = new StringBuffer(FT_LEADER);
        sb = formatLocationBlock(sb, templ.location, strand, FT_LEADER, 80);
        sb.replace(5, 5 + templ.type.length(), templ.type);
        stream.println(sb);
    }

    public void endFeature() throws ParseException { }

    public void addFeatureProperty(Object key, Object value)
    {
        // Don't print internal data structures
        if (key.equals(Feature.PROPERTY_DATA_KEY))
            return;

        StringBuffer fb;
        StringBuffer sb;

        // The value may be a collection if several qualifiers of the
        // same type are present in a feature
        if (value instanceof Collection)
        {
            for (Iterator vi = ((Collection) value).iterator(); vi.hasNext();)
            {
                fb = new StringBuffer();
                sb = new StringBuffer();

                fb = formatQualifierBlock(fb,
                                          formatQualifier(sb, key, vi.next()).substring(0),
                                          FT_LEADER,
                                          80);
                stream.println(fb);
            }
        }
        else
        {
            fb = new StringBuffer();
            sb = new StringBuffer();

            fb = formatQualifierBlock(fb,
                                      formatQualifier(sb, key, value).substring(0),
                                      FT_LEADER,
                                      80);
            stream.println(fb);
        }
    }

    private String buildPropertyLine(Collection property,
                                     String separator,
                                     boolean terminate)
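    {
        // Guard: for an empty collection the substring below would be
        // given a negative end index
        if (property.isEmpty())
            return "";

        StringBuffer sb = new StringBuffer();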

        for (Iterator pi = property.iterator(); pi.hasNext();)
        {
            sb.append(pi.next().toString());
            sb.append(separator);
        }

        if (terminate)
        {
            return sb.substring(0);
        }
        else
        {
            return sb.substring(0, sb.length() - separator.length());
        }
    }
}
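addSymbols above lays out each sequence line on an 80-character template. Lines hold 60 residues in blocks of 10, starting at column 6, with the running residue count right-aligned to end at column 80. The sketch below reproduces that layout on a plain String, assuming Java 8+. It also assumes blocks are separated by a single space, which is how formatTokenBlock is expected to join them (not shown here). EmblSequenceLines is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the EMBL sequence-line layout from addSymbols:
 * 60 residues per line, blocks of 10 separated by one space (assumed),
 * text starting at index 5, running count ending at column 80.
 */
final class EmblSequenceLines {
    static final int LINE_LEN = 60;
    static final int BLOCK_LEN = 10;
    static final int WIDTH = 80;

    static List<String> format(String residues) {
        List<String> lines = new ArrayList<>();
        for (int start = 0; start < residues.length(); start += LINE_LEN) {
            int end = Math.min(start + LINE_LEN, residues.length());

            // Blank 80-character template, like EMPTY_LINE
            StringBuilder line = new StringBuilder();
            for (int i = 0; i < WIDTH; i++) {
                line.append(' ');
            }

            // Join this chunk into space-separated blocks of 10 residues
            StringBuilder blocks = new StringBuilder();
            for (int b = start; b < end; b += BLOCK_LEN) {
                if (b > start) {
                    blocks.append(' ');
                }
                blocks.append(residues, b, Math.min(b + BLOCK_LEN, end));
            }
            line.replace(5, 5 + blocks.length(), blocks.toString());

            // Running count = residues written so far, right-aligned at column 80
            String count = Integer.toString(end);
            line.replace(WIDTH - count.length(), WIDTH, count);
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : format("acgtacgtacgtacgtacgtacgta")) {
            System.out.println(line);
        }
    }
}
```

Using a fixed-width template and overwriting it keeps the count column aligned regardless of how long the last line is. This mirrors the replace calls in the original.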
-------------- next part --------------
/*
 *                    BioJava development code
 *
 * This code may be freely distributed and modified under the
 * terms of the GNU Lesser General Public Licence.  This should
 * be distributed with the code.  If you do not have a copy,
 * see:
 *
 *      http://www.gnu.org/copyleft/lesser.html
 *
 * Copyright for this code is held jointly by the individual
 * authors.  These should be listed in @author doc comments.
 *
 * For more information on the BioJava project and its aims,
 * or to join the biojava-l mailing list, visit the home page
 * at:
 *
 *      http://www.biojava.org/
 *
 */

package org.biojava.bio.seq.io;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintStream;
import java.io.Serializable;
import java.util.Vector;
import java.util.ArrayList;

import org.biojava.bio.seq.Sequence;
import org.biojava.bio.symbol.IllegalSymbolException;
import org.biojava.utils.ParseErrorEvent;
import org.biojava.utils.ParseErrorListener;
import org.biojava.utils.ParseErrorSource;
import org.biojava.utils.ChangeVetoException;

/**
 * <p>
 * Format processor for handling EMBL records and similar files.  This
 * takes a very simple approach: all `normal' attribute lines are
 * passed to the listener as a tag (first two characters) and a value
 * (the rest of the line from the 6th character onwards).  Any data
 * between the special `SQ' line and the "//" entry terminator is
 * passed as a SymbolReader.
 * </p>
 *
 * <p>
 * This low-level format processor should normally be used in
 * conjunction with one or more `filter' objects, such as
 * EmblProcessor.
 * </p>
 *
 * <p>
 * Many ideas borrowed from the old EmblFormat processor by Thomas
 * Down and Thad Welch.
 * </p>
 *
 * @author Thomas Down
 * @author Greg Cox
 * @author Keith James
 * @author Len Trigg
 * @since 1.1
 */

public class EmblLikeFormat
    implements
            SequenceFormat,
            Serializable,
            ParseErrorSource,
            ParseErrorListener
{
    public static final String DEFAULT = "EMBL";

    protected static final String ID_TAG = "ID";
    protected static final String SIZE_TAG = "SIZE";
    protected static final String STRAND_NUMBER_TAG = "STRANDS";
    protected static final String TYPE_TAG = "TYPE";
    protected static final String CIRCULAR_TAG = "CIRCULAR";
    protected static final String DIVISION_TAG = "DIVISION";
    protected static final String DR_TAG = "DR"; //Lorna: new tag

    protected static final String ACCESSION_TAG = "AC";
    protected static final String VERSION_TAG = "SV";
    protected static final String DATE_TAG = "DT";
    protected static final String DEFINITION_TAG = "DE";
    protected static final String KEYWORDS_TAG = "KW";
    protected static final String SOURCE_TAG = "OS";
    protected static final String ORGANISM_TAG = "OC";
    protected static final String ORGANISM_XREF_TAG = "OX";
    protected static final String REFERENCE_TAG = "RN";
    protected static final String COORDINATE_TAG = "RP";
    protected static final String REF_ACCESSION_TAG = "RX";
    protected static final String AUTHORS_TAG = "RA";
    protected static final String TITLE_TAG = "RT";
    protected static final String JOURNAL_TAG = "RL";
    protected static final String COMMENT_TAG = "CC";
    protected static final String FEATURE_TAG = "FH";
    protected static final String SEPARATOR_TAG = "XX";
    protected static final String FEATURE_TABLE_TAG = "FT";
    protected static final String START_SEQUENCE_TAG = "SQ";  
    protected static final String END_SEQUENCE_TAG = "//";

    private boolean elideSymbols = false;
    private Vector mListeners = new Vector();

    /**
     * <p>Specifies whether the symbols (SQ) part of the entry should
     * be ignored. If this property is set to <code>true</code>, the
     * parser will never call addSymbols on the
     * <code>SeqIOListener</code>, but parsing will be faster if
     * you're only interested in header information.</p>
     *
     * <p> This property also allows the header to be parsed for files
     * which have invalid sequence data.</p>
     */
    public void setElideSymbols(boolean b)
    {
        elideSymbols = b;
    }

    /**
     * Return a flag indicating if symbol data will be skipped
     * when parsing streams.
     */
    public boolean getElideSymbols()
    {
        return elideSymbols;
    }

    public boolean readSequence(BufferedReader     reader,
                                SymbolTokenization symParser,
                                SeqIOListener      listener)
        throws IllegalSymbolException, IOException, ParseException
    {
        EmblReferenceProperty reference = null; //lorna

        if (listener instanceof ParseErrorSource) {
            ((ParseErrorSource)(listener)).addParseErrorListener(this);
        }

        String            line;
        StreamParser    sparser       = null;
        boolean hasMoreSequence       = true;
        boolean hasInternalWhitespace = false;

        listener.startSequence();

        while ((line = reader.readLine()) != null)
        {
            if (line.startsWith(END_SEQUENCE_TAG))
            {
                if (sparser != null)
                {
                    // End of symbol data
                    sparser.close();
                    sparser = null;
                }

                // Allows us to tolerate trailing whitespace without
                // thinking that there is another Sequence to follow
                while (true)
                {
                    reader.mark(1);
                    int c = reader.read();

                    if (c == -1)
                    {
                        hasMoreSequence = false;
                        break;
                    }

                    if (Character.isWhitespace((char) c))
                    {
                        hasInternalWhitespace = true;
                        continue;
                    }

                    if (hasInternalWhitespace)
                        System.err.println("Warning: whitespace found between sequence entries");

                    reader.reset();
                    break;
                }

                listener.endSequence();
                return hasMoreSequence;
            }
            else if (line.startsWith(START_SEQUENCE_TAG))
            {
                // Adding a null property to flush the last feature;
                // Needed for Swissprot files because there is no gap
                // between the feature table and the sequence data
                listener.addSequenceProperty(SEPARATOR_TAG, "");

                sparser = symParser.parseStream(listener);
            }
            else
            {
                if (sparser == null)
                {
                    // Normal attribute line
                    String tag  = line.substring(0, 2);
                    String rest = null;
                    if (line.length() > 5)
                    {
                        rest = line.substring(5);
                    }

                    // lorna: the reference tags (RN, RP, RX, RA, RT, RL) are
                    // read in order up to the terminating XX line; each
                    // complete set is sent as a single annotation event
                    if (tag.equals(REFERENCE_TAG)) { // only one RN per reference
                        ReferenceAnnotation refAnnot = new ReferenceAnnotation();

                        try {
                            refAnnot.setProperty(tag, rest);
                            while (!(tag.equals(SEPARATOR_TAG))) {
                                // Attribute line within the reference block
                                line = reader.readLine();
                                if (line == null) {
                                    throw new IOException("Premature end of stream inside EMBL reference block");
                                }

                                tag  = line.substring(0, 2);

                                if (line.length() > 5)
                                {
                                    rest = line.substring(5);
                                } else {
                                    rest = null; // for XX lines
                                }

                                if (refAnnot.containsProperty(tag)) {
                                    // Repeated tag (e.g. a multi-line RA or RT):
                                    // collect the values in a list
                                    Object property = refAnnot.getProperty(tag);

                                    if (property instanceof String) {
                                        ArrayList properties = new ArrayList();
                                        properties.add(property);
                                        properties.add(rest);
                                        refAnnot.setProperty(tag, properties);
                                    } else if (property instanceof ArrayList) {
                                        ((ArrayList) property).add(rest);
                                    }
                                } else {
                                    refAnnot.setProperty(tag, rest);
                                }
                            }
                            listener.addSequenceProperty(ReferenceAnnotation.class, refAnnot);

                        } catch (ChangeVetoException cve) {
                            cve.printStackTrace();
                        }

                    }
                    // lorna, end
                    else { // lorna
                        listener.addSequenceProperty(tag, rest);
                    } // lorna
                }
                else
                {
                    // Sequence line
                    if (! elideSymbols)
                        processSequenceLine(line, sparser);
                }
            }
        }

        if (sparser != null)
            sparser.close();

        throw new IOException("Premature end of stream or missing end tag '//' for EMBL");
    }


    /**
     * Dispatch symbol data from SQ-block line of an EMBL-like file.
     */
    protected void processSequenceLine(String line, StreamParser parser)
        throws IllegalSymbolException, ParseException
    {
        char[] cline = line.toCharArray();
        int parseStart = 0;
        int parseEnd   = 0;

        while (parseStart < cline.length)
        {
            while (parseStart < cline.length && cline[parseStart] == ' ')
                ++parseStart;
            if (parseStart >= cline.length)
                break;

            if (Character.isDigit(cline[parseStart]))
                return;

            parseEnd = parseStart + 1;
            while (parseEnd < cline.length && cline[parseEnd] != ' ') {
                if (cline[parseEnd] == '.' || cline[parseEnd] == '~') {
                   cline[parseEnd] = '-';
                }
                ++parseEnd;
            }

            // Got a segment of read sequence data
            parser.characters(cline, parseStart, parseEnd - parseStart);

            parseStart = parseEnd;
        }
    }

    public void writeSequence(Sequence seq, PrintStream os)
        throws IOException
    {
        writeSequence(seq, getDefaultFormat(), os);
    }

    /**
     * <code>writeSequence</code> writes a sequence to the specified
     * <code>PrintStream</code>, using the specified format.
     *
     * @param seq a <code>Sequence</code> to write out.
     * @param format a <code>String</code> indicating which sub-format
     * of those available from a particular
     * <code>SequenceFormat</code> implemention to use when
     * writing.
     * @param os a <code>PrintStream</code> object.
     *
     * @exception IOException if an error occurs.
     * @deprecated use writeSequence(Sequence seq, PrintStream os)
     */
    public void writeSequence(Sequence seq, String format, PrintStream os)
	throws IOException
    {
        SeqFileFormer former;

        if (format.equalsIgnoreCase("EMBL"))
            former = new EmblFileFormer();
        else if (format.equalsIgnoreCase("SWISSPROT"))
            former = new SwissprotFileFormer();
        else
            throw new IllegalArgumentException("Unknown format '"
                                               + format
                                               + "'");
        former.setPrintStream(os);

        SeqIOEventEmitter emitter =
            new SeqIOEventEmitter(GenEmblPropertyComparator.INSTANCE,
                                  GenEmblFeatureComparator.INSTANCE);

        emitter.getSeqIOEvents(seq, former);
    }

    /**
     * <code>getDefaultFormat</code> returns the String identifier for
     * the default format written by a <code>SequenceFormat</code>
     * implementation.
     *
     * @return a <code>String</code>.
     * @deprecated
     */
    public String getDefaultFormat()
    {
        return DEFAULT;
    }

    /**
     * <p>
     * This method determines the behaviour when a bad line is processed.
     * Some options are to log the error, throw an exception, ignore it
     * completely, or pass the event through.
     * </p>
     *
     * <p>
     * This method should be overridden when different behavior is desired.
     * </p>
     *
     * @param theEvent The event that contains the bad line and token.
     */
    public void BadLineParsed(ParseErrorEvent theEvent)
    {
        notifyParseErrorEvent(theEvent);
    }

    /**
     * Adds a parse error listener to the list of listeners if it isn't already
     * included.
     *
     * @param theListener Listener to be added.
     */
    public synchronized void addParseErrorListener(ParseErrorListener theListener)
    {
        if (mListeners.contains(theListener) == false)
        {
            mListeners.addElement(theListener);
        }
    }

    /**
     * Removes a parse error listener from the list of listeners if it is
     * included.
     *
     * @param theListener Listener to be removed.
     */
    public synchronized void removeParseErrorListener(ParseErrorListener theListener)
    {
        if (mListeners.contains(theListener) == true)
        {
            mListeners.removeElement(theListener);
        }
    }
 
    // Protected methods
    /**
     * Passes the event on to all the listeners registered for ParseErrorEvents.
     *
     * @param theEvent The event to be handed to the listeners.
     */
    protected void notifyParseErrorEvent(ParseErrorEvent theEvent)
    {
        Vector listeners;
        synchronized(this)
        {
            listeners = (Vector)mListeners.clone();
        }

        for (int index = 0; index < listeners.size(); index++)
        {
            ParseErrorListener client = (ParseErrorListener)listeners.elementAt(index);
            client.BadLineParsed(theEvent);
        }
    }
}
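The reference handling in `readSequence` treats an `RN` line and everything up to the next `XX` line as one block, and a tag seen more than once (a multi-line `RA` or `RT`) becomes a list of values. Below is a rough, BioJava-free sketch of that grouping over plain strings; the class `EmblReferenceGrouper` and its `group` method are hypothetical, and it assumes the EMBL layout of a two-letter tag in columns 1-2 with the value starting at column 6. For simplicity every tag maps to a list here, whereas the attached code stores a single value as a plain String and only switches to a list when the tag repeats.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: groups EMBL reference lines (RN ... XX) into one
 * map per reference, similar to how readSequence fills a
 * ReferenceAnnotation. Each tag maps to its values in input order.
 */
final class EmblReferenceGrouper {

    static List<Map<String, List<String>>> group(List<String> lines) {
        List<Map<String, List<String>>> refs = new ArrayList<>();
        Iterator<String> it = lines.iterator();
        while (it.hasNext()) {
            String line = it.next();
            if (!line.startsWith("RN")) {
                continue; // not the start of a reference block
            }
            Map<String, List<String>> ref = new LinkedHashMap<>();
            add(ref, line);
            boolean closed = false;
            while (it.hasNext()) {
                String next = it.next();
                add(ref, next);
                if (next.startsWith("XX")) { // separator ends the block
                    closed = true;
                    break;
                }
            }
            if (!closed) {
                throw new IllegalArgumentException("reference block not terminated by an XX line");
            }
            refs.add(ref);
        }
        return refs;
    }

    // Tag is columns 1-2; the value starts at column 6 (index 5), or is
    // empty for short lines such as a bare "XX".
    private static void add(Map<String, List<String>> ref, String line) {
        String tag = line.length() >= 2 ? line.substring(0, 2) : line;
        String value = line.length() > 5 ? line.substring(5) : "";
        ref.computeIfAbsent(tag, k -> new ArrayList<>()).add(value);
    }

    public static void main(String[] args) {
        List<String> entry = Arrays.asList(
            "ID   EXAMPLE",
            "RN   [1]",
            "RA   Smith J.;",
            "RT   \"A title that",
            "RT   spans two lines\";",
            "XX",
            "RN   [2]",
            "RA   Jones K.;",
            "XX");
        for (Map<String, List<String>> ref : group(entry)) {
            System.out.println(ref);
        }
    }
}
```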
-------------- next part --------------
/*
 * Created by IntelliJ IDEA.
 * User: lmorris
 * Date: Nov 14, 2003
 * Time: 11:11:52 AM
 */
package org.biojava.bio.seq.io;

import java.util.Comparator;
import java.util.List;
import java.util.ArrayList;

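/**
 * <p><code>EmblReferenceComparator</code> orders the tags of a single
 * EMBL reference block (RN, RP, RX, RA, RT, RL, XX) in the order in
 * which they appear in an EMBL entry.</p>
 *
 * <p>Tags not in this list sort before all listed tags.</p>
 */
public class EmblReferenceComparator implements Comparator {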

    static final Comparator INSTANCE = new EmblReferenceComparator();

    private List tagOrder;

    {
        tagOrder = new ArrayList();
        tagOrder.add(EmblLikeFormat.REFERENCE_TAG);
        tagOrder.add(EmblLikeFormat.COORDINATE_TAG);
        tagOrder.add(EmblLikeFormat.REF_ACCESSION_TAG);
        tagOrder.add(EmblLikeFormat.AUTHORS_TAG);
        tagOrder.add(EmblLikeFormat.TITLE_TAG);
        tagOrder.add(EmblLikeFormat.JOURNAL_TAG);
        tagOrder.add(EmblLikeFormat.SEPARATOR_TAG);
    }

    public int compare(Object o1, Object o2)
    {
        int index1 = tagOrder.indexOf(o1);
        int index2 = tagOrder.indexOf(o2);

        return (index1 - index2);
    }

}
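`EmblReferenceComparator` and `GenEmblPropertyComparator` both sort keys by their position in a fixed list, so a tag missing from the list gets index -1 from `indexOf` and sorts first (for example an `RC` reference-comment line would move ahead of `RN`). The following is a minimal generic sketch of the same idea; `TagOrderComparator` and `sorted` are hypothetical names, not BioJava classes. It uses `Integer.compare` instead of subtraction, which gives the same result for small list indices but reads more clearly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical sketch of a "fixed tag order" comparator in the style of
 * EmblReferenceComparator: keys are ordered by their index in a list,
 * and keys not in the list (index -1) sort before all known keys.
 */
final class TagOrderComparator implements Comparator<Object> {

    private final List<Object> order;

    TagOrderComparator(Object... tags) {
        this.order = Arrays.asList(tags);
    }

    @Override
    public int compare(Object o1, Object o2) {
        return Integer.compare(order.indexOf(o1), order.indexOf(o2));
    }

    /** Returns a sorted copy of the keys; the input list is not modified. */
    static List<Object> sorted(List<?> keys, Comparator<Object> cmp) {
        List<Object> copy = new ArrayList<Object>(keys);
        Collections.sort(copy, cmp);
        return copy;
    }

    public static void main(String[] args) {
        // Same order as EmblReferenceComparator: RN, RP, RX, RA, RT, RL, XX
        TagOrderComparator refOrder =
            new TagOrderComparator("RN", "RP", "RX", "RA", "RT", "RL", "XX");
        List<String> keys = Arrays.asList("XX", "RT", "RC", "RA", "RN", "RL");
        System.out.println(sorted(keys, refOrder)); // prints [RC, RN, RA, RT, RL, XX]
    }
}
```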
-------------- next part --------------
/*
 *                    BioJava development code
 *
 * This code may be freely distributed and modified under the
 * terms of the GNU Lesser General Public Licence.  This should
 * be distributed with the code.  If you do not have a copy,
 * see:
 *
 *      http://www.gnu.org/copyleft/lesser.html
 *
 * Copyright for this code is held jointly by the individual
 * authors.  These should be listed in @author doc comments.
 *
 * For more information on the BioJava project and its aims,
 * or to join the biojava-l mailing list, visit the home page
 * at:
 *
 *      http://www.biojava.org/
 *
 */

package org.biojava.bio.seq.io;

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * <p><code>GenEmblPropertyComparator</code> compares Genbank/EMBL
 * file format tags by the order in which they should appear in their
 * respective formats.</p>
 *
 * <p>EMBL tags sort before Genbank tags. This is arbitrary. Given the
 * subtle differences in the values accompanying equivalent tags in
 * these formats the two sets shouldn't be mixed anyway.</p>
 *
 * <p>Any tags which belong to neither set sort before anything
 * else.</p>
 *
 * @author Keith James
 */
final class GenEmblPropertyComparator implements Comparator
{
    static final Comparator INSTANCE = new GenEmblPropertyComparator();

    private List tagOrder;

    private GenEmblPropertyComparator()
    {
        tagOrder = new ArrayList();
        tagOrder.add(EmblLikeFormat.ID_TAG);
        tagOrder.add(EmblLikeFormat.ACCESSION_TAG);
        tagOrder.add(EmblLikeFormat.VERSION_TAG);
        tagOrder.add(EmblLikeFormat.DATE_TAG);
        tagOrder.add(EmblLikeFormat.DEFINITION_TAG);
        tagOrder.add(EmblLikeFormat.KEYWORDS_TAG);
        tagOrder.add(EmblLikeFormat.SOURCE_TAG);
        tagOrder.add(EmblLikeFormat.ORGANISM_TAG);
        /*tagOrder.add(EmblLikeFormat.REFERENCE_TAG);
        tagOrder.add(EmblLikeFormat.COORDINATE_TAG);
        tagOrder.add(EmblLikeFormat.REF_ACCESSION_TAG);
        tagOrder.add(EmblLikeFormat.AUTHORS_TAG);
        tagOrder.add(EmblLikeFormat.TITLE_TAG);
        tagOrder.add(EmblLikeFormat.JOURNAL_TAG);*/
        tagOrder.add(ReferenceAnnotation.class);
        tagOrder.add(EmblLikeFormat.DR_TAG);//lorna:added 21.08.03
        tagOrder.add(EmblLikeFormat.COORDINATE_TAG);
        tagOrder.add(EmblLikeFormat.REF_ACCESSION_TAG);
        tagOrder.add(EmblLikeFormat.AUTHORS_TAG);
        tagOrder.add(EmblLikeFormat.TITLE_TAG);
        tagOrder.add(EmblLikeFormat.JOURNAL_TAG);
        tagOrder.add(EmblLikeFormat.COMMENT_TAG);
        tagOrder.add(EmblLikeFormat.FEATURE_TAG);

        tagOrder.add(GenbankFormat.LOCUS_TAG);
        tagOrder.add(GenbankFormat.SIZE_TAG);
        tagOrder.add(GenbankFormat.STRAND_NUMBER_TAG);
        tagOrder.add(GenbankFormat.TYPE_TAG);
        tagOrder.add(GenbankFormat.CIRCULAR_TAG);
        tagOrder.add(GenbankFormat.DIVISION_TAG);
        tagOrder.add(GenbankFormat.DATE_TAG);
        tagOrder.add(GenbankFormat.DEFINITION_TAG);
        tagOrder.add(GenbankFormat.ACCESSION_TAG);
        tagOrder.add(GenbankFormat.VERSION_TAG);
        tagOrder.add(GenbankFormat.GI_TAG);
        tagOrder.add(GenbankFormat.KEYWORDS_TAG);
        tagOrder.add(GenbankFormat.SOURCE_TAG);
        tagOrder.add(GenbankFormat.ORGANISM_TAG);
        tagOrder.add(GenbankFormat.REFERENCE_TAG);
        tagOrder.add(GenbankFormat.AUTHORS_TAG);
        tagOrder.add(GenbankFormat.TITLE_TAG);
        tagOrder.add(GenbankFormat.JOURNAL_TAG);
        tagOrder.add(GenbankFormat.COMMENT_TAG);
        tagOrder.add(GenbankFormat.FEATURE_TAG);
    }

    public int compare(Object o1, Object o2)
    {
        int index1 = tagOrder.indexOf(o1);
        int index2 = tagOrder.indexOf(o2);

        return (index1 - index2);
    }
}
-------------- next part --------------
/*
 * Created by IntelliJ IDEA.
 * User: lmorris
 * Date: Nov 14, 2003
 * Time: 11:45:41 AM
 */
package org.biojava.bio.seq.io;

import org.biojava.bio.AbstractAnnotation;
import org.biojava.utils.ChangeVetoException;

import java.util.Map;
import java.util.HashMap;

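/**
 * <code>ReferenceAnnotation</code> holds the lines of a single EMBL
 * reference block (RN, RP, RX, RA, RT, RL, XX) keyed by their
 * two-letter tags, so that the block can be written back out as a
 * unit. A tag that occurs on several lines holds a List of values.
 */
public class ReferenceAnnotation extends AbstractAnnotation {

    /**
     * The properties map. This may be null if no property values have
     * yet been set.
     */
    private Map properties;

    public ReferenceAnnotation() {
        super();
        try {
            // every reference block is terminated by an empty XX line
            this.setProperty(EmblLikeFormat.SEPARATOR_TAG, "");
        } catch (ChangeVetoException e) {
            e.printStackTrace();
        }
    }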

    protected Map getProperties() {
        if(!propertiesAllocated()) {
            properties = new HashMap();
        }
        return properties;
    }

    protected boolean propertiesAllocated() {
        return properties != null;
    }


}
-------------- next part --------------
/*
 *                    BioJava development code
 *
 * This code may be freely distributed and modified under the
 * terms of the GNU Lesser General Public Licence.  This should
 * be distributed with the code.  If you do not have a copy,
 * see:
 *
 *      http://www.gnu.org/copyleft/lesser.html
 *
 * Copyright for this code is held jointly by the individual
 * authors.  These should be listed in @author doc comments.
 *
 * For more information on the BioJava project and its aims,
 * or to join the biojava-l mailing list, visit the home page
 * at:
 *
 *      http://www.biojava.org/
 *
 */

package org.biojava.bio.seq.io;

import java.util.*;

import org.biojava.bio.Annotation;
import org.biojava.bio.BioError;
import org.biojava.bio.seq.Feature;
import org.biojava.bio.seq.FeatureHolder;
import org.biojava.bio.seq.Sequence;
import org.biojava.bio.symbol.IllegalAlphabetException;
import org.biojava.bio.symbol.Symbol;

/**
 * <code>SeqIOEventEmitter</code> is a utility class which scans a
 * <code>Sequence</code> object and sends events describing its
 * constituent data to a <code>SeqIOListener</code>. The listener
 * should be able to reconstruct the <code>Sequence</code> from these
 * events.
 *
 * @author Keith James
 * @since 1.2
*/
class SeqIOEventEmitter
{
    private static Symbol [] symProto = new Symbol [0];

    private Comparator seqPropComparator;
    private Comparator refPropComparator;
    private Comparator featureComparator;

    SeqIOEventEmitter(Comparator seqPropComparator,
                      Comparator featureComparator)
    {
        this.seqPropComparator = seqPropComparator;
        this.featureComparator = featureComparator;
    }

    /**
     * <code>getSeqIOEvents</code> scans a <code>Sequence</code>
     * object and sends events describing its data to the
     * <code>SeqIOListener</code>.
     *
     * @param seq a <code>Sequence</code>.
     * @param listener a <code>SeqIOListener</code>.
     */
    void getSeqIOEvents(Sequence seq, SeqIOListener listener)
    {
        try
        {
            // Inform listener of sequence start
            listener.startSequence();

            // Pass name to listener
            listener.setName(seq.getName());

            // Pass URN to listener
            listener.setURI(seq.getURN());

            // Pass sequence properties to listener
            Annotation a = seq.getAnnotation();
            List sKeys = new ArrayList(a.keys());
            Collections.sort(sKeys, seqPropComparator);

            for (Iterator ki = sKeys.iterator(); ki.hasNext();)
            {
                Object key = ki.next();

                if (key.equals(ReferenceAnnotation.class)) {

                    // A single reference may be stored as a bare
                    // ReferenceAnnotation; several are stored as a List
                    Object value = a.getProperty(key);
                    List references;

                    if (value instanceof List) {
                        references = (List) value;
                    } else if (value instanceof ReferenceAnnotation) {
                        references = Collections.singletonList(value);
                    } else {
                        references = Collections.EMPTY_LIST;
                    }

                    for (int i = 0; i < references.size(); i++) {
                        ReferenceAnnotation refAnnot = (ReferenceAnnotation) references.get(i);

                        Map referenceLines = refAnnot.getProperties();
                        List refKeys = new ArrayList(referenceLines.keySet());
                        refPropComparator = EmblReferenceComparator.INSTANCE;
                        Collections.sort(refKeys, refPropComparator);

                        for (Iterator kit = refKeys.iterator(); kit.hasNext();)
                        {
                            Object refKey = kit.next();
                            // adds all the R* tags and the final XX tag
                            listener.addSequenceProperty(refKey, refAnnot.getProperty(refKey));
                        }
                    }
                }
                else {

                    if (!(key.equals(EmblLikeFormat.SEPARATOR_TAG)))  {  //lorna: ignore XX

                       listener.addSequenceProperty(key, a.getProperty(key));
                    }

                }
            }

            // Recurse through sub feature tree, flattening it for
            // EMBL
            List subs = getSubFeatures(seq);
            Collections.sort(subs, featureComparator);

            // Put the source features first for EMBL
            for (Iterator fi = subs.iterator(); fi.hasNext();)
            {
                // The template is required to call startFeature
                Feature.Template t = ((Feature) fi.next()).makeTemplate();

                // Inform listener of feature start
                listener.startFeature(t);

                // Pass feature properties (i.e. qualifiers to
                // listener)
                // FIXME: this will drop all non-comparable keys
                List fKeys = comparableList(t.annotation.keys());
                Collections.sort(fKeys);

                for (Iterator ki = fKeys.iterator(); ki.hasNext();)
                {
                    Object key = ki.next();
                    listener.addFeatureProperty(key, t.annotation.getProperty(key));
                }

                // Inform listener of feature end
                listener.endFeature();
            }

            // Add symbols
            listener.addSymbols(seq.getAlphabet(),
                                (Symbol []) seq.toList().toArray(symProto),
                                0,
                                seq.length());

            // Inform listener of sequence end
            listener.endSequence();
        }
        catch (IllegalAlphabetException iae)
        {
            // This should never happen as the alphabet is being used
            // by this Sequence instance
            throw new BioError("An internal error occurred processing symbols",iae);
        }
        catch (ParseException pe)
        {
            throw new BioError("An internal error occurred creating SeqIO events",pe);
        }
    }


    /**
     * <code>getSubFeatures</code> is a recursive method which returns
     * a list of all <code>Feature</code>s within a
     * <code>FeatureHolder</code>.
     *
     * @param fh a <code>FeatureHolder</code>.
     *
     * @return a <code>List</code>.
     */
    private static List getSubFeatures(FeatureHolder fh)
    {
        List subfeat = new ArrayList();

        for (Iterator fi = fh.features(); fi.hasNext();)
        {
            FeatureHolder sfh = (FeatureHolder) fi.next();

            subfeat.addAll((Collection) getSubFeatures(sfh));
            subfeat.add(sfh);
        }
        return subfeat;
    }

    private List comparableList(Collection coll) {
      ArrayList res = new ArrayList();
      for(Iterator i = coll.iterator(); i.hasNext(); ) {
        Object o = i.next();
        if(o instanceof Comparable) {
          res.add(o);
        }
      }
      return res;
    }
}
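On the writing side, `getSeqIOEvents` walks every stored reference and passes its tags to the listener sorted with `EmblReferenceComparator`; `EmblFileFormer` then does the actual formatting. The hypothetical sketch below (`ReferenceLineEmitter` is not a BioJava class) imitates that flattening over plain maps and produces text lines directly: it accepts one reference or a list of them, writes each block's tags in RN, RP, RX, RA, RT, RL, XX order, and turns a list value into one line per element. Unlike the comparator, it skips tags outside that list, and the three-space gap after the tag is an assumption based on EMBL's value-starts-at-column-6 layout.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: flattens one or more reference blocks into EMBL
 * lines. A tag's value may be a single object or a List (multi-line
 * RA/RT); each element becomes its own line. Empty values print the
 * bare tag, as for the XX separator.
 */
final class ReferenceLineEmitter {

    private static final List<String> TAG_ORDER =
        Arrays.asList("RN", "RP", "RX", "RA", "RT", "RL", "XX");

    static List<String> emit(Object references) {
        List<String> out = new ArrayList<>();
        if (references == null) {
            return out;
        }
        List<?> refs;
        if (references instanceof List) {
            refs = (List<?>) references;
        } else {
            refs = Collections.singletonList(references); // lone reference
        }
        for (Object r : refs) {
            Map<?, ?> ref = (Map<?, ?>) r;
            for (String tag : TAG_ORDER) { // tags outside TAG_ORDER are skipped
                if (!ref.containsKey(tag)) {
                    continue;
                }
                Object value = ref.get(tag);
                List<?> values;
                if (value instanceof List) {
                    values = (List<?>) value;
                } else {
                    values = Collections.singletonList(value);
                }
                for (Object v : values) {
                    boolean blank = v == null || v.toString().isEmpty();
                    out.add(blank ? tag : tag + "   " + v);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> ref = new LinkedHashMap<>();
        ref.put("XX", "");
        ref.put("RN", "[1]");
        ref.put("RT", Arrays.asList("\"A title that", "spans two lines\";"));
        ref.put("RA", "Smith J.;");
        for (String line : emit(ref)) {
            System.out.println(line);
        }
    }
}
```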
From mark.schreiber at agresearch.co.nz  Mon Nov 24 16:01:54 2003
From: mark.schreiber at agresearch.co.nz (Schreiber, Mark)
Date: Mon Nov 24 16:08:36 2003
Subject: [Biojava-l] EmblFileFormer
Message-ID: <AF026AF0FF4B054590228FD1F1DE516501BF1D46@inbox.agresearch.co.nz>

Hi Lorna,

This is really good. Writing files correctly has been a weak point in biojava. I have committed your changes to the 1.3.1 branch and I will put them on the biojava-live branch shortly.

If there are any volunteers, the writing of GenPept and SwissProt files also sucks badly. International fame and adoration await the person who fixes them :)

- mark


> -----Original Message-----
> From: Lorna Morris [mailto:lmorris@ebi.ac.uk] 
> Sent: Tuesday, 25 November 2003 3:38 a.m.
> To: biojava-l@biojava.org
> Subject: [Biojava-l] EmblFileFormer
> 
> 
> Hello
> 
> I'm using biojava to parse an EMBL flat file, modify it, and dump it
> out to file at the end. However, when I used SeqIOTools.writeEmbl, the
> file created did not have correctly ordered and nested RN, RP, RX, RA,
> RT and RL lines. These lines should occur in repeated sets, one set for
> each reference in the flat file. I've modified some of the biojava
> classes and added 2 new classes to correct this. Everything works fine
> now. I'm attaching the classes to this mail.
> 
> Files modified:
> 
> EmblLikeFormat
> EmblFileFormer
> SeqIOEventEmitter
> GenEmblPropertyComparator
> 
> Files added:
> 
> ReferenceAnnotation.java
> EmblReferenceComparator.java
> 
> If you need any more details on the changes I've made let me 
> know. Thanks,
> 
> Lorna
> 

From mark.schreiber at agresearch.co.nz  Mon Nov 24 17:25:48 2003
From: mark.schreiber at agresearch.co.nz (Schreiber, Mark)
Date: Mon Nov 24 17:32:19 2003
Subject: [Biojava-l] BioJava in the news
Message-ID: <AF026AF0FF4B054590228FD1F1DE5165016E7CE4@inbox.agresearch.co.nz>

Hey -

James Gosling knows we exist!

http://bio.oreilly.com/news/gosling.html

From ben at schoolid.com  Tue Nov 25 03:42:35 2003
From: ben at schoolid.com (Ben Good)
Date: Tue Nov 25 03:49:09 2003
Subject: [Biojava-l] anger error
Message-ID: <511887DA-1F23-11D8-99C7-000393C45566@schoolid.com>

Hi,

  Trying to implement this bit from biojava in anger ("How do I count the
residues in a sequence"):

Count counts = new IndexedCount((FiniteAlphabet)seq.getAlphabet());
//iterate through the Symbols in seq
for (Iterator i = seq.iterator(); i.hasNext();) {
    AtomicSymbol sym = (AtomicSymbol)i.next();
    counts.increaseCount(sym, 1.0);
}

It compiles but gives a class cast exception when I try to run it.  
Won't accept (AtomicSymbol)i.next();

It seems that seq.iterator() returns an iterator over Symbols and not 
AtomicSymbols?

any ideas?

thanks
-Ben

From mark.schreiber at agresearch.co.nz  Tue Nov 25 16:32:14 2003
From: mark.schreiber at agresearch.co.nz (Schreiber, Mark)
Date: Tue Nov 25 16:38:50 2003
Subject: [Biojava-l] anger error
Message-ID: <AF026AF0FF4B054590228FD1F1DE5165016E7CEB@inbox.agresearch.co.nz>

Hi Ben -

If your Sequence contains any ambiguous Symbols (e.g. for DNA: n, y, r, w, etc.), they will not be AtomicSymbols; they will be BasisSymbols. BasisSymbols are made up of one or more AtomicSymbols.

If this is the case you need to use Solution 2 from the same page (http://www.biojava.org/docs/bj_in_anger/CountResidues.htm). Actually Solution 2 is the better of the two as it is more flexible.

If this still doesn't work send me the sequence and I'll take a look.

- Mark
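The difference between the two approaches can be illustrated without BioJava. The sketch below is plain Java, not the BioJava API and not the code from the BioJava in Anger page: it counts bases in a DNA string, adding unambiguous bases directly and spreading each IUPAC ambiguity code evenly over the bases it can stand for (so `r`, meaning a or g, adds 0.5 to each). The class name `ResidueCounter` and the even-split policy are assumptions for illustration, not necessarily what BioJava does; characters outside the IUPAC DNA codes are rejected with an exception.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: counts nucleotides in a DNA string. Each IUPAC
 * ambiguity code contributes a total of 1.0, split equally among the
 * bases it matches, instead of failing like a cast to an atomic type.
 */
final class ResidueCounter {

    // IUPAC nucleotide codes and the bases each one can stand for
    private static final Map<Character, String> IUPAC = new LinkedHashMap<>();
    static {
        IUPAC.put('a', "a"); IUPAC.put('c', "c"); IUPAC.put('g', "g"); IUPAC.put('t', "t");
        IUPAC.put('r', "ag"); IUPAC.put('y', "ct"); IUPAC.put('s', "cg");
        IUPAC.put('w', "at"); IUPAC.put('k', "gt"); IUPAC.put('m', "ac");
        IUPAC.put('b', "cgt"); IUPAC.put('d', "agt"); IUPAC.put('h', "act");
        IUPAC.put('v', "acg"); IUPAC.put('n', "acgt");
    }

    static Map<Character, Double> count(String dna) {
        Map<Character, Double> counts = new LinkedHashMap<>();
        for (char base : "acgt".toCharArray()) {
            counts.put(base, 0.0);
        }
        for (char raw : dna.toCharArray()) {
            char c = Character.toLowerCase(raw);
            String matches = IUPAC.get(c);
            if (matches == null) {
                throw new IllegalArgumentException("not a DNA symbol: " + raw);
            }
            double share = 1.0 / matches.length(); // split one count evenly
            for (char base : matches.toCharArray()) {
                counts.put(base, counts.get(base) + share);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("acgtnr")); // prints {a=1.75, c=1.25, g=1.75, t=1.25}
    }
}
```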


> -----Original Message-----
> From: Ben Good [mailto:ben@schoolid.com] 
> Sent: Tuesday, 25 November 2003 9:43 p.m.
> To: biojava-l@biojava.org
> Subject: [Biojava-l] anger error
> 
> 
> Hi,
> 
>   Trying to implement this bit from biojava in anger ("How do I
> count the residues in a sequence").
> 
> Count counts = new IndexedCount((FiniteAlphabet)seq.getAlphabet());
> //iterate through the Symbols in seq
> for (Iterator i = seq.iterator(); i.hasNext();) {
>     AtomicSymbol sym = (AtomicSymbol)i.next();
>     counts.increaseCount(sym, 1.0);
> }
> 
> It compiles but gives a class cast exception when I try to run it.  
> Won't accept (AtomicSymbol)i.next();
> 
> It seems that seq.iterator() returns an iterator over Symbols and not 
> AtomicSymbols?
> 
> any ideas?
> 
> thanks
> -Ben
> 
> 

From wux at mail.cbi.pku.edu.cn  Tue Nov 25 20:42:25 2003
From: wux at mail.cbi.pku.edu.cn (wux@mail.cbi.pku.edu.cn)
Date: Tue Nov 25 20:52:49 2003
Subject: [Biojava-l] chinese version of biojava in anger's website
Message-ID: <200311260146.hAQ1k1AY002411@mail.cbi.pku.edu.cn>

Dear all:

   The Chinese version of BioJava in Anger is located at http://www.cbi.pku.edu.cn/chinese/documents/PUMA/biojava/index-cn.html

   It can now be viewed from outside China. I hope you enjoy it.
PS: Mark, would you like to add a link to it from BioJava in Anger? Thanks.


Yours faithfully,
wux
wux@mail.cbi.pku.edu.cn
2003-11-26
*****************************************************
WuXin  Ph.D student of CBI (Center of Bioinformatics)
Peking University  100871 P.R.China
Email: wux@mail.cbi.pku.edu.cn
Tel: 010-62762409 (dorm)
     010-62755206 (office)
Address: Building 47#2026 Peking University
*****************************************************


From mark.schreiber at agresearch.co.nz  Tue Nov 25 20:52:24 2003
From: mark.schreiber at agresearch.co.nz (Schreiber, Mark)
Date: Tue Nov 25 20:59:10 2003
Subject: [Biojava-l] chinese version of biojava in anger's website
Message-ID: <AF026AF0FF4B054590228FD1F1DE5165016E7CEE@inbox.agresearch.co.nz>

Hi -

Thanks for this. I will put a link from biojava in anger as soon as I sort out some file permissions problems I am having with the open-bio server.

Also, do you happen to know which font is required for viewing Chinese characters so I can add this information too?

Thanks

Mark


> -----Original Message-----
> From: wux@mail.cbi.pku.edu.cn [mailto:wux@mail.cbi.pku.edu.cn] 
> Sent: Wednesday, 26 November 2003 2:42 p.m.
> To: biojava-l@biojava.org
> Subject: [Biojava-l] chinese version of biojava in anger's website
> 
> 
> Dear all:
> 
>    The Chinese version of BioJava in Anger is located at 
> http://www.cbi.pku.edu.cn/chinese/documents/PUMA/biojava/index-cn.html
> 
>    It can now be viewed from outside China. I hope you enjoy it.
> PS: Mark, would you like to add a link to it from BioJava in Anger? Thanks.
> 
> 
> Yours faithfully,
> wux
> wux@mail.cbi.pku.edu.cn
> 2003-11-26
> *****************************************************
> WuXin  Ph.D student of CBI (Center of Bioinformatics)
> Peking University  100871 P.R.China
> Email: wux@mail.cbi.pku.edu.cn
> Tel: 010-62762409 (dorm)
>      010-62755206 (office)
> Address: Building 47#2026 Peking University
> *****************************************************
> 
> 
> 

From daviddebeule at pandora.be  Wed Nov 26 16:00:48 2003
From: daviddebeule at pandora.be (david de beule)
Date: Wed Nov 26 18:42:57 2003
Subject: [Biojava-l] Strang change of Location class
Message-ID: <001c01c3b460$5f08d3e0$0100a8c0@davidpc>

Hi,

This piece of code:

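  import java.util.Iterator;
  import org.biojava.bio.Annotation;
  import org.biojava.bio.seq.*;
  import org.biojava.bio.seq.impl.*;
  import org.biojava.bio.seq.io.SymbolTokenization;
  import org.biojava.bio.symbol.*;

  //making a sequence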
  Alphabet dna = DNATools.getDNA();
  SymbolTokenization dnaToke = dna.getTokenization("token");
  SymbolList seq0 = new SimpleSymbolList(dnaToke, "ACTGGACCTAAGG");
  Sequence sequence0 = new SimpleSequence(seq0, "test", "test", null);

  //adding a feature with a between location
  StrandedFeature.Template templ = new StrandedFeature.Template(); 
  templ.annotation = Annotation.EMPTY_ANNOTATION;
  templ.location = new BetweenLocation(new RangeLocation(7,8));
  templ.source = "my feature";
  templ.strand = StrandedFeature.POSITIVE;
  templ.type = "interesting motif";
  sequence0.createFeature(templ);

  Iterator iter = sequence0.features();
  while (iter.hasNext()) {
      Feature feature = (Feature)iter.next();
      Location location = feature.getLocation();
      System.out.println("orginal feature location: " + location.getClass()); 
  }

  //converting to a simplegappedsequence
  SimpleGappedSequence _sequence = new SimpleGappedSequence(sequence0);

  iter = _sequence.features();
  while (iter.hasNext()) {
      Feature feature = (Feature)iter.next();
      Location location = feature.getLocation();
      System.out.println("new feature location: " + location.getClass()); 
  }

Gives me the following output:

orginal feature location class org.biojava.bio.symbol.BetweenLocation
new feature location: class org.biojava.bio.symbol.RangeLocation

Why does the feature location change from BetweenLocation to RangeLocation during the conversion?

Any help would be appreciated,

David De Beule

From cox at mshri.on.ca  Wed Nov 26 21:14:40 2003
From: cox at mshri.on.ca (Brian Cox)
Date: Wed Nov 26 18:43:21 2003
Subject: [Biojava-l] weightmatrix annotator
Message-ID: <009301c3b48c$39daaf40$61627026@rossdell>

Hello,
Does the current method allow, or is there another method that allows, annotations from multiple weight matrices to sit on the same sequence? At the moment I annotate the sequence, pull the annotations off into a list, then annotate with the next matrix, and so on. Is there a good way to iterate through all the matrices and annotate the sequence without deleting the previous annotations? Perhaps it does this already and I did something wrong?

later,
Brian Cox
Samuel Lunenfeld Research Institute
Mount Sinai Hospital, Rm 884
Toronto, Ontario
Canada

416-586-8266
From wux at mail.cbi.pku.edu.cn  Wed Nov 26 21:00:03 2003
From: wux at mail.cbi.pku.edu.cn (wux@mail.cbi.pku.edu.cn)
Date: Wed Nov 26 21:59:12 2003
Subject: [Biojava-l] How soon can we get a book of biojava?
Message-ID: <200311270203.hAR23PAY010939@mail.cbi.pku.edu.cn>

Dear all:

   As Mark said, James Gosling knows biojava exists. I found two books from O'Reilly:
"Beginning Perl for Bioinformatics" and "Mastering Perl for Bioinformatics". I hope
"Beginning Java for Bioinformatics" and "Mastering Java for Bioinformatics" will be available
as soon as possible. Has the biojava team thought about it?


   Yours faithfully,
   wux
   wux@mail.cbi.pku.edu.cn
   2003-11-27
*****************************************************
WuXin  Ph.D student of CBI (Center of Bioinformatics)
Peking University  100871 P.R.China
Email: wux@mail.cbi.pku.edu.cn
Tel: 010-62762409 (dorm)
     010-62755206 (office)
Address: Building 47#2026 Peking University
*****************************************************


From david.huen at ntlworld.com  Thu Nov 27 02:33:31 2003
From: david.huen at ntlworld.com (David Huen)
Date: Thu Nov 27 02:40:00 2003
Subject: [Biojava-l] How soon can we get a book of biojava?
In-Reply-To: <200311270203.hAR23PAY010939@mail.cbi.pku.edu.cn>
References: <200311270203.hAR23PAY010939@mail.cbi.pku.edu.cn>
Message-ID: <200311270733.32777.david.huen@ntlworld.com>

On Thursday 27 Nov 2003 2:00 am, wux@mail.cbi.pku.edu.cn wrote:
> Dear all:
>
>    As Mark said, James Gosling knows biojava exists. I found two books in
> O'reilly : "Beginning perl for bioinformatics" and " Mastering perl for
> bioinformatics". I hope " Beginning java for bioinformatics" and "
> Mastering java for bioinformatics " are available as soon as possible.
> Does biojava team think about it?
>
I believe that Matthew Pocock (and someone else too?) was commissioned to 
write one and has been busy doing so.

Rgds,
David Huen

From matthew_pocock at yahoo.co.uk  Thu Nov 27 10:32:42 2003
From: matthew_pocock at yahoo.co.uk (Matthew Pocock)
Date: Thu Nov 27 10:48:28 2003
Subject: [Biojava-l] Strang change of Location class
In-Reply-To: <001c01c3b460$5f08d3e0$0100a8c0@davidpc>
References: <001c01c3b460$5f08d3e0$0100a8c0@davidpc>
Message-ID: <3FC6191A.4040803@yahoo.co.uk>

Hehe. Subtle are the ways of wizards.

Deep in the guts of BioJava there is some magic that makes locations 
behave reasonably, even if you do impolite things like ask them to 
project across gaps or into assemblies. That magic did not take into account 
the other magic that makes BetweenLocation and CircularLocation behave.

The code in CVS should handle this now. Well spotted.
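The underlying issue is this: any code that projects a location into new coordinates (for example through the gaps of a SimpleGappedSequence) has to rebuild a location *of the same kind*. If it builds a plain range, a between-location degrades to a RangeLocation, which is what David saw. The following is a toy model of that idea. Every class in it (`Loc`, `RangeLoc`, `BetweenLoc`, `GapProjector`) is hypothetical. It is not BioJava's Location hierarchy, and it does not reproduce the actual fix committed to CVS.

```java
import java.util.Arrays;

// Toy location model (hypothetical names, NOT BioJava's Location classes).
abstract class Loc {
    final int min;
    final int max;

    Loc(int min, int max) {
        this.min = min;
        this.max = max;
    }

    // Each kind of location knows how to rebuild itself with new bounds.
    abstract Loc withBounds(int newMin, int newMax);
}

final class RangeLoc extends Loc {
    RangeLoc(int min, int max) { super(min, max); }

    @Override
    Loc withBounds(int newMin, int newMax) { return new RangeLoc(newMin, newMax); }

    @Override
    public String toString() { return "RangeLoc " + min + ".." + max; }
}

// A site between two residues, like BetweenLocation(RangeLocation(7, 8)) in the question.
final class BetweenLoc extends Loc {
    BetweenLoc(int min, int max) { super(min, max); }

    @Override
    Loc withBounds(int newMin, int newMax) { return new BetweenLoc(newMin, newMax); }

    @Override
    public String toString() { return "BetweenLoc " + min + "^" + max; }
}

// Maps ungapped 1-based coordinates into a gapped view of the same sequence.
final class GapProjector {
    // Sorted ungapped positions before which one gap column has been inserted.
    private final int[] gapsBefore;

    GapProjector(int... gapsBefore) {
        this.gapsBefore = gapsBefore.clone();
        Arrays.sort(this.gapsBefore);
    }

    // Shift a position right by the number of gaps inserted at or before it.
    int project(int pos) {
        int shift = 0;
        for (int g : gapsBefore) {
            if (g <= pos) shift++;
        }
        return pos + shift;
    }

    // Loses information: every projected location comes back as a plain range.
    Loc projectNaive(Loc loc) {
        return new RangeLoc(project(loc.min), project(loc.max));
    }

    // Keeps the location's kind by letting the location rebuild itself.
    Loc projectPreserving(Loc loc) {
        return loc.withBounds(project(loc.min), project(loc.max));
    }
}
```

With a gap inserted before position 5, `projectPreserving(new BetweenLoc(7, 8))` yields a `BetweenLoc` spanning 8^9. `projectNaive` returns a `RangeLoc` with the same bounds and so drops the "between" meaning. Delegating reconstruction to the location itself (`withBounds`) is one way to avoid a growing chain of `instanceof` checks in the projection code.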

Matthew

> Gives me the following output:
>  
> orginal feature location class org.biojava.bio.symbol.BetweenLocation
> new feature location: class org.biojava.bio.symbol.RangeLocation
>  
> Why is the feature location changed from BetweenLocation to 
> RangeLocation during the conversion ??
>  
> Any help would be appreciated,
>  
> David De Beule


From tvavouri at hotmail.com  Thu Nov 27 11:14:47 2003
From: tvavouri at hotmail.com (Tanya Vavouri)
Date: Thu Nov 27 11:21:13 2003
Subject: [Biojava-l] weightmatrix annotator
Message-ID: <BAY2-F50n8URb022x4Q00015e00@hotmail.com>

Hi Brian,

I have started using biojava to annotate sequences with weight matrices and 
have had the same problem. Basically, using the WeightMatrixAnnotator class 
I can annotate a sequence with multiple weight matrices, but the problem is 
that when I then look at the sequence features, I can't tell which feature 
corresponds to which weight matrix.

As a solution, I've modified the WeightMatrixAnnotator class so that the 
constructor can also take a String argument, which is the ID of my weight 
matrix, and the class then saves that string as the feature type (instead of 
"hit").

I also thought that it would be good for the WeightMatrixAnnotator to accept 
a database of Weight Matrices so that it can neatly annotate a sequence with 
many matrices.
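The two ideas above (label each hit with the ID of the matrix that produced it, and scan a whole collection of matrices in one pass that accumulates every hit) can be sketched without BioJava. This is a minimal, library-free illustration. `Pwm`, `Hit` and `MultiMatrixAnnotator` are hypothetical names and are not BioJava's `WeightMatrix` or `WeightMatrixAnnotator`, whose API is not reproduced here. The score is a simple sum of per-column values standing in for log-odds scores, and each matrix carries its own threshold.

```java
import java.util.ArrayList;
import java.util.List;

// One hit; its "type" is the matrix ID rather than a generic "hit".
final class Hit {
    final String type;
    final int start; // 1-based, inclusive
    final int end;   // 1-based, inclusive
    final double score;

    Hit(String type, int start, int end, double score) {
        this.type = type;
        this.start = start;
        this.end = end;
        this.score = score;
    }

    @Override
    public String toString() {
        return type + " " + start + ".." + end + " (score " + score + ")";
    }
}

// Hypothetical position weight matrix with additive per-column scores over A, C, G, T.
final class Pwm {
    private static final String BASES = "ACGT";
    final String id;
    final double[][] scores; // scores[column][base index]
    final double threshold;  // minimum window score that counts as a hit

    Pwm(String id, double[][] scores, double threshold) {
        this.id = id;
        this.scores = scores;
        this.threshold = threshold;
    }

    // Toy matrix for testing: +1 for the consensus base, -1 for any other base.
    static Pwm consensus(String id, String motif, double threshold) {
        double[][] s = new double[motif.length()][BASES.length()];
        for (int c = 0; c < motif.length(); c++) {
            for (int b = 0; b < BASES.length(); b++) {
                s[c][b] = BASES.charAt(b) == Character.toUpperCase(motif.charAt(c)) ? 1.0 : -1.0;
            }
        }
        return new Pwm(id, s, threshold);
    }

    int columns() {
        return scores.length;
    }

    // Score the window starting at a 0-based offset; non-ACGT symbols never produce a hit.
    double score(String seq, int offset) {
        double total = 0.0;
        for (int c = 0; c < scores.length; c++) {
            int b = BASES.indexOf(Character.toUpperCase(seq.charAt(offset + c)));
            if (b < 0) return Double.NEGATIVE_INFINITY;
            total += scores[c][b];
        }
        return total;
    }
}

final class MultiMatrixAnnotator {
    // Scan every matrix over the same sequence and collect all hits in one list,
    // so hits from earlier matrices are never discarded.
    static List<Hit> annotate(String seq, List<Pwm> matrices) {
        List<Hit> hits = new ArrayList<>();
        for (Pwm m : matrices) {
            for (int off = 0; off + m.columns() <= seq.length(); off++) {
                double s = m.score(seq, off);
                if (s >= m.threshold) {
                    hits.add(new Hit(m.id, off + 1, off + m.columns(), s));
                }
            }
        }
        return hits;
    }
}
```

Scanning "ACGTTATA" with an exact-match "ACG" matrix and an exact-match "TATA" matrix gives two hits: one typed with the first matrix's ID at 1..3 and one typed with the second's at 5..8. Because every hit records its matrix ID, the features remain distinguishable after all the matrices have been applied.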

Does anyone know if this can already be done with some other biojava classes, 
or whether someone is already working on this? If not, would it be worth my 
sending biojava some of the classes I've written to deal with these tasks?

Tanya Vavouri

Graduate Student
Comparative Genomics Group
MRC HGMP-RC
Hinxton
Cambridge CB10 1SB
UK


>From: "Brian Cox" <cox@mshri.on.ca>
>To: <biojava-l@biojava.org>
>Subject: [Biojava-l] weightmatrix annotator
>Date: Wed, 26 Nov 2003 18:14:40 -0800
>
>Hello,
>Does the current method or is there a method that lets multiple weight 
>matrix annotations be on the same sequence.  I currently am annotating the 
>sequence then pulling the annotation off into a list then annotating with 
>the next matrix etc., is there a good way of iterating through all 
>matrices, annotating the sequence with out deleting the annotation previous 
>annotation? Perhaps it does this already and I did something wrong?
>
>later,
>Brian Cox
>Samuel Lunenfeld Research Institute
>Mount Sinai Hospital, Rm 884
>Toronto, Ontario
>Canada
>
>416-586-8266

From verhoeff2 at gis.a-star.edu.sg  Thu Nov 27 22:34:07 2003
From: verhoeff2 at gis.a-star.edu.sg (VERHOEF Frans)
Date: Fri Nov 28 12:25:51 2003
Subject: [Biojava-l] PhredFormat
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D560B0694@BIONIC.biopolis.one-north.com>

Skipped content of type multipart/alternative
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PhredFormat.java
Type: application/octet-stream
Size: 9750 bytes
Desc: PhredFormat.java
Url : http://portal.open-bio.org/pipermail/biojava-l/attachments/20031128/65d8a81b/PhredFormat-0001.obj
From mark.schreiber at agresearch.co.nz  Sun Nov 30 17:19:12 2003
From: mark.schreiber at agresearch.co.nz (Schreiber, Mark)
Date: Sun Nov 30 17:25:47 2003
Subject: [Biojava-l] RE: [Biojava-dev] PhredFormat
Message-ID: <AF026AF0FF4B054590228FD1F1DE5165016E7CFB@inbox.agresearch.co.nz>

Hi Frans -

Thanks for these changes. I have committed them to CVS and added "default" as a valid tokenization of IntegerAlphabet (as a synonym of "token").

- Mark


-----Original Message-----
From: VERHOEF Frans [mailto:verhoeff2@gis.a-star.edu.sg] 
Sent: Friday, 28 November 2003 4:34 p.m.
To: biojava-dev@biojava.org; biojava-l@biojava.org
Subject: [Biojava-dev] PhredFormat


Hi,
 
I have fixed the little bugs in PhredFormat that have been bugging me for the last 2 days. I have attached my fixed version. Feel free to use it, change it or throw it away.
In short what I have changed is this:
 
-          PhredFormat implements ParseErrorSource and ParseErrorListener. This was not much of a job, as I basically copied it from FastaFormat.
-          readSequenceData(BufferedReader br, SymbolTokenization parser, SeqIOListener listener) has changed. This method used to split the char arrays into short number strings and feed them to the StreamParser, which in turn tried to do the same. Because the whitespace was removed in the process, the StreamParser ended up trying to parse a String representing one humongous number as an integer. The method no longer parses the char arrays itself; it just feeds whole chunks of the char array to the StreamParser.
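The failure mode in the second item is easy to reproduce in plain Java. Phred quality values are whitespace-separated integers, so once the whitespace is stripped, neighbouring values fuse into a single number that soon overflows `int`. This is a minimal sketch under that assumption. `PhredQualityParsing` and its methods are hypothetical and do not come from PhredFormat or StreamParser.

```java
import java.util.ArrayList;
import java.util.List;

final class PhredQualityParsing {
    // Parse one chunk of quality text by splitting on whitespace, so each value stays separate.
    static List<Integer> parseQualities(String chunk) {
        List<Integer> out = new ArrayList<>();
        for (String tok : chunk.trim().split("\\s+")) {
            if (!tok.isEmpty()) {          // an all-blank chunk yields one empty token
                out.add(Integer.parseInt(tok));
            }
        }
        return out;
    }

    // The buggy path described above: whitespace removed before the text is parsed.
    static String stripWhitespace(String chunk) {
        return chunk.replaceAll("\\s+", "");
    }
}
```

`parseQualities(" 20 20 35\n40 9 ")` gives the five values 20, 20, 35, 40 and 9. By contrast, `Integer.parseInt(stripWhitespace("40 40 40 40 40 40"))` throws `NumberFormatException` because "404040404040" is larger than `Integer.MAX_VALUE`. A related design point applies when input arrives in fixed-size chunks: a number can straddle two chunks. A tokenizer that keeps its state across chunks, as a stream parser can, is therefore safer than parsing each chunk independently.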
 
One new issue came up, though, when I tried the following:
 
            StreamReader qualityIter = PhredTools.readPhredQuality(new BufferedReader(new FileReader(phredQualityFile)));
            while (qualityIter.hasNext()){
                Sequence seq = qualityIter.nextSequence();
                String str = seq.seqString();
            }
 
The last line gave the following exception:
 
            java.util.NoSuchElementException: default parser not supported by IntegerAlphabet yet
            at org.biojava.bio.symbol.IntegerAlphabet.getTokenization(IntegerAlphabet.java:216)
            at org.biojava.bio.symbol.AbstractSymbolList.seqString(AbstractSymbolList.java:101)
            at org.biojava.bio.seq.impl.SimpleSequence.seqString(SimpleSequence.java:108)
            at org.gis.server.pipeline.apps.SequenceInfoParser.parseResults(SequenceInfoParser.java:82)
 
What happens is that SimpleSequence calls the AbstractSymbolList.seqString() method. This method in turn executes getAlphabet().getTokenization("default"), where getAlphabet() returns the IntegerAlphabet. IntegerAlphabet then throws the exception, because it only accepts the name "token" and not the "default" that AbstractSymbolList passes. I do have a simple workaround: make IntegerAlphabet.getTokenization(String name) accept both "default" and "token".
But I am not sure I fully understand the philosophy behind the design here...
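The workaround (which Mark later committed) amounts to treating "default" as a synonym of "token" in the tokenization lookup. The sketch below is a hypothetical, stand-alone illustration of that lookup. `Tokenization` and `IntegerAlphabetSketch` are invented names, not the real IntegerAlphabet source. It only reuses the exception message shown in the stack trace above.

```java
import java.util.NoSuchElementException;

// Minimal stand-in for a symbol tokenization (hypothetical interface).
interface Tokenization {
    String tokenize(int value);
}

// Hypothetical sketch of the lookup, not the real IntegerAlphabet code.
final class IntegerAlphabetSketch {
    // A single shared tokenization, reachable under both names.
    private static final Tokenization TOKEN = value -> Integer.toString(value);

    static Tokenization getTokenization(String name) {
        // seqString() asks for "default"; accept it as a synonym of "token".
        if ("token".equals(name) || "default".equals(name)) {
            return TOKEN;
        }
        throw new NoSuchElementException(name + " parser not supported by IntegerAlphabet yet");
    }
}
```

Returning the same instance for both names keeps the synonym cheap and means callers asking for either name see identical behaviour. Any other name still fails with a `NoSuchElementException`.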
 
Kind regards,
 
Frans Verhoef
Bioinformatics Specialist
Genome Institute of Singapore
Genome, #02-01, 60 Biopolis Street, Singapore 138672
Tel: +65 6478 8000
DID: +65 6478 8060
HP: +65 9848 4325
Email: verhoeff2@gis.a-star.edu.sg
 