From markjschreiber at gmail.com  Wed Oct  1 02:07:51 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 1 Oct 2008 14:07:51 +0800
Subject: [Biojava-l] StringIndexOutOfBoundsException while parsing blast result
In-Reply-To: 
References: 
Message-ID: <93b45ca50809302307t19a652a4v4a61eeceec07aa62@mail.gmail.com>

Actually, if it is an OS-specific carriage return then there is still a
minor issue. We should really try to code things so that they can handle
files that originate from any major OS.

- Mark

On Wed, Oct 1, 2008 at 12:31 AM, Richard Holland wrote:
>
> Sounds like it _might_ be something to do with the carriage return
> itself. Is the blast file generated on the same OS that you're running
> your analysis on? (e.g. you might run Blast on a Linux box, but
> attempt to parse the file on a Windows box?). If the two OSes are
> different, this might point to it - as Linux won't necessarily
> understand the Windows linebreaks, or vice versa, and might
> misinterpret them. When you copy the portion of the file to a new file
> on the OS you're running the analysis on, it will substitute its own
> local linebreaks and thus mask the problem.
>
> So the first thing I'd check is what the two OSes involved are. If
> they're different, try running your analysis program on the same OS as
> the Blast output was generated on. If that does fix it, then try
> putting your Blast files through dos2unix or something similar to
> convert the linebreaks before running your analysis program.
>
> If they're the same OS, then we still have a problem!
>
> cheers,
> Richard
>
> 2008/9/30 David Toomey :
> > Hi
> >
> > I am parsing a blast result and I am getting a
> > StringIndexOutOfBoundsException.
> > The stack trace is:
> >
> >   at java.lang.String.substring(String.java:1938)
> >   at java.lang.String.substring(String.java:1905)
> >   at org.biojava.bio.program.sax.BlastLikeAlignmentSAXParser.parseLine(BlastLikeAlignmentSAXParser.java:291)
> >   at org.biojava.bio.program.sax.BlastLikeAlignmentSAXParser.parse(BlastLikeAlignmentSAXParser.java:116)
> >   at org.biojava.bio.program.sax.HitSectionSAXParser.outputHSPInfo(HitSectionSAXParser.java:517)
> >   at org.biojava.bio.program.sax.HitSectionSAXParser.firstHSPEvent(HitSectionSAXParser.java:287)
> >   at org.biojava.bio.program.sax.HitSectionSAXParser.interpret(HitSectionSAXParser.java:251)
> >   at org.biojava.bio.program.sax.HitSectionSAXParser.parse(HitSectionSAXParser.java:117)
> >   at org.biojava.bio.program.sax.BlastSAXParser.hitsSectionReached(BlastSAXParser.java:634)
> >   at org.biojava.bio.program.sax.BlastSAXParser.interpret(BlastSAXParser.java:341)
> >   at org.biojava.bio.program.sax.BlastSAXParser.parse(BlastSAXParser.java:168)
> >   at org.biojava.bio.program.sax.BlastLikeSAXParser.onNewDataSet(BlastLikeSAXParser.java:314)
> >   at org.biojava.bio.program.sax.BlastLikeSAXParser.interpret(BlastLikeSAXParser.java:276)
> >   at org.biojava.bio.program.sax.BlastLikeSAXParser.parse(BlastLikeSAXParser.java:163)
> >   at ie.rcsi.blast.StandardParser.parse(StandardParser.java:65)
> >   at ie.rcsi.blast.BlastParser.parse(BlastParser.java:44)
> >   at ie.rcsi.blast.Main.main(Main.java:30)
> >
> > I have updated BlastLikeAlignmentSAXParser to output some debug info and
> > narrowed down the line causing the problem to the following line:
> >
> > 2,4-cyclodiphosphate synthase OS=Plasmodium falciparum (isolate 3D7)
> > GN=ISPF
> >
> > If I remove the carriage return and put it on a single line then everything
> > works fine.
> > Strangely, if I copy this entry and put it in a file on its own
> > it also parses correctly, even with the carriage return!!!
> >
> > Has anyone seen this before, or does anyone have a suggestion on what I
> > might do to fix it? I can send the complete blast result if it would help.
> > I have tried using blast 2.2.18 and 2.2.17 and the problem is the same.
> >
> > Cheers
> >
> > Dave
> >
> > _______________________________________________
> > Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-l
> >
>
>
> --
> Richard Holland, BSc MBCS
> Finance Director, Eagle Genomics Ltd
> M: +44 7500 438846 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/

From dtoomey at rcsi.ie  Wed Oct  1 04:40:44 2008
From: dtoomey at rcsi.ie (David Toomey)
Date: Wed, 1 Oct 2008 09:40:44 +0100
Subject: [Biojava-l] StringIndexOutOfBoundsException while parsing blast result
References: 
Message-ID: 

They are on the same OS. For all my tests I have run the blast search and
parsing on the same OS. This has mostly been Windows, but I have also tried
the whole thing on Linux and I get the same problem.

I have done some more testing and I don't think the carriage return is the
problem.

What I have found is that if the second line is less than 11 characters long,
the error is thrown. If I add 4 spaces in front of the 'GN=ISPF' on the second
line then it is parsed correctly, like this:

2,4-cyclodiphosphate synthase OS=Plasmodium falciparum (isolate 3D7)
    GN=ISPF

I haven't figured out why it parses correctly when it is the only entry in
the file, even without the spaces. So maybe I am still missing something.
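David's finding is consistent with how java.lang.String.substring(int) behaves: it throws StringIndexOutOfBoundsException when the begin index is larger than the string's length, so a fixed offset of 11 would fail on the 7-character continuation line "GN=ISPF" but succeed once four spaces pad it to exactly 11 characters. The parser code at BlastLikeAlignmentSAXParser.java:291 is not quoted in the thread, so that fixed offset is an assumption. The minimal sketch below (BlastLineSketch and its methods are hypothetical names, not BioJava API) shows a bounds-checked substring and, relevant to Mark's point about files from any OS, that BufferedReader.readLine() already accepts "\n", "\r" and "\r\n" line endings.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class BlastLineSketch {

    /** Splits text into lines; BufferedReader.readLine() treats "\n", "\r" and "\r\n" as line ends. */
    static List<String> lines(String text) {
        List<String> out = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                out.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    /** Like line.substring(begin), but returns "" instead of throwing when the line is too short. */
    static String tailFrom(String line, int begin) {
        return begin >= line.length() ? "" : line.substring(begin);
    }

    public static void main(String[] args) {
        String unix = "2,4-cyclodiphosphate synthase\nGN=ISPF\n";
        String windows = "2,4-cyclodiphosphate synthase\r\nGN=ISPF\r\n";
        System.out.println(lines(unix).equals(lines(windows))); // true

        // "GN=ISPF" has 7 characters, so "GN=ISPF".substring(11) would throw
        // StringIndexOutOfBoundsException; the guarded version returns "".
        System.out.println("[" + tailFrom("GN=ISPF", 11) + "]");     // []
        // Padded with four spaces the line is exactly 11 characters long,
        // and substring(11) is legal (it returns the empty string).
        System.out.println("[" + "    GN=ISPF".substring(11) + "]"); // []
    }
}
```

Returning an empty string is only one option; a parser could equally treat a short continuation line as the end of the description.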
Cheers,

Dave

-----Original Message-----
From: dicknetherlands at gmail.com [mailto:dicknetherlands at gmail.com] On Behalf Of Richard Holland
Sent: 30 September 2008 17:31
To: David Toomey
Cc: biojava-l at lists.open-bio.org
Subject: Re: [Biojava-l] StringIndexOutOfBoundsException while parsing blast result

Sounds like it _might_ be something to do with the carriage return
itself. Is the blast file generated on the same OS that you're running
your analysis on? (e.g. you might run Blast on a Linux box, but
attempt to parse the file on a Windows box?). If the two OSes are
different, this might point to it - as Linux won't necessarily
understand the Windows linebreaks, or vice versa, and might
misinterpret them. When you copy the portion of the file to a new file
on the OS you're running the analysis on, it will substitute its own
local linebreaks and thus mask the problem.

So the first thing I'd check is what the two OSes involved are. If
they're different, try running your analysis program on the same OS as
the Blast output was generated on. If that does fix it, then try
putting your Blast files through dos2unix or something similar to
convert the linebreaks before running your analysis program.

If they're the same OS, then we still have a problem!

cheers,
Richard

From holland at eaglegenomics.com  Wed Oct  1 05:37:59 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 1 Oct 2008 10:37:59 +0100
Subject: [Biojava-l] StringIndexOutOfBoundsException while parsing blast result
In-Reply-To: 
References: 
Message-ID: 

Thanks for the extra info.

2008/10/1 David Toomey :
> They are on the same OS. For all my tests I have run the blast search and
> parsing on the same OS. This has mostly been Windows, but I have also tried
> the whole thing on Linux and I get the same problem.
> I have done some more testing and I don't think the carriage return is the
> problem.
> What I have found is that if the second line is less than 11 characters long,
> the error is thrown. If I add 4 spaces in front of the 'GN=ISPF' on the second
> line then it is parsed correctly, like this:
>
> 2,4-cyclodiphosphate synthase OS=Plasmodium falciparum (isolate 3D7)
>     GN=ISPF
>
> I haven't figured out why it parses correctly when it is the only entry in
> the file, even without the spaces. So maybe I am still missing something.
>
> Cheers,
>
> Dave
>
> -----Original Message-----
> From: dicknetherlands at gmail.com [mailto:dicknetherlands at gmail.com] On Behalf Of Richard Holland
> Sent: 30 September 2008 17:31
> To: David Toomey
> Cc: biojava-l at lists.open-bio.org
> Subject: Re: [Biojava-l] StringIndexOutOfBoundsException while parsing blast result
>
> Sounds like it _might_ be something to do with the carriage return
> itself. Is the blast file generated on the same OS that you're running
> your analysis on? (e.g. you might run Blast on a Linux box, but
> attempt to parse the file on a Windows box?). If the two OSes are
> different, this might point to it - as Linux won't necessarily
> understand the Windows linebreaks, or vice versa, and might
> misinterpret them. When you copy the portion of the file to a new file
> on the OS you're running the analysis on, it will substitute its own
> local linebreaks and thus mask the problem.
>
> So the first thing I'd check is what the two OSes involved are. If
> they're different, try running your analysis program on the same OS as
> the Blast output was generated on.
> If that does fix it, then try
> putting your Blast files through dos2unix or something similar to
> convert the linebreaks before running your analysis program.
>
> If they're the same OS, then we still have a problem!
>
> cheers,
> Richard
>



--
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From pzgyuanf at gmail.com  Wed Oct  1 06:52:25 2008
From: pzgyuanf at gmail.com (pprun)
Date: Wed, 01 Oct 2008 18:52:25 +0800
Subject: [Biojava-l] BufferedOutputStream to RichSequence.IOTools.writeXXX() method needs to flush manually
Message-ID: 

Hi,
I don't know whether this is a feature or a bug.
If a BufferedOutputStream is passed to the method
RichSequence.IOTools.writeGenbank(OutputStream os, Sequence seq,
Namespace ns), I need to flush it manually at the end with
BufferedOutputStream.flush().

Otherwise, the output content will be truncated.

Is this the expected behavior?

Thanks,
- Pprun

From holland at eaglegenomics.com  Wed Oct  1 09:36:59 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 1 Oct 2008 14:36:59 +0100
Subject: [Biojava-l] BufferedOutputStream to RichSequence.IOTools.writeXXX() method needs to flush manually
In-Reply-To: 
References: 
Message-ID: 

The IOTools interfaces accept OutputStream instances, not
BufferedOutputStream instances. flush() is not a requirement on
OutputStream and so BJX does not call it.

cheers,
Richard

2008/10/1 pprun :
> Hi,
> I don't know whether this is a feature or a bug.
> If a BufferedOutputStream is passed to the method
> RichSequence.IOTools.writeGenbank(OutputStream os, Sequence seq,
> Namespace ns), I need to flush it manually at the end with
> BufferedOutputStream.flush().
>
> Otherwise, the output content will be truncated.
>
> Is this the expected behavior?
>
> Thanks,
> - Pprun
>
>



--
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From markjschreiber at gmail.com  Wed Oct  1 20:46:03 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Thu, 2 Oct 2008 08:46:03 +0800
Subject: [Biojava-l] BufferedOutputStream to RichSequence.IOTools.writeXXX() method needs to flush manually
In-Reply-To: 
References: 
Message-ID: <93b45ca50810011746y7d4f49biffd5c2e483c86bd1@mail.gmail.com>

As a general rule it is best if BioJava doesn't handle the flushing and
closing of OutputStreams. This is because you may want to keep using the
stream and control its behaviour. An interesting example is if you pass
System.out to a method that closes the stream. Probably not what you want.

Having said that, maybe we should add a javadoc note to say that
BufferedOutputStreams need to be flushed (and possibly closed).

- Mark

On Wed, Oct 1, 2008 at 9:36 PM, Richard Holland wrote:
> The IOTools interfaces accept OutputStream instances, not
> BufferedOutputStream instances. flush() is not a requirement on
> OutputStream and so BJX does not call it.
>
> cheers,
> Richard
>
> 2008/10/1 pprun :
>> Hi,
>> I don't know whether this is a feature or a bug.
>> If a BufferedOutputStream is passed to the method
>> RichSequence.IOTools.writeGenbank(OutputStream os, Sequence seq,
>> Namespace ns), I need to flush it manually at the end with
>> BufferedOutputStream.flush().
>>
>> Otherwise, the output content will be truncated.
>>
>> Is this the expected behavior?
>>
>> Thanks,
>> - Pprun
>>
>
>
>
> --
> Richard Holland, BSc MBCS
> Finance Director, Eagle Genomics Ltd
> M: +44 7500 438846 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>

From gabrielle_doan at gmx.net  Tue Oct  7 10:26:44 2008
From: gabrielle_doan at gmx.net (Gabrielle Doan)
Date: Tue, 07 Oct 2008 16:26:44 +0200
Subject: [Biojava-l] Getting a part of a sequence
Message-ID: <48EB71A4.70409@gmx.net>

Hi all,
I have a BioSQL database which contains all human chromosomes. My intention
is to get the information about a particular gene. How can I get a part of a
particular chromosome with all associated features? At the moment I use the
following code to create my new sequence:

RichSequence subSeq = RichSequence.Tools.subSequence(parent,
	position[0], position[1], ns, geneName, parent.getAccession(),
	parent.getIdentifier(), parent.getVersion() + 1,
	(Double) (parent.getVersion() + 1.0));

Here is the part how I get the parent sequence:

	public static RichSequence getChromosome(String chrNo) {
		Transaction tx = session.beginTransaction();
		RichSequence ret = null;

		String query;

		try {
			if (chrNo.equals("MT")) {
				query = "from BioEntry as be where be.description like '%:num%'";
				query = query.replaceAll(":num", "mitochondrion");
			} else {
				query = "from BioEntry as be where be.description like '%hromosome :num%'";
				query = query.replaceAll(":num", chrNo);
			}

			Query q = session.createQuery(query);

			ret = (RichSequence) q.list().get(0);
			tx.commit();
		} catch (Exception e) {
			tx.rollback();
			e.printStackTrace();
		}
		return ret;
	}

I always have to load the whole chromosome to get a part of it, so it 
takes a very long time and I get a lot of unused information (a waste of 
memory). I also tried to use ThinRichSequence instead of 
RichSequence, but I didn't notice any difference.
Can you give me a hint on how to speed up the code?
I am grateful for any hints.

cheers,
Gabrielle
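One caveat about the query in getChromosome: LIKE '%hromosome 1%' is a plain substring match, so it also matches descriptions that mention chromosomes 10 to 19, and q.list().get(0) then returns whichever of those rows comes first. The sketch below reproduces the effect in plain Java with made-up descriptions (the real BioSQL description wording is an assumption) and shows a stricter regular-expression check that could be applied to the rows the LIKE query returns. ChromosomeMatchSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class ChromosomeMatchSketch {

    /** Mimics SQL LIKE '%hromosome N%': a plain "contains" test. */
    static boolean likeMatch(String description, String chrNo) {
        return description.contains("hromosome " + chrNo);
    }

    /** Stricter test: the chromosome name must not be followed by another letter or digit. */
    static boolean exactMatch(String description, String chrNo) {
        Pattern p = Pattern.compile("[Cc]hromosome " + Pattern.quote(chrNo) + "(?![0-9A-Za-z])");
        return p.matcher(description).find();
    }

    /** Returns the descriptions accepted by either the loose or the strict test. */
    static List<String> select(List<String> descriptions, String chrNo, boolean strict) {
        List<String> hits = new ArrayList<>();
        for (String d : descriptions) {
            if (strict ? exactMatch(d, chrNo) : likeMatch(d, chrNo)) {
                hits.add(d);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up descriptions; the real entries may be worded differently.
        List<String> descriptions = List.of(
                "Homo sapiens chromosome 1, complete sequence",
                "Homo sapiens chromosome 10, complete sequence",
                "Homo sapiens chromosome 12, complete sequence");
        System.out.println(select(descriptions, "1", false).size()); // 3
        System.out.println(select(descriptions, "1", true).size());  // 1
    }
}
```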

From holland at eaglegenomics.com  Tue Oct  7 19:05:54 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 8 Oct 2008 00:05:54 +0100
Subject: [Biojava-l] Getting a part of a sequence
In-Reply-To: <48EB71A4.70409@gmx.net>
References: <48EB71A4.70409@gmx.net>
Message-ID: 

Hello.

Your code is pretty good already - but you're right, it will load the
whole chromosome into memory before you can chop out the interesting
bit you actually need.

As you observed, by using ThinRichSequence in your query it will load
only the initial shell of a sequence object to start with, but the
moment you try and sub-sequence it, it will immediately load the whole
sequence data into memory in order to perform the operation.

If you only want the sequence data, as a string, you can do this by
specifying the sequence attribute in the query and bypassing the
sequence object entirely:

 select rs.stringSequence from Sequence as rs where rs.description
like '%hromosome :num%'

This will return a String instead of a RichSequence object. You can
use HQL operators to perform substrings etc. on the string inside the
query itself - see
http://docs.huihoo.com/hibernate/hibernate-reference-3.2.1/queryhql.html
, particularly section 14.9.
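Slicing inside the query means juggling three coordinate conventions: BioJava sequence positions (as passed to RichSequence.Tools.subSequence above) are 1-based and inclusive, an SQL-style substring(string, position, length) takes a 1-based start and a length, and Java's String.substring(begin, end) takes a 0-based start and an exclusive end. A small sketch of the arithmetic, assuming HQL's substring() follows the usual SQL convention on your database (worth checking for your dialect); SliceCoordinatesSketch and its methods are hypothetical:

```java
final class SliceCoordinatesSketch {

    /** 1-based inclusive [start, end] converted to Java's 0-based, end-exclusive substring. */
    static String javaSlice(String seq, int start, int end) {
        if (start < 1 || end < start || end > seq.length()) {
            throw new IllegalArgumentException("bad range " + start + ".." + end);
        }
        return seq.substring(start - 1, end);
    }

    /** Arguments for an SQL-style substring(string, position, length) call: 1-based position, length. */
    static int[] sqlSubstringArgs(int start, int end) {
        return new int[] {start, end - start + 1};
    }

    /** Local emulation of SQL-style substring, used only to check the arithmetic. */
    static String sqlStyleSubstring(String s, int position, int length) {
        return s.substring(position - 1, position - 1 + length);
    }

    public static void main(String[] args) {
        String chr = "ACGTTGCAAC";
        int[] a = sqlSubstringArgs(3, 6);                        // position 3, length 4
        System.out.println(javaSlice(chr, 3, 6));                // GTTG
        System.out.println(sqlStyleSubstring(chr, a[0], a[1]));  // GTTG
    }
}
```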

If you only want the features, you can do this by using the
BioSQLFeatureFilter technique. In particular you will want the
BySequenceName filter, the And filter, and the OverlapsRichLocation
filter. You construct a filter then pass it to the filter() method in
BioSQLRichSequenceDB. The database will return to you all the
RichFeature objects that match your criteria. Note that it searches
the whole database so you really must use a BySequenceName filter at
the very least in order to make the results useful!
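The filter combination described above can be pictured with an in-memory analogy built on java.util.function.Predicate. It is only an analogy: BioSQLFeatureFilter is turned into a database query rather than evaluated in Java, and the Feature type and the bySequenceName/overlaps helpers below are hypothetical. It does show why the sequence-name condition matters: an overlap test on coordinates alone also matches features on every other sequence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class FeatureFilterSketch {

    /** Hypothetical stand-in for a feature: owning sequence name, label and 1-based inclusive span. */
    static final class Feature {
        final String seqName;
        final String label;
        final int start;
        final int end;

        Feature(String seqName, String label, int start, int end) {
            this.seqName = seqName;
            this.label = label;
            this.start = start;
            this.end = end;
        }
    }

    static Predicate<Feature> bySequenceName(String name) {
        return f -> f.seqName.equals(name);
    }

    /** Two inclusive ranges overlap when each one starts no later than the other ends. */
    static Predicate<Feature> overlaps(int start, int end) {
        return f -> f.start <= end && start <= f.end;
    }

    /** Labels of the features accepted by the filter, in input order. */
    static List<String> labels(List<Feature> all, Predicate<Feature> filter) {
        List<String> out = new ArrayList<>();
        for (Feature f : all) {
            if (filter.test(f)) {
                out.add(f.label);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Feature> all = List.of(
                new Feature("chr1", "geneA", 100, 200),
                new Feature("chr1", "geneB", 500, 900),
                new Feature("chr2", "geneC", 150, 180));
        // Without the sequence-name condition, geneC on chr2 matches too.
        System.out.println(labels(all, overlaps(150, 600)));                              // [geneA, geneB, geneC]
        System.out.println(labels(all, bySequenceName("chr1").and(overlaps(150, 600)))); // [geneA, geneB]
    }
}
```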

However, you can't use HQL to construct a complete slice of a sequence
directly in the database before returning it to the program for use as
a ready-made RichSequence object. This would require Hibernate to know
what a BioJava sub-sequence object is and how it behaves in relation
to an 'unsliced' one, which is beyond the scope of its job as a
persistence framework.

cheers,
Richard




-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From koen.bruynseels at cropdesign.com  Tue Oct  7 20:02:18 2008
From: koen.bruynseels at cropdesign.com (koen.bruynseels at cropdesign.com)
Date: Wed, 8 Oct 2008 02:02:18 +0200
Subject: [Biojava-l] Koen Bruynseels is out of the office.
Message-ID: 


I will be out of the office starting  04/10/2008 and will not return until
09/10/2008.

I will respond to your message when I return.


From gabrielle_doan at gmx.net  Thu Oct  9 08:22:01 2008
From: gabrielle_doan at gmx.net (Gabrielle Doan)
Date: Thu, 09 Oct 2008 14:22:01 +0200
Subject: [Biojava-l] Getting a part of a sequence
In-Reply-To: 
References: <48EB71A4.70409@gmx.net>
	
Message-ID: <48EDF769.8050901@gmx.net>

Hi Richard,

thanks a lot for your mail. I have successfully retrieved the 
subsequence of a sequence as a String. And now I try to get the features 
for a particular range with the following code:


	public FeatureHolder filterFeature(String name, int startpos, int endpos) {
		RichLocation rl = new SimpleRichLocation(new SimplePosition(startpos),
				new SimplePosition(endpos), 0);
		BioSQLFeatureFilter filter = new BioSQLFeatureFilter.And(
				new BioSQLFeatureFilter.BySequenceName(name),
				new BioSQLFeatureFilter.OverlapsRichLocation(rl));
		return filter(filter);
	}

Unfortunately I received these errors:

Exception in thread "main" java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
	at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.processFeatureFilter(BioSQLRichSequenceDB.java:143)
	at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.filter(BioSQLRichSequenceDB.java:151)
	at org.sequence_viewer.db.HBioSQLDB.filterFeature(HBioSQLDB.java:599)
	at org.sequence_viewer.db.AbfragenTest.main(AbfragenTest.java:56)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.processFeatureFilter(BioSQLRichSequenceDB.java:138)
	... 3 more
Caused by: org.hibernate.PropertyAccessException: Exception occurred inside setter of org.biojavax.bio.seq.SimpleRichFeature.locationSet
	at org.hibernate.property.BasicPropertyAccessor$BasicSetter.set(BasicPropertyAccessor.java:65)
	at org.hibernate.tuple.entity.AbstractEntityTuplizer.setPropertyValues(AbstractEntityTuplizer.java:337)
	at org.hibernate.tuple.entity.PojoEntityTuplizer.setPropertyValues(PojoEntityTuplizer.java:200)
	at org.hibernate.persister.entity.AbstractEntityPersister.setPropertyValues(AbstractEntityPersister.java:3571)
	at org.hibernate.engine.TwoPhaseLoad.initializeEntity(TwoPhaseLoad.java:133)
	at org.hibernate.loader.Loader.initializeEntitiesAndCollections(Loader.java:854)
	at org.hibernate.loader.Loader.doQuery(Loader.java:729)
	at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:236)
	at org.hibernate.loader.Loader.doList(Loader.java:2213)
	at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2104)
	at org.hibernate.loader.Loader.list(Loader.java:2099)
	at org.hibernate.loader.criteria.CriteriaLoader.list(CriteriaLoader.java:94)
	at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1569)
	at org.hibernate.impl.CriteriaImpl.list(CriteriaImpl.java:283)
	... 8 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.hibernate.property.BasicPropertyAccessor$BasicSetter.set(BasicPropertyAccessor.java:42)
	... 21 more
Caused by: java.lang.NullPointerException
	at org.biojavax.bio.seq.PositionResolver$AverageResolver.getMin(PositionResolver.java:103)
	at org.biojavax.bio.seq.SimpleRichLocation.getMin(SimpleRichLocation.java:323)
	at org.biojavax.bio.seq.SimpleRichLocation.overlaps(SimpleRichLocation.java:451)
	at org.biojavax.bio.seq.SimpleRichLocation.union(SimpleRichLocation.java:469)
	at org.biojavax.bio.seq.RichLocation$Tools.merge(RichLocation.java:363)
	at org.biojavax.bio.seq.SimpleRichFeature.setLocationSet(SimpleRichFeature.java:181)
	... 26 more

Why do I get these errors?
BioSQLFeatureFilter.BySequenceName(name) needs a seqName as parameter. 
How can I find out the sequence name? Is it the value "name" in the 
table "Bioentry"? As the built-in subSequence method takes a long time I 
intend to get the subsequence as a String by myself and add the features 
to it. What do you think about this?

I'm grateful for any hints.
cheers,

Gabrielle
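Gabrielle's plan of fetching the sub-sequence as a String and attaching the features herself requires mapping each feature's chromosome coordinates onto the sub-sequence. A minimal sketch under simple assumptions: single contiguous locations, 1-based inclusive coordinates, and features that cross the window edge truncated to fit. The Span type and toWindow method are hypothetical, not BioJava API; real RichFeature locations may be compound or fuzzy, which this ignores.

```java
import java.util.ArrayList;
import java.util.List;

final class WindowFeaturesSketch {

    /** Hypothetical feature span: label plus 1-based inclusive coordinates. */
    static final class Span {
        final String label;
        final int start;
        final int end;

        Span(String label, int start, int end) {
            this.label = label;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return label + ":" + start + ".." + end;
        }
    }

    /**
     * Keeps the spans that overlap the window [winStart, winEnd], clips them to the window
     * and shifts them so that the window's first base becomes position 1.
     */
    static List<Span> toWindow(List<Span> spans, int winStart, int winEnd) {
        List<Span> out = new ArrayList<>();
        for (Span s : spans) {
            if (s.end < winStart || s.start > winEnd) {
                continue; // no overlap with the window
            }
            int clippedStart = Math.max(s.start, winStart);
            int clippedEnd = Math.min(s.end, winEnd);
            out.add(new Span(s.label, clippedStart - winStart + 1, clippedEnd - winStart + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Span> onChromosome = List.of(
                new Span("geneA", 100, 200),
                new Span("geneB", 500, 900),
                new Span("geneC", 1000, 1100));
        System.out.println(toWindow(onChromosome, 150, 600)); // [geneA:1..51, geneB:351..451]
    }
}
```

Clipping is one design choice; keeping partial features unclipped and flagging them as extending beyond the window would preserve more information.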





From holland at eaglegenomics.com  Fri Oct 10 10:30:03 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 10 Oct 2008 15:30:03 +0100
Subject: [Biojava-l] Getting a part of a sequence
In-Reply-To: <48EDF769.8050901@gmx.net>
References: <48EB71A4.70409@gmx.net>
	
	<48EDF769.8050901@gmx.net>
Message-ID: 

This looks like a bug in BJX. I have just committed what I think is a fix
to the head of Subversion. Can you check out the latest source,
compile it, and try your program again?

cheers,
Richard

2008/10/9 Gabrielle Doan 

> Hi Richard,
>
> thanks a lot for your mail. I have successfully retrieved the subsequence
> of a sequence as a String. And now I try to get the features for a
> particular range with following code:
>
> 
>        public FeatureHolder filterFeature(String name, int startpos, int
> endpos) {
>                RichLocation rl = new SimpleRichLocation(new
> SimplePosition(startpos),
>                                new SimplePosition(endpos), 0);
>                BioSQLFeatureFilter filter = new BioSQLFeatureFilter.And(
>                                new
> BioSQLFeatureFilter.BySequenceName(name),
>                                new
> BioSQLFeatureFilter.OverlapsRichLocation(rl));
>                return filter(filter);
>        }
> <\code>
>
> Fortunately I received these errors:
> 
> Exception in thread "main" java.lang.RuntimeException:
> java.lang.reflect.InvocationTargetException
>        at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.processFeatureFilter(BioSQLRichSequenceDB.java:143)
>        at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.filter(BioSQLRichSequenceDB.java:151)
>        at org.sequence_viewer.db.HBioSQLDB.filterFeature(HBioSQLDB.java:599)
>        at org.sequence_viewer.db.AbfragenTest.main(AbfragenTest.java:56)
> Caused by: java.lang.reflect.InvocationTargetException
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.processFeatureFilter(BioSQLRichSequenceDB.java:138)
>        ... 3 more
> Caused by: org.hibernate.PropertyAccessException: Exception occurred inside setter of org.biojavax.bio.seq.SimpleRichFeature.locationSet
>        at org.hibernate.property.BasicPropertyAccessor$BasicSetter.set(BasicPropertyAccessor.java:65)
>        at org.hibernate.tuple.entity.AbstractEntityTuplizer.setPropertyValues(AbstractEntityTuplizer.java:337)
>        at org.hibernate.tuple.entity.PojoEntityTuplizer.setPropertyValues(PojoEntityTuplizer.java:200)
>        at org.hibernate.persister.entity.AbstractEntityPersister.setPropertyValues(AbstractEntityPersister.java:3571)
>        at org.hibernate.engine.TwoPhaseLoad.initializeEntity(TwoPhaseLoad.java:133)
>        at org.hibernate.loader.Loader.initializeEntitiesAndCollections(Loader.java:854)
>        at org.hibernate.loader.Loader.doQuery(Loader.java:729)
>        at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:236)
>        at org.hibernate.loader.Loader.doList(Loader.java:2213)
>        at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2104)
>        at org.hibernate.loader.Loader.list(Loader.java:2099)
>        at org.hibernate.loader.criteria.CriteriaLoader.list(CriteriaLoader.java:94)
>        at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1569)
>        at org.hibernate.impl.CriteriaImpl.list(CriteriaImpl.java:283)
>        ... 8 more
> Caused by: java.lang.reflect.InvocationTargetException
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.hibernate.property.BasicPropertyAccessor$BasicSetter.set(BasicPropertyAccessor.java:42)
>        ... 21 more
> Caused by: java.lang.NullPointerException
>        at org.biojavax.bio.seq.PositionResolver$AverageResolver.getMin(PositionResolver.java:103)
>        at org.biojavax.bio.seq.SimpleRichLocation.getMin(SimpleRichLocation.java:323)
>        at org.biojavax.bio.seq.SimpleRichLocation.overlaps(SimpleRichLocation.java:451)
>        at org.biojavax.bio.seq.SimpleRichLocation.union(SimpleRichLocation.java:469)
>        at org.biojavax.bio.seq.RichLocation$Tools.merge(RichLocation.java:363)
>        at org.biojavax.bio.seq.SimpleRichFeature.setLocationSet(SimpleRichFeature.java:181)
>        ... 26 more
>
> Why do I get these errors?
> BioSQLFeatureFilter.BySequenceName(name) takes a seqName parameter. How
> can I find out the sequence name? Is it the value of "name" in the
> "Bioentry" table? As the built-in subSequence method takes a long time, I
> intend to extract the subsequence as a String myself and add the features
> to it. What do you think about this?
>
> I'm grateful for any hints.
> cheers,
>
> Gabrielle
>
>
>
> Richard Holland wrote:
>
>> Hello.
>>
>> Your code is pretty good already - but you're right, it will load the
>> whole chromosome into memory before you can chop out the interesting
>> bit you actually need.
>>
>> As you observed, by using ThinRichSequence in your query it will load
>> only the initial shell of a sequence object to start with, but the
>> moment you try and sub-sequence it, it will immediately load the whole
>> sequence data into memory in order to perform the operation.
>>
>> If you only want the sequence data, as a string, you can do this by
>> specifying the sequence attribute in the query and bypassing the
>> sequence object entirely:
>>
>>   select rs.stringSequence from Sequence as rs where rs.description like '%hromosome :num%'
>>
>> This will return a String instead of a RichSequence object. You can
>> use HQL functions to take substrings etc. of the string inside the
>> query itself; see section 14.9 of
>> http://docs.huihoo.com/hibernate/hibernate-reference-3.2.1/queryhql.html
>>
>> If you only want the features, you can do this by using the
>> BioSQLFeatureFilter technique. In particular you will want the
>> BySequenceName filter, the And filter, and the OverlapsRichLocation
>> filter. You construct a filter then pass it to the filter() method in
>> BioSQLRichSequenceDB. The database will return to you all the
>> RichFeature objects that match your criteria. Note that it searches
>> the whole database so you really must use a BySequenceName filter at
>> the very least in order to make the results useful!
>>
>> However, you can't use HQL to construct a complete slice of a sequence
>> directly in the database before returning it to the program for use as
>> a ready-made RichSequence object. This would require Hibernate to know
>> what a BioJava sub-sequence object is and how it behaves in relation
>> to an 'unsliced' one, which is beyond the scope of its job as a
>> persistence framework.
>>
>> cheers,
>> Richard
>>
>>
>>
>> 2008/10/7 Gabrielle Doan :
>>
>>> Hi all,
>>> I have a BioSQL database which contains all human chromosomes. My
>>> intention is to get the information about a particular gene. How can I
>>> get a part of a particular chromosome with all associated features? At
>>> the moment I use the following code to create my new sequence:
>>>
>>> 
>>> RichSequence subSeq = RichSequence.Tools.subSequence(parent,
>>>       position[0], position[1], ns, geneName, parent.getAccession(),
>>>       parent.getIdentifier(), parent.getVersion() + 1,
>>>       (Double) (parent.getVersion() + 1.0));
>>>
>>> Here is how I get the parent sequence:
>>> 
>>>       public static RichSequence getChromosome(String chrNo) {
>>>               Transaction tx = session.beginTransaction();
>>>               RichSequence ret = null;
>>>
>>>               String query;
>>>
>>>               try {
>>>                       if (chrNo.equals("MT")) {
>>>                               query = "from BioEntry as be where "
>>>                                       + "be.description like '%:num%'";
>>>                               query = query.replaceAll(":num", "mitochondrion");
>>>                       } else {
>>>                               query = "from BioEntry as be where "
>>>                                       + "be.description like '%hromosome :num%'";
>>>                               query = query.replaceAll(":num", chrNo);
>>>                       }
>>>
>>>                       Query q = session.createQuery(query);
>>>
>>>                       ret = (RichSequence) q.list().get(0);
>>>                       tx.commit();
>>>               } catch (Exception e) {
>>>                       tx.rollback();
>>>                       e.printStackTrace();
>>>               }
>>>               return ret;
>>>       }
>>>
>>> I always have to load the whole chromosome to get a part of it, so it
>>> takes a very long time and I get a lot of unused information (a waste
>>> of memory). I also tried using ThinRichSequence instead of
>>> RichSequence, but I didn't notice any difference.
>>> Can you give me a hint on how to speed up the code?
>>> I am grateful for any hints.
>>>
>>> cheers,
>>> Gabrielle
>>> _______________________________________________
>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>
>>>
>>
>>
>>
>
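Following the approach discussed above (fetch only the sequence string with HQL, then attach features to it yourself), here is a minimal, stdlib-only Java sketch of the coordinate bookkeeping involved. It assumes 1-based, inclusive coordinates for both the chromosome and its features. The class and method names are hypothetical, and the HQL text reuses the `Sequence` entity and the `stringSequence`/`description` properties named in the thread; it further assumes your Hibernate dialect accepts bound parameters inside `substring()`.

```java
/**
 * Hypothetical helpers (not part of BioJava) for working with a chromosome
 * region fetched as a plain String rather than as a RichSequence.
 * All coordinates are assumed to be 1-based and inclusive.
 */
final class RegionSketch {

    private RegionSketch() {}

    /**
     * HQL returning only the requested slice of the stored sequence.
     * HQL's substring() takes (string, 1-based start, length). If the
     * dialect rejects bound parameters here, inline the integers instead.
     */
    static String sliceQuery() {
        return "select substring(rs.stringSequence, :start, :length) "
             + "from Sequence as rs where rs.description like :pattern";
    }

    /** Converts an inclusive 1-based range into {start, length} for substring(). */
    static int[] substringArgs(int start, int end) {
        if (start < 1 || end < start) {
            throw new IllegalArgumentException("expected 1 <= start <= end");
        }
        return new int[] {start, end - start + 1};
    }

    /**
     * Maps a feature [fStart, fEnd] given in chromosome coordinates into the
     * frame of a region [rStart, rEnd], clipping at the region edges.
     * Returns null if the feature lies entirely outside the region.
     */
    static int[] toRegionCoords(int fStart, int fEnd, int rStart, int rEnd) {
        int s = Math.max(fStart, rStart);
        int e = Math.min(fEnd, rEnd);
        if (s > e) {
            return null; // no overlap with the region
        }
        return new int[] {s - rStart + 1, e - rStart + 1};
    }
}
```

With Hibernate 3's `Query` API, the slice could then be fetched by binding `start` and `length` with `setInteger(...)`, `pattern` with `setString(...)`, and calling `uniqueResult()`. If you cut the region out on the Java side instead, note that `String.substring` is 0-based and end-exclusive, so the same region is `seq.substring(start - 1, end)`.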


-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From gabrielle_doan at gmx.net  Tue Oct 14 07:18:20 2008
From: gabrielle_doan at gmx.net (Gabrielle Doan)
Date: Tue, 14 Oct 2008 13:18:20 +0200
Subject: [Biojava-l] Getting a part of a sequence
In-Reply-To: 
References: <48EB71A4.70409@gmx.net>	
		
	<48EDF769.8050901@gmx.net>
	
Message-ID: <48F47FFC.4090607@gmx.net>

Hi Richard,
I have checked out the latest source and tried my code again. It still 
doesn't work, and I get the following new errors:


Exception in thread "main" java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
	at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.processFeatureFilter(BioSQLRichSequenceDB.java:143)
	at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.filter(BioSQLRichSequenceDB.java:151)
	at org.sequence_viewer.db.HBioSQLDB.filterFeature(HBioSQLDB.java:612)
	at org.sequence_viewer.db.AbfragenTest.main(AbfragenTest.java:56)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.processFeatureFilter(BioSQLRichSequenceDB.java:138)
	... 3 more
Caused by: org.hibernate.PropertyAccessException: Exception occurred inside setter of org.biojavax.bio.seq.SimpleRichFeature.locationSet
	at org.hibernate.property.BasicPropertyAccessor$BasicSetter.set(BasicPropertyAccessor.java:65)
	at org.hibernate.tuple.entity.AbstractEntityTuplizer.setPropertyValues(AbstractEntityTuplizer.java:337)
	at org.hibernate.tuple.entity.PojoEntityTuplizer.setPropertyValues(PojoEntityTuplizer.java:200)
	at org.hibernate.persister.entity.AbstractEntityPersister.setPropertyValues(AbstractEntityPersister.java:3571)
	at org.hibernate.engine.TwoPhaseLoad.initializeEntity(TwoPhaseLoad.java:133)
	at org.hibernate.loader.Loader.initializeEntitiesAndCollections(Loader.java:854)
	at org.hibernate.loader.Loader.doQuery(Loader.java:729)
	at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:236)
	at org.hibernate.loader.Loader.doList(Loader.java:2213)
	at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2104)
	at org.hibernate.loader.Loader.list(Loader.java:2099)
	at org.hibernate.loader.criteria.CriteriaLoader.list(CriteriaLoader.java:94)
	at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1569)
	at org.hibernate.impl.CriteriaImpl.list(CriteriaImpl.java:283)
	... 8 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.hibernate.property.BasicPropertyAccessor$BasicSetter.set(BasicPropertyAccessor.java:42)
	... 21 more
Caused by: java.lang.NullPointerException
	at org.biojavax.bio.seq.PositionResolver$AverageResolver.getMin(PositionResolver.java:103)
	at org.biojavax.bio.seq.SimpleRichLocation.getMin(SimpleRichLocation.java:323)
	at org.biojavax.bio.seq.SimpleRichLocation.overlaps(SimpleRichLocation.java:451)
	at org.biojavax.bio.seq.SimpleRichLocation.union(SimpleRichLocation.java:469)
	at org.biojavax.bio.seq.RichLocation$Tools.merge(RichLocation.java:363)
	at org.biojavax.bio.seq.SimpleRichFeature.setLocationSet(SimpleRichFeature.java:181)
	... 25 more

I think BioSQLFeatureFilter.OverlapsRichLocation(rl) is causing the
problem. Can you help me solve it?
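For reference, the overlap condition such a filter has to express is short: two closed intervals overlap when each one starts no later than the other ends. Note, though, that in the trace above the exception is not raised while a filter is evaluated; it comes from `SimpleRichLocation.getMin()` inside `SimpleRichFeature.setLocationSet()`, while Hibernate populates the features returned by the query (the trace passes through `Loader.doQuery` and `TwoPhaseLoad.initializeEntity`). The sketch below is a hypothetical, stdlib-only illustration of the overlap test assuming 1-based inclusive coordinates; it is not the BioJavaX implementation of `OverlapsRichLocation`.

```java
/**
 * Hypothetical illustration (not BioJavaX code): overlap test for two closed,
 * 1-based intervals [aMin, aMax] and [bMin, bMax].
 */
final class OverlapSketch {

    private OverlapSketch() {}

    static boolean overlaps(int aMin, int aMax, int bMin, int bMax) {
        if (aMin > aMax || bMin > bMax) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        // Overlap iff neither interval ends before the other begins.
        return aMin <= bMax && bMin <= aMax;
    }
}
```

Under this inclusive convention, touching intervals such as [1, 10] and [10, 20] count as overlapping; whether the BioSQL filter treats boundaries the same way is an assumption worth checking.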

I'm grateful for any hints.
cheers,

Gabrielle



Richard Holland wrote:
> This looks like a bug in BJX (BioJavaX). I have just committed what I
> think is a fix to the head of Subversion. Can you check out the latest
> source, compile it, and try your program again?
> 
> cheers,
> Richard
> 
> 2008/10/9 Gabrielle Doan 
> 
>> Hi Richard,
>>
>> thanks a lot for your mail. I have successfully retrieved the subsequence
>> of a sequence as a String. Now I am trying to get the features for a
>> particular range with the following code:
>>
>> 
>>        public FeatureHolder filterFeature(String name, int startpos, int endpos) {
>>                RichLocation rl = new SimpleRichLocation(new SimplePosition(startpos),
>>                                new SimplePosition(endpos), 0);
>>                BioSQLFeatureFilter filter = new BioSQLFeatureFilter.And(
>>                                new BioSQLFeatureFilter.BySequenceName(name),
>>                                new BioSQLFeatureFilter.OverlapsRichLocation(rl));
>>                return filter(filter);
>>        }
>>
>> Unfortunately, I got these errors:
>> 
>> Exception in thread "main" java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
>>        at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.processFeatureFilter(BioSQLRichSequenceDB.java:143)
>>        at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.filter(BioSQLRichSequenceDB.java:151)
>>        at org.sequence_viewer.db.HBioSQLDB.filterFeature(HBioSQLDB.java:599)
>>        at org.sequence_viewer.db.AbfragenTest.main(AbfragenTest.java:56)
>> Caused by: java.lang.reflect.InvocationTargetException
>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>        at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.processFeatureFilter(BioSQLRichSequenceDB.java:138)
>>        ... 3 more
>> Caused by: org.hibernate.PropertyAccessException: Exception occurred inside setter of org.biojavax.bio.seq.SimpleRichFeature.locationSet
>>        at org.hibernate.property.BasicPropertyAccessor$BasicSetter.set(BasicPropertyAccessor.java:65)
>>        at org.hibernate.tuple.entity.AbstractEntityTuplizer.setPropertyValues(AbstractEntityTuplizer.java:337)
>>        at org.hibernate.tuple.entity.PojoEntityTuplizer.setPropertyValues(PojoEntityTuplizer.java:200)
>>        at org.hibernate.persister.entity.AbstractEntityPersister.setPropertyValues(AbstractEntityPersister.java:3571)
>>        at org.hibernate.engine.TwoPhaseLoad.initializeEntity(TwoPhaseLoad.java:133)
>>        at org.hibernate.loader.Loader.initializeEntitiesAndCollections(Loader.java:854)
>>        at org.hibernate.loader.Loader.doQuery(Loader.java:729)
>>        at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:236)
>>        at org.hibernate.loader.Loader.doList(Loader.java:2213)
>>        at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2104)
>>        at org.hibernate.loader.Loader.list(Loader.java:2099)
>>        at org.hibernate.loader.criteria.CriteriaLoader.list(CriteriaLoader.java:94)
>>        at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1569)
>>        at org.hibernate.impl.CriteriaImpl.list(CriteriaImpl.java:283)
>>        ... 8 more
>> Caused by: java.lang.reflect.InvocationTargetException
>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>        at org.hibernate.property.BasicPropertyAccessor$BasicSetter.set(BasicPropertyAccessor.java:42)
>>        ... 21 more
>> Caused by: java.lang.NullPointerException
>>        at org.biojavax.bio.seq.PositionResolver$AverageResolver.getMin(PositionResolver.java:103)
>>        at org.biojavax.bio.seq.SimpleRichLocation.getMin(SimpleRichLocation.java:323)
>>        at org.biojavax.bio.seq.SimpleRichLocation.overlaps(SimpleRichLocation.java:451)
>>        at org.biojavax.bio.seq.SimpleRichLocation.union(SimpleRichLocation.java:469)
>>        at org.biojavax.bio.seq.RichLocation$Tools.merge(RichLocation.java:363)
>>        at org.biojavax.bio.seq.SimpleRichFeature.setLocationSet(SimpleRichFeature.java:181)
>>        ... 26 more
>>
>> Why do I get these errors?
>> BioSQLFeatureFilter.BySequenceName(name) takes a seqName parameter. How
>> can I find out the sequence name? Is it the value of "name" in the
>> "Bioentry" table? As the built-in subSequence method takes a long time, I
>> intend to extract the subsequence as a String myself and add the features
>> to it. What do you think about this?
>>
>> I'm grateful for any hints.
>> cheers,
>>
>> Gabrielle
>>
>>
>>
>> Richard Holland wrote:
>>
>>> Hello.
>>> Your code is pretty good already - but you're right, it will load the
>>> whole chromosome into memory before you can chop out the interesting
>>> bit you actually need.
>>>
>>> As you observed, by using ThinRichSequence in your query it will load
>>> only the initial shell of a sequence object to start with, but the
>>> moment you try and sub-sequence it, it will immediately load the whole
>>> sequence data into memory in order to perform the operation.
>>>
>>> If you only want the sequence data, as a string, you can do this by
>>> specifying the sequence attribute in the query and bypassing the
>>> sequence object entirely:
>>>
>>>   select rs.stringSequence from Sequence as rs where rs.description like '%hromosome :num%'
>>>
>>> This will return a String instead of a RichSequence object. You can
>>> use HQL functions to take substrings etc. of the string inside the
>>> query itself; see section 14.9 of
>>> http://docs.huihoo.com/hibernate/hibernate-reference-3.2.1/queryhql.html
>>>
>>> If you only want the features, you can do this by using the
>>> BioSQLFeatureFilter technique. In particular you will want the
>>> BySequenceName filter, the And filter, and the OverlapsRichLocation
>>> filter. You construct a filter then pass it to the filter() method in
>>> BioSQLRichSequenceDB. The database will return to you all the
>>> RichFeature objects that match your criteria. Note that it searches
>>> the whole database so you really must use a BySequenceName filter at
>>> the very least in order to make the results useful!
>>>
>>> However, you can't use HQL to construct a complete slice of a sequence
>>> directly in the database before returning it to the program for use as
>>> a ready-made RichSequence object. This would require Hibernate to know
>>> what a BioJava sub-sequence object is and how it behaves in relation
>>> to an 'unsliced' one, which is beyond the scope of its job as a
>>> persistence framework.
>>>
>>> cheers,
>>> Richard
>>>
>>>
>>>
>>> 2008/10/7 Gabrielle Doan :
>>>
>>>> Hi all,
>>>> I have a BioSQL database which contains all human chromosomes. My
>>>> intention is to get the information about a particular gene. How can I
>>>> get a part of a particular chromosome with all associated features? At
>>>> the moment I use the following code to create my new sequence:
>>>>
>>>> 
>>>> RichSequence subSeq = RichSequence.Tools.subSequence(parent,
>>>>       position[0], position[1], ns, geneName, parent.getAccession(),
>>>>       parent.getIdentifier(), parent.getVersion() + 1,
>>>>       (Double) (parent.getVersion() + 1.0));
>>>>
>>>> Here is how I get the parent sequence:
>>>> 
>>>>       public static RichSequence getChromosome(String chrNo) {
>>>>               Transaction tx = session.beginTransaction();
>>>>               RichSequence ret = null;
>>>>
>>>>               String query;
>>>>
>>>>               try {
>>>>                       if (chrNo.equals("MT")) {
>>>>                               query = "from BioEntry as be where "
>>>>                                       + "be.description like '%:num%'";
>>>>                               query = query.replaceAll(":num", "mitochondrion");
>>>>                       } else {
>>>>                               query = "from BioEntry as be where "
>>>>                                       + "be.description like '%hromosome :num%'";
>>>>                               query = query.replaceAll(":num", chrNo);
>>>>                       }
>>>>
>>>>                       Query q = session.createQuery(query);
>>>>
>>>>                       ret = (RichSequence) q.list().get(0);
>>>>                       tx.commit();
>>>>               } catch (Exception e) {
>>>>                       tx.rollback();
>>>>                       e.printStackTrace();
>>>>               }
>>>>               return ret;
>>>>       }
>>>>
>>>> I always have to load the whole chromosome to get a part of it, so it
>>>> takes a very long time and I get a lot of unused information (a waste
>>>> of memory). I also tried using ThinRichSequence instead of
>>>> RichSequence, but I didn't notice any difference.
>>>> Can you give me a hint on how to speed up the code?
>>>> I am grateful for any hints.
>>>>
>>>> cheers,
>>>> Gabrielle
>>>>
>>>>
>>>
>>>
> 
> 


From holland at eaglegenomics.com  Tue Oct 14 11:23:10 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 14 Oct 2008 16:23:10 +0100
Subject: [Biojava-l] Getting a part of a sequence
In-Reply-To: <48F47FFC.4090607@gmx.net>
References: <48EB71A4.70409@gmx.net>
	
	<48EDF769.8050901@gmx.net>
	
	<48F47FFC.4090607@gmx.net>
Message-ID: 

Something's broken! From your stack trace I can at least see exactly what's
going on. The set of locations is being loaded for the feature, but
Hibernate is not calling the setMin()/setMax() methods on each location
before inserting it into the set.

The locations are therefore added to the feature's set of locations with
null min and max values. Whenever these locations are used, for instance
when the feature's location setter merges them, or anywhere else, you'll
get NullPointerExceptions.

This is despite the fact that the HBM XML files are explicitly telling it
_not_ to lazy-load them. Also this only happens when loading Features, and
not when loading Sequence objects.

I honestly don't know!
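The failure mode described above can be mimicked outside Hibernate. In this hypothetical, stdlib-only sketch a location's bounds are boxed `Integer`s standing in for BioJavaX `Position` objects, and `null` means the setter was never called. Merging such locations naively throws a NullPointerException on unboxing; the checked variant looks for unpopulated bounds first and reports them. It is a debugging aid under those assumptions, not a fix for the mapping problem.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the reported failure: locations whose min/max were
 * never populated (null) are later merged, which throws NullPointerException.
 */
final class NullBoundsSketch {

    private NullBoundsSketch() {}

    /** Stand-in for a loaded location; null means "setter never called". */
    static final class Loc {
        final Integer min;
        final Integer max;

        Loc(Integer min, Integer max) {
            this.min = min;
            this.max = max;
        }
    }

    /** Naive merge into one spanning location; unboxing a null bound throws NPE. */
    static Loc naiveSpan(List<Loc> locs) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (Loc l : locs) {
            min = Math.min(min, l.min); // NPE here if l.min is null
            max = Math.max(max, l.max);
        }
        return new Loc(min, max);
    }

    /** Returns the locations whose bounds were never populated. */
    static List<Loc> unpopulated(List<Loc> locs) {
        List<Loc> bad = new ArrayList<>();
        for (Loc l : locs) {
            if (l.min == null || l.max == null) {
                bad.add(l);
            }
        }
        return bad;
    }

    /** Same merge, but fails fast with a descriptive message. */
    static Loc checkedSpan(List<Loc> locs) {
        if (locs.isEmpty()) {
            throw new IllegalArgumentException("no locations to merge");
        }
        int missing = unpopulated(locs).size();
        if (missing > 0) {
            throw new IllegalStateException(
                missing + " location(s) were loaded without min/max values");
        }
        return naiveSpan(locs);
    }
}
```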

What I suggest is that you create a temporary database with only one record
in it and run your test program against that to see what happens. If it
still breaks, file a bug in Bugzilla and attach a GenBank dump of that
database along with your program code and the full stack trace. Someone
with a bit more Hibernate knowledge than me might then be able to help out.

cheers,
Richard


2008/10/14 Gabrielle Doan 

> Hi Richard,
> I have checked out the latest source and tried my code again. It still
> doesn't work, and I get the following new errors:
>
> 
> Exception in thread "main" java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
>        at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.processFeatureFilter(BioSQLRichSequenceDB.java:143)
>        at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.filter(BioSQLRichSequenceDB.java:151)
>        at org.sequence_viewer.db.HBioSQLDB.filterFeature(HBioSQLDB.java:612)
>        at org.sequence_viewer.db.AbfragenTest.main(AbfragenTest.java:56)
> Caused by: java.lang.reflect.InvocationTargetException
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.processFeatureFilter(BioSQLRichSequenceDB.java:138)
>        ... 3 more
> Caused by: org.hibernate.PropertyAccessException: Exception occurred inside setter of org.biojavax.bio.seq.SimpleRichFeature.locationSet
>        at org.hibernate.property.BasicPropertyAccessor$BasicSetter.set(BasicPropertyAccessor.java:65)
>        at org.hibernate.tuple.entity.AbstractEntityTuplizer.setPropertyValues(AbstractEntityTuplizer.java:337)
>        at org.hibernate.tuple.entity.PojoEntityTuplizer.setPropertyValues(PojoEntityTuplizer.java:200)
>        at org.hibernate.persister.entity.AbstractEntityPersister.setPropertyValues(AbstractEntityPersister.java:3571)
>        at org.hibernate.engine.TwoPhaseLoad.initializeEntity(TwoPhaseLoad.java:133)
>        at org.hibernate.loader.Loader.initializeEntitiesAndCollections(Loader.java:854)
>        at org.hibernate.loader.Loader.doQuery(Loader.java:729)
>        at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:236)
>        at org.hibernate.loader.Loader.doList(Loader.java:2213)
>        at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2104)
>        at org.hibernate.loader.Loader.list(Loader.java:2099)
>        at org.hibernate.loader.criteria.CriteriaLoader.list(CriteriaLoader.java:94)
>        at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1569)
>        at org.hibernate.impl.CriteriaImpl.list(CriteriaImpl.java:283)
>        ... 8 more
> Caused by: java.lang.reflect.InvocationTargetException
>        at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.hibernate.property.BasicPropertyAccessor$BasicSetter.set(BasicPropertyAccessor.java:42)
>        ... 21 more
> Caused by: java.lang.NullPointerException
>        at org.biojavax.bio.seq.PositionResolver$AverageResolver.getMin(PositionResolver.java:103)
>        at org.biojavax.bio.seq.SimpleRichLocation.getMin(SimpleRichLocation.java:323)
>        at org.biojavax.bio.seq.SimpleRichLocation.overlaps(SimpleRichLocation.java:451)
>        at org.biojavax.bio.seq.SimpleRichLocation.union(SimpleRichLocation.java:469)
>        at org.biojavax.bio.seq.RichLocation$Tools.merge(RichLocation.java:363)
>        at org.biojavax.bio.seq.SimpleRichFeature.setLocationSet(SimpleRichFeature.java:181)
>        ... 25 more
>
> I think BioSQLFeatureFilter.OverlapsRichLocation(rl) is causing the
> problem. Can you help me solve it?
>
> I'm grateful for any hints.
> cheers,
>
> Gabrielle
>
>
>
> Richard Holland wrote:
>
>> This looks like a bug in BJX (BioJavaX). I have just committed what I
>> think is a fix to the head of Subversion. Can you check out the latest
>> source, compile it, and try your program again?
>>
>> cheers,
>> Richard
>>
>> 2008/10/9 Gabrielle Doan 
>>
>>> Hi Richard,
>>>
>>> thanks a lot for your mail. I have successfully retrieved the subsequence
>>> of a sequence as a String. Now I am trying to get the features for a
>>> particular range with the following code:
>>>
>>> 
>>>       public FeatureHolder filterFeature(String name, int startpos, int endpos) {
>>>               RichLocation rl = new SimpleRichLocation(new SimplePosition(startpos),
>>>                               new SimplePosition(endpos), 0);
>>>               BioSQLFeatureFilter filter = new BioSQLFeatureFilter.And(
>>>                               new BioSQLFeatureFilter.BySequenceName(name),
>>>                               new BioSQLFeatureFilter.OverlapsRichLocation(rl));
>>>               return filter(filter);
>>>       }
>>>
>>> Unfortunately, I got these errors:
>>> 
>>> Exception in thread "main" java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
>>>       at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.processFeatureFilter(BioSQLRichSequenceDB.java:143)
>>>       at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.filter(BioSQLRichSequenceDB.java:151)
>>>       at org.sequence_viewer.db.HBioSQLDB.filterFeature(HBioSQLDB.java:599)
>>>       at org.sequence_viewer.db.AbfragenTest.main(AbfragenTest.java:56)
>>> Caused by: java.lang.reflect.InvocationTargetException
>>>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>       at java.lang.reflect.Method.invoke(Method.java:597)
>>>       at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.processFeatureFilter(BioSQLRichSequenceDB.java:138)
>>>       ... 3 more
>>> Caused by: org.hibernate.PropertyAccessException: Exception occurred inside setter of org.biojavax.bio.seq.SimpleRichFeature.locationSet
>>>       at org.hibernate.property.BasicPropertyAccessor$BasicSetter.set(BasicPropertyAccessor.java:65)
>>>       at org.hibernate.tuple.entity.AbstractEntityTuplizer.setPropertyValues(AbstractEntityTuplizer.java:337)
>>>       at org.hibernate.tuple.entity.PojoEntityTuplizer.setPropertyValues(PojoEntityTuplizer.java:200)
>>>       at org.hibernate.persister.entity.AbstractEntityPersister.setPropertyValues(AbstractEntityPersister.java:3571)
>>>       at org.hibernate.engine.TwoPhaseLoad.initializeEntity(TwoPhaseLoad.java:133)
>>>       at org.hibernate.loader.Loader.initializeEntitiesAndCollections(Loader.java:854)
>>>       at org.hibernate.loader.Loader.doQuery(Loader.java:729)
>>>       at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:236)
>>>       at org.hibernate.loader.Loader.doList(Loader.java:2213)
>>>       at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2104)
>>>       at org.hibernate.loader.Loader.list(Loader.java:2099)
>>>       at org.hibernate.loader.criteria.CriteriaLoader.list(CriteriaLoader.java:94)
>>>       at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1569)
>>>       at org.hibernate.impl.CriteriaImpl.list(CriteriaImpl.java:283)
>>>       ... 8 more
>>> Caused by: java.lang.reflect.InvocationTargetException
>>>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>       at java.lang.reflect.Method.invoke(Method.java:597)
>>>       at org.hibernate.property.BasicPropertyAccessor$BasicSetter.set(BasicPropertyAccessor.java:42)
>>>       ... 21 more
>>> Caused by: java.lang.NullPointerException
>>>       at org.biojavax.bio.seq.PositionResolver$AverageResolver.getMin(PositionResolver.java:103)
>>>       at org.biojavax.bio.seq.SimpleRichLocation.getMin(SimpleRichLocation.java:323)
>>>       at org.biojavax.bio.seq.SimpleRichLocation.overlaps(SimpleRichLocation.java:451)
>>>       at org.biojavax.bio.seq.SimpleRichLocation.union(SimpleRichLocation.java:469)
>>>       at org.biojavax.bio.seq.RichLocation$Tools.merge(RichLocation.java:363)
>>>       at org.biojavax.bio.seq.SimpleRichFeature.setLocationSet(SimpleRichFeature.java:181)
>>>       ... 26 more
>>>
>>> Why do I get these errors?
>>> BioSQLFeatureFilter.BySequenceName(name) takes a seqName parameter. How
>>> can I find out the sequence name? Is it the value of "name" in the
>>> "Bioentry" table? As the built-in subSequence method takes a long time, I
>>> intend to extract the subsequence as a String myself and add the features
>>> to it. What do you think about this?
>>>
>>> I'm grateful for any hints.
>>> cheers,
>>>
>>> Gabrielle
>>>
>>>
>>>
>>> Richard Holland wrote:
>>>
>>>> Hello.
>>>
>>>> Your code is pretty good already - but you're right, it will load the
>>>> whole chromosome into memory before you can chop out the interesting
>>>> bit you actually need.
>>>>
>>>> As you observed, by using ThinRichSequence in your query it will load
>>>> only the initial shell of a sequence object to start with, but the
>>>> moment you try and sub-sequence it, it will immediately load the whole
>>>> sequence data into memory in order to perform the operation.
>>>>
>>>> If you only want the sequence data, as a string, you can do this by
>>>> specifying the sequence attribute in the query and bypassing the
>>>> sequence object entirely:
>>>>
>>>>   select rs.stringSequence from Sequence as rs where rs.description like '%hromosome :num%'
>>>>
>>>> This will return a String instead of a RichSequence object. You can
>>>> use HQL functions to take substrings etc. of the string inside the
>>>> query itself; see section 14.9 of
>>>> http://docs.huihoo.com/hibernate/hibernate-reference-3.2.1/queryhql.html
>>>>
>>>> If you only want the features, you can do this by using the
>>>> BioSQLFeatureFilter technique. In particular you will want the
>>>> BySequenceName filter, the And filter, and the OverlapsRichLocation
>>>> filter. You construct a filter then pass it to the filter() method in
>>>> BioSQLRichSequenceDB. The database will return to you all the
>>>> RichFeature objects that match your criteria. Note that it searches
>>>> the whole database so you really must use a BySequenceName filter at
>>>> the very least in order to make the results useful!
>>>>
>>>> However, you can't use HQL to construct a complete slice of a sequence
>>>> directly in the database before returning it to the program for use as
>>>> a ready-made RichSequence object. This would require Hibernate to know
>>>> what a BioJava sub-sequence object is and how it behaves in relation
>>>> to an 'unsliced' one, which is beyond the scope of its job as a
>>>> persistence framework.
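Since the slice has to be assembled in the program, the plan of taking the substring and then attaching the features mostly comes down to coordinate arithmetic. A minimal sketch, assuming inclusive 1-based coordinates on the parent and that features hanging over the slice edges are clipped; `SliceSketch`, `slice` and `remap` are hypothetical names.

```java
/** Hypothetical helpers for cutting a slice out of a parent sequence string. */
public class SliceSketch {

    /** Residues of the inclusive 1-based range [start, end] of the parent. */
    static String slice(String parent, int start, int end) {
        // String.substring is 0-based and end-exclusive.
        return parent.substring(start - 1, end);
    }

    /**
     * Maps a feature's inclusive 1-based parent range onto slice coordinates,
     * clipping it to the slice; returns null if it lies entirely outside.
     */
    static int[] remap(int featStart, int featEnd, int sliceStart, int sliceEnd) {
        if (featEnd < sliceStart || featStart > sliceEnd) {
            return null;
        }
        int s = Math.max(featStart, sliceStart) - sliceStart + 1;
        int e = Math.min(featEnd, sliceEnd) - sliceStart + 1;
        return new int[] {s, e};
    }

    public static void main(String[] args) {
        String parent = "ACGTACGTAC";            // positions 1..10
        System.out.println(slice(parent, 3, 7)); // GTACG
        int[] r = remap(5, 12, 3, 7);
        System.out.println(r[0] + ".." + r[1]);  // 3..5
    }
}
```

Whether clipped features should be kept, dropped or flagged as partial is a design choice left to the caller.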
>>>>
>>>> cheers,
>>>> Richard
>>>>
>>>>
>>>>
>>>> 2008/10/7 Gabrielle Doan :
>>>>
>>>>> Hi all,
>>>>> I have a BioSQL database which contains all human chromosomes. My
>>>>> intention is to get the information about a particular gene. How can I
>>>>> get a part of a particular chromosome with all its associated features?
>>>>> At the moment I use the following code to create my new sequence:
>>>>>
>>>>> 
>>>>> RichSequence subSeq = RichSequence.Tools.subSequence(parent,
>>>>>      position[0], position[1], ns, geneName, parent.getAccession(),
>>>>>      parent.getIdentifier(), parent.getVersion() + 1,
>>>>>      (Double) (parent.getVersion() + 1.0));
>>>>>
>>>>> Here is how I get the parent sequence:
>>>>> 
>>>>>      public static RichSequence getChromosome(String chrNo) {
>>>>>              Transaction tx = session.beginTransaction();
>>>>>              RichSequence ret = null;
>>>>>
>>>>>              String query;
>>>>>
>>>>>              try {
>>>>>                      if (chrNo.equals("MT")) {
>>>>>                              query = "from BioEntry as be where be.description like '%:num%'";
>>>>>                              query = query.replaceAll(":num", "mitochondrion");
>>>>>                      } else {
>>>>>                              query = "from BioEntry as be where be.description like '%hromosome :num%'";
>>>>>                              query = query.replaceAll(":num", chrNo);
>>>>>                      }
>>>>>
>>>>>                      Query q = session.createQuery(query);
>>>>>
>>>>>                      ret = (RichSequence) q.list().get(0);
>>>>>                      tx.commit();
>>>>>              } catch (Exception e) {
>>>>>                      tx.rollback();
>>>>>                      e.printStackTrace();
>>>>>              }
>>>>>              return ret;
>>>>>      }
>>>>>
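One thing worth checking in the query above: `LIKE '%hromosome 1%'` also matches descriptions containing "chromosome 10" to "chromosome 19", so `q.list().get(0)` is not guaranteed to return chromosome 1. A small stdlib emulation of `LIKE` (only the `%` wildcard, case-sensitive, unlike some database collations) shows the effect. The description strings are made-up examples, and the tighter pattern assumes descriptions of the form "... chromosome N, ...".

```java
import java.util.regex.Pattern;

/** Minimal emulation of SQL LIKE supporting only the % wildcard (no _ or escapes). */
public class LikeSketch {

    static boolean like(String value, String pattern) {
        String[] parts = pattern.split("%", -1); // -1 keeps leading/trailing empty parts
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");                // each % matches any run of characters
            }
            regex.append(Pattern.quote(parts[i])); // everything else is literal
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(value).matches();
    }

    public static void main(String[] args) {
        String chr1 = "Homo sapiens chromosome 1, complete sequence";   // made-up example
        String chr10 = "Homo sapiens chromosome 10, complete sequence"; // made-up example
        System.out.println(like(chr1, "%hromosome 1%"));   // true
        System.out.println(like(chr10, "%hromosome 1%"));  // true: chromosome 10 matches too
        System.out.println(like(chr10, "%hromosome 1,%")); // false
    }
}
```

Binding the pattern as a named parameter (for example `like :pattern` with `q.setString("pattern", ...)` in Hibernate 3) would also avoid assembling the HQL with `replaceAll`.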
>>>>>
>>>>> I always have to load the whole chromosome to get a part of it, so it
>>>>> takes a very long time and I get a lot of unused information (a waste
>>>>> of memory). I also tried to use ThinRichSequence instead of
>>>>> RichSequence, but I didn't notice any difference.
>>>>> Can you give me a hint on how to speed up the code?
>>>>> I am grateful for any hints.
>>>>>
>>>>> cheers,
>>>>> Gabrielle
>>>>> _______________________________________________
>>>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>
>>
>


-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From charles at imbusch.net  Tue Oct 14 17:03:04 2008
From: charles at imbusch.net (Charles Imbusch)
Date: Tue, 14 Oct 2008 23:03:04 +0200
Subject: [Biojava-l] parsing tblastn results
Message-ID: <48F50908.5060307@imbusch.net>

Hello,

For a project I want to parse a tblastn result with BioJava. I used the code
at http://biojava.org/wiki/BioJava:CookBook:Blast:Parser as is, and I get
the following error message:

Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -3
    at java.lang.String.substring(String.java:1938)
    at java.lang.String.substring(String.java:1905)
    at org.biojava.bio.program.sax.BlastLikeAlignmentSAXParser.parseLine(BlastLikeAlignmentSAXParser.java:289)
    at org.biojava.bio.program.sax.BlastLikeAlignmentSAXParser.parse(BlastLikeAlignmentSAXParser.java:115)
    at org.biojava.bio.program.sax.HitSectionSAXParser.outputHSPInfo(HitSectionSAXParser.java:514)
    at org.biojava.bio.program.sax.HitSectionSAXParser.firstHSPEvent(HitSectionSAXParser.java:287)
    at org.biojava.bio.program.sax.HitSectionSAXParser.interpret(HitSectionSAXParser.java:251)
    at org.biojava.bio.program.sax.HitSectionSAXParser.parse(HitSectionSAXParser.java:118)
    at org.biojava.bio.program.sax.BlastSAXParser.hitsSectionReached(BlastSAXParser.java:635)
    at org.biojava.bio.program.sax.BlastSAXParser.interpret(BlastSAXParser.java:337)
    at org.biojava.bio.program.sax.BlastSAXParser.parse(BlastSAXParser.java:164)
    at org.biojava.bio.program.sax.BlastLikeSAXParser.onNewDataSet(BlastLikeSAXParser.java:313)
    at org.biojava.bio.program.sax.BlastLikeSAXParser.interpret(BlastLikeSAXParser.java:276)
    at org.biojava.bio.program.sax.BlastLikeSAXParser.parse(BlastLikeSAXParser.java:162)
    at BlastEcho.echo(BlastEcho.java:29)
    at BlastEcho.main(BlastEcho.java:75)

I uploaded the Blast output file I want to parse here:
http://charles.imbusch.net/tmp/blastresult.txt

Any answer is appreciated.

Cheers,
  Charles

From ayates at ebi.ac.uk  Wed Oct 15 04:07:35 2008
From: ayates at ebi.ac.uk (Andy Yates)
Date: Wed, 15 Oct 2008 09:07:35 +0100
Subject: [Biojava-l] ANN: EBI Course - Programmatic access in Java:
 webservices & work flows
Message-ID: <48F5A4C7.7010304@ebi.ac.uk>

Hi everyone,

Posting this here as it may be of interest to some people.

The EBI is holding a course on accessing a large number of its resources
from Java programs. The course will run from 24th to 27th November and
will be held on-site at the Hinxton Genome Campus. Resources covered
will include:

* Ontology Lookup Service - offers access to multiple ontologies through
  a common interface
* PICR - a tool for mapping between protein identifier spaces
* UniProt
* IntAct
* ChEBI
* BioMart
* Integr8
* CiteXplore
* And many many more :)

If you are interested in any of these resources, then please go to
http://www.ebi.ac.uk/training/handson/course_081124_javawebservices.html

The course will cost £75 for the 3 days.

All the best,

Andy Yates

From holland at eaglegenomics.com  Wed Oct 15 04:13:18 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 15 Oct 2008 09:13:18 +0100
Subject: [Biojava-l] parsing tblastn results
In-Reply-To: <48F50908.5060307@imbusch.net>
References: <48F50908.5060307@imbusch.net>
Message-ID: 

I've raised a bug report for you. Hopefully someone will take a look at it
soon:

http://bugzilla.open-bio.org/show_bug.cgi?id=2617

cheers,
Richard





From dtoomey at rcsi.ie  Wed Oct 15 05:46:58 2008
From: dtoomey at rcsi.ie (David Toomey)
Date: Wed, 15 Oct 2008 10:46:58 +0100
Subject: [Biojava-l] parsing tblastn results
References: <48F50908.5060307@imbusch.net>
	
Message-ID: 

Hi Richard

This looks suspiciously like a bug I raised a couple of weeks ago. I was
parsing blastp results but the stack trace is the same.

http://bugzilla.open-bio.org/show_bug.cgi?id=2603

Charles, I have updated the original bug with a hack which at least allows
you to parse the result and get some output. You just need to recompile the
source code with the modified BlastLikeAlignmentSAXParser.java. It's not ideal,
but at least you will be able to run your code until the source is fixed.
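Earlier in this thread (bug 2603), converting line endings with dos2unix or similar was suggested as a first check, since mixed Windows/Unix line breaks can confuse a line-based parser. A minimal stdlib sketch of that preprocessing step in Java, assuming the whole BLAST report fits in memory. It is not the hack attached to the bug report, and it will not help if the failure is caused by something other than line endings.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** dos2unix-style preprocessing before handing a report to a line-based parser. */
public class NormalizeLineEndings {

    /** Converts Windows (CRLF) and old Mac (CR) line endings to Unix (LF). */
    static String toUnix(String text) {
        // Replace CRLF first so the CR of each pair does not become an extra LF.
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: NormalizeLineEndings <in> <out>");
            return;
        }
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);
        // ISO-8859-1 maps every byte to one char, so other bytes round-trip unchanged.
        String text = new String(Files.readAllBytes(in), StandardCharsets.ISO_8859_1);
        Files.write(out, toUnix(text).getBytes(StandardCharsets.ISO_8859_1));
    }
}
```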

Cheers

Dave




From gabrielle_doan at gmx.net  Wed Oct 15 09:15:39 2008
From: gabrielle_doan at gmx.net (Gabrielle Doan)
Date: Wed, 15 Oct 2008 15:15:39 +0200
Subject: [Biojava-l] Getting a part of a sequence
In-Reply-To: <381a3e850810142152p4e0a0c2ds80a74570b44f2be0@mail.gmail.com>
References: <48EB71A4.70409@gmx.net>	
		
	<48EDF769.8050901@gmx.net>	
		
	<48F47FFC.4090607@gmx.net>	
	<381a3e850810140928p4af06cf4r3dfd08908efd42f6@mail.gmail.com>	
	<48F4C99E.6070007@gmx.net>
	<381a3e850810142152p4e0a0c2ds80a74570b44f2be0@mail.gmail.com>
Message-ID: <48F5ECFB.6040703@gmx.net>

Hi Augusto,

I've inserted your files into BJX. Unfortunately it hasn't solved my
problems. Maybe Richard has another idea of how to handle it.

Best regards,
Gabrielle




Augusto Fernandes Vellozo wrote:
> Hi Gabrielle,
> Please let me know whether the results are OK or not.
> I remember that when I made the corrections, I didn't handle the
> circularLength case, because it doesn't matter for my use case and because
> I don't understand exactly what it is. Take care if you have this
> use case.
> 
> Cheers,
> 
> Augusto
> 
> 2008/10/14 Gabrielle Doan :
>> Hi Augusto,
>>
>> thank you so much. I hope this will be the solution to my problem.
>>
>> cheers,
>> Gabrielle
>>
>> Augusto Fernandes Vellozo wrote:
>>> Hi Gabrielle,
>>> I had some problems with the Location class and I modified some
>>> classes on my machine. I've already written to Richard.
>>> The modified classes are attached.
>>> These could help you.
>>>
>>> Good luck,
>>>
>>> Augusto
>>>
>>> 2008/10/14 Gabrielle Doan :
>>>> Hi Richard,
>>>> I have checked out the latest source and tried my code again. It still
>>>> didn't work, and I received the following new errors:
>>>>
>>>> 
>>>> Exception in thread "main" java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
>>>>       at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.processFeatureFilter(BioSQLRichSequenceDB.java:143)
>>>>       at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.filter(BioSQLRichSequenceDB.java:151)
>>>>       at org.sequence_viewer.db.HBioSQLDB.filterFeature(HBioSQLDB.java:612)
>>>>       at org.sequence_viewer.db.AbfragenTest.main(AbfragenTest.java:56)
>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>       at java.lang.reflect.Method.invoke(Method.java:597)
>>>>       at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.processFeatureFilter(BioSQLRichSequenceDB.java:138)
>>>>       ... 3 more
>>>> Caused by: org.hibernate.PropertyAccessException: Exception occurred inside setter of org.biojavax.bio.seq.SimpleRichFeature.locationSet
>>>>       at org.hibernate.property.BasicPropertyAccessor$BasicSetter.set(BasicPropertyAccessor.java:65)
>>>>       at org.hibernate.tuple.entity.AbstractEntityTuplizer.setPropertyValues(AbstractEntityTuplizer.java:337)
>>>>       at org.hibernate.tuple.entity.PojoEntityTuplizer.setPropertyValues(PojoEntityTuplizer.java:200)
>>>>       at org.hibernate.persister.entity.AbstractEntityPersister.setPropertyValues(AbstractEntityPersister.java:3571)
>>>>       at org.hibernate.engine.TwoPhaseLoad.initializeEntity(TwoPhaseLoad.java:133)
>>>>       at org.hibernate.loader.Loader.initializeEntitiesAndCollections(Loader.java:854)
>>>>       at org.hibernate.loader.Loader.doQuery(Loader.java:729)
>>>>       at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:236)
>>>>       at org.hibernate.loader.Loader.doList(Loader.java:2213)
>>>>       at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2104)
>>>>       at org.hibernate.loader.Loader.list(Loader.java:2099)
>>>>       at org.hibernate.loader.criteria.CriteriaLoader.list(CriteriaLoader.java:94)
>>>>       at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1569)
>>>>       at org.hibernate.impl.CriteriaImpl.list(CriteriaImpl.java:283)
>>>>       ... 8 more
>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>>       at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
>>>>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>       at java.lang.reflect.Method.invoke(Method.java:597)
>>>>       at org.hibernate.property.BasicPropertyAccessor$BasicSetter.set(BasicPropertyAccessor.java:42)
>>>>       ... 21 more
>>>> Caused by: java.lang.NullPointerException
>>>>       at org.biojavax.bio.seq.PositionResolver$AverageResolver.getMin(PositionResolver.java:103)
>>>>       at org.biojavax.bio.seq.SimpleRichLocation.getMin(SimpleRichLocation.java:323)
>>>>       at org.biojavax.bio.seq.SimpleRichLocation.overlaps(SimpleRichLocation.java:451)
>>>>       at org.biojavax.bio.seq.SimpleRichLocation.union(SimpleRichLocation.java:469)
>>>>       at org.biojavax.bio.seq.RichLocation$Tools.merge(RichLocation.java:363)
>>>>       at org.biojavax.bio.seq.SimpleRichFeature.setLocationSet(SimpleRichFeature.java:181)
>>>>       ... 25 more
>>>>
>>>> I think BioSQLFeatureFilter.OverlapsRichLocation(rl) causes the
>>>> problem I have. Can you help me solve this problem?
>>>>
>>>> I'm grateful for any hints.
>>>> cheers,
>>>>
>>>> Gabrielle
>>>>
>>>>
>>>>
>>>> Richard Holland wrote:
>>>>> This looks like a bug in BJX. I have just committed a fix to the head
>>>>> of Subversion that I think will fix it. Can you check out the latest
>>>>> source, compile it, and try your program again?
>>>>>
>>>>> cheers,
>>>>> Richard
>>>>>
>>>>> 2008/10/9 Gabrielle Doan 
>>>>>
>>>>>> Hi Richard,
>>>>>>
>>>>>> thanks a lot for your mail. I have successfully retrieved the
>>>>>> subsequence of a sequence as a String. Now I am trying to get the
>>>>>> features for a particular range with the following code:
>>>>>>
>>>>>> 
>>>>>>      public FeatureHolder filterFeature(String name, int startpos, int endpos) {
>>>>>>              RichLocation rl = new SimpleRichLocation(new SimplePosition(startpos),
>>>>>>                              new SimplePosition(endpos), 0);
>>>>>>              BioSQLFeatureFilter filter = new BioSQLFeatureFilter.And(
>>>>>>                              new BioSQLFeatureFilter.BySequenceName(name),
>>>>>>                              new BioSQLFeatureFilter.OverlapsRichLocation(rl));
>>>>>>              return filter(filter);
>>>>>>      }
>>>>>>
>>>>>>
>>>>>> Unfortunately, I received these errors:
>>>>>> 
>>>>>> Exception in thread "main" java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
>>>>>>      at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.processFeatureFilter(BioSQLRichSequenceDB.java:143)
>>>>>>      at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.filter(BioSQLRichSequenceDB.java:151)
>>>>>>      at org.sequence_viewer.db.HBioSQLDB.filterFeature(HBioSQLDB.java:599)
>>>>>>      at org.sequence_viewer.db.AbfragenTest.main(AbfragenTest.java:56)
>>>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>>>>      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>>      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>      at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>      at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.processFeatureFilter(BioSQLRichSequenceDB.java:138)
>>>>>>      ... 3 more
>>>>>> Caused by: org.hibernate.PropertyAccessException: Exception occurred inside setter of org.biojavax.bio.seq.SimpleRichFeature.locationSet
>>>>>>      at org.hibernate.property.BasicPropertyAccessor$BasicSetter.set(BasicPropertyAccessor.java:65)
>>>>>>      at org.hibernate.tuple.entity.AbstractEntityTuplizer.setPropertyValues(AbstractEntityTuplizer.java:337)
>>>>>>      at org.hibernate.tuple.entity.PojoEntityTuplizer.setPropertyValues(PojoEntityTuplizer.java:200)
>>>>>>      at org.hibernate.persister.entity.AbstractEntityPersister.setPropertyValues(AbstractEntityPersister.java:3571)
>>>>>>      at org.hibernate.engine.TwoPhaseLoad.initializeEntity(TwoPhaseLoad.java:133)
>>>>>>      at org.hibernate.loader.Loader.initializeEntitiesAndCollections(Loader.java:854)
>>>>>>      at org.hibernate.loader.Loader.doQuery(Loader.java:729)
>>>>>>      at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:236)
>>>>>>      at org.hibernate.loader.Loader.doList(Loader.java:2213)
>>>>>>      at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2104)
>>>>>>      at org.hibernate.loader.Loader.list(Loader.java:2099)
>>>>>>      at org.hibernate.loader.criteria.CriteriaLoader.list(CriteriaLoader.java:94)
>>>>>>      at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1569)
>>>>>>      at org.hibernate.impl.CriteriaImpl.list(CriteriaImpl.java:283)
>>>>>>      ... 8 more
>>>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>>>>      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>>      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>      at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>      at org.hibernate.property.BasicPropertyAccessor$BasicSetter.set(BasicPropertyAccessor.java:42)
>>>>>>      ... 21 more
>>>>>> Caused by: java.lang.NullPointerException
>>>>>>      at org.biojavax.bio.seq.PositionResolver$AverageResolver.getMin(PositionResolver.java:103)
>>>>>>      at org.biojavax.bio.seq.SimpleRichLocation.getMin(SimpleRichLocation.java:323)
>>>>>>      at org.biojavax.bio.seq.SimpleRichLocation.overlaps(SimpleRichLocation.java:451)
>>>>>>      at org.biojavax.bio.seq.SimpleRichLocation.union(SimpleRichLocation.java:469)
>>>>>>      at org.biojavax.bio.seq.RichLocation$Tools.merge(RichLocation.java:363)
>>>>>>      at org.biojavax.bio.seq.SimpleRichFeature.setLocationSet(SimpleRichFeature.java:181)
>>>>>>      ... 26 more
>>>>>>
>>>>>> Why do I get these errors?
>>>>>> BioSQLFeatureFilter.BySequenceName(name) needs a seqName as a parameter.
>>>>>> How can I find out the sequence name? Is it the value "name" in the
>>>>>> table "Bioentry"? As the built-in subSequence method takes a long time,
>>>>>> I intend to get the subsequence as a String myself and add the features
>>>>>> to it. What do you think about this?
>>>>>>
>>>>>> I'm grateful for any hints.
>>>>>> cheers,
>>>>>>
>>>>>> Gabrielle
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>
>>>
>>
> 
> 
> 


From holland at eaglegenomics.com  Sun Oct 19 20:18:29 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 20 Oct 2008 01:18:29 +0100
Subject: [Biojava-l] BioJava 3 Begins - Volunteers please!
Message-ID: 

Hi all,

I've just committed some new code to the biojava3 branch of the biojava-live
subversion repository. It's the foundations of a brand new alphabet+symbol
set of classes, and an example of how to use them to represent DNA. You'll
notice that the new code is very lightweight and allows for a lot more
flexibility than the old code - for instance, the concept of Alphabet has
changed radically. It also makes much more extensive use of the Collections
API.

I haven't got any test cases or usage examples yet but give me a shout if
you don't understand the code and I'll explain how it works. (Hint:
SymbolFormat is there to convert Strings into SymbolList objects, and vice
versa).

So, now we want some volunteers! We're starting from scratch here, so there's
a lot of work to do. The whole of BioJava needs 'translating' into BJ3,
whether by copying and pasting existing classes and modifying them to suit the
new style, or by writing completely new ones that provide equivalent functionality.


I'll post an example of how to do file parsing soon, probably starting with
FASTA. In the meantime, a good place to start would be for people to design
object models to represent their favourite data types (e.g. Genbank, or
microarray data). Utility classes to manipulate those objects would be great
too.

The object models need to be normalised as much as possible - e.g. if your
data has a lot of comments, and the order of those comments is important,
then give your object model a collection of comment objects. The object
model for each data type should be completely independent and use basic data
types wherever possible (e.g. store sequences as strings, don't attempt to
parse them into anything fancy like SymbolLists). The closer the object
model is to the original data format, the better. There are going to be clever
tricks when it comes to converting data between different object models
(e.g. Genbank to INSDSeq), which I will explain later when I put the file
parsing examples up.
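To illustrate the normalisation advice above, a sketch of a plain, format-close object model for a Genbank-style record: comments kept as an ordered list of comment objects, and the sequence kept as a String rather than parsed into a SymbolList. All class and field names are hypothetical and not part of the committed BJ3 code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical Genbank-like record: basic types only, kept close to the flat-file layout. */
public class GenbankRecordSketch {

    /** One COMMENT entry, as its own object so each comment's text is kept as-is. */
    static final class Comment {
        final String text;
        Comment(String text) { this.text = text; }
    }

    private final String locusName;
    private final List<Comment> comments = new ArrayList<>(); // order matters, so a List
    private String sequence = "";                             // raw residues, not a SymbolList

    GenbankRecordSketch(String locusName) { this.locusName = locusName; }

    String getLocusName() { return locusName; }

    void addComment(String text) { comments.add(new Comment(text)); }

    /** Read-only view so callers cannot reorder or edit the record's comments. */
    List<Comment> getComments() { return Collections.unmodifiableList(comments); }

    void setSequence(String sequence) { this.sequence = sequence; }

    String getSequence() { return sequence; }

    public static void main(String[] args) {
        GenbankRecordSketch rec = new GenbankRecordSketch("EXAMPLE01"); // made-up locus name
        rec.addComment("first comment");
        rec.addComment("second comment");
        rec.setSequence("acgtacgt");
        System.out.println(rec.getLocusName() + " " + rec.getComments().size() + " " + rec.getSequence());
    }
}
```

Keeping the model this literal leaves any richer interpretation (SymbolLists, features) to the conversion step between object models.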

You'll notice that the biojava3 branch uses Maven instead of Ant. This is
because we want to make it as modular as possible, so if you want to write
microarray stuff, create a new microarray sub-project (as per the dna
example that's already there). This way, if someone only wants the microarray
bit of BJ3, they only need to install the appropriate JAR file and can ignore
the rest. (The 'core' module is for stuff that is so generic it could be
used anywhere, or is used in every single other module.)

If coding isn't your cup of tea, then we would very much welcome testers
(particularly those who enjoy writing test cases!), documenters
(particularly code commenters), translators (for internationalisation of the
code), and of course all those who wish to contribute ideas and suggestions
no matter how off-the-wall they might be. In particular if you'd like to
take charge of an area of the development process, e.g. Documentation Chief,
or Protein Champion, then that would be much appreciated.

I'm very much looking forward to working with everyone on this. Good luck,
and happy coding!

cheers,
Richard

PS. Please don't forget to attach the appropriate licence to your code. You
can copy-and-paste it from the existing classes I just committed this
evening.

PPS. For those who are worried about backwards compatibility - this was
discussed on the lists a while back and it was made clear that BJ3 is a
clean break. However, the existing code will continue to be maintained and
bugfixed for a couple of years so you don't have to upgrade if you don't
want to - it just won't have any new features developed for it. This is
largely because it'll probably take just that long to write all the new BJ3
code. When we do decide to desupport the existing BJ code, plenty of notice
will be given (i.e. years as opposed to months).


-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From holland at eaglegenomics.com  Mon Oct 20 13:52:08 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 20 Oct 2008 18:52:08 +0100
Subject: [Biojava-l] File parsing in BJ3
Message-ID: 

(From now on I will only be posting these development messages to
biojava-dev, which is the intended purpose of that list. Those of you who
wish to keep track of things but are currently only subscribed to biojava-l
should also subscribe to biojava-dev in order to keep up to date.)

As promised, I've committed a new package in the biojava-core module that
should help you understand how to do file parsing, conversion and writing in
the new BJ3 modules. Here's an example of how to use it to write a Genbank
parser (note no parsers actually exist yet!):

1. Design yourself a Genbank class which implements the interface Thing and
can fully represent all the data that might possibly occur inside a Genbank
file.

2. Write an interface called GenbankReceiver, which extends ThingReceiver
and defines all the methods you might need in order to construct a Genbank
object in an asynchronous fashion.

3. Write a GenbankBuilder class which implements GenbankReceiver and
ThingBuilder. Its job is to receive data via method calls, use that data to
construct a Genbank object, then provide that object on demand.

4. Write a GenbankWriter class which implements GenbankReceiver and
ThingWriter. Its job is similar to GenbankBuilder, but instead of
constructing new Genbank objects, it writes Genbank records to file that
reflect the data it receives.

5. Write a GenbankReader class which implements ThingReader. It can read
Genbank files and output the data to the methods of the ThingReceiver
provided to it, which in this case could be anything which implements the
interface GenbankReceiver.

6. Write a GenbankEmitter class which implements ThingEmitter. It takes a
Genbank object and will fire off data from it to the provided ThingReceiver
(a GenbankReceiver instance) as if the Genbank object was being read from a
file or some other source.
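The roles above can be sketched with a toy two-field record. Everything in this sketch is hypothetical. The real `Thing*` interfaces in biojava-core may declare different methods, and `GenbankReceiver` is cut down to four calls so the pattern fits on one screen. The point is that the builder and the writer share one receiver interface, so any source that drives it can feed either one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the biojava-core marker types.
interface Thing {}
interface ThingReceiver {}

// Step 1 (toy version): a record with a name and a sequence.
class Genbank implements Thing {
    final String locus;
    final String sequence;
    Genbank(String locus, String sequence) {
        this.locus = locus;
        this.sequence = sequence;
    }
}

// Step 2: the events needed to construct a Genbank record incrementally.
interface GenbankReceiver extends ThingReceiver {
    void startRecord();
    void locus(String name);
    void sequence(String residues);
    void endRecord();
}

// Step 3: turns received events into Genbank objects.
class GenbankBuilder implements GenbankReceiver {
    private final List<Genbank> built = new ArrayList<>();
    private String currentLocus;
    private String currentSequence;

    public void startRecord() { currentLocus = null; currentSequence = ""; }
    public void locus(String name) { currentLocus = name; }
    public void sequence(String residues) { currentSequence = residues; }
    public void endRecord() { built.add(new Genbank(currentLocus, currentSequence)); }

    List<Genbank> getBuilt() { return built; }
}

// Step 4: turns the same events into (heavily simplified) flat-file text.
class GenbankWriter implements GenbankReceiver {
    private final StringBuilder out = new StringBuilder();

    public void startRecord() { }
    public void locus(String name) { out.append("LOCUS       ").append(name).append('\n'); }
    public void sequence(String residues) { out.append("ORIGIN\n        1 ").append(residues).append('\n'); }
    public void endRecord() { out.append("//\n"); }

    String getText() { return out.toString(); }
}

// Step 6: replays an existing object as events, as if it were being read.
class GenbankEmitter {
    private final Genbank record;
    GenbankEmitter(Genbank record) { this.record = record; }

    void emit(GenbankReceiver receiver) {
        receiver.startRecord();
        receiver.locus(record.locus);
        receiver.sequence(record.sequence);
        receiver.endRecord();
    }
}
```

For example, `new GenbankEmitter(g).emit(new GenbankWriter())` serialises an object, and passing a `GenbankBuilder` instead rebuilds it. A step 5 `GenbankReader` (omitted here) would fire the same four calls while scanning a file.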

That's it! OK so it's a minimum of 6 classes instead of the original 1 or 2,
but the additional steps are necessary for flexibility in converting between
formats.

Now to use it (you'll probably want a GenbankTools class to wrap these steps
up for user-friendliness, including various options for opening files,
etc.):

1. To read a file - instantiate ThingParser with your GenbankReader as the
reader, and GenbankBuilder as the receiver. Use the iterator methods on
ThingParser to get the objects out.

2. To write a file - instantiate ThingParser with a GenbankEmitter wrapping
your Genbank object, and a GenbankWriter as the receiver. Use the parseAll()
method on the ThingParser to dump the whole lot to your chosen output.

The clever bit comes when you want to convert between files. Imagine you've
done all the above for Genbank, and you've also done it for FASTA. How to
convert between them? What you need to do is this:

1. Implement all the classes for both Genbank and FASTA.

2. Write a GenbankFASTAConverter class that implements ThingConverter
and GenbankReceiver, and will internally convert the data received and pass
it on out to the receiver provided, which will be a FASTAReceiver instance.

3. Write a FASTAGenbankConverter class that operates in exactly the opposite
way, implementing ThingConverter and FASTAReceiver.

Then to convert you use ThingParser again:

1. From FASTA file to Genbank object: Instantiate ThingParser with a
FASTAReader reader, a GenbankBuilder receiver, and add a
FASTAGenbankConverter instance to the converter chain. Use the iterator to
get your Genbank objects out of your FASTA file.

2. From FASTA file to Genbank file: Same as option 1, but provide a
GenbankWriter instead and use parseAll() instead of the iterator methods.

3. From FASTA object to Genbank object: Same as option 1, but provide a
FASTAEmitter wrapping your FASTA object as the reader instead.

4. From FASTA object to Genbank file: Same as option 1, but swap both the
reader and the receiver as per options 2 and 3.

5/6/7/8. From Genbank * to FASTA * - same as 1,2,3,4 but swap all mentions
of FASTA and Genbank, and use GenbankFASTAConverter instead.

One last and very important feature of this approach is that if you discover
that nobody has written the appropriate converter for your chosen pair of
formats A and C, but converters do exist to map A to some other format B and
that other format B on to C, then you can just put the two converters A-B and
B-C into the ThingParser chain and it'll work perfectly.
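A sketch of a converter and of chaining two converters follows. It assumes three hypothetical, cut-down receiver interfaces: `FastaReceiver`, `GenbankReceiver` and `InsdseqReceiver`. Each converter implements the *source* format's receiver and forwards translated calls to a *target* receiver. That is why A-B and B-C converters can be stacked even though no A-C converter exists. The real `ThingConverter`/`ThingParser` API may wire this up differently.

```java
// Hypothetical, cut-down event interfaces for three formats.
interface FastaReceiver {
    void header(String description);
    void residues(String sequence);
}

interface GenbankReceiver {
    void locus(String name);
    void definition(String text);
    void origin(String sequence);
}

interface InsdseqReceiver {
    void locus(String name);
    void sequence(String sequence);
}

// A-B: accepts FASTA events and re-expresses them as Genbank events.
class FastaGenbankConverter implements FastaReceiver {
    private final GenbankReceiver target;
    FastaGenbankConverter(GenbankReceiver target) { this.target = target; }

    public void header(String description) {
        // Assumption: the first word of a FASTA header is the identifier.
        String[] parts = description.trim().split("\\s+", 2);
        target.locus(parts[0]);
        target.definition(parts.length > 1 ? parts[1] : "");
    }

    public void residues(String sequence) { target.origin(sequence); }
}

// B-C: accepts Genbank events and re-expresses them as INSDSeq events.
class GenbankInsdseqConverter implements GenbankReceiver {
    private final InsdseqReceiver target;
    GenbankInsdseqConverter(InsdseqReceiver target) { this.target = target; }

    public void locus(String name) { target.locus(name); }
    public void definition(String text) { /* not carried in this toy INSDSeq model */ }
    public void origin(String sequence) { target.sequence(sequence); }
}

// End of the chain: records what arrives so it can be inspected.
class InsdseqCollector implements InsdseqReceiver {
    final StringBuilder log = new StringBuilder();
    public void locus(String name) { log.append("locus=").append(name).append(';'); }
    public void sequence(String sequence) { log.append("seq=").append(sequence).append(';'); }
}

class ConverterChainDemo {
    // FASTA -> Genbank -> INSDSeq, with no direct FASTA -> INSDSeq converter.
    static String run(String header, String residues) {
        InsdseqCollector sink = new InsdseqCollector();
        FastaReceiver head = new FastaGenbankConverter(new GenbankInsdseqConverter(sink));
        head.header(header);
        head.residues(residues);
        return sink.log.toString();
    }
}
```

One trade-off is that each hop may drop information the next format cannot hold, such as the definition line here. A direct converter, when one exists, can preserve more.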

Enjoy!

cheers,
Richard

-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From markjschreiber at gmail.com  Mon Oct 20 22:54:27 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 21 Oct 2008 10:54:27 +0800
Subject: [Biojava-l] Biojava / BioSQL entity beans
Message-ID: <93b45ca50810201954k44ab0f65xb94a0214d8eb4e13@mail.gmail.com>

Hi -

Richard has kindly uploaded some JPA Entity beans that map to the
BioSQL database schema as a BioSQL module for BJ3.  These entity beans
were generated as part of the Tokyo webservices workshop.  As
Entities they are useful as POJOs as well as for data transfer via JPA
and JAXB, and can be used in EJB containers or a plain old JVM.  They have
no biological smarts and the intention was/is that these will be
provided by wrapping them in Bio-aware (and more thread safe) wrappers
that implement interfaces from other BJ3 modules.  In essence it is a
persistence layer.

The following is copied verbatim from the package-info.java and gives
you some idea of how I intend the package to be used (obviously some
of this is still to come).  There is also some discussion of some of
the gotchas that might trip you up when playing with object
relational persistence.

BTW the naming convention is to call something FooEntity. Where BioSQL
requires a compound primary key this is implemented as an Embeddable
object called FooEntityPK which is the key for FooEntity.  The other
thing you may see is FooEntityUK which is the same concept but
represents some of the cases where BioSQL tables don't have a primary
key (even a compound one) but implicitly they do because all the
fields have the SQL unique restriction. In these cases JPA still
requires an Embeddable key to track updates. As far as Java is
concerned they are the same as a FooEntityPK but I used a different
name to make the distinction.

The annotations provide mapping to tables from a Derby database. This
is the reference Java in memory DB which can run from any JVM and is
also found in Glassfish. The mappings will likely also work with
MySQL. For Oracle (and possibly others) you would need to override the
@GeneratedValue strategy for generating primary keys. I believe this
can be done with external XML config files. You may also wish to
override the default eager loading and cascade annotations depending on
your JPA persistence method and preferences.

This has been lightly tested using GlassFish, Derby and TopLink
Essentials and is a work in progress, but seems to work OK.

Best regards,

- Mark

/**
 * This package contains Entity representations of BioJava classes.
 * The purpose of these entities is to allow simple serialization of BioJava data
 * using binary serialization for protocols that require this (eg RPC between
 * Java application servers) as well as persistence mechanisms that require bean-
 * like objects such as the Java Persistence API (JPA) or the
 * Java Architecture for XML Binding (JAXB). For this reason all objects in this package
 * should provide a parameterless public constructor and public get/set methods
 * for relevant fields.
 * 
 * Given the public nature of the constructors and the setters in these beans,
 * these classes are not intended for direct use in general programming when
 * using the BioJava v3 API. This is because it is possible to leave a bean in
 * an inconsistent state, and they are not thread safe unless synchronization
 * is controlled externally (via synchronization blocks or via an
 * application container).
 *
 * The Entities are intended to back other objects that a
 * programmer will interact with directly. For example Foo.class will be backed
 * by FooEntity.class. Generally interaction with Foo.class is to be preferred and
 * will often be more sensible, as the entities typically provide no 'biological
 * behaviour'. Relevant behaviour should be provided by the wrapping class. It is best
 * to think of Foo as a view onto the data that is held in the
 * FooEntity.  A good example is the sophisticated Symbol
 * behaviour that can represent biological logic about IUPAC ambiguity symbols.
 * For example a 'w' in a Biosequence represents an ambiguity between 'a' and 't',
 * whereas a 'w' in BiosequenceEntity is simply a 'w' and nothing else.
 *
 * The wrapper entity pattern is intended to allow for a lot of the advanced
 * behaviour in the original BioJava while also allowing use of modern transport
 * and persistence packages. This is achieved by persisting and transporting the
 * entity without the wrapper and re-wrapping it at the other end.
 *
 * Currently BioJava v3 uses annotated @Id fields to define
 * equals(Object o). Consistent definition is critical to how
 * the object will behave when persisted to a database. In the case of:
 *
 * Foo f = ... initialize
 * Foo fo = ... initialize
 * boolean b = f.equals(fo);
 *
 * b would be true if both objects share the same value
 * (or embeddable object) in the field that represents the primary key in the
 * database, even if all other fields differ. This is desirable because
 * two entities representing the same DB record may be retrieved from two different
 * sessions. Additionally these are the identity fields, so logically they should map to
 * the concept of identity. Finally, searching a collection is made very simple
 * without requiring an iterator:
 *
 * Integer id = //code to initialize
 * collection.contains(new Foo(id));
 *
 * By default BioJava v3 entities use only the primary key field for equality.
 * If either record has null as the primary key value it is never equal
 * to another. When implementing equals(Object o) it is not advisable to perform
 * the test this.getClass() == o.getClass() because of the possibility of proxy
 * classes used in JPA. This can, however, lead to an issue with the
 * hashCode() method.  Consider the following code:
 *
 * Foo foo = new Foo(); //no primary key
 * HashSet set = new HashSet();
 * set.add(foo);
 * // code here to persist Foo and consequently generate its PK
 * boolean b = set.contains(foo);
 *
 * Because only the PK is used for equality, the PK is also used in the hashCode.
 * This means that b is probably going to be false, because foo
 * would have been stored in a hash bucket using the old hash code, which will
 * now be different even though the set actually does contain a pointer to foo.
 * Although a potential deficiency, it is unlikely to be a major problem for
 * BioJava v3 developers because using entity-backed objects is preferred to direct
 * interaction with entities. If you need to use entities directly then use hashed
 * collections with caution.
 *
 * Wrapper classes can either delegate their equals call to the underlying
 * entity or do something that is more biologically sensible
 * (as PK values are typically not exposed in the wrapper). It is probably more
 * sensible for a wrapper to define its own equals (and hashCode)
 * implementations due to the limitations of the default @Id based system
 * described above, especially the potential hashCode problems.
 *
 * For example FooSequence.class might want to base
 * equality on the exact match of the DNA sequence it holds even though
 * FooSequenceEntity.class may only use the PK field. Whether or not delegation
 * is used, it should be clearly documented.
 *
 *
 * 
 * @author Mark Schreiber
 */
package org.biojava.biosql.entity;
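The wrapper/entity split and the IUPAC 'w' example from the package comment can be sketched as follows. `BiosequenceEntity` and `Biosequence` are simplified, hypothetical stand-ins; the real BioSQL-backed classes hold much more. Only a handful of IUPAC DNA codes are mapped. The entity stores characters verbatim, while the wrapper interprets them. The wrapper also defines its own sequence-based `equals()`/`hashCode()` rather than delegating to the primary key, which the comment suggests as one option.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical bean: residues stored as a plain String, no biology.
class BiosequenceEntity {
    private Integer id;
    private String seq;

    BiosequenceEntity() {}
    Integer getId() { return id; }
    void setId(Integer id) { this.id = id; }
    String getSeq() { return seq; }
    void setSeq(String seq) { this.seq = seq; }
}

// Hypothetical wrapper: a biological view onto the data held in the entity.
class Biosequence {
    // A subset of the IUPAC nucleotide codes (lower case).
    private static final Map<Character, String> IUPAC = new HashMap<>();
    static {
        IUPAC.put('a', "a"); IUPAC.put('c', "c"); IUPAC.put('g', "g"); IUPAC.put('t', "t");
        IUPAC.put('w', "at"); IUPAC.put('s', "cg"); IUPAC.put('r', "ag"); IUPAC.put('y', "ct");
        IUPAC.put('n', "acgt");
    }

    private final BiosequenceEntity entity;
    Biosequence(BiosequenceEntity entity) { this.entity = entity; }

    // Bases the symbol at 'pos' may stand for, e.g. 'w' -> {a, t}.
    Set<Character> possibleBases(int pos) {
        char c = Character.toLowerCase(entity.getSeq().charAt(pos));
        String bases = IUPAC.get(c);
        if (bases == null) throw new IllegalArgumentException("Unknown symbol: " + c);
        Set<Character> result = new TreeSet<>();
        for (char b : bases.toCharArray()) result.add(b);
        return Collections.unmodifiableSet(result);
    }

    private String normalised() { return entity.getSeq().toLowerCase(Locale.ROOT); }

    // Equality based on the sequence, not on the entity's primary key.
    @Override
    public boolean equals(Object o) {
        return o instanceof Biosequence && normalised().equals(((Biosequence) o).normalised());
    }

    @Override
    public int hashCode() { return normalised().hashCode(); }
}
```

Two wrappers around entities with different primary keys but the same residues therefore compare equal, which is the "more biologically sensible" behaviour the comment describes.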

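The equality rules in the package comment can be illustrated without a database. The sketch below is a plain-Java stand-in, not the biojava-biosql code. `FooEntity` is hypothetical, its `id` plays the role of the `@Id` field, and `simulatePersist()` stands in for the JPA provider assigning a generated key. It shows both the convenient `contains(new FooEntity(id))` lookup and the `HashSet` problem once the key changes after insertion.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical entity: equality and hashing depend only on the primary key.
class FooEntity {
    private Integer id;    // stands in for the @Id field; null until persisted
    private String name;   // ordinary data, ignored by equals()

    FooEntity() {}                          // beans need a no-arg constructor
    FooEntity(Integer id) { this.id = id; }

    Integer getId() { return id; }
    void setId(Integer id) { this.id = id; }
    String getName() { return name; }
    void setName(String name) { this.name = name; }

    @Override
    public boolean equals(Object o) {
        // instanceof rather than getClass(), so JPA proxy subclasses can still match
        if (!(o instanceof FooEntity)) return false;
        FooEntity other = (FooEntity) o;
        // a record without a primary key is never equal to another record
        if (id == null || other.getId() == null) return this == o;
        return id.equals(other.getId());
    }

    @Override
    public int hashCode() {
        return id == null ? System.identityHashCode(this) : id.hashCode();
    }
}

class EntityEqualityDemo {
    // Stand-in for the persistence provider generating a key.
    static void simulatePersist(FooEntity e, int generatedId) { e.setId(generatedId); }

    // Lookup by key alone, with no iteration.
    static boolean containsById(Set<FooEntity> set, int id) {
        return set.contains(new FooEntity(id));
    }

    // The pitfall: foo was hashed while its id was still null.
    static boolean containsAfterPersist() {
        FooEntity foo = new FooEntity();
        Set<FooEntity> set = new HashSet<>();
        set.add(foo);
        simulatePersist(foo, 42);
        return set.contains(foo);   // almost certainly false: stored under the old hash
    }
}
```

Removing the entity before it is persisted and re-adding it afterwards is one way around the pitfall if entities must be kept in hashed collections.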
From markjschreiber at gmail.com  Mon Oct 20 23:16:51 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 21 Oct 2008 11:16:51 +0800
Subject: [Biojava-l] File parsing in BJ3
In-Reply-To: 
References: 
Message-ID: <93b45ca50810202016j13a2a2a9y78a2992e543d6f5a@mail.gmail.com>

So if I want to build a BioSQL loader from Genbank then would the
classes (or their wrappers) in the BioSQL Entity package need to
implement Thing?  Would Maven have an issue with that or would it just
create a dependency on core? (You can tell I've never used Maven,
right?)

From a design point of view, should Thing be an interface or an
Annotation? The reason I ask is that it doesn't define any methods so
it is more of a tag than an interface.

Anyway, my understanding is that I would use a Genbank parser (or
write one), write an EntityReceiver interface (probably more than one,
given the number of entities in BioSQL), and implement an EntityBuilder
(again possibly more than one) that implements EntityReceiver and
builds Entity beans from the messages it receives. In this case I probably
wouldn't provide a writer, as JPA would be writing the beans to the
database.  Would this be how you imagine it?
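A rough sketch of that loader shape follows. All names are hypothetical. `BioentryEntity` loosely follows BioSQL's `bioentry` table, and `GenbankReceiver` is cut down to three calls. The builder only collects beans. Persistence is left to JPA; in a real loader, something like `EntityManager.persist()` would play the part of the writer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, cut-down events a Genbank reader might fire.
interface GenbankReceiver {
    void locus(String name);
    void accession(String accession);
    void endRecord();
}

// Hypothetical bean loosely modelled on BioSQL's bioentry table:
// public no-arg constructor plus getters/setters, no biological logic.
class BioentryEntity {
    private Integer bioentryId;   // left null; the database would generate it
    private String name;
    private String accession;

    public BioentryEntity() {}
    public Integer getBioentryId() { return bioentryId; }
    public void setBioentryId(Integer id) { this.bioentryId = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getAccession() { return accession; }
    public void setAccession(String accession) { this.accession = accession; }
}

// Builds one BioentryEntity per record; a JPA layer would then persist them.
class BioentryEntityBuilder implements GenbankReceiver {
    private final List<BioentryEntity> entities = new ArrayList<>();
    private BioentryEntity current = new BioentryEntity();

    public void locus(String name) { current.setName(name); }
    public void accession(String accession) { current.setAccession(accession); }

    public void endRecord() {
        entities.add(current);
        current = new BioentryEntity();   // start a fresh bean for the next record
    }

    List<BioentryEntity> getEntities() { return entities; }
}
```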

- Mark



From andreas at sdsc.edu  Mon Oct 20 23:17:28 2008
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 20 Oct 2008 20:17:28 -0700
Subject: [Biojava-l] [Biojava-dev] BioJava 3 Begins - Volunteers please!
In-Reply-To: 
References: 
Message-ID: <59a41c430810202017n226327cahefe0ed7e5f6a8df2@mail.gmail.com>

Hi,

Couple of thoughts regarding biojava v3:

License: Since it seems we will end up copying code from biojava 1.6
to biojava 3.0, we need to keep the license the same (LGPL 2.1). I.e.
people should still use the same biojava license headers when
committing new files and all code will be considered to be LGPL, if no
header is present. Do NOT commit code under other licenses.

Installation: We need some installation instructions on the wiki site,
e.g. how to get the maven setup running.  What are the code
conventions for the new version?

Blast: the Blast parsing modules are among the most frequently used
ones in biojava 1.6. To make people use biojava v3 it will be crucial
to have a port of them to the new version. Does anybody want to take
care of that?

Automated builds: is it interesting to have automated builds set up
for the new version at this stage, or should we wait until a more
mature stage? I could easily add another auto-build similar to the one
for biojava 1.6 at http://www.spice-3d.org/cruise/

Andreas


From markjschreiber at gmail.com  Tue Oct 21 01:41:28 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 21 Oct 2008 13:41:28 +0800
Subject: [Biojava-l] Logging in BJ3
Message-ID: <93b45ca50810202241i767e2a56w2d9b7ede0f895431@mail.gmail.com>

Hi -

I would like to strongly advocate the liberal and extensive use of
logging in BioJava3.  The lack of this plagued us (me at least) during
bug fixes in previous versions of BioJava.  The standard Java logging
API (java.util.logging) is very flexible and easily meets our needs. It's
also not too much effort for developers to put in place (you know you use
System.out.println() all over the place anyway).

The following is an example snippet using logging that would certainly
help debugging.  With the standard logging setup only the severe
statement would appear on the terminal. We could also provide config
files that show lower levels of logging so that people can easily
generate detailed logs to accompany bug reports.  If we want to be
really tricky we could even use a MemoryHandler, which keeps a rotating
buffer of log statements that can be dumped along with a stack trace, so you
could just submit the stack trace and the activity log all in one go
and we could get an idea of what was going on at the time.

The example below also shows what to do to avoid a major performance
hit during logging. The marked "expensive logging operation" pretends
to get config information by getting it from a database. One might
expect this to take time while the db connects etc and could produce
quite a long String of information. To save time when logging is not
set to the CONFIG level the if statement is able to skip this costly
step.

I know from experience we will definitely get the most value from this
in the IO parsers and ThingBuilders.

Any thoughts?

- Mark



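import java.util.logging.Level;
import java.util.logging.Logger;

public class MyClass {

    private static final Logger logger = Logger.getLogger("org.biojava.MyClass");

    public Object generateObject(String argument){
         logger.entering(getClass().getName(), "generateObject", argument);

         //expensive logging operation: only built when CONFIG is enabled
         if (logger.isLoggable( Level.CONFIG )) {
            logger.config("DB config: "+ getDBConfigInfo());
         }

         Object obj = null;
         try{

            //do some stuff
            logger.fine("doing stuff");
            obj = new Object();

         }catch(Exception ex){
             //log the failure together with its stack trace
             logger.log(Level.SEVERE, "Failed to do stuff", ex);
         }

         logger.exiting(getClass().getName(), "generateObject", obj);
         return obj;
    }

    //placeholder for a slow lookup such as querying a database
    private String getDBConfigInfo(){
         return "(config details)";
    }
}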
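The rotating-buffer idea maps onto `java.util.logging.MemoryHandler`. That class keeps the last *n* records in a circular buffer and pushes them to a target handler when a record at or above a "push" level arrives. The sketch below assumes a buffer of 100 records, a push level of SEVERE, and a hypothetical logger name `org.biojava.demo`. It uses a small collecting handler as the target so the result can be inspected; a real setup would more likely use a `ConsoleHandler` or `FileHandler`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;
import java.util.logging.MemoryHandler;

class ActivityLogDemo {

    // Target handler that simply keeps the messages it is given.
    static class CollectingHandler extends Handler {
        final List<String> messages = new ArrayList<>();
        @Override public void publish(LogRecord record) { messages.add(record.getMessage()); }
        @Override public void flush() { }
        @Override public void close() { }
    }

    static List<String> run() {
        Logger logger = Logger.getLogger("org.biojava.demo");  // hypothetical name
        logger.setUseParentHandlers(false);  // keep the default console handler out of it
        logger.setLevel(Level.FINE);         // let FINE records reach the handlers

        CollectingHandler target = new CollectingHandler();
        target.setLevel(Level.ALL);
        // Buffer the last 100 records; push them all when a SEVERE record arrives.
        MemoryHandler memory = new MemoryHandler(target, 100, Level.SEVERE);
        logger.addHandler(memory);

        logger.fine("reading record 1");   // buffered only
        logger.fine("reading record 2");   // buffered only
        logger.severe("parse failed");     // triggers a push of the whole buffer

        logger.removeHandler(memory);
        return target.messages;
    }
}
```

Until the SEVERE record arrives, nothing reaches the target, so quiet runs cost only the buffer. A failure brings its recent history with it. The same wiring can also be set up from a logging.properties file rather than in code.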

From holland at eaglegenomics.com  Tue Oct 21 04:34:46 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 21 Oct 2008 09:34:46 +0100
Subject: [Biojava-l] File parsing in BJ3
In-Reply-To: <93b45ca50810202016j13a2a2a9y78a2992e543d6f5a@mail.gmail.com>
References: 
	<93b45ca50810202016j13a2a2a9y78a2992e543d6f5a@mail.gmail.com>
Message-ID: 

Spot on.

Annotation vs. interface... I think an annotation is probably better, as you
suggest, but I'd have to look into that. Not sure how it works with
collections and generics. If it does turn out to be a better bet, I'll
change it over.
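To make the trade-off concrete, here is a sketch contrasting the two options, using hypothetical names: `Thing` as a marker interface and `ThingType` as a marker annotation. The interface can serve as a generic bound, so the compiler rejects non-`Thing` objects wherever `T extends Thing` is required. An annotation cannot bound a type parameter. It can only be checked at runtime via reflection, and only if it is declared with `@Retention(RUNTIME)`.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

// Option 1: marker interface, usable in type signatures and generic bounds.
interface Thing {}

// Option 2: marker annotation, visible only through reflection at runtime.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface ThingType {}

class GenbankAsInterface implements Thing {}

@ThingType
class GenbankAsAnnotation {}

class MarkerDemo {
    // The compiler enforces the bound: only Things can be passed in.
    static <T extends Thing> List<T> collect(T first, T second) {
        List<T> things = new ArrayList<>();
        things.add(first);
        things.add(second);
        return things;
    }

    // With an annotation, the equivalent check has to happen at runtime.
    static boolean isThing(Object o) {
        return o.getClass().isAnnotationPresent(ThingType.class);
    }
}
```

So if receivers, builders or parsers are meant to be typed as `ThingParser<T extends Thing>` or similar, the marker interface keeps that checking at compile time.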

With the BioSQL dependencies, take a look at the pom.xml file inside the
biojava-dna module. It declares a dependency on biojava-core. If you want to
add dependencies to external JARs, take a look at biojava-biosql's pom.xml
to see how it depends on javax.persistence. (The easiest way to add these is
via an IDE such as NetBeans, which is what I'm using at the moment).

cheers,
Richard

2008/10/21 Mark Schreiber 

> So if I want to build a BioSQL loader from Genbank then would the
> classes (or there wrappers) in the BioSQL Entity package need to
> implement Thing?  Would maven have an issue with that or would it just
> create a dependency on core? (you can tell I've never used Maven
> right).
>
> From a design point of view should Thing be an interface or an
> Annotation? The reason I ask is that it doesn't define any methods so
> it is more of a tag than an interface.
>
> Anyway, my understanding is that I would use a Genbank parser (or
> write one). Write a EntityReceiver interface (probably more than one
> given the number of entities in BioSQL, implement a EntityBuilder
> (again possibly more than one) that implements EntityReceiver and
> builds Entity beans from messages it receives. In this case I probably
> wouldn't provide a writer as JPA would be writing the beans to the
> database.  Would this be how you imagine it?
>
> - Mark
>
>
> On Tue, Oct 21, 2008 at 1:52 AM, Richard Holland
>  wrote:
> > (From now on I will only be posting these development messages to
> > biojava-dev, which is the intended purpose of that list. Those of you who
> > wish to keep track of things but are currently only subscribed to
> biojava-l
> > should also subscribe to biojava-dev in order to keep up to date.)
> >
> > As promised, I've committed a new package in the biojava-core module that
> > should help understand how to do file parsing and conversion and writing
> in
> > the new BJ3 modules. Here's an example of how to use it to write a
> Genbank
> > parser (note no parsers actually exist yet!):
> >
> > 1. Design yourself a Genbank class which implements the interface Thing
> and
> > can fully represent all the data that might possibly occur inside a
> Genbank
> > file.
> >
> > 2. Write an interface called GenbankReceiver, which extends ThingReceiver
> > and defines all the methods you might need in order to construct a
> Genbank
> > object in an asynchronous fashion.
> >
> > 3. Write a GenbankBuilder class which implements GenbankReceiver and
> > ThingBuilder. It's job is to receive data via method calls, use that data
> to
> > construct a Genbank object, then provide that object on demand.
> >
> > 4. Write a GenbankWriter class which implements GenbankReceiver and
> > ThingWriter. It's job is similar to GenbankBuilder, but instead of
> > constructing new Genbank objects, it writes Genbank records to file that
> > reflect the data it receives.
> >
> > 5. Write a GenbankReader class which implements ThingReader. It can read
> > GenbankFiles and output the data to the methods of the ThingReceiver
> > provided to it, which in this case could be anything which implements the
> > interface GenbankReceiver.
> >
> > 6. Write a GenbankEmitter class which implements ThingEmitter. It takes a
> > Genbank object and will fire off data from it to the provided
> ThingReceiver
> > (a GenbankReceiver instance) as if the Genbank object was being read from
> a
> > file or some other source.
> >
> > That's it! OK so it's a minimum of 6 classes instead of the original 1 or
> 2,
> > but the additional steps are necessary for flexibility in converting
> between
> > formats.
> >
> > Now to use it (you'll probably want a GenbankTools class to wrap these steps
> > up for user-friendliness, including various options for opening files,
> > etc.):
> >
> > 1. To read a file - instantiate ThingParser with your GenbankReader as the
> > reader, and GenbankBuilder as the receiver. Use the iterator methods on
> > ThingParser to get the objects out.
> >
> > 2. To write a file - instantiate ThingParser with a GenbankEmitter wrapping
> > your Genbank object, and a GenbankWriter as the receiver. Use the parseAll()
> > method on the ThingParser to dump the whole lot to your chosen output.
> >
> > The clever bit comes when you want to convert between files. Imagine you've
> > done all the above for Genbank, and you've also done it for FASTA. How to
> > convert between them? What you need to do is this:
> >
> > 1. Implement all the classes for both Genbank and FASTA.
> >
> > 2. Write a GenbankFASTAConverter class that implements ThingConverter
> > and GenbankReceiver, and will internally convert the data received and pass
> > it on out to the receiver provided, which will be a FASTAReceiver instance.
> >
> > 3. Write a FASTAGenbankConverter class that operates in exactly the opposite
> > way, implementing ThingConverter and FASTAReceiver.
> >
> > Then to convert you use ThingParser again:
> >
> > 1. From FASTA file to Genbank object: Instantiate ThingParser with a
> > FASTAReader reader, a GenbankBuilder receiver, and add a
> > FASTAGenbankConverter instance to the converter chain. Use the iterator to
> > get your Genbank objects out of your FASTA file.
> >
> > 2. From FASTA file to Genbank file: Same as option 1, but provide a
> > GenbankWriter instead and use parseAll() instead of the iterator methods.
> >
> > 3. From FASTA object to Genbank object: Same as option 1, but provide a
> > FASTAEmitter wrapping your FASTA object as the reader instead.
> >
> > 4. From FASTA object to Genbank file: Same as option 1, but swap both the
> > reader and the receiver as per options 2 and 3.
> >
> > 5/6/7/8. From Genbank * to FASTA * - same as 1,2,3,4 but swap all mentions
> > of FASTA and Genbank, and use GenbankFASTAConverter instead.
> >
> > One last and very important feature of this approach is that if you discover
> > that nobody has written the appropriate converter for your chosen pair of
> > formats A and C, but converters do exist to map A to some other format B and
> > that other format B on to C, then you can just put the two converters A-B and
> > B-C into the ThingParser chain and it'll work perfectly.
> >
> > Enjoy!
> >
> > cheers,
> > Richard
> >
> > --
> > Richard Holland, BSc MBCS
> > Finance Director, Eagle Genomics Ltd
> > M: +44 7500 438846 | E: holland at eaglegenomics.com
> > http://www.eaglegenomics.com/
> > _______________________________________________
> > Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-l
> >
>
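A minimal sketch of steps 1 to 6 may help make the roles concrete. It assumes heavily simplified, hypothetical versions of Thing, ThingReceiver, ThingBuilder and ThingEmitter; the receiver method names (startRecord, accession, sequence, endRecord), the two-field Genbank class and the toy writer output are illustrative inventions, not the interfaces committed to biojava-core. ThingParser and GenbankReader are left out: here the emitter calls the receiver directly, and a reader would fire the same calls while parsing text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal versions of the core types named in the email.
interface Thing {}
interface ThingReceiver {}
interface ThingBuilder<T extends Thing> { List<T> getBuilt(); }
interface ThingEmitter<R extends ThingReceiver> { void emit(R receiver); }

// Step 1: a toy Genbank object holding only an accession and a sequence.
class Genbank implements Thing {
    final String accession;
    final String sequence;

    Genbank(String accession, String sequence) {
        this.accession = accession;
        this.sequence = sequence;
    }
}

// Step 2: the callbacks a Genbank source fires (method names are assumptions).
interface GenbankReceiver extends ThingReceiver {
    void startRecord();
    void accession(String accession);
    void sequence(String sequence);
    void endRecord();
}

// Step 3: turns the callbacks back into Genbank objects.
class GenbankBuilder implements GenbankReceiver, ThingBuilder<Genbank> {
    private final List<Genbank> built = new ArrayList<>();
    private String accession;
    private String sequence;

    public void startRecord() { this.accession = null; this.sequence = null; }
    public void accession(String accession) { this.accession = accession; }
    public void sequence(String sequence) { this.sequence = sequence; }
    public void endRecord() { this.built.add(new Genbank(this.accession, this.sequence)); }
    public List<Genbank> getBuilt() { return this.built; }
}

// Step 4 stand-in: writes simplified Genbank-style text, not the full flat-file layout.
class GenbankWriter implements GenbankReceiver {
    private final StringBuilder out = new StringBuilder();

    public void startRecord() {}
    public void accession(String accession) { this.out.append("ACCESSION   ").append(accession).append('\n'); }
    public void sequence(String sequence) { this.out.append("ORIGIN      ").append(sequence).append('\n'); }
    public void endRecord() { this.out.append("//\n"); }
    String getOutput() { return this.out.toString(); }
}

// Step 6: replays an existing object as if it were being read from a file.
class GenbankEmitter implements ThingEmitter<GenbankReceiver> {
    private final Genbank source;

    GenbankEmitter(Genbank source) { this.source = source; }

    public void emit(GenbankReceiver receiver) {
        receiver.startRecord();
        receiver.accession(this.source.accession);
        receiver.sequence(this.source.sequence);
        receiver.endRecord();
    }
}

class ThingPatternDemo {
    public static void main(String[] args) {
        Genbank record = new Genbank("X12345", "acgt");

        // "Write a file": the emitter drives a writer.
        GenbankWriter writer = new GenbankWriter();
        new GenbankEmitter(record).emit(writer);
        System.out.print(writer.getOutput());

        // The same emitter can drive a builder, giving a copy of the object.
        GenbankBuilder builder = new GenbankBuilder();
        new GenbankEmitter(record).emit(builder);
        System.out.println(builder.getBuilt().get(0).accession);
    }
}
```

The design payoff shows up in main: one emitter can feed either a writer or a builder without either side knowing about the other.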

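The converter-chaining trick can be sketched the same way, assuming a converter is simply a receiver for one format that forwards translated calls to a receiver for the next format. The one-method receiver interfaces, the EMBL stand-in for "format C", the header-to-accession rule and the output strings are all hypothetical; the real ThingConverter API may look quite different.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical one-method receivers for three toy formats (A, B and C).
interface FastaReceiver { void fastaRecord(String header, String sequence); }
interface GenbankReceiver { void genbankRecord(String accession, String sequence); }
interface EmblReceiver { void emblRecord(String id, String sequence); }

// Converter A->B: accepts FASTA calls and forwards Genbank calls.
class FastaGenbankConverter implements FastaReceiver {
    private final GenbankReceiver next;

    FastaGenbankConverter(GenbankReceiver next) { this.next = next; }

    public void fastaRecord(String header, String sequence) {
        // Toy rule: the first word of the FASTA header is taken as the accession.
        String accession = header.split("\\s+")[0];
        this.next.genbankRecord(accession, sequence.toLowerCase());
    }
}

// Converter B->C: accepts Genbank calls and forwards EMBL calls.
class GenbankEmblConverter implements GenbankReceiver {
    private final EmblReceiver next;

    GenbankEmblConverter(EmblReceiver next) { this.next = next; }

    public void genbankRecord(String accession, String sequence) {
        this.next.emblRecord(accession, sequence);
    }
}

// Final receiver for format C: just collects a text summary of each record.
class EmblCollector implements EmblReceiver {
    final List<String> records = new ArrayList<>();

    public void emblRecord(String id, String sequence) {
        this.records.add("ID   " + id + "; SQ " + sequence);
    }
}

class ConverterChainDemo {
    public static void main(String[] args) {
        EmblCollector sink = new EmblCollector();
        // No direct FASTA->EMBL converter exists, so chain A->B and B->C.
        FastaReceiver chain = new FastaGenbankConverter(new GenbankEmblConverter(sink));
        chain.fastaRecord("X12345 some description", "ACGT");
        System.out.println(sink.records.get(0)); // ID   X12345; SQ acgt
    }
}
```

Because each converter only sees the receiver interface of the next stage, adding a format means writing converters to and from one neighbour rather than to every other format.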


-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From ayates at ebi.ac.uk  Tue Oct 21 04:40:48 2008
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 21 Oct 2008 09:40:48 +0100
Subject: [Biojava-l] Logging in BJ3
In-Reply-To: <93b45ca50810202241i767e2a56w2d9b7ede0f895431@mail.gmail.com>
References: <93b45ca50810202241i767e2a56w2d9b7ede0f895431@mail.gmail.com>
Message-ID: <48FD9590.5010704@ebi.ac.uk>

Hi,

A logging framework is a priority to start baking into the new API now.
As Mark has mentioned, logging frameworks are very flexible things, but
it's not until you start using them that you get a real feel for how
easy & extensible they are.

The JDK logger has some good integration with MessageFormat &
localization. I'm not completely taken with how it does the checks for
log levels (log.isDebugEnabled() just seems easier than
log.isLoggable(Level.FINEST)) & how you grab a logger (I'd prefer
something like Logger.getLogger(this.getClass())) but that's just
nit-picking.

I'll be happy to go with whatever people are most comfortable with & we
should attempt to use as many of the core Java classes as possible.
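For comparison, a small sketch of the JDK logger features discussed above, using only java.util.logging: a logger named from the class, a hypothetical isDebugEnabled helper (which maps "debug" to Level.FINE, an assumed mapping), and a parameterised message whose {0}/{1} placeholders are filled in by MessageFormat through Formatter.formatMessage. The in-memory CollectingHandler exists only so the output can be inspected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

// Collects formatted messages in memory so the result can be inspected.
class CollectingHandler extends Handler {
    final List<String> messages = new ArrayList<>();
    private final SimpleFormatter formatter = new SimpleFormatter();

    @Override
    public void publish(LogRecord record) {
        // formatMessage() applies MessageFormat to "{0}"-style placeholders.
        this.messages.add(this.formatter.formatMessage(record));
    }

    @Override public void flush() {}
    @Override public void close() {}
}

class JulParamsDemo {
    // Name the logger after the class rather than a hand-typed string.
    private static final Logger LOGGER = Logger.getLogger(JulParamsDemo.class.getName());

    // Hypothetical helper: treats FINE as "debug" (an assumption, not a JDK API).
    static boolean isDebugEnabled(Logger logger) {
        return logger.isLoggable(Level.FINE);
    }

    static List<String> run() {
        CollectingHandler handler = new CollectingHandler();
        LOGGER.setUseParentHandlers(false);
        LOGGER.setLevel(Level.FINE);
        LOGGER.addHandler(handler);
        try {
            if (isDebugEnabled(LOGGER)) {
                // Parameters travel separately and are only formatted when published.
                LOGGER.log(Level.FINE, "Parsed {0} records from {1}", new Object[] {42, "example.gb"});
            }
            // FINEST is below the logger's level, so this call is discarded.
            LOGGER.log(Level.FINEST, "Skipped {0}", "ignored.gb");
        } finally {
            LOGGER.removeHandler(handler);
        }
        return handler.messages;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [Parsed 42 records from example.gb]
    }
}
```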

Andy

Mark Schreiber wrote:
> Hi -
> 
> I would like to strongly advocate the liberal and extensive use of
> Logging in BioJava3.  The lack of this plagued us (me at least) during
> bug fixes in previous versions of BioJava.  The default Java logging
> API is very flexible and easily meets our needs. It's also not too
> much effort for developers to put in place (you know you use
> System.out.println() all over the place anyway).
> 
> The following is an example snippet using logging that would certainly
> help debugging.  With the standard logging setup only the severe
> statement would appear on the terminal. We could also provide config
> files that show lower levels of logging so that people can easily
> generate detailed logs to accompany bug reports.  If we want to be
> really tricky we could even use a MemoryLogger that has a rotating
> buffer of log statements that could spit out with a stack trace so you
> could just submit the stack trace and the activity log all in one go
> and we can get an idea of what was going on at the time.
> 
> The example below also shows what to do to avoid a major performance
> hit during logging. The marked "expensive logging operation" pretends
> to get config information by getting it from a database. One might
> expect this to take time while the db connects etc and could produce
> quite a long String of information. To save time when logging is not
> set to the CONFIG level the if statement is able to skip this costly
> step.
> 
> I know from experience we will definitely get the most value from this
> in the IO parsers and ThingBuilders.
> 
> Any thoughts?
> 
> - Mark
> 
> 
> 
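>     // Assumes: import java.util.logging.Level; import java.util.logging.Logger;
>     // getDBConfigInfo() is a placeholder for some costly lookup.
>     private static final Logger logger = Logger.getLogger("org.biojava.MyClass");
> 
>     public Object generateObject(String argument){
>          // pass the class name, not the String form of the Class object
>          logger.entering(getClass().getName(), "generateObject", argument);
> 
>          //expensive logging operation: only build the message if CONFIG is enabled
>          if (logger.isLoggable( Level.CONFIG )) {
>             logger.config("DB config: "+ getDBConfigInfo());
>          }
> 
>          Object obj = null;
>          try{
> 
>             //do some stuff
>             logger.fine("doing stuff");
>             obj = new Object();
> 
>          }catch(Exception ex){
>              // attach the exception so the stack trace is kept at SEVERE too
>              logger.log(Level.SEVERE, "Failed to do stuff", ex);
>              logger.throwing(getClass().getName(), "generateObject", ex);
>          }
> 
>          logger.exiting(getClass().getName(), "generateObject", obj);
>          return obj;
>     }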
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
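The rotating-buffer "MemoryLogger" idea maps closely onto java.util.logging.MemoryHandler, which keeps the last N records in a circular buffer and pushes them to a target handler when a record at or above a push level arrives. A minimal sketch, assuming a 3-record buffer, SEVERE as the trigger, and a hypothetical in-memory ListHandler standing in for a bug-report file:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;
import java.util.logging.MemoryHandler;

// Hypothetical target: collects the raw messages MemoryHandler pushes to it.
class ListHandler extends Handler {
    final List<String> messages = new ArrayList<>();

    @Override public void publish(LogRecord record) { this.messages.add(record.getMessage()); }
    @Override public void flush() {}
    @Override public void close() {}
}

class MemoryLogDemo {
    static List<String> run() {
        Logger logger = Logger.getLogger("org.biojava.MemoryLogDemo");
        logger.setUseParentHandlers(false);
        logger.setLevel(Level.ALL);

        ListHandler target = new ListHandler();
        // Keep only the 3 most recent records; dump them when a SEVERE record arrives.
        MemoryHandler memory = new MemoryHandler(target, 3, Level.SEVERE);
        memory.setLevel(Level.ALL);
        logger.addHandler(memory);
        try {
            logger.fine("step 1");
            logger.fine("step 2");
            logger.fine("step 3");
            logger.fine("step 4");
            logger.severe("Failed to do stuff"); // triggers push() of the buffer
        } finally {
            logger.removeHandler(memory);
        }
        return target.messages;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [step 3, step 4, Failed to do stuff]
    }
}
```

Only the most recent records reach the target, giving the "activity log plus failure" bundle described above; a real setup would use a much larger buffer and a file-backed target.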

From ayates at ebi.ac.uk  Tue Oct 21 04:49:47 2008
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 21 Oct 2008 09:49:47 +0100
Subject: [Biojava-l] File parsing in BJ3
In-Reply-To: 
References: 	<93b45ca50810202016j13a2a2a9y78a2992e543d6f5a@mail.gmail.com>
	
Message-ID: <48FD97AB.70503@ebi.ac.uk>

Depends on what you want to program. If you want to have a collection of
objects which are Things & perform a common action on them then
annotations are not the way forward.

If you want to have some kind of meta-programming occurring & need a
class to be multiple things then annotations are right. There is
currently no way to enforce compile time dependencies on annotations &
my thinking is that this is right. Annotations should be meta data or
provide a way to alter a class in a non-invasive way (think Web Service
annotations creating WS Servers & Clients without any alteration of the
class).
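The trade-off can be shown in a few lines. In this sketch Thing (as an empty interface) and ThingType (as an annotation) are hypothetical stand-ins: the interface can serve as a generic bound, so the compiler rejects non-Things, while the annotation is only visible through reflection at run time and constrains nothing at compile time.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

// Option 1: an empty marker interface.
interface Thing {}

// Option 2: a marker annotation, retained so reflection can see it.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface ThingType {}

class GenbankRecord implements Thing {}

@ThingType
class FastaRecord {}

class MarkerDemo {
    // The interface works as a type bound: non-Things are rejected at compile time.
    static int countThings(List<? extends Thing> things) {
        return things.size();
    }

    // The annotation can only be checked reflectively, at run time.
    static boolean isAnnotatedThing(Object o) {
        return o.getClass().isAnnotationPresent(ThingType.class);
    }

    public static void main(String[] args) {
        List<Thing> things = new ArrayList<>();
        things.add(new GenbankRecord());
        // things.add(new FastaRecord()); // would not compile
        System.out.println(countThings(things));                   // 1
        System.out.println(isAnnotatedThing(new FastaRecord()));   // true
        System.out.println(isAnnotatedThing(new GenbankRecord())); // false
    }
}
```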

Andy

Richard Holland wrote:
> Spot on.
> 
> Annotation/interface... I think Annotation is probably better as you
> suggest, but I'd have to look into that. Not sure how it works with
> collections and generics. If it does turn out to be a better bet, I'll
> change it over.
> 
> With the BioSQL dependencies, take a look at the pom.xml file inside the
> biojava-dna module. It declares a dependency on biojava-core. If you want to
> add dependencies to external JARs, take a look at biojava-biosql's pom.xml
> to see how it depends on javax.persistence. (The easiest way to add these is
> via an IDE such as NetBeans, which is what I'm using at the moment).
> 
> cheers,
> Richard
> 
> 2008/10/21 Mark Schreiber 
> 
>> So if I want to build a BioSQL loader from Genbank then would the
>> classes (or their wrappers) in the BioSQL Entity package need to
>> implement Thing?  Would maven have an issue with that or would it just
>> create a dependency on core? (you can tell I've never used Maven
>> right).
>>
>> From a design point of view should Thing be an interface or an
>> Annotation? The reason I ask is that it doesn't define any methods so
>> it is more of a tag than an interface.
>>
>> Anyway, my understanding is that I would use a Genbank parser (or
>> write one). Write an EntityReceiver interface (probably more than one
>> given the number of entities in BioSQL), implement an EntityBuilder
>> (again possibly more than one) that implements EntityReceiver and
>> builds Entity beans from messages it receives. In this case I probably
>> wouldn't provide a writer as JPA would be writing the beans to the
>> database.  Would this be how you imagine it?
>>
>> - Mark
>>
>>
> 
> 
> 

From holland at eaglegenomics.com  Tue Oct 21 05:06:41 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 21 Oct 2008 10:06:41 +0100
Subject: [Biojava-l] [Biojava-dev] BioJava 3 Begins - Volunteers please!
In-Reply-To: <59a41c430810202017n226327cahefe0ed7e5f6a8df2@mail.gmail.com>
References: 
	<59a41c430810202017n226327cahefe0ed7e5f6a8df2@mail.gmail.com>
Message-ID: 

>
>
> License: Since it seems we will end up copying code from biojava 1.6
> to biojava 3.0, we need to keep the license the same (LGPL 2.1). I.e.
> people should still use the same biojava license headers when
> committing new files and all code will be considered to be LGPL, if no
> header is present. Do NOT commit code under other licenses.
>
> Installation: We need some installation instructions on the wiki site,
> e.g. how to get the maven setup running.  What are the code
> conventions for the new version?


Not sure where best to put it in the Wiki, but I agree it needs to go there
somewhere.

Installation is a one-liner from within the top level of the project:

   mvn install

This compiles and installs the JARs into your local Maven repository, and
also downloads and installs any external dependencies. Then you can add the
installed modules as dependencies in your own Maven projects.

If you need to write a launcher script for your project, or you want to use
the JAR files outside Maven, you can use this command to generate the
CLASSPATH for use outside Maven. This only includes external dependencies -
you'll also need to add to it the individual JAR files from inside the
various target/ folders that Maven built for you:

  mvn dependency:build-classpath

Code conventions are simple:

1. I'm not fussed about the specific formatter people use in each module, as
long as the code is all formatted using some kind of consistent method. I
personally just use the default settings from Format code in NetBeans.

2. Use 'this' wherever possible, and for static references, use the
classname prefix (e.g. MyClass.staticField). I hate having to try and work
out in my head which references are going where, and which are static and
which are not!

3. Comment every single method, even if it's private. This helps others
understand the flow of your code. Also comment liberally inside methods if they
are longer than just a few lines (i.e. if you can't fit the entire method within
the code panel in NetBeans, it's going to need internal comments).

4. When writing getters/setters, follow the Java beans conventions so that
automated frameworks like Spring can easily pick it up and work with it.

5. Please write tests for your code using JUnit conventions, inside the
test/ folder of each module. I know I haven't done this myself yet, but I'm
going to!
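A class following conventions 2 to 4 might look like the sketch below; SequenceComment, its fields and MAX_LENGTH are invented for illustration. The matching JUnit test (convention 5) would go in the module's test folder and is not shown.

```java
/**
 * Hypothetical bean illustrating conventions 2-4: explicit 'this',
 * class-qualified static references, a comment on every method and
 * JavaBeans-style accessors.
 */
class SequenceComment {

    /** Longest comment text accepted (illustrative value). */
    static final int MAX_LENGTH = 80;

    private String text = "";
    private boolean curated;

    /** No-arg constructor, which JavaBeans-based frameworks expect. */
    public SequenceComment() {
    }

    /** Returns the comment text. */
    public String getText() {
        return this.text;
    }

    /** Sets the comment text, truncated to MAX_LENGTH characters. */
    public void setText(String text) {
        this.text = SequenceComment.truncate(text);
    }

    /** Boolean properties use the "is" prefix under JavaBeans rules. */
    public boolean isCurated() {
        return this.curated;
    }

    /** Records whether the comment has been curated. */
    public void setCurated(boolean curated) {
        this.curated = curated;
    }

    /** Private helpers are commented too; null becomes an empty string. */
    private static String truncate(String text) {
        if (text == null) {
            return "";
        }
        return text.length() <= SequenceComment.MAX_LENGTH
                ? text
                : text.substring(0, SequenceComment.MAX_LENGTH);
    }

    /** Tiny usage example. */
    public static void main(String[] args) {
        SequenceComment comment = new SequenceComment();
        comment.setText("Submitted by the sequencing centre");
        comment.setCurated(true);
        System.out.println(comment.getText() + " curated=" + comment.isCurated());
    }
}
```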


>
>
> Blast: the Blast parsing modules are among the most frequently used
> ones in biojava 1.6. To make people use biojava v3 it will be crucial
> to have a port of them to the new version. Does anybody want to take
> care of that?


I'll second that. Blast is vital. We'd really appreciate a volunteer,
please!


>
> Automated builds: is it interesting to have automated builds set up
> for the new version at this stage, or should we wait until a more
> mature stage? I could easily add another auto-build similar to the one
> for biojava 1.6 at http://www.spice-3d.org/cruise/


You could do, although I don't think they'd be much use yet. But why not
start early, so that we won't forget to do it later.


Richard


>
> Andreas
>
> On Sun, Oct 19, 2008 at 5:18 PM, Richard Holland
>  wrote:
> > Hi all,
> >
> > I've just committed some new code to the biojava3 branch of the biojava-live
> > subversion repository. It's the foundation of a brand new alphabet+symbol
> > set of classes, and an example of how to use them to represent DNA. You'll
> > notice that the new code is very lightweight and allows for a lot more
> > flexibility than the old code - for instance, the concept of Alphabet has
> > changed radically. It also makes much more extensive use of the Collections
> > API.
> >
> > I haven't got any test cases or usage examples yet but give me a shout if
> > you don't understand the code and I'll explain how it works. (Hint:
> > SymbolFormat is there to convert Strings into SymbolList objects, and vice
> > versa).
> >
> > So, now we want some volunteers! We're starting from scratch here so there's
> > a lot of work to do. The whole of BioJava needs 'translating' into BJ3,
> > whether that means copying and pasting existing classes and modifying them
> > to suit the new style, or writing completely new ones to provide equivalent
> > functionality.
> >
> > I'll post an example of how to do file parsing soon, probably starting with
> > FASTA. In the meantime, a good place to start would be for people to design
> > object models to represent their favourite data types (e.g. Genbank, or
> > microarray data). Utility classes to manipulate those objects would be great
> > too.
> >
> > The object models need to be normalised as much as possible - e.g. if your
> > data has a lot of comments, and the order of those comments is important,
> > then give your object model a collection of comment objects. The object
> > model for each data type should be completely independent and use basic data
> > types wherever possible (e.g. store sequences as strings, don't attempt to
> > parse them into anything fancy like SymbolLists). The closer the object
> > model is to the original data format, the better. There are going to be
> > clever tricks when it comes to converting data between different object
> > models (e.g. Genbank to INSDSeq), which I will explain later when I put the
> > file parsing examples up.
> >
> > You'll notice how the biojava3 branch uses Maven instead of Ant. This is
> > because we want to make it as modular as possible, so if you want to write
> > microarray stuff, create a new microarray sub-project (as per the dna
> > example that's already there). This way if someone only wants the microarray
> > bit of BJ3, they only need to install the appropriate JAR file and can
> > ignore the rest. (The 'core' module is for stuff that is so generic it could
> > be used anywhere, or is used in every single other module.)
> >
> > If coding isn't your cup of tea, then we would very much welcome testers
> > (particularly those who enjoy writing test cases!), documenters
> > (particularly code commenters), translators (for internationalisation of the
> > code), and of course all those who wish to contribute ideas and suggestions
> > no matter how off-the-wall they might be. In particular if you'd like to
> > take charge of an area of the development process, e.g. Documentation Chief,
> > or Protein Champion, then that would be much appreciated.
> >
> > I'm very much looking forward to working with everyone on this. Good luck,
> > and happy coding!
> >
> > cheers,
> > Richard
> >
> > PS. Please don't forget to attach the appropriate licence to your code. You
> > can copy-and-paste it from the existing classes I just committed this
> > evening.
> >
> > PPS. For those who are worried about backwards compatibility - this was
> > discussed on the lists a while back and it was made clear that BJ3 is a
> > clean break. However, the existing code will continue to be maintained and
> > bugfixed for a couple of years so you don't have to upgrade if you don't
> > want to - it just won't have any new features developed for it. This is
> > largely because it'll probably take just that long to write all the new BJ3
> > code. When we do decide to desupport the existing BJ code, plenty of notice
> > will be given (i.e. years as opposed to months).
> >
> >
> > --
> > Richard Holland, BSc MBCS
> > Finance Director, Eagle Genomics Ltd
> > M: +44 7500 438846 | E: holland at eaglegenomics.com
> > http://www.eaglegenomics.com/
> > _______________________________________________
> > biojava-dev mailing list
> > biojava-dev at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-dev
> >
>
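As a rough illustration of the normalisation advice, a model might keep ordered comments as a list of small comment objects and the sequence as a plain String. GenbankEntry, Comment and their fields are hypothetical names chosen only to show the shape.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// One comment from the record; order matters, so comments live in a List.
class Comment {
    private final String text;

    Comment(String text) { this.text = text; }

    String getText() { return this.text; }
}

// Hypothetical normalised model, kept close to the flat-file layout.
class GenbankEntry {
    private final String accession;
    private final String sequence; // plain String, not parsed into a SymbolList
    private final List<Comment> comments = new ArrayList<>();

    GenbankEntry(String accession, String sequence) {
        this.accession = accession;
        this.sequence = sequence;
    }

    // Appends in file order.
    void addComment(Comment comment) { this.comments.add(comment); }

    // Read-only view so callers cannot reorder or edit the comment list.
    List<Comment> getComments() { return Collections.unmodifiableList(this.comments); }

    String getAccession() { return this.accession; }

    String getSequence() { return this.sequence; }

    public static void main(String[] args) {
        GenbankEntry entry = new GenbankEntry("X12345", "acgtacgt");
        entry.addComment(new Comment("first comment"));
        entry.addComment(new Comment("second comment"));
        for (Comment c : entry.getComments()) {
            System.out.println(c.getText());
        }
    }
}
```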



-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From benn at mpi-cbg.de  Tue Oct 21 05:00:44 2008
From: benn at mpi-cbg.de (Neil Benn)
Date: Tue, 21 Oct 2008 11:00:44 +0200
Subject: [Biojava-l] Logging in BJ3
In-Reply-To: <93b45ca50810202241i767e2a56w2d9b7ede0f895431@mail.gmail.com>
References: <93b45ca50810202241i767e2a56w2d9b7ede0f895431@mail.gmail.com>
Message-ID: <48FD9A3C.20904@mpi-cbg.de>

Hello,

          I'm not sure if I should comment as I have no time to 
contribute LOC but I thought I may as well ;).

Mark Schreiber wrote:
> Hi -
>
> I would like to strongly advocate the liberal and extensive use of
> Logging in BioJava3.  The lack of this plagued us (me at least) during
> bug fixes in previous versions of BioJava.  The default Java logging
> API is very flexible and easily meets our needs. It's also not too
> much effort for developers to put in place (you know you use
> System.out.println() all over the place anyway).
>   
Hmm, that is true, but for total completeness you could use
commons-logging, which is very easy to use and much more flexible as it
can encapsulate other logging mechanisms (including the JDK 1.4 logging
framework). To use it you simply declare a new logger as follows
(MyClass being the class that does the logging):

private static final Log logger = LogFactory.getLog(MyClass.class);

The rest of it works pretty much the same as in Mark's example. If you
dovetail commons-logging with log4j then you'll cover the most common case
of logging used in other frameworks - the config files to set up log4j (XML
and properties files) are well documented all over the web.
> 
>
> I know from experience we will definitely get the most value from this
> in the IO parsers and ThingBuilders.
>
> Any thoughts?
>   
+1
> - Mark


From markjschreiber at gmail.com  Tue Oct 21 05:18:41 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 21 Oct 2008 17:18:41 +0800
Subject: [Biojava-l] Logging in BJ3
In-Reply-To: 
References: <93b45ca50810202241i767e2a56w2d9b7ede0f895431@mail.gmail.com>
	
Message-ID: <93b45ca50810210218n1e2ac06bma211f1541b8be3bb@mail.gmail.com>

For the Entity classes my original thinking was to implement an EJB3
interceptor which logs all method calls. This would be preferable to
putting logging statements in all the classes but I don't know if such
an interceptor will work outside of a container. Does anyone know if
JPA can use an interceptor outside of a container?

Logging for the actual persistence would be via the persistence
provider (Hibernate, Toplink etc).

- Mark

On Tue, Oct 21, 2008 at 5:08 PM, Richard Holland
 wrote:
> Excellent idea. I'll integrate it into ThingParser as an example
>
>
>
>
> --
> Richard Holland, BSc MBCS
> Finance Director, Eagle Genomics Ltd
> M: +44 7500 438846 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>

From ayates at ebi.ac.uk  Tue Oct 21 05:21:26 2008
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 21 Oct 2008 10:21:26 +0100
Subject: [Biojava-l] Logging in BJ3
In-Reply-To: <48FD9A3C.20904@mpi-cbg.de>
References: <93b45ca50810202241i767e2a56w2d9b7ede0f895431@mail.gmail.com>
	<48FD9A3C.20904@mpi-cbg.de>
Message-ID: <48FD9F16.2000405@ebi.ac.uk>

Hi Neil,

That's okay; the more people take an interest in this, the better it will
be. We discussed this quite a while ago at a BioJava meeting & the
general consensus was that bridges can be written manually between the
logging frameworks as and when they are required. Also, using the JDK
logger reduces our external dependencies.

However, I do like the logging facades & am in favour of them, especially
SLF4J, which does the same job as commons-logging but relies on the
presence of an SLF4J adaptor rather than on the raw logging framework, as
commons-logging does. It also has links to many more logging frameworks,
including simple-log (https://simple-log.dev.java.net/) & logback
(http://logback.qos.ch/).

There are just so many options here that it's hard to gauge the best
thing to do. Do we buy into a single framework & use all of its features
(the JDK logger has nice things for logging entering & exiting methods,
along with locale ResourceBundles) or go for a common denominator?

It's not an easy decision to make ........

Andy

Neil Benn wrote:
> Hello,
> 
>          I'm not sure if I should comment as I have no time to
> contribute LOC but I thought I may as well ;).
> 
> Mark Schreiber wrote:
>> Hi -
>>
>> I would like to strongly advocate the liberal and extensive use of
>> Logging in BioJava3.  The lack of this plagued us (me at least) during
>> bug fixes in previous versions of BioJava.  The default Java logging
>> API is very flexible and easily meets our needs. It's also not too
>> much effort for developers to put in place (you know you use
>> System.println() all over the place anyway).
>>   
> Hmm, that is true but for total completeness you can use
> commons-logging, that is very easy to use and much more flexible as it
> can encapsulate other logging mechanisms (including JDK1.4 logging
> framework).  To use it you simply declare a new logger as follows:
> 
> private static final Log logger = LogFactory.getLog( here>);
> 
> The rest of it works pretty much the same as below- if you dovetail
> commons-logging with log4j then you'll cover the most common case of
> logging used in other frameworks - the config files to setup log4j (XML
> and preperties fiels) are well documented all over the web.
>> 
>>
>> I know from experience we will definitely get the most value from this
>> in the IO parsers and ThingBuilders.
>>
>> Any thoughts?
>>   
> +1
>> - Mark
>>
>>
>>
>>     private Logger logger = Logger.getLogger("org.biojava.MyClass");
>>
>>     public Object generateObject(String argument){
>>          logger.entering(""+getClass(), "generateObject", argument);
>>
>>          //expensive logging operation
>>          if (logger.isLoggable( Level.CONFIG )) {
>>             logger.config("DB config: "+ getDBConfigInfo());
>>          }
>>
>>          Object obj = null;
>>          try{
>>
>>             //do some stuff
>>             logger.fine("doing stuff");
>>             obj = new Object();
>>
>>          }catch(Exception ex){
>>              logger.severe("Failed to do stuff");
>>              logger.throwing(getClass().getName(), "generateObject", ex);
>>          }
>>
>>          logger.exiting(getClass().getName(), "generateObject", obj);
>>          return obj;
>>     }
>> _______________________________________________
>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>   
> 

From ayates at ebi.ac.uk  Tue Oct 21 05:23:35 2008
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 21 Oct 2008 10:23:35 +0100
Subject: [Biojava-l] Logging in BJ3
In-Reply-To: <93b45ca50810210218n1e2ac06bma211f1541b8be3bb@mail.gmail.com>
References: <93b45ca50810202241i767e2a56w2d9b7ede0f895431@mail.gmail.com>	
	<93b45ca50810210218n1e2ac06bma211f1541b8be3bb@mail.gmail.com>
Message-ID: <48FD9F97.8010705@ebi.ac.uk>

As far as I'm aware, JPA has no concept of EJB3 interceptors. If you
want that kind of thing, I think you would have to start using AOP or
proxy objects.

Andy
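Here is a minimal sketch of the proxy-object route. It uses only `java.lang.reflect.Proxy` from the JDK, so no container is involved. Every call made through an interface is logged on entry, on exit and on exception. The catch is that JDK proxies only work for interface types. Entity classes would therefore have to be accessed through interfaces (an assumption about the design). Proxying concrete classes would need a bytecode library or AOP. `Greeter` and `SimpleGreeter` are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

public class LoggingProxy implements InvocationHandler {
    private static final Logger LOGGER = Logger.getLogger(LoggingProxy.class.getName());

    private final Object target;
    // In-memory record of calls, kept alongside the log so it can be inspected.
    final List<String> trace = new ArrayList<String>();

    LoggingProxy(Object target) {
        this.target = target;
    }

    // Creates an object implementing iface whose calls all pass through invoke().
    <T> T proxyFor(Class<T> iface) {
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] { iface }, this);
        return iface.cast(proxy);
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        String cls = target.getClass().getName();
        String name = method.getName();
        LOGGER.entering(cls, name, args);
        trace.add("enter " + name);
        try {
            Object result = method.invoke(target, args);
            LOGGER.exiting(cls, name, result);
            trace.add("exit " + name);
            return result;
        } catch (InvocationTargetException e) {
            // Log and rethrow the real exception rather than the reflection wrapper.
            LOGGER.throwing(cls, name, e.getCause());
            trace.add("throw " + name);
            throw e.getCause();
        }
    }

    // Hypothetical interface and implementation, for illustration only.
    interface Greeter {
        String greet(String name);
    }

    static class SimpleGreeter implements Greeter {
        public String greet(String name) {
            return "Hello " + name;
        }
    }

    public static void main(String[] args) {
        LoggingProxy handler = new LoggingProxy(new SimpleGreeter());
        Greeter greeter = handler.proxyFor(Greeter.class);
        System.out.println(greeter.greet("BioJava"));
        System.out.println(handler.trace);
    }
}
```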

Mark Schreiber wrote:
> For the Entity classes my original thinking was to implement an EJB3
> interceptor which logs all method calls. This would be preferable to
> putting logging statements in all the classes but I don't know if such
> an interceptor will work outside of a container. Does anyone know if
> JPA can use an interceptor outside of a container?
> 
> Logging for the actual persistence would be via the persistence
> provider (Hibernate, Toplink etc).
> 
> - Mark
> 
> On Tue, Oct 21, 2008 at 5:08 PM, Richard Holland
>  wrote:
>> Excellent idea. I'll integrate it into ThingParser as an example
>>
>> 2008/10/21 Mark Schreiber 
>>> Hi -
>>>
>>> I would like to strongly advocate the liberal and extensive use of
>>> Logging in BioJava3.  The lack of this plagued us (me at least) during
>>> bug fixes in previous versions of BioJava.  The default Java logging
>>> API is very flexible and easily meets our needs. It's also not too
>>> much effort for developers to put in place (you know you use
>>> System.out.println() all over the place anyway).
>>>
>>> The following is an example snippet using logging that would certainly
>>> help debugging.  With the standard logging setup only the severe
>>> statement would appear on the terminal. We could also provide config
>>> files that enable lower levels of logging so that people can easily
>>> generate detailed logs to accompany bug reports.  If we want to be
>>> really tricky we could even use a MemoryHandler, which keeps a rotating
>>> buffer of log records that can be dumped along with a stack trace, so you
>>> could submit the stack trace and the activity log in one go
>>> and we could get an idea of what was going on at the time.
>>>
>>> The example below also shows how to avoid a major performance
>>> hit during logging. The marked "expensive logging operation" pretends
>>> to get config information from a database. One might
>>> expect this to take time while the db connects etc., and it could produce
>>> quite a long String of information. To save time when logging is not
>>> set to the CONFIG level, the if statement skips this costly
>>> step.
>>>
>>> I know from experience we will definitely get the most value from this
>>> in the IO parsers and ThingBuilders.
>>>
>>> Any thoughts?
>>>
>>> - Mark
>>>
>>>
>>>
>>>    private Logger logger = Logger.getLogger("org.biojava.MyClass");
>>>
>>>    public Object generateObject(String argument){
>>>         logger.entering(getClass().getName(), "generateObject", argument);
>>>
>>>         //expensive logging operation
>>>         if (logger.isLoggable( Level.CONFIG )) {
>>>            logger.config("DB config: "+ getDBConfigInfo());
>>>         }
>>>
>>>         Object obj = null;
>>>         try{
>>>
>>>            //do some stuff
>>>            logger.fine("doing stuff");
>>>            obj = new Object();
>>>
>>>         }catch(Exception ex){
>>>             logger.severe("Failed to do stuff");
>>>             logger.throwing(getClass().getName(), "generateObject", ex);
>>>         }
>>>
>>>         logger.exiting(getClass().getName(), "generateObject", obj);
>>>         return obj;
>>>    }
>>
>>
>> --
>> Richard Holland, BSc MBCS
>> Finance Director, Eagle Genomics Ltd
>> M: +44 7500 438846 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>>

From markjschreiber at gmail.com  Tue Oct 21 05:26:41 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 21 Oct 2008 17:26:41 +0800
Subject: [Biojava-l] [Biojava-dev] BioJava 3 Begins - Volunteers please!
In-Reply-To: 
References: 
	<59a41c430810202017n226327cahefe0ed7e5f6a8df2@mail.gmail.com>
	
Message-ID: <93b45ca50810210226t79cfbcbfhcadaedcfe8735676@mail.gmail.com>

>> Blast: the Blast parsing modules are among the most frequently used
>> ones in biojava 1.6. To make people use biojava v3 it will be crucial
>> to have a port of them to the new version. Does anybody want to take
>> care of that?
>
>
> I'll second that. Blast is vital. We'd really appreciate a volunteer,
> please!
>

BlastXML output would certainly be the easiest place to start. I also
think with the new Thing/ThingBuilder framework it will be possible
to develop all manner of parsers for the vagaries of Blast text output
that come with each new release of Blast. Possible, but maybe not a
good idea. I don't think that output was ever meant to be machine-readable.
The table-formatted output (-m8, I think) would be a better
option.
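To show why the tabular format is easier to consume, here is a sketch that parses lines of legacy `blastall -m 8` output. It assumes the usual 12 tab-separated columns: query id, subject id, % identity, alignment length, mismatches, gap openings, query start/end, subject start/end, e-value and bit score. `-m 9` adds `#` comment lines, which are skipped. `TabularHit` is a hypothetical class, not a BioJava type. The handling of a bare exponent such as `e-100` is purely defensive, in case some BLAST version prints one.

```java
import java.util.ArrayList;
import java.util.List;

public class TabularHit {
    final String queryId;
    final String subjectId;
    final double percentIdentity;
    final int alignmentLength, mismatches, gapOpenings;
    final int queryStart, queryEnd, subjectStart, subjectEnd;
    final double eValue;
    final double bitScore;

    private TabularHit(String[] f) {
        queryId = f[0];
        subjectId = f[1];
        percentIdentity = Double.parseDouble(f[2].trim());
        alignmentLength = Integer.parseInt(f[3].trim());
        mismatches = Integer.parseInt(f[4].trim());
        gapOpenings = Integer.parseInt(f[5].trim());
        queryStart = Integer.parseInt(f[6].trim());
        queryEnd = Integer.parseInt(f[7].trim());
        subjectStart = Integer.parseInt(f[8].trim());
        subjectEnd = Integer.parseInt(f[9].trim());
        eValue = parseEValue(f[10].trim());
        bitScore = Double.parseDouble(f[11].trim());
    }

    // Defensive: accept a bare exponent such as "e-100" by prefixing "1".
    static double parseEValue(String s) {
        return Double.parseDouble(s.startsWith("e") ? "1" + s : s);
    }

    // Parses one tab-separated line that must have exactly 12 columns.
    static TabularHit parse(String line) {
        String[] f = line.split("\t");
        if (f.length != 12) {
            throw new IllegalArgumentException("expected 12 columns but found " + f.length);
        }
        return new TabularHit(f);
    }

    // Parses a whole report, skipping blank lines and '#' comment lines.
    static List<TabularHit> parseAll(List<String> lines) {
        List<TabularHit> hits = new ArrayList<TabularHit>();
        for (String line : lines) {
            if (line.trim().isEmpty() || line.startsWith("#")) {
                continue;
            }
            hits.add(parse(line));
        }
        return hits;
    }

    public static void main(String[] args) {
        TabularHit hit = parse("query1\tsp|P00001|EXAMPLE\t98.50\t200\t3\t0\t1\t200\t5\t204\t1e-50\t 380");
        System.out.println(hit.subjectId + " " + hit.percentIdentity + " " + hit.eValue);
    }
}
```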

Given the DTD, it should be possible to do a quick JAXB binding. How
would that work in the Thing/ThingBuilder paradigm?

- Mark

From markjschreiber at gmail.com  Tue Oct 21 06:35:14 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 21 Oct 2008 18:35:14 +0800
Subject: [Biojava-l] File parsing in BJ3
In-Reply-To: <48FD97AB.70503@ebi.ac.uk>
References: 
	<93b45ca50810202016j13a2a2a9y78a2992e543d6f5a@mail.gmail.com>
	
	<48FD97AB.70503@ebi.ac.uk>
Message-ID: <93b45ca50810210335j5ef4a206y545e5a1869cedc03@mail.gmail.com>

Is there any need for Thing at all? Can't a builder be typed to produce
something that extends Object?

If Thing provides no behaviour contract or meta-information, then why
does it exist?

- Mark
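Here is one way to picture the alternative raised above. A builder interface with a type parameter needs no `Thing` supertype at all. A marker annotation can still carry metadata, but that metadata is checked at runtime through reflection rather than at compile time, which is the trade-off Andy describes below. `Builder`, `Format`, `GenbankRecord` and `GenbankRecordBuilder` are hypothetical names, not the BJ3 classes.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

public class BuilderSketch {
    // A builder typed by its product: T can be any class, no marker supertype needed.
    interface Builder<T> {
        void receive(String key, String value);
        T build();
    }

    // Optional metadata as a runtime-visible annotation instead of a marker interface.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Format {
        String value();
    }

    @Format("genbank")
    static class GenbankRecord {
        final String accession;
        GenbankRecord(String accession) { this.accession = accession; }
    }

    static class GenbankRecordBuilder implements Builder<GenbankRecord> {
        private String accession;
        public void receive(String key, String value) {
            if ("ACCESSION".equals(key)) {
                accession = value;
            }
        }
        public GenbankRecord build() { return new GenbankRecord(accession); }
    }

    // Annotations are only inspected at runtime; the compiler cannot require
    // that a builder's product carries @Format.
    static String formatOf(Object o) {
        Format f = o.getClass().getAnnotation(Format.class);
        return f == null ? null : f.value();
    }

    public static void main(String[] args) {
        Builder<GenbankRecord> builder = new GenbankRecordBuilder();
        builder.receive("ACCESSION", "NC_000001");
        // The collection is typed by the concrete product; no Thing interface is involved.
        List<GenbankRecord> records = new ArrayList<GenbankRecord>();
        records.add(builder.build());
        System.out.println(records.get(0).accession + " " + formatOf(records.get(0)));
    }
}
```

If code needs to hold mixed record types in one collection and call a common method on them, neither a type parameter nor an annotation helps. An interface that declares that method would.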

On Tue, Oct 21, 2008 at 4:49 PM, Andy Yates  wrote:
> Depends on what you want to program. If you want to have a collection of
> objects which are Things & perform a common action on them then
> annotations are not the way forward.
>
> If you want to have some kind of meta-programming occurring & need a
> class to be multiple things then annotations are right. There is
> currently no way to enforce compile time dependencies on annotations &
> my thinking is that this is right. Annotations should be meta data or
> provide a way to alter a class in a non-invasive way (think Web Service
> annotations creating WS Servers & Clients without any alteration of the
> class).
>
> Andy
>
> Richard Holland wrote:
>> Spot on.
>>
>> Annotation/interface... I think Annotation is probably better, as you
>> suggest, but I'd have to look into that. Not sure how it works with
>> collections and generics. If it does turn out to be a better bet, I'll
>> change it over.
>>
>> With the BioSQL dependencies, take a look at the pom.xml file inside the
>> biojava-dna module. It declares a dependency on biojava-core. If you want to
>> add dependencies to external JARs, take a look at biojava-biosql's pom.xml
>> to see how it depends on javax.persistence. (The easiest way to add these is
>> via an IDE such as NetBeans, which is what I'm using at the moment).
>>
>> cheers,
>> Richard
>>
>> 2008/10/21 Mark Schreiber 
>>
>>> So if I want to build a BioSQL loader from Genbank, would the
>>> classes (or their wrappers) in the BioSQL Entity package need to
>>> implement Thing?  Would Maven have an issue with that, or would it just
>>> create a dependency on core? (You can tell I've never used Maven,
>>> right?)
>>>
>>> From a design point of view, should Thing be an interface or an
>>> Annotation? The reason I ask is that it doesn't define any methods, so
>>> it is more of a tag than an interface.
>>>
>>> Anyway, my understanding is that I would use a Genbank parser (or
>>> write one), write an EntityReceiver interface (probably more than one,
>>> given the number of entities in BioSQL), and implement an EntityBuilder
>>> (again, possibly more than one) that implements EntityReceiver and
>>> builds Entity beans from the messages it receives. In this case I probably
>>> wouldn't provide a writer, as JPA would be writing the beans to the
>>> database.  Would this be how you imagine it?
>>>
>>> - Mark
>>>
>>>
>>> On Tue, Oct 21, 2008 at 1:52 AM, Richard Holland
>>>  wrote:
>>>> (From now on I will only be posting these development messages to
>>>> biojava-dev, which is the intended purpose of that list. Those of you who
>>>> wish to keep track of things but are currently only subscribed to
>>>> biojava-l should also subscribe to biojava-dev in order to keep up to date.)
>>>>
>>>> As promised, I've committed a new package in the biojava-core module that
>>>> should help you understand how to do file parsing, conversion and writing in
>>>> the new BJ3 modules. Here's an example of how to use it to write a Genbank
>>>> parser (note that no parsers actually exist yet!):
>>>>
>>>> 1. Design yourself a Genbank class which implements the interface Thing and
>>>> can fully represent all the data that might possibly occur inside a Genbank
>>>> file.
>>>>
>>>> 2. Write an interface called GenbankReceiver, which extends ThingReceiver
>>>> and defines all the methods you might need in order to construct a Genbank
>>>> object in an asynchronous fashion.
>>>>
>>>> 3. Write a GenbankBuilder class which implements GenbankReceiver and
>>>> ThingBuilder. Its job is to receive data via method calls, use that data to
>>>> construct a Genbank object, then provide that object on demand.
>>>>
>>>> 4. Write a GenbankWriter class which implements GenbankReceiver and
>>>> ThingWriter. Its job is similar to GenbankBuilder's, but instead of
>>>> constructing new Genbank objects, it writes Genbank records to file that
>>>> reflect the data it receives.
>>>>
>>>> 5. Write a GenbankReader class which implements ThingReader. It can read
>>>> Genbank files and output the data to the methods of the ThingReceiver
>>>> provided to it, which in this case could be anything which implements the
>>>> interface GenbankReceiver.
>>>>
>>>> 6. Write a GenbankEmitter class which implements ThingEmitter. It takes a
>>>> Genbank object and will fire off data from it to the provided ThingReceiver
>>>> (a GenbankReceiver instance) as if the Genbank object was being read from a
>>>> file or some other source.
>>>>
>>>> That's it! OK, so it's a minimum of 6 classes instead of the original 1 or 2,
>>>> but the additional steps are necessary for flexibility in converting between
>>>> formats.
>>>>
>>>> Now to use it (you'll probably want a GenbankTools class to wrap these steps
>>>> up for user-friendliness, including various options for opening files,
>>>> etc.):
>>>>
>>>> 1. To read a file - instantiate ThingParser with your GenbankReader as the
>>>> reader, and a GenbankBuilder as the receiver. Use the iterator methods on
>>>> ThingParser to get the objects out.
>>>>
>>>> 2. To write a file - instantiate ThingParser with a GenbankEmitter wrapping
>>>> your Genbank object, and a GenbankWriter as the receiver. Use the parseAll()
>>>> method on the ThingParser to dump the whole lot to your chosen output.
>>>>
>>>> The clever bit comes when you want to convert between files. Imagine you've
>>>> done all the above for Genbank, and you've also done it for FASTA. How do you
>>>> convert between them? What you need to do is this:
>>>>
>>>> 1. Implement all the classes for both Genbank and FASTA.
>>>>
>>>> 2. Write a GenbankFASTAConverter class that implements ThingConverter
>>>> and GenbankReceiver, and will internally convert the data received and
>>>> pass it on to the receiver provided, which will be a FASTAReceiver
>>>> instance.
>>>>
>>>> 3. Write a FASTAGenbankConverter class that operates in exactly the
>>>> opposite way, implementing ThingConverter and FASTAReceiver.
>>>>
>>>> Then to convert you use ThingParser again:
>>>>
>>>> 1. From FASTA file to Genbank object: Instantiate ThingParser with a
>>>> FASTAReader reader, a GenbankBuilder receiver, and add a
>>>> FASTAGenbankConverter instance to the converter chain. Use the iterator to
>>>> get your Genbank objects out of your FASTA file.
>>>>
>>>> 2. From FASTA file to Genbank file: Same as option 1, but provide a
>>>> GenbankWriter instead and use parseAll() instead of the iterator methods.
>>>>
>>>> 3. From FASTA object to Genbank object: Same as option 1, but provide a
>>>> FASTAEmitter wrapping your FASTA object as the reader instead.
>>>>
>>>> 4. From FASTA object to Genbank file: Same as option 1, but swap both the
>>>> reader and the receiver as per options 2 and 3.
>>>>
>>>> 5/6/7/8. From Genbank * to FASTA * - same as 1, 2, 3, 4 but swap all mentions
>>>> of FASTA and Genbank, and use GenbankFASTAConverter instead.
>>>>
>>>> One last and very important feature of this approach is that if you discover
>>>> that nobody has written the appropriate converter for your chosen pair of
>>>> formats A and C, but converters do exist to map A to some other format B and
>>>> that other format B on to C, then you can just put the two converters A-B and
>>>> B-C into the ThingParser chain and it'll work perfectly.
>>>>
>>>> Enjoy!
>>>>
>>>> cheers,
>>>> Richard
>>>>
>>>> --
>>>> Richard Holland, BSc MBCS
>>>> Finance Director, Eagle Genomics Ltd
>>>> M: +44 7500 438846 | E: holland at eaglegenomics.com
>>>> http://www.eaglegenomics.com/
>>>>
>>
>>
>>
>
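The reader, converter and receiver roles described in the quoted message can be sketched in plain Java. This is a deliberately simplified toy under stated assumptions. `FastaReceiver`, `GenbankReceiver`, `GenbankBuilder`, `FastaWriter`, the two converters and `readFasta` are hypothetical stand-ins for the ThingReader, ThingConverter, ThingBuilder and ThingWriter roles. They are not the biojava-core interfaces, whose signatures are not shown in this thread. Here each converter simply wraps the next receiver, whereas in the described design converters are added to ThingParser's chain.

```java
import java.util.ArrayList;
import java.util.List;

public class ConverterChainSketch {
    // Receiver roles: each format defines the events needed to build it.
    interface FastaReceiver {
        void header(String header);
        void sequence(String residues);
    }

    interface GenbankReceiver {
        void accession(String accession);
        void definition(String definition);
        void origin(String residues);
    }

    // Builder role: collects GenbankReceiver events into an object.
    static class GenbankBuilder implements GenbankReceiver {
        String accession;
        String definition;
        final StringBuilder residues = new StringBuilder();
        public void accession(String a) { accession = a; }
        public void definition(String d) { definition = d; }
        public void origin(String r) { residues.append(r); }
    }

    // Writer role: turns FastaReceiver events straight into FASTA text.
    static class FastaWriter implements FastaReceiver {
        final StringBuilder out = new StringBuilder();
        public void header(String h) { out.append('>').append(h).append('\n'); }
        public void sequence(String r) { out.append(r).append('\n'); }
    }

    // Converter role (FASTA to GenBank): receives FASTA events, re-emits GenBank events.
    static class FastaToGenbank implements FastaReceiver {
        private final GenbankReceiver next;
        FastaToGenbank(GenbankReceiver next) { this.next = next; }
        public void header(String h) {
            // Toy assumption: first word is the accession, the rest is the definition.
            String[] parts = h.split(" ", 2);
            next.accession(parts[0]);
            next.definition(parts.length > 1 ? parts[1] : "");
        }
        public void sequence(String r) { next.origin(r); }
    }

    // Converter role in the opposite direction (GenBank to FASTA).
    static class GenbankToFasta implements GenbankReceiver {
        private final FastaReceiver next;
        private String accession;
        GenbankToFasta(FastaReceiver next) { this.next = next; }
        public void accession(String a) { accession = a; }
        public void definition(String d) { next.header(d.isEmpty() ? accession : accession + " " + d); }
        public void origin(String r) { next.sequence(r); }
    }

    // Reader role: parses FASTA lines and fires events at whatever receiver it is given.
    static void readFasta(List<String> lines, FastaReceiver receiver) {
        for (String line : lines) {
            if (line.startsWith(">")) {
                receiver.header(line.substring(1).trim());
            } else if (!line.trim().isEmpty()) {
                receiver.sequence(line.trim());
            }
        }
    }

    public static void main(String[] args) {
        List<String> fasta = new ArrayList<String>();
        fasta.add(">NC_000001 example sequence");
        fasta.add("ACGT");
        fasta.add("TTGA");

        // FASTA source to GenBank object: reader, one converter, builder.
        GenbankBuilder builder = new GenbankBuilder();
        readFasta(fasta, new FastaToGenbank(builder));
        System.out.println(builder.accession + " " + builder.residues);

        // Chaining two converters (FASTA to GenBank to FASTA) still ends at a FASTA writer.
        FastaWriter writer = new FastaWriter();
        readFasta(fasta, new FastaToGenbank(new GenbankToFasta(writer)));
        System.out.print(writer.out);
    }
}
```

Because each converter only sees the receiver interface of the next stage, an A-B converter followed by a B-C converter composes the same way as the round trip above.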

From augustovmail-java at yahoo.com.br  Tue Oct 21 07:45:41 2008
From: augustovmail-java at yahoo.com.br (Augusto Fernandes Vellozo)
Date: Tue, 21 Oct 2008 13:45:41 +0200
Subject: [Biojava-l] SimpleRichAnnotation
In-Reply-To: <381a3e850810210421u54058163ncf347b57394af1b2@mail.gmail.com>
References: <381a3e850810210421u54058163ncf347b57394af1b2@mail.gmail.com>
Message-ID: <381a3e850810210445sc801d40ja36655349b5920b9@mail.gmail.com>

Hi everyone,

I am having problems with the class SimpleRichAnnotation.
I have a term t from ontology o, and I added a note n (with term t)
to a SimpleRichAnnotation object a, but when I call
a.getProperties(t) it doesn't return note n.
Looking at the BioJava code, I saw that getProperties imports
term t into the default ontology before doing the search. Because of
this, it doesn't return the correct note.

Does anyone know why this method changes the ontology?

Thanks,
--
Augusto F. Vellozo

From charles at imbusch.net  Tue Oct 21 10:00:45 2008
From: charles at imbusch.net (Charles Imbusch)
Date: Tue, 21 Oct 2008 16:00:45 +0200
Subject: [Biojava-l] parsing tblastn results
In-Reply-To: 
References: <48F50908.5060307@imbusch.net>	
	
Message-ID: <48FDE08D.8000300@imbusch.net>

Thank you David and Richard for the quick replies.
I downloaded two files from 
http://bugzilla.open-bio.org/show_bug.cgi?id=2603
and tried to apply the patches. I suppose that's the way to get the modified
BlastSAXParser.java.

charlie at custodian:~/biojava-live_1.6$ patch -p0 < BlastSAXParser.java.patch
(Stripping trailing CRs from patch.)
patching file src/org/biojava/bio/program/sax/BlastSAXParser.java
Hunk #1 FAILED at 60.
Hunk #2 FAILED at 631.
Hunk #3 FAILED at 643.
Hunk #4 FAILED at 650.
4 out of 4 hunks FAILED -- saving rejects to file src/org/biojava/bio/program/sax/BlastSAXParser.java.rej

and similar for the other file

charlie at custodian:~/biojava-live_1.6$ patch -p0 < HitSectionSAXParser.java.patch
(Stripping trailing CRs from patch.)
patching file src/org/biojava/bio/program/sax/HitSectionSAXParser.java
Hunk #1 FAILED at 41.
Hunk #2 FAILED at 65.
Hunk #3 FAILED at 96.
Hunk #4 FAILED at 515.
Hunk #5 FAILED at 524.
5 out of 5 hunks FAILED -- saving rejects to file src/org/biojava/bio/program/sax/HitSectionSAXParser.java.rej

Obviously something went wrong, but I couldn't figure out what. I 
uploaded the rej files to
http://charles.imbusch.net/tmp/

Any hint is appreciated.

cheers,
  Charles

From crackeur at comcast.net  Tue Oct 21 22:21:57 2008
From: crackeur at comcast.net (jimmy Zhang)
Date: Tue, 21 Oct 2008 19:21:57 -0700
Subject: [Biojava-l] [ANN] VTD-XML extended edition released
References: <59a41c430810202017n226327cahefe0ed7e5f6a8df2@mail.gmail.com>
	<93b45ca50810210226t79cfbcbfhcadaedcfe8735676@mail.gmail.com>
Message-ID: <009401c933ec$f572a700$0402a8c0@your55e5f9e3d2>

The Java version of extended VTD-XML is released and available for download.
This version supports files of up to 256 GB and memory-mapped capabilities.
The updated documentation is also available for download. In short, you can
run full XPath queries on documents that are bigger than the memory
available on your machine.

A special thanks to Duane May, who provided valuable suggestions and input and
helped refine the VTD specs to make this happen.

To download the package and the documentation, go to
https://sourceforge.net/project/downloading.php?group_id=110612&use_mirror=&filename=vtd-xml_2.4_doc.zip&64621261

https://sourceforge.net/project/downloading.php?group_id=110612&use_mirror=&filename=ximpleware_extended_2.4.zip&99532507



From pzgyuanf at gmail.com  Sat Oct 25 20:57:16 2008
From: pzgyuanf at gmail.com (pprun)
Date: Sun, 26 Oct 2008 08:57:16 +0800
Subject: [Biojava-l] Test failed for Alphabet.getSymbolMatchType method
Message-ID: 

Hi,
The current implementation uses the same condition, equalsIgnoreCase, for both
EXACT_STRING_MATCH and MIXED_CASE_MATCH, so the MIXED_CASE_MATCH branch can never be reached:


    public SymbolMatchType getSymbolMatchType(Symbol a, Symbol b) {
       ...
        if (a.toString().equalsIgnoreCase(b.toString())) {
            return SymbolMatchType.EXACT_STRING_MATCH;
        }
        if (a.toString().equalsIgnoreCase(b.toString())) {
            return SymbolMatchType.MIXED_CASE_MATCH;
        }
          ...

String.equals should be used for EXACT_STRING_MATCH:

    public SymbolMatchType getSymbolMatchType(Symbol a, Symbol b) {
        ...
        if (a.toString().equals(b.toString())) {
            return SymbolMatchType.EXACT_STRING_MATCH;
        }
        if (a.toString().equalsIgnoreCase(b.toString())) {
            return SymbolMatchType.MIXED_CASE_MATCH;
        }
          ...

The test case used to identify the above bug is:

/*
 *                    BioJava development code
 *
 * This code may be freely distributed and modified under the
 * terms of the GNU Lesser General Public Licence.  This should
 * be distributed with the code.  If you do not have a copy,
 * see:
 *
 *      http://www.gnu.org/copyleft/lesser.html
 *
 * Copyright for this code is held jointly by the individual
 * authors.  These should be listed in @author doc comments.
 *
 * For more information on the BioJava project and its aims,
 * or to join the biojava-l mailing list, visit the home page
 * at:
 *
 *      http://www.biojava.org/
 *
 */
package org.biojava.core.symbol;

import org.junit.After;
import org.junit.AfterClass;
import org.junit.Before;
import org.junit.BeforeClass;
import org.junit.Test;
import static org.junit.Assert.*;

/**
 *
 * @author pprun
 */
public class AlphabetTest {

    public AlphabetTest() {
    }

    @BeforeClass
    public static void setUpClass() throws Exception {
    }

    @AfterClass
    public static void tearDownClass() throws Exception {
    }

    @Before
    public void setUp() {
    }

    @After
    public void tearDown() {
    }

    /**
     * Test of getSymbolMatchType method, of class Alphabet.
     */
    @Test
    public void testGetSymbolMatchType() {
        System.out.println("getSymbolMatchType");

        Alphabet testAlphabet = new Alphabet("testGetSymbolMatchType");

        // 1. exact match
        Symbol a = Symbol.get("ATGC");
        Symbol b = Symbol.get("ATGC");
        SymbolMatchType expResult = SymbolMatchType.EXACT_MATCH;
        SymbolMatchType result = testAlphabet.getSymbolMatchType(a, b);
        assertEquals(expResult, result);

        // 2. mixed case match
        a = Symbol.get("ATGC");
        b = Symbol.get("aTGC");
        expResult = SymbolMatchType.MIXED_CASE_MATCH;
        result = testAlphabet.getSymbolMatchType(a, b);
        assertEquals(expResult, result);
    }
}
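Here is a standalone sketch of the ordering argument, independent of BioJava. The hypothetical `MatchType` enum and `classify` methods only mirror the names in the report. The real method's elided checks (the `...` above) are not reproduced. An exact comparison is a special case of a case-insensitive one, so the exact test has to come first, or the mixed-case branch can never fire.

```java
public class MatchOrderSketch {
    enum MatchType { EXACT_STRING_MATCH, MIXED_CASE_MATCH, NO_MATCH }

    // Reported ordering: both branches use equalsIgnoreCase, so the second is unreachable.
    static MatchType classifyBuggy(String a, String b) {
        if (a.equalsIgnoreCase(b)) return MatchType.EXACT_STRING_MATCH;
        if (a.equalsIgnoreCase(b)) return MatchType.MIXED_CASE_MATCH;
        return MatchType.NO_MATCH;
    }

    // Proposed ordering: the strict test first, then the looser one.
    static MatchType classify(String a, String b) {
        if (a.equals(b)) return MatchType.EXACT_STRING_MATCH;
        if (a.equalsIgnoreCase(b)) return MatchType.MIXED_CASE_MATCH;
        return MatchType.NO_MATCH;
    }

    public static void main(String[] args) {
        System.out.println(classifyBuggy("ATGC", "aTGC")); // EXACT_STRING_MATCH (wrong)
        System.out.println(classify("ATGC", "aTGC"));      // MIXED_CASE_MATCH
        System.out.println(classify("ATGC", "ATGC"));      // EXACT_STRING_MATCH
    }
}
```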


BTW, how can I get a dev/test role?
Then I could contribute to BJ3 development, or to testing (as I'm still a
beginner in the bio field).

Thanks,
Pprun


From gabrielle_doan at gmx.net  Mon Oct 27 08:57:03 2008
From: gabrielle_doan at gmx.net (Gabrielle Doan)
Date: Mon, 27 Oct 2008 13:57:03 +0100
Subject: [Biojava-l] differences between read in sequence and stored
	sequence in database
Message-ID: <4905BA9F.1060400@gmx.net>

Hi all,

I have a BioSQL database which contains all human chromosomes. For my
recent project I have to query for a part of a sequence.
As far as I know, I can get the whole sequence from the entry
Biosequence.Seq in the BioSQL schema. So I've made this query:

SELECT SUBSTRING(bs.seq, 131615042, 131626262) FROM biosequence bs;

But this query didn't yield the desired string, because the length of
this biosequence is only 100,000,020 bp. I am very confused about why there
is such a discrepancy. I added all chromosomes to the database with the
built-in BioJava method addRichSequence(RichSequence seq).
From my raw data I know that this sequence should have a length of
140,279,252 bp. So where is the remaining part of my sequence? I have
observed these discrepancies on all chromosomes which are longer than
100,000,020 bp.

Here is an excerpt of my database:
bioentry_id	description	length
2	Homo sapiens mitochondrion, complete genome.	16571
3	Homo sapiens chromosome Y, reference assembly, complete sequence.	57772954
4	Homo sapiens chromosome X, reference assembly, complete sequence.	100000020
5	Homo sapiens chromosome 22, reference assembly, complete sequence.	49691432
6	Homo sapiens chromosome 21, reference assembly, complete sequence.	46944323
7	Homo sapiens chromosome 20, reference assembly, complete sequence.	25960004
8	Homo sapiens chromosome 9, reference assembly, complete sequence.	100000020
9	Homo sapiens chromosome 7, reference assembly, complete sequence.	100000020

Sequences smaller than 100,000,020 bp are correctly stored under
Biosequence.seq.

I am grateful for any hints that explain the behaviour of my database.

Cheers,

Gabrielle

From gabrielle_doan at gmx.net  Tue Oct 28 10:26:47 2008
From: gabrielle_doan at gmx.net (Gabrielle Doan)
Date: Tue, 28 Oct 2008 15:26:47 +0100
Subject: [Biojava-l] differences between read in sequence and stored
	sequence in database]
Message-ID: <49072127.7010304@gmx.net>

Hi all,
concerning the problem described below, I have found out that this
problem also occurred in BioRuby and was fixed in 2004.
See: 
http://cvs.biojava.org/cgi-bin/viewcvs/viewcvs.cgi/bioruby/lib/bio/db.rb?cvsroot=bioruby
Unfortunately I'm clueless about BioRuby. Does anybody recognize this 
problem or understand how it was solved in BioRuby?

I am grateful for any hints.

Cheers,

Gabrielle


-------- Original Message --------
Subject: [Biojava-l] differences between read in sequence and stored
sequence in database
Date: Mon, 27 Oct 2008 13:57:03 +0100
From: Gabrielle Doan
To: biojava-l at biojava.org

Hi all,

I have a BioSQL database which contains all human chromosomes. For my
recent project I have to query for a part of a sequence.
As far as I know, I can get the whole sequence from the entry
Biosequence.Seq in the BioSQL schema. So I've made this query:

SELECT SUBSTRING(bs.seq, 131615042, 131626262) FROM biosequence bs;

But this query didn't yield the desired string, because the length of
this biosequence is only 100,000,020 bp. I am very confused about why there
is such a discrepancy. I added all chromosomes to the database with the
built-in BioJava method addRichSequence(RichSequence seq).
From my raw data I know that this sequence should have a length of
140,279,252 bp. So where is the remaining part of my sequence? I have
observed these discrepancies on all chromosomes which are longer than
100,000,020 bp.

Here is an excerpt of my database:
bioentry_id	description	length
2	Homo sapiens mitochondrion, complete genome.	16571
3	Homo sapiens chromosome Y, reference assembly, complete sequence.	57772954
4	Homo sapiens chromosome X, reference assembly, complete sequence.	100000020
5	Homo sapiens chromosome 22, reference assembly, complete sequence.	49691432
6	Homo sapiens chromosome 21, reference assembly, complete sequence.	46944323
7	Homo sapiens chromosome 20, reference assembly, complete sequence.	25960004
8	Homo sapiens chromosome 9, reference assembly, complete sequence.	100000020
9	Homo sapiens chromosome 7, reference assembly, complete sequence.	100000020

Sequences smaller than 100,000,020 bp are correctly stored under
Biosequence.seq.

I am grateful for any hints that explain the behaviour of my database.

Cheers,

Gabrielle


From dtoomey at rcsi.ie  Wed Oct 29 06:45:45 2008
From: dtoomey at rcsi.ie (David Toomey)
Date: Wed, 29 Oct 2008 10:45:45 +0000
Subject: [Biojava-l] How to get full query description from blast result
Message-ID: 

Hi

I am parsing blast results and I need to get the complete query description line but I can only work out how to get the first part of the line. So for example in the blast result query

Query= sp|Q8I5D2|ABRA_PLAF7 101 kDa malaria antigen OS=Plasmodium
falciparum (isolate 3D7) GN=ABRA

I need to get all of the description above but I can only seem to retrieve the first part 'sp|Q8I5D2|ABRA_PLAF7' which I get from the queryId property of the annotation

Can anyone point me in the right direction for retrieving the complete query description?

Thanks

Dave
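One workaround, while waiting for an answer about the parser internals, is to pull the description straight from the text report. The sketch below is an assumption-laden helper, not a BioJava API. It joins the `Query=` line with its continuation lines and stops at a blank line or at the line that ends the query definition: `(NNN letters)` in legacy blastall reports, `Length=NNN` in BLAST+ reports. Check that these terminators match the BLAST version you are using. The letter count in the sample input is made up.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class QueryDescription {
    // Line that ends the query definition: "(1,234 letters)" (legacy) or "Length=1234" (BLAST+).
    private static final Pattern TERMINATOR =
            Pattern.compile("^\\s*(\\(\\s*[\\d,]+\\s+letters\\)|Length=\\d+).*");

    // Returns the full text after each "Query=", with wrapped lines joined by single spaces.
    static List<String> queryDescriptions(BufferedReader in) throws IOException {
        List<String> result = new ArrayList<String>();
        StringBuilder current = null;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.startsWith("Query=")) {
                current = new StringBuilder(line.substring("Query=".length()).trim());
            } else if (current != null) {
                if (line.trim().isEmpty() || TERMINATOR.matcher(line).matches()) {
                    result.add(current.toString());
                    current = null;
                } else {
                    current.append(' ').append(line.trim());
                }
            }
        }
        if (current != null) {
            result.add(current.toString());
        }
        return result;
    }

    // Convenience overload for in-memory reports; StringReader does not actually fail.
    static List<String> queryDescriptions(String report) {
        try {
            return queryDescriptions(new BufferedReader(new StringReader(report)));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String report = "Query= sp|Q8I5D2|ABRA_PLAF7 101 kDa malaria antigen OS=Plasmodium\n"
                + "falciparum (isolate 3D7) GN=ABRA\n"
                + "         (100 letters)\n";
        System.out.println(queryDescriptions(report));
    }
}
```

`BufferedReader.readLine` accepts `\n`, `\r\n` and `\r` line endings, so reports produced on a different OS should read the same way.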



From holland at eaglegenomics.com  Thu Oct 30 10:07:42 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Thu, 30 Oct 2008 14:07:42 +0000
Subject: [Biojava-l] differences between read in sequence and stored
	sequence in database]
In-Reply-To: <49072127.7010304@gmx.net>
References: <49072127.7010304@gmx.net>
Message-ID: 

Hello.

Sorry for the delayed reply - I've been away on business all week.

The similar Ruby issue (and solution) is discussed here:

http://portal.open-bio.org/pipermail/bioruby/2004-March.txt

How did you parse the files in the first place? Did you use the new
GenBank parsers (BJX), or the older ones? This will help indicate
where the problem lies - the data will have been truncated at the
point it was parsed from file, so the data in your database will
reflect this and you'll have to reload it once the appropriate parser
has been fixed.

If it was the newer BJX parser, then the problem most probably lies in
this regex from org.biojavax.bio.seq.io.GenbankFormat, which can
probably be fixed in a similar manner to the Ruby equivalent discussed
in the posting above:

    protected static final Pattern sectp =
        Pattern.compile("^(\\s{0,8}(\\S+)\\s{1,7}(.*)|\\s{21}(/\\S+?)=(.*)|\\s{21}(/\\S+))$");

Could someone volunteer to develop and test a fix? If you come up with
something, please commit it to the SVN trunk.
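Here is a plausible reading of the 100,000,020 figure. It is offered as a hypothesis, not a confirmed diagnosis. GenBank ORIGIN lines carry 60 bases each, prefixed by the start position right-justified in 9 columns. 100,000,020 = 60 × 1,666,667, so the next line would start at 100,000,021. That is the first coordinate with 9 digits, and therefore the first line with no leading space. If the parser treats any line beginning in column 1 as the start of a new section, sequence reading would stop exactly there. That would also explain why chromosomes X, 9 and 7 all stop at the same length.

The sketch below checks that arithmetic and shows how the quoted `sectp` pattern sees such a line. `startsNewSectionNaive` and `startsNewSectionFixed` are hypothetical rules for illustration only, not GenbankFormat's actual control flow.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OriginLineSketch {
    // The pattern quoted from GenbankFormat in this thread.
    static final Pattern SECTP = Pattern.compile(
            "^(\\s{0,8}(\\S+)\\s{1,7}(.*)|\\s{21}(/\\S+?)=(.*)|\\s{21}(/\\S+))$");

    // Formats an ORIGIN line: start position right-justified in 9 columns, then bases.
    static String originLine(long start, String bases) {
        return String.format("%9d %s", start, bases);
    }

    // Hypothetical naive rule: anything starting in column 1 begins a new section.
    static boolean startsNewSectionNaive(String line) {
        return !line.isEmpty() && !Character.isWhitespace(line.charAt(0));
    }

    // Hypothetical fix: a line whose first token is all digits is sequence data.
    static boolean startsNewSectionFixed(String line) {
        if (!startsNewSectionNaive(line)) {
            return false;
        }
        Matcher m = SECTP.matcher(line);
        boolean numericFirstToken = m.matches() && m.group(2) != null && m.group(2).matches("\\d+");
        return !numericFirstToken;
    }

    public static void main(String[] args) {
        long basesPerLine = 60;
        long lastKept = 100000020L;
        System.out.println("remainder: " + (lastKept % basesPerLine)); // 0, a line boundary
        String before = originLine(lastKept - basesPerLine + 1, "acgtacgtac");
        String after = originLine(lastKept + 1, "acgtacgtac");
        System.out.println("[" + before + "]");
        System.out.println("[" + after + "]");
        System.out.println(startsNewSectionNaive(after) + " " + startsNewSectionFixed(after));
    }
}
```

If this is the cause, data loaded with the affected parser would need reloading after a fix, as noted above.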

cheers,
Richard


2008/10/28 Gabrielle Doan :
> Hi all,
> concerning the problem described below, I have found out that this problem
> also occurred in BioRuby and was fixed in 2004.
> See:
> http://cvs.biojava.org/cgi-bin/viewcvs/viewcvs.cgi/bioruby/lib/bio/db.rb?cvsroot=bioruby
> Unfortunately I'm clueless about BioRuby. Does anybody recognize this
> problem or understand how it was solved in BioRuby?
>
> I am grateful for any hints.
>
> Cheers,
>
> Gabrielle
>
>
> -------- Original Message --------
> Subject: [Biojava-l] differences between read in sequence and stored
> sequence in database
> Date: Mon, 27 Oct 2008 13:57:03 +0100
> From: Gabrielle Doan
> To: biojava-l at biojava.org
>
> Hi all,
>
> I have a BioSQL database which contains all human chromosomes. For my
> recent project I have to query for a part of a sequence.
> As far as I know, I can get the whole sequence from the entry
> Biosequence.Seq in the BioSQL schema. So I've made this query:
>
> SELECT SUBSTRING(bs.seq, 131615042, 131626262) FROM biosequence bs;
>
> But this query didn't yield the desired string, because the length of
> this biosequence is only 100,000,020 bp. I am very confused about why there
> is such a discrepancy. I added all chromosomes to the database with the
> built-in BioJava method addRichSequence(RichSequence seq).
> From my raw data I know that this sequence should have a length of
> 140,279,252 bp. So where is the remaining part of my sequence? I have
> observed these discrepancies on all chromosomes which are longer than
> 100,000,020 bp.
>
> Here is an excerpt of my database:
> bioentry_id     description     length
> 2       Homo sapiens mitochondrion, complete genome.    16571
> 3       Homo sapiens chromosome Y, reference assembly, complete sequence.    57772954
> 4       Homo sapiens chromosome X, reference assembly, complete sequence.    100000020
> 5       Homo sapiens chromosome 22, reference assembly, complete sequence.    49691432
> 6       Homo sapiens chromosome 21, reference assembly, complete sequence.    46944323
> 7       Homo sapiens chromosome 20, reference assembly, complete sequence.    25960004
> 8       Homo sapiens chromosome 9, reference assembly, complete sequence.    100000020
> 9       Homo sapiens chromosome 7, reference assembly, complete sequence.    100000020
>
> Sequences smaller than 100,000,020 bp are correctly stored under
> Biosequence.seq.
>
> I am grateful for any hints that explain the behaviour of my database.
>
> Cheers,
>
> Gabrielle
>



-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From holland at eaglegenomics.com  Thu Oct 30 10:10:12 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Thu, 30 Oct 2008 14:10:12 +0000
Subject: [Biojava-l] How to get full query description from blast result
In-Reply-To: 
References: 
Message-ID: 

Good question!

Can someone who knows a lot about the blast parser internals provide
David with an answer to his question?

cheers,
Richard

2008/10/29 David Toomey :
> Hi
>
> I am parsing blast results and I need to get the complete query description line but I can only work out how to get the first part of the line. So for example in the blast result query
>
> Query= sp|Q8I5D2|ABRA_PLAF7 101 kDa malaria antigen OS=Plasmodium
> falciparum (isolate 3D7) GN=ABRA
>
> I need to get all of the description above but I can only seem to retrieve the first part 'sp|Q8I5D2|ABRA_PLAF7' which I get from the queryId property of the annotation
>
> Can anyone point me in the right direction for retrieving the complete query description?
>
> Thanks
>
> Dave
>
>
>



-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From markjschreiber at gmail.com  Fri Oct 31 03:26:35 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Fri, 31 Oct 2008 15:26:35 +0800
Subject: [Biojava-l] differences between read in sequence and stored
	sequence in database
In-Reply-To: <4905BA9F.1060400@gmx.net>
References: <4905BA9F.1060400@gmx.net>
Message-ID: <93b45ca50810310026o6ee35a61sf2815c3547e1e679@mail.gmail.com>

Could this be a database implementation issue? Is there a limit on how
long a field can be in your DB?

- Mark

On Mon, Oct 27, 2008 at 8:57 PM, Gabrielle Doan  wrote:
>
> Hi all,
>
> I have a BioSQL database which contains all human chromosomes. For my current project I have to query for part of a sequence.
> As far as I know I can get the whole sequence from the entry Biosequence.Seq in the BioSQL schema. So I've made this query:
>
> SELECT SUBSTRING(bs.seq, 131615042, 131626262) FROM biosequence bs;
>
> But this query didn't yield the desired string, because the length of this biosequence is only 100,000,020 bp. I am very confused about this discrepancy. I added all chromosomes to the database with BioJava's built-in method addRichSequence(RichSequence seq). From my raw data I know that this sequence should have a length of 140,279,252 bp. So where is the remaining part of my sequence? I have observed this discrepancy on all chromosomes longer than 100,000,020 bp.
>
> Here is an abstract of my database:
> bioentry_id     description     length
> 2       Homo sapiens mitochondrion, complete genome.    16571
> 3       Homo sapiens chromosome Y, reference assembly, complete sequence. 57772954
> 4       Homo sapiens chromosome X, reference assembly, complete sequence. 100000020
> 5       Homo sapiens chromosome 22, reference assembly, complete sequence. 49691432
> 6       Homo sapiens chromosome 21, reference assembly, complete sequence. 46944323
> 7       Homo sapiens chromosome 20, reference assembly, complete sequence. 25960004
> 8       Homo sapiens chromosome 9, reference assembly, complete sequence. 100000020
> 9       Homo sapiens chromosome 7, reference assembly, complete sequence. 100000020
>
> Sequences smaller than 100,000,020 bp are correctly stored under Biosequence.seq.
>
> I am grateful for any hints, which explain the behaviour of my database.
>
> Cheers,
>
> Gabrielle
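Separately from the truncation, the SUBSTRING call in the query quoted above deserves a second look. In MySQL (which the query syntax suggests; other engines have similar but not identical functions), SUBSTRING(str, pos, len) takes a 1-based start position and a length, not an end position, so the query asks for 131,626,262 characters starting at 131,615,042. Without a WHERE clause it also returns one row per stored biosequence. A minimal sketch, assuming the wanted region is 131,615,042..131,626,262 inclusive; RegionQuery and its methods are hypothetical helpers, not BioJava API:

```java
// Hypothetical helpers (not BioJava API) for pulling a region out of a stored sequence.
class RegionQuery {

    /** Length of the 1-based, inclusive region start..end; this is SUBSTRING's third argument. */
    static long regionLength(long start, long end) {
        if (start < 1 || end < start) {
            throw new IllegalArgumentException("expected 1 <= start <= end");
        }
        return end - start + 1;
    }

    /** Parameterised MySQL query: bind start, regionLength(start, end) and the bioentry_id. */
    static String buildSql() {
        return "SELECT SUBSTRING(bs.seq, ?, ?) FROM biosequence bs WHERE bs.bioentry_id = ?";
    }

    /** The same 1-based, inclusive extraction on an in-memory string. */
    static String region(String seq, int start, int end) {
        // String.substring is 0-based and end-exclusive, so only the start needs shifting.
        return seq.substring(start - 1, end);
    }

    public static void main(String[] args) {
        long start = 131615042L;
        long end = 131626262L;
        System.out.println("pos=" + start + " len=" + regionLength(start, end)); // pos=131615042 len=11221
        System.out.println(region("ACGTACGTAC", 3, 6));                         // GTAC
    }
}
```

With JDBC, the three placeholders would be bound on a PreparedStatement with setLong (or setInt). None of this helps until the stored sequence itself is complete.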

From markjschreiber at gmail.com  Fri Oct 31 04:00:35 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Fri, 31 Oct 2008 16:00:35 +0800
Subject: [Biojava-l] How to get full query description from blast result
In-Reply-To: 
References: 
	
Message-ID: <93b45ca50810310100w5e922161iaf79469050afbc3c@mail.gmail.com>

Hi -

If you use the BlastEcho program on the cookbook pages you can find
out if and how the information is being parsed and where it goes.
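The cookbook's BlastEcho itself is not reproduced here. As a rough sketch of the same idea, the handler below uses only the standard org.xml.sax API and prints every element, attribute and text chunk it receives, so a field that never shows up in the echo is simply not being reported. BioJava's BLAST parsers deliver their results as SAX events, so a handler like this could in principle be passed to the parser's setContentHandler. EchoHandler is a hypothetical name, and the demo input is a made-up XML string, not BLAST output.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical echo handler: writes out every element, attribute and text chunk it is sent. */
class EchoHandler extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    private int depth = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        indent();
        out.append('<').append(qName).append('>');
        for (int i = 0; i < atts.getLength(); i++) {
            out.append(' ').append(atts.getQName(i)).append("=\"").append(atts.getValue(i)).append('"');
        }
        out.append('\n');
        depth++;
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String text = new String(ch, start, length).trim();
        if (text.length() > 0) {
            indent();
            out.append("text: ").append(text).append('\n');
        }
    }

    private void indent() {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
    }

    /** Parses an XML string with the JDK's SAX parser and returns the echoed events. */
    static String echo(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            EchoHandler handler = new EchoHandler();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up element names; a BLAST parser would emit its own.
        System.out.print(echo("<example id=\"sp|Q8I5D2|ABRA_PLAF7\"><note>101 kDa malaria antigen</note></example>"));
    }
}
```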

It is possible it is not parsed. In this case you could add a feature request.

- Mark

On Thu, Oct 30, 2008 at 10:10 PM, Richard Holland
 wrote:
>
> Good question!
>
> Can someone who knows a lot about the blast parser internals provide
> David with an answer to his question?
>
> cheers,
> Richard
>
> 2008/10/29 David Toomey :
> > Hi
> >
> > I am parsing blast results and I need to get the complete query description line but I can only work out how to get the first part of the line. So for example in the blast result query
> >
> > Query= sp|Q8I5D2|ABRA_PLAF7 101 kDa malaria antigen OS=Plasmodium
> > falciparum (isolate 3D7) GN=ABRA
> >
> > I need to get all of the description above but I can only seem to retrieve the first part 'sp|Q8I5D2|ABRA_PLAF7' which I get from the queryId property of the annotation
> >
> > Can anyone point me in the right direction for retrieving the complete query description?
> >
> > Thanks
> >
> > Dave
> >
> >
> >
>
>
>
> --
> Richard Holland, BSc MBCS
> Finance Director, Eagle Genomics Ltd
> M: +44 7500 438846 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/

From community at struck.lu  Fri Oct 31 06:05:00 2008
From: community at struck.lu (community at struck.lu)
Date: Fri, 31 Oct 2008 11:05:00 +0100
Subject: [Biojava-l] SCF: support for ambiguities
Message-ID: 

Hello,


I am using the SCF class in the context of HIV-1 population sequencing. In
this context we do have sometimes ambiguous base calls. To support them I
extended the SCF class to allow for IUPAC ambiguities up to 2 nucleotides.

Therefore I simply added the following code to the "decode" function:

#########################
        public Symbol decode(byte call) throws IllegalSymbolException {

            //get the DNA Alphabet
            Alphabet dna = DNATools.getDNA();

            char c = (char) call;
            switch (c) {
                case 'a':
                case 'A':
                    return DNATools.a();
                case 'c':
                case 'C':
                    return DNATools.c();
                case 'g':
                case 'G':
                    return DNATools.g();
                case 't':
                case 'T':
                    return DNATools.t();
                case 'n':
                case 'N':
                    return DNATools.n();
                case '-':
                    return DNATools.getDNA().getGapSymbol();
                case 'w':
                case 'W':
                    //make the 'W' symbol
                    Set symbolsThatMakeW = new HashSet();
                    symbolsThatMakeW.add(DNATools.a());
                    symbolsThatMakeW.add(DNATools.t());
                    Symbol w = dna.getAmbiguity(symbolsThatMakeW);
                    return w;
                case 's':
                case 'S':
                    //make the 'S' symbol
                    Set symbolsThatMakeS = new HashSet();
                    symbolsThatMakeS.add(DNATools.c());
                    symbolsThatMakeS.add(DNATools.g());
                    Symbol s = dna.getAmbiguity(symbolsThatMakeS);
                    return s;
... (and so on)
#########################

Is this the right way to do it? And if so, how can this code be submitted to
the official biojava source code?


Best regards,
Daniel Struck
_________________________________________________________
Mail sent using root eSolutions Webmailer - www.root.lu



From dtoomey at rcsi.ie  Fri Oct 31 08:07:19 2008
From: dtoomey at rcsi.ie (David Toomey)
Date: Fri, 31 Oct 2008 12:07:19 +0000
Subject: [Biojava-l] How to get full query description from blast result
In-Reply-To: <93b45ca50810310100w5e922161iaf79469050afbc3c@mail.gmail.com>
References: 
	
	<93b45ca50810310100w5e922161iaf79469050afbc3c@mail.gmail.com>
Message-ID: 

Hi Mark

I tried that and it appears that it is not being parsed. Only the portion of the line up to the first space is returned as queryId. The rest of the line is not returned.
Could this be added to the blast parser?

Cheers

Dave


-----Original Message-----
From: Mark Schreiber [mailto:markjschreiber at gmail.com]
Sent: 31 October 2008 08:01
To: holland at eaglegenomics.com
Cc: David Toomey; biojava-l at biojava.org
Subject: Re: [Biojava-l] How to get full query description from blast result

Hi -

If you use the BlastEcho program on the cookbook pages you can find
out if and how the information is being parsed and where it goes.

It is possible it is not parsed. In this case you could add a feature request.

- Mark

On Thu, Oct 30, 2008 at 10:10 PM, Richard Holland
 wrote:
>
> Good question!
>
> Can someone who knows a lot about the blast parser internals provide
> David with an answer to his question?
>
> cheers,
> Richard
>
> 2008/10/29 David Toomey :
> > Hi
> >
> > I am parsing blast results and I need to get the complete query description line but I can only work out how to get the first part of the line. So for example in the blast result query
> >
> > Query= sp|Q8I5D2|ABRA_PLAF7 101 kDa malaria antigen OS=Plasmodium
> > falciparum (isolate 3D7) GN=ABRA
> >
> > I need to get all of the description above but I can only seem to retrieve the first part 'sp|Q8I5D2|ABRA_PLAF7' which I get from the queryId property of the annotation
> >
> > Can anyone point me in the right direction for retrieving the complete query description?
> >
> > Thanks
> >
> > Dave
> >
> >
> >
>
>
>
> --
> Richard Holland, BSc MBCS
> Finance Director, Eagle Genomics Ltd
> M: +44 7500 438846 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/


From simon.foote at nrc-cnrc.gc.ca  Fri Oct 31 07:56:30 2008
From: simon.foote at nrc-cnrc.gc.ca (Simon Foote)
Date: Fri, 31 Oct 2008 07:56:30 -0400
Subject: [Biojava-l] How to get full query description from blast result
In-Reply-To: <93b45ca50810310100w5e922161iaf79469050afbc3c@mail.gmail.com>
References: 
	<93b45ca50810310100w5e922161iaf79469050afbc3c@mail.gmail.com>
Message-ID: <490AF26E.7000604@nrc-cnrc.gc.ca>

Mark is right.
A quick look at the code shows that for the query line, it extracts
everything up to the first whitespace and puts that into the queryId;
everything else is discarded.
To get the full description, some additional code is needed to populate
a queryDescription with everything from the query line up to the query
length information which is contained in parentheses.
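A minimal sketch of that extra step, assuming legacy plain-text BLAST output in which the Query= definition may wrap onto following lines and is ended by a length line of the form "(N letters)". The regex for that line is an assumption, since the wording differs between BLAST versions. The length marker has to be recognised specifically, because the description itself can contain parentheses, as in "(isolate 3D7)". QueryLineSketch and parseQuery are hypothetical names; this is not the BioJava parser code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: split a (possibly wrapped) "Query=" block into id and description. */
class QueryLineSketch {
    // Assumed legacy-BLAST length line, e.g. "         (1,234 letters)".
    private static final Pattern LENGTH = Pattern.compile("\\(\\s*[\\d,]+\\s+letters\\)");

    /** Returns {queryId, queryDescription}. */
    static String[] parseQuery(List<String> lines) {
        StringBuilder joined = new StringBuilder();
        for (String line : lines) {
            Matcher m = LENGTH.matcher(line);
            if (m.find()) {
                // Keep any text before the length marker, then stop reading.
                joined.append(' ').append(line.substring(0, m.start()));
                break;
            }
            joined.append(' ').append(line);
        }
        // Undo the line wrapping: collapse all runs of whitespace to single spaces.
        String text = joined.toString().replaceAll("\\s+", " ").trim();
        if (!text.startsWith("Query=")) {
            throw new IllegalArgumentException("not a Query= block: " + text);
        }
        text = text.substring("Query=".length()).trim();
        // queryId: up to the first whitespace (what the parser keeps today); description: the rest.
        String[] parts = text.split(" ", 2);
        return new String[] {parts[0], parts.length > 1 ? parts[1] : ""};
    }

    public static void main(String[] args) {
        List<String> block = Arrays.asList(
                "Query= sp|Q8I5D2|ABRA_PLAF7 101 kDa malaria antigen OS=Plasmodium",
                "falciparum (isolate 3D7) GN=ABRA",
                "         (100 letters)"); // length value made up for the example
        String[] r = parseQuery(block);
        System.out.println("queryId=" + r[0]);
        System.out.println("queryDescription=" + r[1]);
    }
}
```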

Simon

Bioinformatics Specialist
Institute for Biological Sciences | Institut des sciences biologiques
National Research Council of Canada | Conseil national de recherches Canada
Ottawa, Canada K1A 0R6
Telephone | Téléphone 613-990-3600 / Facsimile | Télécopieur 613-990-9092
Government of Canada | Gouvernement du Canada



Mark Schreiber wrote:
>
> Hi -
>
> If you use the BlastEcho program on the cookbook pages you can find
> out if and how the information is being parsed and where it goes.
>
> It is possible it is not parsed. In this case you could add a feature request.
>
> - Mark
>
> On Thu, Oct 30, 2008 at 10:10 PM, Richard Holland
>  wrote:
> >
> > Good question!
> >
> > Can someone who knows a lot about the blast parser internals provide
> > David with an answer to his question?
> >
> > cheers,
> > Richard
> >
> > 2008/10/29 David Toomey :
> > > Hi
> > >
> > > I am parsing blast results and I need to get the complete query description line but I can only work out how to get the first part of the line. So for example in the blast result query
> > >
> > > Query= sp|Q8I5D2|ABRA_PLAF7 101 kDa malaria antigen OS=Plasmodium
> > > falciparum (isolate 3D7) GN=ABRA
> > >
> > > I need to get all of the description above but I can only seem to retrieve the first part 'sp|Q8I5D2|ABRA_PLAF7' which I get from the queryId property of the annotation
> > >
> > > Can anyone point me in the right direction for retrieving the complete query description?
> > >
> > > Thanks
> > >
> > > Dave
> > >
> > >
> > >
> >
> >
> >
> > --
> > Richard Holland, BSc MBCS
> > Finance Director, Eagle Genomics Ltd
> > M: +44 7500 438846 | E: holland at eaglegenomics.com
> > http://www.eaglegenomics.com/
>

From benb at fruitfly.org  Fri Oct 31 09:38:32 2008
From: benb at fruitfly.org (Ben Berman)
Date: Fri, 31 Oct 2008 06:38:32 -0700
Subject: [Biojava-l] SCF: support for ambiguities
In-Reply-To: 
References: 
Message-ID: 


Is there a reason why IUPAC ambiguity codes have never been added to  
DNATools?  Would it hurt the performance of symbol lookups?


On Oct 31, 2008, at 3:05 AM, community at struck.lu wrote:

> Hello,
>
>
> I am using the SCF class in the context of HIV-1 population sequencing. In
> this context we do have sometimes ambiguous base calls. To support them I
> extended the SCF class to allow for IUPAC ambiguities up to 2 nucleotides.
>
> Therefore I simply added the following code to the "decode" function:
>
> #########################
>        public Symbol decode(byte call) throws IllegalSymbolException {
>
>            //get the DNA Alphabet
>            Alphabet dna = DNATools.getDNA();
>
>            char c = (char) call;
>            switch (c) {
>                case 'a':
>                case 'A':
>                    return DNATools.a();
>                case 'c':
>                case 'C':
>                    return DNATools.c();
>                case 'g':
>                case 'G':
>                    return DNATools.g();
>                case 't':
>                case 'T':
>                    return DNATools.t();
>                case 'n':
>                case 'N':
>                    return DNATools.n();
>                case '-':
>                    return DNATools.getDNA().getGapSymbol();
>                case 'w':
>                case 'W':
>                    //make the 'W' symbol
>                    Set symbolsThatMakeW = new HashSet();
>                    symbolsThatMakeW.add(DNATools.a());
>                    symbolsThatMakeW.add(DNATools.t());
>                    Symbol w = dna.getAmbiguity(symbolsThatMakeW);
>                    return w;
>                case 's':
>                case 'S':
>                    //make the 'S' symbol
>                    Set symbolsThatMakeS = new HashSet();
>                    symbolsThatMakeS.add(DNATools.c());
>                    symbolsThatMakeS.add(DNATools.g());
>                    Symbol s = dna.getAmbiguity(symbolsThatMakeS);
>                    return s;
> ... (and so on)
> #########################
>
> Is this the right way to do it? And if so, how can this code be submitted to
> the official biojava source code?
>
>
> Best regards,
> Daniel Struck
>
>
>

----
Ben Berman, PhD
Research Associate, USC Epigenome Center
Harlyne J. Norris Research Tower
1450 Biggy St.
Room #G511, MC 9601
Los Angeles, CA 90033


From holland at eaglegenomics.com  Fri Oct 31 09:56:54 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 31 Oct 2008 13:56:54 +0000
Subject: [Biojava-l] SCF: support for ambiguities
In-Reply-To: 
References: 
Message-ID: 

It is the correct method, yes.

However, your code constructs a new hash set every time it does the
check for W, S, etc. It would be much more efficient to create
class-static references to the ambiguity symbols you need, instead of
(re)creating them every time they're encountered. A class-static gap
symbol reference would also be good in this situation.
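A library-free sketch of that caching: every IUPAC nucleotide code is resolved once, in a static initialiser, so decode() is a single map lookup with no per-call HashSet. To stay self-contained it maps each code to a set of base characters; in the SCF decoder each entry would instead hold the Symbol obtained once from dna.getAmbiguity(...) (and the alphabet's gap symbol for '-'), and the method would keep throwing IllegalSymbolException. IupacDecoder is a hypothetical name, and the table covers the standard IUPAC codes, including the three-base ones, not only the two-base codes in the original patch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical decoder: IUPAC code -> cached, immutable set of bases. */
class IupacDecoder {
    // Built once when the class loads; decode() never allocates.
    private static final Map<Character, Set<Character>> CODES = new HashMap<Character, Set<Character>>();

    static {
        put('A', "A"); put('C', "C"); put('G', "G"); put('T', "T");
        put('R', "AG"); put('Y', "CT"); put('S', "CG"); put('W', "AT");
        put('K', "GT"); put('M', "AC");
        put('B', "CGT"); put('D', "AGT"); put('H', "ACT"); put('V', "ACG");
        put('N', "ACGT");
        put('-', ""); // gap: matches no base (a convention of this sketch)
    }

    /** Registers a code under both its upper- and lower-case form, sharing one set. */
    private static void put(char code, String bases) {
        Set<Character> set = new LinkedHashSet<Character>();
        for (char b : bases.toCharArray()) {
            set.add(b);
        }
        Set<Character> frozen = Collections.unmodifiableSet(set);
        CODES.put(code, frozen);
        CODES.put(Character.toLowerCase(code), frozen);
    }

    /** Returns the cached base set for an SCF base call, mirroring the original switch. */
    static Set<Character> decode(byte call) {
        Set<Character> bases = CODES.get((char) call);
        if (bases == null) {
            throw new IllegalArgumentException("Not an IUPAC nucleotide code: " + (char) call);
        }
        return bases;
    }
}
```

Because the lower- and upper-case keys share one cached set, decode((byte) 's') and decode((byte) 'S') return the same object.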

cheers,
Richard



2008/10/31 community at struck.lu :
> Hello,
>
>
> I am using the SCF class in the context of HIV-1 population sequencing. In
> this context we do have sometimes ambiguous base calls. To support them I
> extended the SCF class to allow for IUPAC ambiguities up to 2 nucleotides.
>
> Therefore I simply added the following code to the "decode" function:
>
> #########################
>        public Symbol decode(byte call) throws IllegalSymbolException {
>
>            //get the DNA Alphabet
>            Alphabet dna = DNATools.getDNA();
>
>            char c = (char) call;
>            switch (c) {
>                case 'a':
>                case 'A':
>                    return DNATools.a();
>                case 'c':
>                case 'C':
>                    return DNATools.c();
>                case 'g':
>                case 'G':
>                    return DNATools.g();
>                case 't':
>                case 'T':
>                    return DNATools.t();
>                case 'n':
>                case 'N':
>                    return DNATools.n();
>                case '-':
>                    return DNATools.getDNA().getGapSymbol();
>                case 'w':
>                case 'W':
>                    //make the 'W' symbol
>                    Set symbolsThatMakeW = new HashSet();
>                    symbolsThatMakeW.add(DNATools.a());
>                    symbolsThatMakeW.add(DNATools.t());
>                    Symbol w = dna.getAmbiguity(symbolsThatMakeW);
>                    return w;
>                case 's':
>                case 'S':
>                    //make the 'S' symbol
>                    Set symbolsThatMakeS = new HashSet();
>                    symbolsThatMakeS.add(DNATools.c());
>                    symbolsThatMakeS.add(DNATools.g());
>                    Symbol s = dna.getAmbiguity(symbolsThatMakeS);
>                    return s;
> ... (and so on)
> #########################
>
> Is this the right way to do it? And if so, how can this code be submitted to
> the official biojava source code?
>
>
> Best regards,
> Daniel Struck
>
>
>



-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From holland at eaglegenomics.com  Fri Oct 31 10:40:10 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 31 Oct 2008 14:40:10 +0000
Subject: [Biojava-l] SCF: support for ambiguities
In-Reply-To: 
References: 
	
Message-ID: 

It would be fine to add them there too. You'd still need to modify the
SCF parser though in order for it to be able to know about them.

cheers,
Richard

2008/10/31 Ben Berman :
>
> Is there a reason why IUPAC ambiguity codes have never been added to
> DNATools?  Would it hurt the performance of symbol lookups?
>
>
> On Oct 31, 2008, at 3:05 AM, community at struck.lu wrote:
>
>> Hello,
>>
>>
>> I am using the SCF class in the context of HIV-1 population sequencing. In
>> this context we do have sometimes ambiguous base calls. To support them I
>> extended the SCF class to allow for IUPAC ambiguities up to 2 nucleotides.
>>
>> Therefore I simply added the following code to the "decode" function:
>>
>> #########################
>>       public Symbol decode(byte call) throws IllegalSymbolException {
>>
>>           //get the DNA Alphabet
>>           Alphabet dna = DNATools.getDNA();
>>
>>           char c = (char) call;
>>           switch (c) {
>>               case 'a':
>>               case 'A':
>>                   return DNATools.a();
>>               case 'c':
>>               case 'C':
>>                   return DNATools.c();
>>               case 'g':
>>               case 'G':
>>                   return DNATools.g();
>>               case 't':
>>               case 'T':
>>                   return DNATools.t();
>>               case 'n':
>>               case 'N':
>>                   return DNATools.n();
>>               case '-':
>>                   return DNATools.getDNA().getGapSymbol();
>>               case 'w':
>>               case 'W':
>>                   //make the 'W' symbol
>>                   Set symbolsThatMakeW = new HashSet();
>>                   symbolsThatMakeW.add(DNATools.a());
>>                   symbolsThatMakeW.add(DNATools.t());
>>                   Symbol w = dna.getAmbiguity(symbolsThatMakeW);
>>                   return w;
>>               case 's':
>>               case 'S':
>>                   //make the 'S' symbol
>>                   Set symbolsThatMakeS = new HashSet();
>>                   symbolsThatMakeS.add(DNATools.c());
>>                   symbolsThatMakeS.add(DNATools.g());
>>                   Symbol s = dna.getAmbiguity(symbolsThatMakeS);
>>                   return s;
>> ... (and so on)
>> #########################
>>
>> Is this the right way to do it? And if so, how can this code be submitted to
>> the official biojava source code?
>>
>>
>> Best regards,
>> Daniel Struck
>>
>>
>>
>
> ----
> Ben Berman, PhD
> Research Associate, USC Epigenome Center
> Harlyne J. Norris Research Tower
> 1450 Biggy St.
> Room #G511, MC 9601
> Los Angeles, CA 90033
>
>



-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From community at struck.lu  Fri Oct 31 12:06:45 2008
From: community at struck.lu (community at struck.lu)
Date: Fri, 31 Oct 2008 17:06:45 +0100
Subject: [Biojava-l] SCF: support for ambiguities
Message-ID: 

True. It was a first quick and dirty hack to get the rest of my project going.

I think adding support for the IUPAC ambiguities to DNATools would be the most
appropriate solution. The SCF class could then easily be adapted.

Are there any plans to do so?
If not, I could give it a try and submit a patch for DNATools and SCF.

Greetings,
Daniel

"Richard Holland"  wrote:

> It is the correct method, yes.
> 
> However, your code constructs a new hash set every time it does the
> check for W, S, etc. It would be much more efficient to create
> class-static references to the ambiguity symbols you need, instead of
> (re)creating them every time they're encountered. A class-static gap
> symbol reference would also be good in this situation.
> 
> cheers,
> Richard
> 
> 
> 
> 2008/10/31 community at struck.lu :
> > Hello,
> >
> >
> > I am using the SCF class in the context of HIV-1 population sequencing. In
> > this context we do have sometimes ambiguous base calls. To support them I
> > extended the SCF class to allow for IUPAC ambiguities up to 2 nucleotides.
> >
> > Therefore I simply added the following code to the "decode" function:
> >
> > #########################
> >        public Symbol decode(byte call) throws IllegalSymbolException {
> >
> >            //get the DNA Alphabet
> >            Alphabet dna = DNATools.getDNA();
> >
> >            char c = (char) call;
> >            switch (c) {
> >                case 'a':
> >                case 'A':
> >                    return DNATools.a();
> >                case 'c':
> >                case 'C':
> >                    return DNATools.c();
> >                case 'g':
> >                case 'G':
> >                    return DNATools.g();
> >                case 't':
> >                case 'T':
> >                    return DNATools.t();
> >                case 'n':
> >                case 'N':
> >                    return DNATools.n();
> >                case '-':
> >                    return DNATools.getDNA().getGapSymbol();
> >                case 'w':
> >                case 'W':
> >                    //make the 'W' symbol
> >                    Set symbolsThatMakeW = new HashSet();
> >                    symbolsThatMakeW.add(DNATools.a());
> >                    symbolsThatMakeW.add(DNATools.t());
> >                    Symbol w = dna.getAmbiguity(symbolsThatMakeW);
> >                    return w;
> >                case 's':
> >                case 'S':
> >                    //make the 'S' symbol
> >                    Set symbolsThatMakeS = new HashSet();
> >                    symbolsThatMakeS.add(DNATools.c());
> >                    symbolsThatMakeS.add(DNATools.g());
> >                    Symbol s = dna.getAmbiguity(symbolsThatMakeS);
> >                    return s;
> > ... (and so on)
> > #########################
> >
> > Is this the right way to do it? And if so, how can this code be submitted to
> > the official biojava source code?
> >
> >
> > Best regards,
> > Daniel Struck
> >
> >
> >
> 
> 





From holland at eaglegenomics.com  Fri Oct 31 12:14:30 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 31 Oct 2008 16:14:30 +0000
Subject: [Biojava-l] SCF: support for ambiguities
In-Reply-To: 
References: 
Message-ID: 

A patch would be much appreciated!

cheers,
Richard

2008/10/31 community at struck.lu :
> True. It was a first quick and dirty hack to get the rest of my project going.
>
> I think adding support for the IUPAC ambiguities to DNATools would be the most
> appropriate solution. The SCF class could then easily be adapted.
>
> Are there any plans to do so?
> If not, I could give it a try and submit a patch for DNATools and SCF.
>
> Greetings,
> Daniel
>
> "Richard Holland"  wrote:
>
>> It is the correct method, yes.
>>
>> However, your code constructs a new hash set every time it does the
>> check for W, S, etc. It would be much more efficient to create
>> class-static references to the ambiguity symbols you need, instead of
>> (re)creating them every time they're encountered. A class-static gap
>> symbol reference would also be good in this situation.
>>
>> cheers,
>> Richard
>>
>>
>>
>> 2008/10/31 community at struck.lu :
>> > Hello,
>> >
>> >
>> > I am using the SCF class in the context of HIV-1 population sequencing. In
>> > this context we do have sometimes ambiguous base calls. To support them I
>> > extended the SCF class to allow for IUPAC ambiguities up to 2 nucleotides.
>> >
>> > Therefore I simply added the following code to the "decode" function:
>> >
>> > #########################
>> >        public Symbol decode(byte call) throws IllegalSymbolException {
>> >
>> >            //get the DNA Alphabet
>> >            Alphabet dna = DNATools.getDNA();
>> >
>> >            char c = (char) call;
>> >            switch (c) {
>> >                case 'a':
>> >                case 'A':
>> >                    return DNATools.a();
>> >                case 'c':
>> >                case 'C':
>> >                    return DNATools.c();
>> >                case 'g':
>> >                case 'G':
>> >                    return DNATools.g();
>> >                case 't':
>> >                case 'T':
>> >                    return DNATools.t();
>> >                case 'n':
>> >                case 'N':
>> >                    return DNATools.n();
>> >                case '-':
>> >                    return DNATools.getDNA().getGapSymbol();
>> >                case 'w':
>> >                case 'W':
>> >                    //make the 'W' symbol
>> >                    Set symbolsThatMakeW = new HashSet();
>> >                    symbolsThatMakeW.add(DNATools.a());
>> >                    symbolsThatMakeW.add(DNATools.t());
>> >                    Symbol w = dna.getAmbiguity(symbolsThatMakeW);
>> >                    return w;
>> >                case 's':
>> >                case 'S':
>> >                    //make the 'S' symbol
>> >                    Set symbolsThatMakeS = new HashSet();
>> >                    symbolsThatMakeS.add(DNATools.c());
>> >                    symbolsThatMakeS.add(DNATools.g());
>> >                    Symbol s = dna.getAmbiguity(symbolsThatMakeS);
>> >                    return s;
>> > ... (and so on)
>> > #########################
>> >
>> > Is this the right way to do it? And if so, how can this code be submitted to
>> > the official biojava source code?
>> >
>> >
>> > Best regards,
>> > Daniel Struck
>> >
>> >
>> >
>>
>>
>
>
>
>
>



-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From gabrielle_doan at gmx.net  Fri Oct 31 11:09:56 2008
From: gabrielle_doan at gmx.net (Gabrielle Doan)
Date: Fri, 31 Oct 2008 15:09:56 -0000
Subject: [Biojava-l] differences between read in sequence and stored
 sequence in database]
In-Reply-To: 
References: <49072127.7010304@gmx.net>
	
Message-ID: <490B1FB3.7010607@gmx.net>

Hi all,
I've changed the regular expression in 
org.biojavax.bio.seq.io.GenbankFormat from


protected static final Pattern sectp =
Pattern.compile("^(\\s{0,8}(\\S+)\\s{1,7}(.*)|\\s{21}(/\\S+?)=(.*)|\\s{21}(/\\S+))$");

to


protected static final Pattern sectp =
Pattern.compile("^(\\s{0,8}([A-Za-z]+)\\s{1,7}(.*)|\\s{21}(/\\S+?)=(.*)|\\s{21}(/\\S+))$");

like in BioRuby 
(http://cvs.biojava.org/cgi-bin/viewcvs/viewcvs.cgi/bioruby/lib/bio/db.rb.diff?r1=0.24&r2=0.25&cvsroot=bioruby). 
But then features like D-loop can't be detected. So this is not the
solution for my problem.
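The D-loop limitation can be reproduced with java.util.regex alone. This sketch compiles the original sectp pattern and the [A-Za-z]+ variant quoted above and tries them on a nine-digit sequence line and on a feature-key line laid out in the usual GenBank style (the D-loop coordinates are made up). The letters-only key rejects the sequence line, as intended, but it also rejects "D-loop", because the hyphen is neither a letter nor whitespace. SectpDemo is a hypothetical name.

```java
import java.util.regex.Pattern;

/** Hypothetical demo comparing the two sectp variants discussed above. */
class SectpDemo {
    // Original pattern from org.biojavax.bio.seq.io.GenbankFormat, as quoted in the message.
    static final Pattern ORIGINAL = Pattern.compile(
            "^(\\s{0,8}(\\S+)\\s{1,7}(.*)|\\s{21}(/\\S+?)=(.*)|\\s{21}(/\\S+))$");
    // Variant with the section key restricted to letters.
    static final Pattern LETTERS_ONLY = Pattern.compile(
            "^(\\s{0,8}([A-Za-z]+)\\s{1,7}(.*)|\\s{21}(/\\S+?)=(.*)|\\s{21}(/\\S+))$");

    public static void main(String[] args) {
        String seqLine = "100000021 gtcctcgggc tccggcttgg";
        String dloop = "     D-loop          1..100"; // coordinates made up
        System.out.println(ORIGINAL.matcher(seqLine).matches());     // true: the number looks like a key
        System.out.println(LETTERS_ONLY.matcher(seqLine).matches()); // false
        System.out.println(ORIGINAL.matcher(dloop).matches());       // true
        System.out.println(LETTERS_ONLY.matcher(dloop).matches());   // false: "-" breaks the key
    }
}
```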
The reason for the truncation is readSection(BufferedReader br) in 
org.biojavax.bio.seq.io.GenbankFormat.


             if (line==null || line.length()==0 || (!line.startsWith(" ") && linecount++>0)) {
                     // dump out last part of section
                     section.add(new String[]{currKey,currVal.toString()});
                     br.reset();
                     done = true;

The condition in the if-clause ends the section at the first line that does not begin with whitespace. So this line is still read:

 99999961 cccgcccaca cccctcggcc ctgccctctg gccatacagg ttctcggtgg tgttgaagag

but this line, whose nine-digit position number leaves no leading space, is not:

100000021 gtcctcgggc tccggcttgg tgctcacgca cacaggaaag tcagcttctc ctgggagggc

If you change the if-statement to this:


String firstSecKey = section.size() == 0 ? "" : ((String[])section.get(0))[0];

if (line==null || line.length()==0 || (!line.startsWith(" ") && linecount++>0
        && (!firstSecKey.equals(START_SEQUENCE_TAG) || line.startsWith(END_SEQUENCE_TAG))))

You can add the whole sequence without truncation to the database.
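
To see what the patched condition changes, both versions of the if-condition
can be lifted into a small standalone class. This is a sketch, not the
GenbankFormat code: the class and method names are hypothetical, the
`linecount++` side effect is replaced by a plain parameter, and it assumes
`START_SEQUENCE_TAG` is "ORIGIN" and `END_SEQUENCE_TAG` is "//". It also
suggests why the cut-off falls at exactly 100,000,020 bp: GenBank
right-justifies the base number in the first nine columns of each sequence
line, so the line for base 100,000,021 is the first one that does not start
with a space.

```java
public class SectionBoundarySketch {
    // Assumed values of the GenbankFormat constants used in the patch.
    public static final String START_SEQUENCE_TAG = "ORIGIN";
    public static final String END_SEQUENCE_TAG = "//";

    /** Original test: any non-indented line after the first one ends the section. */
    public static boolean endsSectionOriginal(String line, int linecount) {
        return line == null || line.length() == 0
                || (!line.startsWith(" ") && linecount > 0);
    }

    /** Patched test: inside the ORIGIN section only the "//" line ends it. */
    public static boolean endsSectionPatched(String line, int linecount, String firstSecKey) {
        return line == null || line.length() == 0
                || (!line.startsWith(" ") && linecount > 0
                    && (!firstSecKey.equals(START_SEQUENCE_TAG)
                        || line.startsWith(END_SEQUENCE_TAG)));
    }

    public static void main(String[] args) {
        String[] lines = {
            "ORIGIN",
            " 99999961 cccgcccaca cccctcggcc ctgccctctg gccatacagg ttctcggtgg tgttgaagag",
            "100000021 gtcctcgggc tccggcttgg tgctcacgca cacaggaaag tcagcttctc ctgggagggc",
            "//"
        };
        // Line 0 is the ORIGIN header itself; test every following line.
        for (int i = 1; i < lines.length; i++) {
            String label = lines[i].length() > 9 ? lines[i].substring(0, 9) : lines[i];
            System.out.println(label
                    + "  original=" + endsSectionOriginal(lines[i], i)
                    + "  patched=" + endsSectionPatched(lines[i], i, START_SEQUENCE_TAG));
        }
    }
}
```

With these inputs the original condition stops at the 100000021 line, while
the patched one keeps reading until "//"; sections other than ORIGIN behave
as before.
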
I have attached GenbankFormat.java to this mail. Could somebody check the
method and commit it for me? I'm not a BioJava specialist.

Cheers,
Gabrielle



Richard Holland wrote:
> Hello.
> 
> Sorry for the delayed reply - I've been away on business all week.
> 
> The similar Ruby issue (and solution) is discussed here:
> 
> http://portal.open-bio.org/pipermail/bioruby/2004-March.txt
> 
> How did you parse the files in the first place? Did you use the new
> GenBank parsers (BJX), or the older ones? This will help indicate
> where the problem lies - the data will have been truncated at the
> point it was parsed from file, so the data in your database will
> reflect this and you'll have to reload it once the appropriate parser
> has been fixed.
> 
> If it was the newer BJX parser, then the problem most probably lies in
> this regex from org.biojavax.bio.seq.io.GenbankFormat, which can
> probably be fixed in a similar manner to the Ruby equivalent discussed
> in the posting above:
> 
>     protected static final Pattern sectp =
> Pattern.compile("^(\\s{0,8}(\\S+)\\s{1,7}(.*)|\\s{21}(/\\S+?)=(.*)|\\s{21}(/\\S+))$");
> 
> Could someone volunteer to develop and test a fix? If you come up with
> something, please commit it to the SVN trunk.
> 
> cheers,
> Richard
> 
> 
> 2008/10/28 Gabrielle Doan :
>> Hi all,
>> concerning the problem described below, I have found out that this problem
>> also occurred in BioRuby and was fixed in 2004.
>> See:
>> http://cvs.biojava.org/cgi-bin/viewcvs/viewcvs.cgi/bioruby/lib/bio/db.rb?cvsroot=bioruby
>> Unfortunately I'm clueless about BioRuby. Does anybody recognize this
>> problem or understand how it was solved in BioRuby?
>>
>> I am grateful for any hints.
>>
>> Cheers,
>>
>> Gabrielle
>>
>>
>> -------- Original Message --------
>> Subject: [Biojava-l] differences between read in sequence and stored
>> sequence in database
>> Date: Mon, 27 Oct 2008 13:57:03 +0100
>> From: Gabrielle Doan 
>> To: biojava-l at biojava.org
>>
>> Hi all,
>>
>> I have a BioSQL database which contains all human chromosomes. For my
>> current project I have to query for a part of a sequence.
>> As far as I know I can get the whole sequence from the entry
>> Biosequence.Seq in the BioSQL schema. So I've made this query:
>>
>> SELECT SUBSTRING(bs.seq, 131615042, 131626262) FROM biosequence bs;
>>
>> But this query didn't yield the desired string, because the length of
>> this biosequence is only 100,000,020 bp. I am very confused why I get
>> such a discrepancy. I have added all chromosomes with the built-in
>> method in BioJava addRichSequence(RichSequence seq) to the database.
>> From my raw data I know that this sequence should have a length of
>> 140,279,252 bp. So where is the remaining part of my sequence? I have
>> observed these discrepancies on all chromosomes which are longer than
>> 100,000,020 bp.
>>
>> Here is an excerpt from my database:
>> bioentry_id  description                                                         length
>> 2            Homo sapiens mitochondrion, complete genome.                        16571
>> 3            Homo sapiens chromosome Y, reference assembly, complete sequence.   57772954
>> 4            Homo sapiens chromosome X, reference assembly, complete sequence.   100000020
>> 5            Homo sapiens chromosome 22, reference assembly, complete sequence.  49691432
>> 6            Homo sapiens chromosome 21, reference assembly, complete sequence.  46944323
>> 7            Homo sapiens chromosome 20, reference assembly, complete sequence.  25960004
>> 8            Homo sapiens chromosome 9, reference assembly, complete sequence.   100000020
>> 9            Homo sapiens chromosome 7, reference assembly, complete sequence.   100000020
>>
>> Sequences smaller than 100,000,020 bp are correctly stored under
>> Biosequence.seq.
>>
>> I am grateful for any hints, which explain the behaviour of my database.
>>
>> Cheers,
>>
>> Gabrielle
>> _______________________________________________
>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
> 
> 
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: GenbankFormat.java
Type: text/x-java
Size: 48624 bytes
Desc: not available
URL: 

From markjschreiber at gmail.com  Wed Oct  1 06:07:51 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 1 Oct 2008 14:07:51 +0800
Subject: [Biojava-l] StringIndexOutOfBoundsException while parsing blast
	result
In-Reply-To: 
References: 
	
Message-ID: <93b45ca50809302307t19a652a4v4a61eeceec07aa62@mail.gmail.com>

Actually, if it is an OS specific carriage return then there is still
a minor issue. We should really try and code stuff so that it can
handle files that originate from any major OS.
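
As a small illustration of that point, the standard library already does the
hard part: java.io.BufferedReader.readLine() treats "\n", "\r\n" and a lone
"\r" as line terminators, so a parser that reads through it (rather than
splitting on one particular separator) sees the same lines whichever OS wrote
the file. This is a minimal sketch; the class name and sample lines are
illustrative only, and it does not show how the BioJava BLAST parser reads
its input.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class LineEndingSketch {
    /** Splits text into lines; readLine() accepts \n, \r\n and a lone \r. */
    public static List<String> readLines(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        String unix = "Query: 1\nSbjct: 1\n";
        String dos = "Query: 1\r\nSbjct: 1\r\n";
        String oldMac = "Query: 1\rSbjct: 1\r";
        // All three encodings yield the same two lines.
        System.out.println(readLines(unix).equals(readLines(dos))
                && readLines(dos).equals(readLines(oldMac)));
    }
}
```
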

- Mark

On Wed, Oct 1, 2008 at 12:31 AM, Richard Holland
 wrote:
>
> Sounds like it _might_ be something to do with the carriage return
> itself. Is the blast file generated on the same OS that you're running
> your analysis on? (e.g. you might run Blast on a Linux box, but
> attempt to parse the file on a Windows box?). If the two OSes are
> different, this might point to it - as Linux won't necessarily
> understand the Windows linebreaks, or vice versa, and might
> misinterpret them. When you copy the portion of the file to a new file
> on the OS you're running the analysis on, it will substitute its own
> local linebreaks and thus mask the problem.
>
> So the first thing I'd check is what the two OSes involved are. If
> they're different, try running your analysis program on the same OS as
> the Blast output was generated on. If that does fix it, then try
> putting your Blast files through dos2unix or something similar to
> convert the linebreaks before running your analysis program.
>
> If they're the same OS, then we still have a problem!
>
> cheers,
> Richard
>
> 2008/9/30 David Toomey :
> > Hi
> >
> >
> >
> > I am parsing a blast result and I am getting a
> > StringIndexOutOfBoundsException. The stack trace is
> >
> >
> >
> >        at java.lang.String.substring(String.java:1938)
> >        at java.lang.String.substring(String.java:1905)
> >        at org.biojava.bio.program.sax.BlastLikeAlignmentSAXParser.parseLine(BlastLikeAlignmentSAXParser.java:291)
> >        at org.biojava.bio.program.sax.BlastLikeAlignmentSAXParser.parse(BlastLikeAlignmentSAXParser.java:116)
> >        at org.biojava.bio.program.sax.HitSectionSAXParser.outputHSPInfo(HitSectionSAXParser.java:517)
> >        at org.biojava.bio.program.sax.HitSectionSAXParser.firstHSPEvent(HitSectionSAXParser.java:287)
> >        at org.biojava.bio.program.sax.HitSectionSAXParser.interpret(HitSectionSAXParser.java:251)
> >        at org.biojava.bio.program.sax.HitSectionSAXParser.parse(HitSectionSAXParser.java:117)
> >        at org.biojava.bio.program.sax.BlastSAXParser.hitsSectionReached(BlastSAXParser.java:634)
> >        at org.biojava.bio.program.sax.BlastSAXParser.interpret(BlastSAXParser.java:341)
> >        at org.biojava.bio.program.sax.BlastSAXParser.parse(BlastSAXParser.java:168)
> >        at org.biojava.bio.program.sax.BlastLikeSAXParser.onNewDataSet(BlastLikeSAXParser.java:314)
> >        at org.biojava.bio.program.sax.BlastLikeSAXParser.interpret(BlastLikeSAXParser.java:276)
> >        at org.biojava.bio.program.sax.BlastLikeSAXParser.parse(BlastLikeSAXParser.java:163)
> >        at ie.rcsi.blast.StandardParser.parse(StandardParser.java:65)
> >        at ie.rcsi.blast.BlastParser.parse(BlastParser.java:44)
> >        at ie.rcsi.blast.Main.main(Main.java:30)
> >
> >
> >
> > I have updated BlastLikeAlignmentSAXParser to output some debug info and
> > narrowed down the line causing the problem to the following line
> >
> >
> >
> > 2,4-cyclodiphosphate synthase OS=Plasmodium falciparum (isolate 3D7)
> >
> > GN=ISPF
> >
> >
> >
> > If I remove the carriage return and put it on a single line then everything
> > works fine. Strangely, if I copy this entry and put it in a file on its own
> > it also parses correctly, even with the carriage return!!!
> >
> >
> >
> > Has anyone seen this before, or does anyone have a suggestion on what I might
> > do to fix it? I can send the complete blast result if it would help. I have
> > tried BLAST 2.2.18 and 2.2.17 and the problem is the same.
> >
> >
> >
> > Cheers
> >
> >
> >
> > Dave
> >
> >
> >
> >
> >
> >
> >
>
>
>
> --
> Richard Holland, BSc MBCS
> Finance Director, Eagle Genomics Ltd
> M: +44 7500 438846 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/


From dtoomey at rcsi.ie  Wed Oct  1 08:40:44 2008
From: dtoomey at rcsi.ie (David Toomey)
Date: Wed, 1 Oct 2008 09:40:44 +0100
Subject: [Biojava-l] StringIndexOutOfBoundsException while parsing blast
	result
References: 
	
Message-ID: 

They are on the same OS. For all my tests I have run the blast search and
parsing on the same OS. This has mostly been Windows, but I have also tried
the whole thing on Linux and I get the same problem.
I have done some more testing and I don't think the carriage return is the
problem.
What I have found is that if the second line is shorter than 11 characters, the
error is thrown. If I add 4 spaces in front of the 'GN=ISPF' on the second
line then it is parsed correctly, like this.

2,4-cyclodiphosphate synthase OS=Plasmodium falciparum (isolate 3D7) 
    GN=ISPF

I haven't figured out why it parses correctly when it is the only entry in
the file, even without the spaces. So maybe I am still missing something.
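
For what it's worth, this pattern fits a fixed-offset substring call:
String.substring(begin) throws StringIndexOutOfBoundsException when begin is
greater than the string's length, and "    GN=ISPF" is exactly 11 characters
long, which would explain why adding four spaces avoids the exception. The
sketch below assumes an offset of 11 purely for illustration (the actual code
at BlastLikeAlignmentSAXParser.java:291 is not shown in this thread), and
tailFrom is a hypothetical guard, not BioJava code.

```java
public class SafeSubstringSketch {
    /**
     * Returns line.substring(begin), or "" when the line is too short,
     * instead of throwing StringIndexOutOfBoundsException.
     */
    public static String tailFrom(String line, int begin) {
        return line.length() >= begin ? line.substring(begin) : "";
    }

    public static void main(String[] args) {
        String wrapped = "GN=ISPF"; // 7 characters, as on the wrapped description line
        try {
            wrapped.substring(11);
            System.out.println("no exception");
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("substring(11) on \"" + wrapped + "\" throws "
                    + e.getClass().getSimpleName());
        }
        // Guarded versions: both return an empty tail instead of throwing.
        System.out.println("[" + tailFrom(wrapped, 11) + "]");
        System.out.println("[" + tailFrom("    " + wrapped, 11) + "]");
    }
}
```
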

Cheers,

Dave

-----Original Message-----
From: dicknetherlands at gmail.com [mailto:dicknetherlands at gmail.com] On Behalf
Of Richard Holland
Sent: 30 September 2008 17:31
To: David Toomey
Cc: biojava-l at lists.open-bio.org
Subject: Re: [Biojava-l] StringIndexOutOfBoundsException while parsing blast
result

Sounds like it _might_ be something to do with the carriage return
itself. Is the blast file generated on the same OS that you're running
your analysis on? (e.g. you might run Blast on a Linux box, but
attempt to parse the file on a Windows box?). If the two OSes are
different, this might point to it - as Linux won't necessarily
understand the Windows linebreaks, or vice versa, and might
misinterpret them. When you copy the portion of the file to a new file
on the OS you're running the analysis on, it will substitute its own
local linebreaks and thus mask the problem.

So the first thing I'd check is what the two OSes involved are. If
they're different, try running your analysis program on the same OS as
the Blast output was generated on. If that does fix it, then try
putting your Blast files through dos2unix or something similar to
convert the linebreaks before running your analysis program.

If they're the same OS, then we still have a problem!

cheers,
Richard






From holland at eaglegenomics.com  Wed Oct  1 09:37:59 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 1 Oct 2008 10:37:59 +0100
Subject: [Biojava-l] StringIndexOutOfBoundsException while parsing blast
	result
In-Reply-To: 
References: 
	
	
Message-ID: 

Thanks for the extra info.

2008/10/1 David Toomey :
> They are on the same OS. For all my tests I have run the blast search and
> parsing on the same OS. This has mostly been Windows, but I have also tried
> the whole thing on Linux and I get the same problem.
> I have done some more testing and I don't think the carriage return is the
> problem.
> What I have found is that if the second line is shorter than 11 characters, the
> error is thrown. If I add 4 spaces in front of the 'GN=ISPF' on the second
> line then it is parsed correctly, like this.
>
> 2,4-cyclodiphosphate synthase OS=Plasmodium falciparum (isolate 3D7)
>    GN=ISPF
>
> I haven't figured out why it parses correctly when it is the only entry in
> the file, even without the spaces. So maybe I am still missing something.
>
> Cheers,
>
> Dave
>
> -----Original Message-----
> From: dicknetherlands at gmail.com [mailto:dicknetherlands at gmail.com] On Behalf
> Of Richard Holland
> Sent: 30 September 2008 17:31
> To: David Toomey
> Cc: biojava-l at lists.open-bio.org
> Subject: Re: [Biojava-l] StringIndexOutOfBoundsException while parsing blast
> result
>
> Sounds like it _might_ be something to do with the carriage return
> itself. Is the blast file generated on the same OS that you're running
> your analysis on? (e.g. you might run Blast on a Linux box, but
> attempt to parse the file on a Windows box?). If the two OSes are
> different, this might point to it - as Linux won't necessarily
> understand the Windows linebreaks, or vice versa, and might
> misinterpret them. When you copy the portion of the file to a new file
> on the OS you're running the analysis on, it will substitute its own
> local linebreaks and thus mask the problem.
>
> So the first thing I'd check is what the two OSes involved are. If
> they're different, try running your analysis program on the same OS as
> the Blast output was generated on. If that does fix it, then try
> putting your Blast files through dos2unix or something similar to
> convert the linebreaks before running your analysis program.
>
> If they're the same OS, then we still have a problem!
>
> cheers,
> Richard
>
>
>
>



-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From pzgyuanf at gmail.com  Wed Oct  1 10:52:25 2008
From: pzgyuanf at gmail.com (pprun)
Date: Wed, 01 Oct 2008 18:52:25 +0800
Subject: [Biojava-l] BufferedOutputStream to RichSequence.IOTools.writeXXX()
 method needs to flush manually
Message-ID: 

Hi,
I don't know whether this is a feature or a bug.
If a BufferedOutputStream is passed to the method
RichSequence.IOTools.writeGenbank(OutputStream os, Sequence seq,
Namespace ns),
at the end, I need to manually flush it - BufferedOutputStream.flush()

Otherwise, the output content will be truncated.

Is this the expected behavior?

Thanks,
- Pprun



From holland at eaglegenomics.com  Wed Oct  1 13:36:59 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 1 Oct 2008 14:36:59 +0100
Subject: [Biojava-l] BufferedOutputStream to
	RichSequence.IOTools.writeXXX() method needs to flush manually
In-Reply-To: 
References: 
Message-ID: 

The IOTools interfaces accept OutputStream instances, not
BufferedOutputStream instances. flush() is not a requirement on
OutputStream and so BJX does not call it.

cheers,
Richard

2008/10/1 pprun :
> Hi,
> I don't know whether this is a feature or a bug.
> If a BufferedOutputStream is passed to the method
> RichSequence.IOTools.writeGenbank(OutputStream os, Sequence seq,
> Namespace ns),
> at the end, I need to manually flush it - BufferedOutputStream.flush()
>
> Otherwise, the output content will be truncated.
>
> Is this the expected behavior?
>
> Thanks,
> - Pprun
>
>
>
>



-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From markjschreiber at gmail.com  Thu Oct  2 00:46:03 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Thu, 2 Oct 2008 08:46:03 +0800
Subject: [Biojava-l] BufferedOutputStream to
	RichSequence.IOTools.writeXXX() method needs to flush manually
In-Reply-To: 
References: 
	
Message-ID: <93b45ca50810011746y7d4f49biffd5c2e483c86bd1@mail.gmail.com>

As a general rule it is best if BioJava doesn't handle the flushing
and closing of OutputStreams. This is because you may want to keep
using the stream and control its behaviour. An interesting example is
if you pass System.out to a method that closes the stream. Probably
not what you want.

Having said that maybe we should add a javadoc to say that
BufferedOutputStreams need to be flushed (and possibly closed).
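
The behaviour pprun saw can be reproduced without BioJava. In the sketch
below, writeRecord is a hypothetical stand-in for
RichSequence.IOTools.writeGenbank(...) that, like the method discussed here,
writes to the stream without flushing or closing it. With a
BufferedOutputStream (8192-byte default buffer), a short record never reaches
the underlying stream until the caller flushes.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class FlushSketch {
    // Hypothetical stand-in for RichSequence.IOTools.writeGenbank(...):
    // it writes to the stream but never flushes or closes it.
    static void writeRecord(OutputStream os, String record) throws IOException {
        os.write(record.getBytes(StandardCharsets.US_ASCII));
    }

    /** Bytes that reach the underlying sink, with or without a caller-side flush(). */
    public static int bytesDelivered(String record, boolean flush) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BufferedOutputStream buffered = new BufferedOutputStream(sink); // 8192-byte buffer
        try {
            writeRecord(buffered, record);
            if (flush) {
                buffered.flush(); // the caller, not the writer, pushes buffered bytes out
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.size();
    }

    public static void main(String[] args) {
        String record = "LOCUS       TEST    10 bp    DNA\n//\n";
        System.out.println("without flush: " + bytesDelivered(record, false));
        System.out.println("with flush:    " + bytesDelivered(record, true));
    }
}
```

When the target is a file, opening the stream in a try-with-resources block
(Java 7 and later) closes it, and close() flushes first; for System.out, call
flush() rather than close().
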

- Mark

On Wed, Oct 1, 2008 at 9:36 PM, Richard Holland
 wrote:
> The IOTools interfaces accept OutputStream instances, not
> BufferedOutputStream instances. flush() is not a requirement on
> OutputStream and so BJX does not call it.
>
> cheers,
> Richard
>
> 2008/10/1 pprun :
>> Hi,
>> I don't know whether this is a feature or a bug.
>> If a BufferedOutputStream is passed to the method
>> RichSequence.IOTools.writeGenbank(OutputStream os, Sequence seq,
>> Namespace ns),
>> at the end, I need to manually flush it - BufferedOutputStream.flush()
>>
>> Otherwise, the output content will be truncated.
>>
>> Is this the expected behavior?
>>
>> Thanks,
>> - Pprun
>>
>>
>>
>>
>
>
>
> --
> Richard Holland, BSc MBCS
> Finance Director, Eagle Genomics Ltd
> M: +44 7500 438846 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>


From gabrielle_doan at gmx.net  Tue Oct  7 14:26:44 2008
From: gabrielle_doan at gmx.net (Gabrielle Doan)
Date: Tue, 07 Oct 2008 16:26:44 +0200
Subject: [Biojava-l] Getting a part of a sequence
Message-ID: <48EB71A4.70409@gmx.net>

Hi all,
I have a BioSQL database which contains all human chromosomes. My 
intention is to get the information about a particular gene. How can I 
get a part of a particular chromosome with all associated features? At 
the moment I use the following code to create my new sequence:


RichSequence subSeq = RichSequence.Tools.subSequence(parent,
	position[0], position[1], ns, geneName, parent.getAccession(),
	parent.getIdentifier(), parent.getVersion() + 1,
	(Double) (parent.getVersion() + 1.0));
<\code>

Here is how I get the parent sequence:

	public static RichSequence getChromosome(String chrNo) {
		Transaction tx = session.beginTransaction();
		RichSequence ret = null;

		String query;

		try {
			if (chrNo.equals("MT")) {
				query = "from BioEntry as be where be.description like '%:num%'";
				query = query.replaceAll(":num", "mitochondrion");
			} else {
				query = "from BioEntry as be where be.description like '%hromosome :num%'";
				query = query.replaceAll(":num", chrNo);
			}

			Query q = session.createQuery(query);

			ret = (RichSequence) q.list().get(0);
			tx.commit();
		} catch (Exception e) {
			tx.rollback();
			e.printStackTrace();
		}
		return ret;
	}
<\code>

I always have to load the whole chromosome to get a part of it, so it 
takes a very long time and I get a lot of unused information (waste of 
memory). I also tried to use ThinRichSequence instead of 
RichSequence, but I didn't notice any difference.
Can you give me a hint on how to speed up the code?
I am grateful for any hints.

cheers,
Gabrielle


From holland at eaglegenomics.com  Tue Oct  7 23:05:54 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 8 Oct 2008 00:05:54 +0100
Subject: [Biojava-l] Getting a part of a sequence
In-Reply-To: <48EB71A4.70409@gmx.net>
References: <48EB71A4.70409@gmx.net>
Message-ID: 

Hello.

Your code is pretty good already - but you're right, it will load the
whole chromosome into memory before you can chop out the interesting
bit you actually need.

As you observed, by using ThinRichSequence in your query it will load
only the initial shell of a sequence object to start with, but the
moment you try and sub-sequence it, it will immediately load the whole
sequence data into memory in order to perform the operation.

If you only want the sequence data, as a string, you can do this by
specifying the sequence attribute in the query and bypassing the
sequence object entirely:

 select rs.stringSequence from Sequence as rs where rs.description
like '%hromosome :num%'

This will return a String instead of a RichSequence object. You can
use HQL operators to perform substrings etc. on the string inside the
query itself - see
http://docs.huihoo.com/hibernate/hibernate-reference-3.2.1/queryhql.html
, particularly section 14.9.
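
Two details are easy to get wrong when following this advice. SQL's
SUBSTRING(str, start, length), and HQL's substring, take a length rather
than an end position, so the range 131,615,042 to 131,626,262 from the
earlier SUBSTRING(bs.seq, 131615042, 131626262) query corresponds to a
length of 11,221. Also, a LIKE pattern such as '%hromosome 1%' matches
chromosomes 10 to 19 as well. The sketch below builds the arguments and the
pattern; the class and method names are hypothetical, the trailing comma in
the pattern assumes descriptions like "Homo sapiens chromosome 7, reference
assembly, ...", and the commented Hibernate call is untested (whether
substring accepts bound parameters may depend on the dialect).

```java
public class SubsequenceQuerySketch {
    /** HQL in the spirit of the query above, with named parameters to bind. */
    public static final String HQL =
        "select substring(rs.stringSequence, :start, :len) "
        + "from Sequence as rs where rs.description like :desc";

    /** SQL/HQL substring takes (1-based start, length), not (start, end). */
    public static int[] substringArgs(int start, int end) {
        if (start < 1 || end < start) {
            throw new IllegalArgumentException("expected 1 <= start <= end");
        }
        return new int[] { start, end - start + 1 };
    }

    /**
     * LIKE pattern for a chromosome description. The trailing comma keeps
     * "chromosome 1" from also matching "chromosome 10", "chromosome 11", ...
     */
    public static String descriptionPattern(String chrNo) {
        return chrNo.equals("MT") ? "%mitochondrion%" : "%hromosome " + chrNo + ",%";
    }

    public static void main(String[] args) {
        int[] a = substringArgs(131615042, 131626262);
        System.out.println("start=" + a[0] + " len=" + a[1]
                + " desc=" + descriptionPattern("7"));
        // Untested Hibernate 3 usage, assuming an open Session named "session":
        // String seq = (String) session.createQuery(HQL)
        //         .setInteger("start", a[0]).setInteger("len", a[1])
        //         .setString("desc", descriptionPattern("7"))
        //         .uniqueResult();
    }
}
```
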

If you only want the features, you can do this by using the
BioSQLFeatureFilter technique. In particular you will want the
BySequenceName filter, the And filter, and the OverlapsRichLocation
filter. You construct a filter then pass it to the filter() method in
BioSQLRichSequenceDB. The database will return to you all the
RichFeature objects that match your criteria. Note that it searches
the whole database so you really must use a BySequenceName filter at
the very least in order to make the results useful!
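
What the combined filter selects can be pictured with a plain in-memory
version. This is an analogy only: the real BioSQLFeatureFilter classes are
turned into a database query by BioSQLRichSequenceDB, whereas the Feature
class and select method below are hypothetical. It shows why the
sequence-name condition matters: without it, an overlapping feature on
another chromosome would be returned too.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OverlapFilterSketch {
    /** Hypothetical minimal feature: sequence name plus inclusive 1-based coordinates. */
    public static final class Feature {
        final String seqName;
        final String type;
        final int start;
        final int end;

        public Feature(String seqName, String type, int start, int end) {
            this.seqName = seqName;
            this.type = type;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return type + "[" + start + ".." + end + "]";
        }
    }

    /** Inclusive ranges overlap iff each one starts no later than the other ends. */
    static boolean overlaps(int aStart, int aEnd, int bStart, int bEnd) {
        return aStart <= bEnd && bStart <= aEnd;
    }

    /** In-memory analogue of And(BySequenceName(name), OverlapsRichLocation(start..end)). */
    public static List<Feature> select(List<Feature> all, String name, int start, int end) {
        List<Feature> hits = new ArrayList<>();
        for (Feature f : all) {
            if (f.seqName.equals(name) && overlaps(f.start, f.end, start, end)) {
                hits.add(f);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Feature> all = Arrays.asList(
            new Feature("chr7", "gene", 100, 200),
            new Feature("chr7", "CDS", 250, 300),
            new Feature("chr9", "gene", 150, 180));
        System.out.println(select(all, "chr7", 180, 260));
    }
}
```
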

However, you can't use HQL to construct a complete slice of a sequence
directly in the database before returning it to the program for use as
a ready-made RichSequence object. This would require Hibernate to know
what a BioJava sub-sequence object is and how it behaves in relation
to an 'unsliced' one, which is beyond the scope of its job as a
persistence framework.

cheers,
Richard



2008/10/7 Gabrielle Doan :
> Hi all,
> I have a BioSQL database which contains all human chromosomes. My intention
> is to get the information about a particular gene. How can I get a part of a
> particular chromosome with all associated features? At the moment I use
> the following code to create my new sequence:
>
> 
> RichSequence subSeq = RichSequence.Tools.subSequence(parent,
>        position[0], position[1], ns, geneName, parent.getAccession(),
>        parent.getIdentifier(), parent.getVersion() + 1,
>        (Double) (parent.getVersion() + 1.0));
> <\code>
>
> Here is how I get the parent sequence:
> 
>        public static RichSequence getChromosome(String chrNo) {
>                Transaction tx = session.beginTransaction();
>                RichSequence ret = null;
>
>                String query;
>
>                try {
>                        if (chrNo.equals("MT")) {
>                                query = "from BioEntry as be where be.description like '%:num%'";
>                                query = query.replaceAll(":num", "mitochondrion");
>                        } else {
>                                query = "from BioEntry as be where be.description like '%hromosome :num%'";
>                                query = query.replaceAll(":num", chrNo);
>                        }
>
>                        Query q = session.createQuery(query);
>
>                        ret = (RichSequence) q.list().get(0);
>                        tx.commit();
>                } catch (Exception e) {
>                        tx.rollback();
>                        e.printStackTrace();
>                }
>                return ret;
>        }
> <\code>
>
> I always have to load the whole chromosome to get a part of it, so it takes
> a very long time and I get a lot of unused information (waste of memory). I
> also tried to use ThinRichSequence instead of
> RichSequence, but I didn't notice any difference.
> Can you give me a hint on how to speed up the code?
> I am grateful for any hints.
>
> cheers,
> Gabrielle
>



-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From koen.bruynseels at cropdesign.com  Wed Oct  8 00:02:18 2008
From: koen.bruynseels at cropdesign.com (koen.bruynseels at cropdesign.com)
Date: Wed, 8 Oct 2008 02:02:18 +0200
Subject: [Biojava-l] Koen Bruynseels is out of the office.
Message-ID: 


I will be out of the office starting  04/10/2008 and will not return until
09/10/2008.

I will respond to your message when I return.



From gabrielle_doan at gmx.net  Thu Oct  9 12:22:01 2008
From: gabrielle_doan at gmx.net (Gabrielle Doan)
Date: Thu, 09 Oct 2008 14:22:01 +0200
Subject: [Biojava-l] Getting a part of a sequence
In-Reply-To: 
References: <48EB71A4.70409@gmx.net>
	
Message-ID: <48EDF769.8050901@gmx.net>

Hi Richard,

thanks a lot for your mail. I have successfully retrieved the 
subsequence of a sequence as a String. Now I am trying to get the features
for a particular range with the following code:


	public FeatureHolder filterFeature(String name, int startpos, int endpos) {
		RichLocation rl = new SimpleRichLocation(new SimplePosition(startpos),
				new SimplePosition(endpos), 0);
		BioSQLFeatureFilter filter = new BioSQLFeatureFilter.And(
				new BioSQLFeatureFilter.BySequenceName(name),
				new BioSQLFeatureFilter.OverlapsRichLocation(rl));
		return filter(filter);
	}
<\code>

Unfortunately, I received these errors:

Exception in thread "main" java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
	at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.processFeatureFilter(BioSQLRichSequenceDB.java:143)
	at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.filter(BioSQLRichSequenceDB.java:151)
	at org.sequence_viewer.db.HBioSQLDB.filterFeature(HBioSQLDB.java:599)
	at org.sequence_viewer.db.AbfragenTest.main(AbfragenTest.java:56)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.processFeatureFilter(BioSQLRichSequenceDB.java:138)
	... 3 more
Caused by: org.hibernate.PropertyAccessException: Exception occurred inside setter of org.biojavax.bio.seq.SimpleRichFeature.locationSet
	at org.hibernate.property.BasicPropertyAccessor$BasicSetter.set(BasicPropertyAccessor.java:65)
	at org.hibernate.tuple.entity.AbstractEntityTuplizer.setPropertyValues(AbstractEntityTuplizer.java:337)
	at org.hibernate.tuple.entity.PojoEntityTuplizer.setPropertyValues(PojoEntityTuplizer.java:200)
	at org.hibernate.persister.entity.AbstractEntityPersister.setPropertyValues(AbstractEntityPersister.java:3571)
	at org.hibernate.engine.TwoPhaseLoad.initializeEntity(TwoPhaseLoad.java:133)
	at org.hibernate.loader.Loader.initializeEntitiesAndCollections(Loader.java:854)
	at org.hibernate.loader.Loader.doQuery(Loader.java:729)
	at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:236)
	at org.hibernate.loader.Loader.doList(Loader.java:2213)
	at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2104)
	at org.hibernate.loader.Loader.list(Loader.java:2099)
	at org.hibernate.loader.criteria.CriteriaLoader.list(CriteriaLoader.java:94)
	at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1569)
	at org.hibernate.impl.CriteriaImpl.list(CriteriaImpl.java:283)
	... 8 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.hibernate.property.BasicPropertyAccessor$BasicSetter.set(BasicPropertyAccessor.java:42)
	... 21 more
Caused by: java.lang.NullPointerException
	at org.biojavax.bio.seq.PositionResolver$AverageResolver.getMin(PositionResolver.java:103)
	at org.biojavax.bio.seq.SimpleRichLocation.getMin(SimpleRichLocation.java:323)
	at org.biojavax.bio.seq.SimpleRichLocation.overlaps(SimpleRichLocation.java:451)
	at org.biojavax.bio.seq.SimpleRichLocation.union(SimpleRichLocation.java:469)
	at org.biojavax.bio.seq.RichLocation$Tools.merge(RichLocation.java:363)
	at org.biojavax.bio.seq.SimpleRichFeature.setLocationSet(SimpleRichFeature.java:181)
	... 26 more
<\message>

Why do I get these errors?
BioSQLFeatureFilter.BySequenceName(name) needs a seqName as parameter. 
How can I find out the sequence name? Is it the value "name" in the 
table "Bioentry"? As the built-in subSequence method takes a long time I 
intend to get the subsequence as a String by myself and add the features 
to it. What do you think about this?
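
If the subsequence is fetched as a String, the features returned by the
filter still carry chromosome coordinates and have to be shifted onto the
substring. A minimal sketch, assuming inclusive 1-based coordinates and
ignoring strand and fuzzy positions; rebase is a hypothetical helper, not
part of BioJava, and clipping a feature that runs past the region discards
that information.

```java
public class RebaseSketch {
    /**
     * Maps an inclusive 1-based feature range on the chromosome onto a
     * subsequence covering regionStart..regionEnd (chromosome coordinates).
     * Returns null if the feature lies entirely outside the region; otherwise
     * the range is clipped to the region and shifted so its first base is 1.
     */
    public static int[] rebase(int featStart, int featEnd, int regionStart, int regionEnd) {
        if (featEnd < regionStart || featStart > regionEnd) {
            return null;
        }
        int start = Math.max(featStart, regionStart) - regionStart + 1;
        int end = Math.min(featEnd, regionEnd) - regionStart + 1;
        return new int[] { start, end };
    }

    public static void main(String[] args) {
        int regionStart = 131615042;
        int regionEnd = 131626262;
        // A feature that starts before the region is clipped to its first base.
        int[] r = rebase(131615000, 131615100, regionStart, regionEnd);
        System.out.println(r[0] + ".." + r[1]);
    }
}
```
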

I'm grateful for any hints.
cheers,

Gabrielle



Richard Holland wrote:
> Hello.
> 
> Your code is pretty good already - but you're right, it will load the
> whole chromosome into memory before you can chop out the interesting
> bit you actually need.
> 
> As you observed, by using ThinRichSequence in your query it will load
> only the initial shell of a sequence object to start with, but the
> moment you try and sub-sequence it, it will immediately load the whole
> sequence data into memory in order to perform the operation.
> 
> If you only want the sequence data, as a string, you can do this by
> specifying the sequence attribute in the query and bypassing the
> sequence object entirely:
> 
>  select rs.stringSequence from Sequence as rs where rs.description
> like '%hromosome :num%'
> 
> This will return a String instead of a RichSequence object. You can
> use HQL operators to perform substrings etc. on the string inside the
> query itself - see
> http://docs.huihoo.com/hibernate/hibernate-reference-3.2.1/queryhql.html
> , particularly section 14.9.
> 
> If you only want the features, you can do this by using the
> BioSQLFeatureFilter technique. In particular you will want the
> BySequenceName filter, the And filter, and the OverlapsRichLocation
> filter. You construct a filter then pass it to the filter() method in
> BioSQLRichSequenceDB. The database will return to you all the
> RichFeature objects that match your criteria. Note that it searches
> the whole database so you really must use a BySequenceName filter at
> the very least in order to make the results useful!
> 
> However, you can't use HQL to construct a complete slice of a sequence
> directly in the database before returning it to the program for use as
> a ready-made RichSequence object. This would require Hibernate to know
> what a BioJava sub-sequence object is and how it behaves in relation
> to an 'unsliced' one, which is beyond the scope of its job as a
> persistence framework.
> 
> cheers,
> Richard
> 
> 
> 
> 2008/10/7 Gabrielle Doan :
>> Hi all,
>> I have a BioSQL database which contains all human chromosomes. My intention
>> is to get the information about a particular gene. How can I get a part of a
>> particular chromosome with all associated features? At the moment I use
>> the following code to create my new sequence:
>>
>> 
>> RichSequence subSeq = RichSequence.Tools.subSequence(parent,
>>        position[0], position[1], ns, geneName, parent.getAccession(),
>>        parent.getIdentifier(), parent.getVersion() + 1,
>>        (Double) (parent.getVersion() + 1.0));
>> <\code>
>>
>> Here is how I get the parent sequence:
>> 
>>        public static RichSequence getChromosome(String chrNo) {
>>                Transaction tx = session.beginTransaction();
>>                RichSequence ret = null;
>>
>>                String query;
>>
>>                try {
>>                        if (chrNo.equals("MT")) {
>>                                query = "from BioEntry as be where be.description like '%:num%'";
>>                                query = query.replaceAll(":num", "mitochondrion");
>>                        } else {
>>                                query = "from BioEntry as be where be.description like '%hromosome :num%'";
>>                                query = query.replaceAll(":num", chrNo);
>>                        }
>>
>>                        Query q = session.createQuery(query);
>>
>>                        ret = (RichSequence) q.list().get(0);
>>                        tx.commit();
>>                } catch (Exception e) {
>>                        tx.rollback();
>>                        e.printStackTrace();
>>                }
>>                return ret;
>>        }
>>
>> I always have to load the whole chromosome to get a part of it, so it takes
>> a very long time and I get a lot of unused information (a waste of memory).
>> I also tried to use ThinRichSequence instead of RichSequence, but I didn't
>> notice any difference.
>> Can you give me a hint on how to speed up the code?
>> I am grateful for any hints.
>>
>> cheers,
>> Gabrielle
>> _______________________________________________
>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
> 
> 
> 
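
Richard's suggestion quoted above, selecting only the sequence string and doing the substring inside HQL, could be prepared roughly as follows. This is a sketch under assumptions: the entity name `Sequence` and the properties `stringSequence` and `description` are taken from his example, `ChromosomeSliceQuery` and its methods are hypothetical helpers, and it is assumed (not verified here) that the database's substring function works on the column that stores the sequence. HQL's `substring()` is 1-based and takes a length, so a 1-based inclusive range [start, end] becomes (start, end - start + 1). Named parameters also replace the `replaceAll` string building used in `getChromosome`.

```java
/**
 * Hypothetical helper, not part of BioJava or Hibernate: prepares an HQL query
 * that returns only a slice of a chromosome's sequence string.
 */
class ChromosomeSliceQuery {

    // Entity and property names follow Richard's example (assumed mapping).
    static final String HQL =
            "select substring(rs.stringSequence, :start, :length) "
            + "from Sequence as rs where rs.description like :pattern";

    /** LIKE pattern for the description, mirroring the original getChromosome(). */
    static String descriptionPattern(String chrNo) {
        return "MT".equals(chrNo) ? "%mitochondrion%" : "%hromosome " + chrNo + "%";
    }

    /** Maps a 1-based inclusive range [start, end] to substring()'s (start, length). */
    static int[] substringArgs(int start, int end) {
        if (start < 1 || end < start) {
            throw new IllegalArgumentException("need 1 <= start <= end, got " + start + ".." + end);
        }
        return new int[] {start, end - start + 1};
    }

    public static void main(String[] args) {
        int[] a = substringArgs(1000, 1999);
        System.out.println(HQL);
        System.out.println(descriptionPattern("7") + " start=" + a[0] + " length=" + a[1]);
        // With an open Hibernate Session (not created here), the call would be roughly:
        // String slice = (String) session.createQuery(HQL)
        //         .setParameter("pattern", descriptionPattern("7"))
        //         .setInteger("start", a[0])
        //         .setInteger("length", a[1])
        //         .uniqueResult();
    }
}
```

As in the original query, the pattern `%hromosome 1%` also matches "chromosome 10" through "chromosome 19", so a more specific pattern may be needed before relying on a single result.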



From holland at eaglegenomics.com  Fri Oct 10 14:30:03 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 10 Oct 2008 15:30:03 +0100
Subject: [Biojava-l] Getting a part of a sequence
In-Reply-To: <48EDF769.8050901@gmx.net>
References: <48EB71A4.70409@gmx.net>
	
	<48EDF769.8050901@gmx.net>
Message-ID: 

This looks like a bug in BJX. I have just committed a fix to the head of
Subversion that I think will fix it. Can you check out the latest source,
compile it, and try your program again?

cheers,
Richard

2008/10/9 Gabrielle Doan 

> Hi Richard,
>
> Thanks a lot for your mail. I have successfully retrieved the subsequence
> of a sequence as a String. Now I am trying to get the features for a
> particular range with the following code:
>
> 
>        public FeatureHolder filterFeature(String name, int startpos, int endpos) {
>                RichLocation rl = new SimpleRichLocation(new SimplePosition(startpos),
>                                new SimplePosition(endpos), 0);
>                BioSQLFeatureFilter filter = new BioSQLFeatureFilter.And(
>                                new BioSQLFeatureFilter.BySequenceName(name),
>                                new BioSQLFeatureFilter.OverlapsRichLocation(rl));
>                return filter(filter);
>        }
>
> Unfortunately, I received these errors:
> 
> Exception in thread "main" java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
>        at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.processFeatureFilter(BioSQLRichSequenceDB.java:143)
>        at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.filter(BioSQLRichSequenceDB.java:151)
>        at org.sequence_viewer.db.HBioSQLDB.filterFeature(HBioSQLDB.java:599)
>        at org.sequence_viewer.db.AbfragenTest.main(AbfragenTest.java:56)
> Caused by: java.lang.reflect.InvocationTargetException
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.processFeatureFilter(BioSQLRichSequenceDB.java:138)
>        ... 3 more
> Caused by: org.hibernate.PropertyAccessException: Exception occurred inside setter of org.biojavax.bio.seq.SimpleRichFeature.locationSet
>        at org.hibernate.property.BasicPropertyAccessor$BasicSetter.set(BasicPropertyAccessor.java:65)
>        at org.hibernate.tuple.entity.AbstractEntityTuplizer.setPropertyValues(AbstractEntityTuplizer.java:337)
>        at org.hibernate.tuple.entity.PojoEntityTuplizer.setPropertyValues(PojoEntityTuplizer.java:200)
>        at org.hibernate.persister.entity.AbstractEntityPersister.setPropertyValues(AbstractEntityPersister.java:3571)
>        at org.hibernate.engine.TwoPhaseLoad.initializeEntity(TwoPhaseLoad.java:133)
>        at org.hibernate.loader.Loader.initializeEntitiesAndCollections(Loader.java:854)
>        at org.hibernate.loader.Loader.doQuery(Loader.java:729)
>        at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:236)
>        at org.hibernate.loader.Loader.doList(Loader.java:2213)
>        at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2104)
>        at org.hibernate.loader.Loader.list(Loader.java:2099)
>        at org.hibernate.loader.criteria.CriteriaLoader.list(CriteriaLoader.java:94)
>        at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1569)
>        at org.hibernate.impl.CriteriaImpl.list(CriteriaImpl.java:283)
>        ... 8 more
> Caused by: java.lang.reflect.InvocationTargetException
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.hibernate.property.BasicPropertyAccessor$BasicSetter.set(BasicPropertyAccessor.java:42)
>        ... 21 more
> Caused by: java.lang.NullPointerException
>        at org.biojavax.bio.seq.PositionResolver$AverageResolver.getMin(PositionResolver.java:103)
>        at org.biojavax.bio.seq.SimpleRichLocation.getMin(SimpleRichLocation.java:323)
>        at org.biojavax.bio.seq.SimpleRichLocation.overlaps(SimpleRichLocation.java:451)
>        at org.biojavax.bio.seq.SimpleRichLocation.union(SimpleRichLocation.java:469)
>        at org.biojavax.bio.seq.RichLocation$Tools.merge(RichLocation.java:363)
>        at org.biojavax.bio.seq.SimpleRichFeature.setLocationSet(SimpleRichFeature.java:181)
>        ... 26 more
>
> Why do I get these errors?
> BioSQLFeatureFilter.BySequenceName(name) needs a seqName as its parameter.
> How can I find out the sequence name? Is it the value "name" in the table
> "Bioentry"? As the built-in subSequence method takes a long time, I intend
> to get the subsequence as a String myself and add the features to it. What
> do you think about this?
>
> I'm grateful for any hints.
> cheers,
>
> Gabrielle
>
>


-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
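
Gabrielle's plan in the quoted message above, fetching the subsequence as a String and attaching the features herself, mostly comes down to two steps: keep the features that overlap the window, and shift their coordinates so that window position 1 corresponds to the window start on the chromosome. The sketch below is plain Java; `WindowFeatures` and its `Feature` class are hypothetical stand-ins rather than BioJava's `RichFeature`, and coordinates are assumed to be 1-based and inclusive.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: attach features to a sequence window fetched as a plain String.
 * Feature is a hypothetical stand-in, not BioJava's RichFeature.
 * Coordinates are 1-based and inclusive on both ends.
 */
class WindowFeatures {

    static final class Feature {
        final String name;
        final int start;
        final int end;

        Feature(String name, int start, int end) {
            this.name = name;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return name + ":" + start + ".." + end;
        }
    }

    /** Two inclusive ranges overlap if neither ends before the other starts. */
    static boolean overlaps(int aStart, int aEnd, int bStart, int bEnd) {
        return aStart <= bEnd && bStart <= aEnd;
    }

    /**
     * Keeps features overlapping [winStart, winEnd] on the chromosome and
     * re-expresses them relative to the window (window position 1 == winStart).
     * Features hanging over either edge are clipped to the window.
     */
    static List<Feature> project(List<Feature> features, int winStart, int winEnd) {
        List<Feature> out = new ArrayList<>();
        int offset = winStart - 1;
        for (Feature f : features) {
            if (!overlaps(f.start, f.end, winStart, winEnd)) {
                continue;
            }
            int s = Math.max(f.start, winStart) - offset;
            int e = Math.min(f.end, winEnd) - offset;
            out.add(new Feature(f.name, s, e));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Feature> all = new ArrayList<>();
        all.add(new Feature("geneA", 950, 1050));  // overhangs the left edge
        all.add(new Feature("geneB", 1200, 1300)); // fully inside
        all.add(new Feature("geneC", 5000, 6000)); // outside the window
        System.out.println(project(all, 1000, 1999)); // [geneA:1..51, geneB:201..301]
    }
}
```

Clipping overhanging features is only one possible choice; keeping their original extent and marking them as partial would be another.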


From gabrielle_doan at gmx.net  Tue Oct 14 11:18:20 2008
From: gabrielle_doan at gmx.net (Gabrielle Doan)
Date: Tue, 14 Oct 2008 13:18:20 +0200
Subject: [Biojava-l] Getting a part of a sequence
In-Reply-To: 
References: <48EB71A4.70409@gmx.net>	
		
	<48EDF769.8050901@gmx.net>
	
Message-ID: <48F47FFC.4090607@gmx.net>

Hi Richard,
I have checked out the latest source and tried my code again. It still
doesn't work, and I now get the following errors:


Exception in thread "main" java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
	at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.processFeatureFilter(BioSQLRichSequenceDB.java:143)
	at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.filter(BioSQLRichSequenceDB.java:151)
	at org.sequence_viewer.db.HBioSQLDB.filterFeature(HBioSQLDB.java:612)
	at org.sequence_viewer.db.AbfragenTest.main(AbfragenTest.java:56)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.processFeatureFilter(BioSQLRichSequenceDB.java:138)
	... 3 more
Caused by: org.hibernate.PropertyAccessException: Exception occurred inside setter of org.biojavax.bio.seq.SimpleRichFeature.locationSet
	at org.hibernate.property.BasicPropertyAccessor$BasicSetter.set(BasicPropertyAccessor.java:65)
	at org.hibernate.tuple.entity.AbstractEntityTuplizer.setPropertyValues(AbstractEntityTuplizer.java:337)
	at org.hibernate.tuple.entity.PojoEntityTuplizer.setPropertyValues(PojoEntityTuplizer.java:200)
	at org.hibernate.persister.entity.AbstractEntityPersister.setPropertyValues(AbstractEntityPersister.java:3571)
	at org.hibernate.engine.TwoPhaseLoad.initializeEntity(TwoPhaseLoad.java:133)
	at org.hibernate.loader.Loader.initializeEntitiesAndCollections(Loader.java:854)
	at org.hibernate.loader.Loader.doQuery(Loader.java:729)
	at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:236)
	at org.hibernate.loader.Loader.doList(Loader.java:2213)
	at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2104)
	at org.hibernate.loader.Loader.list(Loader.java:2099)
	at org.hibernate.loader.criteria.CriteriaLoader.list(CriteriaLoader.java:94)
	at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1569)
	at org.hibernate.impl.CriteriaImpl.list(CriteriaImpl.java:283)
	... 8 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.hibernate.property.BasicPropertyAccessor$BasicSetter.set(BasicPropertyAccessor.java:42)
	... 21 more
Caused by: java.lang.NullPointerException
	at org.biojavax.bio.seq.PositionResolver$AverageResolver.getMin(PositionResolver.java:103)
	at org.biojavax.bio.seq.SimpleRichLocation.getMin(SimpleRichLocation.java:323)
	at org.biojavax.bio.seq.SimpleRichLocation.overlaps(SimpleRichLocation.java:451)
	at org.biojavax.bio.seq.SimpleRichLocation.union(SimpleRichLocation.java:469)
	at org.biojavax.bio.seq.RichLocation$Tools.merge(RichLocation.java:363)
	at org.biojavax.bio.seq.SimpleRichFeature.setLocationSet(SimpleRichFeature.java:181)
	... 25 more

I think BioSQLFeatureFilter.OverlapsRichLocation(rl) causes the problem.
Can you help me solve it?

I'm grateful for any hints.
cheers,

Gabrielle






From holland at eaglegenomics.com  Tue Oct 14 15:23:10 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 14 Oct 2008 16:23:10 +0100
Subject: [Biojava-l] Getting a part of a sequence
In-Reply-To: <48F47FFC.4090607@gmx.net>
References: <48EB71A4.70409@gmx.net>
	
	<48EDF769.8050901@gmx.net>
	
	<48F47FFC.4090607@gmx.net>
Message-ID: 

Something's broken! At least from your stack trace I can see exactly what's
going on. The set of locations is being loaded for the feature, but
Hibernate is not calling the setMin()/setMax() methods in each location
before inserting them into the set.

When they get added to the set of locations for the feature, they therefore
get added with null for min and max. At any point when these locations are
used, for instance when they are merged by the feature location setter, or
anywhere else, you'll get NullPointerExceptions.

This is despite the fact that the HBM XML files explicitly tell it _not_ to
lazy-load them. Also, this only happens when loading Features, not when
loading Sequence objects.

I honestly don't know!
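
The failure mode described above, locations added to a feature's location set before their positions are populated and then dereferenced during the merge, can be illustrated without BioJava or Hibernate. In this sketch `Loc` is a hypothetical stand-in (not BioJava's `SimpleRichLocation`) whose min and max are filled in by setters after construction; unboxing a still-null value throws the same kind of `NullPointerException` that the trace shows in `PositionResolver$AverageResolver.getMin`. `checkedSpan` only fails earlier with a clearer message; it does not fix the underlying Hibernate loading problem.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustration only: Loc is a hypothetical stand-in for a location whose
 * min/max positions are filled in by setters after construction.
 */
class HalfLoadedLocations {

    static final class Loc {
        Integer min; // stays null until setMin() is called
        Integer max; // stays null until setMax() is called

        void setMin(int v) { min = v; }
        void setMax(int v) { max = v; }

        // Auto-unboxing a null Integer throws NullPointerException.
        int getMin() { return min; }
        int getMax() { return max; }

        boolean isPopulated() { return min != null && max != null; }
    }

    /** Naive merge: the overall span (min of mins, max of maxes). */
    static int[] span(List<Loc> locs) {
        int lo = Integer.MAX_VALUE;
        int hi = Integer.MIN_VALUE;
        for (Loc l : locs) {
            lo = Math.min(lo, l.getMin()); // NPE here if l was never populated
            hi = Math.max(hi, l.getMax());
        }
        return new int[] {lo, hi};
    }

    /** Same merge, but rejects unpopulated locations with a descriptive error. */
    static int[] checkedSpan(List<Loc> locs) {
        for (int i = 0; i < locs.size(); i++) {
            if (!locs.get(i).isPopulated()) {
                throw new IllegalStateException("location " + i + " has no min/max yet");
            }
        }
        return span(locs);
    }

    public static void main(String[] args) {
        Loc ok = new Loc();
        ok.setMin(100);
        ok.setMax(200);
        Loc halfLoaded = new Loc(); // setters never called

        List<Loc> locs = new ArrayList<>();
        locs.add(ok);
        locs.add(halfLoaded);
        try {
            span(locs);
        } catch (NullPointerException e) {
            System.out.println("NPE while merging, as in setLocationSet");
        }
        try {
            checkedSpan(locs);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // location 1 has no min/max yet
        }
    }
}
```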

What I suggest is that you create a temporary database with only one record
in it, and run your test program against that to see what happens. If it
still breaks, file a bug in Bugzilla and attach a GenBank dump of the
database along with your program code and the full stack trace. Someone
with a bit more Hibernate knowledge than me might then be able to help out.

cheers,
Richard


2008/10/14 Gabrielle Doan 

> Hi Richard,
> I have checked out the latest source and tried my code again. It still
> doesn't work, and I now get the following errors:
>
> 
> Exception in thread "main" java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
>        at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.processFeatureFilter(BioSQLRichSequenceDB.java:143)
>        at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.filter(BioSQLRichSequenceDB.java:151)
>        at org.sequence_viewer.db.HBioSQLDB.filterFeature(HBioSQLDB.java:612)
>        at org.sequence_viewer.db.AbfragenTest.main(AbfragenTest.java:56)
> Caused by: java.lang.reflect.InvocationTargetException
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.processFeatureFilter(BioSQLRichSequenceDB.java:138)
>        ... 3 more
> Caused by: org.hibernate.PropertyAccessException: Exception occurred inside setter of org.biojavax.bio.seq.SimpleRichFeature.locationSet
>        at org.hibernate.property.BasicPropertyAccessor$BasicSetter.set(BasicPropertyAccessor.java:65)
>        at org.hibernate.tuple.entity.AbstractEntityTuplizer.setPropertyValues(AbstractEntityTuplizer.java:337)
>        at org.hibernate.tuple.entity.PojoEntityTuplizer.setPropertyValues(PojoEntityTuplizer.java:200)
>        at org.hibernate.persister.entity.AbstractEntityPersister.setPropertyValues(AbstractEntityPersister.java:3571)
>        at org.hibernate.engine.TwoPhaseLoad.initializeEntity(TwoPhaseLoad.java:133)
>        at org.hibernate.loader.Loader.initializeEntitiesAndCollections(Loader.java:854)
>        at org.hibernate.loader.Loader.doQuery(Loader.java:729)
>        at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:236)
>        at org.hibernate.loader.Loader.doList(Loader.java:2213)
>        at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2104)
>        at org.hibernate.loader.Loader.list(Loader.java:2099)
>        at org.hibernate.loader.criteria.CriteriaLoader.list(CriteriaLoader.java:94)
>        at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1569)
>        at org.hibernate.impl.CriteriaImpl.list(CriteriaImpl.java:283)
>        ... 8 more
> Caused by: java.lang.reflect.InvocationTargetException
>        at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.hibernate.property.BasicPropertyAccessor$BasicSetter.set(BasicPropertyAccessor.java:42)
>        ... 21 more
> Caused by: java.lang.NullPointerException
>        at org.biojavax.bio.seq.PositionResolver$AverageResolver.getMin(PositionResolver.java:103)
>        at org.biojavax.bio.seq.SimpleRichLocation.getMin(SimpleRichLocation.java:323)
>        at org.biojavax.bio.seq.SimpleRichLocation.overlaps(SimpleRichLocation.java:451)
>        at org.biojavax.bio.seq.SimpleRichLocation.union(SimpleRichLocation.java:469)
>        at org.biojavax.bio.seq.RichLocation$Tools.merge(RichLocation.java:363)
>        at org.biojavax.bio.seq.SimpleRichFeature.setLocationSet(SimpleRichFeature.java:181)
>        ... 25 more
>
> I think BioSQLFeatureFilter.OverlapsRichLocation(rl) causes the problem.
> Can you help me solve it?
>
> I'm grateful for any hints.
> cheers,
>
> Gabrielle
>
>
>
> Richard Holland wrote:
>
>> This looks like a bug in BJX. I have just committed a fix to the head of
>> Subversion that I think will fix it. Can you check out the latest source,
>> compile it, and try your program again?
>>
>> cheers,
>> Richard
>>
>> 2008/10/9 Gabrielle Doan 
>>
>>  Hi Richard,
>>>
>>> Thanks a lot for your mail. I have successfully retrieved the subsequence
>>> of a sequence as a String. Now I am trying to get the features for a
>>> particular range with the following code:
>>>
>>> 
>>>       public FeatureHolder filterFeature(String name, int startpos, int endpos) {
>>>               RichLocation rl = new SimpleRichLocation(new SimplePosition(startpos),
>>>                               new SimplePosition(endpos), 0);
>>>               BioSQLFeatureFilter filter = new BioSQLFeatureFilter.And(
>>>                               new BioSQLFeatureFilter.BySequenceName(name),
>>>                               new BioSQLFeatureFilter.OverlapsRichLocation(rl));
>>>               return filter(filter);
>>>       }
>>>
>>> Unfortunately, I received these errors:
>>> 
>>> Exception in thread "main" java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
>>>       at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.processFeatureFilter(BioSQLRichSequenceDB.java:143)
>>>       at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.filter(BioSQLRichSequenceDB.java:151)
>>>       at org.sequence_viewer.db.HBioSQLDB.filterFeature(HBioSQLDB.java:599)
>>>       at org.sequence_viewer.db.AbfragenTest.main(AbfragenTest.java:56)
>>> Caused by: java.lang.reflect.InvocationTargetException
>>>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>       at java.lang.reflect.Method.invoke(Method.java:597)
>>>       at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.processFeatureFilter(BioSQLRichSequenceDB.java:138)
>>>       ... 3 more
>>> Caused by: org.hibernate.PropertyAccessException: Exception occurred inside setter of org.biojavax.bio.seq.SimpleRichFeature.locationSet
>>>       at org.hibernate.property.BasicPropertyAccessor$BasicSetter.set(BasicPropertyAccessor.java:65)
>>>       at org.hibernate.tuple.entity.AbstractEntityTuplizer.setPropertyValues(AbstractEntityTuplizer.java:337)
>>>       at org.hibernate.tuple.entity.PojoEntityTuplizer.setPropertyValues(PojoEntityTuplizer.java:200)
>>>       at org.hibernate.persister.entity.AbstractEntityPersister.setPropertyValues(AbstractEntityPersister.java:3571)
>>>       at org.hibernate.engine.TwoPhaseLoad.initializeEntity(TwoPhaseLoad.java:133)
>>>       at org.hibernate.loader.Loader.initializeEntitiesAndCollections(Loader.java:854)
>>>       at org.hibernate.loader.Loader.doQuery(Loader.java:729)
>>>       at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:236)
>>>       at org.hibernate.loader.Loader.doList(Loader.java:2213)
>>>       at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2104)
>>>       at org.hibernate.loader.Loader.list(Loader.java:2099)
>>>       at org.hibernate.loader.criteria.CriteriaLoader.list(CriteriaLoader.java:94)
>>>       at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1569)
>>>       at org.hibernate.impl.CriteriaImpl.list(CriteriaImpl.java:283)
>>>       ... 8 more
>>> Caused by: java.lang.reflect.InvocationTargetException
>>>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>       at java.lang.reflect.Method.invoke(Method.java:597)
>>>       at org.hibernate.property.BasicPropertyAccessor$BasicSetter.set(BasicPropertyAccessor.java:42)
>>>       ... 21 more
>>> Caused by: java.lang.NullPointerException
>>>       at org.biojavax.bio.seq.PositionResolver$AverageResolver.getMin(PositionResolver.java:103)
>>>       at org.biojavax.bio.seq.SimpleRichLocation.getMin(SimpleRichLocation.java:323)
>>>       at org.biojavax.bio.seq.SimpleRichLocation.overlaps(SimpleRichLocation.java:451)
>>>       at org.biojavax.bio.seq.SimpleRichLocation.union(SimpleRichLocation.java:469)
>>>       at org.biojavax.bio.seq.RichLocation$Tools.merge(RichLocation.java:363)
>>>       at org.biojavax.bio.seq.SimpleRichFeature.setLocationSet(SimpleRichFeature.java:181)
>>>       ... 26 more
>>> <\message>
>>>
>>> Why do I get these errors?
>>> BioSQLFeatureFilter.BySequenceName(name) needs a seqName as parameter.
>>> How can I find out the sequence name? Is it the value "name" in the table
>>> "Bioentry"? As the built-in subSequence method takes a long time, I intend
>>> to get the subsequence as a String myself and add the features to it.
>>> What do you think about this?
>>>
>>> I'm grateful for any hints.
>>> cheers,
>>>
>>> Gabrielle
>>>
>>>
>>>
>>> Richard Holland wrote:
>>>
>>>> Hello.
>>>
>>>> Your code is pretty good already - but you're right, it will load the
>>>> whole chromosome into memory before you can chop out the interesting
>>>> bit you actually need.
>>>>
>>>> As you observed, by using ThinRichSequence in your query it will load
>>>> only the initial shell of a sequence object to start with, but the
>>>> moment you try and sub-sequence it, it will immediately load the whole
>>>> sequence data into memory in order to perform the operation.
>>>>
>>>> If you only want the sequence data, as a string, you can do this by
>>>> specifying the sequence attribute in the query and bypassing the
>>>> sequence object entirely:
>>>>
>>>>  select rs.stringSequence from Sequence as rs where rs.description
>>>> like '%hromosome :num%'
>>>>
>>>> This will return a String instead of a RichSequence object. You can
>>>> use HQL operators to perform substrings etc. on the string inside the
>>>> query itself - see
>>>> http://docs.huihoo.com/hibernate/hibernate-reference-3.2.1/queryhql.html
>>>> , particularly section 14.9.
>>>>
>>>> If you only want the features, you can do this by using the
>>>> BioSQLFeatureFilter technique. In particular you will want the
>>>> BySequenceName filter, the And filter, and the OverlapsRichLocation
>>>> filter. You construct a filter then pass it to the filter() method in
>>>> BioSQLRichSequenceDB. The database will return to you all the
>>>> RichFeature objects that match your criteria. Note that it searches
>>>> the whole database so you really must use a BySequenceName filter at
>>>> the very least in order to make the results useful!
>>>>
>>>> However, you can't use HQL to construct a complete slice of a sequence
>>>> directly in the database before returning it to the program for use as
>>>> a ready-made RichSequence object. This would require Hibernate to know
>>>> what a BioJava sub-sequence object is and how it behaves in relation
>>>> to an 'unsliced' one, which is beyond the scope of its job as a
>>>> persistence framework.
>>>>
>>>> cheers,
>>>> Richard
>>>>
>>>>
>>>>
>>>> 2008/10/7 Gabrielle Doan :
>>>>
>>>>> Hi all,
>>>>> I have a BioSQL database which contains all human chromosomes. My
>>>>> intention is to get the information about a particular gene. How can I
>>>>> get a part of a particular chromosome with all its associated features?
>>>>> At the moment I use the following code to create my new sequence:
>>>>>
>>>>> 
>>>>> RichSequence subSeq = RichSequence.Tools.subSequence(parent,
>>>>>      position[0], position[1], ns, geneName, parent.getAccession(),
>>>>>      parent.getIdentifier(), parent.getVersion() + 1,
>>>>>      (Double) (parent.getVersion() + 1.0));
>>>>> <\code>
>>>>>
>>>>> Here is the part how I get the parent sequence:
>>>>> 
>>>>>      public static RichSequence getChromosome(String chrNo) {
>>>>>              Transaction tx = session.beginTransaction();
>>>>>              RichSequence ret = null;
>>>>>
>>>>>              String query;
>>>>>
>>>>>              try {
>>>>>                      if (chrNo.equals("MT")) {
>>>>>                              query = "from BioEntry as be where be.description like '%:num%'";
>>>>>                              query = query.replaceAll(":num", "mitochondrion");
>>>>>                      } else {
>>>>>                              query = "from BioEntry as be where be.description like '%hromosome :num%'";
>>>>>                              query = query.replaceAll(":num", chrNo);
>>>>>                      }
>>>>>
>>>>>                      Query q = session.createQuery(query);
>>>>>
>>>>>                      ret = (RichSequence) q.list().get(0);
>>>>>                      tx.commit();
>>>>>              } catch (Exception e) {
>>>>>                      tx.rollback();
>>>>>                      e.printStackTrace();
>>>>>              }
>>>>>              return ret;
>>>>>      }
>>>>> <\code>
>>>>>
>>>>> I always have to load the whole chromosome to get a part of it, so it
>>>>> takes a very long time and I get a lot of unused information (a waste of
>>>>> memory). I also tried to use ThinRichSequence instead of RichSequence,
>>>>> but I didn't notice any difference.
>>>>> Can you give me a hint how to speed up the code?
>>>>> I am grateful for any hints.
>>>>>
>>>>> cheers,
>>>>> Gabrielle
>>>>> _______________________________________________
>>>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>
>>
>
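The getChromosome method quoted above selects a chromosome with a LIKE pattern such as '%hromosome 1%' and takes the first row of the result. If the descriptions use the usual "chromosome N" wording, that pattern also matches chromosomes 10 to 19, so q.list().get(0) could return the wrong entry. The sketch below emulates LIKE in plain Java to show the effect and adds a stricter check that rejects a following digit. Assumptions: only % and _ are handled, matching is case-sensitive (a real database may compare case-insensitively depending on its collation), and the description strings are hypothetical.

```java
import java.util.regex.Pattern;

public class LikePatternCheck {

    // Rough emulation of SQL LIKE: '%' matches any run of characters, '_' exactly one.
    // Case-sensitive and without ESCAPE support (real databases may differ).
    static boolean like(String value, String pattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '%') {
                regex.append(".*");
            } else if (c == '_') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(value).matches();
    }

    // Stricter check: "hromosome <chrNo>" must not be followed by another digit,
    // so asking for chromosome 1 no longer accepts chromosome 12.
    static boolean isChromosome(String description, String chrNo) {
        Pattern p = Pattern.compile("hromosome " + Pattern.quote(chrNo) + "(?!\\d)");
        return p.matcher(description).find();
    }

    public static void main(String[] args) {
        // Hypothetical BioEntry descriptions, for illustration only.
        String[] descriptions = {
            "Homo sapiens chromosome 12, complete sequence",
            "Homo sapiens chromosome 1, complete sequence"
        };
        for (String d : descriptions) {
            System.out.println(d + " | LIKE: " + like(d, "%hromosome 1%")
                    + " | strict: " + isChromosome(d, "1"));
        }
    }
}
```

Running main reports "LIKE: true | strict: false" for the chromosome 12 description and "LIKE: true | strict: true" for chromosome 1. In the query itself, binding the value as a named parameter rather than splicing it in with replaceAll also avoids surprises if a value ever contains characters such as $, which replaceAll treats specially in its replacement argument.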


-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
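Building the subsequence as a String yourself, as Gabrielle suggests, depends on getting the coordinate conventions right. BioJava locations are 1-based and inclusive at both ends, whereas Java's String.substring(begin, end) is 0-based with an exclusive end, so residues start..end correspond to substring(start - 1, end). SQL-style substring(str, start, length) functions, such as the substring() mentioned in the HQL documentation Richard links to, are typically 1-based and take a length, i.e. end - start + 1. A minimal sketch; the class, method names and sequence data are hypothetical.

```java
public class SubSequenceCoords {

    // Residues start..end in 1-based, inclusive coordinates (BioJava's convention),
    // taken from a Java String, which is 0-based with an exclusive end index.
    static String slice(String seq, int start, int end) {
        if (start < 1 || start > end || end > seq.length()) {
            throw new IllegalArgumentException(
                    "invalid range " + start + ".." + end + " for length " + seq.length());
        }
        return seq.substring(start - 1, end);
    }

    // Length argument for an SQL-style substring(str, start, length) call
    // covering the same 1-based inclusive range.
    static int sqlLength(int start, int end) {
        return end - start + 1;
    }

    public static void main(String[] args) {
        String seq = "ACGTACGTTT"; // hypothetical sequence data
        System.out.println(slice(seq, 3, 6)); // GTAC
        System.out.println(sqlLength(3, 6));  // 4
    }
}
```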


From charles at imbusch.net  Tue Oct 14 21:03:04 2008
From: charles at imbusch.net (Charles Imbusch)
Date: Tue, 14 Oct 2008 23:03:04 +0200
Subject: [Biojava-l] parsing tblastn results
Message-ID: <48F50908.5060307@imbusch.net>

Hello,

For a project I want to parse a tblastn result with BioJava. I used the code
on http://biojava.org/wiki/BioJava:CookBook:Blast:Parser as is, and I get the
following error message:

Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -3
    at java.lang.String.substring(String.java:1938)
    at java.lang.String.substring(String.java:1905)
    at org.biojava.bio.program.sax.BlastLikeAlignmentSAXParser.parseLine(BlastLikeAlignmentSAXParser.java:289)
    at org.biojava.bio.program.sax.BlastLikeAlignmentSAXParser.parse(BlastLikeAlignmentSAXParser.java:115)
    at org.biojava.bio.program.sax.HitSectionSAXParser.outputHSPInfo(HitSectionSAXParser.java:514)
    at org.biojava.bio.program.sax.HitSectionSAXParser.firstHSPEvent(HitSectionSAXParser.java:287)
    at org.biojava.bio.program.sax.HitSectionSAXParser.interpret(HitSectionSAXParser.java:251)
    at org.biojava.bio.program.sax.HitSectionSAXParser.parse(HitSectionSAXParser.java:118)
    at org.biojava.bio.program.sax.BlastSAXParser.hitsSectionReached(BlastSAXParser.java:635)
    at org.biojava.bio.program.sax.BlastSAXParser.interpret(BlastSAXParser.java:337)
    at org.biojava.bio.program.sax.BlastSAXParser.parse(BlastSAXParser.java:164)
    at org.biojava.bio.program.sax.BlastLikeSAXParser.onNewDataSet(BlastLikeSAXParser.java:313)
    at org.biojava.bio.program.sax.BlastLikeSAXParser.interpret(BlastLikeSAXParser.java:276)
    at org.biojava.bio.program.sax.BlastLikeSAXParser.parse(BlastLikeSAXParser.java:162)
    at BlastEcho.echo(BlastEcho.java:29)
    at BlastEcho.main(BlastEcho.java:75)

I uploaded the Blast output file I want to parse here:
http://charles.imbusch.net/tmp/blastresult.txt

Any answer is appreciated.

Cheers,
  Charles
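The numbers in this trace narrow the failure down a little. Assuming a JDK 6 runtime (the String.java line numbers 1905 and 1938 appear consistent with that), substring(begin) delegates to substring(begin, length()), and when begin exceeds the end index the exception reports endIndex - beginIndex. Read that way, "-3" means BlastLikeAlignmentSAXParser.parseLine asked for the text starting three characters past the end of the line, i.e. the line was shorter than the column the parser expected. Newer JDKs word this message differently. The sketch below reproduces the situation and shows a bounds-checked alternative; safeTail is a hypothetical helper, not the fix applied to BioJava, and the sample line is made up.

```java
public class SubstringBounds {

    // True if line.substring(begin) throws StringIndexOutOfBoundsException,
    // which happens when begin is negative or greater than line.length().
    static boolean substringThrows(String line, int begin) {
        try {
            line.substring(begin);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    // Hypothetical defensive helper: the text from column 'begin' onwards,
    // or "" when the line is shorter than that column.
    static String safeTail(String line, int begin) {
        if (begin < 0) {
            throw new IllegalArgumentException("negative column: " + begin);
        }
        return begin >= line.length() ? "" : line.substring(begin);
    }

    public static void main(String[] args) {
        String line = "Sbjct: 12";      // hypothetical line, shorter than expected
        int column = line.length() + 3; // three characters past the end, as in "-3"
        System.out.println(substringThrows(line, column));      // true
        System.out.println("[" + safeTail(line, column) + "]"); // []
    }
}
```

Returning an empty string silently hides malformed input, so a real fix would more likely report or skip the offending line; the sketch only shows where a bounds check would go.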


From ayates at ebi.ac.uk  Wed Oct 15 08:07:35 2008
From: ayates at ebi.ac.uk (Andy Yates)
Date: Wed, 15 Oct 2008 09:07:35 +0100
Subject: [Biojava-l] ANN: EBI Course - Programmatic access in Java:
 webservices & work flows
Message-ID: <48F5A4C7.7010304@ebi.ac.uk>

Hi everyone,

Posting this here as it may be of interest to some people.

The EBI is holding a course on accessing a large number of its resources
from Java programs. The course will run from 24th to 27th November and
will be held on-site at the Hinxton Genome Campus. Resources covered
will include:

* Ontology Lookup Service - Offers access to multiple ontologies through
a common interface
* PICR - A tool for going between identifier spaces for proteins
* UniProt
* IntAct
* ChEBI
* BioMart
* Integr8
* CiteXplore
* And many many more :)

If you are interested in any of these resources, please go to
http://www.ebi.ac.uk/training/handson/course_081124_javawebservices.html
The course will cost £75 for the 3 days.

All the best,

Andy Yates


From holland at eaglegenomics.com  Wed Oct 15 08:13:18 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 15 Oct 2008 09:13:18 +0100
Subject: [Biojava-l] parsing tblastn results
In-Reply-To: <48F50908.5060307@imbusch.net>
References: <48F50908.5060307@imbusch.net>
Message-ID: 

I've raised a bug report for you. Hopefully someone will take a look at it
soon:

http://bugzilla.open-bio.org/show_bug.cgi?id=2617

cheers,
Richard

2008/10/14 Charles Imbusch 

> Hello,
>
> For a project I want to parse a tblastn result with BioJava. I used the
> code on http://biojava.org/wiki/BioJava:CookBook:Blast:Parser as is, and
> I get the following error message:
>
> Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -3
>   at java.lang.String.substring(String.java:1938)
>   at java.lang.String.substring(String.java:1905)
>   at org.biojava.bio.program.sax.BlastLikeAlignmentSAXParser.parseLine(BlastLikeAlignmentSAXParser.java:289)
>   at org.biojava.bio.program.sax.BlastLikeAlignmentSAXParser.parse(BlastLikeAlignmentSAXParser.java:115)
>   at org.biojava.bio.program.sax.HitSectionSAXParser.outputHSPInfo(HitSectionSAXParser.java:514)
>   at org.biojava.bio.program.sax.HitSectionSAXParser.firstHSPEvent(HitSectionSAXParser.java:287)
>   at org.biojava.bio.program.sax.HitSectionSAXParser.interpret(HitSectionSAXParser.java:251)
>   at org.biojava.bio.program.sax.HitSectionSAXParser.parse(HitSectionSAXParser.java:118)
>   at org.biojava.bio.program.sax.BlastSAXParser.hitsSectionReached(BlastSAXParser.java:635)
>   at org.biojava.bio.program.sax.BlastSAXParser.interpret(BlastSAXParser.java:337)
>   at org.biojava.bio.program.sax.BlastSAXParser.parse(BlastSAXParser.java:164)
>   at org.biojava.bio.program.sax.BlastLikeSAXParser.onNewDataSet(BlastLikeSAXParser.java:313)
>   at org.biojava.bio.program.sax.BlastLikeSAXParser.interpret(BlastLikeSAXParser.java:276)
>   at org.biojava.bio.program.sax.BlastLikeSAXParser.parse(BlastLikeSAXParser.java:162)
>   at BlastEcho.echo(BlastEcho.java:29)
>   at BlastEcho.main(BlastEcho.java:75)
>
> I uploaded the Blast output file I want to parse here:
> http://charles.imbusch.net/tmp/blastresult.txt
>
> Any answer is appreciated.
>
> Cheers,
>  Charles
>



-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From dtoomey at rcsi.ie  Wed Oct 15 09:46:58 2008
From: dtoomey at rcsi.ie (David Toomey)
Date: Wed, 15 Oct 2008 10:46:58 +0100
Subject: [Biojava-l] parsing tblastn results
References: <48F50908.5060307@imbusch.net>
	
Message-ID: 

Hi Richard

This looks suspiciously like a bug I raised a couple of weeks ago. I was
parsing blastp results but the stack trace is the same.

http://bugzilla.open-bio.org/show_bug.cgi?id=2603

Charles, I have updated the original bug report with a hack which at least
allows you to parse the result and get some output. You just need to
recompile the source code with the modified BlastLikeAlignmentSAXParser.java.
It's not ideal, but at least you will be able to run your code until the
source is fixed.

Cheers

Dave

-----Original Message-----
From: biojava-l-bounces at lists.open-bio.org
[mailto:biojava-l-bounces at lists.open-bio.org] On Behalf Of Richard Holland
Sent: 15 October 2008 09:13
To: Charles Imbusch
Cc: biojava-l at biojava.org
Subject: Re: [Biojava-l] parsing tblastn results

I've raised a bug report for you. Hopefully someone will take a look at it
soon:

http://bugzilla.open-bio.org/show_bug.cgi?id=2617

cheers,
Richard

2008/10/14 Charles Imbusch 

> Hello,
>
> For a project I want to parse a tblastn result with BioJava. I used the
> code on http://biojava.org/wiki/BioJava:CookBook:Blast:Parser as is, and
> I get the following error message:
>
> Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -3
>   at java.lang.String.substring(String.java:1938)
>   at java.lang.String.substring(String.java:1905)
>   at org.biojava.bio.program.sax.BlastLikeAlignmentSAXParser.parseLine(BlastLikeAlignmentSAXParser.java:289)
>   at org.biojava.bio.program.sax.BlastLikeAlignmentSAXParser.parse(BlastLikeAlignmentSAXParser.java:115)
>   at org.biojava.bio.program.sax.HitSectionSAXParser.outputHSPInfo(HitSectionSAXParser.java:514)
>   at org.biojava.bio.program.sax.HitSectionSAXParser.firstHSPEvent(HitSectionSAXParser.java:287)
>   at org.biojava.bio.program.sax.HitSectionSAXParser.interpret(HitSectionSAXParser.java:251)
>   at org.biojava.bio.program.sax.HitSectionSAXParser.parse(HitSectionSAXParser.java:118)
>   at org.biojava.bio.program.sax.BlastSAXParser.hitsSectionReached(BlastSAXParser.java:635)
>   at org.biojava.bio.program.sax.BlastSAXParser.interpret(BlastSAXParser.java:337)
>   at org.biojava.bio.program.sax.BlastSAXParser.parse(BlastSAXParser.java:164)
>   at org.biojava.bio.program.sax.BlastLikeSAXParser.onNewDataSet(BlastLikeSAXParser.java:313)
>   at org.biojava.bio.program.sax.BlastLikeSAXParser.interpret(BlastLikeSAXParser.java:276)
>   at org.biojava.bio.program.sax.BlastLikeSAXParser.parse(BlastLikeSAXParser.java:162)
>   at BlastEcho.echo(BlastEcho.java:29)
>   at BlastEcho.main(BlastEcho.java:75)
>
> I uploaded the Blast output file I want to parse here:
> http://charles.imbusch.net/tmp/blastresult.txt
>
> Any answer is appreciated.
>
> Cheers,
>  Charles
>



-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/




From gabrielle_doan at gmx.net  Wed Oct 15 13:15:39 2008
From: gabrielle_doan at gmx.net (Gabrielle Doan)
Date: Wed, 15 Oct 2008 15:15:39 +0200
Subject: [Biojava-l] Getting a part of a sequence
In-Reply-To: <381a3e850810142152p4e0a0c2ds80a74570b44f2be0@mail.gmail.com>
References: <48EB71A4.70409@gmx.net>	
		
	<48EDF769.8050901@gmx.net>	
		
	<48F47FFC.4090607@gmx.net>	
	<381a3e850810140928p4af06cf4r3dfd08908efd42f6@mail.gmail.com>	
	<48F4C99E.6070007@gmx.net>
	<381a3e850810142152p4e0a0c2ds80a74570b44f2be0@mail.gmail.com>
Message-ID: <48F5ECFB.6040703@gmx.net>

Hi Augusto,

I've inserted your files into BJX. Unfortunately it hasn't solved my 
problems. Maybe Richard has another idea how to handle it.

Best regards,
Gabrielle




Augusto Fernandes Vellozo wrote:
> Hi Gabrielle,
> Please let me know whether the results are OK or not.
> Note that when I made the corrections I didn't consider the
> circularLength case, because it doesn't matter for my use case and
> because I don't understand exactly what it is. Be careful if you have
> this use case.
> 
> Cheers,
> 
> Augusto
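Augusto mentions circularLength without being sure what it means. Judging by the name, and by BioJava's support for locations on circular sequences, it presumably records the length of a circular molecule (a plasmid or a mitochondrial genome, say) so that coordinates can wrap around the origin. The arithmetic such wrapping usually involves is sketched below. This is an illustration under that assumption, not BioJava's implementation; normalize and spanLength are hypothetical names and the molecule length is made up.

```java
public class CircularCoords {

    // Map any integer position onto 1..circularLength, wrapping around the origin.
    static int normalize(int position, int circularLength) {
        if (circularLength <= 0) {
            throw new IllegalArgumentException("circular length must be positive");
        }
        return Math.floorMod(position - 1, circularLength) + 1;
    }

    // Number of positions in start..end (1-based, inclusive) on a circular molecule;
    // when end < start the range is taken to cross the origin.
    static int spanLength(int start, int end, int circularLength) {
        int s = normalize(start, circularLength);
        int e = normalize(end, circularLength);
        return e >= s ? e - s + 1 : circularLength - s + 1 + e;
    }

    public static void main(String[] args) {
        int len = 100; // hypothetical molecule length
        System.out.println(normalize(101, len));    // 1
        System.out.println(normalize(0, len));      // 100
        System.out.println(spanLength(95, 5, len)); // 11 (95..100 plus 1..5)
    }
}
```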
> 
> 2008/10/14 Gabrielle Doan :
>> Hi Augusto,
>>
>> thank you so much. I hope this will be the solution to my problem.
>>
>> cheers,
>> Gabrielle
>>
>> Augusto Fernandes Vellozo wrote:
>>> Hi Gabrielle,
>>> I had some problems with the Location class and I modified some
>>> classes on my machine. I've already written to Richard.
>>> The classes modified are attached.
>>> These could help you.
>>>
>>> Good luck,
>>>
>>> Augusto
>>>
>>> 2008/10/14 Gabrielle Doan :
>>>> Hi Richard,
>>>> I have checked out the latest source and tried my code again. It still
>>>> didn't work and I received the following new errors:
>>>>
>>>> 
>>>> Exception in thread "main" java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
>>>>       at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.processFeatureFilter(BioSQLRichSequenceDB.java:143)
>>>>       at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.filter(BioSQLRichSequenceDB.java:151)
>>>>       at org.sequence_viewer.db.HBioSQLDB.filterFeature(HBioSQLDB.java:612)
>>>>       at org.sequence_viewer.db.AbfragenTest.main(AbfragenTest.java:56)
>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>       at java.lang.reflect.Method.invoke(Method.java:597)
>>>>       at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.processFeatureFilter(BioSQLRichSequenceDB.java:138)
>>>>       ... 3 more
>>>> Caused by: org.hibernate.PropertyAccessException: Exception occurred inside setter of org.biojavax.bio.seq.SimpleRichFeature.locationSet
>>>>       at org.hibernate.property.BasicPropertyAccessor$BasicSetter.set(BasicPropertyAccessor.java:65)
>>>>       at org.hibernate.tuple.entity.AbstractEntityTuplizer.setPropertyValues(AbstractEntityTuplizer.java:337)
>>>>       at org.hibernate.tuple.entity.PojoEntityTuplizer.setPropertyValues(PojoEntityTuplizer.java:200)
>>>>       at org.hibernate.persister.entity.AbstractEntityPersister.setPropertyValues(AbstractEntityPersister.java:3571)
>>>>       at org.hibernate.engine.TwoPhaseLoad.initializeEntity(TwoPhaseLoad.java:133)
>>>>       at org.hibernate.loader.Loader.initializeEntitiesAndCollections(Loader.java:854)
>>>>       at org.hibernate.loader.Loader.doQuery(Loader.java:729)
>>>>       at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:236)
>>>>       at org.hibernate.loader.Loader.doList(Loader.java:2213)
>>>>       at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2104)
>>>>       at org.hibernate.loader.Loader.list(Loader.java:2099)
>>>>       at org.hibernate.loader.criteria.CriteriaLoader.list(CriteriaLoader.java:94)
>>>>       at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1569)
>>>>       at org.hibernate.impl.CriteriaImpl.list(CriteriaImpl.java:283)
>>>>       ... 8 more
>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>>       at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
>>>>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>       at java.lang.reflect.Method.invoke(Method.java:597)
>>>>       at org.hibernate.property.BasicPropertyAccessor$BasicSetter.set(BasicPropertyAccessor.java:42)
>>>>       ... 21 more
>>>> Caused by: java.lang.NullPointerException
>>>>       at org.biojavax.bio.seq.PositionResolver$AverageResolver.getMin(PositionResolver.java:103)
>>>>       at org.biojavax.bio.seq.SimpleRichLocation.getMin(SimpleRichLocation.java:323)
>>>>       at org.biojavax.bio.seq.SimpleRichLocation.overlaps(SimpleRichLocation.java:451)
>>>>       at org.biojavax.bio.seq.SimpleRichLocation.union(SimpleRichLocation.java:469)
>>>>       at org.biojavax.bio.seq.RichLocation$Tools.merge(RichLocation.java:363)
>>>>       at org.biojavax.bio.seq.SimpleRichFeature.setLocationSet(SimpleRichFeature.java:181)
>>>>       ... 25 more
>>>> <\message>
>>>>
>>>> I think BioSQLFeatureFilter.OverlapsRichLocation(rl) causes the problem
>>>> I have. Can you help me solve this problem?
>>>>
>>>> I'm grateful for any hints.
>>>> cheers,
>>>>
>>>> Gabrielle
>>>>
>>>>
>>>>
>>>> Richard Holland wrote:
>>>>> This looks like a bug in BJX. I have just committed a fix to the head
>>>>> of Subversion that I think will fix it. Can you check out the latest
>>>>> source, compile it, and try your program again?
>>>>>
>>>>> cheers,
>>>>> Richard
>>>>>
>>>>> 2008/10/9 Gabrielle Doan 
>>>>>
>>>>>> Hi Richard,
>>>>>>
>>>>>> thanks a lot for your mail. I have successfully retrieved the
>>>>>> subsequence of a sequence as a String. Now I am trying to get the
>>>>>> features for a particular range with the following code:
>>>>>>
>>>>>> 
>>>>>>      public FeatureHolder filterFeature(String name, int startpos, int endpos) {
>>>>>>              RichLocation rl = new SimpleRichLocation(new SimplePosition(startpos),
>>>>>>                              new SimplePosition(endpos), 0);
>>>>>>              BioSQLFeatureFilter filter = new BioSQLFeatureFilter.And(
>>>>>>                              new BioSQLFeatureFilter.BySequenceName(name),
>>>>>>                              new BioSQLFeatureFilter.OverlapsRichLocation(rl));
>>>>>>              return filter(filter);
>>>>>>      }
>>>>>> <\code>
>>>>>>
>>>>>> Unfortunately I received these errors:
>>>>>> 
>>>>>> Exception in thread "main" java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
>>>>>>      at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.processFeatureFilter(BioSQLRichSequenceDB.java:143)
>>>>>>      at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.filter(BioSQLRichSequenceDB.java:151)
>>>>>>      at org.sequence_viewer.db.HBioSQLDB.filterFeature(HBioSQLDB.java:599)
>>>>>>      at org.sequence_viewer.db.AbfragenTest.main(AbfragenTest.java:56)
>>>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>>>>      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>>      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>      at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>      at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.processFeatureFilter(BioSQLRichSequenceDB.java:138)
>>>>>>      ... 3 more
>>>>>> Caused by: org.hibernate.PropertyAccessException: Exception occurred inside setter of org.biojavax.bio.seq.SimpleRichFeature.locationSet
>>>>>>      at org.hibernate.property.BasicPropertyAccessor$BasicSetter.set(BasicPropertyAccessor.java:65)
>>>>>>      at org.hibernate.tuple.entity.AbstractEntityTuplizer.setPropertyValues(AbstractEntityTuplizer.java:337)
>>>>>>      at org.hibernate.tuple.entity.PojoEntityTuplizer.setPropertyValues(PojoEntityTuplizer.java:200)
>>>>>>      at org.hibernate.persister.entity.AbstractEntityPersister.setPropertyValues(AbstractEntityPersister.java:3571)
>>>>>>      at org.hibernate.engine.TwoPhaseLoad.initializeEntity(TwoPhaseLoad.java:133)
>>>>>>      at org.hibernate.loader.Loader.initializeEntitiesAndCollections(Loader.java:854)
>>>>>>      at org.hibernate.loader.Loader.doQuery(Loader.java:729)
>>>>>>      at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:236)
>>>>>>      at org.hibernate.loader.Loader.doList(Loader.java:2213)
>>>>>>      at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2104)
>>>>>>      at org.hibernate.loader.Loader.list(Loader.java:2099)
>>>>>>      at org.hibernate.loader.criteria.CriteriaLoader.list(CriteriaLoader.java:94)
>>>>>>      at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1569)
>>>>>>      at org.hibernate.impl.CriteriaImpl.list(CriteriaImpl.java:283)
>>>>>>      ... 8 more
>>>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>>>>      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>>      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>      at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>      at org.hibernate.property.BasicPropertyAccessor$BasicSetter.set(BasicPropertyAccessor.java:42)
>>>>>>      ... 21 more
>>>>>> Caused by: java.lang.NullPointerException
>>>>>>      at org.biojavax.bio.seq.PositionResolver$AverageResolver.getMin(PositionResolver.java:103)
>>>>>>      at org.biojavax.bio.seq.SimpleRichLocation.getMin(SimpleRichLocation.java:323)
>>>>>>      at org.biojavax.bio.seq.SimpleRichLocation.overlaps(SimpleRichLocation.java:451)
>>>>>>      at org.biojavax.bio.seq.SimpleRichLocation.union(SimpleRichLocation.java:469)
>>>>>>      at org.biojavax.bio.seq.RichLocation$Tools.merge(RichLocation.java:363)
>>>>>>      at org.biojavax.bio.seq.SimpleRichFeature.setLocationSet(SimpleRichFeature.java:181)
>>>>>>      ... 26 more
>>>>>> <\message>
>>>>>>
>>>>>> Why do I get these errors?
>>>>>> BioSQLFeatureFilter.BySequenceName(name) needs a seqName as parameter.
>>>>>> How can I find out the sequence name? Is it the value "name" in the
>>>>>> table "Bioentry"? As the built-in subSequence method takes a long time,
>>>>>> I intend to get the subsequence as a String myself and add the
>>>>>> features to it. What do you think about this?
>>>>>>
>>>>>> I'm grateful for any hints.
>>>>>> cheers,
>>>>>>
>>>>>> Gabrielle
>>>>>>
>>>>>>
>>>>>>
>>>>>> Richard Holland wrote:
>>>>>>
>>>>>>> Hello.
>>>>>>> Your code is pretty good already - but you're right, it will load the
>>>>>>> whole chromosome into memory before you can chop out the interesting
>>>>>>> bit you actually need.
>>>>>>>
>>>>>>> As you observed, by using ThinRichSequence in your query it will load
>>>>>>> only the initial shell of a sequence object to start with, but the
>>>>>>> moment you try and sub-sequence it, it will immediately load the whole
>>>>>>> sequence data into memory in order to perform the operation.
>>>>>>>
>>>>>>> If you only want the sequence data, as a string, you can do this by
>>>>>>> specifying the sequence attribute in the query and bypassing the
>>>>>>> sequence object entirely:
>>>>>>>
>>>>>>>  select rs.stringSequence from Sequence as rs where rs.description
>>>>>>> like '%hromosome :num%'
>>>>>>>
>>>>>>> This will return a String instead of a RichSequence object. You can
>>>>>>> use HQL operators to perform substrings etc. on the string inside the
>>>>>>> query itself - see
>>>>>>> http://docs.huihoo.com/hibernate/hibernate-reference-3.2.1/queryhql.html
>>>>>>> , particularly section 14.9.
>>>>>>>
>>>>>>> If you only want the features, you can do this by using the
>>>>>>> BioSQLFeatureFilter technique. In particular you will want the
>>>>>>> BySequenceName filter, the And filter, and the OverlapsRichLocation
>>>>>>> filter. You construct a filter then pass it to the filter() method in
>>>>>>> BioSQLRichSequenceDB. The database will return to you all the
>>>>>>> RichFeature objects that match your criteria. Note that it searches
>>>>>>> the whole database so you really must use a BySequenceName filter at
>>>>>>> the very least in order to make the results useful!
>>>>>>>
>>>>>>> However, you can't use HQL to construct a complete slice of a sequence
>>>>>>> directly in the database before returning it to the program for use as
>>>>>>> a ready-made RichSequence object. This would require Hibernate to know
>>>>>>> what a BioJava sub-sequence object is and how it behaves in relation
>>>>>>> to an 'unsliced' one, which is beyond the scope of its job as a
>>>>>>> persistence framework.
>>>>>>>
>>>>>>> cheers,
>>>>>>> Richard
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2008/10/7 Gabrielle Doan :
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>> I have a BioSQL database which contains all human chromosomes. My
>>>>>>>> intention is to get the information about a particular gene. How can I
>>>>>>>> get a part of a particular chromosome with all its associated
>>>>>>>> features? At the moment I use the following code to create my new
>>>>>>>> sequence:
>>>>>>>>
>>>>>>>> 
>>>>>>>> RichSequence subSeq = RichSequence.Tools.subSequence(parent,
>>>>>>>>     position[0], position[1], ns, geneName, parent.getAccession(),
>>>>>>>>     parent.getIdentifier(), parent.getVersion() + 1,
>>>>>>>>     (Double) (parent.getVersion() + 1.0));
>>>>>>>> <\code>
>>>>>>>>
>>>>>>>> Here is the part how I get the parent sequence:
>>>>>>>> 
>>>>>>>>     public static RichSequence getChromosome(String chrNo) {
>>>>>>>>             Transaction tx = session.beginTransaction();
>>>>>>>>             RichSequence ret = null;
>>>>>>>>
>>>>>>>>             String query;
>>>>>>>>
>>>>>>>>             try {
>>>>>>>>                     if (chrNo.equals("MT")) {
>>>>>>>>                             query = "from BioEntry as be where be.description like '%:num%'";
>>>>>>>>                             query = query.replaceAll(":num", "mitochondrion");
>>>>>>>>                     } else {
>>>>>>>>                             query = "from BioEntry as be where be.description like '%hromosome :num%'";
>>>>>>>>                             query = query.replaceAll(":num", chrNo);
>>>>>>>>                     }
>>>>>>>>
>>>>>>>>                     Query q = session.createQuery(query);
>>>>>>>>
>>>>>>>>                     ret = (RichSequence) q.list().get(0);
>>>>>>>>                     tx.commit();
>>>>>>>>             } catch (Exception e) {
>>>>>>>>                     tx.rollback();
>>>>>>>>                     e.printStackTrace();
>>>>>>>>             }
>>>>>>>>             return ret;
>>>>>>>>     }
>>>>>>>>
>>>>>>>> I always have to load the whole chromosome to get a part of it, so it
>>>>>>>> takes a very long time and I get a lot of unused information (a waste
>>>>>>>> of memory). I also tried to use ThinRichSequence instead of
>>>>>>>> RichSequence, but I didn't notice any difference.
>>>>>>>> Can you give me a hint how to speed up the code?
>>>>>>>> I am grateful for any hints.
>>>>>>>>
>>>>>>>> cheers,
>>>>>>>> Gabrielle
>>>>>>>> _______________________________________________
>>>>>>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>>>>>>
>>>>>>>>



From holland at eaglegenomics.com  Mon Oct 20 00:18:29 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 20 Oct 2008 01:18:29 +0100
Subject: [Biojava-l] BioJava 3 Begins - Volunteers please!
Message-ID: 

Hi all,

I've just committed some new code to the biojava3 branch of the biojava-live
subversion repository. It's the foundations of a brand new alphabet+symbol
set of classes, and an example of how to use them to represent DNA. You'll
notice that the new code is very lightweight and allows for a lot more
flexibility than the old code - for instance, the concept of Alphabet has
changed radically. It also makes much more extensive use of the Collections
API.

I haven't got any test cases or usage examples yet but give me a shout if
you don't understand the code and I'll explain how it works. (Hint:
SymbolFormat is there to convert Strings into SymbolList objects, and vice
versa).

So, now we want some volunteers! We're starting from scratch here so there's
a lot of work to do. The whole of BioJava needs 'translating' into BJ3,
whether by copying and pasting existing classes and modifying them to suit the
new style, or by writing completely new ones that provide equivalent functionality.


I'll post an example of how to do file parsing soon, probably starting with
FASTA. In the meantime, a good place to start would be for people to design
object models to represent their favourite data types (e.g. Genbank, or
microarray data). Utility classes to manipulate those objects would be great
too.

The object models need to be normalised as much as possible - e.g. if your
data has a lot of comments, and the order of those comments is important,
then give your object model a collection of comment objects. The object
model for each data type should be completely independent and use basic data
types wherever possible (e.g. store sequences as strings, don't attempt to
parse them into anything fancy like SymbolLists). The closer the object
model is to the original data format, the better. There's going to be clever
tricks when it comes to converting data between different object models
(e.g. Genbank to INSDSeq), which I will explain later when I put the file
parsing examples up.
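As a rough illustration of the normalisation advice above, here is a minimal sketch of a much-simplified, hypothetical Genbank-style object model in plain Java. The class and field names (GenbankRecord, Comment, locusName and so on) are assumptions for illustration, not anything committed to the biojava3 branch; the two points it shows are an ordered collection of comment objects and a sequence stored as a plain String.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical, much-simplified Genbank-style object model (illustration only). */
public class GenbankRecord {

    /** One COMMENT entry, kept as its own object so each comment stays distinct. */
    public static class Comment {
        private final String text;

        public Comment(String text) { this.text = text; }

        public String getText() { return text; }
    }

    private String locusName;
    private String definition;
    // ArrayList keeps comments in the order they were added (i.e. file order).
    private final List<Comment> comments = new ArrayList<Comment>();
    // The sequence stays a plain String; no SymbolList parsing at this level.
    private String sequence = "";

    public String getLocusName() { return locusName; }
    public void setLocusName(String locusName) { this.locusName = locusName; }

    public String getDefinition() { return definition; }
    public void setDefinition(String definition) { this.definition = definition; }

    public void addComment(Comment comment) { comments.add(comment); }

    /** Read-only view, so callers cannot reorder or drop comments. */
    public List<Comment> getComments() { return Collections.unmodifiableList(comments); }

    public String getSequence() { return sequence; }
    public void setSequence(String sequence) { this.sequence = sequence; }
}
```

Keeping the model this close to the flat file is what later lets converters map one format's fields onto another's without losing information such as comment order.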

You'll notice how the biojava3 branch uses Maven instead of Ant. This is
because we want to make it as modular as possible, so if you want to write
microarray stuff, create a new microarray sub-project (as per the dna
example that's already there). This way if someone only wants the microarray
bit of BJ3, they only need install the appropriate JAR file and can ignore
the rest. (The 'core' module is for stuff that is so generic it could be
used anywhere, or is used in every single other module.)

If coding isn't your cup of tea, then we would very much welcome testers
(particularly those who enjoy writing test cases!), documenters
(particularly code commenters), translators (for internationalisation of the
code), and of course all those who wish to contribute ideas and suggestions
no matter how off-the-wall they might be. In particular if you'd like to
take charge of an area of the development process, e.g. Documentation Chief,
or Protein Champion, then that would be much appreciated.

I'm very much looking forward to working with everyone on this. Good luck,
and happy coding!

cheers,
Richard

PS. Please don't forget to attach the appropriate licence to your code. You
can copy-and-paste it from the existing classes I just committed this
evening.

PPS. For those who are worried about backwards compatibility - this was
discussed on the lists a while back and it was made clear that BJ3 is a
clean break. However, the existing code will continue to be maintained and
bugfixed for a couple of years so you don't have to upgrade if you don't
want to - it just won't have any new features developed for it. This is
largely because it'll probably take just that long to write all the new BJ3
code. When we do decide to desupport the existing BJ code, plenty of notice
will be given (i.e. years as opposed to months).


-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From holland at eaglegenomics.com  Mon Oct 20 17:52:08 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 20 Oct 2008 18:52:08 +0100
Subject: [Biojava-l] File parsing in BJ3
Message-ID: 

(From now on I will only be posting these development messages to
biojava-dev, which is the intended purpose of that list. Those of you who
wish to keep track of things but are currently only subscribed to biojava-l
should also subscribe to biojava-dev in order to keep up to date.)

As promised, I've committed a new package in the biojava-core module that
should help you understand how to do file parsing, conversion and writing in
the new BJ3 modules. Here's an example of how to use it to write a Genbank
parser (note no parsers actually exist yet!):

1. Design yourself a Genbank class which implements the interface Thing and
can fully represent all the data that might possibly occur inside a Genbank
file.

2. Write an interface called GenbankReceiver, which extends ThingReceiver
and defines all the methods you might need in order to construct a Genbank
object in an asynchronous fashion.

3. Write a GenbankBuilder class which implements GenbankReceiver and
ThingBuilder. Its job is to receive data via method calls, use that data to
construct a Genbank object, then provide that object on demand.

4. Write a GenbankWriter class which implements GenbankReceiver and
ThingWriter. Its job is similar to GenbankBuilder, but instead of
constructing new Genbank objects, it writes Genbank records to file that
reflect the data it receives.

5. Write a GenbankReader class which implements ThingReader. It can read
Genbank files and output the data to the methods of the ThingReceiver
provided to it, which in this case could be anything which implements the
interface GenbankReceiver.

6. Write a GenbankEmitter class which implements ThingEmitter. It takes a
Genbank object and will fire off data from it to the provided ThingReceiver
(a GenbankReceiver instance) as if the Genbank object was being read from a
file or some other source.

That's it! OK so it's a minimum of 6 classes instead of the original 1 or 2,
but the additional steps are necessary for flexibility in converting between
formats.

Now to use it (you'll probably want a GenbankTools class to wrap these steps
up for user-friendliness, including various options for opening files,
etc.):

1. To read a file - instantiate ThingParser with your GenbankReader as the
reader, and GenbankBuilder as the receiver. Use the iterator methods on
ThingParser to get the objects out.

2. To write a file - instantiate ThingParser with a GenbankEmitter wrapping
your Genbank object, and a GenbankWriter as the receiver. Use the parseAll()
method on the ThingParser to dump the whole lot to your chosen output.
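To make the six roles concrete, here is a minimal, self-contained sketch that uses a toy FASTA model instead of Genbank, since FASTA is short enough to fit. Everything in it is hypothetical: Thing and ThingReceiver are local empty stand-ins, the FastaReceiver callback names are invented, and the reader and emitter are plain static methods wired by hand where the real code would hand them to ThingParser. The actual biojava3 interfaces may have quite different signatures.

```java
import java.util.ArrayList;
import java.util.List;

public class ThingPatternSketch {

    // Local stand-ins for the biojava3 core marker types (assumed, not the real ones).
    interface Thing { }
    interface ThingReceiver { }

    /** Step 1: the object model, with the sequence kept as a plain String. */
    static class Fasta implements Thing {
        final String description;
        final String sequence;
        Fasta(String description, String sequence) {
            this.description = description;
            this.sequence = sequence;
        }
    }

    /** Step 2: the callbacks needed to assemble a record piece by piece. */
    interface FastaReceiver extends ThingReceiver {
        void startRecord();
        void setDescription(String description);
        void appendSequence(String chunk);
        void endRecord();
    }

    /** Step 3: a builder turns callbacks into Fasta objects. */
    static class FastaBuilder implements FastaReceiver {
        private final List<Fasta> built = new ArrayList<Fasta>();
        private String description;
        private StringBuilder seq;
        public void startRecord() { description = null; seq = new StringBuilder(); }
        public void setDescription(String d) { description = d; }
        public void appendSequence(String chunk) { seq.append(chunk); }
        public void endRecord() { built.add(new Fasta(description, seq.toString())); }
        List<Fasta> getBuilt() { return built; }
    }

    /** Step 4: a writer reacts to the same callbacks by producing FASTA text. */
    static class FastaWriter implements FastaReceiver {
        private final StringBuilder out = new StringBuilder();
        public void startRecord() { }
        public void setDescription(String d) { out.append('>').append(d).append('\n'); }
        public void appendSequence(String chunk) { out.append(chunk).append('\n'); }
        public void endRecord() { }
        String getText() { return out.toString(); }
    }

    /** Step 5: a reader parses text and fires callbacks at any FastaReceiver. */
    static void read(String text, FastaReceiver receiver) {
        boolean open = false;
        for (String line : text.split("\r?\n")) {   // accepts Unix or Windows line breaks
            if (line.startsWith(">")) {
                if (open) receiver.endRecord();
                receiver.startRecord();
                receiver.setDescription(line.substring(1).trim());
                open = true;
            } else if (open && line.trim().length() > 0) {
                receiver.appendSequence(line.trim());
            }
        }
        if (open) receiver.endRecord();
    }

    /** Step 6: an emitter replays an existing object as if it were being read. */
    static void emit(Fasta record, FastaReceiver receiver) {
        receiver.startRecord();
        receiver.setDescription(record.description);
        receiver.appendSequence(record.sequence);
        receiver.endRecord();
    }

    public static void main(String[] args) {
        // Reading: reader -> builder yields objects.
        FastaBuilder builder = new FastaBuilder();
        read(">seq1 test\nACGT\nTTGA\n>seq2\nGG\n", builder);
        // Writing: emitter -> writer yields text.
        FastaWriter writer = new FastaWriter();
        for (Fasta f : builder.getBuilt()) {
            emit(f, writer);
        }
        System.out.print(writer.getText());
    }
}
```

The point of the split is that FastaBuilder and FastaWriter are interchangeable behind read() and emit(): swapping the receiver changes what comes out without touching the parsing code.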

The clever bit comes when you want to convert between files. Imagine you've
done all the above for Genbank, and you've also done it for FASTA. How to
convert between them? What you need to do is this:

1. Implement all the classes for both Genbank and FASTA.

2. Write a GenbankFASTAConverter class that implements ThingConverter
and GenbankReceiver, and will internally convert the data received and pass
it on out to the receiver provided, which will be a FASTAReceiver instance.

3. Write a FASTAGenbankConverter class that operates in exactly the opposite
way, implementing ThingConverter and FASTAReceiver.

Then to convert you use ThingParser again:

1. From FASTA file to Genbank object: Instantiate ThingParser with a
FASTAReader reader, a GenbankBuilder receiver, and add a
FASTAGenbankConverter instance to the converter chain. Use the iterator to
get your Genbank objects out of your FASTA file.

2. From FASTA file to Genbank file: Same as option 1, but provide a
GenbankWriter instead and use parseAll() instead of the iterator methods.

3. From FASTA object to Genbank object: Same as option 1, but provide a
FASTAEmitter wrapping your FASTA object as the reader instead.

4. From FASTA object to Genbank file: Same as option 1, but swap both the
reader and the receiver as per options 2 and 3.

5/6/7/8. From Genbank * to FASTA * - same as 1,2,3,4 but swap all mentions
of FASTA and Genbank, and use GenbankFASTAConverter instead.

One last and very important feature of this approach is that if you discover
that nobody has written the appropriate converter for your chosen pair of
formats A and C, but converters do exist to map A to some other format B and
that other format B on to C, then you can just put the two converters A-B and
B-C into the ThingParser chain and it'll work perfectly.
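A toy sketch of the chaining idea, with three invented formats (A: a FASTA-like description plus sequence; B: an id/sequence table row; C: an id/length summary). The receiver interfaces and converter classes are hypothetical and much simpler than the ThingConverter/ThingReceiver design described in this message, and the converters are wired by hand rather than through ThingParser. It shows how an A-to-B converter and a B-to-C converter compose when no direct A-to-C converter exists.

```java
import java.util.ArrayList;
import java.util.List;

public class ConverterChainSketch {

    // Hypothetical receiver interfaces for three toy formats A, B and C.
    interface FastaReceiver { void record(String description, String sequence); }
    interface TabularReceiver { void row(String id, String sequence); }
    interface SummaryReceiver { void summary(String id, int length); }

    /** Converter A -> B: receives FASTA callbacks, forwards tabular ones. */
    static class FastaToTabular implements FastaReceiver {
        private final TabularReceiver next;
        FastaToTabular(TabularReceiver next) { this.next = next; }
        public void record(String description, String sequence) {
            // use the first word of the description line as the identifier
            String id = description.trim().split("\\s+")[0];
            next.row(id, sequence);
        }
    }

    /** Converter B -> C: receives tabular callbacks, forwards summaries. */
    static class TabularToSummary implements TabularReceiver {
        private final SummaryReceiver next;
        TabularToSummary(SummaryReceiver next) { this.next = next; }
        public void row(String id, String sequence) {
            next.summary(id, sequence.length());
        }
    }

    /** Builder for format C: collects "id:length" strings. */
    static class SummaryCollector implements SummaryReceiver {
        final List<String> lines = new ArrayList<String>();
        public void summary(String id, int length) { lines.add(id + ":" + length); }
    }

    public static void main(String[] args) {
        SummaryCollector sink = new SummaryCollector();
        // No direct A -> C converter exists, so chain A -> B -> C.
        FastaReceiver chain = new FastaToTabular(new TabularToSummary(sink));
        chain.record("seq1 test sequence", "ACGTTTGA");   // as a FASTA reader would
        chain.record("seq2", "GG");
        System.out.println(sink.lines);                   // prints [seq1:8, seq2:2]
    }
}
```

Each converter only knows its own input receiver type and the next receiver in line, which is why adding one more hop costs nothing beyond constructing it.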

Enjoy!

cheers,
Richard

-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From markjschreiber at gmail.com  Tue Oct 21 02:54:27 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 21 Oct 2008 10:54:27 +0800
Subject: [Biojava-l] Biojava / BioSQL entity beans
Message-ID: <93b45ca50810201954k44ab0f65xb94a0214d8eb4e13@mail.gmail.com>

Hi -

Richard has kindly uploaded some JPA Entity beans that map to the
BioSQL database schema as a BioSQL module for BJ3.  These entity beans
were generated as part of the Tokyo webservices workshop.  As
entities they are useful as POJOs as well as for data transfer via JPA
and JAXB, and they can be used in EJB containers or a plain old JVM.  They
have no biological smarts; the intention was/is that these will be
provided by wrapping them in Bio-aware (and more thread-safe) wrappers
that implement interfaces from other BJ3 modules.  In essence it is a
persistence layer.

The following is copied verbatim from the package-info.java and gives
you some idea of how I intend the package to be used (obviously some
of this is still to come).  There is also some discussion of some of
the gotcha's that might trip you up when playing with object
relational persistence.

BTW the naming convention is to call something FooEntity. Where BioSQL
requires a compound primary key this is implemented as an Embeddable
object called FooEntityPK which is the key for FooEntity.  The other
thing you may see is FooEntityUK which is the same concept but
represents some of the cases where BioSQL tables don't have a primary
key (even a compound one) but implicitly they do because all the
fields have the SQL unique restriction. In these cases JPA still
requires an Embeddable key to track updates. As far as Java is
concerned they are the same as a FooEntityPK but I used a different
name to make the distinction.
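For readers new to compound keys, here is a minimal sketch of what a FooEntityPK-style class involves, written in plain Java so it compiles without a JPA jar. The class and its two key columns (bioentryId, termId) are hypothetical. In a real mapping the class would carry @Embeddable and FooEntity would reference it through an @EmbeddedId field; JPA expects such key classes to be Serializable, to have a public no-argument constructor, and to define equals() and hashCode() by value, which is the part shown here.

```java
import java.io.Serializable;
import java.util.Objects;

/**
 * Hypothetical compound key in the FooEntityPK style. A real JPA mapping would
 * annotate this class @Embeddable; annotations are omitted so it builds without JPA.
 */
public class FooEntityPK implements Serializable {
    private static final long serialVersionUID = 1L;

    private Integer bioentryId;
    private Integer termId;

    public FooEntityPK() { }   // JPA requires a public no-arg constructor

    public FooEntityPK(Integer bioentryId, Integer termId) {
        this.bioentryId = bioentryId;
        this.termId = termId;
    }

    public Integer getBioentryId() { return bioentryId; }
    public void setBioentryId(Integer bioentryId) { this.bioentryId = bioentryId; }
    public Integer getTermId() { return termId; }
    public void setTermId(Integer termId) { this.termId = termId; }

    // Key classes compare by value: two keys are equal when every column matches.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof FooEntityPK)) return false;
        FooEntityPK other = (FooEntityPK) o;
        return Objects.equals(bioentryId, other.bioentryId)
            && Objects.equals(termId, other.termId);
    }

    // Must agree with equals(), so it hashes the same columns.
    @Override
    public int hashCode() {
        return Objects.hash(bioentryId, termId);
    }
}
```

A FooEntityUK in the sense described above would look the same from Java's point of view; only the name records that the columns come from a unique constraint rather than a declared primary key.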

The annotations provide mapping to tables from a Derby database. This
is the reference Java in memory DB which can run from any JVM and is
also found in Glassfish. The mappings will likely also work with
MySQL. For Oracle (and possibly others) you would need to override the
@GeneratedValue strategy for generating primary keys. I believe this
can be done with external XML config files. You may also wish to
overide the default eager loading and cascade annotations depending on
your JPA persistence method and preferences.

This has been lightly tested using Glassfish, Derby and TopLink
Essentials; it is a work in progress but seems to work OK.

Best regards,

- Mark

/**
 * This package contains Entity representations of BioJava classes.
 * The purpose of these entities is to allow BioJava data to be serialized
 * simply, both in binary form for protocols that require it (e.g. RPC between
 * Java application servers) and by persistence mechanisms that require
 * bean-like objects, such as the Java Persistence API (JPA) or the
 * Java Architecture for XML Binding (JAXB). For this reason all objects in this
 * package should provide a parameterless public constructor and public get/set
 * methods for relevant fields.
 *
 * Given the public nature of the constructors and setters in these beans,
 * these classes are not intended for direct use in general programming with
 * the BioJava v3 API. This is because it is possible to leave a bean in an
 * inconsistent state, and the beans are not thread safe unless synchronization
 * is controlled externally (via synchronized blocks or by an application
 * container).
 *
 * The Entities are intended to back other objects that a programmer will
 * interact with directly. For example, Foo.class will be backed by
 * FooEntity.class. Generally, interaction with Foo.class is to be preferred and
 * will often be more sensible, as the entities typically provide no 'biological
 * behaviour'. Relevant behaviour should be provided by the wrapping class. It is
 * best to think of Foo as a view onto the data that is held in the FooEntity.
 * A good example is the sophisticated Symbol behaviour that can represent
 * biological logic about IUPAC ambiguity symbols. For example, a 'w' in a
 * Biosequence represents an ambiguity between 'a' and 't', whereas a 'w' in a
 * BiosequenceEntity is simply a 'w' and nothing else.
 *
 * The wrapper/entity pattern is intended to allow for much of the advanced
 * behaviour of the original BioJava while also allowing the use of modern
 * transport and persistence packages. This is achieved by persisting and
 * transporting the entity without the wrapper and re-wrapping it at the other end.
 *
 * Currently BioJava v3 uses the annotated @Id fields to define
 * equals(Object o). A consistent definition is critical to how
 * the object will behave when persisted to a database. In the case of:
 *
 *     Foo f = ...;   // initialize
 *     Foo fo = ...;  // initialize
 *     boolean b = f.equals(fo);
 *
 * b would be true if both objects share the same value (or embeddable object)
 * in the field that represents the primary key in the database, even if their
 * other fields differ. This is desirable because two entities representing the
 * same DB record may be retrieved from two different sessions. Additionally,
 * these are the identity fields, so logically they should map to the concept
 * of identity. Finally, searching a collection becomes very simple and does not
 * require an iterator:
 *
 *     Integer id = ...;  // code to initialize
 *     collection.contains(new Foo(id));
 *
 * By default, BioJava v3 entities use only the primary key field for equality.
 * If either record has null as its primary key value, it is never equal
 * to another. When implementing equals(Object o) it is not advisable to perform
 * the test this.getClass() == o.getClass(), because of the possibility of proxy
 * classes being used by JPA. Basing equality on the primary key can, however,
 * lead to an issue with the hashCode() method. Consider the following code:
 *
 *     Foo foo = new Foo();  // no primary key yet
 *     HashSet set = new HashSet();
 *     set.add(foo);
 *     // code here to persist foo and consequently generate its PK
 *     boolean b = set.contains(foo);
 *
 * Because only the PK is used for equality, the PK is also used in
 * hashCode(). This means that b is probably going to be false, because foo
 * was stored in a hash bucket chosen with the old hash code, which will now be
 * different, even though the set really does contain a reference to foo.
 * Although this is a potential deficiency, it is unlikely to be a major problem
 * for BioJava v3 developers, because using entity-backed objects is preferred
 * to direct interaction with entities. If you need to use entities directly,
 * use hashed collections with caution.
 *
 * Wrapper classes can either delegate their equals call to the underlying
 * entity or do something that is more biologically sensible
 * (as PK values are typically not exposed in the wrapper). It is probably more
 * sensible for a wrapper to define its own equals (and hashCode)
 * implementations, due to the limitations of the default @Id-based system
 * described above, especially the potential hashCode problems.
 *
 * For example, FooSequence.class might want to base
 * equality on an exact match of the DNA sequence it holds, even though
 * FooSequenceEntity.class may only use the PK field. Whether delegation
 * is used or not should be clearly documented.
 *
 * @author Mark Schreiber
 */
package org.biojava.biosql.entity;
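The hash-code pitfall described in the package documentation can be reproduced with the JDK alone. This is a minimal sketch: FooEntity is a hypothetical stand-in with a Long primary key, and calling setId() simulates the key being generated when the entity is persisted. Equality follows the rules in the documentation (primary key only, a null key is never equal to another record, no getClass() comparison).

```java
import java.util.HashSet;
import java.util.Set;

public class PkEqualitySketch {

    /** Hypothetical entity whose equality is based only on its primary key. */
    static class FooEntity {
        private Long id;      // null until the entity is "persisted"
        private String name;

        Long getId() { return id; }
        void setId(Long id) { this.id = id; }   // stands in for the PK assigned on persist
        void setName(String name) { this.name = name; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            // instanceof rather than getClass(), so JPA proxy subclasses can still match
            if (!(o instanceof FooEntity)) return false;
            Long otherId = ((FooEntity) o).getId();   // getter, not field: proxy-friendly
            // a record with a null PK is never equal to another record
            return id != null && otherId != null && id.equals(otherId);
        }

        @Override
        public int hashCode() {
            return id == null ? 0 : id.hashCode();
        }
    }

    /** Reproduces the pitfall: the PK (and so the hash code) changes after insertion. */
    static boolean containsAfterPersist() {
        FooEntity foo = new FooEntity();     // no primary key yet
        Set<FooEntity> set = new HashSet<FooEntity>();
        set.add(foo);                        // filed under the hash code for a null key
        foo.setId(42L);                      // simulated PK generation on persist
        return set.contains(foo);            // looks under the new hash code instead
    }

    public static void main(String[] args) {
        System.out.println(containsAfterPersist());   // prints false
        FooEntity a = new FooEntity(); a.setId(7L); a.setName("first");
        FooEntity b = new FooEntity(); b.setId(7L); b.setName("second");
        System.out.println(a.equals(b));              // prints true: same PK, names differ
    }
}
```

In practice this suggests only putting entities into hashed collections after their primary key has been assigned, or letting a wrapper define its own equals()/hashCode() as the documentation recommends.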


From markjschreiber at gmail.com  Tue Oct 21 03:16:51 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 21 Oct 2008 11:16:51 +0800
Subject: [Biojava-l] File parsing in BJ3
In-Reply-To: 
References: 
Message-ID: <93b45ca50810202016j13a2a2a9y78a2992e543d6f5a@mail.gmail.com>

So if I want to build a BioSQL loader from Genbank then would the
classes (or their wrappers) in the BioSQL Entity package need to
implement Thing?  Would Maven have an issue with that or would it just
create a dependency on core? (You can tell I've never used Maven,
right?)

From a design point of view, should Thing be an interface or an
Annotation? The reason I ask is that it doesn't define any methods, so
it is more of a tag than an interface.

Anyway, my understanding is that I would use a Genbank parser (or
write one), write an EntityReceiver interface (probably more than one,
given the number of entities in BioSQL), and implement an EntityBuilder
(again possibly more than one) that implements EntityReceiver and
builds Entity beans from the messages it receives. In this case I probably
wouldn't provide a writer, as JPA would be writing the beans to the
database.  Would this be how you imagine it?

- Mark


On Tue, Oct 21, 2008 at 1:52 AM, Richard Holland
 wrote:


From andreas at sdsc.edu  Tue Oct 21 03:17:28 2008
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 20 Oct 2008 20:17:28 -0700
Subject: [Biojava-l] [Biojava-dev] BioJava 3 Begins - Volunteers please!
In-Reply-To: 
References: 
Message-ID: <59a41c430810202017n226327cahefe0ed7e5f6a8df2@mail.gmail.com>

Hi,

Couple of thoughts regarding biojava v3:

License: Since it seems we will end up copying code from biojava 1.6
to biojava 3.0, we need to keep the license the same (LGPL 2.1). I.e.
people should still use the same biojava license headers when
committing new files and all code will be considered to be LGPL, if no
header is present. Do NOT commit code under other licenses.

Installation: We need some installation instructions on the wiki site,
e.g. how to get the maven setup running.  What are the code
conventions for the new version?

Blast: the Blast parsing modules are among the most frequently used
ones in biojava 1.6. To make people use biojava v3 it will be crucial
to have a port of them to the new version. Does anybody want to take
care of that?

Automated builds: is it interesting to have automated builds set up
for the new version at this stage, or should we wait until a more
mature stage? I could easily add another auto-build similar to the one
for biojava 1.6 at http://www.spice-3d.org/cruise/

Andreas

On Sun, Oct 19, 2008 at 5:18 PM, Richard Holland
 wrote:


From markjschreiber at gmail.com  Tue Oct 21 05:41:28 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 21 Oct 2008 13:41:28 +0800
Subject: [Biojava-l] Logging in BJ3
Message-ID: <93b45ca50810202241i767e2a56w2d9b7ede0f895431@mail.gmail.com>

Hi -

I would like to strongly advocate the liberal and extensive use of
Logging in BioJava3.  The lack of this plagued us (me at least) during
bug fixes in previous versions of BioJava.  The default Java logging
API is very flexible and easily meets our needs. It's also not too
much effort for developers to put in place (you know you use
System.out.println() all over the place anyway).

The following is an example snippet using logging that would certainly
help debugging.  With the standard logging setup only the severe
statement would appear on the terminal. We could also provide config
files that show lower levels of logging so that people can easily
generate detailed logs to accompany bug reports.  If we want to be
really tricky we could even use a MemoryHandler, which keeps a rotating
buffer of log records that could be written out along with a stack trace, so
you could submit the stack trace and the activity log in one go and we
could get an idea of what was going on at the time.
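The rotating-buffer idea maps onto java.util.logging.MemoryHandler in the JDK, which keeps the last N records in a circular buffer and pushes them to a target handler when a record at or above a "push" level arrives. A minimal sketch, assuming a made-up logger name; CollectingHandler is a hypothetical stand-in for a real file or console handler.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;
import java.util.logging.MemoryHandler;

public class RecentActivityLog {

    /** Collects whatever the MemoryHandler pushes; stands in for a file or console handler. */
    static class CollectingHandler extends Handler {
        final List<LogRecord> records = new ArrayList<LogRecord>();
        @Override public void publish(LogRecord record) { records.add(record); }
        @Override public void flush() { }
        @Override public void close() { }
    }

    /**
     * Logs five FINE records and then one SEVERE record through a MemoryHandler
     * holding bufferSize records, and returns the messages that reached the target.
     */
    static List<String> simulateFailure(int bufferSize) {
        Logger logger = Logger.getLogger("org.biojava.demo.RecentActivityLog");
        logger.setUseParentHandlers(false);   // keep the demo off the console
        logger.setLevel(Level.ALL);           // let FINE records reach the handler

        CollectingHandler target = new CollectingHandler();
        // Keep the last bufferSize records; push them all to target on SEVERE.
        MemoryHandler memory = new MemoryHandler(target, bufferSize, Level.SEVERE);
        logger.addHandler(memory);
        try {
            for (int i = 1; i <= 5; i++) {
                logger.fine("step " + i);     // buffered only, nothing written yet
            }
            logger.severe("failure");         // triggers the push
        } finally {
            logger.removeHandler(memory);
        }

        List<String> messages = new ArrayList<String>();
        for (LogRecord r : target.records) {
            messages.add(r.getMessage());
        }
        return messages;
    }

    public static void main(String[] args) {
        System.out.println(simulateFailure(3));   // prints [step 4, step 5, failure]
    }
}
```

Because older records are overwritten, the buffer size bounds memory use while still capturing the lead-up to an error.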

The example below also shows what to do to avoid a major performance
hit during logging. The marked "expensive logging operation" pretends
to get config information by getting it from a database. One might
expect this to take time while the db connects etc and could produce
quite a long String of information. To save time when the logger's level is
set above CONFIG (so the message would be discarded anyway), the if statement
skips this costly step.

I know from experience we will definitely get the most value from this
in the IO parsers and ThingBuilders.

Any thoughts?

- Mark



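    package org.biojava;

    import java.util.logging.Level;
    import java.util.logging.Logger;

    public class MyClass {

        // One shared logger per class, named after the class
        private static final Logger logger =
                Logger.getLogger(MyClass.class.getName());

        public Object generateObject(String argument) {
            // FINER-level trace record for method entry
            logger.entering(MyClass.class.getName(), "generateObject", argument);

            // expensive logging operation: only build the message
            // if CONFIG records will actually be logged
            if (logger.isLoggable(Level.CONFIG)) {
                logger.config("DB config: " + getDBConfigInfo());
            }

            Object obj = null;
            try {
                // do some stuff
                logger.fine("doing stuff");
                obj = new Object();
            } catch (Exception ex) {
                // SEVERE is visible with the default setup; the exception
                // (and its stack trace) is attached to the log record
                logger.log(Level.SEVERE, "Failed to do stuff", ex);
            }

            // FINER-level trace record for method exit
            logger.exiting(MyClass.class.getName(), "generateObject", obj);
            return obj;
        }

        // Placeholder for the slow database lookup described above
        private String getDBConfigInfo() {
            return "...";
        }
    }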
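On Java 8 and later (well after this thread), the same guard can also be written with the Supplier overloads of Logger, such as config(Supplier<String>), which only evaluate the message when the level is loggable. A minimal sketch; LazyLoggingDemo and getDBConfigInfo() are hypothetical stand-ins, and the counter exists only to show when the expensive lookup actually runs.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LazyLoggingDemo {

    private static final Logger LOGGER = Logger.getLogger(LazyLoggingDemo.class.getName());

    /** Counts how often the expensive lookup actually runs. */
    static int lookups = 0;

    /** Hypothetical stand-in for the slow database query in the example. */
    static String getDBConfigInfo() {
        lookups++;
        return "url=jdbc:example, pool=5";
    }

    static void logConfig() {
        // The lambda is only evaluated if CONFIG is loggable for this logger,
        // so no explicit isLoggable() guard is needed.
        LOGGER.config(() -> "DB config: " + getDBConfigInfo());
    }

    public static void main(String[] args) {
        LOGGER.setLevel(Level.INFO);
        logConfig();                                   // CONFIG disabled: lookup skipped
        LOGGER.setLevel(Level.CONFIG);
        logConfig();                                   // CONFIG enabled: lookup runs
        System.out.println("lookups = " + lookups);    // prints "lookups = 1"
    }
}
```

The explicit isLoggable() guard remains the only option on the Java versions current at the time, and it is still useful when several statements share one check.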


From holland at eaglegenomics.com  Tue Oct 21 08:34:46 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 21 Oct 2008 09:34:46 +0100
Subject: [Biojava-l] File parsing in BJ3
In-Reply-To: <93b45ca50810202016j13a2a2a9y78a2992e543d6f5a@mail.gmail.com>
References: 
	<93b45ca50810202016j13a2a2a9y78a2992e543d6f5a@mail.gmail.com>
Message-ID: 

Spot on.

Annotation/interface... I think an Annotation is probably better, as you
suggest, but I'd have to look into that. I'm not sure how it works with
collections and generics. If it does turn out to be a better bet, I'll
change it over.

With the BioSQL dependencies, take a look at the pom.xml file inside the
biojava-dna module. It declares a dependency on biojava-core. If you want to
add dependencies to external JARs, take a look at biojava-biosql's pom.xml
to see how it depends on javax.persistence. (The easiest way to add these is
via an IDE such as NetBeans, which is what I'm using at the moment).

cheers,
Richard

2008/10/21 Mark Schreiber 

> So if I want to build a BioSQL loader from Genbank, would the
> classes (or their wrappers) in the BioSQL Entity package need to
> implement Thing? Would Maven have an issue with that, or would it just
> create a dependency on core? (You can tell I've never used Maven,
> right?)
>
> From a design point of view, should Thing be an interface or an
> Annotation? The reason I ask is that it doesn't define any methods, so
> it is more of a tag than an interface.
>
> Anyway, my understanding is that I would use a Genbank parser (or
> write one), write an EntityReceiver interface (probably more than one,
> given the number of entities in BioSQL), and implement an EntityBuilder
> (again possibly more than one) that implements EntityReceiver and
> builds Entity beans from the messages it receives. In this case I
> probably wouldn't provide a writer, as JPA would be writing the beans to
> the database. Is this how you imagine it?
>
> - Mark
>
>
> On Tue, Oct 21, 2008 at 1:52 AM, Richard Holland
>  wrote:
> > (From now on I will only be posting these development messages to
> > biojava-dev, which is the intended purpose of that list. Those of you who
> > wish to keep track of things but are currently only subscribed to
> > biojava-l should also subscribe to biojava-dev in order to keep up to
> > date.)
> >
> > As promised, I've committed a new package in the biojava-core module that
> > should help you understand how to do file parsing, conversion and writing
> > in the new BJ3 modules. Here's an example of how to use it to write a
> > Genbank parser (note no parsers actually exist yet!):
> >
> > 1. Design yourself a Genbank class which implements the interface Thing
> > and can fully represent all the data that might possibly occur inside a
> > Genbank file.
> >
> > 2. Write an interface called GenbankReceiver, which extends ThingReceiver
> > and defines all the methods you might need in order to construct a
> > Genbank object in an asynchronous fashion.
> >
> > 3. Write a GenbankBuilder class which implements GenbankReceiver and
> > ThingBuilder. Its job is to receive data via method calls, use that data
> > to construct a Genbank object, then provide that object on demand.
> >
> > 4. Write a GenbankWriter class which implements GenbankReceiver and
> > ThingWriter. Its job is similar to GenbankBuilder, but instead of
> > constructing new Genbank objects, it writes Genbank records to file that
> > reflect the data it receives.
> >
> > 5. Write a GenbankReader class which implements ThingReader. It can read
> > Genbank files and output the data to the methods of the ThingReceiver
> > provided to it, which in this case could be anything which implements the
> > interface GenbankReceiver.
> >
> > 6. Write a GenbankEmitter class which implements ThingEmitter. It takes a
> > Genbank object and will fire off data from it to the provided
> > ThingReceiver (a GenbankReceiver instance) as if the Genbank object was
> > being read from a file or some other source.
> >
> > That's it! OK, so it's a minimum of 6 classes instead of the original 1 or
> > 2, but the additional steps are necessary for flexibility in converting
> > between formats.
> >
> > Now to use it (you'll probably want a GenbankTools class to wrap these
> > steps up for user-friendliness, including various options for opening
> > files, etc.):
> >
> > 1. To read a file - instantiate ThingParser with your GenbankReader as the
> > reader, and GenbankBuilder as the receiver. Use the iterator methods on
> > ThingParser to get the objects out.
> >
> > 2. To write a file - instantiate ThingParser with a GenbankEmitter
> > wrapping your Genbank object, and a GenbankWriter as the receiver. Use the
> > parseAll() method on the ThingParser to dump the whole lot to your chosen
> > output.
> >
> > The clever bit comes when you want to convert between files. Imagine
> > you've done all the above for Genbank, and you've also done it for FASTA.
> > How to convert between them? What you need to do is this:
> >
> > 1. Implement all the classes for both Genbank and FASTA.
> >
> > 2. Write a GenbankFASTAConverter class that implements ThingConverter and
> > GenbankReceiver, and will internally convert the data received and pass
> > it on to the receiver provided, which will be a FASTAReceiver instance.
> >
> > 3. Write a FASTAGenbankConverter class that operates in exactly the
> > opposite way, implementing ThingConverter and FASTAReceiver.
> >
> > Then to convert you use ThingParser again:
> >
> > 1. From FASTA file to Genbank object: instantiate ThingParser with a
> > FASTAReader reader, a GenbankBuilder receiver, and add a
> > FASTAGenbankConverter instance to the converter chain. Use the iterator
> > to get your Genbank objects out of your FASTA file.
> >
> > 2. From FASTA file to Genbank file: same as option 1, but provide a
> > GenbankWriter instead and use parseAll() instead of the iterator methods.
> >
> > 3. From FASTA object to Genbank object: same as option 1, but provide a
> > FASTAEmitter wrapping your FASTA object as the reader instead.
> >
> > 4. From FASTA object to Genbank file: same as option 1, but swap both the
> > reader and the receiver as per options 2 and 3.
> >
> > 5/6/7/8. From Genbank * to FASTA * - same as 1,2,3,4 but swap all mentions
> > of FASTA and Genbank, and use GenbankFASTAConverter instead.
> >
> > One last and very important feature of this approach is that if you
> > discover that nobody has written the appropriate converter for your chosen
> > pair of formats A and C, but converters do exist to map A to some other
> > format B and that other format B on to C, then you can just put the two
> > converters A-B and B-C into the ThingParser chain and it'll work
> > perfectly.
> >
> > Enjoy!
> >
> > cheers,
> > Richard
> >
> > --
> > Richard Holland, BSc MBCS
> > Finance Director, Eagle Genomics Ltd
> > M: +44 7500 438846 | E: holland at eaglegenomics.com
> > http://www.eaglegenomics.com/
> > _______________________________________________
> > Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-l
> >
>
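To make the reader/converter/receiver flow above concrete, here is a small self-contained sketch. It is NOT the biojava-core ThingParser API: every interface and class name below is hypothetical, and a made-up two-field "Simple" format stands in for Genbank. It shows a FASTA reader firing callbacks, a converter that receives FASTA callbacks and re-emits "Simple" callbacks, and two interchangeable final receivers (a builder producing objects and a writer producing text).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the reader -> converter -> receiver chain; not the
// biojava-core Thing* interfaces. The "Simple" format is invented.
public class ThingPipelineSketch {

    /** Marker for an object model (cf. Thing). */
    interface Thing {}

    /** Base callback interface (cf. ThingReceiver). */
    interface Receiver {}

    /** Callbacks fired while reading FASTA (cf. a FASTAReceiver). */
    interface FastaReceiver extends Receiver {
        void startRecord(String description);
        void sequence(String residues);
        void endRecord();
    }

    /** Callbacks for the made-up two-field "Simple" format. */
    interface SimpleReceiver extends Receiver {
        void record(String id, String seq);
    }

    /** Object model for the Simple format. */
    static final class SimpleRecord implements Thing {
        final String id;
        final String seq;
        SimpleRecord(String id, String seq) { this.id = id; this.seq = seq; }
    }

    /** Builder: turns callbacks into SimpleRecord objects. */
    static final class SimpleBuilder implements SimpleReceiver {
        final List<SimpleRecord> built = new ArrayList<>();
        public void record(String id, String seq) { this.built.add(new SimpleRecord(id, seq)); }
    }

    /** Writer: same callbacks, but writes text instead of building objects. */
    static final class SimpleWriter implements SimpleReceiver {
        final StringBuilder out = new StringBuilder();
        public void record(String id, String seq) {
            this.out.append(id).append('\t').append(seq).append('\n');
        }
    }

    /** Converter: receives FASTA callbacks and forwards Simple callbacks. */
    static final class FastaSimpleConverter implements FastaReceiver {
        private final SimpleReceiver next;
        private final StringBuilder seq = new StringBuilder();
        private String id;

        FastaSimpleConverter(SimpleReceiver next) { this.next = next; }

        public void startRecord(String description) {
            // use the first word of the description line as the identifier
            this.id = description.split("\\s+", 2)[0];
            this.seq.setLength(0);
        }
        public void sequence(String residues) { this.seq.append(residues); }
        public void endRecord() { this.next.record(this.id, this.seq.toString()); }
    }

    /** Reader: parses FASTA text and fires callbacks at whatever receiver it is given. */
    static void readFasta(String text, FastaReceiver receiver) {
        boolean open = false;
        for (String line : text.split("\\r?\\n")) {   // tolerate Unix and Windows line ends
            if (line.startsWith(">")) {
                if (open) {
                    receiver.endRecord();
                }
                receiver.startRecord(line.substring(1).trim());
                open = true;
            } else if (open && !line.trim().isEmpty()) {
                receiver.sequence(line.trim());
            }
        }
        if (open) {
            receiver.endRecord();
        }
    }

    public static void main(String[] args) {
        String fasta = ">seq1 demo\nACGT\nTT\n>seq2\nGG\n";

        // FASTA text -> Simple objects
        SimpleBuilder builder = new SimpleBuilder();
        readFasta(fasta, new FastaSimpleConverter(builder));
        System.out.println(builder.built.size());   // prints 2

        // FASTA text -> Simple text: only the final receiver changes
        SimpleWriter writer = new SimpleWriter();
        readFasta(fasta, new FastaSimpleConverter(writer));
        System.out.print(writer.out);               // two tab-separated lines
    }
}
```

Because a converter is itself just a receiver that forwards to another receiver, an A-B converter can wrap a B-C converter in the same way, which is the chaining trick described above.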



-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From ayates at ebi.ac.uk  Tue Oct 21 08:40:48 2008
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 21 Oct 2008 09:40:48 +0100
Subject: [Biojava-l] Logging in BJ3
In-Reply-To: <93b45ca50810202241i767e2a56w2d9b7ede0f895431@mail.gmail.com>
References: <93b45ca50810202241i767e2a56w2d9b7ede0f895431@mail.gmail.com>
Message-ID: <48FD9590.5010704@ebi.ac.uk>

Hi,

A logging framework is a priority to start baking into the new API now.
As Mark has mentioned, logging frameworks are very flexible things, but
it's not until you start using them that you get a real feel for how
easy & extensible they are.

The JDK logger has some good integration with MessageFormat &
localization. I'm not completely taken with how it does the checks for
log levels (log.isDebugEnabled() just seems easier than
log.isLoggable(Level.FINEST)) & how you grab a logger (I'd prefer
something like Logger.getLogger(this.getClass())), but that's just
nit-picking.
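For comparison, the closest JDK idioms to these two wishes are sketched below. This is only an illustration: Logger.getLogger() takes a String, so the usual workaround is to pass the class name, and a hypothetical isDebugEnabled() helper can wrap an isLoggable(Level.FINE) check (FINEST would be the stricter choice).

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class JdkLoggingIdioms {

    // JDK logging takes a name, so pass the class name rather than the Class.
    private static final Logger LOGGER = Logger.getLogger(JdkLoggingIdioms.class.getName());

    /** Hypothetical helper giving an isDebugEnabled()-style check. */
    static boolean isDebugEnabled() {
        return LOGGER.isLoggable(Level.FINE);
    }

    public static void main(String[] args) {
        System.out.println(LOGGER.getName());   // prints "JdkLoggingIdioms"
        if (isDebugEnabled()) {
            LOGGER.fine("only built when FINE is enabled");
        }
    }
}
```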

I'll be happy to go with whatever people are most comfortable with & we
should attempt to use as many of the core Java classes as possible.

Andy



From ayates at ebi.ac.uk  Tue Oct 21 08:49:47 2008
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 21 Oct 2008 09:49:47 +0100
Subject: [Biojava-l] File parsing in BJ3
In-Reply-To: 
References: 	<93b45ca50810202016j13a2a2a9y78a2992e543d6f5a@mail.gmail.com>
	
Message-ID: <48FD97AB.70503@ebi.ac.uk>

Depends on what you want to program. If you want to have a collection of
objects which are Things & perform a common action on them then
annotations are not the way forward.

If you want to have some kind of meta-programming occurring & need a
class to be multiple things then annotations are right. There is
currently no way to enforce compile time dependencies on annotations &
my thinking is that this is right. Annotations should be meta data or
provide a way to alter a class in a non-invasive way (think Web Service
annotations creating WS Servers & Clients without any alteration of the
class).
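The practical difference can be seen in a small sketch: a marker interface is a type the compiler checks, so it works directly with collections and generics, whereas a marker annotation is metadata that has to be discovered by reflection at run time (and needs RUNTIME retention for that). Thing and ThingAnnotation below are hypothetical stand-ins, not the biojava-core types.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

public class MarkerDemo {

    /** Marker interface: usable as a type in collections and generics. */
    interface Thing {}

    /** Marker annotation: metadata only, visible via reflection at run time. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface ThingAnnotation {}

    static final class Genbank implements Thing {}

    @ThingAnnotation
    static final class Fasta {}

    /** Run-time check; the compiler cannot enforce the annotation. */
    static boolean isAnnotatedThing(Object o) {
        return o.getClass().isAnnotationPresent(ThingAnnotation.class);
    }

    public static void main(String[] args) {
        // The compiler enforces the interface: List<Thing> rejects non-Things.
        List<Thing> things = new ArrayList<>();
        things.add(new Genbank());
        // things.add(new Fasta());   // would not compile

        System.out.println(isAnnotatedThing(new Fasta()));     // prints true
        System.out.println(isAnnotatedThing(new Genbank()));   // prints false
    }
}
```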

Andy



From holland at eaglegenomics.com  Tue Oct 21 09:06:41 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 21 Oct 2008 10:06:41 +0100
Subject: [Biojava-l] [Biojava-dev] BioJava 3 Begins - Volunteers please!
In-Reply-To: <59a41c430810202017n226327cahefe0ed7e5f6a8df2@mail.gmail.com>
References: 
	<59a41c430810202017n226327cahefe0ed7e5f6a8df2@mail.gmail.com>
Message-ID: 

>
>
> License: Since it seems we will end up copying code from biojava 1.6
> to biojava 3.0, we need to keep the license the same (LGPL 2.1). I.e.
> people should still use the same biojava license headers when
> committing new files and all code will be considered to be LGPL, if no
> header is present. Do NOT commit code under other licenses.
>
> Installation: We need some installation instructions on the wiki site,
> e.g. how to get the maven setup running.  What are the code
> conventions for the new version?


Not sure where best to put it in the Wiki, but I agree it needs to go there
somewhere.

Installation is a one-liner from within the top level of the project:

   mvn install

This compiles and installs the JARs into your local Maven repository, and
also downloads and installs any external dependencies. Then you can add the
installed modules as dependencies in your own Maven projects.

If you need to write a launcher script for your project, or you want to use
the JAR files outside Maven, you can use this command to generate the
CLASSPATH for use outside Maven. This only includes external dependencies -
you'll also need to add to it the individual JAR files from inside the
various target/ folders that Maven built for you:

  mvn dependency:build-classpath

Code conventions are simple:

1. I'm not fussed about the specific formatter people use in each module, as
long as the code is all formatted using some kind of consistent method. I
personally just use the default settings from Format code in NetBeans.

2. Use 'this' wherever possible, and for static references, use the
classname prefix (e.g. MyClass.staticField). I hate having to try and work
out in my head which references are going where, and which are static and
which are not!

3. Comment every single method, even if it's private. This helps others
understand the flow of your code. Also comment liberally inside methods if
they are longer than just a few lines (i.e. if you can't fit the entire
method within the code panel in NetBeans, it's going to need internal
comments).

4. When writing getters/setters, follow the Java beans conventions so that
automated frameworks like Spring can easily pick it up and work with it.

5. Please write tests for your code using JUnit conventions, inside the
test/ folder of each module. I know I haven't done this myself yet, but I'm
going to!
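A short, hypothetical class illustrating conventions 2 to 4: explicit "this." and "ClassName." prefixes, a comment on every method, and JavaBeans-style accessors (including the "is" prefix for boolean properties). SequenceRecord and its fields are invented for the example.

```java
/**
 * Hypothetical record class illustrating the conventions above.
 */
public class SequenceRecord {

    /** Shared counter; always referenced as SequenceRecord.instanceCount. */
    private static int instanceCount = 0;

    private String accession;
    private boolean circular;

    /** Creates an empty record and bumps the shared counter. */
    public SequenceRecord() {
        SequenceRecord.instanceCount++;
    }

    /** @return the accession number (JavaBeans getter). */
    public String getAccession() {
        return this.accession;
    }

    /** @param accession the accession number (JavaBeans setter). */
    public void setAccession(String accession) {
        this.accession = accession;
    }

    /** @return whether the molecule is circular ("is" prefix for booleans). */
    public boolean isCircular() {
        return this.circular;
    }

    /** @param circular whether the molecule is circular. */
    public void setCircular(boolean circular) {
        this.circular = circular;
    }

    /** @return how many records have been created so far. */
    public static int getInstanceCount() {
        return SequenceRecord.instanceCount;
    }
}
```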


>
>
> Blast: the Blast parsing modules are among the most frequently used
> ones in biojava 1.6. To make people use biojava v3 it will be crucial
> to have a port of them to the new version. Does anybody want to take
> care of that?


I'll second that. Blast is vital. We'd really appreciate a volunteer,
please!


>
> Automated builds: is it interesting to have automated builds set up
> for the new version at this stage, or should we wait until a more
> mature stage? I could easily add another auto-build similar to the one
> for biojava 1.6 at http://www.spice-3d.org/cruise/


You could, although I don't think they'd be much use yet. But why not
start early, so that we won't forget to do it later.


Richard


>
> Andreas
>
> On Sun, Oct 19, 2008 at 5:18 PM, Richard Holland
>  wrote:
> > Hi all,
> >
> > I've just committed some new code to the biojava3 branch of the
> > biojava-live subversion repository. It's the foundations of a brand new
> > alphabet+symbol set of classes, and an example of how to use them to
> > represent DNA. You'll notice that the new code is very lightweight and
> > allows for a lot more flexibility than the old code - for instance, the
> > concept of Alphabet has changed radically. It also makes much more
> > extensive use of the Collections API.
> >
> > I haven't got any test cases or usage examples yet, but give me a shout
> > if you don't understand the code and I'll explain how it works. (Hint:
> > SymbolFormat is there to convert Strings into SymbolList objects, and
> > vice versa.)
> >
> > So, now we want some volunteers! We're starting from scratch here, so
> > there's a lot of work to do. The whole of BioJava needs 'translating' into
> > BJ3, whether that means copying and pasting existing classes and modifying
> > them to suit the new style, or writing completely new ones to provide
> > equivalent functionality.
> >
> > I'll post an example of how to do file parsing soon, probably starting
> > with FASTA. In the meantime, a good place to start would be for people to
> > design object models to represent their favourite data types (e.g.
> > Genbank, or microarray data). Utility classes to manipulate those objects
> > would be great too.
> >
> > The object models need to be normalised as much as possible - e.g. if
> > your data has a lot of comments, and the order of those comments is
> > important, then give your object model a collection of comment objects.
> > The object model for each data type should be completely independent and
> > use basic data types wherever possible (e.g. store sequences as strings,
> > don't attempt to parse them into anything fancy like SymbolLists). The
> > closer the object model is to the original data format, the better.
> > There are going to be clever tricks when it comes to converting data
> > between different object models (e.g. Genbank to INSDSeq), which I will
> > explain later when I put the file parsing examples up.
> >
> > You'll notice how the biojava3 branch uses Maven instead of Ant. This is
> > because we want to make it as modular as possible, so if you want to
> > write microarray stuff, create a new microarray sub-project (as per the
> > dna example that's already there). This way if someone only wants the
> > microarray bit of BJ3, they only need to install the appropriate JAR file
> > and can ignore the rest. (The 'core' module is for stuff that is so
> > generic it could be used anywhere, or is used in every single other
> > module.)
> >
> > If coding isn't your cup of tea, then we would very much welcome testers
> > (particularly those who enjoy writing test cases!), documenters
> > (particularly code commenters), translators (for internationalisation of
> > the code), and of course all those who wish to contribute ideas and
> > suggestions no matter how off-the-wall they might be. In particular, if
> > you'd like to take charge of an area of the development process, e.g.
> > Documentation Chief or Protein Champion, then that would be much
> > appreciated.
> >
> > I'm very much looking forward to working with everyone on this. Good
> > luck, and happy coding!
> >
> > cheers,
> > Richard
> >
> > PS. Please don't forget to attach the appropriate licence to your code.
> > You can copy-and-paste it from the existing classes I just committed this
> > evening.
> >
> > PPS. For those who are worried about backwards compatibility - this was
> > discussed on the lists a while back and it was made clear that BJ3 is a
> > clean break. However, the existing code will continue to be maintained
> > and bugfixed for a couple of years, so you don't have to upgrade if you
> > don't want to - it just won't have any new features developed for it.
> > This is largely because it'll probably take just that long to write all
> > the new BJ3 code. When we do decide to desupport the existing BJ code,
> > plenty of notice will be given (i.e. years as opposed to months).
> >
> >
> > --
> > Richard Holland, BSc MBCS
> > Finance Director, Eagle Genomics Ltd
> > M: +44 7500 438846 | E: holland at eaglegenomics.com
> > http://www.eaglegenomics.com/
> > _______________________________________________
> > biojava-dev mailing list
> > biojava-dev at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-dev
> >
>
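As an illustration of the "normalised" guideline above, here is a minimal sketch of an object model that keeps the sequence as a plain String and stores comments as an ordered List of comment objects. GenbankSketch, GenbankRecord and Comment are hypothetical names, and the model is nowhere near a complete Genbank representation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class GenbankSketch {

    /** One COMMENT entry, kept as its own object so order can be preserved. */
    static final class Comment {
        private final String text;
        Comment(String text) { this.text = text; }
        String getText() { return this.text; }
    }

    /** Minimal, normalised record: basic types plus an ordered list of comments. */
    static final class GenbankRecord {
        private String locus;
        private String sequence;   // stored as a plain String, not parsed further
        private final List<Comment> comments = new ArrayList<>();

        String getLocus() { return this.locus; }
        void setLocus(String locus) { this.locus = locus; }
        String getSequence() { return this.sequence; }
        void setSequence(String sequence) { this.sequence = sequence; }

        /** Appends a comment; the List keeps insertion order. */
        void addComment(Comment c) { this.comments.add(c); }

        /** Read-only view so callers cannot reorder or drop comments. */
        List<Comment> getComments() { return Collections.unmodifiableList(this.comments); }
    }

    public static void main(String[] args) {
        GenbankRecord r = new GenbankRecord();
        r.setLocus("HYPOTHETICAL1");
        r.setSequence("ATGCGT");
        r.addComment(new Comment("first comment"));
        r.addComment(new Comment("second comment"));
        for (Comment c : r.getComments()) {
            System.out.println(c.getText());   // printed in insertion order
        }
    }
}
```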



-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From benn at mpi-cbg.de  Tue Oct 21 09:00:44 2008
From: benn at mpi-cbg.de (Neil Benn)
Date: Tue, 21 Oct 2008 11:00:44 +0200
Subject: [Biojava-l] Logging in BJ3
In-Reply-To: <93b45ca50810202241i767e2a56w2d9b7ede0f895431@mail.gmail.com>
References: <93b45ca50810202241i767e2a56w2d9b7ede0f895431@mail.gmail.com>
Message-ID: <48FD9A3C.20904@mpi-cbg.de>

Hello,

I'm not sure if I should comment, as I have no time to contribute LOC,
but I thought I may as well ;).

Mark Schreiber wrote:
> Hi -
>
> I would like to strongly advocate the liberal and extensive use of
> Logging in BioJava3.  The lack of this plagued us (me at least) during
> bug fixes in previous versions of BioJava.  The default Java logging
> API is very flexible and easily meets our needs. It's also not too
> much effort for developers to put in place (you know you use
> System.println() all over the place anyway).
>   
Hmm, that is true, but for total completeness you can use
commons-logging, which is very easy to use and much more flexible, as it
can encapsulate other logging mechanisms (including the JDK 1.4 logging
framework). To use it you simply declare a new logger as follows:

private static final Log logger = LogFactory.getLog(MyClass.class);

The rest of it works pretty much the same as below. If you dovetail
commons-logging with log4j then you'll cover the most common case of
logging used in other frameworks - the config files to set up log4j (XML
and properties files) are well documented all over the web.
> 
>
> I know from experience we will definitely get the most value from this
> in the IO parsers and ThingBuilders.
>
> Any thoughts?
>   
+1
> - Mark
>
>
>
>     private Logger logger = Logger.getLogger("org.biojava.MyClass");
>
>     public Object generateObject(String argument){
>          logger.entering(""+getClass(), "generateObject", argument);
>
>          //expensive logging operation
>          if (logger.isLoggable( Level.CONFIG )) {
>             logger.config("DB config: "+ getDBConfigInfo());
>          }
>
>          Object obj = null;
>          try{
>
>             //do some stuff
>             logger.fine("doing stuff");
>             obj = new Object();
>
>          }catch(Exception ex){
>              logger.severe("Failed to do stuff");
>              logger.throwing(""+getClass(), "generateObject", ex);
>          }
>
>          logger.exiting(""+getClass(), "generateObject", obj);
>          return obj;
>     }
>   



From markjschreiber at gmail.com  Tue Oct 21 09:18:41 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 21 Oct 2008 17:18:41 +0800
Subject: [Biojava-l] Logging in BJ3
In-Reply-To: 
References: <93b45ca50810202241i767e2a56w2d9b7ede0f895431@mail.gmail.com>
	
Message-ID: <93b45ca50810210218n1e2ac06bma211f1541b8be3bb@mail.gmail.com>

For the Entity classes my original thinking was to implement an EJB3
interceptor which logs all method calls. This would be preferable to
putting logging statements in all the classes but I don't know if such
an interceptor will work outside of a container. Does anyone know if
JPA can use an interceptor outside of a container?

Logging for the actual persistence would be via the persistence
provider (Hibernate, Toplink etc).
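Outside a container, one possible stand-in for an interceptor (not an EJB3 interceptor, and not something JPA itself provides) is a JDK dynamic proxy, java.lang.reflect.Proxy, that logs every call at FINER before delegating. A caveat: JDK proxies only work through interfaces, whereas JPA entities are normally concrete classes, so this sketch assumes the entities are accessed via an interface. Taxon, TaxonImpl and withLogging() are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.logging.Logger;

public class LoggingProxyDemo {

    private static final Logger LOGGER = Logger.getLogger(LoggingProxyDemo.class.getName());

    /** Hypothetical entity interface; JDK proxies require an interface. */
    public interface Taxon {
        String getName();
        void setName(String name);
    }

    static final class TaxonImpl implements Taxon {
        private String name;
        public String getName() { return this.name; }
        public void setName(String name) { this.name = name; }
    }

    /** Wraps target so that every call made through iface is logged at FINER. */
    @SuppressWarnings("unchecked")
    static <T> T withLogging(Class<T> iface, T target) {
        String source = target.getClass().getName();
        InvocationHandler handler = (proxy, method, args) -> {
            LOGGER.entering(source, method.getName(), args);
            try {
                Object result = method.invoke(target, args);
                LOGGER.exiting(source, method.getName(), result);
                return result;
            } catch (InvocationTargetException e) {
                // unwrap so callers see the original exception
                LOGGER.throwing(source, method.getName(), e.getCause());
                throw e.getCause();
            }
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }

    public static void main(String[] args) {
        Taxon t = withLogging(Taxon.class, new TaxonImpl());
        t.setName("Homo sapiens");
        System.out.println(t.getName());   // prints "Homo sapiens"
    }
}
```

With the default configuration the FINER entry/exit records are not shown; both the logger and a handler would need to be set to FINER to see them.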

- Mark

On Tue, Oct 21, 2008 at 5:08 PM, Richard Holland
 wrote:
> Excellent idea. I'll integrate it into ThingParser as an example
>
> 2008/10/21 Mark Schreiber 
>>
>> Hi -
>>
>> I would like to strongly advocate the liberal and extensive use of
>> Logging in BioJava3.  The lack of this plagued us (me at least) during
>> bug fixes in previous versions of BioJava.  The default Java logging
>> API is very flexible and easily meets our needs. It's also not too
>> much effort for developers to put in place (you know you use
>> System.println() all over the place anyway).
>>
>> The following is an example snippet using logging that would certainly
>> help debugging.  With the standard logging setup only the severe
>> statement would appear on the terminal. We could also provide config
>> files that show lower levels of logging so that people can easily
>> generate detailed logs to accompany bug reports.  If we want to be
>> really tricky we could even use a MemoryLogger that has a rotating
>> buffer of log statements that could spit out with a stack trace so you
>> could just submit the stack trace and the activity log all in one go
>> and we can get an idea of what was going on at the time.
>>
>> The example below also shows what to do to avoid a major performance
>> hit during logging. The marked "expensive logging operation" pretends
>> to get config information by getting it from a database. One might
>> expect this to take time while the db connects etc and could produce
>> quite a long String of information. To save time when logging is not
>> set to the CONFIG level the if statement is able to skip this costly
>> step.
>>
>> I know from experience we will definitely get the most value from this
>> in the IO parsers and ThingBuilders.
>>
>> Any thoughts?
>>
>> - Mark
>>
>>
>>
>>    import java.util.logging.Level;
>>    import java.util.logging.Logger;
>>
>>    public class MyClass {
>>        private static final Logger logger = Logger.getLogger("org.biojava.MyClass");
>>
>>        public Object generateObject(String argument){
>>            logger.entering(getClass().getName(), "generateObject", argument);
>>
>>            //expensive logging operation: only built when CONFIG is enabled
>>            if (logger.isLoggable(Level.CONFIG)) {
>>                logger.config("DB config: " + getDBConfigInfo());
>>            }
>>
>>            Object obj = null;
>>            try{
>>                //do some stuff
>>                logger.fine("doing stuff");
>>                obj = new Object();
>>            }catch(Exception ex){
>>                //one SEVERE record carrying both the message and the stack trace
>>                logger.log(Level.SEVERE, "Failed to do stuff", ex);
>>            }
>>
>>            logger.exiting(getClass().getName(), "generateObject", obj);
>>            return obj;
>>        }
>>
>>        //hypothetical stand-in for the pretend database lookup
>>        private String getDBConfigInfo(){
>>            return "db-config";
>>        }
>>    }
>> _______________________________________________
>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>
>
>
> --
> Richard Holland, BSc MBCS
> Finance Director, Eagle Genomics Ltd
> M: +44 7500 438846 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
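The rotating-buffer idea in the proposal quoted above maps fairly directly onto java.util.logging.MemoryHandler, which keeps records in a fixed-size ring and pushes them all to a target handler as soon as a record at or above its push level arrives. A minimal sketch, assuming SEVERE as the trigger; the class name MemoryLogSketch and the attach helper are hypothetical, not BioJava API:

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.MemoryHandler;

public class MemoryLogSketch {

    /**
     * Attaches a MemoryHandler that keeps the last `size` records of any level
     * and pushes them to `target` when a SEVERE record is published.
     */
    public static MemoryHandler attach(Logger logger, Handler target, int size) {
        target.setLevel(Level.ALL);          // let the target accept every pushed record
        MemoryHandler memory = new MemoryHandler(target, size, Level.SEVERE);
        memory.setLevel(Level.ALL);          // buffer records of every level
        logger.setLevel(Level.ALL);          // make the logger create FINE/FINER records too
        logger.addHandler(memory);
        return memory;
    }
}
```

For example, attach(Logger.getLogger("org.biojava"), new ConsoleHandler(), 500) would keep the last 500 records and print them to standard error together with the first SEVERE record. Setting the logger to Level.ALL means low-level records are always created, which has some cost even when nothing is printed.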


From ayates at ebi.ac.uk  Tue Oct 21 09:21:26 2008
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 21 Oct 2008 10:21:26 +0100
Subject: [Biojava-l] Logging in BJ3
In-Reply-To: <48FD9A3C.20904@mpi-cbg.de>
References: <93b45ca50810202241i767e2a56w2d9b7ede0f895431@mail.gmail.com>
	<48FD9A3C.20904@mpi-cbg.de>
Message-ID: <48FD9F16.2000405@ebi.ac.uk>

Hi Neil,

That's okay, the more people take an interest in this the better it will
be. We discussed this quite a while ago at a BioJava meeting & the
general consensus was that bridges can be written manually between the
logging frameworks as and when they are required. Also, using the JDK
logger reduces our external dependencies.
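A hand-written bridge can be quite small on the JDK side: a custom java.util.logging.Handler receives every LogRecord and re-issues it through the other framework. A sketch, where ExternalLog is a hypothetical stand-in for that other framework's API and the level mapping is an arbitrary choice:

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;

public class JulBridgeSketch {

    /** Hypothetical stand-in for some other framework's logging API. */
    public interface ExternalLog {
        void error(String msg, Throwable t);
        void warn(String msg, Throwable t);
        void info(String msg, Throwable t);
        void debug(String msg, Throwable t);
    }

    /** A java.util.logging Handler that forwards each record to an ExternalLog. */
    public static class BridgeHandler extends Handler {
        private final ExternalLog target;

        public BridgeHandler(ExternalLog target) {
            this.target = target;
        }

        @Override
        public void publish(LogRecord record) {
            if (!isLoggable(record)) {
                return; // respects this handler's own level and filter
            }
            // getMessage() is the raw message: {0}-style parameters are not substituted
            String msg = record.getMessage();
            Throwable t = record.getThrown();
            int level = record.getLevel().intValue();
            if (level >= Level.SEVERE.intValue()) {
                target.error(msg, t);
            } else if (level >= Level.WARNING.intValue()) {
                target.warn(msg, t);
            } else if (level >= Level.INFO.intValue()) {
                target.info(msg, t);
            } else {
                target.debug(msg, t); // CONFIG, FINE, FINER and FINEST
            }
        }

        @Override
        public void flush() { }

        @Override
        public void close() { }
    }
}
```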

However, I do like the logging facades & am in favour of them, especially
SLF4J, which does the same job as commons-logging but relies on SLF4J
adaptors being present rather than on the raw logging framework, as
commons-logging does. It also has bindings to many more logging frameworks,
including simple-log (https://simple-log.dev.java.net/) & logback
(http://logback.qos.ch/).

There are just so many options here that it's hard to gauge the best
thing to do. Do we buy into a single framework & use all of its features
(the JDK logger has nice support for logging method entry & exit, along
with locale ResourceBundles) or go for a common denominator?

It's not an easy decision to make...

Andy

Neil Benn wrote:
> Hello,
> 
>          I'm not sure if I should comment as I have no time to
> contribute LOC but I thought I may as well ;).
> 
> Mark Schreiber wrote:
>> Hi -
>>
>> I would like to strongly advocate the liberal and extensive use of
>> Logging in BioJava3.  The lack of this plagued us (me at least) during
>> bug fixes in previous versions of BioJava.  The default Java logging
>> API is very flexible and easily meets our needs. It's also not too
>> much effort for developers to put in place (you know you use
>> System.out.println() all over the place anyway).
>>   
> Hmm, that is true, but for total completeness you could use
> commons-logging, which is very easy to use and more flexible because it
> can wrap other logging mechanisms (including the JDK 1.4 logging
> framework). To use it you simply declare a new logger as follows:
> 
> private static final Log logger = LogFactory.getLog(MyClass.class);
> 
> The rest of it works pretty much the same as below. If you dovetail
> commons-logging with log4j then you'll cover the most common logging
> setup used in other frameworks, and the config files to set up log4j (XML
> and properties files) are well documented all over the web.
>> 
>>
>> I know from experience we will definitely get the most value from this
>> in the IO parsers and ThingBuilders.
>>
>> Any thoughts?
>>   
> +1
>> - Mark
>>
>>
>>
>>     import java.util.logging.Level;
>>     import java.util.logging.Logger;
>>
>>     public class MyClass {
>>         private static final Logger logger = Logger.getLogger("org.biojava.MyClass");
>>
>>         public Object generateObject(String argument){
>>             logger.entering(getClass().getName(), "generateObject", argument);
>>
>>             //expensive logging operation: only built when CONFIG is enabled
>>             if (logger.isLoggable(Level.CONFIG)) {
>>                 logger.config("DB config: " + getDBConfigInfo());
>>             }
>>
>>             Object obj = null;
>>             try{
>>                 //do some stuff
>>                 logger.fine("doing stuff");
>>                 obj = new Object();
>>             }catch(Exception ex){
>>                 //one SEVERE record carrying both the message and the stack trace
>>                 logger.log(Level.SEVERE, "Failed to do stuff", ex);
>>             }
>>
>>             logger.exiting(getClass().getName(), "generateObject", obj);
>>             return obj;
>>         }
>>
>>         //hypothetical stand-in for the pretend database lookup
>>         private String getDBConfigInfo(){
>>             return "db-config";
>>         }
>>     }
>>   
> 


From ayates at ebi.ac.uk  Tue Oct 21 09:23:35 2008
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 21 Oct 2008 10:23:35 +0100
Subject: [Biojava-l] Logging in BJ3
In-Reply-To: <93b45ca50810210218n1e2ac06bma211f1541b8be3bb@mail.gmail.com>
References: <93b45ca50810202241i767e2a56w2d9b7ede0f895431@mail.gmail.com>	
	<93b45ca50810210218n1e2ac06bma211f1541b8be3bb@mail.gmail.com>
Message-ID: <48FD9F97.8010705@ebi.ac.uk>

As far as I am aware JPA has no concept of EJB3 interceptors. If you
want that kind of thing I think you would have to start using AOP or
proxy objects.
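A sketch of the proxy route using only the JDK: java.lang.reflect.Proxy wraps an object behind an interface and routes every call through an InvocationHandler, which can log entry, exit and exceptions. Two caveats: JDK dynamic proxies only work through interfaces, so concrete entity classes would need a bytecode-generating library or an AOP tool instead, and the Taxon types below are hypothetical:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.logging.Logger;

public class LoggingProxySketch {

    /** Hypothetical entity interface; the proxy can only intercept interface methods. */
    public interface Taxon {
        String getName();
        void setName(String name);
    }

    public static class SimpleTaxon implements Taxon {
        private String name;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    /** Wraps target so every interface call is logged with entering/exiting/throwing. */
    public static <T> T withLogging(final T target, Class<T> iface, final Logger logger) {
        InvocationHandler handler = new InvocationHandler() {
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                String cls = target.getClass().getName();
                logger.entering(cls, method.getName(), args); // args is null for no-arg methods
                try {
                    Object result = method.invoke(target, args);
                    logger.exiting(cls, method.getName(), result);
                    return result;
                } catch (InvocationTargetException e) {
                    // unwrap so callers see the original exception
                    logger.throwing(cls, method.getName(), e.getCause());
                    throw e.getCause();
                }
            }
        };
        return iface.cast(Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] { iface }, handler));
    }
}
```

The entering/exiting records are logged at FINER, so they only appear when the logger and a handler are configured for that level.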

Andy

Mark Schreiber wrote:
> For the Entity classes my original thinking was to implement an EJB3
> interceptor which logs all method calls. This would be preferable to
> putting logging statements in all the classes but I don't know if such
> an interceptor will work outside of a container. Does anyone know if
> JPA can use an interceptor outside of a container?
> 
> Logging for the actual persistence would be via the persistence
> provider (Hibernate, Toplink etc).
> 
> - Mark
> 
> On Tue, Oct 21, 2008 at 5:08 PM, Richard Holland
>  wrote:
>> Excellent idea. I'll integrate it into ThingParser as an example
>>
>> 2008/10/21 Mark Schreiber 
>>> Hi -
>>>
>>> I would like to strongly advocate the liberal and extensive use of
>>> Logging in BioJava3.  The lack of this plagued us (me at least) during
>>> bug fixes in previous versions of BioJava.  The default Java logging
>>> API is very flexible and easily meets our needs. It's also not too
>>> much effort for developers to put in place (you know you use
>>> System.out.println() all over the place anyway).
>>>
>>> The following is an example snippet using logging that would certainly
>>> help debugging.  With the standard logging setup only the severe
>>> statement would appear on the terminal. We could also provide config
>>> files that show lower levels of logging so that people can easily
>>> generate detailed logs to accompany bug reports.  If we want to be
>>> really tricky we could even use a MemoryHandler that keeps a rotating
>>> buffer of log records and dumps it along with a stack trace, so you
>>> could submit the stack trace and the activity log in one go
>>> and we can get an idea of what was going on at the time.
>>>
>>> The example below also shows what to do to avoid a major performance
>>> hit during logging. The marked "expensive logging operation" pretends
>>> to get config information by getting it from a database. One might
>>> expect this to take time while the db connects etc and could produce
>>> quite a long String of information. To save time when logging is not
>>> set to the CONFIG level the if statement is able to skip this costly
>>> step.
>>>
>>> I know from experience we will definitely get the most value from this
>>> in the IO parsers and ThingBuilders.
>>>
>>> Any thoughts?
>>>
>>> - Mark
>>>
>>>
>>>
>>>    import java.util.logging.Level;
>>>    import java.util.logging.Logger;
>>>
>>>    public class MyClass {
>>>        private static final Logger logger = Logger.getLogger("org.biojava.MyClass");
>>>
>>>        public Object generateObject(String argument){
>>>            logger.entering(getClass().getName(), "generateObject", argument);
>>>
>>>            //expensive logging operation: only built when CONFIG is enabled
>>>            if (logger.isLoggable(Level.CONFIG)) {
>>>                logger.config("DB config: " + getDBConfigInfo());
>>>            }
>>>
>>>            Object obj = null;
>>>            try{
>>>                //do some stuff
>>>                logger.fine("doing stuff");
>>>                obj = new Object();
>>>            }catch(Exception ex){
>>>                //one SEVERE record carrying both the message and the stack trace
>>>                logger.log(Level.SEVERE, "Failed to do stuff", ex);
>>>            }
>>>
>>>            logger.exiting(getClass().getName(), "generateObject", obj);
>>>            return obj;
>>>        }
>>>
>>>        //hypothetical stand-in for the pretend database lookup
>>>        private String getDBConfigInfo(){
>>>            return "db-config";
>>>        }
>>>    }
>>
>>
>> --
>> Richard Holland, BSc MBCS
>> Finance Director, Eagle Genomics Ltd
>> M: +44 7500 438846 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>>


From markjschreiber at gmail.com  Tue Oct 21 09:26:41 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 21 Oct 2008 17:26:41 +0800
Subject: [Biojava-l] [Biojava-dev] BioJava 3 Begins - Volunteers please!
In-Reply-To: 
References: 
	<59a41c430810202017n226327cahefe0ed7e5f6a8df2@mail.gmail.com>
	
Message-ID: <93b45ca50810210226t79cfbcbfhcadaedcfe8735676@mail.gmail.com>

>> Blast: the Blast parsing modules are among the most frequently used
>> ones in biojava 1.6. To make people use biojava v3 it will be crucial
>> to have a port of them to the new version. Does anybody want to take
>> care of that?
>
>
> I'll second that. Blast is vital. We'd really appreciate a volunteer,
> please!
>

BlastXML output would certainly be the easiest place to start. I also
think with the new Thing/ThingBuilder framework it will be possible
to develop all manner of parsers for the vagaries of Blast text output
that come with each new release of Blast. Possible, but maybe not a
good idea: I don't think that output was ever supposed to be machine
readable. The tabular output (-m 8, I think) would be a better
option.
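For comparison, the tabular output needs very little parsing. A sketch for one line of legacy blastall -m 8 output, assuming the usual 12 tab-separated columns (query id, subject id, % identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score); TabularHit is a hypothetical class, not part of BioJava:

```java
public class TabularHit {
    public final String queryId;
    public final String subjectId;
    public final double percentIdentity;
    public final int alignmentLength;
    public final int mismatches;
    public final int gapOpens;
    public final int queryStart;
    public final int queryEnd;
    public final int subjectStart;
    public final int subjectEnd;
    public final double evalue;
    public final double bitScore;

    private TabularHit(String[] f) {
        queryId = f[0];
        subjectId = f[1];
        percentIdentity = Double.parseDouble(f[2]);
        alignmentLength = Integer.parseInt(f[3]);
        mismatches = Integer.parseInt(f[4]);
        gapOpens = Integer.parseInt(f[5]);
        queryStart = Integer.parseInt(f[6]);
        queryEnd = Integer.parseInt(f[7]);
        subjectStart = Integer.parseInt(f[8]);
        subjectEnd = Integer.parseInt(f[9]);
        evalue = Double.parseDouble(f[10]);    // handles forms such as 1e-100
        bitScore = Double.parseDouble(f[11]);
    }

    /** Parses one line of -m 8 output; rejects lines without exactly 12 columns. */
    public static TabularHit parse(String line) {
        String[] f = line.split("\t");
        if (f.length != 12) {
            throw new IllegalArgumentException("expected 12 columns but found " + f.length + ": " + line);
        }
        for (int i = 0; i < f.length; i++) {
            // in case a column is space-padded; also drops a trailing \r from Windows line endings
            f[i] = f[i].trim();
        }
        return new TabularHit(f);
    }
}
```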

Given the DTD it should be possible to do a quick JAXB binding. How
would that work in the Thing/ ThingBuilder paradigm?

- Mark


From markjschreiber at gmail.com  Tue Oct 21 10:35:14 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 21 Oct 2008 18:35:14 +0800
Subject: [Biojava-l] File parsing in BJ3
In-Reply-To: <48FD97AB.70503@ebi.ac.uk>
References: 
	<93b45ca50810202016j13a2a2a9y78a2992e543d6f5a@mail.gmail.com>
	
	<48FD97AB.70503@ebi.ac.uk>
Message-ID: <93b45ca50810210335j5ef4a206y545e5a1869cedc03@mail.gmail.com>

Is there any need for Thing at all? Can't a builder be typed to produce
something that extends Object?

If Thing provides no behaviour contract or meta-information then why
does it exist?
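To make the question concrete: a type parameter alone already gives builders compile-time typing, whether or not the result type is bounded by a marker interface. The names below are hypothetical and only mirror the shape of the BJ3 classes under discussion:

```java
public class BuilderBoundSketch {

    /** Marker interface in the style of Thing: it declares no methods. */
    public interface Thing { }

    /** Variant 1: results must implement the marker. */
    public interface BoundedBuilder<T extends Thing> {
        T getResult();
    }

    /** Variant 2: any result type is allowed. */
    public interface Builder<T> {
        T getResult();
    }

    /** Only possible with variant 2, because String cannot implement Thing. */
    public static class UpperCaseBuilder implements Builder<String> {
        private final StringBuilder sb = new StringBuilder();

        public UpperCaseBuilder append(String s) {
            sb.append(s.toUpperCase());
            return this;
        }

        public String getResult() {
            return sb.toString();
        }
    }
}
```

new UpperCaseBuilder().append("acgt").getResult() is typed as String without a cast; the T extends Thing bound only restricts which types may be produced.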

- Mark

On Tue, Oct 21, 2008 at 4:49 PM, Andy Yates  wrote:
> Depends on what you want to program. If you want to have a collection of
> objects which are Things & perform a common action on them then
> annotations are not the way forward.
>
> If you want to have some kind of meta-programming occurring & need a
> class to be multiple things then annotations are right. There is
> currently no way to enforce compile time dependencies on annotations &
> my thinking is that this is right. Annotations should be meta data or
> provide a way to alter a class in a non-invasive way (think Web Service
> annotations creating WS Servers & Clients without any alteration of the
> class).
>
> Andy
>
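The compile-time versus runtime distinction drawn in the reply above can be seen in a few lines. A sketch with hypothetical names (Thing as a marker interface, ThingType as a marker annotation):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

public class MarkerSketch {

    /** Marker interface: the compiler enforces it. */
    public interface Thing { }

    /** Marker annotation: only visible at runtime, and only with RUNTIME retention. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface ThingType { }

    public static class GenbankRecord implements Thing { }

    @ThingType
    public static class FastaRecord { }

    /** Typed collection: passing a FastaRecord here would not compile. */
    public static List<Thing> collectThings(Thing... things) {
        List<Thing> out = new ArrayList<Thing>();
        for (Thing t : things) {
            out.add(t);
        }
        return out;
    }

    /** With the annotation, the same check can only happen at runtime via reflection. */
    public static boolean isAnnotatedThing(Object o) {
        return o.getClass().isAnnotationPresent(ThingType.class);
    }
}
```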
> Richard Holland wrote:
>> Spot on.
>>
>> Annotation/interface... I think Annotation is probably better as you
>> suggest, but I'd have to look into that. Not sure how it works with
>> collections and generics. If it does turn out to be a better bet, I'll
>> change it over.
>>
>> With the BioSQL dependencies, take a look at the pom.xml file inside the
>> biojava-dna module. It declares a dependency on biojava-core. If you want to
>> add dependencies to external JARs, take a look at biojava-biosql's pom.xml
>> to see how it depends on javax.persistence. (The easiest way to add these is
>> via an IDE such as NetBeans, which is what I'm using at the moment).
>>
>> cheers,
>> Richard
>>
>> 2008/10/21 Mark Schreiber 
>>
>>> So if I want to build a BioSQL loader from Genbank, would the
>>> classes (or their wrappers) in the BioSQL Entity package need to
>>> implement Thing? Would Maven have an issue with that or would it just
>>> create a dependency on core? (You can tell I've never used Maven,
>>> right?)
>>>
>>> From a design point of view should Thing be an interface or an
>>> Annotation? The reason I ask is that it doesn't define any methods so
>>> it is more of a tag than an interface.
>>>
>>> Anyway, my understanding is that I would use a Genbank parser (or
>>> write one), write an EntityReceiver interface (probably more than one,
>>> given the number of entities in BioSQL), and implement an EntityBuilder
>>> (again possibly more than one) that implements EntityReceiver and
>>> builds Entity beans from the messages it receives. In this case I probably
>>> wouldn't provide a writer, as JPA would be writing the beans to the
>>> database. Is this how you imagine it?
>>>
>>> - Mark
>>>
>>>
>>> On Tue, Oct 21, 2008 at 1:52 AM, Richard Holland
>>>  wrote:
>>>> (From now on I will only be posting these development messages to
>>>> biojava-dev, which is the intended purpose of that list. Those of you who
>>>> wish to keep track of things but are currently only subscribed to
>>>> biojava-l should also subscribe to biojava-dev in order to keep up to date.)
>>>>
>>>> As promised, I've committed a new package in the biojava-core module that
>>>> should help you understand how to do file parsing, conversion and writing
>>>> in the new BJ3 modules. Here's an example of how to use it to write a
>>>> Genbank parser (note no parsers actually exist yet!):
>>>>
>>>> 1. Design yourself a Genbank class which implements the interface Thing
>>>> and can fully represent all the data that might possibly occur inside a
>>>> Genbank file.
>>>>
>>>> 2. Write an interface called GenbankReceiver, which extends ThingReceiver
>>>> and defines all the methods you might need in order to construct a
>>>> Genbank object in an asynchronous fashion.
>>>>
>>>> 3. Write a GenbankBuilder class which implements GenbankReceiver and
>>>> ThingBuilder. Its job is to receive data via method calls, use that data
>>>> to construct a Genbank object, then provide that object on demand.
>>>>
>>>> 4. Write a GenbankWriter class which implements GenbankReceiver and
>>>> ThingWriter. Its job is similar to GenbankBuilder's, but instead of
>>>> constructing new Genbank objects, it writes Genbank records to file that
>>>> reflect the data it receives.
>>>>
>>>> 5. Write a GenbankReader class which implements ThingReader. It can read
>>>> Genbank files and pass the data to the methods of the ThingReceiver
>>>> provided to it, which in this case could be anything which implements the
>>>> interface GenbankReceiver.
>>>>
>>>> 6. Write a GenbankEmitter class which implements ThingEmitter. It takes a
>>>> Genbank object and will fire off data from it to the provided
>>>> ThingReceiver (a GenbankReceiver instance) as if the Genbank object was
>>>> being read from a file or some other source.
>>>>
>>>> That's it! OK, so it's a minimum of 6 classes instead of the original 1
>>>> or 2, but the additional steps are necessary for flexibility in
>>>> converting between formats.
>>>>
>>>> Now to use it (you'll probably want a GenbankTools class to wrap these
>>>> steps up for user-friendliness, including various options for opening
>>>> files, etc.):
>>>>
>>>> 1. To read a file - instantiate ThingParser with your GenbankReader as
>>>> the reader, and GenbankBuilder as the receiver. Use the iterator methods
>>>> on ThingParser to get the objects out.
>>>>
>>>> 2. To write a file - instantiate ThingParser with a GenbankEmitter
>>>> wrapping your Genbank object, and a GenbankWriter as the receiver. Use
>>>> the parseAll() method on the ThingParser to dump the whole lot to your
>>>> chosen output.
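The six roles and the two usage recipes above can be sketched end to end with a toy FASTA-like format. Everything here is hypothetical and cut down: the types stand in for the data object, receiver, builder, writer, reader and emitter roles Richard describes, not for the actual biojava-core signatures, and the ThingParser plumbing is left out:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ReceiverPatternSketch {

    /** Step 1: the data object (the "Genbank implements Thing" role). */
    public static class FastaRecord {
        public final String header;
        public final String sequence;

        public FastaRecord(String header, String sequence) {
            this.header = header;
            this.sequence = sequence;
        }
    }

    /** Step 2: the callbacks a source fires (the GenbankReceiver role). */
    public interface FastaReceiver {
        void startRecord(String header);
        void addSequence(String chunk);
        void endRecord();
    }

    /** Step 3: turns callbacks into objects (the GenbankBuilder role). */
    public static class FastaBuilder implements FastaReceiver {
        private final List<FastaRecord> built = new ArrayList<FastaRecord>();
        private String header;
        private StringBuilder seq;

        public void startRecord(String h) { header = h; seq = new StringBuilder(); }
        public void addSequence(String chunk) { seq.append(chunk); }
        public void endRecord() { built.add(new FastaRecord(header, seq.toString())); }
        public List<FastaRecord> getRecords() { return built; }
    }

    /** Step 4: receives the same callbacks but writes text (the GenbankWriter role). */
    public static class FastaWriter implements FastaReceiver {
        private final StringBuilder out = new StringBuilder();

        public void startRecord(String h) { out.append('>').append(h).append('\n'); }
        public void addSequence(String chunk) { out.append(chunk).append('\n'); }
        public void endRecord() { }
        public String getText() { return out.toString(); }
    }

    /** Step 5: parses text and fires callbacks at any receiver (the GenbankReader role). */
    public static void read(BufferedReader in, FastaReceiver r) throws IOException {
        boolean open = false;
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.startsWith(">")) {
                if (open) {
                    r.endRecord();
                }
                r.startRecord(line.substring(1).trim());
                open = true;
            } else if (open && line.length() > 0) {
                r.addSequence(line);
            }
        }
        if (open) {
            r.endRecord();
        }
    }

    /** Step 6: replays an existing object as callbacks (the GenbankEmitter role). */
    public static void emit(FastaRecord rec, FastaReceiver r) {
        r.startRecord(rec.header);
        r.addSequence(rec.sequence);
        r.endRecord();
    }
}
```

Reading is read(file, new FastaBuilder()) and writing is emit(record, new FastaWriter()). Because builder and writer share one receiver interface, read(file, new FastaWriter()) also works, which is the property the converter chain relies on.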
>>>>
>>>> The clever bit comes when you want to convert between files. Imagine
>>>> you've done all the above for Genbank, and you've also done it for FASTA.
>>>> How to convert between them? What you need to do is this:
>>>>
>>>> 1. Implement all the classes for both Genbank and FASTA.
>>>>
>>>> 2. Write a GenbankFASTAConverter class that implements ThingConverter
>>>> and GenbankReceiver, and will internally convert the data received and
>>>> pass it on out to the receiver provided, which will be a FASTAReceiver
>>>> instance.
>>>>
>>>> 3. Write a FASTAGenbankConverter class that operates in exactly the
>>>> opposite way, implementing ThingConverter and FASTAReceiver.
>>>>
>>>> Then to convert you use ThingParser again:
>>>>
>>>> 1. From FASTA file to Genbank object: Instantiate ThingParser with a
>>>> FASTAReader reader, a GenbankBuilder receiver, and add a
>>>> FASTAGenbankConverter instance to the converter chain. Use the iterator
>>>> to get your Genbank objects out of your FASTA file.
>>>>
>>>> 2. From FASTA file to Genbank file: Same as option 1, but provide a
>>>> GenbankWriter instead and use parseAll() instead of the iterator methods.
>>>>
>>>> 3. From FASTA object to Genbank object: Same as option 1, but provide a
>>>> FASTAEmitter wrapping your FASTA object as the reader instead.
>>>>
>>>> 4. From FASTA object to Genbank file: Same as option 1, but swap both the
>>>> reader and the receiver as per options 2 and 3.
>>>>
>>>> 5/6/7/8. From Genbank * to FASTA * - same as 1,2,3,4 but swap all
>>>> mentions of FASTA and Genbank, and use GenbankFASTAConverter instead.
>>>>
>>>> One last and very important feature of this approach is that if you
>>>> discover that nobody has written the appropriate converter for your
>>>> chosen pair of formats A and C, but converters do exist to map A to some
>>>> other format B and that other format B on to C, then you can just put
>>>> the two converters A-B and B-C into the ThingParser chain and it'll work
>>>> perfectly.
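The converter chaining works because each converter is itself a receiver that forwards to the next receiver. A sketch with three hypothetical formats A, B and C, where no direct A-to-C converter exists:

```java
import java.util.ArrayList;
import java.util.List;

public class ConverterChainSketch {

    /** Hypothetical receivers for three formats A, B and C. */
    public interface AReceiver { void name(String n); void residues(String r); }
    public interface BReceiver { void label(String l); void sequence(String s); }
    public interface CReceiver { void record(String id, String seq); }

    /** A-to-B converter: receives A callbacks and re-fires them as B callbacks. */
    public static class AToB implements AReceiver {
        private final BReceiver next;
        public AToB(BReceiver next) { this.next = next; }
        public void name(String n) { next.label(n); }
        public void residues(String r) { next.sequence(r); }
    }

    /** B-to-C converter: holds the label until the sequence arrives. */
    public static class BToC implements BReceiver {
        private final CReceiver next;
        private String pending;
        public BToC(CReceiver next) { this.next = next; }
        public void label(String l) { pending = l; }
        public void sequence(String s) { next.record(pending, s); }
    }

    /** A C-format builder at the end of the chain. */
    public static class CCollector implements CReceiver {
        public final List<String> records = new ArrayList<String>();
        public void record(String id, String seq) { records.add(id + ":" + seq); }
    }
}
```

Wiring is just nesting: AReceiver chain = new AToB(new BToC(sink)); any A-style source that calls chain then delivers C-style records to sink.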
>>>>
>>>> Enjoy!
>>>>
>>>> cheers,
>>>> Richard
>>>>
>>>> --
>>>> Richard Holland, BSc MBCS
>>>> Finance Director, Eagle Genomics Ltd
>>>> M: +44 7500 438846 | E: holland at eaglegenomics.com
>>>> http://www.eaglegenomics.com/
>>>>
>>
>>
>>
>


From augustovmail-java at yahoo.com.br  Tue Oct 21 11:45:41 2008
From: augustovmail-java at yahoo.com.br (Augusto Fernandes Vellozo)
Date: Tue, 21 Oct 2008 13:45:41 +0200
Subject: [Biojava-l] SimpleRichAnnotation
In-Reply-To: <381a3e850810210421u54058163ncf347b57394af1b2@mail.gmail.com>
References: <381a3e850810210421u54058163ncf347b57394af1b2@mail.gmail.com>
Message-ID: <381a3e850810210445sc801d40ja36655349b5920b9@mail.gmail.com>

Hi everyone,

I am having problems with the class SimpleRichAnnotation.
I have a term t from ontology o, and I put a note n (with the term t)
into a SimpleRichAnnotation object a, but when I call the method
a.getProperties(t) it doesn't return the note n.
I saw in the BioJava code that the method getProperties imports the
term t into the default ontology before doing the search. Because of
this, it doesn't return the correct note.

Does anyone know why this method changes the ontology?

Thanks,
--
Augusto F. Vellozo





From charles at imbusch.net  Tue Oct 21 14:00:45 2008
From: charles at imbusch.net (Charles Imbusch)
Date: Tue, 21 Oct 2008 16:00:45 +0200
Subject: [Biojava-l] parsing tblastn results
In-Reply-To: 
References: <48F50908.5060307@imbusch.net>	
	
Message-ID: <48FDE08D.8000300@imbusch.net>

Thank you David and Richard for the quick replies.
I downloaded two files from 
http://bugzilla.open-bio.org/show_bug.cgi?id=2603
and tried to apply the patches. I suppose that's the way to get the modified
BlastSAXParser.java.

charlie at custodian:~/biojava-live_1.6$ patch -p0 < BlastSAXParser.java.patch
(Stripping trailing CRs from patch.)
patching file src/org/biojava/bio/program/sax/BlastSAXParser.java
Hunk #1 FAILED at 60.
Hunk #2 FAILED at 631.
Hunk #3 FAILED at 643.
Hunk #4 FAILED at 650.
4 out of 4 hunks FAILED -- saving rejects to file src/org/biojava/bio/program/sax/BlastSAXParser.java.rej

and similar for the other file

charlie at custodian:~/biojava-live_1.6$ patch -p0 < HitSectionSAXParser.java.patch
(Stripping trailing CRs from patch.)
patching file src/org/biojava/bio/program/sax/HitSectionSAXParser.java
Hunk #1 FAILED at 41.
Hunk #2 FAILED at 65.
Hunk #3 FAILED at 96.
Hunk #4 FAILED at 515.
Hunk #5 FAILED at 524.
5 out of 5 hunks FAILED -- saving rejects to file src/org/biojava/bio/program/sax/HitSectionSAXParser.java.rej

Obviously something went wrong, but I couldn't figure out what. I 
uploaded the rej files to
http://charles.imbusch.net/tmp/

Any hint is appreciated.

cheers,
  Charles


From crackeur at comcast.net  Wed Oct 22 02:21:57 2008
From: crackeur at comcast.net (jimmy Zhang)
Date: Tue, 21 Oct 2008 19:21:57 -0700
Subject: [Biojava-l] [ANN] VTD-XML extended edition released
References: <59a41c430810202017n226327cahefe0ed7e5f6a8df2@mail.gmail.com>
	<93b45ca50810210226t79cfbcbfhcadaedcfe8735676@mail.gmail.com>
Message-ID: <009401c933ec$f572a700$0402a8c0@your55e5f9e3d2>

The Java version of extended VTD-XML is released and available for download.
This version supports files of up to 256 GB and memory-mapped file access.
The updated documentation is also available for download. In short, you can
run full XPath queries on documents that are larger than the memory
available on your machine.

A special thanks to Duane May, who provided valuable suggestions and input and
helped refine the VTD specs to make this happen.

To download the package and the documentation, go to
https://sourceforge.net/project/downloading.php?group_id=110612&use_mirror=&filename=vtd-xml_2.4_doc.zip&64621261

https://sourceforge.net/project/downloading.php?group_id=110612&use_mirror=&filename=ximpleware_extended_2.4.zip&99532507




From pzgyuanf at gmail.com  Sun Oct 26 00:57:16 2008
From: pzgyuanf at gmail.com (pprun)
Date: Sun, 26 Oct 2008 08:57:16 +0800
Subject: [Biojava-l] Test failed for Alphabet.getSymbolMatchType method
Message-ID: 

Hi,
The current implementation uses the same condition, equalsIgnoreCase, for both
EXACT_STRING_MATCH and MIXED_CASE_MATCH:


    public SymbolMatchType getSymbolMatchType(Symbol a, Symbol b) {
       ...
        if (a.toString().equalsIgnoreCase(b.toString())) {
            return SymbolMatchType.EXACT_STRING_MATCH;
        }
        if (a.toString().equalsIgnoreCase(b.toString())) {
            return SymbolMatchType.MIXED_CASE_MATCH;
        }
          ...

String.equals should be used for EXACT_STRING_MATCH:

    public SymbolMatchType getSymbolMatchType(Symbol a, Symbol b) {
        ...
        if (a.toString().equals(b.toString())) {
            return SymbolMatchType.EXACT_STRING_MATCH;
        }
        if (a.toString().equalsIgnoreCase(b.toString())) {
            return SymbolMatchType.MIXED_CASE_MATCH;
        }
          ...
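The ordering argument can be checked in isolation with plain strings: the strict equals test has to come first, because every exact match also passes equalsIgnoreCase. A sketch with a hypothetical three-value enum (NO_MATCH is an illustrative addition, not BioJava's SymbolMatchType):

```java
public class MatchTypeSketch {

    /** Hypothetical subset of match types; NO_MATCH is added for the sketch. */
    public enum MatchType { EXACT_STRING_MATCH, MIXED_CASE_MATCH, NO_MATCH }

    /** Checks the strict condition first, then the case-insensitive one. */
    public static MatchType classify(String a, String b) {
        if (a.equals(b)) {
            return MatchType.EXACT_STRING_MATCH;
        }
        if (a.equalsIgnoreCase(b)) {
            return MatchType.MIXED_CASE_MATCH;
        }
        return MatchType.NO_MATCH;
    }
}
```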

The test case used to identify the above bug is:

/*
 *                    BioJava development code
 *
 * This code may be freely distributed and modified under the
 * terms of the GNU Lesser General Public Licence.  This should
 * be distributed with the code.  If you do not have a copy,
 * see:
 *
 *      http://www.gnu.org/copyleft/lesser.html
 *
 * Copyright for this code is held jointly by the individual
 * authors.  These should be listed in @author doc comments.
 *
 * For more information on the BioJava project and its aims,
 * or to join the biojava-l mailing list, visit the home page
 * at:
 *
 *      http://www.biojava.org/
 *
 */
package org.biojava.core.symbol;

import org.junit.After;
import org.junit.AfterClass;
import org.junit.Before;
import org.junit.BeforeClass;
import org.junit.Test;
import static org.junit.Assert.*;

/**
 *
 * @author pprun
 */
public class AlphabetTest {

    public AlphabetTest() {
    }

    @BeforeClass
    public static void setUpClass() throws Exception {
    }

    @AfterClass
    public static void tearDownClass() throws Exception {
    }

    @Before
    public void setUp() {
    }

    @After
    public void tearDown() {
    }

    /**
     * Test of getSymbolMatchType method, of class Alphabet.
     */
    @Test
    public void testGetSymbolMatchType() {
        System.out.println("getSymbolMatchType");

        Alphabet testAlphabet = new Alphabet("testGetSymbolMatchType");

        // 1. exact match
        Symbol a = Symbol.get("ATGC");
        Symbol b = Symbol.get("ATGC");
        SymbolMatchType expResult = SymbolMatchType.EXACT_MATCH;
        SymbolMatchType result = testAlphabet.getSymbolMatchType(a, b);
        assertEquals(expResult, result);

        // 2. mixed case match
        a = Symbol.get("ATGC");
        b = Symbol.get("aTGC");
        expResult = SymbolMatchType.MIXED_CASE_MATCH;
        result = testAlphabet.getSymbolMatchType(a, b);
        assertEquals(expResult, result);
    }
}


BTW, how can I get the dev/test role?
Then I can contribute to BJ3 development or testing (I'm still a
beginner in the bio field).

Thanks,
Pprun



From gabrielle_doan at gmx.net  Mon Oct 27 12:57:03 2008
From: gabrielle_doan at gmx.net (Gabrielle Doan)
Date: Mon, 27 Oct 2008 13:57:03 +0100
Subject: [Biojava-l] differences between read in sequence and stored
	sequence in database
Message-ID: <4905BA9F.1060400@gmx.net>

Hi all,

I have a BioSQL database which contains all human chromosomes. For my
recent project I have to query for part of a sequence.
As far as I know I can get the whole sequence from the field
biosequence.seq in the BioSQL schema. So I made this query:

-- positions 131,615,042 to 131,626,262 inclusive: SUBSTRING takes a length, not an end position
SELECT SUBSTRING(bs.seq, 131615042, 11221) FROM biosequence bs;

But this query didn't yield the desired string, because the length of
this biosequence is only 100,000,020 bp. I am very confused about why I get
such a discrepancy. I added all chromosomes to the database with the
built-in BioJava method addRichSequence(RichSequence seq).
From my raw data I know that this sequence should have a length of
140,279,252 bp. So where is the remaining part of my sequence? I have
observed these discrepancies on all chromosomes which are longer than
100,000,020 bp.

Here is an excerpt from my database:
bioentry_id	description	length
2	Homo sapiens mitochondrion, complete genome.	16571
3	Homo sapiens chromosome Y, reference assembly, complete sequence.	57772954
4	Homo sapiens chromosome X, reference assembly, complete sequence.	100000020
5	Homo sapiens chromosome 22, reference assembly, complete sequence.	49691432
6	Homo sapiens chromosome 21, reference assembly, complete sequence.	46944323
7	Homo sapiens chromosome 20, reference assembly, complete sequence.	25960004
8	Homo sapiens chromosome 9, reference assembly, complete sequence.	100000020
9	Homo sapiens chromosome 7, reference assembly, complete sequence.	100000020

Sequences smaller than 100,000,020 bp are correctly stored under 
Biosequence.seq.

I would be grateful for any hints that explain the behaviour of my database.

Cheers,

Gabrielle


From gabrielle_doan at gmx.net  Tue Oct 28 14:26:47 2008
From: gabrielle_doan at gmx.net (Gabrielle Doan)
Date: Tue, 28 Oct 2008 15:26:47 +0100
Subject: [Biojava-l] differences between read in sequence and stored
	sequence in database]
Message-ID: <49072127.7010304@gmx.net>

Hi all,
concerning the problem described below, I have found out that this
problem also occurred in BioRuby and was fixed in 2004.
See: 
http://cvs.biojava.org/cgi-bin/viewcvs/viewcvs.cgi/bioruby/lib/bio/db.rb?cvsroot=bioruby
Unfortunately I'm clueless about BioRuby. Does anybody recognize this 
problem or understand how it was solved in BioRuby?

I am grateful for any hints.

Cheers,

Gabrielle


-------- Original Message --------
Subject: [Biojava-l] differences between read in sequence and stored
sequence in database
Date: Mon, 27 Oct 2008 13:57:03 +0100
From: Gabrielle Doan
To: biojava-l at biojava.org

Hi all,

I have a BioSQL database which contains all human chromosomes. For my
recent project I have to query for part of a sequence.
As far as I know I can get the whole sequence from the field
biosequence.seq in the BioSQL schema. So I made this query:

-- positions 131,615,042 to 131,626,262 inclusive: SUBSTRING takes a length, not an end position
SELECT SUBSTRING(bs.seq, 131615042, 11221) FROM biosequence bs;

But this query didn't yield the desired string, because the length of
this biosequence is only 100,000,020 bp. I am very confused about why I get
such a discrepancy. I added all chromosomes to the database with the
built-in BioJava method addRichSequence(RichSequence seq).
From my raw data I know that this sequence should have a length of
140,279,252 bp. So where is the remaining part of my sequence? I have
observed these discrepancies on all chromosomes which are longer than
100,000,020 bp.

Here is an excerpt from my database:
bioentry_id	description	length
2	Homo sapiens mitochondrion, complete genome.	16571
3	Homo sapiens chromosome Y, reference assembly, complete sequence.	57772954
4	Homo sapiens chromosome X, reference assembly, complete sequence.	100000020
5	Homo sapiens chromosome 22, reference assembly, complete sequence.	49691432
6	Homo sapiens chromosome 21, reference assembly, complete sequence.	46944323
7	Homo sapiens chromosome 20, reference assembly, complete sequence.	25960004
8	Homo sapiens chromosome 9, reference assembly, complete sequence.	100000020
9	Homo sapiens chromosome 7, reference assembly, complete sequence.	100000020

Sequences smaller than 100,000,020 bp are correctly stored under
Biosequence.seq.

I would be grateful for any hints that explain the behaviour of my database.

Cheers,

Gabrielle



From dtoomey at rcsi.ie  Wed Oct 29 10:45:45 2008
From: dtoomey at rcsi.ie (David Toomey)
Date: Wed, 29 Oct 2008 10:45:45 +0000
Subject: [Biojava-l] How to get full query description from blast result
Message-ID: 

Hi

I am parsing BLAST results and I need to get the complete query description line, but I can only work out how to get the first part of it. For example, in the BLAST result query

Query= sp|Q8I5D2|ABRA_PLAF7 101 kDa malaria antigen OS=Plasmodium
falciparum (isolate 3D7) GN=ABRA

I need to get all of the description above, but I can only seem to retrieve the first part, 'sp|Q8I5D2|ABRA_PLAF7', which I get from the queryId property of the annotation.

Can anyone point me in the right direction for retrieving the complete query description?
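Until the parser exposes the whole line, one workaround is to read the Query= block from the raw report yourself and join the wrapped continuation lines. A minimal sketch that bypasses BioJava entirely; it assumes the text layout in which the block ends at a blank line, a legacy "(N letters)" line or a "Length=" line, and QueryDescription and fullQuery are hypothetical names:

```java
import java.io.BufferedReader;
import java.io.IOException;

public class QueryDescription {

    /**
     * Returns the text after "Query=" with wrapped continuation lines joined by
     * single spaces, or null if no Query= line is found.
     */
    public static String fullQuery(BufferedReader in) throws IOException {
        StringBuilder sb = null;
        String line;
        while ((line = in.readLine()) != null) {   // readLine accepts \n, \r\n and \r endings
            String t = line.trim();
            if (sb == null) {
                if (t.startsWith("Query=")) {
                    sb = new StringBuilder(t.substring("Query=".length()).trim());
                }
            } else if (t.length() == 0
                    || t.matches("\\([\\d,]+ letters\\)")
                    || t.startsWith("Length=")) {
                break;                              // end of the Query= block
            } else {
                sb.append(' ').append(t);           // a wrapped continuation line
            }
        }
        return sb == null ? null : sb.toString();
    }
}
```

Matching the exact "(N letters)" form, rather than any line starting with a parenthesis, avoids stopping early when a description happens to wrap just before text such as "(isolate 3D7)".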

Thanks

Dave




From holland at eaglegenomics.com  Thu Oct 30 14:07:42 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Thu, 30 Oct 2008 14:07:42 +0000
Subject: [Biojava-l] differences between read in sequence and stored
	sequence in database]
In-Reply-To: <49072127.7010304@gmx.net>
References: <49072127.7010304@gmx.net>
Message-ID: 

Hello.

Sorry for the delayed reply - I've been away on business all week.

The similar Ruby issue (and solution) is discussed here:

http://portal.open-bio.org/pipermail/bioruby/2004-March.txt

How did you parse the files in the first place? Did you use the new
GenBank parsers (BJX), or the older ones? This will help indicate
where the problem lies - the data will have been truncated at the
point it was parsed from file, so the data in your database will
reflect this and you'll have to reload it once the appropriate parser
has been fixed.

If it was the newer BJX parser, then the problem most likely lies in
this regex from org.biojavax.bio.seq.io.GenbankFormat, which can
probably be fixed in a similar manner to the Ruby equivalent discussed
in the posting above:

    protected static final Pattern sectp =
        Pattern.compile("^(\\s{0,8}(\\S+)\\s{1,7}(.*)|\\s{21}(/\\S+?)=(.*)|\\s{21}(/\\S+))$");
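A minimal, library-free sketch of one plausible explanation for the exact 100,000,020 bp cut-off. This is an assumption, not something confirmed from the BioJava source: GenBank sequence lines hold 60 bases and start with a coordinate right-justified in 9 columns, so the first line whose coordinate needs all 9 digits starts at 100,000,021 with no leading space. If the reader decides where a section ends by looking for a line that does not begin with whitespace, the sequence would stop after exactly 100,000,020 bases. The class name and both splitting rules below are hypothetical; the "strict" rule is only one possible fix and not necessarily what BioRuby did.

```java
import java.util.ArrayList;
import java.util.List;

public class GenbankSectionSplitSketch {

    static final int BASES_PER_LINE = 60;

    // Start coordinate right-justified in a 9-character field, then a space
    // and the bases (real lines carry up to 60 bases in blocks of 10).
    static String seqLine(long start, String bases) {
        return String.format("%9d %s", start, bases);
    }

    // Start coordinate of the first sequence line whose number fills all
    // 9 columns: the smallest k*60 + 1 that is >= 100,000,000.
    static long firstNineDigitLineStart() {
        long k = (100_000_000L - 1 + BASES_PER_LINE - 1) / BASES_PER_LINE; // ceil(99,999,999 / 60)
        return k * BASES_PER_LINE + 1;
    }

    // ASSUMED failure mode: any line not beginning with whitespace
    // is treated as the start of a new section.
    static boolean naiveNewSection(String line) {
        return !line.isEmpty() && !Character.isWhitespace(line.charAt(0));
    }

    // Hypothetical stricter rule: section keywords (LOCUS, ORIGIN, ...)
    // begin with an upper-case letter, so a coordinate never qualifies.
    static boolean strictNewSection(String line) {
        return !line.isEmpty() && Character.isUpperCase(line.charAt(0));
    }

    static List<List<String>> split(List<String> lines, boolean strict) {
        List<List<String>> sections = new ArrayList<List<String>>();
        for (String line : lines) {
            boolean starts = strict ? strictNewSection(line) : naiveNewSection(line);
            if (starts || sections.isEmpty()) {
                sections.add(new ArrayList<String>());
            }
            sections.get(sections.size() - 1).add(line);
        }
        return sections;
    }

    public static void main(String[] args) {
        long start = firstNineDigitLineStart();
        System.out.println("first 9-digit line starts at " + start
                + ", bases before it: " + (start - 1));

        List<String> lines = new ArrayList<String>();
        lines.add("ORIGIN");
        lines.add(seqLine(start - BASES_PER_LINE, "acgtacgtac")); // " 99999961 ..."
        lines.add(seqLine(start, "ggccggccgg"));                  // "100000021 ..."

        System.out.println("naive sections:  " + split(lines, false).size());
        System.out.println("strict sections: " + split(lines, true).size());
    }
}
```

Running main prints a start coordinate of 100000021 with 100000020 bases before it, then 2 sections under the naive rule (the sequence is cut in two) and 1 under the stricter rule.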

Could someone volunteer to develop and test a fix? If you come up with
something, please commit it to the SVN trunk.

cheers,
Richard


2008/10/28 Gabrielle Doan :
> Hi all,
> concerning the problem described below, I have found out that this problem
> also occurred in BioRuby and was fixed in 2004.
> See:
> http://cvs.biojava.org/cgi-bin/viewcvs/viewcvs.cgi/bioruby/lib/bio/db.rb?cvsroot=bioruby
> Unfortunately I'm clueless about BioRuby. Does anybody recognize this
> problem or understand how it was solved in BioRuby?
>
> I am grateful for any hints.
>
> Cheers,
>
> Gabrielle



-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From holland at eaglegenomics.com  Thu Oct 30 14:10:12 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Thu, 30 Oct 2008 14:10:12 +0000
Subject: [Biojava-l] How to get full query description from blast result
In-Reply-To: 
References: 
Message-ID: 

Good question!

Can someone who knows a lot about the blast parser internals provide
David with an answer to his question?

cheers,
Richard




-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From markjschreiber at gmail.com  Fri Oct 31 07:26:35 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Fri, 31 Oct 2008 15:26:35 +0800
Subject: [Biojava-l] differences between read in sequence and stored
	sequence in database
In-Reply-To: <4905BA9F.1060400@gmx.net>
References: <4905BA9F.1060400@gmx.net>
Message-ID: <93b45ca50810310026o6ee35a61sf2815c3547e1e679@mail.gmail.com>

Could this be a database implementation issue? Is there a limit on how
long a field can be in your DB?
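Separately from the truncation itself, the query in the original post, SELECT SUBSTRING(bs.seq, 131615042, 131626262) FROM biosequence bs;, passes the end coordinate where SQL SUBSTRING takes a length (in MySQL, for example, SUBSTRING(str, pos, len) with 1-based pos), and with no WHERE clause it runs against every biosequence row. A minimal sketch, assuming the goal is the inclusive 1-based range 131,615,042..131,626,262 of a single entry; the class name is hypothetical and the SQL uses JDBC-style placeholders:

```java
public final class SubstringArgs {

    // Placeholder query; bioentry_id restricts it to one chromosome.
    static final String SQL =
            "SELECT SUBSTRING(bs.seq, ?, ?) FROM biosequence bs WHERE bs.bioentry_id = ?";

    /**
     * Converts an inclusive, 1-based [start, end] range into the
     * (position, length) pair that SUBSTRING expects.
     */
    static long[] positionAndLength(long start, long end) {
        if (start < 1 || end < start) {
            throw new IllegalArgumentException("bad range " + start + ".." + end);
        }
        return new long[] { start, end - start + 1 };
    }

    public static void main(String[] args) {
        long[] pl = positionAndLength(131615042L, 131626262L);
        System.out.println(SQL + "  -- position=" + pl[0] + ", length=" + pl[1]);
    }
}
```

With JDBC, the two values would be bound with PreparedStatement.setLong(...) at parameters 1 and 2, and the entry's bioentry_id at parameter 3.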

- Mark



From markjschreiber at gmail.com  Fri Oct 31 08:00:35 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Fri, 31 Oct 2008 16:00:35 +0800
Subject: [Biojava-l] How to get full query description from blast result
In-Reply-To: 
References: 
	
Message-ID: <93b45ca50810310100w5e922161iaf79469050afbc3c@mail.gmail.com>

Hi -

If you use the BlastEcho program on the cookbook pages you can find
out if and how the information is being parsed and where it goes.

It is possible it is not parsed. In this case you could add a feature request.

- Mark



From community at struck.lu  Fri Oct 31 10:05:00 2008
From: community at struck.lu (community at struck.lu)
Date: Fri, 31 Oct 2008 11:05:00 +0100
Subject: [Biojava-l] SCF: support for ambiguities
Message-ID: 

Hello,


I am using the SCF class in the context of HIV-1 population sequencing. In
this context we sometimes have ambiguous base calls. To support them, I
extended the SCF class to allow for IUPAC ambiguity codes covering up to 2 nucleotides.

To do this, I simply added the following code to the "decode" method:

#########################
        public Symbol decode(byte call) throws IllegalSymbolException {

            //get the DNA Alphabet
            Alphabet dna = DNATools.getDNA();

            char c = (char) call;
            switch (c) {
                case 'a':
                case 'A':
                    return DNATools.a();
                case 'c':
                case 'C':
                    return DNATools.c();
                case 'g':
                case 'G':
                    return DNATools.g();
                case 't':
                case 'T':
                    return DNATools.t();
                case 'n':
                case 'N':
                    return DNATools.n();
                case '-':
                    return DNATools.getDNA().getGapSymbol();
                case 'w':
                case 'W':
                    //make the 'W' symbol
                    Set symbolsThatMakeW = new HashSet();
                    symbolsThatMakeW.add(DNATools.a());
                    symbolsThatMakeW.add(DNATools.t());
                    Symbol w = dna.getAmbiguity(symbolsThatMakeW);
                    return w;
                case 's':
                case 'S':
                    //make the 'S' symbol
                    Set symbolsThatMakeS = new HashSet();
                    symbolsThatMakeS.add(DNATools.c());
                    symbolsThatMakeS.add(DNATools.g());
                    Symbol s = dna.getAmbiguity(symbolsThatMakeS);
                    return s;
... (and so on)
#########################

Is this the right way to do it? And if so, how can this code be submitted to
the official biojava source code?


Best regards,
Daniel Struck
_________________________________________________________
Mail sent using root eSolutions Webmailer - www.root.lu




From dtoomey at rcsi.ie  Fri Oct 31 12:07:19 2008
From: dtoomey at rcsi.ie (David Toomey)
Date: Fri, 31 Oct 2008 12:07:19 +0000
Subject: [Biojava-l] How to get full query description from blast result
In-Reply-To: <93b45ca50810310100w5e922161iaf79469050afbc3c@mail.gmail.com>
References: 
	
	<93b45ca50810310100w5e922161iaf79469050afbc3c@mail.gmail.com>
Message-ID: 

Hi Mark

I tried that and it appears that it is not being parsed. Only the portion of the line up to the first space is returned as queryId. The rest of the line is not returned.
Could this be added to the blast parser?

Cheers

Dave





From simon.foote at nrc-cnrc.gc.ca  Fri Oct 31 11:56:30 2008
From: simon.foote at nrc-cnrc.gc.ca (Simon Foote)
Date: Fri, 31 Oct 2008 07:56:30 -0400
Subject: [Biojava-l] How to get full query description from blast result
In-Reply-To: <93b45ca50810310100w5e922161iaf79469050afbc3c@mail.gmail.com>
References: 
	<93b45ca50810310100w5e922161iaf79469050afbc3c@mail.gmail.com>
Message-ID: <490AF26E.7000604@nrc-cnrc.gc.ca>

Mark is right.
A quick look at the code shows that, for the query line, the parser extracts
everything up to the first whitespace and puts it into queryId; everything
else is discarded.
To get the full description, some additional code is needed to populate
a queryDescription with everything from the query line up to the query
length information, which is contained in parentheses.
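Following that description, a minimal, library-free sketch of collecting the full query header from the report text. It is not part of BioJava's BlastLikeSAXParser; the class name, the queryId/description fields and both end-of-header patterns are assumptions (old-style text reports are assumed to put the length on its own line as "(N letters)", BLAST+ reports a blank line followed by "Length=N"). Only the first Query= block is read.

```java
import java.util.List;
import java.util.regex.Pattern;

public class QueryLineParser {

    // Assumed end-of-header markers: "(N letters)" on its own line, or
    // BLAST+'s "Length=N". A blank line also ends the header.
    private static final Pattern LENGTH_LINE =
            Pattern.compile("^\\s*(\\(\\s*[\\d,]+\\s+letters\\)|Length=\\d+)\\s*$");

    /** The query id (first token) and the rest of the header line(s). */
    public static final class QueryHeader {
        public final String queryId;
        public final String description;

        QueryHeader(String queryId, String description) {
            this.queryId = queryId;
            this.description = description;
        }
    }

    /**
     * Joins the "Query=" line with its wrapped continuation lines, stopping at
     * the length line or a blank line, then splits off the first token as the id.
     * Returns null if no "Query=" line is present.
     */
    public static QueryHeader parse(List<String> lines) {
        StringBuilder header = null;
        for (String line : lines) {
            if (header == null) {
                if (line.startsWith("Query=")) {
                    header = new StringBuilder(line.substring("Query=".length()).trim());
                }
            } else if (line.trim().isEmpty() || LENGTH_LINE.matcher(line).matches()) {
                break;
            } else {
                header.append(' ').append(line.trim());
            }
        }
        if (header == null) {
            return null;
        }
        String full = header.toString().trim();
        int space = full.indexOf(' ');
        if (space < 0) {
            return new QueryHeader(full, "");
        }
        return new QueryHeader(full.substring(0, space), full.substring(space + 1).trim());
    }

    public static void main(String[] args) {
        QueryHeader h = parse(java.util.Arrays.asList(
                "Query= sp|Q8I5D2|ABRA_PLAF7 101 kDa malaria antigen OS=Plasmodium",
                "falciparum (isolate 3D7) GN=ABRA"));
        System.out.println(h.queryId + " | " + h.description);
    }
}
```

The "(isolate 3D7)" in the description is not mistaken for the length line because the pattern must match the whole line. Hooking this into BioJava would still need a parser change, for example emitting the text as an extra property next to queryId.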

Simon

Bioinformatics Specialist
Institute for Biological Sciences | Institut des sciences biologiques
National Research Council of Canada | Conseil national de recherches Canada
Ottawa, Canada K1A 0R6
Telephone | Téléphone 613-990-3600 / Facsimile | Télécopieur 613-990-9092
Government of Canada | Gouvernement du Canada





From benb at fruitfly.org  Fri Oct 31 13:38:32 2008
From: benb at fruitfly.org (Ben Berman)
Date: Fri, 31 Oct 2008 06:38:32 -0700
Subject: [Biojava-l] SCF: support for ambiguities
In-Reply-To: 
References: 
Message-ID: 


Is there a reason why IUPAC ambiguity codes have never been added to  
DNATools?  Would it hurt the performance of symbol lookups?



----
Ben Berman, PhD
Research Associate, USC Epigenome Center
Harlyne J. Norris Research Tower
1450 Biggy St.
Room #G511, MC 9601
Los Angeles, CA 90033



From holland at eaglegenomics.com  Fri Oct 31 13:56:54 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 31 Oct 2008 13:56:54 +0000
Subject: [Biojava-l] SCF: support for ambiguities
In-Reply-To: 
References: 
Message-ID: 

It is the correct method, yes.

However, your code constructs a new HashSet every time it checks for
W, S, etc. It would be much more efficient to create class-static
references to the ambiguity symbols you need, instead of (re)creating
them every time they're encountered. A class-static gap symbol
reference would also be good in this situation.
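The build-once idea can be sketched without BioJava: every IUPAC DNA code is mapped to its set of bases in a static initializer, so decoding a base call is a single map lookup with no per-call allocation. The Base enum, the class name and the IllegalArgumentException are assumptions for illustration; in BioJava the cached values would instead be the Symbols returned once by getAmbiguity(...), and the gap symbol would be cached the same way.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public final class IupacTable {

    public enum Base { A, C, G, T }

    // Built once at class-load time; decode() never allocates a new Set.
    private static final Map<Character, Set<Base>> CODES;

    static {
        Map<Character, Set<Base>> m = new HashMap<>();
        put(m, 'A', Base.A);
        put(m, 'C', Base.C);
        put(m, 'G', Base.G);
        put(m, 'T', Base.T);
        put(m, 'R', Base.A, Base.G);
        put(m, 'Y', Base.C, Base.T);
        put(m, 'S', Base.C, Base.G);
        put(m, 'W', Base.A, Base.T);
        put(m, 'K', Base.G, Base.T);
        put(m, 'M', Base.A, Base.C);
        put(m, 'B', Base.C, Base.G, Base.T);
        put(m, 'D', Base.A, Base.G, Base.T);
        put(m, 'H', Base.A, Base.C, Base.T);
        put(m, 'V', Base.A, Base.C, Base.G);
        put(m, 'N', Base.A, Base.C, Base.G, Base.T);
        CODES = Collections.unmodifiableMap(m);
    }

    private static void put(Map<Character, Set<Base>> m, char code, Base first, Base... rest) {
        m.put(code, Collections.unmodifiableSet(EnumSet.of(first, rest)));
    }

    /** Returns the cached base set for a case-insensitive IUPAC DNA code. */
    public static Set<Base> decode(byte call) {
        Set<Base> bases = CODES.get(Character.toUpperCase((char) call));
        if (bases == null) {
            throw new IllegalArgumentException("Not an IUPAC DNA code: " + (char) call);
        }
        return bases;
    }
}
```

Because each entry is created once, decode((byte) 'w') and decode((byte) 'W') return the same set instance.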

cheers,
Richard






-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From holland at eaglegenomics.com  Fri Oct 31 14:40:10 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 31 Oct 2008 14:40:10 +0000
Subject: [Biojava-l] SCF: support for ambiguities
In-Reply-To: 
References: 
	
Message-ID: 

It would be fine to add them there too. You'd still need to modify the
SCF parser though in order for it to be able to know about them.

cheers,
Richard




-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From community at struck.lu  Fri Oct 31 16:06:45 2008
From: community at struck.lu (community at struck.lu)
Date: Fri, 31 Oct 2008 17:06:45 +0100
Subject: [Biojava-l] SCF: support for ambiguities
Message-ID: 

True. It was a first quick and dirty hack to get the rest of my project going.

I think adding support for the IUPAC ambiguities to DNATools would be the most
appropriate solution. The SCF class can then easily be adapted.

Are there any plans to do so?
If not, I could give it a try and submit a patch for DNATools and SCF.

Greetings,
Daniel






From holland at eaglegenomics.com  Fri Oct 31 16:14:30 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 31 Oct 2008 16:14:30 +0000
Subject: [Biojava-l] SCF: support for ambiguities
In-Reply-To: 
References: 
Message-ID: 

A patch would be much appreciated!

cheers,
Richard

2008/10/31 community at struck.lu :
> True. It was a first quick and dirty hack to get the rest of my project going.
>
> I think adding support for the IUPAC ambiguities to DNATools would be the most
> appropriate solution. The SCF class can then easily be adapted.
>
> Are there any plans to do so?
> If not, I could give it a try and submit a patch for DNATools and SCF.
>
> Greetings,
> Daniel
>
> "Richard Holland"  wrote:
>
>> It is the correct method, yes.
>>
>> However your code constructs a new hash set every time it does the
>> check for W or S etc.. It would be much more efficient to create
>> class-static references to the ambiguity symbols you need, instead of
>> (re)creating them every time they're encountered. A class-static gap
>> symbol reference would also be good in this situation.
>>
>> cheers,
>> Richard
>>
>>
>>
>> 2008/10/31 community at struck.lu :
>> > Hello,
>> >
>> >
>> > I am using the SCF class in the context of HIV-1 population sequencing. In
>> > this context we sometimes have ambiguous base calls. To support them I
>> > extended the SCF class to allow for IUPAC ambiguities up to 2 nucleotides.
>> >
>> > Therefore I simply added the following code to the "decode" function:
>> >
>> > #########################
>> >        public Symbol decode(byte call) throws IllegalSymbolException {
>> >
>> >            //get the DNA Alphabet
>> >            Alphabet dna = DNATools.getDNA();
>> >
>> >            char c = (char) call;
>> >            switch (c) {
>> >                case 'a':
>> >                case 'A':
>> >                    return DNATools.a();
>> >                case 'c':
>> >                case 'C':
>> >                    return DNATools.c();
>> >                case 'g':
>> >                case 'G':
>> >                    return DNATools.g();
>> >                case 't':
>> >                case 'T':
>> >                    return DNATools.t();
>> >                case 'n':
>> >                case 'N':
>> >                    return DNATools.n();
>> >                case '-':
>> >                    return DNATools.getDNA().getGapSymbol();
>> >                case 'w':
>> >                case 'W':
>> >                    //make the 'W' symbol
>> >                    Set symbolsThatMakeW = new HashSet();
>> >                    symbolsThatMakeW.add(DNATools.a());
>> >                    symbolsThatMakeW.add(DNATools.t());
>> >                    Symbol w = dna.getAmbiguity(symbolsThatMakeW);
>> >                    return w;
>> >                case 's':
>> >                case 'S':
>> >                    //make the 'S' symbol
>> >                    Set symbolsThatMakeS = new HashSet();
>> >                    symbolsThatMakeS.add(DNATools.c());
>> >                    symbolsThatMakeS.add(DNATools.g());
>> >                    Symbol s = dna.getAmbiguity(symbolsThatMakeS);
>> >                    return s;
>> > ... (and so on)
>> > #########################
>> >
>> > Is this the right way to do it? And if so, how can this code be submitted to
>> > the official biojava source code?
>> >
>> >
>> > Best regards,
>> > Daniel Struck
>> >
>>
>>
>
>



-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From gabrielle_doan at gmx.net  Fri Oct 31 15:09:56 2008
From: gabrielle_doan at gmx.net (Gabrielle Doan)
Date: Fri, 31 Oct 2008 15:09:56 -0000
Subject: [Biojava-l] differences between read in sequence and stored
 sequence in database]
In-Reply-To: 
References: <49072127.7010304@gmx.net>
	
Message-ID: <490B1FB3.7010607@gmx.net>

Hi all,
I've changed the regular expression in
org.biojavax.bio.seq.io.GenbankFormat from

<code>
protected static final Pattern sectp =
    Pattern.compile("^(\\s{0,8}(\\S+)\\s{1,7}(.*)|\\s{21}(/\\S+?)=(.*)|\\s{21}(/\\S+))$");
</code>

to

<code>
protected static final Pattern sectp =
    Pattern.compile("^(\\s{0,8}([A-Za-z]+)\\s{1,7}(.*)|\\s{21}(/\\S+?)=(.*)|\\s{21}(/\\S+))$");
</code>

This mirrors the BioRuby fix
(http://cvs.biojava.org/cgi-bin/viewcvs/viewcvs.cgi/bioruby/lib/bio/db.rb.diff?r1=0.24&r2=0.25&cvsroot=bioruby).
But then feature keys that contain a hyphen, such as D-loop, can no longer be
detected, so this is not the solution to my problem.
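The D-loop side effect can be checked directly. This sketch compiles both versions of the `sectp` pattern quoted above and applies them to a feature line with the key D-loop (assuming the usual five-space indent with the location starting at column 22) and to the 100000021 sequence line. `SectpCheck` and `key` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Compares the original sectp pattern with the letters-only variant.
final class SectpCheck {

    // Original GenbankFormat pattern: any non-space run counts as a key.
    static final Pattern OLD = Pattern.compile(
        "^(\\s{0,8}(\\S+)\\s{1,7}(.*)|\\s{21}(/\\S+?)=(.*)|\\s{21}(/\\S+))$");

    // BioRuby-style variant: keys restricted to letters only.
    static final Pattern NEW = Pattern.compile(
        "^(\\s{0,8}([A-Za-z]+)\\s{1,7}(.*)|\\s{21}(/\\S+?)=(.*)|\\s{21}(/\\S+))$");

    /** Returns the key captured by group 2, or null if the line is not a key line. */
    static String key(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String dLoop = "     D-loop          1..200";
        String seqLine = "100000021 gtcctcgggc tccggcttgg";
        System.out.println(key(OLD, dLoop));   // D-loop
        System.out.println(key(NEW, dLoop));   // null: the hyphen breaks [A-Za-z]+
        System.out.println(key(OLD, seqLine)); // 100000021: misread as a key
        System.out.println(key(NEW, seqLine)); // null
    }
}
```

The old pattern accepts the 100000021 line as a key line, while the letters-only variant rejects it, at the cost of also rejecting legitimate keys such as D-loop.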
The reason for the truncation is readSection(BufferedReader br) in 
org.biojavax.bio.seq.io.GenbankFormat.


<snip>
if (line==null || line.length()==0 || (!line.startsWith(" ") && linecount++>0)) {
    // dump out last part of section
    section.add(new String[]{currKey,currVal.toString()});
    br.reset();
    done = true;
</snip>

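The condition in the if-clause ends the section at the first line after the
section's key line that doesn't begin with whitespace. GenBank right-justifies
the base position in the first nine columns of each sequence line (60 bases
per line), so the first line whose position has nine digits, 100000021, no
longer starts with a space. That is why every sequence stops at exactly
100,000,020 bp. This line is still read:

<snip>
  99999961  cccgcccaca cccctcggcc ctgccctctg gccatacagg ttctcggtgg tgttgaagag
</snip>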

and this line won't be read:

<snip>
100000021 gtcctcgggc tccggcttgg tgctcacgca cacaggaaag tcagcttctc ctgggagggc
</snip>

If you change the if-statement to this:


<snip>
String firstSecKey = section.size() == 0 ? "" : ((String[])section.get(0))[0];

if (line==null || line.length()==0 || (!line.startsWith(" ") &&
        linecount++>0 && (!firstSecKey.equals(START_SEQUENCE_TAG) ||
        line.startsWith(END_SEQUENCE_TAG))))
</snip>

then the whole sequence is added to the database without truncation.
I have attached GenbankFormat.java to this mail. Could somebody check the
method for me and commit it? I'm not a BioJava specialist.
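The two end-of-section tests quoted above can be isolated to see why the original stops at exactly 100,000,020 bp and how the proposed one differs. This sketch assumes the GenBank sequence-line layout (position right-justified in nine columns, 60 bases per line), assumes `START_SEQUENCE_TAG` and `END_SEQUENCE_TAG` hold "ORIGIN" and "//", and replaces the `linecount++>0` term with a `keyLineSeen` flag. `SectionEndCheck` and its methods are hypothetical and model only the condition, not the rest of `readSection`.

```java
import java.util.Locale;

// Models only the end-of-section test in readSection(); not the full parser.
final class SectionEndCheck {

    // Assumed values of GenbankFormat's tag constants.
    static final String START_SEQUENCE_TAG = "ORIGIN";
    static final String END_SEQUENCE_TAG = "//";

    /** GenBank-style line prefix: first base position right-justified in 9 columns. */
    static String positionPrefix(long firstBase) {
        return String.format(Locale.ROOT, "%9d", firstBase);
    }

    /** Original test: any unindented line after the key line ends the section. */
    static boolean oldEnds(String line, boolean keyLineSeen) {
        return line == null || line.isEmpty()
            || (!line.startsWith(" ") && keyLineSeen);
    }

    /** Proposed test: inside ORIGIN, only the "//" terminator ends the section. */
    static boolean newEnds(String line, boolean keyLineSeen, String firstSecKey) {
        return line == null || line.isEmpty()
            || (!line.startsWith(" ") && keyLineSeen
                && (!firstSecKey.equals(START_SEQUENCE_TAG)
                    || line.startsWith(END_SEQUENCE_TAG)));
    }

    public static void main(String[] args) {
        // 60 bases per line: the line starting at 99,999,961 ends at 100,000,020.
        System.out.println(99_999_961L + 60 - 1);                     // 100000020
        System.out.println("[" + positionPrefix(99_999_961L) + "]");  // [ 99999961]
        System.out.println("[" + positionPrefix(100_000_021L) + "]"); // [100000021]

        String next = positionPrefix(100_000_021L) + " gtcctcgggc";
        System.out.println(oldEnds(next, true));           // true: truncates here
        System.out.println(newEnds(next, true, "ORIGIN")); // false: keeps reading
        System.out.println(newEnds("//", true, "ORIGIN")); // true
    }
}
```

Outside the ORIGIN section the proposed test behaves exactly like the original, so other sections are still split at their next unindented key line.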

Cheers,
Gabrielle



Richard Holland wrote:
> Hello.
> 
> Sorry for the delayed reply - I've been away on business all week.
> 
> The similar Ruby issue (and solution) is discussed here:
> 
> http://portal.open-bio.org/pipermail/bioruby/2004-March.txt
> 
> How did you parse the files in the first place? Did you use the new
> GenBank parsers (BJX), or the older ones? This will help indicate
> where the problem lies - the data will have been truncated at the
> point it was parsed from file, so the data in your database will
> reflect this and you'll have to reload it once the appropriate parser
> has been fixed.
> 
> If it was the newer BJX parser, then the problem most probably lies in
> this regex from org.biojavax.bio.seq.io.GenbankFormat, which can
> probably be fixed in a similar manner to the Ruby equivalent discussed
> in the posting above:
> 
>     protected static final Pattern sectp =
> Pattern.compile("^(\\s{0,8}(\\S+)\\s{1,7}(.*)|\\s{21}(/\\S+?)=(.*)|\\s{21}(/\\S+))$");
> 
> Could someone volunteer to develop and test a fix? If you come up with
> something, please commit it to the SVN trunk.
> 
> cheers,
> Richard
> 
> 
> 2008/10/28 Gabrielle Doan :
>> Hi all,
>> concering the problem as described below I have found out that this problem
>> also occured in BioRuby and was fixed in 2004.
>> See:
>> http://cvs.biojava.org/cgi-bin/viewcvs/viewcvs.cgi/bioruby/lib/bio/db.rb?cvsroot=bioruby
>> Unfortunately I'm clueless about BioRuby. Does anybody recognize this
>> problem or understand how it was solved in BioRuby?
>>
>> I am grateful for any hints.
>>
>> Cheers,
>>
>> Gabrielle
>>
>>
>> -------- Original Message --------
>> Subject: [Biojava-l] differences between read in sequence and stored
>> sequence in database
>> Date: Mon, 27 Oct 2008 13:57:03 +0100
>> From: Gabrielle Doan
>> To: biojava-l at biojava.org
>>
>> Hi all,
>>
>> I have a BioSQL database which contains all human chromosomes. For my
>> current project I have to query for a part of a sequence.
>> As far as I know I can get the whole sequence from the entry
>> Biosequence.Seq in the BioSQL schema. So I've made this query:
>>
>> SELECT SUBSTRING(bs.seq, 131615042, 11221) FROM biosequence bs;
>> -- the third argument is a length: 131626262 - 131615042 + 1 = 11221
>>
>> But this query didn't yield the desired string, because the length of
>> this biosequence is only 100,000,020 bp. I am very confused about why I get
>> such a discrepancy. I added all chromosomes to the database with the built-in
>> BioJava method addRichSequence(RichSequence seq).
>> From my raw data I know that this sequence should have a length of
>> 140,279,252 bp. So where is the remaining part of my sequence? I have
>> observed these discrepancies on all chromosomes longer than
>> 100,000,020 bp.
>>
>> Here is an excerpt from my database:
>> bioentry_id  description                                                        length
>> 2            Homo sapiens mitochondrion, complete genome.                       16571
>> 3            Homo sapiens chromosome Y, reference assembly, complete sequence.  57772954
>> 4            Homo sapiens chromosome X, reference assembly, complete sequence.  100000020
>> 5            Homo sapiens chromosome 22, reference assembly, complete sequence. 49691432
>> 6            Homo sapiens chromosome 21, reference assembly, complete sequence. 46944323
>> 7            Homo sapiens chromosome 20, reference assembly, complete sequence. 25960004
>> 8            Homo sapiens chromosome 9, reference assembly, complete sequence.  100000020
>> 9            Homo sapiens chromosome 7, reference assembly, complete sequence.  100000020
>>
>> Sequences smaller than 100,000,020 bp are correctly stored under
>> Biosequence.seq.
>>
>> I am grateful for any hints that explain the behaviour of my database.
>>
>> Cheers,
>>
>> Gabrielle
>>
> 
> 
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: GenbankFormat.java
Type: text/x-java
Size: 48624 bytes
Desc: not available
URL: