From andreas at sdsc.edu  Fri Mar  5 11:56:40 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 5 Mar 2010 08:56:40 -0800
Subject: [Biojava-l] Google summer of code
Message-ID: <59a41c431003050856v17c83b80sf1fb59f2587c9cd1@mail.gmail.com>

Hi,

The Open Bioinformatics Foundation (BioJava's mother organisation) is
preparing an application for the Google Summer of Code. If you are
interested in becoming a mentor for a BioJava related project, you can join
us in the application. If you are a student and are interested in a project,
please take a look at these pages:

http://www.open-bio.org/wiki/Google_Summer_of_Code

http://biojava.org/wiki/Google_Summer_of_Code

Andreas

From jeedward at yahoo.com  Mon Mar  8 10:44:05 2010
From: jeedward at yahoo.com (John Edward)
Date: Mon, 8 Mar 2010 07:44:05 -0800 (PST)
Subject: [Biojava-l] Call for papers: BCBGC-10, USA, July 2010
Message-ID: <800341.81267.qm@web45915.mail.sp1.yahoo.com>

It would be highly appreciated if you could share this announcement with your
colleagues, students and individuals whose research is in bioinformatics,
computational biology, genomics, data-mining, and related areas.

Call for papers: BCBGC-10, USA, July 2010

The 2010 International Conference on Bioinformatics, Computational Biology,
Genomics and Chemoinformatics (BCBGC-10) (website:
http://www.PromoteResearch.org ) will be held during 12-14 of July 2010 in
Orlando, FL, USA.  BCBGC is an important event in the areas of
bioinformatics, computational biology, genomics and chemoinformatics and
focuses on all areas related to the conference.

The conference will be held at the same time and location where several other
major international conferences will be taking place. The conference will be
held as part of 2010 multi-conference (MULTICONF-10). MULTICONF-10 will be held
during July 12-14, 2010 in Orlando, Florida, USA. The primary goal of MULTICONF
is to promote research and developmental activities in computer science,
information technology, control engineering, and related fields. Another goal
is to promote the dissemination of research to a multidisciplinary audience and
to facilitate communication among researchers, developers, practitioners in
different fields. The following conferences are planned to be organized as part
of MULTICONF-10.

* International Conference on Artificial Intelligence and Pattern
  Recognition (AIPR-10)
* International Conference on Automation, Robotics and Control
  Systems (ARCS-10)
* International Conference on Bioinformatics, Computational Biology,
  Genomics and Chemoinformatics (BCBGC-10)
* International Conference on Computer Communications and Networks (CCN-10)
* International Conference on Enterprise Information Systems and Web
  Technologies (EISWT-10)
* International Conference on High Performance Computing Systems (HPCS-10)
* International Conference on Information Security and Privacy (ISP-10)
* International Conference on Image and Video Processing and Computer
  Vision (IVPCV-10)
* International Conference on Software Engineering Theory and
  Practice (SETP-10)
* International Conference on Theoretical and Mathematical Foundations of
  Computer Science (TMFCS-10)

MULTICONF-10 will be held at Imperial Swan Hotel and Suites.  It is a
full-service resort that puts you in the middle of the fun! Located 1/2 block
south of the famed International Drive, the hotel is just minutes from great
entertainment like Walt Disney World Resort, Universal Studios and Sea World
Orlando. Guests can enjoy free scheduled transportation to these theme parks,
as well as spacious accommodations, outdoor pools and on-site dining, all
situated on 10 tropically landscaped acres. Here, guests can experience a
full-service resort with discount hotel pricing in Orlando.

We invite draft paper submissions. Please see the website
http://www.PromoteResearch.org for more details.

Sincerely,
John Edward


From sheoran143 at gmail.com  Mon Mar  8 16:11:05 2010
From: sheoran143 at gmail.com (Deepak Sheoran)
Date: Mon, 08 Mar 2010 15:11:05 -0600
Subject: [Biojava-l] Bug fix for biojava class RichObjectFactory.java in
 current maven based project
Message-ID: <4B9567E9.7080909@gmail.com>

Hi,
I was building a local copy of the current Maven project on my machine so
that I can fix some reference-related bugs in BioJava. But when I built
the local version and tried to use it, I got an error on the method
RichObjectFactory.connectToBioSQL(Object session) in the current version of
biojava-live. When I had a look at it, I saw this comment on it:

     "// commenting out for the moment, since it prevents core from compiling.
     // TODO: move to BioSql module"

Then I uncommented the code and added these import statements to
RichObjectFactory.java, and the problem was fixed:

import org.biojavax.bio.db.biosql.BioSQLCrossReferenceResolver;
import org.biojavax.bio.db.biosql.BioSQLRichObjectBuilder;
import org.biojavax.bio.db.biosql.BioSQLRichSequenceHandler;

After this I tried compiling the BioSQL module and it went successfully, and
when I compiled the Core module it went successfully too. If this is the
only reason, then please uncomment these lines in the main svn version,
since I don't know how to do it.

Thanks
Deepak Sheoran


From andreas at sdsc.edu  Tue Mar  9 12:28:25 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 9 Mar 2010 09:28:25 -0800
Subject: [Biojava-l] Bug fix for biojava class RichObjectFactory.java in
	current maven based project
In-Reply-To: <4B9567E9.7080909@gmail.com>
References: <4B9567E9.7080909@gmail.com>
Message-ID: <59a41c431003090928w5f28c6e1ladd571aa283e93c1@mail.gmail.com>

Hi Deepak,

Thanks for spotting this. This factory method should clearly be moved to the
biosql module and not be part of the core.  Anybody who has a deeper
knowledge of the biosql code: Where is the best place in the biosql module
to move this to?

A workaround for the compile problem would be to use reflection to mask the
calls to the methods in the other module, but it feels like a hack...
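
The thread does not show any code for the reflection idea. As a rough
sketch only, a core class could look up a class from the other module by
name at run time, so that it needs no compile-time import. The helper name
below is hypothetical, java.lang.StringBuilder merely stands in for a
BioSQL class, and the assumption that the target has a public
one-argument constructor would have to be checked against the real code.

```java
import java.lang.reflect.Constructor;

/** Hypothetical helper: builds an object from an optional module without importing it. */
final class ReflectiveLookup {

    private ReflectiveLookup() {}

    /**
     * Calls the public constructor className(argType) with arg.
     * Returns null when the class is not on the classpath, i.e. when the
     * optional module is absent.
     */
    static Object construct(String className, Class<?> argType, Object arg) {
        try {
            Class<?> type = Class.forName(className);
            Constructor<?> ctor = type.getConstructor(argType);
            return ctor.newInstance(arg);
        } catch (ClassNotFoundException e) {
            return null; // optional module not present
        } catch (ReflectiveOperationException e) {
            // Class exists but the constructor is missing or failed.
            throw new IllegalStateException("Could not instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        // StringBuilder(String) stands in for a constructor in the other module.
        Object present = construct("java.lang.StringBuilder", String.class, "abc");
        Object missing = construct("org.example.NotOnClasspath", Object.class, "abc");
        System.out.println(present); // abc
        System.out.println(missing); // null
    }
}
```

The cost of this approach is that a wrong class name or constructor
signature only shows up at run time, which is presumably why it "feels
like a hack".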

Andreas


From sheoran143 at gmail.com  Tue Mar  9 15:10:00 2010
From: sheoran143 at gmail.com (Deepak Sheoran)
Date: Tue, 09 Mar 2010 14:10:00 -0600
Subject: [Biojava-l] Bug fix for biojava class RichObjectFactory.java in
 current maven based project
In-Reply-To: <59a41c431003090928w5f28c6e1ladd571aa283e93c1@mail.gmail.com>
References: <4B9567E9.7080909@gmail.com>
	<59a41c431003090928w5f28c6e1ladd571aa283e93c1@mail.gmail.com>
Message-ID: <4B96AB18.908@gmail.com>

Hi Andreas,
I guess it should go in the "org.biojavax.bio.db.biosql" package; it makes
sense to put this class there.

Deepak Sheoran



From holland at eaglegenomics.com  Wed Mar 10 08:31:43 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 10 Mar 2010 21:31:43 +0800
Subject: [Biojava-l] Bug fix for biojava class RichObjectFactory.java in
	current maven based project
In-Reply-To: <4B96AB18.908@gmail.com>
References: <4B9567E9.7080909@gmail.com>
	<59a41c431003090928w5f28c6e1ladd571aa283e93c1@mail.gmail.com>
	<4B96AB18.908@gmail.com>
Message-ID: <CF54F815-7918-4E20-A305-543F5A46071D@eaglegenomics.com>

The problem is that the RichObjectFactory is generic, but the connectToBioSQL method is BioSQL specific. What really needs to happen is to abstract out the connectToBioSQL method _only_ into a more specific class in the biosql module, and have it use (or, if necessary, create) setters on RichObjectFactory.
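
A minimal sketch of this arrangement, using simplified stand-in types rather than the real BioJava classes: GenericFactory, ObjectBuilder and BioSqlConnector are hypothetical names, and the real RichObjectFactory setters may look different.

```java
import java.util.Objects;

/** Sketch only: every type here is a simplified, hypothetical stand-in. */
final class FactorySetterSketch {

    /** Stand-in for one pluggable piece of the generic factory. */
    interface ObjectBuilder {
        String describe();
    }

    /** Stand-in for the generic core factory; it never mentions BioSQL. */
    static final class GenericFactory {
        private static ObjectBuilder builder = () -> "in-memory builder";

        private GenericFactory() {}

        /** Setter through which any module can install its own builder. */
        static void setBuilder(ObjectBuilder newBuilder) {
            builder = Objects.requireNonNull(newBuilder, "newBuilder");
        }

        static ObjectBuilder getBuilder() {
            return builder;
        }
    }

    /** Would live in the biosql module: the only class that knows about BioSQL. */
    static final class BioSqlConnector {
        private BioSqlConnector() {}

        static void connectToBioSQL(final Object session) {
            Objects.requireNonNull(session, "session");
            GenericFactory.setBuilder(() -> "BioSQL builder for session " + session);
        }
    }

    public static void main(String[] args) {
        System.out.println(GenericFactory.getBuilder().describe()); // in-memory builder
        BioSqlConnector.connectToBioSQL("session-1");
        System.out.println(GenericFactory.getBuilder().describe()); // BioSQL builder for session session-1
    }
}
```

With this split the dependency points from the biosql module to core only, so core compiles on its own.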



--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From mark.schreiber at novartis.com  Wed Mar 10 22:14:54 2010
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Thu, 11 Mar 2010 11:14:54 +0800
Subject: [Biojava-l] Bug fix for biojava class RichObjectFactory.java
 in	current maven based project
In-Reply-To: <CF54F815-7918-4E20-A305-543F5A46071D@eaglegenomics.com>
Message-ID: <OF2FF55EB1.D19103D2-ON482576E3.0011B573-482576E3.0011D831@ah.novartis.com>

Could a subclass of the RichObjectFactory exist in the BioSQL module? If
you want your RichObjects backed by BioSQL, you would use the
[BioSQL]RichObjectFactory from the BioSQL package.
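
A rough sketch of the subclass idea, again with hypothetical stand-in
classes rather than the real BioJava ones. It assumes an instance-based
factory; the call RichObjectFactory.connectToBioSql(...) quoted in this
thread suggests the real factory is used through static methods, in which
case subclassing would need more rework than shown here.

```java
import java.util.Objects;

/** Sketch only: GenericObjectFactory and BioSqlObjectFactory are hypothetical names. */
final class SubclassFactorySketch {

    /** Simplified stand-in for the generic factory in the core module. */
    static class GenericObjectFactory {
        /** Names the store that backs created objects; subclasses override this. */
        String backingStore() {
            return "memory";
        }

        final String create(String name) {
            return name + "@" + backingStore();
        }
    }

    /** Would live in the BioSQL module and carry the BioSQL-specific wiring. */
    static final class BioSqlObjectFactory extends GenericObjectFactory {
        private final Object session;

        BioSqlObjectFactory(Object session) {
            this.session = Objects.requireNonNull(session, "session");
        }

        @Override
        String backingStore() {
            return "biosql:" + session;
        }
    }

    public static void main(String[] args) {
        GenericObjectFactory plain = new GenericObjectFactory();
        GenericObjectFactory backed = new BioSqlObjectFactory("session-1");
        System.out.println(plain.create("taxon"));  // taxon@memory
        System.out.println(backed.create("taxon")); // taxon@biosql:session-1
    }
}
```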

- Mark



From holland at eaglegenomics.com  Thu Mar 11 11:10:15 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 12 Mar 2010 00:10:15 +0800
Subject: [Biojava-l] Bug fix for biojava class RichObjectFactory.java
	in	current maven based project
In-Reply-To: <OF2FF55EB1.D19103D2-ON482576E3.0011B573-482576E3.0011D831@ah.novartis.com>
References: <OF2FF55EB1.D19103D2-ON482576E3.0011B573-482576E3.0011D831@ah.novartis.com>
Message-ID: <4E92965B-F9EA-43B1-9235-4FA7BAC09308@eaglegenomics.com>

Could do.




From holland at eaglegenomics.com  Mon Mar 15 06:34:14 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 15 Mar 2010 10:34:14 +0000
Subject: [Biojava-l] Hackathon in Boston, July 2010
Message-ID: <5FC2D8EC-5408-4126-9A7D-CB6B3500B61C@eaglegenomics.com>

Hi all,

Following the successful hackathon in Cambridge earlier this year, it was originally planned to hold a second one in Boston in conjunction with BOSC in order to give those who couldn't make it to the UK a chance to get involved.

However, OBF have beaten us to it by organising a cross-project CodeFest!

 http://www.open-bio.org/wiki/Codefest_2010

It would be great for BioJava people to get involved with this cross-project hackathon effort, and it saves organising one of our own! :)

All relevant info is on the web page linked to above, and if you have any questions, ask Brad as detailed on the page.

cheers,
Richard



From xuejiachen at gmail.com  Mon Mar 15 19:09:50 2010
From: xuejiachen at gmail.com (Jiachen Xue)
Date: Mon, 15 Mar 2010 19:09:50 -0400
Subject: [Biojava-l] question about BLAST output parsing
Message-ID: <1227e7b91003151609l1e4dcaaaga9733ca4c2704c70@mail.gmail.com>

Hi,

Thanks in advance for your help.

For the following piece of text from a BLAST output, how can I get
the "Identities", "Positives" and "Gaps" fields, as well as the alignment
information, such as "DK V L+D  +  G +S + +++ +E GA+K+ L +  AAPE" and
the subject string?

>sp|B9KAQ6.1|UPP_THENN RecName: Full=Uracil phosphoribosyltransferase; AltName: Full=UMP
           pyrophosphorylase; AltName: Full=UPRTase
          Length = 209

 Score = 32.0 bits (71), Expect = 9.7,   Method: Compositional matrix adjust.
 Identities = 16/42 (38%), Positives = 27/42 (64%), Gaps = 2/42 (4%)

Query: 360 DKNVLLVDDSIVRGTTSEQIIEMAREAGAKKVYLAS--AAPE 399
           DK V L+D  +  G +S + +++ +E GA+K+ L +  AAPE
Sbjct: 124 DKEVFLLDPMLATGVSSVKALDILKENGARKITLVALIAAPE 165
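
For the plain-text report quoted above, a minimal sketch using only
java.util.regex (not the BioJava BLAST parser) could pull out the three
counts, the sequences and the match line. The class and method names are
made up for illustration, and the sketch assumes that the match line
occupies the same columns as the query sequence, as it does in the excerpt.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of BioJava: reads one HSP from plain-text BLAST output. */
final class BlastTextSketch {

    // Matches "Identities = 16/42", "Positives = 27/42" or "Gaps = 2/42".
    private static final Pattern STAT =
            Pattern.compile("(Identities|Positives|Gaps)\\s*=\\s*(\\d+)/(\\d+)");

    // Matches rows such as "Query: 360 DKNV... 399" and "Sbjct: 124 DKEV... 165".
    private static final Pattern ROW =
            Pattern.compile("^(?:Query|Sbjct):?\\s*(\\d+)\\s+(\\S+)\\s+(\\d+)\\s*$");

    private BlastTextSketch() {}

    /** Returns {count, alignmentLength} for the label, or null if the label is absent. */
    static int[] stat(String line, String label) {
        Matcher m = STAT.matcher(line);
        while (m.find()) {
            if (m.group(1).equals(label)) {
                return new int[] {Integer.parseInt(m.group(2)), Integer.parseInt(m.group(3))};
            }
        }
        return null;
    }

    /** Returns the sequence field of a "Query:" or "Sbjct:" row. */
    static String sequence(String row) {
        Matcher m = ROW.matcher(row);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not an alignment row: " + row);
        }
        return m.group(2);
    }

    /** Cuts the match line using the columns occupied by the query sequence. */
    static String matchLine(String queryRow, String matchRow) {
        Matcher m = ROW.matcher(queryRow);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not an alignment row: " + queryRow);
        }
        StringBuilder padded = new StringBuilder(matchRow);
        while (padded.length() < m.end(2)) {
            padded.append(' '); // trailing blanks may have been trimmed
        }
        return padded.substring(m.start(2), m.end(2));
    }

    public static void main(String[] args) {
        String stats = " Identities = 16/42 (38%), Positives = 27/42 (64%), Gaps = 2/42 (4%)";
        String query = "Query: 360 DKNVLLVDDSIVRGTTSEQIIEMAREAGAKKVYLAS--AAPE 399";
        String match = "           DK V L+D  +  G +S + +++ +E GA+K+ L +  AAPE";
        String sbjct = "Sbjct: 124 DKEVFLLDPMLATGVSSVKALDILKENGARKITLVALIAAPE 165";

        System.out.println(stat(stats, "Identities")[0]); // 16
        System.out.println(stat(stats, "Positives")[0]);  // 27
        System.out.println(stat(stats, "Gaps")[0]);       // 2
        System.out.println(matchLine(query, match));
        System.out.println(sequence(sbjct));
    }
}
```

Alignments that wrap over several Query/Sbjct blocks would need the
pieces of each row concatenated, which this sketch does not attempt.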

From anjolou at hotmail.com  Tue Mar 16 05:20:35 2010
From: anjolou at hotmail.com (Louise Ott)
Date: Tue, 16 Mar 2010 10:20:35 +0100
Subject: [Biojava-l] question about BLAST output parsing
In-Reply-To: <1227e7b91003151609l1e4dcaaaga9733ca4c2704c70@mail.gmail.com>
References: <1227e7b91003151609l1e4dcaaaga9733ca4c2704c70@mail.gmail.com>
Message-ID: <BAY110-W414E5F60A9FCDA61710DF9B32D0@phx.gbl>


Hello,
I tried to use the BioJava BLAST parser myself, but I didn't find a way to get this information back. If your BLAST result can be produced as XML, you should try using JAXB to parse it (this is what I used). There is already some code for marshalling/unmarshalling in the biojava3 project. Here is the link, although it seems to be dead right now:
http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava-live/branches/biojava3
http://www.biojava.org/wiki/BioJava3_project
Have a nice day,
Louise



From anjolou at hotmail.com  Tue Mar 16 05:23:37 2010
From: anjolou at hotmail.com (Louise Ott)
Date: Tue, 16 Mar 2010 10:23:37 +0100
Subject: [Biojava-l] question about BLAST output parsing
In-Reply-To: <1227e7b91003151609l1e4dcaaaga9733ca4c2704c70@mail.gmail.com>
References: <1227e7b91003151609l1e4dcaaaga9733ca4c2704c70@mail.gmail.com>
Message-ID: <BAY110-W425A46DCB7DB1DB2110609B32D0@phx.gbl>


Sorry, I forgot: there is an example of using the BLAST parser here:
http://biojava.org/wiki/BioJava:CookBook:Blast:Parser
It should be enough for what you want to do.



From andreas at sdsc.edu  Tue Mar 16 11:19:45 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 16 Mar 2010 08:19:45 -0700
Subject: [Biojava-l] question about BLAST output parsing
In-Reply-To: <BAY110-W414E5F60A9FCDA61710DF9B32D0@phx.gbl>
References: <1227e7b91003151609l1e4dcaaaga9733ca4c2704c70@mail.gmail.com>
	<BAY110-W414E5F60A9FCDA61710DF9B32D0@phx.gbl>
Message-ID: <59a41c431003160819p4a7031c9jfc7c16454df5210@mail.gmail.com>

Yes, the BioJava BLAST parser has not been maintained in quite a while.
Parsing the XML output of BLAST is probably the thing to do nowadays. About
BioJava3: the wiki documentation is a bit behind; the code is now in the
main biojava-trunk and development has been quite active over the last
few months.
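
As a rough illustration of the XML route, the sketch below reads one HSP
with the DOM parser that ships with the JDK (not JAXB and not the BioJava
code mentioned in this thread). The element names follow the NCBI BLAST
XML output format; the embedded fragment is hand-written from the values
in the original question and is heavily trimmed, and the class name is
hypothetical. Real BLAST XML files also carry a DOCTYPE that points at an
NCBI DTD, which a parser may try to fetch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical helper: reads HSP fields from (a fragment of) BLAST XML output. */
final class BlastXmlSketch {

    /** Hand-written, trimmed fragment; a real report nests Hsp much deeper. */
    static final String SAMPLE =
            "<Hsp>"
            + "<Hsp_identity>16</Hsp_identity>"
            + "<Hsp_positive>27</Hsp_positive>"
            + "<Hsp_gaps>2</Hsp_gaps>"
            + "<Hsp_align-len>42</Hsp_align-len>"
            + "<Hsp_qseq>DKNVLLVDDSIVRGTTSEQIIEMAREAGAKKVYLAS--AAPE</Hsp_qseq>"
            + "<Hsp_hseq>DKEVFLLDPMLATGVSSVKALDILKENGARKITLVALIAAPE</Hsp_hseq>"
            + "<Hsp_midline>DK V L+D  +  G +S + +++ +E GA+K+ L +  AAPE</Hsp_midline>"
            + "</Hsp>";

    private BlastXmlSketch() {}

    /** Parses the XML and returns its first Hsp element. */
    static Element firstHsp(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList hsps = doc.getElementsByTagName("Hsp");
            if (hsps.getLength() == 0) {
                throw new IllegalArgumentException("No <Hsp> element found");
            }
            return (Element) hsps.item(0);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse BLAST XML", e);
        }
    }

    /** Text of the first descendant element with the given tag name. */
    static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        if (nodes.getLength() == 0) {
            throw new IllegalArgumentException("Missing <" + tag + ">");
        }
        return nodes.item(0).getTextContent();
    }

    public static void main(String[] args) {
        Element hsp = firstHsp(SAMPLE);
        System.out.println("Identities: " + text(hsp, "Hsp_identity") + "/" + text(hsp, "Hsp_align-len"));
        System.out.println("Positives:  " + text(hsp, "Hsp_positive"));
        System.out.println("Gaps:       " + text(hsp, "Hsp_gaps"));
        System.out.println("Midline:    " + text(hsp, "Hsp_midline"));
        System.out.println("Subject:    " + text(hsp, "Hsp_hseq"));
    }
}
```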

Andreas



From hlapp at drycafe.net  Tue Mar 16 16:03:50 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Tue, 16 Mar 2010 16:03:50 -0400
Subject: [Biojava-l] [OT] Job opportunity: Training coordinator and
	Bioinformatics Project Manager
Message-ID: <0CDDCED9-266E-4CCE-8240-D7E2C8522784@drycafe.net>

Hi all -

first off, sorry for the cross-posting, we're trying to advertise this  
as widely as possible. Second, apologies if this is committing an  
offense and considered spam. I thought though that there might be some  
people around here who may be interested and suitable.

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:- Durham, NC -:- informatics.nescent.org :
===========================================================

A unique position is available for a training coordinator and
bioinformatics project manager at the U.S. National Evolutionary
Synthesis Center in Durham, North Carolina (NESCent,
http://nescent.org).  NESCent is a National Science Foundation funded
research center managed by Duke University, the University of North
Carolina at Chapel Hill and North Carolina State University on behalf
of the international evolutionary biology community.  NESCent
facilitates synthetic research by bringing together diverse expertise,
data, tools and concepts (Sidlauskas et al. 2009).  In addition to a
resident population of 20-30 scientists, the Center hosts over 800
visitors a year.  An informatics staff is on-site to support resident
and visiting scientists' needs in high-performance computing,
electronic collaboration, scientific software and databases; this
includes custom software development for a limited number of
high-impact projects.  NESCent's informatics training program includes
a rotating series of open-application summer courses, ad-hoc short
courses for resident scientists, and remote internships (including
past participation in the Google Summer of Code).

The training coordinator and bioinformatics project manager will
provide oversight to the Center's training activities. The incumbent
will also serve as the interface between scientists and software
developers at NESCent. The position provides extensive opportunities
for collaboration and intellectual engagement with both
NESCent-sponsored scientists and informatics staff; however, this is
not an independent research position. The incumbent will report to the
Director, while overseeing the work of a small informatics team and
coordinating activities among the Center's science, education and
informatics programs.


Responsibilities:

	* 50% - Consult with sponsored scientists (including scientists in
residence and working group participants) about informatics resources
and needs. Manage software product development by gathering
requirements from scientists, participating in conceptual design,
monitoring implementation progress and product quality, facilitating
communication between software developers and scientists, and
researching software solutions.

	* 25% - Oversee NESCent's course curriculum by identifying
opportunities for onsite or online informatics courses that satisfy
demand for advanced training of resident and visiting scientists,
recruiting instructors, providing guidance to instructors in
developing course syllabi, coordinating logistical and technical
support requirements, conducting assessments, and serving as a liaison
to course organizers at other institutions.

	* 25% - Assist in the management of NESCent's summer informatics
intern program by coordinating the recruitment, application & review
process for students, communicating expectations to students and
mentors, monitoring student progress, documenting student outcomes,
and performing assessments.


Education:

Required: M.S. in Biology, Bioinformatics, or a related field.
Preferred: Ph.D. and two years postdoctoral experience in evolutionary  
biology, or an equivalent combination of relevant education and/or  
experience.


Experience:

Required: Excellent communication, interpersonal, and organizational  
skills.  Experience with computationally oriented scientific research.
Preferred: At least two years in development of databases and open  
source software.   Organization, coordination, development and  
delivery of courses and workshops appropriate for graduate-level  
participants.


Terms of Employment:

Salary will be competitive and commensurate with experience.  As a  
full-time employee, the incumbent will receive Duke University's
benefits package (http://hr.duke.edu/benefits/main.html). The position  
is available immediately and will remain open until filled.  The  
position is currently funded through November 2014, contingent on  
annual renewal of the Center by the NSF.


How to Apply:

Please send a C.V., including contact information for three  
references, and a brief statement of interest to Allen Rodrigo,  
Director, NESCent, at a.rodrigo at nescent.org. Inquiries about  
suitability for the position are welcome.  Duke University is an Equal  
Opportunity/Affirmative Action employer.  Additional information about  
NESCent: http://www.nescent.org


References:

Sidlauskas B, Ganapathy G, Hazkani-Covo E, Jenkins KP, Lapp H, McCall  
LW, Price S, Scherle R, Spaeth PA, Kidd DM (2009) Linking Big: The  
Continuing Promise of Evolutionary Synthesis. Evolution.
http://dx.doi.org/10.1111/j.1558-5646.2009.00892.x


From markjschreiber at gmail.com  Tue Mar 16 21:14:51 2010
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 17 Mar 2010 09:14:51 +0800
Subject: [Biojava-l] question about BLAST output parsing
In-Reply-To: <59a41c431003160819p4a7031c9jfc7c16454df5210@mail.gmail.com>
References: <1227e7b91003151609l1e4dcaaaga9733ca4c2704c70@mail.gmail.com> 
	<BAY110-W414E5F60A9FCDA61710DF9B32D0@phx.gbl>
	<59a41c431003160819p4a7031c9jfc7c16454df5210@mail.gmail.com>
Message-ID: <93b45ca51003161814y7196e3e8i8e329b79e612cf50@mail.gmail.com>

I generally don't recommend parsing the standard BLAST output, as it keeps
changing subtly. Best to parse one of the tabular formats or the XML
output.
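
A minimal sketch of the XML route, in case it helps. It pulls the
identities, positives, gaps and the three alignment rows out of each
<Hsp> element using only the JDK's DOM parser (JAXB, as Louise suggested,
would work too). The element names (Hsp_identity, Hsp_positive, Hsp_gaps,
Hsp_align-len, Hsp_qseq, Hsp_midline, Hsp_hseq) are the ones NCBI's BLAST
XML output uses; the class name BlastXmlHsps and the heavily trimmed
SAMPLE document are made up for illustration, with the numbers copied
from the alignment quoted in this thread.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class BlastXmlHsps {

    // Hypothetical, heavily trimmed BLAST XML holding a single HSP.
    public static final String SAMPLE =
        "<BlastOutput><BlastOutput_iterations><Iteration><Iteration_hits>"
        + "<Hit><Hit_hsps><Hsp>"
        + "<Hsp_identity>16</Hsp_identity>"
        + "<Hsp_positive>27</Hsp_positive>"
        + "<Hsp_gaps>2</Hsp_gaps>"
        + "<Hsp_align-len>42</Hsp_align-len>"
        + "<Hsp_qseq>DKNVLLVDDSIVRGTTSEQIIEMAREAGAKKVYLAS--AAPE</Hsp_qseq>"
        + "<Hsp_hseq>DKEVFLLDPMLATGVSSVKALDILKENGARKITLVALIAAPE</Hsp_hseq>"
        + "<Hsp_midline>DK V L+D  +  G +S + +++ +E GA+K+ L +  AAPE</Hsp_midline>"
        + "</Hsp></Hit_hsps></Hit>"
        + "</Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>";

    // Text of the first descendant element called tag, or fallback if absent.
    static String text(Element parent, String tag, String fallback) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? fallback : nodes.item(0).getTextContent();
    }

    // Four lines per HSP: the counts, then the query, midline and subject rows.
    public static List<String> summarize(String xml) {
        List<String> lines = new ArrayList<String>();
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList hsps = doc.getElementsByTagName("Hsp");
            for (int i = 0; i < hsps.getLength(); i++) {
                Element hsp = (Element) hsps.item(i);
                String len = text(hsp, "Hsp_align-len", "?").trim();
                lines.add("identities=" + text(hsp, "Hsp_identity", "0").trim() + "/" + len
                        + " positives=" + text(hsp, "Hsp_positive", "0").trim() + "/" + len
                        + " gaps=" + text(hsp, "Hsp_gaps", "0").trim() + "/" + len);
                // Rows are not trimmed: leading blanks in the midline matter.
                lines.add("Query: " + text(hsp, "Hsp_qseq", ""));
                lines.add("       " + text(hsp, "Hsp_midline", ""));
                lines.add("Sbjct: " + text(hsp, "Hsp_hseq", ""));
            }
        } catch (Exception e) {
            throw new IllegalStateException("could not parse BLAST XML", e);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : summarize(SAMPLE)) {
            System.out.println(line);
        }
    }
}
```

Real BLAST XML files start with a DOCTYPE that points at NCBI's DTD, so
the parser may try to fetch it; the sample above leaves the DOCTYPE out
to stay self-contained.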

- Mark

On Tue, Mar 16, 2010 at 11:19 PM, Andreas Prlic <andreas at sdsc.edu> wrote:

> Yea, the BioJava Blast parser has not been maintained in quite a while.
> Probably parsing the XML output of Blast is the thing to do nowadays. About
> Biojava3: the wiki documentation is a bit behind, the code is now in the
> main biojava-trunk and development has been quite active over the last
> months.
>
> Andreas
>
> On Tue, Mar 16, 2010 at 2:20 AM, Louise Ott <anjolou at hotmail.com> wrote:
>
> >
> >
> > Hello,
> > I tried to use the biojava blast parser myself but i didn't find a way to
> > get back these informations.If your blast result can be in xml, you
> should
> > try to use jaxb to parse it (this is what i used).There are already some
> > code for marshall/unmarshall in the biojava3 project.I give you the link,
> > but it seems to be dead right now :
> >
> >
> http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava-live/branches/biojava3
> > http://www.biojava.org/wiki/BioJava3_project
> > Have a nice day,
> > Louise
> >
> >
> > > Date: Mon, 15 Mar 2010 19:09:50 -0400
> > > From: xuejiachen at gmail.com
> > > To: biojava-l at lists.open-bio.org
> > > Subject: [Biojava-l] question about BLAST output parsing
> > >
> > > Hi,
> > >
> > > Thanks advance for help.
> > >
> > > For the following piece of text appearing in a blast output. How can I
> > get
> > > the fields of "Identities", "Positives", "Gaps" as well as the
> alignment
> > > information, such as "DK V L+D  +  G +S + +++ +E GA+K+ L +  AAPE" and
> > > subject string?
> > >
> > > >sp|B9KAQ6.1|UPP_THENN RecName: Full=Uracil phosphoribosyltransferase;
> > > AltName: Full=UMP
> > >            pyrophosphorylase; AltName: Full=UPRTase
> > >           Length = 209
> > >
> > >  Score = 32.0 bits (71), Expect = 9.7,   Method: Compositional matrix
> > > adjust.
> > >  Identities = 16/42 (38%), Positives = 27/42 (64%), Gaps = 2/42 (4%)
> > >
> > > Query: 360 DKNVLLVDDSIVRGTTSEQIIEMAREAGAKKVYLAS--AAPE 399
> > >            DK V L+D  +  G +S + +++ +E GA+K+ L +  AAPE
> > > Sbjct: 124 DKEVFLLDPMLATGVSSVKALDILKENGARKITLVALIAAPE 165


From Richard.Finkers at wur.nl  Wed Mar 17 03:21:16 2010
From: Richard.Finkers at wur.nl (Richard Finkers)
Date: Wed, 17 Mar 2010 08:21:16 +0100
Subject: [Biojava-l] SVN repository
In-Reply-To: <mailman.3.1268755203.30058.biojava-l@lists.open-bio.org>
References: <mailman.3.1268755203.30058.biojava-l@lists.open-bio.org>
Message-ID: <4BA082EC.8010908@wur.nl>

Hi,

I would like to have a look at the BioJava 3 code (and perhaps
contribute to it in the future). However, I cannot access the SVN repository
(http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava-live/trunk).


Is the repository down?

Thanks,
Richard


From biopython at maubp.freeserve.co.uk  Wed Mar 17 06:16:45 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 17 Mar 2010 10:16:45 +0000
Subject: [Biojava-l] SVN repository
In-Reply-To: <4BA082EC.8010908@wur.nl>
References: <mailman.3.1268755203.30058.biojava-l@lists.open-bio.org>
	<4BA082EC.8010908@wur.nl>
Message-ID: <320fb6e01003170316x4fbf924do95dcc84703eaa28e@mail.gmail.com>

On Wed, Mar 17, 2010 at 7:21 AM, Richard Finkers <Richard.Finkers at wur.nl> wrote:
>
> Hi,
>
> I would like to have a look at the BioJava 3 code (and perhaps in the future
> contribute to). However, I cannot access the SVN repository
> (http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava-live/trunk).
>
> Is the repository down?
>
> Thanks,
> Richard

Probably :(

There have been problems discussed on the BioPerl mailing list
(they use the same servers), and the OBF team are aware of it:
http://lists.open-bio.org/pipermail/bioperl-l/2010-March/032534.html

The code.open-bio.org repositories are a read-only public mirror,
while dev.open-bio.org, the master repository, is fine I think
(but it is not available for anonymous download).

In the meantime BioPerl have also set up a read-only mirror
on github - perhaps BioJava could do the same? Meanwhile
BioRuby and Biopython are just using github (not SVN or CVS).

Peter

From andreas at sdsc.edu  Wed Mar 17 13:39:41 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 17 Mar 2010 10:39:41 -0700
Subject: [Biojava-l] SVN repository
In-Reply-To: <320fb6e01003170316x4fbf924do95dcc84703eaa28e@mail.gmail.com>
References: <mailman.3.1268755203.30058.biojava-l@lists.open-bio.org>
	<4BA082EC.8010908@wur.nl>
	<320fb6e01003170316x4fbf924do95dcc84703eaa28e@mail.gmail.com>
Message-ID: <59a41c431003171039h4ca1267bibc45b0d7d270b2a9@mail.gmail.com>

I have just heard back from the OBF-helpdesk. The VM hosting the anonymous
SVN is currently down. Depending on how big the problem turns out to be, it
should be back later today, or tomorrow at the latest.

Sorry for this inconvenience.
Andreas


On Wed, Mar 17, 2010 at 3:16 AM, Peter <biopython at maubp.freeserve.co.uk>wrote:

> On Wed, Mar 17, 2010 at 7:21 AM, Richard Finkers <Richard.Finkers at wur.nl>
> wrote:
> >
> > Hi,
> >
> > I would like to have a look at the BioJava 3 code (and perhaps in the
> future
> > contribute to). However, I cannot access the SVN repository
> > (
> http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava-live/trunk
> ).
> >
> > Is the repository down?
> >
> > Thanks,
> > Richard
>
> Probably :(
>
> There have been problems discussed on the BioPerl mailing list
> (they use the same servers), and the OBF team are aware of it:
> http://lists.open-bio.org/pipermail/bioperl-l/2010-March/032534.html
>
> The code.open-bio.org repositories are a read only public mirror,
> while dev.open-bio.org is the master repository I think is fine
> (but not available for anonymous download).
>
> In the mean time BioPerl have also setup a read only mirror
> on github - perhaps BioJava could do the same? Meanwhile
> BioRuby and Biopython are just using github (not SVN or CVS).
>
> Peter
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

From andreas at sdsc.edu  Thu Mar 18 16:36:38 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 18 Mar 2010 13:36:38 -0700
Subject: [Biojava-l] Google summer of code
Message-ID: <59a41c431003181336i33d388aak4b5a26e11ee4161b@mail.gmail.com>

Hi,

It seems our (the Open Bioinformatics Foundation's) Google Summer of Code
application has been accepted.
http://socghop.appspot.com/gsoc/program/accepted_orgs/google/gsoc2010

As such we are now looking for an interested and skilled student to work on
the BioJava multiple sequence alignment project. Take a look at the project
description, and if you think you are up for the challenge, send me an email
with your application.

http://biojava.org/wiki/Google_Summer_of_Code

Andreas

From shakunb at uom.ac.mu  Fri Mar 19 06:50:40 2010
From: shakunb at uom.ac.mu (Shakuntala baichoo)
Date: Fri, 19 Mar 2010 14:50:40 +0400
Subject: [Biojava-l] Biojava-l Digest, Vol 86, Issue 9
In-Reply-To: <mailman.7.1268928002.8593.biojava-l@lists.open-bio.org>
References: <mailman.7.1268928002.8593.biojava-l@lists.open-bio.org>
Message-ID: <3baec70c1003190350h7e73c1e7q726b823b5e3eae82@mail.gmail.com>

Hi!
I would like to know the interpretation of the scores after running the
needleman-wunsch algorithm using the NUCC44.txt substitution matrix.
Actually I have taken the named genes from a bacteria EMBL file and I am
trying to compare each gene to the other genes in the lot, using the
needleman-wunsch algorithm based on the NUCC44.txt substitution matrix. I
would like to determine the % match for each pair but since I get mostly -ve
and some positive values, I would like to know how to calculate the % match
for a pair of genes.
I would be grateful if anybody could help me.

Thanks.
Shakuntala


-- 
Best Regards

Dr. (Mrs.) S.Baichoo
Senior Lecturer
CSE Dept, FoE
University of Mauritius

From andreas at sdsc.edu  Fri Mar 19 13:42:44 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 19 Mar 2010 10:42:44 -0700
Subject: [Biojava-l] Biojava-l Digest, Vol 86, Issue 9
In-Reply-To: <3baec70c1003190350h7e73c1e7q726b823b5e3eae82@mail.gmail.com>
References: <mailman.7.1268928002.8593.biojava-l@lists.open-bio.org>
	<3baec70c1003190350h7e73c1e7q726b823b5e3eae82@mail.gmail.com>
Message-ID: <59a41c431003191042l406bdba1g501290b96c018ed0@mail.gmail.com>

sorry, can you clarify: what do you mean with you "get mostly -ve" ?

Andreas

On Fri, Mar 19, 2010 at 3:50 AM, Shakuntala baichoo <shakunb at uom.ac.mu>wrote:

> Hi!
> I would like to know the interpretation of the scores after running the
> needleman-wunsch algorithm using the NUCC44.txt substitution matrix.
> Actually I have taken the named genes from a bacteria EMBL file and I am
> trying to compare each gene to the other genes in the lot, using the
> needleman-wunsch algorithm based on the NUCC44.txt substitution matrix. I
> would like to determine the % match for each pair but since I get mostly
> -ve
> and some positive values, I would like to know how to calculate the % match
> for a pair of genes.
> I would be grateful if anybody could help me.
>
> Thanks.
> Shakuntala
>
> --
> Best Regards
>
> Dr. (Mrs.) S.Baichoo
> Senior Lecturer
> CSE Dept, FoE
> University of Mauritius
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

From mitlox at op.pl  Sat Mar 20 06:17:17 2010
From: mitlox at op.pl (xyz)
Date: Sat, 20 Mar 2010 20:17:17 +1000
Subject: [Biojava-l] sort fasta file
Message-ID: <20100320201718.4420a9b9@wp01>

Hello,
I would like to sort a multi-FASTA file by sequence length,
i.e. from the read with the longest sequence to the read with the
shortest sequence.

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import org.biojava.bio.BioException;

import org.biojavax.SimpleNamespace;
import org.biojavax.bio.seq.RichSequence;
import org.biojavax.bio.seq.RichSequenceIterator;

public class SortFasta {

  public static void main(String[] args)
      throws FileNotFoundException, BioException {

    BufferedReader br =
        new BufferedReader(new FileReader("sortfasta.fasta"));
    SimpleNamespace ns = new SimpleNamespace("biojava");

    RichSequenceIterator rsi =
        RichSequence.IOTools.readFasta(br, null, ns);

    while (rsi.hasNext()) {
      RichSequence rs = rsi.nextRichSequence();
      System.out.println(rs.getName());
      System.out.println(rs.seqString());
    }
  }
}

I have tried to do it, but I do not know how to continue.

Thank you in advance.

Best regards,

From jswetnam at gmail.com  Sun Mar 21 16:56:35 2010
From: jswetnam at gmail.com (James Swetnam)
Date: Sun, 21 Mar 2010 16:56:35 -0400
Subject: [Biojava-l] sort fasta file
In-Reply-To: <20100320201718.4420a9b9@wp01>
References: <20100320201718.4420a9b9@wp01>
Message-ID: <ccc8042e1003211356t1ea534b6gf6d020983977881e@mail.gmail.com>

Just hacked this together, warning: I am new to both java and biojava.

import java.io.*;
import java.util.*;

import org.biojava.bio.BioException;
import org.biojava.bio.symbol.*;
import org.biojavax.SimpleNamespace;
import org.biojavax.bio.seq.*;

public class SortFasta {

    // Orders sequences from longest to shortest, as asked.
    private static class LengthComparator implements Comparator<RichSequence> {
        public int compare(RichSequence seq1, RichSequence seq2) {
            return seq2.length() - seq1.length();
        }
    }

    // Usage:  SortFasta unsortedFile.fasta
    public static void main(String[] args) throws FileNotFoundException,
                                                   BioException {

        String fastaFile = args[0];

        BufferedReader br = new BufferedReader(new FileReader(fastaFile));
        SimpleNamespace ns = new SimpleNamespace("biojava");

        // Use "DNA" instead of "PROTEIN" for nucleotide reads.
        Alphabet protein = AlphabetManager.alphabetForName("PROTEIN");

        RichSequenceIterator rsi = RichSequence.IOTools.readFasta(br,
                                      protein.getTokenization("token"),
                                      ns);

        // A List rather than a TreeSet: a TreeSet ordered only by length
        // would silently drop sequences whose lengths are equal.
        List<RichSequence> sorted = new ArrayList<RichSequence>();
        while (rsi.hasNext()) {
            sorted.add(rsi.nextRichSequence());
        }
        Collections.sort(sorted, new LengthComparator());

        // Do whatever you want here with the RichSequences, now ordered by
        // descending length. I'll just print the lengths.
        for (RichSequence seq : sorted) {
            System.out.println(seq.length());
        }
    }
}
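
For comparison, here is a rough sketch of the same idea without BioJava,
which also writes the records back out in FASTA form rather than only
printing lengths. It assumes plain FASTA text (a ">" header line followed
by one or more sequence lines); the class name PlainFastaSort and the toy
reads r1..r3 are made up for illustration. Records of equal length are
all kept, in their input order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class PlainFastaSort {

    // Parses FASTA text into {header, sequence} pairs.
    static List<String[]> parse(String fasta) {
        List<String[]> records = new ArrayList<String[]>();
        String header = null;
        StringBuilder seq = new StringBuilder();
        for (String line : fasta.split("\\r?\\n")) {
            line = line.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith(">")) {
                if (header != null) {
                    records.add(new String[] {header, seq.toString()});
                }
                header = line.substring(1);
                seq.setLength(0);
            } else if (header != null) {
                seq.append(line);   // sequences may span several lines
            }
        }
        if (header != null) {
            records.add(new String[] {header, seq.toString()});
        }
        return records;
    }

    // Returns the records as FASTA text, longest sequence first.
    public static String sortByLengthDescending(String fasta) {
        List<String[]> records = parse(fasta);
        // Collections.sort is stable, so ties keep their input order.
        Collections.sort(records, new Comparator<String[]>() {
            public int compare(String[] a, String[] b) {
                return b[1].length() - a[1].length();
            }
        });
        StringBuilder out = new StringBuilder();
        for (String[] r : records) {
            out.append('>').append(r[0]).append('\n').append(r[1]).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String fasta = ">r1\nACGT\n>r2\nACGTACGT\nAC\n>r3\nTTTT\n";
        System.out.print(sortByLengthDescending(fasta));
    }
}
```

Each sequence is written on a single line here; wrapping the output at a
fixed width is left out to keep the sketch short.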

On Sat, Mar 20, 2010 at 6:17 AM, xyz <mitlox at op.pl> wrote:

> Hello,
> I would like to sort multiple fasta file depends on the sequence length,
> ie. from the read with longest sequence to the read with the shortest
> sequence.
>
> import java.io.BufferedReader;
> import java.io.FileNotFoundException;
> import java.io.FileReader;
> import org.biojava.bio.BioException;
>
> import org.biojavax.SimpleNamespace;
> import org.biojavax.bio.seq.RichSequence;
> import org.biojavax.bio.seq.RichSequenceIterator;
>
> public class SortFasta {
>
>  public static void main(String[] args) throws FileNotFoundException,
>  BioException {
>
>    BufferedReader br = new BufferedReader(new
>    FileReader("sortfasta.fasta")); SimpleNamespace ns = new
>    SimpleNamespace("biojava");
>
>    RichSequenceIterator rsi = RichSequence.IOTools.readFasta(br, null,
>    ns);
>
>    while (rsi.hasNext()) {
>      RichSequence rs = rsi.nextRichSequence();
>      System.out.println(rs.getName());
>      System.out.println(rs.seqString());
>    }
>  }
> }
>
> I have tried to do it, but I do not how to continue.
>
> Thank you in advance.
>
> Best regards,
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

From andreas at sdsc.edu  Mon Mar 22 19:46:26 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 22 Mar 2010 16:46:26 -0700
Subject: [Biojava-l] Biojava-l Digest, Vol 86, Issue 9
In-Reply-To: <3baec70c1003210007t39f22261j9838b3eb5b9ff861@mail.gmail.com>
References: <mailman.7.1268928002.8593.biojava-l@lists.open-bio.org>
	<3baec70c1003190350h7e73c1e7q726b823b5e3eae82@mail.gmail.com>
	<59a41c431003191042l406bdba1g501290b96c018ed0@mail.gmail.com>
	<3baec70c1003210007t39f22261j9838b3eb5b9ff861@mail.gmail.com>
Message-ID: <59a41c431003221646u2b1da091uc81d601dbf412599@mail.gmail.com>

Hi Shakuntala,

at present the NeedlemanWunsch implementation does not make it totally
straightforward to access the %id. You could try parsing the result of the
getAlignmentString() call and accessing the information from there ...
Making the underlying data more accessible is on the TODO list for this
module: http://biojava.org/wiki/BioJava:Modules
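
As a rough sketch of that last step: once the two gapped rows have been
extracted from the alignment string, the %id is the number of columns
holding the same residue in both rows, divided by the alignment length.
The raw alignment score, by contrast, is in general a sum of substitution
scores and gap penalties, so negative values are normal for dissimilar
sequences and the score is not a percentage. The class name
PercentIdentity is made up, "-" is assumed to be the gap character, and
no particular layout of getAlignmentString() is assumed here. The example
rows are the query/subject pair from the BLAST thread on this list, for
which BLAST reported 16/42 identities.

```java
public class PercentIdentity {

    // Columns where both rows carry the same residue (gaps never count).
    public static int identities(String row1, String row2) {
        if (row1.length() != row2.length()) {
            throw new IllegalArgumentException("aligned rows must have equal length");
        }
        int same = 0;
        for (int i = 0; i < row1.length(); i++) {
            char a = Character.toUpperCase(row1.charAt(i));
            char b = Character.toUpperCase(row2.charAt(i));
            if (a != '-' && a == b) {
                same++;
            }
        }
        return same;
    }

    // Identities as a percentage of the alignment length (gap columns included).
    public static double percentIdentity(String row1, String row2) {
        int same = identities(row1, row2);
        return row1.isEmpty() ? 0.0 : 100.0 * same / row1.length();
    }

    public static void main(String[] args) {
        String query = "DKNVLLVDDSIVRGTTSEQIIEMAREAGAKKVYLAS--AAPE";
        String sbjct = "DKEVFLLDPMLATGVSSVKALDILKENGARKITLVALIAAPE";
        System.out.println(identities(query, sbjct) + "/" + query.length());
        System.out.println(percentIdentity(query, sbjct));
    }
}
```

Other conventions divide by the length of the shorter sequence instead of
the alignment length, so it is worth stating which one is used when
reporting the numbers.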

Andreas

2010/3/21 Shakuntala baichoo <shakunb at uom.ac.mu>

> Hi Andreas!
> The problem is as follows. We have a bacteria file. There are about 565
> named genes/features there. We wish to compare each gene with the other 564
> genes. I am using needleman-wunsch from biojava to do so. For one specific
> run, I am attaching the result.
> The score after comparing Feature no. 0 with Feature no. 1 to Feature no.
> 564 is displayed (along with the product name etc...). If I wish to
> interpret these scores as a percentage homology, how do I do it?
>
> P.S. Most of the scores are -ve. Only one or a few is +ve.  The comparison
> is done using NUCC44.txt.
>
> Thanks
> Kind Regards
> Shakuntala
>
>
> On Fri, Mar 19, 2010 at 9:42 PM, Andreas Prlic <andreas at sdsc.edu> wrote:
>
>> sorry, can you clarify: what do you mean with you "get mostly -ve" ?
>>
>> Andreas
>>
>>
>> On Fri, Mar 19, 2010 at 3:50 AM, Shakuntala baichoo <shakunb at uom.ac.mu>wrote:
>>
>>> Hi!
>>> I would like to know the interpretation of the scores after running the
>>> needleman-wunsch algorithm using the NUCC44.txt substitution matrix.
>>> Actually I have taken the named genes from a bacteria EMBL file and I am
>>> trying to compare each gene to the other genes in the lot, using the
>>> needleman-wunsch algorithm based on the NUCC44.txt substitution matrix. I
>>> would like to determine the % match for each pair but since I get mostly
>>> -ve
>>> and some positive values, I would like to know how to calculate the %
>>> match
>>> for a pair of genes.
>>> I would be grateful if anybody could help me.
>>>
>>> Thanks.
>>> Shakuntala
>>>
>>> --
>>> Best Regards
>>>
>>> Dr. (Mrs.) S.Baichoo
>>> Senior Lecturer
>>> CSE Dept, FoE
>>> University of Mauritius

From zm19fitz at siena.edu  Mon Mar 22 16:36:14 2010
From: zm19fitz at siena.edu (Fitzsimmons, Zachary)
Date: Mon, 22 Mar 2010 16:36:14 -0400
Subject: [Biojava-l] (no subject)
Message-ID: <3898DEB8D4D8E34EB622AC53CEFFA2680173D9476385@mb-1.siena.edu>

Hi,

I am a sophomore at Siena College with a dual major in Computer Science and Mathematics, and I am writing to express my interest in developing for BioJava this summer through Google's Summer of Code program. Last summer I did research at my college on the Netflix Prize project with one of my computer science professors, and I am very interested in diversifying my work this summer.

I am currently taking an upper-level computer science course in bioinformatics, a field I have always considered for graduate study. In class I have learned about sequence alignment algorithms such as Needleman-Wunsch (global) and Smith-Waterman (local) for comparing protein and DNA sequences, and later we are going to study the HP folding problem in depth.

I am well versed in the Java programming language, having taken all of the Java courses at my college, and I am confident in my ability to contribute to the BioJava project. I consider the All-Java Multiple Sequence Alignment project described in your wiki article [http://biojava.org/wiki/Google_Summer_of_Code] to be within my abilities as an experienced Java programmer with past research experience and an interest in bioinformatics. Updating the BioJava code to be newly compliant and eventually implementing a Clustal algorithm for multiple sequence alignment is well within my grasp, especially once I have completed my college's bioinformatics course and studied BioJava's documentation.

I would appreciate your feedback on my proposal for working on this project. I hope to hear from you soon and to apply for the position through Google.

Sincerely,

Zack Fitzsimmons

From andreas at sdsc.edu  Tue Mar 23 20:33:09 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 23 Mar 2010 17:33:09 -0700
Subject: [Biojava-l] GSoC update
Message-ID: <59a41c431003231733t1e259753k55fbe0a8bfb801a3@mail.gmail.com>

Hi,

A quick update regarding the current status of our Google Summer of Code
project: several students have already expressed their interest. In fact the
response was so good that I believe BioJava should try to run more than just
one project. In the meantime we added another "mentor proposed" project to
our GSoC page, http://biojava.org/wiki/Google_Summer_of_Code : Identification
and Classification of Post-translational Modification of Proteins (develop a
post-translational modification package for the BioJava project).

In general Google strongly encourages student-proposed projects,
since historically those are often the most successful GSoC projects. We
recommend that students contact us or possible mentors before applying,
so that we can match students with suitable mentors and projects
and help solidify their project ideas. In principle any BioJava
contributor is suitable as a mentor. Students can apply between March 22nd
and April 9th via the Google web site: http://socghop.appspot.com/

Andreas

From andreas at sdsc.edu  Wed Mar 24 11:37:43 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 24 Mar 2010 08:37:43 -0700
Subject: [Biojava-l] Biojava-l Digest, Vol 86, Issue 9
In-Reply-To: <3baec70c1003240332n25b11e10i23c2c00c96a71a89@mail.gmail.com>
References: <mailman.7.1268928002.8593.biojava-l@lists.open-bio.org>
	<3baec70c1003190350h7e73c1e7q726b823b5e3eae82@mail.gmail.com>
	<59a41c431003191042l406bdba1g501290b96c018ed0@mail.gmail.com>
	<3baec70c1003210007t39f22261j9838b3eb5b9ff861@mail.gmail.com>
	<59a41c431003221646u2b1da091uc81d601dbf412599@mail.gmail.com>
	<3baec70c1003240332n25b11e10i23c2c00c96a71a89@mail.gmail.com>
Message-ID: <59a41c431003240837w3db713a3v3161b4e078faa483@mail.gmail.com>

Hi Shakuntala,

Whether the score is positive or negative depends only on the implementation
and representation... I think most people expect the score to be positive, so
the getAlignmentString method displays it as a positive value, while
internally it is a bit different...
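
One way to see how both signs can be right: a global alignment can be computed
either by maximizing a similarity score or by minimizing a cost, and the two
results differ only in sign. The sketch below is not BioJava's implementation;
the class name, the method names and the scoring values (+1 match, -1 mismatch,
-1 gap) are assumptions chosen for illustration.

```java
class AlignmentSign {

    // Similarity form: higher is better. Scoring values are illustrative only.
    static int maxScore(String a, String b, int match, int mismatch, int gap) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) d[i][0] = d[i - 1][0] + gap;
        for (int j = 1; j <= b.length(); j++) d[0][j] = d[0][j - 1] + gap;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int s = a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch;
                d[i][j] = Math.max(d[i - 1][j - 1] + s,
                          Math.max(d[i - 1][j] + gap, d[i][j - 1] + gap));
            }
        }
        return d[a.length()][b.length()];
    }

    // Cost form: lower is better. Every score is negated and max becomes min.
    static int minCost(String a, String b, int match, int mismatch, int gap) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) d[i][0] = d[i - 1][0] - gap;
        for (int j = 1; j <= b.length(); j++) d[0][j] = d[0][j - 1] - gap;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int c = a.charAt(i - 1) == b.charAt(j - 1) ? -match : -mismatch;
                d[i][j] = Math.min(d[i - 1][j - 1] + c,
                          Math.min(d[i - 1][j] - gap, d[i][j - 1] - gap));
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        String x = "GATTACA", y = "GCATGCA";
        // The two numbers printed are equal in magnitude and opposite in sign.
        System.out.println(maxScore(x, y, 1, -1, -1));
        System.out.println(minCost(x, y, 1, -1, -1));
    }
}
```

So a value such as -2706 from one call and 2706 from another would be
consistent with one of them reporting a cost and the other a score.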

Andreas

On Wed, Mar 24, 2010 at 3:32 AM, Shakuntala baichoo <shakunb at uom.ac.mu>wrote:

> Hello Andreas!
> Thanks for the quick reply.
> I tried the getAlignmentString. It provides a lot of information. However,
> I think there is a slight problem here. From the getAlignmentString call I
> see that the score after aligning a pair of dna strings is 2706.
> But when I view the return value from the method pairwiseAlignment (for the
> same set) then the score is -2706.  Why?
>
> Thanks
> Shakuntala
>
> *
> *
>
>
> On Tue, Mar 23, 2010 at 3:46 AM, Andreas Prlic <andreas at sdsc.edu> wrote:
>
>> Hi Shakuntala,
>>
>> At present the NeedlemanWunsch implementation does not make it totally
>> straightforward to access the %id. You could try parsing the result of the
>> getAlignmentString() call and accessing the information from there ...
>> Making the underlying data more accessible is on the TODO list for this
>> module: http://biojava.org/wiki/BioJava:Modules
>>
>> Andreas
>>
>> 2010/3/21 Shakuntala baichoo <shakunb at uom.ac.mu>
>>
>> Hi Andreas!
>>> The problem is as follows. We have a bacteria file. There are about 565
>>> named genes/features there. We wish to compare each gene with the other 564
>>> genes. I am using needleman-wunsch from biojava to do so. For one specific
>>> run, I am attaching the result.
>>> The score after comparing Feature no. 0 with Feature no. 1 to Feature no.
>>> 564 is displayed (along with the product name etc...). If I wish to
>>> interpret these scores as a percentage homology, how do I do it?
>>>
>>> P.S. Most of the scores are -ve. Only one or a few is +ve.  The
>>> comparison is done using NUCC44.txt.
>>>
>>> Thanks
>>> Kind Regards
>>> Shakuntala
>>>
>>>
>>> On Fri, Mar 19, 2010 at 9:42 PM, Andreas Prlic <andreas at sdsc.edu> wrote:
>>>
>>>> Sorry, can you clarify: what do you mean by "get mostly -ve"?
>>>>
>>>> Andreas
>>>>
>>>>
>>>> On Fri, Mar 19, 2010 at 3:50 AM, Shakuntala baichoo <shakunb at uom.ac.mu>wrote:
>>>>
>>>>> Hi!
>>>>> I would like to know the interpretation of the scores after running the
>>>>> needleman-wunsch algorithm using the NUCC44.txt substitution matrix.
>>>>> Actually I have taken the named genes from a bacteria EMBL file and I am
>>>>> trying to compare each gene to the other genes in the lot, using the
>>>>> needleman-wunsch algorithm based on the NUCC44.txt substitution matrix.
>>>>> I would like to determine the % match for each pair but since I get
>>>>> mostly -ve and some positive values, I would like to know how to
>>>>> calculate the % match for a pair of genes.
>>>>> I would be grateful if anybody could help me.
>>>>>
>>>>> Thanks.
>>>>> Shakuntala
>>>>>
>>>>> --
>>>>> Best Regards
>>>>>
>>>>> Dr. (Mrs.) S.Baichoo
>>>>> Senior Lecturer
>>>>> CSE Dept, FoE
>>>>> University of Mauritius

From jeedward at yahoo.com  Wed Mar 24 20:27:28 2010
From: jeedward at yahoo.com (John Edward)
Date: Wed, 24 Mar 2010 17:27:28 -0700 (PDT)
Subject: [Biojava-l] Call for papers (Deadline Extended): BCBGC-10, USA,
	July 2010
Message-ID: <852924.28793.qm@web45911.mail.sp1.yahoo.com>

It would be highly appreciated if you could share this announcement with your
colleagues, students and individuals whose research is in bioinformatics,
computational biology, genomics, data-mining, and related areas.

Call for papers (Deadline Extended): BCBGC-10, USA, July 2010

The 2010 International Conference on Bioinformatics, Computational Biology,
Genomics and Chemoinformatics (BCBGC-10) (website: http://www.PromoteResearch.org )
will be held during 12-14 of July 2010 in Orlando, FL, USA. BCBGC is an
important event in the areas of bioinformatics, computational biology,
genomics and chemoinformatics and focuses on all areas related to the
conference.

The conference will be held at the same time and location where several other
major international conferences will be taking place. The conference will be
held as part of 2010 multi-conference (MULTICONF-10). MULTICONF-10 will be held
during July 12-14, 2010 in Orlando, Florida, USA. The primary goal of MULTICONF
is to promote research and developmental activities in computer science,
information technology, control engineering, and related fields. Another goal
is to promote the dissemination of research to a multidisciplinary audience and
to facilitate communication among researchers, developers, practitioners in
different fields. The following conferences are planned to be organized as part
of MULTICONF-10.

* International Conference on Artificial Intelligence and Pattern Recognition (AIPR-10)
* International Conference on Automation, Robotics and Control Systems (ARCS-10)
* International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics (BCBGC-10)
* International Conference on Computer Communications and Networks (CCN-10)
* International Conference on Enterprise Information Systems and Web Technologies (EISWT-10)
* International Conference on High Performance Computing Systems (HPCS-10)
* International Conference on Information Security and Privacy (ISP-10)
* International Conference on Image and Video Processing and Computer Vision (IVPCV-10)
* International Conference on Software Engineering Theory and Practice (SETP-10)
* International Conference on Theoretical and Mathematical Foundations of Computer Science (TMFCS-10)

MULTICONF-10 will be held at Imperial Swan Hotel and Suites. It is a
full-service resort that puts you in the middle of the fun! Located 1/2 block
south of the famed International Drive, the hotel is just minutes from great
entertainment like Walt Disney World Resort, Universal Studios and Sea World
Orlando. Guests can enjoy free scheduled transportation to these theme parks,
as well as spacious accommodations, outdoor pools and on-site dining, all
situated on 10 tropically landscaped acres. Here, guests can experience a
full-service resort with discount hotel pricing in Orlando.

We invite draft paper submissions. Please see the website
http://www.PromoteResearch.org for more details.

Sincerely
John Edward


From andreas.draeger at uni-tuebingen.de  Thu Mar 25 10:19:02 2010
From: andreas.draeger at uni-tuebingen.de (=?ISO-8859-1?Q?Andreas_Dr=E4ger?=)
Date: Thu, 25 Mar 2010 15:19:02 +0100
Subject: [Biojava-l] Biojava-l Digest, Vol 86, Issue 9
In-Reply-To: <59a41c431003240837w3db713a3v3161b4e078faa483@mail.gmail.com>
References: <mailman.7.1268928002.8593.biojava-l@lists.open-bio.org>	<3baec70c1003190350h7e73c1e7q726b823b5e3eae82@mail.gmail.com>	<59a41c431003191042l406bdba1g501290b96c018ed0@mail.gmail.com>	<3baec70c1003210007t39f22261j9838b3eb5b9ff861@mail.gmail.com>	<59a41c431003221646u2b1da091uc81d601dbf412599@mail.gmail.com>	<3baec70c1003240332n25b11e10i23c2c00c96a71a89@mail.gmail.com>
	<59a41c431003240837w3db713a3v3161b4e078faa483@mail.gmail.com>
Message-ID: <4BAB70D6.5060309@uni-tuebingen.de>

Hi Andreas and Shakuntala,

The alignment classes have just been revised and can now be updated from
the repository. As a major improvement, the alignment result has become
much easier to use. So, if you're interested in computing something
based on the score, you can now simply call the dedicated get method
and don't have to care about parsing anymore. I hope that helps.
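
For the percent-match question raised earlier in this thread, a score alone is
not enough; one common approach is to count identical columns in the aligned
pair. The following is a minimal sketch that does not use the revised BioJava
classes. It assumes the two aligned rows are available as equal-length strings
with '-' as the gap character, and it divides by the alignment length (other
conventions divide by the shorter sequence or by the number of ungapped
columns). The class and method names are hypothetical.

```java
class PercentIdentity {

    // Percent of alignment columns in which both rows carry the same residue.
    // Gap characters ('-') never count as identical.
    static double percentIdentity(String row1, String row2) {
        if (row1.length() != row2.length()) {
            throw new IllegalArgumentException("aligned rows must have equal length");
        }
        if (row1.isEmpty()) {
            return 0.0;
        }
        int identical = 0;
        for (int i = 0; i < row1.length(); i++) {
            char c1 = Character.toUpperCase(row1.charAt(i));
            char c2 = Character.toUpperCase(row2.charAt(i));
            if (c1 != '-' && c1 == c2) {
                identical++;
            }
        }
        return 100.0 * identical / row1.length();
    }

    public static void main(String[] args) {
        // Four of the six columns are identical in this toy alignment.
        System.out.println(percentIdentity("ACGT-A", "ACCTTA"));
    }
}
```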

Cheers
Andreas

-- 
Dipl.-Bioinform. Andreas Dräger
Eberhard Karls University Tübingen
Center for Bioinformatics (ZBIT)
Sand 1
72076 Tübingen
Germany

Phone: +49-7071-29-70436
Fax:   +49-7071-29-5091

From mitlox at op.pl  Thu Mar 25 09:23:37 2010
From: mitlox at op.pl (xyz)
Date: Thu, 25 Mar 2010 23:23:37 +1000
Subject: [Biojava-l] sort fasta file
In-Reply-To: <ccc8042e1003211356t1ea534b6gf6d020983977881e@mail.gmail.com>
References: <20100320201718.4420a9b9@wp01>
	<ccc8042e1003211356t1ea534b6gf6d020983977881e@mail.gmail.com>
Message-ID: <20100325232337.3021200a@wp01>

Hi James,
Thank you for the solution, but I get this 
7
13
23
30
as output for this input file:
>1
atccccc
>2
atccccctttttt
>3
atccccccccccccccccctttt
>4
tttttttccccccccccccccccccccccc
>5
tttttttccccccccccccccccccccccc

How is it possible to fix it, and why did you choose Comparator and not
Comparable?

Thank you in advance.

Best regards,


On Sun, 21 Mar 2010 16:56:35 -0400
James Swetnam <jswetnam at gmail.com> wrote:

> Just hacked this together, warning: I am new to both java and biojava.
> 
> import java.io.*;
> import java.util.*;
> 
> import org.biojava.bio.BioException;
> import org.biojava.bio.symbol.*;
> import org.biojavax.SimpleNamespace;
> import org.biojavax.bio.seq.*;
> 
> import java.util.Comparator;
> 
> public class SortFasta {
> 
>     static private class RichSequenceComparator implements
> Comparator<RichSequence> {
> 
>     public int compare(RichSequence seq1, RichSequence seq2)
>     {
>         return seq1.length() - seq2.length();
>     }
> 
> 
>     }
> 
>     // Usage:  SortFasta unsortedFile.fasta
>     public static void main(String[] args) throws
> FileNotFoundException, BioException {
> 
>     String fastaFile = args[0];
> 
>     BufferedReader br = new BufferedReader(new FileReader(fastaFile));
>     SimpleNamespace ns = new SimpleNamespace("biojava");
> 
>     Alphabet protein = AlphabetManager.alphabetForName("PROTEIN");
> 
>     RichSequenceIterator rsi = RichSequence.IOTools.readFasta(br,
>                                   protein.getTokenization("token"),
>                                   ns);
> 
>     SortedSet<RichSequence> sorted = new TreeSet<RichSequence>( new
> SortFasta.RichSequenceComparator());
> 
>     while (rsi.hasNext()) {
>         sorted.add(rsi.nextRichSequence());
>     }
> 
>     Iterator<RichSequence> sortedIt = sorted.iterator();
> 
>     // Do whatever you want here with the ascending list of RichSequences
>     // by length, I'll just print them.
>     while(sortedIt.hasNext())
>         {
>         System.out.println(((RichSequence) sortedIt.next()).length());
>         }
>     }
> }
> 
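
The missing fifth line in the output above follows from how TreeSet works: it
treats two elements as duplicates whenever the comparator returns 0, so of two
sequences with the same length only the first is kept. Sorting a List instead
keeps every element. The sketch below uses plain strings in place of
RichSequence objects so that it runs without BioJava; the class and method
names are hypothetical. On the second question: Comparable has to be
implemented by the class of the objects being sorted, whereas a Comparator is
supplied from outside, which suits a library type you do not control.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class SortByLength {

    // Orders strings by length only, like the comparator in the quoted code.
    static final Comparator<String> BY_LENGTH = new Comparator<String>() {
        public int compare(String a, String b) {
            return Integer.compare(a.length(), b.length());
        }
    };

    // Number of elements that survive in a TreeSet ordered by length.
    static int treeSetSize(List<String> seqs) {
        SortedSet<String> set = new TreeSet<String>(BY_LENGTH);
        set.addAll(seqs);
        return set.size();
    }

    // Sorted copy that keeps equal-length elements (Collections.sort is stable).
    static List<String> sortedCopy(List<String> seqs) {
        List<String> copy = new ArrayList<String>(seqs);
        Collections.sort(copy, BY_LENGTH);
        return copy;
    }

    public static void main(String[] args) {
        List<String> seqs = Arrays.asList(
                "atccccc",
                "atccccctttttt",
                "atccccccccccccccccctttt",
                "tttttttccccccccccccccccccccccc",
                "tttttttccccccccccccccccccccccc");
        System.out.println("TreeSet keeps " + treeSetSize(seqs) + " of " + seqs.size());
        for (String s : sortedCopy(seqs)) {
            System.out.println(s.length());
        }
    }
}
```

In the quoted program the equivalent change would be to collect the
RichSequence objects in an ArrayList and call Collections.sort with the same
comparator.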

From holland at eaglegenomics.com  Thu Mar 25 12:27:17 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Thu, 25 Mar 2010 16:27:17 +0000
Subject: [Biojava-l] Bug fix for Biojava in regard to email with subject
	:( Hibernate Exception and suggestion for change in BioSqlSchema)
In-Reply-To: <4BAABA21.4000301@gmail.com>
References: <4BAABA21.4000301@gmail.com>
Message-ID: <4FAB0AC5-3D97-4FD8-8A7E-81D1D6381D39@eaglegenomics.com>

Patched and in subversion on the head in the new Biojava 3 code. I modified the code slightly to simplify it. There were also parallel changes required over in SimpleDocRef itself to enable it to continue working without being connected to BioSQL.
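
For readers who want the lookup order without reading the patch, the three
steps described in the quoted messages below (test the ID, then
author/title/location, then create a new reference) can be sketched as
follows. This is not the committed code and does not use Hibernate or the
BioJava classes: the Ref class, the two in-memory maps and all method names
are hypothetical stand-ins for the database lookups.

```java
import java.util.HashMap;
import java.util.Map;

class ReferenceResolver {

    // Hypothetical stand-in for a stored literature reference.
    static final class Ref {
        final String pubmedId, authors, title, location;

        Ref(String pubmedId, String authors, String title, String location) {
            this.pubmedId = pubmedId;
            this.authors = authors;
            this.title = title;
            this.location = location;
        }
    }

    // In-memory stand-ins for the two database queries.
    private final Map<String, Ref> byId = new HashMap<String, Ref>();
    private final Map<String, Ref> byText = new HashMap<String, Ref>();

    private static String textKey(String authors, String title, String location) {
        return authors + "\n" + title + "\n" + location;
    }

    Ref resolve(String pubmedId, String authors, String title, String location) {
        // Step 1: an existing record with the same ID wins, whatever its text fields say.
        if (pubmedId != null) {
            Ref hit = byId.get(pubmedId);
            if (hit != null) {
                return hit;
            }
        }
        // Step 2: fall back to an exact author/title/location match.
        String key = textKey(authors, title, location);
        Ref hit = byText.get(key);
        if (hit != null) {
            return hit;
        }
        // Step 3: nothing matched, so create and register a new reference.
        Ref created = new Ref(pubmedId, authors, title, location);
        if (pubmedId != null) {
            byId.put(pubmedId, created);
        }
        byText.put(key, created);
        return created;
    }

    public static void main(String[] args) {
        ReferenceResolver resolver = new ReferenceResolver();
        Ref first = resolver.resolve("16790744", "Iverson-Cabral,S.L.",
                "Intrastrain Heterogeneity of the mgpB Gene", "Infect. Immun. 74 (7), 3715-3726 (2006)");
        Ref second = resolver.resolve("16790744", "Iverson-Cabral,S.L.",
                "Intrastrain heterogeneity of the mgpB gene", "Infect. Immun. 74 (7), 3715-3726 (2006)");
        // Same ID, differently capitalized title: both calls yield one object.
        System.out.println(first == second);
    }
}
```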

On 25 Mar 2010, at 01:19, Deepak Sheoran wrote:

> I am writing this email again because I did not get any response on whether these bugs have been patched or whether my message got lost somewhere on the mailing list. I don't know how to apply the patches myself, so I am counting on you to apply them and reply so that I know the issue is fixed.
> 
> 
> 
> Thanks
> Deepak Sheoran
> 
> 
> Hi
> In response to the bug fix suggested by Richard I have created some patches. We need to apply them to stop BioJava from processing the references of a GenBank record incorrectly, which causes further Hibernate exceptions. With the patch applied, the reference resolution code tests the PubMed or Medline ID first; if there is no match it tests author/title/location; and if there is still no match it creates a new reference. I tested it with GenBank release 175 and gained almost 3159 more records in my database.
> 
> Can somebody please have a look at the second issue and fix it:
> "
> 2. I think that's a bug (compound locations with null features) but not sure why. Could be that the process of constructing a CompoundRichLocation is somehow losing the feature reference from the original SimpleRichLocation. Again I can't investigate until March - can someone else take a look at the code? (A good starting point would be to look at how a CompoundRichLocation decides to select the feature from the SimpleRichLocations it is made up from).
> "
> 
> I am also planning to build a bridge between BioSQL databases loaded using BioPerl and BioJava. Here is some of my investigation; can you suggest a direction?
> Have a look at the attached files:
> 1) Biojava_BioPerl_Diff.xls  ==> a view of the tables where a GenBank record is stored in a BioSQL instance by BioPerl and by BioJava
> 2) GenbankRecord.doc  ==> a Word document with a GenBank record showing where its information goes in BioSQL using BioPerl and BioJava
> 3) BioSqlRichobjectBuilder.patch ==> patch for the BioSqlRichObjectBuilder.java class
> 4) GenBankFormat.patch ==> patch for the GenBankFormat.java class
> 
> 
> Thanks
> Deepak Sheoran
> 
> 
> 
> -------- Original Message --------
> Subject:	Re: Hibernate Exception and suggestion for change in BioSqlSchema
> Date:	Tue, 9 Feb 2010 20:34:32 +1300
> From:	Richard Holland <holland at eaglegenomics.com>
> To:	Deepak Sheoran <sheoran143 at gmail.com>
> CC:	biojava-l at biojava.org
> 
> Hi. It's possible that your original email didn't make it to the list because it is HTML format, and the list only accepts plain text.
> 
> However, in answer to your two questions:
> 
>   1. The code that does the resolution of references might be better if it looks up existing IDs rather than using author, title, location to identify existing records. I would suggest modifying it to a three-step process - test ID, then if no match then test author/title/location, then if still no match create a new reference. Could someone do that? (I'm unable to do anything until late March).
> 
>   2. I think that's a bug (compound locations with null features) but not sure why. Could be that the process of constructing a CompoundRichLocation is somehow losing the feature reference from the original SimpleRichLocation. Again I can't investigate until March - can someone else take a look at the code? (A good starting point would be to look at how a CompoundRichLocation decides to select the feature from the SimpleRichLocations it is made up from).
> 
> cheers,
> Richard
> 
> On 9 Feb 2010, at 20:21, Deepak Sheoran wrote:
> 
> > 
> > Hi Richard
> > 
> > Below is the email which I sent to the Biojava-l mailing list. It never got posted on the mailing list server and I did not get any response, so please have a look at it and tell me what the solution to the problem described in the message could be.
> > 
> > 
> > Thanks
> > Deepak Sheoran
> > -------- Original Message --------
> > Subject:	Hibernate Exception and suggestion for change in BioSqlSchema
> > Date:	Wed, 03 Feb 2010 08:07:35 -0600
> > From:	Deepak Sheoran <sheoran143 at gmail.com>
> > To:	biojava-l at lists.open-bio.org
> > 
> > Hi guys,
> > 
> > A couple of days ago I had a problem with a Hibernate exception, which has since been resolved; the earlier email is at:
> > http://old.nabble.com/Hibernate-Exception-when-persisting-some-richsequence-object-to-biosql-schema-to27299245.html
> >
> > Following Richard's suggestion in the link above I was able to resolve some of the issues, but then I got stuck on another Hibernate error. I decided to investigate, and below is what I found; I think it is going to affect all of us.
> >
> > The "Reference" table in the BioSQL schema has a unique constraint on the "dbxref_id" column (CONSTRAINT reference_dbxref_id_key UNIQUE (dbxref_id)), which means only one entry in the reference table can use a given dbxref_id.
> > This works well, except when there are small variations in the values of the "location", "title" and "authors" columns and all of those variations refer to the same PUBMED_ID. In that case we cannot persist or create a RichSequence object.
> > When you tie RichObjectFactory to an active Hibernate session, the class "BioSqlRichObjectBuilder" has a method "buildObject(Class clazz, List paramsList)" which is responsible for looking up the details of an object in the database. If it finds one it returns that object; otherwise it tries to persist the new object into the database.
> > The problem is with the following part of that method:
> > ... LineNumber: 114
> > else if (SimpleDocRef.class.isAssignableFrom(clazz)) {
> >     queryType = "DocRef";
> >     // convert List constructor to String representation for query
> >     ourParamsList.set(0, DocRefAuthor.Tools.generateAuthorString((List)ourParamsList.get(0), true));
> >     if (ourParamsList.size()<3) {
> >         queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title is null";
> >     } else {
> >         queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title = ?";
> >     }
> > }
> > ... LineNumber: 123
> > When Hibernate searches the database, it will not find the existing record in the "reference" table, because the two records differ under string comparison. It therefore returns a new object to "GenbankFormat", in the following piece of code:
> > ... LineNumber: 447
> > else {
> >                                         try {
> >                                             CrossRef cr = (CrossRef)RichObjectFactory.getObject(SimpleCrossRef.class,new Object[]{dbname, raccession, new Integer(0)});
> >                                             RankedCrossRef rcr = new SimpleRankedCrossRef(cr, ++rcrossrefCount);
> >                                             rlistener.getCurrentFeature().addRankedCrossRef(rcr);
> >                                         } catch (ChangeVetoException e) {
> >                                             throw new ParseException(e+", accession:"+accession);
> >                                         }
> >                                     }
> > ... LineNumber: 455
> > We then add that object to rlistener and move on to the next part of the GenBank record. When BioJava next searches the database for a new CrossRef, it tries to persist the old one and gets a Hibernate exception about violating the unique constraint on the "dbxref_id" column.
> >
> > I see two ways to get these records into the database:
> > 	1. The easy solution, and the one I used to test my theory, is to change the BioSQL schema so that it allows a many-to-one relation between the "reference" and "dbxref" tables. This even makes sense, because one paper can be cited with many naming variations, and the change lets us store that information too. But this is something the BioSQL people have to decide, and I don't know how to approach them.
> > 	2. The second solution is slightly more difficult to implement: change the way "BioSqlRichObjectBuilder.buildObject(Class clazz, List paramsList)" decides whether a particular DocRef already exists in the database. I mean testing all possible string variations of the authors, location and title of the DocRef we are searching for. This has many complications and may slow down the creation of a RichSequence object when RichObjectFactory is linked to an active Hibernate session.
> >  
> > Example: below is a sample of what I have in my local BioSQL schema with the modification I suggested. (For easy reference in this email, the dbxref_id column shows the PubMed ID stored in the "dbxref" table instead of my local dbxref_id.)
> > Each record below spans six lines, in this column order: Reference_id, Dbxref_id, Location, Title, Authors, crc.
> > 216
> > 18554304
> > FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008)
> > Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model
> > Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.
> > 9E940E01F4BE3CD0
> > 230
> > 18554304
> > FEMS Microbiol. Ecol. 66 (3), 528-536 (2008)
> > Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model
> > Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.
> > D3BC0C17F3F786C9
> > 415
> > 16790744
> > Infect. Immun. 74 (7), 3715-3726 (2006)
> > Intrastrain Heterogeneity of the mgpB Gene in Mycoplasma genitalium Is Extensive In Vitro and In Vivo and Suggests that Variation Is Generated via Recombination with Repetitive Chromosomal Sequences
> > Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.
> > 60AEDFA0CEEACC38
> > 969
> > 16790744
> > Infect. Immun. 74 (7), 3715-3726 (2006)
> > Intrastrain heterogeneity of the mgpB gene in mycoplasma genitalium is extensive in vitro and in vivo and suggests that variation is generated via recombination with repetitive chromosomal sequences
> > Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.
> > 4B1232999F6E8130
> > 929
> > 8688087
> > Science 273 (5278), 1058-1073 (1996)
> > Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii
> > Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J.-F., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.L., Geoghagen,N.S.M., Weidman,J.F., Fuhrmann,J.L., Presley,E.A., Nguyen,D., Utterback,T.R., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.P., Borodovsky,M., Klenk,H.-P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C.
> > 3E79B40DD2AAA2B7
> > 932
> > 8688087
> > Science 273 (5278), 1058-1073 (1996)
> > Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii
> > Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.D., Geoghagen,N.S., Weidman,J.F., Fuhrmann,J.L., Nguyen,D.T., Utterback,T., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.B., Borodovsky,M., Klenk,H.P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C.
> > 094EB3384F8D6DE8
> > 1426
> > 10684935
> > Nucleic Acids Res. 28 (6), 1397-1406 (2000)
> > Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39
> > Read,T.D., Brunham,R.C., Shen,C., Gill,S.R., Heidelberg,J.F., White,O., Hickey,E.K., Peterson,J., Umayam,L.A., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S.L., Eisen,J. and Fraser,C.M.
> > 357648D8FD8C6C8A
> > 1481
> > 10684935
> > Nucleic Acids Res. 28 (6), 1397-1406 (2000)
> > Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39
> > Read,T., Brunham,R., Shen,C., Gill,S., Heidelberg,J., White,O., Hickey,E., Peterson,J., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S., Eisen,J. and Fraser,C.
> > 115411EB2DEE5654
> > 1497
> > 14689165
> > Arch. Microbiol. 181 (2), 144-154 (2004)
> > The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner
> > Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E.
> > 4D5D376EECCD186B
> > 1501
> > 14689165
> > Arch. Microbiol. 181 (2), 144-154 (2004)
> > The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner
> > Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., Del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E.
> > 4D57954EECDED66B
> > 1556
> > 18060065
> > PLoS ONE 2 (12), E1271 (2007)
> > Analysis of the Neurotoxin Complex Genes in Clostridium botulinum A1-A4 and B1 Strains: BoNT/A3, /Ba4 and /B1 Clusters Are Located within Plasmids
> > Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,A.C., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S.
> > 698688FB6DB95247
> > 1559
> > 18060065
> > PLoS ONE 2 (12), E1271 (2007)
> > Analysis of the neurotoxin complex genes in Clostridium botulinum A1-A4 and B1 strains: BoNT/A3, /Ba4 and /B1 clusters are located within plasmids
> > Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,C.A., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S.
> > E25E1BA99DB18F3D
> >  
> > 	- The second kind of error which I got was: org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
> > 		- This means that in the richsequence object some feature has a location object whose feature is set to null.
> > 		- My observations:
> > 			- It usually occurs when you try to persist a richsequence object to the database, and it occurs for those features which have a CompoundRichLocation, usually "join" and "complement" locations in the CDS region of a genbank record.
> > 			- After catching the hibernate exception I went through all the features: either biojava or hibernate had changed the object type of a CompoundRichLocation to SimpleRichLocation and set its feature variable to null.
> > 			- Below are the screenshots from one of my tests.
> > 				- Settings before trying to persist the richsequence object to the database:
> >  
> > <Mail Attachment.png>
> >  
> > 		- Settings after trying to persist the richsequence object to the database, inside the hibernate exception catch block:
> >  
> > 		<Mail Attachment.png>
> >  
> > 		- So my question is: why is this happening, and how can I stop it or otherwise get these records into the database? I have no clue why this is happening.
> > 		- Some extra information to make things clearer:
> > 			- Below are the LOCUS lines of some genbank records for which I know the location of the error, that is, the CDS region causing the error, with its index in the richSequence.feature ArrayList.
> > 				- LOCUS       AE001439             1643831 bp    DNA     circular BCT 19-JAN-2006
> > 					- richSequence.feature index: 2540, line number in the genbank record: 22115
> > 				- LOCUS       CP001189             3887492 bp    DNA     circular BCT 16-OCT-2008
> > 					- richSequence.feature index: 127, line number in the genbank record: 2137
> > 				- LOCUS       CP001292              328635 bp    DNA     circular BCT 17-DEC-2008
> > 					- richSequence.feature index: 389, line number in the genbank record: 3632
> > 				- LOCUS       AM279694              238517 bp    DNA     linear   BCT 23-OCT-2008
> > 					- richSequence.feature index: 47, line number in the genbank record: 4841
> > 				- LOCUS       CR931663               18517 bp    DNA     linear   BCT 18-SEP-2008
> > 					- richSequence.feature index: 45, line number in the genbank record: 442
> > 		- The complete exception message:
> > org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
> >         at org.hibernate.engine.Nullability.checkNullability(Nullability.java:72)
> >         at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:290)
> >         at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
> >         at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
> >         at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
> >         at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
> >         at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
> >         at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
> >         at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
> >         at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> >         at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
> >         at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
> >         at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
> >         at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> >         at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
> >         at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
> >         at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
> >         at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
> >         at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
> >         at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
> >         at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
> >         at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
> >         at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
> >         at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
> >         at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> >         at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
> >         at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
> >         at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
> >         at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> >         at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
> >         at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
> >         at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
> >         at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
> >         at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
> >         at org.hibernate.event.def.DefaultSaveEventListener.saveWithGeneratedOrRequestedId(DefaultSaveEventListener.java:33)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
> >         at org.hibernate.event.def.DefaultSaveEventListener.performSaveOrUpdate(DefaultSaveEventListener.java:27)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
> >         at org.hibernate.impl.SessionImpl.fireSave(SessionImpl.java:535)
> >         at org.hibernate.impl.SessionImpl.save(SessionImpl.java:523)
> >         at trashtesting.GenBankLoaderTesting.main(GenBankLoaderTesting.java:78)
> >  
> >  
> 
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: 
> holland at eaglegenomics.com
> http://www.eaglegenomics.com/
> 
> 
> 
> <Biojava_BioPerl_diff.xls><BioSqlRichObjectBuilder.patch><GenbankFormat.patch><GenbankRecord.doc>

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From andreas at sdsc.edu  Thu Mar 25 12:47:45 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 25 Mar 2010 09:47:45 -0700
Subject: [Biojava-l] Bug fix for Biojava in regard to email with subject
	:( Hibernate Exception and suggestion for change in BioSqlSchema)
In-Reply-To: <4FAB0AC5-3D97-4FD8-8A7E-81D1D6381D39@eaglegenomics.com>
References: <4BAABA21.4000301@gmail.com>
	<4FAB0AC5-3D97-4FD8-8A7E-81D1D6381D39@eaglegenomics.com>
Message-ID: <59a41c431003250947g6ecd11cbw21c5be5858b9aa09@mail.gmail.com>

Excellent, thanks Richard and Deepak!
Andreas

On Thu, Mar 25, 2010 at 9:27 AM, Richard Holland
<holland at eaglegenomics.com> wrote:

> Patched and in subversion on the head in the new Biojava 3 code. I modified
> the code slightly to simplify it. There were also parallel changes required
> over in SimpleDocRef itself to enable it to continue working without being
> connected to BioSQL.
>
> On 25 Mar 2010, at 01:19, Deepak Sheoran wrote:
>
> > I am sending this email again because I didn't get any response about
> > whether these bugs have been patched or whether the patches got lost
> > somewhere on the mailing list. I don't know how to apply the patches
> > myself, so I am counting on you guys to apply them and reply so that I
> > know it's fixed.
> >
> >
> >
> > Thanks
> > Deepak Sheoran
> >
> >
> > Hi
> > In response to the bug fix suggested by Richard I have created some
> > patches. We need to apply these to stop biojava from processing the
> > references of a genbank record in the wrong manner, which causes more
> > hibernate exceptions. After applying the patches, the reference
> > resolution code will test the pubmed or medline id, then if there is no
> > match test author/title/location, then if there is still no match create
> > a new reference. I tested it with GenBank release 175 and gained almost
> > 3159 more records in my database.
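
The three-step lookup described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the attached patches: the class ReferenceResolutionSketch, its nested DocRef type and the in-memory list standing in for the "reference" table are all hypothetical names, and the titles in main are shortened.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical in-memory stand-in for the BioSQL "reference" table.
// None of these names are BioJava API; they only illustrate the lookup order.
final class ReferenceResolutionSketch {

    /** Simplified reference row; pubmedId is null when the record carries no ID. */
    static final class DocRef {
        final String pubmedId;
        final String authors;
        final String title;
        final String location;

        DocRef(String pubmedId, String authors, String title, String location) {
            this.pubmedId = pubmedId;
            this.authors = authors;
            this.title = title;
            this.location = location;
        }
    }

    private final List<DocRef> stored = new ArrayList<>();

    /** Returns an existing reference when one matches, otherwise stores a new one. */
    DocRef resolve(String pubmedId, String authors, String title, String location) {
        // Step 1: match on the PubMed/Medline ID when the record has one.
        if (pubmedId != null) {
            for (DocRef ref : stored) {
                if (pubmedId.equals(ref.pubmedId)) {
                    return ref;
                }
            }
        }
        // Step 2: fall back to an exact authors/title/location match.
        for (DocRef ref : stored) {
            if (Objects.equals(authors, ref.authors)
                    && Objects.equals(title, ref.title)
                    && Objects.equals(location, ref.location)) {
                return ref;
            }
        }
        // Step 3: nothing matched, so create and remember a new reference.
        DocRef created = new DocRef(pubmedId, authors, title, location);
        stored.add(created);
        return created;
    }

    int size() {
        return stored.size();
    }

    public static void main(String[] args) {
        ReferenceResolutionSketch refs = new ReferenceResolutionSketch();
        String authors = "Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.";
        String location = "Infect. Immun. 74 (7), 3715-3726 (2006)";
        // Same PubMed ID, titles that differ only in capitalisation (shortened here).
        DocRef first = refs.resolve("16790744", authors,
                "Intrastrain Heterogeneity of the mgpB Gene", location);
        DocRef second = refs.resolve("16790744", authors,
                "Intrastrain heterogeneity of the mgpB gene", location);
        System.out.println("same reference reused: " + (first == second));
        System.out.println("stored references: " + refs.size());
    }
}
```

Because the ID is checked first, two spellings of the same paper resolve to one stored reference, which is the situation that would otherwise run into the unique constraint on dbxref_id. A real implementation would still have to decide what to do when step 2 matches a record that carries a different ID.
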
> >
> > Can somebody please have a look at the second issue and fix it:
> > "
> > 2. I think that's a bug (compound locations with null features) but not
> sure why. Could be that the process of constructing a CompoundRichLocation
> is somehow losing the feature reference from the original
> SimpleRichLocation. Again I can't investigate until March - can someone else
> take a look at the code? (A good starting point would be to look at how a
> CompoundRichLocation decides to select the feature from the
> SimpleRichLocations it is made up from).
> > "
> >
> > Also, I am planning to build a bridge between biosql databases loaded
> > using bioperl and biojava. Here is some of my investigation; can you
> > guys suggest some direction on it?
> > Have a look at the attached files:
> > 1) Biojava_BioPerl_diff.xls ==> a view of the tables where a genbank
> > record is stored in a biosql instance by bioperl and by biojava
> > 2) GenbankRecord.doc ==> a Word document with a genbank record, showing
> > where its information goes in biosql using bioperl and biojava
> > 3) BioSqlRichObjectBuilder.patch ==> patch for the
> > BioSqlRichObjectBuilder.java class
> > 4) GenbankFormat.patch ==> patch for the GenbankFormat.java class
> >
> >
> > Thanks
> > Deepak Sheoran
> >
> >
> >
> > -------- Original Message --------
> > Subject:      Re: Hibernate Exception and suggestion for change in
> BioSqlSchema
> > Date: Tue, 9 Feb 2010 20:34:32 +1300
> > From: Richard Holland <holland at eaglegenomics.com>
> > To:   Deepak Sheoran <sheoran143 at gmail.com>
> > CC:   biojava-l at biojava.org
> >
> > Hi. It's possible that your original email didn't make it to the list
> because it is HTML format, and the list only accepts plain text.
> >
> > However, in answer to your two questions:
> >
> >   1. The code that does the resolution of references might be better if
> it looks up existing IDs rather than using author, title, location to
> identify existing records. I would suggest modifying it to a three-step
> process - test ID, then if no match then test author/title/location, then if
> still no match create a new reference. Could someone do that? (I'm unable to
> do anything until late March).
> >
> >   2. I think that's a bug (compound locations with null features) but not
> sure why. Could be that the process of constructing a CompoundRichLocation
> is somehow losing the feature reference from the original
> SimpleRichLocation. Again I can't investigate until March - can someone else
> take a look at the code? (A good starting point would be to look at how a
> CompoundRichLocation decides to select the feature from the
> SimpleRichLocations it is made up from).
> >
> > cheers,
> > Richard
> >
> > On 9 Feb 2010, at 20:21, Deepak Sheoran wrote:
> >
> > >
> > > Hi Richard
> > >
> > > Below is the email which I sent to the Biojava-l mailing list, but it
> > > never got posted on the mailing list server, nor did I get any
> > > response. So please have a look at this email and tell me what the
> > > solution to the problem described in the message could be.
> > >
> > >
> > > Thanks
> > > Deepak Sheoran
> > > -------- Original Message --------
> > > Subject:    Hibernate Exception and suggestion for change in BioSqlSchema
> > > Date:       Wed, 03 Feb 2010 08:07:35 -0600
> > > From:       Deepak Sheoran <sheoran143 at gmail.com>
> > > To:         biojava-l at lists.open-bio.org
> > >
> > > Hi guys,
> > >
> > > A couple of days back I was having a problem with a hibernate
> > > exception, but that exception got resolved; the reference to that
> > > email is:
> > >
> > > http://old.nabble.com/Hibernate-Exception-when-persisting-some-richsequence-object-to-biosql-schema-to27299245.html
> > >
> > > Following Richard's suggestion in the above link I was able to resolve
> > > some of the issues, but then I got stuck on another hibernate error. I
> > > decided to investigate the matter, and below are some facts and
> > > information that I found and that I guess are going to affect all of us.
> > >     - The "reference" table in the bioSql schema has a unique
> > > constraint on the "dbxref_id" column (CONSTRAINT
> > > reference_dbxref_id_key UNIQUE (dbxref_id)), which means only one entry
> > > in the reference table can use a given dbxref_id.
> > > This works well except in cases where there are small variations in the
> > > values of the "location", "title" and "authors" columns and all these
> > > variations refer to the same PUBMED_ID. Then we can't persist or create
> > > a richsequence object.
> > > Now, when you tie RichObjectFactory to an active hibernate session, the
> > > class "BioSqlRichObjectBuilder" has a method called "buildObject(Class
> > > clazz, List paramsList)" which is responsible for looking up the
> > > details of the object in the database; if it finds one it returns that
> > > object, otherwise it tries to persist the new object into the database.
> > > But the problem is with the part of that method below:
> > > ...LineNumber: 114
> > > else if (SimpleDocRef.class.isAssignableFrom(clazz)) {
> > >     queryType = "DocRef";
> > >     // convert List constructor to String representation for query
> > >     ourParamsList.set(0, DocRefAuthor.Tools.generateAuthorString((List)ourParamsList.get(0), true));
> > >     if (ourParamsList.size()<3) {
> > >         queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title is null";
> > >     } else {
> > >         queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title = ?";
> > >     }
> > > }
> > > ...LineNumber: 123
> > > Now when hibernate searches the database, it won't find the existing
> > > record in the "reference" table because the two records differ in
> > > string comparison, so it returns a new object to the following piece
> > > of code in "GenbankFormat":
> > > ...LineNumber: 447
> > > else {
> > >     try {
> > >         CrossRef cr = (CrossRef)RichObjectFactory.getObject(SimpleCrossRef.class,new Object[]{dbname, raccession, new Integer(0)});
> > >         RankedCrossRef rcr = new SimpleRankedCrossRef(cr, ++rcrossrefCount);
> > >         rlistener.getCurrentFeature().addRankedCrossRef(rcr);
> > >     } catch (ChangeVetoException e) {
> > >         throw new ParseException(e+", accession:"+accession);
> > >     }
> > > }
> > > ...LineNumber: 455
> > > Then we add that object to rlistener and move to the next part of the
> > > genbank record. When biojava then searches for a new crossref in the
> > > database, it tries to persist the old one and gets a hibernate
> > > exception regarding violation of the unique constraint on the
> > > "dbxref_id" column.
> > >
> > > The only ways to get these records into the database are:
> > >             - The very easy solution, and the way I did it for testing
> > > my theory, is to change the bioSql schema so that it allows a
> > > many-to-one relation between the "reference" and "dbxref" tables. This
> > > even makes sense, because one paper can have many different variations
> > > of naming, and this change allows us to store that info too. But this
> > > is something the BioSql people have to decide, and I don't know how to
> > > approach them.
> > >             - The second solution, slightly more difficult to
> > > implement, is to change the way
> > > "BioSqlRichObjectBuilder.buildObject(Class clazz, List paramsList)"
> > > decides whether a particular DocRef already exists in the database. I
> > > mean testing all possible string variations of the authors, location
> > > and title of the DocRef we are searching for. This has many
> > > complications and may slow down the process of creating a richsequence
> > > object when RichObjectFactory is linked to an active hibernate session.
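
As a rough illustration of why the exact string comparison misses these records, and of the simplest kind of "variation" test the second solution could start from, here is a minimal sketch. ReferenceTextMatchSketch, normalize and sameText are hypothetical helper names, not part of BioJava, and the sketch does not address how such a comparison would be wired into the Hibernate query. The two titles are taken from the PLoS ONE rows of the sample data in this message.

```java
import java.util.Locale;

// Hypothetical helper, not BioJava API: compares reference text while
// ignoring differences in letter case and in runs of whitespace.
final class ReferenceTextMatchSketch {

    /** Lower-cases the text and collapses whitespace; null becomes "". */
    static String normalize(String text) {
        if (text == null) {
            return "";
        }
        return text.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    /** True when both strings are equal after normalization. */
    static boolean sameText(String a, String b) {
        return normalize(a).equals(normalize(b));
    }

    public static void main(String[] args) {
        String title1 = "Analysis of the Neurotoxin Complex Genes in Clostridium botulinum "
                + "A1-A4 and B1 Strains: BoNT/A3, /Ba4 and /B1 Clusters Are Located within Plasmids";
        String title2 = "Analysis of the neurotoxin complex genes in Clostridium botulinum "
                + "A1-A4 and B1 strains: BoNT/A3, /Ba4 and /B1 clusters are located within plasmids";
        // Exact comparison, as in the quoted DocRef query.
        System.out.println("exact match: " + title1.equals(title2));
        // Comparison after normalization.
        System.out.println("normalized match: " + sameText(title1, title2));
    }
}
```

This only covers capitalisation and spacing. Variations such as "Read,T.D." versus "Read,T." in the author list, or the extra text in the FEMS location, would still compare as different, which is part of why matching on the PubMed ID first looks like the more robust route.
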
> > >
> > > Example: below is a sample of what I have in my local biosql schema,
> > > which has the modification I suggested. (The dbxref_id column shows the
> > > PubMed ID: for easy reference with the outside world in this email, I
> > > replaced the local dbxref_id present in this table in my database with
> > > the pubmed_id stored in the "dbxref" table.)
> > > Reference_id: 216
> > > Dbxref_id:    18554304
> > > Location:     FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008)
> > > Title:        Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model
> > > Authors:      Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.
> > > crc:          9E940E01F4BE3CD0
> > >
> > > Reference_id: 230
> > > Dbxref_id:    18554304
> > > Location:     FEMS Microbiol. Ecol. 66 (3), 528-536 (2008)
> > > Title:        Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model
> > > Authors:      Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.
> > > crc:          D3BC0C17F3F786C9
> > >
> > > Reference_id: 415
> > > Dbxref_id:    16790744
> > > Location:     Infect. Immun. 74 (7), 3715-3726 (2006)
> > > Title:        Intrastrain Heterogeneity of the mgpB Gene in Mycoplasma genitalium Is Extensive In Vitro and In Vivo and Suggests that Variation Is Generated via Recombination with Repetitive Chromosomal Sequences
> > > Authors:      Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.
> > > crc:          60AEDFA0CEEACC38
> > >
> > > Reference_id: 969
> > > Dbxref_id:    16790744
> > > Location:     Infect. Immun. 74 (7), 3715-3726 (2006)
> > > Title:        Intrastrain heterogeneity of the mgpB gene in mycoplasma genitalium is extensive in vitro and in vivo and suggests that variation is generated via recombination with repetitive chromosomal sequences
> > > Authors:      Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.
> > > crc:          4B1232999F6E8130
> > >
> > > Reference_id: 929
> > > Dbxref_id:    8688087
> > > Location:     Science 273 (5278), 1058-1073 (1996)
> > > Title:        Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii
> > > Authors:      Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J.-F., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.L., Geoghagen,N.S.M., Weidman,J.F., Fuhrmann,J.L., Presley,E.A., Nguyen,D., Utterback,T.R., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.P., Borodovsky,M., Klenk,H.-P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C.
> > > crc:          3E79B40DD2AAA2B7
> > >
> > > Reference_id: 932
> > > Dbxref_id:    8688087
> > > Location:     Science 273 (5278), 1058-1073 (1996)
> > > Title:        Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii
> > > Authors:      Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.D., Geoghagen,N.S., Weidman,J.F., Fuhrmann,J.L., Nguyen,D.T., Utterback,T., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.B., Borodovsky,M., Klenk,H.P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C.
> > > crc:          094EB3384F8D6DE8
> > >
> > > Reference_id: 1426
> > > Dbxref_id:    10684935
> > > Location:     Nucleic Acids Res. 28 (6), 1397-1406 (2000)
> > > Title:        Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39
> > > Authors:      Read,T.D., Brunham,R.C., Shen,C., Gill,S.R., Heidelberg,J.F., White,O., Hickey,E.K., Peterson,J., Umayam,L.A., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S.L., Eisen,J. and Fraser,C.M.
> > > crc:          357648D8FD8C6C8A
> > >
> > > Reference_id: 1481
> > > Dbxref_id:    10684935
> > > Location:     Nucleic Acids Res. 28 (6), 1397-1406 (2000)
> > > Title:        Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39
> > > Authors:      Read,T., Brunham,R., Shen,C., Gill,S., Heidelberg,J., White,O., Hickey,E., Peterson,J., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S., Eisen,J. and Fraser,C.
> > > crc:          115411EB2DEE5654
> > >
> > > Reference_id: 1497
> > > Dbxref_id:    14689165
> > > Location:     Arch. Microbiol. 181 (2), 144-154 (2004)
> > > Title:        The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner
> > > Authors:      Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E.
> > > crc:          4D5D376EECCD186B
> > >
> > > Reference_id: 1501
> > > Dbxref_id:    14689165
> > > Location:     Arch. Microbiol. 181 (2), 144-154 (2004)
> > > Title:        The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner
> > > Authors:      Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., Del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E.
> > > crc:          4D57954EECDED66B
> > >
> > > Reference_id: 1556
> > > Dbxref_id:    18060065
> > > Location:     PLoS ONE 2 (12), E1271 (2007)
> > > Title:        Analysis of the Neurotoxin Complex Genes in Clostridium botulinum A1-A4 and B1 Strains: BoNT/A3, /Ba4 and /B1 Clusters Are Located within Plasmids
> > > Authors:      Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,A.C., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S.
> > > crc:          698688FB6DB95247
> > >
> > > Reference_id: 1559
> > > Dbxref_id:    18060065
> > > Location:     PLoS ONE 2 (12), E1271 (2007)
> > > Title:        Analysis of the neurotoxin complex genes in Clostridium botulinum A1-A4 and B1 strains: BoNT/A3, /Ba4 and /B1 clusters are located within plasmids
> > > Authors:      Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,C.A., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S.
> > > crc:          E25E1BA99DB18F3D
> > >
> > >     - The second kind of error which I got was: org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
> > >             - This means that in the richsequence object some feature has a location object whose feature is set to null.
> > >             - My observations:
> > >                     - It usually occurs when you try to persist a richsequence object to the database, and it occurs for those features which have a CompoundRichLocation, usually "join" and "complement" locations in the CDS region of a genbank record.
> > >                     - After catching the hibernate exception I went through all the features: either biojava or hibernate had changed the object type of a CompoundRichLocation to SimpleRichLocation and set its feature variable to null.
> > >                     - Below are the screenshots from one of my tests.
> > >                             - Settings before trying to persist the richsequence object to the database:
> > >
> > > <Mail Attachment.png>
> > >
> > >             - Settings after trying to persist the richsequence object to the database, inside the hibernate exception catch block:
> > >
> > >             <Mail Attachment.png>
> > >
> > >             - So my question is: why is this happening, and how can I stop it or otherwise get these records into the database? I have no clue why this is happening.
> > >             - Some extra information to make things clearer:
> > >                     - Below are the LOCUS lines of some genbank records for which I know the location of the error, that is, the CDS region causing the error, with its index in the richSequence.feature ArrayList.
> > >                             - LOCUS       AE001439             1643831 bp    DNA     circular BCT 19-JAN-2006
> > >                                     - richSequence.feature index: 2540, line number in the genbank record: 22115
> > >                             - LOCUS       CP001189             3887492 bp    DNA     circular BCT 16-OCT-2008
> > >                                     - richSequence.feature index: 127, line number in the genbank record: 2137
> > >                             - LOCUS       CP001292              328635 bp    DNA     circular BCT 17-DEC-2008
> > >                                     - richSequence.feature index: 389, line number in the genbank record: 3632
> > >                             - LOCUS       AM279694              238517 bp    DNA     linear   BCT 23-OCT-2008
> > >                                     - richSequence.feature index: 47, line number in the genbank record: 4841
> > >                             - LOCUS       CR931663               18517 bp    DNA     linear   BCT 18-SEP-2008
> > >                                     - richSequence.feature index: 45, line number in the genbank record: 442
> > >             - The complete exception message:
> > > org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
> > >         at org.hibernate.engine.Nullability.checkNullability(Nullability.java:72)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:290)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
> > >         at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
> > >         at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
> > >         at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
> > >         at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
> > >         at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
> > >         at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> > >         at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
> > >         at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
> > >         at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
> > >         at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> > >         at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
> > >         at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
> > >         at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
> > >         at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
> > >         at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
> > >         at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
> > >         at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> > >         at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
> > >         at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
> > >         at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
> > >         at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> > >         at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
> > >         at
> org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
> > >         at
> org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
> > >         at
> org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
> > >         at
> org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
> > >         at
> org.hibernate.event.def.DefaultSaveEventListener.saveWithGeneratedOrRequestedId(DefaultSaveEventListener.java:33)
> > >         at
> org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
> > >         at
> org.hibernate.event.def.DefaultSaveEventListener.performSaveOrUpdate(DefaultSaveEventListener.java:27)
> > >         at
> org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
> > >         at
> org.hibernate.impl.SessionImpl.fireSave(SessionImpl.java:535)
> > >         at org.hibernate.impl.SessionImpl.save(SessionImpl.java:523)
> > >         at
> trashtesting.GenBankLoaderTesting.main(GenBankLoaderTesting.java:78)
> > >
> > >
> >
> > --
> > Richard Holland, BSc MBCS
> > Operations and Delivery Director, Eagle Genomics Ltd
> > T: +44 (0)1223 654481 ext 3 | E:
> > holland at eaglegenomics.com
> > http://www.eaglegenomics.com/
> >
> >
> >
> >
> <Biojava_BioPerl_diff.xls><BioSqlRichObjectBuilder.patch><GenbankFormat.patch><GenbankRecord.doc>
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
>


From andreas at sdsc.edu  Thu Mar 25 12:56:21 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 25 Mar 2010 09:56:21 -0700
Subject: [Biojava-l] Biojava-l Digest, Vol 86, Issue 9
In-Reply-To: <4BAB70D6.5060309@uni-tuebingen.de>
References: <mailman.7.1268928002.8593.biojava-l@lists.open-bio.org>
	<3baec70c1003190350h7e73c1e7q726b823b5e3eae82@mail.gmail.com>
	<59a41c431003191042l406bdba1g501290b96c018ed0@mail.gmail.com>
	<3baec70c1003210007t39f22261j9838b3eb5b9ff861@mail.gmail.com>
	<59a41c431003221646u2b1da091uc81d601dbf412599@mail.gmail.com>
	<3baec70c1003240332n25b11e10i23c2c00c96a71a89@mail.gmail.com>
	<59a41c431003240837w3db713a3v3161b4e078faa483@mail.gmail.com>
	<4BAB70D6.5060309@uni-tuebingen.de>
Message-ID: <59a41c431003250956h14abdbe2t1367bec10069d1f3@mail.gmail.com>

Hi Andreas,

that sounds great! I'll take a look at this soon...

Thanks,
Andreas

On Thu, Mar 25, 2010 at 7:19 AM, Andreas Dräger <
andreas.draeger at uni-tuebingen.de> wrote:

> Hi Andreas and Shakuntala,
>
> The alignment classes have just been revised and can now be updated from
> the repository. As a major improvement, the alignment result has become much
> easier to use. So, if you're interested in computing something based on the
> score, you can now simply call the dedicated get method and don't have to
> care about parsing anymore. I hope that helps.
>
> Cheers
> Andreas
>
> --
> Dipl.-Bioinform. Andreas Dräger
> Eberhard Karls University Tübingen
> Center for Bioinformatics (ZBIT)
> Sand 1
> 72076 Tübingen
> Germany
>
> Phone: +49-7071-29-70436
> Fax:   +49-7071-29-5091
>


From zhangyiwei79 at gmail.com  Thu Mar 25 16:14:50 2010
From: zhangyiwei79 at gmail.com (Yiwei Zhang)
Date: Thu, 25 Mar 2010 16:14:50 -0400
Subject: [Biojava-l] Question about All-Java Multiple Sequence Alignment
	project of Google Summer of Code
Message-ID: <55c06db21003251314u142e44d3v25dac787216234e0@mail.gmail.com>

Hi,

I am a graduate student in computer science, and my field of study is related
to bioinformatics algorithms. I am proficient in Java programming. I am
very interested in this project because I am currently working on sequence
alignment and phylogenetic tree reconstruction.

My question is: if the project requires reimplementing the alignment
algorithms of existing tools, what language are those tools originally
implemented in? C++, C, or something else?

Thanks!

From biopython at maubp.freeserve.co.uk  Thu Mar 25 18:16:55 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 25 Mar 2010 22:16:55 +0000
Subject: [Biojava-l] [Biojava-dev] Bug fix for Biojava in regard to
	email with subject : ( Hibernate Exception and suggestion for
	change in BioSqlSchema)
In-Reply-To: <4BABAFA1.6090806@orionbiosciences.com>
References: <4BAABA21.4000301@gmail.com>
	<4FAB0AC5-3D97-4FD8-8A7E-81D1D6381D39@eaglegenomics.com>
	<4BABAFA1.6090806@orionbiosciences.com>
Message-ID: <320fb6e01003251516w2977ab2h9869342f94576287@mail.gmail.com>

On Thu, Mar 25, 2010 at 6:46 PM, Deepak Sheoran
<deepak.sheoran at orionbiosciences.com> wrote:
>
> That is the reason why I was getting an error when I was creating a
> RichSequence object without any active session to BioSQL. I didn't have a
> clue that I had created one more bug by fixing one; thanks for noticing
> that and fixing it.
>
> I am thinking we should propose BioPerl-BioJava and BioSQL compatibility as
> one of the Google Summer of Code projects. I have a vision for this, but
> don't know the right way to begin. This could help people who want to use
> BioJava but can't because they are afraid to lose their Perl code, which is
> heavily dependent on the Perl way of loading the schema. Or we could come up
> with a hybrid approach that takes the good parts from both languages.
>
> Deepak Sheoran

That is an interesting idea for GSoC, I wonder if we at Biopython
should do the same. I know of a few things where we differ from
BioPerl's BioSQL support (e.g. SwissProt comment lines).

[I take it we agree that bioperl-db is the de facto reference
implementation for mapping GenBank etc into BioSQL?]

Peter


From chapman at cs.wisc.edu  Fri Mar 26 03:14:24 2010
From: chapman at cs.wisc.edu (Mark Chapman)
Date: Fri, 26 Mar 2010 02:14:24 -0500
Subject: [Biojava-l] Question about All-Java Multiple Sequence Alignment
 project of Google Summer of Code
In-Reply-To: <55c06db21003251314u142e44d3v25dac787216234e0@mail.gmail.com>
References: <55c06db21003251314u142e44d3v25dac787216234e0@mail.gmail.com>
Message-ID: <4BAC5ED0.1050009@cs.wisc.edu>

Hi Yiwei (and list members),

I am also a graduate student in Bioinformatics interested in the Google Summer 
of Code project.  The authors' current implementations of ClustalW and ClustalX 
are written in C++.  Binaries, code, and references are located at 
http://www.clustal.org/ .  Download the boldfaced references (Larkin et al 2007 
and Thompson et al 1994) for the most relevant information.

Take care,
Mark


On 3/25/2010 3:14 PM, Yiwei Zhang wrote:
> Hi,
>
> I am a graduate student in computer science, and my field of study is related
> to bioinformatics algorithms. I am proficient in Java programming. I am
> very interested in this project because I am currently working on sequence
> alignment and phylogenetic tree reconstruction.
>
> My question is: if the project requires reimplementing the alignment
> algorithms of existing tools, what language are those tools originally
> implemented in? C++, C, or something else?
>
> Thanks!

From bernd.jagla at pasteur.fr  Fri Mar 26 05:33:05 2010
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Fri, 26 Mar 2010 10:33:05 +0100
Subject: [Biojava-l] SVN repository
In-Reply-To: <59a41c431003171039h4ca1267bibc45b0d7d270b2a9@mail.gmail.com>
References: <mailman.3.1268755203.30058.biojava-l@lists.open-bio.org><4BA082EC.8010908@wur.nl><320fb6e01003170316x4fbf924do95dcc84703eaa28e@mail.gmail.com>
	<59a41c431003171039h4ca1267bibc45b0d7d270b2a9@mail.gmail.com>
Message-ID: <776506315DB04C3EBF2A7FDA610390AB@zillumina>

Hi,

I am trying to check out biojava for the first time, and I am not sure if
the server is still down... Could you please let me know if it is up or down?

Thanks,

Bernd

> -----Original Message-----
> From: biojava-l-bounces at lists.open-bio.org [mailto:biojava-l-
> bounces at lists.open-bio.org] On Behalf Of Andreas Prlic
> Sent: Wednesday, March 17, 2010 6:40 PM
> To: Richard Finkers
> Cc: biojava-l at lists.open-bio.org
> Subject: Re: [Biojava-l] SVN repository
> 
> I have just heard back from the OBF-helpdesk. The VM hosting the anonymous
> SVN is currently down. Depending on how big the problem turns out to be, it
> will be back at some point later today, or tomorrow at the latest.
> 
> Sorry for this inconvenience.
> Andreas
> 
> 
> 
> 
> On Wed, Mar 17, 2010 at 3:16 AM, Peter
> <biopython at maubp.freeserve.co.uk> wrote:
> 
> > On Wed, Mar 17, 2010 at 7:21 AM, Richard Finkers
> > <Richard.Finkers at wur.nl> wrote:
> > >
> > > Hi,
> > >
> > > I would like to have a look at the BioJava 3 code (and perhaps in the
> > > future contribute to). However, I cannot access the SVN repository
> > > (http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava-live/trunk).
> > >
> > > Is the repository down?
> > >
> > > Thanks,
> > > Richard
> >
> > Probably :(
> >
> > There have been problems discussed on the BioPerl mailing list
> > (they use the same servers), and the OBF team are aware of it:
> > http://lists.open-bio.org/pipermail/bioperl-l/2010-March/032534.html
> >
> > The code.open-bio.org repositories are a read-only public mirror,
> > while dev.open-bio.org is the master repository, which I think is fine
> > (but not available for anonymous download).
> >
> > In the meantime BioPerl have also set up a read-only mirror
> > on GitHub - perhaps BioJava could do the same? Meanwhile
> > BioRuby and Biopython are just using GitHub (not SVN or CVS).
> >
> > Peter


From mitlox at op.pl  Fri Mar 26 05:57:41 2010
From: mitlox at op.pl (xyz)
Date: Fri, 26 Mar 2010 19:57:41 +1000
Subject: [Biojava-l] sort fasta file
In-Reply-To: <ccc8042e1003250717s36657fdchef532c512323b784@mail.gmail.com>
References: <20100320201718.4420a9b9@wp01>
	<ccc8042e1003211356t1ea534b6gf6d020983977881e@mail.gmail.com>
	<20100325232337.3021200a@wp01>
	<ccc8042e1003250717s36657fdchef532c512323b784@mail.gmail.com>
Message-ID: <20100326195741.4799c398@wp01>

@Andy: Thank you for the explanation. There is no newline character
after the last sequence in the input file.

@James: I changed the code in order to get the longest sequence first,
but the last sequence is missing.


import java.io.*;
import java.util.*;

import org.biojava.bio.BioException;
import org.biojava.bio.symbol.*;
import org.biojavax.SimpleNamespace;
import org.biojavax.bio.seq.*;

import java.util.Comparator;

public class SortFasta2 {

  static private class RichSequenceComparator implements
  Comparator<RichSequence> {

    public int compare(RichSequence seq1, RichSequence seq2) {
      return  seq2.length() - seq1.length();
    }
  }

  // Usage:  SortFasta unsortedFile.fasta
  public static void main(String[] args) throws FileNotFoundException,
  BioException {

    String fastaFile = "sortFasta.fasta";

    BufferedReader br = new BufferedReader(new FileReader(fastaFile));
    SimpleNamespace ns = new SimpleNamespace("biojava");

    // The input sequences are DNA, so the variable is named accordingly.
    Alphabet dna = AlphabetManager.alphabetForName("DNA");

    RichSequenceIterator rsi = RichSequence.IOTools.readFasta(br,
            dna.getTokenization("token"),
            ns);
    

    SortedSet<RichSequence> sorted = new TreeSet<RichSequence>(new
    SortFasta2.RichSequenceComparator());

    while (rsi.hasNext()) {
      sorted.add(rsi.nextRichSequence());
    }

    Iterator<RichSequence> sortedIt = sorted.iterator();

    /*Do whatever you want here with the descending list of
    RichSequences by length, I'll just print them. */
    while (sortedIt.hasNext()) {
      //System.out.println(((RichSequence) sortedIt.next()).length());
      //System.out.println(sortedIt.next().getComments());
      System.out.println(sortedIt.next().seqString());
    }
  }
}

Input file:
>1
atccccc
>2
atccccctttttt
>3
atccccccccccccccccctttt
>4
tttttttccccccccccccccccccccccc
>5
tttttttcccccccccccccccccccccca

Output on the screen:
tttttttccccccccccccccccccccccc
atccccccccccccccccctttt
atccccctttttt
atccccc

How is it possible to get the last sequence and print the output in
fasta format on the screen?

Thank you in advance.
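
A possible explanation, independent of the trailing newline: records 4 and 5 in the input above differ only in their final character, so they have the same length. The comparator returns 0 for two sequences of equal length, and a TreeSet treats elements that compare as 0 as duplicates and keeps only one of them. The following is a minimal sketch in plain Java under that assumption. The Rec class and all method names are hypothetical stand-ins for illustration and are not part of BioJava.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class SortByLengthSketch {

    /** Hypothetical stand-in for a sequence record: a name plus its residues. */
    public static final class Rec {
        public final String name;
        public final String seq;

        public Rec(String name, String seq) {
            this.name = name;
            this.seq = seq;
        }
    }

    /** Longest first. Returns 0 for any two records of equal length. */
    public static final Comparator<Rec> BY_LENGTH_DESC = new Comparator<Rec>() {
        @Override
        public int compare(Rec a, Rec b) {
            return Integer.compare(b.seq.length(), a.seq.length());
        }
    };

    /** A TreeSet keeps only one element per group the comparator calls equal. */
    public static int countInTreeSet(List<Rec> recs) {
        TreeSet<Rec> set = new TreeSet<Rec>(BY_LENGTH_DESC);
        set.addAll(recs);
        return set.size();
    }

    /** Sorting a list instead keeps every record; ties stay in input order. */
    public static List<Rec> sortedCopy(List<Rec> recs) {
        List<Rec> copy = new ArrayList<Rec>(recs);
        Collections.sort(copy, BY_LENGTH_DESC);
        return copy;
    }

    /** Formats records as FASTA: a '>' header line followed by the sequence. */
    public static String toFasta(List<Rec> recs) {
        StringBuilder sb = new StringBuilder();
        for (Rec r : recs) {
            sb.append('>').append(r.name).append('\n');
            sb.append(r.seq).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The five records from the posted input file.
        List<Rec> recs = new ArrayList<Rec>();
        recs.add(new Rec("1", "atccccc"));
        recs.add(new Rec("2", "atccccctttttt"));
        recs.add(new Rec("3", "atccccccccccccccccctttt"));
        recs.add(new Rec("4", "tttttttccccccccccccccccccccccc"));
        recs.add(new Rec("5", "tttttttcccccccccccccccccccccca"));

        System.out.println("TreeSet kept " + countInTreeSet(recs)
                + " of " + recs.size() + " records");
        System.out.print(toFasta(sortedCopy(recs)));
    }
}
```

In the posted program, the equivalent change would be to collect the RichSequence objects into an ArrayList and sort that list, or to add a tie-breaker (for example the sequence name) to the comparator. For the FASTA output, RichSequence.IOTools also appears to provide writeFasta methods; the exact signature should be checked against the Javadoc of the BioJava version in use.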


On Thu, 25 Mar 2010 10:17:31 -0400
James Swetnam wrote:

> Just replace the system.out.println with whatever you want to do with
> the sequences; write them to a file, etc.
> 
> James
> 

On Fri, 26 Mar 2010 09:40:28 +0000
"Andy Law (RI)" wrote:

> Does your input file have a line feed at the end or not? (Just a  
> thought)
> 
> Comparable is for comparing two objects using their "natural"
> ordering and is therefore a "fundamental" property of the class. A
> Comparator lets you compare/sort two objects on any characteristics
> and you can have many different comparators. Since this is a somewhat
> arbitrary way of comparing sequences (you could sort them on
> alphabetical sequence for example, or GC content), I guess that's why
> James used a comparator.
> 


From richard.finkers at wur.nl  Fri Mar 26 06:10:39 2010
From: richard.finkers at wur.nl (Finkers, Richard)
Date: Fri, 26 Mar 2010 11:10:39 +0100
Subject: [Biojava-l] SVN repository
References: <mailman.3.1268755203.30058.biojava-l@lists.open-bio.org><4BA082EC.8010908@wur.nl><320fb6e01003170316x4fbf924do95dcc84703eaa28e@mail.gmail.com>
	<59a41c431003171039h4ca1267bibc45b0d7d270b2a9@mail.gmail.com>
	<776506315DB04C3EBF2A7FDA610390AB@zillumina>
Message-ID: <33AFFE3255BCA043AF09514A6F6BFBAED04C0D@scomp0039.wurnet.nl>

Hi Bernd,

It has been working for two days but it seems to be down again.

Richard




From andy.law at roslin.ed.ac.uk  Fri Mar 26 06:12:11 2010
From: andy.law at roslin.ed.ac.uk (Andy Law (RI))
Date: Fri, 26 Mar 2010 10:12:11 +0000
Subject: [Biojava-l] sort fasta file
In-Reply-To: <20100326195741.4799c398@wp01>
References: <20100320201718.4420a9b9@wp01>
	<ccc8042e1003211356t1ea534b6gf6d020983977881e@mail.gmail.com>
	<20100325232337.3021200a@wp01>
	<ccc8042e1003250717s36657fdchef532c512323b784@mail.gmail.com>
	<20100326195741.4799c398@wp01>
Message-ID: <7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk>


On 26 Mar 2010, at 09:57, xyz wrote:

> @Andy: Thank you for the explanation. There is no newline character
> after the last sequence in the input file.
>

Then RichSequenceIterator / RichSequence.IOTools.readFasta() are not  
seeing the last sequence when the file is not terminated with a  
newline character. Is this a bug or a feature, folks?

Later,

Andy
--------
Yada, yada, yada...
The University of Edinburgh is a charitable body, registered in  
Scotland, with registration number SC005336
Disclaimer: This e-mail and any attachments are confidential and  
intended solely for the use of the recipient(s) to whom they are  
addressed. If you have received it in error, please destroy all copies  
and inform the sender.


From andy.law at roslin.ed.ac.uk  Fri Mar 26 06:36:25 2010
From: andy.law at roslin.ed.ac.uk (Andy Law (RI))
Date: Fri, 26 Mar 2010 10:36:25 +0000
Subject: [Biojava-l] sort fasta file
In-Reply-To: <84422194-0966-487C-BB32-537C6BA256CF@eaglegenomics.com>
References: <20100320201718.4420a9b9@wp01>
	<ccc8042e1003211356t1ea534b6gf6d020983977881e@mail.gmail.com>
	<20100325232337.3021200a@wp01>
	<ccc8042e1003250717s36657fdchef532c512323b784@mail.gmail.com>
	<20100326195741.4799c398@wp01>
	<7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk>
	<84422194-0966-487C-BB32-537C6BA256CF@eaglegenomics.com>
Message-ID: <1A0823B4-55A9-4C32-9A9C-8499EE26DEF5@roslin.ed.ac.uk>


On 26 Mar 2010, at 10:28, Richard Holland wrote:

> That there be a bug.

Albeit one with a simple workaround while the SVN server is broken :o}


Later,

Andy
--------
Yada, yada, yada...


From holland at eaglegenomics.com  Fri Mar 26 06:28:19 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 26 Mar 2010 10:28:19 +0000
Subject: [Biojava-l] sort fasta file
In-Reply-To: <7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk>
References: <20100320201718.4420a9b9@wp01>
	<ccc8042e1003211356t1ea534b6gf6d020983977881e@mail.gmail.com>
	<20100325232337.3021200a@wp01>
	<ccc8042e1003250717s36657fdchef532c512323b784@mail.gmail.com>
	<20100326195741.4799c398@wp01>
	<7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk>
Message-ID: <84422194-0966-487C-BB32-537C6BA256CF@eaglegenomics.com>

That there be a bug. 

On 26 Mar 2010, at 10:12, Andy Law (RI) wrote:

> 
> On 26 Mar 2010, at 09:57, xyz wrote:
> 
>> @Andy: Thank you for the explanation. There is no newline character
>> after the last sequence in the input file.
>> 
> 
> Then RichSequenceIterator / RichSequence.IOTools.readFasta() are not seeing the last sequence when the file is not terminated with a newline character. Is this a bug or a feature, folks?

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From holland at eaglegenomics.com  Fri Mar 26 06:41:21 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 26 Mar 2010 10:41:21 +0000
Subject: [Biojava-l] sort fasta file
In-Reply-To: <1A0823B4-55A9-4C32-9A9C-8499EE26DEF5@roslin.ed.ac.uk>
References: <20100320201718.4420a9b9@wp01>
	<ccc8042e1003211356t1ea534b6gf6d020983977881e@mail.gmail.com>
	<20100325232337.3021200a@wp01>
	<ccc8042e1003250717s36657fdchef532c512323b784@mail.gmail.com>
	<20100326195741.4799c398@wp01>
	<7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk>
	<84422194-0966-487C-BB32-537C6BA256CF@eaglegenomics.com>
	<1A0823B4-55A9-4C32-9A9C-8499EE26DEF5@roslin.ed.ac.uk>
Message-ID: <A11F888E-C180-40B2-BD84-1590F4FC7905@eaglegenomics.com>

Do you have a fix? I can't remember if you've got SVN access or not - if you do, please do commit it, otherwise email me a patch and I'll commit it for you.

On 26 Mar 2010, at 10:36, Andy Law (RI) wrote:

> 
> On 26 Mar 2010, at 10:28, Richard Holland wrote:
> 
>> That there be a bug.
> 
> Albeit one with a simple workaround while the SVN server is broken :o}
> 

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From holland at eaglegenomics.com  Fri Mar 26 07:04:22 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 26 Mar 2010 11:04:22 +0000
Subject: [Biojava-l] sort fasta file
In-Reply-To: <1A0823B4-55A9-4C32-9A9C-8499EE26DEF5@roslin.ed.ac.uk>
References: <20100320201718.4420a9b9@wp01>
	<ccc8042e1003211356t1ea534b6gf6d020983977881e@mail.gmail.com>
	<20100325232337.3021200a@wp01>
	<ccc8042e1003250717s36657fdchef532c512323b784@mail.gmail.com>
	<20100326195741.4799c398@wp01>
	<7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk>
	<84422194-0966-487C-BB32-537C6BA256CF@eaglegenomics.com>
	<1A0823B4-55A9-4C32-9A9C-8499EE26DEF5@roslin.ed.ac.uk>
Message-ID: <E75CE797-3618-41AC-B6E7-BD53DBD0379A@eaglegenomics.com>

I can't see anything in the code that would cause that behaviour. :( Could you provide sample code and a supporting FASTA file that replicates the problem?
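
One way to narrow this down without BioJava is to check what the underlying line reader delivers. The sketch below uses only java.io with hypothetical in-memory data (the class and method names are made up for illustration). It reads the same two-record FASTA text with and without a terminating newline through BufferedReader.readLine(). It says nothing about the FASTA parser itself, only about the reader underneath it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class TrailingNewlineCheck {

    /** Reads all lines from the given text, the way a line-based parser would. */
    public static List<String> readLines(String text) {
        List<String> lines = new ArrayList<String>();
        BufferedReader br = new BufferedReader(new StringReader(text));
        try {
            String line;
            // readLine() returns null only at end of input; a final line
            // without a terminator is still returned as a normal line.
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
            br.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        String withNewline = ">1\nacgt\n>2\nttga\n";
        String withoutNewline = ">1\nacgt\n>2\nttga";
        System.out.println(readLines(withNewline));
        System.out.println(readLines(withoutNewline));
    }
}
```

If both calls yield the same lines, a missing record is more likely to come from how the sequences are collected after parsing than from the missing newline.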

On 26 Mar 2010, at 10:36, Andy Law (RI) wrote:

> 
> On 26 Mar 2010, at 10:28, Richard Holland wrote:
> 
>> That there be a bug.
> 
> Albeit one with a simple workaround while the SVN server is broken :o}
> 

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From Richard.Finkers at wur.nl  Fri Mar 26 12:27:59 2010
From: Richard.Finkers at wur.nl (Richard Finkers)
Date: Fri, 26 Mar 2010 17:27:59 +0100
Subject: [Biojava-l] SVN repository
In-Reply-To: <776506315DB04C3EBF2A7FDA610390AB@zillumina>
References: <mailman.3.1268755203.30058.biojava-l@lists.open-bio.org><4BA082EC.8010908@wur.nl><320fb6e01003170316x4fbf924do95dcc84703eaa28e@mail.gmail.com>
	<59a41c431003171039h4ca1267bibc45b0d7d270b2a9@mail.gmail.com>
	<776506315DB04C3EBF2A7FDA610390AB@zillumina>
Message-ID: <4BACE08F.8020604@wur.nl>


The repository has been back for two days. But it appears to be down again.

Richard



-- 
Dr. Richard Finkers
Researcher Plant Breeding
Wageningen UR Plant Breeding
P.O. Box 16, 6700 AA, Wageningen, The Netherlands
Wageningen Campus, Building 107, Droevendaalsesteeg 1, 6708 PB
Wageningen, The Netherlands
Tel. +31-317-484165 Fax +31-317-418094
http://www.plantbreeding.wur.nl/
https://www.eu-sol.wur.nl/
https://cbsgdbase.wur.nl/
http://www.disclaimer-uk.wur.nl/


From mitlox at op.pl  Fri Mar 26 21:49:46 2010
From: mitlox at op.pl (xyz)
Date: Sat, 27 Mar 2010 11:49:46 +1000
Subject: [Biojava-l] Reading and writting Fastq files
Message-ID: <20100327114946.276925da@wp01>

Hello,
I could not find any examples of how to read or write fastq files.

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import org.biojava.bio.program.fastq.FastqReader;

public class Fastq2Fasta {
  public static void main(String[] args) throws FileNotFoundException  {
    BufferedReader br = new BufferedReader(new FileReader("fastq2fasta.fasta"));
  }
}

Are there any examples of how to work with fastq files?

Thank you in advance.

Best regards,

From holland at eaglegenomics.com  Sat Mar 27 04:18:04 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Sat, 27 Mar 2010 08:18:04 +0000
Subject: [Biojava-l] sort fasta file
In-Reply-To: <20100327100348.1f253bfb@wp01>
References: <20100320201718.4420a9b9@wp01>
	<ccc8042e1003211356t1ea534b6gf6d020983977881e@mail.gmail.com>
	<20100325232337.3021200a@wp01>
	<ccc8042e1003250717s36657fdchef532c512323b784@mail.gmail.com>
	<20100326195741.4799c398@wp01>
	<7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk>
	<84422194-0966-487C-BB32-537C6BA256CF@eaglegenomics.com>
	<1A0823B4-55A9-4C32-9A9C-8499EE26DEF5@roslin.ed.ac.uk>
	<E75CE797-3618-41AC-B6E7-BD53DBD0379A@eaglegenomics.com>
	<20100327100348.1f253bfb@wp01>
Message-ID: <2AC8333D-EE71-495E-9C12-98764D81FE2D@eaglegenomics.com>

Andy and I came to the conclusion yesterday that this is probably a bug with Java itself - somewhere in the readLine() method in BufferedReader. There's nothing in BioJava that could cause this kind of behaviour other than if it was being fed duff information by BufferedReader.

On 27 Mar 2010, at 00:03, xyz wrote:

> Please find the input fasta file attached. This file I created under
> Linux and I also work with BioJava under Linux. Nothing change if I
> created after the last sequence a new line.
> 
> On Fri, 26 Mar 2010 11:04:22 +0000
> Richard Holland wrote:
> 
>> I can't see anything in the code that would cause that
>> behaviour. :( Could you provide sample code and a supporting FASTA
>> file that replicates the problem?
>> 
> 
> <sortFasta.fasta>

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From mitlox at op.pl  Sat Mar 27 05:48:14 2010
From: mitlox at op.pl (xyz)
Date: Sat, 27 Mar 2010 19:48:14 +1000
Subject: [Biojava-l] sort fasta file
In-Reply-To: <E75CE797-3618-41AC-B6E7-BD53DBD0379A@eaglegenomics.com>
References: <20100320201718.4420a9b9@wp01>
	<ccc8042e1003211356t1ea534b6gf6d020983977881e@mail.gmail.com>
	<20100325232337.3021200a@wp01>
	<ccc8042e1003250717s36657fdchef532c512323b784@mail.gmail.com>
	<20100326195741.4799c398@wp01>
	<7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk>
	<84422194-0966-487C-BB32-537C6BA256CF@eaglegenomics.com>
	<1A0823B4-55A9-4C32-9A9C-8499EE26DEF5@roslin.ed.ac.uk>
	<E75CE797-3618-41AC-B6E7-BD53DBD0379A@eaglegenomics.com>
Message-ID: <20100327194814.1acc8655@wp01>

You can find the input fasta file here:
http://mitlox.republika.pl/sortFasta.fasta . I created this file under
Linux, and I also run BioJava under Linux. Nothing changes if I add a
new line after the last sequence.

On Fri, 26 Mar 2010 11:04:22 +0000
Richard Holland wrote:

> I can't see anything in the code that would cause that
> behaviour. :( Could you provide sample code and a supporting FASTA
> file that replicates the problem?
> 


From voisingreg at yahoo.fr  Sat Mar 27 07:24:01 2010
From: voisingreg at yahoo.fr (gregory voisin)
Date: Sat, 27 Mar 2010 11:24:01 +0000 (GMT)
Subject: [Biojava-l] Unsubcribe?
In-Reply-To: <20100327194814.1acc8655@wp01>
References: <20100320201718.4420a9b9@wp01>
	<ccc8042e1003211356t1ea534b6gf6d020983977881e@mail.gmail.com>
	<20100325232337.3021200a@wp01>
	<ccc8042e1003250717s36657fdchef532c512323b784@mail.gmail.com>
	<20100326195741.4799c398@wp01>
	<7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk>
	<84422194-0966-487C-BB32-537C6BA256CF@eaglegenomics.com>
	<1A0823B4-55A9-4C32-9A9C-8499EE26DEF5@roslin.ed.ac.uk>
	<E75CE797-3618-41AC-B6E7-BD53DBD0379A@eaglegenomics.com>
	<20100327194814.1acc8655@wp01>
Message-ID: <832231.74869.qm@web23207.mail.ird.yahoo.com>

Hi,
How do I unsubscribe from this list?
Thanks
greg


From mitlox at op.pl  Sat Mar 27 09:54:40 2010
From: mitlox at op.pl (xyz)
Date: Sat, 27 Mar 2010 23:54:40 +1000
Subject: [Biojava-l] Reading and writting Fastq files
In-Reply-To: <326ea8621003270510v49c00250xc5ad80bece2ae8cb@mail.gmail.com>
References: <20100327114946.276925da@wp01>
	<326ea8621003270510v49c00250xc5ad80bece2ae8cb@mail.gmail.com>
Message-ID: <20100327235440.23cffb47@wp01>

Hello,
I would like to use org.biojava.bio.program.fastq in order to read and
write Illumina fastq files.

Are there any BioJava examples of how to work with fastq files?

On Sat, 27 Mar 2010 17:40:21 +0530
jitesh dundas wrote:

> Hello,
> 
> Fasta files are  normal text files. Try parsing using normal text
> parsing methods.
> 
> If you could be more specific & tell me the format details,then I
> could help better.
> 
> btw,try using biojava ,the easy & better option if you want.
> 
> Regards,
> Jitesh Dundas
> 
> On 3/27/10, xyz <mitlox at op.pl> wrote:
> > Hello,
> > I could not find any examples how to read or write fastq files.
> >
> > import java.io.BufferedReader;
> > import java.io.FileNotFoundException;
> > import java.io.FileReader;
> > import org.biojava.bio.program.fastq.FastqReader;
> >
> > public class Fastq2Fasta {
> >   public static void main(String[] args) throws
> > FileNotFoundException  { BufferedReader br = new BufferedReader(new
> > FileReader("fastq2fasta.fasta"));
> >   }
> > }
> >
> > Are there any examples how to work with fastq files?
> >
> > Thank you in advance.
> >
> > Best regards,
> > _______________________________________________
> > Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-l
> >


From heuermh at acm.org  Sun Mar 28 00:27:16 2010
From: heuermh at acm.org (Michael Heuer)
Date: Sun, 28 Mar 2010 00:27:16 -0400 (EDT)
Subject: [Biojava-l] Reading and writting Fastq files
In-Reply-To: <20100327235440.23cffb47@wp01>
Message-ID: <Pine.GSO.4.44.1003280014470.28125-100000@shell3.shore.net>


Sorry, I haven't written up an example for the Biojava Cookbook yet.

The FASTQ package javadoc API is at

http://www.biojava.org/docs/api/org/biojava/bio/program/fastq/package-summary.html

If you want to read Illumina format FASTQ files, use

FastqReader reader = new IlluminaFastqReader();
for (Fastq fastq : reader.read(new File("in.fastq")))
{
  // ...
}

   michael


From mitlox at op.pl  Sun Mar 28 01:44:57 2010
From: mitlox at op.pl (xyz)
Date: Sun, 28 Mar 2010 15:44:57 +1000
Subject: [Biojava-l] Reading and writting Fastq files
In-Reply-To: <326ea8621003270743j2b4f9d24ib3899d415edf3fc3@mail.gmail.com>
References: <20100327114946.276925da@wp01>
	<326ea8621003270510v49c00250xc5ad80bece2ae8cb@mail.gmail.com>
	<20100327235440.23cffb47@wp01>
	<326ea8621003270743j2b4f9d24ib3899d415edf3fc3@mail.gmail.com>
Message-ID: <20100328154457.46e088a6@wp01>

Hello,
I could write my own methods to read and write fastq files.
However, I downloaded the BioJava source code, and the folder
src/org/biojava/bio/program/fastq contains the following files:

* AbstractFastqReader.java
* AbstractFastqWriter.java
* Fastq.java
* FastqBuilder.java
* FastqReader.java
* FastqVariant.java
* FastqWriter.java
* IlluminaFastqReader.java
* IlluminaFastqWriter.java
* SangerFastqReader.java
* SangerFastqWriter.java
* SolexaFastqReader.java
* SolexaFastqWriter.java

These look like exactly what I need, but unfortunately I do not know
how to use them.

On Sat, 27 Mar 2010 20:13:02 +0530
jitesh dundas wrote:

> Hello,
> 
> I could not find much info on that Q.Try the Biojava API for methods.
> 
> However, I would think of this problem as a simple text file parsing
> using BufferedReader and ByteInputStream based I/p ..You have to read
> the text file content byte by byte using a while loop. The loop will
> detect each column using the patterns (i haven't worked on fastq or
> biojava that much) in the text file, e.g. space tabs..
> Why don't you try reading this fastq file as a simple text file in
> java.
> 
> This is assuming that fastq are text files..Correct me if I am wrong..
> Java tutorial & forums have bulk of egs on that.
> 
> Try writing the code and send the fastq file with the java code if you
> face issues..
> 
> Hope this helps..
> 
> Regards,
> jd


From mitlox at op.pl  Sun Mar 28 03:20:40 2010
From: mitlox at op.pl (xyz)
Date: Sun, 28 Mar 2010 17:20:40 +1000
Subject: [Biojava-l] Reading and writting Fastq files
In-Reply-To: <Pine.GSO.4.44.1003280014470.28125-100000@shell3.shore.net>
References: <20100327235440.23cffb47@wp01>
	<Pine.GSO.4.44.1003280014470.28125-100000@shell3.shore.net>
Message-ID: <20100328172040.478de1a1@wp01>

Do not worry. I wrote the following code:

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import org.biojava.bio.program.fastq.Fastq;
import org.biojava.bio.program.fastq.FastqBuilder;
import org.biojava.bio.program.fastq.FastqReader;
import org.biojava.bio.program.fastq.FastqWriter;
import org.biojava.bio.program.fastq.IlluminaFastqReader;
import org.biojava.bio.program.fastq.IlluminaFastqWriter;

public class Fastq2Fasta {

  public static void main(String[] args) throws FileNotFoundException,
      IOException {
    FileInputStream inputFastq = new FileInputStream("fastq2fasta.fastq");
    FastqReader qReader = new IlluminaFastqReader();

    FileOutputStream outputFastq = new FileOutputStream("fastq2fastaTrim.fastq");
    FastqWriter qWriter = new IlluminaFastqWriter();

    for (Fastq fastq : qReader.read(inputFastq)) {
      System.out.println(fastq.getDescription());
      System.out.println(fastq.getSequence());
      String trimSeq = fastq.getSequence().substring(0,
          fastq.getSequence().length() - 6);
      System.out.println(trimSeq);
      System.out.println(fastq.getQuality());
      String trimQual = fastq.getQuality().substring(0,
          fastq.getQuality().length() - 6);
      System.out.println(trimQual);

      FastqBuilder trimFastq = new FastqBuilder();
      trimFastq.withDescription(fastq.getDescription());
      trimFastq.appendSequence(trimSeq);
      trimFastq.appendQuality(trimQual);

      qWriter.write(outputFastq, trimFastq.build());
    }
  }
}

and the input fastq file is:
@HWI-EAS406:5:1:0:1390#0/1
GGGTGATGGCCGCTGCCGATGGCGTCAAAACCCACC
+HWI-EAS406:5:1:0:1390#0/1
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOAAAAAA
@HWI-EAS406:5:1:0:1390#0/1
GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
+HWI-EAS406:5:1:0:1390#0/1
PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPBBBBBB
@HWI-EAS406:5:1:0:1390#0/1
GGGTGATGGCCGCTGCCGATGGCGTCAAACCCCACC
+HWI-EAS406:5:1:0:1390#0/1
QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQCCCCCC

Unfortunately, I get the following error:
HWI-EAS406:5:1:0:1390#0/1
GGGTGATGGCCGCTGCCGATGGCGTCAAAACCCACC
GGGTGATGGCCGCTGCCGATGGCGTCAAAA
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOAAAAAA
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
Exception in thread "main" java.io.IOException: sequence HWI-EAS406:5:1:0:1390#0/1 not fastq-illumina format, was fastq-sanger
        at org.biojava.bio.program.fastq.IlluminaFastqWriter.validate(IlluminaFastqWriter.java:41)
        at org.biojava.bio.program.fastq.AbstractFastqWriter.append(AbstractFastqWriter.java:67)
        at org.biojava.bio.program.fastq.AbstractFastqWriter.write(AbstractFastqWriter.java:143)
        at org.biojava.bio.program.fastq.AbstractFastqWriter.write(AbstractFastqWriter.java:125)
        at Fastq2Fasta.main(Fastq2Fasta.java:37)
Java Result: 1

What did I do wrong?


From andreas at sdsc.edu  Sun Mar 28 13:44:32 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 28 Mar 2010 10:44:32 -0700
Subject: [Biojava-l] Unsubcribe?
In-Reply-To: <832231.74869.qm@web23207.mail.ird.yahoo.com>
References: <20100320201718.4420a9b9@wp01> <20100325232337.3021200a@wp01>
	<ccc8042e1003250717s36657fdchef532c512323b784@mail.gmail.com>
	<20100326195741.4799c398@wp01>
	<7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk>
	<84422194-0966-487C-BB32-537C6BA256CF@eaglegenomics.com>
	<1A0823B4-55A9-4C32-9A9C-8499EE26DEF5@roslin.ed.ac.uk>
	<E75CE797-3618-41AC-B6E7-BD53DBD0379A@eaglegenomics.com>
	<20100327194814.1acc8655@wp01>
	<832231.74869.qm@web23207.mail.ird.yahoo.com>
Message-ID: <59a41c431003281044y36137b05nd993e8e51ef7484e@mail.gmail.com>

We are using mailman for our mailing lists; you can unsubscribe at:

http://www.biojava.org/mailman/listinfo/biojava-l

Andreas

On Sat, Mar 27, 2010 at 4:24 AM, gregory voisin <voisingreg at yahoo.fr> wrote:

> Hi,
> How to unsubscribe of this list ?
> thanks
> greg
>


From heuermh at acm.org  Mon Mar 29 22:01:23 2010
From: heuermh at acm.org (Michael Heuer)
Date: Mon, 29 Mar 2010 22:01:23 -0400 (EDT)
Subject: [Biojava-l] Reading and writting Fastq files
In-Reply-To: <20100328172040.478de1a1@wp01>
Message-ID: <Pine.GSO.4.44.1003292153001.17205-100000@shell3.shore.net>


FastqBuilder defaults to the Sanger variant, see

http://www.biojava.org/docs/api/org/biojava/bio/program/fastq/FastqBuilder.html#DEFAULT_VARIANT


In your code, you just need to specify the Illumina variant

FastqBuilder trimFastq = new FastqBuilder()
  .withVariant(FastqVariant.FASTQ_ILLUMINA)
  .withDescription(fastq.getDescription())
  .appendSequence(trimSeq)
  .appendQuality(trimQual);
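
As background on why the variant matters: the FASTQ variants differ in
how the quality line encodes scores. Sanger FASTQ stores scores as ASCII
characters with an offset of 33, while Illumina 1.3+ FASTQ uses an offset
of 64, so the same quality string decodes to different scores under each
variant. Below is a minimal stand-alone sketch of that decoding. It is
not BioJava code; the class QualityOffsets and its members are made up
for illustration.

```java
import java.util.Arrays;

final class QualityOffsets {
    // ASCII offsets for the two encodings (names are hypothetical, values are the usual ones).
    public static final int SANGER_OFFSET = 33;
    public static final int ILLUMINA_OFFSET = 64;

    /** Decodes each quality character into a score by subtracting the ASCII offset. */
    public static int[] decode(String quality, int offset) {
        int[] scores = new int[quality.length()];
        for (int i = 0; i < quality.length(); i++) {
            scores[i] = quality.charAt(i) - offset;
        }
        return scores;
    }

    public static void main(String[] args) {
        String quality = "OOOAAA"; // same characters as in the sample reads in this thread
        System.out.println("as Sanger:   " + Arrays.toString(decode(quality, SANGER_OFFSET)));
        System.out.println("as Illumina: " + Arrays.toString(decode(quality, ILLUMINA_OFFSET)));
    }
}
```

Read as Illumina, 'O' is 15 and 'A' is 1; read as Sanger, the same
characters would be 46 and 32, which is why a record has to carry the
right variant.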


Please let me know if you have any API or doc suggestions, as this stuff
has not been used much by anyone other than myself.

   michael


From mitlox at op.pl  Tue Mar 30 07:50:47 2010
From: mitlox at op.pl (xyz)
Date: Tue, 30 Mar 2010 21:50:47 +1000
Subject: [Biojava-l] Reading and writting Fastq files
In-Reply-To: <Pine.GSO.4.44.1003292153001.17205-100000@shell3.shore.net>
References: <20100328172040.478de1a1@wp01>
	<Pine.GSO.4.44.1003292153001.17205-100000@shell3.shore.net>
Message-ID: <20100330215047.084f6b00@wp01>

Thank you, it works. However, after I extended the code with
RichSequence.IOTools.writeFasta(outputFasta, trimSeq, ns,
fastq.getDescription());
in order to also get a trimmed fasta file, I got the following error:

Fastq2Fasta.java:51: cannot find symbol
symbol  : method writeFasta(java.io.FileOutputStream,java.lang.String,org.biojavax.SimpleNamespace,java.lang.String)
location: class org.biojavax.bio.seq.RichSequence.IOTools
      RichSequence.IOTools.writeFasta(outputFasta, trimSeq, ns, fastq.getDescription());
1 error

Complete Code:
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import org.biojava.bio.program.fastq.Fastq;
import org.biojava.bio.program.fastq.FastqBuilder;
import org.biojava.bio.program.fastq.FastqReader;
import org.biojava.bio.program.fastq.FastqVariant;
import org.biojava.bio.program.fastq.FastqWriter;
import org.biojava.bio.program.fastq.IlluminaFastqReader;
import org.biojava.bio.program.fastq.IlluminaFastqWriter;
import org.biojavax.SimpleNamespace;
import org.biojavax.bio.seq.RichSequence;


public class Fastq2Fasta {

  public static void main(String[] args) throws FileNotFoundException,
      IOException {

    FileInputStream inputFastq = new FileInputStream("fastq2fasta.fastq");
    FastqReader qReader = new IlluminaFastqReader();

    FileOutputStream outputFastq = new FileOutputStream("fastq2fastaTrim.fastq");
    FastqWriter qWriter = new IlluminaFastqWriter();

    SimpleNamespace ns = new SimpleNamespace("biojava");

    FileOutputStream outputFasta = new FileOutputStream("fastq2fastaTrim.fasta");

    for (Fastq fastq : qReader.read(inputFastq)) {
      System.out.println(fastq.getDescription());
      System.out.println(fastq.getSequence());
      String trimSeq = fastq.getSequence().substring(0,
          fastq.getSequence().length() - 6);
      System.out.println(trimSeq);
      System.out.println(fastq.getQuality());
      String trimQual = fastq.getQuality().substring(0,
          fastq.getQuality().length() - 6);
      System.out.println(trimQual);

      FastqBuilder trimFastq = new FastqBuilder();
      trimFastq.withVariant(FastqVariant.FASTQ_ILLUMINA);
      trimFastq.withDescription(fastq.getDescription());
      trimFastq.appendSequence(trimSeq);
      trimFastq.appendQuality(trimQual);

      qWriter.write(outputFastq, trimFastq.build());

      RichSequence.IOTools.writeFasta(outputFasta, trimSeq, ns,
          fastq.getDescription());
    }
  }
}

What did I do wrong?

Suggestions:
1)
After trimming, the '+' line that precedes the quality string no longer
repeats the read identifier:

@HWI-EAS406:5:1:0:1390#0/1
GGGTGATGGCCGCTGCCGATGGCGTCAAAA
+
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO

This reduces the size of the files, but is this form compatible with
SOAP and TopHat?

2)
I work with fastq files of up to 6 GB and have not run any benchmarks
with different buffer/stream combinations on large text files, so I am
not sure whether a plain FileInputStream or FileOutputStream is enough.
BioJavaX uses BufferedReader br = new BufferedReader(new FileReader());
is there any speed difference?

Overall I think the API looks good, and for the documentation you could
use this code and put it on the BioJava site.


From heuermh at acm.org  Wed Mar 31 23:56:42 2010
From: heuermh at acm.org (Michael Heuer)
Date: Wed, 31 Mar 2010 23:56:42 -0400 (EDT)
Subject: [Biojava-l] Reading and writting Fastq files
In-Reply-To: <20100330215047.084f6b00@wp01>
Message-ID: <Pine.GSO.4.44.1003312334350.18726-100000@shell3.shore.net>

xyz wrote:

> Thank you it works, but after I extended the code with
> RichSequence.IOTools.writeFasta(outputFasta, trimSeq, ns,
> fastq.getDescription());
> in order to get also a trimmed fasta file I got the following error:
>
> Fastq2Fasta.java:51: cannot
> find symbol symbol  : method
> writeFasta(java.io.FileOutputStream,java.lang.String,org.biojavax.SimpleNamespace,java.lang.String)
> location: class org.biojavax.bio.seq.RichSequence.IOTools
> RichSequence.IOTools.writeFasta(outputFasta, trimSeq, ns,
> fastq.getDescription()); 1 error

The fastq package has not yet been integrated with biojava core or the
biojavax packages.  If you would like to use RichSequence.IOTools, you
would need to create a RichSequence from each Fastq object before writing.

Something like

import static ...RichSequence.Tools.*;
import static ...RichSequence.IOTools.*;

Fastq fastq = ...;
Namespace namespace = ...;
RichSequence richSequence = createRichSequence(
  namespace,
  fastq.getDescription(),
  fastq.getSequence(),
  DNATools.getDNA());

writeFasta(outputStream, richSequence, namespace);

may work.
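
If the RichSequence machinery is more than you need, a FASTA record is
simple enough to write by hand. Here is a minimal sketch that uses only
the JDK. The helper class FastaLines is hypothetical and not part of
BioJava; it formats a '>' header line followed by the sequence wrapped
at a fixed width.

```java
final class FastaLines {
    /** Formats one FASTA record: a '>' header line, then the sequence wrapped at 'width' columns. */
    public static String toFasta(String description, String sequence, int width) {
        if (width <= 0) {
            throw new IllegalArgumentException("width must be positive");
        }
        StringBuilder sb = new StringBuilder();
        sb.append('>').append(description).append('\n');
        // Emit the sequence in chunks of at most 'width' characters.
        for (int start = 0; start < sequence.length(); start += width) {
            int end = Math.min(start + width, sequence.length());
            sb.append(sequence, start, end).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Description and trimmed sequence taken from the sample read in this thread.
        System.out.print(toFasta("HWI-EAS406:5:1:0:1390#0/1",
                "GGGTGATGGCCGCTGCCGATGGCGTCAAAA", 60));
    }
}
```

The returned string could then be written to the FASTA output stream for
each trimmed read.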


> Suggestions:
> 1)
> After I trimmed the fastq files the header information for quality
> is empty
>
> @HWI-EAS406:5:1:0:1390#0/1
> GGGTGATGGCCGCTGCCGATGGCGTCAAAA
> +
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
>
> this reduced the size of the files but is it compatible with
> SOAP and TopHat?

Sorry, not sure what you are asking here.


> 2)
> I was using fastq files up to 6 GBytes and I have not run any benchmarks
> with different Buffer/stream combination on big text files and therefore
> I am not sure that is enough to use just FileInputStream or
> FileOutputStream. BioJavaX is using BufferedReader br = new
> BufferedReader(new FileReader()) are there any speed difference?

AbstractFastqReader.read(InputStream) uses a BufferedReader, and all the
other read methods pass through that one.
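
To illustrate the general pattern of buffered record reading, here is a
minimal sketch that uses only the JDK. It is not the AbstractFastqReader
implementation; the class FastqLines is hypothetical, and it assumes the
simple four-line record layout of the sample file in this thread (no
wrapped sequence or quality lines).

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class FastqLines {
    /** Reads four-line FASTQ records; each array holds {description, sequence, quality}. */
    public static List<String[]> read(InputStream in) {
        List<String[]> records = new ArrayList<>();
        // The BufferedReader fetches the stream in large chunks, so readLine() stays cheap.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.US_ASCII))) {
            String header;
            while ((header = reader.readLine()) != null) {
                if (header.isEmpty()) {
                    continue; // tolerate blank lines between records
                }
                String sequence = reader.readLine();
                String plus = reader.readLine();
                String quality = reader.readLine();
                if (!header.startsWith("@") || sequence == null || plus == null
                        || !plus.startsWith("+") || quality == null
                        || quality.length() != sequence.length()) {
                    throw new IllegalArgumentException("malformed FASTQ record near: " + header);
                }
                records.add(new String[] {header.substring(1), sequence, quality});
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        String data = "@read1\nACGT\n+\nOOOO\n";
        InputStream in = new ByteArrayInputStream(data.getBytes(StandardCharsets.US_ASCII));
        for (String[] r : read(in)) {
            System.out.println(r[0] + " " + r[1] + " " + r[2]);
        }
    }
}
```

For multi-gigabyte files you would process each record as it is read
rather than collect them all in a list; the list is only there to keep
the sketch short.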

   michael


From rmb32 at cornell.edu  Fri Mar 26 03:44:09 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 26 Mar 2010 00:44:09 -0700
Subject: [Biojava-l] GSoC mentors mailing list
Message-ID: <4BAC65C9.307@cornell.edu>

Hi all,

If you have volunteered to be a possible GSoC mentor, and have not 
already been subscribed to the (mentors-only) gsoc-mentors mailing list, 
send me an email and I'll subscribe you.

Rob Buels
OBF GSoC 2010 Admin


From rmb32 at cornell.edu  Fri Mar 26 12:30:30 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 26 Mar 2010 09:30:30 -0700
Subject: [Biojava-l] Announcing OBF Summer of Code - please forward!
Message-ID: <4BACE126.1030500@cornell.edu>

Hi all,

Here's an advertising-ready announcement for OBF's Summer of Code, 
thanks to Christian Zmasek and Hilmar Lapp for their excellent writing.

Student applications are due April 9!  Please spread it widely, we need 
to reach lots of students with it!

Rob Buels
OBF GSoC 2010 Admin


============================================================

*** Please disseminate widely at your local institutions ***
*** including posting to message and job boards, so that ***
*** we reach as many students as possible.               ***

============================================================


OPEN BIOINFORMATICS FOUNDATION SUMMER OF CODE 2010

Applications due 19:00 UTC, April 9, 2010.
http://www.open-bio.org/wiki/Google_Summer_of_Code

The Open Bioinformatics Foundation Summer of Code program provides a 
unique opportunity for undergraduate, masters, and PhD students to 
obtain hands-on experience writing and extending open-source software 
for bioinformatics under the mentorship of experienced developers from 
around the world. The program is the participation of the Open 
Bioinformatics Foundation (OBF) as a mentoring organization in the 
Google Summer of Code(tm) (http://code.google.com/soc/).

Students successfully completing the 3 month program receive a $5,000
USD stipend, and may work entirely from their home or home institution.
Participation is open to students from any country in the world except
countries subject to US trade restrictions. Each student will have at
least one dedicated mentor to show them the ropes and help them complete
their project.

The Open Bioinformatics Foundation is particularly seeking students 
interested in both bioinformatics (computational biology) and software 
development. Some initial project ideas are listed on the website. These 
range from Galaxy phylogenetics pipeline development in Biopython to 
lightweight sequence objects and lazy parsing in BioPerl, a DAS Server 
for large files on local filesystems, and mapping Java libraries to 
Perl/Ruby/Python using Biolib+SWIG+JNI.  All project ideas are flexible 
and many can be adjusted in scope to match the skills of the student. We 
also welcome and encourage students proposing their own project ideas; 
historically some of the most successful Summer of Code projects are 
ones proposed by the students themselves.

TO APPLY: Apply online at the Google Summer of Code website 
(http://socghop.appspot.com/), where you will also find GSoC program 
rules and eligibility requirements. The 12-day application period for 
students runs from Monday, March 29 through Friday, April 9th, 2010.

INQUIRIES:

We strongly encourage all interested students to get in touch with us 
with their ideas as early on as possible.  See the OBF GSoC page for 
contact details.

2010 OBF Summer of Code:
http://www.open-bio.org/wiki/Google_Summer_of_Code

Google Summer of Code FAQ:
http://socghop.appspot.com/document/show/program/google/gsoc2010/faqs


From sheoran143 at gmail.com  Wed Mar 24 21:19:29 2010
From: sheoran143 at gmail.com (Deepak Sheoran)
Date: Wed, 24 Mar 2010 20:19:29 -0500
Subject: [Biojava-l] Bug fix for Biojava in regard to email with subject :(
 Hibernate Exception and suggestion for change in BioSqlSchema)
Message-ID: <4BAABA21.4000301@gmail.com>

I am sending this email again because I did not get any response on 
whether these bugs have been patched or whether the message got lost 
somewhere on the mailing list. I don't know how to apply these patches 
myself, so I am counting on you to apply them and reply so that I know 
the issue is fixed.


Thanks
Deepak Sheoran


Hi
In response to the bug fix suggested by Richard I have created some 
patches. We need to apply these to stop biojava from processing the 
references of a genbank record incorrectly, which causes further 
hibernate exceptions. After applying the patches, the reference 
resolution code will test the pubmed or medline id, then if there is no 
match test author/title/location, and if there is still no match create 
a new reference. I tested it with GenbankRelease 175 and gained almost 
3159 more records in my database.
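
A minimal sketch of that three-step lookup order, using in-memory maps in 
place of database queries. The names Ref and ReferenceResolver are made up 
for illustration and are not BioJava classes; the attached patches are the 
actual change, and this only shows the intended order of tests. The titles 
in the demo are abbreviated from the PubMed 16790744 example quoted further 
down.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a stored reference; NOT BioJava's DocRef.
final class Ref {
    final String pubmedId;  // null when the record has no PUBMED/MEDLINE id
    final String authors;
    final String title;
    final String location;

    Ref(String pubmedId, String authors, String title, String location) {
        this.pubmedId = pubmedId;
        this.authors = authors;
        this.title = title;
        this.location = location;
    }
}

// Hypothetical resolver: two in-memory indexes stand in for database queries.
final class ReferenceResolver {
    private final Map<String, Ref> byId = new HashMap<>();
    private final Map<String, Ref> byText = new HashMap<>();
    private int created = 0;

    private static String textKey(String authors, String title, String location) {
        // NUL separators keep the three fields from running into each other.
        return authors + '\0' + title + '\0' + location;
    }

    Ref resolve(String pubmedId, String authors, String title, String location) {
        // Step 1: match on the PUBMED/MEDLINE id when the record carries one.
        if (pubmedId != null && byId.containsKey(pubmedId)) {
            return byId.get(pubmedId);
        }
        // Step 2: fall back to an exact author/title/location match.
        String key = textKey(authors, title, location);
        if (byText.containsKey(key)) {
            return byText.get(key);
        }
        // Step 3: nothing matched, so register a new reference.
        Ref ref = new Ref(pubmedId, authors, title, location);
        if (pubmedId != null) {
            byId.put(pubmedId, ref);
        }
        byText.put(key, ref);
        created++;
        return ref;
    }

    int createdCount() {
        return created;
    }

    public static void main(String[] args) {
        ReferenceResolver resolver = new ReferenceResolver();
        String authors = "Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.";
        String location = "Infect. Immun. 74 (7), 3715-3726 (2006)";
        // Same PubMed id, titles differing only in capitalisation (abbreviated).
        Ref first = resolver.resolve("16790744", authors,
                "Intrastrain Heterogeneity of the mgpB Gene", location);
        Ref second = resolver.resolve("16790744", authors,
                "Intrastrain heterogeneity of the mgpB gene", location);
        System.out.println("same reference reused: " + (first == second));
        System.out.println("references created: " + resolver.createdCount());
    }
}
```

Because the id is tested first, two records that differ only in the 
spelling of title, authors or location resolve to one reference instead of 
producing a second row for the same dbxref_id.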

Can somebody please have a look at the second issue and fix it:
"

2. I think that's a bug (compound locations with null features) but not sure why. Could be that the process of constructing a CompoundRichLocation is somehow losing the feature reference from the original SimpleRichLocation. Again I can't investigate until March - can someone else take a look at the code? (A good starting point would be to look at how a CompoundRichLocation decides to select the feature from the SimpleRichLocations it is made up from).

"

Also, I am planning to build a bridge between biosql databases loaded 
using bioperl and biojava. Below are some of my investigations; can you 
suggest some direction on it?
Have a look at the attached files:
1) Biojava_BioPerl_Diff.xls  ==> a view of the tables in which a genbank 
record is stored in a biosql instance by bioperl and by biojava
2) GenbankRecord.doc  ==> a Word document with a genbank record showing 
where its information goes in biosql using bioperl and biojava
3) BioSqlRichobjectBuilder.patch ==> patch needed for the 
BioSqlRichObjectBuilder.java class
4) GenBankFormat.patch ==> patch needed for the GenBankFormat.java class


Thanks
Deepak Sheoran


-------- Original Message --------
Subject: 	Re: Hibernate Exception and suggestion for change in BioSqlSchema
Date: 	Tue, 9 Feb 2010 20:34:32 +1300
From: 	Richard Holland <holland at eaglegenomics.com>
To: 	Deepak Sheoran <sheoran143 at gmail.com>
CC: 	biojava-l at biojava.org


Hi. It's possible that your original email didn't make it to the list because it is HTML format, and the list only accepts plain text.

However, in answer to your two questions:

   1. The code that does the resolution of references might be better if it looks up existing IDs rather than using author, title, location to identify existing records. I would suggest modifying it to a three-step process - test ID, then if no match then test author/title/location, then if still no match create a new reference. Could someone do that? (I'm unable to do anything until late March).

   2. I think that's a bug (compound locations with null features) but not sure why. Could be that the process of constructing a CompoundRichLocation is somehow losing the feature reference from the original SimpleRichLocation. Again I can't investigate until March - can someone else take a look at the code? (A good starting point would be to look at how a CompoundRichLocation decides to select the feature from the SimpleRichLocations it is made up from).
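
One way to narrow this down before calling save is to walk the features and 
list every location, at any nesting depth, whose feature back-reference is 
null. The sketch below uses made-up stand-in types (SketchFeature, 
SketchLocation, LocationAudit), not the BioJava RichFeature or RichLocation 
API, so it only illustrates the traversal; it does not claim to show where 
the reference is actually lost.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a feature; NOT a BioJava class.
final class SketchFeature {
    final String name;
    final SketchLocation location;

    SketchFeature(String name, SketchLocation location) {
        this.name = name;
        this.location = location;
    }
}

// Hypothetical stand-in for a location. A location with members models a
// compound location (e.g. a join); one without members models a simple one.
final class SketchLocation {
    SketchFeature feature;              // back-reference; null models the failure
    final List<SketchLocation> members;

    SketchLocation(SketchLocation... members) {
        this.members = Arrays.asList(members);
    }
}

final class LocationAudit {
    /** Returns one path per location, at any depth, whose feature is null. */
    static List<String> findDetached(List<SketchFeature> features) {
        List<String> problems = new ArrayList<>();
        for (int i = 0; i < features.size(); i++) {
            SketchFeature f = features.get(i);
            collect(f.location, f.name + "[" + i + "]", problems);
        }
        return problems;
    }

    private static void collect(SketchLocation loc, String path, List<String> out) {
        if (loc.feature == null) {
            out.add(path);
        }
        for (int j = 0; j < loc.members.size(); j++) {
            collect(loc.members.get(j), path + ".member[" + j + "]", out);
        }
    }

    public static void main(String[] args) {
        SketchLocation part1 = new SketchLocation();
        SketchLocation part2 = new SketchLocation();
        SketchLocation join = new SketchLocation(part1, part2);
        SketchFeature cds = new SketchFeature("CDS", join);
        join.feature = cds;
        part1.feature = cds;
        // part2.feature is deliberately left null to model the reported state.
        System.out.println(findDetached(Arrays.asList(cds)));
    }
}
```

Running a check like this once right after parsing and once in the 
exception handler would show whether the null appears during construction 
of the compound location or only during the save.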

cheers,
Richard

On 9 Feb 2010, at 20:21, Deepak Sheoran wrote:

>
>  Hi Richard
>
>  Below is the email which I sent to the Biojava-l mailing list. It was never posted on the mailing list server and I did not get any response, so please have a look at it and tell me what the solution to the problem described in the message could be.
>
>
>  Thanks
>  Deepak Sheoran
>  -------- Original Message --------
>  Subject:	Hibernate Exception and suggestion for change in BioSqlSchema
>  Date:	Wed, 03 Feb 2010 08:07:35 -0600
>  From:	Deepak Sheoran<sheoran143 at gmail.com>
>  To:	biojava-l at lists.open-bio.org
>
>  Hi guys,
>
>  A couple of days ago I had a problem with a hibernate exception, which has since been resolved; the earlier email is at: http://old.nabble.com/Hibernate-Exception-when-persisting-some-richsequence-object-to-biosql-schema-to27299245.html
>  Following Richard's suggestion in the above link I was able to resolve some of the issues, but then I got stuck on another hibernate error and decided to investigate. Below are the facts and information I found, which I guess are going to affect all of us.
>  	- The "Reference" table in the bioSql schema has a unique constraint on the "dbxref_id" column (CONSTRAINT reference_dbxref_id_key UNIQUE (dbxref_id)), which means only one entry in the reference table can use a given dbxref_id.
>  This works well except when there are small variations in the values of the "location", "title" and "authors" columns and all these variants refer to the same PUBMED_ID. In that case we can't persist or create a richsequence object.
>  When you tie RichObjectFactory to an active hibernate session, the class "BioSqlRichObjectBuilder" has a method "buildObject(Class clazz, List paramsList)" which is responsible for looking up the object in the database; if it finds one it returns that object, otherwise it tries to persist the new object into the database.
>  The problem is with the part of that method below:
>  ... LineNumber: 114
>  else if (SimpleDocRef.class.isAssignableFrom(clazz)) {
>      queryType = "DocRef";
>      // convert List constructor to String representation for query
>      ourParamsList.set(0, DocRefAuthor.Tools.generateAuthorString((List)ourParamsList.get(0), true));
>      if (ourParamsList.size()<3) {
>          queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title is null";
>      } else {
>          queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title = ?";
>      }
>  }
>  ... LineNumber: 123
>  When hibernate searches the database, it won't find the existing record in the "reference" table because the two records differ under string comparison, so it returns a new object to "GenbankFormat", to the following piece of code:
>  ... LineNumber: 447
>  else {
>      try {
>          CrossRef cr = (CrossRef)RichObjectFactory.getObject(SimpleCrossRef.class,new Object[]{dbname, raccession, new Integer(0)});
>          RankedCrossRef rcr = new SimpleRankedCrossRef(cr, ++rcrossrefCount);
>          rlistener.getCurrentFeature().addRankedCrossRef(rcr);
>      } catch (ChangeVetoException e) {
>          throw new ParseException(e+", accession:"+accession);
>      }
>  }
>  ... LineNumber: 455
>  Then we add that object to rlistener and move on to the next part of the genbank record. When biojava then searches for a new crossref in the database, it tries to persist the earlier object and gets a hibernate exception for violating the unique constraint on the "dbxref_id" column.
>
>  The only ways to get these records into the database are:
>  		- The easy solution, and the one I used to test my theory, is to change the bioSql schema so that it allows a many-to-one relation between the "reference" and "dbxref" tables. This even makes sense, because one paper can be cited with many different variations, and the change lets us store that information too. But this is something the BioSql people have to decide, and I don't know how to approach them.
>  		- The second solution, which is slightly more difficult to implement, is to change the way "BioSqlRichObjectBuilder.buildObject(Class clazz,List paramsList)" decides whether a particular DocRef already exists in the database. I mean testing all possible string variations of the authors, location and title of the DocRef we are searching for. This has many complications and may slow down the creation of a richsequence object when RichObjectFactory is linked with an active hibernate session.
>
>  Example: Below is a sample of what I have in my local biosql schema, which includes the modification I suggested. (The dbxref_id column shows the PubMed id: I replaced the local dbxref_id in this table with the pubmed_id stored in the "dbxref" table, for easy reference with the outside world in this email.)
>  Columns, one value per line and six lines per record: Reference_id, Dbxref_id, Location, Title, Authors, crc
>  216
>  18554304
>  FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008)
>  Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model
>  Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.
>  9E940E01F4BE3CD0
>  230
>  18554304
>  FEMS Microbiol. Ecol. 66 (3), 528-536 (2008)
>  Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model
>  Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.
>  D3BC0C17F3F786C9
>  415
>  16790744
>  Infect. Immun. 74 (7), 3715-3726 (2006)
>  Intrastrain Heterogeneity of the mgpB Gene in Mycoplasma genitalium Is Extensive In Vitro and In Vivo and Suggests that Variation Is Generated via Recombination with Repetitive Chromosomal Sequences
>  Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.
>  60AEDFA0CEEACC38
>  969
>  16790744
>  Infect. Immun. 74 (7), 3715-3726 (2006)
>  Intrastrain heterogeneity of the mgpB gene in mycoplasma genitalium is extensive in vitro and in vivo and suggests that variation is generated via recombination with repetitive chromosomal sequences
>  Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.
>  4B1232999F6E8130
>  929
>  8688087
>  Science 273 (5278), 1058-1073 (1996)
>  Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii
>  Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J.-F., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.L., Geoghagen,N.S.M., Weidman,J.F., Fuhrmann,J.L., Presley,E.A., Nguyen,D., Utterback,T.R., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.P., Borodovsky,M., Klenk,H.-P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C.
>  3E79B40DD2AAA2B7
>  932
>  8688087
>  Science 273 (5278), 1058-1073 (1996)
>  Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii
>  Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.D., Geoghagen,N.S., Weidman,J.F., Fuhrmann,J.L., Nguyen,D.T., Utterback,T., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.B., Borodovsky,M., Klenk,H.P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C.
>  094EB3384F8D6DE8
>  1426
>  10684935
>  Nucleic Acids Res. 28 (6), 1397-1406 (2000)
>  Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39
>  Read,T.D., Brunham,R.C., Shen,C., Gill,S.R., Heidelberg,J.F., White,O., Hickey,E.K., Peterson,J., Umayam,L.A., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S.L., Eisen,J. and Fraser,C.M.
>  357648D8FD8C6C8A
>  1481
>  10684935
>  Nucleic Acids Res. 28 (6), 1397-1406 (2000)
>  Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39
>  Read,T., Brunham,R., Shen,C., Gill,S., Heidelberg,J., White,O., Hickey,E., Peterson,J., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S., Eisen,J. and Fraser,C.
>  115411EB2DEE5654
>  1497
>  14689165
>  Arch. Microbiol. 181 (2), 144-154 (2004)
>  The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner
>  Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E.
>  4D5D376EECCD186B
>  1501
>  14689165
>  Arch. Microbiol. 181 (2), 144-154 (2004)
>  The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner
>  Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., Del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E.
>  4D57954EECDED66B
>  1556
>  18060065
>  PLoS ONE 2 (12), E1271 (2007)
>  Analysis of the Neurotoxin Complex Genes in Clostridium botulinum A1-A4 and B1 Strains: BoNT/A3, /Ba4 and /B1 Clusters Are Located within Plasmids
>  Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,A.C., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S.
>  698688FB6DB95247
>  1559
>  18060065
>  PLoS ONE 2 (12), E1271 (2007)
>  Analysis of the neurotoxin complex genes in Clostridium botulinum A1-A4 and B1 strains: BoNT/A3, /Ba4 and /B1 clusters are located within plasmids
>  Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,C.A., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S.
>  E25E1BA99DB18F3D
>
>  	- The second kind of error I got was: org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
>  		- This means that in the richsequence object some feature has a location object whose feature is set to null.
>  		- My observations:
>  			- It usually occurs when you try to persist a richsequence object to the database, and it affects features that have a CompoundRichLocation, usually "join" and "complement" in the CDS region of a genbank record.
>  			- After catching the hibernate exception I went through all the features, and either biojava or hibernate had changed the object type of a CompoundRichLocation to SimpleRichLocation and set the feature variable to null.
>  			- Below are screenshots from one of my tests.
>  				- Settings before trying to persist the richsequence object to the database:
>
>  <Mail Attachment.png>
>
>  		- After trying to persist the richsequence object to the database, inside the hibernate exception catch block:
>
>  <Mail Attachment.png>
>
>  		- So my question is why this is happening, and how to stop it or how to get these records into the database. I have no clue why it is happening.
>  		- Some extra information to make things clearer:
>  			- Below are some LOCUS lines from genbank records for which I know the location causing the error (the CDS region), with the array index in the richSequence.feature ArrayList object.
>  				- LOCUS       AE001439             1643831 bp    DNA     circular BCT 19-JAN-2006
>  					- richSequence.feature index: 2540; line number in the genbank record: 22115
>  				- LOCUS       CP001189             3887492 bp    DNA     circular BCT 16-OCT-2008
>  					- richSequence.feature index: 127; line number in the genbank record: 2137
>  				- LOCUS       CP001292              328635 bp    DNA     circular BCT 17-DEC-2008
>  					- richSequence.feature index: 389; line number in the genbank record: 3632
>  				- LOCUS       AM279694              238517 bp    DNA     linear   BCT 23-OCT-2008
>  					- richSequence.feature index: 47; line number in the genbank record: 4841
>  				- LOCUS       CR931663               18517 bp    DNA     linear   BCT 18-SEP-2008
>  					- richSequence.feature index: 45; line number in the genbank record: 442
>  		- The complete exception message:
>  org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
>          at org.hibernate.engine.Nullability.checkNullability(Nullability.java:72)
>          at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:290)
>          at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
>          at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
>          at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
>          at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
>          at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
>          at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
>          at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
>          at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
>          at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
>          at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
>          at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
>          at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
>          at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
>          at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
>          at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
>          at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
>          at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
>          at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
>          at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
>          at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
>          at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
>          at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
>          at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
>          at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
>          at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
>          at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
>          at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
>          at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
>          at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
>          at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
>          at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
>          at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
>          at org.hibernate.event.def.DefaultSaveEventListener.saveWithGeneratedOrRequestedId(DefaultSaveEventListener.java:33)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
>          at org.hibernate.event.def.DefaultSaveEventListener.performSaveOrUpdate(DefaultSaveEventListener.java:27)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
>          at org.hibernate.impl.SessionImpl.fireSave(SessionImpl.java:535)
>          at org.hibernate.impl.SessionImpl.save(SessionImpl.java:523)
>          at trashtesting.GenBankLoaderTesting.main(GenBankLoaderTesting.java:78)
>
>

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E:holland at eaglegenomics.com
http://www.eaglegenomics.com/

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Biojava_BioPerl_diff.xls
Type: application/vnd.ms-excel
Size: 346624 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biojava-l/attachments/20100324/7ecffa4a/attachment-0001.xls>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: BioSqlRichObjectBuilder.patch
URL: <http://lists.open-bio.org/pipermail/biojava-l/attachments/20100324/7ecffa4a/attachment-0002.pl>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: GenbankFormat.patch
URL: <http://lists.open-bio.org/pipermail/biojava-l/attachments/20100324/7ecffa4a/attachment-0003.pl>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: GenbankRecord.doc
Type: application/msword
Size: 59392 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biojava-l/attachments/20100324/7ecffa4a/attachment-0001.doc>

From sheoran143 at gmail.com  Mon Mar  8 21:11:05 2010
From: sheoran143 at gmail.com (Deepak Sheoran)
Date: Mon, 08 Mar 2010 15:11:05 -0600
Subject: [Biojava-l] Bug fix for biojava class RichObjectFactory.java in
 current maven based project
Message-ID: <4B9567E9.7080909@gmail.com>

Hi
I was building a local version of the current maven project on my machine 
so that I can fix some reference-related bugs in biojava. But when I built 
the local version and tried to use it, I got an error on the method
RichObjectFactory.connectToBioSql(Object session) of the current version of 
bio-java live. When I looked at it I saw this comment:

     "// commenting out for the moment, since it prevents core from 
compiling.
     // TODO: move to BioSql module"

Then I uncommented the code and added these import statements to 
RichObjectFactory.java, and the problem was fixed:

import org.biojavax.bio.db.biosql.BioSQLCrossReferenceResolver;
import org.biojavax.bio.db.biosql.BioSQLRichObjectBuilder;
import org.biojavax.bio.db.biosql.BioSQLRichSequenceHandler;

After this I compiled the bioSql module successfully, and the Core module 
compiled successfully too. I don't know whether this is the only issue; if 
it is, then please uncomment these lines in the main svn version, since I 
don't know how to do it.

Thanks
Deepak Sheoran


From andreas at sdsc.edu  Tue Mar  9 17:28:25 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 9 Mar 2010 09:28:25 -0800
Subject: [Biojava-l] Bug fix for biojava class RichObjectFactory.java in
	current maven based project
In-Reply-To: <4B9567E9.7080909@gmail.com>
References: <4B9567E9.7080909@gmail.com>
Message-ID: <59a41c431003090928w5f28c6e1ladd571aa283e93c1@mail.gmail.com>

Hi Deepak,

thanks for spotting this. This factory method should clearly be moved to the
biosql module and not be part of the core.  Anybody who has a deeper
knowledge of the biosql code: Where is the best place in the biosql module
to move this to?

A workaround for the compile problem would be to use reflection to mask the
calls to the methods in the other module, but it feels like a hack...
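
For reference, a rough sketch of what such a reflection workaround could 
look like. The helper name OptionalModule is made up; in our case the class 
names from Deepak's import statements (for example 
org.biojavax.bio.db.biosql.BioSQLRichObjectBuilder) would be passed in as 
strings, so core no longer needs them at compile time. That those classes 
can be built from the session object through a one-argument constructor is 
an assumption here, and the demo uses JDK classes so that it runs without 
the biosql module.

```java
import java.lang.reflect.Constructor;

// Hypothetical helper; the name and shape are assumptions for illustration.
final class OptionalModule {
    /**
     * Builds an instance of className through its one-argument constructor,
     * or returns null when the class is not on the classpath at run time.
     */
    static Object newInstanceIfPresent(String className, Class<?> argType, Object arg) {
        try {
            Class<?> clazz = Class.forName(className);
            Constructor<?> ctor = clazz.getConstructor(argType);
            return ctor.newInstance(arg);
        } catch (ClassNotFoundException e) {
            // The optional module is absent; the caller decides how to degrade.
            return null;
        } catch (ReflectiveOperationException e) {
            // The class exists but could not be constructed as expected.
            throw new IllegalStateException("Cannot construct " + className, e);
        }
    }

    public static void main(String[] args) {
        // JDK classes stand in for the BioSQL classes so the demo runs anywhere.
        Object present = newInstanceIfPresent("java.lang.StringBuilder", String.class, "session");
        Object missing = newInstanceIfPresent("org.example.NotOnClasspath", Object.class, "session");
        System.out.println(present);
        System.out.println(missing);
    }
}
```

The cost is that a renamed class or changed constructor is only noticed at 
run time, which is why moving the method into the biosql module is the 
cleaner fix.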

Andreas

On Mon, Mar 8, 2010 at 1:11 PM, Deepak Sheoran <sheoran143 at gmail.com> wrote:

> Hi
> I was making a local version of current maven project on my machine so that
> i can fix some reference related bugs in biojava. But when I build the local
> version and tried to use it. I got an error on method
> RichObjectFactory.connectToBioSql(Object session) of current version of
> bio-java live. when I had a look on it I saw a comment on it
>
>    "// commenting out for the moment, since it prevents core from
> compiling.
>    // TODO: move to BioSql module"
>
> then I uncommitted the code and add these import statements to
> RichObjectFactory.java and the problem is fixed :
>
> import org.biojavax.bio.db.biosql.BioSQLCrossReferenceResolver;
> import org.biojavax.bio.db.biosql.BioSQLRichObjectBuilder;
> import org.biojavax.bio.db.biosql.BioSQLRichSequenceHandler;
>
> After this I tried compiling bioSql module it went successfully and also
> when I compiled Core module it went successfully too.I don't if this is the
> only reason then please uncomment these line in main svn version since i
> don't how to do it.
>
> Thanks
> Deepak Sheoran
>
>
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>


From sheoran143 at gmail.com  Tue Mar  9 20:10:00 2010
From: sheoran143 at gmail.com (Deepak Sheoran)
Date: Tue, 09 Mar 2010 14:10:00 -0600
Subject: [Biojava-l] Bug fix for biojava class RichObjectFactory.java in
 current maven based project
In-Reply-To: <59a41c431003090928w5f28c6e1ladd571aa283e93c1@mail.gmail.com>
References: <4B9567E9.7080909@gmail.com>
	<59a41c431003090928w5f28c6e1ladd571aa283e93c1@mail.gmail.com>
Message-ID: <4B96AB18.908@gmail.com>

Hi Andreas
I guess it should go in the "org.biojavax.bio.db.biosql" package; it makes
sense to put this class there.

Deepak Sheoran

On 3/9/2010 11:28 AM, Andreas Prlic wrote:
> Hi Deepak,
>
> thanks for spotting this. This factory method should clearly be moved 
> to the biosql module and not be part of the core.  Anybody who has a 
> deeper knowledge of the biosql code: Where is the best place in the 
> biosql module to move this to?
>
> A work around the compile problem would be to use reflection to mask 
> the calls to the methods in the other module, but it feels like a hack...
>
> Andreas
>
> On Mon, Mar 8, 2010 at 1:11 PM, Deepak Sheoran <sheoran143 at gmail.com 
> <mailto:sheoran143 at gmail.com>> wrote:
>
>     Hi
>     I was making a local version of current maven project on my
>     machine so that i can fix some reference related bugs in biojava.
>     But when I build the local version and tried to use it. I got an
>     error on method
>     RichObjectFactory.connectToBioSql(Object session) of current
>     version of bio-java live. when I had a look on it I saw a comment
>     on it
>
>        "// commenting out for the moment, since it prevents core from
>     compiling.
>        // TODO: move to BioSql module"
>
>     then I uncommitted the code and add these import statements to
>     RichObjectFactory.java and the problem is fixed :
>
>     import org.biojavax.bio.db.biosql.BioSQLCrossReferenceResolver;
>     import org.biojavax.bio.db.biosql.BioSQLRichObjectBuilder;
>     import org.biojavax.bio.db.biosql.BioSQLRichSequenceHandler;
>
>     After this I tried compiling bioSql module it went successfully
>     and also when I compiled Core module it went successfully too.I
>     don't if this is the only reason then please uncomment these line
>     in main svn version since i don't how to do it.
>
>     Thanks
>     Deepak Sheoran
>
>
>     _______________________________________________
>     Biojava-l mailing list  - Biojava-l at lists.open-bio.org
>     <mailto:Biojava-l at lists.open-bio.org>
>     http://lists.open-bio.org/mailman/listinfo/biojava-l
>
>


From holland at eaglegenomics.com  Wed Mar 10 13:31:43 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 10 Mar 2010 21:31:43 +0800
Subject: [Biojava-l] Bug fix for biojava class RichObjectFactory.java in
	current maven based project
In-Reply-To: <4B96AB18.908@gmail.com>
References: <4B9567E9.7080909@gmail.com>
	<59a41c431003090928w5f28c6e1ladd571aa283e93c1@mail.gmail.com>
	<4B96AB18.908@gmail.com>
Message-ID: <CF54F815-7918-4E20-A305-543F5A46071D@eaglegenomics.com>

The problem is that the RichObjectFactory is generic, but the connectToBioSQL method is BioSQL specific. What really needs to happen is to move the connectToBioSQL method, and _only_ that method, out to a more specific class in the biosql module, and have it use setters on RichObjectFactory (creating them if necessary).
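
A minimal sketch of that split, using simplified stand-in classes rather than the real BioJava ones (ObjectBuilder, GenericFactory, BioSqlConnector and their methods are all hypothetical names chosen for illustration):

```java
// Core-module side (hypothetical stand-ins, much simpler than BioJava's
// real classes): the generic factory knows only an interface.
interface ObjectBuilder {
    Object build(String name);
}

class GenericFactory {
    // Default builder, used when no database is connected.
    private static ObjectBuilder builder = name -> "in-memory:" + name;

    // Setter that lets another module plug in its own builder.
    static void setBuilder(ObjectBuilder b) {
        builder = b;
    }

    static Object getObject(String name) {
        return builder.build(name);
    }
}

// BioSQL-module side: the database-specific wiring lives here, so the
// dependency points from biosql to core and never the other way round.
class BioSqlConnector {
    static void connectToBioSQL(Object session) {
        GenericFactory.setBuilder(name -> "biosql[" + session + "]:" + name);
    }
}

class SetterDemo {
    public static void main(String[] args) {
        System.out.println(GenericFactory.getObject("seq1"));
        BioSqlConnector.connectToBioSQL("session-1");
        System.out.println(GenericFactory.getObject("seq1"));
    }
}
```

With this layout the core module compiles on its own, and only code that actually wants BioSQL needs the biosql module.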


On 10 Mar 2010, at 04:10, Deepak Sheoran wrote:

> Hi Andreas
> I guess it should go in "org.biojavax.bio.db.biosql" package, it make sense to put this class their.
> 
> Deepak Sheoran
> 
> On 3/9/2010 11:28 AM, Andreas Prlic wrote:
>> Hi Deepak,
>> 
>> thanks for spotting this. This factory method should clearly be moved to the biosql module and not be part of the core.  Anybody who has a deeper knowledge of the biosql code: Where is the best place in the biosql module to move this to?
>> 
>> A work around the compile problem would be to use reflection to mask the calls to the methods in the other module, but it feels like a hack...
>> 
>> Andreas
>> 
>> On Mon, Mar 8, 2010 at 1:11 PM, Deepak Sheoran <sheoran143 at gmail.com <mailto:sheoran143 at gmail.com>> wrote:
>> 
>>    Hi
>>    I was making a local version of current maven project on my
>>    machine so that i can fix some reference related bugs in biojava.
>>    But when I build the local version and tried to use it. I got an
>>    error on method
>>    RichObjectFactory.connectToBioSql(Object session) of current
>>    version of bio-java live. when I had a look on it I saw a comment
>>    on it
>> 
>>       "// commenting out for the moment, since it prevents core from
>>    compiling.
>>       // TODO: move to BioSql module"
>> 
>>    then I uncommitted the code and add these import statements to
>>    RichObjectFactory.java and the problem is fixed :
>> 
>>    import org.biojavax.bio.db.biosql.BioSQLCrossReferenceResolver;
>>    import org.biojavax.bio.db.biosql.BioSQLRichObjectBuilder;
>>    import org.biojavax.bio.db.biosql.BioSQLRichSequenceHandler;
>> 
>>    After this I tried compiling bioSql module it went successfully
>>    and also when I compiled Core module it went successfully too.I
>>    don't if this is the only reason then please uncomment these line
>>    in main svn version since i don't how to do it.
>> 
>>    Thanks
>>    Deepak Sheoran
>> 
>> 
>>    _______________________________________________
>>    Biojava-l mailing list  - Biojava-l at lists.open-bio.org
>>    <mailto:Biojava-l at lists.open-bio.org>
>>    http://lists.open-bio.org/mailman/listinfo/biojava-l
>> 
>> 
> 
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From mark.schreiber at novartis.com  Thu Mar 11 03:14:54 2010
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Thu, 11 Mar 2010 11:14:54 +0800
Subject: [Biojava-l] Bug fix for biojava class RichObjectFactory.java
 in	current maven based project
In-Reply-To: <CF54F815-7918-4E20-A305-543F5A46071D@eaglegenomics.com>
Message-ID: <OF2FF55EB1.D19103D2-ON482576E3.0011B573-482576E3.0011D831@ah.novartis.com>

Could a subclass of the RichObjectFactory exist in the BioSQL module? If
you want your RichObjects backed by BioSQL, you would use the
[BioSQL]RichObjectFactory from the BioSQL package.

- Mark


biojava-l-bounces at lists.open-bio.org wrote on 03/10/2010 09:31:43 PM:

> The problem is that the RichObjectFactory is generic, but the 
> connectToBioSQL method is BioSQL specific. What really needs to 
> happen is abstract out the connectToBioSQL method _only_ to a more 
> specific class in the biosql module, and use (if necessary create) 
> setters on RichObjectFactory for it to use.
> 
> 
> On 10 Mar 2010, at 04:10, Deepak Sheoran wrote:
> 
> > Hi Andreas
> > I guess it should go in "org.biojavax.bio.db.biosql" package, it 
> make sense to put this class their.
> > 
> > Deepak Sheoran
> > 
> > On 3/9/2010 11:28 AM, Andreas Prlic wrote:
> >> Hi Deepak,
> >> 
> >> thanks for spotting this. This factory method should clearly be 
> moved to the biosql module and not be part of the core.  Anybody who
> has a deeper knowledge of the biosql code: Where is the best place 
> in the biosql module to move this to?
> >> 
> >> A work around the compile problem would be to use reflection to 
> mask the calls to the methods in the other module, but it feels likea 
hack...
> >> 
> >> Andreas
> >> 
> >> On Mon, Mar 8, 2010 at 1:11 PM, Deepak Sheoran <sheoran143 at gmail.com 
<
> mailto:sheoran143 at gmail.com>> wrote:
> >> 
> >>    Hi
> >>    I was making a local version of current maven project on my
> >>    machine so that i can fix some reference related bugs in biojava.
> >>    But when I build the local version and tried to use it. I got an
> >>    error on method
> >>    RichObjectFactory.connectToBioSql(Object session) of current
> >>    version of bio-java live. when I had a look on it I saw a comment
> >>    on it
> >> 
> >>       "// commenting out for the moment, since it prevents core from
> >>    compiling.
> >>       // TODO: move to BioSql module"
> >> 
> >>    then I uncommitted the code and add these import statements to
> >>    RichObjectFactory.java and the problem is fixed :
> >> 
> >>    import org.biojavax.bio.db.biosql.BioSQLCrossReferenceResolver;
> >>    import org.biojavax.bio.db.biosql.BioSQLRichObjectBuilder;
> >>    import org.biojavax.bio.db.biosql.BioSQLRichSequenceHandler;
> >> 
> >>    After this I tried compiling bioSql module it went successfully
> >>    and also when I compiled Core module it went successfully too.I
> >>    don't if this is the only reason then please uncomment these line
> >>    in main svn version since i don't how to do it.
> >> 
> >>    Thanks
> >>    Deepak Sheoran
> >> 
> >> 
> >>    _______________________________________________
> >>    Biojava-l mailing list  - Biojava-l at lists.open-bio.org
> >>    <mailto:Biojava-l at lists.open-bio.org>
> >>    http://lists.open-bio.org/mailman/listinfo/biojava-l
> >> 
> >> 
> > 
> > _______________________________________________
> > Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-l
> 
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
> 
> 
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l



From holland at eaglegenomics.com  Thu Mar 11 16:10:15 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 12 Mar 2010 00:10:15 +0800
Subject: [Biojava-l] Bug fix for biojava class RichObjectFactory.java
	in	current maven based project
In-Reply-To: <OF2FF55EB1.D19103D2-ON482576E3.0011B573-482576E3.0011D831@ah.novartis.com>
References: <OF2FF55EB1.D19103D2-ON482576E3.0011B573-482576E3.0011D831@ah.novartis.com>
Message-ID: <4E92965B-F9EA-43B1-9235-4FA7BAC09308@eaglegenomics.com>

Could do.

On 11 Mar 2010, at 11:14, mark.schreiber at novartis.com wrote:

> 
> Could a subclass of the RichObjectFactory exist in the BioSQL module. If you want your RichObjects backed by BioSQL you use the [BioSQL]RichObjectFactory from the BioSQL package??? 
> 
> - Mark 
> 
> 
> biojava-l-bounces at lists.open-bio.org wrote on 03/10/2010 09:31:43 PM:
> 
> > The problem is that the RichObjectFactory is generic, but the 
> > connectToBioSQL method is BioSQL specific. What really needs to 
> > happen is abstract out the connectToBioSQL method _only_ to a more 
> > specific class in the biosql module, and use (if necessary create) 
> > setters on RichObjectFactory for it to use.
> > 
> > 
> > On 10 Mar 2010, at 04:10, Deepak Sheoran wrote:
> > 
> > > Hi Andreas
> > > I guess it should go in "org.biojavax.bio.db.biosql" package, it 
> > make sense to put this class their.
> > > 
> > > Deepak Sheoran
> > > 
> > > On 3/9/2010 11:28 AM, Andreas Prlic wrote:
> > >> Hi Deepak,
> > >> 
> > >> thanks for spotting this. This factory method should clearly be 
> > moved to the biosql module and not be part of the core.  Anybody who
> > has a deeper knowledge of the biosql code: Where is the best place 
> > in the biosql module to move this to?
> > >> 
> > >> A work around the compile problem would be to use reflection to 
> > mask the calls to the methods in the other module, but it feels likea hack...
> > >> 
> > >> Andreas
> > >> 
> > >> On Mon, Mar 8, 2010 at 1:11 PM, Deepak Sheoran <sheoran143 at gmail.com <
> > mailto:sheoran143 at gmail.com>> wrote:
> > >> 
> > >>    Hi
> > >>    I was making a local version of current maven project on my
> > >>    machine so that i can fix some reference related bugs in biojava.
> > >>    But when I build the local version and tried to use it. I got an
> > >>    error on method
> > >>    RichObjectFactory.connectToBioSql(Object session) of current
> > >>    version of bio-java live. when I had a look on it I saw a comment
> > >>    on it
> > >> 
> > >>       "// commenting out for the moment, since it prevents core from
> > >>    compiling.
> > >>       // TODO: move to BioSql module"
> > >> 
> > >>    then I uncommitted the code and add these import statements to
> > >>    RichObjectFactory.java and the problem is fixed :
> > >> 
> > >>    import org.biojavax.bio.db.biosql.BioSQLCrossReferenceResolver;
> > >>    import org.biojavax.bio.db.biosql.BioSQLRichObjectBuilder;
> > >>    import org.biojavax.bio.db.biosql.BioSQLRichSequenceHandler;
> > >> 
> > >>    After this I tried compiling bioSql module it went successfully
> > >>    and also when I compiled Core module it went successfully too.I
> > >>    don't if this is the only reason then please uncomment these line
> > >>    in main svn version since i don't how to do it.
> > >> 
> > >>    Thanks
> > >>    Deepak Sheoran
> > >> 
> > >> 
> > >>    _______________________________________________
> > >>    Biojava-l mailing list  - Biojava-l at lists.open-bio.org
> > >>    <mailto:Biojava-l at lists.open-bio.org>
> > >>    http://lists.open-bio.org/mailman/listinfo/biojava-l
> > >> 
> > >> 
> > > 
> > > _______________________________________________
> > > Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/biojava-l
> > 
> > --
> > Richard Holland, BSc MBCS
> > Operations and Delivery Director, Eagle Genomics Ltd
> > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> > http://www.eaglegenomics.com/
> > 
> > 
> > _______________________________________________
> > Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-l
> 
> _________________________
> 
> CONFIDENTIALITY NOTICE
> 
> The information contained in this e-mail message is intended only for the exclusive use of the individual or entity named above and may contain information that is privileged, confidential or exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, or the employee or agent responsible for delivery of the message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify the sender immediately by e-mail and delete the material from any computer.  Thank you.

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From holland at eaglegenomics.com  Mon Mar 15 10:34:14 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 15 Mar 2010 10:34:14 +0000
Subject: [Biojava-l] Hackathon in Boston, July 2010
Message-ID: <5FC2D8EC-5408-4126-9A7D-CB6B3500B61C@eaglegenomics.com>

Hi all,

Following the successful hackathon in Cambridge earlier this year, it was originally planned to hold a second one in Boston in conjunction with BOSC in order to give those who couldn't make it to the UK a chance to get involved.

However, OBF have beaten us to it by organising a cross-project CodeFest!

 http://www.open-bio.org/wiki/Codefest_2010

It would be great for BioJava people to get involved with this cross-project hackathon effort, and it saves organising one of our own! :)

All relevant info is on the web page linked to above, and if you have any questions, ask Brad as detailed on the page.

cheers,
Richard

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From xuejiachen at gmail.com  Mon Mar 15 23:09:50 2010
From: xuejiachen at gmail.com (Jiachen Xue)
Date: Mon, 15 Mar 2010 19:09:50 -0400
Subject: [Biojava-l] question about BLAST output parsing
Message-ID: <1227e7b91003151609l1e4dcaaaga9733ca4c2704c70@mail.gmail.com>

Hi,

Thanks in advance for your help.

For the following piece of text from a BLAST output, how can I get the
"Identities", "Positives" and "Gaps" fields, as well as the alignment
information, such as "DK V L+D  +  G +S + +++ +E GA+K+ L +  AAPE" and the
subject string?

>sp|B9KAQ6.1|UPP_THENN RecName: Full=Uracil phosphoribosyltransferase;
AltName: Full=UMP
           pyrophosphorylase; AltName: Full=UPRTase
          Length = 209

 Score = 32.0 bits (71), Expect = 9.7,   Method: Compositional matrix adjust.
 Identities = 16/42 (38%), Positives = 27/42 (64%), Gaps = 2/42 (4%)

Query: 360 DKNVLLVDDSIVRGTTSEQIIEMAREAGAKKVYLAS--AAPE 399
           DK V L+D  +  G +S + +++ +E GA+K+ L +  AAPE
Sbjct: 124 DKEVFLLDPMLATGVSSVKALDILKENGARKITLVALIAAPE 165


From anjolou at hotmail.com  Tue Mar 16 09:20:35 2010
From: anjolou at hotmail.com (Louise Ott)
Date: Tue, 16 Mar 2010 10:20:35 +0100
Subject: [Biojava-l] question about BLAST output parsing
In-Reply-To: <1227e7b91003151609l1e4dcaaaga9733ca4c2704c70@mail.gmail.com>
References: <1227e7b91003151609l1e4dcaaaga9733ca4c2704c70@mail.gmail.com>
Message-ID: <BAY110-W414E5F60A9FCDA61710DF9B32D0@phx.gbl>


Hello,
I tried to use the BioJava BLAST parser myself, but I didn't find a way to get this information back.
If your BLAST result can be in XML, you should try to use JAXB to parse it (this is what I used).
There is already some code for marshalling/unmarshalling in the biojava3 project.
Here is the link, but it seems to be dead right now:
http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava-live/branches/biojava3
http://www.biojava.org/wiki/BioJava3_project
Have a nice day,
Louise


> Date: Mon, 15 Mar 2010 19:09:50 -0400
> From: xuejiachen at gmail.com
> To: biojava-l at lists.open-bio.org
> Subject: [Biojava-l] question about BLAST output parsing
> 
> Hi,
> 
> Thanks advance for help.
> 
> For the following piece of text appearing in a blast output. How can I get
> the fields of "Identities", "Positives", "Gaps" as well as the alignment
> information, such as "DK V L+D  +  G +S + +++ +E GA+K+ L +  AAPE" and
> subject string?
> 
> >sp|B9KAQ6.1|UPP_THENN RecName: Full=Uracil phosphoribosyltransferase;
> AltName: Full=UMP
>            pyrophosphorylase; AltName: Full=UPRTase
>           Length = 209
> 
>  Score = 32.0 bits (71), Expect = 9.7,   Method: Compositional matrix
> adjust.
>  Identities = 16/42 (38%), Positives = 27/42 (64%), Gaps = 2/42 (4%)
> 
> Query: 360 DKNVLLVDDSIVRGTTSEQIIEMAREAGAKKVYLAS--AAPE 399
>            DK V L+D  +  G +S + +++ +E GA+K+ L +  AAPE
> Sbjct: 124 DKEVFLLDPMLATGVSSVKALDILKENGARKITLVALIAAPE 165
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
 		 	   		  


From anjolou at hotmail.com  Tue Mar 16 09:23:37 2010
From: anjolou at hotmail.com (Louise Ott)
Date: Tue, 16 Mar 2010 10:23:37 +0100
Subject: [Biojava-l] question about BLAST output parsing
In-Reply-To: <1227e7b91003151609l1e4dcaaaga9733ca4c2704c70@mail.gmail.com>
References: <1227e7b91003151609l1e4dcaaaga9733ca4c2704c70@mail.gmail.com>
Message-ID: <BAY110-W425A46DCB7DB1DB2110609B32D0@phx.gbl>


Sorry, I forgot: there is an example of using the BLAST parser here:
http://biojava.org/wiki/BioJava:CookBook:Blast:Parser
It should be enough for what you want to do.


> Date: Mon, 15 Mar 2010 19:09:50 -0400
> From: xuejiachen at gmail.com
> To: biojava-l at lists.open-bio.org
> Subject: [Biojava-l] question about BLAST output parsing
> 
> Hi,
> 
> Thanks advance for help.
> 
> For the following piece of text appearing in a blast output. How can I get
> the fields of "Identities", "Positives", "Gaps" as well as the alignment
> information, such as "DK V L+D  +  G +S + +++ +E GA+K+ L +  AAPE" and
> subject string?
> 
> >sp|B9KAQ6.1|UPP_THENN RecName: Full=Uracil phosphoribosyltransferase;
> AltName: Full=UMP
>            pyrophosphorylase; AltName: Full=UPRTase
>           Length = 209
> 
>  Score = 32.0 bits (71), Expect = 9.7,   Method: Compositional matrix
> adjust.
>  Identities = 16/42 (38%), Positives = 27/42 (64%), Gaps = 2/42 (4%)
> 
> Query: 360 DKNVLLVDDSIVRGTTSEQIIEMAREAGAKKVYLAS--AAPE 399
>            DK V L+D  +  G +S + +++ +E GA+K+ L +  AAPE
> Sbjct: 124 DKEVFLLDPMLATGVSSVKALDILKENGARKITLVALIAAPE 165
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
 		 	   		  


From andreas at sdsc.edu  Tue Mar 16 15:19:45 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 16 Mar 2010 08:19:45 -0700
Subject: [Biojava-l] question about BLAST output parsing
In-Reply-To: <BAY110-W414E5F60A9FCDA61710DF9B32D0@phx.gbl>
References: <1227e7b91003151609l1e4dcaaaga9733ca4c2704c70@mail.gmail.com>
	<BAY110-W414E5F60A9FCDA61710DF9B32D0@phx.gbl>
Message-ID: <59a41c431003160819p4a7031c9jfc7c16454df5210@mail.gmail.com>

Yes, the BioJava BLAST parser has not been maintained in quite a while.
Parsing the XML output of BLAST is probably the thing to do nowadays. About
BioJava3: the wiki documentation is a bit behind; the code is now in the
main biojava-trunk and development has been quite active over the last few
months.
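
As a rough sketch of the XML route: the snippet below uses the JDK's DOM
parser rather than JAXB, and reads the Hsp_* elements of the BLAST XML
output. The SAMPLE string is a cut-down fragment assembled from the
numbers in the original question, not a complete BLAST report, and
BlastXmlHsp and its methods are hypothetical names. Real reports carry a
DOCTYPE declaration, so the parser may need extra configuration to avoid
fetching the DTD.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class BlastXmlHsp {

    // Cut-down fragment for illustration only (not a full report).
    static final String SAMPLE =
        "<Hit><Hit_hsps><Hsp>"
      + "<Hsp_identity>16</Hsp_identity>"
      + "<Hsp_positive>27</Hsp_positive>"
      + "<Hsp_gaps>2</Hsp_gaps>"
      + "<Hsp_align-len>42</Hsp_align-len>"
      + "<Hsp_qseq>DKNVLLVDDSIVRGTTSEQIIEMAREAGAKKVYLAS--AAPE</Hsp_qseq>"
      + "<Hsp_hseq>DKEVFLLDPMLATGVSSVKALDILKENGARKITLVALIAAPE</Hsp_hseq>"
      + "<Hsp_midline>DK V L+D  +  G +S + +++ +E GA+K+ L +  AAPE</Hsp_midline>"
      + "</Hsp></Hit_hsps></Hit>";

    // Parses the XML and returns the first <Hsp> element (null if none).
    static Element firstHsp(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return (Element) doc.getElementsByTagName("Hsp").item(0);
        } catch (Exception e) {
            throw new IllegalArgumentException("Not parseable as XML", e);
        }
    }

    // Text of the first descendant element named tag, or "" if absent.
    // Not trimmed, because spaces in the match line are significant.
    static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent();
    }

    // Rebuilds the counts shown on the "Identities = ..." line.
    static String summarize(Element hsp) {
        String len = text(hsp, "Hsp_align-len");
        return "Identities = " + text(hsp, "Hsp_identity") + "/" + len
            + ", Positives = " + text(hsp, "Hsp_positive") + "/" + len
            + ", Gaps = " + text(hsp, "Hsp_gaps") + "/" + len;
    }

    public static void main(String[] args) {
        Element hsp = firstHsp(SAMPLE);
        System.out.println(summarize(hsp));
        System.out.println(text(hsp, "Hsp_midline"));
        System.out.println(text(hsp, "Hsp_hseq"));
    }
}
```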

Andreas

On Tue, Mar 16, 2010 at 2:20 AM, Louise Ott <anjolou at hotmail.com> wrote:

>
>
> Hello,
> I tried to use the biojava blast parser myself but i didn't find a way to
> get back these informations.If your blast result can be in xml, you should
> try to use jaxb to parse it (this is what i used).There are already some
> code for marshall/unmarshall in the biojava3 project.I give you the link,
> but it seems to be dead right now :
>
> http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava-live/branches/biojava3
> http://www.biojava.org/wiki/BioJava3_project
> Have a nice day,
> Louise
>
>
> > Date: Mon, 15 Mar 2010 19:09:50 -0400
> > From: xuejiachen at gmail.com
> > To: biojava-l at lists.open-bio.org
> > Subject: [Biojava-l] question about BLAST output parsing
> >
> > Hi,
> >
> > Thanks advance for help.
> >
> > For the following piece of text appearing in a blast output. How can I
> get
> > the fields of "Identities", "Positives", "Gaps" as well as the alignment
> > information, such as "DK V L+D  +  G +S + +++ +E GA+K+ L +  AAPE" and
> > subject string?
> >
> > >sp|B9KAQ6.1|UPP_THENN RecName: Full=Uracil phosphoribosyltransferase;
> > AltName: Full=UMP
> >            pyrophosphorylase; AltName: Full=UPRTase
> >           Length = 209
> >
> >  Score = 32.0 bits (71), Expect = 9.7,   Method: Compositional matrix
> > adjust.
> >  Identities = 16/42 (38%), Positives = 27/42 (64%), Gaps = 2/42 (4%)
> >
> > Query: 360 DKNVLLVDDSIVRGTTSEQIIEMAREAGAKKVYLAS--AAPE 399
> >            DK V L+D  +  G +S + +++ +E GA+K+ L +  AAPE
> > Sbjct: 124 DKEVFLLDPMLATGVSSVKALDILKENGARKITLVALIAAPE 165
> > _______________________________________________
> > Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-l
>
> _________________________________________________________________
> Hotmail arrive sur votre t?l?phone ! Compatible Iphone, Windows Phone,
> Blackberry, ?
> http://www.messengersurvotremobile.com/?d=Hotmail
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>


From hlapp at drycafe.net  Tue Mar 16 20:03:50 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Tue, 16 Mar 2010 16:03:50 -0400
Subject: [Biojava-l] [OT] Job opportunity: Training coordinator and
	Bioinformatics Project Manager
Message-ID: <0CDDCED9-266E-4CCE-8240-D7E2C8522784@drycafe.net>

Hi all -

first off, sorry for the cross-posting, we're trying to advertise this  
as widely as possible. Second, apologies if this is committing an  
offense and considered spam. I thought though that there might be some  
people around here who may be interested and suitable.

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:- Durham, NC -:- informatics.nescent.org :
===========================================================

A unique position is available for a training coordinator and
bioinformatics project manager at the U.S. National Evolutionary
Synthesis Center in Durham, North Carolina (NESCent,
http://nescent.org).  NESCent is a National Science Foundation funded
research center managed by Duke University, the University of North
Carolina at Chapel Hill and North Carolina State University on behalf
of the international evolutionary biology community.  NESCent
facilitates synthetic research by bringing together diverse expertise,
data, tools and concepts (Sidlauskas et al. 2009).  In addition to a
resident population of 20-30 scientists, the Center hosts over 800
visitors a year.  An informatics staff is on-site to support resident
and visiting scientists' needs in high-performance computing,
electronic collaboration, scientific software and databases; this
includes custom software development for a limited number of
high-impact projects.  NESCent's informatics training program includes a
rotating series of open-application summer courses, ad-hoc short
courses for resident scientists, and remote internships (including
past participation in the Google Summer of Code).

The training coordinator and bioinformatics project manager will
provide oversight to the Center's training activities. The incumbent
will also serve as the interface between scientists and software
developers at NESCent. The position provides extensive opportunities
for collaboration and intellectual engagement with both
NESCent-sponsored scientists and informatics staff; however, this is
not an independent research position. The incumbent will report to the
Director, while overseeing the work of a small informatics team and
coordinating activities among the Center's science, education and
informatics programs.


Responsibilities:

	* 50% - Consult with sponsored scientists (including scientists in
residence and working group participants) about informatics resources
and needs. Manage software product development by gathering
requirements from scientists, participating in conceptual design,
monitoring implementation progress and product quality, facilitating
communication between software developers and scientists, and
researching software solutions.

	* 25% - Oversee NESCent's course curriculum by identifying
opportunities for onsite or online informatics courses that satisfy
demand for advanced training of resident and visiting scientists,
recruiting instructors, providing guidance to instructors in
developing course syllabi, coordinating logistical and technical
support requirements, conducting assessments, and serving as a liaison
to course organizers at other institutions.

	* 25% - Assisting in the management of NESCent's summer informatics
intern program, by coordinating the recruitment, application & review
process for students, communicating expectations to students and
mentors, monitoring student progress, documenting student outcomes,
and performing assessments.


Education:

Required: M.S. in Biology, Bioinformatics, or a related field.
Preferred: Ph.D. and two years postdoctoral experience in evolutionary  
biology, or an equivalent combination of relevant education and/or  
experience.


Experience:

Required: Excellent communication, interpersonal, and organizational  
skills.  Experience with computationally oriented scientific research.
Preferred: At least two years in development of databases and open  
source software.   Organization, coordination, development and  
delivery of courses and workshops appropriate for graduate-level  
participants.


Terms of Employment:

Salary will be competitive and commensurate with experience.  As a
full-time employee, the incumbent will receive Duke University's
benefits package (http://hr.duke.edu/benefits/main.html). The position
is available immediately and will remain open until filled.  The  
position is currently funded through November 2014, contingent on  
annual renewal of the Center by the NSF.


How to Apply:

Please send a C.V., including contact information for three  
references, and a brief statement of interest to Allen Rodrigo,  
Director, NESCent, at a.rodrigo at nescent.org. Inquiries about  
suitability for the position are welcome.  Duke University is an Equal  
Opportunity/Affirmative Action employer.  Additional information about  
NESCent: http://www.nescent.org


References:

Sidlauskas B, Ganapathy G, Hazkani-Covo E, Jenkins KP, Lapp H, McCall  
LW, Price S, Scherle R, Spaeth PA, Kidd DM (2009) Linking Big: The  
Continuing Promise of Evolutionary Synthesis. Evolution.
http://dx.doi.org/10.1111/j.1558-5646.2009.00892.x


From markjschreiber at gmail.com  Wed Mar 17 01:14:51 2010
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 17 Mar 2010 09:14:51 +0800
Subject: [Biojava-l] question about BLAST output parsing
In-Reply-To: <59a41c431003160819p4a7031c9jfc7c16454df5210@mail.gmail.com>
References: <1227e7b91003151609l1e4dcaaaga9733ca4c2704c70@mail.gmail.com> 
	<BAY110-W414E5F60A9FCDA61710DF9B32D0@phx.gbl>
	<59a41c431003160819p4a7031c9jfc7c16454df5210@mail.gmail.com>
Message-ID: <93b45ca51003161814y7196e3e8i8e329b79e612cf50@mail.gmail.com>

I generally don't recommend parsing the standard BLAST output, as it keeps
changing subtly. It is best to parse one of the tabular formats or the XML
output.
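
For the tabular route, a minimal sketch, assuming the default twelve
tab-separated columns (query id, subject id, % identity, alignment
length, mismatches, gap openings, query start/end, subject start/end,
e-value, bit score). BlastTabularHit is a hypothetical class name, and
the sample line is made up from the numbers in the original question,
with "query1" as a placeholder query id. Note that these default columns
carry neither the positives count nor the match line; for those the XML
output is the better source.

```java
// Hypothetical value class for one line of tabular BLAST output.
class BlastTabularHit {
    final String queryId, subjectId;
    final double percentIdentity;
    final int alignmentLength, mismatches, gapOpenings;
    final int queryStart, queryEnd, subjectStart, subjectEnd;
    final double eValue, bitScore;

    BlastTabularHit(String line) {
        String[] f = line.split("\t");
        if (f.length < 12) {
            throw new IllegalArgumentException(
                "Expected 12 tab-separated columns, got " + f.length);
        }
        queryId = f[0];
        subjectId = f[1];
        percentIdentity = Double.parseDouble(f[2]);
        alignmentLength = Integer.parseInt(f[3]);
        mismatches = Integer.parseInt(f[4]);
        gapOpenings = Integer.parseInt(f[5]);
        queryStart = Integer.parseInt(f[6]);
        queryEnd = Integer.parseInt(f[7]);
        subjectStart = Integer.parseInt(f[8]);
        subjectEnd = Integer.parseInt(f[9]);
        eValue = Double.parseDouble(f[10]); // also accepts forms like 1e-50
        bitScore = Double.parseDouble(f[11]);
    }

    public static void main(String[] args) {
        // Illustrative line only, not real program output.
        String line = "query1\tsp|B9KAQ6.1|UPP_THENN\t38.10\t42\t24\t1"
            + "\t360\t399\t124\t165\t9.7\t32.0";
        BlastTabularHit hit = new BlastTabularHit(line);
        System.out.println(hit.subjectId + " identity=" + hit.percentIdentity
            + "% over " + hit.alignmentLength + " columns");
    }
}
```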

- Mark

On Tue, Mar 16, 2010 at 11:19 PM, Andreas Prlic <andreas at sdsc.edu> wrote:

> Yea, the BioJava Blast parser has not been maintained in quite a while.
> Probably parsing the XML output of Blast is the thing to do nowadays. About
> Biojava3: the wiki documentation is a bit behind, the code is now in the
> main biojava-trunk and development has been quite active over the last
> months.
>
> Andreas
>
> On Tue, Mar 16, 2010 at 2:20 AM, Louise Ott <anjolou at hotmail.com> wrote:
>
> >
> >
> > Hello,
> > I tried to use the biojava blast parser myself but i didn't find a way to
> > get back these informations.If your blast result can be in xml, you
> should
> > try to use jaxb to parse it (this is what i used).There are already some
> > code for marshall/unmarshall in the biojava3 project.I give you the link,
> > but it seems to be dead right now :
> >
> >
> http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava-live/branches/biojava3
> > http://www.biojava.org/wiki/BioJava3_project
> > Have a nice day,
> > Louise
> >
> >
> > > Date: Mon, 15 Mar 2010 19:09:50 -0400
> > > From: xuejiachen at gmail.com
> > > To: biojava-l at lists.open-bio.org
> > > Subject: [Biojava-l] question about BLAST output parsing
> > >
> > > Hi,
> > >
> > > Thanks in advance for your help.
> > >
> > > The following piece of text appears in a BLAST output. How can I get
> > > the fields "Identities", "Positives" and "Gaps", as well as the
> > > alignment information, such as
> > > "DK V L+D  +  G +S + +++ +E GA+K+ L +  AAPE", and the subject string?
> > >
> > > >sp|B9KAQ6.1|UPP_THENN RecName: Full=Uracil phosphoribosyltransferase;
> > > AltName: Full=UMP
> > >            pyrophosphorylase; AltName: Full=UPRTase
> > >           Length = 209
> > >
> > >  Score = 32.0 bits (71), Expect = 9.7,   Method: Compositional matrix
> > > adjust.
> > >  Identities = 16/42 (38%), Positives = 27/42 (64%), Gaps = 2/42 (4%)
> > >
> > > Query: 360 DKNVLLVDDSIVRGTTSEQIIEMAREAGAKKVYLAS--AAPE 399
> > >            DK V L+D  +  G +S + +++ +E GA+K+ L +  AAPE
> > > Sbjct: 124 DKEVFLLDPMLATGVSSVKALDILKENGARKITLVALIAAPE 165
> > > _______________________________________________
> > > Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/biojava-l
> >
> >
>
>


From Richard.Finkers at wur.nl  Wed Mar 17 07:21:16 2010
From: Richard.Finkers at wur.nl (Richard Finkers)
Date: Wed, 17 Mar 2010 08:21:16 +0100
Subject: [Biojava-l] SVN repository
In-Reply-To: <mailman.3.1268755203.30058.biojava-l@lists.open-bio.org>
References: <mailman.3.1268755203.30058.biojava-l@lists.open-bio.org>
Message-ID: <4BA082EC.8010908@wur.nl>

Hi,

I would like to have a look at the BioJava 3 code (and perhaps contribute
to it in the future). However, I cannot access the SVN repository
(http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava-live/trunk).


Is the repository down?

Thanks,
Richard


From biopython at maubp.freeserve.co.uk  Wed Mar 17 10:16:45 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 17 Mar 2010 10:16:45 +0000
Subject: [Biojava-l] SVN repository
In-Reply-To: <4BA082EC.8010908@wur.nl>
References: <mailman.3.1268755203.30058.biojava-l@lists.open-bio.org>
	<4BA082EC.8010908@wur.nl>
Message-ID: <320fb6e01003170316x4fbf924do95dcc84703eaa28e@mail.gmail.com>

On Wed, Mar 17, 2010 at 7:21 AM, Richard Finkers <Richard.Finkers at wur.nl> wrote:
>
> Hi,
>
> I would like to have a look at the BioJava 3 code (and perhaps in the future
> contribute to). However, I cannot access the SVN repository
> (http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava-live/trunk).
>
> Is the repository down?
>
> Thanks,
> Richard

Probably :(

There have been problems discussed on the BioPerl mailing list
(they use the same servers), and the OBF team are aware of it:
http://lists.open-bio.org/pipermail/bioperl-l/2010-March/032534.html

The code.open-bio.org repositories are a read-only public mirror,
while dev.open-bio.org is the master repository, which I think is fine
(but not available for anonymous download).

In the meantime, BioPerl has also set up a read-only mirror
on GitHub. Perhaps BioJava could do the same? Meanwhile,
BioRuby and Biopython are just using GitHub (not SVN or CVS).

Peter


From andreas at sdsc.edu  Wed Mar 17 17:39:41 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 17 Mar 2010 10:39:41 -0700
Subject: [Biojava-l] SVN repository
In-Reply-To: <320fb6e01003170316x4fbf924do95dcc84703eaa28e@mail.gmail.com>
References: <mailman.3.1268755203.30058.biojava-l@lists.open-bio.org>
	<4BA082EC.8010908@wur.nl>
	<320fb6e01003170316x4fbf924do95dcc84703eaa28e@mail.gmail.com>
Message-ID: <59a41c431003171039h4ca1267bibc45b0d7d270b2a9@mail.gmail.com>

I have just heard back from the OBF helpdesk. The VM hosting the anonymous
SVN is currently down. Depending on how big the problem turns out to be, it
should be back later today, or tomorrow at the latest.

Sorry for this inconvenience.
Andreas


On Wed, Mar 17, 2010 at 3:16 AM, Peter <biopython at maubp.freeserve.co.uk>wrote:

> On Wed, Mar 17, 2010 at 7:21 AM, Richard Finkers <Richard.Finkers at wur.nl>
> wrote:
> >
> > Hi,
> >
> > I would like to have a look at the BioJava 3 code (and perhaps in the
> future
> > contribute to). However, I cannot access the SVN repository
> > (
> http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava-live/trunk
> ).
> >
> > Is the repository down?
> >
> > Thanks,
> > Richard
>
> Probably :(
>
> There have been problems discussed on the BioPerl mailing list
> (they use the same servers), and the OBF team are aware of it:
> http://lists.open-bio.org/pipermail/bioperl-l/2010-March/032534.html
>
> The code.open-bio.org repositories are a read only public mirror,
> while dev.open-bio.org is the master repository I think is fine
> (but not available for anonymous download).
>
> In the mean time BioPerl have also setup a read only mirror
> on github - perhaps BioJava could do the same? Meanwhile
> BioRuby and Biopython are just using github (not SVN or CVS).
>
> Peter
>


From andreas at sdsc.edu  Thu Mar 18 20:36:38 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 18 Mar 2010 13:36:38 -0700
Subject: [Biojava-l] Google summer of code
Message-ID: <59a41c431003181336i33d388aak4b5a26e11ee4161b@mail.gmail.com>

Hi,

It seems our (the Open Bioinformatics Foundation's) Google Summer of Code
application has been accepted.
http://socghop.appspot.com/gsoc/program/accepted_orgs/google/gsoc2010

As such we are now looking for an interested and skilled student to work on
the BioJava multiple sequence alignment project. Take a look at the project
description, and if you think you are up for the challenge, send me an email
with your application.

http://biojava.org/wiki/Google_Summer_of_Code

Andreas


From shakunb at uom.ac.mu  Fri Mar 19 10:50:40 2010
From: shakunb at uom.ac.mu (Shakuntala baichoo)
Date: Fri, 19 Mar 2010 14:50:40 +0400
Subject: [Biojava-l] Biojava-l Digest, Vol 86, Issue 9
In-Reply-To: <mailman.7.1268928002.8593.biojava-l@lists.open-bio.org>
References: <mailman.7.1268928002.8593.biojava-l@lists.open-bio.org>
Message-ID: <3baec70c1003190350h7e73c1e7q726b823b5e3eae82@mail.gmail.com>

Hi!
I would like to know how to interpret the scores after running the
Needleman-Wunsch algorithm using the NUCC44.txt substitution matrix.
I have taken the named genes from a bacterial EMBL file and I am
trying to compare each gene to the other genes in the lot, using the
Needleman-Wunsch algorithm with the NUCC44.txt substitution matrix. I
would like to determine the % match for each pair, but since I get mostly -ve
and some positive values, I would like to know how to calculate the % match
for a pair of genes.
I would be grateful if anybody could help me.

Thanks.
Shakuntala


-- 
Best Regards

Dr. (Mrs.) S.Baichoo
Senior Lecturer
CSE Dept, FoE
University of Mauritius


From andreas at sdsc.edu  Fri Mar 19 17:42:44 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 19 Mar 2010 10:42:44 -0700
Subject: [Biojava-l] Biojava-l Digest, Vol 86, Issue 9
In-Reply-To: <3baec70c1003190350h7e73c1e7q726b823b5e3eae82@mail.gmail.com>
References: <mailman.7.1268928002.8593.biojava-l@lists.open-bio.org>
	<3baec70c1003190350h7e73c1e7q726b823b5e3eae82@mail.gmail.com>
Message-ID: <59a41c431003191042l406bdba1g501290b96c018ed0@mail.gmail.com>

Sorry, can you clarify what you mean by "get mostly -ve"?

Andreas

On Fri, Mar 19, 2010 at 3:50 AM, Shakuntala baichoo <shakunb at uom.ac.mu>wrote:

> Hi!
> I would like to know the interpretation of the scores after running the
> needleman-wunsch algorithm using the NUCC44.txt substitution matrix.
> Actually I have taken the named genes from a bacteria EMBL file and I am
> trying to compare each gene to the other genes in the lot, using the
> needleman-wunsch algorithm based on the NUCC44.txt substitution matrix. I
> would like to determine the % match for each pair but since I get mostly
> -ve
> and some positive values, I would like to know how to calculate the % match
> for a pair of genes.
> I would be grateful if anybody could help me.
>
> Thanks.
> Shakuntala
>
>
>
>
> --
> Best Regards
>
> Dr. (Mrs.) S.Baichoo
> Senior Lecturer
> CSE Dept, FoE
> University of Mauritius
>


From mitlox at op.pl  Sat Mar 20 10:17:17 2010
From: mitlox at op.pl (xyz)
Date: Sat, 20 Mar 2010 20:17:17 +1000
Subject: [Biojava-l] sort fasta file
Message-ID: <20100320201718.4420a9b9@wp01>

Hello,
I would like to sort a multi-FASTA file by sequence length, i.e. from the
read with the longest sequence to the read with the shortest sequence.

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import org.biojava.bio.BioException;

import org.biojavax.SimpleNamespace;
import org.biojavax.bio.seq.RichSequence;
import org.biojavax.bio.seq.RichSequenceIterator;

public class SortFasta {

  public static void main(String[] args) throws FileNotFoundException,
  BioException {

    BufferedReader br = new BufferedReader(new FileReader("sortfasta.fasta"));
    SimpleNamespace ns = new SimpleNamespace("biojava");

    RichSequenceIterator rsi = RichSequence.IOTools.readFasta(br, null, ns);

    while (rsi.hasNext()) {
      RichSequence rs = rsi.nextRichSequence();
      System.out.println(rs.getName());
      System.out.println(rs.seqString());
    }
  }
}

I have tried to do it, but I do not know how to continue.

Thank you in advance.

Best regards,


From jswetnam at gmail.com  Sun Mar 21 20:56:35 2010
From: jswetnam at gmail.com (James Swetnam)
Date: Sun, 21 Mar 2010 16:56:35 -0400
Subject: [Biojava-l] sort fasta file
In-Reply-To: <20100320201718.4420a9b9@wp01>
References: <20100320201718.4420a9b9@wp01>
Message-ID: <ccc8042e1003211356t1ea534b6gf6d020983977881e@mail.gmail.com>

I just hacked this together. Warning: I am new to both Java and BioJava.

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

import org.biojava.bio.BioException;
import org.biojava.bio.symbol.Alphabet;
import org.biojava.bio.symbol.AlphabetManager;
import org.biojavax.SimpleNamespace;
import org.biojavax.bio.seq.RichSequence;
import org.biojavax.bio.seq.RichSequenceIterator;

public class SortFasta {

    // Orders sequences from longest to shortest, as asked for in the question.
    private static class LongestFirstComparator implements Comparator<RichSequence> {
        public int compare(RichSequence seq1, RichSequence seq2) {
            return seq2.length() - seq1.length();
        }
    }

    // Usage:  SortFasta unsortedFile.fasta
    public static void main(String[] args) throws FileNotFoundException,
                          BioException {

        String fastaFile = args[0];

        BufferedReader br = new BufferedReader(new FileReader(fastaFile));
        SimpleNamespace ns = new SimpleNamespace("biojava");

        // Use "DNA" instead of "PROTEIN" here for nucleotide reads.
        Alphabet protein = AlphabetManager.alphabetForName("PROTEIN");

        RichSequenceIterator rsi = RichSequence.IOTools.readFasta(br,
                                      protein.getTokenization("token"),
                                      ns);

        // A List rather than a TreeSet: a TreeSet ordered only by length would
        // treat sequences of equal length as duplicates and silently drop them.
        List<RichSequence> sorted = new ArrayList<RichSequence>();
        while (rsi.hasNext()) {
            sorted.add(rsi.nextRichSequence());
        }
        Collections.sort(sorted, new LongestFirstComparator());

        // Do whatever you want here with the RichSequences, now ordered from
        // longest to shortest. I'll just print their names and lengths.
        for (RichSequence seq : sorted) {
            System.out.println(seq.getName() + "\t" + seq.length());
        }
    }
}

On Sat, Mar 20, 2010 at 6:17 AM, xyz <mitlox at op.pl> wrote:

> Hello,
> I would like to sort multiple fasta file depends on the sequence length,
> ie. from the read with longest sequence to the read with the shortest
> sequence.
>
> import java.io.BufferedReader;
> import java.io.FileNotFoundException;
> import java.io.FileReader;
> import org.biojava.bio.BioException;
>
> import org.biojavax.SimpleNamespace;
> import org.biojavax.bio.seq.RichSequence;
> import org.biojavax.bio.seq.RichSequenceIterator;
>
> public class SortFasta {
>
>  public static void main(String[] args) throws FileNotFoundException,
>  BioException {
>
>    BufferedReader br = new BufferedReader(new
>    FileReader("sortfasta.fasta")); SimpleNamespace ns = new
>    SimpleNamespace("biojava");
>
>    RichSequenceIterator rsi = RichSequence.IOTools.readFasta(br, null,
>    ns);
>
>    while (rsi.hasNext()) {
>      RichSequence rs = rsi.nextRichSequence();
>      System.out.println(rs.getName());
>      System.out.println(rs.seqString());
>    }
>  }
> }
>
> I have tried to do it, but I do not how to continue.
>
> Thank you in advance.
>
> Best regards,
>


From andreas at sdsc.edu  Mon Mar 22 23:46:26 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 22 Mar 2010 16:46:26 -0700
Subject: [Biojava-l] Biojava-l Digest, Vol 86, Issue 9
In-Reply-To: <3baec70c1003210007t39f22261j9838b3eb5b9ff861@mail.gmail.com>
References: <mailman.7.1268928002.8593.biojava-l@lists.open-bio.org>
	<3baec70c1003190350h7e73c1e7q726b823b5e3eae82@mail.gmail.com>
	<59a41c431003191042l406bdba1g501290b96c018ed0@mail.gmail.com>
	<3baec70c1003210007t39f22261j9838b3eb5b9ff861@mail.gmail.com>
Message-ID: <59a41c431003221646u2b1da091uc81d601dbf412599@mail.gmail.com>

Hi Shakuntala,

At present the NeedlemanWunsch implementation does not make it totally
straightforward to access the %id. You could try parsing the result of the
getAlignmentString() call and extracting the information from there.
Making the underlying data more accessible is on the TODO list for this
module: http://biojava.org/wiki/BioJava:Modules
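
Once the two gapped rows of an alignment have been extracted, the % identity
itself is a simple count. The sketch below is not part of BioJava; the class
and method names are hypothetical, and it assumes the two rows are already
available as equal-length strings with '-' marking gaps. It uses the
alignment length (gap columns included) as the denominator, which is the
convention BLAST reports; other conventions, such as dividing by the length
of the shorter sequence, give different numbers.

```java
class PercentIdentitySketch {

    /** Counts columns where both rows hold the same non-gap character. */
    static int countIdentities(String alignedA, String alignedB) {
        if (alignedA.length() != alignedB.length()) {
            throw new IllegalArgumentException(
                "Aligned rows must have the same length");
        }
        int identical = 0;
        for (int i = 0; i < alignedA.length(); i++) {
            char a = Character.toUpperCase(alignedA.charAt(i));
            char b = Character.toUpperCase(alignedB.charAt(i));
            if (a == b && a != '-') {
                identical++;
            }
        }
        return identical;
    }

    /** Identities as a percentage of all alignment columns, gaps included. */
    static double percentIdentity(String alignedA, String alignedB) {
        int columns = alignedA.length();
        if (columns == 0) {
            return 0.0;
        }
        return 100.0 * countIdentities(alignedA, alignedB) / columns;
    }

    public static void main(String[] args) {
        // The two rows from the BLAST hit quoted earlier on this list.
        String query = "DKNVLLVDDSIVRGTTSEQIIEMAREAGAKKVYLAS--AAPE";
        String sbjct = "DKEVFLLDPMLATGVSSVKALDILKENGARKITLVALIAAPE";
        System.out.println(countIdentities(query, sbjct) + " identities, "
            + percentIdentity(query, sbjct) + " % of "
            + query.length() + " columns");
    }
}
```

The same count works for nucleotide alignments; only the input strings
change.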

Andreas

2010/3/21 Shakuntala baichoo <shakunb at uom.ac.mu>

> Hi Andreas!
> The problem is as follows. We have a bacteria file. There are about 565
> named genes/features there. We wish to compare each gene with the other 564
> genes. I am using needleman-wunsch from biojava to do so. For one specific
> run, I am attaching the result.
> The score after comparing Feature no. 0 with Feature no. 1 to Feature no.
> 564 is displayed (along with the product name etc...). If I wish to
> interpret these scores as a percentage homology, how do I do it?
>
> P.S. Most of the scores are -ve. Only one or a few is +ve.  The comparison
> is done using NUCC44.txt.
>
> Thanks
> Kind Regards
> Shakuntala
>
>
> On Fri, Mar 19, 2010 at 9:42 PM, Andreas Prlic <andreas at sdsc.edu> wrote:
>
>> sorry, can you clarify: what do you mean with you "get mostly -ve" ?
>>
>> Andreas
>>
>>
>> On Fri, Mar 19, 2010 at 3:50 AM, Shakuntala baichoo <shakunb at uom.ac.mu>wrote:
>>
>>> Hi!
>>> I would like to know the interpretation of the scores after running the
>>> needleman-wunsch algorithm using the NUCC44.txt substitution matrix.
>>> Actually I have taken the named genes from a bacteria EMBL file and I am
>>> trying to compare each gene to the other genes in the lot, using the
>>> needleman-wunsch algorithm based on the NUCC44.txt substitution matrix. I
>>> would like to determine the % match for each pair but since I get mostly
>>> -ve
>>> and some positive values, I would like to know how to calculate the %
>>> match
>>> for a pair of genes.
>>> I would be grateful if anybody could help me.
>>>
>>> Thanks.
>>> Shakuntala
>>>
>>>
>>>
>>>
>>> --
>>> Best Regards
>>>
>>> Dr. (Mrs.) S.Baichoo
>>> Senior Lecturer
>>> CSE Dept, FoE
>>> University of Mauritius
>>> _______________________________________________
>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>
>>
>>
>
>
> --
> Best Regards
>
> Dr. (Mrs.) S.Baichoo
> Senior Lecturer
> CSE Dept, FoE
> University of Mauritius
>


From zm19fitz at siena.edu  Mon Mar 22 20:36:14 2010
From: zm19fitz at siena.edu (Fitzsimmons, Zachary)
Date: Mon, 22 Mar 2010 16:36:14 -0400
Subject: [Biojava-l] (no subject)
Message-ID: <3898DEB8D4D8E34EB622AC53CEFFA2680173D9476385@mb-1.siena.edu>

Hi,

I am currently a sophomore at Siena College with a dual major in Computer
Science and Mathematics, and I am writing to you today to voice my interest
in developing for BioJava this summer through Google's Summer of Code
program. I did research at my own college last summer on the Netflix Prize
project with one of my computer science professors, and I am very interested
in diversifying my work this summer.

Currently I am taking an upper-level computer science course in
bioinformatics, and I have always thought of this as a possible field of
study when I attend graduate school. I have learned about different
alignment algorithms such as Needleman-Wunsch and Smith-Waterman in class to
match proteins and DNA sequences, and later we are going to study the HP
folding problem in depth.

I am well versed in the Java programming language, having taken all of the
Java courses at my college, and I am confident in my abilities to contribute
to the BioJava project. I consider the All-Java Multiple Sequence Alignment
project described in your wiki article
[http://biojava.org/wiki/Google_Summer_of_Code] something within my
abilities as an experienced Java programmer with past research experience
and an interest in the field of bioinformatics. Updating the BioJava code to
be newly compliant and eventually implementing a Clustal algorithm for
multiple sequence alignment is well within my grasp, especially on
completion of my college's bioinformatics course and after studying
BioJava's documentation.

I would just like your feedback on my proposal for working on your project.
I hope to hear from you soon and to apply for the position through Google.

Sincerely,

Zack Fitzsimmons


From andreas at sdsc.edu  Wed Mar 24 00:33:09 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 23 Mar 2010 17:33:09 -0700
Subject: [Biojava-l] GSoC update
Message-ID: <59a41c431003231733t1e259753k55fbe0a8bfb801a3@mail.gmail.com>

Hi,

A quick update regarding the current status of our Google Summer of Code
project: several students have already expressed their interest. In fact, the
response was so good that I believe BioJava should try to run more than just
one project. In the meantime, we added another "mentor proposed" project to
our GSoC page, http://biojava.org/wiki/Google_Summer_of_Code :

Identification and Classification of Post-translational Modifications of
Proteins: develop a post-translational modification package for the BioJava
project.

In general, Google strongly encourages student-proposed projects, since
historically those are often the most successful GSoC projects. We recommend
that students contact us or possible mentors prior to applying, so that we
can match students with suitable mentors and projects and help solidify
project ideas. In principle, any BioJava contributor is suitable as a
mentor. Students can apply between March 22nd and April 9th via the Google
web site: http://socghop.appspot.com/

Andreas


From andreas at sdsc.edu  Wed Mar 24 15:37:43 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 24 Mar 2010 08:37:43 -0700
Subject: [Biojava-l] Biojava-l Digest, Vol 86, Issue 9
In-Reply-To: <3baec70c1003240332n25b11e10i23c2c00c96a71a89@mail.gmail.com>
References: <mailman.7.1268928002.8593.biojava-l@lists.open-bio.org>
	<3baec70c1003190350h7e73c1e7q726b823b5e3eae82@mail.gmail.com>
	<59a41c431003191042l406bdba1g501290b96c018ed0@mail.gmail.com>
	<3baec70c1003210007t39f22261j9838b3eb5b9ff861@mail.gmail.com>
	<59a41c431003221646u2b1da091uc81d601dbf412599@mail.gmail.com>
	<3baec70c1003240332n25b11e10i23c2c00c96a71a89@mail.gmail.com>
Message-ID: <59a41c431003240837w3db713a3v3161b4e078faa483@mail.gmail.com>

Hi Shakuntala,

Whether the score is positive or negative depends only on the implementation
and representation. I think most people expect the score to be positive, so
the getAlignmentString method displays it as a positive value, while
internally it is represented a bit differently.
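
To illustrate how the same alignment can come out with either sign, here is
a small standalone sketch, not BioJava code. It assumes (the thread does not
confirm this) that the internal value is a cost to be minimised, obtained by
negating the similarity scores. Global alignment by maximising a score and by
minimising the negated values then yields the same number with opposite sign.
The class name, method names and the gap penalty of 10 are hypothetical; the
match/mismatch values 5 and -4 are those of the NUC.4.4 matrix.

```java
class ScoreSignSketch {

    /** Needleman-Wunsch with a linear gap penalty, maximising similarity. */
    static int maxScore(String a, String b, int match, int mismatch, int gap) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) d[i][0] = i * gap;
        for (int j = 1; j <= b.length(); j++) d[0][j] = j * gap;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int s = a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch;
                d[i][j] = Math.max(d[i - 1][j - 1] + s,
                          Math.max(d[i - 1][j] + gap, d[i][j - 1] + gap));
            }
        }
        return d[a.length()][b.length()];
    }

    /** The same recursion written as a cost minimisation. */
    static int minCost(String a, String b, int matchCost, int mismatchCost,
                       int gapCost) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) d[i][0] = i * gapCost;
        for (int j = 1; j <= b.length(); j++) d[0][j] = j * gapCost;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int c = a.charAt(i - 1) == b.charAt(j - 1)
                        ? matchCost : mismatchCost;
                d[i][j] = Math.min(d[i - 1][j - 1] + c,
                          Math.min(d[i - 1][j] + gapCost, d[i][j - 1] + gapCost));
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        String x = "GATTACA";
        String y = "GCATGCT";
        // Scores: match 5, mismatch -4, gap -10. Costs: the negated values.
        int score = maxScore(x, y, 5, -4, -10);
        int cost = minCost(x, y, -5, 4, 10);
        System.out.println("score = " + score + ", cost = " + cost);
    }
}
```

Under that assumption, flipping the sign of the returned value recovers the
conventional similarity score.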

Andreas

On Wed, Mar 24, 2010 at 3:32 AM, Shakuntala baichoo <shakunb at uom.ac.mu>wrote:

> Hello Andreas!
> Thanks for the quick reply.
> I tried the getAlignmentString. It provides a lot of information. However,
> I think there is a slight problem here. From the getAlignmentString call I
> see that the score after aligning a pair of dna strings is 2706.
> But when I view the return value from the method pairwiseAlignment (for the
> same set) then the score is -2706.  Why?
>
> Thanks
> Shakuntala
>
> *
> *
>
>
> On Tue, Mar 23, 2010 at 3:46 AM, Andreas Prlic <andreas at sdsc.edu> wrote:
>
>> Hi Shakuntala,
>>
>> at the present the NeedlemanWunch implementation does not make it totally
>> straightforward to access the %id. You could try parsing the result of the
>> getAlignmentString() call and accessing the information from there ...
>> Making the underlying data more accessible is on the TODO list for this
>> module: http://biojava.org/wiki/BioJava:Modules
>>
>> Andreas
>>
>> 2010/3/21 Shakuntala baichoo <shakunb at uom.ac.mu>
>>
>> Hi Andreas!
>>> The problem is as follows. We have a bacteria file. There are about 565
>>> named genes/features there. We wish to compare each gene with the other 564
>>> genes. I am using needleman-wunsch from biojava to do so. For one specific
>>> run, I am attaching the result.
>>> The score after comparing Feature no. 0 with Feature no. 1 to Feature no.
>>> 564 is displayed (along with the product name etc...). If I wish to
>>> interpret these scores as a percentage homology, how do I do it?
>>>
>>> P.S. Most of the scores are -ve. Only one or a few is +ve.  The
>>> comparison is done using NUCC44.txt.
>>>
>>> Thanks
>>> Kind Regards
>>> Shakuntala
>>>
>>>
>>> On Fri, Mar 19, 2010 at 9:42 PM, Andreas Prlic <andreas at sdsc.edu> wrote:
>>>
>>>> sorry, can you clarify: what do you mean with you "get mostly -ve" ?
>>>>
>>>> Andreas
>>>>
>>>>
>>>> On Fri, Mar 19, 2010 at 3:50 AM, Shakuntala baichoo <shakunb at uom.ac.mu>wrote:
>>>>
>>>>> Hi!
>>>>> I would like to know the interpretation of the scores after running the
>>>>> needleman-wunsch algorithm using the NUCC44.txt substitution matrix.
>>>>> Actually I have taken the named genes from a bacteria EMBL file and I am
>>>>> trying to compare each gene to the other genes in the lot, using the
>>>>> needleman-wunsch algorithm based on the NUCC44.txt substitution matrix.
>>>>> I would like to determine the % match for each pair but since I get
>>>>> mostly -ve and some positive values, I would like to know how to
>>>>> calculate the % match for a pair of genes.
>>>>> I would be grateful if anybody could help me.
>>>>>
>>>>> Thanks.
>>>>> Shakuntala
>>>>>
>>>>> On Thu, Mar 18, 2010 at 8:00 PM, <biojava-l-request at lists.open-bio.org
>>>>> >wrote:
>>>>>
>>>>> > Today's Topics:
>>>>> >
>>>>> >   1. Re: SVN repository (Andreas Prlic)
>>>>> >
>>>>> >
>>>>> >
>>>>> ----------------------------------------------------------------------
>>>>> >
>>>>> > Message: 1
>>>>> > Date: Wed, 17 Mar 2010 10:39:41 -0700
>>>>> > From: Andreas Prlic <andreas at sdsc.edu>
>>>>> > Subject: Re: [Biojava-l] SVN repository
>>>>> > To: Richard Finkers <Richard.Finkers at wur.nl>
>>>>> > Cc: biojava-l at lists.open-bio.org
>>>>> > Message-ID:
>>>>> >        <59a41c431003171039h4ca1267bibc45b0d7d270b2a9 at mail.gmail.com>
>>>>> > Content-Type: text/plain; charset=ISO-8859-1
>>>>> >
>>>>> > I have just heard back from the OBF-helpdesk. The VM hosting the
>>>>> > anonymous SVN is currently down. Depending on how big the problem turns
>>>>> > out to be, it will be back at some point later today / should be back
>>>>> > latest tomorrow.
>>>>> >
>>>>> > Sorry for this inconvenience.
>>>>> > Andreas
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Wed, Mar 17, 2010 at 3:16 AM, Peter <
>>>>> biopython at maubp.freeserve.co.uk
>>>>> > >wrote:
>>>>> >
>>>>> > > On Wed, Mar 17, 2010 at 7:21 AM, Richard Finkers <
>>>>> Richard.Finkers at wur.nl
>>>>> > >
>>>>> > > wrote:
>>>>> > > >
>>>>> > > > Hi,
>>>>> > > >
>>>>> > > > I would like to have a look at the BioJava 3 code (and perhaps in
>>>>> the
>>>>> > > future
>>>>> > > > contribute to). However, I cannot access the SVN repository
>>>>> > > > (
>>>>> > >
>>>>> >
>>>>> http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava-live/trunk
>>>>> > > ).
>>>>> > > >
>>>>> > > > Is the repository down?
>>>>> > > >
>>>>> > > > Thanks,
>>>>> > > > Richard
>>>>> > >
>>>>> > > Probably :(
>>>>> > >
>>>>> > > There have been problems discussed on the BioPerl mailing list
>>>>> > > (they use the same servers), and the OBF team are aware of it:
>>>>> > >
>>>>> http://lists.open-bio.org/pipermail/bioperl-l/2010-March/032534.html
>>>>> > >
>>>>> > > The code.open-bio.org repositories are a read only public mirror,
>>>>> > > while dev.open-bio.org is the master repository I think is fine
>>>>> > > (but not available for anonymous download).
>>>>> > >
>>>>> > > In the mean time BioPerl have also setup a read only mirror
>>>>> > > on github - perhaps BioJava could do the same? Meanwhile
>>>>> > > BioRuby and Biopython are just using github (not SVN or CVS).
>>>>> > >
>>>>> > > Peter
>>>>> > > _______________________________________________
>>>>> > > Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>>>>> > > http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>>> > >
>>>>> >
>>>>> >
>>>>> > ------------------------------
>>>>> >
>>>>> >
>>>>> >
>>>>> > End of Biojava-l Digest, Vol 86, Issue 9
>>>>> > ****************************************
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best Regards
>>>>>
>>>>> Dr. (Mrs.) S.Baichoo
>>>>> Senior Lecturer
>>>>> CSE Dept, FoE
>>>>> University of Mauritius
>>>>> _______________________________________________
>>>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Best Regards
>>>
>>> Dr. (Mrs.) S.Baichoo
>>> Senior Lecturer
>>> CSE Dept, FoE
>>> University of Mauritius
>>>
>>
>>
>
>
> --
> Best Regards
>
> Dr. (Mrs.) S.Baichoo
> Senior Lecturer
> CSE Dept, FoE
> University of Mauritius
>


From jeedward at yahoo.com  Thu Mar 25 00:27:28 2010
From: jeedward at yahoo.com (John Edward)
Date: Wed, 24 Mar 2010 17:27:28 -0700 (PDT)
Subject: [Biojava-l] Call for papers (Deadline Extended): BCBGC-10, USA,
	July 2010
Message-ID: <852924.28793.qm@web45911.mail.sp1.yahoo.com>

It would be highly appreciated if you could share this announcement with your
colleagues, students and individuals whose research is in bioinformatics,
computational biology, genomics, data-mining, and related areas.

Call for papers (Deadline Extended): BCBGC-10, USA, July 2010

The 2010 International Conference on Bioinformatics, Computational Biology,
Genomics and Chemoinformatics (BCBGC-10) (website:
http://www.PromoteResearch.org ) will be held during 12-14 of July 2010 in
Orlando, FL, USA. BCBGC is an important event in the areas of bioinformatics,
computational biology, genomics and chemoinformatics and focuses on all areas
related to the conference.

The conference will be held at the same time and location where several other
major international conferences will be taking place. The conference will be
held as part of 2010 multi-conference (MULTICONF-10). MULTICONF-10 will be held
during July 12-14, 2010 in Orlando, Florida, USA. The primary goal of MULTICONF
is to promote research and developmental activities in computer science,
information technology, control engineering, and related fields. Another goal
is to promote the dissemination of research to a multidisciplinary audience and
to facilitate communication among researchers, developers, practitioners in
different fields. The following conferences are planned to be organized as part
of MULTICONF-10.
 
- International Conference on Artificial Intelligence and Pattern Recognition (AIPR-10)
- International Conference on Automation, Robotics and Control Systems (ARCS-10)
- International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics (BCBGC-10)
- International Conference on Computer Communications and Networks (CCN-10)
- International Conference on Enterprise Information Systems and Web Technologies (EISWT-10)
- International Conference on High Performance Computing Systems (HPCS-10)
- International Conference on Information Security and Privacy (ISP-10)
- International Conference on Image and Video Processing and Computer Vision (IVPCV-10)
- International Conference on Software Engineering Theory and Practice (SETP-10)
- International Conference on Theoretical and Mathematical Foundations of Computer Science (TMFCS-10)
 
 
MULTICONF-10 will be held at Imperial Swan Hotel and Suites. It is a
full-service resort that puts you in the middle of the fun! Located 1/2 block
south of the famed International Drive, the hotel is just minutes from great
entertainment like Walt Disney World Resort, Universal Studios and Sea World
Orlando. Guests can enjoy free scheduled transportation to these theme parks,
as well as spacious accommodations, outdoor pools and on-site dining, all
situated on 10 tropically landscaped acres. Here, guests can experience a
full-service resort with discount hotel pricing in Orlando.
 
We invite draft paper submissions. Please see the website
http://www.PromoteResearch.org for more details.

Sincerely
John Edward


From andreas.draeger at uni-tuebingen.de  Thu Mar 25 14:19:02 2010
From: andreas.draeger at uni-tuebingen.de (=?ISO-8859-1?Q?Andreas_Dr=E4ger?=)
Date: Thu, 25 Mar 2010 15:19:02 +0100
Subject: [Biojava-l] Biojava-l Digest, Vol 86, Issue 9
In-Reply-To: <59a41c431003240837w3db713a3v3161b4e078faa483@mail.gmail.com>
References: <mailman.7.1268928002.8593.biojava-l@lists.open-bio.org>	<3baec70c1003190350h7e73c1e7q726b823b5e3eae82@mail.gmail.com>	<59a41c431003191042l406bdba1g501290b96c018ed0@mail.gmail.com>	<3baec70c1003210007t39f22261j9838b3eb5b9ff861@mail.gmail.com>	<59a41c431003221646u2b1da091uc81d601dbf412599@mail.gmail.com>	<3baec70c1003240332n25b11e10i23c2c00c96a71a89@mail.gmail.com>
	<59a41c431003240837w3db713a3v3161b4e078faa483@mail.gmail.com>
Message-ID: <4BAB70D6.5060309@uni-tuebingen.de>

Hi Andreas and Shakuntala,

The alignment classes have just been revised and can now be updated from
the repository. As a major improvement, the alignment result has become
much easier to use. So, if you're interested in computing something
based on the score, you can now simply call the dedicated get method
and don't have to care about parsing anymore. I hope that helps.
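
For the percent-identity question raised earlier in this thread, a rough
sketch in plain Java follows. It deliberately uses no BioJava calls: the class
name PercentIdentity, the method percentIdentity, and the convention that gaps
are written as '-' are assumptions made for illustration, and the two inputs
stand in for the gapped rows of a pairwise alignment. Identity is counted here
over all alignment columns; other denominators (for example the length of the
shorter sequence) are equally common, so treat the choice as an assumption too.

```java
public class PercentIdentity {

    /**
     * Percentage of alignment columns in which both rows carry the same
     * non-gap symbol (case-insensitive). Both rows must be non-empty and
     * have the same length, as the two rows of one alignment do.
     */
    public static double percentIdentity(String alignedA, String alignedB) {
        if (alignedA.isEmpty() || alignedA.length() != alignedB.length()) {
            throw new IllegalArgumentException(
                    "rows must be non-empty and of equal length");
        }
        int identical = 0;
        for (int i = 0; i < alignedA.length(); i++) {
            char a = Character.toUpperCase(alignedA.charAt(i));
            char b = Character.toUpperCase(alignedB.charAt(i));
            // A column counts only if the symbols match and are not gaps.
            if (a == b && a != '-') {
                identical++;
            }
        }
        return 100.0 * identical / alignedA.length();
    }

    public static void main(String[] args) {
        // Hypothetical gapped rows, not taken from the data in this thread.
        System.out.println(percentIdentity("ACGT-ACGT", "ACGTTAC-T"));
    }
}
```

Unlike the raw alignment score, this value does not depend on the
substitution matrix or the gap penalties, so it stays between 0 and 100 even
when the score itself is negative.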

Cheers
Andreas

-- 
Dipl.-Bioinform. Andreas Dräger
Eberhard Karls University Tübingen
Center for Bioinformatics (ZBIT)
Sand 1
72076 Tübingen
Germany

Phone: +49-7071-29-70436
Fax:   +49-7071-29-5091


From mitlox at op.pl  Thu Mar 25 13:23:37 2010
From: mitlox at op.pl (xyz)
Date: Thu, 25 Mar 2010 23:23:37 +1000
Subject: [Biojava-l] sort fasta file
In-Reply-To: <ccc8042e1003211356t1ea534b6gf6d020983977881e@mail.gmail.com>
References: <20100320201718.4420a9b9@wp01>
	<ccc8042e1003211356t1ea534b6gf6d020983977881e@mail.gmail.com>
Message-ID: <20100325232337.3021200a@wp01>

Hi James,
Thank you for the solution, but I get this 
7
13
23
30
as output for this input file:
>1
atccccc
>2
atccccctttttt
>3
atccccccccccccccccctttt
>4
tttttttccccccccccccccccccccccc
>5
tttttttccccccccccccccccccccccc

How is it possible to fix it, and why did you choose Comparator and not
Comparable?

Thank you in advance.

Best regards,


On Sun, 21 Mar 2010 16:56:35 -0400
James Swetnam <jswetnam at gmail.com> wrote:

> Just hacked this together, warning: I am new to both java and biojava.
> 
> import java.io.*;
> import java.util.*;
> 
> import org.biojava.bio.BioException;
> import org.biojava.bio.symbol.*;
> import org.biojavax.SimpleNamespace;
> import org.biojavax.bio.seq.*;
> 
> import java.util.Comparator;
> 
> public class SortFasta {
> 
>     static private class RichSequenceComparator implements
> Comparator<RichSequence> {
> 
>     public int compare(RichSequence seq1, RichSequence seq2)
>     {
>         return seq1.length() - seq2.length();
>     }
> 
> 
>     }
> 
>     // Usage:  SortFasta unsortedFile.fasta
>     public static void main(String[] args) throws
> FileNotFoundException, BioException {
> 
>     String fastaFile = args[0];
> 
>     BufferedReader br = new BufferedReader(new FileReader(fastaFile));
>     SimpleNamespace ns = new SimpleNamespace("biojava");
> 
>     Alphabet protein = AlphabetManager.alphabetForName("PROTEIN");
> 
>     RichSequenceIterator rsi = RichSequence.IOTools.readFasta(br,
>                                   protein.getTokenization("token"),
>                                   ns);
> 
>     SortedSet<RichSequence> sorted = new TreeSet<RichSequence>( new
> SortFasta.RichSequenceComparator());
> 
>     while (rsi.hasNext()) {
>         sorted.add(rsi.nextRichSequence());
>     }
> 
>     Iterator<RichSequence> sortedIt = sorted.iterator();
> 
>     // Do whatever you want here with the ascending list of
>     // RichSequences by length, I'll just print them.
>     while(sortedIt.hasNext())
>         {
>         System.out.println(((RichSequence) sortedIt.next()).length());
>         }
>     }
> }
> 
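
One plausible explanation for the missing fifth line: a TreeSet treats two
elements as duplicates whenever its comparator returns 0, so the second
30-residue sequence is never added. The sketch below shows the effect and one
way around it (sorting a List instead). It uses plain strings in place of
RichSequence so that it runs without BioJava; the class and method names are
hypothetical. On Comparator versus Comparable: a Comparator can be supplied
from outside for a type you do not own, whereas Comparable has to be
implemented by the type itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class SortByLength {

    // Orders strings by length only; equal lengths compare as 0.
    static final Comparator<String> BY_LENGTH =
            Comparator.comparingInt(String::length);

    /** TreeSet variant: entries that compare as equal are dropped. */
    public static List<Integer> lengthsViaTreeSet(List<String> seqs) {
        TreeSet<String> sorted = new TreeSet<>(BY_LENGTH);
        sorted.addAll(seqs);
        List<Integer> lengths = new ArrayList<>();
        for (String s : sorted) {
            lengths.add(s.length());
        }
        return lengths;
    }

    /** List variant: sorting a copy keeps every entry, ties included. */
    public static List<Integer> lengthsViaList(List<String> seqs) {
        List<String> sorted = new ArrayList<>(seqs);
        sorted.sort(BY_LENGTH);
        List<Integer> lengths = new ArrayList<>();
        for (String s : sorted) {
            lengths.add(s.length());
        }
        return lengths;
    }

    public static void main(String[] args) {
        // The five sequences from the input file quoted above.
        List<String> seqs = Arrays.asList(
                "atccccc",
                "atccccctttttt",
                "atccccccccccccccccctttt",
                "tttttttccccccccccccccccccccccc",
                "tttttttccccccccccccccccccccccc");
        // The TreeSet variant loses one of the two identical entries.
        System.out.println("TreeSet: " + lengthsViaTreeSet(seqs));
        // The List variant reports all five entries.
        System.out.println("List:    " + lengthsViaList(seqs));
    }
}
```

If a sorted set is still wanted, the comparator would need a tie-breaker
(for example the sequence name) so that distinct sequences never compare as 0.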


From holland at eaglegenomics.com  Thu Mar 25 16:27:17 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Thu, 25 Mar 2010 16:27:17 +0000
Subject: [Biojava-l] Bug fix for Biojava in regard to email with subject
	:( Hibernate Exception and suggestion for change in BioSqlSchema)
In-Reply-To: <4BAABA21.4000301@gmail.com>
References: <4BAABA21.4000301@gmail.com>
Message-ID: <4FAB0AC5-3D97-4FD8-8A7E-81D1D6381D39@eaglegenomics.com>

Patched and in subversion on the head in the new Biojava 3 code. I modified the code slightly to simplify it. There were also parallel changes required over in SimpleDocRef itself to enable it to continue working without being connected to BioSQL.
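
The three-step lookup discussed in the quoted messages below (match on the
PubMed/MEDLINE ID, then on author/title/location, then create a new reference)
can be sketched roughly as follows. This is not the committed patch and it
uses neither Hibernate nor any BioJava class: ReferenceResolver, Ref, and the
in-memory maps are hypothetical stand-ins for the lookups against the
reference table.

```java
import java.util.HashMap;
import java.util.Map;

public class ReferenceResolver {

    /** Hypothetical stand-in for a stored literature reference. */
    public static final class Ref {
        public final String pubmedId;   // may be null
        public final String authors;
        public final String title;      // may be null
        public final String location;

        Ref(String pubmedId, String authors, String title, String location) {
            this.pubmedId = pubmedId;
            this.authors = authors;
            this.title = title;
            this.location = location;
        }
    }

    // In-memory indexes standing in for database queries.
    private final Map<String, Ref> byId = new HashMap<>();
    private final Map<String, Ref> byCitation = new HashMap<>();

    private static String citationKey(String authors, String title, String location) {
        return authors + '\u0000' + title + '\u0000' + location;
    }

    public Ref resolve(String pubmedId, String authors, String title, String location) {
        // Step 1: an ID match wins, whatever the citation strings look like.
        if (pubmedId != null) {
            Ref idHit = byId.get(pubmedId);
            if (idHit != null) {
                return idHit;
            }
        }
        // Step 2: fall back to an exact author/title/location match.
        String key = citationKey(authors, title, location);
        Ref citationHit = byCitation.get(key);
        if (citationHit != null) {
            return citationHit;
        }
        // Step 3: nothing matched, so create and register a new reference.
        Ref created = new Ref(pubmedId, authors, title, location);
        if (pubmedId != null) {
            byId.put(pubmedId, created);
        }
        byCitation.put(key, created);
        return created;
    }

    public static void main(String[] args) {
        ReferenceResolver resolver = new ReferenceResolver();
        Ref first = resolver.resolve("8688087", "Bult,C.J.", "Complete genome sequence",
                "Science 273 (5278), 1058-1073 (1996)");
        // Same ID, slightly different author string: resolves to the same object.
        Ref second = resolver.resolve("8688087", "Bult,C.", "Complete genome sequence",
                "Science 273 (5278), 1058-1073 (1996)");
        System.out.println(first == second);
    }
}
```

Checking the ID first is what lets two spellings of the same citation share
one row, which is the case that violated the unique constraint on dbxref_id.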

On 25 Mar 2010, at 01:19, Deepak Sheoran wrote:

> I am writing this email again because I didn't get any response on whether these bugs have been patched or whether the patches got lost somewhere on the mailing list. I don't know how to apply the patches myself, so I am counting on you to apply them and reply so that I know it's fixed.
> 
> 
> 
> Thanks
> Deepak Sheoran
> 
> 
> Hi
> In response to the bug fix suggested by Richard I have created some patches. They need to be applied to stop BioJava from processing the references of a GenBank record incorrectly, which causes further Hibernate exceptions. After applying the patch, the reference resolution code tests the PubMed or MEDLINE ID first; if there is no match it tests author/title/location; and if there is still no match it creates a new reference. I tested it with GenBank release 175 and gained almost 3159 more records in my database.
> 
> Can somebody please have a look at the second issue and fix it:
> "
> 2. I think that's a bug (compound locations with null features) but not sure why. Could be that the process of constructing a CompoundRichLocation is somehow losing the feature reference from the original SimpleRichLocation. Again I can't investigate until March - can someone else take a look at the code? (A good starting point would be to look at how a CompoundRichLocation decides to select the feature from the SimpleRichLocations it is made up from).
> "
> 
> I am also planning to build a bridge between BioSQL databases loaded using BioPerl and those loaded using BioJava. Here is some of my investigation so far; can you suggest some direction on it?
> Have a look at the attached files:
> 1) Biojava_BioPerl_Diff.xls  ==> a view of the tables where a GenBank record is stored in a BioSQL instance by BioPerl and by BioJava
> 2) GenbankRecord.doc  ==> a Word document containing a GenBank record and showing where its information goes in BioSQL using BioPerl and BioJava
> 3) BioSqlRichobjectBuilder.patch ==> patch needed for the BioSqlRichObjectBuilder.java class
> 4) GenBankFormat.patch ==> patch needed for the GenBankFormat.java class
> 
> 
> Thanks
> Deepak Sheoran
> 
> 
> 
> -------- Original Message --------
> Subject:	Re: Hibernate Exception and suggestion for change in BioSqlSchema
> Date:	Tue, 9 Feb 2010 20:34:32 +1300
> From:	Richard Holland <holland at eaglegenomics.com>
> To:	Deepak Sheoran <sheoran143 at gmail.com>
> CC:	biojava-l at biojava.org
> 
> Hi. It's possible that your original email didn't make it to the list because it is HTML format, and the list only accepts plain text.
> 
> However, in answer to your two questions:
> 
>   1. The code that does the resolution of references might be better if it looks up existing IDs rather than using author, title, location to identify existing records. I would suggest modifying it to a three-step process - test ID, then if no match then test author/title/location, then if still no match create a new reference. Could someone do that? (I'm unable to do anything until late March).
> 
>   2. I think that's a bug (compound locations with null features) but not sure why. Could be that the process of constructing a CompoundRichLocation is somehow losing the feature reference from the original SimpleRichLocation. Again I can't investigate until March - can someone else take a look at the code? (A good starting point would be to look at how a CompoundRichLocation decides to select the feature from the SimpleRichLocations it is made up from).
> 
> cheers,
> Richard
> 
> On 9 Feb 2010, at 20:21, Deepak Sheoran wrote:
> 
> > 
> > Hi Richard
> > 
> > Below is the email which I sent to the Biojava-l mailing list. It never got posted on the mailing list server, nor did I get any response, so please have a look at it and tell me what the solution to the problem described in the message could be.
> > 
> > 
> > Thanks
> > Deepak Sheoran
> > -------- Original Message --------
> > Subject:	Hibernate Exception and suggestion for change in BioSqlSchema
> > Date:	Wed, 03 Feb 2010 08:07:35 -0600
> > From:	Deepak Sheoran 
> <sheoran143 at gmail.com>
> 
> > To:	
> biojava-l at lists.open-bio.org
> 
> > 
> > Hi guys,
> > 
> > A couple of days back I was having some problem with hibernate exception but that exception got resolved and the reference to that email is:  
> http://old.nabble.com/Hibernate-Exception-when-persisting-some-richsequence-object-to-biosql-schema-to27299245.html
> 
> > Following Richard's suggestion in the above link I was able to resolve some of the issues, but then I got stuck on another Hibernate error. I decided to investigate, and below are some facts which I found and which I guess are going to affect all of us.
> >   - The "Reference" table in the BioSQL schema has a unique constraint on the "dbxref_id" column (CONSTRAINT reference_dbxref_id_key UNIQUE (dbxref_id)), which means only one entry in the reference table can use a given dbxref_id.
> > This works well, except when there is a little variation in the values of the "location", "title" and "authors" columns and all these variations refer to the same PUBMED_ID. In that case we can't persist or create a RichSequence object.
> > When you tie RichObjectFactory to an active Hibernate session, the class "BioSqlRichObjectBuilder" has a method "buildObject(Class clazz, List paramsList)" which is responsible for looking up the details of the object in the database. If it finds one it returns that object; otherwise it tries to persist the new object into the database.
> > But the problem is with the part of that method below:
> > ... LineNumber: 114
> > else if (SimpleDocRef.class.isAssignableFrom(clazz)) {
> >     queryType = "DocRef";
> >     // convert List constructor to String representation for query
> >     ourParamsList.set(0, DocRefAuthor.Tools.generateAuthorString((List)ourParamsList.get(0), true));
> >     if (ourParamsList.size()<3) {
> >         queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title is null";
> >     } else {
> >         queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title = ?";
> >     }
> > }
> > ... LineNumber: 123
> > Now when Hibernate searches the database, it won't find the existing record in the "reference" table because the two records differ in a string comparison, so it returns a new object back to "GenbankFormat", to the following piece of code:
> > ... LineNumber: 447
> > else {
> >     try {
> >         CrossRef cr = (CrossRef)RichObjectFactory.getObject(SimpleCrossRef.class,new Object[]{dbname, raccession, new Integer(0)});
> >         RankedCrossRef rcr = new SimpleRankedCrossRef(cr, ++rcrossrefCount);
> >         rlistener.getCurrentFeature().addRankedCrossRef(rcr);
> >     } catch (ChangeVetoException e) {
> >         throw new ParseException(e+", accession:"+accession);
> >     }
> > }
> > ... LineNumber: 455
> > Then we add that object to rlistener and move on to the next part of the GenBank record. When BioJava next searches the database for a new crossref, it tries to persist the earlier object and gets a Hibernate exception about violating the unique constraint on the "dbxref_id" column.
> >  
> > The only ways to get these records into the database are:
> >   1. The very easy solution, and the way I did it to test my theory, is to change the BioSQL schema so that it allows a many-to-one relation between the "reference" and "dbxref" tables. This even makes sense, because one paper can have many different naming variations, and this change allows us to store that information too. But this is something the BioSQL people have to decide, and I don't know how to approach them.
> >   2. The second solution, slightly more difficult to implement, is to change the way "BioSqlRichObjectBuilder.buildObject(Class clazz,List paramsList)" decides whether a particular DocRef already exists in the database. I mean testing all possible string variations of the authors, location and title of the DocRef we are searching for. This has many complications and may slow down the creation of a RichSequence object when RichObjectFactory is linked with an active Hibernate session.
> >  
> > Example: Below is a sample of what I have in my local BioSQL schema, which includes the modification I suggested. (The Dbxref_id column shows the PubMed ID: I replaced the local dbxref_id stored in this table in my database with the PubMed ID stored in the "dbxref" table, for easy reference with the outside world in this email.)
> > Reference_id: 216
> > Dbxref_id: 18554304
> > Location: FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008)
> > Title: Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model
> > Authors: Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.
> > crc: 9E940E01F4BE3CD0
> >
> > Reference_id: 230
> > Dbxref_id: 18554304
> > Location: FEMS Microbiol. Ecol. 66 (3), 528-536 (2008)
> > Title: Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model
> > Authors: Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.
> > crc: D3BC0C17F3F786C9
> >
> > Reference_id: 415
> > Dbxref_id: 16790744
> > Location: Infect. Immun. 74 (7), 3715-3726 (2006)
> > Title: Intrastrain Heterogeneity of the mgpB Gene in Mycoplasma genitalium Is Extensive In Vitro and In Vivo and Suggests that Variation Is Generated via Recombination with Repetitive Chromosomal Sequences
> > Authors: Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.
> > crc: 60AEDFA0CEEACC38
> >
> > Reference_id: 969
> > Dbxref_id: 16790744
> > Location: Infect. Immun. 74 (7), 3715-3726 (2006)
> > Title: Intrastrain heterogeneity of the mgpB gene in mycoplasma genitalium is extensive in vitro and in vivo and suggests that variation is generated via recombination with repetitive chromosomal sequences
> > Authors: Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.
> > crc: 4B1232999F6E8130
> >
> > Reference_id: 929
> > Dbxref_id: 8688087
> > Location: Science 273 (5278), 1058-1073 (1996)
> > Title: Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii
> > Authors: Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J.-F., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.L., Geoghagen,N.S.M., Weidman,J.F., Fuhrmann,J.L., Presley,E.A., Nguyen,D., Utterback,T.R., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.P., Borodovsky,M., Klenk,H.-P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C.
> > crc: 3E79B40DD2AAA2B7
> >
> > Reference_id: 932
> > Dbxref_id: 8688087
> > Location: Science 273 (5278), 1058-1073 (1996)
> > Title: Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii
> > Authors: Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.D., Geoghagen,N.S., Weidman,J.F., Fuhrmann,J.L., Nguyen,D.T., Utterback,T., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.B., Borodovsky,M., Klenk,H.P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C.
> > crc: 094EB3384F8D6DE8
> >
> > Reference_id: 1426
> > Dbxref_id: 10684935
> > Location: Nucleic Acids Res. 28 (6), 1397-1406 (2000)
> > Title: Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39
> > Authors: Read,T.D., Brunham,R.C., Shen,C., Gill,S.R., Heidelberg,J.F., White,O., Hickey,E.K., Peterson,J., Umayam,L.A., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S.L., Eisen,J. and Fraser,C.M.
> > crc: 357648D8FD8C6C8A
> >
> > Reference_id: 1481
> > Dbxref_id: 10684935
> > Location: Nucleic Acids Res. 28 (6), 1397-1406 (2000)
> > Title: Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39
> > Authors: Read,T., Brunham,R., Shen,C., Gill,S., Heidelberg,J., White,O., Hickey,E., Peterson,J., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S., Eisen,J. and Fraser,C.
> > crc: 115411EB2DEE5654
> >
> > Reference_id: 1497
> > Dbxref_id: 14689165
> > Location: Arch. Microbiol. 181 (2), 144-154 (2004)
> > Title: The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner
> > Authors: Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E.
> > crc: 4D5D376EECCD186B
> >
> > Reference_id: 1501
> > Dbxref_id: 14689165
> > Location: Arch. Microbiol. 181 (2), 144-154 (2004)
> > Title: The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner
> > Authors: Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., Del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E.
> > crc: 4D57954EECDED66B
> >
> > Reference_id: 1556
> > Dbxref_id: 18060065
> > Location: PLoS ONE 2 (12), E1271 (2007)
> > Title: Analysis of the Neurotoxin Complex Genes in Clostridium botulinum A1-A4 and B1 Strains: BoNT/A3, /Ba4 and /B1 Clusters Are Located within Plasmids
> > Authors: Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,A.C., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S.
> > crc: 698688FB6DB95247
> >
> > Reference_id: 1559
> > Dbxref_id: 18060065
> > Location: PLoS ONE 2 (12), E1271 (2007)
> > Title: Analysis of the neurotoxin complex genes in Clostridium botulinum A1-A4 and B1 strains: BoNT/A3, /Ba4 and /B1 clusters are located within plasmids
> > Authors: Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,C.A., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S.
> > crc: E25E1BA99DB18F3D
> >  
> >   - The second kind of error which I got was: org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
> >     - This means that in the RichSequence object some feature has a location object whose feature is set to null.
> >     - My observations:
> >       - It usually occurs when you try to persist a RichSequence object to the database, and it affects features which have a CompoundRichLocation, usually "join" and "complement" locations in the CDS region of a GenBank record.
> >       - After catching the Hibernate exception I went through all the features, and either BioJava or Hibernate had changed the object type from CompoundRichLocation to SimpleRichLocation and set the feature variable to null.
> >       - Below are screenshots from one of my tests:
> >         - Settings before trying to persist the RichSequence object to the database:
> >  
> > <Mail Attachment.png>
> >  
> >         - After trying to persist the RichSequence object to the database, inside the Hibernate exception catch block:
> >  
> > <Mail Attachment.png>
> >  
> >     - So my question is: why is this happening, and how can I stop it or get these records into the database? I have no clue why this is happening.
> >     - Some extra information to make things clearer:
> >       - Below are some LOCUS lines from GenBank records for which I know the location of the error, i.e. the CDS region causing the error and its array index in the richSequence.feature ArrayList.
> >         - LOCUS       AE001439             1643831 bp    DNA     circular BCT 19-JAN-2006
> >           - richSequence.feature index: 2540; line number in the GenBank record: 22115
> >         - LOCUS       CP001189             3887492 bp    DNA     circular BCT 16-OCT-2008
> >           - richSequence.feature index: 127; line number in the GenBank record: 2137
> >         - LOCUS       CP001292              328635 bp    DNA     circular BCT 17-DEC-2008
> >           - richSequence.feature index: 389; line number in the GenBank record: 3632
> >         - LOCUS       AM279694              238517 bp    DNA     linear   BCT 23-OCT-2008
> >           - richSequence.feature index: 47; line number in the GenBank record: 4841
> >         - LOCUS       CR931663               18517 bp    DNA     linear   BCT 18-SEP-2008
> >           - richSequence.feature index: 45; line number in the GenBank record: 442
> >     - The complete exception message:
> > org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
> >         at org.hibernate.engine.Nullability.checkNullability(Nullability.java:72)
> >         at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:290)
> >         at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
> >         at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
> >         at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
> >         at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
> >         at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
> >         at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
> >         at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
> >         at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> >         at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
> >         at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
> >         at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
> >         at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> >         at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
> >         at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
> >         at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
> >         at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
> >         at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
> >         at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
> >         at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
> >         at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
> >         at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
> >         at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
> >         at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> >         at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
> >         at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
> >         at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
> >         at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> >         at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
> >         at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
> >         at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
> >         at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
> >         at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
> >         at org.hibernate.event.def.DefaultSaveEventListener.saveWithGeneratedOrRequestedId(DefaultSaveEventListener.java:33)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
> >         at org.hibernate.event.def.DefaultSaveEventListener.performSaveOrUpdate(DefaultSaveEventListener.java:27)
> >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
> >         at org.hibernate.impl.SessionImpl.fireSave(SessionImpl.java:535)
> >         at org.hibernate.impl.SessionImpl.save(SessionImpl.java:523)
> >         at trashtesting.GenBankLoaderTesting.main(GenBankLoaderTesting.java:78)
> >  
> >  
> 
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
> 
> 
> 
> <Biojava_BioPerl_diff.xls><BioSqlRichObjectBuilder.patch><GenbankFormat.patch><GenbankRecord.doc>

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
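
The stack trace quoted above is Hibernate's not-null check failing during a cascaded save; the thread attributes it to a location whose feature has been set to null. A minimal sketch of a pre-save check that finds such locations is below. The `Loc` class is a hypothetical, simplified stand-in introduced only for illustration; it is not BioJava's RichLocation API, and the real fix may look quite different.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NullFeatureCheck {

    /** Hypothetical simplified location; NOT BioJava's RichLocation. */
    public static final class Loc {
        public final Object feature;   // owning feature, null when the link was lost
        public final List<Loc> blocks; // member locations of a compound location

        public Loc(Object feature, Loc... blocks) {
            this.feature = feature;
            this.blocks = Arrays.asList(blocks);
        }
    }

    /** True if this location, or any member block below it, has no feature. */
    public static boolean hasNullFeature(Loc loc) {
        if (loc.feature == null) {
            return true;
        }
        for (Loc block : loc.blocks) {
            if (hasNullFeature(block)) {
                return true;
            }
        }
        return false;
    }

    /** Indices of locations that would trip a not-null constraint on the feature. */
    public static List<Integer> offendingIndices(List<Loc> locations) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < locations.size(); i++) {
            if (hasNullFeature(locations.get(i))) {
                result.add(i);
            }
        }
        return result;
    }
}
```

Running a check of this kind before calling save would report the offending feature index directly, instead of requiring a walk through the features after the exception has been caught.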


From andreas at sdsc.edu  Thu Mar 25 16:47:45 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 25 Mar 2010 09:47:45 -0700
Subject: [Biojava-l] Bug fix for Biojava in regard to email with subject
	:( Hibernate Exception and suggestion for change in BioSqlSchema)
In-Reply-To: <4FAB0AC5-3D97-4FD8-8A7E-81D1D6381D39@eaglegenomics.com>
References: <4BAABA21.4000301@gmail.com>
	<4FAB0AC5-3D97-4FD8-8A7E-81D1D6381D39@eaglegenomics.com>
Message-ID: <59a41c431003250947g6ecd11cbw21c5be5858b9aa09@mail.gmail.com>

Excellent, thanks Richard and Deepak!
Andreas

On Thu, Mar 25, 2010 at 9:27 AM, Richard Holland
<holland at eaglegenomics.com> wrote:

> Patched and in subversion on the head in the new Biojava 3 code. I modified
> the code slightly to simplify it. There were also parallel changes required
> over in SimpleDocRef itself to enable it to continue working without being
> connected to BioSQL.
>
> On 25 Mar 2010, at 01:19, Deepak Sheoran wrote:
>
> > I am writing this email again because I didn't get any response about
> > whether these bugs were patched or the patches got lost somewhere on the
> > mailing list. I don't know how to apply the patches myself, so I am
> > counting on you guys to apply them and reply so I know it's fixed.
> >
> >
> >
> > Thanks
> > Deepak Sheoran
> >
> >
> > Hi
> > In response to the bug fix suggested by Richard I have created some patches.
> > We need to apply these to stop biojava from processing the references of a
> > genbank record in the wrong manner, which causes more hibernate exceptions.
> > After applying the patches, the reference resolution code will test the
> > pubmed or medline id, then if there is no match test author/title/location,
> > and if there is still no match create a new reference. I tested it with
> > GenBank release 175 and gained almost 3159 more records in my database.
> >
> > Can somebody please have a look at the second issue and fix it:
> > "
> > 2. I think that's a bug (compound locations with null features) but not
> sure why. Could be that the process of constructing a CompoundRichLocation
> is somehow losing the feature reference from the original
> SimpleRichLocation. Again I can't investigate until March - can someone else
> take a look at the code? (A good starting point would be to look at how a
> CompoundRichLocation decides to select the feature from the
> SimpleRichLocations it is made up from).
> > "
> >
> > I am also planning to build a bridge between BioSQL databases loaded
> > using bioperl and biojava. Here is some of my investigation; can you guys
> > suggest a direction for it?
> > Have a look at the attached files:
> > 1) Biojava_BioPerl_Diff.xls  ==> a view of the tables where a genbank
> > record is stored in a biosql instance by bioperl and by biojava
> > 2) GenbankRecord.doc  ==> a Word document with a genbank record, showing
> > where its information goes in biosql using bioperl and biojava
> > 3) BioSqlRichobjectBuilder.patch ==> patch needed for the
> > BioSqlRichObjectBuilder.java class
> > 4) GenBankFormat.patch ==> patch needed for the GenBankFormat.java class
> >
> >
> > Thanks
> > Deepak Sheoran
> >
> >
> >
> > -------- Original Message --------
> > Subject:      Re: Hibernate Exception and suggestion for change in
> BioSqlSchema
> > Date: Tue, 9 Feb 2010 20:34:32 +1300
> > From: Richard Holland <holland at eaglegenomics.com>
> > To:   Deepak Sheoran <sheoran143 at gmail.com>
> > CC:   biojava-l at biojava.org
> >
> > Hi. It's possible that your original email didn't make it to the list
> because it is HTML format, and the list only accepts plain text.
> >
> > However, in answer to your two questions:
> >
> >   1. The code that does the resolution of references might be better if
> it looks up existing IDs rather than using author, title, location to
> identify existing records. I would suggest modifying it to a three-step
> process - test ID, then if no match then test author/title/location, then if
> still no match create a new reference. Could someone do that? (I'm unable to
> do anything until late March).
> >
> >   2. I think that's a bug (compound locations with null features) but not
> sure why. Could be that the process of constructing a CompoundRichLocation
> is somehow losing the feature reference from the original
> SimpleRichLocation. Again I can't investigate until March - can someone else
> take a look at the code? (A good starting point would be to look at how a
> CompoundRichLocation decides to select the feature from the
> SimpleRichLocations it is made up from).
> >
> > cheers,
> > Richard
> >
> > On 9 Feb 2010, at 20:21, Deepak Sheoran wrote:
> >
> > >
> > > Hi Richard
> > >
> > > Below is the email which I sent to the Biojava-l mailing list. It never
> > > got posted on the mailing list server and I didn't get any response, so
> > > please have a look at this email and tell me what the solution to the
> > > problem described in the message could be.
> > >
> > >
> > > Thanks
> > > Deepak Sheoran
> > > -------- Original Message --------
> > > Subject:    Hibernate Exception and suggestion for change in
> BioSqlSchema
> > > Date:       Wed, 03 Feb 2010 08:07:35 -0600
> > > From:       Deepak Sheoran <sheoran143 at gmail.com>
> > > To:         biojava-l at lists.open-bio.org
> > >
> > > Hi guys,
> > >
> > > A couple of days back I had a problem with a hibernate exception. That
> > > exception got resolved; the earlier email is at:
> >
> http://old.nabble.com/Hibernate-Exception-when-persisting-some-richsequence-object-to-biosql-schema-to27299245.html
> >
> > > Following Richard's suggestion in the above link I was able to resolve
> > > some of the issues, but then I got stuck on another hibernate error, so I
> > > decided to investigate. Below are the facts I found, which I guess are
> > > going to affect all of us.
> > >     - The "Reference" table in the BioSQL schema has a unique constraint on
> > > the "dbxref_id" column (CONSTRAINT reference_dbxref_id_key UNIQUE (dbxref_id)),
> > > which means only one entry in the reference table can use a given dbxref_id.
> > > This works well except when there are small variations in the values of the
> > > "location", "title" and "authors" columns and all these variations refer to
> > > the same PUBMED_ID. Then we can't persist or create a richsequence object.
> > > Now when you tie RichObjectFactory to an active hibernate session, the
> > > class "BioSqlRichObjectBuilder" has a method "buildObject(Class clazz,
> > > List paramsList)" which is responsible for looking up the object in the
> > > database; if it finds one it returns that object, otherwise it tries to
> > > persist the new object into the database.
> > > But the problem is with the part of that method below:
> > > ... LineNumber: 114
> > > else if (SimpleDocRef.class.isAssignableFrom(clazz)) {
> > >     queryType = "DocRef";
> > >     // convert List constructor to String representation for query
> > >     ourParamsList.set(0, DocRefAuthor.Tools.generateAuthorString((List)ourParamsList.get(0), true));
> > >     if (ourParamsList.size()<3) {
> > >         queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title is null";
> > >     } else {
> > >         queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title = ?";
> > >     }
> > > }
> > > ... LineNumber: 123
> > > Now when hibernate searches the database, it won't find the existing
> > > record in the "reference" table, because the two records differ in a string
> > > comparison. So it returns a new object to "GenbankFormat", to the
> > > following piece of code:
> > > ... LineNumber: 447
> > > else {
> > >     try {
> > >         CrossRef cr = (CrossRef)RichObjectFactory.getObject(SimpleCrossRef.class,new Object[]{dbname, raccession, new Integer(0)});
> > >         RankedCrossRef rcr = new SimpleRankedCrossRef(cr, ++rcrossrefCount);
> > >         rlistener.getCurrentFeature().addRankedCrossRef(rcr);
> > >     } catch (ChangeVetoException e) {
> > >         throw new ParseException(e+", accession:"+accession);
> > >     }
> > > }
> > > ... LineNumber: 455
> > > Then we add that object to rlistener and move to the next part of the
> > > genbank record. When biojava then searches for a new crossref in the
> > > database, it tries to persist the old one and gets a hibernate exception
> > > about violating the unique constraint on the "dbxref_id" column.
> > >
> > > The only ways to get these records into the database are:
> > >             - The very easy solution, and the way I did it to test my
> > > theory, is to change the BioSQL schema to allow a many-to-one relation
> > > between the "reference" and "dbxref" tables. This even makes sense, because
> > > one paper can appear under many naming variations, and this change lets us
> > > store that information too. But this is something the BioSQL people have
> > > to decide, and I don't know how to approach them.
> > >             - The second solution, slightly more difficult to implement, is
> > > to change the way "BioSqlRichObjectBuilder.buildObject(Class clazz, List
> > > paramsList)" decides whether a particular DocRef already exists in the
> > > database. I mean testing all possible string variations of the authors,
> > > location and title of the DocRef we are searching for. This has many
> > > complications and may slow down the creation of richsequence objects when
> > > RichObjectFactory is linked to an active hibernate session.
> > >
> > > Example: Below is a sample of what I have in my local BioSQL schema,
> > > which has the modification I suggested. (The Dbxref_id column shows the
> > > PubMed ID: I replaced the local dbxref_id in this table with the
> > > pubmed_id stored in the "dbxref" table, for easy reference with the outside
> > > world in this email.)
> > > Reference_id: 216
> > > Dbxref_id:    18554304
> > > Location:     FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008)
> > > Title:        Isolation of lactate-utilizing butyrate-producing bacteria from human
> > >   feces and in vivo administration of Anaerostipes caccae strain L2 and
> > >   galacto-oligosaccharides in a rat model
> > > Authors:      Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y.,
> > >   Nomoto,K., Ito,M. and Sawada,H.
> > > crc:          9E940E01F4BE3CD0
> > >
> > > Reference_id: 230
> > > Dbxref_id:    18554304
> > > Location:     FEMS Microbiol. Ecol. 66 (3), 528-536 (2008)
> > > Title:        Isolation of lactate-utilizing butyrate-producing bacteria from human
> > >   feces and in vivo administration of Anaerostipes caccae strain L2 and
> > >   galacto-oligosaccharides in a rat model
> > > Authors:      Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y.,
> > >   Nomoto,K., Ito,M. and Sawada,H.
> > > crc:          D3BC0C17F3F786C9
> > >
> > > Reference_id: 415
> > > Dbxref_id:    16790744
> > > Location:     Infect. Immun. 74 (7), 3715-3726 (2006)
> > > Title:        Intrastrain Heterogeneity of the mgpB Gene in Mycoplasma genitalium Is
> > >   Extensive In Vitro and In Vivo and Suggests that Variation Is Generated via
> > >   Recombination with Repetitive Chromosomal Sequences
> > > Authors:      Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and
> > >   Totten,P.A.
> > > crc:          60AEDFA0CEEACC38
> > >
> > > Reference_id: 969
> > > Dbxref_id:    16790744
> > > Location:     Infect. Immun. 74 (7), 3715-3726 (2006)
> > > Title:        Intrastrain heterogeneity of the mgpB gene in mycoplasma genitalium is
> > >   extensive in vitro and in vivo and suggests that variation is generated via
> > >   recombination with repetitive chromosomal sequences
> > > Authors:      Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and
> > >   Totten,P.A.
> > > crc:          4B1232999F6E8130
> > >
> > > Reference_id: 929
> > > Dbxref_id:    8688087
> > > Location:     Science 273 (5278), 1058-1073 (1996)
> > > Title:        Complete genome sequence of the methanogenic archaeon, Methanococcus
> > >   jannaschii
> > > Authors:      Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D.,
> > >   Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D.,
> > >   Kerlavage,A.R., Dougherty,B.A., Tomb,J.-F., Adams,M.D., Reich,C.I.,
> > >   Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A.,
> > >   Scott,J.L., Geoghagen,N.S.M., Weidman,J.F., Fuhrmann,J.L., Presley,E.A.,
> > >   Nguyen,D., Utterback,T.R., Kelley,J.M., Peterson,J.D., Sadow,P.W.,
> > >   Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.P.,
> > >   Borodovsky,M., Klenk,H.-P., Fraser,C.M., Smith,H.O., Woese,C.R. and
> > >   Venter,J.C.
> > > crc:          3E79B40DD2AAA2B7
> > >
> > > Reference_id: 932
> > > Dbxref_id:    8688087
> > > Location:     Science 273 (5278), 1058-1073 (1996)
> > > Title:        Complete genome sequence of the methanogenic archaeon, Methanococcus
> > >   jannaschii
> > > Authors:      Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D.,
> > >   Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D.,
> > >   Kerlavage,A.R., Dougherty,B.A., Tomb,J., Adams,M.D., Reich,C.I.,
> > >   Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A.,
> > >   Scott,J.D., Geoghagen,N.S., Weidman,J.F., Fuhrmann,J.L., Nguyen,D.T.,
> > >   Utterback,T., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C.,
> > >   Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.B., Borodovsky,M.,
> > >   Klenk,H.P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C.
> > > crc:          094EB3384F8D6DE8
> > >
> > > Reference_id: 1426
> > > Dbxref_id:    10684935
> > > Location:     Nucleic Acids Res. 28 (6), 1397-1406 (2000)
> > > Title:        Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae
> > >   AR39
> > > Authors:      Read,T.D., Brunham,R.C., Shen,C., Gill,S.R., Heidelberg,J.F., White,O.,
> > >   Hickey,E.K., Peterson,J., Umayam,L.A., Utterback,T., Berry,K., Bass,S.,
> > >   Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M.,
> > >   Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S.L., Eisen,J. and
> > >   Fraser,C.M.
> > > crc:          357648D8FD8C6C8A
> > >
> > > Reference_id: 1481
> > > Dbxref_id:    10684935
> > > Location:     Nucleic Acids Res. 28 (6), 1397-1406 (2000)
> > > Title:        Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae
> > >   AR39
> > > Authors:      Read,T., Brunham,R., Shen,C., Gill,S., Heidelberg,J., White,O.,
> > >   Hickey,E., Peterson,J., Utterback,T., Berry,K., Bass,S., Linher,K.,
> > >   Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W.,
> > >   DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S., Eisen,J. and Fraser,C.
> > > crc:          115411EB2DEE5654
> > >
> > > Reference_id: 1497
> > > Dbxref_id:    14689165
> > > Location:     Arch. Microbiol. 181 (2), 144-154 (2004)
> > > Title:        The effect of FITA mutations on the symbiotic properties of
> > >   Sinorhizobium fredii varies in a chromosomal-background-dependent manner
> > > Authors:      Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R.,
> > >   del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G.
> > >   and Ruiz-Sainz,J.E.
> > > crc:          4D5D376EECCD186B
> > >
> > > Reference_id: 1501
> > > Dbxref_id:    14689165
> > > Location:     Arch. Microbiol. 181 (2), 144-154 (2004)
> > > Title:        The effect of FITA mutations on the symbiotic properties of
> > >   Sinorhizobium fredii varies in a chromosomal-background-dependent manner
> > > Authors:      Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R.,
> > >   Del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G.
> > >   and Ruiz-Sainz,J.E.
> > > crc:          4D57954EECDED66B
> > >
> > > Reference_id: 1556
> > > Dbxref_id:    18060065
> > > Location:     PLoS ONE 2 (12), E1271 (2007)
> > > Title:        Analysis of the Neurotoxin Complex Genes in Clostridium botulinum A1-A4
> > >   and B1 Strains: BoNT/A3, /Ba4 and /B1 Clusters Are Located within Plasmids
> > > Authors:      Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,A.C., Bruce,D.C.,
> > >   Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S.
> > > crc:          698688FB6DB95247
> > >
> > > Reference_id: 1559
> > > Dbxref_id:    18060065
> > > Location:     PLoS ONE 2 (12), E1271 (2007)
> > > Title:        Analysis of the neurotoxin complex genes in Clostridium botulinum A1-A4
> > >   and B1 strains: BoNT/A3, /Ba4 and /B1 clusters are located within plasmids
> > > Authors:      Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,C.A., Bruce,D.C.,
> > >   Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S.
> > > crc:          E25E1BA99DB18F3D
> > >
> > >     - The second kind of error which I got was:
> > > org.hibernate.PropertyValueException: not-null property references a null or
> > > transient value: Location.feature
> > >         - This means that in the richsequence object some feature has a
> > > location object whose feature is set to null.
> > >         - My observations:
> > >             - It usually occurs when you try to persist a richsequence
> > > object to the database, and it affects features which have a
> > > CompoundRichLocation, usually "join" and "complement" in the CDS region of
> > > a genbank record.
> > >             - After catching the hibernate exception I went through all
> > > the features: either biojava or hibernate had changed the object type of a
> > > CompoundRichLocation to SimpleRichLocation and set the feature variable to
> > > null.
> > >             - Below are screenshots from one of my tests.
> > >                 - Settings before trying to persist the richsequence
> > > object to the database:
> > >
> > > <Mail Attachment.png>
> > >
> > >                 - After trying to persist the richsequence object to the
> > > database, inside the hibernate exception catch block:
> > >
> > > <Mail Attachment.png>
> > >
> > >         - So my question is: why is this happening, and how can I stop it
> > > or get these records into the database? I have no clue why this is
> > > happening.
> > >         - Some extra information to make things clearer:
> > >             - Below are the LOCUS lines of some genbank records for which
> > > I know the location of the error, i.e. the CDS region causing it, and its
> > > array index in the richSequence.feature ArrayList.
> > >                 - LOCUS       AE001439             1643831 bp    DNA     circular BCT 19-JAN-2006
> > >                     - richSequence.feature index: 2540, line number in the genbank record: 22115
> > >                 - LOCUS       CP001189             3887492 bp    DNA     circular BCT 16-OCT-2008
> > >                     - richSequence.feature index: 127, line number in the genbank record: 2137
> > >                 - LOCUS       CP001292              328635 bp    DNA     circular BCT 17-DEC-2008
> > >                     - richSequence.feature index: 389, line number in the genbank record: 3632
> > >                 - LOCUS       AM279694              238517 bp    DNA     linear   BCT 23-OCT-2008
> > >                     - richSequence.feature index: 47, line number in the genbank record: 4841
> > >                 - LOCUS       CR931663               18517 bp    DNA     linear   BCT 18-SEP-2008
> > >                     - richSequence.feature index: 45, line number in the genbank record: 442
> > >         - The complete exception message:
> > > org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
> > >         at org.hibernate.engine.Nullability.checkNullability(Nullability.java:72)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:290)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
> > >         at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
> > >         at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
> > >         at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
> > >         at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
> > >         at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
> > >         at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> > >         at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
> > >         at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
> > >         at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
> > >         at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> > >         at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
> > >         at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
> > >         at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
> > >         at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
> > >         at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
> > >         at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
> > >         at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> > >         at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
> > >         at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
> > >         at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
> > >         at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> > >         at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
> > >         at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
> > >         at org.hibernate.event.def.DefaultSaveEventListener.saveWithGeneratedOrRequestedId(DefaultSaveEventListener.java:33)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
> > >         at org.hibernate.event.def.DefaultSaveEventListener.performSaveOrUpdate(DefaultSaveEventListener.java:27)
> > >         at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
> > >         at org.hibernate.impl.SessionImpl.fireSave(SessionImpl.java:535)
> > >         at org.hibernate.impl.SessionImpl.save(SessionImpl.java:523)
> > >         at trashtesting.GenBankLoaderTesting.main(GenBankLoaderTesting.java:78)
> > >
> > >
> >
> > --
> > Richard Holland, BSc MBCS
> > Operations and Delivery Director, Eagle Genomics Ltd
> > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> > http://www.eaglegenomics.com/
> >
> >
> >
> >
> <Biojava_BioPerl_diff.xls><BioSqlRichObjectBuilder.patch><GenbankFormat.patch><GenbankRecord.doc>
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
>
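
The three-step reference lookup discussed in the quoted thread above (test the PubMed/Medline ID, then author/title/location, then create a new reference) can be sketched roughly as follows. This is only an in-memory illustration under stated assumptions: `ReferenceResolver`, `Ref` and `resolve` are hypothetical names, the list stands in for the BioSQL "reference" table, and the patched BioSqlRichObjectBuilder code in subversion is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Illustrative only: an in-memory stand-in for the BioSQL "reference" table. */
public class ReferenceResolver {

    /** Hypothetical value class; not BioJava's DocRef interface. */
    public static final class Ref {
        public final String pubmedId; // may be null when the record has no ID
        public final String authors;
        public final String location;
        public final String title;    // may be null

        Ref(String pubmedId, String authors, String location, String title) {
            this.pubmedId = pubmedId;
            this.authors = authors;
            this.location = location;
            this.title = title;
        }
    }

    private final List<Ref> stored = new ArrayList<>();

    public Ref resolve(String pubmedId, String authors, String location, String title) {
        // Step 1: an existing row with the same ID wins, whatever its strings say.
        if (pubmedId != null) {
            for (Ref r : stored) {
                if (pubmedId.equals(r.pubmedId)) {
                    return r;
                }
            }
        }
        // Step 2: fall back to an exact author/location/title comparison.
        for (Ref r : stored) {
            if (Objects.equals(authors, r.authors)
                    && Objects.equals(location, r.location)
                    && Objects.equals(title, r.title)) {
                return r;
            }
        }
        // Step 3: nothing matched, so create and remember a new reference.
        Ref created = new Ref(pubmedId, authors, location, title);
        stored.add(created);
        return created;
    }

    public int size() {
        return stored.size();
    }
}
```

With this ordering, two records that cite PubMed ID 18554304 with differently formatted location strings resolve to the same stored reference, so no second row competes for the same dbxref_id.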


From andreas at sdsc.edu  Thu Mar 25 16:56:21 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 25 Mar 2010 09:56:21 -0700
Subject: [Biojava-l] Biojava-l Digest, Vol 86, Issue 9
In-Reply-To: <4BAB70D6.5060309@uni-tuebingen.de>
References: <mailman.7.1268928002.8593.biojava-l@lists.open-bio.org>
	<3baec70c1003190350h7e73c1e7q726b823b5e3eae82@mail.gmail.com>
	<59a41c431003191042l406bdba1g501290b96c018ed0@mail.gmail.com>
	<3baec70c1003210007t39f22261j9838b3eb5b9ff861@mail.gmail.com>
	<59a41c431003221646u2b1da091uc81d601dbf412599@mail.gmail.com>
	<3baec70c1003240332n25b11e10i23c2c00c96a71a89@mail.gmail.com>
	<59a41c431003240837w3db713a3v3161b4e078faa483@mail.gmail.com>
	<4BAB70D6.5060309@uni-tuebingen.de>
Message-ID: <59a41c431003250956h14abdbe2t1367bec10069d1f3@mail.gmail.com>

Hi Andreas,

that sounds great! I'll take a look at this soon...

Thanks,
Andreas

On Thu, Mar 25, 2010 at 7:19 AM, Andreas Dräger <
andreas.draeger at uni-tuebingen.de> wrote:

> Hi Andreas and Shakuntala,
>
> The alignment classes have just been revised and can now be updated from
> the repository. As a major improvement, the alignment result has become much
> easier to use. So, if you're interested in computing something based on the
> score, you can now simply call the dedicated get method and don't have to
> care about parsing anymore. I hope that helps.
>
> Cheers
> Andreas
>
> --
> Dipl.-Bioinform. Andreas Dräger
> Eberhard Karls University Tübingen
> Center for Bioinformatics (ZBIT)
> Sand 1
> 72076 Tübingen
> Germany
>
> Phone: +49-7071-29-70436
> Fax:   +49-7071-29-5091
>


From zhangyiwei79 at gmail.com  Thu Mar 25 20:14:50 2010
From: zhangyiwei79 at gmail.com (Yiwei Zhang)
Date: Thu, 25 Mar 2010 16:14:50 -0400
Subject: [Biojava-l] Question about All-Java Multiple Sequence Alignment
	project of Google Summer of Code
Message-ID: <55c06db21003251314u142e44d3v25dac787216234e0@mail.gmail.com>

Hi,

I am a graduate student of computer science and my field of study is related
to bioinformatics algorithms. I am proficient in Java programming. I am
very interested in this project because I am currently working on sequence
alignment and phylogenetic tree reconstruction.

My question is: if the project requires implementing the existing
alignment algorithms of current tools, what is the
original implementation language of those tools? C++ or C or something else?

Thanks!


From biopython at maubp.freeserve.co.uk  Thu Mar 25 22:16:55 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 25 Mar 2010 22:16:55 +0000
Subject: [Biojava-l] [Biojava-dev] Bug fix for Biojava in regard to
	email with subject : ( Hibernate Exception and suggestion for
	change in BioSqlSchema)
In-Reply-To: <4BABAFA1.6090806@orionbiosciences.com>
References: <4BAABA21.4000301@gmail.com>
	<4FAB0AC5-3D97-4FD8-8A7E-81D1D6381D39@eaglegenomics.com>
	<4BABAFA1.6090806@orionbiosciences.com>
Message-ID: <320fb6e01003251516w2977ab2h9869342f94576287@mail.gmail.com>

On Thu, Mar 25, 2010 at 6:46 PM, Deepak Sheoran
<deepak.sheoran at orionbiosciences.com> wrote:
>
> That is the reason why I was getting an error when I was creating a
> RichSequence object without any active session to biosql. I didn't have a
> clue that I had created one more bug by fixing one; thanks for noticing
> that and fixing it.
>
> I am wondering whether we should propose bioperl-biojava and biosql
> compatibility as one of the Google Summer of Code projects. I have a vision
> for this, but don't know the right way to begin. This can help people who
> want to use biojava but can't because they are afraid to lose their Perl
> code, which is heavily dependent on the Perl way of loading the schema. Or
> we could come up with a hybrid way which takes the good from both languages.
>
> Deepak Sheoran

That is an interesting idea for GSoC, I wonder if we at Biopython
should do the same. I know of a few things where we differ from
BioPerl's BioSQL support (e.g. SwissProt comment lines).

[I take it we agree that bioperl-db is the de facto reference
implementation for mapping GenBank etc into BioSQL?]

Peter


From chapman at cs.wisc.edu  Fri Mar 26 07:14:24 2010
From: chapman at cs.wisc.edu (Mark Chapman)
Date: Fri, 26 Mar 2010 02:14:24 -0500
Subject: [Biojava-l] Question about All-Java Multiple Sequence Alignment
 project of Google Summer of Code
In-Reply-To: <55c06db21003251314u142e44d3v25dac787216234e0@mail.gmail.com>
References: <55c06db21003251314u142e44d3v25dac787216234e0@mail.gmail.com>
Message-ID: <4BAC5ED0.1050009@cs.wisc.edu>

Hi Yiwei (and list members),

I am also a graduate student in Bioinformatics interested in the Google Summer 
of Code project.  The authors' current implementations of ClustalW and ClustalX 
are written in C++.  Binaries, code, and references are located at 
http://www.clustal.org/ .  Download the boldfaced references (Larkin et al 2007 
and Thompson et al 1994) for the most relevant information.

Take care,
Mark


On 3/25/2010 3:14 PM, Yiwei Zhang wrote:
> Hi,
>
> I am a graduate student of computer science and my field of study is related
> to Bioinformatic algorithms. I am proficient at  JAVA programming. I feel
> very interested in this project because currently I am working on sequence
> alignment and phylogeny tree reconstruction.
>
> My question is that, if the project requires implementing the existing
> alignment algorithms of current tools, what is the
> original implementation language of the tools? C++ or C or something else?
>
> Thanks!
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l


From bernd.jagla at pasteur.fr  Fri Mar 26 09:33:05 2010
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Fri, 26 Mar 2010 10:33:05 +0100
Subject: [Biojava-l] SVN repository
In-Reply-To: <59a41c431003171039h4ca1267bibc45b0d7d270b2a9@mail.gmail.com>
References: <mailman.3.1268755203.30058.biojava-l@lists.open-bio.org><4BA082EC.8010908@wur.nl><320fb6e01003170316x4fbf924do95dcc84703eaa28e@mail.gmail.com>
	<59a41c431003171039h4ca1267bibc45b0d7d270b2a9@mail.gmail.com>
Message-ID: <776506315DB04C3EBF2A7FDA610390AB@zillumina>

Hi,

I am trying to check out BioJava for the first time, and I am not sure whether
the server is still down. Could you please let me know if it is up or down?

Thanks,

Bernd

> -----Original Message-----
> From: biojava-l-bounces at lists.open-bio.org [mailto:biojava-l-
> bounces at lists.open-bio.org] On Behalf Of Andreas Prlic
> Sent: Wednesday, March 17, 2010 6:40 PM
> To: Richard Finkers
> Cc: biojava-l at lists.open-bio.org
> Subject: Re: [Biojava-l] SVN repository
> 
> I have just heard back from the OBF-helpdesk. The VM hosting the anonymous
> SVN is currently down. Depending on how big the problem turns out to be, it
> will be back at some point later today / should be back latest tomorrow.
> 
> Sorry for this inconvenience.
> Andreas
> 
> On Wed, Mar 17, 2010 at 3:16 AM, Peter
> <biopython at maubp.freeserve.co.uk> wrote:
> 
> > On Wed, Mar 17, 2010 at 7:21 AM, Richard Finkers
> > <Richard.Finkers at wur.nl> wrote:
> > >
> > > Hi,
> > >
> > > I would like to have a look at the BioJava 3 code (and perhaps in the
> > > future contribute to). However, I cannot access the SVN repository
> > > (http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava-live/trunk).
> > >
> > > Is the repository down?
> > >
> > > Thanks,
> > > Richard
> >
> > Probably :(
> >
> > There have been problems discussed on the BioPerl mailing list
> > (they use the same servers), and the OBF team are aware of it:
> > http://lists.open-bio.org/pipermail/bioperl-l/2010-March/032534.html
> >
> > The code.open-bio.org repositories are a read only public mirror,
> > while dev.open-bio.org is the master repository I think is fine
> > (but not available for anonymous download).
> >
> > In the mean time BioPerl have also setup a read only mirror
> > on github - perhaps BioJava could do the same? Meanwhile
> > BioRuby and Biopython are just using github (not SVN or CVS).
> >
> > Peter


From mitlox at op.pl  Fri Mar 26 09:57:41 2010
From: mitlox at op.pl (xyz)
Date: Fri, 26 Mar 2010 19:57:41 +1000
Subject: [Biojava-l] sort fasta file
In-Reply-To: <ccc8042e1003250717s36657fdchef532c512323b784@mail.gmail.com>
References: <20100320201718.4420a9b9@wp01>
	<ccc8042e1003211356t1ea534b6gf6d020983977881e@mail.gmail.com>
	<20100325232337.3021200a@wp01>
	<ccc8042e1003250717s36657fdchef532c512323b784@mail.gmail.com>
Message-ID: <20100326195741.4799c398@wp01>

@Andy: Thank you for the explanation. There is no newline character after
the last sequence in the input file.

@James: I changed the code in order to get the longest sequence first,
but the last sequence is missing.


import java.io.*;
import java.util.*;

import org.biojava.bio.BioException;
import org.biojava.bio.symbol.*;
import org.biojavax.SimpleNamespace;
import org.biojavax.bio.seq.*;

import java.util.Comparator;

public class SortFasta2 {

  static private class RichSequenceComparator implements
  Comparator<RichSequence> {

    public int compare(RichSequence seq1, RichSequence seq2) {
      return  seq2.length() - seq1.length();
    }
  }

  // Usage:  SortFasta unsortedFile.fasta
  public static void main(String[] args) throws FileNotFoundException,
  BioException {

    String fastaFile = "sortFasta.fasta";

    BufferedReader br = new BufferedReader(new FileReader(fastaFile));
    SimpleNamespace ns = new SimpleNamespace("biojava");

    Alphabet dna = AlphabetManager.alphabetForName("DNA");

    RichSequenceIterator rsi = RichSequence.IOTools.readFasta(br,
            dna.getTokenization("token"),
            ns);
    

    SortedSet<RichSequence> sorted = new TreeSet<RichSequence>(new
    SortFasta2.RichSequenceComparator());

    while (rsi.hasNext()) {
      sorted.add(rsi.nextRichSequence());
    }

    Iterator<RichSequence> sortedIt = sorted.iterator();

    /*Do whatever you want here with the list of RichSequences sorted by
    descending length (longest first), I'll just print them. */
    while (sortedIt.hasNext()) {
      //System.out.println(((RichSequence) sortedIt.next()).length());
      //System.out.println(sortedIt.next().getComments());
      System.out.println(sortedIt.next().seqString());
    }
  }
}

Input file:
>1
atccccc
>2
atccccctttttt
>3
atccccccccccccccccctttt
>4
tttttttccccccccccccccccccccccc
>5
tttttttcccccccccccccccccccccca

Output on the screen:
tttttttccccccccccccccccccccccc
atccccccccccccccccctttt
atccccctttttt
atccccc

How is it possible to get the last sequence and print the output in
fasta format on the screen?

Thank you in advance.
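
One thing worth checking before suspecting the parser: sequences 4 and 5 in the
input file above appear to have the same length, and a TreeSet treats two
elements as duplicates whenever its comparator returns 0. With a length-only
comparator, add() would then silently discard the second of the two, which
would match the missing line in the output. The sketch below reproduces the
effect with plain strings standing in for RichSequence objects (an assumption,
so that it runs without BioJava; the class and method names are illustrative
only) and shows two ways to keep equal-length entries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Illustrative sketch only: plain strings stand in for RichSequence objects.
class LengthSortDemo {

  // Same ordering as the comparator in SortFasta2: longest first,
  // and two sequences of equal length compare as "equal" (0).
  static final Comparator<String> LENGTH_ONLY =
      (a, b) -> Integer.compare(b.length(), a.length());

  // Longest first, with the sequence text as a tie-break, so the result is 0
  // only when the two sequences are really identical.
  static final Comparator<String> LENGTH_THEN_TEXT = (a, b) -> {
    int byLength = Integer.compare(b.length(), a.length());
    return byLength != 0 ? byLength : a.compareTo(b);
  };

  // A TreeSet keeps only one element per "comparator-equal" group.
  static List<String> sortWithTreeSet(List<String> seqs, Comparator<String> order) {
    TreeSet<String> sorted = new TreeSet<>(order);
    sorted.addAll(seqs);
    return new ArrayList<>(sorted);
  }

  // Sorting a list never drops elements, whatever the comparator returns.
  static List<String> sortWithList(List<String> seqs, Comparator<String> order) {
    List<String> copy = new ArrayList<>(seqs);
    copy.sort(order);
    return copy;
  }

  public static void main(String[] args) {
    // The five sequences from the input file in this thread.
    List<String> seqs = Arrays.asList(
        "atccccc",
        "atccccctttttt",
        "atccccccccccccccccctttt",
        "tttttttccccccccccccccccccccccc",
        "tttttttcccccccccccccccccccccca");

    System.out.println("TreeSet, length only:       "
        + sortWithTreeSet(seqs, LENGTH_ONLY).size() + " sequences kept");
    System.out.println("TreeSet, length then text:  "
        + sortWithTreeSet(seqs, LENGTH_THEN_TEXT).size() + " sequences kept");
    System.out.println("List.sort, length only:     "
        + sortWithList(seqs, LENGTH_ONLY).size() + " sequences kept");
  }
}
```

In SortFasta2 the equivalent change would be either a tie-break inside
RichSequenceComparator (for example on the sequence name) or collecting the
sequences in a List and sorting that instead of using a TreeSet.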


On Thu, 25 Mar 2010 10:17:31 -0400
James Swetnam wrote:

> Just replace the system.out.println with whatever you want to do with
> the sequences; write them to a file, etc.
> 
> James
> 

On Fri, 26 Mar 2010 09:40:28 +0000
"Andy Law (RI)" wrote:

> Does your input file have a line feed at the end or not? (Just a  
> thought)
> 
> Comparable is for comparing two objects using their "natural"
> ordering and is therefore a "fundamental" property of the class. A
> Comparator lets you compare/sort two objects on any characteristics
> and you can have many different comparators. Since this is a somewhat
> arbitrary way of comparing sequences (you could sort them on
> alphabetical sequence for example, or GC content), I guess that's why
> James used a comparator.
> 


From richard.finkers at wur.nl  Fri Mar 26 10:10:39 2010
From: richard.finkers at wur.nl (Finkers, Richard)
Date: Fri, 26 Mar 2010 11:10:39 +0100
Subject: [Biojava-l] SVN repository
References: <mailman.3.1268755203.30058.biojava-l@lists.open-bio.org><4BA082EC.8010908@wur.nl><320fb6e01003170316x4fbf924do95dcc84703eaa28e@mail.gmail.com>
	<59a41c431003171039h4ca1267bibc45b0d7d270b2a9@mail.gmail.com>
	<776506315DB04C3EBF2A7FDA610390AB@zillumina>
Message-ID: <33AFFE3255BCA043AF09514A6F6BFBAED04C0D@scomp0039.wurnet.nl>

Hi Bernd,

It has been working for two days but it seems to be down again.

Richard


From andy.law at roslin.ed.ac.uk  Fri Mar 26 10:12:11 2010
From: andy.law at roslin.ed.ac.uk (Andy Law (RI))
Date: Fri, 26 Mar 2010 10:12:11 +0000
Subject: [Biojava-l] sort fasta file
In-Reply-To: <20100326195741.4799c398@wp01>
References: <20100320201718.4420a9b9@wp01>
	<ccc8042e1003211356t1ea534b6gf6d020983977881e@mail.gmail.com>
	<20100325232337.3021200a@wp01>
	<ccc8042e1003250717s36657fdchef532c512323b784@mail.gmail.com>
	<20100326195741.4799c398@wp01>
Message-ID: <7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk>


On 26 Mar 2010, at 09:57, xyz wrote:

> @Andy: Thank you for the explanation. After the last sequence in the
> input file in no newline character.
>

Then RichSequenceIterator / RichSequence.IOTools.readFasta() are not  
seeing the last sequence when the file is not terminated with a  
newline character. Is this a bug or a feature, folks?

Later,

Andy
--------
Yada, yada, yada...
The University of Edinburgh is a charitable body, registered in  
Scotland, with registration number SC005336
Disclaimer: This e-mail and any attachments are confidential and  
intended solely for the use of the recipient(s) to whom they are  
addressed. If you have received it in error, please destroy all copies  
and inform the sender.


From andy.law at roslin.ed.ac.uk  Fri Mar 26 10:36:25 2010
From: andy.law at roslin.ed.ac.uk (Andy Law (RI))
Date: Fri, 26 Mar 2010 10:36:25 +0000
Subject: [Biojava-l] sort fasta file
In-Reply-To: <84422194-0966-487C-BB32-537C6BA256CF@eaglegenomics.com>
References: <20100320201718.4420a9b9@wp01>
	<ccc8042e1003211356t1ea534b6gf6d020983977881e@mail.gmail.com>
	<20100325232337.3021200a@wp01>
	<ccc8042e1003250717s36657fdchef532c512323b784@mail.gmail.com>
	<20100326195741.4799c398@wp01>
	<7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk>
	<84422194-0966-487C-BB32-537C6BA256CF@eaglegenomics.com>
Message-ID: <1A0823B4-55A9-4C32-9A9C-8499EE26DEF5@roslin.ed.ac.uk>


On 26 Mar 2010, at 10:28, Richard Holland wrote:

> That there be a bug.

Albeit one with a simple workaround while the SVN server is broken :o}

Later,

Andy


From holland at eaglegenomics.com  Fri Mar 26 10:28:19 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 26 Mar 2010 10:28:19 +0000
Subject: [Biojava-l] sort fasta file
In-Reply-To: <7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk>
References: <20100320201718.4420a9b9@wp01>
	<ccc8042e1003211356t1ea534b6gf6d020983977881e@mail.gmail.com>
	<20100325232337.3021200a@wp01>
	<ccc8042e1003250717s36657fdchef532c512323b784@mail.gmail.com>
	<20100326195741.4799c398@wp01>
	<7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk>
Message-ID: <84422194-0966-487C-BB32-537C6BA256CF@eaglegenomics.com>

That there be a bug. 

On 26 Mar 2010, at 10:12, Andy Law (RI) wrote:

> 
> On 26 Mar 2010, at 09:57, xyz wrote:
> 
>> @Andy: Thank you for the explanation. After the last sequence in the
>> input file in no newline character.
>> 
> 
> Then RichSequenceIterator / RichSequence.IOTools.readFasta() are not seeing the last sequence when the file is not terminated with a newline character. Is this a bug or a feature, folks?

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From holland at eaglegenomics.com  Fri Mar 26 10:41:21 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 26 Mar 2010 10:41:21 +0000
Subject: [Biojava-l] sort fasta file
In-Reply-To: <1A0823B4-55A9-4C32-9A9C-8499EE26DEF5@roslin.ed.ac.uk>
References: <20100320201718.4420a9b9@wp01>
	<ccc8042e1003211356t1ea534b6gf6d020983977881e@mail.gmail.com>
	<20100325232337.3021200a@wp01>
	<ccc8042e1003250717s36657fdchef532c512323b784@mail.gmail.com>
	<20100326195741.4799c398@wp01>
	<7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk>
	<84422194-0966-487C-BB32-537C6BA256CF@eaglegenomics.com>
	<1A0823B4-55A9-4C32-9A9C-8499EE26DEF5@roslin.ed.ac.uk>
Message-ID: <A11F888E-C180-40B2-BD84-1590F4FC7905@eaglegenomics.com>

Do you have a fix? I can't remember if you've got SVN access or not - if you do, please do commit it, otherwise email me a patch and I'll commit it for you.

On 26 Mar 2010, at 10:36, Andy Law (RI) wrote:

> 
> On 26 Mar 2010, at 10:28, Richard Holland wrote:
> 
>> That there be a bug.
> 
> Albeit one with a simple workaround while the SVN server is broken :o}

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From holland at eaglegenomics.com  Fri Mar 26 11:04:22 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 26 Mar 2010 11:04:22 +0000
Subject: [Biojava-l] sort fasta file
In-Reply-To: <1A0823B4-55A9-4C32-9A9C-8499EE26DEF5@roslin.ed.ac.uk>
References: <20100320201718.4420a9b9@wp01>
	<ccc8042e1003211356t1ea534b6gf6d020983977881e@mail.gmail.com>
	<20100325232337.3021200a@wp01>
	<ccc8042e1003250717s36657fdchef532c512323b784@mail.gmail.com>
	<20100326195741.4799c398@wp01>
	<7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk>
	<84422194-0966-487C-BB32-537C6BA256CF@eaglegenomics.com>
	<1A0823B4-55A9-4C32-9A9C-8499EE26DEF5@roslin.ed.ac.uk>
Message-ID: <E75CE797-3618-41AC-B6E7-BD53DBD0379A@eaglegenomics.com>

I can't see anything in the code that would cause that behaviour. :( Could you provide sample code and a supporting FASTA file that replicates the problem?

On 26 Mar 2010, at 10:36, Andy Law (RI) wrote:

> 
> On 26 Mar 2010, at 10:28, Richard Holland wrote:
> 
>> That there be a bug.
> 
> Albeit one with a simple workaround while the SVN server is broken :o}

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From Richard.Finkers at wur.nl  Fri Mar 26 16:27:59 2010
From: Richard.Finkers at wur.nl (Richard Finkers)
Date: Fri, 26 Mar 2010 17:27:59 +0100
Subject: [Biojava-l] SVN repository
In-Reply-To: <776506315DB04C3EBF2A7FDA610390AB@zillumina>
References: <mailman.3.1268755203.30058.biojava-l@lists.open-bio.org><4BA082EC.8010908@wur.nl><320fb6e01003170316x4fbf924do95dcc84703eaa28e@mail.gmail.com>
	<59a41c431003171039h4ca1267bibc45b0d7d270b2a9@mail.gmail.com>
	<776506315DB04C3EBF2A7FDA610390AB@zillumina>
Message-ID: <4BACE08F.8020604@wur.nl>


The repository has been back for two days. But it appears to be down again.

Richard

Bernd Jagla wrote:
> Hi,
>
> I am trying to check out biojava for the first time, and I am not sure if
> the server is still down... Could you please let me if it is up or down?
>
> Thanks,
>
> Bernd


-- 
Dr. Richard Finkers
Researcher Plant Breeding
Wageningen UR Plant Breeding
P.O. Box 16, 6700 AA, Wageningen, The Netherlands
Wageningen Campus, Building 107, Droevendaalsesteeg 1, 6708 PB
Wageningen, The Netherlands
Tel. +31-317-484165 Fax +31-317-418094
http://www.plantbreeding.wur.nl/
https://www.eu-sol.wur.nl/
https://cbsgdbase.wur.nl/ <https://cbsgdbase.wur.nl>
http://www.disclaimer-uk.wur.nl/


From mitlox at op.pl  Sat Mar 27 01:49:46 2010
From: mitlox at op.pl (xyz)
Date: Sat, 27 Mar 2010 11:49:46 +1000
Subject: [Biojava-l] Reading and writting Fastq files
Message-ID: <20100327114946.276925da@wp01>

Hello,
I could not find any examples of how to read or write FASTQ files.

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import org.biojava.bio.program.fastq.FastqReader;

public class Fastq2Fasta {
  public static void main(String[] args) throws FileNotFoundException  {
    BufferedReader br = new BufferedReader(new FileReader("fastq2fasta.fasta"));
  }
}

Are there any examples of how to work with FASTQ files?

Thank you in advance.

Best regards,


From holland at eaglegenomics.com  Sat Mar 27 08:18:04 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Sat, 27 Mar 2010 08:18:04 +0000
Subject: [Biojava-l] sort fasta file
In-Reply-To: <20100327100348.1f253bfb@wp01>
References: <20100320201718.4420a9b9@wp01>
	<ccc8042e1003211356t1ea534b6gf6d020983977881e@mail.gmail.com>
	<20100325232337.3021200a@wp01>
	<ccc8042e1003250717s36657fdchef532c512323b784@mail.gmail.com>
	<20100326195741.4799c398@wp01>
	<7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk>
	<84422194-0966-487C-BB32-537C6BA256CF@eaglegenomics.com>
	<1A0823B4-55A9-4C32-9A9C-8499EE26DEF5@roslin.ed.ac.uk>
	<E75CE797-3618-41AC-B6E7-BD53DBD0379A@eaglegenomics.com>
	<20100327100348.1f253bfb@wp01>
Message-ID: <2AC8333D-EE71-495E-9C12-98764D81FE2D@eaglegenomics.com>

Andy and I came to the conclusion yesterday that this is probably a bug with Java itself - somewhere in the readLine() method in BufferedReader. There's nothing in BioJava that could cause this kind of behaviour other than if it was being fed duff information by BufferedReader.
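
For what it is worth, BufferedReader.readLine() does return a final line that
is not followed by a newline, so a missing terminator on its own should not
make the last record disappear. A quick way to confirm this on any JVM,
independent of BioJava, is sketched below (the class and method names are
illustrative only):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Illustrative check: does readLine() return an unterminated last line?
class ReadLineCheck {

  // Reads every line that BufferedReader.readLine() yields for the given text.
  static List<String> readAllLines(String text) {
    List<String> lines = new ArrayList<>();
    try (BufferedReader br = new BufferedReader(new StringReader(text))) {
      String line;
      while ((line = br.readLine()) != null) {
        lines.add(line);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return lines;
  }

  public static void main(String[] args) {
    String withoutNewline = ">4\nttcc\n>5\nttca";    // file ends right after the sequence
    String withNewline = ">4\nttcc\n>5\nttca\n";     // file ends with a line feed

    System.out.println(readAllLines(withoutNewline));
    System.out.println(readAllLines(withNewline));
  }
}
```

If both calls print the same four lines, the reader is delivering the last
sequence, and the loss has to happen somewhere after the lines are read.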

On 27 Mar 2010, at 00:03, xyz wrote:

> Please find the input fasta file attached. This file I created under
> Linux and I also work with BioJava under Linux. Nothing change if I
> created after the last sequence a new line.
> 
> On Fri, 26 Mar 2010 11:04:22 +0000
> Richard Holland wrote:
> 
>> I can't see anything in the code that would cause that
>> behaviour. :( Could you provide sample code and a supporting FASTA
>> file that replicates the problem?
>> 
> 
> <sortFasta.fasta>

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From mitlox at op.pl  Sat Mar 27 09:48:14 2010
From: mitlox at op.pl (xyz)
Date: Sat, 27 Mar 2010 19:48:14 +1000
Subject: [Biojava-l] sort fasta file
In-Reply-To: <E75CE797-3618-41AC-B6E7-BD53DBD0379A@eaglegenomics.com>
References: <20100320201718.4420a9b9@wp01>
	<ccc8042e1003211356t1ea534b6gf6d020983977881e@mail.gmail.com>
	<20100325232337.3021200a@wp01>
	<ccc8042e1003250717s36657fdchef532c512323b784@mail.gmail.com>
	<20100326195741.4799c398@wp01>
	<7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk>
	<84422194-0966-487C-BB32-537C6BA256CF@eaglegenomics.com>
	<1A0823B4-55A9-4C32-9A9C-8499EE26DEF5@roslin.ed.ac.uk>
	<E75CE797-3618-41AC-B6E7-BD53DBD0379A@eaglegenomics.com>
Message-ID: <20100327194814.1acc8655@wp01>

You can find the input FASTA file here:
http://mitlox.republika.pl/sortFasta.fasta . I created this file under
Linux, and I also run BioJava under Linux. Nothing changes if I add a
newline after the last sequence.

On Fri, 26 Mar 2010 11:04:22 +0000
Richard Holland wrote:

> I can't see anything in the code that would cause that
> behaviour. :( Could you provide sample code and a supporting FASTA
> file that replicates the problem?
> 


From voisingreg at yahoo.fr  Sat Mar 27 11:24:01 2010
From: voisingreg at yahoo.fr (gregory voisin)
Date: Sat, 27 Mar 2010 11:24:01 +0000 (GMT)
Subject: [Biojava-l] Unsubcribe?
In-Reply-To: <20100327194814.1acc8655@wp01>
References: <20100320201718.4420a9b9@wp01>
	<ccc8042e1003211356t1ea534b6gf6d020983977881e@mail.gmail.com>
	<20100325232337.3021200a@wp01>
	<ccc8042e1003250717s36657fdchef532c512323b784@mail.gmail.com>
	<20100326195741.4799c398@wp01>
	<7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk>
	<84422194-0966-487C-BB32-537C6BA256CF@eaglegenomics.com>
	<1A0823B4-55A9-4C32-9A9C-8499EE26DEF5@roslin.ed.ac.uk>
	<E75CE797-3618-41AC-B6E7-BD53DBD0379A@eaglegenomics.com>
	<20100327194814.1acc8655@wp01>
Message-ID: <832231.74869.qm@web23207.mail.ird.yahoo.com>

Hi, 
How do I unsubscribe from this list?
thanks
greg


From mitlox at op.pl  Sat Mar 27 13:54:40 2010
From: mitlox at op.pl (xyz)
Date: Sat, 27 Mar 2010 23:54:40 +1000
Subject: [Biojava-l] Reading and writting Fastq files
In-Reply-To: <326ea8621003270510v49c00250xc5ad80bece2ae8cb@mail.gmail.com>
References: <20100327114946.276925da@wp01>
	<326ea8621003270510v49c00250xc5ad80bece2ae8cb@mail.gmail.com>
Message-ID: <20100327235440.23cffb47@wp01>

Hello,
I would like to use org.biojava.bio.program.fastq to read and
write Illumina FASTQ files.

Are there any BioJava examples of how to work with FASTQ files?

On Sat, 27 Mar 2010 17:40:21 +0530
jitesh dundas wrote:

> Hello,
> 
> Fasta files are  normal text files. Try parsing using normal text
> parsing methods.
> 
> If you could be more specific & tell me the format details,then I
> could help better.
> 
> btw,try using biojava ,the easy & better option if you want.
> 
> Regards,
> Jitesh Dundas


From heuermh at acm.org  Sun Mar 28 04:27:16 2010
From: heuermh at acm.org (Michael Heuer)
Date: Sun, 28 Mar 2010 00:27:16 -0400 (EDT)
Subject: [Biojava-l] Reading and writting Fastq files
In-Reply-To: <20100327235440.23cffb47@wp01>
Message-ID: <Pine.GSO.4.44.1003280014470.28125-100000@shell3.shore.net>


Sorry, I haven't written up an example for the Biojava Cookbook yet.

The FASTQ package javadoc API is at

http://www.biojava.org/docs/api/org/biojava/bio/program/fastq/package-summary.html

If you want to read Illumina format FASTQ files, use

FastqReader reader = new IlluminaFastqReader();
for (Fastq fastq : reader.read(new File("in.fastq")))
{
  // ...
}
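
For readers who want to see the record layout that the loop above iterates
over, here is a minimal plain-Java sketch that does not use BioJava. The class
SimpleFastq, its FastqEntry type and the forEachEntry method are made up for
illustration, and the sketch assumes strict four-line records (no wrapped
sequence or quality lines), which the real readers may not require.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.function.Consumer;

// Hypothetical helper for illustration only; not part of BioJava.
final class SimpleFastq {

  /** One four-line FASTQ record: description (without '@'), sequence, quality. */
  static final class FastqEntry {
    final String description;
    final String sequence;
    final String quality;

    FastqEntry(String description, String sequence, String quality) {
      this.description = description;
      this.sequence = sequence;
      this.quality = quality;
    }
  }

  /** Passes entries to the callback one at a time and returns how many were read. */
  static int forEachEntry(Reader in, Consumer<FastqEntry> action) throws IOException {
    BufferedReader br = new BufferedReader(in);
    int count = 0;
    String header;
    while ((header = br.readLine()) != null) {
      if (header.isEmpty()) {
        continue; // tolerate blank lines between records
      }
      String sequence = br.readLine();
      String plus = br.readLine();
      String quality = br.readLine();
      boolean wellFormed = header.startsWith("@")
          && sequence != null
          && plus != null && plus.startsWith("+")
          && quality != null
          && sequence.length() == quality.length();
      if (!wellFormed) {
        throw new IOException("malformed FASTQ record near: " + header);
      }
      action.accept(new FastqEntry(header.substring(1), sequence, quality));
      count++;
    }
    return count;
  }
}
```

Handing each entry to a callback instead of collecting a list keeps memory use
flat, which matters for multi-gigabyte read files.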

   michael


On Sat, 27 Mar 2010, xyz wrote:

> Hello,
> I would like to use org.biojava.bio.program.fastq in order to read and
> write Illumina fastq files.
>
> Are there any BioJava examples how to work with fastq files?
>
> On Sat, 27 Mar 2010 17:40:21 +0530
> jitesh dundas wrote:
>
> > Hello,
> >
> > Fasta files are  normal text files. Try parsing using normal text
> > parsing methods.
> >
> > If you could be more specific & tell me the format details,then I
> > could help better.
> >
> > btw,try using biojava ,the easy & better option if you want.
> >
> > Regards,
> > Jitesh Dundas
> >
> > On 3/27/10, xyz <mitlox at op.pl> wrote:
> > > Hello,
> > > I could not find any examples how to read or write fastq files.
> > >
> > > import java.io.BufferedReader;
> > > import java.io.FileNotFoundException;
> > > import java.io.FileReader;
> > > import org.biojava.bio.program.fastq.FastqReader;
> > >
> > > public class Fastq2Fasta {
> > >   public static void main(String[] args) throws
> > > FileNotFoundException  { BufferedReader br = new BufferedReader(new
> > > FileReader("fastq2fasta.fasta"));
> > >   }
> > > }
> > >
> > > Are there any examples how to work with fastq files?
> > >
> > > Thank you in advance.
> > >
> > > Best regards,
> > > _______________________________________________
> > > Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/biojava-l
> > >
>
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>


From mitlox at op.pl  Sun Mar 28 05:44:57 2010
From: mitlox at op.pl (xyz)
Date: Sun, 28 Mar 2010 15:44:57 +1000
Subject: [Biojava-l] Reading and writting Fastq files
In-Reply-To: <326ea8621003270743j2b4f9d24ib3899d415edf3fc3@mail.gmail.com>
References: <20100327114946.276925da@wp01>
	<326ea8621003270510v49c00250xc5ad80bece2ae8cb@mail.gmail.com>
	<20100327235440.23cffb47@wp01>
	<326ea8621003270743j2b4f9d24ib3899d415edf3fc3@mail.gmail.com>
Message-ID: <20100328154457.46e088a6@wp01>

Hello,
I could write my own methods to read and write fastq files.
However, I downloaded the BioJava source code, and the folder
src/org/biojava/bio/program contains the following files:

* AbstractFastqReader.java
* AbstractFastqWriter.java
* Fastq.java
* FastqBuilder.java
* FastqReader.java
* FastqVariant.java
* FastqWriter.java
* IlluminaFastqReader.java
* IlluminaFastqWriter.java
* SangerFastqReader.java
* SangerFastqWriter.java
* SolexaFastqReader.java
* SolexaFastqWriter.java

These look like exactly what I need, but unfortunately I do
not know how to use them.

On Sat, 27 Mar 2010 20:13:02 +0530
jitesh dundas wrote:

> Hello,
> 
> I could not find much info on that Q.Try the Biojava API for methods.
> 
> However, I would think of this problem as a simple text file parsing
> using BufferedReader and ByteInputStream based I/p ..You have to read
> the text file content byte by byte using a while loop. The loop will
> detect each column using the patterns (i haven't worked on fastq or
> biojava that much) in the text file, e.g. space tabs..
> Why don't you try reading this fastq file as a simple text file in
> java.
> 
> This is assuming that fastq are text files..Correct me if I am wrong..
> Java tutorial & forums have bulk of egs on that.
> 
> Try writing the code and send the fastq file with the java code if you
> face issues..
> 
> Hope this helps..
> 
> Regards,
> jd


From mitlox at op.pl  Sun Mar 28 07:20:40 2010
From: mitlox at op.pl (xyz)
Date: Sun, 28 Mar 2010 17:20:40 +1000
Subject: [Biojava-l] Reading and writting Fastq files
In-Reply-To: <Pine.GSO.4.44.1003280014470.28125-100000@shell3.shore.net>
References: <20100327235440.23cffb47@wp01>
	<Pine.GSO.4.44.1003280014470.28125-100000@shell3.shore.net>
Message-ID: <20100328172040.478de1a1@wp01>

Do not worry. I wrote the following code:

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import org.biojava.bio.program.fastq.Fastq;
import org.biojava.bio.program.fastq.FastqBuilder;
import org.biojava.bio.program.fastq.FastqReader;
import org.biojava.bio.program.fastq.FastqWriter;
import org.biojava.bio.program.fastq.IlluminaFastqReader;
import org.biojava.bio.program.fastq.IlluminaFastqWriter;

public class Fastq2Fasta {

  public static void main(String[] args)
      throws FileNotFoundException, IOException {
    FileInputStream inputFastq = new FileInputStream("fastq2fasta.fastq");
    FastqReader qReader = new IlluminaFastqReader();

    FileOutputStream outputFastq =
        new FileOutputStream("fastq2fastaTrim.fastq");
    FastqWriter qWriter = new IlluminaFastqWriter();

    for (Fastq fastq : qReader.read(inputFastq)) {
      System.out.println(fastq.getDescription());
      System.out.println(fastq.getSequence());
      String trimSeq = fastq.getSequence().substring(0,
          fastq.getSequence().length() - 6);
      System.out.println(trimSeq);
      System.out.println(fastq.getQuality());
      String trimQual = fastq.getQuality().substring(0,
          fastq.getQuality().length() - 6);
      System.out.println(trimQual);

      FastqBuilder trimFastq = new FastqBuilder();
      trimFastq.withDescription(fastq.getDescription());
      trimFastq.appendSequence(trimSeq);
      trimFastq.appendQuality(trimQual);

      qWriter.write(outputFastq, trimFastq.build());
    }
  }
}

and the input fastq file is:
@HWI-EAS406:5:1:0:1390#0/1
GGGTGATGGCCGCTGCCGATGGCGTCAAAACCCACC
+HWI-EAS406:5:1:0:1390#0/1
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOAAAAAA
@HWI-EAS406:5:1:0:1390#0/1
GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
+HWI-EAS406:5:1:0:1390#0/1
PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPBBBBBB
@HWI-EAS406:5:1:0:1390#0/1
GGGTGATGGCCGCTGCCGATGGCGTCAAACCCCACC
+HWI-EAS406:5:1:0:1390#0/1
QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQCCCCCC

Unfortunately, I get the following error:
HWI-EAS406:5:1:0:1390#0/1
GGGTGATGGCCGCTGCCGATGGCGTCAAAACCCACC
GGGTGATGGCCGCTGCCGATGGCGTCAAAA
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOAAAAAA
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
Exception in thread "main" java.io.IOException: sequence HWI-EAS406:5:1:0:1390#0/1 not fastq-illumina format, was fastq-sanger
	at org.biojava.bio.program.fastq.IlluminaFastqWriter.validate(IlluminaFastqWriter.java:41)
	at org.biojava.bio.program.fastq.AbstractFastqWriter.append(AbstractFastqWriter.java:67)
	at org.biojava.bio.program.fastq.AbstractFastqWriter.write(AbstractFastqWriter.java:143)
	at org.biojava.bio.program.fastq.AbstractFastqWriter.write(AbstractFastqWriter.java:125)
	at Fastq2Fasta.main(Fastq2Fasta.java:37)
Java Result: 1

What did I do wrong?

On Sun, 28 Mar 2010 00:27:16 -0400 (EDT)
Michael Heuer wrote:

> 
> Sorry, I haven't written up an example for the Biojava Cookbook yet.
> 
> The FASTQ package javadoc API is at
> 
> http://www.biojava.org/docs/api/org/biojava/bio/program/fastq/package-summary.html
> 
> If you want to read Illumina format FASTQ files, use
> 
> FastqReader reader = new IlluminaFastqReader();
> for (Fastq fastq : reader.read(new File("in.fastq")))
> {
>   // ...
> }
> 
>    michael


From andreas at sdsc.edu  Sun Mar 28 17:44:32 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 28 Mar 2010 10:44:32 -0700
Subject: [Biojava-l] Unsubcribe?
In-Reply-To: <832231.74869.qm@web23207.mail.ird.yahoo.com>
References: <20100320201718.4420a9b9@wp01> <20100325232337.3021200a@wp01>
	<ccc8042e1003250717s36657fdchef532c512323b784@mail.gmail.com>
	<20100326195741.4799c398@wp01>
	<7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk>
	<84422194-0966-487C-BB32-537C6BA256CF@eaglegenomics.com>
	<1A0823B4-55A9-4C32-9A9C-8499EE26DEF5@roslin.ed.ac.uk>
	<E75CE797-3618-41AC-B6E7-BD53DBD0379A@eaglegenomics.com>
	<20100327194814.1acc8655@wp01>
	<832231.74869.qm@web23207.mail.ird.yahoo.com>
Message-ID: <59a41c431003281044y36137b05nd993e8e51ef7484e@mail.gmail.com>

We use Mailman for our mailing lists; you can unsubscribe here:

http://www.biojava.org/mailman/listinfo/biojava-l

Andreas

On Sat, Mar 27, 2010 at 4:24 AM, gregory voisin <voisingreg at yahoo.fr> wrote:

> Hi,
> How do I unsubscribe from this list?
> thanks
> greg
>
> ________________________________
> From: xyz <mitlox at op.pl>
> To: Richard Holland <holland at eaglegenomics.com>
> Cc: Andy Law (RI) <andy.law at roslin.ed.ac.uk>; "biojava-l at lists.open-bio.org" <biojava-l at lists.open-bio.org>
> Sent: Sat 27 March 2010, 6:48:14
> Subject: Re: [Biojava-l] sort fasta file
>
> You can find the input fasta file here
> http://mitlox.republika.pl/sortFasta.fasta . This file I created under
> Linux and I also work with BioJava under Linux. Nothing change if I
> created after the last sequence a new line.
>
> On Fri, 26 Mar 2010 11:04:22 +0000
> Richard Holland wrote:
>
> > I can't see anything in the code that would cause that
> > behaviour. :( Could you provide sample code and a supporting FASTA
> > file that replicates the problem?
> >
>
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>
>
>
>
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>


From heuermh at acm.org  Tue Mar 30 02:01:23 2010
From: heuermh at acm.org (Michael Heuer)
Date: Mon, 29 Mar 2010 22:01:23 -0400 (EDT)
Subject: [Biojava-l] Reading and writting Fastq files
In-Reply-To: <20100328172040.478de1a1@wp01>
Message-ID: <Pine.GSO.4.44.1003292153001.17205-100000@shell3.shore.net>


FastqBuilder defaults to the Sanger variant, see

http://www.biojava.org/docs/api/org/biojava/bio/program/fastq/FastqBuilder.html#DEFAULT_VARIANT


In your code, you just need to specify the Illumina variant

FastqBuilder trimFastq = new FastqBuilder()
  .withVariant(FastqVariant.FASTQ_ILLUMINA)
  .withDescription(fastq.getDescription())
  .appendSequence(trimSeq)
  .appendQuality(trimQual);
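
A side note on trimSeq and trimQual: the original code computes them with
substring(0, length() - 6), which throws StringIndexOutOfBoundsException for
reads shorter than six characters. A small guarded helper, sketched in plain
Java (ReadTrimmer and trimEnd are illustrative names, not BioJava API):

```java
// Hypothetical helper for illustration only; not part of BioJava.
final class ReadTrimmer {

  /** Returns s without its last n characters, or "" when s has n or fewer characters. */
  static String trimEnd(String s, int n) {
    if (n < 0) {
      throw new IllegalArgumentException("n must be >= 0, was " + n);
    }
    // Clamp at zero so short reads yield an empty string instead of an exception.
    return s.substring(0, Math.max(0, s.length() - n));
  }
}
```

Applying the same call to both the sequence and the quality string keeps the
two the same length. What to do with a read that trims to nothing is left to
the caller.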


Please let me know if you have any API or doc suggestions, as this stuff
has not been used much by anyone other than myself.

   michael


On Sun, 28 Mar 2010, xyz wrote:

> Do not worry. I wrote following code:
>
> import java.io.FileInputStream;
> import java.io.FileNotFoundException;
> import java.io.FileOutputStream;
> import java.io.IOException;
> import org.biojava.bio.program.fastq.Fastq;
> import org.biojava.bio.program.fastq.FastqBuilder;
> import org.biojava.bio.program.fastq.FastqReader;
> import org.biojava.bio.program.fastq.FastqWriter;
> import org.biojava.bio.program.fastq.IlluminaFastqReader;
> import org.biojava.bio.program.fastq.IlluminaFastqWriter;
>
> public class Fastq2Fasta {
>
>   public static void main(String[] args) throws FileNotFoundException,
>   IOException {
>     FileInputStream inputFastq = new
>   FileInputStream("fastq2fasta.fastq"); FastqReader qReader = new
>   IlluminaFastqReader();
>
>     FileOutputStream outputFastq = new
>     FileOutputStream("fastq2fastaTrim.fastq"); FastqWriter qWriter =
>     new IlluminaFastqWriter();
>
>
>     for (Fastq fastq : qReader.read(inputFastq)) {
>       System.out.println(fastq.getDescription());
>       System.out.println(fastq.getSequence());
>       String trimSeq = fastq.getSequence().substring(0,
>     fastq.getSequence().length() - 6); System.out.println(trimSeq);
>       System.out.println(fastq.getQuality());
>       String trimQual = fastq.getQuality().substring(0,
>     fastq.getQuality().length() - 6); System.out.println(trimQual);
>
>       FastqBuilder trimFastq = new FastqBuilder();
>       trimFastq.withDescription(fastq.getDescription());
>       trimFastq.appendSequence(trimSeq);
>       trimFastq.appendQuality(trimQual);
>
>       qWriter.write(outputFastq, trimFastq.build());
>     }
>   }
> }
>
> and the input fastq file is:
> @HWI-EAS406:5:1:0:1390#0/1
> GGGTGATGGCCGCTGCCGATGGCGTCAAAACCCACC
> +HWI-EAS406:5:1:0:1390#0/1
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOAAAAAA
> @HWI-EAS406:5:1:0:1390#0/1
> GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
> +HWI-EAS406:5:1:0:1390#0/1
> PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPBBBBBB
> @HWI-EAS406:5:1:0:1390#0/1
> GGGTGATGGCCGCTGCCGATGGCGTCAAACCCCACC
> +HWI-EAS406:5:1:0:1390#0/1
> QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQCCCCCC
>
> Unfortunately, I get the following error:
> HWI-EAS406:5:1:0:1390#0/1
> GGGTGATGGCCGCTGCCGATGGCGTCAAAACCCACC
> GGGTGATGGCCGCTGCCGATGGCGTCAAAA
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOAAAAAA
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
> Exception in thread "main" java.io.IOException: sequence
> HWI-EAS406:5:1:0:1390#0/1 not fastq-illumina format, was fastq-sanger
> at
> org.biojava.bio.program.fastq.IlluminaFastqWriter.validate(IlluminaFastqWriter.java:41)
> at
> org.biojava.bio.program.fastq.AbstractFastqWriter.append(AbstractFastqWriter.java:67)
> at
> org.biojava.bio.program.fastq.AbstractFastqWriter.write(AbstractFastqWriter.java:143)
> at
> org.biojava.bio.program.fastq.AbstractFastqWriter.write(AbstractFastqWriter.java:125)
> at Fastq2Fasta.main(Fastq2Fasta.java:37) Java Result: 1
>
> What did I wrong?
>
> On Sun, 28 Mar 2010 00:27:16 -0400 (EDT)
> Michael Heuer wrote:
>
> >
> > Sorry, I haven't written up an example for the Biojava Cookbook yet.
> >
> > The FASTQ package javadoc API is at
> >
> > http://www.biojava.org/docs/api/org/biojava/bio/program/fastq/package-summary.html
> >
> > If you want to read Illumina format FASTQ files, use
> >
> > FastqReader reader = new IlluminaFastqReader();
> > for (Fastq fastq : reader.read(new File("in.fastq")))
> > {
> >   // ...
> > }
> >
> >    michael
>
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>


From mitlox at op.pl  Tue Mar 30 11:50:47 2010
From: mitlox at op.pl (xyz)
Date: Tue, 30 Mar 2010 21:50:47 +1000
Subject: [Biojava-l] Reading and writting Fastq files
In-Reply-To: <Pine.GSO.4.44.1003292153001.17205-100000@shell3.shore.net>
References: <20100328172040.478de1a1@wp01>
	<Pine.GSO.4.44.1003292153001.17205-100000@shell3.shore.net>
Message-ID: <20100330215047.084f6b00@wp01>

Thank you, it works. However, after I extended the code with

RichSequence.IOTools.writeFasta(outputFasta, trimSeq, ns,
    fastq.getDescription());

in order to also get a trimmed FASTA file, I got the following compile error:

Fastq2Fasta.java:51: cannot find symbol
symbol  : method writeFasta(java.io.FileOutputStream,java.lang.String,org.biojavax.SimpleNamespace,java.lang.String)
location: class org.biojavax.bio.seq.RichSequence.IOTools
RichSequence.IOTools.writeFasta(outputFasta, trimSeq, ns,
fastq.getDescription());
1 error

Complete Code:
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import org.biojava.bio.program.fastq.Fastq;
import org.biojava.bio.program.fastq.FastqBuilder;
import org.biojava.bio.program.fastq.FastqReader;
import org.biojava.bio.program.fastq.FastqVariant;
import org.biojava.bio.program.fastq.FastqWriter;
import org.biojava.bio.program.fastq.IlluminaFastqReader;
import org.biojava.bio.program.fastq.IlluminaFastqWriter;
import org.biojavax.SimpleNamespace;
import org.biojavax.bio.seq.RichSequence;


public class Fastq2Fasta {

  public static void main(String[] args)
      throws FileNotFoundException, IOException {

    FileInputStream inputFastq = new FileInputStream("fastq2fasta.fastq");
    FastqReader qReader = new IlluminaFastqReader();

    FileOutputStream outputFastq =
        new FileOutputStream("fastq2fastaTrim.fastq");
    FastqWriter qWriter = new IlluminaFastqWriter();

    SimpleNamespace ns = new SimpleNamespace("biojava");

    FileOutputStream outputFasta =
        new FileOutputStream("fastq2fastaTrim.fasta");

    for (Fastq fastq : qReader.read(inputFastq)) {
      System.out.println(fastq.getDescription());
      System.out.println(fastq.getSequence());
      String trimSeq = fastq.getSequence().substring(0,
          fastq.getSequence().length() - 6);
      System.out.println(trimSeq);
      System.out.println(fastq.getQuality());
      String trimQual = fastq.getQuality().substring(0,
          fastq.getQuality().length() - 6);
      System.out.println(trimQual);

      FastqBuilder trimFastq = new FastqBuilder();
      trimFastq.withVariant(FastqVariant.FASTQ_ILLUMINA);
      trimFastq.withDescription(fastq.getDescription());
      trimFastq.appendSequence(trimSeq);
      trimFastq.appendQuality(trimQual);

      qWriter.write(outputFastq, trimFastq.build());

      RichSequence.IOTools.writeFasta(outputFasta, trimSeq, ns,
          fastq.getDescription());
    }
  }
}

What did I do wrong?
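
The compiler message above says that IOTools has no writeFasta method taking
a String sequence plus a String description. One way around it that avoids
BioJava entirely is to write the FASTA records by hand; a minimal sketch
follows. PlainFasta, its append method and the 60-column line width are
assumptions for illustration, not BioJava API.

```java
import java.io.IOException;

// Hypothetical helper for illustration only; not part of BioJava.
final class PlainFasta {

  /** Assumed line width for the sequence body; 60 is a common convention. */
  static final int LINE_WIDTH = 60;

  /** Appends one FASTA record: a '>' header line, then the sequence wrapped at LINE_WIDTH. */
  static void append(Appendable out, String description, String sequence) throws IOException {
    out.append('>').append(description).append('\n');
    for (int start = 0; start < sequence.length(); start += LINE_WIDTH) {
      int end = Math.min(start + LINE_WIDTH, sequence.length());
      out.append(sequence, start, end).append('\n');
    }
  }
}
```

Any Appendable works as the target, for example a BufferedWriter around a
FileWriter, so the same call can sit next to the fastq writer in the loop.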

Suggestions:
1)
After I trimmed the fastq file, the description is no longer repeated
on the "+" line of each record:

@HWI-EAS406:5:1:0:1390#0/1
GGGTGATGGCCGCTGCCGATGGCGTCAAAA
+
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO

This reduces the size of the files, but is the output still compatible
with SOAP and TopHat?
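
For reference, the FASTQ format allows the "+" line either to repeat the "@"
description or to stay bare; both forms describe the same record. A plain-Java
sketch that formats a record either way (FastqText and format are illustrative
names, not BioJava API). Whether a particular tool such as SOAP or TopHat
accepts the short form still has to be checked against that tool.

```java
// Hypothetical helper for illustration only; not part of BioJava.
final class FastqText {

  /**
   * Formats one four-line FASTQ record. When repeatDescription is true the
   * "+" line repeats the description; otherwise it is a bare "+".
   */
  static String format(String description, String sequence, String quality,
                       boolean repeatDescription) {
    if (sequence.length() != quality.length()) {
      throw new IllegalArgumentException("sequence and quality lengths differ");
    }
    String plusLine = repeatDescription ? "+" + description : "+";
    return "@" + description + "\n"
        + sequence + "\n"
        + plusLine + "\n"
        + quality + "\n";
  }
}
```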

2)
I was using fastq files of up to 6 GB, and I have not run any benchmarks
with different buffer/stream combinations on large text files, so I am
not sure whether plain FileInputStream and FileOutputStream are
sufficient. BioJavaX uses BufferedReader br = new
BufferedReader(new FileReader()); is there any speed difference?
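
On the buffering question, here is a minimal plain-Java sketch (no BioJava) of
wrapping the file streams in buffered ones. Whether this changes the speed
depends on whether the fastq reader and writer already buffer internally,
which is not established in this thread, so it needs a benchmark on real data.
BufferedCopy and the 64 KiB buffer size are illustrative choices, and the copy
loop only stands in for the reader and writer.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper for illustration only; not part of BioJava.
final class BufferedCopy {

  /** Assumed buffer size; tune it by benchmarking on real files. */
  private static final int BUFFER_SIZE = 64 * 1024;

  /** Copies source to target through buffered streams and returns the byte count. */
  static long copy(File source, File target) throws IOException {
    try (InputStream in =
             new BufferedInputStream(new FileInputStream(source), BUFFER_SIZE);
         OutputStream out =
             new BufferedOutputStream(new FileOutputStream(target), BUFFER_SIZE)) {
      long total = 0;
      int b;
      // Single-byte reads and writes hit the in-memory buffers, not the disk.
      while ((b = in.read()) != -1) {
        out.write(b);
        total++;
      }
      return total;
    } // try-with-resources flushes and closes both streams
  }
}
```

The buffered streams are still an InputStream and an OutputStream, so they can
be passed wherever the code above passes the raw file streams.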

Overall I think the API looks good, and for the docs you could use this
code and put it on BioJava.


On Mon, 29 Mar 2010 22:01:23 -0400 (EDT)
Michael Heuer wrote:

> 
> FastqBuilder defaults to the Sanger variant, see
> 
> http://www.biojava.org/docs/api/org/biojava/bio/program/fastq/FastqBuilder.html#DEFAULT_VARIANT
> 
> 
> In your code, you just need to specify the Illumina variant
> 
> FastqBuilder trimFastq = new FastqBuilder()
>   .withVariant(FastqVariant.FASTQ_ILLUMINA)
>   .withDescription(fastq.getDescription())
>   .appendSequence(trimSeq)
>   .appendQuality(trimQual);
> 
> 
> Please let me know if you have any API or doc suggestions, as this
> stuff has not been used much by anyone other than myself.
> 
>    michael
> 
> 
> 


From rmb32 at cornell.edu  Fri Mar 26 07:44:09 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 26 Mar 2010 00:44:09 -0700
Subject: [Biojava-l] GSoC mentors mailing list
Message-ID: <4BAC65C9.307@cornell.edu>

Hi all,

If you have volunteered to be a possible GSoC mentor, and have not 
already been subscribed to the (mentors-only) gsoc-mentors mailing list, 
send me an email and I'll subscribe you.

Rob Buels
OBF GSoC 2010 Admin


From rmb32 at cornell.edu  Fri Mar 26 16:30:30 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 26 Mar 2010 09:30:30 -0700
Subject: [Biojava-l] Announcing OBF Summer of Code - please forward!
Message-ID: <4BACE126.1030500@cornell.edu>

Hi all,

Here's an advertising-ready announcement for OBF's Summer of Code, 
thanks to Christian Zmasek and Hilmar Lapp for their excellent writing.

Student applications are due April 9!  Please spread it widely, we need 
to reach lots of students with it!

Rob Buels
OBF GSoC 2010 Admin


============================================================

*** Please disseminate widely at your local institutions ***
*** including posting to message and job boards, so that ***
*** we reach as many students as possible.               ***

============================================================


OPEN BIOINFORMATICS FOUNDATION SUMMER OF CODE 2010

Applications due 19:00 UTC, April 9, 2010.
http://www.open-bio.org/wiki/Google_Summer_of_Code

The Open Bioinformatics Foundation Summer of Code program provides a 
unique opportunity for undergraduate, masters, and PhD students to 
obtain hands-on experience writing and extending open-source software 
for bioinformatics under the mentorship of experienced developers from 
around the world. The program is the participation of the Open 
Bioinformatics Foundation (OBF) as a mentoring organization in the 
Google Summer of Code(tm) (http://code.google.com/soc/).

Students successfully completing the 3 month program receive a $5,000 
USD stipend, and may work entirely from their home or home institution. 
  Participation is open to students from any country in the world except 
countries subject to US trade restrictions.  Each student will have at 
least one dedicated mentor to show them the ropes and help them complete 
their project.

The Open Bioinformatics Foundation is particularly seeking students 
interested in both bioinformatics (computational biology) and software 
development. Some initial project ideas are listed on the website. These 
range from Galaxy phylogenetics pipeline development in Biopython to 
lightweight sequence objects and lazy parsing in BioPerl, a DAS Server 
for large files on local filesystems, and mapping Java libraries to 
Perl/Ruby/Python using Biolib+SWIG+JNI.  All project ideas are flexible 
and many can be adjusted in scope to match the skills of the student. We 
also welcome and encourage students proposing their own project ideas; 
historically some of the most successful Summer of Code projects are 
ones proposed by the students themselves.

TO APPLY: Apply online at the Google Summer of Code website 
(http://socghop.appspot.com/), where you will also find GSoC program 
rules and eligibility requirements. The 12-day application period for 
students runs from Monday, March 29 through Friday, April 9th, 2010.

INQUIRIES:

We strongly encourage all interested students to get in touch with us 
with their ideas as early on as possible.  See the OBF GSoC page for 
contact details.

2010 OBF Summer of Code:
http://www.open-bio.org/wiki/Google_Summer_of_Code

Google Summer of Code FAQ:
http://socghop.appspot.com/document/show/program/google/gsoc2010/faqs


From sheoran143 at gmail.com  Thu Mar 25 01:19:29 2010
From: sheoran143 at gmail.com (Deepak Sheoran)
Date: Wed, 24 Mar 2010 20:19:29 -0500
Subject: [Biojava-l] Bug fix for Biojava in regard to email with subject :(
 Hibernate Exception and suggestion for change in BioSqlSchema)
Message-ID: <4BAABA21.4000301@gmail.com>

I am writing this email again because I did not get any response on whether
these bugs have been patched or whether my message was lost somewhere on the
mailing list. I do not know how to apply these patches myself, so I am
counting on you to apply them and reply so that I know the issue is fixed.


Thanks
Deepak Sheoran


Hi,
In response to the bug fix suggested by Richard, I have created some patches.
We need to apply them to stop BioJava from processing the references of a
GenBank record incorrectly, which causes more Hibernate exceptions.
With the patches applied, the reference resolution code first tests the PubMed
or MEDLINE ID; if there is no match, it tests author/title/location; and if
there is still no match, it creates a new reference. I tested it with
GenBank release 175 and gained almost 3159 more records in my database.
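
The three-step resolution described above can be sketched with in-memory
maps. This is only a model of the decision order, not the patch itself:
ReferenceResolver, Ref and resolve are hypothetical names, and the real code
queries the BioSQL database through Hibernate.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative in-memory model; the real lookup goes through Hibernate and BioSQL.
final class ReferenceResolver {

  static final class Ref {
    final String pubmedId; // may be null when the record carries no ID
    final String authors;
    final String title;
    final String location;

    Ref(String pubmedId, String authors, String title, String location) {
      this.pubmedId = pubmedId;
      this.authors = authors;
      this.title = title;
      this.location = location;
    }
  }

  private final Map<String, Ref> byPubmedId = new HashMap<>();
  private final Map<List<String>, Ref> byCitation = new HashMap<>();

  Ref resolve(String pubmedId, String authors, String title, String location) {
    // Step 1: an existing PubMed/MEDLINE ID wins, even if the citation text differs.
    if (pubmedId != null) {
      Ref byId = byPubmedId.get(pubmedId);
      if (byId != null) {
        return byId;
      }
    }
    // Step 2: fall back to an exact match on authors/title/location.
    List<String> key = Arrays.asList(authors, title, location);
    Ref byText = byCitation.get(key);
    if (byText != null) {
      return byText;
    }
    // Step 3: nothing matched, so create and register a new reference.
    Ref created = new Ref(pubmedId, authors, title, location);
    if (pubmedId != null) {
      byPubmedId.put(pubmedId, created);
    }
    byCitation.put(key, created);
    return created;
  }
}
```

With this order, two citations that share a PubMed ID but differ slightly in
their location string resolve to the same reference instead of producing a
second row for the same ID.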

Can somebody please take a look at the second issue and fix it:
"

2. I think that's a bug (compound locations with null features) but not sure why. Could be that the process of constructing a CompoundRichLocation is somehow losing the feature reference from the original SimpleRichLocation. Again I can't investigate until March - can someone else take a look at the code? (A good starting point would be to look at how a CompoundRichLocation decides to select the feature from the SimpleRichLocations it is made up from).

"

I am also planning to build a bridge between BioSQL databases loaded
with BioPerl and with BioJava. Here is some of my investigation; can you
suggest a direction?
Please have a look at the attached files:
1) Biojava_BioPerl_Diff.xls  ==> a view of the tables where a GenBank
record is stored in a BioSQL instance by BioPerl and by BioJava
2) GenbankRecord.doc  ==> a Word document with a GenBank record showing
where its information goes in BioSQL using BioPerl and BioJava
3) BioSqlRichobjectBuilder.patch ==> patch for the
BioSqlRichObjectBuilder.java class
4) GenBankFormat.patch ==> patch for the GenbankFormat.java class


Thanks
Deepak Sheoran


-------- Original Message --------
Subject: 	Re: Hibernate Exception and suggestion for change in BioSqlSchema
Date: 	Tue, 9 Feb 2010 20:34:32 +1300
From: 	Richard Holland <holland at eaglegenomics.com>
To: 	Deepak Sheoran <sheoran143 at gmail.com>
CC: 	biojava-l at biojava.org


Hi. It's possible that your original email didn't make it to the list because it is HTML format, and the list only accepts plain text.

However, in answer to your two questions:

   1. The code that does the resolution of references might be better if it looks up existing IDs rather than using author, title, location to identify existing records. I would suggest modifying it to a three-step process - test ID, then if no match then test author/title/location, then if still no match create a new reference. Could someone do that? (I'm unable to do anything until late March).

   2. I think that's a bug (compound locations with null features) but not sure why. Could be that the process of constructing a CompoundRichLocation is somehow losing the feature reference from the original SimpleRichLocation. Again I can't investigate until March - can someone else take a look at the code? (A good starting point would be to look at how a CompoundRichLocation decides to select the feature from the SimpleRichLocations it is made up from).

cheers,
Richard

On 9 Feb 2010, at 20:21, Deepak Sheoran wrote:

>
>  Hi Richard,
>
>  Below is the email I sent to the Biojava-l mailing list. It was never posted on the mailing list server and I did not get any response, so please have a look at it and tell me what the solution to the problem described in the message could be.
>
>
>  Thanks
>  Deepak Sheoran
>  -------- Original Message --------
>  Subject:	Hibernate Exception and suggestion for change in BioSqlSchema
>  Date:	Wed, 03 Feb 2010 08:07:35 -0600
>  From:	Deepak Sheoran <sheoran143 at gmail.com>
>  To:	biojava-l at lists.open-bio.org
>
>  Hi guys,
>
>  A couple of days ago I had a problem with a Hibernate exception. That exception was resolved; the earlier email is at: http://old.nabble.com/Hibernate-Exception-when-persisting-some-richsequence-object-to-biosql-schema-to27299245.html
>  Following Richard's suggestion in the link above I was able to resolve some of the issues, but then I got stuck on another Hibernate error. I decided to investigate, and below are the facts I found; I think they are going to affect all of us.
>  	* The "reference" table in the BioSQL schema has a unique constraint on the "dbxref_id" column (CONSTRAINT reference_dbxref_id_key UNIQUE (dbxref_id)), which means that only one entry in the reference table can use a given dbxref_id.
>  This works well, except when there are small variations in the values of the "location", "title" and "authors" columns and all of these variations refer to the same PUBMED_ID. In that case we cannot persist or create a RichSequence object.
>  When you tie RichObjectFactory to an active Hibernate session, the class "BioSqlRichObjectBuilder" has a method "buildObject(Class clazz, List paramsList)" that is responsible for looking up the details of an object in the database. If it finds one, it returns that object; otherwise it tries to persist the new object into the database.
>  The problem is in the following part of that method:
>  ... LineNumber: 114
>  else if (SimpleDocRef.class.isAssignableFrom(clazz)) {
>      queryType = "DocRef";
>      // convert List constructor to String representation for query
>      ourParamsList.set(0, DocRefAuthor.Tools.generateAuthorString((List)ourParamsList.get(0), true));
>      if (ourParamsList.size()<3) {
>          queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title is null";
>      } else {
>          queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title = ?";
>      }
>  }
>  ... LineNumber: 123
>  When Hibernate searches the database, it does not find the existing record in the "reference" table because the two records differ in a string comparison, so it returns a new object to the following piece of code in "GenbankFormat":
>  ... LineNumber: 447
>  else {
>      try {
>          CrossRef cr = (CrossRef)RichObjectFactory.getObject(SimpleCrossRef.class,new Object[]{dbname, raccession, new Integer(0)});
>          RankedCrossRef rcr = new SimpleRankedCrossRef(cr, ++rcrossrefCount);
>          rlistener.getCurrentFeature().addRankedCrossRef(rcr);
>      } catch (ChangeVetoException e) {
>          throw new ParseException(e+", accession:"+accession);
>      }
>  }
>  ... LineNumber: 455
>  Then we add that object to rlistener and move on to the next part of the GenBank record. BioJava then searches for a new crossref in the database, and when it tries to persist the old one it gets a Hibernate exception about a violation of the unique constraint on the "dbxref_id" column.
>
>  The only ways to get these records into the database are:
>  		* The easy solution, and the one I used to test my theory, is to change the BioSQL schema so that it allows a many-to-one relation between the "reference" and "dbxref" tables. This even makes sense, because one paper can be cited with many different variations, and this change lets us store that information too. But this is something the BioSQL people have to decide, and I do not know how to approach them.
>  		* The second solution is slightly more difficult to implement: change the way "BioSqlRichObjectBuilder.buildObject(Class clazz, List paramsList)" decides whether a particular DocRef already exists in the database. I mean testing all possible string variations of the authors, location and title of the DocRef we are searching for. This has many complications and may slow down the creation of a RichSequence object when RichObjectFactory is linked to an active Hibernate session.
>
>  Example: Below is a sample of what I have in my local BioSQL schema, which has the modification I suggested. (The dbxref_id column shows the PubMed ID: for easy reference with the outside world in this email, I replaced the local dbxref_id present in this table in my database with the pubmed_id stored in the "dbxref" table.)
>  Reference_id: 216
>  Dbxref_id: 18554304
>  Location: FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008)
>  Title: Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model
>  Authors: Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.
>  crc: 9E940E01F4BE3CD0
>
>  Reference_id: 230
>  Dbxref_id: 18554304
>  Location: FEMS Microbiol. Ecol. 66 (3), 528-536 (2008)
>  Title: Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model
>  Authors: Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.
>  crc: D3BC0C17F3F786C9
>
>  Reference_id: 415
>  Dbxref_id: 16790744
>  Location: Infect. Immun. 74 (7), 3715-3726 (2006)
>  Title: Intrastrain Heterogeneity of the mgpB Gene in Mycoplasma genitalium Is Extensive In Vitro and In Vivo and Suggests that Variation Is Generated via Recombination with Repetitive Chromosomal Sequences
>  Authors: Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.
>  crc: 60AEDFA0CEEACC38
>
>  Reference_id: 969
>  Dbxref_id: 16790744
>  Location: Infect. Immun. 74 (7), 3715-3726 (2006)
>  Title: Intrastrain heterogeneity of the mgpB gene in mycoplasma genitalium is extensive in vitro and in vivo and suggests that variation is generated via recombination with repetitive chromosomal sequences
>  Authors: Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.
>  crc: 4B1232999F6E8130
>
>  Reference_id: 929
>  Dbxref_id: 8688087
>  Location: Science 273 (5278), 1058-1073 (1996)
>  Title: Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii
>  Authors: Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J.-F., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.L., Geoghagen,N.S.M., Weidman,J.F., Fuhrmann,J.L., Presley,E.A., Nguyen,D., Utterback,T.R., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.P., Borodovsky,M., Klenk,H.-P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C.
>  crc: 3E79B40DD2AAA2B7
>
>  Reference_id: 932
>  Dbxref_id: 8688087
>  Location: Science 273 (5278), 1058-1073 (1996)
>  Title: Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii
>  Authors: Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.D., Geoghagen,N.S., Weidman,J.F., Fuhrmann,J.L., Nguyen,D.T., Utterback,T., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.B., Borodovsky,M., Klenk,H.P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C.
>  crc: 094EB3384F8D6DE8
>
>  Reference_id: 1426
>  Dbxref_id: 10684935
>  Location: Nucleic Acids Res. 28 (6), 1397-1406 (2000)
>  Title: Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39
>  Authors: Read,T.D., Brunham,R.C., Shen,C., Gill,S.R., Heidelberg,J.F., White,O., Hickey,E.K., Peterson,J., Umayam,L.A., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S.L., Eisen,J. and Fraser,C.M.
>  crc: 357648D8FD8C6C8A
>
>  Reference_id: 1481
>  Dbxref_id: 10684935
>  Location: Nucleic Acids Res. 28 (6), 1397-1406 (2000)
>  Title: Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39
>  Authors: Read,T., Brunham,R., Shen,C., Gill,S., Heidelberg,J., White,O., Hickey,E., Peterson,J., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S., Eisen,J. and Fraser,C.
>  crc: 115411EB2DEE5654
>
>  Reference_id: 1497
>  Dbxref_id: 14689165
>  Location: Arch. Microbiol. 181 (2), 144-154 (2004)
>  Title: The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner
>  Authors: Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E.
>  crc: 4D5D376EECCD186B
>
>  Reference_id: 1501
>  Dbxref_id: 14689165
>  Location: Arch. Microbiol. 181 (2), 144-154 (2004)
>  Title: The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner
>  Authors: Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., Del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E.
>  crc: 4D57954EECDED66B
>
>  Reference_id: 1556
>  Dbxref_id: 18060065
>  Location: PLoS ONE 2 (12), E1271 (2007)
>  Title: Analysis of the Neurotoxin Complex Genes in Clostridium botulinum A1-A4 and B1 Strains: BoNT/A3, /Ba4 and /B1 Clusters Are Located within Plasmids
>  Authors: Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,A.C., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S.
>  crc: 698688FB6DB95247
>
>  Reference_id: 1559
>  Dbxref_id: 18060065
>  Location: PLoS ONE 2 (12), E1271 (2007)
>  Title: Analysis of the neurotoxin complex genes in Clostridium botulinum A1-A4 and B1 strains: BoNT/A3, /Ba4 and /B1 clusters are located within plasmids
>  Authors: Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,C.A., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S.
>  crc: E25E1BA99DB18F3D
>
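To make the many-to-one relation in the sample concrete, here is a small sketch that groups reference IDs by the PubMed ID they share. The class ReferencesByDbxref is hypothetical and works on plain integer pairs rather than on BioSQL or BioJava objects; the values in the usage note are taken from the sample rows.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: groups (reference_id, dbxref_id) pairs by dbxref_id. */
final class ReferencesByDbxref {
    private ReferencesByDbxref() {}

    /**
     * Each row is {reference_id, dbxref_id}. The result maps every dbxref_id
     * to the reference_ids that point at it, in the order they were seen.
     */
    static Map<Integer, List<Integer>> group(int[][] rows) {
        Map<Integer, List<Integer>> byDbxref = new LinkedHashMap<>();
        for (int[] row : rows) {
            byDbxref.computeIfAbsent(row[1], k -> new ArrayList<>()).add(row[0]);
        }
        return byDbxref;
    }
}
```

Given the rows {216, 18554304} and {230, 18554304}, the entry for 18554304 holds both 216 and 230, which is exactly the case a unique constraint on dbxref_id rejects.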
>  	- The second kind of error I got was: org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
>  		- This means that in the RichSequence object some feature has a location object whose feature is set to null.
>  		- My observations:
>  			- It usually occurs when you try to persist a RichSequence object to the database, and it affects features that have a CompoundRichLocation, usually "join" and "complement" locations in the CDS region of a GenBank record.
>  			- After catching the Hibernate exception I went through all the features: either BioJava or Hibernate had changed the object type from CompoundRichLocation to SimpleRichLocation and set the feature variable to null.
>  			- Below are screenshots from one of my tests.
>  				- State before trying to persist the RichSequence object to the database:
>
>  <Mail Attachment.png>
>  		- State after trying to persist the RichSequence object to the database, inspected in the Hibernate exception catch block:
>
>  		<Mail Attachment.png>
>
>  		- So my question is: why is this happening, and how can I stop it or otherwise get these records into the database? I have no clue why this is happening.
>  		- Some extra information to make things clearer:
>  			- Below are the LOCUS lines of some GenBank records for which I know where the location error is, that is, the CDS region causing the error, together with its array index in the richSequence.feature ArrayList and its line number in the GenBank record.
>  				- LOCUS       AE001439             1643831 bp    DNA     circular BCT 19-JAN-2006
>  					- richSequence.feature index: 2540; line number in the GenBank record: 22115
>  				- LOCUS       CP001189             3887492 bp    DNA     circular BCT 16-OCT-2008
>  					- richSequence.feature index: 127; line number in the GenBank record: 2137
>  				- LOCUS       CP001292              328635 bp    DNA     circular BCT 17-DEC-2008
>  					- richSequence.feature index: 389; line number in the GenBank record: 3632
>  				- LOCUS       AM279694              238517 bp    DNA     linear   BCT 23-OCT-2008
>  					- richSequence.feature index: 47; line number in the GenBank record: 4841
>  				- LOCUS       CR931663               18517 bp    DNA     linear   BCT 18-SEP-2008
>  					- richSequence.feature index: 45; line number in the GenBank record: 442
>  		- The complete exception message:
>  org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
>          at org.hibernate.engine.Nullability.checkNullability(Nullability.java:72)
>          at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:290)
>          at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
>          at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
>          at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
>          at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
>          at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
>          at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
>          at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
>          at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
>          at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
>          at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
>          at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
>          at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
>          at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
>          at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
>          at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
>          at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
>          at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
>          at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
>          at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
>          at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
>          at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
>          at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
>          at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
>          at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
>          at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
>          at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
>          at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
>          at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
>          at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
>          at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
>          at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
>          at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
>          at org.hibernate.event.def.DefaultSaveEventListener.saveWithGeneratedOrRequestedId(DefaultSaveEventListener.java:33)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
>          at org.hibernate.event.def.DefaultSaveEventListener.performSaveOrUpdate(DefaultSaveEventListener.java:27)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
>          at org.hibernate.impl.SessionImpl.fireSave(SessionImpl.java:535)
>          at org.hibernate.impl.SessionImpl.save(SessionImpl.java:523)
>          at trashtesting.GenBankLoaderTesting.main(GenBankLoaderTesting.java:78)
>
>
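The Location.feature symptom described above could be checked before calling save with a scan along these lines. This is only a sketch: Feature and Location here are hypothetical stand-in classes, not the BioJava RichFeature and RichLocation types, and the scan only reports which features own a location with a null back-reference. It does not explain why the back-reference was lost.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-persist check for locations whose owning feature is null. */
final class OrphanLocationScan {
    private OrphanLocationScan() {}

    /** Stand-in for a feature that owns one or more locations. */
    static final class Feature {
        final String name;
        final List<Location> locations = new ArrayList<>();

        Feature(String name) {
            this.name = name;
        }
    }

    /** Stand-in for a location with a back-reference to its feature. */
    static final class Location {
        final Feature feature; // must be non-null for the not-null mapping to pass
        final int min;
        final int max;

        Location(Feature feature, int min, int max) {
            this.feature = feature;
            this.min = min;
            this.max = max;
        }
    }

    /** Returns the list indices of features that own at least one location with a null feature. */
    static List<Integer> findOrphans(List<Feature> features) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < features.size(); i++) {
            for (Location loc : features.get(i).locations) {
                if (loc.feature == null) {
                    bad.add(i);
                    break; // one orphan is enough to flag this feature
                }
            }
        }
        return bad;
    }
}
```

Running such a scan before and after the failed save would show whether the back-reference is already missing after parsing or is only lost during the persist attempt.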

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E:holland at eaglegenomics.com
http://www.eaglegenomics.com/

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Biojava_BioPerl_diff.xls
Type: application/vnd.ms-excel
Size: 346624 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biojava-l/attachments/20100324/7ecffa4a/attachment-0002.xls>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: BioSqlRichObjectBuilder.patch
URL: <http://lists.open-bio.org/pipermail/biojava-l/attachments/20100324/7ecffa4a/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: GenbankFormat.patch
URL: <http://lists.open-bio.org/pipermail/biojava-l/attachments/20100324/7ecffa4a/attachment-0001.ksh>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: GenbankRecord.doc
Type: application/msword
Size: 59392 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biojava-l/attachments/20100324/7ecffa4a/attachment-0002.doc>