[Biojava-l] Getting a part of a sequence
Gabrielle Doan
gabrielle_doan at gmx.net
Thu Oct 9 12:22:01 UTC 2008
Hi Richard,
thanks a lot for your mail. I have successfully retrieved the
subsequence of a sequence as a String. And now I try to get the features
for a particular range with the following code:
<code>
public FeatureHolder filterFeature(String name, int startpos, int endpos) {
    RichLocation rl = new SimpleRichLocation(new SimplePosition(startpos),
            new SimplePosition(endpos), 0);
    BioSQLFeatureFilter filter = new BioSQLFeatureFilter.And(
            new BioSQLFeatureFilter.BySequenceName(name),
            new BioSQLFeatureFilter.OverlapsRichLocation(rl));
    return filter(filter);
}
<\code>
Unfortunately I received these errors:
<message>
Exception in thread "main" java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
    at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.processFeatureFilter(BioSQLRichSequenceDB.java:143)
    at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.filter(BioSQLRichSequenceDB.java:151)
    at org.sequence_viewer.db.HBioSQLDB.filterFeature(HBioSQLDB.java:599)
    at org.sequence_viewer.db.AbfragenTest.main(AbfragenTest.java:56)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.processFeatureFilter(BioSQLRichSequenceDB.java:138)
    ... 3 more
Caused by: org.hibernate.PropertyAccessException: Exception occurred inside setter of org.biojavax.bio.seq.SimpleRichFeature.locationSet
    at org.hibernate.property.BasicPropertyAccessor$BasicSetter.set(BasicPropertyAccessor.java:65)
    at org.hibernate.tuple.entity.AbstractEntityTuplizer.setPropertyValues(AbstractEntityTuplizer.java:337)
    at org.hibernate.tuple.entity.PojoEntityTuplizer.setPropertyValues(PojoEntityTuplizer.java:200)
    at org.hibernate.persister.entity.AbstractEntityPersister.setPropertyValues(AbstractEntityPersister.java:3571)
    at org.hibernate.engine.TwoPhaseLoad.initializeEntity(TwoPhaseLoad.java:133)
    at org.hibernate.loader.Loader.initializeEntitiesAndCollections(Loader.java:854)
    at org.hibernate.loader.Loader.doQuery(Loader.java:729)
    at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:236)
    at org.hibernate.loader.Loader.doList(Loader.java:2213)
    at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2104)
    at org.hibernate.loader.Loader.list(Loader.java:2099)
    at org.hibernate.loader.criteria.CriteriaLoader.list(CriteriaLoader.java:94)
    at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1569)
    at org.hibernate.impl.CriteriaImpl.list(CriteriaImpl.java:283)
    ... 8 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.hibernate.property.BasicPropertyAccessor$BasicSetter.set(BasicPropertyAccessor.java:42)
    ... 21 more
Caused by: java.lang.NullPointerException
    at org.biojavax.bio.seq.PositionResolver$AverageResolver.getMin(PositionResolver.java:103)
    at org.biojavax.bio.seq.SimpleRichLocation.getMin(SimpleRichLocation.java:323)
    at org.biojavax.bio.seq.SimpleRichLocation.overlaps(SimpleRichLocation.java:451)
    at org.biojavax.bio.seq.SimpleRichLocation.union(SimpleRichLocation.java:469)
    at org.biojavax.bio.seq.RichLocation$Tools.merge(RichLocation.java:363)
    at org.biojavax.bio.seq.SimpleRichFeature.setLocationSet(SimpleRichFeature.java:181)
    ... 26 more
<\message>
Why do I get these errors?
BioSQLFeatureFilter.BySequenceName(name) needs a seqName as parameter.
How can I find out the sequence name? Is it the value "name" in the
table "Bioentry"? As the built-in subSequence method takes a long time, I
intend to get the subsequence as a String myself and add the features
to it. What do you think about this?
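
A minimal sketch of that idea, under stated assumptions rather than a tested BioJava solution. The entity name `Sequence` and the property `rs.stringSequence` are taken from Richard's mail below; the class `SubSequenceSketch`, the `Feat` stand-in type, the helpers `substringHql` and `shiftToSlice`, and the named parameter `:desc` are hypothetical names used only for illustration. It assumes 1-based, inclusive coordinates and that HQL's `substring(string, start, length)` is usable on the sequence column, which may depend on the database. Features that only partly overlap the slice are clipped to its boundaries, which is one possible design choice.

```java
import java.util.ArrayList;
import java.util.List;

class SubSequenceSketch {

    /** Hypothetical stand-in for a feature: a type plus 1-based inclusive coordinates. */
    static final class Feat {
        final String type;
        final int min;
        final int max;

        Feat(String type, int min, int max) {
            this.type = type;
            this.min = min;
            this.max = max;
        }
    }

    /**
     * Builds an HQL string that asks the database for only the bases start..end.
     * The description pattern is left as a named parameter (:desc) to be bound
     * by the caller.
     */
    static String substringHql(int start, int end) {
        if (start < 1 || end < start) {
            throw new IllegalArgumentException("need 1 <= start <= end");
        }
        int length = end - start + 1; // inclusive range
        return "select substring(rs.stringSequence, " + start + ", " + length + ") "
                + "from Sequence as rs where rs.description like :desc";
    }

    /**
     * Keeps the features that overlap start..end and rewrites their coordinates
     * so that position 1 is the first base of the slice. Partial overlaps are
     * clipped to the slice boundaries.
     */
    static List<Feat> shiftToSlice(List<Feat> feats, int start, int end) {
        List<Feat> out = new ArrayList<Feat>();
        for (Feat f : feats) {
            if (f.max < start || f.min > end) {
                continue; // no overlap with the slice
            }
            int min = Math.max(f.min, start) - start + 1;
            int max = Math.min(f.max, end) - start + 1;
            out.add(new Feat(f.type, min, max));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Feat> parentFeatures = new ArrayList<Feat>();
        parentFeatures.add(new Feat("gene", 100, 200));
        parentFeatures.add(new Feat("exon", 150, 400));
        parentFeatures.add(new Feat("misc", 500, 600));

        System.out.println(substringHql(150, 300));
        for (Feat f : shiftToSlice(parentFeatures, 150, 300)) {
            System.out.println(f.type + " " + f.min + ".." + f.max);
        }
    }
}
```

In real use, the feature list would come from the filter() call above rather than being built by hand, and the shifted coordinates would be used to create new features on the sliced sequence.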
I'm grateful for any hints.
cheers,
Gabrielle
Richard Holland wrote:
> Hello.
>
> Your code is pretty good already - but you're right, it will load the
> whole chromosome into memory before you can chop out the interesting
> bit you actually need.
>
> As you observed, by using ThinRichSequence in your query it will load
> only the initial shell of a sequence object to start with, but the
> moment you try and sub-sequence it, it will immediately load the whole
> sequence data into memory in order to perform the operation.
>
> If you only want the sequence data, as a string, you can do this by
> specifying the sequence attribute in the query and bypassing the
> sequence object entirely:
>
> select rs.stringSequence from Sequence as rs where rs.description
> like '%hromosome :num%'
>
> This will return a String instead of a RichSequence object. You can
> use HQL operators to perform substrings etc. on the string inside the
> query itself - see
> http://docs.huihoo.com/hibernate/hibernate-reference-3.2.1/queryhql.html
> , particularly section 14.9.
>
> If you only want the features, you can do this by using the
> BioSQLFeatureFilter technique. In particular you will want the
> BySequenceName filter, the And filter, and the OverlapsRichLocation
> filter. You construct a filter then pass it to the filter() method in
> BioSQLRichSequenceDB. The database will return to you all the
> RichFeature objects that match your criteria. Note that it searches
> the whole database so you really must use a BySequenceName filter at
> the very least in order to make the results useful!
>
> However, you can't use HQL to construct a complete slice of a sequence
> directly in the database before returning it to the program for use as
> a ready-made RichSequence object. This would require Hibernate to know
> what a BioJava sub-sequence object is and how it behaves in relation
> to an 'unsliced' one, which is beyond the scope of its job as a
> persistence framework.
>
> cheers,
> Richard
>
>
>
> 2008/10/7 Gabrielle Doan <gabrielle_doan at gmx.net>:
>> Hi all,
>> I have a BioSQL database which contains all human chromosomes. My intention
>> is to get the information about a particular gene. How can I get a part of a
>> particular chromosome with all associated features? At the moment I use
>> the following code to create my new sequence:
>>
>> <code>
>> RichSequence subSeq = RichSequence.Tools.subSequence(parent,
>>         position[0], position[1], ns, geneName, parent.getAccession(),
>>         parent.getIdentifier(), parent.getVersion() + 1,
>>         (Double) (parent.getVersion() + 1.0));
>> <\code>
>>
>> Here is the part how I get the parent sequence:
>> <code>
>> public static RichSequence getChromosome(String chrNo) {
>>     Transaction tx = session.beginTransaction();
>>     RichSequence ret = null;
>>
>>     String query;
>>
>>     try {
>>         if (chrNo.equals("MT")) {
>>             query = "from BioEntry as be where "
>>                     + "be.description like '%:num%'";
>>             query = query.replaceAll(":num", "mitochondrion");
>>         } else {
>>             query = "from BioEntry as be where "
>>                     + "be.description like '%hromosome :num%'";
>>             query = query.replaceAll(":num", chrNo);
>>         }
>>
>>         Query q = session.createQuery(query);
>>
>>         ret = (RichSequence) q.list().get(0);
>>         tx.commit();
>>     } catch (Exception e) {
>>         tx.rollback();
>>         e.printStackTrace();
>>     }
>>     return ret;
>> }
>> <\code>
>>
>> I always have to load the whole chromosome to get a part of it, so it takes
>> a very long time and I get a lot of unused information (a waste of memory). I
>> also tried to use <code>ThinRichSequence<\code> instead of
>> <code>RichSequence<\code>, but I didn't notice any difference.
>> Can you give me a hint on how to speed up the code?
>> I am grateful for any hints.
>>
>> cheers,
>> Gabrielle
>> _______________________________________________
>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
>
>
>