From neil at cambia.org Tue Aug 1 03:41:48 2006
From: neil at cambia.org (Neil Bacon)
Date: Tue, 01 Aug 2006 17:41:48 +1000
Subject: [Biojava-l] three-letter Protein alphabet names
Message-ID: <44CF05BC.6020509@cambia.org>

Hi,

I'm looking at extending biojava sequence io to read sequences from patents (initially current US data formats, later perhaps older formats and other jurisdictions).
Anyone done this already or interested?

Protein data uses 3-letter codes. I found an old posting about 3-letter codes:

[Biojava-dev] Protein alphabet names
http://lists.open-bio.org/pipermail/biojava-dev/2002-October/000143.html

> - Add an additional tokenization (probably called "three-letter"
>   unless someone comes up with a better suggestion) for people
>   who actually want 3-letter codes.

Did this happen (I can't find it)?
I'll try extending WordTokenization to do this unless someone has already done it or can advise me better (I'm new here and advice would be very welcome).

Cheers,
Neil Bacon

From richard.holland at ebi.ac.uk Tue Aug 1 04:15:52 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Tue, 01 Aug 2006 09:15:52 +0100
Subject: [Biojava-l] Problem Inserting Genbank File
In-Reply-To: <44CF07D70200004E00003B08@gwc2cn06.its.mq.edu.au>
References: <44CF07D70200004E00003B08@gwc2cn06.its.mq.edu.au>
Message-ID: <1154420152.4151.25.camel@localhost.localdomain>

Hello.

I have a sneaking suspicion I know what is wrong, but I can't tell for sure without seeing your full source code. Could you post that? It'd certainly help a lot in trying to find the exact cause of the problem.

cheers,
Richard

On Tue, 2006-08-01 at 07:50 +1000, Michael Joss wrote:
> Hi all,
> I am pretty new to this whole BioJava/BioJavaX thing. I thought
> I would start with something reasonably basic. At least what I thought
> would be. I wanted to open a Genbank file and save it into a BioSQL DB.
> I have got the BioSQL Database all running and BioJava and BioJavaX seem
> to be working ok ( I might have messed up some stuff along the way but
> it does appear to be working). I can open the file and can convert it to
> fasta etc .. all the code was found in various examples. When I use
> session.saveOrUpdate:
>
>     BufferedReader br = new BufferedReader(new FileReader("C:/CODE/AY928791.GBANK"));
>     // a namespace to override that in the file
>     Namespace ns = RichObjectFactory.getDefaultNamespace();
>     // we are reading DNA sequences
>     RichSequenceIterator seqs = RichSequence.IOTools.readGenbankDNA(br,ns);
>     while (seqs.hasNext()) {
>         RichSequence rs = seqs.nextRichSequence();
>         session.saveOrUpdate("Sequence",rs);
>     }
>
> I get an error saying it can't insert a taxon, the taxon and taxon_name
> tables seem to be populated correctly and I am not sure how to work out
> why its attempting to insert a taxon that is already there? I just don't
> know enough about .. well anything.. but hibernate in particular. Any
> ideas?
> If you need anything else please let me know? The file is simply a
> single genbank record with locus the same name as the file. I tried a
> few others and got the same result. I am using the latest CVS of
> BioJavaX and BioJava 1.4 and Hibernate 3.1.
>
> Cheers
>
> Joss
>
> 6860 [main] DEBUG org.hibernate.engine.Cascade - processing cascade
> ACTION_SAVE_UPDATE for: Sequence
> 6860 [main] DEBUG org.hibernate.engine.CascadingAction - cascading to
> saveOrUpdate: Taxon
> 6860 [main] DEBUG org.hibernate.event.def.AbstractSaveEventListener -
> transient instance of: Taxon
> 6860 [main] DEBUG
> org.hibernate.event.def.DefaultSaveOrUpdateEventListener - saving
> transient instance
> 6860 [main] DEBUG org.hibernate.event.def.AbstractSaveEventListener -
> saving [Taxon#<null>]
> 6860 [main] DEBUG org.hibernate.event.def.AbstractSaveEventListener -
> executing insertions
> 6860 [main] DEBUG org.hibernate.event.def.WrapVisitor - Wrapped
> collection in role: Taxon.nameSet
> 6875 [main] DEBUG org.hibernate.persister.entity.AbstractEntityPersister
> - Inserting entity: Taxon (native id)
> 6875 [main] DEBUG org.hibernate.jdbc.AbstractBatcher - about to open
> PreparedStatement (open PreparedStatements: 0, globally: 0)
> 6875 [main] DEBUG org.hibernate.SQL - insert into taxon (ncbi_taxon_id,
> node_rank, genetic_code, mito_genetic_code, left_value, right_value,
> parent_taxon_id) values (?, ?, ?, ?, ?, ?, ?)
> 6875 [main] DEBUG org.hibernate.jdbc.AbstractBatcher - preparing
> statement
> 6891 [main] DEBUG org.hibernate.persister.entity.AbstractEntityPersister
> - Dehydrating entity: [Taxon#<null>]
> 6891 [main] DEBUG org.hibernate.type.IntegerType - binding '36865' to
> parameter: 1
> 6891 [main] DEBUG org.hibernate.type.StringType - binding null to
> parameter: 2
> 6891 [main] DEBUG org.hibernate.type.IntegerType - binding null to
> parameter: 3
> 6891 [main] DEBUG org.hibernate.type.IntegerType - binding null to
> parameter: 4
> 6891 [main] DEBUG org.hibernate.type.IntegerType - binding null to
> parameter: 5
> 6891 [main] DEBUG org.hibernate.type.IntegerType - binding null to
> parameter: 6
> 6891 [main] DEBUG org.hibernate.type.IntegerType - binding null to
> parameter: 7
> 6953 [main] DEBUG org.hibernate.jdbc.AbstractBatcher - about to close
> PreparedStatement (open PreparedStatements: 1, globally: 1)
> 6953 [main] DEBUG org.hibernate.jdbc.AbstractBatcher - closing
> statement
> 6953 [main] DEBUG org.hibernate.util.JDBCExceptionReporter - could not
> insert: [Taxon] [insert into taxon (ncbi_taxon_id, node_rank,
> genetic_code, mito_genetic_code, left_value, right_value,
> parent_taxon_id) values (?, ?, ?, ?, ?, ?, ?)]
> java.sql.SQLException: Duplicate entry '36865' for key 2
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
--
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416

From richard.holland at ebi.ac.uk Tue Aug 1 04:19:36 2006
From: richard.holland at ebi.ac.uk (Richard Holland)
Date: Tue, 01 Aug 2006 09:19:36 +0100
Subject: [Biojava-l] three-letter Protein alphabet names
In-Reply-To: <44CF05BC.6020509@cambia.org>
References: <44CF05BC.6020509@cambia.org>
Message-ID: <1154420376.4151.28.camel@localhost.localdomain>

I'm not sure, but it should simply be a matter of
defining an alphabet where each symbol in the alphabet is a 3-letter combo. Then you can use the alphabet to tokenize the input string appropriately.

Mark will know more about this than me. Mark - comments?

cheers,
Richard

On Tue, 2006-08-01 at 17:41 +1000, Neil Bacon wrote:
> Hi,
> I'm looking at extending biojava sequence io to read sequences from
> patents (initially current US data formats, later perhaps older formats
> and other jurisdictions).
> Anyone done this already or interested?
>
> Protein data uses 3-letter codes. I found an old posting about 3-letter
> codes:
>
> [Biojava-dev] Protein alphabet names
> http://lists.open-bio.org/pipermail/biojava-dev/2002-October/000143.html
>
> > - Add an additional tokenization (probably called "three-letter"
> >   unless someone comes up with a better suggestion) for people
> >   who actually want 3-letter codes.
>
> Did this happen (I can't find it)?
> I'll try extending WordTokenization to do this unless someone has
> already done it or can advise me better (I'm new here and advice would
> be very welcome).
>
> Cheers,
> Neil Bacon
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
--
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416

From mark.schreiber at novartis.com Tue Aug 1 04:20:30 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 1 Aug 2006 16:20:30 +0800
Subject: [Biojava-l] three-letter Protein alphabet names
Message-ID:

You mean something like ..

Pro Ala Tyr

Then yes in this case you would want to make a WordTokenization.
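
For anyone following the thread, the idea of treating each whitespace-separated three-letter word as one residue can be sketched in plain Java without any BioJava classes. This is only an illustration of the word-per-symbol approach: the class name ThreeLetterCodes, the method toOneLetter, and the decision to cover just the 20 standard amino acids are assumptions made here, and a real BioJava solution would presumably map the words to alphabet symbols through a WordTokenization instead.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative only: a plain-Java lookup, not BioJava's WordTokenization. */
public class ThreeLetterCodes {

    // Hypothetical lookup table covering only the 20 standard amino acids.
    private static final Map<String, Character> THREE_TO_ONE = new HashMap<>();

    static {
        String[] pairs = {
            "Ala:A", "Arg:R", "Asn:N", "Asp:D", "Cys:C", "Gln:Q", "Glu:E",
            "Gly:G", "His:H", "Ile:I", "Leu:L", "Lys:K", "Met:M", "Phe:F",
            "Pro:P", "Ser:S", "Thr:T", "Trp:W", "Tyr:Y", "Val:V"
        };
        for (String pair : pairs) {
            // Characters 0-2 are the three-letter code, character 4 the one-letter code.
            THREE_TO_ONE.put(pair.substring(0, 3), pair.charAt(4));
        }
    }

    /** Converts a whitespace-separated three-letter sequence to one-letter codes. */
    public static String toOneLetter(String threeLetterSequence) {
        StringBuilder oneLetter = new StringBuilder();
        for (String word : threeLetterSequence.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // blank input splits into a single empty word
            }
            // Normalise case so that "PRO", "pro" and "Pro" are all accepted.
            String key = word.substring(0, 1).toUpperCase(Locale.ROOT)
                    + word.substring(1).toLowerCase(Locale.ROOT);
            Character symbol = THREE_TO_ONE.get(key);
            if (symbol == null) {
                throw new IllegalArgumentException("Unknown residue code: " + word);
            }
            oneLetter.append(symbol);
        }
        return oneLetter.toString();
    }

    public static void main(String[] args) {
        System.out.println(toOneLetter("Pro Ala Tyr"));
    }
}
```

Rejecting unknown words rather than skipping them is a deliberate choice in this sketch: patent listings may contain codes outside the standard 20, and silently dropping them would shift every later position.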
Best regards,

- Mark

Neil Bacon
Sent by: biojava-l-bounces at lists.open-bio.org
08/01/2006 03:41 PM

To: biojava-l at lists.open-bio.org
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-l] three-letter Protein alphabet names

Hi,
I'm looking at extending biojava sequence io to read sequences from patents (initially current US data formats, later perhaps older formats and other jurisdictions).
Anyone done this already or interested?

Protein data uses 3-letter codes. I found an old posting about 3-letter codes:

[Biojava-dev] Protein alphabet names
http://lists.open-bio.org/pipermail/biojava-dev/2002-October/000143.html

> - Add an additional tokenization (probably called "three-letter"
>   unless someone comes up with a better suggestion) for people
>   who actually want 3-letter codes.

Did this happen (I can't find it)?
I'll try extending WordTokenization to do this unless someone has already done it or can advise me better (I'm new here and advice would be very welcome).

Cheers,
Neil Bacon

_______________________________________________
Biojava-l mailing list - Biojava-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-l

From mjoss at bio.mq.edu.au Tue Aug 1 08:41:03 2006
From: mjoss at bio.mq.edu.au (Michael Joss)
Date: Tue, 01 Aug 2006 22:41:03 +1000
Subject: [Biojava-l] Problem Inserting Genbank File
Message-ID: <44CFD8800200004E00003B5A@gwc2cn06.its.mq.edu.au>

Hi Richard and fellow listers,

Will teach me to be all gung-ho. Asking me to post the entire source code made me actually look at what I had written. I realised I had commented out a line that had been giving me trouble... I imagine it is actually useful for something.. maybe connecting BiojavaX to the DB ;) That's what you get for patching a whole bunch of examples together when you have no idea what you are doing, I guess.

I am now getting an error with the previously commented code that doesn't make a lot of sense to me??
It says:

Exception in thread "main" java.lang.IllegalArgumentException: Parameter must be a org.hibernate.Session object
    at org.biojavax.bio.db.biosql.BioSQLRichObjectBuilder.<init>(BioSQLRichObjectBuilder.java:68)
    at org.biojavax.RichObjectFactory.connectToBioSQL(RichObjectFactory.java:221)
    at biojavaxtest.Main.main(Main.java:45)

but sess definitely is an org.hibernate.Session object.. isn't it?

Sorry if I am being a pain, it's kinda tough learning Java and Biojava and hibernate at the same time. I am sure once I get the ball rolling I will be ok.

Cheers
Joss

Source code
|
|
V

package biojavaxtest;

import java.io.BufferedReader;
import java.io.FileReader;
import org.biojavax.Namespace;
import org.biojavax.RichObjectFactory;
import org.biojavax.bio.db.RichSequenceDB;
import org.biojavax.bio.db.biosql.BioSQLRichSequenceDB;
import org.biojavax.bio.seq.RichSequence;
import org.biojavax.bio.seq.RichSequenceIterator;
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;
import org.hibernate.cfg.Configuration;

/**
 *
 * @author Joss
 */
public class Main {

    /** Creates a new instance of Main */
    public Main() {
    }

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        org.apache.log4j.BasicConfigurator.configure();
        SessionFactory sessionFactory = new Configuration().configure().buildSessionFactory();
        // open the session
        Session sess = sessionFactory.openSession();
        // connect it to BioJavaX
        RichObjectFactory.connectToBioSQL(sess); //### Was commented out in previous post ###
        Transaction tx = sess.beginTransaction();
        try {
            // create the RichSequenceDB wrapper around the Hibernate session
            RichSequenceDB db = new BioSQLRichSequenceDB(sess);
            RichSequence seq1 = db.getRichSequence("AE012130"); // load the sequence where name='AE012130'
            BufferedReader br = new BufferedReader(new FileReader("C:/CODE/AY928791.GBANK"));
            // a namespace to override that in the file
            Namespace ns = RichObjectFactory.getDefaultNamespace();
            // we are reading DNA sequences
            RichSequenceIterator seqs = RichSequence.IOTools.readGenbankDNA(br,ns);
            while (seqs.hasNext()) {
                RichSequence seq2 = seqs.nextRichSequence();
                db.addRichSequence(seq2);
            } // add it to the database
            tx.commit();
            System.out.println("Changes committed.");
        } catch (Exception e) {
            tx.rollback();
            System.out.println("Changes rolled back.");
            e.printStackTrace();
        }
        sess.close(); // disconnect from the database
    }
}

From mjoss at bio.mq.edu.au Tue Aug 1 21:07:11 2006
From: mjoss at bio.mq.edu.au (Michael Joss)
Date: Wed, 02 Aug 2006 11:07:11 +1000
Subject: [Biojava-l] Problem Inserting Genbank File
Message-ID: <44D0875F0200004E00003B7B@gwc2cn06.its.mq.edu.au>

Hi Guys,

It's OK, you can officially ignore me now. I just rebuilt my BioJavaX.jar file from the latest CVS this morning and it is all working fine. No idea what the difference was.. but well.. I guess it really doesn't matter. Thanks for bearing with me while I work through my (apparently nonexistent) issues.

Cheers
Joss

From pmaes3 at uz.kuleuven.ac.be Thu Aug 3 17:41:44 2006
From: pmaes3 at uz.kuleuven.ac.be (pmaes3)
Date: Thu, 3 Aug 2006 14:41:44 -0700 (PDT)
Subject: [Biojava-l] The Java sandbox and BioJava
In-Reply-To: <44980650.9040007@andrew.cmu.edu>
References: <44980650.9040007@andrew.cmu.edu>
Message-ID: <5641280.post@talk.nabble.com>

Hi,

I have the same problem, which I was able to solve exactly the way described here. Unfortunately, when I try my applet online, I still get the same error message, although on my computer the applet works perfectly.

Piet.
--
View this message in context: http://www.nabble.com/The-Java-sandbox-and-BioJava-tf1818162.html#a5641280
Sent from the BioJava forum at Nabble.com.

From mjoss at bio.mq.edu.au Thu Aug 3 21:16:39 2006
From: mjoss at bio.mq.edu.au (Michael Joss)
Date: Fri, 04 Aug 2006 11:16:39 +1000
Subject: [Biojava-l] Managing Ranks?
Message-ID: <44D32C980200004E00003C40@gwc2cn06.its.mq.edu.au>

Hi all,

Perhaps this is more of a Java/Programming/BioSQL question than a BioJavaX question in particular, but I was wondering what people thought the best way to manage Ranks in Features and Locations was? Seems to me that iterating through every feature and location and adjusting their ranks every time I add or remove a feature is a risky business. I can just see it getting out of hand and messy very quickly. Also I notice that the Genbank parsing etc. does not set a rank initially for features or locations. Is there a reason for this, since the file clearly places the features in rank order?

I guess to explain a little more fully, I come from a database background. I have some experience with OO programming but not a whole heap, so the object models are a little alien to me. Is there some way I could order a feature set by the min position of each feature's location, for instance? Is the FeatureSet actually ordered by a whole number of things, or just Feature.rank? I can see the importance of rank for producing an arbitrary order, but if I wanted to keep things more rigid, how would I go about adjusting the order of the set returned from RichSequence.getFeatureSet() to be ordered by feature.location.minposition?

Am I thinking about this all the wrong way? I just keep wanting to query the database and get a nice ordered result set rather than working with objects, as the information stored in them seems to be so arbitrarily ordered. Admittedly most of this is unimportant except for display purposes, but still it seems like a mental leap I am just not making.

Thanks in advance for any help you can offer.

Cheers
Joss

As an aside, some explanation of what brings me here. Essentially what I am trying to do overall is use BioJavaX to make a JSP front end for a BioSQL Database.
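
On the ordering question raised in this post: one way to get a display order without touching any stored ranks is to copy the features into a list and sort that copy by start position. The sketch below is plain Java with a hypothetical Feat class standing in for a real feature, and FeatureOrdering and sortedByMin are names invented for the illustration. With BioJavaX the comparator would presumably read the start from each feature's location; that mapping is an assumption, not something confirmed in this thread.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: Feat is a hypothetical stand-in, not a BioJavaX feature class. */
public class FeatureOrdering {

    public static final class Feat {
        public final String name;
        public final int min; // smallest coordinate covered by the feature's location
        public final int max; // largest coordinate covered by the feature's location

        public Feat(String name, int min, int max) {
            this.name = name;
            this.min = min;
            this.max = max;
        }
    }

    /** Returns a new list ordered by start position, ties broken by end position. */
    public static List<Feat> sortedByMin(Collection<Feat> features) {
        // Copy first so the caller's (unordered) set is left untouched.
        List<Feat> ordered = new ArrayList<>(features);
        ordered.sort(Comparator.comparingInt((Feat f) -> f.min)
                .thenComparingInt(f -> f.max));
        return ordered;
    }

    public static void main(String[] args) {
        Set<Feat> unordered = new HashSet<>();
        unordered.add(new Feat("CDS", 120, 980));
        unordered.add(new Feat("source", 1, 1500));
        unordered.add(new Feat("gene", 100, 1000));
        for (Feat f : sortedByMin(unordered)) {
            System.out.println(f.name + " " + f.min + ".." + f.max);
        }
    }
}
```

Sorting a copy keeps the ordering a pure display concern, so adding or removing a feature never requires renumbering ranks in the database.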
Just be able to add sequences and annotations, search them via a web interface, then visualise the data, sequence data, protein translations etc. Will be adding a whole lot more later, but that's the starting plan. I had done all this for a database structure I had come up with, but decided that the framework set up in BioSQL and the tools in BioJava would make my info a whole lot more transferable and maintainable in the long run. Hence here I am.

From johnson.biotech at gmail.com Fri Aug 4 11:59:56 2006
From: johnson.biotech at gmail.com (Seth Johnson)
Date: Fri, 4 Aug 2006 11:59:56 -0400
Subject: [Biojava-l] Parsing Genbank-sequences from NCBI
Message-ID:

Hi Richard,

I'm back for more help. I've just completed getting and parsing the entire human genome RefSeq list from NCBI. I'm not going to post my source code since the invoking code has been described by the gentlemen who started the original thread last month. The result of the parsing is such that out of ~28K sequences, 13 produced the exceptions below. I've used the latest biojava code from CVS, not quite sure what the problem is on these 13.

Trying to get: NM_006145
org.biojava.bio.BioException: Failed to read Genbank sequence
    at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:157)
    at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162)
    at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146)
    at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195)
Caused by: org.biojava.bio.BioException: Could not read sequence
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:112)
    at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:153)
    ...
3 more Caused by: java.lang.IllegalArgumentException: Could not find constructor for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class java.lang.String,null) at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:78) at org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java :104) at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( GenbankFormat.java:387) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:109) ... 4 more Caused by: java.lang.NullPointerException at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:59) ... 7 more org.biojava.bio.BioException: Failed to read Genbank sequence at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:157) -------------------------------------------------------------------------------- Trying to get: NM_000602 at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195) Caused by: org.biojava.bio.BioException: Could not read sequence at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:112) at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:153) ... 3 more Caused by: java.lang.IllegalArgumentException: Could not find constructor for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class java.lang.String,null) at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:78) at org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java :104) at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( GenbankFormat.java:387) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:109) ... 
4 more Caused by: java.lang.NullPointerException at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:59) ... 7 more ------------------------------------------------------------------------------- Trying to get: NM_006226 org.biojava.bio.BioException: Failed to read Genbank sequence at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:157) at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195) Caused by: org.biojava.bio.BioException: Could not read sequence at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:112) at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:153) ... 3 more Caused by: java.lang.IllegalArgumentException: Could not find constructor for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class java.lang.String,null) at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:78) at org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java :104) at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( GenbankFormat.java:387) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:109) ... 4 more Caused by: java.lang.NullPointerException at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:59) ... 
7 more ---------------------------------------------------------------------------------- Trying to get: NM_000371 org.biojava.bio.BioException: Failed to read Genbank sequence at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:157) at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195) Caused by: org.biojava.bio.BioException: Could not read sequence at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:112) at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:153) ... 3 more Caused by: java.lang.IllegalArgumentException: Could not find constructor for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class java.lang.String,null) at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:78) at org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java :104) at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( GenbankFormat.java:387) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:109) ... 4 more Caused by: java.lang.NullPointerException at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:59) ... 
7 more -------------------------------------------------------------------------------- Trying to get: NM_019072 org.biojava.bio.BioException: Failed to read Genbank sequence at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:157) at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195) Caused by: org.biojava.bio.BioException: Could not read sequence at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:112) at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:153) ... 3 more Caused by: java.lang.IllegalArgumentException: Could not find constructor for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class java.lang.String,null) at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:78) at org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java :104) at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( GenbankFormat.java:387) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:109) ... 4 more Caused by: java.lang.NullPointerException at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:59) ... 
7 more -------------------------------------------------------------------------------- Trying to get: NM_017884 org.biojava.bio.BioException: Failed to read Genbank sequence at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:157) at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195) Caused by: org.biojava.bio.BioException: Could not read sequence at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:112) at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:153) ... 3 more Caused by: java.lang.IllegalArgumentException: Could not find constructor for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class java.lang.String,null) at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:78) at org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java :104) at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( GenbankFormat.java:387) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:109) ... 4 more Caused by: java.lang.NullPointerException at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:59) ... 
7 more -------------------------------------------------------------------------------- Trying to get: NM_022107 org.biojava.bio.BioException: Failed to read Genbank sequence at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:157) at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195) Caused by: org.biojava.bio.BioException: Could not read sequence at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:112) at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:153) ... 3 more Caused by: java.lang.IllegalArgumentException: Could not find constructor for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class java.lang.String,null) at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:78) at org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java :104) at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( GenbankFormat.java:387) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:109) ... 4 more Caused by: java.lang.NullPointerException at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:59) ... 
7 more --------------------------------------------------------------------------------- Trying to get: NM_031418 org.biojava.bio.BioException: Failed to read Genbank sequence at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:157) at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195) Caused by: org.biojava.bio.BioException: Could not read sequence at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:112) at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:153) ... 3 more Caused by: java.lang.IllegalArgumentException: Could not find constructor for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class java.lang.String,null) at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:78) at org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java :104) at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( GenbankFormat.java:387) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:109) ... 4 more Caused by: java.lang.NullPointerException at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:59) ... 
7 more --------------------------------------------------------------------------------------- Trying to get: NM_030809 org.biojava.bio.BioException: Failed to read Genbank sequence at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:157) at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195) Caused by: org.biojava.bio.BioException: Could not read sequence at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:112) at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:153) ... 3 more Caused by: java.lang.IllegalArgumentException: Could not find constructor for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class java.lang.String,null) at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:78) at org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java :104) at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( GenbankFormat.java:387) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:109) ... 4 more Caused by: java.lang.NullPointerException at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:59) ... 
7 more ------------------------------------------------------------------------------------- Trying to get: NM_032731 org.biojava.bio.BioException: Failed to read Genbank sequence at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:157) at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195) Caused by: org.biojava.bio.BioException: Could not read sequence at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:112) at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:153) ... 3 more Caused by: java.lang.IllegalArgumentException: Could not find constructor for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class java.lang.String,null) at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:78) at org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java :104) at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( GenbankFormat.java:387) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:109) ... 4 more Caused by: java.lang.NullPointerException at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:59) ... 
7 more ------------------------------------------------------------------------------------ Trying to get: NM_001029888 org.biojava.bio.BioException: Failed to read Genbank sequence at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:157) at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195) Caused by: org.biojava.bio.BioException: Could not read sequence at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:112) at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:153) ... 3 more Caused by: java.lang.IllegalArgumentException: Could not find constructor for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class java.lang.String,null) at org.biojavax.SimpleRichObjectBuilder.buildObject(SimpleRichObjectBuilder.java:78) at org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java:104) at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:387) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:109) ... 4 more Caused by: java.lang.NullPointerException at org.biojavax.SimpleRichObjectBuilder.buildObject(SimpleRichObjectBuilder.java:59) ... 
7 more ------------------------------------------------------------------------------------ Trying to get: NM_001029869 org.biojava.bio.BioException: Failed to read Genbank sequence at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:157) at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195) Caused by: org.biojava.bio.BioException: Could not read sequence at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:112) at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:153) ... 3 more Caused by: java.lang.IllegalArgumentException: Could not find constructor for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class java.lang.String,null) at org.biojavax.SimpleRichObjectBuilder.buildObject(SimpleRichObjectBuilder.java:78) at org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java:104) at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:387) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:109) ... 4 more Caused by: java.lang.NullPointerException at org.biojavax.SimpleRichObjectBuilder.buildObject(SimpleRichObjectBuilder.java:59) ... 
7 more ------------------------------------------------------------------------------------ Trying to get: NM_182572 org.biojava.bio.BioException: Failed to read Genbank sequence at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:157) at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195) Caused by: org.biojava.bio.BioException: Could not read sequence at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:112) at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:153) ... 3 more Caused by: java.lang.IllegalArgumentException: Could not find constructor for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class java.lang.String,null) at org.biojavax.SimpleRichObjectBuilder.buildObject(SimpleRichObjectBuilder.java:78) at org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java:104) at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:387) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:109) ... 4 more Caused by: java.lang.NullPointerException at org.biojavax.SimpleRichObjectBuilder.buildObject(SimpleRichObjectBuilder.java:59) ... 7 more -- Best Regards, Seth Johnson Senior Bioinformatics Associate From david at autohandle.com Fri Aug 4 14:57:04 2006 From: david at autohandle.com (David Scott) Date: Fri, 04 Aug 2006 11:57:04 -0700 Subject: [Biojava-l] Parsing Genbank-sequences from NCBI In-Reply-To: References: Message-ID: <44D39880.9020202@autohandle.com> hi seth- the 3rd argument to SimpleDocRef constructor is the REFERENCE title - which appears to be null in the trace - which happens, but rarely. 
i had the exact same problem recently - and richard put in code to check for a null title and then call a special 2 argument constructor for SimpleDocRef - any chance you don't have that code checked out? best- david Seth Johnson wrote: > Hi Richard, > > > I'm back for more help. I've just completed getting and parsing the entire > human genome RefSeq list from NCBI. I'm not going to post my source code > since the invoking code has been described by the gentlemen who started the > original thread last month. The result of the parsing is such that out of > ~28K sequences, 13 produced the exceptions below. I've used the latest > biojava code from CVS, not quite sure what the problem is on these 13. From johnson.biotech at gmail.com Fri Aug 4 16:05:57 2006 From: johnson.biotech at gmail.com (Seth Johnson) Date: Fri, 4 Aug 2006 16:05:57 -0400 Subject: [Biojava-l] Parsing Genbank-sequences from NCBI In-Reply-To: <44D39880.9020202@autohandle.com> References: <44D39880.9020202@autohandle.com> Message-ID: Hi David, I compiled my biojava.jar on 7/28 and, just to make sure, I've updated my biojava-live cvs just now and it doesn't look like there were any changes made since that date. 
The SimpleDocRef.java was last updated on 7/18 and the version that I have does include the second constructor with 2 parameters. It seems to be related to null TITLE since all of the entries are missing it, but I was also under the impression that null TITLE issue was fixed. That's what is so puzzling about this. Below is the list of the problem accession IDs if you'd like to replicate the exception: NM_006145 NM_000602 NM_006226 NM_000371 NM_019072 NM_017884 NM_022107 NM_031418 NM_030809 NM_032731 NM_001029888 NM_001029869 NM_182572 On 8/4/06, David Scott wrote: > > hi seth- > > the 3rd argument to SimpleDocRef constructor is the REFERENCE title - > which appears to be null in the trace - which happens, but rarely. i had > the exact same problem recently - and richard put in code to check for a > null title and then call a special 2 argument constructor for > SimpleDocRef - any chance you don't have that code checked out? > > best- > david > > Seth Johnson wrote: > > Hi Richard, > > > > > > I'm back for more help. I've just completed getting and parsing the > entire > > human genome RefSeq list from NCBI. I'm not going to post my source > code > > since the invoking code has been described by the gentlemen who started > the > > original thread last month. The result of the parsing is such that out > of > > ~28K sequences, 13 produced the exceptions below. I've used the latest > > biojava code from CVS, not quite sure what the problem is on these 13. 
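A minimal sketch of the failure mode discussed in this thread, assuming a builder that picks a constructor by reflecting on the runtime classes of its arguments. This is not the BioJavaX source; `DocRefLike`, `buildNaive` and `buildNullSafe` are hypothetical names used only for illustration. A null third argument (the missing TITLE) makes the naive lookup fail with a NullPointerException wrapped in an IllegalArgumentException, which is the shape of the traces above; a null-tolerant match avoids it.

```java
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NullArgConstructorDemo {

    // Hypothetical stand-in for a reference class with 2- and 3-argument constructors.
    public static class DocRefLike {
        public final List<String> authors;
        public final String location;
        public final String title; // may legitimately be absent

        public DocRefLike(List<String> authors, String location) {
            this(authors, location, null);
        }

        public DocRefLike(List<String> authors, String location, String title) {
            this.authors = authors;
            this.location = location;
            this.title = title;
        }
    }

    // Naive lookup (assumed, for illustration): matches on each argument's runtime class.
    public static <T> T buildNaive(Class<T> type, Object... args) {
        try {
            for (Constructor<?> c : type.getConstructors()) {
                Class<?>[] params = c.getParameterTypes();
                if (params.length != args.length) continue;
                boolean ok = true;
                for (int i = 0; i < params.length; i++) {
                    // args[i].getClass() throws NullPointerException when args[i] is null
                    if (!params[i].isAssignableFrom(args[i].getClass())) { ok = false; break; }
                }
                if (ok) return type.cast(c.newInstance(args));
            }
        } catch (NullPointerException | ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                "Could not find constructor for " + type.getName() + Arrays.toString(args), e);
        }
        throw new IllegalArgumentException("No matching constructor for " + type.getName());
    }

    // Null-tolerant lookup: a null argument is accepted for any non-primitive parameter.
    public static <T> T buildNullSafe(Class<T> type, Object... args) {
        try {
            for (Constructor<?> c : type.getConstructors()) {
                Class<?>[] params = c.getParameterTypes();
                if (params.length != args.length) continue;
                boolean ok = true;
                for (int i = 0; i < params.length; i++) {
                    Object a = args[i];
                    boolean fits = (a == null) ? !params[i].isPrimitive() : params[i].isInstance(a);
                    if (!fits) { ok = false; break; }
                }
                if (ok) return type.cast(c.newInstance(args));
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Could not invoke constructor for " + type.getName(), e);
        }
        throw new IllegalArgumentException("No matching constructor for " + type.getName());
    }

    public static void main(String[] argv) {
        List<String> authors = new ArrayList<>(Arrays.asList("A. Author"));
        try {
            buildNaive(DocRefLike.class, authors, "J. Example 1:1-2", null);
        } catch (IllegalArgumentException e) {
            System.out.println("naive lookup failed, cause: " + e.getCause().getClass().getSimpleName());
        }
        DocRefLike ref = buildNullSafe(DocRefLike.class, authors, "J. Example 1:1-2", null);
        System.out.println("null-safe lookup built a reference with title: " + ref.title);
    }
}
```

With several constructors of the same arity, a null argument can match more than one of them, so checking for the null title before the call and using the two-argument constructor, as described in this thread, is the less ambiguous route.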
> > > > Trying to get: NM_031418 > > > > org.biojava.bio.BioException: Failed to read Genbank sequence > > > > at > org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > > GenbankRichSequenceDB.java:157) > > > > at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java > :162) > > > > at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > > > > at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195) > > > > Caused by: org.biojava.bio.BioException: Could not read sequence > > > > at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > > RichStreamReader.java:112) > > > > at > org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > > GenbankRichSequenceDB.java:153) > > > > ... 3 more > > > > Caused by: java.lang.IllegalArgumentException: Could not find > constructor > > for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class > > java.lang.String,null) > > > > at org.biojavax.SimpleRichObjectBuilder.buildObject( > > SimpleRichObjectBuilder.java:78) > > > > at org.biojavax.RichObjectFactory.getObject( > RichObjectFactory.java > > :104) > > > > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( > > GenbankFormat.java:387) > > > > at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > > RichStreamReader.java:109) > > > > ... 4 more > > > > Caused by: java.lang.NullPointerException > > > > at org.biojavax.SimpleRichObjectBuilder.buildObject( > > SimpleRichObjectBuilder.java:59) > > > > ... 
7 more > > > > > --------------------------------------------------------------------------------------- > > > > Trying to get: NM_030809 > > > > org.biojava.bio.BioException: Failed to read Genbank sequence > > > > at > org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > > GenbankRichSequenceDB.java:157) > > > > at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java > :162) > > > > at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > > > > at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195) > > > > Caused by: org.biojava.bio.BioException: Could not read sequence > > > > at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > > RichStreamReader.java:112) > > > > at > org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > > GenbankRichSequenceDB.java:153) > > > > ... 3 more > > > > Caused by: java.lang.IllegalArgumentException: Could not find > constructor > > for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class > > java.lang.String,null) > > > > at org.biojavax.SimpleRichObjectBuilder.buildObject( > > SimpleRichObjectBuilder.java:78) > > > > at org.biojavax.RichObjectFactory.getObject( > RichObjectFactory.java > > :104) > > > > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( > > GenbankFormat.java:387) > > > > at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > > RichStreamReader.java:109) > > > > ... 4 more > > > > Caused by: java.lang.NullPointerException > > > > at org.biojavax.SimpleRichObjectBuilder.buildObject( > > SimpleRichObjectBuilder.java:59) > > > > ... 
7 more > > > > > ------------------------------------------------------------------------------------- > > > > Trying to get: NM_032731 > > > > org.biojava.bio.BioException: Failed to read Genbank sequence > > > > at > org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > > GenbankRichSequenceDB.java:157) > > > > at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java > :162) > > > > at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > > > > at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195) > > > > Caused by: org.biojava.bio.BioException: Could not read sequence > > > > at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > > RichStreamReader.java:112) > > > > at > org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > > GenbankRichSequenceDB.java:153) > > > > ... 3 more > > > > Caused by: java.lang.IllegalArgumentException: Could not find > constructor > > for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class > > java.lang.String,null) > > > > at org.biojavax.SimpleRichObjectBuilder.buildObject( > > SimpleRichObjectBuilder.java:78) > > > > at org.biojavax.RichObjectFactory.getObject( > RichObjectFactory.java > > :104) > > > > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( > > GenbankFormat.java:387) > > > > at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > > RichStreamReader.java:109) > > > > ... 4 more > > > > Caused by: java.lang.NullPointerException > > > > at org.biojavax.SimpleRichObjectBuilder.buildObject( > > SimpleRichObjectBuilder.java:59) > > > > ... 
7 more > > > > > ------------------------------------------------------------------------------------ > > > > Trying to get: NM_001029888 > > > > org.biojava.bio.BioException: Failed to read Genbank sequence > > > > at > org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > > GenbankRichSequenceDB.java:157) > > > > at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java > :162) > > > > at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > > > > at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195) > > > > Caused by: org.biojava.bio.BioException: Could not read sequence > > > > at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > > RichStreamReader.java:112) > > > > at > org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > > GenbankRichSequenceDB.java:153) > > > > ... 3 more > > > > Caused by: java.lang.IllegalArgumentException: Could not find > constructor > > for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class > > java.lang.String,null) > > > > at org.biojavax.SimpleRichObjectBuilder.buildObject( > > SimpleRichObjectBuilder.java:78) > > > > at org.biojavax.RichObjectFactory.getObject( > RichObjectFactory.java > > :104) > > > > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( > > GenbankFormat.java:387) > > > > at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > > RichStreamReader.java:109) > > > > ... 4 more > > > > Caused by: java.lang.NullPointerException > > > > at org.biojavax.SimpleRichObjectBuilder.buildObject( > > SimpleRichObjectBuilder.java:59) > > > > ... 
7 more > > > > > ------------------------------------------------------------------------------------ > > > > Trying to get: NM_001029869 > > > > org.biojava.bio.BioException: Failed to read Genbank sequence > > > > at > org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > > GenbankRichSequenceDB.java:157) > > > > at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java > :162) > > > > at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > > > > at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195) > > > > Caused by: org.biojava.bio.BioException: Could not read sequence > > > > at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > > RichStreamReader.java:112) > > > > at > org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > > GenbankRichSequenceDB.java:153) > > > > ... 3 more > > > > Caused by: java.lang.IllegalArgumentException: Could not find > constructor > > for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class > > java.lang.String,null) > > > > at org.biojavax.SimpleRichObjectBuilder.buildObject( > > SimpleRichObjectBuilder.java:78) > > > > at org.biojavax.RichObjectFactory.getObject( > RichObjectFactory.java > > :104) > > > > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( > > GenbankFormat.java:387) > > > > at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > > RichStreamReader.java:109) > > > > ... 4 more > > > > Caused by: java.lang.NullPointerException > > > > at org.biojavax.SimpleRichObjectBuilder.buildObject( > > SimpleRichObjectBuilder.java:59) > > > > ... 
7 more > > > > > ------------------------------------------------------------------------------------ > > > > Trying to get: NM_182572 > > > > org.biojava.bio.BioException: Failed to read Genbank sequence > > > > at > org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > > GenbankRichSequenceDB.java:157) > > > > at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java > :162) > > > > at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > > > > at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195) > > > > Caused by: org.biojava.bio.BioException: Could not read sequence > > > > at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > > RichStreamReader.java:112) > > > > at > org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > > GenbankRichSequenceDB.java:153) > > > > ... 3 more > > > > Caused by: java.lang.IllegalArgumentException: Could not find > constructor > > for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class > > java.lang.String,null) > > > > at org.biojavax.SimpleRichObjectBuilder.buildObject( > > SimpleRichObjectBuilder.java:78) > > > > at org.biojavax.RichObjectFactory.getObject( > RichObjectFactory.java > > :104) > > > > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( > > GenbankFormat.java:387) > > > > at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > > RichStreamReader.java:109) > > > > ... 4 more > > > > Caused by: java.lang.NullPointerException > > > > at org.biojavax.SimpleRichObjectBuilder.buildObject( > > SimpleRichObjectBuilder.java:59) > > > > ... 
7 more

--
Best Regards,

Seth Johnson
Senior Bioinformatics Associate

Ph: (202) 470-0900
Fx: (775) 251-0358

From david at autohandle.com Fri Aug 4 16:52:47 2006
From: david at autohandle.com (David Scott)
Date: Fri, 04 Aug 2006 13:52:47 -0700
Subject: [Biojava-l] Parsing Genbank-sequences from NCBI
In-Reply-To:
References: <44D39880.9020202@autohandle.com>
Message-ID: <44D3B39F.6000508@autohandle.com>

hi seth-

you are right - the fix for null titles was put in BioSQLRichObjectBuilder on 7.19 - i guess there must be a bug in the fix - i offered to check the fix - so i'll have to hang my head in shame.

i'm looking at the code now - not the 1st time for me and i don't see the problem. i'll try one of your test cases - but i can't get to it until tomorrow.

don't tell anyone i messed up-
david

Seth Johnson wrote:
> Hi David,
>
> I compiled my biojava.jar on 7/28 and, just to make sure, I've updated
> my biojava-live cvs just now and it doesn't look like there were any
> changes made since that date. The SimpleDocRef.java was last updated
> on 7/18 and the version that I have does include the second
> constructor with 2 parameters. It seems to be related to null TITLE
> since all of the entries are missing it, but I was also under the
> impression that null TITLE issue was fixed. That's what is so puzzling
> about this. Below is the list of the problem accession IDs if you'd
> like to replicate the exception:
>
> NM_006145
> NM_000602
> NM_006226
> NM_000371
> NM_019072
> NM_017884
> NM_022107
> NM_031418
> NM_030809
> NM_032731
> NM_001029888
> NM_001029869
> NM_182572
>
> On 8/4/06, *David Scott* wrote:
>
> hi seth-
>
> the 3rd argument to SimpleDocRef constructor is the REFERENCE title -
> which appears to be null in the trace - which happens, but rarely. i had
> the exact same problem recently - and richard put in code to check for a
> null title and then call a special 2 argument constructor for
> SimpleDocRef - any chance you don't have that code checked out?
>
> best-
> david
>
> Seth Johnson wrote:
> > Hi Richard,
> >
> > I'm back for more help. I've just completed getting and parsing the entire
> > human genome RefSeq list from NCBI. I'm not going to post my source code
> > since the invoking code has been described by the gentlemen who started the
> > original thread last month. The result of the parsing is such that out of
> > ~28K sequences, 13 produced the exceptions below. I've used the latest
> > biojava code from CVS, not quite sure what the problem is on these 13.
> >
> > Trying to get: NM_006145
> > org.biojava.bio.BioException: Failed to read Genbank sequence
> >     at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:157)
> >     at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162)
> >     at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146)
> >     at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195)
> > Caused by: org.biojava.bio.BioException: Could not read sequence
> >     at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:112)
> >     at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:153)
> >     ...
3 more > > > > Caused by: java.lang.IllegalArgumentException: Could not find > constructor > > for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class > > java.lang.String,null) > > > > at org.biojavax.SimpleRichObjectBuilder.buildObject ( > > SimpleRichObjectBuilder.java:78) > > > > at > org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java > > :104) > > > > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence ( > > GenbankFormat.java:387) > > > > at > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > > RichStreamReader.java:109) > > > > ... 4 more > > > > Caused by: java.lang.NullPointerException > > > > at org.biojavax.SimpleRichObjectBuilder.buildObject( > > SimpleRichObjectBuilder.java:59) > > > > ... 7 more > > > > org.biojava.bio.BioException: Failed to read Genbank sequence > > > > at > org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > > GenbankRichSequenceDB.java:157) > > > > > -------------------------------------------------------------------------------- > > > > > Trying to get: NM_000602 > > > > at > exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) > > > > at > exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > > > > at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195) > > > > Caused by: org.biojava.bio.BioException: Could not read sequence > > > > at > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence ( > > RichStreamReader.java:112) > > > > at > org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > > GenbankRichSequenceDB.java:153) > > > > ... 
3 more > > > > Caused by: java.lang.IllegalArgumentException: Could not find > constructor > > for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class > > java.lang.String,null) > > > > at org.biojavax.SimpleRichObjectBuilder.buildObject ( > > SimpleRichObjectBuilder.java:78) > > > > at > org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java > > :104) > > > > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence ( > > GenbankFormat.java:387) > > > > at > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > > RichStreamReader.java:109) > > > > ... 4 more > > > > Caused by: java.lang.NullPointerException > > > > at org.biojavax.SimpleRichObjectBuilder.buildObject( > > SimpleRichObjectBuilder.java:59) > > > > ... 7 more > > > > > ------------------------------------------------------------------------------- > > > > > Trying to get: NM_006226 > > > > org.biojava.bio.BioException: Failed to read Genbank sequence > > > > at > org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > > GenbankRichSequenceDB.java :157) > > > > at > exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) > > > > at > exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > > > > at exonhit.parsers.RefSeqParser.main (RefSeqParser.java:195) > > > > Caused by: org.biojava.bio.BioException: Could not read sequence > > > > at > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > > RichStreamReader.java:112) > > > > at > org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > > GenbankRichSequenceDB.java:153) > > > > ... 
3 more > > > > Caused by: java.lang.IllegalArgumentException : Could not find > constructor > > for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class > > java.lang.String,null) > > > > at org.biojavax.SimpleRichObjectBuilder.buildObject( > > SimpleRichObjectBuilder.java:78) > > > > at > org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java > > :104) > > > > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( > > GenbankFormat.java:387) > > > > at > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > > RichStreamReader.java:109) > > > > ... 4 more > > > > Caused by: java.lang.NullPointerException > > > > at org.biojavax.SimpleRichObjectBuilder.buildObject( > > SimpleRichObjectBuilder.java:59) > > > > ... 7 more > > > > > ---------------------------------------------------------------------------------- > > > > > Trying to get: NM_000371 > > > > org.biojava.bio.BioException: Failed to read Genbank sequence > > > > at > org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > > GenbankRichSequenceDB.java :157) > > > > at > exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) > > > > at > exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > > > > at exonhit.parsers.RefSeqParser.main (RefSeqParser.java:195) > > > > Caused by: org.biojava.bio.BioException: Could not read sequence > > > > at > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > > RichStreamReader.java:112) > > > > at > org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > > GenbankRichSequenceDB.java:153) > > > > ... 
3 more > > > > Caused by: java.lang.IllegalArgumentException : Could not find > constructor > > for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class > > java.lang.String,null) > > > > at org.biojavax.SimpleRichObjectBuilder.buildObject( > > SimpleRichObjectBuilder.java:78) > > > > at > org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java > > :104) > > > > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( > > GenbankFormat.java:387) > > > > at > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > > RichStreamReader.java:109) > > > > ... 4 more > > > > Caused by: java.lang.NullPointerException > > > > at org.biojavax.SimpleRichObjectBuilder.buildObject( > > SimpleRichObjectBuilder.java:59) > > > > ... 7 more > > > > > > > > > > > > > > > -- > Best Regards, > > > Seth Johnson > Senior Bioinformatics Associate > > Ph: (202) 470-0900 > Fx: (775) 251-0358 From david at autohandle.com Sat Aug 5 18:55:23 2006 From: david at autohandle.com (David Scott) Date: Sat, 05 Aug 2006 15:55:23 -0700 Subject: [Biojava-l] Parsing Genbank-sequences from NCBI In-Reply-To: References: <44D39880.9020202@autohandle.com> Message-ID: <44D521DB.2080601@autohandle.com> hi seth- nm_006145 loaded for me and i recreated the genbank entry with only minor differences - now, i'm at a loss - maybe we should wait for the other guys to get online after the weekend - they are much better at this remote debugging than i am. sorry i just couldn't help- david Seth Johnson wrote: > Hi David, > > I compiled my biojava.jar on 7/28 and, just to make sure, I've updated > my biojava-live cvs just now and it doesn't look like there were any > changes made since that date. The SimpleDocRef.java was last updated > on 7/18 and the version that I have does include the second > constructor with 2 parameters. It seems to be related to null TITLE > since all of the entries are missing it, but I was also under the > impression that null TITLE issue was fixed. 
That's what is so puzzling > about this. Below is the list of the problem accession IDs if you'd > like to replicate the exception: > > NM_006145 > NM_000602 > NM_006226 > NM_000371 > NM_019072 > NM_017884 > NM_022107 > NM_031418 > NM_030809 > NM_032731 > NM_001029888 > NM_001029869 > NM_182572 > > On 8/4/06, *David Scott* > wrote: > > hi seth- > > the 3rd argument to SimpleDocRef constructor is the REFERENCE title - > which appears to be null in the trace - which happens, but rarely. > i had > the exact same problem recently - and richard put in code to check > for a > null title and then call a special 2 argument constructor for > SimpleDocRef - any chance you don't have that code checked out? > > best- > david > > Seth Johnson wrote: > > Hi Richard, > > > > > > I'm back for more help. I've just completed getting and parsing > the entire > > human genome RefSeq list from NCBI. I'm not going to post my > source code > > since the invoking code has been described by the gentlemen who > started the > > original thread last month. The result of the parsing is such > that out of > > ~28K sequences, 13 produced the exceptions below. I've used the > latest > > biojava code from CVS, not quite sure what the problem is on > these 13. > > > > > > > > Trying to get: NM_006145 > > > > org.biojava.bio.BioException: Failed to read Genbank sequence > > > > at > org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence ( > > GenbankRichSequenceDB.java:157) > > > > at > exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) > > > > at > exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > > > > at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195) > > > > Caused by: org.biojava.bio.BioException: Could not read sequence > > > > at > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence ( > > RichStreamReader.java:112) > > > > at > org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > > GenbankRichSequenceDB.java:153) > > > > ... 
3 more > > > > Caused by: java.lang.IllegalArgumentException : Could not find > constructor > > for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class > > java.lang.String,null) > > > > at org.biojavax.SimpleRichObjectBuilder.buildObject( > > SimpleRichObjectBuilder.java:78) > > > > at > org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java > > :104) > > > > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( > > GenbankFormat.java:387) > > > > at > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > > RichStreamReader.java:109) > > > > ... 4 more > > > > Caused by: java.lang.NullPointerException > > > > at org.biojavax.SimpleRichObjectBuilder.buildObject( > > SimpleRichObjectBuilder.java:59) > > > > ... 7 more > > > > > > > > > > > > > > > -- > Best Regards, > > > Seth Johnson > Senior Bioinformatics Associate > > Ph: (202) 470-0900 > Fx: (775) 251-0358 From n.haigh at sheffield.ac.uk Mon Aug 7 03:17:22 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Mon, 07 Aug 2006 08:17:22 +0100 Subject: [Biojava-l] Iterating over alignment columns Message-ID: <44D6E902.1030306@sheffield.ac.uk> I'm returning to a project i started having a look at a few months ago and i still can't figure out how to do the following since i'm new to Java and Biojava. It seems to me that this should be easy to do since it is essentially why alignments are generated in the first instance - to infer homology of residues (symbols) at the same position (column) in the alignment. I want to be able to iterate over all positions in an alignment and then do something with the symbols at a given position, in my case calculate the proportion of each symbol at that position. I understand that i could loop over the length of the alignment, use an index to represent the position of the column in the alignment and generate a subalignment of length 1 for all labels. 
However, is this efficient, and how would i access the symbols so i can calculate the proportions for each symbol at the position?

I would really appreciate some hand holding on this, as i'm struggling to climb the steep learning curve of OOP, Java and Biojava :o(

Thanks
Nathan

From czaleski at albany.edu Mon Aug 7 12:33:52 2006
From: czaleski at albany.edu (czaleski at albany.edu)
Date: Mon, 7 Aug 2006 12:33:52 -0400 (EDT)
Subject: [Biojava-l] How to - Collection of features only?
Message-ID: <1397.72.226.74.171.1154968432.squirrel@webmail.albany.edu>

Greetings,

I have a question about coordinate-only data. I have a database with many flavors of annotation, and I build collections of objects from this data (usually Sequences). However I have an instance where I need to retrieve coordinate based data only. For instance, I'd fetch something like all 3' UTRs defined in my RefSeq table, and then build a .bed file to be loaded into UCSC's genome browser. In this case, I need the coordinates only (chromosome, txStart, txEnd) and I do not need the actual sequence.

So basically I'd like to make a collection of Features. But since in the tutorial it says: "Features cannot be created independently of a sequence", how would I do this? I expect I could create a Sequence object with an empty or null String/SymbolList, and then add a single Feature to each... but this does not seem like it would be the intended solution. Is there some other method by which I could/should accomplish this?

Thanks very much
Chris

From mark.schreiber at novartis.com Tue Aug 1 21:17:18 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Wed, 2 Aug 2006 09:17:18 +0800
Subject: [Biojava-l] Problem Inserting Genbank File
Message-ID:

An HTML attachment was scrubbed...
URL: http://lists.open-bio.org/pipermail/biojava-l/attachments/20060802/79e9dd4d/attachment.html

From edbeaty at charter.net Mon Aug 7 16:41:49 2006
From: edbeaty at charter.net (Dexter Riley)
Date: Mon, 7 Aug 2006 13:41:49 -0700 (PDT)
Subject: [Biojava-l] Getting a Slice of an Alignment
In-Reply-To: <1152008150.3948.63.camel@texas.ebi.ac.uk>
References: <5047818.post@talk.nabble.com> <1151335745.3938.40.camel@texas.ebi.ac.uk> <5049831.post@talk.nabble.com> <1151399858.3938.57.camel@texas.ebi.ac.uk> <5066891.post@talk.nabble.com> <1151421997.3938.91.camel@texas.ebi.ac.uk> <5072893.post@talk.nabble.com> <1151485076.3942.8.camel@texas.ebi.ac.uk> <1152008150.3948.63.camel@texas.ebi.ac.uk>
Message-ID: <5695086.post@talk.nabble.com>

I have time to think about the problem of creating a subalignment again. To see if I understand Richard's solution, you:

  Create a subalignment from the original alignment, at the desired location
  Iterate through each SymbolList in the alignment, and
    determine the offset of the SymbolList in the original alignment,
    determine the offset of the SymbolList in the subalignment,
    create a new SymbolList using these offsets.

My main problem with doing this is that you create an Alignment to get the SymbolLists that represent the slice, which I then would use to...create an Alignment. Since all I really want is an Alignment view of a particular location slice of an Alignment, I really think your original idea of changing the behavior of AbstractULAlignment.SubULAlignment.symbolListForLabel() would be much more intuitive (at least to a new user like myself), and be at least one object lighter, and possibly faster to boot (can't say for sure since I'm not familiar with how AbstractULAlignment uses SubULAlignments.)

Thanks,
Ed

--
View this message in context: http://www.nabble.com/Getting-a-Slice-of-an-Alignment-tf1849222.html#a5695086
Sent from the BioJava forum at Nabble.com.
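
The "view of a slice" idea Ed describes above can be illustrated without any BioJava classes. What follows is only a minimal sketch in plain Java under stated assumptions, not BioJava's Alignment or AbstractULAlignment API: the names AlignmentSlice, of, slice, symbolAt and rowString are invented for this illustration, and each row is modelled as a gapped string of equal length keyed by its label. A slice shares the backing rows and stores only a column window, so taking a slice allocates one small object and copies no symbols.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for an alignment view; NOT a BioJava class.
final class AlignmentSlice {
    private final Map<String, String> rows; // label -> gapped row, shared between slices
    private final int start;                // first column of the window, 1-based, inclusive
    private final int end;                  // last column of the window, 1-based, inclusive

    private AlignmentSlice(Map<String, String> rows, int start, int end) {
        this.rows = rows;
        this.start = start;
        this.end = end;
    }

    // Wrap a whole alignment. The map is copied once here; slices then share that copy.
    static AlignmentSlice of(Map<String, String> rows) {
        if (rows.isEmpty()) {
            throw new IllegalArgumentException("alignment has no rows");
        }
        int width = rows.values().iterator().next().length();
        for (String row : rows.values()) {
            if (row.length() != width) {
                throw new IllegalArgumentException("rows differ in length");
            }
        }
        return new AlignmentSlice(new LinkedHashMap<>(rows), 1, width);
    }

    // Number of columns visible through this view.
    int length() {
        return end - start + 1;
    }

    // Narrow the view; 'from' and 'to' are 1-based, inclusive, relative to this slice.
    AlignmentSlice slice(int from, int to) {
        if (from < 1 || to > length() || from > to) {
            throw new IndexOutOfBoundsException("bad column range " + from + ".." + to);
        }
        return new AlignmentSlice(rows, start + from - 1, start + to - 1);
    }

    // Symbol for one label at a 1-based column relative to this slice.
    char symbolAt(String label, int column) {
        if (column < 1 || column > length()) {
            throw new IndexOutOfBoundsException("bad column " + column);
        }
        return row(label).charAt(start + column - 2);
    }

    // The part of one row that falls inside this slice.
    String rowString(String label) {
        return row(label).substring(start - 1, end);
    }

    private String row(String label) {
        String row = rows.get(label);
        if (row == null) {
            throw new IllegalArgumentException("no such label: " + label);
        }
        return row;
    }
}
```

With rows seq1 = AC-GTA and seq2 = A-TGTC, AlignmentSlice.of(rows).slice(2, 4).rowString("seq1") gives C-G. Whether BioJava's own sub-alignments could or should behave this way is a separate question for the maintainers.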

From mark.schreiber at novartis.com Mon Aug 7 22:37:58 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 8 Aug 2006 10:37:58 +0800
Subject: [Biojava-l] How to - Collection of features only?
Message-ID:

Hi -

You can use a dummy sequence as the anchor for your sequences.

org.biojava.bio.seq.SequenceTools.createDummy()

- Mark

czaleski at albany.edu
Sent by: biojava-l-bounces at lists.open-bio.org
08/08/2006 12:33 AM
Please respond to czaleski

To: biojava-l at biojava.org
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-l] How to - Collection of features only?

Greetings,

I have a question about coordinate-only data. I have a database with many flavors of annotation, and I build collections of objects from this data (usually Sequences). However I have an instance where I need to retrieve coordinate based data only. For instance, I'd fetch something like all 3' UTRs defined in my RefSeq table, and then build a .bed file to be loaded into UCSC's genome browser. In this case, I need the coordinates only (chromosome, txStart, txEnd) and I do not need the actual sequence.

So basically I'd like to make a collection of Features. But since in the tutorial it says: "Features cannot be created independently of a sequence", how would I do this? I expect I could create a Sequence object with an empty or null String/SymbolList, and then add a single Feature to each... but this does not seem like it would be the intended solution. Is there some other method by which I could/should accomplish this?

Thanks very much
Chris

From n.haigh at sheffield.ac.uk Tue Aug 8 01:50:47 2006
From: n.haigh at sheffield.ac.uk (Nathan S.
Haigh)
Date: Tue, 08 Aug 2006 05:50:47 +0000
Subject: [Biojava-l] Iterating over alignment columns
Message-ID: <44D82637.2030800@sheffield.ac.uk>

Apologies if this has come through more than once, I appear to be having some problems getting posts through.

Anyway, I'm returning to a project i started having a look at a few months ago and i still can't figure out how to do the following since i'm new to Java and Biojava.

It seems to me that this should be easy to do since it is essentially why alignments are generated in the first instance - to infer homology of residues (symbols) at the same position (column) in the alignment.

I want to be able to iterate over all positions in an alignment and then do something with the symbols at a given position, in my case calculate the proportion of each symbol at that position.

I understand that i could loop over the length of the alignment, use an index to represent the position of the column in the alignment and generate a subalignment of length 1 for all labels. However, is this efficient, and how would i access the symbols so i can calculate the proportions for each symbol at the position?

I would really appreciate some hand holding on this, as i'm struggling to climb the steep learning curve of OOP, Java and Biojava :o(

Thanks
Nathan

From n.haigh at sheffield.ac.uk Tue Aug 8 11:18:23 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Tue, 08 Aug 2006 15:18:23 +0000
Subject: [Biojava-l] Iterating over alignment columns
In-Reply-To: <44D85352.9070802@ebi.ac.uk>
References: <44D6E902.1030306@sheffield.ac.uk> <44D85352.9070802@ebi.ac.uk>
Message-ID: <44D8AB3F.7080004@sheffield.ac.uk>

Richard Holland wrote:
> Hello.
> > Here's how you can do it: > > Alignment algn = ....; // get an alignment from somewhere > for (int col = 1; col <= algn.length(); col++) { > List symbols = new ArrayList(); > for (Iterator labels = algn.labelsAt(col); labels.hasNext(); ) { > Object label = labels.next(); > symbols.add(algn.symbolAt(label,col)); > } > // symbols now contains all symbols at column 'col' > // of the alignment. > } > > cheers, > Richard Thanks for this, it looks just like what i need. However, is the method labelsAt part of the latest CVS version of Biojava - they do not appear to be in Biojava 1.4 according to eclipse. Thanks Nathan From johnson.biotech at gmail.com Tue Aug 8 11:44:42 2006 From: johnson.biotech at gmail.com (Seth Johnson) Date: Tue, 8 Aug 2006 11:44:42 -0400 Subject: [Biojava-l] Parsing Genbank-sequences from NCBI In-Reply-To: <44D84F0C.50700@ebi.ac.uk> References: <44D39880.9020202@autohandle.com> <44D521DB.2080601@autohandle.com> <44D84F0C.50700@ebi.ac.uk> Message-ID: Hello, Works perfect now!!! Thanks for looking into it. On 8/8/06, Richard Holland wrote: > > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hello all. > > Sorry for the delay - I was off sick yesterday. > > The bug had been fixed in the BioSQL side of things, but not when merely > loading sequences into memory without persisting them to the database. > I've been through and checked and hopefully fixed this now in CVS. Let > me know how you get on! > > cheers, > Richard > > > David Scott wrote: > > hi seth- > > > > nm_006145 loaded for me and i recreated the genbank entry with only > > minor differences - now, i'm at a loss - maybe we should wait for the > > other guys to get online after the weekend - they are much better at > > this remote debugging than i am. 
> > > > sorry i just couldn't help- > > david > > > > Seth Johnson wrote: > >> Hi David, > >> > >> I compiled my biojava.jar on 7/28 and, just to make sure, I've updated > >> my biojava-live cvs just now and it doesn't look like there were any > >> changes made since that date. The SimpleDocRef.java was last updated > >> on 7/18 and the version that I have does include the second > >> constructor with 2 parameters. It seems to be related to null TITLE > >> since all of the entries are missing it, but I was also under the > >> impression that null TITLE issue was fixed. That's what is so puzzling > >> about this. Below is the list of the problem accession IDs if you'd > >> like to replicate the exception: > >> > >> NM_006145 > >> NM_000602 > >> NM_006226 > >> NM_000371 > >> NM_019072 > >> NM_017884 > >> NM_022107 > >> NM_031418 > >> NM_030809 > >> NM_032731 > >> NM_001029888 > >> NM_001029869 > >> NM_182572 > >> > >> On 8/4/06, *David Scott* >> > wrote: > >> > >> hi seth- > >> > >> the 3rd argument to SimpleDocRef constructor is the REFERENCE title > - > >> which appears to be null in the trace - which happens, but rarely. > >> i had > >> the exact same problem recently - and richard put in code to check > >> for a > >> null title and then call a special 2 argument constructor for > >> SimpleDocRef - any chance you don't have that code checked out? > >> > >> best- > >> david > >> > >> Seth Johnson wrote: > >> > Hi Richard, > >> > > >> > > >> > I'm back for more help. I've just completed getting and parsing > >> the entire > >> > human genome RefSeq list from NCBI. I'm not going to post my > >> source code > >> > since the invoking code has been described by the gentlemen who > >> started the > >> > original thread last month. The result of the parsing is such > >> that out of > >> > ~28K sequences, 13 produced the exceptions below. I've used the > >> latest > >> > biojava code from CVS, not quite sure what the problem is on > >> these 13. 
> >> > > >> > > >> > > >> > Trying to get: NM_006145 > >> > > >> > org.biojava.bio.BioException: Failed to read Genbank sequence > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence ( > >> > GenbankRichSequenceDB.java:157) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > >> > > >> > at exonhit.parsers.RefSeqParser.main(RefSeqParser.java > :195) > >> > > >> > Caused by: org.biojava.bio.BioException: Could not read sequence > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence ( > >> > RichStreamReader.java:112) > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java:153) > >> > > >> > ... 3 more > >> > > >> > Caused by: java.lang.IllegalArgumentException: Could not find > >> constructor > >> > for class org.biojavax.SimpleDocRef(class java.util.ArrayList > ,class > >> > java.lang.String,null) > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject ( > >> > SimpleRichObjectBuilder.java:78) > >> > > >> > at > >> org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java > >> > :104) > >> > > >> > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( > >> > GenbankFormat.java:387) > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:109) > >> > > >> > ... 4 more > >> > > >> > Caused by: java.lang.NullPointerException > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:59) > >> > > >> > ... 
7 more > >> > > >> > org.biojava.bio.BioException: Failed to read Genbank sequence > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java:157) > >> > > >> > > >> > -------------------------------------------------------------------------------- > >> > >> > > >> > Trying to get: NM_000602 > >> > > >> > at > >> exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > >> > > >> > at exonhit.parsers.RefSeqParser.main(RefSeqParser.java > :195) > >> > > >> > Caused by: org.biojava.bio.BioException: Could not read sequence > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence ( > >> > RichStreamReader.java:112) > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java:153) > >> > > >> > ... 3 more > >> > > >> > Caused by: java.lang.IllegalArgumentException: Could not find > >> constructor > >> > for class org.biojavax.SimpleDocRef(class java.util.ArrayList > ,class > >> > java.lang.String,null) > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject ( > >> > SimpleRichObjectBuilder.java:78) > >> > > >> > at > >> org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java > >> > :104) > >> > > >> > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( > >> > GenbankFormat.java:387) > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:109) > >> > > >> > ... 4 more > >> > > >> > Caused by: java.lang.NullPointerException > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:59) > >> > > >> > ... 
7 more > >> > > >> > > >> > ------------------------------------------------------------------------------- > >> > >> > > >> > Trying to get: NM_006226 > >> > > >> > org.biojava.bio.BioException: Failed to read Genbank sequence > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java :157) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > >> > > >> > at exonhit.parsers.RefSeqParser.main (RefSeqParser.java > :195) > >> > > >> > Caused by: org.biojava.bio.BioException: Could not read sequence > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:112) > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java:153) > >> > > >> > ... 3 more > >> > > >> > Caused by: java.lang.IllegalArgumentException : Could not find > >> constructor > >> > for class org.biojavax.SimpleDocRef(class java.util.ArrayList > ,class > >> > java.lang.String,null) > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:78) > >> > > >> > at > >> org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java > >> > :104) > >> > > >> > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence > ( > >> > GenbankFormat.java:387) > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:109) > >> > > >> > ... 4 more > >> > > >> > Caused by: java.lang.NullPointerException > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:59) > >> > > >> > ... 
7 more > >> > > >> > > >> > ---------------------------------------------------------------------------------- > >> > >> > > >> > Trying to get: NM_000371 > >> > > >> > org.biojava.bio.BioException: Failed to read Genbank sequence > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java :157) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > >> > > >> > at exonhit.parsers.RefSeqParser.main (RefSeqParser.java > :195) > >> > > >> > Caused by: org.biojava.bio.BioException: Could not read sequence > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:112) > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java:153) > >> > > >> > ... 3 more > >> > > >> > Caused by: java.lang.IllegalArgumentException : Could not find > >> constructor > >> > for class org.biojavax.SimpleDocRef(class java.util.ArrayList > ,class > >> > java.lang.String,null) > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:78) > >> > > >> > at > >> org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java > >> > :104) > >> > > >> > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence > ( > >> > GenbankFormat.java:387) > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:109) > >> > > >> > ... 4 more > >> > > >> > Caused by: java.lang.NullPointerException > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:59) > >> > > >> > ... 
7 more > >> > > >> > > >> > -------------------------------------------------------------------------------- > >> > >> > > >> > Trying to get: NM_019072 > >> > > >> > org.biojava.bio.BioException: Failed to read Genbank sequence > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java :157) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > >> > > >> > at exonhit.parsers.RefSeqParser.main (RefSeqParser.java > :195) > >> > > >> > Caused by: org.biojava.bio.BioException: Could not read sequence > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:112) > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java:153) > >> > > >> > ... 3 more > >> > > >> > Caused by: java.lang.IllegalArgumentException : Could not find > >> constructor > >> > for class org.biojavax.SimpleDocRef(class java.util.ArrayList > ,class > >> > java.lang.String,null) > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:78) > >> > > >> > at > >> org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java > >> > :104) > >> > > >> > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence > ( > >> > GenbankFormat.java:387) > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:109) > >> > > >> > ... 4 more > >> > > >> > Caused by: java.lang.NullPointerException > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:59) > >> > > >> > ... 
7 more > >> > > >> > > >> > -------------------------------------------------------------------------------- > >> > >> > > >> > Trying to get: NM_017884 > >> > > >> > org.biojava.bio.BioException: Failed to read Genbank sequence > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java :157) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > >> > > >> > at exonhit.parsers.RefSeqParser.main (RefSeqParser.java > :195) > >> > > >> > Caused by: org.biojava.bio.BioException: Could not read sequence > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:112) > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java:153) > >> > > >> > ... 3 more > >> > > >> > Caused by: java.lang.IllegalArgumentException : Could not find > >> constructor > >> > for class org.biojavax.SimpleDocRef(class java.util.ArrayList > ,class > >> > java.lang.String,null) > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:78) > >> > > >> > at > >> org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java > >> > :104) > >> > > >> > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence > ( > >> > GenbankFormat.java:387) > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:109) > >> > > >> > ... 4 more > >> > > >> > Caused by: java.lang.NullPointerException > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:59) > >> > > >> > ... 
7 more > >> > > >> > > >> > -------------------------------------------------------------------------------- > >> > >> > > >> > Trying to get: NM_022107 > >> > > >> > org.biojava.bio.BioException: Failed to read Genbank sequence > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java :157) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > >> > > >> > at exonhit.parsers.RefSeqParser.main (RefSeqParser.java > :195) > >> > > >> > Caused by: org.biojava.bio.BioException: Could not read sequence > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:112) > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java:153) > >> > > >> > ... 3 more > >> > > >> > Caused by: java.lang.IllegalArgumentException : Could not find > >> constructor > >> > for class org.biojavax.SimpleDocRef(class java.util.ArrayList > ,class > >> > java.lang.String,null) > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:78) > >> > > >> > at > >> org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java > >> > :104) > >> > > >> > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence > ( > >> > GenbankFormat.java:387) > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:109) > >> > > >> > ... 4 more > >> > > >> > Caused by: java.lang.NullPointerException > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:59) > >> > > >> > ... 
7 more > >> > > >> > > >> > --------------------------------------------------------------------------------- > >> > >> > > >> > Trying to get: NM_031418 > >> > > >> > org.biojava.bio.BioException: Failed to read Genbank sequence > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java :157) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > >> > > >> > at exonhit.parsers.RefSeqParser.main (RefSeqParser.java > :195) > >> > > >> > Caused by: org.biojava.bio.BioException: Could not read sequence > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:112) > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java:153) > >> > > >> > ... 3 more > >> > > >> > Caused by: java.lang.IllegalArgumentException : Could not find > >> constructor > >> > for class org.biojavax.SimpleDocRef(class java.util.ArrayList > ,class > >> > java.lang.String,null) > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:78) > >> > > >> > at > >> org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java > >> > :104) > >> > > >> > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence > ( > >> > GenbankFormat.java:387) > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:109) > >> > > >> > ... 4 more > >> > > >> > Caused by: java.lang.NullPointerException > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:59) > >> > > >> > ... 
7 more > >> > > >> > > >> > --------------------------------------------------------------------------------------- > >> > >> > > >> > Trying to get: NM_030809 > >> > > >> > org.biojava.bio.BioException: Failed to read Genbank sequence > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java :157) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > >> > > >> > at exonhit.parsers.RefSeqParser.main (RefSeqParser.java > :195) > >> > > >> > Caused by: org.biojava.bio.BioException: Could not read sequence > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:112) > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java:153) > >> > > >> > ... 3 more > >> > > >> > Caused by: java.lang.IllegalArgumentException : Could not find > >> constructor > >> > for class org.biojavax.SimpleDocRef(class java.util.ArrayList > ,class > >> > java.lang.String,null) > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:78) > >> > > >> > at > >> org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java > >> > :104) > >> > > >> > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence > ( > >> > GenbankFormat.java:387) > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:109) > >> > > >> > ... 4 more > >> > > >> > Caused by: java.lang.NullPointerException > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:59) > >> > > >> > ... 
7 more > >> > > >> > > >> > ------------------------------------------------------------------------------------- > >> > >> > > >> > Trying to get: NM_032731 > >> > > >> > org.biojava.bio.BioException: Failed to read Genbank sequence > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java :157) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > >> > > >> > at exonhit.parsers.RefSeqParser.main (RefSeqParser.java > :195) > >> > > >> > Caused by: org.biojava.bio.BioException: Could not read sequence > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:112) > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java:153) > >> > > >> > ... 3 more > >> > > >> > Caused by: java.lang.IllegalArgumentException : Could not find > >> constructor > >> > for class org.biojavax.SimpleDocRef(class java.util.ArrayList > ,class > >> > java.lang.String,null) > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:78) > >> > > >> > at > >> org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java > >> > :104) > >> > > >> > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence > ( > >> > GenbankFormat.java:387) > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:109) > >> > > >> > ... 4 more > >> > > >> > Caused by: java.lang.NullPointerException > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:59) > >> > > >> > ... 
7 more > >> > > >> > > >> > ------------------------------------------------------------------------------------ > >> > >> > > >> > Trying to get: NM_001029888 > >> > > >> > org.biojava.bio.BioException: Failed to read Genbank sequence > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java:157) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > >> > > >> > at exonhit.parsers.RefSeqParser.main(RefSeqParser.java > :195) > >> > > >> > Caused by: org.biojava.bio.BioException: Could not read sequence > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence ( > >> > RichStreamReader.java:112) > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java:153) > >> > > >> > ... 3 more > >> > > >> > Caused by: java.lang.IllegalArgumentException: Could not find > >> constructor > >> > for class org.biojavax.SimpleDocRef(class java.util.ArrayList > ,class > >> > java.lang.String,null) > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject ( > >> > SimpleRichObjectBuilder.java:78) > >> > > >> > at > >> org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java > >> > :104) > >> > > >> > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( > >> > GenbankFormat.java:387) > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:109) > >> > > >> > ... 4 more > >> > > >> > Caused by: java.lang.NullPointerException > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:59) > >> > > >> > ... 
7 more > >> > > >> > > >> > ------------------------------------------------------------------------------------ > >> > >> > > >> > Trying to get: NM_001029869 > >> > > >> > org.biojava.bio.BioException: Failed to read Genbank sequence > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java:157) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > >> > > >> > at exonhit.parsers.RefSeqParser.main(RefSeqParser.java > :195) > >> > > >> > Caused by: org.biojava.bio.BioException: Could not read sequence > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence ( > >> > RichStreamReader.java:112) > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java:153) > >> > > >> > ... 3 more > >> > > >> > Caused by: java.lang.IllegalArgumentException: Could not find > >> constructor > >> > for class org.biojavax.SimpleDocRef(class java.util.ArrayList > ,class > >> > java.lang.String,null) > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject ( > >> > SimpleRichObjectBuilder.java:78) > >> > > >> > at > >> org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java > >> > :104) > >> > > >> > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( > >> > GenbankFormat.java:387) > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:109) > >> > > >> > ... 4 more > >> > > >> > Caused by: java.lang.NullPointerException > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:59) > >> > > >> > ... 
7 more > >> > > >> > > >> > ------------------------------------------------------------------------------------ > >> > >> > > >> > Trying to get: NM_182572 > >> > > >> > org.biojava.bio.BioException: Failed to read Genbank sequence > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java :157) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > >> > > >> > at exonhit.parsers.RefSeqParser.main (RefSeqParser.java > :195) > >> > > >> > Caused by: org.biojava.bio.BioException: Could not read sequence > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:112) > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java:153) > >> > > >> > ... 3 more > >> > > >> > Caused by: java.lang.IllegalArgumentException : Could not find > >> constructor > >> > for class org.biojavax.SimpleDocRef(class java.util.ArrayList > ,class > >> > java.lang.String,null) > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:78) > >> > > >> > at > >> org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java > >> > :104) > >> > > >> > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence > ( > >> > GenbankFormat.java:387) > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:109) > >> > > >> > ... 4 more > >> > > >> > Caused by: java.lang.NullPointerException > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:59) > >> > > >> > ... 
7 more
> >> >
> >> >
> >> >
> >> >
> >> >
> >>
> >>
> >>
> >>
> >> --
> >> Best Regards,
> >>
> >>
> >> Seth Johnson
> >> Senior Bioinformatics Associate
> >>
> >> Ph: (202) 470-0900
> >> Fx: (775) 251-0358
> >
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2.2 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFE2E8K4C5LeMEKA/QRAsZlAKCUyzOv2z94PViXbx2i3RVTXCfn1gCfbxEU
> oCGhefHeLFUxIrHLgZJHuQ0=
> =SMVh
> -----END PGP SIGNATURE-----
>

--
Best Regards,

Seth Johnson
Senior Bioinformatics Associate

Ph: (202) 470-0900
Fx: (775) 251-0358
7 more > >> > > >> > > >> > ---------------------------------------------------------------------------------- > >> > >> > > >> > Trying to get: NM_000371 > >> > > >> > org.biojava.bio.BioException: Failed to read Genbank sequence > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java :157) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > >> > > >> > at exonhit.parsers.RefSeqParser.main (RefSeqParser.java > :195) > >> > > >> > Caused by: org.biojava.bio.BioException: Could not read sequence > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:112) > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java:153) > >> > > >> > ... 3 more > >> > > >> > Caused by: java.lang.IllegalArgumentException : Could not find > >> constructor > >> > for class org.biojavax.SimpleDocRef(class java.util.ArrayList > ,class > >> > java.lang.String,null) > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:78) > >> > > >> > at > >> org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java > >> > :104) > >> > > >> > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence > ( > >> > GenbankFormat.java:387) > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:109) > >> > > >> > ... 4 more > >> > > >> > Caused by: java.lang.NullPointerException > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:59) > >> > > >> > ... 
7 more > >> > > >> > > >> > -------------------------------------------------------------------------------- > >> > >> > > >> > Trying to get: NM_019072 > >> > > >> > org.biojava.bio.BioException: Failed to read Genbank sequence > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java :157) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > >> > > >> > at exonhit.parsers.RefSeqParser.main (RefSeqParser.java > :195) > >> > > >> > Caused by: org.biojava.bio.BioException: Could not read sequence > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:112) > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java:153) > >> > > >> > ... 3 more > >> > > >> > Caused by: java.lang.IllegalArgumentException : Could not find > >> constructor > >> > for class org.biojavax.SimpleDocRef(class java.util.ArrayList > ,class > >> > java.lang.String,null) > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:78) > >> > > >> > at > >> org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java > >> > :104) > >> > > >> > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence > ( > >> > GenbankFormat.java:387) > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:109) > >> > > >> > ... 4 more > >> > > >> > Caused by: java.lang.NullPointerException > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:59) > >> > > >> > ... 
7 more > >> > > >> > > >> > -------------------------------------------------------------------------------- > >> > >> > > >> > Trying to get: NM_017884 > >> > > >> > org.biojava.bio.BioException: Failed to read Genbank sequence > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java :157) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > >> > > >> > at exonhit.parsers.RefSeqParser.main (RefSeqParser.java > :195) > >> > > >> > Caused by: org.biojava.bio.BioException: Could not read sequence > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:112) > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java:153) > >> > > >> > ... 3 more > >> > > >> > Caused by: java.lang.IllegalArgumentException : Could not find > >> constructor > >> > for class org.biojavax.SimpleDocRef(class java.util.ArrayList > ,class > >> > java.lang.String,null) > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:78) > >> > > >> > at > >> org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java > >> > :104) > >> > > >> > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence > ( > >> > GenbankFormat.java:387) > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:109) > >> > > >> > ... 4 more > >> > > >> > Caused by: java.lang.NullPointerException > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:59) > >> > > >> > ... 
7 more > >> > > >> > > >> > -------------------------------------------------------------------------------- > >> > >> > > >> > Trying to get: NM_022107 > >> > > >> > org.biojava.bio.BioException: Failed to read Genbank sequence > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java :157) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > >> > > >> > at exonhit.parsers.RefSeqParser.main (RefSeqParser.java > :195) > >> > > >> > Caused by: org.biojava.bio.BioException: Could not read sequence > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:112) > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java:153) > >> > > >> > ... 3 more > >> > > >> > Caused by: java.lang.IllegalArgumentException : Could not find > >> constructor > >> > for class org.biojavax.SimpleDocRef(class java.util.ArrayList > ,class > >> > java.lang.String,null) > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:78) > >> > > >> > at > >> org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java > >> > :104) > >> > > >> > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence > ( > >> > GenbankFormat.java:387) > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:109) > >> > > >> > ... 4 more > >> > > >> > Caused by: java.lang.NullPointerException > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:59) > >> > > >> > ... 
7 more > >> > > >> > > >> > --------------------------------------------------------------------------------- > >> > >> > > >> > Trying to get: NM_031418 > >> > > >> > org.biojava.bio.BioException: Failed to read Genbank sequence > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java :157) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > >> > > >> > at exonhit.parsers.RefSeqParser.main (RefSeqParser.java > :195) > >> > > >> > Caused by: org.biojava.bio.BioException: Could not read sequence > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:112) > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java:153) > >> > > >> > ... 3 more > >> > > >> > Caused by: java.lang.IllegalArgumentException : Could not find > >> constructor > >> > for class org.biojavax.SimpleDocRef(class java.util.ArrayList > ,class > >> > java.lang.String,null) > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:78) > >> > > >> > at > >> org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java > >> > :104) > >> > > >> > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence > ( > >> > GenbankFormat.java:387) > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:109) > >> > > >> > ... 4 more > >> > > >> > Caused by: java.lang.NullPointerException > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:59) > >> > > >> > ... 
7 more > >> > > >> > > >> > --------------------------------------------------------------------------------------- > >> > >> > > >> > Trying to get: NM_030809 > >> > > >> > org.biojava.bio.BioException: Failed to read Genbank sequence > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java :157) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > >> > > >> > at exonhit.parsers.RefSeqParser.main (RefSeqParser.java > :195) > >> > > >> > Caused by: org.biojava.bio.BioException: Could not read sequence > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:112) > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java:153) > >> > > >> > ... 3 more > >> > > >> > Caused by: java.lang.IllegalArgumentException : Could not find > >> constructor > >> > for class org.biojavax.SimpleDocRef(class java.util.ArrayList > ,class > >> > java.lang.String,null) > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:78) > >> > > >> > at > >> org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java > >> > :104) > >> > > >> > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence > ( > >> > GenbankFormat.java:387) > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:109) > >> > > >> > ... 4 more > >> > > >> > Caused by: java.lang.NullPointerException > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:59) > >> > > >> > ... 
7 more > >> > > >> > > >> > ------------------------------------------------------------------------------------- > >> > >> > > >> > Trying to get: NM_032731 > >> > > >> > org.biojava.bio.BioException: Failed to read Genbank sequence > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java :157) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > >> > > >> > at exonhit.parsers.RefSeqParser.main (RefSeqParser.java > :195) > >> > > >> > Caused by: org.biojava.bio.BioException: Could not read sequence > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:112) > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java:153) > >> > > >> > ... 3 more > >> > > >> > Caused by: java.lang.IllegalArgumentException : Could not find > >> constructor > >> > for class org.biojavax.SimpleDocRef(class java.util.ArrayList > ,class > >> > java.lang.String,null) > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:78) > >> > > >> > at > >> org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java > >> > :104) > >> > > >> > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence > ( > >> > GenbankFormat.java:387) > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:109) > >> > > >> > ... 4 more > >> > > >> > Caused by: java.lang.NullPointerException > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:59) > >> > > >> > ... 
7 more > >> > > >> > > >> > ------------------------------------------------------------------------------------ > >> > >> > > >> > Trying to get: NM_001029888 > >> > > >> > org.biojava.bio.BioException: Failed to read Genbank sequence > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java:157) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > >> > > >> > at exonhit.parsers.RefSeqParser.main(RefSeqParser.java > :195) > >> > > >> > Caused by: org.biojava.bio.BioException: Could not read sequence > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence ( > >> > RichStreamReader.java:112) > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java:153) > >> > > >> > ... 3 more > >> > > >> > Caused by: java.lang.IllegalArgumentException: Could not find > >> constructor > >> > for class org.biojavax.SimpleDocRef(class java.util.ArrayList > ,class > >> > java.lang.String,null) > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject ( > >> > SimpleRichObjectBuilder.java:78) > >> > > >> > at > >> org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java > >> > :104) > >> > > >> > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( > >> > GenbankFormat.java:387) > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:109) > >> > > >> > ... 4 more > >> > > >> > Caused by: java.lang.NullPointerException > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:59) > >> > > >> > ... 
7 more > >> > > >> > > >> > ------------------------------------------------------------------------------------ > >> > >> > > >> > Trying to get: NM_001029869 > >> > > >> > org.biojava.bio.BioException: Failed to read Genbank sequence > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java:157) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > >> > > >> > at exonhit.parsers.RefSeqParser.main(RefSeqParser.java > :195) > >> > > >> > Caused by: org.biojava.bio.BioException: Could not read sequence > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence ( > >> > RichStreamReader.java:112) > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java:153) > >> > > >> > ... 3 more > >> > > >> > Caused by: java.lang.IllegalArgumentException: Could not find > >> constructor > >> > for class org.biojavax.SimpleDocRef(class java.util.ArrayList > ,class > >> > java.lang.String,null) > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject ( > >> > SimpleRichObjectBuilder.java:78) > >> > > >> > at > >> org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java > >> > :104) > >> > > >> > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( > >> > GenbankFormat.java:387) > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:109) > >> > > >> > ... 4 more > >> > > >> > Caused by: java.lang.NullPointerException > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:59) > >> > > >> > ... 
7 more > >> > > >> > > >> > ------------------------------------------------------------------------------------ > >> > >> > > >> > Trying to get: NM_182572 > >> > > >> > org.biojava.bio.BioException: Failed to read Genbank sequence > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java :157) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > >> > > >> > at exonhit.parsers.RefSeqParser.main (RefSeqParser.java > :195) > >> > > >> > Caused by: org.biojava.bio.BioException: Could not read sequence > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:112) > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java:153) > >> > > >> > ... 3 more > >> > > >> > Caused by: java.lang.IllegalArgumentException : Could not find > >> constructor > >> > for class org.biojavax.SimpleDocRef(class java.util.ArrayList > ,class > >> > java.lang.String,null) > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:78) > >> > > >> > at > >> org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java > >> > :104) > >> > > >> > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence > ( > >> > GenbankFormat.java:387) > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:109) > >> > > >> > ... 4 more > >> > > >> > Caused by: java.lang.NullPointerException > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:59) > >> > > >> > ... 
7 more
> >> >
> >> >
> >>
> >> --
> >> Best Regards,
> >>
> >>
> >> Seth Johnson
> >> Senior Bioinformatics Associate
> >>
> >> Ph: (202) 470-0900
> >> Fx: (775) 251-0358
> >
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2.2 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFE2E8K4C5LeMEKA/QRAsZlAKCUyzOv2z94PViXbx2i3RVTXCfn1gCfbxEU
> oCGhefHeLFUxIrHLgZJHuQ0=
> =SMVh
> -----END PGP SIGNATURE-----
>

--
Best Regards,

Seth Johnson
Senior Bioinformatics Associate

Ph: (202) 470-0900
Fx: (775) 251-0358

From n.haigh at sheffield.ac.uk Wed Aug 9 09:20:17 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Wed, 09 Aug 2006 13:20:17 +0000
Subject: [Biojava-l] Location Objects
Message-ID: <44D9E111.3000007@sheffield.ac.uk>

I apologise for the waffle, but I'm a bit stuck with Location objects, and in particular whether a Location object can be empty and whether there are methods for creating, checking and testing for empty Locations.

I'm writing some classes which use Location objects. It seems to me that it should be possible to create an empty Location object and also test if a Location object is empty. In addition, it would be good if the union method could handle these empty Location objects.

What I would like to do is to iterate over columns in an alignment and determine if that position should be added to the Location object. Ideally, what I would like to do is to create an empty Location object and then use the union method to add positions as I iterate over the alignment columns.

In addition, I'm writing a method that takes an inverse of a Location given an alignment length using the exclude method. What I'm currently having to do is create a dummy Location object with position 0,0 - which doesn't make much sense since they should be +ve integers, shouldn't they?
Then while iterating over positions in the alignment, I use LocationTools.union with the dummy Location and a PointLocation object to add the current position to the Location object which is being built.

The main problem I have is when I have an alignment of length 20 and a Location object covering all positions from 1-20 and I want to invert this Location object. It makes sense to return an "empty" Location object, but I'm using a Location object with coordinates 0,0 for my purposes.

At the moment my JUnit test for inverting the Location object shown above tests for a Location object with coordinates 0,0:

    Location inv = invLocation(LocationTools.makeLocation(1, 17), 17);
    assertEquals(LocationTools.makeLocation(0,0), inv);

however I get an error like:

    junit.framework.AssertionFailedError: expected:<0> but was:<{}>

Therefore, there must be a way to represent an empty Location object, but it may not be fully implemented. If I use getMin on this inverted Location (which should be empty) I get a value of 2147483647 with no errors.

Thanks
Nathan

From n.haigh at sheffield.ac.uk Wed Aug 9 11:13:49 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Wed, 09 Aug 2006 15:13:49 +0000
Subject: [Biojava-l] Alignment objects
In-Reply-To: <44D9E948.5090309@ebi.ac.uk>
References: <44D9E111.3000007@sheffield.ac.uk> <44D9E453.9080104@ebi.ac.uk> <44D9F54E.6060602@sheffield.ac.uk> <44D9E948.5090309@ebi.ac.uk>
Message-ID: <44D9FBAD.3080408@sheffield.ac.uk>

I think I'm having a few problems with alignments. I've generated a protein alignment in the following way:

    String alnString1 = ">seq1\n" + "----FGHIKLMNPQRST\n" + ">seq2\n" + "ACDEFGHIKLMNPQRST\n";
    BufferedReader br1 = new BufferedReader(new StringReader(alnString1));
    FastaAlignmentFormat faf1 = new FastaAlignmentFormat();
    alignment = faf1.read( br1 );

If I loop over positions in the alignment to add the positions with gaps to a Location object, I have to do the following.
It seems hacky since I'm having to check for symbol names containing "[]" in order to identify gaps. I'm sure there must be a better way to do this!?

A better way would be to calculate the frequency of each symbol (including gaps) at a position in the alignment. This way I could return a list of these frequencies for each position, which could be used by other methods for identifying positions with certain characteristics (such as those containing gaps) ...any ideas?

    for (int col = 1; col <= alignment.length(); col++) {
        for (Iterator labels = alignment.getLabels().iterator(); labels.hasNext(); ) {
            Object label = labels.next();
            Symbol sym = alignment.symbolAt(label,col);
            if (sym.getName().contains("[]")) {
                Location newLocation = LocationTools.makeLocation(col, col);
                gapped = this.appendLocation(gapped, newLocation);
            }
        }
    }

Cheers
Nath

From walsh at andrew.cmu.edu Wed Aug 9 11:59:52 2006
From: walsh at andrew.cmu.edu (Andrew Walsh)
Date: Wed, 09 Aug 2006 11:59:52 -0400
Subject: [Biojava-l] Getting a Slice of an Alignment
In-Reply-To: <5695086.post@talk.nabble.com>
References: <5047818.post@talk.nabble.com> <1151335745.3938.40.camel@texas.ebi.ac.uk> <5049831.post@talk.nabble.com> <1151399858.3938.57.camel@texas.ebi.ac.uk> <5066891.post@talk.nabble.com> <1151421997.3938.91.camel@texas.ebi.ac.uk> <5072893.post@talk.nabble.com> <1151485076.3942.8.camel@texas.ebi.ac.uk> <1152008150.3948.63.camel@texas.ebi.ac.uk> <5695086.post@talk.nabble.com>
Message-ID: <44DA0678.4040404@andrew.cmu.edu>

I have just found a need to do something similar (i.e. extract select columns from an alignment) and have discovered that the subAlignment() implementation of the SimpleAlignment class does exactly what the original poster wants the method to do, and what the method description from the Alignment interface API suggests it should do.
It returns an Alignment object which contains only those sequences indicated by the first argument (a Set of labels), and only the symbols from the columns specified in the second argument (a Location object). No further processing is needed to get just the symbols from the specified columns. These columns need not even be contiguous; it will work correctly with any arbitrary subset of the columns.

It seems to me that since this method is part of the Alignment interface, all implementations should have the same behavior. They should provide the specified subalignment without the need for further processing. I would thus propose that the AbstractULAlignment.subAlignment() method (or the AbstractULAlignment.SubULAlignment(Set labels, Location loc) constructor, since this is where the actual work is done) be modified to perform correctly. Other classes implementing Alignment may need to be modified as well.

-Andy

Dexter Riley wrote:
> I have time to think about the problem of creating a subalignment again. To
> see if I understand Richard's solution, you:
> Create a subalignment from the original alignment, at the desired location
> Iterate through each SymbolList in the alignment, and
> determine the offset of the SymbolList in the original alignment,
> determine the offset of the SymbolList in the subalignment,
> create a new SymbolList using these offsets.
>
> My main problem with doing this is that you create an Alignment to get the
> SymbolLists that represent the slice, which I then would use to...create an
> Alignment.
> Since all I really want is an Alignment view of a particular
> location slice of an Alignment, I really think your original idea of
> changing the behavior of
> AbstractULAlignment.SubULAlignment.symbolListForLabel() would be much more
> intuitive (at least to a new user like myself), and be at least one object
> lighter, and possibly faster to boot (can't say for sure since I'm not
> familiar with how AbstractULAlignment uses SubULAlignments.)
> Thanks,
> Ed
>
>

From Russell.Smithies at agresearch.co.nz Wed Aug 9 21:59:34 2006
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 10 Aug 2006 13:59:34 +1200
Subject: [Biojava-l] [OFF TOPIC] NCBI eUtils PowerScripting course
In-Reply-To: <44D9FBAD.3080408@sheffield.ac.uk>
Message-ID:

Hi all,

I'm writing a proposal to attend the NCBI eUtils PowerScripting course and am looking for testimonials from past attendees. Has anyone on the list done the course? Was it worthwhile? Any comments greatly appreciated :-)

Thanx,
Russell

Russell Smithies
Bioinformatics Software Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz
Invermay Research Centre
Puddle Alley, Mosgiel, New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz

=======================================================================
Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately.
=======================================================================

From mark.schreiber at novartis.com Wed Aug 9 22:17:55 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Thu, 10 Aug 2006 10:17:55 +0800
Subject: [Biojava-l] Location Objects
Message-ID:

There is the static member of Location, Location.empty; unfortunately it is package private. I have just made it public in biojava-live because I can't see why it shouldn't be.

You can actually get it from LocationTools by performing an operation that doesn't make sense. For example, do an intersection of two locations that don't intersect.

If you are using biojava-live from CVS there is also RichLocation.EMPTY_LOCATION, which is public. RichLocations are basically normal locations with more functionality. I'm not certain that they will behave in the way you expect. An EmptyLocation doesn't really exist; we only really use it to avoid returning null. Thus the min value of an empty location is MAX_INT and the max value is MIN_INT. This is because they have to have values and 0,0 could be a real location, so we use this strangely inverted max and min, which probably best represents some kind of black-hole or something?!

I would be interested to know what happens when you try using one of the above for your example below.

Hope this helps,

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)
Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com
phone +65 6722 2973
fax +65 6722 2910

"Nathan S. Haigh"
Sent by: biojava-l-bounces at lists.open-bio.org
08/09/2006 09:20 PM
To: biojava-l at lists.open-bio.org
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-l] Location Objects

I apologise for the waffle, but I'm a bit stuck with Location objects, and in particular whether a Location object can be empty and whether there are methods for creating, checking and testing for empty Locations.
I'm writing some classes which use Location objects. It seems to me that it should be possible to create an empty Location object and also test if a Location object is empty. In addition, it would be good if the union method could handle these empty Location objects.

What I would like to do is to iterate over columns in an alignment and determine if that position should be added to the Location object. Ideally, I would create an empty Location object and then use the union method to add positions as I iterate over the alignment columns. In addition, I'm writing a method that takes an inverse of a Location given an alignment length using the exclude method.

What I'm currently having to do is create a dummy Location object with position 0,0, which doesn't make much sense since they should be +ve integers, shouldn't they? Then while iterating over positions in the alignment, I use LocationTools.union with the dummy Location and a PointLocation object to add the current position to the Location object which is being built.

The main problem I have is when I have an alignment of length 20 and a Location object covering all positions from 1-20 and I want to invert this Location object. It makes sense to return an "empty" Location object, but I'm using a Location object with coordinates 0,0 for my purposes. At the moment my JUnit test for inverting the Location object shown above tests for a Location object with coordinates 0,0:

    Location inv = invLocation(LocationTools.makeLocation(1, 17), 17);
    assertEquals(LocationTools.makeLocation(0,0), inv);

however I get an error like:

    junit.framework.AssertionFailedError: expected:<0> but was:<{}>

Therefore, there must be a way to represent an empty Location object, but it may not be fully implemented. If I use getMin on this inverted Location (which should be empty) I get a value of 2147483647 with no errors.
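
An aside on the inverted sentinel discussed above (an empty location reporting a min of 2147483647): the sketch below uses plain Java rather than the BioJava Location API, and the names Span, hull and coverFlagged are hypothetical, chosen only for illustration. It assumes simple closed integer intervals and computes the smallest covering span instead of a true union, so it is a simplification of what LocationTools does. It tries to show why min = Integer.MAX_VALUE and max = Integer.MIN_VALUE is a convenient "empty" value: it merges with any real span without a special case.

```java
/** Hypothetical closed integer interval; not part of BioJava. */
final class Span {
    /** Inverted sentinel: min > max, so no position can lie inside it. */
    static final Span EMPTY = new Span(Integer.MAX_VALUE, Integer.MIN_VALUE);

    final int min;
    final int max;

    Span(int min, int max) {
        this.min = min;
        this.max = max;
    }

    boolean isEmpty() {
        return min > max;
    }

    boolean contains(int pos) {
        return min <= pos && pos <= max;
    }

    /** Smallest span covering both operands; EMPTY behaves as the identity element. */
    Span hull(Span other) {
        return new Span(Math.min(min, other.min), Math.max(max, other.max));
    }

    @Override
    public String toString() {
        return isEmpty() ? "{}" : min + ".." + max;
    }
}

final class EmptySpanSketch {
    /** Starts from EMPTY and grows the span over every flagged column (1-based). */
    static Span coverFlagged(boolean[] flagged) {
        Span result = Span.EMPTY;
        for (int i = 0; i < flagged.length; i++) {
            if (flagged[i]) {
                result = result.hull(new Span(i + 1, i + 1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(coverFlagged(new boolean[] {false, true, false, true})); // 2..4
        System.out.println(coverFlagged(new boolean[] {false, false}));             // {}
    }
}
```

Starting the accumulation from the sentinel avoids the dummy 0,0 location, and an "is it empty?" check reduces to min > max.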
Thanks
Nathan

_______________________________________________
Biojava-l mailing list - Biojava-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-l

From n.haigh at sheffield.ac.uk Thu Aug 10 04:31:04 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 10 Aug 2006 08:31:04 +0000
Subject: [Biojava-l] Alignment objects
In-Reply-To: <44D9F874.3060001@ebi.ac.uk>
References: <44D9E111.3000007@sheffield.ac.uk> <44D9E453.9080104@ebi.ac.uk> <44D9F54E.6060602@sheffield.ac.uk> <44D9E948.5090309@ebi.ac.uk> <44D9FBAD.3080408@sheffield.ac.uk> <44D9F874.3060001@ebi.ac.uk>
Message-ID: <44DAEEC8.9090209@sheffield.ac.uk>

Richard Holland wrote:
> You could change this:
>
>     sym.getName().contains("[]")
>
> to this:
>
>     AlphabetManager.getGapSymbol().equals(sym)
>
> Frequency calculations can be done quite quickly using DistributionTools:
>
>     Distribution[] dists = DistributionTools.distOverAlignment(algn, true);
>     // true says to include gaps in the statistics
>     // The dists array will have the same number of entries as there
>     // are columns in the alignment.
>     for (int i = 0; i < dists.length; i++) {
>         // i = 0 = first column in alignment
>         Distribution dist = dists[i];
>         // Find out the weight for A in this column.
>         double AWeight = dist.getWeight(DNATools.a());
>         // Find out the weight for gaps in this column.
>         double GapWeight = dist.getWeight(DNATools.getDNA().getGapSymbol());
>     }
>
> cheers,
> Richard

This is definitely getting close to what I need.
However, I think I'm having trouble with alphabets which is stopping me from using something like:

    AlphabetManager.getGapSymbol().equals(sym)

I am currently creating an alignment like this:

    String alnString1 = ">seq1\n" +
                        "----FGHIKLMNPQRST\n" +
                        ">seq2\n" +
                        "ACDEFGHIKLMNPQRST\n";
    BufferedReader br1 = new BufferedReader(new StringReader(alnString1));
    FastaAlignmentFormat faf1 = new FastaAlignmentFormat();
    aln1 = faf1.read( br1 );

And I never get true returned from:

    AlphabetManager.getGapSymbol().equals(sym)

I assume this is because the mechanisms that are in place for setting the alphabet of the alignment are not correctly setting the gap symbol. The program I am writing should be capable of determining the alphabet of any alignment that is loaded, so it makes sense to change:

    AlphabetManager.getGapSymbol().equals(sym)

to:

    alignment.getAlphabet().getGapSymbol().equals(sym)

but this doesn't work either.

Eventually I'd like my application to be able to load alignments from several different formats, some of which may use more than one symbol as the gap, while others have a "default" gap character. Are there mechanisms in place to attempt to correctly set the gapSymbol for an alignment? For example, FASTA format alignments should probably set the gap symbol to the hyphen "-".

Once again, being new to this, I am probably missing something that is obvious to you guys. Thanks for all your time and effort in helping me out.

Nathan

From mark.schreiber at novartis.com Thu Aug 10 03:56:42 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Thu, 10 Aug 2006 15:56:42 +0800
Subject: [Biojava-l] Alignment objects
Message-ID:

Hi -

There is a difference between the gap returned by AlphabetManager.getGapSymbol and the gap returned by an alphabet.getGapSymbol(). There are some very complex reasons for this which could make up a large part of a thesis (literally, take a look at Matthew Pocock's thesis some time).
Simply speaking, dynamic programming and HMMs wouldn't work without it.

It becomes especially obvious when you have an alignment. The alphabet of an alignment of 3 DNA sequences is DNAxDNAxDNA. Thus a gap from that alphabet is really gap x gap x gap.

Depending on what you are trying to do you would want to test for

    Symbol s == align.getAlphabet().getGap()

or

    Symbol s == DNATools.getDNA().getGap().

- Mark

From ckol at iti.gr Thu Aug 10 08:05:56 2006
From: ckol at iti.gr (ckol at iti.gr)
Date: Thu, 10 Aug 2006 15:05:56 +0300 (EEST)
Subject: [Biojava-l] Create MSA from a profile hmm
Message-ID: <1217.155.207.19.4.1155211556.squirrel@mail.iti.gr>

Hello all,
I'm new in biojava and i reached a problem.
I created a profile HMM from a set of unaligned homologous sequences, then I aligned some sequences to the model. I noticed that every sequence has its own length for the alignment. Is there any method by which I can create a multiple sequence alignment between these sequences?

thanks in advance,
ckol

From ola.spjuth at farmbio.uu.se Fri Aug 11 11:08:26 2006
From: ola.spjuth at farmbio.uu.se (Ola Spjuth)
Date: Fri, 11 Aug 2006 17:08:26 +0200
Subject: [Biojava-l] Bioclipse 1.0 released
Message-ID: <8DD39D6D-FB01-411A-910D-CA23E29255FF@farmbio.uu.se>

The Bioclipse team is proud to announce the release of Bioclipse 1.0, containing a BioJava plugin for parsing and visualizing sequences (currently only fasta sequences and uniprot features are supported).

Bioclipse [1] is a free, open source workbench for chemo- and bioinformatics with powerful editing and visualization capabilities for molecules, sequences, proteins, spectra etc.

The major features of version 1.0 are:

* Import and export in various file formats
* Visual editing of molecular 2D-structures
* 3D-visualization of molecules and proteins
* Editing and visualization of sequences and features (DNA, RNA, proteins etc)
* Graphing and editing of various types of spectra, e.g. NMR, MS
* Retrieval of resources (sequences, proteins, etc) from public data repositories
* Scripting of 3D-visualizations with syntax highlighting and content assistance
* PDB-editor with syntax highlighting for working with PDB files
* CMLRSS-viewer for downloading chemical content published on the web using RSS-feeds
* Integrated, searchable help-system
* Hierarchical view of molecular and macromolecular substructures and calculation of chemical properties
* Connection with external programs, e.g. PyMol

Bioclipse is a rich client, which means it is run on your local computer but also gives the possibility to communicate with servers for data retrieval and computational services.
The powerful plugin architecture is based on Eclipse [2], and results in a responsive, integrated user interface designed for simple and intuitive operations that at the same time is easy to extend with custom functionality.

There is much ongoing work with Bioclipse and new features are constantly added. Please visit the Bioclipse Wiki [3] in order to get the latest information regarding the development.

Bioclipse is available for download from Sourceforge [4].

[1] Bioclipse homepage: http://www.bioclipse.net
[2] Eclipse: http://www.eclipse.org
[3] Bioclipse Wiki: http://wiki.bioclipse.net
[4] Sourceforge project site: http://sourceforge.net/projects/bioclipse/

From johnson.biotech at gmail.com Sat Aug 12 13:17:57 2006
From: johnson.biotech at gmail.com (Seth Johnson)
Date: Sat, 12 Aug 2006 10:17:57 -0700 (PDT)
Subject: [Biojava-l] Parsing Genbank-sequences from NCBI
In-Reply-To:
References:
Message-ID: <5777810.post@talk.nabble.com>

More problems with parsing nucleotide sequences from NCBI. Apparently, there's an odd dbxref tag on some of the sequences submitted by ATCC that causes an exception. I've run into 2 so far, but I'm sure there are more:

AA343569.1
AA325485.1

Exceptions produced are as follows:
--------------------------------------------------------------
Trying to get: AA343569.1
org.biojava.bio.BioException: Failed to read Genbank sequence
        at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:157)
        at exonhit.parsers.EventParser.getSeqFromNCBI(EventParser.java:250)
        at exonhit.parsers.EventParser.insertRglrSE(EventParser.java:197)
        at exonhit.parsers.EventParser.createSpliceEvents(EventParser.java:105)
        at exonhit.parsers.EventParser.main(EventParser.java:310)
Caused by: org.biojava.bio.BioException: Could not read sequence
        at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:112)
        at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:153)
        ... 4 more
Caused by: org.biojava.bio.seq.io.ParseException: Bad dbxref found: ATCC (inhost):145151, accession:AA343569
        at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:438)
        at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:109)
        ... 5 more
Java Result: -1
=========================================================
Trying to get: AA325485.1
org.biojava.bio.BioException: Failed to read Genbank sequence
        at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:157)
        at exonhit.parsers.EventParser.getSeqFromNCBI(EventParser.java:250)
        at exonhit.parsers.EventParser.insertRglrSE(EventParser.java:197)
        at exonhit.parsers.EventParser.createSpliceEvents(EventParser.java:105)
        at exonhit.parsers.EventParser.main(EventParser.java:312)
Caused by: org.biojava.bio.BioException: Could not read sequence
        at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:112)
        at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:153)
        ... 4 more
Caused by: org.biojava.bio.seq.io.ParseException: Bad dbxref found: ATCC (inhost):125990, accession:AA325485
        at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:438)
        at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:109)
        ... 5 more
Java Result: -1
--
View this message in context: http://www.nabble.com/Parsing-Genbank-sequences-from-NCBI-tf2052235.html#a5777810
Sent from the BioJava forum at Nabble.com.

From johnson.biotech at gmail.com Mon Aug 14 10:47:26 2006
From: johnson.biotech at gmail.com (Seth Johnson)
Date: Mon, 14 Aug 2006 10:47:26 -0400
Subject: [Biojava-l] Parsing Genbank-sequences from NCBI
In-Reply-To: <44E0357E.3040008@ebi.ac.uk>
References: <5777810.post@talk.nabble.com> <44E0357E.3040008@ebi.ac.uk>
Message-ID:

Hi Richard,

Apparently there are more problems.
I get an exception while trying to retrieve BM353894.1
--------------------------------------------------------------
Trying to get: BM353894.1
org.biojava.bio.BioException: Failed to read Genbank sequence
        at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:157)
        at exonhit.parsers.EventParser.getSeqFromNCBI(EventParser.java:250)
        at exonhit.parsers.EventParser.insertRglrSE(EventParser.java:197)
        at exonhit.parsers.EventParser.createSpliceEvents(EventParser.java:105)
        at exonhit.parsers.EventParser.main(EventParser.java:312)
Caused by: org.biojava.bio.BioException: Could not read sequence
        at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:112)
        at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:153)
        ... 4 more
Caused by: org.biojava.bio.seq.io.ParseException
        at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:274)
        at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:109)
        ... 5 more
Java Result: -1
-------------------------------------------------------------

On 8/14/06, Richard Holland wrote:
> I've made a small change to the regex which matches these so that it
> will now accept spaces before the colon (previously, it didn't).
>
> Can you check out the latest from CVS and try again?
>
> cheers,
> Richard

--
Best Regards,

Seth Johnson
Senior Bioinformatics Associate

Ph: (202) 470-0900
Fx: (775) 251-0358

From mark.schreiber at novartis.com Mon Aug 14 21:18:08 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 15 Aug 2006 09:18:08 +0800
Subject: [Biojava-l] Alignment objects
Message-ID:

Hi Nathan -

You are on the right track, almost.

The alphabet of the alignment is PROTEIN x PROTEIN (possibly it is PROTEIN-TERM x PROTEIN-TERM).
PROTEIN-TERM is the same as protein but contains a * symbol to represent a translated stop codon. Useful if someone translates the wrong reading frame.

Thus the gap symbol of your alignment is gap x gap or [] [] as you found. The first symbol of your alignment is ([] Ala). The reason you find nothing with the gap symbol of the alignment is that there are no columns with only gaps. It is always gap x something or something x gap. To check for gaps in columns you could iterate like you have done with each individual sequence. In this case you would need to check for the gap symbol from the alphabet PROTEIN-TERM, or equivalently the gap symbol of the Alphabet of one of the SymbolLists from the alignment (specifically the one you are checking).

You could also make ambiguity symbols from the Alignment alphabet that contain gaps: ([] X) gap with anything, (X []) anything with gap, and ([] []) gap with gap, or the gap symbol of the Alignment. This approach is faster but for larger alignments requires more Symbols to check. It would be pretty easy to construct them recursively though.

Hope this helps,

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)
Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com
phone +65 6722 2973
fax +65 6722 2910

"Nathan S. Haigh"
08/14/2006 04:00 PM

To: mark.schreiber at novartis.com
cc:
Subject: Re: [Biojava-l] Alignment objects

Hi Mark - this doesn't seem to be working as I'd expected/hoped.
Let me just recap what I've got so far:

I create an alignment (for testing purposes) like this:

    String alnString = ">seq1\n" +
                       "----FGHIKLMNPQRST\n" +
                       ">seq2\n" +
                       "ACDEFGHIKLMNPQRST\n";
    BufferedReader br = new BufferedReader(new StringReader(alnString));
    FastaAlignmentFormat faf = new FastaAlignmentFormat();
    alignment = faf.read( br );

I loop over columns of the alignment and test if there are any gaps in the column. I have shown 2 alternative if statements which are supposed to test if a gap is present. One of these works (but is a bit of a hack) and the other (which seems like the correct way to do things) doesn't work:

    for (int col = 1; col <= alignment.length(); col++) {
        for (Iterator labels = alignment.getLabels().iterator(); labels.hasNext(); ) {
            Symbol sym = alignment.symbolAt(labels.next(), col);
            if (sym.getName().contains("[]")) { // this currently works
            // if (sym.equals(alignment.getAlphabet().getGapSymbol())) { // this doesn't work
                // add this col to a Location object
            }
        }
    }

If I do:

    System.out.println(alignment.getAlphabet().getGapSymbol());

I get:

    org.biojava.bio.symbol.SimpleBasisSymbol: ([] [])

I'm unsure exactly what I'm supposed to get here, but I suspect that the gap symbol isn't getting set correctly when I create the alignment. I really want to use the getGapSymbol method of the alignment, since the alignment a user may load in practice could be either nucleotide or amino acid.

Cheers
Nathan

mark.schreiber at novartis.com wrote:
> Sorry, that should be getGapSymbol().
>
> - Mark
>
>
> Nathan Haigh
> 08/11/2006 06:12 PM
> Please respond to n.haigh
>
> To: mark.schreiber at novartis.com
> cc:
> Subject: Re: [Biojava-l] Alignment objects
>
>
> mark.schreiber at novartis.com wrote:
>
>> Hi -
>>
>> There is a difference between the gap returned by
>> AlphabetManager.getGapSymbol and the gap returned by an
>> alphabet.getGapSymbol().
>> There are some very complex reasons for this which
>> could make up a large part of a thesis (literally, take a look at Matthew
>> Pocock's thesis some time). Simply speaking, dynamic programming and HMMs
>> wouldn't work without it.
>>
>> It becomes especially obvious when you have an alignment. The alphabet of
>> an alignment of 3 DNA sequences is DNAxDNAxDNA. Thus a gap from that
>> alphabet is really gap x gap x gap.
>>
>> Depending on what you are trying to do you would want to test for
>>
>> Symbol s == align.getAlphabet().getGap()
>>
>> or
>>
>> Symbol s == DNATools.getDNA().getGap().
>>
>> - Mark
>>
> Is the getGap method part of the Biojava-live API but not the 1.4 API?
>
> Cheers
> Nath
>
> [ Attachment ''N.HAIGH.VCF'' removed by Mark Schreiber ]

--
> A: Yes.
>> Q: Are you sure?
>>> A: Because it reverses the logical flow of conversation.
>>>> Q: Why is top posting frowned upon?

From mark.schreiber at novartis.com Tue Aug 15 06:02:13 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 15 Aug 2006 18:02:13 +0800
Subject: [Biojava-l] Alignment objects
Message-ID:

>OK, so this is where I'm going wrong - I thought a symbol was 1
>residue/amino acid/gap....why is this not the case? it seems rather
>counter intuitive to ask for the gap symbol for an alignment and be
>returned [] x n, where n=the number of sequences in the alignment.

It's all really down to inheritance and the object model. Alignments implement SymbolList, which makes sense really even though it is not intuitive. An alignment is really a sequence of Symbols that is N deep. Its alphabet therefore must be N deep. It is a compound Alphabet. This allows for enormous flexibility. It is perfectly possible to make an alignment which is ((DNAxDNAxDNA)xPROTEIN-TERM) which is an alignment of a codon alphabet to a protein.
You can even do it as ((DNAxDNAxDNA)x(DNAxDNAxDNA)xPROTEIN-TERM) which is two DNA sequences in codon alphabet against a single protein!! A gap symbol from any alphabet is really N gaps where N is the number of Alphabets (typically 1 as most Alphabets are not compound). Unfortunately this very flexible design is, as you say, not intuitive.

Why would we even want this (apart from the trivial DNA to protein example)? The main reason comes from HMMs. The result of a pairwise alignment using an HMM is a 3 part alignment of the query, the state path and the match. The query and match are probably from the same Alphabet but the state path is a SymbolList made from the Alphabet of states in the HMM. Similarly the alignment of a profile HMM to a protein is an alignment of the HMM alphabet to the protein Alphabet.

>I may be getting a little confused because I'm new to this, and i have
>in mind what i would like to do, think it should be straight forward but
>am finding it not so easy. How then, do I get the gap symbol e.g. []
>from an alignment (which may be DNA, RNA or protein) so I can check if
>any column in the alignment contains 1 or more gaps?

The reason why there cannot be a gap symbol that would work for all occasions is because you can ask multiple questions of gaps in an alignment. E.g., is position i composed only of gaps? Does sequence j at position i in the alignment contain a gap? Is there any gap in position i?

To answer the sequence j position i example the best (only really) way is to get the symbol for position i (with label j) and test if it is equal to j.getAlphabet().getGapSymbol(). To answer your question it is simply a matter of performing the same operation at position i for all SymbolLists at that position.
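
The per-sequence test described above can be sketched in plain Java. This is an illustration under stated assumptions, not BioJava code: rows are modelled as equal-length strings keyed by label, each row carries its own gap character (standing in for the gap symbol of that row's alphabet), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GapColumnSketch {
    /**
     * Returns the 1-based columns in which at least one row holds its own gap character.
     * rows maps label to aligned sequence (all the same length);
     * gaps maps label to the gap character used by that row.
     */
    static List<Integer> columnsWithAnyGap(Map<String, String> rows, Map<String, Character> gaps) {
        List<Integer> columns = new ArrayList<>();
        if (rows.isEmpty()) {
            return columns;
        }
        int length = rows.values().iterator().next().length();
        for (int col = 1; col <= length; col++) {
            for (Map.Entry<String, String> row : rows.entrySet()) {
                char gap = gaps.get(row.getKey());
                // Compare the residue with the gap of *this row*, not of the whole alignment.
                if (row.getValue().charAt(col - 1) == gap) {
                    columns.add(col);
                    break; // one gap is enough to flag the column
                }
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        Map<String, String> rows = new LinkedHashMap<>();
        rows.put("seq1", "----FGHIKLMNPQRST");
        rows.put("seq2", "ACDEFGHIKLMNPQRST");
        Map<String, Character> gaps = new LinkedHashMap<>();
        gaps.put("seq1", '-');
        gaps.put("seq2", '-');
        System.out.println(columnsWithAnyGap(rows, gaps)); // [1, 2, 3, 4]
    }
}
```

Because the gap is looked up per row, the same loop would still apply if the rows came from different alphabets, which is the case the compound alignment alphabet is designed to allow.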
>I am also thinking of using distributions for doing this job in a more >generic approach so that any symbol could be used, but i think i might >end up with the same problems as I have here, so I think I should try to >figure out this "simpler" problem first. There are methods in DistributionTools for calculating an array of Distributions for an Alignment. This is not efficient for your purposes because it also counts all the other residues and divides by the total (weighted by the background model). If you only want to find gaps that is a few more operations than you need. The approach above would be faster. Also this DistributionTools method will not work for alignments with more than one alphabet like the codon, protein one above (not a problem in your case but not generic). >It seems to me, that if i could have >alignment.getAlphabet().getGapSymbol() to return the same thing as would >proteinTools.getAlphabet().getGapSymbol() then my problem would be solved. This can't be done as you would have to assume that all the SymbolLists in the Alignment would have to be from the same Alphabet which is not required (and not even desirable, especially for HMMs). - Mark mark.schreiber at novartis.com wrote: > Hi Nathan - > > You are on the right track, almost. > > The alphabet of the alignment is PROTEIN x PROTEIN (possibly it is > PROTEIN-TERM x PROTEIN-TERM). PROTEIN-TERM is the same as protein but > contains a * symbol to represent a translated stop codon. Useful if > someone translates the wrong reading frame. > > > Thus the gap symbol of your alignment is gapxgap or [] [] as you found. > The first symbol of your alignment is ([] Ala). The reason you find > nothing with the gap symbol of the alignment is that there are no columns > with only gaps. It is always gap x something or something x gap. To check > for gaps in columns you could iterate like you have done with each > individual sequence. 
In this case you would need to check for the gap > symbol from the alphabet PROTEIN-TERM, or equivalently the gap symbol of > the Alphabet of one of the SymbolLists from the alignment (specifically > the one you are checking). > > You could also search make ambiguity symbols from the Alignment alphabet > that contain gaps ([] X) gap with anything (X []) anything with gap and > ([] []) gap with gap or the gap symbol of the Alignment. This approach is > faster but for larger alignments requires more Symbols to check. It would > be pretty easy to construct them recursively though. > > Hope this helps, > > - Mark > From mark.schreiber at novartis.com Tue Aug 15 21:07:19 2006 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Wed, 16 Aug 2006 09:07:19 +0800 Subject: [Biojava-l] Alignment objects Message-ID: You've got it! There is actually one other way ... If you have an alignment 3 deep for DNA you can create and test for the following symbols ([], N, N) (N, [], N) (N, N, []). These will find sites with exactly one gap. You can then make ([], [], N) and (N. [] []) and ([], N, []) and ([],[],[]) which is the alignment gap symbol. This means you can test each site of the alignment for each of the symbols. It requires some work up front to make the symbols (possibly recursion) but if you put them in a HashMap then you can simply iterate over each site in the Alignment and see if the DNAxDNAxDNA symbol returned is contained in the HashMap. This may actually be more efficient. - Mark "Nathan S. Haigh" 08/15/2006 08:17 PM Please respond to n.haigh To: mark.schreiber at novartis.com cc: Subject: Re: [Biojava-l] Alignment objects mark.schreiber at novartis.com wrote: >> OK, so this is where I'm going wrong - I thought a symbol was 1 >> residue/amino acid/gap....why is this not the case? it seems rather >> counter intuitive to ask for the gap symbol for an alignment and be >> returned [] x n, where n=the number of sequences in the alignment. 
>> >> > > It's all really down to inhertitence and the object model. Alignments > implement SymbolList. Which makes sense really even though it is not > intuitive. An alignment is really a sequence of Symbols that is N deep. > It's alphabet therefore must be N deep. It is a compound Alphabet. This > allows for enormous flexibility. It is perfectly possible to make an > alignment which is ((DNAxDNAxDNA)xPROTEIN-TERM) which is an alignment of a > codon alphabet to a protein. You can even do it as > ((DNAxDNAxDNA)x(DNAxDNAxDNA)xPROTEIN-TERM) which is two dna sequences in > codon alphabet against a single protein!! A gap symbol from any alphabet > is really N gaps where N is the number of Alphabets (typically 1 as most > Alphabets are not compound). Unfortunately this very flexible design is as > you say not intuitive. > > Why would we even want this (apart from the trivial DNA to protein > example). The main reason comes from HMMs. The result of a pairwise > alignment using an HMM is a 3 part alignment of the query the state path > and the match. The query and match are probably from the same Alphabet but > the state path is a SymbolList made from the Alphabet of states in the > HMM. Similarly the alignment of a profile HMM to a protein is an alignment > of the HMM alphabet to the protein Alphabet. > > >> I may be getting a little confused because I'm new to this, and i have >> in mind what i would like to do, think it should be straight forward but >> am finding it not so easy. How then, do I get the gap symbol e.g. [] >> > >from an alignment (which may be DNA, RNA or protein) so I can check if > >> any column in the alignment contains 1 or more gaps? >> > > The reason why there cannot be a gap symbol that would work for all > occasions is because you can ask mulitple questions of gaps in an > alignment. Eg, is position i composed only of gaps?, does sequence j at > position i in the alignment contain a gap?, is there any gap in position > i? 
>
> To answer the sequence j position i example the best (only really) way is
> to get the symbol for position i (with label j) and test if it is equal to
> j.getAlphabet().getGapSymbol(). To answer your question it is simply a
> matter of performing the same operation at position i for all SymbolLists
> at that position.
>
>
>> I am also thinking of using distributions for doing this job in a more
>> generic approach so that any symbol could be used, but i think i might
>> end up with the same problems as I have here, so I think I should try to
>> figure out this "simpler" problem first.
>>
>
> There are methods in DistributionTools for calculating an array of
> Distributions for an Alignment. This is not efficient for your purposes
> because it also counts all the other residues and divides by the total
> (weighted by the background model). If you only want to find gaps that is
> a few more operations than you need. The approach above would be faster.
> Also this DistributionTools method will not work for alignments with more
> than one alphabet like the codon, protein one above (not a problem in your
> case but not generic).
>
>
>> It seems to me, that if i could have
>> alignment.getAlphabet().getGapSymbol() to return the same thing as would
>> proteinTools.getAlphabet().getGapSymbol() then my problem would be
>> solved.
>
> This can't be done as you would have to assume that all the SymbolLists in
> the Alignment would have to be from the same Alphabet which is not
> required (and not even desirable, especially for HMMs).
>
> - Mark
>
>

Fantastic! Thanks for taking the time to explain this to me - it does make
much more sense to do things this way now i understand things a little
better. So, effectively, an alignment may be made from sequences that are
composed of symbols from different alphabets.
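
The per-sequence gap test Mark describes above (compare the symbol at position i with the gap symbol of that sequence's own alphabet) can be sketched in plain Java. This is only an illustration under stated assumptions: the `Row` class, the character-based gap markers and the `columnsWithGaps` method are hypothetical stand-ins invented for the sketch, not BioJava's Alignment or Alphabet API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GapColumns {

    /** Hypothetical stand-in for one labelled sequence and the gap symbol of its own alphabet. */
    static final class Row {
        final String label;
        final String symbols;
        final char gap;

        Row(String label, String symbols, char gap) {
            this.label = label;
            this.symbols = symbols;
            this.gap = gap;
        }
    }

    /** Returns the 1-based columns in which at least one row holds its own gap symbol. */
    static List<Integer> columnsWithGaps(List<Row> rows) {
        List<Integer> hits = new ArrayList<>();
        if (rows.isEmpty()) {
            return hits;
        }
        // Assumption: all rows have the same length, as in an alignment.
        int length = rows.get(0).symbols.length();
        for (int col = 1; col <= length; col++) {
            for (Row row : rows) {
                // Each row is tested against its own gap symbol, never a shared one.
                if (row.symbols.charAt(col - 1) == row.gap) {
                    hits.add(col);
                    break; // one gap is enough to flag the column
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("dna", "AC-GT", '-'),
                new Row("protein", "MK.L.", '.'));
        System.out.println(columnsWithGaps(rows)); // prints [3, 5]
    }
}
```

The two rows deliberately use different gap characters to mirror the point that the sequences in an alignment need not share an alphabet.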
The solution I have is that I loop over the positions in the alignment, and
then over the labels for that position, get the gap symbol for each label
using:

alignment.symbolListForLabel(label).getAlphabet().getGapSymbol()

and then test if this symbol is the same as the symbol at position i for
label j. :o)

I think the main reason for my confusion is that i'm trying to make a move
from Bioperl to Biojava, and Bioperl has an alphabet for the whole
alignment, and therefore has a restriction that an alignment can only be
comprised of sequences from the same alphabet.

Thanks very much for your help!
Nathan

From dms700 at gmail.com Wed Aug 16 09:04:13 2006
From: dms700 at gmail.com (dmitriy)
Date: Wed, 16 Aug 2006 09:04:13 -0400
Subject: [Biojava-l] ??? extracting introns sequences for transcripts using java API for ensembl ???
In-Reply-To: <299614de0608151646t57eb2653m519c83b049607afd@mail.gmail.com>
References: <299614de0608151646t57eb2653m519c83b049607afd@mail.gmail.com>
Message-ID: <299614de0608160604u3a2ec573nc006033899883240@mail.gmail.com>

Hi

I'm trying to use the ensembl java API to extract info on fivePrimeUTR,
exons, introns and threePrimeUTR for transcripts corresponding to a
particular NM_xxxxx ref seq.

Unfortunately it looks like I am using the API incorrectly to get intron
info. The following is the type of code I try to use to get intron info.

----------------------------
int exon1EndOffsetRelativeToGeneStart =
    ((Exon)transcript.getExons().get(0)).getLocation().getEnd()
    - gene.getLocation().getStart();
int exon2StartOffsetRelativeToGeneStart =
    ((Exon)transcript.getExons().get(1)).getLocation().getStart()
    - gene.getLocation().getStart();
String intron1 = gene.getSequence().getString().substring(
    exon1EndOffsetRelativeToGeneStart + 1,
    exon2StartOffsetRelativeToGeneStart);
------------------------------

This code works for "ENST00000275493" EGFR NM_005228.3, but does not work
for many, if not the vast majority, of genes.
I would greatly appreciate info on correct way of getting intron info for transcript. Thank you Dmitriy From k_stellar at msn.com Wed Aug 30 16:34:14 2006 From: k_stellar at msn.com (K.R. Carter) Date: Wed, 30 Aug 2006 16:34:14 -0400 Subject: [Biojava-l] SCF file wont load from URL Message-ID: <5d9376b50608301334m6e3288ffr1336cf68e1cbe08c@mail.gmail.com> Hello, I am trying to load an scf file by using the input stream from a url and it will not load. Does anyone know what might be happening? My program doesnt give an error, it just completely freezes. I am using the latest ( i think) version of SCF class. /** * A {@link org.biojava.bio.chromatogram.Chromatogram} as loaded from an * SCF v2 or v3 file. Also loads and exposes the SCF format's "private data" * and "comments" sections. The quality values from the SCF are stored as * additional sequences on the base call alignment. The labels are the * PROB_* constants in this class. * The values are {@link org.biojava.bio.symbol.IntegerAlphabet.IntegerSymbol} * objects in the range 0 to 255. * * * @author Rhett Sutphin (UI CBCB) */ any help would be greatly appreciated. Thanks! From mark.schreiber at novartis.com Thu Aug 31 03:01:32 2006 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Thu, 31 Aug 2006 15:01:32 +0800 Subject: [Biojava-l] SCF file wont load from URL Message-ID: Hi - This sounds very strange. Is there any stack trace? Could you possibly post the code that recreates the problem? - Mark "K.R. Carter" Sent by: biojava-l-bounces at lists.open-bio.org 08/31/2006 04:34 AM Please respond to kikia.reneese To: biojava-l at biojava.org cc: (bcc: Mark Schreiber/GP/Novartis) Subject: [Biojava-l] SCF file wont load from URL Hello, I am trying to load an scf file by using the input stream from a url and it will not load. Does anyone know what might be happening? My program doesnt give an error, it just completely freezes. I am using the latest ( i think) version of SCF class. 
/** * A {@link org.biojava.bio.chromatogram.Chromatogram} as loaded from an * SCF v2 or v3 file. Also loads and exposes the SCF format's "private data" * and "comments" sections. The quality values from the SCF are stored as * additional sequences on the base call alignment. The labels are the * PROB_* constants in this class. * The values are {@link org.biojava.bio.symbol.IntegerAlphabet.IntegerSymbol} * objects in the range 0 to 255. * * * @author Rhett Sutphin (UI CBCB) */ any help would be greatly appreciated. Thanks! _______________________________________________ Biojava-l mailing list - Biojava-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-l From ady at sanger.ac.uk Thu Aug 31 04:47:51 2006 From: ady at sanger.ac.uk (Andy Yates) Date: Thu, 31 Aug 2006 09:47:51 +0100 Subject: [Biojava-l] SCF file wont load from URL In-Reply-To: References: Message-ID: <44F6A237.1060302@sanger.ac.uk> That sounds like http proxy problems in my book. Try looking at this page: http://mindprod.com/jgloss/proxy.html The main thing to take home is try setting the system properties: proxySet=true http.proxyHost=proxyHostName http.proxyPort=proxyHostPort You can do this programatically using the System.setProperty() method or with -DpropertyName=propertyValue from the command line. Hope that helps, Andy Yates mark.schreiber at novartis.com wrote: > Hi - > > This sounds very strange. Is there any stack trace? Could you possibly > post the code that recreates the problem? > > - Mark > > > > > > "K.R. Carter" > Sent by: biojava-l-bounces at lists.open-bio.org > 08/31/2006 04:34 AM > Please respond to kikia.reneese > > > To: biojava-l at biojava.org > cc: (bcc: Mark Schreiber/GP/Novartis) > Subject: [Biojava-l] SCF file wont load from URL > > > Hello, > > I am trying to load an scf file by using the input stream from a url and > it > will not load. Does anyone know what might be happening? My program doesnt > give an error, it just completely freezes. 
I am using the latest ( i > think) > version of SCF class. > > > /** > * A {@link org.biojava.bio.chromatogram.Chromatogram} as loaded from an > * SCF v2 or v3 file. Also loads and exposes the SCF format's "private > data" > * and "comments" sections. The quality values from the SCF are stored as > * additional sequences on the base call alignment. The labels are the > * PROB_* constants in this class. > * The values are {@link > org.biojava.bio.symbol.IntegerAlphabet.IntegerSymbol} > * objects in the range 0 to 255. > * > * > * @author Rhett Sutphin (UI CBCB) > */ > > any help would be greatly appreciated. > > Thanks! > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From neil at cambia.org Tue Aug 1 07:41:48 2006 From: neil at cambia.org (Neil Bacon) Date: Tue, 01 Aug 2006 17:41:48 +1000 Subject: [Biojava-l] three-letter Protein alphabet names Message-ID: <44CF05BC.6020509@cambia.org> Hi, I'm looking at extending biojava sequence io to read sequences from patents (initially current US data formats, later perhaps older formats and other jurisdictions). Anyone done this already or interested? Protein data uses 3-letter codes. I found an old posting about 3-letter codes: [Biojava-dev] Protein alphabet names http://lists.open-bio.org/pipermail/biojava-dev/2002-October/000143.html >/ - Add an additional tokenization (probably called />/ "three-letter" />/ unless someone comes up with a better />/ suggestion) for people />/ who actually want 3-letter codes. / Did this happen (I can't find it)? I'll try extending WordTokenization to do this unless someone has already done it or can advise me better (I'm new here and advice would be very welcome). 
Cheers, Neil Bacon From richard.holland at ebi.ac.uk Tue Aug 1 08:15:52 2006 From: richard.holland at ebi.ac.uk (Richard Holland) Date: Tue, 01 Aug 2006 09:15:52 +0100 Subject: [Biojava-l] Problem Inserting Genbank File In-Reply-To: <44CF07D70200004E00003B08@gwc2cn06.its.mq.edu.au> References: <44CF07D70200004E00003B08@gwc2cn06.its.mq.edu.au> Message-ID: <1154420152.4151.25.camel@localhost.localdomain> Hello. I have a sneaking suspicion I know what is wrong, but I can't tell for sure without seeing your full source code. Could you post that? It'd certainly help a lot in trying to find the exact cause of the problem. cheers, Richard On Tue, 2006-08-01 at 07:50 +1000, Michael Joss wrote: > Hi all, > I am pretty new to this whole BioJava/BioJavaX thing. I thought > I would start with something reasonably basic. At least what I thought > would be. I wanted to open a Genbank file and save it into a BioSQL DB. > I have got the BioSQL Database all running and BioJava and BioJavaX seem > to be working ok ( I might have messed up some stuff along the way but > it does appear to be working). I can open the file and can convert it to > fasta etc .. all the code was found in various examples. When I use > session.saveOrUpdate: > > BufferedReader br = new BufferedReader(new > FileReader("C:/CODE/AY928791.GBANK")); > // a namespace to override that in the file > Namespace ns = RichObjectFactory.getDefaultNamespace(); > // we are reading DNA sequences > RichSequenceIterator seqs = > RichSequence.IOTools.readGenbankDNA(br,ns); > while (seqs.hasNext()) { > RichSequence rs = seqs.nextRichSequence(); > session.saveOrUpdate("Sequence",rs); > } > > I get an error saying it can't insert a taxon, the taxon and taxon_name > tables seem to be populated correctly and I am not sure how to work out > why its attempting to insert a taxon that is already there? I just don't > know enough about .. well anything.. but hibernate in particular. Any > ideas? 
> If you need anything else please let me know? The file is simply a > single genbank record with locus the same name as the file. I tried a > few others and got the same result. I am using the latest CVS of > BioJavaX and BioJava 1.4 and Hibernate 3.1. > > Cheers > > Joss > > 6860 [main] DEBUG org.hibernate.engine.Cascade - processing cascade > ACTION_SAVE_UPDATE for: Sequence > 6860 [main] DEBUG org.hibernate.engine.CascadingAction - cascading to > saveOrUpdate: Taxon > 6860 [main] DEBUG org.hibernate.event.def.AbstractSaveEventListener - > transient instance of: Taxon > 6860 [main] DEBUG > org.hibernate.event.def.DefaultSaveOrUpdateEventListener - saving > transient instance > 6860 [main] DEBUG org.hibernate.event.def.AbstractSaveEventListener - > saving [Taxon#] > 6860 [main] DEBUG org.hibernate.event.def.AbstractSaveEventListener - > executing insertions > 6860 [main] DEBUG org.hibernate.event.def.WrapVisitor - Wrapped > collection in role: Taxon.nameSet > 6875 [main] DEBUG org.hibernate.persister.entity.AbstractEntityPersister > - Inserting entity: Taxon (native id) > 6875 [main] DEBUG org.hibernate.jdbc.AbstractBatcher - about to open > PreparedStatement (open PreparedStatements: 0, globally: 0) > 6875 [main] DEBUG org.hibernate.SQL - insert into taxon (ncbi_taxon_id, > node_rank, genetic_code, mito_genetic_code, left_value, right_value, > parent_taxon_id) values (?, ?, ?, ?, ?, ?, ?) 
> 6875 [main] DEBUG org.hibernate.jdbc.AbstractBatcher - preparing > statement > 6891 [main] DEBUG org.hibernate.persister.entity.AbstractEntityPersister > - Dehydrating entity: [Taxon#] > 6891 [main] DEBUG org.hibernate.type.IntegerType - binding '36865' to > parameter: 1 > 6891 [main] DEBUG org.hibernate.type.StringType - binding null to > parameter: 2 > 6891 [main] DEBUG org.hibernate.type.IntegerType - binding null to > parameter: 3 > 6891 [main] DEBUG org.hibernate.type.IntegerType - binding null to > parameter: 4 > 6891 [main] DEBUG org.hibernate.type.IntegerType - binding null to > parameter: 5 > 6891 [main] DEBUG org.hibernate.type.IntegerType - binding null to > parameter: 6 > 6891 [main] DEBUG org.hibernate.type.IntegerType - binding null to > parameter: 7 > 6953 [main] DEBUG org.hibernate.jdbc.AbstractBatcher - about to close > PreparedStatement (open PreparedStatements: 1, globally: 1) > 6953 [main] DEBUG org.hibernate.jdbc.AbstractBatcher - closing > statement > 6953 [main] DEBUG org.hibernate.util.JDBCExceptionReporter - could not > insert: [Taxon] [insert into taxon (ncbi_taxon_id, node_rank, > genetic_code, mito_genetic_code, left_value, right_value, > parent_taxon_id) values (?, ?, ?, ?, ?, ?, ?)] > java.sql.SQLException: Duplicate entry '36865' for key 2 > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l -- Richard Holland (BioMart Team) EMBL-EBI Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD UNITED KINGDOM Tel: +44-(0)1223-494416 From richard.holland at ebi.ac.uk Tue Aug 1 08:19:36 2006 From: richard.holland at ebi.ac.uk (Richard Holland) Date: Tue, 01 Aug 2006 09:19:36 +0100 Subject: [Biojava-l] three-letter Protein alphabet names In-Reply-To: <44CF05BC.6020509@cambia.org> References: <44CF05BC.6020509@cambia.org> Message-ID: <1154420376.4151.28.camel@localhost.localdomain> I'm not sure, but it should simply be a matter of 
defining an alphabet where each symbol in the alphabet is a 3-letter combo. Then you can use the alphabet to tokenize the input string appropriately. Mark will know more about this than me. Mark - comments? cheers, Richard On Tue, 2006-08-01 at 17:41 +1000, Neil Bacon wrote: > Hi, > I'm looking at extending biojava sequence io to read sequences from > patents (initially current US data formats, later perhaps older formats > and other jurisdictions). > Anyone done this already or interested? > > Protein data uses 3-letter codes. I found an old posting about 3-letter > codes: > > [Biojava-dev] Protein alphabet names > http://lists.open-bio.org/pipermail/biojava-dev/2002-October/000143.html > > >/ - Add an additional tokenization (probably called > />/ "three-letter" > />/ unless someone comes up with a better > />/ suggestion) for people > />/ who actually want 3-letter codes. > / > > Did this happen (I can't find it)? > I'll try extending WordTokenization to do this unless someone has > already done it or can advise me better (I'm new here and advice would > be very welcome). > > Cheers, > Neil Bacon > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l -- Richard Holland (BioMart Team) EMBL-EBI Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD UNITED KINGDOM Tel: +44-(0)1223-494416 From mark.schreiber at novartis.com Tue Aug 1 08:20:30 2006 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Tue, 1 Aug 2006 16:20:30 +0800 Subject: [Biojava-l] three-letter Protein alphabet names Message-ID: You mean something like .. Pro Ala Tyr Then yes in this case you would want to make a WordTokenization. 
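
As a rough illustration of the idea, the sketch below turns whitespace-separated three-letter codes such as "Pro Ala Tyr" into one-letter codes using plain Java only. It is not BioJava's WordTokenization: the `ThreeLetterCodes` class and `toOneLetter` method are hypothetical names, the whitespace-separated input is an assumption about the patent listings, and only the 20 standard residues are mapped.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class ThreeLetterCodes {

    // Three-letter to one-letter codes for the 20 standard amino acids.
    private static final Map<String, Character> CODES = new HashMap<>();

    static {
        String[][] table = {
            {"Ala", "A"}, {"Arg", "R"}, {"Asn", "N"}, {"Asp", "D"}, {"Cys", "C"},
            {"Gln", "Q"}, {"Glu", "E"}, {"Gly", "G"}, {"His", "H"}, {"Ile", "I"},
            {"Leu", "L"}, {"Lys", "K"}, {"Met", "M"}, {"Phe", "F"}, {"Pro", "P"},
            {"Ser", "S"}, {"Thr", "T"}, {"Trp", "W"}, {"Tyr", "Y"}, {"Val", "V"}
        };
        for (String[] entry : table) {
            CODES.put(entry[0].toUpperCase(Locale.ROOT), entry[1].charAt(0));
        }
    }

    /** Converts whitespace-separated three-letter codes to a one-letter string. */
    static String toOneLetter(String text) {
        StringBuilder out = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // blank input yields an empty result
            }
            Character code = CODES.get(word.toUpperCase(Locale.ROOT));
            if (code == null) {
                throw new IllegalArgumentException("Unknown three-letter code: " + word);
            }
            out.append(code);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toOneLetter("Pro Ala Tyr")); // prints PAY
    }
}
```

Lookups are case-insensitive so that "PRO" and "pro" are accepted as well; unknown words raise an exception rather than being silently skipped.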
Best regards, - Mark Neil Bacon Sent by: biojava-l-bounces at lists.open-bio.org 08/01/2006 03:41 PM To: biojava-l at lists.open-bio.org cc: (bcc: Mark Schreiber/GP/Novartis) Subject: [Biojava-l] three-letter Protein alphabet names Hi, I'm looking at extending biojava sequence io to read sequences from patents (initially current US data formats, later perhaps older formats and other jurisdictions). Anyone done this already or interested? Protein data uses 3-letter codes. I found an old posting about 3-letter codes: [Biojava-dev] Protein alphabet names http://lists.open-bio.org/pipermail/biojava-dev/2002-October/000143.html >/ - Add an additional tokenization (probably called />/ "three-letter" />/ unless someone comes up with a better />/ suggestion) for people />/ who actually want 3-letter codes. / Did this happen (I can't find it)? I'll try extending WordTokenization to do this unless someone has already done it or can advise me better (I'm new here and advice would be very welcome). Cheers, Neil Bacon _______________________________________________ Biojava-l mailing list - Biojava-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-l From mjoss at bio.mq.edu.au Tue Aug 1 12:41:03 2006 From: mjoss at bio.mq.edu.au (Michael Joss) Date: Tue, 01 Aug 2006 22:41:03 +1000 Subject: [Biojava-l] Problem Inserting Genbank File Message-ID: <44CFD8800200004E00003B5A@gwc2cn06.its.mq.edu.au> Hi Richard and fellow listers, Will teach me to be all gung-ho. Asking me to post the entire source code made me actually look at what I had written. I realised I had commented out a line that had been giving me trouble... I imagine it is actually useful for something.. maybe connecting BiojavaX to the DB ;) Thats what you get for patching a whole bunch of examples together when you have no idea what you are doing I guess. I am now getting an error with the previously commented code that doesn't make a lot of sense to me?? 
It says:

Exception in thread "main" java.lang.IllegalArgumentException: Parameter must be a org.hibernate.Session object
        at org.biojavax.bio.db.biosql.BioSQLRichObjectBuilder.<init>(BioSQLRichObjectBuilder.java:68)
        at org.biojavax.RichObjectFactory.connectToBioSQL(RichObjectFactory.java:221)
        at biojavaxtest.Main.main(Main.java:45)

but sess definitely is a org.hibernate.Session object.. isn't it?

Sorry if I am being a pain, it's kinda tough learning Java and Biojava and
hibernate at the same time. I am sure once I get the ball rolling I will be
ok.

Cheers

Joss

Source code
|
|
V

package biojavaxtest;

import java.io.BufferedReader;
import java.io.FileReader;
import org.biojavax.Namespace;
import org.biojavax.RichObjectFactory;
import org.biojavax.bio.db.RichSequenceDB;
import org.biojavax.bio.db.biosql.BioSQLRichSequenceDB;
import org.biojavax.bio.seq.RichSequence;
import org.biojavax.bio.seq.RichSequenceIterator;
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;
import org.hibernate.cfg.Configuration;

/**
 *
 * @author Joss
 */
public class Main {

    /** Creates a new instance of Main */
    public Main() {
    }

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        org.apache.log4j.BasicConfigurator.configure();
        SessionFactory sessionFactory = new Configuration().configure().buildSessionFactory();
        // open the session
        Session sess = sessionFactory.openSession();
        // connect it to BioJavaX
        RichObjectFactory.connectToBioSQL(sess); //### Was commented out in previous post ###
        Transaction tx = sess.beginTransaction();
        try {
            // create the RichSequenceDB wrapper around the Hibernate session
            RichSequenceDB db = new BioSQLRichSequenceDB(sess);
            RichSequence seq1 = db.getRichSequence("AE012130"); // load the sequence where name='AE012130'
            BufferedReader br = new BufferedReader(new FileReader("C:/CODE/AY928791.GBANK"));
            // a namespace to override that in the file
            Namespace ns = RichObjectFactory.getDefaultNamespace();
            // we are reading DNA sequences
            RichSequenceIterator seqs = RichSequence.IOTools.readGenbankDNA(br,ns);
            while (seqs.hasNext()) {
                RichSequence seq2 = seqs.nextRichSequence();
                db.addRichSequence(seq2);
            } // add it to the database
            tx.commit();
            System.out.println("Changes committed.");
        } catch (Exception e) {
            tx.rollback();
            System.out.println("Changes rolled back.");
            e.printStackTrace();
        }
        sess.close(); // disconnect from the database
    }
}

From mjoss at bio.mq.edu.au Wed Aug 2 01:07:11 2006
From: mjoss at bio.mq.edu.au (Michael Joss)
Date: Wed, 02 Aug 2006 11:07:11 +1000
Subject: [Biojava-l] Problem Inserting Genbank File
Message-ID: <44D0875F0200004E00003B7B@gwc2cn06.its.mq.edu.au>

Hi Guys,

It's OK, you can officially ignore me now. I just rebuilt my BioJavaX.jar
file from the latest CVS this morning and it is all working fine. No idea
what the difference was.. but well.. I guess it really doesn't matter.
Thanks for bearing with me while I work through my (apparently
non-existent) issues.

Cheers

Joss

From pmaes3 at uz.kuleuven.ac.be Thu Aug 3 21:41:44 2006
From: pmaes3 at uz.kuleuven.ac.be (pmaes3)
Date: Thu, 3 Aug 2006 14:41:44 -0700 (PDT)
Subject: [Biojava-l] The Java sandbox and BioJava
In-Reply-To: <44980650.9040007@andrew.cmu.edu>
References: <44980650.9040007@andrew.cmu.edu>
Message-ID: <5641280.post@talk.nabble.com>

Hi,

I have the same problem, which I was able to solve exactly the way
described here. Unfortunately, when I try my applet online I still get the
same error message, although on my computer the applet works perfectly.

Piet.
--
View this message in context: http://www.nabble.com/The-Java-sandbox-and-BioJava-tf1818162.html#a5641280
Sent from the BioJava forum at Nabble.com.

From mjoss at bio.mq.edu.au Fri Aug 4 01:16:39 2006
From: mjoss at bio.mq.edu.au (Michael Joss)
Date: Fri, 04 Aug 2006 11:16:39 +1000
Subject: [Biojava-l] Managing Ranks?
Message-ID: <44D32C980200004E00003C40@gwc2cn06.its.mq.edu.au> Hi all, Perhaps this is more of a Java/Programming/BioSQL question rather than a BioJavaX question in particular but I was wondering what people thought the best way to manage Ranks in Features and Locations was? Seems to me that iterating through every feature and location and adjusting their ranks everytime I add or remove a feature is a risky business. I can just see it getting out of hand and messy very quickly. Also I notice that the Genbank parsing etc does not set a rank initially for features or locations. Is there a reason for this since the file clearly places the features in rank order? I guess to explain a little more fully, I come from a database background, I have some experience with OO programming but not a whole heap so the object models are a little alien to me. Is there someway I could order a featureset by the minposition of each feaures location forinstance? Is the FeatureSet actually ordered by a whole number of things, or just Feature.rank? I can see the importance of rank for producing an arbritary order but if I wanted to keep things more rigid how would I go about adjusting the order of the set returned from RichSequence.getFeatureSet() to be ordered by feature.location.minposition? Am I thinking about this all the wrong way, I just keep wanting to query the database and get a nice ordered result set rather than working with objects as the information stored in them seems to be so arbritarily ordered. Admittedly most of this is unimportant except for display purposes but still it seems like a mental leap I am just not making. Thanks in advance for any help you can offer. Cheers Joss As an aside some explanation of what brings me here. Essentially what I am trying to do overall is use BioJavaX to make a JSP front end for a BioSQL Database. 
Just be able to add sequences, and annotations, and search them via a web interface, then visualise the data, sequence data, protein translations etc. Will be adding a whole lot more later, but thats the starting plan. I had done all this for a database structure I had come up with but decided that the framework set up in BioSQL and The tools in BioJava would make my info a whole lot more transferrable and maintainable in the long run. Hence here I am. From johnson.biotech at gmail.com Fri Aug 4 15:59:56 2006 From: johnson.biotech at gmail.com (Seth Johnson) Date: Fri, 4 Aug 2006 11:59:56 -0400 Subject: [Biojava-l] Parsing Genbank-sequences from NCBI Message-ID: Hi Richard, I'm back for more help. I've just completed getting and parsing the entire human genome RefSeq list from NCBI. I'm not going to post my source code since the invoking code has been described by the gentlemen who started the original thread last month. The result of the parsing is such that out of ~28K sequences, 13 produced the exceptions below. I've used the latest biojava code from CVS, not quite sure what the problem is on these 13. Trying to get: NM_006145 org.biojava.bio.BioException: Failed to read Genbank sequence at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:157) at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195) Caused by: org.biojava.bio.BioException: Could not read sequence at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:112) at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:153) ... 
3 more Caused by: java.lang.IllegalArgumentException: Could not find constructor for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class java.lang.String,null) at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:78) at org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java :104) at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( GenbankFormat.java:387) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:109) ... 4 more Caused by: java.lang.NullPointerException at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:59) ... 7 more org.biojava.bio.BioException: Failed to read Genbank sequence at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:157) -------------------------------------------------------------------------------- Trying to get: NM_000602 at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195) Caused by: org.biojava.bio.BioException: Could not read sequence at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:112) at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:153) ... 3 more Caused by: java.lang.IllegalArgumentException: Could not find constructor for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class java.lang.String,null) at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:78) at org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java :104) at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( GenbankFormat.java:387) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:109) ... 
4 more Caused by: java.lang.NullPointerException at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:59) ... 7 more ------------------------------------------------------------------------------- Trying to get: NM_006226 org.biojava.bio.BioException: Failed to read Genbank sequence at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:157) at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195) Caused by: org.biojava.bio.BioException: Could not read sequence at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:112) at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:153) ... 3 more Caused by: java.lang.IllegalArgumentException: Could not find constructor for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class java.lang.String,null) at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:78) at org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java :104) at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( GenbankFormat.java:387) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:109) ... 4 more Caused by: java.lang.NullPointerException at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:59) ... 
7 more ---------------------------------------------------------------------------------- Trying to get: NM_000371 org.biojava.bio.BioException: Failed to read Genbank sequence at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:157) at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195) Caused by: org.biojava.bio.BioException: Could not read sequence at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:112) at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:153) ... 3 more Caused by: java.lang.IllegalArgumentException: Could not find constructor for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class java.lang.String,null) at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:78) at org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java :104) at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( GenbankFormat.java:387) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:109) ... 4 more Caused by: java.lang.NullPointerException at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:59) ... 
7 more -------------------------------------------------------------------------------- Trying to get: NM_019072 org.biojava.bio.BioException: Failed to read Genbank sequence at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:157) at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195) Caused by: org.biojava.bio.BioException: Could not read sequence at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:112) at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:153) ... 3 more Caused by: java.lang.IllegalArgumentException: Could not find constructor for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class java.lang.String,null) at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:78) at org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java :104) at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( GenbankFormat.java:387) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:109) ... 4 more Caused by: java.lang.NullPointerException at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:59) ... 
7 more -------------------------------------------------------------------------------- Trying to get: NM_017884 org.biojava.bio.BioException: Failed to read Genbank sequence at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:157) at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195) Caused by: org.biojava.bio.BioException: Could not read sequence at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:112) at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:153) ... 3 more Caused by: java.lang.IllegalArgumentException: Could not find constructor for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class java.lang.String,null) at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:78) at org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java :104) at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( GenbankFormat.java:387) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:109) ... 4 more Caused by: java.lang.NullPointerException at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:59) ... 
7 more -------------------------------------------------------------------------------- Trying to get: NM_022107 org.biojava.bio.BioException: Failed to read Genbank sequence at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:157) at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195) Caused by: org.biojava.bio.BioException: Could not read sequence at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:112) at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:153) ... 3 more Caused by: java.lang.IllegalArgumentException: Could not find constructor for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class java.lang.String,null) at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:78) at org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java :104) at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( GenbankFormat.java:387) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:109) ... 4 more Caused by: java.lang.NullPointerException at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:59) ... 
7 more --------------------------------------------------------------------------------- Trying to get: NM_031418 org.biojava.bio.BioException: Failed to read Genbank sequence at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:157) at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195) Caused by: org.biojava.bio.BioException: Could not read sequence at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:112) at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:153) ... 3 more Caused by: java.lang.IllegalArgumentException: Could not find constructor for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class java.lang.String,null) at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:78) at org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java :104) at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( GenbankFormat.java:387) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:109) ... 4 more Caused by: java.lang.NullPointerException at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:59) ... 
7 more --------------------------------------------------------------------------------------- Trying to get: NM_030809 org.biojava.bio.BioException: Failed to read Genbank sequence at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:157) at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195) Caused by: org.biojava.bio.BioException: Could not read sequence at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:112) at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:153) ... 3 more Caused by: java.lang.IllegalArgumentException: Could not find constructor for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class java.lang.String,null) at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:78) at org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java :104) at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( GenbankFormat.java:387) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:109) ... 4 more Caused by: java.lang.NullPointerException at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:59) ... 
7 more ------------------------------------------------------------------------------------- Trying to get: NM_032731 org.biojava.bio.BioException: Failed to read Genbank sequence at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:157) at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195) Caused by: org.biojava.bio.BioException: Could not read sequence at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:112) at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:153) ... 3 more Caused by: java.lang.IllegalArgumentException: Could not find constructor for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class java.lang.String,null) at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:78) at org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java :104) at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( GenbankFormat.java:387) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:109) ... 4 more Caused by: java.lang.NullPointerException at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:59) ... 
7 more ------------------------------------------------------------------------------------ Trying to get: NM_001029888 org.biojava.bio.BioException: Failed to read Genbank sequence at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:157) at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195) Caused by: org.biojava.bio.BioException: Could not read sequence at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:112) at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:153) ... 3 more Caused by: java.lang.IllegalArgumentException: Could not find constructor for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class java.lang.String,null) at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:78) at org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java :104) at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( GenbankFormat.java:387) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:109) ... 4 more Caused by: java.lang.NullPointerException at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:59) ... 
7 more ------------------------------------------------------------------------------------ Trying to get: NM_001029869 org.biojava.bio.BioException: Failed to read Genbank sequence at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:157) at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195) Caused by: org.biojava.bio.BioException: Could not read sequence at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:112) at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:153) ... 3 more Caused by: java.lang.IllegalArgumentException: Could not find constructor for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class java.lang.String,null) at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:78) at org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java :104) at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( GenbankFormat.java:387) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:109) ... 4 more Caused by: java.lang.NullPointerException at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:59) ... 
7 more ------------------------------------------------------------------------------------ Trying to get: NM_182572 org.biojava.bio.BioException: Failed to read Genbank sequence at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:157) at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195) Caused by: org.biojava.bio.BioException: Could not read sequence at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:112) at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:153) ... 3 more Caused by: java.lang.IllegalArgumentException: Could not find constructor for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class java.lang.String,null) at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:78) at org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java :104) at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( GenbankFormat.java:387) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:109) ... 4 more Caused by: java.lang.NullPointerException at org.biojavax.SimpleRichObjectBuilder.buildObject( SimpleRichObjectBuilder.java:59) ... 7 more -- Best Regards, Seth Johnson Senior Bioinformatics Associate From david at autohandle.com Fri Aug 4 18:57:04 2006 From: david at autohandle.com (David Scott) Date: Fri, 04 Aug 2006 11:57:04 -0700 Subject: [Biojava-l] Parsing Genbank-sequences from NCBI In-Reply-To: References: Message-ID: <44D39880.9020202@autohandle.com> hi seth- the 3rd argument to SimpleDocRef constructor is the REFERENCE title - which appears to be null in the trace - which happens, but rarely. 
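For readers following the trace: the IllegalArgumentException names the argument list (ArrayList, String, null), and the nested NullPointerException is consistent with a builder that derives constructor parameter types by calling getClass() on each argument. The following is a minimal, self-contained sketch of that failure mode and of one possible guard. DocRefSketch and NullSafeBuilderSketch are hypothetical stand-ins invented for illustration; this is not the BioJava source for SimpleDocRef or SimpleRichObjectBuilder, and trimming trailing nulls is only an assumption about how such a lookup could be made tolerant.

```java
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a reference object whose title is optional.
class DocRefSketch {
    final List<String> authors;
    final String location;
    final String title; // null when the REFERENCE block has no TITLE line

    public DocRefSketch(List<String> authors, String location) {
        this(authors, location, null);
    }

    public DocRefSketch(List<String> authors, String location, String title) {
        this.authors = authors;
        this.location = location;
        this.title = title;
    }
}

// Hypothetical reflective builder; an illustration, not the BioJava implementation.
class NullSafeBuilderSketch {

    // Naive lookup: parameter types are taken from the runtime arguments,
    // so a null argument raises NullPointerException at getClass().
    static Object buildNaive(Class<?> clazz, Object... args) {
        Class<?>[] types = new Class<?>[args.length];
        for (int i = 0; i < args.length; i++) {
            types[i] = args[i].getClass();
        }
        return instantiate(clazz, types, args);
    }

    // One possible guard: drop trailing nulls so that a shorter constructor
    // is selected. Nulls in the middle of the list are not handled here.
    static Object buildNullSafe(Class<?> clazz, Object... args) {
        int n = args.length;
        while (n > 0 && args[n - 1] == null) {
            n--;
        }
        return buildNaive(clazz, Arrays.copyOf(args, n));
    }

    // Picks the first constructor whose parameters accept the given types.
    private static Object instantiate(Class<?> clazz, Class<?>[] types, Object[] args) {
        for (Constructor<?> c : clazz.getDeclaredConstructors()) {
            Class<?>[] params = c.getParameterTypes();
            if (params.length != types.length) {
                continue;
            }
            boolean matches = true;
            for (int i = 0; i < params.length; i++) {
                if (!params[i].isAssignableFrom(types[i])) {
                    matches = false;
                    break;
                }
            }
            if (matches) {
                try {
                    return c.newInstance(args);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Constructor call failed", e);
                }
            }
        }
        throw new IllegalArgumentException("Could not find constructor for " + clazz);
    }

    public static void main(String[] argv) {
        List<String> authors = new ArrayList<>(Arrays.asList("A. Author"));
        try {
            buildNaive(DocRefSketch.class, authors, "Journal 1:1-2", null);
        } catch (NullPointerException e) {
            System.out.println("naive lookup failed on the null title");
        }
        DocRefSketch ref =
            (DocRefSketch) buildNullSafe(DocRefSketch.class, authors, "Journal 1:1-2", null);
        System.out.println("title is null: " + (ref.title == null));
    }
}
```

The guard leaves the three-argument path untouched when a title is present and only falls back to the shorter constructor when the last argument is missing.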
i had the exact same problem recently - and richard put in code to check for a null title and then call a special 2 argument constructor for SimpleDocRef - any chance you don't have that code checked out?

best-
david

Seth Johnson wrote:
> Hi Richard,
>
> I'm back for more help. I've just completed getting and parsing the entire
> human genome RefSeq list from NCBI. I'm not going to post my source code
> since the invoking code has been described by the gentlemen who started the
> original thread last month. The result of the parsing is such that out of
> ~28K sequences, 13 produced the exceptions below. I've used the latest
> biojava code from CVS, not quite sure what the problem is on these 13.
>
> Trying to get: NM_006145
> org.biojava.bio.BioException: Failed to read Genbank sequence
>     at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:157)
>     at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162)
>     at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146)
>     at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195)
> Caused by: org.biojava.bio.BioException: Could not read sequence
>     at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:112)
>     at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:153)
>     ... 3 more
> Caused by: java.lang.IllegalArgumentException: Could not find constructor for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class java.lang.String,null)
>     at org.biojavax.SimpleRichObjectBuilder.buildObject(SimpleRichObjectBuilder.java:78)
>     at org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java:104)
>     at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:387)
>     at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:109)
>     ... 4 more
> Caused by: java.lang.NullPointerException
>     at org.biojavax.SimpleRichObjectBuilder.buildObject(SimpleRichObjectBuilder.java:59)
>     ... 7 more

From johnson.biotech at gmail.com Fri Aug 4 20:05:57 2006
From: johnson.biotech at gmail.com (Seth Johnson)
Date: Fri, 4 Aug 2006 16:05:57 -0400
Subject: [Biojava-l] Parsing Genbank-sequences from NCBI
In-Reply-To: <44D39880.9020202@autohandle.com>
References: <44D39880.9020202@autohandle.com>
Message-ID:

Hi David,

I compiled my biojava.jar on 7/28 and, just to make sure, I've updated my biojava-live cvs just now and it doesn't look like there were any changes made since that date.
SimpleDocRef.java was last updated on 7/18, and the version that I have does include the second constructor with 2 parameters. The failure does seem to be related to a null TITLE, since all of these entries are missing one, but I was also under the impression that the null TITLE issue had been fixed. That's what is so puzzling about this. Below is the list of the problem accession IDs if you'd like to replicate the exception:

NM_006145
NM_000602
NM_006226
NM_000371
NM_019072
NM_017884
NM_022107
NM_031418
NM_030809
NM_032731
NM_001029888
NM_001029869
NM_182572

On 8/4/06, David Scott wrote:
>
> hi seth-
>
> the 3rd argument to SimpleDocRef constructor is the REFERENCE title -
> which appears to be null in the trace - which happens, but rarely. i had
> the exact same problem recently - and richard put in code to check for a
> null title and then call a special 2 argument constructor for
> SimpleDocRef - any chance you don't have that code checked out?
>
> best-
> david
>
> Seth Johnson wrote:
> > Hi Richard,
> >
> > I'm back for more help. I've just completed getting and parsing the entire
> > human genome RefSeq list from NCBI. I'm not going to post my source code
> > since the invoking code has been described by the gentlemen who started the
> > original thread last month. The result of the parsing is such that out of
> > ~28K sequences, 13 produced the exceptions below. I've used the latest
> > biojava code from CVS, not quite sure what the problem is on these 13.
> > > > > > > > Trying to get: NM_006145 > > > > org.biojava.bio.BioException: Failed to read Genbank sequence > > > > at > org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > > GenbankRichSequenceDB.java:157) > > > > at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java > :162) > > > > at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > > > > at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195) > > > > Caused by: org.biojava.bio.BioException: Could not read sequence > > > > at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > > RichStreamReader.java:112) > > > > at > org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > > GenbankRichSequenceDB.java:153) > > > > ... 3 more > > > > Caused by: java.lang.IllegalArgumentException: Could not find > constructor > > for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class > > java.lang.String,null) > > > > at org.biojavax.SimpleRichObjectBuilder.buildObject( > > SimpleRichObjectBuilder.java:78) > > > > at org.biojavax.RichObjectFactory.getObject( > RichObjectFactory.java > > :104) > > > > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( > > GenbankFormat.java:387) > > > > at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > > RichStreamReader.java:109) > > > > ... 4 more > > > > Caused by: java.lang.NullPointerException > > > > at org.biojavax.SimpleRichObjectBuilder.buildObject( > > SimpleRichObjectBuilder.java:59) > > > > ... 
7 more > > > > > > > > > > > > -- Best Regards, Seth Johnson Senior Bioinformatics Associate Ph: (202) 470-0900 Fx: (775) 251-0358 From david at autohandle.com Fri Aug 4 20:52:47 2006 From: david at autohandle.com (David Scott) Date: Fri, 04 Aug 2006 13:52:47 -0700 Subject: [Biojava-l] Parsing Genbank-sequences from NCBI In-Reply-To: References: <44D39880.9020202@autohandle.com> Message-ID: <44D3B39F.6000508@autohandle.com> hi seth- you are right - the fix for null titles was put in BioSQLRichObjectBuilder on 7.19 - i guess there must be a bug in the fix - i offered to check the fix - so i'll have to hang my head in shame. i'm looking at the code now - not the 1st time for me and i don't see the problem. i'll try one of your test cases - but i can't get to it until tomorrow. don't tell anyone i messed up- david Seth Johnson wrote: > Hi David, > > I compiled my biojava.jar on 7/28 and, just to make sure, I've updated > my biojava-live cvs just now and it doesn't look like there were any > changes made since that date. The SimpleDocRef.java was last updated > on 7/18 and the version that I have does include the second > constructor with 2 parameters. It seems to be related to null TITLE > since all of the entries are missing it, but I was also under the > impression that null TITLE issue was fixed. That's what is so puzzling > about this. Below is the list of the problem accession IDs if you'd > like to replicate the exception: > > NM_006145 > NM_000602 > NM_006226 > NM_000371 > NM_019072 > NM_017884 > NM_022107 > NM_031418 > NM_030809 > NM_032731 > NM_001029888 > NM_001029869 > NM_182572 > > On 8/4/06, *David Scott* > wrote: > > hi seth- > > the 3rd argument to SimpleDocRef constructor is the REFERENCE title - > which appears to be null in the trace - which happens, but rarely. 
> i had > the exact same problem recently - and richard put in code to check > for a > null title and then call a special 2 argument constructor for > SimpleDocRef - any chance you don't have that code checked out? > > best- > david > > Seth Johnson wrote: > > Hi Richard, > > > > > > I'm back for more help. I've just completed getting and parsing > the entire > > human genome RefSeq list from NCBI. I'm not going to post my > source code > > since the invoking code has been described by the gentlemen who > started the > > original thread last month. The result of the parsing is such > that out of > > ~28K sequences, 13 produced the exceptions below. I've used the > latest > > biojava code from CVS, not quite sure what the problem is on > these 13. > > > > > > > > Trying to get: NM_006145 > > > > org.biojava.bio.BioException: Failed to read Genbank sequence > > > > at > org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence ( > > GenbankRichSequenceDB.java:157) > > > > at > exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) > > > > at > exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > > > > at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195) > > > > Caused by: org.biojava.bio.BioException: Could not read sequence > > > > at > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence ( > > RichStreamReader.java:112) > > > > at > org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > > GenbankRichSequenceDB.java:153) > > > > ... 
3 more > > > > Caused by: java.lang.IllegalArgumentException : Could not find > constructor > > for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class > > java.lang.String,null) > > > > at org.biojavax.SimpleRichObjectBuilder.buildObject( > > SimpleRichObjectBuilder.java:78) > > > > at > org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java > > :104) > > > > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( > > GenbankFormat.java:387) > > > > at > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > > RichStreamReader.java:109) > > > > ... 4 more > > > > Caused by: java.lang.NullPointerException > > > > at org.biojavax.SimpleRichObjectBuilder.buildObject( > > SimpleRichObjectBuilder.java:59) > > > > ... 7 more > > > > > ------------------------------------------------------------------------------------- > > > > > Trying to get: NM_032731 > > > > org.biojava.bio.BioException: Failed to read Genbank sequence > > > > at > org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > > GenbankRichSequenceDB.java :157) > > > > at > exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) > > > > at > exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > > > > at exonhit.parsers.RefSeqParser.main (RefSeqParser.java:195) > > > > Caused by: org.biojava.bio.BioException: Could not read sequence > > > > at > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > > RichStreamReader.java:112) > > > > at > org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > > GenbankRichSequenceDB.java:153) > > > > ... 
3 more > > > > Caused by: java.lang.IllegalArgumentException : Could not find > constructor > > for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class > > java.lang.String,null) > > > > at org.biojavax.SimpleRichObjectBuilder.buildObject( > > SimpleRichObjectBuilder.java:78) > > > > at > org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java > > :104) > > > > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( > > GenbankFormat.java:387) > > > > at > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > > RichStreamReader.java:109) > > > > ... 4 more > > > > Caused by: java.lang.NullPointerException > > > > at org.biojavax.SimpleRichObjectBuilder.buildObject( > > SimpleRichObjectBuilder.java:59) > > > > ... 7 more > > > > > ------------------------------------------------------------------------------------ > > > > > Trying to get: NM_001029888 > > > > org.biojava.bio.BioException: Failed to read Genbank sequence > > > > at > org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > > GenbankRichSequenceDB.java:157) > > > > at > exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) > > > > at > exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > > > > at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195) > > > > Caused by: org.biojava.bio.BioException: Could not read sequence > > > > at > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence ( > > RichStreamReader.java:112) > > > > at > org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > > GenbankRichSequenceDB.java:153) > > > > ... 
3 more > > > > Caused by: java.lang.IllegalArgumentException: Could not find > constructor > > for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class > > java.lang.String,null) > > > > at org.biojavax.SimpleRichObjectBuilder.buildObject ( > > SimpleRichObjectBuilder.java:78) > > > > at > org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java > > :104) > > > > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence ( > > GenbankFormat.java:387) > > > > at > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > > RichStreamReader.java:109) > > > > ... 4 more > > > > Caused by: java.lang.NullPointerException > > > > at org.biojavax.SimpleRichObjectBuilder.buildObject( > > SimpleRichObjectBuilder.java:59) > > > > ... 7 more > > > > > ------------------------------------------------------------------------------------ > > > > > Trying to get: NM_001029869 > > > > org.biojava.bio.BioException: Failed to read Genbank sequence > > > > at > org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > > GenbankRichSequenceDB.java:157) > > > > at > exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) > > > > at > exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > > > > at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195) > > > > Caused by: org.biojava.bio.BioException: Could not read sequence > > > > at > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence ( > > RichStreamReader.java:112) > > > > at > org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > > GenbankRichSequenceDB.java:153) > > > > ... 
3 more > > > > Caused by: java.lang.IllegalArgumentException: Could not find > constructor > > for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class > > java.lang.String,null) > > > > at org.biojavax.SimpleRichObjectBuilder.buildObject ( > > SimpleRichObjectBuilder.java:78) > > > > at > org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java > > :104) > > > > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence ( > > GenbankFormat.java:387) > > > > at > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > > RichStreamReader.java:109) > > > > ... 4 more > > > > Caused by: java.lang.NullPointerException > > > > at org.biojavax.SimpleRichObjectBuilder.buildObject( > > SimpleRichObjectBuilder.java:59) > > > > ... 7 more > > > > > ------------------------------------------------------------------------------------ > > > > > Trying to get: NM_182572 > > > > org.biojava.bio.BioException: Failed to read Genbank sequence > > > > at > org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > > GenbankRichSequenceDB.java :157) > > > > at > exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) > > > > at > exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > > > > at exonhit.parsers.RefSeqParser.main (RefSeqParser.java:195) > > > > Caused by: org.biojava.bio.BioException: Could not read sequence > > > > at > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > > RichStreamReader.java:112) > > > > at > org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > > GenbankRichSequenceDB.java:153) > > > > ... 
3 more > > > > Caused by: java.lang.IllegalArgumentException : Could not find > constructor > > for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class > > java.lang.String,null) > > > > at org.biojavax.SimpleRichObjectBuilder.buildObject( > > SimpleRichObjectBuilder.java:78) > > > > at > org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java > > :104) > > > > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( > > GenbankFormat.java:387) > > > > at > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > > RichStreamReader.java:109) > > > > ... 4 more > > > > Caused by: java.lang.NullPointerException > > > > at org.biojavax.SimpleRichObjectBuilder.buildObject( > > SimpleRichObjectBuilder.java:59) > > > > ... 7 more > > > > > > > > > > > > > > > -- > Best Regards, > > > Seth Johnson > Senior Bioinformatics Associate > > Ph: (202) 470-0900 > Fx: (775) 251-0358 From david at autohandle.com Sat Aug 5 22:55:23 2006 From: david at autohandle.com (David Scott) Date: Sat, 05 Aug 2006 15:55:23 -0700 Subject: [Biojava-l] Parsing Genbank-sequences from NCBI In-Reply-To: References: <44D39880.9020202@autohandle.com> Message-ID: <44D521DB.2080601@autohandle.com> hi seth- nm_006145 loaded for me and i recreated the genbank entry with only minor differences - now, i'm at a loss - maybe we should wait for the other guys to get online after the weekend - they are much better at this remote debugging than i am. sorry i just couldn't help- david Seth Johnson wrote: > Hi David, > > I compiled my biojava.jar on 7/28 and, just to make sure, I've updated > my biojava-live cvs just now and it doesn't look like there were any > changes made since that date. The SimpleDocRef.java was last updated > on 7/18 and the version that I have does include the second > constructor with 2 parameters. It seems to be related to null TITLE > since all of the entries are missing it, but I was also under the > impression that null TITLE issue was fixed. 
> That's what is so puzzling about this. Below is the list of the problem
> accession IDs if you'd like to replicate the exception:
>
> NM_006145
> NM_000602
> NM_006226
> NM_000371
> NM_019072
> NM_017884
> NM_022107
> NM_031418
> NM_030809
> NM_032731
> NM_001029888
> NM_001029869
> NM_182572
>
> On 8/4/06, *David Scott* wrote:
>
>     hi seth-
>
>     the 3rd argument to SimpleDocRef constructor is the REFERENCE title -
>     which appears to be null in the trace - which happens, but rarely. i had
>     the exact same problem recently - and richard put in code to check for a
>     null title and then call a special 2 argument constructor for
>     SimpleDocRef - any chance you don't have that code checked out?
>
>     best-
>     david
>
>     Seth Johnson wrote:
> > Hi Richard,
> >
> > I'm back for more help. I've just completed getting and parsing the entire
> > human genome RefSeq list from NCBI. I'm not going to post my source code
> > since the invoking code has been described by the gentlemen who started the
> > original thread last month. The result of the parsing is such that out of
> > ~28K sequences, 13 produced the exceptions below. I've used the latest
> > biojava code from CVS, not quite sure what the problem is on these 13.
> >
> > Trying to get: NM_006145
> > org.biojava.bio.BioException: Failed to read Genbank sequence
> >     at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:157)
> >     at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162)
> >     at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146)
> >     at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195)
> > Caused by: org.biojava.bio.BioException: Could not read sequence
> >     at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:112)
> >     at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:153)
> >     ... 3 more
> > Caused by: java.lang.IllegalArgumentException: Could not find constructor for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class java.lang.String,null)
> >     at org.biojavax.SimpleRichObjectBuilder.buildObject(SimpleRichObjectBuilder.java:78)
> >     at org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java:104)
> >     at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:387)
> >     at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:109)
> >     ... 4 more
> > Caused by: java.lang.NullPointerException
> >     at org.biojavax.SimpleRichObjectBuilder.buildObject(SimpleRichObjectBuilder.java:59)
> >     ... 7 more
> > org.biojava.bio.BioException: Failed to read Genbank sequence
> >     at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:157)
> >
> > --------------------------------------------------------------------------------
> >
> > Trying to get: NM_000602
> >     at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162)
> >     at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146)
> >     at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195)
> > Caused by: org.biojava.bio.BioException: Could not read sequence
> >     at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:112)
> >     at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:153)
> >     ... 3 more
> > Caused by: java.lang.IllegalArgumentException: Could not find constructor for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class java.lang.String,null)
> >     at org.biojavax.SimpleRichObjectBuilder.buildObject(SimpleRichObjectBuilder.java:78)
> >     at org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java:104)
> >     at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:387)
> >     at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:109)
> >     ... 4 more
> > Caused by: java.lang.NullPointerException
> >     at org.biojavax.SimpleRichObjectBuilder.buildObject(SimpleRichObjectBuilder.java:59)
> >     ... 7 more
> >
> > -------------------------------------------------------------------------------
> >
> > Trying to get: NM_006226
> > org.biojava.bio.BioException: Failed to read Genbank sequence
> >     at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:157)
> >     at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162)
> >     at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146)
> >     at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195)
> > Caused by: org.biojava.bio.BioException: Could not read sequence
> >     at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:112)
> >     at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:153)
> >     ... 3 more
> > Caused by: java.lang.IllegalArgumentException: Could not find constructor for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class java.lang.String,null)
> >     at org.biojavax.SimpleRichObjectBuilder.buildObject(SimpleRichObjectBuilder.java:78)
> >     at org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java:104)
> >     at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:387)
> >     at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:109)
> >     ... 4 more
> > Caused by: java.lang.NullPointerException
> >     at org.biojavax.SimpleRichObjectBuilder.buildObject(SimpleRichObjectBuilder.java:59)
> >     ... 7 more
> >
> > ----------------------------------------------------------------------------------
> >
> > Trying to get: NM_000371
> > org.biojava.bio.BioException: Failed to read Genbank sequence
> >     at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:157)
> >     at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162)
> >     at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146)
> >     at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195)
> > Caused by: org.biojava.bio.BioException: Could not read sequence
> >     at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:112)
> >     at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:153)
> >     ... 3 more
> > Caused by: java.lang.IllegalArgumentException: Could not find constructor for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class java.lang.String,null)
> >     at org.biojavax.SimpleRichObjectBuilder.buildObject(SimpleRichObjectBuilder.java:78)
> >     at org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java:104)
> >     at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:387)
> >     at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:109)
> >     ... 4 more
> > Caused by: java.lang.NullPointerException
> >     at org.biojavax.SimpleRichObjectBuilder.buildObject(SimpleRichObjectBuilder.java:59)
> >     ... 7 more
> >
> > --------------------------------------------------------------------------------
> >
> > Trying to get: NM_019072
> > org.biojava.bio.BioException: Failed to read Genbank sequence
> >     at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:157)
> >     at exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162)
> >     at exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146)
> >     at exonhit.parsers.RefSeqParser.main(RefSeqParser.java:195)
> > Caused by: org.biojava.bio.BioException: Could not read sequence
> >     at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:112)
> >     at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:153)
> >     ...
3 more > > > > Caused by: java.lang.IllegalArgumentException : Could not find > constructor > > for class org.biojavax.SimpleDocRef(class java.util.ArrayList,class > > java.lang.String,null) > > > > at org.biojavax.SimpleRichObjectBuilder.buildObject( > > SimpleRichObjectBuilder.java:78) > > > > at > org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java > > :104) > > > > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( > > GenbankFormat.java:387) > > > > at > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > > RichStreamReader.java:109) > > > > ... 4 more > > > > Caused by: java.lang.NullPointerException > > > > at org.biojavax.SimpleRichObjectBuilder.buildObject( > > SimpleRichObjectBuilder.java:59) > > > > ... 7 more > > > > > > > > > > > > > > > -- > Best Regards, > > > Seth Johnson > Senior Bioinformatics Associate > > Ph: (202) 470-0900 > Fx: (775) 251-0358 From n.haigh at sheffield.ac.uk Mon Aug 7 07:17:22 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Mon, 07 Aug 2006 08:17:22 +0100 Subject: [Biojava-l] Iterating over alignment columns Message-ID: <44D6E902.1030306@sheffield.ac.uk> I'm returning to a project i started having a look at a few months ago and i still can't figure out how to do the following since i'm new to Java and Biojava. It seems to me that this should be easy to do since it is essentially why alignments are generated in the first instance - to infer homology of residues (symbols) at the same position (column) in the alignment. I want to be able to iterate over all positions in an alignment and then do something with the symbols at a given position, in my case calculate the proportion of each symbol at that position. I understand that i could loop over the length of the alignment, use an index to represent the position of the column in the alignment and generate a subalignment of length 1 for all labels. 
However, is this efficient, and how would i access the symbols so i can calculate the proportions for each symbol at the position? I would really appreciate some hand holding on this, as i'm struggling to climb the steep learning curve of OOP, Java and Biojava :o( Thanks Nathan From czaleski at albany.edu Mon Aug 7 16:33:52 2006 From: czaleski at albany.edu (czaleski at albany.edu) Date: Mon, 7 Aug 2006 12:33:52 -0400 (EDT) Subject: [Biojava-l] How to - Collection of features only? Message-ID: <1397.72.226.74.171.1154968432.squirrel@webmail.albany.edu> Greetings, I have a question about coordinate-only data. I have a database with many flavors of annotation, and I build collections of objects from this data (usually Sequences). However, I have an instance where I need to retrieve coordinate-based data only. For instance, I'd fetch something like all 3' UTRs defined in my RefSeq table, and then build a .bed file to be loaded into UCSC's genome browser. In this case, I need the coordinates only (chromosome, txStart, txEnd) and I do not need the actual sequence. So basically I'd like to make a collection of Features. But since in the tutorial it says: "Features cannot be created independently of a sequence", how would I do this? I expect I could create a Sequence object with an empty or null String/SymbolList, and then add a single Feature to each... but this does not seem like it would be the intended solution. Is there some other method by which I could/should accomplish this? Thanks very much Chris From mark.schreiber at novartis.com Wed Aug 2 01:17:18 2006 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Wed, 2 Aug 2006 09:17:18 +0800 Subject: [Biojava-l] Problem Inserting Genbank File Message-ID: An HTML attachment was scrubbed...
From edbeaty at charter.net Mon Aug 7 20:41:49 2006 From: edbeaty at charter.net (Dexter Riley) Date: Mon, 7 Aug 2006 13:41:49 -0700 (PDT) Subject: [Biojava-l] Getting a Slice of an Alignment In-Reply-To: <1152008150.3948.63.camel@texas.ebi.ac.uk> References: <5047818.post@talk.nabble.com> <1151335745.3938.40.camel@texas.ebi.ac.uk> <5049831.post@talk.nabble.com> <1151399858.3938.57.camel@texas.ebi.ac.uk> <5066891.post@talk.nabble.com> <1151421997.3938.91.camel@texas.ebi.ac.uk> <5072893.post@talk.nabble.com> <1151485076.3942.8.camel@texas.ebi.ac.uk> <1152008150.3948.63.camel@texas.ebi.ac.uk> Message-ID: <5695086.post@talk.nabble.com> I have time to think about the problem of creating a subalignment again. To see if I understand Richard's solution, you: Create a subalignment from the original alignment, at the desired location Iterate through each SymbolList in the alignment, and determine the offset of the SymbolList in the original alignment, determine the offset of the SymbolList in the subalignment, create a new SymbolList using these offsets. My main problem with doing this is that you create an Alignment to get the SymbolLists that represent the slice, which I then would use to...create an Alignment. Since all I really want is an Alignment view of a particular location slice of an Alignment, I really think your original idea of changing the behavior of AbstractULAlignment.SubULAlignment.symbolListForLabel() would be much more intuitive (at least to a new user like myself), and be at least one object lighter, and possibly faster to boot (can't say for sure since I'm not familiar with how AbstractULAlignment uses SubULAlignments.) Thanks, Ed -- View this message in context: http://www.nabble.com/Getting-a-Slice-of-an-Alignment-tf1849222.html#a5695086 Sent from the BioJava forum at Nabble.com.
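Editorial sketch, not part of the original thread: the two alignment operations discussed so far (a view of a column range, and the proportion of each symbol in one column) can be illustrated in plain Java without BioJava. Everything here is an assumption made for illustration: the class name AlignmentSketch, the methods slice and columnProportions, and the representation of an alignment as a map from label to a gapped string of equal length. It does not use or reproduce the BioJava Alignment API, and it needs Java 8 or later, which is newer than the Java versions current when these messages were written.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for an alignment: label -> gapped row, all rows the same length.
class AlignmentSketch {

    /** Returns columns start..end (1-based, inclusive) of every row, keeping labels and row order. */
    static Map<String, String> slice(Map<String, String> rows, int start, int end) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> row : rows.entrySet()) {
            // substring is 0-based and end-exclusive, hence start - 1 and end
            out.put(row.getKey(), row.getValue().substring(start - 1, end));
        }
        return out;
    }

    /** Returns the proportion of each symbol (gap characters included) in one 1-based column. */
    static Map<Character, Double> columnProportions(Map<String, String> rows, int col) {
        Map<Character, Integer> counts = new TreeMap<>();
        for (String row : rows.values()) {
            counts.merge(row.charAt(col - 1), 1, Integer::sum);
        }
        Map<Character, Double> proportions = new TreeMap<>();
        for (Map.Entry<Character, Integer> e : counts.entrySet()) {
            proportions.put(e.getKey(), e.getValue() / (double) rows.size());
        }
        return proportions;
    }

    public static void main(String[] args) {
        Map<String, String> rows = new LinkedHashMap<>();
        rows.put("seq1", "ACG-T");
        rows.put("seq2", "ACGAT");
        rows.put("seq3", "TCG-T");
        rows.put("seq4", "AAGAT");

        // A three-column slice, then the symbol proportions of every column.
        System.out.println(slice(rows, 2, 4));
        for (int col = 1; col <= 5; col++) {
            System.out.println(col + " " + columnProportions(rows, col));
        }
    }
}
```

Note that slice copies the selected columns rather than returning a live view, so it is closer to the "create new SymbolLists" approach than to the lighter view Ed asks for; a true view would keep a reference to the original rows and translate column indices on access.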
From mark.schreiber at novartis.com Tue Aug 8 02:37:58 2006 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Tue, 8 Aug 2006 10:37:58 +0800 Subject: [Biojava-l] How to - Collection of features only? Message-ID: Hi - You can use a dummy sequence as the anchor for your features; see org.biojava.bio.seq.SequenceTools.createDummy() - Mark czaleski at albany.edu Sent by: biojava-l-bounces at lists.open-bio.org 08/08/2006 12:33 AM Please respond to czaleski To: biojava-l at biojava.org cc: (bcc: Mark Schreiber/GP/Novartis) Subject: [Biojava-l] How to - Collection of features only? Greetings, I have a question about coordinate-only data. I have a database with many flavors of annotation, and I build collections of objects from this data (usually Sequences). However, I have an instance where I need to retrieve coordinate-based data only. For instance, I'd fetch something like all 3' UTRs defined in my RefSeq table, and then build a .bed file to be loaded into UCSC's genome browser. In this case, I need the coordinates only (chromosome, txStart, txEnd) and I do not need the actual sequence. So basically I'd like to make a collection of Features. But since in the tutorial it says: "Features cannot be created independently of a sequence", how would I do this? I expect I could create a Sequence object with an empty or null String/SymbolList, and then add a single Feature to each... but this does not seem like it would be the intended solution. Is there some other method by which I could/should accomplish this? Thanks very much Chris _______________________________________________ Biojava-l mailing list - Biojava-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-l From n.haigh at sheffield.ac.uk Tue Aug 8 05:50:47 2006 From: n.haigh at sheffield.ac.uk (Nathan S.
Haigh) Date: Tue, 08 Aug 2006 05:50:47 +0000 Subject: [Biojava-l] Iterating over alignment columns Message-ID: <44D82637.2030800@sheffield.ac.uk> Apologies if this has come through more than once, I appear to be having some problems getting posts through. Anyway, I'm returning to a project i started having a look at a few months ago and i still can't figure out how to do the following since i'm new to Java and Biojava. It seems to me that this should be easy to do since it is essentially why alignments are generated in the first instance - to infer homology of residues (symbols) at the same position (column) in the alignment. I want to be able to iterate over all positions in an alignment and then do something with the symbols at a given position, in my case calculate the proportion of each symbol at that position. I understand that i could loop over the length of the alignment, use an index to represent the position of the column in the alignment and generate a subalignment of length 1 for all labels. However, is this efficient, and how would i access the symbols so i can calculate the proportions for each symbol at the position? I would really appreciate some hand holding on this, as i'm struggling to climb the steep learning curve of OOP, Java and Biojava :o( Thanks Nathan From n.haigh at sheffield.ac.uk Tue Aug 8 15:18:23 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Tue, 08 Aug 2006 15:18:23 +0000 Subject: [Biojava-l] Iterating over alignment columns In-Reply-To: <44D85352.9070802@ebi.ac.uk> References: <44D6E902.1030306@sheffield.ac.uk> <44D85352.9070802@ebi.ac.uk> Message-ID: <44D8AB3F.7080004@sheffield.ac.uk> Richard Holland wrote: > Hello.
> > Here's how you can do it: > > Alignment algn = ....; // get an alignment from somewhere > for (int col = 1; col <= algn.length(); col++) { > List symbols = new ArrayList(); > for (Iterator labels = algn.labelsAt(col); labels.hasNext(); ) { > Object label = labels.next(); > symbols.add(algn.symbolAt(label,col)); > } > // symbols now contains all symbols at column 'col' > // of the alignment. > } > > cheers, > Richard Thanks for this, it looks just like what i need. However, is the method labelsAt part of the latest CVS version of Biojava - they do not appear to be in Biojava 1.4 according to eclipse. Thanks Nathan From johnson.biotech at gmail.com Tue Aug 8 15:44:42 2006 From: johnson.biotech at gmail.com (Seth Johnson) Date: Tue, 8 Aug 2006 11:44:42 -0400 Subject: [Biojava-l] Parsing Genbank-sequences from NCBI In-Reply-To: <44D84F0C.50700@ebi.ac.uk> References: <44D39880.9020202@autohandle.com> <44D521DB.2080601@autohandle.com> <44D84F0C.50700@ebi.ac.uk> Message-ID: Hello, Works perfect now!!! Thanks for looking into it. On 8/8/06, Richard Holland wrote: > > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hello all. > > Sorry for the delay - I was off sick yesterday. > > The bug had been fixed in the BioSQL side of things, but not when merely > loading sequences into memory without persisting them to the database. > I've been through and checked and hopefully fixed this now in CVS. Let > me know how you get on! > > cheers, > Richard > > > David Scott wrote: > > hi seth- > > > > nm_006145 loaded for me and i recreated the genbank entry with only > > minor differences - now, i'm at a loss - maybe we should wait for the > > other guys to get online after the weekend - they are much better at > > this remote debugging than i am. 
> > > > sorry i just couldn't help- > > david > > > > Seth Johnson wrote: > >> Hi David, > >> > >> I compiled my biojava.jar on 7/28 and, just to make sure, I've updated > >> my biojava-live cvs just now and it doesn't look like there were any > >> changes made since that date. The SimpleDocRef.java was last updated > >> on 7/18 and the version that I have does include the second > >> constructor with 2 parameters. It seems to be related to null TITLE > >> since all of the entries are missing it, but I was also under the > >> impression that null TITLE issue was fixed. That's what is so puzzling > >> about this. Below is the list of the problem accession IDs if you'd > >> like to replicate the exception: > >> > >> NM_006145 > >> NM_000602 > >> NM_006226 > >> NM_000371 > >> NM_019072 > >> NM_017884 > >> NM_022107 > >> NM_031418 > >> NM_030809 > >> NM_032731 > >> NM_001029888 > >> NM_001029869 > >> NM_182572 > >> > >> On 8/4/06, *David Scott* >> > wrote: > >> > >> hi seth- > >> > >> the 3rd argument to SimpleDocRef constructor is the REFERENCE title > - > >> which appears to be null in the trace - which happens, but rarely. > >> i had > >> the exact same problem recently - and richard put in code to check > >> for a > >> null title and then call a special 2 argument constructor for > >> SimpleDocRef - any chance you don't have that code checked out? > >> > >> best- > >> david > >> > >> Seth Johnson wrote: > >> > Hi Richard, > >> > > >> > > >> > I'm back for more help. I've just completed getting and parsing > >> the entire > >> > human genome RefSeq list from NCBI. I'm not going to post my > >> source code > >> > since the invoking code has been described by the gentlemen who > >> started the > >> > original thread last month. The result of the parsing is such > >> that out of > >> > ~28K sequences, 13 produced the exceptions below. I've used the > >> latest > >> > biojava code from CVS, not quite sure what the problem is on > >> these 13. 
> >> > > >> > > >> > > >> > Trying to get: NM_006145 > >> > > >> > org.biojava.bio.BioException: Failed to read Genbank sequence > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence ( > >> > GenbankRichSequenceDB.java:157) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > >> > > >> > at exonhit.parsers.RefSeqParser.main(RefSeqParser.java > :195) > >> > > >> > Caused by: org.biojava.bio.BioException: Could not read sequence > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence ( > >> > RichStreamReader.java:112) > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java:153) > >> > > >> > ... 3 more > >> > > >> > Caused by: java.lang.IllegalArgumentException: Could not find > >> constructor > >> > for class org.biojavax.SimpleDocRef(class java.util.ArrayList > ,class > >> > java.lang.String,null) > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject ( > >> > SimpleRichObjectBuilder.java:78) > >> > > >> > at > >> org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java > >> > :104) > >> > > >> > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( > >> > GenbankFormat.java:387) > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:109) > >> > > >> > ... 4 more > >> > > >> > Caused by: java.lang.NullPointerException > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:59) > >> > > >> > ... 
7 more > >> > > >> > > >> > ------------------------------------------------------------------------------------ > >> > >> > > >> > Trying to get: NM_182572 > >> > > >> > org.biojava.bio.BioException: Failed to read Genbank sequence > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java :157) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.updateRefSeq(RefSeqParser.java:162) > >> > > >> > at > >> exonhit.parsers.RefSeqParser.update(RefSeqParser.java:146) > >> > > >> > at exonhit.parsers.RefSeqParser.main (RefSeqParser.java > :195) > >> > > >> > Caused by: org.biojava.bio.BioException: Could not read sequence > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:112) > >> > > >> > at > >> org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > >> > GenbankRichSequenceDB.java:153) > >> > > >> > ... 3 more > >> > > >> > Caused by: java.lang.IllegalArgumentException : Could not find > >> constructor > >> > for class org.biojavax.SimpleDocRef(class java.util.ArrayList > ,class > >> > java.lang.String,null) > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:78) > >> > > >> > at > >> org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java > >> > :104) > >> > > >> > at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence > ( > >> > GenbankFormat.java:387) > >> > > >> > at > >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > >> > RichStreamReader.java:109) > >> > > >> > ... 4 more > >> > > >> > Caused by: java.lang.NullPointerException > >> > > >> > at org.biojavax.SimpleRichObjectBuilder.buildObject( > >> > SimpleRichObjectBuilder.java:59) > >> > > >> > ... 
7 more
> >> >
> >> --
> >> Best Regards,
> >>
> >> Seth Johnson
> >> Senior Bioinformatics Associate
> >>
> >> Ph: (202) 470-0900
> >> Fx: (775) 251-0358
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2.2 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFE2E8K4C5LeMEKA/QRAsZlAKCUyzOv2z94PViXbx2i3RVTXCfn1gCfbxEU
> oCGhefHeLFUxIrHLgZJHuQ0=
> =SMVh
> -----END PGP SIGNATURE-----
>

--
Best Regards,

Seth Johnson
Senior Bioinformatics Associate

Ph: (202) 470-0900
Fx: (775) 251-0358

From n.haigh at sheffield.ac.uk Wed Aug 9 13:20:17 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Wed, 09 Aug 2006 13:20:17 +0000
Subject: [Biojava-l] Location Objects
Message-ID: <44D9E111.3000007@sheffield.ac.uk>

I apologise for the waffle, but I'm a bit stuck with Location objects, and in particular whether a Location object can be empty and whether there are methods for creating and testing for empty Locations.

I'm writing some classes which use Location objects. It seems to me that it should be possible to create an empty Location object and also to test whether a Location object is empty. In addition, it would be good if the union method could handle these empty Location objects.

What I would like to do is iterate over columns in an alignment and determine if that position should be added to the Location object. Ideally, I would create an empty Location object and then use the union method to add positions as I iterate over the alignment columns. In addition, I'm writing a method that takes the inverse of a Location, given an alignment length, using the exclude method.

What I'm currently having to do is create a dummy Location object with position 0,0 - which doesn't make much sense since they should be +ve integers, shouldn't they?
Then while itereating over positions in the alignment, i use LocationTools.union with the dummy Location and a PointLocation object to add the current position to the Location object which is being built. The main problem i have is when i have an alignment of length 20 and a Location object coving all positions from 1-20 and i want to invert this Location object. It make sense to return an "empty" Location object, but i'm using a Location object with coordinates 0,0 for my purposes. At the moment my JUnit test for inverting the Location object shown above tests for a Location object with coordinates 0,0: Location inv = invLocation(LocationTools.makeLocation(1, 17), 17); assertEquals(LocationTools.makeLocation(0,0), inv); however i get an error like: junit.framework.AssertionFailedError: expected:<0> but was:<{}> Therefore, there must be a way to represent an empty Location object, but it may not be fully implemented. If i use getMin on this inverted Location (which should be empty) i get a value of 2147483647 with no errors. Thanks Nathan From n.haigh at sheffield.ac.uk Wed Aug 9 15:13:49 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Wed, 09 Aug 2006 15:13:49 +0000 Subject: [Biojava-l] Alignment objects In-Reply-To: <44D9E948.5090309@ebi.ac.uk> References: <44D9E111.3000007@sheffield.ac.uk> <44D9E453.9080104@ebi.ac.uk> <44D9F54E.6060602@sheffield.ac.uk> <44D9E948.5090309@ebi.ac.uk> Message-ID: <44D9FBAD.3080408@sheffield.ac.uk> I think i'm having a few problem with alignments. I've generated an protein alignment in the following way: String alnString1 = ">seq1\n" + "----FGHIKLMNPQRST\n" + ">seq2\n" + "ACDEFGHIKLMNPQRST\n"; BufferedReader br1 = new BufferedReader(new StringReader(alnString1)); FastaAlignmentFormat faf1 = new FastaAlignmentFormat(); alignment = faf1.read( br1 ); If i loop over positions in the alignment to add the positions with gaps to a Location object, i have to do the following. 
It seems hacky since i'm having to check for symbol names containing "[]" in order to identify gaps. I'm sure there must be a better way to do this!?

A better way would be to calculate the frequency of each symbol (including gaps) at a position in the alignment. This way i could return a list of these frequencies for each position which could be used by other methods for identifying positions with certain characteristic (such as those containing gaps) ...any ideas?

    for (int col = 1; col <= alignment.length(); col++) {
        for (Iterator labels = alignment.getLabels().iterator(); labels.hasNext(); ) {
            Object label = labels.next();
            Symbol sym = alignment.symbolAt(label,col);
            if (sym.getName().contains("[]")) {
                Location newLocation = LocationTools.makeLocation(col, col);
                gapped = this.appendLocation(gapped, newLocation);
            }
        }
    }

Cheers
Nath

From walsh at andrew.cmu.edu  Wed Aug 9 15:59:52 2006
From: walsh at andrew.cmu.edu (Andrew Walsh)
Date: Wed, 09 Aug 2006 11:59:52 -0400
Subject: [Biojava-l] Getting a Slice of an Alignment
In-Reply-To: <5695086.post@talk.nabble.com>
References: <5047818.post@talk.nabble.com> <1151335745.3938.40.camel@texas.ebi.ac.uk>
 <5049831.post@talk.nabble.com> <1151399858.3938.57.camel@texas.ebi.ac.uk>
 <5066891.post@talk.nabble.com> <1151421997.3938.91.camel@texas.ebi.ac.uk>
 <5072893.post@talk.nabble.com> <1151485076.3942.8.camel@texas.ebi.ac.uk>
 <1152008150.3948.63.camel@texas.ebi.ac.uk> <5695086.post@talk.nabble.com>
Message-ID: <44DA0678.4040404@andrew.cmu.edu>

I have just found a need to do something similar (i.e. extract select columns from an alignment) and have discovered that the subAlignment() implementation of the SimpleAlignment class does exactly what the original poster wants the method to do, and what the method description from the Alignment interface API suggests it should do.
It returns an Alignment object which contains only those sequences indicated by the first argument (a Set of labels), and only the symbols from the columns specified in the second argument (a Location object). No further processing is needed to get just the symbols from the specified columns. These columns need not even be contiguous; it will work correctly with any arbitrary subset of the columns. It seems to me that since this method is part of the Alignment interface that all implementations should have the same behavior. They should provide the specified subalignment without the need for further processing. I would thus propose that a modification to the AbstractULAlignment.subAlignment() method (or the AbstractULAlignment.SubULAlignment(Set labels, Location loc) constructor, since this is where the actual work is done) be made to have it perform correctly. Other Alignment implementing classes may also need to be modified as well. -Andy Dexter Riley wrote: > I have time to think about the problem of creating a subalignment again. To > see if I understand Richard's solution, you: > Create a subalignment from the original alignment, at the desired location > Iterate through each SymbolList in the alignment, and > determine the offset of the SymbolList in the original alignment, > determine the offset of the SymbolList in the subalignment, > create a new SymbolList using these offsets. > > My main problem with doing this is that you create an Alignment to get the > SymbolLists that represent the slice, which I then would use to...create an > Alignment. 
Since all I really want is an Alignment view of a particular > location slice of an Alignment, I really think your original idea of > changing the behavior of > AbstractULAlignment.SubULAlignment.symbolListForLabel() would be much more > intuitive (at least to a new user like myself), and be at least one object > lighter, and possibly faster to boot (can't say for sure since I'm not > familiar with how AbstractULAlignment uses SubULAlignments.) > Thanks, > Ed > > From Russell.Smithies at agresearch.co.nz Thu Aug 10 01:59:34 2006 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Thu, 10 Aug 2006 13:59:34 +1200 Subject: [Biojava-l] [OFF TOPIC] NCBI eUtils PowerScripting course In-Reply-To: <44D9FBAD.3080408@sheffield.ac.uk> Message-ID: Hi all, I'm writing a proposal to attend the NCBI eUtils PowerScripting course and am looking for testiminials from past attendees. Has anyone on the list done the course? Was it worth while? Any comments greatly appreciated :-) Thanx, Russell Russell Smithies Bioinformatics Software Developer T +64 3 489 9085 E russell.smithies at agresearch.co.nz Invermay Research Centre Puddle Alley, Mosgiel, New Zealand T +64 3 489 3809 F +64 3 489 9174 www.agresearch.co.nz ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
=======================================================================

From mark.schreiber at novartis.com  Thu Aug 10 02:17:55 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Thu, 10 Aug 2006 10:17:55 +0800
Subject: [Biojava-l] Location Objects
Message-ID: 

There is a static member of Location, Location.empty; unfortunately it is package private. I have just made it public in biojava-live because I can't see why it shouldn't be.

You can actually get it from LocationTools by performing an operation that doesn't make sense. For example, do an intersection of two locations that don't intersect.

If you are using biojava-live from CVS there is also RichLocation.EMPTY_LOCATION, which is public. RichLocations are basically normal locations with more functionality. I'm not certain that they will behave in the way you expect.

An EmptyLocation doesn't really exist; we only really use it to avoid returning null. Thus the min value of an empty location is MAX_INT and the max value is MIN_INT. This is because they have to have values and 0,0 could be a real location, so we use this strangely inverted max and min, which probably best represents some kind of black hole or something?!

I would be interested to know what happens when you try using one of the above for your example below.

Hope this helps,

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)
Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com
phone +65 6722 2973
fax +65 6722 2910

"Nathan S. Haigh"
Sent by: biojava-l-bounces at lists.open-bio.org
08/09/2006 09:20 PM
To: biojava-l at lists.open-bio.org
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-l] Location Objects

I apologies for the waffle but i'm a bit stuck with Location objects, and in particular if a Location object can be empty and if there are methods for creating, checking and testing for empty Locations.
I'm writing some classes which use Location objects. It seems to me that it should be possible to create an empty Location object and also test if a Location object is empty. In addition, it would be good if the union method could handle these empty Location objects. What i would like to do, is to iterate over columns in an alignment and determine in that position should be added to the Location object. Ideally, what i would like to do is to create an empty Location object and then use the union method to add positions as i iterate over the alignment columns. In addition, i'm writing a method that takes an inverse of a Location given an alignment length using the exclude method. What i'm currently having to do is create a dummy Location object with position 0,0 - which doesn't make much sense since they should be +ve integers shouldn't they? Then while itereating over positions in the alignment, i use LocationTools.union with the dummy Location and a PointLocation object to add the current position to the Location object which is being built. The main problem i have is when i have an alignment of length 20 and a Location object coving all positions from 1-20 and i want to invert this Location object. It make sense to return an "empty" Location object, but i'm using a Location object with coordinates 0,0 for my purposes. At the moment my JUnit test for inverting the Location object shown above tests for a Location object with coordinates 0,0: Location inv = invLocation(LocationTools.makeLocation(1, 17), 17); assertEquals(LocationTools.makeLocation(0,0), inv); however i get an error like: junit.framework.AssertionFailedError: expected:<0> but was:<{}> Therefore, there must be a way to represent an empty Location object, but it may not be fully implemented. If i use getMin on this inverted Location (which should be empty) i get a value of 2147483647 with no errors. 
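The 2147483647 reported by getMin above is Integer.MAX_VALUE, which matches the inverted sentinel described earlier in this reply (min = MAX_INT, max = MIN_INT for an empty location). A minimal sketch of that idea follows, in plain Java only. It is not the BioJava API: the class EmptyLocationSketch and its methods range, union, invert, getMin and getMax are hypothetical names, and a TreeSet of 1-based positions stands in for a Location.

```java
import java.util.TreeSet;

/**
 * Plain-Java stand-in for a Location: a sorted set of 1-based positions.
 * Hypothetical illustration only; none of these names come from BioJava.
 */
public final class EmptyLocationSketch {

    /** All positions from min to max inclusive. */
    public static TreeSet<Integer> range(int min, int max) {
        TreeSet<Integer> loc = new TreeSet<>();
        for (int i = min; i <= max; i++) {
            loc.add(i);
        }
        return loc;
    }

    /** Union that treats the empty set as a neutral element. */
    public static TreeSet<Integer> union(TreeSet<Integer> a, TreeSet<Integer> b) {
        TreeSet<Integer> loc = new TreeSet<>(a);
        loc.addAll(b);
        return loc;
    }

    /** Every position in 1..length that is NOT in loc. */
    public static TreeSet<Integer> invert(TreeSet<Integer> loc, int length) {
        TreeSet<Integer> inv = range(1, length);
        inv.removeAll(loc);
        return inv;
    }

    /** Inverted sentinel: an empty location reports min = Integer.MAX_VALUE. */
    public static int getMin(TreeSet<Integer> loc) {
        return loc.isEmpty() ? Integer.MAX_VALUE : loc.first();
    }

    /** Inverted sentinel: an empty location reports max = Integer.MIN_VALUE. */
    public static int getMax(TreeSet<Integer> loc) {
        return loc.isEmpty() ? Integer.MIN_VALUE : loc.last();
    }

    public static void main(String[] args) {
        // Start from an empty location and add single positions by union.
        TreeSet<Integer> built = new TreeSet<>();
        for (int col = 1; col <= 4; col++) {
            built = union(built, range(col, col));
        }
        System.out.println(built); // [1, 2, 3, 4]

        // Inverting a location that covers the whole alignment gives "empty".
        TreeSet<Integer> inv = invert(range(1, 17), 17);
        System.out.println(inv.isEmpty()); // true
        System.out.println(getMin(inv));   // 2147483647
    }
}
```

With a representation like this, a test would assert on emptiness rather than compare against a 0,0 dummy location.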
Thanks Nathan _______________________________________________ Biojava-l mailing list - Biojava-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-l From n.haigh at sheffield.ac.uk Thu Aug 10 08:31:04 2006 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Thu, 10 Aug 2006 08:31:04 +0000 Subject: [Biojava-l] Alignment objects In-Reply-To: <44D9F874.3060001@ebi.ac.uk> References: <44D9E111.3000007@sheffield.ac.uk> <44D9E453.9080104@ebi.ac.uk> <44D9F54E.6060602@sheffield.ac.uk> <44D9E948.5090309@ebi.ac.uk> <44D9FBAD.3080408@sheffield.ac.uk> <44D9F874.3060001@ebi.ac.uk> Message-ID: <44DAEEC8.9090209@sheffield.ac.uk> Richard Holland wrote: > You could change this: > > sym.getName().contains("[]") > > to this: > > AlphabetManager.getGapSymbol().equals(sym) > > Frequency calculations can be done quite quickly using DistributionTools: > > Distribution[] dists = DistributionTools.distOverAlignment(algn, > true); > // true says to include gaps in the statistics > // The dists array will have the same number of entries as there > // are columns in the alignment. > for (int i = 0; i < dists.length; i++) { > // i = 0 = first column in alignment > Distribution dist = dists[i]; > // Find out the weight for A in this column. > double AWeight = dist.getWeight(DNATools.a()); > // Find out the weight for gaps in this column. > double GapWeight = > dist.getWeight(DNATools.getDNA().getGapSymbol()); > } > > cheers, > Richard This is definitely getting close to what i need. 
However, i think i'm having trouble with alphabets which is stopping me from using soemthing like: AlphabetManager.getGapSymbol().equals(sym) I currently creating an alignment like this: String alnString1 = ">seq1\n" + "----FGHIKLMNPQRST\n" + ">seq2\n" + "ACDEFGHIKLMNPQRST\n"; BufferedReader br1 = new BufferedReader(new StringReader(alnString1)); FastaAlignmentFormat faf1 = new FastaAlignmentFormat(); aln1 = faf1.read( br1 ); And i never get true returned from: AlphabetManager.getGapSymbol().equals(sym) I assume this is because the mechanisms that are in place for setting the alphabet of the alignment are not correctly setting the gap symbol. The program i am writing should be capable of determining the alphabet of any alignment that is loaded, so it makes sense to change: AlphabetManager.getGapSymbol().equals(sym) to: alignment.getAlphabet.getGapSymbol().equals(sym) but this doesn't work either. Eventually i'd like my application to be able to load alignment from several different formats, some of which may use more than one symbol as the gap, while others have a "default" gap character. Are there mechanisms in place to attempt to correctly set the gapSymbol for an alignment? For example FASTA format alignments should probably set the gap symbol to the hyphen "-". Once again, being new to this, i am probably missing something that is obvious to you guys. Thanks for all your time end effort in helping me out. Nathan From mark.schreiber at novartis.com Thu Aug 10 07:56:42 2006 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Thu, 10 Aug 2006 15:56:42 +0800 Subject: [Biojava-l] Alignment objects Message-ID: Hi - There is a difference between the gap returned by AlphabetManager.getGapSymbol and the gap returned by an alphabet.getGapSymbol(). There is some very complex reasons for this which could make up a large part of a thesis (literally, take a look at Matthew Pococks thesis some time). 
Simply speaking, dynamic programming and HMMs wouldn't work without it. It becomes especially obvious when you have an alignment. The alphabet of an alignment of 3 DNA sequences is DNAxDNAxDNA. Thus a gap from that alphabet is really gap x gap x gap. Depending on what you are trying to do you would want to test for Symbol s == align.getAlphabet().getGap() or Symbol s == DNATools.getDNA().getGap(). - Mark "Nathan S. Haigh" Sent by: biojava-l-bounces at lists.open-bio.org 08/10/2006 04:31 PM To: Richard Holland , biojava-l at lists.open-bio.org cc: (bcc: Mark Schreiber/GP/Novartis) Subject: Re: [Biojava-l] Alignment objects Richard Holland wrote: > You could change this: > > sym.getName().contains("[]") > > to this: > > AlphabetManager.getGapSymbol().equals(sym) > > Frequency calculations can be done quite quickly using DistributionTools: > > Distribution[] dists = DistributionTools.distOverAlignment(algn, > true); > // true says to include gaps in the statistics > // The dists array will have the same number of entries as there > // are columns in the alignment. > for (int i = 0; i < dists.length; i++) { > // i = 0 = first column in alignment > Distribution dist = dists[i]; > // Find out the weight for A in this column. > double AWeight = dist.getWeight(DNATools.a()); > // Find out the weight for gaps in this column. > double GapWeight = > dist.getWeight(DNATools.getDNA().getGapSymbol()); > } > > cheers, > Richard This is definitely getting close to what i need. 
However, i think i'm having trouble with alphabets which is stopping me from using soemthing like: AlphabetManager.getGapSymbol().equals(sym) I currently creating an alignment like this: String alnString1 = ">seq1\n" + "----FGHIKLMNPQRST\n" + ">seq2\n" + "ACDEFGHIKLMNPQRST\n"; BufferedReader br1 = new BufferedReader(new StringReader(alnString1)); FastaAlignmentFormat faf1 = new FastaAlignmentFormat(); aln1 = faf1.read( br1 ); And i never get true returned from: AlphabetManager.getGapSymbol().equals(sym) I assume this is because the mechanisms that are in place for setting the alphabet of the alignment are not correctly setting the gap symbol. The program i am writing should be capable of determining the alphabet of any alignment that is loaded, so it makes sense to change: AlphabetManager.getGapSymbol().equals(sym) to: alignment.getAlphabet.getGapSymbol().equals(sym) but this doesn't work either. Eventually i'd like my application to be able to load alignment from several different formats, some of which may use more than one symbol as the gap, while others have a "default" gap character. Are there mechanisms in place to attempt to correctly set the gapSymbol for an alignment? For example FASTA format alignments should probably set the gap symbol to the hyphen "-". Once again, being new to this, i am probably missing something that is obvious to you guys. Thanks for all your time end effort in helping me out. Nathan _______________________________________________ Biojava-l mailing list - Biojava-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-l From ckol at iti.gr Thu Aug 10 12:05:56 2006 From: ckol at iti.gr (ckol at iti.gr) Date: Thu, 10 Aug 2006 15:05:56 +0300 (EEST) Subject: [Biojava-l] Create MSA from a profile hmm Message-ID: <1217.155.207.19.4.1155211556.squirrel@mail.iti.gr> Hello all, I'm new in biojava and i reached a problem. 
I created a profile HMM from a set of unaligned homologous sequences, then I aligned some sequences to the model. I noticed that every sequence has its own length for the alignment. Is there any method by which I can create a multiple sequence alignment between these sequences?

thanks in advance,
ckol

From ola.spjuth at farmbio.uu.se  Fri Aug 11 15:08:26 2006
From: ola.spjuth at farmbio.uu.se (Ola Spjuth)
Date: Fri, 11 Aug 2006 17:08:26 +0200
Subject: [Biojava-l] Bioclipse 1.0 released
Message-ID: <8DD39D6D-FB01-411A-910D-CA23E29255FF@farmbio.uu.se>

The Bioclipse team is proud to announce the release of Bioclipse 1.0, containing a BioJava plugin for parsing and visualizing sequences (currently only fasta sequences and uniprot features are supported).

Bioclipse [1] is a free, open source workbench for chemo- and bioinformatics with powerful editing and visualization capabilities for molecules, sequences, proteins, spectra etc.

The major features of version 1.0 are:

* Import and export in various file formats
* Visual editing of molecular 2D-structures
* 3D-visualization of molecules and proteins
* Editing and visualization of sequences and features (DNA, RNA, proteins etc)
* Graphing and editing of various types of spectra, e.g. NMR, MS
* Retrieval of resources (sequences, proteins, etc) from public data repositories
* Scripting of 3D-visualizations with syntax highlighting and content assistance
* PDB-editor with syntax highlighting for working with PDB files
* CMLRSS-viewer for downloading chemical content published on the web using RSS-feeds
* Integrated, searchable help-system
* Hierarchical view of molecular and macromolecular substructures and calculation of chemical properties
* Connection with external programs, e.g. PyMol

Bioclipse is a rich client, which means it is run on your local computer but also gives the possibility to communicate with servers for data retrieval and computational services.
The powerful plugin architecture is based on Eclipse[2], and results in a responsive, integrated user interface designed for simple and intuitive operations that at the same time is easy to extend with custom functionality. There is much ongoing work with Bioclipse and new features are constantly added. Please visit the Bioclipse Wiki [3] in order to get the latest information regarding the development. Bioclipse is available for download from Sourceforge [4]. [1] Bioclipse homepage: http://www.bioclipse.net [2] Eclipse: http://www.eclipse.org [3] Bioclipse Wiki: http://wiki.bioclipse.net [4] Sourceforge project site: http://sourceforge.net/projects/bioclipse/ From johnson.biotech at gmail.com Sat Aug 12 17:17:57 2006 From: johnson.biotech at gmail.com (Seth Johnson) Date: Sat, 12 Aug 2006 10:17:57 -0700 (PDT) Subject: [Biojava-l] Parsing Genbank-sequences from NCBI In-Reply-To: References: Message-ID: <5777810.post@talk.nabble.com> More problems with parsing nucleotide sequences from NCBI. Apparently, there's an odd dbxref tag on some of the sequences submitted by ATCC that causes an exception. I've ran into 2 so far, but I'm sure there are more: AA343569.1 AA325485.1 Exceptions produced are as follows: -------------------------------------------------------------- Trying to get: AA343569.1 org.biojava.bio.BioException: Failed to read Genbank sequence at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:157) at exonhit.parsers.EventParser.getSeqFromNCBI(EventParser.java:250) at exonhit.parsers.EventParser.insertRglrSE(EventParser.java:197) at exonhit.parsers.EventParser.createSpliceEvents(EventParser.java:105) at exonhit.parsers.EventParser.main(EventParser.java:310) Caused by: org.biojava.bio.BioException: Could not read sequence at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:112) at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:153) ... 
4 more Caused by: org.biojava.bio.seq.io.ParseException: Bad dbxref found: ATCC (inhost):145151, accession:AA343569 at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:438) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:109) ... 5 more Java Result: -1 ========================================================= Trying to get: AA325485.1 org.biojava.bio.BioException: Failed to read Genbank sequence at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:157) at exonhit.parsers.EventParser.getSeqFromNCBI(EventParser.java:250) at exonhit.parsers.EventParser.insertRglrSE(EventParser.java:197) at exonhit.parsers.EventParser.createSpliceEvents(EventParser.java:105) at exonhit.parsers.EventParser.main(EventParser.java:312) Caused by: org.biojava.bio.BioException: Could not read sequence at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:112) at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:153) ... 4 more Caused by: org.biojava.bio.seq.io.ParseException: Bad dbxref found: ATCC (inhost):125990, accession:AA325485 at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:438) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:109) ... 5 more Java Result: -1 -- View this message in context: http://www.nabble.com/Parsing-Genbank-sequences-from-NCBI-tf2052235.html#a5777810 Sent from the BioJava forum at Nabble.com. From johnson.biotech at gmail.com Mon Aug 14 14:47:26 2006 From: johnson.biotech at gmail.com (Seth Johnson) Date: Mon, 14 Aug 2006 10:47:26 -0400 Subject: [Biojava-l] Parsing Genbank-sequences from NCBI In-Reply-To: <44E0357E.3040008@ebi.ac.uk> References: <5777810.post@talk.nabble.com> <44E0357E.3040008@ebi.ac.uk> Message-ID: Hi Richard, Apparently there are more problems. 
I get an exception while trying to retrieve BM353894.1 -------------------------------------------------------------- Trying to get: BM353894.1 org.biojava.bio.BioException: Failed to read Genbank sequence at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:157) at exonhit.parsers.EventParser.getSeqFromNCBI(EventParser.java:250) at exonhit.parsers.EventParser.insertRglrSE(EventParser.java:197) at exonhit.parsers.EventParser.createSpliceEvents(EventParser.java :105) at exonhit.parsers.EventParser.main(EventParser.java:312) Caused by: org.biojava.bio.BioException: Could not read sequence at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:112) at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( GenbankRichSequenceDB.java:153) ... 4 more Caused by: org.biojava.bio.seq.io.ParseException at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( GenbankFormat.java:274) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( RichStreamReader.java:109) ... 5 more Java Result: -1 ------------------------------------------------------------- On 8/14/06, Richard Holland wrote: > > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > I've made a small change to the regex which matches these so that it > will now accept spaces before the colon (previously, it didn't). > > Can you check out the latest from CVS and try again? > > cheers, > Richard > > Seth Johnson wrote: > > More problems with parsing nucleotide sequences from NCBI. Apparently, > > there's an odd dbxref tag on some of the sequences submitted by ATCC > that > > causes an exception. 
I've ran into 2 so far, but I'm sure there are > more: > > > > AA343569.1 > > AA325485.1 > > > > Exceptions produced are as follows: > > -------------------------------------------------------------- > > Trying to get: AA343569.1 > > org.biojava.bio.BioException: Failed to read Genbank sequence > > at > > org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > GenbankRichSequenceDB.java:157) > > at exonhit.parsers.EventParser.getSeqFromNCBI(EventParser.java > :250) > > at exonhit.parsers.EventParser.insertRglrSE(EventParser.java > :197) > > at > > exonhit.parsers.EventParser.createSpliceEvents(EventParser.java:105) > > at exonhit.parsers.EventParser.main(EventParser.java:310) > > Caused by: org.biojava.bio.BioException: Could not read sequence > > at > > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > RichStreamReader.java:112) > > at > > org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > GenbankRichSequenceDB.java:153) > > ... 4 more > > Caused by: org.biojava.bio.seq.io.ParseException: Bad dbxref found: ATCC > > (inhost):145151, accession:AA343569 > > at > > org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( > GenbankFormat.java:438) > > at > > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > RichStreamReader.java:109) > > ... 
5 more > > Java Result: -1 > > ========================================================= > > Trying to get: AA325485.1 > > org.biojava.bio.BioException: Failed to read Genbank sequence > > at > > org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > GenbankRichSequenceDB.java:157) > > at exonhit.parsers.EventParser.getSeqFromNCBI(EventParser.java > :250) > > at exonhit.parsers.EventParser.insertRglrSE(EventParser.java > :197) > > at > > exonhit.parsers.EventParser.createSpliceEvents(EventParser.java:105) > > at exonhit.parsers.EventParser.main(EventParser.java:312) > > Caused by: org.biojava.bio.BioException: Could not read sequence > > at > > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > RichStreamReader.java:112) > > at > > org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence( > GenbankRichSequenceDB.java:153) > > ... 4 more > > Caused by: org.biojava.bio.seq.io.ParseException: Bad dbxref found: ATCC > > (inhost):125990, accession:AA325485 > > at > > org.biojavax.bio.seq.io.GenbankFormat.readRichSequence( > GenbankFormat.java:438) > > at > > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence( > RichStreamReader.java:109) > > ... 5 more > > Java Result: -1 > > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.2.2 (GNU/Linux) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org > > iD8DBQFE4DV+4C5LeMEKA/QRAtrTAKCjNFnkmhAF52LhvrpyurnRToe0LACgiEUs > GUmVcpkdByVWADCXvfKCsYE= > =ZBlJ > -----END PGP SIGNATURE----- > -- Best Regards, Seth Johnson Senior Bioinformatics Associate Ph: (202) 470-0900 Fx: (775) 251-0358 From mark.schreiber at novartis.com Tue Aug 15 01:18:08 2006 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Tue, 15 Aug 2006 09:18:08 +0800 Subject: [Biojava-l] Alignment objects Message-ID: Hi Nathan - You are on the right track, almost. The alphabet of the alignment is PROTEIN x PROTEIN (possibly it is PROTEIN-TERM x PROTEIN-TERM). 
PROTEIN-TERM is the same as protein but contains a * symbol to represent a translated stop codon. Useful if someone translates the wrong reading frame.

Thus the gap symbol of your alignment is gap x gap, or ([] []) as you found. The first symbol of your alignment is ([] Ala). The reason you find nothing with the gap symbol of the alignment is that there are no columns with only gaps. Where a gap does occur, it is always gap x something or something x gap.

To check for gaps in columns you could iterate, like you have done, over each individual sequence. In this case you would need to check for the gap symbol from the alphabet PROTEIN-TERM, or equivalently the gap symbol of the Alphabet of one of the SymbolLists from the alignment (specifically the one you are checking).

You could also make ambiguity symbols from the Alignment alphabet that contain gaps: ([] X), gap with anything; (X []), anything with gap; and ([] []), gap with gap, which is the gap symbol of the Alignment. This approach is faster, but for larger alignments it requires more Symbols to check. It would be pretty easy to construct them recursively though.

Hope this helps,

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)
Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com
phone +65 6722 2973
fax +65 6722 2910

"Nathan S. Haigh"
08/14/2006 04:00 PM
To: mark.schreiber at novartis.com
cc:
Subject: Re: [Biojava-l] Alignment objects

Hi Mark - this doesn't seem to be working as I'd expected/hoped.
Let me just recap what I've got so far: I create an alignment (for testing purposes) like this: String alnString = ">seq1\n" + "----FGHIKLMNPQRST\n" + ">seq2\n" + "ACDEFGHIKLMNPQRST\n"; BufferedReader br = new BufferedReader(new StringReader(alnString)); FastaAlignmentFormat faf = new FastaAlignmentFormat(); alignment = faf.read( br ); I loop over columns of the alignment and test if there are any gaps in the column, I have shown 2 alternative if statements which are supposed to test if a gap is present. One of these works (but is a bit of a hack) and the other (which seems like the correct way to do things) doesn't work: for (int col = 1; col <= alignment.length(); col++) { for (Iterator labels = alignment.getLabels().iterator(); labels.hasNext(); ) { Symbol sym = alignment.symbolAt(labels.next(),col); if (sym.getName().contains("[]")) { // this currently works if (sym.equals(alignment.getAlphabet().getGapSymbol())) { // this doesn't work // add this col to a Location object } } } If I do: System.out.println(alignment.getAlphabet().getGapSymbol()); I get: org.biojava.bio.symbol.SimpleBasisSymbol: ([] []) I'm unsure exactly what I'm supposed to get here, but I suspect that the gap symbol isn't getting set correctly when I create the alignment. I really want to use the getGapSymbol method of the alignment, since the alignment a user may load in practice could be either nucleotide or amino acid. Cheers Nathan mark.schreiber at novartis.com wrote: > Sorry, that should be getGapSymbol(). > > - Mark > > > > > > > Nathan Haigh > 08/11/2006 06:12 PM > Please respond to n.haigh > > > To: mark.schreiber at novartis.com > cc: > Subject: Re: [Biojava-l] Alignment objects > > > mark.schreiber at novartis.com wrote: > >> Hi - >> >> There is a difference between the gap returned by >> AlphabetManager.getGapSymbol and the gap returned by an >> alphabet.getGapSymbol(). 
>> There are some very complex reasons for this which could make up a large part of a thesis (literally, take a look at Matthew Pocock's thesis some time). Simply speaking, dynamic programming and HMMs wouldn't work without it.
>>
>> It becomes especially obvious when you have an alignment. The alphabet of an alignment of 3 DNA sequences is DNAxDNAxDNA. Thus a gap from that alphabet is really gap x gap x gap.
>>
>> Depending on what you are trying to do you would want to test for
>>
>> Symbol s == align.getAlphabet().getGap()
>>
>> or
>>
>> Symbol s == DNATools.getDNA().getGap().
>>
>> - Mark
>
> Is the getGap method part of the Biojava-live API but not the 1.4 API?
>
> Cheers
> Nath
>
> [ Attachment ''N.HAIGH.VCF'' removed by Mark Schreiber ]
>
> --
> A: Yes.
>> Q: Are you sure?
>>> A: Because it reverses the logical flow of conversation.
>>>> Q: Why is top posting frowned upon?

From mark.schreiber at novartis.com Tue Aug 15 10:02:13 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 15 Aug 2006 18:02:13 +0800
Subject: [Biojava-l] Alignment objects
Message-ID:

>OK, so this is where I'm going wrong - I thought a symbol was 1 residue/amino acid/gap....why is this not the case? it seems rather counter intuitive to ask for the gap symbol for an alignment and be returned [] x n, where n=the number of sequences in the alignment.

It's all really down to inheritance and the object model. Alignments implement SymbolList, which makes sense really even though it is not intuitive. An alignment is really a sequence of Symbols that is N deep. Its alphabet therefore must be N deep. It is a compound Alphabet. This allows for enormous flexibility. It is perfectly possible to make an alignment which is ((DNAxDNAxDNA)xPROTEIN-TERM) which is an alignment of a codon alphabet to a protein.
You can even do it as ((DNAxDNAxDNA)x(DNAxDNAxDNA)xPROTEIN-TERM) which is two DNA sequences in codon alphabet against a single protein!! A gap symbol from any alphabet is really N gaps where N is the number of Alphabets (typically 1 as most Alphabets are not compound). Unfortunately this very flexible design is, as you say, not intuitive.

Why would we even want this (apart from the trivial DNA to protein example)? The main reason comes from HMMs. The result of a pairwise alignment using an HMM is a 3 part alignment of the query, the state path and the match. The query and match are probably from the same Alphabet but the state path is a SymbolList made from the Alphabet of states in the HMM. Similarly the alignment of a profile HMM to a protein is an alignment of the HMM alphabet to the protein Alphabet.

>I may be getting a little confused because I'm new to this, and i have in mind what i would like to do, think it should be straight forward but am finding it not so easy. How then, do I get the gap symbol e.g. [] from an alignment (which may be DNA, RNA or protein) so I can check if any column in the alignment contains 1 or more gaps?

The reason why there cannot be a gap symbol that would work for all occasions is that you can ask multiple questions of gaps in an alignment. E.g. is position i composed only of gaps? Does sequence j at position i in the alignment contain a gap? Is there any gap in position i?

To answer the sequence j position i example the best (only really) way is to get the symbol for position i (with label j) and test if it is equal to j.getAlphabet().getGapSymbol(), the gap symbol of that sequence's own alphabet. To answer your question it is simply a matter of performing the same operation at position i for all SymbolLists at that position.
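
A minimal sketch of the per-sequence check described above, written as a plain-Java model rather than against the BioJava API. The class GapColumnSketch, the Row type and the method names are hypothetical, and each row simply carries a char standing in for the gap symbol of its own alphabet. In BioJava terms the comparison would be between alignment.symbolAt(label, col) and the gap symbol of the alphabet of that row's SymbolList, as quoted elsewhere in this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of the per-sequence gap test; this is not BioJava code. */
class GapColumnSketch {

    /** Hypothetical stand-in for one labelled row plus the gap symbol of its own alphabet. */
    static final class Row {
        final String symbols;
        final char gap;

        Row(String symbols, char gap) {
            this.symbols = symbols;
            this.gap = gap;
        }
    }

    /** Returns the 1-based columns in which at least one row holds its own gap symbol. */
    static List<Integer> columnsWithGaps(Map<String, Row> alignment, int length) {
        List<Integer> hits = new ArrayList<>();
        for (int col = 1; col <= length; col++) {
            for (Row row : alignment.values()) {
                // Compare against the gap of this row's alphabet,
                // not against a single gap symbol for the whole alignment.
                if (row.symbols.charAt(col - 1) == row.gap) {
                    hits.add(col);
                    break; // one gap is enough to flag the column
                }
            }
        }
        return hits;
    }

    /** Builds the two-sequence test alignment from this thread and scans it. */
    static List<Integer> demo() {
        Map<String, Row> alignment = new LinkedHashMap<>();
        alignment.put("seq1", new Row("----FGHIKLMNPQRST", '-'));
        alignment.put("seq2", new Row("ACDEFGHIKLMNPQRST", '-'));
        return columnsWithGaps(alignment, 17);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [1, 2, 3, 4]
    }
}
```

Because every row is tested against its own gap, the same loop keeps working when the rows come from different alphabets, which is the case a single alignment-wide gap symbol cannot cover.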
>I am also thinking of using distributions for doing this job in a more generic approach so that any symbol could be used, but i think i might end up with the same problems as I have here, so I think I should try to figure out this "simpler" problem first.

There are methods in DistributionTools for calculating an array of Distributions for an Alignment. This is not efficient for your purposes because it also counts all the other residues and divides by the total (weighted by the background model). If you only want to find gaps that is a few more operations than you need. The approach above would be faster. Also this DistributionTools method will not work for alignments with more than one alphabet like the codon, protein one above (not a problem in your case but not generic).

>It seems to me, that if i could have alignment.getAlphabet().getGapSymbol() to return the same thing as would proteinTools.getAlphabet().getGapSymbol() then my problem would be solved.

This can't be done as you would have to assume that all the SymbolLists in the Alignment would have to be from the same Alphabet which is not required (and not even desirable, especially for HMMs).

- Mark

mark.schreiber at novartis.com wrote:
> Hi Nathan -
>
> You are on the right track, almost.
>
> The alphabet of the alignment is PROTEIN x PROTEIN (possibly it is PROTEIN-TERM x PROTEIN-TERM). PROTEIN-TERM is the same as protein but contains a * symbol to represent a translated stop codon. Useful if someone translates the wrong reading frame.
>
> Thus the gap symbol of your alignment is gapxgap or [] [] as you found. The first symbol of your alignment is ([] Ala). The reason you find nothing with the gap symbol of the alignment is that there are no columns with only gaps. It is always gap x something or something x gap. To check for gaps in columns you could iterate like you have done with each individual sequence.
> In this case you would need to check for the gap symbol from the alphabet PROTEIN-TERM, or equivalently the gap symbol of the Alphabet of one of the SymbolLists from the alignment (specifically the one you are checking).
>
> You could also make ambiguity symbols from the Alignment alphabet that contain gaps: ([] X) gap with anything, (X []) anything with gap, and ([] []) gap with gap, i.e. the gap symbol of the Alignment. This approach is faster but for larger alignments requires more Symbols to check. It would be pretty easy to construct them recursively though.
>
> Hope this helps,
>
> - Mark

From mark.schreiber at novartis.com Wed Aug 16 01:07:19 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Wed, 16 Aug 2006 09:07:19 +0800
Subject: [Biojava-l] Alignment objects
Message-ID:

You've got it!

There is actually one other way ...

If you have an alignment 3 deep for DNA you can create and test for the following symbols: ([], N, N), (N, [], N) and (N, N, []). These will find sites with exactly one gap. You can then make ([], [], N), (N, [], []), ([], N, []) and ([], [], []), which is the alignment gap symbol.

This means you can test each site of the alignment for each of the symbols. It requires some work up front to make the symbols (possibly recursion) but if you put them in a HashMap then you can simply iterate over each site in the Alignment and see if the DNAxDNAxDNA symbol returned is contained in the HashMap. This may actually be more efficient.

- Mark

"Nathan S. Haigh"
08/15/2006 08:17 PM
Please respond to n.haigh
To: mark.schreiber at novartis.com
cc:
Subject: Re: [Biojava-l] Alignment objects

mark.schreiber at novartis.com wrote:
>> OK, so this is where I'm going wrong - I thought a symbol was 1 residue/amino acid/gap....why is this not the case? it seems rather counter intuitive to ask for the gap symbol for an alignment and be returned [] x n, where n=the number of sequences in the alignment.
>
> It's all really down to inheritance and the object model. Alignments implement SymbolList, which makes sense really even though it is not intuitive. An alignment is really a sequence of Symbols that is N deep. Its alphabet therefore must be N deep. It is a compound Alphabet. This allows for enormous flexibility. It is perfectly possible to make an alignment which is ((DNAxDNAxDNA)xPROTEIN-TERM) which is an alignment of a codon alphabet to a protein. You can even do it as ((DNAxDNAxDNA)x(DNAxDNAxDNA)xPROTEIN-TERM) which is two DNA sequences in codon alphabet against a single protein!! A gap symbol from any alphabet is really N gaps where N is the number of Alphabets (typically 1 as most Alphabets are not compound). Unfortunately this very flexible design is, as you say, not intuitive.
>
> Why would we even want this (apart from the trivial DNA to protein example)? The main reason comes from HMMs. The result of a pairwise alignment using an HMM is a 3 part alignment of the query, the state path and the match. The query and match are probably from the same Alphabet but the state path is a SymbolList made from the Alphabet of states in the HMM. Similarly the alignment of a profile HMM to a protein is an alignment of the HMM alphabet to the protein Alphabet.
>
>> I may be getting a little confused because I'm new to this, and i have in mind what i would like to do, think it should be straight forward but am finding it not so easy. How then, do I get the gap symbol e.g. [] from an alignment (which may be DNA, RNA or protein) so I can check if any column in the alignment contains 1 or more gaps?
>
> The reason why there cannot be a gap symbol that would work for all occasions is that you can ask multiple questions of gaps in an alignment. E.g. is position i composed only of gaps? Does sequence j at position i in the alignment contain a gap? Is there any gap in position i?
>
> To answer the sequence j position i example the best (only really) way is to get the symbol for position i (with label j) and test if it is equal to j.getAlphabet().getGapSymbol(). To answer your question it is simply a matter of performing the same operation at position i for all SymbolLists at that position.
>
>> I am also thinking of using distributions for doing this job in a more generic approach so that any symbol could be used, but i think i might end up with the same problems as I have here, so I think I should try to figure out this "simpler" problem first.
>
> There are methods in DistributionTools for calculating an array of Distributions for an Alignment. This is not efficient for your purposes because it also counts all the other residues and divides by the total (weighted by the background model). If you only want to find gaps that is a few more operations than you need. The approach above would be faster. Also this DistributionTools method will not work for alignments with more than one alphabet like the codon, protein one above (not a problem in your case but not generic).
>
>> It seems to me, that if i could have alignment.getAlphabet().getGapSymbol() to return the same thing as would proteinTools.getAlphabet().getGapSymbol() then my problem would be solved.
>
> This can't be done as you would have to assume that all the SymbolLists in the Alignment would have to be from the same Alphabet which is not required (and not even desirable, especially for HMMs).
>
> - Mark

Fantastic! Thanks for taking the time to explain this to me - it does make much more sense to do things this way now I understand things a little better.

So, effectively, an alignment may be made from sequences that are composed of symbols from different alphabets.
The solution I have is that I loop over the positions in the alignment, and then over the labels for that position, get the gap symbol for each label using:

    alignment.symbolListForLabel(label).getAlphabet().getGapSymbol()

and then test if this symbol is the same as the symbol at position i for label j. :o)

I think the main reason for my confusion is that I'm trying to make a move from Bioperl to Biojava, and Bioperl has an alphabet for the whole alignment, and therefore has a restriction that an alignment can only be comprised of sequences from the same alphabet.

Thanks very much for your help!
Nathan

From dms700 at gmail.com Wed Aug 16 13:04:13 2006
From: dms700 at gmail.com (dmitriy)
Date: Wed, 16 Aug 2006 09:04:13 -0400
Subject: [Biojava-l] ??? extracting introns sequences for transcripts using java API for ensembl ???
In-Reply-To: <299614de0608151646t57eb2653m519c83b049607afd@mail.gmail.com>
References: <299614de0608151646t57eb2653m519c83b049607afd@mail.gmail.com>
Message-ID: <299614de0608160604u3a2ec573nc006033899883240@mail.gmail.com>

Hi

I'm trying to use the ensembl java API to extract info on the five prime UTR, exons, introns and three prime UTR for transcripts corresponding to a particular NM_xxxxx ref seq. Unfortunately it looks like I am using the API incorrectly to get the intron info. The following is the type of code I try to use to get intron info.

----------------------------
int exon1EndOffsetRelativeToGeneStart =
    ((Exon) transcript.getExons().get(0)).getLocation().getEnd()
    - gene.getLocation().getStart();
int exon2StartOffsetRelativeToGeneStart =
    ((Exon) transcript.getExons().get(1)).getLocation().getStart()
    - gene.getLocation().getStart();
String intron1 = gene.getSequence().getString().substring(
    exon1EndOffsetRelativeToGeneStart + 1,
    exon2StartOffsetRelativeToGeneStart);
------------------------------

This code works for "ENST00000275493" EGFR NM_005228.3, but does not work for many, if not the vast majority, of genes.
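
One possible cause, offered here as an assumption that nothing in this thread confirms, is that for reverse-strand genes the exons come back in transcript order, so exon 0 has the highest genomic coordinates and the two offsets above end up in the wrong order. The sketch below is plain Java and does not use the Ensembl API: the class IntronSketch and its methods are hypothetical, exons are {start, end} pairs in 1-based inclusive genomic coordinates, and the gene sequence is assumed to be given in forward-strand orientation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Plain-Java sketch of deriving intron coordinates from exon coordinates. */
class IntronSketch {

    /** Returns the introns between the given exons, in ascending genomic order. */
    static List<int[]> introns(List<int[]> exons) {
        List<int[]> sorted = new ArrayList<>(exons);
        // Sort by genomic start so the result does not depend on transcript (strand) order.
        sorted.sort(Comparator.comparingInt((int[] e) -> e[0]));
        List<int[]> result = new ArrayList<>();
        for (int i = 0; i + 1 < sorted.size(); i++) {
            int start = sorted.get(i)[1] + 1;   // first base after this exon
            int end = sorted.get(i + 1)[0] - 1; // last base before the next exon
            if (start <= end) {                 // skip exons that abut each other
                result.add(new int[] {start, end});
            }
        }
        return result;
    }

    /** Cuts one intron out of a forward-strand gene sequence whose first base is geneStart. */
    static String slice(String geneSeq, int geneStart, int[] intron) {
        return geneSeq.substring(intron[0] - geneStart, intron[1] - geneStart + 1);
    }

    public static void main(String[] args) {
        // Exons listed in reverse-strand transcript order (highest coordinates first).
        List<int[]> exons = Arrays.asList(new int[] {21, 30}, new int[] {11, 14}, new int[] {1, 5});
        String gene = "AAAAACCCCCGGGGTTTTTTAAAAAAAAAA"; // made-up 30-base sequence
        for (int[] intron : introns(exons)) {
            System.out.println(intron[0] + "-" + intron[1] + " " + slice(gene, 1, intron));
        }
    }
}
```

For a reverse-strand transcript the sliced sequence would still need to be reverse-complemented to read in transcript direction; that step is left out here.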
I would greatly appreciate info on the correct way of getting intron info for a transcript.

Thank you
Dmitriy

From k_stellar at msn.com Wed Aug 30 20:34:14 2006
From: k_stellar at msn.com (K.R. Carter)
Date: Wed, 30 Aug 2006 16:34:14 -0400
Subject: [Biojava-l] SCF file wont load from URL
Message-ID: <5d9376b50608301334m6e3288ffr1336cf68e1cbe08c@mail.gmail.com>

Hello,

I am trying to load an SCF file by using the input stream from a URL and it will not load. Does anyone know what might be happening? My program doesn't give an error, it just completely freezes. I am using the latest (I think) version of the SCF class.

/**
 * A {@link org.biojava.bio.chromatogram.Chromatogram} as loaded from an
 * SCF v2 or v3 file. Also loads and exposes the SCF format's "private data"
 * and "comments" sections. The quality values from the SCF are stored as
 * additional sequences on the base call alignment. The labels are the
 * PROB_* constants in this class.
 * The values are {@link org.biojava.bio.symbol.IntegerAlphabet.IntegerSymbol}
 * objects in the range 0 to 255.
 *
 *
 * @author Rhett Sutphin (UI CBCB)
 */

Any help would be greatly appreciated.

Thanks!

From mark.schreiber at novartis.com Thu Aug 31 07:01:32 2006
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Thu, 31 Aug 2006 15:01:32 +0800
Subject: [Biojava-l] SCF file wont load from URL
Message-ID:

Hi -

This sounds very strange. Is there any stack trace? Could you possibly post the code that recreates the problem?

- Mark

"K.R. Carter"
Sent by: biojava-l-bounces at lists.open-bio.org
08/31/2006 04:34 AM
Please respond to kikia.reneese
To: biojava-l at biojava.org
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-l] SCF file wont load from URL

Hello,

I am trying to load an scf file by using the input stream from a url and it will not load. Does anyone know what might be happening? My program doesnt give an error, it just completely freezes. I am using the latest ( i think) version of SCF class.
/**
 * A {@link org.biojava.bio.chromatogram.Chromatogram} as loaded from an
 * SCF v2 or v3 file. Also loads and exposes the SCF format's "private data"
 * and "comments" sections. The quality values from the SCF are stored as
 * additional sequences on the base call alignment. The labels are the
 * PROB_* constants in this class.
 * The values are {@link org.biojava.bio.symbol.IntegerAlphabet.IntegerSymbol}
 * objects in the range 0 to 255.
 *
 *
 * @author Rhett Sutphin (UI CBCB)
 */

any help would be greatly appreciated.

Thanks!

_______________________________________________
Biojava-l mailing list - Biojava-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-l

From ady at sanger.ac.uk Thu Aug 31 08:47:51 2006
From: ady at sanger.ac.uk (Andy Yates)
Date: Thu, 31 Aug 2006 09:47:51 +0100
Subject: [Biojava-l] SCF file wont load from URL
In-Reply-To:
References:
Message-ID: <44F6A237.1060302@sanger.ac.uk>

That sounds like HTTP proxy problems in my book. Try looking at this page:

http://mindprod.com/jgloss/proxy.html

The main thing to take home is to try setting the system properties:

    proxySet=true
    http.proxyHost=proxyHostName
    http.proxyPort=proxyHostPort

You can do this programmatically using the System.setProperty() method or with -DpropertyName=propertyValue from the command line.

Hope that helps,

Andy Yates

mark.schreiber at novartis.com wrote:
> Hi -
>
> This sounds very strange. Is there any stack trace? Could you possibly post the code that recreates the problem?
>
> - Mark
>
> "K.R. Carter"
> Sent by: biojava-l-bounces at lists.open-bio.org
> 08/31/2006 04:34 AM
> Please respond to kikia.reneese
>
> To: biojava-l at biojava.org
> cc: (bcc: Mark Schreiber/GP/Novartis)
> Subject: [Biojava-l] SCF file wont load from URL
>
> Hello,
>
> I am trying to load an scf file by using the input stream from a url and it will not load. Does anyone know what might be happening? My program doesnt give an error, it just completely freezes.
> I am using the latest ( i think) version of SCF class.
>
> /**
>  * A {@link org.biojava.bio.chromatogram.Chromatogram} as loaded from an
>  * SCF v2 or v3 file. Also loads and exposes the SCF format's "private data"
>  * and "comments" sections. The quality values from the SCF are stored as
>  * additional sequences on the base call alignment. The labels are the
>  * PROB_* constants in this class.
>  * The values are {@link org.biojava.bio.symbol.IntegerAlphabet.IntegerSymbol}
>  * objects in the range 0 to 255.
>  *
>  *
>  * @author Rhett Sutphin (UI CBCB)
>  */
>
> any help would be greatly appreciated.
>
> Thanks!
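
A minimal sketch of the programmatic route mentioned above, using only java.net from the standard library. The class ProxyStreamSketch, the proxy host and port, and the URL are placeholders, not values from this thread. The sketch sets only http.proxyHost and http.proxyPort and, as an addition that was not suggested in the thread, puts connect and read timeouts on the connection so that a stalled transfer throws an exception instead of freezing the program.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;

/** Sketch of opening a URL stream through an HTTP proxy with timeouts. */
class ProxyStreamSketch {

    /** Sets the JVM-wide HTTP proxy properties. */
    static void useHttpProxy(String host, int port) {
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", Integer.toString(port));
    }

    /** Opens the URL with connect and read timeouts so a stalled connection fails instead of hanging. */
    static InputStream open(URL url, int timeoutMillis) throws IOException {
        URLConnection conn = url.openConnection();
        conn.setConnectTimeout(timeoutMillis);
        conn.setReadTimeout(timeoutMillis);
        return conn.getInputStream();
    }

    public static void main(String[] args) throws IOException {
        useHttpProxy("proxy.example.org", 8080);                    // placeholder proxy
        URL url = URI.create("http://example.org/trace.scf").toURL(); // placeholder URL
        try (InputStream in = open(url, 10_000)) {
            System.out.println("first byte: " + in.read());
        }
    }
}
```

The same two properties can be passed on the command line as -Dhttp.proxyHost=... and -Dhttp.proxyPort=..., and the stream returned by open() can then be handed to whatever SCF loading code is already in use.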