From ayates at ebi.ac.uk Sat Feb 6 10:12:11 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Sat, 6 Feb 2010 15:12:11 +0000
Subject: [Biojava-dev] Code Update
In-Reply-To: <4B60244F.2070603@ebi.ac.uk>
References: <5DC3EABB-E571-4D23-BDA7-D74A0CCAD804@scripps.edu> <4B60244F.2070603@ebi.ac.uk>
Message-ID:

Finally it's in. I've managed to get enough time to finish the transcription code off. The main features of this check-in are:

* DNA -> RNA -> Codon -> Peptide translation
* Support for all IUPAC tables
* New views for reversing sequences & complementing them
* Windowed sequence views giving portions of a sequence as requested
* TranscriptionEngine & TranscriptionEngine.Builder deal with the business of assembling the classes together as required
* Singletons provided from the classes they are in (e.g. IUPACParser has one) *but* no class requires a singleton!
* Utilities for working with IO streams & classpath resources (useful for testing)
* Test case shows 1000 translations of BRCA2 (from DNA) in 0.7 seconds (on my MacBook Pro; YMMV); test case will break if it takes longer than a second
** This is a vast improvement over my first attempt that had a rate of 1 per second hence why that was not checked in

Limitations are:

* Not much checking WRT lengths of sequence given to the code; need a strict & lenient mode
* Stop codon trimming controlled by a boolean
* No init-met translation (very important as some programs get a bit annoyed if they're given a V as an initiator AA)
* Not sure if there is a way to ask if a codon is a start codon easily; I'm sure it can be done just not as easily as we may want
* No way of modifying a badly translated peptide which we expect to badly translate

However it's workable & means if you have a DNASequence technically you can get a peptide by saying:

    DNASequence s = getSeq();
    ProteinSequence p = s.getRNASequence().getProteinSequence();

Now how's that for easy :)

Share & enjoy!
Andy

From markjschreiber at gmail.com Sat Feb 6 10:45:17 2010
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Sat, 6 Feb 2010 23:45:17 +0800
Subject: [Biojava-dev] Code Update
In-Reply-To:
References: <5DC3EABB-E571-4D23-BDA7-D74A0CCAD804@scripps.edu> <4B60244F.2070603@ebi.ac.uk>
Message-ID: <93b45ca51002060745x30c2cd8bu3226732b64f354fa@mail.gmail.com>

Hi Andy,

Great work!

Regarding the issue of non-Met start codons such as TTG, biologically speaking they are still translated as Met. fMet is the only tRNA that can initiate. Presumably there is some flexibility in the recognition of the start codon.

Maybe this is what you were aiming for, but it would be good to have the option of making the first codon translate to Met no matter what the codon.

Mark

From andreas at sdsc.edu Sat Feb 6 17:08:48 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sat, 6 Feb 2010 14:08:48 -0800
Subject: [Biojava-dev] Code Update
In-Reply-To:
References: <5DC3EABB-E571-4D23-BDA7-D74A0CCAD804@scripps.edu> <4B60244F.2070603@ebi.ac.uk>
Message-ID: <59a41c431002061408i1958f79cm6c70c56fcb941e63@mail.gmail.com>

Looks good, Andy!

Andreas

On Sat, Feb 6, 2010 at 7:12 AM, Andy Yates wrote:

> Finally it's in. I've managed to get enough time to finish the
> transcription code off.
>
> Andy
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>

From ayates at ebi.ac.uk Sat Feb 6 19:27:18 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Sun, 7 Feb 2010 00:27:18 +0000
Subject: [Biojava-dev] Code Update
In-Reply-To: <93b45ca51002060745x30c2cd8bu3226732b64f354fa@mail.gmail.com>
References: <5DC3EABB-E571-4D23-BDA7-D74A0CCAD804@scripps.edu> <4B60244F.2070603@ebi.ac.uk> <93b45ca51002060745x30c2cd8bu3226732b64f354fa@mail.gmail.com>
Message-ID: <05DB0C6D-21D4-41EB-A2C5-36B460B54204@ebi.ac.uk>

Hi Mark,

I thought that there had been some cases of valine in bacteria not being modified, but that said it's probably me misunderstanding what the other person was saying :). However, there is the inflexibility in other resources/programs, which can throw the toys out of the pram the moment they are presented with a peptide not starting with M.

What I was thinking is to either have this option available (much in the same way I've got stop codon trimming working) or use the planned edit capability. If I do put the optional translation in as a boolean for the moment, then that bit becomes a bit more feature complete and a better solution can be applied later on :).

Anyway, next on the hit list I think is locations ....

Andy

On 6 Feb 2010, at 15:45, Mark Schreiber wrote:

> Hi Andy,
>
> Great work!
>
> Regarding the issue of non Met start codons such as TTG,
> biologically speaking they are still translated as Met. fMet is the
> only tRNA that can initiate. Presumably there is some flexibility in
> the recognition of the start codon.
>
> Maybe this is what you were aiming for but it would be good to have
> the option of making the first codon translate to Met no matter what
> the codon.
>
> Mark
>
>> On 06-Feb-2010 11:13 PM, "Andy Yates" wrote:
>>
>> Finally it's in. I've managed to get enough time to finish the
>> transcription code off.
-- 
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/

From jolyon.holdstock at ogt.co.uk Wed Feb 17 12:43:05 2010
From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock)
Date: Wed, 17 Feb 2010 17:43:05 -0000
Subject: [Biojava-dev] Alignment viewer
Message-ID: <588D0DD225D05746B5D8CAE1BE971F3F02D2DF60@EUCLID.internal.ogtip.com>

Hi,

Following on from the query posted about viewing an alignment, I've added a recipe to the cookbook.

I hope it's OK, but if anyone has any suggestions please let me know.

Cheers,

J

Dr. Jolyon Holdstock
Senior Computational Biologist,
Oxford Gene Technology,
Begbroke Science Park,
Sandy Lane, Yarnton,
Oxford, OX5 1PF, UK.
T: +44 (0)1865 856 852
F: +44 (0)1865 842 116
E: jolyon.holdstock at ogt.co.uk
W: www.ogt.co.uk

From HWillis at scripps.edu Fri Feb 19 13:32:29 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Fri, 19 Feb 2010 13:32:29 -0500
Subject: [Biojava-dev] Netbeans Maven
Message-ID: <4A02584C-C278-4FA9-94B5-8F663D9D0891@scripps.edu>

I saw this link in Netbeans that goes over using Maven in Netbeans. Looks to be feature complete.

http://wiki.netbeans.org/MavenBestPractices

Thanks

Scooter Willis

From HWillis at scripps.edu Fri Feb 19 14:30:13 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Fri, 19 Feb 2010 14:30:13 -0500
Subject: [Biojava-dev] List vs LinkedHashMap
Message-ID: <0FD1F27D-3577-48C9-A34E-C41191130C44@scripps.edu>

I am starting to use the new FastaReader in a project and the default implementation I set up returns a List. The very next thing I needed to do was convert to a LinkedHashMap so I could query the sequence of interest. It would seem that this is probably a fairly standard use case. If I returned a LinkedHashMap as the default container then we have a slight memory hit from keeping a hash of the accession ID and a linked list for preserving order.

Does anyone have objections to returning the sequences read from a Fasta file as a LinkedHashMap?

Thanks

Scooter

From ayates at ebi.ac.uk Fri Feb 19 15:29:56 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Fri, 19 Feb 2010 20:29:56 +0000
Subject: [Biojava-dev] List vs LinkedHashMap
In-Reply-To: <0FD1F27D-3577-48C9-A34E-C41191130C44@scripps.edu>
References: <0FD1F27D-3577-48C9-A34E-C41191130C44@scripps.edu>
Message-ID: <01E00799-939D-426E-9157-E0ED65BBB864@ebi.ac.uk>

Can't see any reason why not; after all, to get a collection back it's just a case of calling values() on the Map. The only other thing I would say is that I would prefer returning a Map rather than a LinkedHashMap, but if there's a good reason why you want to return the concrete class then that's fine :).
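For what it's worth, here is a minimal sketch of that idea in plain Java. FastaRecord and FastaIndex are made-up stand-ins for illustration, not classes from the BioJava code base. The method is declared to return the Map interface while using a LinkedHashMap underneath so file order is kept, and the old collection-style view is just values():

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a parsed FASTA record; not the BioJava ProteinSequence class.
final class FastaRecord {
    final String accession;
    final String residues;

    FastaRecord(String accession, String residues) {
        this.accession = accession;
        this.residues = residues;
    }
}

// Hypothetical helper showing the List -> Map conversion discussed in this thread.
final class FastaIndex {
    // Index records by accession. The LinkedHashMap keeps insertion (file) order,
    // but callers only ever see the Map interface. If an accession repeats,
    // the later record replaces the earlier one.
    static Map<String, FastaRecord> byAccession(List<FastaRecord> records) {
        Map<String, FastaRecord> index = new LinkedHashMap<>();
        for (FastaRecord record : records) {
            index.put(record.accession, record);
        }
        return index;
    }

    // The collection view is simply the map's values, still in file order.
    static List<FastaRecord> asList(Map<String, FastaRecord> index) {
        return new ArrayList<>(index.values());
    }
}
```

Returning Map keeps the door open for swapping the implementation later without touching callers.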
I've done some work as well to clean up the generics situation in these, which I hope is okay. If you commit your changes then I'll commit mine & you can see what you think.

Andy

-- 
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/

From holland at eaglegenomics.com Fri Feb 19 14:51:13 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Sat, 20 Feb 2010 08:51:13 +1300
Subject: [Biojava-dev] List vs LinkedHashMap
In-Reply-To: <0FD1F27D-3577-48C9-A34E-C41191130C44@scripps.edu>
References: <0FD1F27D-3577-48C9-A34E-C41191130C44@scripps.edu>
Message-ID: <9570AD34-C620-4F46-A2AC-DD66CD5F4196@eaglegenomics.com>

Depends on whether or not you want to parse-at-once or stream-parse. If the parser is set up to load the whole lot at once, then a map is fine, otherwise not.

-- 
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From HWillis at scripps.edu Fri Feb 19 15:53:24 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Fri, 19 Feb 2010 15:53:24 -0500
Subject: [Biojava-dev] List vs LinkedHashMap
In-Reply-To: <9570AD34-C620-4F46-A2AC-DD66CD5F4196@eaglegenomics.com>
References: <0FD1F27D-3577-48C9-A34E-C41191130C44@scripps.edu> <9570AD34-C620-4F46-A2AC-DD66CD5F4196@eaglegenomics.com>
Message-ID: <4343DB95-1642-4933-A361-1DEDAF6A1CAC@scripps.edu>

Richard

For stream parsing I have abstracted that down to a proxy data structure that looks just like ArrayListSequenceBackingStore but keeps an offset token into the file stream, which makes sense for loading very large files without actually keeping everything in memory. You pay the price once to get the header information, the offset into the stream/file of the start of the sequence, and the length. Then if the user makes a call to the Sequence to get either the actual sequence data or a subsequence, the required sequence is loaded from the stream/file. This doesn't make sense for slow IO-bound streams where the load penalty would be high, but it does work well for file IO via RandomAccessFile seek, which is how it is currently implemented.
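As a rough illustration of the offset idea (a sketch only, not the actual SequenceFileProxyLoader code): scan the file once, remember for each header where its sequence block starts and how long it is, then seek back on demand. The class name FastaOffsetIndex and its methods are hypothetical, and the sketch assumes a single-byte encoding such as ASCII and sequence blocks smaller than 2 GB.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of an offset index over a FASTA file; not BioJava code.
final class FastaOffsetIndex {
    private final File file;
    // header -> {byte offset of the sequence block, byte length of the block}
    private final Map<String, long[]> spans = new LinkedHashMap<>();

    // One pass over the file: only headers and offsets are kept in memory.
    FastaOffsetIndex(File file) throws IOException {
        this.file = file;
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            String header = null;
            long blockStart = 0; // where the current record's residues begin
            long lineStart = 0;  // where the line about to be read begins
            String line;
            while ((line = raf.readLine()) != null) {
                if (line.startsWith(">")) {
                    if (header != null) {
                        // The previous block ends where this header line starts.
                        spans.put(header, new long[] {blockStart, lineStart - blockStart});
                    }
                    header = line.substring(1).trim();
                    blockStart = raf.getFilePointer();
                }
                lineStart = raf.getFilePointer();
            }
            if (header != null) {
                spans.put(header, new long[] {blockStart, raf.length() - blockStart});
            }
        }
    }

    // Seek to the recorded offset and read just that record's residues.
    // Returns null when the header is unknown.
    String load(String header) throws IOException {
        long[] span = spans.get(header);
        if (span == null) {
            return null;
        }
        byte[] block = new byte[(int) span[1]];
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek(span[0]);
            raf.readFully(block);
        }
        // Drop the line breaks that FASTA inserts inside a sequence.
        return new String(block, StandardCharsets.US_ASCII).replaceAll("\\s+", "");
    }
}
```

RandomAccessFile.readLine() is unbuffered, so the indexing pass in this sketch is slow on big files; a real loader would want a buffered scan that still tracks byte offsets.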
If you have a fasta file with 1GB of data but only plan on selecting 10 sequences, and don't know what those 10 sequences are at load time, then this works well.

This also allows you to load a large genome or genome scaffold file and, by implementing the details in SequenceFileProxyLoader, access sequence data without loading the entire genome into memory. Here are two approaches to loading the same file, found in FastaReader.java. The first FastaReader passes in a ProteinSequenceCreator that will handle the creation of the actual protein sequence and the storage. The second test case uses FileProxyProteinSequence, where you need to pass in a reference to the File, and as the initial file is parsed once it simply keeps track of the locations. The actual ProteinSequence that gets created is a ProteinSequence where the store is a SequenceArrayListProxyLoader instead of ArrayListSequenceBackingStore. I have put together but haven't checked in a FastReaderHelper class with static methods to hide this detail from someone who simply wants to load a Fasta file.

    // Approach 1: read every sequence into memory
    String inputFile = "src/test/resources/PF00104_small.fasta";
    FileInputStream is = new FileInputStream(inputFile);

    FastaReader fastaReader = new FastaReader(is, new GenericFastaHeaderParser(), new ProteinSequenceCreator(AminoAcidCompoundSet.getAminoAcidCompoundSet()));
    Collection proteinSequences = fastaReader.process();
    is.close();

    System.out.println(proteinSequences);

    // Approach 2: keep only headers and file locations, load sequence data on request
    File file = new File(inputFile);
    FastaReader fastaProxyReader = new FastaReader(file, new GenericFastaHeaderParser(), new FileProxyProteinSequenceCreator(file, AminoAcidCompoundSet.getAminoAcidCompoundSet()));
    Collection proteinProxySequences = fastaProxyReader.process();

    System.out.println(proteinProxySequences);

So in the current approach I would always be able to return a collection with knowledge of the header and a sequence that will either have the sequence data or know how to get it.
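The "have the data or know how to get it" part is essentially a lazy proxy. A minimal sketch of that pattern, assuming nothing about the real classes (SequenceLoader and LazySequence are hypothetical names, not the BioJava proxy loader API, and the 1-based inclusive coordinates are an assumption of the sketch):

```java
// Hypothetical loader contract: fetch residues for an accession from a file,
// a database, or a remote service.
interface SequenceLoader {
    String load(String accession);
}

// Hypothetical sequence object that defers loading until residues are needed.
final class LazySequence {
    private final String accession;
    private final SequenceLoader loader;
    private String residues; // stays null until the first request

    LazySequence(String accession, SequenceLoader loader) {
        this.accession = accession;
        this.loader = loader;
    }

    // Header-level information is available without touching the loader.
    String getAccession() {
        return accession;
    }

    // The first call goes to the loader; later calls reuse the cached residues.
    synchronized String getSequenceAsString() {
        if (residues == null) {
            residues = loader.load(accession);
        }
        return residues;
    }

    // Subsequence using 1-based, inclusive coordinates.
    String getSubSequence(int start, int end) {
        return getSequenceAsString().substring(start - 1, end);
    }
}
```

This sketch caches the whole sequence after the first request; a file-backed loader could instead read only the requested range on each call.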
The same concept works for creating a ProteinSequence backed by a UniprotProxyProteinSequenceLoader or NCBIProxyProteinSequenceLoader, where you only need to pass in the unique sequence id. The loader can fetch the sequence detail at the time the ProteinSequence is created or do it lazily when a request is made. This then extends back to genome views of DNASequence data, where you don't even need to have the genome locally: the appropriate genome sequence proxy loader would make a web services/REST call to the external server to retrieve the actual sequence or subsequence that is being requested.

If you look at the AccessionID class, I keep track of the type of accession id, based on either recognizing the Fasta file header type or allowing the user to set it, which will make working with features very powerful. If you know the accession id and the type of id then making a request to a DAS server, genome annotation server or Uniprot service to retrieve features is easy. I haven't done that code yet but it is next on my list for a project I am working on.

We also worked on building the proper biological relationships into the sequence classes, such that if you start with a DNA sequence and apply the various exon/intron features you can have a TranscriptSequence that can return a ProteinSequence. In the reverse direction you should be able to take a ProteinSequence with a valid accession id of a known type and retrieve the parent DNA sequence, if that linkage information is available via the appropriate web services/REST call. That is part of the design, but going in reverse is not implemented. You can start with a ChromosomeSequence and work your way down by adding introns and exons. Andy has worked hard on this code, which will make it really easy to use for programmers who don't know all the details.

It has been a month since the BioJava Hackathon and I am feeling guilty that I haven't taken the time to write any of this up. Writing code is the easy part; doing the documentation is always tough! I will see if this email generates a larger discussion on the list and, based on how everything shakes out, will turn the discussion into a wiki page giving a sequence design overview and code for testing and implementing other proxy loaders.

Thanks

Scooter Willis

From holland at eaglegenomics.com Sat Feb 20 00:28:31 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Sat, 20 Feb 2010 18:28:31 +1300
Subject: [Biojava-dev] List vs LinkedHashMap
In-Reply-To: <4343DB95-1642-4933-A361-1DEDAF6A1CAC@scripps.edu>
References: <0FD1F27D-3577-48C9-A34E-C41191130C44@scripps.edu> <9570AD34-C620-4F46-A2AC-DD66CD5F4196@eaglegenomics.com> <4343DB95-1642-4933-A361-1DEDAF6A1CAC@scripps.edu>
Message-ID: <2BCB480A-6CE6-4700-B210-25FE095CA95E@eaglegenomics.com>

sounds excellent!

On 20 Feb 2010, at 09:53, Scooter Willis wrote:

> Richard
>
> For Stream parsing I have abstracted that down to a proxy data structure that looks just like ArrayListSequenceBackingStore that can keep an offset token in file stream where this makes sense for loading very large files without actually keeping everything in memory. You pay the price once to get the header information and the offset into the stream/file of the start of the sequence and the length. Then if the user makes a call to the Sequence to get either the actual sequence data or a subsequence then the required sequence is loaded from the stream/file. This doesn't make sense for slow io bound streams where the load penalty would be high but does work well for file IO via RandomAccessFile seek and how it is currently implemented. If you have a fasta file with 1GB of data but only plan on selecting 10 sequences but don't know what those 10 sequences are at load time then this works well.
>>> >>> Thanks >>> >>> Scooter >>> _______________________________________________ >>> biojava-dev mailing list >>> biojava-dev at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> >> -- >> Richard Holland, BSc MBCS >> Operations and Delivery Director, Eagle Genomics Ltd >> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >> http://www.eaglegenomics.com/ >> > -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From ayates at ebi.ac.uk Sat Feb 20 05:47:40 2010 From: ayates at ebi.ac.uk (Andy Yates) Date: Sat, 20 Feb 2010 10:47:40 +0000 Subject: [Biojava-dev] List vs LinkedHashMap In-Reply-To: <4343DB95-1642-4933-A361-1DEDAF6A1CAC@scripps.edu> References: <0FD1F27D-3577-48C9-A34E-C41191130C44@scripps.edu> <9570AD34-C620-4F46-A2AC-DD66CD5F4196@eaglegenomics.com> <4343DB95-1642-4933-A361-1DEDAF6A1CAC@scripps.edu> Message-ID: <7532BACD-357E-4389-A072-20CE491BFC52@ebi.ac.uk> Hey guys, All the things that Scooter has done here I think is fantastic and the thought in the abstractions behind the loaders is really good. I'm especially liking the idea of being able to move into external resources for sequences. One thing that has always annoyed me is if I wanted to do some coding on a peptide sequence is having to download if from say UniProt/UniParc, save it into a file, read it & then do something. A store backing onto the large sequence repositories is a great win. I think the ones to target first for these systems are UniProt, eFetch & dbfetch. The last 2 are very important because they give us access to a huge number of databases from the single interface. One thing to remember when writing these classes is that the inbuilt HTTPConnection code with Java was always buggy and would leak sockets under some circumstances if you do not always read the out and the error streams. 
But I'm sure the IO utility code we've got can be modified to ensure we don't leak them :).

In terms of the stuff I've been doing, I've pushed everything into some lower-level classes, so in order to go from DNA to RNA you instantiate a class which can handle nucleotides. If you're in a DNASequence then there are already methods on there to go to RNA & from RNA to Protein. All the code which does this is held in other classes, so it's all design by composition rather than inheritance.

The way I'm currently imagining how you can move from one sequence to another is the registration of type-specific features and then offering these sub-structures using the SequenceView code. So if we had a Gene, the transcript could be defined by TransciptSequence.class & then when you request it we can send back a SequenceView with ExonSequence.class objects registered to give the Exons & well, I'm sure you can all see where it's going. One thing I can't handle ATMO is phases, so the code assumes everything starts in phase 1. For what we've got ATMO it's fine but later on this needs to be addressed.

I'm also feeling guilty but it's quite hard getting the time to get the code down, let alone documentation. So long as we make sure there are test cases available then we can see how to use the code as well for when we get round to documentation.

Andy

On 19 Feb 2010, at 20:53, Scooter Willis wrote:

> Richard
>
> For Stream parsing I have abstracted that down to a proxy data
> structure that looks just like ArrayListSequenceBackingStore that
> can keep an offset token in file stream where this makes sense for
> loading very large files without actually keeping everything in
> memory. You pay the price once to get the header information and the
> offset into the stream/file of the start of the sequence and the
> length. Then if the user makes a call to the Sequence to get either
> the actual sequence data or a subsequence then the required sequence
> is loaded from the stream/file.
> This doesn't make sense for slow, IO-bound streams where the load
> penalty would be high but does work well for file IO via
> RandomAccessFile seek, which is how it is currently implemented.
> If you have a fasta file with 1GB of data but only plan
> on selecting 10 sequences but don't know what those 10 sequences are
> at load time then this works well.
>
> This also allows you to load a large genome or genome scaffold file
> and, by implementing the details in SequenceFileProxyLoader, access
> sequence data without loading the entire genome into memory. Here
> are two approaches to loading the same file, found in FastaReader.java.
> The first FastaReader passes in ProteinSequenceCreator that will
> handle the creation of the actual protein sequence and the storage.
> The second test case uses FileProxyProteinSequence where you need to
> pass in a reference to the File and, as the initial file is parsed
> once, it simply keeps track of the locations. The actual
> ProteinSequence that gets created is a ProteinSequence where the
> store is a SequenceArrayListProxyLoader instead of
> ArrayListSequenceBackingStore. I have put together but haven't
> checked in a FastReaderHelper class with static methods to hide this
> detail from someone who simply wants to load a Fasta file.
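The offset-tracking idea in the paragraphs above can be sketched as below. This is not the SequenceFileProxyLoader implementation: FastaOffsetIndexSketch, Span, index and load are hypothetical names, and the sketch assumes a single-byte encoding such as ASCII. One pass records the byte range of each record's sequence lines; a later call seeks to that range and reads only those bytes.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not BioJava code.
final class FastaOffsetIndexSketch {

    /** Byte range [start, end) of one record's sequence lines in the file. */
    static final class Span {
        final long start;
        long end;

        Span(long start) {
            this.start = start;
            this.end = start;
        }
    }

    /** Single pass over the file: keep headers and offsets, not residues. */
    static Map<String, Span> index(String path) throws IOException {
        Map<String, Span> spans = new LinkedHashMap<>();
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            Span current = null;
            String line;
            while ((line = raf.readLine()) != null) {
                long afterLine = raf.getFilePointer();
                if (line.startsWith(">")) {
                    // The sequence starts right after the header line.
                    current = new Span(afterLine);
                    spans.put(line.substring(1).trim(), current);
                } else if (current != null) {
                    current.end = afterLine;
                }
            }
        }
        return spans;
    }

    /** Seeks to one record and reads only its bytes, dropping line breaks. */
    static String load(String path, Span span) throws IOException {
        byte[] raw = new byte[(int) (span.end - span.start)];
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(span.start);
            raf.readFully(raw);
        }
        StringBuilder residues = new StringBuilder(raw.length);
        for (byte b : raw) {
            if (b != '\n' && b != '\r') {
                residues.append((char) (b & 0xFF));
            }
        }
        return residues.toString();
    }
}
```

RandomAccessFile.readLine is unbuffered, so the indexing pass here is slow on large files; a real loader would likely count bytes over a buffered stream instead. The sketch only shows the shape of "pay once for the index, then seek on demand".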
>
>   // Approach 1: parse the stream up front and hold every sequence in memory.
>   String inputFile = "src/test/resources/PF00104_small.fasta";
>   FileInputStream is = new FileInputStream(inputFile);
>
>   FastaReader fastaReader = new FastaReader(is,
>       new GenericFastaHeaderParser(),
>       new ProteinSequenceCreator(AminoAcidCompoundSet.getAminoAcidCompoundSet()));
>   Collection proteinSequences = fastaReader.process();
>   is.close();
>
>   System.out.println(proteinSequences);
>
>   // Approach 2: parse once to record locations; sequence data is read from the file on demand.
>   File file = new File(inputFile);
>   FastaReader fastaProxyReader = new FastaReader(file,
>       new GenericFastaHeaderParser(),
>       new FileProxyProteinSequenceCreator(file,
>           AminoAcidCompoundSet.getAminoAcidCompoundSet()));
>   Collection proteinProxySequences = fastaProxyReader.process();
>
>   System.out.println(proteinProxySequences);
>
>
> So in the current approach I would always be able to return a
> collection with knowledge of the header and a sequence that will
> either have the sequence data or know how to get it. This same
> concept works for being able to create a ProteinSequence where you
> can have a UniprotProxyProteinSequenceLoader or
> NCBIProxyProteinSequenceLoader where you only need to pass in the
> sequence unique id. The loader can get the sequence detail at the
> time the ProteinSequence is created or do it lazily when a request
> is made. This then extends back to genome views of DNASequence data
> where you don't need to even have the genome local but the
> appropriate genome sequence proxy loader would do a web services/REST
> call to the external server to retrieve the actual sequence or
> subsequence that is being requested.
>
> If you look at the AccessionID class I keep track of the type of
> accession id based on either recognizing the Fasta file header type
> or allowing the user to set it, which will make working with features
> very powerful. If you know the accession id and the type of id then
> making a request to a DAS server, Genome annotation server or
> Uniprot service to retrieve features is easy.
> I haven't done that code yet but it is next on my list for a
> project I am working on.
>
> We also worked on building in the sequence classes the proper
> biological relationships such that if you start with a DNA sequence
> and apply the various exon/intron features you can have a
> TranscriptSequence that can return a ProteinSequence. In the reverse
> direction you should be able to take a ProteinSequence with a valid
> accession id with a known type and retrieve the parent DNA sequence
> if that linkage information is available via the appropriate web
> services/REST call. This is part of the design, but going in reverse
> is not implemented yet. You can start with a ChromosomeSequence and
> work your way down by adding introns and exons. Andy has worked hard
> on this code, which will make it really easy to use by programmers
> who don't know all the details.
>
> It has been a month since the BioJava Hackathon and I am feeling guilty
> that I haven't taken the time to write any of this up. Writing code
> is the easy part; doing the documentation is always tough! I will see
> if this email generates a larger discussion on the list and, based
> on how everything shakes out, will turn the discussion into a wiki
> page to give a sequence design overview and code for testing and
> implementing other proxy loaders.
>
> Thanks
>
> Scooter Willis
>
> On Feb 19, 2010, at 2:51 PM, Richard Holland wrote:
>
>> Depends on whether or not you want to parse-at-once or stream-parse.
>> If the parser is set up to load the whole lot at once, then
>> a map is fine, otherwise not.
>>
>> On 20 Feb 2010, at 08:30, Scooter Willis wrote:
>>
>>> I am starting to use the new FastaReader in a project and the
>>> default implementation I set up returns a List.
>>> The very next thing I needed to do was convert to a
>>> LinkedHashMap so I could query the
>>> sequence of interest. It would seem that this is probably a fairly
>>> standard use case.
>>> If I returned a
>>> LinkedHashMap as the default container
>>> then we have a slight memory hit on keeping a hash of the
>>> accession ID and a linked list for preserving order.
>>>
>>> Does anyone have objections to returning the sequences read from a
>>> Fasta file as a LinkedHashMap?
>>>
>>> Thanks
>>>
>>> Scooter
>>> _______________________________________________
>>> biojava-dev mailing list
>>> biojava-dev at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>
>> --
>> Richard Holland, BSc MBCS
>> Operations and Delivery Director, Eagle Genomics Ltd
>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>>
>
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev

--
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/

From genjasp at gmail.com Sat Feb 20 06:07:05 2010
From: genjasp at gmail.com (Alessandro Cipriani)
Date: Sat, 20 Feb 2010 12:07:05 +0100
Subject: [Biojava-dev] Global Alignment
Message-ID: <46b9a2151002200307r749273e2gedf1ee28667f3f31@mail.gmail.com>

Hi to all,
I have a question:
How do I align two circular DNA sequences? Is it possible?
Thx
Regards
alex

--
Alessandro Cipriani
(+39) 3206009509
http://www.cipriania.it
skype:genjasp at gmail.com
msn:jaspzz

From andreas.prlic at gmail.com Sat Feb 20 15:55:39 2010
From: andreas.prlic at gmail.com (Andreas Prlic)
Date: Sat, 20 Feb 2010 12:55:39 -0800
Subject: [Biojava-dev] List vs LinkedHashMap
In-Reply-To: <4343DB95-1642-4933-A361-1DEDAF6A1CAC@scripps.edu>
References: <0FD1F27D-3577-48C9-A34E-C41191130C44@scripps.edu>
	<9570AD34-C620-4F46-A2AC-DD66CD5F4196@eaglegenomics.com>
	<4343DB95-1642-4933-A361-1DEDAF6A1CAC@scripps.edu>
Message-ID: <59a41c431002201255u52aaf539m3d3e523763cdc87@mail.gmail.com>

I think this is coming along really nicely. Before we can call this ready for release, we will still need to spend some time on documenting things, otherwise it will be very hard for any user to figure out what is going on. (that also applies to the structure modules, which need a push for more docu...)

Besides this, what I have seen so far seems to be really easy to use...

Andreas

From jolyon.holdstock at ogt.co.uk Thu Feb 25 13:11:39 2010
From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock)
Date: Thu, 25 Feb 2010 18:11:39 -0000
Subject: [Biojava-dev] Another alignment example
Message-ID: <588D0DD225D05746B5D8CAE1BE971F3F02D5FB19@EUCLID.internal.ogtip.com>

Hi,

I was playing around with my alignment example and have made it prettier. I've added this to the cookbook as well.

Is it a suitable example for a cookbook?

Thanks,

J

Dr. Jolyon Holdstock
Senior Computational Biologist,
Oxford Gene Technology,
Begbroke Science Park,
Sandy Lane, Yarnton,
Oxford, OX5 1PF, UK.
T: +44 (0)1865 856 852
F: +44 (0)1865 842 116
E: jolyon.holdstock at ogt.co.uk
W: www.ogt.co.uk

Looking to outsource your microarray studies? Look no further.
Click here to tour our facilities
Click here to request a quotation

Scientific pedigree delivering high quality microarray results to you:
* Service capacity >1000 samples per week
* Rigorous QC from sample to result
* Applications available include aCGH, CNV, methylation studies and miRNA

Oxford Gene Technology (Operations) Ltd.
Registered in England No: 03845432 Begbroke Science Park Sandy Lane Yarnton Oxford OX5 1PF

Confidentiality Notice: The contents of this email from Oxford Gene Technology are confidential and intended solely for the person to whom it is addressed. It may contain privileged and confidential information. If you are not the intended recipient you must not read, copy, distribute, discuss or take any action in reliance on it. If you have received this email in error please advise the sender so that we can arrange for proper delivery. Then please delete the message from your inbox. Thank you.

This email has been scanned by Oxford Gene Technology Security Systems.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/png
Size: 15781 bytes
Desc: image001.png
URL:

From bugzilla-daemon at portal.open-bio.org Thu Feb 25 19:30:06 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 25 Feb 2010 19:30:06 -0500
Subject: [Biojava-dev] [Bug 2541] Exception is thrown when trying to parse a valid GenBank file
In-Reply-To:
Message-ID: <201002260030.o1Q0U6U4030935@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2541

maruco at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Fri Feb 26 05:46:54 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 26 Feb 2010 05:46:54 -0500
Subject: [Biojava-dev] [Bug 2541] Exception is thrown when trying to parse a valid GenBank file
In-Reply-To:
Message-ID: <201002261046.o1QAksIv016486@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2541

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2010-02-26 05:46 EST -------
Did you cut and paste the example file? It is malformed (missing the record-terminating // line after the sequence).

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From maruco at gmail.com Fri Feb 26 07:32:52 2010
From: maruco at gmail.com (Thiago Satake)
Date: Fri, 26 Feb 2010 09:32:52 -0300
Subject: [Biojava-dev] SVN Timed Out
Message-ID:

Hello all,

I am new here and I am trying to check out the latest copy of biojava-live from the "SVN trunk" but I am receiving the message:

svn list svn://code.open-bio.org/biojava
svn: Can't connect to host 'code.open-bio.org': Connection timed out

Is there something I am doing wrong?

Thanks

--
Thiago Seito Satake
Tel: (011) 6588-8045

From biopython at maubp.freeserve.co.uk Fri Feb 26 07:56:16 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Feb 2010 12:56:16 +0000
Subject: [Biojava-dev] SVN Timed Out
In-Reply-To:
References:
Message-ID: <320fb6e01002260456l2c0b807ft1d5a847179e1e78f@mail.gmail.com>

On Fri, Feb 26, 2010 at 12:32 PM, Thiago Satake wrote:
>
> Hello all,
>
> I am new here and I am trying to check out the latest copy of biojava-live
> from the "SVN trunk" but I am receiving the message:
>
> svn list svn://code.open-bio.org/biojava
> svn: Can't connect to host 'code.open-bio.org': Connection timed out
>
> Is there something I am doing wrong?
I know there have been some recent problems with the server code.open-bio.org which offers anonymous CVS/SVN access to most of the OBF projects. The OBF are aware of this and are looking into it.

Peter

e.g. http://lists.open-bio.org/pipermail/bioperl-l/2010-February/032365.html

From andreas at sdsc.edu Fri Feb 26 12:30:28 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 26 Feb 2010 09:30:28 -0800
Subject: [Biojava-dev] Another alignment example
In-Reply-To: <588D0DD225D05746B5D8CAE1BE971F3F02D5FB19@EUCLID.internal.ogtip.com>
References: <588D0DD225D05746B5D8CAE1BE971F3F02D5FB19@EUCLID.internal.ogtip.com>
Message-ID: <59a41c431002260930r7864930fjd32ca6f35acbdca6@mail.gmail.com>

Hi Jolyon,

I think it is fine. Perhaps you can merge the two examples into one, since they seem to be very similar?

Cheers,
Andreas

On Thu, Feb 25, 2010 at 10:11 AM, Jolyon Holdstock <jolyon.holdstock at ogt.co.uk> wrote:

> Hi,
>
> I was playing around with my alignment example and have made it
> prettier. I've added this to the cookbook as well.
>
> Is it a suitable example for a cookbook?
>
> Thanks,
>
> J

From jolyon.holdstock at ogt.co.uk Fri Feb 26 12:35:42 2010
From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock)
Date: Fri, 26 Feb 2010 17:35:42 -0000
Subject: [Biojava-dev] Another alignment example[Scanned]
References: <588D0DD225D05746B5D8CAE1BE971F3F02D5FB19@EUCLID.internal.ogtip.com>
	<59a41c431002260930r7864930fjd32ca6f35acbdca6@mail.gmail.com>
Message-ID: <588D0DD225D05746B5D8CAE1BE971F3F02D5FCF0@EUCLID.internal.ogtip.com>

Hi Andreas,

They are very similar; the second one uses the same code as the first but builds on it. I don't mind merging them; I didn't want to make it too complicated for a new BioJava user.

Cheers,

J
From ayates at ebi.ac.uk Sat Feb 6 15:12:11 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Sat, 6 Feb 2010 15:12:11 +0000
Subject: [Biojava-dev] Code Update
In-Reply-To: <4B60244F.2070603@ebi.ac.uk>
References: <5DC3EABB-E571-4D23-BDA7-D74A0CCAD804@scripps.edu>
	<4B60244F.2070603@ebi.ac.uk>
Message-ID:

Finally it's in. I've managed to get enough time to finish the transcription code off. The main features of this check-in are:

* DNA -> RNA -> Codon -> Peptide translation
* Support for all IUPAC tables
* New views for reversing sequences & complementing them
* Windowed sequence views giving portions of a sequence as requested
* TranscriptionEngine & TranscriptionEngine.Builder deal with the business of assembling the classes together as required
* Singletons provided from the classes they are in (e.g. IUPACParser has one) *but* no class requires a singleton!
* Utilities for working with IO streams & classpath resources (useful for testing)
* Test case shows 1000 translations of BRCA2 (from DNA) in 0.7 seconds (on my MacBook Pro; YMMV); test case will break if it takes longer than a second
** This is a vast improvement over my first attempt that had a rate of 1 per second hence why that was not checked in

Limitations are:

* Not much checking WRT lengths of sequence given to the code; need a strict & lenient mode
* Stop codon trimming controlled by a boolean
* No init-met translation (very important as some programs get a bit annoyed if they're given a V as an initiator AA)
* Not sure if there is a way to ask if a codon is a start codon easily; I'm sure it can be done just not as easily as we may want
* No way of modifying a badly translated peptide which we expect to badly translate

However it's workable & means if you have a DNASequence technically you can get a peptide by saying:

DNASequence s = getSeq();
ProteinSequence p = s.getRNASequence().getProteinSequence();

Now how's that for easy :)

Share & enjoy!
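To make two of the limitations above concrete (stop codon trimming behind a boolean, and the missing init-Met option), here is a small stand-alone sketch. It does not use TranscriptionEngine or any BioJava class; InitMetTranslationSketch and its translate method are hypothetical, only the standard genetic code (NCBI translation table 1) is covered, and codons containing anything other than A, C, G, T or U are written as X.

```java
// Hypothetical sketch, not BioJava code.
final class InitMetTranslationSketch {

    // Standard genetic code in the usual TCAG ordering of all three bases.
    private static final String BASES = "TCAG";
    private static final String AMINO_ACIDS =
            "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    /**
     * Translates frame 1 of a DNA (or RNA) string.
     *
     * @param initMet  if true, the first residue is reported as M whatever
     *                 the first codon is
     * @param trimStop if true, one trailing stop ('*') is removed
     */
    static String translate(String dna, boolean initMet, boolean trimStop) {
        String seq = dna.toUpperCase().replace('U', 'T');
        StringBuilder peptide = new StringBuilder();
        // Trailing bases that do not fill a whole codon are ignored.
        for (int i = 0; i + 3 <= seq.length(); i += 3) {
            int first = BASES.indexOf(seq.charAt(i));
            int second = BASES.indexOf(seq.charAt(i + 1));
            int third = BASES.indexOf(seq.charAt(i + 2));
            if (first < 0 || second < 0 || third < 0) {
                peptide.append('X'); // ambiguous or unknown base
            } else {
                peptide.append(AMINO_ACIDS.charAt(16 * first + 4 * second + third));
            }
        }
        if (initMet && peptide.length() > 0) {
            peptide.setCharAt(0, 'M');
        }
        int last = peptide.length() - 1;
        if (trimStop && last >= 0 && peptide.charAt(last) == '*') {
            peptide.setLength(last);
        }
        return peptide.toString();
    }
}
```

For example, translate("GTGAAATAA", false, false) gives "VK*", while translate("GTGAAATAA", true, true) gives "MK". Forcing M unconditionally is the simplest reading of an init-Met flag; checking the first codon against a table of known start codons would be a stricter alternative.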

Andy

From markjschreiber at gmail.com Sat Feb 6 15:45:17 2010
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Sat, 6 Feb 2010 23:45:17 +0800
Subject: [Biojava-dev] Code Update
In-Reply-To:
References: <5DC3EABB-E571-4D23-BDA7-D74A0CCAD804@scripps.edu>
	<4B60244F.2070603@ebi.ac.uk>
Message-ID: <93b45ca51002060745x30c2cd8bu3226732b64f354fa@mail.gmail.com>

Hi Andy,

Great work!

Regarding the issue of non Met start codons such as TTG, biologically speaking they are still translated as Met. fMet is the only tRNA that can initiate. Presumably there is some flexibility in the recognition of the start codon.

Maybe this is what you were aiming for but it would be good to have the option of making the first codon translate to Met no matter what the codon.

Mark

From andreas at sdsc.edu Sat Feb 6 22:08:48 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sat, 6 Feb 2010 14:08:48 -0800
Subject: [Biojava-dev] Code Update
In-Reply-To:
References: <5DC3EABB-E571-4D23-BDA7-D74A0CCAD804@scripps.edu>
	<4B60244F.2070603@ebi.ac.uk>
Message-ID: <59a41c431002061408i1958f79cm6c70c56fcb941e63@mail.gmail.com>

Looks good, Andy!

Andreas

From ayates at ebi.ac.uk Sun Feb 7 00:27:18 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Sun, 7 Feb 2010 00:27:18 +0000
Subject: [Biojava-dev] Code Update
In-Reply-To: <93b45ca51002060745x30c2cd8bu3226732b64f354fa@mail.gmail.com>
References: <5DC3EABB-E571-4D23-BDA7-D74A0CCAD804@scripps.edu>
	<4B60244F.2070603@ebi.ac.uk>
	<93b45ca51002060745x30c2cd8bu3226732b64f354fa@mail.gmail.com>
Message-ID: <05DB0C6D-21D4-41EB-A2C5-36B460B54204@ebi.ac.uk>

Hi Mark,

I thought that there had been some cases of Valine in bacteria not being modified but that said it's probably me misunderstanding what the other person was saying :). However there is the inflexibility in other resources/programs which can throw the toys out of the pram the moment they are presented with a peptide not starting with M.

What I was thinking is to either have this option available (much in the same way I've got stop codon trimming working) or using the planned edit capability. If I do put the optional translation as a boolean for the moment then that bit becomes a bit more feature complete and then a better solution can be applied later on :).

Anyway next on the hit list I think is locations ....

Andy

On 6 Feb 2010, at 15:45, Mark Schreiber wrote:

> Hi Andy,
>
> Great work!
>
> Regarding the issue of non Met start codons such as TTG,
> biologically speaking they are still translated as Met. fMet is the
> only tRNA that can initiate. Presumably there is some flexibility in
> the recognition of the start codon.
>
> Maybe this is what you were aiming for but it would be good to have
> the option of making the first codon translate to Met no matter what
> the codon.
>
> Mark
>
>
>> On 06-Feb-2010 11:13 PM, "Andy Yates" wrote:
>>
>> Finally it's in. I've managed to get enough time to finish the
>> transcription code off.
The main features of this check-in are: >> >> * DNA -> RNA -> Codon -> Peptide translation >> * Support for all IUPAC tables >> * New views for reversing sequences & complementing them >> * Windowed sequence views giving portions of a sequence as requested >> * TranscriptionEngine & TranscriptionEngine.Builder deal with the >> business of assembling the classes together as required >> * Singletons provided from the classes they are in (e.g. >> IUPACParser has one) *but* no class requires a singleton! >> * Utilities for working with IO streams & classpath resources >> (useful for testing) >> * Test case shows 1000 translations of BRCA2 (from DNA) in >> 0.7seconds (on my MacBook Pro; YMMV); test case will break if it >> takes longer than a second >> ** This is a vast improvement over my first attempt that had a rate >> of 1 per second hence why that was not checked in >> >> Limitations are: >> >> * Not much checking WRT lengths of sequence given to the code; need >> a strict & lenient mode >> * Stop codon trimming controlled by a boolean >> * No init-met translation (very important as some programs get a >> bit annoyed if they're given a V as an initiator AA) >> * Not sure if there is a way to ask if a codon is a start codon >> easily; I'm sure it can be done just not as easily as we may want >> * No way of modifying a badly translated peptide which we expect to >> badly translate >> >> However it's workable & means if you have a DNASequence technically >> you can get a peptide by saying: >> >> DNASequence s = getSeq(); >> ProteinSequence p = s.getRNASequence().getProteinSequence(); >> >> Now how's that for easy :) >> >> Share & enjoy! >> >> Andy >> >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio... 
--
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/

From jolyon.holdstock at ogt.co.uk Wed Feb 17 17:43:05 2010
From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock)
Date: Wed, 17 Feb 2010 17:43:05 -0000
Subject: [Biojava-dev] Alignment viewer
Message-ID: <588D0DD225D05746B5D8CAE1BE971F3F02D2DF60@EUCLID.internal.ogtip.com>

Hi,

Following on from the query posted about viewing an alignment, I've added a recipe to the cookbook.

I hope it's OK, but if anyone has any suggestions please let me know.

Cheers,

J

Dr. Jolyon Holdstock
Senior Computational Biologist,
Oxford Gene Technology,
Begbroke Science Park, Sandy Lane, Yarnton, Oxford, OX5 1PF, UK.
T: +44 (0)1865 856 852
F: +44 (0)1865 842 116
E: jolyon.holdstock at ogt.co.uk
W: www.ogt.co.uk

From HWillis at scripps.edu Fri Feb 19 18:32:29 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Fri, 19 Feb 2010 13:32:29 -0500
Subject: [Biojava-dev] Netbeans Maven
Message-ID: <4A02584C-C278-4FA9-94B5-8F663D9D0891@scripps.edu>

I saw this link in Netbeans that goes over using Maven in Netbeans. Looks to be feature complete.

http://wiki.netbeans.org/MavenBestPractices

Thanks

Scooter Willis

From HWillis at scripps.edu Fri Feb 19 19:30:13 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Fri, 19 Feb 2010 14:30:13 -0500
Subject: [Biojava-dev] List vs LinkedHashMap
Message-ID: <0FD1F27D-3577-48C9-A34E-C41191130C44@scripps.edu>

I am starting to use the new FastaReader in a project and the default implementation I set up returns a List. The very next thing I needed to do was convert to a LinkedHashMap so I could query the sequence of interest. It would seem that this is probably a fairly standard use case. If I returned a LinkedHashMap as the default container then we have a slight memory hit on keeping a hash of the accession ID and a linked list for preserving order.

Does anyone have objections to returning the sequences read from a Fasta file as a LinkedHashMap?

Thanks

Scooter

From ayates at ebi.ac.uk Fri Feb 19 20:29:56 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Fri, 19 Feb 2010 20:29:56 +0000
Subject: [Biojava-dev] List vs LinkedHashMap
In-Reply-To: <0FD1F27D-3577-48C9-A34E-C41191130C44@scripps.edu>
References: <0FD1F27D-3577-48C9-A34E-C41191130C44@scripps.edu>
Message-ID: <01E00799-939D-426E-9157-E0ED65BBB864@ebi.ac.uk>

Can't see any reason why not; after all, to get a collection back it's just a case of calling values() on the Map. The only other thing I would say is I would prefer returning a Map rather than a LinkedHashMap, but if there's a good reason why you want to return the solid class then that's fine :).

I've done some work as well to clean up the generics situation in these, which I hope is okay. If you commit your changes then I'll commit mine & you can see what you think.

Andy

--
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/

From holland at eaglegenomics.com Fri Feb 19 19:51:13 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Sat, 20 Feb 2010 08:51:13 +1300
Subject: [Biojava-dev] List vs LinkedHashMap
In-Reply-To: <0FD1F27D-3577-48C9-A34E-C41191130C44@scripps.edu>
References: <0FD1F27D-3577-48C9-A34E-C41191130C44@scripps.edu>
Message-ID: <9570AD34-C620-4F46-A2AC-DD66CD5F4196@eaglegenomics.com>

Depends on whether or not you want to parse-at-once or stream-parse. If the parser is set up to load the whole lot at once, then a map is fine, otherwise not.

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From HWillis at scripps.edu Fri Feb 19 20:53:24 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Fri, 19 Feb 2010 15:53:24 -0500
Subject: [Biojava-dev] List vs LinkedHashMap
In-Reply-To: <9570AD34-C620-4F46-A2AC-DD66CD5F4196@eaglegenomics.com>
References: <0FD1F27D-3577-48C9-A34E-C41191130C44@scripps.edu> <9570AD34-C620-4F46-A2AC-DD66CD5F4196@eaglegenomics.com>
Message-ID: <4343DB95-1642-4933-A361-1DEDAF6A1CAC@scripps.edu>

Richard

For stream parsing I have abstracted that down to a proxy data structure that looks just like ArrayListSequenceBackingStore and can keep an offset token into the file stream, which makes sense for loading very large files without actually keeping everything in memory. You pay the price once to get the header information and the offset into the stream/file of the start of the sequence and the length. Then if the user makes a call to the Sequence to get either the actual sequence data or a subsequence, the required sequence is loaded from the stream/file. This doesn't make sense for slow IO-bound streams where the load penalty would be high, but does work well for file IO via RandomAccessFile seek, which is how it is currently implemented.
If you have a fasta file with 1GB of data but only plan on selecting 10 sequences, and don't know what those 10 sequences are at load time, then this works well.

This also allows you to load a large genome or genome scaffold file and, by implementing the details in SequenceFileProxyLoader, access sequence data without loading the entire genome into memory. Here are two approaches to loading the same file, found in FastaReader.java. The first FastaReader passes in a ProteinSequenceCreator that will handle the creation of the actual protein sequence and the storage. The second test case uses FileProxyProteinSequenceCreator, where you need to pass in a reference to the File, and as the initial file is parsed once it simply keeps track of the locations. The actual ProteinSequence that gets created is a ProteinSequence where the store is a SequenceArrayListProxyLoader instead of ArrayListSequenceBackingStore. I have put together, but haven't checked in, a FastReaderHelper class with static methods to hide this detail from someone who simply wants to load a Fasta file.

    String inputFile = "src/test/resources/PF00104_small.fasta";
    FileInputStream is = new FileInputStream(inputFile);

    // Approach 1: every sequence is created and stored in memory
    FastaReader fastaReader = new FastaReader(is, new GenericFastaHeaderParser(),
            new ProteinSequenceCreator(AminoAcidCompoundSet.getAminoAcidCompoundSet()));
    Collection proteinSequences = fastaReader.process();
    is.close();

    System.out.println(proteinSequences);

    // Approach 2: the file is parsed once and only the locations are kept
    File file = new File(inputFile);
    FastaReader fastaProxyReader = new FastaReader(file, new GenericFastaHeaderParser(),
            new FileProxyProteinSequenceCreator(file, AminoAcidCompoundSet.getAminoAcidCompoundSet()));
    Collection proteinProxySequences = fastaProxyReader.process();

    System.out.println(proteinProxySequences);

So in the current approach I would always be able to return a collection with knowledge of the header and a sequence that will either have the sequence data or know how to get it.
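As a rough illustration of the file-offset idea described above, here is a minimal sketch. It is not the BioJava implementation: the class FastaOffsetIndex and its methods build, ids and load are hypothetical names made up for this example, and the whole header line is used as the key, where real code would presumably take the accession ID from a header parser. It scans the file once, keeps only byte offsets in a LinkedHashMap (so lookup by key and file order are both available), and reads sequence data from disk only when asked.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not the BioJava FastaReader or its proxy loaders. */
final class FastaOffsetIndex {
    /** Byte range covered by the sequence lines of one record. */
    private static final class Span {
        final long start;
        final long end;
        Span(long start, long end) { this.start = start; this.end = end; }
    }

    private final File file;
    // LinkedHashMap: lookup by header text, iteration in file order.
    private final Map<String, Span> spans = new LinkedHashMap<String, Span>();

    private FastaOffsetIndex(File file) { this.file = file; }

    /** Scans the file once and records where each record's sequence data starts and ends. */
    static FastaOffsetIndex build(File file) throws IOException {
        FastaOffsetIndex index = new FastaOffsetIndex(file);
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            String header = null;
            long start = 0;
            long lineStart = raf.getFilePointer();
            String line;
            // readLine() is unbuffered and slow, but keeps getFilePointer() exact.
            while ((line = raf.readLine()) != null) {
                if (line.startsWith(">")) {
                    if (header != null) {
                        // The previous record ends where this header line begins.
                        index.spans.put(header, new Span(start, lineStart));
                    }
                    header = line.substring(1).trim();
                    start = raf.getFilePointer();
                }
                lineStart = raf.getFilePointer();
            }
            if (header != null) {
                index.spans.put(header, new Span(start, raf.length()));
            }
        }
        return index;
    }

    /** Header strings in the order they appear in the file. */
    List<String> ids() { return new ArrayList<String>(spans.keySet()); }

    /** Reads one sequence from disk on demand (assumes it is smaller than 2GB). */
    String load(String id) throws IOException {
        Span span = spans.get(id);
        if (span == null) {
            throw new IllegalArgumentException("Unknown id: " + id);
        }
        byte[] raw = new byte[(int) (span.end - span.start)];
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek(span.start);
            raf.readFully(raw);
        }
        // Drop the line breaks that FASTA uses to wrap sequence lines.
        return new String(raw, StandardCharsets.US_ASCII).replaceAll("\\s+", "");
    }
}
```

With something like this, FastaOffsetIndex.build(file) costs one pass over the file, and index.load(id) afterwards costs one seek plus one read per requested sequence, which is the trade-off described for the 1GB file with 10 sequences of interest.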
This same concept works for being able to create a ProteinSequence where you can have a UniprotProxyProteinSequenceLoader or NCBIProxyProteinSequenceLoader and only need to pass in the sequence's unique id. The loader can get the sequence detail at the time the ProteinSequence is created or do it lazily when a request is made. This then extends back to genome views of DNASequence data, where you don't even need to have the genome local: the appropriate genome sequence proxy loader would do a web services/REST call to the external server to retrieve the actual sequence or subsequence that is being requested.

If you look at the AccessionID class, I keep track of the type of accession id, based on either recognizing the Fasta file header type or allowing the user to set it, which will make working with features very powerful. If you know the accession id and the type of id then making a request to a DAS server, genome annotation server or Uniprot service to retrieve features is easy. I haven't done that code yet but it is next on my list for a project I am working on.

We also worked on building the proper biological relationships into the sequence classes, such that if you start with a DNA sequence and apply the various exon/intron features you can have a TranscriptSequence that can return a ProteinSequence. In the reverse direction you should be able to take a ProteinSequence with a valid accession id of a known type and retrieve the parent DNA sequence, if that linkage information is available via the appropriate web services/REST call. That is part of the design, but going in reverse is not implemented. You can start with a ChromosomeSequence and work your way down by adding introns and exons. Andy has worked hard on this code, which will make it really easy to use by programmers who don't know all the details.

It has been a month since the BioJava Hackathon and I am feeling guilty that I haven't taken the time to write any of this up.
Writing code is the easy part; doing the documentation is always tough! I will see if this email generates a larger discussion among the list and, based on how everything shakes out, will turn the discussion into a wiki page to give a sequence design overview and code for testing and implementing other proxy loaders.

Thanks

Scooter Willis

From holland at eaglegenomics.com Sat Feb 20 05:28:31 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Sat, 20 Feb 2010 18:28:31 +1300
Subject: [Biojava-dev] List vs LinkedHashMap
In-Reply-To: <4343DB95-1642-4933-A361-1DEDAF6A1CAC@scripps.edu>
References: <0FD1F27D-3577-48C9-A34E-C41191130C44@scripps.edu> <9570AD34-C620-4F46-A2AC-DD66CD5F4196@eaglegenomics.com> <4343DB95-1642-4933-A361-1DEDAF6A1CAC@scripps.edu>
Message-ID: <2BCB480A-6CE6-4700-B210-25FE095CA95E@eaglegenomics.com>

sounds excellent!

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From ayates at ebi.ac.uk Sat Feb 20 10:47:40 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Sat, 20 Feb 2010 10:47:40 +0000
Subject: [Biojava-dev] List vs LinkedHashMap
In-Reply-To: <4343DB95-1642-4933-A361-1DEDAF6A1CAC@scripps.edu>
References: <0FD1F27D-3577-48C9-A34E-C41191130C44@scripps.edu> <9570AD34-C620-4F46-A2AC-DD66CD5F4196@eaglegenomics.com> <4343DB95-1642-4933-A361-1DEDAF6A1CAC@scripps.edu>
Message-ID: <7532BACD-357E-4389-A072-20CE491BFC52@ebi.ac.uk>

Hey guys,

All the things that Scooter has done here I think are fantastic, and the thought in the abstractions behind the loaders is really good. I'm especially liking the idea of being able to move into external resources for sequences. One thing that has always annoyed me, if I wanted to do some coding on a peptide sequence, is having to download it from say UniProt/UniParc, save it into a file, read it & then do something. A store backing onto the large sequence repositories is a great win. I think the ones to target first for these systems are UniProt, eFetch & dbfetch. The last 2 are very important because they give us access to a huge number of databases from a single interface. One thing to remember when writing these classes is that the inbuilt HTTPConnection code with Java was always buggy and would leak sockets under some circumstances if you do not always read the out and the error streams.
But I'm sure the IO utility code we've got can be modified to ensure we don't leak them :). In terms of the stuff I've been doing I've pushed everything into some lower level classes so in order to go from DNA to RNA you instantiate a class which can handle nucleotides. If you're in a DNASequence then there are already methods on there to go to RNA & from RNA to Protein. All the code which does this is held in other classes so it's all design by composition rather than inheritance. The way I'm currently imagining how you can move from one sequence to another is the registration of type specific features and then offering these sub-structures using the SequenceView code. So if we had a Gene the transcript could be defined by TransciptSequence.class & then when you request it we can then send back a SequenceView with ExonSequence.class objects registered to give the Exons & well I'm sure you can all see where it's coming. One thing I can't handle ATMO are phases so the code assumes everything starts in phase 1. For what we've got ATMO it's fine but later on this needs to be addressed. I'm also feeling guilty but it's quite hard getting the time to get the code down let alone documentation. So long as we make sure there's test cases available then we can see how to use the code as well for when we get round to documentation. Andy On 19 Feb 2010, at 20:53, Scooter Willis wrote: > Richard > > For Stream parsing I have abstracted that down to a proxy data > structure that looks just like ArrayListSequenceBackingStore that > can keep an offset token in file stream where this makes sense for > loading very large files without actually keeping everything in > memory. You pay the price once to get the header information and the > offset into the stream/file of the start of the sequence and the > length. Then if the user makes a call to the Sequence to get either > the actual sequence data or a subsequence then the required sequence > is loaded from the stream/file. 
This doesn't make sense for slow io > bound streams where the load penalty would be high but does work > well for file IO via RandomAccessFile seek and how it is currently > implemented. If you have a fasta file with 1GB of data but only plan > on selecting 10 sequences but don't know what those 10 sequences are > at load time then this works well. > > This also allows you to load a large genome or genome scaffold file > and by implementing the details in SequenceFileProxyLoader access > sequence data without loading in the entire genome into memory. Here > is two approaches to loading the same file found in FastaReader.java > The first FastaReader passes in ProteinSequenceCreator that will > handle the creation of the actual protein sequence and the storage. > The second test case use FileProxyProteinSequence where you need to > pass in a reference to the File and as the initial file is parsed > once it simply keeps track of the locations. The actual > ProteinSequence that gets created is a ProteinSequence where the > store is a SequenceArrayListProxyLoader instead of > ArrayListSequenceBackingStore. I have put together but haven't > checked in a FastReaderHelper class with static methods to hide this > detail from someone who simply wants to load a Fasta file. 
> > String inputFile = "src/test/resources/ > PF00104_small.fasta"; > FileInputStream is = new FileInputStream(inputFile); > > FastaReader fastaReader = new > FastaReader(is, new GenericFastaHeaderParser(), new > ProteinSequenceCreator(AminoAcidCompoundSet.getAminoAcidCompoundSet > ())); > Collection proteinSequences = > fastaReader.process(); > is.close(); > > > System.out.println(proteinSequences); > > File file = new File(inputFile); > FastaReader fastaProxyReader = new > FastaReader(file, new GenericFastaHeaderParser(), > new FileProxyProteinSequenceCreator(file, > AminoAcidCompoundSet.getAminoAcidCompoundSet())); > Collection proteinProxySequences = > fastaProxyReader.process(); > > System.out.println(proteinProxySequences); > > > So in the current approach I would always be able to return a > collection with knowledge of the header and a sequence that will > either have the sequence data or know how to get it. This same > concept works for being able to create a ProteinSequence where you > can have a UniprotProxyProteinSequenceLoader or > NCBIProxyProteinSequenceLoader where you only need to pass in the > sequence unique id. The loader can get the sequence detail at the > time the ProteinSequence is created or do it lazily when a request > is made. This then extends back to genome views of DNASequence data > where you don't need to even have the genome local but the > appropriate genome sequence proxy loader would do a web services/ > REST call to the external server to retrieve the actual sequence or > subsequence that is being requested. > > If you look at the AccessionID class I keep track of the type of > accession id based on either recognizing the Fasta file header type > or allowing the user to set it that will make working with features > very powerful. If you know the accesion id and the type of id then > making a request to a DAS server, Genome annotaiton server or > Uniprot service to retreive features is easy. 
> I haven't done that code yet, but it is next on my list for a project I
> am working on.
>
> We also worked on building the proper biological relationships into the
> sequence classes, such that if you start with a DNA sequence and apply
> the various exon/intron features you can have a TranscriptSequence that
> can return a ProteinSequence. In the reverse direction you should be
> able to take a ProteinSequence with a valid accession id of a known
> type and retrieve the parent DNA sequence, if that linkage information
> is available via the appropriate web services/REST call. This is part of
> the design, but going in reverse is not implemented. You can start with
> a ChromosomeSequence and work your way down by adding introns and
> exons. Andy has worked hard on this code, which will make it really
> easy to use for programmers who don't know all the details.
>
> It has been a month since the BioJava Hackathon and I am feeling guilty
> that I haven't taken the time to write any of this up. Writing code is
> the easy part; doing the documentation is always tough! I will see if
> this email generates a larger discussion on the list and, based on how
> everything shakes out, will turn the discussion into a wiki page giving
> a sequence design overview and code for testing and implementing other
> proxy loaders.
>
> Thanks
>
> Scooter Willis
>
> On Feb 19, 2010, at 2:51 PM, Richard Holland wrote:
>
>> Depends on whether or not you want to parse-at-once or stream-parse.
>> If the parser is set up to load the whole lot at once, then a map is
>> fine, otherwise not.
>>
>> On 20 Feb 2010, at 08:30, Scooter Willis wrote:
>>
>>> I am starting to use the new FastaReader in a project and the
>>> default implementation I set up returns a List. The very next thing
>>> I needed to do was convert to a LinkedHashMap so I could query the
>>> sequence of interest. It would seem that this is probably a fairly
>>> standard use case.
>>> If I returned a LinkedHashMap as the default container then we have
>>> a slight memory hit from keeping a hash of the accession ID and a
>>> linked list for preserving order.
>>>
>>> Does anyone have objections to returning the sequences read from a
>>> Fasta file as a LinkedHashMap?
>>>
>>> Thanks
>>>
>>> Scooter
>>> _______________________________________________
>>> biojava-dev mailing list
>>> biojava-dev at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>
>> --
>> Richard Holland, BSc MBCS
>> Operations and Delivery Director, Eagle Genomics Ltd
>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/

--
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/

From genjasp at gmail.com Sat Feb 20 11:07:05 2010
From: genjasp at gmail.com (Alessandro Cipriani)
Date: Sat, 20 Feb 2010 12:07:05 +0100
Subject: [Biojava-dev] Global Alignment
Message-ID: <46b9a2151002200307r749273e2gedf1ee28667f3f31@mail.gmail.com>

Hi to all,

I have a question: how do I align two circular DNA sequences? Is it possible?

Thanks,
Regards
alex

--
Alessandro Cipriani
(+39) 3206009509
http://www.cipriania.it
skype:genjasp at gmail.com
msn:jaspzz

From andreas.prlic at gmail.com Sat Feb 20 20:55:39 2010
From: andreas.prlic at gmail.com (Andreas Prlic)
Date: Sat, 20 Feb 2010 12:55:39 -0800
Subject: [Biojava-dev] List vs LinkedHashMap
In-Reply-To: <4343DB95-1642-4933-A361-1DEDAF6A1CAC@scripps.edu>
References: <0FD1F27D-3577-48C9-A34E-C41191130C44@scripps.edu> <9570AD34-C620-4F46-A2AC-DD66CD5F4196@eaglegenomics.com> <4343DB95-1642-4933-A361-1DEDAF6A1CAC@scripps.edu>
Message-ID: <59a41c431002201255u52aaf539m3d3e523763cdc87@mail.gmail.com>

I think this is coming along really nicely. Before we can call this ready for release, we will still need to spend some time on documenting things; otherwise it will be very hard for any user to figure out what is going on. (That also applies to the structure modules, which need a push for more documentation...)

Besides this, what I have seen so far seems to be really easy to use...

Andreas

From jolyon.holdstock at ogt.co.uk Thu Feb 25 18:11:39 2010
From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock)
Date: Thu, 25 Feb 2010 18:11:39 -0000
Subject: [Biojava-dev] Another alignment example
Message-ID: <588D0DD225D05746B5D8CAE1BE971F3F02D5FB19@EUCLID.internal.ogtip.com>

Hi,

I was playing around with my alignment example and have made it prettier. I've added this to the cookbook as well.

Is it a suitable example for a cookbook?

Thanks,

J

Dr. Jolyon Holdstock
Senior Computational Biologist,
Oxford Gene Technology,
Begbroke Science Park,
Sandy Lane, Yarnton,
Oxford, OX5 1PF, UK.

T: +44 (0)1865 856 852
F: +44 (0)1865 842 116
E: jolyon.holdstock at ogt.co.uk
W: www.ogt.co.uk

Looking to outsource your microarray studies? Look no further.

Scientific pedigree delivering high quality microarray results to you:
* Service capacity >1000 samples per week
* Rigorous QC from sample to result
* Applications available include aCGH, CNV, methylation studies and miRNA

Oxford Gene Technology (Operations) Ltd.
Registered in England No: 03845432
Begbroke Science Park Sandy Lane Yarnton Oxford OX5 1PF

Confidentiality Notice: The contents of this email from Oxford Gene Technology are confidential and intended solely for the person to whom it is addressed. It may contain privileged and confidential information. If you are not the intended recipient you must not read, copy, distribute, discuss or take any action in reliance on it. If you have received this email in error please advise the sender so that we can arrange for proper delivery. Then please delete the message from your inbox. Thank you.

This email has been scanned by Oxford Gene Technology Security Systems.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 15781 bytes
Desc: image001.png
URL:

From bugzilla-daemon at portal.open-bio.org Fri Feb 26 00:30:06 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 25 Feb 2010 19:30:06 -0500
Subject: [Biojava-dev] [Bug 2541] Exception is thrown when trying to parse a valid GenBank file
In-Reply-To:
Message-ID: <201002260030.o1Q0U6U4030935@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2541

maruco at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Fri Feb 26 10:46:54 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 26 Feb 2010 05:46:54 -0500
Subject: [Biojava-dev] [Bug 2541] Exception is thrown when trying to parse a valid GenBank file
In-Reply-To:
Message-ID: <201002261046.o1QAksIv016486@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2541

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2010-02-26 05:46 EST -------
Did you cut and paste the example file? It is malformed (missing the record-terminating // line after the sequence).

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From maruco at gmail.com Fri Feb 26 12:32:52 2010
From: maruco at gmail.com (Thiago Satake)
Date: Fri, 26 Feb 2010 09:32:52 -0300
Subject: [Biojava-dev] SVN Timed Out
Message-ID:

Hello all,

I am new here and I am trying to check out the latest copy of biojava-live from the "SVN trunk", but I am receiving this message:

svn list svn://code.open-bio.org/biojava
svn: Can't connect to host 'code.open-bio.org': Connection timed out

Is there something I am doing wrong?

Thanks

--
Thiago Seito Satake
Tel: (011) 6588-8045

From biopython at maubp.freeserve.co.uk Fri Feb 26 12:56:16 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Feb 2010 12:56:16 +0000
Subject: [Biojava-dev] SVN Timed Out
In-Reply-To:
References:
Message-ID: <320fb6e01002260456l2c0b807ft1d5a847179e1e78f@mail.gmail.com>

On Fri, Feb 26, 2010 at 12:32 PM, Thiago Satake wrote:
>
> Hello all,
>
> I am new here and I am trying to check out the latest copy of
> biojava-live from the "SVN trunk", but I am receiving this message:
>
> svn list svn://code.open-bio.org/biojava
> svn: Can't connect to host 'code.open-bio.org': Connection timed out
>
> Is there something I am doing wrong?
I know there have been some recent problems with the server code.open-bio.org, which offers anonymous CVS/SVN access to most of the OBF projects. The OBF are aware of this and are looking into it.

Peter

e.g. http://lists.open-bio.org/pipermail/bioperl-l/2010-February/032365.html

From andreas at sdsc.edu Fri Feb 26 17:30:28 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 26 Feb 2010 09:30:28 -0800
Subject: [Biojava-dev] Another alignment example
In-Reply-To: <588D0DD225D05746B5D8CAE1BE971F3F02D5FB19@EUCLID.internal.ogtip.com>
References: <588D0DD225D05746B5D8CAE1BE971F3F02D5FB19@EUCLID.internal.ogtip.com>
Message-ID: <59a41c431002260930r7864930fjd32ca6f35acbdca6@mail.gmail.com>

Hi Jolyon,

I think it is fine. Perhaps you can merge the two examples into one, since they seem to be very similar?

Cheers,
Andreas

On Thu, Feb 25, 2010 at 10:11 AM, Jolyon Holdstock <jolyon.holdstock at ogt.co.uk> wrote:
> Hi,
>
> I was playing around with my alignment example and have made it
> prettier. I've added this to the cookbook as well.
>
> Is it a suitable example for a cookbook?
>
> Thanks,
>
> J

From jolyon.holdstock at ogt.co.uk Fri Feb 26 17:35:42 2010
From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock)
Date: Fri, 26 Feb 2010 17:35:42 -0000
Subject: [Biojava-dev] Another alignment example[Scanned]
References: <588D0DD225D05746B5D8CAE1BE971F3F02D5FB19@EUCLID.internal.ogtip.com> <59a41c431002260930r7864930fjd32ca6f35acbdca6@mail.gmail.com>
Message-ID: <588D0DD225D05746B5D8CAE1BE971F3F02D5FCF0@EUCLID.internal.ogtip.com>

Hi Andreas,

They are very similar; the second one uses the same code as the first but builds on it. I don't mind merging them; I didn't want to make it too complicated for a new BioJava user.

Cheers,

J