From hrh at fmi.ch Tue Nov 1 06:18:54 2011
From: hrh at fmi.ch (Hotz, Hans-Rudolf)
Date: Tue, 1 Nov 2011 11:18:54 +0100
Subject: [Bioperl-l] best way to edit sequence features
In-Reply-To:
Message-ID:

Hi Carnë

Please allow me to make a few comments:

I very much like your idea of writing a free tool to edit and draw
sequences. We (i.e. people working in core Bioinformatics facilities) all
suffer from having to deal with files originally created with commercial
packages. And on top of all the pain, those commercial packages are very
expensive and they don't deliver what they promise to do.

Just double checking: Have you looked at the free tools which are available?

I am aware of the following ones (as far as I know, they are all GUI based
and don't have a command line API):

Serial Cloner     http://serialbasics.free.fr/Serial_Cloner.html
GENtle            http://gentle.magnusmanske.de/
GeneCoder         http://www.algosome.com/gene-coder/gene-coder.html
pDRAW32           http://www.acaclone.com/
Genome Workbench  http://www.ncbi.nlm.nih.gov/projects/gbench/
Ape               http://www.biology.utah.edu/jorgensen/wayned/ape/
UGene             http://ugene.unipro.ru/

maybe others on the list know of even better free tools?

Also, have you looked at the emboss tool "cirdna"?

WRT file formats: I strongly suggest sticking to embl and genbank format as
input and (text) output format. The features are not indexed, but you can
create your own index when you store the sequences in your system. Internally,
you probably wanna keep the data in a 'simpler' format than embl or genbank,
anyway.

Alternatively, have you looked at gff/gtf as a way of getting features?
see:

http://www.sequenceontology.org/gff3.shtml
http://mblab.wustl.edu/GTF22.html

I am looking forward to any progress you make

Regards, Hans

Hans-Rudolf Hotz, PhD
Bioinformatics Support

Friedrich Miescher Institute for Biomedical Research
Maulbeerstrasse 66
4058 Basel/Switzerland

On 10/31/11 7:05 PM, "Carnë Draug" wrote:

> Hi
>
> I've been planning on writing a free (as in freedom) tool to edit
> sequences and make plasmid maps. The idea is to build the command line
> tool first and maybe later work on a GUI for it.
>
> The problem I foresee at the moment while designing it, is how to
> change a feature of the sequence. I'm not familiar with all sequence
> formats (only fasta, ensembl and genbank) but I can't see how to
> specify from the command line what feature to edit since I can't see
> any unique identifiers for them. Is there a file format that makes
> this easier? Any tips would be most appreciated.
>
> Thanks in advance,
> Carnë Draug
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Tue Nov 1 09:40:30 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 1 Nov 2011 13:40:30 +0000
Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual
In-Reply-To:
References:
Message-ID: <26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu>

On Oct 24, 2011, at 9:58 AM, Sofia Robb wrote:

> Hi,
>
> I am having problems running Bio::Index::Fastq. I get the following error
> when a quality line begins with '@'.
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: No description line parsed
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:368
> STACK: Bio::SeqIO::fastq::next_dataset /usr/share/perl5/Bio/SeqIO/fastq.pm:71
> STACK: Bio::SeqIO::fastq::next_seq /usr/share/perl5/Bio/SeqIO/fastq.pm:29
> STACK: Bio::Index::AbstractSeq::fetch /usr/share/perl5/Bio/Index/AbstractSeq.pm:147
> STACK: Bio::Index::AbstractSeq::get_Seq_by_id /usr/share/perl5/Bio/Index/AbstractSeq.pm:198
> STACK: /home_stajichlab/robb/bin/clean_pairs_indexed.pl:68
>
> Here is an example of a fastq record that is causing this error. The last
> line, which starts with an '@', is actually the qual line.
>
> @5:105:15806:16092:Y
> GTGGCGCGGAACAGAGGAGGAATGTTCAGGAGAGGGGGCATGTGTTGTTACCGAGTACTTGGAAACGACG
> +
> @9;A565:=8B?
>
> i see that chris has partially addressed this in the mailing list
> http://bioperl.org/pipermail/bioperl-l/2011-January/034481.html
>
> However as he pointed out at the time, it appears this may be a fairly
> large problem.

The indexer is being refactored to address this problem; the Bio::SeqIO
parser actually does parse this, but the (very simple) indexer does not. I
can try to push this to the forefront this week, the fix shouldn't be too
hard to implement. In essence it would simply use a few SeqIO methods I
built in to parse out each bit of data in chunks; it would just need to track
the start and length of each chunk while the parser is running.

> My fastq seq and qual lines are always only one line, so I think that
> adding a line count and only checking for @ in the lines where
> $line_count%4 == 0 would work, since the header lines are always the first
> of 4 lines: 0, 4, 8, etc.

That doesn't work for all cases, however (some FASTQ wraps the seq and qual,
like FASTA). Peter and I have discussed this elsewhere; a possible solution
is to add in an optimized parser that takes this assumption into account.

One problem the various Bio* indexers have currently is the lack of
standardization on a specific schema for indexing. There are in-roads
towards this (OBDA) that haven't been adequately traveled IMHO, which need
to be taken up again.

A second, and maybe this is more specific to BioPerl, is that the parsers
and indexers essentially reimplement the format parsing in each module, so
if there are bugs they have to be independently fixed (hence why SeqIO works
and the indexer doesn't; I wrote the first but not the second). The best
place for any optimizations would be in a unified parser that both the SeqIO
and indexer modules could use.
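
To make the "track the start and length of each chunk" idea concrete, here is a minimal sketch in plain Perl. It is not the Bio::Index::Fastq or Bio::SeqIO::fastq code; the function name index_fastq and the returned [id, offset, length] layout are assumptions made for illustration. Headers are recognised by their position in the record, and the quality block is read until it is as long as the sequence, so a quality line starting with '@' and wrapped seq/qual lines are both handled.

```perl
use strict;
use warnings;

# Hypothetical helper: scan a FASTQ filehandle and return one
# [id, byte_offset, byte_length] triple per record.
sub index_fastq {
    my ($fh) = @_;
    my @records;
    while (1) {
        my $start  = tell($fh);
        my $header = <$fh>;
        last unless defined $header;
        next if $header =~ /^\s*$/;          # tolerate blank lines between records
        $header =~ /^@(\S+)/
            or die "Expected a header line at byte $start\n";
        my $id = $1;

        # The sequence may be wrapped: read up to the '+' separator line.
        my $seq_len = 0;
        while (defined(my $line = <$fh>)) {
            last if $line =~ /^\+/;
            $line =~ s/\s+$//;
            $seq_len += length $line;
        }

        # The quality may be wrapped too: read until it matches the sequence
        # length, so a leading '@' in a quality line is never taken as a header.
        my $qual_len = 0;
        while ($qual_len < $seq_len and defined(my $line = <$fh>)) {
            $line =~ s/\s+$//;
            $qual_len += length $line;
        }
        die "Truncated or malformed record '$id'\n" if $qual_len != $seq_len;

        push @records, [ $id, $start, tell($fh) - $start ];
    }
    return \@records;
}

# Usage on an in-memory file: read1 has a quality line starting with '@',
# read2 has wrapped sequence and quality lines.
my $fastq = join "", map { "$_\n" }
    '@read1', 'GTGGCGCGGAAC', '+', '@9;A565:=8B?',
    '@read2', 'ACGT', 'ACGT', '+read2', '@@II', 'IIII';
open my $fh, '<', \$fastq or die "Cannot open in-memory file: $!";
my $index = index_fastq($fh);
printf "%s\t%d\t%d\n", @$_ for @$index;
```

The cost of this approach compared with a strict 4-line parser is one length comparison per quality line, which is why it could sit behind the same interface as a faster fixed-layout scanner.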
> But if there are multiple lines of seq and qual i think that the /^\+$/
> or /^\+$id/ can be used to identify the end of the sequence, and the
> number of lines of quality should be equal to the number of lines of
> sequence
>
> ## only for single line seq and qual
> my $line_count = 0;
> while (<$FASTQ>) {
>     if (/^@/ and $line_count % 4 == 0) {
>         # $begin is the position of the first character after the '@'
>         my $begin = tell($FASTQ) - length( $_ ) + 1;
>         foreach my $id (&$id_parser($_)) {
>             $self->add_record($id, $i, $begin);
>             $c++;
>         }
>     }
>     $line_count++;
> }
>
> --
> BioPerl fastq parsing issues aside, is there another tool which allows
> you to retrieve arbitrary sequences from a fastq file by sequence ID?
>
> There's one called cdbfasta which looks like it might work; does anyone
> have experience with it?

I haven't, but it appears FASTA-specific. Does it parse FASTQ as well? I
recall Sanger has a C-based FASTA/FASTQ hybrid one as well. May have to
look that one up.

> Thanks,
> sofia
>
> P.S. I am CCing Peter Cock in case BioPython has solved this issue
> already; if so, perhaps their solution could be applied here.

chris

From p.j.a.cock at googlemail.com Tue Nov 1 10:38:43 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 1 Nov 2011 14:38:43 +0000
Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual
In-Reply-To: <26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu>
References: <26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu>
Message-ID:

On Tue, Nov 1, 2011 at 1:40 PM, Fields, Christopher J wrote:
>
> One problem the various Bio* indexers have currently is the lack of
> standardization on a specific schema for indexing. There are in-roads
> towards this (OBDA) that haven't been adequately traveled IMHO,
> which need to be taken up again.
>

Something to switch to open-bio-l at lists.open-bio.org for,
http://lists.open-bio.org/mailman/listinfo/open-bio-l

We can continue this thread from last summer,
http://lists.open-bio.org/pipermail/open-bio-l/2010-April/000662.html
http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000676.html
...
http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000680.html

And CC Peter Rice from EMBOSS too - we chatted about this
at ISMB/BOSC 2011 in July - and whoever looks after the
OBDA/indexing code in BioRuby and BioJava too.

> A second, and maybe this is more specific to BioPerl, is that the
> parsers and indexers essentially reimplement the format parsing
> in each module, so if there are bugs they have to be independently
> fixed (hence why SeqIO works and the indexer doesn't; I wrote the
> first but not the second). The best place for any optimizations
> would be in a unified parser that both the SeqIO and indexer
> modules could use.

We have that problem to an extent in Biopython's Bio.SeqIO code.
The indexing code duplicates some logic of the parsing code
(how much depends on the format), sufficient to extract the read
ID and the bounds on disk. The two could be more unified but
the parsers came first and I didn't want to change them at the time.
Instead I tried to be rigorous in consistency testing for the index
code's unit tests.

Regards,

Peter

From carandraug+dev at gmail.com Tue Nov 1 11:13:06 2011
From: carandraug+dev at gmail.com (Carnë Draug)
Date: Tue, 1 Nov 2011 15:13:06 +0000
Subject: [Bioperl-l] best way to edit sequence features
In-Reply-To:
References:
Message-ID:

On 1 November 2011 10:18, Hotz, Hans-Rudolf wrote:
> I am aware of the following ones (as far as I know, they are all GUI based
> and don't have a command line API):

They are not all free. Just for future reference, here are their licenses:

> Serial Cloner

Couldn't find a license and the download for linux has no source so
I'm guessing proprietary.

> GENtle            http://gentle.magnusmanske.de/

Free under GPL

> GeneCoder

Proprietary

> pDRAW32

Proprietary

> Genome Workbench  http://www.ncbi.nlm.nih.gov/projects/gbench/

Seems public domain. License is not defined anywhere but the files I
checked had the public domain notice on the header

> Ape

Proprietary ("license" is at the top of AppMain.tcl)

> UGene             http://ugene.unipro.ru/

Free under GPL

> Also, have you looked at the emboss tool "cirdna" ?

Free under GPL

> WRT file formats: I strongly suggest to stick to embl and genbank format as
> input and (text) output format. The features are not indexed, but you can
> create your own when you store the sequences in your system. Internally, you
> probably wanna keep the data in a 'simpler' format than embl or genbank,
> anyway.
>
> Alternatively, have you looked at gff/gtf as away of getting features?
> see:
>
> http://www.sequenceontology.org/gff3.shtml
> http://mblab.wustl.edu/GTF22.html

Considering the already existing alternatives, I'm more likely to
collaborate with one of them to do what I want. I'll just have to
check them all and decide. I was planning on writing a new tool and
contributing it to the scripts section of bioperl since, when I googled
before getting these links, only the proprietary tools showed up.

Thank you very much for the links.

Carnë

From roy.chaudhuri at gmail.com Tue Nov 1 11:44:19 2011
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Tue, 01 Nov 2011 15:44:19 +0000
Subject: [Bioperl-l] best way to edit sequence features
In-Reply-To:
References:
Message-ID: <4EB013D3.30801@gmail.com>

The Sanger Institute's Artemis is good for editing sequence features,
and DNAPlotter can be used to produce circular diagrams:
http://www.sanger.ac.uk/resources/software/artemis
http://www.sanger.ac.uk/resources/software/dnaplotter

Roy.

From jason.stajich at gmail.com Tue Nov 1 12:02:24 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Tue, 1 Nov 2011 09:02:24 -0700
Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual
In-Reply-To:
References: <26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu>
Message-ID: <47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com>

I think a different indexer is needed for the scale of key/value pairs we
see in fastq files if we want to make a fast lookup by ID. I think speed is
of the essence for this type of solution, and so forcing all records to be
4 lines long is okay for this type of implementation.

I found NOSQL implementations to have much better performance than any of
the BDB type solutions -- they end up being really slow at above 1-5M keys.
I used TokyoCabinet and KyotoCabinet to do indexing of accession ->
taxonomy ID and found it quite fast for the needs. I haven't tried storing
100bp reads + qual string as the value in it yet but I think it could be
done, certainly worth a prototype.
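
A minimal sketch of the fixed-layout variant described above, under stated assumptions: every record is exactly 4 lines, and a plain Perl hash stands in for the key/value store (TokyoCabinet/KyotoCabinet are not used here; a tied hash over such a store could be passed in instead, which is an untested assumption). The function names index_four_line_fastq and fetch_record are hypothetical.

```perl
use strict;
use warnings;

# Build id => byte offset for a strictly 4-line-per-record FASTQ file.
# $store is any hash reference (a plain hash here).
sub index_four_line_fastq {
    my ($fh, $store) = @_;
    my $line_no = 0;
    my $offset  = tell($fh);
    while (defined(my $line = <$fh>)) {
        if ($line_no % 4 == 0) {             # header lines only: 0, 4, 8, ...
            $line =~ /^@(\S+)/
                or die "Line ", $line_no + 1, " is not a FASTQ header\n";
            $store->{$1} = $offset;
        }
        $offset = tell($fh);                 # start of the next line
        $line_no++;
    }
    return $store;
}

# Fetch one record by ID: seek to the stored offset and read four lines.
sub fetch_record {
    my ($fh, $store, $id) = @_;
    my $offset = $store->{$id};
    return unless defined $offset;
    seek($fh, $offset, 0) or die "seek failed: $!";
    my @lines;
    for (1 .. 4) {
        my $line = <$fh>;
        last unless defined $line;
        chomp $line;
        push @lines, $line;
    }
    return \@lines;
}

# Usage on an in-memory file with shortened example reads.
my $fastq = join "", map { "$_\n" }
    '@5:105:15806:16092:Y', 'GTGGCGCGGAAC', '+', '@9;A565:=8B?',
    '@5:105:15806:16093:Y', 'ACGTACGTACGT', '+', 'IIIIIIIIIIII';
open my $fh, '<', \$fastq or die "Cannot open in-memory file: $!";
my %index;
index_four_line_fastq($fh, \%index);
my $rec = fetch_record($fh, \%index, '5:105:15806:16093:Y');
print join("\n", @$rec), "\n";
```

Storing only the offset keeps the index small; storing the read and quality string as the value, as suggested above, would trade index size for avoiding the seek.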

Jason

From cjfields at illinois.edu Tue Nov 1 13:44:25 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 1 Nov 2011 17:44:25 +0000
Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual
In-Reply-To: <47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com>
References: <26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu> <47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com>
Message-ID: <6EFDCA41-D385-4CCB-A17B-0736F93DACDF@illinois.edu>

On Nov 1, 2011, at 11:02 AM, Jason Stajich wrote:

> I think a different indexer is needed for the scale of key/value pairs we
> see in fastq files if we want to make a fast lookup by ID. I think speed
> is of essence for this type of solution and so a forced all records must
> be 4 lines long is okay for this type of implementation.

This can always be an early optimization, that's easy enough. But I'm sure
we will have to deal with multi-line seq/qual FASTQ at some point.

> I found NOSQL implementations to be much better performance and than any
> of the BDB type solutions -- they end up being really slow at above 1-5M
> keys. I used TokyoCabinet and KyotoCabinet to do indexing of accession ->
> taxonomy ID and found it quite fast for the needs. I haven't tried storing
> 100bp reads + qual string as the value in it yet but I think it could be
> done, certainly worth a prototype.

Adding a middle layer where the backend storage is abstracted is probably
the (best|most flexible) option, converging on a good default that will
work for this data. The actual interface is in place, though would it be
more feasible to go the OBDA route (converge on a cross-Bio* compatible
schema)? Or are there problems afoot there we're unaware of?

Re: specifics, I think Biopython uses SQLite, is that correct Peter?
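
The "middle layer" idea can be sketched as a small class whose storage is injected. This is only an illustration under stated assumptions: the package name OffsetIndex, its methods, and the (file number, offset, length) record layout are hypothetical and are not the existing BioPerl interface or the OBDA schema.

```perl
use strict;
use warnings;

package OffsetIndex;    # hypothetical name, not a BioPerl module

# 'store' is any hash reference; a tied hash over BDB, SQLite or a
# key/value store could be passed in without changing the callers.
sub new {
    my ($class, %args) = @_;
    return bless { store => $args{store} || {} }, $class;
}

# Values are flattened to one tab-separated string so that back ends
# which can only hold strings remain usable.
sub add {
    my ($self, $id, $file_no, $offset, $length) = @_;
    $self->{store}{$id} = join "\t", $file_no, $offset, $length;
    return $self;
}

# Returns (file number, offset, length), or an empty list for unknown IDs.
sub lookup {
    my ($self, $id) = @_;
    my $value = $self->{store}{$id};
    return defined $value ? split(/\t/, $value) : ();
}

package main;

# Usage with the default in-memory hash; the numbers are made up.
my $index = OffsetIndex->new;
$index->add('5:105:15806:16092:Y', 0, 0, 106);
my ($file_no, $offset, $length) = $index->lookup('5:105:15806:16092:Y');
print "file $file_no, offset $offset, length $length\n";
```

Because callers only see add and lookup, the default back end can be changed later without touching the parsers, which is the flexibility argued for above.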

chris

From p.j.a.cock at googlemail.com Tue Nov 1 14:06:50 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 1 Nov 2011 18:06:50 +0000
Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual
In-Reply-To: <6EFDCA41-D385-4CCB-A17B-0736F93DACDF@illinois.edu>
References: <26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu> <47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com> <6EFDCA41-D385-4CCB-A17B-0736F93DACDF@illinois.edu>
Message-ID:

On Tue, Nov 1, 2011 at 5:44 PM, Fields, Christopher J wrote:
> On Nov 1, 2011, at 11:02 AM, Jason Stajich wrote:
>
>> I think a different indexer is needed for the scale of key/value
>> pairs we see in fastq files if we want to make a fast lookup by
>> ID. I think speed is of essence for this type of solution and so
>> a forced all records must be 4 lines long is okay for this type
>> of implementation.
>
> This can always be an early optimization, that's easy enough.
> But I'm sure we will have to deal with multi-line seq/qual
> FASTQ at some point.
>
>> I found NOSQL implementations to be much better
>> performance and than any of the BDB type solutions -- they
>> end up being really slow at above 1-5M keys. I used
>> TokyoCabinet and KyotoCabinet to do indexing of accession
>> -> taxonomy ID and found it quite fast for the needs. I
>> haven't tried storing 100bp reads + qual string as the
>> value in it yet but I think it could be done, certainly worth
>> a prototype.
>
> Adding a middle layer where the backend storage is abstracted
> is the probably the (best|most flexible) option, converging on a
> good default that will work for this data. The actual interface is
> in place, though would it be more feasible to go the OBDA
> (converge on a cross-Bio* compatible schema)? Or are there
> problems afoot there we're unaware of?
>
> Re: specifics, I think Biopython uses SQLite, is that correct Peter?
>
> chris

Yes, we're using SQLite3 to store essentially a list of filenames
and their format as one table, and then in the main table an
entry for each sequence recording the ID (only one accession,
unlike OBDA which had infrastructure for a secondary accession),
file number, offset of the start of the record, and optionally the
length of the record on disk.

i.e. Basically what OBDA does, but using SQLite rather
than BDB (not included in Python 3) or a flat file index
(poor performance with large datasets).

I find this design attractive on several levels:
* File format neutral, covers FASTA, FASTQ, GenBank, etc
* Preserves the original file untouched
* Index is a small single file (thanks to SQLite)
* Back end could be switched out
* Could be applied to compressed file formats
* Reuses existing parsing code to access entries

This could easily form the basis of OBDA v2; the main points
of difference I anticipate between the Bio* projects would
be naming conventions for the different file formats, and
what we consider to be the default record ID of each read
(e.g. which field in a GenBank file - although agreement
here is not essential). Some of that was already settled in
principle with OBDA v1.

On the other hand, you could try and store the parsed data
itself, which is where NOSQL looks more interesting. That
essentially requires the ability to serialise your annotated
sequence object model to disk - which would be tricky to
do cross project (much more ambitious than BioSQL is).
It also means the "index" becomes very large because it
now holds all the original data.

Peter

From wenbinmei at gmail.com Wed Nov 2 00:25:32 2011
From: wenbinmei at gmail.com (wenbin mei)
Date: Wed, 2 Nov 2011 00:25:32 -0400
Subject: [Bioperl-l] how to not count gaps in the multiple sequence alignment
Message-ID:

Hi,

I need some help in coding. I have a multiple sequence alignment which has
gaps. I also have a reference genome sequence in the alignment, for which I
know all the coordinates of the protein coding genes. I want to extract the
alignments of all these protein coding genes from the big alignment. I am
using Bio::SimpleAlign, but the problem is that, due to the gaps in the
alignment, the coordinates have shifted in the alignment. I wonder if there
is a way I can not count the gaps and still be able to extract the protein
alignment. One way I can do this is to remove the gaps in the reference
first and then extract the sequence. But I don't like this way ... Thank
you for the help.

-best,
wenbin

From dejian.zhao at gmail.com Wed Nov 2 09:33:18 2011
From: dejian.zhao at gmail.com (Dejian Zhao)
Date: Wed, 02 Nov 2011 21:33:18 +0800
Subject: [Bioperl-l] Modules to read MEGA output and reproduce the phylogenetic tree
Message-ID: <4EB1469E.4050108@gmail.com>

There are various packages on CPAN to cope with phylogenetic analysis. I
wonder which module can read the output from other phylogenetic
software, e.g. MEGA, and reproduce the phylogenetic tree. I want to
produce a picture which combines the phylogenetic tree and the structure
of each gene.

From roy.chaudhuri at gmail.com Wed Nov 2 09:49:46 2011
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Wed, 02 Nov 2011 13:49:46 +0000
Subject: [Bioperl-l] Modules to read MEGA output and reproduce the phylogenetic tree
In-Reply-To: <4EB1469E.4050108@gmail.com>
References: <4EB1469E.4050108@gmail.com>
Message-ID: <4EB14A7A.30307@gmail.com>

MEGA can export trees in Newick format, which can be read by
Bio::TreeIO. The tree can be drawn in EPS format using
Bio::Tree::Draw::Cladogram. See:
http://www.bioperl.org/wiki/HOWTO:Trees

Roy.

From jun.yin at ucd.ie Wed Nov 2 12:29:45 2011
From: jun.yin at ucd.ie (Jun Yin)
Date: Wed, 02 Nov 2011 16:29:45 +0000 (GMT)
Subject: [Bioperl-l] how to not count gaps in the multiple sequence alignment
In-Reply-To:
References:
Message-ID: <7300ecdd1dd56.4eb16ff9@ucd.ie>

Hi,

You need to calculate the coordinates of the protein coding gene in the
alignment by yourself. After that, you can use the slice function to get
the alignment block for the selected gene, e.g.

$aln2 = $aln->slice(20, 30);

Cheers,
Jun

----- Original Message -----
From: wenbin mei
Date: Wednesday, November 2, 2011 5:51 am
Subject: [Bioperl-l] how to not count gaps in the multiple sequence alignment
To: bioperl-l at lists.open-bio.org

> Hi,
>
> I need some help in coding. I have a multiple sequence alignment
> which has gaps. And also I have a reference genome sequence in the
> alignment which I know all the coordinates for the protein coding
> genes. I want to extract all these protein coding genes alignment
> from the big alignment. I am using Bio SimpleAlign but the question
> is that due to the gaps in the alignment, the coordinates has shifted
> in the alignment. I wonder is there a way I can not count the gaps
> and still be able to extract the protein alignment. One way I can do
> is remove the gaps in the reference first and then extract the
> sequence. But I don't like this way ... Thank you for help.
>
> -best,
> wenbin

From dejian.zhao at gmail.com Wed Nov 2 21:39:22 2011
From: dejian.zhao at gmail.com (Dejian Zhao)
Date: Thu, 03 Nov 2011 09:39:22 +0800
Subject: [Bioperl-l] Modules to read MEGA output and reproduce the phylogenetic tree
In-Reply-To: <4EB14A7A.30307@gmail.com>
References: <4EB1469E.4050108@gmail.com> <4EB14A7A.30307@gmail.com>
Message-ID: <4EB1F0CA.80309@gmail.com>

That's great! Many thanks, Roy.

On 2011-11-2 21:49, Roy Chaudhuri wrote:
> MEGA can export trees in Newick format, which can be read by
> Bio::TreeIO. The tree can be drawn in EPS format using
> Bio::Tree::Draw::Cladogram. See:
> http://www.bioperl.org/wiki/HOWTO:Trees
>
> Roy.
>
> On 02/11/2011 13:33, Dejian Zhao wrote:
>> There are various packages on CPAN to cope with phylogenetic analysis. I
>> wonder which module can read the output from other phylogenetic
>> softwares, e.g. MEGA, and reproduce the phylogenetic tree. I want to
>> produce a picture which combines the phylogenetic tree and the structure
>> of each gene.

From noncoding at gmail.com Thu Nov 3 05:59:26 2011
From: noncoding at gmail.com (Remo Sanges)
Date: Thu, 03 Nov 2011 10:59:26 +0100
Subject: [Bioperl-l] how to not count gaps in the multiple sequence alignment
In-Reply-To: <7300ecdd1dd56.4eb16ff9@ucd.ie>
References: <7300ecdd1dd56.4eb16ff9@ucd.ie>
Message-ID: <4EB265FE.30909@gmail.com>

To get the location in the initial sequence starting from a column in a
multiple alignment you can:

1) create a Bio::LocatableSeq compliant object by using the method
   each_seq_with_id on the SimpleAlign object
2) then use the method location_from_column on the created
   LocatableSeq object

HTH

Remo

--
Remo Sanges
Bioinformatics - Animal Physiology and Evolution
Stazione Zoologica Anton Dohrn
Villa Comunale, 80121 Napoli - Italy
+39 081 5833428

From G.Gallone at sms.ed.ac.uk Thu Nov 3 07:50:11 2011
From: G.Gallone at sms.ed.ac.uk (Giuseppe G.)
Date: Thu, 03 Nov 2011 11:50:11 +0000
Subject: [Bioperl-l] Bio::SimpleAlign - Meaning of overall_percentage_identity?
Message-ID: <4EB27FF3.9050203@sms.ed.ac.uk>

Hi,

I would be grateful if you could shed some light on the exact meaning of
the method overall_percentage_identity in Bio::SimpleAlign.

If I understand correctly, the method works by considering only aminoacids
that are identical over all the members of the alignment, and then
averaging over the total number of aminoacids in the sequence. Is this
correct?

Thank you

Giuseppe

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

From David.Messina at sbc.su.se Thu Nov 3 09:22:21 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 3 Nov 2011 14:22:21 +0100
Subject: [Bioperl-l] Bio::SimpleAlign - Meaning of overall_percentage_identity?
In-Reply-To: <4EB27FF3.9050203@sms.ed.ac.uk>
References: <4EB27FF3.9050203@sms.ed.ac.uk>
Message-ID:

Hi Giuseppe,

> If I understand correctly, the method works by considering only aminoacids
> that are identical over all the members of the alignment

Yes.

> , and then averaging over the total number of aminoacids in the sequence.
> Is this correct?

Almost. By default, the denominator is the alignment length, namely the
length of the MSA including gaps.

By means of the 'short' and 'long' options, it's also possible to use the
shortest or longest sequence's ungapped lengths as the denominator.

Dave

From cjfields at illinois.edu Thu Nov 3 14:28:36 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 3 Nov 2011 18:28:36 +0000
Subject: [Bioperl-l] OBDA redux? was Re: Bio::Index::Fastq '@' in qual
In-Reply-To:
References: <26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu> <47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com> <6EFDCA41-D385-4CCB-A17B-0736F93DACDF@illinois.edu>
Message-ID:

(side thread, so re-titling...)

On Nov 1, 2011, at 1:06 PM, Peter Cock wrote:

> On Tue, Nov 1, 2011 at 5:44 PM, Fields, Christopher J wrote:
>> On Nov 1, 2011, at 11:02 AM, Jason Stajich wrote:
>>
>>> I think a different indexer is needed for the scale of key/value
>>> pairs we see in fastq files if we want to make a fast lookup by
>>> ID. I think speed is of essence for this type of solution and so
>>> a forced all records must be 4 lines long is okay for this type
>>> of implementation.
>>
>> This can always be an early optimization, that's easy enough.
>> But I'm sure we will have to deal with multi-line seq/qual
>> FASTQ at some point.
>>
>>> I found NOSQL implementations to be much better
>>> performance and than any of the BDB type solutions -- they
>>> end up being really slow at above 1-5M keys. I used
>>> TokyoCabinet and KyotoCabinet to do indexing of accession
>>> -> taxonomy ID and found it quite fast for the needs. I
>>> haven't tried storing 100bp reads + qual string as the
>>> value in it yet but I think it could be done, certainly worth
>>> a prototype.
>>
>> Adding a middle layer where the backend storage is abstracted
>> is the probably the (best|most flexible) option, converging on a
>> good default that will work for this data. The actual interface is
>> in place, though would it be more feasible to go the OBDA
>> (converge on a cross-Bio* compatible schema)? Or are there
>> problems afoot there we're unaware of?
>>
>> Re: specifics, I think Biopython uses SQLite, is that correct Peter?
>>
>> chris
>
> Yes, we're using SQLite3 to store essentially a list of filenames
> and their format as one table, and then in the main table an
> entry for each sequence recording the ID (only one accession,
> unlike OBDA which had infrastructure for a secondary accession),
> file number, offset of the start of the record, and optionally the
> length of the record on disk.
>
> i.e. Basically what OBDA does, but using SQLite rather
> than BDB (not included in Python 3) or a flat file index
> (poor performance with large datasets).
>
> I find this design attractive on several levels:
> * File format neutral, covers FASTA, FASTQ, GenBank, etc
> * Preserves the original file untouched
> * Index is a small single file (thanks to SQLite)
> * Back end could be switched out
> * Could be applied to compressed file formats
> * Reuses existing parsing code to access entries
>
> This could easily form basis of OBDA v2, the main points
> of difference I anticipate between the Bio* projects would
> be naming conventions for the different file formats, and
> what we consider to be the default record ID of each read
> (e.g. which field in a GenBank file - although agreement
> here is not essential). Some of that was already settled in
> principle with OBDA v1.

The primary/secondary IDs could be configurable with a sane default; I
think the bioperl implementations allowed this (and it is certainly
something that will be requested).

> On the other hand, you could try and store the parsed data
> itself, which is where NOSQL looks more interesting.
That > essentially requires the ability to serialise your annotated > sequence object model to disk - which would be tricky to do > cross project (much more ambitious than BioSQL is). It also > means the "index" becomes very large because it now holds > all the original data. > > Peter For a fully cross-Bio* compliant format, I don't think it's feasible to use serialized data unless they are serialized in something that is easily deserialized across HLLs (JSON, BSON, YAML, XML, etc). Either that, or such data is stored concurrently with the binary blob, along with meta data that indicates the source of the blob, parser, version, etc, etc (unless there are tools out there that reliably interconvert serialized complex data structures between HLLs). Anyway you go about it, it seems like it could be a major ball of hurt, unless implemented very carefully. Aside: I think this was one of the problems with Bio::DB::SeqFeature::Store, in that it at one point stored Perl-specific Storable blobs. chris From p.j.a.cock at googlemail.com Thu Nov 3 14:52:50 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 3 Nov 2011 18:52:50 +0000 Subject: [Bioperl-l] OBDA redux? Message-ID: On Thu, Nov 3, 2011 at 6:28 PM, Fields, Christopher J wrote: > (side thread, so re-titling...) > And CC'ing open-bio-l, which is a better home for this than bioperl-l, where OBDA v2 talk came up again in discussion of a BioPerl indexing problem. 
Archive links for thread here: http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035807.html http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035808.html http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035811.html http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035812.html http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035813.html http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035822.html > On Nov 1, 2011, at 1:06 PM, Peter Cock wrote: >> >> Yes, we're using SQLite3 to store essentially a list of filenames >> and their format as one table, and then in the main table an >> entry for each sequence recording the ID (only one accession, >> unlike OBDA which had infrastructure for a secondary accession), >> file number, offset of the start of the record, and optionally the >> length of the record on disk. >> >> i.e. Basically what OBDA does, but using SQLite rather >> than BDB (not included in Python 3) or a flat file index >> (poor performance with large datasets). >> >> I find this design attractive on several levels: >> * File format neutral, covers FASTA, FASTQ, GenBank, etc >> * Preserves the original file untouched >> * Index is a small single file (thanks to SQLite) >> * Back end could be switched out >> * Could be applied to compressed file formats >> * Reuses existing parsing code to access entries >> >> This could easily form basis of OBDA v2, the main points >> of difference I anticipate between the Bio* projects would >> be naming conventions for the different file formats, and >> what we consider to be the default record ID of each read >> (e.g. which field in a GenBank file - although agreement >> here is not essential). Some of that was already settled in >> principle with OBDA v1. > > The primary/secondary IDs could be configurable with a sane > default, I think the bioperl implementations allowed this (and > it is certainly something that will be requested). 
One reason I went with a single ID only was to keep the Python dictionary based API simple (think hash in Perl). You don't get secondary keys in a Python dict or a hash ;) As a nod to flexibility, in Biopython's Bio.SeqIO indexing you can provide a call back function to map the suggested ID to something else. Obviously this doesn't give the full flexibility of extracting a field from the record's annotation because we don't parse the whole record during indexing (it would be too slow). However, I'm happy for there to be an *optional* secondary key in an OBDA v2 SQLite schema, but Biopython probably won't populate it. We could optionally use it rather than the primary ID on loading an existing index though. Personally I would stick with one key in the index - it should be faster and makes it simpler to switch the back end if we need to later. If anyone wants a second key, they can build a second index *grin*. >> On the other hand, you could try and store the parsed data >> itself, which is where NOSQL looks more interesting. That >> essentially requires the ability to serialise your annotated >> sequence object model to disk - which would be tricky to do >> cross project (much more ambitious than BioSQL is). It also >> means the "index" becomes very large because it now holds >> all the original data. >> >> Peter > > For a fully cross-Bio* compliant format, I don't think it's feasible > to use serialized data unless they are serialized in something > that is easily deserialized across HLLs (JSON, BSON, YAML, > XML, etc). Either that, or such data is stored concurrently with > the binary blob, along with meta data that indicates the source > of the blob, parser, version, etc, etc (unless there are tools out > there that reliably interconvert serialized complex data structures > between HLLs). Anyway you go about it, it seems like it could > be a major ball of hurt, unless implemented very carefully. 
You missed out RDF as a serialisation ;) But yes, going down the shared serialisation route is going to be messy - as you are well aware: > Aside: I think this was one of the problems with > Bio::DB::SeqFeature::Store, in that it at one point stored > Perl-specific Storable blobs. > > chris Peter From cjfields at illinois.edu Thu Nov 3 15:47:51 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 3 Nov 2011 19:47:51 +0000 Subject: [Bioperl-l] OBDA redux? In-Reply-To: References: Message-ID: On Nov 3, 2011, at 1:52 PM, Peter Cock wrote: > On Thu, Nov 3, 2011 at 6:28 PM, Fields, Christopher J > wrote: >> (side thread, so re-titling...) >> > And CC'ing open-bio-l, which is a better home for this than bioperl-l, > where OBDA v2 talk came up again in discussion of a BioPerl indexing > problem. Archive links for thread here: > > http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035807.html > http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035808.html > http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035811.html > http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035812.html > http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035813.html > http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035822.html yes, good idea... >> On Nov 1, 2011, at 1:06 PM, Peter Cock wrote: >>> >>> Yes, we're using SQLite3 to store essentially a list of filenames >>> and their format as one table, and then in the main table an >>> entry for each sequence recording the ID (only one accession, >>> unlike OBDA which had infrastructure for a secondary accession), >>> file number, offset of the start of the record, and optionally the >>> length of the record on disk. >>> >>> i.e. Basically what OBDA does, but using SQLite rather >>> than BDB (not included in Python 3) or a flat file index >>> (poor performance with large datasets). 
>>> >>> I find this design attractive on several levels: >>> * File format neutral, covers FASTA, FASTQ, GenBank, etc >>> * Preserves the original file untouched >>> * Index is a small single file (thanks to SQLite) >>> * Back end could be switched out >>> * Could be applied to compressed file formats >>> * Reuses existing parsing code to access entries >>> >>> This could easily form basis of OBDA v2, the main points >>> of difference I anticipate between the Bio* projects would >>> be naming conventions for the different file formats, and >>> what we consider to be the default record ID of each read >>> (e.g. which field in a GenBank file - although agreement >>> here is not essential). Some of that was already settled in >>> principle with OBDA v1. >> >> The primary/secondary IDs could be configurable with a sane >> default, I think the bioperl implementations allowed this (and >> it is certainly something that will be requested). > > One reason I went with a single ID only was to keep the > Python dictionary based API simple (think hash in Perl). > You don't get secondary keys in a Python dict or a hash ;) > > As a nod to flexibility, in Biopython's Bio.SeqIO indexing you > can provide a call back function to map the suggested ID to > something else. Obviously this doesn't give the full flexibility > of extracting a field from the record's annotation because we > don't parse the whole record during indexing (it would be too > slow). Same with bioperl. > However, I'm happy for there to be an *optional* secondary > key in an OBDA v2 SQLite schema, but Biopython probably > won't populate it. We could optionally use it rather than the > primary ID on loading an existing index though. Optional implementation of that is fine by me. > Personally I would stick with one key in the index - it should > be faster and makes it simpler to switch the back end if we > need to later. If anyone wants a second key, they can build > a second index *grin*. That's easy enough. 
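To make the index layout described in the quoted text above more concrete, here is a minimal sketch in plain Perl. It is an illustration under stated assumptions, not the Biopython or OBDA implementation: a hash stands in for the SQLite table, the FASTA data sits in an in-memory filehandle, and the names build_index and fetch_record are hypothetical. Each entry keeps a single ID plus file number, start offset and record size, and the original data is left untouched:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: scan a FASTA filehandle and record, for every ID,
# where its record starts and how many bytes it occupies.
# $file_no identifies which file the record came from.
sub build_index {
    my ($fh, $file_no) = @_;
    my (%index, $current);
    my $pos = tell $fh;                      # offset of the line about to be read
    while (my $line = <$fh>) {
        if ($line =~ /^>(\S+)/) {            # header line starts a new record
            $current = $1;
            $index{$current} = { file => $file_no, offset => $pos, size => 0 };
        }
        $index{$current}{size} += length($line) if defined $current;
        $pos = tell $fh;
    }
    return \%index;
}

# Hypothetical helper: seek to the stored offset and read back the raw,
# unparsed record; returns nothing if the ID is not in the index.
sub fetch_record {
    my ($fh, $index, $id) = @_;
    my $entry = $index->{$id} or return;
    seek($fh, $entry->{offset}, 0) or die "seek failed: $!";
    read($fh, my $record, $entry->{size});
    return $record;
}

# Assumed toy data; a real index would point at files on disk.
my $fasta = ">seq1 first\nACGT\nACGT\n>seq2 second\nGGCC\n";
open my $fh, '<', \$fasta or die "cannot open in-memory file: $!";

my $index = build_index($fh, 0);
print fetch_record($fh, $index, 'seq2');
```

Since only offsets and sizes are stored, the hash could in principle be swapped for SQLite or another back end without touching the sequence file, and the raw record returned by the lookup can be handed to whatever parser already exists for that format.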
>>> On the other hand, you could try and store the parsed data >>> itself, which is where NOSQL looks more interesting. That >>> essentially requires the ability to serialise your annotated >>> sequence object model to disk - which would be tricky to do >>> cross project (much more ambitious than BioSQL is). It also >>> means the "index" becomes very large because it now holds >>> all the original data. >>> >>> Peter >> >> For a fully cross-Bio* compliant format, I don't think it's feasible >> to use serialized data unless they are serialized in something >> that is easily deserialized across HLLs (JSON, BSON, YAML, >> XML, etc). Either that, or such data is stored concurrently with >> the binary blob, along with meta data that indicates the source >> of the blob, parser, version, etc, etc (unless there are tools out >> there that reliably interconvert serialized complex data structures >> between HLLs). Anyway you go about it, it seems like it could >> be a major ball of hurt, unless implemented very carefully. > > You missed out RDF as a serialisation ;) > > But yes, going down the shared serialisation route is going > to be messy - as you are well aware: > >> Aside: I think this was one of the problems with >> Bio::DB::SeqFeature::Store, in that it at one point stored >> Perl-specific Storable blobs. >> >> chris > > Peter yes, it's a problem w/o an easy solution. Anyway, I think an implementation of such at this point would be a premature optimization. chris From biojiangke at gmail.com Tue Nov 8 17:29:54 2011 From: biojiangke at gmail.com (vitis) Date: Tue, 8 Nov 2011 14:29:54 -0800 (PST) Subject: [Bioperl-l] Some questions about the Bio::PopGen In-Reply-To: References: Message-ID: <32805996.post@talk.nabble.com> I think the pi calculated in the function isn't really the pi as defined. You need to divide the value by total number of sites (in your case, it's 5, which is not your individual number but sequence length). 
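As a quick check of that arithmetic, here is a minimal sketch in plain Perl (no BioPerl calls; the helper pairwise_pi is hypothetical and assumes equal-length, gap-free sequences). It computes the average number of pairwise differences for the five sequences from the original question, then divides by the sequence length:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::PopGen): average number of
# pairwise differences among equal-length, gap-free sequences.
sub pairwise_pi {
    my @seqs = @_;
    my ($diffs, $pairs) = (0, 0);
    for my $i (0 .. $#seqs - 1) {
        for my $j ($i + 1 .. $#seqs) {
            # Count the positions at which the two sequences differ.
            for my $k (0 .. length($seqs[$i]) - 1) {
                $diffs++ if substr($seqs[$i], $k, 1) ne substr($seqs[$j], $k, 1);
            }
            $pairs++;
        }
    }
    return $pairs ? $diffs / $pairs : 0;
}

# The five sequences A01..A05 from the question.
my @seqs     = qw(AAGGT ATGGC ATGCT ATGCT ATGGT);
my $pi       = pairwise_pi(@seqs);
my $per_site = $pi / length($seqs[0]);

printf "pi (total)    = %.2f\n", $pi;        # prints 1.40
printf "pi (per site) = %.2f\n", $per_site;  # prints 0.28
```

The first number matches the value reported from Bio::PopGen::Statistics in the question (1.4) and the second matches the DnaSP value (0.28).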
I think the reason it was implemented this way is that it is sometimes easier
to work only with variable sites.

The aln_to_population function converts an alignment object into a population
object. You can't really see the object unless you write additional code to
print it out or run some calculations on it.

The third question depends on your specific needs. For population-level
analyses of molecular evolution, I usually create a multiple sequence
alignment with another application (clustalw etc.), then manually adjust the
alignment to make sure it represents homology. I wouldn't touch the alignment
once this is done, but only save it as an aln file (or whatever format you
want) as input for analysis applications like Bio::PopGen (usually via the
aln_to_population function you're using now).


Qian Zhao wrote:
>
> Hi
> Recently, I am learning how to calculate pi, Fst, Tajima D using
> Bio::PopGen.
> I am not familiar with Perl and I am really confused with the following
> problems.
> (1) I use the Bio::PopGen::Statistics to calculate pi. The sequences I used
> to calculate it are these:
> __DATA__
> 01 A01 A
> 01 A02 A
> 01 A03 A
> 01 A04 A
> 01 A05 A
> 02 A01 A
> 02 A02 T
> 02 A03 T
> 02 A04 T
> 02 A05 T
> 03 A01 G
> 03 A02 G
> 03 A03 G
> 03 A04 G
> 03 A05 G
> 04 A01 G
> 04 A02 G
> 04 A03 C
> 04 A04 C
> 04 A05 G
> 05 A01 T
> 05 A02 C
> 05 A03 T
> 05 A04 T
> 05 A05 T
> And I am not sure if I can use these sequences below to demonstrate the
> prettybase format above:
> >A01
> AAGGT
> >A02
> ATGGC
> >A03
> ATGCT
> >A04
> ATGCT
> >A05
> ATGGT
> The pi is 1.4 using Bio::PopGen::Statistics. However, the pi is 0.28 if I
> use DnaSP. I find that 1.4/5=0.28, which means that if the number
> from Bio::PopGen::Statistics is divided by the individual number, the
> result would be exactly the same. Is there something wrong in my perl script?
> The code I used was below:
> #!/usr/bin/perl -w
> use warnings;
> use strict;
> use Bio::PopGen::Genotype;
> my $genotype = Bio::PopGen::Genotype->new(-marker_name   => 'gene_1',
>                                           -individual_id => '001',
>                                           -alleles       => ['1','5'] );
> use Bio::PopGen::Individual;
> my $ind = Bio::PopGen::Individual->new(-unique_id => '001',
>                                        -genotypes => [$genotype] );
> $ind->add_Genotype(
>     Bio::PopGen::Genotype->new(-alleles     => ['1', '5'],
>                                -marker_name => 'gene_1')
> );
> $ind->add_Genotype(
>     Bio::PopGen::Genotype->new(-alleles     => ['1', '5'],
>                                -marker_name => 'gene_1')
> );
> $ind->add_Genotype(
>     Bio::PopGen::Genotype->new(-alleles     => ['1', '5'],
>                                -marker_name => 'gene_1')
> );
> $ind->add_Genotype(
>     Bio::PopGen::Genotype->new(-alleles     => ['1', '5'],
>                                -marker_name => 'gene_1')
> );
> use Bio::PopGen::Population;
> my $pop = Bio::PopGen::Population->new(-name        => 'Bm',
>                                        -description => 'description',
>                                        -individuals => [$ind] );
> use Bio::PopGen::IO;
> use Bio::PopGen::Statistics;
> my $nummarkers = $pop->get_marker_names;
> my $stats = Bio::PopGen::Statistics->new();
> my $io = Bio::PopGen::IO->new (-format => 'prettybase',
>                                -file   => '1.txt');
> if( my $pop = $io->next_population ) {
>     my $pi = $stats->pi($pop, $nummarkers);
>     print "pi is $pi\n";
>     my @inds;
>     for my $ind ( $pop->get_Individuals ) {
>         if( $ind->unique_id =~ /A0[1-3]/ ) {
>             push @inds, $ind;
>         }
>     }
>     print "pi for inds 1,2,3 is ", $stats->pi(\@inds),"\n";
> }
>
> (2) I want to use Bio::PopGen::Utilities to translate the alignment file to
> the population file. However, I can not find the result file after the
> program.
> I use the following code:
> use Bio::PopGen::Utilities;
> use Bio::AlignIO;
>
> my $in = Bio::AlignIO->new(-file   => 't/data/t7.aln',
>                            -format => 'clustalw');
> my $aln = $in->next_aln;
> my $pop = Bio::PopGen::Utilities->aln_to_population(-alignment => $aln);
> my $synpop = Bio::PopGen::Utilities->aln_to_population(-site_model => 'cod',
>                                                        -alignment  => $aln);
> I am not sure where I should add my result file's name in the code.
> (3) If my file contains a lot of individual sequences and one individual
> has one genotype, I'd like to know how I can use Bio::PopGen::Individual,
> Bio::PopGen::Population and Bio::PopGen::Genotype to create the file which
> can be used in Bio::PopGen::Statistics?
>
> I would greatly appreciate it if I can get the answers. Thanks for your time
> and Best Wishes!
> Qian

--
View this message in context: http://old.nabble.com/Some-questions-about-the-Bio%3A%3APopGen-tp31378987p32805996.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From biojiangke at gmail.com Tue Nov 8 17:51:22 2011
From: biojiangke at gmail.com (vitis)
Date: Tue, 8 Nov 2011 14:51:22 -0800 (PST)
Subject: [Bioperl-l] questions about the bioperl module Bio::PopGen::Statistics
In-Reply-To: <201106012030039537050@gmail.com>
References: <201106012030039537050@gmail.com>
Message-ID: <32805997.post@talk.nabble.com>

If you read the Bio::PopGen documentation, you'll see that the function that
calculates pi takes an optional argument for the number of sites. Also, when
you use the aln_to_population function to read in an alignment, you can use
the option to include all sites, including the monomorphic ones. I think if
you use both in your script, you'll get the same pi value as from other
applications like DnaSP.
As for sliding-window analyses, you may have to implement your own method to
move along the windows, but I think DnaSP can already do that, so you don't
have to write your own script.


lvu.jun wrote:
>
> Hi, there,
> I am trying to calculate the population genetics parameters such as pi
> using the bioperl module Bio::PopGen::Statistics. But I found that the
> method only requires the input of the marker genotype of every individual
> for the population. I don't know why the module does not take the DNA
> sequence length into consideration when calculating the pi value.
> According to the definition of the pi value, besides the polymorphic
> sites, we also need the monomorphic sites that should be incorporated in
> the denominator when doing the calculation. Is it right? Therefore I'm
> confused about the module; who can tell me why it can correctly calculate
> the pi value only with the marker (polymorphic) genotype?
> Another question, if I want to calculate the pi value using the sliding
> window along the genome, how can I do this using the
> Bio::PopGen::Statistics module?
> Thanks for your help!
> Yours sincerely,
> Jun
>
> Chinese Academy of Sciences
>
> 2011-06-01
>
> lvu.jun

--
View this message in context: http://old.nabble.com/questions-about-the-bioperl-module-Bio%3A%3APopGen%3A%3AStatistics-tp31749977p32805997.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From shachigahoimbi at gmail.com Wed Nov 9 00:22:33 2011
From: shachigahoimbi at gmail.com (Shachi Gahoi)
Date: Wed, 9 Nov 2011 10:52:33 +0530
Subject: [Bioperl-l] Run FGENESH using bioperl
Message-ID:

Dear All.

I have a multi-fasta sequence file and I would like to run FGENESH on each
sequence stored in it, one by one. Is it possible using Bioperl?
Please guide me. Thanks in advance. -- Regards, Shachi From pankajt322 at gmail.com Thu Nov 3 08:12:44 2011 From: pankajt322 at gmail.com (pankaj) Date: Thu, 3 Nov 2011 05:12:44 -0700 (PDT) Subject: [Bioperl-l] extract ORF ID from fasta file using bioperl In-Reply-To: References: Message-ID: On Oct 21, 1:59?am, Shachi Gahoi wrote: > Dear all, > > I have fasta format sequence file and I want to extract ORF ID "PITG_14194" > from fasta file and then I want to rename same file with that ORF ID > "PITG_14194". > > I have many files and I want to do same exercise with all sequence files. > > Please tell me how can i do this with perl or bioperl. > > >tr|D0NNU7|D0NNU7_PHYIT Carbohydrate esterase, putative OS=Phytophthora > > infestans (strain T30-4) GN=PITG_14194 PE=4 SV=1 > MVKLSIVSSTMQSLLAPLLRVWTDPERRRKFLRWLFGGTSGAIALLLILEATRGFCRTPL > ETAQLLAGISWTLCKITVQFVARGFKPKFAKWTLRYELLHGLMRTAATMFGERIVDLQHA > RVIRHHTGMFGTVLGSFARWQNEMRLESVRLNGLEHIWLKSSTCTTETKSERKRLVVLFF > HGGGYAVLSPRMYISFCSAVAGAIRQQLASDDVDVDVFLANYRKLPEHKFPVPAEDAVAM > YEYLLQHEKLEPSQIILAGDSAGGGLVMSTLLRVRDGLSSWKSKLPLPLAAIVMCPLADL > TWDEDEIAGQHCVLPLNMTAASVLTYHPTRDDPSTWADASPVHCNLQGLPPVFLQSATLD > RLFQHSVRLAAKAKADGLVNWEVDIHEGVPHVFMVIPAYVLPYARVGVGRMAAFAAKQFR > NGIAVDHKGVICNGKAPIEIAVDENTLSAAA > > -- > Regards, > Shachi > _______________________________________________ > Bioperl-l mailing list > Bioper... at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l From azaballos at isciii.es Wed Nov 9 06:28:21 2011 From: azaballos at isciii.es (Angel Zaballos) Date: Wed, 9 Nov 2011 12:28:21 +0100 Subject: [Bioperl-l] bp_genbank2gff.pl bug Message-ID: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es> Running bp_genbank2gff.pl got this: [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession AAXT01000001.1 > babesichr3.gff Replacement list is longer than search list at /usr/share/perl5/Bio/Range.pm line 251. 
Ángel Zaballos
Unidad de Genómica
Centro Nacional de Microbiología-ISCIII
Carretera Majadahonda-Pozuelo, Km 2,2
28220-Majadahonda

Tel: 918223994
mail: azaballos at isciii.es

************************* LEGAL NOTICE *************************
This electronic message is intended exclusively for its addressees and may
contain attached documents of a private and confidential nature. If you have
received this message in error and are not among the addressees, please do
not use, disclose, distribute, print or copy its contents by any means.
Please notify the sender and completely delete the message and its
attachments. The Instituto de Salud Carlos III assumes no legal liability of
any kind for the content of this message when it does not correspond to the
functions assigned to its sender by the regulations in force.

From scott at scottcain.net Wed Nov 9 11:12:02 2011
From: scott at scottcain.net (Scott Cain)
Date: Wed, 9 Nov 2011 11:12:02 -0500
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
Message-ID:

Hi Angel,

I would suggest using bp_genbank2gff3.pl, as it is more actively maintained;
the bp_genbank2gff.pl script hasn't really been touched in many years, and I
imagine it's suffering from some serious code rot.

Scott

2011/11/9 Angel Zaballos

> Running bp_genbank2gff.pl got this:
>
> [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession
> AAXT01000001.1 > babesichr3.gff
> Replacement list is longer than search list at
> /usr/share/perl5/Bio/Range.pm line 251.

--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

From carandraug+dev at gmail.com Wed Nov 9 11:13:10 2011
From: carandraug+dev at gmail.com (Carnë Draug)
Date: Wed, 9 Nov 2011 16:13:10 +0000
Subject: [Bioperl-l] extract ORF ID from fasta file using bioperl
In-Reply-To:
References:
Message-ID:

On 3 November 2011 12:12, pankaj wrote:
>
>
> On Oct 21, 1:59 am, Shachi Gahoi wrote:
>> Dear all,
>>
>> I have fasta format sequence file and I want to extract ORF ID "PITG_14194"
>> from fasta file and then I want to rename same file with that ORF ID
>> "PITG_14194".
>>
>> I have many files and I want to do same exercise with all sequence files.
>>
>> Please tell me how can i do this with perl or bioperl.
>> >> >tr|D0NNU7|D0NNU7_PHYIT Carbohydrate esterase, putative OS=Phytophthora >> >> infestans (strain T30-4) GN=PITG_14194 PE=4 SV=1 >> MVKLSIVSSTMQSLLAPLLRVWTDPERRRKFLRWLFGGTSGAIALLLILEATRGFCRTPL >> ETAQLLAGISWTLCKITVQFVARGFKPKFAKWTLRYELLHGLMRTAATMFGERIVDLQHA >> RVIRHHTGMFGTVLGSFARWQNEMRLESVRLNGLEHIWLKSSTCTTETKSERKRLVVLFF >> HGGGYAVLSPRMYISFCSAVAGAIRQQLASDDVDVDVFLANYRKLPEHKFPVPAEDAVAM >> YEYLLQHEKLEPSQIILAGDSAGGGLVMSTLLRVRDGLSSWKSKLPLPLAAIVMCPLADL >> TWDEDEIAGQHCVLPLNMTAASVLTYHPTRDDPSTWADASPVHCNLQGLPPVFLQSATLD >> RLFQHSVRLAAKAKADGLVNWEVDIHEGVPHVFMVIPAYVLPYARVGVGRMAAFAAKQFR >> NGIAVDHKGVICNGKAPIEIAVDENTLSAAA >> ---------- Forwarded message ---------- From: Jason Stajich Date: 21 October 2011 10:56 Subject: Re: [Bioperl-l] extract ORF ID from fasta file using bioperl To: Shachi Gahoi Cc: bioperl-l at bioperl.org easy to do this with a simple regular expression and opening a new file. Have you read up on this concept in Perl. You can use SeqIO to parse FASTA files - did you read the HOWTO and website documentation first? We don't typically do people's work for them on this mailing list so please show some effort first. From scott at scottcain.net Wed Nov 9 13:43:00 2011 From: scott at scottcain.net (Scott Cain) Date: Wed, 9 Nov 2011 13:43:00 -0500 Subject: [Bioperl-l] bp_genbank2gff.pl bug In-Reply-To: <0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu> References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es> <0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu> Message-ID: Hi Chris, Actually, removing it from the distribution (but letting it remain in the code repository) is not a bad idea. I can't really think of a down side. Scott 2011/11/9 Fields, Christopher J > Scott, > > Do we want to add that caveat to the bp_genbank2gff.pl documentation (or > remove it altogether)? 
> > chris > > On Nov 9, 2011, at 10:12 AM, Scott Cain wrote: > > > Hi Angel, > > > > I would suggest using bp_genbank2gff3.pl, as it is more actively > > maintained; the bp_genbank2gff.pl script hasn't really been touched in > many > > years, and I imagine it's suffering from some serious code rot. > > > > Scott > > > > > > 2011/11/9 Angel Zaballos > > > >> Running bp_genbank2gff.pl got this: > >> > >> [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession > >> AAXT01000001.1 > babesichr3.gff > >> Replacement list is longer than search list at > >> /usr/share/perl5/Bio/Range.pm line 251. > >> > >> > >> > >> ?ngel Zaballos > >> Unidad de Gen?mica > >> Centro Nacional de Microbiolog?a-ISCIII > >> Carretera Majadahonda-Pozuelo, Km 2,2 > >> 28220-Majadahonda > >> > >> Tel: 918223994 > >> mail: azaballos at isciii.es > >> > >> > >> > >> > >> ************************* AVISO LEGAL ************************* > >> Este mensaje electr?nico est? dirigido exclusivamente a sus > destinatarios, > >> pudiendo contener documentos anexos de car?cter privado y confidencial. > >> Si por error, ha recibido este mensaje y no se encuentra entre los > >> destinatarios, por favor, no use, informe, distribuya, imprima o copie > su > >> contenido por ning?n medio. Le rogamos lo comunique al remitente y borre > >> completamente el mensaje y sus anexos. El Instituto de Salud Carlos III > no > >> asume ning?n tipo de responsabilidad legal por el contenido de este > mensaje > >> cuando no responda a las funciones atribuidas al remitente del mismo > por la > >> normativa vigente. > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > > > > > > > -- > > ------------------------------------------------------------------------ > > Scott Cain, Ph. D. 
scott at scottcain > dot > > net > > GMOD Coordinator (http://gmod.org/) 216-392-3087 > > Ontario Institute for Cancer Research > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Wed Nov 9 13:39:52 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 9 Nov 2011 18:39:52 +0000 Subject: [Bioperl-l] bp_genbank2gff.pl bug In-Reply-To: References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es> Message-ID: <0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu> Scott, Do we want to add that caveat to the bp_genbank2gff.pl documentation (or remove it altogether)? chris On Nov 9, 2011, at 10:12 AM, Scott Cain wrote: > Hi Angel, > > I would suggest using bp_genbank2gff3.pl, as it is more actively > maintained; the bp_genbank2gff.pl script hasn't really been touched in many > years, and I imagine it's suffering from some serious code rot. > > Scott > > > 2011/11/9 Angel Zaballos > >> Running bp_genbank2gff.pl got this: >> >> [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession >> AAXT01000001.1 > babesichr3.gff >> Replacement list is longer than search list at >> /usr/share/perl5/Bio/Range.pm line 251. >> >> >> >> ?ngel Zaballos >> Unidad de Gen?mica >> Centro Nacional de Microbiolog?a-ISCIII >> Carretera Majadahonda-Pozuelo, Km 2,2 >> 28220-Majadahonda >> >> Tel: 918223994 >> mail: azaballos at isciii.es >> >> >> >> >> ************************* AVISO LEGAL ************************* >> Este mensaje electr?nico est? dirigido exclusivamente a sus destinatarios, >> pudiendo contener documentos anexos de car?cter privado y confidencial. 
>> Si por error, ha recibido este mensaje y no se encuentra entre los >> destinatarios, por favor, no use, informe, distribuya, imprima o copie su >> contenido por ningún medio. Le rogamos lo comunique al remitente y borre >> completamente el mensaje y sus anexos. El Instituto de Salud Carlos III no >> asume ningún tipo de responsabilidad legal por el contenido de este mensaje >> cuando no responda a las funciones atribuidas al remitente del mismo por la >> normativa vigente. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot > net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Nov 9 14:51:48 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 9 Nov 2011 19:51:48 +0000 Subject: [Bioperl-l] bp_genbank2gff.pl bug In-Reply-To: References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es> <0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu> Message-ID: Scott, It would remain in the repo history if it is removed; otherwise we can probably set up an 'unmaintained' folder. Either would prevent it from being packaged and installed in future versions. (Speaking of which, we should discuss (w/ Lincoln) possibly splitting out Bio::DB::SeqFeature/GFF and related code/tests/etc into its own distribution, in line with slimming down core modules.) chris On Nov 9, 2011, at 12:43 PM, Scott Cain wrote: > Hi Chris, > > Actually, removing it from the distribution (but letting it remain in the code repository) is not a bad idea. I can't really think of a down side.
> > Scott > > > 2011/11/9 Fields, Christopher J > Scott, > > Do we want to add that caveat to the bp_genbank2gff.pl documentation (or remove it altogether)? > > chris > > On Nov 9, 2011, at 10:12 AM, Scott Cain wrote: > > > Hi Angel, > > > > I would suggest using bp_genbank2gff3.pl, as it is more actively > > maintained; the bp_genbank2gff.pl script hasn't really been touched in many > > years, and I imagine it's suffering from some serious code rot. > > > > Scott > > > > > > 2011/11/9 Angel Zaballos > > > >> Running bp_genbank2gff.pl got this: > >> > >> [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession > >> AAXT01000001.1 > babesichr3.gff > >> Replacement list is longer than search list at > >> /usr/share/perl5/Bio/Range.pm line 251. > >> > >> > >> > >> ?ngel Zaballos > >> Unidad de Gen?mica > >> Centro Nacional de Microbiolog?a-ISCIII > >> Carretera Majadahonda-Pozuelo, Km 2,2 > >> 28220-Majadahonda > >> > >> Tel: 918223994 > >> mail: azaballos at isciii.es > >> > >> > >> > >> > >> ************************* AVISO LEGAL ************************* > >> Este mensaje electr?nico est? dirigido exclusivamente a sus destinatarios, > >> pudiendo contener documentos anexos de car?cter privado y confidencial. > >> Si por error, ha recibido este mensaje y no se encuentra entre los > >> destinatarios, por favor, no use, informe, distribuya, imprima o copie su > >> contenido por ning?n medio. Le rogamos lo comunique al remitente y borre > >> completamente el mensaje y sus anexos. El Instituto de Salud Carlos III no > >> asume ning?n tipo de responsabilidad legal por el contenido de este mensaje > >> cuando no responda a las funciones atribuidas al remitente del mismo por la > >> normativa vigente. 
> >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > > > > > > > -- > > ------------------------------------------------------------------------ > > Scott Cain, Ph. D. scott at scottcain dot > > net > > GMOD Coordinator (http://gmod.org/) 216-392-3087 > > Ontario Institute for Cancer Research > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research From carandraug+dev at gmail.com Wed Nov 9 15:39:17 2011 From: carandraug+dev at gmail.com (Carnë Draug) Date: Wed, 9 Nov 2011 20:39:17 +0000 Subject: [Bioperl-l] bp_genbank2gff.pl bug In-Reply-To: References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es> <0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu> Message-ID: On 9 November 2011 18:43, Scott Cain wrote: > Hi Chris, > > Actually, removing it from the distribution (but letting it remain in the > code repository) is not a bad idea. I can't really think of a down side. > > Scott Hi, can I suggest instead to simply make the script issue a warning right at the start? Something like "bp_genbank2gff is obsolete and will be removed from a future version of bioperl; please use bp_genbank2gff3 instead". You could leave it there for the next 2 releases and then finally remove it. This would have 2 advantages: 1) people that have been using it will immediately know what to use as a replacement (instead of coming to ask on the mailing list)
2) people who use it but don't know anything about the subject, because someone told them to "just press this button" or "just type this in the terminal", won't suddenly have a broken system and will have time to find someone that will make it work again. That's what's done in GNU Octave and I think it works well there. Carnë From scott at scottcain.net Wed Nov 9 15:48:07 2011 From: scott at scottcain.net (Scott Cain) Date: Wed, 9 Nov 2011 15:48:07 -0500 Subject: [Bioperl-l] bp_genbank2gff.pl bug In-Reply-To: References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es> <0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu> Message-ID: Hi Carnë, You are absolutely correct; that is the right way to do it. I'll add that right now (and if the original post's fix is an easy one, I'll fix that too :-) Scott 2011/11/9 Carnë Draug > On 9 November 2011 18:43, Scott Cain wrote: > > Hi Chris, > > > > Actually, removing it from the distribution (but letting it remain in the > > code repository) is not a bad idea. I can't really think of a down side. > > > > Scott > > Hi > > can I suggest instead to simply make the script issue a warning right > at the start? Something like "bp_genbank2gff is obsolete and will be > removed from a future version of bioperl; please use bp_genbank2gff3 > instead". You could leave it there for the next 2 releases and then > finally remove it. This would have 2 advantages: > > 1) people that have been using it will immediately know what to use as > replacement (instead of coming and ask in the mailing list) > 2) people who use it but don't know anything about the subject, > someone told them to "just press this button" or "just type this in > the terminal", won't have suddenly a broken system and will have time > to find someone that will make it work again. > > That's what's done in GNU octave and I think it works good there. > Carnë > -- ------------------------------------------------------------------------ Scott Cain, Ph. D.
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Wed Nov 9 16:59:48 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 9 Nov 2011 21:59:48 +0000 Subject: [Bioperl-l] bp_genbank2gff.pl bug In-Reply-To: References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es> <0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu> Message-ID: Works for me, it's a standard deprecation policy. The only caveat is that the next 'release' of the code would be when the related code is split out into its own distribution (which will require its own versioning). chris On Nov 9, 2011, at 2:48 PM, Scott Cain wrote: > Hi Carnë, > > You are absolutely correct; that is the right way to do it. I'll add that right now (and if the original posts fix is an easy one, I'll fix that too :-) > > Scott > > > 2011/11/9 Carnë Draug > On 9 November 2011 18:43, Scott Cain wrote: > > Hi Chris, > > > > Actually, removing it from the distribution (but letting it remain in the > > code repository) is not a bad idea. I can't really think of a down side. > > > > Scott > > Hi > > can I suggest instead to simply make the script issue a warning right > at the start? Something like "bp_genbank2gff is obsolete and will be > removed from a future version of bioperl; please use bp_genbank2gff3 > instead". You could leave it there for the next 2 releases and then > finally remove it. This would have 2 advantages: > > 1) people that have been using it will immediately know what to use as > replacement (instead of coming and ask in the mailing list) > 2) people who use it but don't know anything about the subject, > someone told them to "just press this button" or "just type this in > the terminal", won't have suddenly a broken system and will have time > to find someone that will make it work again. > > That's what's done in GNU octave and I think it works good there. > Carnë
> > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research From biopython at maubp.freeserve.co.uk Thu Nov 10 08:09:40 2011 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 10 Nov 2011 13:09:40 +0000 Subject: [Bioperl-l] [Bioperl-announce-l] Null Pointer - NCBI C++ Exception In-Reply-To: <31659982.post@talk.nabble.com> References: <31659982.post@talk.nabble.com> Message-ID: Re: http://lists.open-bio.org/pipermail/bioperl-announce-l/2011-May/000253.html On Thu, May 19, 2011 at 11:15 PM, rgoldade wrote: > > I received the following error while trying to run bl2seq from > standaloneblastplus. Has anyone else encountered this problem? > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: /usr/bin/blastp call crashed: There was a problem running > /usr/bin/blastp : Error: NCBI C++ Exception: > > "/am/ncbiapdata/release/blast/src/2.2.24/Linux32-Suse-icc/c++/ICC1010-ReleaseMT--Linux32-Suse-icc/../src/corelib/ncbiobj.cpp", > line 688: Critical: ncbi::CObject::ThrowNullPointerException() - Attempt to > access NULL pointer. > > Thank you, > Ryan Just hit something very very similar, looks like a BLAST+ bug which I will report now: $ tblastn -out NC_003197_vs_NC_011294.tsv -outfmt 6 -query NC_003197.fna -evalue 0.0001 -subject NC_011294.fna Error: NCBI C++ Exception: "/am/ncbiapdata/release/blast/src/2.2.25/IntelMAC-universal/c++/GCC401-ReleaseMT--IntelMAC-universal/../src/corelib/ncbiobj.cpp", line 689: Critical: ncbi::CObject::ThrowNullPointerException() - Attempt to access NULL pointer. This was on a Mac using BLAST 2.2.25+ (tblastn) whereas yours was BLAST 2.2.24+ (blastp) from the look of the error. The line number has changed by one, but I'm confident it is the same point of failure. 
In my case I was comparing nucleotide against nucleotide, so should have been using tblastx not tblastn, but it still shouldn't have had a pointer exception. Peter From cjfields at illinois.edu Thu Nov 10 09:00:46 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 10 Nov 2011 14:00:46 +0000 Subject: [Bioperl-l] [Bioperl-announce-l] Null Pointer - NCBI C++ Exception In-Reply-To: References: <31659982.post@talk.nabble.com> Message-ID: On Nov 10, 2011, at 7:09 AM, Peter wrote: > Re: http://lists.open-bio.org/pipermail/bioperl-announce-l/2011-May/000253.html > > On Thu, May 19, 2011 at 11:15 PM, rgoldade wrote: >> >> I received the following error while trying to run bl2seq from >> standaloneblastplus. Has anyone else encountered this problem? >> >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: /usr/bin/blastp call crashed: There was a problem running >> /usr/bin/blastp : Error: NCBI C++ Exception: >> >> "/am/ncbiapdata/release/blast/src/2.2.24/Linux32-Suse-icc/c++/ICC1010-ReleaseMT--Linux32-Suse-icc/../src/corelib/ncbiobj.cpp", >> line 688: Critical: ncbi::CObject::ThrowNullPointerException() - Attempt to >> access NULL pointer. >> >> Thank you, >> Ryan > > Just hit something very very similar, looks like a BLAST+ bug which I > will report now: > > $ tblastn -out NC_003197_vs_NC_011294.tsv -outfmt 6 -query > NC_003197.fna -evalue 0.0001 -subject NC_011294.fna > Error: NCBI C++ Exception: > "/am/ncbiapdata/release/blast/src/2.2.25/IntelMAC-universal/c++/GCC401-ReleaseMT--IntelMAC-universal/../src/corelib/ncbiobj.cpp", > line 689: Critical: ncbi::CObject::ThrowNullPointerException() - > Attempt to access NULL pointer. > > This was on a Mac using BLAST 2.2.25+ (tblastn) whereas yours was > BLAST 2.2.24+ (blastp) from the look of the error. The line number has > changed by one, but I'm confident it is the same point of failure. 
> > In my case I was comparing nucleotide against nucleotide, so should > have been using tblastx not tblastn, but it still shouldn't have had a > pointer exception. > > Peter Yeah, that's bad. I have seen a few things like this myself that make me worry about the transition to BLAST+. chris PS - Odd I didn't see this one, was it caught in the bioperl-announce filter? From casaburi at ceinge.unina.it Thu Nov 10 07:29:55 2011 From: casaburi at ceinge.unina.it (Giorgio C) Date: Thu, 10 Nov 2011 04:29:55 -0800 (PST) Subject: [Bioperl-l] Scripting help to identify adaptors count in reads Message-ID: <32818254.post@talk.nabble.com> Hi everybody, i have some reads (454) where there are adaptors (NNNN...), one,two or three adaptors for each reads depending on the reads. Is there any way to establish how many reads have 1 adaptors, how many 2 and how many 3 adaptors over the total ??? >271-88 GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCAGGTGCCTACG >272-88 GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCANNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTTGCCAGCCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGC >273-88 GCCTCCCTCGCGCATCAGATCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCA >274-88 GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTCCCTCGCGCCATCAGATCGTNNNNNNNNNNNNNNNNNNTCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCATCAGATCGTAGGCACCATCAA The problem is that some adpators occur in the middle of the sequences because they coming out from a concameration experimental design (they are miRNAs between NNNNNN...). So i want to know a script or tool that may say how many reads have 1 adapt, how many 2, (max are 4) in respect to the total number of reads. Do you know any tool/script that may help ? Tnx Can anyone suggests me a script to fix this ??? Thank you very much -- View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32818254.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
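For the tally asked about in the message above, here is a minimal sketch, not a tested pipeline. It assumes that every maximal run of N characters in a read marks exactly one adaptor, and that the reads are in a plain FASTA file. The sub names count_n_runs and tally_reads are made up for this illustration and are not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count maximal runs of N in one read.
# Assumption: one run of N's corresponds to one adaptor.
sub count_n_runs {
    my ($seq) = @_;
    # List assignment in scalar context yields the number of matches.
    my $count = () = $seq =~ /N+/gi;
    return $count;
}

# Read FASTA records from a filehandle and tally reads by run count.
# Returns a hashref (run count => number of reads) and the total read count.
sub tally_reads {
    my ($fh) = @_;
    my %tally;
    my $total = 0;
    my $seq;    # undefined until the first header line is seen
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^>/ ) {
            if ( defined $seq ) {
                $tally{ count_n_runs($seq) }++;
                $total++;
            }
            $seq = '';
        }
        elsif ( defined $seq ) {
            $seq .= $line;    # sequences may span several lines
        }
    }
    if ( defined $seq ) {     # flush the last record
        $tally{ count_n_runs($seq) }++;
        $total++;
    }
    return ( \%tally, $total );
}

# Only runs when a FASTA file name is given on the command line.
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my ( $tally, $total ) = tally_reads($in);
    close $in;
    for my $n ( sort { $a <=> $b } keys %$tally ) {
        printf "%d adaptor(s)\t%d reads\t%.1f%% of %d\n",
            $n, $tally->{$n}, 100 * $tally->{$n} / $total, $total;
    }
}
```

Run it as "perl count_adaptors.pl reads.fasta" (both file names are placeholders). If the real adaptor is a known sequence rather than a run of N's, the pattern in count_n_runs would need to change accordingly.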
From jovel_juan at hotmail.com Thu Nov 10 11:06:16 2011
From: jovel_juan at hotmail.com (Juan Jovel)
Date: Thu, 10 Nov 2011 16:06:16 +0000
Subject: [Bioperl-l] Scripting help to identify adaptors count in reads
In-Reply-To: <32818254.post@talk.nabble.com>
References: <32818254.post@talk.nabble.com>
Message-ID:

There are many ways to do it.
Perhaps the simplest is to count the number of times the adapter sequence (or part of it) appears in each read.
For example:

    $adapter_matches = tr/adapter_sequence/adapter_sequence/;
    # $adapter_matches will store the number of times the adapter sequence is repeated.

You then place that result in a hash bin:

    my %adapter_frequency;
    my $class = "$adapter_matches";
    if (exists $adapter_frequency{$class}) {
        $adapter_frequency{$class}++;
    } else {
        $adapter_frequency{$class} = 1;
    }

    # Then you can sort and output your classes
    foreach $class (sort keys %adapter_frequency) {
        print "$class\t$adapter_frequency{$class}\n";
    }

You can work out the details, but something like this should work.

> Date: Thu, 10 Nov 2011 04:29:55 -0800
> From: casaburi at ceinge.unina.it
> To: Bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Scripting help to identify adaptors count in reads
>
>
> Hi everybody,
>
> i have some reads (454) where there are adaptors (NNNN...), one,two or three
> adaptors for each reads depending on the reads. Is there any way to
> establish how many reads have 1 adaptors, how many 2 and how many 3 adaptors
> over the total ???
> > >271-88 > GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCAGGTGCCTACG > >272-88 > GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCANNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTTGCCAGCCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGC > >273-88 > GCCTCCCTCGCGCATCAGATCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCA > >274-88 > GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTCCCTCGCGCCATCAGATCGTNNNNNNNNNNNNNNNNNNTCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCATCAGATCGTAGGCACCATCAA > > The problem is that some adpators occur in the middle of the sequences > because they coming out from a concameration experimental design (they are > miRNAs between NNNNNN...). So i want to know a script or tool that may say > how many reads have 1 adapt, how many 2, (max are 4) in respect to the total > number of reads. Do you know any tool/script that may help ? Tnx > Can anyone suggests me a script to fix this ??? > > Thank you very much > -- > View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32818254.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From scott at scottcain.net Thu Nov 10 11:55:53 2011 From: scott at scottcain.net (Scott Cain) Date: Thu, 10 Nov 2011 11:55:53 -0500 Subject: [Bioperl-l] bp_genbank2gff.pl bug In-Reply-To: <0CACB816-21E2-494E-ACEF-67F9FBB5AC41@isciii.es> References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es> <0CACB816-21E2-494E-ACEF-67F9FBB5AC41@isciii.es> Message-ID: Hi Angel, Please keep correspondence on the mailing list. I just ran bp_genbank2gff.pl with a genbank file (fruit fly mitocontria), and it worked fine. I suspect there is something wrong with your genbank file. Scott On Thu, Nov 10, 2011 at 3:15 AM, Angel Zaballos wrote: > His Scott, > > Thanks everyone for your help. 
I tried bp_genbank2gff3.pl, but the same > happened: > > [root at localhost zaballos]# bp_genbank2gff3.pl babesiaChr3.gbk > > babesichr3_2.gff > Replacement list is longer than search list at > /usr/share/perl5/Bio/Range.pm line 251. > UNIVERSAL->import is deprecated and will be removed in a future perl at > /usr/share/perl5/Bio/Tree/TreeFunctionsI.pm line 94 > > However, the output file seems to be correct (Indeed, that was also the > case for bp_genbank2gff.pl). I then ran ldHgGene and worked: > > [zaballos at localhost ~]$ ./ldHgGene -out=babesiaChr3_2.gpe db tab > babesiachr3_2.gff > Reading babesiachr3_2.gff > Read 4776 transcripts in 8821 lines in 1 files > 4776 groups 1 seqs 1 sources 6 feature types > 2379 gene predictions > > I'm using Fedora (for bioperl) and CentOS (for ldHgGene), virtualized on a > Mac with Parallels. Maybe tis is the cause for such a message. > > Regards > > > ?ngel > > > El 09/11/2011, a las 17:12, Scott Cain escribi?: > > Hi Angel, > > I would suggest using bp_genbank2gff3.pl, as it is more actively > maintained; the bp_genbank2gff.pl script hasn't really been touched in > many years, and I imagine it's suffering from some serious code rot. > > Scott > > > 2011/11/9 Angel Zaballos > >> Running bp_genbank2gff.pl got this: >> >> [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession >> AAXT01000001.1 > babesichr3.gff >> Replacement list is longer than search list at >> /usr/share/perl5/Bio/Range.pm line 251. >> >> >> >> ?ngel Zaballos >> Unidad de Gen?mica >> Centro Nacional de Microbiolog?a-ISCIII >> Carretera Majadahonda-Pozuelo, Km 2,2 >> 28220-Majadahonda >> >> Tel: 918223994 >> mail: azaballos at isciii.es >> >> >> >> >> ************************* AVISO LEGAL ************************* >> Este mensaje electr?nico est? dirigido exclusivamente a sus destinatarios, >> pudiendo contener documentos anexos de car?cter privado y confidencial. 
>> Si por error, ha recibido este mensaje y no se encuentra entre los >> destinatarios, por favor, no use, informe, distribuya, imprima o copie su >> contenido por ning?n medio. Le rogamos lo comunique al remitente y borre >> completamente el mensaje y sus anexos. El Instituto de Salud Carlos III no >> asume ning?n tipo de responsabilidad legal por el contenido de este >> mensaje >> cuando no responda a las funciones atribuidas al remitente del mismo por >> la >> normativa vigente. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain > dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > > ?ngel Zaballos > Unidad de Gen?mica > Centro Nacional de Microbiolog?a-ISCIII > Carretera Majadahonda-Pozuelo, Km 2,2 > 28220-Majadahonda > > Tel: 918223994 > mail: azaballos at isciii.es > > > > ************************* AVISO LEGAL ************************* > Este mensaje electr?nico est? dirigido exclusivamente a sus destinatarios, > pudiendo contener documentos anexos de car?cter privado y confidencial. > Si por error, ha recibido este mensaje y no se encuentra entre los > destinatarios, por favor, no use, informe, distribuya, imprima o copie su > contenido por ning?n medio. Le rogamos lo comunique al remitente y borre > completamente el mensaje y sus anexos. El Instituto de Salud Carlos III no > asume ning?n tipo de responsabilidad legal por el contenido de este mensaje > cuando no responda a las funciones atribuidas al remitente del mismo por la > normativa vigente. > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From l.m.timmermans at students.uu.nl Thu Nov 10 12:17:12 2011 From: l.m.timmermans at students.uu.nl (L.M. Timmermans) Date: Thu, 10 Nov 2011 18:17:12 +0100 Subject: [Bioperl-l] Scripting help to identify adaptors count in reads In-Reply-To: References: <32818254.post@talk.nabble.com> Message-ID: On Thu, Nov 10, 2011 at 5:06 PM, Juan Jovel wrote: > > There are many ways to do it. > Perhaps the simplest is to count the number of times the adapter sequence > (or part of it) appears in each read. > For example: > $adapter_matches = tr/adapter_sequence/adapter_sequence/;# > $adapter_matches will store the number of times the adapter sequence is > repeated. > No, it will not. tr/// will count characters, not sequences. Something like "scalar (() = $sequence =~ m/(N+)/g)" should work OTOH. Leon From cjfields at illinois.edu Thu Nov 10 14:17:57 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 10 Nov 2011 19:17:57 +0000 Subject: [Bioperl-l] bp_genbank2gff.pl bug In-Reply-To: References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es> <0CACB816-21E2-494E-ACEF-67F9FBB5AC41@isciii.es> Message-ID: <66F13EAF-0DAA-45E0-AB5B-E71EC8FA2323@illinois.edu> This is running with an older version of bioperl (probably 1.6.0 or 1.6.1). The warnings pop up when using perl v5.12 or v5.14; the first warning is from a bad tr/// in Bio::Range, the second is from bad usage of UNIVERSAL functions; both have been addressed. chris On Nov 10, 2011, at 10:55 AM, Scott Cain wrote: > Hi Angel, > > Please keep correspondence on the mailing list. > > I just ran bp_genbank2gff.pl with a genbank file (fruit fly mitocontria), > and it worked fine. I suspect there is something wrong with your genbank > file. > > Scott > > > On Thu, Nov 10, 2011 at 3:15 AM, Angel Zaballos wrote: > >> His Scott, >> >> Thanks everyone for your help.
I tried bp_genbank2gff3.pl, but the same >> happened: >> >> [root at localhost zaballos]# bp_genbank2gff3.pl babesiaChr3.gbk > >> babesichr3_2.gff >> Replacement list is longer than search list at >> /usr/share/perl5/Bio/Range.pm line 251. >> UNIVERSAL->import is deprecated and will be removed in a future perl at >> /usr/share/perl5/Bio/Tree/TreeFunctionsI.pm line 94 >> >> However, the output file seems to be correct (Indeed, that was also the >> case for bp_genbank2gff.pl). I then ran ldHgGene and worked: >> >> [zaballos at localhost ~]$ ./ldHgGene -out=babesiaChr3_2.gpe db tab >> babesiachr3_2.gff >> Reading babesiachr3_2.gff >> Read 4776 transcripts in 8821 lines in 1 files >> 4776 groups 1 seqs 1 sources 6 feature types >> 2379 gene predictions >> >> I'm using Fedora (for bioperl) and CentOS (for ldHgGene), virtualized on a >> Mac with Parallels. Maybe tis is the cause for such a message. >> >> Regards >> >> >> ?ngel >> >> >> El 09/11/2011, a las 17:12, Scott Cain escribi?: >> >> Hi Angel, >> >> I would suggest using bp_genbank2gff3.pl, as it is more actively >> maintained; the bp_genbank2gff.pl script hasn't really been touched in >> many years, and I imagine it's suffering from some serious code rot. >> >> Scott >> >> >> 2011/11/9 Angel Zaballos >> >>> Running bp_genbank2gff.pl got this: >>> >>> [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession >>> AAXT01000001.1 > babesichr3.gff >>> Replacement list is longer than search list at >>> /usr/share/perl5/Bio/Range.pm line 251. >>> >>> >>> >>> ?ngel Zaballos >>> Unidad de Gen?mica >>> Centro Nacional de Microbiolog?a-ISCIII >>> Carretera Majadahonda-Pozuelo, Km 2,2 >>> 28220-Majadahonda >>> >>> Tel: 918223994 >>> mail: azaballos at isciii.es >>> >>> >>> >>> >>> ************************* AVISO LEGAL ************************* >>> Este mensaje electr?nico est? dirigido exclusivamente a sus destinatarios, >>> pudiendo contener documentos anexos de car?cter privado y confidencial. 
>>> Si por error, ha recibido este mensaje y no se encuentra entre los >>> destinatarios, por favor, no use, informe, distribuya, imprima o copie su >>> contenido por ning?n medio. Le rogamos lo comunique al remitente y borre >>> completamente el mensaje y sus anexos. El Instituto de Salud Carlos III no >>> asume ning?n tipo de responsabilidad legal por el contenido de este >>> mensaje >>> cuando no responda a las funciones atribuidas al remitente del mismo por >>> la >>> normativa vigente. >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> >> >> -- >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D. scott at scottcain >> dot net >> GMOD Coordinator (http://gmod.org/) 216-392-3087 >> Ontario Institute for Cancer Research >> >> >> ?ngel Zaballos >> Unidad de Gen?mica >> Centro Nacional de Microbiolog?a-ISCIII >> Carretera Majadahonda-Pozuelo, Km 2,2 >> 28220-Majadahonda >> >> Tel: 918223994 >> mail: azaballos at isciii.es >> >> >> >> ************************* AVISO LEGAL ************************* >> Este mensaje electr?nico est? dirigido exclusivamente a sus destinatarios, >> pudiendo contener documentos anexos de car?cter privado y confidencial. >> Si por error, ha recibido este mensaje y no se encuentra entre los >> destinatarios, por favor, no use, informe, distribuya, imprima o copie su >> contenido por ning?n medio. Le rogamos lo comunique al remitente y borre >> completamente el mensaje y sus anexos. El Instituto de Salud Carlos III no >> asume ning?n tipo de responsabilidad legal por el contenido de este mensaje >> cuando no responda a las funciones atribuidas al remitente del mismo por la >> normativa vigente. >> >> > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. 
scott at scottcain dot > net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From biopython at maubp.freeserve.co.uk Thu Nov 10 14:27:22 2011 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 10 Nov 2011 19:27:22 +0000 Subject: [Bioperl-l] [Bioperl-announce-l] Null Pointer - NCBI C++ Exception In-Reply-To: References: <31659982.post@talk.nabble.com> Message-ID: On Thu, Nov 10, 2011 at 2:00 PM, Fields, Christopher J wrote: > On Nov 10, 2011, at 7:09 AM, Peter wrote: > >> Re: http://lists.open-bio.org/pipermail/bioperl-announce-l/2011-May/000253.html >> >> On Thu, May 19, 2011 at 11:15 PM, rgoldade wrote: >>> >>> I received the following error while trying to run bl2seq from >>> standaloneblastplus. Has anyone else encountered this problem? >>> >>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>> MSG: /usr/bin/blastp call crashed: There was a problem running >>> /usr/bin/blastp : Error: NCBI C++ Exception: >>> >>> "/am/ncbiapdata/release/blast/src/2.2.24/Linux32-Suse-icc/c++/ICC1010-ReleaseMT--Linux32-Suse-icc/../src/corelib/ncbiobj.cpp", >>> line 688: Critical: ncbi::CObject::ThrowNullPointerException() - Attempt to >>> access NULL pointer. >>> >>> Thank you, >>> Ryan >> >> Just hit something very very similar, looks like a BLAST+ bug which I >> will report now: >> >> $ tblastn -out NC_003197_vs_NC_011294.tsv -outfmt 6 -query >> NC_003197.fna -evalue 0.0001 -subject NC_011294.fna >> Error: NCBI C++ Exception: >> ? ?"/am/ncbiapdata/release/blast/src/2.2.25/IntelMAC-universal/c++/GCC401-ReleaseMT--IntelMAC-universal/../src/corelib/ncbiobj.cpp", >> line 689: Critical: ncbi::CObject::ThrowNullPointerException() - >> Attempt to access NULL pointer. 
>> >> This was on a Mac using BLAST 2.2.25+ (tblastn) whereas yours was >> BLAST 2.2.24+ (blastp) from the look of the error. The line number has >> changed by one, but I'm confident it is the same point of failure. >> >> In my case I was comparing nucleotide against nucleotide, so should >> have been using tblastx not tblastn, but it still shouldn't have had a >> pointer exception. >> >> Peter > > Yeah, that's bad. I have seen a few things like this myself that make me worry about the transition to BLAST+. > > chris I'm told it is already fixed and will be part of BLAST 2.2.26+, which is good. > > PS - Odd I didn't see this one, was it caught in the bioperl-announce filter? > Maybe once, but it was in the archive and my email account. Peter From anna.fr at gmail.com Thu Nov 10 15:01:57 2011 From: anna.fr at gmail.com (Anna Friedlander) Date: Fri, 11 Nov 2011 09:01:57 +1300 Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi? Message-ID: Hi all Does anyone know if there is a way to get a Taxonomy node and/or taxonid from a gi number using the flatfile with taxonomy db? I have BLAST output that I want to append taxonomic information to. I have hundreds of thousands of items to do this for, so it's not practical to use Entrez to query the NCBI database. I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I think much too large to put into a hash! This was also discussed in 2009: http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I don't think there was a conclusion? Thanks for your help Anna Friedlander From shalabh.sharma7 at gmail.com Thu Nov 10 15:12:09 2011 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Thu, 10 Nov 2011 15:12:09 -0500 Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi? In-Reply-To: References: Message-ID: Hi Anna, I think the thread you mentioned was started by me. At that time I wrote a few scripts to map gi to taxa; after some time I also found some other efficient ways.
But recently 'Miguel Pignatelli' directed to some Bio-LITE modules that are really helpful. These are the modules he mentioned, i found them really easy to use and very efficient. Bio-LITE-Taxonomy-0.07 Bio-LITE-Taxonomy-NCBI-0.07 Bio-LITE-Taxonomy-NCBI-**Gi2taxid-0.04 Cheers Shalabh On Thu, Nov 10, 2011 at 3:01 PM, Anna Friedlander wrote: > Hi all > > Does anyone know if there is a way to get a Taxonomy node and/or > taxonid from a gi number using the flatfile with taxonomy db? > > I have blast output that I want to append taxonomic information to. I > have hundreds of thousands of items to do this for, so it's not > practical to use entrez to query the NCBI database. > > I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I > think much too large to put into a hash! > > This was also discussed in 2009: > http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I > don't think there was a conclusion? > > Thanks for your help > Anna Friedlander > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From cjfields at illinois.edu Thu Nov 10 15:23:14 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 10 Nov 2011 20:23:14 +0000 Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi? In-Reply-To: References: Message-ID: <53AF9ECA-5905-4D14-B7C1-FF4B2F2FA084@illinois.edu> Yes, these are probably wrappers around the gi2taxid, and taxonomy data; bioperl lacks the former, whereas the latter is handled by Bio::DB::Taxonomy (the 'flatfile' option). I did something very similar locally, though I used Bio::DB::Taxonomy for the taxonomy lookups. 
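A minimal sketch of one way around the "too large for a hash" problem raised in the original question: keep only the GIs that actually occur in the BLAST output (hundreds of thousands fit comfortably in a hash) and stream the big dump once. This is not taken from Bio::DB::Taxonomy or the Bio-LITE modules; the sub name taxids_for_gis is hypothetical, the dump is assumed to be two tab-separated columns (gi, taxid), and the sample values are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stream a two-column "gi<TAB>taxid" dump once and keep only the GIs we need.
# $wanted is a hash ref whose keys are the GI numbers seen in the BLAST output.
# (Hypothetical helper; the two-column layout is an assumption.)
sub taxids_for_gis {
    my ($dump_fh, $wanted) = @_;
    my %taxid_for;
    while (my $line = <$dump_fh>) {
        chomp $line;
        my ($gi, $taxid) = split /\t/, $line;
        next unless defined $taxid && exists $wanted->{$gi};
        $taxid_for{$gi} = $taxid;
    }
    return \%taxid_for;
}

# Small in-memory stand-in for the real multi-gigabyte dump (made-up values).
my $dump = "3\t9913\n4\t9646\n5\t9913\n7\t9913\n";
open my $fh, '<', \$dump or die "Cannot open in-memory dump: $!";
my $found = taxids_for_gis($fh, { 4 => 1, 7 => 1 });
close $fh;

print "$_\t$found->{$_}\n" for sort { $a <=> $b } keys %$found;
```

The resulting taxon IDs could then be handed to whatever taxonomy lookup is in use. Memory grows with the number of hits rather than with the size of the dump, at the cost of one full pass over the file per run.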
chris On Nov 10, 2011, at 2:12 PM, shalabh sharma wrote: > Hi Anna, > I think the thread you mentioned was started by me. > That time i wrote few scripts to map gi to taxa, after some time i found > some other efficient ways also. But recently 'Miguel Pignatelli' directed > to some Bio-LITE modules that are really helpful. > > These are the modules he mentioned, i found them really easy to use and > very efficient. > > Bio-LITE-Taxonomy-0.07 > Bio-LITE-Taxonomy-NCBI-0.07 > Bio-LITE-Taxonomy-NCBI-**Gi2taxid-0.04 > > Cheers > Shalabh > > On Thu, Nov 10, 2011 at 3:01 PM, Anna Friedlander wrote: > >> Hi all >> >> Does anyone know if there is a way to get a Taxonomy node and/or >> taxonid from a gi number using the flatfile with taxonomy db? >> >> I have blast output that I want to append taxonomic information to. I >> have hundreds of thousands of items to do this for, so it's not >> practical to use entrez to query the NCBI database. >> >> I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I >> think much too large to put into a hash! >> >> This was also discussed in 2009: >> http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I >> don't think there was a conclusion? >> >> Thanks for your help >> Anna Friedlander >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > Shalabh Sharma > Scientific Computing Professional Associate (Bioinformatics Specialist) > Department of Marine Sciences > University of Georgia > Athens, GA 30602-3636 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bernd.web at gmail.com Thu Nov 10 15:51:13 2011 From: bernd.web at gmail.com (Bernd Web) Date: Thu, 10 Nov 2011 21:51:13 +0100 Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi? 
In-Reply-To: References: Message-ID: Hi Anna, Jason changed his example script from using hashes to using SQLite: bp_classify_hits_kingdom - classify BLAST hits by taxonomic kingdom See https://github.com/bioperl/bioperl-live/blob/master/scripts/taxa/bp_classify_hits_kingdom.pl It's an example script that shows how to do the tax to gi mapping for blast reports. Bernd On Thu, Nov 10, 2011 at 9:01 PM, Anna Friedlander wrote: > Hi all > > Does anyone know if there is a way to get a Taxonomy node and/or > taxonid from a gi number using the flatfile with taxonomy db? > > I have blast output that I want to append taxonomic information to. I > have hundreds of thousands of items to do this for, so it's not > practical to use entrez to query the?NCBI database. > > I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I > think much too large to put into a hash! > > This was also discussed in 2009: > http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I > don't think there was a conclusion? > > Thanks for your help > Anna Friedlander > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Thu Nov 10 16:13:12 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 10 Nov 2011 21:13:12 +0000 Subject: [Bioperl-l] Scripting help to identify adaptors count in reads In-Reply-To: References: <32818254.post@talk.nabble.com> Message-ID: <9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu> If the adaptors are masked (e.g. are represented by the N's below) or if you are really confident that the adaptors don't have base mis-calls, why not use split? Maybe with something like 'scalar(split(/N+/, $foo))' or scalar(split(/$adaptor/, $foo)). tr/// won't work for the reasons Leon mentioned; it's a transliteration of a character mapping, not a pattern match. 
'$foo =~ tr/ATGCatgc/TACGtagc/' for instance converts $foo to the complement sequence (it doesn't match the pattern /ATGCatgc/). chris On Nov 10, 2011, at 10:06 AM, Juan Jovel wrote: > > There are many ways to do it. > Perhaps the simplest is to count the number of times the adapter sequence (or part of it) appears in each read. > For example: > $adapter_matches = tr/adapter_sequence/adapter_sequence/;# $adapter_matches will store the number of times the adapter sequence is repeated. > You then place that result in a hash bin: > my %adapter_frequency;my $class = "$adapter_matches";if(exists $adapter_frequency{$class}){ $adapter_frequency{$class}++}else{ $adapter_frequency{$class} = 1} > # Then you can sort and output your classes > foreach $class (sort keys %adapter_frequency){ print "$class\t$adapter_frequency{$class}\n"; } > > You can workout the details, but something like this should work. > > > > > > > >> Date: Thu, 10 Nov 2011 04:29:55 -0800 >> From: casaburi at ceinge.unina.it >> To: Bioperl-l at lists.open-bio.org >> Subject: [Bioperl-l] Scripting help to identify adaptors count in reads >> >> >> Hi everybody, >> >> i have some reads (454) where there are adaptors (NNNN...), one,two or three >> adaptors for each reads depending on the reads. Is there any way to >> establish how many reads have 1 adaptors, how many 2 and how many 3 adaptors >> over the total ??? 
>> >>> 271-88 >> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCAGGTGCCTACG >>> 272-88 >> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCANNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTTGCCAGCCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGC >>> 273-88 >> GCCTCCCTCGCGCATCAGATCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCA >>> 274-88 >> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTCCCTCGCGCCATCAGATCGTNNNNNNNNNNNNNNNNNNTCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCATCAGATCGTAGGCACCATCAA >> >> The problem is that some adpators occur in the middle of the sequences >> because they coming out from a concameration experimental design (they are >> miRNAs between NNNNNN...). So i want to know a script or tool that may say >> how many reads have 1 adapt, how many 2, (max are 4) in respect to the total >> number of reads. Do you know any tool/script that may help ? Tnx >> Can anyone suggests me a script to fix this ??? >> >> Thank you very much >> -- >> View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32818254.html >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at gmail.com Thu Nov 10 16:15:29 2011 From: jason.stajich at gmail.com (Jason Stajich) Date: Thu, 10 Nov 2011 13:15:29 -0800 Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi? 
In-Reply-To: References: Message-ID: <1C8130A1-A60A-4F16-A077-4E1B592579BC@gmail.com> Here's another variant of one I wrote which is for my own purposes, the code at the beginning uses a NOSQL solution to storing all the ACC -> GI and then a second db to store GI -> TAXONID This is the case where I have a file of accession numbers and I want to add to the description line the taxonomy string. https://github.com/hyphaltip/mobedac-fungi/blob/master/scripts/taxonomy_lookupmissing.pl That's the first 165 lines, and then lookups are basically what you see on line 195. Would be good to rewrite that script below to use TokyoCabinent or KyotoCabinent (is newer implementation, not sure if it is faster?). one thing that this does is take up a lot of disk space ,but you can have tradeoffs between than and which compression scheme you use, which will impact performance of loading. Jason On Nov 10, 2011, at 12:51 PM, Bernd Web wrote: > Hi Anna, > > Jason changed his example script from using hashes to using SQLite: > bp_classify_hits_kingdom - classify BLAST hits by taxonomic kingdom > > See > https://github.com/bioperl/bioperl-live/blob/master/scripts/taxa/bp_classify_hits_kingdom.pl > > It's an example script that shows how to do the tax to gi mapping for > blast reports. > > > Bernd > > On Thu, Nov 10, 2011 at 9:01 PM, Anna Friedlander wrote: >> Hi all >> >> Does anyone know if there is a way to get a Taxonomy node and/or >> taxonid from a gi number using the flatfile with taxonomy db? >> >> I have blast output that I want to append taxonomic information to. I >> have hundreds of thousands of items to do this for, so it's not >> practical to use entrez to query the NCBI database. >> >> I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I >> think much too large to put into a hash! >> >> This was also discussed in 2009: >> http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I >> don't think there was a conclusion? 
>> >> Thanks for your help >> Anna Friedlander >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From anna.fr at gmail.com Thu Nov 10 20:07:57 2011 From: anna.fr at gmail.com (Anna Friedlander) Date: Fri, 11 Nov 2011 14:07:57 +1300 Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi? In-Reply-To: <1C8130A1-A60A-4F16-A077-4E1B592579BC@gmail.com> References: <1C8130A1-A60A-4F16-A077-4E1B592579BC@gmail.com> Message-ID: thanks all for the fast responses. I'll try the bio-lite modules shalabh mentioned On Fri, Nov 11, 2011 at 10:15 AM, Jason Stajich wrote: > Here's another variant of one I wrote which is for my own purposes, the code > at the beginning uses a NOSQL solution to storing all the ACC -> GI > and then a second db to store GI -> TAXONID > This is the case where I have a file of accession numbers and I want to add > to the description line the taxonomy string. > https://github.com/hyphaltip/mobedac-fungi/blob/master/scripts/taxonomy_lookupmissing.pl > That's the first 165 lines, and then lookups are basically what you see on > line 195. > Would be good to rewrite that script below to use TokyoCabinent > or?KyotoCabinent (is newer implementation, not sure if it is faster?). > one thing that this does is take up a lot of disk space ,but you can have > tradeoffs between than and which compression scheme you use, which will > impact performance of loading. 
> Jason > On Nov 10, 2011, at 12:51 PM, Bernd Web wrote: > > Hi Anna, > > Jason changed his example script from using hashes to using SQLite: > bp_classify_hits_kingdom - classify BLAST hits by taxonomic kingdom > > See > https://github.com/bioperl/bioperl-live/blob/master/scripts/taxa/bp_classify_hits_kingdom.pl > > It's an example script that shows how to do the tax to gi mapping for > blast reports. > > > Bernd > > On Thu, Nov 10, 2011 at 9:01 PM, Anna Friedlander wrote: > > Hi all > > Does anyone know if there is a way to get a Taxonomy node and/or > > taxonid from a gi number using the flatfile with taxonomy db? > > I have blast output that I want to append taxonomic information to. I > > have hundreds of thousands of items to do this for, so it's not > > practical to use entrez to query the NCBI database. > > I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I > > think much too large to put into a hash! > > This was also discussed in 2009: > > http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I > > don't think there was a conclusion? > > Thanks for your help > > Anna Friedlander > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From arun_innovative90 at yahoo.com Fri Nov 11 06:09:46 2011 From: arun_innovative90 at yahoo.com (Arun Kumar) Date: Fri, 11 Nov 2011 03:09:46 -0800 (PST) Subject: [Bioperl-l] BIOPERL MATERIAL Message-ID: <1321009786.42748.YahooMailNeo@web110101.mail.gq1.yahoo.com> Hi team, This is Arun Kumar, a bioinformatics student who wishes to master BioPerl after reading your documents. If possible, please send me a PDF of the BioPerl documentation, as it will be useful for getting familiar with BioPerl.
Thanks in advance Thanks & Regards, Arunkumar.d From awitney at sgul.ac.uk Fri Nov 11 08:23:29 2011 From: awitney at sgul.ac.uk (Adam Witney) Date: Fri, 11 Nov 2011 13:23:29 +0000 Subject: [Bioperl-l] BIOPERL MATERIAL In-Reply-To: <1321009786.42748.YahooMailNeo@web110101.mail.gq1.yahoo.com> References: <1321009786.42748.YahooMailNeo@web110101.mail.gq1.yahoo.com> Message-ID: All BioPerl documents can be found here: http://www.bioperl.org/wiki/Main_Page And a useful place to start would be the HOWTOs: http://www.bioperl.org/wiki/HOWTOs regards adam On 11 Nov 2011, at 11:09, Arun Kumar wrote: > Hi team, > > This is arun kumar of bio - informatics student wish to master in bioperl after reading your documents, if possible send me PDF of this bioperl as it will be useful to get familier with bioperl. > > Thanks in advance > > Thanks & Regards, > Arunkumar.d > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From casaburi at ceinge.unina.it Fri Nov 11 07:13:50 2011 From: casaburi at ceinge.unina.it (Giorgio C) Date: Fri, 11 Nov 2011 04:13:50 -0800 (PST) Subject: [Bioperl-l] Scripting help to identify adaptors count in reads In-Reply-To: <9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu> References: <32818254.post@talk.nabble.com> <9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu> Message-ID: <32825229.post@talk.nabble.com> Hi thank you for your answer !!! At the end i tried this script and seems to work for this purpose: perl -pe 's/.*?((NNNNNNNNNNNNNNN|NNNNNNNNNNNNNNN)(.*?)(NNNNNNNNNNNNNNN|NNNNNNNNNNNNNNN))/$3/g' Scrivania/orchidea/Fiore/Mydata.fasta > result.txt -- View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32825229.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
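For the counting part of the original question (how many reads carry 1, 2, 3 or 4 adaptors), a minimal sketch in plain Perl, assuming as in the posted reads that every adaptor is masked as an unbroken run of N. The sub names count_adaptors and adaptor_histogram and the toy reads are made up for illustration; nothing here comes from BioPerl itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count the runs of N (masked adaptors) in one read.
sub count_adaptors {
    my ($seq) = @_;
    my $n = () = $seq =~ /N+/g;    # list assignment in scalar context = number of matches
    return $n;
}

# Tally reads by adaptor count from FASTA text on a filehandle.
# Sequence lines of a wrapped record are joined before counting.
sub adaptor_histogram {
    my ($fh) = @_;
    my %reads_with;
    my $seq;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>/) {
            $reads_with{ count_adaptors($seq) }++ if defined $seq;
            $seq = '';
        }
        elsif (defined $seq) {
            $seq .= $line;
        }
    }
    $reads_with{ count_adaptors($seq) }++ if defined $seq;    # last record
    return \%reads_with;
}

# Toy input standing in for the real FASTA file.
my $fasta = ">r1\nACGTNNNNNACGT\n>r2\nACGTNNNNACGTNNNNNACGT\n>r3\nACGTACGT\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory FASTA: $!";
my $hist = adaptor_histogram($fh);
close $fh;

printf "%d read(s) with %d adaptor(s)\n", $hist->{$_}, $_
    for sort { $a <=> $b } keys %$hist;
```

Matching /N+/ rather than using tr/// counts runs instead of individual characters, which is the distinction made earlier in the thread. Adaptors with base mis-calls inside the N run would be counted as two, so this only holds for cleanly masked reads.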
From casaburi at ceinge.unina.it Fri Nov 11 07:21:29 2011 From: casaburi at ceinge.unina.it (Giorgio C) Date: Fri, 11 Nov 2011 04:21:29 -0800 (PST) Subject: [Bioperl-l] Scripting help to identify adaptors count in reads In-Reply-To: <9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu> References: <32818254.post@talk.nabble.com> <9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu> Message-ID: <32825274.post@talk.nabble.com> Thanks everybody for answering me so soon !!! Probably another way may be: perl -ne '$count{s/N+//g}++ if /^[^>]/;END{for $i (keys %count){print "$count{$i} have $i ADAPTOR\n";}}' myFile.fasta > result.txt and/or with 'nawk': nawk -F'[N]+' '/^[^>]/{a[NF-1]++}END{for(i in a) print a[i] " have " i " ADAPTOR"}' myFile.fasta > result.txt They give the same result. If you will have this problem try these, work good !!! Still Thanks to all, Giorgio -- View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32825274.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From p.j.a.cock at googlemail.com Sun Nov 13 07:24:35 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 13 Nov 2011 12:24:35 +0000 Subject: [Bioperl-l] OBDA redux? In-Reply-To: References: Message-ID: On Thu, Nov 3, 2011 at 7:47 PM, Fields, Christopher J wrote: > On Nov 3, 2011, at 1:52 PM, Peter Cock wrote: > >> On Thu, Nov 3, 2011 at 6:28 PM, Fields, Christopher J >> wrote: >>> (side thread, so re-titling...) >>> >> And CC'ing open-bio-l, which is a better home for this than bioperl-l, >> where OBDA v2 talk came up again in discussion of a BioPerl indexing >> problem. 
Archive links for thread here: >> >> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035807.html >> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035808.html >> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035811.html >> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035812.html >> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035813.html >> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035822.html > > yes, good idea... I've not CC'd the bioperl-l anymore. >>> On Nov 1, 2011, at 1:06 PM, Peter Cock wrote: >>>> >>>> Yes, we're using SQLite3 to store essentially a list of filenames >>>> and their format as one table, and then in the main table an >>>> entry for each sequence recording the ID (only one accession, >>>> unlike OBDA which had infrastructure for a secondary accession), >>>> file number, offset of the start of the record, and optionally the >>>> length of the record on disk. >>>> >>>> i.e. Basically what OBDA does, but using SQLite rather >>>> than BDB (not included in Python 3) or a flat file index >>>> (poor performance with large datasets). >>>> >>>> I find this design attractive on several levels: >>>> * File format neutral, covers FASTA, FASTQ, GenBank, etc >>>> * Preserves the original file untouched >>>> * Index is a small single file (thanks to SQLite) >>>> * Back end could be switched out >>>> * Could be applied to compressed file formats >>>> * Reuses existing parsing code to access entries >>>> >>>> This could easily form basis of OBDA v2, the main points >>>> of difference I anticipate between the Bio* projects would >>>> be naming conventions for the different file formats, and >>>> what we consider to be the default record ID of each read >>>> (e.g. which field in a GenBank file - although agreement >>>> here is not essential). Some of that was already settled in >>>> principle with OBDA v1. 
>>> >>> The primary/secondary IDs could be configurable with a sane >>> default, I think the bioperl implementations allowed this (and >>> it is certainly something that will be requested). >> >> One reason I went with a single ID only was to keep the >> Python dictionary based API simple (think hash in Perl). >> You don't get secondary keys in a Python dict or a hash ;) >> >> As a nod to flexibility, in Biopython's Bio.SeqIO indexing you >> can provide a call back function to map the suggested ID to >> something else. Obviously this doesn't give the full flexibility >> of extracting a field from the record's annotation because we >> don't parse the whole record during indexing (it would be too >> slow). > > Same with bioperl. > >> However, I'm happy for there to be an *optional* secondary >> key in an OBDA v2 SQLite schema, but Biopython probably >> won't populate it. We could optionally use it rather than the >> primary ID on loading an existing index though. > > Optional implementation of that is fine by me. > >> Personally I would stick with one key in the index - it should >> be faster and makes it simpler to switch the back end if we >> need to later. If anyone wants a second key, they can build >> a second index *grin*. > > That's easy enough. > >>>> On the other hand, you could try and store the parsed data >>>> itself, which is where NOSQL looks more interesting. That >>>> essentially requires the ability to serialise your annotated >>>> sequence object model to disk - which would be tricky to do >>>> cross project (much more ambitious than BioSQL is). It also >>>> means the "index" becomes very large because it now holds >>>> all the original data. >>>> >>>> Peter >>> >>> For a fully cross-Bio* compliant format, I don't think it's feasible >>> to use serialized data unless they are serialized in something >>> that is easily deserialized across HLLs (JSON, BSON, YAML, >>> XML, etc). 
Either that, or such data is stored concurrently with >>> the binary blob, along with meta data that indicates the source >>> of the blob, parser, version, etc, etc (unless there are tools out >>> there that reliably interconvert serialized complex data structures >>> between HLLs). Any way you go about it, it seems like it could >>> be a major ball of hurt, unless implemented very carefully. >> >> You missed out RDF as a serialisation ;) >> >> But yes, going down the shared serialisation route is going >> to be messy - as you are well aware: >> >>> Aside: I think this was one of the problems with >>> Bio::DB::SeqFeature::Store, in that it at one point stored >>> Perl-specific Storable blobs. >>> >>> chris >> >> Peter > > yes, it's a problem w/o an easy solution. Anyway, I think an > implementation of such at this point would be a premature > optimization. > > chris So, Chris and I seem in general agreement that an OBDA v2 using SQLite but based on essentially the same approach as the BDB or flat file based OBDA v1 is a good idea, i.e. tables mapping record identifiers to file offsets in the original sequence files. I hope to get BioRuby on board; they already have OBDA v1 support so that shouldn't be too hard. Right now I don't recall if BioJava has/had OBDA v1 support, and if they did, whether it was affected by their recent move to BioJava v3 (I understand from their mailing list that some older lower priority functionality has not all been ported yet). Also EMBOSS are likely to be interested; certainly Peter Rice was interested in the SQLite indexing we're already using in Biopython for sequence files (i.e. what is effectively the prototype for OBDA v2). Note that in addition to simple indexing of text files, we are already using the same simple offset + length approach for indexing binary files (e.g. SFF).
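To make the offset + length idea concrete, here is a minimal sketch in plain Perl. It is neither the Biopython SQLite schema nor any OBDA specification: a Perl hash stands in for the index table, and the sub names build_index and fetch_record are made up for illustration. Only core modules are used.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build { id => [offset, length] } for every FASTA record in a file.
# The offset is the byte position of the '>' line; the length runs up to
# the next record (or end of file), so the original file stays untouched.
sub build_index {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot read $path: $!";
    my (%index, $id, $start);
    while (1) {
        my $pos  = tell $fh;
        my $line = <$fh>;
        if (!defined $line || $line =~ /^>/) {
            # A new header or end of file closes the previous record.
            $index{$id} = [ $start, $pos - $start ] if defined $id;
            last unless defined $line;
            ($id) = $line =~ /^>(\S+)/;
            $start = $pos;
        }
    }
    close $fh;
    return \%index;
}

# Fetch the raw text of one record with a single seek + read.
sub fetch_record {
    my ($path, $index, $id) = @_;
    my $entry = $index->{$id} or die "Unknown id: $id\n";
    my ($offset, $length) = @$entry;
    open my $fh, '<:raw', $path or die "Cannot read $path: $!";
    seek $fh, $offset, 0 or die "Cannot seek in $path: $!";
    my $got = read $fh, my $record, $length;
    die "Short read for $id\n" unless defined $got && $got == $length;
    close $fh;
    return $record;
}

# Toy two-record file standing in for a real sequence file.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} ">seq1 first\nACGT\nACGT\n>seq2 second\nGGCC\n";
close $tmp_fh;

my $index = build_index($tmp_path);
print fetch_record($tmp_path, $index, 'seq2');
```

The fetched text can be handed to the existing parser for that format, which is the "reuses existing parsing code" point made earlier in the thread. Swapping the hash for an SQLite table would change only where the (id, offset, length) triples are stored.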
On the immediate practical side, I think I can edit the current OBDA website of http://obda.open-bio.org/ via /home/websites/obda.open-bio.org/html on the server. We need to work out where the current OBDA indexing specification lives (CVS or SVN?) and perhaps move that to GitHub. We may need a general OBF organisation account on GitHub for this and any other cross-project repositories. I see there is already an OBDA project on RedMine, (Chris can you add me to that please?) https://redmine.open-bio.org/projects/obda Peter From p.j.a.cock at googlemail.com Sun Nov 13 07:30:37 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 13 Nov 2011 12:30:37 +0000 Subject: [Bioperl-l] OBDA redux? Compressed files Message-ID: Hi again, I've retitled this as it is a little off topic from the main OBDA redux thread, http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000819.html http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000820.html http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000821.html As far as I recall, the original flat file and BDB based OBDA specification for indexing sequencing files didn't cover compressed files. That might be something to consider (although we should sort out uncompressed text/binary files first). I've recently been experimenting with using compressed files - in particular simple GZIP files (ignoring any block structure) and BGZF (the specialised gzipped blocking used in BAM), see: http://blastedbio.blogspot.com/2011/11/bgzf-blocked-bigger-better-gzip.html http://seqanswers.com/forums/showthread.php?t=15347 The virtual offset approach used in BGZF squeezes a 16 bit within block offset (thus limiting you to 64KB blocks) and a 48 bit block start offset (thus limiting you to a 256TB file) into a single 64bit "virtual" offset.
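The packing just described can be sketched in a few lines of Perl, assuming a perl built with 64 bit integers. The sub names make_virtual_offset and split_virtual_offset are made up for illustration and are not part of any BGZF library; the layout used is the block start in the upper 48 bits and the within block offset in the lower 16 bits.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pack a block start (48 bits) and a within-block offset (16 bits)
# into one 64 bit "virtual" offset. Assumes a 64 bit perl.
sub make_virtual_offset {
    my ($block_start, $within) = @_;
    die "within-block offset must fit in 16 bits\n"
        if $within < 0 || $within >= 2**16;
    die "block start must fit in 48 bits\n"
        if $block_start < 0 || $block_start >= 2**48;
    return ($block_start << 16) | $within;
}

# Recover (block start, within-block offset) from a virtual offset.
sub split_virtual_offset {
    my ($virtual) = @_;
    return ($virtual >> 16, $virtual & 0xFFFF);
}

my $virtual = make_virtual_offset(100000, 10);
my ($block_start, $within) = split_virtual_offset($virtual);
print "virtual=$virtual block_start=$block_start within=$within\n";
```

Because the result is a single integer, it can be stored in an index column that expects one plain offset per record.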
That makes sense if you are keeping the lookup table of many offsets in memory, and can be used as is with code expecting a single offset (like the current Biopython SQLite index schema). I've also looked at bzip2, which is block based too, with the block size ranging from 100KB to 900KB. http://bzip.org/ http://bzip.org/1.0.5/bzip2-manual-1.0.5.html I haven't tried any performance tests yet, which would be interesting as I believe compression/decompression of bzip2 is more costly in CPU terms than gzip (although both will be block size dependent). If we wanted to imitate the BGZF virtual offset scheme for arbitrary BZIP2 files, an alternative 64 bit virtual offset scheme could use 20 bits to cover bz2 blocks of up to 900KB, leaving 64 - 20 = 44 bits for the start offset, thus limiting you to just 2^44 bytes or 16TB, which sounds OK only in the medium term. On the bright side this could be used to index any BZIP2 file (under 16TB), whereas the BGZF scheme cannot be applied to an arbitrary GZIP file. On the other hand, storing the block start and within block offset separately is truly generic and could be used on any blocked GZIP file (including BGZF) and BZIP2 etc. It would make the SQLite schema a bit more complicated though. Maybe something to consider for the next revision to OBDA, and focus on the non-compressed case for now? Regards, Peter From p.j.a.cock at googlemail.com Sun Nov 13 07:32:12 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 13 Nov 2011 12:32:12 +0000 Subject: [Bioperl-l] OBDA redux?
Compressed files In-Reply-To: References: Message-ID: On Sun, Nov 13, 2011 at 12:30 PM, Peter Cock wrote: > Hi again, > > I've retitled this as it is a little off topic from the main OBDA redux thread, > http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000819.html > http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000820.html > http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000821.html > > As far as I recall, the original flat file and BDB based OBDA > specification for indexing sequencing files didn't cover > compressed files. That might be something to consider > (although we should sort of uncompressed text/binary > files first). Sorry - didn't meant to include bioperl-l on that, although it may be of interest to you guys anyway. Peter From jluis.lavin at unavarra.es Mon Nov 14 06:14:43 2011 From: jluis.lavin at unavarra.es (=?ISO-8859-1?Q?Jos=E9_Luis_Lav=EDn?=) Date: Mon, 14 Nov 2011 12:14:43 +0100 Subject: [Bioperl-l] Fwd: How to get Remote BLAST results in a single out In-Reply-To: References: Message-ID: Hello everybody, I've been using "Bio::Tools::Run::RemoteBlast" for a time and it has worked fine for me. Now I need to perform a multiple BLAST search, but this time I'd just like to get all the BLAST results in a single out file instead of having each sequence's report written individually. I've read the documentation of the module, but due to my short experience/understanding on complex modules as this one seems to be I can't figure out where to change the script to achieve my previously mentioned aim. Here I post the script I've been using (it's basically the one posted on the module cookbook). 
#!/c:/Perl -w
use Bio::Tools::Run::RemoteBlast;
use Bio::SearchIO;
use Data::Dumper;

#Here i set the parameters for blast
print "Enter your BLAST choice (blastn, blastp, blastx, tblastn, tblastx):\n";
chomp(my $blst = <STDIN>);
my $prog = "$blst";
print "Enter a database to search (nr, refseq_protein, swissprot, pat, pdb, env_nr):\n";
chomp(my $dtb = <STDIN>);
my $db = "$dtb";
print "Enter your cutoff score (1e-n):\n";
chomp(my $cut = <STDIN>);
my $e_val = "$cut";

my @params = ( '-prog' => $prog,
               '-data' => $db,
               '-expect' => $e_val,
               '-readmethod' => 'SearchIO' );

my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params);

#Select the file and make the blast.
print "Enter your FASTA file:\n";
chomp(my $infile = <STDIN>);
my $r = $remoteBlast->submit_blast($infile);
my $v = 1;

print STDERR "waiting..." if( $v > 0 ); ######## WAIT FOR THE RESULTS TO RETURN!!!!!
while ( my @rids = $remoteBlast->each_rid ) {
    foreach my $rid ( @rids ) {
        my $rc = $remoteBlast->retrieve_blast($rid);
        if( !ref($rc) ) {
            if( $rc < 0 ) {
                $remoteBlast->remove_rid($rid);
            }
            print STDERR "." if ( $v > 0 );
            sleep 5;
        } else {
            my $result = $rc->next_result();
            #save the output
            my $filename = $result->query_name()."\.out"; ##################open SALIDA, '>>'."$^T"."Report"."\.out";
            $remoteBlast->save_output($filename); #############
            $remoteBlast->remove_rid($rid);
            print "\nQuery Name: ", $result->query_name(), "\n";
            while ( my $hit = $result->next_hit ) {
                next unless ( $v > 0);
                print "\thit name is ", $hit->name, "\n";
                while( my $hsp = $hit->next_hsp ) {
                    print "\t\tscore is ", $hsp->score, "\n";
                }
            }
        }
    }
}

Could any of you please explain to me how to solve my question?

Thanks in advance

With best wishes

--
Dr. José Luis Lavín Trueba
Dpto.
de Producción Agraria Grupo de Genética y Microbiología Universidad Pública de Navarra 31006 Pamplona Navarra SPAIN From jason.stajich at gmail.com Mon Nov 14 06:59:56 2011 From: jason.stajich at gmail.com (Jason Stajich) Date: Mon, 14 Nov 2011 06:59:56 -0500 Subject: [Bioperl-l] Fwd: How to get Remote BLAST results in a single out In-Reply-To: References: Message-ID: If you want to do a bunch of BLASTs remotely on the cmdline you should also just use the NCBI's blastcl3 tool (not sure if there is a BLAST+ equivalent). This might be faster to do and easier since you need to learn the programming part too. If you want to do this within this code I guess the question is what format you want the data in - a BLAST report or something more like a table? On Nov 14, 2011, at 6:14 AM, José Luis Lavín wrote: > Hello everybody, > > I've been using "Bio::Tools::Run::RemoteBlast" for a time and it has > worked fine for me. Now I need to perform a multiple BLAST search, but this > time I'd just like to get all the BLAST results in a single out file > instead of having each sequence's report written individually. I've read > the documentation of the module, but due to my short > experience/understanding on complex modules as this one seems to be I can't > figure out where to change the script to achieve my previously mentioned > aim. > Here I post the script I've been using (it's basically the one posted on > the module cookbook).
> > #!/c:/Perl -w > use Bio::Tools::Run::RemoteBlast; > use Bio::SearchIO; > use Data::Dumper; > > #Here i set the parameters for blast > print "Enter your BLAST choice (blastn, blastp, blastx, tblastn, > tblastx):\n"; > my $blst = ; > my $prog = "$blst"; > print "Enter a database to search (nr, refseq_protein, swissprot, pat, pdb, > env_nr):\n"; > my $dtb = ; > $db = "$dtb"; > print "Enter your cutt off score (1e-n):\n"; > my $cut = ; > my $e_val = "$cut"; > > my @params = ( '-prog' => $prog, > '-data' => $db, > '-expect' => $e_val, > '-readmethod' => 'SearchIO' ); > > my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params); > > > #Select the file and make the blast. > print "Enter your FASTA file:\n"; > chomp(my $infile = ); > my $r = $remoteBlast->submit_blast($infile); > my $v = 1; > > print STDERR "waiting..." if( $v > 0 ); ######## WAIT FOR THE RESULTS > TO RETURN!!!!! > while ( my @rids = $remoteBlast->each_rid ) { > foreach my $rid ( @rids ) { > my $rc = $remoteBlast->retrieve_blast($rid); > if( !ref($rc) ) { > if( $rc < 0 ) { > $remoteBlast->remove_rid($rid); > } > print STDERR "." if ( $v > 0 ); > sleep 5; > } else { > my $result = $rc->next_result(); > #save the output > my $filename = > $result->query_name()."\.out";##################open SALIDA, > '>>'."$^T"."Report"."\.out"; > $remoteBlast->save_output($filename);############# > $remoteBlast->remove_rid($rid); > print "\nQuery Name: ", $result->query_name(), "\n"; > while ( my $hit = $result->next_hit ) { > next unless ( $v > 0); > print "\thit name is ", $hit->name, "\n"; > while( my $hsp = $hit->next_hsp ) { > print "\t\tscore is ", $hsp->score, "\n"; > } > } > } > } > } > > > May any of you please explain me how to solve my question? > > Thanks in advence > > With best wishes > > -- > -- > Dr. Jos? Luis Lav?n Trueba > > Dpto. de Producci?n Agraria > Grupo de Gen?tica y Microbiolog?a > Universidad P?blica de Navarra > 31006 Pamplona > Navarra > SPAIN > > > > -- > -- > Dr. Jos? 
Luis Lav?n Trueba > > Dpto. de Producci?n Agraria > Grupo de Gen?tica y Microbiolog?a > Universidad P?blica de Navarra > 31006 Pamplona > Navarra > SPAIN > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at gmail.com Mon Nov 14 09:07:36 2011 From: jason.stajich at gmail.com (Jason Stajich) Date: Mon, 14 Nov 2011 09:07:36 -0500 Subject: [Bioperl-l] Fwd: Fwd: How to get Remote BLAST results in a single out References: Message-ID: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com> Please keep this on list discussions Sent from my iPhone-please excuse typos -- Jason Stajich Begin forwarded message: > From: Jos? Luis Lav?n > Date: November 14, 2011 8:04:25 AM EST > To: Jason Stajich > Subject: Re: [Bioperl-l] Fwd: How to get Remote BLAST results in a single out > > Hello Jason, > > As answering your question: > > " If you want to do this within this code I guess the question is what format you want the data in - a BLAST report or something more like a table?" > > A concatenation of BLAST (default format) reports should be OK, since I have a script to parse that kind of results. Anyway formats 1 or 2 will also do the trick. > I'll be happy to get assistance on how to change the OUTFILE from "a query a report" to "all queries in the same report", because I don't seem to be able to do it myself after reading the module documentation. > > Thanks in advance > > El 14 de noviembre de 2011 12:59, Jason Stajich escribi?: > if you want to do a bunch of BLASTs remotely on the cmdline you should also just use the NCBI's blastcl3 tool (not sure if there is a BLAST+ equivalent). This might be faster to do and easier since you need to learn the programming part too. > > If you want to do this within this code I guess the question is what format you want the data in - a BLAST report or something more like a table? > > On Nov 14, 2011, at 6:14 AM, Jos? 
Luis Lav?n wrote: > > > Hello everybody, > > > > I've been using "Bio::Tools::Run::RemoteBlast" for a time and it has > > worked fine for me. Now I need to perform a multiple BLAST search, but this > > time I'd just like to get all the BLAST results in a single out file > > instead of having each sequence's report written individually. I've read > > the documentation of the module, but due to my short > > experience/understanding on complex modules as this one seems to be I can't > > figure out where to change the script to achieve my previously mentioned > > aim. > > Here I post the script I've been using (it's basically the one posted on > > the module cookbook). > > > > #!/c:/Perl -w > > use Bio::Tools::Run::RemoteBlast; > > use Bio::SearchIO; > > use Data::Dumper; > > > > #Here i set the parameters for blast > > print "Enter your BLAST choice (blastn, blastp, blastx, tblastn, > > tblastx):\n"; > > my $blst = ; > > my $prog = "$blst"; > > print "Enter a database to search (nr, refseq_protein, swissprot, pat, pdb, > > env_nr):\n"; > > my $dtb = ; > > $db = "$dtb"; > > print "Enter your cutt off score (1e-n):\n"; > > my $cut = ; > > my $e_val = "$cut"; > > > > my @params = ( '-prog' => $prog, > > '-data' => $db, > > '-expect' => $e_val, > > '-readmethod' => 'SearchIO' ); > > > > my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params); > > > > > > #Select the file and make the blast. > > print "Enter your FASTA file:\n"; > > chomp(my $infile = ); > > my $r = $remoteBlast->submit_blast($infile); > > my $v = 1; > > > > print STDERR "waiting..." if( $v > 0 ); ######## WAIT FOR THE RESULTS > > TO RETURN!!!!! > > while ( my @rids = $remoteBlast->each_rid ) { > > foreach my $rid ( @rids ) { > > my $rc = $remoteBlast->retrieve_blast($rid); > > if( !ref($rc) ) { > > if( $rc < 0 ) { > > $remoteBlast->remove_rid($rid); > > } > > print STDERR "." 
if ( $v > 0 ); > > sleep 5; > > } else { > > my $result = $rc->next_result(); > > #save the output > > my $filename = > > $result->query_name()."\.out";##################open SALIDA, > > '>>'."$^T"."Report"."\.out"; > > $remoteBlast->save_output($filename);############# > > $remoteBlast->remove_rid($rid); > > print "\nQuery Name: ", $result->query_name(), "\n"; > > while ( my $hit = $result->next_hit ) { > > next unless ( $v > 0); > > print "\thit name is ", $hit->name, "\n"; > > while( my $hsp = $hit->next_hsp ) { > > print "\t\tscore is ", $hsp->score, "\n"; > > } > > } > > } > > } > > } > > > > > > May any of you please explain me how to solve my question? > > > > Thanks in advence > > > > With best wishes > > > > -- > > -- > > Dr. Jos? Luis Lav?n Trueba > > > > Dpto. de Producci?n Agraria > > Grupo de Gen?tica y Microbiolog?a > > Universidad P?blica de Navarra > > 31006 Pamplona > > Navarra > > SPAIN > > > > > > > > -- > > -- > > Dr. Jos? Luis Lav?n Trueba > > > > Dpto. de Producci?n Agraria > > Grupo de Gen?tica y Microbiolog?a > > Universidad P?blica de Navarra > > 31006 Pamplona > > Navarra > > SPAIN > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > -- > Dr. Jos? Luis Lav?n Trueba > > Dpto. 
de Producci?n Agraria > Grupo de Gen?tica y Microbiolog?a > Universidad P?blica de Navarra > 31006 Pamplona > Navarra > SPAIN From cl134 at duke.edu Sun Nov 13 09:42:05 2011 From: cl134 at duke.edu (Cheng-Ruei Lee) Date: Sun, 13 Nov 2011 09:42:05 -0500 Subject: [Bioperl-l] Suggestion for Bio::PopGen::Statistics Message-ID: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu> Hi all, Bioperl version: 1.006 Here are two error messages when I'm using this module to calculate Fu & Li's statistics: Illegal division by zero at (the Statistics.pm file) line 359 Illegal division by zero at (the Statistics.pm file) line 376 A further tracking down shows that the first error happens when $n (sample size in the ingroup) equals 1 or 2, and the second error happens when $n equals 3. This is not really a bug though. I would suggest either in the original code, do a checking before the calculation (and skip the current calculation when $n == 1, 2, or 3 - rather than let the whole program die), or add a few lines of notes in the CPAN page. Sincerely, Cheng-Ruei Lee From joluito at gmail.com Mon Nov 14 04:21:31 2011 From: joluito at gmail.com (=?ISO-8859-1?Q?Jos=E9_Luis_Lav=EDn?=) Date: Mon, 14 Nov 2011 10:21:31 +0100 Subject: [Bioperl-l] How to get Remote BLAST results in a single out Message-ID: Hello everybody, I've been using "Bio::Tools::Run::RemoteBlast" for a time and it has worked fine for me. Now I need to perform a multiple BLAST search, but this time I'd just like to get all the BLAST results in a single out file instead of having each sequence's report written individually. I've read the documentation of the module, but due to my short experience/understanding on complex modules as this one seems to be I can't figure out where to change the script to achieve my previously mentioned aim. Here I post the script I've been using (it's basically the one posted on the module cookbook). 
#!/c:/Perl -w
use strict;
use Bio::Tools::Run::RemoteBlast;
use Bio::SearchIO;
use Data::Dumper;

# Here I set the parameters for BLAST.
# chomp removes the trailing newline that <STDIN> leaves on each answer.
print "Enter your BLAST choice (blastn, blastp, blastx, tblastn, tblastx):\n";
chomp(my $blst = <STDIN>);
my $prog = $blst;
print "Enter a database to search (nr, refseq_protein, swissprot, pat, pdb, env_nr):\n";
chomp(my $dtb = <STDIN>);
my $db = $dtb;
print "Enter your cutoff score (1e-n):\n";
chomp(my $cut = <STDIN>);
my $e_val = $cut;

my @params = ( '-prog'       => $prog,
               '-data'       => $db,
               '-expect'     => $e_val,
               '-readmethod' => 'SearchIO' );

my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params);

# Select the file and run the BLAST.
print "Enter your FASTA file:\n";
chomp(my $infile = <STDIN>);
my $r = $remoteBlast->submit_blast($infile);
my $v = 1;

print STDERR "waiting..." if ( $v > 0 );
# Wait for the results to return.
while ( my @rids = $remoteBlast->each_rid ) {
    foreach my $rid ( @rids ) {
        my $rc = $remoteBlast->retrieve_blast($rid);
        if ( !ref($rc) ) {
            if ( $rc < 0 ) {
                $remoteBlast->remove_rid($rid);
            }
            print STDERR "." if ( $v > 0 );
            sleep 5;
        } else {
            my $result = $rc->next_result();
            # Save the output, one file per query.
            my $filename = $result->query_name() . ".out";
            # open SALIDA, '>>' . "$^T" . "Report" . ".out";
            $remoteBlast->save_output($filename);
            $remoteBlast->remove_rid($rid);
            print "\nQuery Name: ", $result->query_name(), "\n";
            while ( my $hit = $result->next_hit ) {
                next unless ( $v > 0 );
                print "\thit name is ", $hit->name, "\n";
                while ( my $hsp = $hit->next_hsp ) {
                    print "\t\tscore is ", $hsp->score, "\n";
                }
            }
        }
    }
}

Could any of you please explain how I can solve this?

Thanks in advance

With best wishes

--
Dr. José Luis Lavín Trueba
Dpto.
de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN

From cjfields at illinois.edu Mon Nov 14 12:02:22 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 14 Nov 2011 17:02:22 +0000
Subject: [Bioperl-l] How to get Remote BLAST results in a single out
In-Reply-To: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com>
References: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com>
Message-ID: <5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>

Re: a BLAST+ equivalent for blastcl3, I believe there is an option for the various 'blast*' programs indicating that the search should use a remote database. I haven't used it, though...

chris

From cjfields at illinois.edu Mon Nov 14 12:03:04 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 14 Nov 2011 17:03:04 +0000
Subject: [Bioperl-l] Suggestion for Bio::PopGen::Statistics
In-Reply-To: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu>
References: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu>
Message-ID:

Cheng,

Have you tried the latest CPAN release (we're at 1.006901)?

chris

On Nov 13, 2011, at 8:42 AM, Cheng-Ruei Lee wrote:

> Hi all,
>
> Bioperl version: 1.006
> Here are two error messages when I'm using this module to calculate Fu & Li's statistics:
> Illegal division by zero at (the Statistics.pm file) line 359
> Illegal division by zero at (the Statistics.pm file) line 376
> A further tracking down shows that the first error happens when $n (sample size in the ingroup) equals 1 or 2, and the second error happens when $n equals 3. This is not really a bug though. I would suggest either in the original code, do a checking before the calculation (and skip the current calculation when $n == 1, 2, or 3 - rather than let the whole program die), or add a few lines of notes in the CPAN page.
>
> Sincerely,
> Cheng-Ruei Lee

From cjfields at illinois.edu Mon Nov 14 12:59:35 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 14 Nov 2011 17:59:35 +0000
Subject: [Bioperl-l] OBDA redux?
In-Reply-To:
References:
Message-ID: <12E3B71D-6E61-41AD-A956-A1FC2076AF24@illinois.edu>

On Nov 13, 2011, at 6:24 AM, Peter Cock wrote:

> So, Chris and I seem in general agreement that an OBDA v2
> using SQLite but based on essentially the same approach as
> the BDB or flat file based OBDA v1 is a good idea. i.e. Tables
> mapping record identifiers to file offsets in the original sequence
> files.

The worry I have is tying ourselves to a specific backend (e.g. SQLite). I say this because BDB in its time was the go-to way of storing simple index data, but it is no longer feasible for very large data sets. Who's to say something similar won't happen to SQLite, or that it is the best option available? Maybe we should focus on the data storage schema, as simple as it may be, then state that the default backend must be SQLite but that others are allowed (maybe with a mention that SQLite can be replaced by alternatives in the future if needed).

> I hope to get BioRuby on board, they already have an OBDA
> v1 support so that shouldn't be too hard.
>
> Right now I don't recall if BioJava has/had OBDA v1 support,
> and if they did if it was affected in their recent move to BioJava
> v3 (I understand from their mailing list that some older lower
> priority functionality has not all been ported yet).

I wouldn't be surprised at that. OBDA kind of lingered for a while, and I'm not sure how widely adopted it became (maybe others can shed light on that?).

> Also EMBOSS are likely to be interested, certainly Peter Rice
> was interested in the SQLite indexing we're already using in
> Biopython for sequence files (i.e. what is effectively the
> prototype for OBDA v2).
>
> Note that in addition to simple indexing of text files, we are
> already using the same simple offset + length approach for
> indexing binary files (e.g. SFF).

I think that's the general idea; that is how all BioPerl data was indexed, previously with the Bio::Index modules and with the OBDA implementations as well.
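The "record identifier to file offset + length" mapping discussed here can be sketched in a few lines of Perl. This is only an illustration under stated assumptions, not the OBDA specification: the index lives in a plain in-memory hash rather than in SQLite or BDB, only FASTA input is handled, and the sub names `index_fasta` and `fetch_record` are made up for this example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build an index of a FASTA file: record ID => [byte offset, byte length].
# (Hypothetical helper; a persistent index would store these rows in a table.)
sub index_fasta {
    my ($path) = @_;
    my %index;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode($fh);
    my ($id, $start);
    while (1) {
        my $pos  = tell($fh);    # offset of the line about to be read
        my $line = <$fh>;
        my $new_id;
        $new_id = $1 if defined $line && $line =~ /^>(\S+)/;
        if (!defined $line || defined $new_id) {
            # A new header (or end of file) closes the previous record.
            $index{$id} = [ $start, $pos - $start ] if defined $id;
            last unless defined $line;
            ($id, $start) = ($new_id, $pos);
        }
    }
    close($fh);
    return \%index;
}

# Fetch one raw record using only its stored offset and length.
sub fetch_record {
    my ($path, $index, $id) = @_;
    my $entry = $index->{$id} or return;
    my ($offset, $length) = @$entry;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode($fh);
    seek($fh, $offset, 0) or die "Cannot seek in $path: $!";
    my $got = read($fh, my $record, $length);
    die "Short read for $id" unless defined $got && $got == $length;
    close($fh);
    return $record;
}

# Small demonstration on a temporary two-record file.
my ($demo_fh, $demo_path) = tempfile(UNLINK => 1);
print {$demo_fh} ">seq1 first\nACGT\nACGT\n>seq2 second\nGGCC\n";
close($demo_fh);

my $demo_index = index_fasta($demo_path);
print fetch_record($demo_path, $demo_index, 'seq2');
```

Because a lookup needs only the offset and length, the same two columns would work for any backend and for binary formats too, which is the point made above about keeping the schema independent of the storage engine.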
> On the immediate practical side, I think I can edit the
> current OBDA website of http://obda.open-bio.org/
> via /home/websites/obda.open-bio.org/html on the
> server.

See below with regard to my thoughts on the wiki.

> We need to work out where the current OBDA indexing
> specification lives (CVS or SVN?) and perhaps move
> that to github. We may need a general OBF organisation
> account on git hub for this and any other cross-project
> repositories.

+1 to a move to GitHub, but maybe this belongs in an OBF-specific organization. And maybe we should take advantage of the simple wiki or project homepage that GitHub offers and move everything (docs) there.

> I see there is already an OBDA project on RedMine,
> (Chris can you add me to that please?)
> https://redmine.open-bio.org/projects/obda
>
> Peter

Done (last night actually, but I didn't have time to respond immediately).

chris

From David.Messina at sbc.su.se Mon Nov 14 14:31:18 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 14 Nov 2011 20:31:18 +0100
Subject: [Bioperl-l] How to get Remote BLAST results in a single out
In-Reply-To: <5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>
References: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com> <5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>
Message-ID: <29C56604-BBEE-4D80-9662-7C3627907200@sbc.su.se>

> Re: a BLAST+ equivalent for blastcl3, I believe there is an option for the various 'blast*' indicating the search is to use a remote database. I haven't used it, though...

Yes, it's the -remote option. I've used it, and it works great. NCBI throttles the speed, however, so for an appreciable number of queries the standard advice applies: run the search on your own computers.
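As a rough sketch of the remote route with BLAST+, the Perl below only assembles the command line; it does not run anything. It assumes the BLAST+ programs are installed, uses the single-dash flags `-db`, `-query`, `-out`, `-evalue` and `-remote` as I recall them (please check them against `blastp -help` on your installation), and the sub name, argument keys and file names are invented for the example. With a multi-sequence FASTA file as the query, all reports should end up in the one `-out` file.

```perl
use strict;
use warnings;

# Assemble a BLAST+ command line that searches at NCBI (-remote) and writes
# every query's report into one output file (-out).
# The sub name and argument keys are hypothetical, chosen for this sketch.
sub remote_blast_command {
    my (%arg) = @_;
    for my $key (qw(program db query out)) {
        die "Missing required argument '$key'" unless defined $arg{$key};
    }
    my @cmd = (
        $arg{program},
        '-db',    $arg{db},
        '-query', $arg{query},    # may hold many sequences
        '-out',   $arg{out},
        '-remote',
    );
    push @cmd, '-evalue', $arg{evalue} if defined $arg{evalue};
    return @cmd;
}

my @cmd = remote_blast_command(
    program => 'blastp',
    db      => 'nr',
    query   => 'queries.fasta',      # placeholder file name
    out     => 'all_reports.out',    # placeholder file name
    evalue  => '1e-10',
);
print join(' ', @cmd), "\n";

# To actually run the search (needs BLAST+ installed and network access):
# system(@cmd) == 0 or die "BLAST+ run failed: $?";
```

Passing the command to `system` as a list avoids shell quoting problems with file names.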
Dave

From jluis.lavin at unavarra.es Mon Nov 14 16:23:31 2011
From: jluis.lavin at unavarra.es (José Luis Lavín)
Date: Mon, 14 Nov 2011 22:23:31 +0100
Subject: [Bioperl-l] How to get Remote BLAST results in a single out
In-Reply-To: <5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>
References: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com> <5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>
Message-ID:

Thank you very much for your answers, but judging from them I'm afraid I didn't explain myself well enough. I'm not looking for another tool to perform a BLAST task. I was just wondering whether there is a simple way to change how the module writes its output, so that I get multiple searches in a single report file instead of one report per BLAST search.

Maybe there is some issue I'm not aware of that makes you recommend other tools instead of the BioPerl Remote BLAST module. If so, I would appreciate it if you let me know about it (NCBI server problems with web services or the like)...

Thank you for your answers anyway.

Best wishes

--
Dr. José Luis Lavín Trueba
Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN

From jason.stajich at gmail.com Mon Nov 14 22:53:19 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Mon, 14 Nov 2011 22:53:19 -0500
Subject: [Bioperl-l] Suggestion for Bio::PopGen::Statistics
In-Reply-To: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu>
References: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu>
Message-ID: <0A6DF9A2-F34F-4277-8E84-C3E5351BB3FF@gmail.com>

Sure -- as you say, the implementation presumed that more than 3 individuals would be passed to this method, which is a shortcoming. I have committed the code fix but still need someone to add a comment to the perldoc. I've made it a Redmine bug.

https://redmine.open-bio.org/issues/3313

Jason

Can you provide a test script so we can add a test for this?

On Nov 13, 2011, at 9:42 AM, Cheng-Ruei Lee wrote:

> Hi all,
>
> Bioperl version: 1.006
> Here are two error messages when I'm using this module to calculate Fu & Li's statistics:
> Illegal division by zero at (the Statistics.pm file) line 359
> Illegal division by zero at (the Statistics.pm file) line 376
> A further tracking down shows that the first error happens when $n (sample size in the ingroup) equals 1 or 2, and the second error happens when $n equals 3. This is not really a bug though.
I would suggest either in the original code, do a checking before the calculation (and skip the current calculation when $n == 1, 2, or 3 - rather than let the whole program die), or add a few lines of notes in the CPAN page.
>
> Sincerely,
> Cheng-Ruei Lee

From cchehoud at gmail.com Mon Nov 14 20:39:32 2011
From: cchehoud at gmail.com (Christel Chehoud)
Date: Mon, 14 Nov 2011 17:39:32 -0800
Subject: [Bioperl-l] Bioperl installation help
Message-ID:

Dear BioPerl,

Thank you for creating such useful code. Unfortunately, every time I try to install BioPerl, it takes me a long time and is a challenging ordeal :(

I am a new Mac user and was not able to install BioPerl using CPAN. Here is the error I am getting:

ERROR: Can't create '/usr/local/bin'
Do not have write permissions on '/usr/local/bin'
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 at /Users/Christel/.cpan/build/BioPerl-1.6.0-DiZ0rN/Bio/Root/Build.pm line 902
  CJFIELDS/BioPerl-1.6.0.tar.gz
  ./Build install -- NOT OK
----
  You may have to su to root to install the package
  (Or you may want to run something like
    o conf make_install_make_command 'sudo make'
  to raise your permissions.
Warning (usually harmless): 'YAML' not installed, will not store persistent state
Failed during this command:
 CMUNGALL/Data-Stag-0.11.tar.gz : make NO
 CJFIELDS/BioPerl-1.6.0.tar.gz : make_test FAILED but failure ignored because 'force' in effect

So I did:

cpan> o conf make_install_make_command 'sudo make'

followed by

cpan> o conf commit

and started over. I got the same number of errors as last time (so I decided not to force install this time). Do you have any suggestions?

63 tests and 305 subtests skipped.
Failed 11/329 test scripts. 981/17708 subtests failed.
Files=329, Tests=17708, 127 wallclock secs (104.69 cusr + 12.51 csys = 117.20 CPU) Failed 11/329 test programs. 981/17708 subtests failed. CJFIELDS/BioPerl-1.6.1.tar.gz ./Build test -- NOT OK //hint// to see the cpan-testers results for installing this module, try: reports CJFIELDS/BioPerl-1.6.1.tar.gz Warning (usually harmless): 'YAML' not installed, will not store persistent state Running Build install make test had returned bad status, won't install without force Failed during this command: CMUNGALL/Data-Stag-0.11.tar.gz : make NO FLORA/ExtUtils-Manifest-1.60.tar.gz : make NO CJFIELDS/BioPerl-1.6.1.tar.gz : make_test NO Thanks a lot for your time and help. I appreciate it. Thank you, Christel From casaburi at ceinge.unina.it Tue Nov 15 04:25:25 2011 From: casaburi at ceinge.unina.it (Giorgio C) Date: Tue, 15 Nov 2011 01:25:25 -0800 (PST) Subject: [Bioperl-l] Blast > parsing result in Exel Message-ID: <32846407.post@talk.nabble.com> Hy everybody, in this situation froma blast (-m 1) result file : Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402. Query= 132-291 (59 letters) Database: Scrivania/orchidea/mature_mirBase.fa 21,643 sequences; 470,608 total letters Searching..................................................done Score E Sequences producing significant alignments: (bits) Value mtr-miR2644b MIMAT0013413 Medicago truncatula miR2644b 28 0.031 mtr-miR2644a MIMAT0013412 Medicago truncatula miR2644a 28 0.031 gga-miR-1704 MIMAT0007596 Gallus gallus miR-1704 22 1.9 gga-miR-1557 MIMAT0007414 Gallus gallus miR-1557 22 1.9 mmu-miR-880-5p MIMAT0017266 Mus musculus miR-880-5p 22 1.9 132_0 8 cagccgctcagattgatggtgcctacagccttgccagcccgctcagattgat 59 12631 5 .............. 18 12630 5 .............. 18 7826 5 ........... 15 7644 19 ........... 9 5394 3 ........... 
13 5394 3 ........... 13 BLASTN 2.2.21 [Jun-14-2009] Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, ... .... .......... ______________________________________________________________ I need to parse in an exel sheet : 1)ID 2)Name of the hit 3)E-value 4)Score 5)Species 1) 132-291 2)mir2644b 3) 0,031 4)28 5) Medicago truncatula Is possible from a big blast result file obtain an exel with 5 columns where every field is the first hit of the blast result. Can anyone halp me to fix this problem ??? Also with a little script in perl. Thank you very much -- View this message in context: http://old.nabble.com/Blast-%3E-parsing-result-in-Exel-tp32846407p32846407.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From nisa.dar10 at gmail.com Tue Nov 15 19:49:00 2011 From: nisa.dar10 at gmail.com (nisa.dar) Date: Tue, 15 Nov 2011 16:49:00 -0800 (PST) Subject: [Bioperl-l] print alignment from blast results file Message-ID: <32851673.post@talk.nabble.com> Hi, I am parsing a blast results file. I have found bioperl modules to get query string, homology string and hit string for each hit/hsp. I want to print them in the form of an alignment instead of aligning them individually. this is what I am doing, but it doesn't seem correct while (my $hsp = $hit->next_hsp) { my $start_query_num=$hsp->start('query'); my $query_string=$hsp->query_string; my $end_query_num=$hsp->end('query'); my $homology_string=$hsp->homology_string; my $start_hit_num=$hsp->start('hit'); my $hit_string=$hsp->hit_string; my $end_hit_num=$hsp->end('hit'); my $aln_o = $hsp->get_aln; $query_string=~s/\n//g;#get rid of new line characters $homology_string=~s/\n//g; $hit_string=~s/\n//g; print "

Alignment:


"; print "$start_query_num-$query_string-$end_query_num
"; print "         $homology_string
"; print "$start_hit_num-$hit_string-$end_hit_num

"; } Please let me know how can I print them in the form of an alignment as seen in the blast results file. Thanks -- View this message in context: http://old.nabble.com/print-alignment-from-blast-results-file-tp32851673p32851673.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From p.j.a.cock at googlemail.com Wed Nov 16 04:11:40 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 16 Nov 2011 09:11:40 +0000 Subject: [Bioperl-l] Blast > parsing result in Exel In-Reply-To: <32846407.post@talk.nabble.com> References: <32846407.post@talk.nabble.com> Message-ID: On Tue, Nov 15, 2011 at 9:25 AM, Giorgio C wrote: > > Hy everybody, > > in this situation froma blast (-m 1) result file : > > ... > > I need to parse in an exel sheet : > > 1)ID 2)Name of the hit 3)E-value 4)Score 5)Species > > > 1) 132-291 2)mir2644b 3) 0,031 4)28 5) Medicago truncatula > > Is possible from a big blast result file obtain an exel with 5 columns where > every field is the first hit of the blast result. Can anyone halp me to fix > this problem ??? Also with a little script in perl. > > Thank you very much Have you looked at any of the BioPerl BLAST parsing examples? e.g http://www.bioperl.org/wiki/HOWTO:Beginners#BLAST http://www.bioperl.org/wiki/HOWTO:SearchIO http://www.bioperl.org/wiki/Module:Bio::SearchIO See also http://seqanswers.com/forums/showthread.php?t=15489 Peter From bosborne11 at verizon.net Wed Nov 16 08:19:33 2011 From: bosborne11 at verizon.net (Brian Osborne) Date: Wed, 16 Nov 2011 08:19:33 -0500 Subject: [Bioperl-l] print alignment from blast results file In-Reply-To: <32851673.post@talk.nabble.com> References: <32851673.post@talk.nabble.com> Message-ID: <035404FE-A7D9-413A-B6EF-48564FE76426@verizon.net> Nisa, See: http://www.bioperl.org/wiki/HOWTO:SearchIO Brian O. On Nov 15, 2011, at 7:49 PM, nisa.dar wrote: > > Hi, > > I am parsing a blast results file. 
I have found bioperl modules to get query > string, homology string and hit string for each hit/hsp. I want to print > them in the form of an alignment instead of aligning them individually. > > this is what I am doing, but it doesn't seem correct > > while (my $hsp = $hit->next_hsp) { > my > $start_query_num=$hsp->start('query'); > my $query_string=$hsp->query_string; > my $end_query_num=$hsp->end('query'); > my $homology_string=$hsp->homology_string; > my $start_hit_num=$hsp->start('hit'); > my $hit_string=$hsp->hit_string; > my $end_hit_num=$hsp->end('hit'); > my $aln_o = $hsp->get_aln; > $query_string=~s/\n//g;#get rid of new line characters > $homology_string=~s/\n//g; > $hit_string=~s/\n//g; > > print "

Alignment:


"; > print "$start_query_num-$query_string-$end_query_num
"; > print " >         $homology_string
"; > print "$start_hit_num-$hit_string-$end_hit_num

"; > > > > } > > Please let me know how can I print them in the form of an alignment as seen > in the blast results file. > > Thanks > > > -- > View this message in context: http://old.nabble.com/print-alignment-from-blast-results-file-tp32851673p32851673.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Nov 16 11:44:27 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 16 Nov 2011 16:44:27 +0000 Subject: [Bioperl-l] Bioperl installation help In-Reply-To: References: Message-ID: <72481F31-3ADB-4E3D-9DBC-714FBEC730E4@illinois.edu> For some reason you are trying to install an older version of BioPerl; try installing Bio::Perl (or one of the core modules). This should automatically install the latest version from CPAN. My guess is this will address some of the issues. However, w/o actually seeing what tests failed we can't help. Also, if you are only interested in running local jobs, install BioPerl locally, or just grab the dist and add it to PERL5LIB. There are instructions in the installation docs for that. You can also use cpanm (cpanminus) to install things locally as well, it's highly recommended and much easier than cpan. chris On Nov 14, 2011, at 7:39 PM, Christel Chehoud wrote: > Dear BioPerl, > Thank you for creating such useful code. Unfortunately, every time I > try to install Bioperl, it takes me a long time and is a challenging > ordeal :( I am a new MAC user and was not able to download bioperl > using CPAN. Here is the error I am getting: > > ERROR: Can't create '/usr/local/bin' > Do not have write permissions on '/usr/local/bin' > !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
> at /Users/Christel/.cpan/build/BioPerl-1.6.0-DiZ0rN/Bio/Root/Build.pm line 902 > CJFIELDS/BioPerl-1.6.0.tar.gz > ./Build install -- NOT OK > ---- > You may have to su to root to install the package > (Or you may want to run something like > o conf make_install_make_command 'sudo make' > to raise your permissions.Warning (usually harmless): 'YAML' not > installed, will not store persistent state > Failed during this command: > CMUNGALL/Data-Stag-0.11.tar.gz : make NO > CJFIELDS/BioPerl-1.6.0.tar.gz : make_test FAILED but > failure ignored because 'force' in effect > > > so I did: > cpan> o conf make_install_make_command 'sudo make' > followed by > cpan> o conf commit > > and started over..I got the same number of errors as last time (so I > decided not to force install this time). do you have any suggestions: > > 63 tests and 305 subtests skipped. > Failed 11/329 test scripts. 981/17708 subtests failed. > Files=329, Tests=17708, 127 wallclock secs (104.69 cusr + 12.51 csys = > 117.20 CPU) > Failed 11/329 test programs. 981/17708 subtests failed. > CJFIELDS/BioPerl-1.6.1.tar.gz > ./Build test -- NOT OK > //hint// to see the cpan-testers results for installing this module, try: > reports CJFIELDS/BioPerl-1.6.1.tar.gz > Warning (usually harmless): 'YAML' not installed, will not store > persistent state > Running Build install > make test had returned bad status, won't install without force > Failed during this command: > CMUNGALL/Data-Stag-0.11.tar.gz : make NO > FLORA/ExtUtils-Manifest-1.60.tar.gz : make NO > CJFIELDS/BioPerl-1.6.1.tar.gz : make_test NO > > > Thanks a lot for your time and help. I appreciate it. 
> > Thank you, > Christel > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Nov 16 11:46:16 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 16 Nov 2011 16:46:16 +0000 Subject: [Bioperl-l] print alignment from blast results file In-Reply-To: <035404FE-A7D9-413A-B6EF-48564FE76426@verizon.net> References: <32851673.post@talk.nabble.com> <035404FE-A7D9-413A-B6EF-48564FE76426@verizon.net> Message-ID: small hint: you can get a Bio::AlignI from the HSP (which can be redirected to a Bio::AlignIO instance). chris On Nov 16, 2011, at 7:19 AM, Brian Osborne wrote: > Nisa, > > See: > > http://www.bioperl.org/wiki/HOWTO:SearchIO > > Brian O. > > > On Nov 15, 2011, at 7:49 PM, nisa.dar wrote: > >> >> Hi, >> >> I am parsing a blast results file. I have found bioperl modules to get query >> string, homology string and hit string for each hit/hsp. I want to print >> them in the form of an alignment instead of aligning them individually. >> >> this is what I am doing, but it doesn't seem correct >> >> while (my $hsp = $hit->next_hsp) { >> my >> $start_query_num=$hsp->start('query'); >> my $query_string=$hsp->query_string; >> my $end_query_num=$hsp->end('query'); >> my $homology_string=$hsp->homology_string; >> my $start_hit_num=$hsp->start('hit'); >> my $hit_string=$hsp->hit_string; >> my $end_hit_num=$hsp->end('hit'); >> my $aln_o = $hsp->get_aln; >> $query_string=~s/\n//g;#get rid of new line characters >> $homology_string=~s/\n//g; >> $hit_string=~s/\n//g; >> >> print "

Alignment:


"; >> print "$start_query_num-$query_string-$end_query_num
"; >> print " >>         $homology_string
"; >> print "$start_hit_num-$hit_string-$end_hit_num

"; >> >> >> >> } >> >> Please let me know how can I print them in the form of an alignment as seen >> in the blast results file. >> >> Thanks >> >> >> -- >> View this message in context: http://old.nabble.com/print-alignment-from-blast-results-file-tp32851673p32851673.html >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Wed Nov 16 12:01:49 2011 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 16 Nov 2011 18:01:49 +0100 Subject: [Bioperl-l] Bioperl installation help In-Reply-To: References: Message-ID: Hi Christel, Sorry to hear you're having trouble with the installation. It looks like these modules aren't getting installed and are causing the failed tests: CMUNGALL/Data-Stag-0.11.tar.gz : make NO FLORA/ExtUtils-Manifest-1.60.tar.gz : make NO I would try installing those separately via CPAN first and then trying again to install BioPerl. Also, it was a good idea to set the make_install_make_command option to CPAN, and that should have worked. Unfortunately, there's another installation system called Module::Build that has its own option which may need to be set: cpan> o conf mbuild_install_build_command 'sudo ./Build' That being said, I would suggest you grab the latest version of BioPerl from github instead of using v1.6.1 from CPAN, which is fairly out of date at this point. And unless you're planning to use BioPerl with GBrowse or Bio::Graphics, there's another, simpler way to get BioPerl up and running (assuming you have all the prerequisites like Data::Stag installed): See "Don't want to install BioPerl?" 
here: http://www.seqxml.org/xml/BioPerl.html Best, Dave On Tue, Nov 15, 2011 at 02:39, Christel Chehoud wrote: > Dear BioPerl, > Thank you for creating such useful code. Unfortunately, every time I > try to install Bioperl, it takes me a long time and is a challenging > ordeal :( I am a new MAC user and was not able to download bioperl > using CPAN. Here is the error I am getting: > > ERROR: Can't create '/usr/local/bin' > Do not have write permissions on '/usr/local/bin' > !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > at /Users/Christel/.cpan/build/BioPerl-1.6.0-DiZ0rN/Bio/Root/Build.pm > line 902 > CJFIELDS/BioPerl-1.6.0.tar.gz > ./Build install -- NOT OK > ---- > You may have to su to root to install the package > (Or you may want to run something like > o conf make_install_make_command 'sudo make' > to raise your permissions.Warning (usually harmless): 'YAML' not > installed, will not store persistent state > Failed during this command: > CMUNGALL/Data-Stag-0.11.tar.gz : make NO > CJFIELDS/BioPerl-1.6.0.tar.gz : make_test FAILED but > failure ignored because 'force' in effect > > > so I did: > cpan> o conf make_install_make_command 'sudo make' > followed by > cpan> o conf commit > > and started over..I got the same number of errors as last time (so I > decided not to force install this time). do you have any suggestions: > > 63 tests and 305 subtests skipped. > Failed 11/329 test scripts. 981/17708 subtests failed. > Files=329, Tests=17708, 127 wallclock secs (104.69 cusr + 12.51 csys = > 117.20 CPU) > Failed 11/329 test programs. 981/17708 subtests failed. 
> CJFIELDS/BioPerl-1.6.1.tar.gz
> ./Build test -- NOT OK
> //hint// to see the cpan-testers results for installing this module, try:
> reports CJFIELDS/BioPerl-1.6.1.tar.gz
> Warning (usually harmless): 'YAML' not installed, will not store
> persistent state
> Running Build install
> make test had returned bad status, won't install without force
> Failed during this command:
> CMUNGALL/Data-Stag-0.11.tar.gz : make NO
> FLORA/ExtUtils-Manifest-1.60.tar.gz : make NO
> CJFIELDS/BioPerl-1.6.1.tar.gz : make_test NO
>
>
> Thanks a lot for your time and help. I appreciate it.
>
> Thank you,
> Christel
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From jluis.lavin at unavarra.es Wed Nov 16 13:31:46 2011
From: jluis.lavin at unavarra.es (José Luis Lavín)
Date: Wed, 16 Nov 2011 19:31:46 +0100
Subject: [Bioperl-l] How to get Remote BLAST results in a single out
In-Reply-To:
References: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com>
Message-ID:

Thank you for your answer Jason,

While answering you I figured out how to do it... sometimes you need other people's point of view to see the light.

As you pointed out: "what is complicated is the file name right now is based on the query name." That is what I expected to have an easy fix: the dependency between the outfile name and the query name, which is why I couldn't figure out how to change the name of the output. While reading the code to answer you, I came across the solution.
I persisted in doing it this way because I need to run BLAST remotely on my CGI; that's why I didn't pay attention to all the other options you suggested.

Thank you all for your suggestions anyway. ;)

Best wishes

JL

On 16 November 2011 18:03, Jason Stajich wrote:

> the answer to your question is to move the line that opens a file to
> outside the loop.
what is complicated is the file name right now is
> based on the query name. so you need to think how you want to name the
> file. Since this isn't obvious to you, then I think we are suggesting you
> probably need to understand programming more, and it might just be easier
> to use the tools as we have suggested rather than teaching you to modify
> what is just an example code. our suggestions are based on the way we'd
> solve the problem so maybe you have other reasons for the direction you
> want to take.
>
> I also think it is not efficient or logical to run
> remote blast through the web protocol simply to write it back out with
> bioperl since that has to parse it in and then write it out -- why not just
> run the program that generates the output directly from NCBI. Or run BLAST
> locally for likely more efficient running.
>
> Finally the bioperl writer may not 100% reproduce the blast output so if
> you are planning on further parsing the output that comes out from this
> script, it really doesn't seem like a good idea to launder it through
> bioperl parser first.
>
>
>
> 2011/11/14 José Luis Lavín
>
>> Thank you very much for your answers, but due to them, I'm afraid I didn't
>> explain myself well enough.
>>
>> I'm not looking for another tool to perform a BLAST task. I was just
>> wondering if there was a way to simply change the way the module writes the
>> outputs, so that I can get multiple searches in a single report file
>> instead of having a report for each BLAST search.
>>
>> Maybe there's some issue I am unaware of, that makes you recommend the use of
>> other tools instead of the Bioperl Remote BLAST module...it would be
>> appreciated if you let me know about that (NCBI server problems with
>> web-services or so)...
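[Editorial sketch, not part of the original messages.] The advice quoted above (open the output file once, outside the loop, and choose a name that does not depend on the query) might look roughly like the following. This is only an illustration in core Perl under stated assumptions: the hash %report_for, the helper write_combined and the file name combined_reports.out are all hypothetical, and the hard-coded strings stand in for whatever the retrieval loop of the real script would produce. It makes no claim about how Bio::Tools::Run::RemoteBlast itself writes files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-ins for per-query BLAST reports; in a real script
# these would come from the retrieval loop, not from a hard-coded hash.
my %report_for = (
    query_1 => "report text for query_1\n",
    query_2 => "report text for query_2\n",
);

# Write every report to one already-open filehandle, with a small
# separator line naming the query each report belongs to.
sub write_combined {
    my ($out_fh, $reports) = @_;
    for my $query (sort keys %$reports) {
        print {$out_fh} "### $query\n", $reports->{$query};
    }
    return;
}

# Open the output ONCE, before any looping; the file name is an
# assumption and no longer depends on a query name.
my $outfile = 'combined_reports.out';
open my $out_fh, '>', $outfile or die "Cannot open $outfile: $!";
write_combined($out_fh, \%report_for);
close $out_fh or die "Cannot close $outfile: $!";
```

The point of the design is simply that the filehandle outlives the loop, so each iteration adds to the same file instead of creating a new one per query.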
>> >> Thank you for your answers anyway >> >> Best wishes >> >> 2011/11/14 Fields, Christopher J >> >> > Re: a BLAST+ equivalent for blastcl3, I believe there is an option for >> the >> > various 'blast*' indicating the search is to use a remote database. I >> > haven't used it, though... >> > >> > chris >> > >> > On Nov 14, 2011, at 8:07 AM, Jason Stajich wrote: >> > >> > > Please keep this on list discussions >> > > >> > > Sent from my iPhone-please excuse typos >> > > >> > > -- >> > > Jason Stajich >> > > >> > > Begin forwarded message: >> > > >> > >> From: Jos? Luis Lav?n >> > >> Date: November 14, 2011 8:04:25 AM EST >> > >> To: Jason Stajich >> > >> Subject: Re: [Bioperl-l] Fwd: How to get Remote BLAST results in a >> > single out >> > >> >> > >> Hello Jason, >> > >> >> > >> As answering your question: >> > >> >> > >> " If you want to do this within this code I guess the question is >> what >> > format you want the data in - a BLAST report or something more like a >> > table?" >> > >> >> > >> A concatenation of BLAST (default format) reports should be OK, >> since I >> > have a script to parse that kind of results. Anyway formats 1 or 2 will >> > also do the trick. >> > >> I'll be happy to get assistance on how to change the OUTFILE from "a >> > query a report" to "all queries in the same report", because I don't >> seem >> > to be able to do it myself after reading the module documentation. >> > >> >> > >> Thanks in advance >> > >> >> > >> El 14 de noviembre de 2011 12:59, Jason Stajich < >> > jason.stajich at gmail.com> escribi?: >> > >> if you want to do a bunch of BLASTs remotely on the cmdline you >> should >> > also just use the NCBI's blastcl3 tool (not sure if there is a BLAST+ >> > equivalent). This might be faster to do and easier since you need to >> learn >> > the programming part too. 
>> > >> >> > >> If you want to do this within this code I guess the question is what >> > format you want the data in - a BLAST report or something more like a >> table? >> > >> >> > >> On Nov 14, 2011, at 6:14 AM, Jos? Luis Lav?n wrote: >> > >> >> > >>> Hello everybody, >> > >>> >> > >>> I've been using "Bio::Tools::Run::RemoteBlast" for a time and it >> has >> > >>> worked fine for me. Now I need to perform a multiple BLAST search, >> but >> > this >> > >>> time I'd just like to get all the BLAST results in a single out file >> > >>> instead of having each sequence's report written individually. I've >> > read >> > >>> the documentation of the module, but due to my short >> > >>> experience/understanding on complex modules as this one seems to be >> I >> > can't >> > >>> figure out where to change the script to achieve my previously >> > mentioned >> > >>> aim. >> > >>> Here I post the script I've been using (it's basically the one >> posted >> > on >> > >>> the module cookbook). >> > >>> >> > >>> #!/c:/Perl -w >> > >>> use Bio::Tools::Run::RemoteBlast; >> > >>> use Bio::SearchIO; >> > >>> use Data::Dumper; >> > >>> >> > >>> #Here i set the parameters for blast >> > >>> print "Enter your BLAST choice (blastn, blastp, blastx, tblastn, >> > >>> tblastx):\n"; >> > >>> my $blst = ; >> > >>> my $prog = "$blst"; >> > >>> print "Enter a database to search (nr, refseq_protein, swissprot, >> pat, >> > pdb, >> > >>> env_nr):\n"; >> > >>> my $dtb = ; >> > >>> $db = "$dtb"; >> > >>> print "Enter your cutt off score (1e-n):\n"; >> > >>> my $cut = ; >> > >>> my $e_val = "$cut"; >> > >>> >> > >>> my @params = ( '-prog' => $prog, >> > >>> '-data' => $db, >> > >>> '-expect' => $e_val, >> > >>> '-readmethod' => 'SearchIO' ); >> > >>> >> > >>> my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params); >> > >>> >> > >>> >> > >>> #Select the file and make the blast. 
>> > >>> print "Enter your FASTA file:\n"; >> > >>> chomp(my $infile = ); >> > >>> my $r = $remoteBlast->submit_blast($infile); >> > >>> my $v = 1; >> > >>> >> > >>> print STDERR "waiting..." if( $v > 0 ); ######## WAIT FOR THE >> > RESULTS >> > >>> TO RETURN!!!!! >> > >>> while ( my @rids = $remoteBlast->each_rid ) { >> > >>> foreach my $rid ( @rids ) { >> > >>> my $rc = $remoteBlast->retrieve_blast($rid); >> > >>> if( !ref($rc) ) { >> > >>> if( $rc < 0 ) { >> > >>> $remoteBlast->remove_rid($rid); >> > >>> } >> > >>> print STDERR "." if ( $v > 0 ); >> > >>> sleep 5; >> > >>> } else { >> > >>> my $result = $rc->next_result(); >> > >>> #save the output >> > >>> my $filename = >> > >>> $result->query_name()."\.out";##################open SALIDA, >> > >>> '>>'."$^T"."Report"."\.out"; >> > >>> $remoteBlast->save_output($filename);############# >> > >>> $remoteBlast->remove_rid($rid); >> > >>> print "\nQuery Name: ", $result->query_name(), "\n"; >> > >>> while ( my $hit = $result->next_hit ) { >> > >>> next unless ( $v > 0); >> > >>> print "\thit name is ", $hit->name, "\n"; >> > >>> while( my $hsp = $hit->next_hsp ) { >> > >>> print "\t\tscore is ", $hsp->score, "\n"; >> > >>> } >> > >>> } >> > >>> } >> > >>> } >> > >>> } >> > >>> >> > >>> >> > >>> May any of you please explain me how to solve my question? >> > >>> >> > >>> Thanks in advence >> > >>> >> > >>> With best wishes >> > >>> >> > >>> -- >> > >>> -- >> > >>> Dr. Jos? Luis Lav?n Trueba >> > >>> >> > >>> Dpto. de Producci?n Agraria >> > >>> Grupo de Gen?tica y Microbiolog?a >> > >>> Universidad P?blica de Navarra >> > >>> 31006 Pamplona >> > >>> Navarra >> > >>> SPAIN >> > >>> >> > >>> >> > >>> >> > >>> -- >> > >>> -- >> > >>> Dr. Jos? Luis Lav?n Trueba >> > >>> >> > >>> Dpto. 
de Producci?n Agraria >> > >>> Grupo de Gen?tica y Microbiolog?a >> > >>> Universidad P?blica de Navarra >> > >>> 31006 Pamplona >> > >>> Navarra >> > >>> SPAIN >> > >>> >> > >>> _______________________________________________ >> > >>> Bioperl-l mailing list >> > >>> Bioperl-l at lists.open-bio.org >> > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > >> >> > >> >> > >> _______________________________________________ >> > >> Bioperl-l mailing list >> > >> Bioperl-l at lists.open-bio.org >> > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > >> >> > >> >> > >> >> > >> -- >> > >> -- >> > >> Dr. Jos? Luis Lav?n Trueba >> > >> >> > >> Dpto. de Producci?n Agraria >> > >> Grupo de Gen?tica y Microbiolog?a >> > >> Universidad P?blica de Navarra >> > >> 31006 Pamplona >> > >> Navarra >> > >> SPAIN >> > > >> > > _______________________________________________ >> > > Bioperl-l mailing list >> > > Bioperl-l at lists.open-bio.org >> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > >> > >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > >> >> >> >> -- >> -- >> Dr. Jos? Luis Lav?n Trueba >> >> Dpto. de Producci?n Agraria >> Grupo de Gen?tica y Microbiolog?a >> Universidad P?blica de Navarra >> 31006 Pamplona >> Navarra >> SPAIN >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > -- -- Dr. Jos? Luis Lav?n Trueba Dpto. de Producci?n Agraria Grupo de Gen?tica y Microbiolog?a Universidad P?blica de Navarra 31006 Pamplona Navarra SPAIN From l.m.timmermans at students.uu.nl Fri Nov 18 09:15:47 2011 From: l.m.timmermans at students.uu.nl (L.M. 
Timmermans) Date: Fri, 18 Nov 2011 15:15:47 +0100 Subject: [Bioperl-l] Blast > parsing result in Exel In-Reply-To: <32846407.post@talk.nabble.com> References: <32846407.post@talk.nabble.com> Message-ID: On Tue, Nov 15, 2011 at 10:25 AM, Giorgio C wrote: > I need to parse in an exel sheet : > What you're saying here is nonsense. I think you meant to say you want to output Excel. > Is possible from a big blast result file obtain an exel with 5 columns > where > every field is the first hit of the blast result. Can anyone halp me to fix > this problem ??? Also with a little script in perl. > There are a number of Perl modules on CPAN for outputting Excel. Try Excel::Writer::XLSX and Spreadsheet::WriteExcel for example. Leon From tzhu at mail.bnu.edu.cn Mon Nov 21 00:17:18 2011 From: tzhu at mail.bnu.edu.cn (Tao Zhu) Date: Mon, 21 Nov 2011 13:17:18 +0800 Subject: [Bioperl-l] Is there a "combine" method that would combine several sequence alignments to a single alignment? Message-ID: <4EC9DEDE.6030901@mail.bnu.edu.cn> I can use the "slice" method to split a single sequence alignment into several subalignments. Then is there a corresponding "combine" method to combine such subalignments back? -- Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing 100875, China Email: tzhu at mail.bnu.edu.cn From David.Messina at sbc.su.se Mon Nov 21 04:58:51 2011 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 21 Nov 2011 10:58:51 +0100 Subject: [Bioperl-l] Is there a "combine" method that would combine several sequence alignments to a single alignment? In-Reply-To: <4EC9DEDE.6030901@mail.bnu.edu.cn> References: <4EC9DEDE.6030901@mail.bnu.edu.cn> Message-ID: Hi, No, I don't believe such a method exists. Could you describe what you are wanting to do? Perhaps there is another way to do it. Dave On Mon, Nov 21, 2011 at 06:17, Tao Zhu wrote: > I can use the "slice" method to split a single sequence alignment into > several subalignments. 
Then is there a corresponding "combine" method to > combine such subalignments back? > > -- > Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing > 100875, China > Email: tzhu at mail.bnu.edu.cn > > ______________________________**_________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/**mailman/listinfo/bioperl-l > From roy.chaudhuri at gmail.com Mon Nov 21 06:41:09 2011 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Mon, 21 Nov 2011 11:41:09 +0000 Subject: [Bioperl-l] Is there a "combine" method that would combine several sequence alignments to a single alignment? In-Reply-To: References: <4EC9DEDE.6030901@mail.bnu.edu.cn> Message-ID: <4ECA38D5.8050709@gmail.com> See the cat method in Bio::Align::Utilities: http://search.cpan.org/~cjfields/BioPerl-1.6.901/Bio/Align/Utilities.pm#cat On 21/11/2011 09:58, Dave Messina wrote: > Hi, > > No, I don't believe such a method exists. Could you describe what you are > wanting to do? Perhaps there is another way to do it. > > > Dave > > > > On Mon, Nov 21, 2011 at 06:17, Tao Zhu wrote: > >> I can use the "slice" method to split a single sequence alignment into >> several subalignments. Then is there a corresponding "combine" method to >> combine such subalignments back? 
>> >> -- >> Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing >> 100875, China >> Email: tzhu at mail.bnu.edu.cn >> >> ______________________________**_________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/**mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From zntayl at gmail.com Wed Nov 16 20:07:07 2011 From: zntayl at gmail.com (Nathan Taylor) Date: Wed, 16 Nov 2011 20:07:07 -0500 Subject: [Bioperl-l] seqIO.pm Message-ID: Hello, Can SeqIO.pm convert a file of fastq reads into .phd files. Or, barring that, a file of fastas and file of quals into .phd files? Many thanks, Nathan From gregonomic at yahoo.co.nz Mon Nov 21 07:00:50 2011 From: gregonomic at yahoo.co.nz (Gregory Baillie) Date: Mon, 21 Nov 2011 04:00:50 -0800 (PST) Subject: [Bioperl-l] Is there a "combine" method that would combine several sequence alignments to a single alignment? In-Reply-To: References: <4EC9DEDE.6030901@mail.bnu.edu.cn> Message-ID: <1321876850.71978.YahooMailNeo@web112508.mail.gq1.yahoo.com> Hi. I've attached a simple script (concatenate_alignments.pl) I wrote to concatenate alignments. It can be a bit of a memory hog if you have long alignments or large numbers of sequences; otherwise you should be OK. Usage: concatenate_alignments.pl -o <... input_alignment_n> If you want to insert a string between the concatenated sequences, you can use the -j option (eg. -j '---'). Greg. ________________________________ From: Dave Messina To: Tao Zhu Cc: BioPerl Sent: Monday, 21 November 2011 7:58 PM Subject: Re: [Bioperl-l] Is there a "combine" method that would combine several sequence alignments to a single alignment? Hi, No, I don't believe such a method exists. Could you describe what you are wanting to do? Perhaps there is another way to do it. 
Dave On Mon, Nov 21, 2011 at 06:17, Tao Zhu wrote: > I can use the "slice" method to split a single sequence alignment into > several subalignments. Then is there a corresponding "combine" method to > combine such subalignments back? > > -- > Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing > 100875, China > Email: tzhu at mail.bnu.edu.cn > > ______________________________**_________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/**mailman/listinfo/bioperl-l > _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l -------------- next part -------------- A non-text attachment was scrubbed... Name: concatenate_alignments.pl Type: application/octet-stream Size: 3349 bytes Desc: not available URL: From jason.stajich at gmail.com Mon Nov 21 10:31:50 2011 From: jason.stajich at gmail.com (Jason Stajich) Date: Mon, 21 Nov 2011 10:31:50 -0500 Subject: [Bioperl-l] Is there a "combine" method that would combine several sequence alignments to a single alignment? In-Reply-To: <1321876850.71978.YahooMailNeo@web112508.mail.gq1.yahoo.com> References: <4EC9DEDE.6030901@mail.bnu.edu.cn> <1321876850.71978.YahooMailNeo@web112508.mail.gq1.yahoo.com> Message-ID: <39ECA743-8C56-4B23-8813-40EEEAB7DBB1@gmail.com> greg -- looks good - you could simplify part of the code to use the .= operator and use AlignIO to write the seqs out. This is my script to combine a directory of MSA aligned .fasaln files into a single concatenated alignment. https://github.com/hyphaltip/genome-scripts/blob/master/phylogenetics/combine_fasaln.pl On Nov 21, 2011, at 7:00 AM, Gregory Baillie wrote: > Hi. > > I've attached a simple script (concatenate_alignments.pl) I wrote to concatenate alignments. > > It can be a bit of a memory hog if you have long alignments or large numbers of sequences; otherwise you should be OK. 
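[Editorial sketch, not part of the original messages.] The concatenation idea discussed in this thread, including the suggestion to use the .= operator, can be illustrated with plain hashes. This is a toy example under stated assumptions, not the attached concatenate_alignments.pl and not the cat function from Bio::Align::Utilities mentioned earlier: the data, the function name concat_alignments and the joiner argument (modelled loosely on the -j option described above) are hypothetical, and every alignment is assumed to contain exactly the same sequence ids.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy data (hypothetical): each alignment maps sequence id -> aligned string.
# Assumption: every alignment contains exactly the same set of ids.
my @alignments = (
    { seqA => 'ATG-', seqB => 'ATGC' },
    { seqA => 'CCA',  seqB => 'C-A'  },
);

# Concatenate alignments column-wise; $joiner is placed between blocks.
sub concat_alignments {
    my ($joiner, @alns) = @_;
    my %combined;
    for my $i (0 .. $#alns) {
        for my $id (sort keys %{ $alns[$i] }) {
            $combined{$id} .= $joiner if $i > 0;    # separator between blocks
            $combined{$id} .= $alns[$i]{$id};       # append this block's columns
        }
    }
    return \%combined;
}

my $combined = concat_alignments('', @alignments);
print "$_\t$combined->{$_}\n" for sort keys %$combined;
```

For real alignment objects the library routines are the better choice, since they also keep track of coordinates and sequence metadata that this sketch ignores.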
> > Usage: > concatenate_alignments.pl -o <... input_alignment_n> > > > If you want to insert a string between the concatenated sequences, you can use the -j option (eg. -j '---'). > > Greg. > > > ________________________________ > From: Dave Messina > To: Tao Zhu > Cc: BioPerl > Sent: Monday, 21 November 2011 7:58 PM > Subject: Re: [Bioperl-l] Is there a "combine" method that would combine several sequence alignments to a single alignment? > > Hi, > > No, I don't believe such a method exists. Could you describe what you are > wanting to do? Perhaps there is another way to do it. > > > Dave > > > > On Mon, Nov 21, 2011 at 06:17, Tao Zhu wrote: > >> I can use the "slice" method to split a single sequence alignment into >> several subalignments. Then is there a corresponding "combine" method to >> combine such subalignments back? >> >> -- >> Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing >> 100875, China >> Email: tzhu at mail.bnu.edu.cn >> >> ______________________________**_________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/**mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l_______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From p.j.a.cock at googlemail.com Mon Nov 21 11:15:13 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 21 Nov 2011 16:15:13 +0000 Subject: [Bioperl-l] seqIO.pm In-Reply-To: References: Message-ID: On Thu, Nov 17, 2011 at 1:07 AM, Nathan Taylor wrote: > Hello, > > ? Can SeqIO.pm convert a file of fastq reads into .phd files. Or, > barring that, a file of fastas and file of quals into .phd files? > > Many thanks, > Nathan In principle that is possible (e.g. Biopython can do fastq to phd). 
Have you tried using BioPerl's SeqIO to do this? Was there an
error message?

Peter

From cjfields at illinois.edu Mon Nov 21 11:57:29 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 21 Nov 2011 16:57:29 +0000
Subject: [Bioperl-l] seqIO.pm
In-Reply-To:
References:
Message-ID: <2E075A8F-92F9-4A04-9254-EF4C07793A7C@illinois.edu>

This should be possible in either case (FASTQ should be more straightforward); there is a Bio::SeqIO::phd module for this purpose. Nathan, if you run into problems with that conversion, let us know.

chris

From rondonbio at yahoo.com.br Mon Nov 21 12:31:21 2011
From: rondonbio at yahoo.com.br (Rondon Neto)
Date: Mon, 21 Nov 2011 09:31:21 -0800 (PST)
Subject: [Bioperl-l] seqIO.pm
In-Reply-To:
References:
Message-ID: <1321896681.57079.YahooMailNeo@web130205.mail.mud.yahoo.com>

Hi! Try this script:

#!/usr/bin/perl
use warnings;
use strict;
use Bio::SeqIO;

if (!$ARGV[0]) { die "\n### USAGE::: perl $0 [file.fastq]\n"; }

my $fastq = $ARGV[0];

my $in = Bio::SeqIO->new( -file => $fastq,
                          -format => 'fastq' );

my $out = Bio::SeqIO->new ( -file => ">out.phd",
                            -format=> 'phd');

while (my $seq = $in->next_seq()) {
      $out->write_seq($seq);
}

exit;

Best wishes,
Rondon, a Brazilian friend.

From Russell.Smithies at agresearch.co.nz Mon Nov 21 15:04:01 2011
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 22 Nov 2011 09:04:01 +1300
Subject: [Bioperl-l] seqIO.pm
In-Reply-To: <1321896681.57079.YahooMailNeo@web130205.mail.mud.yahoo.com>
References: <1321896681.57079.YahooMailNeo@web130205.mail.mud.yahoo.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF340186CF1AB@exchsth.agresearch.co.nz>

Or you could use the built-in script bp_sreformat.pl

--Russell

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From goodyearkl at gmail.com Mon Nov 21 21:23:13 2011
From: goodyearkl at gmail.com (Kylie Goodyear)
Date: Mon, 21 Nov 2011 18:23:13 -0800 (PST)
Subject: [Bioperl-l] Fasta counting script?
Message-ID: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>

Hi,
This may seem like a stupid question but I am just learning bioperl
and I am trying to figure out how to get a count of all the characters
in my FASTA file. I managed to get the number of sequences using the
following. Is there a way to tell bioperl to count the characters?

#!/usr/bin/perl -w
#Defines perl modules
#Bio::Seq deal with sequences and their features
use Bio::Seq;
#Bio::SeqIO handles reading and parsing of sequences of many different formats
use Bio::SeqIO;
#Read FASTA file
$seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format => "fasta" );
#Count how many sequences are present in file
my $count=0;
while (my $seq_obj = $seqio_obj->next_seq) {
    $count++;
}
#Display the number of sequences present
print "There are $count sequences present.\n";

From David.Messina at sbc.su.se Tue Nov 22 03:08:11 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 22 Nov 2011 09:08:11 +0100
Subject: [Bioperl-l] Fasta counting script?
In-Reply-To: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
Message-ID:

Hi Kylie,

You can use the length method for this.

my $seq_length = $seq_obj->length();

Have you taken a look at the beginner's HOWTO? There's a nice table of
sequence methods as well as lots of other good information in there.

http://www.bioperl.org/wiki/HOWTO:Beginners

Dave

From liam.elbourne at mq.edu.au Mon Nov 21 23:11:12 2011
From: liam.elbourne at mq.edu.au (Liam Elbourne)
Date: Tue, 22 Nov 2011 15:11:12 +1100
Subject: [Bioperl-l] Fasta counting script?
In-Reply-To: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
Message-ID:

Hi Kylie,

I think the length() method is what you're after:

....
my $sequence_length = $seq_obj->length();
....

in your case. Have a look at:

HOWTO:SeqIO - BioPerl

and,

HOWTO:Beginners - BioPerl

for some more general stuff.

Regards,
Liam.

From goodyearkl at gmail.com Tue Nov 22 08:00:55 2011
From: goodyearkl at gmail.com (Kylie Goodyear)
Date: Tue, 22 Nov 2011 05:00:55 -0800 (PST)
Subject: [Bioperl-l] Fasta counting script?
In-Reply-To:
References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
Message-ID: <273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com>

Thank you for your help. It keeps telling me that it can't find
"length". Do you think it has to do with the way I am coding it?

#!/usr/bin/perl -w
#Defines perl modules

#Bio::Seq deal with sequences and their features
use Bio::Seq;

#Bio::SeqIO handles reading and parsing of sequences of many different formats
use Bio::SeqIO;

#Read FASTA file
$seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format => "fasta" );

#Count how many sequences are present in file
my $countseq=0;
while (my $seq_obj = $seqio_obj->next_seq, ) {
    $countseq++;
}
#Display the number of sequences present
print "There are $countseq sequences present.\n";

#Count number of characters in file
my $seq_length = $seq_obj->length ;
print $seq_length

From roy.chaudhuri at gmail.com Tue Nov 22 10:50:31 2011
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Tue, 22 Nov 2011 15:50:31 +0000
Subject: [Bioperl-l] Fasta counting script?
In-Reply-To: <273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com>
References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com> <273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com>
Message-ID: <4ECBC4C7.10401@gmail.com>

Hi Kylie,

I suspect the error you get is actually "Can't call method length on an
undefined value" (in future, please report the exact text of any error
messages). You declare $seq_obj with "my" in the while loop, but then
try to access it outside of the loop. Try printing out the length of
each $seq_obj within the while loop.

You should always include "use strict;" at the top of your program;
that helps to catch errors like this.

Cheers,
Roy.

From cjfields at illinois.edu Tue Nov 22 11:13:01 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 22 Nov 2011 16:13:01 +0000
Subject: [Bioperl-l] Fasta counting script?
In-Reply-To: <273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com>
References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com> <273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com>
Message-ID: <0507832F-51F3-4946-BDCD-6913BBE697BC@illinois.edu>

This sounds a little homework-y. Sure this isn't for a class? :)

One clue (and a good thing to keep in mind): always 'use strict; use warnings;' with your scripts if you are new to perl. Doing so would let you know there is a problem with the script the way it is written, specifically at the place where you are asking for the length.

chris

From Russell.Smithies at agresearch.co.nz Tue Nov 22 15:47:36 2011
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 23 Nov 2011 09:47:36 +1300
Subject: [Bioperl-l] Fasta counting script?
In-Reply-To: <0507832F-51F3-4946-BDCD-6913BBE697BC@illinois.edu>
References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com> <273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com> <0507832F-51F3-4946-BDCD-6913BBE697BC@illinois.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF340186CF1B8@exchsth.agresearch.co.nz>

Or again, you could use the built-in scripts bp_seq_length.pl or bp_gccalc.pl

As previous posters have hinted, RTFM - the answers are all in there!
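
For reference, the advice given earlier in this thread (ask for the length inside the while loop, and switch on strict and warnings) can be put together roughly as follows. This is only a sketch and not code posted by anyone on the list: the sub name count_seqs and the command-line handling are made up for illustration, and it assumes every record returned by next_seq() answers to length(), as Bio::Seq objects do.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): walk any iterator that offers
# next_seq() and return the number of records plus their summed lengths.
sub count_seqs {
    my ($seqio) = @_;
    my ($n_seqs, $n_chars) = (0, 0);
    while (my $seq_obj = $seqio->next_seq) {
        # $seq_obj only exists inside this loop, so it is used here
        $n_seqs++;
        $n_chars += $seq_obj->length;
    }
    return ($n_seqs, $n_chars);
}

# Bio::SeqIO is loaded only when a FASTA file name is given on the command line
if (@ARGV) {
    require Bio::SeqIO;
    my $seqio_obj = Bio::SeqIO->new(-file => $ARGV[0], -format => 'fasta');
    my ($n_seqs, $n_chars) = count_seqs($seqio_obj);
    print "There are $n_seqs sequences present.\n";
    print "They contain $n_chars characters in total.\n";
}
```

Saved under a name of your choosing (say count_fasta.pl, also hypothetical), it would be run as "perl count_fasta.pl DNA_sequences.fasta". Only sequence characters are counted, not the header lines.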

--Russell

From charles-listes+bioperl at plessy.org Wed Nov 23 05:27:45 2011
From: charles-listes+bioperl at plessy.org (Charles Plessy)
Date: Wed, 23 Nov 2011 19:27:45 +0900
Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
Message-ID: <20111123102745.GC20168@merveille.plessy.net>

Dear BioPerl developers,

I am trying to process some unaligned paired-end reads with Bio::DB::Sam. For
each pair, I want to detect a sequence index and a unique molecular identifier in
the linker, record them as auxiliary flags, and trim the linker from the read.

I collect the pairs through a features iterator, and can access all their data
through the high-level Bio::DB::Bam::Alignment API. After modifying them
(linker trimming and adding flags), I want to write the resulting pairs as a
new unaligned BAM file.

I apologise if the solution is trivial, but my problem is that I do not manage to
modify the Bio::DB::Bam::Alignment objects. Typically, attempts such as
'$pair[0]->qseq("GATACA")' give errors like
'Usage: Bio::DB::Bam::Alignment::qseq(b) at /usr/lib/perl5/Bio/DB/Bam/AlignWrapper.pm line 80'.

Since I did not find explanations or portions of source code indicating how to
modify Bio::DB::Bam::Alignment objects, I wonder if they are read-only?

Have a nice day,

--
Charles Plessy
Tsurumi, Kanagawa, Japan

From MEC at stowers.org Wed Nov 23 11:02:26 2011
From: MEC at stowers.org (Cook, Malcolm)
Date: Wed, 23 Nov 2011 10:02:26 -0600
Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
In-Reply-To: <20111123102745.GC20168@merveille.plessy.net>
References: <20111123102745.GC20168@merveille.plessy.net>
Message-ID: <2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org>

Charles,

I suggest you reconsider your approach: rather, use `samtools view` to pipe your reads to stdout in SAM format, then stream-edit the barcode and pipe the result back to samtools for conversion back to a .bam file.

I know this is not what you're asking. I'm pretty sure that the direct answer to your question is, "yes - they are read-only".

~Malcolm

From cjfields at illinois.edu Wed Nov 23 14:26:31 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 23 Nov 2011 19:26:31 +0000
Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
In-Reply-To: <2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org>
References: <20111123102745.GC20168@merveille.plessy.net> <2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org>
Message-ID:

According to the docs for the low-level API of Bio-Samtools, both read and write are allowed:

http://search.cpan.org/perldoc?Bio::DB::Sam#The_low-level_API

Using the low-level API for this purpose isn't documented as well, though (the high-level API is read-only AFAICT).

The error message is a standard one generated from the XS bindings when the passed argument isn't mapped correctly. Looking through the Sam.xs file, qseq() is only prototyped as a reader; the only arg is a Bio::DB::Bam::Alignment (e.g. $self). However, it appears there is a function specified for Bio::DB::Bam::Alignment named l_qseq() that might be the setter, whereas qseq() is probably the getter (ignore the 'bama_' prefix):

....
int bama_l_qseq(b,...) Bio::DB::Bam::Alignment b PROTOTYPE: $;$ CODE: if (items > 1) b->core.l_qseq = SvIV(ST(1)); RETVAL=b->core.l_qseq; OUTPUT: RETVAL SV* bama_qseq(b) Bio::DB::Bam::Alignment b PROTOTYPE: $ PREINIT: char* seq; int i; CODE: seq = Newxz(seq,b->core.l_qseq+1,char); for (i=0;icore.l_qseq;i++) { seq[i]=bam_nt16_rev_table[bam1_seqi(bam1_seq(b),i)]; } RETVAL = newSVpv(seq,b->core.l_qseq); Safefree(seq); OUTPUT: RETVAL -chris On Nov 23, 2011, at 10:02 AM, Cook, Malcolm wrote: > Charles, > > I suggest you reconsider your approach to rather, use `samtools view` to pipe your reads to stdout in sam format, then stream edit the barcode and pipe it back to samtools for conversion back to .bam file. > > I know this is not what you're asking. I'm pretty sure that direct answer to your question is, "yes - they are read-only". > > ~Malcolm > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Charles Plessy >> Sent: Wednesday, November 23, 2011 4:28 AM >> To: bioperl-l at bioperl.org >> Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ? >> >> Dear BioPerl developers, >> >> I am trying to process some unaligned paired-end reads with Bio::DB::Sam. >> For >> each pair, I want to detect a sequence index and a unique molecular >> identifier in >> the linker, record them as auxiliary flags, and trim the linker from the read. >> >> I collect the pairs through a features iterator, and can access all their data >> through the high-level Bio::DB::Bam::Alignment API. After modifying them >> (linker trimming and adding flags), I want to write the resulting pairs as a >> new unaligned BAM file. >> >> I apologise if the solution is trivial, but my problem is that I do not manage to >> modify the Bio::DB::Bam::Alignment objects. Typically, attempts such as >> ?$pair[0]->qseq("GATACA")? 
give errors like >> 'Usage: Bio::DB::Bam::Alignment::qseq(b) at >> /usr/lib/perl5/Bio/DB/Bam/AlignWrapper.pm line 80'. >> >> Since I did not find explanations or portions of source code indicating how to >> modify Bio::DB::Bam::Alignment objects, I wonder if they are read-only? >> >> Have a nice day, >> >> -- >> Charles Plessy >> Tsurumi, Kanagawa, Japan From lincoln.stein at gmail.com Wed Nov 23 17:02:23 2011 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Thu, 24 Nov 2011 06:02:23 +0800 Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ? In-Reply-To: <20111123102745.GC20168@merveille.plessy.net> References: <20111123102745.GC20168@merveille.plessy.net> Message-ID: I apologize that the qseq() method only allows read access. I will attempt to fix this. Lincoln On Wed, Nov 23, 2011 at 6:27 PM, Charles Plessy <charles-listes+bioperl at plessy.org> wrote: > Dear BioPerl developers, > > I am trying to process some unaligned paired-end reads with Bio::DB::Sam. > For > each pair, I want to detect a sequence index and a unique molecular > identifier in > the linker, record them as auxiliary flags, and trim the linker from the > read. > > I collect the pairs through a features iterator, and can access all their > data > through the high-level Bio::DB::Bam::Alignment API. After modifying them > (linker trimming and adding flags), I want to write the resulting pairs as > a > new unaligned BAM file. > > I apologise if the solution is trivial, but my problem is that I do not > manage to > modify the Bio::DB::Bam::Alignment objects. Typically, attempts such as > '$pair[0]->qseq("GATACA")'
give errors like > 'Usage: Bio::DB::Bam::Alignment::qseq(b) at > /usr/lib/perl5/Bio/DB/Bam/AlignWrapper.pm line 80'. > > Since I did not find explanations or portions of source code indicating > how to > modify Bio::DB::Bam::Alignment objects, I wonder if they are read-only? > > Have a nice day, > > -- > Charles Plessy > Tsurumi, Kanagawa, Japan -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From lincoln.stein at gmail.com Wed Nov 23 17:05:41 2011 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Thu, 24 Nov 2011 06:05:41 +0800 Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ? In-Reply-To: References: <20111123102745.GC20168@merveille.plessy.net> <2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org> Message-ID: Unfortunately l_qseq reads/writes the length of the query sequence, not the sequence itself. Lincoln On Thu, Nov 24, 2011 at 3:26 AM, Fields, Christopher J <cjfields at illinois.edu> wrote: > According to the docs for the low-level API of Bio-Samtools, both reading and > writing are allowed: > > http://search.cpan.org/perldoc?Bio::DB::Sam#The_low-level_API > > Using the low-level API for this purpose isn't documented as well, though > (the high-level API is read-only AFAICT). > > The error message is a standard one generated from the XS bindings when > the passed argument isn't mapped correctly. Looking through the > Sam.xs file, qseq() is only prototyped as a reader; the only arg is a > Bio::DB::Bam::Alignment (i.e. $self).
However, it appears there is a > function specified for Bio::DB::Bam::Alignment named l_qseq() that might be > the setter, whereas qseq() may be the getter (ignore the 'bama_' > prefix): > > .... > > int > bama_l_qseq(b,...) > Bio::DB::Bam::Alignment b > PROTOTYPE: $;$ > CODE: > if (items > 1) > b->core.l_qseq = SvIV(ST(1)); > RETVAL=b->core.l_qseq; > OUTPUT: > RETVAL > > SV* > bama_qseq(b) > Bio::DB::Bam::Alignment b > PROTOTYPE: $ > PREINIT: > char* seq; > int i; > CODE: > seq = Newxz(seq,b->core.l_qseq+1,char); > for (i=0;i<b->core.l_qseq;i++) { > seq[i]=bam_nt16_rev_table[bam1_seqi(bam1_seq(b),i)]; > } > RETVAL = newSVpv(seq,b->core.l_qseq); > Safefree(seq); > OUTPUT: > RETVAL > > > -chris > > On Nov 23, 2011, at 10:02 AM, Cook, Malcolm wrote: > > > Charles, > > > > I suggest you reconsider your approach to rather, use `samtools view` to > pipe your reads to stdout in sam format, then stream edit the barcode and > pipe it back to samtools for conversion back to .bam file. > > > > I know this is not what you're asking. I'm pretty sure that direct > answer to your question is, "yes - they are read-only". > > > > ~Malcolm > > > > > >> -----Original Message----- > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >> bounces at lists.open-bio.org] On Behalf Of Charles Plessy > >> Sent: Wednesday, November 23, 2011 4:28 AM > >> To: bioperl-l at bioperl.org > >> Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ? > >> > >> Dear BioPerl developers, > >> > >> I am trying to process some unaligned paired-end reads with > Bio::DB::Sam. > >> For > >> each pair, I want to detect a sequence index and a unique molecular > >> identifier in > >> the linker, record them as auxiliary flags, and trim the linker from > the read. > >> > >> I collect the pairs through a features iterator, and can access all > their data > >> through the high-level Bio::DB::Bam::Alignment API.
After modifying > them > >> (linker trimming and adding flags), I want to write the resulting pairs > as a > >> new unaligned BAM file. > >> > >> I apologise if the solution is trivial, but my problem is that I do not > manage to > >> modify the Bio::DB::Bam::Alignment objects. Typically, attempts such as > >> '$pair[0]->qseq("GATACA")' give errors like > >> 'Usage: Bio::DB::Bam::Alignment::qseq(b) at > >> /usr/lib/perl5/Bio/DB/Bam/AlignWrapper.pm line 80'. > >> > >> Since I did not find explanations or portions of source code > indicating how to > >> modify Bio::DB::Bam::Alignment objects, I wonder if they are read-only? > >> > >> Have a nice day, > >> > >> -- > >> Charles Plessy > >> Tsurumi, Kanagawa, Japan -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From cjfields at illinois.edu Wed Nov 23 20:07:09 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 24 Nov 2011 01:07:09 +0000 Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ? In-Reply-To: References: <20111123102745.GC20168@merveille.plessy.net> <2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org> Message-ID: <92CA8F24-47CB-42AF-8C20-9C4765A592A5@illinois.edu> Ah, okay, makes sense. I thought it was oddly named.
:) Chris Sent from my iPad On Nov 23, 2011, at 4:05 PM, "Lincoln Stein" wrote: Unfortunately l_qseq reads/writes the length of the query sequence, not the sequence itself. Lincoln On Thu, Nov 24, 2011 at 3:26 AM, Fields, Christopher J wrote: According to the docs for the low-level API of Bio-Samtools, both reading and writing are allowed: http://search.cpan.org/perldoc?Bio::DB::Sam#The_low-level_API Using the low-level API for this purpose isn't documented as well, though (the high-level API is read-only AFAICT). The error message is a standard one generated from the XS bindings when the passed argument isn't mapped correctly. Looking through the Sam.xs file, qseq() is only prototyped as a reader; the only arg is a Bio::DB::Bam::Alignment (i.e. $self). However, it appears there is a function specified for Bio::DB::Bam::Alignment named l_qseq() that might be the setter, whereas qseq() may be the getter (ignore the 'bama_' prefix): .... int bama_l_qseq(b,...) Bio::DB::Bam::Alignment b PROTOTYPE: $;$ CODE: if (items > 1) b->core.l_qseq = SvIV(ST(1)); RETVAL=b->core.l_qseq; OUTPUT: RETVAL SV* bama_qseq(b) Bio::DB::Bam::Alignment b PROTOTYPE: $ PREINIT: char* seq; int i; CODE: seq = Newxz(seq,b->core.l_qseq+1,char); for (i=0;i<b->core.l_qseq;i++) { seq[i]=bam_nt16_rev_table[bam1_seqi(bam1_seq(b),i)]; } RETVAL = newSVpv(seq,b->core.l_qseq); Safefree(seq); OUTPUT: RETVAL -chris On Nov 23, 2011, at 10:02 AM, Cook, Malcolm wrote: > Charles, > > I suggest you reconsider your approach to rather, use `samtools view` to pipe your reads to stdout in sam format, then stream edit the barcode and pipe it back to samtools for conversion back to .bam file. > > I know this is not what you're asking. I'm pretty sure that direct answer to your question is, "yes - they are read-only".
> > ~Malcolm > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Charles Plessy >> Sent: Wednesday, November 23, 2011 4:28 AM >> To: bioperl-l at bioperl.org >> Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ? >> >> Dear BioPerl developers, >> >> I am trying to process some unaligned paired-end reads with Bio::DB::Sam. >> For >> each pair, I want to detect a sequence index and a unique molecular >> identifier in >> the linker, record them as auxiliary flags, and trim the linker from the read. >> >> I collect the pairs through a features iterator, and can access all their data >> through the high-level Bio::DB::Bam::Alignment API. After modifying them >> (linker trimming and adding flags), I want to write the resulting pairs as a >> new unaligned BAM file. >> >> I apologise if the solution is trivial, but my problem is that I do not manage to >> modify the Bio::DB::Bam::Alignment objects. Typically, attempts such as >> '$pair[0]->qseq("GATACA")' give errors like >> 'Usage: Bio::DB::Bam::Alignment::qseq(b) at >> /usr/lib/perl5/Bio/DB/Bam/AlignWrapper.pm line 80'. >> >> Since I did not find explanations or portions of source code indicating how to >> modify Bio::DB::Bam::Alignment objects, I wonder if they are read-only? >> >> Have a nice day, >> >> -- >> Charles Plessy >> Tsurumi, Kanagawa, Japan -- Lincoln D.
Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa > From ross at cuhk.edu.hk Sun Nov 27 03:24:43 2011 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Sun, 27 Nov 2011 16:24:43 +0800 Subject: [Bioperl-l] Check the location type for a particular gene in a Genbank file In-Reply-To: References: Message-ID: <000101ccacde$0fc4ba90$2f4e2fb0$@edu.hk> Hi all, To write a script to extract sequence generically for all types of BioLocation objects, I'd like to know if there is any way to check what types (e.g. simple or split) are being processed. Bio::Location::CoordinatePolicyI appears to be doing something similar but it is more like a post checking step. If I parse the genbank file line by line, I can certainly check whether the line contains keywords like "join" but as I'm using something like: my @features=grep{$_->primary_tag eq $chkTags[0]} $seqobj->get_SeqFeatures; foreach (@features) { $pseudo=$_->has_tag('pseudo')?'pseudo':'functional'; @gene=[]; I'd appreciate if anybody knows a better integration with the well-developed bioperl module. Thanks a lot. From Russell.Smithies at agresearch.co.nz Sun Nov 27 19:46:05 2011 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Mon, 28 Nov 2011 13:46:05 +1300 Subject: [Bioperl-l] Galaxy tools? Message-ID: <18DF7D20DFEC044098A1062202F5FFF340186CF1E3@exchsth.agresearch.co.nz> Possibly the wrong place to ask but has anyone written Galaxy tools using BioPerl? I was thinking of creating blast graphic and format converter tools as I couldn't see any already available in their toolbox. 
It looks like I can just write a Python wrapper for my existing BioPerl scripts - although I suspect the "correct" method is to use BioPython methods (but Python annoys me with its lack of semi-colons and required white-space) --Russell From p.j.a.cock at googlemail.com Sun Nov 27 20:28:33 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 28 Nov 2011 01:28:33 +0000 Subject: [Bioperl-l] Galaxy tools? In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF340186CF1E3@exchsth.agresearch.co.nz> References: <18DF7D20DFEC044098A1062202F5FFF340186CF1E3@exchsth.agresearch.co.nz> Message-ID: On Monday, November 28, 2011, Smithies, Russell wrote: > Possibly the wrong place to ask but has anyone written > Galaxy tools using BioPerl? > I was thinking of creating blast graphic and format converter > tools as I couldn't see any already available in their toolbox. > It looks like I can just write a Python wrapper for my existing > BioPerl scripts - although I suspect the "correct" method is to > use BioPython methods (but Python annoys me with its lack > of semi-colons and required white-space) Galaxy is agnostic about what language the tools are in, you can use a binary, shell script, Java, Perl, Python etc.
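To illustrate the point that an existing BioPerl script can be called as-is, here is a minimal sketch of a command-line format converter built on Bio::SeqIO. The script name, the convert() helper and the positional argument order are assumptions made for this example; they are not part of Galaxy or of any existing tool. All a wrapper needs is a command line that reads one path and writes another.

```perl
#!/usr/bin/env perl
# convert_seq.pl (hypothetical name): convert a sequence file between formats.
use strict;
use warnings;
use Bio::SeqIO;

# Copy every record from $in_file to $out_file, changing the format.
# Returns the number of records written.
sub convert {
    my ($in_file, $in_format, $out_file, $out_format) = @_;
    my $in  = Bio::SeqIO->new(-file => $in_file,     -format => $in_format);
    my $out = Bio::SeqIO->new(-file => ">$out_file", -format => $out_format);
    my $count = 0;
    while (my $seq = $in->next_seq) {
        $out->write_seq($seq);
        $count++;
    }
    $out->close;    # flush the output file before returning
    return $count;
}

# Command-line entry point (assumed argument order):
#   convert_seq.pl INPUT INFORMAT OUTPUT OUTFORMAT
if (@ARGV) {
    die "Usage: $0 INPUT INFORMAT OUTPUT OUTFORMAT\n" unless @ARGV == 4;
    my $n = convert(@ARGV);
    print STDERR "Converted $n record(s)\n";
}
```

A call such as `perl convert_seq.pl in.gb genbank out.fa fasta` is the kind of command a tool definition could run, with the two paths supplied by the framework.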
Peter From florent.angly at gmail.com Sun Nov 27 21:09:45 2011 From: florent.angly at gmail.com (Florent Angly) Date: Mon, 28 Nov 2011 12:09:45 +1000 Subject: [Bioperl-l] Galaxy tools? In-Reply-To: References: <18DF7D20DFEC044098A1062202F5FFF340186CF1E3@exchsth.agresearch.co.nz> Message-ID: <4ED2ED69.10601@gmail.com> Hi Russell, As Peter said, the tools to be wrapped do not need to be written in Python. I have built a few wrappers for Galaxy, including one for the read simulator Grinder (http://sourceforge.net/projects/biogrinder/), which uses Bioperl and is available in the Galaxy Toolshed (http://sourceforge.net/projects/biogrinder/). It is not very hard to write a wrapper for trivial programs, but it becomes more complicated once you start having optional arguments or multiple output files. Grinder uses Getopt::Euclid (http://search.cpan.org/dist/Getopt-Euclid/) to parse command-line arguments. I have been thinking about leveraging the information that Getopt::Euclid stores about command-line arguments to automate most of the Galaxy wrapper generation, but I have not gotten to it yet. Florent On 28/11/11 11:28, Peter Cock wrote: > On Monday, November 28, 2011, Smithies, Russell wrote: >> Possibly the wrong place to ask but has anyone written >> Galaxy tools using BioPerl? >> I was thinking of creating blast graphic and format converter >> tools as I couldn't see any already available in their toolbox. >> It looks like I can just write a Python wrapper for my existing >> BioPerl scripts - although I suspect the "correct" method is to >> use BioPython methods (but Python annoys me with its lack >> of semi-colons and required white-space) > Galaxy is agnostic about what language the tools are in, > you can use a binary, shell script, Java, Perl, Python etc.
> > Peter From florent.angly at gmail.com Sun Nov 27 23:35:31 2011 From: florent.angly at gmail.com (Florent Angly) Date: Mon, 28 Nov 2011 14:35:31 +1000 Subject: [Bioperl-l] Interest in Bio::Community modules Message-ID: <4ED30F93.4000407@gmail.com> Hi all, I have been thinking about starting a set of Perl modules that would be useful for (microbial) ecologists to represent communities of organisms. At the moment, there does not seem to be anything like this in Bioperl. I am happy to make these modules available under the Bioperl umbrella using the Bio::Community::* namespace. I envision the following modules: * Bio::Community::Member module representing members of a community. * Bio::Community::IO modules to read/write files that describe community composition (a.k.a. OTU table, or site by species table) as used by programs like QIIME, Pyrotagger, GAAS, ... * Bio::Community::Tools modules to help manipulate communities, e.g. to take some members at random, normalize the community to a given number of individuals, or do rarefaction curves. The idea is to implement these modules in Moose to teach myself Moose. The members of a community could be a sequence (Bio::SeqI), a species (Bio::S), an arbitrary string or even other things. I am not quite sure if Bioperl provides facilities to attach some arbitrary information to an object. Any interest? Ideas? Comments?
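The manipulations listed for Bio::Community::Tools can be sketched in plain Perl as follows. This is only an illustration under stated assumptions: the function names normalize_community() and subsample_community() are hypothetical, and a community is represented here as a simple hash of member name to count rather than as any Bio::Community object.

```perl
use strict;
use warnings;
use List::Util qw(sum shuffle);

# Scale the counts so that they sum to $target individuals.
# Results may be fractional; no rounding is applied in this sketch.
sub normalize_community {
    my ($counts, $target) = @_;
    my $total = sum(values %$counts) || 0;
    die "Cannot normalize an empty community\n" unless $total > 0;
    return { map { $_ => $counts->{$_} * $target / $total } keys %$counts };
}

# Draw $n individuals at random, without replacement (one rarefaction step).
# Assumes integer counts.
sub subsample_community {
    my ($counts, $n) = @_;
    # Expand the community into one list entry per individual
    my @individuals = map { my $m = $_; ($m) x $counts->{$m} } keys %$counts;
    die "Cannot draw $n individuals from " . scalar(@individuals) . "\n"
        if $n > @individuals;
    my @picked = (shuffle @individuals)[0 .. $n - 1];
    my %sub;
    $sub{$_}++ for @picked;
    return \%sub;
}

my %community = (OTU_1 => 2, OTU_2 => 6);
my $norm = normalize_community(\%community, 100);
print "$_\t$norm->{$_}\n" for sort keys %$norm;
```

Repeating subsample_community() over a range of depths and counting the distinct members at each depth would give the points of a rarefaction curve.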
Thanks, Florent From cjfields at illinois.edu Mon Nov 28 14:42:12 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 28 Nov 2011 19:42:12 +0000 Subject: [Bioperl-l] Check the location type for a particular gene in a Genbank file In-Reply-To: <000101ccacde$0fc4ba90$2f4e2fb0$@edu.hk> References: <000101ccacde$0fc4ba90$2f4e2fb0$@edu.hk> Message-ID: <49363DC1-110A-49A8-B8D7-75AA624A535C@illinois.edu> Ross, The standard way is to check whether the location object is a SplitLocationI or not, see the following for an example: http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Location_Objects chris On Nov 27, 2011, at 2:24 AM, Ross KK Leung wrote: > Hi all, > > To write a script to extract sequence generically for all types of > BioLocation objects, I'd like to know if there is any way to check what > types (e.g. simple or split) are being processed. > > Bio::Location::CoordinatePolicyI appears to be doing something similar but > it is more like a post checking step. If I parse the genbank file line by > line, I can certainly check whether the line contains keywords like "join" > but as I'm using something like: > > my @features=grep{$_->primary_tag eq $chkTags[0]} > $seqobj->get_SeqFeatures; > > > foreach (@features) { > > $pseudo=$_->has_tag('pseudo')?'pseudo':'functional'; > > @gene=[]; > > I'd appreciate if anybody knows a better integration with the well-developed > bioperl module. > > Thanks a lot. 
From cjfields at illinois.edu Mon Nov 28 14:47:10 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 28 Nov 2011 19:47:10 +0000 Subject: [Bioperl-l] Interest in Bio::Community modules In-Reply-To: <4ED30F93.4000407@gmail.com> References: <4ED30F93.4000407@gmail.com> Message-ID: <3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu> I think the idea is sound, it would be nice to have. Jason is working a bit in this area, maybe he has some additional thoughts? Would there be some redundancy with any current code (Bio::Tree, Bio::Taxon, etc)? I do think it should be developed on its own, per our recent discussions re: slimming down core. Re: using Moose and BioPerl, from experience it's a little tricky integrating the two as BioPerl uses perl's simple OO system while Moose allows much more (though it has its own issues, memory and speed being the top two). Specifying inheritance from a BioPerl module is a bit trickier; you'll have to wrap the module via MooseX::NonMoose. Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine. chris On Nov 27, 2011, at 10:35 PM, Florent Angly wrote: > Hi all, > > I have been thinking about starting a set of Perl modules that would be useful for (microbial) ecologists to represent communities of organisms. At the moment, there does not seem to be anything like this in Bioperl. I am happy to make these modules available under the Bioperl umbrella using the Bio::Community::* namespace. > > I envision the following modules: > * Bio::Community::Member module representing members of a community. > * Bio::Community::IO modules to read/write files that describe community composition (a.k.a.
OTU table, or site by species table) as used programs like QIIME, Pyrotagger, GAAS, ... > * Bio::Community::Tools modules to help manipulate communities, e.g. to take some members at random, normalize the community to a given number of individuals, or do rarefaction curves. > > The idea is to implement these modules in Moose to teach myself Moose. The members of a community could be a sequence (Bio::SeqI), a species (Bio::S), an arbitrary string or even other things. I am not quite sure if Bioperl provide facilities to attach some arbitrary information to an object. > > Any interest? Ideas? Comments? > > Thanks, > > Florent > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From l.m.timmermans at students.uu.nl Mon Nov 28 15:25:13 2011 From: l.m.timmermans at students.uu.nl (Leon Timmermans) Date: Mon, 28 Nov 2011 21:25:13 +0100 Subject: [Bioperl-l] Interest in Bio::Community modules In-Reply-To: References: <4ED30F93.4000407@gmail.com> Message-ID: And now to the list too, On Mon, Nov 28, 2011 at 5:35 AM, Florent Angly wrote: > The idea is to implement these modules in Moose to teach myself Moose. The > members of a community could be a sequence (Bio::SeqI), a species (Bio::S), > an arbitrary string or even other things. I am not quite sure if Bioperl > provide facilities to attach some arbitrary information to an object. > > Any interest? Ideas? Comments? > Sounds like a good use-case for roles, maybe even parametric roles. 
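As a rough illustration of the role idea, the following sketch shows a Moose role that adds an optional attribute to a class consuming it. Every package and attribute name here is hypothetical and chosen only for the example; none of them is taken from existing BioPerl or Bio::Community code.

```perl
use strict;
use warnings;

# A role carrying extra information that any member class may consume
package Bio::Community::Role::Weighted;    # hypothetical role name
use Moose::Role;

requires 'id';    # the consuming class must provide an id() method

has 'weight' => (is => 'rw', isa => 'Num', default => 1);

sub describe {
    my $self = shift;
    return sprintf '%s (weight %s)', $self->id, $self->weight;
}

# A class consuming the role
package Bio::Community::Member;            # hypothetical class name
use Moose;

# Declared before with() so that the role's 'requires' is satisfied
has 'id' => (is => 'ro', isa => 'Str', required => 1);

with 'Bio::Community::Role::Weighted';

__PACKAGE__->meta->make_immutable;

package main;

my $member = Bio::Community::Member->new(id => 'OTU_1', weight => 3);
print $member->describe, "\n";    # prints "OTU_1 (weight 3)"
```

The role is composed into the class rather than inherited, so several unrelated roles can be mixed into one member class without deepening the inheritance hierarchy.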
Leon From florent.angly at gmail.com Mon Nov 28 19:59:24 2011 From: florent.angly at gmail.com (Florent Angly) Date: Tue, 29 Nov 2011 10:59:24 +1000 Subject: [Bioperl-l] Interest in Bio::Community modules In-Reply-To: <3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu> References: <4ED30F93.4000407@gmail.com> <3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu> Message-ID: <4ED42E6C.6020501@gmail.com> Hi Chris, On 29/11/11 05:47, Fields, Christopher J wrote: > I think the idea is sound, it would be nice to have. Jason is working a bit in this area, maybe he has some additional thoughts? Would there be some redundancy with any current code (Bio::Tree, Bio::Taxon, etc)? None of these features would be duplicated. Rather, they would be used as attributes of the Bio::Community::* objects. For example, a member of a community could have a Bio::SeqI attached to it as well as a Bio::Taxon, etc... > I do think it should be developed on its own, per our recent discussions re: slimming down core. Yes, the features are so different that it makes sense to have the Bio::Community::* modules as a separate BioPerl distribution (like the Bio-FeatureIO BioPerl distribution). > Re: using Moose and BioPerl, from experience it's a little tricky integrating the two as BioPerl uses perl's simple OO system while Moose allows much more (though it has its own issues, memory and speed being the top two). Specifying inheritance from a BioPerl module is a bit trickier, you'll have to wrap the module via MooseX::NonMoose. Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine. Beyond Bio::RootI, I cannot imagine that any of the Bio::Community::* modules would need to inherit from any other BioPerl modules. Considering this and the performance aspects of Moose, do you think that using Moose is a wise design decision?
Best, Florent > chris > > On Nov 27, 2011, at 10:35 PM, Florent Angly wrote: > >> Hi all, >> >> I have been thinking about starting a set of Perl modules that would useful for (microbial) ecologists to represent communities of organisms. At the moment, there does not seem to be anything like this in Bioperl. I am happy to make these modules available under the Bioperl umbrella using the Bio::Community::* namespace. >> >> I envision the following modules: >> * Bio::Community::Member module representing members of a community. >> * Bio::Community::IO modules to read/write files that describe community composition (a.k.a. OTU table, or site by species table) as used programs like QIIME, Pyrotagger, GAAS, ... >> * Bio::Community::Tools modules to help manipulate communities, e.g. to take some members at random, normalize the community to a given number of individuals, or do rarefaction curves. >> >> The idea is to implement these modules in Moose to teach myself Moose. The members of a community could be a sequence (Bio::SeqI), a species (Bio::S), an arbitrary string or even other things. I am not quite sure if Bioperl provide facilities to attach some arbitrary information to an object. >> >> Any interest? Ideas? Comments? >> >> Thanks, >> >> Florent >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Nov 29 00:32:50 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 29 Nov 2011 05:32:50 +0000 Subject: [Bioperl-l] Interest in Bio::Community modules In-Reply-To: References: <4ED30F93.4000407@gmail.com> Message-ID: On Nov 28, 2011, at 2:25 PM, Leon Timmermans wrote: > And now to the list too, > > On Mon, Nov 28, 2011 at 5:35 AM, Florent Angly wrote: > >> The idea is to implement these modules in Moose to teach myself Moose. 
The >> members of a community could be a sequence (Bio::SeqI), a species (Bio::S), >> an arbitrary string or even other things. I am not quite sure if Bioperl >> provide facilities to attach some arbitrary information to an object. >> >> Any interest? Ideas? Comments? >> > > Sounds like a good use-case for roles, maybe even parametric roles. > > Leon Yep, agree totally. It would be a good replacement in most cases for the BioI interfaces. (See also the Biome project, which I'm slooooooowly working on again, on GitHub.) chris From pmr at ebi.ac.uk Tue Nov 29 08:39:52 2011 From: pmr at ebi.ac.uk (Peter Rice) Date: Tue, 29 Nov 2011 13:39:52 +0000 Subject: [Bioperl-l] BinarySearch.pm Message-ID: <4ED4E0A8.30102@ebi.ac.uk> In trying to use bioflat_index.pl index files in EMBOSS, I ran into some problems. Both appear to be in the Bio/Flat/BinarySearch.pm source file. EMBL ID lines are failing to drop the ';' from the ID. Updating the regular expression to make sure the ';' is not picked up seems to work: if ($format =~ /embl/i) { return ('ID', "^ID (\\S+[^; ])", "^ID (\\S+[^; ])", { ACC => q/^AC (\S+);/, VERSION => q/^SV\s+(\S+)/ }); } The ACC secondary index has every record duplicated. This line is duplicated in the write_secondary_indices source code. Is that intentional? print $fh sprintf("%-${length}s",$record); regards, Peter Rice EMBOSS Team From uni.anastasia at gmail.com Sat Nov 26 12:32:48 2011 From: uni.anastasia at gmail.com (anastsia shapiro) Date: Sat, 26 Nov 2011 19:32:48 +0200 Subject: [Bioperl-l] Problem with parsing blast results Message-ID: Hello, I'm running a script that should parse BLAST results using Bio::SearchIO. Sometimes the script works fine; however, sometimes it stops and I receive the following error.
------------- EXCEPTION ------------- MSG: no data for midline Query ------------------------------------------------------------ STACK Bio::SearchIO::blast::next_result C:/Perl64/site/lib/Bio\SearchIO\blast.pm:1805 STACK toplevel D:\D\uni-anastasia\project\scripts\ParsingBlastResults\ParsingBlastResults.pl:36 ------------------------------------- The BLAST result files were produced by running the following command: blastn -task blastn -db xxxxxxxxx.txt -evalue 1e-10 -perc_identity 80 -dust no -num_descriptions 0 -query xxxxx.txt -out results.txt -strand plus I am using BioPerl 1.6.1. I have read all the forums, and it seems to be a bug, but one that was reportedly fixed in version 1.5. I would really appreciate your help, since I have been trying to understand the problem for over a month. Regards, Anastasia From bunk at novozymes.com Tue Nov 29 11:46:54 2011 From: bunk at novozymes.com (Jacob Bunk Nielsen) Date: Tue, 29 Nov 2011 17:46:54 +0100 Subject: [Bioperl-l] Problem with parsing blast results In-Reply-To: (anastsia shapiro's message of "Sat, 26 Nov 2011 18:32:48 +0100") References: Message-ID: <77sjl698qp.fsf@spurv.nzcorp.net> Hi anastsia shapiro writes: > I'm running a script that should parse a blast results, using searchIO. > > Sometimes the script work fines, however sometimes it stops, and I receive > the following error.
> > ------------- EXCEPTION ------------- > MSG: no data for midline Query > ------------------------------------------------------------ > STACK Bio::SearchIO::blast::next_result C:/Perl64/site/lib/Bio\SearchIO\ > blast.pm:1805 > STACK toplevel > D:\D\uni-anastasia\project\scripts\ParsingBlastResults\ParsingBlastResults.pl:36 > ------------------------------------- > While the blast results files were received as a result of running the > following blast command: > blastn -task blastn -db xxxxxxxxx.txt -evalue 1e-10 -perc_identity 80 -dust > no -num_descriptions 0 -query xxxxx.txt -out results.txt -strand plus I don't know why this exact problem arises, but I think you should consider using an output format that is easier to parse by machine, like the XML format. With the BLAST+ blastn used above, you select XML output with -outfmt 5 (-m 7 is the equivalent option of the legacy blastall program). When reading the result with Bioperl you must specify -format => 'blastxml' for Bio::SearchIO. That way I think you are likely to see far fewer problems when parsing blast output. If the above doesn't solve the problem, you had better show us the code that fails. Best regards Jacob From cjfields at illinois.edu Tue Nov 29 14:11:11 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 29 Nov 2011 19:11:11 +0000 Subject: [Bioperl-l] Interest in Bio::Community modules In-Reply-To: <4ED42E6C.6020501@gmail.com> References: <4ED30F93.4000407@gmail.com> <3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu> <4ED42E6C.6020501@gmail.com> Message-ID: <4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu> On Nov 28, 2011, at 6:59 PM, Florent Angly wrote: > Hi Chris, > > On 29/11/11 05:47, Fields, Christopher J wrote: > ... >> Re: using Moose and BioPerl, from experience it's a little tricky integrating the two as BioPerl uses perl's simple OO system while Moose allows much more (though it has its own issues, memory and speed being the top two).
Specifying inheritance from a BioPerl module is a bit trickier, you'll have to wrap the module via MooseX::NonMoose. Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine. > Beyond Bio::RootI, I cannot imagine that any of the Bio::Community::* modules would need to inherit from any other BioPerl modules. Considering this and the performance aspects of Moose, do you think that using Moose is a wise design decision? Moose can actually help in some circumstances; flattening the inheritance hierarchy by using roles seems to help. And it never hurts to learn something new like Moose and other modern perl niceties. > Best, > > Florent chris From cjfields at illinois.edu Tue Nov 29 17:30:58 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 29 Nov 2011 22:30:58 +0000 Subject: [Bioperl-l] BinarySearch.pm In-Reply-To: <4ED4E0A8.30102@ebi.ac.uk> References: <4ED4E0A8.30102@ebi.ac.uk> Message-ID: <6F926A89-3B07-4924-8CC4-68A027E7FFCE@illinois.edu> Peter, Can you send a test file that is failing? I added a few tests using an example file with a ';' in the ID line, but everything is passing with our other EMBL example files. I'm also looking into adding a method to return secondary IDs for a specific type ('ACC' for instance) so we can test the repeat issue for accessions. Both changes pass tests as is, though, so I have committed them in the meantime. chris On Nov 29, 2011, at 7:39 AM, Peter Rice wrote: > In trying to use bioflat_index.pl index files in EMBOSS, I ran into some problems. > > Both appear to be in the Bio/Flat/BinarySearch.pm source file. > > EMBL ID lines are failing to drop the ';' from the ID. 
Updating the regular expression to make sure the ';' is not picked up seems to work: > > if ($format =~ /embl/i) { > return ('ID', > "^ID (\\S+[^; ])", > "^ID (\\S+[^; ])", > { > ACC => q/^AC (\S+);/, > VERSION => q/^SV\s+(\S+)/ > }); > } > > The ACC secondary index has every record duplicated. > This line is duplicated in the write_secondary_indices source code. Is that intentional? > > print $fh sprintf("%-${length}s",$record); > > regards, > > Peter Rice > EMBOSS Team > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From florent.angly at gmail.com Tue Nov 29 20:18:41 2011 From: florent.angly at gmail.com (Florent Angly) Date: Wed, 30 Nov 2011 11:18:41 +1000 Subject: [Bioperl-l] Interest in Bio::Community modules In-Reply-To: <4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu> References: <4ED30F93.4000407@gmail.com> <3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu> <4ED42E6C.6020501@gmail.com> <4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu> Message-ID: <4ED58471.3030106@gmail.com> Chris, Yes, it is exciting to learn something new. I have developed a bit of code in the last few days in my local git repository. Do you think you could create a repository for Bio-Community on the Bioperl Github space or is it too soon? Cheers, Florent On 30/11/11 05:11, Fields, Christopher J wrote: > On Nov 28, 2011, at 6:59 PM, Florent Angly wrote: > >> Hi Chris, >> >> On 29/11/11 05:47, Fields, Christopher J wrote: >> ... >>> Re: using Moose and BioPerl, from experience it's a little tricky integrating the two as BioPerl uses perl's simple OO system while Moose allows much more (though it has it's own issues, memory and speed being the top two). Specifying inheritance from a BioPerl module is a bit trickier, you'll have to wrap the module via MooseX::NonMoose. 
Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine. >> Beyond Bio::RootI, I cannot imagine that any of the Bio::Community::* modules would need to inherit from any other BioPerl modules. Considering this and the performance aspects of Moose, do you think that using Moose is a wise design decision? > Moose can actually help in some circumstances; flattening the inheritance hierarchy by using roles seems to help. And it never hurts to learn something new like Moose and other modern perl niceties. > >> Best, >> >> Florent > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Nov 29 21:34:00 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 30 Nov 2011 02:34:00 +0000 Subject: [Bioperl-l] Interest in Bio::Community modules In-Reply-To: <4ED58471.3030106@gmail.com> References: <4ED30F93.4000407@gmail.com> <3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu> <4ED42E6C.6020501@gmail.com> <4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu> <4ED58471.3030106@gmail.com> Message-ID: On Nov 29, 2011, at 7:18 PM, Florent Angly wrote: > Chris, > Yes, it is exciting to learn something new. > I have developed a bit of code in the last few days in my local git repository. Do you think you could create a repository for Bio-Community on the Bioperl Github space or is it too soon? It's up to you. I set up the barebones repo and added you on to push/pull/admin, you should be able to push to it whenever you are ready: https://github.com/bioperl/Bio-Community chris > Cheers, > Florent > > On 30/11/11 05:11, Fields, Christopher J wrote: >> On Nov 28, 2011, at 6:59 PM, Florent Angly wrote: >> >>> Hi Chris, >>> >>> On 29/11/11 05:47, Fields, Christopher J wrote: >>> ... 
>>>> Re: using Moose and BioPerl, from experience it's a little tricky integrating the two as BioPerl uses perl's simple OO system while Moose allows much more (though it has it's own issues, memory and speed being the top two). Specifying inheritance from a BioPerl module is a bit trickier, you'll have to wrap the module via MooseX::NonMoose. Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine. >>> Beyond Bio::RootI, I cannot imagine that any of the Bio::Community::* modules would need to inherit from any other BioPerl modules. Considering this and the performance aspects of Moose, do you think that using Moose is a wise design decision? >> Moose can actually help in some circumstances; flattening the inheritance hierarchy by using roles seems to help. And it never hurts to learn something new like Moose and other modern perl niceties. >> >>> Best, >>> >>> Florent >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From florent.angly at gmail.com Tue Nov 29 21:50:04 2011 From: florent.angly at gmail.com (Florent Angly) Date: Wed, 30 Nov 2011 12:50:04 +1000 Subject: [Bioperl-l] Interest in Bio::Community modules In-Reply-To: References: <4ED30F93.4000407@gmail.com> <3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu> <4ED42E6C.6020501@gmail.com> <4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu> <4ED58471.3030106@gmail.com> Message-ID: <4ED599DC.6090808@gmail.com> Fantastic! Thank you very much Chris, Florent On 30/11/11 12:34, Fields, Christopher J wrote: > On Nov 29, 2011, at 7:18 PM, Florent Angly wrote: > >> Chris, >> Yes, it is exciting to learn something new. 
>> I have developed a bit of code in the last few days in my local git repository. Do you think you could create a repository for Bio-Community on the Bioperl Github space or is it too soon? > It's up to you. I set up the barebones repo and added you on to push/pull/admin, you should be able to push to it whenever you are ready: > > https://github.com/bioperl/Bio-Community > > chris > > >> Cheers, >> Florent >> >> On 30/11/11 05:11, Fields, Christopher J wrote: >>> On Nov 28, 2011, at 6:59 PM, Florent Angly wrote: >>> >>>> Hi Chris, >>>> >>>> On 29/11/11 05:47, Fields, Christopher J wrote: >>>> ... >>>>> Re: using Moose and BioPerl, from experience it's a little tricky integrating the two as BioPerl uses perl's simple OO system while Moose allows much more (though it has it's own issues, memory and speed being the top two). Specifying inheritance from a BioPerl module is a bit trickier, you'll have to wrap the module via MooseX::NonMoose. Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine. >>>> Beyond Bio::RootI, I cannot imagine that any of the Bio::Community::* modules would need to inherit from any other BioPerl modules. Considering this and the performance aspects of Moose, do you think that using Moose is a wise design decision? >>> Moose can actually help in some circumstances; flattening the inheritance hierarchy by using roles seems to help. And it never hurts to learn something new like Moose and other modern perl niceties. 
>>>
>>>> Best,
>>>>
>>>> Florent
>>> chris
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From lsbrath at gmail.com Wed Nov 30 00:25:32 2011
From: lsbrath at gmail.com (Mgavi Brathwaite)
Date: Wed, 30 Nov 2011 00:25:32 -0500
Subject: [Bioperl-l] Exception MSG
Message-ID:

Hello,

Brushing up on my BioPerl and I can't figure out this MSG:

------------- EXCEPTION -------------
MSG: cannot open >/Users/mydata/Desktop/gi|255572219|ref|XP_002527049|.out
STACK Bio::Tools::Run::RemoteBlast::save_output /Library/Perl/5.10.0/Bio/Tools/Run/RemoteBlast.pm:678
STACK toplevel /Users/mydata/Documents/workspace/BI7643/rb_ex.pl:40
-------------------------------------

Here is the code:

#!/usr/bin/perl -w
use strict;
use Bio::Tools::Run::RemoteBlast;

#=cut
my $prog = 'blastp';
my $db = 'swissprot';
my $e_val = '1e-10';

my @params = ('-prog' => $prog,
              '-data' => $db,
              'expect' => $e_val,
              'readmethod' => 'SearchIO' );
my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

#human database
$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens [ORGN]';

my $v =1; # this is just to turn on and off the messages

# Construct the sequence object
my $seq_in = Bio::SeqIO->new(-file => "/Users/mydata/Desktop/rb_ex.fa",
                             -format => "fasta");

while (my $input = $seq_in->next_seq()){
    my $r = $factory->submit_blast($input);
    print STDERR "waiting..." if ($v > 0);
    while (my @rids = $factory->each_rid()){
        foreach my $rid (@rids){
            my $rc = $factory->retrieve_blast($rid);
            if( !ref($rc) ) {
                if($rc < 0){
                    $factory->remove_rid($rid);
                }
                print STDERR "." if ($v > 0);
                sleep 5;
            } else {
                my $result = $rc->next_result();
                #save output
                my $filename = ">/Users/mydata/Desktop/".$result->query_name().".out";#error
                $factory->save_output($filename);
                $factory->remove_rid($rid);
                print "\nQuery Name: ", $result->query_name(), "\n";
                while ( my $hit = $result->next_hit ) {
                    next unless ( $v > 0);
                    print "\thit name is ", $hit->name, "\n";
                    while( my $hsp = $hit->next_hsp ) {
                        print "\t\tscore is ", $hsp->score, "\n";
                    }
                }
            }
        }
    }
}

Thanks for the help!

From jason.stajich at gmail.com Wed Nov 30 01:05:41 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Tue, 29 Nov 2011 22:05:41 -0800
Subject: [Bioperl-l] Exception MSG
In-Reply-To:
References:
Message-ID: <50C532E3-5212-49CE-A3CA-9FE91BDC00CF@gmail.com>

I don't think you need to give it the '>' when you specify the filename for the output. That is added when the filehandle is opened.

On Nov 29, 2011, at 9:25 PM, Mgavi Brathwaite wrote:

> Hello,
>
> Brushing up on my BioPerl and I can't figure out this MSG:
>
> ------------- EXCEPTION -------------
>
> MSG: cannot open >/Users/mydata/Desktop/gi|255572219|ref|XP_002527049|.out
>
> STACK Bio::Tools::Run::RemoteBlast::save_output
> /Library/Perl/5.10.0/Bio/Tools/Run/RemoteBlast.pm:678
>
> STACK toplevel /Users/mydata/Documents/workspace/BI7643/rb_ex.pl:40
>
> -------------------------------------
> Here is the code:
>
> #!/usr/bin/perl -w
>
> use strict;
>
> use Bio::Tools::Run::RemoteBlast;
>
>
> #=cut
>
> my $prog = 'blastp';
>
> my $db = 'swissprot';
>
> my $e_val = '1e-10';
>
>
> my @params = ('-prog' => $prog,
>
> '-data' => $db,
>
> 'expect' => $e_val,
>
> 'readmethod' => 'SearchIO' );
>
> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>
>
> #human database
>
> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens
> [ORGN]';
>
>
> my $v =1; # this is just to turn on and off the messages
>
> # Construct the sequence object
>
> my $seq_in = Bio::SeqIO->new(-file => "/Users/mydata/Desktop/rb_ex.fa",
-format > => "fasta"); > > > while (my $input = $seq_in->next_seq()){ > > my $r = $factory->submit_blast($input); > > print STDERR "waiting..." if ($v > 0); > > while (my @rids = $factory->each_rid()){ > > foreach my $rid (@rids){ > > my $rc = $factory->retrieve_blast($rid); > > if( !ref($rc) ) { > > if($rc < 0){ > > $factory->remove_rid($rid); > > } > > print STDERR "." if ($v > 0); > > sleep 5; > > } else { > > my $result = $rc->next_result(); > > #save output > > my $filename = ">/Users/mydata/Desktop/".$result->query_name().".out";#error > > $factory->save_output($filename); > > $factory->remove_rid($rid); > > print "\nQuery Name: ", $result->query_name(), "\n"; > > while ( my $hit = $result->next_hit ) { > > next unless ( $v > 0); > > print "\thit name is ", $hit->name, "\n"; > > while( my $hsp = $hit->next_hsp ) { > > print "\t\tscore is ", $hsp->score, "\n"; > > } > > } > > } > > } > > } > > } > > > > Thanks for the help! > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From ss2489 at cornell.edu Wed Nov 30 09:32:47 2011 From: ss2489 at cornell.edu (Surya Saha) Date: Wed, 30 Nov 2011 09:32:47 -0500 Subject: [Bioperl-l] Exception MSG In-Reply-To: <50C532E3-5212-49CE-A3CA-9FE91BDC00CF@gmail.com> References: <50C532E3-5212-49CE-A3CA-9FE91BDC00CF@gmail.com> Message-ID: If that does not fix it, try using one of the unique identifiers as the file name (gi??) instead of the full query name. The pipe(|) characters might cause problems. On Wed, Nov 30, 2011 at 1:05 AM, Jason Stajich wrote: > I don't think you need to give it the '>' when you specify the filename > for the output. That is done by the filehandle opening itsself. 
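
A minimal sketch of the advice in this thread, assuming the goal is one output file per query: pass save_output a plain path with no leading '>', and replace characters such as '|' that are awkward in file names. The helper safe_output_name is hypothetical and not part of BioPerl; save_output and query_name are the calls already used in the posted script.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): turn a query name such as
# "gi|255572219|ref|XP_002527049|" into a file name that contains no
# shell-special characters and no leading '>'.
sub safe_output_name {
    my ($query_name) = @_;
    my $base = defined $query_name ? $query_name : '';
    $base =~ s/[^A-Za-z0-9._-]+/_/g;   # replace runs of unusual characters with '_'
    $base =~ s/^_+|_+$//g;             # trim leading and trailing '_'
    $base = 'query' if $base eq '';    # fall back when nothing usable is left
    return "$base.out";
}

# Assumed usage inside the retrieval loop of the posted script:
#   my $filename = "/Users/mydata/Desktop/" . safe_output_name($result->query_name());
#   $factory->save_output($filename);
print safe_output_name('gi|255572219|ref|XP_002527049|'), "\n";
```

Keeping the sanitising step in one small function means the directory and the extension can be changed without touching the BLAST retrieval loop.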
> > On Nov 29, 2011, at 9:25 PM, Mgavi Brathwaite wrote: > > > Hello, > > > > Brushing up on my BioPerl and I can't figure out this MSG: > > > > ------------- EXCEPTION ------------- > > > > MSG: cannot open > >/Users/mydata/Desktop/gi|255572219|ref|XP_002527049|.out > > > > STACK Bio::Tools::Run::RemoteBlast::save_output > > /Library/Perl/5.10.0/Bio/Tools/Run/RemoteBlast.pm:678 > > > > STACK toplevel /Users/mydata/Documents/workspace/BI7643/rb_ex.pl:40 > > > > ------------------------------------- > > Here is the code: > > > > #!/usr/bin/perl -w > > > > use strict; > > > > use Bio::Tools::Run::RemoteBlast; > > > > > > #=cut > > > > my $prog = 'blastp'; > > > > my $db = 'swissprot'; > > > > my $e_val = '1e-10'; > > > > > > my @params = ('-prog' => $prog, > > > > '-data' => $db, > > > > 'expect' => $e_val, > > > > 'readmethod' => 'SearchIO' ); > > > > my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > > > > > > #human database > > > > $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens > > [ORGN]'; > > > > > > my $v =1; # this is just to turn on and off the messages > > > > # Construct the sequence object > > > > my $seq_in = Bio::SeqIO->new(-file => "/Users/mydata/Desktop/rb_ex.fa", > -format > > => "fasta"); > > > > > > while (my $input = $seq_in->next_seq()){ > > > > my $r = $factory->submit_blast($input); > > > > print STDERR "waiting..." if ($v > 0); > > > > while (my @rids = $factory->each_rid()){ > > > > foreach my $rid (@rids){ > > > > my $rc = $factory->retrieve_blast($rid); > > > > if( !ref($rc) ) { > > > > if($rc < 0){ > > > > $factory->remove_rid($rid); > > > > } > > > > print STDERR "." 
if ($v > 0); > > > > sleep 5; > > > > } else { > > > > my $result = $rc->next_result(); > > > > #save output > > > > my $filename = > ">/Users/mydata/Desktop/".$result->query_name().".out";#error > > > > $factory->save_output($filename); > > > > $factory->remove_rid($rid); > > > > print "\nQuery Name: ", $result->query_name(), "\n"; > > > > while ( my $hit = $result->next_hit ) { > > > > next unless ( $v > 0); > > > > print "\thit name is ", $hit->name, "\n"; > > > > while( my $hsp = $hit->next_hsp ) { > > > > print "\t\tscore is ", $hsp->score, "\n"; > > > > } > > > > } > > > > } > > > > } > > > > } > > > > } > > > > > > > > Thanks for the help! > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From lsbrath at gmail.com Wed Nov 30 09:34:52 2011 From: lsbrath at gmail.com (Mgavi Brathwaite) Date: Wed, 30 Nov 2011 09:34:52 -0500 Subject: [Bioperl-l] Exception MSG In-Reply-To: References: <50C532E3-5212-49CE-A3CA-9FE91BDC00CF@gmail.com> Message-ID: Surya, As Jason suggested, I removed the '>' and it worked. Thanks for your response. Lom On Wed, Nov 30, 2011 at 9:32 AM, Surya Saha wrote: > If that does not fix it, try using one of the unique identifiers as the > file name (gi??) instead of the full query name. The pipe(|) characters > might cause problems. > > On Wed, Nov 30, 2011 at 1:05 AM, Jason Stajich wrote: > >> I don't think you need to give it the '>' when you specify the filename >> for the output. That is done by the filehandle opening itsself. 
>> >> On Nov 29, 2011, at 9:25 PM, Mgavi Brathwaite wrote: >> >> > Hello, >> > >> > Brushing up on my BioPerl and I can't figure out this MSG: >> > >> > ------------- EXCEPTION ------------- >> > >> > MSG: cannot open >> >/Users/mydata/Desktop/gi|255572219|ref|XP_002527049|.out >> > >> > STACK Bio::Tools::Run::RemoteBlast::save_output >> > /Library/Perl/5.10.0/Bio/Tools/Run/RemoteBlast.pm:678 >> > >> > STACK toplevel /Users/mydata/Documents/workspace/BI7643/rb_ex.pl:40 >> > >> > ------------------------------------- >> > Here is the code: >> > >> > #!/usr/bin/perl -w >> > >> > use strict; >> > >> > use Bio::Tools::Run::RemoteBlast; >> > >> > >> > #=cut >> > >> > my $prog = 'blastp'; >> > >> > my $db = 'swissprot'; >> > >> > my $e_val = '1e-10'; >> > >> > >> > my @params = ('-prog' => $prog, >> > >> > '-data' => $db, >> > >> > 'expect' => $e_val, >> > >> > 'readmethod' => 'SearchIO' ); >> > >> > my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >> > >> > >> > #human database >> > >> > $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens >> > [ORGN]'; >> > >> > >> > my $v =1; # this is just to turn on and off the messages >> > >> > # Construct the sequence object >> > >> > my $seq_in = Bio::SeqIO->new(-file => "/Users/mydata/Desktop/rb_ex.fa", >> -format >> > => "fasta"); >> > >> > >> > while (my $input = $seq_in->next_seq()){ >> > >> > my $r = $factory->submit_blast($input); >> > >> > print STDERR "waiting..." if ($v > 0); >> > >> > while (my @rids = $factory->each_rid()){ >> > >> > foreach my $rid (@rids){ >> > >> > my $rc = $factory->retrieve_blast($rid); >> > >> > if( !ref($rc) ) { >> > >> > if($rc < 0){ >> > >> > $factory->remove_rid($rid); >> > >> > } >> > >> > print STDERR "." 
if ($v > 0);
>> >
>> > sleep 5;
>> >
>> > } else {
>> >
>> > my $result = $rc->next_result();
>> >
>> > #save output
>> >
>> > my $filename =
>> ">/Users/mydata/Desktop/".$result->query_name().".out";#error
>> >
>> > $factory->save_output($filename);
>> >
>> > $factory->remove_rid($rid);
>> >
>> > print "\nQuery Name: ", $result->query_name(), "\n";
>> >
>> > while ( my $hit = $result->next_hit ) {
>> >
>> > next unless ( $v > 0);
>> >
>> > print "\thit name is ", $hit->name, "\n";
>> >
>> > while( my $hsp = $hit->next_hsp ) {
>> >
>> > print "\t\tscore is ", $hsp->score, "\n";
>> >
>> > }
>> >
>> > }
>> >
>> > }
>> >
>> > }
>> >
>> > }
>> >
>> > }
>> >
>> >
>> >
>> > Thanks for the help!
>> > _______________________________________________
>> > Bioperl-l mailing list
>> > Bioperl-l at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>

From ericdemuinck at gmail.com Wed Nov 30 18:36:36 2011
From: ericdemuinck at gmail.com (Ericde)
Date: Wed, 30 Nov 2011 15:36:36 -0800 (PST)
Subject: [Bioperl-l] retrieving blast multiple alignment in fasta form
Message-ID: <32886592.post@talk.nabble.com>

:-/ I am a newbie and I am trying to retrieve a blast multiple alignment in fasta form.

The BLAST output (-m 2) gives several alignments (which is good), and parsing the XML file seems to list all of these alignments (which is also good).

The problem is that the fasta alignment file only includes one of the hits, and the alignment does not include all the sequences (including the query sequence).

I would like to generate a fasta file that includes all the alignments included in the -m 2 output (plus the query sequence if possible).

I have cobbled together a script (below). I will attach the sample blast XML file and the (-m 2) file as well. Any insight is appreciated :/

#module load perl
#give the name of the blast xml file to parse in the line where it says 'file =>'
use Bio::SearchIO;
#Use -m 7 to generate xml file from blastall
my $in = new Bio::SearchIO(-format => 'blastxml', -file => 'BLASToutxml');
while( my $result = $in->next_result ) {
    ## $result is a Bio::Search::Result::ResultI compliant object
    while( my $hit = $result->next_hit ) {
        ## $hit is a Bio::Search::Hit::HitI compliant object
        while( my $hsp = $hit->next_hsp ) {
            ## $hsp is a Bio::Search::HSP::HSPI compliant object
            #ENTER desired sequence length
            if( $hsp->length('total') > 50 ) {
                #ENTER desired percent identity
                if ( $hsp->percent_identity >= 75 ) {
                    print "Query=", $result->query_name,
                          " Hit=", $hit->name,
                          " Length=", $hsp->length('total'),
                          " Percent_id=", $hsp->percent_identity, "\n";
                    #Print alignment to file
                    #$aln will be a Bio::SimpleAlign object
                    use Bio::AlignIO;
                    my $aln = $hsp->get_aln;
                    #changed msf to fasta and hsp.msf to hsp.fas, output is now a fasta file
                    my $alnIO = Bio::AlignIO->new(-format =>"fasta", -file => ">hsp.fas");
                    $alnIO->write_aln($aln);
                }
            }
        }
    }
}

http://old.nabble.com/file/p32886592/BLASToutxml BLASToutxml
http://old.nabble.com/file/p32886592/hsp.fas hsp.fas
--
View this message in context: http://old.nabble.com/retrieving-blast-multiple-alignment-in-fasta-form-tp32886592p32886592.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From hrh at fmi.ch Tue Nov 1 06:18:54 2011
From: hrh at fmi.ch (Hotz, Hans-Rudolf)
Date: Tue, 1 Nov 2011 11:18:54 +0100
Subject: [Bioperl-l] best way to edit sequence features
In-Reply-To:
Message-ID:

Hi Carnë

Please allow me to make a few comments:

I very much like your idea of writing a free tool to edit and draw sequences.
We (ie people working in core Bioinformatics facilities) all suffer from having to deal with files originally created with commercial packages. And on top of all the pain, those commercial packages are very expensive and they don't deliver what they promise to do. Just double checking: Have you looked a the free tools which are available? I am aware of the following ones (as far as I know, they are all GUI based and don't have a command line API): Serial Cloner http://serialbasics.free.fr/Serial_Cloner.html GENtle http://gentle.magnusmanske.de/ GeneCoder http://www.algosome.com/gene-coder/gene-coder.html pDRAW32 http://www.acaclone.com/ Genome Workbench http://www.ncbi.nlm.nih.gov/projects/gbench/ Ape http://www.biology.utah.edu/jorgensen/wayned/ape/> UGene http://ugene.unipro.ru/ maybe others on the list know of even better free tools? Also, have you looked at the emboss tool "cirdna" ? WRT file formats: I strongly suggest to stick to embl and genbank format as input and (text) output format. The features are not indexed, but you can create your own when you store the sequences in your system. Internally, you probably wanna keep the data in a 'simpler' format than embl or genbank, anyway. Alternatively, have you looked at gff/gtf as away of getting features? see: http://www.sequenceontology.org/gff3.shtml http://mblab.wustl.edu/GTF22.html I am looking forward to any progress you make Regards, Hans Hans-Rudolf Hotz, PhD Bioinformatics Support Friedrich Miescher Institute for Biomedical Research Maulbeerstrasse 66 4058 Basel/Switzerland On 10/31/11 7:05 PM, "Carn? Draug" wrote: > Hi > > I've been planning on writing a free (as in freedom) tool to edit > sequences and make plamids maps. The idea is to build the command line > tool first and maybe later work on a GUI for it. > > The problem I foresee at the moment while designing it, is how to > change a feature of the sequence. 
I'm not familiar with all sequence > formats (only fasta, ensembl and genbank) but I can't see how to > specify from the command line what feature to edit since I can't see > any unique identifiers for them. Is there a file format that makes > this easier? Any tips would be most appreciated. > > Thank in advance, > Carn? Draug > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Nov 1 09:40:30 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 1 Nov 2011 13:40:30 +0000 Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual In-Reply-To: References: Message-ID: <26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu> On Oct 24, 2011, at 9:58 AM, Sofia Robb wrote: > Hi, > > I am having problems running Bio::Index::Fastq. I get the following error when a quality line begins with '@'. > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: No description line parsed > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:368 > STACK: Bio::SeqIO::fastq::next_dataset /usr/share/perl5/Bio/SeqIO/fastq.pm:71 > STACK: Bio::SeqIO::fastq::next_seq /usr/share/perl5/Bio/SeqIO/fastq.pm:29 > STACK: Bio::Index::AbstractSeq::fetch /usr/share/perl5/Bio/Index/AbstractSeq.pm:147 > STACK: Bio::Index::AbstractSeq::get_Seq_by_id /usr/share/perl5/Bio/Index/AbstractSeq.pm:198 > STACK: /home_stajichlab/robb/bin/clean_pairs_indexed.pl:68 > > > Here is an example of a fastq record that is causing this error, The last line which starts with an '@' is actually the qual line. > > @5:105:15806:16092:Y > GTGGCGCGGAACAGAGGAGGAATGTTCAGGAGAGGGGGCATGTGTTGTTACCGAGTACTTGGAAACGACG > + > @9;A565:=8B? 
> > > i see that chris has partially addressed this in the mailing list > http://bioperl.org/pipermail/bioperl-l/2011-January/034481.html > > However as he pointed out at the time, it appears this may be a fairly large problem. The indexer is being refactored to address this problem; the Bio::SeqIO parser actually does parse this, but the (very simple) indexer does not. I can try to push this to the forefront this week, the fix shouldn't be too hard to implement. In essence it would simply use a few SeqIO methods I built in to parse out each bit of data in chunks; would just need to track the start and length of each chunk while the parser is running. > My fastq seq and qual lines are alway only one line, so I think that adding a line count and only checking for @ in the lines that $line_count%4 ==0 would work since the header lines are always the first of 4 lines , 0,4,8, etc. That doesn't work for all cases, however (some FASTQ wraps the seq and qual, like FASTA). Peter and I have discussed this elsewhere; a possible solutions is to add in an optimized parser that takes this assumption into account. One problem the various Bio* indexers have currently is the lack of standardization on a specific schema for indexing. There are in-roads towards this (OBDA) that haven't been adequately traveled IMHO, which need to be taken up again. A second, and maybe this is more specific to BioPerl, is that the parsers and indexers essentially reimplement the format parsing in each module, so if there are bugs they have to be independently fixed (hence why SeqIO works and the indexer doesn't; I wrote the first but not the second). The best place for any optimizations would be in a unified parser that both the SeqIO and indexer modules could use. 
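
A minimal sketch of the line-count idea discussed above, written in plain Perl rather than as a patch to Bio::Index::Fastq: it records the byte offset and length of each record and only looks for '@' on the first line of every four, so a quality line that begins with '@' is not mistaken for a header. It assumes unwrapped, four-line records (which, as noted, does not hold for all FASTQ files), and the names index_fastq_4line and fetch_fastq_record are hypothetical.

```perl
use strict;
use warnings;

# Build an index of ID => [byte offset, byte length] for a FASTQ file.
# Assumption: every record is exactly four lines (header, seq, '+', qual).
sub index_fastq_4line {
    my ($path) = @_;
    open my $fh, '<', $path or die "cannot open $path: $!";
    binmode $fh;                        # keep byte offsets exact on all platforms
    my %index;
    while (1) {
        my $start  = tell $fh;          # offset of the '@' that starts the record
        my $header = <$fh>;
        last unless defined $header;    # end of file
        next if $header =~ /^\s*$/;     # tolerate blank lines between records
        my $seq  = <$fh>;
        my $plus = <$fh>;
        my $qual = <$fh>;
        die "truncated record at byte $start\n" unless defined $qual;
        die "bad header at byte $start\n"       unless $header =~ /^\@(\S+)/;
        my $id = $1;
        die "bad separator at byte $start\n"    unless $plus =~ /^\+/;
        # The qual line is never tested for '@', so '@9;A565:=8B?' is harmless.
        $index{$id} = [ $start, tell($fh) - $start ];
    }
    close $fh;
    return \%index;
}

# Return the raw four-line record for an ID, or nothing if the ID is unknown.
sub fetch_fastq_record {
    my ($path, $index, $id) = @_;
    my $entry = $index->{$id} or return;
    open my $fh, '<', $path or die "cannot open $path: $!";
    binmode $fh;
    seek $fh, $entry->[0], 0;           # 0 = SEEK_SET
    read $fh, my $record, $entry->[1];
    close $fh;
    return $record;
}

# Assumed usage:
#   my $idx = index_fastq_4line('reads.fastq');
#   print fetch_fastq_record('reads.fastq', $idx, '5:105:15806:16092:Y');
```

Storing both the start and the length of each record follows the "track the start and length of each chunk" approach mentioned above; a parser for wrapped FASTQ would need to compare sequence and quality lengths instead of counting lines.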
> But if there are multiple lines of seq and qual i think that the /^\+$/ or /^\+$id/ can be used to identify the end of the sequence and the number of lines of quality should be equal to the number of lines of sequence
>
>
> ## only for single line seq and qual
> my $line_count = 0;
> while (<$FASTQ>) {
>     if (/^@/ and $line_count % 4 == 0) {
>         # $begin is the position of the first character after the '@'
>         my $begin = tell($FASTQ) - length( $_ ) + 1;
>         foreach my $id (&$id_parser($_)) {
>             $self->add_record($id, $i, $begin);
>             $c++;
>         }
>     }
>     $line_count++;
> }
>
>
> --
> BioPerl fastq parsing issues aside, is there another tool which allows you to retrieve arbitrary sequences from a fastq file by sequence ID?
>
> There's one called cdbfasta which looks like it might work; does anyone have experience with it?

I haven't, but it appears FASTA-specific. Does it parse FASTQ as well? I recall Sanger has a C-based FASTQ/FASTQ hybrid one as well. May have to look that one up.

> Thanks,
> sofia
>
> P.S. I am CCing Peter Cock in case BioPython has solved this issue already; if so, perhaps their solution could be applied here.

chris

From p.j.a.cock at googlemail.com Tue Nov 1 10:38:43 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 1 Nov 2011 14:38:43 +0000
Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual
In-Reply-To: <26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu>
References: <26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu>
Message-ID:

On Tue, Nov 1, 2011 at 1:40 PM, Fields, Christopher J wrote:
>
> One problem the various Bio* indexers have currently is the lack of
> standardization on a specific schema for indexing. There are in-roads
> towards this (OBDA) that haven't been adequately traveled IMHO,
> which need to be taken up again.
>

Something to switch to open-bio-l at lists.open-bio.org for,
http://lists.open-bio.org/mailman/listinfo/open-bio-l

We can continue this thread from last summer,
http://lists.open-bio.org/pipermail/open-bio-l/2010-April/000662.html
http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000676.html
...
http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000680.html

And CC Peter Rice from EMBOSS too - we chatted about this at ISMB/BOSC 2011 in July - and whoever looks after the OBDA/indexing code in BioRuby and BioJava too.

> A second, and maybe this is more specific to BioPerl, is that the
> parsers and indexers essentially reimplement the format parsing
> in each module, so if there are bugs they have to be independently
> fixed (hence why SeqIO works and the indexer doesn't; I wrote the
> first but not the second). The best place for any optimizations
> would be in a unified parser that both the SeqIO and indexer
> modules could use.

We have that problem to an extent in Biopython's Bio.SeqIO code. The indexing code duplicates some logic of the parsing code (how much depends on the format), sufficient to extract the read ID and the bounds on disk. The two could be more unified, but the parsers came first and I didn't want to change them at the time. Instead I tried to be rigorous in consistency testing for the index code's unit tests.

Regards,

Peter

From carandraug+dev at gmail.com Tue Nov 1 11:13:06 2011
From: carandraug+dev at gmail.com (Carnë Draug)
Date: Tue, 1 Nov 2011 15:13:06 +0000
Subject: [Bioperl-l] best way to edit sequence features
In-Reply-To:
References:
Message-ID:

On 1 November 2011 10:18, Hotz, Hans-Rudolf wrote:
> I am aware of the following ones (as far as I know, they are all GUI based
> and don't have a command line API):

They are not all free. Just for future reference, here are their licenses:

> Serial Cloner

Couldn't find a license and the download for linux has no source so I'm guessing proprietary.

> GENtle            http://gentle.magnusmanske.de/

Free under GPL

> GeneCoder

Proprietary

> pDRAW32

Proprietary

> Genome Workbench  http://www.ncbi.nlm.nih.gov/projects/gbench/

Seems public domain. License is not defined anywhere but the files I checked had the public domain notice on the header

> Ape

Proprietary ("license" is at the top of AppMain.tcl)

> UGene             http://ugene.unipro.ru/

Free under GPL

> Also, have you looked at the emboss tool "cirdna" ?

Free under GPL

> WRT file formats: I strongly suggest to stick to embl and genbank format as
> input and (text) output format. The features are not indexed, but you can
> create your own when you store the sequences in your system. Internally, you
> probably wanna keep the data in a 'simpler' format than embl or genbank,
> anyway.
>
> Alternatively, have you looked at gff/gtf as away of getting features?
> see:
>
> http://www.sequenceontology.org/gff3.shtml
> http://mblab.wustl.edu/GTF22.html

Considering the already existing alternatives, I'm more likely to collaborate with one of them to do what I want. I'll just have to check them all and decide. I was planning on writing a new tool and contributing it to the scripts section of bioperl since, when I googled earlier, only the proprietary tools showed up.

Thank you very much for the links.

Carnë

From roy.chaudhuri at gmail.com Tue Nov 1 11:44:19 2011
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Tue, 01 Nov 2011 15:44:19 +0000
Subject: [Bioperl-l] best way to edit sequence features
In-Reply-To:
References:
Message-ID: <4EB013D3.30801@gmail.com>

The Sanger Institute's Artemis is good for editing sequence features, and DNAPlotter can be used to produce circular diagrams:
http://www.sanger.ac.uk/resources/software/artemis
http://www.sanger.ac.uk/resources/software/dnaplotter

Roy.

On 01/11/2011 10:18, Hotz, Hans-Rudolf wrote:
> Hi Carnë
> > Please allow me to make a few comments: > > I very much like your idea of writing a free tool to edit and draw > sequences. We (ie people working in core Bioinformatics facilities) all > suffer from having to deal with files originally created with commercial > packages. And on top of all the pain, those commercial packages are very > expensive and they don't deliver what they promise to do. > > > Just double checking: Have you looked a the free tools which are available? > > I am aware of the following ones (as far as I know, they are all GUI based > and don't have a command line API): > > Serial Cloner http://serialbasics.free.fr/Serial_Cloner.html > GENtle http://gentle.magnusmanske.de/ > GeneCoder http://www.algosome.com/gene-coder/gene-coder.html > pDRAW32 http://www.acaclone.com/ > Genome Workbench http://www.ncbi.nlm.nih.gov/projects/gbench/ > Ape http://www.biology.utah.edu/jorgensen/wayned/ape/> > UGene http://ugene.unipro.ru/ > > maybe others on the list know of even better free tools? > > Also, have you looked at the emboss tool "cirdna" ? > > > WRT file formats: I strongly suggest to stick to embl and genbank format as > input and (text) output format. The features are not indexed, but you can > create your own when you store the sequences in your system. Internally, you > probably wanna keep the data in a 'simpler' format than embl or genbank, > anyway. > > Alternatively, have you looked at gff/gtf as away of getting features? > see: > > http://www.sequenceontology.org/gff3.shtml > http://mblab.wustl.edu/GTF22.html > > > > I am looking forward to any progress you make > > Regards, Hans > > > > Hans-Rudolf Hotz, PhD > Bioinformatics Support > > Friedrich Miescher Institute for Biomedical Research > Maulbeerstrasse 66 > 4058 Basel/Switzerland > > > > On 10/31/11 7:05 PM, "Carn? Draug" wrote: > >> Hi >> >> I've been planning on writing a free (as in freedom) tool to edit >> sequences and make plamids maps. 
The idea is to build the command line >> tool first and maybe later work on a GUI for it. >> >> The problem I foresee at the moment while designing it, is how to >> change a feature of the sequence. I'm not familiar with all sequence >> formats (only fasta, ensembl and genbank) but I can't see how to >> specify from the command line what feature to edit since I can't see >> any unique identifiers for them. Is there a file format that makes >> this easier? Any tips would be most appreciated. >> >> Thank in advance, >> Carn? Draug >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at gmail.com Tue Nov 1 12:02:24 2011 From: jason.stajich at gmail.com (Jason Stajich) Date: Tue, 1 Nov 2011 09:02:24 -0700 Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual In-Reply-To: References: <26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu> Message-ID: <47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com> I think a different indexer is needed for the scale of key/value pairs we see in fastq files if we want to make a fast lookup by ID. I think speed is of essence for this type of solution and so a forced all records must be 4 lines long is okay for this type of implementation. I found NOSQL implementations to be much better performance and than any of the BDB type solutions -- they end up being really slow at above 1-5M keys. I used TokyoCabinet and KyotoCabinet to do indexing of accession -> taxonomy ID and found it quite fast for the needs. I haven't tried storing 100bp reads + qual string as the value in it yet but I think it could be done, certainly worth a prototype. 
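
To make the "all records must be 4 lines long" idea concrete, here is a minimal sketch in Python (standard library only). It is not the BioPerl or Biopython implementation; the function names `index_fastq_4line` and `fetch_record` are made up for illustration, and a plain dict stands in for whatever key/value store (BDB, a Cabinet-style store, SQLite) would really hold the ID -> offset pairs. Because the three lines after each header are skipped by position rather than inspected, a quality line that begins with '@' cannot be mistaken for a new record.

```python
import io


def index_fastq_4line(handle):
    """Map read ID -> byte offset, ASSUMING exactly four lines per record.

    `handle` must be opened in binary mode so tell() gives byte offsets.
    """
    index = {}
    while True:
        offset = handle.tell()
        header = handle.readline()
        if not header:
            break  # end of file
        fields = header[1:].split()
        if not header.startswith(b"@") or not fields:
            raise ValueError("expected '@<id>' header at byte %d" % offset)
        # Skip sequence, '+' and quality lines by position only, so a
        # quality string starting with '@' is never parsed as a header.
        for _ in range(3):
            handle.readline()
        index[fields[0].decode("ascii")] = offset
    return index


def fetch_record(handle, offset):
    """Return the four lines of the record that starts at `offset`."""
    handle.seek(offset)
    return [handle.readline().rstrip(b"\n").decode("ascii") for _ in range(4)]


# Tiny in-memory example; the first record's quality line starts with '@'.
data = (b"@read1 first\nACGT\n+\n@III\n"
        b"@read2\nGGCC\n+\nIIII\n")
fh = io.BytesIO(data)
idx = index_fastq_4line(fh)
```

With a real file you would pass `open(path, "rb")` instead of the `BytesIO` object. Multi-line FASTQ would need a proper parser, which is the trade-off discussed in this thread.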
Jason On Nov 1, 2011, at 7:38 AM, Peter Cock wrote: > On Tue, Nov 1, 2011 at 1:40 PM, Fields, Christopher J > wrote: >> >> One problem the various Bio* indexers have currently is the lack of >> standardization on a specific schema for indexing. There are in-roads >> towards this (OBDA) that haven't been adequately traveled IMHO, >> which need to be taken up again. >> > > Something to switch to open-bio-l at lists.open-bio.org for, > http://lists.open-bio.org/mailman/listinfo/open-bio-l > > We can continue this thread from last summer, > http://lists.open-bio.org/pipermail/open-bio-l/2010-April/000662.html > http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000676.html > ... > http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000680.html > > And CC Peter Rice from EMBOSS too - we chatted about this > at ISMB/BOSC 2011 in July - and whomever looks after the > OBDA/indexing code in BioRuby and BioJava too. > >> A second, and maybe this is more specific to BioPerl, is that the >> parsers and indexers essentially reimplement the format parsing >> in each module, so if there are bugs they have to be independently >> fixed (hence why SeqIO works and the indexer doesn't; I wrote the >> first but not the second). The best place for any optimizations >> would be in a unified parser that both the SeqIO and indexer >> modules could use. > > We have that problem to an extent in Biopython's Bio.SeqIO code. > The indexing code duplicates some logic of the parsing code > (how much depends on the format), sufficient to extract the read > ID and the bounds on disk. The two could be more unified but > the parsers came first and didn't want to change them at the time. > Instead I tried to be rigorous in consistency testing for the index > code's unit tests. 
> > Regards, > > Peter > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Nov 1 13:44:25 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 1 Nov 2011 17:44:25 +0000 Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual In-Reply-To: <47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com> References: <26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu> <47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com> Message-ID: <6EFDCA41-D385-4CCB-A17B-0736F93DACDF@illinois.edu> On Nov 1, 2011, at 11:02 AM, Jason Stajich wrote: > I think a different indexer is needed for the scale of key/value pairs we see in fastq files if we want to make a fast lookup by ID. I think speed is of essence for this type of solution and so a forced all records must be 4 lines long is okay for this type of implementation. This can always be an early optimization, that's easy enough. But I'm sure we will have to deal with multi-line seq/qual FASTQ at some point. > I found NOSQL implementations to be much better performance and than any of the BDB type solutions -- they end up being really slow at above 1-5M keys. I used TokyoCabinet and KyotoCabinet to do indexing of accession -> taxonomy ID and found it quite fast for the needs. I haven't tried storing 100bp reads + qual string as the value in it yet but I think it could be done, certainly worth a prototype. Adding a middle layer where the backend storage is abstracted is the probably the (best|most flexible) option, converging on a good default that will work for this data. The actual interface is in place, though would it be more feasible to go the OBDA (converge on a cross-Bio* compatible schema)? Or are there problems afoot there we're unaware of? Re: specifics, I think Biopython uses SQLite, is that correct Peter? 
chris > Jason > On Nov 1, 2011, at 7:38 AM, Peter Cock wrote: > >> On Tue, Nov 1, 2011 at 1:40 PM, Fields, Christopher J >> wrote: >>> >>> One problem the various Bio* indexers have currently is the lack of >>> standardization on a specific schema for indexing. There are in-roads >>> towards this (OBDA) that haven't been adequately traveled IMHO, >>> which need to be taken up again. >>> >> >> Something to switch to open-bio-l at lists.open-bio.org for, >> http://lists.open-bio.org/mailman/listinfo/open-bio-l >> >> We can continue this thread from last summer, >> http://lists.open-bio.org/pipermail/open-bio-l/2010-April/000662.html >> http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000676.html >> ... >> http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000680.html >> >> And CC Peter Rice from EMBOSS too - we chatted about this >> at ISMB/BOSC 2011 in July - and whomever looks after the >> OBDA/indexing code in BioRuby and BioJava too. >> >>> A second, and maybe this is more specific to BioPerl, is that the >>> parsers and indexers essentially reimplement the format parsing >>> in each module, so if there are bugs they have to be independently >>> fixed (hence why SeqIO works and the indexer doesn't; I wrote the >>> first but not the second). The best place for any optimizations >>> would be in a unified parser that both the SeqIO and indexer >>> modules could use. >> >> We have that problem to an extent in Biopython's Bio.SeqIO code. >> The indexing code duplicates some logic of the parsing code >> (how much depends on the format), sufficient to extract the read >> ID and the bounds on disk. The two could be more unified but >> the parsers came first and didn't want to change them at the time. >> Instead I tried to be rigorous in consistency testing for the index >> code's unit tests. 
>> >> Regards, >> >> Peter >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From p.j.a.cock at googlemail.com Tue Nov 1 14:06:50 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 1 Nov 2011 18:06:50 +0000 Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual In-Reply-To: <6EFDCA41-D385-4CCB-A17B-0736F93DACDF@illinois.edu> References: <26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu> <47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com> <6EFDCA41-D385-4CCB-A17B-0736F93DACDF@illinois.edu> Message-ID: On Tue, Nov 1, 2011 at 5:44 PM, Fields, Christopher J wrote: > On Nov 1, 2011, at 11:02 AM, Jason Stajich wrote: > >> I think a different indexer is needed for the scale of key/value >> pairs we see in fastq files if we want to make a fast lookup by >> ID. I think speed is of essence for this type of solution and so >> a forced all records must be 4 lines long is okay for this type >> of implementation. > > This can always be an early optimization, that's easy enough. > But I'm sure we will have to deal with multi-line seq/qual > FASTQ at some point. > >> I found NOSQL implementations to be much better >> performance and than any of the BDB type solutions -- they >> end up being really slow at above 1-5M keys. ?I used >> TokyoCabinet and KyotoCabinet to do indexing of accession >> -> taxonomy ID and found it quite fast for the needs. I >> haven't tried storing 100bp reads + qual string as the >> value in it yet but I think it could be done, certainly worth >> a prototype. > > Adding a middle layer where the backend storage is abstracted > is the probably the (best|most flexible) option, converging on a > good default that will work for this data. ?The actual interface is > in place, though would it be more feasible to go the OBDA > (converge on a cross-Bio* compatible schema)? ?Or are there > problems afoot there we're unaware of? 
>
> Re: specifics, I think Biopython uses SQLite, is that correct Peter?
>
> chris

Yes, we're using SQLite3 to store essentially a list of filenames
and their format as one table, and then in the main table an
entry for each sequence recording the ID (only one accession,
unlike OBDA which had infrastructure for a secondary accession),
file number, offset of the start of the record, and optionally the
length of the record on disk.

i.e. Basically what OBDA does, but using SQLite rather
than BDB (not included in Python 3) or a flat file index
(poor performance with large datasets).

I find this design attractive on several levels:
* File format neutral, covers FASTA, FASTQ, GenBank, etc
* Preserves the original file untouched
* Index is a small single file (thanks to SQLite)
* Back end could be switched out
* Could be applied to compressed file formats
* Reuses existing parsing code to access entries

This could easily form the basis of OBDA v2; the main points
of difference I anticipate between the Bio* projects would
be naming conventions for the different file formats, and
what we consider to be the default record ID of each read
(e.g. which field in a GenBank file - although agreement
here is not essential). Some of that was already settled in
principle with OBDA v1.

On the other hand, you could try and store the parsed data
itself, which is where NOSQL looks more interesting. That
essentially requires the ability to serialise your annotated
sequence object model to disk - which would be tricky to do
cross project (much more ambitious than BioSQL is). It also
means the "index" becomes very large because it now holds
all the original data.

Peter

From wenbinmei at gmail.com Wed Nov 2 00:25:32 2011
From: wenbinmei at gmail.com (wenbin mei)
Date: Wed, 2 Nov 2011 00:25:32 -0400
Subject: [Bioperl-l] how to not count gaps in the multiple sequence alignment
Message-ID:

Hi,

I need some help in coding. I have a multiple sequence alignment which has gaps.
And also I have a reference genome sequence in the alignment which I know all the coordinates for the protein coding genes. I want to extract all these protein coding genes alignment from the big alignment. I am using Bio SimpleAlign but the question is that due to the gaps in the alignment, the coordinates has shifted in the alignment. I wonder is there a way I can not count the gaps and still be able to extract the protein alignment. One way I can do is remove the gaps in the reference first and then extract the sequence. But I don't like this way ... Thank you for help. -best, wenbin From dejian.zhao at gmail.com Wed Nov 2 09:33:18 2011 From: dejian.zhao at gmail.com (Dejian Zhao) Date: Wed, 02 Nov 2011 21:33:18 +0800 Subject: [Bioperl-l] Modules to read MEGA output and reproduce the phylogenetic tree Message-ID: <4EB1469E.4050108@gmail.com> There are various packages on CPAN to cope with phylogenetic analysis. I wonder which module can read the output from other phylogenetic softwares, e.g. MEGA, and reproduce the phylogenetic tree. I want to produce a picture which combines the phylogenetic tree and the structure of each gene. From roy.chaudhuri at gmail.com Wed Nov 2 09:49:46 2011 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Wed, 02 Nov 2011 13:49:46 +0000 Subject: [Bioperl-l] Modules to read MEGA output and reproduce the phylogenetic tree In-Reply-To: <4EB1469E.4050108@gmail.com> References: <4EB1469E.4050108@gmail.com> Message-ID: <4EB14A7A.30307@gmail.com> MEGA can export trees in Newick format, which can be read by Bio::TreeIO. The tree can be drawn in EPS format using Bio::Tree::Draw::Cladogram. See: http://www.bioperl.org/wiki/HOWTO:Trees Roy. On 02/11/2011 13:33, Dejian Zhao wrote: > There are various packages on CPAN to cope with phylogenetic analysis. I > wonder which module can read the output from other phylogenetic > softwares, e.g. MEGA, and reproduce the phylogenetic tree. 
I want to > produce a picture which combines the phylogenetic tree and the structure > of each gene. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jun.yin at ucd.ie Wed Nov 2 12:29:45 2011 From: jun.yin at ucd.ie (Jun Yin) Date: Wed, 02 Nov 2011 16:29:45 +0000 (GMT) Subject: [Bioperl-l] how to not count gaps in the multiple sequence alignment In-Reply-To: References: Message-ID: <7300ecdd1dd56.4eb16ff9@ucd.ie> Hi, You need to calculate the coordinates of the protein coding gene in the alignment by yourself. After that, you can use the slice function to get the alignment block for the selected gene, e.g. $aln2 = $aln->slice(20, 30); Cheers, Jun ----- Original Message ----- From: wenbin mei Date: Wednesday, November 2, 2011 5:51 am Subject: [Bioperl-l] how to not count gaps in the multiple sequence alignment To: bioperl-l at lists.open-bio.org > Hi, > > I need some help in coding. I have a multiple sequence alignment > which has > gaps. And also I have a reference genome sequence in the > alignment which I > know all the coordinates for the protein coding genes. I want to > extractall these protein coding genes alignment from the big > alignment. I am using > Bio SimpleAlign but the question is that due to the gaps in the > alignment,the coordinates has shifted in the alignment. I wonder > is there a way I can > not count the gaps and still be able to extract the protein > alignment. One > way I can do is remove the gaps in the reference first and then > extract the > sequence. But I don't like this way ... Thank you for help. 
> > -best, > wenbin > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From dejian.zhao at gmail.com Wed Nov 2 21:39:22 2011 From: dejian.zhao at gmail.com (Dejian Zhao) Date: Thu, 03 Nov 2011 09:39:22 +0800 Subject: [Bioperl-l] Modules to read MEGA output and reproduce the phylogenetic tree In-Reply-To: <4EB14A7A.30307@gmail.com> References: <4EB1469E.4050108@gmail.com> <4EB14A7A.30307@gmail.com> Message-ID: <4EB1F0CA.80309@gmail.com> That's great! Many thanks, Roy. On 2011-11-2 21:49, Roy Chaudhuri wrote: > MEGA can export trees in Newick format, which can be read by > Bio::TreeIO. The tree can be drawn in EPS format using > Bio::Tree::Draw::Cladogram. See: > http://www.bioperl.org/wiki/HOWTO:Trees > > Roy. > > On 02/11/2011 13:33, Dejian Zhao wrote: >> There are various packages on CPAN to cope with phylogenetic analysis. I >> wonder which module can read the output from other phylogenetic >> softwares, e.g. MEGA, and reproduce the phylogenetic tree. I want to >> produce a picture which combines the phylogenetic tree and the structure >> of each gene. 
>> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From noncoding at gmail.com Thu Nov 3 05:59:26 2011 From: noncoding at gmail.com (Remo Sanges) Date: Thu, 03 Nov 2011 10:59:26 +0100 Subject: [Bioperl-l] how to not count gaps in the multiple sequence alignment In-Reply-To: <7300ecdd1dd56.4eb16ff9@ucd.ie> References: <7300ecdd1dd56.4eb16ff9@ucd.ie> Message-ID: <4EB265FE.30909@gmail.com> To get the location in the initial sequence starting from a column in a multiple alignment you can: 1) create a Bio::LocatableSeq compliant object by using the method each_seq_with_id on the SimpleAlign object 2) then using the method location_from_column on the created LocatableSeq object HTH ERemo -- Remo Sanges Bioinformatics - Animal Physiology and Evolution Stazione Zoologica Anton Dohrn Villa Comunale, 80121 Napoli - Italy +39 081 5833428 On 11/2/11 5:29 PM, Jun Yin wrote: > Hi, > > You need to calculate the coordinates of the protein coding gene in the alignment by yourself. After that, you can use the slice function to get the alignment block for the selected gene, e.g. > > $aln2 = $aln->slice(20, 30); > > Cheers, > Jun > > > ----- Original Message ----- > From: wenbin mei > Date: Wednesday, November 2, 2011 5:51 am > Subject: [Bioperl-l] how to not count gaps in the multiple sequence alignment > To: bioperl-l at lists.open-bio.org > >> Hi, >> >> I need some help in coding. I have a multiple sequence alignment >> which has >> gaps. And also I have a reference genome sequence in the >> alignment which I >> know all the coordinates for the protein coding genes. I want to >> extractall these protein coding genes alignment from the big >> alignment. I am using >> Bio SimpleAlign but the question is that due to the gaps in the >> alignment,the coordinates has shifted in the alignment. 
I wonder >> is there a way I can >> not count the gaps and still be able to extract the protein >> alignment. One >> way I can do is remove the gaps in the reference first and then >> extract the >> sequence. But I don't like this way ... Thank you for help. >> >> -best, >> wenbin >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From G.Gallone at sms.ed.ac.uk Thu Nov 3 07:50:11 2011 From: G.Gallone at sms.ed.ac.uk (Giuseppe G.) Date: Thu, 03 Nov 2011 11:50:11 +0000 Subject: [Bioperl-l] Bio::SimpleAlign - Meaning of overall_percentage_identity? Message-ID: <4EB27FF3.9050203@sms.ed.ac.uk> Hi, I would be grateful if you could shed some light on the exact meaning of the method overall_percentage_identity in Bio::SimpleAlign. If I understand correctly, the method works by considering only aminoacids that are identical over all the members of the alignment, and then averaging over the total number of aminoacids in the sequence. Is this correct? Thank you Giuseppe -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From David.Messina at sbc.su.se Thu Nov 3 09:22:21 2011 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 3 Nov 2011 14:22:21 +0100 Subject: [Bioperl-l] Bio::SimpleAlign - Meaning of overall_percentage_identity? In-Reply-To: <4EB27FF3.9050203@sms.ed.ac.uk> References: <4EB27FF3.9050203@sms.ed.ac.uk> Message-ID: Hi Giuseppe, If I understand correctly, the method works by considering only aminoacids > that are identical over all the members of the alignment Yes. > , and then averaging over the total number of aminoacids in the sequence. > Is this correct? > Almost. 
By default, the denominator is the alignment length, namely the length of the MSA including gaps. By means of the 'short' and 'long' options, it's also possible to use the shortest or longest sequence's ungapped lengths as the denominator. Dave From cjfields at illinois.edu Thu Nov 3 14:28:36 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 3 Nov 2011 18:28:36 +0000 Subject: [Bioperl-l] OBDA redux? was Re: Bio::Index::Fastq '@' in qual In-Reply-To: References: <26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu> <47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com> <6EFDCA41-D385-4CCB-A17B-0736F93DACDF@illinois.edu> Message-ID: (side thread, so re-titling...) On Nov 1, 2011, at 1:06 PM, Peter Cock wrote: > On Tue, Nov 1, 2011 at 5:44 PM, Fields, Christopher J > wrote: >> On Nov 1, 2011, at 11:02 AM, Jason Stajich wrote: >> >>> I think a different indexer is needed for the scale of key/value >>> pairs we see in fastq files if we want to make a fast lookup by >>> ID. I think speed is of essence for this type of solution and so >>> a forced all records must be 4 lines long is okay for this type >>> of implementation. >> >> This can always be an early optimization, that's easy enough. >> But I'm sure we will have to deal with multi-line seq/qual >> FASTQ at some point. >> >>> I found NOSQL implementations to be much better >>> performance and than any of the BDB type solutions -- they >>> end up being really slow at above 1-5M keys. I used >>> TokyoCabinet and KyotoCabinet to do indexing of accession >>> -> taxonomy ID and found it quite fast for the needs. I >>> haven't tried storing 100bp reads + qual string as the >>> value in it yet but I think it could be done, certainly worth >>> a prototype. >> >> Adding a middle layer where the backend storage is abstracted >> is the probably the (best|most flexible) option, converging on a >> good default that will work for this data. 
The actual interface is >> in place, though would it be more feasible to go the OBDA >> (converge on a cross-Bio* compatible schema)? Or are there >> problems afoot there we're unaware of? >> >> Re: specifics, I think Biopython uses SQLite, is that correct Peter? >> >> chris > > Yes, we're using SQLite3 to store essentially a list of filenames > and their format as one table, and then in the main table an > entry for each sequence recording the ID (only one accession, > unlike OBDA which had infrastructure for a secondary accession), > file number, offset of the start of the record, and optionally the > length of the record on disk. > > i.e. Basically what OBDA does, but using SQLite rather > than BDB (not included in Python 3) or a flat file index > (poor performance with large datasets). > > I find this design attractive on several levels: > * File format neutral, covers FASTA, FASTQ, GenBank, etc > * Preserves the original file untouched > * Index is a small single file (thanks to SQLite) > * Back end could be switched out > * Could be applied to compressed file formats > * Reuses existing parsing code to access entries > > This could easily form basis of OBDA v2, the main points > of difference I anticipate between the Bio* projects would > be naming conventions for the different file formats, and > what we consider to be the default record ID of each read > (e.g. which field in a GenBank file - although agreement > here is not essential). Some of that was already settled in > principle with OBDA v1. The primary/secondary IDs could be configurable with a sane default, I think the bioperl implementations allowed this (and it is certainly something that will be requested). > On the other hand, you could try and store the parsed data > itself, which is where NOSQL looks more interesting. 
That > essentially requires the ability to serialise your annotated > sequence object model to disk - which would be tricky to do > cross project (much more ambitious than BioSQL is). It also > means the "index" becomes very large because it now holds > all the original data. > > Peter For a fully cross-Bio* compliant format, I don't think it's feasible to use serialized data unless they are serialized in something that is easily deserialized across HLLs (JSON, BSON, YAML, XML, etc). Either that, or such data is stored concurrently with the binary blob, along with meta data that indicates the source of the blob, parser, version, etc, etc (unless there are tools out there that reliably interconvert serialized complex data structures between HLLs). Anyway you go about it, it seems like it could be a major ball of hurt, unless implemented very carefully. Aside: I think this was one of the problems with Bio::DB::SeqFeature::Store, in that it at one point stored Perl-specific Storable blobs. chris From p.j.a.cock at googlemail.com Thu Nov 3 14:52:50 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 3 Nov 2011 18:52:50 +0000 Subject: [Bioperl-l] OBDA redux? Message-ID: On Thu, Nov 3, 2011 at 6:28 PM, Fields, Christopher J wrote: > (side thread, so re-titling...) > And CC'ing open-bio-l, which is a better home for this than bioperl-l, where OBDA v2 talk came up again in discussion of a BioPerl indexing problem. 
Archive links for thread here: http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035807.html http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035808.html http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035811.html http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035812.html http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035813.html http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035822.html > On Nov 1, 2011, at 1:06 PM, Peter Cock wrote: >> >> Yes, we're using SQLite3 to store essentially a list of filenames >> and their format as one table, and then in the main table an >> entry for each sequence recording the ID (only one accession, >> unlike OBDA which had infrastructure for a secondary accession), >> file number, offset of the start of the record, and optionally the >> length of the record on disk. >> >> i.e. Basically what OBDA does, but using SQLite rather >> than BDB (not included in Python 3) or a flat file index >> (poor performance with large datasets). >> >> I find this design attractive on several levels: >> * File format neutral, covers FASTA, FASTQ, GenBank, etc >> * Preserves the original file untouched >> * Index is a small single file (thanks to SQLite) >> * Back end could be switched out >> * Could be applied to compressed file formats >> * Reuses existing parsing code to access entries >> >> This could easily form basis of OBDA v2, the main points >> of difference I anticipate between the Bio* projects would >> be naming conventions for the different file formats, and >> what we consider to be the default record ID of each read >> (e.g. which field in a GenBank file - although agreement >> here is not essential). Some of that was already settled in >> principle with OBDA v1. > > The primary/secondary IDs could be configurable with a sane > default, I think the bioperl implementations allowed this (and > it is certainly something that will be requested). 
One reason I went with a single ID only was to keep the Python dictionary based API simple (think hash in Perl). You don't get secondary keys in a Python dict or a hash ;) As a nod to flexibility, in Biopython's Bio.SeqIO indexing you can provide a call back function to map the suggested ID to something else. Obviously this doesn't give the full flexibility of extracting a field from the record's annotation because we don't parse the whole record during indexing (it would be too slow). However, I'm happy for there to be an *optional* secondary key in an OBDA v2 SQLite schema, but Biopython probably won't populate it. We could optionally use it rather than the primary ID on loading an existing index though. Personally I would stick with one key in the index - it should be faster and makes it simpler to switch the back end if we need to later. If anyone wants a second key, they can build a second index *grin*. >> On the other hand, you could try and store the parsed data >> itself, which is where NOSQL looks more interesting. That >> essentially requires the ability to serialise your annotated >> sequence object model to disk - which would be tricky to do >> cross project (much more ambitious than BioSQL is). It also >> means the "index" becomes very large because it now holds >> all the original data. >> >> Peter > > For a fully cross-Bio* compliant format, I don't think it's feasible > to use serialized data unless they are serialized in something > that is easily deserialized across HLLs (JSON, BSON, YAML, > XML, etc). Either that, or such data is stored concurrently with > the binary blob, along with meta data that indicates the source > of the blob, parser, version, etc, etc (unless there are tools out > there that reliably interconvert serialized complex data structures > between HLLs). Anyway you go about it, it seems like it could > be a major ball of hurt, unless implemented very carefully. 
You missed out RDF as a serialisation ;) But yes, going down the shared serialisation route is going to be messy - as you are well aware: > Aside: I think this was one of the problems with > Bio::DB::SeqFeature::Store, in that it at one point stored > Perl-specific Storable blobs. > > chris Peter From cjfields at illinois.edu Thu Nov 3 15:47:51 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 3 Nov 2011 19:47:51 +0000 Subject: [Bioperl-l] OBDA redux? In-Reply-To: References: Message-ID: On Nov 3, 2011, at 1:52 PM, Peter Cock wrote: > On Thu, Nov 3, 2011 at 6:28 PM, Fields, Christopher J > wrote: >> (side thread, so re-titling...) >> > And CC'ing open-bio-l, which is a better home for this than bioperl-l, > where OBDA v2 talk came up again in discussion of a BioPerl indexing > problem. Archive links for thread here: > > http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035807.html > http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035808.html > http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035811.html > http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035812.html > http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035813.html > http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035822.html yes, good idea... >> On Nov 1, 2011, at 1:06 PM, Peter Cock wrote: >>> >>> Yes, we're using SQLite3 to store essentially a list of filenames >>> and their format as one table, and then in the main table an >>> entry for each sequence recording the ID (only one accession, >>> unlike OBDA which had infrastructure for a secondary accession), >>> file number, offset of the start of the record, and optionally the >>> length of the record on disk. >>> >>> i.e. Basically what OBDA does, but using SQLite rather >>> than BDB (not included in Python 3) or a flat file index >>> (poor performance with large datasets). 
>>> >>> I find this design attractive on several levels: >>> * File format neutral, covers FASTA, FASTQ, GenBank, etc >>> * Preserves the original file untouched >>> * Index is a small single file (thanks to SQLite) >>> * Back end could be switched out >>> * Could be applied to compressed file formats >>> * Reuses existing parsing code to access entries >>> >>> This could easily form basis of OBDA v2, the main points >>> of difference I anticipate between the Bio* projects would >>> be naming conventions for the different file formats, and >>> what we consider to be the default record ID of each read >>> (e.g. which field in a GenBank file - although agreement >>> here is not essential). Some of that was already settled in >>> principle with OBDA v1. >> >> The primary/secondary IDs could be configurable with a sane >> default, I think the bioperl implementations allowed this (and >> it is certainly something that will be requested). > > One reason I went with a single ID only was to keep the > Python dictionary based API simple (think hash in Perl). > You don't get secondary keys in a Python dict or a hash ;) > > As a nod to flexibility, in Biopython's Bio.SeqIO indexing you > can provide a call back function to map the suggested ID to > something else. Obviously this doesn't give the full flexibility > of extracting a field from the record's annotation because we > don't parse the whole record during indexing (it would be too > slow). Same with bioperl. > However, I'm happy for there to be an *optional* secondary > key in an OBDA v2 SQLite schema, but Biopython probably > won't populate it. We could optionally use it rather than the > primary ID on loading an existing index though. Optional implementation of that is fine by me. > Personally I would stick with one key in the index - it should > be faster and makes it simpler to switch the back end if we > need to later. If anyone wants a second key, they can build > a second index *grin*. That's easy enough. 
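The two ideas quoted above (a callback that maps the suggested ID to another key, and building a second index when a second key is wanted) can be sketched like this. It is a hypothetical pure-Perl illustration, not the Bio.SeqIO or BioPerl indexing code; index_offsets and the toy record IDs are made up.

```perl
use strict;
use warnings;

# Hypothetical indexer: maps a key to the byte offset of each FASTA
# record. $key_fn receives the suggested ID (first word of the header)
# and returns the key to store, so the record itself is never parsed.
sub index_offsets {
    my ($fasta_text, $key_fn) = @_;
    $key_fn ||= sub { $_[0] };          # default: keep the suggested ID
    my %offset_for;
    open my $fh, '<', \$fasta_text
        or die "Cannot read in-memory FASTA: $!";
    while (1) {
        my $pos  = tell $fh;
        my $line = <$fh>;
        last unless defined $line;
        next unless $line =~ /^>(\S+)/;
        my $suggested = $1;             # copy before calling other code
        my $key = $key_fn->($suggested);
        die "Duplicate key '$key'\n" if exists $offset_for{$key};
        $offset_for{$key} = $pos;
    }
    close $fh;
    return \%offset_for;
}

# Toy data with made-up IDs of the form db|accession|name.
my $fasta = ">db|ACC1|ENTRY_ONE\nACGT\n>db|ACC2|ENTRY_TWO\nGGCC\n";

# One index on the full ID, a second index on the accession field.
my $by_id        = index_offsets($fasta);
my $by_accession = index_offsets($fasta, sub { (split /\|/, $_[0])[1] });
```

Each index stays a simple one-key lookup, which is the property argued for above; the cost is scanning the file once per index.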
>>> On the other hand, you could try and store the parsed data >>> itself, which is where NOSQL looks more interesting. That >>> essentially requires the ability to serialise your annotated >>> sequence object model to disk - which would be tricky to do >>> cross project (much more ambitious than BioSQL is). It also >>> means the "index" becomes very large because it now holds >>> all the original data. >>> >>> Peter >> >> For a fully cross-Bio* compliant format, I don't think it's feasible >> to use serialized data unless they are serialized in something >> that is easily deserialized across HLLs (JSON, BSON, YAML, >> XML, etc). Either that, or such data is stored concurrently with >> the binary blob, along with meta data that indicates the source >> of the blob, parser, version, etc, etc (unless there are tools out >> there that reliably interconvert serialized complex data structures >> between HLLs). Anyway you go about it, it seems like it could >> be a major ball of hurt, unless implemented very carefully. > > You missed out RDF as a serialisation ;) > > But yes, going down the shared serialisation route is going > to be messy - as you are well aware: > >> Aside: I think this was one of the problems with >> Bio::DB::SeqFeature::Store, in that it at one point stored >> Perl-specific Storable blobs. >> >> chris > > Peter yes, it's a problem w/o an easy solution. Anyway, I think an implementation of such at this point would be a premature optimization. chris From biojiangke at gmail.com Tue Nov 8 17:29:54 2011 From: biojiangke at gmail.com (vitis) Date: Tue, 8 Nov 2011 14:29:54 -0800 (PST) Subject: [Bioperl-l] Some questions about the Bio::PopGen In-Reply-To: References: Message-ID: <32805996.post@talk.nabble.com> I think the pi calculated in the function isn't really the pi as defined. You need to divide the value by total number of sites (in your case, it's 5, which is not your individual number but sequence length). 
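As a worked check of that arithmetic, here is a small sketch in plain Perl. It makes no Bio::PopGen calls, and pairwise_pi is a hypothetical helper written for this example. It computes the average number of pairwise differences for the five sequences from the question quoted below, then divides by the alignment length.

```perl
use strict;
use warnings;

# Hypothetical helper: average number of pairwise differences between
# equal-length sequences (pi summed over all sites, not per site).
sub pairwise_pi {
    my @seqs = @_;
    my ($diffs, $pairs) = (0, 0);
    for my $i (0 .. $#seqs - 1) {
        for my $j ($i + 1 .. $#seqs) {
            # XOR of two strings is "\0" wherever they agree, so counting
            # the non-NUL bytes counts the mismatched positions.
            $diffs += (($seqs[$i] ^ $seqs[$j]) =~ tr/\0//c);
            $pairs++;
        }
    }
    return $diffs / $pairs;
}

my @seqs = qw(AAGGT ATGGC ATGCT ATGCT ATGGT);   # A01..A05 from the question
my $pi          = pairwise_pi(@seqs);           # 14 differences over 10 pairs
my $pi_per_site = $pi / length($seqs[0]);       # divided by sequence length, 5
printf "pi = %.2f, pi per site = %.2f\n", $pi, $pi_per_site;
# prints: pi = 1.40, pi per site = 0.28
```

The first number matches what the question reports from Bio::PopGen::Statistics, the second what it reports from DnaSP.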
I think the reason they implemented it this way is that sometimes it's easier to work only with variable sites.

The aln_to_population function converts an aln object to a population object. You can't really see the object unless you write additional code to write it out or do some calculations on it.

The third question depends on your specific needs. For population-level analyses of molecular evolution, I usually create a multiple sequence alignment with other applications (clustalw etc.), then manually adjust the alignment to make sure it represents homology. I wouldn't touch the alignment once this is done, but only make an aln file (or whatever format you want) as input to analysis applications like Bio::PopGen (usually via the aln_to_population function you're using now).

Qian Zhao wrote:
>
> Hi
> Recently, I am learning how to calculate pi, Fst, Tajima D using
> Bio::PopGen.
> I am not familiar with Perl and I am really confused with the following
> problems.
> (1) I use the Bio::PopGen::Statistics to calculate pi. The sequences I used
> to calculate is this:
> __DATA__
> 01 A01 A
> 01 A02 A
> 01 A03 A
> 01 A04 A
> 01 A05 A
> 02 A01 A
> 02 A02 T
> 02 A03 T
> 02 A04 T
> 02 A05 T
> 03 A01 G
> 03 A02 G
> 03 A03 G
> 03 A04 G
> 03 A05 G
> 04 A01 G
> 04 A02 G
> 04 A03 C
> 04 A04 C
> 04 A05 G
> 05 A01 T
> 05 A02 C
> 05 A03 T
> 05 A04 T
> 05 A05 T
> And I am not sure if I can use these sequences below to demonstrate the
> prettybase format above:
> >A01
> AAGGT
> >A02
> ATGGC
> >A03
> ATGCT
> >A04
> ATGCT
> >A05
> ATGGT
> The pi is 1.4 using Bio::PopGen::Statistics. However, the pi is 0.28 if I
> use DnaSP. I find that 1.4/5=0.28, which means that if the number
> from Bio::PopGen::Statistics is divided by the individual number, the
> result
> would be exactly the same. Is there something wrong in my perl script?
> The code I used was below:
> #!/usr/bin/perl -w
> use warnings;
> use strict;
> use Bio::PopGen::Genotype;
> my $genotype = Bio::PopGen::Genotype->new(-marker_name   => 'gene_1',
>                                           -individual_id => '001',
>                                           -alleles       => ['1','5'] );
> use Bio::PopGen::Individual;
> my $ind = Bio::PopGen::Individual->new(-unique_id => '001',
>                                        -genotypes => [$genotype] );
> $ind->add_Genotype(
>     Bio::PopGen::Genotype->new(-alleles     => ['1', '5'],
>                                -marker_name => 'gene_1')
> );
> $ind->add_Genotype(
>     Bio::PopGen::Genotype->new(-alleles     => ['1', '5'],
>                                -marker_name => 'gene_1')
> );
> $ind->add_Genotype(
>     Bio::PopGen::Genotype->new(-alleles     => ['1', '5'],
>                                -marker_name => 'gene_1')
> );
> $ind->add_Genotype(
>     Bio::PopGen::Genotype->new(-alleles     => ['1', '5'],
>                                -marker_name => 'gene_1')
> );
> use Bio::PopGen::Population;
> my $pop = Bio::PopGen::Population->new(-name        => 'Bm',
>                                        -description => 'description',
>                                        -individuals => [$ind] );
> use Bio::PopGen::IO;
> use Bio::PopGen::Statistics;
> my $nummarkers = $pop->get_marker_names;
> my $stats = Bio::PopGen::Statistics->new();
> my $io = Bio::PopGen::IO->new (-format => 'prettybase',
>                                -file   => '1.txt');
> if( my $pop = $io->next_population ) {
>     my $pi = $stats->pi($pop, $nummarkers);
>     print "pi is $pi\n";
>     my @inds;
>     for my $ind ( $pop->get_Individuals ) {
>         if( $ind->unique_id =~ /A0[1-3]/ ) {
>             push @inds, $ind;
>         }
>     }
>     print "pi for inds 1,2,3 is ", $stats->pi(\@inds),"\n";
> }
>
> (2) I want to use Bio::PopGen::Utilities to translate the alignment file
> to
> the population file. However, I can not find the result file after the
> program.
> I use the following code:
> use Bio::PopGen::Utilities;
> use Bio::AlignIO;
>
> my $in = Bio::AlignIO->new(-file   => 't/data/t7.aln',
>                            -format => 'clustalw');
> my $aln = $in->next_aln;
> my $pop = Bio::PopGen::Utilities->aln_to_population(-alignment => $aln);
> my $synpop = Bio::PopGen::Utilities->aln_to_population(-site_model => 'cod',
>                                                        -alignment  => $aln);
> I am not sure where I should add my result file's name in the code.
> (3) If my file contains a lot of individual sequences and one individual
> has
> one genotype, I'd like to know how I can use Bio::PopGen::Individual,
> Bio::PopGen::Population and Bio::PopGen::Genotype to create the file which
> can be used in Bio::PopGen::Statistics.
>
> I would greatly appreciate any answers. Thanks for your time
> and best wishes!
> Qian

--
View this message in context: http://old.nabble.com/Some-questions-about-the-Bio%3A%3APopGen-tp31378987p32805996.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From biojiangke at gmail.com Tue Nov 8 17:51:22 2011
From: biojiangke at gmail.com (vitis)
Date: Tue, 8 Nov 2011 14:51:22 -0800 (PST)
Subject: [Bioperl-l] questions about the bioperl module Bio::PopGen::Statistics
In-Reply-To: <201106012030039537050@gmail.com>
References: <201106012030039537050@gmail.com>
Message-ID: <32805997.post@talk.nabble.com>

If you read the Bio::PopGen doc, you'll see there is an optional argument for the function that calculates pi, which takes the number of sites into consideration. Also, when you use the aln_to_population function to input an alignment, you can use the option to take in all sites, including the monomorphic sites. I think if you implement both in your script, you'll get the same pi value as from other applications like DnaSP.
In terms of sliding window analyses, you may have to implement your own method to move along the windows, but I think DnaSP is ready to do that, so you don't have to write your own script.

lvu.jun wrote:
>
> Hi, there,
> I am trying to calculate the population genetics parameters such as pi
> using the bioperl module Bio::PopGen::Statistics. But I found that the
> method only requires the input of the marker genotype of every individual
> for the population. I don't know why the module does not take the DNA
> sequence length into consideration when calculating the pi value.
> According to the definition of the pi value, besides the polymorphic
> sites, we also need the monomorphic sites that should be incorporated in
> the denominator when doing the calculation. Is that right? I'm therefore
> confused about the module. Can anyone tell me why it can correctly calculate
> the pi value only with the marker (polymorphic) genotype?
> Another question: if I want to calculate the pi value using a sliding
> window along the genome, how can I do this using the
> Bio::PopGen::Statistics module?
> Thanks for your help!
> Yours sincerely,
> Jun
>
> Chinese Academy of Sciences
>
> 2011-06-01
>
> lvu.jun

--
View this message in context: http://old.nabble.com/questions-about-the-bioperl-module-Bio%3A%3APopGen%3A%3AStatistics-tp31749977p32805997.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From shachigahoimbi at gmail.com Wed Nov 9 00:22:33 2011
From: shachigahoimbi at gmail.com (Shachi Gahoi)
Date: Wed, 9 Nov 2011 10:52:33 +0530
Subject: [Bioperl-l] Run FGENESH using bioperl
Message-ID:

Dear All.

I have a multi-FASTA sequence file and I would like to run FGENESH on each sequence stored in it, one by one. Is it possible using Bioperl?
Please guide me. Thanks in advance.

--
Regards,
Shachi

From pankajt322 at gmail.com Thu Nov 3 08:12:44 2011
From: pankajt322 at gmail.com (pankaj)
Date: Thu, 3 Nov 2011 05:12:44 -0700 (PDT)
Subject: [Bioperl-l] extract ORF ID from fasta file using bioperl
In-Reply-To:
References:
Message-ID:

On Oct 21, 1:59 am, Shachi Gahoi wrote:
> Dear all,
>
> I have fasta format sequence file and I want to extract ORF ID "PITG_14194"
> from fasta file and then I want to rename same file with that ORF ID
> "PITG_14194".
>
> I have many files and I want to do same exercise with all sequence files.
>
> Please tell me how can i do this with perl or bioperl.
>
> >tr|D0NNU7|D0NNU7_PHYIT Carbohydrate esterase, putative OS=Phytophthora infestans (strain T30-4) GN=PITG_14194 PE=4 SV=1
> MVKLSIVSSTMQSLLAPLLRVWTDPERRRKFLRWLFGGTSGAIALLLILEATRGFCRTPL
> ETAQLLAGISWTLCKITVQFVARGFKPKFAKWTLRYELLHGLMRTAATMFGERIVDLQHA
> RVIRHHTGMFGTVLGSFARWQNEMRLESVRLNGLEHIWLKSSTCTTETKSERKRLVVLFF
> HGGGYAVLSPRMYISFCSAVAGAIRQQLASDDVDVDVFLANYRKLPEHKFPVPAEDAVAM
> YEYLLQHEKLEPSQIILAGDSAGGGLVMSTLLRVRDGLSSWKSKLPLPLAAIVMCPLADL
> TWDEDEIAGQHCVLPLNMTAASVLTYHPTRDDPSTWADASPVHCNLQGLPPVFLQSATLD
> RLFQHSVRLAAKAKADGLVNWEVDIHEGVPHVFMVIPAYVLPYARVGVGRMAAFAAKQFR
> NGIAVDHKGVICNGKAPIEIAVDENTLSAAA
>
> --
> Regards,
> Shachi

From azaballos at isciii.es Wed Nov 9 06:28:21 2011
From: azaballos at isciii.es (Angel Zaballos)
Date: Wed, 9 Nov 2011 12:28:21 +0100
Subject: [Bioperl-l] bp_genbank2gff.pl bug
Message-ID: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>

Running bp_genbank2gff.pl got this:

[root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession AAXT01000001.1 > babesichr3.gff
Replacement list is longer than search list at /usr/share/perl5/Bio/Range.pm line 251.
Ángel Zaballos
Unidad de Genómica
Centro Nacional de Microbiología-ISCIII
Carretera Majadahonda-Pozuelo, Km 2,2
28220-Majadahonda

Tel: 918223994
mail: azaballos at isciii.es

************************* LEGAL NOTICE *************************
This electronic message is intended exclusively for its addressees and may contain attached documents of a private and confidential nature. If you have received this message in error and are not among the addressees, please do not use, disclose, distribute, print or copy its contents by any means. Please notify the sender and completely delete the message and its attachments. The Instituto de Salud Carlos III assumes no legal liability of any kind for the content of this message when it does not correspond to the functions attributed to its sender under current regulations.

From scott at scottcain.net Wed Nov 9 11:12:02 2011
From: scott at scottcain.net (Scott Cain)
Date: Wed, 9 Nov 2011 11:12:02 -0500
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
Message-ID:

Hi Angel,

I would suggest using bp_genbank2gff3.pl, as it is more actively maintained; the bp_genbank2gff.pl script hasn't really been touched in many years, and I imagine it's suffering from some serious code rot.

Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                          scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)         216-392-3087
Ontario Institute for Cancer Research

From carandraug+dev at gmail.com Wed Nov 9 11:13:10 2011
From: carandraug+dev at gmail.com (Carnë Draug)
Date: Wed, 9 Nov 2011 16:13:10 +0000
Subject: [Bioperl-l] extract ORF ID from fasta file using bioperl
In-Reply-To:
References:
Message-ID:

On 3 November 2011 12:12, pankaj wrote:
>
>
> On Oct 21, 1:59 am, Shachi Gahoi wrote:
>> Dear all,
>>
>> I have fasta format sequence file and I want to extract ORF ID "PITG_14194"
>> from fasta file and then I want to rename same file with that ORF ID
>> "PITG_14194".
>>
>> I have many files and I want to do same exercise with all sequence files.
>>
>> Please tell me how can i do this with perl or bioperl.
>>
>> >tr|D0NNU7|D0NNU7_PHYIT Carbohydrate esterase, putative OS=Phytophthora infestans (strain T30-4) GN=PITG_14194 PE=4 SV=1
>> MVKLSIVSSTMQSLLAPLLRVWTDPERRRKFLRWLFGGTSGAIALLLILEATRGFCRTPL
>> ETAQLLAGISWTLCKITVQFVARGFKPKFAKWTLRYELLHGLMRTAATMFGERIVDLQHA
>> RVIRHHTGMFGTVLGSFARWQNEMRLESVRLNGLEHIWLKSSTCTTETKSERKRLVVLFF
>> HGGGYAVLSPRMYISFCSAVAGAIRQQLASDDVDVDVFLANYRKLPEHKFPVPAEDAVAM
>> YEYLLQHEKLEPSQIILAGDSAGGGLVMSTLLRVRDGLSSWKSKLPLPLAAIVMCPLADL
>> TWDEDEIAGQHCVLPLNMTAASVLTYHPTRDDPSTWADASPVHCNLQGLPPVFLQSATLD
>> RLFQHSVRLAAKAKADGLVNWEVDIHEGVPHVFMVIPAYVLPYARVGVGRMAAFAAKQFR
>> NGIAVDHKGVICNGKAPIEIAVDENTLSAAA
>>

---------- Forwarded message ----------
From: Jason Stajich
Date: 21 October 2011 10:56
Subject: Re: [Bioperl-l] extract ORF ID from fasta file using bioperl
To: Shachi Gahoi
Cc: bioperl-l at bioperl.org

It is easy to do this with a simple regular expression and opening a new file. Have you read up on this concept in Perl? You can use SeqIO to parse FASTA files - did you read the HOWTO and website documentation first? We don't typically do people's work for them on this mailing list, so please show some effort first.

From scott at scottcain.net Wed Nov 9 13:43:00 2011
From: scott at scottcain.net (Scott Cain)
Date: Wed, 9 Nov 2011 13:43:00 -0500
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es> <0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu>
Message-ID:

Hi Chris,

Actually, removing it from the distribution (but letting it remain in the code repository) is not a bad idea. I can't really think of a down side.

Scott

2011/11/9 Fields, Christopher J
> Scott,
>
> Do we want to add that caveat to the bp_genbank2gff.pl documentation (or
> remove it altogether)?
>
> chris

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                          scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)         216-392-3087
Ontario Institute for Cancer Research

From cjfields at illinois.edu Wed Nov 9 13:39:52 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 9 Nov 2011 18:39:52 +0000
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To:
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
Message-ID: <0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu>

Scott,

Do we want to add that caveat to the bp_genbank2gff.pl documentation (or remove it altogether)?

chris

On Nov 9, 2011, at 10:12 AM, Scott Cain wrote:
> Hi Angel,
>
> I would suggest using bp_genbank2gff3.pl, as it is more actively
> maintained; the bp_genbank2gff.pl script hasn't really been touched in many
> years, and I imagine it's suffering from some serious code rot.
>
> Scott

From cjfields at illinois.edu Wed Nov 9 14:51:48 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 9 Nov 2011 19:51:48 +0000
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To:
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es> <0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu>
Message-ID:

Scott,

It would remain in the repo history if it is removed; otherwise we can probably set up an 'unmaintained' folder. Either would prevent it from being packaged and installed in future versions.

(Speaking of which, we should discuss (w/ Lincoln) possibly splitting out Bio::DB::SeqFeature/GFF and related code/tests/etc. into its own distribution, in line with slimming down core modules.)

chris

On Nov 9, 2011, at 12:43 PM, Scott Cain wrote:
> Hi Chris,
>
> Actually, removing it from the distribution (but letting it remain in the code repository) is not a bad idea. I can't really think of a down side.
>
> Scott

From carandraug+dev at gmail.com Wed Nov 9 15:39:17 2011
From: carandraug+dev at gmail.com (Carnë Draug)
Date: Wed, 9 Nov 2011 20:39:17 +0000
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To:
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es> <0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu>
Message-ID:

On 9 November 2011 18:43, Scott Cain wrote:
> Hi Chris,
>
> Actually, removing it from the distribution (but letting it remain in the
> code repository) is not a bad idea. I can't really think of a down side.
>
> Scott

Hi

can I suggest instead to simply make the script issue a warning right at the start? Something like "bp_genbank2gff is obsolete and will be removed from a future version of BioPerl; please use bp_genbank2gff3 instead". You could leave it there for the next 2 releases and then finally remove it. This would have 2 advantages:

1) people that have been using it will immediately know what to use as a replacement (instead of coming and asking on the mailing list)
2) people who use it but don't know anything about the subject (someone told them to "just press this button" or "just type this in the terminal") won't suddenly have a broken system and will have time to find someone who can make it work again.

That's what's done in GNU Octave and I think it works well there.

Carnë

From scott at scottcain.net Wed Nov 9 15:48:07 2011
From: scott at scottcain.net (Scott Cain)
Date: Wed, 9 Nov 2011 15:48:07 -0500
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To:
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es> <0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu>
Message-ID:

Hi Carnë,

You are absolutely correct; that is the right way to do it. I'll add that right now (and if the fix for the original post is an easy one, I'll fix that too :-)

Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                          scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)         216-392-3087
Ontario Institute for Cancer Research

From cjfields at illinois.edu Wed Nov 9 16:59:48 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 9 Nov 2011 21:59:48 +0000
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To:
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es> <0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu>
Message-ID:

Works for me, it's a standard deprecation policy. The only caveat is that the next 'release' of the code would be when the related code is split out into its own distribution (which will require its own versioning).

chris

On Nov 9, 2011, at 2:48 PM, Scott Cain wrote:
> Hi Carnë,
>
> You are absolutely correct; that is the right way to do it. I'll add that right now (and if the original posts fix is an easy one, I'll fix that too :-)
>
> Scott

From biopython at maubp.freeserve.co.uk Thu Nov 10 08:09:40 2011
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 10 Nov 2011 13:09:40 +0000
Subject: [Bioperl-l] [Bioperl-announce-l] Null Pointer - NCBI C++ Exception
In-Reply-To: <31659982.post@talk.nabble.com>
References: <31659982.post@talk.nabble.com>
Message-ID:

Re: http://lists.open-bio.org/pipermail/bioperl-announce-l/2011-May/000253.html

On Thu, May 19, 2011 at 11:15 PM, rgoldade wrote:
>
> I received the following error while trying to run bl2seq from
> standaloneblastplus. Has anyone else encountered this problem?
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: /usr/bin/blastp call crashed: There was a problem running
> /usr/bin/blastp : Error: NCBI C++ Exception:
>
> "/am/ncbiapdata/release/blast/src/2.2.24/Linux32-Suse-icc/c++/ICC1010-ReleaseMT--Linux32-Suse-icc/../src/corelib/ncbiobj.cpp",
> line 688: Critical: ncbi::CObject::ThrowNullPointerException() - Attempt to
> access NULL pointer.
>
> Thank you,
> Ryan

Just hit something very similar, looks like a BLAST+ bug which I will report now:

$ tblastn -out NC_003197_vs_NC_011294.tsv -outfmt 6 -query NC_003197.fna -evalue 0.0001 -subject NC_011294.fna
Error: NCBI C++ Exception:
"/am/ncbiapdata/release/blast/src/2.2.25/IntelMAC-universal/c++/GCC401-ReleaseMT--IntelMAC-universal/../src/corelib/ncbiobj.cpp", line 689: Critical: ncbi::CObject::ThrowNullPointerException() - Attempt to access NULL pointer.

This was on a Mac using BLAST 2.2.25+ (tblastn) whereas yours was BLAST 2.2.24+ (blastp) from the look of the error. The line number has changed by one, but I'm confident it is the same point of failure.
In my case I was comparing nucleotide against nucleotide, so should have been using tblastx not tblastn, but it still shouldn't have had a pointer exception. Peter From cjfields at illinois.edu Thu Nov 10 09:00:46 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 10 Nov 2011 14:00:46 +0000 Subject: [Bioperl-l] [Bioperl-announce-l] Null Pointer - NCBI C++ Exception In-Reply-To: References: <31659982.post@talk.nabble.com> Message-ID: On Nov 10, 2011, at 7:09 AM, Peter wrote: > Re: http://lists.open-bio.org/pipermail/bioperl-announce-l/2011-May/000253.html > > On Thu, May 19, 2011 at 11:15 PM, rgoldade wrote: >> >> I received the following error while trying to run bl2seq from >> standaloneblastplus. Has anyone else encountered this problem? >> >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: /usr/bin/blastp call crashed: There was a problem running >> /usr/bin/blastp : Error: NCBI C++ Exception: >> >> "/am/ncbiapdata/release/blast/src/2.2.24/Linux32-Suse-icc/c++/ICC1010-ReleaseMT--Linux32-Suse-icc/../src/corelib/ncbiobj.cpp", >> line 688: Critical: ncbi::CObject::ThrowNullPointerException() - Attempt to >> access NULL pointer. >> >> Thank you, >> Ryan > > Just hit something very very similar, looks like a BLAST+ bug which I > will report now: > > $ tblastn -out NC_003197_vs_NC_011294.tsv -outfmt 6 -query > NC_003197.fna -evalue 0.0001 -subject NC_011294.fna > Error: NCBI C++ Exception: > "/am/ncbiapdata/release/blast/src/2.2.25/IntelMAC-universal/c++/GCC401-ReleaseMT--IntelMAC-universal/../src/corelib/ncbiobj.cpp", > line 689: Critical: ncbi::CObject::ThrowNullPointerException() - > Attempt to access NULL pointer. > > This was on a Mac using BLAST 2.2.25+ (tblastn) whereas yours was > BLAST 2.2.24+ (blastp) from the look of the error. The line number has > changed by one, but I'm confident it is the same point of failure. 
> > In my case I was comparing nucleotide against nucleotide, so should > have been using tblastx not tblastn, but it still shouldn't have had a > pointer exception. > > Peter Yeah, that's bad. I have seen a few things like this myself that make me worry about the transition to BLAST+. chris PS - Odd I didn't see this one, was it caught in the bioperl-announce filter? From casaburi at ceinge.unina.it Thu Nov 10 07:29:55 2011 From: casaburi at ceinge.unina.it (Giorgio C) Date: Thu, 10 Nov 2011 04:29:55 -0800 (PST) Subject: [Bioperl-l] Scripting help to identify adaptors count in reads Message-ID: <32818254.post@talk.nabble.com> Hi everybody, i have some reads (454) where there are adaptors (NNNN...), one,two or three adaptors for each reads depending on the reads. Is there any way to establish how many reads have 1 adaptors, how many 2 and how many 3 adaptors over the total ??? >271-88 GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCAGGTGCCTACG >272-88 GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCANNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTTGCCAGCCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGC >273-88 GCCTCCCTCGCGCATCAGATCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCA >274-88 GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTCCCTCGCGCCATCAGATCGTNNNNNNNNNNNNNNNNNNTCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCATCAGATCGTAGGCACCATCAA The problem is that some adpators occur in the middle of the sequences because they coming out from a concameration experimental design (they are miRNAs between NNNNNN...). So i want to know a script or tool that may say how many reads have 1 adapt, how many 2, (max are 4) in respect to the total number of reads. Do you know any tool/script that may help ? Tnx Can anyone suggests me a script to fix this ??? Thank you very much -- View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32818254.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
From jovel_juan at hotmail.com Thu Nov 10 11:06:16 2011 From: jovel_juan at hotmail.com (Juan Jovel) Date: Thu, 10 Nov 2011 16:06:16 +0000 Subject: [Bioperl-l] Scripting help to identify adaptors count in reads In-Reply-To: <32818254.post@talk.nabble.com> References: <32818254.post@talk.nabble.com> Message-ID: There are many ways to do it. Perhaps the simplest is to count the number of times the adapter sequence (or part of it) appears in each read. For example: $adapter_matches = tr/adapter_sequence/adapter_sequence/;# $adapter_matches will store the number of times the adapter sequence is repeated. You then place that result in a hash bin: my %adapter_frequency;my $class = "$adapter_matches";if(exists $adapter_frequency{$class}){ $adapter_frequency{$class}++}else{ $adapter_frequency{$class} = 1} # Then you can sort and output your classes foreach $class (sort keys %adapter_frequency){ print "$class\t$adapter_frequency{$class}\n"; } You can workout the details, but something like this should work. > Date: Thu, 10 Nov 2011 04:29:55 -0800 > From: casaburi at ceinge.unina.it > To: Bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Scripting help to identify adaptors count in reads > > > Hi everybody, > > i have some reads (454) where there are adaptors (NNNN...), one,two or three > adaptors for each reads depending on the reads. Is there any way to > establish how many reads have 1 adaptors, how many 2 and how many 3 adaptors > over the total ??? 
> > >271-88 > GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCAGGTGCCTACG > >272-88 > GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCANNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTTGCCAGCCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGC > >273-88 > GCCTCCCTCGCGCATCAGATCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCA > >274-88 > GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTCCCTCGCGCCATCAGATCGTNNNNNNNNNNNNNNNNNNTCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCATCAGATCGTAGGCACCATCAA > > The problem is that some adpators occur in the middle of the sequences > because they coming out from a concameration experimental design (they are > miRNAs between NNNNNN...). So i want to know a script or tool that may say > how many reads have 1 adapt, how many 2, (max are 4) in respect to the total > number of reads. Do you know any tool/script that may help ? Tnx > Can anyone suggests me a script to fix this ??? > > Thank you very much > -- > View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32818254.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From scott at scottcain.net Thu Nov 10 11:55:53 2011 From: scott at scottcain.net (Scott Cain) Date: Thu, 10 Nov 2011 11:55:53 -0500 Subject: [Bioperl-l] bp_genbank2gff.pl bug In-Reply-To: <0CACB816-21E2-494E-ACEF-67F9FBB5AC41@isciii.es> References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es> <0CACB816-21E2-494E-ACEF-67F9FBB5AC41@isciii.es> Message-ID: Hi Angel, Please keep correspondence on the mailing list. I just ran bp_genbank2gff.pl with a genbank file (fruit fly mitocontria), and it worked fine. I suspect there is something wrong with your genbank file. Scott On Thu, Nov 10, 2011 at 3:15 AM, Angel Zaballos wrote: > His Scott, > > Thanks everyone for your help. 
I tried bp_genbank2gff3.pl, but the same > happened: > > [root at localhost zaballos]# bp_genbank2gff3.pl babesiaChr3.gbk > > babesichr3_2.gff > Replacement list is longer than search list at > /usr/share/perl5/Bio/Range.pm line 251. > UNIVERSAL->import is deprecated and will be removed in a future perl at > /usr/share/perl5/Bio/Tree/TreeFunctionsI.pm line 94 > > However, the output file seems to be correct (Indeed, that was also the > case for bp_genbank2gff.pl). I then ran ldHgGene and worked: > > [zaballos at localhost ~]$ ./ldHgGene -out=babesiaChr3_2.gpe db tab > babesiachr3_2.gff > Reading babesiachr3_2.gff > Read 4776 transcripts in 8821 lines in 1 files > 4776 groups 1 seqs 1 sources 6 feature types > 2379 gene predictions > > I'm using Fedora (for bioperl) and CentOS (for ldHgGene), virtualized on a > Mac with Parallels. Maybe tis is the cause for such a message. > > Regards > > > ?ngel > > > El 09/11/2011, a las 17:12, Scott Cain escribi?: > > Hi Angel, > > I would suggest using bp_genbank2gff3.pl, as it is more actively > maintained; the bp_genbank2gff.pl script hasn't really been touched in > many years, and I imagine it's suffering from some serious code rot. > > Scott > > > 2011/11/9 Angel Zaballos > >> Running bp_genbank2gff.pl got this: >> >> [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession >> AAXT01000001.1 > babesichr3.gff >> Replacement list is longer than search list at >> /usr/share/perl5/Bio/Range.pm line 251. >> >> >> >> ?ngel Zaballos >> Unidad de Gen?mica >> Centro Nacional de Microbiolog?a-ISCIII >> Carretera Majadahonda-Pozuelo, Km 2,2 >> 28220-Majadahonda >> >> Tel: 918223994 >> mail: azaballos at isciii.es >> >> >> >> >> ************************* AVISO LEGAL ************************* >> Este mensaje electr?nico est? dirigido exclusivamente a sus destinatarios, >> pudiendo contener documentos anexos de car?cter privado y confidencial. 
>> Si por error, ha recibido este mensaje y no se encuentra entre los >> destinatarios, por favor, no use, informe, distribuya, imprima o copie su >> contenido por ning?n medio. Le rogamos lo comunique al remitente y borre >> completamente el mensaje y sus anexos. El Instituto de Salud Carlos III no >> asume ning?n tipo de responsabilidad legal por el contenido de este >> mensaje >> cuando no responda a las funciones atribuidas al remitente del mismo por >> la >> normativa vigente. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain > dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > > ?ngel Zaballos > Unidad de Gen?mica > Centro Nacional de Microbiolog?a-ISCIII > Carretera Majadahonda-Pozuelo, Km 2,2 > 28220-Majadahonda > > Tel: 918223994 > mail: azaballos at isciii.es > > > > ************************* AVISO LEGAL ************************* > Este mensaje electr?nico est? dirigido exclusivamente a sus destinatarios, > pudiendo contener documentos anexos de car?cter privado y confidencial. > Si por error, ha recibido este mensaje y no se encuentra entre los > destinatarios, por favor, no use, informe, distribuya, imprima o copie su > contenido por ning?n medio. Le rogamos lo comunique al remitente y borre > completamente el mensaje y sus anexos. El Instituto de Salud Carlos III no > asume ning?n tipo de responsabilidad legal por el contenido de este mensaje > cuando no responda a las funciones atribuidas al remitente del mismo por la > normativa vigente. > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From l.m.timmermans at students.uu.nl Thu Nov 10 12:17:12 2011 From: l.m.timmermans at students.uu.nl (L.M. Timmermans) Date: Thu, 10 Nov 2011 18:17:12 +0100 Subject: [Bioperl-l] Scripting help to identify adaptors count in reads In-Reply-To: References: <32818254.post@talk.nabble.com> Message-ID: On Thu, Nov 10, 2011 at 5:06 PM, Juan Jovel wrote: > > There are many ways to do it. > Perhaps the simplest is to count the number of times the adapter sequence > (or part of it) appears in each read. > For example: > $adapter_matches = tr/adapter_sequence/adapter_sequence/;# > $adapter_matches will store the number of times the adapter sequence is > repeated. > No, it will not. tr/// will count characters, not sequences. Something like "scalar (() = $sequence =~ m/(N+)/g)" should work OTOH. Leon From cjfields at illinois.edu Thu Nov 10 14:17:57 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 10 Nov 2011 19:17:57 +0000 Subject: [Bioperl-l] bp_genbank2gff.pl bug In-Reply-To: References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es> <0CACB816-21E2-494E-ACEF-67F9FBB5AC41@isciii.es> Message-ID: <66F13EAF-0DAA-45E0-AB5B-E71EC8FA2323@illinois.edu> This is running an older version of bioperl (probably 1.6.0 or 1.6.1). The warnings pop up when using perl v5.12 or v5.14; the first warning is from a bad tr/// in Bio::Range, the second is from bad usage of UNIVERSAL functions. Both have been addressed. chris On Nov 10, 2011, at 10:55 AM, Scott Cain wrote: > Hi Angel, > > Please keep correspondence on the mailing list. > > I just ran bp_genbank2gff.pl with a genbank file (fruit fly mitocontria), > and it worked fine. I suspect there is something wrong with your genbank > file. > > Scott > > > On Thu, Nov 10, 2011 at 3:15 AM, Angel Zaballos wrote: > >> His Scott, >> >> Thanks everyone for your help.
I tried bp_genbank2gff3.pl, but the same >> happened: >> >> [root at localhost zaballos]# bp_genbank2gff3.pl babesiaChr3.gbk > >> babesichr3_2.gff >> Replacement list is longer than search list at >> /usr/share/perl5/Bio/Range.pm line 251. >> UNIVERSAL->import is deprecated and will be removed in a future perl at >> /usr/share/perl5/Bio/Tree/TreeFunctionsI.pm line 94 >> >> However, the output file seems to be correct (Indeed, that was also the >> case for bp_genbank2gff.pl). I then ran ldHgGene and worked: >> >> [zaballos at localhost ~]$ ./ldHgGene -out=babesiaChr3_2.gpe db tab >> babesiachr3_2.gff >> Reading babesiachr3_2.gff >> Read 4776 transcripts in 8821 lines in 1 files >> 4776 groups 1 seqs 1 sources 6 feature types >> 2379 gene predictions >> >> I'm using Fedora (for bioperl) and CentOS (for ldHgGene), virtualized on a >> Mac with Parallels. Maybe tis is the cause for such a message. >> >> Regards >> >> >> ?ngel >> >> >> El 09/11/2011, a las 17:12, Scott Cain escribi?: >> >> Hi Angel, >> >> I would suggest using bp_genbank2gff3.pl, as it is more actively >> maintained; the bp_genbank2gff.pl script hasn't really been touched in >> many years, and I imagine it's suffering from some serious code rot. >> >> Scott >> >> >> 2011/11/9 Angel Zaballos >> >>> Running bp_genbank2gff.pl got this: >>> >>> [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession >>> AAXT01000001.1 > babesichr3.gff >>> Replacement list is longer than search list at >>> /usr/share/perl5/Bio/Range.pm line 251. >>> >>> >>> >>> ?ngel Zaballos >>> Unidad de Gen?mica >>> Centro Nacional de Microbiolog?a-ISCIII >>> Carretera Majadahonda-Pozuelo, Km 2,2 >>> 28220-Majadahonda >>> >>> Tel: 918223994 >>> mail: azaballos at isciii.es >>> >>> >>> >>> >>> ************************* AVISO LEGAL ************************* >>> Este mensaje electr?nico est? dirigido exclusivamente a sus destinatarios, >>> pudiendo contener documentos anexos de car?cter privado y confidencial. 
>>> Si por error, ha recibido este mensaje y no se encuentra entre los >>> destinatarios, por favor, no use, informe, distribuya, imprima o copie su >>> contenido por ning?n medio. Le rogamos lo comunique al remitente y borre >>> completamente el mensaje y sus anexos. El Instituto de Salud Carlos III no >>> asume ning?n tipo de responsabilidad legal por el contenido de este >>> mensaje >>> cuando no responda a las funciones atribuidas al remitente del mismo por >>> la >>> normativa vigente. >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> >> >> -- >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D. scott at scottcain >> dot net >> GMOD Coordinator (http://gmod.org/) 216-392-3087 >> Ontario Institute for Cancer Research >> >> >> ?ngel Zaballos >> Unidad de Gen?mica >> Centro Nacional de Microbiolog?a-ISCIII >> Carretera Majadahonda-Pozuelo, Km 2,2 >> 28220-Majadahonda >> >> Tel: 918223994 >> mail: azaballos at isciii.es >> >> >> >> ************************* AVISO LEGAL ************************* >> Este mensaje electr?nico est? dirigido exclusivamente a sus destinatarios, >> pudiendo contener documentos anexos de car?cter privado y confidencial. >> Si por error, ha recibido este mensaje y no se encuentra entre los >> destinatarios, por favor, no use, informe, distribuya, imprima o copie su >> contenido por ning?n medio. Le rogamos lo comunique al remitente y borre >> completamente el mensaje y sus anexos. El Instituto de Salud Carlos III no >> asume ning?n tipo de responsabilidad legal por el contenido de este mensaje >> cuando no responda a las funciones atribuidas al remitente del mismo por la >> normativa vigente. >> >> > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. 
scott at scottcain dot > net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From biopython at maubp.freeserve.co.uk Thu Nov 10 14:27:22 2011 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 10 Nov 2011 19:27:22 +0000 Subject: [Bioperl-l] [Bioperl-announce-l] Null Pointer - NCBI C++ Exception In-Reply-To: References: <31659982.post@talk.nabble.com> Message-ID: On Thu, Nov 10, 2011 at 2:00 PM, Fields, Christopher J wrote: > On Nov 10, 2011, at 7:09 AM, Peter wrote: > >> Re: http://lists.open-bio.org/pipermail/bioperl-announce-l/2011-May/000253.html >> >> On Thu, May 19, 2011 at 11:15 PM, rgoldade wrote: >>> >>> I received the following error while trying to run bl2seq from >>> standaloneblastplus. Has anyone else encountered this problem? >>> >>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>> MSG: /usr/bin/blastp call crashed: There was a problem running >>> /usr/bin/blastp : Error: NCBI C++ Exception: >>> >>> "/am/ncbiapdata/release/blast/src/2.2.24/Linux32-Suse-icc/c++/ICC1010-ReleaseMT--Linux32-Suse-icc/../src/corelib/ncbiobj.cpp", >>> line 688: Critical: ncbi::CObject::ThrowNullPointerException() - Attempt to >>> access NULL pointer. >>> >>> Thank you, >>> Ryan >> >> Just hit something very very similar, looks like a BLAST+ bug which I >> will report now: >> >> $ tblastn -out NC_003197_vs_NC_011294.tsv -outfmt 6 -query >> NC_003197.fna -evalue 0.0001 -subject NC_011294.fna >> Error: NCBI C++ Exception: >> ? ?"/am/ncbiapdata/release/blast/src/2.2.25/IntelMAC-universal/c++/GCC401-ReleaseMT--IntelMAC-universal/../src/corelib/ncbiobj.cpp", >> line 689: Critical: ncbi::CObject::ThrowNullPointerException() - >> Attempt to access NULL pointer. 
>> >> This was on a Mac using BLAST 2.2.25+ (tblastn) whereas yours was >> BLAST 2.2.24+ (blastp) from the look of the error. The line number has >> changed by one, but I'm confident it is the same point of failure. >> >> In my case I was comparing nucleotide against nucleotide, so should >> have been using tblastx not tblastn, but it still shouldn't have had a >> pointer exception. >> >> Peter > > Yeah, that's bad. I have seen a few things like this myself that make me worry about the transition to BLAST+. > > chris I'm told this is already fixed and will be part of BLAST 2.2.26+, which is good. > > PS - Odd I didn't see this one, was it caught in the bioperl-announce filter? > Maybe once, but it was in the archive and my email account. Peter From anna.fr at gmail.com Thu Nov 10 15:01:57 2011 From: anna.fr at gmail.com (Anna Friedlander) Date: Fri, 11 Nov 2011 09:01:57 +1300 Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi? Message-ID: Hi all Does anyone know if there is a way to get a Taxonomy node and/or taxonid from a gi number using the flatfile with taxonomy db? I have blast output that I want to append taxonomic information to. I have hundreds of thousands of items to do this for, so it's not practical to use entrez to query the NCBI database. I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I think much too large to put into a hash! This was also discussed in 2009: http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I don't think there was a conclusion? Thanks for your help Anna Friedlander From shalabh.sharma7 at gmail.com Thu Nov 10 15:12:09 2011 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Thu, 10 Nov 2011 15:12:09 -0500 Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi? In-Reply-To: References: Message-ID: Hi Anna, I think the thread you mentioned was started by me. At that time I wrote a few scripts to map gi to taxa, and after some time I found some other efficient ways as well.
But recently 'Miguel Pignatelli' directed me to some Bio-LITE modules that are really helpful. These are the modules he mentioned; I found them really easy to use and very efficient. Bio-LITE-Taxonomy-0.07 Bio-LITE-Taxonomy-NCBI-0.07 Bio-LITE-Taxonomy-NCBI-Gi2taxid-0.04 Cheers Shalabh On Thu, Nov 10, 2011 at 3:01 PM, Anna Friedlander wrote: > Hi all > > Does anyone know if there is a way to get a Taxonomy node and/or > taxonid from a gi number using the flatfile with taxonomy db? > > I have blast output that I want to append taxonomic information to. I > have hundreds of thousands of items to do this for, so it's not > practical to use entrez to query the NCBI database. > > I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I > think much too large to put into a hash! > > This was also discussed in 2009: > http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I > don't think there was a conclusion? > > Thanks for your help > Anna Friedlander > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From cjfields at illinois.edu Thu Nov 10 15:23:14 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 10 Nov 2011 20:23:14 +0000 Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi? In-Reply-To: References: Message-ID: <53AF9ECA-5905-4D14-B7C1-FF4B2F2FA084@illinois.edu> Yes, these are probably wrappers around the gi2taxid and taxonomy data; bioperl lacks the former, whereas the latter is handled by Bio::DB::Taxonomy (the 'flatfile' option). I did something very similar locally, though I used Bio::DB::Taxonomy for the taxonomy lookups.
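A rough sketch of how the gi -> taxid half could be kept on disk rather than in an in-memory hash. This is an illustration under stated assumptions, not the code referred to above: it assumes the NCBI dump is two tab-separated columns (gi, taxid), it uses SDBM_File only because that module ships with perl (DB_File or SQLite would probably scale better to a 3.2GB file), and the names open_gi_index, load_gi_taxid and open_taxonomy are hypothetical. The taxonomy half goes through the 'flatfile' source of Bio::DB::Taxonomy, with placeholder paths for nodes.dmp and names.dmp.

```perl
use strict;
use warnings;
use Fcntl;        # exports O_RDWR and O_CREAT
use SDBM_File;    # bundled with perl; chosen here only for portability

# Open (or create) an on-disk gi -> taxid index as a tied hash, so the
# mapping never has to be held in memory as a whole.
sub open_gi_index {
    my ($index_path) = @_;
    tie my %gi2taxid, 'SDBM_File', $index_path, O_RDWR | O_CREAT, 0666
        or die "Cannot tie $index_path: $!";
    return \%gi2taxid;
}

# Load a "gi<TAB>taxid" dump (assumed layout) into the tied hash,
# one line at a time. Lines without a second column are skipped.
sub load_gi_taxid {
    my ($index, $dump_fh) = @_;
    while (my $line = <$dump_fh>) {
        chomp $line;
        my ($gi, $taxid) = split /\t/, $line;
        next unless defined $taxid;
        $index->{$gi} = $taxid;
    }
    return;
}

# Open the NCBI taxonomy dump through BioPerl's flatfile backend.
# BioPerl is loaded lazily so the index code above works without it.
sub open_taxonomy {
    my ($index_dir, $nodes_file, $names_file) = @_;
    require Bio::DB::Taxonomy;
    return Bio::DB::Taxonomy->new(
        -source    => 'flatfile',
        -directory => $index_dir,     # where BioPerl keeps its own index files
        -nodesfile => $nodes_file,    # e.g. nodes.dmp (placeholder path)
        -namesfile => $names_file,    # e.g. names.dmp (placeholder path)
    );
}
```

With both pieces open, a lookup for one BLAST hit would be along the lines of my $taxon = $db->get_taxon(-taxonid => $index->{$gi}), after checking that the gi is actually present in the index.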
chris On Nov 10, 2011, at 2:12 PM, shalabh sharma wrote: > Hi Anna, > I think the thread you mentioned was started by me. > That time i wrote few scripts to map gi to taxa, after some time i found > some other efficient ways also. But recently 'Miguel Pignatelli' directed > to some Bio-LITE modules that are really helpful. > > These are the modules he mentioned, i found them really easy to use and > very efficient. > > Bio-LITE-Taxonomy-0.07 > Bio-LITE-Taxonomy-NCBI-0.07 > Bio-LITE-Taxonomy-NCBI-**Gi2taxid-0.04 > > Cheers > Shalabh > > On Thu, Nov 10, 2011 at 3:01 PM, Anna Friedlander wrote: > >> Hi all >> >> Does anyone know if there is a way to get a Taxonomy node and/or >> taxonid from a gi number using the flatfile with taxonomy db? >> >> I have blast output that I want to append taxonomic information to. I >> have hundreds of thousands of items to do this for, so it's not >> practical to use entrez to query the NCBI database. >> >> I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I >> think much too large to put into a hash! >> >> This was also discussed in 2009: >> http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I >> don't think there was a conclusion? >> >> Thanks for your help >> Anna Friedlander >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > Shalabh Sharma > Scientific Computing Professional Associate (Bioinformatics Specialist) > Department of Marine Sciences > University of Georgia > Athens, GA 30602-3636 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bernd.web at gmail.com Thu Nov 10 15:51:13 2011 From: bernd.web at gmail.com (Bernd Web) Date: Thu, 10 Nov 2011 21:51:13 +0100 Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi? 
In-Reply-To: References: Message-ID: Hi Anna, Jason changed his example script from using hashes to using SQLite: bp_classify_hits_kingdom - classify BLAST hits by taxonomic kingdom See https://github.com/bioperl/bioperl-live/blob/master/scripts/taxa/bp_classify_hits_kingdom.pl It's an example script that shows how to do the tax to gi mapping for blast reports. Bernd On Thu, Nov 10, 2011 at 9:01 PM, Anna Friedlander wrote: > Hi all > > Does anyone know if there is a way to get a Taxonomy node and/or > taxonid from a gi number using the flatfile with taxonomy db? > > I have blast output that I want to append taxonomic information to. I > have hundreds of thousands of items to do this for, so it's not > practical to use entrez to query the?NCBI database. > > I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I > think much too large to put into a hash! > > This was also discussed in 2009: > http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I > don't think there was a conclusion? > > Thanks for your help > Anna Friedlander > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Thu Nov 10 16:13:12 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 10 Nov 2011 21:13:12 +0000 Subject: [Bioperl-l] Scripting help to identify adaptors count in reads In-Reply-To: References: <32818254.post@talk.nabble.com> Message-ID: <9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu> If the adaptors are masked (e.g. are represented by the N's below) or if you are really confident that the adaptors don't have base mis-calls, why not use split? Maybe with something like 'scalar(split(/N+/, $foo))' or scalar(split(/$adaptor/, $foo)). tr/// won't work for the reasons Leon mentioned; it's a transliteration of a character mapping, not a pattern match. 
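To make the distinction concrete, here is a minimal sketch (not code from anyone in this thread) of counting masked adaptors per read with a pattern match in list context, along the lines Leon suggested. It assumes the adaptors show up as runs of N and that the reads are in FASTA format; the names count_adaptors and adaptor_histogram are made up for illustration.

```perl
use strict;
use warnings;

# Count runs of N (masked adaptors) in one read.
sub count_adaptors {
    my ($read) = @_;
    # A global match in list context returns every match; assigning that
    # list to () and taking the result in scalar context gives the number
    # of matches, i.e. the number of N runs, not the number of N characters.
    my $count = () = $read =~ /N+/g;
    return $count;
}

# Read FASTA records from a filehandle and tally how many reads carry
# 0, 1, 2, ... adaptors. Wrapped sequence lines are joined before counting.
sub adaptor_histogram {
    my ($fh) = @_;
    my %histogram;
    my $seq;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>/) {
            # A new header closes the previous record, if there was one.
            $histogram{ count_adaptors($seq) }++ if defined $seq;
            $seq = '';
        }
        elsif (defined $seq) {
            $seq .= $line;
        }
    }
    # Do not forget the last record in the file.
    $histogram{ count_adaptors($seq) }++ if defined $seq;
    return \%histogram;
}
```

Calling my $hist = adaptor_histogram(\*STDIN) and printing the keys of %$hist in numeric order gives the number of reads per adaptor count. By contrast, $read =~ tr/N// returns the number of N characters in the read, which is why it cannot be used to count adaptors.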
'$foo =~ tr/ATGCatgc/TACGtacg/' for instance converts $foo to the complement sequence (it doesn't match the pattern /ATGCatgc/). chris On Nov 10, 2011, at 10:06 AM, Juan Jovel wrote: > > There are many ways to do it. > Perhaps the simplest is to count the number of times the adapter sequence (or part of it) appears in each read. > For example: > $adapter_matches = tr/adapter_sequence/adapter_sequence/;# $adapter_matches will store the number of times the adapter sequence is repeated. > You then place that result in a hash bin: > my %adapter_frequency;my $class = "$adapter_matches";if(exists $adapter_frequency{$class}){ $adapter_frequency{$class}++}else{ $adapter_frequency{$class} = 1} > # Then you can sort and output your classes > foreach $class (sort keys %adapter_frequency){ print "$class\t$adapter_frequency{$class}\n"; } > > You can workout the details, but something like this should work. > > > > > > > >> Date: Thu, 10 Nov 2011 04:29:55 -0800 >> From: casaburi at ceinge.unina.it >> To: Bioperl-l at lists.open-bio.org >> Subject: [Bioperl-l] Scripting help to identify adaptors count in reads >> >> >> Hi everybody, >> >> i have some reads (454) where there are adaptors (NNNN...), one,two or three >> adaptors for each reads depending on the reads. Is there any way to >> establish how many reads have 1 adaptors, how many 2 and how many 3 adaptors >> over the total ???
>> >>> 271-88 >> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCAGGTGCCTACG >>> 272-88 >> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCANNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTTGCCAGCCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGC >>> 273-88 >> GCCTCCCTCGCGCATCAGATCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCA >>> 274-88 >> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTCCCTCGCGCCATCAGATCGTNNNNNNNNNNNNNNNNNNTCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCATCAGATCGTAGGCACCATCAA >> >> The problem is that some adpators occur in the middle of the sequences >> because they coming out from a concameration experimental design (they are >> miRNAs between NNNNNN...). So i want to know a script or tool that may say >> how many reads have 1 adapt, how many 2, (max are 4) in respect to the total >> number of reads. Do you know any tool/script that may help ? Tnx >> Can anyone suggests me a script to fix this ??? >> >> Thank you very much >> -- >> View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32818254.html >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at gmail.com Thu Nov 10 16:15:29 2011 From: jason.stajich at gmail.com (Jason Stajich) Date: Thu, 10 Nov 2011 13:15:29 -0800 Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi? 
In-Reply-To: References: Message-ID: <1C8130A1-A60A-4F16-A077-4E1B592579BC@gmail.com> Here's another variant, one I wrote for my own purposes. The code at the beginning uses a NoSQL solution to store all the ACC -> GI mappings, and then a second db to store GI -> TAXONID. This is the case where I have a file of accession numbers and I want to add the taxonomy string to the description line. https://github.com/hyphaltip/mobedac-fungi/blob/master/scripts/taxonomy_lookupmissing.pl That's the first 165 lines, and then lookups are basically what you see on line 195. It would be good to rewrite that script below to use Tokyo Cabinet or Kyoto Cabinet (the newer implementation; not sure if it is faster?). One thing this does is take up a lot of disk space, but there are tradeoffs between that and which compression scheme you use, which will impact the performance of loading. Jason On Nov 10, 2011, at 12:51 PM, Bernd Web wrote: > Hi Anna, > > Jason changed his example script from using hashes to using SQLite: > bp_classify_hits_kingdom - classify BLAST hits by taxonomic kingdom > > See > https://github.com/bioperl/bioperl-live/blob/master/scripts/taxa/bp_classify_hits_kingdom.pl > > It's an example script that shows how to do the tax to gi mapping for > blast reports. > > > Bernd > > On Thu, Nov 10, 2011 at 9:01 PM, Anna Friedlander wrote: >> Hi all >> >> Does anyone know if there is a way to get a Taxonomy node and/or >> taxonid from a gi number using the flatfile with taxonomy db? >> >> I have blast output that I want to append taxonomic information to. I >> have hundreds of thousands of items to do this for, so it's not >> practical to use entrez to query the NCBI database. >> >> I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I >> think much too large to put into a hash! >> >> This was also discussed in 2009: >> http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I >> don't think there was a conclusion?
>> >> Thanks for your help >> Anna Friedlander >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From anna.fr at gmail.com Thu Nov 10 20:07:57 2011 From: anna.fr at gmail.com (Anna Friedlander) Date: Fri, 11 Nov 2011 14:07:57 +1300 Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi? In-Reply-To: <1C8130A1-A60A-4F16-A077-4E1B592579BC@gmail.com> References: <1C8130A1-A60A-4F16-A077-4E1B592579BC@gmail.com> Message-ID: thanks all for the fast responses. I'll try the bio-lite modules shalabh mentioned On Fri, Nov 11, 2011 at 10:15 AM, Jason Stajich wrote: > Here's another variant of one I wrote which is for my own purposes, the code > at the beginning uses a NOSQL solution to storing all the ACC -> GI > and then a second db to store GI -> TAXONID > This is the case where I have a file of accession numbers and I want to add > to the description line the taxonomy string. > https://github.com/hyphaltip/mobedac-fungi/blob/master/scripts/taxonomy_lookupmissing.pl > That's the first 165 lines, and then lookups are basically what you see on > line 195. > Would be good to rewrite that script below to use TokyoCabinent > or?KyotoCabinent (is newer implementation, not sure if it is faster?). > one thing that this does is take up a lot of disk space ,but you can have > tradeoffs between than and which compression scheme you use, which will > impact performance of loading. 
> Jason > On Nov 10, 2011, at 12:51 PM, Bernd Web wrote: > > Hi Anna, > > Jason changed his example script from using hashes to using SQLite: > bp_classify_hits_kingdom - classify BLAST hits by taxonomic kingdom > > See > https://github.com/bioperl/bioperl-live/blob/master/scripts/taxa/bp_classify_hits_kingdom.pl > > It's an example script that shows how to do the tax to gi mapping for > blast reports. > > > Bernd > > On Thu, Nov 10, 2011 at 9:01 PM, Anna Friedlander wrote: > > Hi all > > Does anyone know if there is a way to get a Taxonomy node and/or > > taxonid from a gi number using the flatfile with taxonomy db? > > I have blast output that I want to append taxonomic information to. I > > have hundreds of thousands of items to do this for, so it's not > > practical to use entrez to query the NCBI database. > > I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I > > think much too large to put into a hash! > > This was also discussed in 2009: > > http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I > > don't think there was a conclusion? > > Thanks for your help > > Anna Friedlander > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From arun_innovative90 at yahoo.com Fri Nov 11 06:09:46 2011 From: arun_innovative90 at yahoo.com (Arun Kumar) Date: Fri, 11 Nov 2011 03:09:46 -0800 (PST) Subject: [Bioperl-l] BIOPERL MATERIAL Message-ID: <1321009786.42748.YahooMailNeo@web110101.mail.gq1.yahoo.com> Hi team, This is Arun Kumar, a bioinformatics student. I wish to master BioPerl after reading your documents; if possible, please send me a PDF of the BioPerl material, as it will be useful for getting familiar with BioPerl.
Thanks in advance Thanks & Regards, Arunkumar.d From awitney at sgul.ac.uk Fri Nov 11 08:23:29 2011 From: awitney at sgul.ac.uk (Adam Witney) Date: Fri, 11 Nov 2011 13:23:29 +0000 Subject: [Bioperl-l] BIOPERL MATERIAL In-Reply-To: <1321009786.42748.YahooMailNeo@web110101.mail.gq1.yahoo.com> References: <1321009786.42748.YahooMailNeo@web110101.mail.gq1.yahoo.com> Message-ID: All BioPerl documents can be found here: http://www.bioperl.org/wiki/Main_Page And a useful place to start would be the HOWTOs: http://www.bioperl.org/wiki/HOWTOs regards adam On 11 Nov 2011, at 11:09, Arun Kumar wrote: > Hi team, > > This is arun kumar of bio - informatics student wish to master in bioperl after reading your documents, if possible send me PDF of this bioperl as it will be useful to get familier with bioperl. > > Thanks in advance > > Thanks & Regards, > Arunkumar.d > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From casaburi at ceinge.unina.it Fri Nov 11 07:13:50 2011 From: casaburi at ceinge.unina.it (Giorgio C) Date: Fri, 11 Nov 2011 04:13:50 -0800 (PST) Subject: [Bioperl-l] Scripting help to identify adaptors count in reads In-Reply-To: <9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu> References: <32818254.post@talk.nabble.com> <9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu> Message-ID: <32825229.post@talk.nabble.com> Hi thank you for your answer !!! At the end i tried this script and seems to work for this purpose: perl -pe 's/.*?((NNNNNNNNNNNNNNN|NNNNNNNNNNNNNNN)(.*?)(NNNNNNNNNNNNNNN|NNNNNNNNNNNNNNN))/$3/g' Scrivania/orchidea/Fiore/Mydata.fasta > result.txt -- View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32825229.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
From casaburi at ceinge.unina.it Fri Nov 11 07:21:29 2011 From: casaburi at ceinge.unina.it (Giorgio C) Date: Fri, 11 Nov 2011 04:21:29 -0800 (PST) Subject: [Bioperl-l] Scripting help to identify adaptors count in reads In-Reply-To: <9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu> References: <32818254.post@talk.nabble.com> <9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu> Message-ID: <32825274.post@talk.nabble.com> Thanks everybody for answering me so soon !!! Probably another way may be: perl -ne '$count{s/N+//g}++ if /^[^>]/;END{for $i (keys %count){print "$count{$i} have $i ADAPTOR\n";}}' myFile.fasta > result.txt and/or with 'nawk': nawk -F'[N]+' '/^[^>]/{a[NF-1]++}END{for(i in a) print a[i] " have " i " ADAPTOR"}' myFile.fasta > result.txt They give the same result. If you will have this problem try these, work good !!! Still Thanks to all, Giorgio -- View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32825274.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From p.j.a.cock at googlemail.com Sun Nov 13 07:24:35 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 13 Nov 2011 12:24:35 +0000 Subject: [Bioperl-l] OBDA redux? In-Reply-To: References: Message-ID: On Thu, Nov 3, 2011 at 7:47 PM, Fields, Christopher J wrote: > On Nov 3, 2011, at 1:52 PM, Peter Cock wrote: > >> On Thu, Nov 3, 2011 at 6:28 PM, Fields, Christopher J >> wrote: >>> (side thread, so re-titling...) >>> >> And CC'ing open-bio-l, which is a better home for this than bioperl-l, >> where OBDA v2 talk came up again in discussion of a BioPerl indexing >> problem. 
Archive links for thread here: >> >> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035807.html >> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035808.html >> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035811.html >> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035812.html >> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035813.html >> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035822.html > > yes, good idea... I've not CC'd the bioperl-l anymore. >>> On Nov 1, 2011, at 1:06 PM, Peter Cock wrote: >>>> >>>> Yes, we're using SQLite3 to store essentially a list of filenames >>>> and their format as one table, and then in the main table an >>>> entry for each sequence recording the ID (only one accession, >>>> unlike OBDA which had infrastructure for a secondary accession), >>>> file number, offset of the start of the record, and optionally the >>>> length of the record on disk. >>>> >>>> i.e. Basically what OBDA does, but using SQLite rather >>>> than BDB (not included in Python 3) or a flat file index >>>> (poor performance with large datasets). >>>> >>>> I find this design attractive on several levels: >>>> * File format neutral, covers FASTA, FASTQ, GenBank, etc >>>> * Preserves the original file untouched >>>> * Index is a small single file (thanks to SQLite) >>>> * Back end could be switched out >>>> * Could be applied to compressed file formats >>>> * Reuses existing parsing code to access entries >>>> >>>> This could easily form basis of OBDA v2, the main points >>>> of difference I anticipate between the Bio* projects would >>>> be naming conventions for the different file formats, and >>>> what we consider to be the default record ID of each read >>>> (e.g. which field in a GenBank file - although agreement >>>> here is not essential). Some of that was already settled in >>>> principle with OBDA v1. 
>>> >>> The primary/secondary IDs could be configurable with a sane >>> default, I think the bioperl implementations allowed this (and >>> it is certainly something that will be requested). >> >> One reason I went with a single ID only was to keep the >> Python dictionary based API simple (think hash in Perl). >> You don't get secondary keys in a Python dict or a hash ;) >> >> As a nod to flexibility, in Biopython's Bio.SeqIO indexing you >> can provide a call back function to map the suggested ID to >> something else. Obviously this doesn't give the full flexibility >> of extracting a field from the record's annotation because we >> don't parse the whole record during indexing (it would be too >> slow). > > Same with bioperl. > >> However, I'm happy for there to be an *optional* secondary >> key in an OBDA v2 SQLite schema, but Biopython probably >> won't populate it. We could optionally use it rather than the >> primary ID on loading an existing index though. > > Optional implementation of that is fine by me. > >> Personally I would stick with one key in the index - it should >> be faster and makes it simpler to switch the back end if we >> need to later. If anyone wants a second key, they can build >> a second index *grin*. > > That's easy enough. > >>>> On the other hand, you could try and store the parsed data >>>> itself, which is where NOSQL looks more interesting. That >>>> essentially requires the ability to serialise your annotated >>>> sequence object model to disk - which would be tricky to do >>>> cross project (much more ambitious than BioSQL is). It also >>>> means the "index" becomes very large because it now holds >>>> all the original data. >>>> >>>> Peter >>> >>> For a fully cross-Bio* compliant format, I don't think it's feasible >>> to use serialized data unless they are serialized in something >>> that is easily deserialized across HLLs (JSON, BSON, YAML, >>> XML, etc). 
Either that, or such data is stored concurrently with >>> the binary blob, along with meta data that indicates the source >>> of the blob, parser, version, etc, etc (unless there are tools out >>> there that reliably interconvert serialized complex data structures >>> between HLLs). Anyway you go about it, it seems like it could >>> be a major ball of hurt, unless implemented very carefully. >> >> You missed out RDF as a serialisation ;) >> >> But yes, going down the shared serialisation route is going >> to be messy - as you are well aware: >> >>> Aside: I think this was one of the problems with >>> Bio::DB::SeqFeature::Store, in that it at one point stored >>> Perl-specific Storable blobs. >>> >>> chris >> >> Peter > > yes, it's a problem w/o an easy solution. Anyway, I think an > implementation of such at this point would be a premature > optimization. > > chris So, Chris and I seem to be in general agreement that an OBDA v2 using SQLite, but based on essentially the same approach as the BDB or flat file based OBDA v1, is a good idea, i.e. tables mapping record identifiers to file offsets in the original sequence files. I hope to get BioRuby on board; they already have OBDA v1 support, so that shouldn't be too hard. Right now I don't recall if BioJava has/had OBDA v1 support and, if they did, whether it was affected by their recent move to BioJava v3 (I understand from their mailing list that some older, lower priority functionality has not all been ported yet). Also, EMBOSS are likely to be interested; certainly Peter Rice was interested in the SQLite indexing we're already using in Biopython for sequence files (i.e. what is effectively the prototype for OBDA v2). Note that in addition to simple indexing of text files, we are already using the same simple offset + length approach for indexing binary files (e.g. SFF).
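As a rough illustration of the offset + length idea, here is a minimal sketch in Python using only the standard library. The table and column names (file_data, offset_data, key, file_number, offset, length) and the helper names build_index and fetch_raw are assumptions made up for this example, not an agreed OBDA v2 schema. It only handles FASTA, and it assumes every header line has an ID as its first word.

```python
import sqlite3


def build_index(fasta_path, index_path=":memory:"):
    """Index one FASTA file: record ID -> (file number, byte offset, length).

    Table and column names are illustrative assumptions only.
    """
    con = sqlite3.connect(index_path)
    con.execute(
        "CREATE TABLE file_data (file_number INTEGER PRIMARY KEY, name TEXT)"
    )
    con.execute(
        "CREATE TABLE offset_data (key TEXT PRIMARY KEY, "
        "file_number INTEGER, offset INTEGER, length INTEGER)"
    )
    con.execute("INSERT INTO file_data VALUES (0, ?)", (fasta_path,))

    rows = []
    key, start = None, 0
    # Binary mode, so tell() gives true byte offsets into the original file.
    with open(fasta_path, "rb") as handle:
        while True:
            pos = handle.tell()
            line = handle.readline()
            if not line or line.startswith(b">"):
                if key is not None:
                    # The previous record ends where this one starts (or at EOF).
                    rows.append((key, 0, start, pos - start))
                if not line:
                    break
                # Record ID = first word after ">" (an assumption of this sketch).
                key, start = line[1:].split(None, 1)[0].decode(), pos
    con.executemany("INSERT INTO offset_data VALUES (?, ?, ?, ?)", rows)
    con.commit()
    return con


def fetch_raw(con, key):
    """Return the untouched bytes of one record by seeking into the original file."""
    row = con.execute(
        "SELECT name, offset, length FROM offset_data "
        "JOIN file_data USING (file_number) WHERE key = ?",
        (key,),
    ).fetchone()
    if row is None:
        raise KeyError(key)
    name, offset, length = row
    with open(name, "rb") as handle:
        handle.seek(offset)
        return handle.read(length)
```

The raw bytes returned by fetch_raw would then be handed to whatever parser the project already has for that file format, which is why the original file can stay untouched and the index stays small.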
On the immediate practical side, I think I can edit the current OBDA website at http://obda.open-bio.org/ via /home/websites/obda.open-bio.org/html on the server. We need to work out where the current OBDA indexing specification lives (CVS or SVN?) and perhaps move that to GitHub. We may need a general OBF organisation account on GitHub for this and any other cross-project repositories. I see there is already an OBDA project on Redmine (Chris, can you add me to that please?): https://redmine.open-bio.org/projects/obda Peter From p.j.a.cock at googlemail.com Sun Nov 13 07:30:37 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 13 Nov 2011 12:30:37 +0000 Subject: [Bioperl-l] OBDA redux? Compressed files Message-ID: Hi again, I've retitled this as it is a little off topic from the main OBDA redux thread, http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000819.html http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000820.html http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000821.html As far as I recall, the original flat file and BDB based OBDA specification for indexing sequence files didn't cover compressed files. That might be something to consider (although we should sort out uncompressed text/binary files first). I've recently been experimenting with using compressed files - in particular simple GZIP files (ignoring any block structure) and BGZF (the specialised gzipped blocking used in BAM), see: http://blastedbio.blogspot.com/2011/11/bgzf-blocked-bigger-better-gzip.html http://seqanswers.com/forums/showthread.php?t=15347 The virtual offset approach used in BGZF squeezes a 16 bit within-block offset (thus limiting you to 64KB blocks) and a 48 bit block start offset (thus limiting you to a 256TB file) into a single 64 bit "virtual" offset.
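To make that bit packing concrete, here is a small Python sketch. The function names are made up for illustration; the layout (block start in the top 48 bits, within-block offset in the bottom 16 bits) is the BGZF one described above.

```python
def make_virtual_offset(block_start, within_block):
    """Pack a 48 bit block start and a 16 bit within-block offset into 64 bits."""
    if not 0 <= within_block < 2**16:
        raise ValueError("within-block offset must fit in 16 bits (64KB blocks)")
    if not 0 <= block_start < 2**48:
        raise ValueError("block start must fit in 48 bits (256TB file)")
    return (block_start << 16) | within_block


def split_virtual_offset(virtual_offset):
    """Recover (block_start, within_block) from a 64 bit virtual offset."""
    return virtual_offset >> 16, virtual_offset & 0xFFFF
```

Because the block start sits in the high bits, virtual offsets sort in the same order as positions in the uncompressed data, and each one fits in a single integer column.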
Packing both values into a single number makes sense if you are keeping a lookup table of many offsets in memory, and it can be used as is with code expecting a single offset (like the current Biopython SQLite index schema). Also bzip2, but this is block based, with the block size ranging from 100KB to 900KB. http://bzip.org/ http://bzip.org/1.0.5/bzip2-manual-1.0.5.html I haven't tried any performance tests yet, which would be interesting as I believe compression/decompression of bzip2 is more costly in CPU terms than gzip (although both will be block size dependent). If we wanted to imitate the BGZF virtual offset scheme for arbitrary BZIP2 files, an alternative 64 bit virtual offset scheme could use 20 bits to cover bz2 blocks of up to 900KB, leaving 64 - 20 = 44 bits for the start offset, thus limiting you to just 2^44 bytes or 16TB, which sounds OK only in the medium term. On the bright side this could be used to index any BZIP2 file (under 16TB), whereas the BGZF scheme cannot be applied to arbitrary GZIP files. On the other hand, storing the block start and within-block offset separately is truly generic and could be used on any blocked GZIP file (including BGZF) and BZIP2 etc. It would make the SQLite schema a bit more complicated though. Maybe something to consider for the next revision to OBDA, and focus on the non-compressed case for now? Regards, Peter From p.j.a.cock at googlemail.com Sun Nov 13 07:32:12 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 13 Nov 2011 12:32:12 +0000 Subject: [Bioperl-l] OBDA redux?
Compressed files In-Reply-To: References: Message-ID: On Sun, Nov 13, 2011 at 12:30 PM, Peter Cock wrote: > Hi again, > > I've retitled this as it is a little off topic from the main OBDA redux thread, > http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000819.html > http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000820.html > http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000821.html > > As far as I recall, the original flat file and BDB based OBDA > specification for indexing sequencing files didn't cover > compressed files. That might be something to consider > (although we should sort out uncompressed text/binary > files first). Sorry - I didn't mean to include bioperl-l on that, although it may be of interest to you guys anyway. Peter From jluis.lavin at unavarra.es Mon Nov 14 06:14:43 2011 From: jluis.lavin at unavarra.es (José Luis Lavín) Date: Mon, 14 Nov 2011 12:14:43 +0100 Subject: [Bioperl-l] Fwd: How to get Remote BLAST results in a single out In-Reply-To: References: Message-ID: Hello everybody, I've been using "Bio::Tools::Run::RemoteBlast" for a while and it has worked fine for me. Now I need to perform a multiple BLAST search, but this time I'd like to get all the BLAST results in a single output file instead of having each sequence's report written individually. I've read the module's documentation, but because of my limited experience with modules as complex as this one, I can't figure out where to change the script to achieve this. Here is the script I've been using (it's basically the one posted in the module cookbook).
#!/c:/Perl -w
use Bio::Tools::Run::RemoteBlast;
use Bio::SearchIO;
use Data::Dumper;

# Here I set the parameters for BLAST
print "Enter your BLAST choice (blastn, blastp, blastx, tblastn, tblastx):\n";
chomp(my $blst = <STDIN>);
my $prog = "$blst";
print "Enter a database to search (nr, refseq_protein, swissprot, pat, pdb, env_nr):\n";
chomp(my $dtb = <STDIN>);
my $db = "$dtb";
print "Enter your cutoff score (1e-n):\n";
chomp(my $cut = <STDIN>);
my $e_val = "$cut";

my @params = ( '-prog'       => $prog,
               '-data'       => $db,
               '-expect'     => $e_val,
               '-readmethod' => 'SearchIO' );

my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params);

# Select the file and run the BLAST.
print "Enter your FASTA file:\n";
chomp(my $infile = <STDIN>);
my $r = $remoteBlast->submit_blast($infile);
my $v = 1;

print STDERR "waiting..." if( $v > 0 );

######## WAIT FOR THE RESULTS TO RETURN!!!!!
while ( my @rids = $remoteBlast->each_rid ) {
    foreach my $rid ( @rids ) {
        my $rc = $remoteBlast->retrieve_blast($rid);
        if( !ref($rc) ) {
            # No result object yet: drop the request if it failed,
            # otherwise keep waiting.
            if( $rc < 0 ) {
                $remoteBlast->remove_rid($rid);
            }
            print STDERR "." if ( $v > 0 );
            sleep 5;
        } else {
            my $result = $rc->next_result();
            # save the output (one file per query)
            my $filename = $result->query_name()."\.out";  ##################open SALIDA, '>>'."$^T"."Report"."\.out";
            $remoteBlast->save_output($filename);  #############
            $remoteBlast->remove_rid($rid);
            print "\nQuery Name: ", $result->query_name(), "\n";
            while ( my $hit = $result->next_hit ) {
                next unless ( $v > 0);
                print "\thit name is ", $hit->name, "\n";
                while( my $hsp = $hit->next_hsp ) {
                    print "\t\tscore is ", $hsp->score, "\n";
                }
            }
        }
    }
}

Could any of you please explain how I can do this? Thanks in advance. With best wishes, -- -- Dr. José Luis Lavín Trueba Dpto. de Producción Agraria Grupo de Genética y Microbiología Universidad Pública de Navarra 31006 Pamplona Navarra SPAIN -- -- Dr. José Luis Lavín Trueba Dpto.
de Producción Agraria Grupo de Genética y Microbiología Universidad Pública de Navarra 31006 Pamplona Navarra SPAIN From jason.stajich at gmail.com Mon Nov 14 06:59:56 2011 From: jason.stajich at gmail.com (Jason Stajich) Date: Mon, 14 Nov 2011 06:59:56 -0500 Subject: [Bioperl-l] Fwd: How to get Remote BLAST results in a single out In-Reply-To: References: Message-ID: If you want to run a bunch of BLASTs remotely from the command line, you could also just use NCBI's blastcl3 tool (not sure if there is a BLAST+ equivalent). This might be faster and easier, since otherwise you need to learn the programming part too. If you want to do this within this code, I guess the question is what format you want the data in - a BLAST report or something more like a table? On Nov 14, 2011, at 6:14 AM, José Luis Lavín wrote: > Hello everybody, > > I've been using "Bio::Tools::Run::RemoteBlast" for a time and it has > worked fine for me. Now I need to perform a multiple BLAST search, but this > time I'd just like to get all the BLAST results in a single out file > instead of having each sequence's report written individually. I've read > the documentation of the module, but due to my short > experience/understanding on complex modules as this one seems to be I can't > figure out where to change the script to achieve my previously mentioned > aim. > Here I post the script I've been using (it's basically the one posted on > the module cookbook).
> > #!/c:/Perl -w > use Bio::Tools::Run::RemoteBlast; > use Bio::SearchIO; > use Data::Dumper; > > #Here i set the parameters for blast > print "Enter your BLAST choice (blastn, blastp, blastx, tblastn, > tblastx):\n"; > my $blst = ; > my $prog = "$blst"; > print "Enter a database to search (nr, refseq_protein, swissprot, pat, pdb, > env_nr):\n"; > my $dtb = ; > $db = "$dtb"; > print "Enter your cutt off score (1e-n):\n"; > my $cut = ; > my $e_val = "$cut"; > > my @params = ( '-prog' => $prog, > '-data' => $db, > '-expect' => $e_val, > '-readmethod' => 'SearchIO' ); > > my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params); > > > #Select the file and make the blast. > print "Enter your FASTA file:\n"; > chomp(my $infile = ); > my $r = $remoteBlast->submit_blast($infile); > my $v = 1; > > print STDERR "waiting..." if( $v > 0 ); ######## WAIT FOR THE RESULTS > TO RETURN!!!!! > while ( my @rids = $remoteBlast->each_rid ) { > foreach my $rid ( @rids ) { > my $rc = $remoteBlast->retrieve_blast($rid); > if( !ref($rc) ) { > if( $rc < 0 ) { > $remoteBlast->remove_rid($rid); > } > print STDERR "." if ( $v > 0 ); > sleep 5; > } else { > my $result = $rc->next_result(); > #save the output > my $filename = > $result->query_name()."\.out";##################open SALIDA, > '>>'."$^T"."Report"."\.out"; > $remoteBlast->save_output($filename);############# > $remoteBlast->remove_rid($rid); > print "\nQuery Name: ", $result->query_name(), "\n"; > while ( my $hit = $result->next_hit ) { > next unless ( $v > 0); > print "\thit name is ", $hit->name, "\n"; > while( my $hsp = $hit->next_hsp ) { > print "\t\tscore is ", $hsp->score, "\n"; > } > } > } > } > } > > > May any of you please explain me how to solve my question? > > Thanks in advence > > With best wishes > > -- > -- > Dr. Jos? Luis Lav?n Trueba > > Dpto. de Producci?n Agraria > Grupo de Gen?tica y Microbiolog?a > Universidad P?blica de Navarra > 31006 Pamplona > Navarra > SPAIN > > > > -- > -- > Dr. Jos? 
Luis Lavín Trueba > > Dpto. de Producción Agraria > Grupo de Genética y Microbiología > Universidad Pública de Navarra > 31006 Pamplona > Navarra > SPAIN > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at gmail.com Mon Nov 14 09:07:36 2011 From: jason.stajich at gmail.com (Jason Stajich) Date: Mon, 14 Nov 2011 09:07:36 -0500 Subject: [Bioperl-l] Fwd: Fwd: How to get Remote BLAST results in a single out References: Message-ID: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com> Please keep these discussions on the list. Sent from my iPhone-please excuse typos -- Jason Stajich Begin forwarded message: > From: José Luis Lavín > Date: November 14, 2011 8:04:25 AM EST > To: Jason Stajich > Subject: Re: [Bioperl-l] Fwd: How to get Remote BLAST results in a single out > > Hello Jason, > > To answer your question: > > " If you want to do this within this code I guess the question is what format you want the data in - a BLAST report or something more like a table?" > > A concatenation of BLAST (default format) reports should be OK, since I have a script to parse that kind of result. Formats 1 or 2 will also do the trick. > I'd be happy to get help on how to change the OUTFILE from "a query a report" to "all queries in the same report", because I don't seem to be able to do it myself after reading the module documentation. > > Thanks in advance > > On 14 November 2011 12:59, Jason Stajich wrote: > if you want to do a bunch of BLASTs remotely on the cmdline you should also just use the NCBI's blastcl3 tool (not sure if there is a BLAST+ equivalent). This might be faster to do and easier since you need to learn the programming part too. > > If you want to do this within this code I guess the question is what format you want the data in - a BLAST report or something more like a table? > > On Nov 14, 2011, at 6:14 AM, José
Luis Lav?n wrote: > > > Hello everybody, > > > > I've been using "Bio::Tools::Run::RemoteBlast" for a time and it has > > worked fine for me. Now I need to perform a multiple BLAST search, but this > > time I'd just like to get all the BLAST results in a single out file > > instead of having each sequence's report written individually. I've read > > the documentation of the module, but due to my short > > experience/understanding on complex modules as this one seems to be I can't > > figure out where to change the script to achieve my previously mentioned > > aim. > > Here I post the script I've been using (it's basically the one posted on > > the module cookbook). > > > > #!/c:/Perl -w > > use Bio::Tools::Run::RemoteBlast; > > use Bio::SearchIO; > > use Data::Dumper; > > > > #Here i set the parameters for blast > > print "Enter your BLAST choice (blastn, blastp, blastx, tblastn, > > tblastx):\n"; > > my $blst = ; > > my $prog = "$blst"; > > print "Enter a database to search (nr, refseq_protein, swissprot, pat, pdb, > > env_nr):\n"; > > my $dtb = ; > > $db = "$dtb"; > > print "Enter your cutt off score (1e-n):\n"; > > my $cut = ; > > my $e_val = "$cut"; > > > > my @params = ( '-prog' => $prog, > > '-data' => $db, > > '-expect' => $e_val, > > '-readmethod' => 'SearchIO' ); > > > > my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params); > > > > > > #Select the file and make the blast. > > print "Enter your FASTA file:\n"; > > chomp(my $infile = ); > > my $r = $remoteBlast->submit_blast($infile); > > my $v = 1; > > > > print STDERR "waiting..." if( $v > 0 ); ######## WAIT FOR THE RESULTS > > TO RETURN!!!!! > > while ( my @rids = $remoteBlast->each_rid ) { > > foreach my $rid ( @rids ) { > > my $rc = $remoteBlast->retrieve_blast($rid); > > if( !ref($rc) ) { > > if( $rc < 0 ) { > > $remoteBlast->remove_rid($rid); > > } > > print STDERR "." 
if ( $v > 0 ); > > sleep 5; > > } else { > > my $result = $rc->next_result(); > > #save the output > > my $filename = > > $result->query_name()."\.out";##################open SALIDA, > > '>>'."$^T"."Report"."\.out"; > > $remoteBlast->save_output($filename);############# > > $remoteBlast->remove_rid($rid); > > print "\nQuery Name: ", $result->query_name(), "\n"; > > while ( my $hit = $result->next_hit ) { > > next unless ( $v > 0); > > print "\thit name is ", $hit->name, "\n"; > > while( my $hsp = $hit->next_hsp ) { > > print "\t\tscore is ", $hsp->score, "\n"; > > } > > } > > } > > } > > } > > > > > > May any of you please explain me how to solve my question? > > > > Thanks in advence > > > > With best wishes > > > > -- > > -- > > Dr. Jos? Luis Lav?n Trueba > > > > Dpto. de Producci?n Agraria > > Grupo de Gen?tica y Microbiolog?a > > Universidad P?blica de Navarra > > 31006 Pamplona > > Navarra > > SPAIN > > > > > > > > -- > > -- > > Dr. Jos? Luis Lav?n Trueba > > > > Dpto. de Producci?n Agraria > > Grupo de Gen?tica y Microbiolog?a > > Universidad P?blica de Navarra > > 31006 Pamplona > > Navarra > > SPAIN > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > -- > Dr. Jos? Luis Lav?n Trueba > > Dpto. 
de Producción Agraria > Grupo de Genética y Microbiología > Universidad Pública de Navarra > 31006 Pamplona > Navarra > SPAIN From cl134 at duke.edu Sun Nov 13 09:42:05 2011 From: cl134 at duke.edu (Cheng-Ruei Lee) Date: Sun, 13 Nov 2011 09:42:05 -0500 Subject: [Bioperl-l] Suggestion for Bio::PopGen::Statistics Message-ID: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu> Hi all, Bioperl version: 1.006 I get two error messages when using this module to calculate Fu & Li's statistics: Illegal division by zero at (the Statistics.pm file) line 359 Illegal division by zero at (the Statistics.pm file) line 376 Tracking this down further shows that the first error happens when $n (sample size in the ingroup) equals 1 or 2, and the second error happens when $n equals 3. This is not really a bug though. I would suggest either adding a check to the original code before the calculation (skipping the current calculation when $n == 1, 2, or 3, rather than letting the whole program die), or adding a few lines of notes to the CPAN page. Sincerely, Cheng-Ruei Lee From joluito at gmail.com Mon Nov 14 04:21:31 2011 From: joluito at gmail.com (José Luis Lavín) Date: Mon, 14 Nov 2011 10:21:31 +0100 Subject: [Bioperl-l] How to get Remote BLAST results in a single out Message-ID: Hello everybody, I've been using "Bio::Tools::Run::RemoteBlast" for a while and it has worked fine for me. Now I need to perform a multiple BLAST search, but this time I'd like to get all the BLAST results in a single output file instead of having each sequence's report written individually. I've read the module's documentation, but because of my limited experience with modules as complex as this one, I can't figure out where to change the script to achieve this. Here is the script I've been using (it's basically the one posted in the module cookbook).
#!/c:/Perl -w
use Bio::Tools::Run::RemoteBlast;
use Bio::SearchIO;
use Data::Dumper;

# Here I set the parameters for BLAST
print "Enter your BLAST choice (blastn, blastp, blastx, tblastn, tblastx):\n";
chomp(my $blst = <STDIN>);
my $prog = "$blst";
print "Enter a database to search (nr, refseq_protein, swissprot, pat, pdb, env_nr):\n";
chomp(my $dtb = <STDIN>);
my $db = "$dtb";
print "Enter your cutoff score (1e-n):\n";
chomp(my $cut = <STDIN>);
my $e_val = "$cut";

my @params = ( '-prog'       => $prog,
               '-data'       => $db,
               '-expect'     => $e_val,
               '-readmethod' => 'SearchIO' );

my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params);

# Select the file and run the BLAST.
print "Enter your FASTA file:\n";
chomp(my $infile = <STDIN>);
my $r = $remoteBlast->submit_blast($infile);
my $v = 1;

print STDERR "waiting..." if( $v > 0 );

######## WAIT FOR THE RESULTS TO RETURN!!!!!
while ( my @rids = $remoteBlast->each_rid ) {
    foreach my $rid ( @rids ) {
        my $rc = $remoteBlast->retrieve_blast($rid);
        if( !ref($rc) ) {
            # No result object yet: drop the request if it failed,
            # otherwise keep waiting.
            if( $rc < 0 ) {
                $remoteBlast->remove_rid($rid);
            }
            print STDERR "." if ( $v > 0 );
            sleep 5;
        } else {
            my $result = $rc->next_result();
            # save the output (one file per query)
            my $filename = $result->query_name()."\.out";  ##################open SALIDA, '>>'."$^T"."Report"."\.out";
            $remoteBlast->save_output($filename);  #############
            $remoteBlast->remove_rid($rid);
            print "\nQuery Name: ", $result->query_name(), "\n";
            while ( my $hit = $result->next_hit ) {
                next unless ( $v > 0);
                print "\thit name is ", $hit->name, "\n";
                while( my $hsp = $hit->next_hsp ) {
                    print "\t\tscore is ", $hsp->score, "\n";
                }
            }
        }
    }
}

Could any of you please explain how I can do this? Thanks in advance. With best wishes, -- -- Dr. José Luis Lavín Trueba Dpto.
de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN

From cjfields at illinois.edu Mon Nov 14 12:02:22 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 14 Nov 2011 17:02:22 +0000
Subject: [Bioperl-l] How to get Remote BLAST results in a single out
In-Reply-To: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com>
References: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com>
Message-ID: <5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>

Re: a BLAST+ equivalent for blastcl3, I believe there is an option for the various 'blast*' indicating the search is to use a remote database. I haven't used it, though...

chris

On Nov 14, 2011, at 8:07 AM, Jason Stajich wrote:

> Please keep this on list discussions
>
> Sent from my iPhone-please excuse typos
>
> --
> Jason Stajich
>
> Begin forwarded message:
>
>> From: José Luis Lavín
>> Date: November 14, 2011 8:04:25 AM EST
>> To: Jason Stajich
>> Subject: Re: [Bioperl-l] Fwd: How to get Remote BLAST results in a single out
>>
>> Hello Jason,
>>
>> As answering your question:
>>
>> " If you want to do this within this code I guess the question is what format you want the data in - a BLAST report or something more like a table?"
>>
>> A concatenation of BLAST (default format) reports should be OK, since I have a script to parse that kind of results. Anyway formats 1 or 2 will also do the trick.
>> I'll be happy to get assistance on how to change the OUTFILE from "a query a report" to "all queries in the same report", because I don't seem to be able to do it myself after reading the module documentation.
>>
>> Thanks in advance
>>
>> On 14 November 2011 12:59, Jason Stajich wrote:
>> if you want to do a bunch of BLASTs remotely on the cmdline you should also just use the NCBI's blastcl3 tool (not sure if there is a BLAST+ equivalent). This might be faster to do and easier since you need to learn the programming part too.
>>
>> If you want to do this within this code I guess the question is what format you want the data in - a BLAST report or something more like a table?
>>
>> On Nov 14, 2011, at 6:14 AM, José Luis Lavín wrote:
>>
>>> ...

From cjfields at illinois.edu Mon Nov 14 12:03:04 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 14 Nov 2011 17:03:04 +0000
Subject: [Bioperl-l] Suggestion for Bio::PopGen::Statistics
In-Reply-To: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu>
References: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu>
Message-ID:

Cheng,

Have you tried the latest CPAN release (we're at 1.006901)?

chris

On Nov 13, 2011, at 8:42 AM, Cheng-Ruei Lee wrote:

> Hi all,
>
> Bioperl version: 1.006
> Here are two error messages when I'm using this module to calculate Fu & Li's statistics:
> Illegal division by zero at (the Statistics.pm file) line 359
> Illegal division by zero at (the Statistics.pm file) line 376
> A further tracking down shows that the first error happens when $n (sample size in the ingroup) equals 1 or 2, and the second error happens when $n equals 3. This is not really a bug though. I would suggest either in the original code, do a checking before the calculation (and skip the current calculation when $n == 1, 2, or 3 - rather than let the whole program die), or add a few lines of notes in the CPAN page.
>
> Sincerely,
> Cheng-Ruei Lee

From cjfields at illinois.edu Mon Nov 14 12:59:35 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 14 Nov 2011 17:59:35 +0000
Subject: [Bioperl-l] OBDA redux?
In-Reply-To:
References:
Message-ID: <12E3B71D-6E61-41AD-A956-A1FC2076AF24@illinois.edu>

On Nov 13, 2011, at 6:24 AM, Peter Cock wrote:

> So, Chris and I seem in general agreement that an OBDA v2
> using SQLite but based on essentially the same approach as
> the BDB or flat file based OBDA v1 is a good idea. i.e. Tables
> mapping record identifiers to file offsets in the original sequence
> files.

The worry I have is adhering to a specific backend (e.g. SQLite). The reason I say this is b/c BDB in its time was the go-to way of storing simple index data, but that is no longer feasible for very large data sets. Who's to say something similar won't happen to SQLite, or that it is the best option available?

Maybe we should focus on the data storage schema, as simple as it may be, then indicate the default backend must be SQLite but others are allowed (maybe with a mention that SQLite can be replaced by alternatives in the future if needed).

> I hope to get BioRuby on board, they already have an OBDA
> v1 support so that shouldn't be too hard.
>
> Right now I don't recall if BioJava has/had OBDA v1 support,
> and if they did if it was affected in their recent move to BioJava
> v3 (I understand from their mailing list that some older lower
> priority functionality has not all been ported yet).

I wouldn't be surprised at that, OBDA kind of lingered for a while, and I'm not sure how widely adopted it became (maybe others can shed light on that?)

> Also EMBOSS are likely to be interested, certainly Peter Rice
> was interested in the SQLite indexing we're already using in
> Biopython for sequence files (i.e. what is effectively the
> prototype for OBDA v2).
>
> Note that in addition to simple indexing of text files, we are
> already using the same simple offset + length approach for
> indexing binary files (e.g. SFF).

I think that's the general idea; that is how all BioPerl data was indexed, before with the Bio::Index modules and with the OBDA implementations as well.
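
For anyone following along, here is a minimal sketch of the "identifier to offset + length" idea discussed above. It is only an illustration under stated assumptions: it indexes a FASTA file, it keeps the index in a plain in-memory hash where a real implementation would use a SQLite (or other) table, and the function names build_offset_index and fetch_record are made up for this example. They are not part of BioPerl, Bio::Index, or any OBDA specification.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build an index mapping each FASTA record ID to [byte offset, byte length].
# ASSUMPTION: a plain hash stands in for the backend table (SQLite, BDB, ...).
sub build_offset_index {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my (%index, $current_id);
    my $offset = tell($fh);    # byte position where the next line starts
    while (my $line = <$fh>) {
        if ($line =~ /^>(\S+)/) {
            # A header line starts a new record at the current offset.
            $current_id = $1;
            $index{$current_id} = [ $offset, 0 ];
        }
        # Header and sequence lines both count towards the record length.
        $index{$current_id}[1] += length($line) if defined $current_id;
        $offset = tell($fh);
    }
    close $fh;
    return \%index;
}

# Fetch the raw text of one record using only the stored offset and length.
# Returns undef when the ID is not in the index.
sub fetch_record {
    my ($path, $index, $id) = @_;
    my $entry = $index->{$id} or return;
    my ($offset, $length) = @$entry;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    seek($fh, $offset, 0) or die "Cannot seek in $path: $!";
    my $got = read($fh, my $record, $length);
    close $fh;
    die "Short read for $id in $path"
        unless defined $got && $got == $length;
    return $record;
}

# Small demonstration on a throwaway file.
my ($demo_fh, $demo_path) = tempfile(UNLINK => 1);
binmode $demo_fh;
print {$demo_fh} ">seq1 first\nACGT\nACGT\n>seq2 second\nGGCC\n";
close $demo_fh;

my $demo_index = build_offset_index($demo_path);
print fetch_record($demo_path, $demo_index, 'seq2');
```

Because only the ID, offset and length are stored, the same three columns would work unchanged whichever backend holds them, which is the point about keeping the schema separate from the storage engine.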

> On the immediate practical side, I think I can edit the
> current OBDA website of http://obda.open-bio.org/
> via /home/websites/obda.open-bio.org/html on the
> server.

See below w/ regards to my thoughts on the wiki.

> We need to work out where the current OBDA indexing
> specification lives (CVS or SVN?) and perhaps move
> that to github. We may need a general OBF organisation
> account on git hub for this and any other cross-project
> repositories.

+1 to a move to github, but maybe this belongs in an OBF-specific organization. And maybe we should take advantage of the simple wiki or project homepage that GitHub offers and move everything (docs) there.

> I see there is already an OBDA project on RedMine,
> (Chris can you add me to that please?)
> https://redmine.open-bio.org/projects/obda
>
> Peter

Done (last night actually, but I didn't have time to respond immediately).

chris

From David.Messina at sbc.su.se Mon Nov 14 14:31:18 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 14 Nov 2011 20:31:18 +0100
Subject: [Bioperl-l] How to get Remote BLAST results in a single out
In-Reply-To: <5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>
References: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com> <5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>
Message-ID: <29C56604-BBEE-4D80-9662-7C3627907200@sbc.su.se>

> Re: a BLAST+ equivalent for blastcl3, I believe there is an option for the various 'blast*' indicating the search is to use a remote database. I haven't used it, though...

Yes, it's the -remote option. I've used it, and it works great. The speed is throttled by NCBI, however, so for an appreciable number of queries, the standard advice applies: run the search on your own computers.

Dave

From jluis.lavin at unavarra.es Mon Nov 14 16:23:31 2011
From: jluis.lavin at unavarra.es (José Luis Lavín)
Date: Mon, 14 Nov 2011 22:23:31 +0100
Subject: [Bioperl-l] How to get Remote BLAST results in a single out
In-Reply-To: <5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>
References: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com> <5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>
Message-ID:

Thank you very much for your answers, but judging by them, I'm afraid I didn't explain myself well enough. I'm not looking for another tool to perform a BLAST task. I was just wondering if there was a way to simply change the way the module writes its output, so that I can get multiple searches in a single report file instead of having one report for each BLAST search.

Maybe there's some issue I'm not aware of that makes you recommend other tools instead of the BioPerl RemoteBlast module... I would appreciate it if you let me know about that (NCBI server problems with web services or so)...

Thank you for your answers anyway

Best wishes

2011/11/14 Fields, Christopher J

> Re: a BLAST+ equivalent for blastcl3, I believe there is an option for the
> various 'blast*' indicating the search is to use a remote database. I
> haven't used it, though...
>
> chris
>
> ...

--
--
Dr. José Luis Lavín Trueba

Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN

From jason.stajich at gmail.com Mon Nov 14 22:53:19 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Mon, 14 Nov 2011 22:53:19 -0500
Subject: [Bioperl-l] Suggestion for Bio::PopGen::Statistics
In-Reply-To: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu>
References: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu>
Message-ID: <0A6DF9A2-F34F-4277-8E84-C3E5351BB3FF@gmail.com>

sure -- as you say, the implementation presumed that this method would be called with more than 3 individuals, which is a shortcoming. I have committed the code fix but still need someone to add a comment to the perldoc. I've made it a redmine bug.
https://redmine.open-bio.org/issues/3313

Jason

Can you provide a test script and we'll add a test for this so

On Nov 13, 2011, at 9:42 AM, Cheng-Ruei Lee wrote:

> Hi all,
>
> Bioperl version: 1.006
> Here are two error messages when I'm using this module to calculate Fu & Li's statistics:
> Illegal division by zero at (the Statistics.pm file) line 359
> Illegal division by zero at (the Statistics.pm file) line 376
> A further tracking down shows that the first error happens when $n (sample size in the ingroup) equals 1 or 2, and the second error happens when $n equals 3. This is not really a bug though.
> I would suggest either in the original code, do a checking before the calculation (and skip the current calculation when $n == 1, 2, or 3 - rather than let the whole program die), or add a few lines of notes in the CPAN page.
>
> Sincerely,
> Cheng-Ruei Lee

From cchehoud at gmail.com Mon Nov 14 20:39:32 2011
From: cchehoud at gmail.com (Christel Chehoud)
Date: Mon, 14 Nov 2011 17:39:32 -0800
Subject: [Bioperl-l] Bioperl installation help
Message-ID:

Dear BioPerl,

Thank you for creating such useful code. Unfortunately, every time I try to install BioPerl, it takes me a long time and is a challenging ordeal :( I am a new Mac user and was not able to install BioPerl using CPAN. Here is the error I am getting:

ERROR: Can't create '/usr/local/bin'
Do not have write permissions on '/usr/local/bin'
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 at /Users/Christel/.cpan/build/BioPerl-1.6.0-DiZ0rN/Bio/Root/Build.pm line 902
  CJFIELDS/BioPerl-1.6.0.tar.gz
  ./Build install -- NOT OK
----
  You may have to su to root to install the package
  (Or you may want to run something like
    o conf make_install_make_command 'sudo make'
  to raise your permissions.Warning (usually harmless): 'YAML' not installed, will not store persistent state
Failed during this command:
 CMUNGALL/Data-Stag-0.11.tar.gz : make NO
 CJFIELDS/BioPerl-1.6.0.tar.gz : make_test FAILED but failure ignored because 'force' in effect

So I did:

cpan> o conf make_install_make_command 'sudo make'

followed by

cpan> o conf commit

and started over. I got the same number of errors as last time (so I decided not to force install this time). Do you have any suggestions?

63 tests and 305 subtests skipped.
Failed 11/329 test scripts. 981/17708 subtests failed.
Files=329, Tests=17708, 127 wallclock secs (104.69 cusr + 12.51 csys = 117.20 CPU)
Failed 11/329 test programs. 981/17708 subtests failed.
  CJFIELDS/BioPerl-1.6.1.tar.gz
  ./Build test -- NOT OK
//hint// to see the cpan-testers results for installing this module, try:
  reports CJFIELDS/BioPerl-1.6.1.tar.gz
Warning (usually harmless): 'YAML' not installed, will not store persistent state
Running Build install
  make test had returned bad status, won't install without force
Failed during this command:
 CMUNGALL/Data-Stag-0.11.tar.gz : make NO
 FLORA/ExtUtils-Manifest-1.60.tar.gz : make NO
 CJFIELDS/BioPerl-1.6.1.tar.gz : make_test NO

Thanks a lot for your time and help. I appreciate it.

Thank you,
Christel

From casaburi at ceinge.unina.it Tue Nov 15 04:25:25 2011
From: casaburi at ceinge.unina.it (Giorgio C)
Date: Tue, 15 Nov 2011 01:25:25 -0800 (PST)
Subject: [Bioperl-l] Blast > parsing result in Exel
Message-ID: <32846407.post@talk.nabble.com>

Hi everybody,

I have a BLAST (-m 1) result file that looks like this:

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Query= 132-291
         (59 letters)

Database: Scrivania/orchidea/mature_mirBase.fa
           21,643 sequences; 470,608 total letters

Searching..................................................done

                                                          Score    E
Sequences producing significant alignments:               (bits) Value

mtr-miR2644b MIMAT0013413 Medicago truncatula miR2644b       28   0.031
mtr-miR2644a MIMAT0013412 Medicago truncatula miR2644a       28   0.031
gga-miR-1704 MIMAT0007596 Gallus gallus miR-1704             22   1.9
gga-miR-1557 MIMAT0007414 Gallus gallus miR-1557             22   1.9
mmu-miR-880-5p MIMAT0017266 Mus musculus miR-880-5p          22   1.9

132_0 8 cagccgctcagattgatggtgcctacagccttgccagcccgctcagattgat 59
12631 5 .............. 18
12630 5 .............. 18
7826 5 ........... 15
7644 19 ........... 9
5394 3 ........... 13
5394 3 ........... 13

BLASTN 2.2.21 [Jun-14-2009]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, ...
....
..........
______________________________________________________________

I need to parse this into an Excel sheet with these columns:

1) ID 2) Name of the hit 3) E-value 4) Score 5) Species

For example:

1) 132-291 2) mir2644b 3) 0,031 4) 28 5) Medicago truncatula

Is it possible, from a big BLAST result file, to obtain an Excel sheet with 5 columns where each row holds the first hit of a BLAST result? Can anyone help me with this problem? Even a little Perl script would do.

Thank you very much
--
View this message in context: http://old.nabble.com/Blast-%3E-parsing-result-in-Exel-tp32846407p32846407.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From nisa.dar10 at gmail.com Tue Nov 15 19:49:00 2011
From: nisa.dar10 at gmail.com (nisa.dar)
Date: Tue, 15 Nov 2011 16:49:00 -0800 (PST)
Subject: [Bioperl-l] print alignment from blast results file
Message-ID: <32851673.post@talk.nabble.com>

Hi,

I am parsing a blast results file. I have found bioperl modules to get the query string, homology string and hit string for each hit/hsp. I want to print them in the form of an alignment instead of aligning them individually.

This is what I am doing, but it doesn't seem correct:

while (my $hsp = $hit->next_hsp) {
    my $start_query_num=$hsp->start('query');
    my $query_string=$hsp->query_string;
    my $end_query_num=$hsp->end('query');
    my $homology_string=$hsp->homology_string;
    my $start_hit_num=$hsp->start('hit');
    my $hit_string=$hsp->hit_string;
    my $end_hit_num=$hsp->end('hit');
    my $aln_o = $hsp->get_aln;
    $query_string=~s/\n//g;#get rid of new line characters
    $homology_string=~s/\n//g;
    $hit_string=~s/\n//g;

    print "\n\nAlignment:\n\n\n";
    print "$start_query_num-$query_string-$end_query_num\n";
    print "         $homology_string\n";
    print "$start_hit_num-$hit_string-$end_hit_num\n\n";
}

Please let me know how I can print them in the form of an alignment, as seen in the blast results file.

Thanks
--
View this message in context: http://old.nabble.com/print-alignment-from-blast-results-file-tp32851673p32851673.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From p.j.a.cock at googlemail.com Wed Nov 16 04:11:40 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 16 Nov 2011 09:11:40 +0000
Subject: [Bioperl-l] Blast > parsing result in Exel
In-Reply-To: <32846407.post@talk.nabble.com>
References: <32846407.post@talk.nabble.com>
Message-ID:

On Tue, Nov 15, 2011 at 9:25 AM, Giorgio C wrote:
>
> Hy everybody,
>
> in this situation froma blast (-m 1) result file :
>
> ...
>
> I need to parse in an exel sheet :
>
> 1)ID 2)Name of the hit 3)E-value 4)Score 5)Species
>
>
> 1) 132-291 2)mir2644b 3) 0,031 4)28 5) Medicago truncatula
>
> Is possible from a big blast result file obtain an exel with 5 columns where
> every field is the first hit of the blast result. Can anyone halp me to fix
> this problem ??? Also with a little script in perl.
>
> Thank you very much

Have you looked at any of the BioPerl BLAST parsing examples? e.g.

http://www.bioperl.org/wiki/HOWTO:Beginners#BLAST
http://www.bioperl.org/wiki/HOWTO:SearchIO
http://www.bioperl.org/wiki/Module:Bio::SearchIO

See also http://seqanswers.com/forums/showthread.php?t=15489

Peter

From bosborne11 at verizon.net Wed Nov 16 08:19:33 2011
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 16 Nov 2011 08:19:33 -0500
Subject: [Bioperl-l] print alignment from blast results file
In-Reply-To: <32851673.post@talk.nabble.com>
References: <32851673.post@talk.nabble.com>
Message-ID: <035404FE-A7D9-413A-B6EF-48564FE76426@verizon.net>

Nisa,

See:

http://www.bioperl.org/wiki/HOWTO:SearchIO

Brian O.

On Nov 15, 2011, at 7:49 PM, nisa.dar wrote:

>
> Hi,
>
> I am parsing a blast results file.
> I have found bioperl modules to get query
> string, homology string and hit string for each hit/hsp. I want to print
> them in the form of an alignment instead of aligning them individually.
>
> ...

From cjfields at illinois.edu Wed Nov 16 11:44:27 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 16 Nov 2011 16:44:27 +0000
Subject: [Bioperl-l] Bioperl installation help
In-Reply-To:
References:
Message-ID: <72481F31-3ADB-4E3D-9DBC-714FBEC730E4@illinois.edu>

For some reason you are trying to install an older version of BioPerl; try installing Bio::Perl (or one of the core modules). This should automatically install the latest version from CPAN. My guess is this will address some of the issues. However, w/o actually seeing which tests failed we can't help.

Also, if you are only interested in running local jobs, install BioPerl locally, or just grab the dist and add it to PERL5LIB. There are instructions in the installation docs for that. You can also use cpanm (cpanminus) to install things locally as well; it's highly recommended and much easier than cpan.

chris

On Nov 14, 2011, at 7:39 PM, Christel Chehoud wrote:

> Dear BioPerl,
> Thank you for creating such useful code. Unfortunately, every time I
> try to install Bioperl, it takes me a long time and is a challenging
> ordeal :( I am a new MAC user and was not able to download bioperl
> using CPAN. Here is the error I am getting:
>
> ...

From cjfields at illinois.edu Wed Nov 16 11:46:16 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 16 Nov 2011 16:46:16 +0000
Subject: [Bioperl-l] print alignment from blast results file
In-Reply-To: <035404FE-A7D9-413A-B6EF-48564FE76426@verizon.net>
References: <32851673.post@talk.nabble.com> <035404FE-A7D9-413A-B6EF-48564FE76426@verizon.net>
Message-ID:

small hint: you can get a Bio::Align::AlignI object from the HSP (which can then be passed to a Bio::AlignIO instance for writing).

chris

On Nov 16, 2011, at 7:19 AM, Brian Osborne wrote:

> Nisa,
>
> See:
>
> http://www.bioperl.org/wiki/HOWTO:SearchIO
>
> Brian O.
>
> ...

From David.Messina at sbc.su.se Wed Nov 16 12:01:49 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 16 Nov 2011 18:01:49 +0100
Subject: [Bioperl-l] Bioperl installation help
In-Reply-To:
References:
Message-ID:

Hi Christel,

Sorry to hear you're having trouble with the installation.

It looks like these modules aren't getting installed and are causing the failed tests:

 CMUNGALL/Data-Stag-0.11.tar.gz : make NO
 FLORA/ExtUtils-Manifest-1.60.tar.gz : make NO

I would try installing those separately via CPAN first and then trying again to install BioPerl.

Also, it was a good idea to set the make_install_make_command option in CPAN, and that should have worked. Unfortunately, there's another installation system called Module::Build that has its own option which may need to be set:

cpan> o conf mbuild_install_build_command 'sudo ./Build'

That being said, I would suggest you grab the latest version of BioPerl from github instead of using v1.6.1 from CPAN, which is fairly out of date at this point.

And unless you're planning to use BioPerl with GBrowse or Bio::Graphics, there's another, simpler way to get BioPerl up and running (assuming you have all the prerequisites like Data::Stag installed):

See "Don't want to install BioPerl?"
here: http://www.seqxml.org/xml/BioPerl.html Best, Dave On Tue, Nov 15, 2011 at 02:39, Christel Chehoud wrote: > Dear BioPerl, > Thank you for creating such useful code. Unfortunately, every time I > try to install Bioperl, it takes me a long time and is a challenging > ordeal :( I am a new MAC user and was not able to download bioperl > using CPAN. Here is the error I am getting: > > ERROR: Can't create '/usr/local/bin' > Do not have write permissions on '/usr/local/bin' > !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > at /Users/Christel/.cpan/build/BioPerl-1.6.0-DiZ0rN/Bio/Root/Build.pm > line 902 > CJFIELDS/BioPerl-1.6.0.tar.gz > ./Build install -- NOT OK > ---- > You may have to su to root to install the package > (Or you may want to run something like > o conf make_install_make_command 'sudo make' > to raise your permissions.Warning (usually harmless): 'YAML' not > installed, will not store persistent state > Failed during this command: > CMUNGALL/Data-Stag-0.11.tar.gz : make NO > CJFIELDS/BioPerl-1.6.0.tar.gz : make_test FAILED but > failure ignored because 'force' in effect > > > so I did: > cpan> o conf make_install_make_command 'sudo make' > followed by > cpan> o conf commit > > and started over..I got the same number of errors as last time (so I > decided not to force install this time). do you have any suggestions: > > 63 tests and 305 subtests skipped. > Failed 11/329 test scripts. 981/17708 subtests failed. > Files=329, Tests=17708, 127 wallclock secs (104.69 cusr + 12.51 csys = > 117.20 CPU) > Failed 11/329 test programs. 981/17708 subtests failed. 
> CJFIELDS/BioPerl-1.6.1.tar.gz > ./Build test -- NOT OK > //hint// to see the cpan-testers results for installing this module, try: > reports CJFIELDS/BioPerl-1.6.1.tar.gz > Warning (usually harmless): 'YAML' not installed, will not store > persistent state > Running Build install > make test had returned bad status, won't install without force > Failed during this command: > CMUNGALL/Data-Stag-0.11.tar.gz : make NO > FLORA/ExtUtils-Manifest-1.60.tar.gz : make NO > CJFIELDS/BioPerl-1.6.1.tar.gz : make_test NO > > > Thanks a lot for your time and help. I appreciate it. > > Thank you, > Christel > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jluis.lavin at unavarra.es Wed Nov 16 13:31:46 2011 From: jluis.lavin at unavarra.es (=?ISO-8859-1?Q?Jos=E9_Luis_Lav=EDn?=) Date: Wed, 16 Nov 2011 19:31:46 +0100 Subject: [Bioperl-l] How to get Remote BLAST results in a single out In-Reply-To: References: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com> Message-ID: Thank you for your answer Jason, While answering you I figured out how to do it...sometimes you need other people's point of view to see the light. As you pointed out: "what is complicaticated is the file name right now is based on the query name." that's what I expected that could have an easy fix, the issue about the dependency between the outfile name and the query name, this is why I couldn't figure out how to change the name of the output . While reading the code to answer you, I came across the solution. I was persistent on doing it this way because I need to run BLAST remotely on my CGI, that's why I didn't pay attention to all the other options you suggested. Thank you all for your sugestions anyway. ;) Best wishes JL El 16 de noviembre de 2011 18:03, Jason Stajich escribi?: > the answer to your question is to move the line that opens a file to > outside the loop. 
what is complicaticated is the file name right now is > based on the query name. so you need to think how you want to name the > file. Since this isn't obvious to you, then I think we are suggesting you > probably need to understand programming more, and it might just be easier > to use the tools as we have suggested rather than teaching you to modify > what is just an example code. our suggestions are based on the way we'd > solve the problem so maybe you have other reasons for the direction you > want to take. > > I also think it is not efficient or logical to run > remote blast through the web protocol simply to write it back out with > bioperl since that has to parse it in and then write it out -- why not just > run the program that generates the output directly from NCBI. Or run BLAST > locally for likely more efficient running. > > Finally the bioperl writer may not 100% reproduce the blast output so if > you are planning on further parsing the output that comes out from this > script, it really doesn't seem like a good idea to launder it through > bioperl parser first. > > > > 2011/11/14 Jos? Luis Lav?n > >> Thank you very much for your answers, but due to them, I'm afraid I didn't >> explained myself good enough. >> >> I'm not looking for another tool to perform a BLAST task. I was just >> wondering if there was a way to simply change the way the module writes >> the >> outputs, so that I can get multiple searches in a single report file >> instead of having a report for each BLAST search. >> >> Maybe there's some issue I ignore, that makes you recommend the use of >> other tools instead of the Bioperl Remote BLAST module...it would be >> appreciated if you let me know about that (NCBI server problems with >> web-services or so)... 
>> >> Thank you for your answers anyway >> >> Best wishes >> >> 2011/11/14 Fields, Christopher J >> >> > Re: a BLAST+ equivalent for blastcl3, I believe there is an option for >> the >> > various 'blast*' indicating the search is to use a remote database. I >> > haven't used it, though... >> > >> > chris >> > >> > On Nov 14, 2011, at 8:07 AM, Jason Stajich wrote: >> > >> > > Please keep this on list discussions >> > > >> > > Sent from my iPhone-please excuse typos >> > > >> > > -- >> > > Jason Stajich >> > > >> > > Begin forwarded message: >> > > >> > >> From: Jos? Luis Lav?n >> > >> Date: November 14, 2011 8:04:25 AM EST >> > >> To: Jason Stajich >> > >> Subject: Re: [Bioperl-l] Fwd: How to get Remote BLAST results in a >> > single out >> > >> >> > >> Hello Jason, >> > >> >> > >> As answering your question: >> > >> >> > >> " If you want to do this within this code I guess the question is >> what >> > format you want the data in - a BLAST report or something more like a >> > table?" >> > >> >> > >> A concatenation of BLAST (default format) reports should be OK, >> since I >> > have a script to parse that kind of results. Anyway formats 1 or 2 will >> > also do the trick. >> > >> I'll be happy to get assistance on how to change the OUTFILE from "a >> > query a report" to "all queries in the same report", because I don't >> seem >> > to be able to do it myself after reading the module documentation. >> > >> >> > >> Thanks in advance >> > >> >> > >> El 14 de noviembre de 2011 12:59, Jason Stajich < >> > jason.stajich at gmail.com> escribi?: >> > >> if you want to do a bunch of BLASTs remotely on the cmdline you >> should >> > also just use the NCBI's blastcl3 tool (not sure if there is a BLAST+ >> > equivalent). This might be faster to do and easier since you need to >> learn >> > the programming part too. 
>> > >> >> > >> If you want to do this within this code I guess the question is what >> > format you want the data in - a BLAST report or something more like a >> table? >> > >> >> > >> On Nov 14, 2011, at 6:14 AM, Jos? Luis Lav?n wrote: >> > >> >> > >>> Hello everybody, >> > >>> >> > >>> I've been using "Bio::Tools::Run::RemoteBlast" for a time and it >> has >> > >>> worked fine for me. Now I need to perform a multiple BLAST search, >> but >> > this >> > >>> time I'd just like to get all the BLAST results in a single out file >> > >>> instead of having each sequence's report written individually. I've >> > read >> > >>> the documentation of the module, but due to my short >> > >>> experience/understanding on complex modules as this one seems to be >> I >> > can't >> > >>> figure out where to change the script to achieve my previously >> > mentioned >> > >>> aim. >> > >>> Here I post the script I've been using (it's basically the one >> posted >> > on >> > >>> the module cookbook). >> > >>> >> > >>> #!/c:/Perl -w >> > >>> use Bio::Tools::Run::RemoteBlast; >> > >>> use Bio::SearchIO; >> > >>> use Data::Dumper; >> > >>> >> > >>> #Here i set the parameters for blast >> > >>> print "Enter your BLAST choice (blastn, blastp, blastx, tblastn, >> > >>> tblastx):\n"; >> > >>> my $blst = ; >> > >>> my $prog = "$blst"; >> > >>> print "Enter a database to search (nr, refseq_protein, swissprot, >> pat, >> > pdb, >> > >>> env_nr):\n"; >> > >>> my $dtb = ; >> > >>> $db = "$dtb"; >> > >>> print "Enter your cutt off score (1e-n):\n"; >> > >>> my $cut = ; >> > >>> my $e_val = "$cut"; >> > >>> >> > >>> my @params = ( '-prog' => $prog, >> > >>> '-data' => $db, >> > >>> '-expect' => $e_val, >> > >>> '-readmethod' => 'SearchIO' ); >> > >>> >> > >>> my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params); >> > >>> >> > >>> >> > >>> #Select the file and make the blast. 
>> > >>> print "Enter your FASTA file:\n"; >> > >>> chomp(my $infile = ); >> > >>> my $r = $remoteBlast->submit_blast($infile); >> > >>> my $v = 1; >> > >>> >> > >>> print STDERR "waiting..." if( $v > 0 ); ######## WAIT FOR THE >> > RESULTS >> > >>> TO RETURN!!!!! >> > >>> while ( my @rids = $remoteBlast->each_rid ) { >> > >>> foreach my $rid ( @rids ) { >> > >>> my $rc = $remoteBlast->retrieve_blast($rid); >> > >>> if( !ref($rc) ) { >> > >>> if( $rc < 0 ) { >> > >>> $remoteBlast->remove_rid($rid); >> > >>> } >> > >>> print STDERR "." if ( $v > 0 ); >> > >>> sleep 5; >> > >>> } else { >> > >>> my $result = $rc->next_result(); >> > >>> #save the output >> > >>> my $filename = >> > >>> $result->query_name()."\.out";##################open SALIDA, >> > >>> '>>'."$^T"."Report"."\.out"; >> > >>> $remoteBlast->save_output($filename);############# >> > >>> $remoteBlast->remove_rid($rid); >> > >>> print "\nQuery Name: ", $result->query_name(), "\n"; >> > >>> while ( my $hit = $result->next_hit ) { >> > >>> next unless ( $v > 0); >> > >>> print "\thit name is ", $hit->name, "\n"; >> > >>> while( my $hsp = $hit->next_hsp ) { >> > >>> print "\t\tscore is ", $hsp->score, "\n"; >> > >>> } >> > >>> } >> > >>> } >> > >>> } >> > >>> } >> > >>> >> > >>> >> > >>> May any of you please explain me how to solve my question? >> > >>> >> > >>> Thanks in advence >> > >>> >> > >>> With best wishes >> > >>> >> > >>> -- >> > >>> -- >> > >>> Dr. Jos? Luis Lav?n Trueba >> > >>> >> > >>> Dpto. de Producci?n Agraria >> > >>> Grupo de Gen?tica y Microbiolog?a >> > >>> Universidad P?blica de Navarra >> > >>> 31006 Pamplona >> > >>> Navarra >> > >>> SPAIN >> > >>> >> > >>> >> > >>> >> > >>> -- >> > >>> -- >> > >>> Dr. Jos? Luis Lav?n Trueba >> > >>> >> > >>> Dpto. 
de Producción Agraria
>> > >>> Grupo de Genética y Microbiología
>> > >>> Universidad Pública de Navarra
>> > >>> 31006 Pamplona
>> > >>> Navarra
>> > >>> SPAIN
>> > >>>
>> > >>> _______________________________________________
>> > >>> Bioperl-l mailing list
>> > >>> Bioperl-l at lists.open-bio.org
>> > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> > >>
>> > >> _______________________________________________
>> > >> Bioperl-l mailing list
>> > >> Bioperl-l at lists.open-bio.org
>> > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> > >>
>> > >> --
>> > >> --
>> > >> Dr. José Luis Lavín Trueba
>> > >>
>> > >> Dpto. de Producción Agraria
>> > >> Grupo de Genética y Microbiología
>> > >> Universidad Pública de Navarra
>> > >> 31006 Pamplona
>> > >> Navarra
>> > >> SPAIN
>> > >
>> > > _______________________________________________
>> > > Bioperl-l mailing list
>> > > Bioperl-l at lists.open-bio.org
>> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> >
>> > _______________________________________________
>> > Bioperl-l mailing list
>> > Bioperl-l at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> >
>>
>> --
>> --
>> Dr. José Luis Lavín Trueba
>>
>> Dpto. de Producción Agraria
>> Grupo de Genética y Microbiología
>> Universidad Pública de Navarra
>> 31006 Pamplona
>> Navarra
>> SPAIN
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>

--
--
Dr. José Luis Lavín Trueba

Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN

From l.m.timmermans at students.uu.nl Fri Nov 18 09:15:47 2011
From: l.m.timmermans at students.uu.nl (L.M.
Timmermans) Date: Fri, 18 Nov 2011 15:15:47 +0100 Subject: [Bioperl-l] Blast > parsing result in Exel In-Reply-To: <32846407.post@talk.nabble.com> References: <32846407.post@talk.nabble.com> Message-ID: On Tue, Nov 15, 2011 at 10:25 AM, Giorgio C wrote: > I need to parse in an exel sheet : > What you're saying here is nonsense. I think you meant to say you want to output Excel. > Is possible from a big blast result file obtain an exel with 5 columns > where > every field is the first hit of the blast result. Can anyone halp me to fix > this problem ??? Also with a little script in perl. > There are a number of Perl modules on CPAN for outputting Excel. Try Excel::Writer::XLSX and Spreadsheet::WriteExcel for example. Leon From tzhu at mail.bnu.edu.cn Mon Nov 21 00:17:18 2011 From: tzhu at mail.bnu.edu.cn (Tao Zhu) Date: Mon, 21 Nov 2011 13:17:18 +0800 Subject: [Bioperl-l] Is there a "combine" method that would combine several sequence alignments to a single alignment? Message-ID: <4EC9DEDE.6030901@mail.bnu.edu.cn> I can use the "slice" method to split a single sequence alignment into several subalignments. Then is there a corresponding "combine" method to combine such subalignments back? -- Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing 100875, China Email: tzhu at mail.bnu.edu.cn From David.Messina at sbc.su.se Mon Nov 21 04:58:51 2011 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 21 Nov 2011 10:58:51 +0100 Subject: [Bioperl-l] Is there a "combine" method that would combine several sequence alignments to a single alignment? In-Reply-To: <4EC9DEDE.6030901@mail.bnu.edu.cn> References: <4EC9DEDE.6030901@mail.bnu.edu.cn> Message-ID: Hi, No, I don't believe such a method exists. Could you describe what you are wanting to do? Perhaps there is another way to do it. Dave On Mon, Nov 21, 2011 at 06:17, Tao Zhu wrote: > I can use the "slice" method to split a single sequence alignment into > several subalignments. 
Then is there a corresponding "combine" method to > combine such subalignments back? > > -- > Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing > 100875, China > Email: tzhu at mail.bnu.edu.cn > > ______________________________**_________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/**mailman/listinfo/bioperl-l > From roy.chaudhuri at gmail.com Mon Nov 21 06:41:09 2011 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Mon, 21 Nov 2011 11:41:09 +0000 Subject: [Bioperl-l] Is there a "combine" method that would combine several sequence alignments to a single alignment? In-Reply-To: References: <4EC9DEDE.6030901@mail.bnu.edu.cn> Message-ID: <4ECA38D5.8050709@gmail.com> See the cat method in Bio::Align::Utilities: http://search.cpan.org/~cjfields/BioPerl-1.6.901/Bio/Align/Utilities.pm#cat On 21/11/2011 09:58, Dave Messina wrote: > Hi, > > No, I don't believe such a method exists. Could you describe what you are > wanting to do? Perhaps there is another way to do it. > > > Dave > > > > On Mon, Nov 21, 2011 at 06:17, Tao Zhu wrote: > >> I can use the "slice" method to split a single sequence alignment into >> several subalignments. Then is there a corresponding "combine" method to >> combine such subalignments back? 
>> >> -- >> Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing >> 100875, China >> Email: tzhu at mail.bnu.edu.cn >> >> ______________________________**_________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/**mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From zntayl at gmail.com Wed Nov 16 20:07:07 2011 From: zntayl at gmail.com (Nathan Taylor) Date: Wed, 16 Nov 2011 20:07:07 -0500 Subject: [Bioperl-l] seqIO.pm Message-ID: Hello, Can SeqIO.pm convert a file of fastq reads into .phd files. Or, barring that, a file of fastas and file of quals into .phd files? Many thanks, Nathan From gregonomic at yahoo.co.nz Mon Nov 21 07:00:50 2011 From: gregonomic at yahoo.co.nz (Gregory Baillie) Date: Mon, 21 Nov 2011 04:00:50 -0800 (PST) Subject: [Bioperl-l] Is there a "combine" method that would combine several sequence alignments to a single alignment? In-Reply-To: References: <4EC9DEDE.6030901@mail.bnu.edu.cn> Message-ID: <1321876850.71978.YahooMailNeo@web112508.mail.gq1.yahoo.com> Hi. I've attached a simple script (concatenate_alignments.pl) I wrote to concatenate alignments. It can be a bit of a memory hog if you have long alignments or large numbers of sequences; otherwise you should be OK. Usage: concatenate_alignments.pl -o <... input_alignment_n> If you want to insert a string between the concatenated sequences, you can use the -j option (eg. -j '---'). Greg. ________________________________ From: Dave Messina To: Tao Zhu Cc: BioPerl Sent: Monday, 21 November 2011 7:58 PM Subject: Re: [Bioperl-l] Is there a "combine" method that would combine several sequence alignments to a single alignment? Hi, No, I don't believe such a method exists. Could you describe what you are wanting to do? Perhaps there is another way to do it. 
Dave On Mon, Nov 21, 2011 at 06:17, Tao Zhu wrote: > I can use the "slice" method to split a single sequence alignment into > several subalignments. Then is there a corresponding "combine" method to > combine such subalignments back? > > -- > Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing > 100875, China > Email: tzhu at mail.bnu.edu.cn > > ______________________________**_________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/**mailman/listinfo/bioperl-l > _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l -------------- next part -------------- A non-text attachment was scrubbed... Name: concatenate_alignments.pl Type: application/octet-stream Size: 3349 bytes Desc: not available URL: From jason.stajich at gmail.com Mon Nov 21 10:31:50 2011 From: jason.stajich at gmail.com (Jason Stajich) Date: Mon, 21 Nov 2011 10:31:50 -0500 Subject: [Bioperl-l] Is there a "combine" method that would combine several sequence alignments to a single alignment? In-Reply-To: <1321876850.71978.YahooMailNeo@web112508.mail.gq1.yahoo.com> References: <4EC9DEDE.6030901@mail.bnu.edu.cn> <1321876850.71978.YahooMailNeo@web112508.mail.gq1.yahoo.com> Message-ID: <39ECA743-8C56-4B23-8813-40EEEAB7DBB1@gmail.com> greg -- looks good - you could simplify part of the code to use the .= operator and use AlignIO to write the seqs out. This is my script to combine a directory of MSA aligned .fasaln files into a single concatenated alignment. https://github.com/hyphaltip/genome-scripts/blob/master/phylogenetics/combine_fasaln.pl On Nov 21, 2011, at 7:00 AM, Gregory Baillie wrote: > Hi. > > I've attached a simple script (concatenate_alignments.pl) I wrote to concatenate alignments. > > It can be a bit of a memory hog if you have long alignments or large numbers of sequences; otherwise you should be OK. 
> > Usage: > concatenate_alignments.pl -o <... input_alignment_n> > > > If you want to insert a string between the concatenated sequences, you can use the -j option (eg. -j '---'). > > Greg. > > > ________________________________ > From: Dave Messina > To: Tao Zhu > Cc: BioPerl > Sent: Monday, 21 November 2011 7:58 PM > Subject: Re: [Bioperl-l] Is there a "combine" method that would combine several sequence alignments to a single alignment? > > Hi, > > No, I don't believe such a method exists. Could you describe what you are > wanting to do? Perhaps there is another way to do it. > > > Dave > > > > On Mon, Nov 21, 2011 at 06:17, Tao Zhu wrote: > >> I can use the "slice" method to split a single sequence alignment into >> several subalignments. Then is there a corresponding "combine" method to >> combine such subalignments back? >> >> -- >> Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing >> 100875, China >> Email: tzhu at mail.bnu.edu.cn >> >> ______________________________**_________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/**mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l_______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From p.j.a.cock at googlemail.com Mon Nov 21 11:15:13 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 21 Nov 2011 16:15:13 +0000 Subject: [Bioperl-l] seqIO.pm In-Reply-To: References: Message-ID: On Thu, Nov 17, 2011 at 1:07 AM, Nathan Taylor wrote: > Hello, > > ? Can SeqIO.pm convert a file of fastq reads into .phd files. Or, > barring that, a file of fastas and file of quals into .phd files? > > Many thanks, > Nathan In principle that is possible (e.g. Biopython can do fastq to phd). 
Have you tried using BioPerl's SeqIO to do this? Was there an
error message?

Peter

From cjfields at illinois.edu Mon Nov 21 11:57:29 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 21 Nov 2011 16:57:29 +0000
Subject: [Bioperl-l] seqIO.pm
In-Reply-To:
References:
Message-ID: <2E075A8F-92F9-4A04-9254-EF4C07793A7C@illinois.edu>

On Nov 21, 2011, at 10:15 AM, Peter Cock wrote:

> On Thu, Nov 17, 2011 at 1:07 AM, Nathan Taylor wrote:
>> Hello,
>>
>> Can SeqIO.pm convert a file of fastq reads into .phd files. Or,
>> barring that, a file of fastas and file of quals into .phd files?
>>
>> Many thanks,
>> Nathan
>
> In principle that is possible (e.g. Biopython can do fastq to phd).
> Have you tried using BioPerl's SeqIO to do this? Was there an
> error message?
>
> Peter

This should be possible in either circumstance (FASTQ should be more straightforward); there is a Bio::SeqIO::phd for this purpose. Nathan, if you run into problems with that conversion, let us know.

chris

From rondonbio at yahoo.com.br Mon Nov 21 12:31:21 2011
From: rondonbio at yahoo.com.br (Rondon Neto)
Date: Mon, 21 Nov 2011 09:31:21 -0800 (PST)
Subject: [Bioperl-l] seqIO.pm
In-Reply-To:
References:
Message-ID: <1321896681.57079.YahooMailNeo@web130205.mail.mud.yahoo.com>

Hi! Try this script:

#!/usr/bin/perl
use warnings;
use strict;
use Bio::SeqIO;

# Require the input FASTQ file as the first argument
if (!$ARGV[0]) { die "\n### USAGE::: perl $0 [file.fastq]\n"; }

my $fastq = $ARGV[0];

# Reader for the FASTQ input
my $in = Bio::SeqIO->new( -file   => $fastq,
                          -format => 'fastq' );

# Writer for the PHD output (always written to out.phd)
my $out = Bio::SeqIO->new( -file   => ">out.phd",
                           -format => 'phd' );

# Copy every record from the reader to the writer
while (my $seq = $in->next_seq()) {
    $out->write_seq($seq);
}

exit;

Best wishes,
Rondon, a Brazilian friend.

________________________________
From: Peter Cock
To: Nathan Taylor
Cc: bioperl-l at bioperl.org
Sent: Monday, 21 November 2011 14:15
Subject: Re: [Bioperl-l] seqIO.pm

On Thu, Nov 17, 2011 at 1:07 AM, Nathan Taylor wrote:
> Hello,
>
> Can SeqIO.pm convert a file of fastq reads into .phd files. Or,
> barring that, a file of fastas and file of quals into .phd files?
>
> Many thanks,
> Nathan

In principle that is possible (e.g. Biopython can do fastq to phd).
Have you tried using BioPerl's SeqIO to do this? Was there an error message?

Peter

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From Russell.Smithies at agresearch.co.nz Mon Nov 21 15:04:01 2011
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 22 Nov 2011 09:04:01 +1300
Subject: [Bioperl-l] seqIO.pm
In-Reply-To: <1321896681.57079.YahooMailNeo@web130205.mail.mud.yahoo.com>
References: <1321896681.57079.YahooMailNeo@web130205.mail.mud.yahoo.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF340186CF1AB@exchsth.agresearch.co.nz>

Or you could use the built-in script bp_sreformat.pl

--Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Rondon Neto
> Sent: Tuesday, 22 November 2011 6:31 a.m.
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] seqIO.pm
>
> Hi! try this script:
>
> #!/usr/bin/perl
> use warnings;
> use strict;
> use Bio::SeqIO;
>
> if (!$ARGV[0]) { die "\n### USAGE::: perl $0 [file.fastq]\n"; }
>
> my $fastq = $ARGV[0];
>
> my $in = Bio::SeqIO->new( -file => $fastq,
>                           -format => 'fastq' );
>
> my $out = Bio::SeqIO->new ( -file => ">out.phd",
>                             -format=> 'phd');
>
> while (my $seq = $in->next_seq()) {
>       $out->write_seq($seq);
> }
>
> exit;
>
>
> Best wishes,
> Rondon, a brazilian friend.
>
> ________________________________
> From: Peter Cock
> To: Nathan Taylor
> Cc: bioperl-l at bioperl.org
> Sent: Monday, 21 November 2011 14:15
> Subject: Re: [Bioperl-l] seqIO.pm
>
> On Thu, Nov 17, 2011 at 1:07 AM, Nathan Taylor wrote:
> > Hello,
> >
> > Can SeqIO.pm convert a file of fastq reads into .phd files. Or,
> > barring that, a file of fastas and file of quals into .phd files?
> >
> > Many thanks,
> > Nathan
>
> In principle that is possible (e.g. Biopython can do fastq to phd).
> Have you tried using BioPerl's SeqIO to do this? Was there an error message?
>
> Peter
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

=======================================================================
Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately.
=======================================================================

From goodyearkl at gmail.com Mon Nov 21 21:23:13 2011
From: goodyearkl at gmail.com (Kylie Goodyear)
Date: Mon, 21 Nov 2011 18:23:13 -0800 (PST)
Subject: [Bioperl-l] Fasta counting script?
Message-ID: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>

Hi,
This may seem like a stupid question but I am just learning bioperl
and I am trying to figure out how to get a count of all the characters
in my FASTA file. I managed to get the number of sequences using the
following. Is there a way to tell bioperl to count the characters?

#!/usr/bin/perl -w
#Defines perl modules
#Bio::Seq deal with sequences and their features
use Bio::Seq;
#Bio::SeqIO handles reading and parsing of sequences of many different formats
use Bio::SeqIO;
#Read FASTA file
$seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format => "fasta" );
#Count how many sequences are present in file
my $count=0;
while (my $seq_obj = $seqio_obj->next_seq) {
    $count++;
}
#Display the number of sequences present
print "There are $count sequences present.\n";

From David.Messina at sbc.su.se Tue Nov 22 03:08:11 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 22 Nov 2011 09:08:11 +0100
Subject: [Bioperl-l] Fasta counting script?
In-Reply-To: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
Message-ID:

Hi Kylie,

You can use the length method for this.

my $seq_length = $seq_obj->length();

Have you taken a look at the beginner's HOWTO? There's a nice table of
sequence methods as well as lots of other good information in there.

http://www.bioperl.org/wiki/HOWTO:Beginners

Dave

On Tue, Nov 22, 2011 at 03:23, Kylie Goodyear wrote:

> Hi,
> This may seem like a stupid question but I am just learning bioperl
> and I am trying to figure out how to get a count of all the characters
> in my FASTA file. I managed to get the number of sequences using the
> following. Is there a way to tell bioperl to count the characters?
>
> #!/usr/bin/perl -w
> #Defines perl modules
> #Bio::Seq deal with sequences and their features
> use Bio::Seq;
> #Bio::SeqIO handles reading and parsing of sequences of many different formats
> use Bio::SeqIO;
> #Read FASTA file
> $seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format => "fasta" );
> #Count how many sequences are present in file
> my $count=0;
> while (my $seq_obj = $seqio_obj->next_seq) {
>     $count++;
> }
> #Display the number of sequences present
> print "There are $count sequences present.\n";
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From liam.elbourne at mq.edu.au Mon Nov 21 23:11:12 2011
From: liam.elbourne at mq.edu.au (Liam Elbourne)
Date: Tue, 22 Nov 2011 15:11:12 +1100
Subject: [Bioperl-l] Fasta counting script?
In-Reply-To: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
Message-ID:

Hi Kylie,

I think the length() method is what you're after:

....
my $sequence_length = $seq_obj->length();
....

in your case. Have a look at:

HOWTO:SeqIO - BioPerl

and,

HOWTO:Beginners - BioPerl

for some more general stuff.

Regards,
Liam.

On 22/11/2011, at 1:23 PM, Kylie Goodyear wrote:

> Hi,
> This may seem like a stupid question but I am just learning bioperl
> and I am trying to figure out how to get a count of all the characters
> in my FASTA file. I managed to get the number of sequences using the
> following. Is there a way to tell bioperl to count the characters?
> > #!/usr/bin/perl -w > #Defines perl modules > #Bio::Seq deal with sequences and their features > use Bio::Seq; > #Bio::SeqIO handles reading and parsing of sequences of many different > formats > use Bio::SeqIO; > #Read FASTA file > $seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format > => "fasta" ); > #Count how many sequences are present in file > my $count=0; > while (my $seq_obj = $seqio_obj->next_seq) { > $count++; > } > #Display the number of sequences present > print "There are $count sequences present.\n"; > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 195 bytes Desc: Message signed with OpenPGP using GPGMail URL: From goodyearkl at gmail.com Tue Nov 22 08:00:55 2011 From: goodyearkl at gmail.com (Kylie Goodyear) Date: Tue, 22 Nov 2011 05:00:55 -0800 (PST) Subject: [Bioperl-l] Fasta counting script? In-Reply-To: References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com> Message-ID: <273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com> Thank you for your help. It keeps telling me that it can't find "length" do you think it has to do with the way I am coding it? 

#!/usr/bin/perl -w
#Defines perl modules
#Bio::Seq deal with sequences and their features
use Bio::Seq;
#Bio::SeqIO handles reading and parsing of sequences of many different formats
use Bio::SeqIO;
#Read FASTA file
$seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format => "fasta" );
#Count how many sequences are present in file
my $countseq=0;
while (my $seq_obj = $seqio_obj->next_seq, ) {
    $countseq++;
}
#Display the number of sequences present
print "There are $countseq sequences present.\n";
#Count number of charcaters in file
my $seq_length = $seq_obj->length ;
print $seq_length

On Nov 22, 5:08 am, Dave Messina wrote:
> Hi Kylie,
>
> You can use the length method for this.
>
> my $seq_length = $seq_obj->length();
>
> Have you taken a look at the beginner's HOWTO? There's a nice table of
> sequence methods as well lots of other good information in there.
>
> http://www.bioperl.org/wiki/HOWTO:Beginners
>
> Dave
>
> On Tue, Nov 22, 2011 at 03:23, Kylie Goodyear wrote:
> > Hi,
> > This may seem like a stupid question but I am just learning bioperl
> > and I am trying to figure out how to get a count of all the characters
> > in my FASTA file. I manged to get the number of sequences using the
> > following. Is there a way to tell bioperl to count the characters?
>
> > #!/usr/bin/perl -w
> > #Defines perl modules
> > #Bio::Seq deal with sequences and their features
> > use Bio::Seq;
> > #Bio::SeqIO handles reading and parsing of sequences of many different
> > formats
> > use Bio::SeqIO;
> > #Read FASTA file
> > $seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format
> > => "fasta" );
> > #Count how many sequences are present in file
> > my $count=0;
> > while (my $seq_obj = $seqio_obj->next_seq) {
> >    $count++;
> > }
> > #Display the number of sequences present
> > print "There are $count sequences present.\n";
>
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioper...
at lists.open-bio.org > >http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioper... at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l From roy.chaudhuri at gmail.com Tue Nov 22 10:50:31 2011 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Tue, 22 Nov 2011 15:50:31 +0000 Subject: [Bioperl-l] Fasta counting script? In-Reply-To: <273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com> References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com> <273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com> Message-ID: <4ECBC4C7.10401@gmail.com> Hi Kylie, I suspect the error you get is actually "Can't call method length on an undefined value" (please in future report the exact text of any error messages). You declare $seq_obj with "my" in the while loop, but then try to access it outside of the loop. Try printing out the length of each $seq_obj within the while loop. You should always include "use strict;" at the top of your program, that helps to catch errors like this. Cheers, Roy. On 22/11/2011 13:00, Kylie Goodyear wrote: > Thank you for your help. It keeps telling me that it can't find > "length" do you think it has to do with the way I am coding it? 
> > #!/usr/bin/perl -w > #Defines perl modules > > #Bio::Seq deal with sequences and their features > use Bio::Seq; > > #Bio::SeqIO handles reading and parsing of sequences of many different > formats > use Bio::SeqIO; > > > #Read FASTA file > $seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format > => "fasta" ); > > > #Count how many sequences are present in file > my $countseq=0; > while (my $seq_obj = $seqio_obj->next_seq, ) { > $countseq++; > } > #Display the number of sequences present > print "There are $countseq sequences present.\n"; > > #Count number of charcaters in file > my $seq_length = $seq_obj->length ; > print $seq_length > > > On Nov 22, 5:08 am, Dave Messina wrote: >> Hi Kylie, >> >> You can use the length method for this. >> >> my $seq_length = $seq_obj->length(); >> >> Have you taken a look at the beginner's HOWTO? There's a nice table of >> sequence methods as well lots of other good information in there. >> >> http://www.bioperl.org/wiki/HOWTO:Beginners >> >> Dave >> >> >> >> >> >> >> >> >> >> On Tue, Nov 22, 2011 at 03:23, Kylie Goodyear wrote: >>> Hi, >>> This may seem like a stupid question but I am just learning bioperl >>> and I am trying to figure out how to get a count of all the characters >>> in my FASTA file. I manged to get the number of sequences using the >>> following. Is there a way to tell bioperl to count the characters? 
>> >>> #!/usr/bin/perl -w >>> #Defines perl modules >>> #Bio::Seq deal with sequences and their features >>> use Bio::Seq; >>> #Bio::SeqIO handles reading and parsing of sequences of many different >>> formats >>> use Bio::SeqIO; >>> #Read FASTA file >>> $seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format >>> => "fasta" ); >>> #Count how many sequences are present in file >>> my $count=0; >>> while (my $seq_obj = $seqio_obj->next_seq) { >>> $count++; >>> } >>> #Display the number of sequences present >>> print "There are $count sequences present.\n"; >> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioper... at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioper... at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Nov 22 11:13:01 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 22 Nov 2011 16:13:01 +0000 Subject: [Bioperl-l] Fasta counting script? In-Reply-To: <273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com> References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com> <273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com> Message-ID: <0507832F-51F3-4946-BDCD-6913BBE697BC@illinois.edu> This sounds a little homework-y. Sure this isn't for a class? :) One clue (and a good thing to keep in mind): always 'use strict; use warnings;' with your scripts if you are new to perl. Doing so would let you know there is a problem with the script the way it is written, specifically, the place where you are inquiring about the length. chris On Nov 22, 2011, at 7:00 AM, Kylie Goodyear wrote: > Thank you for your help. 
It keeps telling me that it can't find > "length" do you think it has to do with the way I am coding it? > > #!/usr/bin/perl -w > #Defines perl modules > > #Bio::Seq deal with sequences and their features > use Bio::Seq; > > #Bio::SeqIO handles reading and parsing of sequences of many different > formats > use Bio::SeqIO; > > > #Read FASTA file > $seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format > => "fasta" ); > > > #Count how many sequences are present in file > my $countseq=0; > while (my $seq_obj = $seqio_obj->next_seq, ) { > $countseq++; > } > #Display the number of sequences present > print "There are $countseq sequences present.\n"; > > #Count number of charcaters in file > my $seq_length = $seq_obj->length ; > print $seq_length > > > On Nov 22, 5:08 am, Dave Messina wrote: >> Hi Kylie, >> >> You can use the length method for this. >> >> my $seq_length = $seq_obj->length(); >> >> Have you taken a look at the beginner's HOWTO? There's a nice table of >> sequence methods as well lots of other good information in there. >> >> http://www.bioperl.org/wiki/HOWTO:Beginners >> >> Dave >> >> >> >> >> >> >> >> >> >> On Tue, Nov 22, 2011 at 03:23, Kylie Goodyear wrote: >>> Hi, >>> This may seem like a stupid question but I am just learning bioperl >>> and I am trying to figure out how to get a count of all the characters >>> in my FASTA file. I manged to get the number of sequences using the >>> following. Is there a way to tell bioperl to count the characters? 
>> >>> #!/usr/bin/perl -w >>> #Defines perl modules >>> #Bio::Seq deal with sequences and their features >>> use Bio::Seq; >>> #Bio::SeqIO handles reading and parsing of sequences of many different >>> formats >>> use Bio::SeqIO; >>> #Read FASTA file >>> $seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format >>> => "fasta" ); >>> #Count how many sequences are present in file >>> my $count=0; >>> while (my $seq_obj = $seqio_obj->next_seq) { >>> $count++; >>> } >>> #Display the number of sequences present >>> print "There are $count sequences present.\n"; >> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioper... at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioper... at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Russell.Smithies at agresearch.co.nz Tue Nov 22 15:47:36 2011 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Wed, 23 Nov 2011 09:47:36 +1300 Subject: [Bioperl-l] Fasta counting script? In-Reply-To: <0507832F-51F3-4946-BDCD-6913BBE697BC@illinois.edu> References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com> <273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com> <0507832F-51F3-4946-BDCD-6913BBE697BC@illinois.edu> Message-ID: <18DF7D20DFEC044098A1062202F5FFF340186CF1B8@exchsth.agresearch.co.nz> Or again, you could use the builtin scripts bp_seq_length.pl or bp_gccalc.pl As previous posters have hinted, RTFM - the answers are all in there! 
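
For anyone finding this thread in the archive, the suggestions above can be put together roughly as follows. This is only a minimal sketch: it assumes that "all the characters" means sequence residues (header lines are not counted), and count_fasta is a name made up for this example, not something BioPerl provides. The point is that length() is called inside the while loop, where $seq_obj is still in scope, and that 'use strict; use warnings;' are switched on.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Bio::SeqIO;

# Hypothetical helper (not part of BioPerl): count the sequences in a
# FASTA file and add up their lengths.
# Returns a two-element list: (number of sequences, total length).
sub count_fasta {
    my ($file) = @_;

    my $seqio_obj = Bio::SeqIO->new(-file => $file, -format => 'fasta');

    my ($count, $total_length) = (0, 0);
    while (my $seq_obj = $seqio_obj->next_seq) {
        # $seq_obj only exists inside this loop, so length() is called here
        $count++;
        $total_length += $seq_obj->length;
    }
    return ($count, $total_length);
}

# Report the counts only when a file name is given on the command line
if (@ARGV) {
    my ($count, $total_length) = count_fasta($ARGV[0]);
    print "There are $count sequences present.\n";
    print "They contain $total_length characters in total.\n";
}
```

Saved as, say, count_fasta.pl, it would be run as 'perl count_fasta.pl DNA_sequences.fasta'. To print the length of each sequence instead of the total, print $seq_obj->length inside the loop.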
--Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Fields, Christopher J > Sent: Wednesday, 23 November 2011 5:13 a.m. > To: Kylie Goodyear > Cc: > Subject: Re: [Bioperl-l] Fasta counting script? > > This sounds a little homework-y. Sure this isn't for a class? :) > > One clue (and a good thing to keep in mind): always 'use strict; use warnings;' > with your scripts if you are new to perl. Doing so would let you know there is > a problem with the script the way it is written, specifically, the place where > you are inquiring about the length. > > chris > > On Nov 22, 2011, at 7:00 AM, Kylie Goodyear wrote: > > > Thank you for your help. It keeps telling me that it can't find > > "length" do you think it has to do with the way I am coding it? > > > > #!/usr/bin/perl -w > > #Defines perl modules > > > > #Bio::Seq deal with sequences and their features use Bio::Seq; > > > > #Bio::SeqIO handles reading and parsing of sequences of many different > > formats use Bio::SeqIO; > > > > > > #Read FASTA file > > $seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format > > => "fasta" ); > > > > > > #Count how many sequences are present in file my $countseq=0; while > > (my $seq_obj = $seqio_obj->next_seq, ) { > > $countseq++; > > } > > #Display the number of sequences present print "There are $countseq > > sequences present.\n"; > > > > #Count number of charcaters in file > > my $seq_length = $seq_obj->length ; > > print $seq_length > > > > > > On Nov 22, 5:08 am, Dave Messina wrote: > >> Hi Kylie, > >> > >> You can use the length method for this. > >> > >> my $seq_length = $seq_obj->length(); > >> > >> Have you taken a look at the beginner's HOWTO? There's a nice table > >> of sequence methods as well lots of other good information in there. 
> >> > >> http://www.bioperl.org/wiki/HOWTO:Beginners > >> > >> Dave > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> On Tue, Nov 22, 2011 at 03:23, Kylie Goodyear > wrote: > >>> Hi, > >>> This may seem like a stupid question but I am just learning bioperl > >>> and I am trying to figure out how to get a count of all the > >>> characters in my FASTA file. I manged to get the number of sequences > >>> using the following. Is there a way to tell bioperl to count the characters? > >> > >>> #!/usr/bin/perl -w > >>> #Defines perl modules > >>> #Bio::Seq deal with sequences and their features use Bio::Seq; > >>> #Bio::SeqIO handles reading and parsing of sequences of many > >>> different formats use Bio::SeqIO; #Read FASTA file $seqio_obj = > >>> Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format => "fasta" > >>> ); #Count how many sequences are present in file my $count=0; while > >>> (my $seq_obj = $seqio_obj->next_seq) { > >>> $count++; > >>> } > >>> #Display the number of sequences present print "There are $count > >>> sequences present.\n"; > >> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioper... at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioper... 
at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinf > >> o/bioperl-l > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From charles-listes+bioperl at plessy.org Wed Nov 23 05:27:45 2011 From: charles-listes+bioperl at plessy.org (Charles Plessy) Date: Wed, 23 Nov 2011 19:27:45 +0900 Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ? Message-ID: <20111123102745.GC20168@merveille.plessy.net> Dear BioPerl developers, I am trying to process some unaligned paired-end reads with Bio::DB::Sam. For each pair, I want to detect a sequence index and a unique molecular identifier in the linker, record them as auxiliary flags, and trim the linker from the read. I collect the pairs through a features iterator, and can access all their data through the high-level Bio::DB::Bam::Alignment API. After modifying them (linker trimming and adding flags), I want to write the resulting pairs as a new unaligned BAM file. 

I apologise if the solution is trivial, but my problem is that I cannot manage to modify the Bio::DB::Bam::Alignment objects. Typically, attempts such as '$pair[0]->qseq("GATACA")' give errors like 'Usage: Bio::DB::Bam::Alignment::qseq(b) at /usr/lib/perl5/Bio/DB/Bam/AlignWrapper.pm line 80'.

Since I did not find explanations or portions of source code indicating how to modify Bio::DB::Bam::Alignment objects, I wonder if they are read-only?

Have a nice day,

--
Charles Plessy
Tsurumi, Kanagawa, Japan

From MEC at stowers.org Wed Nov 23 11:02:26 2011
From: MEC at stowers.org (Cook, Malcolm)
Date: Wed, 23 Nov 2011 10:02:26 -0600
Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
In-Reply-To: <20111123102745.GC20168@merveille.plessy.net>
References: <20111123102745.GC20168@merveille.plessy.net>
Message-ID: <2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org>

Charles,

I suggest you reconsider your approach: rather, use `samtools view` to pipe your reads to stdout in SAM format, then stream-edit the barcode and pipe the result back to samtools for conversion back to a .bam file.

I know this is not what you're asking. I'm pretty sure that the direct answer to your question is, "yes - they are read-only".

~Malcolm

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Charles Plessy
> Sent: Wednesday, November 23, 2011 4:28 AM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
>
> Dear BioPerl developers,
>
> I am trying to process some unaligned paired-end reads with Bio::DB::Sam.
> For
> each pair, I want to detect a sequence index and a unique molecular
> identifier in
> the linker, record them as auxiliary flags, and trim the linker from the read.
>
> I collect the pairs through a features iterator, and can access all their data
> through the high-level Bio::DB::Bam::Alignment API.
After modifying them
> (linker trimming and adding flags), I want to write the resulting pairs as a
> new unaligned BAM file.
>
> I apologise if the solution is trivial, but my problem is that I do not manage to
> modify the Bio::DB::Bam::Alignment objects. Typically, attempts such as
> '$pair[0]->qseq("GATACA")' give errors like
> 'Usage: Bio::DB::Bam::Alignment::qseq(b) at
> /usr/lib/perl5/Bio/DB/Bam/AlignWrapper.pm line 80'.
>
> Since I did not find explanations or portions of source code indicating how to
> modify Bio::DB::Bam::Alignment objects, I wonder if they are read-only?
>
> Have a nice day,
>
> --
> Charles Plessy
> Tsurumi, Kanagawa, Japan
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Wed Nov 23 14:26:31 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 23 Nov 2011 19:26:31 +0000
Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
In-Reply-To: <2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org>
References: <20111123102745.GC20168@merveille.plessy.net> <2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org>
Message-ID:

According to the docs for the low-level API of Bio-Samtools, both read and write are allowed:

http://search.cpan.org/perldoc?Bio::DB::Sam#The_low-level_API

Using the low-level API for this purpose isn't documented as well, though (the high-level API is read-only AFAICT).

The error message is a standard one generated from the XS bindings when the passed argument isn't mapped correctly. Looking through the Sam.xs file, qseq() is only prototyped as a reader; the only arg is a Bio::DB::Bam::Alignment (i.e. $self). However, it appears there is a function specified for Bio::DB::Bam::Alignment named l_qseq() that might be the setter, whereas qseq() may be the getter (ignore the 'bama_' prefix):

....

int
bama_l_qseq(b,...)
    Bio::DB::Bam::Alignment b
PROTOTYPE: $;$
CODE:
    if (items > 1)
        b->core.l_qseq = SvIV(ST(1));
    RETVAL=b->core.l_qseq;
OUTPUT:
    RETVAL

SV*
bama_qseq(b)
    Bio::DB::Bam::Alignment b
PROTOTYPE: $
PREINIT:
    char* seq;
    int i;
CODE:
    seq = Newxz(seq,b->core.l_qseq+1,char);
    for (i=0;i<b->core.l_qseq;i++) {
        seq[i]=bam_nt16_rev_table[bam1_seqi(bam1_seq(b),i)];
    }
    RETVAL = newSVpv(seq,b->core.l_qseq);
    Safefree(seq);
OUTPUT:
    RETVAL

-chris

On Nov 23, 2011, at 10:02 AM, Cook, Malcolm wrote:

> Charles,
>
> I suggest you reconsider your approach to rather, use `samtools view` to pipe your reads to stdout in sam format, then stream edit the barcode and pipe it back to samtools for conversion back to .bam file.
>
> I know this is not what you're asking. I'm pretty sure that direct answer to your question is, "yes - they are read-only".
>
> ~Malcolm
>
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Charles Plessy
>> Sent: Wednesday, November 23, 2011 4:28 AM
>> To: bioperl-l at bioperl.org
>> Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
>>
>> Dear BioPerl developers,
>>
>> I am trying to process some unaligned paired-end reads with Bio::DB::Sam.
>> For
>> each pair, I want to detect a sequence index and a unique molecular
>> identifier in
>> the linker, record them as auxiliary flags, and trim the linker from the read.
>>
>> I collect the pairs through a features iterator, and can access all their data
>> through the high-level Bio::DB::Bam::Alignment API. After modifying them
>> (linker trimming and adding flags), I want to write the resulting pairs as a
>> new unaligned BAM file.
>>
>> I apologise if the solution is trivial, but my problem is that I do not manage to
>> modify the Bio::DB::Bam::Alignment objects. Typically, attempts such as
>> '$pair[0]->qseq("GATACA")'
give errors like >> ?Usage: Bio::DB::Bam::Alignment::qseq(b) at >> /usr/lib/perl5/Bio/DB/Bam/AlignWrapper.pm line 80?. >> >> Since I did not find explanations or portsions of source code indicating how to >> modify Bio::DB::Bam::Alignment objects, I wonder if they are read-only? >> >> Have a nice day, >> >> -- >> Charles Plessy >> Tsurumi, Kanagawa, Japan >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From lincoln.stein at gmail.com Wed Nov 23 17:02:23 2011 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Thu, 24 Nov 2011 06:02:23 +0800 Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ? In-Reply-To: <20111123102745.GC20168@merveille.plessy.net> References: <20111123102745.GC20168@merveille.plessy.net> Message-ID: I apologize that the qseq() method is only allowing read-only access. I will attempt to fix this. Lincoln On Wed, Nov 23, 2011 at 6:27 PM, Charles Plessy < charles-listes+bioperl at plessy.org> wrote: > Dear BioPerl developers, > > I am trying to process some unaligned paired-end reads with Bio::DB::Sam. > For > each pair, I want to detect a sequence index and a unique molecular > identifier in > the linker, record them as auxiliary flags, and trim the linker from the > read. > > I collect the pairs through a features iterator, and can access all their > data > through the high-level Bio::DB::Bam::Alignment API. After modifying them > (linker trimming and adding flags), I want to write the resulting pairs as > a > new unaligned BAM file. > > I apologise if the solution is trivial, but my problem is that I do not > manage to > modify the Bio::DB::Bam::Alignment objects. Typically, attempts such as > ?$pair[0]->qseq("GATACA")? 
give errors like > ?Usage: Bio::DB::Bam::Alignment::qseq(b) at > /usr/lib/perl5/Bio/DB/Bam/AlignWrapper.pm line 80?. > > Since I did not find explanations or portsions of source code indicating > how to > modify Bio::DB::Bam::Alignment objects, I wonder if they are read-only? > > Have a nice day, > > -- > Charles Plessy > Tsurumi, Kanagawa, Japan > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From lincoln.stein at gmail.com Wed Nov 23 17:05:41 2011 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Thu, 24 Nov 2011 06:05:41 +0800 Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ? In-Reply-To: References: <20111123102745.GC20168@merveille.plessy.net> <2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org> Message-ID: Unfortunately l_qseq read/writes the length of the query sequence, not the sequence itself. Lincoln On Thu, Nov 24, 2011 at 3:26 AM, Fields, Christopher J < cjfields at illinois.edu> wrote: > According to the docs the low-level API for Bio-Samtools, both read and > write are allowed: > > http://search.cpan.org/perldoc?Bio::DB::Sam#The_low-level_API > > Using the low-level API for this purpose isn't documented as well, though > (the high-level API is read only AFAICT). > > The error message is a standard one generated from the XS bindings where > the passed argument passed isn't mapped correctly. Looking through the > Sam.xs file, qseq() is only prototyped as a reader; the only arg is a > Bio::DB::Bam::Alignment (e.g. $self). 
However, it appears there is a > function specified for Bio::DB::Bam::Alignment names l_qseq() that might be > the setter, wheras qseq() is maybe to be the getter (ignore the 'bama_' > prefix): > > .... > > int > bama_l_qseq(b,...) > Bio::DB::Bam::Alignment b > PROTOTYPE: $;$ > CODE: > if (items > 1) > b->core.l_qseq = SvIV(ST(1)); > RETVAL=b->core.l_qseq; > OUTPUT: > RETVAL > > SV* > bama_qseq(b) > Bio::DB::Bam::Alignment b > PROTOTYPE: $ > PREINIT: > char* seq; > int i; > CODE: > seq = Newxz(seq,b->core.l_qseq+1,char); > for (i=0;icore.l_qseq;i++) { > seq[i]=bam_nt16_rev_table[bam1_seqi(bam1_seq(b),i)]; > } > RETVAL = newSVpv(seq,b->core.l_qseq); > Safefree(seq); > OUTPUT: > RETVAL > > > -chris > > On Nov 23, 2011, at 10:02 AM, Cook, Malcolm wrote: > > > Charles, > > > > I suggest you reconsider your approach to rather, use `samtools view` to > pipe your reads to stdout in sam format, then stream edit the barcode and > pipe it back to samtools for conversion back to .bam file. > > > > I know this is not what you're asking. I'm pretty sure that direct > answer to your question is, "yes - they are read-only". > > > > ~Malcolm > > > > > >> -----Original Message----- > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >> bounces at lists.open-bio.org] On Behalf Of Charles Plessy > >> Sent: Wednesday, November 23, 2011 4:28 AM > >> To: bioperl-l at bioperl.org > >> Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ? > >> > >> Dear BioPerl developers, > >> > >> I am trying to process some unaligned paired-end reads with > Bio::DB::Sam. > >> For > >> each pair, I want to detect a sequence index and a unique molecular > >> identifier in > >> the linker, record them as auxiliary flags, and trim the linker from > the read. > >> > >> I collect the pairs through a features iterator, and can access all > their data > >> through the high-level Bio::DB::Bam::Alignment API. 
After modifying > them > >> (linker trimming and adding flags), I want to write the resulting pairs > as a > >> new unaligned BAM file. > >> > >> I apologise if the solution is trivial, but my problem is that I do not > manage to > >> modify the Bio::DB::Bam::Alignment objects. Typically, attempts such as > >> ?$pair[0]->qseq("GATACA")? give errors like > >> ?Usage: Bio::DB::Bam::Alignment::qseq(b) at > >> /usr/lib/perl5/Bio/DB/Bam/AlignWrapper.pm line 80?. > >> > >> Since I did not find explanations or portsions of source code > indicating how to > >> modify Bio::DB::Bam::Alignment objects, I wonder if they are read-only? > >> > >> Have a nice day, > >> > >> -- > >> Charles Plessy > >> Tsurumi, Kanagawa, Japan > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From cjfields at illinois.edu Wed Nov 23 20:07:09 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 24 Nov 2011 01:07:09 +0000 Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ? In-Reply-To: References: <20111123102745.GC20168@merveille.plessy.net> <2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org> , Message-ID: <92CA8F24-47CB-42AF-8C20-9C4765A592A5@illinois.edu> Ah, okay, makes sense. I thought it was oddly named. 
:) Chris Sent from my iPad On Nov 23, 2011, at 4:05 PM, "Lincoln Stein" > wrote: Unfortunately l_qseq read/writes the length of the query sequence, not the sequence itself. Lincoln On Thu, Nov 24, 2011 at 3:26 AM, Fields, Christopher J > wrote: According to the docs the low-level API for Bio-Samtools, both read and write are allowed: http://search.cpan.org/perldoc?Bio::DB::Sam#The_low-level_API Using the low-level API for this purpose isn't documented as well, though (the high-level API is read only AFAICT). The error message is a standard one generated from the XS bindings where the passed argument passed isn't mapped correctly. Looking through the Sam.xs file, qseq() is only prototyped as a reader; the only arg is a Bio::DB::Bam::Alignment (e.g. $self). However, it appears there is a function specified for Bio::DB::Bam::Alignment names l_qseq() that might be the setter, wheras qseq() is maybe to be the getter (ignore the 'bama_' prefix): .... int bama_l_qseq(b,...) Bio::DB::Bam::Alignment b PROTOTYPE: $;$ CODE: if (items > 1) b->core.l_qseq = SvIV(ST(1)); RETVAL=b->core.l_qseq; OUTPUT: RETVAL SV* bama_qseq(b) Bio::DB::Bam::Alignment b PROTOTYPE: $ PREINIT: char* seq; int i; CODE: seq = Newxz(seq,b->core.l_qseq+1,char); for (i=0;icore.l_qseq;i++) { seq[i]=bam_nt16_rev_table[bam1_seqi(bam1_seq(b),i)]; } RETVAL = newSVpv(seq,b->core.l_qseq); Safefree(seq); OUTPUT: RETVAL -chris On Nov 23, 2011, at 10:02 AM, Cook, Malcolm wrote: > Charles, > > I suggest you reconsider your approach to rather, use `samtools view` to pipe your reads to stdout in sam format, then stream edit the barcode and pipe it back to samtools for conversion back to .bam file. > > I know this is not what you're asking. I'm pretty sure that direct answer to your question is, "yes - they are read-only". 
> > ~Malcolm > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Charles Plessy >> Sent: Wednesday, November 23, 2011 4:28 AM >> To: bioperl-l at bioperl.org >> Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ? >> >> Dear BioPerl developers, >> >> I am trying to process some unaligned paired-end reads with Bio::DB::Sam. >> For >> each pair, I want to detect a sequence index and a unique molecular >> identifier in >> the linker, record them as auxiliary flags, and trim the linker from the read. >> >> I collect the pairs through a features iterator, and can access all their data >> through the high-level Bio::DB::Bam::Alignment API. After modifying them >> (linker trimming and adding flags), I want to write the resulting pairs as a >> new unaligned BAM file. >> >> I apologise if the solution is trivial, but my problem is that I do not manage to >> modify the Bio::DB::Bam::Alignment objects. Typically, attempts such as >> ?$pair[0]->qseq("GATACA")? give errors like >> ?Usage: Bio::DB::Bam::Alignment::qseq(b) at >> /usr/lib/perl5/Bio/DB/Bam/AlignWrapper.pm line 80?. >> >> Since I did not find explanations or portsions of source code indicating how to >> modify Bio::DB::Bam::Alignment objects, I wonder if they are read-only? >> >> Have a nice day, >> >> -- >> Charles Plessy >> Tsurumi, Kanagawa, Japan >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. 
Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa > From ross at cuhk.edu.hk Sun Nov 27 03:24:43 2011 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Sun, 27 Nov 2011 16:24:43 +0800 Subject: [Bioperl-l] Check the location type for a particular gene in a Genbank file In-Reply-To: References: Message-ID: <000101ccacde$0fc4ba90$2f4e2fb0$@edu.hk> Hi all, To write a script to extract sequence generically for all types of BioLocation objects, I'd like to know if there is any way to check what types (e.g. simple or split) are being processed. Bio::Location::CoordinatePolicyI appears to be doing something similar but it is more like a post checking step. If I parse the genbank file line by line, I can certainly check whether the line contains keywords like "join" but as I'm using something like: my @features=grep{$_->primary_tag eq $chkTags[0]} $seqobj->get_SeqFeatures; foreach (@features) { $pseudo=$_->has_tag('pseudo')?'pseudo':'functional'; @gene=[]; I'd appreciate if anybody knows a better integration with the well-developed bioperl module. Thanks a lot. From Russell.Smithies at agresearch.co.nz Sun Nov 27 19:46:05 2011 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Mon, 28 Nov 2011 13:46:05 +1300 Subject: [Bioperl-l] Galaxy tools? Message-ID: <18DF7D20DFEC044098A1062202F5FFF340186CF1E3@exchsth.agresearch.co.nz> Possibly the wrong place to ask but has anyone written Galaxy tools using BioPerl? I was thinking of creating blast graphic and format converter tools as I couldn't see any already available in their toolbox. 
It looks like I can just write a Python wrapper for my existing BioPerl scripts - although I suspect the "correct" method is to use BioPython methods (but Python annoys me with its lack of semi-colons and required white-space) --Russell ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From p.j.a.cock at googlemail.com Sun Nov 27 20:28:33 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 28 Nov 2011 01:28:33 +0000 Subject: [Bioperl-l] Galaxy tools? In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF340186CF1E3@exchsth.agresearch.co.nz> References: <18DF7D20DFEC044098A1062202F5FFF340186CF1E3@exchsth.agresearch.co.nz> Message-ID: On Monday, November 28, 2011, Smithies, Russell wrote: > Possibly the wrong place to ask but has anyone written > Galaxy tools using BioPerl? > I was thinking of creating blast graphic and format converter > tools as I couldn't see any already available in their toolbox. > It looks like I can just write a Python wrapper for my existing > BioPerl scripts - although I suspect the "correct" method is to > use BioPython methods (but Python annoys me with its lack > of semi-colons and required white-space) Galaxy is agnostic about what language the tools are in, you can use a binary, shell script, Java, Perl, Python etc. 
Peter From florent.angly at gmail.com Sun Nov 27 21:09:45 2011 From: florent.angly at gmail.com (Florent Angly) Date: Mon, 28 Nov 2011 12:09:45 +1000 Subject: [Bioperl-l] Galaxy tools? In-Reply-To: References: <18DF7D20DFEC044098A1062202F5FFF340186CF1E3@exchsth.agresearch.co.nz> Message-ID: <4ED2ED69.10601@gmail.com> Hi Russell, As Peter said, the tools to be wrapped do not need to be written in Python. I have built a few wrappers for Galaxy, including one for the read simulator Grinder (http://sourceforge.net/projects/biogrinder/), which uses Bioperl and is available in the Galaxy Toolshed (http://sourceforge.net/projects/biogrinder/). It is not very hard to write a wrapper for trivial programs, but it becomes more complicated once you start having optional arguments or multiple output files. Grinder uses Getopt::Euclid (http://search.cpan.org/dist/Getopt-Euclid/) to parse command-line arguments. I have been thinking about leveraging the information that Getopt::Euclid stores about command-line arguments to automate most of the Galaxy wrapper generation, but I have not gotten to it yet. Florent On 28/11/11 11:28, Peter Cock wrote: > On Monday, November 28, 2011, Smithies, Russell wrote: >> Possibly the wrong place to ask but has anyone written >> Galaxy tools using BioPerl? >> I was thinking of creating blast graphic and format converter >> tools as I couldn't see any already available in their toolbox. >> It looks like I can just write a Python wrapper for my existing >> BioPerl scripts - although I suspect the "correct" method is to >> use BioPython methods (but Python annoys me with its lack >> of semi-colons and required white-space) > Galaxy is agnostic about what language the tools are in, > you can use a binary, shell script, Java, Perl, Python etc.
> > Peter From florent.angly at gmail.com Sun Nov 27 23:35:31 2011 From: florent.angly at gmail.com (Florent Angly) Date: Mon, 28 Nov 2011 14:35:31 +1000 Subject: [Bioperl-l] Interest in Bio::Community modules Message-ID: <4ED30F93.4000407@gmail.com> Hi all, I have been thinking about starting a set of Perl modules that would be useful for (microbial) ecologists to represent communities of organisms. At the moment, there does not seem to be anything like this in Bioperl. I am happy to make these modules available under the Bioperl umbrella using the Bio::Community::* namespace. I envision the following modules: * Bio::Community::Member module representing members of a community. * Bio::Community::IO modules to read/write files that describe community composition (a.k.a. OTU table, or site by species table) as used by programs like QIIME, Pyrotagger, GAAS, ... * Bio::Community::Tools modules to help manipulate communities, e.g. to take some members at random, normalize the community to a given number of individuals, or do rarefaction curves. The idea is to implement these modules in Moose to teach myself Moose. The members of a community could be a sequence (Bio::SeqI), a species (Bio::S), an arbitrary string or even other things. I am not quite sure if Bioperl provides facilities to attach some arbitrary information to an object. Any interest? Ideas? Comments?
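To make the "normalize the community to a given number of individuals" idea above concrete, here is a rough core-Perl sketch. It is not code from any Bio::Community module; the function name rarefy, the plain hash representation (member name to count) and the OTU names are all assumptions made for illustration.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical helper: randomly subsample a community (member name => count)
# down to $n individuals, drawing without replacement.
sub rarefy {
    my ($community, $n) = @_;
    # Expand the counts into a pool with one entry per individual.
    my @pool = map { my $m = $_; ($m) x $community->{$m} } sort keys %$community;
    die "cannot draw $n individuals from a pool of " . scalar(@pool) . "\n"
        if $n > @pool;
    my %sub;
    $sub{$_}++ for (shuffle @pool)[0 .. $n - 1];
    return \%sub;
}

# Made-up counts, only to show the call.
my %otu = (OTU_1 => 50, OTU_2 => 30, OTU_3 => 20);
my $sample = rarefy(\%otu, 10);
printf "%s\t%d\n", $_, $sample->{$_} for sort keys %$sample;
```

Expanding every individual into a list is simple but memory-hungry for large communities; a real implementation would probably sample from the counts directly.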
Thanks, Florent From cjfields at illinois.edu Mon Nov 28 14:42:12 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 28 Nov 2011 19:42:12 +0000 Subject: [Bioperl-l] Check the location type for a particular gene in a Genbank file In-Reply-To: <000101ccacde$0fc4ba90$2f4e2fb0$@edu.hk> References: <000101ccacde$0fc4ba90$2f4e2fb0$@edu.hk> Message-ID: <49363DC1-110A-49A8-B8D7-75AA624A535C@illinois.edu> Ross, The standard way is to check whether the location object is a SplitLocationI or not, see the following for an example: http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Location_Objects chris On Nov 27, 2011, at 2:24 AM, Ross KK Leung wrote: > Hi all, > > To write a script to extract sequence generically for all types of > BioLocation objects, I'd like to know if there is any way to check what > types (e.g. simple or split) are being processed. > > Bio::Location::CoordinatePolicyI appears to be doing something similar but > it is more like a post checking step. If I parse the genbank file line by > line, I can certainly check whether the line contains keywords like "join" > but as I'm using something like: > > my @features=grep{$_->primary_tag eq $chkTags[0]} > $seqobj->get_SeqFeatures; > > > foreach (@features) { > > $pseudo=$_->has_tag('pseudo')?'pseudo':'functional'; > > @gene=[]; > > I'd appreciate if anybody knows a better integration with the well-developed > bioperl module. > > Thanks a lot. 
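For readers of the archive, a minimal sketch of the check described above. It relies only on the `isa` test against Bio::Location::SplitLocationI and on the `sub_Location`, `start` and `end` methods of BioPerl location objects; the helper names location_kind and location_segments are made up here, and anything that is not a split location (including fuzzy locations) is simply labelled 'simple'.

```perl
use strict;
use warnings;

# Return 'split' for join()-style locations, 'simple' for anything else.
sub location_kind {
    my ($loc) = @_;
    return $loc->isa('Bio::Location::SplitLocationI') ? 'split' : 'simple';
}

# Return one [start, end] pair per segment of the location.
sub location_segments {
    my ($loc) = @_;
    my @parts = location_kind($loc) eq 'split' ? $loc->sub_Location : ($loc);
    return map { [ $_->start, $_->end ] } @parts;
}
```

Inside the feature loop quoted above, this would presumably be called as `location_kind($_->location)` and `location_segments($_->location)`.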
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Nov 28 14:47:10 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 28 Nov 2011 19:47:10 +0000 Subject: [Bioperl-l] Interest in Bio::Community modules In-Reply-To: <4ED30F93.4000407@gmail.com> References: <4ED30F93.4000407@gmail.com> Message-ID: <3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu> I think the idea is sound, it would be nice to have. Jason is working a bit in this area, maybe he has some additional thoughts? Would there be some redundancy with any current code (Bio::Tree, Bio::Taxon, etc)? I do think it should be developed on it's own, per our recent discussions re: slimming down core. Re: using Moose and BioPerl, from experience it's a little tricky integrating the two as BioPerl uses perl's simple OO system while Moose allows much more (though it has it's own issues, memory and speed being the top two). Specifying inheritance from a BioPerl module is a bit trickier, you'll have to wrap the module via MooseX::NonMoose. Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine. chris On Nov 27, 2011, at 10:35 PM, Florent Angly wrote: > Hi all, > > I have been thinking about starting a set of Perl modules that would useful for (microbial) ecologists to represent communities of organisms. At the moment, there does not seem to be anything like this in Bioperl. I am happy to make these modules available under the Bioperl umbrella using the Bio::Community::* namespace. > > I envision the following modules: > * Bio::Community::Member module representing members of a community. > * Bio::Community::IO modules to read/write files that describe community composition (a.k.a. 
OTU table, or site by species table) as used programs like QIIME, Pyrotagger, GAAS, ... > * Bio::Community::Tools modules to help manipulate communities, e.g. to take some members at random, normalize the community to a given number of individuals, or do rarefaction curves. > > The idea is to implement these modules in Moose to teach myself Moose. The members of a community could be a sequence (Bio::SeqI), a species (Bio::S), an arbitrary string or even other things. I am not quite sure if Bioperl provide facilities to attach some arbitrary information to an object. > > Any interest? Ideas? Comments? > > Thanks, > > Florent > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From l.m.timmermans at students.uu.nl Mon Nov 28 15:25:13 2011 From: l.m.timmermans at students.uu.nl (Leon Timmermans) Date: Mon, 28 Nov 2011 21:25:13 +0100 Subject: [Bioperl-l] Interest in Bio::Community modules In-Reply-To: References: <4ED30F93.4000407@gmail.com> Message-ID: And now to the list too, On Mon, Nov 28, 2011 at 5:35 AM, Florent Angly wrote: > The idea is to implement these modules in Moose to teach myself Moose. The > members of a community could be a sequence (Bio::SeqI), a species (Bio::S), > an arbitrary string or even other things. I am not quite sure if Bioperl > provide facilities to attach some arbitrary information to an object. > > Any interest? Ideas? Comments? > Sounds like a good use-case for roles, maybe even parametric roles. 
Leon From florent.angly at gmail.com Mon Nov 28 19:59:24 2011 From: florent.angly at gmail.com (Florent Angly) Date: Tue, 29 Nov 2011 10:59:24 +1000 Subject: [Bioperl-l] Interest in Bio::Community modules In-Reply-To: <3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu> References: <4ED30F93.4000407@gmail.com> <3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu> Message-ID: <4ED42E6C.6020501@gmail.com> Hi Chris, On 29/11/11 05:47, Fields, Christopher J wrote: > I think the idea is sound, it would be nice to have. Jason is working a bit in this area, maybe he has some additional thoughts? Would there be some redundancy with any current code (Bio::Tree, Bio::Taxon, etc)? None of these features would be duplicated. Rather, they would be used as attributes of the Bio::Community::* objects. For example, a member of a community could have a Bio::SeqI attached to it as well as a Bio::Taxon, etc. > I do think it should be developed on its own, per our recent discussions re: slimming down core. Yes, the features are so different that it makes sense to have the Bio::Community::* modules as a separate BioPerl distribution (like the Bio-FeatureIO BioPerl distribution). > Re: using Moose and BioPerl, from experience it's a little tricky integrating the two as BioPerl uses perl's simple OO system while Moose allows much more (though it has its own issues, memory and speed being the top two). Specifying inheritance from a BioPerl module is a bit trickier, you'll have to wrap the module via MooseX::NonMoose. Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine. Beyond Bio::RootI, I cannot imagine that any of the Bio::Community::* modules would need to inherit from any other BioPerl modules. Considering this and the performance aspects of Moose, do you think that using Moose is a wise design decision?
Best, Florent > chris > > On Nov 27, 2011, at 10:35 PM, Florent Angly wrote: > >> Hi all, >> >> I have been thinking about starting a set of Perl modules that would useful for (microbial) ecologists to represent communities of organisms. At the moment, there does not seem to be anything like this in Bioperl. I am happy to make these modules available under the Bioperl umbrella using the Bio::Community::* namespace. >> >> I envision the following modules: >> * Bio::Community::Member module representing members of a community. >> * Bio::Community::IO modules to read/write files that describe community composition (a.k.a. OTU table, or site by species table) as used programs like QIIME, Pyrotagger, GAAS, ... >> * Bio::Community::Tools modules to help manipulate communities, e.g. to take some members at random, normalize the community to a given number of individuals, or do rarefaction curves. >> >> The idea is to implement these modules in Moose to teach myself Moose. The members of a community could be a sequence (Bio::SeqI), a species (Bio::S), an arbitrary string or even other things. I am not quite sure if Bioperl provide facilities to attach some arbitrary information to an object. >> >> Any interest? Ideas? Comments? >> >> Thanks, >> >> Florent >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Nov 29 00:32:50 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 29 Nov 2011 05:32:50 +0000 Subject: [Bioperl-l] Interest in Bio::Community modules In-Reply-To: References: <4ED30F93.4000407@gmail.com> Message-ID: On Nov 28, 2011, at 2:25 PM, Leon Timmermans wrote: > And now to the list too, > > On Mon, Nov 28, 2011 at 5:35 AM, Florent Angly wrote: > >> The idea is to implement these modules in Moose to teach myself Moose. 
The >> members of a community could be a sequence (Bio::SeqI), a species (Bio::S), >> an arbitrary string or even other things. I am not quite sure if Bioperl >> provide facilities to attach some arbitrary information to an object. >> >> Any interest? Ideas? Comments? >> > > Sounds like a good use-case for roles, maybe even parametric roles. > > Leon Yep, agree totally. It would be a good replacement in most cases for the BioI interfaces. (see also, the Biome project, which I'm slooooooowly working on again, on github) chris From pmr at ebi.ac.uk Tue Nov 29 08:39:52 2011 From: pmr at ebi.ac.uk (Peter Rice) Date: Tue, 29 Nov 2011 13:39:52 +0000 Subject: [Bioperl-l] BinarySearch.pm Message-ID: <4ED4E0A8.30102@ebi.ac.uk> In trying to use bioflat_index.pl index files in EMBOSS, I ran into some problems. Both appear to be in the Bio/Flat/BinarySearch.pm source file. EMBL ID lines are failing to drop the ';' from the ID. Updating the regular expression to make sure the ';' is not picked up seems to work: if ($format =~ /embl/i) { return ('ID', "^ID (\\S+[^; ])", "^ID (\\S+[^; ])", { ACC => q/^AC (\S+);/, VERSION => q/^SV\s+(\S+)/ }); } The ACC secondary index has every record duplicated. This line is duplicated in the write_secondary_indices source code. Is that intentional? print $fh sprintf("%-${length}s",$record); regards, Peter Rice EMBOSS Team From uni.anastasia at gmail.com Sat Nov 26 12:32:48 2011 From: uni.anastasia at gmail.com (anastsia shapiro) Date: Sat, 26 Nov 2011 19:32:48 +0200 Subject: [Bioperl-l] Problem with parsing blast results Message-ID: Hello, I'm running a script that should parse a blast results, using searchIO. Sometimes the script work fines, however sometimes it stops, and I receive the following error. 
------------- EXCEPTION ------------- MSG: no data for midline Query ------------------------------------------------------------ STACK Bio::SearchIO::blast::next_result C:/Perl64/site/lib/Bio\SearchIO\ blast.pm:1805 STACK toplevel D:\D\uni-anastasia\project\scripts\ParsingBlastResults\ParsingBlastResults.pl:36 ------------------------------------- While the blast results files were received as a result of running the following blast command: blastn -task blastn -db xxxxxxxxx.txt -evalue 1e-10 -perc_identity 80 -dust no -num_descriptions 0 -query xxxxx.txt -out results.txt -strand plus I am using bioperl 1.6.1. I read all the forums , and it seems to be a bug, but on version 1.5 it was fixed. I will really appreciate your help, since I am trying to understand the problem for over a month. Regards, Anastasia From bunk at novozymes.com Tue Nov 29 11:46:54 2011 From: bunk at novozymes.com (Jacob Bunk Nielsen) Date: Tue, 29 Nov 2011 17:46:54 +0100 Subject: [Bioperl-l] Problem with parsing blast results In-Reply-To: (anastsia shapiro's message of "Sat, 26 Nov 2011 18:32:48 +0100") References: Message-ID: <77sjl698qp.fsf@spurv.nzcorp.net> Hi anastsia shapiro writes: > I'm running a script that should parse a blast results, using searchIO. > > Sometimes the script work fines, however sometimes it stops, and I receive > the following error. 
> > ------------- EXCEPTION ------------- > MSG: no data for midline Query > ------------------------------------------------------------ > STACK Bio::SearchIO::blast::next_result C:/Perl64/site/lib/Bio\SearchIO\blast.pm:1805 > STACK toplevel > D:\D\uni-anastasia\project\scripts\ParsingBlastResults\ParsingBlastResults.pl:36 > ------------------------------------- > While the blast results files were received as a result of running the > following blast command: > blastn -task blastn -db xxxxxxxxx.txt -evalue 1e-10 -perc_identity 80 -dust no -num_descriptions 0 -query xxxxx.txt -out results.txt -strand plus I don't know why this exact problem arises, but I think you should consider using an output format that is more easily machine-parseable, like the XML format. With the BLAST+ blastn you are running, XML output is requested with -outfmt 5 (-m 7 is the equivalent option of the legacy blastall). When reading the result with Bioperl you must specify -format => 'blastxml' for Bio::SearchIO. That way I think you are likely to see a lot fewer problems regarding the parsing of blast output. If the above doesn't solve the problem you had better show us the code that fails. Best regards Jacob From cjfields at illinois.edu Tue Nov 29 14:11:11 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 29 Nov 2011 19:11:11 +0000 Subject: [Bioperl-l] Interest in Bio::Community modules In-Reply-To: <4ED42E6C.6020501@gmail.com> References: <4ED30F93.4000407@gmail.com> <3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu> <4ED42E6C.6020501@gmail.com> Message-ID: <4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu> On Nov 28, 2011, at 6:59 PM, Florent Angly wrote: > Hi Chris, > > On 29/11/11 05:47, Fields, Christopher J wrote: > ... >> Re: using Moose and BioPerl, from experience it's a little tricky integrating the two as BioPerl uses perl's simple OO system while Moose allows much more (though it has its own issues, memory and speed being the top two).
Specifying inheritance from a BioPerl module is a bit trickier, you'll have to wrap the module via MooseX::NonMoose. Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine. > Beyond Bio::RootI, I cannot imagine that any of the Bio::Community::* modules would need to inherit from any other BioPerl modules. Considering this and the performance aspects of Moose, do you think that using Moose is a wise design decision? Moose can actually help in some circumstances; flattening the inheritance hierarchy by using roles seems to help. And it never hurts to learn something new like Moose and other modern perl niceties. > Best, > > Florent chris From cjfields at illinois.edu Tue Nov 29 17:30:58 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 29 Nov 2011 22:30:58 +0000 Subject: [Bioperl-l] BinarySearch.pm In-Reply-To: <4ED4E0A8.30102@ebi.ac.uk> References: <4ED4E0A8.30102@ebi.ac.uk> Message-ID: <6F926A89-3B07-4924-8CC4-68A027E7FFCE@illinois.edu> Peter, Can you send a test file that is failing? I added a few tests using an example file with a ';' in the ID line, but everything is passing with our other EMBL example files. I'm also looking into adding a method to return secondary IDs for a specific type ('ACC' for instance) so we can test the repeat issue for accessions. Both changes pass tests as is, though, so I have committed them in the meantime. chris On Nov 29, 2011, at 7:39 AM, Peter Rice wrote: > In trying to use bioflat_index.pl index files in EMBOSS, I ran into some problems. > > Both appear to be in the Bio/Flat/BinarySearch.pm source file. > > EMBL ID lines are failing to drop the ';' from the ID. 
Updating the regular expression to make sure the ';' is not picked up seems to work: > > if ($format =~ /embl/i) { > return ('ID', > "^ID (\\S+[^; ])", > "^ID (\\S+[^; ])", > { > ACC => q/^AC (\S+);/, > VERSION => q/^SV\s+(\S+)/ > }); > } > > The ACC secondary index has every record duplicated. > This line is duplicated in the write_secondary_indices source code. Is that intentional? > > print $fh sprintf("%-${length}s",$record); > > regards, > > Peter Rice > EMBOSS Team > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From florent.angly at gmail.com Tue Nov 29 20:18:41 2011 From: florent.angly at gmail.com (Florent Angly) Date: Wed, 30 Nov 2011 11:18:41 +1000 Subject: [Bioperl-l] Interest in Bio::Community modules In-Reply-To: <4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu> References: <4ED30F93.4000407@gmail.com> <3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu> <4ED42E6C.6020501@gmail.com> <4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu> Message-ID: <4ED58471.3030106@gmail.com> Chris, Yes, it is exciting to learn something new. I have developed a bit of code in the last few days in my local git repository. Do you think you could create a repository for Bio-Community on the Bioperl Github space or is it too soon? Cheers, Florent On 30/11/11 05:11, Fields, Christopher J wrote: > On Nov 28, 2011, at 6:59 PM, Florent Angly wrote: > >> Hi Chris, >> >> On 29/11/11 05:47, Fields, Christopher J wrote: >> ... >>> Re: using Moose and BioPerl, from experience it's a little tricky integrating the two as BioPerl uses perl's simple OO system while Moose allows much more (though it has it's own issues, memory and speed being the top two). Specifying inheritance from a BioPerl module is a bit trickier, you'll have to wrap the module via MooseX::NonMoose. 
Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine. >> Beyond Bio::RootI, I cannot imagine that any of the Bio::Community::* modules would need to inherit from any other BioPerl modules. Considering this and the performance aspects of Moose, do you think that using Moose is a wise design decision? > Moose can actually help in some circumstances; flattening the inheritance hierarchy by using roles seems to help. And it never hurts to learn something new like Moose and other modern perl niceties. > >> Best, >> >> Florent > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Nov 29 21:34:00 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 30 Nov 2011 02:34:00 +0000 Subject: [Bioperl-l] Interest in Bio::Community modules In-Reply-To: <4ED58471.3030106@gmail.com> References: <4ED30F93.4000407@gmail.com> <3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu> <4ED42E6C.6020501@gmail.com> <4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu> <4ED58471.3030106@gmail.com> Message-ID: On Nov 29, 2011, at 7:18 PM, Florent Angly wrote: > Chris, > Yes, it is exciting to learn something new. > I have developed a bit of code in the last few days in my local git repository. Do you think you could create a repository for Bio-Community on the Bioperl Github space or is it too soon? It's up to you. I set up the barebones repo and added you on to push/pull/admin, you should be able to push to it whenever you are ready: https://github.com/bioperl/Bio-Community chris > Cheers, > Florent > > On 30/11/11 05:11, Fields, Christopher J wrote: >> On Nov 28, 2011, at 6:59 PM, Florent Angly wrote: >> >>> Hi Chris, >>> >>> On 29/11/11 05:47, Fields, Christopher J wrote: >>> ... 
>>>> Re: using Moose and BioPerl, from experience it's a little tricky integrating the two as BioPerl uses perl's simple OO system while Moose allows much more (though it has it's own issues, memory and speed being the top two). Specifying inheritance from a BioPerl module is a bit trickier, you'll have to wrap the module via MooseX::NonMoose. Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine. >>> Beyond Bio::RootI, I cannot imagine that any of the Bio::Community::* modules would need to inherit from any other BioPerl modules. Considering this and the performance aspects of Moose, do you think that using Moose is a wise design decision? >> Moose can actually help in some circumstances; flattening the inheritance hierarchy by using roles seems to help. And it never hurts to learn something new like Moose and other modern perl niceties. >> >>> Best, >>> >>> Florent >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From florent.angly at gmail.com Tue Nov 29 21:50:04 2011 From: florent.angly at gmail.com (Florent Angly) Date: Wed, 30 Nov 2011 12:50:04 +1000 Subject: [Bioperl-l] Interest in Bio::Community modules In-Reply-To: References: <4ED30F93.4000407@gmail.com> <3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu> <4ED42E6C.6020501@gmail.com> <4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu> <4ED58471.3030106@gmail.com> Message-ID: <4ED599DC.6090808@gmail.com> Fantastic! Thank you very much Chris, Florent On 30/11/11 12:34, Fields, Christopher J wrote: > On Nov 29, 2011, at 7:18 PM, Florent Angly wrote: > >> Chris, >> Yes, it is exciting to learn something new. 
>> I have developed a bit of code in the last few days in my local git repository. Do you think you could create a repository for Bio-Community on the Bioperl Github space or is it too soon? > It's up to you. I set up the barebones repo and added you on to push/pull/admin, you should be able to push to it whenever you are ready: > > https://github.com/bioperl/Bio-Community > > chris > > >> Cheers, >> Florent >> >> On 30/11/11 05:11, Fields, Christopher J wrote: >>> On Nov 28, 2011, at 6:59 PM, Florent Angly wrote: >>> >>>> Hi Chris, >>>> >>>> On 29/11/11 05:47, Fields, Christopher J wrote: >>>> ... >>>>> Re: using Moose and BioPerl, from experience it's a little tricky integrating the two as BioPerl uses perl's simple OO system while Moose allows much more (though it has it's own issues, memory and speed being the top two). Specifying inheritance from a BioPerl module is a bit trickier, you'll have to wrap the module via MooseX::NonMoose. Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine. >>>> Beyond Bio::RootI, I cannot imagine that any of the Bio::Community::* modules would need to inherit from any other BioPerl modules. Considering this and the performance aspects of Moose, do you think that using Moose is a wise design decision? >>> Moose can actually help in some circumstances; flattening the inheritance hierarchy by using roles seems to help. And it never hurts to learn something new like Moose and other modern perl niceties. 
>>> >>>> Best, >>>> >>>> Florent >>> chris >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l From lsbrath at gmail.com Wed Nov 30 00:25:32 2011 From: lsbrath at gmail.com (Mgavi Brathwaite) Date: Wed, 30 Nov 2011 00:25:32 -0500 Subject: [Bioperl-l] Exception MSG Message-ID: Hello, Brushing up on my BioPerl and I can't figure out this MSG: ------------- EXCEPTION ------------- MSG: cannot open >/Users/mydata/Desktop/gi|255572219|ref|XP_002527049|.out STACK Bio::Tools::Run::RemoteBlast::save_output /Library/Perl/5.10.0/Bio/Tools/Run/RemoteBlast.pm:678 STACK toplevel /Users/mydata/Documents/workspace/BI7643/rb_ex.pl:40 ------------------------------------- Here is the code: #!/usr/bin/perl -w use strict; use Bio::Tools::Run::RemoteBlast; #=cut my $prog = 'blastp'; my $db = 'swissprot'; my $e_val = '1e-10'; my @params = ('-prog' => $prog, '-data' => $db, 'expect' => $e_val, 'readmethod' => 'SearchIO' ); my $factory = Bio::Tools::Run::RemoteBlast->new(@params); #human database $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens [ORGN]'; my $v =1; # this is just to turn on and off the messages # Construct the sequence object my $seq_in = Bio::SeqIO->new(-file => "/Users/mydata/Desktop/rb_ex.fa", -format => "fasta"); while (my $input = $seq_in->next_seq()){ my $r = $factory->submit_blast($input); print STDERR "waiting..." if ($v > 0); while (my @rids = $factory->each_rid()){ foreach my $rid (@rids){ my $rc = $factory->retrieve_blast($rid); if( !ref($rc) ) { if($rc < 0){ $factory->remove_rid($rid); } print STDERR "." 
if ($v > 0); sleep 5; } else { my $result = $rc->next_result(); #save output my $filename = ">/Users/mydata/Desktop/".$result->query_name().".out";#error $factory->save_output($filename); $factory->remove_rid($rid); print "\nQuery Name: ", $result->query_name(), "\n"; while ( my $hit = $result->next_hit ) { next unless ( $v > 0); print "\thit name is ", $hit->name, "\n"; while( my $hsp = $hit->next_hsp ) { print "\t\tscore is ", $hsp->score, "\n"; } } } } } } Thanks for the help! From jason.stajich at gmail.com Wed Nov 30 01:05:41 2011 From: jason.stajich at gmail.com (Jason Stajich) Date: Tue, 29 Nov 2011 22:05:41 -0800 Subject: [Bioperl-l] Exception MSG In-Reply-To: References: Message-ID: <50C532E3-5212-49CE-A3CA-9FE91BDC00CF@gmail.com> I don't think you need to give it the '>' when you specify the filename for the output. That is done by the filehandle opening itsself. On Nov 29, 2011, at 9:25 PM, Mgavi Brathwaite wrote: > Hello, > > Brushing up on my BioPerl and I can't figure out this MSG: > > ------------- EXCEPTION ------------- > > MSG: cannot open >/Users/mydata/Desktop/gi|255572219|ref|XP_002527049|.out > > STACK Bio::Tools::Run::RemoteBlast::save_output > /Library/Perl/5.10.0/Bio/Tools/Run/RemoteBlast.pm:678 > > STACK toplevel /Users/mydata/Documents/workspace/BI7643/rb_ex.pl:40 > > ------------------------------------- > Here is the code: > > #!/usr/bin/perl -w > > use strict; > > use Bio::Tools::Run::RemoteBlast; > > > #=cut > > my $prog = 'blastp'; > > my $db = 'swissprot'; > > my $e_val = '1e-10'; > > > my @params = ('-prog' => $prog, > > '-data' => $db, > > 'expect' => $e_val, > > 'readmethod' => 'SearchIO' ); > > my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > > > #human database > > $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens > [ORGN]'; > > > my $v =1; # this is just to turn on and off the messages > > # Construct the sequence object > > my $seq_in = Bio::SeqIO->new(-file => "/Users/mydata/Desktop/rb_ex.fa", 
-format > => "fasta"); > > > while (my $input = $seq_in->next_seq()){ > > my $r = $factory->submit_blast($input); > > print STDERR "waiting..." if ($v > 0); > > while (my @rids = $factory->each_rid()){ > > foreach my $rid (@rids){ > > my $rc = $factory->retrieve_blast($rid); > > if( !ref($rc) ) { > > if($rc < 0){ > > $factory->remove_rid($rid); > > } > > print STDERR "." if ($v > 0); > > sleep 5; > > } else { > > my $result = $rc->next_result(); > > #save output > > my $filename = ">/Users/mydata/Desktop/".$result->query_name().".out";#error > > $factory->save_output($filename); > > $factory->remove_rid($rid); > > print "\nQuery Name: ", $result->query_name(), "\n"; > > while ( my $hit = $result->next_hit ) { > > next unless ( $v > 0); > > print "\thit name is ", $hit->name, "\n"; > > while( my $hsp = $hit->next_hsp ) { > > print "\t\tscore is ", $hsp->score, "\n"; > > } > > } > > } > > } > > } > > } > > > > Thanks for the help! > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From ss2489 at cornell.edu Wed Nov 30 09:32:47 2011 From: ss2489 at cornell.edu (Surya Saha) Date: Wed, 30 Nov 2011 09:32:47 -0500 Subject: [Bioperl-l] Exception MSG In-Reply-To: <50C532E3-5212-49CE-A3CA-9FE91BDC00CF@gmail.com> References: <50C532E3-5212-49CE-A3CA-9FE91BDC00CF@gmail.com> Message-ID: If that does not fix it, try using one of the unique identifiers as the file name (gi??) instead of the full query name. The pipe(|) characters might cause problems. On Wed, Nov 30, 2011 at 1:05 AM, Jason Stajich wrote: > I don't think you need to give it the '>' when you specify the filename > for the output. That is done by the filehandle opening itsself. 
> > On Nov 29, 2011, at 9:25 PM, Mgavi Brathwaite wrote: > > > Hello, > > > > Brushing up on my BioPerl and I can't figure out this MSG: > > > > ------------- EXCEPTION ------------- > > > > MSG: cannot open > >/Users/mydata/Desktop/gi|255572219|ref|XP_002527049|.out > > > > STACK Bio::Tools::Run::RemoteBlast::save_output > > /Library/Perl/5.10.0/Bio/Tools/Run/RemoteBlast.pm:678 > > > > STACK toplevel /Users/mydata/Documents/workspace/BI7643/rb_ex.pl:40 > > > > ------------------------------------- > > Here is the code: > > > > #!/usr/bin/perl -w > > > > use strict; > > > > use Bio::Tools::Run::RemoteBlast; > > > > > > #=cut > > > > my $prog = 'blastp'; > > > > my $db = 'swissprot'; > > > > my $e_val = '1e-10'; > > > > > > my @params = ('-prog' => $prog, > > > > '-data' => $db, > > > > 'expect' => $e_val, > > > > 'readmethod' => 'SearchIO' ); > > > > my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > > > > > > #human database > > > > $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens > > [ORGN]'; > > > > > > my $v =1; # this is just to turn on and off the messages > > > > # Construct the sequence object > > > > my $seq_in = Bio::SeqIO->new(-file => "/Users/mydata/Desktop/rb_ex.fa", > -format > > => "fasta"); > > > > > > while (my $input = $seq_in->next_seq()){ > > > > my $r = $factory->submit_blast($input); > > > > print STDERR "waiting..." if ($v > 0); > > > > while (my @rids = $factory->each_rid()){ > > > > foreach my $rid (@rids){ > > > > my $rc = $factory->retrieve_blast($rid); > > > > if( !ref($rc) ) { > > > > if($rc < 0){ > > > > $factory->remove_rid($rid); > > > > } > > > > print STDERR "." 
if ($v > 0); > > > > sleep 5; > > > > } else { > > > > my $result = $rc->next_result(); > > > > #save output > > > > my $filename = > ">/Users/mydata/Desktop/".$result->query_name().".out";#error > > > > $factory->save_output($filename); > > > > $factory->remove_rid($rid); > > > > print "\nQuery Name: ", $result->query_name(), "\n"; > > > > while ( my $hit = $result->next_hit ) { > > > > next unless ( $v > 0); > > > > print "\thit name is ", $hit->name, "\n"; > > > > while( my $hsp = $hit->next_hsp ) { > > > > print "\t\tscore is ", $hsp->score, "\n"; > > > > } > > > > } > > > > } > > > > } > > > > } > > > > } > > > > > > > > Thanks for the help! > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From lsbrath at gmail.com Wed Nov 30 09:34:52 2011 From: lsbrath at gmail.com (Mgavi Brathwaite) Date: Wed, 30 Nov 2011 09:34:52 -0500 Subject: [Bioperl-l] Exception MSG In-Reply-To: References: <50C532E3-5212-49CE-A3CA-9FE91BDC00CF@gmail.com> Message-ID: Surya, As Jason suggested, I removed the '>' and it worked. Thanks for your response. Lom On Wed, Nov 30, 2011 at 9:32 AM, Surya Saha wrote: > If that does not fix it, try using one of the unique identifiers as the > file name (gi??) instead of the full query name. The pipe(|) characters > might cause problems. > > On Wed, Nov 30, 2011 at 1:05 AM, Jason Stajich wrote: > >> I don't think you need to give it the '>' when you specify the filename >> for the output. That is done by the filehandle opening itsself. 
>> >> On Nov 29, 2011, at 9:25 PM, Mgavi Brathwaite wrote: >> >> > Hello, >> > >> > Brushing up on my BioPerl and I can't figure out this MSG: >> > >> > ------------- EXCEPTION ------------- >> > >> > MSG: cannot open >> >/Users/mydata/Desktop/gi|255572219|ref|XP_002527049|.out >> > >> > STACK Bio::Tools::Run::RemoteBlast::save_output >> > /Library/Perl/5.10.0/Bio/Tools/Run/RemoteBlast.pm:678 >> > >> > STACK toplevel /Users/mydata/Documents/workspace/BI7643/rb_ex.pl:40 >> > >> > ------------------------------------- >> > Here is the code: >> > >> > #!/usr/bin/perl -w >> > >> > use strict; >> > >> > use Bio::Tools::Run::RemoteBlast; >> > >> > >> > #=cut >> > >> > my $prog = 'blastp'; >> > >> > my $db = 'swissprot'; >> > >> > my $e_val = '1e-10'; >> > >> > >> > my @params = ('-prog' => $prog, >> > >> > '-data' => $db, >> > >> > 'expect' => $e_val, >> > >> > 'readmethod' => 'SearchIO' ); >> > >> > my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >> > >> > >> > #human database >> > >> > $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens >> > [ORGN]'; >> > >> > >> > my $v =1; # this is just to turn on and off the messages >> > >> > # Construct the sequence object >> > >> > my $seq_in = Bio::SeqIO->new(-file => "/Users/mydata/Desktop/rb_ex.fa", >> -format >> > => "fasta"); >> > >> > >> > while (my $input = $seq_in->next_seq()){ >> > >> > my $r = $factory->submit_blast($input); >> > >> > print STDERR "waiting..." if ($v > 0); >> > >> > while (my @rids = $factory->each_rid()){ >> > >> > foreach my $rid (@rids){ >> > >> > my $rc = $factory->retrieve_blast($rid); >> > >> > if( !ref($rc) ) { >> > >> > if($rc < 0){ >> > >> > $factory->remove_rid($rid); >> > >> > } >> > >> > print STDERR "." 
if ($v > 0); >> > >> > sleep 5; >> > >> > } else { >> > >> > my $result = $rc->next_result(); >> > >> > #save output >> > >> > my $filename = >> ">/Users/mydata/Desktop/".$result->query_name().".out";#error >> > >> > $factory->save_output($filename); >> > >> > $factory->remove_rid($rid); >> > >> > print "\nQuery Name: ", $result->query_name(), "\n"; >> > >> > while ( my $hit = $result->next_hit ) { >> > >> > next unless ( $v > 0); >> > >> > print "\thit name is ", $hit->name, "\n"; >> > >> > while( my $hsp = $hit->next_hsp ) { >> > >> > print "\t\tscore is ", $hsp->score, "\n"; >> > >> > } >> > >> > } >> > >> > } >> > >> > } >> > >> > } >> > >> > } >> > >> > >> > >> > Thanks for the help! >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > From ericdemuinck at gmail.com Wed Nov 30 18:36:36 2011 From: ericdemuinck at gmail.com (Ericde) Date: Wed, 30 Nov 2011 15:36:36 -0800 (PST) Subject: [Bioperl-l] re trieving blast multiple alignment in fasta form Message-ID: <32886592.post@talk.nabble.com> :-/ I am a newbie and I am trying to retrieve a blast multiple alignment in fasta form. The BLAST output (m -2) gives several alignments (which is good) and the parsing of the xml file seems to list all of these alignments (which is also good) The problem is that the fasta alignment file only includes one of the hits and the alignment does not include all the sequences (including the query sequence). I would like to generate a fasta file that includes all the alignments included in the m -2 output (plus query sequence if possible). 
I have cobbled together a script (below). I will attach the sample blast xml file and the (m -2) file as well. Any insight is appreciated :/

#module load perl
#give the name of the blast xml file to parse in the line where it says 'file =>'
use Bio::SearchIO;
#Use m -7 to generate xml file from blastall
my $in = new Bio::SearchIO(-format => 'blastxml',
                           -file   => 'BLASToutxml');
while( my $result = $in->next_result ) {
  ## $result is a Bio::Search::Result::ResultI compliant object
  while( my $hit = $result->next_hit ) {
    ## $hit is a Bio::Search::Hit::HitI compliant object
    while( my $hsp = $hit->next_hsp ) {
      ## $hsp is a Bio::Search::HSP::HSPI compliant object
      #ENTER desired sequence length
      if( $hsp->length('total') > 50 ) {
        #ENTER desired percent identity
        if ( $hsp->percent_identity >= 75 ) {
          print "Query=", $result->query_name,
            " Hit=", $hit->name,
            " Length=", $hsp->length('total'),
            " Percent_id=", $hsp->percent_identity, "\n";
          #Print alignment to file
          #$aln will be a Bio::SimpleAlign object
          use Bio::AlignIO;
          my $aln = $hsp->get_aln;
          #changed msf to fasta and hsp.msf to hsp.fas, output is now a fasta file
          my $alnIO = Bio::AlignIO->new(-format =>"fasta", -file => ">hsp.fas");
          $alnIO->write_aln($aln);
        }
      }
    }
  }
}

http://old.nabble.com/file/p32886592/BLASToutxml BLASToutxml
http://old.nabble.com/file/p32886592/hsp.fas hsp.fas

--
View this message in context: http://old.nabble.com/retrieving-blast-multiple-alignment-in-fasta-form-tp32886592p32886592.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From hrh at fmi.ch Tue Nov 1 06:18:54 2011
From: hrh at fmi.ch (Hotz, Hans-Rudolf)
Date: Tue, 1 Nov 2011 11:18:54 +0100
Subject: [Bioperl-l] best way to edit sequence features
In-Reply-To:
Message-ID:

Hi Carnë

Please allow me to make a few comments:

I very much like your idea of writing a free tool to edit and draw sequences.
We (ie people working in core Bioinformatics facilities) all suffer from having to deal with files originally created with commercial packages. And on top of all the pain, those commercial packages are very expensive and they don't deliver what they promise to do. Just double checking: Have you looked a the free tools which are available? I am aware of the following ones (as far as I know, they are all GUI based and don't have a command line API): Serial Cloner http://serialbasics.free.fr/Serial_Cloner.html GENtle http://gentle.magnusmanske.de/ GeneCoder http://www.algosome.com/gene-coder/gene-coder.html pDRAW32 http://www.acaclone.com/ Genome Workbench http://www.ncbi.nlm.nih.gov/projects/gbench/ Ape http://www.biology.utah.edu/jorgensen/wayned/ape/> UGene http://ugene.unipro.ru/ maybe others on the list know of even better free tools? Also, have you looked at the emboss tool "cirdna" ? WRT file formats: I strongly suggest to stick to embl and genbank format as input and (text) output format. The features are not indexed, but you can create your own when you store the sequences in your system. Internally, you probably wanna keep the data in a 'simpler' format than embl or genbank, anyway. Alternatively, have you looked at gff/gtf as away of getting features? see: http://www.sequenceontology.org/gff3.shtml http://mblab.wustl.edu/GTF22.html I am looking forward to any progress you make Regards, Hans Hans-Rudolf Hotz, PhD Bioinformatics Support Friedrich Miescher Institute for Biomedical Research Maulbeerstrasse 66 4058 Basel/Switzerland On 10/31/11 7:05 PM, "Carn? Draug" wrote: > Hi > > I've been planning on writing a free (as in freedom) tool to edit > sequences and make plamids maps. The idea is to build the command line > tool first and maybe later work on a GUI for it. > > The problem I foresee at the moment while designing it, is how to > change a feature of the sequence. 
I'm not familiar with all sequence > formats (only fasta, ensembl and genbank) but I can't see how to > specify from the command line what feature to edit since I can't see > any unique identifiers for them. Is there a file format that makes > this easier? Any tips would be most appreciated. > > Thank in advance, > Carn? Draug > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Nov 1 09:40:30 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 1 Nov 2011 13:40:30 +0000 Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual In-Reply-To: References: Message-ID: <26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu> On Oct 24, 2011, at 9:58 AM, Sofia Robb wrote: > Hi, > > I am having problems running Bio::Index::Fastq. I get the following error when a quality line begins with '@'. > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: No description line parsed > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:368 > STACK: Bio::SeqIO::fastq::next_dataset /usr/share/perl5/Bio/SeqIO/fastq.pm:71 > STACK: Bio::SeqIO::fastq::next_seq /usr/share/perl5/Bio/SeqIO/fastq.pm:29 > STACK: Bio::Index::AbstractSeq::fetch /usr/share/perl5/Bio/Index/AbstractSeq.pm:147 > STACK: Bio::Index::AbstractSeq::get_Seq_by_id /usr/share/perl5/Bio/Index/AbstractSeq.pm:198 > STACK: /home_stajichlab/robb/bin/clean_pairs_indexed.pl:68 > > > Here is an example of a fastq record that is causing this error, The last line which starts with an '@' is actually the qual line. > > @5:105:15806:16092:Y > GTGGCGCGGAACAGAGGAGGAATGTTCAGGAGAGGGGGCATGTGTTGTTACCGAGTACTTGGAAACGACG > + > @9;A565:=8B? 
>
>
> I see that Chris has partially addressed this on the mailing list:
> http://bioperl.org/pipermail/bioperl-l/2011-January/034481.html
>
> However, as he pointed out at the time, it appears this may be a fairly large problem.

The indexer is being refactored to address this problem; the Bio::SeqIO parser actually does parse this, but the (very simple) indexer does not. I can try to push this to the forefront this week; the fix shouldn't be too hard to implement. In essence it would simply use a few SeqIO methods I built in to parse out each bit of data in chunks; it would just need to track the start and length of each chunk while the parser is running.

> My fastq seq and qual lines are always only one line, so I think that adding a line count and only checking for @ in the lines where $line_count % 4 == 0 would work, since the header lines are always the first of 4 lines (0, 4, 8, etc.).

That doesn't work for all cases, however (some FASTQ wraps the seq and qual, like FASTA). Peter and I have discussed this elsewhere; a possible solution is to add in an optimized parser that takes this assumption into account.

One problem the various Bio* indexers have currently is the lack of standardization on a specific schema for indexing. There are inroads towards this (OBDA) that haven't been adequately traveled IMHO, which need to be taken up again. A second, and maybe this is more specific to BioPerl, is that the parsers and indexers essentially reimplement the format parsing in each module, so if there are bugs they have to be independently fixed (hence why SeqIO works and the indexer doesn't; I wrote the first but not the second). The best place for any optimizations would be in a unified parser that both the SeqIO and indexer modules could use.
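The idea of tracking the start and length of each record while reading can be sketched in plain Perl. This is only an illustration under a stated assumption, not the Bio::Index::Fastq implementation: it assumes every record is exactly four lines, which does not hold for wrapped FASTQ, and the subroutine names index_fastq_4line and fetch_record are made up for this example.

```perl
use strict;
use warnings;

# Build an index: read ID => [byte offset of the header line, record length in bytes].
# ASSUMPTION: strict four-line records (header, sequence, '+' line, quality).
# Lines are consumed four at a time, so a quality line that happens to
# start with '@' is never mistaken for a header.
sub index_fastq_4line {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "cannot open $path: $!";
    my %index;
    until (eof $fh) {
        my $start = tell($fh);
        my @rec;
        for (1 .. 4) {
            my $line = <$fh>;
            push @rec, $line;
        }
        die "truncated record at byte $start\n" unless defined $rec[3];
        my ($id) = $rec[0] =~ /^\@(\S+)/
            or die "no header line at byte $start\n";
        $rec[2] =~ /^\+/ or die "no '+' line for record $id\n";
        $index{$id} = [ $start, tell($fh) - $start ];
    }
    close $fh;
    return \%index;
}

# Fetch the raw text of one record using the stored offset and length.
# Returns nothing if the ID is not in the index.
sub fetch_record {
    my ($path, $index, $id) = @_;
    my $entry = $index->{$id} or return;
    my ($offset, $length) = @$entry;
    open(my $fh, '<', $path) or die "cannot open $path: $!";
    seek($fh, $offset, 0) or die "seek failed: $!";
    my $record = '';
    read($fh, $record, $length);
    close $fh;
    return $record;
}
```

The index here is an in-memory hash; for large read sets it would need to be swapped for an on-disk store, which is a separate design question.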
> But if there are multiple lines of seq and qual, I think that /^\+$/ or /^\+$id/ can be used to identify the end of the sequence, and the number of lines of quality should be equal to the number of lines of sequence.
>
>
> ## only for single line seq and qual
> my $line_count = 0;
> while (<$FASTQ>) {
>     if (/^@/ and $line_count % 4 == 0) {
>         # $begin is the position of the first character after the '@'
>         my $begin = tell($FASTQ) - length( $_ ) + 1;
>         foreach my $id (&$id_parser($_)) {
>             $self->add_record($id, $i, $begin);
>             $c++;
>         }
>     }
>     $line_count++;
> }
>
>
> --
> BioPerl fastq parsing issues aside, is there another tool which allows you to retrieve arbitrary sequences from a fastq file by sequence ID?
>
> There's one called cdbfasta which looks like it might work - does anyone have experience with it?

I haven't, but it appears FASTA-specific. Does it parse FASTQ as well? I recall Sanger has a C-based FASTA/FASTQ hybrid one as well. May have to look that one up.

> Thanks,
> sofia
>
> P.S. I am CCing Peter Cock in case BioPython has solved this issue already - if so, perhaps their solution could be applied here.

chris

From p.j.a.cock at googlemail.com Tue Nov 1 10:38:43 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 1 Nov 2011 14:38:43 +0000
Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual
In-Reply-To: <26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu>
References: <26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu>
Message-ID:

On Tue, Nov 1, 2011 at 1:40 PM, Fields, Christopher J wrote:
>
> One problem the various Bio* indexers have currently is the lack of
> standardization on a specific schema for indexing. There are in-roads
> towards this (OBDA) that haven't been adequately traveled IMHO,
> which need to be taken up again.
>

Something to switch to open-bio-l at lists.open-bio.org for,
http://lists.open-bio.org/mailman/listinfo/open-bio-l

We can continue this thread from last summer,
http://lists.open-bio.org/pipermail/open-bio-l/2010-April/000662.html
http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000676.html
...
http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000680.html

And CC Peter Rice from EMBOSS too - we chatted about this at ISMB/BOSC 2011 in July - and whoever looks after the OBDA/indexing code in BioRuby and BioJava too.

> A second, and maybe this is more specific to BioPerl, is that the
> parsers and indexers essentially reimplement the format parsing
> in each module, so if there are bugs they have to be independently
> fixed (hence why SeqIO works and the indexer doesn't; I wrote the
> first but not the second). The best place for any optimizations
> would be in a unified parser that both the SeqIO and indexer
> modules could use.

We have that problem to an extent in Biopython's Bio.SeqIO code. The indexing code duplicates some logic of the parsing code (how much depends on the format), sufficient to extract the read ID and the bounds on disk. The two could be more unified, but the parsers came first and I didn't want to change them at the time. Instead I tried to be rigorous in consistency testing for the index code's unit tests.

Regards,

Peter

From carandraug+dev at gmail.com Tue Nov 1 11:13:06 2011
From: carandraug+dev at gmail.com (Carnë Draug)
Date: Tue, 1 Nov 2011 15:13:06 +0000
Subject: [Bioperl-l] best way to edit sequence features
In-Reply-To:
References:
Message-ID:

On 1 November 2011 10:18, Hotz, Hans-Rudolf wrote:
> I am aware of the following ones (as far as I know, they are all GUI based
> and don't have a command line API):

They are not all free. Just for future reference, here are their licenses:

> Serial Cloner

Couldn't find a license, and the download for Linux has no source, so I'm guessing proprietary.

> GENtle            http://gentle.magnusmanske.de/

Free under GPL

> GeneCoder

Proprietary

> pDRAW32

Proprietary

> Genome Workbench  http://www.ncbi.nlm.nih.gov/projects/gbench/

Seems public domain. The license is not defined anywhere, but the files I checked had the public domain notice in the header.

> Ape

Proprietary ("license" is at the top of AppMain.tcl)

> UGene             http://ugene.unipro.ru/

Free under GPL

> Also, have you looked at the emboss tool "cirdna" ?

Free under GPL

> WRT file formats: I strongly suggest to stick to embl and genbank format as
> input and (text) output format. The features are not indexed, but you can
> create your own when you store the sequences in your system. Internally, you
> probably wanna keep the data in a 'simpler' format than embl or genbank,
> anyway.
>
> Alternatively, have you looked at gff/gtf as away of getting features?
> see:
>
> http://www.sequenceontology.org/gff3.shtml
> http://mblab.wustl.edu/GTF22.html

Considering the already existing alternatives, I'm more likely to collaborate with one of them to do what I want. I'll just have to check them all and decide. I was planning on writing a new tool and contributing it to the scripts section of bioperl since, when I googled before seeing these links, only the proprietary tools showed up.

Thank you very much for the links.

Carnë

From roy.chaudhuri at gmail.com Tue Nov 1 11:44:19 2011
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Tue, 01 Nov 2011 15:44:19 +0000
Subject: [Bioperl-l] best way to edit sequence features
In-Reply-To:
References:
Message-ID: <4EB013D3.30801@gmail.com>

The Sanger Institute's Artemis is good for editing sequence features, and DNAPlotter can be used to produce circular diagrams:
http://www.sanger.ac.uk/resources/software/artemis
http://www.sanger.ac.uk/resources/software/dnaplotter

Roy.

On 01/11/2011 10:18, Hotz, Hans-Rudolf wrote:
> Hi Carnë
> > Please allow me to make a few comments: > > I very much like your idea of writing a free tool to edit and draw > sequences. We (ie people working in core Bioinformatics facilities) all > suffer from having to deal with files originally created with commercial > packages. And on top of all the pain, those commercial packages are very > expensive and they don't deliver what they promise to do. > > > Just double checking: Have you looked a the free tools which are available? > > I am aware of the following ones (as far as I know, they are all GUI based > and don't have a command line API): > > Serial Cloner http://serialbasics.free.fr/Serial_Cloner.html > GENtle http://gentle.magnusmanske.de/ > GeneCoder http://www.algosome.com/gene-coder/gene-coder.html > pDRAW32 http://www.acaclone.com/ > Genome Workbench http://www.ncbi.nlm.nih.gov/projects/gbench/ > Ape http://www.biology.utah.edu/jorgensen/wayned/ape/> > UGene http://ugene.unipro.ru/ > > maybe others on the list know of even better free tools? > > Also, have you looked at the emboss tool "cirdna" ? > > > WRT file formats: I strongly suggest to stick to embl and genbank format as > input and (text) output format. The features are not indexed, but you can > create your own when you store the sequences in your system. Internally, you > probably wanna keep the data in a 'simpler' format than embl or genbank, > anyway. > > Alternatively, have you looked at gff/gtf as away of getting features? > see: > > http://www.sequenceontology.org/gff3.shtml > http://mblab.wustl.edu/GTF22.html > > > > I am looking forward to any progress you make > > Regards, Hans > > > > Hans-Rudolf Hotz, PhD > Bioinformatics Support > > Friedrich Miescher Institute for Biomedical Research > Maulbeerstrasse 66 > 4058 Basel/Switzerland > > > > On 10/31/11 7:05 PM, "Carn? Draug" wrote: > >> Hi >> >> I've been planning on writing a free (as in freedom) tool to edit >> sequences and make plamids maps. 
The idea is to build the command line >> tool first and maybe later work on a GUI for it. >> >> The problem I foresee at the moment while designing it, is how to >> change a feature of the sequence. I'm not familiar with all sequence >> formats (only fasta, ensembl and genbank) but I can't see how to >> specify from the command line what feature to edit since I can't see >> any unique identifiers for them. Is there a file format that makes >> this easier? Any tips would be most appreciated. >> >> Thank in advance, >> Carn? Draug >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at gmail.com Tue Nov 1 12:02:24 2011 From: jason.stajich at gmail.com (Jason Stajich) Date: Tue, 1 Nov 2011 09:02:24 -0700 Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual In-Reply-To: References: <26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu> Message-ID: <47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com> I think a different indexer is needed for the scale of key/value pairs we see in fastq files if we want to make a fast lookup by ID. I think speed is of essence for this type of solution and so a forced all records must be 4 lines long is okay for this type of implementation. I found NOSQL implementations to be much better performance and than any of the BDB type solutions -- they end up being really slow at above 1-5M keys. I used TokyoCabinet and KyotoCabinet to do indexing of accession -> taxonomy ID and found it quite fast for the needs. I haven't tried storing 100bp reads + qual string as the value in it yet but I think it could be done, certainly worth a prototype. 
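Before relying on the "all records must be 4 lines long" shortcut, a file can be checked once up front. The following is a minimal sketch, assuming plain uncompressed FASTQ; is_four_line_fastq is a hypothetical helper written for this example and is not part of BioPerl.

```perl
use strict;
use warnings;

# Return 1 if every record in the file is a strict four-line FASTQ record
# (header starting with '@', sequence, line starting with '+', quality of
# the same length as the sequence); return 0 otherwise.
# An empty file trivially returns 1.
sub is_four_line_fastq {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "cannot open $path: $!";
    my $ok = 1;
    until (eof $fh) {
        my @rec;
        for (1 .. 4) {
            my $line = <$fh>;
            push @rec, $line;
        }
        # A file that ends mid-record cannot be strict four-line FASTQ.
        if (!defined $rec[3]) { $ok = 0; last; }
        chomp @rec;
        # Wrapped records fail here: the third line read is then more
        # sequence rather than the '+' separator.
        unless ($rec[0] =~ /^\@/
                && $rec[2] =~ /^\+/
                && length($rec[1]) == length($rec[3])) {
            $ok = 0;
            last;
        }
    }
    close $fh;
    return $ok;
}
```

A caller could use the fast four-line path when this returns 1 and fall back to a general parser otherwise; the check costs one sequential pass over the file.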
Jason On Nov 1, 2011, at 7:38 AM, Peter Cock wrote: > On Tue, Nov 1, 2011 at 1:40 PM, Fields, Christopher J > wrote: >> >> One problem the various Bio* indexers have currently is the lack of >> standardization on a specific schema for indexing. There are in-roads >> towards this (OBDA) that haven't been adequately traveled IMHO, >> which need to be taken up again. >> > > Something to switch to open-bio-l at lists.open-bio.org for, > http://lists.open-bio.org/mailman/listinfo/open-bio-l > > We can continue this thread from last summer, > http://lists.open-bio.org/pipermail/open-bio-l/2010-April/000662.html > http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000676.html > ... > http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000680.html > > And CC Peter Rice from EMBOSS too - we chatted about this > at ISMB/BOSC 2011 in July - and whomever looks after the > OBDA/indexing code in BioRuby and BioJava too. > >> A second, and maybe this is more specific to BioPerl, is that the >> parsers and indexers essentially reimplement the format parsing >> in each module, so if there are bugs they have to be independently >> fixed (hence why SeqIO works and the indexer doesn't; I wrote the >> first but not the second). The best place for any optimizations >> would be in a unified parser that both the SeqIO and indexer >> modules could use. > > We have that problem to an extent in Biopython's Bio.SeqIO code. > The indexing code duplicates some logic of the parsing code > (how much depends on the format), sufficient to extract the read > ID and the bounds on disk. The two could be more unified but > the parsers came first and didn't want to change them at the time. > Instead I tried to be rigorous in consistency testing for the index > code's unit tests. 
> > Regards, > > Peter > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Nov 1 13:44:25 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 1 Nov 2011 17:44:25 +0000 Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual In-Reply-To: <47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com> References: <26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu> <47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com> Message-ID: <6EFDCA41-D385-4CCB-A17B-0736F93DACDF@illinois.edu> On Nov 1, 2011, at 11:02 AM, Jason Stajich wrote: > I think a different indexer is needed for the scale of key/value pairs we see in fastq files if we want to make a fast lookup by ID. I think speed is of essence for this type of solution and so a forced all records must be 4 lines long is okay for this type of implementation. This can always be an early optimization, that's easy enough. But I'm sure we will have to deal with multi-line seq/qual FASTQ at some point. > I found NOSQL implementations to be much better performance and than any of the BDB type solutions -- they end up being really slow at above 1-5M keys. I used TokyoCabinet and KyotoCabinet to do indexing of accession -> taxonomy ID and found it quite fast for the needs. I haven't tried storing 100bp reads + qual string as the value in it yet but I think it could be done, certainly worth a prototype. Adding a middle layer where the backend storage is abstracted is the probably the (best|most flexible) option, converging on a good default that will work for this data. The actual interface is in place, though would it be more feasible to go the OBDA (converge on a cross-Bio* compatible schema)? Or are there problems afoot there we're unaware of? Re: specifics, I think Biopython uses SQLite, is that correct Peter? 
chris > Jason > On Nov 1, 2011, at 7:38 AM, Peter Cock wrote: > >> On Tue, Nov 1, 2011 at 1:40 PM, Fields, Christopher J >> wrote: >>> >>> One problem the various Bio* indexers have currently is the lack of >>> standardization on a specific schema for indexing. There are in-roads >>> towards this (OBDA) that haven't been adequately traveled IMHO, >>> which need to be taken up again. >>> >> >> Something to switch to open-bio-l at lists.open-bio.org for, >> http://lists.open-bio.org/mailman/listinfo/open-bio-l >> >> We can continue this thread from last summer, >> http://lists.open-bio.org/pipermail/open-bio-l/2010-April/000662.html >> http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000676.html >> ... >> http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000680.html >> >> And CC Peter Rice from EMBOSS too - we chatted about this >> at ISMB/BOSC 2011 in July - and whomever looks after the >> OBDA/indexing code in BioRuby and BioJava too. >> >>> A second, and maybe this is more specific to BioPerl, is that the >>> parsers and indexers essentially reimplement the format parsing >>> in each module, so if there are bugs they have to be independently >>> fixed (hence why SeqIO works and the indexer doesn't; I wrote the >>> first but not the second). The best place for any optimizations >>> would be in a unified parser that both the SeqIO and indexer >>> modules could use. >> >> We have that problem to an extent in Biopython's Bio.SeqIO code. >> The indexing code duplicates some logic of the parsing code >> (how much depends on the format), sufficient to extract the read >> ID and the bounds on disk. The two could be more unified but >> the parsers came first and didn't want to change them at the time. >> Instead I tried to be rigorous in consistency testing for the index >> code's unit tests. 
>> >> Regards, >> >> Peter >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From p.j.a.cock at googlemail.com Tue Nov 1 14:06:50 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 1 Nov 2011 18:06:50 +0000 Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual In-Reply-To: <6EFDCA41-D385-4CCB-A17B-0736F93DACDF@illinois.edu> References: <26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu> <47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com> <6EFDCA41-D385-4CCB-A17B-0736F93DACDF@illinois.edu> Message-ID: On Tue, Nov 1, 2011 at 5:44 PM, Fields, Christopher J wrote: > On Nov 1, 2011, at 11:02 AM, Jason Stajich wrote: > >> I think a different indexer is needed for the scale of key/value >> pairs we see in fastq files if we want to make a fast lookup by >> ID. I think speed is of essence for this type of solution and so >> a forced all records must be 4 lines long is okay for this type >> of implementation. > > This can always be an early optimization, that's easy enough. > But I'm sure we will have to deal with multi-line seq/qual > FASTQ at some point. > >> I found NOSQL implementations to be much better >> performance and than any of the BDB type solutions -- they >> end up being really slow at above 1-5M keys. ?I used >> TokyoCabinet and KyotoCabinet to do indexing of accession >> -> taxonomy ID and found it quite fast for the needs. I >> haven't tried storing 100bp reads + qual string as the >> value in it yet but I think it could be done, certainly worth >> a prototype. > > Adding a middle layer where the backend storage is abstracted > is the probably the (best|most flexible) option, converging on a > good default that will work for this data. ?The actual interface is > in place, though would it be more feasible to go the OBDA > (converge on a cross-Bio* compatible schema)? ?Or are there > problems afoot there we're unaware of? 
> > Re: specifics, I think Biopython uses SQLite, is that correct Peter? > > chris Yes, we're using SQLite3 to store essentially a list of filenames and their format as one table, and then in the main table an entry for each sequence recording the ID (only one accession, unlike OBDA which had infrastructure for a secondary accession), file number, offset of the start of the record, and optionally the length of the record on disk. i.e. Basically what OBDA does, but using SQLite rather than BDB (not included in Python 3) or a flat file index (poor performance with large datasets). I find this design attractive on several levels: * File format neutral, covers FASTA, FASTQ, GenBank, etc * Preserves the original file untouched * Index is a small single file (thanks to SQLite) * Back end could be switched out * Could be applied to compressed file formats * Reuses existing parsing code to access entries This could easily form basis of OBDA v2, the main points of difference I anticipate between the Bio* projects would be naming conventions for the different file formats, and what we consider to be the default record ID of each read (e.g. which field in a GenBank file - although agreement here is not essential). Some of that was already settled in principle with OBDA v1. On the other hand, you could try and store the parsed data itself, which is where NOSQL looks more interesting. That essentially requires the ability to serialise your annotated sequence object model to disk - which would be tricky to do cross project (much more ambitious than BioSQL is). It also means the "index" becomes very large because it now holds all the original data. Peter From wenbinmei at gmail.com Wed Nov 2 00:25:32 2011 From: wenbinmei at gmail.com (wenbin mei) Date: Wed, 2 Nov 2011 00:25:32 -0400 Subject: [Bioperl-l] how to not count gaps in the multiple sequence alignment Message-ID: Hi, I need some help in coding. I have a multiple sequence alignment which has gaps. 
And also I have a reference genome sequence in the alignment which I know all the coordinates for the protein coding genes. I want to extract all these protein coding genes alignment from the big alignment. I am using Bio SimpleAlign but the question is that due to the gaps in the alignment, the coordinates has shifted in the alignment. I wonder is there a way I can not count the gaps and still be able to extract the protein alignment. One way I can do is remove the gaps in the reference first and then extract the sequence. But I don't like this way ... Thank you for help. -best, wenbin From dejian.zhao at gmail.com Wed Nov 2 09:33:18 2011 From: dejian.zhao at gmail.com (Dejian Zhao) Date: Wed, 02 Nov 2011 21:33:18 +0800 Subject: [Bioperl-l] Modules to read MEGA output and reproduce the phylogenetic tree Message-ID: <4EB1469E.4050108@gmail.com> There are various packages on CPAN to cope with phylogenetic analysis. I wonder which module can read the output from other phylogenetic softwares, e.g. MEGA, and reproduce the phylogenetic tree. I want to produce a picture which combines the phylogenetic tree and the structure of each gene. From roy.chaudhuri at gmail.com Wed Nov 2 09:49:46 2011 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Wed, 02 Nov 2011 13:49:46 +0000 Subject: [Bioperl-l] Modules to read MEGA output and reproduce the phylogenetic tree In-Reply-To: <4EB1469E.4050108@gmail.com> References: <4EB1469E.4050108@gmail.com> Message-ID: <4EB14A7A.30307@gmail.com> MEGA can export trees in Newick format, which can be read by Bio::TreeIO. The tree can be drawn in EPS format using Bio::Tree::Draw::Cladogram. See: http://www.bioperl.org/wiki/HOWTO:Trees Roy. On 02/11/2011 13:33, Dejian Zhao wrote: > There are various packages on CPAN to cope with phylogenetic analysis. I > wonder which module can read the output from other phylogenetic > softwares, e.g. MEGA, and reproduce the phylogenetic tree. 
I want to > produce a picture which combines the phylogenetic tree and the structure > of each gene. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jun.yin at ucd.ie Wed Nov 2 12:29:45 2011 From: jun.yin at ucd.ie (Jun Yin) Date: Wed, 02 Nov 2011 16:29:45 +0000 (GMT) Subject: [Bioperl-l] how to not count gaps in the multiple sequence alignment In-Reply-To: References: Message-ID: <7300ecdd1dd56.4eb16ff9@ucd.ie> Hi, You need to calculate the coordinates of the protein coding gene in the alignment by yourself. After that, you can use the slice function to get the alignment block for the selected gene, e.g. $aln2 = $aln->slice(20, 30); Cheers, Jun ----- Original Message ----- From: wenbin mei Date: Wednesday, November 2, 2011 5:51 am Subject: [Bioperl-l] how to not count gaps in the multiple sequence alignment To: bioperl-l at lists.open-bio.org > Hi, > > I need some help in coding. I have a multiple sequence alignment > which has > gaps. And also I have a reference genome sequence in the > alignment which I > know all the coordinates for the protein coding genes. I want to > extractall these protein coding genes alignment from the big > alignment. I am using > Bio SimpleAlign but the question is that due to the gaps in the > alignment,the coordinates has shifted in the alignment. I wonder > is there a way I can > not count the gaps and still be able to extract the protein > alignment. One > way I can do is remove the gaps in the reference first and then > extract the > sequence. But I don't like this way ... Thank you for help. 
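The coordinate calculation suggested in the reply above can be sketched in plain Perl, with no BioPerl calls. This is only an illustration under stated assumptions: the helper name column_for_residue, the toy reference row and the gene coordinates are all made up, and gaps are assumed to be written as '-' or '.'.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): map a 1-based position in the
# ungapped reference to the 1-based alignment column holding that residue.
sub column_for_residue {
    my ($gapped_ref, $residue_pos) = @_;
    my $seen = 0;    # residues counted so far, gaps excluded
    for my $col (1 .. length($gapped_ref)) {
        my $char = substr($gapped_ref, $col - 1, 1);
        next if $char eq '-' || $char eq '.';    # gap columns are not counted
        return $col if ++$seen == $residue_pos;
    }
    return;    # position lies beyond the end of the reference
}

# Made-up gapped reference row and gene coordinates on the ungapped reference.
my $ref = 'AT--GCC-ATGA';
my ($gene_start, $gene_end) = (3, 7);

my $col_start = column_for_residue($ref, $gene_start);
my $col_end   = column_for_residue($ref, $gene_end);
print "columns $col_start-$col_end\n";    # prints "columns 5-10"
```

The two columns could then be passed to the slice call shown in the reply, e.g. $aln->slice($col_start, $col_end). Bio::SimpleAlign also documents a column_from_residue_number method that may make the hand-written loop unnecessary; check the documentation of your installed version before relying on it.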
> > -best, > wenbin > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From dejian.zhao at gmail.com Wed Nov 2 21:39:22 2011 From: dejian.zhao at gmail.com (Dejian Zhao) Date: Thu, 03 Nov 2011 09:39:22 +0800 Subject: [Bioperl-l] Modules to read MEGA output and reproduce the phylogenetic tree In-Reply-To: <4EB14A7A.30307@gmail.com> References: <4EB1469E.4050108@gmail.com> <4EB14A7A.30307@gmail.com> Message-ID: <4EB1F0CA.80309@gmail.com> That's great! Many thanks, Roy. On 2011-11-2 21:49, Roy Chaudhuri wrote: > MEGA can export trees in Newick format, which can be read by > Bio::TreeIO. The tree can be drawn in EPS format using > Bio::Tree::Draw::Cladogram. See: > http://www.bioperl.org/wiki/HOWTO:Trees > > Roy. > > On 02/11/2011 13:33, Dejian Zhao wrote: >> There are various packages on CPAN to cope with phylogenetic analysis. I >> wonder which module can read the output from other phylogenetic >> softwares, e.g. MEGA, and reproduce the phylogenetic tree. I want to >> produce a picture which combines the phylogenetic tree and the structure >> of each gene. 
>> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From noncoding at gmail.com Thu Nov 3 05:59:26 2011 From: noncoding at gmail.com (Remo Sanges) Date: Thu, 03 Nov 2011 10:59:26 +0100 Subject: [Bioperl-l] how to not count gaps in the multiple sequence alignment In-Reply-To: <7300ecdd1dd56.4eb16ff9@ucd.ie> References: <7300ecdd1dd56.4eb16ff9@ucd.ie> Message-ID: <4EB265FE.30909@gmail.com> To get the location in the initial sequence starting from a column in a multiple alignment you can: 1) create a Bio::LocatableSeq compliant object by using the method each_seq_with_id on the SimpleAlign object 2) then using the method location_from_column on the created LocatableSeq object HTH ERemo -- Remo Sanges Bioinformatics - Animal Physiology and Evolution Stazione Zoologica Anton Dohrn Villa Comunale, 80121 Napoli - Italy +39 081 5833428 On 11/2/11 5:29 PM, Jun Yin wrote: > Hi, > > You need to calculate the coordinates of the protein coding gene in the alignment by yourself. After that, you can use the slice function to get the alignment block for the selected gene, e.g. > > $aln2 = $aln->slice(20, 30); > > Cheers, > Jun > > > ----- Original Message ----- > From: wenbin mei > Date: Wednesday, November 2, 2011 5:51 am > Subject: [Bioperl-l] how to not count gaps in the multiple sequence alignment > To: bioperl-l at lists.open-bio.org > >> Hi, >> >> I need some help in coding. I have a multiple sequence alignment >> which has >> gaps. And also I have a reference genome sequence in the >> alignment which I >> know all the coordinates for the protein coding genes. I want to >> extractall these protein coding genes alignment from the big >> alignment. I am using >> Bio SimpleAlign but the question is that due to the gaps in the >> alignment,the coordinates has shifted in the alignment. 
I wonder >> is there a way I can >> not count the gaps and still be able to extract the protein >> alignment. One >> way I can do is remove the gaps in the reference first and then >> extract the >> sequence. But I don't like this way ... Thank you for help. >> >> -best, >> wenbin >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From G.Gallone at sms.ed.ac.uk Thu Nov 3 07:50:11 2011 From: G.Gallone at sms.ed.ac.uk (Giuseppe G.) Date: Thu, 03 Nov 2011 11:50:11 +0000 Subject: [Bioperl-l] Bio::SimpleAlign - Meaning of overall_percentage_identity? Message-ID: <4EB27FF3.9050203@sms.ed.ac.uk> Hi, I would be grateful if you could shed some light on the exact meaning of the method overall_percentage_identity in Bio::SimpleAlign. If I understand correctly, the method works by considering only aminoacids that are identical over all the members of the alignment, and then averaging over the total number of aminoacids in the sequence. Is this correct? Thank you Giuseppe -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From David.Messina at sbc.su.se Thu Nov 3 09:22:21 2011 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 3 Nov 2011 14:22:21 +0100 Subject: [Bioperl-l] Bio::SimpleAlign - Meaning of overall_percentage_identity? In-Reply-To: <4EB27FF3.9050203@sms.ed.ac.uk> References: <4EB27FF3.9050203@sms.ed.ac.uk> Message-ID: Hi Giuseppe, If I understand correctly, the method works by considering only aminoacids > that are identical over all the members of the alignment Yes. > , and then averaging over the total number of aminoacids in the sequence. > Is this correct? > Almost. 
By default, the denominator is the alignment length, namely the length of the MSA including gaps. By means of the 'short' and 'long' options, it's also possible to use the shortest or longest sequence's ungapped lengths as the denominator. Dave From cjfields at illinois.edu Thu Nov 3 14:28:36 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 3 Nov 2011 18:28:36 +0000 Subject: [Bioperl-l] OBDA redux? was Re: Bio::Index::Fastq '@' in qual In-Reply-To: References: <26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu> <47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com> <6EFDCA41-D385-4CCB-A17B-0736F93DACDF@illinois.edu> Message-ID: (side thread, so re-titling...) On Nov 1, 2011, at 1:06 PM, Peter Cock wrote: > On Tue, Nov 1, 2011 at 5:44 PM, Fields, Christopher J > wrote: >> On Nov 1, 2011, at 11:02 AM, Jason Stajich wrote: >> >>> I think a different indexer is needed for the scale of key/value >>> pairs we see in fastq files if we want to make a fast lookup by >>> ID. I think speed is of essence for this type of solution and so >>> a forced all records must be 4 lines long is okay for this type >>> of implementation. >> >> This can always be an early optimization, that's easy enough. >> But I'm sure we will have to deal with multi-line seq/qual >> FASTQ at some point. >> >>> I found NOSQL implementations to be much better >>> performance and than any of the BDB type solutions -- they >>> end up being really slow at above 1-5M keys. I used >>> TokyoCabinet and KyotoCabinet to do indexing of accession >>> -> taxonomy ID and found it quite fast for the needs. I >>> haven't tried storing 100bp reads + qual string as the >>> value in it yet but I think it could be done, certainly worth >>> a prototype. >> >> Adding a middle layer where the backend storage is abstracted >> is the probably the (best|most flexible) option, converging on a >> good default that will work for this data. 
The actual interface is >> in place, though would it be more feasible to go the OBDA >> (converge on a cross-Bio* compatible schema)? Or are there >> problems afoot there we're unaware of? >> >> Re: specifics, I think Biopython uses SQLite, is that correct Peter? >> >> chris > > Yes, we're using SQLite3 to store essentially a list of filenames > and their format as one table, and then in the main table an > entry for each sequence recording the ID (only one accession, > unlike OBDA which had infrastructure for a secondary accession), > file number, offset of the start of the record, and optionally the > length of the record on disk. > > i.e. Basically what OBDA does, but using SQLite rather > than BDB (not included in Python 3) or a flat file index > (poor performance with large datasets). > > I find this design attractive on several levels: > * File format neutral, covers FASTA, FASTQ, GenBank, etc > * Preserves the original file untouched > * Index is a small single file (thanks to SQLite) > * Back end could be switched out > * Could be applied to compressed file formats > * Reuses existing parsing code to access entries > > This could easily form basis of OBDA v2, the main points > of difference I anticipate between the Bio* projects would > be naming conventions for the different file formats, and > what we consider to be the default record ID of each read > (e.g. which field in a GenBank file - although agreement > here is not essential). Some of that was already settled in > principle with OBDA v1. The primary/secondary IDs could be configurable with a sane default, I think the bioperl implementations allowed this (and it is certainly something that will be requested). > On the other hand, you could try and store the parsed data > itself, which is where NOSQL looks more interesting. 
That > essentially requires the ability to serialise your annotated > sequence object model to disk - which would be tricky to do > cross project (much more ambitious than BioSQL is). It also > means the "index" becomes very large because it now holds > all the original data. > > Peter For a fully cross-Bio* compliant format, I don't think it's feasible to use serialized data unless they are serialized in something that is easily deserialized across HLLs (JSON, BSON, YAML, XML, etc). Either that, or such data is stored concurrently with the binary blob, along with meta data that indicates the source of the blob, parser, version, etc, etc (unless there are tools out there that reliably interconvert serialized complex data structures between HLLs). Anyway you go about it, it seems like it could be a major ball of hurt, unless implemented very carefully. Aside: I think this was one of the problems with Bio::DB::SeqFeature::Store, in that it at one point stored Perl-specific Storable blobs. chris From p.j.a.cock at googlemail.com Thu Nov 3 14:52:50 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 3 Nov 2011 18:52:50 +0000 Subject: [Bioperl-l] OBDA redux? Message-ID: On Thu, Nov 3, 2011 at 6:28 PM, Fields, Christopher J wrote: > (side thread, so re-titling...) > And CC'ing open-bio-l, which is a better home for this than bioperl-l, where OBDA v2 talk came up again in discussion of a BioPerl indexing problem. 
Archive links for thread here: http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035807.html http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035808.html http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035811.html http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035812.html http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035813.html http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035822.html > On Nov 1, 2011, at 1:06 PM, Peter Cock wrote: >> >> Yes, we're using SQLite3 to store essentially a list of filenames >> and their format as one table, and then in the main table an >> entry for each sequence recording the ID (only one accession, >> unlike OBDA which had infrastructure for a secondary accession), >> file number, offset of the start of the record, and optionally the >> length of the record on disk. >> >> i.e. Basically what OBDA does, but using SQLite rather >> than BDB (not included in Python 3) or a flat file index >> (poor performance with large datasets). >> >> I find this design attractive on several levels: >> * File format neutral, covers FASTA, FASTQ, GenBank, etc >> * Preserves the original file untouched >> * Index is a small single file (thanks to SQLite) >> * Back end could be switched out >> * Could be applied to compressed file formats >> * Reuses existing parsing code to access entries >> >> This could easily form basis of OBDA v2, the main points >> of difference I anticipate between the Bio* projects would >> be naming conventions for the different file formats, and >> what we consider to be the default record ID of each read >> (e.g. which field in a GenBank file - although agreement >> here is not essential). Some of that was already settled in >> principle with OBDA v1. > > The primary/secondary IDs could be configurable with a sane > default, I think the bioperl implementations allowed this (and > it is certainly something that will be requested). 
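The index layout quoted above (record ID, offset of the record start, length on disk) can be sketched in core Perl. This is not how Bio::Index::Fastq or Biopython implement it: the function names are hypothetical, an in-memory hash stands in for the SQLite table, only one file is indexed, and every record is assumed to be exactly four lines, as suggested earlier in the thread. Under that assumption a quality line that starts with '@' is never mistaken for a header.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: build an ID => [offset, length] index for one FASTQ
# file, assuming strictly four lines per record.
sub build_fastq_index {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    my %index;
    while (1) {
        my $offset = tell($fh);          # byte offset where this record starts
        my $header = <$fh>;
        last unless defined $header;     # clean end of file
        my $length = length($header);
        for (1 .. 3) {                   # sequence, '+' line, quality line
            my $line = <$fh>;
            die "Truncated record at byte $offset in $path\n"
                unless defined $line;
            $length += length($line);
        }
        my ($id) = $header =~ /^\@(\S+)/
            or die "No '\@' header at byte $offset in $path\n";
        $index{$id} = [$offset, $length];
    }
    close($fh);
    return \%index;
}

# Hypothetical helper: fetch the raw text of one record using the index,
# leaving the original file untouched.
sub fetch_fastq_record {
    my ($path, $index, $id) = @_;
    my $entry = $index->{$id} or return;
    my ($offset, $length) = @$entry;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    seek($fh, $offset, 0) or die "Cannot seek in $path: $!";
    my $record = '';
    read($fh, $record, $length);
    close($fh);
    return $record;
}

# Demo on a made-up two-record file; the first quality line starts with '@'.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode($tmp_fh);
print {$tmp_fh} "\@read1\nACGT\n+\n\@IIH\n\@read2\nTTGA\n+\nIIII\n";
close($tmp_fh);

my $index = build_fastq_index($tmp_path);
print fetch_fastq_record($tmp_path, $index, 'read2');
```

Keeping only offsets and lengths is what lets the index stay small and format neutral; the fetched text would still be handed to the normal parser to build a sequence object.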
One reason I went with a single ID only was to keep the Python dictionary based API simple (think hash in Perl). You don't get secondary keys in a Python dict or a hash ;) As a nod to flexibility, in Biopython's Bio.SeqIO indexing you can provide a call back function to map the suggested ID to something else. Obviously this doesn't give the full flexibility of extracting a field from the record's annotation because we don't parse the whole record during indexing (it would be too slow). However, I'm happy for there to be an *optional* secondary key in an OBDA v2 SQLite schema, but Biopython probably won't populate it. We could optionally use it rather than the primary ID on loading an existing index though. Personally I would stick with one key in the index - it should be faster and makes it simpler to switch the back end if we need to later. If anyone wants a second key, they can build a second index *grin*. >> On the other hand, you could try and store the parsed data >> itself, which is where NOSQL looks more interesting. That >> essentially requires the ability to serialise your annotated >> sequence object model to disk - which would be tricky to do >> cross project (much more ambitious than BioSQL is). It also >> means the "index" becomes very large because it now holds >> all the original data. >> >> Peter > > For a fully cross-Bio* compliant format, I don't think it's feasible > to use serialized data unless they are serialized in something > that is easily deserialized across HLLs (JSON, BSON, YAML, > XML, etc). Either that, or such data is stored concurrently with > the binary blob, along with meta data that indicates the source > of the blob, parser, version, etc, etc (unless there are tools out > there that reliably interconvert serialized complex data structures > between HLLs). Anyway you go about it, it seems like it could > be a major ball of hurt, unless implemented very carefully. 
You missed out RDF as a serialisation ;) But yes, going down the shared serialisation route is going to be messy - as you are well aware: > Aside: I think this was one of the problems with > Bio::DB::SeqFeature::Store, in that it at one point stored > Perl-specific Storable blobs. > > chris Peter From cjfields at illinois.edu Thu Nov 3 15:47:51 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 3 Nov 2011 19:47:51 +0000 Subject: [Bioperl-l] OBDA redux? In-Reply-To: References: Message-ID: On Nov 3, 2011, at 1:52 PM, Peter Cock wrote: > On Thu, Nov 3, 2011 at 6:28 PM, Fields, Christopher J > wrote: >> (side thread, so re-titling...) >> > And CC'ing open-bio-l, which is a better home for this than bioperl-l, > where OBDA v2 talk came up again in discussion of a BioPerl indexing > problem. Archive links for thread here: > > http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035807.html > http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035808.html > http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035811.html > http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035812.html > http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035813.html > http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035822.html yes, good idea... >> On Nov 1, 2011, at 1:06 PM, Peter Cock wrote: >>> >>> Yes, we're using SQLite3 to store essentially a list of filenames >>> and their format as one table, and then in the main table an >>> entry for each sequence recording the ID (only one accession, >>> unlike OBDA which had infrastructure for a secondary accession), >>> file number, offset of the start of the record, and optionally the >>> length of the record on disk. >>> >>> i.e. Basically what OBDA does, but using SQLite rather >>> than BDB (not included in Python 3) or a flat file index >>> (poor performance with large datasets). 
>>> >>> I find this design attractive on several levels: >>> * File format neutral, covers FASTA, FASTQ, GenBank, etc >>> * Preserves the original file untouched >>> * Index is a small single file (thanks to SQLite) >>> * Back end could be switched out >>> * Could be applied to compressed file formats >>> * Reuses existing parsing code to access entries >>> >>> This could easily form basis of OBDA v2, the main points >>> of difference I anticipate between the Bio* projects would >>> be naming conventions for the different file formats, and >>> what we consider to be the default record ID of each read >>> (e.g. which field in a GenBank file - although agreement >>> here is not essential). Some of that was already settled in >>> principle with OBDA v1. >> >> The primary/secondary IDs could be configurable with a sane >> default, I think the bioperl implementations allowed this (and >> it is certainly something that will be requested). > > One reason I went with a single ID only was to keep the > Python dictionary based API simple (think hash in Perl). > You don't get secondary keys in a Python dict or a hash ;) > > As a nod to flexibility, in Biopython's Bio.SeqIO indexing you > can provide a call back function to map the suggested ID to > something else. Obviously this doesn't give the full flexibility > of extracting a field from the record's annotation because we > don't parse the whole record during indexing (it would be too > slow). Same with bioperl. > However, I'm happy for there to be an *optional* secondary > key in an OBDA v2 SQLite schema, but Biopython probably > won't populate it. We could optionally use it rather than the > primary ID on loading an existing index though. Optional implementation of that is fine by me. > Personally I would stick with one key in the index - it should > be faster and makes it simpler to switch the back end if we > need to later. If anyone wants a second key, they can build > a second index *grin*. That's easy enough. 
>>> On the other hand, you could try and store the parsed data >>> itself, which is where NOSQL looks more interesting. That >>> essentially requires the ability to serialise your annotated >>> sequence object model to disk - which would be tricky to do >>> cross project (much more ambitious than BioSQL is). It also >>> means the "index" becomes very large because it now holds >>> all the original data. >>> >>> Peter >> >> For a fully cross-Bio* compliant format, I don't think it's feasible >> to use serialized data unless they are serialized in something >> that is easily deserialized across HLLs (JSON, BSON, YAML, >> XML, etc). Either that, or such data is stored concurrently with >> the binary blob, along with meta data that indicates the source >> of the blob, parser, version, etc, etc (unless there are tools out >> there that reliably interconvert serialized complex data structures >> between HLLs). Anyway you go about it, it seems like it could >> be a major ball of hurt, unless implemented very carefully. > > You missed out RDF as a serialisation ;) > > But yes, going down the shared serialisation route is going > to be messy - as you are well aware: > >> Aside: I think this was one of the problems with >> Bio::DB::SeqFeature::Store, in that it at one point stored >> Perl-specific Storable blobs. >> >> chris > > Peter yes, it's a problem w/o an easy solution. Anyway, I think an implementation of such at this point would be a premature optimization. chris From biojiangke at gmail.com Tue Nov 8 17:29:54 2011 From: biojiangke at gmail.com (vitis) Date: Tue, 8 Nov 2011 14:29:54 -0800 (PST) Subject: [Bioperl-l] Some questions about the Bio::PopGen In-Reply-To: References: Message-ID: <32805996.post@talk.nabble.com> I think the pi calculated in the function isn't really the pi as defined. You need to divide the value by total number of sites (in your case, it's 5, which is not your individual number but sequence length). 
I think the reason they implemented this way is that sometimes it's easier to work only with variable sites. The aln to population function converts an aln object to a population object. You can't really see the object unless you write additional codes to write it out or do some calculations on it. The third question depends on your specific needs. For population level analyses of molecular evolution, I usually create a multiple sequence alignment with other applications (clustalw etc), then manually adjust the alignments to make sure they represent homology. I wouldn't touch the alignment once this is done but only make an aln (or whatever format you want) for inputting to analyses applications, like Bio::PopGen (usually use the aln_to_population function you're using now). Qian Zhao wrote: > > Hi > Recently, I am learning how to caculate pi, Fst, Tajima D using > Bio::PopGen. > I am not familiar with Perl and I am really confused with the following > problems. > (1) I use the Bio::PopGen::Statistics to caculate pi. The sequences I used > to caculate is this: > __DATA__ > 01 A01 A > 01 A02 A > 01 A03 A > 01 A04 A > 01 A05 A > 02 A01 A > 02 A02 T > 02 A03 T > 02 A04 T > 02 A05 T > 03 A01 G > 03 A02 G > 03 A03 G > 03 A04 G > 03 A05 G > 04 A01 G > 04 A02 G > 04 A03 C > 04 A04 C > 04 A05 G > 05 A01 T > 05 A02 C > 05 A03 T > 05 A04 T > 05 A05 T > And I am not sure if I can use these sequences below to demostrate the > prettybase format above: >>A01 > AAGGT >>A02 > ATGGC >>A03 > ATGCT >>A04 > ATGCT >>A05 > ATGGT > The pi is 1.4 using Bio::PopGen::Statistics. However, the pi is 0.28 if I > use DnaSP. I find that if the 1.4/5=0.28, which means that if the number > from Bio::PopGen::Statistics is divided by the individula number, the > result > would be exactly the same. Is there something wrong in my perl script? 
The > code I used was below: > #/usr/bin/perl -w > use warnings; > use strict; > use Bio::PopGen::Genotype; > my $genotype = Bio::PopGen::Genotype->new(-marker_name => 'gene_1', > -individual_id => '001', > -alleles => ['1','5'] ); > use Bio::PopGen::Individual; > my $ind = Bio::PopGen::Individual->new(-unique_id => '001', > -genotypes => [$genotype] ); > $ind->add_Genotype( > Bio::PopGen::Genotype->new(-alleles => ['1', '5'], > -marker_name => 'gene_1') > ); > $ind->add_Genotype( > Bio::PopGen::Genotype->new(-alleles => ['1', '5'], > -marker_name => 'gene_1') > ); > $ind->add_Genotype( > Bio::PopGen::Genotype->new(-alleles => ['1', '5'], > -marker_name => 'gene_1') > ); > $ind->add_Genotype( > Bio::PopGen::Genotype->new(-alleles => ['1', '5'], > -marker_name => 'gene_1') > ); > use Bio::PopGen::Population; > my $pop = Bio::PopGen::Population->new(-name => 'Bm', > -description => 'description', > -individuals => [$ind] ); > use Bio::PopGen::IO; > use Bio::PopGen::Statistics; > my $nummarkers = $pop->get_marker_names; > my $stats = Bio::PopGen::Statistics->new(); > my $io = Bio::PopGen::IO->new (-format => 'prettybase', > -file => '1.txt'); > if( my $pop = $io->next_population ) { > my $pi = $stats->pi($pop, $nummarkers); > print "pi is $pi\n"; > my @inds; > for my $ind ( $pop->get_Individuals ) { > if( $ind->unique_id =~ /A0[1-3]/ ) { > push @inds, $ind; > } > } > print "pi for inds 1,2,3 is ", $stats->pi(\@inds),"\n"; > } > > (2) I want to use Bio::PopGen::Utilities to translate the alignment file > to > the population file. However, I can not find the result file after the > program. 
I use the following code: > use Bio::PopGen::Utilities; > use Bio::AlignIO; > > my $in = Bio::AlignIO->new(-file => 't/data/t7.aln', > -format => 'clustalw'); > my $aln = $in->next_aln; > my $pop = Bio::PopGen::Utilities->aln_to_population(-alignment => $aln); > my $synpop = Bio::PopGen::Utilities->aln_to_population(-site_model => > 'cod', > -alignment => > $aln); > I am not sure where I should add my result file' name in the code. > (3) If my file contains a lot of individual sequences and one individual > has > one genotype. I'd like to know how can I use the Bio::PopGen::Individual, > Bio::PopGen::Population and Bio::PopGen::Genotype to create the file which > can used in Bio::PopGen::Statistics ? > > I will be great appreciated if I can get the answers. Thanks for your time > and Best Wishes! > Qian > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://old.nabble.com/Some-questions-about-the-Bio%3A%3APopGen-tp31378987p32805996.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From biojiangke at gmail.com Tue Nov 8 17:51:22 2011 From: biojiangke at gmail.com (vitis) Date: Tue, 8 Nov 2011 14:51:22 -0800 (PST) Subject: [Bioperl-l] questions about the bioperl module Bio::PopGen::Statistics In-Reply-To: <201106012030039537050@gmail.com> References: <201106012030039537050@gmail.com> Message-ID: <32805997.post@talk.nabble.com> If you read the Bio::PopGen doc, you'll see there is an optional argument for the function that calculates pi, which is taking the number of sites into consideration. Also, when you use the aln_to_population function to input an alignment, you can use the option to take in all sites, including the monomorphic sites. I think if you implement both in your script, you'll get the same pi value as from other applications like DnaSP. 
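A small worked example of that normalisation, in plain Perl rather than Bio::PopGen, using the five sequences from the earlier question in this digest. The helper name pairwise_differences is hypothetical, and the sequences are assumed to be equal-length and gap-free.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::PopGen): average number of pairwise
# differences between equal-length, gap-free sequences.
sub pairwise_differences {
    my @seqs  = @_;
    my $n     = scalar @seqs;
    my $pairs = $n * ($n - 1) / 2;
    my $total = 0;
    for my $i (0 .. $n - 2) {
        for my $j ($i + 1 .. $n - 1) {
            # Count the positions at which this pair of sequences differs.
            for my $k (0 .. length($seqs[$i]) - 1) {
                $total++
                    if substr($seqs[$i], $k, 1) ne substr($seqs[$j], $k, 1);
            }
        }
    }
    return $total / $pairs;
}

# Sequences A01..A05 from the prettybase example.
my @seqs = qw(AAGGT ATGGC ATGCT ATGCT ATGGT);

my $pi_total    = pairwise_differences(@seqs);       # summed over all sites
my $pi_per_site = $pi_total / length($seqs[0]);      # divided by 5 sites

printf "pi (all sites summed) = %.2f\n", $pi_total;     # prints 1.40
printf "pi (per site)         = %.2f\n", $pi_per_site;  # prints 0.28
```

There are 14 differences across the 10 pairs, giving 1.4; dividing by the sequence length of 5 gives 0.28, which matches the two figures reported for Bio::PopGen::Statistics and DnaSP.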
In terms of sliding window analyses, you may have to implement your own method to move along the windows, but I think DnaSP is ready to do that, so you don't have to write your own script. lvu.jun wrote: > > Hi, there, > I am trying to calculate the population genetics parameters such as pi > using the bioperl module Bio::PopGen::Statistics. But I found that the > method only requires the input of the marker genotype of every individuals > for the population. I don't know why the module does not take the DNA > sequence length into consideration when calculating the pi value. > According to the definition of the pi value, besides the polymorphic > sites, we also need the monomorphic sites that should be incorporated in > the denominator when doing the calculation. Is it right? therefore I'm > confused about the module, who can tell me why it can correctly calculate > the pi value only with the marker(polymorphic) genotype? > Another question, if I want to calculate the pi value using the sliding > window along the genome, how can I do this using the > Bio::PopGen::Statistics module? > Thanks for your help! > Yours sincerely, > Jun > > Chinese Academy of Sciences > > 2011-06-01 > > > > lvu.jun > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://old.nabble.com/questions-about-the-bioperl-module-Bio%3A%3APopGen%3A%3AStatistics-tp31749977p32805997.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From shachigahoimbi at gmail.com Wed Nov 9 00:22:33 2011 From: shachigahoimbi at gmail.com (Shachi Gahoi) Date: Wed, 9 Nov 2011 10:52:33 +0530 Subject: [Bioperl-l] Run FGENESH using bioperl Message-ID: Dear All. I have a multi-FASTA sequence file and I would like to run FGENESH on each sequence stored in it, one by one. Is this possible using BioPerl?
Please guide me. Thanks in advance. -- Regards, Shachi From pankajt322 at gmail.com Thu Nov 3 08:12:44 2011 From: pankajt322 at gmail.com (pankaj) Date: Thu, 3 Nov 2011 05:12:44 -0700 (PDT) Subject: [Bioperl-l] extract ORF ID from fasta file using bioperl In-Reply-To: References: Message-ID: On Oct 21, 1:59?am, Shachi Gahoi wrote: > Dear all, > > I have fasta format sequence file and I want to extract ORF ID "PITG_14194" > from fasta file and then I want to rename same file with that ORF ID > "PITG_14194". > > I have many files and I want to do same exercise with all sequence files. > > Please tell me how can i do this with perl or bioperl. > > >tr|D0NNU7|D0NNU7_PHYIT Carbohydrate esterase, putative OS=Phytophthora > > infestans (strain T30-4) GN=PITG_14194 PE=4 SV=1 > MVKLSIVSSTMQSLLAPLLRVWTDPERRRKFLRWLFGGTSGAIALLLILEATRGFCRTPL > ETAQLLAGISWTLCKITVQFVARGFKPKFAKWTLRYELLHGLMRTAATMFGERIVDLQHA > RVIRHHTGMFGTVLGSFARWQNEMRLESVRLNGLEHIWLKSSTCTTETKSERKRLVVLFF > HGGGYAVLSPRMYISFCSAVAGAIRQQLASDDVDVDVFLANYRKLPEHKFPVPAEDAVAM > YEYLLQHEKLEPSQIILAGDSAGGGLVMSTLLRVRDGLSSWKSKLPLPLAAIVMCPLADL > TWDEDEIAGQHCVLPLNMTAASVLTYHPTRDDPSTWADASPVHCNLQGLPPVFLQSATLD > RLFQHSVRLAAKAKADGLVNWEVDIHEGVPHVFMVIPAYVLPYARVGVGRMAAFAAKQFR > NGIAVDHKGVICNGKAPIEIAVDENTLSAAA > > -- > Regards, > Shachi > _______________________________________________ > Bioperl-l mailing list > Bioper... at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l From azaballos at isciii.es Wed Nov 9 06:28:21 2011 From: azaballos at isciii.es (Angel Zaballos) Date: Wed, 9 Nov 2011 12:28:21 +0100 Subject: [Bioperl-l] bp_genbank2gff.pl bug Message-ID: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es> Running bp_genbank2gff.pl got this: [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession AAXT01000001.1 > babesichr3.gff Replacement list is longer than search list at /usr/share/perl5/Bio/Range.pm line 251. 
?ngel Zaballos Unidad de Gen?mica Centro Nacional de Microbiolog?a-ISCIII Carretera Majadahonda-Pozuelo, Km 2,2 28220-Majadahonda Tel: 918223994 mail: azaballos at isciii.es ************************* AVISO LEGAL ************************* Este mensaje electr?nico est? dirigido exclusivamente a sus destinatarios, pudiendo contener documentos anexos de car?cter privado y confidencial. Si por error, ha recibido este mensaje y no se encuentra entre los destinatarios, por favor, no use, informe, distribuya, imprima o copie su contenido por ning?n medio. Le rogamos lo comunique al remitente y borre completamente el mensaje y sus anexos. El Instituto de Salud Carlos III no asume ning?n tipo de responsabilidad legal por el contenido de este mensaje cuando no responda a las funciones atribuidas al remitente del mismo por la normativa vigente. From scott at scottcain.net Wed Nov 9 11:12:02 2011 From: scott at scottcain.net (Scott Cain) Date: Wed, 9 Nov 2011 11:12:02 -0500 Subject: [Bioperl-l] bp_genbank2gff.pl bug In-Reply-To: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es> References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es> Message-ID: Hi Angel, I would suggest using bp_genbank2gff3.pl, as it is more actively maintained; the bp_genbank2gff.pl script hasn't really been touched in many years, and I imagine it's suffering from some serious code rot. Scott 2011/11/9 Angel Zaballos > Running bp_genbank2gff.pl got this: > > [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession > AAXT01000001.1 > babesichr3.gff > Replacement list is longer than search list at > /usr/share/perl5/Bio/Range.pm line 251. > > > > ?ngel Zaballos > Unidad de Gen?mica > Centro Nacional de Microbiolog?a-ISCIII > Carretera Majadahonda-Pozuelo, Km 2,2 > 28220-Majadahonda > > Tel: 918223994 > mail: azaballos at isciii.es > > > > > ************************* AVISO LEGAL ************************* > Este mensaje electr?nico est? 
dirigido exclusivamente a sus destinatarios, > pudiendo contener documentos anexos de car?cter privado y confidencial. > Si por error, ha recibido este mensaje y no se encuentra entre los > destinatarios, por favor, no use, informe, distribuya, imprima o copie su > contenido por ning?n medio. Le rogamos lo comunique al remitente y borre > completamente el mensaje y sus anexos. El Instituto de Salud Carlos III no > asume ning?n tipo de responsabilidad legal por el contenido de este mensaje > cuando no responda a las funciones atribuidas al remitente del mismo por la > normativa vigente. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From carandraug+dev at gmail.com Wed Nov 9 11:13:10 2011 From: carandraug+dev at gmail.com (=?ISO-8859-1?Q?Carn=EB_Draug?=) Date: Wed, 9 Nov 2011 16:13:10 +0000 Subject: [Bioperl-l] extract ORF ID from fasta file using bioperl In-Reply-To: References: Message-ID: On 3 November 2011 12:12, pankaj wrote: > > > On Oct 21, 1:59?am, Shachi Gahoi wrote: >> Dear all, >> >> I have fasta format sequence file and I want to extract ORF ID "PITG_14194" >> from fasta file and then I want to rename same file with that ORF ID >> "PITG_14194". >> >> I have many files and I want to do same exercise with all sequence files. >> >> Please tell me how can i do this with perl or bioperl. 
>> >> >tr|D0NNU7|D0NNU7_PHYIT Carbohydrate esterase, putative OS=Phytophthora >> >> infestans (strain T30-4) GN=PITG_14194 PE=4 SV=1 >> MVKLSIVSSTMQSLLAPLLRVWTDPERRRKFLRWLFGGTSGAIALLLILEATRGFCRTPL >> ETAQLLAGISWTLCKITVQFVARGFKPKFAKWTLRYELLHGLMRTAATMFGERIVDLQHA >> RVIRHHTGMFGTVLGSFARWQNEMRLESVRLNGLEHIWLKSSTCTTETKSERKRLVVLFF >> HGGGYAVLSPRMYISFCSAVAGAIRQQLASDDVDVDVFLANYRKLPEHKFPVPAEDAVAM >> YEYLLQHEKLEPSQIILAGDSAGGGLVMSTLLRVRDGLSSWKSKLPLPLAAIVMCPLADL >> TWDEDEIAGQHCVLPLNMTAASVLTYHPTRDDPSTWADASPVHCNLQGLPPVFLQSATLD >> RLFQHSVRLAAKAKADGLVNWEVDIHEGVPHVFMVIPAYVLPYARVGVGRMAAFAAKQFR >> NGIAVDHKGVICNGKAPIEIAVDENTLSAAA >> ---------- Forwarded message ---------- From: Jason Stajich Date: 21 October 2011 10:56 Subject: Re: [Bioperl-l] extract ORF ID from fasta file using bioperl To: Shachi Gahoi Cc: bioperl-l at bioperl.org easy to do this with a simple regular expression and opening a new file. Have you read up on this concept in Perl. You can use SeqIO to parse FASTA files - did you read the HOWTO and website documentation first? We don't typically do people's work for them on this mailing list so please show some effort first. From scott at scottcain.net Wed Nov 9 13:43:00 2011 From: scott at scottcain.net (Scott Cain) Date: Wed, 9 Nov 2011 13:43:00 -0500 Subject: [Bioperl-l] bp_genbank2gff.pl bug In-Reply-To: <0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu> References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es> <0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu> Message-ID: Hi Chris, Actually, removing it from the distribution (but letting it remain in the code repository) is not a bad idea. I can't really think of a down side. Scott 2011/11/9 Fields, Christopher J > Scott, > > Do we want to add that caveat to the bp_genbank2gff.pl documentation (or > remove it altogether)? 
> > chris > > On Nov 9, 2011, at 10:12 AM, Scott Cain wrote: > > > Hi Angel, > > > > I would suggest using bp_genbank2gff3.pl, as it is more actively > > maintained; the bp_genbank2gff.pl script hasn't really been touched in > many > > years, and I imagine it's suffering from some serious code rot. > > > > Scott > > > > > > 2011/11/9 Angel Zaballos > > > >> Running bp_genbank2gff.pl got this: > >> > >> [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession > >> AAXT01000001.1 > babesichr3.gff > >> Replacement list is longer than search list at > >> /usr/share/perl5/Bio/Range.pm line 251. > >> > >> > >> > >> ?ngel Zaballos > >> Unidad de Gen?mica > >> Centro Nacional de Microbiolog?a-ISCIII > >> Carretera Majadahonda-Pozuelo, Km 2,2 > >> 28220-Majadahonda > >> > >> Tel: 918223994 > >> mail: azaballos at isciii.es > >> > >> > >> > >> > >> ************************* AVISO LEGAL ************************* > >> Este mensaje electr?nico est? dirigido exclusivamente a sus > destinatarios, > >> pudiendo contener documentos anexos de car?cter privado y confidencial. > >> Si por error, ha recibido este mensaje y no se encuentra entre los > >> destinatarios, por favor, no use, informe, distribuya, imprima o copie > su > >> contenido por ning?n medio. Le rogamos lo comunique al remitente y borre > >> completamente el mensaje y sus anexos. El Instituto de Salud Carlos III > no > >> asume ning?n tipo de responsabilidad legal por el contenido de este > mensaje > >> cuando no responda a las funciones atribuidas al remitente del mismo > por la > >> normativa vigente. > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > > > > > > > -- > > ------------------------------------------------------------------------ > > Scott Cain, Ph. D. 
scott at scottcain > dot > > net > > GMOD Coordinator (http://gmod.org/) 216-392-3087 > > Ontario Institute for Cancer Research > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Wed Nov 9 13:39:52 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 9 Nov 2011 18:39:52 +0000 Subject: [Bioperl-l] bp_genbank2gff.pl bug In-Reply-To: References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es> Message-ID: <0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu> Scott, Do we want to add that caveat to the bp_genbank2gff.pl documentation (or remove it altogether)? chris On Nov 9, 2011, at 10:12 AM, Scott Cain wrote: > Hi Angel, > > I would suggest using bp_genbank2gff3.pl, as it is more actively > maintained; the bp_genbank2gff.pl script hasn't really been touched in many > years, and I imagine it's suffering from some serious code rot. > > Scott > > > 2011/11/9 Angel Zaballos > >> Running bp_genbank2gff.pl got this: >> >> [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession >> AAXT01000001.1 > babesichr3.gff >> Replacement list is longer than search list at >> /usr/share/perl5/Bio/Range.pm line 251. >> >> >> >> ?ngel Zaballos >> Unidad de Gen?mica >> Centro Nacional de Microbiolog?a-ISCIII >> Carretera Majadahonda-Pozuelo, Km 2,2 >> 28220-Majadahonda >> >> Tel: 918223994 >> mail: azaballos at isciii.es >> >> >> >> >> ************************* AVISO LEGAL ************************* >> Este mensaje electr?nico est? dirigido exclusivamente a sus destinatarios, >> pudiendo contener documentos anexos de car?cter privado y confidencial. 
>> Si por error, ha recibido este mensaje y no se encuentra entre los >> destinatarios, por favor, no use, informe, distribuya, imprima o copie su >> contenido por ning?n medio. Le rogamos lo comunique al remitente y borre >> completamente el mensaje y sus anexos. El Instituto de Salud Carlos III no >> asume ning?n tipo de responsabilidad legal por el contenido de este mensaje >> cuando no responda a las funciones atribuidas al remitente del mismo por la >> normativa vigente. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot > net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Nov 9 14:51:48 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 9 Nov 2011 19:51:48 +0000 Subject: [Bioperl-l] bp_genbank2gff.pl bug In-Reply-To: References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es> <0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu> Message-ID: Scott, It would remain in the repo history if it is removed, otherwise we can probably set up an 'unmaintained' folder. Either would prevent it from being packaged and installed in future versions. (Speaking of, we should discuss (w/ Lincoln) about possible splitting out Bio::DB::SeqFeature/GFF and related code/tests/etc into it's own distribution, in line with slimming down core modules) chris On Nov 9, 2011, at 12:43 PM, Scott Cain wrote: > Hi Chris, > > Actually, removing it from the distribution (but letting it remain in the code repository) is not a bad idea. I can't really think of a down side. 
> > Scott > > > 2011/11/9 Fields, Christopher J > Scott, > > Do we want to add that caveat to the bp_genbank2gff.pl documentation (or remove it altogether)? > > chris > > On Nov 9, 2011, at 10:12 AM, Scott Cain wrote: > > > Hi Angel, > > > > I would suggest using bp_genbank2gff3.pl, as it is more actively > > maintained; the bp_genbank2gff.pl script hasn't really been touched in many > > years, and I imagine it's suffering from some serious code rot. > > > > Scott > > > > > > 2011/11/9 Angel Zaballos > > > >> Running bp_genbank2gff.pl got this: > >> > >> [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession > >> AAXT01000001.1 > babesichr3.gff > >> Replacement list is longer than search list at > >> /usr/share/perl5/Bio/Range.pm line 251. > >> > >> > >> > >> ?ngel Zaballos > >> Unidad de Gen?mica > >> Centro Nacional de Microbiolog?a-ISCIII > >> Carretera Majadahonda-Pozuelo, Km 2,2 > >> 28220-Majadahonda > >> > >> Tel: 918223994 > >> mail: azaballos at isciii.es > >> > >> > >> > >> > >> ************************* AVISO LEGAL ************************* > >> Este mensaje electr?nico est? dirigido exclusivamente a sus destinatarios, > >> pudiendo contener documentos anexos de car?cter privado y confidencial. > >> Si por error, ha recibido este mensaje y no se encuentra entre los > >> destinatarios, por favor, no use, informe, distribuya, imprima o copie su > >> contenido por ning?n medio. Le rogamos lo comunique al remitente y borre > >> completamente el mensaje y sus anexos. El Instituto de Salud Carlos III no > >> asume ning?n tipo de responsabilidad legal por el contenido de este mensaje > >> cuando no responda a las funciones atribuidas al remitente del mismo por la > >> normativa vigente. 
> >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > > > > > > > -- > > ------------------------------------------------------------------------ > > Scott Cain, Ph. D. scott at scottcain dot > > net > > GMOD Coordinator (http://gmod.org/) 216-392-3087 > > Ontario Institute for Cancer Research > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research From carandraug+dev at gmail.com Wed Nov 9 15:39:17 2011 From: carandraug+dev at gmail.com (=?ISO-8859-1?Q?Carn=EB_Draug?=) Date: Wed, 9 Nov 2011 20:39:17 +0000 Subject: [Bioperl-l] bp_genbank2gff.pl bug In-Reply-To: References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es> <0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu> Message-ID: On 9 November 2011 18:43, Scott Cain wrote: > Hi Chris, > > Actually, removing it from the distribution (but letting it remain in the > code repository) is not a bad idea. I can't really think of a down side. > > Scott Hi, can I suggest instead simply making the script issue a warning right at the start? Something like "bp_genbank2gff is obsolete and will be removed from a future version of bioperl; please use bp_genbank2gff3 instead". You could leave it there for the next 2 releases and then finally remove it. This would have 2 advantages: 1) people who have been using it will immediately know what to use as a replacement (instead of coming to ask on the mailing list);
2) people who use it without knowing anything about the subject, because someone told them to "just press this button" or "just type this in the terminal", won't suddenly have a broken system and will have time to find someone who can make it work again. That's what's done in GNU Octave and I think it works well there. Carnë From scott at scottcain.net Wed Nov 9 15:48:07 2011 From: scott at scottcain.net (Scott Cain) Date: Wed, 9 Nov 2011 15:48:07 -0500 Subject: [Bioperl-l] bp_genbank2gff.pl bug In-Reply-To: References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es> <0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu> Message-ID: Hi Carnë, You are absolutely correct; that is the right way to do it. I'll add that right now (and if the fix for the original poster's problem is an easy one, I'll do that too :-) Scott 2011/11/9 Carnë Draug > On 9 November 2011 18:43, Scott Cain wrote: > > Hi Chris, > > > > Actually, removing it from the distribution (but letting it remain in the > > code repository) is not a bad idea. I can't really think of a down side. > > > > Scott > > Hi > > can I suggest instead to simply make the script issue a warning right > at the start? Something like "bp_genbank2gff is obsolete and will be > removed from a future version of bioperl; please use bp_genbank2gff3 > instead". You could leave it there for the next 2 releases and then > finally remove it. This would have 2 advantages: > > 1) people that have been using it will immediately know what to use as > replacement (instead of coming and ask in the mailing list) > 2) people who use it but don't know anything about the subject, > someone told them to "just press this button" or "just type this in > the terminal", won't have suddenly a broken system and will have time > to find someone that will make it work again. > > That's what's done in GNU octave and I think it works good there. > Carnë > -- ------------------------------------------------------------------------ Scott Cain, Ph. D.
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Wed Nov 9 16:59:48 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 9 Nov 2011 21:59:48 +0000 Subject: [Bioperl-l] bp_genbank2gff.pl bug In-Reply-To: References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es> <0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu> Message-ID: Works for me, it's a standard deprecation policy. The only caveat is that the next 'release' of the code would be when the related code is split out into it's own distribution (which will require it's own versioning). chris On Nov 9, 2011, at 2:48 PM, Scott Cain wrote: > Hi Carn?, > > You are absolutely correct; that is the right way to do it. I'll add that right now (and if the original posts fix is an easy one, I'll fix that too :-) > > Scott > > > 2011/11/9 Carn? Draug > On 9 November 2011 18:43, Scott Cain wrote: > > Hi Chris, > > > > Actually, removing it from the distribution (but letting it remain in the > > code repository) is not a bad idea. I can't really think of a down side. > > > > Scott > > Hi > > can I suggest instead to simply make the script issue a warning right > at the start? Something like "bp_genbank2gff is obsolete and will be > removed from a future version of bioerl; please use bp_genbank2gff3 > instead". You could leave it there for the next 2 releases and then > finally remove it. This would have 2 advantages: > > 1) people that have been using it will immediately know what to use as > replacement (instead of coming and ask in the mailing list)? > 2) people who use it but don't know anything about the subject, > someone told them to "just press this button" or "just type this in > the terminal", won't have suddenly a broken system and will have time > to find someone that will make it work again. > > That's what's done in GNU octave and I think it works good there. > Carn? 
> > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research From biopython at maubp.freeserve.co.uk Thu Nov 10 08:09:40 2011 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 10 Nov 2011 13:09:40 +0000 Subject: [Bioperl-l] [Bioperl-announce-l] Null Pointer - NCBI C++ Exception In-Reply-To: <31659982.post@talk.nabble.com> References: <31659982.post@talk.nabble.com> Message-ID: Re: http://lists.open-bio.org/pipermail/bioperl-announce-l/2011-May/000253.html On Thu, May 19, 2011 at 11:15 PM, rgoldade wrote: > > I received the following error while trying to run bl2seq from > standaloneblastplus. Has anyone else encountered this problem? > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: /usr/bin/blastp call crashed: There was a problem running > /usr/bin/blastp : Error: NCBI C++ Exception: > > "/am/ncbiapdata/release/blast/src/2.2.24/Linux32-Suse-icc/c++/ICC1010-ReleaseMT--Linux32-Suse-icc/../src/corelib/ncbiobj.cpp", > line 688: Critical: ncbi::CObject::ThrowNullPointerException() - Attempt to > access NULL pointer. > > Thank you, > Ryan Just hit something very very similar, looks like a BLAST+ bug which I will report now: $ tblastn -out NC_003197_vs_NC_011294.tsv -outfmt 6 -query NC_003197.fna -evalue 0.0001 -subject NC_011294.fna Error: NCBI C++ Exception: "/am/ncbiapdata/release/blast/src/2.2.25/IntelMAC-universal/c++/GCC401-ReleaseMT--IntelMAC-universal/../src/corelib/ncbiobj.cpp", line 689: Critical: ncbi::CObject::ThrowNullPointerException() - Attempt to access NULL pointer. This was on a Mac using BLAST 2.2.25+ (tblastn) whereas yours was BLAST 2.2.24+ (blastp) from the look of the error. The line number has changed by one, but I'm confident it is the same point of failure. 
In my case I was comparing nucleotide against nucleotide, so should have been using tblastx not tblastn, but it still shouldn't have had a pointer exception. Peter From cjfields at illinois.edu Thu Nov 10 09:00:46 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 10 Nov 2011 14:00:46 +0000 Subject: [Bioperl-l] [Bioperl-announce-l] Null Pointer - NCBI C++ Exception In-Reply-To: References: <31659982.post@talk.nabble.com> Message-ID: On Nov 10, 2011, at 7:09 AM, Peter wrote: > Re: http://lists.open-bio.org/pipermail/bioperl-announce-l/2011-May/000253.html > > On Thu, May 19, 2011 at 11:15 PM, rgoldade wrote: >> >> I received the following error while trying to run bl2seq from >> standaloneblastplus. Has anyone else encountered this problem? >> >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: /usr/bin/blastp call crashed: There was a problem running >> /usr/bin/blastp : Error: NCBI C++ Exception: >> >> "/am/ncbiapdata/release/blast/src/2.2.24/Linux32-Suse-icc/c++/ICC1010-ReleaseMT--Linux32-Suse-icc/../src/corelib/ncbiobj.cpp", >> line 688: Critical: ncbi::CObject::ThrowNullPointerException() - Attempt to >> access NULL pointer. >> >> Thank you, >> Ryan > > Just hit something very very similar, looks like a BLAST+ bug which I > will report now: > > $ tblastn -out NC_003197_vs_NC_011294.tsv -outfmt 6 -query > NC_003197.fna -evalue 0.0001 -subject NC_011294.fna > Error: NCBI C++ Exception: > "/am/ncbiapdata/release/blast/src/2.2.25/IntelMAC-universal/c++/GCC401-ReleaseMT--IntelMAC-universal/../src/corelib/ncbiobj.cpp", > line 689: Critical: ncbi::CObject::ThrowNullPointerException() - > Attempt to access NULL pointer. > > This was on a Mac using BLAST 2.2.25+ (tblastn) whereas yours was > BLAST 2.2.24+ (blastp) from the look of the error. The line number has > changed by one, but I'm confident it is the same point of failure. 
> > In my case I was comparing nucleotide against nucleotide, so should > have been using tblastx not tblastn, but it still shouldn't have had a > pointer exception. > > Peter Yeah, that's bad. I have seen a few things like this myself that make me worry about the transition to BLAST+. chris PS - Odd I didn't see this one, was it caught in the bioperl-announce filter? From casaburi at ceinge.unina.it Thu Nov 10 07:29:55 2011 From: casaburi at ceinge.unina.it (Giorgio C) Date: Thu, 10 Nov 2011 04:29:55 -0800 (PST) Subject: [Bioperl-l] Scripting help to identify adaptors count in reads Message-ID: <32818254.post@talk.nabble.com> Hi everybody, i have some reads (454) where there are adaptors (NNNN...), one,two or three adaptors for each reads depending on the reads. Is there any way to establish how many reads have 1 adaptors, how many 2 and how many 3 adaptors over the total ??? >271-88 GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCAGGTGCCTACG >272-88 GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCANNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTTGCCAGCCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGC >273-88 GCCTCCCTCGCGCATCAGATCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCA >274-88 GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTCCCTCGCGCCATCAGATCGTNNNNNNNNNNNNNNNNNNTCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCATCAGATCGTAGGCACCATCAA The problem is that some adpators occur in the middle of the sequences because they coming out from a concameration experimental design (they are miRNAs between NNNNNN...). So i want to know a script or tool that may say how many reads have 1 adapt, how many 2, (max are 4) in respect to the total number of reads. Do you know any tool/script that may help ? Tnx Can anyone suggests me a script to fix this ??? Thank you very much -- View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32818254.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
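A minimal sketch of one way to get the counts asked for above, assuming each adaptor shows up in a read as one uninterrupted run of N. The subroutine names count_adaptor_runs and tally_adaptor_runs and the shortened toy reads are hypothetical, not taken from any existing tool.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count the runs of N in a single read (assumption: one run = one adaptor).
sub count_adaptor_runs {
    my ($sequence) = @_;
    # A list assignment evaluated in scalar context yields the number of matches.
    my $count = () = $sequence =~ /N+/g;
    return $count;
}

# Tally how many reads contain 0, 1, 2, ... adaptor runs.
# Takes a reference to a list of sequence strings, returns a hash reference
# mapping "number of runs" to "number of reads".
sub tally_adaptor_runs {
    my ($reads) = @_;
    my %frequency;
    $frequency{ count_adaptor_runs($_) }++ for @$reads;
    return \%frequency;
}

# Hypothetical toy reads, much shorter than real 454 reads.
my @reads = (
    'GATTGATNNNNNNNNATCAGGTGCC',          # 1 run
    'GATNNNNNATCANNNNNNCTGATNNNNCTG',     # 3 runs
    'GCCTCCCTCGCGCATCAGATCGTAGG',         # 0 runs
    'GATTGATNNNNCTGATGGCGNNNNNNTCGTAGG',  # 2 runs
);

my $frequency = tally_adaptor_runs(\@reads);
my $total     = scalar @reads;
for my $class (sort { $a <=> $b } keys %$frequency) {
    printf "%d adaptor(s): %d of %d reads\n", $class, $frequency->{$class}, $total;
}
```

For a real FASTA file the sequences could be read with Bio::SeqIO and passed to the same subroutines; the hard-coded list only keeps the sketch self-contained.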
From jovel_juan at hotmail.com Thu Nov 10 11:06:16 2011 From: jovel_juan at hotmail.com (Juan Jovel) Date: Thu, 10 Nov 2011 16:06:16 +0000 Subject: [Bioperl-l] Scripting help to identify adaptors count in reads In-Reply-To: <32818254.post@talk.nabble.com> References: <32818254.post@talk.nabble.com> Message-ID:

There are many ways to do it. Perhaps the simplest is to count the number of times the adapter sequence (or part of it) appears in each read. For example:

$adapter_matches = tr/adapter_sequence/adapter_sequence/;
# $adapter_matches will store the number of times the adapter sequence is repeated.

You then place that result in a hash bin:

my %adapter_frequency;
my $class = "$adapter_matches";
if (exists $adapter_frequency{$class}) {
    $adapter_frequency{$class}++;
} else {
    $adapter_frequency{$class} = 1;
}

# Then you can sort and output your classes
foreach $class (sort keys %adapter_frequency) {
    print "$class\t$adapter_frequency{$class}\n";
}

You can work out the details, but something like this should work.

> Date: Thu, 10 Nov 2011 04:29:55 -0800 > From: casaburi at ceinge.unina.it > To: Bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Scripting help to identify adaptors count in reads > > > Hi everybody, > > i have some reads (454) where there are adaptors (NNNN...), one,two or three > adaptors for each reads depending on the reads. Is there any way to > establish how many reads have 1 adaptors, how many 2 and how many 3 adaptors > over the total ???
> > >271-88 > GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCAGGTGCCTACG > >272-88 > GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCANNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTTGCCAGCCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGC > >273-88 > GCCTCCCTCGCGCATCAGATCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCA > >274-88 > GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTCCCTCGCGCCATCAGATCGTNNNNNNNNNNNNNNNNNNTCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCATCAGATCGTAGGCACCATCAA > > The problem is that some adpators occur in the middle of the sequences > because they coming out from a concameration experimental design (they are > miRNAs between NNNNNN...). So i want to know a script or tool that may say > how many reads have 1 adapt, how many 2, (max are 4) in respect to the total > number of reads. Do you know any tool/script that may help ? Tnx > Can anyone suggests me a script to fix this ??? > > Thank you very much > -- > View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32818254.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From scott at scottcain.net Thu Nov 10 11:55:53 2011 From: scott at scottcain.net (Scott Cain) Date: Thu, 10 Nov 2011 11:55:53 -0500 Subject: [Bioperl-l] bp_genbank2gff.pl bug In-Reply-To: <0CACB816-21E2-494E-ACEF-67F9FBB5AC41@isciii.es> References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es> <0CACB816-21E2-494E-ACEF-67F9FBB5AC41@isciii.es> Message-ID: Hi Angel, Please keep correspondence on the mailing list. I just ran bp_genbank2gff.pl with a genbank file (fruit fly mitocontria), and it worked fine. I suspect there is something wrong with your genbank file. Scott On Thu, Nov 10, 2011 at 3:15 AM, Angel Zaballos wrote: > His Scott, > > Thanks everyone for your help. 
I tried bp_genbank2gff3.pl, but the same > happened: > > [root at localhost zaballos]# bp_genbank2gff3.pl babesiaChr3.gbk > > babesichr3_2.gff > Replacement list is longer than search list at > /usr/share/perl5/Bio/Range.pm line 251. > UNIVERSAL->import is deprecated and will be removed in a future perl at > /usr/share/perl5/Bio/Tree/TreeFunctionsI.pm line 94 > > However, the output file seems to be correct (Indeed, that was also the > case for bp_genbank2gff.pl). I then ran ldHgGene and worked: > > [zaballos at localhost ~]$ ./ldHgGene -out=babesiaChr3_2.gpe db tab > babesiachr3_2.gff > Reading babesiachr3_2.gff > Read 4776 transcripts in 8821 lines in 1 files > 4776 groups 1 seqs 1 sources 6 feature types > 2379 gene predictions > > I'm using Fedora (for bioperl) and CentOS (for ldHgGene), virtualized on a > Mac with Parallels. Maybe tis is the cause for such a message. > > Regards > > > ?ngel > > > El 09/11/2011, a las 17:12, Scott Cain escribi?: > > Hi Angel, > > I would suggest using bp_genbank2gff3.pl, as it is more actively > maintained; the bp_genbank2gff.pl script hasn't really been touched in > many years, and I imagine it's suffering from some serious code rot. > > Scott > > > 2011/11/9 Angel Zaballos > >> Running bp_genbank2gff.pl got this: >> >> [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession >> AAXT01000001.1 > babesichr3.gff >> Replacement list is longer than search list at >> /usr/share/perl5/Bio/Range.pm line 251. >> >> >> >> ?ngel Zaballos >> Unidad de Gen?mica >> Centro Nacional de Microbiolog?a-ISCIII >> Carretera Majadahonda-Pozuelo, Km 2,2 >> 28220-Majadahonda >> >> Tel: 918223994 >> mail: azaballos at isciii.es >> >> >> >> >> ************************* AVISO LEGAL ************************* >> Este mensaje electr?nico est? dirigido exclusivamente a sus destinatarios, >> pudiendo contener documentos anexos de car?cter privado y confidencial. 
>> Si por error, ha recibido este mensaje y no se encuentra entre los >> destinatarios, por favor, no use, informe, distribuya, imprima o copie su >> contenido por ning?n medio. Le rogamos lo comunique al remitente y borre >> completamente el mensaje y sus anexos. El Instituto de Salud Carlos III no >> asume ning?n tipo de responsabilidad legal por el contenido de este >> mensaje >> cuando no responda a las funciones atribuidas al remitente del mismo por >> la >> normativa vigente. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain > dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > > ?ngel Zaballos > Unidad de Gen?mica > Centro Nacional de Microbiolog?a-ISCIII > Carretera Majadahonda-Pozuelo, Km 2,2 > 28220-Majadahonda > > Tel: 918223994 > mail: azaballos at isciii.es > > > > ************************* AVISO LEGAL ************************* > Este mensaje electr?nico est? dirigido exclusivamente a sus destinatarios, > pudiendo contener documentos anexos de car?cter privado y confidencial. > Si por error, ha recibido este mensaje y no se encuentra entre los > destinatarios, por favor, no use, informe, distribuya, imprima o copie su > contenido por ning?n medio. Le rogamos lo comunique al remitente y borre > completamente el mensaje y sus anexos. El Instituto de Salud Carlos III no > asume ning?n tipo de responsabilidad legal por el contenido de este mensaje > cuando no responda a las funciones atribuidas al remitente del mismo por la > normativa vigente. > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

From l.m.timmermans at students.uu.nl Thu Nov 10 12:17:12 2011
From: l.m.timmermans at students.uu.nl (L.M. Timmermans)
Date: Thu, 10 Nov 2011 18:17:12 +0100
Subject: [Bioperl-l] Scripting help to identify adaptors count in reads
In-Reply-To:
References: <32818254.post@talk.nabble.com>
Message-ID:

On Thu, Nov 10, 2011 at 5:06 PM, Juan Jovel wrote:

> There are many ways to do it.
> Perhaps the simplest is to count the number of times the adapter sequence
> (or part of it) appears in each read.
> For example:
> $adapter_matches = tr/adapter_sequence/adapter_sequence/;
> # $adapter_matches will store the number of times the adapter sequence is
> # repeated.

No, it will not. tr/// will count characters, not sequences. Something like
"scalar (() = $sequence =~ m/(N+)/g)" should work OTOH.

Leon

From cjfields at illinois.edu Thu Nov 10 14:17:57 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 10 Nov 2011 19:17:57 +0000
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To:
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es> <0CACB816-21E2-494E-ACEF-67F9FBB5AC41@isciii.es>
Message-ID: <66F13EAF-0DAA-45E0-AB5B-E71EC8FA2323@illinois.edu>

This is running using an older version of bioperl (probably 1.6.0 or 1.6.1). The warnings pop up when using perl v5.12 or v5.14; the first warning is from a bad tr/// in Bio::Range, the second is from bad usage of UNIVERSAL functions. Both have been addressed.

chris

On Nov 10, 2011, at 10:55 AM, Scott Cain wrote:

> Hi Angel,
>
> Please keep correspondence on the mailing list.
>
> I just ran bp_genbank2gff.pl with a genbank file (fruit fly mitochondria),
> and it worked fine. I suspect there is something wrong with your genbank
> file.
>
> Scott
>
> On Thu, Nov 10, 2011 at 3:15 AM, Angel Zaballos wrote:
>
>> Hi Scott,
>>
>> Thanks everyone for your help.
From biopython at maubp.freeserve.co.uk Thu Nov 10 14:27:22 2011
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 10 Nov 2011 19:27:22 +0000
Subject: [Bioperl-l] [Bioperl-announce-l] Null Pointer - NCBI C++ Exception
In-Reply-To:
References: <31659982.post@talk.nabble.com>
Message-ID:

On Thu, Nov 10, 2011 at 2:00 PM, Fields, Christopher J wrote:
> On Nov 10, 2011, at 7:09 AM, Peter wrote:
>
>> Re: http://lists.open-bio.org/pipermail/bioperl-announce-l/2011-May/000253.html
>>
>> On Thu, May 19, 2011 at 11:15 PM, rgoldade wrote:
>>>
>>> I received the following error while trying to run bl2seq from
>>> standaloneblastplus. Has anyone else encountered this problem?
>>>
>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>> MSG: /usr/bin/blastp call crashed: There was a problem running
>>> /usr/bin/blastp : Error: NCBI C++ Exception:
>>>
>>> "/am/ncbiapdata/release/blast/src/2.2.24/Linux32-Suse-icc/c++/ICC1010-ReleaseMT--Linux32-Suse-icc/../src/corelib/ncbiobj.cpp",
>>> line 688: Critical: ncbi::CObject::ThrowNullPointerException() - Attempt to
>>> access NULL pointer.
>>>
>>> Thank you,
>>> Ryan
>>
>> Just hit something very very similar, looks like a BLAST+ bug which I
>> will report now:
>>
>> $ tblastn -out NC_003197_vs_NC_011294.tsv -outfmt 6 -query
>> NC_003197.fna -evalue 0.0001 -subject NC_011294.fna
>> Error: NCBI C++ Exception:
>>     "/am/ncbiapdata/release/blast/src/2.2.25/IntelMAC-universal/c++/GCC401-ReleaseMT--IntelMAC-universal/../src/corelib/ncbiobj.cpp",
>> line 689: Critical: ncbi::CObject::ThrowNullPointerException() -
>> Attempt to access NULL pointer.
>>
>> This was on a Mac using BLAST 2.2.25+ (tblastn) whereas yours was
>> BLAST 2.2.24+ (blastp) from the look of the error. The line number has
>> changed by one, but I'm confident it is the same point of failure.
>>
>> In my case I was comparing nucleotide against nucleotide, so should
>> have been using tblastx not tblastn, but it still shouldn't have had a
>> pointer exception.
>>
>> Peter
>
> Yeah, that's bad. I have seen a few things like this myself that make me worry about the transition to BLAST+.
>
> chris

I'm told it is already fixed and will be part of BLAST 2.2.26+, which is good.

> PS - Odd I didn't see this one, was it caught in the bioperl-announce filter?

Maybe once, but it was in the archive and my email account.

Peter

From anna.fr at gmail.com Thu Nov 10 15:01:57 2011
From: anna.fr at gmail.com (Anna Friedlander)
Date: Fri, 11 Nov 2011 09:01:57 +1300
Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi?
Message-ID:

Hi all

Does anyone know if there is a way to get a Taxonomy node and/or
taxonid from a gi number using the flatfile with taxonomy db?

I have blast output that I want to append taxonomic information to. I
have hundreds of thousands of items to do this for, so it's not
practical to use entrez to query the NCBI database.

I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I
think much too large to put into a hash!

This was also discussed in 2009:
http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I
don't think there was a conclusion?

Thanks for your help
Anna Friedlander

From shalabh.sharma7 at gmail.com Thu Nov 10 15:12:09 2011
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Thu, 10 Nov 2011 15:12:09 -0500
Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi?
In-Reply-To:
References:
Message-ID:

Hi Anna,
I think the thread you mentioned was started by me.
At that time I wrote a few scripts to map gi to taxa; after some time I found some other efficient ways also.
But recently 'Miguel Pignatelli' directed me to some Bio-LITE modules that are really helpful.

These are the modules he mentioned; I found them really easy to use and
very efficient.

Bio-LITE-Taxonomy-0.07
Bio-LITE-Taxonomy-NCBI-0.07
Bio-LITE-Taxonomy-NCBI-Gi2taxid-0.04

Cheers
Shalabh

--
Shalabh Sharma
Scientific Computing Professional Associate (Bioinformatics Specialist)
Department of Marine Sciences
University of Georgia
Athens, GA 30602-3636

From cjfields at illinois.edu Thu Nov 10 15:23:14 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 10 Nov 2011 20:23:14 +0000
Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi?
In-Reply-To:
References:
Message-ID: <53AF9ECA-5905-4D14-B7C1-FF4B2F2FA084@illinois.edu>

Yes, these are probably wrappers around the gi2taxid and taxonomy data; bioperl lacks the former, whereas the latter is handled by Bio::DB::Taxonomy (the 'flatfile' option). I did something very similar locally, though I used Bio::DB::Taxonomy for the taxonomy lookups.
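
One way to avoid loading the whole 3.2GB GI->taxid file into a hash is to hash only the GIs that occur in the BLAST report and stream the dump once. The following is a minimal sketch of that idea, not the approach used by the Bio-LITE modules or by Bio::DB::Taxonomy. It assumes a two-column, tab-separated "gi, taxid" dump and tabular BLAST output whose second column holds IDs of the form gi|12345|ref|...|; the function names wanted_gis and gi_to_taxid are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect the GI numbers that actually occur in a BLAST tabular report.
# Assumption: the subject ID is column 2 and looks like gi|12345|ref|...|
sub wanted_gis {
    my ($blast_fh) = @_;
    my %wanted;
    while (my $line = <$blast_fh>) {
        my @cols = split /\t/, $line;
        next unless defined $cols[1];
        $wanted{$1} = 1 if $cols[1] =~ /gi\|(\d+)/;
    }
    return \%wanted;
}

# Stream the "gi <TAB> taxid" dump once, keeping only the wanted GIs,
# so memory use scales with the BLAST report rather than with the dump.
sub gi_to_taxid {
    my ($dump_fh, $wanted) = @_;
    my %taxid_for;
    while (my $line = <$dump_fh>) {
        chomp $line;
        my ($gi, $taxid) = split /\t/, $line;
        next unless defined $taxid && exists $wanted->{$gi};
        $taxid_for{$gi} = $taxid;
    }
    return \%taxid_for;
}

# Tiny in-memory demo with made-up data; real use would open the BLAST
# report and the dump file from disk instead.
my $blast = "q1\tgi|15|ref|X1|\t99.0\nq2\tgi|42|ref|X2|\t87.5\n";
my $dump  = "7\t9606\n15\t562\n42\t4932\n99\t10090\n";
open my $blast_fh, '<', \$blast or die "cannot open in-memory report: $!";
open my $dump_fh,  '<', \$dump  or die "cannot open in-memory dump: $!";

my $taxid_for = gi_to_taxid($dump_fh, wanted_gis($blast_fh));
print "$_\t$taxid_for->{$_}\n" for sort { $a <=> $b } keys %$taxid_for;
```

With hundreds of thousands of hits the hash of wanted GIs stays small, and the resulting taxon IDs can then be handed to whichever taxonomy lookup is preferred, for example the Bio::DB::Taxonomy 'flatfile' option mentioned above.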
chris

From bernd.web at gmail.com Thu Nov 10 15:51:13 2011
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 10 Nov 2011 21:51:13 +0100
Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi?
In-Reply-To:
References:
Message-ID:

Hi Anna,

Jason changed his example script from using hashes to using SQLite:
bp_classify_hits_kingdom - classify BLAST hits by taxonomic kingdom

See
https://github.com/bioperl/bioperl-live/blob/master/scripts/taxa/bp_classify_hits_kingdom.pl

It's an example script that shows how to do the tax to gi mapping for
blast reports.

Bernd

From cjfields at illinois.edu Thu Nov 10 16:13:12 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 10 Nov 2011 21:13:12 +0000
Subject: [Bioperl-l] Scripting help to identify adaptors count in reads
In-Reply-To:
References: <32818254.post@talk.nabble.com>
Message-ID: <9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu>

If the adaptors are masked (e.g. are represented by the N's below) or if you are really confident that the adaptors don't have base mis-calls, why not use split? Maybe with something like 'scalar(split(/N+/, $foo))' or 'scalar(split(/$adaptor/, $foo))'.

tr/// won't work for the reasons Leon mentioned; it's a transliteration of a character mapping, not a pattern match.
'$foo =~ tr/ATGCatgc/TACGtacg/' for instance converts $foo to the complement sequence (it doesn't match the pattern /ATGCatgc/).

chris

On Nov 10, 2011, at 10:06 AM, Juan Jovel wrote:

> There are many ways to do it.
> Perhaps the simplest is to count the number of times the adapter sequence (or part of it) appears in each read.
> For example:
> $adapter_matches = tr/adapter_sequence/adapter_sequence/;
> # $adapter_matches will store the number of times the adapter sequence is repeated.
> You then place that result in a hash bin:
> my %adapter_frequency;
> my $class = "$adapter_matches";
> if (exists $adapter_frequency{$class}) {
>     $adapter_frequency{$class}++
> } else {
>     $adapter_frequency{$class} = 1
> }
> # Then you can sort and output your classes
> foreach $class (sort keys %adapter_frequency) {
>     print "$class\t$adapter_frequency{$class}\n";
> }
>
> You can work out the details, but something like this should work.
>
>> Date: Thu, 10 Nov 2011 04:29:55 -0800
>> From: casaburi at ceinge.unina.it
>> To: Bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] Scripting help to identify adaptors count in reads
>>
>> Hi everybody,
>>
>> i have some reads (454) where there are adaptors (NNNN...), one, two or three
>> adaptors for each read depending on the read. Is there any way to
>> establish how many reads have 1 adaptor, how many 2 and how many 3 adaptors
>> over the total ???
>>
>> >271-88
>> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCAGGTGCCTACG
>> >272-88
>> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCANNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTTGCCAGCCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGC
>> >273-88
>> GCCTCCCTCGCGCATCAGATCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCA
>> >274-88
>> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTCCCTCGCGCCATCAGATCGTNNNNNNNNNNNNNNNNNNTCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCATCAGATCGTAGGCACCATCAA
>>
>> The problem is that some adaptors occur in the middle of the sequences
>> because they come from a concatemerization experimental design (they are
>> miRNAs between NNNNNN...). So i want to know a script or tool that can say
>> how many reads have 1 adaptor, how many 2 (max is 4), with respect to the total
>> number of reads. Do you know any tool/script that may help? Tnx
>> Can anyone suggest a script to fix this ???
>>
>> Thank you very much
>> --
>> View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32818254.html
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From jason.stajich at gmail.com Thu Nov 10 16:15:29 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Thu, 10 Nov 2011 13:15:29 -0800
Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi?
In-Reply-To:
References:
Message-ID: <1C8130A1-A60A-4F16-A077-4E1B592579BC@gmail.com>

Here's another variant of one I wrote which is for my own purposes. The code at the beginning uses a NOSQL solution to store all the ACC -> GI and then a second db to store GI -> TAXONID.

This is the case where I have a file of accession numbers and I want to add to the description line the taxonomy string.

https://github.com/hyphaltip/mobedac-fungi/blob/master/scripts/taxonomy_lookupmissing.pl

That's the first 165 lines, and then lookups are basically what you see on line 195.

Would be good to rewrite that script below to use Tokyo Cabinet or Kyoto Cabinet (is newer implementation, not sure if it is faster?).

One thing that this does is take up a lot of disk space, but you can have tradeoffs between that and which compression scheme you use, which will impact performance of loading.

Jason

From anna.fr at gmail.com Thu Nov 10 20:07:57 2011
From: anna.fr at gmail.com (Anna Friedlander)
Date: Fri, 11 Nov 2011 14:07:57 +1300
Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi?
In-Reply-To: <1C8130A1-A60A-4F16-A077-4E1B592579BC@gmail.com>
References: <1C8130A1-A60A-4F16-A077-4E1B592579BC@gmail.com>
Message-ID:

thanks all for the fast responses. I'll try the bio-lite modules shalabh mentioned

From arun_innovative90 at yahoo.com Fri Nov 11 06:09:46 2011
From: arun_innovative90 at yahoo.com (Arun Kumar)
Date: Fri, 11 Nov 2011 03:09:46 -0800 (PST)
Subject: [Bioperl-l] BIOPERL MATERIAL
Message-ID: <1321009786.42748.YahooMailNeo@web110101.mail.gq1.yahoo.com>

Hi team,

This is Arun Kumar, a bioinformatics student who wishes to master bioperl after reading your documents. If possible, please send me a PDF of the bioperl documentation, as it will be useful for getting familiar with bioperl.
Thanks in advance

Thanks & Regards,
Arunkumar.d

From awitney at sgul.ac.uk Fri Nov 11 08:23:29 2011
From: awitney at sgul.ac.uk (Adam Witney)
Date: Fri, 11 Nov 2011 13:23:29 +0000
Subject: [Bioperl-l] BIOPERL MATERIAL
In-Reply-To: <1321009786.42748.YahooMailNeo@web110101.mail.gq1.yahoo.com>
References: <1321009786.42748.YahooMailNeo@web110101.mail.gq1.yahoo.com>
Message-ID:

All BioPerl documents can be found here:

http://www.bioperl.org/wiki/Main_Page

And a useful place to start would be the HOWTOs:

http://www.bioperl.org/wiki/HOWTOs

regards

adam

From casaburi at ceinge.unina.it Fri Nov 11 07:13:50 2011
From: casaburi at ceinge.unina.it (Giorgio C)
Date: Fri, 11 Nov 2011 04:13:50 -0800 (PST)
Subject: [Bioperl-l] Scripting help to identify adaptors count in reads
In-Reply-To: <9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu>
References: <32818254.post@talk.nabble.com> <9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu>
Message-ID: <32825229.post@talk.nabble.com>

Hi, thank you for your answer!

In the end I tried this script and it seems to work for this purpose:

perl -pe 's/.*?((NNNNNNNNNNNNNNN|NNNNNNNNNNNNNNN)(.*?)(NNNNNNNNNNNNNNN|NNNNNNNNNNNNNNN))/$3/g' Scrivania/orchidea/Fiore/Mydata.fasta > result.txt

--
View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32825229.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
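
For the original counting question (how many reads carry 1, 2, 3... adaptors), the regex-count idiom Leon suggested can be wrapped into a short script. This is only a sketch under the thread's assumption that each adaptor shows up as one run of N's; the helper names count_adaptors and adaptor_histogram and the demo reads are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count adaptor blocks in one read, where an adaptor is a run of N's.
# The "() =" puts the match in list context so it returns every hit;
# assigning that list to a scalar yields the number of hits.
sub count_adaptors {
    my ($seq) = @_;
    my $count = () = $seq =~ /N+/g;
    return $count;
}

# Tally how many reads carry 0, 1, 2, ... adaptors in FASTA text.
# Assumes one sequence line per record, as in the example reads;
# wrapped sequences would need to be joined first.
sub adaptor_histogram {
    my ($fh) = @_;
    my %reads_with;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^>/ || $line !~ /\S/;
        $reads_with{ count_adaptors($line) }++;
    }
    return \%reads_with;
}

# Small in-memory demo with made-up reads.
my $fasta = ">r1\nACGTNNNNNACGT\n>r2\nACNNNGTNNNNAC\n>r3\nACGTACGT\n>r4\nNNNNACGT\n";
open my $fh, '<', \$fasta or die "cannot open in-memory FASTA: $!";
my $hist = adaptor_histogram($fh);
printf "%d read(s) with %d adaptor(s)\n", $hist->{$_}, $_
    for sort { $a <=> $b } keys %$hist;

# For contrast: tr/// counts individual N characters, not runs of them.
my $n_chars = ($fasta =~ tr/N//);
print "$n_chars N characters in total\n";
```

Note that a split-based count returns the number of fields rather than the number of adaptors, so it can be off by one depending on whether an N run sits at the start, the middle or the end of the read; counting matches sidesteps that.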
From casaburi at ceinge.unina.it Fri Nov 11 07:21:29 2011
From: casaburi at ceinge.unina.it (Giorgio C)
Date: Fri, 11 Nov 2011 04:21:29 -0800 (PST)
Subject: [Bioperl-l] Scripting help to identify adaptors count in reads
In-Reply-To: <9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu>
References: <32818254.post@talk.nabble.com> <9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu>
Message-ID: <32825274.post@talk.nabble.com>

Thanks everybody for answering me so soon!

Probably another way may be:

perl -ne '$count{s/N+//g}++ if /^[^>]/;END{for $i (keys %count){print "$count{$i} have $i ADAPTOR\n";}}' myFile.fasta > result.txt

and/or with 'nawk':

nawk -F'[N]+' '/^[^>]/{a[NF-1]++}END{for(i in a) print a[i] " have " i " ADAPTOR"}' myFile.fasta > result.txt

They give the same result. If you have this problem, try these; they work well!

Thanks again to all,
Giorgio
--
View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32825274.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From p.j.a.cock at googlemail.com Sun Nov 13 07:24:35 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 13 Nov 2011 12:24:35 +0000
Subject: [Bioperl-l] OBDA redux?
In-Reply-To:
References:
Message-ID:

On Thu, Nov 3, 2011 at 7:47 PM, Fields, Christopher J wrote:
> On Nov 3, 2011, at 1:52 PM, Peter Cock wrote:
>
>> On Thu, Nov 3, 2011 at 6:28 PM, Fields, Christopher J
>> wrote:
>>> (side thread, so re-titling...)
>>>
>> And CC'ing open-bio-l, which is a better home for this than bioperl-l,
>> where OBDA v2 talk came up again in discussion of a BioPerl indexing
>> problem. Archive links for thread here:
>>
>> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035807.html
>> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035808.html
>> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035811.html
>> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035812.html
>> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035813.html
>> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035822.html
>
> yes, good idea... I've not CC'd the bioperl-l anymore.

>>> On Nov 1, 2011, at 1:06 PM, Peter Cock wrote:
>>>>
>>>> Yes, we're using SQLite3 to store essentially a list of filenames
>>>> and their format as one table, and then in the main table an
>>>> entry for each sequence recording the ID (only one accession,
>>>> unlike OBDA which had infrastructure for a secondary accession),
>>>> file number, offset of the start of the record, and optionally the
>>>> length of the record on disk.
>>>>
>>>> i.e. Basically what OBDA does, but using SQLite rather
>>>> than BDB (not included in Python 3) or a flat file index
>>>> (poor performance with large datasets).
>>>>
>>>> I find this design attractive on several levels:
>>>> * File format neutral, covers FASTA, FASTQ, GenBank, etc
>>>> * Preserves the original file untouched
>>>> * Index is a small single file (thanks to SQLite)
>>>> * Back end could be switched out
>>>> * Could be applied to compressed file formats
>>>> * Reuses existing parsing code to access entries
>>>>
>>>> This could easily form basis of OBDA v2, the main points
>>>> of difference I anticipate between the Bio* projects would
>>>> be naming conventions for the different file formats, and
>>>> what we consider to be the default record ID of each read
>>>> (e.g. which field in a GenBank file - although agreement
>>>> here is not essential). Some of that was already settled in
>>>> principle with OBDA v1.
>>>
>>> The primary/secondary IDs could be configurable with a sane
>>> default, I think the bioperl implementations allowed this (and
>>> it is certainly something that will be requested).
>>
>> One reason I went with a single ID only was to keep the
>> Python dictionary based API simple (think hash in Perl).
>> You don't get secondary keys in a Python dict or a hash ;)
>>
>> As a nod to flexibility, in Biopython's Bio.SeqIO indexing you
>> can provide a call back function to map the suggested ID to
>> something else. Obviously this doesn't give the full flexibility
>> of extracting a field from the record's annotation because we
>> don't parse the whole record during indexing (it would be too
>> slow).
>
> Same with bioperl.
>
>> However, I'm happy for there to be an *optional* secondary
>> key in an OBDA v2 SQLite schema, but Biopython probably
>> won't populate it. We could optionally use it rather than the
>> primary ID on loading an existing index though.
>
> Optional implementation of that is fine by me.
>
>> Personally I would stick with one key in the index - it should
>> be faster and makes it simpler to switch the back end if we
>> need to later. If anyone wants a second key, they can build
>> a second index *grin*.
>
> That's easy enough.
>
>>>> On the other hand, you could try and store the parsed data
>>>> itself, which is where NOSQL looks more interesting. That
>>>> essentially requires the ability to serialise your annotated
>>>> sequence object model to disk - which would be tricky to do
>>>> cross project (much more ambitious than BioSQL is). It also
>>>> means the "index" becomes very large because it now holds
>>>> all the original data.
>>>>
>>>> Peter
>>>
>>> For a fully cross-Bio* compliant format, I don't think it's feasible
>>> to use serialized data unless they are serialized in something
>>> that is easily deserialized across HLLs (JSON, BSON, YAML,
>>> XML, etc). Either that, or such data is stored concurrently with
>>> the binary blob, along with meta data that indicates the source
>>> of the blob, parser, version, etc, etc (unless there are tools out
>>> there that reliably interconvert serialized complex data structures
>>> between HLLs). Any way you go about it, it seems like it could
>>> be a major ball of hurt, unless implemented very carefully.
>>
>> You missed out RDF as a serialisation ;)
>>
>> But yes, going down the shared serialisation route is going
>> to be messy - as you are well aware:
>>
>>> Aside: I think this was one of the problems with
>>> Bio::DB::SeqFeature::Store, in that it at one point stored
>>> Perl-specific Storable blobs.
>>>
>>> chris
>>
>> Peter
>
> yes, it's a problem w/o an easy solution. Anyway, I think an
> implementation of such at this point would be a premature
> optimization.
>
> chris

So, Chris and I seem in general agreement that an OBDA v2 using
SQLite but based on essentially the same approach as the BDB
or flat file based OBDA v1 is a good idea. i.e. Tables mapping
record identifiers to file offsets in the original sequence files.

I hope to get BioRuby on board; they already have OBDA v1
support so that shouldn't be too hard. Right now I don't recall if
BioJava has/had OBDA v1 support, and if they did, whether it was
affected in their recent move to BioJava v3 (I understand from their
mailing list that some older lower priority functionality has not all
been ported yet).

Also EMBOSS are likely to be interested; certainly Peter Rice
was interested in the SQLite indexing we're already using in
Biopython for sequence files (i.e. what is effectively the
prototype for OBDA v2).

Note that in addition to simple indexing of text files, we are already
using the same simple offset + length approach for indexing binary
files (e.g. SFF).
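
The offset + length idea is easy to sketch. The example below is not the Biopython schema or any OBDA specification: a plain Perl hash stands in for the SQLite table so that only core Perl is needed, the record ID is assumed to be the first word after '>' in a FASTA header, and the names build_index and fetch_raw are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build an index of record ID -> [byte offset, length in bytes] for a
# FASTA filehandle. In the design discussed above these rows would sit
# in an SQLite table (ID, file number, offset, length); a hash stands in
# here to keep the sketch self-contained.
sub build_index {
    my ($fh) = @_;
    my (%index, $current_id);
    my $offset = tell $fh;
    while (my $line = <$fh>) {
        if ($line =~ /^>(\S+)/) {
            $current_id = $1;
            $index{$current_id} = [ $offset, 0 ];
        }
        # Every line up to the next '>' belongs to the current record.
        $index{$current_id}[1] += length $line if defined $current_id;
        $offset = tell $fh;
    }
    return \%index;
}

# Fetch the raw, unparsed text of one record by seeking to its offset.
# The original file is never modified.
sub fetch_raw {
    my ($fh, $index, $id) = @_;
    my $entry = $index->{$id} or return;
    my ($offset, $length) = @$entry;
    seek $fh, $offset, 0 or die "seek failed: $!";
    read $fh, my $raw, $length;
    return $raw;
}

# In-memory demo; a real index would be built over files on disk.
my $fasta = ">seq1 first\nACGT\nAC\n>seq2\nGGGG\n>seq3 last\nTT\n";
open my $fh, '<', \$fasta or die "cannot open in-memory FASTA: $!";
my $index = build_index($fh);
print fetch_raw($fh, $index, 'seq2');
```

The raw text returned by fetch_raw would then be handed to the existing format parser, which is what keeps the scheme file format neutral.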

On the immediate practical side, I think I can edit the
current OBDA website of http://obda.open-bio.org/
via /home/websites/obda.open-bio.org/html on the
server.

We need to work out where the current OBDA indexing
specification lives (CVS or SVN?) and perhaps move
that to GitHub. We may need a general OBF organisation
account on GitHub for this and any other cross-project
repositories.

I see there is already an OBDA project on RedMine
(Chris, can you add me to that please?):
https://redmine.open-bio.org/projects/obda

Peter

From p.j.a.cock at googlemail.com Sun Nov 13 07:30:37 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 13 Nov 2011 12:30:37 +0000
Subject: [Bioperl-l] OBDA redux? Compressed files
Message-ID:

Hi again,

I've retitled this as it is a little off topic from the main OBDA redux thread,
http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000819.html
http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000820.html
http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000821.html

As far as I recall, the original flat file and BDB based OBDA
specification for indexing sequencing files didn't cover
compressed files. That might be something to consider
(although we should sort out uncompressed text/binary
files first).

I've recently been experimenting with using compressed files -
in particular simple GZIP files (ignoring any block structure) and
BGZF (the specialised gzipped blocking used in BAM), see:
http://blastedbio.blogspot.com/2011/11/bgzf-blocked-bigger-better-gzip.html
http://seqanswers.com/forums/showthread.php?t=15347

The virtual offset approach used in BGZF squeezes a 16 bit
within block offset (thus limiting you to 64KB blocks) and a
48 bit block start offset (thus limiting you to a 256TB file) into
a single 64 bit "virtual" offset.
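
To make the bit layout concrete, here is a minimal Python sketch of packing and unpacking such a virtual offset. The function names and the `within_bits` parameter are illustrative assumptions and do not come from any BGZF library; the default of 16 matches the 48 + 16 bit split described above.

```python
def make_virtual_offset(block_start, within_block, within_bits=16):
    """Pack a block start offset and a within-block offset into one 64 bit integer.

    within_bits is the number of low bits reserved for the within-block
    offset (16 for the BGZF layout described in the text).
    """
    start_bits = 64 - within_bits
    if not 0 <= within_block < (1 << within_bits):
        raise ValueError("within-block offset does not fit in %d bits" % within_bits)
    if not 0 <= block_start < (1 << start_bits):
        raise ValueError("block start offset does not fit in %d bits" % start_bits)
    return (block_start << within_bits) | within_block


def split_virtual_offset(virtual_offset, within_bits=16):
    """Recover (block_start, within_block) from a packed virtual offset."""
    mask = (1 << within_bits) - 1
    return virtual_offset >> within_bits, virtual_offset & mask


# A record starting 10 bytes into the block that begins at file offset 100000:
voffset = make_virtual_offset(100000, 10)
print(voffset)                        # 6553600010
print(split_virtual_offset(voffset))  # (100000, 10)
```

Changing `within_bits` is enough to try other splits of the 64 bits, at the cost of a smaller maximum file size for every extra bit given to the within-block part.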
A single virtual offset makes sense if you are keeping the lookup table
of many offsets in memory, and it can be used as is with code
expecting a single offset (like the current Biopython SQLite
index schema).

There is also bzip2, which is itself block based, with the block size
ranging from 100KB to 900KB.
http://bzip.org/
http://bzip.org/1.0.5/bzip2-manual-1.0.5.html

I haven't tried any performance tests yet, which would be
interesting as I believe compression/decompression of bzip2
is more costly in CPU terms than gzip (although both will be
block size dependent).

If we wanted to imitate the BGZF virtual offset scheme for
arbitrary BZIP2 files, an alternative 64 bit virtual offset scheme
could use 20 bits to cover bz2 blocks of up to 900KB, leaving
64 - 20 = 44 bits for the start offset, thus limiting you to
just 2^44 bytes or 16TB, which sounds OK only in the medium
term. On the bright side this could be used to index any BZIP2
file (under 16TB), whereas BGZF virtual offsets cannot be applied
to an arbitrary GZIP file.

On the other hand, storing the block start and the within block
offset separately is truly generic and could be used on any blocked
GZIP file (including BGZF), BZIP2, etc. It would make the SQLite
schema a bit more complicated though.

Maybe this is something to consider for the next revision to OBDA,
and we should focus on the non-compressed case for now?

Regards,

Peter

From p.j.a.cock at googlemail.com Sun Nov 13 07:32:12 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 13 Nov 2011 12:32:12 +0000
Subject: [Bioperl-l] OBDA redux?
 Compressed files
In-Reply-To:
References:
Message-ID:

On Sun, Nov 13, 2011 at 12:30 PM, Peter Cock wrote:
> Hi again,
>
> I've retitled this as it is a little off topic from the main OBDA redux thread,
> http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000819.html
> http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000820.html
> http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000821.html
>
> As far as I recall, the original flat file and BDB based OBDA
> specification for indexing sequencing files didn't cover
> compressed files. That might be something to consider
> (although we should sort out uncompressed text/binary
> files first).

Sorry - didn't mean to include bioperl-l on that, although it
may be of interest to you guys anyway.

Peter

From jluis.lavin at unavarra.es Mon Nov 14 06:14:43 2011
From: jluis.lavin at unavarra.es (José Luis Lavín)
Date: Mon, 14 Nov 2011 12:14:43 +0100
Subject: [Bioperl-l] Fwd: How to get Remote BLAST results in a single out
In-Reply-To:
References:
Message-ID:

Hello everybody,

I've been using "Bio::Tools::Run::RemoteBlast" for a while and it has
worked fine for me. Now I need to perform a multiple BLAST search, but this
time I'd just like to get all the BLAST results in a single out file
instead of having each sequence's report written individually. I've read
the documentation of the module, but due to my limited experience with
complex modules such as this one, I can't figure out where to change the
script to achieve this.
Here I post the script I've been using (it's basically the one posted in
the module cookbook).

#!/c:/Perl -w
use Bio::Tools::Run::RemoteBlast;
use Bio::SearchIO;
use Data::Dumper;

#Here i set the parameters for blast
print "Enter your BLAST choice (blastn, blastp, blastx, tblastn, tblastx):\n";
chomp(my $blst = <STDIN>);
my $prog = "$blst";
print "Enter a database to search (nr, refseq_protein, swissprot, pat, pdb, env_nr):\n";
chomp(my $dtb = <STDIN>);
my $db = "$dtb";
print "Enter your cutoff score (1e-n):\n";
chomp(my $cut = <STDIN>);
my $e_val = "$cut";

my @params = ( '-prog'       => $prog,
               '-data'       => $db,
               '-expect'     => $e_val,
               '-readmethod' => 'SearchIO' );

my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params);

#Select the file and make the blast.
print "Enter your FASTA file:\n";
chomp(my $infile = <STDIN>);
my $r = $remoteBlast->submit_blast($infile);
my $v = 1;

print STDERR "waiting..." if( $v > 0 );
######## WAIT FOR THE RESULTS TO RETURN!!!!!
while ( my @rids = $remoteBlast->each_rid ) {
    foreach my $rid ( @rids ) {
        my $rc = $remoteBlast->retrieve_blast($rid);
        if( !ref($rc) ) {
            if( $rc < 0 ) {
                $remoteBlast->remove_rid($rid);
            }
            print STDERR "." if ( $v > 0 );
            sleep 5;
        } else {
            my $result = $rc->next_result();
            #save the output
            my $filename = $result->query_name()."\.out";##################open SALIDA, '>>'."$^T"."Report"."\.out";
            $remoteBlast->save_output($filename);#############
            $remoteBlast->remove_rid($rid);
            print "\nQuery Name: ", $result->query_name(), "\n";
            while ( my $hit = $result->next_hit ) {
                next unless ( $v > 0);
                print "\thit name is ", $hit->name, "\n";
                while( my $hsp = $hit->next_hsp ) {
                    print "\t\tscore is ", $hsp->score, "\n";
                }
            }
        }
    }
}

Could any of you please explain how to solve this?

Thanks in advance

With best wishes

--
Dr. José Luis Lavín Trueba

Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN

From jason.stajich at gmail.com Mon Nov 14 06:59:56 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Mon, 14 Nov 2011 06:59:56 -0500
Subject: [Bioperl-l] Fwd: How to get Remote BLAST results in a single out
In-Reply-To:
References:
Message-ID:

If you want to do a bunch of BLASTs remotely on the cmdline you could also just use the NCBI's blastcl3 tool (not sure if there is a BLAST+ equivalent). This might be faster and easier, since otherwise you need to learn the programming part too.

If you want to do this within this code I guess the question is what format you want the data in - a BLAST report or something more like a table?

On Nov 14, 2011, at 6:14 AM, José Luis Lavín wrote:

> Hello everybody,
>
> I've been using "Bio::Tools::Run::RemoteBlast" for a while and it has
> worked fine for me. Now I need to perform a multiple BLAST search, but this
> time I'd just like to get all the BLAST results in a single out file
> instead of having each sequence's report written individually. I've read
> the documentation of the module, but due to my limited experience with
> complex modules such as this one, I can't figure out where to change the
> script to achieve this.
> Here I post the script I've been using (it's basically the one posted in
> the module cookbook).

From jason.stajich at gmail.com Mon Nov 14 09:07:36 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Mon, 14 Nov 2011 09:07:36 -0500
Subject: [Bioperl-l] Fwd: Fwd: How to get Remote BLAST results in a single out
References:
Message-ID: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com>

Please keep these discussions on the list.

Sent from my iPhone-please excuse typos

--
Jason Stajich

Begin forwarded message:

> From: José Luis Lavín
> Date: November 14, 2011 8:04:25 AM EST
> To: Jason Stajich
> Subject: Re: [Bioperl-l] Fwd: How to get Remote BLAST results in a single out
>
> Hello Jason,
>
> To answer your question:
>
> "If you want to do this within this code I guess the question is what format you want the data in - a BLAST report or something more like a table?"
>
> A concatenation of BLAST (default format) reports should be OK, since I have a script to parse that kind of result. Formats 1 or 2 will also do the trick.
> I'll be happy to get assistance on how to change the OUTFILE from "one report per query" to "all queries in the same report", because I don't seem to be able to do it myself after reading the module documentation.
>
> Thanks in advance
>
> On 14 November 2011 12:59, Jason Stajich wrote:
> If you want to do a bunch of BLASTs remotely on the cmdline you could also just use the NCBI's blastcl3 tool (not sure if there is a BLAST+ equivalent). This might be faster and easier, since otherwise you need to learn the programming part too.
>
> If you want to do this within this code I guess the question is what format you want the data in - a BLAST report or something more like a table?
>
> > On Nov 14, 2011, at 6:14 AM, José
Luis Lavín wrote:
> >
> > > Hello everybody,
> > >
> > > I've been using "Bio::Tools::Run::RemoteBlast" for a while and it has
> > > worked fine for me. Now I need to perform a multiple BLAST search, but this
> > > time I'd just like to get all the BLAST results in a single out file
> > > instead of having each sequence's report written individually. I've read
> > > the documentation of the module, but due to my limited experience with
> > > complex modules such as this one, I can't figure out where to change the
> > > script to achieve this.
> > > Here I post the script I've been using (it's basically the one posted in
> > > the module cookbook).
>
> --
> Dr. José Luis Lavín Trueba
>
> Dpto. de Producción Agraria
> Grupo de Genética y Microbiología
> Universidad Pública de Navarra
> 31006 Pamplona
> Navarra
> SPAIN

From cl134 at duke.edu Sun Nov 13 09:42:05 2011
From: cl134 at duke.edu (Cheng-Ruei Lee)
Date: Sun, 13 Nov 2011 09:42:05 -0500
Subject: [Bioperl-l] Suggestion for Bio::PopGen::Statistics
Message-ID: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu>

Hi all,

Bioperl version: 1.006

I get two error messages when using this module to calculate Fu & Li's statistics:

Illegal division by zero at (the Statistics.pm file) line 359
Illegal division by zero at (the Statistics.pm file) line 376

Tracking this down further shows that the first error happens when $n (the sample size in the ingroup) equals 1 or 2, and the second error happens when $n equals 3. This is not really a bug, though. I would suggest either adding a check in the original code before the calculation (skipping the current calculation when $n == 1, 2, or 3 rather than letting the whole program die), or adding a few lines of notes to the CPAN page.

Sincerely,
Cheng-Ruei Lee

From joluito at gmail.com Mon Nov 14 04:21:31 2011
From: joluito at gmail.com (José Luis Lavín)
Date: Mon, 14 Nov 2011 10:21:31 +0100
Subject: [Bioperl-l] How to get Remote BLAST results in a single out
Message-ID:

Hello everybody,

I've been using "Bio::Tools::Run::RemoteBlast" for a while and it has
worked fine for me. Now I need to perform a multiple BLAST search, but this
time I'd just like to get all the BLAST results in a single out file
instead of having each sequence's report written individually. I've read
the documentation of the module, but due to my limited experience with
complex modules such as this one, I can't figure out where to change the
script to achieve this.
Here I post the script I've been using (it's basically the one posted in
the module cookbook).

#!/c:/Perl -w
use Bio::Tools::Run::RemoteBlast;
use Bio::SearchIO;
use Data::Dumper;

#Here i set the parameters for blast
print "Enter your BLAST choice (blastn, blastp, blastx, tblastn, tblastx):\n";
chomp(my $blst = <STDIN>);
my $prog = "$blst";
print "Enter a database to search (nr, refseq_protein, swissprot, pat, pdb, env_nr):\n";
chomp(my $dtb = <STDIN>);
my $db = "$dtb";
print "Enter your cutoff score (1e-n):\n";
chomp(my $cut = <STDIN>);
my $e_val = "$cut";

my @params = ( '-prog'       => $prog,
               '-data'       => $db,
               '-expect'     => $e_val,
               '-readmethod' => 'SearchIO' );

my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params);

#Select the file and make the blast.
print "Enter your FASTA file:\n";
chomp(my $infile = <STDIN>);
my $r = $remoteBlast->submit_blast($infile);
my $v = 1;

print STDERR "waiting..." if( $v > 0 );
######## WAIT FOR THE RESULTS TO RETURN!!!!!
while ( my @rids = $remoteBlast->each_rid ) {
    foreach my $rid ( @rids ) {
        my $rc = $remoteBlast->retrieve_blast($rid);
        if( !ref($rc) ) {
            if( $rc < 0 ) {
                $remoteBlast->remove_rid($rid);
            }
            print STDERR "." if ( $v > 0 );
            sleep 5;
        } else {
            my $result = $rc->next_result();
            #save the output
            my $filename = $result->query_name()."\.out";##################open SALIDA, '>>'."$^T"."Report"."\.out";
            $remoteBlast->save_output($filename);#############
            $remoteBlast->remove_rid($rid);
            print "\nQuery Name: ", $result->query_name(), "\n";
            while ( my $hit = $result->next_hit ) {
                next unless ( $v > 0);
                print "\thit name is ", $hit->name, "\n";
                while( my $hsp = $hit->next_hsp ) {
                    print "\t\tscore is ", $hsp->score, "\n";
                }
            }
        }
    }
}

Could any of you please explain how to solve this?

Thanks in advance

With best wishes

--
Dr. José Luis Lavín Trueba

Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN

From cjfields at illinois.edu Mon Nov 14 12:02:22 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 14 Nov 2011 17:02:22 +0000
Subject: [Bioperl-l] How to get Remote BLAST results in a single out
In-Reply-To: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com>
References: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com>
Message-ID: <5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>

Re: a BLAST+ equivalent for blastcl3, I believe there is an option for the various 'blast*' programs indicating that the search is to use a remote database. I haven't used it, though...

chris

On Nov 14, 2011, at 8:07 AM, Jason Stajich wrote:

> Please keep these discussions on the list.
>
> Sent from my iPhone-please excuse typos
>
> --
> Jason Stajich
>
> Begin forwarded message:
>
>> From: José Luis Lavín
>> Date: November 14, 2011 8:04:25 AM EST
>> To: Jason Stajich
>> Subject: Re: [Bioperl-l] Fwd: How to get Remote BLAST results in a single out
>>
>> Hello Jason,
>>
>> To answer your question:
>>
>> "If you want to do this within this code I guess the question is what format you want the data in - a BLAST report or something more like a table?"
>>
>> A concatenation of BLAST (default format) reports should be OK, since I have a script to parse that kind of result. Formats 1 or 2 will also do the trick.
>> I'll be happy to get assistance on how to change the OUTFILE from "one report per query" to "all queries in the same report", because I don't seem to be able to do it myself after reading the module documentation.
>>
>> Thanks in advance
>>
>> On 14 November 2011 12:59, Jason Stajich wrote:
>> If you want to do a bunch of BLASTs remotely on the cmdline you could also just use the NCBI's blastcl3 tool (not sure if there is a BLAST+ equivalent). This might be faster and easier, since otherwise you need to learn the programming part too.
>>
>> If you want to do this within this code I guess the question is what format you want the data in - a BLAST report or something more like a table?

From cjfields at illinois.edu Mon Nov 14 12:03:04 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 14 Nov 2011 17:03:04 +0000
Subject: [Bioperl-l] Suggestion for Bio::PopGen::Statistics
In-Reply-To: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu>
References: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu>
Message-ID:

Cheng,

Have you tried the latest CPAN release (we're at 1.006901)?

chris

On Nov 13, 2011, at 8:42 AM, Cheng-Ruei Lee wrote:

> Hi all,
>
> Bioperl version: 1.006
>
> I get two error messages when using this module to calculate Fu & Li's statistics:
>
> Illegal division by zero at (the Statistics.pm file) line 359
> Illegal division by zero at (the Statistics.pm file) line 376
>
> Tracking this down further shows that the first error happens when $n (the sample size in the ingroup) equals 1 or 2, and the second error happens when $n equals 3. This is not really a bug, though. I would suggest either adding a check in the original code before the calculation (skipping the current calculation when $n == 1, 2, or 3 rather than letting the whole program die), or adding a few lines of notes to the CPAN page.
>
> Sincerely,
> Cheng-Ruei Lee

From cjfields at illinois.edu Mon Nov 14 12:59:35 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 14 Nov 2011 17:59:35 +0000
Subject: [Bioperl-l] OBDA redux?
In-Reply-To:
References:
Message-ID: <12E3B71D-6E61-41AD-A956-A1FC2076AF24@illinois.edu>

On Nov 13, 2011, at 6:24 AM, Peter Cock wrote:

> So, Chris and I seem in general agreement that an OBDA v2
> using SQLite but based on essentially the same approach as
> the BDB or flat file based OBDA v1 is a good idea, i.e. tables
> mapping record identifiers to file offsets in the original sequence
> files.

The worry I have is adhering to a specific backend (e.g. SQLite). The reason I say this is b/c BDB in its time was the go-to way of storing simple index data, but that is no longer feasible for very large data sets. Who's to say something similar won't happen to SQLite, or that it is the best option available?

Maybe we should focus on the data storage schema, as simple as it may be, then indicate the default backend must be SQLite but others are allowed (maybe with a mention that SQLite can be replaced by alternatives in the future if needed).

> I hope to get BioRuby on board; they already have OBDA
> v1 support so that shouldn't be too hard.
>
> Right now I don't recall if BioJava has/had OBDA v1 support,
> and if they did, whether it was affected in their recent move to BioJava
> v3 (I understand from their mailing list that some older lower
> priority functionality has not all been ported yet).

I wouldn't be surprised at that; OBDA kind of lingered for a while, and I'm not sure how widely adopted it became (maybe others can shed light on that?)

> Also EMBOSS are likely to be interested; certainly Peter Rice
> was interested in the SQLite indexing we're already using in
> Biopython for sequence files (i.e. what is effectively the
> prototype for OBDA v2).
>
> Note that in addition to simple indexing of text files, we are
> already using the same simple offset + length approach for
> indexing binary files (e.g. SFF).

I think that's the general idea; that is how all bioperl data was indexed, previously with the Bio::Index modules and with the OBDA implementations as well.

> On the immediate practical side, I think I can edit the
> current OBDA website of http://obda.open-bio.org/
> via /home/websites/obda.open-bio.org/html on the
> server.

See below w/ regards to my thoughts on the wiki.

> We need to work out where the current OBDA indexing
> specification lives (CVS or SVN?) and perhaps move
> that to GitHub. We may need a general OBF organisation
> account on GitHub for this and any other cross-project
> repositories.

+1 to a move to GitHub, but maybe this belongs in an OBF-specific organization. And maybe we should take advantage of the simple wiki or project homepage that GitHub offers and move everything (docs) there.

> I see there is already an OBDA project on RedMine
> (Chris, can you add me to that please?):
> https://redmine.open-bio.org/projects/obda
>
> Peter

Done (last night actually, but I didn't have time to respond immediately).

chris

From David.Messina at sbc.su.se Mon Nov 14 14:31:18 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 14 Nov 2011 20:31:18 +0100
Subject: [Bioperl-l] How to get Remote BLAST results in a single out
In-Reply-To: <5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>
References: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com> <5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>
Message-ID: <29C56604-BBEE-4D80-9662-7C3627907200@sbc.su.se>

> Re: a BLAST+ equivalent for blastcl3, I believe there is an option for the various 'blast*' programs indicating that the search is to use a remote database. I haven't used it, though...

Yes, it's the -remote option. I've used it, and it works great. The speed is throttled by NCBI, however, so for an appreciable number of queries the standard advice applies: run the search on your own computers.

Dave

From jluis.lavin at unavarra.es Mon Nov 14 16:23:31 2011
From: jluis.lavin at unavarra.es (José Luis Lavín)
Date: Mon, 14 Nov 2011 22:23:31 +0100
Subject: [Bioperl-l] How to get Remote BLAST results in a single out
In-Reply-To: <5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>
References: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com> <5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>
Message-ID:

Thank you very much for your answers, but judging by them, I'm afraid I didn't explain myself well enough.
I'm not looking for another tool to perform a BLAST task. I was just wondering if there was a way to simply change the way the module writes the outputs, so that I can get multiple searches in a single report file instead of having a report for each BLAST search.
Maybe there's some issue I'm unaware of that makes you recommend the use of other tools instead of the Bioperl Remote BLAST module... I would appreciate it if you let me know about that (NCBI server problems with web services or so)...

Thank you for your answers anyway

Best wishes

2011/11/14 Fields, Christopher J

> Re: a BLAST+ equivalent for blastcl3, I believe there is an option for the
> various 'blast*' programs indicating that the search is to use a remote
> database. I haven't used it, though...
>
> chris
>
> On Nov 14, 2011, at 8:07 AM, Jason Stajich wrote:
>
> > Please keep these discussions on the list.
> >
> > Sent from my iPhone-please excuse typos
> >
> > --
> > Jason Stajich
> >
> > Begin forwarded message:
> >
> >> From: José Luis Lavín
> >> Date: November 14, 2011 8:04:25 AM EST
> >> To: Jason Stajich
> >> Subject: Re: [Bioperl-l] Fwd: How to get Remote BLAST results in a single out
> >>
> >> Hello Jason,
> >>
> >> To answer your question:
> >>
> >> "If you want to do this within this code I guess the question is what
> >> format you want the data in - a BLAST report or something more like a
> >> table?"
> >>
> >> A concatenation of BLAST (default format) reports should be OK, since I
> >> have a script to parse that kind of result. Formats 1 or 2 will
> >> also do the trick.
> >> I'll be happy to get assistance on how to change the OUTFILE from "one
> >> report per query" to "all queries in the same report", because I don't seem
> >> to be able to do it myself after reading the module documentation.
> >>
> >> Thanks in advance
> >>
> >> On 14 November 2011 12:59, Jason Stajich wrote:
> >> If you want to do a bunch of BLASTs remotely on the cmdline you could
> >> also just use the NCBI's blastcl3 tool (not sure if there is a BLAST+
> >> equivalent). This might be faster and easier, since otherwise you need
> >> to learn the programming part too.
> >>
> >> If you want to do this within this code I guess the question is what
> >> format you want the data in - a BLAST report or something more like a table?
> >>
> >> On Nov 14, 2011, at 6:14 AM, José Luis Lavín wrote:
> >>
> >>> Hello everybody,
> >>>
> >>> I've been using "Bio::Tools::Run::RemoteBlast" for a while and it has
> >>> worked fine for me. Now I need to perform a multiple BLAST search, but this
> >>> time I'd just like to get all the BLAST results in a single out file
> >>> instead of having each sequence's report written individually. I've read
> >>> the documentation of the module, but due to my limited experience with
> >>> complex modules such as this one, I can't figure out where to change the
> >>> script to achieve this.
> >>> Here I post the script I've been using (it's basically the one posted in
> >>> the module cookbook).
> >>>
> >>> #!/c:/Perl -w
> >>> use Bio::Tools::Run::RemoteBlast;
> >>> use Bio::SearchIO;
> >>> use Data::Dumper;
> >>>
> >>> #Here i set the parameters for blast
> >>> print "Enter your BLAST choice (blastn, blastp, blastx, tblastn, tblastx):\n";
> >>> my $blst = <STDIN>;
> >>> my $prog = "$blst";
> >>> print "Enter a database to search (nr, refseq_protein, swissprot, pat, pdb, env_nr):\n";
> >>> my $dtb = <STDIN>;
> >>> $db = "$dtb";
> >>> print "Enter your cutt off score (1e-n):\n";
> >>> my $cut = <STDIN>;
> >>> my $e_val = "$cut";
> >>>
> >>> my @params = ( '-prog' => $prog,
> >>>                '-data' => $db,
> >>>                '-expect' => $e_val,
> >>>                '-readmethod' => 'SearchIO' );
> >>>
> >>> my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params);
> >>>
> >>>
> >>> #Select the file and make the blast.
> >>> print "Enter your FASTA file:\n";
> >>> chomp(my $infile = <STDIN>);
> >>> my $r = $remoteBlast->submit_blast($infile);
> >>> my $v = 1;
> >>>
> >>> print STDERR "waiting..." if( $v > 0 ); ######## WAIT FOR THE RESULTS TO RETURN!!!!!
> >>> while ( my @rids = $remoteBlast->each_rid ) {
> >>>     foreach my $rid ( @rids ) {
> >>>         my $rc = $remoteBlast->retrieve_blast($rid);
> >>>         if( !ref($rc) ) {
> >>>             if( $rc < 0 ) {
> >>>                 $remoteBlast->remove_rid($rid);
> >>>             }
> >>>             print STDERR "."
if ( $v > 0 ); > >>> sleep 5; > >>> } else { > >>> my $result = $rc->next_result(); > >>> #save the output > >>> my $filename = > >>> $result->query_name()."\.out";##################open SALIDA, > >>> '>>'."$^T"."Report"."\.out"; > >>> $remoteBlast->save_output($filename);############# > >>> $remoteBlast->remove_rid($rid); > >>> print "\nQuery Name: ", $result->query_name(), "\n"; > >>> while ( my $hit = $result->next_hit ) { > >>> next unless ( $v > 0); > >>> print "\thit name is ", $hit->name, "\n"; > >>> while( my $hsp = $hit->next_hsp ) { > >>> print "\t\tscore is ", $hsp->score, "\n"; > >>> } > >>> } > >>> } > >>> } > >>> } > >>> > >>> > >>> May any of you please explain me how to solve my question? > >>> > >>> Thanks in advence > >>> > >>> With best wishes > >>> > >>> -- > >>> -- > >>> Dr. Jos? Luis Lav?n Trueba > >>> > >>> Dpto. de Producci?n Agraria > >>> Grupo de Gen?tica y Microbiolog?a > >>> Universidad P?blica de Navarra > >>> 31006 Pamplona > >>> Navarra > >>> SPAIN > >>> > >>> > >>> > >>> -- > >>> -- > >>> Dr. Jos? Luis Lav?n Trueba > >>> > >>> Dpto. de Producci?n Agraria > >>> Grupo de Gen?tica y Microbiolog?a > >>> Universidad P?blica de Navarra > >>> 31006 Pamplona > >>> Navarra > >>> SPAIN > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> > >> > >> -- > >> -- > >> Dr. Jos? Luis Lav?n Trueba > >> > >> Dpto. 
de Producci?n Agraria > >> Grupo de Gen?tica y Microbiolog?a > >> Universidad P?blica de Navarra > >> 31006 Pamplona > >> Navarra > >> SPAIN > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- -- Dr. Jos? Luis Lav?n Trueba Dpto. de Producci?n Agraria Grupo de Gen?tica y Microbiolog?a Universidad P?blica de Navarra 31006 Pamplona Navarra SPAIN From jason.stajich at gmail.com Mon Nov 14 22:53:19 2011 From: jason.stajich at gmail.com (Jason Stajich) Date: Mon, 14 Nov 2011 22:53:19 -0500 Subject: [Bioperl-l] Suggestion for Bio::PopGen::Statistics In-Reply-To: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu> References: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu> Message-ID: <0A6DF9A2-F34F-4277-8E84-C3E5351BB3FF@gmail.com> sure -- as you say, the implementation presumed that it would be called more than 3 individuals to this method which is a shortcoming. I have committed the code fix but still need someone to add a comment to the perldoc. I've made it a redmine bug. https://redmine.open-bio.org/issues/3313 Jason Can you provide a test script and we'll add a test for this so On Nov 13, 2011, at 9:42 AM, Cheng-Ruei Lee wrote: > Hi all, > > Bioperl version: 1.006 > Here are two error messages when I'm using this module to calculate Fu & Li's statistics: > Illegal division by zero at (the Statistics.pm file) line 359 > Illegal division by zero at (the Statistics.pm file) line 376 > A further tracking down shows that the first error happens when $n (sample size in the ingroup) equals 1 or 2, and the second error happens when $n equals 3. This is not really a bug though. 
I would suggest either in the original code, do a checking before the calculation (and skip the current calculation when $n == 1, 2, or 3 - rather than let the whole program die), or add a few lines of notes in the CPAN page. > > Sincerely, > Cheng-Ruei Lee > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cchehoud at gmail.com Mon Nov 14 20:39:32 2011 From: cchehoud at gmail.com (Christel Chehoud) Date: Mon, 14 Nov 2011 17:39:32 -0800 Subject: [Bioperl-l] Bioperl installation help Message-ID: Dear BioPerl, Thank you for creating such useful code. Unfortunately, every time I try to install Bioperl, it takes me a long time and is a challenging ordeal :( I am a new MAC user and was not able to download bioperl using CPAN. Here is the error I am getting: ERROR: Can't create '/usr/local/bin' Do not have write permissions on '/usr/local/bin' !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! at /Users/Christel/.cpan/build/BioPerl-1.6.0-DiZ0rN/Bio/Root/Build.pm line 902 CJFIELDS/BioPerl-1.6.0.tar.gz ./Build install -- NOT OK ---- You may have to su to root to install the package (Or you may want to run something like o conf make_install_make_command 'sudo make' to raise your permissions.Warning (usually harmless): 'YAML' not installed, will not store persistent state Failed during this command: CMUNGALL/Data-Stag-0.11.tar.gz : make NO CJFIELDS/BioPerl-1.6.0.tar.gz : make_test FAILED but failure ignored because 'force' in effect so I did: cpan> o conf make_install_make_command 'sudo make' followed by cpan> o conf commit and started over..I got the same number of errors as last time (so I decided not to force install this time). do you have any suggestions: 63 tests and 305 subtests skipped. Failed 11/329 test scripts. 981/17708 subtests failed. 
Files=329, Tests=17708, 127 wallclock secs (104.69 cusr + 12.51 csys = 117.20 CPU) Failed 11/329 test programs. 981/17708 subtests failed. CJFIELDS/BioPerl-1.6.1.tar.gz ./Build test -- NOT OK //hint// to see the cpan-testers results for installing this module, try: reports CJFIELDS/BioPerl-1.6.1.tar.gz Warning (usually harmless): 'YAML' not installed, will not store persistent state Running Build install make test had returned bad status, won't install without force Failed during this command: CMUNGALL/Data-Stag-0.11.tar.gz : make NO FLORA/ExtUtils-Manifest-1.60.tar.gz : make NO CJFIELDS/BioPerl-1.6.1.tar.gz : make_test NO Thanks a lot for your time and help. I appreciate it. Thank you, Christel From casaburi at ceinge.unina.it Tue Nov 15 04:25:25 2011 From: casaburi at ceinge.unina.it (Giorgio C) Date: Tue, 15 Nov 2011 01:25:25 -0800 (PST) Subject: [Bioperl-l] Blast > parsing result in Exel Message-ID: <32846407.post@talk.nabble.com> Hy everybody, in this situation froma blast (-m 1) result file : Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402. Query= 132-291 (59 letters) Database: Scrivania/orchidea/mature_mirBase.fa 21,643 sequences; 470,608 total letters Searching..................................................done Score E Sequences producing significant alignments: (bits) Value mtr-miR2644b MIMAT0013413 Medicago truncatula miR2644b 28 0.031 mtr-miR2644a MIMAT0013412 Medicago truncatula miR2644a 28 0.031 gga-miR-1704 MIMAT0007596 Gallus gallus miR-1704 22 1.9 gga-miR-1557 MIMAT0007414 Gallus gallus miR-1557 22 1.9 mmu-miR-880-5p MIMAT0017266 Mus musculus miR-880-5p 22 1.9 132_0 8 cagccgctcagattgatggtgcctacagccttgccagcccgctcagattgat 59 12631 5 .............. 18 12630 5 .............. 18 7826 5 ........... 15 7644 19 ........... 9 5394 3 ........... 
 13
5394 3 ........... 13
BLASTN 2.2.21 [Jun-14-2009]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, ... .... ..........
______________________________________________________________

I need to parse this into an Excel sheet:

1)ID 2)Name of the hit 3)E-value 4)Score 5)Species

1) 132-291 2)mir2644b 3) 0,031 4)28 5) Medicago truncatula

Is it possible to obtain, from a big blast result file, an Excel sheet with 5 columns where every row comes from the first hit of a blast result? Can anyone help me to fix this problem? Even a little script in perl would help.

Thank you very much
--
View this message in context: http://old.nabble.com/Blast-%3E-parsing-result-in-Exel-tp32846407p32846407.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From nisa.dar10 at gmail.com Tue Nov 15 19:49:00 2011
From: nisa.dar10 at gmail.com (nisa.dar)
Date: Tue, 15 Nov 2011 16:49:00 -0800 (PST)
Subject: [Bioperl-l] print alignment from blast results file
Message-ID: <32851673.post@talk.nabble.com>

Hi,

I am parsing a blast results file. I have found bioperl modules to get the query string, homology string and hit string for each hit/hsp. I want to print them in the form of an alignment instead of aligning them individually.

This is what I am doing, but it doesn't seem correct:

while (my $hsp = $hit->next_hsp) {
    my $start_query_num = $hsp->start('query');
    my $query_string    = $hsp->query_string;
    my $end_query_num   = $hsp->end('query');
    my $homology_string = $hsp->homology_string;
    my $start_hit_num   = $hsp->start('hit');
    my $hit_string      = $hsp->hit_string;
    my $end_hit_num     = $hsp->end('hit');
    my $aln_o           = $hsp->get_aln;
    $query_string    =~ s/\n//g;    # get rid of new line characters
    $homology_string =~ s/\n//g;
    $hit_string      =~ s/\n//g;

    print "\n\nAlignment:\n\n\n";
    print "$start_query_num-$query_string-$end_query_num\n";
    print "         $homology_string\n";
    print "$start_hit_num-$hit_string-$end_hit_num\n\n";
}

Please let me know how I can print them in the form of an alignment as seen in the blast results file.

Thanks
--
View this message in context: http://old.nabble.com/print-alignment-from-blast-results-file-tp32851673p32851673.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From p.j.a.cock at googlemail.com Wed Nov 16 04:11:40 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 16 Nov 2011 09:11:40 +0000
Subject: [Bioperl-l] Blast > parsing result in Exel
In-Reply-To: <32846407.post@talk.nabble.com>
References: <32846407.post@talk.nabble.com>
Message-ID:

On Tue, Nov 15, 2011 at 9:25 AM, Giorgio C wrote:
>
> Hi everybody,
>
> in this situation from a blast (-m 1) result file :
>
> ...
>
> I need to parse this into an Excel sheet:
>
> 1)ID 2)Name of the hit 3)E-value 4)Score 5)Species
>
>
> 1) 132-291 2)mir2644b 3) 0,031 4)28 5) Medicago truncatula
>
> Is it possible to obtain, from a big blast result file, an Excel sheet with 5 columns where
> every row comes from the first hit of a blast result? Can anyone help me to fix
> this problem? Even a little script in perl would help.
>
> Thank you very much

Have you looked at any of the BioPerl BLAST parsing examples? e.g.

http://www.bioperl.org/wiki/HOWTO:Beginners#BLAST
http://www.bioperl.org/wiki/HOWTO:SearchIO
http://www.bioperl.org/wiki/Module:Bio::SearchIO

See also http://seqanswers.com/forums/showthread.php?t=15489

Peter

From bosborne11 at verizon.net Wed Nov 16 08:19:33 2011
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 16 Nov 2011 08:19:33 -0500
Subject: [Bioperl-l] print alignment from blast results file
In-Reply-To: <32851673.post@talk.nabble.com>
References: <32851673.post@talk.nabble.com>
Message-ID: <035404FE-A7D9-413A-B6EF-48564FE76426@verizon.net>

Nisa,

See:

http://www.bioperl.org/wiki/HOWTO:SearchIO

Brian O.

On Nov 15, 2011, at 7:49 PM, nisa.dar wrote:

>
> Hi,
>
> I am parsing a blast results file.
> I have found bioperl modules to get query
> string, homology string and hit string for each hit/hsp. I want to print
> them in the form of an alignment instead of aligning them individually.
>
> Please let me know how can I print them in the form of an alignment as seen
> in the blast results file.

From cjfields at illinois.edu Wed Nov 16 11:44:27 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 16 Nov 2011 16:44:27 +0000
Subject: [Bioperl-l] Bioperl installation help
In-Reply-To:
References:
Message-ID: <72481F31-3ADB-4E3D-9DBC-714FBEC730E4@illinois.edu>

For some reason you are trying to install an older version of BioPerl; try installing Bio::Perl (or one of the core modules). This should automatically install the latest version from CPAN. My guess is this will address some of the issues. However, w/o actually seeing what tests failed we can't help.

Also, if you are only interested in running local jobs, install BioPerl locally, or just grab the dist and add it to PERL5LIB. There are instructions in the installation docs for that. You can also use cpanm (cpanminus) to install things locally as well, it's highly recommended and much easier than cpan.

chris

On Nov 14, 2011, at 7:39 PM, Christel Chehoud wrote:

> Dear BioPerl,
> Thank you for creating such useful code. Unfortunately, every time I
> try to install Bioperl, it takes me a long time and is a challenging
> ordeal :( I am a new MAC user and was not able to download bioperl
> using CPAN. Here is the error I am getting:
>
> ERROR: Can't create '/usr/local/bin'
> Do not have write permissions on '/usr/local/bin'
> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
> at /Users/Christel/.cpan/build/BioPerl-1.6.0-DiZ0rN/Bio/Root/Build.pm line 902 > CJFIELDS/BioPerl-1.6.0.tar.gz > ./Build install -- NOT OK > ---- > You may have to su to root to install the package > (Or you may want to run something like > o conf make_install_make_command 'sudo make' > to raise your permissions.Warning (usually harmless): 'YAML' not > installed, will not store persistent state > Failed during this command: > CMUNGALL/Data-Stag-0.11.tar.gz : make NO > CJFIELDS/BioPerl-1.6.0.tar.gz : make_test FAILED but > failure ignored because 'force' in effect > > > so I did: > cpan> o conf make_install_make_command 'sudo make' > followed by > cpan> o conf commit > > and started over..I got the same number of errors as last time (so I > decided not to force install this time). do you have any suggestions: > > 63 tests and 305 subtests skipped. > Failed 11/329 test scripts. 981/17708 subtests failed. > Files=329, Tests=17708, 127 wallclock secs (104.69 cusr + 12.51 csys = > 117.20 CPU) > Failed 11/329 test programs. 981/17708 subtests failed. > CJFIELDS/BioPerl-1.6.1.tar.gz > ./Build test -- NOT OK > //hint// to see the cpan-testers results for installing this module, try: > reports CJFIELDS/BioPerl-1.6.1.tar.gz > Warning (usually harmless): 'YAML' not installed, will not store > persistent state > Running Build install > make test had returned bad status, won't install without force > Failed during this command: > CMUNGALL/Data-Stag-0.11.tar.gz : make NO > FLORA/ExtUtils-Manifest-1.60.tar.gz : make NO > CJFIELDS/BioPerl-1.6.1.tar.gz : make_test NO > > > Thanks a lot for your time and help. I appreciate it. 
>
> Thank you,
> Christel

From cjfields at illinois.edu Wed Nov 16 11:46:16 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 16 Nov 2011 16:46:16 +0000
Subject: [Bioperl-l] print alignment from blast results file
In-Reply-To: <035404FE-A7D9-413A-B6EF-48564FE76426@verizon.net>
References: <32851673.post@talk.nabble.com> <035404FE-A7D9-413A-B6EF-48564FE76426@verizon.net>
Message-ID:

small hint: you can get a Bio::AlignI from the HSP (which can be redirected to a Bio::AlignIO instance).

chris

On Nov 16, 2011, at 7:19 AM, Brian Osborne wrote:

> Nisa,
>
> See:
>
> http://www.bioperl.org/wiki/HOWTO:SearchIO
>
> Brian O.

From David.Messina at sbc.su.se Wed Nov 16 12:01:49 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 16 Nov 2011 18:01:49 +0100
Subject: [Bioperl-l] Bioperl installation help
In-Reply-To:
References:
Message-ID:

Hi Christel,

Sorry to hear you're having trouble with the installation.

It looks like these modules aren't getting installed and are causing the failed tests:

CMUNGALL/Data-Stag-0.11.tar.gz : make NO
FLORA/ExtUtils-Manifest-1.60.tar.gz : make NO

I would try installing those separately via CPAN first and then trying again to install BioPerl.

Also, it was a good idea to set the make_install_make_command option to CPAN, and that should have worked. Unfortunately, there's another installation system called Module::Build that has its own option which may need to be set:

cpan> o conf mbuild_install_build_command 'sudo ./Build'

That being said, I would suggest you grab the latest version of BioPerl from github instead of using v1.6.1 from CPAN, which is fairly out of date at this point.

And unless you're planning to use BioPerl with GBrowse or Bio::Graphics, there's another, simpler way to get BioPerl up and running (assuming you have all the prerequisites like Data::Stag installed):

See "Don't want to install BioPerl?"
here: http://www.seqxml.org/xml/BioPerl.html Best, Dave On Tue, Nov 15, 2011 at 02:39, Christel Chehoud wrote: > Dear BioPerl, > Thank you for creating such useful code. Unfortunately, every time I > try to install Bioperl, it takes me a long time and is a challenging > ordeal :( I am a new MAC user and was not able to download bioperl > using CPAN. Here is the error I am getting: > > ERROR: Can't create '/usr/local/bin' > Do not have write permissions on '/usr/local/bin' > !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > at /Users/Christel/.cpan/build/BioPerl-1.6.0-DiZ0rN/Bio/Root/Build.pm > line 902 > CJFIELDS/BioPerl-1.6.0.tar.gz > ./Build install -- NOT OK > ---- > You may have to su to root to install the package > (Or you may want to run something like > o conf make_install_make_command 'sudo make' > to raise your permissions.Warning (usually harmless): 'YAML' not > installed, will not store persistent state > Failed during this command: > CMUNGALL/Data-Stag-0.11.tar.gz : make NO > CJFIELDS/BioPerl-1.6.0.tar.gz : make_test FAILED but > failure ignored because 'force' in effect > > > so I did: > cpan> o conf make_install_make_command 'sudo make' > followed by > cpan> o conf commit > > and started over..I got the same number of errors as last time (so I > decided not to force install this time). do you have any suggestions: > > 63 tests and 305 subtests skipped. > Failed 11/329 test scripts. 981/17708 subtests failed. > Files=329, Tests=17708, 127 wallclock secs (104.69 cusr + 12.51 csys = > 117.20 CPU) > Failed 11/329 test programs. 981/17708 subtests failed. 
> CJFIELDS/BioPerl-1.6.1.tar.gz > ./Build test -- NOT OK > //hint// to see the cpan-testers results for installing this module, try: > reports CJFIELDS/BioPerl-1.6.1.tar.gz > Warning (usually harmless): 'YAML' not installed, will not store > persistent state > Running Build install > make test had returned bad status, won't install without force > Failed during this command: > CMUNGALL/Data-Stag-0.11.tar.gz : make NO > FLORA/ExtUtils-Manifest-1.60.tar.gz : make NO > CJFIELDS/BioPerl-1.6.1.tar.gz : make_test NO > > > Thanks a lot for your time and help. I appreciate it. > > Thank you, > Christel > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jluis.lavin at unavarra.es Wed Nov 16 13:31:46 2011 From: jluis.lavin at unavarra.es (=?ISO-8859-1?Q?Jos=E9_Luis_Lav=EDn?=) Date: Wed, 16 Nov 2011 19:31:46 +0100 Subject: [Bioperl-l] How to get Remote BLAST results in a single out In-Reply-To: References: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com> Message-ID: Thank you for your answer Jason, While answering you I figured out how to do it...sometimes you need other people's point of view to see the light. As you pointed out: "what is complicaticated is the file name right now is based on the query name." that's what I expected that could have an easy fix, the issue about the dependency between the outfile name and the query name, this is why I couldn't figure out how to change the name of the output . While reading the code to answer you, I came across the solution. I was persistent on doing it this way because I need to run BLAST remotely on my CGI, that's why I didn't pay attention to all the other options you suggested. Thank you all for your sugestions anyway. ;) Best wishes JL El 16 de noviembre de 2011 18:03, Jason Stajich escribi?: > the answer to your question is to move the line that opens a file to > outside the loop. 
> What is complicated is that the file name right now is
> based on the query name, so you need to think about how you want to name the
> file. Since this isn't obvious to you, I think we are suggesting that you
> probably need to understand programming more, and it might just be easier
> to use the tools as we have suggested rather than teaching you to modify
> what is just example code. Our suggestions are based on the way we'd
> solve the problem, so maybe you have other reasons for the direction you
> want to take.
>
> I also think it is not efficient or logical to run
> remote blast through the web protocol simply to write it back out with
> bioperl, since that has to parse it in and then write it out -- why not just
> run the program that generates the output directly from NCBI? Or run BLAST
> locally, which is likely to be more efficient.
>
> Finally, the bioperl writer may not 100% reproduce the blast output, so if
> you are planning on further parsing the output that comes out of this
> script, it really doesn't seem like a good idea to launder it through the
> bioperl parser first.
>
>
>
> 2011/11/14 José Luis Lavín
>
>> Thank you very much for your answers, but due to them, I'm afraid I didn't
>> explained myself good enough.
>>
>> I'm not looking for another tool to perform a BLAST task. I was just
>> wondering if there was a way to simply change the way the module writes
>> the outputs, so that I can get multiple searches in a single report file
>> instead of having a report for each BLAST search.
>>
>> Maybe there's some issue I ignore, that makes you recommend the use of
>> other tools instead of the Bioperl Remote BLAST module...it would be
>> appreciated if you let me know about that (NCBI server problems with
>> web-services or so)...
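
The advice in this thread about getting all searches into a single report file (open the output file once, outside the loop, instead of once per query) might look roughly like the sketch below. It is plain Perl with no BioPerl calls. The helper `write_combined_report`, the file name and the sample report strings are hypothetical, and how each report's text is obtained from RemoteBlast is deliberately left out (one option may be to save each report as before and append the saved file to the combined one).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Write every report to ONE file: the handle is opened once, before the loop.
# $reports is an array ref of [query_name, report_text] pairs; in a real
# script the text would come from each retrieved BLAST result (assumption).
sub write_combined_report {
    my ($path, $reports) = @_;
    open my $out, '>', $path or die "Cannot open $path: $!";
    for my $entry (@$reports) {
        my ($name, $text) = @$entry;
        print {$out} $text;
        # make sure consecutive reports do not run into each other
        print {$out} "\n" unless $text =~ /\n\z/;
    }
    close $out or die "Cannot close $path: $!";
    return scalar @$reports;    # number of reports written
}

my $count = write_combined_report(
    'combined_report.out',
    [
        [ 'seq1', "report text for seq1\n" ],
        [ 'seq2', "report text for seq2\n" ],
    ],
);
print "wrote $count reports to combined_report.out\n";
```

The output name no longer depends on the query name, which is the naming decision mentioned in the quoted reply. A timestamp or the input FASTA file name would be other reasonable choices.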
>> >> Thank you for your answers anyway >> >> Best wishes >> >> 2011/11/14 Fields, Christopher J >> >> > Re: a BLAST+ equivalent for blastcl3, I believe there is an option for >> the >> > various 'blast*' indicating the search is to use a remote database. I >> > haven't used it, though... >> > >> > chris >> > >> > On Nov 14, 2011, at 8:07 AM, Jason Stajich wrote: >> > >> > > Please keep this on list discussions >> > > >> > > Sent from my iPhone-please excuse typos >> > > >> > > -- >> > > Jason Stajich >> > > >> > > Begin forwarded message: >> > > >> > >> From: Jos? Luis Lav?n >> > >> Date: November 14, 2011 8:04:25 AM EST >> > >> To: Jason Stajich >> > >> Subject: Re: [Bioperl-l] Fwd: How to get Remote BLAST results in a >> > single out >> > >> >> > >> Hello Jason, >> > >> >> > >> As answering your question: >> > >> >> > >> " If you want to do this within this code I guess the question is >> what >> > format you want the data in - a BLAST report or something more like a >> > table?" >> > >> >> > >> A concatenation of BLAST (default format) reports should be OK, >> since I >> > have a script to parse that kind of results. Anyway formats 1 or 2 will >> > also do the trick. >> > >> I'll be happy to get assistance on how to change the OUTFILE from "a >> > query a report" to "all queries in the same report", because I don't >> seem >> > to be able to do it myself after reading the module documentation. >> > >> >> > >> Thanks in advance >> > >> >> > >> El 14 de noviembre de 2011 12:59, Jason Stajich < >> > jason.stajich at gmail.com> escribi?: >> > >> if you want to do a bunch of BLASTs remotely on the cmdline you >> should >> > also just use the NCBI's blastcl3 tool (not sure if there is a BLAST+ >> > equivalent). This might be faster to do and easier since you need to >> learn >> > the programming part too. 
>> > >> >> > >> If you want to do this within this code I guess the question is what >> > format you want the data in - a BLAST report or something more like a >> table? >> > >> >> > >> On Nov 14, 2011, at 6:14 AM, Jos? Luis Lav?n wrote: >> > >> >> > >>> Hello everybody, >> > >>> >> > >>> I've been using "Bio::Tools::Run::RemoteBlast" for a time and it >> has >> > >>> worked fine for me. Now I need to perform a multiple BLAST search, >> but >> > this >> > >>> time I'd just like to get all the BLAST results in a single out file >> > >>> instead of having each sequence's report written individually. I've >> > read >> > >>> the documentation of the module, but due to my short >> > >>> experience/understanding on complex modules as this one seems to be >> I >> > can't >> > >>> figure out where to change the script to achieve my previously >> > mentioned >> > >>> aim. >> > >>> Here I post the script I've been using (it's basically the one >> posted >> > on >> > >>> the module cookbook). >> > >>> >> > >>> #!/c:/Perl -w >> > >>> use Bio::Tools::Run::RemoteBlast; >> > >>> use Bio::SearchIO; >> > >>> use Data::Dumper; >> > >>> >> > >>> #Here i set the parameters for blast >> > >>> print "Enter your BLAST choice (blastn, blastp, blastx, tblastn, >> > >>> tblastx):\n"; >> > >>> my $blst = ; >> > >>> my $prog = "$blst"; >> > >>> print "Enter a database to search (nr, refseq_protein, swissprot, >> pat, >> > pdb, >> > >>> env_nr):\n"; >> > >>> my $dtb = ; >> > >>> $db = "$dtb"; >> > >>> print "Enter your cutt off score (1e-n):\n"; >> > >>> my $cut = ; >> > >>> my $e_val = "$cut"; >> > >>> >> > >>> my @params = ( '-prog' => $prog, >> > >>> '-data' => $db, >> > >>> '-expect' => $e_val, >> > >>> '-readmethod' => 'SearchIO' ); >> > >>> >> > >>> my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params); >> > >>> >> > >>> >> > >>> #Select the file and make the blast. 
>> > >>> print "Enter your FASTA file:\n"; >> > >>> chomp(my $infile = ); >> > >>> my $r = $remoteBlast->submit_blast($infile); >> > >>> my $v = 1; >> > >>> >> > >>> print STDERR "waiting..." if( $v > 0 ); ######## WAIT FOR THE >> > RESULTS >> > >>> TO RETURN!!!!! >> > >>> while ( my @rids = $remoteBlast->each_rid ) { >> > >>> foreach my $rid ( @rids ) { >> > >>> my $rc = $remoteBlast->retrieve_blast($rid); >> > >>> if( !ref($rc) ) { >> > >>> if( $rc < 0 ) { >> > >>> $remoteBlast->remove_rid($rid); >> > >>> } >> > >>> print STDERR "." if ( $v > 0 ); >> > >>> sleep 5; >> > >>> } else { >> > >>> my $result = $rc->next_result(); >> > >>> #save the output >> > >>> my $filename = >> > >>> $result->query_name()."\.out";##################open SALIDA, >> > >>> '>>'."$^T"."Report"."\.out"; >> > >>> $remoteBlast->save_output($filename);############# >> > >>> $remoteBlast->remove_rid($rid); >> > >>> print "\nQuery Name: ", $result->query_name(), "\n"; >> > >>> while ( my $hit = $result->next_hit ) { >> > >>> next unless ( $v > 0); >> > >>> print "\thit name is ", $hit->name, "\n"; >> > >>> while( my $hsp = $hit->next_hsp ) { >> > >>> print "\t\tscore is ", $hsp->score, "\n"; >> > >>> } >> > >>> } >> > >>> } >> > >>> } >> > >>> } >> > >>> >> > >>> >> > >>> May any of you please explain me how to solve my question? >> > >>> >> > >>> Thanks in advence >> > >>> >> > >>> With best wishes >> > >>> >> > >>> -- >> > >>> -- >> > >>> Dr. Jos? Luis Lav?n Trueba >> > >>> >> > >>> Dpto. de Producci?n Agraria >> > >>> Grupo de Gen?tica y Microbiolog?a >> > >>> Universidad P?blica de Navarra >> > >>> 31006 Pamplona >> > >>> Navarra >> > >>> SPAIN >> > >>> >> > >>> >> > >>> >> > >>> -- >> > >>> -- >> > >>> Dr. Jos? Luis Lav?n Trueba >> > >>> >> > >>> Dpto. 
de Producción Agraria >> > >>> Grupo de Genética y Microbiología >> > >>> Universidad Pública de Navarra >> > >>> 31006 Pamplona >> > >>> Navarra >> > >>> SPAIN >> > >>> >> > >>> _______________________________________________ >> > >>> Bioperl-l mailing list >> > >>> Bioperl-l at lists.open-bio.org >> > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > -- -- Dr. José Luis Lavín Trueba Dpto. de Producción Agraria Grupo de Genética y Microbiología Universidad Pública de Navarra 31006 Pamplona Navarra SPAIN From l.m.timmermans at students.uu.nl Fri Nov 18 09:15:47 2011 From: l.m.timmermans at students.uu.nl (L.M.
Timmermans) Date: Fri, 18 Nov 2011 15:15:47 +0100 Subject: [Bioperl-l] Blast > parsing result in Exel In-Reply-To: <32846407.post@talk.nabble.com> References: <32846407.post@talk.nabble.com> Message-ID: On Tue, Nov 15, 2011 at 10:25 AM, Giorgio C wrote: > I need to parse in an exel sheet : > What you're saying here is nonsense. I think you meant to say you want to output Excel. > Is possible from a big blast result file obtain an exel with 5 columns > where > every field is the first hit of the blast result. Can anyone halp me to fix > this problem ??? Also with a little script in perl. > There are a number of Perl modules on CPAN for outputting Excel. Try Excel::Writer::XLSX and Spreadsheet::WriteExcel for example. Leon From tzhu at mail.bnu.edu.cn Mon Nov 21 00:17:18 2011 From: tzhu at mail.bnu.edu.cn (Tao Zhu) Date: Mon, 21 Nov 2011 13:17:18 +0800 Subject: [Bioperl-l] Is there a "combine" method that would combine several sequence alignments to a single alignment? Message-ID: <4EC9DEDE.6030901@mail.bnu.edu.cn> I can use the "slice" method to split a single sequence alignment into several subalignments. Then is there a corresponding "combine" method to combine such subalignments back? -- Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing 100875, China Email: tzhu at mail.bnu.edu.cn From David.Messina at sbc.su.se Mon Nov 21 04:58:51 2011 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 21 Nov 2011 10:58:51 +0100 Subject: [Bioperl-l] Is there a "combine" method that would combine several sequence alignments to a single alignment? In-Reply-To: <4EC9DEDE.6030901@mail.bnu.edu.cn> References: <4EC9DEDE.6030901@mail.bnu.edu.cn> Message-ID: Hi, No, I don't believe such a method exists. Could you describe what you are wanting to do? Perhaps there is another way to do it. Dave On Mon, Nov 21, 2011 at 06:17, Tao Zhu wrote: > I can use the "slice" method to split a single sequence alignment into > several subalignments. 
Then is there a corresponding "combine" method to > combine such subalignments back? > > -- > Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing > 100875, China > Email: tzhu at mail.bnu.edu.cn > > ______________________________**_________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/**mailman/listinfo/bioperl-l > From roy.chaudhuri at gmail.com Mon Nov 21 06:41:09 2011 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Mon, 21 Nov 2011 11:41:09 +0000 Subject: [Bioperl-l] Is there a "combine" method that would combine several sequence alignments to a single alignment? In-Reply-To: References: <4EC9DEDE.6030901@mail.bnu.edu.cn> Message-ID: <4ECA38D5.8050709@gmail.com> See the cat method in Bio::Align::Utilities: http://search.cpan.org/~cjfields/BioPerl-1.6.901/Bio/Align/Utilities.pm#cat On 21/11/2011 09:58, Dave Messina wrote: > Hi, > > No, I don't believe such a method exists. Could you describe what you are > wanting to do? Perhaps there is another way to do it. > > > Dave > > > > On Mon, Nov 21, 2011 at 06:17, Tao Zhu wrote: > >> I can use the "slice" method to split a single sequence alignment into >> several subalignments. Then is there a corresponding "combine" method to >> combine such subalignments back? 
>> >> -- >> Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing >> 100875, China >> Email: tzhu at mail.bnu.edu.cn >> >> ______________________________**_________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/**mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From zntayl at gmail.com Wed Nov 16 20:07:07 2011 From: zntayl at gmail.com (Nathan Taylor) Date: Wed, 16 Nov 2011 20:07:07 -0500 Subject: [Bioperl-l] seqIO.pm Message-ID: Hello, Can SeqIO.pm convert a file of fastq reads into .phd files. Or, barring that, a file of fastas and file of quals into .phd files? Many thanks, Nathan From gregonomic at yahoo.co.nz Mon Nov 21 07:00:50 2011 From: gregonomic at yahoo.co.nz (Gregory Baillie) Date: Mon, 21 Nov 2011 04:00:50 -0800 (PST) Subject: [Bioperl-l] Is there a "combine" method that would combine several sequence alignments to a single alignment? In-Reply-To: References: <4EC9DEDE.6030901@mail.bnu.edu.cn> Message-ID: <1321876850.71978.YahooMailNeo@web112508.mail.gq1.yahoo.com> Hi. I've attached a simple script (concatenate_alignments.pl) I wrote to concatenate alignments. It can be a bit of a memory hog if you have long alignments or large numbers of sequences; otherwise you should be OK. Usage: concatenate_alignments.pl -o <... input_alignment_n> If you want to insert a string between the concatenated sequences, you can use the -j option (eg. -j '---'). Greg. ________________________________ From: Dave Messina To: Tao Zhu Cc: BioPerl Sent: Monday, 21 November 2011 7:58 PM Subject: Re: [Bioperl-l] Is there a "combine" method that would combine several sequence alignments to a single alignment? Hi, No, I don't believe such a method exists. Could you describe what you are wanting to do? Perhaps there is another way to do it. 
Dave On Mon, Nov 21, 2011 at 06:17, Tao Zhu wrote: > I can use the "slice" method to split a single sequence alignment into > several subalignments. Then is there a corresponding "combine" method to > combine such subalignments back? > > -- > Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing > 100875, China > Email: tzhu at mail.bnu.edu.cn > > ______________________________**_________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/**mailman/listinfo/bioperl-l > _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l -------------- next part -------------- A non-text attachment was scrubbed... Name: concatenate_alignments.pl Type: application/octet-stream Size: 3349 bytes Desc: not available URL: From jason.stajich at gmail.com Mon Nov 21 10:31:50 2011 From: jason.stajich at gmail.com (Jason Stajich) Date: Mon, 21 Nov 2011 10:31:50 -0500 Subject: [Bioperl-l] Is there a "combine" method that would combine several sequence alignments to a single alignment? In-Reply-To: <1321876850.71978.YahooMailNeo@web112508.mail.gq1.yahoo.com> References: <4EC9DEDE.6030901@mail.bnu.edu.cn> <1321876850.71978.YahooMailNeo@web112508.mail.gq1.yahoo.com> Message-ID: <39ECA743-8C56-4B23-8813-40EEEAB7DBB1@gmail.com> greg -- looks good - you could simplify part of the code to use the .= operator and use AlignIO to write the seqs out. This is my script to combine a directory of MSA aligned .fasaln files into a single concatenated alignment. https://github.com/hyphaltip/genome-scripts/blob/master/phylogenetics/combine_fasaln.pl On Nov 21, 2011, at 7:00 AM, Gregory Baillie wrote: > Hi. > > I've attached a simple script (concatenate_alignments.pl) I wrote to concatenate alignments. > > It can be a bit of a memory hog if you have long alignments or large numbers of sequences; otherwise you should be OK. 
> > Usage: > concatenate_alignments.pl -o <... input_alignment_n> > > > If you want to insert a string between the concatenated sequences, you can use the -j option (eg. -j '---'). > > Greg. > > > ________________________________ > From: Dave Messina > To: Tao Zhu > Cc: BioPerl > Sent: Monday, 21 November 2011 7:58 PM > Subject: Re: [Bioperl-l] Is there a "combine" method that would combine several sequence alignments to a single alignment? > > Hi, > > No, I don't believe such a method exists. Could you describe what you are > wanting to do? Perhaps there is another way to do it. > > > Dave > > > > On Mon, Nov 21, 2011 at 06:17, Tao Zhu wrote: > >> I can use the "slice" method to split a single sequence alignment into >> several subalignments. Then is there a corresponding "combine" method to >> combine such subalignments back? >> >> -- >> Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing >> 100875, China >> Email: tzhu at mail.bnu.edu.cn >> >> ______________________________**_________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/**mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l_______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From p.j.a.cock at googlemail.com Mon Nov 21 11:15:13 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 21 Nov 2011 16:15:13 +0000 Subject: [Bioperl-l] seqIO.pm In-Reply-To: References: Message-ID: On Thu, Nov 17, 2011 at 1:07 AM, Nathan Taylor wrote: > Hello, > > ? Can SeqIO.pm convert a file of fastq reads into .phd files. Or, > barring that, a file of fastas and file of quals into .phd files? > > Many thanks, > Nathan In principle that is possible (e.g. Biopython can do fastq to phd). 
Have you tried using BioPerl's SeqIO to do this? Was there an error message? Peter From cjfields at illinois.edu Mon Nov 21 11:57:29 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 21 Nov 2011 16:57:29 +0000 Subject: [Bioperl-l] seqIO.pm In-Reply-To: References: Message-ID: <2E075A8F-92F9-4A04-9254-EF4C07793A7C@illinois.edu> On Nov 21, 2011, at 10:15 AM, Peter Cock wrote: > On Thu, Nov 17, 2011 at 1:07 AM, Nathan Taylor wrote: >> Hello, >> >> Can SeqIO.pm convert a file of fastq reads into .phd files. Or, >> barring that, a file of fastas and file of quals into .phd files? >> >> Many thanks, >> Nathan > > In principle that is possible (e.g. Biopython can do fastq to phd). > Have you tried using BioPerl's SeqIO to do this? Was there an > error message? > > Peter This should be possible in either circumstance (FASTQ should be more straightforward); there is a Bio::SeqIO::phd module for this purpose. Nathan, if you run into problems with that conversion, let us know. chris From rondonbio at yahoo.com.br Mon Nov 21 12:31:21 2011 From: rondonbio at yahoo.com.br (Rondon Neto) Date: Mon, 21 Nov 2011 09:31:21 -0800 (PST) Subject: [Bioperl-l] seqIO.pm In-Reply-To: References: Message-ID: <1321896681.57079.YahooMailNeo@web130205.mail.mud.yahoo.com> Hi! Try this script:
#!/usr/bin/perl
use warnings;
use strict;
use Bio::SeqIO;

if (!$ARGV[0]) { die "\n### USAGE::: perl $0 [file.fastq]\n"; }

my $fastq = $ARGV[0];

my $in = Bio::SeqIO->new( -file => $fastq,
                          -format => 'fastq' );

my $out = Bio::SeqIO->new ( -file => ">out.phd",
                            -format=> 'phd');

while (my $seq = $in->next_seq()) {
    $out->write_seq($seq);
}

exit;

Best wishes, Rondon, a Brazilian friend. ________________________________ From: Peter Cock To: Nathan Taylor Cc: bioperl-l at bioperl.org Sent: Monday, 21 November 2011 14:15 Subject: Re: [Bioperl-l] seqIO.pm On Thu, Nov 17, 2011 at 1:07 AM, Nathan Taylor wrote: > Hello, > >
Can SeqIO.pm convert a file of fastq reads into .phd files. Or, > barring that, a file of fastas and file of quals into .phd files? > > Many thanks, > Nathan In principle that is possible (e.g. Biopython can do fastq to phd). Have you tried using BioPerl's SeqIO to do this? Was there an error message? Peter _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From Russell.Smithies at agresearch.co.nz Mon Nov 21 15:04:01 2011 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 22 Nov 2011 09:04:01 +1300 Subject: [Bioperl-l] seqIO.pm In-Reply-To: <1321896681.57079.YahooMailNeo@web130205.mail.mud.yahoo.com> References: <1321896681.57079.YahooMailNeo@web130205.mail.mud.yahoo.com> Message-ID: <18DF7D20DFEC044098A1062202F5FFF340186CF1AB@exchsth.agresearch.co.nz> Or you could use the built-in script bp_sreformat.pl --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Rondon Neto > Sent: Tuesday, 22 November 2011 6:31 a.m. > Cc: bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] seqIO.pm > > Hi! try this script: > > #!/usr/bin/perl > use warnings; > use strict; > use Bio::SeqIO; > > if (!$ARGV[0]) { die "\n### USAGE::: perl $0 [file.fastq]\n"; } > > my $fastq = $ARGV[0]; > > my $in = Bio::SeqIO->new( -file => $fastq, > -format => 'fastq' ); > > my $out = Bio::SeqIO->new ( -file => ">out.phd", > -format=> 'phd'); > > while (my $seq = $in->next_seq()) { > $out->write_seq($seq); > } > > exit; > > > Best wishes, > Rondon, a brazilian friend.
> > > > > > > ________________________________ > From: Peter Cock > To: Nathan Taylor > Cc: bioperl-l at bioperl.org > Sent: Monday, 21 November 2011 14:15 > Subject: Re: [Bioperl-l] seqIO.pm > > On Thu, Nov 17, 2011 at 1:07 AM, Nathan Taylor wrote: > > Hello, > > > > Can SeqIO.pm convert a file of fastq reads into .phd files. Or, > > barring that, a file of fastas and file of quals into .phd files? > > > > Many thanks, > > Nathan > > In principle that is possible (e.g. Biopython can do fastq to phd). > Have you tried using BioPerl's SeqIO to do this? Was there an error message? > > Peter > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From goodyearkl at gmail.com Mon Nov 21 21:23:13 2011 From: goodyearkl at gmail.com (Kylie Goodyear) Date: Mon, 21 Nov 2011 18:23:13 -0800 (PST) Subject: [Bioperl-l] Fasta counting script?
Message-ID: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com> Hi, This may seem like a stupid question but I am just learning bioperl and I am trying to figure out how to get a count of all the characters in my FASTA file. I managed to get the number of sequences using the following. Is there a way to tell bioperl to count the characters?
#!/usr/bin/perl -w
#Defines perl modules
#Bio::Seq deal with sequences and their features
use Bio::Seq;
#Bio::SeqIO handles reading and parsing of sequences of many different formats
use Bio::SeqIO;
#Read FASTA file
$seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format => "fasta" );
#Count how many sequences are present in file
my $count=0;
while (my $seq_obj = $seqio_obj->next_seq) {
    $count++;
}
#Display the number of sequences present
print "There are $count sequences present.\n";
From David.Messina at sbc.su.se Tue Nov 22 03:08:11 2011 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 22 Nov 2011 09:08:11 +0100 Subject: [Bioperl-l] Fasta counting script? In-Reply-To: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com> References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com> Message-ID: Hi Kylie, You can use the length method for this. my $seq_length = $seq_obj->length(); Have you taken a look at the beginner's HOWTO? There's a nice table of sequence methods as well as lots of other good information in there. http://www.bioperl.org/wiki/HOWTO:Beginners Dave On Tue, Nov 22, 2011 at 03:23, Kylie Goodyear wrote: > Hi, > This may seem like a stupid question but I am just learning bioperl > and I am trying to figure out how to get a count of all the characters > in my FASTA file. I managed to get the number of sequences using the > following. Is there a way to tell bioperl to count the characters?
> > #!/usr/bin/perl -w > #Defines perl modules > #Bio::Seq deal with sequences and their features > use Bio::Seq; > #Bio::SeqIO handles reading and parsing of sequences of many different > formats > use Bio::SeqIO; > #Read FASTA file > $seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format > => "fasta" ); > #Count how many sequences are present in file > my $count=0; > while (my $seq_obj = $seqio_obj->next_seq) { > $count++; > } > #Display the number of sequences present > print "There are $count sequences present.\n"; > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From liam.elbourne at mq.edu.au Mon Nov 21 23:11:12 2011 From: liam.elbourne at mq.edu.au (Liam Elbourne) Date: Tue, 22 Nov 2011 15:11:12 +1100 Subject: [Bioperl-l] Fasta counting script? In-Reply-To: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com> References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com> Message-ID: Hi Kylie, I think the length() method is what you're after: .... my $sequence_length = $seq_obj->length(); .... in your case. Have a look at: HOWTO:SeqIO - BioPerl and, HOWTO:Beginners - BioPerl for some more general stuff. Regards, Liam. On 22/11/2011, at 1:23 PM, Kylie Goodyear wrote: > Hi, > This may seem like a stupid question but I am just learning bioperl > and I am trying to figure out how to get a count of all the characters > in my FASTA file. I manged to get the number of sequences using the > following. Is there a way to tell bioperl to count the characters? 
> > #!/usr/bin/perl -w > #Defines perl modules > #Bio::Seq deal with sequences and their features > use Bio::Seq; > #Bio::SeqIO handles reading and parsing of sequences of many different > formats > use Bio::SeqIO; > #Read FASTA file > $seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format > => "fasta" ); > #Count how many sequences are present in file > my $count=0; > while (my $seq_obj = $seqio_obj->next_seq) { > $count++; > } > #Display the number of sequences present > print "There are $count sequences present.\n"; > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From goodyearkl at gmail.com Tue Nov 22 08:00:55 2011 From: goodyearkl at gmail.com (Kylie Goodyear) Date: Tue, 22 Nov 2011 05:00:55 -0800 (PST) Subject: [Bioperl-l] Fasta counting script? In-Reply-To: References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com> Message-ID: <273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com> Thank you for your help. It keeps telling me that it can't find "length". Do you think it has to do with the way I am coding it?
#!/usr/bin/perl -w
#Defines perl modules
#Bio::Seq deal with sequences and their features
use Bio::Seq;
#Bio::SeqIO handles reading and parsing of sequences of many different formats
use Bio::SeqIO;
#Read FASTA file
$seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format => "fasta" );
#Count how many sequences are present in file
my $countseq=0;
while (my $seq_obj = $seqio_obj->next_seq, ) {
    $countseq++;
}
#Display the number of sequences present
print "There are $countseq sequences present.\n";
#Count number of charcaters in file
my $seq_length = $seq_obj->length ;
print $seq_length
On Nov 22, 5:08 am, Dave Messina wrote: > Hi Kylie, > > You can use the length method for this. > > my $seq_length = $seq_obj->length(); > > Have you taken a look at the beginner's HOWTO? There's a nice table of > sequence methods as well lots of other good information in there. > > http://www.bioperl.org/wiki/HOWTO:Beginners > > Dave > > > > > > > > > > On Tue, Nov 22, 2011 at 03:23, Kylie Goodyear wrote: > > Hi, > > This may seem like a stupid question but I am just learning bioperl > > and I am trying to figure out how to get a count of all the characters > > in my FASTA file. I manged to get the number of sequences using the > > following. Is there a way to tell bioperl to count the characters? > > > #!/usr/bin/perl -w > > #Defines perl modules > > #Bio::Seq deal with sequences and their features > > use Bio::Seq; > > #Bio::SeqIO handles reading and parsing of sequences of many different > > formats > > use Bio::SeqIO; > > #Read FASTA file > > $seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format > > => "fasta" ); > > #Count how many sequences are present in file > > my $count=0; > > while (my $seq_obj = $seqio_obj->next_seq) { > > $count++; > > } > > #Display the number of sequences present > > print "There are $count sequences present.\n"; > > > _______________________________________________ > > Bioperl-l mailing list > > Bioper...
at lists.open-bio.org > >http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioper... at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l From roy.chaudhuri at gmail.com Tue Nov 22 10:50:31 2011 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Tue, 22 Nov 2011 15:50:31 +0000 Subject: [Bioperl-l] Fasta counting script? In-Reply-To: <273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com> References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com> <273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com> Message-ID: <4ECBC4C7.10401@gmail.com> Hi Kylie, I suspect the error you get is actually "Can't call method length on an undefined value" (please in future report the exact text of any error messages). You declare $seq_obj with "my" in the while loop, but then try to access it outside of the loop. Try printing out the length of each $seq_obj within the while loop. You should always include "use strict;" at the top of your program, that helps to catch errors like this. Cheers, Roy. On 22/11/2011 13:00, Kylie Goodyear wrote: > Thank you for your help. It keeps telling me that it can't find > "length" do you think it has to do with the way I am coding it? 
> > #!/usr/bin/perl -w > #Defines perl modules > > #Bio::Seq deal with sequences and their features > use Bio::Seq; > > #Bio::SeqIO handles reading and parsing of sequences of many different > formats > use Bio::SeqIO; > > > #Read FASTA file > $seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format > => "fasta" ); > > > #Count how many sequences are present in file > my $countseq=0; > while (my $seq_obj = $seqio_obj->next_seq, ) { > $countseq++; > } > #Display the number of sequences present > print "There are $countseq sequences present.\n"; > > #Count number of charcaters in file > my $seq_length = $seq_obj->length ; > print $seq_length > > > On Nov 22, 5:08 am, Dave Messina wrote: >> Hi Kylie, >> >> You can use the length method for this. >> >> my $seq_length = $seq_obj->length(); >> >> Have you taken a look at the beginner's HOWTO? There's a nice table of >> sequence methods as well lots of other good information in there. >> >> http://www.bioperl.org/wiki/HOWTO:Beginners >> >> Dave >> >> >> >> >> >> >> >> >> >> On Tue, Nov 22, 2011 at 03:23, Kylie Goodyear wrote: >>> Hi, >>> This may seem like a stupid question but I am just learning bioperl >>> and I am trying to figure out how to get a count of all the characters >>> in my FASTA file. I manged to get the number of sequences using the >>> following. Is there a way to tell bioperl to count the characters? 
>> >>> #!/usr/bin/perl -w >>> #Defines perl modules >>> #Bio::Seq deal with sequences and their features >>> use Bio::Seq; >>> #Bio::SeqIO handles reading and parsing of sequences of many different >>> formats >>> use Bio::SeqIO; >>> #Read FASTA file >>> $seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format >>> => "fasta" ); >>> #Count how many sequences are present in file >>> my $count=0; >>> while (my $seq_obj = $seqio_obj->next_seq) { >>> $count++; >>> } >>> #Display the number of sequences present >>> print "There are $count sequences present.\n"; >> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioper... at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioper... at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Nov 22 11:13:01 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 22 Nov 2011 16:13:01 +0000 Subject: [Bioperl-l] Fasta counting script? In-Reply-To: <273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com> References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com> <273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com> Message-ID: <0507832F-51F3-4946-BDCD-6913BBE697BC@illinois.edu> This sounds a little homework-y. Sure this isn't for a class? :) One clue (and a good thing to keep in mind): always 'use strict; use warnings;' with your scripts if you are new to perl. Doing so would let you know there is a problem with the script the way it is written, specifically, the place where you are inquiring about the length. chris On Nov 22, 2011, at 7:00 AM, Kylie Goodyear wrote: > Thank you for your help. 
It keeps telling me that it can't find > "length" do you think it has to do with the way I am coding it? > > #!/usr/bin/perl -w > #Defines perl modules > > #Bio::Seq deal with sequences and their features > use Bio::Seq; > > #Bio::SeqIO handles reading and parsing of sequences of many different > formats > use Bio::SeqIO; > > > #Read FASTA file > $seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format > => "fasta" ); > > > #Count how many sequences are present in file > my $countseq=0; > while (my $seq_obj = $seqio_obj->next_seq, ) { > $countseq++; > } > #Display the number of sequences present > print "There are $countseq sequences present.\n"; > > #Count number of charcaters in file > my $seq_length = $seq_obj->length ; > print $seq_length > > > On Nov 22, 5:08 am, Dave Messina wrote: >> Hi Kylie, >> >> You can use the length method for this. >> >> my $seq_length = $seq_obj->length(); >> >> Have you taken a look at the beginner's HOWTO? There's a nice table of >> sequence methods as well lots of other good information in there. >> >> http://www.bioperl.org/wiki/HOWTO:Beginners >> >> Dave >> >> >> >> >> >> >> >> >> >> On Tue, Nov 22, 2011 at 03:23, Kylie Goodyear wrote: >>> Hi, >>> This may seem like a stupid question but I am just learning bioperl >>> and I am trying to figure out how to get a count of all the characters >>> in my FASTA file. I manged to get the number of sequences using the >>> following. Is there a way to tell bioperl to count the characters? 
>> >>> #!/usr/bin/perl -w >>> #Defines perl modules >>> #Bio::Seq deal with sequences and their features >>> use Bio::Seq; >>> #Bio::SeqIO handles reading and parsing of sequences of many different >>> formats >>> use Bio::SeqIO; >>> #Read FASTA file >>> $seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format >>> => "fasta" ); >>> #Count how many sequences are present in file >>> my $count=0; >>> while (my $seq_obj = $seqio_obj->next_seq) { >>> $count++; >>> } >>> #Display the number of sequences present >>> print "There are $count sequences present.\n"; >> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioper... at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioper... at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Russell.Smithies at agresearch.co.nz Tue Nov 22 15:47:36 2011 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Wed, 23 Nov 2011 09:47:36 +1300 Subject: [Bioperl-l] Fasta counting script? In-Reply-To: <0507832F-51F3-4946-BDCD-6913BBE697BC@illinois.edu> References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com> <273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com> <0507832F-51F3-4946-BDCD-6913BBE697BC@illinois.edu> Message-ID: <18DF7D20DFEC044098A1062202F5FFF340186CF1B8@exchsth.agresearch.co.nz> Or again, you could use the builtin scripts bp_seq_length.pl or bp_gccalc.pl As previous posters have hinted, RTFM - the answers are all in there! 
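For readers who would rather see the fix spelled out than reach for the bundled scripts, the advice given in this thread (call length on each sequence object inside the while loop, where $seq_obj is still in scope, and run under strict and warnings) can be sketched roughly as below. This is only a minimal sketch, not the code of bp_seq_length.pl; the sub name count_fasta and the wording of the output line are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Count the sequences and the total number of residues in a FASTA file.
# count_fasta is a hypothetical helper name chosen for this sketch.
sub count_fasta {
    my ($path) = @_;
    my $seqio = Bio::SeqIO->new( -file => $path, -format => 'fasta' );
    my ( $n_seqs, $n_chars ) = ( 0, 0 );
    while ( my $seq_obj = $seqio->next_seq ) {
        $n_seqs++;
        # $seq_obj only exists inside this loop, so length is called here
        $n_chars += $seq_obj->length;
    }
    return ( $n_seqs, $n_chars );
}

# Only run the report when called as a script with a file argument.
if ( !caller && @ARGV ) {
    my ( $n_seqs, $n_chars ) = count_fasta( $ARGV[0] );
    print "There are $n_seqs sequences and $n_chars characters in total.\n";
}
```

Saved under any name, it would be run as: perl script.pl DNA_sequences.fasta. The total counts sequence residues only, not header lines or newlines.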
--Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Fields, Christopher J > Sent: Wednesday, 23 November 2011 5:13 a.m. > To: Kylie Goodyear > Cc: > Subject: Re: [Bioperl-l] Fasta counting script? > > This sounds a little homework-y. Sure this isn't for a class? :) > > One clue (and a good thing to keep in mind): always 'use strict; use warnings;' > with your scripts if you are new to perl. Doing so would let you know there is > a problem with the script the way it is written, specifically, the place where > you are inquiring about the length. > > chris > > On Nov 22, 2011, at 7:00 AM, Kylie Goodyear wrote: > > > Thank you for your help. It keeps telling me that it can't find > > "length" do you think it has to do with the way I am coding it? > > > > #!/usr/bin/perl -w > > #Defines perl modules > > > > #Bio::Seq deal with sequences and their features use Bio::Seq; > > > > #Bio::SeqIO handles reading and parsing of sequences of many different > > formats use Bio::SeqIO; > > > > > > #Read FASTA file > > $seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format > > => "fasta" ); > > > > > > #Count how many sequences are present in file my $countseq=0; while > > (my $seq_obj = $seqio_obj->next_seq, ) { > > $countseq++; > > } > > #Display the number of sequences present print "There are $countseq > > sequences present.\n"; > > > > #Count number of charcaters in file > > my $seq_length = $seq_obj->length ; > > print $seq_length > > > > > > On Nov 22, 5:08 am, Dave Messina wrote: > >> Hi Kylie, > >> > >> You can use the length method for this. > >> > >> my $seq_length = $seq_obj->length(); > >> > >> Have you taken a look at the beginner's HOWTO? There's a nice table > >> of sequence methods as well lots of other good information in there. 
> >> > >> http://www.bioperl.org/wiki/HOWTO:Beginners > >> > >> Dave > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> On Tue, Nov 22, 2011 at 03:23, Kylie Goodyear > wrote: > >>> Hi, > >>> This may seem like a stupid question but I am just learning bioperl > >>> and I am trying to figure out how to get a count of all the > >>> characters in my FASTA file. I manged to get the number of sequences > >>> using the following. Is there a way to tell bioperl to count the characters? > >> > >>> #!/usr/bin/perl -w > >>> #Defines perl modules > >>> #Bio::Seq deal with sequences and their features use Bio::Seq; > >>> #Bio::SeqIO handles reading and parsing of sequences of many > >>> different formats use Bio::SeqIO; #Read FASTA file $seqio_obj = > >>> Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format => "fasta" > >>> ); #Count how many sequences are present in file my $count=0; while > >>> (my $seq_obj = $seqio_obj->next_seq) { > >>> $count++; > >>> } > >>> #Display the number of sequences present print "There are $count > >>> sequences present.\n"; > >> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioper... at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioper... 
at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinf > >> o/bioperl-l > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From charles-listes+bioperl at plessy.org Wed Nov 23 05:27:45 2011 From: charles-listes+bioperl at plessy.org (Charles Plessy) Date: Wed, 23 Nov 2011 19:27:45 +0900 Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ? Message-ID: <20111123102745.GC20168@merveille.plessy.net> Dear BioPerl developers, I am trying to process some unaligned paired-end reads with Bio::DB::Sam. For each pair, I want to detect a sequence index and a unique molecular identifier in the linker, record them as auxiliary flags, and trim the linker from the read. I collect the pairs through a features iterator, and can access all their data through the high-level Bio::DB::Bam::Alignment API. After modifying them (linker trimming and adding flags), I want to write the resulting pairs as a new unaligned BAM file. 
I apologise if the solution is trivial, but my problem is that I do not manage to
modify the Bio::DB::Bam::Alignment objects. Typically, attempts such as
«$pair[0]->qseq("GATACA")» give errors like
«Usage: Bio::DB::Bam::Alignment::qseq(b) at
/usr/lib/perl5/Bio/DB/Bam/AlignWrapper.pm line 80».

Since I did not find explanations or portions of source code indicating how to
modify Bio::DB::Bam::Alignment objects, I wonder if they are read-only?

Have a nice day,

--
Charles Plessy
Tsurumi, Kanagawa, Japan

From MEC at stowers.org Wed Nov 23 11:02:26 2011
From: MEC at stowers.org (Cook, Malcolm)
Date: Wed, 23 Nov 2011 10:02:26 -0600
Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
In-Reply-To: <20111123102745.GC20168@merveille.plessy.net>
References: <20111123102745.GC20168@merveille.plessy.net>
Message-ID: <2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org>

Charles,

I suggest you reconsider your approach: rather, use `samtools view` to pipe your reads to stdout in SAM format, then stream-edit the barcode and pipe the result back to samtools for conversion back to a .bam file.

I know this is not what you're asking. I'm pretty sure that the direct answer to your question is, "yes - they are read-only".

~Malcolm

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Charles Plessy
> Sent: Wednesday, November 23, 2011 4:28 AM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
>
> Dear BioPerl developers,
>
> I am trying to process some unaligned paired-end reads with Bio::DB::Sam.
> For
> each pair, I want to detect a sequence index and a unique molecular
> identifier in
> the linker, record them as auxiliary flags, and trim the linker from the read.
>
> I collect the pairs through a features iterator, and can access all their data
> through the high-level Bio::DB::Bam::Alignment API.
> After modifying them
> (linker trimming and adding flags), I want to write the resulting pairs as a
> new unaligned BAM file.
>
> I apologise if the solution is trivial, but my problem is that I do not manage to
> modify the Bio::DB::Bam::Alignment objects. Typically, attempts such as
> «$pair[0]->qseq("GATACA")» give errors like
> «Usage: Bio::DB::Bam::Alignment::qseq(b) at
> /usr/lib/perl5/Bio/DB/Bam/AlignWrapper.pm line 80».
>
> Since I did not find explanations or portions of source code indicating how to
> modify Bio::DB::Bam::Alignment objects, I wonder if they are read-only?
>
> Have a nice day,
>
> --
> Charles Plessy
> Tsurumi, Kanagawa, Japan
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Wed Nov 23 14:26:31 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 23 Nov 2011 19:26:31 +0000
Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
In-Reply-To: <2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org>
References: <20111123102745.GC20168@merveille.plessy.net> <2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org>
Message-ID:

According to the docs for the low-level API of Bio-Samtools, both read and write are allowed:

http://search.cpan.org/perldoc?Bio::DB::Sam#The_low-level_API

Using the low-level API for this purpose isn't documented as well, though (the high-level API is read-only AFAICT).

The error message is a standard one generated from the XS bindings when the passed argument isn't mapped correctly. Looking through the Sam.xs file, qseq() is only prototyped as a reader; the only arg is a Bio::DB::Bam::Alignment (i.e. $self). However, it appears there is a function specified for Bio::DB::Bam::Alignment named l_qseq() that might be the setter, whereas qseq() may be the getter (ignore the 'bama_' prefix):

....
int
bama_l_qseq(b,...)
    Bio::DB::Bam::Alignment b
PROTOTYPE: $;$
CODE:
    if (items > 1)
        b->core.l_qseq = SvIV(ST(1));
    RETVAL=b->core.l_qseq;
OUTPUT:
    RETVAL

SV*
bama_qseq(b)
    Bio::DB::Bam::Alignment b
PROTOTYPE: $
PREINIT:
    char* seq;
    int i;
CODE:
    seq = Newxz(seq,b->core.l_qseq+1,char);
    for (i=0;i<b->core.l_qseq;i++) {
        seq[i]=bam_nt16_rev_table[bam1_seqi(bam1_seq(b),i)];
    }
    RETVAL = newSVpv(seq,b->core.l_qseq);
    Safefree(seq);
OUTPUT:
    RETVAL

-chris

On Nov 23, 2011, at 10:02 AM, Cook, Malcolm wrote:

> Charles,
>
> I suggest you reconsider your approach to rather, use `samtools view` to pipe your reads to stdout in sam format, then stream edit the barcode and pipe it back to samtools for conversion back to .bam file.
>
> I know this is not what you're asking. I'm pretty sure that direct answer to your question is, "yes - they are read-only".
>
> ~Malcolm
>
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Charles Plessy
>> Sent: Wednesday, November 23, 2011 4:28 AM
>> To: bioperl-l at bioperl.org
>> Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
>>
>> Dear BioPerl developers,
>>
>> I am trying to process some unaligned paired-end reads with Bio::DB::Sam.
>> For
>> each pair, I want to detect a sequence index and a unique molecular
>> identifier in
>> the linker, record them as auxiliary flags, and trim the linker from the read.
>>
>> I collect the pairs through a features iterator, and can access all their data
>> through the high-level Bio::DB::Bam::Alignment API. After modifying them
>> (linker trimming and adding flags), I want to write the resulting pairs as a
>> new unaligned BAM file.
>>
>> I apologise if the solution is trivial, but my problem is that I do not manage to
>> modify the Bio::DB::Bam::Alignment objects. Typically, attempts such as
>> «$pair[0]->qseq("GATACA")»
give errors like >> ?Usage: Bio::DB::Bam::Alignment::qseq(b) at >> /usr/lib/perl5/Bio/DB/Bam/AlignWrapper.pm line 80?. >> >> Since I did not find explanations or portsions of source code indicating how to >> modify Bio::DB::Bam::Alignment objects, I wonder if they are read-only? >> >> Have a nice day, >> >> -- >> Charles Plessy >> Tsurumi, Kanagawa, Japan >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From lincoln.stein at gmail.com Wed Nov 23 17:02:23 2011 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Thu, 24 Nov 2011 06:02:23 +0800 Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ? In-Reply-To: <20111123102745.GC20168@merveille.plessy.net> References: <20111123102745.GC20168@merveille.plessy.net> Message-ID: I apologize that the qseq() method is only allowing read-only access. I will attempt to fix this. Lincoln On Wed, Nov 23, 2011 at 6:27 PM, Charles Plessy < charles-listes+bioperl at plessy.org> wrote: > Dear BioPerl developers, > > I am trying to process some unaligned paired-end reads with Bio::DB::Sam. > For > each pair, I want to detect a sequence index and a unique molecular > identifier in > the linker, record them as auxiliary flags, and trim the linker from the > read. > > I collect the pairs through a features iterator, and can access all their > data > through the high-level Bio::DB::Bam::Alignment API. After modifying them > (linker trimming and adding flags), I want to write the resulting pairs as > a > new unaligned BAM file. > > I apologise if the solution is trivial, but my problem is that I do not > manage to > modify the Bio::DB::Bam::Alignment objects. Typically, attempts such as > ?$pair[0]->qseq("GATACA")? 
give errors like > ?Usage: Bio::DB::Bam::Alignment::qseq(b) at > /usr/lib/perl5/Bio/DB/Bam/AlignWrapper.pm line 80?. > > Since I did not find explanations or portsions of source code indicating > how to > modify Bio::DB::Bam::Alignment objects, I wonder if they are read-only? > > Have a nice day, > > -- > Charles Plessy > Tsurumi, Kanagawa, Japan > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From lincoln.stein at gmail.com Wed Nov 23 17:05:41 2011 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Thu, 24 Nov 2011 06:05:41 +0800 Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ? In-Reply-To: References: <20111123102745.GC20168@merveille.plessy.net> <2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org> Message-ID: Unfortunately l_qseq read/writes the length of the query sequence, not the sequence itself. Lincoln On Thu, Nov 24, 2011 at 3:26 AM, Fields, Christopher J < cjfields at illinois.edu> wrote: > According to the docs the low-level API for Bio-Samtools, both read and > write are allowed: > > http://search.cpan.org/perldoc?Bio::DB::Sam#The_low-level_API > > Using the low-level API for this purpose isn't documented as well, though > (the high-level API is read only AFAICT). > > The error message is a standard one generated from the XS bindings where > the passed argument passed isn't mapped correctly. Looking through the > Sam.xs file, qseq() is only prototyped as a reader; the only arg is a > Bio::DB::Bam::Alignment (e.g. $self). 
However, it appears there is a > function specified for Bio::DB::Bam::Alignment names l_qseq() that might be > the setter, wheras qseq() is maybe to be the getter (ignore the 'bama_' > prefix): > > .... > > int > bama_l_qseq(b,...) > Bio::DB::Bam::Alignment b > PROTOTYPE: $;$ > CODE: > if (items > 1) > b->core.l_qseq = SvIV(ST(1)); > RETVAL=b->core.l_qseq; > OUTPUT: > RETVAL > > SV* > bama_qseq(b) > Bio::DB::Bam::Alignment b > PROTOTYPE: $ > PREINIT: > char* seq; > int i; > CODE: > seq = Newxz(seq,b->core.l_qseq+1,char); > for (i=0;icore.l_qseq;i++) { > seq[i]=bam_nt16_rev_table[bam1_seqi(bam1_seq(b),i)]; > } > RETVAL = newSVpv(seq,b->core.l_qseq); > Safefree(seq); > OUTPUT: > RETVAL > > > -chris > > On Nov 23, 2011, at 10:02 AM, Cook, Malcolm wrote: > > > Charles, > > > > I suggest you reconsider your approach to rather, use `samtools view` to > pipe your reads to stdout in sam format, then stream edit the barcode and > pipe it back to samtools for conversion back to .bam file. > > > > I know this is not what you're asking. I'm pretty sure that direct > answer to your question is, "yes - they are read-only". > > > > ~Malcolm > > > > > >> -----Original Message----- > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >> bounces at lists.open-bio.org] On Behalf Of Charles Plessy > >> Sent: Wednesday, November 23, 2011 4:28 AM > >> To: bioperl-l at bioperl.org > >> Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ? > >> > >> Dear BioPerl developers, > >> > >> I am trying to process some unaligned paired-end reads with > Bio::DB::Sam. > >> For > >> each pair, I want to detect a sequence index and a unique molecular > >> identifier in > >> the linker, record them as auxiliary flags, and trim the linker from > the read. > >> > >> I collect the pairs through a features iterator, and can access all > their data > >> through the high-level Bio::DB::Bam::Alignment API. 
After modifying > them > >> (linker trimming and adding flags), I want to write the resulting pairs > as a > >> new unaligned BAM file. > >> > >> I apologise if the solution is trivial, but my problem is that I do not > manage to > >> modify the Bio::DB::Bam::Alignment objects. Typically, attempts such as > >> ?$pair[0]->qseq("GATACA")? give errors like > >> ?Usage: Bio::DB::Bam::Alignment::qseq(b) at > >> /usr/lib/perl5/Bio/DB/Bam/AlignWrapper.pm line 80?. > >> > >> Since I did not find explanations or portsions of source code > indicating how to > >> modify Bio::DB::Bam::Alignment objects, I wonder if they are read-only? > >> > >> Have a nice day, > >> > >> -- > >> Charles Plessy > >> Tsurumi, Kanagawa, Japan > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From cjfields at illinois.edu Wed Nov 23 20:07:09 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 24 Nov 2011 01:07:09 +0000 Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ? In-Reply-To: References: <20111123102745.GC20168@merveille.plessy.net> <2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org> , Message-ID: <92CA8F24-47CB-42AF-8C20-9C4765A592A5@illinois.edu> Ah, okay, makes sense. I thought it was oddly named. 
:) Chris Sent from my iPad On Nov 23, 2011, at 4:05 PM, "Lincoln Stein" > wrote: Unfortunately l_qseq read/writes the length of the query sequence, not the sequence itself. Lincoln On Thu, Nov 24, 2011 at 3:26 AM, Fields, Christopher J > wrote: According to the docs the low-level API for Bio-Samtools, both read and write are allowed: http://search.cpan.org/perldoc?Bio::DB::Sam#The_low-level_API Using the low-level API for this purpose isn't documented as well, though (the high-level API is read only AFAICT). The error message is a standard one generated from the XS bindings where the passed argument passed isn't mapped correctly. Looking through the Sam.xs file, qseq() is only prototyped as a reader; the only arg is a Bio::DB::Bam::Alignment (e.g. $self). However, it appears there is a function specified for Bio::DB::Bam::Alignment names l_qseq() that might be the setter, wheras qseq() is maybe to be the getter (ignore the 'bama_' prefix): .... int bama_l_qseq(b,...) Bio::DB::Bam::Alignment b PROTOTYPE: $;$ CODE: if (items > 1) b->core.l_qseq = SvIV(ST(1)); RETVAL=b->core.l_qseq; OUTPUT: RETVAL SV* bama_qseq(b) Bio::DB::Bam::Alignment b PROTOTYPE: $ PREINIT: char* seq; int i; CODE: seq = Newxz(seq,b->core.l_qseq+1,char); for (i=0;icore.l_qseq;i++) { seq[i]=bam_nt16_rev_table[bam1_seqi(bam1_seq(b),i)]; } RETVAL = newSVpv(seq,b->core.l_qseq); Safefree(seq); OUTPUT: RETVAL -chris On Nov 23, 2011, at 10:02 AM, Cook, Malcolm wrote: > Charles, > > I suggest you reconsider your approach to rather, use `samtools view` to pipe your reads to stdout in sam format, then stream edit the barcode and pipe it back to samtools for conversion back to .bam file. > > I know this is not what you're asking. I'm pretty sure that direct answer to your question is, "yes - they are read-only". 
> > ~Malcolm > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Charles Plessy >> Sent: Wednesday, November 23, 2011 4:28 AM >> To: bioperl-l at bioperl.org >> Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ? >> >> Dear BioPerl developers, >> >> I am trying to process some unaligned paired-end reads with Bio::DB::Sam. >> For >> each pair, I want to detect a sequence index and a unique molecular >> identifier in >> the linker, record them as auxiliary flags, and trim the linker from the read. >> >> I collect the pairs through a features iterator, and can access all their data >> through the high-level Bio::DB::Bam::Alignment API. After modifying them >> (linker trimming and adding flags), I want to write the resulting pairs as a >> new unaligned BAM file. >> >> I apologise if the solution is trivial, but my problem is that I do not manage to >> modify the Bio::DB::Bam::Alignment objects. Typically, attempts such as >> ?$pair[0]->qseq("GATACA")? give errors like >> ?Usage: Bio::DB::Bam::Alignment::qseq(b) at >> /usr/lib/perl5/Bio/DB/Bam/AlignWrapper.pm line 80?. >> >> Since I did not find explanations or portsions of source code indicating how to >> modify Bio::DB::Bam::Alignment objects, I wonder if they are read-only? >> >> Have a nice day, >> >> -- >> Charles Plessy >> Tsurumi, Kanagawa, Japan >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. 
Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa > From ross at cuhk.edu.hk Sun Nov 27 03:24:43 2011 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Sun, 27 Nov 2011 16:24:43 +0800 Subject: [Bioperl-l] Check the location type for a particular gene in a Genbank file In-Reply-To: References: Message-ID: <000101ccacde$0fc4ba90$2f4e2fb0$@edu.hk> Hi all, To write a script to extract sequence generically for all types of BioLocation objects, I'd like to know if there is any way to check what types (e.g. simple or split) are being processed. Bio::Location::CoordinatePolicyI appears to be doing something similar but it is more like a post checking step. If I parse the genbank file line by line, I can certainly check whether the line contains keywords like "join" but as I'm using something like: my @features=grep{$_->primary_tag eq $chkTags[0]} $seqobj->get_SeqFeatures; foreach (@features) { $pseudo=$_->has_tag('pseudo')?'pseudo':'functional'; @gene=[]; I'd appreciate if anybody knows a better integration with the well-developed bioperl module. Thanks a lot. From Russell.Smithies at agresearch.co.nz Sun Nov 27 19:46:05 2011 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Mon, 28 Nov 2011 13:46:05 +1300 Subject: [Bioperl-l] Galaxy tools? Message-ID: <18DF7D20DFEC044098A1062202F5FFF340186CF1E3@exchsth.agresearch.co.nz> Possibly the wrong place to ask but has anyone written Galaxy tools using BioPerl? I was thinking of creating blast graphic and format converter tools as I couldn't see any already available in their toolbox. 
It looks like I can just write a Python wrapper for my existing BioPerl scripts - although I suspect the "correct" method is to use BioPython methods (but Python annoys me with its lack of semi-colons and required white-space) --Russell From p.j.a.cock at googlemail.com Sun Nov 27 20:28:33 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 28 Nov 2011 01:28:33 +0000 Subject: [Bioperl-l] Galaxy tools? In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF340186CF1E3@exchsth.agresearch.co.nz> References: <18DF7D20DFEC044098A1062202F5FFF340186CF1E3@exchsth.agresearch.co.nz> Message-ID: On Monday, November 28, 2011, Smithies, Russell wrote: > Possibly the wrong place to ask but has anyone written > Galaxy tools using BioPerl? > I was thinking of creating blast graphic and format converter > tools as I couldn't see any already available in their toolbox. > It looks like I can just write a Python wrapper for my existing > BioPerl scripts - although I suspect the "correct" method is to > use BioPython methods (but Python annoys me with its lack > of semi-colons and required white-space) Galaxy is agnostic about what language the tools are in, you can use a binary, shell script, Java, Perl, Python etc.
Peter From florent.angly at gmail.com Sun Nov 27 21:09:45 2011 From: florent.angly at gmail.com (Florent Angly) Date: Mon, 28 Nov 2011 12:09:45 +1000 Subject: [Bioperl-l] Galaxy tools? In-Reply-To: References: <18DF7D20DFEC044098A1062202F5FFF340186CF1E3@exchsth.agresearch.co.nz> Message-ID: <4ED2ED69.10601@gmail.com> Hi Russell, As Peter said, the tools to be wrapped do not need to be written in Python. I have build a few wrappers for Galaxy, including one for the read simulator Grinder (http://sourceforge.net/projects/biogrinder/), which uses Bioperl and is available in the Galaxy Toolshed (http://sourceforge.net/projects/biogrinder/). It is not very hard to do a wrapper for trivial programs, but becomes more complicated once you start having optional arguments or multiple output files. Grinder uses Getopt::Euclid (http://search.cpan.org/dist/Getopt-Euclid/) to parse command-line arguments. I have been thinking about leveraging the information that Getopt::Euclid stores about command-line arguments to automate most of the Galaxy wrapper generation, but I have not gotten to it yet. Florent On 28/11/11 11:28, Peter Cock wrote: > On Monday, November 28, 2011, Smithies, Russell wrote: >> Possibly the wrong place to ask but has anyone written >> Galaxy tools using BioPerl? >> I was thinking of creating blast graphic and format converter >> tools as I couldn't see any already available in their toolbox. >> It looks like I can just write a Python wrapper for my existing >> BioPerl scripts - although I suspect the "correct" method is to >> use BioPython methods (but Python annoys me with its lack >> of semi-colons and required white-space) > Galaxy is agnostic about what language the tools are in, > you can use a binary, shell script, Java, Perl, Python etc. 
> > Peter > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From florent.angly at gmail.com Sun Nov 27 23:35:31 2011 From: florent.angly at gmail.com (Florent Angly) Date: Mon, 28 Nov 2011 14:35:31 +1000 Subject: [Bioperl-l] Interest in Bio::Community modules Message-ID: <4ED30F93.4000407@gmail.com> Hi all, I have been thinking about starting a set of Perl modules that would useful for (microbial) ecologists to represent communities of organisms. At the moment, there does not seem to be anything like this in Bioperl. I am happy to make these modules available under the Bioperl umbrella using the Bio::Community::* namespace. I envision the following modules: * Bio::Community::Member module representing members of a community. * Bio::Community::IO modules to read/write files that describe community composition (a.k.a. OTU table, or site by species table) as used programs like QIIME, Pyrotagger, GAAS, ... * Bio::Community::Tools modules to help manipulate communities, e.g. to take some members at random, normalize the community to a given number of individuals, or do rarefaction curves. The idea is to implement these modules in Moose to teach myself Moose. The members of a community could be a sequence (Bio::SeqI), a species (Bio::S), an arbitrary string or even other things. I am not quite sure if Bioperl provide facilities to attach some arbitrary information to an object. Any interest? Ideas? Comments? 
Thanks, Florent From cjfields at illinois.edu Mon Nov 28 14:42:12 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 28 Nov 2011 19:42:12 +0000 Subject: [Bioperl-l] Check the location type for a particular gene in a Genbank file In-Reply-To: <000101ccacde$0fc4ba90$2f4e2fb0$@edu.hk> References: <000101ccacde$0fc4ba90$2f4e2fb0$@edu.hk> Message-ID: <49363DC1-110A-49A8-B8D7-75AA624A535C@illinois.edu> Ross, The standard way is to check whether the location object is a SplitLocationI or not, see the following for an example: http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Location_Objects chris On Nov 27, 2011, at 2:24 AM, Ross KK Leung wrote: > Hi all, > > To write a script to extract sequence generically for all types of > BioLocation objects, I'd like to know if there is any way to check what > types (e.g. simple or split) are being processed. > > Bio::Location::CoordinatePolicyI appears to be doing something similar but > it is more like a post checking step. If I parse the genbank file line by > line, I can certainly check whether the line contains keywords like "join" > but as I'm using something like: > > my @features=grep{$_->primary_tag eq $chkTags[0]} > $seqobj->get_SeqFeatures; > > > foreach (@features) { > > $pseudo=$_->has_tag('pseudo')?'pseudo':'functional'; > > @gene=[]; > > I'd appreciate if anybody knows a better integration with the well-developed > bioperl module. > > Thanks a lot. 
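The SplitLocationI check described in the reply above could be sketched roughly as follows. This is only an illustration under stated assumptions: the input is a GenBank file given on the command line, and the 'CDS' primary tag stands in for whatever $chkTags[0] holds in the original script. For extracting the sequence itself, spliced_seq() is used here because it joins the sub-locations of a split feature.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

my $file = shift @ARGV or die "Usage: $0 file.gb\n";
my $in   = Bio::SeqIO->new(-file => $file, -format => 'genbank');

while (my $seqobj = $in->next_seq) {
    # 'CDS' is an assumed tag; substitute the tag you are filtering on.
    my @features = grep { $_->primary_tag eq 'CDS' } $seqobj->get_SeqFeatures;

    for my $feat (@features) {
        my $status = $feat->has_tag('pseudo') ? 'pseudo' : 'functional';
        my $loc    = $feat->location;

        if ($loc->isa('Bio::Location::SplitLocationI')) {
            # Split (e.g. join(...)) location: list each sub-location.
            print "split $status feature:\n";
            for my $sub ($loc->sub_Location) {
                printf "  part %d..%d\n", $sub->start, $sub->end;
            }
        }
        else {
            printf "simple %s feature: %d..%d\n", $status, $loc->start, $loc->end;
        }

        # Sequence of the feature, with split parts joined.
        my $dna = $feat->spliced_seq->seq;
        printf "  extracted %d bases\n", length $dna;
    }
}
```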
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Nov 28 14:47:10 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 28 Nov 2011 19:47:10 +0000 Subject: [Bioperl-l] Interest in Bio::Community modules In-Reply-To: <4ED30F93.4000407@gmail.com> References: <4ED30F93.4000407@gmail.com> Message-ID: <3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu> I think the idea is sound, it would be nice to have. Jason is working a bit in this area, maybe he has some additional thoughts? Would there be some redundancy with any current code (Bio::Tree, Bio::Taxon, etc)? I do think it should be developed on it's own, per our recent discussions re: slimming down core. Re: using Moose and BioPerl, from experience it's a little tricky integrating the two as BioPerl uses perl's simple OO system while Moose allows much more (though it has it's own issues, memory and speed being the top two). Specifying inheritance from a BioPerl module is a bit trickier, you'll have to wrap the module via MooseX::NonMoose. Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine. chris On Nov 27, 2011, at 10:35 PM, Florent Angly wrote: > Hi all, > > I have been thinking about starting a set of Perl modules that would useful for (microbial) ecologists to represent communities of organisms. At the moment, there does not seem to be anything like this in Bioperl. I am happy to make these modules available under the Bioperl umbrella using the Bio::Community::* namespace. > > I envision the following modules: > * Bio::Community::Member module representing members of a community. > * Bio::Community::IO modules to read/write files that describe community composition (a.k.a. 
OTU table, or site by species table) as used programs like QIIME, Pyrotagger, GAAS, ... > * Bio::Community::Tools modules to help manipulate communities, e.g. to take some members at random, normalize the community to a given number of individuals, or do rarefaction curves. > > The idea is to implement these modules in Moose to teach myself Moose. The members of a community could be a sequence (Bio::SeqI), a species (Bio::S), an arbitrary string or even other things. I am not quite sure if Bioperl provide facilities to attach some arbitrary information to an object. > > Any interest? Ideas? Comments? > > Thanks, > > Florent > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From l.m.timmermans at students.uu.nl Mon Nov 28 15:25:13 2011 From: l.m.timmermans at students.uu.nl (Leon Timmermans) Date: Mon, 28 Nov 2011 21:25:13 +0100 Subject: [Bioperl-l] Interest in Bio::Community modules In-Reply-To: References: <4ED30F93.4000407@gmail.com> Message-ID: And now to the list too, On Mon, Nov 28, 2011 at 5:35 AM, Florent Angly wrote: > The idea is to implement these modules in Moose to teach myself Moose. The > members of a community could be a sequence (Bio::SeqI), a species (Bio::S), > an arbitrary string or even other things. I am not quite sure if Bioperl > provide facilities to attach some arbitrary information to an object. > > Any interest? Ideas? Comments? > Sounds like a good use-case for roles, maybe even parametric roles. 
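As a rough sketch of the role idea mentioned above: the package names Sketch::Role::Weighted and Sketch::Member, and the attributes id, desc and weight, are all hypothetical and chosen only for illustration; they are not an existing BioPerl or Bio::Community API. The example assumes Moose is installed and shows a role that requires an id() method and contributes a weight attribute to any class that consumes it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    package Sketch::Role::Weighted;    # hypothetical role name
    use Moose::Role;

    requires 'id';                     # consuming class must provide id()

    has weight => (is => 'rw', isa => 'Num', default => 1);
}

{
    package Sketch::Member;            # hypothetical community member class
    use Moose;

    has id   => (is => 'ro', isa => 'Str', required => 1);
    has desc => (is => 'rw', isa => 'Str', default  => '');

    # Compose the role after the attributes so that 'id' already exists.
    with 'Sketch::Role::Weighted';

    __PACKAGE__->meta->make_immutable;
}

package main;

my $member = Sketch::Member->new(id => 'OTU_1', desc => 'example member');
$member->weight(2.5);

printf "%s has weight %s\n", $member->id, $member->weight;
print "member consumes the Weighted role\n"
    if $member->does('Sketch::Role::Weighted');
```

A role keeps the shared behaviour separate from the inheritance chain, which may matter if some members later wrap non-Moose BioPerl objects.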
Leon

From florent.angly at gmail.com Mon Nov 28 19:59:24 2011
From: florent.angly at gmail.com (Florent Angly)
Date: Tue, 29 Nov 2011 10:59:24 +1000
Subject: [Bioperl-l] Interest in Bio::Community modules
In-Reply-To: <3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu>
References: <4ED30F93.4000407@gmail.com> <3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu>
Message-ID: <4ED42E6C.6020501@gmail.com>

Hi Chris,

On 29/11/11 05:47, Fields, Christopher J wrote:
> I think the idea is sound, it would be nice to have. Jason is working a bit in this area, maybe he has some additional thoughts? Would there be some redundancy with any current code (Bio::Tree, Bio::Taxon, etc)?

None of these features would be duplicated. Rather, they would be used as attributes of the Bio::Community::* objects. For example, a member of a community could have a Bio::SeqI attached to it as well as a Bio::Taxon, etc...

> I do think it should be developed on its own, per our recent discussions re: slimming down core.

Yes, the features are so different that it makes sense to have the Bio::Community::* modules as a separate BioPerl distribution (like the Bio-FeatureIO BioPerl distribution).

> Re: using Moose and BioPerl, from experience it's a little tricky integrating the two as BioPerl uses perl's simple OO system while Moose allows much more (though it has its own issues, memory and speed being the top two). Specifying inheritance from a BioPerl module is a bit trickier, you'll have to wrap the module via MooseX::NonMoose. Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine.

Beyond Bio::RootI, I cannot imagine that any of the Bio::Community::* modules would need to inherit from any other BioPerl modules. Considering this and the performance aspects of Moose, do you think that using Moose is a wise design decision?
Best, Florent > chris > > On Nov 27, 2011, at 10:35 PM, Florent Angly wrote: > >> Hi all, >> >> I have been thinking about starting a set of Perl modules that would useful for (microbial) ecologists to represent communities of organisms. At the moment, there does not seem to be anything like this in Bioperl. I am happy to make these modules available under the Bioperl umbrella using the Bio::Community::* namespace. >> >> I envision the following modules: >> * Bio::Community::Member module representing members of a community. >> * Bio::Community::IO modules to read/write files that describe community composition (a.k.a. OTU table, or site by species table) as used programs like QIIME, Pyrotagger, GAAS, ... >> * Bio::Community::Tools modules to help manipulate communities, e.g. to take some members at random, normalize the community to a given number of individuals, or do rarefaction curves. >> >> The idea is to implement these modules in Moose to teach myself Moose. The members of a community could be a sequence (Bio::SeqI), a species (Bio::S), an arbitrary string or even other things. I am not quite sure if Bioperl provide facilities to attach some arbitrary information to an object. >> >> Any interest? Ideas? Comments? >> >> Thanks, >> >> Florent >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Nov 29 00:32:50 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 29 Nov 2011 05:32:50 +0000 Subject: [Bioperl-l] Interest in Bio::Community modules In-Reply-To: References: <4ED30F93.4000407@gmail.com> Message-ID: On Nov 28, 2011, at 2:25 PM, Leon Timmermans wrote: > And now to the list too, > > On Mon, Nov 28, 2011 at 5:35 AM, Florent Angly wrote: > >> The idea is to implement these modules in Moose to teach myself Moose. 
The
>> members of a community could be a sequence (Bio::SeqI), a species (Bio::S),
>> an arbitrary string or even other things. I am not quite sure if Bioperl
>> provide facilities to attach some arbitrary information to an object.
>>
>> Any interest? Ideas? Comments?
>>
>
> Sounds like a good use-case for roles, maybe even parametric roles.
>
> Leon

Yep, agree totally. It would be a good replacement in most cases for the BioI interfaces.

(see also, the Biome project, which I'm slooooooowly working on again, on github)

chris

From pmr at ebi.ac.uk Tue Nov 29 08:39:52 2011
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 29 Nov 2011 13:39:52 +0000
Subject: [Bioperl-l] BinarySearch.pm
Message-ID: <4ED4E0A8.30102@ebi.ac.uk>

In trying to use bioflat_index.pl index files in EMBOSS, I ran into some problems.

Both appear to be in the Bio/Flat/BinarySearch.pm source file.

EMBL ID lines are failing to drop the ';' from the ID. Updating the regular expression to make sure the ';' is not picked up seems to work:

    if ($format =~ /embl/i) {
        return ('ID',
                "^ID (\\S+[^; ])",
                "^ID (\\S+[^; ])",
                {
                    ACC     => q/^AC (\S+);/,
                    VERSION => q/^SV\s+(\S+)/
                });
    }

The ACC secondary index has every record duplicated.
This line is duplicated in the write_secondary_indices source code. Is that intentional?

    print $fh sprintf("%-${length}s",$record);

regards,

Peter Rice
EMBOSS Team

From uni.anastasia at gmail.com Sat Nov 26 12:32:48 2011
From: uni.anastasia at gmail.com (anastsia shapiro)
Date: Sat, 26 Nov 2011 19:32:48 +0200
Subject: [Bioperl-l] Problem with parsing blast results
Message-ID:

Hello,

I'm running a script that should parse BLAST results, using SearchIO.

Sometimes the script works fine, however sometimes it stops, and I receive the following error.
------------- EXCEPTION ------------- MSG: no data for midline Query ------------------------------------------------------------ STACK Bio::SearchIO::blast::next_result C:/Perl64/site/lib/Bio\SearchIO\ blast.pm:1805 STACK toplevel D:\D\uni-anastasia\project\scripts\ParsingBlastResults\ParsingBlastResults.pl:36 ------------------------------------- While the blast results files were received as a result of running the following blast command: blastn -task blastn -db xxxxxxxxx.txt -evalue 1e-10 -perc_identity 80 -dust no -num_descriptions 0 -query xxxxx.txt -out results.txt -strand plus I am using bioperl 1.6.1. I read all the forums , and it seems to be a bug, but on version 1.5 it was fixed. I will really appreciate your help, since I am trying to understand the problem for over a month. Regards, Anastasia From bunk at novozymes.com Tue Nov 29 11:46:54 2011 From: bunk at novozymes.com (Jacob Bunk Nielsen) Date: Tue, 29 Nov 2011 17:46:54 +0100 Subject: [Bioperl-l] Problem with parsing blast results In-Reply-To: (anastsia shapiro's message of "Sat, 26 Nov 2011 18:32:48 +0100") References: Message-ID: <77sjl698qp.fsf@spurv.nzcorp.net> Hi anastsia shapiro writes: > I'm running a script that should parse a blast results, using searchIO. > > Sometimes the script work fines, however sometimes it stops, and I receive > the following error. 
>
> ------------- EXCEPTION -------------
> MSG: no data for midline Query
> ------------------------------------------------------------
> STACK Bio::SearchIO::blast::next_result C:/Perl64/site/lib/Bio\SearchIO\
> blast.pm:1805
> STACK toplevel
> D:\D\uni-anastasia\project\scripts\ParsingBlastResults\ParsingBlastResults.pl:36
> -------------------------------------
> While the blast results files were received as a result of running the
> following blast command:
> blastn -task blastn -db xxxxxxxxx.txt -evalue 1e-10 -perc_identity 80 -dust
> no -num_descriptions 0 -query xxxxx.txt -out results.txt -strand plus

I don't know why this exact problem arises, but I think you should consider using an output format that is easier to parse by machine, like the XML format. With the BLAST+ blastn shown above you request XML output with -outfmt 5 (-m 7 is the equivalent option for the legacy blastall). When reading the result with Bioperl you must specify -format => 'blastxml' for Bio::SearchIO.

That way I think you are likely to see a lot fewer problems regarding the parsing of blast output.

If the above doesn't solve the problem, you had better show us the code that fails.

Best regards

Jacob

From cjfields at illinois.edu Tue Nov 29 14:11:11 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 29 Nov 2011 19:11:11 +0000
Subject: [Bioperl-l] Interest in Bio::Community modules
In-Reply-To: <4ED42E6C.6020501@gmail.com>
References: <4ED30F93.4000407@gmail.com> <3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu> <4ED42E6C.6020501@gmail.com>
Message-ID: <4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu>

On Nov 28, 2011, at 6:59 PM, Florent Angly wrote:

> Hi Chris,
>
> On 29/11/11 05:47, Fields, Christopher J wrote:
> ...
>> Re: using Moose and BioPerl, from experience it's a little tricky integrating the two as BioPerl uses perl's simple OO system while Moose allows much more (though it has its own issues, memory and speed being the top two).
Specifying inheritance from a BioPerl module is a bit trickier, you'll have to wrap the module via MooseX::NonMoose. Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine. > Beyond Bio::RootI, I cannot imagine that any of the Bio::Community::* modules would need to inherit from any other BioPerl modules. Considering this and the performance aspects of Moose, do you think that using Moose is a wise design decision? Moose can actually help in some circumstances; flattening the inheritance hierarchy by using roles seems to help. And it never hurts to learn something new like Moose and other modern perl niceties. > Best, > > Florent chris From cjfields at illinois.edu Tue Nov 29 17:30:58 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 29 Nov 2011 22:30:58 +0000 Subject: [Bioperl-l] BinarySearch.pm In-Reply-To: <4ED4E0A8.30102@ebi.ac.uk> References: <4ED4E0A8.30102@ebi.ac.uk> Message-ID: <6F926A89-3B07-4924-8CC4-68A027E7FFCE@illinois.edu> Peter, Can you send a test file that is failing? I added a few tests using an example file with a ';' in the ID line, but everything is passing with our other EMBL example files. I'm also looking into adding a method to return secondary IDs for a specific type ('ACC' for instance) so we can test the repeat issue for accessions. Both changes pass tests as is, though, so I have committed them in the meantime. chris On Nov 29, 2011, at 7:39 AM, Peter Rice wrote: > In trying to use bioflat_index.pl index files in EMBOSS, I ran into some problems. > > Both appear to be in the Bio/Flat/BinarySearch.pm source file. > > EMBL ID lines are failing to drop the ';' from the ID. 
Updating the regular expression to make sure the ';' is not picked up seems to work: > > if ($format =~ /embl/i) { > return ('ID', > "^ID (\\S+[^; ])", > "^ID (\\S+[^; ])", > { > ACC => q/^AC (\S+);/, > VERSION => q/^SV\s+(\S+)/ > }); > } > > The ACC secondary index has every record duplicated. > This line is duplicated in the write_secondary_indices source code. Is that intentional? > > print $fh sprintf("%-${length}s",$record); > > regards, > > Peter Rice > EMBOSS Team > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From florent.angly at gmail.com Tue Nov 29 20:18:41 2011 From: florent.angly at gmail.com (Florent Angly) Date: Wed, 30 Nov 2011 11:18:41 +1000 Subject: [Bioperl-l] Interest in Bio::Community modules In-Reply-To: <4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu> References: <4ED30F93.4000407@gmail.com> <3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu> <4ED42E6C.6020501@gmail.com> <4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu> Message-ID: <4ED58471.3030106@gmail.com> Chris, Yes, it is exciting to learn something new. I have developed a bit of code in the last few days in my local git repository. Do you think you could create a repository for Bio-Community on the Bioperl Github space or is it too soon? Cheers, Florent On 30/11/11 05:11, Fields, Christopher J wrote: > On Nov 28, 2011, at 6:59 PM, Florent Angly wrote: > >> Hi Chris, >> >> On 29/11/11 05:47, Fields, Christopher J wrote: >> ... >>> Re: using Moose and BioPerl, from experience it's a little tricky integrating the two as BioPerl uses perl's simple OO system while Moose allows much more (though it has it's own issues, memory and speed being the top two). Specifying inheritance from a BioPerl module is a bit trickier, you'll have to wrap the module via MooseX::NonMoose. 
Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine. >> Beyond Bio::RootI, I cannot imagine that any of the Bio::Community::* modules would need to inherit from any other BioPerl modules. Considering this and the performance aspects of Moose, do you think that using Moose is a wise design decision? > Moose can actually help in some circumstances; flattening the inheritance hierarchy by using roles seems to help. And it never hurts to learn something new like Moose and other modern perl niceties. > >> Best, >> >> Florent > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Nov 29 21:34:00 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 30 Nov 2011 02:34:00 +0000 Subject: [Bioperl-l] Interest in Bio::Community modules In-Reply-To: <4ED58471.3030106@gmail.com> References: <4ED30F93.4000407@gmail.com> <3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu> <4ED42E6C.6020501@gmail.com> <4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu> <4ED58471.3030106@gmail.com> Message-ID: On Nov 29, 2011, at 7:18 PM, Florent Angly wrote: > Chris, > Yes, it is exciting to learn something new. > I have developed a bit of code in the last few days in my local git repository. Do you think you could create a repository for Bio-Community on the Bioperl Github space or is it too soon? It's up to you. I set up the barebones repo and added you on to push/pull/admin, you should be able to push to it whenever you are ready: https://github.com/bioperl/Bio-Community chris > Cheers, > Florent > > On 30/11/11 05:11, Fields, Christopher J wrote: >> On Nov 28, 2011, at 6:59 PM, Florent Angly wrote: >> >>> Hi Chris, >>> >>> On 29/11/11 05:47, Fields, Christopher J wrote: >>> ... 
>>>> Re: using Moose and BioPerl, from experience it's a little tricky integrating the two as BioPerl uses perl's simple OO system while Moose allows much more (though it has it's own issues, memory and speed being the top two). Specifying inheritance from a BioPerl module is a bit trickier, you'll have to wrap the module via MooseX::NonMoose. Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine. >>> Beyond Bio::RootI, I cannot imagine that any of the Bio::Community::* modules would need to inherit from any other BioPerl modules. Considering this and the performance aspects of Moose, do you think that using Moose is a wise design decision? >> Moose can actually help in some circumstances; flattening the inheritance hierarchy by using roles seems to help. And it never hurts to learn something new like Moose and other modern perl niceties. >> >>> Best, >>> >>> Florent >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From florent.angly at gmail.com Tue Nov 29 21:50:04 2011 From: florent.angly at gmail.com (Florent Angly) Date: Wed, 30 Nov 2011 12:50:04 +1000 Subject: [Bioperl-l] Interest in Bio::Community modules In-Reply-To: References: <4ED30F93.4000407@gmail.com> <3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu> <4ED42E6C.6020501@gmail.com> <4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu> <4ED58471.3030106@gmail.com> Message-ID: <4ED599DC.6090808@gmail.com> Fantastic! Thank you very much Chris, Florent On 30/11/11 12:34, Fields, Christopher J wrote: > On Nov 29, 2011, at 7:18 PM, Florent Angly wrote: > >> Chris, >> Yes, it is exciting to learn something new. 
>> I have developed a bit of code in the last few days in my local git repository. Do you think you could create a repository for Bio-Community on the Bioperl Github space or is it too soon? > It's up to you. I set up the barebones repo and added you on to push/pull/admin, you should be able to push to it whenever you are ready: > > https://github.com/bioperl/Bio-Community > > chris > > >> Cheers, >> Florent >> >> On 30/11/11 05:11, Fields, Christopher J wrote: >>> On Nov 28, 2011, at 6:59 PM, Florent Angly wrote: >>> >>>> Hi Chris, >>>> >>>> On 29/11/11 05:47, Fields, Christopher J wrote: >>>> ... >>>>> Re: using Moose and BioPerl, from experience it's a little tricky integrating the two as BioPerl uses perl's simple OO system while Moose allows much more (though it has it's own issues, memory and speed being the top two). Specifying inheritance from a BioPerl module is a bit trickier, you'll have to wrap the module via MooseX::NonMoose. Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine. >>>> Beyond Bio::RootI, I cannot imagine that any of the Bio::Community::* modules would need to inherit from any other BioPerl modules. Considering this and the performance aspects of Moose, do you think that using Moose is a wise design decision? >>> Moose can actually help in some circumstances; flattening the inheritance hierarchy by using roles seems to help. And it never hurts to learn something new like Moose and other modern perl niceties. 
>>>
>>>> Best,
>>>>
>>>> Florent
>>> chris
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From lsbrath at gmail.com Wed Nov 30 00:25:32 2011
From: lsbrath at gmail.com (Mgavi Brathwaite)
Date: Wed, 30 Nov 2011 00:25:32 -0500
Subject: [Bioperl-l] Exception MSG
Message-ID:

Hello,

Brushing up on my BioPerl and I can't figure out this MSG:

------------- EXCEPTION -------------
MSG: cannot open >/Users/mydata/Desktop/gi|255572219|ref|XP_002527049|.out
STACK Bio::Tools::Run::RemoteBlast::save_output /Library/Perl/5.10.0/Bio/Tools/Run/RemoteBlast.pm:678
STACK toplevel /Users/mydata/Documents/workspace/BI7643/rb_ex.pl:40
-------------------------------------

Here is the code:

#!/usr/bin/perl -w

use strict;
use Bio::Tools::Run::RemoteBlast;

#=cut

my $prog  = 'blastp';
my $db    = 'swissprot';
my $e_val = '1e-10';

my @params = ('-prog'      => $prog,
              '-data'      => $db,
              'expect'     => $e_val,
              'readmethod' => 'SearchIO' );

my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

#human database
$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens [ORGN]';

my $v = 1; # this is just to turn on and off the messages

# Construct the sequence object
my $seq_in = Bio::SeqIO->new(-file   => "/Users/mydata/Desktop/rb_ex.fa",
                             -format => "fasta");

while (my $input = $seq_in->next_seq()){
    my $r = $factory->submit_blast($input);
    print STDERR "waiting..." if ($v > 0);
    while (my @rids = $factory->each_rid()){
        foreach my $rid (@rids){
            my $rc = $factory->retrieve_blast($rid);
            if( !ref($rc) ) {
                if($rc < 0){
                    $factory->remove_rid($rid);
                }
                print STDERR "." if ($v > 0);
                sleep 5;
            } else {
                my $result = $rc->next_result();
                #save output
                my $filename = ">/Users/mydata/Desktop/".$result->query_name().".out";#error
                $factory->save_output($filename);
                $factory->remove_rid($rid);
                print "\nQuery Name: ", $result->query_name(), "\n";
                while ( my $hit = $result->next_hit ) {
                    next unless ( $v > 0);
                    print "\thit name is ", $hit->name, "\n";
                    while( my $hsp = $hit->next_hsp ) {
                        print "\t\tscore is ", $hsp->score, "\n";
                    }
                }
            }
        }
    }
}

Thanks for the help!

From jason.stajich at gmail.com Wed Nov 30 01:05:41 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Tue, 29 Nov 2011 22:05:41 -0800
Subject: [Bioperl-l] Exception MSG
In-Reply-To:
References:
Message-ID: <50C532E3-5212-49CE-A3CA-9FE91BDC00CF@gmail.com>

I don't think you need to give it the '>' when you specify the filename for the output. That is done when the filehandle is opened.

On Nov 29, 2011, at 9:25 PM, Mgavi Brathwaite wrote:

> Hello,
>
> Brushing up on my BioPerl and I can't figure out this MSG:
>
> ------------- EXCEPTION -------------
> MSG: cannot open >/Users/mydata/Desktop/gi|255572219|ref|XP_002527049|.out
> STACK Bio::Tools::Run::RemoteBlast::save_output
> /Library/Perl/5.10.0/Bio/Tools/Run/RemoteBlast.pm:678
> STACK toplevel /Users/mydata/Documents/workspace/BI7643/rb_ex.pl:40
> -------------------------------------
> Here is the code:
>
> #!/usr/bin/perl -w
>
> use strict;
> use Bio::Tools::Run::RemoteBlast;
>
> #=cut
>
> my $prog = 'blastp';
> my $db = 'swissprot';
> my $e_val = '1e-10';
>
> my @params = ('-prog' => $prog,
> '-data' => $db,
> 'expect' => $e_val,
> 'readmethod' => 'SearchIO' );
>
> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>
> #human database
> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens
> [ORGN]';
>
> my $v =1; # this is just to turn on and off the messages
>
> # Construct the sequence object
> my $seq_in = Bio::SeqIO->new(-file => "/Users/mydata/Desktop/rb_ex.fa",
-format > => "fasta"); > > > while (my $input = $seq_in->next_seq()){ > > my $r = $factory->submit_blast($input); > > print STDERR "waiting..." if ($v > 0); > > while (my @rids = $factory->each_rid()){ > > foreach my $rid (@rids){ > > my $rc = $factory->retrieve_blast($rid); > > if( !ref($rc) ) { > > if($rc < 0){ > > $factory->remove_rid($rid); > > } > > print STDERR "." if ($v > 0); > > sleep 5; > > } else { > > my $result = $rc->next_result(); > > #save output > > my $filename = ">/Users/mydata/Desktop/".$result->query_name().".out";#error > > $factory->save_output($filename); > > $factory->remove_rid($rid); > > print "\nQuery Name: ", $result->query_name(), "\n"; > > while ( my $hit = $result->next_hit ) { > > next unless ( $v > 0); > > print "\thit name is ", $hit->name, "\n"; > > while( my $hsp = $hit->next_hsp ) { > > print "\t\tscore is ", $hsp->score, "\n"; > > } > > } > > } > > } > > } > > } > > > > Thanks for the help! > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From ss2489 at cornell.edu Wed Nov 30 09:32:47 2011 From: ss2489 at cornell.edu (Surya Saha) Date: Wed, 30 Nov 2011 09:32:47 -0500 Subject: [Bioperl-l] Exception MSG In-Reply-To: <50C532E3-5212-49CE-A3CA-9FE91BDC00CF@gmail.com> References: <50C532E3-5212-49CE-A3CA-9FE91BDC00CF@gmail.com> Message-ID: If that does not fix it, try using one of the unique identifiers as the file name (gi??) instead of the full query name. The pipe(|) characters might cause problems. On Wed, Nov 30, 2011 at 1:05 AM, Jason Stajich wrote: > I don't think you need to give it the '>' when you specify the filename > for the output. That is done by the filehandle opening itsself. 
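
A minimal sketch of the two suggestions above (drop the leading '>' and avoid '|' in the file name), in plain Perl. The helper name safe_output_path and the /tmp directory are hypothetical, chosen only for illustration; they are not part of BioPerl. It assumes, as Jason suggests, that save_output opens the file for writing itself, so the caller only passes a bare path:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not a BioPerl function): build an output path from a
# query name such as 'gi|255572219|ref|XP_002527049|'.
sub safe_output_path {
    my ($dir, $query_name) = @_;

    # Replace every run of characters outside a conservative whitelist
    # (this includes '|', '>' and whitespace) with a single underscore.
    (my $base = $query_name) =~ s/[^A-Za-z0-9._-]+/_/g;

    # Trim underscores left over at either end of the name.
    $base =~ s/^_+|_+$//g;

    # No leading '>' here: the method that receives the path is assumed
    # to open it for writing on its own.
    return File::Spec->catfile($dir, "$base.out");
}

print safe_output_path('/tmp', 'gi|255572219|ref|XP_002527049|'), "\n";
```

In the script quoted in this thread, the returned path could then be handed to $factory->save_output(...) in place of the original $filename.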
> > On Nov 29, 2011, at 9:25 PM, Mgavi Brathwaite wrote: > > > Hello, > > > > Brushing up on my BioPerl and I can't figure out this MSG: > > > > ------------- EXCEPTION ------------- > > > > MSG: cannot open > >/Users/mydata/Desktop/gi|255572219|ref|XP_002527049|.out > > > > STACK Bio::Tools::Run::RemoteBlast::save_output > > /Library/Perl/5.10.0/Bio/Tools/Run/RemoteBlast.pm:678 > > > > STACK toplevel /Users/mydata/Documents/workspace/BI7643/rb_ex.pl:40 > > > > ------------------------------------- > > Here is the code: > > > > #!/usr/bin/perl -w > > > > use strict; > > > > use Bio::Tools::Run::RemoteBlast; > > > > > > #=cut > > > > my $prog = 'blastp'; > > > > my $db = 'swissprot'; > > > > my $e_val = '1e-10'; > > > > > > my @params = ('-prog' => $prog, > > > > '-data' => $db, > > > > 'expect' => $e_val, > > > > 'readmethod' => 'SearchIO' ); > > > > my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > > > > > > #human database > > > > $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens > > [ORGN]'; > > > > > > my $v =1; # this is just to turn on and off the messages > > > > # Construct the sequence object > > > > my $seq_in = Bio::SeqIO->new(-file => "/Users/mydata/Desktop/rb_ex.fa", > -format > > => "fasta"); > > > > > > while (my $input = $seq_in->next_seq()){ > > > > my $r = $factory->submit_blast($input); > > > > print STDERR "waiting..." if ($v > 0); > > > > while (my @rids = $factory->each_rid()){ > > > > foreach my $rid (@rids){ > > > > my $rc = $factory->retrieve_blast($rid); > > > > if( !ref($rc) ) { > > > > if($rc < 0){ > > > > $factory->remove_rid($rid); > > > > } > > > > print STDERR "." 
if ($v > 0); > > > > sleep 5; > > > > } else { > > > > my $result = $rc->next_result(); > > > > #save output > > > > my $filename = > ">/Users/mydata/Desktop/".$result->query_name().".out";#error > > > > $factory->save_output($filename); > > > > $factory->remove_rid($rid); > > > > print "\nQuery Name: ", $result->query_name(), "\n"; > > > > while ( my $hit = $result->next_hit ) { > > > > next unless ( $v > 0); > > > > print "\thit name is ", $hit->name, "\n"; > > > > while( my $hsp = $hit->next_hsp ) { > > > > print "\t\tscore is ", $hsp->score, "\n"; > > > > } > > > > } > > > > } > > > > } > > > > } > > > > } > > > > > > > > Thanks for the help! > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From lsbrath at gmail.com Wed Nov 30 09:34:52 2011 From: lsbrath at gmail.com (Mgavi Brathwaite) Date: Wed, 30 Nov 2011 09:34:52 -0500 Subject: [Bioperl-l] Exception MSG In-Reply-To: References: <50C532E3-5212-49CE-A3CA-9FE91BDC00CF@gmail.com> Message-ID: Surya, As Jason suggested, I removed the '>' and it worked. Thanks for your response. Lom On Wed, Nov 30, 2011 at 9:32 AM, Surya Saha wrote: > If that does not fix it, try using one of the unique identifiers as the > file name (gi??) instead of the full query name. The pipe(|) characters > might cause problems. > > On Wed, Nov 30, 2011 at 1:05 AM, Jason Stajich wrote: > >> I don't think you need to give it the '>' when you specify the filename >> for the output. That is done by the filehandle opening itsself. 
>> >> On Nov 29, 2011, at 9:25 PM, Mgavi Brathwaite wrote: >> >> > Hello, >> > >> > Brushing up on my BioPerl and I can't figure out this MSG: >> > >> > ------------- EXCEPTION ------------- >> > >> > MSG: cannot open >> >/Users/mydata/Desktop/gi|255572219|ref|XP_002527049|.out >> > >> > STACK Bio::Tools::Run::RemoteBlast::save_output >> > /Library/Perl/5.10.0/Bio/Tools/Run/RemoteBlast.pm:678 >> > >> > STACK toplevel /Users/mydata/Documents/workspace/BI7643/rb_ex.pl:40 >> > >> > ------------------------------------- >> > Here is the code: >> > >> > #!/usr/bin/perl -w >> > >> > use strict; >> > >> > use Bio::Tools::Run::RemoteBlast; >> > >> > >> > #=cut >> > >> > my $prog = 'blastp'; >> > >> > my $db = 'swissprot'; >> > >> > my $e_val = '1e-10'; >> > >> > >> > my @params = ('-prog' => $prog, >> > >> > '-data' => $db, >> > >> > 'expect' => $e_val, >> > >> > 'readmethod' => 'SearchIO' ); >> > >> > my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >> > >> > >> > #human database >> > >> > $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens >> > [ORGN]'; >> > >> > >> > my $v =1; # this is just to turn on and off the messages >> > >> > # Construct the sequence object >> > >> > my $seq_in = Bio::SeqIO->new(-file => "/Users/mydata/Desktop/rb_ex.fa", >> -format >> > => "fasta"); >> > >> > >> > while (my $input = $seq_in->next_seq()){ >> > >> > my $r = $factory->submit_blast($input); >> > >> > print STDERR "waiting..." if ($v > 0); >> > >> > while (my @rids = $factory->each_rid()){ >> > >> > foreach my $rid (@rids){ >> > >> > my $rc = $factory->retrieve_blast($rid); >> > >> > if( !ref($rc) ) { >> > >> > if($rc < 0){ >> > >> > $factory->remove_rid($rid); >> > >> > } >> > >> > print STDERR "." 
if ($v > 0); >> > >> > sleep 5; >> > >> > } else { >> > >> > my $result = $rc->next_result(); >> > >> > #save output >> > >> > my $filename = >> ">/Users/mydata/Desktop/".$result->query_name().".out";#error >> > >> > $factory->save_output($filename); >> > >> > $factory->remove_rid($rid); >> > >> > print "\nQuery Name: ", $result->query_name(), "\n"; >> > >> > while ( my $hit = $result->next_hit ) { >> > >> > next unless ( $v > 0); >> > >> > print "\thit name is ", $hit->name, "\n"; >> > >> > while( my $hsp = $hit->next_hsp ) { >> > >> > print "\t\tscore is ", $hsp->score, "\n"; >> > >> > } >> > >> > } >> > >> > } >> > >> > } >> > >> > } >> > >> > } >> > >> > >> > >> > Thanks for the help! >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > From ericdemuinck at gmail.com Wed Nov 30 18:36:36 2011 From: ericdemuinck at gmail.com (Ericde) Date: Wed, 30 Nov 2011 15:36:36 -0800 (PST) Subject: [Bioperl-l] re trieving blast multiple alignment in fasta form Message-ID: <32886592.post@talk.nabble.com> :-/ I am a newbie and I am trying to retrieve a blast multiple alignment in fasta form. The BLAST output (m -2) gives several alignments (which is good) and the parsing of the xml file seems to list all of these alignments (which is also good) The problem is that the fasta alignment file only includes one of the hits and the alignment does not include all the sequences (including the query sequence). I would like to generate a fasta file that includes all the alignments included in the m -2 output (plus query sequence if possible). 
I have cobbled together a script (below) ...I will attach the sample blast xml file and the (m -2) file as well....any insight is appreciated :/

#module load perl
#give the name of the blast xml file to parse in the line where it says 'file =>'

use Bio::SearchIO;

#Use m -7 to generate xml file from blastall
my $in = new Bio::SearchIO(-format => 'blastxml',
                           -file   => 'BLASToutxml');

while( my $result = $in->next_result ) {
    ## $result is a Bio::Search::Result::ResultI compliant object
    while( my $hit = $result->next_hit ) {
        ## $hit is a Bio::Search::Hit::HitI compliant object
        while( my $hsp = $hit->next_hsp ) {
            ## $hsp is a Bio::Search::HSP::HSPI compliant object
            #ENTER desired sequence length
            if( $hsp->length('total') > 50 ) {
                #ENTER desired percent identity
                if ( $hsp->percent_identity >= 75 ) {
                    print "Query=", $result->query_name,
                          " Hit=", $hit->name,
                          " Length=", $hsp->length('total'),
                          " Percent_id=", $hsp->percent_identity, "\n";

                    #Print alignment to file
                    #$aln will be a Bio::SimpleAlign object
                    use Bio::AlignIO;
                    my $aln = $hsp->get_aln;
                    #changed msf to fasta and hsp.msf to hsp.fas, output is now a fasta file
                    my $alnIO = Bio::AlignIO->new(-format =>"fasta",
                                                  -file => ">hsp.fas");
                    $alnIO->write_aln($aln);
                }
            }
        }
    }
}

http://old.nabble.com/file/p32886592/BLASToutxml BLASToutxml
http://old.nabble.com/file/p32886592/hsp.fas hsp.fas

--
View this message in context: http://old.nabble.com/retrieving-blast-multiple-alignment-in-fasta-form-tp32886592p32886592.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From hrh at fmi.ch Tue Nov 1 10:18:54 2011
From: hrh at fmi.ch (Hotz, Hans-Rudolf)
Date: Tue, 1 Nov 2011 11:18:54 +0100
Subject: [Bioperl-l] best way to edit sequence features
In-Reply-To:
Message-ID:

Hi Carnë

Please allow me to make a few comments:

I very much like your idea of writing a free tool to edit and draw sequences.
We (ie people working in core Bioinformatics facilities) all suffer from having to deal with files originally created with commercial packages. And on top of all the pain, those commercial packages are very expensive and they don't deliver what they promise to do. Just double checking: Have you looked a the free tools which are available? I am aware of the following ones (as far as I know, they are all GUI based and don't have a command line API): Serial Cloner http://serialbasics.free.fr/Serial_Cloner.html GENtle http://gentle.magnusmanske.de/ GeneCoder http://www.algosome.com/gene-coder/gene-coder.html pDRAW32 http://www.acaclone.com/ Genome Workbench http://www.ncbi.nlm.nih.gov/projects/gbench/ Ape http://www.biology.utah.edu/jorgensen/wayned/ape/> UGene http://ugene.unipro.ru/ maybe others on the list know of even better free tools? Also, have you looked at the emboss tool "cirdna" ? WRT file formats: I strongly suggest to stick to embl and genbank format as input and (text) output format. The features are not indexed, but you can create your own when you store the sequences in your system. Internally, you probably wanna keep the data in a 'simpler' format than embl or genbank, anyway. Alternatively, have you looked at gff/gtf as away of getting features? see: http://www.sequenceontology.org/gff3.shtml http://mblab.wustl.edu/GTF22.html I am looking forward to any progress you make Regards, Hans Hans-Rudolf Hotz, PhD Bioinformatics Support Friedrich Miescher Institute for Biomedical Research Maulbeerstrasse 66 4058 Basel/Switzerland On 10/31/11 7:05 PM, "Carn? Draug" wrote: > Hi > > I've been planning on writing a free (as in freedom) tool to edit > sequences and make plamids maps. The idea is to build the command line > tool first and maybe later work on a GUI for it. > > The problem I foresee at the moment while designing it, is how to > change a feature of the sequence. 
I'm not familiar with all sequence > formats (only fasta, ensembl and genbank) but I can't see how to > specify from the command line what feature to edit since I can't see > any unique identifiers for them. Is there a file format that makes > this easier? Any tips would be most appreciated. > > Thank in advance, > Carn? Draug > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Nov 1 13:40:30 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 1 Nov 2011 13:40:30 +0000 Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual In-Reply-To: References: Message-ID: <26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu> On Oct 24, 2011, at 9:58 AM, Sofia Robb wrote: > Hi, > > I am having problems running Bio::Index::Fastq. I get the following error when a quality line begins with '@'. > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: No description line parsed > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:368 > STACK: Bio::SeqIO::fastq::next_dataset /usr/share/perl5/Bio/SeqIO/fastq.pm:71 > STACK: Bio::SeqIO::fastq::next_seq /usr/share/perl5/Bio/SeqIO/fastq.pm:29 > STACK: Bio::Index::AbstractSeq::fetch /usr/share/perl5/Bio/Index/AbstractSeq.pm:147 > STACK: Bio::Index::AbstractSeq::get_Seq_by_id /usr/share/perl5/Bio/Index/AbstractSeq.pm:198 > STACK: /home_stajichlab/robb/bin/clean_pairs_indexed.pl:68 > > > Here is an example of a fastq record that is causing this error, The last line which starts with an '@' is actually the qual line. > > @5:105:15806:16092:Y > GTGGCGCGGAACAGAGGAGGAATGTTCAGGAGAGGGGGCATGTGTTGTTACCGAGTACTTGGAAACGACG > + > @9;A565:=8B? 
>
>
> i see that chris has partially addressed this in the mailing list
> http://bioperl.org/pipermail/bioperl-l/2011-January/034481.html
>
> However as he pointed out at the time, it appears this may be a fairly large problem.

The indexer is being refactored to address this problem; the Bio::SeqIO parser actually does parse this, but the (very simple) indexer does not. I can try to push this to the forefront this week; the fix shouldn't be too hard to implement. In essence it would simply use a few SeqIO methods I built in to parse out each bit of data in chunks; it would just need to track the start and length of each chunk while the parser is running.

> My fastq seq and qual lines are always only one line, so I think that adding a line count and only checking for @ in the lines that $line_count%4 ==0 would work since the header lines are always the first of 4 lines, 0,4,8, etc.

That doesn't work for all cases, however (some FASTQ wraps the seq and qual, like FASTA). Peter and I have discussed this elsewhere; a possible solution is to add in an optimized parser that takes this assumption into account.

One problem the various Bio* indexers have currently is the lack of standardization on a specific schema for indexing. There are in-roads towards this (OBDA) that haven't been adequately traveled IMHO, which need to be taken up again.

A second, and maybe this is more specific to BioPerl, is that the parsers and indexers essentially reimplement the format parsing in each module, so if there are bugs they have to be independently fixed (hence why SeqIO works and the indexer doesn't; I wrote the first but not the second). The best place for any optimizations would be in a unified parser that both the SeqIO and indexer modules could use.
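
The idea of tracking the start and length of each record while parsing can be sketched in plain Perl. This is only an illustration under stated assumptions, not the Bio::Index::Fastq code or its planned refactoring: the names index_fastq and fetch_record are hypothetical, the index lives in an in-memory hash, and the end of a record is found by reading quality lines until they hold as many symbols as the sequence has bases. That rule lets a quality line beginning with '@' pass through, and it also tolerates wrapped seq/qual lines.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): build an in-memory index of
# read ID => [byte offset of the '@' header, record length in bytes].
sub index_fastq {
    my ($fh) = @_;
    my %index;
    while (1) {
        my $start  = tell($fh);
        my $header = <$fh>;
        last unless defined $header;
        next if $header =~ /^\s*$/;        # tolerate blank lines between records
        $header =~ /^\@(\S+)/
            or die "Expected '\@' header at byte $start\n";
        my $id = $1;

        # Sequence lines run until the '+' separator line.
        my $seq_len = 0;
        while (defined(my $line = <$fh>)) {
            last if $line =~ /^\+/;
            $line =~ s/\s+$//;
            $seq_len += length $line;
        }

        # Quality lines run until there are as many symbols as bases, so a
        # quality line that starts with '@' is never taken for a header.
        my $qual_len = 0;
        while ($qual_len < $seq_len) {
            my $line = <$fh>;
            die "Truncated quality for '$id'\n" unless defined $line;
            $line =~ s/\s+$//;
            $qual_len += length $line;
        }
        die "Quality longer than sequence for '$id'\n" if $qual_len > $seq_len;

        $index{$id} = [ $start, tell($fh) - $start ];
    }
    return \%index;
}

# Hypothetical helper: return the raw text of one record, or nothing if the
# ID is not in the index.
sub fetch_record {
    my ($fh, $index, $id) = @_;
    my $entry = $index->{$id} or return;
    seek($fh, $entry->[0], 0) or die "seek failed: $!\n";
    read($fh, my $record, $entry->[1]);
    return $record;
}

# Made-up demo data: read1 has a quality line starting with '@',
# read2 has wrapped sequence and quality lines.
my $demo = "\@read1\nACGT\n+\n\@9;A\n\@read2\nAC\nGT\n+\nII\nII\n";
open(my $demo_fh, '<', \$demo) or die "open: $!\n";
my $demo_index = index_fastq($demo_fh);
print fetch_record($demo_fh, $demo_index, 'read2');
```

Storing only offsets and lengths keeps the index small and leaves the original file untouched; a persistent backend could replace the hash without changing the scanning logic.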
> But if there are multiple lines of seq and qual i think that the /^+$/ or /^+$id/ can be used to identify the end of the sequence and the number of lines of quality should be equal to the number of lines of sequence
>
>
> ## only for single line seq and qual
> my $line_count = 0;
> while (<$FASTQ>) {
>     if (/^@/ and $line_count % 4 == 0) {
>         # $begin is the position of the first character after the '@'
>         my $begin = tell($FASTQ) - length( $_ ) + 1;
>         foreach my $id (&$id_parser($_)) {
>             $self->add_record($id, $i, $begin);
>             $c++;
>         }
>     }
>     $line_count++;
> }
>
>
> --
> BioPerl fastq parsing issues aside, is there another tool which allows you to retrieve arbitrary sequences from a fastq file by sequence ID?
>
> There's one called cdbfasta which looks like it might work - does anyone have experience with it?

I haven't, but it appears FASTA-specific. Does it parse FASTQ as well? I recall Sanger has a C-based FASTQ/FASTQ hybrid one as well. May have to look that one up.

> Thanks,
> sofia
>
> P.S. I am CCing Peter Cock in case BioPython has solved this issue already - if so, perhaps their solution could be applied here.

chris

From p.j.a.cock at googlemail.com Tue Nov 1 14:38:43 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 1 Nov 2011 14:38:43 +0000
Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual
In-Reply-To: <26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu>
References: <26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu>
Message-ID:

On Tue, Nov 1, 2011 at 1:40 PM, Fields, Christopher J wrote:
>
> One problem the various Bio* indexers have currently is the lack of
> standardization on a specific schema for indexing. There are in-roads
> towards this (OBDA) that haven't been adequately traveled IMHO,
> which need to be taken up again.
>

Something to switch to open-bio-l at lists.open-bio.org for,
http://lists.open-bio.org/mailman/listinfo/open-bio-l

We can continue this thread from last summer,
http://lists.open-bio.org/pipermail/open-bio-l/2010-April/000662.html
http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000676.html
...
http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000680.html

And CC Peter Rice from EMBOSS too - we chatted about this at ISMB/BOSC 2011 in July - and whomever looks after the OBDA/indexing code in BioRuby and BioJava too.

> A second, and maybe this is more specific to BioPerl, is that the
> parsers and indexers essentially reimplement the format parsing
> in each module, so if there are bugs they have to be independently
> fixed (hence why SeqIO works and the indexer doesn't; I wrote the
> first but not the second). The best place for any optimizations
> would be in a unified parser that both the SeqIO and indexer
> modules could use.

We have that problem to an extent in Biopython's Bio.SeqIO code. The indexing code duplicates some logic of the parsing code (how much depends on the format), sufficient to extract the read ID and the bounds on disk. The two could be more unified, but the parsers came first and I didn't want to change them at the time. Instead I tried to be rigorous in consistency testing for the index code's unit tests.

Regards,

Peter

From carandraug+dev at gmail.com Tue Nov 1 15:13:06 2011
From: carandraug+dev at gmail.com (=?ISO-8859-1?Q?Carn=EB_Draug?=)
Date: Tue, 1 Nov 2011 15:13:06 +0000
Subject: [Bioperl-l] best way to edit sequence features
In-Reply-To:
References:
Message-ID:

On 1 November 2011 10:18, Hotz, Hans-Rudolf wrote:
> I am aware of the following ones (as far as I know, they are all GUI based
> and don't have a command line API):

They are not all free. Just for future reference, here are their licenses:

> Serial Cloner

Couldn't find a license and the download for linux has no source so I'm guessing proprietary.

> GENtle http://gentle.magnusmanske.de/

Free under GPL

> GeneCoder

Proprietary

> pDRAW32

Proprietary

> Genome Workbench http://www.ncbi.nlm.nih.gov/projects/gbench/

Seems public domain. License is not defined anywhere but the files I checked had the public domain notice in the header

> Ape

Proprietary ("license" is at the top of AppMain.tcl)

> UGene http://ugene.unipro.ru/

Free under GPL

> Also, have you looked at the emboss tool "cirdna" ?

Free under GPL

> WRT file formats: I strongly suggest to stick to embl and genbank format as
> input and (text) output format. The features are not indexed, but you can
> create your own when you store the sequences in your system. Internally, you
> probably wanna keep the data in a 'simpler' format than embl or genbank,
> anyway.
>
> Alternatively, have you looked at gff/gtf as away of getting features?
> see:
>
> http://www.sequenceontology.org/gff3.shtml
> http://mblab.wustl.edu/GTF22.html

Considering the already existing alternatives, I'm more likely to collaborate with one of them to do what I want. I'll just have to check them all and decide. I was planning on writing a new tool and contributing it to the scripts section of bioperl since, when I googled before getting all these links, only the proprietary tools showed up.

Thank you very much for the links.

Carnë

From roy.chaudhuri at gmail.com Tue Nov 1 15:44:19 2011
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Tue, 01 Nov 2011 15:44:19 +0000
Subject: [Bioperl-l] best way to edit sequence features
In-Reply-To:
References:
Message-ID: <4EB013D3.30801@gmail.com>

The Sanger Institute's Artemis is good for editing sequence features, and DNAPlotter can be used to produce circular diagrams:
http://www.sanger.ac.uk/resources/software/artemis
http://www.sanger.ac.uk/resources/software/dnaplotter

Roy.

On 01/11/2011 10:18, Hotz, Hans-Rudolf wrote:
> Hi Carnë
> > Please allow me to make a few comments: > > I very much like your idea of writing a free tool to edit and draw > sequences. We (ie people working in core Bioinformatics facilities) all > suffer from having to deal with files originally created with commercial > packages. And on top of all the pain, those commercial packages are very > expensive and they don't deliver what they promise to do. > > > Just double checking: Have you looked a the free tools which are available? > > I am aware of the following ones (as far as I know, they are all GUI based > and don't have a command line API): > > Serial Cloner http://serialbasics.free.fr/Serial_Cloner.html > GENtle http://gentle.magnusmanske.de/ > GeneCoder http://www.algosome.com/gene-coder/gene-coder.html > pDRAW32 http://www.acaclone.com/ > Genome Workbench http://www.ncbi.nlm.nih.gov/projects/gbench/ > Ape http://www.biology.utah.edu/jorgensen/wayned/ape/> > UGene http://ugene.unipro.ru/ > > maybe others on the list know of even better free tools? > > Also, have you looked at the emboss tool "cirdna" ? > > > WRT file formats: I strongly suggest to stick to embl and genbank format as > input and (text) output format. The features are not indexed, but you can > create your own when you store the sequences in your system. Internally, you > probably wanna keep the data in a 'simpler' format than embl or genbank, > anyway. > > Alternatively, have you looked at gff/gtf as away of getting features? > see: > > http://www.sequenceontology.org/gff3.shtml > http://mblab.wustl.edu/GTF22.html > > > > I am looking forward to any progress you make > > Regards, Hans > > > > Hans-Rudolf Hotz, PhD > Bioinformatics Support > > Friedrich Miescher Institute for Biomedical Research > Maulbeerstrasse 66 > 4058 Basel/Switzerland > > > > On 10/31/11 7:05 PM, "Carn? Draug" wrote: > >> Hi >> >> I've been planning on writing a free (as in freedom) tool to edit >> sequences and make plamids maps. 
The idea is to build the command line >> tool first and maybe later work on a GUI for it. >> >> The problem I foresee at the moment while designing it, is how to >> change a feature of the sequence. I'm not familiar with all sequence >> formats (only fasta, ensembl and genbank) but I can't see how to >> specify from the command line what feature to edit since I can't see >> any unique identifiers for them. Is there a file format that makes >> this easier? Any tips would be most appreciated. >> >> Thank in advance, >> Carn? Draug >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at gmail.com Tue Nov 1 16:02:24 2011 From: jason.stajich at gmail.com (Jason Stajich) Date: Tue, 1 Nov 2011 09:02:24 -0700 Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual In-Reply-To: References: <26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu> Message-ID: <47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com> I think a different indexer is needed for the scale of key/value pairs we see in fastq files if we want to make a fast lookup by ID. I think speed is of essence for this type of solution and so a forced all records must be 4 lines long is okay for this type of implementation. I found NOSQL implementations to be much better performance and than any of the BDB type solutions -- they end up being really slow at above 1-5M keys. I used TokyoCabinet and KyotoCabinet to do indexing of accession -> taxonomy ID and found it quite fast for the needs. I haven't tried storing 100bp reads + qual string as the value in it yet but I think it could be done, certainly worth a prototype. 
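
A rough sketch of the "every record is exactly 4 lines" shortcut, with the whole read (sequence plus quality string) stored as the value under its ID. This is an assumption-laden illustration, not code from the thread: load_fastq_4line is a hypothetical name, and a plain Perl hash stands in for the key/value store. A hash tied to an on-disk backend such as TokyoCabinet or KyotoCabinet could presumably be passed in its place, but that has not been tried here.

```perl
use strict;
use warnings;

# Hypothetical helper: read strict 4-line FASTQ records and store
# "sequence\nquality" under the read ID in any hash-like backend.
# $store may be a plain hash reference or a reference to a tied hash.
sub load_fastq_4line {
    my ($fh, $store) = @_;
    my $count = 0;
    while (defined(my $header = <$fh>)) {
        my $seq  = <$fh>;
        my $plus = <$fh>;
        my $qual = <$fh>;
        die "Truncated record near line $.\n" unless defined $qual;
        chomp($header, $seq, $plus, $qual);

        # Only the first line of each group of four is checked for '@',
        # so a quality line that starts with '@' causes no confusion.
        my ($id) = $header =~ /^\@(\S+)/
            or die "Line ", $. - 3, " is not a FASTQ header\n";
        die "No '+' separator for '$id'\n" unless $plus =~ /^\+/;
        die "seq/qual length mismatch for '$id'\n"
            unless length($seq) == length($qual);

        $store->{$id} = "$seq\n$qual";
        $count++;
    }
    return $count;
}

# Made-up demo record whose quality line begins with '@'.
my $demo_fastq = "\@demo:1\nGTGG\n+\n\@9;A\n";
open(my $demo_in, '<', \$demo_fastq) or die "open: $!\n";
my %demo_store;
my $loaded = load_fastq_4line($demo_in, \%demo_store);
print "loaded $loaded record(s)\n";
```

Because the loader only touches the store through hash assignment, the storage backend can be swapped without changing the parsing code; the price of the 4-line assumption is that wrapped FASTQ is rejected by the length check rather than parsed.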
Jason On Nov 1, 2011, at 7:38 AM, Peter Cock wrote: > On Tue, Nov 1, 2011 at 1:40 PM, Fields, Christopher J > wrote: >> >> One problem the various Bio* indexers have currently is the lack of >> standardization on a specific schema for indexing. There are in-roads >> towards this (OBDA) that haven't been adequately traveled IMHO, >> which need to be taken up again. >> > > Something to switch to open-bio-l at lists.open-bio.org for, > http://lists.open-bio.org/mailman/listinfo/open-bio-l > > We can continue this thread from last summer, > http://lists.open-bio.org/pipermail/open-bio-l/2010-April/000662.html > http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000676.html > ... > http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000680.html > > And CC Peter Rice from EMBOSS too - we chatted about this > at ISMB/BOSC 2011 in July - and whomever looks after the > OBDA/indexing code in BioRuby and BioJava too. > >> A second, and maybe this is more specific to BioPerl, is that the >> parsers and indexers essentially reimplement the format parsing >> in each module, so if there are bugs they have to be independently >> fixed (hence why SeqIO works and the indexer doesn't; I wrote the >> first but not the second). The best place for any optimizations >> would be in a unified parser that both the SeqIO and indexer >> modules could use. > > We have that problem to an extent in Biopython's Bio.SeqIO code. > The indexing code duplicates some logic of the parsing code > (how much depends on the format), sufficient to extract the read > ID and the bounds on disk. The two could be more unified but > the parsers came first and didn't want to change them at the time. > Instead I tried to be rigorous in consistency testing for the index > code's unit tests. 
> > Regards, > > Peter > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Nov 1 17:44:25 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 1 Nov 2011 17:44:25 +0000 Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual In-Reply-To: <47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com> References: <26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu> <47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com> Message-ID: <6EFDCA41-D385-4CCB-A17B-0736F93DACDF@illinois.edu> On Nov 1, 2011, at 11:02 AM, Jason Stajich wrote: > I think a different indexer is needed for the scale of key/value pairs we see in fastq files if we want to make a fast lookup by ID. I think speed is of essence for this type of solution and so a forced all records must be 4 lines long is okay for this type of implementation. This can always be an early optimization, that's easy enough. But I'm sure we will have to deal with multi-line seq/qual FASTQ at some point. > I found NOSQL implementations to be much better performance and than any of the BDB type solutions -- they end up being really slow at above 1-5M keys. I used TokyoCabinet and KyotoCabinet to do indexing of accession -> taxonomy ID and found it quite fast for the needs. I haven't tried storing 100bp reads + qual string as the value in it yet but I think it could be done, certainly worth a prototype. Adding a middle layer where the backend storage is abstracted is the probably the (best|most flexible) option, converging on a good default that will work for this data. The actual interface is in place, though would it be more feasible to go the OBDA (converge on a cross-Bio* compatible schema)? Or are there problems afoot there we're unaware of? Re: specifics, I think Biopython uses SQLite, is that correct Peter? 
chris > Jason > On Nov 1, 2011, at 7:38 AM, Peter Cock wrote: > >> On Tue, Nov 1, 2011 at 1:40 PM, Fields, Christopher J >> wrote: >>> >>> One problem the various Bio* indexers have currently is the lack of >>> standardization on a specific schema for indexing. There are in-roads >>> towards this (OBDA) that haven't been adequately traveled IMHO, >>> which need to be taken up again. >>> >> >> Something to switch to open-bio-l at lists.open-bio.org for, >> http://lists.open-bio.org/mailman/listinfo/open-bio-l >> >> We can continue this thread from last summer, >> http://lists.open-bio.org/pipermail/open-bio-l/2010-April/000662.html >> http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000676.html >> ... >> http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000680.html >> >> And CC Peter Rice from EMBOSS too - we chatted about this >> at ISMB/BOSC 2011 in July - and whomever looks after the >> OBDA/indexing code in BioRuby and BioJava too. >> >>> A second, and maybe this is more specific to BioPerl, is that the >>> parsers and indexers essentially reimplement the format parsing >>> in each module, so if there are bugs they have to be independently >>> fixed (hence why SeqIO works and the indexer doesn't; I wrote the >>> first but not the second). The best place for any optimizations >>> would be in a unified parser that both the SeqIO and indexer >>> modules could use. >> >> We have that problem to an extent in Biopython's Bio.SeqIO code. >> The indexing code duplicates some logic of the parsing code >> (how much depends on the format), sufficient to extract the read >> ID and the bounds on disk. The two could be more unified but >> the parsers came first and didn't want to change them at the time. >> Instead I tried to be rigorous in consistency testing for the index >> code's unit tests. 
>> >> Regards, >> >> Peter >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From p.j.a.cock at googlemail.com Tue Nov 1 18:06:50 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 1 Nov 2011 18:06:50 +0000 Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual In-Reply-To: <6EFDCA41-D385-4CCB-A17B-0736F93DACDF@illinois.edu> References: <26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu> <47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com> <6EFDCA41-D385-4CCB-A17B-0736F93DACDF@illinois.edu> Message-ID: On Tue, Nov 1, 2011 at 5:44 PM, Fields, Christopher J wrote: > On Nov 1, 2011, at 11:02 AM, Jason Stajich wrote: > >> I think a different indexer is needed for the scale of key/value >> pairs we see in fastq files if we want to make a fast lookup by >> ID. I think speed is of essence for this type of solution and so >> a forced all records must be 4 lines long is okay for this type >> of implementation. > > This can always be an early optimization, that's easy enough. > But I'm sure we will have to deal with multi-line seq/qual > FASTQ at some point. > >> I found NOSQL implementations to be much better >> performance and than any of the BDB type solutions -- they >> end up being really slow at above 1-5M keys. ?I used >> TokyoCabinet and KyotoCabinet to do indexing of accession >> -> taxonomy ID and found it quite fast for the needs. I >> haven't tried storing 100bp reads + qual string as the >> value in it yet but I think it could be done, certainly worth >> a prototype. > > Adding a middle layer where the backend storage is abstracted > is the probably the (best|most flexible) option, converging on a > good default that will work for this data. ?The actual interface is > in place, though would it be more feasible to go the OBDA > (converge on a cross-Bio* compatible schema)? ?Or are there > problems afoot there we're unaware of? 
>
> Re: specifics, I think Biopython uses SQLite, is that correct Peter?
>
> chris

Yes, we're using SQLite3 to store essentially a list of filenames and their format as one table, and then in the main table an entry for each sequence recording the ID (only one accession, unlike OBDA which had infrastructure for a secondary accession), file number, offset of the start of the record, and optionally the length of the record on disk.

i.e. Basically what OBDA does, but using SQLite rather than BDB (not included in Python 3) or a flat file index (poor performance with large datasets).

I find this design attractive on several levels:
* File format neutral, covers FASTA, FASTQ, GenBank, etc
* Preserves the original file untouched
* Index is a small single file (thanks to SQLite)
* Back end could be switched out
* Could be applied to compressed file formats
* Reuses existing parsing code to access entries

This could easily form the basis of OBDA v2; the main points of difference I anticipate between the Bio* projects would be naming conventions for the different file formats, and what we consider to be the default record ID of each read (e.g. which field in a GenBank file - although agreement here is not essential). Some of that was already settled in principle with OBDA v1.

On the other hand, you could try and store the parsed data itself, which is where NOSQL looks more interesting. That essentially requires the ability to serialise your annotated sequence object model to disk - which would be tricky to do cross project (much more ambitious than BioSQL is). It also means the "index" becomes very large because it now holds all the original data.

Peter

From wenbinmei at gmail.com Wed Nov 2 04:25:32 2011
From: wenbinmei at gmail.com (wenbin mei)
Date: Wed, 2 Nov 2011 00:25:32 -0400
Subject: [Bioperl-l] how to not count gaps in the multiple sequence alignment
Message-ID:

Hi,

I need some help in coding. I have a multiple sequence alignment which has gaps.
And also I have a reference genome sequence in the alignment which I know all the coordinates for the protein coding genes. I want to extract all these protein coding genes alignment from the big alignment. I am using Bio SimpleAlign but the question is that due to the gaps in the alignment, the coordinates has shifted in the alignment. I wonder is there a way I can not count the gaps and still be able to extract the protein alignment. One way I can do is remove the gaps in the reference first and then extract the sequence. But I don't like this way ... Thank you for help. -best, wenbin From dejian.zhao at gmail.com Wed Nov 2 13:33:18 2011 From: dejian.zhao at gmail.com (Dejian Zhao) Date: Wed, 02 Nov 2011 21:33:18 +0800 Subject: [Bioperl-l] Modules to read MEGA output and reproduce the phylogenetic tree Message-ID: <4EB1469E.4050108@gmail.com> There are various packages on CPAN to cope with phylogenetic analysis. I wonder which module can read the output from other phylogenetic softwares, e.g. MEGA, and reproduce the phylogenetic tree. I want to produce a picture which combines the phylogenetic tree and the structure of each gene. From roy.chaudhuri at gmail.com Wed Nov 2 13:49:46 2011 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Wed, 02 Nov 2011 13:49:46 +0000 Subject: [Bioperl-l] Modules to read MEGA output and reproduce the phylogenetic tree In-Reply-To: <4EB1469E.4050108@gmail.com> References: <4EB1469E.4050108@gmail.com> Message-ID: <4EB14A7A.30307@gmail.com> MEGA can export trees in Newick format, which can be read by Bio::TreeIO. The tree can be drawn in EPS format using Bio::Tree::Draw::Cladogram. See: http://www.bioperl.org/wiki/HOWTO:Trees Roy. On 02/11/2011 13:33, Dejian Zhao wrote: > There are various packages on CPAN to cope with phylogenetic analysis. I > wonder which module can read the output from other phylogenetic > softwares, e.g. MEGA, and reproduce the phylogenetic tree. 
I want to > produce a picture which combines the phylogenetic tree and the structure > of each gene. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jun.yin at ucd.ie Wed Nov 2 16:29:45 2011 From: jun.yin at ucd.ie (Jun Yin) Date: Wed, 02 Nov 2011 16:29:45 +0000 (GMT) Subject: [Bioperl-l] how to not count gaps in the multiple sequence alignment In-Reply-To: References: Message-ID: <7300ecdd1dd56.4eb16ff9@ucd.ie> Hi, You need to calculate the coordinates of the protein coding gene in the alignment by yourself. After that, you can use the slice function to get the alignment block for the selected gene, e.g. $aln2 = $aln->slice(20, 30); Cheers, Jun ----- Original Message ----- From: wenbin mei Date: Wednesday, November 2, 2011 5:51 am Subject: [Bioperl-l] how to not count gaps in the multiple sequence alignment To: bioperl-l at lists.open-bio.org > Hi, > > I need some help in coding. I have a multiple sequence alignment > which has > gaps. And also I have a reference genome sequence in the > alignment which I > know all the coordinates for the protein coding genes. I want to > extractall these protein coding genes alignment from the big > alignment. I am using > Bio SimpleAlign but the question is that due to the gaps in the > alignment,the coordinates has shifted in the alignment. I wonder > is there a way I can > not count the gaps and still be able to extract the protein > alignment. One > way I can do is remove the gaps in the reference first and then > extract the > sequence. But I don't like this way ... Thank you for help. 
> > -best, > wenbin > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From dejian.zhao at gmail.com Thu Nov 3 01:39:22 2011 From: dejian.zhao at gmail.com (Dejian Zhao) Date: Thu, 03 Nov 2011 09:39:22 +0800 Subject: [Bioperl-l] Modules to read MEGA output and reproduce the phylogenetic tree In-Reply-To: <4EB14A7A.30307@gmail.com> References: <4EB1469E.4050108@gmail.com> <4EB14A7A.30307@gmail.com> Message-ID: <4EB1F0CA.80309@gmail.com> That's great! Many thanks, Roy. On 2011-11-2 21:49, Roy Chaudhuri wrote: > MEGA can export trees in Newick format, which can be read by > Bio::TreeIO. The tree can be drawn in EPS format using > Bio::Tree::Draw::Cladogram. See: > http://www.bioperl.org/wiki/HOWTO:Trees > > Roy. > > On 02/11/2011 13:33, Dejian Zhao wrote: >> There are various packages on CPAN to cope with phylogenetic analysis. I >> wonder which module can read the output from other phylogenetic >> softwares, e.g. MEGA, and reproduce the phylogenetic tree. I want to >> produce a picture which combines the phylogenetic tree and the structure >> of each gene. 
>> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From noncoding at gmail.com Thu Nov 3 09:59:26 2011 From: noncoding at gmail.com (Remo Sanges) Date: Thu, 03 Nov 2011 10:59:26 +0100 Subject: [Bioperl-l] how to not count gaps in the multiple sequence alignment In-Reply-To: <7300ecdd1dd56.4eb16ff9@ucd.ie> References: <7300ecdd1dd56.4eb16ff9@ucd.ie> Message-ID: <4EB265FE.30909@gmail.com> To get the location in the initial sequence starting from a column in a multiple alignment you can: 1) create a Bio::LocatableSeq compliant object by using the method each_seq_with_id on the SimpleAlign object 2) then using the method location_from_column on the created LocatableSeq object HTH ERemo -- Remo Sanges Bioinformatics - Animal Physiology and Evolution Stazione Zoologica Anton Dohrn Villa Comunale, 80121 Napoli - Italy +39 081 5833428 On 11/2/11 5:29 PM, Jun Yin wrote: > Hi, > > You need to calculate the coordinates of the protein coding gene in the alignment by yourself. After that, you can use the slice function to get the alignment block for the selected gene, e.g. > > $aln2 = $aln->slice(20, 30); > > Cheers, > Jun > > > ----- Original Message ----- > From: wenbin mei > Date: Wednesday, November 2, 2011 5:51 am > Subject: [Bioperl-l] how to not count gaps in the multiple sequence alignment > To: bioperl-l at lists.open-bio.org > >> Hi, >> >> I need some help in coding. I have a multiple sequence alignment >> which has >> gaps. And also I have a reference genome sequence in the >> alignment which I >> know all the coordinates for the protein coding genes. I want to >> extractall these protein coding genes alignment from the big >> alignment. I am using >> Bio SimpleAlign but the question is that due to the gaps in the >> alignment,the coordinates has shifted in the alignment. 
I wonder >> is there a way I can >> not count the gaps and still be able to extract the protein >> alignment. One >> way I can do is remove the gaps in the reference first and then >> extract the >> sequence. But I don't like this way ... Thank you for help. >> >> -best, >> wenbin >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From G.Gallone at sms.ed.ac.uk Thu Nov 3 11:50:11 2011 From: G.Gallone at sms.ed.ac.uk (Giuseppe G.) Date: Thu, 03 Nov 2011 11:50:11 +0000 Subject: [Bioperl-l] Bio::SimpleAlign - Meaning of overall_percentage_identity? Message-ID: <4EB27FF3.9050203@sms.ed.ac.uk> Hi, I would be grateful if you could shed some light on the exact meaning of the method overall_percentage_identity in Bio::SimpleAlign. If I understand correctly, the method works by considering only aminoacids that are identical over all the members of the alignment, and then averaging over the total number of aminoacids in the sequence. Is this correct? Thank you Giuseppe -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From David.Messina at sbc.su.se Thu Nov 3 13:22:21 2011 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 3 Nov 2011 14:22:21 +0100 Subject: [Bioperl-l] Bio::SimpleAlign - Meaning of overall_percentage_identity? In-Reply-To: <4EB27FF3.9050203@sms.ed.ac.uk> References: <4EB27FF3.9050203@sms.ed.ac.uk> Message-ID: Hi Giuseppe, If I understand correctly, the method works by considering only aminoacids > that are identical over all the members of the alignment Yes. > , and then averaging over the total number of aminoacids in the sequence. > Is this correct? > Almost. 
By default, the denominator is the alignment length, namely the length of the MSA including gaps. By means of the 'short' and 'long' options, it's also possible to use the shortest or longest sequence's ungapped lengths as the denominator. Dave From cjfields at illinois.edu Thu Nov 3 18:28:36 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 3 Nov 2011 18:28:36 +0000 Subject: [Bioperl-l] OBDA redux? was Re: Bio::Index::Fastq '@' in qual In-Reply-To: References: <26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu> <47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com> <6EFDCA41-D385-4CCB-A17B-0736F93DACDF@illinois.edu> Message-ID: (side thread, so re-titling...) On Nov 1, 2011, at 1:06 PM, Peter Cock wrote: > On Tue, Nov 1, 2011 at 5:44 PM, Fields, Christopher J > wrote: >> On Nov 1, 2011, at 11:02 AM, Jason Stajich wrote: >> >>> I think a different indexer is needed for the scale of key/value >>> pairs we see in fastq files if we want to make a fast lookup by >>> ID. I think speed is of essence for this type of solution and so >>> a forced all records must be 4 lines long is okay for this type >>> of implementation. >> >> This can always be an early optimization, that's easy enough. >> But I'm sure we will have to deal with multi-line seq/qual >> FASTQ at some point. >> >>> I found NOSQL implementations to be much better >>> performance and than any of the BDB type solutions -- they >>> end up being really slow at above 1-5M keys. I used >>> TokyoCabinet and KyotoCabinet to do indexing of accession >>> -> taxonomy ID and found it quite fast for the needs. I >>> haven't tried storing 100bp reads + qual string as the >>> value in it yet but I think it could be done, certainly worth >>> a prototype. >> >> Adding a middle layer where the backend storage is abstracted >> is the probably the (best|most flexible) option, converging on a >> good default that will work for this data. 
The actual interface is >> in place, though would it be more feasible to go the OBDA >> (converge on a cross-Bio* compatible schema)? Or are there >> problems afoot there we're unaware of? >> >> Re: specifics, I think Biopython uses SQLite, is that correct Peter? >> >> chris > > Yes, we're using SQLite3 to store essentially a list of filenames > and their format as one table, and then in the main table an > entry for each sequence recording the ID (only one accession, > unlike OBDA which had infrastructure for a secondary accession), > file number, offset of the start of the record, and optionally the > length of the record on disk. > > i.e. Basically what OBDA does, but using SQLite rather > than BDB (not included in Python 3) or a flat file index > (poor performance with large datasets). > > I find this design attractive on several levels: > * File format neutral, covers FASTA, FASTQ, GenBank, etc > * Preserves the original file untouched > * Index is a small single file (thanks to SQLite) > * Back end could be switched out > * Could be applied to compressed file formats > * Reuses existing parsing code to access entries > > This could easily form basis of OBDA v2, the main points > of difference I anticipate between the Bio* projects would > be naming conventions for the different file formats, and > what we consider to be the default record ID of each read > (e.g. which field in a GenBank file - although agreement > here is not essential). Some of that was already settled in > principle with OBDA v1. The primary/secondary IDs could be configurable with a sane default, I think the bioperl implementations allowed this (and it is certainly something that will be requested). > On the other hand, you could try and store the parsed data > itself, which is where NOSQL looks more interesting. 
That > essentially requires the ability to serialise your annotated > sequence object model to disk - which would be tricky to do > cross project (much more ambitious than BioSQL is). It also > means the "index" becomes very large because it now holds > all the original data. > > Peter For a fully cross-Bio* compliant format, I don't think it's feasible to use serialized data unless they are serialized in something that is easily deserialized across HLLs (JSON, BSON, YAML, XML, etc). Either that, or such data is stored concurrently with the binary blob, along with meta data that indicates the source of the blob, parser, version, etc, etc (unless there are tools out there that reliably interconvert serialized complex data structures between HLLs). Anyway you go about it, it seems like it could be a major ball of hurt, unless implemented very carefully. Aside: I think this was one of the problems with Bio::DB::SeqFeature::Store, in that it at one point stored Perl-specific Storable blobs. chris From p.j.a.cock at googlemail.com Thu Nov 3 18:52:50 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 3 Nov 2011 18:52:50 +0000 Subject: [Bioperl-l] OBDA redux? Message-ID: On Thu, Nov 3, 2011 at 6:28 PM, Fields, Christopher J wrote: > (side thread, so re-titling...) > And CC'ing open-bio-l, which is a better home for this than bioperl-l, where OBDA v2 talk came up again in discussion of a BioPerl indexing problem. 
Archive links for thread here: http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035807.html http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035808.html http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035811.html http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035812.html http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035813.html http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035822.html > On Nov 1, 2011, at 1:06 PM, Peter Cock wrote: >> >> Yes, we're using SQLite3 to store essentially a list of filenames >> and their format as one table, and then in the main table an >> entry for each sequence recording the ID (only one accession, >> unlike OBDA which had infrastructure for a secondary accession), >> file number, offset of the start of the record, and optionally the >> length of the record on disk. >> >> i.e. Basically what OBDA does, but using SQLite rather >> than BDB (not included in Python 3) or a flat file index >> (poor performance with large datasets). >> >> I find this design attractive on several levels: >> * File format neutral, covers FASTA, FASTQ, GenBank, etc >> * Preserves the original file untouched >> * Index is a small single file (thanks to SQLite) >> * Back end could be switched out >> * Could be applied to compressed file formats >> * Reuses existing parsing code to access entries >> >> This could easily form basis of OBDA v2, the main points >> of difference I anticipate between the Bio* projects would >> be naming conventions for the different file formats, and >> what we consider to be the default record ID of each read >> (e.g. which field in a GenBank file - although agreement >> here is not essential). Some of that was already settled in >> principle with OBDA v1. > > The primary/secondary IDs could be configurable with a sane > default, I think the bioperl implementations allowed this (and > it is certainly something that will be requested). 
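The two-table layout described in the quoted message (one table of files and formats, one table mapping each record ID to file number, offset and length) can be sketched with Python's standard sqlite3 module. This is only an illustration under stated assumptions: the table and column names, the helper names build_index and fetch_raw, and the FASTA-only parsing are chosen for this sketch and should not be read as the actual Biopython or OBDA schema.

```python
import os
import sqlite3
import tempfile


def build_index(fasta_paths, index_path):
    """Index FASTA records by ID: file number, byte offset and length on disk."""
    con = sqlite3.connect(index_path)
    # Hypothetical schema for this sketch only.
    con.execute(
        "CREATE TABLE file_data "
        "(file_number INTEGER PRIMARY KEY, name TEXT, format TEXT)"
    )
    con.execute(
        "CREATE TABLE offset_data "
        "(key TEXT PRIMARY KEY, file_number INTEGER, offset INTEGER, length INTEGER)"
    )
    for file_number, path in enumerate(fasta_paths):
        con.execute(
            "INSERT INTO file_data VALUES (?, ?, ?)", (file_number, path, "fasta")
        )
        with open(path, "rb") as handle:
            key, start = None, 0
            while True:
                pos = handle.tell()
                line = handle.readline()
                # A new header or end of file closes the previous record.
                if not line or line.startswith(b">"):
                    if key is not None:
                        con.execute(
                            "INSERT INTO offset_data VALUES (?, ?, ?, ?)",
                            (key, file_number, start, pos - start),
                        )
                    if not line:
                        break
                    # Only one key per record: the first word of the header.
                    key, start = line[1:].split()[0].decode(), pos
    con.commit()
    return con


def fetch_raw(con, key):
    """Return the untouched text of one record by seeking into the original file."""
    row = con.execute(
        "SELECT name, offset, length FROM offset_data "
        "JOIN file_data USING (file_number) WHERE key = ?",
        (key,),
    ).fetchone()
    if row is None:
        raise KeyError(key)
    name, offset, length = row
    with open(name, "rb") as handle:
        handle.seek(offset)
        return handle.read(length).decode()


def demo():
    """Write a tiny FASTA file, index it, and fetch records out of order."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "reads.fasta")
    with open(path, "w", newline="\n") as out:
        out.write(">seq1 first\nACGT\nAC\n>seq2\nGGGG\n")
    con = build_index([path], os.path.join(tmpdir, "reads.idx"))
    return fetch_raw(con, "seq2"), fetch_raw(con, "seq1")
```

The original file is never modified and the index stores no sequence data, which is what keeps it small; the fetched text would then be handed to whatever parser already exists for that format.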
One reason I went with a single ID only was to keep the Python dictionary based API simple (think hash in Perl). You don't get secondary keys in a Python dict or a hash ;) As a nod to flexibility, in Biopython's Bio.SeqIO indexing you can provide a call back function to map the suggested ID to something else. Obviously this doesn't give the full flexibility of extracting a field from the record's annotation because we don't parse the whole record during indexing (it would be too slow). However, I'm happy for there to be an *optional* secondary key in an OBDA v2 SQLite schema, but Biopython probably won't populate it. We could optionally use it rather than the primary ID on loading an existing index though. Personally I would stick with one key in the index - it should be faster and makes it simpler to switch the back end if we need to later. If anyone wants a second key, they can build a second index *grin*. >> On the other hand, you could try and store the parsed data >> itself, which is where NOSQL looks more interesting. That >> essentially requires the ability to serialise your annotated >> sequence object model to disk - which would be tricky to do >> cross project (much more ambitious than BioSQL is). It also >> means the "index" becomes very large because it now holds >> all the original data. >> >> Peter > > For a fully cross-Bio* compliant format, I don't think it's feasible > to use serialized data unless they are serialized in something > that is easily deserialized across HLLs (JSON, BSON, YAML, > XML, etc). Either that, or such data is stored concurrently with > the binary blob, along with meta data that indicates the source > of the blob, parser, version, etc, etc (unless there are tools out > there that reliably interconvert serialized complex data structures > between HLLs). Anyway you go about it, it seems like it could > be a major ball of hurt, unless implemented very carefully. 
You missed out RDF as a serialisation ;) But yes, going down the shared serialisation route is going to be messy - as you are well aware: > Aside: I think this was one of the problems with > Bio::DB::SeqFeature::Store, in that it at one point stored > Perl-specific Storable blobs. > > chris Peter From cjfields at illinois.edu Thu Nov 3 19:47:51 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 3 Nov 2011 19:47:51 +0000 Subject: [Bioperl-l] OBDA redux? In-Reply-To: References: Message-ID: On Nov 3, 2011, at 1:52 PM, Peter Cock wrote: > On Thu, Nov 3, 2011 at 6:28 PM, Fields, Christopher J > wrote: >> (side thread, so re-titling...) >> > And CC'ing open-bio-l, which is a better home for this than bioperl-l, > where OBDA v2 talk came up again in discussion of a BioPerl indexing > problem. Archive links for thread here: > > http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035807.html > http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035808.html > http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035811.html > http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035812.html > http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035813.html > http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035822.html yes, good idea... >> On Nov 1, 2011, at 1:06 PM, Peter Cock wrote: >>> >>> Yes, we're using SQLite3 to store essentially a list of filenames >>> and their format as one table, and then in the main table an >>> entry for each sequence recording the ID (only one accession, >>> unlike OBDA which had infrastructure for a secondary accession), >>> file number, offset of the start of the record, and optionally the >>> length of the record on disk. >>> >>> i.e. Basically what OBDA does, but using SQLite rather >>> than BDB (not included in Python 3) or a flat file index >>> (poor performance with large datasets). 
>>> >>> I find this design attractive on several levels: >>> * File format neutral, covers FASTA, FASTQ, GenBank, etc >>> * Preserves the original file untouched >>> * Index is a small single file (thanks to SQLite) >>> * Back end could be switched out >>> * Could be applied to compressed file formats >>> * Reuses existing parsing code to access entries >>> >>> This could easily form basis of OBDA v2, the main points >>> of difference I anticipate between the Bio* projects would >>> be naming conventions for the different file formats, and >>> what we consider to be the default record ID of each read >>> (e.g. which field in a GenBank file - although agreement >>> here is not essential). Some of that was already settled in >>> principle with OBDA v1. >> >> The primary/secondary IDs could be configurable with a sane >> default, I think the bioperl implementations allowed this (and >> it is certainly something that will be requested). > > One reason I went with a single ID only was to keep the > Python dictionary based API simple (think hash in Perl). > You don't get secondary keys in a Python dict or a hash ;) > > As a nod to flexibility, in Biopython's Bio.SeqIO indexing you > can provide a call back function to map the suggested ID to > something else. Obviously this doesn't give the full flexibility > of extracting a field from the record's annotation because we > don't parse the whole record during indexing (it would be too > slow). Same with bioperl. > However, I'm happy for there to be an *optional* secondary > key in an OBDA v2 SQLite schema, but Biopython probably > won't populate it. We could optionally use it rather than the > primary ID on loading an existing index though. Optional implementation of that is fine by me. > Personally I would stick with one key in the index - it should > be faster and makes it simpler to switch the back end if we > need to later. If anyone wants a second key, they can build > a second index *grin*. That's easy enough. 
>>> On the other hand, you could try and store the parsed data >>> itself, which is where NOSQL looks more interesting. That >>> essentially requires the ability to serialise your annotated >>> sequence object model to disk - which would be tricky to do >>> cross project (much more ambitious than BioSQL is). It also >>> means the "index" becomes very large because it now holds >>> all the original data. >>> >>> Peter >> >> For a fully cross-Bio* compliant format, I don't think it's feasible >> to use serialized data unless they are serialized in something >> that is easily deserialized across HLLs (JSON, BSON, YAML, >> XML, etc). Either that, or such data is stored concurrently with >> the binary blob, along with meta data that indicates the source >> of the blob, parser, version, etc, etc (unless there are tools out >> there that reliably interconvert serialized complex data structures >> between HLLs). Anyway you go about it, it seems like it could >> be a major ball of hurt, unless implemented very carefully. > > You missed out RDF as a serialisation ;) > > But yes, going down the shared serialisation route is going > to be messy - as you are well aware: > >> Aside: I think this was one of the problems with >> Bio::DB::SeqFeature::Store, in that it at one point stored >> Perl-specific Storable blobs. >> >> chris > > Peter yes, it's a problem w/o an easy solution. Anyway, I think an implementation of such at this point would be a premature optimization. chris From biojiangke at gmail.com Tue Nov 8 22:29:54 2011 From: biojiangke at gmail.com (vitis) Date: Tue, 8 Nov 2011 14:29:54 -0800 (PST) Subject: [Bioperl-l] Some questions about the Bio::PopGen In-Reply-To: References: Message-ID: <32805996.post@talk.nabble.com> I think the pi calculated in the function isn't really the pi as defined. You need to divide the value by total number of sites (in your case, it's 5, which is not your individual number but sequence length). 
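To make the arithmetic concrete, here is a minimal plain-Python sketch that recomputes both numbers from the five sequences in the quoted question. It is not BioPerl code and is not claimed to be how Bio::PopGen::Statistics works internally; the function name pairwise_differences is made up for this illustration.

```python
from itertools import combinations


def pairwise_differences(seqs):
    """Average number of differing sites over all pairs of aligned sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


# The five aligned sequences A01..A05 from the quoted message.
seqs = ["AAGGT", "ATGGC", "ATGCT", "ATGCT", "ATGGT"]

pi_total = pairwise_differences(seqs)   # summed over all sites
pi_per_site = pi_total / len(seqs[0])   # divided by the sequence length (5)

print(pi_total, round(pi_per_site, 2))
```

The first value corresponds to the 1.4 reported from Bio::PopGen::Statistics and the second to the 0.28 reported from DnaSP, so the two programs appear to differ only in whether the result is divided by the number of sites.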
I think the reason they implemented it this way is that sometimes it's easier to work only with variable sites. The aln_to_population function converts an aln object to a population object. You can't really see the object unless you write additional code to write it out or do some calculations on it. The third question depends on your specific needs. For population-level analyses of molecular evolution, I usually create a multiple sequence alignment with other applications (clustalw etc), then manually adjust the alignment to make sure it represents homology. I wouldn't touch the alignment once this is done, but only make an aln (or whatever format you want) as input to analysis applications like Bio::PopGen (usually via the aln_to_population function you're using now). Qian Zhao wrote: > > Hi > Recently, I am learning how to calculate pi, Fst, Tajima D using > Bio::PopGen. > I am not familiar with Perl and I am really confused with the following > problems. > (1) I use the Bio::PopGen::Statistics to calculate pi. The sequences I used > to calculate is this: > __DATA__ > 01 A01 A > 01 A02 A > 01 A03 A > 01 A04 A > 01 A05 A > 02 A01 A > 02 A02 T > 02 A03 T > 02 A04 T > 02 A05 T > 03 A01 G > 03 A02 G > 03 A03 G > 03 A04 G > 03 A05 G > 04 A01 G > 04 A02 G > 04 A03 C > 04 A04 C > 04 A05 G > 05 A01 T > 05 A02 C > 05 A03 T > 05 A04 T > 05 A05 T > And I am not sure if I can use these sequences below to demonstrate the > prettybase format above: >>A01 > AAGGT >>A02 > ATGGC >>A03 > ATGCT >>A04 > ATGCT >>A05 > ATGGT > The pi is 1.4 using Bio::PopGen::Statistics. However, the pi is 0.28 if I > use DnaSP. I find that 1.4/5=0.28, which means that if the number > from Bio::PopGen::Statistics is divided by the number of individuals, the > result > would be exactly the same. Is there something wrong in my perl script?
The > code I used was below: > #/usr/bin/perl -w > use warnings; > use strict; > use Bio::PopGen::Genotype; > my $genotype = Bio::PopGen::Genotype->new(-marker_name => 'gene_1', > -individual_id => '001', > -alleles => ['1','5'] ); > use Bio::PopGen::Individual; > my $ind = Bio::PopGen::Individual->new(-unique_id => '001', > -genotypes => [$genotype] ); > $ind->add_Genotype( > Bio::PopGen::Genotype->new(-alleles => ['1', '5'], > -marker_name => 'gene_1') > ); > $ind->add_Genotype( > Bio::PopGen::Genotype->new(-alleles => ['1', '5'], > -marker_name => 'gene_1') > ); > $ind->add_Genotype( > Bio::PopGen::Genotype->new(-alleles => ['1', '5'], > -marker_name => 'gene_1') > ); > $ind->add_Genotype( > Bio::PopGen::Genotype->new(-alleles => ['1', '5'], > -marker_name => 'gene_1') > ); > use Bio::PopGen::Population; > my $pop = Bio::PopGen::Population->new(-name => 'Bm', > -description => 'description', > -individuals => [$ind] ); > use Bio::PopGen::IO; > use Bio::PopGen::Statistics; > my $nummarkers = $pop->get_marker_names; > my $stats = Bio::PopGen::Statistics->new(); > my $io = Bio::PopGen::IO->new (-format => 'prettybase', > -file => '1.txt'); > if( my $pop = $io->next_population ) { > my $pi = $stats->pi($pop, $nummarkers); > print "pi is $pi\n"; > my @inds; > for my $ind ( $pop->get_Individuals ) { > if( $ind->unique_id =~ /A0[1-3]/ ) { > push @inds, $ind; > } > } > print "pi for inds 1,2,3 is ", $stats->pi(\@inds),"\n"; > } > > (2) I want to use Bio::PopGen::Utilities to translate the alignment file > to > the population file. However, I can not find the result file after the > program. 
I use the following code: > use Bio::PopGen::Utilities; > use Bio::AlignIO; > > my $in = Bio::AlignIO->new(-file => 't/data/t7.aln', > -format => 'clustalw'); > my $aln = $in->next_aln; > my $pop = Bio::PopGen::Utilities->aln_to_population(-alignment => $aln); > my $synpop = Bio::PopGen::Utilities->aln_to_population(-site_model => > 'cod', > -alignment => > $aln); > I am not sure where I should add my result file' name in the code. > (3) If my file contains a lot of individual sequences and one individual > has > one genotype. I'd like to know how can I use the Bio::PopGen::Individual, > Bio::PopGen::Population and Bio::PopGen::Genotype to create the file which > can used in Bio::PopGen::Statistics ? > > I will be great appreciated if I can get the answers. Thanks for your time > and Best Wishes! > Qian > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://old.nabble.com/Some-questions-about-the-Bio%3A%3APopGen-tp31378987p32805996.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From biojiangke at gmail.com Tue Nov 8 22:51:22 2011 From: biojiangke at gmail.com (vitis) Date: Tue, 8 Nov 2011 14:51:22 -0800 (PST) Subject: [Bioperl-l] questions about the bioperl module Bio::PopGen::Statistics In-Reply-To: <201106012030039537050@gmail.com> References: <201106012030039537050@gmail.com> Message-ID: <32805997.post@talk.nabble.com> If you read the Bio::PopGen doc, you'll see there is an optional argument for the function that calculates pi, which is taking the number of sites into consideration. Also, when you use the aln_to_population function to input an alignment, you can use the option to take in all sites, including the monomorphic sites. I think if you implement both in your script, you'll get the same pi value as from other applications like DnaSP. 
In terms of sliding window analyses, you may have to implement your own method to move along the windows, but I think DnaSP can already do that, so you don't have to write your own script. lvu.jun wrote: > > Hi, there, > I am trying to calculate the population genetics parameters such as pi > using the bioperl module Bio::PopGen::Statistics. But I found that the > method only requires the input of the marker genotype of every individual > for the population. I don't know why the module does not take the DNA > sequence length into consideration when calculating the pi value. > According to the definition of the pi value, besides the polymorphic > sites, we also need the monomorphic sites that should be incorporated in > the denominator when doing the calculation. Is it right? therefore I'm > confused about the module, who can tell me why it can correctly calculate > the pi value only with the marker(polymorphic) genotype? > Another question, if I want to calculate the pi value using the sliding > window along the genome, how can I do this using the > Bio::PopGen::Statistics module? > Thanks for your help! > Yours sincerely, > Jun > > Chinese Academy of Sciences > > 2011-06-01 > > > > lvu.jun > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://old.nabble.com/questions-about-the-bioperl-module-Bio%3A%3APopGen%3A%3AStatistics-tp31749977p32805997.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From shachigahoimbi at gmail.com Wed Nov 9 05:22:33 2011 From: shachigahoimbi at gmail.com (Shachi Gahoi) Date: Wed, 9 Nov 2011 10:52:33 +0530 Subject: [Bioperl-l] Run FGENESH using bioperl Message-ID: Dear All. I have a multi-FASTA sequence file and I would like to run FGENESH on each of the sequences stored in it, one by one. Is this possible using BioPerl?
Please guide me. Thanks in advance. -- Regards, Shachi From pankajt322 at gmail.com Thu Nov 3 12:12:44 2011 From: pankajt322 at gmail.com (pankaj) Date: Thu, 3 Nov 2011 05:12:44 -0700 (PDT) Subject: [Bioperl-l] extract ORF ID from fasta file using bioperl In-Reply-To: References: Message-ID: On Oct 21, 1:59 am, Shachi Gahoi wrote: > Dear all, > > I have fasta format sequence file and I want to extract ORF ID "PITG_14194" > from fasta file and then I want to rename same file with that ORF ID > "PITG_14194". > > I have many files and I want to do same exercise with all sequence files. > > Please tell me how can i do this with perl or bioperl. > > >tr|D0NNU7|D0NNU7_PHYIT Carbohydrate esterase, putative OS=Phytophthora > > infestans (strain T30-4) GN=PITG_14194 PE=4 SV=1 > MVKLSIVSSTMQSLLAPLLRVWTDPERRRKFLRWLFGGTSGAIALLLILEATRGFCRTPL > ETAQLLAGISWTLCKITVQFVARGFKPKFAKWTLRYELLHGLMRTAATMFGERIVDLQHA > RVIRHHTGMFGTVLGSFARWQNEMRLESVRLNGLEHIWLKSSTCTTETKSERKRLVVLFF > HGGGYAVLSPRMYISFCSAVAGAIRQQLASDDVDVDVFLANYRKLPEHKFPVPAEDAVAM > YEYLLQHEKLEPSQIILAGDSAGGGLVMSTLLRVRDGLSSWKSKLPLPLAAIVMCPLADL > TWDEDEIAGQHCVLPLNMTAASVLTYHPTRDDPSTWADASPVHCNLQGLPPVFLQSATLD > RLFQHSVRLAAKAKADGLVNWEVDIHEGVPHVFMVIPAYVLPYARVGVGRMAAFAAKQFR > NGIAVDHKGVICNGKAPIEIAVDENTLSAAA > > -- > Regards, > Shachi > _______________________________________________ > Bioperl-l mailing list > Bioper... at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From azaballos at isciii.es Wed Nov 9 11:28:21 2011 From: azaballos at isciii.es (Angel Zaballos) Date: Wed, 9 Nov 2011 12:28:21 +0100 Subject: [Bioperl-l] bp_genbank2gff.pl bug Message-ID: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es> Running bp_genbank2gff.pl got this: [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession AAXT01000001.1 > babesichr3.gff Replacement list is longer than search list at /usr/share/perl5/Bio/Range.pm line 251.
Ángel Zaballos Unidad de Genómica Centro Nacional de Microbiología-ISCIII Carretera Majadahonda-Pozuelo, Km 2,2 28220-Majadahonda Tel: 918223994 mail: azaballos at isciii.es ************************* AVISO LEGAL ************************* Este mensaje electrónico está dirigido exclusivamente a sus destinatarios, pudiendo contener documentos anexos de carácter privado y confidencial. Si por error, ha recibido este mensaje y no se encuentra entre los destinatarios, por favor, no use, informe, distribuya, imprima o copie su contenido por ningún medio. Le rogamos lo comunique al remitente y borre completamente el mensaje y sus anexos. El Instituto de Salud Carlos III no asume ningún tipo de responsabilidad legal por el contenido de este mensaje cuando no responda a las funciones atribuidas al remitente del mismo por la normativa vigente. From scott at scottcain.net Wed Nov 9 16:12:02 2011 From: scott at scottcain.net (Scott Cain) Date: Wed, 9 Nov 2011 11:12:02 -0500 Subject: [Bioperl-l] bp_genbank2gff.pl bug In-Reply-To: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es> References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es> Message-ID: Hi Angel, I would suggest using bp_genbank2gff3.pl, as it is more actively maintained; the bp_genbank2gff.pl script hasn't really been touched in many years, and I imagine it's suffering from some serious code rot. Scott 2011/11/9 Angel Zaballos > Running bp_genbank2gff.pl got this: > > [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession > AAXT01000001.1 > babesichr3.gff > Replacement list is longer than search list at > /usr/share/perl5/Bio/Range.pm line 251. > > > > Ángel Zaballos > Unidad de Genómica > Centro Nacional de Microbiología-ISCIII > Carretera Majadahonda-Pozuelo, Km 2,2 > 28220-Majadahonda > > Tel: 918223994 > mail: azaballos at isciii.es > > > > > ************************* AVISO LEGAL ************************* > Este mensaje electrónico está
dirigido exclusivamente a sus destinatarios, > pudiendo contener documentos anexos de carácter privado y confidencial. > Si por error, ha recibido este mensaje y no se encuentra entre los > destinatarios, por favor, no use, informe, distribuya, imprima o copie su > contenido por ningún medio. Le rogamos lo comunique al remitente y borre > completamente el mensaje y sus anexos. El Instituto de Salud Carlos III no > asume ningún tipo de responsabilidad legal por el contenido de este mensaje > cuando no responda a las funciones atribuidas al remitente del mismo por la > normativa vigente. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From carandraug+dev at gmail.com Wed Nov 9 16:13:10 2011 From: carandraug+dev at gmail.com (Carnë Draug) Date: Wed, 9 Nov 2011 16:13:10 +0000 Subject: [Bioperl-l] extract ORF ID from fasta file using bioperl In-Reply-To: References: Message-ID: On 3 November 2011 12:12, pankaj wrote: > > > On Oct 21, 1:59 am, Shachi Gahoi wrote: >> Dear all, >> >> I have fasta format sequence file and I want to extract ORF ID "PITG_14194" >> from fasta file and then I want to rename same file with that ORF ID >> "PITG_14194". >> >> I have many files and I want to do same exercise with all sequence files. >> >> Please tell me how can i do this with perl or bioperl.
>> >> >tr|D0NNU7|D0NNU7_PHYIT Carbohydrate esterase, putative OS=Phytophthora >> >> infestans (strain T30-4) GN=PITG_14194 PE=4 SV=1 >> MVKLSIVSSTMQSLLAPLLRVWTDPERRRKFLRWLFGGTSGAIALLLILEATRGFCRTPL >> ETAQLLAGISWTLCKITVQFVARGFKPKFAKWTLRYELLHGLMRTAATMFGERIVDLQHA >> RVIRHHTGMFGTVLGSFARWQNEMRLESVRLNGLEHIWLKSSTCTTETKSERKRLVVLFF >> HGGGYAVLSPRMYISFCSAVAGAIRQQLASDDVDVDVFLANYRKLPEHKFPVPAEDAVAM >> YEYLLQHEKLEPSQIILAGDSAGGGLVMSTLLRVRDGLSSWKSKLPLPLAAIVMCPLADL >> TWDEDEIAGQHCVLPLNMTAASVLTYHPTRDDPSTWADASPVHCNLQGLPPVFLQSATLD >> RLFQHSVRLAAKAKADGLVNWEVDIHEGVPHVFMVIPAYVLPYARVGVGRMAAFAAKQFR >> NGIAVDHKGVICNGKAPIEIAVDENTLSAAA >> ---------- Forwarded message ---------- From: Jason Stajich Date: 21 October 2011 10:56 Subject: Re: [Bioperl-l] extract ORF ID from fasta file using bioperl To: Shachi Gahoi Cc: bioperl-l at bioperl.org easy to do this with a simple regular expression and opening a new file. Have you read up on this concept in Perl. You can use SeqIO to parse FASTA files - did you read the HOWTO and website documentation first? We don't typically do people's work for them on this mailing list so please show some effort first. From scott at scottcain.net Wed Nov 9 18:43:00 2011 From: scott at scottcain.net (Scott Cain) Date: Wed, 9 Nov 2011 13:43:00 -0500 Subject: [Bioperl-l] bp_genbank2gff.pl bug In-Reply-To: <0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu> References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es> <0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu> Message-ID: Hi Chris, Actually, removing it from the distribution (but letting it remain in the code repository) is not a bad idea. I can't really think of a down side. Scott 2011/11/9 Fields, Christopher J > Scott, > > Do we want to add that caveat to the bp_genbank2gff.pl documentation (or > remove it altogether)? 
> > chris > > On Nov 9, 2011, at 10:12 AM, Scott Cain wrote: > > > Hi Angel, > > > > I would suggest using bp_genbank2gff3.pl, as it is more actively > > maintained; the bp_genbank2gff.pl script hasn't really been touched in > many > > years, and I imagine it's suffering from some serious code rot. > > > > Scott > > > > > > 2011/11/9 Angel Zaballos > > > >> Running bp_genbank2gff.pl got this: > >> > >> [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession > >> AAXT01000001.1 > babesichr3.gff > >> Replacement list is longer than search list at > >> /usr/share/perl5/Bio/Range.pm line 251. > >> > >> > >> > >> Ángel Zaballos > >> Unidad de Genómica > >> Centro Nacional de Microbiología-ISCIII > >> Carretera Majadahonda-Pozuelo, Km 2,2 > >> 28220-Majadahonda > >> > >> Tel: 918223994 > >> mail: azaballos at isciii.es > >> > >> > >> > >> > >> ************************* AVISO LEGAL ************************* > >> Este mensaje electrónico está dirigido exclusivamente a sus > destinatarios, > >> pudiendo contener documentos anexos de carácter privado y confidencial. > >> Si por error, ha recibido este mensaje y no se encuentra entre los > >> destinatarios, por favor, no use, informe, distribuya, imprima o copie > su > >> contenido por ningún medio. Le rogamos lo comunique al remitente y borre > >> completamente el mensaje y sus anexos. El Instituto de Salud Carlos III > no > >> asume ningún tipo de responsabilidad legal por el contenido de este > mensaje > >> cuando no responda a las funciones atribuidas al remitente del mismo > por la > >> normativa vigente. > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > > > > > > > -- > > ------------------------------------------------------------------------ > > Scott Cain, Ph. D.
scott at scottcain > dot > > net > > GMOD Coordinator (http://gmod.org/) 216-392-3087 > > Ontario Institute for Cancer Research > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Wed Nov 9 18:39:52 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 9 Nov 2011 18:39:52 +0000 Subject: [Bioperl-l] bp_genbank2gff.pl bug In-Reply-To: References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es> Message-ID: <0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu> Scott, Do we want to add that caveat to the bp_genbank2gff.pl documentation (or remove it altogether)? chris On Nov 9, 2011, at 10:12 AM, Scott Cain wrote: > Hi Angel, > > I would suggest using bp_genbank2gff3.pl, as it is more actively > maintained; the bp_genbank2gff.pl script hasn't really been touched in many > years, and I imagine it's suffering from some serious code rot. > > Scott > > > 2011/11/9 Angel Zaballos > >> Running bp_genbank2gff.pl got this: >> >> [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession >> AAXT01000001.1 > babesichr3.gff >> Replacement list is longer than search list at >> /usr/share/perl5/Bio/Range.pm line 251. >> >> >> >> Ángel Zaballos >> Unidad de Genómica >> Centro Nacional de Microbiología-ISCIII >> Carretera Majadahonda-Pozuelo, Km 2,2 >> 28220-Majadahonda >> >> Tel: 918223994 >> mail: azaballos at isciii.es >> >> >> >> >> ************************* AVISO LEGAL ************************* >> Este mensaje electrónico está dirigido exclusivamente a sus destinatarios, >> pudiendo contener documentos anexos de carácter privado y confidencial.
>> Si por error, ha recibido este mensaje y no se encuentra entre los >> destinatarios, por favor, no use, informe, distribuya, imprima o copie su >> contenido por ning?n medio. Le rogamos lo comunique al remitente y borre >> completamente el mensaje y sus anexos. El Instituto de Salud Carlos III no >> asume ning?n tipo de responsabilidad legal por el contenido de este mensaje >> cuando no responda a las funciones atribuidas al remitente del mismo por la >> normativa vigente. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot > net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Nov 9 19:51:48 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 9 Nov 2011 19:51:48 +0000 Subject: [Bioperl-l] bp_genbank2gff.pl bug In-Reply-To: References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es> <0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu> Message-ID: Scott, It would remain in the repo history if it is removed, otherwise we can probably set up an 'unmaintained' folder. Either would prevent it from being packaged and installed in future versions. (Speaking of, we should discuss (w/ Lincoln) about possible splitting out Bio::DB::SeqFeature/GFF and related code/tests/etc into it's own distribution, in line with slimming down core modules) chris On Nov 9, 2011, at 12:43 PM, Scott Cain wrote: > Hi Chris, > > Actually, removing it from the distribution (but letting it remain in the code repository) is not a bad idea. I can't really think of a down side. 
> > Scott > > > 2011/11/9 Fields, Christopher J > Scott, > > Do we want to add that caveat to the bp_genbank2gff.pl documentation (or remove it altogether)? > > chris > > On Nov 9, 2011, at 10:12 AM, Scott Cain wrote: > > > Hi Angel, > > > > I would suggest using bp_genbank2gff3.pl, as it is more actively > > maintained; the bp_genbank2gff.pl script hasn't really been touched in many > > years, and I imagine it's suffering from some serious code rot. > > > > Scott > > > > > > 2011/11/9 Angel Zaballos > > > >> Running bp_genbank2gff.pl got this: > >> > >> [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession > >> AAXT01000001.1 > babesichr3.gff > >> Replacement list is longer than search list at > >> /usr/share/perl5/Bio/Range.pm line 251. > >> > >> > >> > >> ?ngel Zaballos > >> Unidad de Gen?mica > >> Centro Nacional de Microbiolog?a-ISCIII > >> Carretera Majadahonda-Pozuelo, Km 2,2 > >> 28220-Majadahonda > >> > >> Tel: 918223994 > >> mail: azaballos at isciii.es > >> > >> > >> > >> > >> ************************* AVISO LEGAL ************************* > >> Este mensaje electr?nico est? dirigido exclusivamente a sus destinatarios, > >> pudiendo contener documentos anexos de car?cter privado y confidencial. > >> Si por error, ha recibido este mensaje y no se encuentra entre los > >> destinatarios, por favor, no use, informe, distribuya, imprima o copie su > >> contenido por ning?n medio. Le rogamos lo comunique al remitente y borre > >> completamente el mensaje y sus anexos. El Instituto de Salud Carlos III no > >> asume ning?n tipo de responsabilidad legal por el contenido de este mensaje > >> cuando no responda a las funciones atribuidas al remitente del mismo por la > >> normativa vigente. 
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >
> >
> >
> > --
> > ------------------------------------------------------------------------
> > Scott Cain, Ph. D. scott at scottcain dot net
> > GMOD Coordinator (http://gmod.org/) 216-392-3087
> > Ontario Institute for Cancer Research
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D. scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/) 216-392-3087
> Ontario Institute for Cancer Research

From carandraug+dev at gmail.com Wed Nov 9 20:39:17 2011
From: carandraug+dev at gmail.com (Carnë Draug)
Date: Wed, 9 Nov 2011 20:39:17 +0000
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To:
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es> <0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu>
Message-ID:

On 9 November 2011 18:43, Scott Cain wrote:
> Hi Chris,
>
> Actually, removing it from the distribution (but letting it remain in the
> code repository) is not a bad idea. I can't really think of a down side.
>
> Scott

Hi

Can I suggest instead to simply make the script issue a warning right at the start? Something like "bp_genbank2gff is obsolete and will be removed from a future version of bioperl; please use bp_genbank2gff3 instead". You could leave it there for the next 2 releases and then finally remove it. This would have 2 advantages:

1) people that have been using it will immediately know what to use as a replacement (instead of coming to ask on the mailing list);
2) people who use it but don't know anything about the subject, because someone told them to "just press this button" or "just type this in the terminal", won't suddenly have a broken system and will have time to find someone who can make it work again.

That's what's done in GNU Octave and I think it works well there.

Carnë

From scott at scottcain.net Wed Nov 9 20:48:07 2011
From: scott at scottcain.net (Scott Cain)
Date: Wed, 9 Nov 2011 15:48:07 -0500
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To:
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es> <0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu>
Message-ID:

Hi Carnë,

You are absolutely correct; that is the right way to do it. I'll add that right now (and if the fix for the original post's problem is an easy one, I'll fix that too :-)

Scott

2011/11/9 Carnë Draug

> On 9 November 2011 18:43, Scott Cain wrote:
> > Hi Chris,
> >
> > Actually, removing it from the distribution (but letting it remain in the
> > code repository) is not a bad idea. I can't really think of a down side.
> >
> > Scott
>
> Hi
>
> can I suggest instead to simply make the script issue a warning right
> at the start? Something like "bp_genbank2gff is obsolete and will be
> removed from a future version of bioerl; please use bp_genbank2gff3
> instead". You could leave it there for the next 2 releases and then
> finally remove it. This would have 2 advantages:
>
> 1) people that have been using it will immediately know what to use as
> replacement (instead of coming and ask in the mailing list)
> 2) people who use it but don't know anything about the subject,
> someone told them to "just press this button" or "just type this in
> the terminal", won't have suddenly a broken system and will have time
> to find someone that will make it work again.
>
> That's what's done in GNU octave and I think it works good there.
> Carnë
>

--
------------------------------------------------------------------------
Scott Cain, Ph. D.
scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

From cjfields at illinois.edu Wed Nov 9 21:59:48 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 9 Nov 2011 21:59:48 +0000
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To:
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es> <0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu>
Message-ID:

Works for me, it's a standard deprecation policy. The only caveat is that the next 'release' of the code would be when the related code is split out into its own distribution (which will require its own versioning).

chris

On Nov 9, 2011, at 2:48 PM, Scott Cain wrote:

> Hi Carnë,
>
> You are absolutely correct; that is the right way to do it. I'll add that right now (and if the original posts fix is an easy one, I'll fix that too :-)
>
> Scott
>
>
> 2011/11/9 Carnë Draug
> On 9 November 2011 18:43, Scott Cain wrote:
> > Hi Chris,
> >
> > Actually, removing it from the distribution (but letting it remain in the
> > code repository) is not a bad idea. I can't really think of a down side.
> >
> > Scott
>
> Hi
>
> can I suggest instead to simply make the script issue a warning right
> at the start? Something like "bp_genbank2gff is obsolete and will be
> removed from a future version of bioerl; please use bp_genbank2gff3
> instead". You could leave it there for the next 2 releases and then
> finally remove it. This would have 2 advantages:
>
> 1) people that have been using it will immediately know what to use as
> replacement (instead of coming and ask in the mailing list)
> 2) people who use it but don't know anything about the subject,
> someone told them to "just press this button" or "just type this in
> the terminal", won't have suddenly a broken system and will have time
> to find someone that will make it work again.
>
> That's what's done in GNU octave and I think it works good there.
> Carnë
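The deprecation notice proposed in this thread could look roughly like the sketch below. It is only an illustration in core Perl, not the change that was actually committed: the helper name deprecation_notice is hypothetical, and the wording of the message is taken from the suggestion quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): build the notice text for an
# obsolete script and the script that replaces it.
sub deprecation_notice {
    my ($old, $new) = @_;
    return "$old is obsolete and will be removed from a future version of "
         . "bioperl; please use $new instead\n";
}

# warn() writes to STDERR, so output redirected from STDOUT
# (for example "> out.gff") is not polluted by the notice.
warn deprecation_notice('bp_genbank2gff', 'bp_genbank2gff3');
```

Placing the warn() call near the top of the old script means users see the notice on every run while the script keeps working during the transition releases.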
> > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research From biopython at maubp.freeserve.co.uk Thu Nov 10 13:09:40 2011 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 10 Nov 2011 13:09:40 +0000 Subject: [Bioperl-l] [Bioperl-announce-l] Null Pointer - NCBI C++ Exception In-Reply-To: <31659982.post@talk.nabble.com> References: <31659982.post@talk.nabble.com> Message-ID: Re: http://lists.open-bio.org/pipermail/bioperl-announce-l/2011-May/000253.html On Thu, May 19, 2011 at 11:15 PM, rgoldade wrote: > > I received the following error while trying to run bl2seq from > standaloneblastplus. Has anyone else encountered this problem? > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: /usr/bin/blastp call crashed: There was a problem running > /usr/bin/blastp : Error: NCBI C++ Exception: > > "/am/ncbiapdata/release/blast/src/2.2.24/Linux32-Suse-icc/c++/ICC1010-ReleaseMT--Linux32-Suse-icc/../src/corelib/ncbiobj.cpp", > line 688: Critical: ncbi::CObject::ThrowNullPointerException() - Attempt to > access NULL pointer. > > Thank you, > Ryan Just hit something very very similar, looks like a BLAST+ bug which I will report now: $ tblastn -out NC_003197_vs_NC_011294.tsv -outfmt 6 -query NC_003197.fna -evalue 0.0001 -subject NC_011294.fna Error: NCBI C++ Exception: "/am/ncbiapdata/release/blast/src/2.2.25/IntelMAC-universal/c++/GCC401-ReleaseMT--IntelMAC-universal/../src/corelib/ncbiobj.cpp", line 689: Critical: ncbi::CObject::ThrowNullPointerException() - Attempt to access NULL pointer. This was on a Mac using BLAST 2.2.25+ (tblastn) whereas yours was BLAST 2.2.24+ (blastp) from the look of the error. The line number has changed by one, but I'm confident it is the same point of failure. 
In my case I was comparing nucleotide against nucleotide, so should have been using tblastx not tblastn, but it still shouldn't have had a pointer exception. Peter From cjfields at illinois.edu Thu Nov 10 14:00:46 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 10 Nov 2011 14:00:46 +0000 Subject: [Bioperl-l] [Bioperl-announce-l] Null Pointer - NCBI C++ Exception In-Reply-To: References: <31659982.post@talk.nabble.com> Message-ID: On Nov 10, 2011, at 7:09 AM, Peter wrote: > Re: http://lists.open-bio.org/pipermail/bioperl-announce-l/2011-May/000253.html > > On Thu, May 19, 2011 at 11:15 PM, rgoldade wrote: >> >> I received the following error while trying to run bl2seq from >> standaloneblastplus. Has anyone else encountered this problem? >> >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: /usr/bin/blastp call crashed: There was a problem running >> /usr/bin/blastp : Error: NCBI C++ Exception: >> >> "/am/ncbiapdata/release/blast/src/2.2.24/Linux32-Suse-icc/c++/ICC1010-ReleaseMT--Linux32-Suse-icc/../src/corelib/ncbiobj.cpp", >> line 688: Critical: ncbi::CObject::ThrowNullPointerException() - Attempt to >> access NULL pointer. >> >> Thank you, >> Ryan > > Just hit something very very similar, looks like a BLAST+ bug which I > will report now: > > $ tblastn -out NC_003197_vs_NC_011294.tsv -outfmt 6 -query > NC_003197.fna -evalue 0.0001 -subject NC_011294.fna > Error: NCBI C++ Exception: > "/am/ncbiapdata/release/blast/src/2.2.25/IntelMAC-universal/c++/GCC401-ReleaseMT--IntelMAC-universal/../src/corelib/ncbiobj.cpp", > line 689: Critical: ncbi::CObject::ThrowNullPointerException() - > Attempt to access NULL pointer. > > This was on a Mac using BLAST 2.2.25+ (tblastn) whereas yours was > BLAST 2.2.24+ (blastp) from the look of the error. The line number has > changed by one, but I'm confident it is the same point of failure. 
>
> In my case I was comparing nucleotide against nucleotide, so should
> have been using tblastx not tblastn, but it still shouldn't have had a
> pointer exception.
>
> Peter

Yeah, that's bad. I have seen a few things like this myself that make me worry about the transition to BLAST+.

chris

PS - Odd I didn't see this one, was it caught in the bioperl-announce filter?

From casaburi at ceinge.unina.it Thu Nov 10 12:29:55 2011
From: casaburi at ceinge.unina.it (Giorgio C)
Date: Thu, 10 Nov 2011 04:29:55 -0800 (PST)
Subject: [Bioperl-l] Scripting help to identify adaptors count in reads
Message-ID: <32818254.post@talk.nabble.com>

Hi everybody,

I have some 454 reads that contain adaptors (shown as NNNN...): one, two or three adaptors per read, depending on the read. Is there any way to establish how many reads have 1 adaptor, how many have 2 and how many have 3, out of the total?

>271-88
GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCAGGTGCCTACG
>272-88
GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCANNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTTGCCAGCCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGC
>273-88
GCCTCCCTCGCGCATCAGATCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCA
>274-88
GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTCCCTCGCGCCATCAGATCGTNNNNNNNNNNNNNNNNNNTCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCATCAGATCGTAGGCACCATCAA

The problem is that some adaptors occur in the middle of the sequences, because the reads come from a concatemerization experimental design (the miRNAs sit between the NNNNNN... runs). So I am looking for a script or tool that reports how many reads have 1 adaptor, how many have 2, and so on (the maximum is 4), relative to the total number of reads. Can anyone suggest a tool or script that may help?

Thank you very much
--
View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32818254.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
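One possible way to get the tally asked for above, as a minimal sketch in core Perl with no BioPerl dependency. It assumes that every adaptor appears as an unbroken run of N's and that a run of at least 10 N's counts as one adaptor; that threshold, the sub names count_adaptor_runs and tally_reads, and the small in-memory test reads are illustrative assumptions rather than anything taken from the poster's data.

```perl
use strict;
use warnings;

# Count the runs of consecutive N's (assumed to be masked adaptors) in one
# read. A run shorter than $min_len is ignored (assumed default: 10).
sub count_adaptor_runs {
    my ($seq, $min_len) = @_;
    $min_len //= 10;
    # "= () =" forces list context on the match, then counts the matches.
    my $count = () = $seq =~ /N{$min_len,}/gi;
    return $count;
}

# Read FASTA records from a filehandle and tally the reads by their number
# of adaptor runs. Returns a hash reference: { runs => number_of_reads }.
sub tally_reads {
    my ($fh, $min_len) = @_;
    my (%tally, $seq);
    my $flush = sub {
        $tally{ count_adaptor_runs($seq, $min_len) }++ if defined $seq;
    };
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>/) {        # header line: close the previous record
            $flush->();
            $seq = '';
        }
        elsif (defined $seq) {      # sequence line (records may be wrapped)
            $seq .= $line;
        }
    }
    $flush->();                     # close the last record
    return \%tally;
}

# Small made-up demonstration with three reads held in memory.
my $adaptor = 'N' x 15;
my $fasta   = join "\n",
    '>r1', "ACGT${adaptor}ACGT",
    '>r2', "ACGT${adaptor}ACGT${adaptor}AC",
    '>r3', 'ACGTACGT', '';
open my $fh, '<', \$fasta or die "Cannot open in-memory FASTA: $!";
my $tally = tally_reads($fh);
close $fh;

printf "%d adaptor(s): %d read(s)\n", $_, $tally->{$_}
    for sort { $a <=> $b } keys %$tally;
```

For a real file, the in-memory filehandle would be replaced by one opened on the FASTA file; dividing each count by the sum of all counts gives the fraction of the total.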
From jovel_juan at hotmail.com Thu Nov 10 16:06:16 2011
From: jovel_juan at hotmail.com (Juan Jovel)
Date: Thu, 10 Nov 2011 16:06:16 +0000
Subject: [Bioperl-l] Scripting help to identify adaptors count in reads
In-Reply-To: <32818254.post@talk.nabble.com>
References: <32818254.post@talk.nabble.com>
Message-ID:

There are many ways to do it.
Perhaps the simplest is to count the number of times the adapter sequence (or part of it) appears in each read.
For example:

    $adapter_matches = tr/adapter_sequence/adapter_sequence/;
    # $adapter_matches will store the number of times the adapter sequence is repeated.

You then place that result in a hash bin:

    my %adapter_frequency;
    my $class = "$adapter_matches";
    if (exists $adapter_frequency{$class}) {
        $adapter_frequency{$class}++;
    } else {
        $adapter_frequency{$class} = 1;
    }

    # Then you can sort and output your classes
    foreach $class (sort keys %adapter_frequency) {
        print "$class\t$adapter_frequency{$class}\n";
    }

You can work out the details, but something like this should work.

> Date: Thu, 10 Nov 2011 04:29:55 -0800
> From: casaburi at ceinge.unina.it
> To: Bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Scripting help to identify adaptors count in reads
>
>
> Hi everybody,
>
> i have some reads (454) where there are adaptors (NNNN...), one,two or three
> adaptors for each reads depending on the reads. Is there any way to
> establish how many reads have 1 adaptors, how many 2 and how many 3 adaptors
> over the total ???
> > >271-88 > GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCAGGTGCCTACG > >272-88 > GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCANNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTTGCCAGCCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGC > >273-88 > GCCTCCCTCGCGCATCAGATCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCA > >274-88 > GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTCCCTCGCGCCATCAGATCGTNNNNNNNNNNNNNNNNNNTCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCATCAGATCGTAGGCACCATCAA > > The problem is that some adpators occur in the middle of the sequences > because they coming out from a concameration experimental design (they are > miRNAs between NNNNNN...). So i want to know a script or tool that may say > how many reads have 1 adapt, how many 2, (max are 4) in respect to the total > number of reads. Do you know any tool/script that may help ? Tnx > Can anyone suggests me a script to fix this ??? > > Thank you very much > -- > View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32818254.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From scott at scottcain.net Thu Nov 10 16:55:53 2011 From: scott at scottcain.net (Scott Cain) Date: Thu, 10 Nov 2011 11:55:53 -0500 Subject: [Bioperl-l] bp_genbank2gff.pl bug In-Reply-To: <0CACB816-21E2-494E-ACEF-67F9FBB5AC41@isciii.es> References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es> <0CACB816-21E2-494E-ACEF-67F9FBB5AC41@isciii.es> Message-ID: Hi Angel, Please keep correspondence on the mailing list. I just ran bp_genbank2gff.pl with a genbank file (fruit fly mitocontria), and it worked fine. I suspect there is something wrong with your genbank file. Scott On Thu, Nov 10, 2011 at 3:15 AM, Angel Zaballos wrote: > His Scott, > > Thanks everyone for your help. 
I tried bp_genbank2gff3.pl, but the same > happened: > > [root at localhost zaballos]# bp_genbank2gff3.pl babesiaChr3.gbk > > babesichr3_2.gff > Replacement list is longer than search list at > /usr/share/perl5/Bio/Range.pm line 251. > UNIVERSAL->import is deprecated and will be removed in a future perl at > /usr/share/perl5/Bio/Tree/TreeFunctionsI.pm line 94 > > However, the output file seems to be correct (Indeed, that was also the > case for bp_genbank2gff.pl). I then ran ldHgGene and worked: > > [zaballos at localhost ~]$ ./ldHgGene -out=babesiaChr3_2.gpe db tab > babesiachr3_2.gff > Reading babesiachr3_2.gff > Read 4776 transcripts in 8821 lines in 1 files > 4776 groups 1 seqs 1 sources 6 feature types > 2379 gene predictions > > I'm using Fedora (for bioperl) and CentOS (for ldHgGene), virtualized on a > Mac with Parallels. Maybe tis is the cause for such a message. > > Regards > > > ?ngel > > > El 09/11/2011, a las 17:12, Scott Cain escribi?: > > Hi Angel, > > I would suggest using bp_genbank2gff3.pl, as it is more actively > maintained; the bp_genbank2gff.pl script hasn't really been touched in > many years, and I imagine it's suffering from some serious code rot. > > Scott > > > 2011/11/9 Angel Zaballos > >> Running bp_genbank2gff.pl got this: >> >> [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession >> AAXT01000001.1 > babesichr3.gff >> Replacement list is longer than search list at >> /usr/share/perl5/Bio/Range.pm line 251. >> >> >> >> ?ngel Zaballos >> Unidad de Gen?mica >> Centro Nacional de Microbiolog?a-ISCIII >> Carretera Majadahonda-Pozuelo, Km 2,2 >> 28220-Majadahonda >> >> Tel: 918223994 >> mail: azaballos at isciii.es >> >> >> >> >> ************************* AVISO LEGAL ************************* >> Este mensaje electr?nico est? dirigido exclusivamente a sus destinatarios, >> pudiendo contener documentos anexos de car?cter privado y confidencial. 
>> Si por error, ha recibido este mensaje y no se encuentra entre los >> destinatarios, por favor, no use, informe, distribuya, imprima o copie su >> contenido por ning?n medio. Le rogamos lo comunique al remitente y borre >> completamente el mensaje y sus anexos. El Instituto de Salud Carlos III no >> asume ning?n tipo de responsabilidad legal por el contenido de este >> mensaje >> cuando no responda a las funciones atribuidas al remitente del mismo por >> la >> normativa vigente. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain > dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > > ?ngel Zaballos > Unidad de Gen?mica > Centro Nacional de Microbiolog?a-ISCIII > Carretera Majadahonda-Pozuelo, Km 2,2 > 28220-Majadahonda > > Tel: 918223994 > mail: azaballos at isciii.es > > > > ************************* AVISO LEGAL ************************* > Este mensaje electr?nico est? dirigido exclusivamente a sus destinatarios, > pudiendo contener documentos anexos de car?cter privado y confidencial. > Si por error, ha recibido este mensaje y no se encuentra entre los > destinatarios, por favor, no use, informe, distribuya, imprima o copie su > contenido por ning?n medio. Le rogamos lo comunique al remitente y borre > completamente el mensaje y sus anexos. El Instituto de Salud Carlos III no > asume ning?n tipo de responsabilidad legal por el contenido de este mensaje > cuando no responda a las funciones atribuidas al remitente del mismo por la > normativa vigente. > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

From l.m.timmermans at students.uu.nl Thu Nov 10 17:17:12 2011
From: l.m.timmermans at students.uu.nl (L.M. Timmermans)
Date: Thu, 10 Nov 2011 18:17:12 +0100
Subject: [Bioperl-l] Scripting help to identify adaptors count in reads
In-Reply-To:
References: <32818254.post@talk.nabble.com>
Message-ID:

On Thu, Nov 10, 2011 at 5:06 PM, Juan Jovel wrote:

>
> There are many ways to do it.
> Perhaps the simplest is to count the number of times the adapter sequence
> (or part of it) appears in each read.
> For example:
> $adapter_matches = tr/adapter_sequence/adapter_sequence/;#
> $adapter_matches will store the number of times the adapter sequence is
> repeated.
>

No, it will not. tr/// will count characters, not sequences. Something like "scalar (() = $sequence =~ m/(N+)/g)" should work OTOH.

Leon

From cjfields at illinois.edu Thu Nov 10 19:17:57 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 10 Nov 2011 19:17:57 +0000
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To:
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es> <0CACB816-21E2-494E-ACEF-67F9FBB5AC41@isciii.es>
Message-ID: <66F13EAF-0DAA-45E0-AB5B-E71EC8FA2323@illinois.edu>

This is running using an older version of bioperl (probably 1.6.0 or 1.6.1). The warnings pop up when using perl v5.12 or v5.14; the first warning is from a bad tr/// in Bio::Range, the second is from bad usage of UNIVERSAL functions. Both have been addressed.

chris

On Nov 10, 2011, at 10:55 AM, Scott Cain wrote:

> Hi Angel,
>
> Please keep correspondence on the mailing list.
>
> I just ran bp_genbank2gff.pl with a genbank file (fruit fly mitocontria),
> and it worked fine. I suspect there is something wrong with your genbank
> file.
>
> Scott
>
>
> On Thu, Nov 10, 2011 at 3:15 AM, Angel Zaballos wrote:
>
>> His Scott,
>>
>> Thanks everyone for your help.
I tried bp_genbank2gff3.pl, but the same >> happened: >> >> [root at localhost zaballos]# bp_genbank2gff3.pl babesiaChr3.gbk > >> babesichr3_2.gff >> Replacement list is longer than search list at >> /usr/share/perl5/Bio/Range.pm line 251. >> UNIVERSAL->import is deprecated and will be removed in a future perl at >> /usr/share/perl5/Bio/Tree/TreeFunctionsI.pm line 94 >> >> However, the output file seems to be correct (Indeed, that was also the >> case for bp_genbank2gff.pl). I then ran ldHgGene and worked: >> >> [zaballos at localhost ~]$ ./ldHgGene -out=babesiaChr3_2.gpe db tab >> babesiachr3_2.gff >> Reading babesiachr3_2.gff >> Read 4776 transcripts in 8821 lines in 1 files >> 4776 groups 1 seqs 1 sources 6 feature types >> 2379 gene predictions >> >> I'm using Fedora (for bioperl) and CentOS (for ldHgGene), virtualized on a >> Mac with Parallels. Maybe tis is the cause for such a message. >> >> Regards >> >> >> ?ngel >> >> >> El 09/11/2011, a las 17:12, Scott Cain escribi?: >> >> Hi Angel, >> >> I would suggest using bp_genbank2gff3.pl, as it is more actively >> maintained; the bp_genbank2gff.pl script hasn't really been touched in >> many years, and I imagine it's suffering from some serious code rot. >> >> Scott >> >> >> 2011/11/9 Angel Zaballos >> >>> Running bp_genbank2gff.pl got this: >>> >>> [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession >>> AAXT01000001.1 > babesichr3.gff >>> Replacement list is longer than search list at >>> /usr/share/perl5/Bio/Range.pm line 251. >>> >>> >>> >>> ?ngel Zaballos >>> Unidad de Gen?mica >>> Centro Nacional de Microbiolog?a-ISCIII >>> Carretera Majadahonda-Pozuelo, Km 2,2 >>> 28220-Majadahonda >>> >>> Tel: 918223994 >>> mail: azaballos at isciii.es >>> >>> >>> >>> >>> ************************* AVISO LEGAL ************************* >>> Este mensaje electr?nico est? dirigido exclusivamente a sus destinatarios, >>> pudiendo contener documentos anexos de car?cter privado y confidencial. 
>>> Si por error, ha recibido este mensaje y no se encuentra entre los >>> destinatarios, por favor, no use, informe, distribuya, imprima o copie su >>> contenido por ning?n medio. Le rogamos lo comunique al remitente y borre >>> completamente el mensaje y sus anexos. El Instituto de Salud Carlos III no >>> asume ning?n tipo de responsabilidad legal por el contenido de este >>> mensaje >>> cuando no responda a las funciones atribuidas al remitente del mismo por >>> la >>> normativa vigente. >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> >> >> -- >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D. scott at scottcain >> dot net >> GMOD Coordinator (http://gmod.org/) 216-392-3087 >> Ontario Institute for Cancer Research >> >> >> ?ngel Zaballos >> Unidad de Gen?mica >> Centro Nacional de Microbiolog?a-ISCIII >> Carretera Majadahonda-Pozuelo, Km 2,2 >> 28220-Majadahonda >> >> Tel: 918223994 >> mail: azaballos at isciii.es >> >> >> >> ************************* AVISO LEGAL ************************* >> Este mensaje electr?nico est? dirigido exclusivamente a sus destinatarios, >> pudiendo contener documentos anexos de car?cter privado y confidencial. >> Si por error, ha recibido este mensaje y no se encuentra entre los >> destinatarios, por favor, no use, informe, distribuya, imprima o copie su >> contenido por ning?n medio. Le rogamos lo comunique al remitente y borre >> completamente el mensaje y sus anexos. El Instituto de Salud Carlos III no >> asume ning?n tipo de responsabilidad legal por el contenido de este mensaje >> cuando no responda a las funciones atribuidas al remitente del mismo por la >> normativa vigente. >> >> > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. 
scott at scottcain dot > net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From biopython at maubp.freeserve.co.uk Thu Nov 10 19:27:22 2011 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 10 Nov 2011 19:27:22 +0000 Subject: [Bioperl-l] [Bioperl-announce-l] Null Pointer - NCBI C++ Exception In-Reply-To: References: <31659982.post@talk.nabble.com> Message-ID: On Thu, Nov 10, 2011 at 2:00 PM, Fields, Christopher J wrote: > On Nov 10, 2011, at 7:09 AM, Peter wrote: > >> Re: http://lists.open-bio.org/pipermail/bioperl-announce-l/2011-May/000253.html >> >> On Thu, May 19, 2011 at 11:15 PM, rgoldade wrote: >>> >>> I received the following error while trying to run bl2seq from >>> standaloneblastplus. Has anyone else encountered this problem? >>> >>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>> MSG: /usr/bin/blastp call crashed: There was a problem running >>> /usr/bin/blastp : Error: NCBI C++ Exception: >>> >>> "/am/ncbiapdata/release/blast/src/2.2.24/Linux32-Suse-icc/c++/ICC1010-ReleaseMT--Linux32-Suse-icc/../src/corelib/ncbiobj.cpp", >>> line 688: Critical: ncbi::CObject::ThrowNullPointerException() - Attempt to >>> access NULL pointer. >>> >>> Thank you, >>> Ryan >> >> Just hit something very very similar, looks like a BLAST+ bug which I >> will report now: >> >> $ tblastn -out NC_003197_vs_NC_011294.tsv -outfmt 6 -query >> NC_003197.fna -evalue 0.0001 -subject NC_011294.fna >> Error: NCBI C++ Exception: >> ? ?"/am/ncbiapdata/release/blast/src/2.2.25/IntelMAC-universal/c++/GCC401-ReleaseMT--IntelMAC-universal/../src/corelib/ncbiobj.cpp", >> line 689: Critical: ncbi::CObject::ThrowNullPointerException() - >> Attempt to access NULL pointer. 
>>
>> This was on a Mac using BLAST 2.2.25+ (tblastn) whereas yours was
>> BLAST 2.2.24+ (blastp) from the look of the error. The line number has
>> changed by one, but I'm confident it is the same point of failure.
>>
>> In my case I was comparing nucleotide against nucleotide, so should
>> have been using tblastx not tblastn, but it still shouldn't have had a
>> pointer exception.
>>
>> Peter
>
> Yeah, that's bad. I have seen a few things like this myself that make me worry about the transition to BLAST+.
>
> chris

I'm told it is already fixed and will be part of BLAST 2.2.26+, which is good.

>
> PS - Odd I didn't see this one, was it caught in the bioperl-announce filter?
>

Maybe once, but it was in the archive and my email account.

Peter

From anna.fr at gmail.com Thu Nov 10 20:01:57 2011
From: anna.fr at gmail.com (Anna Friedlander)
Date: Fri, 11 Nov 2011 09:01:57 +1300
Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi?
Message-ID:

Hi all

Does anyone know if there is a way to get a Taxonomy node and/or taxonid from a gi number using the flatfile with taxonomy db?

I have blast output that I want to append taxonomic information to. I have hundreds of thousands of items to do this for, so it's not practical to use entrez to query the NCBI database.

I have the GI->taxid file from the taxonomy ftp but it's 3.2GB, so I think it is much too large to put into a hash!

This was also discussed in 2009: http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I don't think there was a conclusion?

Thanks for your help
Anna Friedlander

From shalabh.sharma7 at gmail.com Thu Nov 10 20:12:09 2011
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Thu, 10 Nov 2011 15:12:09 -0500
Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi?
In-Reply-To:
References:
Message-ID:

Hi Anna,
I think the thread you mentioned was started by me.
At that time I wrote a few scripts to map gi to taxa, and after some time I found some other efficient ways as well.
But recently 'Miguel Pignatelli' directed to some Bio-LITE modules that are really helpful. These are the modules he mentioned, i found them really easy to use and very efficient. Bio-LITE-Taxonomy-0.07 Bio-LITE-Taxonomy-NCBI-0.07 Bio-LITE-Taxonomy-NCBI-**Gi2taxid-0.04 Cheers Shalabh On Thu, Nov 10, 2011 at 3:01 PM, Anna Friedlander wrote: > Hi all > > Does anyone know if there is a way to get a Taxonomy node and/or > taxonid from a gi number using the flatfile with taxonomy db? > > I have blast output that I want to append taxonomic information to. I > have hundreds of thousands of items to do this for, so it's not > practical to use entrez to query the NCBI database. > > I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I > think much too large to put into a hash! > > This was also discussed in 2009: > http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I > don't think there was a conclusion? > > Thanks for your help > Anna Friedlander > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From cjfields at illinois.edu Thu Nov 10 20:23:14 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 10 Nov 2011 20:23:14 +0000 Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi? In-Reply-To: References: Message-ID: <53AF9ECA-5905-4D14-B7C1-FF4B2F2FA084@illinois.edu> Yes, these are probably wrappers around the gi2taxid, and taxonomy data; bioperl lacks the former, whereas the latter is handled by Bio::DB::Taxonomy (the 'flatfile' option). I did something very similar locally, though I used Bio::DB::Taxonomy for the taxonomy lookups. 
chris On Nov 10, 2011, at 2:12 PM, shalabh sharma wrote: > Hi Anna, > I think the thread you mentioned was started by me. > That time i wrote few scripts to map gi to taxa, after some time i found > some other efficient ways also. But recently 'Miguel Pignatelli' directed > to some Bio-LITE modules that are really helpful. > > These are the modules he mentioned, i found them really easy to use and > very efficient. > > Bio-LITE-Taxonomy-0.07 > Bio-LITE-Taxonomy-NCBI-0.07 > Bio-LITE-Taxonomy-NCBI-**Gi2taxid-0.04 > > Cheers > Shalabh > > On Thu, Nov 10, 2011 at 3:01 PM, Anna Friedlander wrote: > >> Hi all >> >> Does anyone know if there is a way to get a Taxonomy node and/or >> taxonid from a gi number using the flatfile with taxonomy db? >> >> I have blast output that I want to append taxonomic information to. I >> have hundreds of thousands of items to do this for, so it's not >> practical to use entrez to query the NCBI database. >> >> I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I >> think much too large to put into a hash! >> >> This was also discussed in 2009: >> http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I >> don't think there was a conclusion? >> >> Thanks for your help >> Anna Friedlander >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > Shalabh Sharma > Scientific Computing Professional Associate (Bioinformatics Specialist) > Department of Marine Sciences > University of Georgia > Athens, GA 30602-3636 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bernd.web at gmail.com Thu Nov 10 20:51:13 2011 From: bernd.web at gmail.com (Bernd Web) Date: Thu, 10 Nov 2011 21:51:13 +0100 Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi? 
In-Reply-To: References: Message-ID: Hi Anna, Jason changed his example script from using hashes to using SQLite: bp_classify_hits_kingdom - classify BLAST hits by taxonomic kingdom See https://github.com/bioperl/bioperl-live/blob/master/scripts/taxa/bp_classify_hits_kingdom.pl It's an example script that shows how to do the tax to gi mapping for blast reports. Bernd On Thu, Nov 10, 2011 at 9:01 PM, Anna Friedlander wrote: > Hi all > > Does anyone know if there is a way to get a Taxonomy node and/or > taxonid from a gi number using the flatfile with taxonomy db? > > I have blast output that I want to append taxonomic information to. I > have hundreds of thousands of items to do this for, so it's not > practical to use entrez to query the?NCBI database. > > I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I > think much too large to put into a hash! > > This was also discussed in 2009: > http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I > don't think there was a conclusion? > > Thanks for your help > Anna Friedlander > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Thu Nov 10 21:13:12 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 10 Nov 2011 21:13:12 +0000 Subject: [Bioperl-l] Scripting help to identify adaptors count in reads In-Reply-To: References: <32818254.post@talk.nabble.com> Message-ID: <9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu> If the adaptors are masked (e.g. are represented by the N's below) or if you are really confident that the adaptors don't have base mis-calls, why not use split? Maybe with something like 'scalar(split(/N+/, $foo))' or scalar(split(/$adaptor/, $foo)). tr/// won't work for the reasons Leon mentioned; it's a transliteration of a character mapping, not a pattern match. 
'$foo =~ tr/ATGCatgc/TACGtacg/' for instance converts $foo to the complement sequence (it doesn't match the pattern /ATGCatgc/). chris On Nov 10, 2011, at 10:06 AM, Juan Jovel wrote: > > There are many ways to do it. > Perhaps the simplest is to count the number of times the adapter sequence (or part of it) appears in each read. > For example: > $adapter_matches = tr/adapter_sequence/adapter_sequence/;# $adapter_matches will store the number of times the adapter sequence is repeated. > You then place that result in a hash bin: > my %adapter_frequency;my $class = "$adapter_matches";if(exists $adapter_frequency{$class}){ $adapter_frequency{$class}++}else{ $adapter_frequency{$class} = 1} > # Then you can sort and output your classes > foreach $class (sort keys %adapter_frequency){ print "$class\t$adapter_frequency{$class}\n"; } > > You can workout the details, but something like this should work. > > > > > > > >> Date: Thu, 10 Nov 2011 04:29:55 -0800 >> From: casaburi at ceinge.unina.it >> To: Bioperl-l at lists.open-bio.org >> Subject: [Bioperl-l] Scripting help to identify adaptors count in reads >> >> >> Hi everybody, >> >> i have some reads (454) where there are adaptors (NNNN...), one,two or three >> adaptors for each reads depending on the reads. Is there any way to >> establish how many reads have 1 adaptors, how many 2 and how many 3 adaptors >> over the total ???
>> >>> 271-88 >> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCAGGTGCCTACG >>> 272-88 >> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCANNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTTGCCAGCCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGC >>> 273-88 >> GCCTCCCTCGCGCATCAGATCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCA >>> 274-88 >> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTCCCTCGCGCCATCAGATCGTNNNNNNNNNNNNNNNNNNTCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCATCAGATCGTAGGCACCATCAA >> >> The problem is that some adpators occur in the middle of the sequences >> because they coming out from a concameration experimental design (they are >> miRNAs between NNNNNN...). So i want to know a script or tool that may say >> how many reads have 1 adapt, how many 2, (max are 4) in respect to the total >> number of reads. Do you know any tool/script that may help ? Tnx >> Can anyone suggests me a script to fix this ??? >> >> Thank you very much >> -- >> View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32818254.html >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at gmail.com Thu Nov 10 21:15:29 2011 From: jason.stajich at gmail.com (Jason Stajich) Date: Thu, 10 Nov 2011 13:15:29 -0800 Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi? 
In-Reply-To: References: Message-ID: <1C8130A1-A60A-4F16-A077-4E1B592579BC@gmail.com> Here's another variant, one I wrote for my own purposes. The code at the beginning uses a NoSQL solution to store all the ACC -> GI mappings, and then a second db to store GI -> TAXONID. This is the case where I have a file of accession numbers and I want to add the taxonomy string to the description line. https://github.com/hyphaltip/mobedac-fungi/blob/master/scripts/taxonomy_lookupmissing.pl That's the first 165 lines, and then lookups are basically what you see on line 195. It would be good to rewrite that script below to use Tokyo Cabinet or Kyoto Cabinet (the newer implementation, not sure if it is faster?). One thing this does is take up a lot of disk space, but you can trade that off against which compression scheme you use, which will impact performance of loading. Jason On Nov 10, 2011, at 12:51 PM, Bernd Web wrote: > Hi Anna, > > Jason changed his example script from using hashes to using SQLite: > bp_classify_hits_kingdom - classify BLAST hits by taxonomic kingdom > > See > https://github.com/bioperl/bioperl-live/blob/master/scripts/taxa/bp_classify_hits_kingdom.pl > > It's an example script that shows how to do the tax to gi mapping for > blast reports. > > > Bernd > > On Thu, Nov 10, 2011 at 9:01 PM, Anna Friedlander wrote: >> Hi all >> >> Does anyone know if there is a way to get a Taxonomy node and/or >> taxonid from a gi number using the flatfile with taxonomy db? >> >> I have blast output that I want to append taxonomic information to. I >> have hundreds of thousands of items to do this for, so it's not >> practical to use entrez to query the NCBI database. >> >> I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I >> think much too large to put into a hash! >> >> This was also discussed in 2009: >> http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I >> don't think there was a conclusion?
>> >> Thanks for your help >> Anna Friedlander >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From anna.fr at gmail.com Fri Nov 11 01:07:57 2011 From: anna.fr at gmail.com (Anna Friedlander) Date: Fri, 11 Nov 2011 14:07:57 +1300 Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi? In-Reply-To: <1C8130A1-A60A-4F16-A077-4E1B592579BC@gmail.com> References: <1C8130A1-A60A-4F16-A077-4E1B592579BC@gmail.com> Message-ID: thanks all for the fast responses. I'll try the bio-lite modules shalabh mentioned On Fri, Nov 11, 2011 at 10:15 AM, Jason Stajich wrote: > Here's another variant of one I wrote which is for my own purposes, the code > at the beginning uses a NOSQL solution to storing all the ACC -> GI > and then a second db to store GI -> TAXONID > This is the case where I have a file of accession numbers and I want to add > to the description line the taxonomy string. > https://github.com/hyphaltip/mobedac-fungi/blob/master/scripts/taxonomy_lookupmissing.pl > That's the first 165 lines, and then lookups are basically what you see on > line 195. > Would be good to rewrite that script below to use TokyoCabinent > or?KyotoCabinent (is newer implementation, not sure if it is faster?). > one thing that this does is take up a lot of disk space ,but you can have > tradeoffs between than and which compression scheme you use, which will > impact performance of loading. 
> Jason > On Nov 10, 2011, at 12:51 PM, Bernd Web wrote: > > Hi Anna, > > Jason changed his example script from using hashes to using SQLite: > bp_classify_hits_kingdom - classify BLAST hits by taxonomic kingdom > > See > https://github.com/bioperl/bioperl-live/blob/master/scripts/taxa/bp_classify_hits_kingdom.pl > > It's an example script that shows how to do the tax to gi mapping for > blast reports. > > > Bernd > > On Thu, Nov 10, 2011 at 9:01 PM, Anna Friedlander wrote: > > Hi all > > Does anyone know if there is a way to get a Taxonomy node and/or > > taxonid from a gi number using the flatfile with taxonomy db? > > I have blast output that I want to append taxonomic information to. I > > have hundreds of thousands of items to do this for, so it's not > > practical to use entrez to query the?NCBI database. > > I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I > > think much too large to put into a hash! > > This was also discussed in 2009: > > http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I > > don't think there was a conclusion? > > Thanks for your help > > Anna Friedlander > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From arun_innovative90 at yahoo.com Fri Nov 11 11:09:46 2011 From: arun_innovative90 at yahoo.com (Arun Kumar) Date: Fri, 11 Nov 2011 03:09:46 -0800 (PST) Subject: [Bioperl-l] BIOPERL MATERIAL Message-ID: <1321009786.42748.YahooMailNeo@web110101.mail.gq1.yahoo.com> Hi team, ? ?? This is arun kumar of bio - informatics student wish to master in bioperl after reading your documents, if possible send me PDF?of this bioperl?as it will be useful to get familier with? bioperl. ? 
Thanks in advance Thanks & Regards, Arunkumar.d From awitney at sgul.ac.uk Fri Nov 11 13:23:29 2011 From: awitney at sgul.ac.uk (Adam Witney) Date: Fri, 11 Nov 2011 13:23:29 +0000 Subject: [Bioperl-l] BIOPERL MATERIAL In-Reply-To: <1321009786.42748.YahooMailNeo@web110101.mail.gq1.yahoo.com> References: <1321009786.42748.YahooMailNeo@web110101.mail.gq1.yahoo.com> Message-ID: All BioPerl documents can be found here: http://www.bioperl.org/wiki/Main_Page And a useful place to start would be the HOWTOs: http://www.bioperl.org/wiki/HOWTOs regards adam On 11 Nov 2011, at 11:09, Arun Kumar wrote: > Hi team, > > This is arun kumar of bio - informatics student wish to master in bioperl after reading your documents, if possible send me PDF of this bioperl as it will be useful to get familier with bioperl. > > Thanks in advance > > Thanks & Regards, > Arunkumar.d > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From casaburi at ceinge.unina.it Fri Nov 11 12:13:50 2011 From: casaburi at ceinge.unina.it (Giorgio C) Date: Fri, 11 Nov 2011 04:13:50 -0800 (PST) Subject: [Bioperl-l] Scripting help to identify adaptors count in reads In-Reply-To: <9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu> References: <32818254.post@talk.nabble.com> <9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu> Message-ID: <32825229.post@talk.nabble.com> Hi thank you for your answer !!! At the end i tried this script and seems to work for this purpose: perl -pe 's/.*?((NNNNNNNNNNNNNNN|NNNNNNNNNNNNNNN)(.*?)(NNNNNNNNNNNNNNN|NNNNNNNNNNNNNNN))/$3/g' Scrivania/orchidea/Fiore/Mydata.fasta > result.txt -- View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32825229.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
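For the question that opened this thread (how many reads carry one, two or three adaptors), a minimal Perl sketch follows. It assumes the adaptors are masked as runs of N, that a FASTA sequence may span several lines, and that only runs of at least `$min_run` Ns count as an adaptor. The threshold and the helper names `count_adaptor_runs` and `tally_reads` are illustrative and not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count runs of N (masked adaptors) in one sequence string.
# $min_run is an assumed threshold, so that a single stray N
# (an ordinary ambiguous base call) is not counted as an adaptor.
sub count_adaptor_runs {
    my ($seq, $min_run) = @_;
    $min_run //= 5;
    my $re    = qr/N{${min_run},}/i;
    my $count = () = $seq =~ /$re/g;    # list assignment in scalar context counts matches
    return $count;
}

# Read FASTA from a filehandle and return a hash reference mapping
# "number of adaptor runs" to "number of reads with that many runs".
sub tally_reads {
    my ($fh, $min_run) = @_;
    my %tally;
    my $seq         = '';
    my $seen_header = 0;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>/) {
            # A new header closes the previous record.
            $tally{ count_adaptor_runs($seq, $min_run) }++ if $seen_header;
            $seq         = '';
            $seen_header = 1;
        }
        else {
            $seq .= $line;    # join wrapped sequence lines before counting
        }
    }
    $tally{ count_adaptor_runs($seq, $min_run) }++ if $seen_header;
    return \%tally;
}

# Demo data; pass a FASTA file name on the command line to use real reads.
my $demo = <<'END';
>r1
ACGTNNNNNNNNACGT
>r2
ACGTNNNNNNACGTNNNNNN
ACGT
>r3
ACGTACGT
END

my $fh;
if (@ARGV) {
    open $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
}
else {
    open $fh, '<', \$demo or die "Cannot open in-memory FASTA: $!";
}

my $tally = tally_reads($fh);
for my $n (sort { $a <=> $b } keys %$tally) {
    printf "%d read(s) with %d adaptor run(s)\n", $tally->{$n}, $n;
}
```

Joining the wrapped lines first means an adaptor that straddles a line break is still counted once, which a line-by-line one-liner would miss.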
From casaburi at ceinge.unina.it Fri Nov 11 12:21:29 2011 From: casaburi at ceinge.unina.it (Giorgio C) Date: Fri, 11 Nov 2011 04:21:29 -0800 (PST) Subject: [Bioperl-l] Scripting help to identify adaptors count in reads In-Reply-To: <9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu> References: <32818254.post@talk.nabble.com> <9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu> Message-ID: <32825274.post@talk.nabble.com> Thanks everybody for answering me so soon !!! Probably another way may be: perl -ne '$count{s/N+//g}++ if /^[^>]/;END{for $i (keys %count){print "$count{$i} have $i ADAPTOR\n";}}' myFile.fasta > result.txt and/or with 'nawk': nawk -F'[N]+' '/^[^>]/{a[NF-1]++}END{for(i in a) print a[i] " have " i " ADAPTOR"}' myFile.fasta > result.txt They give the same result. If you will have this problem try these, work good !!! Still Thanks to all, Giorgio -- View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32825274.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From p.j.a.cock at googlemail.com Sun Nov 13 12:24:35 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 13 Nov 2011 12:24:35 +0000 Subject: [Bioperl-l] OBDA redux? In-Reply-To: References: Message-ID: On Thu, Nov 3, 2011 at 7:47 PM, Fields, Christopher J wrote: > On Nov 3, 2011, at 1:52 PM, Peter Cock wrote: > >> On Thu, Nov 3, 2011 at 6:28 PM, Fields, Christopher J >> wrote: >>> (side thread, so re-titling...) >>> >> And CC'ing open-bio-l, which is a better home for this than bioperl-l, >> where OBDA v2 talk came up again in discussion of a BioPerl indexing >> problem. 
Archive links for thread here: >> >> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035807.html >> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035808.html >> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035811.html >> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035812.html >> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035813.html >> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035822.html > > yes, good idea... I've not CC'd the bioperl-l anymore. >>> On Nov 1, 2011, at 1:06 PM, Peter Cock wrote: >>>> >>>> Yes, we're using SQLite3 to store essentially a list of filenames >>>> and their format as one table, and then in the main table an >>>> entry for each sequence recording the ID (only one accession, >>>> unlike OBDA which had infrastructure for a secondary accession), >>>> file number, offset of the start of the record, and optionally the >>>> length of the record on disk. >>>> >>>> i.e. Basically what OBDA does, but using SQLite rather >>>> than BDB (not included in Python 3) or a flat file index >>>> (poor performance with large datasets). >>>> >>>> I find this design attractive on several levels: >>>> * File format neutral, covers FASTA, FASTQ, GenBank, etc >>>> * Preserves the original file untouched >>>> * Index is a small single file (thanks to SQLite) >>>> * Back end could be switched out >>>> * Could be applied to compressed file formats >>>> * Reuses existing parsing code to access entries >>>> >>>> This could easily form basis of OBDA v2, the main points >>>> of difference I anticipate between the Bio* projects would >>>> be naming conventions for the different file formats, and >>>> what we consider to be the default record ID of each read >>>> (e.g. which field in a GenBank file - although agreement >>>> here is not essential). Some of that was already settled in >>>> principle with OBDA v1. 
>>> >>> The primary/secondary IDs could be configurable with a sane >>> default, I think the bioperl implementations allowed this (and >>> it is certainly something that will be requested). >> >> One reason I went with a single ID only was to keep the >> Python dictionary based API simple (think hash in Perl). >> You don't get secondary keys in a Python dict or a hash ;) >> >> As a nod to flexibility, in Biopython's Bio.SeqIO indexing you >> can provide a call back function to map the suggested ID to >> something else. Obviously this doesn't give the full flexibility >> of extracting a field from the record's annotation because we >> don't parse the whole record during indexing (it would be too >> slow). > > Same with bioperl. > >> However, I'm happy for there to be an *optional* secondary >> key in an OBDA v2 SQLite schema, but Biopython probably >> won't populate it. We could optionally use it rather than the >> primary ID on loading an existing index though. > > Optional implementation of that is fine by me. > >> Personally I would stick with one key in the index - it should >> be faster and makes it simpler to switch the back end if we >> need to later. If anyone wants a second key, they can build >> a second index *grin*. > > That's easy enough. > >>>> On the other hand, you could try and store the parsed data >>>> itself, which is where NOSQL looks more interesting. That >>>> essentially requires the ability to serialise your annotated >>>> sequence object model to disk - which would be tricky to do >>>> cross project (much more ambitious than BioSQL is). It also >>>> means the "index" becomes very large because it now holds >>>> all the original data. >>>> >>>> Peter >>> >>> For a fully cross-Bio* compliant format, I don't think it's feasible >>> to use serialized data unless they are serialized in something >>> that is easily deserialized across HLLs (JSON, BSON, YAML, >>> XML, etc). 
?Either that, or such data is stored concurrently with >>> the binary blob, along with meta data that indicates the source >>> of the blob, parser, version, etc, etc (unless there are tools out >>> there that reliably interconvert serialized complex data structures >>> between HLLs). ?Anyway you go about it, it seems like it could >>> be a major ball of hurt, unless implemented very carefully. >> >> You missed out RDF as a serialisation ;) >> >> But yes, going down the shared serialisation route is going >> to be messy - as you are well aware: >> >>> Aside: I think this was one of the problems with >>> Bio::DB::SeqFeature::Store, in that it at one point stored >>> Perl-specific Storable blobs. >>> >>> chris >> >> Peter > > yes, it's a problem w/o an easy solution. ?Anyway, I think an > implementation of such at this point would be a premature > optimization. > > chris So, Chris and I seem in general agreement that an OBDA v2 using SQLite but based on essentially the same approach as the BDB or flat file based OBDA v1 is a good idea. i.e. Tables mapping record identifiers to file offsets in the original sequence files. I hope to get BioRuby on board, they already have an OBDA v1 support so that shouldn't be too hard. Right now I don't recall if BioJava has/had OBDA v1 support, and if they did if it was affected in their recent move to BioJava v3 (I understand from their mailing list that some older lower priority functionality has not all been ported yet). Also EMBOSS are likely to be interested, certainly Peter Rice was interested in the SQLite indexing we're already using in Biopython for sequence files (i.e. what is effectively the prototype for OBDA v2). Note that in addition to simple indexing of text files, we are already using the same simple offset + length approach for indexing binary files (e.g. SFF). 
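The offset + length idea described above can be sketched in a few lines of core Perl. This is only an illustration of the approach: it is neither the OBDA specification nor the Biopython SQLite schema, an in-memory hash stands in for the index table, FASTA is the only format handled, and the helper names `index_fasta` and `fetch_record` are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Scan a FASTA filehandle once and record, for each record ID,
# the byte offset where the record starts and its length on disk.
# Returns a hash reference: { id => [ offset, length ] }.
sub index_fasta {
    my ($fh) = @_;
    my (%index, $id, $start);
    while (1) {
        my $pos  = tell $fh;    # offset of the line about to be read
        my $line = <$fh>;
        my $next_id;
        if (defined $line && $line =~ /^>(\S+)/) {
            $next_id = $1;
        }
        if (!defined $line || defined $next_id) {
            # End of file or a new header: close the previous record.
            $index{$id} = [ $start, $pos - $start ] if defined $id;
            last unless defined $line;
            ($id, $start) = ($next_id, $pos);
        }
    }
    return \%index;
}

# Fetch the raw text of one record using only the index;
# the original file is left untouched and is not re-parsed.
sub fetch_record {
    my ($fh, $index, $id) = @_;
    my $entry = $index->{$id} or return;
    my ($offset, $length) = @$entry;
    seek $fh, $offset, 0 or die "seek failed: $!";
    my $got = read $fh, my $raw, $length;
    die "short read for $id" unless defined $got && $got == $length;
    return $raw;
}

# Demo on an in-memory file; a real index would be built from a file on disk.
my $fasta = ">seq1 first\nACGT\nACGT\n>seq2\nGGCC\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory file: $!";
my $index = index_fasta($fh);
print fetch_record($fh, $index, 'seq2');
```

The raw record returned by `fetch_record` would then be handed to the existing parser for that format, which is what keeps the index itself format neutral.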
On the immediate practical side, I think I can edit the current OBDA website of http://obda.open-bio.org/ via /home/websites/obda.open-bio.org/html on the server. We need to work out where the current OBDA indexing specification lives (CVS or SVN?) and perhaps move that to GitHub. We may need a general OBF organisation account on GitHub for this and any other cross-project repositories. I see there is already an OBDA project on RedMine (Chris, can you add me to that please?): https://redmine.open-bio.org/projects/obda Peter From p.j.a.cock at googlemail.com Sun Nov 13 12:30:37 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 13 Nov 2011 12:30:37 +0000 Subject: [Bioperl-l] OBDA redux? Compressed files Message-ID: Hi again, I've retitled this as it is a little off topic from the main OBDA redux thread, http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000819.html http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000820.html http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000821.html As far as I recall, the original flat file and BDB based OBDA specification for indexing sequencing files didn't cover compressed files. That might be something to consider (although we should sort out uncompressed text/binary files first). I've recently been experimenting with using compressed files - in particular simple GZIP files (ignoring any block structure) and BGZF (the specialised gzipped blocking used in BAM), see: http://blastedbio.blogspot.com/2011/11/bgzf-blocked-bigger-better-gzip.html http://seqanswers.com/forums/showthread.php?t=15347 The virtual offset approach used in BGZF squeezes a 16 bit within-block offset (thus limiting you to 64KB blocks) and a 48 bit block start offset (thus limiting you to a 256TB file) into a single 64 bit "virtual" offset.
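As a small worked example of that packing, here is a Perl sketch that puts the block start in the high 48 bits and the within-block offset in the low 16 bits. It assumes a Perl built with 64 bit integers, and the function names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pack a block start (48 bits) and a within-block offset (16 bits)
# into one 64 bit virtual offset. Assumes 64 bit integer support.
sub make_virtual_offset {
    my ($block_start, $within_block) = @_;
    die "within-block offset must be below 2**16\n" if $within_block >= 2**16;
    die "block start must be below 2**48\n"         if $block_start >= 2**48;
    return ($block_start << 16) | $within_block;
}

# Recover the two components from a virtual offset.
sub split_virtual_offset {
    my ($virtual) = @_;
    return ($virtual >> 16, $virtual & 0xFFFF);
}

my $virtual = make_virtual_offset(100_000, 10);
my ($block_start, $within_block) = split_virtual_offset($virtual);
print "virtual=$virtual block_start=$block_start within_block=$within_block\n";
```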
That makes sense if you are keeping the lookup table of many offsets in memory, and can be used as is with code expecting a single offset (like the current Biopython SQLite index schema). There is also bzip2, which is block based too, with the block size ranging from 100KB to 900KB. http://bzip.org/ http://bzip.org/1.0.5/bzip2-manual-1.0.5.html I haven't tried any performance tests yet, which would be interesting as I believe compression/decompression of bzip2 is more costly in CPU terms than gzip (although both will be block size dependent). If we wanted to imitate the BGZF virtual offset scheme for arbitrary BZIP2 files, an alternative 64 bit virtual offset scheme could use 20 bits to cover bz2 blocks of up to 900KB, leaving 64 - 20 = 44 bits for the start offset, thus limiting you to just 2^44 bytes or 16TB, which sounds OK only in the medium term. On the bright side this could be used to index any BZIP2 file (under 16TB), whereas BGZF cannot be applied to an arbitrary GZIP file. On the other hand, storing the block start and within-block offset separately is truly generic and could be used on any blocked GZIP file (including BGZF) and BZIP2 etc. It would make the SQLite schema a bit more complicated though. Maybe something to consider for the next revision to OBDA, and focus on the non-compressed case for now? Regards, Peter From p.j.a.cock at googlemail.com Sun Nov 13 12:32:12 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 13 Nov 2011 12:32:12 +0000 Subject: [Bioperl-l] OBDA redux?
Compressed files In-Reply-To: References: Message-ID: On Sun, Nov 13, 2011 at 12:30 PM, Peter Cock wrote: > Hi again, > > I've retitled this as it is a little off topic from the main OBDA redux thread, > http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000819.html > http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000820.html > http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000821.html > > As far as I recall, the original flat file and BDB based OBDA > specification for indexing sequencing files didn't cover > compressed files. That might be something to consider > (although we should sort of uncompressed text/binary > files first). Sorry - didn't meant to include bioperl-l on that, although it may be of interest to you guys anyway. Peter From jluis.lavin at unavarra.es Mon Nov 14 11:14:43 2011 From: jluis.lavin at unavarra.es (=?ISO-8859-1?Q?Jos=E9_Luis_Lav=EDn?=) Date: Mon, 14 Nov 2011 12:14:43 +0100 Subject: [Bioperl-l] Fwd: How to get Remote BLAST results in a single out In-Reply-To: References: Message-ID: Hello everybody, I've been using "Bio::Tools::Run::RemoteBlast" for a time and it has worked fine for me. Now I need to perform a multiple BLAST search, but this time I'd just like to get all the BLAST results in a single out file instead of having each sequence's report written individually. I've read the documentation of the module, but due to my short experience/understanding on complex modules as this one seems to be I can't figure out where to change the script to achieve my previously mentioned aim. Here I post the script I've been using (it's basically the one posted on the module cookbook). 
#!/c:/Perl -w
use Bio::Tools::Run::RemoteBlast;
use Bio::SearchIO;
use Data::Dumper;

#Here i set the parameters for blast
print "Enter your BLAST choice (blastn, blastp, blastx, tblastn, tblastx):\n";
chomp(my $blst = <STDIN>);
my $prog = "$blst";
print "Enter a database to search (nr, refseq_protein, swissprot, pat, pdb, env_nr):\n";
chomp(my $dtb = <STDIN>);
my $db = "$dtb";
print "Enter your cut off score (1e-n):\n";
chomp(my $cut = <STDIN>);
my $e_val = "$cut";

my @params = ( '-prog'       => $prog,
               '-data'       => $db,
               '-expect'     => $e_val,
               '-readmethod' => 'SearchIO' );

my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params);

#Select the file and make the blast.
print "Enter your FASTA file:\n";
chomp(my $infile = <STDIN>);
my $r = $remoteBlast->submit_blast($infile);
my $v = 1;

print STDERR "waiting..." if( $v > 0 );
######## WAIT FOR THE RESULTS TO RETURN!!!!!
while ( my @rids = $remoteBlast->each_rid ) {
    foreach my $rid ( @rids ) {
        my $rc = $remoteBlast->retrieve_blast($rid);
        if( !ref($rc) ) {
            if( $rc < 0 ) {
                $remoteBlast->remove_rid($rid);
            }
            print STDERR "." if ( $v > 0 );
            sleep 5;
        } else {
            my $result = $rc->next_result();
            #save the output
            my $filename = $result->query_name()."\.out";##################open SALIDA, '>>'."$^T"."Report"."\.out";
            $remoteBlast->save_output($filename);#############
            $remoteBlast->remove_rid($rid);
            print "\nQuery Name: ", $result->query_name(), "\n";
            while ( my $hit = $result->next_hit ) {
                next unless ( $v > 0);
                print "\thit name is ", $hit->name, "\n";
                while( my $hsp = $hit->next_hsp ) {
                    print "\t\tscore is ", $hsp->score, "\n";
                }
            }
        }
    }
}

Could any of you please explain to me how to solve my question?

Thanks in advance

With best wishes

--
--
Dr. José Luis Lavín Trueba
Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN

--
--
Dr. José Luis Lavín Trueba
Dpto.
de Producci?n Agraria Grupo de Gen?tica y Microbiolog?a Universidad P?blica de Navarra 31006 Pamplona Navarra SPAIN From jason.stajich at gmail.com Mon Nov 14 11:59:56 2011 From: jason.stajich at gmail.com (Jason Stajich) Date: Mon, 14 Nov 2011 06:59:56 -0500 Subject: [Bioperl-l] Fwd: How to get Remote BLAST results in a single out In-Reply-To: References: Message-ID: if you want to do a bunch of BLASTs remotely on the cmdline you should also just use the NCBI's blastcl3 tool (not sure if there is a BLAST+ equivalent). This might be faster to do and easier since you need to learn the programming part too. If you want to do this within this code I guess the question is what format you want the data in - a BLAST report or something more like a table? On Nov 14, 2011, at 6:14 AM, Jos? Luis Lav?n wrote: > Hello everybody, > > I've been using "Bio::Tools::Run::RemoteBlast" for a time and it has > worked fine for me. Now I need to perform a multiple BLAST search, but this > time I'd just like to get all the BLAST results in a single out file > instead of having each sequence's report written individually. I've read > the documentation of the module, but due to my short > experience/understanding on complex modules as this one seems to be I can't > figure out where to change the script to achieve my previously mentioned > aim. > Here I post the script I've been using (it's basically the one posted on > the module cookbook). 
> > #!/c:/Perl -w > use Bio::Tools::Run::RemoteBlast; > use Bio::SearchIO; > use Data::Dumper; > > #Here i set the parameters for blast > print "Enter your BLAST choice (blastn, blastp, blastx, tblastn, > tblastx):\n"; > my $blst = ; > my $prog = "$blst"; > print "Enter a database to search (nr, refseq_protein, swissprot, pat, pdb, > env_nr):\n"; > my $dtb = ; > $db = "$dtb"; > print "Enter your cutt off score (1e-n):\n"; > my $cut = ; > my $e_val = "$cut"; > > my @params = ( '-prog' => $prog, > '-data' => $db, > '-expect' => $e_val, > '-readmethod' => 'SearchIO' ); > > my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params); > > > #Select the file and make the blast. > print "Enter your FASTA file:\n"; > chomp(my $infile = ); > my $r = $remoteBlast->submit_blast($infile); > my $v = 1; > > print STDERR "waiting..." if( $v > 0 ); ######## WAIT FOR THE RESULTS > TO RETURN!!!!! > while ( my @rids = $remoteBlast->each_rid ) { > foreach my $rid ( @rids ) { > my $rc = $remoteBlast->retrieve_blast($rid); > if( !ref($rc) ) { > if( $rc < 0 ) { > $remoteBlast->remove_rid($rid); > } > print STDERR "." if ( $v > 0 ); > sleep 5; > } else { > my $result = $rc->next_result(); > #save the output > my $filename = > $result->query_name()."\.out";##################open SALIDA, > '>>'."$^T"."Report"."\.out"; > $remoteBlast->save_output($filename);############# > $remoteBlast->remove_rid($rid); > print "\nQuery Name: ", $result->query_name(), "\n"; > while ( my $hit = $result->next_hit ) { > next unless ( $v > 0); > print "\thit name is ", $hit->name, "\n"; > while( my $hsp = $hit->next_hsp ) { > print "\t\tscore is ", $hsp->score, "\n"; > } > } > } > } > } > > > May any of you please explain me how to solve my question? > > Thanks in advence > > With best wishes > > -- > -- > Dr. Jos? Luis Lav?n Trueba > > Dpto. de Producci?n Agraria > Grupo de Gen?tica y Microbiolog?a > Universidad P?blica de Navarra > 31006 Pamplona > Navarra > SPAIN > > > > -- > -- > Dr. Jos? 
Luis Lav?n Trueba > > Dpto. de Producci?n Agraria > Grupo de Gen?tica y Microbiolog?a > Universidad P?blica de Navarra > 31006 Pamplona > Navarra > SPAIN > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at gmail.com Mon Nov 14 14:07:36 2011 From: jason.stajich at gmail.com (Jason Stajich) Date: Mon, 14 Nov 2011 09:07:36 -0500 Subject: [Bioperl-l] Fwd: Fwd: How to get Remote BLAST results in a single out References: Message-ID: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com> Please keep this on list discussions Sent from my iPhone-please excuse typos -- Jason Stajich Begin forwarded message: > From: Jos? Luis Lav?n > Date: November 14, 2011 8:04:25 AM EST > To: Jason Stajich > Subject: Re: [Bioperl-l] Fwd: How to get Remote BLAST results in a single out > > Hello Jason, > > As answering your question: > > " If you want to do this within this code I guess the question is what format you want the data in - a BLAST report or something more like a table?" > > A concatenation of BLAST (default format) reports should be OK, since I have a script to parse that kind of results. Anyway formats 1 or 2 will also do the trick. > I'll be happy to get assistance on how to change the OUTFILE from "a query a report" to "all queries in the same report", because I don't seem to be able to do it myself after reading the module documentation. > > Thanks in advance > > El 14 de noviembre de 2011 12:59, Jason Stajich escribi?: > if you want to do a bunch of BLASTs remotely on the cmdline you should also just use the NCBI's blastcl3 tool (not sure if there is a BLAST+ equivalent). This might be faster to do and easier since you need to learn the programming part too. > > If you want to do this within this code I guess the question is what format you want the data in - a BLAST report or something more like a table? > > On Nov 14, 2011, at 6:14 AM, Jos? 
Luis Lav?n wrote: > > > Hello everybody, > > > > I've been using "Bio::Tools::Run::RemoteBlast" for a time and it has > > worked fine for me. Now I need to perform a multiple BLAST search, but this > > time I'd just like to get all the BLAST results in a single out file > > instead of having each sequence's report written individually. I've read > > the documentation of the module, but due to my short > > experience/understanding on complex modules as this one seems to be I can't > > figure out where to change the script to achieve my previously mentioned > > aim. > > Here I post the script I've been using (it's basically the one posted on > > the module cookbook). > > > > #!/c:/Perl -w > > use Bio::Tools::Run::RemoteBlast; > > use Bio::SearchIO; > > use Data::Dumper; > > > > #Here i set the parameters for blast > > print "Enter your BLAST choice (blastn, blastp, blastx, tblastn, > > tblastx):\n"; > > my $blst = ; > > my $prog = "$blst"; > > print "Enter a database to search (nr, refseq_protein, swissprot, pat, pdb, > > env_nr):\n"; > > my $dtb = ; > > $db = "$dtb"; > > print "Enter your cutt off score (1e-n):\n"; > > my $cut = ; > > my $e_val = "$cut"; > > > > my @params = ( '-prog' => $prog, > > '-data' => $db, > > '-expect' => $e_val, > > '-readmethod' => 'SearchIO' ); > > > > my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params); > > > > > > #Select the file and make the blast. > > print "Enter your FASTA file:\n"; > > chomp(my $infile = ); > > my $r = $remoteBlast->submit_blast($infile); > > my $v = 1; > > > > print STDERR "waiting..." if( $v > 0 ); ######## WAIT FOR THE RESULTS > > TO RETURN!!!!! > > while ( my @rids = $remoteBlast->each_rid ) { > > foreach my $rid ( @rids ) { > > my $rc = $remoteBlast->retrieve_blast($rid); > > if( !ref($rc) ) { > > if( $rc < 0 ) { > > $remoteBlast->remove_rid($rid); > > } > > print STDERR "." 
if ( $v > 0 ); > > sleep 5; > > } else { > > my $result = $rc->next_result(); > > #save the output > > my $filename = > > $result->query_name()."\.out";##################open SALIDA, > > '>>'."$^T"."Report"."\.out"; > > $remoteBlast->save_output($filename);############# > > $remoteBlast->remove_rid($rid); > > print "\nQuery Name: ", $result->query_name(), "\n"; > > while ( my $hit = $result->next_hit ) { > > next unless ( $v > 0); > > print "\thit name is ", $hit->name, "\n"; > > while( my $hsp = $hit->next_hsp ) { > > print "\t\tscore is ", $hsp->score, "\n"; > > } > > } > > } > > } > > } > > > > > > May any of you please explain me how to solve my question? > > > > Thanks in advence > > > > With best wishes > > > > -- > > -- > > Dr. Jos? Luis Lav?n Trueba > > > > Dpto. de Producci?n Agraria > > Grupo de Gen?tica y Microbiolog?a > > Universidad P?blica de Navarra > > 31006 Pamplona > > Navarra > > SPAIN > > > > > > > > -- > > -- > > Dr. Jos? Luis Lav?n Trueba > > > > Dpto. de Producci?n Agraria > > Grupo de Gen?tica y Microbiolog?a > > Universidad P?blica de Navarra > > 31006 Pamplona > > Navarra > > SPAIN > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > -- > Dr. Jos? Luis Lav?n Trueba > > Dpto. 
de Producci?n Agraria > Grupo de Gen?tica y Microbiolog?a > Universidad P?blica de Navarra > 31006 Pamplona > Navarra > SPAIN From cl134 at duke.edu Sun Nov 13 14:42:05 2011 From: cl134 at duke.edu (Cheng-Ruei Lee) Date: Sun, 13 Nov 2011 09:42:05 -0500 Subject: [Bioperl-l] Suggestion for Bio::PopGen::Statistics Message-ID: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu> Hi all, Bioperl version: 1.006 Here are two error messages when I'm using this module to calculate Fu & Li's statistics: Illegal division by zero at (the Statistics.pm file) line 359 Illegal division by zero at (the Statistics.pm file) line 376 A further tracking down shows that the first error happens when $n (sample size in the ingroup) equals 1 or 2, and the second error happens when $n equals 3. This is not really a bug though. I would suggest either in the original code, do a checking before the calculation (and skip the current calculation when $n == 1, 2, or 3 - rather than let the whole program die), or add a few lines of notes in the CPAN page. Sincerely, Cheng-Ruei Lee From joluito at gmail.com Mon Nov 14 09:21:31 2011 From: joluito at gmail.com (=?ISO-8859-1?Q?Jos=E9_Luis_Lav=EDn?=) Date: Mon, 14 Nov 2011 10:21:31 +0100 Subject: [Bioperl-l] How to get Remote BLAST results in a single out Message-ID: Hello everybody, I've been using "Bio::Tools::Run::RemoteBlast" for a time and it has worked fine for me. Now I need to perform a multiple BLAST search, but this time I'd just like to get all the BLAST results in a single out file instead of having each sequence's report written individually. I've read the documentation of the module, but due to my short experience/understanding on complex modules as this one seems to be I can't figure out where to change the script to achieve my previously mentioned aim. Here I post the script I've been using (it's basically the one posted on the module cookbook). 
#!/c:/Perl -w
use Bio::Tools::Run::RemoteBlast;
use Bio::SearchIO;
use Data::Dumper;

# Here I set the parameters for blast
print "Enter your BLAST choice (blastn, blastp, blastx, tblastn, tblastx):\n";
chomp(my $blst = <STDIN>);
my $prog = "$blst";
print "Enter a database to search (nr, refseq_protein, swissprot, pat, pdb, env_nr):\n";
chomp(my $dtb = <STDIN>);
my $db = "$dtb";
print "Enter your cutoff score (1e-n):\n";
chomp(my $cut = <STDIN>);
my $e_val = "$cut";

my @params = ( '-prog'       => $prog,
               '-data'       => $db,
               '-expect'     => $e_val,
               '-readmethod' => 'SearchIO' );

my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params);

# Select the file and run the blast.
print "Enter your FASTA file:\n";
chomp(my $infile = <STDIN>);
my $r = $remoteBlast->submit_blast($infile);
my $v = 1;

print STDERR "waiting..." if ( $v > 0 );
# Wait for the results to return
while ( my @rids = $remoteBlast->each_rid ) {
    foreach my $rid ( @rids ) {
        my $rc = $remoteBlast->retrieve_blast($rid);
        if ( !ref($rc) ) {
            if ( $rc < 0 ) {
                $remoteBlast->remove_rid($rid);
            }
            print STDERR "." if ( $v > 0 );
            sleep 5;
        } else {
            my $result = $rc->next_result();
            # save the output
            my $filename = $result->query_name()."\.out";
            # open SALIDA, '>>'."$^T"."Report"."\.out";
            $remoteBlast->save_output($filename);
            $remoteBlast->remove_rid($rid);
            print "\nQuery Name: ", $result->query_name(), "\n";
            while ( my $hit = $result->next_hit ) {
                next unless ( $v > 0 );
                print "\thit name is ", $hit->name, "\n";
                while ( my $hsp = $hit->next_hsp ) {
                    print "\t\tscore is ", $hsp->score, "\n";
                }
            }
        }
    }
}

Could any of you please explain how to solve my problem?

Thanks in advance

With best wishes

--
Dr. José Luis Lavín Trueba

From cjfields at illinois.edu Mon Nov 14 17:02:22 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 14 Nov 2011 17:02:22 +0000
Subject: [Bioperl-l] How to get Remote BLAST results in a single out
In-Reply-To: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com>
References: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com>
Message-ID: <5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>

Re: a BLAST+ equivalent for blastcl3, I believe there is an option for the various 'blast*' programs indicating that the search is to use a remote database. I haven't used it, though...

chris

From cjfields at illinois.edu Mon Nov 14 17:03:04 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 14 Nov 2011 17:03:04 +0000
Subject: [Bioperl-l] Suggestion for Bio::PopGen::Statistics
In-Reply-To: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu>
References: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu>
Message-ID:

Cheng,

Have you tried the latest CPAN release? (We're at 1.006901.)

chris

From cjfields at illinois.edu Mon Nov 14 17:59:35 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 14 Nov 2011 17:59:35 +0000
Subject: [Bioperl-l] OBDA redux?
In-Reply-To:
References:
Message-ID: <12E3B71D-6E61-41AD-A956-A1FC2076AF24@illinois.edu>

On Nov 13, 2011, at 6:24 AM, Peter Cock wrote:

> So, Chris and I seem in general agreement that an OBDA v2
> using SQLite but based on essentially the same approach as
> the BDB or flat file based OBDA v1 is a good idea. i.e. Tables
> mapping record identifiers to file offsets in the original sequence
> files.

The worry I have is adhering to a specific backend (e.g. SQLite). The reason I say this is that BDB in its time was the go-to way of storing simple index data, but that is no longer feasible for very large data sets. Who's to say something similar won't happen to SQLite, or that it is the best option available?

Maybe we should focus on the data storage schema, as simple as it may be, then indicate that the default backend must be SQLite but others are allowed (maybe with a mention that SQLite can be replaced by alternatives in the future if needed).

> I hope to get BioRuby on board, they already have an OBDA
> v1 support so that shouldn't be too hard.
>
> Right now I don't recall if BioJava has/had OBDA v1 support,
> and if they did if it was affected in their recent move to BioJava
> v3 (I understand from their mailing list that some older lower
> priority functionality has not all been ported yet).

I wouldn't be surprised at that. OBDA kind of lingered for a while, and I'm not sure how widely adopted it became (maybe others can shed light on that?)

> Also EMBOSS are likely to be interested, certainly Peter Rice
> was interested in the SQLite indexing we're already using in
> Biopython for sequence files (i.e. what is effectively the
> prototype for OBDA v2).
>
> Note that in addition to simple indexing of text files, we are
> already using the same simple offset + length approach for
> indexing binary files (e.g. SFF).

I think that's the general idea; that is how all BioPerl data was indexed before, with the Bio::Index modules and with the OBDA implementations as well.
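
The "record identifier to offset + length" idea discussed here can be sketched in a few lines of core Perl. This is only an illustration, not the OBDA specification and not how Bio::Index is implemented: the helper names build_fasta_index and fetch_record are hypothetical, and the index is kept in a plain hash where an OBDA v2 index would presumably use an SQLite table.

```perl
use strict;
use warnings;

# Hypothetical helper: scan a FASTA file once and record, for every
# record ID, the byte offset where the record starts and its length.
sub build_fasta_index {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my (%index, $current_id);
    my $offset = 0;
    while (my $line = <$fh>) {
        if ($line =~ /^>(\S+)/) {
            # A header line starts a new record at the current offset.
            $current_id = $1;
            $index{$current_id} = { offset => $offset, length => 0 };
        }
        # Every line (header included) counts towards the record length.
        $index{$current_id}{length} += length $line if defined $current_id;
        $offset += length $line;
    }
    close $fh;
    return \%index;
}

# Hypothetical helper: fetch one raw record using only offset + length,
# without re-reading the rest of the file.
sub fetch_record {
    my ($path, $index, $id) = @_;
    my $entry = $index->{$id} or return;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    seek $fh, $entry->{offset}, 0 or die "Cannot seek in $path: $!";
    my $read = read $fh, my $record, $entry->{length};
    close $fh;
    die "Short read for $id" unless defined $read && $read == $entry->{length};
    return $record;
}
```

With an index built once by build_fasta_index('seqs.fa') (the file name is a placeholder), fetch_record('seqs.fa', $index, 'seq2') returns the raw text of that record. Swapping the hash for a database table would change the storage, not the lookup logic, which is the backend-independence point made above.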
> On the immediate practical side, I think I can edit the
> current OBDA website of http://obda.open-bio.org/
> via /home/websites/obda.open-bio.org/html on the
> server.

See below with regard to my thoughts on the wiki.

> We need to work out where the current OBDA indexing
> specification lives (CVS or SVN?) and perhaps move
> that to github. We may need a general OBF organisation
> account on git hub for this and any other cross-project
> repositories.

+1 to a move to GitHub, but maybe this belongs in an OBF-specific organization. And maybe we should take advantage of the simple wiki or project homepage that GitHub offers and move everything (docs) there.

> I see there is already an OBDA project on RedMine,
> (Chris can you add me to that please?)
> https://redmine.open-bio.org/projects/obda
>
> Peter

Done (last night actually, but I didn't have time to respond immediately).

chris

From David.Messina at sbc.su.se Mon Nov 14 19:31:18 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 14 Nov 2011 20:31:18 +0100
Subject: [Bioperl-l] How to get Remote BLAST results in a single out
In-Reply-To: <5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>
References: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com> <5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>
Message-ID: <29C56604-BBEE-4D80-9662-7C3627907200@sbc.su.se>

> Re: a BLAST+ equivalent for blastcl3, I believe there is an option for the various 'blast*' indicating the search is to use a remote database. I haven't used it, though...

Yes, it's the -remote option. I've used it, and it works great. The speed is throttled by NCBI, however, so for an appreciable number of queries the standard advice applies: run the search on your own computers.

Dave

From jluis.lavin at unavarra.es Mon Nov 14 21:23:31 2011
From: jluis.lavin at unavarra.es (José Luis Lavín)
Date: Mon, 14 Nov 2011 22:23:31 +0100
Subject: [Bioperl-l] How to get Remote BLAST results in a single out
In-Reply-To: <5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>
References: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com> <5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>
Message-ID:

Thank you very much for your answers, but judging from them, I'm afraid I didn't explain myself well enough. I'm not looking for another tool to perform a BLAST task. I was just wondering whether there is a way to change how the module writes its output, so that I get multiple searches in a single report file instead of one report per BLAST search.

Maybe there is some issue I'm unaware of that makes you recommend other tools instead of the BioPerl Remote BLAST module. I would appreciate it if you let me know about it (NCBI server problems with web services, or the like)...

Thank you for your answers anyway

Best wishes
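
One way to end up with a single report file, offered as a sketch under assumptions rather than as the module's documented approach: keep calling save_output (the method the posted script already uses) so that each report goes to a temporary file, then append that file to one combined report. The helper name append_report and the file names below are hypothetical, and only core Perl is used.

```perl
use strict;
use warnings;

# Hypothetical helper: append the contents of one report file to a
# combined report, creating the combined file if it does not exist yet.
sub append_report {
    my ($report_path, $combined_path) = @_;
    open my $in,  '<',  $report_path   or die "Cannot read $report_path: $!";
    open my $out, '>>', $combined_path or die "Cannot append to $combined_path: $!";
    while (my $line = <$in>) {
        print {$out} $line;
    }
    close $in;
    close $out or die "Cannot close $combined_path: $!";
    return;
}

# Inside the retrieval loop of the posted script this might be used as
# follows (not tested against NCBI; $remoteBlast and $rid come from that
# script, and both file names are placeholders):
#
#   my $tmp = "blast_tmp_$$.out";
#   $remoteBlast->save_output($tmp);
#   append_report($tmp, 'all_reports.out');
#   unlink $tmp;
#   $remoteBlast->remove_rid($rid);
```

Because the combined file is opened in append mode, it should be deleted or renamed before a new run; otherwise reports from earlier runs stay in it.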
From jason.stajich at gmail.com Tue Nov 15 03:53:19 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Mon, 14 Nov 2011 22:53:19 -0500
Subject: [Bioperl-l] Suggestion for Bio::PopGen::Statistics
In-Reply-To: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu>
References: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu>
Message-ID: <0A6DF9A2-F34F-4277-8E84-C3E5351BB3FF@gmail.com>

Sure. As you say, the implementation presumed that this method would be called with more than 3 individuals, which is a shortcoming. I have committed the code fix, but still need someone to add a comment to the perldoc. I've made it a Redmine bug:

https://redmine.open-bio.org/issues/3313

Jason

Can you provide a test script and we'll add a test for this so
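
The kind of check suggested in this thread could look like the sketch below. It is not the fix that was committed (that code is not shown in this thread); the wrapper name safe_statistic is hypothetical, and the minimum of 4 individuals is an assumption drawn from the report that the errors occur for sample sizes 1 to 3.

```perl
use strict;
use warnings;

# Assumption from the thread: the division by zero occurs when the
# ingroup holds 3 or fewer individuals, so require at least 4.
use constant MIN_SAMPLE_SIZE => 4;

# Hypothetical wrapper: run a statistic (passed as a code reference) only
# when the sample is large enough; otherwise warn and return nothing
# (undef in scalar context) instead of letting the whole program die.
sub safe_statistic {
    my ($calc, $individuals) = @_;
    my $n = scalar @{$individuals};
    if ($n < MIN_SAMPLE_SIZE) {
        warn "Skipping calculation: sample size $n is below "
           . MIN_SAMPLE_SIZE . "\n";
        return;
    }
    return $calc->($individuals);
}
```

The code reference could be, for example, a closure around one of the Fu & Li methods of Bio::PopGen::Statistics, so that a caller looping over many populations can skip the small ones and carry on.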
From cchehoud at gmail.com Tue Nov 15 01:39:32 2011
From: cchehoud at gmail.com (Christel Chehoud)
Date: Mon, 14 Nov 2011 17:39:32 -0800
Subject: [Bioperl-l] Bioperl installation help
Message-ID:

Dear BioPerl,

Thank you for creating such useful code. Unfortunately, every time I try to install BioPerl it takes me a long time and is a challenging ordeal :(

I am a new Mac user and was not able to install BioPerl using CPAN. Here is the error I am getting:

ERROR: Can't create '/usr/local/bin'
Do not have write permissions on '/usr/local/bin'
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 at /Users/Christel/.cpan/build/BioPerl-1.6.0-DiZ0rN/Bio/Root/Build.pm line 902
  CJFIELDS/BioPerl-1.6.0.tar.gz
  ./Build install -- NOT OK
----
  You may have to su to root to install the package
  (Or you may want to run something like
    o conf make_install_make_command 'sudo make'
  to raise your permissions.
Warning (usually harmless): 'YAML' not installed, will not store persistent state
Failed during this command:
 CMUNGALL/Data-Stag-0.11.tar.gz : make NO
 CJFIELDS/BioPerl-1.6.0.tar.gz : make_test FAILED but failure ignored because 'force' in effect

So I did:

cpan> o conf make_install_make_command 'sudo make'

followed by

cpan> o conf commit

and started over. I got the same number of errors as last time (so I decided not to force the install this time). Do you have any suggestions?

63 tests and 305 subtests skipped.
Failed 11/329 test scripts. 981/17708 subtests failed.
Files=329, Tests=17708, 127 wallclock secs (104.69 cusr + 12.51 csys = 117.20 CPU)
Failed 11/329 test programs. 981/17708 subtests failed.
  CJFIELDS/BioPerl-1.6.1.tar.gz
  ./Build test -- NOT OK
//hint// to see the cpan-testers results for installing this module, try:
  reports CJFIELDS/BioPerl-1.6.1.tar.gz
Warning (usually harmless): 'YAML' not installed, will not store persistent state
Running Build install
  make test had returned bad status, won't install without force
Failed during this command:
 CMUNGALL/Data-Stag-0.11.tar.gz : make NO
 FLORA/ExtUtils-Manifest-1.60.tar.gz : make NO
 CJFIELDS/BioPerl-1.6.1.tar.gz : make_test NO

Thanks a lot for your time and help. I appreciate it.

Thank you,
Christel

From casaburi at ceinge.unina.it Tue Nov 15 09:25:25 2011
From: casaburi at ceinge.unina.it (Giorgio C)
Date: Tue, 15 Nov 2011 01:25:25 -0800 (PST)
Subject: [Bioperl-l] Blast > parsing result in Exel
Message-ID: <32846407.post@talk.nabble.com>

Hi everybody,

I have a BLAST (-m 1) result file that looks like this:

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Query= 132-291
  (59 letters)

Database: Scrivania/orchidea/mature_mirBase.fa
  21,643 sequences; 470,608 total letters

Searching..................................................done

                                                    Score  E
Sequences producing significant alignments:         (bits) Value

mtr-miR2644b MIMAT0013413 Medicago truncatula miR2644b  28  0.031
mtr-miR2644a MIMAT0013412 Medicago truncatula miR2644a  28  0.031
gga-miR-1704 MIMAT0007596 Gallus gallus miR-1704  22  1.9
gga-miR-1557 MIMAT0007414 Gallus gallus miR-1557  22  1.9
mmu-miR-880-5p MIMAT0017266 Mus musculus miR-880-5p  22  1.9

132_0  8  cagccgctcagattgatggtgcctacagccttgccagcccgctcagattgat  59
12631  5  ..............  18
12630  5  ..............  18
7826  5  ...........  15
7644  19  ...........  9
5394  3  ...........  13
5394  3  ...........  13

BLASTN 2.2.21 [Jun-14-2009]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, ...
....
..........
______________________________________________________________

I need to parse this into an Excel sheet with the following columns:

1) ID  2) Name of the hit  3) E-value  4) Score  5) Species

For example:

1) 132-291  2) mir2644b  3) 0,031  4) 28  5) Medicago truncatula

Is it possible, from a big BLAST result file, to obtain an Excel sheet with 5 columns where every row holds the first hit of one BLAST result? Can anyone help me solve this problem, perhaps with a little Perl script?

Thank you very much

From nisa.dar10 at gmail.com Wed Nov 16 00:49:00 2011
From: nisa.dar10 at gmail.com (nisa.dar)
Date: Tue, 15 Nov 2011 16:49:00 -0800 (PST)
Subject: [Bioperl-l] print alignment from blast results file
Message-ID: <32851673.post@talk.nabble.com>

Hi,

I am parsing a BLAST results file. I have found BioPerl methods to get the query string, homology string and hit string for each hit/HSP. I want to print them together in the form of an alignment instead of handling each string individually. This is what I am doing, but it doesn't seem correct:

while (my $hsp = $hit->next_hsp) {
    my $start_query_num = $hsp->start('query');
    my $query_string = $hsp->query_string;
    my $end_query_num = $hsp->end('query');
    my $homology_string = $hsp->homology_string;
    my $start_hit_num = $hsp->start('hit');
    my $hit_string = $hsp->hit_string;
    my $end_hit_num = $hsp->end('hit');
    my $aln_o = $hsp->get_aln;
    $query_string =~ s/\n//g;   # get rid of new line characters
    $homology_string =~ s/\n//g;
    $hit_string =~ s/\n//g;
    print "

Alignment:


"; print "$start_query_num-$query_string-$end_query_num
"; print "         $homology_string
"; print "$start_hit_num-$hit_string-$end_hit_num

"; } Please let me know how can I print them in the form of an alignment as seen in the blast results file. Thanks -- View this message in context: http://old.nabble.com/print-alignment-from-blast-results-file-tp32851673p32851673.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From p.j.a.cock at googlemail.com Wed Nov 16 09:11:40 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 16 Nov 2011 09:11:40 +0000 Subject: [Bioperl-l] Blast > parsing result in Exel In-Reply-To: <32846407.post@talk.nabble.com> References: <32846407.post@talk.nabble.com> Message-ID: On Tue, Nov 15, 2011 at 9:25 AM, Giorgio C wrote: > > Hy everybody, > > in this situation froma blast (-m 1) result file : > > ... > > I need to parse in an exel sheet : > > 1)ID 2)Name of the hit 3)E-value 4)Score 5)Species > > > 1) 132-291 2)mir2644b 3) 0,031 4)28 5) Medicago truncatula > > Is possible from a big blast result file obtain an exel with 5 columns where > every field is the first hit of the blast result. Can anyone halp me to fix > this problem ??? Also with a little script in perl. > > Thank you very much Have you looked at any of the BioPerl BLAST parsing examples? e.g http://www.bioperl.org/wiki/HOWTO:Beginners#BLAST http://www.bioperl.org/wiki/HOWTO:SearchIO http://www.bioperl.org/wiki/Module:Bio::SearchIO See also http://seqanswers.com/forums/showthread.php?t=15489 Peter From bosborne11 at verizon.net Wed Nov 16 13:19:33 2011 From: bosborne11 at verizon.net (Brian Osborne) Date: Wed, 16 Nov 2011 08:19:33 -0500 Subject: [Bioperl-l] print alignment from blast results file In-Reply-To: <32851673.post@talk.nabble.com> References: <32851673.post@talk.nabble.com> Message-ID: <035404FE-A7D9-413A-B6EF-48564FE76426@verizon.net> Nisa, See: http://www.bioperl.org/wiki/HOWTO:SearchIO Brian O. On Nov 15, 2011, at 7:49 PM, nisa.dar wrote: > > Hi, > > I am parsing a blast results file. 
I have found bioperl modules to get query > string, homology string and hit string for each hit/hsp. I want to print > them in the form of an alignment instead of aligning them individually. > > this is what I am doing, but it doesn't seem correct > > while (my $hsp = $hit->next_hsp) { > my > $start_query_num=$hsp->start('query'); > my $query_string=$hsp->query_string; > my $end_query_num=$hsp->end('query'); > my $homology_string=$hsp->homology_string; > my $start_hit_num=$hsp->start('hit'); > my $hit_string=$hsp->hit_string; > my $end_hit_num=$hsp->end('hit'); > my $aln_o = $hsp->get_aln; > $query_string=~s/\n//g;#get rid of new line characters > $homology_string=~s/\n//g; > $hit_string=~s/\n//g; > > print "

Alignment:


"; > print "$start_query_num-$query_string-$end_query_num
"; > print " >         $homology_string
"; > print "$start_hit_num-$hit_string-$end_hit_num

"; > > > > } > > Please let me know how can I print them in the form of an alignment as seen > in the blast results file. > > Thanks > > > -- > View this message in context: http://old.nabble.com/print-alignment-from-blast-results-file-tp32851673p32851673.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Nov 16 16:44:27 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 16 Nov 2011 16:44:27 +0000 Subject: [Bioperl-l] Bioperl installation help In-Reply-To: References: Message-ID: <72481F31-3ADB-4E3D-9DBC-714FBEC730E4@illinois.edu> For some reason you are trying to install an older version of BioPerl; try installing Bio::Perl (or one of the core modules). This should automatically install the latest version from CPAN. My guess is this will address some of the issues. However, w/o actually seeing what tests failed we can't help. Also, if you are only interested in running local jobs, install BioPerl locally, or just grab the dist and add it to PERL5LIB. There are instructions in the installation docs for that. You can also use cpanm (cpanminus) to install things locally as well, it's highly recommended and much easier than cpan. chris On Nov 14, 2011, at 7:39 PM, Christel Chehoud wrote: > Dear BioPerl, > Thank you for creating such useful code. Unfortunately, every time I > try to install Bioperl, it takes me a long time and is a challenging > ordeal :( I am a new MAC user and was not able to download bioperl > using CPAN. Here is the error I am getting: > > ERROR: Can't create '/usr/local/bin' > Do not have write permissions on '/usr/local/bin' > !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
> at /Users/Christel/.cpan/build/BioPerl-1.6.0-DiZ0rN/Bio/Root/Build.pm line 902 > CJFIELDS/BioPerl-1.6.0.tar.gz > ./Build install -- NOT OK > ---- > You may have to su to root to install the package > (Or you may want to run something like > o conf make_install_make_command 'sudo make' > to raise your permissions.Warning (usually harmless): 'YAML' not > installed, will not store persistent state > Failed during this command: > CMUNGALL/Data-Stag-0.11.tar.gz : make NO > CJFIELDS/BioPerl-1.6.0.tar.gz : make_test FAILED but > failure ignored because 'force' in effect > > > so I did: > cpan> o conf make_install_make_command 'sudo make' > followed by > cpan> o conf commit > > and started over..I got the same number of errors as last time (so I > decided not to force install this time). do you have any suggestions: > > 63 tests and 305 subtests skipped. > Failed 11/329 test scripts. 981/17708 subtests failed. > Files=329, Tests=17708, 127 wallclock secs (104.69 cusr + 12.51 csys = > 117.20 CPU) > Failed 11/329 test programs. 981/17708 subtests failed. > CJFIELDS/BioPerl-1.6.1.tar.gz > ./Build test -- NOT OK > //hint// to see the cpan-testers results for installing this module, try: > reports CJFIELDS/BioPerl-1.6.1.tar.gz > Warning (usually harmless): 'YAML' not installed, will not store > persistent state > Running Build install > make test had returned bad status, won't install without force > Failed during this command: > CMUNGALL/Data-Stag-0.11.tar.gz : make NO > FLORA/ExtUtils-Manifest-1.60.tar.gz : make NO > CJFIELDS/BioPerl-1.6.1.tar.gz : make_test NO > > > Thanks a lot for your time and help. I appreciate it. 
> > Thank you, > Christel > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Nov 16 16:46:16 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 16 Nov 2011 16:46:16 +0000 Subject: [Bioperl-l] print alignment from blast results file In-Reply-To: <035404FE-A7D9-413A-B6EF-48564FE76426@verizon.net> References: <32851673.post@talk.nabble.com> <035404FE-A7D9-413A-B6EF-48564FE76426@verizon.net> Message-ID: small hint: you can get a Bio::AlignI from the HSP (which can be redirected to a Bio::AlignIO instance). chris On Nov 16, 2011, at 7:19 AM, Brian Osborne wrote: > Nisa, > > See: > > http://www.bioperl.org/wiki/HOWTO:SearchIO > > Brian O. > > > On Nov 15, 2011, at 7:49 PM, nisa.dar wrote: > >> >> Hi, >> >> I am parsing a blast results file. I have found bioperl modules to get query >> string, homology string and hit string for each hit/hsp. I want to print >> them in the form of an alignment instead of aligning them individually. >> >> this is what I am doing, but it doesn't seem correct >> >> while (my $hsp = $hit->next_hsp) { >> my >> $start_query_num=$hsp->start('query'); >> my $query_string=$hsp->query_string; >> my $end_query_num=$hsp->end('query'); >> my $homology_string=$hsp->homology_string; >> my $start_hit_num=$hsp->start('hit'); >> my $hit_string=$hsp->hit_string; >> my $end_hit_num=$hsp->end('hit'); >> my $aln_o = $hsp->get_aln; >> $query_string=~s/\n//g;#get rid of new line characters >> $homology_string=~s/\n//g; >> $hit_string=~s/\n//g; >> >> print "

Alignment:


"; >> print "$start_query_num-$query_string-$end_query_num
"; >> print " >>         $homology_string
"; >> print "$start_hit_num-$hit_string-$end_hit_num

"; >> >> >> >> } >> >> Please let me know how can I print them in the form of an alignment as seen >> in the blast results file. >> >> Thanks >> >> >> -- >> View this message in context: http://old.nabble.com/print-alignment-from-blast-results-file-tp32851673p32851673.html >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Wed Nov 16 17:01:49 2011 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 16 Nov 2011 18:01:49 +0100 Subject: [Bioperl-l] Bioperl installation help In-Reply-To: References: Message-ID: Hi Christel, Sorry to hear you're having trouble with the installation. It looks like these modules aren't getting installed and are causing the failed tests: CMUNGALL/Data-Stag-0.11.tar.gz : make NO FLORA/ExtUtils-Manifest-1.60.tar.gz : make NO I would try installing those separately via CPAN first and then trying again to install BioPerl. Also, it was a good idea to set the make_install_make_command option to CPAN, and that should have worked. Unfortunately, there's another installation system called Module::Build that has its own option which may need to be set: cpan> o conf mbuild_install_build_command 'sudo ./Build' That being said, I would suggest you grab the latest version of BioPerl from github instead of using v1.6.1 from CPAN, which is fairly out of date at this point. And unless you're planning to use BioPerl with GBrowse or Bio::Graphics, there's another, simpler way to get BioPerl up and running (assuming you have all the prerequisites like Data::Stag installed): See "Don't want to install BioPerl?" 
here: http://www.seqxml.org/xml/BioPerl.html Best, Dave On Tue, Nov 15, 2011 at 02:39, Christel Chehoud wrote: > Dear BioPerl, > Thank you for creating such useful code. Unfortunately, every time I > try to install Bioperl, it takes me a long time and is a challenging > ordeal :( I am a new MAC user and was not able to download bioperl > using CPAN. Here is the error I am getting: > > ERROR: Can't create '/usr/local/bin' > Do not have write permissions on '/usr/local/bin' > !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > at /Users/Christel/.cpan/build/BioPerl-1.6.0-DiZ0rN/Bio/Root/Build.pm > line 902 > CJFIELDS/BioPerl-1.6.0.tar.gz > ./Build install -- NOT OK > ---- > You may have to su to root to install the package > (Or you may want to run something like > o conf make_install_make_command 'sudo make' > to raise your permissions.Warning (usually harmless): 'YAML' not > installed, will not store persistent state > Failed during this command: > CMUNGALL/Data-Stag-0.11.tar.gz : make NO > CJFIELDS/BioPerl-1.6.0.tar.gz : make_test FAILED but > failure ignored because 'force' in effect > > > so I did: > cpan> o conf make_install_make_command 'sudo make' > followed by > cpan> o conf commit > > and started over..I got the same number of errors as last time (so I > decided not to force install this time). do you have any suggestions: > > 63 tests and 305 subtests skipped. > Failed 11/329 test scripts. 981/17708 subtests failed. > Files=329, Tests=17708, 127 wallclock secs (104.69 cusr + 12.51 csys = > 117.20 CPU) > Failed 11/329 test programs. 981/17708 subtests failed. 
> CJFIELDS/BioPerl-1.6.1.tar.gz
> ./Build test -- NOT OK
> //hint// to see the cpan-testers results for installing this module, try:
> reports CJFIELDS/BioPerl-1.6.1.tar.gz
> Warning (usually harmless): 'YAML' not installed, will not store
> persistent state
> Running Build install
> make test had returned bad status, won't install without force
> Failed during this command:
> CMUNGALL/Data-Stag-0.11.tar.gz : make NO
> FLORA/ExtUtils-Manifest-1.60.tar.gz : make NO
> CJFIELDS/BioPerl-1.6.1.tar.gz : make_test NO
>
>
> Thanks a lot for your time and help. I appreciate it.
>
> Thank you,
> Christel
>

From jluis.lavin at unavarra.es  Wed Nov 16 18:31:46 2011
From: jluis.lavin at unavarra.es (José Luis Lavín)
Date: Wed, 16 Nov 2011 19:31:46 +0100
Subject: [Bioperl-l] How to get Remote BLAST results in a single out
In-Reply-To:
References: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com>
Message-ID:

Thank you for your answer Jason,

While answering you I figured out how to do it... sometimes you need other
people's point of view to see the light.
As you pointed out:
"what is complicated is the file name right now is based on the query name."
That is the part I expected to have an easy fix: the dependency between the
outfile name and the query name was the reason I couldn't figure out how to
change the name of the output. While reading the code to answer you, I came
across the solution.
I insisted on doing it this way because I need to run BLAST remotely on my
CGI; that's why I didn't pay attention to all the other options you suggested.

Thank you all for your suggestions anyway. ;)

Best wishes

JL

On 16 November 2011 18:03, Jason Stajich wrote:

> the answer to your question is to move the line that opens a file to
> outside the loop.
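
The suggestion quoted just above (open the output file once, outside the loop, instead of once per query) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the fix the poster actually used: the @reports list is a hypothetical stand-in for the per-query reports that the retrieval loop yields, and all_reports.out is an assumed file name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the reports produced inside the retrieval loop;
# in the real script each entry would correspond to one retrieved RID.
my @reports = (
    { query => 'seq1', text => "report for seq1\n" },
    { query => 'seq2', text => "report for seq2\n" },
);

# Open the combined output file ONCE, before the loop, so that its name
# no longer depends on any single query name.
my $outfile = 'all_reports.out';    # assumed name
open( my $out, '>', $outfile ) or die "Cannot open $outfile: $!";

for my $report (@reports) {
    # Mark where each query's report starts, then append the report text.
    print {$out} "### Query: $report->{query}\n";
    print {$out} $report->{text};
}

close($out) or die "Cannot close $outfile: $!";
```

Because the filehandle lives outside the loop, every iteration appends to the same file; only the marker line changes per query.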
what is complicaticated is the file name right now is > based on the query name. so you need to think how you want to name the > file. Since this isn't obvious to you, then I think we are suggesting you > probably need to understand programming more, and it might just be easier > to use the tools as we have suggested rather than teaching you to modify > what is just an example code. our suggestions are based on the way we'd > solve the problem so maybe you have other reasons for the direction you > want to take. > > I also think it is not efficient or logical to run > remote blast through the web protocol simply to write it back out with > bioperl since that has to parse it in and then write it out -- why not just > run the program that generates the output directly from NCBI. Or run BLAST > locally for likely more efficient running. > > Finally the bioperl writer may not 100% reproduce the blast output so if > you are planning on further parsing the output that comes out from this > script, it really doesn't seem like a good idea to launder it through > bioperl parser first. > > > > 2011/11/14 Jos? Luis Lav?n > >> Thank you very much for your answers, but due to them, I'm afraid I didn't >> explained myself good enough. >> >> I'm not looking for another tool to perform a BLAST task. I was just >> wondering if there was a way to simply change the way the module writes >> the >> outputs, so that I can get multiple searches in a single report file >> instead of having a report for each BLAST search. >> >> Maybe there's some issue I ignore, that makes you recommend the use of >> other tools instead of the Bioperl Remote BLAST module...it would be >> appreciated if you let me know about that (NCBI server problems with >> web-services or so)... 
>> >> Thank you for your answers anyway >> >> Best wishes >> >> 2011/11/14 Fields, Christopher J >> >> > Re: a BLAST+ equivalent for blastcl3, I believe there is an option for >> the >> > various 'blast*' indicating the search is to use a remote database. I >> > haven't used it, though... >> > >> > chris >> > >> > On Nov 14, 2011, at 8:07 AM, Jason Stajich wrote: >> > >> > > Please keep this on list discussions >> > > >> > > Sent from my iPhone-please excuse typos >> > > >> > > -- >> > > Jason Stajich >> > > >> > > Begin forwarded message: >> > > >> > >> From: Jos? Luis Lav?n >> > >> Date: November 14, 2011 8:04:25 AM EST >> > >> To: Jason Stajich >> > >> Subject: Re: [Bioperl-l] Fwd: How to get Remote BLAST results in a >> > single out >> > >> >> > >> Hello Jason, >> > >> >> > >> As answering your question: >> > >> >> > >> " If you want to do this within this code I guess the question is >> what >> > format you want the data in - a BLAST report or something more like a >> > table?" >> > >> >> > >> A concatenation of BLAST (default format) reports should be OK, >> since I >> > have a script to parse that kind of results. Anyway formats 1 or 2 will >> > also do the trick. >> > >> I'll be happy to get assistance on how to change the OUTFILE from "a >> > query a report" to "all queries in the same report", because I don't >> seem >> > to be able to do it myself after reading the module documentation. >> > >> >> > >> Thanks in advance >> > >> >> > >> El 14 de noviembre de 2011 12:59, Jason Stajich < >> > jason.stajich at gmail.com> escribi?: >> > >> if you want to do a bunch of BLASTs remotely on the cmdline you >> should >> > also just use the NCBI's blastcl3 tool (not sure if there is a BLAST+ >> > equivalent). This might be faster to do and easier since you need to >> learn >> > the programming part too. 
>> > >> >> > >> If you want to do this within this code I guess the question is what >> > format you want the data in - a BLAST report or something more like a >> table? >> > >> >> > >> On Nov 14, 2011, at 6:14 AM, Jos? Luis Lav?n wrote: >> > >> >> > >>> Hello everybody, >> > >>> >> > >>> I've been using "Bio::Tools::Run::RemoteBlast" for a time and it >> has >> > >>> worked fine for me. Now I need to perform a multiple BLAST search, >> but >> > this >> > >>> time I'd just like to get all the BLAST results in a single out file >> > >>> instead of having each sequence's report written individually. I've >> > read >> > >>> the documentation of the module, but due to my short >> > >>> experience/understanding on complex modules as this one seems to be >> I >> > can't >> > >>> figure out where to change the script to achieve my previously >> > mentioned >> > >>> aim. >> > >>> Here I post the script I've been using (it's basically the one >> posted >> > on >> > >>> the module cookbook). >> > >>> >> > >>> #!/c:/Perl -w >> > >>> use Bio::Tools::Run::RemoteBlast; >> > >>> use Bio::SearchIO; >> > >>> use Data::Dumper; >> > >>> >> > >>> #Here i set the parameters for blast >> > >>> print "Enter your BLAST choice (blastn, blastp, blastx, tblastn, >> > >>> tblastx):\n"; >> > >>> my $blst = ; >> > >>> my $prog = "$blst"; >> > >>> print "Enter a database to search (nr, refseq_protein, swissprot, >> pat, >> > pdb, >> > >>> env_nr):\n"; >> > >>> my $dtb = ; >> > >>> $db = "$dtb"; >> > >>> print "Enter your cutt off score (1e-n):\n"; >> > >>> my $cut = ; >> > >>> my $e_val = "$cut"; >> > >>> >> > >>> my @params = ( '-prog' => $prog, >> > >>> '-data' => $db, >> > >>> '-expect' => $e_val, >> > >>> '-readmethod' => 'SearchIO' ); >> > >>> >> > >>> my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params); >> > >>> >> > >>> >> > >>> #Select the file and make the blast. 
>> > >>> print "Enter your FASTA file:\n"; >> > >>> chomp(my $infile = ); >> > >>> my $r = $remoteBlast->submit_blast($infile); >> > >>> my $v = 1; >> > >>> >> > >>> print STDERR "waiting..." if( $v > 0 ); ######## WAIT FOR THE >> > RESULTS >> > >>> TO RETURN!!!!! >> > >>> while ( my @rids = $remoteBlast->each_rid ) { >> > >>> foreach my $rid ( @rids ) { >> > >>> my $rc = $remoteBlast->retrieve_blast($rid); >> > >>> if( !ref($rc) ) { >> > >>> if( $rc < 0 ) { >> > >>> $remoteBlast->remove_rid($rid); >> > >>> } >> > >>> print STDERR "." if ( $v > 0 ); >> > >>> sleep 5; >> > >>> } else { >> > >>> my $result = $rc->next_result(); >> > >>> #save the output >> > >>> my $filename = >> > >>> $result->query_name()."\.out";##################open SALIDA, >> > >>> '>>'."$^T"."Report"."\.out"; >> > >>> $remoteBlast->save_output($filename);############# >> > >>> $remoteBlast->remove_rid($rid); >> > >>> print "\nQuery Name: ", $result->query_name(), "\n"; >> > >>> while ( my $hit = $result->next_hit ) { >> > >>> next unless ( $v > 0); >> > >>> print "\thit name is ", $hit->name, "\n"; >> > >>> while( my $hsp = $hit->next_hsp ) { >> > >>> print "\t\tscore is ", $hsp->score, "\n"; >> > >>> } >> > >>> } >> > >>> } >> > >>> } >> > >>> } >> > >>> >> > >>> >> > >>> May any of you please explain me how to solve my question? >> > >>> >> > >>> Thanks in advence >> > >>> >> > >>> With best wishes >> > >>> >> > >>> -- >> > >>> -- >> > >>> Dr. Jos? Luis Lav?n Trueba >> > >>> >> > >>> Dpto. de Producci?n Agraria >> > >>> Grupo de Gen?tica y Microbiolog?a >> > >>> Universidad P?blica de Navarra >> > >>> 31006 Pamplona >> > >>> Navarra >> > >>> SPAIN >> > >>> >> > >>> >> > >>> >> > >>> -- >> > >>> -- >> > >>> Dr. Jos? Luis Lav?n Trueba >> > >>> >> > >>> Dpto. 
de Producci?n Agraria >> > >>> Grupo de Gen?tica y Microbiolog?a >> > >>> Universidad P?blica de Navarra >> > >>> 31006 Pamplona >> > >>> Navarra >> > >>> SPAIN >> > >>> >> > >>> _______________________________________________ >> > >>> Bioperl-l mailing list >> > >>> Bioperl-l at lists.open-bio.org >> > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > >> >> > >> >> > >> _______________________________________________ >> > >> Bioperl-l mailing list >> > >> Bioperl-l at lists.open-bio.org >> > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > >> >> > >> >> > >> >> > >> -- >> > >> -- >> > >> Dr. Jos? Luis Lav?n Trueba >> > >> >> > >> Dpto. de Producci?n Agraria >> > >> Grupo de Gen?tica y Microbiolog?a >> > >> Universidad P?blica de Navarra >> > >> 31006 Pamplona >> > >> Navarra >> > >> SPAIN >> > > >> > > _______________________________________________ >> > > Bioperl-l mailing list >> > > Bioperl-l at lists.open-bio.org >> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > >> > >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > >> >> >> >> -- >> -- >> Dr. Jos? Luis Lav?n Trueba >> >> Dpto. de Producci?n Agraria >> Grupo de Gen?tica y Microbiolog?a >> Universidad P?blica de Navarra >> 31006 Pamplona >> Navarra >> SPAIN >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > -- -- Dr. Jos? Luis Lav?n Trueba Dpto. de Producci?n Agraria Grupo de Gen?tica y Microbiolog?a Universidad P?blica de Navarra 31006 Pamplona Navarra SPAIN From l.m.timmermans at students.uu.nl Fri Nov 18 14:15:47 2011 From: l.m.timmermans at students.uu.nl (L.M. 
Timmermans)
Date: Fri, 18 Nov 2011 15:15:47 +0100
Subject: [Bioperl-l] Blast > parsing result in Exel
In-Reply-To: <32846407.post@talk.nabble.com>
References: <32846407.post@talk.nabble.com>
Message-ID:

On Tue, Nov 15, 2011 at 10:25 AM, Giorgio C wrote:

> I need to parse in an exel sheet :
>

What you're saying here is nonsense. I think you meant to say you want to
output Excel.

> Is possible from a big blast result file obtain an exel with 5 columns
> where
> every field is the first hit of the blast result. Can anyone halp me to fix
> this problem ??? Also with a little script in perl.
>

There are a number of Perl modules on CPAN for outputting Excel. Try
Excel::Writer::XLSX and Spreadsheet::WriteExcel for example.

Leon

From tzhu at mail.bnu.edu.cn  Mon Nov 21 05:17:18 2011
From: tzhu at mail.bnu.edu.cn (Tao Zhu)
Date: Mon, 21 Nov 2011 13:17:18 +0800
Subject: [Bioperl-l] Is there a "combine" method that would combine several sequence alignments to a single alignment?
Message-ID: <4EC9DEDE.6030901@mail.bnu.edu.cn>

I can use the "slice" method to split a single sequence alignment into
several subalignments. Then is there a corresponding "combine" method to
combine such subalignments back?

--
Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing
100875, China
Email: tzhu at mail.bnu.edu.cn

From David.Messina at sbc.su.se  Mon Nov 21 09:58:51 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 21 Nov 2011 10:58:51 +0100
Subject: [Bioperl-l] Is there a "combine" method that would combine several sequence alignments to a single alignment?
In-Reply-To: <4EC9DEDE.6030901@mail.bnu.edu.cn>
References: <4EC9DEDE.6030901@mail.bnu.edu.cn>
Message-ID:

Hi,

No, I don't believe such a method exists. Could you describe what you are
wanting to do? Perhaps there is another way to do it.

Dave

On Mon, Nov 21, 2011 at 06:17, Tao Zhu wrote:

> I can use the "slice" method to split a single sequence alignment into
> several subalignments.
> Then is there a corresponding "combine" method to
> combine such subalignments back?
>
> --
> Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing
> 100875, China
> Email: tzhu at mail.bnu.edu.cn
>

From roy.chaudhuri at gmail.com  Mon Nov 21 11:41:09 2011
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Mon, 21 Nov 2011 11:41:09 +0000
Subject: [Bioperl-l] Is there a "combine" method that would combine several sequence alignments to a single alignment?
In-Reply-To:
References: <4EC9DEDE.6030901@mail.bnu.edu.cn>
Message-ID: <4ECA38D5.8050709@gmail.com>

See the cat method in Bio::Align::Utilities:
http://search.cpan.org/~cjfields/BioPerl-1.6.901/Bio/Align/Utilities.pm#cat

On 21/11/2011 09:58, Dave Messina wrote:
> Hi,
>
> No, I don't believe such a method exists. Could you describe what you are
> wanting to do? Perhaps there is another way to do it.
>
>
> Dave
>
>
>
> On Mon, Nov 21, 2011 at 06:17, Tao Zhu wrote:
>
>> I can use the "slice" method to split a single sequence alignment into
>> several subalignments. Then is there a corresponding "combine" method to
>> combine such subalignments back?
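
For anyone following the pointer to cat in Bio::Align::Utilities given earlier in this reply, a minimal sketch of a slice/cat round trip might look like the one below. It assumes a BioPerl version that ships Bio::Align::Utilities::cat (the 1.6.901 documentation linked above lists it); the two sequences and their ids are made-up example data, and the split point is arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SimpleAlign;
use Bio::LocatableSeq;
use Bio::Align::Utilities qw(cat);

# Build a tiny ungapped alignment in memory (made-up example data).
my $aln = Bio::SimpleAlign->new();
$aln->add_seq( Bio::LocatableSeq->new(
    -id => 'seqA', -seq => 'ACGTACGTAC', -start => 1, -end => 10 ) );
$aln->add_seq( Bio::LocatableSeq->new(
    -id => 'seqB', -seq => 'ACGTTCGTAC', -start => 1, -end => 10 ) );

# Split the alignment into two column ranges with slice() ...
my $left  = $aln->slice( 1, 5 );
my $right = $aln->slice( 6, 10 );

# ... and join the pieces back, left to right, with cat().
# cat() matches sequences across the pieces by their ids.
my $joined = cat( $left, $right );

printf "columns: %d + %d -> %d\n",
    $left->length, $right->length, $joined->length;
```

Since cat pairs sequences by id, the ids must be unique within the first alignment; per its documentation, sequences missing from a later piece are padded with gaps rather than dropped.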
>>
>> --
>> Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing
>> 100875, China
>> Email: tzhu at mail.bnu.edu.cn
>>

From zntayl at gmail.com  Thu Nov 17 01:07:07 2011
From: zntayl at gmail.com (Nathan Taylor)
Date: Wed, 16 Nov 2011 20:07:07 -0500
Subject: [Bioperl-l] seqIO.pm
Message-ID:

Hello,

Can SeqIO.pm convert a file of fastq reads into .phd files? Or,
barring that, a file of fastas and a file of quals into .phd files?

Many thanks,
Nathan

From gregonomic at yahoo.co.nz  Mon Nov 21 12:00:50 2011
From: gregonomic at yahoo.co.nz (Gregory Baillie)
Date: Mon, 21 Nov 2011 04:00:50 -0800 (PST)
Subject: [Bioperl-l] Is there a "combine" method that would combine several sequence alignments to a single alignment?
In-Reply-To:
References: <4EC9DEDE.6030901@mail.bnu.edu.cn>
Message-ID: <1321876850.71978.YahooMailNeo@web112508.mail.gq1.yahoo.com>

Hi.

I've attached a simple script (concatenate_alignments.pl) I wrote to
concatenate alignments.

It can be a bit of a memory hog if you have long alignments or large numbers
of sequences; otherwise you should be OK.

Usage:
concatenate_alignments.pl -o <... input_alignment_n>

If you want to insert a string between the concatenated sequences, you can
use the -j option (e.g. -j '---').

Greg.

________________________________
From: Dave Messina
To: Tao Zhu
Cc: BioPerl
Sent: Monday, 21 November 2011 7:58 PM
Subject: Re: [Bioperl-l] Is there a "combine" method that would combine several sequence alignments to a single alignment?

Hi,

No, I don't believe such a method exists. Could you describe what you are
wanting to do? Perhaps there is another way to do it.

Dave

On Mon, Nov 21, 2011 at 06:17, Tao Zhu wrote:

> I can use the "slice" method to split a single sequence alignment into
> several subalignments. Then is there a corresponding "combine" method to
> combine such subalignments back?
>
> --
> Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing
> 100875, China
> Email: tzhu at mail.bnu.edu.cn
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: concatenate_alignments.pl
Type: application/octet-stream
Size: 3349 bytes
Desc: not available
URL:

From jason.stajich at gmail.com  Mon Nov 21 15:31:50 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Mon, 21 Nov 2011 10:31:50 -0500
Subject: [Bioperl-l] Is there a "combine" method that would combine several sequence alignments to a single alignment?
In-Reply-To: <1321876850.71978.YahooMailNeo@web112508.mail.gq1.yahoo.com>
References: <4EC9DEDE.6030901@mail.bnu.edu.cn> <1321876850.71978.YahooMailNeo@web112508.mail.gq1.yahoo.com>
Message-ID: <39ECA743-8C56-4B23-8813-40EEEAB7DBB1@gmail.com>

greg -- looks good - you could simplify part of the code to use the .=
operator and use AlignIO to write the seqs out.

This is my script to combine a directory of MSA aligned .fasaln files into a
single concatenated alignment.
https://github.com/hyphaltip/genome-scripts/blob/master/phylogenetics/combine_fasaln.pl

On Nov 21, 2011, at 7:00 AM, Gregory Baillie wrote:

> Hi.
>
> I've attached a simple script (concatenate_alignments.pl) I wrote to concatenate alignments.
>
> It can be a bit of a memory hog if you have long alignments or large numbers of sequences; otherwise you should be OK.
> > Usage: > concatenate_alignments.pl -o <... input_alignment_n> > > > If you want to insert a string between the concatenated sequences, you can use the -j option (eg. -j '---'). > > Greg. > > > ________________________________ > From: Dave Messina > To: Tao Zhu > Cc: BioPerl > Sent: Monday, 21 November 2011 7:58 PM > Subject: Re: [Bioperl-l] Is there a "combine" method that would combine several sequence alignments to a single alignment? > > Hi, > > No, I don't believe such a method exists. Could you describe what you are > wanting to do? Perhaps there is another way to do it. > > > Dave > > > > On Mon, Nov 21, 2011 at 06:17, Tao Zhu wrote: > >> I can use the "slice" method to split a single sequence alignment into >> several subalignments. Then is there a corresponding "combine" method to >> combine such subalignments back? >> >> -- >> Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing >> 100875, China >> Email: tzhu at mail.bnu.edu.cn >> >> ______________________________**_________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/**mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l_______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From p.j.a.cock at googlemail.com Mon Nov 21 16:15:13 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 21 Nov 2011 16:15:13 +0000 Subject: [Bioperl-l] seqIO.pm In-Reply-To: References: Message-ID: On Thu, Nov 17, 2011 at 1:07 AM, Nathan Taylor wrote: > Hello, > > ? Can SeqIO.pm convert a file of fastq reads into .phd files. Or, > barring that, a file of fastas and file of quals into .phd files? > > Many thanks, > Nathan In principle that is possible (e.g. Biopython can do fastq to phd). 
Have you tried using BioPerl's SeqIO to do this? Was there an error message?

Peter

From cjfields at illinois.edu Mon Nov 21 16:57:29 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 21 Nov 2011 16:57:29 +0000
Subject: [Bioperl-l] seqIO.pm
In-Reply-To:
References:
Message-ID: <2E075A8F-92F9-4A04-9254-EF4C07793A7C@illinois.edu>

On Nov 21, 2011, at 10:15 AM, Peter Cock wrote:

> On Thu, Nov 17, 2011 at 1:07 AM, Nathan Taylor wrote:
>> Hello,
>>
>> Can SeqIO.pm convert a file of fastq reads into .phd files. Or,
>> barring that, a file of fastas and file of quals into .phd files?
>>
>> Many thanks,
>> Nathan
>
> In principle that is possible (e.g. Biopython can do fastq to phd).
> Have you tried using BioPerl's SeqIO to do this? Was there an
> error message?
>
> Peter

This should be possible in either circumstance (FASTQ should be more straightforward); there is a Bio::SeqIO::phd module for this purpose. Nathan, if you run into problems with that conversion, let us know.

chris

From rondonbio at yahoo.com.br Mon Nov 21 17:31:21 2011
From: rondonbio at yahoo.com.br (Rondon Neto)
Date: Mon, 21 Nov 2011 09:31:21 -0800 (PST)
Subject: [Bioperl-l] seqIO.pm
In-Reply-To:
References:
Message-ID: <1321896681.57079.YahooMailNeo@web130205.mail.mud.yahoo.com>

Hi! Try this script:

#!/usr/bin/perl
use warnings;
use strict;
use Bio::SeqIO;

if (!$ARGV[0]) { die "\n### USAGE::: perl $0 [file.fastq]\n"; }

my $fastq = $ARGV[0];

my $in = Bio::SeqIO->new( -file   => $fastq,
                          -format => 'fastq' );

my $out = Bio::SeqIO->new( -file   => ">out.phd",
                           -format => 'phd' );

while (my $seq = $in->next_seq()) {
    $out->write_seq($seq);
}

exit;

Best wishes,
Rondon, a Brazilian friend.

________________________________
From: Peter Cock
To: Nathan Taylor
Cc: bioperl-l at bioperl.org
Sent: Monday, 21 November 2011 14:15
Subject: Re: [Bioperl-l] seqIO.pm

On Thu, Nov 17, 2011 at 1:07 AM, Nathan Taylor wrote:
> Hello,
>
>
Can SeqIO.pm convert a file of fastq reads into .phd files. Or,
> barring that, a file of fastas and file of quals into .phd files?
>
> Many thanks,
> Nathan

In principle that is possible (e.g. Biopython can do fastq to phd). Have you tried using BioPerl's SeqIO to do this? Was there an error message?

Peter

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From Russell.Smithies at agresearch.co.nz Mon Nov 21 20:04:01 2011
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 22 Nov 2011 09:04:01 +1300
Subject: [Bioperl-l] seqIO.pm
In-Reply-To: <1321896681.57079.YahooMailNeo@web130205.mail.mud.yahoo.com>
References: <1321896681.57079.YahooMailNeo@web130205.mail.mud.yahoo.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF340186CF1AB@exchsth.agresearch.co.nz>

Or you could use the builtin script bp_sreformat.pl

--Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Rondon Neto
> Sent: Tuesday, 22 November 2011 6:31 a.m.
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] seqIO.pm
>
> Hi! try this script:
>
> #!/usr/bin/perl
> use warnings;
> use strict;
> use Bio::SeqIO;
>
> if (!$ARGV[0]) { die "\n### USAGE::: perl $0 [file.fastq]\n"; }
>
> my $fastq = $ARGV[0];
>
> my $in = Bio::SeqIO->new( -file   => $fastq,
>                           -format => 'fastq' );
>
> my $out = Bio::SeqIO->new( -file   => ">out.phd",
>                            -format => 'phd' );
>
> while (my $seq = $in->next_seq()) {
>     $out->write_seq($seq);
> }
>
> exit;
>
>
> Best wishes,
> Rondon, a brazilian friend.
> > > > > > > ________________________________ > De: Peter Cock > Para: Nathan Taylor > Cc: bioperl-l at bioperl.org > Enviadas: Segunda-feira, 21 de Novembro de 2011 14:15 > Assunto: Re: [Bioperl-l] seqIO.pm > > On Thu, Nov 17, 2011 at 1:07 AM, Nathan Taylor wrote: > > Hello, > > > > ? Can SeqIO.pm convert a file of fastq reads into .phd files. Or, > > barring that, a file of fastas and file of quals into .phd files? > > > > Many thanks, > > Nathan > > In principle that is possible (e.g. Biopython can do fastq to phd). > Have you tried using BioPerl's SeqIO to do this? Was there an error message? > > Peter > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From goodyearkl at gmail.com Tue Nov 22 02:23:13 2011 From: goodyearkl at gmail.com (Kylie Goodyear) Date: Mon, 21 Nov 2011 18:23:13 -0800 (PST) Subject: [Bioperl-l] Fasta counting script? 
Message-ID: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>

Hi,
This may seem like a stupid question but I am just learning bioperl and I am trying to figure out how to get a count of all the characters in my FASTA file. I managed to get the number of sequences using the following. Is there a way to tell bioperl to count the characters?

#!/usr/bin/perl -w
#Defines perl modules
#Bio::Seq deal with sequences and their features
use Bio::Seq;
#Bio::SeqIO handles reading and parsing of sequences of many different formats
use Bio::SeqIO;
#Read FASTA file
$seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format => "fasta" );
#Count how many sequences are present in file
my $count=0;
while (my $seq_obj = $seqio_obj->next_seq) {
    $count++;
}
#Display the number of sequences present
print "There are $count sequences present.\n";

From David.Messina at sbc.su.se Tue Nov 22 08:08:11 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 22 Nov 2011 09:08:11 +0100
Subject: [Bioperl-l] Fasta counting script?
In-Reply-To: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
Message-ID:

Hi Kylie,

You can use the length method for this.

my $seq_length = $seq_obj->length();

Have you taken a look at the beginner's HOWTO? There's a nice table of sequence methods as well as lots of other good information in there.

http://www.bioperl.org/wiki/HOWTO:Beginners

Dave

On Tue, Nov 22, 2011 at 03:23, Kylie Goodyear wrote:

> Hi,
> This may seem like a stupid question but I am just learning bioperl
> and I am trying to figure out how to get a count of all the characters
> in my FASTA file. I managed to get the number of sequences using the
> following. Is there a way to tell bioperl to count the characters?
> > #!/usr/bin/perl -w > #Defines perl modules > #Bio::Seq deal with sequences and their features > use Bio::Seq; > #Bio::SeqIO handles reading and parsing of sequences of many different > formats > use Bio::SeqIO; > #Read FASTA file > $seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format > => "fasta" ); > #Count how many sequences are present in file > my $count=0; > while (my $seq_obj = $seqio_obj->next_seq) { > $count++; > } > #Display the number of sequences present > print "There are $count sequences present.\n"; > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From liam.elbourne at mq.edu.au Tue Nov 22 04:11:12 2011 From: liam.elbourne at mq.edu.au (Liam Elbourne) Date: Tue, 22 Nov 2011 15:11:12 +1100 Subject: [Bioperl-l] Fasta counting script? In-Reply-To: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com> References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com> Message-ID: Hi Kylie, I think the length() method is what you're after: .... my $sequence_length = $seq_obj->length(); .... in your case. Have a look at: HOWTO:SeqIO - BioPerl and, HOWTO:Beginners - BioPerl for some more general stuff. Regards, Liam. On 22/11/2011, at 1:23 PM, Kylie Goodyear wrote: > Hi, > This may seem like a stupid question but I am just learning bioperl > and I am trying to figure out how to get a count of all the characters > in my FASTA file. I manged to get the number of sequences using the > following. Is there a way to tell bioperl to count the characters? 
> > #!/usr/bin/perl -w > #Defines perl modules > #Bio::Seq deal with sequences and their features > use Bio::Seq; > #Bio::SeqIO handles reading and parsing of sequences of many different > formats > use Bio::SeqIO; > #Read FASTA file > $seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format > => "fasta" ); > #Count how many sequences are present in file > my $count=0; > while (my $seq_obj = $seqio_obj->next_seq) { > $count++; > } > #Display the number of sequences present > print "There are $count sequences present.\n"; > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 195 bytes Desc: Message signed with OpenPGP using GPGMail URL: From goodyearkl at gmail.com Tue Nov 22 13:00:55 2011 From: goodyearkl at gmail.com (Kylie Goodyear) Date: Tue, 22 Nov 2011 05:00:55 -0800 (PST) Subject: [Bioperl-l] Fasta counting script? In-Reply-To: References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com> Message-ID: <273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com> Thank you for your help. It keeps telling me that it can't find "length" do you think it has to do with the way I am coding it? 
#!/usr/bin/perl -w
#Defines perl modules

#Bio::Seq deal with sequences and their features
use Bio::Seq;

#Bio::SeqIO handles reading and parsing of sequences of many different formats
use Bio::SeqIO;

#Read FASTA file
$seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format => "fasta" );

#Count how many sequences are present in file
my $countseq=0;
while (my $seq_obj = $seqio_obj->next_seq, ) {
    $countseq++;
}
#Display the number of sequences present
print "There are $countseq sequences present.\n";

#Count number of characters in file
my $seq_length = $seq_obj->length ;
print $seq_length

On Nov 22, 5:08 am, Dave Messina wrote:
> Hi Kylie,
>
> You can use the length method for this.
>
> my $seq_length = $seq_obj->length();
>
> Have you taken a look at the beginner's HOWTO? There's a nice table of
> sequence methods as well lots of other good information in there.
>
> http://www.bioperl.org/wiki/HOWTO:Beginners
>
> Dave
>
>
> On Tue, Nov 22, 2011 at 03:23, Kylie Goodyear wrote:
> > Hi,
> > This may seem like a stupid question but I am just learning bioperl
> > and I am trying to figure out how to get a count of all the characters
> > in my FASTA file. I manged to get the number of sequences using the
> > following. Is there a way to tell bioperl to count the characters?
>
> > #!/usr/bin/perl -w
> > #Defines perl modules
> > #Bio::Seq deal with sequences and their features
> > use Bio::Seq;
> > #Bio::SeqIO handles reading and parsing of sequences of many different
> > formats
> > use Bio::SeqIO;
> > #Read FASTA file
> > $seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format
> > => "fasta" );
> > #Count how many sequences are present in file
> > my $count=0;
> > while (my $seq_obj = $seqio_obj->next_seq) {
> >     $count++;
> > }
> > #Display the number of sequences present
> > print "There are $count sequences present.\n";
>
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioper...
at lists.open-bio.org > >http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioper... at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l From roy.chaudhuri at gmail.com Tue Nov 22 15:50:31 2011 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Tue, 22 Nov 2011 15:50:31 +0000 Subject: [Bioperl-l] Fasta counting script? In-Reply-To: <273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com> References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com> <273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com> Message-ID: <4ECBC4C7.10401@gmail.com> Hi Kylie, I suspect the error you get is actually "Can't call method length on an undefined value" (please in future report the exact text of any error messages). You declare $seq_obj with "my" in the while loop, but then try to access it outside of the loop. Try printing out the length of each $seq_obj within the while loop. You should always include "use strict;" at the top of your program, that helps to catch errors like this. Cheers, Roy. On 22/11/2011 13:00, Kylie Goodyear wrote: > Thank you for your help. It keeps telling me that it can't find > "length" do you think it has to do with the way I am coding it? 
> > #!/usr/bin/perl -w > #Defines perl modules > > #Bio::Seq deal with sequences and their features > use Bio::Seq; > > #Bio::SeqIO handles reading and parsing of sequences of many different > formats > use Bio::SeqIO; > > > #Read FASTA file > $seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format > => "fasta" ); > > > #Count how many sequences are present in file > my $countseq=0; > while (my $seq_obj = $seqio_obj->next_seq, ) { > $countseq++; > } > #Display the number of sequences present > print "There are $countseq sequences present.\n"; > > #Count number of charcaters in file > my $seq_length = $seq_obj->length ; > print $seq_length > > > On Nov 22, 5:08 am, Dave Messina wrote: >> Hi Kylie, >> >> You can use the length method for this. >> >> my $seq_length = $seq_obj->length(); >> >> Have you taken a look at the beginner's HOWTO? There's a nice table of >> sequence methods as well lots of other good information in there. >> >> http://www.bioperl.org/wiki/HOWTO:Beginners >> >> Dave >> >> >> >> >> >> >> >> >> >> On Tue, Nov 22, 2011 at 03:23, Kylie Goodyear wrote: >>> Hi, >>> This may seem like a stupid question but I am just learning bioperl >>> and I am trying to figure out how to get a count of all the characters >>> in my FASTA file. I manged to get the number of sequences using the >>> following. Is there a way to tell bioperl to count the characters? 
>> >>> #!/usr/bin/perl -w >>> #Defines perl modules >>> #Bio::Seq deal with sequences and their features >>> use Bio::Seq; >>> #Bio::SeqIO handles reading and parsing of sequences of many different >>> formats >>> use Bio::SeqIO; >>> #Read FASTA file >>> $seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format >>> => "fasta" ); >>> #Count how many sequences are present in file >>> my $count=0; >>> while (my $seq_obj = $seqio_obj->next_seq) { >>> $count++; >>> } >>> #Display the number of sequences present >>> print "There are $count sequences present.\n"; >> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioper... at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioper... at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Nov 22 16:13:01 2011 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 22 Nov 2011 16:13:01 +0000 Subject: [Bioperl-l] Fasta counting script? In-Reply-To: <273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com> References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com> <273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com> Message-ID: <0507832F-51F3-4946-BDCD-6913BBE697BC@illinois.edu> This sounds a little homework-y. Sure this isn't for a class? :) One clue (and a good thing to keep in mind): always 'use strict; use warnings;' with your scripts if you are new to perl. Doing so would let you know there is a problem with the script the way it is written, specifically, the place where you are inquiring about the length. chris On Nov 22, 2011, at 7:00 AM, Kylie Goodyear wrote: > Thank you for your help. 
It keeps telling me that it can't find > "length" do you think it has to do with the way I am coding it? > > #!/usr/bin/perl -w > #Defines perl modules > > #Bio::Seq deal with sequences and their features > use Bio::Seq; > > #Bio::SeqIO handles reading and parsing of sequences of many different > formats > use Bio::SeqIO; > > > #Read FASTA file > $seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format > => "fasta" ); > > > #Count how many sequences are present in file > my $countseq=0; > while (my $seq_obj = $seqio_obj->next_seq, ) { > $countseq++; > } > #Display the number of sequences present > print "There are $countseq sequences present.\n"; > > #Count number of charcaters in file > my $seq_length = $seq_obj->length ; > print $seq_length > > > On Nov 22, 5:08 am, Dave Messina wrote: >> Hi Kylie, >> >> You can use the length method for this. >> >> my $seq_length = $seq_obj->length(); >> >> Have you taken a look at the beginner's HOWTO? There's a nice table of >> sequence methods as well lots of other good information in there. >> >> http://www.bioperl.org/wiki/HOWTO:Beginners >> >> Dave >> >> >> >> >> >> >> >> >> >> On Tue, Nov 22, 2011 at 03:23, Kylie Goodyear wrote: >>> Hi, >>> This may seem like a stupid question but I am just learning bioperl >>> and I am trying to figure out how to get a count of all the characters >>> in my FASTA file. I manged to get the number of sequences using the >>> following. Is there a way to tell bioperl to count the characters? 
>> >>> #!/usr/bin/perl -w >>> #Defines perl modules >>> #Bio::Seq deal with sequences and their features >>> use Bio::Seq; >>> #Bio::SeqIO handles reading and parsing of sequences of many different >>> formats >>> use Bio::SeqIO; >>> #Read FASTA file >>> $seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format >>> => "fasta" ); >>> #Count how many sequences are present in file >>> my $count=0; >>> while (my $seq_obj = $seqio_obj->next_seq) { >>> $count++; >>> } >>> #Display the number of sequences present >>> print "There are $count sequences present.\n"; >> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioper... at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioper... at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Russell.Smithies at agresearch.co.nz Tue Nov 22 20:47:36 2011 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Wed, 23 Nov 2011 09:47:36 +1300 Subject: [Bioperl-l] Fasta counting script? In-Reply-To: <0507832F-51F3-4946-BDCD-6913BBE697BC@illinois.edu> References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com> <273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com> <0507832F-51F3-4946-BDCD-6913BBE697BC@illinois.edu> Message-ID: <18DF7D20DFEC044098A1062202F5FFF340186CF1B8@exchsth.agresearch.co.nz> Or again, you could use the builtin scripts bp_seq_length.pl or bp_gccalc.pl As previous posters have hinted, RTFM - the answers are all in there! 
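For anyone who would rather see it in code than in a builtin script, here is a minimal sketch of the fix suggested earlier in this thread: take the length of each sequence inside the while loop and keep a running total. It assumes BioPerl is installed. The sub name count_fasta and the variable names are only illustrative and do not come from the original posts; Bio::SeqIO->new, next_seq and length are the same calls already used above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Count the records and the total number of sequence characters
# in a FASTA file (count_fasta is a hypothetical helper name).
sub count_fasta {
    my ($path) = @_;
    my $seqio = Bio::SeqIO->new(-file => $path, -format => 'fasta');
    my ($n_seqs, $n_chars) = (0, 0);
    while (my $seq_obj = $seqio->next_seq) {
        $n_seqs++;
        # $seq_obj only exists inside this loop, so its length
        # has to be read here, not after the loop has finished.
        $n_chars += $seq_obj->length;
    }
    return ($n_seqs, $n_chars);
}

# Only report when a file name is given on the command line.
if (@ARGV) {
    my ($n_seqs, $n_chars) = count_fasta($ARGV[0]);
    print "There are $n_seqs sequences present.\n";
    print "There are $n_chars characters in total.\n";
}
```

It could be run as "perl count_fasta.pl DNA_sequences.fasta" (the script name is made up). With "use strict;" in place, the original mistake of using $seq_obj after the loop is reported at compile time.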
--Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Fields, Christopher J > Sent: Wednesday, 23 November 2011 5:13 a.m. > To: Kylie Goodyear > Cc: > Subject: Re: [Bioperl-l] Fasta counting script? > > This sounds a little homework-y. Sure this isn't for a class? :) > > One clue (and a good thing to keep in mind): always 'use strict; use warnings;' > with your scripts if you are new to perl. Doing so would let you know there is > a problem with the script the way it is written, specifically, the place where > you are inquiring about the length. > > chris > > On Nov 22, 2011, at 7:00 AM, Kylie Goodyear wrote: > > > Thank you for your help. It keeps telling me that it can't find > > "length" do you think it has to do with the way I am coding it? > > > > #!/usr/bin/perl -w > > #Defines perl modules > > > > #Bio::Seq deal with sequences and their features use Bio::Seq; > > > > #Bio::SeqIO handles reading and parsing of sequences of many different > > formats use Bio::SeqIO; > > > > > > #Read FASTA file > > $seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format > > => "fasta" ); > > > > > > #Count how many sequences are present in file my $countseq=0; while > > (my $seq_obj = $seqio_obj->next_seq, ) { > > $countseq++; > > } > > #Display the number of sequences present print "There are $countseq > > sequences present.\n"; > > > > #Count number of charcaters in file > > my $seq_length = $seq_obj->length ; > > print $seq_length > > > > > > On Nov 22, 5:08 am, Dave Messina wrote: > >> Hi Kylie, > >> > >> You can use the length method for this. > >> > >> my $seq_length = $seq_obj->length(); > >> > >> Have you taken a look at the beginner's HOWTO? There's a nice table > >> of sequence methods as well lots of other good information in there. 
> >> > >> http://www.bioperl.org/wiki/HOWTO:Beginners > >> > >> Dave > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> On Tue, Nov 22, 2011 at 03:23, Kylie Goodyear > wrote: > >>> Hi, > >>> This may seem like a stupid question but I am just learning bioperl > >>> and I am trying to figure out how to get a count of all the > >>> characters in my FASTA file. I manged to get the number of sequences > >>> using the following. Is there a way to tell bioperl to count the characters? > >> > >>> #!/usr/bin/perl -w > >>> #Defines perl modules > >>> #Bio::Seq deal with sequences and their features use Bio::Seq; > >>> #Bio::SeqIO handles reading and parsing of sequences of many > >>> different formats use Bio::SeqIO; #Read FASTA file $seqio_obj = > >>> Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format => "fasta" > >>> ); #Count how many sequences are present in file my $count=0; while > >>> (my $seq_obj = $seqio_obj->next_seq) { > >>> $count++; > >>> } > >>> #Display the number of sequences present print "There are $count > >>> sequences present.\n"; > >> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioper... at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioper... 
at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinf > >> o/bioperl-l > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From charles-listes+bioperl at plessy.org Wed Nov 23 10:27:45 2011 From: charles-listes+bioperl at plessy.org (Charles Plessy) Date: Wed, 23 Nov 2011 19:27:45 +0900 Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ? Message-ID: <20111123102745.GC20168@merveille.plessy.net> Dear BioPerl developers, I am trying to process some unaligned paired-end reads with Bio::DB::Sam. For each pair, I want to detect a sequence index and a unique molecular identifier in the linker, record them as auxiliary flags, and trim the linker from the read. I collect the pairs through a features iterator, and can access all their data through the high-level Bio::DB::Bam::Alignment API. After modifying them (linker trimming and adding flags), I want to write the resulting pairs as a new unaligned BAM file. 
I apologise if the solution is trivial, but my problem is that I do not manage to modify the Bio::DB::Bam::Alignment objects. Typically, attempts such as '$pair[0]->qseq("GATACA")' give errors like 'Usage: Bio::DB::Bam::Alignment::qseq(b) at /usr/lib/perl5/Bio/DB/Bam/AlignWrapper.pm line 80'.

Since I did not find explanations or portions of source code indicating how to modify Bio::DB::Bam::Alignment objects, I wonder if they are read-only?

Have a nice day,

--
Charles Plessy
Tsurumi, Kanagawa, Japan

From MEC at stowers.org Wed Nov 23 16:02:26 2011
From: MEC at stowers.org (Cook, Malcolm)
Date: Wed, 23 Nov 2011 10:02:26 -0600
Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
In-Reply-To: <20111123102745.GC20168@merveille.plessy.net>
References: <20111123102745.GC20168@merveille.plessy.net>
Message-ID: <2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org>

Charles,

I suggest you reconsider your approach and instead use `samtools view` to pipe your reads to stdout in SAM format, then stream-edit the barcode and pipe it back to samtools for conversion back to a .bam file.

I know this is not what you're asking. I'm pretty sure that the direct answer to your question is, "yes - they are read-only".

~Malcolm

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Charles Plessy
> Sent: Wednesday, November 23, 2011 4:28 AM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
>
> Dear BioPerl developers,
>
> I am trying to process some unaligned paired-end reads with Bio::DB::Sam. For
> each pair, I want to detect a sequence index and a unique molecular identifier in
> the linker, record them as auxiliary flags, and trim the linker from the read.
>
> I collect the pairs through a features iterator, and can access all their data
> through the high-level Bio::DB::Bam::Alignment API.
After modifying them
> (linker trimming and adding flags), I want to write the resulting pairs as a
> new unaligned BAM file.
>
> I apologise if the solution is trivial, but my problem is that I do not manage to
> modify the Bio::DB::Bam::Alignment objects. Typically, attempts such as
> '$pair[0]->qseq("GATACA")' give errors like
> 'Usage: Bio::DB::Bam::Alignment::qseq(b) at
> /usr/lib/perl5/Bio/DB/Bam/AlignWrapper.pm line 80'.
>
> Since I did not find explanations or portions of source code indicating how to
> modify Bio::DB::Bam::Alignment objects, I wonder if they are read-only?
>
> Have a nice day,
>
> --
> Charles Plessy
> Tsurumi, Kanagawa, Japan
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Wed Nov 23 19:26:31 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 23 Nov 2011 19:26:31 +0000
Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
In-Reply-To: <2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org>
References: <20111123102745.GC20168@merveille.plessy.net> <2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org>
Message-ID:

According to the docs for the low-level API of Bio-Samtools, both read and write are allowed:

http://search.cpan.org/perldoc?Bio::DB::Sam#The_low-level_API

Using the low-level API for this purpose isn't documented as well, though (the high-level API is read only AFAICT).

The error message is a standard one generated from the XS bindings when the passed argument isn't mapped correctly. Looking through the Sam.xs file, qseq() is only prototyped as a reader; the only arg is a Bio::DB::Bam::Alignment (e.g. $self). However, it appears there is a function specified for Bio::DB::Bam::Alignment named l_qseq() that might be the setter, whereas qseq() may be the getter (ignore the 'bama_' prefix):

....
int
bama_l_qseq(b,...)
    Bio::DB::Bam::Alignment b
PROTOTYPE: $;$
CODE:
    if (items > 1)
        b->core.l_qseq = SvIV(ST(1));
    RETVAL=b->core.l_qseq;
OUTPUT:
    RETVAL

SV*
bama_qseq(b)
    Bio::DB::Bam::Alignment b
PROTOTYPE: $
PREINIT:
    char* seq;
    int i;
CODE:
    seq = Newxz(seq,b->core.l_qseq+1,char);
    for (i=0;i<b->core.l_qseq;i++) {
        seq[i]=bam_nt16_rev_table[bam1_seqi(bam1_seq(b),i)];
    }
    RETVAL = newSVpv(seq,b->core.l_qseq);
    Safefree(seq);
OUTPUT:
    RETVAL

-chris

On Nov 23, 2011, at 10:02 AM, Cook, Malcolm wrote:

> Charles,
>
> I suggest you reconsider your approach to rather, use `samtools view` to pipe your reads to stdout in sam format, then stream edit the barcode and pipe it back to samtools for conversion back to .bam file.
>
> I know this is not what you're asking. I'm pretty sure that direct answer to your question is, "yes - they are read-only".
>
> ~Malcolm
>
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Charles Plessy
>> Sent: Wednesday, November 23, 2011 4:28 AM
>> To: bioperl-l at bioperl.org
>> Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
>>
>> Dear BioPerl developers,
>>
>> I am trying to process some unaligned paired-end reads with Bio::DB::Sam. For
>> each pair, I want to detect a sequence index and a unique molecular identifier in
>> the linker, record them as auxiliary flags, and trim the linker from the read.
>>
>> I collect the pairs through a features iterator, and can access all their data
>> through the high-level Bio::DB::Bam::Alignment API. After modifying them
>> (linker trimming and adding flags), I want to write the resulting pairs as a
>> new unaligned BAM file.
>>
>> I apologise if the solution is trivial, but my problem is that I do not manage to
>> modify the Bio::DB::Bam::Alignment objects. Typically, attempts such as
>> '$pair[0]->qseq("GATACA")'
give errors like
>> ?Usage: Bio::DB::Bam::Alignment::qseq(b) at
>> /usr/lib/perl5/Bio/DB/Bam/AlignWrapper.pm line 80?.
>>
>> Since I did not find explanations or portsions of source code indicating how to
>> modify Bio::DB::Bam::Alignment objects, I wonder if they are read-only?
>>
>> Have a nice day,
>>
>> --
>> Charles Plessy
>> Tsurumi, Kanagawa, Japan

From lincoln.stein at gmail.com Wed Nov 23 22:02:23 2011
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Thu, 24 Nov 2011 06:02:23 +0800
Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
In-Reply-To: <20111123102745.GC20168@merveille.plessy.net>
References: <20111123102745.GC20168@merveille.plessy.net>
Message-ID:

I apologize that the qseq() method is only allowing read-only access. I will attempt to fix this.

Lincoln

--
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa

From lincoln.stein at gmail.com Wed Nov 23 22:05:41 2011
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Thu, 24 Nov 2011 06:05:41 +0800
Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
In-Reply-To:
References: <20111123102745.GC20168@merveille.plessy.net> <2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org>
Message-ID:

Unfortunately l_qseq read/writes the length of the query sequence, not the sequence itself.

Lincoln

From cjfields at illinois.edu Thu Nov 24 01:07:09 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 24 Nov 2011 01:07:09 +0000
Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
In-Reply-To:
References: <20111123102745.GC20168@merveille.plessy.net> <2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org> ,
Message-ID: <92CA8F24-47CB-42AF-8C20-9C4765A592A5@illinois.edu>

Ah, okay, makes sense. I thought it was oddly named. :)

Chris

From ross at cuhk.edu.hk Sun Nov 27 08:24:43 2011
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Sun, 27 Nov 2011 16:24:43 +0800
Subject: [Bioperl-l] Check the location type for a particular gene in a Genbank file
In-Reply-To:
References:
Message-ID: <000101ccacde$0fc4ba90$2f4e2fb0$@edu.hk>

Hi all,

To write a script to extract sequence generically for all types of BioLocation objects, I'd like to know if there is any way to check what types (e.g. simple or split) are being processed.

Bio::Location::CoordinatePolicyI appears to be doing something similar but it is more like a post checking step. If I parse the genbank file line by line, I can certainly check whether the line contains keywords like "join" but as I'm using something like:

my @features=grep{$_->primary_tag eq $chkTags[0]}
$seqobj->get_SeqFeatures;

foreach (@features) {

$pseudo=$_->has_tag('pseudo')?'pseudo':'functional';

@gene=[];

I'd appreciate if anybody knows a better integration with the well-developed bioperl module.

Thanks a lot.

From Russell.Smithies at agresearch.co.nz Mon Nov 28 00:46:05 2011
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 28 Nov 2011 13:46:05 +1300
Subject: [Bioperl-l] Galaxy tools?
Message-ID: <18DF7D20DFEC044098A1062202F5FFF340186CF1E3@exchsth.agresearch.co.nz>

Possibly the wrong place to ask but has anyone written Galaxy tools using BioPerl? I was thinking of creating blast graphic and format converter tools as I couldn't see any already available in their toolbox.
It looks like I can just write a Python wrapper for my existing BioPerl scripts - although I suspect the "correct" method is to use BioPython methods (but Python annoys me with its lack of semi-colons and required white-space)

--Russell

From p.j.a.cock at googlemail.com Mon Nov 28 01:28:33 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 28 Nov 2011 01:28:33 +0000
Subject: [Bioperl-l] Galaxy tools?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF340186CF1E3@exchsth.agresearch.co.nz>
References: <18DF7D20DFEC044098A1062202F5FFF340186CF1E3@exchsth.agresearch.co.nz>
Message-ID:

Galaxy is agnostic about what language the tools are in, you can use a binary, shell script, Java, Perl, Python etc.
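
Since Galaxy runs a tool through a command line declared in the tool's XML definition, a Perl script that takes explicit input and output paths is usually enough to wrap. What follows is only a minimal sketch of such a script, not Russell's converter: the script name fasta_to_tab.pl, the function fasta_to_tab and the FASTA-to-tabular conversion are assumptions chosen for illustration, and the Galaxy XML side is not shown.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical converter: read FASTA records from the filehandle $in and
# write one "id<TAB>sequence" line per record to the filehandle $out.
# Returns the number of records written.
sub fasta_to_tab {
    my ($in, $out) = @_;
    my ($id, $seq, $count) = (undef, '', 0);
    while (my $line = <$in>) {
        $line =~ s/\r?\n\z//;        # tolerate Unix and DOS line endings
        next if $line =~ /^\s*$/;    # skip blank lines
        if ($line =~ /^>(\S+)/) {
            my $new_id = $1;
            if (defined $id) {       # flush the previous record
                print {$out} "$id\t$seq\n";
                $count++;
            }
            ($id, $seq) = ($new_id, '');
        }
        elsif (defined $id) {
            $line =~ s/\s+//g;       # drop whitespace inside sequence lines
            $seq .= $line;
        }
    }
    if (defined $id) {               # flush the last record
        print {$out} "$id\t$seq\n";
        $count++;
    }
    return $count;
}

# Run as a script only when both paths are given, e.g.
#   perl fasta_to_tab.pl input.fasta output.tab
if (!caller && @ARGV == 2) {
    my ($in_path, $out_path) = @ARGV;
    open my $in,  '<', $in_path  or die "Cannot read $in_path: $!";
    open my $out, '>', $out_path or die "Cannot write $out_path: $!";
    fasta_to_tab($in, $out);
    close $out or die "Cannot close $out_path: $!";
}
```

Keeping the conversion in a function that works on filehandles means the same code can be exercised without Galaxy, and the body could be swapped for Bio::SeqIO calls in an existing BioPerl script.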
Peter

From florent.angly at gmail.com Mon Nov 28 02:09:45 2011
From: florent.angly at gmail.com (Florent Angly)
Date: Mon, 28 Nov 2011 12:09:45 +1000
Subject: [Bioperl-l] Galaxy tools?
In-Reply-To:
References: <18DF7D20DFEC044098A1062202F5FFF340186CF1E3@exchsth.agresearch.co.nz>
Message-ID: <4ED2ED69.10601@gmail.com>

Hi Russell,

As Peter said, the tools to be wrapped do not need to be written in Python. I have built a few wrappers for Galaxy, including one for the read simulator Grinder (http://sourceforge.net/projects/biogrinder/), which uses Bioperl and is available in the Galaxy Toolshed (http://sourceforge.net/projects/biogrinder/). It is not very hard to write a wrapper for trivial programs, but it becomes more complicated once you start having optional arguments or multiple output files.

Grinder uses Getopt::Euclid (http://search.cpan.org/dist/Getopt-Euclid/) to parse command-line arguments. I have been thinking about leveraging the information that Getopt::Euclid stores about command-line arguments to automate most of the Galaxy wrapper generation, but I have not gotten to it yet.

Florent

From florent.angly at gmail.com Mon Nov 28 04:35:31 2011
From: florent.angly at gmail.com (Florent Angly)
Date: Mon, 28 Nov 2011 14:35:31 +1000
Subject: [Bioperl-l] Interest in Bio::Community modules
Message-ID: <4ED30F93.4000407@gmail.com>

Hi all,

I have been thinking about starting a set of Perl modules that would be useful for (microbial) ecologists to represent communities of organisms. At the moment, there does not seem to be anything like this in Bioperl. I am happy to make these modules available under the Bioperl umbrella using the Bio::Community::* namespace.

I envision the following modules:
* Bio::Community::Member module representing members of a community.
* Bio::Community::IO modules to read/write files that describe community composition (a.k.a. OTU table, or site by species table) as used by programs like QIIME, Pyrotagger, GAAS, ...
* Bio::Community::Tools modules to help manipulate communities, e.g. to take some members at random, normalize the community to a given number of individuals, or do rarefaction curves.

The idea is to implement these modules in Moose to teach myself Moose. The members of a community could be a sequence (Bio::SeqI), a species (Bio::S), an arbitrary string or even other things. I am not quite sure if Bioperl provides facilities to attach some arbitrary information to an object.

Any interest? Ideas? Comments?
Thanks,

Florent

From cjfields at illinois.edu Mon Nov 28 19:42:12 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 28 Nov 2011 19:42:12 +0000
Subject: [Bioperl-l] Check the location type for a particular gene in a Genbank file
In-Reply-To: <000101ccacde$0fc4ba90$2f4e2fb0$@edu.hk>
References: <000101ccacde$0fc4ba90$2f4e2fb0$@edu.hk>
Message-ID: <49363DC1-110A-49A8-B8D7-75AA624A535C@illinois.edu>

Ross,

The standard way is to check whether the location object is a SplitLocationI or not, see the following for an example:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Location_Objects

chris
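
The SplitLocationI check Chris describes in his reply to Ross can be sketched roughly as follows. This is an illustration rather than the code from the HOWTO: the helper describe_location and the two in-memory features are made up for the example, and in Ross's script the features would come from $seqobj->get_SeqFeatures instead.

```perl
use strict;
use warnings;
use Bio::Location::Simple;
use Bio::Location::Split;
use Bio::SeqFeature::Generic;

# Hypothetical helper: report whether a feature's location is split
# (e.g. a GenBank join(...)) or simple, listing the coordinates.
sub describe_location {
    my ($feature) = @_;
    my $loc = $feature->location;
    if ($loc->isa('Bio::Location::SplitLocationI')) {
        # sub_Location returns the individual pieces of a split location
        my @parts = map { $_->start . '..' . $_->end } $loc->sub_Location;
        return 'split: ' . join(',', @parts);
    }
    return 'simple: ' . $loc->start . '..' . $loc->end;
}

# Made-up features standing in for those parsed from a GenBank file.
my $simple = Bio::SeqFeature::Generic->new(-primary_tag => 'gene');
$simple->location(
    Bio::Location::Simple->new(-start => 10, -end => 50, -strand => 1));

my $join = Bio::Location::Split->new;
$join->add_sub_Location(
    Bio::Location::Simple->new(-start => 100, -end => 200, -strand => 1));
$join->add_sub_Location(
    Bio::Location::Simple->new(-start => 300, -end => 400, -strand => 1));
my $spliced = Bio::SeqFeature::Generic->new(-primary_tag => 'gene');
$spliced->location($join);

print describe_location($_), "\n" for ($simple, $spliced);
```

If the goal is only to extract the sequence, calling spliced_seq on a feature attached to its sequence may make the explicit check unnecessary, since it is meant to handle both simple and split locations.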
From cjfields at illinois.edu Mon Nov 28 19:47:10 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 28 Nov 2011 19:47:10 +0000
Subject: [Bioperl-l] Interest in Bio::Community modules
In-Reply-To: <4ED30F93.4000407@gmail.com>
References: <4ED30F93.4000407@gmail.com>
Message-ID: <3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu>

I think the idea is sound; it would be nice to have. Jason is working a bit in this area, maybe he has some additional thoughts? Would there be some redundancy with any current code (Bio::Tree, Bio::Taxon, etc)?

I do think it should be developed on its own, per our recent discussions re: slimming down core.

Re: using Moose and BioPerl, from experience it's a little tricky integrating the two as BioPerl uses perl's simple OO system while Moose allows much more (though it has its own issues, memory and speed being the top two). Specifying inheritance from a BioPerl module is a bit trickier; you'll have to wrap the module via MooseX::NonMoose. Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine.

chris

From l.m.timmermans at students.uu.nl Mon Nov 28 20:25:13 2011
From: l.m.timmermans at students.uu.nl (Leon Timmermans)
Date: Mon, 28 Nov 2011 21:25:13 +0100
Subject: [Bioperl-l] Interest in Bio::Community modules
In-Reply-To:
References: <4ED30F93.4000407@gmail.com>
Message-ID:

And now to the list too,

On Mon, Nov 28, 2011 at 5:35 AM, Florent Angly wrote:

> The idea is to implement these modules in Moose to teach myself Moose. The
> members of a community could be a sequence (Bio::SeqI), a species (Bio::S),
> an arbitrary string or even other things. I am not quite sure if Bioperl
> provide facilities to attach some arbitrary information to an object.
>
> Any interest? Ideas? Comments?
>

Sounds like a good use-case for roles, maybe even parametric roles.
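
To make the roles suggestion concrete, here is a minimal Moose sketch under stated assumptions: the package names (My::Community::*), the count attribute and the describe method are all hypothetical and are not part of any existing Bio::Community code. It only shows how one role can be shared by otherwise unrelated member classes.

```perl
use strict;
use warnings;

# Hypothetical role: behaviour shared by anything that can be a community member.
{
    package My::Community::Role::Member;
    use Moose::Role;

    requires 'id';    # each consuming class decides how a member is identified

    has count => (is => 'rw', isa => 'Num', default => 1);

    sub describe {
        my ($self) = @_;
        return sprintf '%s (x%s)', $self->id, $self->count;
    }
}

# A member identified by an arbitrary string.
{
    package My::Community::Member::Named;
    use Moose;

    has id => (is => 'ro', isa => 'Str', required => 1);
    with 'My::Community::Role::Member';    # composed after 'id' is declared

    __PACKAGE__->meta->make_immutable;
}

# A member identified by its taxonomic name.
{
    package My::Community::Member::Taxon;
    use Moose;

    has genus   => (is => 'ro', isa => 'Str', required => 1);
    has species => (is => 'ro', isa => 'Str', required => 1);

    sub id {
        my ($self) = @_;
        return join ' ', $self->genus, $self->species;
    }
    with 'My::Community::Role::Member';

    __PACKAGE__->meta->make_immutable;
}

package main;

my $named = My::Community::Member::Named->new(id => 'OTU_42', count => 3);
my $taxon = My::Community::Member::Taxon->new(
    genus => 'Escherichia', species => 'coli');

print $_->describe, "\n" for ($named, $taxon);
```

A member wrapping a Bio::SeqI or Bio::Taxon object could consume the same role through a has-a attribute. Parametric roles would need the separate MooseX::Role::Parameterized distribution and are not shown here.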
Leon

From florent.angly at gmail.com Tue Nov 29 00:59:24 2011
From: florent.angly at gmail.com (Florent Angly)
Date: Tue, 29 Nov 2011 10:59:24 +1000
Subject: [Bioperl-l] Interest in Bio::Community modules
In-Reply-To: <3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu>
References: <4ED30F93.4000407@gmail.com> <3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu>
Message-ID: <4ED42E6C.6020501@gmail.com>

Hi Chris,

On 29/11/11 05:47, Fields, Christopher J wrote:
> I think the idea is sound, it would be nice to have. Jason is working a bit in this area, maybe he has some additional thoughts? Would there be some redundancy with any current code (Bio::Tree, Bio::Taxon, etc)?

None of these features would be duplicated. Rather, they would be used as attributes of the Bio::Community::* objects. For example, a member of a community could have a Bio::SeqI attached to it as well as a Bio::Taxon, etc...

> I do think it should be developed on it's own, per our recent discussions re: slimming down core.

Yes, the features are so different that it makes sense to have the Bio::Community::* modules as a separate BioPerl distribution (like the Bio-FeatureIO BioPerl distribution).

> Re: using Moose and BioPerl, from experience it's a little tricky integrating the two as BioPerl uses perl's simple OO system while Moose allows much more (though it has it's own issues, memory and speed being the top two). Specifying inheritance from a BioPerl module is a bit trickier, you'll have to wrap the module via MooseX::NonMoose. Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine.

Beyond Bio::RootI, I cannot imagine that any of the Bio::Community::* modules would need to inherit from any other BioPerl modules. Considering this and the performance aspects of Moose, do you think that using Moose is a wise design decision?
Best,

Florent

From cjfields at illinois.edu Tue Nov 29 05:32:50 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 29 Nov 2011 05:32:50 +0000
Subject: [Bioperl-l] Interest in Bio::Community modules
In-Reply-To:
References: <4ED30F93.4000407@gmail.com>
Message-ID:

On Nov 28, 2011, at 2:25 PM, Leon Timmermans wrote:

> Sounds like a good use-case for roles, maybe even parametric roles.
>
> Leon

Yep, agree totally. It would be a good replacement in most cases for the BioI interfaces.

(see also, the Biome project, which I'm slooooooowly working on again, on github)

chris

From pmr at ebi.ac.uk Tue Nov 29 13:39:52 2011
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 29 Nov 2011 13:39:52 +0000
Subject: [Bioperl-l] BinarySearch.pm
Message-ID: <4ED4E0A8.30102@ebi.ac.uk>

In trying to use bioflat_index.pl index files in EMBOSS, I ran into some problems.

Both appear to be in the Bio/Flat/BinarySearch.pm source file.

EMBL ID lines are failing to drop the ';' from the ID. Updating the regular expression to make sure the ';' is not picked up seems to work:

    if ($format =~ /embl/i) {
        return ('ID',
                "^ID (\\S+[^; ])",
                "^ID (\\S+[^; ])",
                {
                    ACC => q/^AC (\S+);/,
                    VERSION => q/^SV\s+(\S+)/
                });
    }

The ACC secondary index has every record duplicated.
This line is duplicated in the write_secondary_indices source code. Is that intentional?

    print $fh sprintf("%-${length}s",$record);

regards,

Peter Rice
EMBOSS Team

From uni.anastasia at gmail.com Sat Nov 26 17:32:48 2011
From: uni.anastasia at gmail.com (anastsia shapiro)
Date: Sat, 26 Nov 2011 19:32:48 +0200
Subject: [Bioperl-l] Problem with parsing blast results
Message-ID:

Hello,

I'm running a script that should parse BLAST results using Bio::SearchIO.

Sometimes the script works fine, but sometimes it stops and I receive the following error.
------------- EXCEPTION ------------- MSG: no data for midline Query ------------------------------------------------------------ STACK Bio::SearchIO::blast::next_result C:/Perl64/site/lib/Bio\SearchIO\ blast.pm:1805 STACK toplevel D:\D\uni-anastasia\project\scripts\ParsingBlastResults\ParsingBlastResults.pl:36 ------------------------------------- While the blast results files were received as a result of running the following blast command: blastn -task blastn -db xxxxxxxxx.txt -evalue 1e-10 -perc_identity 80 -dust no -num_descriptions 0 -query xxxxx.txt -out results.txt -strand plus I am using bioperl 1.6.1. I read all the forums , and it seems to be a bug, but on version 1.5 it was fixed. I will really appreciate your help, since I am trying to understand the problem for over a month. Regards, Anastasia From bunk at novozymes.com Tue Nov 29 16:46:54 2011 From: bunk at novozymes.com (Jacob Bunk Nielsen) Date: Tue, 29 Nov 2011 17:46:54 +0100 Subject: [Bioperl-l] Problem with parsing blast results In-Reply-To: (anastsia shapiro's message of "Sat, 26 Nov 2011 18:32:48 +0100") References: Message-ID: <77sjl698qp.fsf@spurv.nzcorp.net> Hi anastsia shapiro writes: > I'm running a script that should parse a blast results, using searchIO. > > Sometimes the script work fines, however sometimes it stops, and I receive > the following error. 
>
> ------------- EXCEPTION -------------
> MSG: no data for midline Query
> ------------------------------------------------------------
> STACK Bio::SearchIO::blast::next_result C:/Perl64/site/lib/Bio\SearchIO\
> blast.pm:1805
> STACK toplevel
> D:\D\uni-anastasia\project\scripts\ParsingBlastResults\ParsingBlastResults.pl:36
> -------------------------------------
> While the blast results files were received as a result of running the
> following blast command:
> blastn -task blastn -db xxxxxxxxx.txt -evalue 1e-10 -perc_identity 80 -dust
> no -num_descriptions 0 -query xxxxx.txt -out results.txt -strand plus

I don't know why this exact problem arises, but I think you should consider using an output format that is easier to parse by machine, like the XML format.

With the BLAST+ blastn command you are running, XML output is requested with -outfmt 5 (-m 7 is the equivalent option for the legacy blastall program). When reading the result with Bioperl you must specify -format => 'blastxml' for Bio::SearchIO. That way I think you are likely to see a lot fewer problems regarding the parsing of blast output.

If the above doesn't solve the problem, you had better show us the code that fails.

Best regards

Jacob

From cjfields at illinois.edu Tue Nov 29 19:11:11 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 29 Nov 2011 19:11:11 +0000
Subject: [Bioperl-l] Interest in Bio::Community modules
In-Reply-To: <4ED42E6C.6020501@gmail.com>
References: <4ED30F93.4000407@gmail.com> <3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu> <4ED42E6C.6020501@gmail.com>
Message-ID: <4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu>

On Nov 28, 2011, at 6:59 PM, Florent Angly wrote:

> Hi Chris,
>
> On 29/11/11 05:47, Fields, Christopher J wrote:
> ...
>> Re: using Moose and BioPerl, from experience it's a little tricky integrating the two as BioPerl uses perl's simple OO system while Moose allows much more (though it has it's own issues, memory and speed being the top two).
>> Specifying inheritance from a BioPerl module is a bit trickier, you'll have to wrap the module via MooseX::NonMoose. Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine.
> Beyond Bio::RootI, I cannot imagine that any of the Bio::Community::* modules would need to inherit from any other BioPerl modules. Considering this and the performance aspects of Moose, do you think that using Moose is a wise design decision?

Moose can actually help in some circumstances; flattening the inheritance hierarchy by using roles seems to help. And it never hurts to learn something new like Moose and other modern perl niceties.

> Best,
>
> Florent

chris

From cjfields at illinois.edu Tue Nov 29 22:30:58 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 29 Nov 2011 22:30:58 +0000
Subject: [Bioperl-l] BinarySearch.pm
In-Reply-To: <4ED4E0A8.30102@ebi.ac.uk>
References: <4ED4E0A8.30102@ebi.ac.uk>
Message-ID: <6F926A89-3B07-4924-8CC4-68A027E7FFCE@illinois.edu>

Peter,

Can you send a test file that is failing? I added a few tests using an example file with a ';' in the ID line, but everything is passing with our other EMBL example files. I'm also looking into adding a method to return secondary IDs for a specific type ('ACC' for instance) so we can test the repeat issue for accessions.

Both changes pass tests as is, though, so I have committed them in the meantime.

chris

On Nov 29, 2011, at 7:39 AM, Peter Rice wrote:

> In trying to use bioflat_index.pl index files in EMBOSS, I ran into some problems.
>
> Both appear to be in the Bio/Flat/BinarySearch.pm source file.
>
> EMBL ID lines are failing to drop the ';' from the ID.
> Updating the regular expression to make sure the ';' is not picked up seems to work:
>
> if ($format =~ /embl/i) {
>     return ('ID',
>             "^ID (\\S+[^; ])",
>             "^ID (\\S+[^; ])",
>             {
>                 ACC     => q/^AC (\S+);/,
>                 VERSION => q/^SV\s+(\S+)/
>             });
> }
>
> The ACC secondary index has every record duplicated.
> This line is duplicated in the write_secondary_indices source code. Is that intentional?
>
> print $fh sprintf("%-${length}s",$record);
>
> regards,
>
> Peter Rice
> EMBOSS Team

From florent.angly at gmail.com Wed Nov 30 01:18:41 2011
From: florent.angly at gmail.com (Florent Angly)
Date: Wed, 30 Nov 2011 11:18:41 +1000
Subject: [Bioperl-l] Interest in Bio::Community modules
In-Reply-To: <4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu>
References: <4ED30F93.4000407@gmail.com> <3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu> <4ED42E6C.6020501@gmail.com> <4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu>
Message-ID: <4ED58471.3030106@gmail.com>

Chris,
Yes, it is exciting to learn something new.
I have developed a bit of code in the last few days in my local git repository. Do you think you could create a repository for Bio-Community on the BioPerl GitHub space, or is it too soon?
Cheers,
Florent

On 30/11/11 05:11, Fields, Christopher J wrote:
> On Nov 28, 2011, at 6:59 PM, Florent Angly wrote:
>
>> Hi Chris,
>>
>> On 29/11/11 05:47, Fields, Christopher J wrote:
>> ...
>>> Re: using Moose and BioPerl, from experience it's a little tricky integrating the two as BioPerl uses perl's simple OO system while Moose allows much more (though it has its own issues, memory and speed being the top two). Specifying inheritance from a BioPerl module is a bit trickier, you'll have to wrap the module via MooseX::NonMoose.
>>> Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine.
>> Beyond Bio::RootI, I cannot imagine that any of the Bio::Community::* modules would need to inherit from any other BioPerl modules. Considering this and the performance aspects of Moose, do you think that using Moose is a wise design decision?
> Moose can actually help in some circumstances; flattening the inheritance hierarchy by using roles seems to help. And it never hurts to learn something new like Moose and other modern perl niceties.
>
>> Best,
>>
>> Florent
>
> chris

From cjfields at illinois.edu Wed Nov 30 02:34:00 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 30 Nov 2011 02:34:00 +0000
Subject: [Bioperl-l] Interest in Bio::Community modules
In-Reply-To: <4ED58471.3030106@gmail.com>
References: <4ED30F93.4000407@gmail.com> <3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu> <4ED42E6C.6020501@gmail.com> <4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu> <4ED58471.3030106@gmail.com>
Message-ID:

On Nov 29, 2011, at 7:18 PM, Florent Angly wrote:

> Chris,
> Yes, it is exciting to learn something new.
> I have developed a bit of code in the last few days in my local git repository. Do you think you could create a repository for Bio-Community on the Bioperl Github space or is it too soon?

It's up to you. I set up the barebones repo and added you on to push/pull/admin, you should be able to push to it whenever you are ready:

https://github.com/bioperl/Bio-Community

chris

> Cheers,
> Florent

From florent.angly at gmail.com Wed Nov 30 02:50:04 2011
From: florent.angly at gmail.com (Florent Angly)
Date: Wed, 30 Nov 2011 12:50:04 +1000
Subject: [Bioperl-l] Interest in Bio::Community modules
In-Reply-To:
References: <4ED30F93.4000407@gmail.com> <3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu> <4ED42E6C.6020501@gmail.com> <4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu> <4ED58471.3030106@gmail.com>
Message-ID: <4ED599DC.6090808@gmail.com>

Fantastic! Thank you very much Chris,

Florent

On 30/11/11 12:34, Fields, Christopher J wrote:
> On Nov 29, 2011, at 7:18 PM, Florent Angly wrote:
>
>> Chris,
>> Yes, it is exciting to learn something new.

From lsbrath at gmail.com Wed Nov 30 05:25:32 2011
From: lsbrath at gmail.com (Mgavi Brathwaite)
Date: Wed, 30 Nov 2011 00:25:32 -0500
Subject: [Bioperl-l] Exception MSG
Message-ID:

Hello,

Brushing up on my BioPerl and I can't figure out this MSG:

------------- EXCEPTION -------------
MSG: cannot open >/Users/mydata/Desktop/gi|255572219|ref|XP_002527049|.out
STACK Bio::Tools::Run::RemoteBlast::save_output /Library/Perl/5.10.0/Bio/Tools/Run/RemoteBlast.pm:678
STACK toplevel /Users/mydata/Documents/workspace/BI7643/rb_ex.pl:40
-------------------------------------

Here is the code:

#!/usr/bin/perl -w
use strict;
use Bio::Tools::Run::RemoteBlast;

#=cut

my $prog  = 'blastp';
my $db    = 'swissprot';
my $e_val = '1e-10';

my @params = ('-prog'      => $prog,
              '-data'      => $db,
              'expect'     => $e_val,
              'readmethod' => 'SearchIO' );

my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

#human database
$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens [ORGN]';

my $v = 1; # this is just to turn on and off the messages

# Construct the sequence object
my $seq_in = Bio::SeqIO->new(-file   => "/Users/mydata/Desktop/rb_ex.fa",
                             -format => "fasta");

while (my $input = $seq_in->next_seq()){
    my $r = $factory->submit_blast($input);
    print STDERR "waiting..." if ($v > 0);
    while (my @rids = $factory->each_rid()){
        foreach my $rid (@rids){
            my $rc = $factory->retrieve_blast($rid);
            if( !ref($rc) ) {
                if($rc < 0){
                    $factory->remove_rid($rid);
                }
                print STDERR "." if ($v > 0);
                sleep 5;
            } else {
                my $result = $rc->next_result();
                #save output
                my $filename = ">/Users/mydata/Desktop/".$result->query_name().".out";#error
                $factory->save_output($filename);
                $factory->remove_rid($rid);
                print "\nQuery Name: ", $result->query_name(), "\n";
                while ( my $hit = $result->next_hit ) {
                    next unless ( $v > 0);
                    print "\thit name is ", $hit->name, "\n";
                    while( my $hsp = $hit->next_hsp ) {
                        print "\t\tscore is ", $hsp->score, "\n";
                    }
                }
            }
        }
    }
}

Thanks for the help!

From jason.stajich at gmail.com Wed Nov 30 06:05:41 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Tue, 29 Nov 2011 22:05:41 -0800
Subject: [Bioperl-l] Exception MSG
In-Reply-To:
References:
Message-ID: <50C532E3-5212-49CE-A3CA-9FE91BDC00CF@gmail.com>

I don't think you need to give it the '>' when you specify the filename for the output. That is done when the filehandle itself is opened.

From ss2489 at cornell.edu Wed Nov 30 14:32:47 2011
From: ss2489 at cornell.edu (Surya Saha)
Date: Wed, 30 Nov 2011 09:32:47 -0500
Subject: [Bioperl-l] Exception MSG
In-Reply-To: <50C532E3-5212-49CE-A3CA-9FE91BDC00CF@gmail.com>
References: <50C532E3-5212-49CE-A3CA-9FE91BDC00CF@gmail.com>
Message-ID:

If that does not fix it, try using one of the unique identifiers (the gi number?) as the file name instead of the full query name. The pipe (|) characters might cause problems.

From lsbrath at gmail.com Wed Nov 30 14:34:52 2011
From: lsbrath at gmail.com (Mgavi Brathwaite)
Date: Wed, 30 Nov 2011 09:34:52 -0500
Subject: [Bioperl-l] Exception MSG
In-Reply-To:
References: <50C532E3-5212-49CE-A3CA-9FE91BDC00CF@gmail.com>
Message-ID:

Surya,

As Jason suggested, I removed the '>' and it worked. Thanks for your response.

Lom

From ericdemuinck at gmail.com Wed Nov 30 23:36:36 2011
From: ericdemuinck at gmail.com (Ericde)
Date: Wed, 30 Nov 2011 15:36:36 -0800 (PST)
Subject: [Bioperl-l] retrieving blast multiple alignment in fasta form
Message-ID: <32886592.post@talk.nabble.com>

:-/ I am a newbie and I am trying to retrieve a BLAST multiple alignment in FASTA form. The BLAST output (-m 2) gives several alignments (which is good), and parsing the XML file seems to list all of these alignments (which is also good). The problem is that the FASTA alignment file only includes one of the hits, and the alignment does not include all the sequences (including the query sequence). I would like to generate a FASTA file that includes all the alignments included in the -m 2 output (plus the query sequence if possible).
I have cobbled together a script (below). I will attach the sample BLAST XML file and the -m 2 file as well. Any insight is appreciated :/

#module load perl
#give the name of the blast xml file to parse in the line where it says 'file =>'
use Bio::SearchIO;
#Use -m 7 to generate xml file from blastall
my $in = new Bio::SearchIO(-format => 'blastxml',
                           -file   => 'BLASToutxml');
while( my $result = $in->next_result ) {
  ## $result is a Bio::Search::Result::ResultI compliant object
  while( my $hit = $result->next_hit ) {
    ## $hit is a Bio::Search::Hit::HitI compliant object
    while( my $hsp = $hit->next_hsp ) {
      ## $hsp is a Bio::Search::HSP::HSPI compliant object
      #ENTER desired sequence length
      if( $hsp->length('total') > 50 ) {
        #ENTER desired percent identity
        if ( $hsp->percent_identity >= 75 ) {
          print "Query=",   $result->query_name,
            " Hit=",        $hit->name,
            " Length=",     $hsp->length('total'),
            " Percent_id=", $hsp->percent_identity, "\n";
          #Print alignment to file
          #$aln will be a Bio::SimpleAlign object
          use Bio::AlignIO;
          my $aln = $hsp->get_aln;
          #changed msf to fasta and hsp.msf to hsp.fas, output is now a fasta file
          my $alnIO = Bio::AlignIO->new(-format =>"fasta", -file => ">hsp.fas");
          $alnIO->write_aln($aln);
        }
      }
    }
  }
}

http://old.nabble.com/file/p32886592/BLASToutxml BLASToutxml
http://old.nabble.com/file/p32886592/hsp.fas hsp.fas

--
View this message in context: http://old.nabble.com/retrieving-blast-multiple-alignment-in-fasta-form-tp32886592p32886592.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
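
One likely reason only a single hit ends up in hsp.fas: the script above creates the Bio::AlignIO writer inside the innermost loop with -file => ">hsp.fas", so the file is reopened and truncated for every HSP that passes the filters, and only the last alignment survives. The following is a minimal sketch, not a tested fix, that opens the writer once and passes it in. The helper name write_filtered_alignments and the command-line handling are made up for illustration; the thresholds 50 and 75 are the ones from the script above. Each HSP still yields a pairwise query/hit alignment, so the output would be a series of pairwise alignments, not a single multiple alignment like the -m 2 view.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): send every HSP alignment that
# passes the filters through ONE writer object.
#   $search_in : a Bio::SearchIO-style parser (next_result/next_hit/next_hsp)
#   $aln_out   : a Bio::AlignIO-style writer (write_aln)
# Returns the number of alignments written.
sub write_filtered_alignments {
    my ($search_in, $aln_out, $min_length, $min_identity) = @_;
    my $written = 0;
    while ( my $result = $search_in->next_result ) {
        while ( my $hit = $result->next_hit ) {
            while ( my $hsp = $hit->next_hsp ) {
                next unless $hsp->length('total') > $min_length;
                next unless $hsp->percent_identity >= $min_identity;
                # get_aln returns the pairwise (query + hit) alignment
                $aln_out->write_aln( $hsp->get_aln );
                $written++;
            }
        }
    }
    return $written;
}

# Only touch BioPerl when a BLAST XML file is named on the command line
# (assumed usage: perl script.pl BLASToutxml).
if (@ARGV) {
    require Bio::SearchIO;
    require Bio::AlignIO;
    my $in = Bio::SearchIO->new( -format => 'blastxml', -file => $ARGV[0] );
    # Opened once, before the loops, so earlier alignments are not overwritten.
    my $out = Bio::AlignIO->new( -format => 'fasta', -file => '>hsp.fas' );
    my $n = write_filtered_alignments( $in, $out, 50, 75 );
    print "Wrote $n alignments to hsp.fas\n";
}
```

Passing the parser and the writer in as arguments keeps the loop independent of where the output goes, so the same helper could write to a different format or file without changing the filtering.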