From hrh at fmi.ch  Tue Nov  1 06:18:54 2011
From: hrh at fmi.ch (Hotz, Hans-Rudolf)
Date: Tue, 1 Nov 2011 11:18:54 +0100
Subject: [Bioperl-l] best way to edit sequence features
In-Reply-To: <CAPOrs_2uZ6TPghqAXwVeXwSeHBm+iomTXMGqswR38V_L=SQWyw@mail.gmail.com>
Message-ID: <CAD5861E.14042%hrh@fmi.ch>

Hi Carnë

Please allow me to make a few comments:

I very much like your idea of writing a free tool to edit and draw
sequences. We (ie people working in core Bioinformatics facilities) all
suffer from having to deal with files originally created with commercial
packages. And on top of all the pain, those commercial packages are very
expensive and they don't deliver what they promise to do.


Just double checking: Have you looked at the free tools which are available?

I am aware of the following ones (as far as I know, they are all GUI based
and don't have a command line API):

Serial Cloner     http://serialbasics.free.fr/Serial_Cloner.html
GENtle            http://gentle.magnusmanske.de/
GeneCoder         http://www.algosome.com/gene-coder/gene-coder.html
pDRAW32           http://www.acaclone.com/
Genome Workbench  http://www.ncbi.nlm.nih.gov/projects/gbench/
Ape               http://www.biology.utah.edu/jorgensen/wayned/ape/
UGene             http://ugene.unipro.ru/

maybe others on the list know of even better free tools?

Also, have you looked at the emboss tool "cirdna" ?


WRT file formats: I strongly suggest sticking to embl and genbank format as
input and (text) output format. The features are not indexed, but you can
create your own index when you store the sequences in your system. Internally,
you probably wanna keep the data in a 'simpler' format than embl or genbank,
anyway.
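
A minimal sketch of what "create your own index" could look like for a
GenBank feature table, assuming features are simply numbered per key in
file order. The helper name index_features and the "<key>_<n>" ID scheme
are made up for illustration; multi-line locations and EMBL "FT" lines
are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: give every feature in a GenBank feature table an ID
# of the form "<key>_<n>" (this naming scheme is an assumption).
sub index_features {
    my (@lines) = @_;
    my (%count, @features, $in_table);
    for my $line (@lines) {
        if ($line =~ /^FEATURES/) { $in_table = 1; next }
        next unless $in_table;
        # the feature table ends at the next top-level section
        last if $line =~ m{^(?:ORIGIN|CONTIG|BASE COUNT|//)};
        # feature lines: 5 spaces, the key, then the location;
        # qualifier lines are indented further and do not match
        if ($line =~ /^ {5}(\S+)\s+(\S+)/) {
            my ($key, $location) = ($1, $2);
            my $id = $key . '_' . ++$count{$key};
            push @features, { id => $id, key => $key, location => $location };
        }
    }
    return @features;
}

# Toy record, invented for this example
my @genbank = split /\n/, <<'END';
LOCUS       pTOY          60 bp    DNA     circular SYN 01-NOV-2011
FEATURES             Location/Qualifiers
     promoter        1..10
                     /note="toy promoter"
     CDS             11..40
                     /gene="geneA"
     CDS             complement(41..58)
                     /gene="geneB"
ORIGIN
//
END

printf "%-12s %-10s %s\n", @{$_}{qw(id key location)}
    for index_features(@genbank);
```

With IDs like these, a command line tool could address one feature
unambiguously (the second CDS in the file, say), as long as the IDs are
regenerated or stored whenever the file changes.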

Alternatively, have you looked at gff/gtf as a way of getting features?
see: 

http://www.sequenceontology.org/gff3.shtml
http://mblab.wustl.edu/GTF22.html


I am looking forward to any progress you make

Regards, Hans


Hans-Rudolf Hotz, PhD
Bioinformatics Support

Friedrich Miescher Institute for Biomedical Research
Maulbeerstrasse 66
4058 Basel/Switzerland


On 10/31/11 7:05 PM, "Carnë Draug" <carandraug+dev at gmail.com> wrote:

> Hi
> 
> I've been planning on writing a free (as in freedom) tool to edit
> sequences and make plasmid maps. The idea is to build the command line
> tool first and maybe later work on a GUI for it.
> 
> The problem I foresee at the moment while designing it, is how to
> change a feature of the sequence. I'm not familiar with all sequence
> formats (only fasta, ensembl and genbank) but I can't see how to
> specify from the command line what feature to edit since I can't see
> any unique identifiers for them. Is there a file format that makes
> this easier? Any tips would be most appreciated.
> 
> Thanks in advance,
> Carnë Draug
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Tue Nov  1 09:40:30 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 1 Nov 2011 13:40:30 +0000
Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual
In-Reply-To: <E824C753-0688-4350-96B8-638BA1B37E73@neuro.utah.edu>
References: <E824C753-0688-4350-96B8-638BA1B37E73@neuro.utah.edu>
Message-ID: <26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu>

On Oct 24, 2011, at 9:58 AM, Sofia Robb wrote:

> Hi,
> 
> I am having problems running Bio::Index::Fastq.  I get the following error when a quality line begins with '@'.
> 
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: No description line parsed
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:368
> STACK: Bio::SeqIO::fastq::next_dataset /usr/share/perl5/Bio/SeqIO/fastq.pm:71
> STACK: Bio::SeqIO::fastq::next_seq /usr/share/perl5/Bio/SeqIO/fastq.pm:29
> STACK: Bio::Index::AbstractSeq::fetch /usr/share/perl5/Bio/Index/AbstractSeq.pm:147
> STACK: Bio::Index::AbstractSeq::get_Seq_by_id /usr/share/perl5/Bio/Index/AbstractSeq.pm:198
> STACK: /home_stajichlab/robb/bin/clean_pairs_indexed.pl:68
> 
> 
> Here is an example of a fastq record that is causing this error. The last line, which starts with an '@', is actually the qual line.
> 
> @5:105:15806:16092:Y
> GTGGCGCGGAACAGAGGAGGAATGTTCAGGAGAGGGGGCATGTGTTGTTACCGAGTACTTGGAAACGACG
> +
> @9;A565:=8B?<E<DEEBEE<E3BB?3??BCCF2<@@=BGGBDB60:64594.81?<B??;3?8-984?
> 
> 
> 
> I see that Chris has partially addressed this on the mailing list:
> http://bioperl.org/pipermail/bioperl-l/2011-January/034481.html
> 
> However as he pointed out at the time, it appears this may be a fairly large problem.

The indexer is being refactored to address this problem; the Bio::SeqIO parser actually does parse this, but the (very simple) indexer does not.  I can try to push this to the forefront this week; the fix shouldn't be too hard to implement.  In essence it would simply use a few SeqIO methods I built in to parse out each bit of data in chunks; it would just need to track the start and length of each chunk while the parser is running.

> My fastq seq and qual lines are always only one line each, so I think that adding a line count and only checking for '@' on lines where $line_count % 4 == 0 would work, since the header lines are always the first of 4 lines (0, 4, 8, etc.).

That doesn't work for all cases, however (some FASTQ wraps the seq and qual, like FASTA). Peter and I have discussed this elsewhere; a possible solution is to add in an optimized parser that takes this assumption into account. 
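
To make the wrapped case concrete, here is a rough sketch (not the
Bio::Index::Fastq or Bio::SeqIO code) of one way an indexer could cope
with both wrapped records and quality lines that start with '@': read
sequence lines up to the '+' line, then read quality lines until as many
characters as the sequence have been seen. The helper name index_fastq
and the toy records are invented for this example; only core Perl is
used.

```perl
use strict;
use warnings;

# Hypothetical helper: scan a FASTQ filehandle and return
# { id => [ byte offset of the '@' header, record length in bytes ] }.
sub index_fastq {
    my ($fh) = @_;
    my %index;
    while (1) {
        my $start  = tell($fh);
        my $header = <$fh>;
        last unless defined $header;
        next if $header =~ /^\s*$/;    # tolerate blank lines between records
        $header =~ /^@(\S+)/ or die "No header at byte $start\n";
        my $id = $1;

        # Sequence lines run until the '+' separator line.
        my $seq_len = 0;
        while (defined(my $line = <$fh>)) {
            last if $line =~ /^\+/;
            $line =~ s/\s+$//;
            $seq_len += length($line);
        }

        # Quality lines run until they hold as many characters as the
        # sequence, so a quality line starting with '@' is never mistaken
        # for the next header.
        my $qual_len = 0;
        while ($qual_len < $seq_len and defined(my $line = <$fh>)) {
            $line =~ s/\s+$//;
            $qual_len += length($line);
        }
        die "Truncated or inconsistent record $id\n" if $qual_len != $seq_len;

        $index{$id} = [ $start, tell($fh) - $start ];
    }
    return \%index;
}

# Toy data: one single-line record whose quality starts with '@',
# and one record wrapped over two lines.
my $fastq = <<'END';
@read1
ACGTACGT
+
@9;A565:
@read2 wrapped
ACGT
ACGT
+read2
IIII
@III
END

open my $demo_fh, '<', \$fastq or die "Cannot open in-memory file: $!";
my $demo_index = index_fastq($demo_fh);
printf "%s\toffset %d\tlength %d\n", $_, @{ $demo_index->{$_} }
    for sort keys %$demo_index;
```

Counting characters costs a little more than the "every fourth line"
rule, so the four-line assumption could still be offered as an opt-in
fast path.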

One problem the various Bio* indexers have currently is the lack of standardization on a specific schema for indexing.  There are in-roads towards this (OBDA) that haven't been adequately traveled IMHO, which need to be taken up again.

A second, and maybe this is more specific to BioPerl, is that the parsers and indexers essentially reimplement the format parsing in each module, so if there are bugs they have to be independently fixed (hence why SeqIO works and the indexer doesn't; I wrote the first but not the second).  The best place for any optimizations would be in a unified parser that both the SeqIO and indexer modules could use.

> But if there are multiple lines of seq and qual, I think that /^\+$/ or /^\+$id/ can be used to identify the end of the sequence, and the number of lines of quality should be equal to the number of lines of sequence.
> 
> 
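> ## only for single-line seq and qual
> ## ($self, $i, $c and $id_parser are presumably provided by the
> ## enclosing indexing method; they are not defined in this fragment)
> my $line_count = 0;
> while (<$FASTQ>) {
>     # a header can only be the first line of each 4-line record
>     if (/^@/ and $line_count % 4 == 0) {
>         # $begin is the position of the first character after the '@'
>         my $begin = tell($FASTQ) - length($_) + 1;
>         foreach my $id (&$id_parser($_)) {
>             $self->add_record($id, $i, $begin);
>             $c++;
>         }
>     }
>     $line_count++;
> }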
> 
> 
> --
> BioPerl fastq parsing issues aside, is there another tool which allows you to retrieve arbitrary sequences from a fastq file by sequence ID?
> 
> There's one called cdbfasta which looks like it might work; does anyone have experience with it?

I haven't, but it appears FASTA-specific.  Does it parse FASTQ as well?  

I recall Sanger has a C-based FASTA/FASTQ hybrid one as well.  May have to look that one up.

> Thanks,
> sofia
> 
> P.S. I am CCing Peter Cock in case BioPython has solved this issue already; if so, perhaps their solution could be applied here.


chris


From p.j.a.cock at googlemail.com  Tue Nov  1 10:38:43 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 1 Nov 2011 14:38:43 +0000
Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual
In-Reply-To: <26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu>
References: <E824C753-0688-4350-96B8-638BA1B37E73@neuro.utah.edu>
	<26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu>
Message-ID: <CAKVJ-_5_NEXALy1abueFCQqKRnteJpu=__VCqSDGyemorxs53A@mail.gmail.com>

On Tue, Nov 1, 2011 at 1:40 PM, Fields, Christopher J
<cjfields at illinois.edu> wrote:
>
> One problem the various Bio* indexers have currently is the lack of
> standardization on a specific schema for indexing.  There are in-roads
> towards this (OBDA) that haven't been adequately traveled IMHO,
> which need to be taken up again.
>

Something to switch to open-bio-l at lists.open-bio.org for,
http://lists.open-bio.org/mailman/listinfo/open-bio-l

We can continue this thread from last summer,
http://lists.open-bio.org/pipermail/open-bio-l/2010-April/000662.html
http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000676.html
...
http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000680.html

And CC Peter Rice from EMBOSS too - we chatted about this
at ISMB/BOSC 2011 in July - and whoever looks after the
OBDA/indexing code in BioRuby and BioJava too.

> A second, and maybe this is more specific to BioPerl, is that the
> parsers and indexers essentially reimplement the format parsing
> in each module, so if there are bugs they have to be independently
> fixed (hence why SeqIO works and the indexer doesn't; I wrote the
> first but not the second).  The best place for any optimizations
> would be in a unified parser that both the SeqIO and indexer
> modules could use.

We have that problem to an extent in Biopython's Bio.SeqIO code.
The indexing code duplicates some logic of the parsing code
(how much depends on the format), sufficient to extract the read
ID and the bounds on disk. The two could be more unified but
the parsers came first and I didn't want to change them at the time.
Instead I tried to be rigorous in consistency testing for the index
code's unit tests.

Regards,

Peter


From carandraug+dev at gmail.com  Tue Nov  1 11:13:06 2011
From: carandraug+dev at gmail.com (=?ISO-8859-1?Q?Carn=EB_Draug?=)
Date: Tue, 1 Nov 2011 15:13:06 +0000
Subject: [Bioperl-l] best way to edit sequence features
In-Reply-To: <CAD5861E.14042%hrh@fmi.ch>
References: <CAPOrs_2uZ6TPghqAXwVeXwSeHBm+iomTXMGqswR38V_L=SQWyw@mail.gmail.com>
	<CAD5861E.14042%hrh@fmi.ch>
Message-ID: <CAPOrs_0rZcokpSvAhMM3gtKWgeH3knDuTfnyybPJUU5D-WEgpA@mail.gmail.com>

On 1 November 2011 10:18, Hotz, Hans-Rudolf <hrh at fmi.ch> wrote:
> I am aware of the following ones (as far as I know, they are all GUI based
> and don't have a command line API):

They are not all free. Just for future reference, here are their licenses:

> Serial Cloner

Couldn't find a license and the download for linux has no source so
I'm guessing proprietary.

> GENtle            http://gentle.magnusmanske.de/

Free under GPL

> GeneCoder

Proprietary

> pDRAW32

Proprietary

> Genome Workbench  http://www.ncbi.nlm.nih.gov/projects/gbench/

Seems public domain. License is not defined anywhere but the files I
checked had the public domain notice on the header

> Ape

Proprietary ("license" is at the top of AppMain.tcl)

> UGene             http://ugene.unipro.ru/

Free under GPL

> Also, have you looked at the emboss tool "cirdna" ?

Free under GPL

> WRT file formats: I strongly suggest to stick to embl and genbank format as
> input and (text) output format. The features are not indexed, but you can
> create your own when you store the sequences in your system. Internally, you
> probably wanna keep the data in a 'simpler' format than embl or genbank,
> anyway.
>
> Alternatively, have you looked at gff/gtf as a way of getting features?
> see:
>
> http://www.sequenceontology.org/gff3.shtml
> http://mblab.wustl.edu/GTF22.html

Considering the already existing alternatives, I'm more likely to
collaborate with one of them to do what I want. I'll just have to
check them all and decide. I was planning on writing a new tool and
contributing it to the scripts section of bioperl since, when I googled
before, only the proprietary tools showed up. Thank you very much for
the links.

Carnë


From roy.chaudhuri at gmail.com  Tue Nov  1 11:44:19 2011
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Tue, 01 Nov 2011 15:44:19 +0000
Subject: [Bioperl-l] best way to edit sequence features
In-Reply-To: <CAD5861E.14042%hrh@fmi.ch>
References: <CAD5861E.14042%hrh@fmi.ch>
Message-ID: <4EB013D3.30801@gmail.com>

The Sanger Institute's Artemis is good for editing sequence features, 
and DNAPlotter can be used to produce circular diagrams:

http://www.sanger.ac.uk/resources/software/artemis
http://www.sanger.ac.uk/resources/software/dnaplotter

Roy.



From jason.stajich at gmail.com  Tue Nov  1 12:02:24 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Tue, 1 Nov 2011 09:02:24 -0700
Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual
In-Reply-To: <CAKVJ-_5_NEXALy1abueFCQqKRnteJpu=__VCqSDGyemorxs53A@mail.gmail.com>
References: <E824C753-0688-4350-96B8-638BA1B37E73@neuro.utah.edu>
	<26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu>
	<CAKVJ-_5_NEXALy1abueFCQqKRnteJpu=__VCqSDGyemorxs53A@mail.gmail.com>
Message-ID: <47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com>


I think a different indexer is needed for the scale of key/value pairs we see in fastq files if we want to make a fast lookup by ID. I think speed is of the essence for this type of solution, so forcing all records to be exactly 4 lines long is okay for this type of implementation. 

I found NOSQL implementations to perform much better than any of the BDB-type solutions, which end up being really slow above 1-5M keys.  I used TokyoCabinet and KyotoCabinet to do indexing of accession -> taxonomy ID and found it quite fast for my needs. I haven't tried storing 100bp reads + qual string as the value in it yet but I think it could be done; certainly worth a prototype.

Jason


From cjfields at illinois.edu  Tue Nov  1 13:44:25 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 1 Nov 2011 17:44:25 +0000
Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual
In-Reply-To: <47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com>
References: <E824C753-0688-4350-96B8-638BA1B37E73@neuro.utah.edu>
	<26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu>
	<CAKVJ-_5_NEXALy1abueFCQqKRnteJpu=__VCqSDGyemorxs53A@mail.gmail.com>
	<47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com>
Message-ID: <6EFDCA41-D385-4CCB-A17B-0736F93DACDF@illinois.edu>

On Nov 1, 2011, at 11:02 AM, Jason Stajich wrote:

> I think a different indexer is needed for the scale of key/value pairs we see in fastq files if we want to make a fast lookup by ID. I think speed is of the essence for this type of solution, so forcing all records to be exactly 4 lines long is okay for this type of implementation. 

This can always be an early optimization, that's easy enough. But I'm sure we will have to deal with multi-line seq/qual FASTQ at some point.  

> I found NOSQL implementations to perform much better than any of the BDB-type solutions, which end up being really slow above 1-5M keys.  I used TokyoCabinet and KyotoCabinet to do indexing of accession -> taxonomy ID and found it quite fast for my needs. I haven't tried storing 100bp reads + qual string as the value in it yet but I think it could be done; certainly worth a prototype.

Adding a middle layer where the backend storage is abstracted is probably the (best|most flexible) option, converging on a good default that will work for this data.  The actual interface is in place, though would it be more feasible to go the OBDA route (converge on a cross-Bio* compatible schema)?  Or are there problems afoot there we're unaware of?

Re: specifics, I think Biopython uses SQLite, is that correct Peter?  

chris



From p.j.a.cock at googlemail.com  Tue Nov  1 14:06:50 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 1 Nov 2011 18:06:50 +0000
Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual
In-Reply-To: <6EFDCA41-D385-4CCB-A17B-0736F93DACDF@illinois.edu>
References: <E824C753-0688-4350-96B8-638BA1B37E73@neuro.utah.edu>
	<26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu>
	<CAKVJ-_5_NEXALy1abueFCQqKRnteJpu=__VCqSDGyemorxs53A@mail.gmail.com>
	<47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com>
	<6EFDCA41-D385-4CCB-A17B-0736F93DACDF@illinois.edu>
Message-ID: <CAKVJ-_5iiR1-70V17KBVm-vz5hkwRf6NVLWtAWe+HOUHO+1M2w@mail.gmail.com>

On Tue, Nov 1, 2011 at 5:44 PM, Fields, Christopher J
<cjfields at illinois.edu> wrote:
> On Nov 1, 2011, at 11:02 AM, Jason Stajich wrote:
>
>> I think a different indexer is needed for the scale of key/value
>> pairs we see in fastq files if we want to make a fast lookup by
>> ID. I think speed is of the essence for this type of solution, so
>> forcing all records to be exactly 4 lines long is okay for this
>> type of implementation.
>
> This can always be an early optimization, that's easy enough.
> But I'm sure we will have to deal with multi-line seq/qual
> FASTQ at some point.
>
>> I found NOSQL implementations to perform much better than any
>> of the BDB-type solutions, which end up being really slow above
>> 1-5M keys.  I used TokyoCabinet and KyotoCabinet to do indexing
>> of accession -> taxonomy ID and found it quite fast for my
>> needs. I haven't tried storing 100bp reads + qual string as the
>> value in it yet but I think it could be done; certainly worth
>> a prototype.
>
> Adding a middle layer where the backend storage is abstracted
> is probably the (best|most flexible) option, converging on a
> good default that will work for this data.  The actual interface is
> in place, though would it be more feasible to go the OBDA route
> (converge on a cross-Bio* compatible schema)?  Or are there
> problems afoot there we're unaware of?
>
> Re: specifics, I think Biopython uses SQLite, is that correct Peter?
>
> chris

Yes, we're using SQLite3 to store essentially a list of filenames
and their format as one table, and then in the main table an
entry for each sequence recording the ID (only one accession,
unlike OBDA which had infrastructure for a secondary accession),
file number, offset of the start of the record, and optionally the
length of the record on disk.

i.e. Basically what OBDA does, but using SQLite rather
than BDB (not included in Python 3) or a flat file index
(poor performance with large datasets).
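
As a rough illustration of that layout (and not Biopython's actual
code), the sketch below keeps the two "tables" as plain Perl structures
where the real thing would use SQLite, indexes a FASTA file by byte
offset and length, and fetches a record's raw text with seek/read. The
helper names add_fasta and fetch_raw, and the toy file, are invented for
the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-ins for the two tables described above (plain Perl structures
# here; a real index would persist them, e.g. in SQLite):
my @files;      # file number -> [ filename, format ]
my %offsets;    # record ID   -> [ file number, offset, length ]

# Hypothetical helper: register a FASTA file and note where each entry sits.
sub add_fasta {
    my ($path) = @_;
    push @files, [ $path, 'fasta' ];
    my $file_no = $#files;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my ($id, $start);
    while (1) {
        my $pos  = tell($fh);
        my $line = <$fh>;
        if (!defined $line or $line =~ /^>/) {
            # a new header (or end of file) closes the previous record
            $offsets{$id} = [ $file_no, $start, $pos - $start ] if defined $id;
            last unless defined $line;
            ($id) = $line =~ /^>(\S+)/;
            $start = $pos;
        }
    }
    close $fh;
    return;
}

# Hypothetical helper: return the raw text of one record, byte for byte.
sub fetch_raw {
    my ($id) = @_;
    my $entry = $offsets{$id} or die "Unknown ID $id\n";
    my ($file_no, $offset, $length) = @$entry;
    open my $fh, '<', $files[$file_no][0] or die "Cannot reopen file: $!";
    seek $fh, $offset, 0;
    read $fh, my $raw, $length;
    close $fh;
    return $raw;
}

# Toy FASTA file written to a temporary location
my ($demo_out, $demo_path) = tempfile(UNLINK => 1);
print {$demo_out} ">seq1 first\nACGT\nACGT\n>seq2\nTTTT\n";
close $demo_out;

add_fasta($demo_path);
print fetch_raw('seq2');
```

The raw text returned by fetch_raw would then be handed to the existing
format parser, which is what keeps the index itself format neutral.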

I find this design attractive on several levels:
* File format neutral, covers FASTA, FASTQ, GenBank, etc
* Preserves the original file untouched
* Index is a small single file (thanks to SQLite)
* Back end could be switched out
* Could be applied to compressed file formats
* Reuses existing parsing code to access entries

This could easily form the basis of OBDA v2. The main points
of difference I anticipate between the Bio* projects would
be naming conventions for the different file formats, and
what we consider to be the default record ID of each read
(e.g. which field in a GenBank file - although agreement
here is not essential). Some of that was already settled in
principle with OBDA v1.

On the other hand, you could try and store the parsed data
itself, which is where NOSQL looks more interesting. That
essentially requires the ability to serialise your annotated
sequence object model to disk - which would be tricky to do
cross project (much more ambitious than BioSQL is). It also
means the "index" becomes very large because it now holds
all the original data.

Peter


From wenbinmei at gmail.com  Wed Nov  2 00:25:32 2011
From: wenbinmei at gmail.com (wenbin mei)
Date: Wed, 2 Nov 2011 00:25:32 -0400
Subject: [Bioperl-l] how to not count gaps in the multiple sequence alignment
Message-ID: <CAHdrE2Q6weQ+7t_4X3_AZmu4JLQ3uGf3=s14UuOpDVa368V9aA@mail.gmail.com>

Hi,

I need some help with coding. I have a multiple sequence alignment which has
gaps. The alignment includes a reference genome sequence for which I know
the coordinates of all the protein coding genes. I want to extract the
alignments of all these protein coding genes from the big alignment. I am
using Bio::SimpleAlign, but the problem is that, due to the gaps in the
alignment, the coordinates have shifted. I wonder whether there is a way to
not count the gaps and still be able to extract the protein alignments. One
way would be to remove the gaps in the reference first and then extract the
sequence, but I don't like this way ... Thank you for your help.

-best,
wenbin

From dejian.zhao at gmail.com  Wed Nov  2 09:33:18 2011
From: dejian.zhao at gmail.com (Dejian Zhao)
Date: Wed, 02 Nov 2011 21:33:18 +0800
Subject: [Bioperl-l] Modules to read MEGA output and reproduce the
	phylogenetic tree
Message-ID: <4EB1469E.4050108@gmail.com>

There are various packages on CPAN to cope with phylogenetic analysis. I 
wonder which module can read the output from other phylogenetic 
software, e.g. MEGA, and reproduce the phylogenetic tree. I want to 
produce a picture which combines the phylogenetic tree and the structure 
of each gene.

From roy.chaudhuri at gmail.com  Wed Nov  2 09:49:46 2011
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Wed, 02 Nov 2011 13:49:46 +0000
Subject: [Bioperl-l] Modules to read MEGA output and reproduce the
 phylogenetic tree
In-Reply-To: <4EB1469E.4050108@gmail.com>
References: <4EB1469E.4050108@gmail.com>
Message-ID: <4EB14A7A.30307@gmail.com>

MEGA can export trees in Newick format, which can be read by 
Bio::TreeIO. The tree can be drawn in EPS format using 
Bio::Tree::Draw::Cladogram. See:
http://www.bioperl.org/wiki/HOWTO:Trees

Roy.

On 02/11/2011 13:33, Dejian Zhao wrote:
> There are various packages on CPAN to cope with phylogenetic analysis. I
> wonder which module can read the output from other phylogenetic
> software, e.g. MEGA, and reproduce the phylogenetic tree. I want to
> produce a picture which combines the phylogenetic tree and the structure
> of each gene.


From jun.yin at ucd.ie  Wed Nov  2 12:29:45 2011
From: jun.yin at ucd.ie (Jun Yin)
Date: Wed, 02 Nov 2011 16:29:45 +0000 (GMT)
Subject: [Bioperl-l] how to not count gaps in the multiple sequence
 alignment
In-Reply-To: <CAHdrE2Q6weQ+7t_4X3_AZmu4JLQ3uGf3=s14UuOpDVa368V9aA@mail.gmail.com>
References: <CAHdrE2Q6weQ+7t_4X3_AZmu4JLQ3uGf3=s14UuOpDVa368V9aA@mail.gmail.com>
Message-ID: <7300ecdd1dd56.4eb16ff9@ucd.ie>

Hi,
 
You need to calculate the coordinates of the protein coding gene in the alignment by yourself. After that, you can use the slice function to get the alignment block for the selected gene, e.g.
 
$aln2 = $aln->slice(20, 30);
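
For the "calculate the coordinates yourself" step, a minimal sketch in
plain Perl, assuming you have the gapped reference row as a string and
that '-' and '.' are the gap characters. The helper name
column_of_residue and the toy sequence are made up for this example.
Bio::SimpleAlign also has a column_from_residue_number method that
should do the same lookup by sequence name; check the documentation of
your BioPerl version.

```perl
use strict;
use warnings;

# Hypothetical helper: 1-based alignment column of the $n-th non-gap
# residue in a gapped sequence string ('-' and '.' count as gaps).
sub column_of_residue {
    my ($gapped, $n) = @_;
    my $seen = 0;
    for my $col (1 .. length($gapped)) {
        next if substr($gapped, $col - 1, 1) =~ /[-.]/;
        return $col if ++$seen == $n;
    }
    die "Sequence has only $seen residues, asked for residue $n\n";
}

my $reference = 'AT--GCC-TGA';            # toy gapped reference row
my ($gene_start, $gene_end) = (3, 6);     # ungapped reference coordinates

my $col_start = column_of_residue($reference, $gene_start);
my $col_end   = column_of_residue($reference, $gene_end);

# These two columns are what would be passed to $aln->slice(...)
print "slice columns $col_start to $col_end\n";
```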
 
Cheers,
Jun



From dejian.zhao at gmail.com  Wed Nov  2 21:39:22 2011
From: dejian.zhao at gmail.com (Dejian Zhao)
Date: Thu, 03 Nov 2011 09:39:22 +0800
Subject: [Bioperl-l] Modules to read MEGA output and reproduce the
 phylogenetic tree
In-Reply-To: <4EB14A7A.30307@gmail.com>
References: <4EB1469E.4050108@gmail.com> <4EB14A7A.30307@gmail.com>
Message-ID: <4EB1F0CA.80309@gmail.com>

That's great!
Many thanks, Roy.

On 2011-11-2 21:49, Roy Chaudhuri wrote:
> MEGA can export trees in Newick format, which can be read by 
> Bio::TreeIO. The tree can be drawn in EPS format using 
> Bio::Tree::Draw::Cladogram. See:
> http://www.bioperl.org/wiki/HOWTO:Trees
>
> Roy.
>
> On 02/11/2011 13:33, Dejian Zhao wrote:
>> There are various packages on CPAN to cope with phylogenetic analysis. I
>> wonder which module can read the output from other phylogenetic
>> software, e.g. MEGA, and reproduce the phylogenetic tree. I want to
>> produce a picture which combines the phylogenetic tree and the structure
>> of each gene.
>
>


From noncoding at gmail.com  Thu Nov  3 05:59:26 2011
From: noncoding at gmail.com (Remo Sanges)
Date: Thu, 03 Nov 2011 10:59:26 +0100
Subject: [Bioperl-l] how to not count gaps in the multiple sequence
	alignment
In-Reply-To: <7300ecdd1dd56.4eb16ff9@ucd.ie>
References: <CAHdrE2Q6weQ+7t_4X3_AZmu4JLQ3uGf3=s14UuOpDVa368V9aA@mail.gmail.com>
	<7300ecdd1dd56.4eb16ff9@ucd.ie>
Message-ID: <4EB265FE.30909@gmail.com>

To get the location in the initial sequence starting from a column in a 
multiple alignment you can:

1) get the Bio::LocatableSeq object for your sequence by calling the method 
each_seq_with_id on the SimpleAlign object

2) then call the method location_from_column on that 
LocatableSeq object
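
To make the coordinate shift concrete, here is a minimal plain-Perl sketch of the mapping between alignment columns and ungapped residue numbers. It does not use BioPerl: the function names and the gapped sequence are made up for illustration, and it assumes gaps are written as '-' or '.'.

```perl
use strict;
use warnings;

# Map a 1-based alignment column to the 1-based residue number in the
# ungapped sequence. Returns nothing when the column holds a gap.
sub residue_from_column {
    my ($gapped, $column) = @_;
    return if substr($gapped, $column - 1, 1) =~ /[-.]/;
    my $prefix = substr($gapped, 0, $column);
    my $gaps   = ($prefix =~ tr/-.//);    # count gap characters up to the column
    return $column - $gaps;
}

# Inverse mapping: 1-based residue number -> 1-based alignment column.
sub column_from_residue {
    my ($gapped, $residue) = @_;
    my $seen = 0;
    for my $i (0 .. length($gapped) - 1) {
        next if substr($gapped, $i, 1) =~ /[-.]/;
        return $i + 1 if ++$seen == $residue;
    }
    return;    # the sequence has fewer residues than requested
}

my $ref = 'AT--GC-TA';    # hypothetical gapped reference row
print column_from_residue($ref, 3), "\n";    # residue 3 (G) sits in column 5
print residue_from_column($ref, 8), "\n";    # column 8 (T) is residue 5
```

Converting the start and end of a gene on the reference into columns this way gives the two numbers that can be passed to slice on the alignment.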

HTH

Remo


-- 
Remo Sanges
Bioinformatics - Animal Physiology and Evolution
Stazione Zoologica Anton Dohrn
Villa Comunale, 80121 Napoli - Italy
+39 081 5833428


On 11/2/11 5:29 PM, Jun Yin wrote:
> Hi,
>
> You need to calculate the coordinates of the protein coding gene in the alignment by yourself. After that, you can use the slice function to get the alignment block for the selected gene, e.g.
>
> $aln2 = $aln->slice(20, 30);
>
> Cheers,
> Jun
>
>
> ----- Original Message -----
> From: wenbin mei<wenbinmei at gmail.com>
> Date: Wednesday, November 2, 2011 5:51 am
> Subject: [Bioperl-l] how to not count gaps in the multiple sequence alignment
> To: bioperl-l at lists.open-bio.org
>
>> Hi,
>>
>> I need some help with coding. I have a multiple sequence alignment
>> that contains gaps. The alignment includes a reference genome
>> sequence for which I know the coordinates of all the protein coding
>> genes. I want to extract the alignments of all these protein coding
>> genes from the big alignment. I am using
>> Bio::SimpleAlign, but the problem is that the gaps shift the
>> coordinates in the alignment. I wonder whether there is a way to
>> ignore the gaps and still extract the protein alignment. One
>> option is to remove the gaps in the reference first and then
>> extract the sequence, but I don't like that approach ... Thank you
>> for your help.
>>
>> -best,
>> wenbin


From G.Gallone at sms.ed.ac.uk  Thu Nov  3 07:50:11 2011
From: G.Gallone at sms.ed.ac.uk (Giuseppe G.)
Date: Thu, 03 Nov 2011 11:50:11 +0000
Subject: [Bioperl-l] Bio::SimpleAlign - Meaning of
	overall_percentage_identity?
Message-ID: <4EB27FF3.9050203@sms.ed.ac.uk>

Hi,

I would be grateful if you could shed some light on the exact meaning of 
the method overall_percentage_identity in Bio::SimpleAlign.

If I understand correctly, the method works by considering only 
amino acids that are identical over all the members of the alignment, and 
then averaging over the total number of amino acids in the sequence. Is 
this correct?

Thank you
Giuseppe
-- 

The University of Edinburgh is a charitable body, registered in 
Scotland, with registration number SC005336.

From David.Messina at sbc.su.se  Thu Nov  3 09:22:21 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 3 Nov 2011 14:22:21 +0100
Subject: [Bioperl-l] Bio::SimpleAlign - Meaning of
	overall_percentage_identity?
In-Reply-To: <4EB27FF3.9050203@sms.ed.ac.uk>
References: <4EB27FF3.9050203@sms.ed.ac.uk>
Message-ID: <CAM3TQQWm46SWfu-6DANDaoppi8oLKGuzwGm8uxkVkf_JAog3xg@mail.gmail.com>

Hi Giuseppe,

> If I understand correctly, the method works by considering only amino acids
> that are identical over all the members of the alignment


Yes.


> ... and then averaging over the total number of amino acids in the sequence.
> Is this correct?

Almost.

By default, the denominator is the alignment length, namely the length of
the MSA including gaps. By means of the 'short' and 'long' options, it's
also possible to use the shortest or longest sequence's ungapped lengths as
the denominator.
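
A small plain-Perl sketch of the three denominators, following the description above rather than the actual Bio::SimpleAlign source. The function name and the toy alignment are made up, and it assumes '-' is the only gap character.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Percentage of columns in which every sequence carries the same residue.
# $measure selects the denominator: 'align' (alignment length, gaps
# included), 'short' or 'long' (shortest or longest ungapped sequence).
sub overall_identity {
    my ($rows, $measure) = @_;
    my $aln_len   = length $rows->[0];
    my $identical = 0;
    for my $col (0 .. $aln_len - 1) {
        my %seen;
        $seen{ uc substr($_, $col, 1) } = 1 for @$rows;
        my ($only) = keys %seen;
        $identical++ if keys(%seen) == 1 && $only ne '-';
    }
    # Ungapped length of each sequence
    my @ungapped = map { (my $s = $_) =~ tr/-//d; length $s } @$rows;
    my $denominator = $measure eq 'short' ? min(@ungapped)
                    : $measure eq 'long'  ? max(@ungapped)
                    :                       $aln_len;
    return 100 * $identical / $denominator;
}

my @rows = qw(ACGT-A- ACGTTA- AC-TTAG);    # toy alignment, 4 identical columns
printf "%-5s %.1f\n", $_, overall_identity(\@rows, $_) for qw(align short long);
# align 57.1   (4 / 7 columns)
# short 80.0   (4 / 5 residues)
# long  66.7   (4 / 6 residues)
```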


Dave

From cjfields at illinois.edu  Thu Nov  3 14:28:36 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 3 Nov 2011 18:28:36 +0000
Subject: [Bioperl-l] OBDA redux? was Re:  Bio::Index::Fastq '@' in qual
In-Reply-To: <CAKVJ-_5iiR1-70V17KBVm-vz5hkwRf6NVLWtAWe+HOUHO+1M2w@mail.gmail.com>
References: <E824C753-0688-4350-96B8-638BA1B37E73@neuro.utah.edu>
	<26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu>
	<CAKVJ-_5_NEXALy1abueFCQqKRnteJpu=__VCqSDGyemorxs53A@mail.gmail.com>
	<47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com>
	<6EFDCA41-D385-4CCB-A17B-0736F93DACDF@illinois.edu>
	<CAKVJ-_5iiR1-70V17KBVm-vz5hkwRf6NVLWtAWe+HOUHO+1M2w@mail.gmail.com>
Message-ID: <ED419B5E-9C55-478F-BDD6-C2B663ABE636@illinois.edu>

(side thread, so re-titling...)

On Nov 1, 2011, at 1:06 PM, Peter Cock wrote:

> On Tue, Nov 1, 2011 at 5:44 PM, Fields, Christopher J
> <cjfields at illinois.edu> wrote:
>> On Nov 1, 2011, at 11:02 AM, Jason Stajich wrote:
>> 
>>> I think a different indexer is needed for the scale of key/value
>>> pairs we see in fastq files if we want to make a fast lookup by
>>> ID. I think speed is of the essence for this type of solution, so
>>> requiring that all records be 4 lines long is okay for this type
>>> of implementation.
>> 
>> This can always be an early optimization, that's easy enough.
>> But I'm sure we will have to deal with multi-line seq/qual
>> FASTQ at some point.
>> 
>>> I found NOSQL implementations to perform much better
>>> than any of the BDB type solutions -- those
>>> end up being really slow above 1-5M keys.  I used
>>> TokyoCabinet and KyotoCabinet to do indexing of accession
>>> -> taxonomy ID and found it quite fast for the needs. I
>>> haven't tried storing 100bp reads + qual string as the
>>> value in it yet but I think it could be done, certainly worth
>>> a prototype.
>> 
>> Adding a middle layer where the backend storage is abstracted
>> is probably the (best|most flexible) option, converging on a
>> good default that will work for this data.  The actual interface is
>> in place, though would it be more feasible to go the OBDA
>> (converge on a cross-Bio* compatible schema)?  Or are there
>> problems afoot there we're unaware of?
>> 
>> Re: specifics, I think Biopython uses SQLite, is that correct Peter?
>> 
>> chris
> 
> Yes, we're using SQLite3 to store essentially a list of filenames
> and their format as one table, and then in the main table an
> entry for each sequence recording the ID (only one accession,
> unlike OBDA which had infrastructure for a secondary accession),
> file number, offset of the start of the record, and optionally the
> length of the record on disk.
> 
> i.e. Basically what OBDA does, but using SQLite rather
> than BDB (not included in Python 3) or a flat file index
> (poor performance with large datasets).
> 
> I find this design attractive on several levels:
> * File format neutral, covers FASTA, FASTQ, GenBank, etc
> * Preserves the original file untouched
> * Index is a small single file (thanks to SQLite)
> * Back end could be switched out
> * Could be applied to compressed file formats
> * Reuses existing parsing code to access entries
> 
> This could easily form basis of OBDA v2, the main points
> of difference I anticipate between the Bio* projects would
> be naming conventions for the different file formats, and
> what we consider to be the default record ID of each read
> (e.g. which field in a GenBank file - although agreement
> here is not essential). Some of that was already settled in
> principle with OBDA v1.

The primary/secondary IDs could be configurable with a sane default, I think the bioperl implementations allowed this (and it is certainly something that will be requested).

> On the other hand, you could try and store the parsed data
> itself, which is where NOSQL looks more interesting. That
> essentially requires the ability to serialise your annotated
> sequence object model to disk - which would be tricky to do
> cross project (much more ambitious than BioSQL is). It also
> means the "index" becomes very large because it now holds
> all the original data.
> 
> Peter

For a fully cross-Bio* compliant format, I don't think it's feasible to use serialized data unless they are serialized in something that is easily deserialized across HLLs (JSON, BSON, YAML, XML, etc).  Either that, or such data is stored concurrently with the binary blob, along with meta data that indicates the source of the blob, parser, version, etc, etc (unless there are tools out there that reliably interconvert serialized complex data structures between HLLs).  Any way you go about it, it seems like it could be a major ball of hurt, unless implemented very carefully.

Aside: I think this was one of the problems with Bio::DB::SeqFeature::Store, in that it at one point stored Perl-specific Storable blobs.

chris


From p.j.a.cock at googlemail.com  Thu Nov  3 14:52:50 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 3 Nov 2011 18:52:50 +0000
Subject: [Bioperl-l] OBDA redux?
Message-ID: <CAKVJ-_6bzfZZr82y+J4qmGbn0du4rxaKaGxjmbC7p-pU_nMuoQ@mail.gmail.com>

On Thu, Nov 3, 2011 at 6:28 PM, Fields, Christopher J
<cjfields at illinois.edu> wrote:
> (side thread, so re-titling...)
>

And CC'ing open-bio-l, which is a better home for this than bioperl-l,
where OBDA v2 talk came up again in discussion of a BioPerl indexing
problem. Archive links for thread here:

http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035807.html
http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035808.html
http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035811.html
http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035812.html
http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035813.html
http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035822.html

> On Nov 1, 2011, at 1:06 PM, Peter Cock wrote:
>>
>> Yes, we're using SQLite3 to store essentially a list of filenames
>> and their format as one table, and then in the main table an
>> entry for each sequence recording the ID (only one accession,
>> unlike OBDA which had infrastructure for a secondary accession),
>> file number, offset of the start of the record, and optionally the
>> length of the record on disk.
>>
>> i.e. Basically what OBDA does, but using SQLite rather
>> than BDB (not included in Python 3) or a flat file index
>> (poor performance with large datasets).
>>
>> I find this design attractive on several levels:
>> * File format neutral, covers FASTA, FASTQ, GenBank, etc
>> * Preserves the original file untouched
>> * Index is a small single file (thanks to SQLite)
>> * Back end could be switched out
>> * Could be applied to compressed file formats
>> * Reuses existing parsing code to access entries
>>
>> This could easily form basis of OBDA v2, the main points
>> of difference I anticipate between the Bio* projects would
>> be naming conventions for the different file formats, and
>> what we consider to be the default record ID of each read
>> (e.g. which field in a GenBank file - although agreement
>> here is not essential). Some of that was already settled in
>> principle with OBDA v1.
>
> The primary/secondary IDs could be configurable with a sane
> default, I think the bioperl implementations allowed this (and
> it is certainly something that will be requested).

One reason I went with a single ID only was to keep the
Python dictionary based API simple (think hash in Perl).
You don't get secondary keys in a Python dict or a hash ;)

As a nod to flexibility, in Biopython's Bio.SeqIO indexing you
can provide a call back function to map the suggested ID to
something else. Obviously this doesn't give the full flexibility
of extracting a field from the record's annotation because we
don't parse the whole record during indexing (it would be too
slow).
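
The offset index and the key-mapping callback can be sketched roughly as follows. This is plain Perl, not Biopython or BioPerl code: an in-memory hash stands in for the SQLite table, the file is assumed to hold strict four-line FASTQ records, and the function names and sample reads are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build key => [offset, length] for a strict four-line FASTQ file.
# $key_function is an optional callback that maps the suggested ID
# (the first word after '@') to the key that is actually stored.
sub build_offset_index {
    my ($path, $key_function) = @_;
    my %index;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    while (1) {
        my $offset = tell $fh;
        my @lines  = map { scalar <$fh> } 1 .. 4;
        last unless defined $lines[0];                  # clean end of file
        die "Truncated record at byte $offset\n" if grep { !defined } @lines;
        my ($id) = $lines[0] =~ /^@(\S+)/
            or die "Not a FASTQ header at byte $offset\n";
        $id = $key_function->($id) if $key_function;
        $index{$id} = [ $offset, tell($fh) - $offset ];
    }
    close $fh;
    return \%index;
}

# Fetch the raw record text; the original file is never modified.
sub fetch_record {
    my ($path, $index, $key) = @_;
    my $entry = $index->{$key} or return;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    seek $fh, $entry->[0], 0;
    read $fh, my $record, $entry->[1];
    close $fh;
    return $record;
}

# Two made-up reads; the second quality string starts with '@'.
my ($tmp, $path) = tempfile();
print {$tmp} "\@read1/1\nACGT\n+\nIIII\n\@read2/1\nTTGA\n+\n\@III\n";
close $tmp;

# The callback strips the "/1" suffix from the suggested ID.
my $index = build_offset_index($path, sub { (my $k = shift) =~ s{/\d$}{}; $k });
print fetch_record($path, $index, 'read2');
```

Because records are located by counting lines rather than by scanning for '@', a quality line that starts with '@' does no harm, at the cost of rejecting multi-line FASTQ.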

However, I'm happy for there to be an *optional* secondary
key in an OBDA v2 SQLite schema, but Biopython probably
won't populate it. We could optionally use it rather than the
primary ID on loading an existing index though.

Personally I would stick with one key in the index - it should
be faster and makes it simpler to switch the back end if we
need to later. If anyone wants a second key, they can build
a second index *grin*.

>> On the other hand, you could try and store the parsed data
>> itself, which is where NOSQL looks more interesting. That
>> essentially requires the ability to serialise your annotated
>> sequence object model to disk - which would be tricky to do
>> cross project (much more ambitious than BioSQL is). It also
>> means the "index" becomes very large because it now holds
>> all the original data.
>>
>> Peter
>
> For a fully cross-Bio* compliant format, I don't think it's feasible
> to use serialized data unless they are serialized in something
> that is easily deserialized across HLLs (JSON, BSON, YAML,
> XML, etc).  Either that, or such data is stored concurrently with
> the binary blob, along with meta data that indicates the source
> of the blob, parser, version, etc, etc (unless there are tools out
> there that reliably interconvert serialized complex data structures
> between HLLs).  Any way you go about it, it seems like it could
> be a major ball of hurt, unless implemented very carefully.

You missed out RDF as a serialisation ;)

But yes, going down the shared serialisation route is going
to be messy - as you are well aware:

> Aside: I think this was one of the problems with
> Bio::DB::SeqFeature::Store, in that it at one point stored
> Perl-specific Storable blobs.
>
> chris

Peter

From cjfields at illinois.edu  Thu Nov  3 15:47:51 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 3 Nov 2011 19:47:51 +0000
Subject: [Bioperl-l] OBDA redux?
In-Reply-To: <CAKVJ-_6bzfZZr82y+J4qmGbn0du4rxaKaGxjmbC7p-pU_nMuoQ@mail.gmail.com>
References: <CAKVJ-_6bzfZZr82y+J4qmGbn0du4rxaKaGxjmbC7p-pU_nMuoQ@mail.gmail.com>
Message-ID: <FB761CFA-1CFD-4FA0-A708-2CE3F2F240D9@illinois.edu>

On Nov 3, 2011, at 1:52 PM, Peter Cock wrote:

> On Thu, Nov 3, 2011 at 6:28 PM, Fields, Christopher J
> <cjfields at illinois.edu> wrote:
>> (side thread, so re-titling...)
>> 
> And CC'ing open-bio-l, which is a better home for this than bioperl-l,
> where OBDA v2 talk came up again in discussion of a BioPerl indexing
> problem. Archive links for thread here:
> 
> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035807.html
> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035808.html
> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035811.html
> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035812.html
> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035813.html
> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035822.html

yes, good idea...

>> On Nov 1, 2011, at 1:06 PM, Peter Cock wrote:
>>> 
>>> Yes, we're using SQLite3 to store essentially a list of filenames
>>> and their format as one table, and then in the main table an
>>> entry for each sequence recording the ID (only one accession,
>>> unlike OBDA which had infrastructure for a secondary accession),
>>> file number, offset of the start of the record, and optionally the
>>> length of the record on disk.
>>> 
>>> i.e. Basically what OBDA does, but using SQLite rather
>>> than BDB (not included in Python 3) or a flat file index
>>> (poor performance with large datasets).
>>> 
>>> I find this design attractive on several levels:
>>> * File format neutral, covers FASTA, FASTQ, GenBank, etc
>>> * Preserves the original file untouched
>>> * Index is a small single file (thanks to SQLite)
>>> * Back end could be switched out
>>> * Could be applied to compressed file formats
>>> * Reuses existing parsing code to access entries
>>> 
>>> This could easily form basis of OBDA v2, the main points
>>> of difference I anticipate between the Bio* projects would
>>> be naming conventions for the different file formats, and
>>> what we consider to be the default record ID of each read
>>> (e.g. which field in a GenBank file - although agreement
>>> here is not essential). Some of that was already settled in
>>> principle with OBDA v1.
>> 
>> The primary/secondary IDs could be configurable with a sane
>> default, I think the bioperl implementations allowed this (and
>> it is certainly something that will be requested).
> 
> One reason I went with a single ID only was to keep the
> Python dictionary based API simple (think hash in Perl).
> You don't get secondary keys in a Python dict or a hash ;)
> 
> As a nod to flexibility, in Biopython's Bio.SeqIO indexing you
> can provide a call back function to map the suggested ID to
> something else. Obviously this doesn't give the full flexibility
> of extracting a field from the record's annotation because we
> don't parse the whole record during indexing (it would be too
> slow).

Same with bioperl.

> However, I'm happy for there to be an *optional* secondary
> key in an OBDA v2 SQLite schema, but Biopython probably
> won't populate it. We could optionally use it rather than the
> primary ID on loading an existing index though.

Optional implementation of that is fine by me.

> Personally I would stick with one key in the index - it should
> be faster and makes it simpler to switch the back end if we
> need to later. If anyone wants a second key, they can build
> a second index *grin*.

That's easy enough.

>>> On the other hand, you could try and store the parsed data
>>> itself, which is where NOSQL looks more interesting. That
>>> essentially requires the ability to serialise your annotated
>>> sequence object model to disk - which would be tricky to do
>>> cross project (much more ambitious than BioSQL is). It also
>>> means the "index" becomes very large because it now holds
>>> all the original data.
>>> 
>>> Peter
>> 
>> For a fully cross-Bio* compliant format, I don't think it's feasible
>> to use serialized data unless they are serialized in something
>> that is easily deserialized across HLLs (JSON, BSON, YAML,
>> XML, etc).  Either that, or such data is stored concurrently with
>> the binary blob, along with meta data that indicates the source
>> of the blob, parser, version, etc, etc (unless there are tools out
>> there that reliably interconvert serialized complex data structures
>> between HLLs).  Any way you go about it, it seems like it could
>> be a major ball of hurt, unless implemented very carefully.
> 
> You missed out RDF as a serialisation ;)
> 
> But yes, going down the shared serialisation route is going
> to be messy - as you are well aware:
> 
>> Aside: I think this was one of the problems with
>> Bio::DB::SeqFeature::Store, in that it at one point stored
>> Perl-specific Storable blobs.
>> 
>> chris
> 
> Peter

yes, it's a problem w/o an easy solution.  Anyway, I think an implementation of such at this point would be a premature optimization.  

chris

From biojiangke at gmail.com  Tue Nov  8 17:29:54 2011
From: biojiangke at gmail.com (vitis)
Date: Tue, 8 Nov 2011 14:29:54 -0800 (PST)
Subject: [Bioperl-l] Some questions about the Bio::PopGen
In-Reply-To: <BANLkTiktxeprLh+LxNr50cFZO+KweZCVFg@mail.gmail.com>
References: <BANLkTiktxeprLh+LxNr50cFZO+KweZCVFg@mail.gmail.com>
Message-ID: <32805996.post@talk.nabble.com>


I think the pi calculated by the function isn't really pi as usually defined. You
need to divide the value by the total number of sites (in your case 5,
which is the sequence length, not the number of individuals). I think the reason
it is implemented this way is that it is sometimes easier to work with
variable sites only.
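
The arithmetic can be checked with a few lines of plain Perl on the five sequences from the question. This is only a sketch of the usual pairwise-difference definition of pi, not the Bio::PopGen::Statistics code, and the function name is made up.

```perl
use strict;
use warnings;

# Average number of differences over all pairs of equal-length sequences.
sub mean_pairwise_differences {
    my @seqs = @_;
    my ($total, $pairs) = (0, 0);
    for my $i (0 .. $#seqs - 1) {
        for my $j ($i + 1 .. $#seqs) {
            # grep in scalar context counts the mismatching positions
            $total += grep { substr($seqs[$i], $_, 1) ne substr($seqs[$j], $_, 1) }
                      0 .. length($seqs[$i]) - 1;
            $pairs++;
        }
    }
    return $total / $pairs;
}

my @seqs  = qw(AAGGT ATGGC ATGCT ATGCT ATGGT);    # A01..A05 from the question
my $sites = length $seqs[0];
my $sum   = mean_pairwise_differences(@seqs);
print "pi over all sites: $sum\n";                 # 1.4, the Bio::PopGen value
print "pi per site:       ", $sum / $sites, "\n";  # 0.28, the DnaSP value
```

The 14 mismatches spread over 10 pairs give 1.4, and dividing by the 5 sites gives 0.28.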

The aln_to_population function converts an alignment object into a population
object. You won't see the object unless you write additional code to
write it out or run some calculations on it.

The third question depends on your specific needs. For population-level
analyses of molecular evolution, I usually create a multiple sequence
alignment with another application (clustalw etc.), then manually adjust the
alignment to make sure it represents homology. I don't touch the
alignment once this is done; I only save it as an aln file (or whatever format you
want) as input for analysis tools such as Bio::PopGen (usually via
the aln_to_population function you're using now).


Qian Zhao wrote:
> 
> Hi
> Recently, I have been learning how to calculate pi, Fst, Tajima D using
> Bio::PopGen.
> I am not familiar with Perl and I am really confused by the following
> problems.
> (1) I use Bio::PopGen::Statistics to calculate pi. The sequences I used
> for the calculation are these:
>     __DATA__
> 01 A01 A
> 01 A02 A
> 01 A03 A
> 01 A04 A
> 01 A05 A
> 02 A01 A
> 02 A02 T
> 02 A03 T
> 02 A04 T
> 02 A05 T
> 03 A01 G
> 03 A02 G
> 03 A03 G
> 03 A04 G
> 03 A05 G
> 04 A01 G
> 04 A02 G
> 04 A03 C
> 04 A04 C
> 04 A05 G
> 05 A01 T
> 05 A02 C
> 05 A03 T
> 05 A04 T
> 05 A05 T
> And I am not sure whether the sequences below correctly represent the
> prettybase data above:
>>A01
> AAGGT
>>A02
> ATGGC
>>A03
> ATGCT
>>A04
> ATGCT
>>A05
> ATGGT
> The pi is 1.4 using Bio::PopGen::Statistics. However, the pi is 0.28 if I
> use DnaSP. I find that 1.4/5=0.28, which means that if the number
> from Bio::PopGen::Statistics is divided by the number of individuals, the
> result
> would be exactly the same. Is there something wrong in my Perl script? The
> code I used is below:
> #!/usr/bin/perl -w
> use warnings;
> use strict;
> use Bio::PopGen::Genotype;
>  my $genotype = Bio::PopGen::Genotype->new(-marker_name   => 'gene_1',
>                                            -individual_id => '001',
>                                            -alleles       => ['1','5'] );
> use Bio::PopGen::Individual;
>  my $ind = Bio::PopGen::Individual->new(-unique_id  => '001',
>                                         -genotypes  => [$genotype] );
> $ind->add_Genotype(
>    Bio::PopGen::Genotype->new(-alleles     => ['1', '5'],
>                               -marker_name => 'gene_1')
>  );
>  $ind->add_Genotype(
>    Bio::PopGen::Genotype->new(-alleles     => ['1', '5'],
>                               -marker_name => 'gene_1')
>  );
>  $ind->add_Genotype(
>    Bio::PopGen::Genotype->new(-alleles     => ['1', '5'],
>                               -marker_name => 'gene_1')
>  );
>  $ind->add_Genotype(
>    Bio::PopGen::Genotype->new(-alleles     => ['1', '5'],
>                               -marker_name => 'gene_1')
>  );
>  use Bio::PopGen::Population;
>  my $pop = Bio::PopGen::Population->new(-name        => 'Bm',
>                                         -description => 'description',
>                                         -individuals => [$ind] );
> use Bio::PopGen::IO;
> use Bio::PopGen::Statistics;
> my $nummarkers = $pop->get_marker_names;
> my $stats = Bio::PopGen::Statistics->new();
> my $io = Bio::PopGen::IO->new (-format => 'prettybase',
>                                -file => '1.txt');
> if( my $pop = $io->next_population ) {
>     my $pi = $stats->pi($pop, $nummarkers);
>     print "pi is $pi\n";
> my @inds;
>     for my $ind ( $pop->get_Individuals ) {
>         if( $ind->unique_id =~ /A0[1-3]/ ) {
>             push @inds, $ind;
>         }
>     }
>     print "pi for inds 1,2,3 is ", $stats->pi(\@inds),"\n";
> }
> 
> (2) I want to use Bio::PopGen::Utilities to convert the alignment file
> into a population file. However, I cannot find the result file after
> running the program. I use the following code:
> use Bio::PopGen::Utilities;
>   use Bio::AlignIO;
> 
>   my $in = Bio::AlignIO->new(-file   => 't/data/t7.aln',
>                             -format => 'clustalw');
> my $aln = $in->next_aln;
> my $pop = Bio::PopGen::Utilities->aln_to_population(-alignment => $aln);
> my $synpop = Bio::PopGen::Utilities->aln_to_population(-site_model => 'cod',
>                                                        -alignment  => $aln);
> I am not sure where I should add my result file's name in the code.
> (3) My file contains a lot of individual sequences, and each individual
> has
> one genotype. I'd like to know how I can use Bio::PopGen::Individual,
> Bio::PopGen::Population and Bio::PopGen::Genotype to create a file that
> can be used with Bio::PopGen::Statistics.
> 
> I would greatly appreciate any answers. Thanks for your time
> and best wishes!
>                                                    Qian
> 
> 

-- 
View this message in context: http://old.nabble.com/Some-questions-about-the-Bio%3A%3APopGen-tp31378987p32805996.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From biojiangke at gmail.com  Tue Nov  8 17:51:22 2011
From: biojiangke at gmail.com (vitis)
Date: Tue, 8 Nov 2011 14:51:22 -0800 (PST)
Subject: [Bioperl-l] questions about the bioperl module
 Bio::PopGen::Statistics
In-Reply-To: <201106012030039537050@gmail.com>
References: <201106012030039537050@gmail.com>
Message-ID: <32805997.post@talk.nabble.com>


If you read the Bio::PopGen doc, you'll see there is an optional argument for
the function that calculates pi, which takes the number of sites into
account. Also, when you use the aln_to_population function to read in an
alignment, you can use the option to include all sites, including the
monomorphic sites. I think if you implement both in your script, you'll get
the same pi value as from other applications like DnaSP.

In terms of sliding window analyses, you may have to implement your own
method to move along the windows, but I think DnaSP can already do that, so you
don't have to write your own script.
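
If you do want to script it, a sliding window is just a loop over column ranges. The sketch below is plain Perl on toy sequences and does not use Bio::PopGen; the function names, window size and step are all made up for illustration, and the sequences are assumed to be aligned and of equal length.

```perl
use strict;
use warnings;

# Per-site pi (mean pairwise differences divided by the number of sites)
# for a set of aligned, equal-length sequences.
sub pi_per_site {
    my @seqs = @_;
    my ($total, $pairs) = (0, 0);
    for my $i (0 .. $#seqs - 1) {
        for my $j ($i + 1 .. $#seqs) {
            $total += grep { substr($seqs[$i], $_, 1) ne substr($seqs[$j], $_, 1) }
                      0 .. length($seqs[$i]) - 1;
            $pairs++;
        }
    }
    return $total / $pairs / length($seqs[0]);
}

# Slide a window of $size columns along the alignment in steps of $step.
# Returns a list of [first column, last column, pi], columns 1-based.
sub sliding_pi {
    my ($seqs, $size, $step) = @_;
    my @windows;
    for (my $start = 0; $start + $size <= length($seqs->[0]); $start += $step) {
        my @slice = map { substr($_, $start, $size) } @$seqs;
        push @windows, [ $start + 1, $start + $size, pi_per_site(@slice) ];
    }
    return @windows;
}

my @seqs = qw(AAGGT ATGGC ATGCT ATGCT ATGGT);    # toy alignment
printf "%d-%d\t%.2f\n", @$_ for sliding_pi(\@seqs, 2, 1);
# 1-2  0.20
# 2-3  0.20
# 3-4  0.30
# 4-5  0.50
```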
  

lvu.jun wrote:
> 
> Hi, there,
> I am trying to calculate the population genetics parameters such as pi
> using the bioperl module Bio::PopGen::Statistics. But I found that the
> method only requires the marker genotypes of each individual
> in the population as input. I don't understand why the module does not take the DNA
> sequence length into consideration when calculating the pi value.
> According to the definition of pi, besides the polymorphic
> sites, the monomorphic sites should also be counted in
> the denominator. Is that right? I am therefore
> confused about how the module can correctly calculate
> the pi value from the marker (polymorphic) genotypes alone.
> Another question, if I want to calculate the pi value using the sliding
> window along the genome, how can I do this using the
> Bio::PopGen::Statistics module?
> Thanks for your help!
> Yours sincerely,
> Jun
> 
> Chinese Academy of Sciences
> 
> 2011-06-01 
> 
> 
> 
> lvu.jun 
> 
> 

-- 
View this message in context: http://old.nabble.com/questions-about-the-bioperl-module-Bio%3A%3APopGen%3A%3AStatistics-tp31749977p32805997.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From shachigahoimbi at gmail.com  Wed Nov  9 00:22:33 2011
From: shachigahoimbi at gmail.com (Shachi Gahoi)
Date: Wed, 9 Nov 2011 10:52:33 +0530
Subject: [Bioperl-l] Run FGENESH using bioperl
Message-ID: <CACyyM1ZOiMspVH3hF4fJOvedw=8YzZDuuzJRHsuJUJ=mkuYyng@mail.gmail.com>

Dear All.

I have a multi-FASTA sequence file, and I would like to run FGENESH on
each of the sequences stored in it, one at a time.

Is this possible with BioPerl?

Please advise.

Thanks in advance.


-- 
Regards,
Shachi

From pankajt322 at gmail.com  Thu Nov  3 08:12:44 2011
From: pankajt322 at gmail.com (pankaj)
Date: Thu, 3 Nov 2011 05:12:44 -0700 (PDT)
Subject: [Bioperl-l] extract ORF ID from fasta file using bioperl
In-Reply-To: <CACyyM1brvFU1N3NOqUDfJ8eBSNqNZSsHGkxaLoK2Euik0Z2s0w@mail.gmail.com>
References: <CACyyM1brvFU1N3NOqUDfJ8eBSNqNZSsHGkxaLoK2Euik0Z2s0w@mail.gmail.com>
Message-ID: <bc50d79e-005d-46a9-ac0f-6237d43df0f4@u10g2000prl.googlegroups.com>


On Oct 21, 1:59 am, Shachi Gahoi <shachigahoi... at gmail.com> wrote:
> Dear all,
>
> I have a FASTA-format sequence file. I want to extract the ORF ID "PITG_14194"
> from the file and then rename the file with that ORF ID
> ("PITG_14194").
>
> I have many files and I want to do the same with all of them.
>
> Please tell me how I can do this with Perl or BioPerl.
>
> >tr|D0NNU7|D0NNU7_PHYIT Carbohydrate esterase, putative OS=Phytophthora infestans (strain T30-4) GN=PITG_14194 PE=4 SV=1
> MVKLSIVSSTMQSLLAPLLRVWTDPERRRKFLRWLFGGTSGAIALLLILEATRGFCRTPL
> ETAQLLAGISWTLCKITVQFVARGFKPKFAKWTLRYELLHGLMRTAATMFGERIVDLQHA
> RVIRHHTGMFGTVLGSFARWQNEMRLESVRLNGLEHIWLKSSTCTTETKSERKRLVVLFF
> HGGGYAVLSPRMYISFCSAVAGAIRQQLASDDVDVDVFLANYRKLPEHKFPVPAEDAVAM
> YEYLLQHEKLEPSQIILAGDSAGGGLVMSTLLRVRDGLSSWKSKLPLPLAAIVMCPLADL
> TWDEDEIAGQHCVLPLNMTAASVLTYHPTRDDPSTWADASPVHCNLQGLPPVFLQSATLD
> RLFQHSVRLAAKAKADGLVNWEVDIHEGVPHVFMVIPAYVLPYARVGVGRMAAFAAKQFR
> NGIAVDHKGVICNGKAPIEIAVDENTLSAAA
>
> --
> Regards,
> Shachi


From azaballos at isciii.es  Wed Nov  9 06:28:21 2011
From: azaballos at isciii.es (Angel Zaballos)
Date: Wed, 9 Nov 2011 12:28:21 +0100
Subject: [Bioperl-l] bp_genbank2gff.pl  bug
Message-ID: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>

Running bp_genbank2gff.pl, I got this:

[root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession AAXT01000001.1 > babesichr3.gff
Replacement list is longer than search list at /usr/share/perl5/Bio/Range.pm line 251.


Ángel Zaballos
Unidad de Genómica
Centro Nacional de Microbiología-ISCIII
Carretera Majadahonda-Pozuelo, Km 2,2
28220-Majadahonda

Tel: 918223994
mail:  azaballos at isciii.es


************************* LEGAL NOTICE *************************
This electronic message is intended exclusively for its addressees and
may contain attached documents of a private and confidential nature.
If you have received this message in error and are not among the
addressees, please do not use, report, distribute, print or copy its
content by any means. Please notify the sender and completely delete
the message and its attachments. The Instituto de Salud Carlos III
assumes no legal liability for the content of this message when it does
not correspond to the functions assigned to the sender under current
regulations.


From scott at scottcain.net  Wed Nov  9 11:12:02 2011
From: scott at scottcain.net (Scott Cain)
Date: Wed, 9 Nov 2011 11:12:02 -0500
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
Message-ID: <CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>

Hi Angel,

I would suggest using bp_genbank2gff3.pl, as it is more actively
maintained; the bp_genbank2gff.pl script hasn't really been touched in many
years, and I imagine it's suffering from some serious code rot.

Scott


2011/11/9 Angel Zaballos <azaballos at isciii.es>

> Running bp_genbank2gff.pl got this:
>
> [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession
> AAXT01000001.1 > babesichr3.gff
> Replacement list is longer than search list at
> /usr/share/perl5/Bio/Range.pm line 251.
>
>
>
> Ángel Zaballos
> Unidad de Genómica
> Centro Nacional de Microbiología-ISCIII
> Carretera Majadahonda-Pozuelo, Km 2,2
> 28220-Majadahonda
>
> Tel: 918223994
> mail:  azaballos at isciii.es
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From carandraug+dev at gmail.com  Wed Nov  9 11:13:10 2011
From: carandraug+dev at gmail.com (Carnë Draug)
Date: Wed, 9 Nov 2011 16:13:10 +0000
Subject: [Bioperl-l] extract ORF ID from fasta file using bioperl
In-Reply-To: <bc50d79e-005d-46a9-ac0f-6237d43df0f4@u10g2000prl.googlegroups.com>
References: <CACyyM1brvFU1N3NOqUDfJ8eBSNqNZSsHGkxaLoK2Euik0Z2s0w@mail.gmail.com>
	<bc50d79e-005d-46a9-ac0f-6237d43df0f4@u10g2000prl.googlegroups.com>
Message-ID: <CAPOrs_030887wt=T7ZJyDUid92poO+FX4kKkRFTzWweXi5ffvw@mail.gmail.com>

On 3 November 2011 12:12, pankaj <pankajt322 at gmail.com> wrote:
>
>
> On Oct 21, 1:59 am, Shachi Gahoi <shachigahoi... at gmail.com> wrote:
>> Dear all,
>>
>> I have fasta format sequence file and I want to extract ORF ID "PITG_14194"
>> from fasta file and then I want to rename same file with that ORF ID
>> "PITG_14194".
>>
>> I have many files and I want to do same exercise with all sequence files.
>>
>> Please tell me how can i do this with perl or bioperl.
>>
>> >tr|D0NNU7|D0NNU7_PHYIT Carbohydrate esterase, putative OS=Phytophthora
>>
>> infestans (strain T30-4) GN=PITG_14194 PE=4 SV=1
>> MVKLSIVSSTMQSLLAPLLRVWTDPERRRKFLRWLFGGTSGAIALLLILEATRGFCRTPL
>> ETAQLLAGISWTLCKITVQFVARGFKPKFAKWTLRYELLHGLMRTAATMFGERIVDLQHA
>> RVIRHHTGMFGTVLGSFARWQNEMRLESVRLNGLEHIWLKSSTCTTETKSERKRLVVLFF
>> HGGGYAVLSPRMYISFCSAVAGAIRQQLASDDVDVDVFLANYRKLPEHKFPVPAEDAVAM
>> YEYLLQHEKLEPSQIILAGDSAGGGLVMSTLLRVRDGLSSWKSKLPLPLAAIVMCPLADL
>> TWDEDEIAGQHCVLPLNMTAASVLTYHPTRDDPSTWADASPVHCNLQGLPPVFLQSATLD
>> RLFQHSVRLAAKAKADGLVNWEVDIHEGVPHVFMVIPAYVLPYARVGVGRMAAFAAKQFR
>> NGIAVDHKGVICNGKAPIEIAVDENTLSAAA
>>

---------- Forwarded message ----------
From: Jason Stajich <jason.stajich at gmail.com>
Date: 21 October 2011 10:56
Subject: Re: [Bioperl-l] extract ORF ID from fasta file using bioperl
To: Shachi Gahoi <shachigahoimbi at gmail.com>
Cc: bioperl-l at bioperl.org


It is easy to do this with a simple regular expression and opening a new
file.  Have you read up on this concept in Perl?
You can use SeqIO to parse FASTA files - did you read the HOWTO and
website documentation first?

We don't typically do people's work for them on this mailing list so
please show some effort first.
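
For readers following the thread, the approach described above (a regular
expression on the header line, then a rename) might look like the minimal
sketch below. It assumes the whole header sits on one line in the real file,
that the ORF ID is the value of the GN= field as in the example header, and
that the renamed file should be called <ID>.fasta in the current directory.
The sub names and the .fasta suffix are illustrative choices, not part of
BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the value of the GN= field of a UniProt-style FASTA header,
# or undef when the header has no such field.
sub orf_id_from_header {
    my ($header) = @_;
    return $header =~ /\bGN=(\S+)/ ? $1 : undef;
}

# Return the ORF ID found in the first header line of a FASTA file.
sub orf_id_from_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    while (my $line = <$fh>) {
        next unless $line =~ /^>/;
        return orf_id_from_header($line);
    }
    return;    # no header line at all
}

# Rename every file given on the command line to <ORF ID>.fasta
# (hypothetical naming scheme), never overwriting an existing file.
for my $path (@ARGV) {
    my $id = orf_id_from_file($path);
    unless (defined $id) {
        warn "No GN= field found in $path; skipped\n";
        next;
    }
    my $new = "$id.fasta";
    if (-e $new) {
        warn "$new already exists; $path not renamed\n";
        next;
    }
    rename $path, $new or warn "Could not rename $path: $!\n";
}
```

This would be run as something like "perl rename_by_orf.pl *.fasta" (the
script name is hypothetical). Bio::SeqIO could replace the hand-rolled header
read; with the fasta format, the desc method of each sequence object holds
the header text that contains GN=.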


From scott at scottcain.net  Wed Nov  9 13:43:00 2011
From: scott at scottcain.net (Scott Cain)
Date: Wed, 9 Nov 2011 13:43:00 -0500
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
	<CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>
	<0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu>
Message-ID: <CA+JTaoxrmzW8E1omw5cA1DSLfg=xSjxJDvh9a=_cEz913qTbCg@mail.gmail.com>

Hi Chris,

Actually, removing it from the distribution (but letting it remain in the
code repository) is not a bad idea.  I can't really think of a down side.

Scott


2011/11/9 Fields, Christopher J <cjfields at illinois.edu>

> Scott,
>
> Do we want to add that caveat to the bp_genbank2gff.pl documentation (or
> remove it altogether)?
>
> chris
>
> On Nov 9, 2011, at 10:12 AM, Scott Cain wrote:
>
> > Hi Angel,
> >
> > I would suggest using bp_genbank2gff3.pl, as it is more actively
> > maintained; the bp_genbank2gff.pl script hasn't really been touched in
> many
> > years, and I imagine it's suffering from some serious code rot.
> >
> > Scott
> >
> >
> > 2011/11/9 Angel Zaballos <azaballos at isciii.es>
> >
> >> Running bp_genbank2gff.pl got this:
> >>
> >> [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession
> >> AAXT01000001.1 > babesichr3.gff
> >> Replacement list is longer than search list at
> >> /usr/share/perl5/Bio/Range.pm line 251.
> >>
> >>
> >>
> >> Ángel Zaballos
> >> Unidad de Genómica
> >> Centro Nacional de Microbiología-ISCIII
> >> Carretera Majadahonda-Pozuelo, Km 2,2
> >> 28220-Majadahonda
> >>
> >> Tel: 918223994
> >> mail:  azaballos at isciii.es
> >>
> >>
> >>
> >>
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >
> >
> >
> > --
> > ------------------------------------------------------------------------
> > Scott Cain, Ph. D.                                   scott at scottcain
> dot
> > net
> > GMOD Coordinator (http://gmod.org/)                     216-392-3087
> > Ontario Institute for Cancer Research
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From cjfields at illinois.edu  Wed Nov  9 13:39:52 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 9 Nov 2011 18:39:52 +0000
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
	<CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>
Message-ID: <0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu>

Scott,

Do we want to add that caveat to the bp_genbank2gff.pl documentation (or remove it altogether)?

chris

On Nov 9, 2011, at 10:12 AM, Scott Cain wrote:

> Hi Angel,
> 
> I would suggest using bp_genbank2gff3.pl, as it is more actively
> maintained; the bp_genbank2gff.pl script hasn't really been touched in many
> years, and I imagine it's suffering from some serious code rot.
> 
> Scott
> 
> 
> 2011/11/9 Angel Zaballos <azaballos at isciii.es>
> 
>> Running bp_genbank2gff.pl got this:
>> 
>> [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession
>> AAXT01000001.1 > babesichr3.gff
>> Replacement list is longer than search list at
>> /usr/share/perl5/Bio/Range.pm line 251.
>> 
>> 
>> 
>> Ángel Zaballos
>> Unidad de Genómica
>> Centro Nacional de Microbiología-ISCIII
>> Carretera Majadahonda-Pozuelo, Km 2,2
>> 28220-Majadahonda
>> 
>> Tel: 918223994
>> mail:  azaballos at isciii.es
>> 
>> 
>> 
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
> 
> 
> 
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain dot
> net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Wed Nov  9 14:51:48 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 9 Nov 2011 19:51:48 +0000
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <CA+JTaoxrmzW8E1omw5cA1DSLfg=xSjxJDvh9a=_cEz913qTbCg@mail.gmail.com>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
	<CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>
	<0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu>
	<CA+JTaoxrmzW8E1omw5cA1DSLfg=xSjxJDvh9a=_cEz913qTbCg@mail.gmail.com>
Message-ID: <C0212F3D-AFD7-41A4-9649-B876FAFA7C02@illinois.edu>

Scott,

It would remain in the repo history if it is removed; otherwise we can probably set up an 'unmaintained' folder.  Either would prevent it from being packaged and installed in future versions.

(Speaking of which, we should discuss (w/ Lincoln) possibly splitting out Bio::DB::SeqFeature/GFF and related code/tests/etc. into its own distribution, in line with slimming down core modules.)

chris

On Nov 9, 2011, at 12:43 PM, Scott Cain wrote:

> Hi Chris,
> 
> Actually, removing it from the distribution (but letting it remain in the code repository) is not a bad idea.  I can't really think of a down side.
> 
> Scott
> 
> 
> 2011/11/9 Fields, Christopher J <cjfields at illinois.edu>
> Scott,
> 
> Do we want to add that caveat to the bp_genbank2gff.pl documentation (or remove it altogether)?
> 
> chris
> 
> On Nov 9, 2011, at 10:12 AM, Scott Cain wrote:
> 
> > Hi Angel,
> >
> > I would suggest using bp_genbank2gff3.pl, as it is more actively
> > maintained; the bp_genbank2gff.pl script hasn't really been touched in many
> > years, and I imagine it's suffering from some serious code rot.
> >
> > Scott
> >
> >
> > 2011/11/9 Angel Zaballos <azaballos at isciii.es>
> >
> >> Running bp_genbank2gff.pl got this:
> >>
> >> [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession
> >> AAXT01000001.1 > babesichr3.gff
> >> Replacement list is longer than search list at
> >> /usr/share/perl5/Bio/Range.pm line 251.
> >>
> >>
> >>
> >> Ángel Zaballos
> >> Unidad de Genómica
> >> Centro Nacional de Microbiología-ISCIII
> >> Carretera Majadahonda-Pozuelo, Km 2,2
> >> 28220-Majadahonda
> >>
> >> Tel: 918223994
> >> mail:  azaballos at isciii.es
> >>
> >>
> >>
> >>
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >
> >
> >
> > --
> > ------------------------------------------------------------------------
> > Scott Cain, Ph. D.                                   scott at scottcain dot
> > net
> > GMOD Coordinator (http://gmod.org/)                     216-392-3087
> > Ontario Institute for Cancer Research
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 
> 
> 
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research


From carandraug+dev at gmail.com  Wed Nov  9 15:39:17 2011
From: carandraug+dev at gmail.com (Carnë Draug)
Date: Wed, 9 Nov 2011 20:39:17 +0000
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <CA+JTaoxrmzW8E1omw5cA1DSLfg=xSjxJDvh9a=_cEz913qTbCg@mail.gmail.com>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
	<CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>
	<0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu>
	<CA+JTaoxrmzW8E1omw5cA1DSLfg=xSjxJDvh9a=_cEz913qTbCg@mail.gmail.com>
Message-ID: <CAPOrs_2jZADvKkKE-1EbcMiXe8J74rNPd2y09N_BguVw4Er=hA@mail.gmail.com>

On 9 November 2011 18:43, Scott Cain <scott at scottcain.net> wrote:
> Hi Chris,
>
> Actually, removing it from the distribution (but letting it remain in the
> code repository) is not a bad idea.  I can't really think of a down side.
>
> Scott

Hi

can I suggest instead to simply make the script issue a warning right
at the start? Something like "bp_genbank2gff is obsolete and will be
removed from a future version of bioperl; please use bp_genbank2gff3
instead". You could leave it there for the next 2 releases and then
finally remove it. This would have 2 advantages:

1) people that have been using it will immediately know what to use as
replacement (instead of coming to ask on the mailing list)
2) people who use it but don't know anything about the subject,
someone told them to "just press this button" or "just type this in
the terminal", won't have suddenly a broken system and will have time
to find someone that will make it work again.

That's what's done in GNU Octave and I think it works well there.
Carnë
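
As an illustration of the suggestion above, a warning at the start of the
script could be as small as the following sketch. The deprecation_notice
helper is a hypothetical name used only for this example; the wording follows
the message proposed in this email.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the deprecation message for an old script and its replacement.
# (Hypothetical helper, not an existing BioPerl function.)
sub deprecation_notice {
    my ($old, $new) = @_;
    return "$old is obsolete and will be removed from a future version of "
         . "BioPerl; please use $new instead.\n";
}

# warn() writes to STDERR, so output redirected with "> file.gff"
# is not polluted by the notice.
warn deprecation_notice('bp_genbank2gff', 'bp_genbank2gff3');

# ... the rest of the old script would continue unchanged from here ...
```

Sending the notice to STDERR rather than STDOUT matters for this script in
particular, since it is typically run with its GFF output redirected into a
file.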


From scott at scottcain.net  Wed Nov  9 15:48:07 2011
From: scott at scottcain.net (Scott Cain)
Date: Wed, 9 Nov 2011 15:48:07 -0500
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <CAPOrs_2jZADvKkKE-1EbcMiXe8J74rNPd2y09N_BguVw4Er=hA@mail.gmail.com>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
	<CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>
	<0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu>
	<CA+JTaoxrmzW8E1omw5cA1DSLfg=xSjxJDvh9a=_cEz913qTbCg@mail.gmail.com>
	<CAPOrs_2jZADvKkKE-1EbcMiXe8J74rNPd2y09N_BguVw4Er=hA@mail.gmail.com>
Message-ID: <CA+JTaowZXPfbXawZGC+N_JVX5BJpKMTVRbSoh7kd6oWxCYbJZg@mail.gmail.com>

Hi Carnë,

You are absolutely correct; that is the right way to do it.  I'll add that
right now (and if the original post's fix is an easy one, I'll fix that too
:-)

Scott


2011/11/9 Carnë Draug <carandraug+dev at gmail.com>

> On 9 November 2011 18:43, Scott Cain <scott at scottcain.net> wrote:
> > Hi Chris,
> >
> > Actually, removing it from the distribution (but letting it remain in the
> > code repository) is not a bad idea.  I can't really think of a down side.
> >
> > Scott
>
> Hi
>
> can I suggest instead to simply make the script issue a warning right
> at the start? Something like "bp_genbank2gff is obsolete and will be
> removed from a future version of bioperl; please use bp_genbank2gff3
> instead". You could leave it there for the next 2 releases and then
> finally remove it. This would have 2 advantages:
>
> 1) people that have been using it will immediately know what to use as
> replacement (instead of coming to ask on the mailing list)
> 2) people who use it but don't know anything about the subject,
> someone told them to "just press this button" or "just type this in
> the terminal", won't have suddenly a broken system and will have time
> to find someone that will make it work again.
>
> That's what's done in GNU Octave and I think it works well there.
> Carnë
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From cjfields at illinois.edu  Wed Nov  9 16:59:48 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 9 Nov 2011 21:59:48 +0000
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <CA+JTaowZXPfbXawZGC+N_JVX5BJpKMTVRbSoh7kd6oWxCYbJZg@mail.gmail.com>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
	<CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>
	<0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu>
	<CA+JTaoxrmzW8E1omw5cA1DSLfg=xSjxJDvh9a=_cEz913qTbCg@mail.gmail.com>
	<CAPOrs_2jZADvKkKE-1EbcMiXe8J74rNPd2y09N_BguVw4Er=hA@mail.gmail.com>
	<CA+JTaowZXPfbXawZGC+N_JVX5BJpKMTVRbSoh7kd6oWxCYbJZg@mail.gmail.com>
Message-ID: <C86AC2F8-F8E8-431D-83A6-39E896C23485@illinois.edu>

Works for me; it's a standard deprecation policy.  The only caveat is that the next 'release' of the code would be when the related code is split out into its own distribution (which will require its own versioning).

chris

On Nov 9, 2011, at 2:48 PM, Scott Cain wrote:

> Hi Carnë,
> 
> You are absolutely correct; that is the right way to do it.  I'll add that right now (and if the original post's fix is an easy one, I'll fix that too :-)
> 
> Scott
> 
> 
> 2011/11/9 Carnë Draug <carandraug+dev at gmail.com>
> On 9 November 2011 18:43, Scott Cain <scott at scottcain.net> wrote:
> > Hi Chris,
> >
> > Actually, removing it from the distribution (but letting it remain in the
> > code repository) is not a bad idea.  I can't really think of a down side.
> >
> > Scott
> 
> Hi
> 
> can I suggest instead to simply make the script issue a warning right
> at the start? Something like "bp_genbank2gff is obsolete and will be
> removed from a future version of bioperl; please use bp_genbank2gff3
> instead". You could leave it there for the next 2 releases and then
> finally remove it. This would have 2 advantages:
> 
> 1) people that have been using it will immediately know what to use as
> replacement (instead of coming to ask on the mailing list)
> 2) people who use it but don't know anything about the subject,
> someone told them to "just press this button" or "just type this in
> the terminal", won't have suddenly a broken system and will have time
> to find someone that will make it work again.
> 
> That's what's done in GNU Octave and I think it works well there.
> Carnë
> 
> 
> 
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research


From biopython at maubp.freeserve.co.uk  Thu Nov 10 08:09:40 2011
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 10 Nov 2011 13:09:40 +0000
Subject: [Bioperl-l] [Bioperl-announce-l] Null Pointer - NCBI C++
	Exception
In-Reply-To: <31659982.post@talk.nabble.com>
References: <31659982.post@talk.nabble.com>
Message-ID: <CAKVJ-_445daDbwg6QkxLUhnji1BhS+YvEehBhB9D3ij1M22tmw@mail.gmail.com>

Re: http://lists.open-bio.org/pipermail/bioperl-announce-l/2011-May/000253.html

On Thu, May 19, 2011 at 11:15 PM, rgoldade <rgoldade at sfu.ca> wrote:
>
> I received the following error while trying to run bl2seq from
> standaloneblastplus. Has anyone else encountered this problem?
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: /usr/bin/blastp call crashed: There was a problem running
> /usr/bin/blastp : Error: NCBI C++ Exception:
>
> "/am/ncbiapdata/release/blast/src/2.2.24/Linux32-Suse-icc/c++/ICC1010-ReleaseMT--Linux32-Suse-icc/../src/corelib/ncbiobj.cpp",
> line 688: Critical: ncbi::CObject::ThrowNullPointerException() - Attempt to
> access NULL pointer.
>
> Thank you,
> Ryan

Just hit something very very similar, looks like a BLAST+ bug which I
will report now:

$ tblastn -out NC_003197_vs_NC_011294.tsv -outfmt 6 -query
NC_003197.fna -evalue 0.0001 -subject NC_011294.fna
Error: NCBI C++ Exception:
    "/am/ncbiapdata/release/blast/src/2.2.25/IntelMAC-universal/c++/GCC401-ReleaseMT--IntelMAC-universal/../src/corelib/ncbiobj.cpp",
line 689: Critical: ncbi::CObject::ThrowNullPointerException() -
Attempt to access NULL pointer.

This was on a Mac using BLAST 2.2.25+ (tblastn) whereas yours was
BLAST 2.2.24+ (blastp) from the look of the error. The line number has
changed by one, but I'm confident it is the same point of failure.

In my case I was comparing nucleotide against nucleotide, so should
have been using tblastx not tblastn, but it still shouldn't have had a
pointer exception.

Peter

From cjfields at illinois.edu  Thu Nov 10 09:00:46 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 10 Nov 2011 14:00:46 +0000
Subject: [Bioperl-l] [Bioperl-announce-l] Null Pointer - NCBI
	C++	Exception
In-Reply-To: <CAKVJ-_445daDbwg6QkxLUhnji1BhS+YvEehBhB9D3ij1M22tmw@mail.gmail.com>
References: <31659982.post@talk.nabble.com>
	<CAKVJ-_445daDbwg6QkxLUhnji1BhS+YvEehBhB9D3ij1M22tmw@mail.gmail.com>
Message-ID: <B1FB239F-28AD-4437-97C0-10110763F34D@illinois.edu>

On Nov 10, 2011, at 7:09 AM, Peter wrote:

> Re: http://lists.open-bio.org/pipermail/bioperl-announce-l/2011-May/000253.html
> 
> On Thu, May 19, 2011 at 11:15 PM, rgoldade <rgoldade at sfu.ca> wrote:
>> 
>> I received the following error while trying to run bl2seq from
>> standaloneblastplus. Has anyone else encountered this problem?
>> 
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: /usr/bin/blastp call crashed: There was a problem running
>> /usr/bin/blastp : Error: NCBI C++ Exception:
>> 
>> "/am/ncbiapdata/release/blast/src/2.2.24/Linux32-Suse-icc/c++/ICC1010-ReleaseMT--Linux32-Suse-icc/../src/corelib/ncbiobj.cpp",
>> line 688: Critical: ncbi::CObject::ThrowNullPointerException() - Attempt to
>> access NULL pointer.
>> 
>> Thank you,
>> Ryan
> 
> Just hit something very very similar, looks like a BLAST+ bug which I
> will report now:
> 
> $ tblastn -out NC_003197_vs_NC_011294.tsv -outfmt 6 -query
> NC_003197.fna -evalue 0.0001 -subject NC_011294.fna
> Error: NCBI C++ Exception:
>    "/am/ncbiapdata/release/blast/src/2.2.25/IntelMAC-universal/c++/GCC401-ReleaseMT--IntelMAC-universal/../src/corelib/ncbiobj.cpp",
> line 689: Critical: ncbi::CObject::ThrowNullPointerException() -
> Attempt to access NULL pointer.
> 
> This was on a Mac using BLAST 2.2.25+ (tblastn) whereas yours was
> BLAST 2.2.24+ (blastp) from the look of the error. The line number has
> changed by one, but I'm confident it is the same point of failure.
> 
> In my case I was comparing nucleotide against nucleotide, so should
> have been using tblastx not tblastn, but it still shouldn't have had a
> pointer exception.
> 
> Peter

Yeah, that's bad.  I have seen a few things like this myself that make me worry about the transition to BLAST+.

chris

PS - Odd I didn't see this one, was it caught in the bioperl-announce filter?


From casaburi at ceinge.unina.it  Thu Nov 10 07:29:55 2011
From: casaburi at ceinge.unina.it (Giorgio C)
Date: Thu, 10 Nov 2011 04:29:55 -0800 (PST)
Subject: [Bioperl-l]  Scripting help to identify adaptors count in reads
Message-ID: <32818254.post@talk.nabble.com>


Hi everybody,

I have some 454 reads that contain adaptors (shown as NNNN...): one, two or
three adaptors per read, depending on the read. Is there any way to
establish how many reads have 1 adaptor, how many have 2 and how many have
3, out of the total?

>271-88
GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCAGGTGCCTACG
>272-88
GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCANNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTTGCCAGCCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGC
>273-88
GCCTCCCTCGCGCATCAGATCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCA
>274-88
GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTCCCTCGCGCCATCAGATCGTNNNNNNNNNNNNNNNNNNTCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCATCAGATCGTAGGCACCATCAA

The problem is that some adaptors occur in the middle of the sequences
because the reads come from a concatemerization experimental design (the
miRNAs lie between the NNNNNN... runs). So I am looking for a script or tool
that reports how many reads have 1 adaptor, how many have 2, and so on (the
maximum is 4), relative to the total number of reads. Do you know of any
tool or script that may help? Can anyone suggest a script for this?

Thank you very much 
-- 
View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32818254.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From jovel_juan at hotmail.com  Thu Nov 10 11:06:16 2011
From: jovel_juan at hotmail.com (Juan Jovel)
Date: Thu, 10 Nov 2011 16:06:16 +0000
Subject: [Bioperl-l] Scripting help to identify adaptors count in reads
In-Reply-To: <32818254.post@talk.nabble.com>
References: <32818254.post@talk.nabble.com>
Message-ID: <COL102-W340F68DB20A406E5B3A9FBFADC0@phx.gbl>


There are many ways to do it. 
Perhaps the simplest is to count the number of times the adapter sequence (or part of it) appears in each read. 
For example: 
$adapter_matches = tr/adapter_sequence/adapter_sequence/;
# $adapter_matches will store the number of times the adapter sequence is repeated.

You then place that result in a hash bin:

my %adapter_frequency;
my $class = "$adapter_matches";
if (exists $adapter_frequency{$class}) {
	$adapter_frequency{$class}++
} else {
	$adapter_frequency{$class} = 1
}

# Then you can sort and output your classes
foreach $class (sort keys %adapter_frequency) {
	print "$class\t$adapter_frequency{$class}\n";
}

You can work out the details, but something like this should work.
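
One detail worth spelling out: Perl's tr/// counts individual characters from
its search list rather than occurrences of a whole string, so the counting
step may be easier with a regular expression. The sketch below assumes that
adaptors appear as runs of N, as in the reads posted in this thread, and
treats any run of at least 10 Ns as one adaptor. The threshold and the sub
names are assumptions made for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count the runs of 'N' in one read that are at least $min_len long.
# The default of 10 is an assumed threshold to ignore isolated Ns.
sub count_adapters {
    my ($seq, $min_len) = @_;
    $min_len = 10 unless defined $min_len;
    my $count = 0;
    while ($seq =~ /(N+)/g) {
        $count++ if length($1) >= $min_len;
    }
    return $count;
}

# Read FASTA records from a filehandle and return a hash reference
# mapping "number of adaptors" to "number of reads".
sub adapter_histogram {
    my ($fh, $min_len) = @_;
    my (%hist, $seq);
    my $flush = sub {
        $hist{ count_adapters($seq, $min_len) }++ if defined $seq;
    };
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>/) {       # a new record starts: tally the previous one
            $flush->();
            $seq = '';
        }
        elsif (defined $seq) {     # sequence line, possibly one of several
            $seq .= $line;
        }
    }
    $flush->();                    # tally the last record
    return \%hist;
}

# Print one line per class when a FASTA file is given on the command line.
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $hist = adapter_histogram($fh);
    print "$_\t$hist->{$_}\n" for sort { $a <=> $b } keys %$hist;
}
```

The output has two tab-separated columns: the number of adaptors and the
number of reads in that class. Dividing each count by the sum of all counts
gives the fraction relative to the total number of reads.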


> Date: Thu, 10 Nov 2011 04:29:55 -0800
> From: casaburi at ceinge.unina.it
> To: Bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l]  Scripting help to identify adaptors count in reads
> 
> 
> Hi everybody,
> 
> i have some reads (454) where there are adaptors (NNNN...), one,two or three
> adaptors for each reads depending on the reads. Is there any way to
> establish how many reads have 1 adaptors, how many 2 and how many 3 adaptors
> over the total ???
> 
> >271-88
> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCAGGTGCCTACG
> >272-88
> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCANNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTTGCCAGCCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGC
> >273-88
> GCCTCCCTCGCGCATCAGATCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCA
> >274-88
> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTCCCTCGCGCCATCAGATCGTNNNNNNNNNNNNNNNNNNTCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCATCAGATCGTAGGCACCATCAA
> 
> The problem is that some adpators occur in the middle of the sequences
> because they coming out from a concameration experimental design (they are
> miRNAs between NNNNNN...). So i want to know a script or tool that may say
> how many reads have 1 adapt, how many 2, (max are 4) in respect to the total
> number of reads. Do you know any tool/script that may help ? Tnx 
> Can anyone suggests me a script to fix this ???
> 
> Thank you very much 
> -- 
> View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32818254.html
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
 		 	   		  

From scott at scottcain.net  Thu Nov 10 11:55:53 2011
From: scott at scottcain.net (Scott Cain)
Date: Thu, 10 Nov 2011 11:55:53 -0500
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <0CACB816-21E2-494E-ACEF-67F9FBB5AC41@isciii.es>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
	<CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>
	<0CACB816-21E2-494E-ACEF-67F9FBB5AC41@isciii.es>
Message-ID: <CA+JTaoyWsS534Ew43Ye3v77o-+d8zK3rcYAup8PCEPApX_ispw@mail.gmail.com>

Hi Angel,

Please keep correspondence on the mailing list.

I just ran bp_genbank2gff.pl with a genbank file (fruit fly mitochondria),
and it worked fine.  I suspect there is something wrong with your genbank
file.

Scott


On Thu, Nov 10, 2011 at 3:15 AM, Angel Zaballos <azaballos at isciii.es> wrote:

> Hi Scott,
>
> Thanks everyone for your help. I tried bp_genbank2gff3.pl, but the same
> happened:
>
> [root at localhost zaballos]# bp_genbank2gff3.pl babesiaChr3.gbk >
> babesichr3_2.gff
> Replacement list is longer than search list at
> /usr/share/perl5/Bio/Range.pm line 251.
> UNIVERSAL->import is deprecated and will be removed in a future perl at
> /usr/share/perl5/Bio/Tree/TreeFunctionsI.pm line 94
>
> However, the output file seems to be correct (indeed, that was also the
> case for bp_genbank2gff.pl). I then ran ldHgGene and it worked:
>
> [zaballos at localhost ~]$ ./ldHgGene -out=babesiaChr3_2.gpe db tab
> babesiachr3_2.gff
> Reading babesiachr3_2.gff
> Read 4776 transcripts in 8821 lines in 1 files
>   4776 groups 1 seqs 1 sources 6 feature types
> 2379 gene predictions
>
> I'm using Fedora (for bioperl) and CentOS (for ldHgGene), virtualized on a
> Mac with Parallels. Maybe this is the cause of such a message.
>
> Regards
>
>
> Ángel
>
>
> On 09/11/2011, at 17:12, Scott Cain wrote:
>
> Hi Angel,
>
> I would suggest using bp_genbank2gff3.pl, as it is more actively
> maintained; the bp_genbank2gff.pl script hasn't really been touched in
> many years, and I imagine it's suffering from some serious code rot.
>
> Scott
>
>
> 2011/11/9 Angel Zaballos <azaballos at isciii.es>
>
>> Running bp_genbank2gff.pl got this:
>>
>> [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession
>> AAXT01000001.1 > babesichr3.gff
>> Replacement list is longer than search list at
>> /usr/share/perl5/Bio/Range.pm line 251.
>>
>>
>>
>> Ángel Zaballos
>> Unidad de Genómica
>> Centro Nacional de Microbiología-ISCIII
>> Carretera Majadahonda-Pozuelo, Km 2,2
>> 28220-Majadahonda
>>
>> Tel: 918223994
>> mail:  azaballos at isciii.es
>>
>>
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain
> dot net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research
>
>
> Ángel Zaballos
> Unidad de Genómica
> Centro Nacional de Microbiología-ISCIII
> Carretera Majadahonda-Pozuelo, Km 2,2
> 28220-Majadahonda
>
> Tel: 918223994
> mail:  azaballos at isciii.es
>
>
>
>
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot
net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From l.m.timmermans at students.uu.nl  Thu Nov 10 12:17:12 2011
From: l.m.timmermans at students.uu.nl (L.M. Timmermans)
Date: Thu, 10 Nov 2011 18:17:12 +0100
Subject: [Bioperl-l] Scripting help to identify adaptors count in reads
In-Reply-To: <COL102-W340F68DB20A406E5B3A9FBFADC0@phx.gbl>
References: <32818254.post@talk.nabble.com>
	<COL102-W340F68DB20A406E5B3A9FBFADC0@phx.gbl>
Message-ID: <CAC1jpXAW_MTQjBY8Z8ffr67g_0TrGwWddixuQvtTB19+S+DLVg@mail.gmail.com>

On Thu, Nov 10, 2011 at 5:06 PM, Juan Jovel <jovel_juan at hotmail.com> wrote:

>
> There are many ways to do it.
> Perhaps the simplest is to count the number of times the adapter sequence
> (or part of it) appears in each read.
> For example:
> $adapter_matches = tr/adapter_sequence/adapter_sequence/;#
> $adapter_matches will store the number of times the adapter sequence is
> repeated.
>

No, it will not. tr/// will count characters, not sequences. Something like
"scalar (() = $sequence =~ m/(N+)/g)" should work OTOH.
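
To make the difference concrete, here is a minimal sketch (the sub names and the sample read are made up for illustration). Assigning a global match to an empty list and then to a scalar counts the matches, whereas tr/// only counts individual characters:

```perl
use strict;
use warnings;

# Number of separate runs of N (one run per masked adaptor).
sub count_n_runs {
    my ($sequence) = @_;
    my $runs = () = $sequence =~ /N+/g;   # list assignment in scalar context
    return $runs;
}

# Number of individual N characters; this is what tr/// counts.
sub count_n_chars {
    my ($sequence) = @_;
    return ($sequence =~ tr/N//);         # empty replacement: count only
}

my $read = 'GCCTNNNNATCANNNNCTGA';        # hypothetical read with two N runs
print count_n_runs($read), " runs, ", count_n_chars($read), " N characters\n";
```

On this sample read the two subs give different numbers, which is exactly why tr/// is the wrong tool for counting adaptors.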

Leon


From cjfields at illinois.edu  Thu Nov 10 14:17:57 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 10 Nov 2011 19:17:57 +0000
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <CA+JTaoyWsS534Ew43Ye3v77o-+d8zK3rcYAup8PCEPApX_ispw@mail.gmail.com>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
	<CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>
	<0CACB816-21E2-494E-ACEF-67F9FBB5AC41@isciii.es>
	<CA+JTaoyWsS534Ew43Ye3v77o-+d8zK3rcYAup8PCEPApX_ispw@mail.gmail.com>
Message-ID: <66F13EAF-0DAA-45E0-AB5B-E71EC8FA2323@illinois.edu>

This is running using an older version of bioperl (probably 1.6.0 or 1.6.1).  The warnings pop up when using perl v5.12 or v5.14; the first warning is from a bad tr/// in Bio::Range, the second is from bad usage of UNIVERSAL functions.  Both have been addressed.
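
For anyone curious what triggers the first warning, here is a small sketch. It is not the actual code from Bio::Range (the strings and the helper name are invented); it only shows that perl warns at compile time when the replacement list of a tr/// is longer than the search list:

```perl
use strict;
use warnings;

# Compile a snippet with a string eval and return any warnings it raised.
sub tr_warning {
    my ($code) = @_;
    my $warning = '';
    local $SIG{__WARN__} = sub { $warning .= $_[0] };
    eval $code;    # the tr/// inside $code is compiled here
    return $warning;
}

# Three replacement characters for only two search characters.
print tr_warning(q{ my $s = 'ab'; $s =~ tr/ab/xyz/; });
```

The warning is harmless in itself (the extra replacement characters are ignored), which fits with the output file looking correct.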

chris

On Nov 10, 2011, at 10:55 AM, Scott Cain wrote:

> Hi Angel,
> 
> Please keep correspondence on the mailing list.
> 
> I just ran bp_genbank2gff.pl with a genbank file (fruit fly mitochondria),
> and it worked fine.  I suspect there is something wrong with your genbank
> file.
> 
> Scott
> 
> 
> On Thu, Nov 10, 2011 at 3:15 AM, Angel Zaballos <azaballos at isciii.es> wrote:
> 
>> Hi Scott,
>> 
>> Thanks everyone for your help. I tried bp_genbank2gff3.pl, but the same
>> happened:
>> 
>> [root at localhost zaballos]# bp_genbank2gff3.pl babesiaChr3.gbk >
>> babesichr3_2.gff
>> Replacement list is longer than search list at
>> /usr/share/perl5/Bio/Range.pm line 251.
>> UNIVERSAL->import is deprecated and will be removed in a future perl at
>> /usr/share/perl5/Bio/Tree/TreeFunctionsI.pm line 94
>> 
>> However, the output file seems to be correct (Indeed, that was also the
>> case for  bp_genbank2gff.pl). I then ran ldHgGene and worked:
>> 
>> [zaballos at localhost ~]$ ./ldHgGene -out=babesiaChr3_2.gpe db tab
>> babesiachr3_2.gff
>> Reading babesiachr3_2.gff
>> Read 4776 transcripts in 8821 lines in 1 files
>>  4776 groups 1 seqs 1 sources 6 feature types
>> 2379 gene predictions
>> 
>> I'm using Fedora (for bioperl) and CentOS (for ldHgGene), virtualized on a
>> Mac with Parallels. Maybe this is the cause for such a message.
>> 
>> Regards
>> 
>> 
>> Ángel
>> 
>> 
>> On 09/11/2011, at 17:12, Scott Cain wrote:
>> 
>> Hi Angel,
>> 
>> I would suggest using bp_genbank2gff3.pl, as it is more actively
>> maintained; the bp_genbank2gff.pl script hasn't really been touched in
>> many years, and I imagine it's suffering from some serious code rot.
>> 
>> Scott
>> 
>> 
>> 2011/11/9 Angel Zaballos <azaballos at isciii.es>
>> 
>>> Running bp_genbank2gff.pl got this:
>>> 
>>> [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession
>>> AAXT01000001.1 > babesichr3.gff
>>> Replacement list is longer than search list at
>>> /usr/share/perl5/Bio/Range.pm line 251.
>>> 
>>> 
>>> 
>>> Ángel Zaballos
>>> Unidad de Genómica
>>> Centro Nacional de Microbiología-ISCIII
>>> Carretera Majadahonda-Pozuelo, Km 2,2
>>> 28220-Majadahonda
>>> 
>>> Tel: 918223994
>>> mail:  azaballos at isciii.es
>>> 
>>> 
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> 
>> 
>> 
>> 
>> --
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D.                                   scott at scottcain
>> dot net
>> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>> Ontario Institute for Cancer Research
>> 
>> 
>> Ángel Zaballos
>> Unidad de Genómica
>> Centro Nacional de Microbiología-ISCIII
>> Carretera Majadahonda-Pozuelo, Km 2,2
>> 28220-Majadahonda
>> 
>> Tel: 918223994
>> mail:  azaballos at isciii.es
>> 
>> 
>> 
>> 
>> 
> 
> 
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain dot
> net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From biopython at maubp.freeserve.co.uk  Thu Nov 10 14:27:22 2011
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 10 Nov 2011 19:27:22 +0000
Subject: [Bioperl-l] [Bioperl-announce-l] Null Pointer - NCBI C++
	Exception
In-Reply-To: <B1FB239F-28AD-4437-97C0-10110763F34D@illinois.edu>
References: <31659982.post@talk.nabble.com>
	<CAKVJ-_445daDbwg6QkxLUhnji1BhS+YvEehBhB9D3ij1M22tmw@mail.gmail.com>
	<B1FB239F-28AD-4437-97C0-10110763F34D@illinois.edu>
Message-ID: <CAKVJ-_4+hGzxmn43qJ4SkJfCaPUQw=PkV5QSjUyqPSDmyVw64A@mail.gmail.com>

On Thu, Nov 10, 2011 at 2:00 PM, Fields, Christopher J
<cjfields at illinois.edu> wrote:
> On Nov 10, 2011, at 7:09 AM, Peter wrote:
>
>> Re: http://lists.open-bio.org/pipermail/bioperl-announce-l/2011-May/000253.html
>>
>> On Thu, May 19, 2011 at 11:15 PM, rgoldade <rgoldade at sfu.ca> wrote:
>>>
>>> I received the following error while trying to run bl2seq from
>>> standaloneblastplus. Has anyone else encountered this problem?
>>>
>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>> MSG: /usr/bin/blastp call crashed: There was a problem running
>>> /usr/bin/blastp : Error: NCBI C++ Exception:
>>>
>>> "/am/ncbiapdata/release/blast/src/2.2.24/Linux32-Suse-icc/c++/ICC1010-ReleaseMT--Linux32-Suse-icc/../src/corelib/ncbiobj.cpp",
>>> line 688: Critical: ncbi::CObject::ThrowNullPointerException() - Attempt to
>>> access NULL pointer.
>>>
>>> Thank you,
>>> Ryan
>>
>> Just hit something very very similar, looks like a BLAST+ bug which I
>> will report now:
>>
>> $ tblastn -out NC_003197_vs_NC_011294.tsv -outfmt 6 -query
>> NC_003197.fna -evalue 0.0001 -subject NC_011294.fna
>> Error: NCBI C++ Exception:
>>    "/am/ncbiapdata/release/blast/src/2.2.25/IntelMAC-universal/c++/GCC401-ReleaseMT--IntelMAC-universal/../src/corelib/ncbiobj.cpp",
>> line 689: Critical: ncbi::CObject::ThrowNullPointerException() -
>> Attempt to access NULL pointer.
>>
>> This was on a Mac using BLAST 2.2.25+ (tblastn) whereas yours was
>> BLAST 2.2.24+ (blastp) from the look of the error. The line number has
>> changed by one, but I'm confident it is the same point of failure.
>>
>> In my case I was comparing nucleotide against nucleotide, so should
>> have been using tblastx not tblastn, but it still shouldn't have had a
>> pointer exception.
>>
>> Peter
>
> Yeah, that's bad.  I have seen a few things like this myself that make me worry about the transition to BLAST+.
>
> chris

I'm told it is already fixed and will be part of BLAST 2.2.26+, which is good.

>
> PS - Odd I didn't see this one, was it caught in the bioperl-announce filter?
>

Maybe once, but it was in the archive and my email account.

Peter


From anna.fr at gmail.com  Thu Nov 10 15:01:57 2011
From: anna.fr at gmail.com (Anna Friedlander)
Date: Fri, 11 Nov 2011 09:01:57 +1300
Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi?
Message-ID: <CALv2E+3QpXhOU2Decmw146Bd1P4bDtg45ZosGWkeGkqzZx0XOg@mail.gmail.com>

Hi all

Does anyone know if there is a way to get a Taxonomy node and/or
taxonid from a gi number using the flatfile with taxonomy db?

I have blast output that I want to append taxonomic information to. I
have hundreds of thousands of items to do this for, so it's not
practical to use entrez to query the NCBI database.

I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I
think much too large to put into a hash!

This was also discussed in 2009:
http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I
don't think there was a conclusion?

Thanks for your help
Anna Friedlander


From shalabh.sharma7 at gmail.com  Thu Nov 10 15:12:09 2011
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Thu, 10 Nov 2011 15:12:09 -0500
Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi?
In-Reply-To: <CALv2E+3QpXhOU2Decmw146Bd1P4bDtg45ZosGWkeGkqzZx0XOg@mail.gmail.com>
References: <CALv2E+3QpXhOU2Decmw146Bd1P4bDtg45ZosGWkeGkqzZx0XOg@mail.gmail.com>
Message-ID: <CAA7rn9cf=iBZWRmg0f1GzeT9=NOp4nV7MfDyLxmTHH4rfYHtug@mail.gmail.com>

Hi Anna,
           I think the thread you mentioned was started by me.
At that time I wrote a few scripts to map GIs to taxa, and some time later I
found other efficient ways as well. Recently, though, Miguel Pignatelli
pointed me to some Bio-LITE modules that are really helpful.

These are the modules he mentioned; I found them really easy to use and
very efficient.

Bio-LITE-Taxonomy-0.07
Bio-LITE-Taxonomy-NCBI-0.07
Bio-LITE-Taxonomy-NCBI-Gi2taxid-0.04

Cheers
Shalabh

On Thu, Nov 10, 2011 at 3:01 PM, Anna Friedlander <anna.fr at gmail.com> wrote:

> Hi all
>
> Does anyone know if there is a way to get a Taxonomy node and/or
> taxonid from a gi number using the flatfile with taxonomy db?
>
> I have blast output that I want to append taxonomic information to. I
> have hundreds of thousands of items to do this for, so it's not
> practical to use entrez to query the NCBI database.
>
> I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I
> think much too large to put into a hash!
>
> This was also discussed in 2009:
> http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I
> don't think there was a conclusion?
>
> Thanks for your help
> Anna Friedlander
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
Shalabh Sharma
Scientific Computing Professional Associate (Bioinformatics Specialist)
Department of Marine Sciences
University of Georgia
Athens, GA 30602-3636

From cjfields at illinois.edu  Thu Nov 10 15:23:14 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 10 Nov 2011 20:23:14 +0000
Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi?
In-Reply-To: <CAA7rn9cf=iBZWRmg0f1GzeT9=NOp4nV7MfDyLxmTHH4rfYHtug@mail.gmail.com>
References: <CALv2E+3QpXhOU2Decmw146Bd1P4bDtg45ZosGWkeGkqzZx0XOg@mail.gmail.com>
	<CAA7rn9cf=iBZWRmg0f1GzeT9=NOp4nV7MfDyLxmTHH4rfYHtug@mail.gmail.com>
Message-ID: <53AF9ECA-5905-4D14-B7C1-FF4B2F2FA084@illinois.edu>

Yes, these are probably wrappers around the gi2taxid and taxonomy data; bioperl lacks the former, whereas the latter is handled by Bio::DB::Taxonomy (the 'flatfile' option).  I did something very similar locally, though I used Bio::DB::Taxonomy for the taxonomy lookups.
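
For the gi2taxid half, one plain-Perl option is to stream the dump once and keep only the GIs that actually occur in the BLAST output, so the hash stays small. This is a rough sketch, not what the Bio-LITE modules or my local code do; it assumes the dump is two tab-separated columns (GI, taxid), and the sub name is made up:

```perl
use strict;
use warnings;

# Return a hash ref of GI => taxid, restricted to the GIs in %$wanted.
#   $wanted : hash ref whose keys are the GI numbers seen in the BLAST hits
#   $fh     : filehandle on the two-column, tab-separated GI/taxid dump
sub taxids_for_gis {
    my ($wanted, $fh) = @_;
    my %taxid_of;
    while (my $line = <$fh>) {
        chomp $line;
        my ($gi, $taxid) = split /\t/, $line;
        next unless defined $taxid && exists $wanted->{$gi};
        $taxid_of{$gi} = $taxid;
    }
    return \%taxid_of;
}
```

Memory use then scales with the number of distinct hit GIs (hundreds of thousands here) rather than with the size of the whole dump, at the cost of one full pass over the file.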

chris

On Nov 10, 2011, at 2:12 PM, shalabh sharma wrote:

> Hi Anna,
>           I think the thread you mentioned was started by me.
> At that time I wrote a few scripts to map GIs to taxa, and some time later I
> found other efficient ways as well. Recently, though, Miguel Pignatelli
> pointed me to some Bio-LITE modules that are really helpful.
> 
> These are the modules he mentioned; I found them really easy to use and
> very efficient.
> 
> Bio-LITE-Taxonomy-0.07
> Bio-LITE-Taxonomy-NCBI-0.07
> Bio-LITE-Taxonomy-NCBI-Gi2taxid-0.04
> 
> Cheers
> Shalabh
> 
> On Thu, Nov 10, 2011 at 3:01 PM, Anna Friedlander <anna.fr at gmail.com> wrote:
> 
>> Hi all
>> 
>> Does anyone know if there is a way to get a Taxonomy node and/or
>> taxonid from a gi number using the flatfile with taxonomy db?
>> 
>> I have blast output that I want to append taxonomic information to. I
>> have hundreds of thousands of items to do this for, so it's not
>> practical to use entrez to query the NCBI database.
>> 
>> I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I
>> think much too large to put into a hash!
>> 
>> This was also discussed in 2009:
>> http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I
>> don't think there was a conclusion?
>> 
>> Thanks for your help
>> Anna Friedlander
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
> 
> 
> 
> -- 
> Shalabh Sharma
> Scientific Computing Professional Associate (Bioinformatics Specialist)
> Department of Marine Sciences
> University of Georgia
> Athens, GA 30602-3636
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From bernd.web at gmail.com  Thu Nov 10 15:51:13 2011
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 10 Nov 2011 21:51:13 +0100
Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi?
In-Reply-To: <CALv2E+3QpXhOU2Decmw146Bd1P4bDtg45ZosGWkeGkqzZx0XOg@mail.gmail.com>
References: <CALv2E+3QpXhOU2Decmw146Bd1P4bDtg45ZosGWkeGkqzZx0XOg@mail.gmail.com>
Message-ID: <CAExAtoD4aD_zPJXmZZHOG8uUqcTmQr90oQsbdumku5wrWz-erw@mail.gmail.com>

Hi Anna,

Jason changed his example script from using hashes to using SQLite:
bp_classify_hits_kingdom - classify BLAST hits by taxonomic kingdom

See
https://github.com/bioperl/bioperl-live/blob/master/scripts/taxa/bp_classify_hits_kingdom.pl

It's an example script that shows how to do the tax to gi mapping for
blast reports.


Bernd

On Thu, Nov 10, 2011 at 9:01 PM, Anna Friedlander <anna.fr at gmail.com> wrote:
> Hi all
>
> Does anyone know if there is a way to get a Taxonomy node and/or
> taxonid from a gi number using the flatfile with taxonomy db?
>
> I have blast output that I want to append taxonomic information to. I
> have hundreds of thousands of items to do this for, so it's not
> practical to use entrez to query the NCBI database.
>
> I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I
> think much too large to put into a hash!
>
> This was also discussed in 2009:
> http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I
> don't think there was a conclusion?
>
> Thanks for your help
> Anna Friedlander
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From cjfields at illinois.edu  Thu Nov 10 16:13:12 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 10 Nov 2011 21:13:12 +0000
Subject: [Bioperl-l] Scripting help to identify adaptors count in reads
In-Reply-To: <COL102-W340F68DB20A406E5B3A9FBFADC0@phx.gbl>
References: <32818254.post@talk.nabble.com>
	<COL102-W340F68DB20A406E5B3A9FBFADC0@phx.gbl>
Message-ID: <9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu>

If the adaptors are masked (e.g. are represented by the N's below) or if you are really confident that the adaptors don't have base mis-calls, why not use split?  Maybe with something like 'scalar(split(/N+/, $foo))' or 'scalar(split(/$adaptor/, $foo))'.

tr/// won't work for the reasons Leon mentioned; it's a transliteration of a character mapping, not a pattern match.  '$foo =~ tr/ATGCatgc/TACGtacg/' for instance converts $foo to the complement sequence (it doesn't match the pattern /ATGCatgc/).
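
A quick sketch of both points, with made-up sub names and a made-up read. Note that split returns the fragments between adaptors, so the adaptor count is one less than the number of fields; passing a limit of -1 keeps trailing empty fields, so this also holds when a read ends with an adaptor:

```perl
use strict;
use warnings;

# Count adaptor occurrences via split: fields = matches + 1 when limit is -1.
sub count_by_split {
    my ($read, $pattern) = @_;
    my @fields = split /$pattern/, $read, -1;
    return @fields ? @fields - 1 : 0;     # an empty read has no fields at all
}

# tr/// maps characters one to one; here it complements a sequence.
sub complement {
    my ($seq) = @_;
    (my $comp = $seq) =~ tr/ATGCatgc/TACGtacg/;
    return $comp;
}

print count_by_split('GCCTNNNNATCANNNNCTGA', 'N+'), "\n";   # read with two N runs
print complement('ATGCatgc'), "\n";
```

The same count_by_split call works with a literal adaptor sequence as the pattern, provided the adaptors have no mis-called bases.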

chris

On Nov 10, 2011, at 10:06 AM, Juan Jovel wrote:

> 
> There are many ways to do it. 
> Perhaps the simplest is to count the number of times the adapter sequence (or part of it) appears in each read. 
> For example: 
> $adapter_matches = tr/adapter_sequence/adapter_sequence/;# $adapter_matches will store the number of times the adapter sequence is repeated. 
> You then place that result in a hash bin:
> my %adapter_frequency;
> my $class = "$adapter_matches";
> if (exists $adapter_frequency{$class}) {
>     $adapter_frequency{$class}++;
> } else {
>     $adapter_frequency{$class} = 1;
> }
> # Then you can sort and output your classes
> foreach $class (sort keys %adapter_frequency) {
>     print "$class\t$adapter_frequency{$class}\n";
> }
> 
> You can workout the details, but something like this should work.
> 
> 
> 
> 
> 
> 
> 
>> Date: Thu, 10 Nov 2011 04:29:55 -0800
>> From: casaburi at ceinge.unina.it
>> To: Bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l]  Scripting help to identify adaptors count in reads
>> 
>> 
>> Hi everybody,
>> 
>> i have some reads (454) where there are adaptors (NNNN...), one,two or three
>> adaptors for each reads depending on the reads. Is there any way to
>> establish how many reads have 1 adaptors, how many 2 and how many 3 adaptors
>> over the total ???
>> 
>>> 271-88
>> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCAGGTGCCTACG
>>> 272-88
>> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCANNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTTGCCAGCCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGC
>>> 273-88
>> GCCTCCCTCGCGCATCAGATCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCA
>>> 274-88
>> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTCCCTCGCGCCATCAGATCGTNNNNNNNNNNNNNNNNNNTCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCATCAGATCGTAGGCACCATCAA
>> 
>> The problem is that some adpators occur in the middle of the sequences
>> because they coming out from a concameration experimental design (they are
>> miRNAs between NNNNNN...). So i want to know a script or tool that may say
>> how many reads have 1 adapt, how many 2, (max are 4) in respect to the total
>> number of reads. Do you know any tool/script that may help ? Tnx 
>> Can anyone suggests me a script to fix this ???
>> 
>> Thank you very much 
>> -- 
>> View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32818254.html
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 		 	   		  
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From jason.stajich at gmail.com  Thu Nov 10 16:15:29 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Thu, 10 Nov 2011 13:15:29 -0800
Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi?
In-Reply-To: <CAExAtoD4aD_zPJXmZZHOG8uUqcTmQr90oQsbdumku5wrWz-erw@mail.gmail.com>
References: <CALv2E+3QpXhOU2Decmw146Bd1P4bDtg45ZosGWkeGkqzZx0XOg@mail.gmail.com>
	<CAExAtoD4aD_zPJXmZZHOG8uUqcTmQr90oQsbdumku5wrWz-erw@mail.gmail.com>
Message-ID: <1C8130A1-A60A-4F16-A077-4E1B592579BC@gmail.com>

Here's another variant that I wrote for my own purposes. The code at the beginning uses a NoSQL solution to store all the ACC -> GI mappings,
and then a second db to store GI -> TAXONID.

This is the case where I have a file of accession numbers and I want to add to the description line the taxonomy string.

https://github.com/hyphaltip/mobedac-fungi/blob/master/scripts/taxonomy_lookupmissing.pl

That's the first 165 lines, and then lookups are basically what you see on line 195.

It would be good to rewrite the script below to use Tokyo Cabinet or Kyoto Cabinet (the newer implementation; I'm not sure whether it is faster).
One thing this approach does is take up a lot of disk space, but you can trade that off against the compression scheme you use, which will affect loading performance.
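
To give a flavour of the disk-backed key/value idea without any extra installs, here is a minimal sketch using the core SDBM_File module. It is not the code from the script linked above; the sub names and the db path are hypothetical, and it assumes a two-column, tab-separated GI/taxid dump. SDBM is only meant to illustrate the pattern; for a multi-gigabyte dump a sturdier store (DB_File, one of the Cabinet libraries, SQLite) would be the realistic choice:

```perl
use strict;
use warnings;
use Fcntl;        # O_RDWR, O_CREAT, O_RDONLY
use SDBM_File;

# Load GI -> taxid pairs from a filehandle into an on-disk hash.
sub build_gi_index {
    my ($fh, $dbfile) = @_;
    tie my %db, 'SDBM_File', $dbfile, O_RDWR | O_CREAT, 0666
        or die "Cannot tie $dbfile: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my ($gi, $taxid) = split /\t/, $line;
        $db{$gi} = $taxid if defined $taxid;
    }
    untie %db;
    return;
}

# Look up one GI; returns undef if the GI is not in the index.
sub lookup_taxid {
    my ($dbfile, $gi) = @_;
    tie my %db, 'SDBM_File', $dbfile, O_RDONLY, 0666
        or die "Cannot tie $dbfile: $!";
    my $taxid = $db{$gi};
    untie %db;
    return $taxid;
}
```

The index is built once and then reused across runs, so lookups never need the whole mapping in memory. In a real script you would keep the tie open across lookups rather than re-tying for each GI.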

Jason

On Nov 10, 2011, at 12:51 PM, Bernd Web wrote:

> Hi Anna,
> 
> Jason changed his example script from using hashes to using SQLite:
> bp_classify_hits_kingdom - classify BLAST hits by taxonomic kingdom
> 
> See
> https://github.com/bioperl/bioperl-live/blob/master/scripts/taxa/bp_classify_hits_kingdom.pl
> 
> It's an example script that shows how to do the tax to gi mapping for
> blast reports.
> 
> 
> Bernd
> 
> On Thu, Nov 10, 2011 at 9:01 PM, Anna Friedlander <anna.fr at gmail.com> wrote:
>> Hi all
>> 
>> Does anyone know if there is a way to get a Taxonomy node and/or
>> taxonid from a gi number using the flatfile with taxonomy db?
>> 
>> I have blast output that I want to append taxonomic information to. I
>> have hundreds of thousands of items to do this for, so it's not
>> practical to use entrez to query the NCBI database.
>> 
>> I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I
>> think much too large to put into a hash!
>> 
>> This was also discussed in 2009:
>> http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I
>> don't think there was a conclusion?
>> 
>> Thanks for your help
>> Anna Friedlander
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From anna.fr at gmail.com  Thu Nov 10 20:07:57 2011
From: anna.fr at gmail.com (Anna Friedlander)
Date: Fri, 11 Nov 2011 14:07:57 +1300
Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi?
In-Reply-To: <1C8130A1-A60A-4F16-A077-4E1B592579BC@gmail.com>
References: <CALv2E+3QpXhOU2Decmw146Bd1P4bDtg45ZosGWkeGkqzZx0XOg@mail.gmail.com>
	<CAExAtoD4aD_zPJXmZZHOG8uUqcTmQr90oQsbdumku5wrWz-erw@mail.gmail.com>
	<1C8130A1-A60A-4F16-A077-4E1B592579BC@gmail.com>
Message-ID: <CALv2E+09JeJiXPUoZphNZnaVhWM9mstkhhp+=1Jvs6Hjy3c+uA@mail.gmail.com>

thanks all for the fast responses.

I'll try the bio-lite modules shalabh mentioned

On Fri, Nov 11, 2011 at 10:15 AM, Jason Stajich <jason.stajich at gmail.com> wrote:
> Here's another variant of one I wrote which is for my own purposes, the code
> at the beginning uses a NOSQL solution to storing all the ACC -> GI
> and then a second db to store GI -> TAXONID
> This is the case where I have a file of accession numbers and I want to add
> to the description line the taxonomy string.
> https://github.com/hyphaltip/mobedac-fungi/blob/master/scripts/taxonomy_lookupmissing.pl
> That's the first 165 lines, and then lookups are basically what you see on
> line 195.
> Would be good to rewrite that script below to use Tokyo Cabinet
> or Kyoto Cabinet (is newer implementation, not sure if it is faster?).
> One thing that this does is take up a lot of disk space, but you can have
> tradeoffs between that and which compression scheme you use, which will
> impact performance of loading.
> Jason
> On Nov 10, 2011, at 12:51 PM, Bernd Web wrote:
>
> Hi Anna,
>
> Jason changed his example script from using hashes to using SQLite:
> bp_classify_hits_kingdom - classify BLAST hits by taxonomic kingdom
>
> See
> https://github.com/bioperl/bioperl-live/blob/master/scripts/taxa/bp_classify_hits_kingdom.pl
>
> It's an example script that shows how to do the tax to gi mapping for
> blast reports.
>
>
> Bernd
>
> On Thu, Nov 10, 2011 at 9:01 PM, Anna Friedlander <anna.fr at gmail.com> wrote:
>
> Hi all
>
> Does anyone know if there is a way to get a Taxonomy node and/or
> taxonid from a gi number using the flatfile with taxonomy db?
>
> I have blast output that I want to append taxonomic information to. I
> have hundreds of thousands of items to do this for, so it's not
> practical to use entrez to query the NCBI database.
>
> I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I
> think much too large to put into a hash!
>
> This was also discussed in 2009:
> http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I
> don't think there was a conclusion?
>
> Thanks for your help
> Anna Friedlander
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


From arun_innovative90 at yahoo.com  Fri Nov 11 06:09:46 2011
From: arun_innovative90 at yahoo.com (Arun Kumar)
Date: Fri, 11 Nov 2011 03:09:46 -0800 (PST)
Subject: [Bioperl-l] BIOPERL MATERIAL
Message-ID: <1321009786.42748.YahooMailNeo@web110101.mail.gq1.yahoo.com>

Hi team, 

I am Arun Kumar, a bioinformatics student, and I wish to master BioPerl. After reading your documents, I would like to ask whether you could send me a PDF of the BioPerl documentation, as it would be useful for getting familiar with BioPerl.

Thanks in advance

Thanks & Regards,
Arunkumar.d

From awitney at sgul.ac.uk  Fri Nov 11 08:23:29 2011
From: awitney at sgul.ac.uk (Adam Witney)
Date: Fri, 11 Nov 2011 13:23:29 +0000
Subject: [Bioperl-l] BIOPERL MATERIAL
In-Reply-To: <1321009786.42748.YahooMailNeo@web110101.mail.gq1.yahoo.com>
References: <1321009786.42748.YahooMailNeo@web110101.mail.gq1.yahoo.com>
Message-ID: <EA1DBB02-0280-4207-97E7-A116C058A615@sgul.ac.uk>


All BioPerl documents can be found here:

http://www.bioperl.org/wiki/Main_Page

And a useful place to start would be the HOWTOs:

http://www.bioperl.org/wiki/HOWTOs

regards

adam


On 11 Nov 2011, at 11:09, Arun Kumar wrote:

> Hi team, 
>  
> I am Arun Kumar, a bioinformatics student, and I wish to master BioPerl. After reading your documents, I would like to ask whether you could send me a PDF of the BioPerl documentation, as it would be useful for getting familiar with BioPerl.
>  
> Thanks in advance
> 
> Thanks & Regards,
> Arunkumar.d
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From casaburi at ceinge.unina.it  Fri Nov 11 07:13:50 2011
From: casaburi at ceinge.unina.it (Giorgio C)
Date: Fri, 11 Nov 2011 04:13:50 -0800 (PST)
Subject: [Bioperl-l] Scripting help to identify adaptors count in reads
In-Reply-To: <9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu>
References: <32818254.post@talk.nabble.com>
	<COL102-W340F68DB20A406E5B3A9FBFADC0@phx.gbl>
	<9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu>
Message-ID: <32825229.post@talk.nabble.com>


Hi, thank you for your answer!

In the end I tried this one-liner, and it seems to work for this purpose:


perl -pe \
's/.*?((NNNNNNNNNNNNNNN|NNNNNNNNNNNNNNN)(.*?)(NNNNNNNNNNNNNNN|NNNNNNNNNNNNNNN))/$3/g' \
Scrivania/orchidea/Fiore/Mydata.fasta > result.txt


-- 
View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32825229.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From casaburi at ceinge.unina.it  Fri Nov 11 07:21:29 2011
From: casaburi at ceinge.unina.it (Giorgio C)
Date: Fri, 11 Nov 2011 04:21:29 -0800 (PST)
Subject: [Bioperl-l] Scripting help to identify adaptors count in reads
In-Reply-To: <9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu>
References: <32818254.post@talk.nabble.com>
	<COL102-W340F68DB20A406E5B3A9FBFADC0@phx.gbl>
	<9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu>
Message-ID: <32825274.post@talk.nabble.com>


Thanks, everybody, for answering so soon! Another way might be:

perl -ne '$count{s/N+//g || 0}++ if /^[^>]/;END{for $i (keys %count){print "$count{$i} have $i ADAPTOR\n";}}' myFile.fasta > result.txt


and/or with 'nawk':

nawk -F'[N]+' '/^[^>]/{a[NF-1]++}END{for(i in a) print a[i] " have " i " ADAPTOR"}' myFile.fasta > result.txt

They give the same result. If you run into this problem, try these; they work well!
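
For reads that span several lines, the same tally can be written as a small script. This is only a sketch along the lines of the one-liners above (the sub name is made up, and adaptors are assumed to be masked as runs of N); it joins the lines of each record before counting, so an N run split across a line break is counted once:

```perl
use strict;
use warnings;

# Tally reads by the number of N runs they contain.
# Takes an array ref of FASTA lines; returns a hash ref: runs => number of reads.
sub tally_adaptors {
    my ($lines) = @_;
    my (%tally, $seq);
    my $flush = sub {
        return unless defined $seq;
        my $runs = () = $seq =~ /N+/g;    # count matches, not characters
        $tally{$runs}++;
        undef $seq;
    };
    for my $line (@$lines) {
        chomp(my $l = $line);
        if ($l =~ /^>/) {                 # header: close the previous record
            $flush->();
            $seq = '';
        }
        elsif (defined $seq) {
            $seq .= $l;
        }
    }
    $flush->();                           # last record in the file
    return \%tally;
}

# Only runs when a FASTA file is given on the command line.
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $tally = tally_adaptors([<$fh>]);
    close $fh;
    for my $n (sort { $a <=> $b } keys %$tally) {
        print "$tally->{$n} reads have $n ADAPTOR\n";
    }
}
```

Saved under a name of your choice, it takes the FASTA file as its only argument and prints one line per adaptor count, including reads with no adaptor at all.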

Still Thanks to all,

Giorgio


-- 
View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32825274.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From p.j.a.cock at googlemail.com  Sun Nov 13 07:24:35 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 13 Nov 2011 12:24:35 +0000
Subject: [Bioperl-l] OBDA redux?
In-Reply-To: <FB761CFA-1CFD-4FA0-A708-2CE3F2F240D9@illinois.edu>
References: <CAKVJ-_6bzfZZr82y+J4qmGbn0du4rxaKaGxjmbC7p-pU_nMuoQ@mail.gmail.com>
	<FB761CFA-1CFD-4FA0-A708-2CE3F2F240D9@illinois.edu>
Message-ID: <CAKVJ-_4i8AJL1emBKpnO+p-SVzNtbSVwdL9uSy72NWkHugRtVA@mail.gmail.com>

On Thu, Nov 3, 2011 at 7:47 PM, Fields, Christopher J
<cjfields at illinois.edu> wrote:
> On Nov 3, 2011, at 1:52 PM, Peter Cock wrote:
>
>> On Thu, Nov 3, 2011 at 6:28 PM, Fields, Christopher J
>> <cjfields at illinois.edu> wrote:
>>> (side thread, so re-titling...)
>>>
>> And CC'ing open-bio-l, which is a better home for this than bioperl-l,
>> where OBDA v2 talk came up again in discussion of a BioPerl indexing
>> problem. Archive links for thread here:
>>
>> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035807.html
>> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035808.html
>> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035811.html
>> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035812.html
>> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035813.html
>> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035822.html
>
> yes, good idea...

I've not CC'd the bioperl-l anymore.

>>> On Nov 1, 2011, at 1:06 PM, Peter Cock wrote:
>>>>
>>>> Yes, we're using SQLite3 to store essentially a list of filenames
>>>> and their format as one table, and then in the main table an
>>>> entry for each sequence recording the ID (only one accession,
>>>> unlike OBDA which had infrastructure for a secondary accession),
>>>> file number, offset of the start of the record, and optionally the
>>>> length of the record on disk.
>>>>
>>>> i.e. Basically what OBDA does, but using SQLite rather
>>>> than BDB (not included in Python 3) or a flat file index
>>>> (poor performance with large datasets).
>>>>
>>>> I find this design attractive on several levels:
>>>> * File format neutral, covers FASTA, FASTQ, GenBank, etc
>>>> * Preserves the original file untouched
>>>> * Index is a small single file (thanks to SQLite)
>>>> * Back end could be switched out
>>>> * Could be applied to compressed file formats
>>>> * Reuses existing parsing code to access entries
>>>>
>>>> This could easily form basis of OBDA v2, the main points
>>>> of difference I anticipate between the Bio* projects would
>>>> be naming conventions for the different file formats, and
>>>> what we consider to be the default record ID of each read
>>>> (e.g. which field in a GenBank file - although agreement
>>>> here is not essential). Some of that was already settled in
>>>> principle with OBDA v1.
>>>
>>> The primary/secondary IDs could be configurable with a sane
>>> default, I think the bioperl implementations allowed this (and
>>> it is certainly something that will be requested).
>>
>> One reason I went with a single ID only was to keep the
>> Python dictionary based API simple (think hash in Perl).
>> You don't get secondary keys in a Python dict or a hash ;)
>>
>> As a nod to flexibility, in Biopython's Bio.SeqIO indexing you
>> can provide a call back function to map the suggested ID to
>> something else. Obviously this doesn't give the full flexibility
>> of extracting a field from the record's annotation because we
>> don't parse the whole record during indexing (it would be too
>> slow).
>
> Same with bioperl.
>
>> However, I'm happy for there to be an *optional* secondary
>> key in an OBDA v2 SQLite schema, but Biopython probably
>> won't populate it. We could optionally use it rather than the
>> primary ID on loading an existing index though.
>
> Optional implementation of that is fine by me.
>
>> Personally I would stick with one key in the index - it should
>> be faster and makes it simpler to switch the back end if we
>> need to later. If anyone wants a second key, they can build
>> a second index *grin*.
>
> That's easy enough.
>
>>>> On the other hand, you could try and store the parsed data
>>>> itself, which is where NOSQL looks more interesting. That
>>>> essentially requires the ability to serialise your annotated
>>>> sequence object model to disk - which would be tricky to do
>>>> cross project (much more ambitious than BioSQL is). It also
>>>> means the "index" becomes very large because it now holds
>>>> all the original data.
>>>>
>>>> Peter
>>>
>>> For a fully cross-Bio* compliant format, I don't think it's feasible
>>> to use serialized data unless they are serialized in something
>>> that is easily deserialized across HLLs (JSON, BSON, YAML,
>>> XML, etc). Either that, or such data is stored concurrently with
>>> the binary blob, along with meta data that indicates the source
>>> of the blob, parser, version, etc, etc (unless there are tools out
>>> there that reliably interconvert serialized complex data structures
>>> between HLLs). Any way you go about it, it seems like it could
>>> be a major ball of hurt, unless implemented very carefully.
>>
>> You missed out RDF as a serialisation ;)
>>
>> But yes, going down the shared serialisation route is going
>> to be messy - as you are well aware:
>>
>>> Aside: I think this was one of the problems with
>>> Bio::DB::SeqFeature::Store, in that it at one point stored
>>> Perl-specific Storable blobs.
>>>
>>> chris
>>
>> Peter
>
> yes, it's a problem w/o an easy solution. Anyway, I think an
> implementation of such at this point would be a premature
> optimization.
>
> chris

So, Chris and I seem in general agreement that an OBDA v2
using SQLite but based on essentially the same approach as
the BDB or flat file based OBDA v1 is a good idea. i.e. Tables
mapping record identifiers to file offsets in the original sequence
files.
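
To make the idea concrete, here is a minimal sketch in Python of such an
index for a FASTA file. It is not the Biopython schema or a proposed OBDA
v2 schema: the table names (files, records), the column names and the
helper functions are all assumptions made up for this example.

```python
import os
import sqlite3
import tempfile


def build_index(db, fasta_path):
    """Store the ID, start offset and length on disk of each FASTA record."""
    db.execute("CREATE TABLE IF NOT EXISTS files "
               "(file_number INTEGER PRIMARY KEY, name TEXT, format TEXT)")
    db.execute("CREATE TABLE IF NOT EXISTS records "
               "(key TEXT PRIMARY KEY, file_number INTEGER, "
               "offset INTEGER, length INTEGER)")
    file_number = db.execute("INSERT INTO files (name, format) VALUES (?, ?)",
                             (fasta_path, "fasta")).lastrowid
    insert = "INSERT INTO records VALUES (?, ?, ?, ?)"
    with open(fasta_path, "rb") as handle:
        offset, key, start = 0, None, 0
        for line in handle:
            if line.startswith(b">"):
                if key is not None:  # close the previous record
                    db.execute(insert, (key, file_number, start, offset - start))
                # Use the first word after '>' as the single record ID.
                key, start = line[1:].split(None, 1)[0].decode(), offset
            offset += len(line)
        if key is not None:  # close the last record
            db.execute(insert, (key, file_number, start, offset - start))
    db.commit()


def fetch_raw(db, key):
    """Return the raw bytes of one record by seeking in the original file."""
    row = db.execute("SELECT name, offset, length FROM records "
                     "JOIN files USING (file_number) WHERE key = ?",
                     (key,)).fetchone()
    if row is None:
        raise KeyError(key)
    name, offset, length = row
    with open(name, "rb") as handle:
        handle.seek(offset)
        return handle.read(length)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.fasta")
        with open(path, "wb") as out:
            out.write(b">alpha first record\nACGT\nACGT\n>beta\nGGCC\n")
        db = sqlite3.connect(":memory:")
        build_index(db, path)
        return fetch_raw(db, "alpha"), fetch_raw(db, "beta")


print(demo())
```

The original file is left untouched and the raw record would be handed to
the existing parser, which is the point of the design. An optional
secondary key would just be one more column in the records table.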

I hope to get BioRuby on board; they already have OBDA
v1 support, so that shouldn't be too hard.

Right now I don't recall if BioJava has/had OBDA v1 support,
and, if they did, whether it was affected by their recent move to
BioJava v3 (I understand from their mailing list that some older
lower priority functionality has not all been ported yet).

Also EMBOSS are likely to be interested, certainly Peter Rice
was interested in the SQLite indexing we're already using in
Biopython for sequence files (i.e. what is effectively the
prototype for OBDA v2).

Note that in addition to simple indexing of text files, we are
already using the same simple offset + length approach for
indexing binary files (e.g. SFF).

On the immediate practical side, I think I can edit the
current OBDA website of http://obda.open-bio.org/
via /home/websites/obda.open-bio.org/html on the
server.

We need to work out where the current OBDA indexing
specification lives (CVS or SVN?) and perhaps move
that to GitHub. We may need a general OBF organisation
account on GitHub for this and any other cross-project
repositories.

I see there is already an OBDA project on RedMine,
(Chris can you add me to that please?)
https://redmine.open-bio.org/projects/obda

Peter


From p.j.a.cock at googlemail.com  Sun Nov 13 07:30:37 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 13 Nov 2011 12:30:37 +0000
Subject: [Bioperl-l] OBDA redux? Compressed files
Message-ID: <CAKVJ-_6s1hOo9DLDP0pnZ_96pJdd=mpHe96oKUedwELGLDgfJw@mail.gmail.com>

Hi again,

I've retitled this as it is a little off topic from the main OBDA redux thread,
http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000819.html
http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000820.html
http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000821.html

As far as I recall, the original flat file and BDB based OBDA
specification for indexing sequencing files didn't cover
compressed files. That might be something to consider
(although we should sort out uncompressed text/binary
files first).

I've recently been experimenting with using compressed
files - in particular simple GZIP files (ignoring any block structure)
and BGZF (the specialised gzipped blocking used in BAM), see:

http://blastedbio.blogspot.com/2011/11/bgzf-blocked-bigger-better-gzip.html
http://seqanswers.com/forums/showthread.php?t=15347

The virtual offset approach used in BGZF squeezes a 16 bit
within block offset (thus limiting you to 64KB blocks) and a
48 bit block start offset (thus limiting you to a 256TB file) into
a single 64 bit "virtual" offset. That makes sense if you are
keeping the lookup table of many offsets in memory, and
can be used as is with code expecting a single offset (like
the current Biopython SQLite index schema).
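
The packing described above can be sketched in a few lines of Python. The
function names are only illustrative and the numbers in the example are
arbitrary.

```python
def make_virtual_offset(block_start, within_block):
    """Pack a 48 bit block start and a 16 bit within block offset."""
    if not 0 <= within_block < 2 ** 16:
        raise ValueError("within block offset needs more than 16 bits")
    if not 0 <= block_start < 2 ** 48:
        raise ValueError("block start offset needs more than 48 bits")
    return (block_start << 16) | within_block


def split_virtual_offset(virtual_offset):
    """Recover (block start, within block offset) from a virtual offset."""
    return virtual_offset >> 16, virtual_offset & 0xFFFF


voffset = make_virtual_offset(100000, 10)
print(voffset)                        # 6553600010
print(split_virtual_offset(voffset))  # (100000, 10)
```

Virtual offsets built this way still sort in file order, since the block
start occupies the high bits.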

There is also bzip2, which is block based, with the block size
ranging from 100KB to 900KB.

http://bzip.org/
http://bzip.org/1.0.5/bzip2-manual-1.0.5.html

I haven't tried any performance tests yet, which would
be interesting as I believe compression/decompression
of bzip2 is more costly in CPU terms than gzip (although
both will be block size dependent).

If we wanted to imitate the BGZF virtual offset scheme for
arbitrary BZIP2 files, an alternative 64 bit virtual offset scheme
could use 20 bits to cover bz2 blocks of up to 900KB, leaving
64 - 20 = 44 bits for the start offset, thus limiting you to just
2^44 bytes or 16TB, which sounds OK only in the medium term.
On the bright side this could be used to index any BZIP2 file
(under 16TB), whereas BGZF virtual offsets cannot be applied to
an arbitrary GZIP file.
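
The arithmetic behind the 20 + 44 bit split can be checked with a short
Python sketch. The function names and the within_bits parameter are made
up for illustration; this only covers the bit packing and says nothing
about how block starts would be located inside a real bzip2 file.

```python
def pack_offset(block_start, within_block, within_bits=20):
    """Pack a block start and a within block offset into one 64 bit integer."""
    start_bits = 64 - within_bits
    if not 0 <= within_block < (1 << within_bits):
        raise ValueError("within block offset does not fit")
    if not 0 <= block_start < (1 << start_bits):
        raise ValueError("block start offset does not fit")
    return (block_start << within_bits) | within_block


def unpack_offset(virtual_offset, within_bits=20):
    """Split a packed offset back into (block start, within block offset)."""
    mask = (1 << within_bits) - 1
    return virtual_offset >> within_bits, virtual_offset & mask


# 20 bits address 1048576 bytes, enough for a 900KB (900000 byte) block.
print((1 << 20) >= 900_000)
# 44 bits of start offset cover 2^44 bytes, i.e. 16 units of 2^40 bytes.
print((1 << 44) // (1 << 40))
# Round trip with arbitrary example values.
print(unpack_offset(pack_offset(123_456_789, 899_999)))
```

Calling the same functions with within_bits=16 gives the BGZF layout, so
the split between the two fields is the only thing that changes.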

On the other hand, storing the block start and within block
offset separately is truly generic and could be used on any
blocked GZIP file (including BGZF) and BZIP2 etc. It would make
the SQLite schema a bit more complicated though.

Maybe something to consider for the next revision to OBDA,
and focus on the non-compressed case for now?

Regards,

Peter

From p.j.a.cock at googlemail.com  Sun Nov 13 07:32:12 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 13 Nov 2011 12:32:12 +0000
Subject: [Bioperl-l] OBDA redux? Compressed files
In-Reply-To: <CAKVJ-_6s1hOo9DLDP0pnZ_96pJdd=mpHe96oKUedwELGLDgfJw@mail.gmail.com>
References: <CAKVJ-_6s1hOo9DLDP0pnZ_96pJdd=mpHe96oKUedwELGLDgfJw@mail.gmail.com>
Message-ID: <CAKVJ-_7G639PJBZFLE8mQPT=0LXeTWaf54U0tbMgh6XWfUAKtQ@mail.gmail.com>

On Sun, Nov 13, 2011 at 12:30 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Hi again,
>
> I've retitled this as it is a little off topic from the main OBDA redux thread,
> http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000819.html
> http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000820.html
> http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000821.html
>
> As far as I recall, the original flat file and BDB based OBDA
> specification for indexing sequencing files didn't cover
> compressed files. That might be something to consider
> (although we should sort out uncompressed text/binary
> files first).

Sorry - didn't mean to include bioperl-l on that, although it may be
of interest to you guys anyway.

Peter

From jluis.lavin at unavarra.es  Mon Nov 14 06:14:43 2011
From: jluis.lavin at unavarra.es (José Luis Lavín)
Date: Mon, 14 Nov 2011 12:14:43 +0100
Subject: [Bioperl-l] Fwd: How to get Remote BLAST results in a single out
In-Reply-To: <CADm9iynE1+y2EGyx8NLzZSzj_E81o-a5_==9ghNQ5R0hX3QbAw@mail.gmail.com>
References: <CADm9iynE1+y2EGyx8NLzZSzj_E81o-a5_==9ghNQ5R0hX3QbAw@mail.gmail.com>
Message-ID: <CADm9iy=L0c8HTZcaRD8aLw79cg1uvgQrRJ5PH4bZA5zRtt=L_Q@mail.gmail.com>

Hello everybody,

I've been using "Bio::Tools::Run::RemoteBlast" for a while and it has
worked fine for me. Now I need to run BLAST searches for multiple
sequences, but this time I'd like to get all the BLAST results in a single
output file instead of having each sequence's report written to its own
file. I've read the module documentation, but given my limited experience
with complex modules like this one, I can't figure out where to change the
script to do this.
Here is the script I've been using (it's basically the one from the module
cookbook).

#!/c:/Perl -w
use strict;
use warnings;
use Bio::Tools::Run::RemoteBlast;
use Bio::SearchIO;
use Data::Dumper;

# Here I set the parameters for BLAST (chomp strips the trailing newline)
print "Enter your BLAST choice (blastn, blastp, blastx, tblastn, tblastx):\n";
chomp(my $prog = <STDIN>);
print "Enter a database to search (nr, refseq_protein, swissprot, pat, pdb, env_nr):\n";
chomp(my $db = <STDIN>);
print "Enter your E-value cutoff (1e-n):\n";
chomp(my $e_val = <STDIN>);

my @params = ( '-prog' => $prog,
         '-data' => $db,
         '-expect' => $e_val,
         '-readmethod' => 'SearchIO' );

my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params);


# Select the file and submit the BLAST job.
print "Enter your FASTA file:\n";
chomp(my $infile = <STDIN>);
my $r = $remoteBlast->submit_blast($infile);
my $v = 1;   # verbosity flag

    print STDERR "waiting..." if( $v > 0 );  # wait for the results to return
    while ( my @rids = $remoteBlast->each_rid ) {
      foreach my $rid ( @rids ) {
        my $rc = $remoteBlast->retrieve_blast($rid);
        if( !ref($rc) ) {
          # not ready yet; a negative value means the job failed
          if( $rc < 0 ) {
            $remoteBlast->remove_rid($rid);
          }
          print STDERR "." if ( $v > 0 );
          sleep 5;
        } else {
          my $result = $rc->next_result();
          # save the output, one file per query
          # (earlier attempt at one report: open SALIDA, '>>'."$^T"."Report"."\.out";)
          my $filename = $result->query_name() . ".out";
          $remoteBlast->save_output($filename);
          $remoteBlast->remove_rid($rid);
          print "\nQuery Name: ", $result->query_name(), "\n";
          while ( my $hit = $result->next_hit ) {
            next unless ( $v > 0);
            print "\thit name is ", $hit->name, "\n";
            while( my $hsp = $hit->next_hsp ) {
              print "\t\tscore is ", $hsp->score, "\n";
            }
          }
        }
      }
    }


Could any of you please explain how to do this?

Thanks in advance

With best wishes

-- 
-- 
Dr. José Luis Lavín Trueba

Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN




From jason.stajich at gmail.com  Mon Nov 14 06:59:56 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Mon, 14 Nov 2011 06:59:56 -0500
Subject: [Bioperl-l] Fwd: How to get Remote BLAST results in a single out
In-Reply-To: <CADm9iy=L0c8HTZcaRD8aLw79cg1uvgQrRJ5PH4bZA5zRtt=L_Q@mail.gmail.com>
References: <CADm9iynE1+y2EGyx8NLzZSzj_E81o-a5_==9ghNQ5R0hX3QbAw@mail.gmail.com>
	<CADm9iy=L0c8HTZcaRD8aLw79cg1uvgQrRJ5PH4bZA5zRtt=L_Q@mail.gmail.com>
Message-ID: <FDFB72A5-E38C-4637-9415-5A15E4C5B551@gmail.com>

If you want to run a bunch of BLASTs remotely from the command line, you could also just use NCBI's blastcl3 tool (not sure if there is a BLAST+ equivalent). This might be faster and easier, since otherwise you need to learn the programming part too.

If you want to do this within this code I guess the question is what format you want the data in - a BLAST report or something more like a table?  



From jason.stajich at gmail.com  Mon Nov 14 09:07:36 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Mon, 14 Nov 2011 09:07:36 -0500
Subject: [Bioperl-l] Fwd: Fwd: How to get Remote BLAST results in a single
	out
References: <CADm9iynj0NcWOtE4asYimfFBhbTYAFrROLj9qccmoH6o=yKVrg@mail.gmail.com>
Message-ID: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com>

Please keep these discussions on the list.

Sent from my iPhone-please excuse typos

--
Jason Stajich

Begin forwarded message:

> From: José Luis Lavín <jluis.lavin at unavarra.es>
> Date: November 14, 2011 8:04:25 AM EST
> To: Jason Stajich <jason.stajich at gmail.com>
> Subject: Re: [Bioperl-l] Fwd: How to get Remote BLAST results in a single out
> 
> Hello Jason,
> 
> To answer your question:
> 
> " If you want to do this within this code I guess the question is what format you want the data in - a BLAST report or something more like a table?"
> 
> A concatenation of BLAST (default format) reports should be OK, since I have a script to parse that kind of result. Formats 1 or 2 would also do the trick.
> I'd be happy to get help on how to change the output from "one report per query" to "all queries in the same report", because I can't seem to do it myself after reading the module documentation.
> 
> Thanks in advance
> 


From cl134 at duke.edu  Sun Nov 13 09:42:05 2011
From: cl134 at duke.edu (Cheng-Ruei Lee)
Date: Sun, 13 Nov 2011 09:42:05 -0500
Subject: [Bioperl-l] Suggestion for Bio::PopGen::Statistics
Message-ID: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu>

Hi all,

     Bioperl version: 1.006
     Here are two error messages I get when using this module to
calculate Fu & Li's statistics:
Illegal division by zero at (the Statistics.pm file) line 359
Illegal division by zero at (the Statistics.pm file) line 376
     Tracking this down further shows that the first error happens when
$n (the sample size in the ingroup) equals 1 or 2, and the second error
happens when $n equals 3. This is not really a bug, though. I would
suggest either adding a check to the original code before the
calculation (skipping the current calculation when $n == 1, 2, or 3,
rather than letting the whole program die), or adding a few lines of
notes to the CPAN page.

Sincerely,
Cheng-Ruei Lee

From joluito at gmail.com  Mon Nov 14 04:21:31 2011
From: joluito at gmail.com (José Luis Lavín)
Date: Mon, 14 Nov 2011 10:21:31 +0100
Subject: [Bioperl-l] How to get Remote BLAST results in a single out
Message-ID: <CADm9iynE1+y2EGyx8NLzZSzj_E81o-a5_==9ghNQ5R0hX3QbAw@mail.gmail.com>

Hello everybody,

I've been using "Bio::Tools::Run::RemoteBlast" for a while and it has
worked fine for me. Now I need to run BLAST searches for multiple
sequences, but this time I'd like to get all the BLAST results in a single
output file instead of having each sequence's report written to its own
file. I've read the module documentation, but given my limited experience
with complex modules like this one, I can't figure out where to change the
script to do this.
Here is the script I've been using (it's basically the one from the module
cookbook).

#!/c:/Perl -w
use strict;
use warnings;
use Bio::Tools::Run::RemoteBlast;
use Bio::SearchIO;
use Data::Dumper;

# Here I set the parameters for BLAST (chomp strips the trailing newline)
print "Enter your BLAST choice (blastn, blastp, blastx, tblastn, tblastx):\n";
chomp(my $prog = <STDIN>);
print "Enter a database to search (nr, refseq_protein, swissprot, pat, pdb, env_nr):\n";
chomp(my $db = <STDIN>);
print "Enter your E-value cutoff (1e-n):\n";
chomp(my $e_val = <STDIN>);

my @params = ( '-prog' => $prog,
         '-data' => $db,
         '-expect' => $e_val,
         '-readmethod' => 'SearchIO' );

my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params);


# Select the file and submit the BLAST job.
print "Enter your FASTA file:\n";
chomp(my $infile = <STDIN>);
my $r = $remoteBlast->submit_blast($infile);
my $v = 1;   # verbosity flag

    print STDERR "waiting..." if( $v > 0 );  # wait for the results to return
    while ( my @rids = $remoteBlast->each_rid ) {
      foreach my $rid ( @rids ) {
        my $rc = $remoteBlast->retrieve_blast($rid);
        if( !ref($rc) ) {
          # not ready yet; a negative value means the job failed
          if( $rc < 0 ) {
            $remoteBlast->remove_rid($rid);
          }
          print STDERR "." if ( $v > 0 );
          sleep 5;
        } else {
          my $result = $rc->next_result();
          # save the output, one file per query
          # (earlier attempt at one report: open SALIDA, '>>'."$^T"."Report"."\.out";)
          my $filename = $result->query_name() . ".out";
          $remoteBlast->save_output($filename);
          $remoteBlast->remove_rid($rid);
          print "\nQuery Name: ", $result->query_name(), "\n";
          while ( my $hit = $result->next_hit ) {
            next unless ( $v > 0);
            print "\thit name is ", $hit->name, "\n";
            while( my $hsp = $hit->next_hsp ) {
              print "\t\tscore is ", $hsp->score, "\n";
            }
          }
        }
      }
    }


Could any of you please explain how to do this?

Thanks in advance

With best wishes

-- 
-- 
Dr. José Luis Lavín Trueba

Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN


From cjfields at illinois.edu  Mon Nov 14 12:02:22 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 14 Nov 2011 17:02:22 +0000
Subject: [Bioperl-l] How to get Remote BLAST results in a single	out
In-Reply-To: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com>
References: <CADm9iynj0NcWOtE4asYimfFBhbTYAFrROLj9qccmoH6o=yKVrg@mail.gmail.com>
	<8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com>
Message-ID: <5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>

Re: a BLAST+ equivalent for blastcl3, I believe there is an option for the various 'blast*' programs indicating that the search should use a remote database.  I haven't used it, though...

chris

On Nov 14, 2011, at 8:07 AM, Jason Stajich wrote:

> Please keep this on list discussions 
> 
> Sent from my iPhone-please excuse typos
> 
> --
> Jason Stajich
> 
> Begin forwarded message:
> 
>> From: Jos? Luis Lav?n <jluis.lavin at unavarra.es>
>> Date: November 14, 2011 8:04:25 AM EST
>> To: Jason Stajich <jason.stajich at gmail.com>
>> Subject: Re: [Bioperl-l] Fwd: How to get Remote BLAST results in a single out
>> 
>> Hello Jason,
>> 
>> As answering your question:
>> 
>> " If you want to do this within this code I guess the question is what format you want the data in - a BLAST report or something more like a table?"
>> 
>> A concatenation of BLAST (default format) reports should be OK, since I have a script to parse that kind of results. Anyway formats 1 or 2 will also do the trick. 
>> I'll be happy to get assistance  on how to change the OUTFILE from "a query a report" to "all queries in the same report", because I don't seem to be able to do it myself after reading the module documentation.
>> 
>> Thanks in advance
>> 
>> El 14 de noviembre de 2011 12:59, Jason Stajich <jason.stajich at gmail.com> escribi?:
>> if you want to do a bunch of BLASTs remotely on the cmdline you should also just use the NCBI's blastcl3 tool (not sure if there is a BLAST+ equivalent). This might be faster to do and easier since you need to learn the programming part too.
>> 
>> If you want to do this within this code I guess the question is what format you want the data in - a BLAST report or something more like a table?
>> 
>> On Nov 14, 2011, at 6:14 AM, Jos? Luis Lav?n wrote:
>> 
>>> Hello everybody,
>>> 
>>> I've been using  "Bio::Tools::Run::RemoteBlast" for a time and it has
>>> worked fine for me. Now I need to perform a multiple BLAST search, but this
>>> time I'd just like to get all the BLAST results in a single out file
>>> instead of having each sequence's report written individually. I've read
>>> the documentation of the module, but due to my short
>>> experience/understanding on complex modules as this one seems to be I can't
>>> figure out where to change the script to achieve my previously mentioned
>>> aim.
>>> Here I post the script I've been using (it's basically the one posted on
>>> the module cookbook).
>>> 
>>> #!/c:/Perl -w
>>> use Bio::Tools::Run::RemoteBlast;
>>> use Bio::SearchIO;
>>> use Data::Dumper;
>>> 
>>> #Here i set the parameters for blast
>>> print "Enter your BLAST choice (blastn, blastp, blastx, tblastn,
>>> tblastx):\n";
>>> my $blst = <STDIN>;
>>> my $prog = "$blst";
>>> print "Enter a database to search (nr, refseq_protein, swissprot, pat, pdb,
>>> env_nr):\n";
>>> my $dtb = <STDIN>;
>>> $db = "$dtb";
>>> print "Enter your cutt off score (1e-n):\n";
>>> my $cut = <STDIN>;
>>> my $e_val = "$cut";
>>> 
>>> my @params = ( '-prog' => $prog,
>>>        '-data' => $db,
>>>        '-expect' => $e_val,
>>>        '-readmethod' => 'SearchIO' );
>>> 
>>> my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params);
>>> 
>>> 
>>> #Select the file and make the blast.
>>> print "Enter your FASTA file:\n";
>>> chomp(my $infile = <STDIN>);
>>> my $r = $remoteBlast->submit_blast($infile);
>>> my $v = 1;
>>> 
>>>   print STDERR "waiting..." if( $v > 0 );  ########  WAIT FOR THE RESULTS
>>> TO RETURN!!!!!
>>>   while ( my @rids = $remoteBlast->each_rid ) {
>>>     foreach my $rid ( @rids ) {
>>>       my $rc = $remoteBlast->retrieve_blast($rid);
>>>       if( !ref($rc) ) {
>>>         if( $rc < 0 ) {
>>>           $remoteBlast->remove_rid($rid);
>>>         }
>>>         print STDERR "." if ( $v > 0 );
>>>         sleep 5;
>>>       } else {
>>>         my $result = $rc->next_result();
>>>         #save the output
>>>         my $filename =
>>> $result->query_name()."\.out";##################open SALIDA,
>>> '>>'."$^T"."Report"."\.out";
>>>         $remoteBlast->save_output($filename);#############
>>>         $remoteBlast->remove_rid($rid);
>>>         print "\nQuery Name: ", $result->query_name(), "\n";
>>>         while ( my $hit = $result->next_hit ) {
>>>           next unless ( $v > 0);
>>>           print "\thit name is ", $hit->name, "\n";
>>>           while( my $hsp = $hit->next_hsp ) {
>>>             print "\t\tscore is ", $hsp->score, "\n";
>>>           }
>>>         }
>>>       }
>>>     }
>>>   }
>>> 
>>> 
>>> Could any of you please explain how I can solve this?
>>> 
>>> Thanks in advance
>>> 
>>> With best wishes
>>> 
>>> --
>>> --
>>> Dr. José Luis Lavín Trueba
>>> 
>>> Dpto. de Producción Agraria
>>> Grupo de Genética y Microbiología
>>> Universidad Pública de Navarra
>>> 31006 Pamplona
>>> Navarra
>>> SPAIN
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
>> 
>> 
>> -- 
>> -- 
>> Dr. José Luis Lavín Trueba
>> 
>> Dpto. de Producción Agraria
>> Grupo de Genética y Microbiología
>> Universidad Pública de Navarra
>> 31006 Pamplona
>> Navarra
>> SPAIN
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Mon Nov 14 12:03:04 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 14 Nov 2011 17:03:04 +0000
Subject: [Bioperl-l] Suggestion for Bio::PopGen::Statistics
In-Reply-To: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu>
References: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu>
Message-ID: <E385D24C-E562-43B9-A820-2A7C59E9399A@illinois.edu>

Cheng,

Have you tried the latest CPAN release (we're at 1.006901)?

chris

On Nov 13, 2011, at 8:42 AM, Cheng-Ruei Lee wrote:

> Hi all,
> 
>    Bioperl version: 1.006
>    Here are two error messages when I'm using this module to calculate Fu & Li's statistics:
> Illegal division by zero at (the Statistics.pm file) line 359
> Illegal division by zero at (the Statistics.pm file) line 376
>    A further tracking down shows that the first error happens when $n (sample size in the ingroup) equals 1 or 2, and the second error happens when $n equals 3. This is not really a bug though. I would suggest either in the original code, do a checking before the calculation (and skip the current calculation when $n == 1, 2, or 3 - rather than let the whole program die), or add a few lines of notes in the CPAN page.
> 
> Sincerely,
> Cheng-Ruei Lee
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Mon Nov 14 12:59:35 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 14 Nov 2011 17:59:35 +0000
Subject: [Bioperl-l] OBDA redux?
In-Reply-To: <CAKVJ-_4i8AJL1emBKpnO+p-SVzNtbSVwdL9uSy72NWkHugRtVA@mail.gmail.com>
References: <CAKVJ-_6bzfZZr82y+J4qmGbn0du4rxaKaGxjmbC7p-pU_nMuoQ@mail.gmail.com>
	<FB761CFA-1CFD-4FA0-A708-2CE3F2F240D9@illinois.edu>
	<CAKVJ-_4i8AJL1emBKpnO+p-SVzNtbSVwdL9uSy72NWkHugRtVA@mail.gmail.com>
Message-ID: <12E3B71D-6E61-41AD-A956-A1FC2076AF24@illinois.edu>

On Nov 13, 2011, at 6:24 AM, Peter Cock wrote:

> So, Chris and I seem in general agreement that an OBDA v2
> using SQLite but based on essentially the same approach as
> the BDB or flat file based OBDA v1 is a good idea. i.e. Tables
> mapping record identifiers to file offsets in the original sequence
> files.

The worry I have is adhering to a specific backend (e.g. SQLite).  The reason I say this is b/c BDB in its time was the go-to way of storing simple index data, but that is no longer feasible for very large data sets.  Who's to say something similar won't happen to SQLite, or that it is the best option available?  

Maybe we should focus on the data storage schema, as simple as it may be, then indicate the default backend must be SQLite but others are allowed (maybe with a mention that SQLite can be replaced by alternatives in the future if needed).  

> I hope to get BioRuby on board, they already have an OBDA
> v1 support so that shouldn't be too hard.
> 
> Right now I don't recall if BioJava has/had OBDA v1 support,
> and if they did if it was affected in their recent move to BioJava
> v3 (I understand from their mailing list that some older lower
> priority functionality has not all been ported yet).

I wouldn't be surprised at that, OBDA kind of lingered for a while, and I'm not sure how widely adopted it became (maybe others can shed light on that?)

> Also EMBOSS are likely to be interested, certainly Peter Rice
> was interested in the SQLite indexing we're already using in
> Biopython for sequence files (i.e. what is effectively the
> prototype for OBDA v2).
> 
> Note that in addition to simple indexing of text files, we are
> already using the same simple offset + length approach for
> indexing binary files (e.g. SFF).

I think that's the general idea; that is how all BioPerl data was indexed, first with the Bio::Index modules and then with the OBDA implementations as well.

> On the immediate practical side, I think I can edit the
> current OBDA website of http://obda.open-bio.org/
> via /home/websites/obda.open-bio.org/html on the
> server.

See below w/ regards to my thoughts on the wiki.

> We need to work out where the current OBDA indexing
> specification lives (CVS or SVN?) and perhaps move
> that to github. We may need a general OBF organisation
> account on git hub for this and any other cross-project
> repositories.

+1 to a move to github, but maybe this belongs in an OBF-specific organization.  And maybe we should take advantage of the simple wiki or project homepage that GitHub offers and move everything (docs) there. 

> I see there is already an OBDA project on RedMine,
> (Chris can you add me to that please?)
> https://redmine.open-bio.org/projects/obda
> 
> Peter

Done (last night actually, but I didn't have time to respond immediately).

chris


From David.Messina at sbc.su.se  Mon Nov 14 14:31:18 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 14 Nov 2011 20:31:18 +0100
Subject: [Bioperl-l] How to get Remote BLAST results in a single	out
In-Reply-To: <5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>
References: <CADm9iynj0NcWOtE4asYimfFBhbTYAFrROLj9qccmoH6o=yKVrg@mail.gmail.com>
	<8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com>
	<5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>
Message-ID: <29C56604-BBEE-4D80-9662-7C3627907200@sbc.su.se>


> Re: a BLAST+ equivalent for blastcl3, I believe there is an option for the various 'blast*' indicating the search is to use a remote database.  I haven't used it, though...

Yes, it's the -remote option. I've used it, and it works great.

The speed is throttled by NCBI, however, so for an appreciable number of queries the standard advice applies: run the search on your own computers.
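
On the original question of getting every report into one file: a minimal sketch in core Perl, assuming each per-query report has already been written with save_output() as in the posted script. The helper name append_report and the combined file name used below are made up for illustration; they are not part of Bio::Tools::Run::RemoteBlast.

```perl
use strict;
use warnings;

# Append the contents of one saved report to a combined report file.
# Both arguments are plain file names; the combined file is created
# on first use and grows with every call.
sub append_report {
    my ($report, $combined) = @_;
    open my $in,  '<',  $report   or die "Cannot read $report: $!";
    open my $out, '>>', $combined or die "Cannot append to $combined: $!";
    while (my $line = <$in>) {
        print {$out} $line;
    }
    close $in;
    close $out or die "Cannot close $combined: $!";
    return;
}
```

In the posted loop the call would go right after $remoteBlast->save_output($filename), for example append_report($filename, 'all_reports.out'). The per-query files can be deleted with unlink afterwards if they are not wanted.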


Dave



From jluis.lavin at unavarra.es  Mon Nov 14 16:23:31 2011
From: jluis.lavin at unavarra.es (=?ISO-8859-1?Q?Jos=E9_Luis_Lav=EDn?=)
Date: Mon, 14 Nov 2011 22:23:31 +0100
Subject: [Bioperl-l] How to get Remote BLAST results in a single out
In-Reply-To: <5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>
References: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com>
	<5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>
Message-ID: <CADm9iy=JcWtUp-KvazA=go2V_VMR7N8D92cHCMe5Rg5kzWmZKQ@mail.gmail.com>

Thank you very much for your answers, but judging from them, I'm afraid I
didn't explain myself well enough.

 I'm not looking for another tool to perform a BLAST task. I was just
wondering if there was a way to simply change the way the module writes the
outputs, so that I can get multiple searches in a single report file
instead of having a report for each BLAST search.

Maybe there's some issue I'm unaware of that makes you recommend the use of
other tools instead of the BioPerl Remote BLAST module. If so, I would
appreciate it if you let me know about it (NCBI server problems with
web services, or the like).

Thank you for your answers anyway

Best wishes

2011/11/14 Fields, Christopher J <cjfields at illinois.edu>

> Re: a BLAST+ equivalent for blastcl3, I believe there is an option for the
> various 'blast*' indicating the search is to use a remote database.  I
> haven't used it, though...
>
> chris
>
> On Nov 14, 2011, at 8:07 AM, Jason Stajich wrote:
>
> > Please keep this on list discussions
> >
> > Sent from my iPhone-please excuse typos
> >
> > --
> > Jason Stajich
> >
> > Begin forwarded message:
> >
> >> From: José Luis Lavín <jluis.lavin at unavarra.es>
> >> Date: November 14, 2011 8:04:25 AM EST
> >> To: Jason Stajich <jason.stajich at gmail.com>
> >> Subject: Re: [Bioperl-l] Fwd: How to get Remote BLAST results in a single out
> >>
> >> Hello Jason,
> >>
> >> To answer your question:
> >>
> >> "If you want to do this within this code I guess the question is what
> >> format you want the data in - a BLAST report or something more like a
> >> table?"
> >>
> >> A concatenation of BLAST (default format) reports should be OK, since I
> >> have a script to parse that kind of result. Formats 1 or 2 would
> >> also do the trick.
> >> I'd be happy to get assistance on how to change the OUTFILE from "one
> >> report per query" to "all queries in the same report", because I don't seem
> >> to be able to do it myself after reading the module documentation.
> >>
> >> Thanks in advance
> >>
> >> On 14 November 2011 at 12:59, Jason Stajich <jason.stajich at gmail.com> wrote:
> >> if you want to do a bunch of BLASTs remotely on the cmdline you should
> >> also just use the NCBI's blastcl3 tool (not sure if there is a BLAST+
> >> equivalent). This might be faster to do and easier since you need to learn
> >> the programming part too.
> >>
> >> If you want to do this within this code I guess the question is what
> >> format you want the data in - a BLAST report or something more like a table?
> >>
> >> On Nov 14, 2011, at 6:14 AM, José Luis Lavín wrote:
> >>
> >>> Hello everybody,
> >>>
> >>> I've been using "Bio::Tools::Run::RemoteBlast" for a while and it has
> >>> worked fine for me. Now I need to perform a multiple BLAST search, but
> >>> this time I'd just like to get all the BLAST results in a single out file
> >>> instead of having each sequence's report written individually. I've read
> >>> the documentation of the module, but due to my limited experience with
> >>> complex modules such as this one, I can't figure out where to change the
> >>> script to achieve this.
> >>> Here I post the script I've been using (it's basically the one posted on
> >>> the module cookbook).
> >>>
> >>> #!/c:/Perl -w
> >>> use Bio::Tools::Run::RemoteBlast;
> >>> use Bio::SearchIO;
> >>> use Data::Dumper;
> >>>
> >>> # Here I set the parameters for BLAST
> >>> print "Enter your BLAST choice (blastn, blastp, blastx, tblastn, tblastx):\n";
> >>> chomp(my $prog = <STDIN>);   # strip the trailing newline from the input
> >>> print "Enter a database to search (nr, refseq_protein, swissprot, pat, pdb, env_nr):\n";
> >>> chomp(my $db = <STDIN>);
> >>> print "Enter your cut-off score (1e-n):\n";
> >>> chomp(my $e_val = <STDIN>);
> >>>
> >>> my @params = ( '-prog' => $prog,
> >>>        '-data' => $db,
> >>>        '-expect' => $e_val,
> >>>        '-readmethod' => 'SearchIO' );
> >>>
> >>> my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params);
> >>>
> >>>
> >>> #Select the file and make the blast.
> >>> print "Enter your FASTA file:\n";
> >>> chomp(my $infile = <STDIN>);
> >>> my $r = $remoteBlast->submit_blast($infile);
> >>> my $v = 1;
> >>>
> >>>   print STDERR "waiting..." if( $v > 0 );  # wait for the results to return
> >>>   while ( my @rids = $remoteBlast->each_rid ) {
> >>>     foreach my $rid ( @rids ) {
> >>>       my $rc = $remoteBlast->retrieve_blast($rid);
> >>>       if( !ref($rc) ) {
> >>>         if( $rc < 0 ) {
> >>>           $remoteBlast->remove_rid($rid);
> >>>         }
> >>>         print STDERR "." if ( $v > 0 );
> >>>         sleep 5;
> >>>       } else {
> >>>         my $result = $rc->next_result();
> >>>         #save the output
> >>>         my $filename = $result->query_name() . ".out";
> >>>         # open SALIDA, '>>' . "$^T" . "Report" . ".out";
> >>>         $remoteBlast->save_output($filename);
> >>>         $remoteBlast->remove_rid($rid);
> >>>         print "\nQuery Name: ", $result->query_name(), "\n";
> >>>         while ( my $hit = $result->next_hit ) {
> >>>           next unless ( $v > 0);
> >>>           print "\thit name is ", $hit->name, "\n";
> >>>           while( my $hsp = $hit->next_hsp ) {
> >>>             print "\t\tscore is ", $hsp->score, "\n";
> >>>           }
> >>>         }
> >>>       }
> >>>     }
> >>>   }
> >>>
> >>>
> >>> Could any of you please explain how I can solve this?
> >>>
> >>> Thanks in advance
> >>>
> >>> With best wishes
> >>>
> >>> --
> >>> --
> >>> Dr. José Luis Lavín Trueba
> >>>
> >>> Dpto. de Producción Agraria
> >>> Grupo de Genética y Microbiología
> >>> Universidad Pública de Navarra
> >>> 31006 Pamplona
> >>> Navarra
> >>> SPAIN
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >>
> >>
> >> --
> >> --
> >> Dr. José Luis Lavín Trueba
> >>
> >> Dpto. de Producción Agraria
> >> Grupo de Genética y Microbiología
> >> Universidad Pública de Navarra
> >> 31006 Pamplona
> >> Navarra
> >> SPAIN
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
-- 
Dr. José Luis Lavín Trueba

Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN


From jason.stajich at gmail.com  Mon Nov 14 22:53:19 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Mon, 14 Nov 2011 22:53:19 -0500
Subject: [Bioperl-l] Suggestion for Bio::PopGen::Statistics
In-Reply-To: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu>
References: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu>
Message-ID: <0A6DF9A2-F34F-4277-8E84-C3E5351BB3FF@gmail.com>

sure -- as you say, the implementation presumed that this method would be called with more than 3 individuals, which is a shortcoming.  I have committed the code fix but still need someone to add a comment to the perldoc. I've made it a redmine bug. 

https://redmine.open-bio.org/issues/3313
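
Until the fix reaches a release, a caller-side guard along these lines might help. This is only a sketch: the helper name stat_or_skip and the default minimum of 4 individuals are illustrative choices based on the sample sizes in the report (1, 2 and 3), and the stand-in statistic in the demo is a placeholder, not Fu & Li's formula.

```perl
use strict;
use warnings;

# Call $stat->($individuals) only when the sample has at least $min_n
# members; otherwise return undef so the caller can skip this data set
# instead of letting the whole program die.
sub stat_or_skip {
    my ($individuals, $stat, $min_n) = @_;
    $min_n = 4 unless defined $min_n;       # assumed threshold, see lead-in
    return undef if @$individuals < $min_n;
    my $value = eval { $stat->($individuals) };
    warn "statistic skipped: $@" if $@;     # e.g. a remaining division by zero
    return $value;
}

# Stand-in statistic for demonstration only (NOT Fu & Li's formula).
my $demo_stat = sub { my $n = @{ $_[0] }; return 1 / ($n - 3) };

for my $sample ([1 .. 3], [1 .. 5]) {
    my $d = stat_or_skip($sample, $demo_stat);
    print scalar(@$sample), " individuals: ",
          (defined($d) ? $d : 'skipped'), "\n";
}
```

In real use the code reference would wrap the Bio::PopGen::Statistics call in question, with the ingroup array reference passed as the first argument.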

Jason

Can you provide a test script and we'll add a test for this so 
On Nov 13, 2011, at 9:42 AM, Cheng-Ruei Lee wrote:

> Hi all,
> 
>    Bioperl version: 1.006
>    Here are two error messages when I'm using this module to calculate Fu & Li's statistics:
> Illegal division by zero at (the Statistics.pm file) line 359
> Illegal division by zero at (the Statistics.pm file) line 376
>    A further tracking down shows that the first error happens when $n (sample size in the ingroup) equals 1 or 2, and the second error happens when $n equals 3. This is not really a bug though. I would suggest either in the original code, do a checking before the calculation (and skip the current calculation when $n == 1, 2, or 3 - rather than let the whole program die), or add a few lines of notes in the CPAN page.
> 
> Sincerely,
> Cheng-Ruei Lee
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cchehoud at gmail.com  Mon Nov 14 20:39:32 2011
From: cchehoud at gmail.com (Christel Chehoud)
Date: Mon, 14 Nov 2011 17:39:32 -0800
Subject: [Bioperl-l] Bioperl installation help
Message-ID: <CAO28iBQV1iAFpUKwsZ2g-JaDo8Dz3idx=w8DgFu7A1TBOHoDhg@mail.gmail.com>

Dear BioPerl,
Thank you for creating such useful code. Unfortunately, every time I
try to install BioPerl, it takes me a long time and is a challenging
ordeal :( I am a new Mac user and was not able to download BioPerl
using CPAN. Here is the error I am getting:

ERROR: Can't create '/usr/local/bin'
Do not have write permissions on '/usr/local/bin'
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 at /Users/Christel/.cpan/build/BioPerl-1.6.0-DiZ0rN/Bio/Root/Build.pm line 902
  CJFIELDS/BioPerl-1.6.0.tar.gz
  ./Build install  -- NOT OK
----
  You may have to su to root to install the package
  (Or you may want to run something like
    o conf make_install_make_command 'sudo make'
  to raise your permissions.Warning (usually harmless): 'YAML' not
installed, will not store persistent state
Failed during this command:
 CMUNGALL/Data-Stag-0.11.tar.gz               : make NO
 CJFIELDS/BioPerl-1.6.0.tar.gz                : make_test FAILED but
failure ignored because 'force' in effect


so I did:
cpan> o conf make_install_make_command 'sudo make'
followed by
cpan> o conf commit

and started over. I got the same number of errors as last time (so I
decided not to force the install this time). Do you have any suggestions?
This is what I get:

63 tests and 305 subtests skipped.
Failed 11/329 test scripts. 981/17708 subtests failed.
Files=329, Tests=17708, 127 wallclock secs (104.69 cusr + 12.51 csys =
117.20 CPU)
Failed 11/329 test programs. 981/17708 subtests failed.
  CJFIELDS/BioPerl-1.6.1.tar.gz
  ./Build test -- NOT OK
//hint// to see the cpan-testers results for installing this module, try:
  reports CJFIELDS/BioPerl-1.6.1.tar.gz
Warning (usually harmless): 'YAML' not installed, will not store
persistent state
Running Build install
  make test had returned bad status, won't install without force
Failed during this command:
 CMUNGALL/Data-Stag-0.11.tar.gz               : make NO
 FLORA/ExtUtils-Manifest-1.60.tar.gz          : make NO
 CJFIELDS/BioPerl-1.6.1.tar.gz                : make_test NO


Thanks a lot for your time and help.  I appreciate it.

Thank you,
Christel

From casaburi at ceinge.unina.it  Tue Nov 15 04:25:25 2011
From: casaburi at ceinge.unina.it (Giorgio C)
Date: Tue, 15 Nov 2011 01:25:25 -0800 (PST)
Subject: [Bioperl-l]  Blast > parsing result in Exel
Message-ID: <32846407.post@talk.nabble.com>


Hi everybody,

I have a BLAST (-m 1) result file like this:

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Query= 132-291
(59 letters)

Database: Scrivania/orchidea/mature_mirBase.fa
21,643 sequences; 470,608 total letters

Searching..................................................done


Score E
Sequences producing significant alignments: (bits) Value

mtr-miR2644b MIMAT0013413 Medicago truncatula miR2644b 28 0.031
mtr-miR2644a MIMAT0013412 Medicago truncatula miR2644a 28 0.031
gga-miR-1704 MIMAT0007596 Gallus gallus miR-1704 22 1.9
gga-miR-1557 MIMAT0007414 Gallus gallus miR-1557 22 1.9
mmu-miR-880-5p MIMAT0017266 Mus musculus miR-880-5p 22 1.9

132_0 8 cagccgctcagattgatggtgcctacagccttgccagcccgctcagattgat 59
12631 5 .............. 18
12630 5 .............. 18
7826 5 ........... 15
7644 19 ........... 9
5394 3 ........... 13
5394 3 ........... 13
BLASTN 2.2.21 [Jun-14-2009]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
...
....
..........

______________________________________________________________
I need to parse this into an Excel sheet with these columns:

1)ID 2)Name of the hit 3)E-value 4)Score 5)Species


1) 132-291 2)mir2644b 3) 0,031 4)28 5) Medicago truncatula


Is it possible to obtain, from a big BLAST result file, an Excel sheet with
5 columns where each row holds the first hit of one BLAST result? Can anyone
help me solve this problem, perhaps with a little Perl script?


Thank you very much
-- 
View this message in context: http://old.nabble.com/Blast-%3E-parsing-result-in-Exel-tp32846407p32846407.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From nisa.dar10 at gmail.com  Tue Nov 15 19:49:00 2011
From: nisa.dar10 at gmail.com (nisa.dar)
Date: Tue, 15 Nov 2011 16:49:00 -0800 (PST)
Subject: [Bioperl-l]  print alignment from blast results file
Message-ID: <32851673.post@talk.nabble.com>


Hi,

I am parsing a blast results file. I have found bioperl modules to get query
string, homology string and hit string for each hit/hsp. I want to print
them in the form of an alignment instead of aligning them individually.

this is what I am doing, but it doesn't seem correct

while (my $hsp = $hit->next_hsp) {
    my $start_query_num = $hsp->start('query');
    my $query_string    = $hsp->query_string;
    my $end_query_num   = $hsp->end('query');
    my $homology_string = $hsp->homology_string;
    my $start_hit_num   = $hsp->start('hit');
    my $hit_string      = $hsp->hit_string;
    my $end_hit_num     = $hsp->end('hit');
    my $aln_o           = $hsp->get_aln;

    # get rid of newline characters
    $query_string    =~ s/\n//g;
    $homology_string =~ s/\n//g;
    $hit_string      =~ s/\n//g;

    print "<h3>Alignment:</h3><br />";
    print "$start_query_num-$query_string-$end_query_num<br />";
    print "   &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$homology_string<br />";
    print "$start_hit_num-$hit_string-$end_hit_num<br /><br />";
}

Please let me know how can I print them in the form of an alignment as seen
in the blast results file.

Thanks


-- 
View this message in context: http://old.nabble.com/print-alignment-from-blast-results-file-tp32851673p32851673.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From p.j.a.cock at googlemail.com  Wed Nov 16 04:11:40 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 16 Nov 2011 09:11:40 +0000
Subject: [Bioperl-l] Blast > parsing result in Exel
In-Reply-To: <32846407.post@talk.nabble.com>
References: <32846407.post@talk.nabble.com>
Message-ID: <CAKVJ-_5PTZttkHXS-FB-tOxhDRCty_qJH9PTurDWn2M5p3VzSw@mail.gmail.com>

On Tue, Nov 15, 2011 at 9:25 AM, Giorgio C <casaburi at ceinge.unina.it> wrote:
>
> Hy everybody,
>
> in this situation froma blast (-m 1) result file :
>
> ...
>
> I need to parse in an exel sheet :
>
> 1)ID 2)Name of the hit 3)E-value 4)Score 5)Species
>
>
> 1) 132-291 2)mir2644b 3) 0,031 4)28 5) Medicago truncatula
>
> Is possible from a big blast result file obtain an exel with 5 columns where
> every field is the first hit of the blast result. Can anyone halp me to fix
> this problem ??? Also with a little script in perl.
>
> Thank you very much

Have you looked at any of the BioPerl BLAST parsing examples? e.g.
http://www.bioperl.org/wiki/HOWTO:Beginners#BLAST
http://www.bioperl.org/wiki/HOWTO:SearchIO
http://www.bioperl.org/wiki/Module:Bio::SearchIO

See also http://seqanswers.com/forums/showthread.php?t=15489
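
If the summary table in the report is all that is needed, a plain-text approach without BioPerl may be enough. The sketch below is a rough illustration only: the function name first_hits is made up, and it assumes each hit line has the layout shown in the question (hit ID, accession, genus, species, further description, score, E-value), which holds for these miRBase headers but not for databases in general. It writes tab-separated rows, which Excel can open directly.

```perl
use strict;
use warnings;

# Return one [query, hit ID, E-value, score, species] row per query,
# taken from the first line of each "Sequences producing significant
# alignments" table read from the filehandle $fh.
sub first_hits {
    my ($fh) = @_;
    my (@rows, $query, $in_table);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^Query=\s*(\S+)/) {
            $query    = $1;
            $in_table = 0;
        }
        elsif ($line =~ /^Sequences producing significant alignments/) {
            $in_table = 1;
        }
        elsif ($in_table && $line =~ /\S/) {
            my @f = split ' ', $line;
            if (@f >= 6) {
                # Score and E-value are the last two fields on the line.
                my ($score, $evalue) = @f[-2, -1];
                push @rows, [ $query, $f[0], $evalue, $score, "$f[2] $f[3]" ];
            }
            $in_table = 0;    # only the first hit of each query is wanted
        }
    }
    return @rows;
}

# Demo on an excerpt of the report from the question.
my $sample = <<'END';
Query= 132-291
(59 letters)

Sequences producing significant alignments: (bits) Value

mtr-miR2644b MIMAT0013413 Medicago truncatula miR2644b 28 0.031
mtr-miR2644a MIMAT0013412 Medicago truncatula miR2644a 28 0.031
END
open my $fh, '<', \$sample or die "Cannot open in-memory report: $!";
print join("\t", @$_), "\n" for first_hits($fh);
```

For a real file, open it with open my $fh, '<', $filename and redirect the output to a .tsv file. Queries without hits produce no row in this sketch.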

Peter

From bosborne11 at verizon.net  Wed Nov 16 08:19:33 2011
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 16 Nov 2011 08:19:33 -0500
Subject: [Bioperl-l] print alignment from blast results file
In-Reply-To: <32851673.post@talk.nabble.com>
References: <32851673.post@talk.nabble.com>
Message-ID: <035404FE-A7D9-413A-B6EF-48564FE76426@verizon.net>

Nisa,

See:

http://www.bioperl.org/wiki/HOWTO:SearchIO
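
If a plain-text layout like the one in the BLAST report is enough, the three strings from the HSP can be printed in fixed-width blocks. A minimal sketch in core Perl follows; format_hsp is a made-up helper, and it assumes the query, homology and hit strings have equal length and that positions increase by one per residue (so it does not handle translated searches or reverse-strand hits).

```perl
use strict;
use warnings;

# Lay out one HSP as BLAST-style text blocks of $width columns.
# $query, $midline and $hit are the aligned strings; $q_start and
# $h_start are the start positions of the query and the hit.
sub format_hsp {
    my ($query, $midline, $hit, $q_start, $h_start, $width) = @_;
    $width ||= 60;
    my ($q_pos, $h_pos) = ($q_start, $h_start);
    my $text = '';
    for (my $i = 0; $i < length $query; $i += $width) {
        my $q_chunk = substr $query,   $i, $width;
        my $m_chunk = substr $midline, $i, $width;
        my $h_chunk = substr $hit,     $i, $width;
        # Gap characters do not consume a sequence position.
        my $q_len = length($q_chunk) - ($q_chunk =~ tr/-//);
        my $h_len = length($h_chunk) - ($h_chunk =~ tr/-//);
        $text .= sprintf "Query %6d  %s  %d\n", $q_pos, $q_chunk, $q_pos + $q_len - 1;
        $text .= sprintf "%14s%s\n", '', $m_chunk;
        $text .= sprintf "Sbjct %6d  %s  %d\n\n", $h_pos, $h_chunk, $h_pos + $h_len - 1;
        $q_pos += $q_len;
        $h_pos += $h_len;
    }
    return $text;
}

# Demo with made-up strings and a narrow width to force two blocks.
print format_hsp('ACGT-ACGTAC', '|||| ||| ||', 'ACGTTACGAAC', 8, 5, 6);
```

In the loop from the question the call would be format_hsp($hsp->query_string, $hsp->homology_string, $hsp->hit_string, $hsp->start('query'), $hsp->start('hit')). For HTML output, wrapping the result in a pre element keeps the columns aligned.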

Brian O.


On Nov 15, 2011, at 7:49 PM, nisa.dar wrote:

> 
> Hi,
> 
> I am parsing a blast results file. I have found bioperl modules to get query
> string, homology string and hit string for each hit/hsp. I want to print
> them in the form of an alignment instead of aligning them individually.
> 
> this is what I am doing, but it doesn't seem correct
> 
> while (my $hsp = $hit->next_hsp) {
>                                        my
> $start_query_num=$hsp->start('query');
> 					my $query_string=$hsp->query_string;
> 					my $end_query_num=$hsp->end('query');
> 					my $homology_string=$hsp->homology_string;
> 					my $start_hit_num=$hsp->start('hit');
> 					my $hit_string=$hsp->hit_string;
> 					my $end_hit_num=$hsp->end('hit');
> 					my $aln_o = $hsp->get_aln;
> 					$query_string=~s/\n//g;#get rid of new line characters
> 					$homology_string=~s/\n//g;
> 					$hit_string=~s/\n//g;
> 
>                         print "<h3>Alignment:</h3><br />";
> 			print "$start_query_num-$query_string-$end_query_num<br />";
> 			print "   
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$homology_string<br />";
> 			print "$start_hit_num-$hit_string-$end_hit_num<br /><br />";
> 
> 
> 
> }
> 
> Please let me know how can I print them in the form of an alignment as seen
> in the blast results file.
> 
> Thanks
> 
> 
> -- 
> View this message in context: http://old.nabble.com/print-alignment-from-blast-results-file-tp32851673p32851673.html
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Wed Nov 16 11:44:27 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 16 Nov 2011 16:44:27 +0000
Subject: [Bioperl-l] Bioperl installation help
In-Reply-To: <CAO28iBQV1iAFpUKwsZ2g-JaDo8Dz3idx=w8DgFu7A1TBOHoDhg@mail.gmail.com>
References: <CAO28iBQV1iAFpUKwsZ2g-JaDo8Dz3idx=w8DgFu7A1TBOHoDhg@mail.gmail.com>
Message-ID: <72481F31-3ADB-4E3D-9DBC-714FBEC730E4@illinois.edu>

For some reason you are trying to install an older version of BioPerl; try installing Bio::Perl (or one of the core modules).  This should automatically install the latest version from CPAN.  My guess is this will address some of the issues.  However, w/o actually seeing what tests failed we can't help.

Also, if you are only interested in running local jobs, install BioPerl locally, or just grab the dist and add it to PERL5LIB.  There are instructions in the installation docs for that.  You can also use cpanm (cpanminus) to install things locally as well, it's highly recommended and much easier than cpan.

chris

On Nov 14, 2011, at 7:39 PM, Christel Chehoud wrote:

> Dear BioPerl,
> Thank you for creating such useful code. Unfortunately, every time I
> try to install Bioperl, it takes me a long time and is a challenging
> ordeal :( I am a new MAC user and was not able to download bioperl
> using CPAN. Here is the error I am getting:
> 
> ERROR: Can't create '/usr/local/bin'
> Do not have write permissions on '/usr/local/bin'
> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
> at /Users/Christel/.cpan/build/BioPerl-1.6.0-DiZ0rN/Bio/Root/Build.pm line 902
>  CJFIELDS/BioPerl-1.6.0.tar.gz
>  ./Build install  -- NOT OK
> ----
>  You may have to su to root to install the package
>  (Or you may want to run something like
>    o conf make_install_make_command 'sudo make'
>  to raise your permissions.Warning (usually harmless): 'YAML' not
> installed, will not store persistent state
> Failed during this command:
> CMUNGALL/Data-Stag-0.11.tar.gz               : make NO
> CJFIELDS/BioPerl-1.6.0.tar.gz                : make_test FAILED but
> failure ignored because 'force' in effect
> 
> 
> so I did:
> cpan> o conf make_install_make_command 'sudo make'
> followed by
> cpan> o conf commit
> 
> and started over..I got the same number of errors as last time (so I
> decided not to force install this time). do you have any suggestions:
> 
> 63 tests and 305 subtests skipped.
> Failed 11/329 test scripts. 981/17708 subtests failed.
> Files=329, Tests=17708, 127 wallclock secs (104.69 cusr + 12.51 csys =
> 117.20 CPU)
> Failed 11/329 test programs. 981/17708 subtests failed.
>  CJFIELDS/BioPerl-1.6.1.tar.gz
>  ./Build test -- NOT OK
> //hint// to see the cpan-testers results for installing this module, try:
>  reports CJFIELDS/BioPerl-1.6.1.tar.gz
> Warning (usually harmless): 'YAML' not installed, will not store
> persistent state
> Running Build install
>  make test had returned bad status, won't install without force
> Failed during this command:
> CMUNGALL/Data-Stag-0.11.tar.gz               : make NO
> FLORA/ExtUtils-Manifest-1.60.tar.gz          : make NO
> CJFIELDS/BioPerl-1.6.1.tar.gz                : make_test NO
> 
> 
> Thanks a lot for your time and help.  I appreciate it.
> 
> Thank you,
> Christel
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Wed Nov 16 11:46:16 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 16 Nov 2011 16:46:16 +0000
Subject: [Bioperl-l] print alignment from blast results file
In-Reply-To: <035404FE-A7D9-413A-B6EF-48564FE76426@verizon.net>
References: <32851673.post@talk.nabble.com>
	<035404FE-A7D9-413A-B6EF-48564FE76426@verizon.net>
Message-ID: <B7768538-08CE-40A0-8EB9-5EB5169C1072@illinois.edu>

small hint: you can get a Bio::AlignI from the HSP (which can be redirected to a Bio::AlignIO instance).

chris

On Nov 16, 2011, at 7:19 AM, Brian Osborne wrote:

> Nisa,
> 
> See:
> 
> http://www.bioperl.org/wiki/HOWTO:SearchIO
> 
> Brian O.
> 
> 
> On Nov 15, 2011, at 7:49 PM, nisa.dar wrote:
> 
>> 
>> Hi,
>> 
>> I am parsing a blast results file. I have found bioperl modules to get query
>> string, homology string and hit string for each hit/hsp. I want to print
>> them in the form of an alignment instead of aligning them individually.
>> 
>> this is what I am doing, but it doesn't seem correct
>> 
>> while (my $hsp = $hit->next_hsp) {
>>                                       my
>> $start_query_num=$hsp->start('query');
>> 					my $query_string=$hsp->query_string;
>> 					my $end_query_num=$hsp->end('query');
>> 					my $homology_string=$hsp->homology_string;
>> 					my $start_hit_num=$hsp->start('hit');
>> 					my $hit_string=$hsp->hit_string;
>> 					my $end_hit_num=$hsp->end('hit');
>> 					my $aln_o = $hsp->get_aln;
>> 					$query_string=~s/\n//g;#get rid of new line characters
>> 					$homology_string=~s/\n//g;
>> 					$hit_string=~s/\n//g;
>> 
>>                        print "<h3>Alignment:</h3><br />";
>> 			print "$start_query_num-$query_string-$end_query_num<br />";
>> 			print "   
>> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$homology_string<br />";
>> 			print "$start_hit_num-$hit_string-$end_hit_num<br /><br />";
>> 
>> 
>> 
>> }
>> 
>> Please let me know how can I print them in the form of an alignment as seen
>> in the blast results file.
>> 
>> Thanks
>> 
>> 
>> -- 
>> View this message in context: http://old.nabble.com/print-alignment-from-blast-results-file-tp32851673p32851673.html
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From David.Messina at sbc.su.se  Wed Nov 16 12:01:49 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 16 Nov 2011 18:01:49 +0100
Subject: [Bioperl-l] Bioperl installation help
In-Reply-To: <CAO28iBQV1iAFpUKwsZ2g-JaDo8Dz3idx=w8DgFu7A1TBOHoDhg@mail.gmail.com>
References: <CAO28iBQV1iAFpUKwsZ2g-JaDo8Dz3idx=w8DgFu7A1TBOHoDhg@mail.gmail.com>
Message-ID: <CAM3TQQWDJ1_HPrAUguFfH5ngV42WeUOvXE6N2GktgmeTFs=ijw@mail.gmail.com>

Hi Christel,

Sorry to hear you're having trouble with the installation.

It looks like these modules aren't getting installed and are causing the
failed tests:
CMUNGALL/Data-Stag-0.11.tar.gz               : make NO
FLORA/ExtUtils-Manifest-1.60.tar.gz          : make NO

I would try installing those separately via CPAN first and then trying
again to install BioPerl.

Also, it was a good idea to set the make_install_make_command option to
CPAN, and that should have worked. Unfortunately, there's another
installation system called Module::Build that has its own option which may
need to be set:
cpan> o conf mbuild_install_build_command 'sudo ./Build'


That being said, I would suggest you grab the latest version of BioPerl
from github instead of using v1.6.1 from CPAN, which is fairly out of date
at this point.

And unless you're planning to use BioPerl with GBrowse or Bio::Graphics,
there's another, simpler way to get BioPerl up and running (assuming you
have all the prerequisites like Data::Stag installed):

See "Don't want to install BioPerl?" here:
http://www.seqxml.org/xml/BioPerl.html


Best,
Dave


On Tue, Nov 15, 2011 at 02:39, Christel Chehoud <cchehoud at gmail.com> wrote:

> Dear BioPerl,
> Thank you for creating such useful code. Unfortunately, every time I
> try to install Bioperl, it takes me a long time and is a challenging
> ordeal :( I am a new MAC user and was not able to download bioperl
> using CPAN. Here is the error I am getting:
>
> ERROR: Can't create '/usr/local/bin'
> Do not have write permissions on '/usr/local/bin'
> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>  at /Users/Christel/.cpan/build/BioPerl-1.6.0-DiZ0rN/Bio/Root/Build.pm
> line 902
>  CJFIELDS/BioPerl-1.6.0.tar.gz
>  ./Build install  -- NOT OK
> ----
>  You may have to su to root to install the package
>  (Or you may want to run something like
>    o conf make_install_make_command 'sudo make'
>  to raise your permissions.Warning (usually harmless): 'YAML' not
> installed, will not store persistent state
> Failed during this command:
>  CMUNGALL/Data-Stag-0.11.tar.gz               : make NO
>  CJFIELDS/BioPerl-1.6.0.tar.gz                : make_test FAILED but
> failure ignored because 'force' in effect
>
>
> so I did:
> cpan> o conf make_install_make_command 'sudo make'
> followed by
> cpan> o conf commit
>
> and started over..I got the same number of errors as last time (so I
> decided not to force install this time). do you have any suggestions:
>
> 63 tests and 305 subtests skipped.
> Failed 11/329 test scripts. 981/17708 subtests failed.
> Files=329, Tests=17708, 127 wallclock secs (104.69 cusr + 12.51 csys =
> 117.20 CPU)
> Failed 11/329 test programs. 981/17708 subtests failed.
>  CJFIELDS/BioPerl-1.6.1.tar.gz
>  ./Build test -- NOT OK
> //hint// to see the cpan-testers results for installing this module, try:
>  reports CJFIELDS/BioPerl-1.6.1.tar.gz
> Warning (usually harmless): 'YAML' not installed, will not store
> persistent state
> Running Build install
>  make test had returned bad status, won't install without force
> Failed during this command:
>  CMUNGALL/Data-Stag-0.11.tar.gz               : make NO
>  FLORA/ExtUtils-Manifest-1.60.tar.gz          : make NO
>  CJFIELDS/BioPerl-1.6.1.tar.gz                : make_test NO
>
>
> Thanks a lot for your time and help.  I appreciate it.
>
> Thank you,
> Christel
>

From jluis.lavin at unavarra.es  Wed Nov 16 13:31:46 2011
From: jluis.lavin at unavarra.es (=?ISO-8859-1?Q?Jos=E9_Luis_Lav=EDn?=)
Date: Wed, 16 Nov 2011 19:31:46 +0100
Subject: [Bioperl-l] How to get Remote BLAST results in a single out
In-Reply-To: <CALf8LpwFrv2jWMm35nTaC88atO6yrSbGza9j0TyTZTzBtxaCxw@mail.gmail.com>
References: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com>
	<CADm9iy=JcWtUp-KvazA=go2V_VMR7N8D92cHCMe5Rg5kzWmZKQ@mail.gmail.com>
	<CALf8LpwFrv2jWMm35nTaC88atO6yrSbGza9j0TyTZTzBtxaCxw@mail.gmail.com>
Message-ID: <CADm9iy=mMqHhWO5rTXbJS4ZG8aG-t0mAVHqN720tnyA7Hy_nkg@mail.gmail.com>

Thank you for your answer Jason,

While answering you I figured out how to do it...sometimes you need other
people's point of view to see the light.

As you pointed out:

"what is complicated is the file name right now is based on the query
name."

That dependency between the outfile name and the query name is exactly what
I hoped would have an easy fix; it is the reason I couldn't figure out how
to change the name of the output.

While reading the code to answer you, I came across the solution.

I was persistent about doing it this way because I need to run BLAST remotely
from my CGI, which is why I didn't pay attention to the other options you
suggested. Thank you all for your suggestions anyway.
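
For readers following the thread, here is a minimal sketch of the approach
Jason describes: open the combined file once, outside the loop, and append
each report to it. The helper name append_report and the file names are
assumptions made for illustration; only save_output and remove_rid come from
the script quoted below, and the exact fix used in this thread was not posted.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): copy one saved report onto the
# end of an already-open filehandle for the combined report.
sub append_report {
    my ($combined_fh, $report_path) = @_;
    open my $in, '<', $report_path
        or die "Cannot read $report_path: $!";
    print {$combined_fh} $_ while <$in>;
    close $in;
    return;
}

# Where the calls would go in the cookbook script (the file names are
# assumptions):
#
#   open my $all, '>', 'all_queries.out' or die $!;   # once, before the loop
#   ...
#   } else {
#       my $tmp = 'last_report.tmp';
#       $remoteBlast->save_output($tmp);              # report for this RID
#       append_report($all, $tmp);
#       unlink $tmp;
#       $remoteBlast->remove_rid($rid);
#   }
#   ...
#   close $all;                                       # once, after the loop
```

The point of the design is that the output name no longer depends on
$result->query_name(), so every query ends up in the same file.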

;)

Best wishes

JL


On 16 November 2011 at 18:03, Jason Stajich <jason at bioperl.org> wrote:

> the answer to your question is to move the line that opens a file to
> outside the loop. what is complicated is the file name right now is
> based on the query name. so you need to think how you want to name the
> file. Since this isn't obvious to you, then I think we are suggesting you
> probably need to understand programming more, and it might just be easier
> to use the tools as we have suggested rather than teaching you to modify
> what is just an example code.  our suggestions are based on the way we'd
> solve the problem so maybe you have other reasons for the direction you
> want to take.
>
> I also think it is not efficient or logical to run
> remote blast through the web protocol simply to write it back out with
> bioperl since that has to parse it in and then write it out -- why not just
> run the program that generates the output directly from NCBI. Or run BLAST
> locally for likely more efficient running.
>
>  Finally the bioperl writer may not 100% reproduce the blast output so if
> you are planning on further parsing the output that comes out from this
> script, it really doesn't seem like a good idea to launder it through
> bioperl parser first.
>
>
>
> 2011/11/14 José Luis Lavín <jluis.lavin at unavarra.es>
>
>> Thank you very much for your answers, but judging by them, I'm afraid I
>> didn't explain myself well enough.
>>
>>  I'm not looking for another tool to perform a BLAST task. I was just
>> wondering if there was a way to simply change the way the module writes
>> the outputs, so that I can get multiple searches in a single report file
>> instead of having a report for each BLAST search.
>>
>> Maybe there's some issue I'm unaware of that makes you recommend the use
>> of other tools instead of the BioPerl Remote BLAST module. I would
>> appreciate it if you let me know about it (NCBI server problems with
>> web services or the like).
>>
>> Thank you for your answers anyway
>>
>> Best wishes
>>
>> 2011/11/14 Fields, Christopher J <cjfields at illinois.edu>
>>
>> > Re: a BLAST+ equivalent for blastcl3, I believe there is an option for
>> > the various 'blast*' programs indicating the search is to use a remote
>> > database.  I haven't used it, though...
>> >
>> > chris
>> >
>> > On Nov 14, 2011, at 8:07 AM, Jason Stajich wrote:
>> >
>> > > Please keep this on list discussions
>> > >
>> > > Sent from my iPhone-please excuse typos
>> > >
>> > > --
>> > > Jason Stajich
>> > >
>> > > Begin forwarded message:
>> > >
>> > >> From: José Luis Lavín <jluis.lavin at unavarra.es>
>> > >> Date: November 14, 2011 8:04:25 AM EST
>> > >> To: Jason Stajich <jason.stajich at gmail.com>
>> > >> Subject: Re: [Bioperl-l] Fwd: How to get Remote BLAST results in a single out
>> > >>
>> > >> Hello Jason,
>> > >>
>> > >> To answer your question:
>> > >>
>> > >> "If you want to do this within this code I guess the question is what
>> > >> format you want the data in - a BLAST report or something more like a
>> > >> table?"
>> > >>
>> > >> A concatenation of BLAST (default format) reports should be OK, since I
>> > >> have a script to parse that kind of result. Formats 1 or 2 will also do
>> > >> the trick.
>> > >> I'll be happy to get assistance on how to change the OUTFILE from "one
>> > >> report per query" to "all queries in the same report", because I don't
>> > >> seem to be able to do it myself after reading the module documentation.
>> > >>
>> > >> Thanks in advance
>> > >>
>> > >> On 14 November 2011 at 12:59, Jason Stajich <jason.stajich at gmail.com> wrote:
>> > >> if you want to do a bunch of BLASTs remotely on the cmdline you should
>> > >> also just use the NCBI's blastcl3 tool (not sure if there is a BLAST+
>> > >> equivalent). This might be faster to do and easier since you need to
>> > >> learn the programming part too.
>> > >>
>> > >> If you want to do this within this code I guess the question is what
>> > >> format you want the data in - a BLAST report or something more like a
>> > >> table?
>> > >>
>> > >> On Nov 14, 2011, at 6:14 AM, José Luis Lavín wrote:
>> > >>
>> > >>> Hello everybody,
>> > >>>
>> > >>> I've been using "Bio::Tools::Run::RemoteBlast" for a while and it has
>> > >>> worked fine for me. Now I need to perform a multiple BLAST search, but
>> > >>> this time I'd like to get all the BLAST results in a single output file
>> > >>> instead of having each sequence's report written individually. I've
>> > >>> read the documentation of the module, but due to my limited experience
>> > >>> with complex modules such as this one, I can't figure out where to
>> > >>> change the script to achieve this.
>> > >>> Here is the script I've been using (it's basically the one posted in
>> > >>> the module cookbook).
>> > >>>
>> > >>> #!/c:/Perl -w
>> > >>> use strict;
>> > >>> use Bio::Tools::Run::RemoteBlast;
>> > >>> use Bio::SearchIO;
>> > >>> use Data::Dumper;
>> > >>>
>> > >>> # Here I set the parameters for BLAST.
>> > >>> # chomp strips the trailing newline that <STDIN> leaves on each answer.
>> > >>> print "Enter your BLAST choice (blastn, blastp, blastx, tblastn, tblastx):\n";
>> > >>> chomp(my $prog = <STDIN>);
>> > >>> print "Enter a database to search (nr, refseq_protein, swissprot, pat, pdb, env_nr):\n";
>> > >>> chomp(my $db = <STDIN>);
>> > >>> print "Enter your cutoff score (1e-n):\n";
>> > >>> chomp(my $e_val = <STDIN>);
>> > >>>
>> > >>> my @params = ( '-prog'       => $prog,
>> > >>>                '-data'       => $db,
>> > >>>                '-expect'     => $e_val,
>> > >>>                '-readmethod' => 'SearchIO' );
>> > >>>
>> > >>> my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params);
>> > >>>
>> > >>> # Select the file and run the BLAST.
>> > >>> print "Enter your FASTA file:\n";
>> > >>> chomp(my $infile = <STDIN>);
>> > >>> my $r = $remoteBlast->submit_blast($infile);
>> > >>> my $v = 1;
>> > >>>
>> > >>> # Wait for the results to return.
>> > >>> print STDERR "waiting..." if ( $v > 0 );
>> > >>> while ( my @rids = $remoteBlast->each_rid ) {
>> > >>>   foreach my $rid ( @rids ) {
>> > >>>     my $rc = $remoteBlast->retrieve_blast($rid);
>> > >>>     if ( !ref($rc) ) {
>> > >>>       # Not ready yet; a negative code means the request failed.
>> > >>>       if ( $rc < 0 ) {
>> > >>>         $remoteBlast->remove_rid($rid);
>> > >>>       }
>> > >>>       print STDERR "." if ( $v > 0 );
>> > >>>       sleep 5;
>> > >>>     } else {
>> > >>>       my $result = $rc->next_result();
>> > >>>       # save the output, one file per query
>> > >>>       my $filename = $result->query_name() . ".out";
>> > >>>       # open SALIDA, '>>' . "$^T" . "Report" . ".out";
>> > >>>       $remoteBlast->save_output($filename);
>> > >>>       $remoteBlast->remove_rid($rid);
>> > >>>       print "\nQuery Name: ", $result->query_name(), "\n";
>> > >>>       while ( my $hit = $result->next_hit ) {
>> > >>>         next unless ( $v > 0 );
>> > >>>         print "\thit name is ", $hit->name, "\n";
>> > >>>         while ( my $hsp = $hit->next_hsp ) {
>> > >>>           print "\t\tscore is ", $hsp->score, "\n";
>> > >>>         }
>> > >>>       }
>> > >>>     }
>> > >>>   }
>> > >>> }
>> > >>>
>> > >>>
>> > >>> Could any of you please explain how to solve this?
>> > >>>
>> > >>> Thanks in advance
>> > >>>
>> > >>> With best wishes
>> > >>>
>> > >>> --
>> > >>> --
>> > >>> Dr. José Luis Lavín Trueba
>> > >>>
>> > >>> Dpto. de Producción Agraria
>> > >>> Grupo de Genética y Microbiología
>> > >>> Universidad Pública de Navarra
>> > >>> 31006 Pamplona
>> > >>> Navarra
>> > >>> SPAIN
>> > >>>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> --
>> > >> --
>> > >> Dr. José Luis Lavín Trueba
>> > >>
>> > >> Dpto. de Producción Agraria
>> > >> Grupo de Genética y Microbiología
>> > >> Universidad Pública de Navarra
>> > >> 31006 Pamplona
>> > >> Navarra
>> > >> SPAIN
>> > >
>> >
>>
>>
>>
>> --
>> --
>> Dr. José Luis Lavín Trueba
>>
>> Dpto. de Producción Agraria
>> Grupo de Genética y Microbiología
>> Universidad Pública de Navarra
>> 31006 Pamplona
>> Navarra
>> SPAIN
>>
>>
>
>


-- 
-- 
Dr. José Luis Lavín Trueba

Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN


From l.m.timmermans at students.uu.nl  Fri Nov 18 09:15:47 2011
From: l.m.timmermans at students.uu.nl (L.M. Timmermans)
Date: Fri, 18 Nov 2011 15:15:47 +0100
Subject: [Bioperl-l] Blast > parsing result in Exel
In-Reply-To: <32846407.post@talk.nabble.com>
References: <32846407.post@talk.nabble.com>
Message-ID: <CAC1jpXC7uBtbHb_ixzMy2idvfeFQc1Y=d8Zi3xn_=0RyGYTzrA@mail.gmail.com>

On Tue, Nov 15, 2011 at 10:25 AM, Giorgio C <casaburi at ceinge.unina.it> wrote:

> I need to parse in an Excel sheet:
>

What you're saying here is nonsense. I think you meant to say you want to
output Excel.


> Is it possible to obtain, from a big BLAST result file, an Excel sheet
> with 5 columns where every field comes from the first hit of the BLAST
> result? Can anyone help me fix this problem? A little Perl script would
> also do.
>

There are a number of Perl modules on CPAN for outputting Excel. Try
Excel::Writer::XLSX and Spreadsheet::WriteExcel for example.
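
A rough sketch of how this could look with Bio::SearchIO and
Excel::Writer::XLSX, assuming a standard text BLAST report. The original
post does not say which five columns are wanted, so the ones below (query,
hit name, description, bit score, E-value) are a guess, and the sub names
first_hit_rows and write_xlsx are made up for illustration.

```perl
use strict;
use warnings;

# Collect one row per query: the first (top) hit of each BLAST result.
# The choice of five columns is an assumption, not taken from the post.
sub first_hit_rows {
    my ($blast_file) = @_;
    require Bio::SearchIO;    # BioPerl, loaded only when parsing
    my $in = Bio::SearchIO->new(-format => 'blast', -file => $blast_file);
    my @rows;
    while (my $result = $in->next_result) {
        my $hit = $result->next_hit or next;    # skip queries without hits
        push @rows, [ $result->query_name, $hit->name, $hit->description,
                      $hit->bits, $hit->significance ];
    }
    return \@rows;
}

# Write a header row followed by the data rows to an .xlsx file.
sub write_xlsx {
    my ($path, $header, $rows) = @_;
    require Excel::Writer::XLSX;    # CPAN module, loaded only when writing
    my $workbook = Excel::Writer::XLSX->new($path)
        or die "Cannot create $path: $!";
    my $worksheet = $workbook->add_worksheet();
    my $row = 0;
    $worksheet->write_row($row++, 0, $_) for $header, @$rows;
    $workbook->close();
    return;
}

# Usage: perl first_hits.pl blast_report.txt first_hits.xlsx
if (@ARGV == 2) {
    my ($blast_file, $xlsx_file) = @ARGV;
    write_xlsx($xlsx_file,
               [ 'Query', 'Hit', 'Description', 'Bits', 'E-value' ],
               first_hit_rows($blast_file));
}
```

Because SearchIO reads one result at a time, this should cope with a big
report without loading the whole file into memory.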

Leon

From tzhu at mail.bnu.edu.cn  Mon Nov 21 00:17:18 2011
From: tzhu at mail.bnu.edu.cn (Tao Zhu)
Date: Mon, 21 Nov 2011 13:17:18 +0800
Subject: [Bioperl-l] Is there a "combine" method that would combine several
 sequence alignments to a single alignment?
Message-ID: <4EC9DEDE.6030901@mail.bnu.edu.cn>

I can use the "slice" method to split a single sequence alignment into 
several subalignments. Then is there a corresponding "combine" method to 
combine such subalignments back?

-- 
Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing
100875, China
Email: tzhu at mail.bnu.edu.cn


From David.Messina at sbc.su.se  Mon Nov 21 04:58:51 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 21 Nov 2011 10:58:51 +0100
Subject: [Bioperl-l] Is there a "combine" method that would combine
 several sequence alignments to a single alignment?
In-Reply-To: <4EC9DEDE.6030901@mail.bnu.edu.cn>
References: <4EC9DEDE.6030901@mail.bnu.edu.cn>
Message-ID: <CAM3TQQVy1jPUXB+cZcEXEm9RQeBB=VsZ5aCFX9bO=WJSaq-VYg@mail.gmail.com>

Hi,

No, I don't believe such a method exists. Could you describe what you are
wanting to do? Perhaps there is another way to do it.


Dave


On Mon, Nov 21, 2011 at 06:17, Tao Zhu <tzhu at mail.bnu.edu.cn> wrote:

> I can use the "slice" method to split a single sequence alignment into
> several subalignments. Then is there a corresponding "combine" method to
> combine such subalignments back?
>
> --
> Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing
> 100875, China
> Email: tzhu at mail.bnu.edu.cn
>
>

From roy.chaudhuri at gmail.com  Mon Nov 21 06:41:09 2011
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Mon, 21 Nov 2011 11:41:09 +0000
Subject: [Bioperl-l] Is there a "combine" method that would combine
 several sequence alignments to a single alignment?
In-Reply-To: <CAM3TQQVy1jPUXB+cZcEXEm9RQeBB=VsZ5aCFX9bO=WJSaq-VYg@mail.gmail.com>
References: <4EC9DEDE.6030901@mail.bnu.edu.cn>
	<CAM3TQQVy1jPUXB+cZcEXEm9RQeBB=VsZ5aCFX9bO=WJSaq-VYg@mail.gmail.com>
Message-ID: <4ECA38D5.8050709@gmail.com>

See the cat method in Bio::Align::Utilities:

http://search.cpan.org/~cjfields/BioPerl-1.6.901/Bio/Align/Utilities.pm#cat
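
A small sketch of slice followed by cat, assuming a BioPerl version that
ships Bio::Align::Utilities. The two sequences are made-up data, built in
memory so that the example does not need an input file.

```perl
use strict;
use warnings;
use Bio::SimpleAlign;
use Bio::LocatableSeq;
use Bio::Align::Utilities qw(cat);

# Build a tiny two-sequence alignment in memory (made-up data).
my $aln = Bio::SimpleAlign->new();
$aln->add_seq(Bio::LocatableSeq->new(-id    => 'seq1',
                                     -seq   => 'ACGT-ACGTA',
                                     -start => 1,
                                     -end   => 9));
$aln->add_seq(Bio::LocatableSeq->new(-id    => 'seq2',
                                     -seq   => 'ACGTTACG-A',
                                     -start => 1,
                                     -end   => 9));

# Split into two column ranges with slice(), then rejoin them with cat().
my $left   = $aln->slice(1, 5);
my $right  = $aln->slice(6, 10);
my $joined = cat($left, $right);

# Print each rejoined row; sequences are matched between blocks by id.
printf "%-6s %s\n", $_->id, $_->seq for $joined->each_seq;
```

cat pairs up sequences by their id, so ids need to be unique within each
sub-alignment for the rows to be rejoined.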

On 21/11/2011 09:58, Dave Messina wrote:
> Hi,
>
> No, I don't believe such a method exists. Could you describe what you are
> wanting to do? Perhaps there is another way to do it.
>
>
> Dave
>
>
>
> On Mon, Nov 21, 2011 at 06:17, Tao Zhu<tzhu at mail.bnu.edu.cn>  wrote:
>
>> I can use the "slice" method to split a single sequence alignment into
>> several subalignments. Then is there a corresponding "combine" method to
>> combine such subalignments back?
>>
>> --
>> Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing
>> 100875, China
>> Email: tzhu at mail.bnu.edu.cn
>>


From zntayl at gmail.com  Wed Nov 16 20:07:07 2011
From: zntayl at gmail.com (Nathan Taylor)
Date: Wed, 16 Nov 2011 20:07:07 -0500
Subject: [Bioperl-l] seqIO.pm
Message-ID: <CAB_EPVxHq0=HkB5MCVcAHzchtLZVJ8-c4mQz=yDkzVtvn2Gb_A@mail.gmail.com>

Hello,

   Can SeqIO.pm convert a file of fastq reads into .phd files? Or,
barring that, a file of fastas and a file of quals into .phd files?

Many thanks,
Nathan

From gregonomic at yahoo.co.nz  Mon Nov 21 07:00:50 2011
From: gregonomic at yahoo.co.nz (Gregory Baillie)
Date: Mon, 21 Nov 2011 04:00:50 -0800 (PST)
Subject: [Bioperl-l] Is there a "combine" method that would combine
	several sequence alignments to a single alignment?
In-Reply-To: <CAM3TQQVy1jPUXB+cZcEXEm9RQeBB=VsZ5aCFX9bO=WJSaq-VYg@mail.gmail.com>
References: <4EC9DEDE.6030901@mail.bnu.edu.cn>
	<CAM3TQQVy1jPUXB+cZcEXEm9RQeBB=VsZ5aCFX9bO=WJSaq-VYg@mail.gmail.com>
Message-ID: <1321876850.71978.YahooMailNeo@web112508.mail.gq1.yahoo.com>

Hi.

I've attached a simple script (concatenate_alignments.pl) I wrote to concatenate alignments.

It can be a bit of a memory hog if you have long alignments or large numbers of sequences; otherwise you should be OK.

Usage:
concatenate_alignments.pl -o <output_alignment> <input_alignment_1> <input_alignment_2> <... input_alignment_n>


If you want to insert a string between the concatenated sequences, you can use the -j option (eg. -j '---').

Greg.


________________________________
 From: Dave Messina <David.Messina at sbc.su.se>
To: Tao Zhu <tzhu at mail.bnu.edu.cn> 
Cc: BioPerl <bioperl-l at lists.open-bio.org> 
Sent: Monday, 21 November 2011 7:58 PM
Subject: Re: [Bioperl-l] Is there a "combine" method that would combine several sequence alignments to a single alignment?
 
Hi,

No, I don't believe such a method exists. Could you describe what you are
wanting to do? Perhaps there is another way to do it.


Dave


On Mon, Nov 21, 2011 at 06:17, Tao Zhu <tzhu at mail.bnu.edu.cn> wrote:

> I can use the "slice" method to split a single sequence alignment into
> several subalignments. Then is there a corresponding "combine" method to
> combine such subalignments back?
>
> --
> Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing
> 100875, China
> Email: tzhu at mail.bnu.edu.cn
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: concatenate_alignments.pl
Type: application/octet-stream
Size: 3349 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20111121/aa673dba/attachment.obj>

From jason.stajich at gmail.com  Mon Nov 21 10:31:50 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Mon, 21 Nov 2011 10:31:50 -0500
Subject: [Bioperl-l] Is there a "combine" method that would combine
	several sequence alignments to a single alignment?
In-Reply-To: <1321876850.71978.YahooMailNeo@web112508.mail.gq1.yahoo.com>
References: <4EC9DEDE.6030901@mail.bnu.edu.cn>
	<CAM3TQQVy1jPUXB+cZcEXEm9RQeBB=VsZ5aCFX9bO=WJSaq-VYg@mail.gmail.com>
	<1321876850.71978.YahooMailNeo@web112508.mail.gq1.yahoo.com>
Message-ID: <39ECA743-8C56-4B23-8813-40EEEAB7DBB1@gmail.com>

greg  -- looks good - you could simplify part of the code to use the .= operator and use AlignIO to write the seqs out.

This is my script to combine a directory of MSA aligned .fasaln files into a single concatenated alignment.

https://github.com/hyphaltip/genome-scripts/blob/master/phylogenetics/combine_fasaln.pl
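
To illustrate the .= idea in isolation, here is a plain-Perl sketch that
concatenates alignments held as hashes of id => aligned string. It is not
taken from either script: the helper name concat_alignments is made up, and
padding a sequence that is missing from one alignment with gaps is an
assumption about the desired behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper: each argument is one alignment as a hash reference
# { sequence_id => aligned_string }. Returns the concatenation keyed by id.
sub concat_alignments {
    my (@alns) = @_;
    my %joined;
    my $width_so_far = 0;
    for my $aln (@alns) {
        next unless %$aln;
        # All rows of one alignment share the same width; take the first.
        my ($width) = map { length $_ } values %$aln;
        # An id seen for the first time gets gaps for all earlier blocks.
        $joined{$_} //= '-' x $width_so_far for keys %$aln;
        # Append this block, or gaps where the id is absent from it.
        for my $id (keys %joined) {
            $joined{$id} .= exists $aln->{$id} ? $aln->{$id} : '-' x $width;
        }
        $width_so_far += $width;
    }
    return \%joined;
}
```

To write the result out, each string can be wrapped in a Bio::LocatableSeq,
added to a Bio::SimpleAlign and passed to the write_aln method of a
Bio::AlignIO object.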

On Nov 21, 2011, at 7:00 AM, Gregory Baillie wrote:

> Hi.
> 
> I've attached a simple script (concatenate_alignments.pl) I wrote to concatenate alignments.
> 
> It can be a bit of a memory hog if you have long alignments or large numbers of sequences; otherwise you should be OK.
> 
> Usage:
> concatenate_alignments.pl -o <output_alignment> <input_alignment_1> <input_alignment_2> <... input_alignment_n>
> 
> 
> If you want to insert a string between the concatenated sequences, you can use the -j option (eg. -j '---').
> 
> Greg.
> 
> 
> ________________________________
> From: Dave Messina <David.Messina at sbc.su.se>
> To: Tao Zhu <tzhu at mail.bnu.edu.cn> 
> Cc: BioPerl <bioperl-l at lists.open-bio.org> 
> Sent: Monday, 21 November 2011 7:58 PM
> Subject: Re: [Bioperl-l] Is there a "combine" method that would combine several sequence alignments to a single alignment?
> 
> Hi,
> 
> No, I don't believe such a method exists. Could you describe what you are
> wanting to do? Perhaps there is another way to do it.
> 
> 
> Dave
> 
> 
> 
> On Mon, Nov 21, 2011 at 06:17, Tao Zhu <tzhu at mail.bnu.edu.cn> wrote:
> 
>> I can use the "slice" method to split a single sequence alignment into
>> several subalignments. Then is there a corresponding "combine" method to
>> combine such subalignments back?
>> 
>> --
>> Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing
>> 100875, China
>> Email: tzhu at mail.bnu.edu.cn
>> 


From p.j.a.cock at googlemail.com  Mon Nov 21 11:15:13 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 21 Nov 2011 16:15:13 +0000
Subject: [Bioperl-l] seqIO.pm
In-Reply-To: <CAB_EPVxHq0=HkB5MCVcAHzchtLZVJ8-c4mQz=yDkzVtvn2Gb_A@mail.gmail.com>
References: <CAB_EPVxHq0=HkB5MCVcAHzchtLZVJ8-c4mQz=yDkzVtvn2Gb_A@mail.gmail.com>
Message-ID: <CAKVJ-_6wWxCGKizRZuum5FFYOtvwOqqTF9D8YzGJY5RqKhW17g@mail.gmail.com>

On Thu, Nov 17, 2011 at 1:07 AM, Nathan Taylor <zntayl at gmail.com> wrote:
> Hello,
>
>   Can SeqIO.pm convert a file of fastq reads into .phd files? Or,
> barring that, a file of fastas and a file of quals into .phd files?
>
> Many thanks,
> Nathan

In principle that is possible (e.g. Biopython can do fastq to phd).
Have you tried using BioPerl's SeqIO to do this? Was there an
error message?

Peter


From cjfields at illinois.edu  Mon Nov 21 11:57:29 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 21 Nov 2011 16:57:29 +0000
Subject: [Bioperl-l] seqIO.pm
In-Reply-To: <CAKVJ-_6wWxCGKizRZuum5FFYOtvwOqqTF9D8YzGJY5RqKhW17g@mail.gmail.com>
References: <CAB_EPVxHq0=HkB5MCVcAHzchtLZVJ8-c4mQz=yDkzVtvn2Gb_A@mail.gmail.com>
	<CAKVJ-_6wWxCGKizRZuum5FFYOtvwOqqTF9D8YzGJY5RqKhW17g@mail.gmail.com>
Message-ID: <2E075A8F-92F9-4A04-9254-EF4C07793A7C@illinois.edu>

On Nov 21, 2011, at 10:15 AM, Peter Cock wrote:

> On Thu, Nov 17, 2011 at 1:07 AM, Nathan Taylor <zntayl at gmail.com> wrote:
>> Hello,
>> 
>>   Can SeqIO.pm convert a file of fastq reads into .phd files. Or,
>> barring that, a file of fastas and file of quals into .phd files?
>> 
>> Many thanks,
>> Nathan
> 
> In principle that is possible (e.g. Biopython can do fastq to phd).
> Have you tried using BioPerl's SeqIO to do this? Was there an
> error message?
> 
> Peter

This should be possible in either circumstance (FASTQ should be more straightforward); there is a Bio::SeqIO::phd module for this purpose.  Nathan, if you run into problems with that conversion let us know.

chris

From rondonbio at yahoo.com.br  Mon Nov 21 12:31:21 2011
From: rondonbio at yahoo.com.br (Rondon Neto)
Date: Mon, 21 Nov 2011 09:31:21 -0800 (PST)
Subject: [Bioperl-l] seqIO.pm
In-Reply-To: <CAKVJ-_6wWxCGKizRZuum5FFYOtvwOqqTF9D8YzGJY5RqKhW17g@mail.gmail.com>
References: <CAB_EPVxHq0=HkB5MCVcAHzchtLZVJ8-c4mQz=yDkzVtvn2Gb_A@mail.gmail.com>
	<CAKVJ-_6wWxCGKizRZuum5FFYOtvwOqqTF9D8YzGJY5RqKhW17g@mail.gmail.com>
Message-ID: <1321896681.57079.YahooMailNeo@web130205.mail.mud.yahoo.com>

Hi! try this script:

#!/usr/bin/perl
use warnings;
use strict;
use Bio::SeqIO;

if (!$ARGV[0]) { die "\n### USAGE::: perl $0 [file.fastq]\n"; }

my $fastq = $ARGV[0];

my $in = Bio::SeqIO->new( -file   => $fastq,
                          -format => 'fastq' );

my $out = Bio::SeqIO->new( -file   => ">out.phd",
                           -format => 'phd' );

while (my $seq = $in->next_seq()) {
    $out->write_seq($seq);
}

exit;


Best wishes,
Rondon, a brazilian friend.


________________________________
From: Peter Cock <p.j.a.cock at googlemail.com>
To: Nathan Taylor <zntayl at gmail.com>
Cc: bioperl-l at bioperl.org
Sent: Monday, 21 November 2011 14:15
Subject: Re: [Bioperl-l] seqIO.pm
 
On Thu, Nov 17, 2011 at 1:07 AM, Nathan Taylor <zntayl at gmail.com> wrote:
> Hello,
>
>   Can SeqIO.pm convert a file of fastq reads into .phd files? Or,
> barring that, a file of fastas and a file of quals into .phd files?
>
> Many thanks,
> Nathan

In principle that is possible (e.g. Biopython can do fastq to phd).
Have you tried using BioPerl's SeqIO to do this? Was there an
error message?

Peter


From Russell.Smithies at agresearch.co.nz  Mon Nov 21 15:04:01 2011
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 22 Nov 2011 09:04:01 +1300
Subject: [Bioperl-l] seqIO.pm
In-Reply-To: <1321896681.57079.YahooMailNeo@web130205.mail.mud.yahoo.com>
References: <CAB_EPVxHq0=HkB5MCVcAHzchtLZVJ8-c4mQz=yDkzVtvn2Gb_A@mail.gmail.com>
	<CAKVJ-_6wWxCGKizRZuum5FFYOtvwOqqTF9D8YzGJY5RqKhW17g@mail.gmail.com>
	<1321896681.57079.YahooMailNeo@web130205.mail.mud.yahoo.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF340186CF1AB@exchsth.agresearch.co.nz>

Or you could use the builtin script bp_sreformat.pl

--Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Rondon Neto
> Sent: Tuesday, 22 November 2011 6:31 a.m.
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] seqIO.pm
> 
> Hi! try this script:
> 
> #!/usr/bin/perl
> use warnings;
> use strict;
> use Bio::SeqIO;
> 
> if (!$ARGV[0]) { die "\n### USAGE::: perl $0 [file.fastq]\n"; }
> 
> my $fastq = $ARGV[0];
> 
> my $in = Bio::SeqIO->new( -file   => $fastq,
>                           -format => 'fastq' );
>
> my $out = Bio::SeqIO->new( -file   => ">out.phd",
>                            -format => 'phd' );
>
> while (my $seq = $in->next_seq()) {
>     $out->write_seq($seq);
> }
> 
> exit;
> 
> 
> Best wishes,
> Rondon, a brazilian friend.
> 
> 
> 
> 
> 
> 
> ________________________________
> From: Peter Cock <p.j.a.cock at googlemail.com>
> To: Nathan Taylor <zntayl at gmail.com>
> Cc: bioperl-l at bioperl.org
> Sent: Monday, 21 November 2011 14:15
> Subject: Re: [Bioperl-l] seqIO.pm
> 
> On Thu, Nov 17, 2011 at 1:07 AM, Nathan Taylor <zntayl at gmail.com> wrote:
> > Hello,
> >
> >   Can SeqIO.pm convert a file of fastq reads into .phd files? Or,
> > barring that, a file of fastas and a file of quals into .phd files?
> >
> > Many thanks,
> > Nathan
> 
> In principle that is possible (e.g. Biopython can do fastq to phd).
> Have you tried using BioPerl's SeqIO to do this? Was there an error message?
> 
> Peter
> 


From goodyearkl at gmail.com  Mon Nov 21 21:23:13 2011
From: goodyearkl at gmail.com (Kylie Goodyear)
Date: Mon, 21 Nov 2011 18:23:13 -0800 (PST)
Subject: [Bioperl-l] Fasta counting script?
Message-ID: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>

Hi,
This may seem like a stupid question, but I am just learning BioPerl
and I am trying to figure out how to get a count of all the characters
in my FASTA file. I managed to get the number of sequences using the
following. Is there a way to tell BioPerl to count the characters?

#!/usr/bin/perl -w
#Defines perl modules
#Bio::Seq deals with sequences and their features
use Bio::Seq;
#Bio::SeqIO handles reading and parsing of sequences of many different formats
use Bio::SeqIO;
#Read FASTA file
$seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format => "fasta" );
#Count how many sequences are present in file
my $count=0;
while (my $seq_obj = $seqio_obj->next_seq) {
    $count++;
}
#Display the number of sequences present
print "There are $count sequences present.\n";


From David.Messina at sbc.su.se  Tue Nov 22 03:08:11 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 22 Nov 2011 09:08:11 +0100
Subject: [Bioperl-l] Fasta counting script?
In-Reply-To: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
Message-ID: <CAM3TQQXmB-QHRgp6JdNJFb9a--SAQ0CpjmFCtwhNC3JARThJsw@mail.gmail.com>

Hi Kylie,

You can use the length method for this.

my $seq_length = $seq_obj->length();

Have you taken a look at the beginner's HOWTO? There's a nice table of
sequence methods as well as lots of other good information in there.

http://www.bioperl.org/wiki/HOWTO:Beginners


Dave



From liam.elbourne at mq.edu.au  Mon Nov 21 23:11:12 2011
From: liam.elbourne at mq.edu.au (Liam Elbourne)
Date: Tue, 22 Nov 2011 15:11:12 +1100
Subject: [Bioperl-l] Fasta counting script?
In-Reply-To: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
Message-ID: <EEEBBE60-96CB-4458-A460-F154CCC7459D@mq.edu.au>

Hi Kylie,

I think the length() method is what you're after:

....
my $sequence_length = $seq_obj->length();

....

in your case. Have a look at:

HOWTO:SeqIO - BioPerl

and,

HOWTO:Beginners - BioPerl

for some more general stuff.


Regards,
Liam.




From goodyearkl at gmail.com  Tue Nov 22 08:00:55 2011
From: goodyearkl at gmail.com (Kylie Goodyear)
Date: Tue, 22 Nov 2011 05:00:55 -0800 (PST)
Subject: [Bioperl-l] Fasta counting script?
In-Reply-To: <CAM3TQQXmB-QHRgp6JdNJFb9a--SAQ0CpjmFCtwhNC3JARThJsw@mail.gmail.com>
References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
	<CAM3TQQXmB-QHRgp6JdNJFb9a--SAQ0CpjmFCtwhNC3JARThJsw@mail.gmail.com>
Message-ID: <273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com>

Thank you for your help. It keeps telling me that it can't find
"length". Do you think it has to do with the way I am coding it?

#!/usr/bin/perl -w
#Defines perl modules

#Bio::Seq deals with sequences and their features
use Bio::Seq;

#Bio::SeqIO handles reading and parsing of sequences of many different formats
use Bio::SeqIO;


#Read FASTA file
$seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format => "fasta" );


#Count how many sequences are present in file
my $countseq=0;
while (my $seq_obj = $seqio_obj->next_seq, ) {
    $countseq++;
    }
#Display the number of sequences present
print "There are $countseq sequences present.\n";

#Count number of characters in file
my $seq_length = $seq_obj->length ;
print $seq_length




From roy.chaudhuri at gmail.com  Tue Nov 22 10:50:31 2011
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Tue, 22 Nov 2011 15:50:31 +0000
Subject: [Bioperl-l] Fasta counting script?
In-Reply-To: <273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com>
References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
	<CAM3TQQXmB-QHRgp6JdNJFb9a--SAQ0CpjmFCtwhNC3JARThJsw@mail.gmail.com>
	<273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com>
Message-ID: <4ECBC4C7.10401@gmail.com>

Hi Kylie,

I suspect the error you get is actually 'Can't call method "length" on an
undefined value' (in future, please report the exact text of any error
messages). You declare $seq_obj with "my" in the while loop, but then
try to access it outside the loop, where it no longer exists. Try
printing out the length of each $seq_obj within the while loop.

You should always include "use strict;" at the top of your program; that
helps to catch errors like this.
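
A minimal sketch of that suggestion, assuming the goal is the total number of residues across all records (header lines are not counted, because length() only looks at the sequence itself). The helper name count_sequences and the command-line handling are illustrative and not from the original script:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Walk a sequence stream once and return (number of sequences, total residues).
# $seqio is anything with a next_seq method, e.g. a Bio::SeqIO object.
sub count_sequences {
    my ($seqio) = @_;
    my ($count, $total) = (0, 0);
    while (my $seq_obj = $seqio->next_seq) {
        $count++;
        # $seq_obj only exists inside this loop, so length() is called here
        $total += $seq_obj->length;
    }
    return ($count, $total);
}

# Run only when a FASTA file name is given on the command line.
if (@ARGV) {
    require Bio::SeqIO;    # loaded here, just before it is needed
    my $seqio_obj = Bio::SeqIO->new(-file => $ARGV[0], -format => 'fasta');
    my ($count, $total) = count_sequences($seqio_obj);
    print "There are $count sequences and $total characters present.\n";
}
```

It could be run as 'perl count_fasta.pl DNA_sequences.fasta' (the script name is made up). Keeping the running total inside the loop avoids holding every sequence in memory.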

Cheers,
Roy.



From cjfields at illinois.edu  Tue Nov 22 11:13:01 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 22 Nov 2011 16:13:01 +0000
Subject: [Bioperl-l] Fasta counting script?
In-Reply-To: <273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com>
References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
	<CAM3TQQXmB-QHRgp6JdNJFb9a--SAQ0CpjmFCtwhNC3JARThJsw@mail.gmail.com>
	<273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com>
Message-ID: <0507832F-51F3-4946-BDCD-6913BBE697BC@illinois.edu>

This sounds a little homework-y.  Sure this isn't for a class? :)

One clue (and a good thing to keep in mind): always 'use strict; use warnings;' with your scripts if you are new to perl.  Doing so would let you know there is a problem with the script the way it is written, specifically, the place where you are inquiring about the length.
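
To see what strict would report, here is a small plain-Perl sketch (no BioPerl needed; the queue of strings merely stands in for the sequence stream and is purely illustrative). It compiles a cut-down version of the script inside an eval and prints the resulting error, which should name $seq_obj as an undeclared global symbol:

```perl
use strict;
use warnings;

# Same shape as the failing script: $seq_obj is declared with "my" in the
# while condition, so it is only visible inside the loop body.
my $snippet = <<'END_SNIPPET';
use strict;
my @queue = ('ACGT', 'GG');
while (my $seq_obj = shift @queue) {
    # $seq_obj is visible here
}
print length($seq_obj);    # out of scope: strict rejects this at compile time
END_SNIPPET

# Compile the snippet in a string eval so the compile-time error can be captured.
my $ok    = eval "$snippet; 1";
my $error = $ok ? '' : $@;
print "strict reports: $error";
```

Without strict, the same line compiles quietly and refers to an unrelated, undefined package variable, which is why the method call only fails at run time.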

chris



From Russell.Smithies at agresearch.co.nz  Tue Nov 22 15:47:36 2011
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 23 Nov 2011 09:47:36 +1300
Subject: [Bioperl-l] Fasta counting script?
In-Reply-To: <0507832F-51F3-4946-BDCD-6913BBE697BC@illinois.edu>
References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
	<CAM3TQQXmB-QHRgp6JdNJFb9a--SAQ0CpjmFCtwhNC3JARThJsw@mail.gmail.com>
	<273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com>
	<0507832F-51F3-4946-BDCD-6913BBE697BC@illinois.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF340186CF1B8@exchsth.agresearch.co.nz>

Or again, you could use the built-in scripts bp_seq_length.pl or bp_gccalc.pl.
As previous posters have hinted, RTFM - the answers are all in there!

--Russell



From charles-listes+bioperl at plessy.org  Wed Nov 23 05:27:45 2011
From: charles-listes+bioperl at plessy.org (Charles Plessy)
Date: Wed, 23 Nov 2011 19:27:45 +0900
Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
Message-ID: <20111123102745.GC20168@merveille.plessy.net>

Dear BioPerl developers,

I am trying to process some unaligned paired-end reads with Bio::DB::Sam.  For
each pair, I want to detect a sequence index and a unique molecular identifier in
the linker, record them as auxiliary flags, and trim the linker from the read.

I collect the pairs through a features iterator, and can access all their data
through the high-level Bio::DB::Bam::Alignment API.  After modifying them
(linker trimming and adding flags), I want to write the resulting pairs as a
new unaligned BAM file.

I apologise if the solution is trivial, but my problem is that I cannot manage to
modify the Bio::DB::Bam::Alignment objects.  Typically, attempts such as
'$pair[0]->qseq("GATACA")' give errors like
'Usage: Bio::DB::Bam::Alignment::qseq(b) at /usr/lib/perl5/Bio/DB/Bam/AlignWrapper.pm line 80'.

Since I did not find explanations or portions of source code indicating how to
modify Bio::DB::Bam::Alignment objects, I wonder if they are read-only?

Have a nice day,

-- 
Charles Plessy
Tsurumi, Kanagawa, Japan

From MEC at stowers.org  Wed Nov 23 11:02:26 2011
From: MEC at stowers.org (Cook, Malcolm)
Date: Wed, 23 Nov 2011 10:02:26 -0600
Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
In-Reply-To: <20111123102745.GC20168@merveille.plessy.net>
References: <20111123102745.GC20168@merveille.plessy.net>
Message-ID: <2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org>

Charles,

I suggest you reconsider your approach and instead use `samtools view` to pipe your reads to stdout in SAM format, then stream-edit the barcode and pipe the result back to samtools for conversion to a .bam file.

I know this is not what you're asking.  I'm pretty sure that the direct answer to your question is, "yes - they are read-only".
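
As a rough sketch of the stream-editing step, assuming a fixed-length linker at the start of each read (the length of 6, the helper name trim_linker, and the choice of the BC:Z: tag to hold the trimmed bases are all assumptions for illustration; a real linker layout with a separate index and molecular identifier would need its own parsing):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $LINKER_LEN = 6;    # hypothetical linker length; adjust to the library design

# Take one SAM line and return it with the linker trimmed from SEQ and QUAL
# and the trimmed bases appended as an optional tag.
sub trim_linker {
    my ($line) = @_;
    chomp $line;
    return $line if $line =~ /^\@/;              # header lines pass through unchanged
    my @f = split /\t/, $line;
    # Leave malformed or very short records untouched
    return $line if @f < 11 || length($f[9]) <= $LINKER_LEN;
    my $linker = substr($f[9], 0, $LINKER_LEN);  # column 10 is SEQ
    $f[9]  = substr($f[9], $LINKER_LEN);
    $f[10] = substr($f[10], $LINKER_LEN) unless $f[10] eq '*';   # column 11 is QUAL
    push @f, "BC:Z:$linker";
    return join("\t", @f);
}

# Demonstration on two made-up SAM lines (a header and an unaligned read).
my @sample = (
    "\@HD\tVN:1.0\tSO:unsorted",
    join("\t", 'read1', 77, '*', 0, 0, '*', '*', 0, 0, 'GATACAACGTACGT', 'I' x 14),
);
print trim_linker($_), "\n" for @sample;
```

To use it as a filter, the demonstration loop would be swapped for one over STDIN, and the script placed between `samtools view -h in.bam` and `samtools view -bS - > out.bam` (file names made up).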

~Malcolm




From cjfields at illinois.edu  Wed Nov 23 14:26:31 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 23 Nov 2011 19:26:31 +0000
Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
In-Reply-To: <2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org>
References: <20111123102745.GC20168@merveille.plessy.net>
	<2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org>
Message-ID: <CA5177C9-FF24-49D7-AFB3-1B79A742F2B4@illinois.edu>

According to the docs for the low-level API of Bio-Samtools, both read and write are allowed:

http://search.cpan.org/perldoc?Bio::DB::Sam#The_low-level_API

Using the low-level API for this purpose isn't documented as well, though (the high-level API is read-only AFAICT).

The error message is a standard one generated from the XS bindings when the passed argument isn't mapped correctly.  Looking through the Sam.xs file, qseq() is only prototyped as a reader; the only arg is a Bio::DB::Bam::Alignment (i.e. $self).  However, it appears there is a function specified for Bio::DB::Bam::Alignment named l_qseq() that might be the setter, whereas qseq() may be the getter (ignore the 'bama_' prefix):

....

int
bama_l_qseq(b,...)
    Bio::DB::Bam::Alignment b
PROTOTYPE: $;$
CODE:
    if (items > 1)
      b->core.l_qseq = SvIV(ST(1));
    RETVAL=b->core.l_qseq;
OUTPUT:
    RETVAL

SV*
bama_qseq(b)
Bio::DB::Bam::Alignment b
PROTOTYPE: $
PREINIT:
    char* seq;
    int   i;
CODE:
    seq = Newxz(seq,b->core.l_qseq+1,char);
    for (i=0;i<b->core.l_qseq;i++) {
      seq[i]=bam_nt16_rev_table[bam1_seqi(bam1_seq(b),i)];
    }
    RETVAL = newSVpv(seq,b->core.l_qseq);
    Safefree(seq);
OUTPUT:
    RETVAL


-chris



From lincoln.stein at gmail.com  Wed Nov 23 17:02:23 2011
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Thu, 24 Nov 2011 06:02:23 +0800
Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
In-Reply-To: <20111123102745.GC20168@merveille.plessy.net>
References: <20111123102745.GC20168@merveille.plessy.net>
Message-ID: <CAOS1dzwxY2Kt3_xkgnbCps_TYfnT3dGE9+gAirpBCeJoMT7YDg@mail.gmail.com>

I apologize that the qseq() method only allows read access. I
will attempt to fix this.

Lincoln



-- 
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <Renata.Musa at oicr.on.ca>


From lincoln.stein at gmail.com  Wed Nov 23 17:05:41 2011
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Thu, 24 Nov 2011 06:05:41 +0800
Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
In-Reply-To: <CA5177C9-FF24-49D7-AFB3-1B79A742F2B4@illinois.edu>
References: <20111123102745.GC20168@merveille.plessy.net>
	<2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org>
	<CA5177C9-FF24-49D7-AFB3-1B79A742F2B4@illinois.edu>
Message-ID: <CAOS1dzwR050PGsomiSJZT+qns60XU8Smgy_WrotyxcOo+LmgNw@mail.gmail.com>

Unfortunately, l_qseq reads/writes the length of the query sequence, not the
sequence itself.

Lincoln

On Thu, Nov 24, 2011 at 3:26 AM, Fields, Christopher J <
cjfields at illinois.edu> wrote:

> According to the docs the low-level API for Bio-Samtools, both read and
> write are allowed:
>
> http://search.cpan.org/perldoc?Bio::DB::Sam#The_low-level_API
>
> Using the low-level API for this purpose isn't documented as well, though
> (the high-level API is read only AFAICT).
>
> The error message is a standard one generated from the XS bindings where
> the passed argument passed isn't mapped correctly.  Looking through the
> Sam.xs file, qseq() is only prototyped as a reader; the only arg is a
> Bio::DB::Bam::Alignment (e.g. $self).  However, it appears there is a
> function specified for Bio::DB::Bam::Alignment names l_qseq() that might be
> the setter, wheras qseq() is maybe to be the getter (ignore the 'bama_'
> prefix):
>
> ....
>
> int
> bama_l_qseq(b,...)
>    Bio::DB::Bam::Alignment b
> PROTOTYPE: $;$
> CODE:
>    if (items > 1)
>      b->core.l_qseq = SvIV(ST(1));
>    RETVAL=b->core.l_qseq;
> OUTPUT:
>    RETVAL
>
> SV*
> bama_qseq(b)
> Bio::DB::Bam::Alignment b
> PROTOTYPE: $
> PREINIT:
>    char* seq;
>    int   i;
> CODE:
>    seq = Newxz(seq,b->core.l_qseq+1,char);
>    for (i=0;i<b->core.l_qseq;i++) {
>      seq[i]=bam_nt16_rev_table[bam1_seqi(bam1_seq(b),i)];
>    }
>    RETVAL = newSVpv(seq,b->core.l_qseq);
>    Safefree(seq);
> OUTPUT:
>    RETVAL
>
>
> -chris
>
> On Nov 23, 2011, at 10:02 AM, Cook, Malcolm wrote:
>
> > Charles,
> >
> > I suggest you reconsider your approach and instead use `samtools view` to
> > pipe your reads to stdout in SAM format, then stream-edit the barcode and
> > pipe it back to samtools for conversion back to a .bam file.
> >
> > I know this is not what you're asking.  I'm pretty sure that the direct
> > answer to your question is, "yes - they are read-only".
> >
> > ~Malcolm
> >
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >> bounces at lists.open-bio.org] On Behalf Of Charles Plessy
> >> Sent: Wednesday, November 23, 2011 4:28 AM
> >> To: bioperl-l at bioperl.org
> >> Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
> >>
> >> Dear BioPerl developers,
> >>
> >> I am trying to process some unaligned paired-end reads with Bio::DB::Sam.
> >> For each pair, I want to detect a sequence index and a unique molecular
> >> identifier in the linker, record them as auxiliary flags, and trim the
> >> linker from the read.
> >>
> >> I collect the pairs through a features iterator, and can access all their
> >> data through the high-level Bio::DB::Bam::Alignment API.  After modifying
> >> them (linker trimming and adding flags), I want to write the resulting
> >> pairs as a new unaligned BAM file.
> >>
> >> I apologise if the solution is trivial, but my problem is that I cannot
> >> manage to modify the Bio::DB::Bam::Alignment objects.  Typically, attempts
> >> such as '$pair[0]->qseq("GATACA")' give errors like
> >> 'Usage: Bio::DB::Bam::Alignment::qseq(b) at
> >> /usr/lib/perl5/Bio/DB/Bam/AlignWrapper.pm line 80'.
> >>
> >> Since I did not find explanations or portions of source code indicating
> >> how to modify Bio::DB::Bam::Alignment objects, I wonder if they are
> >> read-only?
> >>
> >> Have a nice day,
> >>
> >> --
> >> Charles Plessy
> >> Tsurumi, Kanagawa, Japan
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <Renata.Musa at oicr.on.ca>


From cjfields at illinois.edu  Wed Nov 23 20:07:09 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 24 Nov 2011 01:07:09 +0000
Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
In-Reply-To: <CAOS1dzwR050PGsomiSJZT+qns60XU8Smgy_WrotyxcOo+LmgNw@mail.gmail.com>
References: <20111123102745.GC20168@merveille.plessy.net>
	<2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org>
	<CA5177C9-FF24-49D7-AFB3-1B79A742F2B4@illinois.edu>,
	<CAOS1dzwR050PGsomiSJZT+qns60XU8Smgy_WrotyxcOo+LmgNw@mail.gmail.com>
Message-ID: <92CA8F24-47CB-42AF-8C20-9C4765A592A5@illinois.edu>

Ah, okay, makes sense.  I thought it was oddly named. :)

Chris



From ross at cuhk.edu.hk  Sun Nov 27 03:24:43 2011
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Sun, 27 Nov 2011 16:24:43 +0800
Subject: [Bioperl-l] Check the location type for a particular gene in a
	Genbank file
In-Reply-To: <CAA7rn9dYGfeCS-qVccDcdwVenj8KFQs4i89EoT8_63gQk5b42g@mail.gmail.com>
References: <CAA7rn9dYGfeCS-qVccDcdwVenj8KFQs4i89EoT8_63gQk5b42g@mail.gmail.com>
Message-ID: <000101ccacde$0fc4ba90$2f4e2fb0$@edu.hk>

Hi all,

To write a script that extracts sequences generically for all types of
Bio::Location objects, I'd like to know if there is any way to check which
type (e.g. simple or split) is being processed.

Bio::Location::CoordinatePolicyI appears to do something similar, but it is
more like a post-checking step. If I parsed the GenBank file line by line, I
could certainly check whether a line contains keywords like "join", but I am
using something like:

        my @features = grep { $_->primary_tag eq $chkTags[0] } $seqobj->get_SeqFeatures;

        foreach (@features) {
            $pseudo = $_->has_tag('pseudo') ? 'pseudo' : 'functional';
            @gene = ();    # start with an empty list for this feature
            # ...
        }

so I'd appreciate it if anybody knows an approach that integrates better with
the well-developed BioPerl modules.

Thanks a lot.


From Russell.Smithies at agresearch.co.nz  Sun Nov 27 19:46:05 2011
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 28 Nov 2011 13:46:05 +1300
Subject: [Bioperl-l] Galaxy tools?
Message-ID: <18DF7D20DFEC044098A1062202F5FFF340186CF1E3@exchsth.agresearch.co.nz>

Possibly the wrong place to ask, but has anyone written Galaxy tools using BioPerl?
I was thinking of creating BLAST graphic and format converter tools, as I couldn't see any already available in their toolbox.
It looks like I can just write a Python wrapper for my existing BioPerl scripts, although I suspect the "correct" method is to use Biopython methods (but Python annoys me with its lack of semicolons and its required whitespace).

--Russell



From p.j.a.cock at googlemail.com  Sun Nov 27 20:28:33 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 28 Nov 2011 01:28:33 +0000
Subject: [Bioperl-l] Galaxy tools?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF340186CF1E3@exchsth.agresearch.co.nz>
References: <18DF7D20DFEC044098A1062202F5FFF340186CF1E3@exchsth.agresearch.co.nz>
Message-ID: <CAKVJ-_7k6RpQHw4a6=H3qOK+zb+r3T_sG74MG2fPM5_7NbViYA@mail.gmail.com>

On Monday, November 28, 2011, Smithies, Russell  wrote:
> Possibly the wrong place to ask but has anyone written
> Galaxy tools using BioPerl?
> I was thinking of creating blast graphic and format converter
>  tools as I couldn't see any already available in their toolbox.
> It looks like I can just write a Python wrapper for my existing
> BioPerl scripts - although I suspect the "correct" method is to
> use BioPython methods (but Python annoys me with its lack
> of semi-colons and required white-space)

Galaxy is agnostic about what language the tools are written in;
you can use a binary, shell script, Java, Perl, Python, etc.

Peter

From florent.angly at gmail.com  Sun Nov 27 21:09:45 2011
From: florent.angly at gmail.com (Florent Angly)
Date: Mon, 28 Nov 2011 12:09:45 +1000
Subject: [Bioperl-l] Galaxy tools?
In-Reply-To: <CAKVJ-_7k6RpQHw4a6=H3qOK+zb+r3T_sG74MG2fPM5_7NbViYA@mail.gmail.com>
References: <18DF7D20DFEC044098A1062202F5FFF340186CF1E3@exchsth.agresearch.co.nz>
	<CAKVJ-_7k6RpQHw4a6=H3qOK+zb+r3T_sG74MG2fPM5_7NbViYA@mail.gmail.com>
Message-ID: <4ED2ED69.10601@gmail.com>

Hi Russell,

As Peter said, the tools to be wrapped do not need to be written in Python.

I have built a few wrappers for Galaxy, including one for the read
simulator Grinder (http://sourceforge.net/projects/biogrinder/), which
uses BioPerl and is available in the Galaxy Toolshed
(http://sourceforge.net/projects/biogrinder/). It is not very hard to
write a wrapper for trivial programs, but it becomes more complicated once
you start having optional arguments or multiple output files.

Grinder uses Getopt::Euclid (http://search.cpan.org/dist/Getopt-Euclid/) 
to parse command-line arguments. I have been thinking about leveraging 
the information that Getopt::Euclid stores about command-line arguments 
to automate most of the Galaxy wrapper generation, but I have not gotten 
to it yet.

Florent




From florent.angly at gmail.com  Sun Nov 27 23:35:31 2011
From: florent.angly at gmail.com (Florent Angly)
Date: Mon, 28 Nov 2011 14:35:31 +1000
Subject: [Bioperl-l] Interest in Bio::Community modules
Message-ID: <4ED30F93.4000407@gmail.com>

Hi all,

I have been thinking about starting a set of Perl modules that would be
useful for (microbial) ecologists to represent communities of organisms.
At the moment, there does not seem to be anything like this in BioPerl.
I am happy to make these modules available under the BioPerl umbrella
using the Bio::Community::* namespace.

I envision the following modules:
* Bio::Community::Member module representing members of a community.
* Bio::Community::IO modules to read/write files that describe community
composition (a.k.a. OTU table, or site-by-species table) as used by
programs like QIIME, Pyrotagger, GAAS, ...
* Bio::Community::Tools modules to help manipulate communities, e.g. to
take some members at random, normalize the community to a given number
of individuals, or compute rarefaction curves.

The idea is to implement these modules in Moose to teach myself Moose.
The members of a community could be a sequence (Bio::SeqI), a species
(Bio::S), an arbitrary string or even other things. I am not quite sure
whether BioPerl provides facilities to attach arbitrary information to an
object.

Any interest? Ideas? Comments?

Thanks,

Florent


From cjfields at illinois.edu  Mon Nov 28 14:42:12 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 28 Nov 2011 19:42:12 +0000
Subject: [Bioperl-l] Check the location type for a particular gene in
	a	Genbank file
In-Reply-To: <000101ccacde$0fc4ba90$2f4e2fb0$@edu.hk>
References: <CAA7rn9dYGfeCS-qVccDcdwVenj8KFQs4i89EoT8_63gQk5b42g@mail.gmail.com>
	<000101ccacde$0fc4ba90$2f4e2fb0$@edu.hk>
Message-ID: <49363DC1-110A-49A8-B8D7-75AA624A535C@illinois.edu>

Ross,

The standard way is to check whether the location object is a Bio::Location::SplitLocationI or not; see the following for an example:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Location_Objects
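
As a rough illustration of that check (a minimal sketch, not taken from the HOWTO; the coordinates and the helper name location_kind are made up for the example), the test boils down to an isa() call on the location object:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::Location::Simple;
use Bio::Location::Split;

# Classify a location object: split locations (e.g. GenBank "join(...)")
# implement Bio::Location::SplitLocationI; anything else is treated as simple.
sub location_kind {
    my ($loc) = @_;
    return $loc->isa('Bio::Location::SplitLocationI') ? 'split' : 'simple';
}

# Hypothetical coordinates, for illustration only.
my $simple = Bio::Location::Simple->new(-start => 10, -end => 50, -strand => 1);

my $split = Bio::Location::Split->new();
$split->add_sub_Location(Bio::Location::Simple->new(-start => 10,  -end => 50,  -strand => 1));
$split->add_sub_Location(Bio::Location::Simple->new(-start => 100, -end => 150, -strand => 1));

for my $loc ($simple, $split) {
    my $kind = location_kind($loc);
    # For a split location, walk the individual sub-locations (e.g. exons).
    my @parts = $kind eq 'split' ? $loc->sub_Location : ($loc);
    print "$kind: ", join(',', map { $_->start . '..' . $_->end } @parts), "\n";
}
```

Inside a feature loop the object to test would be $feat->location, and if only the joined sequence is needed, $feat->spliced_seq should return it for split locations without handling the parts by hand.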

chris



From cjfields at illinois.edu  Mon Nov 28 14:47:10 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 28 Nov 2011 19:47:10 +0000
Subject: [Bioperl-l] Interest in Bio::Community modules
In-Reply-To: <4ED30F93.4000407@gmail.com>
References: <4ED30F93.4000407@gmail.com>
Message-ID: <3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu>

I think the idea is sound; it would be nice to have.  Jason is working a bit in this area, maybe he has some additional thoughts?  Would there be some redundancy with any current code (Bio::Tree, Bio::Taxon, etc)?  I do think it should be developed on its own, per our recent discussions re: slimming down core.

Re: using Moose and BioPerl, from experience it's a little tricky integrating the two, as BioPerl uses Perl's simple OO system while Moose allows much more (though it has its own issues, memory and speed being the top two).  Specifying inheritance from a BioPerl module is a bit trickier; you'll have to wrap the module via MooseX::NonMoose.  Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine.
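
To make the has-a case concrete, here is a minimal sketch; My::Community::Member and its attributes are hypothetical names chosen for the example, not existing modules. The BioPerl object is held as an attribute, so no MooseX::NonMoose wrapping is needed:

```perl
package My::Community::Member;    # hypothetical class, for illustration only
use Moose;

# has-a: keep the BioPerl object as an attribute instead of inheriting from it.
# Moose treats the unknown type name as a class name and checks ->isa() on the value.
has seq => (
    is       => 'ro',
    isa      => 'Bio::PrimarySeqI',
    required => 1,
    handles  => [qw(display_id)],    # delegate display_id() to the sequence
);

has abundance => (is => 'rw', isa => 'Num', default => 1);

__PACKAGE__->meta->make_immutable;

package main;
use strict;
use warnings;
use Bio::PrimarySeq;

my $member = My::Community::Member->new(
    seq       => Bio::PrimarySeq->new(-id => 'otu_1', -seq => 'GATTACA'),
    abundance => 3,
);
printf "%s (%d bp) x %s\n",
    $member->display_id, $member->seq->length, $member->abundance;
```

Inheriting from a BioPerl class instead would, as far as I can tell, mean adding "use MooseX::NonMoose;" and an "extends" line to the class, which is the trickier route described above.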

chris



From l.m.timmermans at students.uu.nl  Mon Nov 28 15:25:13 2011
From: l.m.timmermans at students.uu.nl (Leon Timmermans)
Date: Mon, 28 Nov 2011 21:25:13 +0100
Subject: [Bioperl-l]  Interest in Bio::Community modules
In-Reply-To: <CAC1jpXDrmNJqzzWR80BvEzwDgGJBJekS4k+a5-ZUTkunU0r9VQ@mail.gmail.com>
References: <4ED30F93.4000407@gmail.com>
	<CAC1jpXDrmNJqzzWR80BvEzwDgGJBJekS4k+a5-ZUTkunU0r9VQ@mail.gmail.com>
Message-ID: <CAC1jpXCOTKF21kMNQbJZWF7oR-Ue1ry3HpYkVWK9=FV--QfTog@mail.gmail.com>

And now to the list too,

On Mon, Nov 28, 2011 at 5:35 AM, Florent Angly <florent.angly at gmail.com>wrote:

> The idea is to implement these modules in Moose to teach myself Moose. The
> members of a community could be a sequence (Bio::SeqI), a species (Bio::S),
> an arbitrary string or even other things. I am not quite sure if Bioperl
> provide facilities to attach some arbitrary information to an object.
>
> Any interest? Ideas? Comments?
>

Sounds like a good use-case for roles, maybe even parametric roles.
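
A rough sketch of what that could look like with a plain (non-parametric) role; all package names here are hypothetical and only meant to show the shape. Anything that can supply an id() can consume the role, whether it wraps a sequence or just a string:

```perl
package My::Role::CommunityMember;    # hypothetical role name
use Moose::Role;

requires 'id';    # every consumer must provide an id() method

has abundance => (is => 'rw', isa => 'Num', default => 1);

sub describe {
    my $self = shift;
    return sprintf '%s (abundance %s)', $self->id, $self->abundance;
}

package My::Member::Named;    # member identified by an arbitrary string
use Moose;

has id => (is => 'ro', isa => 'Str', required => 1);
with 'My::Role::CommunityMember';    # applied after "has id" so "requires" is satisfied

package My::Member::Seq;    # member wrapping a sequence object
use Moose;

has seq => (is => 'ro', isa => 'Bio::PrimarySeqI', required => 1);
sub id { $_[0]->seq->display_id }
with 'My::Role::CommunityMember';

package main;
use strict;
use warnings;
use Bio::PrimarySeq;

my $named  = My::Member::Named->new(id => 'unclassified', abundance => 5);
my $by_seq = My::Member::Seq->new(
    seq => Bio::PrimarySeq->new(-id => 'otu_1', -seq => 'GATTACA'),
);
print $_->describe, "\n" for ($named, $by_seq);
```

A parametric role (MooseX::Role::Parameterized) would go one step further and let the consumer choose, for instance, the attribute name or type when applying the role.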

Leon

From florent.angly at gmail.com  Mon Nov 28 19:59:24 2011
From: florent.angly at gmail.com (Florent Angly)
Date: Tue, 29 Nov 2011 10:59:24 +1000
Subject: [Bioperl-l] Interest in Bio::Community modules
In-Reply-To: <3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu>
References: <4ED30F93.4000407@gmail.com>
	<3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu>
Message-ID: <4ED42E6C.6020501@gmail.com>

Hi Chris,

On 29/11/11 05:47, Fields, Christopher J wrote:
> I think the idea is sound, it would be nice to have.  Jason is working a bit in this area, maybe he has some additional thoughts?  Would there be some redundancy with any current code (Bio::Tree, Bio::Taxon, etc)?
None of these features would be duplicated. Rather, they would be used as
attributes of the Bio::Community::* objects. For example, a member of a
community could have a Bio::SeqI attached to it as well as a Bio::Taxon,
etc...

> I do think it should be developed on its own, per our recent discussions re: slimming down core.
Yes, the features are so different that it makes sense to have the 
Bio::Community::* modules as a separate BioPerl distribution (like the 
Bio-FeatureIO BioPerl distribution).

> Re: using Moose and BioPerl, from experience it's a little tricky integrating the two, as BioPerl uses Perl's simple OO system while Moose allows much more (though it has its own issues, memory and speed being the top two).  Specifying inheritance from a BioPerl module is a bit trickier; you'll have to wrap the module via MooseX::NonMoose.  Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine.
Beyond Bio::RootI, I cannot imagine that any of the Bio::Community::* 
modules would need to inherit from any other BioPerl modules. 
Considering this and the performance aspects of Moose, do you think that 
using Moose is a wise design decision?

Best,

Florent




From cjfields at illinois.edu  Tue Nov 29 00:32:50 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 29 Nov 2011 05:32:50 +0000
Subject: [Bioperl-l] Interest in Bio::Community modules
In-Reply-To: <CAC1jpXCOTKF21kMNQbJZWF7oR-Ue1ry3HpYkVWK9=FV--QfTog@mail.gmail.com>
References: <4ED30F93.4000407@gmail.com>
	<CAC1jpXDrmNJqzzWR80BvEzwDgGJBJekS4k+a5-ZUTkunU0r9VQ@mail.gmail.com>
	<CAC1jpXCOTKF21kMNQbJZWF7oR-Ue1ry3HpYkVWK9=FV--QfTog@mail.gmail.com>
Message-ID: <C87E8F45-FE8A-4E77-A612-DF1E25C9CA73@illinois.edu>

On Nov 28, 2011, at 2:25 PM, Leon Timmermans wrote:

> And now to the list too,
> 
> On Mon, Nov 28, 2011 at 5:35 AM, Florent Angly <florent.angly at gmail.com>wrote:
> 
>> The idea is to implement these modules in Moose to teach myself Moose. The
>> members of a community could be a sequence (Bio::SeqI), a species (Bio::S),
>> an arbitrary string or even other things. I am not quite sure if Bioperl
>> provide facilities to attach some arbitrary information to an object.
>> 
>> Any interest? Ideas? Comments?
>> 
> 
> Sounds like a good use-case for roles, maybe even parametric roles.
> 
> Leon

Yep, agree totally.  It would be a good replacement in most cases for the Bio::*I interfaces (Bio::SeqI, Bio::RootI, ...).

(see also, the Biome project, which I'm slooooooowly working on again, on github)

chris

From pmr at ebi.ac.uk  Tue Nov 29 08:39:52 2011
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 29 Nov 2011 13:39:52 +0000
Subject: [Bioperl-l] BinarySearch.pm
Message-ID: <4ED4E0A8.30102@ebi.ac.uk>

In trying to use bioflat_index.pl index files in EMBOSS, I ran into some 
problems.

Both appear to be in the Bio/Flat/BinarySearch.pm source file.

EMBL ID lines are failing to drop the ';' from the ID. Updating the 
regular expression to make sure the ';' is not picked up seems to work:

   if ($format =~ /embl/i) {
     return ('ID',
	    "^ID   (\\S+[^; ])",
	    "^ID   (\\S+[^; ])",
	    {
	     ACC     => q/^AC   (\S+);/,
	     VERSION => q/^SV\s+(\S+)/
	    });
   }
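
The effect of the changed pattern can be checked in isolation with a sketch like the following. The "old" pattern shown is an assumption for comparison, not a quote from BinarySearch.pm, and the two ID lines are only illustrative examples of the newer and older EMBL layouts:

```perl
use strict;
use warnings;

# Assumed previous pattern (keeps a trailing ';') versus the proposed one.
my $old_re = qr/^ID   (\S+)/;
my $new_re = qr/^ID   (\S+[^; ])/;

# Illustrative ID lines: newer layout (ID followed by ';') and older layout.
my @id_lines = (
    'ID   X56734; SV 1; linear; mRNA; STD; PLN; 1859 BP.',
    'ID   AB000263 standard; RNA; PRI; 368 BP.',
);

for my $line (@id_lines) {
    my ($old) = $line =~ $old_re;
    my ($new) = $line =~ $new_re;
    print "old='$old' new='$new'\n";
}
# prints:
#   old='X56734;' new='X56734'
#   old='AB000263' new='AB000263'
```

One caveat of the proposed pattern: it needs at least two characters (one for \S+, one for [^; ]), so a single-character ID would not match.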

The ACC secondary index has every record duplicated.
This line is duplicated in the write_secondary_indices source code. Is 
that intentional?

  		    print $fh sprintf("%-${length}s",$record);

regards,

Peter Rice
EMBOSS Team

From uni.anastasia at gmail.com  Sat Nov 26 12:32:48 2011
From: uni.anastasia at gmail.com (anastsia shapiro)
Date: Sat, 26 Nov 2011 19:32:48 +0200
Subject: [Bioperl-l] Problem with parsing blast results
Message-ID: <CAHuWCXcLkSZ=N-HDG6aifV0Y1sL8JKNU5Jz=sGzyAfT8JDVTXw@mail.gmail.com>

Hello,

I'm running a script that should parse BLAST results using Bio::SearchIO.

Sometimes the script works fine, but sometimes it stops and I receive
the following error:

------------- EXCEPTION -------------
MSG: no data for midline Query
------------------------------------------------------------
STACK Bio::SearchIO::blast::next_result C:/Perl64/site/lib/Bio\SearchIO\blast.pm:1805
STACK toplevel D:\D\uni-anastasia\project\scripts\ParsingBlastResults\ParsingBlastResults.pl:36
-------------------------------------

The BLAST result files were produced by running the following command:

blastn -task blastn -db xxxxxxxxx.txt -evalue 1e-10 -perc_identity 80 -dust no -num_descriptions 0 -query xxxxx.txt -out results.txt -strand plus

I am using BioPerl 1.6.1.
I have read all the forums, and it seems to be a bug, but one that was
reportedly fixed in version 1.5.

I would really appreciate your help, since I have been trying to understand
the problem for over a month.

Regards,
Anastasia

From bunk at novozymes.com  Tue Nov 29 11:46:54 2011
From: bunk at novozymes.com (Jacob Bunk Nielsen)
Date: Tue, 29 Nov 2011 17:46:54 +0100
Subject: [Bioperl-l] Problem with parsing blast results
In-Reply-To: <CAHuWCXcLkSZ=N-HDG6aifV0Y1sL8JKNU5Jz=sGzyAfT8JDVTXw@mail.gmail.com>
	(anastsia shapiro's message of "Sat, 26 Nov 2011 18:32:48 +0100")
References: <CAHuWCXcLkSZ=N-HDG6aifV0Y1sL8JKNU5Jz=sGzyAfT8JDVTXw@mail.gmail.com>
Message-ID: <77sjl698qp.fsf@spurv.nzcorp.net>

Hi

anastsia shapiro <uni.anastasia at gmail.com> writes:

> I'm running a script that should parse a blast results, using searchIO.
>
> Sometimes the script work fines, however sometimes it stops, and I receive
> the following error.
>
> ------------- EXCEPTION -------------
> MSG: no data for midline Query
> ------------------------------------------------------------
> STACK Bio::SearchIO::blast::next_result C:/Perl64/site/lib/Bio\SearchIO\
> blast.pm:1805
> STACK toplevel
> D:\D\uni-anastasia\project\scripts\ParsingBlastResults\ParsingBlastResults.pl:36
> -------------------------------------
> While the blast results files were received as a result of running the
> following blast command:
> blastn -task blastn -db xxxxxxxxx.txt -evalue 1e-10 -perc_identity 80 -dust
> no -num_descriptions 0  -query xxxxx.txt -out results.txt -strand plus

I don't know why this exact problem arises, but I think you should
consider using an output format that is easier to parse by machine, like
the XML format.

With BLAST+ blastn, as in your command, you request XML output with
-outfmt 5 (the legacy blastall equivalent is -m 7). When reading the
result with BioPerl you must specify -format => 'blastxml' for
Bio::SearchIO.
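
Something along these lines should work as a starting point (a minimal sketch: the file name is a placeholder, print_hsps is just a helper name for the example, and the blastxml parser needs XML::SAX installed). For BLAST+ the XML report is produced with -outfmt 5:

```perl
use strict;
use warnings;
use Bio::SearchIO;

# Print one tab-separated line per HSP from a BLAST XML report.
sub print_hsps {
    my ($file) = @_;
    my $in = Bio::SearchIO->new(-format => 'blastxml', -file => $file);
    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                print join("\t",
                    $result->query_name, $hit->name,
                    $hsp->evalue, $hsp->percent_identity), "\n";
            }
        }
    }
    return;
}

# Pass the XML report on the command line, e.g. a placeholder results.xml
# written by blastn with "-outfmt 5 -out results.xml".
print_hsps($ARGV[0]) if @ARGV;
```

The same loop structure works for the plain-text report with -format => 'blast', so switching formats later only means changing that one argument.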

That way I think you are likely to see a lot fewer problems regarding
the parsing of blast output.

If the above doesn't solve the problem, you had better show us the code
that fails.

Best regards

Jacob


From cjfields at illinois.edu  Tue Nov 29 14:11:11 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 29 Nov 2011 19:11:11 +0000
Subject: [Bioperl-l] Interest in Bio::Community modules
In-Reply-To: <4ED42E6C.6020501@gmail.com>
References: <4ED30F93.4000407@gmail.com>
	<3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu>
	<4ED42E6C.6020501@gmail.com>
Message-ID: <4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu>

On Nov 28, 2011, at 6:59 PM, Florent Angly wrote:

> Hi Chris,
> 
> On 29/11/11 05:47, Fields, Christopher J wrote:
> ...
>> Re: using Moose and BioPerl, from experience it's a little tricky integrating the two, as BioPerl uses Perl's simple OO system while Moose allows much more (though it has its own issues, memory and speed being the top two).  Specifying inheritance from a BioPerl module is a bit trickier; you'll have to wrap the module via MooseX::NonMoose.  Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine.
> Beyond Bio::RootI, I cannot imagine that any of the Bio::Community::* modules would need to inherit from any other BioPerl modules. Considering this and the performance aspects of Moose, do you think that using Moose is a wise design decision?

Moose can actually help in some circumstances; flattening the inheritance hierarchy by using roles seems to help.  And it never hurts to learn something new like Moose and other modern perl niceties.

> Best,
> 
> Florent


chris

From cjfields at illinois.edu  Tue Nov 29 17:30:58 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 29 Nov 2011 22:30:58 +0000
Subject: [Bioperl-l] BinarySearch.pm
In-Reply-To: <4ED4E0A8.30102@ebi.ac.uk>
References: <4ED4E0A8.30102@ebi.ac.uk>
Message-ID: <6F926A89-3B07-4924-8CC4-68A027E7FFCE@illinois.edu>

Peter, 

Can you send a test file that is failing?  I added a few tests using an example file with a ';' in the ID line, but everything is passing with our other EMBL example files.  I'm also looking into adding a method to return secondary IDs for a specific type ('ACC' for instance) so we can test the repeat issue for accessions.  Both changes pass tests as is, though, so I have committed them in the meantime.

chris

On Nov 29, 2011, at 7:39 AM, Peter Rice wrote:

> In trying to use bioflat_index.pl index files in EMBOSS, I ran into some problems.
> 
> Both appear to be in the Bio/Flat/BinarySearch.pm source file.
> 
> EMBL ID lines are failing to drop the ';' from the ID. Updating the regular expression to make sure the ';' is not picked up seems to work:
> 
>  if ($format =~ /embl/i) {
>    return ('ID',
> 	    "^ID   (\\S+[^; ])",
> 	    "^ID   (\\S+[^; ])",
> 	    {
> 	     ACC     => q/^AC   (\S+);/,
> 	     VERSION => q/^SV\s+(\S+)/
> 	    });
>  }
> 
> The ACC secondary index has every record duplicated.
> This line is duplicated in the write_secondary_indices source code. Is that intentional?
> 
> 		    print $fh sprintf("%-${length}s",$record);
> 
> regards,
> 
> Peter Rice
> EMBOSS Team
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
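
As a quick check of the pattern proposed in the quoted message, the following sketch compares it with a plain \S+ capture on two ID lines. Only the "proposed" regular expression comes from the message; the "greedy" pattern is an assumed stand-in for a version without the fix, and the helper name embl_primary_id and the sample lines are illustrative, not taken from Bio/Flat/BinarySearch.pm.

```perl
use strict;
use warnings;

# Pattern proposed in the message above for the EMBL primary ID.
my $proposed = qr/^ID   (\S+[^; ])/;
# Assumed stand-in for a pattern without the fix (hypothetical).
my $greedy   = qr/^ID   (\S+)/;

# Hypothetical helper: return the captured ID, or undef if no match.
sub embl_primary_id {
    my ($line, $pattern) = @_;
    return $line =~ $pattern ? $1 : undef;
}

# Illustrative ID lines: one with ';' directly after the ID, one without.
my $new_style = 'ID   X56734; SV 1; linear; mRNA; STD; PLN; 1859 BP.';
my $old_style = 'ID   AB000263 standard; RNA; PRI; 368 BP.';

for my $line ($new_style, $old_style) {
    printf "%-10s %-10s\n",
        embl_primary_id($line, $greedy),
        embl_primary_id($line, $proposed);
}
```

With the proposed pattern the trailing ';' is left out of the capture, while an ID that is followed by a space is captured unchanged.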


From florent.angly at gmail.com  Tue Nov 29 20:18:41 2011
From: florent.angly at gmail.com (Florent Angly)
Date: Wed, 30 Nov 2011 11:18:41 +1000
Subject: [Bioperl-l] Interest in Bio::Community modules
In-Reply-To: <4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu>
References: <4ED30F93.4000407@gmail.com>	<3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu>	<4ED42E6C.6020501@gmail.com>
	<4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu>
Message-ID: <4ED58471.3030106@gmail.com>

Chris,
Yes, it is exciting to learn something new.
I have developed a bit of code in the last few days in my local git 
repository. Do you think you could create a repository for Bio-Community 
on the Bioperl Github space or is it too soon?
Cheers,
Florent



From cjfields at illinois.edu  Tue Nov 29 21:34:00 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 30 Nov 2011 02:34:00 +0000
Subject: [Bioperl-l] Interest in Bio::Community modules
In-Reply-To: <4ED58471.3030106@gmail.com>
References: <4ED30F93.4000407@gmail.com>
	<3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu>
	<4ED42E6C.6020501@gmail.com>
	<4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu>
	<4ED58471.3030106@gmail.com>
Message-ID: <A4EEF7BF-F70C-4672-B582-748B3023E709@illinois.edu>

On Nov 29, 2011, at 7:18 PM, Florent Angly wrote:

> Chris,
> Yes, it is exciting to learn something new.
> I have developed a bit of code in the last few days in my local git repository. Do you think you could create a repository for Bio-Community on the Bioperl Github space or is it too soon?

It's up to you.  I set up a barebones repo and gave you push/pull/admin access, so you should be able to push to it whenever you are ready:

https://github.com/bioperl/Bio-Community

chris




From florent.angly at gmail.com  Tue Nov 29 21:50:04 2011
From: florent.angly at gmail.com (Florent Angly)
Date: Wed, 30 Nov 2011 12:50:04 +1000
Subject: [Bioperl-l] Interest in Bio::Community modules
In-Reply-To: <A4EEF7BF-F70C-4672-B582-748B3023E709@illinois.edu>
References: <4ED30F93.4000407@gmail.com>	<3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu>	<4ED42E6C.6020501@gmail.com>
	<4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu>
	<4ED58471.3030106@gmail.com>
	<A4EEF7BF-F70C-4672-B582-748B3023E709@illinois.edu>
Message-ID: <4ED599DC.6090808@gmail.com>

Fantastic! Thank you very much Chris,
Florent



From lsbrath at gmail.com  Wed Nov 30 00:25:32 2011
From: lsbrath at gmail.com (Mgavi Brathwaite)
Date: Wed, 30 Nov 2011 00:25:32 -0500
Subject: [Bioperl-l] Exception MSG
Message-ID: <CAJm=ba9qxU2JY-i5K=sfuVcODpB941cRehpJgxcT0sThXYx68g@mail.gmail.com>

Hello,

Brushing up on my BioPerl and I can't figure out this MSG:

------------- EXCEPTION -------------

MSG: cannot open >/Users/mydata/Desktop/gi|255572219|ref|XP_002527049|.out

STACK Bio::Tools::Run::RemoteBlast::save_output
/Library/Perl/5.10.0/Bio/Tools/Run/RemoteBlast.pm:678

STACK toplevel /Users/mydata/Documents/workspace/BI7643/rb_ex.pl:40

-------------------------------------
 Here is the code:

#!/usr/bin/perl -w

use strict;
use Bio::Tools::Run::RemoteBlast;

#=cut

my $prog  = 'blastp';
my $db    = 'swissprot';
my $e_val = '1e-10';

my @params = ('-prog'      => $prog,
              '-data'      => $db,
              'expect'     => $e_val,
              'readmethod' => 'SearchIO' );

my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

#human database
$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens [ORGN]';

my $v = 1; # this is just to turn on and off the messages

# Construct the sequence object
my $seq_in = Bio::SeqIO->new(-file => "/Users/mydata/Desktop/rb_ex.fa", -format => "fasta");

while (my $input = $seq_in->next_seq()){
  my $r = $factory->submit_blast($input);
  print STDERR "waiting..." if ($v > 0);
  while (my @rids = $factory->each_rid()){
    foreach my $rid (@rids){
      my $rc = $factory->retrieve_blast($rid);
      if( !ref($rc) ) {
        if($rc < 0){
          $factory->remove_rid($rid);
        }
        print STDERR "." if ($v > 0);
        sleep 5;
      } else {
        my $result = $rc->next_result();
        #save output
        my $filename = ">/Users/mydata/Desktop/".$result->query_name().".out";#error
        $factory->save_output($filename);
        $factory->remove_rid($rid);
        print "\nQuery Name: ", $result->query_name(), "\n";
        while ( my $hit = $result->next_hit ) {
          next unless ( $v > 0);
          print "\thit name is ", $hit->name, "\n";
          while( my $hsp = $hit->next_hsp ) {
            print "\t\tscore is ", $hsp->score, "\n";
          }
        }
      }
    }
  }
}


Thanks for the help!

From jason.stajich at gmail.com  Wed Nov 30 01:05:41 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Tue, 29 Nov 2011 22:05:41 -0800
Subject: [Bioperl-l] Exception MSG
In-Reply-To: <CAJm=ba9qxU2JY-i5K=sfuVcODpB941cRehpJgxcT0sThXYx68g@mail.gmail.com>
References: <CAJm=ba9qxU2JY-i5K=sfuVcODpB941cRehpJgxcT0sThXYx68g@mail.gmail.com>
Message-ID: <50C532E3-5212-49CE-A3CA-9FE91BDC00CF@gmail.com>

I don't think you need to give it the '>' when you specify the filename for the output. That is done by the filehandle opening itself.



From ss2489 at cornell.edu  Wed Nov 30 09:32:47 2011
From: ss2489 at cornell.edu (Surya Saha)
Date: Wed, 30 Nov 2011 09:32:47 -0500
Subject: [Bioperl-l] Exception MSG
In-Reply-To: <50C532E3-5212-49CE-A3CA-9FE91BDC00CF@gmail.com>
References: <CAJm=ba9qxU2JY-i5K=sfuVcODpB941cRehpJgxcT0sThXYx68g@mail.gmail.com>
	<50C532E3-5212-49CE-A3CA-9FE91BDC00CF@gmail.com>
Message-ID: <CAEiaqD=5V3_JVpzi1kjtgfCRmZg1VkOw2jWGUqNctkq9DbOnZw@mail.gmail.com>

If that does not fix it, try using one of the unique identifiers as the
file name (gi??) instead of the full query name. The pipe(|) characters
might cause problems.
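
Both suggestions can be combined in one small helper. This is only a sketch: blast_output_path is a hypothetical name, replacing unsafe characters with underscores is an arbitrary choice, and it assumes (as the earlier reply suggests) that save_output opens the file for writing on its own, so the path is passed without a leading '>'.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: build an output path from a query name.
# Characters other than letters, digits, '.', '_' and '-' (for example
# the '|' in NCBI-style IDs) are replaced with '_'.
sub blast_output_path {
    my ($dir, $query_name) = @_;
    (my $safe = $query_name) =~ s/[^A-Za-z0-9._-]+/_/g;
    $safe =~ s/^_+|_+$//g;    # trim underscores left at either end
    return File::Spec->catfile($dir, "$safe.out");
}

my $path = blast_output_path('/Users/mydata/Desktop',
                             'gi|255572219|ref|XP_002527049|');
print "$path\n";
```

In the posted script this would replace the line that builds $filename, for example $factory->save_output(blast_output_path('/Users/mydata/Desktop', $result->query_name)).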


From lsbrath at gmail.com  Wed Nov 30 09:34:52 2011
From: lsbrath at gmail.com (Mgavi Brathwaite)
Date: Wed, 30 Nov 2011 09:34:52 -0500
Subject: [Bioperl-l] Exception MSG
In-Reply-To: <CAEiaqD=5V3_JVpzi1kjtgfCRmZg1VkOw2jWGUqNctkq9DbOnZw@mail.gmail.com>
References: <CAJm=ba9qxU2JY-i5K=sfuVcODpB941cRehpJgxcT0sThXYx68g@mail.gmail.com>
	<50C532E3-5212-49CE-A3CA-9FE91BDC00CF@gmail.com>
	<CAEiaqD=5V3_JVpzi1kjtgfCRmZg1VkOw2jWGUqNctkq9DbOnZw@mail.gmail.com>
Message-ID: <CAJm=ba-yP6q53NunpxPJzdurthGE2uN3GAtiGs7eHm1rY6AdoA@mail.gmail.com>

Surya,

As Jason suggested, I removed the '>' and it worked. Thanks for your
response.

Lom


From ericdemuinck at gmail.com  Wed Nov 30 18:36:36 2011
From: ericdemuinck at gmail.com (Ericde)
Date: Wed, 30 Nov 2011 15:36:36 -0800 (PST)
Subject: [Bioperl-l] re trieving blast multiple alignment in fasta form
Message-ID: <32886592.post@talk.nabble.com>


:-/

I am a newbie and I am trying to retrieve a blast multiple alignment in
fasta form. The BLAST output (m -2) gives several alignments (which is good)
and the parsing of the xml file seems to list all of these alignments (which
is also good) 

The problem is that the fasta alignment file only includes one of the hits
and the alignment does not include all the sequences (including the query
sequence).

I would like to generate a fasta file that includes all the alignments
included in the m -2 output (plus query sequence if possible). I have
cobbled together a script (below) ...I will attach the sample blast xml file
and the (m -2) file as well....any insight is appreciated :/

#module load perl

#give the name of the blast xml file to parse in the line where it says 'file =>'
use Bio::SearchIO;
#Use m -7 to generate xml file from blastall
my $in = new Bio::SearchIO(-format => 'blastxml',
                           -file   => 'BLASToutxml');
while( my $result = $in->next_result ) {
  ## $result is a Bio::Search::Result::ResultI compliant object
  while( my $hit = $result->next_hit ) {
    ## $hit is a Bio::Search::Hit::HitI compliant object
    while( my $hsp = $hit->next_hsp ) {
      ## $hsp is a Bio::Search::HSP::HSPI compliant object
      #ENTER desired sequence length
      if( $hsp->length('total') > 50 ) {
        #ENTER desired percent identity
        if ( $hsp->percent_identity >= 75 ) {
          print "Query=",   $result->query_name,
            " Hit=",        $hit->name,
            " Length=",     $hsp->length('total'),
            " Percent_id=", $hsp->percent_identity, "\n";
          #Print alignment to file
          #$aln will be a Bio::SimpleAlign object
          use Bio::AlignIO;
          my $aln = $hsp->get_aln;

          #changed msf to fasta and hsp.msf to hsp.fas, output is now a fasta file
          my $alnIO = Bio::AlignIO->new(-format => "fasta", -file => ">hsp.fas");
          $alnIO->write_aln($aln);
        }
      }
    }
  }
}
http://old.nabble.com/file/p32886592/BLASToutxml BLASToutxml 
http://old.nabble.com/file/p32886592/hsp.fas hsp.fas 
-- 
View this message in context: http://old.nabble.com/retrieving-blast-multiple-alignment-in-fasta-form-tp32886592p32886592.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
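
One likely reason hsp.fas ends up holding a single alignment: the script opens the output with '>' inside the innermost loop, and '>' truncates the file on every open, so each HSP overwrites the previous one. The plain-Perl sketch below (no BioPerl needed; the file names and records are made up) shows the difference between reopening per record and opening once before the loop. In the script above, the equivalent change would probably be to create the Bio::AlignIO object once, before the while loops, and call write_aln inside them. As far as I know each HSP alignment is still pairwise, so this would give several pairwise alignments in one file rather than one multiple alignment.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

my $dir     = tempdir(CLEANUP => 1);
my @records = (">hsp1\nACGT\n", ">hsp2\nTTGA\n", ">hsp3\nGGCC\n");

# Pattern used in the script: reopen with '>' for every record.
my $reopened = File::Spec->catfile($dir, 'reopened.fas');
for my $rec (@records) {
    open(my $out, '>', $reopened) or die "cannot open $reopened: $!";
    print {$out} $rec;
    close($out) or die "cannot close $reopened: $!";
}

# Alternative: open once before the loop, then write every record.
my $once = File::Spec->catfile($dir, 'once.fas');
open(my $out, '>', $once) or die "cannot open $once: $!";
print {$out} $_ for @records;
close($out) or die "cannot close $once: $!";

# Count FASTA header lines in a file.
sub count_headers {
    my ($file) = @_;
    open(my $in, '<', $file) or die "cannot open $file: $!";
    my $n = grep { /^>/ } <$in>;
    close($in);
    return $n;
}

printf "reopened each time: %d record(s)\n", count_headers($reopened);
printf "opened once:        %d record(s)\n", count_headers($once);
```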


From hrh at fmi.ch  Tue Nov  1 06:18:54 2011
From: hrh at fmi.ch (Hotz, Hans-Rudolf)
Date: Tue, 1 Nov 2011 11:18:54 +0100
Subject: [Bioperl-l] best way to edit sequence features
In-Reply-To: <CAPOrs_2uZ6TPghqAXwVeXwSeHBm+iomTXMGqswR38V_L=SQWyw@mail.gmail.com>
Message-ID: <CAD5861E.14042%hrh@fmi.ch>

Hi Carnë

Please allow me to make a few comments:

I very much like your idea of writing a free tool to edit and draw
sequences. We (ie people working in core Bioinformatics facilities) all
suffer from having to deal with files originally created with commercial
packages. And on top of all the pain, those commercial packages are very
expensive and they don't deliver what they promise to do.


Just double checking: Have you looked at the free tools which are available?

I am aware of the following ones (as far as I know, they are all GUI based
and don't have a command line API):

Serial Cloner     http://serialbasics.free.fr/Serial_Cloner.html
GENtle            http://gentle.magnusmanske.de/
GeneCoder         http://www.algosome.com/gene-coder/gene-coder.html
pDRAW32           http://www.acaclone.com/
Genome Workbench  http://www.ncbi.nlm.nih.gov/projects/gbench/
Ape               http://www.biology.utah.edu/jorgensen/wayned/ape/
UGene             http://ugene.unipro.ru/

maybe others on the list know of even better free tools?

Also, have you looked at the emboss tool "cirdna" ?


WRT file formats: I strongly suggest sticking to EMBL and GenBank format as
input and (text) output format. The features are not indexed, but you can
create your own index when you store the sequences in your system. Internally,
you probably wanna keep the data in a 'simpler' format than EMBL or GenBank
anyway.
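
One way to read "create your own" index is to hand out an ID to every feature when the sequence is loaded. The sketch below is purely illustrative: plain hashes stand in for real feature objects, assign_feature_ids is a hypothetical name, and the "type_N" naming scheme is an assumption. IDs assigned by position are only stable as long as the feature order in the file does not change.

```perl
use strict;
use warnings;

# Hypothetical feature records: type plus 1-based start/end.
my @features = (
    { type => 'CDS',      start => 100, end => 400 },
    { type => 'promoter', start => 20,  end => 80  },
    { type => 'CDS',      start => 500, end => 900 },
);

# Assign IDs such as "CDS_1" by counting features of each type in
# the order they appear, and return a lookup table keyed by ID.
sub assign_feature_ids {
    my (@feats) = @_;
    my (%seen, %by_id);
    for my $feat (@feats) {
        my $id = $feat->{type} . '_' . ++$seen{ $feat->{type} };
        $by_id{$id} = $feat;
    }
    return \%by_id;
}

my $by_id = assign_feature_ids(@features);

# Edit one feature by its ID, as a command line tool might.
$by_id->{CDS_2}{end} = 950;

print join(', ', sort keys %$by_id), "\n";
```

Because the table stores references, the edit through CDS_2 changes the third entry of @features itself.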

Alternatively, have you looked at gff/gtf as a way of getting features?
see: 

http://www.sequenceontology.org/gff3.shtml
http://mblab.wustl.edu/GTF22.html


I am looking forward to any progress you make

Regards, Hans


Hans-Rudolf Hotz, PhD
Bioinformatics Support

Friedrich Miescher Institute for Biomedical Research
Maulbeerstrasse 66
4058 Basel/Switzerland


On 10/31/11 7:05 PM, "Carnë Draug" <carandraug+dev at gmail.com> wrote:

> Hi
> 
> I've been planning on writing a free (as in freedom) tool to edit
> sequences and make plasmid maps. The idea is to build the command line
> tool first and maybe later work on a GUI for it.
> 
> The problem I foresee at the moment while designing it, is how to
> change a feature of the sequence. I'm not familiar with all sequence
> formats (only fasta, ensembl and genbank) but I can't see how to
> specify from the command line what feature to edit since I can't see
> any unique identifiers for them. Is there a file format that makes
> this easier? Any tips would be most appreciated.
> 
> Thanks in advance,
> Carnë Draug


From cjfields at illinois.edu  Tue Nov  1 09:40:30 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 1 Nov 2011 13:40:30 +0000
Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual
In-Reply-To: <E824C753-0688-4350-96B8-638BA1B37E73@neuro.utah.edu>
References: <E824C753-0688-4350-96B8-638BA1B37E73@neuro.utah.edu>
Message-ID: <26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu>

On Oct 24, 2011, at 9:58 AM, Sofia Robb wrote:

> Hi,
> 
> I am having problems running Bio::Index::Fastq.  I get the following error when a quality line begins with '@'.
> 
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: No description line parsed
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:368
> STACK: Bio::SeqIO::fastq::next_dataset /usr/share/perl5/Bio/SeqIO/fastq.pm:71
> STACK: Bio::SeqIO::fastq::next_seq /usr/share/perl5/Bio/SeqIO/fastq.pm:29
> STACK: Bio::Index::AbstractSeq::fetch /usr/share/perl5/Bio/Index/AbstractSeq.pm:147
> STACK: Bio::Index::AbstractSeq::get_Seq_by_id /usr/share/perl5/Bio/Index/AbstractSeq.pm:198
> STACK: /home_stajichlab/robb/bin/clean_pairs_indexed.pl:68
> 
> 
> Here is an example of a fastq record that is causing this error, The last line which starts with an '@'  is actually the qual line.
> 
> @5:105:15806:16092:Y
> GTGGCGCGGAACAGAGGAGGAATGTTCAGGAGAGGGGGCATGTGTTGTTACCGAGTACTTGGAAACGACG
> +
> @9;A565:=8B?<E<DEEBEE<E3BB?3??BCCF2<@@=BGGBDB60:64594.81?<B??;3?8-984?
> 
> 
> 
> I see that Chris has partially addressed this in the mailing list
> http://bioperl.org/pipermail/bioperl-l/2011-January/034481.html
> 
> However as he pointed out at the time, it appears this may be a fairly large problem.

The indexer is being refactored to address this problem; the Bio::SeqIO parser actually does parse this, but the (very simple) indexer does not.  I can try to push this to the forefront this week; the fix shouldn't be too hard to implement.  In essence it would simply use a few SeqIO methods I built in to parse out each bit of data in chunks; the indexer would just need to track the start and length of each chunk while the parser is running.

> My fastq seq and qual lines are always only one line, so I think that adding a line count and only checking for @ in the lines where $line_count % 4 == 0 would work, since the header lines are always the first of 4 lines (0, 4, 8, etc.).

That doesn't work for all cases, however (some FASTQ wraps the seq and qual, like FASTA). Peter and I have discussed this elsewhere; a possible solution is to add an optimized parser that takes this assumption into account.

One problem the various Bio* indexers have currently is the lack of standardization on a specific schema for indexing.  There are in-roads towards this (OBDA) that haven't been adequately traveled IMHO, which need to be taken up again.

A second, and maybe this is more specific to BioPerl, is that the parsers and indexers essentially reimplement the format parsing in each module, so if there are bugs they have to be independently fixed (hence why SeqIO works and the indexer doesn't; I wrote the first but not the second).  The best place for any optimizations would be in a unified parser that both the SeqIO and indexer modules could use.

> But if there are multiple lines of seq and qual, I think that /^\+$/ or /^\+$id/ can be used to identify the end of the sequence, and the number of lines of quality should be equal to the number of lines of sequence.
> 
> 
> ## only for single line seq and qual
> my $line_count = 0;
>   while (<$FASTQ>) {
>       if (/^@/ and  $line_count % 4 == 0) {
>           # $begin is the position of the first character after the '@'
>           my $begin = tell($FASTQ) - length( $_ ) + 1;
>           foreach my $id (&$id_parser($_)) {
>               $self->add_record($id, $i, $begin);
>               $c++;
>           }
>       }
>       $line_count++;
>   }
> 
> 
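
To make the line-count idea concrete, here is a standalone sketch that records the byte offset of each header in a strictly four-line-per-record FASTQ file and then fetches a record by seeking to it. It is not the Bio::Index::Fastq code: index_fastq_4line and the sample reads are made up for illustration, and, as noted above, the approach breaks on files that wrap sequence or quality lines.

```perl
use strict;
use warnings;

# Map each read ID to the byte offset of its '@' header line.
# Assumes exactly four lines per record (no wrapped seq/qual lines).
sub index_fastq_4line {
    my ($fh) = @_;
    my %offset_of;
    my $line_count = 0;
    my $pos        = tell($fh);    # offset of the line about to be read
    while (my $line = <$fh>) {
        if ($line_count % 4 == 0) {
            $line =~ /^\@(\S+)/
                or die "line " . ($line_count + 1) . " is not a FASTQ header\n";
            $offset_of{$1} = $pos;
        }
        $line_count++;
        $pos = tell($fh);
    }
    return \%offset_of;
}

# Two made-up records; the first quality line starts with '@'.
my $fastq = join("\n",
    '@read1', 'ACGT', '+', '@9;A',
    '@read2', 'TTGA', '+', 'IIII') . "\n";

open(my $fh, '<', \$fastq) or die "cannot open in-memory FASTQ: $!";
my $index = index_fastq_4line($fh);

# Fetch a record by seeking to its stored offset.
seek($fh, $index->{read2}, 0) or die "seek failed: $!";
my @record;
push @record, scalar <$fh> for 1 .. 4;
print @record;
```

The quality line beginning with '@' is skipped because only every fourth line is tested for a header.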
> --
> BioPerl fastq parsing issues aside, is there another tool which allows you to retrieve arbitrary sequences from a fastq file by sequence ID?
> 
> There's one called cdbfasta which looks like it might work; does anyone have experience with it?

I haven't, but it appears FASTA-specific.  Does it parse FASTQ as well?  

I recall Sanger has a C-based FASTA/FASTQ hybrid one as well.  May have to look that one up.

> Thanks,
> sofia
> 
> P.S. I am CCing Peter Cock in case BioPython has solved this issue already; if so, perhaps their solution could be applied here.


chris


From p.j.a.cock at googlemail.com  Tue Nov  1 10:38:43 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 1 Nov 2011 14:38:43 +0000
Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual
In-Reply-To: <26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu>
References: <E824C753-0688-4350-96B8-638BA1B37E73@neuro.utah.edu>
	<26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu>
Message-ID: <CAKVJ-_5_NEXALy1abueFCQqKRnteJpu=__VCqSDGyemorxs53A@mail.gmail.com>

On Tue, Nov 1, 2011 at 1:40 PM, Fields, Christopher J
<cjfields at illinois.edu> wrote:
>
> One problem the various Bio* indexers have currently is the lack of
> standardization on a specific schema for indexing.  There are in-roads
> towards this (OBDA) that haven't been adequately traveled IMHO,
> which need to be taken up again.
>

Something to switch to open-bio-l at lists.open-bio.org for,
http://lists.open-bio.org/mailman/listinfo/open-bio-l

We can continue this thread from last summer,
http://lists.open-bio.org/pipermail/open-bio-l/2010-April/000662.html
http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000676.html
...
http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000680.html

And CC Peter Rice from EMBOSS too - we chatted about this
at ISMB/BOSC 2011 in July - and whoever looks after the
OBDA/indexing code in BioRuby and BioJava too.

> A second, and maybe this is more specific to BioPerl, is that the
> parsers and indexers essentially reimplement the format parsing
> in each module, so if there are bugs they have to be independently
> fixed (hence why SeqIO works and the indexer doesn't; I wrote the
> first but not the second).  The best place for any optimizations
> would be in a unified parser that both the SeqIO and indexer
> modules could use.

We have that problem to an extent in Biopython's Bio.SeqIO code.
The indexing code duplicates some logic of the parsing code
(how much depends on the format), sufficient to extract the read
ID and the bounds on disk. The two could be more unified but
the parsers came first and I didn't want to change them at the time.
Instead I tried to be rigorous in consistency testing for the index
code's unit tests.

Regards,

Peter


From carandraug+dev at gmail.com  Tue Nov  1 11:13:06 2011
From: carandraug+dev at gmail.com (Carnë Draug)
Date: Tue, 1 Nov 2011 15:13:06 +0000
Subject: [Bioperl-l] best way to edit sequence features
In-Reply-To: <CAD5861E.14042%hrh@fmi.ch>
References: <CAPOrs_2uZ6TPghqAXwVeXwSeHBm+iomTXMGqswR38V_L=SQWyw@mail.gmail.com>
	<CAD5861E.14042%hrh@fmi.ch>
Message-ID: <CAPOrs_0rZcokpSvAhMM3gtKWgeH3knDuTfnyybPJUU5D-WEgpA@mail.gmail.com>

On 1 November 2011 10:18, Hotz, Hans-Rudolf <hrh at fmi.ch> wrote:
> I am aware of the following ones (as far as I know, they are all GUI based
> and don't have a command line API):

They are not all free. Just for future reference, here are their licenses:

> Serial Cloner

Couldn't find a license, and the download for Linux has no source, so
I'm guessing proprietary.

> GENtle            http://gentle.magnusmanske.de/

Free under GPL

> GeneCoder

Proprietary

> pDRAW32

Proprietary

> Genome Workbench  http://www.ncbi.nlm.nih.gov/projects/gbench/

Seems public domain. The license is not defined anywhere, but the files I
checked had the public domain notice in the header.

> Ape

Proprietary ("license" is at the top of AppMain.tcl)

> UGene             http://ugene.unipro.ru/

Free under GPL

> Also, have you looked at the emboss tool "cirdna" ?

Free under GPL

> WRT file formats: I strongly suggest to stick to embl and genbank format as
> input and (text) output format. The features are not indexed, but you can
> create your own when you store the sequences in your system. Internally, you
> probably wanna keep the data in a 'simpler' format than embl or genbank,
> anyway.
>
> Alternatively, have you looked at gff/gtf as a way of getting features?
> see:
>
> http://www.sequenceontology.org/gff3.shtml
> http://mblab.wustl.edu/GTF22.html

Considering the already existing alternatives, I'm more likely to
collaborate with one of them to do what I want. I'll just have to
check them all and decide. I was planning on writing a new tool and
contributing it to the scripts section of bioperl because, when I
googled before getting these links, only the proprietary tools showed
up. Thank you very much for the links.

Carnë


From roy.chaudhuri at gmail.com  Tue Nov  1 11:44:19 2011
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Tue, 01 Nov 2011 15:44:19 +0000
Subject: [Bioperl-l] best way to edit sequence features
In-Reply-To: <CAD5861E.14042%hrh@fmi.ch>
References: <CAD5861E.14042%hrh@fmi.ch>
Message-ID: <4EB013D3.30801@gmail.com>

The Sanger Institute's Artemis is good for editing sequence features, 
and DNAPlotter can be used to produce circular diagrams:

http://www.sanger.ac.uk/resources/software/artemis
http://www.sanger.ac.uk/resources/software/dnaplotter

Roy.

On 01/11/2011 10:18, Hotz, Hans-Rudolf wrote:
> Hi Carnë
>
> Please allow me to make a few comments:
>
> I very much like your idea of writing a free tool to edit and draw
> sequences. We (ie people working in core Bioinformatics facilities) all
> suffer from having to deal with files originally created with commercial
> packages. And on top of all the pain, those commercial packages are very
> expensive and they don't deliver what they promise to do.
>
>
> Just double checking: Have you looked at the free tools which are available?
>
> I am aware of the following ones (as far as I know, they are all GUI based
> and don't have a command line API):
>
> Serial Cloner     http://serialbasics.free.fr/Serial_Cloner.html
> GENtle            http://gentle.magnusmanske.de/
> GeneCoder         http://www.algosome.com/gene-coder/gene-coder.html
> pDRAW32           http://www.acaclone.com/
> Genome Workbench  http://www.ncbi.nlm.nih.gov/projects/gbench/
> Ape               http://www.biology.utah.edu/jorgensen/wayned/ape/
> UGene             http://ugene.unipro.ru/
>
> maybe others on the list know of even better free tools?
>
> Also, have you looked at the emboss tool "cirdna" ?
>
>
> WRT file formats: I strongly suggest to stick to embl and genbank format as
> input and (text) output format. The features are not indexed, but you can
> create your own when you store the sequences in your system. Internally, you
> probably wanna keep the data in a 'simpler' format than embl or genbank,
> anyway.
>
> Alternatively, have you looked at gff/gtf as a way of getting features?
> see:
>
> http://www.sequenceontology.org/gff3.shtml
> http://mblab.wustl.edu/GTF22.html
>
>
>
> I am looking forward to any progress you make
>
> Regards, Hans
>
>
>
> Hans-Rudolf Hotz, PhD
> Bioinformatics Support
>
> Friedrich Miescher Institute for Biomedical Research
> Maulbeerstrasse 66
> 4058 Basel/Switzerland
>
>
>
> On 10/31/11 7:05 PM, "Carnë Draug"<carandraug+dev at gmail.com>  wrote:
>
>> Hi
>>
>> I've been planning on writing a free (as in freedom) tool to edit
>> sequences and make plasmid maps. The idea is to build the command line
>> tool first and maybe later work on a GUI for it.
>>
>> The problem I foresee at the moment while designing it, is how to
>> change a feature of the sequence. I'm not familiar with all sequence
>> formats (only fasta, ensembl and genbank) but I can't see how to
>> specify from the command line what feature to edit since I can't see
>> any unique identifiers for them. Is there a file format that makes
>> this easier? Any tips would be most appreciated.
>>
>> Thanks in advance,
>> Carnë Draug
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


From jason.stajich at gmail.com  Tue Nov  1 12:02:24 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Tue, 1 Nov 2011 09:02:24 -0700
Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual
In-Reply-To: <CAKVJ-_5_NEXALy1abueFCQqKRnteJpu=__VCqSDGyemorxs53A@mail.gmail.com>
References: <E824C753-0688-4350-96B8-638BA1B37E73@neuro.utah.edu>
	<26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu>
	<CAKVJ-_5_NEXALy1abueFCQqKRnteJpu=__VCqSDGyemorxs53A@mail.gmail.com>
Message-ID: <47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com>


I think a different indexer is needed for the scale of key/value pairs we see in fastq files if we want to make a fast lookup by ID. I think speed is of the essence for this type of solution, so requiring that all records be exactly 4 lines long is okay for this type of implementation. 
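
As a rough illustration of why a strict four-line layout makes indexing simple (and sidesteps the '@' in the quality string that started this thread), here is a minimal Python sketch. It is not the Bio::Index::Fastq code; the function names index_fastq_four_line and fetch are made up for this example, and it assumes every record is exactly four lines and that the file is opened in binary mode so offsets are byte offsets.

```python
import io


def index_fastq_four_line(handle):
    """Map read ID -> (byte offset, byte length), assuming 4 lines per record.

    `handle` must be a binary file-like object (hypothetical helper, not a
    BioPerl or Biopython API).
    """
    index = {}
    while True:
        offset = handle.tell()
        header = handle.readline()
        if not header:
            break  # end of file
        handle.readline()          # sequence line (not inspected)
        plus = handle.readline()   # separator line, must start with '+'
        handle.readline()          # quality line, may legally start with '@'
        if not header.startswith(b"@") or not plus.startswith(b"+"):
            raise ValueError("not a 4-line FASTQ record at byte %d" % offset)
        read_id = header[1:].split()[0].decode("ascii")
        index[read_id] = (offset, handle.tell() - offset)
    return index


def fetch(handle, index, read_id):
    """Return the raw text of one record using the stored offset and length."""
    offset, length = index[read_id]
    handle.seek(offset)
    return handle.read(length).decode("ascii")


# Tiny in-memory example; the first record's quality line starts with '@'.
demo = io.BytesIO(b"@r1 first\nACGT\n+\n@II@\n@r2\nGGCC\n+\nIIII\n")
demo_index = index_fastq_four_line(demo)
# demo_index == {"r1": (0, 22), "r2": (22, 16)}
```

Because records are located by counting lines rather than by searching for '@', a quality line that begins with '@' cannot be mistaken for a header. The price is that multi-line seq/qual FASTQ is rejected or mis-read.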

I found NOSQL implementations to perform much better than any of the BDB-type solutions, which end up being really slow above 1-5M keys.  I used TokyoCabinet and KyotoCabinet to do indexing of accession -> taxonomy ID and found them quite fast for my needs. I haven't tried storing 100bp reads + qual string as the value yet, but I think it could be done; it is certainly worth a prototype.

Jason
On Nov 1, 2011, at 7:38 AM, Peter Cock wrote:

> On Tue, Nov 1, 2011 at 1:40 PM, Fields, Christopher J
> <cjfields at illinois.edu> wrote:
>> 
>> One problem the various Bio* indexers have currently is the lack of
>> standardization on a specific schema for indexing.  There are in-roads
>> towards this (OBDA) that haven't been adequately traveled IMHO,
>> which need to be taken up again.
>> 
> 
> Something to switch to open-bio-l at lists.open-bio.org for,
> http://lists.open-bio.org/mailman/listinfo/open-bio-l
> 
> We can continue this thread from last summer,
> http://lists.open-bio.org/pipermail/open-bio-l/2010-April/000662.html
> http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000676.html
> ...
> http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000680.html
> 
> And CC Peter Rice from EMBOSS too - we chatted about this
> at ISMB/BOSC 2011 in July - and whoever looks after the
> OBDA/indexing code in BioRuby and BioJava too.
> 
>> A second, and maybe this is more specific to BioPerl, is that the
>> parsers and indexers essentially reimplement the format parsing
>> in each module, so if there are bugs they have to be independently
>> fixed (hence why SeqIO works and the indexer doesn't; I wrote the
>> first but not the second).  The best place for any optimizations
>> would be in a unified parser that both the SeqIO and indexer
>> modules could use.
> 
> We have that problem to an extent in Biopython's Bio.SeqIO code.
> The indexing code duplicates some logic of the parsing code
> (how much depends on the format), sufficient to extract the read
> ID and the bounds on disk. The two could be more unified but
> the parsers came first and I didn't want to change them at the time.
> Instead I tried to be rigorous in consistency testing for the index
> code's unit tests.
> 
> Regards,
> 
> Peter
> 


From cjfields at illinois.edu  Tue Nov  1 13:44:25 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 1 Nov 2011 17:44:25 +0000
Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual
In-Reply-To: <47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com>
References: <E824C753-0688-4350-96B8-638BA1B37E73@neuro.utah.edu>
	<26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu>
	<CAKVJ-_5_NEXALy1abueFCQqKRnteJpu=__VCqSDGyemorxs53A@mail.gmail.com>
	<47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com>
Message-ID: <6EFDCA41-D385-4CCB-A17B-0736F93DACDF@illinois.edu>

On Nov 1, 2011, at 11:02 AM, Jason Stajich wrote:

> I think a different indexer is needed for the scale of key/value pairs we see in fastq files if we want to make a fast lookup by ID. I think speed is of the essence for this type of solution, so requiring that all records be exactly 4 lines long is okay for this type of implementation. 

This can always be an early optimization, that's easy enough. But I'm sure we will have to deal with multi-line seq/qual FASTQ at some point.  

> I found NOSQL implementations to perform much better than any of the BDB-type solutions, which end up being really slow above 1-5M keys.  I used TokyoCabinet and KyotoCabinet to do indexing of accession -> taxonomy ID and found them quite fast for my needs. I haven't tried storing 100bp reads + qual string as the value yet, but I think it could be done; it is certainly worth a prototype.

Adding a middle layer where the backend storage is abstracted is probably the (best|most flexible) option, converging on a good default that will work for this data.  The actual interface is in place, though would it be more feasible to go the OBDA route (converge on a cross-Bio* compatible schema)?  Or are there problems afoot there we're unaware of?
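
To make the idea of an abstracted backend concrete, here is a small Python sketch, assuming the index only needs to map a read ID to a byte offset. The class and method names (DictBackend, SqliteBackend, Indexer, put, get) are hypothetical and do not correspond to the existing BioPerl interface or to OBDA; the point is only that the indexer talks to a two-method interface and the storage can be swapped.

```python
import sqlite3


class DictBackend:
    """In-memory backend; fine for small files, lost when the process exits."""

    def __init__(self):
        self._data = {}

    def put(self, key, offset):
        self._data[key] = offset

    def get(self, key):
        return self._data.get(key)


class SqliteBackend:
    """SQLite backend; pass a file path instead of ':memory:' to persist."""

    def __init__(self, path=":memory:"):
        self._con = sqlite3.connect(path)
        self._con.execute(
            "CREATE TABLE IF NOT EXISTS idx (key TEXT PRIMARY KEY, offset INTEGER)"
        )

    def put(self, key, offset):
        self._con.execute("INSERT OR REPLACE INTO idx VALUES (?, ?)", (key, offset))

    def get(self, key):
        row = self._con.execute(
            "SELECT offset FROM idx WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else None


class Indexer:
    """Middle layer: knows about read IDs and offsets, not about storage."""

    def __init__(self, backend):
        self.backend = backend

    def add(self, read_id, offset):
        self.backend.put(read_id, offset)

    def offset_of(self, read_id):
        return self.backend.get(read_id)
```

Either backend can be handed to Indexer unchanged, which is the property that would let a project converge on a good default later without touching the parsing code.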

Re: specifics, I think Biopython uses SQLite, is that correct Peter?  

chris

> Jason
> On Nov 1, 2011, at 7:38 AM, Peter Cock wrote:
> 
>> On Tue, Nov 1, 2011 at 1:40 PM, Fields, Christopher J
>> <cjfields at illinois.edu> wrote:
>>> 
>>> One problem the various Bio* indexers have currently is the lack of
>>> standardization on a specific schema for indexing.  There are in-roads
>>> towards this (OBDA) that haven't been adequately traveled IMHO,
>>> which need to be taken up again.
>>> 
>> 
>> Something to switch to open-bio-l at lists.open-bio.org for,
>> http://lists.open-bio.org/mailman/listinfo/open-bio-l
>> 
>> We can continue this thread from last summer,
>> http://lists.open-bio.org/pipermail/open-bio-l/2010-April/000662.html
>> http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000676.html
>> ...
>> http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000680.html
>> 
>> And CC Peter Rice from EMBOSS too - we chatted about this
>> at ISMB/BOSC 2011 in July - and whoever looks after the
>> OBDA/indexing code in BioRuby and BioJava too.
>> 
>>> A second, and maybe this is more specific to BioPerl, is that the
>>> parsers and indexers essentially reimplement the format parsing
>>> in each module, so if there are bugs they have to be independently
>>> fixed (hence why SeqIO works and the indexer doesn't; I wrote the
>>> first but not the second).  The best place for any optimizations
>>> would be in a unified parser that both the SeqIO and indexer
>>> modules could use.
>> 
>> We have that problem to an extent in Biopython's Bio.SeqIO code.
>> The indexing code duplicates some logic of the parsing code
>> (how much depends on the format), sufficient to extract the read
>> ID and the bounds on disk. The two could be more unified but
>> the parsers came first and I didn't want to change them at the time.
>> Instead I tried to be rigorous in consistency testing for the index
>> code's unit tests.
>> 
>> Regards,
>> 
>> Peter
>> 
> 


From p.j.a.cock at googlemail.com  Tue Nov  1 14:06:50 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 1 Nov 2011 18:06:50 +0000
Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual
In-Reply-To: <6EFDCA41-D385-4CCB-A17B-0736F93DACDF@illinois.edu>
References: <E824C753-0688-4350-96B8-638BA1B37E73@neuro.utah.edu>
	<26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu>
	<CAKVJ-_5_NEXALy1abueFCQqKRnteJpu=__VCqSDGyemorxs53A@mail.gmail.com>
	<47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com>
	<6EFDCA41-D385-4CCB-A17B-0736F93DACDF@illinois.edu>
Message-ID: <CAKVJ-_5iiR1-70V17KBVm-vz5hkwRf6NVLWtAWe+HOUHO+1M2w@mail.gmail.com>

On Tue, Nov 1, 2011 at 5:44 PM, Fields, Christopher J
<cjfields at illinois.edu> wrote:
> On Nov 1, 2011, at 11:02 AM, Jason Stajich wrote:
>
>> I think a different indexer is needed for the scale of key/value
>> pairs we see in fastq files if we want to make a fast lookup by
>> ID. I think speed is of the essence for this type of solution, so
>> requiring that all records be exactly 4 lines long is okay for this
>> type of implementation.
>
> This can always be an early optimization, that's easy enough.
> But I'm sure we will have to deal with multi-line seq/qual
> FASTQ at some point.
>
>> I found NOSQL implementations to perform much better than
>> any of the BDB-type solutions, which end up being really
>> slow above 1-5M keys.  I used TokyoCabinet and
>> KyotoCabinet to do indexing of accession -> taxonomy ID
>> and found them quite fast for my needs. I haven't tried
>> storing 100bp reads + qual string as the value yet, but I
>> think it could be done; it is certainly worth a prototype.
>
> Adding a middle layer where the backend storage is abstracted
> is probably the (best|most flexible) option, converging on a
> good default that will work for this data.  The actual interface is
> in place, though would it be more feasible to go the OBDA route
> (converge on a cross-Bio* compatible schema)?  Or are there
> problems afoot there we're unaware of?
>
> Re: specifics, I think Biopython uses SQLite, is that correct Peter?
>
> chris

Yes, we're using SQLite3 to store essentially a list of filenames
and their format as one table, and then in the main table an
entry for each sequence recording the ID (only one accession,
unlike OBDA which had infrastructure for a secondary accession),
file number, offset of the start of the record, and optionally the
length of the record on disk.

i.e. Basically what OBDA does, but using SQLite rather
than BDB (not included in Python 3) or a flat file index
(poor performance with large datasets).

I find this design attractive on several levels:
* File format neutral, covers FASTA, FASTQ, GenBank, etc
* Preserves the original file untouched
* Index is a small single file (thanks to SQLite)
* Back end could be switched out
* Could be applied to compressed file formats
* Reuses existing parsing code to access entries

This could easily form the basis of OBDA v2. The main points
of difference I anticipate between the Bio* projects would
be naming conventions for the different file formats, and
what we consider to be the default record ID of each read
(e.g. which field in a GenBank file - although agreement
here is not essential). Some of that was already settled in
principle with OBDA v1.
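
As a rough illustration of the two-table layout described above, here
is a minimal Python sketch using the standard sqlite3 module. The table
and column names (files, records, file_number, and so on) and the
helper names are assumptions made up for this example; they are not
Biopython's actual schema or an OBDA specification.

```python
import sqlite3


def create_index(con):
    """Create a hypothetical two-table offset index."""
    # One row per indexed file, with its declared format.
    con.execute(
        "CREATE TABLE files ("
        "file_number INTEGER PRIMARY KEY, name TEXT, format TEXT)"
    )
    # One row per record: single ID, which file, where it starts,
    # and (optionally, may be NULL) how long it is on disk.
    con.execute(
        "CREATE TABLE records ("
        "id TEXT PRIMARY KEY, file_number INTEGER, "
        "offset INTEGER, length INTEGER)"
    )


def add_file(con, name, fmt, entries):
    """Register a file and its (id, offset, length) entries."""
    cur = con.execute(
        "INSERT INTO files (name, format) VALUES (?, ?)", (name, fmt)
    )
    file_number = cur.lastrowid
    con.executemany(
        "INSERT INTO records VALUES (?, ?, ?, ?)",
        [(rid, file_number, off, length) for rid, off, length in entries],
    )
    return file_number


def locate(con, record_id):
    """Return (filename, format, offset, length), or None if unknown."""
    return con.execute(
        "SELECT f.name, f.format, r.offset, r.length "
        "FROM records r JOIN files f ON f.file_number = r.file_number "
        "WHERE r.id = ?",
        (record_id,),
    ).fetchone()
```

A lookup returns only where the record lives; the caller then seeks to
that offset in the untouched original file and hands the bytes to the
ordinary parser for that format, which is what keeps the index small
and format neutral.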

On the other hand, you could try and store the parsed data
itself, which is where NOSQL looks more interesting. That
essentially requires the ability to serialise your annotated
sequence object model to disk - which would be tricky to do
cross project (much more ambitious than BioSQL is). It also
means the "index" becomes very large because it now holds
all the original data.

Peter


From wenbinmei at gmail.com  Wed Nov  2 00:25:32 2011
From: wenbinmei at gmail.com (wenbin mei)
Date: Wed, 2 Nov 2011 00:25:32 -0400
Subject: [Bioperl-l] how to not count gaps in the multiple sequence alignment
Message-ID: <CAHdrE2Q6weQ+7t_4X3_AZmu4JLQ3uGf3=s14UuOpDVa368V9aA@mail.gmail.com>

Hi,

I need some help with coding. I have a multiple sequence alignment which
has gaps. The alignment includes a reference genome sequence for which I
know the coordinates of all the protein coding genes. I want to extract
the alignments of all these protein coding genes from the big alignment.
I am using Bio::SimpleAlign, but because of the gaps in the alignment the
coordinates have shifted. I wonder whether there is a way to not count
the gaps and still be able to extract the protein alignment. One thing I
could do is remove the gaps in the reference first and then extract the
sequence, but I don't like this approach ... Thank you for your help.

-best,
wenbin


From dejian.zhao at gmail.com  Wed Nov  2 09:33:18 2011
From: dejian.zhao at gmail.com (Dejian Zhao)
Date: Wed, 02 Nov 2011 21:33:18 +0800
Subject: [Bioperl-l] Modules to read MEGA output and reproduce the
	phylogenetic tree
Message-ID: <4EB1469E.4050108@gmail.com>

There are various packages on CPAN to cope with phylogenetic analysis. I 
wonder which module can read the output from other phylogenetic 
software, e.g. MEGA, and reproduce the phylogenetic tree. I want to 
produce a picture which combines the phylogenetic tree and the structure 
of each gene.


From roy.chaudhuri at gmail.com  Wed Nov  2 09:49:46 2011
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Wed, 02 Nov 2011 13:49:46 +0000
Subject: [Bioperl-l] Modules to read MEGA output and reproduce the
 phylogenetic tree
In-Reply-To: <4EB1469E.4050108@gmail.com>
References: <4EB1469E.4050108@gmail.com>
Message-ID: <4EB14A7A.30307@gmail.com>

MEGA can export trees in Newick format, which can be read by 
Bio::TreeIO. The tree can be drawn in EPS format using 
Bio::Tree::Draw::Cladogram. See:
http://www.bioperl.org/wiki/HOWTO:Trees

Roy.

On 02/11/2011 13:33, Dejian Zhao wrote:
> There are various packages on CPAN to cope with phylogenetic analysis. I
> wonder which module can read the output from other phylogenetic
> software, e.g. MEGA, and reproduce the phylogenetic tree. I want to
> produce a picture which combines the phylogenetic tree and the structure
> of each gene.


From jun.yin at ucd.ie  Wed Nov  2 12:29:45 2011
From: jun.yin at ucd.ie (Jun Yin)
Date: Wed, 02 Nov 2011 16:29:45 +0000 (GMT)
Subject: [Bioperl-l] how to not count gaps in the multiple sequence
 alignment
In-Reply-To: <CAHdrE2Q6weQ+7t_4X3_AZmu4JLQ3uGf3=s14UuOpDVa368V9aA@mail.gmail.com>
References: <CAHdrE2Q6weQ+7t_4X3_AZmu4JLQ3uGf3=s14UuOpDVa368V9aA@mail.gmail.com>
Message-ID: <7300ecdd1dd56.4eb16ff9@ucd.ie>

Hi,
 
You need to calculate the coordinates of the protein coding gene in the alignment yourself. After that, you can use the slice method to get the alignment block for the selected gene, e.g.
 
$aln2 = $aln->slice(20, 30);
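
The coordinate calculation mentioned above can be sketched as follows. This is plain Python over strings rather than BioPerl, the function names residue_to_column and slice_by_reference are hypothetical, and '-' and '.' are assumed to be the gap characters. It walks along the gapped reference, counts only non-gap characters, and reports the alignment column at which a given ungapped (1-based) coordinate is reached.

```python
def residue_to_column(aligned_ref, residue, gap_chars="-."):
    """Convert a 1-based ungapped coordinate to a 1-based alignment column."""
    seen = 0
    for column, char in enumerate(aligned_ref, start=1):
        if char not in gap_chars:
            seen += 1
            if seen == residue:
                return column
    raise ValueError("residue %d is beyond the end of the reference" % residue)


def slice_by_reference(alignment, ref_id, start, end):
    """Cut every row of the alignment using coordinates on the reference.

    `alignment` is a dict of name -> gapped string, all the same length.
    `start` and `end` are 1-based, inclusive, ungapped reference positions.
    """
    ref = alignment[ref_id]
    first = residue_to_column(ref, start)
    last = residue_to_column(ref, end)
    return {name: seq[first - 1:last] for name, seq in alignment.items()}


alignment = {"ref": "AT--GCC-A", "s1": "ATTTGC-TA"}
block = slice_by_reference(alignment, "ref", 3, 5)
# block == {"ref": "GCC", "s1": "GC-"}
```

The two column numbers computed this way are the values one would pass as the start and end of a slice call like the one above.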
 
Cheers,
Jun


----- Original Message -----
From: wenbin mei <wenbinmei at gmail.com>
Date: Wednesday, November 2, 2011 5:51 am
Subject: [Bioperl-l] how to not count gaps in the multiple sequence alignment
To: bioperl-l at lists.open-bio.org

> Hi,
> 
> I need some help with coding. I have a multiple sequence alignment which
> has gaps. The alignment includes a reference genome sequence for which I
> know the coordinates of all the protein coding genes. I want to extract
> the alignments of all these protein coding genes from the big alignment.
> I am using Bio::SimpleAlign, but because of the gaps in the alignment the
> coordinates have shifted. I wonder whether there is a way to not count
> the gaps and still be able to extract the protein alignment. One thing I
> could do is remove the gaps in the reference first and then extract the
> sequence, but I don't like this approach ... Thank you for your help.
> 
> -best,
> wenbin


From dejian.zhao at gmail.com  Wed Nov  2 21:39:22 2011
From: dejian.zhao at gmail.com (Dejian Zhao)
Date: Thu, 03 Nov 2011 09:39:22 +0800
Subject: [Bioperl-l] Modules to read MEGA output and reproduce the
 phylogenetic tree
In-Reply-To: <4EB14A7A.30307@gmail.com>
References: <4EB1469E.4050108@gmail.com> <4EB14A7A.30307@gmail.com>
Message-ID: <4EB1F0CA.80309@gmail.com>

That's great!
Many thanks, Roy.

On 2011-11-2 21:49, Roy Chaudhuri wrote:
> MEGA can export trees in Newick format, which can be read by 
> Bio::TreeIO. The tree can be drawn in EPS format using 
> Bio::Tree::Draw::Cladogram. See:
> http://www.bioperl.org/wiki/HOWTO:Trees
>
> Roy.
>
> On 02/11/2011 13:33, Dejian Zhao wrote:
>> There are various packages on CPAN to cope with phylogenetic analysis. I
>> wonder which module can read the output from other phylogenetic
>> software, e.g. MEGA, and reproduce the phylogenetic tree. I want to
>> produce a picture which combines the phylogenetic tree and the structure
>> of each gene.
>
>


From noncoding at gmail.com  Thu Nov  3 05:59:26 2011
From: noncoding at gmail.com (Remo Sanges)
Date: Thu, 03 Nov 2011 10:59:26 +0100
Subject: [Bioperl-l] how to not count gaps in the multiple sequence
	alignment
In-Reply-To: <7300ecdd1dd56.4eb16ff9@ucd.ie>
References: <CAHdrE2Q6weQ+7t_4X3_AZmu4JLQ3uGf3=s14UuOpDVa368V9aA@mail.gmail.com>
	<7300ecdd1dd56.4eb16ff9@ucd.ie>
Message-ID: <4EB265FE.30909@gmail.com>

To get the location in the initial sequence starting from a column in a 
multiple alignment you can:

1) create a Bio::LocatableSeq compliant object by using the method 
each_seq_with_id on the SimpleAlign object

2) then use the method location_from_column on the created 
LocatableSeq object

HTH

Remo


-- 
Remo Sanges
Bioinformatics - Animal Physiology and Evolution
Stazione Zoologica Anton Dohrn
Villa Comunale, 80121 Napoli - Italy
+39 081 5833428


On 11/2/11 5:29 PM, Jun Yin wrote:
> Hi,
>
> You need to calculate the coordinates of the protein coding gene in the alignment by yourself. After that, you can use the slice function to get the alignment block for the selected gene, e.g.
>
> $aln2 = $aln->slice(20, 30);
>
> Cheers,
> Jun
>
>
> ----- Original Message -----
> From: wenbin mei<wenbinmei at gmail.com>
> Date: Wednesday, November 2, 2011 5:51 am
> Subject: [Bioperl-l] how to not count gaps in the multiple sequence alignment
> To: bioperl-l at lists.open-bio.org
>
>> Hi,
>>
>> I need some help with coding. I have a multiple sequence alignment which
>> has gaps. The alignment includes a reference genome sequence for which I
>> know the coordinates of all the protein coding genes. I want to extract
>> the alignments of all these protein coding genes from the big alignment.
>> I am using Bio::SimpleAlign, but because of the gaps in the alignment the
>> coordinates have shifted. I wonder whether there is a way to not count
>> the gaps and still be able to extract the protein alignment. One thing I
>> could do is remove the gaps in the reference first and then extract the
>> sequence, but I don't like this approach ... Thank you for your help.
>>
>> -best,
>> wenbin


From G.Gallone at sms.ed.ac.uk  Thu Nov  3 07:50:11 2011
From: G.Gallone at sms.ed.ac.uk (Giuseppe G.)
Date: Thu, 03 Nov 2011 11:50:11 +0000
Subject: [Bioperl-l] Bio::SimpleAlign - Meaning of
	overall_percentage_identity?
Message-ID: <4EB27FF3.9050203@sms.ed.ac.uk>

Hi,

I would be grateful if you could shed some light on the exact meaning of 
the method overall_percentage_identity in Bio::SimpleAlign.

If I understand correctly, the method works by considering only 
amino acids that are identical over all the members of the alignment, and 
then averaging over the total number of amino acids in the sequence. Is 
this correct?

Thank you
Giuseppe
-- 

The University of Edinburgh is a charitable body, registered in 
Scotland, with registration number SC005336.


From David.Messina at sbc.su.se  Thu Nov  3 09:22:21 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 3 Nov 2011 14:22:21 +0100
Subject: [Bioperl-l] Bio::SimpleAlign - Meaning of
	overall_percentage_identity?
In-Reply-To: <4EB27FF3.9050203@sms.ed.ac.uk>
References: <4EB27FF3.9050203@sms.ed.ac.uk>
Message-ID: <CAM3TQQWm46SWfu-6DANDaoppi8oLKGuzwGm8uxkVkf_JAog3xg@mail.gmail.com>

Hi Giuseppe,

> If I understand correctly, the method works by considering only amino acids
> that are identical over all the members of the alignment


Yes.


> , and then averaging over the total number of amino acids in the sequence.
> Is this correct?
>

Almost.

By default, the denominator is the alignment length, namely the length of
the MSA including gaps. By means of the 'short' and 'long' options, it's
also possible to use the shortest or longest sequence's ungapped lengths as
the denominator.
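
To make the role of the denominator concrete, here is a small Python
sketch of the calculation as described in this thread. It is not the
BioPerl source; the function signature, the 'align' label for the
default, and the choice of '-' and '.' as gap characters are
assumptions for illustration.

```python
def overall_percentage_identity(seqs, denominator="align", gap_chars="-."):
    """Percentage of columns where every sequence has the same residue.

    denominator: 'align' (alignment length including gaps),
                 'short' (ungapped length of the shortest sequence) or
                 'long'  (ungapped length of the longest sequence).
    """
    if not seqs:
        raise ValueError("no sequences given")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")

    identical = 0
    for column in zip(*seqs):
        residues = {char.upper() for char in column}
        # Count the column only if all rows agree and it is not a gap.
        if len(residues) == 1 and column[0] not in gap_chars:
            identical += 1

    ungapped = [sum(char not in gap_chars for char in s) for s in seqs]
    if denominator == "align":
        total = length
    elif denominator == "short":
        total = min(ungapped)
    elif denominator == "long":
        total = max(ungapped)
    else:
        raise ValueError("denominator must be 'align', 'short' or 'long'")
    return 100.0 * identical / total


example = ["ACD-EF", "ACDQEF", "AC--EY"]
# Three columns (A, C, E) are identical in all rows.
# 'align' -> 3/6, 'short' -> 3/4, 'long' -> 3/6
```

In the example the same three identical columns give 50% with the
default denominator but 75% with 'short', so it matters which option
is quoted alongside the number.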


Dave


From cjfields at illinois.edu  Thu Nov  3 14:28:36 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 3 Nov 2011 18:28:36 +0000
Subject: [Bioperl-l] OBDA redux? was Re:  Bio::Index::Fastq '@' in qual
In-Reply-To: <CAKVJ-_5iiR1-70V17KBVm-vz5hkwRf6NVLWtAWe+HOUHO+1M2w@mail.gmail.com>
References: <E824C753-0688-4350-96B8-638BA1B37E73@neuro.utah.edu>
	<26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu>
	<CAKVJ-_5_NEXALy1abueFCQqKRnteJpu=__VCqSDGyemorxs53A@mail.gmail.com>
	<47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com>
	<6EFDCA41-D385-4CCB-A17B-0736F93DACDF@illinois.edu>
	<CAKVJ-_5iiR1-70V17KBVm-vz5hkwRf6NVLWtAWe+HOUHO+1M2w@mail.gmail.com>
Message-ID: <ED419B5E-9C55-478F-BDD6-C2B663ABE636@illinois.edu>

(side thread, so re-titling...)

On Nov 1, 2011, at 1:06 PM, Peter Cock wrote:

> On Tue, Nov 1, 2011 at 5:44 PM, Fields, Christopher J
> <cjfields at illinois.edu> wrote:
>> On Nov 1, 2011, at 11:02 AM, Jason Stajich wrote:
>> 
>>> I think a different indexer is needed for the scale of key/value
>>> pairs we see in fastq files if we want to make a fast lookup by
>>> ID. I think speed is of the essence for this type of solution, so
>>> requiring that all records be exactly 4 lines long is okay for this
>>> type of implementation.
>> 
>> This can always be an early optimization, that's easy enough.
>> But I'm sure we will have to deal with multi-line seq/qual
>> FASTQ at some point.
>> 
>>> I found NOSQL implementations to perform much better than
>>> any of the BDB-type solutions, which end up being really
>>> slow above 1-5M keys.  I used TokyoCabinet and
>>> KyotoCabinet to do indexing of accession -> taxonomy ID
>>> and found them quite fast for my needs. I haven't tried
>>> storing 100bp reads + qual string as the value yet, but I
>>> think it could be done; it is certainly worth a prototype.
>> 
>> Adding a middle layer where the backend storage is abstracted
>> is probably the (best|most flexible) option, converging on a
>> good default that will work for this data.  The actual interface is
>> in place, though would it be more feasible to go the OBDA route
>> (converge on a cross-Bio* compatible schema)?  Or are there
>> problems afoot there we're unaware of?
>> 
>> Re: specifics, I think Biopython uses SQLite, is that correct Peter?
>> 
>> chris
> 
> Yes, we're using SQLite3 to store essentially a list of filenames
> and their format as one table, and then in the main table an
> entry for each sequence recording the ID (only one accession,
> unlike OBDA which had infrastructure for a secondary accession),
> file number, offset of the start of the record, and optionally the
> length of the record on disk.
> 
> i.e. Basically what OBDA does, but using SQLite rather
> than BDB (not included in Python 3) or a flat file index
> (poor performance with large datasets).
> 
> I find this design attractive on several levels:
> * File format neutral, covers FASTA, FASTQ, GenBank, etc
> * Preserves the original file untouched
> * Index is a small single file (thanks to SQLite)
> * Back end could be switched out
> * Could be applied to compressed file formats
> * Reuses existing parsing code to access entries
> 
> This could easily form the basis of OBDA v2. The main points
> of difference I anticipate between the Bio* projects would
> be naming conventions for the different file formats, and
> what we consider to be the default record ID of each read
> (e.g. which field in a GenBank file - although agreement
> here is not essential). Some of that was already settled in
> principle with OBDA v1.

The primary/secondary IDs could be configurable with a sane default; I think the BioPerl implementations allowed this (and it is certainly something that will be requested).

> On the other hand, you could try and store the parsed data
> itself, which is where NOSQL looks more interesting. That
> essentially requires the ability to serialise your annotated
> sequence object model to disk - which would be tricky to do
> cross project (much more ambitious than BioSQL is). It also
> means the "index" becomes very large because it now holds
> all the original data.
> 
> Peter

For a fully cross-Bio* compliant format, I don't think it's feasible to use serialized data unless it is serialized in something that is easily deserialized across HLLs (JSON, BSON, YAML, XML, etc).  Either that, or such data is stored concurrently with the binary blob, along with metadata that indicates the source of the blob, parser, version, etc. (unless there are tools out there that reliably interconvert serialized complex data structures between HLLs).  Any way you go about it, it seems like it could be a major ball of hurt unless implemented very carefully.
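
As a rough sketch of what a language-neutral stored record with accompanying metadata might look like, here is a small Python example using JSON. The field names (meta, record, source_format, parser, parser_version) and both function names are invented for illustration; nothing here is an agreed OBDA or BioSQL layout.

```python
import json


def serialise_record(record_id, sequence, annotations,
                     source_format, parser, parser_version):
    """Pack a parsed record plus provenance metadata into a JSON string."""
    payload = {
        # Provenance: lets a reader in another language decide whether
        # it trusts or understands how this value was produced.
        "meta": {
            "source_format": source_format,
            "parser": parser,
            "parser_version": parser_version,
        },
        # The parsed data itself, restricted to JSON-friendly types.
        "record": {
            "id": record_id,
            "seq": sequence,
            "annotations": annotations,
        },
    }
    return json.dumps(payload, sort_keys=True)


def deserialise_record(text):
    """Return (meta, record) dictionaries from a stored JSON string."""
    payload = json.loads(text)
    return payload["meta"], payload["record"]


stored = serialise_record(
    "r1", "ACGT", {"qual": "@II@"},
    source_format="fastq", parser="example-parser", parser_version="0.1",
)
meta, record = deserialise_record(stored)
```

Only strings, numbers, lists and dictionaries survive such a round trip unchanged, which is exactly the restriction that makes richer annotated sequence objects hard to share between projects.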

Aside: I think this was one of the problems with Bio::DB::SeqFeature::Store, in that it at one point stored Perl-specific Storable blobs.

chris


From p.j.a.cock at googlemail.com  Thu Nov  3 14:52:50 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 3 Nov 2011 18:52:50 +0000
Subject: [Bioperl-l] OBDA redux?
Message-ID: <CAKVJ-_6bzfZZr82y+J4qmGbn0du4rxaKaGxjmbC7p-pU_nMuoQ@mail.gmail.com>

On Thu, Nov 3, 2011 at 6:28 PM, Fields, Christopher J
<cjfields at illinois.edu> wrote:
> (side thread, so re-titling...)
>

And CC'ing open-bio-l, which is a better home for this than bioperl-l,
where OBDA v2 talk came up again in discussion of a BioPerl indexing
problem. Archive links for thread here:

http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035807.html
http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035808.html
http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035811.html
http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035812.html
http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035813.html
http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035822.html

> On Nov 1, 2011, at 1:06 PM, Peter Cock wrote:
>>
>> Yes, we're using SQLite3 to store essentially a list of filenames
>> and their format as one table, and then in the main table an
>> entry for each sequence recording the ID (only one accession,
>> unlike OBDA which had infrastructure for a secondary accession),
>> file number, offset of the start of the record, and optionally the
>> length of the record on disk.
>>
>> i.e. Basically what OBDA does, but using SQLite rather
>> than BDB (not included in Python 3) or a flat file index
>> (poor performance with large datasets).
>>
>> I find this design attractive on several levels:
>> * File format neutral, covers FASTA, FASTQ, GenBank, etc
>> * Preserves the original file untouched
>> * Index is a small single file (thanks to SQLite)
>> * Back end could be switched out
>> * Could be applied to compressed file formats
>> * Reuses existing parsing code to access entries
>>
>> This could easily form basis of OBDA v2, the main points
>> of difference I anticipate between the Bio* projects would
>> be naming conventions for the different file formats, and
>> what we consider to be the default record ID of each read
>> (e.g. which field in a GenBank file - although agreement
>> here is not essential). Some of that was already settled in
>> principle with OBDA v1.
>
> The primary/secondary IDs could be configurable with a sane
> default, I think the bioperl implementations allowed this (and
> it is certainly something that will be requested).

One reason I went with a single ID only was to keep the
Python dictionary based API simple (think hash in Perl).
You don't get secondary keys in a Python dict or a hash ;)

As a nod to flexibility, in Biopython's Bio.SeqIO indexing you
can provide a callback function to map the suggested ID to
something else. Obviously this doesn't give the full flexibility
of extracting a field from the record's annotation, because we
don't parse the whole record during indexing (it would be too
slow).
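
To make the design concrete, here is a minimal sketch (not Biopython's actual implementation) of an offset index for a single FASTA file using Python's standard `sqlite3` module, with an optional callback that maps the suggested ID to something else. The table layout and the names `build_index`, `fetch_raw` and `key_function` are assumptions for illustration only; a real index would also store the list of files and their formats as described above.

```python
import sqlite3


def build_index(fasta_path, index_path, key_function=None):
    """Record the byte offset of every FASTA record in an SQLite table."""
    con = sqlite3.connect(index_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS offsets (key TEXT PRIMARY KEY, offset INTEGER)"
    )
    with open(fasta_path, "rb") as handle:
        while True:
            offset = handle.tell()  # position of the line about to be read
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                # Suggested ID: the first word of the header line.
                words = line[1:].decode().split()
                key = words[0] if words else ""
                if key_function is not None:
                    key = key_function(key)  # user callback remaps the ID
                con.execute("INSERT INTO offsets VALUES (?, ?)", (key, offset))
    con.commit()
    return con


def fetch_raw(con, fasta_path, key):
    """Return the raw text of one record by seeking to its stored offset."""
    row = con.execute(
        "SELECT offset FROM offsets WHERE key = ?", (key,)
    ).fetchone()
    if row is None:
        raise KeyError(key)
    lines = []
    with open(fasta_path, "rb") as handle:
        handle.seek(row[0])
        lines.append(handle.readline())  # the header line itself
        for line in handle:
            if line.startswith(b">"):  # start of the next record
                break
            lines.append(line)
    return b"".join(lines).decode()
```

The callback only ever sees the suggested ID string, never the parsed record, which is exactly the limitation mentioned above.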

However, I'm happy for there to be an *optional* secondary
key in an OBDA v2 SQLite schema, but Biopython probably
won't populate it. We could optionally use it rather than the
primary ID on loading an existing index though.

Personally I would stick with one key in the index - it should
be faster and makes it simpler to switch the back end if we
need to later. If anyone wants a second key, they can build
a second index *grin*.

>> On the other hand, you could try and store the parsed data
>> itself, which is where NOSQL looks more interesting. That
>> essentially requires the ability to serialise your annotated
>> sequence object model to disk - which would be tricky to do
>> cross project (much more ambitious than BioSQL is). It also
>> means the "index" becomes very large because it now holds
>> all the original data.
>>
>> Peter
>
> For a fully cross-Bio* compliant format, I don't think it's feasible
> to use serialized data unless they are serialized in something
> that is easily deserialized across HLLs (JSON, BSON, YAML,
> XML, etc).  Either that, or such data is stored concurrently with
> the binary blob, along with meta data that indicates the source
> of the blob, parser, version, etc, etc (unless there are tools out
> there that reliably interconvert serialized complex data structures
> between HLLs).  Anyway you go about it, it seems like it could
> be a major ball of hurt, unless implemented very carefully.

You missed out RDF as a serialisation ;)

But yes, going down the shared serialisation route is going
to be messy - as you are well aware:

> Aside: I think this was one of the problems with
> Bio::DB::SeqFeature::Store, in that it at one point stored
> Perl-specific Storable blobs.
>
> chris

Peter


From cjfields at illinois.edu  Thu Nov  3 15:47:51 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 3 Nov 2011 19:47:51 +0000
Subject: [Bioperl-l] OBDA redux?
In-Reply-To: <CAKVJ-_6bzfZZr82y+J4qmGbn0du4rxaKaGxjmbC7p-pU_nMuoQ@mail.gmail.com>
References: <CAKVJ-_6bzfZZr82y+J4qmGbn0du4rxaKaGxjmbC7p-pU_nMuoQ@mail.gmail.com>
Message-ID: <FB761CFA-1CFD-4FA0-A708-2CE3F2F240D9@illinois.edu>

On Nov 3, 2011, at 1:52 PM, Peter Cock wrote:

> On Thu, Nov 3, 2011 at 6:28 PM, Fields, Christopher J
> <cjfields at illinois.edu> wrote:
>> (side thread, so re-titling...)
>> 
> And CC'ing open-bio-l, which is a better home for this than bioperl-l,
> where OBDA v2 talk came up again in discussion of a BioPerl indexing
> problem. Archive links for thread here:
> 
> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035807.html
> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035808.html
> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035811.html
> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035812.html
> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035813.html
> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035822.html

yes, good idea...

>> On Nov 1, 2011, at 1:06 PM, Peter Cock wrote:
>>> 
>>> Yes, we're using SQLite3 to store essentially a list of filenames
>>> and their format as one table, and then in the main table an
>>> entry for each sequence recording the ID (only one accession,
>>> unlike OBDA which had infrastructure for a secondary accession),
>>> file number, offset of the start of the record, and optionally the
>>> length of the record on disk.
>>> 
>>> i.e. Basically what OBDA does, but using SQLite rather
>>> than BDB (not included in Python 3) or a flat file index
>>> (poor performance with large datasets).
>>> 
>>> I find this design attractive on several levels:
>>> * File format neutral, covers FASTA, FASTQ, GenBank, etc
>>> * Preserves the original file untouched
>>> * Index is a small single file (thanks to SQLite)
>>> * Back end could be switched out
>>> * Could be applied to compressed file formats
>>> * Reuses existing parsing code to access entries
>>> 
>>> This could easily form basis of OBDA v2, the main points
>>> of difference I anticipate between the Bio* projects would
>>> be naming conventions for the different file formats, and
>>> what we consider to be the default record ID of each read
>>> (e.g. which field in a GenBank file - although agreement
>>> here is not essential). Some of that was already settled in
>>> principle with OBDA v1.
>> 
>> The primary/secondary IDs could be configurable with a sane
>> default, I think the bioperl implementations allowed this (and
>> it is certainly something that will be requested).
> 
> One reason I went with a single ID only was to keep the
> Python dictionary based API simple (think hash in Perl).
> You don't get secondary keys in a Python dict or a hash ;)
> 
> As a nod to flexibility, in Biopython's Bio.SeqIO indexing you
> can provide a call back function to map the suggested ID to
> something else. Obviously this doesn't give the full flexibility
> of extracting a field from the record's annotation because we
> don't parse the whole record during indexing (it would be too
> slow).

Same with bioperl.

> However, I'm happy for there to be an *optional* secondary
> key in an OBDA v2 SQLite schema, but Biopython probably
> won't populate it. We could optionally use it rather than the
> primary ID on loading an existing index though.

Optional implementation of that is fine by me.

> Personally I would stick with one key in the index - it should
> be faster and makes it simpler to switch the back end if we
> need to later. If anyone wants a second key, they can build
> a second index *grin*.

That's easy enough.

>>> On the other hand, you could try and store the parsed data
>>> itself, which is where NOSQL looks more interesting. That
>>> essentially requires the ability to serialise your annotated
>>> sequence object model to disk - which would be tricky to do
>>> cross project (much more ambitious than BioSQL is). It also
>>> means the "index" becomes very large because it now holds
>>> all the original data.
>>> 
>>> Peter
>> 
>> For a fully cross-Bio* compliant format, I don't think it's feasible
>> to use serialized data unless they are serialized in something
>> that is easily deserialized across HLLs (JSON, BSON, YAML,
>> XML, etc).  Either that, or such data is stored concurrently with
>> the binary blob, along with meta data that indicates the source
>> of the blob, parser, version, etc, etc (unless there are tools out
>> there that reliably interconvert serialized complex data structures
>> between HLLs).  Anyway you go about it, it seems like it could
>> be a major ball of hurt, unless implemented very carefully.
> 
> You missed out RDF as a serialisation ;)
> 
> But yes, going down the shared serialisation route is going
> to be messy - as you are well aware:
> 
>> Aside: I think this was one of the problems with
>> Bio::DB::SeqFeature::Store, in that it at one point stored
>> Perl-specific Storable blobs.
>> 
>> chris
> 
> Peter

Yes, it's a problem without an easy solution.  Anyway, I think implementing this at this point would be premature optimization.

chris


From biojiangke at gmail.com  Tue Nov  8 17:29:54 2011
From: biojiangke at gmail.com (vitis)
Date: Tue, 8 Nov 2011 14:29:54 -0800 (PST)
Subject: [Bioperl-l] Some questions about the Bio::PopGen
In-Reply-To: <BANLkTiktxeprLh+LxNr50cFZO+KweZCVFg@mail.gmail.com>
References: <BANLkTiktxeprLh+LxNr50cFZO+KweZCVFg@mail.gmail.com>
Message-ID: <32805996.post@talk.nabble.com>


I think the pi calculated in the function isn't really the pi as defined. You
need to divide the value by the total number of sites (in your case it's 5,
which is the sequence length, not the number of individuals). I think the reason
they implemented it this way is that sometimes it's easier to work only with
variable sites.
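
To show the arithmetic, here is a small Python sketch using the five sequences from the question. It uses the textbook definition of pi (average number of pairwise differences, optionally divided by the number of sites); it is not BioPerl's code, and the function names are made up for this example.

```python
from itertools import combinations


def pairwise_differences(a, b):
    """Count the positions at which two aligned sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def nucleotide_diversity(seqs, per_site=True):
    """Average pairwise differences, optionally divided by sequence length."""
    pairs = list(combinations(seqs, 2))
    total = sum(pairwise_differences(a, b) for a, b in pairs)
    denominator = len(pairs)
    if per_site:
        denominator *= len(seqs[0])  # assumes all sequences are the same length
    return total / denominator


seqs = ["AAGGT", "ATGGC", "ATGCT", "ATGCT", "ATGGT"]
print(nucleotide_diversity(seqs, per_site=False))  # 1.4
print(nucleotide_diversity(seqs))                  # 0.28
```

There are 14 differences over 10 pairs, giving 1.4; dividing by the 5 sites gives the 0.28 reported by DnaSP.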

The aln_to_population function converts an aln object to a population
object. You can't really see the object unless you write additional code to
write it out or do some calculations on it.

The third question depends on your specific needs. For population-level
analyses of molecular evolution, I usually create a multiple sequence
alignment with other applications (clustalw etc.), then manually adjust the
alignments to make sure they represent homology. I wouldn't touch the
alignment once this is done, but only make an aln (or whatever format you
want) as input to analysis applications like Bio::PopGen (usually via
the aln_to_population function you're using now).


Qian Zhao wrote:
> 
> Hi
> Recently, I have been learning how to calculate pi, Fst, and Tajima's D using
> Bio::PopGen.
> I am not familiar with Perl and I am really confused by the following
> problems.
> (1) I use Bio::PopGen::Statistics to calculate pi. The data I used
> are these:
>     __DATA__
> 01 A01 A
> 01 A02 A
> 01 A03 A
> 01 A04 A
> 01 A05 A
> 02 A01 A
> 02 A02 T
> 02 A03 T
> 02 A04 T
> 02 A05 T
> 03 A01 G
> 03 A02 G
> 03 A03 G
> 03 A04 G
> 03 A05 G
> 04 A01 G
> 04 A02 G
> 04 A03 C
> 04 A04 C
> 04 A05 G
> 05 A01 T
> 05 A02 C
> 05 A03 T
> 05 A04 T
> 05 A05 T
> And I am not sure if I can use the sequences below to represent the
> prettybase format above:
> >A01
> AAGGT
> >A02
> ATGGC
> >A03
> ATGCT
> >A04
> ATGCT
> >A05
> ATGGT
> The pi is 1.4 using Bio::PopGen::Statistics. However, the pi is 0.28 if I
> use DnaSP. I find that 1.4/5 = 0.28, which means that if the number
> from Bio::PopGen::Statistics is divided by the number of individuals, the
> result would be exactly the same. Is there something wrong in my Perl script? The
> code I used is below:
> #!/usr/bin/perl -w
> use warnings;
> use strict;
> use Bio::PopGen::Genotype;
>  my $genotype = Bio::PopGen::Genotype->new(-marker_name   => 'gene_1',
>                                            -individual_id => '001',
>                                            -alleles       => ['1','5'] );
> use Bio::PopGen::Individual;
>  my $ind = Bio::PopGen::Individual->new(-unique_id  => '001',
>                                         -genotypes  => [$genotype] );
> $ind->add_Genotype(
>    Bio::PopGen::Genotype->new(-alleles     => ['1', '5'],
>                               -marker_name => 'gene_1')
>  );
>  $ind->add_Genotype(
>    Bio::PopGen::Genotype->new(-alleles     => ['1', '5'],
>                               -marker_name => 'gene_1')
>  );
>  $ind->add_Genotype(
>    Bio::PopGen::Genotype->new(-alleles     => ['1', '5'],
>                               -marker_name => 'gene_1')
>  );
>  $ind->add_Genotype(
>    Bio::PopGen::Genotype->new(-alleles     => ['1', '5'],
>                               -marker_name => 'gene_1')
>  );
>  use Bio::PopGen::Population;
>  my $pop = Bio::PopGen::Population->new(-name        => 'Bm',
>                                         -description => 'description',
>                                         -individuals => [$ind] );
> use Bio::PopGen::IO;
> use Bio::PopGen::Statistics;
> my $nummarkers = $pop->get_marker_names;
> my $stats = Bio::PopGen::Statistics->new();
> my $io = Bio::PopGen::IO->new (-format => 'prettybase',
>                                -file => '1.txt');
> if( my $pop = $io->next_population ) {
>     my $pi = $stats->pi($pop, $nummarkers);
>     print "pi is $pi\n";
> my @inds;
>     for my $ind ( $pop->get_Individuals ) {
>         if( $ind->unique_id =~ /A0[1-3]/ ) {
>             push @inds, $ind;
>         }
>     }
>     print "pi for inds 1,2,3 is ", $stats->pi(\@inds),"\n";
> }
> 
> (2) I want to use Bio::PopGen::Utilities to translate the alignment file
> to the population file. However, I cannot find the result file after running the
> program. I use the following code:
> use Bio::PopGen::Utilities;
>   use Bio::AlignIO;
> 
>   my $in = Bio::AlignIO->new(-file   => 't/data/t7.aln',
>                             -format => 'clustalw');
> my $aln = $in->next_aln;
> my $pop = Bio::PopGen::Utilities->aln_to_population(-alignment => $aln);
> my $synpop = Bio::PopGen::Utilities->aln_to_population(-site_model => 'cod',
>                                                        -alignment  => $aln);
> I am not sure where I should add my result file's name in the code.
> (3) If my file contains a lot of individual sequences and each individual
> has one genotype, how can I use Bio::PopGen::Individual,
> Bio::PopGen::Population and Bio::PopGen::Genotype to create a file that
> can be used with Bio::PopGen::Statistics?
> 
> I would greatly appreciate any answers. Thanks for your time
> and best wishes!
>                                                    Qian
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 

-- 
View this message in context: http://old.nabble.com/Some-questions-about-the-Bio%3A%3APopGen-tp31378987p32805996.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From biojiangke at gmail.com  Tue Nov  8 17:51:22 2011
From: biojiangke at gmail.com (vitis)
Date: Tue, 8 Nov 2011 14:51:22 -0800 (PST)
Subject: [Bioperl-l] questions about the bioperl module
 Bio::PopGen::Statistics
In-Reply-To: <201106012030039537050@gmail.com>
References: <201106012030039537050@gmail.com>
Message-ID: <32805997.post@talk.nabble.com>


If you read the Bio::PopGen doc, you'll see there is an optional argument for
the function that calculates pi, which takes the number of sites into
consideration. Also, when you use the aln_to_population function to input an
alignment, you can use the option to take in all sites, including the
monomorphic sites. I think if you implement both in your script, you'll get
the same pi value as from other applications like DnaSP.

For sliding window analyses, you may have to implement your own
method to move along the windows, but I think DnaSP can already do that, so you
don't have to write your own script.
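
If you do want to roll your own, the window logic is short. The following Python sketch assumes an ungapped alignment held as equal-length strings and uses the textbook per-site pi; the function names and the 0-based window starts are choices made for this example, not anything provided by Bio::PopGen or DnaSP.

```python
from itertools import combinations


def per_site_pi(seqs):
    """Average pairwise differences per site for equal-length sequences."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(x != y for a, b in pairs for x, y in zip(a, b))
    return diffs / (len(pairs) * len(seqs[0]))


def sliding_window_pi(seqs, window, step):
    """Return (0-based window start, per-site pi) for each full window."""
    length = len(seqs[0])
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        results.append((start, per_site_pi(chunk)))
    return results


seqs = ["AAGGT", "ATGGC", "ATGCT", "ATGCT", "ATGGT"]
print(sliding_window_pi(seqs, window=2, step=1))
# [(0, 0.2), (1, 0.2), (2, 0.3), (3, 0.5)]
```

Windows that would run past the end of the alignment are simply skipped; whether to keep a shorter final window is a design decision left open here.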
  

lvu.jun wrote:
> 
> Hi, there,
> I am trying to calculate population genetics parameters such as pi
> using the BioPerl module Bio::PopGen::Statistics. But I found that the
> method only requires the marker genotypes of every individual
> in the population. I don't know why the module does not take the DNA
> sequence length into consideration when calculating the pi value.
> According to the definition of pi, besides the polymorphic
> sites, the monomorphic sites should also be incorporated in
> the denominator when doing the calculation. Is that right? If so, I'm
> confused about the module: can anyone tell me why it can correctly calculate
> the pi value with only the marker (polymorphic) genotypes?
> Another question: if I want to calculate the pi value using a sliding
> window along the genome, how can I do this using the
> Bio::PopGen::Statistics module?
> Thanks for your help!
> Yours sincerely,
> Jun
> 
> Chinese Academy of Sciences
> 
> 2011-06-01 
> 
> 
> 
> lvu.jun 
> 
> 

-- 
View this message in context: http://old.nabble.com/questions-about-the-bioperl-module-Bio%3A%3APopGen%3A%3AStatistics-tp31749977p32805997.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From shachigahoimbi at gmail.com  Wed Nov  9 00:22:33 2011
From: shachigahoimbi at gmail.com (Shachi Gahoi)
Date: Wed, 9 Nov 2011 10:52:33 +0530
Subject: [Bioperl-l] Run FGENESH using bioperl
Message-ID: <CACyyM1ZOiMspVH3hF4fJOvedw=8YzZDuuzJRHsuJUJ=mkuYyng@mail.gmail.com>

Dear All.

I have a multi-FASTA sequence file, and I would like to run FGENESH on
each sequence stored in that file, one by one.

Is this possible using BioPerl?

Please guide me.

Thanks in advance.


-- 
Regards,
Shachi


From pankajt322 at gmail.com  Thu Nov  3 08:12:44 2011
From: pankajt322 at gmail.com (pankaj)
Date: Thu, 3 Nov 2011 05:12:44 -0700 (PDT)
Subject: [Bioperl-l] extract ORF ID from fasta file using bioperl
In-Reply-To: <CACyyM1brvFU1N3NOqUDfJ8eBSNqNZSsHGkxaLoK2Euik0Z2s0w@mail.gmail.com>
References: <CACyyM1brvFU1N3NOqUDfJ8eBSNqNZSsHGkxaLoK2Euik0Z2s0w@mail.gmail.com>
Message-ID: <bc50d79e-005d-46a9-ac0f-6237d43df0f4@u10g2000prl.googlegroups.com>


On Oct 21, 1:59 am, Shachi Gahoi <shachigahoi... at gmail.com> wrote:
> Dear all,
>
> I have fasta format sequence file and I want to extract ORF ID "PITG_14194"
> from fasta file and then I want to rename same file with that ORF ID
> "PITG_14194".
>
> I have many files and I want to do same exercise with all sequence files.
>
> Please tell me how can i do this with perl or bioperl.
>
> >tr|D0NNU7|D0NNU7_PHYIT Carbohydrate esterase, putative OS=Phytophthora infestans (strain T30-4) GN=PITG_14194 PE=4 SV=1
> MVKLSIVSSTMQSLLAPLLRVWTDPERRRKFLRWLFGGTSGAIALLLILEATRGFCRTPL
> ETAQLLAGISWTLCKITVQFVARGFKPKFAKWTLRYELLHGLMRTAATMFGERIVDLQHA
> RVIRHHTGMFGTVLGSFARWQNEMRLESVRLNGLEHIWLKSSTCTTETKSERKRLVVLFF
> HGGGYAVLSPRMYISFCSAVAGAIRQQLASDDVDVDVFLANYRKLPEHKFPVPAEDAVAM
> YEYLLQHEKLEPSQIILAGDSAGGGLVMSTLLRVRDGLSSWKSKLPLPLAAIVMCPLADL
> TWDEDEIAGQHCVLPLNMTAASVLTYHPTRDDPSTWADASPVHCNLQGLPPVFLQSATLD
> RLFQHSVRLAAKAKADGLVNWEVDIHEGVPHVFMVIPAYVLPYARVGVGRMAAFAAKQFR
> NGIAVDHKGVICNGKAPIEIAVDENTLSAAA
>
> --
> Regards,
> Shachi


From azaballos at isciii.es  Wed Nov  9 06:28:21 2011
From: azaballos at isciii.es (Angel Zaballos)
Date: Wed, 9 Nov 2011 12:28:21 +0100
Subject: [Bioperl-l] bp_genbank2gff.pl  bug
Message-ID: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>

Running bp_genbank2gff.pl got this:

[root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession AAXT01000001.1 > babesichr3.gff
Replacement list is longer than search list at /usr/share/perl5/Bio/Range.pm line 251.


Ángel Zaballos
Unidad de Genómica
Centro Nacional de Microbiología-ISCIII
Carretera Majadahonda-Pozuelo, Km 2,2
28220-Majadahonda

Tel: 918223994
mail:  azaballos at isciii.es


************************* LEGAL NOTICE *************************
This electronic message is intended exclusively for its addressees and
may contain attached documents of a private and confidential nature.
If you have received this message in error and are not among the
addressees, please do not use, disclose, distribute, print or copy its
contents by any means. We ask that you notify the sender and completely
delete the message and its attachments. The Instituto de Salud Carlos III
assumes no legal responsibility of any kind for the content of this message
when it does not correspond to the functions attributed to its sender by
current regulations.


From scott at scottcain.net  Wed Nov  9 11:12:02 2011
From: scott at scottcain.net (Scott Cain)
Date: Wed, 9 Nov 2011 11:12:02 -0500
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
Message-ID: <CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>

Hi Angel,

I would suggest using bp_genbank2gff3.pl, as it is more actively
maintained; the bp_genbank2gff.pl script hasn't really been touched in many
years, and I imagine it's suffering from some serious code rot.

Scott


2011/11/9 Angel Zaballos <azaballos at isciii.es>

> Running bp_genbank2gff.pl got this:
>
> [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession
> AAXT01000001.1 > babesichr3.gff
> Replacement list is longer than search list at
> /usr/share/perl5/Bio/Range.pm line 251.
>
>
>
> Ángel Zaballos
> Unidad de Genómica
> Centro Nacional de Microbiología-ISCIII
> Carretera Majadahonda-Pozuelo, Km 2,2
> 28220-Majadahonda
>
> Tel: 918223994
> mail:  azaballos at isciii.es
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot
net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From carandraug+dev at gmail.com  Wed Nov  9 11:13:10 2011
From: carandraug+dev at gmail.com (=?ISO-8859-1?Q?Carn=EB_Draug?=)
Date: Wed, 9 Nov 2011 16:13:10 +0000
Subject: [Bioperl-l] extract ORF ID from fasta file using bioperl
In-Reply-To: <bc50d79e-005d-46a9-ac0f-6237d43df0f4@u10g2000prl.googlegroups.com>
References: <CACyyM1brvFU1N3NOqUDfJ8eBSNqNZSsHGkxaLoK2Euik0Z2s0w@mail.gmail.com>
	<bc50d79e-005d-46a9-ac0f-6237d43df0f4@u10g2000prl.googlegroups.com>
Message-ID: <CAPOrs_030887wt=T7ZJyDUid92poO+FX4kKkRFTzWweXi5ffvw@mail.gmail.com>

On 3 November 2011 12:12, pankaj <pankajt322 at gmail.com> wrote:
>
>
> On Oct 21, 1:59 am, Shachi Gahoi <shachigahoi... at gmail.com> wrote:
>> Dear all,
>>
>> I have fasta format sequence file and I want to extract ORF ID "PITG_14194"
>> from fasta file and then I want to rename same file with that ORF ID
>> "PITG_14194".
>>
>> I have many files and I want to do same exercise with all sequence files.
>>
>> Please tell me how can i do this with perl or bioperl.
>>
>> >tr|D0NNU7|D0NNU7_PHYIT Carbohydrate esterase, putative OS=Phytophthora infestans (strain T30-4) GN=PITG_14194 PE=4 SV=1
>> MVKLSIVSSTMQSLLAPLLRVWTDPERRRKFLRWLFGGTSGAIALLLILEATRGFCRTPL
>> ETAQLLAGISWTLCKITVQFVARGFKPKFAKWTLRYELLHGLMRTAATMFGERIVDLQHA
>> RVIRHHTGMFGTVLGSFARWQNEMRLESVRLNGLEHIWLKSSTCTTETKSERKRLVVLFF
>> HGGGYAVLSPRMYISFCSAVAGAIRQQLASDDVDVDVFLANYRKLPEHKFPVPAEDAVAM
>> YEYLLQHEKLEPSQIILAGDSAGGGLVMSTLLRVRDGLSSWKSKLPLPLAAIVMCPLADL
>> TWDEDEIAGQHCVLPLNMTAASVLTYHPTRDDPSTWADASPVHCNLQGLPPVFLQSATLD
>> RLFQHSVRLAAKAKADGLVNWEVDIHEGVPHVFMVIPAYVLPYARVGVGRMAAFAAKQFR
>> NGIAVDHKGVICNGKAPIEIAVDENTLSAAA
>>

---------- Forwarded message ----------
From: Jason Stajich <jason.stajich at gmail.com>
Date: 21 October 2011 10:56
Subject: Re: [Bioperl-l] extract ORF ID from fasta file using bioperl
To: Shachi Gahoi <shachigahoimbi at gmail.com>
Cc: bioperl-l at bioperl.org


It is easy to do this with a simple regular expression and opening a new
file.  Have you read up on this concept in Perl?
You can use SeqIO to parse FASTA files - did you read the HOWTO and
website documentation first?

We don't typically do people's work for them on this mailing list so
please show some effort first.
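
For readers of the archive, the regular-expression idea can be sketched as follows. This is only an illustration in Python (the same pattern works in Perl); it assumes the whole FASTA header is on a single line, that the ID is the value of the `GN=` field, and that the renamed file should get a `.fasta` extension. The function names are made up for this example.

```python
import os
import re

# Matches the GN= field of a UniProt-style header, e.g. GN=PITG_14194
GENE_NAME = re.compile(r"\bGN=(\S+)")


def orf_id_from_fasta(path):
    """Return the GN= value from the first header line, or None."""
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                match = GENE_NAME.search(line)
                return match.group(1) if match else None
    return None


def rename_to_orf_id(path):
    """Rename the file after its ORF ID, keeping it in the same directory."""
    orf_id = orf_id_from_fasta(path)
    if orf_id is None:
        return path  # leave files without a GN= field untouched
    new_path = os.path.join(os.path.dirname(path), orf_id + ".fasta")
    os.rename(path, new_path)
    return new_path
```

To process many files, call `rename_to_orf_id` on each path in turn; note that this sketch does not guard against two files sharing the same ORF ID.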


From scott at scottcain.net  Wed Nov  9 13:43:00 2011
From: scott at scottcain.net (Scott Cain)
Date: Wed, 9 Nov 2011 13:43:00 -0500
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
	<CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>
	<0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu>
Message-ID: <CA+JTaoxrmzW8E1omw5cA1DSLfg=xSjxJDvh9a=_cEz913qTbCg@mail.gmail.com>

Hi Chris,

Actually, removing it from the distribution (but letting it remain in the
code repository) is not a bad idea.  I can't really think of a down side.

Scott


2011/11/9 Fields, Christopher J <cjfields at illinois.edu>

> Scott,
>
> Do we want to add that caveat to the bp_genbank2gff.pl documentation (or
> remove it altogether)?
>
> chris
>
> On Nov 9, 2011, at 10:12 AM, Scott Cain wrote:
>
> > Hi Angel,
> >
> > I would suggest using bp_genbank2gff3.pl, as it is more actively
> > maintained; the bp_genbank2gff.pl script hasn't really been touched in
> > many years, and I imagine it's suffering from some serious code rot.
> >
> > Scott
> >
> >
> > 2011/11/9 Angel Zaballos <azaballos at isciii.es>
> >
> >> Running bp_genbank2gff.pl got this:
> >>
> >> [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession
> >> AAXT01000001.1 > babesichr3.gff
> >> Replacement list is longer than search list at
> >> /usr/share/perl5/Bio/Range.pm line 251.
> >>
> >>
> >>
> >> Ángel Zaballos
> >> Unidad de Genómica
> >> Centro Nacional de Microbiología-ISCIII
> >> Carretera Majadahonda-Pozuelo, Km 2,2
> >> 28220-Majadahonda
> >>
> >> Tel: 918223994
> >> mail:  azaballos at isciii.es
> >>
> >
> >
> >
> > --
> > ------------------------------------------------------------------------
> > Scott Cain, Ph. D.                                   scott at scottcain dot
> > net
> > GMOD Coordinator (http://gmod.org/)                     216-392-3087
> > Ontario Institute for Cancer Research
> >
>
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot
net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From cjfields at illinois.edu  Wed Nov  9 13:39:52 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 9 Nov 2011 18:39:52 +0000
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
	<CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>
Message-ID: <0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu>

Scott,

Do we want to add that caveat to the bp_genbank2gff.pl documentation (or remove it altogether)?

chris

On Nov 9, 2011, at 10:12 AM, Scott Cain wrote:

> Hi Angel,
> 
> I would suggest using bp_genbank2gff3.pl, as it is more actively
> maintained; the bp_genbank2gff.pl script hasn't really been touched in many
> years, and I imagine it's suffering from some serious code rot.
> 
> Scott
> 
> 
> 2011/11/9 Angel Zaballos <azaballos at isciii.es>
> 
>> Running bp_genbank2gff.pl got this:
>> 
>> [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession
>> AAXT01000001.1 > babesichr3.gff
>> Replacement list is longer than search list at
>> /usr/share/perl5/Bio/Range.pm line 251.
>> 
>> 
>> 
>> Ángel Zaballos
>> Unidad de Genómica
>> Centro Nacional de Microbiología-ISCIII
>> Carretera Majadahonda-Pozuelo, Km 2,2
>> 28220-Majadahonda
>> 
>> Tel: 918223994
>> mail:  azaballos at isciii.es
>> 
> 
> 
> 
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain dot
> net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research
> 


From cjfields at illinois.edu  Wed Nov  9 14:51:48 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 9 Nov 2011 19:51:48 +0000
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <CA+JTaoxrmzW8E1omw5cA1DSLfg=xSjxJDvh9a=_cEz913qTbCg@mail.gmail.com>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
	<CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>
	<0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu>
	<CA+JTaoxrmzW8E1omw5cA1DSLfg=xSjxJDvh9a=_cEz913qTbCg@mail.gmail.com>
Message-ID: <C0212F3D-AFD7-41A4-9649-B876FAFA7C02@illinois.edu>

Scott,

It would remain in the repo history if it is removed; otherwise we can probably set up an 'unmaintained' folder.  Either would prevent it from being packaged and installed in future versions.

(Speaking of which, we should discuss (w/ Lincoln) possibly splitting out Bio::DB::SeqFeature/GFF and related code/tests/etc. into its own distribution, in line with slimming down the core modules.)

chris

On Nov 9, 2011, at 12:43 PM, Scott Cain wrote:

> Hi Chris,
> 
> Actually, removing it from the distribution (but letting it remain in the code repository) is not a bad idea.  I can't really think of a down side.
> 
> Scott
> 
> 
> 2011/11/9 Fields, Christopher J <cjfields at illinois.edu>
> Scott,
> 
> Do we want to add that caveat to the bp_genbank2gff.pl documentation (or remove it altogether)?
> 
> chris
> 
> On Nov 9, 2011, at 10:12 AM, Scott Cain wrote:
> 
> > Hi Angel,
> >
> > I would suggest using bp_genbank2gff3.pl, as it is more actively
> > maintained; the bp_genbank2gff.pl script hasn't really been touched in many
> > years, and I imagine it's suffering from some serious code rot.
> >
> > Scott
> >
> >
> > 2011/11/9 Angel Zaballos <azaballos at isciii.es>
> >
> >> Running bp_genbank2gff.pl got this:
> >>
> >> [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession
> >> AAXT01000001.1 > babesichr3.gff
> >> Replacement list is longer than search list at
> >> /usr/share/perl5/Bio/Range.pm line 251.
> >>
> >>
> >>
> >> Ángel Zaballos
> >> Unidad de Genómica
> >> Centro Nacional de Microbiología-ISCIII
> >> Carretera Majadahonda-Pozuelo, Km 2,2
> >> 28220-Majadahonda
> >>
> >> Tel: 918223994
> >> mail:  azaballos at isciii.es
> >>
> >>
> >>
> >>
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >
> >
> >
> > --
> > ------------------------------------------------------------------------
> > Scott Cain, Ph. D.                                   scott at scottcain dot
> > net
> > GMOD Coordinator (http://gmod.org/)                     216-392-3087
> > Ontario Institute for Cancer Research
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 
> 
> 
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research


From carandraug+dev at gmail.com  Wed Nov  9 15:39:17 2011
From: carandraug+dev at gmail.com (=?ISO-8859-1?Q?Carn=EB_Draug?=)
Date: Wed, 9 Nov 2011 20:39:17 +0000
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <CA+JTaoxrmzW8E1omw5cA1DSLfg=xSjxJDvh9a=_cEz913qTbCg@mail.gmail.com>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
	<CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>
	<0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu>
	<CA+JTaoxrmzW8E1omw5cA1DSLfg=xSjxJDvh9a=_cEz913qTbCg@mail.gmail.com>
Message-ID: <CAPOrs_2jZADvKkKE-1EbcMiXe8J74rNPd2y09N_BguVw4Er=hA@mail.gmail.com>

On 9 November 2011 18:43, Scott Cain <scott at scottcain.net> wrote:
> Hi Chris,
>
> Actually, removing it from the distribution (but letting it remain in the
> code repository) is not a bad idea.  I can't really think of a down side.
>
> Scott

Hi

Can I suggest instead simply making the script issue a warning right
at the start? Something like "bp_genbank2gff is obsolete and will be
removed from a future version of bioperl; please use bp_genbank2gff3
instead". You could leave it there for the next 2 releases and then
finally remove it. This would have 2 advantages:

1) people who have been using it will immediately know what to use as a
replacement (instead of coming to ask on the mailing list)
2) people who use it without knowing anything about the subject,
because someone told them to "just press this button" or "just type
this in the terminal", won't suddenly have a broken system and will
have time to find someone who can make it work again.

That's what's done in GNU Octave and I think it works well there.
Carnë
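
A minimal sketch of the kind of start-up warning suggested above, using only core Perl. The helper name deprecation_notice is hypothetical and is not taken from the actual bp_genbank2gff.pl source; the message text follows the wording proposed in this message.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the deprecation notice for an obsolete script.
sub deprecation_notice {
    my ($old, $new) = @_;
    return "$old is obsolete and will be removed from a future version "
         . "of bioperl; please use $new instead\n";
}

# warn() writes to STDERR, so output redirected from STDOUT
# (for example "> out.gff") is not polluted by the notice.
# The trailing newline suppresses the "at ... line N" suffix.
warn deprecation_notice('bp_genbank2gff.pl', 'bp_genbank2gff3.pl');

# ... the rest of the script would continue unchanged below ...
```

Because the notice goes to STDERR, existing pipelines that capture the GFF output keep working while their users still see the message on the terminal.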


From scott at scottcain.net  Wed Nov  9 15:48:07 2011
From: scott at scottcain.net (Scott Cain)
Date: Wed, 9 Nov 2011 15:48:07 -0500
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <CAPOrs_2jZADvKkKE-1EbcMiXe8J74rNPd2y09N_BguVw4Er=hA@mail.gmail.com>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
	<CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>
	<0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu>
	<CA+JTaoxrmzW8E1omw5cA1DSLfg=xSjxJDvh9a=_cEz913qTbCg@mail.gmail.com>
	<CAPOrs_2jZADvKkKE-1EbcMiXe8J74rNPd2y09N_BguVw4Er=hA@mail.gmail.com>
Message-ID: <CA+JTaowZXPfbXawZGC+N_JVX5BJpKMTVRbSoh7kd6oWxCYbJZg@mail.gmail.com>

Hi Carnë,

You are absolutely correct; that is the right way to do it.  I'll add that
right now (and if the original post's fix is an easy one, I'll fix that too
:-)

Scott


2011/11/9 Carnë Draug <carandraug+dev at gmail.com>

> On 9 November 2011 18:43, Scott Cain <scott at scottcain.net> wrote:
> > Hi Chris,
> >
> > Actually, removing it from the distribution (but letting it remain in the
> > code repository) is not a bad idea.  I can't really think of a down side.
> >
> > Scott
>
> Hi
>
> Can I suggest instead simply making the script issue a warning right
> at the start? Something like "bp_genbank2gff is obsolete and will be
> removed from a future version of bioperl; please use bp_genbank2gff3
> instead". You could leave it there for the next 2 releases and then
> finally remove it. This would have 2 advantages:
>
> 1) people who have been using it will immediately know what to use as a
> replacement (instead of coming to ask on the mailing list)
> 2) people who use it without knowing anything about the subject,
> because someone told them to "just press this button" or "just type
> this in the terminal", won't suddenly have a broken system and will
> have time to find someone who can make it work again.
>
> That's what's done in GNU Octave and I think it works well there.
> Carnë
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot
net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From cjfields at illinois.edu  Wed Nov  9 16:59:48 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 9 Nov 2011 21:59:48 +0000
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <CA+JTaowZXPfbXawZGC+N_JVX5BJpKMTVRbSoh7kd6oWxCYbJZg@mail.gmail.com>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
	<CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>
	<0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu>
	<CA+JTaoxrmzW8E1omw5cA1DSLfg=xSjxJDvh9a=_cEz913qTbCg@mail.gmail.com>
	<CAPOrs_2jZADvKkKE-1EbcMiXe8J74rNPd2y09N_BguVw4Er=hA@mail.gmail.com>
	<CA+JTaowZXPfbXawZGC+N_JVX5BJpKMTVRbSoh7kd6oWxCYbJZg@mail.gmail.com>
Message-ID: <C86AC2F8-F8E8-431D-83A6-39E896C23485@illinois.edu>

Works for me; it's a standard deprecation policy.  The only caveat is that the next 'release' of the code would be when the related code is split out into its own distribution (which will require its own versioning).

chris

On Nov 9, 2011, at 2:48 PM, Scott Cain wrote:

> Hi Carnë,
> 
> You are absolutely correct; that is the right way to do it.  I'll add that right now (and if the original post's fix is an easy one, I'll fix that too :-)
> 
> Scott
> 
> 
> 2011/11/9 Carnë Draug <carandraug+dev at gmail.com>
> On 9 November 2011 18:43, Scott Cain <scott at scottcain.net> wrote:
> > Hi Chris,
> >
> > Actually, removing it from the distribution (but letting it remain in the
> > code repository) is not a bad idea.  I can't really think of a down side.
> >
> > Scott
> 
> Hi
> 
> Can I suggest instead simply making the script issue a warning right
> at the start? Something like "bp_genbank2gff is obsolete and will be
> removed from a future version of bioperl; please use bp_genbank2gff3
> instead". You could leave it there for the next 2 releases and then
> finally remove it. This would have 2 advantages:
> 
> 1) people who have been using it will immediately know what to use as a
> replacement (instead of coming to ask on the mailing list)
> 2) people who use it without knowing anything about the subject,
> because someone told them to "just press this button" or "just type
> this in the terminal", won't suddenly have a broken system and will
> have time to find someone who can make it work again.
> 
> That's what's done in GNU Octave and I think it works well there.
> Carnë
> 
> 
> 
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research


From biopython at maubp.freeserve.co.uk  Thu Nov 10 08:09:40 2011
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 10 Nov 2011 13:09:40 +0000
Subject: [Bioperl-l] [Bioperl-announce-l] Null Pointer - NCBI C++
	Exception
In-Reply-To: <31659982.post@talk.nabble.com>
References: <31659982.post@talk.nabble.com>
Message-ID: <CAKVJ-_445daDbwg6QkxLUhnji1BhS+YvEehBhB9D3ij1M22tmw@mail.gmail.com>

Re: http://lists.open-bio.org/pipermail/bioperl-announce-l/2011-May/000253.html

On Thu, May 19, 2011 at 11:15 PM, rgoldade <rgoldade at sfu.ca> wrote:
>
> I received the following error while trying to run bl2seq from
> standaloneblastplus. Has anyone else encountered this problem?
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: /usr/bin/blastp call crashed: There was a problem running
> /usr/bin/blastp : Error: NCBI C++ Exception:
>
> "/am/ncbiapdata/release/blast/src/2.2.24/Linux32-Suse-icc/c++/ICC1010-ReleaseMT--Linux32-Suse-icc/../src/corelib/ncbiobj.cpp",
> line 688: Critical: ncbi::CObject::ThrowNullPointerException() - Attempt to
> access NULL pointer.
>
> Thank you,
> Ryan

Just hit something very very similar, looks like a BLAST+ bug which I
will report now:

$ tblastn -out NC_003197_vs_NC_011294.tsv -outfmt 6 -query
NC_003197.fna -evalue 0.0001 -subject NC_011294.fna
Error: NCBI C++ Exception:
    "/am/ncbiapdata/release/blast/src/2.2.25/IntelMAC-universal/c++/GCC401-ReleaseMT--IntelMAC-universal/../src/corelib/ncbiobj.cpp",
line 689: Critical: ncbi::CObject::ThrowNullPointerException() -
Attempt to access NULL pointer.

This was on a Mac using BLAST 2.2.25+ (tblastn) whereas yours was
BLAST 2.2.24+ (blastp) from the look of the error. The line number has
changed by one, but I'm confident it is the same point of failure.

In my case I was comparing nucleotide against nucleotide, so should
have been using tblastx not tblastn, but it still shouldn't have had a
pointer exception.

Peter


From cjfields at illinois.edu  Thu Nov 10 09:00:46 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 10 Nov 2011 14:00:46 +0000
Subject: [Bioperl-l] [Bioperl-announce-l] Null Pointer - NCBI
	C++	Exception
In-Reply-To: <CAKVJ-_445daDbwg6QkxLUhnji1BhS+YvEehBhB9D3ij1M22tmw@mail.gmail.com>
References: <31659982.post@talk.nabble.com>
	<CAKVJ-_445daDbwg6QkxLUhnji1BhS+YvEehBhB9D3ij1M22tmw@mail.gmail.com>
Message-ID: <B1FB239F-28AD-4437-97C0-10110763F34D@illinois.edu>

On Nov 10, 2011, at 7:09 AM, Peter wrote:

> Re: http://lists.open-bio.org/pipermail/bioperl-announce-l/2011-May/000253.html
> 
> On Thu, May 19, 2011 at 11:15 PM, rgoldade <rgoldade at sfu.ca> wrote:
>> 
>> I received the following error while trying to run bl2seq from
>> standaloneblastplus. Has anyone else encountered this problem?
>> 
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: /usr/bin/blastp call crashed: There was a problem running
>> /usr/bin/blastp : Error: NCBI C++ Exception:
>> 
>> "/am/ncbiapdata/release/blast/src/2.2.24/Linux32-Suse-icc/c++/ICC1010-ReleaseMT--Linux32-Suse-icc/../src/corelib/ncbiobj.cpp",
>> line 688: Critical: ncbi::CObject::ThrowNullPointerException() - Attempt to
>> access NULL pointer.
>> 
>> Thank you,
>> Ryan
> 
> Just hit something very very similar, looks like a BLAST+ bug which I
> will report now:
> 
> $ tblastn -out NC_003197_vs_NC_011294.tsv -outfmt 6 -query
> NC_003197.fna -evalue 0.0001 -subject NC_011294.fna
> Error: NCBI C++ Exception:
>    "/am/ncbiapdata/release/blast/src/2.2.25/IntelMAC-universal/c++/GCC401-ReleaseMT--IntelMAC-universal/../src/corelib/ncbiobj.cpp",
> line 689: Critical: ncbi::CObject::ThrowNullPointerException() -
> Attempt to access NULL pointer.
> 
> This was on a Mac using BLAST 2.2.25+ (tblastn) whereas yours was
> BLAST 2.2.24+ (blastp) from the look of the error. The line number has
> changed by one, but I'm confident it is the same point of failure.
> 
> In my case I was comparing nucleotide against nucleotide, so should
> have been using tblastx not tblastn, but it still shouldn't have had a
> pointer exception.
> 
> Peter

Yeah, that's bad.  I have seen a few things like this myself that make me worry about the transition to BLAST+.

chris

PS - Odd I didn't see this one, was it caught in the bioperl-announce filter?


From casaburi at ceinge.unina.it  Thu Nov 10 07:29:55 2011
From: casaburi at ceinge.unina.it (Giorgio C)
Date: Thu, 10 Nov 2011 04:29:55 -0800 (PST)
Subject: [Bioperl-l]  Scripting help to identify adaptors count in reads
Message-ID: <32818254.post@talk.nabble.com>


Hi everybody,

I have some 454 reads that contain adaptors (masked as NNNN...): one, two or
three adaptors per read, depending on the read. Is there any way to
establish how many reads have 1 adaptor, how many have 2 and how many have 3,
out of the total?

>271-88
GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCAGGTGCCTACG
>272-88
GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCANNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTTGCCAGCCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGC
>273-88
GCCTCCCTCGCGCATCAGATCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCA
>274-88
GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTCCCTCGCGCCATCAGATCGTNNNNNNNNNNNNNNNNNNTCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCATCAGATCGTAGGCACCATCAA

The problem is that some adaptors occur in the middle of the sequences
because the reads come from a concatemerization experimental design (the
miRNAs sit between the NNNNNN... runs). So I am looking for a script or tool
that reports how many reads have 1 adaptor, how many have 2, and so on (the
maximum is 4), relative to the total number of reads. Do you know of any
tool or script that may help?

Thank you very much 
-- 
View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32818254.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From jovel_juan at hotmail.com  Thu Nov 10 11:06:16 2011
From: jovel_juan at hotmail.com (Juan Jovel)
Date: Thu, 10 Nov 2011 16:06:16 +0000
Subject: [Bioperl-l] Scripting help to identify adaptors count in reads
In-Reply-To: <32818254.post@talk.nabble.com>
References: <32818254.post@talk.nabble.com>
Message-ID: <COL102-W340F68DB20A406E5B3A9FBFADC0@phx.gbl>


There are many ways to do it. 
Perhaps the simplest is to count the number of times the adapter sequence (or part of it) appears in each read. 
For example: 
    $adapter_matches = tr/adapter_sequence/adapter_sequence/;
    # $adapter_matches will store the number of times the adapter
    # sequence is repeated.

You then place that result in a hash bin:

    my %adapter_frequency;
    my $class = "$adapter_matches";
    if (exists $adapter_frequency{$class}) {
        $adapter_frequency{$class}++;
    }
    else {
        $adapter_frequency{$class} = 1;
    }

    # Then you can sort and output your classes
    foreach $class (sort keys %adapter_frequency) {
        print "$class\t$adapter_frequency{$class}\n";
    }

You can work out the details, but something like this should work.


> Date: Thu, 10 Nov 2011 04:29:55 -0800
> From: casaburi at ceinge.unina.it
> To: Bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l]  Scripting help to identify adaptors count in reads
> 
> 
> Hi everybody,
> 
> I have some 454 reads that contain adaptors (masked as NNNN...): one, two or
> three adaptors per read, depending on the read. Is there any way to
> establish how many reads have 1 adaptor, how many have 2 and how many have 3,
> out of the total?
> 
> >271-88
> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCAGGTGCCTACG
> >272-88
> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCANNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTTGCCAGCCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGC
> >273-88
> GCCTCCCTCGCGCATCAGATCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCA
> >274-88
> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTCCCTCGCGCCATCAGATCGTNNNNNNNNNNNNNNNNNNTCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCATCAGATCGTAGGCACCATCAA
> 
> The problem is that some adaptors occur in the middle of the sequences
> because the reads come from a concatemerization experimental design (the
> miRNAs sit between the NNNNNN... runs). So I am looking for a script or tool
> that reports how many reads have 1 adaptor, how many have 2, and so on (the
> maximum is 4), relative to the total number of reads. Do you know of any
> tool or script that may help?
> 
> Thank you very much 
> -- 
> View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32818254.html
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
 		 	   		  

From scott at scottcain.net  Thu Nov 10 11:55:53 2011
From: scott at scottcain.net (Scott Cain)
Date: Thu, 10 Nov 2011 11:55:53 -0500
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <0CACB816-21E2-494E-ACEF-67F9FBB5AC41@isciii.es>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
	<CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>
	<0CACB816-21E2-494E-ACEF-67F9FBB5AC41@isciii.es>
Message-ID: <CA+JTaoyWsS534Ew43Ye3v77o-+d8zK3rcYAup8PCEPApX_ispw@mail.gmail.com>

Hi Angel,

Please keep correspondence on the mailing list.

I just ran bp_genbank2gff.pl with a GenBank file (fruit fly mitochondria),
and it worked fine.  I suspect there is something wrong with your GenBank
file.

Scott


On Thu, Nov 10, 2011 at 3:15 AM, Angel Zaballos <azaballos at isciii.es> wrote:

> Hi Scott,
>
> Thanks everyone for your help. I tried bp_genbank2gff3.pl, but the same
> happened:
>
> [root at localhost zaballos]# bp_genbank2gff3.pl babesiaChr3.gbk >
> babesichr3_2.gff
> Replacement list is longer than search list at
> /usr/share/perl5/Bio/Range.pm line 251.
> UNIVERSAL->import is deprecated and will be removed in a future perl at
> /usr/share/perl5/Bio/Tree/TreeFunctionsI.pm line 94
>
> However, the output file seems to be correct (indeed, that was also the
> case for bp_genbank2gff.pl). I then ran ldHgGene and it worked:
>
> [zaballos at localhost ~]$ ./ldHgGene -out=babesiaChr3_2.gpe db tab
> babesiachr3_2.gff
> Reading babesiachr3_2.gff
> Read 4776 transcripts in 8821 lines in 1 files
>   4776 groups 1 seqs 1 sources 6 feature types
> 2379 gene predictions
>
> I'm using Fedora (for bioperl) and CentOS (for ldHgGene), virtualized on a
> Mac with Parallels. Maybe this is the cause of such a message.
>
> Regards
>
>
> Ángel
>
>
> On 09/11/2011, at 17:12, Scott Cain wrote:
>
> Hi Angel,
>
> I would suggest using bp_genbank2gff3.pl, as it is more actively
> maintained; the bp_genbank2gff.pl script hasn't really been touched in
> many years, and I imagine it's suffering from some serious code rot.
>
> Scott
>
>
> 2011/11/9 Angel Zaballos <azaballos at isciii.es>
>
>> Running bp_genbank2gff.pl got this:
>>
>> [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession
>> AAXT01000001.1 > babesichr3.gff
>> Replacement list is longer than search list at
>> /usr/share/perl5/Bio/Range.pm line 251.
>>
>>
>>
>> Ángel Zaballos
>> Unidad de Genómica
>> Centro Nacional de Microbiología-ISCIII
>> Carretera Majadahonda-Pozuelo, Km 2,2
>> 28220-Majadahonda
>>
>> Tel: 918223994
>> mail:  azaballos at isciii.es
>>
>>
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain
> dot net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research
>
>
> Ángel Zaballos
> Unidad de Genómica
> Centro Nacional de Microbiología-ISCIII
> Carretera Majadahonda-Pozuelo, Km 2,2
> 28220-Majadahonda
>
> Tel: 918223994
> mail:  azaballos at isciii.es
>
>
>
>
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot
net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From l.m.timmermans at students.uu.nl  Thu Nov 10 12:17:12 2011
From: l.m.timmermans at students.uu.nl (L.M. Timmermans)
Date: Thu, 10 Nov 2011 18:17:12 +0100
Subject: [Bioperl-l] Scripting help to identify adaptors count in reads
In-Reply-To: <COL102-W340F68DB20A406E5B3A9FBFADC0@phx.gbl>
References: <32818254.post@talk.nabble.com>
	<COL102-W340F68DB20A406E5B3A9FBFADC0@phx.gbl>
Message-ID: <CAC1jpXAW_MTQjBY8Z8ffr67g_0TrGwWddixuQvtTB19+S+DLVg@mail.gmail.com>

On Thu, Nov 10, 2011 at 5:06 PM, Juan Jovel <jovel_juan at hotmail.com> wrote:

>
> There are many ways to do it.
> Perhaps the simplest is to count the number of times the adapter sequence
> (or part of it) appears in each read.
> For example:
> $adapter_matches = tr/adapter_sequence/adapter_sequence/;#
> $adapter_matches will store the number of times the adapter sequence is
> repeated.
>

No, it will not. tr/// will count characters, not sequences. Something like
"scalar (() = $sequence =~ m/(N+)/g)" should work OTOH.

Leon
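
Putting the two replies together, a minimal sketch of the whole tally could look like the following. It assumes, as in the original question, that each adaptor shows up as one uninterrupted run of N; the subroutine names and the three toy reads are made up for illustration, and reading the real FASTA file is left out.

```perl
use strict;
use warnings;

# Count runs of N (masked adaptors) in one read.
sub count_adaptor_runs {
    my ($sequence) = @_;
    # Assigning the match list to () in scalar context yields the
    # number of matches, i.e. the number of separate N runs.
    my $runs = () = $sequence =~ /N+/g;
    return $runs;
}

# Tally how many reads carry 0, 1, 2, ... adaptor runs.
sub adaptor_histogram {
    my (@reads) = @_;
    my %reads_per_count;
    $reads_per_count{ count_adaptor_runs($_) }++ for @reads;
    return %reads_per_count;
}

# Toy reads, not taken from the data in this thread.
my @reads = (
    'GCCTTGAT' . ('N' x 8) . 'ATCAGG',                 # one run
    'GCCT' . ('N' x 4) . 'ATCA' . ('N' x 5) . 'CTGA',  # two runs
    'GCCTCCCTCGCGCATCAG',                              # no run
);

my %histogram = adaptor_histogram(@reads);
for my $count (sort { $a <=> $b } keys %histogram) {
    print "$count\t$histogram{$count}\n";
}

# For contrast: tr/// counts individual characters, not runs.
my $n_characters = ($reads[0] =~ tr/N//);
print "N characters in first read: $n_characters\n";
```

Dividing each tally by the number of reads gives the fraction of the total asked for in the original question.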


From cjfields at illinois.edu  Thu Nov 10 14:17:57 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 10 Nov 2011 19:17:57 +0000
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <CA+JTaoyWsS534Ew43Ye3v77o-+d8zK3rcYAup8PCEPApX_ispw@mail.gmail.com>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
	<CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>
	<0CACB816-21E2-494E-ACEF-67F9FBB5AC41@isciii.es>
	<CA+JTaoyWsS534Ew43Ye3v77o-+d8zK3rcYAup8PCEPApX_ispw@mail.gmail.com>
Message-ID: <66F13EAF-0DAA-45E0-AB5B-E71EC8FA2323@illinois.edu>

This was run with an older version of bioperl (probably 1.6.0 or 1.6.1).  The warnings pop up when using perl v5.12 or v5.14: the first warning is from a bad tr/// in Bio::Range, the second is from bad usage of UNIVERSAL functions.  Both have been addressed.

chris
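
For anyone who has not met the first warning before, here is a small sketch of what triggers it. This is not the Bio::Range code (which is not shown in this thread); the helper warnings_from and the snippets passed to it are hypothetical, and the sketch only assumes a perl recent enough to emit this warning.

```perl
use strict;
use warnings;

# Hypothetical helper: compile and run a snippet, returning any warnings.
sub warnings_from {
    my ($code) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    eval $code;    # string eval, so the tr/// is compiled at this point
    return @warnings;
}

# Three replacement characters for two search characters:
# the extra 'z' can never be used, so perl complains at compile time.
my @too_long = warnings_from(q{ my $s = 'ab'; $s =~ tr/ab/xyz/; });

# Search and replacement lists of equal length compile silently.
my @balanced = warnings_from(q{ my $s = 'ab'; $s =~ tr/ab/xy/; });

print "too long: $_" for @too_long;
print "balanced snippet produced ", scalar(@balanced), " warning(s)\n";
```

The warning is harmless in the sense that the translation still happens, which fits the observation that the GFF output looked correct.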

On Nov 10, 2011, at 10:55 AM, Scott Cain wrote:

> Hi Angel,
> 
> Please keep correspondence on the mailing list.
> 
> I just ran bp_genbank2gff.pl with a GenBank file (fruit fly mitochondria),
> and it worked fine.  I suspect there is something wrong with your GenBank
> file.
> 
> Scott
> 
> 
> On Thu, Nov 10, 2011 at 3:15 AM, Angel Zaballos <azaballos at isciii.es> wrote:
> 
>> Hi Scott,
>> 
>> Thanks everyone for your help. I tried bp_genbank2gff3.pl, but the same
>> happened:
>> 
>> [root at localhost zaballos]# bp_genbank2gff3.pl babesiaChr3.gbk >
>> babesichr3_2.gff
>> Replacement list is longer than search list at
>> /usr/share/perl5/Bio/Range.pm line 251.
>> UNIVERSAL->import is deprecated and will be removed in a future perl at
>> /usr/share/perl5/Bio/Tree/TreeFunctionsI.pm line 94
>> 
>> However, the output file seems to be correct (indeed, that was also the
>> case for bp_genbank2gff.pl). I then ran ldHgGene and it worked:
>> 
>> [zaballos at localhost ~]$ ./ldHgGene -out=babesiaChr3_2.gpe db tab
>> babesiachr3_2.gff
>> Reading babesiachr3_2.gff
>> Read 4776 transcripts in 8821 lines in 1 files
>>  4776 groups 1 seqs 1 sources 6 feature types
>> 2379 gene predictions
>> 
>> I'm using Fedora (for bioperl) and CentOS (for ldHgGene), virtualized on a
>> Mac with Parallels. Maybe this is the cause of such a message.
>> 
>> Regards
>> 
>> 
>> Ángel
>> 
>> 
>> On 09/11/2011, at 17:12, Scott Cain wrote:
>> 
>> Hi Angel,
>> 
>> I would suggest using bp_genbank2gff3.pl, as it is more actively
>> maintained; the bp_genbank2gff.pl script hasn't really been touched in
>> many years, and I imagine it's suffering from some serious code rot.
>> 
>> Scott
>> 
>> 
>> 2011/11/9 Angel Zaballos <azaballos at isciii.es>
>> 
>>> Running bp_genbank2gff.pl got this:
>>> 
>>> [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession
>>> AAXT01000001.1 > babesichr3.gff
>>> Replacement list is longer than search list at
>>> /usr/share/perl5/Bio/Range.pm line 251.
>>> 
>>> 
>>> 
>>> Ángel Zaballos
>>> Unidad de Genómica
>>> Centro Nacional de Microbiología-ISCIII
>>> Carretera Majadahonda-Pozuelo, Km 2,2
>>> 28220-Majadahonda
>>> 
>>> Tel: 918223994
>>> mail:  azaballos at isciii.es
>>> 
>>> 
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> 
>> 
>> 
>> 
>> --
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D.                                   scott at scottcain
>> dot net
>> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>> Ontario Institute for Cancer Research
>> 
>> 
>> Ángel Zaballos
>> Unidad de Genómica
>> Centro Nacional de Microbiología-ISCIII
>> Carretera Majadahonda-Pozuelo, Km 2,2
>> 28220-Majadahonda
>> 
>> Tel: 918223994
>> mail:  azaballos at isciii.es
>> 
>> 
>> 
>> 
>> 
> 
> 
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain dot
> net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From biopython at maubp.freeserve.co.uk  Thu Nov 10 14:27:22 2011
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 10 Nov 2011 19:27:22 +0000
Subject: [Bioperl-l] [Bioperl-announce-l] Null Pointer - NCBI C++
	Exception
In-Reply-To: <B1FB239F-28AD-4437-97C0-10110763F34D@illinois.edu>
References: <31659982.post@talk.nabble.com>
	<CAKVJ-_445daDbwg6QkxLUhnji1BhS+YvEehBhB9D3ij1M22tmw@mail.gmail.com>
	<B1FB239F-28AD-4437-97C0-10110763F34D@illinois.edu>
Message-ID: <CAKVJ-_4+hGzxmn43qJ4SkJfCaPUQw=PkV5QSjUyqPSDmyVw64A@mail.gmail.com>

On Thu, Nov 10, 2011 at 2:00 PM, Fields, Christopher J
<cjfields at illinois.edu> wrote:
> On Nov 10, 2011, at 7:09 AM, Peter wrote:
>
>> Re: http://lists.open-bio.org/pipermail/bioperl-announce-l/2011-May/000253.html
>>
>> On Thu, May 19, 2011 at 11:15 PM, rgoldade <rgoldade at sfu.ca> wrote:
>>>
>>> I received the following error while trying to run bl2seq from
>>> standaloneblastplus. Has anyone else encountered this problem?
>>>
>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>> MSG: /usr/bin/blastp call crashed: There was a problem running
>>> /usr/bin/blastp : Error: NCBI C++ Exception:
>>>
>>> "/am/ncbiapdata/release/blast/src/2.2.24/Linux32-Suse-icc/c++/ICC1010-ReleaseMT--Linux32-Suse-icc/../src/corelib/ncbiobj.cpp",
>>> line 688: Critical: ncbi::CObject::ThrowNullPointerException() - Attempt to
>>> access NULL pointer.
>>>
>>> Thank you,
>>> Ryan
>>
>> Just hit something very very similar, looks like a BLAST+ bug which I
>> will report now:
>>
>> $ tblastn -out NC_003197_vs_NC_011294.tsv -outfmt 6 -query
>> NC_003197.fna -evalue 0.0001 -subject NC_011294.fna
>> Error: NCBI C++ Exception:
>>    "/am/ncbiapdata/release/blast/src/2.2.25/IntelMAC-universal/c++/GCC401-ReleaseMT--IntelMAC-universal/../src/corelib/ncbiobj.cpp",
>> line 689: Critical: ncbi::CObject::ThrowNullPointerException() -
>> Attempt to access NULL pointer.
>>
>> This was on a Mac using BLAST 2.2.25+ (tblastn) whereas yours was
>> BLAST 2.2.24+ (blastp) from the look of the error. The line number has
>> changed by one, but I'm confident it is the same point of failure.
>>
>> In my case I was comparing nucleotide against nucleotide, so should
>> have been using tblastx not tblastn, but it still shouldn't have had a
>> pointer exception.
>>
>> Peter
>
> Yeah, that's bad.  I have seen a few things like this myself that make me worry about the transition to BLAST+.
>
> chris

I'm told it is already fixed and will be part of BLAST 2.2.26+, which is good.

>
> PS - Odd I didn't see this one, was it caught in the bioperl-announce filter?
>

Maybe once, but it was in the archive and my email account.

Peter


From anna.fr at gmail.com  Thu Nov 10 15:01:57 2011
From: anna.fr at gmail.com (Anna Friedlander)
Date: Fri, 11 Nov 2011 09:01:57 +1300
Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi?
Message-ID: <CALv2E+3QpXhOU2Decmw146Bd1P4bDtg45ZosGWkeGkqzZx0XOg@mail.gmail.com>

Hi all

Does anyone know if there is a way to get a Taxonomy node and/or
taxonid from a gi number using the flatfile with taxonomy db?

I have blast output that I want to append taxonomic information to. I
have hundreds of thousands of items to do this for, so it's not
practical to use entrez to query the NCBI database.

I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I
think much too large to put into a hash!

This was also discussed in 2009:
http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I
don't think there was a conclusion?

Thanks for your help
Anna Friedlander


From shalabh.sharma7 at gmail.com  Thu Nov 10 15:12:09 2011
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Thu, 10 Nov 2011 15:12:09 -0500
Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi?
In-Reply-To: <CALv2E+3QpXhOU2Decmw146Bd1P4bDtg45ZosGWkeGkqzZx0XOg@mail.gmail.com>
References: <CALv2E+3QpXhOU2Decmw146Bd1P4bDtg45ZosGWkeGkqzZx0XOg@mail.gmail.com>
Message-ID: <CAA7rn9cf=iBZWRmg0f1GzeT9=NOp4nV7MfDyLxmTHH4rfYHtug@mail.gmail.com>

Hi Anna,
           I think the thread you mentioned was started by me.
At that time I wrote a few scripts to map GIs to taxa, and later I found
some other efficient ways as well. But recently Miguel Pignatelli directed
me to some Bio-LITE modules that are really helpful.

These are the modules he mentioned; I found them really easy to use and
very efficient.

Bio-LITE-Taxonomy-0.07
Bio-LITE-Taxonomy-NCBI-0.07
Bio-LITE-Taxonomy-NCBI-Gi2taxid-0.04

Cheers
Shalabh

On Thu, Nov 10, 2011 at 3:01 PM, Anna Friedlander <anna.fr at gmail.com> wrote:

> Hi all
>
> Does anyone know if there is a way to get a Taxonomy node and/or
> taxonid from a gi number using the flatfile with taxonomy db?
>
> I have blast output that I want to append taxonomic information to. I
> have hundreds of thousands of items to do this for, so it's not
> practical to use entrez to query the NCBI database.
>
> I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I
> think much too large to put into a hash!
>
> This was also discussed in 2009:
> http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I
> don't think there was a conclusion?
>
> Thanks for your help
> Anna Friedlander
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
Shalabh Sharma
Scientific Computing Professional Associate (Bioinformatics Specialist)
Department of Marine Sciences
University of Georgia
Athens, GA 30602-3636


From cjfields at illinois.edu  Thu Nov 10 15:23:14 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 10 Nov 2011 20:23:14 +0000
Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi?
In-Reply-To: <CAA7rn9cf=iBZWRmg0f1GzeT9=NOp4nV7MfDyLxmTHH4rfYHtug@mail.gmail.com>
References: <CALv2E+3QpXhOU2Decmw146Bd1P4bDtg45ZosGWkeGkqzZx0XOg@mail.gmail.com>
	<CAA7rn9cf=iBZWRmg0f1GzeT9=NOp4nV7MfDyLxmTHH4rfYHtug@mail.gmail.com>
Message-ID: <53AF9ECA-5905-4D14-B7C1-FF4B2F2FA084@illinois.edu>

Yes, these are probably wrappers around the gi2taxid and taxonomy data; BioPerl lacks the former, whereas the latter is handled by Bio::DB::Taxonomy (the 'flatfile' option).  I did something very similar locally, though I used Bio::DB::Taxonomy for the taxonomy lookups.

chris

On Nov 10, 2011, at 2:12 PM, shalabh sharma wrote:

> Hi Anna,
>           I think the thread you mentioned was started by me.
> That time i wrote few scripts to map gi to taxa, after some time i found
> some other efficient ways also. But recently 'Miguel Pignatelli' directed
> to some Bio-LITE modules that are really helpful.
> 
> These are the modules he mentioned, i found them really easy to use and
> very efficient.
> 
> Bio-LITE-Taxonomy-0.07
> Bio-LITE-Taxonomy-NCBI-0.07
> Bio-LITE-Taxonomy-NCBI-Gi2taxid-0.04
> 
> Cheers
> Shalabh
> 
> On Thu, Nov 10, 2011 at 3:01 PM, Anna Friedlander <anna.fr at gmail.com> wrote:
> 
>> Hi all
>> 
>> Does anyone know if there is a way to get a Taxonomy node and/or
>> taxonid from a gi number using the flatfile with taxonomy db?
>> 
>> I have blast output that I want to append taxonomic information to. I
>> have hundreds of thousands of items to do this for, so it's not
>> practical to use entrez to query the NCBI database.
>> 
>> I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I
>> think much too large to put into a hash!
>> 
>> This was also discussed in 2009:
>> http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I
>> don't think there was a conclusion?
>> 
>> Thanks for your help
>> Anna Friedlander
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
> 
> 
> 
> -- 
> Shalabh Sharma
> Scientific Computing Professional Associate (Bioinformatics Specialist)
> Department of Marine Sciences
> University of Georgia
> Athens, GA 30602-3636
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From bernd.web at gmail.com  Thu Nov 10 15:51:13 2011
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 10 Nov 2011 21:51:13 +0100
Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi?
In-Reply-To: <CALv2E+3QpXhOU2Decmw146Bd1P4bDtg45ZosGWkeGkqzZx0XOg@mail.gmail.com>
References: <CALv2E+3QpXhOU2Decmw146Bd1P4bDtg45ZosGWkeGkqzZx0XOg@mail.gmail.com>
Message-ID: <CAExAtoD4aD_zPJXmZZHOG8uUqcTmQr90oQsbdumku5wrWz-erw@mail.gmail.com>

Hi Anna,

Jason changed his example script from using hashes to using SQLite:
bp_classify_hits_kingdom - classify BLAST hits by taxonomic kingdom

See
https://github.com/bioperl/bioperl-live/blob/master/scripts/taxa/bp_classify_hits_kingdom.pl

It's an example script that shows how to do the tax to gi mapping for
blast reports.
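
For readers who want the general idea without reading the whole script, here
is a minimal sketch in Python (not the BioPerl script's own code) of keeping
the GI -> taxid mapping in SQLite instead of an in-memory hash. It assumes
the NCBI dump is plain text with two whitespace-separated integer columns
(gi, then taxid); the table name and function names are made up for
illustration.

```python
import sqlite3


def build_gi_taxid_db(dmp_lines, db_path=":memory:"):
    """Load 'gi<whitespace>taxid' lines into an SQLite table keyed on gi.

    dmp_lines can be any iterable of text lines, e.g. an open file handle,
    so the dump is streamed rather than held in memory.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS gi_taxid ("
        "gi INTEGER PRIMARY KEY, taxid INTEGER NOT NULL)"
    )
    rows = (
        tuple(int(field) for field in line.split())
        for line in dmp_lines
        if line.strip()  # skip blank lines
    )
    conn.executemany("INSERT OR REPLACE INTO gi_taxid VALUES (?, ?)", rows)
    conn.commit()
    return conn


def taxid_for_gi(conn, gi):
    """Return the taxid for a GI number, or None if the GI is unknown."""
    row = conn.execute(
        "SELECT taxid FROM gi_taxid WHERE gi = ?", (gi,)
    ).fetchone()
    return row[0] if row else None
```

With a real dump you would pass an open file handle and an on-disk db_path,
build the database once, and then reuse it for every BLAST report. The
resulting taxid can be handed to whatever taxonomy lookup you prefer.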


Bernd

On Thu, Nov 10, 2011 at 9:01 PM, Anna Friedlander <anna.fr at gmail.com> wrote:
> Hi all
>
> Does anyone know if there is a way to get a Taxonomy node and/or
> taxonid from a gi number using the flatfile with taxonomy db?
>
> I have blast output that I want to append taxonomic information to. I
> have hundreds of thousands of items to do this for, so it's not
> practical to use entrez to query the NCBI database.
>
> I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I
> think much too large to put into a hash!
>
> This was also discussed in 2009:
> http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I
> don't think there was a conclusion?
>
> Thanks for your help
> Anna Friedlander
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From cjfields at illinois.edu  Thu Nov 10 16:13:12 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 10 Nov 2011 21:13:12 +0000
Subject: [Bioperl-l] Scripting help to identify adaptors count in reads
In-Reply-To: <COL102-W340F68DB20A406E5B3A9FBFADC0@phx.gbl>
References: <32818254.post@talk.nabble.com>
	<COL102-W340F68DB20A406E5B3A9FBFADC0@phx.gbl>
Message-ID: <9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu>

If the adaptors are masked (e.g. are represented by the N's below) or if you are really confident that the adaptors don't have base mis-calls, why not use split?  Maybe with something like 'scalar(split(/N+/, $foo))' or scalar(split(/$adaptor/, $foo)).  

tr/// won't work for the reasons Leon mentioned; it's a transliteration based on a character mapping, not a pattern match.  '$foo =~ tr/ATGCatgc/TACGtacg/' for instance converts $foo to the complement sequence (it doesn't match the pattern /ATGCatgc/).

chris

On Nov 10, 2011, at 10:06 AM, Juan Jovel wrote:

> 
> There are many ways to do it. 
> Perhaps the simplest is to count the number of times the adapter sequence (or part of it) appears in each read. 
> For example: 
> $adapter_matches = tr/adapter_sequence/adapter_sequence/;
> # $adapter_matches will store the number of times the adapter sequence is repeated.
> You then place that result in a hash bin:
> my %adapter_frequency;
> my $class = "$adapter_matches";
> if (exists $adapter_frequency{$class}) {
>     $adapter_frequency{$class}++
> } else {
>     $adapter_frequency{$class} = 1
> }
> # Then you can sort and output your classes
> foreach $class (sort keys %adapter_frequency) {
>     print "$class\t$adapter_frequency{$class}\n";
> }
> 
> You can workout the details, but something like this should work.
> 
> 
> 
> 
> 
> 
> 
>> Date: Thu, 10 Nov 2011 04:29:55 -0800
>> From: casaburi at ceinge.unina.it
>> To: Bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l]  Scripting help to identify adaptors count in reads
>> 
>> 
>> Hi everybody,
>> 
>> i have some reads (454) where there are adaptors (NNNN...), one,two or three
>> adaptors for each reads depending on the reads. Is there any way to
>> establish how many reads have 1 adaptors, how many 2 and how many 3 adaptors
>> over the total ???
>> 
>>> 271-88
>> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCAGGTGCCTACG
>>> 272-88
>> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCANNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTTGCCAGCCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGC
>>> 273-88
>> GCCTCCCTCGCGCATCAGATCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCA
>>> 274-88
>> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTCCCTCGCGCCATCAGATCGTNNNNNNNNNNNNNNNNNNTCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCATCAGATCGTAGGCACCATCAA
>> 
>> The problem is that some adpators occur in the middle of the sequences
>> because they coming out from a concameration experimental design (they are
>> miRNAs between NNNNNN...). So i want to know a script or tool that may say
>> how many reads have 1 adapt, how many 2, (max are 4) in respect to the total
>> number of reads. Do you know any tool/script that may help ? Tnx 
>> Can anyone suggests me a script to fix this ???
>> 
>> Thank you very much 
>> -- 
>> View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32818254.html
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 		 	   		  
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From jason.stajich at gmail.com  Thu Nov 10 16:15:29 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Thu, 10 Nov 2011 13:15:29 -0800
Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi?
In-Reply-To: <CAExAtoD4aD_zPJXmZZHOG8uUqcTmQr90oQsbdumku5wrWz-erw@mail.gmail.com>
References: <CALv2E+3QpXhOU2Decmw146Bd1P4bDtg45ZosGWkeGkqzZx0XOg@mail.gmail.com>
	<CAExAtoD4aD_zPJXmZZHOG8uUqcTmQr90oQsbdumku5wrWz-erw@mail.gmail.com>
Message-ID: <1C8130A1-A60A-4F16-A077-4E1B592579BC@gmail.com>

Here's another variant, one I wrote for my own purposes. The code at the beginning uses a NoSQL solution to store all the ACC -> GI mappings
and then a second db to store GI -> TAXONID

This is the case where I have a file of accession numbers and I want to add to the description line the taxonomy string.

https://github.com/hyphaltip/mobedac-fungi/blob/master/scripts/taxonomy_lookupmissing.pl

That's the first 165 lines, and then lookups are basically what you see on line 195.

Would be good to rewrite that script below to use TokyoCabinet or KyotoCabinet (a newer implementation, not sure if it is faster?).
One thing this does is take up a lot of disk space, but you can trade that off against which compression scheme you use, which will impact loading performance.

Jason

On Nov 10, 2011, at 12:51 PM, Bernd Web wrote:

> Hi Anna,
> 
> Jason changed his example script from using hashes to using SQLite:
> bp_classify_hits_kingdom - classify BLAST hits by taxonomic kingdom
> 
> See
> https://github.com/bioperl/bioperl-live/blob/master/scripts/taxa/bp_classify_hits_kingdom.pl
> 
> It's an example script that shows how to do the tax to gi mapping for
> blast reports.
> 
> 
> Bernd
> 
> On Thu, Nov 10, 2011 at 9:01 PM, Anna Friedlander <anna.fr at gmail.com> wrote:
>> Hi all
>> 
>> Does anyone know if there is a way to get a Taxonomy node and/or
>> taxonid from a gi number using the flatfile with taxonomy db?
>> 
>> I have blast output that I want to append taxonomic information to. I
>> have hundreds of thousands of items to do this for, so it's not
>> practical to use entrez to query the NCBI database.
>> 
>> I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I
>> think much too large to put into a hash!
>> 
>> This was also discussed in 2009:
>> http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I
>> don't think there was a conclusion?
>> 
>> Thanks for your help
>> Anna Friedlander
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From anna.fr at gmail.com  Thu Nov 10 20:07:57 2011
From: anna.fr at gmail.com (Anna Friedlander)
Date: Fri, 11 Nov 2011 14:07:57 +1300
Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi?
In-Reply-To: <1C8130A1-A60A-4F16-A077-4E1B592579BC@gmail.com>
References: <CALv2E+3QpXhOU2Decmw146Bd1P4bDtg45ZosGWkeGkqzZx0XOg@mail.gmail.com>
	<CAExAtoD4aD_zPJXmZZHOG8uUqcTmQr90oQsbdumku5wrWz-erw@mail.gmail.com>
	<1C8130A1-A60A-4F16-A077-4E1B592579BC@gmail.com>
Message-ID: <CALv2E+09JeJiXPUoZphNZnaVhWM9mstkhhp+=1Jvs6Hjy3c+uA@mail.gmail.com>

thanks all for the fast responses.

I'll try the bio-lite modules shalabh mentioned

On Fri, Nov 11, 2011 at 10:15 AM, Jason Stajich <jason.stajich at gmail.com> wrote:
> Here's another variant of one I wrote which is for my own purposes, the code
> at the beginning uses a NOSQL solution to storing all the ACC -> GI
> and then a second db to store GI -> TAXONID
> This is the case where I have a file of accession numbers and I want to add
> to the description line the taxonomy string.
> https://github.com/hyphaltip/mobedac-fungi/blob/master/scripts/taxonomy_lookupmissing.pl
> That's the first 165 lines, and then lookups are basically what you see on
> line 195.
> Would be good to rewrite that script below to use TokyoCabinet
> or KyotoCabinet (is newer implementation, not sure if it is faster?).
> one thing that this does is take up a lot of disk space, but you can have
> tradeoffs between that and which compression scheme you use, which will
> impact performance of loading.
> Jason
> On Nov 10, 2011, at 12:51 PM, Bernd Web wrote:
>
> Hi Anna,
>
> Jason changed his example script from using hashes to using SQLite:
> bp_classify_hits_kingdom - classify BLAST hits by taxonomic kingdom
>
> See
> https://github.com/bioperl/bioperl-live/blob/master/scripts/taxa/bp_classify_hits_kingdom.pl
>
> It's an example script that shows how to do the tax to gi mapping for
> blast reports.
>
>
> Bernd
>
> On Thu, Nov 10, 2011 at 9:01 PM, Anna Friedlander <anna.fr at gmail.com> wrote:
>
> Hi all
>
> Does anyone know if there is a way to get a Taxonomy node and/or
> taxonid from a gi number using the flatfile with taxonomy db?
>
> I have blast output that I want to append taxonomic information to. I
> have hundreds of thousands of items to do this for, so it's not
> practical to use entrez to query the NCBI database.
>
> I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I
> think much too large to put into a hash!
>
> This was also discussed in 2009:
> http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I
> don't think there was a conclusion?
>
> Thanks for your help
> Anna Friedlander
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


From arun_innovative90 at yahoo.com  Fri Nov 11 06:09:46 2011
From: arun_innovative90 at yahoo.com (Arun Kumar)
Date: Fri, 11 Nov 2011 03:09:46 -0800 (PST)
Subject: [Bioperl-l] BIOPERL MATERIAL
Message-ID: <1321009786.42748.YahooMailNeo@web110101.mail.gq1.yahoo.com>

Hi team,

I am Arun Kumar, a bioinformatics student who wishes to master BioPerl after reading your documents. If possible, please send me a PDF of the BioPerl documentation, as it will be useful for getting familiar with BioPerl.

Thanks in advance

Thanks & Regards,
Arunkumar.d


From awitney at sgul.ac.uk  Fri Nov 11 08:23:29 2011
From: awitney at sgul.ac.uk (Adam Witney)
Date: Fri, 11 Nov 2011 13:23:29 +0000
Subject: [Bioperl-l] BIOPERL MATERIAL
In-Reply-To: <1321009786.42748.YahooMailNeo@web110101.mail.gq1.yahoo.com>
References: <1321009786.42748.YahooMailNeo@web110101.mail.gq1.yahoo.com>
Message-ID: <EA1DBB02-0280-4207-97E7-A116C058A615@sgul.ac.uk>


All BioPerl documents can be found here:

http://www.bioperl.org/wiki/Main_Page

And a useful place to start would be the HOWTOs:

http://www.bioperl.org/wiki/HOWTOs

regards

adam


On 11 Nov 2011, at 11:09, Arun Kumar wrote:

> Hi team, 
>  
>    This is arun kumar of bio - informatics student wish to master in bioperl after reading your documents, if possible send me PDF of this bioperl as it will be useful to get familier with  bioperl.
>  
> Thanks in advance
> 
> Thanks & Regards,
> Arunkumar.d
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From casaburi at ceinge.unina.it  Fri Nov 11 07:13:50 2011
From: casaburi at ceinge.unina.it (Giorgio C)
Date: Fri, 11 Nov 2011 04:13:50 -0800 (PST)
Subject: [Bioperl-l] Scripting help to identify adaptors count in reads
In-Reply-To: <9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu>
References: <32818254.post@talk.nabble.com>
	<COL102-W340F68DB20A406E5B3A9FBFADC0@phx.gbl>
	<9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu>
Message-ID: <32825229.post@talk.nabble.com>


Hi, thank you for your answer!

In the end I tried this script, and it seems to work for this purpose:


perl -pe
's/.*?((NNNNNNNNNNNNNNN|NNNNNNNNNNNNNNN)(.*?)(NNNNNNNNNNNNNNN|NNNNNNNNNNNNNNN))/$3/g'
Scrivania/orchidea/Fiore/Mydata.fasta > result.txt


-- 
View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32825229.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From casaburi at ceinge.unina.it  Fri Nov 11 07:21:29 2011
From: casaburi at ceinge.unina.it (Giorgio C)
Date: Fri, 11 Nov 2011 04:21:29 -0800 (PST)
Subject: [Bioperl-l] Scripting help to identify adaptors count in reads
In-Reply-To: <9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu>
References: <32818254.post@talk.nabble.com>
	<COL102-W340F68DB20A406E5B3A9FBFADC0@phx.gbl>
	<9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu>
Message-ID: <32825274.post@talk.nabble.com>


Thanks everybody for answering so quickly! Another way may be:

perl -ne '$count{s/N+//g}++ if /^[^>]/;END{for $i (keys %count){print
"$count{$i} have $i ADAPTOR\n";}}' myFile.fasta > result.txt 


and/or with 'nawk':

nawk -F'[N]+' '/^[^>]/{a[NF-1]++}END{for(i in a) print a[i] " have " i " ADAPTOR"}' myFile.fasta > result.txt

They give the same result. If you have this problem, try these; they work
well!
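
For anyone who prefers a script to a one-liner, the same counting idea can be
sketched in Python. This is only an illustration under two assumptions: each
read sits on a single line (as in the example reads posted earlier), and an
"adaptor" is any run of one or more N characters. The function name is made
up.

```python
import re
from collections import Counter


def adaptor_histogram(fasta_lines):
    """Map 'number of N-runs in a read' to 'number of reads with that count'."""
    counts = Counter()
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA headers
        # Each maximal run of N characters is counted as one masked adaptor.
        counts[len(re.findall(r"N+", line))] += 1
    return counts


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as handle:
        for n_adaptors, n_reads in sorted(adaptor_histogram(handle).items()):
            print(f"{n_reads} have {n_adaptors} ADAPTOR")
```

Reads without any N run are reported under a count of 0, and multi-line
FASTA records would need to be joined into one line per read first.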

Thanks again to all,

Giorgio


-- 
View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32825274.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From p.j.a.cock at googlemail.com  Sun Nov 13 07:24:35 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 13 Nov 2011 12:24:35 +0000
Subject: [Bioperl-l] OBDA redux?
In-Reply-To: <FB761CFA-1CFD-4FA0-A708-2CE3F2F240D9@illinois.edu>
References: <CAKVJ-_6bzfZZr82y+J4qmGbn0du4rxaKaGxjmbC7p-pU_nMuoQ@mail.gmail.com>
	<FB761CFA-1CFD-4FA0-A708-2CE3F2F240D9@illinois.edu>
Message-ID: <CAKVJ-_4i8AJL1emBKpnO+p-SVzNtbSVwdL9uSy72NWkHugRtVA@mail.gmail.com>

On Thu, Nov 3, 2011 at 7:47 PM, Fields, Christopher J
<cjfields at illinois.edu> wrote:
> On Nov 3, 2011, at 1:52 PM, Peter Cock wrote:
>
>> On Thu, Nov 3, 2011 at 6:28 PM, Fields, Christopher J
>> <cjfields at illinois.edu> wrote:
>>> (side thread, so re-titling...)
>>>
>> And CC'ing open-bio-l, which is a better home for this than bioperl-l,
>> where OBDA v2 talk came up again in discussion of a BioPerl indexing
>> problem. Archive links for thread here:
>>
>> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035807.html
>> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035808.html
>> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035811.html
>> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035812.html
>> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035813.html
>> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035822.html
>
> yes, good idea...

I've not CC'd the bioperl-l anymore.

>>> On Nov 1, 2011, at 1:06 PM, Peter Cock wrote:
>>>>
>>>> Yes, we're using SQLite3 to store essentially a list of filenames
>>>> and their format as one table, and then in the main table an
>>>> entry for each sequence recording the ID (only one accession,
>>>> unlike OBDA which had infrastructure for a secondary accession),
>>>> file number, offset of the start of the record, and optionally the
>>>> length of the record on disk.
>>>>
>>>> i.e. Basically what OBDA does, but using SQLite rather
>>>> than BDB (not included in Python 3) or a flat file index
>>>> (poor performance with large datasets).
>>>>
>>>> I find this design attractive on several levels:
>>>> * File format neutral, covers FASTA, FASTQ, GenBank, etc
>>>> * Preserves the original file untouched
>>>> * Index is a small single file (thanks to SQLite)
>>>> * Back end could be switched out
>>>> * Could be applied to compressed file formats
>>>> * Reuses existing parsing code to access entries
>>>>
>>>> This could easily form basis of OBDA v2, the main points
>>>> of difference I anticipate between the Bio* projects would
>>>> be naming conventions for the different file formats, and
>>>> what we consider to be the default record ID of each read
>>>> (e.g. which field in a GenBank file - although agreement
>>>> here is not essential). Some of that was already settled in
>>>> principle with OBDA v1.
>>>
>>> The primary/secondary IDs could be configurable with a sane
>>> default, I think the bioperl implementations allowed this (and
>>> it is certainly something that will be requested).
>>
>> One reason I went with a single ID only was to keep the
>> Python dictionary based API simple (think hash in Perl).
>> You don't get secondary keys in a Python dict or a hash ;)
>>
>> As a nod to flexibility, in Biopython's Bio.SeqIO indexing you
>> can provide a call back function to map the suggested ID to
>> something else. Obviously this doesn't give the full flexibility
>> of extracting a field from the record's annotation because we
>> don't parse the whole record during indexing (it would be too
>> slow).
>
> Same with bioperl.
>
>> However, I'm happy for there to be an *optional* secondary
>> key in an OBDA v2 SQLite schema, but Biopython probably
>> won't populate it. We could optionally use it rather than the
>> primary ID on loading an existing index though.
>
> Optional implementation of that is fine by me.
>
>> Personally I would stick with one key in the index - it should
>> be faster and makes it simpler to switch the back end if we
>> need to later. If anyone wants a second key, they can build
>> a second index *grin*.
>
> That's easy enough.
>
>>>> On the other hand, you could try and store the parsed data
>>>> itself, which is where NOSQL looks more interesting. That
>>>> essentially requires the ability to serialise your annotated
>>>> sequence object model to disk - which would be tricky to do
>>>> cross project (much more ambitious than BioSQL is). It also
>>>> means the "index" becomes very large because it now holds
>>>> all the original data.
>>>>
>>>> Peter
>>>
>>> For a fully cross-Bio* compliant format, I don't think it's feasible
>>> to use serialized data unless they are serialized in something
>>> that is easily deserialized across HLLs (JSON, BSON, YAML,
>>> XML, etc).  Either that, or such data is stored concurrently with
>>> the binary blob, along with meta data that indicates the source
>>> of the blob, parser, version, etc, etc (unless there are tools out
>>> there that reliably interconvert serialized complex data structures
>>> between HLLs).  Anyway you go about it, it seems like it could
>>> be a major ball of hurt, unless implemented very carefully.
>>
>> You missed out RDF as a serialisation ;)
>>
>> But yes, going down the shared serialisation route is going
>> to be messy - as you are well aware:
>>
>>> Aside: I think this was one of the problems with
>>> Bio::DB::SeqFeature::Store, in that it at one point stored
>>> Perl-specific Storable blobs.
>>>
>>> chris
>>
>> Peter
>
> yes, it's a problem w/o an easy solution.  Anyway, I think an
> implementation of such at this point would be a premature
> optimization.
>
> chris

So, Chris and I seem in general agreement that an OBDA v2
using SQLite but based on essentially the same approach as
the BDB or flat file based OBDA v1 is a good idea. i.e. Tables
mapping record identifiers to file offsets in the original sequence
files.
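
To make the idea concrete, here is a rough Python sketch of such an index
for a FASTA file. It is not the Biopython schema or any agreed OBDA v2
layout: the table and column names are invented for illustration, only one
key per record is stored, and every header line is assumed to carry an ID.

```python
import sqlite3


def build_offset_index(conn, filename, fmt="fasta"):
    """Record the byte offset and length of every FASTA record in filename."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS files (
            file_number INTEGER PRIMARY KEY, name TEXT, format TEXT);
        CREATE TABLE IF NOT EXISTS records (
            record_id TEXT PRIMARY KEY, file_number INTEGER,
            start_offset INTEGER, length INTEGER);
        """
    )
    cur = conn.execute(
        "INSERT INTO files (name, format) VALUES (?, ?)", (filename, fmt)
    )
    file_number = cur.lastrowid
    starts = []  # (record_id, start_offset) in file order
    offset = 0
    with open(filename, "rb") as handle:  # binary mode: offsets are in bytes
        for line in handle:
            if line.startswith(b">"):
                record_id = line[1:].split(None, 1)[0].decode("ascii")
                starts.append((record_id, offset))
            offset += len(line)
    # A record ends where the next one starts; the last ends at end of file.
    ends = [start for _, start in starts[1:]] + [offset]
    conn.executemany(
        "INSERT INTO records VALUES (?, ?, ?, ?)",
        [(rid, file_number, start, end - start)
         for (rid, start), end in zip(starts, ends)],
    )
    conn.commit()


def fetch_raw_record(conn, record_id):
    """Return the untouched bytes of one record, or None if not indexed."""
    row = conn.execute(
        "SELECT f.name, r.start_offset, r.length FROM records r "
        "JOIN files f ON f.file_number = r.file_number "
        "WHERE r.record_id = ?",
        (record_id,),
    ).fetchone()
    if row is None:
        return None
    name, start, length = row
    with open(name, "rb") as handle:
        handle.seek(start)
        return handle.read(length)
```

The raw bytes returned by the lookup can then be handed to the existing
parser for that format, which is what keeps the index format neutral and
leaves the original file untouched.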

I hope to get BioRuby on board; they already have OBDA
v1 support, so that shouldn't be too hard.

Right now I don't recall if BioJava has/had OBDA v1 support,
and if they did if it was affected in their recent move to BioJava
v3 (I understand from their mailing list that some older lower
priority functionality has not all been ported yet).

Also EMBOSS are likely to be interested, certainly Peter Rice
was interested in the SQLite indexing we're already using in
Biopython for sequence files (i.e. what is effectively the
prototype for OBDA v2).

Note that in addition to simple indexing of text files, we are
already using the same simple offset + length approach for
indexing binary files (e.g. SFF).

On the immediate practical side, I think I can edit the
current OBDA website of http://obda.open-bio.org/
via /home/websites/obda.open-bio.org/html on the
server.

We need to work out where the current OBDA indexing
specification lives (CVS or SVN?) and perhaps move
that to GitHub. We may need a general OBF organisation
account on GitHub for this and any other cross-project
repositories.

I see there is already an OBDA project on RedMine,
(Chris can you add me to that please?)
https://redmine.open-bio.org/projects/obda

Peter


From p.j.a.cock at googlemail.com  Sun Nov 13 07:30:37 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 13 Nov 2011 12:30:37 +0000
Subject: [Bioperl-l] OBDA redux? Compressed files
Message-ID: <CAKVJ-_6s1hOo9DLDP0pnZ_96pJdd=mpHe96oKUedwELGLDgfJw@mail.gmail.com>

Hi again,

I've retitled this as it is a little off topic from the main OBDA redux thread,
http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000819.html
http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000820.html
http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000821.html

As far as I recall, the original flat file and BDB based OBDA
specification for indexing sequencing files didn't cover
compressed files. That might be something to consider
(although we should sort out uncompressed text/binary
files first).

I've recently been experimenting with using compressed
files - in particular simple GZIP files (ignoring any block structure)
and BGZF (the specialised gzipped blocking used in BAM), see:

http://blastedbio.blogspot.com/2011/11/bgzf-blocked-bigger-better-gzip.html
http://seqanswers.com/forums/showthread.php?t=15347

The virtual offset approach used in BGZF squeezes a 16 bit
within block offset (thus limiting you to 64KB blocks) and a
48 bit block start offset (thus limiting you to a 256TB file) into
a single 64 bit "virtual" offset. That makes sense if you are
keeping the lookup table or many offsets in memory, and
can be used as is with code expecting a single offset (like
the current Biopython SQLite index schema).
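
As a small illustration of the packing described above (a sketch with
made-up function names, not code taken from any BGZF library), the block
start goes in the upper 48 bits and the within block offset in the lower
16 bits:

```python
def make_virtual_offset(block_start, within_block):
    """Pack a 48 bit block start and a 16 bit within-block offset into one int."""
    if not 0 <= within_block < 2**16:
        raise ValueError("within-block offset must fit in 16 bits")
    if not 0 <= block_start < 2**48:
        raise ValueError("block start offset must fit in 48 bits")
    return (block_start << 16) | within_block


def split_virtual_offset(virtual_offset):
    """Inverse of make_virtual_offset: return (block_start, within_block)."""
    return virtual_offset >> 16, virtual_offset & 0xFFFF
```

Because the result is a single integer, it can be stored in an existing
offset column without changing the schema; only the code that seeks in the
file needs to know how to split it again.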

I have also looked at bzip2, which is likewise block based, with the
block size ranging from 100KB to 900KB.

http://bzip.org/
http://bzip.org/1.0.5/bzip2-manual-1.0.5.html

I haven't tried any performance tests yet, which would
be interesting as I believe compression/decompression
of bzip2 is more costly in CPU terms than gzip (although
both will be block size dependent).

If we wanted to imitate the BGZF virtual offset scheme for
arbitrary BZIP2 files, an alternative 64 bit virtual offset scheme
could use 20 bits to cover bz2 blocks of up to 900KB, leaving
64 - 20 = 44 bits for the start offset, thus limiting you to just
2^44 bytes or 16TB, which sounds OK only in the medium term.
On the bright side this could be used to index any BZIP2 file
(under 16TB), whereas the BGZF scheme cannot be applied to an
arbitrary GZIP file.
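
The bit arithmetic can be checked with a few lines of Python. This is just
a sanity check of the numbers, taking the 900KB figure at face value as
900,000 bytes; the function name is made up.

```python
def offset_bit_budget(max_block_bytes, total_bits=64):
    """Split total_bits between a within-block offset and a block start offset.

    Returns (within_bits, start_bits, max_file_bytes), where within_bits is
    the smallest width able to address every byte of a max_block_bytes block.
    """
    within_bits = (max_block_bytes - 1).bit_length()
    start_bits = total_bits - within_bits
    return within_bits, start_bits, 2**start_bits


if __name__ == "__main__":
    for label, size in (("BGZF 64KB block", 65536), ("bzip2 900KB block", 900000)):
        within, start, limit = offset_bit_budget(size)
        print(f"{label}: {within} + {start} bits, file limit {limit} bytes")
```

For 65536 byte blocks this gives the 16 + 48 split used by BGZF, and for
900,000 byte blocks it gives the 20 + 44 split discussed here.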

On the other hand, storing the block start and within block
separately is truly generic and could be used on any blocked
GZIP file (including BGZF) and BZIP2 etc. It would make
the SQLite schema a bit more complicated though.

Maybe something to consider for the next revision to OBDA,
and focus on the non-compressed case for now?

Regards,

Peter


From p.j.a.cock at googlemail.com  Sun Nov 13 07:32:12 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 13 Nov 2011 12:32:12 +0000
Subject: [Bioperl-l] OBDA redux? Compressed files
In-Reply-To: <CAKVJ-_6s1hOo9DLDP0pnZ_96pJdd=mpHe96oKUedwELGLDgfJw@mail.gmail.com>
References: <CAKVJ-_6s1hOo9DLDP0pnZ_96pJdd=mpHe96oKUedwELGLDgfJw@mail.gmail.com>
Message-ID: <CAKVJ-_7G639PJBZFLE8mQPT=0LXeTWaf54U0tbMgh6XWfUAKtQ@mail.gmail.com>

On Sun, Nov 13, 2011 at 12:30 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Hi again,
>
> I've retitled this as it is a little off topic from the main OBDA redux thread,
> http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000819.html
> http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000820.html
> http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000821.html
>
> As far as I recall, the original flat file and BDB based OBDA
> specification for indexing sequencing files didn't cover
> compressed files. That might be something to consider
> (although we should sort out uncompressed text/binary
> files first).

Sorry - didn't mean to include bioperl-l on that, although it may be
of interest to you guys anyway.

Peter


From jluis.lavin at unavarra.es  Mon Nov 14 06:14:43 2011
From: jluis.lavin at unavarra.es (José Luis Lavín)
Date: Mon, 14 Nov 2011 12:14:43 +0100
Subject: [Bioperl-l] Fwd: How to get Remote BLAST results in a single out
In-Reply-To: <CADm9iynE1+y2EGyx8NLzZSzj_E81o-a5_==9ghNQ5R0hX3QbAw@mail.gmail.com>
References: <CADm9iynE1+y2EGyx8NLzZSzj_E81o-a5_==9ghNQ5R0hX3QbAw@mail.gmail.com>
Message-ID: <CADm9iy=L0c8HTZcaRD8aLw79cg1uvgQrRJ5PH4bZA5zRtt=L_Q@mail.gmail.com>

Hello everybody,

I've been using "Bio::Tools::Run::RemoteBlast" for a while and it has
worked fine for me. Now I need to run BLAST searches for multiple
sequences, but this time I'd like to get all the BLAST results in a
single output file instead of having each sequence's report written to
its own file. I've read the module documentation, but because of my
limited experience with complex modules like this one, I can't figure
out where to change the script to achieve this.
Here is the script I've been using (it's basically the one from the
module cookbook).

#!/c:/Perl -w
use strict;
use Bio::Tools::Run::RemoteBlast;
use Bio::SearchIO;
use Data::Dumper;

# Here I set the parameters for BLAST.
# chomp removes the trailing newline that <STDIN> keeps on each answer.
print "Enter your BLAST choice ",
      "(blastn, blastp, blastx, tblastn, tblastx):\n";
chomp(my $prog = <STDIN>);
print "Enter a database to search ",
      "(nr, refseq_protein, swissprot, pat, pdb, env_nr):\n";
chomp(my $db = <STDIN>);
print "Enter your cutoff score (1e-n):\n";
chomp(my $e_val = <STDIN>);

my @params = ( '-prog' => $prog,
         '-data' => $db,
         '-expect' => $e_val,
         '-readmethod' => 'SearchIO' );

my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params);


# Select the file and run the BLAST.
print "Enter your FASTA file:\n";
chomp(my $infile = <STDIN>);
my $r = $remoteBlast->submit_blast($infile);
my $v = 1;    # verbosity: values above 0 print progress and hits

print STDERR "waiting..." if( $v > 0 );  # wait for the results to return
while ( my @rids = $remoteBlast->each_rid ) {
  foreach my $rid ( @rids ) {
    my $rc = $remoteBlast->retrieve_blast($rid);
    if( !ref($rc) ) {
      # no result object yet; a negative code means the request failed
      if( $rc < 0 ) {
        $remoteBlast->remove_rid($rid);
      }
      print STDERR "." if ( $v > 0 );
      sleep 5;
    } else {
      my $result = $rc->next_result();
      # save the output, one report file per query
      my $filename = $result->query_name().".out";
      # open SALIDA, '>>'."$^T"."Report".".out";
      $remoteBlast->save_output($filename);
      $remoteBlast->remove_rid($rid);
      print "\nQuery Name: ", $result->query_name(), "\n";
      while ( my $hit = $result->next_hit ) {
        next unless ( $v > 0);
        print "\thit name is ", $hit->name, "\n";
        while( my $hsp = $hit->next_hsp ) {
          print "\t\tscore is ", $hsp->score, "\n";
        }
      }
    }
  }
}


Could any of you please explain how I can solve this?

Thanks in advance

With best wishes

-- 
Dr. José Luis Lavín Trueba

Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN




From jason.stajich at gmail.com  Mon Nov 14 06:59:56 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Mon, 14 Nov 2011 06:59:56 -0500
Subject: [Bioperl-l] Fwd: How to get Remote BLAST results in a single out
In-Reply-To: <CADm9iy=L0c8HTZcaRD8aLw79cg1uvgQrRJ5PH4bZA5zRtt=L_Q@mail.gmail.com>
References: <CADm9iynE1+y2EGyx8NLzZSzj_E81o-a5_==9ghNQ5R0hX3QbAw@mail.gmail.com>
	<CADm9iy=L0c8HTZcaRD8aLw79cg1uvgQrRJ5PH4bZA5zRtt=L_Q@mail.gmail.com>
Message-ID: <FDFB72A5-E38C-4637-9415-5A15E4C5B551@gmail.com>

If you want to do a bunch of BLASTs remotely on the cmdline you could also just use NCBI's blastcl3 tool (not sure if there is a BLAST+ equivalent). This might be faster and easier, since otherwise you need to learn the programming part too.

If you want to do this within this code I guess the question is what format you want the data in - a BLAST report or something more like a table?  



From jason.stajich at gmail.com  Mon Nov 14 09:07:36 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Mon, 14 Nov 2011 09:07:36 -0500
Subject: [Bioperl-l] Fwd: Fwd: How to get Remote BLAST results in a single
	out
References: <CADm9iynj0NcWOtE4asYimfFBhbTYAFrROLj9qccmoH6o=yKVrg@mail.gmail.com>
Message-ID: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com>

Please keep these discussions on the list.

Sent from my iPhone-please excuse typos

--
Jason Stajich

Begin forwarded message:

> From: José Luis Lavín <jluis.lavin at unavarra.es>
> Date: November 14, 2011 8:04:25 AM EST
> To: Jason Stajich <jason.stajich at gmail.com>
> Subject: Re: [Bioperl-l] Fwd: How to get Remote BLAST results in a single out
> 
> Hello Jason,
> 
> To answer your question:
> 
> " If you want to do this within this code I guess the question is what format you want the data in - a BLAST report or something more like a table?"
> 
> A concatenation of BLAST reports (default format) should be OK, since I have a script to parse that kind of result. Formats 1 or 2 would also do the trick.
> I'd be happy to get help on how to change the output from "one report per query" to "all queries in the same report", because I don't seem to be able to do it myself after reading the module documentation.
> 
> Thanks in advance
> 


From cl134 at duke.edu  Sun Nov 13 09:42:05 2011
From: cl134 at duke.edu (Cheng-Ruei Lee)
Date: Sun, 13 Nov 2011 09:42:05 -0500
Subject: [Bioperl-l] Suggestion for Bio::PopGen::Statistics
Message-ID: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu>

Hi all,

     Bioperl version: 1.006
     Here are two error messages I get when using this module to
calculate Fu & Li's statistics:
Illegal division by zero at (the Statistics.pm file) line 359
Illegal division by zero at (the Statistics.pm file) line 376
     Tracking this down further shows that the first error happens
when $n (sample size in the ingroup) equals 1 or 2, and the second
error happens when $n equals 3. This is not really a bug though. I
would suggest either adding a check to the original code before the
calculation (and skipping the current calculation when $n == 1, 2,
or 3, rather than letting the whole program die), or adding a few
lines of notes to the CPAN page.
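
A minimal sketch of such a check, written as a wrapper in the calling
script rather than as a patch to Statistics.pm. guarded_statistic is a
hypothetical helper, the threshold of 4 simply follows the $n values
above, and the division below only stands in for the real calculation:

```perl
use strict;
use warnings;

# Run a statistic only when the ingroup sample is large enough, and
# turn a fatal error inside the calculation into a warning plus undef.
sub guarded_statistic {
    my ($n, $calc) = @_;
    if ($n < 4) {
        warn "skipping: ingroup sample size $n is too small\n";
        return;
    }
    my $value = eval { $calc->() };
    if ($@) {
        warn "skipping: calculation failed: $@";
        return;
    }
    return $value;
}

# Stand-in calculation that divides by ($n - 3), so it would die
# with "Illegal division by zero" for $n == 3 if it were called.
for my $n (3, 10) {
    my $d = guarded_statistic($n, sub { 1 / ($n - 3) });
    print "n=$n: ", (defined $d ? $d : 'skipped'), "\n";
}
```

In a real script the code reference would wrap the
Bio::PopGen::Statistics call that currently dies.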

Sincerely,
Cheng-Ruei Lee


From joluito at gmail.com  Mon Nov 14 04:21:31 2011
From: joluito at gmail.com (José Luis Lavín)
Date: Mon, 14 Nov 2011 10:21:31 +0100
Subject: [Bioperl-l] How to get Remote BLAST results in a single out
Message-ID: <CADm9iynE1+y2EGyx8NLzZSzj_E81o-a5_==9ghNQ5R0hX3QbAw@mail.gmail.com>

Hello everybody,

I've been using "Bio::Tools::Run::RemoteBlast" for a while and it has
worked fine for me. Now I need to run BLAST searches for multiple
sequences, but this time I'd like to get all the BLAST results in a
single output file instead of having each sequence's report written to
its own file. I've read the module documentation, but because of my
limited experience with complex modules like this one, I can't figure
out where to change the script to achieve this.
Here is the script I've been using (it's basically the one from the
module cookbook).

#!/c:/Perl -w
use strict;
use Bio::Tools::Run::RemoteBlast;
use Bio::SearchIO;
use Data::Dumper;

# Here I set the parameters for BLAST.
# chomp removes the trailing newline that <STDIN> keeps on each answer.
print "Enter your BLAST choice ",
      "(blastn, blastp, blastx, tblastn, tblastx):\n";
chomp(my $prog = <STDIN>);
print "Enter a database to search ",
      "(nr, refseq_protein, swissprot, pat, pdb, env_nr):\n";
chomp(my $db = <STDIN>);
print "Enter your cutoff score (1e-n):\n";
chomp(my $e_val = <STDIN>);

my @params = ( '-prog' => $prog,
         '-data' => $db,
         '-expect' => $e_val,
         '-readmethod' => 'SearchIO' );

my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params);


# Select the file and run the BLAST.
print "Enter your FASTA file:\n";
chomp(my $infile = <STDIN>);
my $r = $remoteBlast->submit_blast($infile);
my $v = 1;    # verbosity: values above 0 print progress and hits

print STDERR "waiting..." if( $v > 0 );  # wait for the results to return
while ( my @rids = $remoteBlast->each_rid ) {
  foreach my $rid ( @rids ) {
    my $rc = $remoteBlast->retrieve_blast($rid);
    if( !ref($rc) ) {
      # no result object yet; a negative code means the request failed
      if( $rc < 0 ) {
        $remoteBlast->remove_rid($rid);
      }
      print STDERR "." if ( $v > 0 );
      sleep 5;
    } else {
      my $result = $rc->next_result();
      # save the output, one report file per query
      my $filename = $result->query_name().".out";
      # open SALIDA, '>>'."$^T"."Report".".out";
      $remoteBlast->save_output($filename);
      $remoteBlast->remove_rid($rid);
      print "\nQuery Name: ", $result->query_name(), "\n";
      while ( my $hit = $result->next_hit ) {
        next unless ( $v > 0);
        print "\thit name is ", $hit->name, "\n";
        while( my $hsp = $hit->next_hsp ) {
          print "\t\tscore is ", $hsp->score, "\n";
        }
      }
    }
  }
}


Could any of you please explain how I can solve this?

Thanks in advance

With best wishes

-- 
Dr. José Luis Lavín Trueba

Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN


From cjfields at illinois.edu  Mon Nov 14 12:02:22 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 14 Nov 2011 17:02:22 +0000
Subject: [Bioperl-l] How to get Remote BLAST results in a single	out
In-Reply-To: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com>
References: <CADm9iynj0NcWOtE4asYimfFBhbTYAFrROLj9qccmoH6o=yKVrg@mail.gmail.com>
	<8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com>
Message-ID: <5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>

Re: a BLAST+ equivalent for blastcl3, I believe there is an option for the various 'blast*' indicating the search is to use a remote database.  I haven't used it, though...

chris



From cjfields at illinois.edu  Mon Nov 14 12:03:04 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 14 Nov 2011 17:03:04 +0000
Subject: [Bioperl-l] Suggestion for Bio::PopGen::Statistics
In-Reply-To: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu>
References: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu>
Message-ID: <E385D24C-E562-43B9-A820-2A7C59E9399A@illinois.edu>

Cheng,

Have you tried the latest CPAN release (we're at 1.006901)?

chris

On Nov 13, 2011, at 8:42 AM, Cheng-Ruei Lee wrote:

> Hi all,
> 
>    Bioperl version: 1.006
>    Here are two error messages when I'm using this module to calculate Fu & Li's statistics:
> Illegal division by zero at (the Statistics.pm file) line 359
> Illegal division by zero at (the Statistics.pm file) line 376
>    A further tracking down shows that the first error happens when $n (sample size in the ingroup) equals 1 or 2, and the second error happens when $n equals 3. This is not really a bug though. I would suggest either in the original code, do a checking before the calculation (and skip the current calculation when $n == 1, 2, or 3 - rather than let the whole program die), or add a few lines of notes in the CPAN page.
> 
> Sincerely,
> Cheng-Ruei Lee
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Mon Nov 14 12:59:35 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 14 Nov 2011 17:59:35 +0000
Subject: [Bioperl-l] OBDA redux?
In-Reply-To: <CAKVJ-_4i8AJL1emBKpnO+p-SVzNtbSVwdL9uSy72NWkHugRtVA@mail.gmail.com>
References: <CAKVJ-_6bzfZZr82y+J4qmGbn0du4rxaKaGxjmbC7p-pU_nMuoQ@mail.gmail.com>
	<FB761CFA-1CFD-4FA0-A708-2CE3F2F240D9@illinois.edu>
	<CAKVJ-_4i8AJL1emBKpnO+p-SVzNtbSVwdL9uSy72NWkHugRtVA@mail.gmail.com>
Message-ID: <12E3B71D-6E61-41AD-A956-A1FC2076AF24@illinois.edu>

On Nov 13, 2011, at 6:24 AM, Peter Cock wrote:

> So, Chris and I seem in general agreement that an OBDA v2
> using SQLite but based on essentially the same approach as
> the BDB or flat file based OBDA v1 is a good idea. i.e. Tables
> mapping record identifiers to file offsets in the original sequence
> files.

The worry I have is adhering to a specific backend (e.g. SQLite).  The reason I say this is because BDB in its time was the go-to way of storing simple index data, but that is no longer feasible for very large data sets.  Who's to say something similar won't happen to SQLite, or that it is the best option available?  

Maybe we should focus on the data storage schema, as simple as it may be, then indicate the default backend must be SQLite but others are allowed (maybe with a mention that SQLite can be replaced by alternatives in the future if needed).  

> I hope to get BioRuby on board, they already have OBDA
> v1 support so that shouldn't be too hard.
> 
> Right now I don't recall if BioJava has/had OBDA v1 support,
> and if they did if it was affected in their recent move to BioJava
> v3 (I understand from their mailing list that some older lower
> priority functionality has not all been ported yet).

I wouldn't be surprised at that, OBDA kind of lingered for a while, and I'm not sure how widely adopted it became (maybe others can shed light on that?)

> Also EMBOSS are likely to be interested, certainly Peter Rice
> was interested in the SQLite indexing we're already using in
> Biopython for sequence files (i.e. what is effectively the
> prototype for OBDA v2).
> 
> Note that in addition to simple indexing of text files, we are
> already using the same simple offset + length approach for
> indexing binary files (e.g. SFF).

I think that's the general idea, that is how all bioperl data was indexed, before with the Bio::Index modules and with the OBDA implementations as well.
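
For anyone new to the idea, here is a minimal sketch of offset + length indexing in plain Perl. It is not the Bio::Index or OBDA code; build_index and fetch_record are made-up names, and a hash stands in for the BDB or SQLite table:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build an in-memory index: record identifier => [byte offset, length].
# A record is a FASTA entry; its identifier is the first word after '>'.
sub build_index {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "cannot open $path: $!";
    my (%index, $current);
    my $offset = 0;
    while (my $line = <$fh>) {
        if ($line =~ /^>(\S+)/) {
            $current = $1;
            $index{$current} = [$offset, 0];
        }
        $index{$current}[1] += length $line if defined $current;
        $offset += length $line;
    }
    close $fh;
    return \%index;
}

# Fetch one raw record with a single seek and read.
sub fetch_record {
    my ($path, $index, $id) = @_;
    my $entry = $index->{$id} or return;
    my ($offset, $length) = @$entry;
    open(my $fh, '<:raw', $path) or die "cannot open $path: $!";
    seek($fh, $offset, 0) or die "cannot seek in $path: $!";
    read($fh, my $record, $length);
    close $fh;
    return $record;
}

# Small demonstration on a temporary FASTA file.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} ">seq1 first\nACGT\nACGT\n>seq2 second\nGGCC\n";
close $out;

my $index = build_index($path);
print fetch_record($path, $index, 'seq2');
```

The same two numbers per record work for binary formats too, since nothing in fetch_record depends on the record being text.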

> On the immediate practical side, I think I can edit the
> current OBDA website of http://obda.open-bio.org/
> via /home/websites/obda.open-bio.org/html on the
> server.

See below w/ regards to my thoughts on the wiki.

> We need to work out where the current OBDA indexing
> specification lives (CVS or SVN?) and perhaps move
> that to GitHub. We may need a general OBF organisation
> account on GitHub for this and any other cross-project
> repositories.

+1 to a move to github, but maybe this belongs in an OBF-specific organization.  And maybe we should take advantage of the simple wiki or project homepage that GitHub offers and move everything (docs) there. 

> I see there is already an OBDA project on RedMine,
> (Chris can you add me to that please?)
> https://redmine.open-bio.org/projects/obda
> 
> Peter

Done (last night actually, but I didn't have time to respond immediately).

chris


From David.Messina at sbc.su.se  Mon Nov 14 14:31:18 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 14 Nov 2011 20:31:18 +0100
Subject: [Bioperl-l] How to get Remote BLAST results in a single	out
In-Reply-To: <5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>
References: <CADm9iynj0NcWOtE4asYimfFBhbTYAFrROLj9qccmoH6o=yKVrg@mail.gmail.com>
	<8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com>
	<5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>
Message-ID: <29C56604-BBEE-4D80-9662-7C3627907200@sbc.su.se>


> Re: a BLAST+ equivalent for blastcl3, I believe there is an option for the various 'blast*' indicating the search is to use a remote database.  I haven't used it, though...

Yes, it's the -remote option. I've used it, and it works great.

The speed is throttled by NCBI, however, so for an appreciable number of queries, the standard advice applies: run the search on your own computers.


Dave

> 


From jluis.lavin at unavarra.es  Mon Nov 14 16:23:31 2011
From: jluis.lavin at unavarra.es (José Luis Lavín)
Date: Mon, 14 Nov 2011 22:23:31 +0100
Subject: [Bioperl-l] How to get Remote BLAST results in a single out
In-Reply-To: <5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>
References: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com>
	<5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>
Message-ID: <CADm9iy=JcWtUp-KvazA=go2V_VMR7N8D92cHCMe5Rg5kzWmZKQ@mail.gmail.com>

Thank you very much for your answers, but judging from them, I'm afraid
I didn't explain myself well enough.

I'm not looking for another tool to perform a BLAST task. I was just
wondering if there is a way to simply change how the module writes its
output, so that I can get multiple searches in a single report file
instead of having a report for each BLAST search.
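
To make the request concrete, here is a minimal sketch of the
behaviour I am after. It assumes save_output keeps writing one report
per call, as in my script; append_report and the file name
all_reports.out are my own inventions, not part of the module, and the
RemoteBlast lines are shown only as comments because I have not tested
them against NCBI:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Append the whole contents of one report file to a combined file.
sub append_report {
    my ($report, $combined) = @_;
    open(my $in,  '<',  $report)   or die "cannot read $report: $!";
    open(my $out, '>>', $combined) or die "cannot append to $combined: $!";
    print {$out} $_ while <$in>;
    close $in;
    close $out or die "cannot close $combined: $!";
}

# Inside the retrieval loop of the script posted earlier in this
# thread, the save_output step would become something like:
#
#   my (undef, $tmp) = tempfile(UNLINK => 1);
#   $remoteBlast->save_output($tmp);          # one report, as before
#   append_report($tmp, 'all_reports.out');   # hypothetical file name

# Stand-alone demonstration with two fake reports.
my (undef, $combined) = tempfile(UNLINK => 1);
for my $text ("report for query 1\n", "report for query 2\n") {
    my ($fh, $tmp) = tempfile(UNLINK => 1);
    print {$fh} $text;
    close $fh;
    append_report($tmp, $combined);
}
open(my $check, '<', $combined) or die "cannot read $combined: $!";
print while <$check>;
close $check;
```

The combined file is then just the per-query reports one after the
other, which is the concatenation my parsing script expects.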

Maybe there's some issue I'm unaware of that makes you recommend
other tools instead of the Bioperl Remote BLAST module. If so, I would
appreciate it if you let me know about it (NCBI server problems with
web services, or similar).

Thank you for your answers anyway

Best wishes

2011/11/14 Fields, Christopher J <cjfields at illinois.edu>

> Re: a BLAST+ equivalent for blastcl3, I believe there is an option for the
> various 'blast*' indicating the search is to use a remote database.  I
> haven't used it, though...
>
> chris
>
> On Nov 14, 2011, at 8:07 AM, Jason Stajich wrote:
>
> > Please keep this on list discussions
> >
> > Sent from my iPhone-please excuse typos
> >
> > --
> > Jason Stajich
> >
> > Begin forwarded message:
> >
> >> From: José Luis Lavín <jluis.lavin at unavarra.es>
> >> Date: November 14, 2011 8:04:25 AM EST
> >> To: Jason Stajich <jason.stajich at gmail.com>
> >> Subject: Re: [Bioperl-l] Fwd: How to get Remote BLAST results in a
> single out
> >>
> >> Hello Jason,
> >>
> >> As answering your question:
> >>
> >> " If you want to do this within this code I guess the question is what
> format you want the data in - a BLAST report or something more like a
> table?"
> >>
> >> A concatenation of BLAST (default format) reports should be OK, since I
> have a script to parse that kind of results. Anyway formats 1 or 2 will
> also do the trick.
> >> I'll be happy to get assistance  on how to change the OUTFILE from "a
> query a report" to "all queries in the same report", because I don't seem
> to be able to do it myself after reading the module documentation.
> >>
> >> Thanks in advance
> >>
> > >> On 14 November 2011 12:59, Jason Stajich <
> jason.stajich at gmail.com> wrote:
> >> if you want to do a bunch of BLASTs remotely on the cmdline you should
> also just use the NCBI's blastcl3 tool (not sure if there is a BLAST+
> equivalent). This might be faster to do and easier since you need to learn
> the programming part too.
> >>
> >> If you want to do this within this code I guess the question is what
> format you want the data in - a BLAST report or something more like a table?
> >>
> > >> On Nov 14, 2011, at 6:14 AM, José Luis Lavín wrote:
> >>
> >>> Hello everybody,
> >>>
> > >>> I've been using "Bio::Tools::Run::RemoteBlast" for a while and it has
> > >>> worked fine for me. Now I need to perform a multiple BLAST search, but this
> > >>> time I'd like to get all the BLAST results in a single output file
> > >>> instead of having each sequence's report written individually. I've read
> > >>> the documentation of the module, but given my limited experience with
> > >>> modules as complex as this one, I can't figure out where to change the
> > >>> script to achieve this.
> > >>> Here is the script I've been using (it's basically the one posted in
> > >>> the module cookbook).
> >>>
> > >>> #!/c:/Perl -w
> > >>> use strict;
> > >>> use Bio::Tools::Run::RemoteBlast;
> > >>> use Bio::SearchIO;
> > >>> use Data::Dumper;
> > >>>
> > >>> # Read the BLAST parameters; chomp strips the trailing newline
> > >>> # that <STDIN> leaves on each answer
> > >>> print "Enter your BLAST choice (blastn, blastp, blastx, tblastn, tblastx):\n";
> > >>> chomp(my $prog = <STDIN>);
> > >>> print "Enter a database to search (nr, refseq_protein, swissprot, pat, pdb, env_nr):\n";
> > >>> chomp(my $db = <STDIN>);
> > >>> print "Enter your cutoff score (1e-n):\n";
> > >>> chomp(my $e_val = <STDIN>);
> > >>>
> > >>> my @params = ( '-prog' => $prog,
> > >>>        '-data' => $db,
> > >>>        '-expect' => $e_val,
> > >>>        '-readmethod' => 'SearchIO' );
> > >>>
> > >>> my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params);
> > >>>
> > >>>
> > >>> # Select the file and submit the BLAST job
> > >>> print "Enter your FASTA file:\n";
> > >>> chomp(my $infile = <STDIN>);
> > >>> my $r = $remoteBlast->submit_blast($infile);
> > >>> my $v = 1;
> > >>>
> > >>>   # Wait for the results to return
> > >>>   print STDERR "waiting..." if( $v > 0 );
> > >>>   while ( my @rids = $remoteBlast->each_rid ) {
> > >>>     foreach my $rid ( @rids ) {
> > >>>       my $rc = $remoteBlast->retrieve_blast($rid);
> > >>>       if( !ref($rc) ) {
> > >>>         if( $rc < 0 ) {
> > >>>           $remoteBlast->remove_rid($rid);
> > >>>         }
> > >>>         print STDERR "." if ( $v > 0 );
> > >>>         sleep 5;
> > >>>       } else {
> > >>>         my $result = $rc->next_result();
> > >>>         # Save the output, one file per query
> > >>>         # (earlier attempt: open SALIDA, '>>'."$^T"."Report".".out";)
> > >>>         my $filename = $result->query_name() . ".out";
> > >>>         $remoteBlast->save_output($filename);
> > >>>         $remoteBlast->remove_rid($rid);
> > >>>         print "\nQuery Name: ", $result->query_name(), "\n";
> > >>>         while ( my $hit = $result->next_hit ) {
> > >>>           next unless ( $v > 0);
> > >>>           print "\thit name is ", $hit->name, "\n";
> > >>>           while( my $hsp = $hit->next_hsp ) {
> > >>>             print "\t\tscore is ", $hsp->score, "\n";
> > >>>           }
> > >>>         }
> > >>>       }
> > >>>     }
> > >>>   }
> >>>
> >>>
> > >>> Could any of you please explain how I can do this?
> > >>>
> > >>> Thanks in advance
> >>>
> >>> With best wishes
> >>>
> >>> --
> >>> --
> > >>> Dr. José Luis Lavín Trueba
> > >>>
> > >>> Dpto. de Producción Agraria
> > >>> Grupo de Genética y Microbiología
> > >>> Universidad Pública de Navarra
> >>> 31006 Pamplona
> >>> Navarra
> >>> SPAIN
> >>>
> >>>
> >>>
> >>> --
> >>> --
> >>> Dr. Jos? Luis Lav?n Trueba
> >>>
> >>> Dpto. de Producci?n Agraria
> >>> Grupo de Gen?tica y Microbiolog?a
> >>> Universidad P?blica de Navarra
> >>> 31006 Pamplona
> >>> Navarra
> >>> SPAIN
> >>>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >>
> >>
> >> --
> >> --
> >> Dr. Jos? Luis Lav?n Trueba
> >>
> >> Dpto. de Producci?n Agraria
> >> Grupo de Gen?tica y Microbiolog?a
> >> Universidad P?blica de Navarra
> >> 31006 Pamplona
> >> Navarra
> >> SPAIN
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
-- 
Dr. José Luis Lavín Trueba

Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN


From jason.stajich at gmail.com  Mon Nov 14 22:53:19 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Mon, 14 Nov 2011 22:53:19 -0500
Subject: [Bioperl-l] Suggestion for Bio::PopGen::Statistics
In-Reply-To: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu>
References: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu>
Message-ID: <0A6DF9A2-F34F-4277-8E84-C3E5351BB3FF@gmail.com>

Sure. As you say, the implementation presumed that more than 3 individuals would be passed to this method, which is a shortcoming. I have committed the code fix but still need someone to add a comment to the perldoc. I've made it a Redmine bug.

https://redmine.open-bio.org/issues/3313
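
For readers following along, the kind of check requested in the report can be sketched as below. This is not the committed fix and not part of Bio::PopGen::Statistics: stat_or_undef is a hypothetical wrapper, and the toy statistic only imitates a formula that divides by zero at n = 3. The threshold of 4 comes from the report quoted below (failures at n = 1, 2 and 3).

```perl
use strict;
use warnings;

# Hypothetical wrapper (not a BioPerl function): run a statistic only when
# the ingroup sample size is large enough, and return undef otherwise so
# the caller can skip that calculation instead of dying.
sub stat_or_undef {
    my ($n, $compute) = @_;
    return if !defined $n || $n < 4;
    return $compute->($n);
}

# Toy statistic that would die with "Illegal division by zero" at n = 3.
my $toy = sub { my $n = shift; return 1 / ($n - 3) };

for my $n (1 .. 5) {
    my $value = stat_or_undef($n, $toy);
    printf "n=%d: %s\n", $n,
        defined $value ? $value : 'skipped (sample too small)';
}
```

Returning undef rather than 0 keeps a skipped calculation distinguishable from a genuine zero result.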

Jason

Can you provide a test script so we can add a test for this?
On Nov 13, 2011, at 9:42 AM, Cheng-Ruei Lee wrote:

> Hi all,
> 
>    Bioperl version: 1.006
>    Here are two error messages when I'm using this module to calculate Fu & Li's statistics:
> Illegal division by zero at (the Statistics.pm file) line 359
> Illegal division by zero at (the Statistics.pm file) line 376
>    Tracking this down further shows that the first error happens when $n (sample size in the ingroup) equals 1 or 2, and the second error happens when $n equals 3. This is not really a bug, though. I would suggest either adding a check to the original code before the calculation (skipping the current calculation when $n == 1, 2, or 3 rather than letting the whole program die) or adding a few lines of notes to the CPAN page.
> 
> Sincerely,
> Cheng-Ruei Lee
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cchehoud at gmail.com  Mon Nov 14 20:39:32 2011
From: cchehoud at gmail.com (Christel Chehoud)
Date: Mon, 14 Nov 2011 17:39:32 -0800
Subject: [Bioperl-l] Bioperl installation help
Message-ID: <CAO28iBQV1iAFpUKwsZ2g-JaDo8Dz3idx=w8DgFu7A1TBOHoDhg@mail.gmail.com>

Dear BioPerl,
Thank you for creating such useful code. Unfortunately, every time I
try to install BioPerl, it takes me a long time and is a challenging
ordeal :( I am a new Mac user and was not able to install BioPerl
using CPAN. Here is the error I am getting:

ERROR: Can't create '/usr/local/bin'
Do not have write permissions on '/usr/local/bin'
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 at /Users/Christel/.cpan/build/BioPerl-1.6.0-DiZ0rN/Bio/Root/Build.pm line 902
  CJFIELDS/BioPerl-1.6.0.tar.gz
  ./Build install  -- NOT OK
----
  You may have to su to root to install the package
  (Or you may want to run something like
    o conf make_install_make_command 'sudo make'
  to raise your permissions.Warning (usually harmless): 'YAML' not
installed, will not store persistent state
Failed during this command:
 CMUNGALL/Data-Stag-0.11.tar.gz               : make NO
 CJFIELDS/BioPerl-1.6.0.tar.gz                : make_test FAILED but
failure ignored because 'force' in effect


so I did:
cpan> o conf make_install_make_command 'sudo make'
followed by
cpan> o conf commit

and started over. I got the same number of errors as last time (so I
decided not to force install this time). Do you have any suggestions?

63 tests and 305 subtests skipped.
Failed 11/329 test scripts. 981/17708 subtests failed.
Files=329, Tests=17708, 127 wallclock secs (104.69 cusr + 12.51 csys =
117.20 CPU)
Failed 11/329 test programs. 981/17708 subtests failed.
  CJFIELDS/BioPerl-1.6.1.tar.gz
  ./Build test -- NOT OK
//hint// to see the cpan-testers results for installing this module, try:
  reports CJFIELDS/BioPerl-1.6.1.tar.gz
Warning (usually harmless): 'YAML' not installed, will not store
persistent state
Running Build install
  make test had returned bad status, won't install without force
Failed during this command:
 CMUNGALL/Data-Stag-0.11.tar.gz               : make NO
 FLORA/ExtUtils-Manifest-1.60.tar.gz          : make NO
 CJFIELDS/BioPerl-1.6.1.tar.gz                : make_test NO


Thanks a lot for your time and help.  I appreciate it.

Thank you,
Christel


From casaburi at ceinge.unina.it  Tue Nov 15 04:25:25 2011
From: casaburi at ceinge.unina.it (Giorgio C)
Date: Tue, 15 Nov 2011 01:25:25 -0800 (PST)
Subject: [Bioperl-l]  Blast > parsing result in Exel
Message-ID: <32846407.post@talk.nabble.com>


Hi everybody,

I have a BLAST (-m 1) result file like this:

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Query= 132-291
(59 letters)

Database: Scrivania/orchidea/mature_mirBase.fa
21,643 sequences; 470,608 total letters

Searching..................................................done


                                                         Score   E
Sequences producing significant alignments:              (bits)  Value

mtr-miR2644b MIMAT0013413 Medicago truncatula miR2644b      28   0.031
mtr-miR2644a MIMAT0013412 Medicago truncatula miR2644a      28   0.031
gga-miR-1704 MIMAT0007596 Gallus gallus miR-1704            22   1.9
gga-miR-1557 MIMAT0007414 Gallus gallus miR-1557            22   1.9
mmu-miR-880-5p MIMAT0017266 Mus musculus miR-880-5p         22   1.9

132_0 8 cagccgctcagattgatggtgcctacagccttgccagcccgctcagattgat 59
12631 5 .............. 18
12630 5 .............. 18
7826 5 ........... 15
7644 19 ........... 9
5394 3 ........... 13
5394 3 ........... 13
BLASTN 2.2.21 [Jun-14-2009]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
...
....
..........

______________________________________________________________
I need to extract the following fields into an Excel sheet:

1)ID 2)Name of the hit 3)E-value 4)Score 5)Species


1) 132-291 2)mir2644b 3) 0,031 4)28 5) Medicago truncatula


Is it possible to obtain, from a big BLAST result file, an Excel sheet with 5 columns where
every row holds the first hit of one BLAST result? Can anyone help me solve
this problem? A little Perl script would be fine too.


Thank you very much
-- 
View this message in context: http://old.nabble.com/Blast-%3E-parsing-result-in-Exel-tp32846407p32846407.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From nisa.dar10 at gmail.com  Tue Nov 15 19:49:00 2011
From: nisa.dar10 at gmail.com (nisa.dar)
Date: Tue, 15 Nov 2011 16:49:00 -0800 (PST)
Subject: [Bioperl-l]  print alignment from blast results file
Message-ID: <32851673.post@talk.nabble.com>


Hi,

I am parsing a blast results file. I have found bioperl modules to get query
string, homology string and hit string for each hit/hsp. I want to print
them in the form of an alignment instead of aligning them individually.

this is what I am doing, but it doesn't seem correct

while (my $hsp = $hit->next_hsp) {
    my $start_query_num = $hsp->start('query');
    my $query_string    = $hsp->query_string;
    my $end_query_num   = $hsp->end('query');
    my $homology_string = $hsp->homology_string;
    my $start_hit_num   = $hsp->start('hit');
    my $hit_string      = $hsp->hit_string;
    my $end_hit_num     = $hsp->end('hit');
    my $aln_o           = $hsp->get_aln;

    # get rid of new line characters
    $query_string    =~ s/\n//g;
    $homology_string =~ s/\n//g;
    $hit_string      =~ s/\n//g;

    print "<h3>Alignment:</h3><br />";
    print "$start_query_num-$query_string-$end_query_num<br />";
    print "&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$homology_string<br />";
    print "$start_hit_num-$hit_string-$end_hit_num<br /><br />";
}

Please let me know how I can print them in the form of an alignment, as seen
in the BLAST results file.

Thanks


-- 
View this message in context: http://old.nabble.com/print-alignment-from-blast-results-file-tp32851673p32851673.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From p.j.a.cock at googlemail.com  Wed Nov 16 04:11:40 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 16 Nov 2011 09:11:40 +0000
Subject: [Bioperl-l] Blast > parsing result in Exel
In-Reply-To: <32846407.post@talk.nabble.com>
References: <32846407.post@talk.nabble.com>
Message-ID: <CAKVJ-_5PTZttkHXS-FB-tOxhDRCty_qJH9PTurDWn2M5p3VzSw@mail.gmail.com>

On Tue, Nov 15, 2011 at 9:25 AM, Giorgio C <casaburi at ceinge.unina.it> wrote:
>
> Hy everybody,
>
> in this situation froma blast (-m 1) result file :
>
> ...
>
> I need to parse in an exel sheet :
>
> 1)ID 2)Name of the hit 3)E-value 4)Score 5)Species
>
>
> 1) 132-291 2)mir2644b 3) 0,031 4)28 5) Medicago truncatula
>
> Is possible from a big blast result file obtain an exel with 5 columns where
> every field is the first hit of the blast result. Can anyone halp me to fix
> this problem ??? Also with a little script in perl.
>
> Thank you very much

Have you looked at any of the BioPerl BLAST parsing examples? e.g
http://www.bioperl.org/wiki/HOWTO:Beginners#BLAST
http://www.bioperl.org/wiki/HOWTO:SearchIO
http://www.bioperl.org/wiki/Module:Bio::SearchIO

See also http://seqanswers.com/forums/showthread.php?t=15489
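
For the specific five-column table requested, here is a minimal plain-Perl sketch that does not use the SearchIO parsers linked above. It rests on an assumption taken from the single example in the post: that every hit line reads "<hit id> <MIMAT accession> <Genus> <species> <miRNA name> <bits> <E-value>". first_hits is a hypothetical helper name. Lines that BLAST truncated, or species names longer than two words, would need a different pattern.

```perl
use strict;
use warnings;

# Sketch only: returns one [query, hit name, E-value, score, species] row
# per query, taken from the first hit line that follows each "Query=" line.
sub first_hits {
    my ($fh) = @_;
    my (@rows, $query, $seen);
    while (my $line = <$fh>) {
        if ($line =~ /^Query=\s*(\S+)/) {
            ($query, $seen) = ($1, 0);
        }
        elsif (defined $query && !$seen
            && $line =~ /^\S+\s+MIMAT\d+\s+(\S+\s+\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$/) {
            # Captures: $1 species, $2 hit name, $3 bit score, $4 E-value
            push @rows, [$query, $2, $4, $3, $1];
            $seen = 1;
        }
    }
    return @rows;
}

# Small in-memory report modelled on the example in the original post.
my $report = <<'END';
Query= 132-291
(59 letters)

Sequences producing significant alignments: (bits) Value

mtr-miR2644b MIMAT0013413 Medicago truncatula miR2644b 28 0.031
mtr-miR2644a MIMAT0013412 Medicago truncatula miR2644a 28 0.031
END

open my $fh, '<', \$report or die "Cannot read report: $!";
print join("\t", qw(ID Hit E-value Score Species)), "\n";
print join("\t", @$_), "\n" for first_hits($fh);
```

The output is tab-separated text, which a spreadsheet program can normally open directly. To run it on a real report, open the file itself instead of the in-memory string.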

Peter


From bosborne11 at verizon.net  Wed Nov 16 08:19:33 2011
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 16 Nov 2011 08:19:33 -0500
Subject: [Bioperl-l] print alignment from blast results file
In-Reply-To: <32851673.post@talk.nabble.com>
References: <32851673.post@talk.nabble.com>
Message-ID: <035404FE-A7D9-413A-B6EF-48564FE76426@verizon.net>

Nisa,

See:

http://www.bioperl.org/wiki/HOWTO:SearchIO

Brian O.


On Nov 15, 2011, at 7:49 PM, nisa.dar wrote:

> 
> Hi,
> 
> I am parsing a blast results file. I have found bioperl modules to get query
> string, homology string and hit string for each hit/hsp. I want to print
> them in the form of an alignment instead of aligning them individually.
> 
> this is what I am doing, but it doesn't seem correct
> 
> while (my $hsp = $hit->next_hsp) {
>                                        my
> $start_query_num=$hsp->start('query');
> 					my $query_string=$hsp->query_string;
> 					my $end_query_num=$hsp->end('query');
> 					my $homology_string=$hsp->homology_string;
> 					my $start_hit_num=$hsp->start('hit');
> 					my $hit_string=$hsp->hit_string;
> 					my $end_hit_num=$hsp->end('hit');
> 					my $aln_o = $hsp->get_aln;
> 					$query_string=~s/\n//g;#get rid of new line characters
> 					$homology_string=~s/\n//g;
> 					$hit_string=~s/\n//g;
> 
>                         print "<h3>Alignment:</h3><br />";
> 			print "$start_query_num-$query_string-$end_query_num<br />";
> 			print "   
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$homology_string<br />";
> 			print "$start_hit_num-$hit_string-$end_hit_num<br /><br />";
> 
> 
> 
> }
> 
> Please let me know how can I print them in the form of an alignment as seen
> in the blast results file.
> 
> Thanks
> 
> 
> -- 
> View this message in context: http://old.nabble.com/print-alignment-from-blast-results-file-tp32851673p32851673.html
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Wed Nov 16 11:44:27 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 16 Nov 2011 16:44:27 +0000
Subject: [Bioperl-l] Bioperl installation help
In-Reply-To: <CAO28iBQV1iAFpUKwsZ2g-JaDo8Dz3idx=w8DgFu7A1TBOHoDhg@mail.gmail.com>
References: <CAO28iBQV1iAFpUKwsZ2g-JaDo8Dz3idx=w8DgFu7A1TBOHoDhg@mail.gmail.com>
Message-ID: <72481F31-3ADB-4E3D-9DBC-714FBEC730E4@illinois.edu>

For some reason you are trying to install an older version of BioPerl; try installing Bio::Perl (or one of the core modules).  This should automatically install the latest version from CPAN.  My guess is this will address some of the issues.  However, w/o actually seeing what tests failed we can't help.

Also, if you are only interested in running local jobs, install BioPerl locally, or just grab the dist and add it to PERL5LIB.  There are instructions in the installation docs for that.  You can also use cpanm (cpanminus) to install things locally as well, it's highly recommended and much easier than cpan.
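
To check which copy of BioPerl (if any) Perl actually picks up after changing PERL5LIB or installing locally, something like the sketch below may help. module_status is a hypothetical helper, not a BioPerl or CPAN function. The module names are only examples from this thread, plus Bio::Root::Version, which I am assuming is where your release keeps its version number.

```perl
use strict;
use warnings;

# Hypothetical helper: try to load a module from the current @INC (which
# includes any directories listed in PERL5LIB) and report the version and
# file that were found. Returns nothing if the module cannot be loaded.
sub module_status {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return unless eval { require $file; 1 };
    my $version = eval { $module->VERSION };
    return { version => $version, path => $INC{$file} };
}

for my $module (qw(Bio::Perl Bio::Root::Version Data::Stag)) {
    my $status = module_status($module);
    if ($status) {
        printf "%-20s %s (%s)\n", $module,
            defined $status->{version} ? $status->{version} : 'no version',
            $status->{path};
    }
    else {
        print "$module not found in \@INC\n";
    }
}
```

If the path printed is not the one you expect, an older installation earlier in @INC is probably shadowing the new one.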

chris

On Nov 14, 2011, at 7:39 PM, Christel Chehoud wrote:

> Dear BioPerl,
> Thank you for creating such useful code. Unfortunately, every time I
> try to install Bioperl, it takes me a long time and is a challenging
> ordeal :( I am a new MAC user and was not able to download bioperl
> using CPAN. Here is the error I am getting:
> 
> ERROR: Can't create '/usr/local/bin'
> Do not have write permissions on '/usr/local/bin'
> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
> at /Users/Christel/.cpan/build/BioPerl-1.6.0-DiZ0rN/Bio/Root/Build.pm line 902
>  CJFIELDS/BioPerl-1.6.0.tar.gz
>  ./Build install  -- NOT OK
> ----
>  You may have to su to root to install the package
>  (Or you may want to run something like
>    o conf make_install_make_command 'sudo make'
>  to raise your permissions.Warning (usually harmless): 'YAML' not
> installed, will not store persistent state
> Failed during this command:
> CMUNGALL/Data-Stag-0.11.tar.gz               : make NO
> CJFIELDS/BioPerl-1.6.0.tar.gz                : make_test FAILED but
> failure ignored because 'force' in effect
> 
> 
> so I did:
> cpan> o conf make_install_make_command 'sudo make'
> followed by
> cpan> o conf commit
> 
> and started over..I got the same number of errors as last time (so I
> decided not to force install this time). do you have any suggestions:
> 
> 63 tests and 305 subtests skipped.
> Failed 11/329 test scripts. 981/17708 subtests failed.
> Files=329, Tests=17708, 127 wallclock secs (104.69 cusr + 12.51 csys =
> 117.20 CPU)
> Failed 11/329 test programs. 981/17708 subtests failed.
>  CJFIELDS/BioPerl-1.6.1.tar.gz
>  ./Build test -- NOT OK
> //hint// to see the cpan-testers results for installing this module, try:
>  reports CJFIELDS/BioPerl-1.6.1.tar.gz
> Warning (usually harmless): 'YAML' not installed, will not store
> persistent state
> Running Build install
>  make test had returned bad status, won't install without force
> Failed during this command:
> CMUNGALL/Data-Stag-0.11.tar.gz               : make NO
> FLORA/ExtUtils-Manifest-1.60.tar.gz          : make NO
> CJFIELDS/BioPerl-1.6.1.tar.gz                : make_test NO
> 
> 
> Thanks a lot for your time and help.  I appreciate it.
> 
> Thank you,
> Christel
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Wed Nov 16 11:46:16 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 16 Nov 2011 16:46:16 +0000
Subject: [Bioperl-l] print alignment from blast results file
In-Reply-To: <035404FE-A7D9-413A-B6EF-48564FE76426@verizon.net>
References: <32851673.post@talk.nabble.com>
	<035404FE-A7D9-413A-B6EF-48564FE76426@verizon.net>
Message-ID: <B7768538-08CE-40A0-8EB9-5EB5169C1072@illinois.edu>

small hint: you can get a Bio::AlignI from the HSP (which can be redirected to a Bio::AlignIO instance).
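
A different route from the one hinted at above, in case plain text is enough: the three strings the HSP already provides can be laid out in BLAST-like blocks by hand. The sketch below is plain Perl and makes no BioPerl calls. format_pairwise is a hypothetical helper, and its coordinate arithmetic assumes a forward-strand alignment with one residue per column (so not translated searches or minus-strand hits).

```perl
use strict;
use warnings;

# Sketch: lay out query, midline and hit strings in fixed-width blocks
# with start/end coordinates, similar to a pairwise BLAST report.
sub format_pairwise {
    my ($q, $mid, $h, $q_start, $h_start, $width) = @_;
    $width ||= 60;
    my ($q_pos, $h_pos) = ($q_start, $h_start);
    my $out = '';
    for (my $i = 0; $i < length($q); $i += $width) {
        my ($qs, $ms, $hs) = map { substr($_, $i, $width) } ($q, $mid, $h);
        # Gap characters ('-') do not consume sequence positions.
        my $q_len = ($qs =~ tr/-//c);
        my $h_len = ($hs =~ tr/-//c);
        $out .= sprintf "Query  %-6d %s  %d\n", $q_pos, $qs, $q_pos + $q_len - 1;
        $out .= sprintf "       %-6s %s\n", '', $ms;
        $out .= sprintf "Sbjct  %-6d %s  %d\n\n", $h_pos, $hs, $h_pos + $h_len - 1;
        $q_pos += $q_len;
        $h_pos += $h_len;
    }
    return $out;
}

# Toy strings standing in for $hsp->query_string, $hsp->homology_string
# and $hsp->hit_string; 8 and 101 stand in for the two start coordinates.
print format_pairwise('ACGT-ACGTA', '|||| ||| |', 'ACGTTACGCA', 8, 101, 5);
```

On a web page the result needs to go inside a <pre> element (or another monospaced, whitespace-preserving container), otherwise the columns will not line up.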

chris

On Nov 16, 2011, at 7:19 AM, Brian Osborne wrote:

> Nisa,
> 
> See:
> 
> http://www.bioperl.org/wiki/HOWTO:SearchIO
> 
> Brian O.
> 
> 
> On Nov 15, 2011, at 7:49 PM, nisa.dar wrote:
> 
>> 
>> Hi,
>> 
>> I am parsing a blast results file. I have found bioperl modules to get query
>> string, homology string and hit string for each hit/hsp. I want to print
>> them in the form of an alignment instead of aligning them individually.
>> 
>> this is what I am doing, but it doesn't seem correct
>> 
>> while (my $hsp = $hit->next_hsp) {
>>                                       my
>> $start_query_num=$hsp->start('query');
>> 					my $query_string=$hsp->query_string;
>> 					my $end_query_num=$hsp->end('query');
>> 					my $homology_string=$hsp->homology_string;
>> 					my $start_hit_num=$hsp->start('hit');
>> 					my $hit_string=$hsp->hit_string;
>> 					my $end_hit_num=$hsp->end('hit');
>> 					my $aln_o = $hsp->get_aln;
>> 					$query_string=~s/\n//g;#get rid of new line characters
>> 					$homology_string=~s/\n//g;
>> 					$hit_string=~s/\n//g;
>> 
>>                        print "<h3>Alignment:</h3><br />";
>> 			print "$start_query_num-$query_string-$end_query_num<br />";
>> 			print "   
>> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$homology_string<br />";
>> 			print "$start_hit_num-$hit_string-$end_hit_num<br /><br />";
>> 
>> 
>> 
>> }
>> 
>> Please let me know how can I print them in the form of an alignment as seen
>> in the blast results file.
>> 
>> Thanks
>> 
>> 
>> -- 
>> View this message in context: http://old.nabble.com/print-alignment-from-blast-results-file-tp32851673p32851673.html
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From David.Messina at sbc.su.se  Wed Nov 16 12:01:49 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 16 Nov 2011 18:01:49 +0100
Subject: [Bioperl-l] Bioperl installation help
In-Reply-To: <CAO28iBQV1iAFpUKwsZ2g-JaDo8Dz3idx=w8DgFu7A1TBOHoDhg@mail.gmail.com>
References: <CAO28iBQV1iAFpUKwsZ2g-JaDo8Dz3idx=w8DgFu7A1TBOHoDhg@mail.gmail.com>
Message-ID: <CAM3TQQWDJ1_HPrAUguFfH5ngV42WeUOvXE6N2GktgmeTFs=ijw@mail.gmail.com>

Hi Christel,

Sorry to hear you're having trouble with the installation.

It looks like these modules aren't getting installed and are causing the
failed tests:
CMUNGALL/Data-Stag-0.11.tar.gz               : make NO
FLORA/ExtUtils-Manifest-1.60.tar.gz          : make NO

I would try installing those separately via CPAN first and then trying
again to install BioPerl.

Also, it was a good idea to set the make_install_make_command option to
CPAN, and that should have worked. Unfortunately, there's another
installation system called Module::Build that has its own option which may
need to be set:
cpan> o conf mbuild_install_build_command 'sudo ./Build'


That being said, I would suggest you grab the latest version of BioPerl
from github instead of using v1.6.1 from CPAN, which is fairly out of date
at this point.

And unless you're planning to use BioPerl with GBrowse or Bio::Graphics,
there's another, simpler way to get BioPerl up and running (assuming you
have all the prerequisites like Data::Stag installed):

See "Don't want to install BioPerl?" here:
http://www.seqxml.org/xml/BioPerl.html


Best,
Dave


On Tue, Nov 15, 2011 at 02:39, Christel Chehoud <cchehoud at gmail.com> wrote:

> Dear BioPerl,
> Thank you for creating such useful code. Unfortunately, every time I
> try to install Bioperl, it takes me a long time and is a challenging
> ordeal :( I am a new MAC user and was not able to download bioperl
> using CPAN. Here is the error I am getting:
>
> ERROR: Can't create '/usr/local/bin'
> Do not have write permissions on '/usr/local/bin'
> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>  at /Users/Christel/.cpan/build/BioPerl-1.6.0-DiZ0rN/Bio/Root/Build.pm
> line 902
>  CJFIELDS/BioPerl-1.6.0.tar.gz
>  ./Build install  -- NOT OK
> ----
>  You may have to su to root to install the package
>  (Or you may want to run something like
>    o conf make_install_make_command 'sudo make'
>  to raise your permissions.Warning (usually harmless): 'YAML' not
> installed, will not store persistent state
> Failed during this command:
>  CMUNGALL/Data-Stag-0.11.tar.gz               : make NO
>  CJFIELDS/BioPerl-1.6.0.tar.gz                : make_test FAILED but
> failure ignored because 'force' in effect
>
>
> so I did:
> cpan> o conf make_install_make_command 'sudo make'
> followed by
> cpan> o conf commit
>
> and started over..I got the same number of errors as last time (so I
> decided not to force install this time). do you have any suggestions:
>
> 63 tests and 305 subtests skipped.
> Failed 11/329 test scripts. 981/17708 subtests failed.
> Files=329, Tests=17708, 127 wallclock secs (104.69 cusr + 12.51 csys =
> 117.20 CPU)
> Failed 11/329 test programs. 981/17708 subtests failed.
>  CJFIELDS/BioPerl-1.6.1.tar.gz
>  ./Build test -- NOT OK
> //hint// to see the cpan-testers results for installing this module, try:
>  reports CJFIELDS/BioPerl-1.6.1.tar.gz
> Warning (usually harmless): 'YAML' not installed, will not store
> persistent state
> Running Build install
>  make test had returned bad status, won't install without force
> Failed during this command:
>  CMUNGALL/Data-Stag-0.11.tar.gz               : make NO
>  FLORA/ExtUtils-Manifest-1.60.tar.gz          : make NO
>  CJFIELDS/BioPerl-1.6.1.tar.gz                : make_test NO
>
>
> Thanks a lot for your time and help.  I appreciate it.
>
> Thank you,
> Christel
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From jluis.lavin at unavarra.es  Wed Nov 16 13:31:46 2011
From: jluis.lavin at unavarra.es (José Luis Lavín)
Date: Wed, 16 Nov 2011 19:31:46 +0100
Subject: [Bioperl-l] How to get Remote BLAST results in a single out
In-Reply-To: <CALf8LpwFrv2jWMm35nTaC88atO6yrSbGza9j0TyTZTzBtxaCxw@mail.gmail.com>
References: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com>
	<CADm9iy=JcWtUp-KvazA=go2V_VMR7N8D92cHCMe5Rg5kzWmZKQ@mail.gmail.com>
	<CALf8LpwFrv2jWMm35nTaC88atO6yrSbGza9j0TyTZTzBtxaCxw@mail.gmail.com>
Message-ID: <CADm9iy=mMqHhWO5rTXbJS4ZG8aG-t0mAVHqN720tnyA7Hy_nkg@mail.gmail.com>

Thank you for your answer Jason,

While answering you I figured out how to do it...sometimes you need other
people's point of view to see the light.

As you pointed out:

"what is complicated is that the file name right now is based on the query
name."

That is the part I hoped would have an easy fix. The dependency between the
output file name and the query name is why I
couldn't figure out how to change the name of the output.

While reading the code to answer you, I came across the solution.
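
For the archive, the idea in Jason's reply quoted below (open the output file once, outside the loop, under a name that does not depend on the query) can be sketched in plain Perl like this. write_combined is a hypothetical helper and the report strings are stand-ins for the text retrieved per query. The sketch makes no RemoteBlast calls, so it shows only the file handling, not the retrieval.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Sketch: open the combined report once, then append each query's report
# text inside the loop. Returns the number of reports written.
sub write_combined {
    my ($path, @reports) = @_;
    open my $out, '>', $path or die "Cannot write $path: $!";
    for my $report (@reports) {
        print {$out} $report;
        print {$out} "\n" unless $report =~ /\n\z/;
    }
    close $out or die "Cannot close $path: $!";
    return scalar @reports;
}

# One file name for the whole run, based on the script start time ($^T)
# instead of on each query name. A temporary directory keeps the demo tidy.
my $dir  = tempdir(CLEANUP => 1);
my $file = $dir . '/' . $^T . 'Report.out';
my $n = write_combined($file,
    "BLASTN report for query 1\n",
    "BLASTN report for query 2\n");
print "Wrote $n reports to $file\n";
```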

I persisted in doing it this way because I need to run BLAST remotely
from my CGI script, which is why I didn't pursue the other options you
suggested. Thank you all for your suggestions anyway.

;)

Best wishes

JL


On 16 November 2011 18:03, Jason Stajich <jason at bioperl.org> wrote:

> The answer to your question is to move the line that opens a file to
> outside the loop. What is complicated is that the file name right now is
> based on the query name, so you need to think about how you want to name the
> file. Since this isn't obvious to you, then I think we are suggesting you
> probably need to understand programming more, and it might just be easier
> to use the tools as we have suggested rather than teaching you to modify
> what is just an example code.  our suggestions are based on the way we'd
> solve the problem so maybe you have other reasons for the direction you
> want to take.
>
> I also think it is not efficient or logical to run
> remote blast through the web protocol simply to write it back out with
> bioperl since that has to parse it in and then write it out -- why not just
> run the program that generates the output directly from NCBI. Or run BLAST
> locally for likely more efficient running.
>
>  Finally the bioperl writer may not 100% reproduce the blast output so if
> you are planning on further parsing the output that comes out from this
> script, it really doesn't seem like a good idea to launder it through
> bioperl parser first.
>
>
>
> 2011/11/14 José Luis Lavín <jluis.lavin at unavarra.es>
>
>> Thank you very much for your answers, but judging by them, I'm afraid I didn't
>> explain myself well enough.
>>
>>  I'm not looking for another tool to perform a BLAST task. I was just
>> wondering if there was a way to simply change the way the module writes
>> the
>> outputs, so that I can get multiple searches in a single report file
>> instead of having a report for each BLAST search.
>>
>> Maybe there's some issue I ignore, that makes you recommend the use of
>> other tools instead of the Bioperl Remote BLAST module...it would be
>> appreciated if you let me know about that (NCBI server problems with
>> web-services or so)...
>>
>> Thank you for your answers anyway
>>
>> Best wishes
>>
>> 2011/11/14 Fields, Christopher J <cjfields at illinois.edu>
>>
>> > Re: a BLAST+ equivalent for blastcl3, I believe there is an option for
>> the
>> > various 'blast*' indicating the search is to use a remote database.  I
>> > haven't used it, though...
>> >
>> > chris
>> >
>> > On Nov 14, 2011, at 8:07 AM, Jason Stajich wrote:
>> >
>> > > Please keep this on list discussions
>> > >
>> > > Sent from my iPhone-please excuse typos
>> > >
>> > > --
>> > > Jason Stajich
>> > >
>> > > Begin forwarded message:
>> > >
>> > >> From: José Luis Lavín <jluis.lavin at unavarra.es>
>> > >> Date: November 14, 2011 8:04:25 AM EST
>> > >> To: Jason Stajich <jason.stajich at gmail.com>
>> > >> Subject: Re: [Bioperl-l] Fwd: How to get Remote BLAST results in a
>> > single out
>> > >>
>> > >> Hello Jason,
>> > >>
>> > >> As answering your question:
>> > >>
>> > >> " If you want to do this within this code I guess the question is
>> what
>> > format you want the data in - a BLAST report or something more like a
>> > table?"
>> > >>
>> > >> A concatenation of BLAST (default format) reports should be OK,
>> since I
>> > have a script to parse that kind of results. Anyway formats 1 or 2 will
>> > also do the trick.
>> > >> I'll be happy to get assistance  on how to change the OUTFILE from "a
>> > query a report" to "all queries in the same report", because I don't
>> seem
>> > to be able to do it myself after reading the module documentation.
>> > >>
>> > >> Thanks in advance
>> > >>
>> > >> On 14 November 2011 12:59, Jason Stajich <
>> > jason.stajich at gmail.com> wrote:
>> > >> if you want to do a bunch of BLASTs remotely on the cmdline you
>> should
>> > also just use the NCBI's blastcl3 tool (not sure if there is a BLAST+
>> > equivalent). This might be faster to do and easier since you need to
>> learn
>> > the programming part too.
>> > >>
>> > >> If you want to do this within this code I guess the question is what
>> > format you want the data in - a BLAST report or something more like a
>> table?
>> > >>
>> > >> On Nov 14, 2011, at 6:14 AM, José Luis Lavín wrote:
>> > >>
>> > >>> Hello everybody,
>> > >>>
>> > >>> I've been using "Bio::Tools::Run::RemoteBlast" for a while and it has
>> > >>> worked fine for me. Now I need to perform a multiple BLAST search, but
>> > >>> this time I'd like to get all the BLAST results in a single output file
>> > >>> instead of having each sequence's report written individually. I've
>> > >>> read the documentation of the module, but due to my limited experience
>> > >>> with complex modules such as this one, I can't figure out where to
>> > >>> change the script to achieve this.
>> > >>> Here is the script I've been using (it's basically the one posted in
>> > >>> the module cookbook).
>> > >>>
>> > >>> #!/c:/Perl -w
>> > >>> use Bio::Tools::Run::RemoteBlast;
>> > >>> use Bio::SearchIO;
>> > >>> use Data::Dumper;
>> > >>>
>> > >>> #Here i set the parameters for blast
>> > >>> print "Enter your BLAST choice (blastn, blastp, blastx, tblastn, tblastx):\n";
>> > >>> chomp(my $prog = <STDIN>);   # chomp removes the trailing newline
>> > >>> print "Enter a database to search (nr, refseq_protein, swissprot, pat, pdb, env_nr):\n";
>> > >>> chomp(my $db = <STDIN>);
>> > >>> print "Enter your cutoff score (1e-n):\n";
>> > >>> chomp(my $e_val = <STDIN>);
>> > >>>
>> > >>> my @params = ( '-prog' => $prog,
>> > >>>        '-data' => $db,
>> > >>>        '-expect' => $e_val,
>> > >>>        '-readmethod' => 'SearchIO' );
>> > >>>
>> > >>> my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params);
>> > >>>
>> > >>>
>> > >>> #Select the file and make the blast.
>> > >>> print "Enter your FASTA file:\n";
>> > >>> chomp(my $infile = <STDIN>);
>> > >>> my $r = $remoteBlast->submit_blast($infile);
>> > >>> my $v = 1;
>> > >>>
>> > >>>   # Wait for the results to return
>> > >>>   print STDERR "waiting..." if( $v > 0 );
>> > >>>   while ( my @rids = $remoteBlast->each_rid ) {
>> > >>>     foreach my $rid ( @rids ) {
>> > >>>       my $rc = $remoteBlast->retrieve_blast($rid);
>> > >>>       if( !ref($rc) ) {
>> > >>>         if( $rc < 0 ) {
>> > >>>           $remoteBlast->remove_rid($rid);
>> > >>>         }
>> > >>>         print STDERR "." if ( $v > 0 );
>> > >>>         sleep 5;
>> > >>>       } else {
>> > >>>         my $result = $rc->next_result();
>> > >>>         #save the output
>> > >>>         my $filename = $result->query_name() . ".out";
>> > >>>         # open SALIDA, '>>'."$^T"."Report"."\.out";
>> > >>>         $remoteBlast->save_output($filename);
>> > >>>         $remoteBlast->remove_rid($rid);
>> > >>>         print "\nQuery Name: ", $result->query_name(), "\n";
>> > >>>         while ( my $hit = $result->next_hit ) {
>> > >>>           next unless ( $v > 0);
>> > >>>           print "\thit name is ", $hit->name, "\n";
>> > >>>           while( my $hsp = $hit->next_hsp ) {
>> > >>>             print "\t\tscore is ", $hsp->score, "\n";
>> > >>>           }
>> > >>>         }
>> > >>>       }
>> > >>>     }
>> > >>>   }
>> > >>>
>> > >>>
>> > >>> Could any of you please explain how to solve this?
>> > >>>
>> > >>> Thanks in advance
>> > >>>
>> > >>> With best wishes
>> > >>>
>> > >>> --
>> > >>> --
>> > >>> Dr. José Luis Lavín Trueba
>> > >>>
>> > >>> Dpto. de Producción Agraria
>> > >>> Grupo de Genética y Microbiología
>> > >>> Universidad Pública de Navarra
>> > >>> 31006 Pamplona
>> > >>> Navarra
>> > >>> SPAIN
>> > >>>
>> > >>> _______________________________________________
>> > >>> Bioperl-l mailing list
>> > >>> Bioperl-l at lists.open-bio.org
>> > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l


-- 
-- 
Dr. José Luis Lavín Trueba

Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN


From l.m.timmermans at students.uu.nl  Fri Nov 18 09:15:47 2011
From: l.m.timmermans at students.uu.nl (L.M. Timmermans)
Date: Fri, 18 Nov 2011 15:15:47 +0100
Subject: [Bioperl-l] Blast > parsing result in Exel
In-Reply-To: <32846407.post@talk.nabble.com>
References: <32846407.post@talk.nabble.com>
Message-ID: <CAC1jpXC7uBtbHb_ixzMy2idvfeFQc1Y=d8Zi3xn_=0RyGYTzrA@mail.gmail.com>

On Tue, Nov 15, 2011 at 10:25 AM, Giorgio C <casaburi at ceinge.unina.it> wrote:

> I need to parse in an Excel sheet:
>

What you're saying here is nonsense. I think you meant to say you want to
output Excel.


> Is it possible to obtain, from a big BLAST result file, an Excel sheet
> with 5 columns where every field is the first hit of the BLAST result?
> Can anyone help me to fix this problem? A little script in Perl would
> also do.
>

There are a number of Perl modules on CPAN for outputting Excel. Try
Excel::Writer::XLSX and Spreadsheet::WriteExcel for example.
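
A minimal sketch of how that could look, assuming the report is in the default
BLAST text format and that the five wanted columns are query name, hit name,
hit description, E-value and bit score (the original post does not say which
columns are needed, so this choice is a guess). The sub name
first_hits_to_xlsx and the worksheet name are made up for illustration;
Bio::SearchIO does the parsing and Excel::Writer::XLSX the writing.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SearchIO;
use Excel::Writer::XLSX;

# Write the first hit of every query in a BLAST report to one worksheet.
# Returns the number of data rows written.
sub first_hits_to_xlsx {
    my ($blast_file, $xlsx_file) = @_;

    my $searchio = Bio::SearchIO->new(-format => 'blast', -file => $blast_file);
    my $workbook = Excel::Writer::XLSX->new($xlsx_file)
        or die "Cannot create $xlsx_file: $!";
    my $sheet = $workbook->add_worksheet('first_hits');

    # Header row; these five columns are an assumed choice.
    $sheet->write_row(0, 0,
        ['Query', 'Hit', 'Description', 'E-value', 'Bit score']);

    my $row = 1;
    while (my $result = $searchio->next_result) {
        my $hit = $result->next_hit or next;    # skip queries without hits
        $sheet->write_row($row++, 0, [
            $result->query_name, $hit->name, $hit->description,
            $hit->significance,  $hit->bits,
        ]);
    }
    $workbook->close;
    return $row - 1;
}

# Run only when both file names are supplied on the command line.
if (@ARGV == 2) {
    my $n = first_hits_to_xlsx(@ARGV);
    print "Wrote $n rows to $ARGV[1]\n";
}
```

Saved under a hypothetical name such as first_hits.pl, it would be run as
"perl first_hits.pl report.blast first_hits.xlsx".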

Leon


From tzhu at mail.bnu.edu.cn  Mon Nov 21 00:17:18 2011
From: tzhu at mail.bnu.edu.cn (Tao Zhu)
Date: Mon, 21 Nov 2011 13:17:18 +0800
Subject: [Bioperl-l] Is there a "combine" method that would combine several
 sequence alignments to a single alignment?
Message-ID: <4EC9DEDE.6030901@mail.bnu.edu.cn>

I can use the "slice" method to split a single sequence alignment into 
several subalignments. Then is there a corresponding "combine" method to 
combine such subalignments back?

-- 
Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing
100875, China
Email: tzhu at mail.bnu.edu.cn


From David.Messina at sbc.su.se  Mon Nov 21 04:58:51 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 21 Nov 2011 10:58:51 +0100
Subject: [Bioperl-l] Is there a "combine" method that would combine
 several sequence alignments to a single alignment?
In-Reply-To: <4EC9DEDE.6030901@mail.bnu.edu.cn>
References: <4EC9DEDE.6030901@mail.bnu.edu.cn>
Message-ID: <CAM3TQQVy1jPUXB+cZcEXEm9RQeBB=VsZ5aCFX9bO=WJSaq-VYg@mail.gmail.com>

Hi,

No, I don't believe such a method exists. Could you describe what you are
wanting to do? Perhaps there is another way to do it.


Dave




From roy.chaudhuri at gmail.com  Mon Nov 21 06:41:09 2011
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Mon, 21 Nov 2011 11:41:09 +0000
Subject: [Bioperl-l] Is there a "combine" method that would combine
 several sequence alignments to a single alignment?
In-Reply-To: <CAM3TQQVy1jPUXB+cZcEXEm9RQeBB=VsZ5aCFX9bO=WJSaq-VYg@mail.gmail.com>
References: <4EC9DEDE.6030901@mail.bnu.edu.cn>
	<CAM3TQQVy1jPUXB+cZcEXEm9RQeBB=VsZ5aCFX9bO=WJSaq-VYg@mail.gmail.com>
Message-ID: <4ECA38D5.8050709@gmail.com>

See the cat method in Bio::Align::Utilities:

http://search.cpan.org/~cjfields/BioPerl-1.6.901/Bio/Align/Utilities.pm#cat
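
A small sketch of a slice/cat round trip, assuming a BioPerl version that
ships cat (1.6.901, linked above, does). The two-sequence alignment is made-up
data read from an in-memory string so that the example needs no input file,
and cat is called by its full package name, which should work whether or not
it has been imported.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::AlignIO;
use Bio::Align::Utilities;

# Made-up two-sequence alignment in FASTA format (10 columns).
my $fasta = ">a\nACGT--ACGT\n>b\nACGTTTACGT\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory alignment: $!";
my $aln = Bio::AlignIO->new(-fh => $fh, -format => 'fasta')->next_aln;

# Split into two column ranges (1-based, inclusive) ...
my $left  = $aln->slice(1, 4);
my $right = $aln->slice(5, 10);

# ... and join them again; sequences are matched up by id.
my $merged = Bio::Align::Utilities::cat($left, $right);

for my $seq ($merged->each_seq) {
    print $seq->id, "\t", $seq->seq, "\n";
}
```

Because sequences are paired by id, every subalignment passed to cat should
use the same sequence ids as the first one.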



From zntayl at gmail.com  Wed Nov 16 20:07:07 2011
From: zntayl at gmail.com (Nathan Taylor)
Date: Wed, 16 Nov 2011 20:07:07 -0500
Subject: [Bioperl-l] seqIO.pm
Message-ID: <CAB_EPVxHq0=HkB5MCVcAHzchtLZVJ8-c4mQz=yDkzVtvn2Gb_A@mail.gmail.com>

Hello,

   Can SeqIO.pm convert a file of FASTQ reads into .phd files? Or,
barring that, a FASTA file plus a qual file into .phd files?

Many thanks,
Nathan


From gregonomic at yahoo.co.nz  Mon Nov 21 07:00:50 2011
From: gregonomic at yahoo.co.nz (Gregory Baillie)
Date: Mon, 21 Nov 2011 04:00:50 -0800 (PST)
Subject: [Bioperl-l] Is there a "combine" method that would combine
	several sequence alignments to a single alignment?
In-Reply-To: <CAM3TQQVy1jPUXB+cZcEXEm9RQeBB=VsZ5aCFX9bO=WJSaq-VYg@mail.gmail.com>
References: <4EC9DEDE.6030901@mail.bnu.edu.cn>
	<CAM3TQQVy1jPUXB+cZcEXEm9RQeBB=VsZ5aCFX9bO=WJSaq-VYg@mail.gmail.com>
Message-ID: <1321876850.71978.YahooMailNeo@web112508.mail.gq1.yahoo.com>

Hi.

I've attached a simple script (concatenate_alignments.pl) I wrote to concatenate alignments.

It can be a bit of a memory hog if you have long alignments or large numbers of sequences; otherwise you should be OK.

Usage:
concatenate_alignments.pl -o <output_alignment> <input_alignment_1> <input_alignment_2> <... input_alignment_n>


If you want to insert a string between the concatenated sequences, you can use the -j option (e.g. -j '---').

Greg.


-------------- next part --------------
A non-text attachment was scrubbed...
Name: concatenate_alignments.pl
Type: application/octet-stream
Size: 3349 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20111121/aa673dba/attachment-0002.obj>

From jason.stajich at gmail.com  Mon Nov 21 10:31:50 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Mon, 21 Nov 2011 10:31:50 -0500
Subject: [Bioperl-l] Is there a "combine" method that would combine
	several sequence alignments to a single alignment?
In-Reply-To: <1321876850.71978.YahooMailNeo@web112508.mail.gq1.yahoo.com>
References: <4EC9DEDE.6030901@mail.bnu.edu.cn>
	<CAM3TQQVy1jPUXB+cZcEXEm9RQeBB=VsZ5aCFX9bO=WJSaq-VYg@mail.gmail.com>
	<1321876850.71978.YahooMailNeo@web112508.mail.gq1.yahoo.com>
Message-ID: <39ECA743-8C56-4B23-8813-40EEEAB7DBB1@gmail.com>

Greg, this looks good. You could simplify part of the code by using the .= operator and by using AlignIO to write the seqs out.

This is my script to combine a directory of MSA aligned .fasaln files into a single concatenated alignment.

https://github.com/hyphaltip/genome-scripts/blob/master/phylogenetics/combine_fasaln.pl



From p.j.a.cock at googlemail.com  Mon Nov 21 11:15:13 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 21 Nov 2011 16:15:13 +0000
Subject: [Bioperl-l] seqIO.pm
In-Reply-To: <CAB_EPVxHq0=HkB5MCVcAHzchtLZVJ8-c4mQz=yDkzVtvn2Gb_A@mail.gmail.com>
References: <CAB_EPVxHq0=HkB5MCVcAHzchtLZVJ8-c4mQz=yDkzVtvn2Gb_A@mail.gmail.com>
Message-ID: <CAKVJ-_6wWxCGKizRZuum5FFYOtvwOqqTF9D8YzGJY5RqKhW17g@mail.gmail.com>

On Thu, Nov 17, 2011 at 1:07 AM, Nathan Taylor <zntayl at gmail.com> wrote:
> Hello,
>
>   Can SeqIO.pm convert a file of fastq reads into .phd files. Or,
> barring that, a file of fastas and file of quals into .phd files?
>
> Many thanks,
> Nathan

In principle that is possible (e.g. Biopython can do fastq to phd).
Have you tried using BioPerl's SeqIO to do this? Was there an
error message?

Peter


From cjfields at illinois.edu  Mon Nov 21 11:57:29 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 21 Nov 2011 16:57:29 +0000
Subject: [Bioperl-l] seqIO.pm
In-Reply-To: <CAKVJ-_6wWxCGKizRZuum5FFYOtvwOqqTF9D8YzGJY5RqKhW17g@mail.gmail.com>
References: <CAB_EPVxHq0=HkB5MCVcAHzchtLZVJ8-c4mQz=yDkzVtvn2Gb_A@mail.gmail.com>
	<CAKVJ-_6wWxCGKizRZuum5FFYOtvwOqqTF9D8YzGJY5RqKhW17g@mail.gmail.com>
Message-ID: <2E075A8F-92F9-4A04-9254-EF4C07793A7C@illinois.edu>

On Nov 21, 2011, at 10:15 AM, Peter Cock wrote:

> On Thu, Nov 17, 2011 at 1:07 AM, Nathan Taylor <zntayl at gmail.com> wrote:
>> Hello,
>> 
>>   Can SeqIO.pm convert a file of fastq reads into .phd files. Or,
>> barring that, a file of fastas and file of quals into .phd files?
>> 
>> Many thanks,
>> Nathan
> 
> In principle that is possible (e.g. Biopython can do fastq to phd).
> Have you tried using BioPerl's SeqIO to do this? Was there an
> error message?
> 
> Peter

This should be possible in either case (FASTQ should be more straightforward); there is a Bio::SeqIO::phd module for this purpose.  Nathan, if you run into problems with that conversion, let us know.

chris


From rondonbio at yahoo.com.br  Mon Nov 21 12:31:21 2011
From: rondonbio at yahoo.com.br (Rondon Neto)
Date: Mon, 21 Nov 2011 09:31:21 -0800 (PST)
Subject: [Bioperl-l] seqIO.pm
In-Reply-To: <CAKVJ-_6wWxCGKizRZuum5FFYOtvwOqqTF9D8YzGJY5RqKhW17g@mail.gmail.com>
References: <CAB_EPVxHq0=HkB5MCVcAHzchtLZVJ8-c4mQz=yDkzVtvn2Gb_A@mail.gmail.com>
	<CAKVJ-_6wWxCGKizRZuum5FFYOtvwOqqTF9D8YzGJY5RqKhW17g@mail.gmail.com>
Message-ID: <1321896681.57079.YahooMailNeo@web130205.mail.mud.yahoo.com>

Hi! Try this script:

#!/usr/bin/perl
use warnings;
use strict;
use Bio::SeqIO;

if (!$ARGV[0]) { die "\n### USAGE::: perl $0 [file.fastq]\n"; }

my $fastq = $ARGV[0];

my $in = Bio::SeqIO->new( -file   => $fastq,
                          -format => 'fastq' );

my $out = Bio::SeqIO->new( -file   => ">out.phd",
                           -format => 'phd' );

while (my $seq = $in->next_seq()) {
    $out->write_seq($seq);
}

exit;


Best wishes,
Rondon, a Brazilian friend.




From Russell.Smithies at agresearch.co.nz  Mon Nov 21 15:04:01 2011
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 22 Nov 2011 09:04:01 +1300
Subject: [Bioperl-l] seqIO.pm
In-Reply-To: <1321896681.57079.YahooMailNeo@web130205.mail.mud.yahoo.com>
References: <CAB_EPVxHq0=HkB5MCVcAHzchtLZVJ8-c4mQz=yDkzVtvn2Gb_A@mail.gmail.com>
	<CAKVJ-_6wWxCGKizRZuum5FFYOtvwOqqTF9D8YzGJY5RqKhW17g@mail.gmail.com>
	<1321896681.57079.YahooMailNeo@web130205.mail.mud.yahoo.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF340186CF1AB@exchsth.agresearch.co.nz>

Or you could use the built-in script bp_sreformat.pl.

--Russell



From goodyearkl at gmail.com  Mon Nov 21 21:23:13 2011
From: goodyearkl at gmail.com (Kylie Goodyear)
Date: Mon, 21 Nov 2011 18:23:13 -0800 (PST)
Subject: [Bioperl-l] Fasta counting script?
Message-ID: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>

Hi,
This may seem like a stupid question, but I am just learning BioPerl
and I am trying to figure out how to get a count of all the characters
in my FASTA file. I managed to get the number of sequences using the
following. Is there a way to tell BioPerl to count the characters?

#!/usr/bin/perl -w
#Defines perl modules
#Bio::Seq deal with sequences and their features
use Bio::Seq;
#Bio::SeqIO handles reading and parsing of sequences of many different formats
use Bio::SeqIO;
#Read FASTA file
my $seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format => "fasta" );
#Count how many sequences are present in file
my $count=0;
while (my $seq_obj = $seqio_obj->next_seq) {
    $count++;
}
#Display the number of sequences present
print "There are $count sequences present.\n";


From David.Messina at sbc.su.se  Tue Nov 22 03:08:11 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 22 Nov 2011 09:08:11 +0100
Subject: [Bioperl-l] Fasta counting script?
In-Reply-To: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
Message-ID: <CAM3TQQXmB-QHRgp6JdNJFb9a--SAQ0CpjmFCtwhNC3JARThJsw@mail.gmail.com>

Hi Kylie,

You can use the length method for this.

my $seq_length = $seq_obj->length();

Have you taken a look at the beginner's HOWTO? There's a nice table of
sequence methods as well as lots of other good information in there.

http://www.bioperl.org/wiki/HOWTO:Beginners


Dave




From liam.elbourne at mq.edu.au  Mon Nov 21 23:11:12 2011
From: liam.elbourne at mq.edu.au (Liam Elbourne)
Date: Tue, 22 Nov 2011 15:11:12 +1100
Subject: [Bioperl-l] Fasta counting script?
In-Reply-To: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
Message-ID: <EEEBBE60-96CB-4458-A460-F154CCC7459D@mq.edu.au>

Hi Kylie,

I think the length() method is what you're after:

....
my $sequence_length = $seq_obj->length();

....

in your case. Have a look at:

HOWTO:SeqIO - BioPerl

and,

HOWTO:Beginners - BioPerl

for some more general stuff.


Regards,
Liam.



From goodyearkl at gmail.com  Tue Nov 22 08:00:55 2011
From: goodyearkl at gmail.com (Kylie Goodyear)
Date: Tue, 22 Nov 2011 05:00:55 -0800 (PST)
Subject: [Bioperl-l] Fasta counting script?
In-Reply-To: <CAM3TQQXmB-QHRgp6JdNJFb9a--SAQ0CpjmFCtwhNC3JARThJsw@mail.gmail.com>
References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
	<CAM3TQQXmB-QHRgp6JdNJFb9a--SAQ0CpjmFCtwhNC3JARThJsw@mail.gmail.com>
Message-ID: <273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com>

Thank you for your help. It keeps telling me that it can't find
"length". Do you think it has to do with the way I am coding it?

#!/usr/bin/perl -w
#Defines perl modules

#Bio::Seq deal with sequences and their features
use Bio::Seq;

#Bio::SeqIO handles reading and parsing of sequences of many different formats
use Bio::SeqIO;


#Read FASTA file
$seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format
=> "fasta" );


#Count how many sequences are present in file
my $countseq=0;
while (my $seq_obj = $seqio_obj->next_seq, ) {
    $countseq++;
    }
#Display the number of sequences present
print "There are $countseq sequences present.\n";

#Count number of characters in file
my $seq_length = $seq_obj->length ;
print $seq_length




From roy.chaudhuri at gmail.com  Tue Nov 22 10:50:31 2011
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Tue, 22 Nov 2011 15:50:31 +0000
Subject: [Bioperl-l] Fasta counting script?
In-Reply-To: <273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com>
References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
	<CAM3TQQXmB-QHRgp6JdNJFb9a--SAQ0CpjmFCtwhNC3JARThJsw@mail.gmail.com>
	<273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com>
Message-ID: <4ECBC4C7.10401@gmail.com>

Hi Kylie,

I suspect the error you get is actually:

    Can't call method "length" on an undefined value

(in future, please report the exact text of any error messages). You
declare $seq_obj with "my" in the while loop, but then try to access it
outside of the loop. Try printing out the length of each $seq_obj within
the while loop.

You should always include "use strict;" at the top of your program, that 
helps to catch errors like this.
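
Something along these lines should work. It is only a minimal sketch: the sub
name count_fasta and the two short test sequences are made up, and the FASTA
data is read from an in-memory string so that the example runs without an
input file. For your real data, create the stream with -file =>
"DNA_sequences.fasta" as in your script. The point is that the length is
added up inside the loop, where $seq_obj is still in scope.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Returns (number of sequences, total number of residues) for a SeqIO stream.
sub count_fasta {
    my ($seqio) = @_;
    my ($n, $total) = (0, 0);
    while (my $seq_obj = $seqio->next_seq) {
        $n++;
        $total += $seq_obj->length;    # $seq_obj only exists inside this loop
    }
    return ($n, $total);
}

# Made-up FASTA data held in a string instead of a file.
my $fasta = ">seq1\nACGTAC\n>seq2\nGGCC\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory FASTA: $!";
my $seqio_obj = Bio::SeqIO->new(-fh => $fh, -format => 'fasta');

my ($count, $chars) = count_fasta($seqio_obj);
print "There are $count sequences present.\n";
print "They contain $chars characters in total.\n";
```

Note that length counts sequence residues only, so the header lines and
newlines in the file are not included in the total.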

Cheers,
Roy.

On 22/11/2011 13:00, Kylie Goodyear wrote:
> Thank you for your help. It keeps telling me that it can't find
> "length" do you think it has to do with the way I am coding it?
>
> #!/usr/bin/perl -w
> #Defines perl modules
>
> #Bio::Seq deal with sequences and their features
> use Bio::Seq;
>
> #Bio::SeqIO handles reading and parsing of sequences of many different
> formats
> use Bio::SeqIO;
>
>
> #Read FASTA file
> $seqio_obj = Bio::SeqIO->new(-file =>  "DNA_sequences.fasta", -format
> =>  "fasta" );
>
>
> #Count how many sequences are present in file
> my $countseq=0;
> while (my $seq_obj = $seqio_obj->next_seq, ) {
>      $countseq++;
>      }
> #Display the number of sequences present
> print "There are $countseq sequences present.\n";
>
> #Count number of characters in file
> my $seq_length = $seq_obj->length ;
> print $seq_length
>
>
> On Nov 22, 5:08 am, Dave Messina<David.Mess... at sbc.su.se>  wrote:
>> Hi Kylie,
>>
>> You can use the length method for this.
>>
>> my $seq_length = $seq_obj->length();
>>
>> Have you taken a look at the beginner's HOWTO? There's a nice table of
>> sequence methods as well lots of other good information in there.
>>
>> http://www.bioperl.org/wiki/HOWTO:Beginners
>>
>> Dave
>>
>> On Tue, Nov 22, 2011 at 03:23, Kylie Goodyear<goodyea... at gmail.com>  wrote:
>>> Hi,
>>> This may seem like a stupid question but I am just learning bioperl
>>> and I am trying to figure out how to get a count of all the characters
>>> in my FASTA file. I managed to get the number of sequences using the
>>> following. Is there a way to tell bioperl to count the characters?
>>
>>> #!/usr/bin/perl -w
>>> #Defines perl modules
>>> #Bio::Seq deal with sequences and their features
>>> use Bio::Seq;
>>> #Bio::SeqIO handles reading and parsing of sequences of many different
>>> formats
>>> use Bio::SeqIO;
>>> #Read FASTA file
>>> $seqio_obj = Bio::SeqIO->new(-file =>  "DNA_sequences.fasta", -format
>>> =>  "fasta" );
>>> #Count how many sequences are present in file
>>> my $count=0;
>>> while (my $seq_obj = $seqio_obj->next_seq) {
>>>     $count++;
>>> }
>>> #Display the number of sequences present
>>> print "There are $count sequences present.\n";


From cjfields at illinois.edu  Tue Nov 22 11:13:01 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 22 Nov 2011 16:13:01 +0000
Subject: [Bioperl-l] Fasta counting script?
In-Reply-To: <273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com>
References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
	<CAM3TQQXmB-QHRgp6JdNJFb9a--SAQ0CpjmFCtwhNC3JARThJsw@mail.gmail.com>
	<273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com>
Message-ID: <0507832F-51F3-4946-BDCD-6913BBE697BC@illinois.edu>

This sounds a little homework-y.  Sure this isn't for a class? :)

One clue (and a good thing to keep in mind): always 'use strict; use warnings;' with your scripts if you are new to perl.  Doing so would let you know there is a problem with the script the way it is written, specifically, the place where you are inquiring about the length.

chris



From Russell.Smithies at agresearch.co.nz  Tue Nov 22 15:47:36 2011
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 23 Nov 2011 09:47:36 +1300
Subject: [Bioperl-l] Fasta counting script?
In-Reply-To: <0507832F-51F3-4946-BDCD-6913BBE697BC@illinois.edu>
References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
	<CAM3TQQXmB-QHRgp6JdNJFb9a--SAQ0CpjmFCtwhNC3JARThJsw@mail.gmail.com>
	<273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com>
	<0507832F-51F3-4946-BDCD-6913BBE697BC@illinois.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF340186CF1B8@exchsth.agresearch.co.nz>

Or again, you could use the builtin scripts bp_seq_length.pl or bp_gccalc.pl
As previous posters have hinted, RTFM - the answers are all in there!

--Russell

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From charles-listes+bioperl at plessy.org  Wed Nov 23 05:27:45 2011
From: charles-listes+bioperl at plessy.org (Charles Plessy)
Date: Wed, 23 Nov 2011 19:27:45 +0900
Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
Message-ID: <20111123102745.GC20168@merveille.plessy.net>

Dear BioPerl developers,

I am trying to process some unaligned paired-end reads with Bio::DB::Sam.  For
each pair, I want to detect a sequence index and a unique molecular identifier in
the linker, record them as auxiliary flags, and trim the linker from the read.

I collect the pairs through a features iterator, and can access all their data
through the high-level Bio::DB::Bam::Alignment API.  After modifying them
(linker trimming and adding flags), I want to write the resulting pairs as a
new unaligned BAM file.

I apologise if the solution is trivial, but my problem is that I have not
managed to modify the Bio::DB::Bam::Alignment objects.  Typically, attempts
such as '$pair[0]->qseq("GATACA")' give errors like
'Usage: Bio::DB::Bam::Alignment::qseq(b) at /usr/lib/perl5/Bio/DB/Bam/AlignWrapper.pm line 80'.

Since I did not find explanations or portions of source code indicating how to
modify Bio::DB::Bam::Alignment objects, I wonder if they are read-only?

Have a nice day,

-- 
Charles Plessy
Tsurumi, Kanagawa, Japan


From MEC at stowers.org  Wed Nov 23 11:02:26 2011
From: MEC at stowers.org (Cook, Malcolm)
Date: Wed, 23 Nov 2011 10:02:26 -0600
Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
In-Reply-To: <20111123102745.GC20168@merveille.plessy.net>
References: <20111123102745.GC20168@merveille.plessy.net>
Message-ID: <2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org>

Charles,

I suggest you reconsider your approach: rather, use `samtools view` to pipe your reads to stdout in SAM format, stream-edit the barcode, and pipe the result back to samtools for conversion back to a .bam file.

I know this is not what you're asking.  I'm pretty sure that the direct answer to your question is, "yes - they are read-only".
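
The stream-editing step could be sketched as a small Perl filter. This is
only an illustration under stated assumptions: the linker is taken to be
the first 6 bases of each read, its first 4 bases are taken to be the
unique molecular identifier, and the identifier is stored in an RX:Z tag.
None of these values come from the original post, and edit_sam_line is a
hypothetical helper name.

```perl
use strict;
use warnings;

# Assumed layout (hypothetical values): the linker is the first
# $LINKER_LEN bases of the read, and the UMI is its first $UMI_LEN bases.
my $LINKER_LEN = 6;
my $UMI_LEN    = 4;

# Edit one SAM line: record the UMI as an auxiliary tag and trim the
# linker from both SEQ (column 10) and QUAL (column 11).
sub edit_sam_line {
    my ($line) = @_;
    return $line if $line =~ /^\@/;    # header lines pass through unchanged
    chomp $line;
    my @f = split /\t/, $line;
    my ($seq, $qual) = @f[9, 10];
    return "$line\n" if length($seq) <= $LINKER_LEN;    # too short to trim
    push @f, 'RX:Z:' . substr($seq, 0, $UMI_LEN);
    $f[9]  = substr($seq, $LINKER_LEN);
    $f[10] = $qual eq '*' ? '*' : substr($qual, $LINKER_LEN);
    return join("\t", @f) . "\n";
}

# Act as a filter only when data is piped in on standard input.
unless (-t STDIN) {
    print edit_sam_line($_) while <STDIN>;
}
```

Saved as, say, trim_linker.pl (a hypothetical file name), it would sit in
the middle of a pipeline such as
`samtools view -h in.bam | perl trim_linker.pl | samtools view -b - > out.bam`;
older samtools versions also need -S when reading SAM input.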

~Malcolm




From cjfields at illinois.edu  Wed Nov 23 14:26:31 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 23 Nov 2011 19:26:31 +0000
Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
In-Reply-To: <2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org>
References: <20111123102745.GC20168@merveille.plessy.net>
	<2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org>
Message-ID: <CA5177C9-FF24-49D7-AFB3-1B79A742F2B4@illinois.edu>

According to the docs for the low-level API of Bio-Samtools, both reading and writing are allowed:

http://search.cpan.org/perldoc?Bio::DB::Sam#The_low-level_API

Using the low-level API for this purpose isn't documented as well, though (the high-level API is read-only AFAICT).

The error message is a standard one generated by the XS bindings when the arguments passed don't match the function's prototype.  Looking through the Sam.xs file, qseq() is only prototyped as a reader; its only argument is a Bio::DB::Bam::Alignment (i.e. $self).  However, it appears there is a function specified for Bio::DB::Bam::Alignment named l_qseq() that might be the setter, whereas qseq() may be meant as the getter (ignore the 'bama_' prefix):

....

int
bama_l_qseq(b,...)
    Bio::DB::Bam::Alignment b
PROTOTYPE: $;$
CODE:
    if (items > 1)
      b->core.l_qseq = SvIV(ST(1));
    RETVAL=b->core.l_qseq;
OUTPUT:
    RETVAL

SV*
bama_qseq(b)
    Bio::DB::Bam::Alignment b
PROTOTYPE: $
PREINIT:
    char* seq;
    int   i;
CODE:
    seq = Newxz(seq,b->core.l_qseq+1,char);
    for (i=0;i<b->core.l_qseq;i++) {
      seq[i]=bam_nt16_rev_table[bam1_seqi(bam1_seq(b),i)];
    }
    RETVAL = newSVpv(seq,b->core.l_qseq);
    Safefree(seq);
OUTPUT:
    RETVAL


-chris



From lincoln.stein at gmail.com  Wed Nov 23 17:02:23 2011
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Thu, 24 Nov 2011 06:02:23 +0800
Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
In-Reply-To: <20111123102745.GC20168@merveille.plessy.net>
References: <20111123102745.GC20168@merveille.plessy.net>
Message-ID: <CAOS1dzwxY2Kt3_xkgnbCps_TYfnT3dGE9+gAirpBCeJoMT7YDg@mail.gmail.com>

I apologize that the qseq() method currently allows read-only access only.
I will attempt to fix this.

Lincoln



-- 
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <Renata.Musa at oicr.on.ca>


From lincoln.stein at gmail.com  Wed Nov 23 17:05:41 2011
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Thu, 24 Nov 2011 06:05:41 +0800
Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
In-Reply-To: <CA5177C9-FF24-49D7-AFB3-1B79A742F2B4@illinois.edu>
References: <20111123102745.GC20168@merveille.plessy.net>
	<2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org>
	<CA5177C9-FF24-49D7-AFB3-1B79A742F2B4@illinois.edu>
Message-ID: <CAOS1dzwR050PGsomiSJZT+qns60XU8Smgy_WrotyxcOo+LmgNw@mail.gmail.com>

Unfortunately, l_qseq reads/writes the length of the query sequence, not the
sequence itself.

Lincoln



-- 
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <Renata.Musa at oicr.on.ca>


From cjfields at illinois.edu  Wed Nov 23 20:07:09 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 24 Nov 2011 01:07:09 +0000
Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
In-Reply-To: <CAOS1dzwR050PGsomiSJZT+qns60XU8Smgy_WrotyxcOo+LmgNw@mail.gmail.com>
References: <20111123102745.GC20168@merveille.plessy.net>
	<2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org>
	<CA5177C9-FF24-49D7-AFB3-1B79A742F2B4@illinois.edu>,
	<CAOS1dzwR050PGsomiSJZT+qns60XU8Smgy_WrotyxcOo+LmgNw@mail.gmail.com>
Message-ID: <92CA8F24-47CB-42AF-8C20-9C4765A592A5@illinois.edu>

Ah, okay, makes sense.  I thought it was oddly named. :)

Chris



From ross at cuhk.edu.hk  Sun Nov 27 03:24:43 2011
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Sun, 27 Nov 2011 16:24:43 +0800
Subject: [Bioperl-l] Check the location type for a particular gene in a
	Genbank file
In-Reply-To: <CAA7rn9dYGfeCS-qVccDcdwVenj8KFQs4i89EoT8_63gQk5b42g@mail.gmail.com>
References: <CAA7rn9dYGfeCS-qVccDcdwVenj8KFQs4i89EoT8_63gQk5b42g@mail.gmail.com>
Message-ID: <000101ccacde$0fc4ba90$2f4e2fb0$@edu.hk>

Hi all,

To write a script that extracts sequences generically for all types of
Bio::Location objects, I'd like to know if there is any way to check what
type (e.g. simple or split) is being processed.

Bio::Location::CoordinatePolicyI appears to do something similar, but
it is more like a post-checking step. If I parsed the GenBank file line by
line, I could certainly check whether a line contains keywords like "join",
but I am instead using something like:

        my @features = grep { $_->primary_tag eq $chkTags[0] } $seqobj->get_SeqFeatures;

        foreach (@features) {

            $pseudo = $_->has_tag('pseudo') ? 'pseudo' : 'functional';

            @gene = ();    # was "@gene=[]", which stores one array reference instead of emptying the array

I'd appreciate it if anybody knows a way to do this that integrates better
with the well-developed BioPerl modules.

Thanks a lot.
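
One way the check could look, sketched under assumptions rather than taken
from this thread: test whether the feature's location implements
Bio::Location::SplitLocationI, and let spliced_seq assemble the sequence
in either case. The toy sequence, the two hand-built features and the
helper name describe_feature are hypothetical and stand in for features
parsed from a GenBank file.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqFeature::Generic;
use Bio::Location::Simple;
use Bio::Location::Split;

# Return the location kind ('simple' or 'split') and the feature's sequence.
# describe_feature is a hypothetical helper name, not part of BioPerl.
sub describe_feature {
    my ($feat) = @_;
    my $kind = $feat->location->isa('Bio::Location::SplitLocationI')
        ? 'split'
        : 'simple';
    # spliced_seq joins the sub-locations of a split location and falls
    # back to the plain feature sequence for a simple one.
    return ($kind, $feat->spliced_seq->seq);
}

# Toy data (an assumption): one simple feature and one join(1..4,9..12).
my $seqobj = Bio::Seq->new(-id => 'toy', -seq => 'AAAACCCCGGGGTTTT');
my $simple = Bio::SeqFeature::Generic->new(
    -primary_tag => 'gene', -start => 1, -end => 4, -strand => 1);
my $join = Bio::Location::Split->new;
$join->add_sub_Location(
    Bio::Location::Simple->new(-start => 1, -end => 4, -strand => 1));
$join->add_sub_Location(
    Bio::Location::Simple->new(-start => 9, -end => 12, -strand => 1));
my $split = Bio::SeqFeature::Generic->new(
    -primary_tag => 'gene', -location => $join);
$seqobj->add_SeqFeature($simple, $split);

for my $feat (grep { $_->primary_tag eq 'gene' } $seqobj->get_SeqFeatures) {
    my ($kind, $dna) = describe_feature($feat);
    print "$kind\t$dna\n";
}
```

Checking the interface rather than a concrete class means the same test
should also cover other location classes that implement
Bio::Location::SplitLocationI.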


From Russell.Smithies at agresearch.co.nz  Sun Nov 27 19:46:05 2011
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 28 Nov 2011 13:46:05 +1300
Subject: [Bioperl-l] Galaxy tools?
Message-ID: <18DF7D20DFEC044098A1062202F5FFF340186CF1E3@exchsth.agresearch.co.nz>

Possibly the wrong place to ask but has anyone written Galaxy tools using BioPerl?
I was thinking of creating blast graphic and format converter tools as I couldn't see any already available in their toolbox.
It looks like I can just write a Python wrapper for my existing BioPerl scripts - although I suspect the "correct" method is to use BioPython methods (but Python annoys me with its lack of semi-colons and required white-space)

--Russell

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From p.j.a.cock at googlemail.com  Sun Nov 27 20:28:33 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 28 Nov 2011 01:28:33 +0000
Subject: [Bioperl-l] Galaxy tools?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF340186CF1E3@exchsth.agresearch.co.nz>
References: <18DF7D20DFEC044098A1062202F5FFF340186CF1E3@exchsth.agresearch.co.nz>
Message-ID: <CAKVJ-_7k6RpQHw4a6=H3qOK+zb+r3T_sG74MG2fPM5_7NbViYA@mail.gmail.com>

On Monday, November 28, 2011, Smithies, Russell  wrote:
> Possibly the wrong place to ask but has anyone written
> Galaxy tools using BioPerl?
> I was thinking of creating blast graphic and format converter
>  tools as I couldn't see any already available in their toolbox.
> It looks like I can just write a Python wrapper for my existing
> BioPerl scripts - although I suspect the "correct" method is to
> use BioPython methods (but Python annoys me with its lack
> of semi-colons and required white-space)

Galaxy is agnostic about what language the tools are in,
you can use a binary, shell script, Java, Perl, Python etc.

Peter


From florent.angly at gmail.com  Sun Nov 27 21:09:45 2011
From: florent.angly at gmail.com (Florent Angly)
Date: Mon, 28 Nov 2011 12:09:45 +1000
Subject: [Bioperl-l] Galaxy tools?
In-Reply-To: <CAKVJ-_7k6RpQHw4a6=H3qOK+zb+r3T_sG74MG2fPM5_7NbViYA@mail.gmail.com>
References: <18DF7D20DFEC044098A1062202F5FFF340186CF1E3@exchsth.agresearch.co.nz>
	<CAKVJ-_7k6RpQHw4a6=H3qOK+zb+r3T_sG74MG2fPM5_7NbViYA@mail.gmail.com>
Message-ID: <4ED2ED69.10601@gmail.com>

Hi Russell,

As Peter said, the tools to be wrapped do not need to be written in Python.

I have built a few wrappers for Galaxy, including one for the read 
simulator Grinder (http://sourceforge.net/projects/biogrinder/), which 
uses Bioperl and is available in the Galaxy Toolshed 
(http://sourceforge.net/projects/biogrinder/). It is not very hard to do 
a wrapper for trivial programs, but becomes more complicated once you 
start having optional arguments or multiple output files.

Grinder uses Getopt::Euclid (http://search.cpan.org/dist/Getopt-Euclid/) 
to parse command-line arguments. I have been thinking about leveraging 
the information that Getopt::Euclid stores about command-line arguments 
to automate most of the Galaxy wrapper generation, but I have not gotten 
to it yet.

Florent


On 28/11/11 11:28, Peter Cock wrote:
> On Monday, November 28, 2011, Smithies, Russell  wrote:
>> Possibly the wrong place to ask but has anyone written
>> Galaxy tools using BioPerl?
>> I was thinking of creating blast graphic and format converter
>>   tools as I couldn't see any already available in their toolbox.
>> It looks like I can just write a Python wrapper for my existing
>> BioPerl scripts - although I suspect the "correct" method is to
>> use BioPython methods (but Python annoys me with its lack
>> of semi-colons and required white-space)
> Galaxy is agnostic about what language the tools are in,
> you can use a binary, shell script, Java, Perl, Python etc.
>
> Peter


From florent.angly at gmail.com  Sun Nov 27 23:35:31 2011
From: florent.angly at gmail.com (Florent Angly)
Date: Mon, 28 Nov 2011 14:35:31 +1000
Subject: [Bioperl-l] Interest in Bio::Community modules
Message-ID: <4ED30F93.4000407@gmail.com>

Hi all,

I have been thinking about starting a set of Perl modules that would be 
useful for (microbial) ecologists to represent communities of organisms. 
At the moment, there does not seem to be anything like this in Bioperl. 
I am happy to make these modules available under the Bioperl umbrella 
using the Bio::Community::* namespace.

I envision the following modules:
* Bio::Community::Member module representing members of a community.
* Bio::Community::IO modules to read/write files that describe community 
composition (a.k.a. OTU table, or site by species table) as used by 
programs like QIIME, Pyrotagger, GAAS, ...
* Bio::Community::Tools modules to help manipulate communities, e.g. to 
take some members at random, normalize the community to a given number 
of individuals, or do rarefaction curves.

The idea is to implement these modules in Moose to teach myself Moose. 
The members of a community could be a sequence (Bio::SeqI), a species 
(Bio::S), an arbitrary string or even other things. I am not quite sure 
if Bioperl provides facilities to attach some arbitrary information to an 
object.

Any interest? Ideas? Comments?

Thanks,

Florent


From cjfields at illinois.edu  Mon Nov 28 14:42:12 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 28 Nov 2011 19:42:12 +0000
Subject: [Bioperl-l] Check the location type for a particular gene in
	a	Genbank file
In-Reply-To: <000101ccacde$0fc4ba90$2f4e2fb0$@edu.hk>
References: <CAA7rn9dYGfeCS-qVccDcdwVenj8KFQs4i89EoT8_63gQk5b42g@mail.gmail.com>
	<000101ccacde$0fc4ba90$2f4e2fb0$@edu.hk>
Message-ID: <49363DC1-110A-49A8-B8D7-75AA624A535C@illinois.edu>

Ross,

The standard way is to check whether the location object is a SplitLocationI or not, see the following for an example:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Location_Objects
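As a minimal sketch of that check: the location object of each feature can be tested with isa() against Bio::Location::SplitLocationI. The helper name location_kind, the script name and the choice of 'CDS' as the tag to filter on are assumptions made for illustration; they do not come from the HOWTO.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Classify a location object as 'split' (e.g. join(...)) or 'simple'.
# location_kind is a hypothetical helper name.
sub location_kind {
    my ($loc) = @_;
    return 'split'
        if blessed($loc) && $loc->isa('Bio::Location::SplitLocationI');
    return 'simple';
}

# Usage: perl loc_kind.pl file.gbk   (hypothetical script and file names)
if (@ARGV) {
    require Bio::SeqIO;    # BioPerl is loaded only when a file is given
    my $in = Bio::SeqIO->new(-file => $ARGV[0], -format => 'genbank');
    while (my $seq = $in->next_seq) {
        # 'CDS' is an assumed example tag; substitute the tag of interest
        for my $feat (grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures) {
            my $loc  = $feat->location;
            my $kind = location_kind($loc);
            # spliced_seq joins the sub-locations; seq covers start..end
            my $dna = $kind eq 'split' ? $feat->spliced_seq->seq
                                       : $feat->seq->seq;
            printf "%s\t%s\t%d bp\n", $kind, $loc->to_FTstring, length $dna;
        }
    }
}
```

For a split location, the individual pieces are also available through $loc->sub_Location if each segment needs to be handled separately.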

chris

On Nov 27, 2011, at 2:24 AM, Ross KK Leung wrote:

> Hi all,
> 
> To write a script to extract sequence generically for all types of
> BioLocation objects, I'd like to know if there is any way to check what
> types (e.g. simple or split) are being processed.
> 
> Bio::Location::CoordinatePolicyI appears to be doing something similar but
> it is more like a post checking step. If I parse the genbank file line by
> line, I can certainly check whether the line contains keywords like "join"
> but as I'm using something like:
> 
>        my @features=grep{$_->primary_tag eq $chkTags[0]}
> $seqobj->get_SeqFeatures;                                    
> 
> 
>        foreach (@features) {
> 
>            $pseudo=$_->has_tag('pseudo')?'pseudo':'functional';
> 
>            @gene=[];                                                   
> 
> I'd appreciate if anybody knows a better integration with the well-developed
> bioperl module.
> 
> Thanks a lot.
> 


From cjfields at illinois.edu  Mon Nov 28 14:47:10 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 28 Nov 2011 19:47:10 +0000
Subject: [Bioperl-l] Interest in Bio::Community modules
In-Reply-To: <4ED30F93.4000407@gmail.com>
References: <4ED30F93.4000407@gmail.com>
Message-ID: <3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu>

I think the idea is sound, it would be nice to have.  Jason is working a bit in this area, maybe he has some additional thoughts?  Would there be some redundancy with any current code (Bio::Tree, Bio::Taxon, etc)?  I do think it should be developed on its own, per our recent discussions re: slimming down core.

Re: using Moose and BioPerl, from experience it's a little tricky integrating the two as BioPerl uses perl's simple OO system while Moose allows much more (though it has its own issues, memory and speed being the top two).  Specifying inheritance from a BioPerl module is a bit trickier, you'll have to wrap the module via MooseX::NonMoose.  Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine.

chris

On Nov 27, 2011, at 10:35 PM, Florent Angly wrote:

> Hi all,
> 
> I have been thinking about starting a set of Perl modules that would useful for (microbial) ecologists to represent communities of organisms. At the moment, there does not seem to be anything like this in Bioperl. I am happy to make these modules available under the Bioperl umbrella using the Bio::Community::* namespace.
> 
> I envision the following modules:
> * Bio::Community::Member module representing members of a community.
> * Bio::Community::IO modules to read/write files that describe community composition (a.k.a. OTU table, or site by species table) as used programs like QIIME, Pyrotagger, GAAS, ...
> * Bio::Community::Tools modules to help manipulate communities, e.g. to take some members at random, normalize the community to a given number of individuals, or do rarefaction curves.
> 
> The idea is to implement these modules in Moose to teach myself Moose. The members of a community could be a sequence (Bio::SeqI), a species (Bio::S), an arbitrary string or even other things. I am not quite sure if Bioperl provide facilities to attach some arbitrary information to an object.
> 
> Any interest? Ideas? Comments?
> 
> Thanks,
> 
> Florent
> 


From l.m.timmermans at students.uu.nl  Mon Nov 28 15:25:13 2011
From: l.m.timmermans at students.uu.nl (Leon Timmermans)
Date: Mon, 28 Nov 2011 21:25:13 +0100
Subject: [Bioperl-l]  Interest in Bio::Community modules
In-Reply-To: <CAC1jpXDrmNJqzzWR80BvEzwDgGJBJekS4k+a5-ZUTkunU0r9VQ@mail.gmail.com>
References: <4ED30F93.4000407@gmail.com>
	<CAC1jpXDrmNJqzzWR80BvEzwDgGJBJekS4k+a5-ZUTkunU0r9VQ@mail.gmail.com>
Message-ID: <CAC1jpXCOTKF21kMNQbJZWF7oR-Ue1ry3HpYkVWK9=FV--QfTog@mail.gmail.com>

And now to the list too,

On Mon, Nov 28, 2011 at 5:35 AM, Florent Angly <florent.angly at gmail.com>wrote:

> The idea is to implement these modules in Moose to teach myself Moose. The
> members of a community could be a sequence (Bio::SeqI), a species (Bio::S),
> an arbitrary string or even other things. I am not quite sure if Bioperl
> provide facilities to attach some arbitrary information to an object.
>
> Any interest? Ideas? Comments?
>

Sounds like a good use-case for roles, maybe even parametric roles.
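To make the suggestion concrete, here is a rough sketch of a plain (non-parametric) Moose role composed into a member class. All package, attribute and method names below are hypothetical and chosen only for illustration; this is not the Bio::Community design, and it assumes Moose is installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical role: anything that can carry a free-form description.
package My::Role::Describable;
use Moose::Role;

requires 'id';    # consuming classes must provide an id() method
has desc => (is => 'rw', isa => 'Str', default => '');

sub summary {
    my $self = shift;
    return $self->id . ': ' . $self->desc;
}

# Hypothetical member class; not the actual Bio::Community::Member API.
package My::Member;
use Moose;

has id      => (is => 'ro', isa => 'Str', required => 1);
has payload => (is => 'rw');    # untyped: a Bio::SeqI object, a string, ...

# Composed after 'id' is declared so the role's requirement is satisfied.
with 'My::Role::Describable';

package main;

my $member = My::Member->new(id => 'otu_1', payload => 'ACGT');
$member->desc('arbitrary note attached to the member');
print $member->summary, "\n";
```

The untyped payload attribute is one way to let a member hold a sequence, a taxon or a bare string; a stricter design could constrain it with a type or a role check instead.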

Leon


From florent.angly at gmail.com  Mon Nov 28 19:59:24 2011
From: florent.angly at gmail.com (Florent Angly)
Date: Tue, 29 Nov 2011 10:59:24 +1000
Subject: [Bioperl-l] Interest in Bio::Community modules
In-Reply-To: <3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu>
References: <4ED30F93.4000407@gmail.com>
	<3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu>
Message-ID: <4ED42E6C.6020501@gmail.com>

Hi Chris,

On 29/11/11 05:47, Fields, Christopher J wrote:
> I think the idea is sound, it would be nice to have.  Jason is working a bit in this area, maybe he has some additional thoughts?  Would there be some redundancy with any current code (Bio::Tree, Bio::Taxon, etc)?
None of these features would be duplicated. Rather, they would be used 
as attributes of the Bio::Community::* objects. For example, a member of a 
community could have a Bio::SeqI attached to it as well as a Bio::Taxon, 
etc...

> I do think it should be developed on it's own, per our recent discussions re: slimming down core.
Yes, the features are so different that it makes sense to have the 
Bio::Community::* modules as a separate BioPerl distribution (like the 
Bio-FeatureIO BioPerl distribution).

> Re: using Moose and BioPerl, from experience it's a little tricky integrating the two as BioPerl uses perl's simple OO system while Moose allows much more (though it has it's own issues, memory and speed being the top two).  Specifying inheritance from a BioPerl module is a bit trickier, you'll have to wrap the module via MooseX::NonMoose.  Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine.
Beyond Bio::RootI, I cannot imagine that any of the Bio::Community::* 
modules would need to inherit from any other BioPerl modules. 
Considering this and the performance aspects of Moose, do you think that 
using Moose is a wise design decision?

Best,

Florent


> chris
>
> On Nov 27, 2011, at 10:35 PM, Florent Angly wrote:
>
>> Hi all,
>>
>> I have been thinking about starting a set of Perl modules that would useful for (microbial) ecologists to represent communities of organisms. At the moment, there does not seem to be anything like this in Bioperl. I am happy to make these modules available under the Bioperl umbrella using the Bio::Community::* namespace.
>>
>> I envision the following modules:
>> * Bio::Community::Member module representing members of a community.
>> * Bio::Community::IO modules to read/write files that describe community composition (a.k.a. OTU table, or site by species table) as used programs like QIIME, Pyrotagger, GAAS, ...
>> * Bio::Community::Tools modules to help manipulate communities, e.g. to take some members at random, normalize the community to a given number of individuals, or do rarefaction curves.
>>
>> The idea is to implement these modules in Moose to teach myself Moose. The members of a community could be a sequence (Bio::SeqI), a species (Bio::S), an arbitrary string or even other things. I am not quite sure if Bioperl provide facilities to attach some arbitrary information to an object.
>>
>> Any interest? Ideas? Comments?
>>
>> Thanks,
>>
>> Florent
>>


From cjfields at illinois.edu  Tue Nov 29 00:32:50 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 29 Nov 2011 05:32:50 +0000
Subject: [Bioperl-l] Interest in Bio::Community modules
In-Reply-To: <CAC1jpXCOTKF21kMNQbJZWF7oR-Ue1ry3HpYkVWK9=FV--QfTog@mail.gmail.com>
References: <4ED30F93.4000407@gmail.com>
	<CAC1jpXDrmNJqzzWR80BvEzwDgGJBJekS4k+a5-ZUTkunU0r9VQ@mail.gmail.com>
	<CAC1jpXCOTKF21kMNQbJZWF7oR-Ue1ry3HpYkVWK9=FV--QfTog@mail.gmail.com>
Message-ID: <C87E8F45-FE8A-4E77-A612-DF1E25C9CA73@illinois.edu>

On Nov 28, 2011, at 2:25 PM, Leon Timmermans wrote:

> And now to the list too,
> 
> On Mon, Nov 28, 2011 at 5:35 AM, Florent Angly <florent.angly at gmail.com>wrote:
> 
>> The idea is to implement these modules in Moose to teach myself Moose. The
>> members of a community could be a sequence (Bio::SeqI), a species (Bio::S),
>> an arbitrary string or even other things. I am not quite sure if Bioperl
>> provide facilities to attach some arbitrary information to an object.
>> 
>> Any interest? Ideas? Comments?
>> 
> 
> Sounds like a good use-case for roles, maybe even parametric roles.
> 
> Leon

Yep, agree totally.  It would be a good replacement in most cases for the BioI interfaces.  

(see also, the Biome project, which I'm slooooooowly working on again, on github)

chris


From pmr at ebi.ac.uk  Tue Nov 29 08:39:52 2011
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 29 Nov 2011 13:39:52 +0000
Subject: [Bioperl-l] BinarySearch.pm
Message-ID: <4ED4E0A8.30102@ebi.ac.uk>

In trying to use bioflat_index.pl index files in EMBOSS, I ran into some 
problems.

Both appear to be in the Bio/Flat/BinarySearch.pm source file.

EMBL ID lines are failing to drop the ';' from the ID. Updating the 
regular expression to make sure the ';' is not picked up seems to work:

   if ($format =~ /embl/i) {
     return ('ID',
	    "^ID   (\\S+[^; ])",
	    "^ID   (\\S+[^; ])",
	    {
	     ACC     => q/^AC   (\S+);/,
	     VERSION => q/^SV\s+(\S+)/
	    });
   }
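A small stand-alone check of what the proposed pattern captures may be useful here. The two sample ID lines are illustrative assumptions (one in the semicolon-delimited layout, one in an older layout), and the "naive" pattern is only a guess at the kind of expression that lets the ';' through; it is not quoted from BinarySearch.pm.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pattern proposed above: the last captured character may be neither
# ';' nor a space, so a trailing ';' is backtracked out of the capture.
my $proposed = qr/^ID   (\S+[^; ])/;

# Hypothetical looser pattern, shown only for contrast.
my $naive = qr/^ID   (\S+)/;

# Return the ID captured by $pattern from an EMBL "ID" line, or undef.
sub embl_id {
    my ($pattern, $line) = @_;
    return $line =~ $pattern ? $1 : undef;
}

# Illustrative ID lines (assumed sample data).
my $new_style = 'ID   X56734; SV 1; linear; mRNA; STD; PLN; 1859 BP.';
my $old_style = 'ID   TRBG361    standard; RNA; PLN; 1859 BP.';

for my $line ($new_style, $old_style) {
    printf "%-10s %-10s\n", embl_id($naive, $line), embl_id($proposed, $line);
}
```

One consequence of this form of the pattern is that the capture needs at least two characters, so a one-character ID would not match.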

The ACC secondary index has every record duplicated.
This line is duplicated in the write_secondary_indices source code. Is 
that intentional?

  		    print $fh sprintf("%-${length}s",$record);

regards,

Peter Rice
EMBOSS Team


From uni.anastasia at gmail.com  Sat Nov 26 12:32:48 2011
From: uni.anastasia at gmail.com (anastsia shapiro)
Date: Sat, 26 Nov 2011 19:32:48 +0200
Subject: [Bioperl-l] Problem with parsing blast results
Message-ID: <CAHuWCXcLkSZ=N-HDG6aifV0Y1sL8JKNU5Jz=sGzyAfT8JDVTXw@mail.gmail.com>

Hello,

I'm running a script that should parse BLAST results using Bio::SearchIO.

Sometimes the script works fine, but sometimes it stops and I receive
the following error.

------------- EXCEPTION -------------
MSG: no data for midline Query
------------------------------------------------------------
STACK Bio::SearchIO::blast::next_result C:/Perl64/site/lib/Bio\SearchIO\blast.pm:1805
STACK toplevel D:\D\uni-anastasia\project\scripts\ParsingBlastResults\ParsingBlastResults.pl:36
-------------------------------------
The BLAST result files were produced by running the following command:
blastn -task blastn -db xxxxxxxxx.txt -evalue 1e-10 -perc_identity 80 \
  -dust no -num_descriptions 0 -query xxxxx.txt -out results.txt -strand plus

I am using BioPerl 1.6.1.
I have read all the forums, and it seems to be a bug, but one that was
already fixed in version 1.5.

I would really appreciate your help, since I have been trying to understand
the problem for over a month.

Regards,
Anastasia


From bunk at novozymes.com  Tue Nov 29 11:46:54 2011
From: bunk at novozymes.com (Jacob Bunk Nielsen)
Date: Tue, 29 Nov 2011 17:46:54 +0100
Subject: [Bioperl-l] Problem with parsing blast results
In-Reply-To: <CAHuWCXcLkSZ=N-HDG6aifV0Y1sL8JKNU5Jz=sGzyAfT8JDVTXw@mail.gmail.com>
	(anastsia shapiro's message of "Sat, 26 Nov 2011 18:32:48 +0100")
References: <CAHuWCXcLkSZ=N-HDG6aifV0Y1sL8JKNU5Jz=sGzyAfT8JDVTXw@mail.gmail.com>
Message-ID: <77sjl698qp.fsf@spurv.nzcorp.net>

Hi

anastsia shapiro <uni.anastasia at gmail.com> writes:

> I'm running a script that should parse a blast results, using searchIO.
>
> Sometimes the script work fines, however sometimes it stops, and I receive
> the following error.
>
> ------------- EXCEPTION -------------
> MSG: no data for midline Query
> ------------------------------------------------------------
> STACK Bio::SearchIO::blast::next_result C:/Perl64/site/lib/Bio\SearchIO\
> blast.pm:1805
> STACK toplevel
> D:\D\uni-anastasia\project\scripts\ParsingBlastResults\ParsingBlastResults.pl:36
> -------------------------------------
> While the blast results files were received as a result of running the
> following blast command:
> blastn -task blastn -db xxxxxxxxx.txt -evalue 1e-10 -perc_identity 80 -dust
> no -num_descriptions 0  -query xxxxx.txt -out results.txt -strand plus

I don't know why this exact problem arises, but I think you should
consider using an output format that is easier to parse by machine, such
as the XML format.

With BLAST+ (the blastn command you are running) you get XML output by
adding -outfmt 5; -m 7 is the equivalent option for the legacy blastall
program. When reading the result with BioPerl you must then specify
-format => 'blastxml' for Bio::SearchIO.

That way I think you are likely to see far fewer problems with parsing
the BLAST output.
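A minimal sketch of reading such a report, assuming it was written by BLAST+ with -outfmt 5 (its XML output format) to a file such as results.xml. The file name, the script name and the helper name summarize_blast_xml are placeholders, and the blastxml parser needs an XML parser module installed alongside BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Print one tab-separated line per HSP from a BLAST XML report.
# summarize_blast_xml is a hypothetical helper name.
sub summarize_blast_xml {
    my ($file) = @_;
    require Bio::SearchIO;    # BioPerl is loaded only when a report is parsed
    my $in = Bio::SearchIO->new(-format => 'blastxml', -file => $file);
    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                print join("\t",
                    $result->query_name, $hit->name,
                    $hsp->evalue, sprintf('%.1f', $hsp->percent_identity),
                ), "\n";
            }
        }
    }
    return;
}

# Usage: perl parse_blast_xml.pl results.xml   (hypothetical file names)
summarize_blast_xml($ARGV[0]) if @ARGV;
```

Only the -format argument differs from parsing the plain-text report, so an existing Bio::SearchIO loop should carry over largely unchanged.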

If the above doesn't solve the problem, you had better show us the code
that fails.

Best regards

Jacob


From cjfields at illinois.edu  Tue Nov 29 14:11:11 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 29 Nov 2011 19:11:11 +0000
Subject: [Bioperl-l] Interest in Bio::Community modules
In-Reply-To: <4ED42E6C.6020501@gmail.com>
References: <4ED30F93.4000407@gmail.com>
	<3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu>
	<4ED42E6C.6020501@gmail.com>
Message-ID: <4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu>

On Nov 28, 2011, at 6:59 PM, Florent Angly wrote:

> Hi Chris,
> 
> On 29/11/11 05:47, Fields, Christopher J wrote:
> ...
>> Re: using Moose and BioPerl, from experience it's a little tricky integrating the two as BioPerl uses perl's simple OO system while Moose allows much more (though it has it's own issues, memory and speed being the top two).  Specifying inheritance from a BioPerl module is a bit trickier, you'll have to wrap the module via MooseX::NonMoose.  Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine.
> Beyond Bio::RootI, I cannot imagine that any of the Bio::Community::* modules would need to inherit from any other BioPerl modules. Considering this and the performance aspects of Moose, do you think that using Moose is a wise design decision?

Moose can actually help in some circumstances; flattening the inheritance hierarchy by using roles seems to help.  And it never hurts to learn something new like Moose and other modern perl niceties.

> Best,
> 
> Florent


chris


From cjfields at illinois.edu  Tue Nov 29 17:30:58 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 29 Nov 2011 22:30:58 +0000
Subject: [Bioperl-l] BinarySearch.pm
In-Reply-To: <4ED4E0A8.30102@ebi.ac.uk>
References: <4ED4E0A8.30102@ebi.ac.uk>
Message-ID: <6F926A89-3B07-4924-8CC4-68A027E7FFCE@illinois.edu>

Peter, 

Can you send a test file that is failing?  I added a few tests using an example file with a ';' in the ID line, but everything is passing with our other EMBL example files.  I'm also looking into adding a method to return secondary IDs for a specific type ('ACC' for instance) so we can test the repeat issue for accessions.  Both changes pass tests as is, though, so I have committed them in the meantime.

chris

On Nov 29, 2011, at 7:39 AM, Peter Rice wrote:

> In trying to use bioflat_index.pl index files in EMBOSS, I ran into some problems.
> 
> Both appear to be in the Bio/Flat/BinarySearch.pm source file.
> 
> EMBL ID lines are failing to drop the ';' from the ID. Updating the regular expression to make sure the ';' is not picked up seems to work:
> 
>  if ($format =~ /embl/i) {
>    return ('ID',
> 	    "^ID   (\\S+[^; ])",
> 	    "^ID   (\\S+[^; ])",
> 	    {
> 	     ACC     => q/^AC   (\S+);/,
> 	     VERSION => q/^SV\s+(\S+)/
> 	    });
>  }
> 
> The ACC secondary index has every record duplicated.
> This line is duplicated in the write_secondary_indices source code. Is that intentional?
> 
> 		    print $fh sprintf("%-${length}s",$record);
> 
> regards,
> 
> Peter Rice
> EMBOSS Team


From florent.angly at gmail.com  Tue Nov 29 20:18:41 2011
From: florent.angly at gmail.com (Florent Angly)
Date: Wed, 30 Nov 2011 11:18:41 +1000
Subject: [Bioperl-l] Interest in Bio::Community modules
In-Reply-To: <4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu>
References: <4ED30F93.4000407@gmail.com>	<3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu>	<4ED42E6C.6020501@gmail.com>
	<4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu>
Message-ID: <4ED58471.3030106@gmail.com>

Chris,
Yes, it is exciting to learn something new.
I have developed a bit of code in the last few days in my local git 
repository. Do you think you could create a repository for Bio-Community 
on the Bioperl Github space or is it too soon?
Cheers,
Florent

On 30/11/11 05:11, Fields, Christopher J wrote:
> On Nov 28, 2011, at 6:59 PM, Florent Angly wrote:
>
>> Hi Chris,
>>
>> On 29/11/11 05:47, Fields, Christopher J wrote:
>> ...
>>> Re: using Moose and BioPerl, from experience it's a little tricky integrating the two as BioPerl uses perl's simple OO system while Moose allows much more (though it has it's own issues, memory and speed being the top two).  Specifying inheritance from a BioPerl module is a bit trickier, you'll have to wrap the module via MooseX::NonMoose.  Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine.
>> Beyond Bio::RootI, I cannot imagine that any of the Bio::Community::* modules would need to inherit from any other BioPerl modules. Considering this and the performance aspects of Moose, do you think that using Moose is a wise design decision?
> Moose can actually help in some circumstances; flattening the inheritance hierarchy by using roles seems to help.  And it never hurts to learn something new like Moose and other modern perl niceties.
>
>> Best,
>>
>> Florent
>
> chris


From cjfields at illinois.edu  Tue Nov 29 21:34:00 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 30 Nov 2011 02:34:00 +0000
Subject: [Bioperl-l] Interest in Bio::Community modules
In-Reply-To: <4ED58471.3030106@gmail.com>
References: <4ED30F93.4000407@gmail.com>
	<3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu>
	<4ED42E6C.6020501@gmail.com>
	<4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu>
	<4ED58471.3030106@gmail.com>
Message-ID: <A4EEF7BF-F70C-4672-B582-748B3023E709@illinois.edu>

On Nov 29, 2011, at 7:18 PM, Florent Angly wrote:

> Chris,
> Yes, it is exciting to learn something new.
> I have developed a bit of code in the last few days in my local git repository. Do you think you could create a repository for Bio-Community on the Bioperl Github space or is it too soon?

It's up to you.  I set up the barebones repo and added you on to push/pull/admin, you should be able to push to it whenever you are ready:

https://github.com/bioperl/Bio-Community

chris


> Cheers,
> Florent
> 
> On 30/11/11 05:11, Fields, Christopher J wrote:
>> On Nov 28, 2011, at 6:59 PM, Florent Angly wrote:
>> 
>>> Hi Chris,
>>> 
>>> On 29/11/11 05:47, Fields, Christopher J wrote:
>>> ...
>>>> Re: using Moose and BioPerl, from experience it's a little tricky integrating the two as BioPerl uses perl's simple OO system while Moose allows much more (though it has it's own issues, memory and speed being the top two).  Specifying inheritance from a BioPerl module is a bit trickier, you'll have to wrap the module via MooseX::NonMoose.  Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine.
>>> Beyond Bio::RootI, I cannot imagine that any of the Bio::Community::* modules would need to inherit from any other BioPerl modules. Considering this and the performance aspects of Moose, do you think that using Moose is a wise design decision?
>> Moose can actually help in some circumstances; flattening the inheritance hierarchy by using roles seems to help.  And it never hurts to learn something new like Moose and other modern perl niceties.
>> 
>>> Best,
>>> 
>>> Florent
>> 
>> chris


From florent.angly at gmail.com  Tue Nov 29 21:50:04 2011
From: florent.angly at gmail.com (Florent Angly)
Date: Wed, 30 Nov 2011 12:50:04 +1000
Subject: [Bioperl-l] Interest in Bio::Community modules
In-Reply-To: <A4EEF7BF-F70C-4672-B582-748B3023E709@illinois.edu>
References: <4ED30F93.4000407@gmail.com>	<3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu>	<4ED42E6C.6020501@gmail.com>
	<4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu>
	<4ED58471.3030106@gmail.com>
	<A4EEF7BF-F70C-4672-B582-748B3023E709@illinois.edu>
Message-ID: <4ED599DC.6090808@gmail.com>

Fantastic! Thank you very much Chris,
Florent

On 30/11/11 12:34, Fields, Christopher J wrote:
> On Nov 29, 2011, at 7:18 PM, Florent Angly wrote:
>
>> Chris,
>> Yes, it is exciting to learn something new.
>> I have developed a bit of code in the last few days in my local git repository. Do you think you could create a repository for Bio-Community on the Bioperl Github space or is it too soon?
> It's up to you.  I set up the barebones repo and added you on to push/pull/admin, you should be able to push to it whenever you are ready:
>
> https://github.com/bioperl/Bio-Community
>
> chris
>
>
>> Cheers,
>> Florent
>>
>> On 30/11/11 05:11, Fields, Christopher J wrote:
>>> On Nov 28, 2011, at 6:59 PM, Florent Angly wrote:
>>>
>>>> Hi Chris,
>>>>
>>>> On 29/11/11 05:47, Fields, Christopher J wrote:
>>>> ...
>>>>> Re: using Moose and BioPerl, from experience it's a little tricky integrating the two as BioPerl uses perl's simple OO system while Moose allows much more (though it has it's own issues, memory and speed being the top two).  Specifying inheritance from a BioPerl module is a bit trickier, you'll have to wrap the module via MooseX::NonMoose.  Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine.
>>>> Beyond Bio::RootI, I cannot imagine that any of the Bio::Community::* modules would need to inherit from any other BioPerl modules. Considering this and the performance aspects of Moose, do you think that using Moose is a wise design decision?
>>> Moose can actually help in some circumstances; flattening the inheritance hierarchy by using roles seems to help.  And it never hurts to learn something new like Moose and other modern perl niceties.
>>>
>>>> Best,
>>>>
>>>> Florent
>>> chris


From lsbrath at gmail.com  Wed Nov 30 00:25:32 2011
From: lsbrath at gmail.com (Mgavi Brathwaite)
Date: Wed, 30 Nov 2011 00:25:32 -0500
Subject: [Bioperl-l] Exception MSG
Message-ID: <CAJm=ba9qxU2JY-i5K=sfuVcODpB941cRehpJgxcT0sThXYx68g@mail.gmail.com>

Hello,

Brushing up on my BioPerl and I can't figure out this MSG:

------------- EXCEPTION -------------

MSG: cannot open >/Users/mydata/Desktop/gi|255572219|ref|XP_002527049|.out

STACK Bio::Tools::Run::RemoteBlast::save_output
/Library/Perl/5.10.0/Bio/Tools/Run/RemoteBlast.pm:678

STACK toplevel /Users/mydata/Documents/workspace/BI7643/rb_ex.pl:40

-------------------------------------
 Here is the code:

#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;    # needed for the Bio::SeqIO->new call below
use Bio::Tools::Run::RemoteBlast;

#=cut

my $prog  = 'blastp';
my $db    = 'swissprot';
my $e_val = '1e-10';

my @params = ('-prog'       => $prog,
              '-data'       => $db,
              '-expect'     => $e_val,
              '-readmethod' => 'SearchIO');

my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

# human database
$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens [ORGN]';

my $v = 1; # this is just to turn on and off the messages

# Construct the sequence object
my $seq_in = Bio::SeqIO->new(-file   => "/Users/mydata/Desktop/rb_ex.fa",
                             -format => "fasta");

while (my $input = $seq_in->next_seq()) {
    my $r = $factory->submit_blast($input);
    print STDERR "waiting..." if ($v > 0);

    while (my @rids = $factory->each_rid()) {
        foreach my $rid (@rids) {
            my $rc = $factory->retrieve_blast($rid);
            if (!ref($rc)) {
                # not ready yet, or failed (negative return code)
                if ($rc < 0) {
                    $factory->remove_rid($rid);
                }
                print STDERR "." if ($v > 0);
                sleep 5;
            } else {
                my $result = $rc->next_result();

                # save output
                my $filename = ">/Users/mydata/Desktop/".$result->query_name().".out"; #error
                $factory->save_output($filename);
                $factory->remove_rid($rid);

                print "\nQuery Name: ", $result->query_name(), "\n";
                while (my $hit = $result->next_hit) {
                    next unless ($v > 0);
                    print "\thit name is ", $hit->name, "\n";
                    while (my $hsp = $hit->next_hsp) {
                        print "\t\tscore is ", $hsp->score, "\n";
                    }
                }
            }
        }
    }
}


Thanks for the help!


From jason.stajich at gmail.com  Wed Nov 30 01:05:41 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Tue, 29 Nov 2011 22:05:41 -0800
Subject: [Bioperl-l] Exception MSG
In-Reply-To: <CAJm=ba9qxU2JY-i5K=sfuVcODpB941cRehpJgxcT0sThXYx68g@mail.gmail.com>
References: <CAJm=ba9qxU2JY-i5K=sfuVcODpB941cRehpJgxcT0sThXYx68g@mail.gmail.com>
Message-ID: <50C532E3-5212-49CE-A3CA-9FE91BDC00CF@gmail.com>

I don't think you need to give it the '>' when you specify the filename for the output. That is done by the filehandle opening itself.
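
To illustrate the point, here is a minimal sketch in plain Perl. It assumes the saving routine opens the file itself and supplies the write mode; save_text() is a hypothetical stand-in written for this example, not the RemoteBlast code. With the mode already supplied, a leading '>' becomes part of the path, so the name is read as a file inside a directory called '>', which normally does not exist.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

# Hypothetical stand-in for a method that opens its own output file.
# This is NOT the RemoteBlast source, only the same general pattern.
sub save_text {
    my ($filename, $text) = @_;
    open(my $fh, '>', $filename) or return 0;   # the write mode is supplied here
    print {$fh} $text;
    close($fh);
    return 1;
}

# Plain path: the routine adds the '>' mode on its own.
my $ok_plain = save_text("$dir/result.out", "blast report\n");

# Path with a leading '>': the character is treated as part of the file name.
my $ok_prefix = save_text(">$dir/result.out", "blast report\n");

print "plain path:    ", ($ok_plain  ? "saved" : "cannot open"), "\n";
print "path with '>': ", ($ok_prefix ? "saved" : "cannot open"), "\n";
```

On a typical Unix system the second call is expected to fail in the same way as the exception above, unless a directory named '>' happens to exist in the working directory.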


From ss2489 at cornell.edu  Wed Nov 30 09:32:47 2011
From: ss2489 at cornell.edu (Surya Saha)
Date: Wed, 30 Nov 2011 09:32:47 -0500
Subject: [Bioperl-l] Exception MSG
In-Reply-To: <50C532E3-5212-49CE-A3CA-9FE91BDC00CF@gmail.com>
References: <CAJm=ba9qxU2JY-i5K=sfuVcODpB941cRehpJgxcT0sThXYx68g@mail.gmail.com>
	<50C532E3-5212-49CE-A3CA-9FE91BDC00CF@gmail.com>
Message-ID: <CAEiaqD=5V3_JVpzi1kjtgfCRmZg1VkOw2jWGUqNctkq9DbOnZw@mail.gmail.com>

If that does not fix it, try using one of the unique identifiers as the
file name (gi??) instead of the full query name. The pipe(|) characters
might cause problems.
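
A minimal sketch of that idea, in plain Perl. The helper name safe_name() is made up for this example, and the choice of which characters to keep is an assumption, not a BioPerl convention. It prefers the gi number when the query name has one and otherwise replaces anything outside a conservative character set with an underscore.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a query name into something safe to use
# as a file name on most file systems and shells.
sub safe_name {
    my ($query_name) = @_;

    # Prefer the gi number when the name looks like 'gi|12345|...'.
    if ($query_name =~ /^gi\|(\d+)\|/) {
        return "gi_$1";
    }

    # Otherwise replace every run of unusual characters with '_'.
    (my $safe = $query_name) =~ s/[^A-Za-z0-9._-]+/_/g;
    $safe =~ s/^_+|_+$//g;    # trim underscores left at either end
    return $safe;
}

for my $name ('gi|255572219|ref|XP_002527049|', 'sp|P12345|ABC_HUMAN') {
    print "$name -> ", safe_name($name), ".out\n";
}
```

The result can then be joined with the output directory in place of the raw query_name() value.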

On Wed, Nov 30, 2011 at 1:05 AM, Jason Stajich <jason.stajich at gmail.com> wrote:

> I don't think you need to give it the '>' when you specify the filename
> for the output. That is done by the filehandle opening itself.


From lsbrath at gmail.com  Wed Nov 30 09:34:52 2011
From: lsbrath at gmail.com (Mgavi Brathwaite)
Date: Wed, 30 Nov 2011 09:34:52 -0500
Subject: [Bioperl-l] Exception MSG
In-Reply-To: <CAEiaqD=5V3_JVpzi1kjtgfCRmZg1VkOw2jWGUqNctkq9DbOnZw@mail.gmail.com>
References: <CAJm=ba9qxU2JY-i5K=sfuVcODpB941cRehpJgxcT0sThXYx68g@mail.gmail.com>
	<50C532E3-5212-49CE-A3CA-9FE91BDC00CF@gmail.com>
	<CAEiaqD=5V3_JVpzi1kjtgfCRmZg1VkOw2jWGUqNctkq9DbOnZw@mail.gmail.com>
Message-ID: <CAJm=ba-yP6q53NunpxPJzdurthGE2uN3GAtiGs7eHm1rY6AdoA@mail.gmail.com>

Surya,

As Jason suggested, I removed the '>' and it worked. Thanks for your
response.

Lom

On Wed, Nov 30, 2011 at 9:32 AM, Surya Saha <ss2489 at cornell.edu> wrote:

> If that does not fix it, try using one of the unique identifiers as the
> file name (gi??) instead of the full query name. The pipe(|) characters
> might cause problems.


From ericdemuinck at gmail.com  Wed Nov 30 18:36:36 2011
From: ericdemuinck at gmail.com (Ericde)
Date: Wed, 30 Nov 2011 15:36:36 -0800 (PST)
Subject: [Bioperl-l] retrieving blast multiple alignment in fasta form
Message-ID: <32886592.post@talk.nabble.com>


:-/

I am a newbie and I am trying to retrieve a blast multiple alignment in
fasta form. The BLAST output (m -2) gives several alignments (which is good)
and the parsing of the xml file seems to list all of these alignments (which
is also good) 

The problem is that the fasta alignment file only includes one of the hits
and the alignment does not include all the sequences (including the query
sequence).

I would like to generate a fasta file that includes all the alignments
included in the m -2 output (plus query sequence if possible). I have
cobbled together a script (below) ...I will attach the sample blast xml file
and the (m -2) file as well....any insight is appreciated :/

#module load perl

#give the name of the blast xml file to parse in the line where it says 'file =>'
use Bio::SearchIO;
use Bio::AlignIO;
#Use m -7 to generate xml file from blastall
my $in = Bio::SearchIO->new(-format => 'blastxml',
                            -file   => 'BLASToutxml');
while( my $result = $in->next_result ) {
  ## $result is a Bio::Search::Result::ResultI compliant object
  while( my $hit = $result->next_hit ) {
    ## $hit is a Bio::Search::Hit::HitI compliant object
    while( my $hsp = $hit->next_hsp ) {
      ## $hsp is a Bio::Search::HSP::HSPI compliant object
      #ENTER desired sequence length
      if( $hsp->length('total') > 50 ) {
        #ENTER desired percent identity
        if ( $hsp->percent_identity >= 75 ) {
          print "Query=",   $result->query_name,
            " Hit=",        $hit->name,
            " Length=",     $hsp->length('total'),
            " Percent_id=", $hsp->percent_identity, "\n";
          #Print alignment to file
          #$aln will be a Bio::SimpleAlign object
          my $aln = $hsp->get_aln;

          #changed msf to fasta and hsp.msf to hsp.fas, output is now a fasta file
          my $alnIO = Bio::AlignIO->new(-format => "fasta", -file => ">hsp.fas");
          $alnIO->write_aln($aln);
        }
      }
    }
  }
}
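
One likely reason only a single hit ends up in hsp.fas: the output is re-opened with '>' for every HSP, and '>' truncates the file each time, so only the last alignment written survives. The sketch below shows the effect with plain file handles and made-up FASTA records standing in for the alignments; no BioPerl is involved and the file names are placeholders.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir     = tempdir(CLEANUP => 1);
my @records = (">hsp1\nACGT\n", ">hsp2\nGGCC\n", ">hsp3\nTTAA\n");   # stand-ins

# Pattern from the script: the file is re-opened with '>' for every record,
# which empties it before each write.
my $reopened = "$dir/reopened.fas";
for my $rec (@records) {
    open(my $fh, '>', $reopened) or die "cannot open $reopened: $!";
    print {$fh} $rec;
    close($fh);
}

# Alternative: open the output once, before the loop, and keep writing to it.
my $once = "$dir/once.fas";
open(my $out, '>', $once) or die "cannot open $once: $!";
print {$out} $_ for @records;
close($out);

# Count FASTA header lines in a file.
sub count_records {
    my ($path) = @_;
    open(my $in, '<', $path) or die "cannot open $path: $!";
    my $n = grep { /^>/ } <$in>;
    close($in);
    return $n;
}

printf "re-opened each time: %d record(s)\n", count_records($reopened);
printf "opened once:         %d record(s)\n", count_records($once);
```

In the script above, the equivalent change would be to create the Bio::AlignIO object once, before the outer while loop, and only call write_aln() inside it. Note that each alignment returned by get_aln is a pairwise alignment of the query against one hit, so this gives a file of pairwise alignments rather than a single multiple alignment.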
http://old.nabble.com/file/p32886592/BLASToutxml BLASToutxml 
http://old.nabble.com/file/p32886592/hsp.fas hsp.fas 


From hrh at fmi.ch  Tue Nov  1 06:18:54 2011
From: hrh at fmi.ch (Hotz, Hans-Rudolf)
Date: Tue, 1 Nov 2011 11:18:54 +0100
Subject: [Bioperl-l] best way to edit sequence features
In-Reply-To: <CAPOrs_2uZ6TPghqAXwVeXwSeHBm+iomTXMGqswR38V_L=SQWyw@mail.gmail.com>
Message-ID: <CAD5861E.14042%hrh@fmi.ch>

Hi Carnë

Please allow me to make a few comments:

I very much like your idea of writing a free tool to edit and draw
sequences. We (ie people working in core Bioinformatics facilities) all
suffer from having to deal with files originally created with commercial
packages. And on top of all the pain, those commercial packages are very
expensive and they don't deliver what they promise to do.


Just double checking: Have you looked at the free tools which are available?

I am aware of the following ones (as far as I know, they are all GUI based
and don't have a command line API):

Serial Cloner     http://serialbasics.free.fr/Serial_Cloner.html
GENtle            http://gentle.magnusmanske.de/
GeneCoder         http://www.algosome.com/gene-coder/gene-coder.html
pDRAW32           http://www.acaclone.com/
Genome Workbench  http://www.ncbi.nlm.nih.gov/projects/gbench/
Ape               http://www.biology.utah.edu/jorgensen/wayned/ape/
UGene             http://ugene.unipro.ru/

maybe others on the list know of even better free tools?

Also, have you looked at the emboss tool "cirdna" ?


WRT file formats: I strongly suggest sticking to embl and genbank format as
input and (text) output format. The features are not indexed, but you can
create your own when you store the sequences in your system. Internally, you
probably wanna keep the data in a 'simpler' format than embl or genbank,
anyway.

Alternatively, have you looked at gff/gtf as a way of getting features?
see: 

http://www.sequenceontology.org/gff3.shtml
http://mblab.wustl.edu/GTF22.html


I am looking forward to any progress you make

Regards, Hans


Hans-Rudolf Hotz, PhD
Bioinformatics Support

Friedrich Miescher Institute for Biomedical Research
Maulbeerstrasse 66
4058 Basel/Switzerland


On 10/31/11 7:05 PM, "Carnë Draug" <carandraug+dev at gmail.com> wrote:

> Hi
> 
> I've been planning on writing a free (as in freedom) tool to edit
> sequences and make plasmid maps. The idea is to build the command line
> tool first and maybe later work on a GUI for it.
> 
> The problem I foresee at the moment while designing it, is how to
> change a feature of the sequence. I'm not familiar with all sequence
> formats (only fasta, ensembl and genbank) but I can't see how to
> specify from the command line what feature to edit since I can't see
> any unique identifiers for them. Is there a file format that makes
> this easier? Any tips would be most appreciated.
> 
> Thanks in advance,
> Carnë Draug


From cjfields at illinois.edu  Tue Nov  1 09:40:30 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 1 Nov 2011 13:40:30 +0000
Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual
In-Reply-To: <E824C753-0688-4350-96B8-638BA1B37E73@neuro.utah.edu>
References: <E824C753-0688-4350-96B8-638BA1B37E73@neuro.utah.edu>
Message-ID: <26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu>

On Oct 24, 2011, at 9:58 AM, Sofia Robb wrote:

> Hi,
> 
> I am having problems running Bio::Index::Fastq.  I get the following error when a quality line begins with '@'.
> 
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: No description line parsed
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:368
> STACK: Bio::SeqIO::fastq::next_dataset /usr/share/perl5/Bio/SeqIO/fastq.pm:71
> STACK: Bio::SeqIO::fastq::next_seq /usr/share/perl5/Bio/SeqIO/fastq.pm:29
> STACK: Bio::Index::AbstractSeq::fetch /usr/share/perl5/Bio/Index/AbstractSeq.pm:147
> STACK: Bio::Index::AbstractSeq::get_Seq_by_id /usr/share/perl5/Bio/Index/AbstractSeq.pm:198
> STACK: /home_stajichlab/robb/bin/clean_pairs_indexed.pl:68
> 
> 
> Here is an example of a fastq record that is causing this error, The last line which starts with an '@'  is actually the qual line.
> 
> @5:105:15806:16092:Y
> GTGGCGCGGAACAGAGGAGGAATGTTCAGGAGAGGGGGCATGTGTTGTTACCGAGTACTTGGAAACGACG
> +
> @9;A565:=8B?<E<DEEBEE<E3BB?3??BCCF2<@@=BGGBDB60:64594.81?<B??;3?8-984?
> 
> 
> 
> i see that chris has partially addressed this in the mailing list
> http://bioperl.org/pipermail/bioperl-l/2011-January/034481.html
> 
> However as he pointed out at the time, it appears this may be a fairly large problem.

The indexer is being refactored to address this problem; the Bio::SeqIO parser actually does parse this, but the (very simple) indexer does not.  I can try to push this to the forefront this week, the fix shouldn't be too hard to implement.  In essence it would simply use a few SeqIO methods I built in to parse out each bit of data in chunks; would just need to track the start and length of each chunk while the parser is running.

> My fastq seq and qual lines are always only one line, so I think that adding a line count and only checking for @ in the lines where $line_count % 4 == 0 would work, since the header lines are always the first of 4 lines (0, 4, 8, etc.).

That doesn't work for all cases, however (some FASTQ wraps the seq and qual, like FASTA). Peter and I have discussed this elsewhere; a possible solution is to add in an optimized parser that takes this assumption into account.
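
For the wrapped case, one way to find record boundaries without looking for '@' at all is to count characters: read sequence lines up to the '+' line, then read quality lines until as many characters have been seen as there were in the sequence. The following is only a sketch in plain Perl under that assumption; index_fastq() and fetch_record() are hypothetical names, not part of Bio::Index::Fastq, and the test records are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: returns { id => [byte_offset, byte_length] } per record.
sub index_fastq {
    my ($fh) = @_;
    my %index;
    while (1) {
        my $start  = tell($fh);
        my $header = <$fh>;
        last unless defined $header;
        next if $header =~ /^\s*$/;             # tolerate blank lines between records
        $header =~ /^\@(\S+)/ or die "expected a header line at byte $start\n";
        my $id = $1;

        # Sequence lines run until the '+' separator line.
        my $seq_len = 0;
        while (defined(my $line = <$fh>)) {
            last if $line =~ /^\+/;
            $line =~ s/\s+$//;
            $seq_len += length $line;
        }

        # Quality lines run until they cover the sequence length, so a
        # quality line starting with '@' or '+' is never taken for markup.
        my $qual_len = 0;
        while ($qual_len < $seq_len) {
            my $line = <$fh>;
            die "truncated record '$id'\n" unless defined $line;
            $line =~ s/\s+$//;
            $qual_len += length $line;
        }
        $index{$id} = [ $start, tell($fh) - $start ];
    }
    return \%index;
}

# Hypothetical helper: pull one record back out using the stored offsets.
sub fetch_record {
    my ($fh, $index, $id) = @_;
    my ($offset, $length) = @{ $index->{$id} };
    seek($fh, $offset, 0) or die "seek failed: $!";
    read($fh, my $record, $length);
    return $record;
}

# Made-up data: r1 is a 4-line record, r2 wraps both sequence and quality.
my $data = join "\n",
    '@r1', 'ACGTACGT', '+', '@IIIIII#',
    '@r2', 'ACGT', 'TTGA', '+r2', '@@II', '+III', '';
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";
my $index = index_fastq($fh);
print fetch_record($fh, $index, 'r2');
```

The price is that every line has to be read and measured, which is why a stricter 4-line fast path can still be attractive for large files.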

One problem the various Bio* indexers have currently is the lack of standardization on a specific schema for indexing.  There are in-roads towards this (OBDA) that haven't been adequately traveled IMHO, which need to be taken up again.

A second, and maybe this is more specific to BioPerl, is that the parsers and indexers essentially reimplement the format parsing in each module, so if there are bugs they have to be independently fixed (hence why SeqIO works and the indexer doesn't; I wrote the first but not the second).  The best place for any optimizations would be in a unified parser that both the SeqIO and indexer modules could use.

> But if there are multiple lines of seq and qual, I think that /^\+$/ or /^\+$id/ can be used to identify the end of the sequence, and the number of lines of quality should be equal to the number of lines of sequence
> 
> 
> ## only for single line seq and qual
> my $line_count = 0;
>   while (<$FASTQ>) {
>       if (/^@/ and  $line_count % 4 == 0) {
>           # $begin is the position of the first character after the '@'
>           my $begin = tell($FASTQ) - length( $_ ) + 1;
>           foreach my $id (&$id_parser($_)) {
>               $self->add_record($id, $i, $begin);
>               $c++;
>           }
>       }
>       $line_count++;
>   }
> 
> 
> --
> BioPerl fastq parsing issues aside, is there another tool which allows you to retrieve arbitrary sequences from a fastq file by sequence ID?
> 
> There's one called cdbfasta which looks like it might work; does anyone have experience with it?

I haven't, but it appears FASTA-specific.  Does it parse FASTQ as well?  

I recall Sanger has a C-based FASTA/FASTQ hybrid one as well.  May have to look that one up.

> Thanks,
> sofia
> 
> P.S. I am CCing Peter Cock in case BioPython has solved this issue already; if so, perhaps their solution could be applied here.


chris


From p.j.a.cock at googlemail.com  Tue Nov  1 10:38:43 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 1 Nov 2011 14:38:43 +0000
Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual
In-Reply-To: <26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu>
References: <E824C753-0688-4350-96B8-638BA1B37E73@neuro.utah.edu>
	<26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu>
Message-ID: <CAKVJ-_5_NEXALy1abueFCQqKRnteJpu=__VCqSDGyemorxs53A@mail.gmail.com>

On Tue, Nov 1, 2011 at 1:40 PM, Fields, Christopher J
<cjfields at illinois.edu> wrote:
>
> One problem the various Bio* indexers have currently is the lack of
> standardization on a specific schema for indexing.  There are in-roads
> towards this (OBDA) that haven't been adequately traveled IMHO,
> which need to be taken up again.
>

Something to switch to open-bio-l at lists.open-bio.org for,
http://lists.open-bio.org/mailman/listinfo/open-bio-l

We can continue this thread from last summer,
http://lists.open-bio.org/pipermail/open-bio-l/2010-April/000662.html
http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000676.html
...
http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000680.html

And CC Peter Rice from EMBOSS too - we chatted about this
at ISMB/BOSC 2011 in July - and whomever looks after the
OBDA/indexing code in BioRuby and BioJava too.

> A second, and maybe this is more specific to BioPerl, is that the
> parsers and indexers essentially reimplement the format parsing
> in each module, so if there are bugs they have to be independently
> fixed (hence why SeqIO works and the indexer doesn't; I wrote the
> first but not the second).  The best place for any optimizations
> would be in a unified parser that both the SeqIO and indexer
> modules could use.

We have that problem to an extent in Biopython's Bio.SeqIO code.
The indexing code duplicates some logic of the parsing code
(how much depends on the format), sufficient to extract the read
ID and the bounds on disk. The two could be more unified but
the parsers came first and I didn't want to change them at the time.
Instead I tried to be rigorous in consistency testing for the index
code's unit tests.

Regards,

Peter


From carandraug+dev at gmail.com  Tue Nov  1 11:13:06 2011
From: carandraug+dev at gmail.com (=?ISO-8859-1?Q?Carn=EB_Draug?=)
Date: Tue, 1 Nov 2011 15:13:06 +0000
Subject: [Bioperl-l] best way to edit sequence features
In-Reply-To: <CAD5861E.14042%hrh@fmi.ch>
References: <CAPOrs_2uZ6TPghqAXwVeXwSeHBm+iomTXMGqswR38V_L=SQWyw@mail.gmail.com>
	<CAD5861E.14042%hrh@fmi.ch>
Message-ID: <CAPOrs_0rZcokpSvAhMM3gtKWgeH3knDuTfnyybPJUU5D-WEgpA@mail.gmail.com>

On 1 November 2011 10:18, Hotz, Hans-Rudolf <hrh at fmi.ch> wrote:
> I am aware of the following ones (as far as I know, they are all GUI based
> and don't have a command line API):

They are not all free. Just for future reference, here are their licenses:

> Serial Cloner

Couldn't find a license and the download for linux has no source so
I'm guessing proprietary.

> GENtle            http://gentle.magnusmanske.de/

Free under GPL

> GeneCoder

Proprietary

> pDRAW32

Proprietary

> Genome Workbench  http://www.ncbi.nlm.nih.gov/projects/gbench/

Seems public domain. License is not defined anywhere but the files I
checked had the public domain notice on the header

> Ape

Proprietary ("license" is at the top of AppMain.tcl)

> UGene             http://ugene.unipro.ru/

Free under GPL

> Also, have you looked at the emboss tool "cirdna" ?

Free under GPL

> WRT file formats: I strongly suggest to stick to embl and genbank format as
> input and (text) output format. The features are not indexed, but you can
> create your own when you store the sequences in your system. Internally, you
> probably wanna keep the data in a 'simpler' format than embl or genbank,
> anyway.
>
> Alternatively, have you looked at gff/gtf as away of getting features?
> see:
>
> http://www.sequenceontology.org/gff3.shtml
> http://mblab.wustl.edu/GTF22.html

Considering the already existing alternatives, I'm more likely to
collaborate with one of them to do what I want. I'll just have to
check them all and decide. I was planning on writing a new tool and
contribute it to the scripts section of bioperl since, when I googled
before getting these links, only the proprietary tools showed up. Thank you
very much for the links.

Carnë


From roy.chaudhuri at gmail.com  Tue Nov  1 11:44:19 2011
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Tue, 01 Nov 2011 15:44:19 +0000
Subject: [Bioperl-l] best way to edit sequence features
In-Reply-To: <CAD5861E.14042%hrh@fmi.ch>
References: <CAD5861E.14042%hrh@fmi.ch>
Message-ID: <4EB013D3.30801@gmail.com>

The Sanger Institute's Artemis is good for editing sequence features, 
and DNAPlotter can be used to produce circular diagrams:

http://www.sanger.ac.uk/resources/software/artemis
http://www.sanger.ac.uk/resources/software/dnaplotter

Roy.


From jason.stajich at gmail.com  Tue Nov  1 12:02:24 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Tue, 1 Nov 2011 09:02:24 -0700
Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual
In-Reply-To: <CAKVJ-_5_NEXALy1abueFCQqKRnteJpu=__VCqSDGyemorxs53A@mail.gmail.com>
References: <E824C753-0688-4350-96B8-638BA1B37E73@neuro.utah.edu>
	<26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu>
	<CAKVJ-_5_NEXALy1abueFCQqKRnteJpu=__VCqSDGyemorxs53A@mail.gmail.com>
Message-ID: <47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com>


I think a different indexer is needed for the scale of key/value pairs we see in fastq files if we want to make a fast lookup by ID. I think speed is of the essence for this type of solution, so requiring that all records be exactly 4 lines long is okay for this type of implementation.
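
As a rough sketch of what the strict 4-lines-per-record fast path could look like in plain Perl (index_fastq_4line() is a made-up name and the records are invented; this is not the Bio::Index::Fastq code): only every fourth line is treated as a header, so a quality line that begins with '@' is never inspected.

```perl
use strict;
use warnings;

# Hypothetical helper: map each read ID to the byte offset of its header,
# assuming every record is exactly 4 lines long.
sub index_fastq_4line {
    my ($fh) = @_;
    my %offset;
    my $line_count = 0;
    my $pos        = tell($fh);
    while (my $line = <$fh>) {
        if ($line_count % 4 == 0) {
            $line =~ /^\@(\S+)/
                or die "line " . ($line_count + 1) . " is not a FASTQ header\n";
            $offset{$1} = $pos;
        }
        $line_count++;
        $pos = tell($fh);      # start of the next line
    }
    return \%offset;
}

# Made-up records; the quality line of r1 starts with '@'.
my $data = join "\n",
    '@r1', 'ACGTACGT', '+', '@IIIIII#',
    '@r2', 'TTGACCAA', '+', 'IIIIIIII', '';
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";
my $offsets = index_fastq_4line($fh);
print "$_ starts at byte $offsets->{$_}\n" for sort keys %$offsets;
```

The offsets (or the whole record) could then be stored as the value under the read ID in whichever key/value store is chosen.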

I found NoSQL implementations to perform much better than any of the BDB-type solutions, which end up being really slow above 1-5M keys.  I used TokyoCabinet and KyotoCabinet to index accession -> taxonomy ID and found them quite fast for my needs. I haven't tried storing 100bp reads + qual string as the value yet, but I think it could be done; certainly worth a prototype.

Jason
On Nov 1, 2011, at 7:38 AM, Peter Cock wrote:

> On Tue, Nov 1, 2011 at 1:40 PM, Fields, Christopher J
> <cjfields at illinois.edu> wrote:
>> 
>> One problem the various Bio* indexers have currently is the lack of
>> standardization on a specific schema for indexing.  There are in-roads
>> towards this (OBDA) that haven't been adequately traveled IMHO,
>> which need to be taken up again.
>> 
> 
> Something to switch to open-bio-l at lists.open-bio.org for,
> http://lists.open-bio.org/mailman/listinfo/open-bio-l
> 
> We can continue this thread from last summer,
> http://lists.open-bio.org/pipermail/open-bio-l/2010-April/000662.html
> http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000676.html
> ...
> http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000680.html
> 
> And CC Peter Rice from EMBOSS too - we chatted about this
> at ISMB/BOSC 2011 in July - and whomever looks after the
> OBDA/indexing code in BioRuby and BioJava too.
> 
>> A second, and maybe this is more specific to BioPerl, is that the
>> parsers and indexers essentially reimplement the format parsing
>> in each module, so if there are bugs they have to be independently
>> fixed (hence why SeqIO works and the indexer doesn't; I wrote the
>> first but not the second).  The best place for any optimizations
>> would be in a unified parser that both the SeqIO and indexer
>> modules could use.
> 
> We have that problem to an extent in Biopython's Bio.SeqIO code.
> The indexing code duplicates some logic of the parsing code
> (how much depends on the format), sufficient to extract the read
> ID and the bounds on disk. The two could be more unified but
> the parsers came first and I didn't want to change them at the time.
> Instead I tried to be rigorous in consistency testing for the index
> code's unit tests.
> 
> Regards,
> 
> Peter
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Tue Nov  1 13:44:25 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 1 Nov 2011 17:44:25 +0000
Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual
In-Reply-To: <47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com>
References: <E824C753-0688-4350-96B8-638BA1B37E73@neuro.utah.edu>
	<26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu>
	<CAKVJ-_5_NEXALy1abueFCQqKRnteJpu=__VCqSDGyemorxs53A@mail.gmail.com>
	<47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com>
Message-ID: <6EFDCA41-D385-4CCB-A17B-0736F93DACDF@illinois.edu>

On Nov 1, 2011, at 11:02 AM, Jason Stajich wrote:

> I think a different indexer is needed for the scale of key/value pairs we see in fastq files if we want to make a fast lookup by ID. I think speed is of the essence for this type of solution, so requiring that all records be exactly 4 lines long is okay for this type of implementation.

This can always be an early optimization, that's easy enough. But I'm sure we will have to deal with multi-line seq/qual FASTQ at some point.  
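
As a rough illustration of the 4-line shortcut discussed above, here is a minimal Python sketch (standard library only, not BioPerl's Bio::Index::Fastq code). The function name and the sample data are made up for illustration. Because records are consumed strictly four lines at a time, the quality line is never inspected, so an '@' at the start of a quality string cannot be mistaken for a header. Multi-line seq/qual FASTQ is explicitly out of scope here.

```python
import io


def index_fastq_4line(handle):
    """Map read ID -> byte offset, assuming exactly 4 lines per record."""
    offsets = {}
    while True:
        start = handle.tell()
        header = handle.readline()
        if not header:
            break  # end of file
        handle.readline()          # sequence line (not needed for the index)
        plus = handle.readline()   # separator line
        handle.readline()          # quality line, never inspected
        if not header.startswith(b"@") or not plus.startswith(b"+"):
            raise ValueError("not a 4-line FASTQ record at byte %d" % start)
        # The ID is the first whitespace-delimited token after the '@'.
        read_id = header[1:].split()[0].decode("ascii")
        offsets[read_id] = start
    return offsets


# Both quality strings contain '@'; the first one even starts with it.
data = b"@read1 first\nACGT\n+\n@III\n@read2\nTTGA\n+\nII@I\n"
index = index_fastq_4line(io.BytesIO(data))
print(index)
```

The offsets could then be stored in whatever back end is chosen (BDB, a NOSQL store, SQLite) and used with seek() to fetch a single record.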

> I found NOSQL implementations to perform much better than any of the BDB-type solutions -- those end up being really slow above 1-5M keys.  I used TokyoCabinet and KyotoCabinet to index accession -> taxonomy ID and found them quite fast for my needs. I haven't tried storing 100bp reads + qual string as the value yet, but I think it could be done; certainly worth a prototype.

Adding a middle layer where the backend storage is abstracted is probably the (best|most flexible) option, converging on a good default that will work for this data.  The actual interface is in place, though would it be more feasible to go the OBDA route (converge on a cross-Bio* compatible schema)?  Or are there problems afoot there we're unaware of?

Re: specifics, I think Biopython uses SQLite, is that correct Peter?  

chris



From p.j.a.cock at googlemail.com  Tue Nov  1 14:06:50 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 1 Nov 2011 18:06:50 +0000
Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual
In-Reply-To: <6EFDCA41-D385-4CCB-A17B-0736F93DACDF@illinois.edu>
References: <E824C753-0688-4350-96B8-638BA1B37E73@neuro.utah.edu>
	<26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu>
	<CAKVJ-_5_NEXALy1abueFCQqKRnteJpu=__VCqSDGyemorxs53A@mail.gmail.com>
	<47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com>
	<6EFDCA41-D385-4CCB-A17B-0736F93DACDF@illinois.edu>
Message-ID: <CAKVJ-_5iiR1-70V17KBVm-vz5hkwRf6NVLWtAWe+HOUHO+1M2w@mail.gmail.com>

On Tue, Nov 1, 2011 at 5:44 PM, Fields, Christopher J
<cjfields at illinois.edu> wrote:
> On Nov 1, 2011, at 11:02 AM, Jason Stajich wrote:
>
>> I think a different indexer is needed for the scale of key/value
>> pairs we see in fastq files if we want to make a fast lookup by
>> ID. I think speed is of the essence for this type of solution, so
>> requiring that all records be exactly 4 lines long is okay for
>> this type of implementation.
>
> This can always be an early optimization, that's easy enough.
> But I'm sure we will have to deal with multi-line seq/qual
> FASTQ at some point.
>
>> I found NOSQL implementations to perform much better than
>> any of the BDB-type solutions -- those end up being really
>> slow above 1-5M keys.  I used TokyoCabinet and KyotoCabinet
>> to index accession -> taxonomy ID and found them quite fast
>> for my needs. I haven't tried storing 100bp reads + qual
>> string as the value yet, but I think it could be done;
>> certainly worth a prototype.
>
> Adding a middle layer where the backend storage is abstracted
> is probably the (best|most flexible) option, converging on a
> good default that will work for this data.  The actual interface is
> in place, though would it be more feasible to go the OBDA route
> (converge on a cross-Bio* compatible schema)?  Or are there
> problems afoot there we're unaware of?
>
> Re: specifics, I think Biopython uses SQLite, is that correct Peter?
>
> chris

Yes, we're using SQLite3 to store essentially a list of filenames
and their format as one table, and then in the main table an
entry for each sequence recording the ID (only one accession,
unlike OBDA which had infrastructure for a secondary accession),
file number, offset of the start of the record, and optionally the
length of the record on disk.

i.e. Basically what OBDA does, but using SQLite rather
than BDB (not included in Python 3) or a flat file index
(poor performance with large datasets).

I find this design attractive on several levels:
* File format neutral, covers FASTA, FASTQ, GenBank, etc
* Preserves the original file untouched
* Index is a small single file (thanks to SQLite)
* Back end could be switched out
* Could be applied to compressed file formats
* Reuses existing parsing code to access entries
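
To make the two-table layout described above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (source_file, record, start_offset, byte_length) and the sample values are assumptions chosen for illustration; they are not claimed to be the actual Biopython schema or any agreed OBDA v2 schema.

```python
import sqlite3


def build_index(con, files):
    """Create and fill the two tables.

    files: list of (filename, format, records), where records is a list of
    (record_id, offset, length) tuples.
    """
    con.execute(
        "CREATE TABLE source_file ("
        "file_number INTEGER PRIMARY KEY, name TEXT, format TEXT)"
    )
    con.execute(
        "CREATE TABLE record ("
        "id TEXT PRIMARY KEY, file_number INTEGER, "
        "start_offset INTEGER, byte_length INTEGER)"
    )
    for file_number, (name, fmt, records) in enumerate(files):
        con.execute(
            "INSERT INTO source_file VALUES (?, ?, ?)", (file_number, name, fmt)
        )
        con.executemany(
            "INSERT INTO record VALUES (?, ?, ?, ?)",
            [(rid, file_number, off, length) for rid, off, length in records],
        )
    con.commit()


def lookup(con, record_id):
    """Return (filename, format, offset, length) for an ID, or None."""
    return con.execute(
        "SELECT f.name, f.format, r.start_offset, r.byte_length "
        "FROM record AS r JOIN source_file AS f "
        "ON f.file_number = r.file_number WHERE r.id = ?",
        (record_id,),
    ).fetchone()


# Hypothetical file name and offsets, held in an in-memory database.
con = sqlite3.connect(":memory:")
build_index(con, [("reads.fastq", "fastq", [("read1", 0, 25), ("read2", 25, 20)])])
print(lookup(con, "read2"))
```

The index only records where each entry lives; fetching a record still means seeking into the original file and handing the bytes to the existing parser.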

This could easily form the basis of OBDA v2; the main points
of difference I anticipate between the Bio* projects would
be naming conventions for the different file formats, and
what we consider to be the default record ID of each read
(e.g. which field in a GenBank file - although agreement
here is not essential). Some of that was already settled in
principle with OBDA v1.

On the other hand, you could try and store the parsed data
itself, which is where NOSQL looks more interesting. That
essentially requires the ability to serialise your annotated
sequence object model to disk - which would be tricky to do
cross project (much more ambitious than BioSQL is). It also
means the "index" becomes very large because it now holds
all the original data.

Peter


From wenbinmei at gmail.com  Wed Nov  2 00:25:32 2011
From: wenbinmei at gmail.com (wenbin mei)
Date: Wed, 2 Nov 2011 00:25:32 -0400
Subject: [Bioperl-l] how to not count gaps in the multiple sequence alignment
Message-ID: <CAHdrE2Q6weQ+7t_4X3_AZmu4JLQ3uGf3=s14UuOpDVa368V9aA@mail.gmail.com>

Hi,

I need some help with coding. I have a multiple sequence alignment which has
gaps. The alignment also includes a reference genome sequence for which I
know the coordinates of all the protein coding genes. I want to extract
the alignments of all these protein coding genes from the big alignment. I am
using Bio::SimpleAlign, but the problem is that, due to the gaps in the
alignment, the coordinates have shifted. I wonder whether there is a way to
not count the gaps and still be able to extract the protein alignment. One
thing I could do is remove the gaps in the reference first and then extract
the sequence. But I don't like this way ... Thank you for your help.

-best,
wenbin


From dejian.zhao at gmail.com  Wed Nov  2 09:33:18 2011
From: dejian.zhao at gmail.com (Dejian Zhao)
Date: Wed, 02 Nov 2011 21:33:18 +0800
Subject: [Bioperl-l] Modules to read MEGA output and reproduce the
	phylogenetic tree
Message-ID: <4EB1469E.4050108@gmail.com>

There are various packages on CPAN to cope with phylogenetic analysis. I 
wonder which module can read the output from other phylogenetic 
software, e.g. MEGA, and reproduce the phylogenetic tree. I want to 
produce a picture which combines the phylogenetic tree and the structure 
of each gene.


From roy.chaudhuri at gmail.com  Wed Nov  2 09:49:46 2011
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Wed, 02 Nov 2011 13:49:46 +0000
Subject: [Bioperl-l] Modules to read MEGA output and reproduce the
 phylogenetic tree
In-Reply-To: <4EB1469E.4050108@gmail.com>
References: <4EB1469E.4050108@gmail.com>
Message-ID: <4EB14A7A.30307@gmail.com>

MEGA can export trees in Newick format, which can be read by 
Bio::TreeIO. The tree can be drawn in EPS format using 
Bio::Tree::Draw::Cladogram. See:
http://www.bioperl.org/wiki/HOWTO:Trees

Roy.



From jun.yin at ucd.ie  Wed Nov  2 12:29:45 2011
From: jun.yin at ucd.ie (Jun Yin)
Date: Wed, 02 Nov 2011 16:29:45 +0000 (GMT)
Subject: [Bioperl-l] how to not count gaps in the multiple sequence
 alignment
In-Reply-To: <CAHdrE2Q6weQ+7t_4X3_AZmu4JLQ3uGf3=s14UuOpDVa368V9aA@mail.gmail.com>
References: <CAHdrE2Q6weQ+7t_4X3_AZmu4JLQ3uGf3=s14UuOpDVa368V9aA@mail.gmail.com>
Message-ID: <7300ecdd1dd56.4eb16ff9@ucd.ie>

Hi,
 
You need to calculate the coordinates of the protein coding gene in the alignment yourself. After that, you can use the slice method to get the alignment block for the selected gene, e.g.
 
$aln2 = $aln->slice(20, 30);
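
The coordinate calculation itself is just a walk along the gapped reference row, counting non-gap characters. Below is a minimal language-neutral sketch in Python; the function name, gap characters and toy sequence are assumptions for illustration. In BioPerl, Bio::SimpleAlign also documents a column_from_residue_number method that appears to cover the same lookup, so check that before rolling your own.

```python
def ungapped_to_column(aligned_ref, position, gap_chars="-."):
    """Return the 1-based alignment column holding the 1-based ungapped
    residue `position` of the gapped reference row `aligned_ref`."""
    seen = 0
    for column, char in enumerate(aligned_ref, start=1):
        if char not in gap_chars:
            seen += 1
            if seen == position:
                return column
    raise ValueError("position %d is beyond the ungapped length" % position)


# Toy reference row; its ungapped sequence is ATGCCTA.
aligned_ref = "AT--GCC-TA"
# A gene at ungapped coordinates 3..6 maps to these alignment columns,
# which are the values one would then pass to slice().
start_col = ungapped_to_column(aligned_ref, 3)
end_col = ungapped_to_column(aligned_ref, 6)
print(start_col, end_col)
```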
 
Cheers,
Jun




From dejian.zhao at gmail.com  Wed Nov  2 21:39:22 2011
From: dejian.zhao at gmail.com (Dejian Zhao)
Date: Thu, 03 Nov 2011 09:39:22 +0800
Subject: [Bioperl-l] Modules to read MEGA output and reproduce the
 phylogenetic tree
In-Reply-To: <4EB14A7A.30307@gmail.com>
References: <4EB1469E.4050108@gmail.com> <4EB14A7A.30307@gmail.com>
Message-ID: <4EB1F0CA.80309@gmail.com>

That's great!
Many thanks, Roy.



From noncoding at gmail.com  Thu Nov  3 05:59:26 2011
From: noncoding at gmail.com (Remo Sanges)
Date: Thu, 03 Nov 2011 10:59:26 +0100
Subject: [Bioperl-l] how to not count gaps in the multiple sequence
	alignment
In-Reply-To: <7300ecdd1dd56.4eb16ff9@ucd.ie>
References: <CAHdrE2Q6weQ+7t_4X3_AZmu4JLQ3uGf3=s14UuOpDVa368V9aA@mail.gmail.com>
	<7300ecdd1dd56.4eb16ff9@ucd.ie>
Message-ID: <4EB265FE.30909@gmail.com>

To get the location in the initial sequence starting from a column in a 
multiple alignment you can:

1) get a Bio::LocatableSeq compliant object by calling the method 
each_seq_with_id on the SimpleAlign object

2) then call the method location_from_column on that 
LocatableSeq object

HTH

Remo


-- 
Remo Sanges
Bioinformatics - Animal Physiology and Evolution
Stazione Zoologica Anton Dohrn
Villa Comunale, 80121 Napoli - Italy
+39 081 5833428




From G.Gallone at sms.ed.ac.uk  Thu Nov  3 07:50:11 2011
From: G.Gallone at sms.ed.ac.uk (Giuseppe G.)
Date: Thu, 03 Nov 2011 11:50:11 +0000
Subject: [Bioperl-l] Bio::SimpleAlign - Meaning of
	overall_percentage_identity?
Message-ID: <4EB27FF3.9050203@sms.ed.ac.uk>

Hi,

I would be grateful if you could shed some light on the exact meaning of 
the method overall_percentage_identity in Bio::SimpleAlign.

If I understand correctly, the method works by considering only 
amino acids that are identical over all the members of the alignment, and 
then averaging over the total number of amino acids in the sequence. Is 
this correct?

Thank you
Giuseppe
-- 

The University of Edinburgh is a charitable body, registered in 
Scotland, with registration number SC005336.


From David.Messina at sbc.su.se  Thu Nov  3 09:22:21 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 3 Nov 2011 14:22:21 +0100
Subject: [Bioperl-l] Bio::SimpleAlign - Meaning of
	overall_percentage_identity?
In-Reply-To: <4EB27FF3.9050203@sms.ed.ac.uk>
References: <4EB27FF3.9050203@sms.ed.ac.uk>
Message-ID: <CAM3TQQWm46SWfu-6DANDaoppi8oLKGuzwGm8uxkVkf_JAog3xg@mail.gmail.com>

Hi Giuseppe,

> If I understand correctly, the method works by considering only aminoacids
> that are identical over all the members of the alignment


Yes.


> , and then averaging over the total number of aminoacids in the sequence.
> Is this correct?
>

Almost.

By default, the denominator is the alignment length, namely the length of
the MSA including gaps. By means of the 'short' and 'long' options, it's
also possible to use the shortest or longest sequence's ungapped lengths as
the denominator.
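
To illustrate the three denominators, here is a small Python sketch of the calculation as described above. It is not BioPerl's implementation: the function name is made up, and treating an all-gap column as non-identical is an assumption of this sketch.

```python
def percent_identity(rows, denominator="align", gap_chars="-."):
    """Percentage of columns identical across all rows.

    denominator: 'align' (alignment length including gaps), 'short' or
    'long' (shortest or longest ungapped sequence length).
    """
    columns = list(zip(*rows))
    # A column counts only if every row has the same non-gap character.
    identical = sum(
        1
        for col in columns
        if col[0] not in gap_chars and all(c == col[0] for c in col)
    )
    ungapped = [sum(1 for c in row if c not in gap_chars) for row in rows]
    if denominator == "align":
        total = len(columns)
    elif denominator == "short":
        total = min(ungapped)
    elif denominator == "long":
        total = max(ungapped)
    else:
        raise ValueError("unknown denominator: %r" % denominator)
    return 100.0 * identical / total


# Toy alignment: 4 of 6 columns are identical; ungapped lengths are 5, 6, 5.
rows = ["ACGT-A", "ACCTTA", "AC-TTA"]
for mode in ("align", "short", "long"):
    print(mode, round(percent_identity(rows, mode), 2))
```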


Dave


From cjfields at illinois.edu  Thu Nov  3 14:28:36 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 3 Nov 2011 18:28:36 +0000
Subject: [Bioperl-l] OBDA redux? was Re:  Bio::Index::Fastq '@' in qual
In-Reply-To: <CAKVJ-_5iiR1-70V17KBVm-vz5hkwRf6NVLWtAWe+HOUHO+1M2w@mail.gmail.com>
References: <E824C753-0688-4350-96B8-638BA1B37E73@neuro.utah.edu>
	<26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu>
	<CAKVJ-_5_NEXALy1abueFCQqKRnteJpu=__VCqSDGyemorxs53A@mail.gmail.com>
	<47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com>
	<6EFDCA41-D385-4CCB-A17B-0736F93DACDF@illinois.edu>
	<CAKVJ-_5iiR1-70V17KBVm-vz5hkwRf6NVLWtAWe+HOUHO+1M2w@mail.gmail.com>
Message-ID: <ED419B5E-9C55-478F-BDD6-C2B663ABE636@illinois.edu>

(side thread, so re-titling...)

On Nov 1, 2011, at 1:06 PM, Peter Cock wrote:

> On Tue, Nov 1, 2011 at 5:44 PM, Fields, Christopher J
> <cjfields at illinois.edu> wrote:
>> On Nov 1, 2011, at 11:02 AM, Jason Stajich wrote:
>> 
>>> I think a different indexer is needed for the scale of key/value
>>> pairs we see in fastq files if we want to make a fast lookup by
>>> ID. I think speed is of the essence for this type of solution, so
>>> requiring that all records be exactly 4 lines long is okay for
>>> this type of implementation.
>> 
>> This can always be an early optimization, that's easy enough.
>> But I'm sure we will have to deal with multi-line seq/qual
>> FASTQ at some point.
>> 
>>> I found NOSQL implementations to perform much better than
>>> any of the BDB-type solutions -- those end up being really
>>> slow above 1-5M keys.  I used TokyoCabinet and KyotoCabinet
>>> to index accession -> taxonomy ID and found them quite fast
>>> for my needs. I haven't tried storing 100bp reads + qual
>>> string as the value yet, but I think it could be done;
>>> certainly worth a prototype.
>> 
>> Adding a middle layer where the backend storage is abstracted
>> is probably the (best|most flexible) option, converging on a
>> good default that will work for this data.  The actual interface is
>> in place, though would it be more feasible to go the OBDA route
>> (converge on a cross-Bio* compatible schema)?  Or are there
>> problems afoot there we're unaware of?
>> 
>> Re: specifics, I think Biopython uses SQLite, is that correct Peter?
>> 
>> chris
> 
> Yes, we're using SQLite3 to store essentially a list of filenames
> and their format as one table, and then in the main table an
> entry for each sequence recording the ID (only one accession,
> unlike OBDA which had infrastructure for a secondary accession),
> file number, offset of the start of the record, and optionally the
> length of the record on disk.
> 
> i.e. Basically what OBDA does, but using SQLite rather
> than BDB (not included in Python 3) or a flat file index
> (poor performance with large datasets).
> 
> I find this design attractive on several levels:
> * File format neutral, covers FASTA, FASTQ, GenBank, etc
> * Preserves the original file untouched
> * Index is a small single file (thanks to SQLite)
> * Back end could be switched out
> * Could be applied to compressed file formats
> * Reuses existing parsing code to access entries
> 
> This could easily form the basis of OBDA v2; the main points
> of difference I anticipate between the Bio* projects would
> be naming conventions for the different file formats, and
> what we consider to be the default record ID of each read
> (e.g. which field in a GenBank file - although agreement
> here is not essential). Some of that was already settled in
> principle with OBDA v1.

The primary/secondary IDs could be configurable with a sane default, I think the bioperl implementations allowed this (and it is certainly something that will be requested).

> On the other hand, you could try and store the parsed data
> itself, which is where NOSQL looks more interesting. That
> essentially requires the ability to serialise your annotated
> sequence object model to disk - which would be tricky to do
> cross project (much more ambitious than BioSQL is). It also
> means the "index" becomes very large because it now holds
> all the original data.
> 
> Peter

For a fully cross-Bio* compliant format, I don't think it's feasible to use serialized data unless it is serialized in something that is easily deserialized across HLLs (JSON, BSON, YAML, XML, etc).  Either that, or such data is stored concurrently with the binary blob, along with metadata that indicates the source of the blob, parser, version, etc. (unless there are tools out there that reliably interconvert serialized complex data structures between HLLs).  Any way you go about it, it seems like it could be a major ball of hurt unless implemented very carefully.
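
As a rough sketch of the language-neutral option, the Python snippet below wraps a parsed record in JSON together with metadata about where it came from. The field names, the parser label and the version string are all hypothetical; no Bio* project is claimed to use this layout.

```python
import json


def serialise(record, source_format, parser, parser_version):
    """Wrap a plain-dict record with metadata and return JSON text."""
    payload = {
        "meta": {
            "source_format": source_format,
            "parser": parser,
            "parser_version": parser_version,
        },
        "record": record,
    }
    return json.dumps(payload, sort_keys=True)


def deserialise(text):
    """Return (meta, record) from JSON text produced by serialise()."""
    payload = json.loads(text)
    return payload["meta"], payload["record"]


# Hypothetical record: only plain strings, numbers, lists and dicts, so
# that any language with a JSON library can read it back.
record = {
    "id": "read1",
    "seq": "ACGT",
    "features": [{"type": "gene", "start": 1, "end": 4}],
}
text = serialise(record, "fastq", "example-parser", "0.1")
meta, restored = deserialise(text)
print(meta["parser"], restored == record)
```

The hard part raised above remains: agreeing on how each project's annotated sequence object maps onto such a plain structure.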

Aside: I think this was one of the problems with Bio::DB::SeqFeature::Store, in that it at one point stored Perl-specific Storable blobs.

chris


From p.j.a.cock at googlemail.com  Thu Nov  3 14:52:50 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 3 Nov 2011 18:52:50 +0000
Subject: [Bioperl-l] OBDA redux?
Message-ID: <CAKVJ-_6bzfZZr82y+J4qmGbn0du4rxaKaGxjmbC7p-pU_nMuoQ@mail.gmail.com>

On Thu, Nov 3, 2011 at 6:28 PM, Fields, Christopher J
<cjfields at illinois.edu> wrote:
> (side thread, so re-titling...)
>

And CC'ing open-bio-l, which is a better home for this than bioperl-l,
where OBDA v2 talk came up again in discussion of a BioPerl indexing
problem. Archive links for thread here:

http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035807.html
http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035808.html
http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035811.html
http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035812.html
http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035813.html
http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035822.html

> On Nov 1, 2011, at 1:06 PM, Peter Cock wrote:
>>
>> Yes, we're using SQLite3 to store essentially a list of filenames
>> and their format as one table, and then in the main table an
>> entry for each sequence recording the ID (only one accession,
>> unlike OBDA which had infrastructure for a secondary accession),
>> file number, offset of the start of the record, and optionally the
>> length of the record on disk.
>>
>> i.e. Basically what OBDA does, but using SQLite rather
>> than BDB (not included in Python 3) or a flat file index
>> (poor performance with large datasets).
>>
>> I find this design attractive on several levels:
>> * File format neutral, covers FASTA, FASTQ, GenBank, etc
>> * Preserves the original file untouched
>> * Index is a small single file (thanks to SQLite)
>> * Back end could be switched out
>> * Could be applied to compressed file formats
>> * Reuses existing parsing code to access entries
>>
>> This could easily form the basis of OBDA v2; the main points
>> of difference I anticipate between the Bio* projects would
>> be naming conventions for the different file formats, and
>> what we consider to be the default record ID of each read
>> (e.g. which field in a GenBank file - although agreement
>> here is not essential). Some of that was already settled in
>> principle with OBDA v1.
>
> The primary/secondary IDs could be configurable with a sane
> default, I think the bioperl implementations allowed this (and
> it is certainly something that will be requested).

One reason I went with a single ID only was to keep the
Python dictionary based API simple (think hash in Perl).
You don't get secondary keys in a Python dict or a hash ;)

As a nod to flexibility, in Biopython's Bio.SeqIO indexing you
can provide a call back function to map the suggested ID to
something else. Obviously this doesn't give the full flexibility
of extracting a field from the record's annotation because we
don't parse the whole record during indexing (it would be too
slow).
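
For illustration, here is the callback idea reduced to plain Python, without Biopython. In Biopython itself the hook is, as far as I recall, the key_function argument of Bio.SeqIO.index; the builder function, the callback and the NCBI-style IDs below are made up for this sketch.

```python
def build_id_index(suggested_ids, key_function=None):
    """Map each (possibly rewritten) ID to its position in the input."""
    index = {}
    for position, suggested in enumerate(suggested_ids):
        # The callback only ever sees the suggested ID string, never the
        # parsed record, which is why annotation fields are out of reach.
        key = key_function(suggested) if key_function else suggested
        if key in index:
            raise ValueError("duplicate key: %r" % key)
        index[key] = position
    return index


def accession_only(identifier):
    """Hypothetical callback: 'gi|12345|ref|NM_000001.1|' -> 'NM_000001.1'."""
    return identifier.split("|")[3]


ids = ["gi|12345|ref|NM_000001.1|", "gi|67890|ref|NM_000002.3|"]
index = build_id_index(ids, key_function=accession_only)
print(index)
```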

However, I'm happy for there to be an *optional* secondary
key in an OBDA v2 SQLite schema, but Biopython probably
won't populate it. We could optionally use it rather than the
primary ID on loading an existing index though.

Personally I would stick with one key in the index - it should
be faster and makes it simpler to switch the back end if we
need to later. If anyone wants a second key, they can build
a second index *grin*.

>> On the other hand, you could try and store the parsed data
>> itself, which is where NOSQL looks more interesting. That
>> essentially requires the ability to serialise your annotated
>> sequence object model to disk - which would be tricky to do
>> cross project (much more ambitious than BioSQL is). It also
>> means the "index" becomes very large because it now holds
>> all the original data.
>>
>> Peter
>
> For a fully cross-Bio* compliant format, I don't think it's feasible
> to use serialized data unless they are serialized in something
> that is easily deserialized across HLLs (JSON, BSON, YAML,
> XML, etc).  Either that, or such data is stored concurrently with
> the binary blob, along with meta data that indicates the source
> of the blob, parser, version, etc, etc (unless there are tools out
> there that reliably interconvert serialized complex data structures
> between HLLs).  Any way you go about it, it seems like it could
> be a major ball of hurt, unless implemented very carefully.

You missed out RDF as a serialisation ;)

But yes, going down the shared serialisation route is going
to be messy - as you are well aware:

> Aside: I think this was one of the problems with
> Bio::DB::SeqFeature::Store, in that it at one point stored
> Perl-specific Storable blobs.
>
> chris

Peter


From cjfields at illinois.edu  Thu Nov  3 15:47:51 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 3 Nov 2011 19:47:51 +0000
Subject: [Bioperl-l] OBDA redux?
In-Reply-To: <CAKVJ-_6bzfZZr82y+J4qmGbn0du4rxaKaGxjmbC7p-pU_nMuoQ@mail.gmail.com>
References: <CAKVJ-_6bzfZZr82y+J4qmGbn0du4rxaKaGxjmbC7p-pU_nMuoQ@mail.gmail.com>
Message-ID: <FB761CFA-1CFD-4FA0-A708-2CE3F2F240D9@illinois.edu>

On Nov 3, 2011, at 1:52 PM, Peter Cock wrote:

> On Thu, Nov 3, 2011 at 6:28 PM, Fields, Christopher J
> <cjfields at illinois.edu> wrote:
>> (side thread, so re-titling...)
>> 
> And CC'ing open-bio-l, which is a better home for this than bioperl-l,
> where OBDA v2 talk came up again in discussion of a BioPerl indexing
> problem. Archive links for thread here:
> 
> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035807.html
> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035808.html
> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035811.html
> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035812.html
> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035813.html
> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035822.html

yes, good idea...

>> On Nov 1, 2011, at 1:06 PM, Peter Cock wrote:
>>> 
>>> Yes, we're using SQLite3 to store essentially a list of filenames
>>> and their format as one table, and then in the main table an
>>> entry for each sequence recording the ID (only one accession,
>>> unlike OBDA which had infrastructure for a secondary accession),
>>> file number, offset of the start of the record, and optionally the
>>> length of the record on disk.
>>> 
>>> i.e. Basically what OBDA does, but using SQLite rather
>>> than BDB (not included in Python 3) or a flat file index
>>> (poor performance with large datasets).
>>> 
>>> I find this design attractive on several levels:
>>> * File format neutral, covers FASTA, FASTQ, GenBank, etc
>>> * Preserves the original file untouched
>>> * Index is a small single file (thanks to SQLite)
>>> * Back end could be switched out
>>> * Could be applied to compressed file formats
>>> * Reuses existing parsing code to access entries
>>> 
>>> This could easily form the basis of OBDA v2; the main points
>>> of difference I anticipate between the Bio* projects would
>>> be naming conventions for the different file formats, and
>>> what we consider to be the default record ID of each read
>>> (e.g. which field in a GenBank file - although agreement
>>> here is not essential). Some of that was already settled in
>>> principle with OBDA v1.
>> 
>> The primary/secondary IDs could be configurable with a sane
>> default, I think the bioperl implementations allowed this (and
>> it is certainly something that will be requested).
> 
> One reason I went with a single ID only was to keep the
> Python dictionary based API simple (think hash in Perl).
> You don't get secondary keys in a Python dict or a hash ;)
> 
> As a nod to flexibility, in Biopython's Bio.SeqIO indexing you
> can provide a call back function to map the suggested ID to
> something else. Obviously this doesn't give the full flexibility
> of extracting a field from the record's annotation because we
> don't parse the whole record during indexing (it would be too
> slow).

Same with bioperl.

> However, I'm happy for there to be an *optional* secondary
> key in an OBDA v2 SQLite schema, but Biopython probably
> won't populate it. We could optionally use it rather than the
> primary ID on loading an existing index though.

Optional implementation of that is fine by me.

> Personally I would stick with one key in the index - it should
> be faster and makes it simpler to switch the back end if we
> need to later. If anyone wants a second key, they can build
> a second index *grin*.

That's easy enough.

>>> On the other hand, you could try and store the parsed data
>>> itself, which is where NOSQL looks more interesting. That
>>> essentially requires the ability to serialise your annotated
>>> sequence object model to disk - which would be tricky to do
>>> cross project (much more ambitious than BioSQL is). It also
>>> means the "index" becomes very large because it now holds
>>> all the original data.
>>> 
>>> Peter
>> 
>> For a fully cross-Bio* compliant format, I don't think it's feasible
>> to use serialized data unless they are serialized in something
>> that is easily deserialized across HLLs (JSON, BSON, YAML,
>> XML, etc).  Either that, or such data is stored concurrently with
>> the binary blob, along with meta data that indicates the source
>> of the blob, parser, version, etc, etc (unless there are tools out
>> there that reliably interconvert serialized complex data structures
>> between HLLs).  Any way you go about it, it seems like it could
>> be a major ball of hurt, unless implemented very carefully.
> 
> You missed out RDF as a serialisation ;)
> 
> But yes, going down the shared serialisation route is going
> to be messy - as you are well aware:
> 
>> Aside: I think this was one of the problems with
>> Bio::DB::SeqFeature::Store, in that it at one point stored
>> Perl-specific Storable blobs.
>> 
>> chris
> 
> Peter

Yes, it's a problem without an easy solution.  Anyway, I think implementing
this at this point would be premature optimization.

chris


From biojiangke at gmail.com  Tue Nov  8 17:29:54 2011
From: biojiangke at gmail.com (vitis)
Date: Tue, 8 Nov 2011 14:29:54 -0800 (PST)
Subject: [Bioperl-l] Some questions about the Bio::PopGen
In-Reply-To: <BANLkTiktxeprLh+LxNr50cFZO+KweZCVFg@mail.gmail.com>
References: <BANLkTiktxeprLh+LxNr50cFZO+KweZCVFg@mail.gmail.com>
Message-ID: <32805996.post@talk.nabble.com>


I think the pi calculated by the function isn't really pi as defined. You
need to divide the value by the total number of sites (in your case 5, which
is the sequence length, not the number of individuals). I think the reason
it is implemented this way is that it is sometimes easier to work only with
variable sites.
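
The arithmetic can be checked with a few lines of plain Perl. This is only
a sketch that counts pairwise differences directly for the five sequences
in the question; it does not use Bio::PopGen::Statistics, and the sub names
are made up for illustration.

```perl
use strict;
use warnings;

# Count positions at which two equal-length sequences differ.
sub pairwise_differences {
    my ($x, $y) = @_;
    die "sequences differ in length\n" unless length($x) == length($y);
    my $diff = 0;
    for my $i (0 .. length($x) - 1) {
        $diff++ if substr($x, $i, 1) ne substr($y, $i, 1);
    }
    return $diff;
}

# Average number of pairwise differences over all pairs of sequences.
sub pi_total {
    my @seqs = @_;
    my ($sum, $pairs) = (0, 0);
    for my $i (0 .. $#seqs - 1) {
        for my $j ($i + 1 .. $#seqs) {
            $sum += pairwise_differences($seqs[$i], $seqs[$j]);
            $pairs++;
        }
    }
    return $pairs ? $sum / $pairs : 0;
}

# The same value normalised by the number of aligned sites.
sub pi_per_site {
    my @seqs = @_;
    return 0 unless @seqs && length $seqs[0];
    return pi_total(@seqs) / length($seqs[0]);
}

# The five sequences (A01..A05) from the question.
my @seqs = qw(AAGGT ATGGC ATGCT ATGCT ATGGT);
printf "pi (total)    = %.2f\n", pi_total(@seqs);
printf "pi (per site) = %.2f\n", pi_per_site(@seqs);
```

For these sequences the total is 14 differences over 10 pairs, i.e. 1.4,
and dividing by the 5 sites gives 0.28.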

The aln_to_population function converts an alignment object to a population
object. It does not write a file, so you can't really see the object unless
you write additional code to write it out or to do some calculations on it.

The third question depends on your specific needs. For population-level
analyses of molecular evolution, I usually create a multiple sequence
alignment with another application (clustalw etc.), then manually adjust the
alignment to make sure it represents homology. I wouldn't touch the
alignment once this is done, but only save it as an aln file (or whatever
format you want) as input to analysis applications such as Bio::PopGen
(usually via the aln_to_population function you're using now).


Qian Zhao wrote:
> 
> Hi
> Recently, I have been learning how to calculate pi, Fst and Tajima's D
> using Bio::PopGen.
> I am not familiar with Perl and I am really confused by the following
> problems.
> (1) I use Bio::PopGen::Statistics to calculate pi. The data I used
> for the calculation is this:
>     __DATA__
> 01 A01 A
> 01 A02 A
> 01 A03 A
> 01 A04 A
> 01 A05 A
> 02 A01 A
> 02 A02 T
> 02 A03 T
> 02 A04 T
> 02 A05 T
> 03 A01 G
> 03 A02 G
> 03 A03 G
> 03 A04 G
> 03 A05 G
> 04 A01 G
> 04 A02 G
> 04 A03 C
> 04 A04 C
> 04 A05 G
> 05 A01 T
> 05 A02 C
> 05 A03 T
> 05 A04 T
> 05 A05 T
> I am not sure whether the sequences below correctly represent the
> prettybase data above:
> >A01
> AAGGT
> >A02
> ATGGC
> >A03
> ATGCT
> >A04
> ATGCT
> >A05
> ATGGT
> The pi is 1.4 using Bio::PopGen::Statistics. However, the pi is 0.28 if I
> use DnaSP. I find that 1.4/5=0.28, which means that if the number
> from Bio::PopGen::Statistics is divided by the number of individuals, the
> result would be exactly the same. Is there something wrong in my Perl
> script? The code I used is below:
> #!/usr/bin/perl -w
> use warnings;
> use strict;
> use Bio::PopGen::Genotype;
>  my $genotype = Bio::PopGen::Genotype->new(-marker_name   => 'gene_1',
>                                            -individual_id => '001',
>                                            -alleles       => ['1','5'] );
> use Bio::PopGen::Individual;
>  my $ind = Bio::PopGen::Individual->new(-unique_id  => '001',
>                                         -genotypes  => [$genotype] );
> $ind->add_Genotype(
>    Bio::PopGen::Genotype->new(-alleles     => ['1', '5'],
>                               -marker_name => 'gene_1')
>  );
>  $ind->add_Genotype(
>    Bio::PopGen::Genotype->new(-alleles     => ['1', '5'],
>                               -marker_name => 'gene_1')
>  );
>  $ind->add_Genotype(
>    Bio::PopGen::Genotype->new(-alleles     => ['1', '5'],
>                               -marker_name => 'gene_1')
>  );
>  $ind->add_Genotype(
>    Bio::PopGen::Genotype->new(-alleles     => ['1', '5'],
>                               -marker_name => 'gene_1')
>  );
>  use Bio::PopGen::Population;
>  my $pop = Bio::PopGen::Population->new(-name        => 'Bm',
>                                         -description => 'description',
>                                         -individuals => [$ind] );
> use Bio::PopGen::IO;
> use Bio::PopGen::Statistics;
> my $nummarkers = $pop->get_marker_names;
> my $stats = Bio::PopGen::Statistics->new();
> my $io = Bio::PopGen::IO->new (-format => 'prettybase',
>                                -file => '1.txt');
> if( my $pop = $io->next_population ) {
>     my $pi = $stats->pi($pop, $nummarkers);
>     print "pi is $pi\n";
> my @inds;
>     for my $ind ( $pop->get_Individuals ) {
>         if( $ind->unique_id =~ /A0[1-3]/ ) {
>             push @inds, $ind;
>         }
>     }
>     print "pi for inds 1,2,3 is ", $stats->pi(\@inds),"\n";
> }
> 
> (2) I want to use Bio::PopGen::Utilities to convert an alignment file to
> a population file. However, I cannot find the result file after running
> the program. I use the following code:
> use Bio::PopGen::Utilities;
>   use Bio::AlignIO;
> 
>   my $in = Bio::AlignIO->new(-file   => 't/data/t7.aln',
>                             -format => 'clustalw');
> my $aln = $in->next_aln;
> my $pop = Bio::PopGen::Utilities->aln_to_population(-alignment => $aln);
> my $synpop = Bio::PopGen::Utilities->aln_to_population(-site_model => 'cod',
>                                                        -alignment  => $aln);
> I am not sure where I should add my result file's name in the code.
> (3) My file contains a lot of individual sequences, and each individual
> has one genotype. How can I use Bio::PopGen::Individual,
> Bio::PopGen::Population and Bio::PopGen::Genotype to create a file that
> can be used with Bio::PopGen::Statistics?
> 
> I would greatly appreciate any answers. Thanks for your time
> and best wishes!
>                                                    Qian


From biojiangke at gmail.com  Tue Nov  8 17:51:22 2011
From: biojiangke at gmail.com (vitis)
Date: Tue, 8 Nov 2011 14:51:22 -0800 (PST)
Subject: [Bioperl-l] questions about the bioperl module
 Bio::PopGen::Statistics
In-Reply-To: <201106012030039537050@gmail.com>
References: <201106012030039537050@gmail.com>
Message-ID: <32805997.post@talk.nabble.com>


If you read the Bio::PopGen documentation, you'll see that the function that
calculates pi takes an optional argument for the number of sites. Also,
when you use the aln_to_population function to read in an alignment, there
is an option to include all sites, monomorphic ones too. I think if you use
both in your script, you'll get the same pi value as from other applications
such as DnaSP.

As for sliding-window analyses, you may have to implement your own
method to move along the windows. DnaSP can already do that, though, so you
don't have to write your own script.
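
If you do want to try it in Perl, a sliding window can be sketched along
these lines. This is plain Perl on equal-length aligned strings, not the
Bio::PopGen API; the sub names, window size and step are assumptions chosen
for illustration, and the sequences are the toy ones from the earlier
Bio::PopGen thread.

```perl
use strict;
use warnings;

# Per-site pi for a set of equal-length aligned sequences.
sub pi_per_site {
    my @seqs = @_;
    return 0 if @seqs < 2;
    my $len = length $seqs[0];
    return 0 if $len == 0;
    my ($sum, $pairs) = (0, 0);
    for my $i (0 .. $#seqs - 1) {
        for my $j ($i + 1 .. $#seqs) {
            # number of positions at which the two sequences differ
            $sum += scalar grep {
                substr($seqs[$i], $_, 1) ne substr($seqs[$j], $_, 1)
            } 0 .. $len - 1;
            $pairs++;
        }
    }
    return $sum / $pairs / $len;
}

# Slide a window of $window sites along the alignment in steps of $step.
# Returns a list of [start, end, pi] triples (1-based, inclusive).
sub sliding_window_pi {
    my ($seqs, $window, $step) = @_;
    my $len = length $seqs->[0];
    my @result;
    for (my $start = 0; $start + $window <= $len; $start += $step) {
        my @slices = map { substr($_, $start, $window) } @$seqs;
        push @result, [ $start + 1, $start + $window, pi_per_site(@slices) ];
    }
    return @result;
}

my @aln = qw(AAGGT ATGGC ATGCT ATGCT ATGGT);
for my $w (sliding_window_pi(\@aln, 3, 1)) {
    printf "%d-%d\t%.4f\n", @$w;
}
```

Each window is normalised by its own length, so monomorphic sites inside
the window count in the denominator.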
  

lvu.jun wrote:
> 
> Hi, there,
> I am trying to calculate population genetics parameters such as pi
> using the BioPerl module Bio::PopGen::Statistics. But I found that the
> method only requires the marker genotypes of each individual in the
> population. I don't know why the module does not take the DNA
> sequence length into consideration when calculating the pi value.
> According to the definition of pi, besides the polymorphic
> sites, the monomorphic sites should also be counted in the
> denominator. Is that right? If so, I'm confused about the module:
> can anyone tell me how it can correctly calculate the pi value
> from only the marker (polymorphic) genotypes?
> Another question: if I want to calculate pi using a sliding
> window along the genome, how can I do this using the
> Bio::PopGen::Statistics module?
> Thanks for your help!
> Yours sincerely,
> Jun
> 
> Chinese Academy of Sciences
> 
> 2011-06-01 
> 
> 
> 
> lvu.jun 


From shachigahoimbi at gmail.com  Wed Nov  9 00:22:33 2011
From: shachigahoimbi at gmail.com (Shachi Gahoi)
Date: Wed, 9 Nov 2011 10:52:33 +0530
Subject: [Bioperl-l] Run FGENESH using bioperl
Message-ID: <CACyyM1ZOiMspVH3hF4fJOvedw=8YzZDuuzJRHsuJUJ=mkuYyng@mail.gmail.com>

Dear all,

I have a multi-FASTA sequence file and I would like to run FGENESH on each
of the sequences stored in it, one at a time.

Is this possible using BioPerl?

Please guide me.

Thanks in advance.


-- 
Regards,
Shachi


From pankajt322 at gmail.com  Thu Nov  3 08:12:44 2011
From: pankajt322 at gmail.com (pankaj)
Date: Thu, 3 Nov 2011 05:12:44 -0700 (PDT)
Subject: [Bioperl-l] extract ORF ID from fasta file using bioperl
In-Reply-To: <CACyyM1brvFU1N3NOqUDfJ8eBSNqNZSsHGkxaLoK2Euik0Z2s0w@mail.gmail.com>
References: <CACyyM1brvFU1N3NOqUDfJ8eBSNqNZSsHGkxaLoK2Euik0Z2s0w@mail.gmail.com>
Message-ID: <bc50d79e-005d-46a9-ac0f-6237d43df0f4@u10g2000prl.googlegroups.com>


On Oct 21, 1:59 am, Shachi Gahoi <shachigahoi... at gmail.com> wrote:
> Dear all,
>
> I have a FASTA-format sequence file and I want to extract the ORF ID
> "PITG_14194" from it and then rename the file to that ORF ID.
>
> I have many files and I want to do the same with all of them.
>
> Please tell me how I can do this with Perl or BioPerl.
>
> >tr|D0NNU7|D0NNU7_PHYIT Carbohydrate esterase, putative OS=Phytophthora infestans (strain T30-4) GN=PITG_14194 PE=4 SV=1
> MVKLSIVSSTMQSLLAPLLRVWTDPERRRKFLRWLFGGTSGAIALLLILEATRGFCRTPL
> ETAQLLAGISWTLCKITVQFVARGFKPKFAKWTLRYELLHGLMRTAATMFGERIVDLQHA
> RVIRHHTGMFGTVLGSFARWQNEMRLESVRLNGLEHIWLKSSTCTTETKSERKRLVVLFF
> HGGGYAVLSPRMYISFCSAVAGAIRQQLASDDVDVDVFLANYRKLPEHKFPVPAEDAVAM
> YEYLLQHEKLEPSQIILAGDSAGGGLVMSTLLRVRDGLSSWKSKLPLPLAAIVMCPLADL
> TWDEDEIAGQHCVLPLNMTAASVLTYHPTRDDPSTWADASPVHCNLQGLPPVFLQSATLD
> RLFQHSVRLAAKAKADGLVNWEVDIHEGVPHVFMVIPAYVLPYARVGVGRMAAFAAKQFR
> NGIAVDHKGVICNGKAPIEIAVDENTLSAAA
>
> --
> Regards,
> Shachi


From azaballos at isciii.es  Wed Nov  9 06:28:21 2011
From: azaballos at isciii.es (Angel Zaballos)
Date: Wed, 9 Nov 2011 12:28:21 +0100
Subject: [Bioperl-l] bp_genbank2gff.pl  bug
Message-ID: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>

Running bp_genbank2gff.pl got this:

[root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession AAXT01000001.1 > babesichr3.gff
Replacement list is longer than search list at /usr/share/perl5/Bio/Range.pm line 251.


Ángel Zaballos
Unidad de Genómica
Centro Nacional de Microbiología-ISCIII
Carretera Majadahonda-Pozuelo, Km 2,2
28220-Majadahonda

Tel: 918223994
mail:  azaballos at isciii.es


************************* LEGAL NOTICE *************************
This electronic message is intended exclusively for its addressees and may
contain attached documents of a private and confidential nature.
If you have received this message in error and are not among the
addressees, please do not use, disclose, distribute, print or copy its
contents by any means. Please notify the sender and completely delete
the message and its attachments. The Instituto de Salud Carlos III
assumes no legal liability of any kind for the content of this message
when it does not correspond to the functions attributed to its sender under
current regulations.


From scott at scottcain.net  Wed Nov  9 11:12:02 2011
From: scott at scottcain.net (Scott Cain)
Date: Wed, 9 Nov 2011 11:12:02 -0500
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
Message-ID: <CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>

Hi Angel,

I would suggest using bp_genbank2gff3.pl, as it is more actively
maintained; the bp_genbank2gff.pl script hasn't really been touched in many
years, and I imagine it's suffering from some serious code rot.

Scott


2011/11/9 Angel Zaballos <azaballos at isciii.es>

> Running bp_genbank2gff.pl got this:
>
> [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession
> AAXT01000001.1 > babesichr3.gff
> Replacement list is longer than search list at
> /usr/share/perl5/Bio/Range.pm line 251.


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From carandraug+dev at gmail.com  Wed Nov  9 11:13:10 2011
From: carandraug+dev at gmail.com (Carnë Draug)
Date: Wed, 9 Nov 2011 16:13:10 +0000
Subject: [Bioperl-l] extract ORF ID from fasta file using bioperl
In-Reply-To: <bc50d79e-005d-46a9-ac0f-6237d43df0f4@u10g2000prl.googlegroups.com>
References: <CACyyM1brvFU1N3NOqUDfJ8eBSNqNZSsHGkxaLoK2Euik0Z2s0w@mail.gmail.com>
	<bc50d79e-005d-46a9-ac0f-6237d43df0f4@u10g2000prl.googlegroups.com>
Message-ID: <CAPOrs_030887wt=T7ZJyDUid92poO+FX4kKkRFTzWweXi5ffvw@mail.gmail.com>

On 3 November 2011 12:12, pankaj <pankajt322 at gmail.com> wrote:
>
>
> On Oct 21, 1:59 am, Shachi Gahoi <shachigahoi... at gmail.com> wrote:
>> Dear all,
>>
>> I have a FASTA-format sequence file and I want to extract the ORF ID
>> "PITG_14194" from it and then rename the file to that ORF ID.
>>
>> I have many files and I want to do the same with all of them.
>>
>> Please tell me how I can do this with Perl or BioPerl.
>>
>> >tr|D0NNU7|D0NNU7_PHYIT Carbohydrate esterase, putative OS=Phytophthora infestans (strain T30-4) GN=PITG_14194 PE=4 SV=1
>> MVKLSIVSSTMQSLLAPLLRVWTDPERRRKFLRWLFGGTSGAIALLLILEATRGFCRTPL
>> ETAQLLAGISWTLCKITVQFVARGFKPKFAKWTLRYELLHGLMRTAATMFGERIVDLQHA
>> RVIRHHTGMFGTVLGSFARWQNEMRLESVRLNGLEHIWLKSSTCTTETKSERKRLVVLFF
>> HGGGYAVLSPRMYISFCSAVAGAIRQQLASDDVDVDVFLANYRKLPEHKFPVPAEDAVAM
>> YEYLLQHEKLEPSQIILAGDSAGGGLVMSTLLRVRDGLSSWKSKLPLPLAAIVMCPLADL
>> TWDEDEIAGQHCVLPLNMTAASVLTYHPTRDDPSTWADASPVHCNLQGLPPVFLQSATLD
>> RLFQHSVRLAAKAKADGLVNWEVDIHEGVPHVFMVIPAYVLPYARVGVGRMAAFAAKQFR
>> NGIAVDHKGVICNGKAPIEIAVDENTLSAAA
>>

---------- Forwarded message ----------
From: Jason Stajich <jason.stajich at gmail.com>
Date: 21 October 2011 10:56
Subject: Re: [Bioperl-l] extract ORF ID from fasta file using bioperl
To: Shachi Gahoi <shachigahoimbi at gmail.com>
Cc: bioperl-l at bioperl.org


This is easy to do with a simple regular expression and by opening a new
file.  Have you read up on this concept in Perl?
You can use SeqIO to parse FASTA files - did you read the HOWTO and
website documentation first?

We don't typically do people's work for them on this mailing list so
please show some effort first.
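
For readers of the archive, the regular-expression approach mentioned above
can be sketched as follows. This is plain Perl rather than Bio::SeqIO, and
it rests on assumptions that are not in the thread: each file holds one
record, the ID is the value of the GN= field on the header line, and the
new name is <ID>.fasta in the same directory. The sub names are made up.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;

# Return the value of the GN= field from a FASTA header line, or undef.
sub orf_id_from_header {
    my ($header) = @_;
    return $header =~ /\bGN=(\S+)/ ? $1 : undef;
}

# Read the first header line of a FASTA file and return its ORF ID.
sub orf_id_from_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    while (my $line = <$fh>) {
        next unless $line =~ /^>/;
        close $fh;
        return orf_id_from_header($line);
    }
    close $fh;
    return;
}

# Rename every file given on the command line to <ORF_ID>.fasta,
# keeping it in its original directory and never overwriting a file.
for my $file (@ARGV) {
    my $id = orf_id_from_file($file);
    unless (defined $id) {
        warn "No GN= field found in $file, skipping\n";
        next;
    }
    my $new = File::Spec->catfile(dirname($file), "$id.fasta");
    if (-e $new) {
        warn "$new already exists, skipping $file\n";
        next;
    }
    rename $file, $new or warn "Could not rename $file to $new: $!\n";
}
```

It would be run as, say, `perl rename_by_orf.pl *.fasta`, where the script
name is arbitrary.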


From scott at scottcain.net  Wed Nov  9 13:43:00 2011
From: scott at scottcain.net (Scott Cain)
Date: Wed, 9 Nov 2011 13:43:00 -0500
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
	<CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>
	<0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu>
Message-ID: <CA+JTaoxrmzW8E1omw5cA1DSLfg=xSjxJDvh9a=_cEz913qTbCg@mail.gmail.com>

Hi Chris,

Actually, removing it from the distribution (but letting it remain in the
code repository) is not a bad idea.  I can't really think of a down side.

Scott


2011/11/9 Fields, Christopher J <cjfields at illinois.edu>

> Scott,
>
> Do we want to add that caveat to the bp_genbank2gff.pl documentation (or
> remove it altogether)?
>
> chris


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From cjfields at illinois.edu  Wed Nov  9 13:39:52 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 9 Nov 2011 18:39:52 +0000
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
	<CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>
Message-ID: <0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu>

Scott,

Do we want to add that caveat to the bp_genbank2gff.pl documentation (or remove it altogether)?

chris

On Nov 9, 2011, at 10:12 AM, Scott Cain wrote:

> Hi Angel,
> 
> I would suggest using bp_genbank2gff3.pl, as it is more actively
> maintained; the bp_genbank2gff.pl script hasn't really been touched in many
> years, and I imagine it's suffering from some serious code rot.
> 
> Scott


From cjfields at illinois.edu  Wed Nov  9 14:51:48 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 9 Nov 2011 19:51:48 +0000
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <CA+JTaoxrmzW8E1omw5cA1DSLfg=xSjxJDvh9a=_cEz913qTbCg@mail.gmail.com>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
	<CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>
	<0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu>
	<CA+JTaoxrmzW8E1omw5cA1DSLfg=xSjxJDvh9a=_cEz913qTbCg@mail.gmail.com>
Message-ID: <C0212F3D-AFD7-41A4-9649-B876FAFA7C02@illinois.edu>

Scott,

It would remain in the repo history if it is removed, otherwise we can probably set up an 'unmaintained' folder.  Either would prevent it from being packaged and installed in future versions.  

(Speaking of which, we should discuss (with Lincoln) possibly splitting out Bio::DB::SeqFeature/GFF and related code/tests/etc. into its own distribution, in line with slimming down the core modules.)

chris

On Nov 9, 2011, at 12:43 PM, Scott Cain wrote:

> Hi Chris,
> 
> Actually, removing it from the distribution (but letting it remain in the code repository) is not a bad idea.  I can't really think of a down side.
> 
> Scott


From carandraug+dev at gmail.com  Wed Nov  9 15:39:17 2011
From: carandraug+dev at gmail.com (Carnë Draug)
Date: Wed, 9 Nov 2011 20:39:17 +0000
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <CA+JTaoxrmzW8E1omw5cA1DSLfg=xSjxJDvh9a=_cEz913qTbCg@mail.gmail.com>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
	<CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>
	<0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu>
	<CA+JTaoxrmzW8E1omw5cA1DSLfg=xSjxJDvh9a=_cEz913qTbCg@mail.gmail.com>
Message-ID: <CAPOrs_2jZADvKkKE-1EbcMiXe8J74rNPd2y09N_BguVw4Er=hA@mail.gmail.com>

On 9 November 2011 18:43, Scott Cain <scott at scottcain.net> wrote:
> Hi Chris,
>
> Actually, removing it from the distribution (but letting it remain in the
> code repository) is not a bad idea.  I can't really think of a down side.
>
> Scott

Hi

Can I suggest instead simply making the script issue a warning right
at the start? Something like "bp_genbank2gff is obsolete and will be
removed from a future version of BioPerl; please use bp_genbank2gff3
instead". You could leave it there for the next 2 releases and then
finally remove it. This would have 2 advantages:

1) people who have been using it will immediately know what to use as a
replacement (instead of coming to ask on the mailing list)
2) people who use it without knowing anything about the subject, because
someone told them to "just press this button" or "just type this in
the terminal", won't suddenly have a broken system and will have time
to find someone who can make it work again.

That's what's done in GNU Octave and I think it works well there.
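
A minimal sketch of such a warning, assuming it is simply placed at the top
of the script and uses Perl's built-in warn; the helper name
deprecation_message is made up for illustration and is not existing BioPerl
code:

```perl
use strict;
use warnings;

# Build the deprecation notice for an obsolete script.
sub deprecation_message {
    my ($old, $new) = @_;
    return "$old is obsolete and will be removed from a future version of "
         . "BioPerl; please use $new instead.\n";
}

# Emit the notice on STDERR as the first thing the script does, so that
# output redirected from STDOUT (for example a GFF file) is not polluted.
warn deprecation_message('bp_genbank2gff', 'bp_genbank2gff3');

# ... the existing body of the script would continue unchanged here ...
```

Because the message ends in a newline, warn prints it as is, without
appending the script name and line number.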
Carnë


From scott at scottcain.net  Wed Nov  9 15:48:07 2011
From: scott at scottcain.net (Scott Cain)
Date: Wed, 9 Nov 2011 15:48:07 -0500
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <CAPOrs_2jZADvKkKE-1EbcMiXe8J74rNPd2y09N_BguVw4Er=hA@mail.gmail.com>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
	<CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>
	<0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu>
	<CA+JTaoxrmzW8E1omw5cA1DSLfg=xSjxJDvh9a=_cEz913qTbCg@mail.gmail.com>
	<CAPOrs_2jZADvKkKE-1EbcMiXe8J74rNPd2y09N_BguVw4Er=hA@mail.gmail.com>
Message-ID: <CA+JTaowZXPfbXawZGC+N_JVX5BJpKMTVRbSoh7kd6oWxCYbJZg@mail.gmail.com>

Hi Carnë,

You are absolutely correct; that is the right way to do it.  I'll add that
right now (and if the original post's fix is an easy one, I'll fix that too
:-)

Scott


2011/11/9 Carnë Draug <carandraug+dev at gmail.com>

> On 9 November 2011 18:43, Scott Cain <scott at scottcain.net> wrote:
> > Hi Chris,
> >
> > Actually, removing it from the distribution (but letting it remain in the
> > code repository) is not a bad idea.  I can't really think of a down side.
> >
> > Scott
>
> Hi
>
> Can I suggest instead simply making the script issue a warning right
> at the start? Something like "bp_genbank2gff is obsolete and will be
> removed from a future version of BioPerl; please use bp_genbank2gff3
> instead". You could leave it there for the next 2 releases and then
> finally remove it. This would have 2 advantages:
>
> 1) people who have been using it will immediately know what to use as a
> replacement (instead of coming to ask on the mailing list)
> 2) people who use it without knowing anything about the subject, because
> someone told them to "just press this button" or "just type this in
> the terminal", won't suddenly have a broken system and will have time
> to find someone who can make it work again.
>
> That's what's done in GNU Octave and I think it works well there.
> Carnë
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From cjfields at illinois.edu  Wed Nov  9 16:59:48 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 9 Nov 2011 21:59:48 +0000
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <CA+JTaowZXPfbXawZGC+N_JVX5BJpKMTVRbSoh7kd6oWxCYbJZg@mail.gmail.com>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
	<CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>
	<0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu>
	<CA+JTaoxrmzW8E1omw5cA1DSLfg=xSjxJDvh9a=_cEz913qTbCg@mail.gmail.com>
	<CAPOrs_2jZADvKkKE-1EbcMiXe8J74rNPd2y09N_BguVw4Er=hA@mail.gmail.com>
	<CA+JTaowZXPfbXawZGC+N_JVX5BJpKMTVRbSoh7kd6oWxCYbJZg@mail.gmail.com>
Message-ID: <C86AC2F8-F8E8-431D-83A6-39E896C23485@illinois.edu>

Works for me; it's a standard deprecation policy.  The only caveat is that the next 'release' of the code would be when the related code is split out into its own distribution (which will require its own versioning).

chris

On Nov 9, 2011, at 2:48 PM, Scott Cain wrote:

> Hi Carnë,
> 
> You are absolutely correct; that is the right way to do it.  I'll add that right now (and if the original posts fix is an easy one, I'll fix that too :-)
> 
> Scott
> 
> 
> 2011/11/9 Carnë Draug <carandraug+dev at gmail.com>
> On 9 November 2011 18:43, Scott Cain <scott at scottcain.net> wrote:
> > Hi Chris,
> >
> > Actually, removing it from the distribution (but letting it remain in the
> > code repository) is not a bad idea.  I can't really think of a down side.
> >
> > Scott
> 
> Hi
> 
> can I suggest instead to simply make the script issue a warning right
> at the start? Something like "bp_genbank2gff is obsolete and will be
> removed from a future version of BioPerl; please use bp_genbank2gff3
> instead". You could leave it there for the next 2 releases and then
> finally remove it. This would have 2 advantages:
> 
> 1) people that have been using it will immediately know what to use as
> replacement (instead of coming to ask on the mailing list).
> 2) people who use it but don't know anything about the subject,
> because someone told them to "just press this button" or "just type this in
> the terminal", won't suddenly have a broken system and will have time
> to find someone who will make it work again.
> 
> That's what's done in GNU Octave and I think it works well there.
> Carnë
> 
> 
> 
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research


From biopython at maubp.freeserve.co.uk  Thu Nov 10 08:09:40 2011
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 10 Nov 2011 13:09:40 +0000
Subject: [Bioperl-l] [Bioperl-announce-l] Null Pointer - NCBI C++
	Exception
In-Reply-To: <31659982.post@talk.nabble.com>
References: <31659982.post@talk.nabble.com>
Message-ID: <CAKVJ-_445daDbwg6QkxLUhnji1BhS+YvEehBhB9D3ij1M22tmw@mail.gmail.com>

Re: http://lists.open-bio.org/pipermail/bioperl-announce-l/2011-May/000253.html

On Thu, May 19, 2011 at 11:15 PM, rgoldade <rgoldade at sfu.ca> wrote:
>
> I received the following error while trying to run bl2seq from
> standaloneblastplus. Has anyone else encountered this problem?
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: /usr/bin/blastp call crashed: There was a problem running
> /usr/bin/blastp : Error: NCBI C++ Exception:
>
> "/am/ncbiapdata/release/blast/src/2.2.24/Linux32-Suse-icc/c++/ICC1010-ReleaseMT--Linux32-Suse-icc/../src/corelib/ncbiobj.cpp",
> line 688: Critical: ncbi::CObject::ThrowNullPointerException() - Attempt to
> access NULL pointer.
>
> Thank you,
> Ryan

Just hit something very very similar, looks like a BLAST+ bug which I
will report now:

$ tblastn -out NC_003197_vs_NC_011294.tsv -outfmt 6 -query
NC_003197.fna -evalue 0.0001 -subject NC_011294.fna
Error: NCBI C++ Exception:
    "/am/ncbiapdata/release/blast/src/2.2.25/IntelMAC-universal/c++/GCC401-ReleaseMT--IntelMAC-universal/../src/corelib/ncbiobj.cpp",
line 689: Critical: ncbi::CObject::ThrowNullPointerException() -
Attempt to access NULL pointer.

This was on a Mac using BLAST 2.2.25+ (tblastn) whereas yours was
BLAST 2.2.24+ (blastp) from the look of the error. The line number has
changed by one, but I'm confident it is the same point of failure.

In my case I was comparing nucleotide against nucleotide, so should
have been using tblastx not tblastn, but it still shouldn't have had a
pointer exception.
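
As a rough illustration of the program choice mentioned above, here is a small lookup from query/subject sequence types to the BLAST+ programs that fit that pairing. The helper blast_programs and the 'nt'/'aa' labels are hypothetical names for this sketch only.

```perl
use strict;
use warnings;

# Keyed on "query type/subject type"; 'nt' = nucleotide, 'aa' = protein.
# For nt vs nt, blastn compares the nucleotides directly, while tblastx
# compares translations of both query and subject.
my %program_for = (
    'aa/aa' => ['blastp'],
    'nt/aa' => ['blastx'],
    'aa/nt' => ['tblastn'],
    'nt/nt' => ['blastn', 'tblastx'],
);

# Hypothetical helper: list the programs that accept this pairing.
sub blast_programs {
    my ($query_type, $subject_type) = @_;
    my $programs = $program_for{"$query_type/$subject_type"}
        or die "Unknown sequence types: $query_type/$subject_type\n";
    return @$programs;
}

print join(', ', blast_programs('nt', 'nt')), "\n";
```

With a nucleotide query and a nucleotide subject, tblastn is not in the list, which matches the mismatch described above.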

Peter


From cjfields at illinois.edu  Thu Nov 10 09:00:46 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 10 Nov 2011 14:00:46 +0000
Subject: [Bioperl-l] [Bioperl-announce-l] Null Pointer - NCBI
	C++	Exception
In-Reply-To: <CAKVJ-_445daDbwg6QkxLUhnji1BhS+YvEehBhB9D3ij1M22tmw@mail.gmail.com>
References: <31659982.post@talk.nabble.com>
	<CAKVJ-_445daDbwg6QkxLUhnji1BhS+YvEehBhB9D3ij1M22tmw@mail.gmail.com>
Message-ID: <B1FB239F-28AD-4437-97C0-10110763F34D@illinois.edu>

On Nov 10, 2011, at 7:09 AM, Peter wrote:

> Re: http://lists.open-bio.org/pipermail/bioperl-announce-l/2011-May/000253.html
> 
> On Thu, May 19, 2011 at 11:15 PM, rgoldade <rgoldade at sfu.ca> wrote:
>> 
>> I received the following error while trying to run bl2seq from
>> standaloneblastplus. Has anyone else encountered this problem?
>> 
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: /usr/bin/blastp call crashed: There was a problem running
>> /usr/bin/blastp : Error: NCBI C++ Exception:
>> 
>> "/am/ncbiapdata/release/blast/src/2.2.24/Linux32-Suse-icc/c++/ICC1010-ReleaseMT--Linux32-Suse-icc/../src/corelib/ncbiobj.cpp",
>> line 688: Critical: ncbi::CObject::ThrowNullPointerException() - Attempt to
>> access NULL pointer.
>> 
>> Thank you,
>> Ryan
> 
> Just hit something very very similar, looks like a BLAST+ bug which I
> will report now:
> 
> $ tblastn -out NC_003197_vs_NC_011294.tsv -outfmt 6 -query
> NC_003197.fna -evalue 0.0001 -subject NC_011294.fna
> Error: NCBI C++ Exception:
>    "/am/ncbiapdata/release/blast/src/2.2.25/IntelMAC-universal/c++/GCC401-ReleaseMT--IntelMAC-universal/../src/corelib/ncbiobj.cpp",
> line 689: Critical: ncbi::CObject::ThrowNullPointerException() -
> Attempt to access NULL pointer.
> 
> This was on a Mac using BLAST 2.2.25+ (tblastn) whereas yours was
> BLAST 2.2.24+ (blastp) from the look of the error. The line number has
> changed by one, but I'm confident it is the same point of failure.
> 
> In my case I was comparing nucleotide against nucleotide, so should
> have been using tblastx not tblastn, but it still shouldn't have had a
> pointer exception.
> 
> Peter

Yeah, that's bad.  I have seen a few things like this myself that make me worry about the transition to BLAST+.

chris

PS - Odd I didn't see this one, was it caught in the bioperl-announce filter?


From casaburi at ceinge.unina.it  Thu Nov 10 07:29:55 2011
From: casaburi at ceinge.unina.it (Giorgio C)
Date: Thu, 10 Nov 2011 04:29:55 -0800 (PST)
Subject: [Bioperl-l]  Scripting help to identify adaptors count in reads
Message-ID: <32818254.post@talk.nabble.com>


Hi everybody,

I have some 454 reads that contain adaptors (masked as NNNN...): one, two or
three adaptors per read, depending on the read. Is there any way to
establish how many reads have 1 adaptor, how many have 2 and how many have 3,
out of the total?

>271-88
GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCAGGTGCCTACG
>272-88
GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCANNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTTGCCAGCCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGC
>273-88
GCCTCCCTCGCGCATCAGATCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCA
>274-88
GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTCCCTCGCGCCATCAGATCGTNNNNNNNNNNNNNNNNNNTCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCATCAGATCGTAGGCACCATCAA

The problem is that some adaptors occur in the middle of the sequences,
because the reads come from a concatemerization experimental design (the
miRNAs lie between the NNNNNN... stretches). So I am looking for a script or
tool that reports how many reads have 1 adaptor, how many have 2, and so on
(the maximum is 4), relative to the total number of reads. Do you know of any
tool or script that may help?

Thank you very much 
-- 
View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32818254.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From jovel_juan at hotmail.com  Thu Nov 10 11:06:16 2011
From: jovel_juan at hotmail.com (Juan Jovel)
Date: Thu, 10 Nov 2011 16:06:16 +0000
Subject: [Bioperl-l] Scripting help to identify adaptors count in reads
In-Reply-To: <32818254.post@talk.nabble.com>
References: <32818254.post@talk.nabble.com>
Message-ID: <COL102-W340F68DB20A406E5B3A9FBFADC0@phx.gbl>


There are many ways to do it.
Perhaps the simplest is to count the number of times the adapter sequence (or part of it) appears in each read.
For example:

$adapter_matches = tr/adapter_sequence/adapter_sequence/;
# $adapter_matches will store the number of times the adapter sequence is repeated.

You then place that result in a hash bin:

my %adapter_frequency;
my $class = "$adapter_matches";
if (exists $adapter_frequency{$class}) {
    $adapter_frequency{$class}++;
} else {
    $adapter_frequency{$class} = 1;
}

# Then you can sort and output your classes
foreach $class (sort keys %adapter_frequency) {
    print "$class\t$adapter_frequency{$class}\n";
}

You can work out the details, but something like this should work.


> Date: Thu, 10 Nov 2011 04:29:55 -0800
> From: casaburi at ceinge.unina.it
> To: Bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l]  Scripting help to identify adaptors count in reads
> 
> 
> Hi everybody,
> 
> I have some 454 reads that contain adaptors (masked as NNNN...): one, two or
> three adaptors per read, depending on the read. Is there any way to
> establish how many reads have 1 adaptor, how many have 2 and how many have 3,
> out of the total?
> 
> >271-88
> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCAGGTGCCTACG
> >272-88
> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCANNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTTGCCAGCCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGC
> >273-88
> GCCTCCCTCGCGCATCAGATCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCA
> >274-88
> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTCCCTCGCGCCATCAGATCGTNNNNNNNNNNNNNNNNNNTCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCATCAGATCGTAGGCACCATCAA
> 
> The problem is that some adaptors occur in the middle of the sequences,
> because the reads come from a concatemerization experimental design (the
> miRNAs lie between the NNNNNN... stretches). So I am looking for a script or
> tool that reports how many reads have 1 adaptor, how many have 2, and so on
> (the maximum is 4), relative to the total number of reads. Do you know of any
> tool or script that may help?
> 
> Thank you very much 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
 		 	   		  

From scott at scottcain.net  Thu Nov 10 11:55:53 2011
From: scott at scottcain.net (Scott Cain)
Date: Thu, 10 Nov 2011 11:55:53 -0500
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <0CACB816-21E2-494E-ACEF-67F9FBB5AC41@isciii.es>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
	<CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>
	<0CACB816-21E2-494E-ACEF-67F9FBB5AC41@isciii.es>
Message-ID: <CA+JTaoyWsS534Ew43Ye3v77o-+d8zK3rcYAup8PCEPApX_ispw@mail.gmail.com>

Hi Angel,

Please keep correspondence on the mailing list.

I just ran bp_genbank2gff.pl with a genbank file (fruit fly mitochondria),
and it worked fine.  I suspect there is something wrong with your genbank
file.

Scott


On Thu, Nov 10, 2011 at 3:15 AM, Angel Zaballos <azaballos at isciii.es> wrote:

> Hi Scott,
>
> Thanks everyone for your help. I tried bp_genbank2gff3.pl, but the same
> happened:
>
> [root at localhost zaballos]# bp_genbank2gff3.pl babesiaChr3.gbk >
> babesichr3_2.gff
> Replacement list is longer than search list at
> /usr/share/perl5/Bio/Range.pm line 251.
> UNIVERSAL->import is deprecated and will be removed in a future perl at
> /usr/share/perl5/Bio/Tree/TreeFunctionsI.pm line 94
>
> However, the output file seems to be correct (indeed, that was also the
> case for bp_genbank2gff.pl). I then ran ldHgGene and it worked:
>
> [zaballos at localhost ~]$ ./ldHgGene -out=babesiaChr3_2.gpe db tab
> babesiachr3_2.gff
> Reading babesiachr3_2.gff
> Read 4776 transcripts in 8821 lines in 1 files
>   4776 groups 1 seqs 1 sources 6 feature types
> 2379 gene predictions
>
> I'm using Fedora (for bioperl) and CentOS (for ldHgGene), virtualized on a
> Mac with Parallels. Maybe this is the cause of such a message.
>
> Regards
>
>
> Ángel
>
>
> On 09/11/2011, at 17:12, Scott Cain wrote:
>
> Hi Angel,
>
> I would suggest using bp_genbank2gff3.pl, as it is more actively
> maintained; the bp_genbank2gff.pl script hasn't really been touched in
> many years, and I imagine it's suffering from some serious code rot.
>
> Scott
>
>
> 2011/11/9 Angel Zaballos <azaballos at isciii.es>
>
>> Running bp_genbank2gff.pl got this:
>>
>> [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession
>> AAXT01000001.1 > babesichr3.gff
>> Replacement list is longer than search list at
>> /usr/share/perl5/Bio/Range.pm line 251.
>>
>>
>>
>> Ángel Zaballos
>> Unidad de Genómica
>> Centro Nacional de Microbiología-ISCIII
>> Carretera Majadahonda-Pozuelo, Km 2,2
>> 28220-Majadahonda
>>
>> Tel: 918223994
>> mail:  azaballos at isciii.es
>>
>>
>>
>>
>> ************************* LEGAL NOTICE *************************
>> This electronic message is intended exclusively for its addressees and
>> may contain private and confidential attachments. If you have received
>> this message in error and are not among its addressees, please do not
>> use, disclose, distribute, print or copy its contents by any means.
>> Please notify the sender and completely delete the message and its
>> attachments. The Instituto de Salud Carlos III assumes no legal
>> liability for the content of this message when it does not correspond to
>> the functions assigned to its sender by the regulations in force.
>>
>>
>
>
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain
> dot net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research
>
>
> Ángel Zaballos
> Unidad de Genómica
> Centro Nacional de Microbiología-ISCIII
> Carretera Majadahonda-Pozuelo, Km 2,2
> 28220-Majadahonda
>
> Tel: 918223994
> mail:  azaballos at isciii.es
>
>
>
>
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From l.m.timmermans at students.uu.nl  Thu Nov 10 12:17:12 2011
From: l.m.timmermans at students.uu.nl (L.M. Timmermans)
Date: Thu, 10 Nov 2011 18:17:12 +0100
Subject: [Bioperl-l] Scripting help to identify adaptors count in reads
In-Reply-To: <COL102-W340F68DB20A406E5B3A9FBFADC0@phx.gbl>
References: <32818254.post@talk.nabble.com>
	<COL102-W340F68DB20A406E5B3A9FBFADC0@phx.gbl>
Message-ID: <CAC1jpXAW_MTQjBY8Z8ffr67g_0TrGwWddixuQvtTB19+S+DLVg@mail.gmail.com>

On Thu, Nov 10, 2011 at 5:06 PM, Juan Jovel <jovel_juan at hotmail.com> wrote:

>
> There are many ways to do it.
> Perhaps the simplest is to count the number of times the adapter sequence
> (or part of it) appears in each read.
> For example:
> $adapter_matches = tr/adapter_sequence/adapter_sequence/;#
> $adapter_matches will store the number of times the adapter sequence is
> repeated.
>

No, it will not. tr/// will count characters, not sequences. Something like
"scalar (() = $sequence =~ m/(N+)/g)" should work OTOH.
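
A short sketch of how that expression could be used to bin reads by adaptor count, assuming each maximal run of N is one masked adaptor. The sub names count_adaptors and adaptor_histogram are hypothetical; the two reads are copied from the original post.

```perl
use strict;
use warnings;

# Count maximal runs of N in one read (one run = one masked adaptor, by assumption).
sub count_adaptors {
    my ($sequence) = @_;
    # Assigning to an empty list forces list context on the match;
    # scalar() then yields the number of matches found.
    return scalar( () = $sequence =~ m/N+/g );
}

# Tally how many reads carry 0, 1, 2, ... adaptors.
sub adaptor_histogram {
    my (@reads) = @_;
    my %frequency;
    $frequency{ count_adaptors($_) }++ for @reads;
    return \%frequency;
}

# Reads 271-88 and 273-88 from the original post
my @reads = (
    'GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCAGGTGCCTACG',
    'GCCTCCCTCGCGCATCAGATCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCA',
);
my $hist = adaptor_histogram(@reads);
for my $n ( sort { $a <=> $b } keys %$hist ) {
    printf "%d adaptor(s): %d of %d reads\n", $n, $hist->{$n}, scalar @reads;
}
```

For a real FASTA file the reads would come from a parser such as Bio::SeqIO rather than a hard-coded list.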

Leon


From cjfields at illinois.edu  Thu Nov 10 14:17:57 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 10 Nov 2011 19:17:57 +0000
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <CA+JTaoyWsS534Ew43Ye3v77o-+d8zK3rcYAup8PCEPApX_ispw@mail.gmail.com>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
	<CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>
	<0CACB816-21E2-494E-ACEF-67F9FBB5AC41@isciii.es>
	<CA+JTaoyWsS534Ew43Ye3v77o-+d8zK3rcYAup8PCEPApX_ispw@mail.gmail.com>
Message-ID: <66F13EAF-0DAA-45E0-AB5B-E71EC8FA2323@illinois.edu>

This was run using an older version of bioperl (probably 1.6.0 or 1.6.1).  The warnings pop up when using perl v5.12 or v5.14: the first is from a bad tr/// in Bio::Range, the second is from bad usage of UNIVERSAL functions.  Both have been addressed.
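
To illustrate the kind of tr/// that triggers the first warning, here is a hedged sketch. The tr/// below is a made-up example and not the actual code from Bio::Range; it only assumes that a replacement list longer than the search list is what recent perls complain about.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them, so they can be inspected.
my @seen;
local $SIG{__WARN__} = sub { push @seen, @_ };

# Hypothetical tr///: three replacement characters for two search characters.
# It is compiled through a string eval so that any compile-time warning is
# raised while the handler above is already installed.
my $result = eval q{ (my $copy = 'abc') =~ tr/ab/xyz/; $copy };

print "result: $result\n";
print "warning: $_" for @seen;
```

The extra replacement character is simply ignored, which fits the observation in this thread that the output file was still correct.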

chris



From biopython at maubp.freeserve.co.uk  Thu Nov 10 14:27:22 2011
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 10 Nov 2011 19:27:22 +0000
Subject: [Bioperl-l] [Bioperl-announce-l] Null Pointer - NCBI C++
	Exception
In-Reply-To: <B1FB239F-28AD-4437-97C0-10110763F34D@illinois.edu>
References: <31659982.post@talk.nabble.com>
	<CAKVJ-_445daDbwg6QkxLUhnji1BhS+YvEehBhB9D3ij1M22tmw@mail.gmail.com>
	<B1FB239F-28AD-4437-97C0-10110763F34D@illinois.edu>
Message-ID: <CAKVJ-_4+hGzxmn43qJ4SkJfCaPUQw=PkV5QSjUyqPSDmyVw64A@mail.gmail.com>

On Thu, Nov 10, 2011 at 2:00 PM, Fields, Christopher J
<cjfields at illinois.edu> wrote:
> On Nov 10, 2011, at 7:09 AM, Peter wrote:
>
>> Re: http://lists.open-bio.org/pipermail/bioperl-announce-l/2011-May/000253.html
>>
>> On Thu, May 19, 2011 at 11:15 PM, rgoldade <rgoldade at sfu.ca> wrote:
>>>
>>> I received the following error while trying to run bl2seq from
>>> standaloneblastplus. Has anyone else encountered this problem?
>>>
>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>> MSG: /usr/bin/blastp call crashed: There was a problem running
>>> /usr/bin/blastp : Error: NCBI C++ Exception:
>>>
>>> "/am/ncbiapdata/release/blast/src/2.2.24/Linux32-Suse-icc/c++/ICC1010-ReleaseMT--Linux32-Suse-icc/../src/corelib/ncbiobj.cpp",
>>> line 688: Critical: ncbi::CObject::ThrowNullPointerException() - Attempt to
>>> access NULL pointer.
>>>
>>> Thank you,
>>> Ryan
>>
>> Just hit something very very similar, looks like a BLAST+ bug which I
>> will report now:
>>
>> $ tblastn -out NC_003197_vs_NC_011294.tsv -outfmt 6 -query
>> NC_003197.fna -evalue 0.0001 -subject NC_011294.fna
>> Error: NCBI C++ Exception:
>>    "/am/ncbiapdata/release/blast/src/2.2.25/IntelMAC-universal/c++/GCC401-ReleaseMT--IntelMAC-universal/../src/corelib/ncbiobj.cpp",
>> line 689: Critical: ncbi::CObject::ThrowNullPointerException() -
>> Attempt to access NULL pointer.
>>
>> This was on a Mac using BLAST 2.2.25+ (tblastn) whereas yours was
>> BLAST 2.2.24+ (blastp) from the look of the error. The line number has
>> changed by one, but I'm confident it is the same point of failure.
>>
>> In my case I was comparing nucleotide against nucleotide, so should
>> have been using tblastx not tblastn, but it still shouldn't have had a
>> pointer exception.
>>
>> Peter
>
> Yeah, that's bad.  I have seen a few things like this myself that make me worry about the transition to BLAST+.
>
> chris

I'm told it is already fixed and will be part of BLAST 2.2.26+, which is good.

>
> PS - Odd I didn't see this one, was it caught in the bioperl-announce filter?
>

Maybe once, but it was in the archive and my email account.

Peter


From anna.fr at gmail.com  Thu Nov 10 15:01:57 2011
From: anna.fr at gmail.com (Anna Friedlander)
Date: Fri, 11 Nov 2011 09:01:57 +1300
Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi?
Message-ID: <CALv2E+3QpXhOU2Decmw146Bd1P4bDtg45ZosGWkeGkqzZx0XOg@mail.gmail.com>

Hi all

Does anyone know if there is a way to get a Taxonomy node and/or
taxonid from a gi number using the flatfile with taxonomy db?

I have blast output that I want to append taxonomic information to. I
have hundreds of thousands of items to do this for, so it's not
practical to use entrez to query the NCBI database.

I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I
think much too large to put into a hash!

This was also discussed in 2009:
http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I
don't think there was a conclusion?

Thanks for your help
Anna Friedlander


From shalabh.sharma7 at gmail.com  Thu Nov 10 15:12:09 2011
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Thu, 10 Nov 2011 15:12:09 -0500
Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi?
In-Reply-To: <CALv2E+3QpXhOU2Decmw146Bd1P4bDtg45ZosGWkeGkqzZx0XOg@mail.gmail.com>
References: <CALv2E+3QpXhOU2Decmw146Bd1P4bDtg45ZosGWkeGkqzZx0XOg@mail.gmail.com>
Message-ID: <CAA7rn9cf=iBZWRmg0f1GzeT9=NOp4nV7MfDyLxmTHH4rfYHtug@mail.gmail.com>

Hi Anna,
           I think the thread you mentioned was started by me.
At that time I wrote a few scripts to map gi to taxa, and after some time I
found some other efficient ways as well. But recently Miguel Pignatelli
pointed me to some Bio-LITE modules that are really helpful.

These are the modules he mentioned; I found them really easy to use and
very efficient.

Bio-LITE-Taxonomy-0.07
Bio-LITE-Taxonomy-NCBI-0.07
Bio-LITE-Taxonomy-NCBI-Gi2taxid-0.04

Cheers
Shalabh

On Thu, Nov 10, 2011 at 3:01 PM, Anna Friedlander <anna.fr at gmail.com> wrote:

> Hi all
>
> Does anyone know if there is a way to get a Taxonomy node and/or
> taxonid from a gi number using the flatfile with taxonomy db?
>
> I have blast output that I want to append taxonomic information to. I
> have hundreds of thousands of items to do this for, so it's not
> practical to use entrez to query the NCBI database.
>
> I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I
> think much too large to put into a hash!
>
> This was also discussed in 2009:
> http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I
> don't think there was a conclusion?
>
> Thanks for your help
> Anna Friedlander
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
Shalabh Sharma
Scientific Computing Professional Associate (Bioinformatics Specialist)
Department of Marine Sciences
University of Georgia
Athens, GA 30602-3636


From cjfields at illinois.edu  Thu Nov 10 15:23:14 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 10 Nov 2011 20:23:14 +0000
Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi?
In-Reply-To: <CAA7rn9cf=iBZWRmg0f1GzeT9=NOp4nV7MfDyLxmTHH4rfYHtug@mail.gmail.com>
References: <CALv2E+3QpXhOU2Decmw146Bd1P4bDtg45ZosGWkeGkqzZx0XOg@mail.gmail.com>
	<CAA7rn9cf=iBZWRmg0f1GzeT9=NOp4nV7MfDyLxmTHH4rfYHtug@mail.gmail.com>
Message-ID: <53AF9ECA-5905-4D14-B7C1-FF4B2F2FA084@illinois.edu>

Yes, these are probably wrappers around the gi2taxid and taxonomy data; bioperl lacks the former, whereas the latter is handled by Bio::DB::Taxonomy (the 'flatfile' option).  I did something very similar locally, though I used Bio::DB::Taxonomy for the taxonomy lookups.

chris

On Nov 10, 2011, at 2:12 PM, shalabh sharma wrote:

> Hi Anna,
>           I think the thread you mentioned was started by me.
> That time i wrote few scripts to map gi to taxa, after some time i found
> some other efficient ways also. But recently 'Miguel Pignatelli' directed
> to some Bio-LITE modules that are really helpful.
> 
> These are the modules he mentioned, i found them really easy to use and
> very efficient.
> 
> Bio-LITE-Taxonomy-0.07
> Bio-LITE-Taxonomy-NCBI-0.07
> Bio-LITE-Taxonomy-NCBI-Gi2taxid-0.04
> 
> Cheers
> Shalabh
> 
> On Thu, Nov 10, 2011 at 3:01 PM, Anna Friedlander <anna.fr at gmail.com> wrote:
> 
>> Hi all
>> 
>> Does anyone know if there is a way to get a Taxonomy node and/or
>> taxonid from a gi number using the flatfile with taxonomy db?
>> 
>> I have blast output that I want to append taxonomic information to. I
>> have hundreds of thousands of items to do this for, so it's not
>> practical to use entrez to query the NCBI database.
>> 
>> I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I
>> think much too large to put into a hash!
>> 
>> This was also discussed in 2009:
>> http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I
>> don't think there was a conclusion?
>> 
>> Thanks for your help
>> Anna Friedlander
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
> 
> 
> 
> -- 
> Shalabh Sharma
> Scientific Computing Professional Associate (Bioinformatics Specialist)
> Department of Marine Sciences
> University of Georgia
> Athens, GA 30602-3636
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From bernd.web at gmail.com  Thu Nov 10 15:51:13 2011
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 10 Nov 2011 21:51:13 +0100
Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi?
In-Reply-To: <CALv2E+3QpXhOU2Decmw146Bd1P4bDtg45ZosGWkeGkqzZx0XOg@mail.gmail.com>
References: <CALv2E+3QpXhOU2Decmw146Bd1P4bDtg45ZosGWkeGkqzZx0XOg@mail.gmail.com>
Message-ID: <CAExAtoD4aD_zPJXmZZHOG8uUqcTmQr90oQsbdumku5wrWz-erw@mail.gmail.com>

Hi Anna,

Jason changed his example script from using hashes to using SQLite:
bp_classify_hits_kingdom - classify BLAST hits by taxonomic kingdom

See
https://github.com/bioperl/bioperl-live/blob/master/scripts/taxa/bp_classify_hits_kingdom.pl

It's an example script that shows how to do the tax to gi mapping for
blast reports.
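
A lighter alternative to the SQLite approach, sketched under the assumption that only the GIs present in the BLAST output need resolving: stream the GI to taxid table once and keep just the wanted rows. This is not what the script above does. The sub name taxids_for_gis is hypothetical, the two-column tab-separated layout (GI, then taxid) is assumed, and the GI values in the usage part are made up.

```perl
use strict;
use warnings;

# Read the GI -> taxid table once, keeping only the requested GIs.
# $fh is any filehandle on a tab-separated file: GI, then taxid.
sub taxids_for_gis {
    my ($fh, @wanted_gis) = @_;
    my %wanted = map { $_ => 1 } @wanted_gis;
    my %taxid_for;
    while ( my $line = <$fh> ) {
        chomp $line;
        my ($gi, $taxid) = split /\t/, $line;
        next unless defined $taxid && $wanted{$gi};
        $taxid_for{$gi} = $taxid;
        # Stop early once every requested GI has been resolved.
        last if keys(%taxid_for) == keys(%wanted);
    }
    return \%taxid_for;
}

# Usage with an in-memory stand-in for the real file (made-up GI values)
my $table = "101\t9606\n102\t10090\n103\t7227\n";
open my $fh, '<', \$table or die "Cannot open in-memory table: $!";
my $map = taxids_for_gis($fh, 101, 103);
close $fh;
print "$_\t$map->{$_}\n" for sort { $a <=> $b } keys %$map;
```

Memory use grows with the number of GIs in the BLAST report rather than with the size of the 3.2GB table, at the cost of one full pass over the file per run.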


Bernd

On Thu, Nov 10, 2011 at 9:01 PM, Anna Friedlander <anna.fr at gmail.com> wrote:
> Hi all
>
> Does anyone know if there is a way to get a Taxonomy node and/or
> taxonid from a gi number using the flatfile with taxonomy db?
>
> I have blast output that I want to append taxonomic information to. I
> have hundreds of thousands of items to do this for, so it's not
> practical to use entrez to query the NCBI database.
>
> I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I
> think much too large to put into a hash!
>
> This was also discussed in 2009:
> http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I
> don't think there was a conclusion?
>
> Thanks for your help
> Anna Friedlander
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From cjfields at illinois.edu  Thu Nov 10 16:13:12 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 10 Nov 2011 21:13:12 +0000
Subject: [Bioperl-l] Scripting help to identify adaptors count in reads
In-Reply-To: <COL102-W340F68DB20A406E5B3A9FBFADC0@phx.gbl>
References: <32818254.post@talk.nabble.com>
	<COL102-W340F68DB20A406E5B3A9FBFADC0@phx.gbl>
Message-ID: <9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu>

If the adaptors are masked (e.g. are represented by the N's below) or if you are really confident that the adaptors don't have base mis-calls, why not use split?  Maybe with something like 'scalar(split(/N+/, $foo))' or scalar(split(/$adaptor/, $foo)).  

tr/// won't work for the reasons Leon mentioned; it's a transliteration (a character-by-character mapping), not a pattern match.  '$foo =~ tr/ATGCatgc/TACGtacg/' for instance converts $foo to the complement sequence (it doesn't match the pattern /ATGCatgc/).
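
As a rough sketch of the difference (the sub names are made up for illustration, and this assumes the adaptors really are masked as runs of N): a global pattern match counts adaptor runs, while tr/// can only count or map single characters. Counting matches also sidesteps the off-by-one you get from counting split fields, since split returns fragments rather than separators and yields a leading empty field when the read starts with N.

```perl
use strict;
use warnings;

# Count runs of masked adaptor (one or more N) in a read.
# Assigning the match list to () in scalar context gives the match count.
sub count_adaptors {
    my ($read) = @_;
    my $count = () = $read =~ /N+/g;
    return $count;
}

# tr/// with an empty replacement list counts characters, nothing more.
sub count_n_bases {
    my ($read) = @_;
    return ($read =~ tr/N//);
}

# tr/// as a character mapping: complement a sequence (copy, then map).
sub complement {
    my ($seq) = @_;
    (my $comp = $seq) =~ tr/ATGCatgc/TACGtacg/;
    return $comp;
}

my $read = 'GCCTTGAT' . ('N' x 15) . 'ATCA' . ('N' x 15) . 'CTGATGG';
printf "%d adaptor run(s), %d N bases\n",
    count_adaptors($read), count_n_bases($read);
```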

chris

On Nov 10, 2011, at 10:06 AM, Juan Jovel wrote:

> 
> There are many ways to do it. 
> Perhaps the simplest is to count the number of times the adapter sequence (or part of it) appears in each read. 
> For example: 
> $adapter_matches = tr/adapter_sequence/adapter_sequence/;# $adapter_matches will store the number of times the adapter sequence is repeated. 
> You then place that result in a hash bin:
> my %adapter_frequency;
> my $class = "$adapter_matches";
> if (exists $adapter_frequency{$class}) {
>     $adapter_frequency{$class}++
> } else {
>     $adapter_frequency{$class} = 1
> }
> # Then you can sort and output your classes
> foreach $class (sort keys %adapter_frequency) {
>     print "$class\t$adapter_frequency{$class}\n";
> }
> 
> You can work out the details, but something like this should work.
> 
> 
> 
> 
> 
> 
> 
>> Date: Thu, 10 Nov 2011 04:29:55 -0800
>> From: casaburi at ceinge.unina.it
>> To: Bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l]  Scripting help to identify adaptors count in reads
>> 
>> 
>> Hi everybody,
>> 
>> I have some reads (454) that contain adaptors (NNNN...): one, two or three
>> adaptors per read, depending on the read. Is there any way to
>> establish how many reads have 1 adaptor, how many have 2 and how many have 3,
>> out of the total?
>> 
>>> 271-88
>> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCAGGTGCCTACG
>>> 272-88
>> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCANNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTTGCCAGCCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGC
>>> 273-88
>> GCCTCCCTCGCGCATCAGATCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCA
>>> 274-88
>> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTCCCTCGCGCCATCAGATCGTNNNNNNNNNNNNNNNNNNTCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCATCAGATCGTAGGCACCATCAA
>> 
>> The problem is that some adaptors occur in the middle of the sequences
>> because they come from a concatemerization experimental design (the
>> miRNAs lie between the NNNNNN... runs). So I am looking for a script or tool that reports
>> how many reads have 1 adaptor, how many have 2, and so on (the maximum is 4), relative to the total
>> number of reads. Do you know any tool/script that may help? Thanks.
>> Can anyone suggest a script to do this?
>> 
>> Thank you very much 
>> -- 
>> View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32818254.html
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 		 	   		  
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From jason.stajich at gmail.com  Thu Nov 10 16:15:29 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Thu, 10 Nov 2011 13:15:29 -0800
Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi?
In-Reply-To: <CAExAtoD4aD_zPJXmZZHOG8uUqcTmQr90oQsbdumku5wrWz-erw@mail.gmail.com>
References: <CALv2E+3QpXhOU2Decmw146Bd1P4bDtg45ZosGWkeGkqzZx0XOg@mail.gmail.com>
	<CAExAtoD4aD_zPJXmZZHOG8uUqcTmQr90oQsbdumku5wrWz-erw@mail.gmail.com>
Message-ID: <1C8130A1-A60A-4F16-A077-4E1B592579BC@gmail.com>

Here's another variant, one I wrote for my own purposes. The code at the beginning uses a NoSQL solution to store all the ACC -> GI mappings,
and then a second db to store GI -> TAXONID.

This is the case where I have a file of accession numbers and I want to add to the description line the taxonomy string.

https://github.com/hyphaltip/mobedac-fungi/blob/master/scripts/taxonomy_lookupmissing.pl

That's the first 165 lines, and then lookups are basically what you see on line 195.

It would be good to rewrite that script to use Tokyo Cabinet or Kyoto Cabinet (the newer implementation; not sure if it is faster?).
One thing this approach does is take up a lot of disk space, but you can trade that off against the compression scheme you use, which will affect loading performance.
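
For anyone who just wants the general shape of the on-disk lookup, here is a minimal sketch. It is not the code from the script linked above: it uses the core SDBM_File module purely as a stand-in for a proper key/value store, the sub names are hypothetical, and it assumes the GI->taxid flatfile has one tab-separated "gi taxid" pair per line. For the full 3.2GB file you would probably want a sturdier backend.

```perl
use strict;
use warnings;
use Fcntl;        # O_RDWR, O_CREAT, O_RDONLY
use SDBM_File;    # core DBM module, a stand-in for a real key/value store

# Load "gi<TAB>taxid" lines into an on-disk hash, so the whole mapping
# never has to sit in memory at once.
sub build_gi2taxid {
    my ($flatfile, $dbname) = @_;
    tie my %gi2taxid, 'SDBM_File', $dbname, O_RDWR | O_CREAT, 0666
        or die "Cannot tie $dbname: $!";
    open my $in, '<', $flatfile or die "Cannot open $flatfile: $!";
    while (my $line = <$in>) {
        chomp $line;
        my ($gi, $taxid) = split /\t/, $line;
        next unless defined $taxid;      # skip malformed lines
        $gi2taxid{$gi} = $taxid;
    }
    close $in;
    untie %gi2taxid;
    return;
}

# Look up one GI; returns undef if the GI is not in the database.
sub lookup_taxid {
    my ($dbname, $gi) = @_;
    tie my %gi2taxid, 'SDBM_File', $dbname, O_RDONLY, 0666
        or die "Cannot tie $dbname: $!";
    my $taxid = $gi2taxid{$gi};
    untie %gi2taxid;
    return $taxid;
}
```

In a real run you would tie the database once and keep it open while looping over the BLAST hits, rather than re-tying for each lookup as this sketch does.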

Jason

On Nov 10, 2011, at 12:51 PM, Bernd Web wrote:

> Hi Anna,
> 
> Jason changed his example script from using hashes to using SQLite:
> bp_classify_hits_kingdom - classify BLAST hits by taxonomic kingdom
> 
> See
> https://github.com/bioperl/bioperl-live/blob/master/scripts/taxa/bp_classify_hits_kingdom.pl
> 
> It's an example script that shows how to do the tax to gi mapping for
> blast reports.
> 
> 
> Bernd
> 
> On Thu, Nov 10, 2011 at 9:01 PM, Anna Friedlander <anna.fr at gmail.com> wrote:
>> Hi all
>> 
>> Does anyone know if there is a way to get a Taxonomy node and/or
>> taxonid from a gi number using the flatfile with taxonomy db?
>> 
>> I have blast output that I want to append taxonomic information to. I
>> have hundreds of thousands of items to do this for, so it's not
>> practical to use entrez to query the NCBI database.
>> 
>> I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I
>> think much too large to put into a hash!
>> 
>> This was also discussed in 2009:
>> http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I
>> don't think there was a conclusion?
>> 
>> Thanks for your help
>> Anna Friedlander
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From anna.fr at gmail.com  Thu Nov 10 20:07:57 2011
From: anna.fr at gmail.com (Anna Friedlander)
Date: Fri, 11 Nov 2011 14:07:57 +1300
Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi?
In-Reply-To: <1C8130A1-A60A-4F16-A077-4E1B592579BC@gmail.com>
References: <CALv2E+3QpXhOU2Decmw146Bd1P4bDtg45ZosGWkeGkqzZx0XOg@mail.gmail.com>
	<CAExAtoD4aD_zPJXmZZHOG8uUqcTmQr90oQsbdumku5wrWz-erw@mail.gmail.com>
	<1C8130A1-A60A-4F16-A077-4E1B592579BC@gmail.com>
Message-ID: <CALv2E+09JeJiXPUoZphNZnaVhWM9mstkhhp+=1Jvs6Hjy3c+uA@mail.gmail.com>

thanks all for the fast responses.

I'll try the bio-lite modules shalabh mentioned

On Fri, Nov 11, 2011 at 10:15 AM, Jason Stajich <jason.stajich at gmail.com> wrote:
> Here's another variant of one I wrote which is for my own purposes, the code
> at the beginning uses a NOSQL solution to storing all the ACC -> GI
> and then a second db to store GI -> TAXONID
> This is the case where I have a file of accession numbers and I want to add
> to the description line the taxonomy string.
> https://github.com/hyphaltip/mobedac-fungi/blob/master/scripts/taxonomy_lookupmissing.pl
> That's the first 165 lines, and then lookups are basically what you see on
> line 195.
> Would be good to rewrite that script below to use Tokyo Cabinet
> or Kyoto Cabinet (is newer implementation, not sure if it is faster?).
> one thing that this does is take up a lot of disk space, but you can have
> tradeoffs between that and which compression scheme you use, which will
> impact performance of loading.
> Jason
> On Nov 10, 2011, at 12:51 PM, Bernd Web wrote:
>
> Hi Anna,
>
> Jason changed his example script from using hashes to using SQLite:
> bp_classify_hits_kingdom - classify BLAST hits by taxonomic kingdom
>
> See
> https://github.com/bioperl/bioperl-live/blob/master/scripts/taxa/bp_classify_hits_kingdom.pl
>
> It's an example script that shows how to do the tax to gi mapping for
> blast reports.
>
>
> Bernd
>
> On Thu, Nov 10, 2011 at 9:01 PM, Anna Friedlander <anna.fr at gmail.com> wrote:
>
> Hi all
>
> Does anyone know if there is a way to get a Taxonomy node and/or
> taxonid from a gi number using the flatfile with taxonomy db?
>
> I have blast output that I want to append taxonomic information to. I
> have hundreds of thousands of items to do this for, so it's not
> practical to use entrez to query the NCBI database.
>
> I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I
> think much too large to put into a hash!
>
> This was also discussed in 2009:
> http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I
> don't think there was a conclusion?
>
> Thanks for your help
> Anna Friedlander
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


From arun_innovative90 at yahoo.com  Fri Nov 11 06:09:46 2011
From: arun_innovative90 at yahoo.com (Arun Kumar)
Date: Fri, 11 Nov 2011 03:09:46 -0800 (PST)
Subject: [Bioperl-l] BIOPERL MATERIAL
Message-ID: <1321009786.42748.YahooMailNeo@web110101.mail.gq1.yahoo.com>

Hi team,

I am Arun Kumar, a bioinformatics student, and I would like to master BioPerl by reading your documentation. If possible, please send me a PDF of the BioPerl documentation, as it would help me get familiar with BioPerl.

Thanks in advance

Thanks & Regards,
Arunkumar.d


From awitney at sgul.ac.uk  Fri Nov 11 08:23:29 2011
From: awitney at sgul.ac.uk (Adam Witney)
Date: Fri, 11 Nov 2011 13:23:29 +0000
Subject: [Bioperl-l] BIOPERL MATERIAL
In-Reply-To: <1321009786.42748.YahooMailNeo@web110101.mail.gq1.yahoo.com>
References: <1321009786.42748.YahooMailNeo@web110101.mail.gq1.yahoo.com>
Message-ID: <EA1DBB02-0280-4207-97E7-A116C058A615@sgul.ac.uk>


All BioPerl documents can be found here:

http://www.bioperl.org/wiki/Main_Page

And a useful place to start would be the HOWTOs:

http://www.bioperl.org/wiki/HOWTOs

regards

adam


On 11 Nov 2011, at 11:09, Arun Kumar wrote:

> Hi team, 
>  
> I am Arun Kumar, a bioinformatics student, and I would like to master BioPerl by reading your documentation. If possible, please send me a PDF of the BioPerl documentation, as it would help me get familiar with BioPerl.
>  
> Thanks in advance
> 
> Thanks & Regards,
> Arunkumar.d
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From casaburi at ceinge.unina.it  Fri Nov 11 07:13:50 2011
From: casaburi at ceinge.unina.it (Giorgio C)
Date: Fri, 11 Nov 2011 04:13:50 -0800 (PST)
Subject: [Bioperl-l] Scripting help to identify adaptors count in reads
In-Reply-To: <9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu>
References: <32818254.post@talk.nabble.com>
	<COL102-W340F68DB20A406E5B3A9FBFADC0@phx.gbl>
	<9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu>
Message-ID: <32825229.post@talk.nabble.com>


Hi, thank you for your answer!

In the end I tried this one-liner, and it seems to work for this purpose:


perl -pe 's/.*?((NNNNNNNNNNNNNNN|NNNNNNNNNNNNNNN)(.*?)(NNNNNNNNNNNNNNN|NNNNNNNNNNNNNNN))/$3/g' \
  Scrivania/orchidea/Fiore/Mydata.fasta > result.txt


-- 
View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32825229.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From casaburi at ceinge.unina.it  Fri Nov 11 07:21:29 2011
From: casaburi at ceinge.unina.it (Giorgio C)
Date: Fri, 11 Nov 2011 04:21:29 -0800 (PST)
Subject: [Bioperl-l] Scripting help to identify adaptors count in reads
In-Reply-To: <9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu>
References: <32818254.post@talk.nabble.com>
	<COL102-W340F68DB20A406E5B3A9FBFADC0@phx.gbl>
	<9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu>
Message-ID: <32825274.post@talk.nabble.com>


Thanks everybody for answering so quickly! Another way might be:

perl -ne '$count{s/N+//g}++ if /^[^>]/;END{for $i (keys %count){print "$count{$i} have $i ADAPTOR\n";}}' myFile.fasta > result.txt


and/or with 'nawk':

nawk -F'[N]+' '/^[^>]/{a[NF-1]++}END{for(i in a) print a[i] " have " i " ADAPTOR"}' myFile.fasta > result.txt

They give the same result. If you run into the same problem, try these;
they work well.
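
For reference, the same tally written out as a small script might look like the sketch below. The sub name is hypothetical and it again assumes adaptors are masked as runs of N. Unlike the one-liners, which count per line, it joins the lines of each record first, so a sequence wrapped over several lines is counted as one read and an N run that spans a line break is counted once.

```perl
use strict;
use warnings;

# Tally how many reads contain 0, 1, 2, ... runs of N.
# Takes an open filehandle on FASTA text; returns a hash ref mapping
# "number of N runs in a read" => "number of reads".
sub adaptor_histogram {
    my ($fh) = @_;
    my (%hist, $seq);
    my $flush = sub {
        return unless defined $seq;        # nothing read yet
        my $runs = () = $seq =~ /N+/g;     # count the N runs
        $hist{$runs}++;
        undef $seq;
    };
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>/) { $flush->(); $seq = ''; }   # new record starts
        elsif (defined $seq) { $seq .= $line; }         # sequence line
    }
    $flush->();                            # last record in the file
    return \%hist;
}

if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $hist = adaptor_histogram($fh);
    print "$hist->{$_} read(s) have $_ adaptor(s)\n"
        for sort { $a <=> $b } keys %$hist;
}
```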

Still Thanks to all,

Giorgio


-- 
View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32825274.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From p.j.a.cock at googlemail.com  Sun Nov 13 07:24:35 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 13 Nov 2011 12:24:35 +0000
Subject: [Bioperl-l] OBDA redux?
In-Reply-To: <FB761CFA-1CFD-4FA0-A708-2CE3F2F240D9@illinois.edu>
References: <CAKVJ-_6bzfZZr82y+J4qmGbn0du4rxaKaGxjmbC7p-pU_nMuoQ@mail.gmail.com>
	<FB761CFA-1CFD-4FA0-A708-2CE3F2F240D9@illinois.edu>
Message-ID: <CAKVJ-_4i8AJL1emBKpnO+p-SVzNtbSVwdL9uSy72NWkHugRtVA@mail.gmail.com>

On Thu, Nov 3, 2011 at 7:47 PM, Fields, Christopher J
<cjfields at illinois.edu> wrote:
> On Nov 3, 2011, at 1:52 PM, Peter Cock wrote:
>
>> On Thu, Nov 3, 2011 at 6:28 PM, Fields, Christopher J
>> <cjfields at illinois.edu> wrote:
>>> (side thread, so re-titling...)
>>>
>> And CC'ing open-bio-l, which is a better home for this than bioperl-l,
>> where OBDA v2 talk came up again in discussion of a BioPerl indexing
>> problem. Archive links for thread here:
>>
>> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035807.html
>> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035808.html
>> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035811.html
>> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035812.html
>> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035813.html
>> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035822.html
>
> yes, good idea...

I'm no longer CC'ing bioperl-l.

>>> On Nov 1, 2011, at 1:06 PM, Peter Cock wrote:
>>>>
>>>> Yes, we're using SQLite3 to store essentially a list of filenames
>>>> and their format as one table, and then in the main table an
>>>> entry for each sequence recording the ID (only one accession,
>>>> unlike OBDA which had infrastructure for a secondary accession),
>>>> file number, offset of the start of the record, and optionally the
>>>> length of the record on disk.
>>>>
>>>> i.e. Basically what OBDA does, but using SQLite rather
>>>> than BDB (not included in Python 3) or a flat file index
>>>> (poor performance with large datasets).
>>>>
>>>> I find this design attractive on several levels:
>>>> * File format neutral, covers FASTA, FASTQ, GenBank, etc
>>>> * Preserves the original file untouched
>>>> * Index is a small single file (thanks to SQLite)
>>>> * Back end could be switched out
>>>> * Could be applied to compressed file formats
>>>> * Reuses existing parsing code to access entries
>>>>
>>>> This could easily form basis of OBDA v2, the main points
>>>> of difference I anticipate between the Bio* projects would
>>>> be naming conventions for the different file formats, and
>>>> what we consider to be the default record ID of each read
>>>> (e.g. which field in a GenBank file - although agreement
>>>> here is not essential). Some of that was already settled in
>>>> principle with OBDA v1.
>>>
>>> The primary/secondary IDs could be configurable with a sane
>>> default, I think the bioperl implementations allowed this (and
>>> it is certainly something that will be requested).
>>
>> One reason I went with a single ID only was to keep the
>> Python dictionary based API simple (think hash in Perl).
>> You don't get secondary keys in a Python dict or a hash ;)
>>
>> As a nod to flexibility, in Biopython's Bio.SeqIO indexing you
>> can provide a call back function to map the suggested ID to
>> something else. Obviously this doesn't give the full flexibility
>> of extracting a field from the record's annotation because we
>> don't parse the whole record during indexing (it would be too
>> slow).
>
> Same with bioperl.
>
>> However, I'm happy for there to be an *optional* secondary
>> key in an OBDA v2 SQLite schema, but Biopython probably
>> won't populate it. We could optionally use it rather than the
>> primary ID on loading an existing index though.
>
> Optional implementation of that is fine by me.
>
>> Personally I would stick with one key in the index - it should
>> be faster and makes it simpler to switch the back end if we
>> need to later. If anyone wants a second key, they can build
>> a second index *grin*.
>
> That's easy enough.
>
>>>> On the other hand, you could try and store the parsed data
>>>> itself, which is where NOSQL looks more interesting. That
>>>> essentially requires the ability to serialise your annotated
>>>> sequence object model to disk - which would be tricky to do
>>>> cross project (much more ambitious than BioSQL is). It also
>>>> means the "index" becomes very large because it now holds
>>>> all the original data.
>>>>
>>>> Peter
>>>
>>> For a fully cross-Bio* compliant format, I don't think it's feasible
>>> to use serialized data unless they are serialized in something
>>> that is easily deserialized across HLLs (JSON, BSON, YAML,
>>> XML, etc).  Either that, or such data is stored concurrently with
>>> the binary blob, along with meta data that indicates the source
>>> of the blob, parser, version, etc, etc (unless there are tools out
>>> there that reliably interconvert serialized complex data structures
>>> between HLLs).  Anyway you go about it, it seems like it could
>>> be a major ball of hurt, unless implemented very carefully.
>>
>> You missed out RDF as a serialisation ;)
>>
>> But yes, going down the shared serialisation route is going
>> to be messy - as you are well aware:
>>
>>> Aside: I think this was one of the problems with
>>> Bio::DB::SeqFeature::Store, in that it at one point stored
>>> Perl-specific Storable blobs.
>>>
>>> chris
>>
>> Peter
>
> yes, it's a problem w/o an easy solution.  Anyway, I think an
> implementation of such at this point would be a premature
> optimization.
>
> chris

So, Chris and I seem in general agreement that an OBDA v2
using SQLite but based on essentially the same approach as
the BDB or flat file based OBDA v1 is a good idea. i.e. Tables
mapping record identifiers to file offsets in the original sequence
files.
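
To make the "identifier to file offset" idea concrete, here is a minimal Perl sketch for a single FASTA file. It is only an illustration: the sub names are hypothetical, and a plain in-memory hash stands in for the SQLite table, which would hold the same ID, offset and length columns (plus a file number).

```perl
use strict;
use warnings;

# Build an index: record ID => [byte offset of record start, length on disk].
sub index_fasta {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;
    my (%index, $id);
    my $start = tell $fh;                 # offset of the line about to be read
    while (my $line = <$fh>) {
        if ($line =~ /^>(\S+)/) {
            # A new header closes the previous record.
            $index{$id}[1] = $start - $index{$id}[0] if defined $id;
            $id = $1;
            $index{$id} = [$start, 0];
        }
        $start = tell $fh;
    }
    $index{$id}[1] = $start - $index{$id}[0] if defined $id;
    close $fh;
    return \%index;
}

# Fetch the raw text of one record by seeking straight to it; the
# original file is left untouched and is parsed only on demand.
sub fetch_record {
    my ($path, $index, $id) = @_;
    my $entry = $index->{$id} or return;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;
    seek $fh, $entry->[0], 0 or die "Cannot seek in $path: $!";
    read $fh, my $record, $entry->[1];
    close $fh;
    return $record;
}
```

The raw record text returned by fetch_record can then be handed to whatever parser already exists for that format, which is the "reuses existing parsing code" point above.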

I hope to get BioRuby on board; they already have OBDA
v1 support so that shouldn't be too hard.

Right now I don't recall if BioJava has/had OBDA v1 support,
and if they did if it was affected in their recent move to BioJava
v3 (I understand from their mailing list that some older lower
priority functionality has not all been ported yet).

Also EMBOSS are likely to be interested, certainly Peter Rice
was interested in the SQLite indexing we're already using in
Biopython for sequence files (i.e. what is effectively the
prototype for OBDA v2).

Note that in addition to simple indexing of text files, we are
already using the same simple offset + length approach for
indexing binary files (e.g. SFF).

On the immediate practical side, I think I can edit the
current OBDA website of http://obda.open-bio.org/
via /home/websites/obda.open-bio.org/html on the
server.

We need to work out where the current OBDA indexing
specification lives (CVS or SVN?) and perhaps move
that to GitHub. We may need a general OBF organisation
account on GitHub for this and any other cross-project
repositories.

I see there is already an OBDA project on RedMine,
(Chris can you add me to that please?)
https://redmine.open-bio.org/projects/obda

Peter


From p.j.a.cock at googlemail.com  Sun Nov 13 07:30:37 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 13 Nov 2011 12:30:37 +0000
Subject: [Bioperl-l] OBDA redux? Compressed files
Message-ID: <CAKVJ-_6s1hOo9DLDP0pnZ_96pJdd=mpHe96oKUedwELGLDgfJw@mail.gmail.com>

Hi again,

I've retitled this as it is a little off topic from the main OBDA redux thread,
http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000819.html
http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000820.html
http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000821.html

As far as I recall, the original flat file and BDB based OBDA
specification for indexing sequencing files didn't cover
compressed files. That might be something to consider
(although we should sort out uncompressed text/binary
files first).

I've recently been experimenting with using compressed
files - in particular simple GZIP files (ignoring any block structure)
and BGZF (the specialised gzipped blocking used in BAM), see:

http://blastedbio.blogspot.com/2011/11/bgzf-blocked-bigger-better-gzip.html
http://seqanswers.com/forums/showthread.php?t=15347

The virtual offset approach used in BGZF squeezes a 16 bit
within-block offset (thus limiting you to 64KB blocks) and a
48 bit block start offset (thus limiting you to a 256TB file) into
a single 64 bit "virtual" offset. That makes sense if you are
keeping a lookup table of many offsets in memory, and
can be used as is with code expecting a single offset (like
the current Biopython SQLite index schema).
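
The packing itself is just a shift and a mask. As a small illustration in Perl (the sub names are my own, and this assumes a perl built with 64 bit integers):

```perl
use strict;
use warnings;

# Pack a block start offset (48 bits) and a within-block offset (16 bits)
# into one 64 bit integer. Assumes a perl built with 64 bit integers.
sub make_virtual_offset {
    my ($block_start, $within_block) = @_;
    die "within-block offset out of range"
        if $within_block < 0 || $within_block >= 2**16;
    die "block start out of range"
        if $block_start < 0 || $block_start >= 2**48;
    return ($block_start << 16) | $within_block;
}

# Recover (block start, within-block offset) from a virtual offset.
sub split_virtual_offset {
    my ($virtual) = @_;
    return ($virtual >> 16, $virtual & 0xFFFF);
}

my $virtual = make_virtual_offset(100000, 1234);
my ($block_start, $within_block) = split_virtual_offset($virtual);
print "virtual=$virtual block_start=$block_start within=$within_block\n";
```

To read a record you would seek to the block start in the compressed file, decompress that one block, and then skip the within-block number of bytes in the decompressed data.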

Also bzip2, which is likewise block based, with the block size ranging
from 100KB to 900KB.

http://bzip.org/
http://bzip.org/1.0.5/bzip2-manual-1.0.5.html

I haven't tried any performance tests yet, which would
be interesting as I believe compression/decompression
of bzip2 is more costly in CPU terms than gzip (although
both will be block size dependent).

If we wanted to imitate the BGZF virtual offset scheme for
arbitrary BZIP2 files, an alternative 64 bit virtual offset scheme
could use 20 bits to cover bz2 blocks of up to 900KB, leaving
64 - 20 = 44 bits for the start offset, thus limiting you to just
2^44 bytes or 16TB, which sounds OK only in the medium term.
On the bright side this could be used to index any BZIP2 file
(under 16TB), whereas the BGZF scheme cannot be applied to an
arbitrary GZIP file.
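
The arithmetic behind those limits, as a quick Perl sketch with the bit split as a parameter (hypothetical sub names; assumes a perl built with 64 bit integers). With 20 bits the within-block part covers 2^20 = 1048576 bytes, comfortably more than a 900KB block, and the remaining 44 bits cover 2^44 bytes for the block start.

```perl
use strict;
use warnings;

# For a 64 bit virtual offset with $within_bits reserved for the
# within-block part, return (max block size, max file size) in bytes.
sub offset_limits {
    my ($within_bits) = @_;
    return (2**$within_bits, 2**(64 - $within_bits));
}

# Generic pack/unpack with a configurable split.
sub pack_offset {
    my ($block_start, $within_block, $within_bits) = @_;
    die "within-block offset out of range"
        if $within_block < 0 || $within_block >= 2**$within_bits;
    die "block start out of range"
        if $block_start < 0 || $block_start >= 2**(64 - $within_bits);
    return ($block_start << $within_bits) | $within_block;
}

sub unpack_offset {
    my ($virtual, $within_bits) = @_;
    return ($virtual >> $within_bits, $virtual & ((1 << $within_bits) - 1));
}

for my $bits (16, 20) {
    my ($max_block, $max_file) = offset_limits($bits);
    print "$bits bits: blocks up to $max_block bytes, files up to $max_file bytes\n";
}
```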

On the other hand, storing the block start and within-block
offset separately is truly generic and could be used on any blocked
GZIP file (including BGZF) and BZIP2 etc. It would make
the SQLite schema a bit more complicated though.

Maybe something to consider for the next revision to OBDA,
and focus on the non-compressed case for now?

Regards,

Peter


From p.j.a.cock at googlemail.com  Sun Nov 13 07:32:12 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 13 Nov 2011 12:32:12 +0000
Subject: [Bioperl-l] OBDA redux? Compressed files
In-Reply-To: <CAKVJ-_6s1hOo9DLDP0pnZ_96pJdd=mpHe96oKUedwELGLDgfJw@mail.gmail.com>
References: <CAKVJ-_6s1hOo9DLDP0pnZ_96pJdd=mpHe96oKUedwELGLDgfJw@mail.gmail.com>
Message-ID: <CAKVJ-_7G639PJBZFLE8mQPT=0LXeTWaf54U0tbMgh6XWfUAKtQ@mail.gmail.com>

On Sun, Nov 13, 2011 at 12:30 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Hi again,
>
> I've retitled this as it is a little off topic from the main OBDA redux thread,
> http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000819.html
> http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000820.html
> http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000821.html
>
> As far as I recall, the original flat file and BDB based OBDA
> specification for indexing sequencing files didn't cover
> compressed files. That might be something to consider
> (although we should sort out uncompressed text/binary
> files first).

Sorry - didn't mean to include bioperl-l on that, although it may be
of interest to you guys anyway.

Peter


From jluis.lavin at unavarra.es  Mon Nov 14 06:14:43 2011
From: jluis.lavin at unavarra.es (José Luis Lavín)
Date: Mon, 14 Nov 2011 12:14:43 +0100
Subject: [Bioperl-l] Fwd: How to get Remote BLAST results in a single out
In-Reply-To: <CADm9iynE1+y2EGyx8NLzZSzj_E81o-a5_==9ghNQ5R0hX3QbAw@mail.gmail.com>
References: <CADm9iynE1+y2EGyx8NLzZSzj_E81o-a5_==9ghNQ5R0hX3QbAw@mail.gmail.com>
Message-ID: <CADm9iy=L0c8HTZcaRD8aLw79cg1uvgQrRJ5PH4bZA5zRtt=L_Q@mail.gmail.com>

Hello everybody,

I've been using "Bio::Tools::Run::RemoteBlast" for a while and it has
worked fine for me. Now I need to run BLAST searches for multiple
sequences, but this time I'd like to get all the BLAST results in a
single output file instead of having each sequence's report written
individually. I've read the module documentation, but given my limited
experience with complex modules like this one, I can't figure out where
to change the script to achieve this.
Below is the script I've been using (it's basically the one posted in
the module cookbook).

#!/c:/Perl -w
use Bio::Tools::Run::RemoteBlast;
use Bio::SearchIO;
use Data::Dumper;

#Here i set the parameters for blast
print "Enter your BLAST choice (blastn, blastp, blastx, tblastn, tblastx):\n";
chomp(my $blst = <STDIN>);   # strip the trailing newline from each answer
my $prog = "$blst";
print "Enter a database to search (nr, refseq_protein, swissprot, pat, pdb, env_nr):\n";
chomp(my $dtb = <STDIN>);
my $db = "$dtb";
print "Enter your cut-off score (1e-n):\n";
chomp(my $cut = <STDIN>);
my $e_val = "$cut";

my @params = ( '-prog' => $prog,
         '-data' => $db,
         '-expect' => $e_val,
         '-readmethod' => 'SearchIO' );

my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params);


#Select the file and make the blast.
print "Enter your FASTA file:\n";
chomp(my $infile = <STDIN>);
my $r = $remoteBlast->submit_blast($infile);
  my $v = 1;

    print STDERR "waiting..." if( $v > 0 );  ########  WAIT FOR THE RESULTS TO RETURN!!!!!
    while ( my @rids = $remoteBlast->each_rid ) {
      foreach my $rid ( @rids ) {
        my $rc = $remoteBlast->retrieve_blast($rid);
        if( !ref($rc) ) {
          # not ready yet, or failed (negative return code)
          if( $rc < 0 ) {
            $remoteBlast->remove_rid($rid);
          }
          print STDERR "." if ( $v > 0 );
          sleep 5;
        } else {
          my $result = $rc->next_result();
          #save the output (one file per query)
          ##################open SALIDA, '>>'."$^T"."Report"."\.out";
          my $filename = $result->query_name()."\.out";
          $remoteBlast->save_output($filename);#############
          $remoteBlast->remove_rid($rid);
          print "\nQuery Name: ", $result->query_name(), "\n";
          while ( my $hit = $result->next_hit ) {
            next unless ( $v > 0);
            print "\thit name is ", $hit->name, "\n";
            while( my $hsp = $hit->next_hsp ) {
              print "\t\tscore is ", $hsp->score, "\n";
            }
          }
        }
      }
    }


Could any of you please explain how I can do this?

Thanks in advance

With best wishes

-- 
-- 
Dr. José Luis Lavín Trueba

Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN




From jason.stajich at gmail.com  Mon Nov 14 06:59:56 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Mon, 14 Nov 2011 06:59:56 -0500
Subject: [Bioperl-l] Fwd: How to get Remote BLAST results in a single out
In-Reply-To: <CADm9iy=L0c8HTZcaRD8aLw79cg1uvgQrRJ5PH4bZA5zRtt=L_Q@mail.gmail.com>
References: <CADm9iynE1+y2EGyx8NLzZSzj_E81o-a5_==9ghNQ5R0hX3QbAw@mail.gmail.com>
	<CADm9iy=L0c8HTZcaRD8aLw79cg1uvgQrRJ5PH4bZA5zRtt=L_Q@mail.gmail.com>
Message-ID: <FDFB72A5-E38C-4637-9415-5A15E4C5B551@gmail.com>

If you want to run a bunch of BLASTs remotely from the command line, you could also just use NCBI's blastcl3 tool (not sure if there is a BLAST+ equivalent). That might be faster and easier, since otherwise you need to learn the programming part too.

If you want to do this within this code I guess the question is what format you want the data in - a BLAST report or something more like a table?  
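
If a plain concatenation of the default-format reports turns out to be enough, one option is to keep saving each report to a scratch file and append it to a single combined file. A minimal sketch, untested against a live RemoteBlast run; the sub name and file names are made up for illustration, and only save_output() comes from the posted script:

```perl
use strict;
use warnings;

# Append the contents of one saved report to a combined report file.
sub append_report {
    my ($report, $combined) = @_;
    open my $in,  '<',  $report   or die "Cannot read $report: $!";
    open my $out, '>>', $combined or die "Cannot append to $combined: $!";
    while (my $line = <$in>) {
        print {$out} $line;
    }
    close $in;
    close $out or die "Cannot close $combined: $!";
    return;
}

# Inside the retrieval loop of the posted script this might be used as
# (not run here; it needs BioPerl and network access):
#   my $tmp = "$rid.tmp";                    # hypothetical scratch file
#   $remoteBlast->save_output($tmp);
#   append_report($tmp, 'all_queries.out');  # hypothetical combined file
#   unlink $tmp;
```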

On Nov 14, 2011, at 6:14 AM, José Luis Lavín wrote:

> Hello everybody,
> 
> I've been using  "Bio::Tools::Run::RemoteBlast" for a time and it has
> worked fine for me. Now I need to perform a multiple BLAST search, but this
> time I'd just like to get all the BLAST results in a single out file
> instead of having each sequence's report written individually. I've read
> the documentation of the module, but due to my short
> experience/understanding on complex modules as this one seems to be I can't
> figure out where to change the script to achieve my previously mentioned
> aim.
> Here I post the script I've been using (it's basically the one posted on
> the module cookbook).
> 
> #!/c:/Perl -w
> use Bio::Tools::Run::RemoteBlast;
> use Bio::SearchIO;
> use Data::Dumper;
> 
> #Here i set the parameters for blast
> print "Enter your BLAST choice (blastn, blastp, blastx, tblastn,
> tblastx):\n";
> my $blst = <STDIN>;
> my $prog = "$blst";
> print "Enter a database to search (nr, refseq_protein, swissprot, pat, pdb,
> env_nr):\n";
> my $dtb = <STDIN>;
> $db = "$dtb";
> print "Enter your cutt off score (1e-n):\n";
> my $cut = <STDIN>;
> my $e_val = "$cut";
> 
> my @params = ( '-prog' => $prog,
>         '-data' => $db,
>         '-expect' => $e_val,
>         '-readmethod' => 'SearchIO' );
> 
> my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params);
> 
> 
> #Select the file and make the blast.
> print "Enter your FASTA file:\n";
> chomp(my $infile = <STDIN>);
> my $r = $remoteBlast->submit_blast($infile);
>  my $v = 1;
> 
>    print STDERR "waiting..." if( $v > 0 );  ########  WAIT FOR THE RESULTS
> TO RETURN!!!!!
>    while ( my @rids = $remoteBlast->each_rid ) {
>      foreach my $rid ( @rids ) {
>        my $rc = $remoteBlast->retrieve_blast($rid);
>        if( !ref($rc) ) {
>          if( $rc < 0 ) {
>            $remoteBlast->remove_rid($rid);
>          }
>          print STDERR "." if ( $v > 0 );
>          sleep 5;
>        } else {
>          my $result = $rc->next_result();
>          #save the output
>          my $filename =
> $result->query_name()."\.out";##################open SALIDA,
> '>>'."$^T"."Report"."\.out";
>          $remoteBlast->save_output($filename);#############
>          $remoteBlast->remove_rid($rid);
>          print "\nQuery Name: ", $result->query_name(), "\n";
>          while ( my $hit = $result->next_hit ) {
>            next unless ( $v > 0);
>            print "\thit name is ", $hit->name, "\n";
>            while( my $hsp = $hit->next_hsp ) {
>              print "\t\tscore is ", $hsp->score, "\n";
>            }
>          }
>        }
>      }
>    }
> 
> 
> May any of you please explain me how to solve my question?
> 
> Thanks in advence
> 
> With best wishes
> 
> -- 
> -- 
> Dr. José Luis Lavín Trueba
> 
> Dpto. de Producción Agraria
> Grupo de Genética y Microbiología
> Universidad Pública de Navarra
> 31006 Pamplona
> Navarra
> SPAIN
> 
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From jason.stajich at gmail.com  Mon Nov 14 09:07:36 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Mon, 14 Nov 2011 09:07:36 -0500
Subject: [Bioperl-l] Fwd: Fwd: How to get Remote BLAST results in a single
	out
References: <CADm9iynj0NcWOtE4asYimfFBhbTYAFrROLj9qccmoH6o=yKVrg@mail.gmail.com>
Message-ID: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com>

Please keep these discussions on the list.

Sent from my iPhone-please excuse typos

--
Jason Stajich

Begin forwarded message:

> From: José Luis Lavín <jluis.lavin at unavarra.es>
> Date: November 14, 2011 8:04:25 AM EST
> To: Jason Stajich <jason.stajich at gmail.com>
> Subject: Re: [Bioperl-l] Fwd: How to get Remote BLAST results in a single out
> 
> Hello Jason,
> 
> As answering your question:
> 
> " If you want to do this within this code I guess the question is what format you want the data in - a BLAST report or something more like a table?"
> 
> A concatenation of BLAST (default format) reports should be OK, since I have a script to parse that kind of results. Anyway formats 1 or 2 will also do the trick. 
> I'll be happy to get assistance  on how to change the OUTFILE from "a query a report" to "all queries in the same report", because I don't seem to be able to do it myself after reading the module documentation.
> 
> Thanks in advance
> 
> El 14 de noviembre de 2011 12:59, Jason Stajich <jason.stajich at gmail.com> escribi?:
> if you want to do a bunch of BLASTs remotely on the cmdline you should also just use the NCBI's blastcl3 tool (not sure if there is a BLAST+ equivalent). This might be faster to do and easier since you need to learn the programming part too.
> 
> If you want to do this within this code I guess the question is what format you want the data in - a BLAST report or something more like a table?
> 
> 
> 
> 
> -- 
> -- 
> Dr. José Luis Lavín Trueba
> 
> Dpto. de Producción Agraria
> Grupo de Genética y Microbiología
> Universidad Pública de Navarra
> 31006 Pamplona
> Navarra
> SPAIN


From cl134 at duke.edu  Sun Nov 13 09:42:05 2011
From: cl134 at duke.edu (Cheng-Ruei Lee)
Date: Sun, 13 Nov 2011 09:42:05 -0500
Subject: [Bioperl-l] Suggestion for Bio::PopGen::Statistics
Message-ID: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu>

Hi all,

     Bioperl version: 1.006
     I get two error messages when using this module to
calculate Fu & Li's statistics:
Illegal division by zero at (the Statistics.pm file) line 359
Illegal division by zero at (the Statistics.pm file) line 376
     Tracking this down further shows that the first error happens when
$n (the sample size in the ingroup) equals 1 or 2, and the second error
happens when $n equals 3. This is not really a bug, though. I would
suggest either adding a check to the original code before the
calculation (and skipping the calculation when $n == 1, 2, or 3,
rather than letting the whole program die), or adding a few lines of
notes to the CPAN page.
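
A caller-side guard along these lines could look like the following. This is only a sketch: guarded_stat is a hypothetical helper, not part of Bio::PopGen::Statistics, and the minimum of 4 individuals is taken from the division-by-zero cases reported above. The code reference stands in for whichever Fu & Li statistic is being computed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): run a statistic only when the
# ingroup is large enough, and trap any remaining runtime error with eval.
sub guarded_stat {
    my ($ingroup, $stat) = @_;      # array ref of individuals, code ref
    my $n = scalar @{$ingroup};
    return undef if $n < 4;         # n = 1, 2 or 3 triggered the errors above
    my $value = eval { $stat->($ingroup) };
    warn "statistic failed for n=$n: $@" if $@;
    return $value;                  # undef when the calculation died
}

# Stand-in statistic that divides by (n - 3), so it would die for n == 3.
my $toy_stat = sub { my $n = scalar @{ $_[0] }; return 1 / ($n - 3) };

for my $sample ( [1 .. 3], [1 .. 5] ) {
    my $result = guarded_stat($sample, $toy_stat);
    printf "n=%d -> %s\n", scalar @{$sample},
        defined $result ? $result : 'skipped';
}
```

Returning undef lets a loop over many populations carry on and report which ones were skipped, instead of dying on the first small sample.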

Sincerely,
Cheng-Ruei Lee


From joluito at gmail.com  Mon Nov 14 04:21:31 2011
From: joluito at gmail.com (José Luis Lavín)
Date: Mon, 14 Nov 2011 10:21:31 +0100
Subject: [Bioperl-l] How to get Remote BLAST results in a single out
Message-ID: <CADm9iynE1+y2EGyx8NLzZSzj_E81o-a5_==9ghNQ5R0hX3QbAw@mail.gmail.com>

Hello everybody,

I've been using "Bio::Tools::Run::RemoteBlast" for some time and it has
worked fine for me. Now I need to run a BLAST search with multiple queries,
and this time I'd like to get all the BLAST results in a single output file
instead of having each sequence's report written individually. I've read
the module documentation, but because of my limited experience with
complex modules such as this one, I can't figure out where to change the
script to achieve this.
Below is the script I've been using (it's basically the one from the
module cookbook).

#!/usr/bin/perl
use strict;
use warnings;
use Bio::Tools::Run::RemoteBlast;
use Bio::SearchIO;
use Data::Dumper;

# Here I set the parameters for BLAST; chomp strips the newline from each answer
print "Enter your BLAST choice (blastn, blastp, blastx, tblastn, tblastx):\n";
chomp(my $prog = <STDIN>);
print "Enter a database to search (nr, refseq_protein, swissprot, pat, pdb, env_nr):\n";
chomp(my $db = <STDIN>);
print "Enter your cut-off score (1e-n):\n";
chomp(my $e_val = <STDIN>);

my @params = ( '-prog' => $prog,
         '-data' => $db,
         '-expect' => $e_val,
         '-readmethod' => 'SearchIO' );

my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params);


# Select the file and submit the BLAST jobs.
print "Enter your FASTA file:\n";
chomp(my $infile = <STDIN>);
my $r = $remoteBlast->submit_blast($infile);
my $v = 1;

# Wait for the results to return
print STDERR "waiting..." if( $v > 0 );
while ( my @rids = $remoteBlast->each_rid ) {
  foreach my $rid ( @rids ) {
    my $rc = $remoteBlast->retrieve_blast($rid);
    if( !ref($rc) ) {
      # not ready yet, or failed (negative return code)
      if( $rc < 0 ) {
        $remoteBlast->remove_rid($rid);
      }
      print STDERR "." if ( $v > 0 );
      sleep 5;
    } else {
      my $result = $rc->next_result();
      # save the output, one file per query
      my $filename = $result->query_name() . ".out";
      # earlier attempt: open SALIDA, '>>' . "$^T" . "Report" . ".out";
      $remoteBlast->save_output($filename);
      $remoteBlast->remove_rid($rid);
      print "\nQuery Name: ", $result->query_name(), "\n";
      while ( my $hit = $result->next_hit ) {
        next unless ( $v > 0);
        print "\thit name is ", $hit->name, "\n";
        while( my $hsp = $hit->next_hsp ) {
          print "\t\tscore is ", $hsp->score, "\n";
        }
      }
    }
  }
}


Could any of you please explain how to do this?

Thanks in advance

With best wishes

-- 
-- 
Dr. José Luis Lavín Trueba

Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN


From cjfields at illinois.edu  Mon Nov 14 12:02:22 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 14 Nov 2011 17:02:22 +0000
Subject: [Bioperl-l] How to get Remote BLAST results in a single	out
In-Reply-To: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com>
References: <CADm9iynj0NcWOtE4asYimfFBhbTYAFrROLj9qccmoH6o=yKVrg@mail.gmail.com>
	<8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com>
Message-ID: <5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>

Re: a BLAST+ equivalent for blastcl3, I believe there is an option for the various 'blast*' indicating the search is to use a remote database.  I haven't used it, though...

chris

On Nov 14, 2011, at 8:07 AM, Jason Stajich wrote:

> Please keep these discussions on the list.
> 
> Sent from my iPhone-please excuse typos
> 
> --
> Jason Stajich


From cjfields at illinois.edu  Mon Nov 14 12:03:04 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 14 Nov 2011 17:03:04 +0000
Subject: [Bioperl-l] Suggestion for Bio::PopGen::Statistics
In-Reply-To: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu>
References: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu>
Message-ID: <E385D24C-E562-43B9-A820-2A7C59E9399A@illinois.edu>

Cheng,

Have you tried the latest CPAN release (we're at 1.006901)?

chris

On Nov 13, 2011, at 8:42 AM, Cheng-Ruei Lee wrote:

> Hi all,
> 
>    Bioperl version: 1.006
>    Here are two error messages when I'm using this module to calculate Fu & Li's statistics:
> Illegal division by zero at (the Statistics.pm file) line 359
> Illegal division by zero at (the Statistics.pm file) line 376
>    A further tracking down shows that the first error happens when $n (sample size in the ingroup) equals 1 or 2, and the second error happens when $n equals 3. This is not really a bug though. I would suggest either in the original code, do a checking before the calculation (and skip the current calculation when $n == 1, 2, or 3 - rather than let the whole program die), or add a few lines of notes in the CPAN page.
> 
> Sincerely,
> Cheng-Ruei Lee
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Mon Nov 14 12:59:35 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 14 Nov 2011 17:59:35 +0000
Subject: [Bioperl-l] OBDA redux?
In-Reply-To: <CAKVJ-_4i8AJL1emBKpnO+p-SVzNtbSVwdL9uSy72NWkHugRtVA@mail.gmail.com>
References: <CAKVJ-_6bzfZZr82y+J4qmGbn0du4rxaKaGxjmbC7p-pU_nMuoQ@mail.gmail.com>
	<FB761CFA-1CFD-4FA0-A708-2CE3F2F240D9@illinois.edu>
	<CAKVJ-_4i8AJL1emBKpnO+p-SVzNtbSVwdL9uSy72NWkHugRtVA@mail.gmail.com>
Message-ID: <12E3B71D-6E61-41AD-A956-A1FC2076AF24@illinois.edu>

On Nov 13, 2011, at 6:24 AM, Peter Cock wrote:

> So, Chris and I seem in general agreement that an OBDA v2
> using SQLite but based on essentially the same approach as
> the BDB or flat file based OBDA v1 is a good idea. i.e. Tables
> mapping record identifiers to file offsets in the original sequence
> files.

My worry is tying the spec to a specific backend (e.g. SQLite).  BDB was, in its time, the go-to way of storing simple index data, but it is no longer feasible for very large data sets.  Who's to say something similar won't happen to SQLite, or that it is the best option available?

Maybe we should focus on the data storage schema, as simple as it may be, then indicate the default backend must be SQLite but others are allowed (maybe with a mention that SQLite can be replaced by alternatives in the future if needed).  

> I hope to get BioRuby on board, they already have an OBDA
> v1 support so that shouldn't be too hard.
> 
> Right now I don't recall if BioJava has/had OBDA v1 support,
> and if they did if it was affected in their recent move to BioJava
> v3 (I understand from their mailing list that some older lower
> priority functionality has not all been ported yet).

I wouldn't be surprised at that. OBDA kind of lingered for a while, and I'm not sure how widely adopted it became (maybe others can shed light on that?).

> Also EMBOSS are likely to be interested, certainly Peter Rice
> was interested in the SQLite indexing we're already using in
> Biopython for sequence files (i.e. what is effectively the
> prototype for OBDA v2).
> 
> Note that in addition to simple indexing of text files, we are
> already using the same simple offset + length approach for
> indexing binary files (e.g. SFF).

I think that's the general idea; that is how all BioPerl data was indexed before, both with the Bio::Index modules and with the OBDA implementations.
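
The identifier-to-offset idea can be sketched in a few lines of core Perl. This is an illustration only, not the OBDA specification: the index lives in an in-memory hash rather than in SQLite or BDB, and index_fasta and fetch_record are hypothetical names.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build { id => { offset => ..., length => ... } } for a FASTA file.
sub index_fasta {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;                          # count bytes, not characters
    my (%index, $id);
    my $offset = 0;
    while (my $line = <$fh>) {
        if ($line =~ /^>(\S+)/) {         # header line starts a new record
            $id = $1;
            $index{$id} = { offset => $offset, length => 0 };
        }
        $index{$id}{length} += length $line if defined $id;
        $offset = tell $fh;               # byte offset of the next line
    }
    close $fh;
    return \%index;
}

# Read one record straight from the original file using offset + length.
sub fetch_record {
    my ($path, $entry) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;
    seek $fh, $entry->{offset}, 0 or die "Cannot seek in $path: $!";
    read $fh, my $record, $entry->{length};
    close $fh;
    return $record;
}

# Small demonstration on a temporary two-record file.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} ">seq1 first\nACGT\nACGT\n>seq2 second\nGGCC\n";
close $tmp_fh;

my $index = index_fasta($tmp_path);
print fetch_record($tmp_path, $index->{seq2});
```

Because only the identifier, offset and length are stored, the same table layout could be kept in any backend, which is the point of separating the schema from the storage engine.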

> On the immediate practical side, I think I can edit the
> current OBDA website of http://obda.open-bio.org/
> via /home/websites/obda.open-bio.org/html on the
> server.

See below for my thoughts on the wiki.

> We need to work out where the current OBDA indexing
> specification lives (CVS or SVN?) and perhaps move
> that to github. We may need a general OBF organisation
> account on git hub for this and any other cross-project
> repositories.

+1 to a move to GitHub, but maybe this belongs in an OBF-specific organization.  And maybe we should take advantage of the simple wiki or project homepage that GitHub offers and move everything (docs) there.

> I see there is already an OBDA project on RedMine,
> (Chris can you add me to that please?)
> https://redmine.open-bio.org/projects/obda
> 
> Peter

Done (last night actually, but I didn't have time to respond immediately).

chris


From David.Messina at sbc.su.se  Mon Nov 14 14:31:18 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 14 Nov 2011 20:31:18 +0100
Subject: [Bioperl-l] How to get Remote BLAST results in a single	out
In-Reply-To: <5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>
References: <CADm9iynj0NcWOtE4asYimfFBhbTYAFrROLj9qccmoH6o=yKVrg@mail.gmail.com>
	<8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com>
	<5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>
Message-ID: <29C56604-BBEE-4D80-9662-7C3627907200@sbc.su.se>


> Re: a BLAST+ equivalent for blastcl3, I believe there is an option for the various 'blast*' indicating the search is to use a remote database.  I haven't used it, though...

Yes, it's the -remote option. I've used it, and it works great.

NCBI throttles the speed, however, so for an appreciable number of queries the standard advice applies: run the search on your own computers.
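
For a multi-sequence FASTA query, a remote BLAST+ search writes every report to the one file given with -out (BLAST+ spells its flags with a single dash, so the option is -remote). A minimal sketch of driving it from Perl follows. It assumes the BLAST+ binaries are installed and on the PATH; the file names are placeholders and remote_blast_cmd is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: build the argument list for a remote BLAST+ search.
sub remote_blast_cmd {
    my (%arg) = @_;
    return (
        $arg{program},                 # e.g. blastp or blastn
        '-query',  $arg{query},        # FASTA file; may hold many sequences
        '-db',     $arg{db},           # database name on the NCBI servers
        '-evalue', $arg{evalue},
        '-out',    $arg{out},          # single file receiving every report
        '-remote',                     # run the search at NCBI
    );
}

my @cmd = remote_blast_cmd(
    program => 'blastp',
    query   => 'queries.fasta',        # placeholder file name
    db      => 'nr',
    evalue  => '1e-5',
    out     => 'all_reports.out',      # placeholder file name
);
print join(' ', @cmd), "\n";

# Uncomment to actually run the search (needs BLAST+ and network access):
# system(@cmd) == 0 or die "BLAST+ failed: $?";
```

Passing the command as a list to system avoids shell quoting problems if a file name contains spaces.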


Dave

> 


From jluis.lavin at unavarra.es  Mon Nov 14 16:23:31 2011
From: jluis.lavin at unavarra.es (José Luis Lavín)
Date: Mon, 14 Nov 2011 22:23:31 +0100
Subject: [Bioperl-l] How to get Remote BLAST results in a single out
In-Reply-To: <5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>
References: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com>
	<5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>
Message-ID: <CADm9iy=JcWtUp-KvazA=go2V_VMR7N8D92cHCMe5Rg5kzWmZKQ@mail.gmail.com>

Thank you very much for your answers, but judging from them, I'm afraid I
didn't explain myself well enough.

 I'm not looking for another tool to perform a BLAST task. I was just
wondering if there was a way to simply change the way the module writes the
outputs, so that I can get multiple searches in a single report file
instead of having a report for each BLAST search.
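
One way to get a single report without leaving the module, sketched under assumptions: keep letting save_output write each per-query file as the posted script already does, then append that file to one combined report. append_report is a hypothetical helper written in core Perl, the file names are placeholders, and this has not been tested against a live RemoteBlast run.

```perl
use strict;
use warnings;

# Hypothetical helper: append the contents of one report file to a combined file.
sub append_report {
    my ($report_path, $combined_path) = @_;
    open my $in,  '<',  $report_path   or die "Cannot read $report_path: $!";
    open my $out, '>>', $combined_path or die "Cannot append to $combined_path: $!";
    print {$out} $_ while <$in>;
    close $in;
    close $out or die "Cannot close $combined_path: $!";
    return;
}

# In the retrieval loop of the posted script this would sit right after
#   $remoteBlast->save_output($filename);
# for example:
#   append_report($filename, 'all_queries.out');   # placeholder name
#   unlink $filename;                              # optional: drop the per-query file
```

Opening the combined file in append mode on every call keeps the helper stateless, at the cost of one extra open per query.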

Maybe there is some issue I'm unaware of that makes you recommend other
tools instead of the BioPerl RemoteBlast module. If so, I would appreciate
it if you let me know about it (NCBI server problems with web services,
or the like).

Thank you for your answers anyway

Best wishes

2011/11/14 Fields, Christopher J <cjfields at illinois.edu>

> Re: a BLAST+ equivalent for blastcl3, I believe there is an option for the
> various 'blast*' indicating the search is to use a remote database.  I
> haven't used it, though...
>
> chris
>
> On Nov 14, 2011, at 8:07 AM, Jason Stajich wrote:
>
> > Please keep these discussions on the list.
> >
> > Sent from my iPhone-please excuse typos
> >
> > --
> > Jason Stajich


-- 
-- 
Dr. José Luis Lavín Trueba

Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN


From jason.stajich at gmail.com  Mon Nov 14 22:53:19 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Mon, 14 Nov 2011 22:53:19 -0500
Subject: [Bioperl-l] Suggestion for Bio::PopGen::Statistics
In-Reply-To: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu>
References: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu>
Message-ID: <0A6DF9A2-F34F-4277-8E84-C3E5351BB3FF@gmail.com>

Sure. As you say, the implementation presumed that more than 3 individuals would be passed to this method, which is a shortcoming.  I have committed the code fix but still need someone to add a comment to the perldoc. I've made it a Redmine bug.

https://redmine.open-bio.org/issues/3313

Jason

Can you provide a test script and we'll add a test for this so 
On Nov 13, 2011, at 9:42 AM, Cheng-Ruei Lee wrote:

> Hi all,
> 
>    Bioperl version: 1.006
>    Here are two error messages when I'm using this module to calculate Fu & Li's statistics:
> Illegal division by zero at (the Statistics.pm file) line 359
> Illegal division by zero at (the Statistics.pm file) line 376
>    A further tracking down shows that the first error happens when $n (sample size in the ingroup) equals 1 or 2, and the second error happens when $n equals 3. This is not really a bug though. I would suggest either in the original code, do a checking before the calculation (and skip the current calculation when $n == 1, 2, or 3 - rather than let the whole program die), or add a few lines of notes in the CPAN page.
> 
> Sincerely,
> Cheng-Ruei Lee
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cchehoud at gmail.com  Mon Nov 14 20:39:32 2011
From: cchehoud at gmail.com (Christel Chehoud)
Date: Mon, 14 Nov 2011 17:39:32 -0800
Subject: [Bioperl-l] Bioperl installation help
Message-ID: <CAO28iBQV1iAFpUKwsZ2g-JaDo8Dz3idx=w8DgFu7A1TBOHoDhg@mail.gmail.com>

Dear BioPerl,
Thank you for creating such useful code. Unfortunately, every time I
try to install BioPerl, it takes a long time and is a challenging
ordeal :( I am a new Mac user and was not able to install BioPerl
using CPAN. Here is the error I am getting:

ERROR: Can't create '/usr/local/bin'
Do not have write permissions on '/usr/local/bin'
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 at /Users/Christel/.cpan/build/BioPerl-1.6.0-DiZ0rN/Bio/Root/Build.pm line 902
  CJFIELDS/BioPerl-1.6.0.tar.gz
  ./Build install  -- NOT OK
----
  You may have to su to root to install the package
  (Or you may want to run something like
    o conf make_install_make_command 'sudo make'
  to raise your permissions.Warning (usually harmless): 'YAML' not
installed, will not store persistent state
Failed during this command:
 CMUNGALL/Data-Stag-0.11.tar.gz               : make NO
 CJFIELDS/BioPerl-1.6.0.tar.gz                : make_test FAILED but
failure ignored because 'force' in effect


so I did:
cpan> o conf make_install_make_command 'sudo make'
followed by
cpan> o conf commit

and started over. I got the same number of errors as last time (so I
decided not to force the install this time). Do you have any suggestions?

63 tests and 305 subtests skipped.
Failed 11/329 test scripts. 981/17708 subtests failed.
Files=329, Tests=17708, 127 wallclock secs (104.69 cusr + 12.51 csys =
117.20 CPU)
Failed 11/329 test programs. 981/17708 subtests failed.
  CJFIELDS/BioPerl-1.6.1.tar.gz
  ./Build test -- NOT OK
//hint// to see the cpan-testers results for installing this module, try:
  reports CJFIELDS/BioPerl-1.6.1.tar.gz
Warning (usually harmless): 'YAML' not installed, will not store
persistent state
Running Build install
  make test had returned bad status, won't install without force
Failed during this command:
 CMUNGALL/Data-Stag-0.11.tar.gz               : make NO
 FLORA/ExtUtils-Manifest-1.60.tar.gz          : make NO
 CJFIELDS/BioPerl-1.6.1.tar.gz                : make_test NO


Thanks a lot for your time and help.  I appreciate it.

Thank you,
Christel


From casaburi at ceinge.unina.it  Tue Nov 15 04:25:25 2011
From: casaburi at ceinge.unina.it (Giorgio C)
Date: Tue, 15 Nov 2011 01:25:25 -0800 (PST)
Subject: [Bioperl-l]  Blast > parsing result in Exel
Message-ID: <32846407.post@talk.nabble.com>


Hi everybody,

given a BLAST (-m 1) result file like this:

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Query= 132-291
(59 letters)

Database: Scrivania/orchidea/mature_mirBase.fa
21,643 sequences; 470,608 total letters

Searching..................................................done


Score E
Sequences producing significant alignments: (bits) Value

mtr-miR2644b MIMAT0013413 Medicago truncatula miR2644b 28 0.031
mtr-miR2644a MIMAT0013412 Medicago truncatula miR2644a 28 0.031
gga-miR-1704 MIMAT0007596 Gallus gallus miR-1704 22 1.9
gga-miR-1557 MIMAT0007414 Gallus gallus miR-1557 22 1.9
mmu-miR-880-5p MIMAT0017266 Mus musculus miR-880-5p 22 1.9

132_0 8 cagccgctcagattgatggtgcctacagccttgccagcccgctcagattgat 59
12631 5 .............. 18
12630 5 .............. 18
7826 5 ........... 15
7644 19 ........... 9
5394 3 ........... 13
5394 3 ........... 13
BLASTN 2.2.21 [Jun-14-2009]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
...
....
..........

______________________________________________________________
I need to parse in an Excel sheet:

1)ID 2)Name of the hit 3)E-value 4)Score 5)Species


1) 132-291 2)mir2644b 3) 0,031 4)28 5) Medicago truncatula


Is it possible to obtain, from a big BLAST result file, an Excel sheet with 5 columns
where each row holds the first hit of one BLAST result? Can anyone help me solve
this problem, perhaps with a little Perl script?


Thank you very much
-- 
View this message in context: http://old.nabble.com/Blast-%3E-parsing-result-in-Exel-tp32846407p32846407.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From nisa.dar10 at gmail.com  Tue Nov 15 19:49:00 2011
From: nisa.dar10 at gmail.com (nisa.dar)
Date: Tue, 15 Nov 2011 16:49:00 -0800 (PST)
Subject: [Bioperl-l]  print alignment from blast results file
Message-ID: <32851673.post@talk.nabble.com>


Hi,

I am parsing a blast results file. I have found bioperl modules to get query
string, homology string and hit string for each hit/hsp. I want to print
them in the form of an alignment instead of aligning them individually.

this is what I am doing, but it doesn't seem correct

while (my $hsp = $hit->next_hsp) {
    my $start_query_num = $hsp->start('query');
    my $query_string    = $hsp->query_string;
    my $end_query_num   = $hsp->end('query');
    my $homology_string = $hsp->homology_string;
    my $start_hit_num   = $hsp->start('hit');
    my $hit_string      = $hsp->hit_string;
    my $end_hit_num     = $hsp->end('hit');
    my $aln_o           = $hsp->get_aln;
    $query_string    =~ s/\n//g;    # get rid of newline characters
    $homology_string =~ s/\n//g;
    $hit_string      =~ s/\n//g;

    print "<h3>Alignment:</h3><br />";
    print "$start_query_num-$query_string-$end_query_num<br />";
    print "    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$homology_string<br />";
    print "$start_hit_num-$hit_string-$end_hit_num<br /><br />";
}

Please let me know how can I print them in the form of an alignment as seen
in the blast results file.

Thanks


-- 
View this message in context: http://old.nabble.com/print-alignment-from-blast-results-file-tp32851673p32851673.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From p.j.a.cock at googlemail.com  Wed Nov 16 04:11:40 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 16 Nov 2011 09:11:40 +0000
Subject: [Bioperl-l] Blast > parsing result in Exel
In-Reply-To: <32846407.post@talk.nabble.com>
References: <32846407.post@talk.nabble.com>
Message-ID: <CAKVJ-_5PTZttkHXS-FB-tOxhDRCty_qJH9PTurDWn2M5p3VzSw@mail.gmail.com>

On Tue, Nov 15, 2011 at 9:25 AM, Giorgio C <casaburi at ceinge.unina.it> wrote:
>
> Hi everybody,
>
> given a BLAST (-m 1) result file like this:
>
> ...
>
> I need to parse in an Excel sheet:
>
> 1)ID 2)Name of the hit 3)E-value 4)Score 5)Species
>
>
> 1) 132-291 2)mir2644b 3) 0,031 4)28 5) Medicago truncatula
>
> Is it possible to obtain, from a big BLAST result file, an Excel sheet with 5 columns
> where each row holds the first hit of one BLAST result? Can anyone help me solve
> this problem, perhaps with a little Perl script?
>
> Thank you very much

Have you looked at any of the BioPerl BLAST parsing examples? e.g
http://www.bioperl.org/wiki/HOWTO:Beginners#BLAST
http://www.bioperl.org/wiki/HOWTO:SearchIO
http://www.bioperl.org/wiki/Module:Bio::SearchIO

See also http://seqanswers.com/forums/showthread.php?t=15489

Peter


From bosborne11 at verizon.net  Wed Nov 16 08:19:33 2011
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 16 Nov 2011 08:19:33 -0500
Subject: [Bioperl-l] print alignment from blast results file
In-Reply-To: <32851673.post@talk.nabble.com>
References: <32851673.post@talk.nabble.com>
Message-ID: <035404FE-A7D9-413A-B6EF-48564FE76426@verizon.net>

Nisa,

See:

http://www.bioperl.org/wiki/HOWTO:SearchIO

Brian O.


On Nov 15, 2011, at 7:49 PM, nisa.dar wrote:

> 
> Hi,
> 
> I am parsing a blast results file. I have found bioperl modules to get query
> string, homology string and hit string for each hit/hsp. I want to print
> them in the form of an alignment instead of aligning them individually.
> 
> this is what I am doing, but it doesn't seem correct
> 
> while (my $hsp = $hit->next_hsp) {
>                                        my
> $start_query_num=$hsp->start('query');
> 					my $query_string=$hsp->query_string;
> 					my $end_query_num=$hsp->end('query');
> 					my $homology_string=$hsp->homology_string;
> 					my $start_hit_num=$hsp->start('hit');
> 					my $hit_string=$hsp->hit_string;
> 					my $end_hit_num=$hsp->end('hit');
> 					my $aln_o = $hsp->get_aln;
> 					$query_string=~s/\n//g;#get rid of new line characters
> 					$homology_string=~s/\n//g;
> 					$hit_string=~s/\n//g;
> 
>                         print "<h3>Alignment:</h3><br />";
> 			print "$start_query_num-$query_string-$end_query_num<br />";
> 			print "   
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$homology_string<br />";
> 			print "$start_hit_num-$hit_string-$end_hit_num<br /><br />";
> 
> 
> 
> }
> 
> Please let me know how can I print them in the form of an alignment as seen
> in the blast results file.
> 
> Thanks


From cjfields at illinois.edu  Wed Nov 16 11:44:27 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 16 Nov 2011 16:44:27 +0000
Subject: [Bioperl-l] Bioperl installation help
In-Reply-To: <CAO28iBQV1iAFpUKwsZ2g-JaDo8Dz3idx=w8DgFu7A1TBOHoDhg@mail.gmail.com>
References: <CAO28iBQV1iAFpUKwsZ2g-JaDo8Dz3idx=w8DgFu7A1TBOHoDhg@mail.gmail.com>
Message-ID: <72481F31-3ADB-4E3D-9DBC-714FBEC730E4@illinois.edu>

For some reason you are trying to install an older version of BioPerl; try installing Bio::Perl (or one of the core modules).  This should automatically install the latest version from CPAN.  My guess is this will address some of the issues.  However, w/o actually seeing what tests failed we can't help.

Also, if you are only interested in running local jobs, install BioPerl locally, or just grab the dist and add it to PERL5LIB.  There are instructions in the installation docs for that.  You can also use cpanm (cpanminus) to install things locally as well, it's highly recommended and much easier than cpan.
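After adjusting PERL5LIB (or installing with cpanm), it can help to confirm which BioPerl, if any, the running perl actually picks up. A minimal sketch using only core Perl follows; the helper name module_version is made up for illustration, and using Bio::Root::Version as the module that carries the distribution version is an assumption:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the version of a module if it can be loaded
# from the current @INC (which includes PERL5LIB), or undef if it cannot.
sub module_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Bio::Root::Version -> Bio/Root/Version.pm
    return undef unless eval { require $file; 1 };
    my $version = eval { $module->VERSION };
    return defined $version ? "$version" : 'unknown';
}

# Assumption: Bio::Root::Version holds the BioPerl distribution version.
my $bioperl_version = module_version('Bio::Root::Version');
print defined $bioperl_version
    ? "BioPerl found, version $bioperl_version\n"
    : "BioPerl not found in \@INC; check PERL5LIB\n";
```

If this reports an older release than the one just unpacked, an earlier directory in @INC is probably shadowing it.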

chris

On Nov 14, 2011, at 7:39 PM, Christel Chehoud wrote:

> Dear BioPerl,
> Thank you for creating such useful code. Unfortunately, every time I
> try to install Bioperl, it takes me a long time and is a challenging
> ordeal :( I am a new MAC user and was not able to download bioperl
> using CPAN. Here is the error I am getting:
> 
> ERROR: Can't create '/usr/local/bin'
> Do not have write permissions on '/usr/local/bin'
> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
> at /Users/Christel/.cpan/build/BioPerl-1.6.0-DiZ0rN/Bio/Root/Build.pm line 902
>  CJFIELDS/BioPerl-1.6.0.tar.gz
>  ./Build install  -- NOT OK
> ----
>  You may have to su to root to install the package
>  (Or you may want to run something like
>    o conf make_install_make_command 'sudo make'
>  to raise your permissions.Warning (usually harmless): 'YAML' not
> installed, will not store persistent state
> Failed during this command:
> CMUNGALL/Data-Stag-0.11.tar.gz               : make NO
> CJFIELDS/BioPerl-1.6.0.tar.gz                : make_test FAILED but
> failure ignored because 'force' in effect
> 
> 
> so I did:
> cpan> o conf make_install_make_command 'sudo make'
> followed by
> cpan> o conf commit
> 
> and started over..I got the same number of errors as last time (so I
> decided not to force install this time). do you have any suggestions:
> 
> 63 tests and 305 subtests skipped.
> Failed 11/329 test scripts. 981/17708 subtests failed.
> Files=329, Tests=17708, 127 wallclock secs (104.69 cusr + 12.51 csys =
> 117.20 CPU)
> Failed 11/329 test programs. 981/17708 subtests failed.
>  CJFIELDS/BioPerl-1.6.1.tar.gz
>  ./Build test -- NOT OK
> //hint// to see the cpan-testers results for installing this module, try:
>  reports CJFIELDS/BioPerl-1.6.1.tar.gz
> Warning (usually harmless): 'YAML' not installed, will not store
> persistent state
> Running Build install
>  make test had returned bad status, won't install without force
> Failed during this command:
> CMUNGALL/Data-Stag-0.11.tar.gz               : make NO
> FLORA/ExtUtils-Manifest-1.60.tar.gz          : make NO
> CJFIELDS/BioPerl-1.6.1.tar.gz                : make_test NO
> 
> 
> Thanks a lot for your time and help.  I appreciate it.
> 
> Thank you,
> Christel


From cjfields at illinois.edu  Wed Nov 16 11:46:16 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 16 Nov 2011 16:46:16 +0000
Subject: [Bioperl-l] print alignment from blast results file
In-Reply-To: <035404FE-A7D9-413A-B6EF-48564FE76426@verizon.net>
References: <32851673.post@talk.nabble.com>
	<035404FE-A7D9-413A-B6EF-48564FE76426@verizon.net>
Message-ID: <B7768538-08CE-40A0-8EB9-5EB5169C1072@illinois.edu>

small hint: you can get a Bio::AlignI from the HSP (which can be redirected to a Bio::AlignIO instance).
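If the goal is only to lay out the three strings the way a BLAST report does, a plain-Perl alternative is to cut them into fixed-width blocks and track the coordinates per block. The sketch below is an illustration, not BioPerl API: format_hsp is a hypothetical helper, its inputs are meant to be the values of $hsp->query_string, $hsp->homology_string, $hsp->hit_string and the two start positions, and it assumes plus-strand coordinates (positions only increase).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: lay out the three HSP strings in BLAST-style blocks.
# $q, $mid, $h are the gapped query, homology and hit strings (equal length);
# $qpos and $hpos are the start coordinates; $width is columns per block.
# Assumes plus-strand coordinates (positions only increase).
sub format_hsp {
    my ($q, $mid, $h, $qpos, $hpos, $width) = @_;
    $width ||= 60;
    my $out = '';
    for (my $i = 0; $i < length $q; $i += $width) {
        my $qs = substr $q,   $i, $width;
        my $ms = substr $mid, $i, $width;
        my $hs = substr $h,   $i, $width;
        # Gap characters ('-') do not consume a coordinate.
        my $qend = $qpos + ($qs =~ tr/-//c) - 1;
        my $hend = $hpos + ($hs =~ tr/-//c) - 1;
        $out .= sprintf "Query  %-6d %s  %d\n",   $qpos, $qs, $qend;
        $out .= sprintf "       %-6s %s\n",       '',    $ms;
        $out .= sprintf "Sbjct  %-6d %s  %d\n\n", $hpos, $hs, $hend;
        ($qpos, $hpos) = ($qend + 1, $hend + 1);
    }
    return $out;
}

# Toy input, 5 columns per block.
print format_hsp('ACGT-ACGTA', '|||| |||| ', 'ACGTTACGTC', 8, 5, 5);
```

For HTML output the result belongs inside <pre> tags, since the columns only line up in a monospaced font with whitespace preserved.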

chris

On Nov 16, 2011, at 7:19 AM, Brian Osborne wrote:

> Nisa,
> 
> See:
> 
> http://www.bioperl.org/wiki/HOWTO:SearchIO
> 
> Brian O.
> 
> 
> On Nov 15, 2011, at 7:49 PM, nisa.dar wrote:
> 
>> 
>> Hi,
>> 
>> I am parsing a blast results file. I have found bioperl modules to get query
>> string, homology string and hit string for each hit/hsp. I want to print
>> them in the form of an alignment instead of aligning them individually.
>> 
>> this is what I am doing, but it doesn't seem correct
>> 
>> while (my $hsp = $hit->next_hsp) {
>>                                       my
>> $start_query_num=$hsp->start('query');
>> 					my $query_string=$hsp->query_string;
>> 					my $end_query_num=$hsp->end('query');
>> 					my $homology_string=$hsp->homology_string;
>> 					my $start_hit_num=$hsp->start('hit');
>> 					my $hit_string=$hsp->hit_string;
>> 					my $end_hit_num=$hsp->end('hit');
>> 					my $aln_o = $hsp->get_aln;
>> 					$query_string=~s/\n//g;#get rid of new line characters
>> 					$homology_string=~s/\n//g;
>> 					$hit_string=~s/\n//g;
>> 
>>                        print "<h3>Alignment:</h3><br />";
>> 			print "$start_query_num-$query_string-$end_query_num<br />";
>> 			print "   
>> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$homology_string<br />";
>> 			print "$start_hit_num-$hit_string-$end_hit_num<br /><br />";
>> 
>> 
>> 
>> }
>> 
>> Please let me know how can I print them in the form of an alignment as seen
>> in the blast results file.
>> 
>> Thanks


From David.Messina at sbc.su.se  Wed Nov 16 12:01:49 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 16 Nov 2011 18:01:49 +0100
Subject: [Bioperl-l] Bioperl installation help
In-Reply-To: <CAO28iBQV1iAFpUKwsZ2g-JaDo8Dz3idx=w8DgFu7A1TBOHoDhg@mail.gmail.com>
References: <CAO28iBQV1iAFpUKwsZ2g-JaDo8Dz3idx=w8DgFu7A1TBOHoDhg@mail.gmail.com>
Message-ID: <CAM3TQQWDJ1_HPrAUguFfH5ngV42WeUOvXE6N2GktgmeTFs=ijw@mail.gmail.com>

Hi Christel,

Sorry to hear you're having trouble with the installation.

It looks like these modules aren't getting installed and are causing the
failed tests:
CMUNGALL/Data-Stag-0.11.tar.gz               : make NO
FLORA/ExtUtils-Manifest-1.60.tar.gz          : make NO

I would try installing those separately via CPAN first and then trying
again to install BioPerl.

Also, it was a good idea to set the make_install_make_command option to
CPAN, and that should have worked. Unfortunately, there's another
installation system called Module::Build that has its own option which may
need to be set:
cpan> o conf mbuild_install_build_command 'sudo ./Build'


That being said, I would suggest you grab the latest version of BioPerl
from github instead of using v1.6.1 from CPAN, which is fairly out of date
at this point.

And unless you're planning to use BioPerl with GBrowse or Bio::Graphics,
there's another, simpler way to get BioPerl up and running (assuming you
have all the prerequisites like Data::Stag installed):

See "Don't want to install BioPerl?" here:
http://www.seqxml.org/xml/BioPerl.html


Best,
Dave


On Tue, Nov 15, 2011 at 02:39, Christel Chehoud <cchehoud at gmail.com> wrote:

> Dear BioPerl,
> Thank you for creating such useful code. Unfortunately, every time I
> try to install Bioperl, it takes me a long time and is a challenging
> ordeal :( I am a new MAC user and was not able to download bioperl
> using CPAN. Here is the error I am getting:
>
> ERROR: Can't create '/usr/local/bin'
> Do not have write permissions on '/usr/local/bin'
> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>  at /Users/Christel/.cpan/build/BioPerl-1.6.0-DiZ0rN/Bio/Root/Build.pm
> line 902
>  CJFIELDS/BioPerl-1.6.0.tar.gz
>  ./Build install  -- NOT OK
> ----
>  You may have to su to root to install the package
>  (Or you may want to run something like
>    o conf make_install_make_command 'sudo make'
>  to raise your permissions.Warning (usually harmless): 'YAML' not
> installed, will not store persistent state
> Failed during this command:
>  CMUNGALL/Data-Stag-0.11.tar.gz               : make NO
>  CJFIELDS/BioPerl-1.6.0.tar.gz                : make_test FAILED but
> failure ignored because 'force' in effect
>
>
> so I did:
> cpan> o conf make_install_make_command 'sudo make'
> followed by
> cpan> o conf commit
>
> and started over..I got the same number of errors as last time (so I
> decided not to force install this time). do you have any suggestions:
>
> 63 tests and 305 subtests skipped.
> Failed 11/329 test scripts. 981/17708 subtests failed.
> Files=329, Tests=17708, 127 wallclock secs (104.69 cusr + 12.51 csys =
> 117.20 CPU)
> Failed 11/329 test programs. 981/17708 subtests failed.
>  CJFIELDS/BioPerl-1.6.1.tar.gz
>  ./Build test -- NOT OK
> //hint// to see the cpan-testers results for installing this module, try:
>  reports CJFIELDS/BioPerl-1.6.1.tar.gz
> Warning (usually harmless): 'YAML' not installed, will not store
> persistent state
> Running Build install
>  make test had returned bad status, won't install without force
> Failed during this command:
>  CMUNGALL/Data-Stag-0.11.tar.gz               : make NO
>  FLORA/ExtUtils-Manifest-1.60.tar.gz          : make NO
>  CJFIELDS/BioPerl-1.6.1.tar.gz                : make_test NO
>
>
> Thanks a lot for your time and help.  I appreciate it.
>
> Thank you,
> Christel


From jluis.lavin at unavarra.es  Wed Nov 16 13:31:46 2011
From: jluis.lavin at unavarra.es (José Luis Lavín)
Date: Wed, 16 Nov 2011 19:31:46 +0100
Subject: [Bioperl-l] How to get Remote BLAST results in a single out
In-Reply-To: <CALf8LpwFrv2jWMm35nTaC88atO6yrSbGza9j0TyTZTzBtxaCxw@mail.gmail.com>
References: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com>
	<CADm9iy=JcWtUp-KvazA=go2V_VMR7N8D92cHCMe5Rg5kzWmZKQ@mail.gmail.com>
	<CALf8LpwFrv2jWMm35nTaC88atO6yrSbGza9j0TyTZTzBtxaCxw@mail.gmail.com>
Message-ID: <CADm9iy=mMqHhWO5rTXbJS4ZG8aG-t0mAVHqN720tnyA7Hy_nkg@mail.gmail.com>

Thank you for your answer Jason,

While answering you I figured out how to do it...sometimes you need other
people's point of view to see the light.

As you pointed out:

"what is complicated is the file name right now is based on the query
name."

That is the part I hoped would have an easy fix: the output file name depends
on the query name, which is why I couldn't figure out how to change the name
of the output.

While reading the code to answer you, I came across the solution.

I was set on doing it this way because I need to run BLAST remotely
from my CGI; that's why I didn't pursue the other options you
suggested. Thank you all for your suggestions anyway.
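
The thread does not show the final fix. One way to follow the advice about opening the file outside the loop, sketched here under assumptions, is to keep writing per-query reports and then append them to a single file through one handle opened before the loop. The helper name combine_reports and the file names in the usage note are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: append the text of several per-query reports to one
# combined file. The output handle is opened once, before the loop, so the
# combined file name does not depend on any query name.
sub combine_reports {
    my ($combined, @reports) = @_;
    open my $out, '>', $combined or die "Cannot write $combined: $!";
    for my $report (@reports) {
        open my $in, '<', $report or die "Cannot read $report: $!";
        while (my $line = <$in>) {
            print {$out} $line;
        }
        close $in;
    }
    close $out or die "Cannot close $combined: $!";
    return scalar @reports;
}

# Command-line use: first argument is the combined file, the rest are inputs.
combine_reports(@ARGV) if @ARGV >= 2;
```

Saved under a name such as combine.pl, it could be run as "perl combine.pl all_reports.out query1.out query2.out"; the combined file should not also appear in the input list.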

;)

Best wishes

JL


On 16 November 2011 18:03, Jason Stajich <jason at bioperl.org> wrote:

> The answer to your question is to move the line that opens a file to
> outside the loop. What is complicated is that the file name right now is
> based on the query name, so you need to think about how you want to name the
> file. Since this isn't obvious to you, I think we are suggesting you
> probably need to understand programming more, and it might just be easier
> to use the tools as we have suggested rather than teaching you to modify
> what is just an example code.  our suggestions are based on the way we'd
> solve the problem so maybe you have other reasons for the direction you
> want to take.
>
> I also think it is not efficient or logical to run
> remote blast through the web protocol simply to write it back out with
> bioperl since that has to parse it in and then write it out -- why not just
> run the program that generates the output directly from NCBI. Or run BLAST
> locally for likely more efficient running.
>
>  Finally the bioperl writer may not 100% reproduce the blast output so if
> you are planning on further parsing the output that comes out from this
> script, it really doesn't seem like a good idea to launder it through
> bioperl parser first.
>
>
>
> 2011/11/14 José Luis Lavín <jluis.lavin at unavarra.es>
>
>> Thank you very much for your answers, but judging by them, I'm afraid I didn't
>> explain myself well enough.
>>
>>  I'm not looking for another tool to perform a BLAST task. I was just
>> wondering if there was a way to simply change how the module writes its
>> output, so that I can get multiple searches in a single report file
>> instead of having a report for each BLAST search.
>>
>> Maybe there's some issue I ignore, that makes you recommend the use of
>> other tools instead of the Bioperl Remote BLAST module...it would be
>> appreciated if you let me know about that (NCBI server problems with
>> web-services or so)...
>>
>> Thank you for your answers anyway
>>
>> Best wishes
>>
>> 2011/11/14 Fields, Christopher J <cjfields at illinois.edu>
>>
>> > Re: a BLAST+ equivalent for blastcl3, I believe there is an option for
>> the
>> > various 'blast*' indicating the search is to use a remote database.  I
>> > haven't used it, though...
>> >
>> > chris
>> >
>> > On Nov 14, 2011, at 8:07 AM, Jason Stajich wrote:
>> >
>> > > Please keep this on list discussions
>> > >
>> > > Sent from my iPhone-please excuse typos
>> > >
>> > > --
>> > > Jason Stajich
>> > >
>> > > Begin forwarded message:
>> > >
>> > >> From: José Luis Lavín <jluis.lavin at unavarra.es>
>> > >> Date: November 14, 2011 8:04:25 AM EST
>> > >> To: Jason Stajich <jason.stajich at gmail.com>
>> > >> Subject: Re: [Bioperl-l] Fwd: How to get Remote BLAST results in a
>> > single out
>> > >>
>> > >> Hello Jason,
>> > >>
>> > >> As answering your question:
>> > >>
>> > >> " If you want to do this within this code I guess the question is
>> what
>> > format you want the data in - a BLAST report or something more like a
>> > table?"
>> > >>
>> > >> A concatenation of BLAST (default format) reports should be OK,
>> since I
>> > have a script to parse that kind of results. Anyway formats 1 or 2 will
>> > also do the trick.
>> > >> I'll be happy to get assistance  on how to change the OUTFILE from "a
>> > query a report" to "all queries in the same report", because I don't
>> seem
>> > to be able to do it myself after reading the module documentation.
>> > >>
>> > >> Thanks in advance
>> > >>
>> > >> On 14 November 2011 12:59, Jason Stajich <jason.stajich at gmail.com> wrote:
>> > >> if you want to do a bunch of BLASTs remotely on the cmdline you
>> should
>> > also just use the NCBI's blastcl3 tool (not sure if there is a BLAST+
>> > equivalent). This might be faster to do and easier since you need to
>> learn
>> > the programming part too.
>> > >>
>> > >> If you want to do this within this code I guess the question is what
>> > format you want the data in - a BLAST report or something more like a
>> table?
>> > >>
>> > >> On Nov 14, 2011, at 6:14 AM, José Luis Lavín wrote:
>> > >>
>> > >>> Hello everybody,
>> > >>>
>> > >>> I've been using  "Bio::Tools::Run::RemoteBlast" for a time and it
>> has
>> > >>> worked fine for me. Now I need to perform a multiple BLAST search,
>> but
>> > this
>> > >>> time I'd just like to get all the BLAST results in a single out file
>> > >>> instead of having each sequence's report written individually. I've
>> > read
>> > >>> the documentation of the module, but due to my short
>> > >>> experience/understanding on complex modules as this one seems to be
>> I
>> > can't
>> > >>> figure out where to change the script to achieve my previously
>> > mentioned
>> > >>> aim.
>> > >>> Here I post the script I've been using (it's basically the one
>> posted
>> > on
>> > >>> the module cookbook).
>> > >>>
>> > >>> #!/c:/Perl -w
>> > >>> use Bio::Tools::Run::RemoteBlast;
>> > >>> use Bio::SearchIO;
>> > >>> use Data::Dumper;
>> > >>>
>> > >>> # Set the BLAST parameters (chomp strips the trailing newline from the input)
>> > >>> print "Enter your BLAST choice (blastn, blastp, blastx, tblastn, tblastx):\n";
>> > >>> chomp(my $prog = <STDIN>);
>> > >>> print "Enter a database to search (nr, refseq_protein, swissprot, pat, pdb, env_nr):\n";
>> > >>> chomp(my $db = <STDIN>);
>> > >>> print "Enter your cutoff score (1e-n):\n";
>> > >>> chomp(my $e_val = <STDIN>);
>> > >>>
>> > >>> my @params = ( '-prog' => $prog,
>> > >>>        '-data' => $db,
>> > >>>        '-expect' => $e_val,
>> > >>>        '-readmethod' => 'SearchIO' );
>> > >>>
>> > >>> my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params);
>> > >>>
>> > >>>
>> > >>> #Select the file and make the blast.
>> > >>> print "Enter your FASTA file:\n";
>> > >>> chomp(my $infile = <STDIN>);
>> > >>> my $r = $remoteBlast->submit_blast($infile);
>> > >>> my $v = 1;
>> > >>>
>> > >>>   print STDERR "waiting..." if( $v > 0 );  # wait for the results to return
>> > >>>   while ( my @rids = $remoteBlast->each_rid ) {
>> > >>>     foreach my $rid ( @rids ) {
>> > >>>       my $rc = $remoteBlast->retrieve_blast($rid);
>> > >>>       if( !ref($rc) ) {
>> > >>>         if( $rc < 0 ) {
>> > >>>           $remoteBlast->remove_rid($rid);
>> > >>>         }
>> > >>>         print STDERR "." if ( $v > 0 );
>> > >>>         sleep 5;
>> > >>>       } else {
>> > >>>         my $result = $rc->next_result();
>> > >>>         #save the output
>> > >>>         my $filename = $result->query_name() . ".out";
>> > >>>         # alternative considered: open SALIDA, '>>' . $^T . "Report.out";
>> > >>>         $remoteBlast->save_output($filename);
>> > >>>         $remoteBlast->remove_rid($rid);
>> > >>>         print "\nQuery Name: ", $result->query_name(), "\n";
>> > >>>         while ( my $hit = $result->next_hit ) {
>> > >>>           next unless ( $v > 0);
>> > >>>           print "\thit name is ", $hit->name, "\n";
>> > >>>           while( my $hsp = $hit->next_hsp ) {
>> > >>>             print "\t\tscore is ", $hsp->score, "\n";
>> > >>>           }
>> > >>>         }
>> > >>>       }
>> > >>>     }
>> > >>>   }
>> > >>>
>> > >>>
>> > >>> May any of you please explain me how to solve my question?
>> > >>>
>> > >>> Thanks in advance
>> > >>>
>> > >>> With best wishes
>> > >>>
>> > >>> --
>> > >>> --
>> > >>> Dr. José Luis Lavín Trueba
>> > >>>
>> > >>> Dpto. de Producción Agraria
>> > >>> Grupo de Genética y Microbiología
>> > >>> Universidad Pública de Navarra
>> > >>> 31006 Pamplona
>> > >>> Navarra
>> > >>> SPAIN
>> > >>
>> > >> --
>> > >> --
>> > >> Dr. José Luis Lavín Trueba
>> > >>
>> > >> Dpto. de Producción Agraria
>> > >> Grupo de Genética y Microbiología
>> > >> Universidad Pública de Navarra
>> > >> 31006 Pamplona
>> > >> Navarra
>> > >> SPAIN
>>
>> --
>> --
>> Dr. José Luis Lavín Trueba
>>
>> Dpto. de Producción Agraria
>> Grupo de Genética y Microbiología
>> Universidad Pública de Navarra
>> 31006 Pamplona
>> Navarra
>> SPAIN
>
>


-- 
-- 
Dr. José Luis Lavín Trueba

Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN


From l.m.timmermans at students.uu.nl  Fri Nov 18 09:15:47 2011
From: l.m.timmermans at students.uu.nl (L.M. Timmermans)
Date: Fri, 18 Nov 2011 15:15:47 +0100
Subject: [Bioperl-l] Blast > parsing result in Exel
In-Reply-To: <32846407.post@talk.nabble.com>
References: <32846407.post@talk.nabble.com>
Message-ID: <CAC1jpXC7uBtbHb_ixzMy2idvfeFQc1Y=d8Zi3xn_=0RyGYTzrA@mail.gmail.com>

On Tue, Nov 15, 2011 at 10:25 AM, Giorgio C <casaburi at ceinge.unina.it> wrote:

> I need to parse in an Excel sheet:
>

What you're saying here is nonsense. I think you meant to say you want to
output Excel.


> Is it possible to obtain, from a big BLAST result file, an Excel sheet with 5 columns
> where each row holds the first hit of one BLAST result? Can anyone help me solve
> this problem, perhaps with a little Perl script?
>

There are a number of Perl modules on CPAN for outputting Excel. Try
Excel::Writer::XLSX and Spreadsheet::WriteExcel for example.
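
As a rough starting point before reaching for a spreadsheet module, tab-separated text opens directly in Excel. The sketch below uses only core Perl and is not a replacement for a real BLAST parser such as Bio::SearchIO; first_hits is a hypothetical helper, and it assumes the miRBase-style hit lines shown in the question ("name accession Genus species ... score evalue").

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pull the first hit of every query out of a BLAST
# "-m 1" style report and return rows of
# [query ID, hit name, E-value, score, species].
# Assumes hit lines look like "name accession Genus species ... score evalue".
sub first_hits {
    my ($fh) = @_;
    my (@rows, $query, $want_hit);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^Query=\s*(\S+)/) {
            $query    = $1;
            $want_hit = 0;
        }
        elsif ($line =~ /^Sequences producing significant alignments/) {
            $want_hit = 1;    # the next non-blank line is the best hit
        }
        elsif ($want_hit && $line =~ /\S/) {
            if ($line =~ /^(\S+)\s+\S+\s+(\S+)\s+(\S+)\s.*\s(\S+)\s+(\S+)\s*$/) {
                push @rows, [ $query, $1, $5, $4, "$2 $3" ];
            }
            $want_hit = 0;
        }
    }
    return @rows;
}

# Toy report modelled on the excerpt in the question.
my $report = <<'END';
Query= 132-291
(59 letters)

Sequences producing significant alignments: (bits) Value

mtr-miR2644b MIMAT0013413 Medicago truncatula miR2644b 28 0.031
mtr-miR2644a MIMAT0013412 Medicago truncatula miR2644a 28 0.031
END

open my $fh, '<', \$report or die "Cannot read report: $!";
print join("\t", qw(ID Hit E-value Score Species)), "\n";
print join("\t", @$_), "\n" for first_hits($fh);
close $fh;
```

Queries without any hit simply produce no row here. For a native workbook, the rows could instead be handed to one of the modules named above.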

Leon


From tzhu at mail.bnu.edu.cn  Mon Nov 21 00:17:18 2011
From: tzhu at mail.bnu.edu.cn (Tao Zhu)
Date: Mon, 21 Nov 2011 13:17:18 +0800
Subject: [Bioperl-l] Is there a "combine" method that would combine several
 sequence alignments to a single alignment?
Message-ID: <4EC9DEDE.6030901@mail.bnu.edu.cn>

I can use the "slice" method to split a single sequence alignment into 
several subalignments. Then is there a corresponding "combine" method to 
combine such subalignments back?

-- 
Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing
100875, China
Email: tzhu at mail.bnu.edu.cn


From David.Messina at sbc.su.se  Mon Nov 21 04:58:51 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 21 Nov 2011 10:58:51 +0100
Subject: [Bioperl-l] Is there a "combine" method that would combine
 several sequence alignments to a single alignment?
In-Reply-To: <4EC9DEDE.6030901@mail.bnu.edu.cn>
References: <4EC9DEDE.6030901@mail.bnu.edu.cn>
Message-ID: <CAM3TQQVy1jPUXB+cZcEXEm9RQeBB=VsZ5aCFX9bO=WJSaq-VYg@mail.gmail.com>

Hi,

No, I don't believe such a method exists. Could you describe what you are
wanting to do? Perhaps there is another way to do it.


Dave


On Mon, Nov 21, 2011 at 06:17, Tao Zhu <tzhu at mail.bnu.edu.cn> wrote:

> I can use the "slice" method to split a single sequence alignment into
> several subalignments. Then is there a corresponding "combine" method to
> combine such subalignments back?
>
> --
> Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing
> 100875, China
> Email: tzhu at mail.bnu.edu.cn
>


From roy.chaudhuri at gmail.com  Mon Nov 21 06:41:09 2011
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Mon, 21 Nov 2011 11:41:09 +0000
Subject: [Bioperl-l] Is there a "combine" method that would combine
 several sequence alignments to a single alignment?
In-Reply-To: <CAM3TQQVy1jPUXB+cZcEXEm9RQeBB=VsZ5aCFX9bO=WJSaq-VYg@mail.gmail.com>
References: <4EC9DEDE.6030901@mail.bnu.edu.cn>
	<CAM3TQQVy1jPUXB+cZcEXEm9RQeBB=VsZ5aCFX9bO=WJSaq-VYg@mail.gmail.com>
Message-ID: <4ECA38D5.8050709@gmail.com>

See the cat method in Bio::Align::Utilities:

http://search.cpan.org/~cjfields/BioPerl-1.6.901/Bio/Align/Utilities.pm#cat

On 21/11/2011 09:58, Dave Messina wrote:
> Hi,
>
> No, I don't believe such a method exists. Could you describe what you are
> wanting to do? Perhaps there is another way to do it.
>
>
> Dave
>
>
>
> On Mon, Nov 21, 2011 at 06:17, Tao Zhu<tzhu at mail.bnu.edu.cn>  wrote:
>
>> I can use the "slice" method to split a single sequence alignment into
>> several subalignments. Then is there a corresponding "combine" method to
>> combine such subalignments back?
>>
>> --
>> Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing
>> 100875, China
>> Email: tzhu at mail.bnu.edu.cn
>>


From zntayl at gmail.com  Wed Nov 16 20:07:07 2011
From: zntayl at gmail.com (Nathan Taylor)
Date: Wed, 16 Nov 2011 20:07:07 -0500
Subject: [Bioperl-l] seqIO.pm
Message-ID: <CAB_EPVxHq0=HkB5MCVcAHzchtLZVJ8-c4mQz=yDkzVtvn2Gb_A@mail.gmail.com>

Hello,

   Can SeqIO.pm convert a file of fastq reads into .phd files? Or,
barring that, a file of fastas and file of quals into .phd files?

Many thanks,
Nathan


From gregonomic at yahoo.co.nz  Mon Nov 21 07:00:50 2011
From: gregonomic at yahoo.co.nz (Gregory Baillie)
Date: Mon, 21 Nov 2011 04:00:50 -0800 (PST)
Subject: [Bioperl-l] Is there a "combine" method that would combine
	several sequence alignments to a single alignment?
In-Reply-To: <CAM3TQQVy1jPUXB+cZcEXEm9RQeBB=VsZ5aCFX9bO=WJSaq-VYg@mail.gmail.com>
References: <4EC9DEDE.6030901@mail.bnu.edu.cn>
	<CAM3TQQVy1jPUXB+cZcEXEm9RQeBB=VsZ5aCFX9bO=WJSaq-VYg@mail.gmail.com>
Message-ID: <1321876850.71978.YahooMailNeo@web112508.mail.gq1.yahoo.com>

Hi.

I've attached a simple script (concatenate_alignments.pl) I wrote to concatenate alignments.

It can be a bit of a memory hog if you have long alignments or large numbers of sequences; otherwise you should be OK.

Usage:
concatenate_alignments.pl -o <output_alignment> <input_alignment_1> <input_alignment_2> <... input_alignment_n>


If you want to insert a string between the concatenated sequences, you can use the -j option (eg. -j '---').

Greg.


________________________________
 From: Dave Messina <David.Messina at sbc.su.se>
To: Tao Zhu <tzhu at mail.bnu.edu.cn> 
Cc: BioPerl <bioperl-l at lists.open-bio.org> 
Sent: Monday, 21 November 2011 7:58 PM
Subject: Re: [Bioperl-l] Is there a "combine" method that would combine several sequence alignments to a single alignment?
 
Hi,

No, I don't believe such a method exists. Could you describe what you are
wanting to do? Perhaps there is another way to do it.


Dave


On Mon, Nov 21, 2011 at 06:17, Tao Zhu <tzhu at mail.bnu.edu.cn> wrote:

> I can use the "slice" method to split a single sequence alignment into
> several subalignments. Then is there a corresponding "combine" method to
> combine such subalignments back?
>
> --
> Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing
> 100875, China
> Email: tzhu at mail.bnu.edu.cn
>
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l
-------------- next part --------------
A non-text attachment was scrubbed...
Name: concatenate_alignments.pl
Type: application/octet-stream
Size: 3349 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20111121/aa673dba/attachment-0003.obj>

From jason.stajich at gmail.com  Mon Nov 21 10:31:50 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Mon, 21 Nov 2011 10:31:50 -0500
Subject: [Bioperl-l] Is there a "combine" method that would combine
	several sequence alignments to a single alignment?
In-Reply-To: <1321876850.71978.YahooMailNeo@web112508.mail.gq1.yahoo.com>
References: <4EC9DEDE.6030901@mail.bnu.edu.cn>
	<CAM3TQQVy1jPUXB+cZcEXEm9RQeBB=VsZ5aCFX9bO=WJSaq-VYg@mail.gmail.com>
	<1321876850.71978.YahooMailNeo@web112508.mail.gq1.yahoo.com>
Message-ID: <39ECA743-8C56-4B23-8813-40EEEAB7DBB1@gmail.com>

greg  -- looks good - you could simplify part of the code to use the .= operator and use AlignIO to write the seqs out.

This is my script to combine a directory of MSA aligned .fasaln files into a single concatenated alignment.

https://github.com/hyphaltip/genome-scripts/blob/master/phylogenetics/combine_fasaln.pl

On Nov 21, 2011, at 7:00 AM, Gregory Baillie wrote:

> Hi.
> 
> I've attached a simple script (concatenate_alignments.pl) I wrote to concatenate alignments.
> 
> It can be a bit of a memory hog if you have long alignments or large numbers of sequences; otherwise you should be OK.
> 
> Usage:
> concatenate_alignments.pl -o <output_alignment> <input_alignment_1> <input_alignment_2> <... input_alignment_n>
> 
> 
> If you want to insert a string between the concatenated sequences, you can use the -j option (eg. -j '---').
> 
> Greg.
> 
> 
> ________________________________
> From: Dave Messina <David.Messina at sbc.su.se>
> To: Tao Zhu <tzhu at mail.bnu.edu.cn> 
> Cc: BioPerl <bioperl-l at lists.open-bio.org> 
> Sent: Monday, 21 November 2011 7:58 PM
> Subject: Re: [Bioperl-l] Is there a "combine" method that would combine several sequence alignments to a single alignment?
> 
> Hi,
> 
> No, I don't believe such a method exists. Could you describe what you are
> wanting to do? Perhaps there is another way to do it.
> 
> 
> Dave
> 
> 
> 
> On Mon, Nov 21, 2011 at 06:17, Tao Zhu <tzhu at mail.bnu.edu.cn> wrote:
> 
>> I can use the "slice" method to split a single sequence alignment into
>> several subalignments. Then is there a corresponding "combine" method to
>> combine such subalignments back?
>> 
>> --
>> Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing
>> 100875, China
>> Email: tzhu at mail.bnu.edu.cn
>> 


From p.j.a.cock at googlemail.com  Mon Nov 21 11:15:13 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 21 Nov 2011 16:15:13 +0000
Subject: [Bioperl-l] seqIO.pm
In-Reply-To: <CAB_EPVxHq0=HkB5MCVcAHzchtLZVJ8-c4mQz=yDkzVtvn2Gb_A@mail.gmail.com>
References: <CAB_EPVxHq0=HkB5MCVcAHzchtLZVJ8-c4mQz=yDkzVtvn2Gb_A@mail.gmail.com>
Message-ID: <CAKVJ-_6wWxCGKizRZuum5FFYOtvwOqqTF9D8YzGJY5RqKhW17g@mail.gmail.com>

On Thu, Nov 17, 2011 at 1:07 AM, Nathan Taylor <zntayl at gmail.com> wrote:
> Hello,
>
>   Can SeqIO.pm convert a file of fastq reads into .phd files? Or,
> barring that, a file of fastas and file of quals into .phd files?
>
> Many thanks,
> Nathan

In principle that is possible (e.g. Biopython can do fastq to phd).
Have you tried using BioPerl's SeqIO to do this? Was there an
error message?

Peter


From cjfields at illinois.edu  Mon Nov 21 11:57:29 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 21 Nov 2011 16:57:29 +0000
Subject: [Bioperl-l] seqIO.pm
In-Reply-To: <CAKVJ-_6wWxCGKizRZuum5FFYOtvwOqqTF9D8YzGJY5RqKhW17g@mail.gmail.com>
References: <CAB_EPVxHq0=HkB5MCVcAHzchtLZVJ8-c4mQz=yDkzVtvn2Gb_A@mail.gmail.com>
	<CAKVJ-_6wWxCGKizRZuum5FFYOtvwOqqTF9D8YzGJY5RqKhW17g@mail.gmail.com>
Message-ID: <2E075A8F-92F9-4A04-9254-EF4C07793A7C@illinois.edu>

On Nov 21, 2011, at 10:15 AM, Peter Cock wrote:

> On Thu, Nov 17, 2011 at 1:07 AM, Nathan Taylor <zntayl at gmail.com> wrote:
>> Hello,
>> 
>>   Can SeqIO.pm convert a file of fastq reads into .phd files? Or,
>> barring that, a file of fastas and file of quals into .phd files?
>> 
>> Many thanks,
>> Nathan
> 
> In principle that is possible (e.g. Biopython can do fastq to phd).
> Have you tried using BioPerl's SeqIO to do this? Was there an
> error message?
> 
> Peter

This should be possible in either circumstance (FASTQ should be more straightforward); there is a Bio::SeqIO::phd module for this purpose.  Nathan, if you run into problems with that conversion, let us know.

chris


From rondonbio at yahoo.com.br  Mon Nov 21 12:31:21 2011
From: rondonbio at yahoo.com.br (Rondon Neto)
Date: Mon, 21 Nov 2011 09:31:21 -0800 (PST)
Subject: [Bioperl-l] seqIO.pm
In-Reply-To: <CAKVJ-_6wWxCGKizRZuum5FFYOtvwOqqTF9D8YzGJY5RqKhW17g@mail.gmail.com>
References: <CAB_EPVxHq0=HkB5MCVcAHzchtLZVJ8-c4mQz=yDkzVtvn2Gb_A@mail.gmail.com>
	<CAKVJ-_6wWxCGKizRZuum5FFYOtvwOqqTF9D8YzGJY5RqKhW17g@mail.gmail.com>
Message-ID: <1321896681.57079.YahooMailNeo@web130205.mail.mud.yahoo.com>

Hi! Try this script:

#!/usr/bin/perl
use warnings;
use strict;
use Bio::SeqIO;

if (!$ARGV[0]) { die "\n### USAGE::: perl $0 [file.fastq]\n"; }

my $fastq = $ARGV[0];

my $in = Bio::SeqIO->new( -file   => $fastq,
                          -format => 'fastq' );

my $out = Bio::SeqIO->new( -file   => ">out.phd",
                           -format => 'phd' );

while (my $seq = $in->next_seq()) {
    $out->write_seq($seq);
}

exit;


Best wishes,
Rondon, a Brazilian friend.


________________________________
 From: Peter Cock <p.j.a.cock at googlemail.com>
To: Nathan Taylor <zntayl at gmail.com> 
Cc: bioperl-l at bioperl.org 
Sent: Monday, 21 November 2011 14:15
Subject: Re: [Bioperl-l] seqIO.pm
 
On Thu, Nov 17, 2011 at 1:07 AM, Nathan Taylor <zntayl at gmail.com> wrote:
> Hello,
>
>   Can SeqIO.pm convert a file of fastq reads into .phd files? Or,
> barring that, a file of fastas and file of quals into .phd files?
>
> Many thanks,
> Nathan

In principle that is possible (e.g. Biopython can do fastq to phd).
Have you tried using BioPerl's SeqIO to do this? Was there an
error message?

Peter



From Russell.Smithies at agresearch.co.nz  Mon Nov 21 15:04:01 2011
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 22 Nov 2011 09:04:01 +1300
Subject: [Bioperl-l] seqIO.pm
In-Reply-To: <1321896681.57079.YahooMailNeo@web130205.mail.mud.yahoo.com>
References: <CAB_EPVxHq0=HkB5MCVcAHzchtLZVJ8-c4mQz=yDkzVtvn2Gb_A@mail.gmail.com>
	<CAKVJ-_6wWxCGKizRZuum5FFYOtvwOqqTF9D8YzGJY5RqKhW17g@mail.gmail.com>
	<1321896681.57079.YahooMailNeo@web130205.mail.mud.yahoo.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF340186CF1AB@exchsth.agresearch.co.nz>

Or you could use the builtin script bp_sreformat.pl

--Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Rondon Neto
> Sent: Tuesday, 22 November 2011 6:31 a.m.
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] seqIO.pm
> 
> Hi! Try this script:
> 
> #!/usr/bin/perl
> use warnings;
> use strict;
> use Bio::SeqIO;
> 
> if (!$ARGV[0]) { die "\n### USAGE::: perl $0 [file.fastq]\n"; }
> 
> my $fastq = $ARGV[0];
> 
> my $in = Bio::SeqIO->new( -file   => $fastq,
>                           -format => 'fastq' );
> 
> my $out = Bio::SeqIO->new( -file   => ">out.phd",
>                            -format => 'phd' );
> 
> while (my $seq = $in->next_seq()) {
>     $out->write_seq($seq);
> }
> 
> exit;
> 
> 
> Best wishes,
> Rondon, a Brazilian friend.
> 
> 
> 
> 
> 
> 
> ________________________________
>  From: Peter Cock <p.j.a.cock at googlemail.com>
> To: Nathan Taylor <zntayl at gmail.com>
> Cc: bioperl-l at bioperl.org
> Sent: Monday, 21 November 2011 14:15
> Subject: Re: [Bioperl-l] seqIO.pm
> 
> On Thu, Nov 17, 2011 at 1:07 AM, Nathan Taylor <zntayl at gmail.com> wrote:
> > Hello,
> >
> >   Can SeqIO.pm convert a file of fastq reads into .phd files? Or,
> > barring that, a file of fastas and file of quals into .phd files?
> >
> > Many thanks,
> > Nathan
> 
> In principle that is possible (e.g. Biopython can do fastq to phd).
> Have you tried using BioPerl's SeqIO to do this? Was there an error message?
> 
> Peter
> 
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From goodyearkl at gmail.com  Mon Nov 21 21:23:13 2011
From: goodyearkl at gmail.com (Kylie Goodyear)
Date: Mon, 21 Nov 2011 18:23:13 -0800 (PST)
Subject: [Bioperl-l] Fasta counting script?
Message-ID: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>

Hi,
This may seem like a stupid question but I am just learning bioperl
and I am trying to figure out how to get a count of all the characters
in my FASTA file. I managed to get the number of sequences using the
following. Is there a way to tell bioperl to count the characters?

#!/usr/bin/perl -w
#Defines perl modules
#Bio::Seq deal with sequences and their features
use Bio::Seq;
#Bio::SeqIO handles reading and parsing of sequences of many different
#formats
use Bio::SeqIO;
#Read FASTA file
$seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format
=> "fasta" );
#Count how many sequences are present in file
my $count=0;
while (my $seq_obj = $seqio_obj->next_seq) {
    $count++;
}
#Display the number of sequences present
print "There are $count sequences present.\n";


From David.Messina at sbc.su.se  Tue Nov 22 03:08:11 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 22 Nov 2011 09:08:11 +0100
Subject: [Bioperl-l] Fasta counting script?
In-Reply-To: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
Message-ID: <CAM3TQQXmB-QHRgp6JdNJFb9a--SAQ0CpjmFCtwhNC3JARThJsw@mail.gmail.com>

Hi Kylie,

You can use the length method for this.

my $seq_length = $seq_obj->length();

Have you taken a look at the beginner's HOWTO? There's a nice table of
sequence methods as well as lots of other good information in there.

http://www.bioperl.org/wiki/HOWTO:Beginners


Dave


On Tue, Nov 22, 2011 at 03:23, Kylie Goodyear <goodyearkl at gmail.com> wrote:

> Hi,
> This may seem like a stupid question but I am just learning bioperl
> and I am trying to figure out how to get a count of all the characters
> in my FASTA file. I managed to get the number of sequences using the
> following. Is there a way to tell bioperl to count the characters?
>
> #!/usr/bin/perl -w
> #Defines perl modules
> #Bio::Seq deal with sequences and their features
> use Bio::Seq;
> #Bio::SeqIO handles reading and parsing of sequences of many different
> #formats
> use Bio::SeqIO;
> #Read FASTA file
> $seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format
> => "fasta" );
> #Count how many sequences are present in file
> my $count=0;
> while (my $seq_obj = $seqio_obj->next_seq) {
>    $count++;
> }
> #Display the number of sequences present
> print "There are $count sequences present.\n";
>
>


From liam.elbourne at mq.edu.au  Mon Nov 21 23:11:12 2011
From: liam.elbourne at mq.edu.au (Liam Elbourne)
Date: Tue, 22 Nov 2011 15:11:12 +1100
Subject: [Bioperl-l] Fasta counting script?
In-Reply-To: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
Message-ID: <EEEBBE60-96CB-4458-A460-F154CCC7459D@mq.edu.au>

Hi Kylie,

I think the length() method is what you're after:

....
my $sequence_length = $seq_obj->length();

....

in your case. Have a look at:

HOWTO:SeqIO - BioPerl

and,

HOWTO:Beginners - BioPerl

for some more general stuff.


Regards,
Liam.


On 22/11/2011, at 1:23 PM, Kylie Goodyear wrote:

> Hi,
> This may seem like a stupid question but I am just learning bioperl
> and I am trying to figure out how to get a count of all the characters
> in my FASTA file. I managed to get the number of sequences using the
> following. Is there a way to tell bioperl to count the characters?
> 
> #!/usr/bin/perl -w
> #Defines perl modules
> #Bio::Seq deal with sequences and their features
> use Bio::Seq;
> #Bio::SeqIO handles reading and parsing of sequences of many different
> #formats
> use Bio::SeqIO;
> #Read FASTA file
> $seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format
> => "fasta" );
> #Count how many sequences are present in file
> my $count=0;
> while (my $seq_obj = $seqio_obj->next_seq) {
>    $count++;
> }
> #Display the number of sequences present
> print "There are $count sequences present.\n";
> 
> 


From goodyearkl at gmail.com  Tue Nov 22 08:00:55 2011
From: goodyearkl at gmail.com (Kylie Goodyear)
Date: Tue, 22 Nov 2011 05:00:55 -0800 (PST)
Subject: [Bioperl-l] Fasta counting script?
In-Reply-To: <CAM3TQQXmB-QHRgp6JdNJFb9a--SAQ0CpjmFCtwhNC3JARThJsw@mail.gmail.com>
References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
	<CAM3TQQXmB-QHRgp6JdNJFb9a--SAQ0CpjmFCtwhNC3JARThJsw@mail.gmail.com>
Message-ID: <273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com>

Thank you for your help. It keeps telling me that it can't find
"length". Do you think it has to do with the way I am coding it?

#!/usr/bin/perl -w
#Defines perl modules

#Bio::Seq deal with sequences and their features
use Bio::Seq;

#Bio::SeqIO handles reading and parsing of sequences of many different
#formats
use Bio::SeqIO;


#Read FASTA file
$seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format
=> "fasta" );


#Count how many sequences are present in file
my $countseq=0;
while (my $seq_obj = $seqio_obj->next_seq, ) {
    $countseq++;
    }
#Display the number of sequences present
print "There are $countseq sequences present.\n";

#Count number of characters in file
my $seq_length = $seq_obj->length ;
print $seq_length


On Nov 22, 5:08 am, Dave Messina <David.Mess... at sbc.su.se> wrote:
> Hi Kylie,
>
> You can use the length method for this.
>
> my $seq_length = $seq_obj->length();
>
> Have you taken a look at the beginner's HOWTO? There's a nice table of
> sequence methods as well as lots of other good information in there.
>
> http://www.bioperl.org/wiki/HOWTO:Beginners
>
> Dave
>
> On Tue, Nov 22, 2011 at 03:23, Kylie Goodyear <goodyea... at gmail.com> wrote:
> > Hi,
> > This may seem like a stupid question but I am just learning bioperl
> > and I am trying to figure out how to get a count of all the characters
> > in my FASTA file. I managed to get the number of sequences using the
> > following. Is there a way to tell bioperl to count the characters?
>
> > #!/usr/bin/perl -w
> > #Defines perl modules
> > #Bio::Seq deal with sequences and their features
> > use Bio::Seq;
> > #Bio::SeqIO handles reading and parsing of sequences of many different
> > #formats
> > use Bio::SeqIO;
> > #Read FASTA file
> > $seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format
> > => "fasta" );
> > #Count how many sequences are present in file
> > my $count=0;
> > while (my $seq_obj = $seqio_obj->next_seq) {
> >    $count++;
> > }
> > #Display the number of sequences present
> > print "There are $count sequences present.\n";
>


From roy.chaudhuri at gmail.com  Tue Nov 22 10:50:31 2011
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Tue, 22 Nov 2011 15:50:31 +0000
Subject: [Bioperl-l] Fasta counting script?
In-Reply-To: <273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com>
References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
	<CAM3TQQXmB-QHRgp6JdNJFb9a--SAQ0CpjmFCtwhNC3JARThJsw@mail.gmail.com>
	<273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com>
Message-ID: <4ECBC4C7.10401@gmail.com>

Hi Kylie,

I suspect the error you get is actually "Can't call method length on an 
undefined value" (please in future report the exact text of any error 
messages). You declare $seq_obj with "my" in the while loop, but then 
try to access it outside of the loop. Try printing out the length of 
each $seq_obj within the while loop.

You should always include "use strict;" at the top of your program, that 
helps to catch errors like this.
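
A minimal sketch of that suggestion, with the length call moved inside the
loop and a running total kept for the whole file. The subroutine name
count_sequences_and_residues and the second print line are made up for
illustration; next_seq and length are the methods already used in this
thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Walk a sequence stream once, counting sequences and summing their lengths.
# $seq is declared with "my" in the while condition, so it only exists
# inside the loop body; that is why length() is called there.
sub count_sequences_and_residues {
    my ($seqio) = @_;
    my ($count, $total) = (0, 0);
    while (my $seq = $seqio->next_seq) {
        $count++;
        $total += $seq->length;
    }
    return ($count, $total);
}

# Command-line use (pass the FASTA file name as the first argument):
#   perl count_fasta.pl DNA_sequences.fasta
if (@ARGV) {
    require Bio::SeqIO;    # needs BioPerl installed
    my $seqio = Bio::SeqIO->new(-file => $ARGV[0], -format => 'fasta');
    my ($count, $total) = count_sequences_and_residues($seqio);
    print "There are $count sequences present.\n";
    print "They contain $total characters in total.\n";
}
```

With "use strict;" in place, referring to $seq after the closing brace of
the while loop is a compile-time error rather than a run-time surprise.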

Cheers,
Roy.

On 22/11/2011 13:00, Kylie Goodyear wrote:
> Thank you for your help. It keeps telling me that it can't find
> "length". Do you think it has to do with the way I am coding it?
>
> #!/usr/bin/perl -w
> #Defines perl modules
>
> #Bio::Seq deal with sequences and their features
> use Bio::Seq;
>
> #Bio::SeqIO handles reading and parsing of sequences of many different
> #formats
> use Bio::SeqIO;
>
>
> #Read FASTA file
> $seqio_obj = Bio::SeqIO->new(-file =>  "DNA_sequences.fasta", -format
> =>  "fasta" );
>
>
> #Count how many sequences are present in file
> my $countseq=0;
> while (my $seq_obj = $seqio_obj->next_seq, ) {
>      $countseq++;
>      }
> #Display the number of sequences present
> print "There are $countseq sequences present.\n";
>
> #Count number of characters in file
> my $seq_length = $seq_obj->length ;
> print $seq_length
>
>
> On Nov 22, 5:08 am, Dave Messina<David.Mess... at sbc.su.se>  wrote:
>> Hi Kylie,
>>
>> You can use the length method for this.
>>
>> my $seq_length = $seq_obj->length();
>>
>> Have you taken a look at the beginner's HOWTO? There's a nice table of
>> sequence methods as well as lots of other good information in there.
>>
>> http://www.bioperl.org/wiki/HOWTO:Beginners
>>
>> Dave
>>
>> On Tue, Nov 22, 2011 at 03:23, Kylie Goodyear<goodyea... at gmail.com>  wrote:
>>> Hi,
>>> This may seem like a stupid question but I am just learning bioperl
>>> and I am trying to figure out how to get a count of all the characters
>>> in my FASTA file. I managed to get the number of sequences using the
>>> following. Is there a way to tell bioperl to count the characters?
>>
>>> #!/usr/bin/perl -w
>>> #Defines perl modules
>>> #Bio::Seq deal with sequences and their features
>>> use Bio::Seq;
>>> #Bio::SeqIO handles reading and parsing of sequences of many different
>>> #formats
>>> use Bio::SeqIO;
>>> #Read FASTA file
>>> $seqio_obj = Bio::SeqIO->new(-file =>  "DNA_sequences.fasta", -format
>>> =>  "fasta" );
>>> #Count how many sequences are present in file
>>> my $count=0;
>>> while (my $seq_obj = $seqio_obj->next_seq) {
>>>     $count++;
>>> }
>>> #Display the number of sequences present
>>> print "There are $count sequences present.\n";
>>


From cjfields at illinois.edu  Tue Nov 22 11:13:01 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 22 Nov 2011 16:13:01 +0000
Subject: [Bioperl-l] Fasta counting script?
In-Reply-To: <273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com>
References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
	<CAM3TQQXmB-QHRgp6JdNJFb9a--SAQ0CpjmFCtwhNC3JARThJsw@mail.gmail.com>
	<273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com>
Message-ID: <0507832F-51F3-4946-BDCD-6913BBE697BC@illinois.edu>

This sounds a little homework-y.  Sure this isn't for a class? :)

One clue (and a good thing to keep in mind): always 'use strict; use warnings;' with your scripts if you are new to perl.  Doing so would let you know there is a problem with the script the way it is written, specifically, the place where you are inquiring about the length.

chris

On Nov 22, 2011, at 7:00 AM, Kylie Goodyear wrote:

> Thank you for your help. It keeps telling me that it can't find
> "length". Do you think it has to do with the way I am coding it?
> 
> #!/usr/bin/perl -w
> #Defines perl modules
> 
> #Bio::Seq deal with sequences and their features
> use Bio::Seq;
> 
> #Bio::SeqIO handles reading and parsing of sequences of many different
> #formats
> use Bio::SeqIO;
> 
> 
> #Read FASTA file
> $seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format
> => "fasta" );
> 
> 
> #Count how many sequences are present in file
> my $countseq=0;
> while (my $seq_obj = $seqio_obj->next_seq, ) {
>    $countseq++;
>    }
> #Display the number of sequences present
> print "There are $countseq sequences present.\n";
> 
> #Count number of characters in file
> my $seq_length = $seq_obj->length ;
> print $seq_length
> 
> 
> On Nov 22, 5:08 am, Dave Messina <David.Mess... at sbc.su.se> wrote:
>> Hi Kylie,
>> 
>> You can use the length method for this.
>> 
>> my $seq_length = $seq_obj->length();
>> 
>> Have you taken a look at the beginner's HOWTO? There's a nice table of
>> sequence methods as well as lots of other good information in there.
>> 
>> http://www.bioperl.org/wiki/HOWTO:Beginners
>> 
>> Dave
>> 
>> On Tue, Nov 22, 2011 at 03:23, Kylie Goodyear <goodyea... at gmail.com> wrote:
>>> Hi,
>>> This may seem like a stupid question but I am just learning bioperl
>>> and I am trying to figure out how to get a count of all the characters
>>> in my FASTA file. I managed to get the number of sequences using the
>>> following. Is there a way to tell bioperl to count the characters?
>> 
>>> #!/usr/bin/perl -w
>>> #Defines perl modules
>>> #Bio::Seq deal with sequences and their features
>>> use Bio::Seq;
>>> #Bio::SeqIO handles reading and parsing of sequences of many different
>>> #formats
>>> use Bio::SeqIO;
>>> #Read FASTA file
>>> $seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format
>>> => "fasta" );
>>> #Count how many sequences are present in file
>>> my $count=0;
>>> while (my $seq_obj = $seqio_obj->next_seq) {
>>>    $count++;
>>> }
>>> #Display the number of sequences present
>>> print "There are $count sequences present.\n";
>> 


From Russell.Smithies at agresearch.co.nz  Tue Nov 22 15:47:36 2011
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 23 Nov 2011 09:47:36 +1300
Subject: [Bioperl-l] Fasta counting script?
In-Reply-To: <0507832F-51F3-4946-BDCD-6913BBE697BC@illinois.edu>
References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
	<CAM3TQQXmB-QHRgp6JdNJFb9a--SAQ0CpjmFCtwhNC3JARThJsw@mail.gmail.com>
	<273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com>
	<0507832F-51F3-4946-BDCD-6913BBE697BC@illinois.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF340186CF1B8@exchsth.agresearch.co.nz>

Or again, you could use the builtin scripts bp_seq_length.pl or bp_gccalc.pl
As previous posters have hinted, RTFM - the answers are all in there!

--Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Fields, Christopher J
> Sent: Wednesday, 23 November 2011 5:13 a.m.
> To: Kylie Goodyear
> Cc: <bioperl-l at bioperl.org>
> Subject: Re: [Bioperl-l] Fasta counting script?
> 
> This sounds a little homework-y.  Sure this isn't for a class? :)
> 
> One clue (and a good thing to keep in mind): always 'use strict; use warnings;'
> with your scripts if you are new to perl.  Doing so would let you know there is
> a problem with the script the way it is written, specifically, the place where
> you are inquiring about the length.
> 
> chris
> 
> On Nov 22, 2011, at 7:00 AM, Kylie Goodyear wrote:
> 
> > Thank you for your help. It keeps telling me that it can't find
> > "length". Do you think it has to do with the way I am coding it?
> >
> > #!/usr/bin/perl -w
> > #Defines perl modules
> >
> > #Bio::Seq deal with sequences and their features
> > use Bio::Seq;
> >
> > #Bio::SeqIO handles reading and parsing of sequences of many different
> > #formats
> > use Bio::SeqIO;
> >
> >
> > #Read FASTA file
> > $seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format
> > => "fasta" );
> >
> >
> > #Count how many sequences are present in file
> > my $countseq=0;
> > while (my $seq_obj = $seqio_obj->next_seq, ) {
> >    $countseq++;
> >    }
> > #Display the number of sequences present
> > print "There are $countseq sequences present.\n";
> >
> > #Count number of characters in file
> > my $seq_length = $seq_obj->length ;
> > print $seq_length
> >
> >
> > On Nov 22, 5:08 am, Dave Messina <David.Mess... at sbc.su.se> wrote:
> >> Hi Kylie,
> >>
> >> You can use the length method for this.
> >>
> >> my $seq_length = $seq_obj->length();
> >>
> >> Have you taken a look at the beginner's HOWTO? There's a nice table
> >> of sequence methods as well as lots of other good information in there.
> >>
> >> http://www.bioperl.org/wiki/HOWTO:Beginners
> >>
> >> Dave
> >>
> >> On Tue, Nov 22, 2011 at 03:23, Kylie Goodyear <goodyea... at gmail.com>
> wrote:
> >>> Hi,
> >>> This may seem like a stupid question but I am just learning bioperl
> >>> and I am trying to figure out how to get a count of all the
> >>> characters in my FASTA file. I managed to get the number of sequences
> >>> using the following. Is there a way to tell bioperl to count the characters?
> >>
> >>> #!/usr/bin/perl -w
> >>> #Defines perl modules
> >>> #Bio::Seq deal with sequences and their features
> >>> use Bio::Seq;
> >>> #Bio::SeqIO handles reading and parsing of sequences of many different formats
> >>> use Bio::SeqIO;
> >>> #Read FASTA file
> >>> $seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta", -format => "fasta" );
> >>> #Count how many sequences are present in file
> >>> my $count=0;
> >>> while (my $seq_obj = $seqio_obj->next_seq) {
> >>>    $count++;
> >>> }
> >>> #Display the number of sequences present
> >>> print "There are $count sequences present.\n";
> >>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioper... at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioper... at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
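
For reference, a minimal sketch of how the script quoted above could be restructured. Under 'use strict; use warnings;' the variable $seq_obj only exists inside the while loop, so the length has to be accumulated there. The subroutine name count_fasta is made up for this illustration, and "characters" is taken to mean sequence residues (header lines are not counted):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: returns (number of sequences, total residues)
# for the FASTA file at $path.
sub count_fasta {
    my ($path) = @_;
    my $seqio = Bio::SeqIO->new(-file => $path, -format => 'fasta');

    my ($n_seqs, $n_chars) = (0, 0);
    while (my $seq_obj = $seqio->next_seq) {
        $n_seqs++;
        # $seq_obj is only in scope inside this loop,
        # so the length is summed here rather than afterwards.
        $n_chars += $seq_obj->length;
    }
    return ($n_seqs, $n_chars);
}

# Only run when a file name is given on the command line.
if (@ARGV) {
    my ($n_seqs, $n_chars) = count_fasta($ARGV[0]);
    print "There are $n_seqs sequences present.\n";
    print "They contain $n_chars characters in total.\n";
}
```

It would be called as, for example, 'perl count_fasta.pl DNA_sequences.fasta' (the script name is arbitrary).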
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From charles-listes+bioperl at plessy.org  Wed Nov 23 05:27:45 2011
From: charles-listes+bioperl at plessy.org (Charles Plessy)
Date: Wed, 23 Nov 2011 19:27:45 +0900
Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
Message-ID: <20111123102745.GC20168@merveille.plessy.net>

Dear BioPerl developers,

I am trying to process some unaligned paired-end reads with Bio::DB::Sam.  For
each pair, I want to detect a sequence index and a unique molecular identifier in
the linker, record them as auxiliary flags, and trim the linker from the read.

I collect the pairs through a features iterator, and can access all their data
through the high-level Bio::DB::Bam::Alignment API.  After modifying them
(linker trimming and adding flags), I want to write the resulting pairs as a
new unaligned BAM file.

I apologise if the solution is trivial, but my problem is that I cannot manage to
modify the Bio::DB::Bam::Alignment objects.  Typically, attempts such as
'$pair[0]->qseq("GATACA")' give errors like
'Usage: Bio::DB::Bam::Alignment::qseq(b) at /usr/lib/perl5/Bio/DB/Bam/AlignWrapper.pm line 80'.

Since I did not find explanations or portions of source code indicating how to
modify Bio::DB::Bam::Alignment objects, I wonder if they are read-only?

Have a nice day,

-- 
Charles Plessy
Tsurumi, Kanagawa, Japan


From MEC at stowers.org  Wed Nov 23 11:02:26 2011
From: MEC at stowers.org (Cook, Malcolm)
Date: Wed, 23 Nov 2011 10:02:26 -0600
Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
In-Reply-To: <20111123102745.GC20168@merveille.plessy.net>
References: <20111123102745.GC20168@merveille.plessy.net>
Message-ID: <2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org>

Charles,

I suggest you reconsider your approach: rather, use `samtools view` to pipe your reads to stdout in SAM format, then stream-edit the barcode and pipe the result back to samtools for conversion to a .bam file.

I know this is not what you're asking.  I'm pretty sure that the direct answer to your question is, "yes - they are read-only".
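
A rough sketch of that pipeline in Perl, under stated assumptions: the barcode is taken to be the first 6 bases of each read (a made-up layout, not Charles's actual linker design), it is recorded in the standard BC:Z: tag, and the helper name trim_sam_line is invented here. Only `samtools view` with its -h, -b, -S and -o options is used:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: move the first $n bases of a SAM record from
# SEQ/QUAL into a BC:Z: tag. Header lines pass through unchanged.
sub trim_sam_line {
    my ($line, $n) = @_;
    return $line if $line =~ /^\@/;            # SAM header line
    chomp(my $copy = $line);
    my @f = split /\t/, $copy;
    return $line if length($f[9]) <= $n;       # too short to trim (or '*')
    my $barcode = substr($f[9], 0, $n, '');    # cut the prefix out of SEQ
    substr($f[10], 0, $n, '') unless $f[10] eq '*';   # same for QUAL
    return join("\t", @f, "BC:Z:$barcode") . "\n";
}

# Usage: script.pl in.bam out.bam
if (@ARGV == 2) {
    my ($in_bam, $out_bam) = @ARGV;
    # Reader: BAM -> SAM text, including the header (-h).
    open my $in, '-|', 'samtools', 'view', '-h', $in_bam
        or die "cannot read from samtools: $!";
    # Writer: SAM text on stdin -> BAM (-b). -S marks the input as SAM
    # for old samtools versions and is ignored by newer ones.
    open my $out, '|-', 'samtools', 'view', '-b', '-S', '-o', $out_bam, '-'
        or die "cannot write to samtools: $!";
    while (my $line = <$in>) {
        print {$out} trim_sam_line($line, 6);
    }
    close $in  or die "samtools view (reader) failed\n";
    close $out or die "samtools view (writer) failed\n";
}
```

Keeping the per-record edit in its own subroutine means it can be checked on a single SAM line without needing samtools or a BAM file.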

~Malcolm


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Charles Plessy
> Sent: Wednesday, November 23, 2011 4:28 AM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
> 
> Dear BioPerl developers,
> 
> I am trying to process some unaligned paired-end reads with Bio::DB::Sam.
> For
> each pair, I want to detect a sequence index and a unique molecular
> identifier in
> the linker, record them as auxiliary flags, and trim the linker from the read.
> 
> I collect the pairs through a features iterator, and can access all their data
> through the high-level Bio::DB::Bam::Alignment API.  After modifying them
> (linker trimming and adding flags), I want to write the resulting pairs as a
> new unaligned BAM file.
> 
> I apologise if the solution is trivial, but my problem is that I do not manage to
> modify the Bio::DB::Bam::Alignment objects.  Typically, attempts such as
> ?$pair[0]->qseq("GATACA")? give errors like
> ?Usage: Bio::DB::Bam::Alignment::qseq(b) at
> /usr/lib/perl5/Bio/DB/Bam/AlignWrapper.pm line 80?.
> 
> Since I did not find explanations or portsions of source code indicating how to
> modify Bio::DB::Bam::Alignment objects, I wonder if they are read-only?
> 
> Have a nice day,
> 
> --
> Charles Plessy
> Tsurumi, Kanagawa, Japan
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Wed Nov 23 14:26:31 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 23 Nov 2011 19:26:31 +0000
Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
In-Reply-To: <2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org>
References: <20111123102745.GC20168@merveille.plessy.net>
	<2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org>
Message-ID: <CA5177C9-FF24-49D7-AFB3-1B79A742F2B4@illinois.edu>

According to the docs for the low-level API of Bio-Samtools, both read and write are allowed:

http://search.cpan.org/perldoc?Bio::DB::Sam#The_low-level_API

Using the low-level API for this purpose isn't documented as well, though (the high-level API is read only AFAICT).  

The error message is a standard one generated from the XS bindings where the passed argument isn't mapped correctly.  Looking through the Sam.xs file, qseq() is only prototyped as a reader; the only arg is a Bio::DB::Bam::Alignment (i.e. $self).  However, it appears there is a function specified for Bio::DB::Bam::Alignment named l_qseq() that might be the setter, whereas qseq() may be the getter (ignore the 'bama_' prefix):

....

int
bama_l_qseq(b,...)
    Bio::DB::Bam::Alignment b
PROTOTYPE: $;$
CODE:
    if (items > 1)
      b->core.l_qseq = SvIV(ST(1));
    RETVAL=b->core.l_qseq;
OUTPUT:
    RETVAL

SV*
bama_qseq(b)
Bio::DB::Bam::Alignment b
PROTOTYPE: $
PREINIT:
    char* seq;
    int   i;
CODE:
    seq = Newxz(seq,b->core.l_qseq+1,char);
    for (i=0;i<b->core.l_qseq;i++) {
      seq[i]=bam_nt16_rev_table[bam1_seqi(bam1_seq(b),i)];
    }
    RETVAL = newSVpv(seq,b->core.l_qseq);
    Safefree(seq);
OUTPUT:
    RETVAL
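
To illustrate where the "Usage:" message comes from: a function declared in XS with the single parameter b gets glue code that checks the argument count and croaks when it differs. A rough pure-Perl analogue follows; the package ReadOnlyAlignment is invented for this sketch and is not part of Bio-Samtools:

```perl
use strict;
use warnings;

package ReadOnlyAlignment;
use Carp qw(croak);

sub new {
    my ($class, %args) = @_;
    return bless { qseq => $args{qseq} }, $class;
}

# Getter only: like the generated XS glue, reject any extra argument
# instead of silently ignoring it.
sub qseq {
    croak 'Usage: ReadOnlyAlignment::qseq(b)' if @_ != 1;
    my ($self) = @_;
    return $self->{qseq};
}

package main;

my $aln = ReadOnlyAlignment->new(qseq => 'GATTACA');
print $aln->qseq, "\n";                         # reading works

my $ok = eval { $aln->qseq('GATACA'); 1 };      # "writing" croaks
print "setter call failed: $@" unless $ok;
```

In this analogue the second call fails for the same reason as '$pair[0]->qseq("GATACA")': the extra argument does not fit the one-parameter signature.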


-chris

On Nov 23, 2011, at 10:02 AM, Cook, Malcolm wrote:

> Charles,
> 
> I suggest you reconsider your approach to rather, use `samtools view` to pipe your reads to stdout in sam format, then stream edit the barcode and pipe it back to samtools for conversion back to .bam file.
> 
> I know this is not what you're asking.  I'm pretty sure that direct answer to your question is, "yes - they are read-only".
> 
> ~Malcolm
> 
> 
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Charles Plessy
>> Sent: Wednesday, November 23, 2011 4:28 AM
>> To: bioperl-l at bioperl.org
>> Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
>> 
>> Dear BioPerl developers,
>> 
>> I am trying to process some unaligned paired-end reads with Bio::DB::Sam.
>> For
>> each pair, I want to detect a sequence index and a unique molecular
>> identifier in
>> the linker, record them as auxiliary flags, and trim the linker from the read.
>> 
>> I collect the pairs through a features iterator, and can access all their data
>> through the high-level Bio::DB::Bam::Alignment API.  After modifying them
>> (linker trimming and adding flags), I want to write the resulting pairs as a
>> new unaligned BAM file.
>> 
>> I apologise if the solution is trivial, but my problem is that I do not manage to
>> modify the Bio::DB::Bam::Alignment objects.  Typically, attempts such as
>> ?$pair[0]->qseq("GATACA")? give errors like
>> ?Usage: Bio::DB::Bam::Alignment::qseq(b) at
>> /usr/lib/perl5/Bio/DB/Bam/AlignWrapper.pm line 80?.
>> 
>> Since I did not find explanations or portsions of source code indicating how to
>> modify Bio::DB::Bam::Alignment objects, I wonder if they are read-only?
>> 
>> Have a nice day,
>> 
>> --
>> Charles Plessy
>> Tsurumi, Kanagawa, Japan
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From lincoln.stein at gmail.com  Wed Nov 23 17:02:23 2011
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Thu, 24 Nov 2011 06:02:23 +0800
Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
In-Reply-To: <20111123102745.GC20168@merveille.plessy.net>
References: <20111123102745.GC20168@merveille.plessy.net>
Message-ID: <CAOS1dzwxY2Kt3_xkgnbCps_TYfnT3dGE9+gAirpBCeJoMT7YDg@mail.gmail.com>

I apologize that the qseq() method is only allowing read-only access. I
will attempt to fix this.

Lincoln

On Wed, Nov 23, 2011 at 6:27 PM, Charles Plessy <
charles-listes+bioperl at plessy.org> wrote:

> Dear BioPerl developers,
>
> I am trying to process some unaligned paired-end reads with Bio::DB::Sam.
>  For
> each pair, I want to detect a sequence index and a unique molecular
> identifier in
> the linker, record them as auxiliary flags, and trim the linker from the
> read.
>
> I collect the pairs through a features iterator, and can access all their
> data
> through the high-level Bio::DB::Bam::Alignment API.  After modifying them
> (linker trimming and adding flags), I want to write the resulting pairs as
> a
> new unaligned BAM file.
>
> I apologise if the solution is trivial, but my problem is that I do not
> manage to
> modify the Bio::DB::Bam::Alignment objects.  Typically, attempts such as
> ?$pair[0]->qseq("GATACA")? give errors like
> ?Usage: Bio::DB::Bam::Alignment::qseq(b) at
> /usr/lib/perl5/Bio/DB/Bam/AlignWrapper.pm line 80?.
>
> Since I did not find explanations or portsions of source code indicating
> how to
> modify Bio::DB::Bam::Alignment objects, I wonder if they are read-only?
>
> Have a nice day,
>
> --
> Charles Plessy
> Tsurumi, Kanagawa, Japan
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


-- 
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <Renata.Musa at oicr.on.ca>


From lincoln.stein at gmail.com  Wed Nov 23 17:05:41 2011
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Thu, 24 Nov 2011 06:05:41 +0800
Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
In-Reply-To: <CA5177C9-FF24-49D7-AFB3-1B79A742F2B4@illinois.edu>
References: <20111123102745.GC20168@merveille.plessy.net>
	<2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org>
	<CA5177C9-FF24-49D7-AFB3-1B79A742F2B4@illinois.edu>
Message-ID: <CAOS1dzwR050PGsomiSJZT+qns60XU8Smgy_WrotyxcOo+LmgNw@mail.gmail.com>

Unfortunately l_qseq reads/writes the length of the query sequence, not the
sequence itself.

Lincoln

On Thu, Nov 24, 2011 at 3:26 AM, Fields, Christopher J <
cjfields at illinois.edu> wrote:

> According to the docs the low-level API for Bio-Samtools, both read and
> write are allowed:
>
> http://search.cpan.org/perldoc?Bio::DB::Sam#The_low-level_API
>
> Using the low-level API for this purpose isn't documented as well, though
> (the high-level API is read only AFAICT).
>
> The error message is a standard one generated from the XS bindings where
> the passed argument passed isn't mapped correctly.  Looking through the
> Sam.xs file, qseq() is only prototyped as a reader; the only arg is a
> Bio::DB::Bam::Alignment (e.g. $self).  However, it appears there is a
> function specified for Bio::DB::Bam::Alignment names l_qseq() that might be
> the setter, wheras qseq() is maybe to be the getter (ignore the 'bama_'
> prefix):
>
> ....
>
> int
> bama_l_qseq(b,...)
>    Bio::DB::Bam::Alignment b
> PROTOTYPE: $;$
> CODE:
>    if (items > 1)
>      b->core.l_qseq = SvIV(ST(1));
>    RETVAL=b->core.l_qseq;
> OUTPUT:
>    RETVAL
>
> SV*
> bama_qseq(b)
> Bio::DB::Bam::Alignment b
> PROTOTYPE: $
> PREINIT:
>    char* seq;
>    int   i;
> CODE:
>    seq = Newxz(seq,b->core.l_qseq+1,char);
>    for (i=0;i<b->core.l_qseq;i++) {
>      seq[i]=bam_nt16_rev_table[bam1_seqi(bam1_seq(b),i)];
>    }
>    RETVAL = newSVpv(seq,b->core.l_qseq);
>    Safefree(seq);
> OUTPUT:
>    RETVAL
>
>
> -chris
>
> On Nov 23, 2011, at 10:02 AM, Cook, Malcolm wrote:
>
> > Charles,
> >
> > I suggest you reconsider your approach to rather, use `samtools view` to
> pipe your reads to stdout in sam format, then stream edit the barcode and
> pipe it back to samtools for conversion back to .bam file.
> >
> > I know this is not what you're asking.  I'm pretty sure that direct
> answer to your question is, "yes - they are read-only".
> >
> > ~Malcolm
> >
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >> bounces at lists.open-bio.org] On Behalf Of Charles Plessy
> >> Sent: Wednesday, November 23, 2011 4:28 AM
> >> To: bioperl-l at bioperl.org
> >> Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
> >>
> >> Dear BioPerl developers,
> >>
> >> I am trying to process some unaligned paired-end reads with
> Bio::DB::Sam.
> >> For
> >> each pair, I want to detect a sequence index and a unique molecular
> >> identifier in
> >> the linker, record them as auxiliary flags, and trim the linker from
> the read.
> >>
> >> I collect the pairs through a features iterator, and can access all
> their data
> >> through the high-level Bio::DB::Bam::Alignment API.  After modifying
> them
> >> (linker trimming and adding flags), I want to write the resulting pairs
> as a
> >> new unaligned BAM file.
> >>
> >> I apologise if the solution is trivial, but my problem is that I do not
> manage to
> >> modify the Bio::DB::Bam::Alignment objects.  Typically, attempts such as
> >> ?$pair[0]->qseq("GATACA")? give errors like
> >> ?Usage: Bio::DB::Bam::Alignment::qseq(b) at
> >> /usr/lib/perl5/Bio/DB/Bam/AlignWrapper.pm line 80?.
> >>
> >> Since I did not find explanations or portsions of source code
> indicating how to
> >> modify Bio::DB::Bam::Alignment objects, I wonder if they are read-only?
> >>
> >> Have a nice day,
> >>
> >> --
> >> Charles Plessy
> >> Tsurumi, Kanagawa, Japan
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <Renata.Musa at oicr.on.ca>


From cjfields at illinois.edu  Wed Nov 23 20:07:09 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 24 Nov 2011 01:07:09 +0000
Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
In-Reply-To: <CAOS1dzwR050PGsomiSJZT+qns60XU8Smgy_WrotyxcOo+LmgNw@mail.gmail.com>
References: <20111123102745.GC20168@merveille.plessy.net>
	<2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org>
	<CA5177C9-FF24-49D7-AFB3-1B79A742F2B4@illinois.edu>,
	<CAOS1dzwR050PGsomiSJZT+qns60XU8Smgy_WrotyxcOo+LmgNw@mail.gmail.com>
Message-ID: <92CA8F24-47CB-42AF-8C20-9C4765A592A5@illinois.edu>

Ah, okay, makes sense.  I thought it was oddly named. :)

Chris

Sent from my iPad

On Nov 23, 2011, at 4:05 PM, "Lincoln Stein" <lincoln.stein at gmail.com> wrote:

Unfortunately l_qseq read/writes the length of the query sequence, not the sequence itself.

Lincoln



From ross at cuhk.edu.hk  Sun Nov 27 03:24:43 2011
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Sun, 27 Nov 2011 16:24:43 +0800
Subject: [Bioperl-l] Check the location type for a particular gene in a
	Genbank file
In-Reply-To: <CAA7rn9dYGfeCS-qVccDcdwVenj8KFQs4i89EoT8_63gQk5b42g@mail.gmail.com>
References: <CAA7rn9dYGfeCS-qVccDcdwVenj8KFQs4i89EoT8_63gQk5b42g@mail.gmail.com>
Message-ID: <000101ccacde$0fc4ba90$2f4e2fb0$@edu.hk>

Hi all,

To write a script to extract sequence generically for all types of
Bio::Location objects, I'd like to know if there is any way to check which
type (e.g. simple or split) is being processed.

Bio::Location::CoordinatePolicyI appears to be doing something similar but
it is more like a post checking step. If I parse the genbank file line by
line, I can certainly check whether the line contains keywords like "join"
but as I'm using something like:

        my @features = grep { $_->primary_tag eq $chkTags[0] } $seqobj->get_SeqFeatures;

        foreach (@features) {

            $pseudo = $_->has_tag('pseudo') ? 'pseudo' : 'functional';

            @gene = ();   # empty list; [] would store one array reference

I'd appreciate if anybody knows a better integration with the well-developed
bioperl module.

Thanks a lot.


From Russell.Smithies at agresearch.co.nz  Sun Nov 27 19:46:05 2011
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 28 Nov 2011 13:46:05 +1300
Subject: [Bioperl-l] Galaxy tools?
Message-ID: <18DF7D20DFEC044098A1062202F5FFF340186CF1E3@exchsth.agresearch.co.nz>

Possibly the wrong place to ask but has anyone written Galaxy tools using BioPerl?
I was thinking of creating blast graphic and format converter tools as I couldn't see any already available in their toolbox.
It looks like I can just write a Python wrapper for my existing BioPerl scripts - although I suspect the "correct" method is to use BioPython methods (but Python annoys me with its lack of semi-colons and required white-space)

--Russell



From p.j.a.cock at googlemail.com  Sun Nov 27 20:28:33 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 28 Nov 2011 01:28:33 +0000
Subject: [Bioperl-l] Galaxy tools?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF340186CF1E3@exchsth.agresearch.co.nz>
References: <18DF7D20DFEC044098A1062202F5FFF340186CF1E3@exchsth.agresearch.co.nz>
Message-ID: <CAKVJ-_7k6RpQHw4a6=H3qOK+zb+r3T_sG74MG2fPM5_7NbViYA@mail.gmail.com>

On Monday, November 28, 2011, Smithies, Russell  wrote:
> Possibly the wrong place to ask but has anyone written
> Galaxy tools using BioPerl?
> I was thinking of creating blast graphic and format converter
>  tools as I couldn't see any already available in their toolbox.
> It looks like I can just write a Python wrapper for my existing
> BioPerl scripts - although I suspect the "correct" method is to
> use BioPython methods (but Python annoys me with its lack
> of semi-colons and required white-space)

Galaxy is agnostic about what language the tools are in,
you can use a binary, shell script, Java, Perl, Python etc.

Peter


From florent.angly at gmail.com  Sun Nov 27 21:09:45 2011
From: florent.angly at gmail.com (Florent Angly)
Date: Mon, 28 Nov 2011 12:09:45 +1000
Subject: [Bioperl-l] Galaxy tools?
In-Reply-To: <CAKVJ-_7k6RpQHw4a6=H3qOK+zb+r3T_sG74MG2fPM5_7NbViYA@mail.gmail.com>
References: <18DF7D20DFEC044098A1062202F5FFF340186CF1E3@exchsth.agresearch.co.nz>
	<CAKVJ-_7k6RpQHw4a6=H3qOK+zb+r3T_sG74MG2fPM5_7NbViYA@mail.gmail.com>
Message-ID: <4ED2ED69.10601@gmail.com>

Hi Russell,

As Peter said, the tools to be wrapped do not need to be written in Python.

I have built a few wrappers for Galaxy, including one for the read 
simulator Grinder (http://sourceforge.net/projects/biogrinder/), which 
uses Bioperl and is available in the Galaxy Toolshed 
(http://sourceforge.net/projects/biogrinder/). It is not very hard to do 
a wrapper for trivial programs, but becomes more complicated once you 
start having optional arguments or multiple output files.
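
As an illustration of the "trivial program" case, here is a sketch of a BioPerl format converter with explicit input and output arguments, which is the kind of command line a Galaxy tool definition can call. The option names and the convert_seqs subroutine are assumptions made for this example, not an existing tool:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;
use Bio::SeqIO;

# Convert a sequence file from one Bio::SeqIO format to another.
# Returns the number of sequences written.
sub convert_seqs {
    my ($in_file, $in_format, $out_file, $out_format) = @_;
    my $in  = Bio::SeqIO->new(-file => $in_file,     -format => $in_format);
    my $out = Bio::SeqIO->new(-file => ">$out_file", -format => $out_format);
    my $n = 0;
    while (my $seq = $in->next_seq) {
        $out->write_seq($seq);
        $n++;
    }
    $out->close;
    return $n;
}

# Command-line entry point (hypothetical option names).
if (@ARGV) {
    my %opt = (informat => 'genbank', outformat => 'fasta');
    GetOptions(\%opt, 'input=s', 'informat=s', 'output=s', 'outformat=s')
        or die "Usage: $0 --input FILE --output FILE [--informat FMT] [--outformat FMT]\n";
    defined $opt{$_} or die "--$_ is required\n" for qw(input output);
    convert_seqs(@opt{qw(input informat output outformat)});
}
```

One input file, one output file and no optional outputs keeps the wrapper simple; the complications mentioned above start once those assumptions no longer hold.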

Grinder uses Getopt::Euclid (http://search.cpan.org/dist/Getopt-Euclid/) 
to parse command-line arguments. I have been thinking about leveraging 
the information that Getopt::Euclid stores about command-line arguments 
to automate most of the Galaxy wrapper generation, but I have not gotten 
to it yet.

Florent


On 28/11/11 11:28, Peter Cock wrote:
> On Monday, November 28, 2011, Smithies, Russell  wrote:
>> Possibly the wrong place to ask but has anyone written
>> Galaxy tools using BioPerl?
>> I was thinking of creating blast graphic and format converter
>>   tools as I couldn't see any already available in their toolbox.
>> It looks like I can just write a Python wrapper for my existing
>> BioPerl scripts - although I suspect the "correct" method is to
>> use BioPython methods (but Python annoys me with its lack
>> of semi-colons and required white-space)
> Galaxy is agnostic about what language the tools are in,
> you can use a binary, shell script, Java, Perl, Python etc.
>
> Peter
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From florent.angly at gmail.com  Sun Nov 27 23:35:31 2011
From: florent.angly at gmail.com (Florent Angly)
Date: Mon, 28 Nov 2011 14:35:31 +1000
Subject: [Bioperl-l] Interest in Bio::Community modules
Message-ID: <4ED30F93.4000407@gmail.com>

Hi all,

I have been thinking about starting a set of Perl modules that would be 
useful for (microbial) ecologists to represent communities of organisms. 
At the moment, there does not seem to be anything like this in Bioperl. 
I am happy to make these modules available under the Bioperl umbrella 
using the Bio::Community::* namespace.

I envision the following modules:
* Bio::Community::Member module representing members of a community.
* Bio::Community::IO modules to read/write files that describe community 
composition (a.k.a. OTU table, or site by species table) as used by 
programs like QIIME, Pyrotagger, GAAS, ...
* Bio::Community::Tools modules to help manipulate communities, e.g. to 
take some members at random, normalize the community to a given number 
of individuals, or do rarefaction curves.
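
To make the "normalize to a given number of individuals" idea concrete, here is a small pure-Perl sketch of random subsampling without replacement. The function name rarefy and the member => count hash layout are assumptions for illustration only:

```perl
use strict;
use warnings;
use List::Util qw(shuffle sum);

# Hypothetical helper: randomly subsample a community (member => count)
# down to $n individuals, without replacement.
sub rarefy {
    my ($counts, $n) = @_;
    my $total = sum(0, values %$counts);
    die "cannot draw $n individuals from $total\n" if $n > $total;

    # Expand to one entry per individual, shuffle, keep the first $n.
    my @individuals = map { ($_) x $counts->{$_} } sort keys %$counts;
    my @drawn = (shuffle @individuals)[0 .. $n - 1];

    my %sub;
    $sub{$_}++ for @drawn;
    return \%sub;
}

my %community = (OTU_1 => 50, OTU_2 => 30, OTU_3 => 20);
my $sample = rarefy(\%community, 10);
printf "%s\t%d\n", $_, $sample->{$_} for sort keys %$sample;
```

Expanding every individual into a list is simple but memory-hungry for large communities; a real module would probably draw from the counts directly.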

The idea is to implement these modules in Moose to teach myself Moose. 
The members of a community could be a sequence (Bio::SeqI), a species 
(Bio::S), an arbitrary string or even other things. I am not quite sure 
if Bioperl provides facilities to attach some arbitrary information to an 
object.
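
A minimal Moose sketch of the member/community idea, using a has-a relation for the payload so that a member can carry a sequence object, a species object or a plain string. The package names Sketch::Member and Sketch::Community and all method names are placeholders for illustration, not a proposed or existing API:

```perl
package Sketch::Member;
use Moose;

# Identifier used as the key within a community, e.g. an OTU name.
has id   => (is => 'ro', isa => 'Str', required => 1);
# Free-form payload: a Bio::SeqI object, a species object, a string...
has desc => (is => 'rw', isa => 'Any', predicate => 'has_desc');

__PACKAGE__->meta->make_immutable;

package Sketch::Community;
use Moose;

has _members => (is => 'ro', isa => 'HashRef',      default => sub { {} });
has _counts  => (is => 'ro', isa => 'HashRef[Int]', default => sub { {} });

# Add $count individuals of $member (default: 1).
sub add_member {
    my ($self, $member, $count) = @_;
    $count = 1 unless defined $count;
    $self->_members->{ $member->id } = $member;
    $self->_counts->{ $member->id } += $count;
    return $self;
}

sub get_count {
    my ($self, $member) = @_;
    return $self->_counts->{ $member->id } || 0;
}

# Richness = number of distinct members.
sub get_richness {
    my ($self) = @_;
    return scalar keys %{ $self->_counts };
}

__PACKAGE__->meta->make_immutable;

package main;

my $community = Sketch::Community->new;
my $otu = Sketch::Member->new(id => 'OTU_1', desc => 'Escherichia coli');
$community->add_member($otu, 5);
$community->add_member(Sketch::Member->new(id => 'OTU_2'), 2);
printf "richness: %d, OTU_1 count: %d\n",
    $community->get_richness, $community->get_count($otu);
```

Declaring desc as 'Any' is the simplest way to attach arbitrary information; a stricter type constraint could be added once the set of allowed payloads is settled.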

Any interest? Ideas? Comments?

Thanks,

Florent


From cjfields at illinois.edu  Mon Nov 28 14:42:12 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 28 Nov 2011 19:42:12 +0000
Subject: [Bioperl-l] Check the location type for a particular gene in
	a	Genbank file
In-Reply-To: <000101ccacde$0fc4ba90$2f4e2fb0$@edu.hk>
References: <CAA7rn9dYGfeCS-qVccDcdwVenj8KFQs4i89EoT8_63gQk5b42g@mail.gmail.com>
	<000101ccacde$0fc4ba90$2f4e2fb0$@edu.hk>
Message-ID: <49363DC1-110A-49A8-B8D7-75AA624A535C@illinois.edu>

Ross,

The standard way is to check whether the location object is a SplitLocationI or not, see the following for an example:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Location_Objects
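
A short sketch of that check, using two toy features built by hand rather than parsed from a GenBank file. The helper name location_type is made up for this example:

```perl
use strict;
use warnings;
use Bio::SeqFeature::Generic;
use Bio::Location::Simple;
use Bio::Location::Split;

# Hypothetical helper: classify a feature's location as split or simple.
sub location_type {
    my ($feature) = @_;
    return $feature->location->isa('Bio::Location::SplitLocationI')
        ? 'split' : 'simple';
}

# A feature with a simple location.
my $simple = Bio::SeqFeature::Generic->new(
    -primary_tag => 'CDS', -start => 10, -end => 50, -strand => 1);

# A feature with a split location, as produced by a join(...) in GenBank.
my $join = Bio::Location::Split->new;
$join->add_sub_Location(
    Bio::Location::Simple->new(-start => 1, -end => 20, -strand => 1));
$join->add_sub_Location(
    Bio::Location::Simple->new(-start => 40, -end => 60, -strand => 1));
my $split = Bio::SeqFeature::Generic->new(-primary_tag => 'CDS');
$split->location($join);

for my $feat ($simple, $split) {
    my $type = location_type($feat);
    print "$type: ", $feat->location->to_FTstring, "\n";
    if ($type eq 'split') {
        # Each piece of a split location is itself a location object.
        printf "  part %d..%d\n", $_->start, $_->end
            for $feat->location->sub_Location;
    }
}
```

For features attached to a sequence, $feat->spliced_seq may already do what is wanted, since it joins the sub-locations of a split feature into one sequence object.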

chris

On Nov 27, 2011, at 2:24 AM, Ross KK Leung wrote:

> Hi all,
> 
> To write a script to extract sequence generically for all types of
> BioLocation objects, I'd like to know if there is any way to check what
> types (e.g. simple or split) are being processed.
> 
> Bio::Location::CoordinatePolicyI appears to be doing something similar but
> it is more like a post checking step. If I parse the genbank file line by
> line, I can certainly check whether the line contains keywords like "join"
> but as I'm using something like:
> 
>        my @features=grep{$_->primary_tag eq $chkTags[0]}
> $seqobj->get_SeqFeatures;                                    
> 
> 
>        foreach (@features) {
> 
>            $pseudo=$_->has_tag('pseudo')?'pseudo':'functional';
> 
>            @gene=[];                                                   
> 
> I'd appreciate if anybody knows a better integration with the well-developed
> bioperl module.
> 
> Thanks a lot.
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Mon Nov 28 14:47:10 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 28 Nov 2011 19:47:10 +0000
Subject: [Bioperl-l] Interest in Bio::Community modules
In-Reply-To: <4ED30F93.4000407@gmail.com>
References: <4ED30F93.4000407@gmail.com>
Message-ID: <3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu>

I think the idea is sound, it would be nice to have.  Jason is working a bit in this area, maybe he has some additional thoughts?  Would there be some redundancy with any current code (Bio::Tree, Bio::Taxon, etc)?  I do think it should be developed on its own, per our recent discussions re: slimming down core.

Re: using Moose and BioPerl, from experience it's a little tricky integrating the two as BioPerl uses perl's simple OO system while Moose allows much more (though it has its own issues, memory and speed being the top two).  Specifying inheritance from a BioPerl module is a bit trickier, you'll have to wrap the module via MooseX::NonMoose.  Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine.

chris

On Nov 27, 2011, at 10:35 PM, Florent Angly wrote:

> Hi all,
> 
> I have been thinking about starting a set of Perl modules that would useful for (microbial) ecologists to represent communities of organisms. At the moment, there does not seem to be anything like this in Bioperl. I am happy to make these modules available under the Bioperl umbrella using the Bio::Community::* namespace.
> 
> I envision the following modules:
> * Bio::Community::Member module representing members of a community.
> * Bio::Community::IO modules to read/write files that describe community composition (a.k.a. OTU table, or site by species table) as used programs like QIIME, Pyrotagger, GAAS, ...
> * Bio::Community::Tools modules to help manipulate communities, e.g. to take some members at random, normalize the community to a given number of individuals, or do rarefaction curves.
> 
> The idea is to implement these modules in Moose to teach myself Moose. The members of a community could be a sequence (Bio::SeqI), a species (Bio::S), an arbitrary string or even other things. I am not quite sure if Bioperl provide facilities to attach some arbitrary information to an object.
> 
> Any interest? Ideas? Comments?
> 
> Thanks,
> 
> Florent
> 


From l.m.timmermans at students.uu.nl  Mon Nov 28 15:25:13 2011
From: l.m.timmermans at students.uu.nl (Leon Timmermans)
Date: Mon, 28 Nov 2011 21:25:13 +0100
Subject: [Bioperl-l]  Interest in Bio::Community modules
In-Reply-To: <CAC1jpXDrmNJqzzWR80BvEzwDgGJBJekS4k+a5-ZUTkunU0r9VQ@mail.gmail.com>
References: <4ED30F93.4000407@gmail.com>
	<CAC1jpXDrmNJqzzWR80BvEzwDgGJBJekS4k+a5-ZUTkunU0r9VQ@mail.gmail.com>
Message-ID: <CAC1jpXCOTKF21kMNQbJZWF7oR-Ue1ry3HpYkVWK9=FV--QfTog@mail.gmail.com>

And now to the list too,

On Mon, Nov 28, 2011 at 5:35 AM, Florent Angly <florent.angly at gmail.com>wrote:

> The idea is to implement these modules in Moose to teach myself Moose. The
> members of a community could be a sequence (Bio::SeqI), a species (Bio::S),
> an arbitrary string or even other things. I am not quite sure if Bioperl
> provide facilities to attach some arbitrary information to an object.
>
> Any interest? Ideas? Comments?
>

Sounds like a good use-case for roles, maybe even parametric roles.

Leon


From florent.angly at gmail.com  Mon Nov 28 19:59:24 2011
From: florent.angly at gmail.com (Florent Angly)
Date: Tue, 29 Nov 2011 10:59:24 +1000
Subject: [Bioperl-l] Interest in Bio::Community modules
In-Reply-To: <3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu>
References: <4ED30F93.4000407@gmail.com>
	<3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu>
Message-ID: <4ED42E6C.6020501@gmail.com>

Hi Chris,

On 29/11/11 05:47, Fields, Christopher J wrote:
> I think the idea is sound, it would be nice to have.  Jason is working a bit in this area, maybe he has some additional thoughts?  Would there be some redundancy with any current code (Bio::Tree, Bio::Taxon, etc)?
None of these features would be duplicated. Rather, they would be used 
as attributes of the Bio::Community::* objects. For example, a member of a 
community could have a Bio::SeqI attached to it as well as a Bio::Taxon, 
etc...
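
As a rough illustration of the has-a arrangement described here, a
plain-Perl sketch might look like the following. The package name
My::Community::Member and its -id/-seq/-taxon arguments are invented for
this example and are not an existing BioPerl API; plain strings stand in
for the Bio::SeqI and Bio::Taxon objects.

```perl
use strict;
use warnings;

# Hypothetical class, not part of BioPerl: a community member that holds
# other objects (has-a) instead of inheriting from them (is-a).
package My::Community::Member;

sub new {
    my ($class, %args) = @_;
    my $self = {
        id    => $args{-id},
        seq   => $args{-seq},      # would be a Bio::SeqI-like object
        taxon => $args{-taxon},    # would be a Bio::Taxon-like object
    };
    return bless $self, $class;
}

# Simple read-only accessors for the attached pieces.
sub id    { $_[0]{id} }
sub seq   { $_[0]{seq} }
sub taxon { $_[0]{taxon} }

package main;

my $member = My::Community::Member->new(
    -id    => 'otu_1',
    -seq   => 'ACGT',           # plain string used as a stand-in
    -taxon => 'Escherichia',    # plain string used as a stand-in
);

printf "%s %s %s\n", $member->id, $member->seq, $member->taxon;
```

Because the member only stores what it is given, any object (or plain
string) can be attached without the class inheriting from it.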

> I do think it should be developed on it's own, per our recent discussions re: slimming down core.
Yes, the features are so different that it makes sense to have the 
Bio::Community::* modules as a separate BioPerl distribution (like the 
Bio-FeatureIO BioPerl distribution).

> Re: using Moose and BioPerl, from experience it's a little tricky integrating the two as BioPerl uses perl's simple OO system while Moose allows much more (though it has it's own issues, memory and speed being the top two).  Specifying inheritance from a BioPerl module is a bit trickier, you'll have to wrap the module via MooseX::NonMoose.  Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine.
Beyond Bio::RootI, I cannot imagine that any of the Bio::Community::* 
modules would need to inherit from any other BioPerl modules. 
Considering this and the performance aspects of Moose, do you think that 
using Moose is a wise design decision?

Best,

Florent


> chris
>
> On Nov 27, 2011, at 10:35 PM, Florent Angly wrote:
>
>> Hi all,
>>
>> I have been thinking about starting a set of Perl modules that would useful for (microbial) ecologists to represent communities of organisms. At the moment, there does not seem to be anything like this in Bioperl. I am happy to make these modules available under the Bioperl umbrella using the Bio::Community::* namespace.
>>
>> I envision the following modules:
>> * Bio::Community::Member module representing members of a community.
>> * Bio::Community::IO modules to read/write files that describe community composition (a.k.a. OTU table, or site by species table) as used programs like QIIME, Pyrotagger, GAAS, ...
>> * Bio::Community::Tools modules to help manipulate communities, e.g. to take some members at random, normalize the community to a given number of individuals, or do rarefaction curves.
>>
>> The idea is to implement these modules in Moose to teach myself Moose. The members of a community could be a sequence (Bio::SeqI), a species (Bio::S), an arbitrary string or even other things. I am not quite sure if Bioperl provide facilities to attach some arbitrary information to an object.
>>
>> Any interest? Ideas? Comments?
>>
>> Thanks,
>>
>> Florent
>>


From cjfields at illinois.edu  Tue Nov 29 00:32:50 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 29 Nov 2011 05:32:50 +0000
Subject: [Bioperl-l] Interest in Bio::Community modules
In-Reply-To: <CAC1jpXCOTKF21kMNQbJZWF7oR-Ue1ry3HpYkVWK9=FV--QfTog@mail.gmail.com>
References: <4ED30F93.4000407@gmail.com>
	<CAC1jpXDrmNJqzzWR80BvEzwDgGJBJekS4k+a5-ZUTkunU0r9VQ@mail.gmail.com>
	<CAC1jpXCOTKF21kMNQbJZWF7oR-Ue1ry3HpYkVWK9=FV--QfTog@mail.gmail.com>
Message-ID: <C87E8F45-FE8A-4E77-A612-DF1E25C9CA73@illinois.edu>

On Nov 28, 2011, at 2:25 PM, Leon Timmermans wrote:

> And now to the list too,
> 
> On Mon, Nov 28, 2011 at 5:35 AM, Florent Angly <florent.angly at gmail.com>wrote:
> 
>> The idea is to implement these modules in Moose to teach myself Moose. The
>> members of a community could be a sequence (Bio::SeqI), a species (Bio::S),
>> an arbitrary string or even other things. I am not quite sure if Bioperl
>> provide facilities to attach some arbitrary information to an object.
>> 
>> Any interest? Ideas? Comments?
>> 
> 
> Sounds like a good use-case for roles, maybe even parametric roles.
> 
> Leon

Yep, agree totally.  It would be a good replacement in most cases for the BioI interfaces.  

(see also, the Biome project, which I'm slooooooowly working on again, on github)

chris


From pmr at ebi.ac.uk  Tue Nov 29 08:39:52 2011
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 29 Nov 2011 13:39:52 +0000
Subject: [Bioperl-l] BinarySearch.pm
Message-ID: <4ED4E0A8.30102@ebi.ac.uk>

In trying to use bioflat_index.pl index files in EMBOSS, I ran into some 
problems.

Both appear to be in the Bio/Flat/BinarySearch.pm source file.

EMBL ID lines are failing to drop the ';' from the ID. Updating the 
regular expression to make sure the ';' is not picked up seems to work:

   if ($format =~ /embl/i) {
     return ('ID',
	    "^ID   (\\S+[^; ])",
	    "^ID   (\\S+[^; ])",
	    {
	     ACC     => q/^AC   (\S+);/,
	     VERSION => q/^SV\s+(\S+)/
	    });
   }
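
A quick way to see the effect of the pattern above is to run it against
sample ID lines. The sketch below compares it with a plain ^ID   (\S+)
pattern, which is only an assumption about what the module used before;
the two ID lines are illustrative examples of the newer and older EMBL
layouts, and extract_id is a helper made up for this test.

```perl
use strict;
use warnings;

# Pattern proposed above: the capture must end on a character that is
# neither ';' nor a space, so a trailing ';' is backed out of the match.
my $proposed = qr/^ID   (\S+[^; ])/;

# Assumed earlier pattern (a guess for comparison, not taken from the module).
my $assumed_old = qr/^ID   (\S+)/;

# Illustrative ID lines in the newer and older EMBL layouts.
my @lines = (
    'ID   X56734; SV 1; linear; mRNA; STD; PLN; 1859 BP.',
    'ID   AB000263 standard; RNA; PRI; 368 BP.',
);

# Return the captured ID, or undef if the pattern does not match.
sub extract_id {
    my ($pattern, $line) = @_;
    return $line =~ $pattern ? $1 : undef;
}

for my $line (@lines) {
    printf "proposed: %-10s assumed old: %s\n",
        extract_id($proposed, $line),
        extract_id($assumed_old, $line);
}
```

One side effect worth noting: the proposed pattern needs at least two
characters in the ID, so a one-character ID would not match.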

The ACC secondary index has every record duplicated.
This line is duplicated in the write_secondary_indices source code. Is 
that intentional?

  		    print $fh sprintf("%-${length}s",$record);

regards,

Peter Rice
EMBOSS Team


From uni.anastasia at gmail.com  Sat Nov 26 12:32:48 2011
From: uni.anastasia at gmail.com (anastsia shapiro)
Date: Sat, 26 Nov 2011 19:32:48 +0200
Subject: [Bioperl-l] Problem with parsing blast results
Message-ID: <CAHuWCXcLkSZ=N-HDG6aifV0Y1sL8JKNU5Jz=sGzyAfT8JDVTXw@mail.gmail.com>

Hello,

I'm running a script that should parse BLAST results using Bio::SearchIO.

Sometimes the script works fine, but sometimes it stops and I receive
the following error.

------------- EXCEPTION -------------
MSG: no data for midline Query
------------------------------------------------------------
STACK Bio::SearchIO::blast::next_result C:/Perl64/site/lib/Bio\SearchIO\blast.pm:1805
STACK toplevel D:\D\uni-anastasia\project\scripts\ParsingBlastResults\ParsingBlastResults.pl:36
-------------------------------------
The BLAST result files were produced by running the following command:
blastn -task blastn -db xxxxxxxxx.txt -evalue 1e-10 -perc_identity 80 -dust no -num_descriptions 0  -query xxxxx.txt -out results.txt -strand plus

I am using BioPerl 1.6.1.
I have read the forums, and this seems to be a bug, but it was supposedly
fixed in version 1.5.

I would really appreciate your help, since I have been trying to understand
this problem for over a month.

Regards,
Anastasia


From bunk at novozymes.com  Tue Nov 29 11:46:54 2011
From: bunk at novozymes.com (Jacob Bunk Nielsen)
Date: Tue, 29 Nov 2011 17:46:54 +0100
Subject: [Bioperl-l] Problem with parsing blast results
In-Reply-To: <CAHuWCXcLkSZ=N-HDG6aifV0Y1sL8JKNU5Jz=sGzyAfT8JDVTXw@mail.gmail.com>
	(anastsia shapiro's message of "Sat, 26 Nov 2011 18:32:48 +0100")
References: <CAHuWCXcLkSZ=N-HDG6aifV0Y1sL8JKNU5Jz=sGzyAfT8JDVTXw@mail.gmail.com>
Message-ID: <77sjl698qp.fsf@spurv.nzcorp.net>

Hi

anastsia shapiro <uni.anastasia at gmail.com> writes:

> I'm running a script that should parse a blast results, using searchIO.
>
> Sometimes the script work fines, however sometimes it stops, and I receive
> the following error.
>
> ------------- EXCEPTION -------------
> MSG: no data for midline Query
> ------------------------------------------------------------
> STACK Bio::SearchIO::blast::next_result C:/Perl64/site/lib/Bio\SearchIO\
> blast.pm:1805
> STACK toplevel
> D:\D\uni-anastasia\project\scripts\ParsingBlastResults\ParsingBlastResults.pl:36
> -------------------------------------
> While the blast results files were received as a result of running the
> following blast command:
> blastn -task blastn -db xxxxxxxxx.txt -evalue 1e-10 -perc_identity 80 -dust
> no -num_descriptions 0  -query xxxxx.txt -out results.txt -strand plus

I don't know why this exact problem arises, but I think you should
consider using an output format that is easier to parse by machine, like
the XML format.

With the BLAST+ blastn shown above, you request XML output with -outfmt 5
(-m 7 is the equivalent option for the legacy blastall). When reading the
result with BioPerl you must specify -format => 'blastxml' for
Bio::SearchIO.

That way I think you are likely to see a lot fewer problems regarding
the parsing of blast output.

If the above doesn't solve the problem, you had better show us the code
that fails.

Best regards

Jacob


From cjfields at illinois.edu  Tue Nov 29 14:11:11 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 29 Nov 2011 19:11:11 +0000
Subject: [Bioperl-l] Interest in Bio::Community modules
In-Reply-To: <4ED42E6C.6020501@gmail.com>
References: <4ED30F93.4000407@gmail.com>
	<3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu>
	<4ED42E6C.6020501@gmail.com>
Message-ID: <4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu>

On Nov 28, 2011, at 6:59 PM, Florent Angly wrote:

> Hi Chris,
> 
> On 29/11/11 05:47, Fields, Christopher J wrote:
> ...
>> Re: using Moose and BioPerl, from experience it's a little tricky integrating the two as BioPerl uses perl's simple OO system while Moose allows much more (though it has it's own issues, memory and speed being the top two).  Specifying inheritance from a BioPerl module is a bit trickier, you'll have to wrap the module via MooseX::NonMoose.  Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine.
> Beyond Bio::RootI, I cannot imagine that any of the Bio::Community::* modules would need to inherit from any other BioPerl modules. Considering this and the performance aspects of Moose, do you think that using Moose is a wise design decision?

Moose can actually help in some circumstances; flattening the inheritance hierarchy by using roles seems to help.  And it never hurts to learn something new like Moose and other modern perl niceties.

> Best,
> 
> Florent


chris


From cjfields at illinois.edu  Tue Nov 29 17:30:58 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 29 Nov 2011 22:30:58 +0000
Subject: [Bioperl-l] BinarySearch.pm
In-Reply-To: <4ED4E0A8.30102@ebi.ac.uk>
References: <4ED4E0A8.30102@ebi.ac.uk>
Message-ID: <6F926A89-3B07-4924-8CC4-68A027E7FFCE@illinois.edu>

Peter, 

Can you send a test file that is failing?  I added a few tests using an example file with a ';' in the ID line, but everything is passing with our other EMBL example files.  I'm also looking into adding a method to return secondary IDs for a specific type ('ACC' for instance) so we can test the repeat issue for accessions.  Both changes pass tests as is, though, so I have committed them in the meantime.

chris

On Nov 29, 2011, at 7:39 AM, Peter Rice wrote:

> In trying to use bioflat_index.pl index files in EMBOSS, I ran into some problems.
> 
> Both appear to be in the Bio/Flat/BinarySearch.pm source file.
> 
> EMBL ID lines are failing to drop the ';' from the ID. Updating the regular expression to make sure the ';' is not picked up seems to work:
> 
>  if ($format =~ /embl/i) {
>    return ('ID',
> 	    "^ID   (\\S+[^; ])",
> 	    "^ID   (\\S+[^; ])",
> 	    {
> 	     ACC     => q/^AC   (\S+);/,
> 	     VERSION => q/^SV\s+(\S+)/
> 	    });
>  }
> 
> The ACC secondary index has every record duplicated.
> This line is duplicated in the write_secondary_indices source code. Is that intentional?
> 
> 		    print $fh sprintf("%-${length}s",$record);
> 
> regards,
> 
> Peter Rice
> EMBOSS Team


From florent.angly at gmail.com  Tue Nov 29 20:18:41 2011
From: florent.angly at gmail.com (Florent Angly)
Date: Wed, 30 Nov 2011 11:18:41 +1000
Subject: [Bioperl-l] Interest in Bio::Community modules
In-Reply-To: <4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu>
References: <4ED30F93.4000407@gmail.com>	<3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu>	<4ED42E6C.6020501@gmail.com>
	<4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu>
Message-ID: <4ED58471.3030106@gmail.com>

Chris,
Yes, it is exciting to learn something new.
I have developed a bit of code in the last few days in my local git 
repository. Do you think you could create a repository for Bio-Community 
on the Bioperl Github space or is it too soon?
Cheers,
Florent

On 30/11/11 05:11, Fields, Christopher J wrote:
> On Nov 28, 2011, at 6:59 PM, Florent Angly wrote:
>
>> Hi Chris,
>>
>> On 29/11/11 05:47, Fields, Christopher J wrote:
>> ...
>>> Re: using Moose and BioPerl, from experience it's a little tricky integrating the two as BioPerl uses perl's simple OO system while Moose allows much more (though it has it's own issues, memory and speed being the top two).  Specifying inheritance from a BioPerl module is a bit trickier, you'll have to wrap the module via MooseX::NonMoose.  Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine.
>> Beyond Bio::RootI, I cannot imagine that any of the Bio::Community::* modules would need to inherit from any other BioPerl modules. Considering this and the performance aspects of Moose, do you think that using Moose is a wise design decision?
> Moose can actually help in some circumstances; flattening the inheritance hierarchy by using roles seems to help.  And it never hurts to learn something new like Moose and other modern perl niceties.
>
>> Best,
>>
>> Florent
>
> chris


From cjfields at illinois.edu  Tue Nov 29 21:34:00 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 30 Nov 2011 02:34:00 +0000
Subject: [Bioperl-l] Interest in Bio::Community modules
In-Reply-To: <4ED58471.3030106@gmail.com>
References: <4ED30F93.4000407@gmail.com>
	<3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu>
	<4ED42E6C.6020501@gmail.com>
	<4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu>
	<4ED58471.3030106@gmail.com>
Message-ID: <A4EEF7BF-F70C-4672-B582-748B3023E709@illinois.edu>

On Nov 29, 2011, at 7:18 PM, Florent Angly wrote:

> Chris,
> Yes, it is exciting to learn something new.
> I have developed a bit of code in the last few days in my local git repository. Do you think you could create a repository for Bio-Community on the Bioperl Github space or is it too soon?

It's up to you.  I set up the barebones repo and added you with push/pull/admin rights, so you should be able to push to it whenever you are ready:

https://github.com/bioperl/Bio-Community

chris


> Cheers,
> Florent
> 
> On 30/11/11 05:11, Fields, Christopher J wrote:
>> On Nov 28, 2011, at 6:59 PM, Florent Angly wrote:
>> 
>>> Hi Chris,
>>> 
>>> On 29/11/11 05:47, Fields, Christopher J wrote:
>>> ...
>>>> Re: using Moose and BioPerl, from experience it's a little tricky integrating the two as BioPerl uses perl's simple OO system while Moose allows much more (though it has it's own issues, memory and speed being the top two).  Specifying inheritance from a BioPerl module is a bit trickier, you'll have to wrap the module via MooseX::NonMoose.  Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine.
>>> Beyond Bio::RootI, I cannot imagine that any of the Bio::Community::* modules would need to inherit from any other BioPerl modules. Considering this and the performance aspects of Moose, do you think that using Moose is a wise design decision?
>> Moose can actually help in some circumstances; flattening the inheritance hierarchy by using roles seems to help.  And it never hurts to learn something new like Moose and other modern perl niceties.
>> 
>>> Best,
>>> 
>>> Florent
>> 
>> chris


From florent.angly at gmail.com  Tue Nov 29 21:50:04 2011
From: florent.angly at gmail.com (Florent Angly)
Date: Wed, 30 Nov 2011 12:50:04 +1000
Subject: [Bioperl-l] Interest in Bio::Community modules
In-Reply-To: <A4EEF7BF-F70C-4672-B582-748B3023E709@illinois.edu>
References: <4ED30F93.4000407@gmail.com>	<3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu>	<4ED42E6C.6020501@gmail.com>
	<4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu>
	<4ED58471.3030106@gmail.com>
	<A4EEF7BF-F70C-4672-B582-748B3023E709@illinois.edu>
Message-ID: <4ED599DC.6090808@gmail.com>

Fantastic! Thank you very much Chris,
Florent

On 30/11/11 12:34, Fields, Christopher J wrote:
> On Nov 29, 2011, at 7:18 PM, Florent Angly wrote:
>
>> Chris,
>> Yes, it is exciting to learn something new.
>> I have developed a bit of code in the last few days in my local git repository. Do you think you could create a repository for Bio-Community on the Bioperl Github space or is it too soon?
> It's up to you.  I set up the barebones repo and added you on to push/pull/admin, you should be able to push to it whenever you are ready:
>
> https://github.com/bioperl/Bio-Community
>
> chris
>
>
>> Cheers,
>> Florent
>>
>> On 30/11/11 05:11, Fields, Christopher J wrote:
>>> On Nov 28, 2011, at 6:59 PM, Florent Angly wrote:
>>>
>>>> Hi Chris,
>>>>
>>>> On 29/11/11 05:47, Fields, Christopher J wrote:
>>>> ...
>>>>> Re: using Moose and BioPerl, from experience it's a little tricky integrating the two as BioPerl uses perl's simple OO system while Moose allows much more (though it has it's own issues, memory and speed being the top two).  Specifying inheritance from a BioPerl module is a bit trickier, you'll have to wrap the module via MooseX::NonMoose.  Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine.
>>>> Beyond Bio::RootI, I cannot imagine that any of the Bio::Community::* modules would need to inherit from any other BioPerl modules. Considering this and the performance aspects of Moose, do you think that using Moose is a wise design decision?
>>> Moose can actually help in some circumstances; flattening the inheritance hierarchy by using roles seems to help.  And it never hurts to learn something new like Moose and other modern perl niceties.
>>>
>>>> Best,
>>>>
>>>> Florent
>>> chris


From lsbrath at gmail.com  Wed Nov 30 00:25:32 2011
From: lsbrath at gmail.com (Mgavi Brathwaite)
Date: Wed, 30 Nov 2011 00:25:32 -0500
Subject: [Bioperl-l] Exception MSG
Message-ID: <CAJm=ba9qxU2JY-i5K=sfuVcODpB941cRehpJgxcT0sThXYx68g@mail.gmail.com>

Hello,

Brushing up on my BioPerl and I can't figure out this MSG:

------------- EXCEPTION -------------

MSG: cannot open >/Users/mydata/Desktop/gi|255572219|ref|XP_002527049|.out

STACK Bio::Tools::Run::RemoteBlast::save_output
/Library/Perl/5.10.0/Bio/Tools/Run/RemoteBlast.pm:678

STACK toplevel /Users/mydata/Documents/workspace/BI7643/rb_ex.pl:40

-------------------------------------
 Here is the code:

#!/usr/bin/perl -w

use strict;
use Bio::Tools::Run::RemoteBlast;

#=cut

my $prog  = 'blastp';
my $db    = 'swissprot';
my $e_val = '1e-10';

my @params = ('-prog'      => $prog,
              '-data'      => $db,
              'expect'     => $e_val,
              'readmethod' => 'SearchIO' );

my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

#human database
$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens [ORGN]';

my $v = 1; # this is just to turn on and off the messages

# Construct the sequence object
my $seq_in = Bio::SeqIO->new(-file => "/Users/mydata/Desktop/rb_ex.fa", -format => "fasta");

while (my $input = $seq_in->next_seq()){
  my $r = $factory->submit_blast($input);
  print STDERR "waiting..." if ($v > 0);
  while (my @rids = $factory->each_rid()){
    foreach my $rid (@rids){
      my $rc = $factory->retrieve_blast($rid);
      if( !ref($rc) ) {
        if($rc < 0){
          $factory->remove_rid($rid);
        }
        print STDERR "." if ($v > 0);
        sleep 5;
      } else {
        my $result = $rc->next_result();
        #save output
        my $filename = ">/Users/mydata/Desktop/".$result->query_name().".out";#error
        $factory->save_output($filename);
        $factory->remove_rid($rid);
        print "\nQuery Name: ", $result->query_name(), "\n";
        while ( my $hit = $result->next_hit ) {
          next unless ( $v > 0);
          print "\thit name is ", $hit->name, "\n";
          while( my $hsp = $hit->next_hsp ) {
            print "\t\tscore is ", $hsp->score, "\n";
          }
        }
      }
    }
  }
}


Thanks for the help!


From jason.stajich at gmail.com  Wed Nov 30 01:05:41 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Tue, 29 Nov 2011 22:05:41 -0800
Subject: [Bioperl-l] Exception MSG
In-Reply-To: <CAJm=ba9qxU2JY-i5K=sfuVcODpB941cRehpJgxcT0sThXYx68g@mail.gmail.com>
References: <CAJm=ba9qxU2JY-i5K=sfuVcODpB941cRehpJgxcT0sThXYx68g@mail.gmail.com>
Message-ID: <50C532E3-5212-49CE-A3CA-9FE91BDC00CF@gmail.com>

I don't think you need to give it the '>' when you specify the filename for the output. That is done when the filehandle is opened.
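
To illustrate the point, here is a small core-Perl sketch. How
save_output opens the file internally is not shown in this thread, so
write_report below is a made-up stand-in that uses a three-argument open,
where the write mode is passed separately from the path. The temporary
directory and file name are likewise just for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: write text to a path with a three-argument open,
# so the '>' mode never has to be part of the file name.
sub write_report {
    my ($path, $text) = @_;
    open(my $fh, '>', $path) or die "cannot open $path: $!";
    print {$fh} $text;
    close($fh) or die "cannot close $path: $!";
    return $path;
}

# A bare path (no leading '>') works as expected.
my $dir  = tempdir(CLEANUP => 1);
my $good = File::Spec->catfile($dir, 'query1.out');
write_report($good, "BLAST report text\n");
print "wrote $good\n" if -s $good;

# With a leading '>', the name is taken literally: it becomes a relative
# path whose first directory is called '>', which normally does not exist.
my $bad = '>' . $good;
my $ok  = eval { write_report($bad, "x"); 1 };
print "open failed for '$bad'\n" unless $ok;
```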

On Nov 29, 2011, at 9:25 PM, Mgavi Brathwaite wrote:

> Hello,
> 
> Brushing up on my BioPerl and I can't figure out this MSG:
> 
> ------------- EXCEPTION -------------
> 
> MSG: cannot open >/Users/mydata/Desktop/gi|255572219|ref|XP_002527049|.out
> 
> STACK Bio::Tools::Run::RemoteBlast::save_output
> /Library/Perl/5.10.0/Bio/Tools/Run/RemoteBlast.pm:678
> 
> STACK toplevel /Users/mydata/Documents/workspace/BI7643/rb_ex.pl:40
> 
> -------------------------------------
> Here is the code:
> 
> #!/usr/bin/perl -w
> 
> use strict;
> 
> use Bio::Tools::Run::RemoteBlast;
> 
> 
> #=cut
> 
> my $prog = 'blastp';
> 
> my $db = 'swissprot';
> 
> my $e_val = '1e-10';
> 
> 
> my @params = ('-prog' => $prog,
> 
> '-data' => $db,
> 
> 'expect' => $e_val,
> 
> 'readmethod' => 'SearchIO' );
> 
> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
> 
> 
> #human database
> 
> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens
> [ORGN]';
> 
> 
> my $v =1; # this is just to turn on and off the messages
> 
> # Construct the sequence object
> 
> my $seq_in = Bio::SeqIO->new(-file => "/Users/mydata/Desktop/rb_ex.fa", -format
> => "fasta");
> 
> 
> while (my $input = $seq_in->next_seq()){
> 
> my $r = $factory->submit_blast($input);
> 
> print STDERR "waiting..." if ($v > 0);
> 
> while (my @rids = $factory->each_rid()){
> 
> foreach my $rid (@rids){
> 
> my $rc = $factory->retrieve_blast($rid);
> 
> if( !ref($rc) ) {
> 
> if($rc < 0){
> 
> $factory->remove_rid($rid);
> 
> }
> 
> print STDERR "." if ($v > 0);
> 
> sleep 5;
> 
> } else {
> 
> my $result = $rc->next_result();
> 
> #save output
> 
> my $filename = ">/Users/mydata/Desktop/".$result->query_name().".out";#error
> 
> $factory->save_output($filename);
> 
> $factory->remove_rid($rid);
> 
> print "\nQuery Name: ", $result->query_name(), "\n";
> 
>          while ( my $hit = $result->next_hit ) {
> 
>            next unless ( $v > 0);
> 
>            print "\thit name is ", $hit->name, "\n";
> 
>            while( my $hsp = $hit->next_hsp ) {
> 
>              print "\t\tscore is ", $hsp->score, "\n";
> 
> }
> 
>          }
> 
>        }
> 
>      }
> 
>    }
> 
>  }
> 
> 
> 
> Thanks for the help!


From ss2489 at cornell.edu  Wed Nov 30 09:32:47 2011
From: ss2489 at cornell.edu (Surya Saha)
Date: Wed, 30 Nov 2011 09:32:47 -0500
Subject: [Bioperl-l] Exception MSG
In-Reply-To: <50C532E3-5212-49CE-A3CA-9FE91BDC00CF@gmail.com>
References: <CAJm=ba9qxU2JY-i5K=sfuVcODpB941cRehpJgxcT0sThXYx68g@mail.gmail.com>
	<50C532E3-5212-49CE-A3CA-9FE91BDC00CF@gmail.com>
Message-ID: <CAEiaqD=5V3_JVpzi1kjtgfCRmZg1VkOw2jWGUqNctkq9DbOnZw@mail.gmail.com>

If that does not fix it, try using one of the unique identifiers (the gi
number?) as the file name instead of the full query name. The pipe (|)
characters might cause problems.
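
A minimal sketch of that idea, in case it is useful: safe_filename is a
helper invented for this example (not a BioPerl function) that replaces
anything outside letters, digits, '.', '_' and '-' with an underscore
before the name is used in a path.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a query name into a file-system-friendly name.
sub safe_filename {
    my ($name) = @_;
    # Replace each run of characters outside the allowed set with one '_'.
    (my $safe = $name) =~ s/[^A-Za-z0-9._-]+/_/g;
    # Drop underscores left over at either end.
    $safe =~ s/^_+|_+$//g;
    return $safe;
}

my $query_name = 'gi|255572219|ref|XP_002527049|';
print safe_filename($query_name), ".out\n";
# prints "gi_255572219_ref_XP_002527049.out"
```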

On Wed, Nov 30, 2011 at 1:05 AM, Jason Stajich <jason.stajich at gmail.com>wrote:

> I don't think you need to give it the '>' when you specify the filename
> for the output. That is done by the filehandle opening itsself.
>
> On Nov 29, 2011, at 9:25 PM, Mgavi Brathwaite wrote:
>
> > Hello,
> >
> > Brushing up on my BioPerl and I can't figure out this MSG:
> >
> > ------------- EXCEPTION -------------
> >
> > MSG: cannot open
> >/Users/mydata/Desktop/gi|255572219|ref|XP_002527049|.out
> >
> > STACK Bio::Tools::Run::RemoteBlast::save_output
> > /Library/Perl/5.10.0/Bio/Tools/Run/RemoteBlast.pm:678
> >
> > STACK toplevel /Users/mydata/Documents/workspace/BI7643/rb_ex.pl:40
> >
> > -------------------------------------
> > Here is the code:
> >
> > #!/usr/bin/perl -w
> >
> > use strict;
> >
> > use Bio::Tools::Run::RemoteBlast;
> >
> >
> > #=cut
> >
> > my $prog = 'blastp';
> >
> > my $db = 'swissprot';
> >
> > my $e_val = '1e-10';
> >
> >
> > my @params = ('-prog' => $prog,
> >
> > '-data' => $db,
> >
> > 'expect' => $e_val,
> >
> > 'readmethod' => 'SearchIO' );
> >
> > my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
> >
> >
> > #human database
> >
> > $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens
> > [ORGN]';
> >
> >
> > my $v =1; # this is just to turn on and off the messages
> >
> > # Construct the sequence object
> >
> > my $seq_in = Bio::SeqIO->new(-file => "/Users/mydata/Desktop/rb_ex.fa",
> -format
> > => "fasta");
> >
> >
> > while (my $input = $seq_in->next_seq()){
> >
> > my $r = $factory->submit_blast($input);
> >
> > print STDERR "waiting..." if ($v > 0);
> >
> > while (my @rids = $factory->each_rid()){
> >
> > foreach my $rid (@rids){
> >
> > my $rc = $factory->retrieve_blast($rid);
> >
> > if( !ref($rc) ) {
> >
> > if($rc < 0){
> >
> > $factory->remove_rid($rid);
> >
> > }
> >
> > print STDERR "." if ($v > 0);
> >
> > sleep 5;
> >
> > } else {
> >
> > my $result = $rc->next_result();
> >
> > #save output
> >
> > my $filename =
> ">/Users/mydata/Desktop/".$result->query_name().".out";#error
> >
> > $factory->save_output($filename);
> >
> > $factory->remove_rid($rid);
> >
> > print "\nQuery Name: ", $result->query_name(), "\n";
> >
> >          while ( my $hit = $result->next_hit ) {
> >
> >            next unless ( $v > 0);
> >
> >            print "\thit name is ", $hit->name, "\n";
> >
> >            while( my $hsp = $hit->next_hsp ) {
> >
> >              print "\t\tscore is ", $hsp->score, "\n";
> >
> > }
> >
> >          }
> >
> >        }
> >
> >      }
> >
> >    }
> >
> >  }
> >
> >
> >
> > Thanks for the help!


From lsbrath at gmail.com  Wed Nov 30 09:34:52 2011
From: lsbrath at gmail.com (Mgavi Brathwaite)
Date: Wed, 30 Nov 2011 09:34:52 -0500
Subject: [Bioperl-l] Exception MSG
In-Reply-To: <CAEiaqD=5V3_JVpzi1kjtgfCRmZg1VkOw2jWGUqNctkq9DbOnZw@mail.gmail.com>
References: <CAJm=ba9qxU2JY-i5K=sfuVcODpB941cRehpJgxcT0sThXYx68g@mail.gmail.com>
	<50C532E3-5212-49CE-A3CA-9FE91BDC00CF@gmail.com>
	<CAEiaqD=5V3_JVpzi1kjtgfCRmZg1VkOw2jWGUqNctkq9DbOnZw@mail.gmail.com>
Message-ID: <CAJm=ba-yP6q53NunpxPJzdurthGE2uN3GAtiGs7eHm1rY6AdoA@mail.gmail.com>

Surya,

As Jason suggested, I removed the '>' and it worked. Thanks for your
response.

Lom

On Wed, Nov 30, 2011 at 9:32 AM, Surya Saha <ss2489 at cornell.edu> wrote:

> If that does not fix it, try using one of the unique identifiers as the
> file name (gi??) instead of the full query name. The pipe(|) characters
> might cause problems.
>
> On Wed, Nov 30, 2011 at 1:05 AM, Jason Stajich <jason.stajich at gmail.com>wrote:
>
>> I don't think you need to give it the '>' when you specify the filename
>> for the output. That is done by the filehandle opening itsself.
>>
>> On Nov 29, 2011, at 9:25 PM, Mgavi Brathwaite wrote:
>>
>> > Hello,
>> >
>> > Brushing up on my BioPerl and I can't figure out this MSG:
>> >
>> > ------------- EXCEPTION -------------
>> >
>> > MSG: cannot open
>> >/Users/mydata/Desktop/gi|255572219|ref|XP_002527049|.out
>> >
>> > STACK Bio::Tools::Run::RemoteBlast::save_output
>> > /Library/Perl/5.10.0/Bio/Tools/Run/RemoteBlast.pm:678
>> >
>> > STACK toplevel /Users/mydata/Documents/workspace/BI7643/rb_ex.pl:40
>> >
>> > -------------------------------------
>> > Here is the code:
>> >
>> > #!/usr/bin/perl -w
>> >
>> > use strict;
>> >
>> > use Bio::Tools::Run::RemoteBlast;
>> >
>> >
>> > #=cut
>> >
>> > my $prog = 'blastp';
>> >
>> > my $db = 'swissprot';
>> >
>> > my $e_val = '1e-10';
>> >
>> >
>> > my @params = ('-prog' => $prog,
>> >
>> > '-data' => $db,
>> >
>> > 'expect' => $e_val,
>> >
>> > 'readmethod' => 'SearchIO' );
>> >
>> > my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>> >
>> >
>> > #human database
>> >
>> > $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens [ORGN]';
>> >
>> >
>> > my $v =1; # this is just to turn on and off the messages
>> >
>> > # Construct the sequence object
>> >
>> > my $seq_in = Bio::SeqIO->new(-file => "/Users/mydata/Desktop/rb_ex.fa",
>> >                              -format => "fasta");
>> >
>> >
>> > while (my $input = $seq_in->next_seq()){
>> >
>> > my $r = $factory->submit_blast($input);
>> >
>> > print STDERR "waiting..." if ($v > 0);
>> >
>> > while (my @rids = $factory->each_rid()){
>> >
>> > foreach my $rid (@rids){
>> >
>> > my $rc = $factory->retrieve_blast($rid);
>> >
>> > if( !ref($rc) ) {
>> >
>> > if($rc < 0){
>> >
>> > $factory->remove_rid($rid);
>> >
>> > }
>> >
>> > print STDERR "." if ($v > 0);
>> >
>> > sleep 5;
>> >
>> > } else {
>> >
>> > my $result = $rc->next_result();
>> >
>> > #save output
>> >
>> > my $filename = ">/Users/mydata/Desktop/".$result->query_name().".out"; #error
>> >
>> > $factory->save_output($filename);
>> >
>> > $factory->remove_rid($rid);
>> >
>> > print "\nQuery Name: ", $result->query_name(), "\n";
>> >
>> >          while ( my $hit = $result->next_hit ) {
>> >
>> >            next unless ( $v > 0);
>> >
>> >            print "\thit name is ", $hit->name, "\n";
>> >
>> >            while( my $hsp = $hit->next_hsp ) {
>> >
>> >              print "\t\tscore is ", $hsp->score, "\n";
>> >
>> > }
>> >
>> >          }
>> >
>> >        }
>> >
>> >      }
>> >
>> >    }
>> >
>> >  }
>> >
>> >
>> >
>> > Thanks for the help!


From ericdemuinck at gmail.com  Wed Nov 30 18:36:36 2011
From: ericdemuinck at gmail.com (Ericde)
Date: Wed, 30 Nov 2011 15:36:36 -0800 (PST)
Subject: [Bioperl-l] re trieving blast multiple alignment in fasta form
Message-ID: <32886592.post@talk.nabble.com>


:-/

I am a newbie and I am trying to retrieve a blast multiple alignment in
fasta form. The BLAST output (m -2) gives several alignments (which is good)
and the parsing of the xml file seems to list all of these alignments (which
is also good) 

The problem is that the fasta alignment file only includes one of the hits
and the alignment does not include all the sequences (including the query
sequence).

I would like to generate a fasta file that includes all the alignments
included in the m -2 output (plus query sequence if possible). I have
cobbled together a script (below) ...I will attach the sample blast xml file
and the (m -2) file as well....any insight is appreciated :/

#module load perl

use strict;
use warnings;
use Bio::SearchIO;
use Bio::AlignIO;

#give the name of the blast xml file to parse in the line where it says '-file =>'
#Use m -7 to generate xml file from blastall
my $in = Bio::SearchIO->new(-format => 'blastxml',
                            -file   => 'BLASToutxml');
while( my $result = $in->next_result ) {
  ## $result is a Bio::Search::Result::ResultI compliant object
  while( my $hit = $result->next_hit ) {
    ## $hit is a Bio::Search::Hit::HitI compliant object
    while( my $hsp = $hit->next_hsp ) {
      ## $hsp is a Bio::Search::HSP::HSPI compliant object
      #ENTER desired sequence length
      if( $hsp->length('total') > 50 ) {
        #ENTER desired percent identity
        if ( $hsp->percent_identity >= 75 ) {
          print "Query=",   $result->query_name,
            " Hit=",        $hit->name,
            " Length=",     $hsp->length('total'),
            " Percent_id=", $hsp->percent_identity, "\n";
          #Print alignment to file
          #$aln will be a Bio::SimpleAlign object
          my $aln = $hsp->get_aln;

          #changed msf to fasta and hsp.msf to hsp.fas, output is now a fasta file
          my $alnIO = Bio::AlignIO->new(-format => "fasta",
                                        -file   => ">hsp.fas");
          $alnIO->write_aln($aln);
        }
      }
    }
  }
}
http://old.nabble.com/file/p32886592/BLASToutxml BLASToutxml 
http://old.nabble.com/file/p32886592/hsp.fas hsp.fas 
-- 
View this message in context: http://old.nabble.com/retrieving-blast-multiple-alignment-in-fasta-form-tp32886592p32886592.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From hrh at fmi.ch  Tue Nov  1 10:18:54 2011
From: hrh at fmi.ch (Hotz, Hans-Rudolf)
Date: Tue, 1 Nov 2011 11:18:54 +0100
Subject: [Bioperl-l] best way to edit sequence features
In-Reply-To: <CAPOrs_2uZ6TPghqAXwVeXwSeHBm+iomTXMGqswR38V_L=SQWyw@mail.gmail.com>
Message-ID: <CAD5861E.14042%hrh@fmi.ch>

Hi Carnë

Please allow me to make a few comments:

I very much like your idea of writing a free tool to edit and draw
sequences. We (ie people working in core Bioinformatics facilities) all
suffer from having to deal with files originally created with commercial
packages. And on top of all the pain, those commercial packages are very
expensive and they don't deliver what they promise to do.


Just double checking: Have you looked at the free tools which are available?

I am aware of the following ones (as far as I know, they are all GUI based
and don't have a command line API):

Serial Cloner     http://serialbasics.free.fr/Serial_Cloner.html
GENtle            http://gentle.magnusmanske.de/
GeneCoder         http://www.algosome.com/gene-coder/gene-coder.html
pDRAW32           http://www.acaclone.com/
Genome Workbench  http://www.ncbi.nlm.nih.gov/projects/gbench/
Ape               http://www.biology.utah.edu/jorgensen/wayned/ape/
UGene             http://ugene.unipro.ru/

maybe others on the list know of even better free tools?

Also, have you looked at the emboss tool "cirdna" ?


WRT file formats: I strongly suggest sticking to embl and genbank format as
input and (text) output format. The features are not indexed, but you can
create your own when you store the sequences in your system. Internally, you
probably wanna keep the data in a 'simpler' format than embl or genbank,
anyway.

Alternatively, have you looked at gff/gtf as a way of getting features?
see: 

http://www.sequenceontology.org/gff3.shtml
http://mblab.wustl.edu/GTF22.html


I am looking forward to any progress you make

Regards, Hans


Hans-Rudolf Hotz, PhD
Bioinformatics Support

Friedrich Miescher Institute for Biomedical Research
Maulbeerstrasse 66
4058 Basel/Switzerland


On 10/31/11 7:05 PM, "Carnë Draug" <carandraug+dev at gmail.com> wrote:

> Hi
> 
> I've been planning on writing a free (as in freedom) tool to edit
> sequences and make plasmid maps. The idea is to build the command line
> tool first and maybe later work on a GUI for it.
> 
> The problem I foresee at the moment while designing it, is how to
> change a feature of the sequence. I'm not familiar with all sequence
> formats (only fasta, ensembl and genbank) but I can't see how to
> specify from the command line what feature to edit since I can't see
> any unique identifiers for them. Is there a file format that makes
> this easier? Any tips would be most appreciated.
> 
> Thanks in advance,
> Carnë Draug


From cjfields at illinois.edu  Tue Nov  1 13:40:30 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 1 Nov 2011 13:40:30 +0000
Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual
In-Reply-To: <E824C753-0688-4350-96B8-638BA1B37E73@neuro.utah.edu>
References: <E824C753-0688-4350-96B8-638BA1B37E73@neuro.utah.edu>
Message-ID: <26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu>

On Oct 24, 2011, at 9:58 AM, Sofia Robb wrote:

> Hi,
> 
> I am having problems running Bio::Index::Fastq.  I get the following error when a quality line begins with '@'.
> 
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: No description line parsed
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:368
> STACK: Bio::SeqIO::fastq::next_dataset /usr/share/perl5/Bio/SeqIO/fastq.pm:71
> STACK: Bio::SeqIO::fastq::next_seq /usr/share/perl5/Bio/SeqIO/fastq.pm:29
> STACK: Bio::Index::AbstractSeq::fetch /usr/share/perl5/Bio/Index/AbstractSeq.pm:147
> STACK: Bio::Index::AbstractSeq::get_Seq_by_id /usr/share/perl5/Bio/Index/AbstractSeq.pm:198
> STACK: /home_stajichlab/robb/bin/clean_pairs_indexed.pl:68
> 
> 
> Here is an example of a fastq record that is causing this error, The last line which starts with an '@'  is actually the qual line.
> 
> @5:105:15806:16092:Y
> GTGGCGCGGAACAGAGGAGGAATGTTCAGGAGAGGGGGCATGTGTTGTTACCGAGTACTTGGAAACGACG
> +
> @9;A565:=8B?<E<DEEBEE<E3BB?3??BCCF2<@@=BGGBDB60:64594.81?<B??;3?8-984?
> 
> 
> 
> i see that chris has partially addressed this in the mailing list
> http://bioperl.org/pipermail/bioperl-l/2011-January/034481.html
> 
> However as he pointed out at the time, it appears this may be a fairly large problem.

The indexer is being refactored to address this problem; the Bio::SeqIO parser actually does parse this, but the (very simple) indexer does not.  I can try to push this to the forefront this week, the fix shouldn't be too hard to implement.  In essence it would simply use a few SeqIO methods I built in to parse out each bit of data in chunks; would just need to track the start and length of each chunk while the parser is running.

> My fastq seq and qual lines are always only one line each, so I think that adding a line count and only checking for @ in the lines where $line_count % 4 == 0 would work, since the header lines are always the first of 4 lines: 0, 4, 8, etc.

That doesn't work for all cases, however (some FASTQ wraps the seq and qual, like FASTA). Peter and I have discussed this elsewhere; a possible solution is to add in an optimized parser that takes this assumption into account. 
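
To make the wrapped case concrete, here is a minimal sketch in plain Perl of reading one record when seq and qual may span several lines. It is not the Bio::SeqIO::fastq code; next_fastq_record is a hypothetical helper, and it assumes the quality string is exactly as long as the sequence. Because quality lines are consumed by length, a quality line starting with '@' is never tested as a header.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: read one FASTQ record from $fh.
# Sequence lines are collected up to the '+' separator; quality lines are then
# collected until the quality string is as long as the sequence.
sub next_fastq_record {
    my ($fh) = @_;
    my $header = <$fh>;
    return unless defined $header;
    chomp $header;
    $header =~ s/^\@// or die "Expected '\@' header, got: $header\n";

    my $seq = '';
    while (defined(my $line = <$fh>)) {
        chomp $line;
        last if $line =~ /^\+/;        # '+' or '+id' ends the sequence block
        $seq .= $line;
    }

    my $qual = '';
    while (length($qual) < length($seq)) {
        my $line = <$fh>;
        die "Truncated quality for $header\n" unless defined $line;
        chomp $line;
        $qual .= $line;
    }
    die "Quality longer than sequence for $header\n"
        if length($qual) != length($seq);

    my ($id) = split ' ', $header;
    return { id => $id, seq => $seq, qual => $qual };
}

# Small made-up example: r1 is wrapped, and both quality blocks start with '@'.
my $data = <<'END';
@r1 first
ACGT
ACGT
+
@III
IIII
@r2
TTGG
+r2
@@@@
END

open my $fh, '<', \$data or die "Cannot open in-memory data: $!";
while (my $rec = next_fastq_record($fh)) {
    print join("\t", $rec->{id}, length $rec->{seq}, $rec->{qual}), "\n";
}
```

An indexer built on the same loop would only need to call tell() before each header to record where the record starts.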

One problem the various Bio* indexers have currently is the lack of standardization on a specific schema for indexing.  There are in-roads towards this (OBDA) that haven't been adequately traveled IMHO, which need to be taken up again.

A second, and maybe this is more specific to BioPerl, is that the parsers and indexers essentially reimplement the format parsing in each module, so if there are bugs they have to be independently fixed (hence why SeqIO works and the indexer doesn't; I wrote the first but not the second).  The best place for any optimizations would be in a unified parser that both the SeqIO and indexer modules could use.

> But if there are multiple lines of seq and qual I think that /^\+$/ or /^\+$id/ can be used to identify the end of the sequence, and the number of lines of quality should be equal to the number of lines of sequence.
> 
> 
> ## only for single line seq and qual
> my $line_count = 0;
>   while (<$FASTQ>) {
>       if (/^@/ and  $line_count % 4 == 0) {
>           # $begin is the position of the first character after the '@'
>           my $begin = tell($FASTQ) - length( $_ ) + 1;
>           foreach my $id (&$id_parser($_)) {
>               $self->add_record($id, $i, $begin);
>               $c++;
>           }
>       }
>       $line_count++;
>   }
> 
> 
> --
> BioPerl fastq parsing issues aside, is there another tool which allows you to retrieve arbitrary sequences from a fastq file by sequence ID?
> 
> There's one called cdbfasta which looks like it might work - does anyone have experience with it?

I haven't, but it appears FASTA-specific.  Does it parse FASTQ as well?  

I recall Sanger has a C-based FASTA/FASTQ hybrid one as well.  May have to look that one up.

> Thanks,
> sofia
> 
> P.S. I am CCing Peter Cock in case BioPython has solved this issue already - if so, perhaps their solution could be applied here.


chris


From p.j.a.cock at googlemail.com  Tue Nov  1 14:38:43 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 1 Nov 2011 14:38:43 +0000
Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual
In-Reply-To: <26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu>
References: <E824C753-0688-4350-96B8-638BA1B37E73@neuro.utah.edu>
	<26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu>
Message-ID: <CAKVJ-_5_NEXALy1abueFCQqKRnteJpu=__VCqSDGyemorxs53A@mail.gmail.com>

On Tue, Nov 1, 2011 at 1:40 PM, Fields, Christopher J
<cjfields at illinois.edu> wrote:
>
> One problem the various Bio* indexers have currently is the lack of
> standardization on a specific schema for indexing.  There are in-roads
> towards this (OBDA) that haven't been adequately traveled IMHO,
> which need to be taken up again.
>

Something to switch to open-bio-l at lists.open-bio.org for,
http://lists.open-bio.org/mailman/listinfo/open-bio-l

We can continue this thread from last summer,
http://lists.open-bio.org/pipermail/open-bio-l/2010-April/000662.html
http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000676.html
...
http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000680.html

And CC Peter Rice from EMBOSS too - we chatted about this
at ISMB/BOSC 2011 in July - and whomever looks after the
OBDA/indexing code in BioRuby and BioJava too.

> A second, and maybe this is more specific to BioPerl, is that the
> parsers and indexers essentially reimplement the format parsing
> in each module, so if there are bugs they have to be independently
> fixed (hence why SeqIO works and the indexer doesn't; I wrote the
> first but not the second).  The best place for any optimizations
> would be in a unified parser that both the SeqIO and indexer
> modules could use.

We have that problem to an extent in Biopython's Bio.SeqIO code.
The indexing code duplicates some logic of the parsing code
(how much depends on the format), sufficient to extract the read
ID and the bounds on disk. The two could be more unified but
the parsers came first and I didn't want to change them at the time.
Instead I tried to be rigorous in consistency testing for the index
code's unit tests.

Regards,

Peter


From carandraug+dev at gmail.com  Tue Nov  1 15:13:06 2011
From: carandraug+dev at gmail.com (Carnë Draug)
Date: Tue, 1 Nov 2011 15:13:06 +0000
Subject: [Bioperl-l] best way to edit sequence features
In-Reply-To: <CAD5861E.14042%hrh@fmi.ch>
References: <CAPOrs_2uZ6TPghqAXwVeXwSeHBm+iomTXMGqswR38V_L=SQWyw@mail.gmail.com>
	<CAD5861E.14042%hrh@fmi.ch>
Message-ID: <CAPOrs_0rZcokpSvAhMM3gtKWgeH3knDuTfnyybPJUU5D-WEgpA@mail.gmail.com>

On 1 November 2011 10:18, Hotz, Hans-Rudolf <hrh at fmi.ch> wrote:
> I am aware of the following ones (as far as I know, they are all GUI based
> and don't have a command line API):

They are not all free. Just for future reference, here are their licenses:

> Serial Cloner

Couldn't find a license and the download for linux has no source so
I'm guessing proprietary.

> GENtle            http://gentle.magnusmanske.de/

Free under GPL

> GeneCoder

Proprietary

> pDRAW32

Proprietary

> Genome Workbench  http://www.ncbi.nlm.nih.gov/projects/gbench/

Seems public domain. License is not defined anywhere but the files I
checked had the public domain notice on the header

> Ape

Proprietary ("license" is at the top of AppMain.tcl)

> UGene             http://ugene.unipro.ru/

Free under GPL

> Also, have you looked at the emboss tool "cirdna" ?

Free under GPL

> WRT file formats: I strongly suggest sticking to embl and genbank format as
> input and (text) output format. The features are not indexed, but you can
> create your own when you store the sequences in your system. Internally, you
> probably wanna keep the data in a 'simpler' format than embl or genbank,
> anyway.
>
> Alternatively, have you looked at gff/gtf as a way of getting features?
> see:
>
> http://www.sequenceontology.org/gff3.shtml
> http://mblab.wustl.edu/GTF22.html

Considering the already existing alternatives, I'm more likely to
collaborate with one of them to do what I want. I'll just have to
check them all and decide. I was planning on writing a new tool and
contributing it to the scripts section of bioperl since, when I googled
before, only the proprietary tools showed up. Thank you
very much for the links.

Carnë


From roy.chaudhuri at gmail.com  Tue Nov  1 15:44:19 2011
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Tue, 01 Nov 2011 15:44:19 +0000
Subject: [Bioperl-l] best way to edit sequence features
In-Reply-To: <CAD5861E.14042%hrh@fmi.ch>
References: <CAD5861E.14042%hrh@fmi.ch>
Message-ID: <4EB013D3.30801@gmail.com>

The Sanger Institute's Artemis is good for editing sequence features, 
and DNAPlotter can be used to produce circular diagrams:

http://www.sanger.ac.uk/resources/software/artemis
http://www.sanger.ac.uk/resources/software/dnaplotter

Roy.

On 01/11/2011 10:18, Hotz, Hans-Rudolf wrote:
> Hi Carnë
>
> Please allow me to make a few comments:
>
> I very much like your idea of writing a free tool to edit and draw
> sequences. We (ie people working in core Bioinformatics facilities) all
> suffer from having to deal with files originally created with commercial
> packages. And on top of all the pain, those commercial packages are very
> expensive and they don't deliver what they promise to do.
>
>
> Just double checking: Have you looked at the free tools which are available?
>
> I am aware of the following ones (as far as I know, they are all GUI based
> and don't have a command line API):
>
> Serial Cloner     http://serialbasics.free.fr/Serial_Cloner.html
> GENtle            http://gentle.magnusmanske.de/
> GeneCoder         http://www.algosome.com/gene-coder/gene-coder.html
> pDRAW32           http://www.acaclone.com/
> Genome Workbench  http://www.ncbi.nlm.nih.gov/projects/gbench/
> Ape               http://www.biology.utah.edu/jorgensen/wayned/ape/
> UGene             http://ugene.unipro.ru/
>
> maybe others on the list know of even better free tools?
>
> Also, have you looked at the emboss tool "cirdna" ?
>
>
> WRT file formats: I strongly suggest sticking to embl and genbank format as
> input and (text) output format. The features are not indexed, but you can
> create your own when you store the sequences in your system. Internally, you
> probably wanna keep the data in a 'simpler' format than embl or genbank,
> anyway.
>
> Alternatively, have you looked at gff/gtf as a way of getting features?
> see:
>
> http://www.sequenceontology.org/gff3.shtml
> http://mblab.wustl.edu/GTF22.html
>
>
>
> I am looking forward to any progress you make
>
> Regards, Hans
>
>
>
> Hans-Rudolf Hotz, PhD
> Bioinformatics Support
>
> Friedrich Miescher Institute for Biomedical Research
> Maulbeerstrasse 66
> 4058 Basel/Switzerland
>
>
>
> On 10/31/11 7:05 PM, "Carnë Draug"<carandraug+dev at gmail.com>  wrote:
>
>> Hi
>>
>> I've been planning on writing a free (as in freedom) tool to edit
>> sequences and make plasmid maps. The idea is to build the command line
>> tool first and maybe later work on a GUI for it.
>>
>> The problem I foresee at the moment while designing it, is how to
>> change a feature of the sequence. I'm not familiar with all sequence
>> formats (only fasta, ensembl and genbank) but I can't see how to
>> specify from the command line what feature to edit since I can't see
>> any unique identifiers for them. Is there a file format that makes
>> this easier? Any tips would be most appreciated.
>>
>> Thanks in advance,
>> Carnë Draug


From jason.stajich at gmail.com  Tue Nov  1 16:02:24 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Tue, 1 Nov 2011 09:02:24 -0700
Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual
In-Reply-To: <CAKVJ-_5_NEXALy1abueFCQqKRnteJpu=__VCqSDGyemorxs53A@mail.gmail.com>
References: <E824C753-0688-4350-96B8-638BA1B37E73@neuro.utah.edu>
	<26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu>
	<CAKVJ-_5_NEXALy1abueFCQqKRnteJpu=__VCqSDGyemorxs53A@mail.gmail.com>
Message-ID: <47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com>


I think a different indexer is needed for the scale of key/value pairs we see in fastq files if we want to make a fast lookup by ID. I think speed is of the essence for this type of solution, so requiring that all records be exactly 4 lines long is okay for this type of implementation. 

I found NOSQL implementations to perform much better than any of the BDB type solutions -- the latter end up being really slow above 1-5M keys.  I used TokyoCabinet and KyotoCabinet to do indexing of accession -> taxonomy ID and found them quite fast for my needs. I haven't tried storing 100bp reads + qual string as the value yet but I think it could be done; certainly worth a prototype.

Jason
On Nov 1, 2011, at 7:38 AM, Peter Cock wrote:

> On Tue, Nov 1, 2011 at 1:40 PM, Fields, Christopher J
> <cjfields at illinois.edu> wrote:
>> 
>> One problem the various Bio* indexers have currently is the lack of
>> standardization on a specific schema for indexing.  There are in-roads
>> towards this (OBDA) that haven't been adequately traveled IMHO,
>> which need to be taken up again.
>> 
> 
> Something to switch to open-bio-l at lists.open-bio.org for,
> http://lists.open-bio.org/mailman/listinfo/open-bio-l
> 
> We can continue this thread from last summer,
> http://lists.open-bio.org/pipermail/open-bio-l/2010-April/000662.html
> http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000676.html
> ...
> http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000680.html
> 
> And CC Peter Rice from EMBOSS too - we chatted about this
> at ISMB/BOSC 2011 in July - and whomever looks after the
> OBDA/indexing code in BioRuby and BioJava too.
> 
>> A second, and maybe this is more specific to BioPerl, is that the
>> parsers and indexers essentially reimplement the format parsing
>> in each module, so if there are bugs they have to be independently
>> fixed (hence why SeqIO works and the indexer doesn't; I wrote the
>> first but not the second).  The best place for any optimizations
>> would be in a unified parser that both the SeqIO and indexer
>> modules could use.
> 
> We have that problem to an extent in Biopython's Bio.SeqIO code.
> The indexing code duplicates some logic of the parsing code
> (how much depends on the format), sufficient to extract the read
> ID and the bounds on disk. The two could be more unified but
> the parsers came first and didn't want to change them at the time.
> Instead I tried to be rigorous in consistency testing for the index
> code's unit tests.
> 
> Regards,
> 
> Peter


From cjfields at illinois.edu  Tue Nov  1 17:44:25 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 1 Nov 2011 17:44:25 +0000
Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual
In-Reply-To: <47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com>
References: <E824C753-0688-4350-96B8-638BA1B37E73@neuro.utah.edu>
	<26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu>
	<CAKVJ-_5_NEXALy1abueFCQqKRnteJpu=__VCqSDGyemorxs53A@mail.gmail.com>
	<47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com>
Message-ID: <6EFDCA41-D385-4CCB-A17B-0736F93DACDF@illinois.edu>

On Nov 1, 2011, at 11:02 AM, Jason Stajich wrote:

> I think a different indexer is needed for the scale of key/value pairs we see in fastq files if we want to make a fast lookup by ID. I think speed is of essence for this type of solution and so a forced all records must be 4 lines long is okay for this type of implementation. 

This can always be an early optimization, that's easy enough. But I'm sure we will have to deal with multi-line seq/qual FASTQ at some point.  

> I found NOSQL implementations to be much better performance and than any of the BDB type solutions -- they end up being really slow at above 1-5M keys.  I used TokyoCabinet and KyotoCabinet to do indexing of accession -> taxonomy ID and found it quite fast for the needs. I haven't tried storing 100bp reads + qual string as the value in it yet but I think it could be done, certainly worth a prototype.

Adding a middle layer where the backend storage is abstracted is probably the (best|most flexible) option, converging on a good default that will work for this data.  The actual interface is in place, though would it be more feasible to go the OBDA route (converge on a cross-Bio* compatible schema)?  Or are there problems afoot there we're unaware of?

Re: specifics, I think Biopython uses SQLite, is that correct Peter?  

chris

> Jason
> On Nov 1, 2011, at 7:38 AM, Peter Cock wrote:
> 
>> On Tue, Nov 1, 2011 at 1:40 PM, Fields, Christopher J
>> <cjfields at illinois.edu> wrote:
>>> 
>>> One problem the various Bio* indexers have currently is the lack of
>>> standardization on a specific schema for indexing.  There are in-roads
>>> towards this (OBDA) that haven't been adequately traveled IMHO,
>>> which need to be taken up again.
>>> 
>> 
>> Something to switch to open-bio-l at lists.open-bio.org for,
>> http://lists.open-bio.org/mailman/listinfo/open-bio-l
>> 
>> We can continue this thread from last summer,
>> http://lists.open-bio.org/pipermail/open-bio-l/2010-April/000662.html
>> http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000676.html
>> ...
>> http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000680.html
>> 
>> And CC Peter Rice from EMBOSS too - we chatted about this
>> at ISMB/BOSC 2011 in July - and whomever looks after the
>> OBDA/indexing code in BioRuby and BioJava too.
>> 
>>> A second, and maybe this is more specific to BioPerl, is that the
>>> parsers and indexers essentially reimplement the format parsing
>>> in each module, so if there are bugs they have to be independently
>>> fixed (hence why SeqIO works and the indexer doesn't; I wrote the
>>> first but not the second).  The best place for any optimizations
>>> would be in a unified parser that both the SeqIO and indexer
>>> modules could use.
>> 
>> We have that problem to an extent in Biopython's Bio.SeqIO code.
>> The indexing code duplicates some logic of the parsing code
>> (how much depends on the format), sufficient to extract the read
>> ID and the bounds on disk. The two could be more unified but
>> the parsers came first and didn't want to change them at the time.
>> Instead I tried to be rigorous in consistency testing for the index
>> code's unit tests.
>> 
>> Regards,
>> 
>> Peter


From p.j.a.cock at googlemail.com  Tue Nov  1 18:06:50 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 1 Nov 2011 18:06:50 +0000
Subject: [Bioperl-l] Bio::Index::Fastq '@' in qual
In-Reply-To: <6EFDCA41-D385-4CCB-A17B-0736F93DACDF@illinois.edu>
References: <E824C753-0688-4350-96B8-638BA1B37E73@neuro.utah.edu>
	<26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu>
	<CAKVJ-_5_NEXALy1abueFCQqKRnteJpu=__VCqSDGyemorxs53A@mail.gmail.com>
	<47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com>
	<6EFDCA41-D385-4CCB-A17B-0736F93DACDF@illinois.edu>
Message-ID: <CAKVJ-_5iiR1-70V17KBVm-vz5hkwRf6NVLWtAWe+HOUHO+1M2w@mail.gmail.com>

On Tue, Nov 1, 2011 at 5:44 PM, Fields, Christopher J
<cjfields at illinois.edu> wrote:
> On Nov 1, 2011, at 11:02 AM, Jason Stajich wrote:
>
>> I think a different indexer is needed for the scale of key/value
>> pairs we see in fastq files if we want to make a fast lookup by
>> ID. I think speed is of essence for this type of solution and so
>> a forced all records must be 4 lines long is okay for this type
>> of implementation.
>
> This can always be an early optimization, that's easy enough.
> But I'm sure we will have to deal with multi-line seq/qual
> FASTQ at some point.
>
>> I found NOSQL implementations to be much better
>> performance and than any of the BDB type solutions -- they
>> end up being really slow at above 1-5M keys.  I used
>> TokyoCabinet and KyotoCabinet to do indexing of accession
>> -> taxonomy ID and found it quite fast for the needs. I
>> haven't tried storing 100bp reads + qual string as the
>> value in it yet but I think it could be done, certainly worth
>> a prototype.
>
> Adding a middle layer where the backend storage is abstracted
> is the probably the (best|most flexible) option, converging on a
> good default that will work for this data.  The actual interface is
> in place, though would it be more feasible to go the OBDA
> (converge on a cross-Bio* compatible schema)?  Or are there
> problems afoot there we're unaware of?
>
> Re: specifics, I think Biopython uses SQLite, is that correct Peter?
>
> chris

Yes, we're using SQLite3 to store essentially a list of filenames
and their format as one table, and then in the main table an
entry for each sequence recording the ID (only one accession,
unlike OBDA which had infrastructure for a secondary accession),
file number, offset of the start of the record, and optionally the
length of the record on disk.

i.e. Basically what OBDA does, but using SQLite rather
than BDB (not included in Python 3) or a flat file index
(poor performance with large datasets).

I find this design attractive on several levels:
* File format neutral, covers FASTA, FASTQ, GenBank, etc
* Preserves the original file untouched
* Index is a small single file (thanks to SQLite)
* Back end could be switched out
* Could be applied to compressed file formats
* Reuses existing parsing code to access entries
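
A rough sketch of that layout in plain Perl, with an array and a hash
standing in for the two SQLite tables (this is not the Biopython code; the
names index_fastq and fetch_raw are made up for illustration, and it assumes
four-line FASTQ records):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-ins for the two tables: @files holds [name, format] rows,
# %records maps a record ID to [file number, byte offset, byte length].
my (@files, %records);

# Index a FASTQ file, assuming every record is exactly four lines long.
sub index_fastq {
    my ($path) = @_;
    push @files, [$path, 'fastq'];
    my $file_no = $#files;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;
    while (1) {
        my $offset = tell $fh;
        my @lines;
        push @lines, scalar <$fh> for 1 .. 4;
        last unless defined $lines[0];
        die "Incomplete record at byte $offset in $path\n"
            if grep { !defined } @lines;
        my ($id) = $lines[0] =~ /^\@(\S+)/
            or die "No '\@' header at byte $offset in $path\n";
        $records{$id} = [$file_no, $offset, tell($fh) - $offset];
    }
    close $fh;
}

# Return the raw text of one record, read straight from the original file.
sub fetch_raw {
    my ($id) = @_;
    my $row = $records{$id} or return;
    my ($file_no, $offset, $length) = @$row;
    open my $fh, '<', $files[$file_no][0] or die "Cannot open file: $!";
    binmode $fh;
    seek $fh, $offset, 0;
    read $fh, my $raw, $length;
    close $fh;
    return $raw;
}

# Made-up two-record file; the first quality line starts with '@'.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "\@read1\nACGT\n+\n\@9;A\n\@read2\nGGCC\n+\nIIII\n";
close $out;

index_fastq($path);
print fetch_raw('read2');
```

The raw text that comes back can be handed to whatever parser already
exists for that format, which is the "reuses existing parsing code" point.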

This could easily form the basis of OBDA v2, the main points
of difference I anticipate between the Bio* projects would
be naming conventions for the different file formats, and
what we consider to be the default record ID of each read
(e.g. which field in a GenBank file - although agreement
here is not essential). Some of that was already settled in
principle with OBDA v1.

On the other hand, you could try and store the parsed data
itself, which is where NOSQL looks more interesting. That
essentially requires the ability to serialise your annotated
sequence object model to disk - which would be tricky to do
cross project (much more ambitious than BioSQL is). It also
means the "index" becomes very large because it now holds
all the original data.

Peter


From wenbinmei at gmail.com  Wed Nov  2 04:25:32 2011
From: wenbinmei at gmail.com (wenbin mei)
Date: Wed, 2 Nov 2011 00:25:32 -0400
Subject: [Bioperl-l] how to not count gaps in the multiple sequence alignment
Message-ID: <CAHdrE2Q6weQ+7t_4X3_AZmu4JLQ3uGf3=s14UuOpDVa368V9aA@mail.gmail.com>

Hi,

I need some help in coding. I have a multiple sequence alignment which has
gaps. I also have a reference genome sequence in the alignment, for which I
know the coordinates of all the protein coding genes. I want to extract
the alignments of all these protein coding genes from the big alignment. I am
using Bio::SimpleAlign, but the problem is that, due to the gaps in the
alignment, the coordinates have shifted. I wonder whether there is a way to
not count the gaps and still be able to extract the protein alignments. One
way would be to remove the gaps in the reference first and then extract the
sequence, but I don't like this way ... Thank you for your help.

-best,
wenbin


From dejian.zhao at gmail.com  Wed Nov  2 13:33:18 2011
From: dejian.zhao at gmail.com (Dejian Zhao)
Date: Wed, 02 Nov 2011 21:33:18 +0800
Subject: [Bioperl-l] Modules to read MEGA output and reproduce the
	phylogenetic tree
Message-ID: <4EB1469E.4050108@gmail.com>

There are various packages on CPAN to cope with phylogenetic analysis. I
wonder which module can read the output of other phylogenetic
software, e.g. MEGA, and reproduce the phylogenetic tree. I want to
produce a picture which combines the phylogenetic tree and the structure
of each gene.


From roy.chaudhuri at gmail.com  Wed Nov  2 13:49:46 2011
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Wed, 02 Nov 2011 13:49:46 +0000
Subject: [Bioperl-l] Modules to read MEGA output and reproduce the
 phylogenetic tree
In-Reply-To: <4EB1469E.4050108@gmail.com>
References: <4EB1469E.4050108@gmail.com>
Message-ID: <4EB14A7A.30307@gmail.com>

MEGA can export trees in Newick format, which can be read by 
Bio::TreeIO. The tree can be drawn in EPS format using 
Bio::Tree::Draw::Cladogram. See:
http://www.bioperl.org/wiki/HOWTO:Trees

Roy.

On 02/11/2011 13:33, Dejian Zhao wrote:
> There are various packages on CPAN to cope with phylogenetic analysis. I
> wonder which module can read the output from other phylogenetic
> softwares, e.g. MEGA, and reproduce the phylogenetic tree. I want to
> produce a picture which combines the phylogenetic tree and the structure
> of each gene.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From jun.yin at ucd.ie  Wed Nov  2 16:29:45 2011
From: jun.yin at ucd.ie (Jun Yin)
Date: Wed, 02 Nov 2011 16:29:45 +0000 (GMT)
Subject: [Bioperl-l] how to not count gaps in the multiple sequence
 alignment
In-Reply-To: <CAHdrE2Q6weQ+7t_4X3_AZmu4JLQ3uGf3=s14UuOpDVa368V9aA@mail.gmail.com>
References: <CAHdrE2Q6weQ+7t_4X3_AZmu4JLQ3uGf3=s14UuOpDVa368V9aA@mail.gmail.com>
Message-ID: <7300ecdd1dd56.4eb16ff9@ucd.ie>

Hi,
 
You need to calculate the coordinates of the protein coding gene in the
alignment yourself. After that, you can use the slice function to get the
alignment block for the selected gene, e.g.
 
$aln2 = $aln->slice(20, 30);
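
The coordinate calculation itself can be sketched as follows. This is a language-neutral illustration written in Python, not BioPerl code; the function name column_from_residue and the toy reference row are made up for the example, and it assumes the reference row in the alignment starts at residue 1 of the reference. The two columns it returns are what would be passed to slice. Bio::SimpleAlign also documents a column_from_residue_number method that may do this directly; check the documentation of your installed version.

```python
def column_from_residue(gapped_seq, residue_number, gap_chars="-."):
    """Return the 1-based alignment column holding a 1-based residue number.

    gapped_seq is the reference row exactly as it appears in the alignment.
    """
    seen = 0
    for column, char in enumerate(gapped_seq, start=1):
        if char not in gap_chars:
            seen += 1  # count residues only, never gap characters
            if seen == residue_number:
                return column
    raise ValueError("residue number exceeds ungapped sequence length")


# Toy example: a gene at reference residues 3..6 (ungapped coordinates).
reference_row = "AT--GC-TTA"
start_col = column_from_residue(reference_row, 3)
end_col = column_from_residue(reference_row, 6)

# The same columns cut out of every row of the alignment.
rows = [reference_row, "ATCCGCATTA", "AT--GCATT-"]
gene_block = [row[start_col - 1:end_col] for row in rows]
```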
 
Cheers,
Jun


----- Original Message -----
From: wenbin mei <wenbinmei at gmail.com>
Date: Wednesday, November 2, 2011 5:51 am
Subject: [Bioperl-l] how to not count gaps in the multiple sequence alignment
To: bioperl-l at lists.open-bio.org

> Hi,
> 
> I need some help in coding. I have a multiple sequence alignment 
> which has
> gaps. And also I have a reference genome sequence in the 
> alignment which I
> know all the coordinates for the protein coding genes. I want to 
> extractall these protein coding genes alignment from the big 
> alignment. I am using
> Bio SimpleAlign but the question is that due to the gaps in the 
> alignment,the coordinates has shifted in the alignment. I wonder 
> is there a way I can
> not count the gaps and still be able to extract the protein 
> alignment. One
> way I can do is remove the gaps in the reference first and then 
> extract the
> sequence. But I don't like this way ... Thank you for help.
> 
> -best,
> wenbin


From dejian.zhao at gmail.com  Thu Nov  3 01:39:22 2011
From: dejian.zhao at gmail.com (Dejian Zhao)
Date: Thu, 03 Nov 2011 09:39:22 +0800
Subject: [Bioperl-l] Modules to read MEGA output and reproduce the
 phylogenetic tree
In-Reply-To: <4EB14A7A.30307@gmail.com>
References: <4EB1469E.4050108@gmail.com> <4EB14A7A.30307@gmail.com>
Message-ID: <4EB1F0CA.80309@gmail.com>

That's great!
Many thanks, Roy.

On 2011-11-2 21:49, Roy Chaudhuri wrote:
> MEGA can export trees in Newick format, which can be read by 
> Bio::TreeIO. The tree can be drawn in EPS format using 
> Bio::Tree::Draw::Cladogram. See:
> http://www.bioperl.org/wiki/HOWTO:Trees
>
> Roy.
>
> On 02/11/2011 13:33, Dejian Zhao wrote:
>> There are various packages on CPAN to cope with phylogenetic analysis. I
>> wonder which module can read the output from other phylogenetic
>> softwares, e.g. MEGA, and reproduce the phylogenetic tree. I want to
>> produce a picture which combines the phylogenetic tree and the structure
>> of each gene.


From noncoding at gmail.com  Thu Nov  3 09:59:26 2011
From: noncoding at gmail.com (Remo Sanges)
Date: Thu, 03 Nov 2011 10:59:26 +0100
Subject: [Bioperl-l] how to not count gaps in the multiple sequence
	alignment
In-Reply-To: <7300ecdd1dd56.4eb16ff9@ucd.ie>
References: <CAHdrE2Q6weQ+7t_4X3_AZmu4JLQ3uGf3=s14UuOpDVa368V9aA@mail.gmail.com>
	<7300ecdd1dd56.4eb16ff9@ucd.ie>
Message-ID: <4EB265FE.30909@gmail.com>

To get the location in the initial sequence starting from a column in a 
multiple alignment you can:

1) create a Bio::LocatableSeq compliant object by using the method 
each_seq_with_id on the SimpleAlign object

2) then use the method location_from_column on the resulting
LocatableSeq object
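
For readers who want to see what such a column-to-position lookup involves, here is a small Python sketch of the idea. It is not the BioPerl implementation: residue_from_column is a made-up name, and how a gap column is reported (here: the number of the last residue before it, plus a flag) is an assumption of this example.

```python
def residue_from_column(gapped_seq, column, gap_chars="-."):
    """Map a 1-based alignment column to a 1-based residue number.

    Returns (residue_number, is_gap). For a gap column, residue_number is
    the last residue seen before the gap (0 if the row starts with gaps).
    """
    if not 1 <= column <= len(gapped_seq):
        raise ValueError("column outside the alignment")
    # Count the residues up to and including the requested column.
    residues = sum(1 for char in gapped_seq[:column] if char not in gap_chars)
    is_gap = gapped_seq[column - 1] in gap_chars
    return residues, is_gap
```

For the row "AT--GC-TTA", column 5 holds residue 3, while column 3 is a gap that follows residue 2.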

HTH

Remo


-- 
Remo Sanges
Bioinformatics - Animal Physiology and Evolution
Stazione Zoologica Anton Dohrn
Villa Comunale, 80121 Napoli - Italy
+39 081 5833428


On 11/2/11 5:29 PM, Jun Yin wrote:
> Hi,
>
> You need to calculate the coordinates of the protein coding gene in the alignment by yourself. After that, you can use the slice function to get the alignment block for the selected gene, e.g.
>
> $aln2 = $aln->slice(20, 30);
>
> Cheers,
> Jun
>
>
> ----- Original Message -----
> From: wenbin mei<wenbinmei at gmail.com>
> Date: Wednesday, November 2, 2011 5:51 am
> Subject: [Bioperl-l] how to not count gaps in the multiple sequence alignment
> To: bioperl-l at lists.open-bio.org
>
>> Hi,
>>
>> I need some help in coding. I have a multiple sequence alignment
>> which has
>> gaps. And also I have a reference genome sequence in the
>> alignment which I
>> know all the coordinates for the protein coding genes. I want to
>> extractall these protein coding genes alignment from the big
>> alignment. I am using
>> Bio SimpleAlign but the question is that due to the gaps in the
>> alignment,the coordinates has shifted in the alignment. I wonder
>> is there a way I can
>> not count the gaps and still be able to extract the protein
>> alignment. One
>> way I can do is remove the gaps in the reference first and then
>> extract the
>> sequence. But I don't like this way ... Thank you for help.
>>
>> -best,
>> wenbin


From G.Gallone at sms.ed.ac.uk  Thu Nov  3 11:50:11 2011
From: G.Gallone at sms.ed.ac.uk (Giuseppe G.)
Date: Thu, 03 Nov 2011 11:50:11 +0000
Subject: [Bioperl-l] Bio::SimpleAlign - Meaning of
	overall_percentage_identity?
Message-ID: <4EB27FF3.9050203@sms.ed.ac.uk>

Hi,

I would be grateful if you could shed some light on the exact meaning of 
the method overall_percentage_identity in Bio::SimpleAlign.

If I understand correctly, the method works by considering only
amino acids that are identical over all the members of the alignment, and
then averaging over the total number of amino acids in the sequence. Is
this correct?

Thank you
Giuseppe
-- 

The University of Edinburgh is a charitable body, registered in 
Scotland, with registration number SC005336.


From David.Messina at sbc.su.se  Thu Nov  3 13:22:21 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 3 Nov 2011 14:22:21 +0100
Subject: [Bioperl-l] Bio::SimpleAlign - Meaning of
	overall_percentage_identity?
In-Reply-To: <4EB27FF3.9050203@sms.ed.ac.uk>
References: <4EB27FF3.9050203@sms.ed.ac.uk>
Message-ID: <CAM3TQQWm46SWfu-6DANDaoppi8oLKGuzwGm8uxkVkf_JAog3xg@mail.gmail.com>

Hi Giuseppe,

> If I understand correctly, the method works by considering only aminoacids
> that are identical over all the members of the alignment


Yes.


> , and then averaging over the total number of aminoacids in the sequence.
> Is this correct?
>

Almost.

By default, the denominator is the alignment length, namely the length of
the MSA including gaps. By means of the 'short' and 'long' options, it's
also possible to use the shortest or longest sequence's ungapped lengths as
the denominator.
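
To make the three denominators concrete, here is a small Python sketch of the calculation as described above. It is an illustration, not the Bio::SimpleAlign source: the function and parameter names are invented, and treating a column that contains any gap as non-identical is an assumption of this example.

```python
def overall_percentage_identity(rows, denominator="align", gap_chars="-."):
    """Percent of columns whose residues are identical across all rows.

    denominator: "align" (alignment length including gaps), "short" or
    "long" (ungapped length of the shortest / longest sequence).
    """
    if not rows or len({len(row) for row in rows}) != 1:
        raise ValueError("rows must be non-empty and of equal length")
    # A column counts only if it has no gap and a single distinct residue.
    identical = sum(
        1 for column in zip(*rows)
        if not any(char in gap_chars for char in column)
        and len({char.upper() for char in column}) == 1
    )
    ungapped = [sum(char not in gap_chars for char in row) for row in rows]
    lengths = {"align": len(rows[0]),
               "short": min(ungapped),
               "long": max(ungapped)}
    return 100.0 * identical / lengths[denominator]


rows = ["ACGT-A", "ACGTTA", "AC-TTA"]
default_identity = overall_percentage_identity(rows)          # 4 of 6 columns
short_identity = overall_percentage_identity(rows, "short")   # 4 of 5 residues
```

In this toy alignment four columns are fully conserved, so the default gives 4/6 and the 'short' variant gives 4/5.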


Dave


From cjfields at illinois.edu  Thu Nov  3 18:28:36 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 3 Nov 2011 18:28:36 +0000
Subject: [Bioperl-l] OBDA redux? was Re:  Bio::Index::Fastq '@' in qual
In-Reply-To: <CAKVJ-_5iiR1-70V17KBVm-vz5hkwRf6NVLWtAWe+HOUHO+1M2w@mail.gmail.com>
References: <E824C753-0688-4350-96B8-638BA1B37E73@neuro.utah.edu>
	<26C72D1A-0EEF-4844-A06C-58EE0FE89577@illinois.edu>
	<CAKVJ-_5_NEXALy1abueFCQqKRnteJpu=__VCqSDGyemorxs53A@mail.gmail.com>
	<47FB9664-A3F0-4A32-AD06-23D221431420@gmail.com>
	<6EFDCA41-D385-4CCB-A17B-0736F93DACDF@illinois.edu>
	<CAKVJ-_5iiR1-70V17KBVm-vz5hkwRf6NVLWtAWe+HOUHO+1M2w@mail.gmail.com>
Message-ID: <ED419B5E-9C55-478F-BDD6-C2B663ABE636@illinois.edu>

(side thread, so re-titling...)

On Nov 1, 2011, at 1:06 PM, Peter Cock wrote:

> On Tue, Nov 1, 2011 at 5:44 PM, Fields, Christopher J
> <cjfields at illinois.edu> wrote:
>> On Nov 1, 2011, at 11:02 AM, Jason Stajich wrote:
>> 
>>> I think a different indexer is needed for the scale of key/value
>>> pairs we see in fastq files if we want to make a fast lookup by
>>> ID. I think speed is of essence for this type of solution and so
>>> a forced all records must be 4 lines long is okay for this type
>>> of implementation.
>> 
>> This can always be an early optimization, that's easy enough.
>> But I'm sure we will have to deal with multi-line seq/qual
>> FASTQ at some point.
>> 
>>> I found NOSQL implementations to be much better
>>> performance and than any of the BDB type solutions -- they
>>> end up being really slow at above 1-5M keys.  I used
>>> TokyoCabinet and KyotoCabinet to do indexing of accession
>>> -> taxonomy ID and found it quite fast for the needs. I
>>> haven't tried storing 100bp reads + qual string as the
>>> value in it yet but I think it could be done, certainly worth
>>> a prototype.
>> 
>> Adding a middle layer where the backend storage is abstracted
>> is the probably the (best|most flexible) option, converging on a
>> good default that will work for this data.  The actual interface is
>> in place, though would it be more feasible to go the OBDA
>> (converge on a cross-Bio* compatible schema)?  Or are there
>> problems afoot there we're unaware of?
>> 
>> Re: specifics, I think Biopython uses SQLite, is that correct Peter?
>> 
>> chris
> 
> Yes, we're using SQLite3 to store essentially a list of filenames
> and their format as one table, and then in the main table an
> entry for each sequence recording the ID (only one accession,
> unlike OBDA which had infrastructure for a secondary accession),
> file number, offset of the start of the record, and optionally the
> length of the record on disk.
> 
> i.e. Basically what OBDA does, but using SQLite rather
> than BDB (not included in Python 3) or a flat file index
> (poor performance with large datasets).
> 
> I find this design attractive on several levels:
> * File format neutral, covers FASTA, FASTQ, GenBank, etc
> * Preserves the original file untouched
> * Index is a small single file (thanks to SQLite)
> * Back end could be switched out
> * Could be applied to compressed file formats
> * Reuses existing parsing code to access entries
> 
> This could easily form basis of OBDA v2, the main points
> of difference I anticipate between the Bio* projects would
> be naming conventions for the different file formats, and
> what we consider to be the default record ID of each read
> (e.g. which field in a GenBank file - although agreement
> here is not essential). Some of that was already settled in
> principle with OBDA v1.

The primary/secondary IDs could be configurable with a sane default. I think the BioPerl implementations allowed this (and it is certainly something that will be requested).

> On the other hand, you could try and store the parsed data
> itself, which is where NOSQL looks more interesting. That
> essentially requires the ability to serialise your annotated
> sequence object model to disk - which would be tricky to do
> cross project (much more ambitious than BioSQL is). It also
> means the "index" becomes very large because it now holds
> all the original data.
> 
> Peter

For a fully cross-Bio* compliant format, I don't think it's feasible to use serialized data unless it is serialized in something that is easily deserialized across HLLs (JSON, BSON, YAML, XML, etc.).  Either that, or such data is stored concurrently with the binary blob, along with metadata that indicates the source of the blob, parser, version, etc. (unless there are tools out there that reliably interconvert serialized complex data structures between HLLs).  Any way you go about it, it seems like it could be a major ball of hurt unless implemented very carefully.

Aside: I think this was one of the problems with Bio::DB::SeqFeature::Store, in that it at one point stored Perl-specific Storable blobs.

chris


From p.j.a.cock at googlemail.com  Thu Nov  3 18:52:50 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 3 Nov 2011 18:52:50 +0000
Subject: [Bioperl-l] OBDA redux?
Message-ID: <CAKVJ-_6bzfZZr82y+J4qmGbn0du4rxaKaGxjmbC7p-pU_nMuoQ@mail.gmail.com>

On Thu, Nov 3, 2011 at 6:28 PM, Fields, Christopher J
<cjfields at illinois.edu> wrote:
> (side thread, so re-titling...)
>

And CC'ing open-bio-l, which is a better home for this than bioperl-l,
where OBDA v2 talk came up again in discussion of a BioPerl indexing
problem. Archive links for thread here:

http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035807.html
http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035808.html
http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035811.html
http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035812.html
http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035813.html
http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035822.html

> On Nov 1, 2011, at 1:06 PM, Peter Cock wrote:
>>
>> Yes, we're using SQLite3 to store essentially a list of filenames
>> and their format as one table, and then in the main table an
>> entry for each sequence recording the ID (only one accession,
>> unlike OBDA which had infrastructure for a secondary accession),
>> file number, offset of the start of the record, and optionally the
>> length of the record on disk.
>>
>> i.e. Basically what OBDA does, but using SQLite rather
>> than BDB (not included in Python 3) or a flat file index
>> (poor performance with large datasets).
>>
>> I find this design attractive on several levels:
>> * File format neutral, covers FASTA, FASTQ, GenBank, etc
>> * Preserves the original file untouched
>> * Index is a small single file (thanks to SQLite)
>> * Back end could be switched out
>> * Could be applied to compressed file formats
>> * Reuses existing parsing code to access entries
>>
>> This could easily form basis of OBDA v2, the main points
>> of difference I anticipate between the Bio* projects would
>> be naming conventions for the different file formats, and
>> what we consider to be the default record ID of each read
>> (e.g. which field in a GenBank file - although agreement
>> here is not essential). Some of that was already settled in
>> principle with OBDA v1.
>
> The primary/secondary IDs could be configurable with a sane
> default, I think the bioperl implementations allowed this (and
> it is certainly something that will be requested).

One reason I went with a single ID only was to keep the
Python dictionary based API simple (think hash in Perl).
You don't get secondary keys in a Python dict or a hash ;)

As a nod to flexibility, in Biopython's Bio.SeqIO indexing you
can provide a call back function to map the suggested ID to
something else. Obviously this doesn't give the full flexibility
of extracting a field from the record's annotation because we
don't parse the whole record during indexing (it would be too
slow).
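
A minimal sketch of the callback idea, in plain Python without Biopython: the indexer suggests an ID (here the first word of a FASTA header) and an optional function maps it to the key actually stored. The names index_fasta_ids and uniprot_accession are invented for this example; in Biopython itself the hook is the key_function argument of Bio.SeqIO.index.

```python
def index_fasta_ids(lines, key_function=None):
    """Map each record key to the line number of its FASTA header.

    key_function, if given, receives the suggested ID (first word of the
    header) and returns the key to store. It never sees the parsed record.
    """
    index = {}
    for line_number, line in enumerate(lines):
        if not line.startswith(">"):
            continue
        fields = line[1:].split(None, 1)
        if not fields:
            raise ValueError(f"header on line {line_number} has no identifier")
        key = key_function(fields[0]) if key_function else fields[0]
        if key in index:
            # One key per record, as in a dict or a Perl hash.
            raise ValueError(f"duplicate key {key!r}")
        index[key] = line_number
    return index


def uniprot_accession(identifier):
    """Example callback: 'tr|D0NNU7|D0NNU7_PHYIT' becomes 'D0NNU7'."""
    return identifier.split("|")[1]
```

Because the callback only receives the suggested ID, it cannot reach into the record's annotation, which is the limitation mentioned above.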

However, I'm happy for there to be an *optional* secondary
key in an OBDA v2 SQLite schema, but Biopython probably
won't populate it. We could optionally use it rather than the
primary ID on loading an existing index though.

Personally I would stick with one key in the index - it should
be faster and makes it simpler to switch the back end if we
need to later. If anyone wants a second key, they can build
a second index *grin*.

>> On the other hand, you could try and store the parsed data
>> itself, which is where NOSQL looks more interesting. That
>> essentially requires the ability to serialise your annotated
>> sequence object model to disk - which would be tricky to do
>> cross project (much more ambitious than BioSQL is). It also
>> means the "index" becomes very large because it now holds
>> all the original data.
>>
>> Peter
>
> For a fully cross-Bio* compliant format, I don't think it's feasible
> to use serialized data unless they are serialized in something
> that is easily deserialized across HLLs (JSON, BSON, YAML,
> XML, etc).  Either that, or such data is stored concurrently with
> the binary blob, along with meta data that indicates the source
> of the blob, parser, version, etc, etc (unless there are tools out
> there that reliably interconvert serialized complex data structures
> between HLLs).  Anyway you go about it, it seems like it could
> be a major ball of hurt, unless implemented very carefully.

You missed out RDF as a serialisation ;)

But yes, going down the shared serialisation route is going
to be messy - as you are well aware:

> Aside: I think this was one of the problems with
> Bio::DB::SeqFeature::Store, in that it at one point stored
> Perl-specific Storable blobs.
>
> chris

Peter


From cjfields at illinois.edu  Thu Nov  3 19:47:51 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 3 Nov 2011 19:47:51 +0000
Subject: [Bioperl-l] OBDA redux?
In-Reply-To: <CAKVJ-_6bzfZZr82y+J4qmGbn0du4rxaKaGxjmbC7p-pU_nMuoQ@mail.gmail.com>
References: <CAKVJ-_6bzfZZr82y+J4qmGbn0du4rxaKaGxjmbC7p-pU_nMuoQ@mail.gmail.com>
Message-ID: <FB761CFA-1CFD-4FA0-A708-2CE3F2F240D9@illinois.edu>

On Nov 3, 2011, at 1:52 PM, Peter Cock wrote:

> On Thu, Nov 3, 2011 at 6:28 PM, Fields, Christopher J
> <cjfields at illinois.edu> wrote:
>> (side thread, so re-titling...)
>> 
> And CC'ing open-bio-l, which is a better home for this than bioperl-l,
> where OBDA v2 talk came up again in discussion of a BioPerl indexing
> problem. Archive links for thread here:
> 
> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035807.html
> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035808.html
> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035811.html
> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035812.html
> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035813.html
> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035822.html

yes, good idea...

>> On Nov 1, 2011, at 1:06 PM, Peter Cock wrote:
>>> 
>>> Yes, we're using SQLite3 to store essentially a list of filenames
>>> and their format as one table, and then in the main table an
>>> entry for each sequence recording the ID (only one accession,
>>> unlike OBDA which had infrastructure for a secondary accession),
>>> file number, offset of the start of the record, and optionally the
>>> length of the record on disk.
>>> 
>>> i.e. Basically what OBDA does, but using SQLite rather
>>> than BDB (not included in Python 3) or a flat file index
>>> (poor performance with large datasets).
>>> 
>>> I find this design attractive on several levels:
>>> * File format neutral, covers FASTA, FASTQ, GenBank, etc
>>> * Preserves the original file untouched
>>> * Index is a small single file (thanks to SQLite)
>>> * Back end could be switched out
>>> * Could be applied to compressed file formats
>>> * Reuses existing parsing code to access entries
>>> 
>>> This could easily form basis of OBDA v2, the main points
>>> of difference I anticipate between the Bio* projects would
>>> be naming conventions for the different file formats, and
>>> what we consider to be the default record ID of each read
>>> (e.g. which field in a GenBank file - although agreement
>>> here is not essential). Some of that was already settled in
>>> principle with OBDA v1.
>> 
>> The primary/secondary IDs could be configurable with a sane
>> default, I think the bioperl implementations allowed this (and
>> it is certainly something that will be requested).
> 
> One reason I went with a single ID only was to keep the
> Python dictionary based API simple (think hash in Perl).
> You don't get secondary keys in a Python dict or a hash ;)
> 
> As a nod to flexibility, in Biopython's Bio.SeqIO indexing you
> can provide a call back function to map the suggested ID to
> something else. Obviously this doesn't give the full flexibility
> of extracting a field from the record's annotation because we
> don't parse the whole record during indexing (it would be too
> slow).

Same with bioperl.

> However, I'm happy for there to be an *optional* secondary
> key in an OBDA v2 SQLite schema, but Biopython probably
> won't populate it. We could optionally use it rather than the
> primary ID on loading an existing index though.

Optional implementation of that is fine by me.

> Personally I would stick with one key in the index - it should
> be faster and makes it simpler to switch the back end if we
> need to later. If anyone wants a second key, they can build
> a second index *grin*.

That's easy enough.

>>> On the other hand, you could try and store the parsed data
>>> itself, which is where NOSQL looks more interesting. That
>>> essentially requires the ability to serialise your annotated
>>> sequence object model to disk - which would be tricky to do
>>> cross project (much more ambitious than BioSQL is). It also
>>> means the "index" becomes very large because it now holds
>>> all the original data.
>>> 
>>> Peter
>> 
>> For a fully cross-Bio* compliant format, I don't think it's feasible
>> to use serialized data unless they are serialized in something
>> that is easily deserialized across HLLs (JSON, BSON, YAML,
>> XML, etc).  Either that, or such data is stored concurrently with
>> the binary blob, along with meta data that indicates the source
>> of the blob, parser, version, etc, etc (unless there are tools out
>> there that reliably interconvert serialized complex data structures
>> between HLLs).  Anyway you go about it, it seems like it could
>> be a major ball of hurt, unless implemented very carefully.
> 
> You missed out RDF as a serialisation ;)
> 
> But yes, going down the shared serialisation route is going
> to be messy - as you are well aware:
> 
>> Aside: I think this was one of the problems with
>> Bio::DB::SeqFeature::Store, in that it at one point stored
>> Perl-specific Storable blobs.
>> 
>> chris
> 
> Peter

Yes, it's a problem without an easy solution.  Anyway, I think implementing this at this point would be premature optimization.

chris


From biojiangke at gmail.com  Tue Nov  8 22:29:54 2011
From: biojiangke at gmail.com (vitis)
Date: Tue, 8 Nov 2011 14:29:54 -0800 (PST)
Subject: [Bioperl-l] Some questions about the Bio::PopGen
In-Reply-To: <BANLkTiktxeprLh+LxNr50cFZO+KweZCVFg@mail.gmail.com>
References: <BANLkTiktxeprLh+LxNr50cFZO+KweZCVFg@mail.gmail.com>
Message-ID: <32805996.post@talk.nabble.com>


I think the pi calculated by the function isn't really pi as usually defined.
You need to divide the value by the total number of sites (in your case 5,
which is the sequence length, not the number of individuals). I think the
reason it was implemented this way is that it is sometimes easier to work
only with variable sites.
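
The arithmetic can be checked by hand with the five sequences from the quoted question. The Python sketch below is an independent illustration, not the Bio::PopGen::Statistics code; pairwise_differences is a name made up here. It averages the number of differences over all pairs of sequences, then divides by the number of sites.

```python
from itertools import combinations


def pairwise_differences(seqs):
    """Average number of differing sites over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


# The five aligned sequences A01..A05 from the question.
seqs = ["AAGGT", "ATGGC", "ATGCT", "ATGCT", "ATGGT"]

pi_total = pairwise_differences(seqs)   # summed over sites: about 1.4
pi_per_site = pi_total / len(seqs[0])   # divided by 5 sites: about 0.28
```

There are 14 differences over the 10 pairs, which gives the 1.4 reported by BioPerl, and dividing by the 5 sites gives the 0.28 reported by DnaSP.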

The aln_to_population function converts an alignment object to a population
object. You can't really see the object unless you write additional code to
write it out or do some calculations on it.

The answer to the third question depends on your specific needs. For
population-level analyses of molecular evolution, I usually create a
multiple sequence alignment with another application (clustalw etc.), then
manually adjust the alignment to make sure it represents homology. I
wouldn't touch the alignment once this is done, but only save it as an aln
file (or whatever format you want) as input to analysis applications like
Bio::PopGen (usually via the aln_to_population function you're using now).


Qian Zhao wrote:
> 
> Hi
> Recently, I am learning how to calculate pi, Fst, Tajima D using
> Bio::PopGen.
> I am not familiar with Perl and I am really confused with the following
> problems.
> (1) I use the Bio::PopGen::Statistics to calculate pi. The sequences I used
> to calculate is this:
>     __DATA__
> 01 A01 A
> 01 A02 A
> 01 A03 A
> 01 A04 A
> 01 A05 A
> 02 A01 A
> 02 A02 T
> 02 A03 T
> 02 A04 T
> 02 A05 T
> 03 A01 G
> 03 A02 G
> 03 A03 G
> 03 A04 G
> 03 A05 G
> 04 A01 G
> 04 A02 G
> 04 A03 C
> 04 A04 C
> 04 A05 G
> 05 A01 T
> 05 A02 C
> 05 A03 T
> 05 A04 T
> 05 A05 T
> And I am not sure if I can use these sequences below to demonstrate the
> prettybase format above:
> >A01
> AAGGT
> >A02
> ATGGC
> >A03
> ATGCT
> >A04
> ATGCT
> >A05
> ATGGT
> The pi is 1.4 using Bio::PopGen::Statistics. However, the pi is 0.28 if I
> use DnaSP. I find that 1.4/5=0.28, which means that if the number
> from Bio::PopGen::Statistics is divided by the number of individuals, the
> result would be exactly the same. Is there something wrong in my perl
> script? The code I used was below:
> #!/usr/bin/perl -w
> use warnings;
> use strict;
> use Bio::PopGen::Genotype;
>  my $genotype = Bio::PopGen::Genotype->new(-marker_name   => 'gene_1',
>                                            -individual_id => '001',
>                                            -alleles       => ['1','5'] );
> use Bio::PopGen::Individual;
>  my $ind = Bio::PopGen::Individual->new(-unique_id  => '001',
>                                         -genotypes  => [$genotype] );
> $ind->add_Genotype(
>    Bio::PopGen::Genotype->new(-alleles     => ['1', '5'],
>                               -marker_name => 'gene_1')
>  );
>  $ind->add_Genotype(
>    Bio::PopGen::Genotype->new(-alleles     => ['1', '5'],
>                               -marker_name => 'gene_1')
>  );
>  $ind->add_Genotype(
>    Bio::PopGen::Genotype->new(-alleles     => ['1', '5'],
>                               -marker_name => 'gene_1')
>  );
>  $ind->add_Genotype(
>    Bio::PopGen::Genotype->new(-alleles     => ['1', '5'],
>                               -marker_name => 'gene_1')
>  );
>  use Bio::PopGen::Population;
>  my $pop = Bio::PopGen::Population->new(-name        => 'Bm',
>                                         -description => 'description',
>                                         -individuals => [$ind] );
> use Bio::PopGen::IO;
> use Bio::PopGen::Statistics;
> my $nummarkers = $pop->get_marker_names;
> my $stats = Bio::PopGen::Statistics->new();
> my $io = Bio::PopGen::IO->new (-format => 'prettybase',
>                                -file => '1.txt');
> if( my $pop = $io->next_population ) {
>     my $pi = $stats->pi($pop, $nummarkers);
>     print "pi is $pi\n";
> my @inds;
>     for my $ind ( $pop->get_Individuals ) {
>         if( $ind->unique_id =~ /A0[1-3]/ ) {
>             push @inds, $ind;
>         }
>     }
>     print "pi for inds 1,2,3 is ", $stats->pi(\@inds),"\n";
> }
> 
> (2) I want to use Bio::PopGen::Utilities to translate the alignment file
> to
> the population file. However, I can not find the result file after the
> program. I use the following code:
> use Bio::PopGen::Utilities;
>   use Bio::AlignIO;
> 
>   my $in = Bio::AlignIO->new(-file   => 't/data/t7.aln',
>                             -format => 'clustalw');
> my $aln = $in->next_aln;
> my $pop = Bio::PopGen::Utilities->aln_to_population(-alignment => $aln);
> my $synpop = Bio::PopGen::Utilities->aln_to_population(-site_model => 'cod',
>                                                        -alignment  => $aln);
> I am not sure where I should add my result file' name in the code.
> (3) If my file contains a lot of individual sequences and one individual
> has
> one genotype. I'd like to know how can I use the  Bio::PopGen::Individual,
> Bio::PopGen::Population and Bio::PopGen::Genotype to create the file which
> can used in Bio::PopGen::Statistics ?
> 
> I will be great appreciated if I can get the answers. Thanks for your time
> and Best Wishes!
>                                                    Qian

-- 
View this message in context: http://old.nabble.com/Some-questions-about-the-Bio%3A%3APopGen-tp31378987p32805996.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From biojiangke at gmail.com  Tue Nov  8 22:51:22 2011
From: biojiangke at gmail.com (vitis)
Date: Tue, 8 Nov 2011 14:51:22 -0800 (PST)
Subject: [Bioperl-l] questions about the bioperl module
 Bio::PopGen::Statistics
In-Reply-To: <201106012030039537050@gmail.com>
References: <201106012030039537050@gmail.com>
Message-ID: <32805997.post@talk.nabble.com>


If you read the Bio::PopGen documentation, you'll see that the function that
calculates pi takes an optional argument for the number of sites. Also, when
you use the aln_to_population function to read in an alignment, you can use
the option to include all sites, including the monomorphic ones. I think if
you do both in your script, you'll get the same pi value as from other
applications like DnaSP.

For sliding-window analyses, you may have to implement your own method to
move along the windows, but I think DnaSP can already do that, so you don't
have to write your own script.
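
If you do want to write the sliding window yourself, the logic is short. The Python sketch below is one possible approach, not part of Bio::PopGen: column_diversity and sliding_window_pi are invented names, windows are measured in alignment columns, incomplete windows at the end are skipped, and gaps or missing data are not treated specially.

```python
from collections import Counter


def column_diversity(column):
    """Fraction of sequence pairs that differ at one alignment column."""
    n = len(column)
    counts = Counter(column)
    # Pairs that differ = all ordered pairs minus same-allele pairs, halved.
    differing_pairs = (n * n - sum(c * c for c in counts.values())) / 2
    return differing_pairs / (n * (n - 1) / 2)


def sliding_window_pi(seqs, window, step):
    """Return (1-based window start, per-site pi) for each full window."""
    per_column = [column_diversity(column) for column in zip(*seqs)]
    results = []
    for start in range(0, len(per_column) - window + 1, step):
        window_pi = sum(per_column[start:start + window]) / window
        results.append((start + 1, window_pi))
    return results


seqs = ["AAGGT", "ATGGC", "ATGCT", "ATGCT", "ATGGT"]
windows = sliding_window_pi(seqs, window=2, step=2)
```

Summing the per-column values over the whole alignment and dividing by its length gives the same per-site pi as the pairwise definition.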
  

lvu.jun wrote:
> 
> Hi, there,
> I am trying to calculate the population genetics parameters such as pi
> using the bioperl module Bio::PopGen::Statistics. But I found that the
> method only requires the input of the marker genotype of every individuals
> for the population. I don't know why the module does not take the DNA
> sequence length into consideration when calculating the pi value.
> According to the definition of the pi value, besides the polymorphic
> sites, we also need the monomorphic sites that should be incorporated in
> the denominator when doing the calculation. Is it right? therefore I'm
> confused about the module, who can tell me why it can correctly calculate
> the pi value only with the marker(polymorphic) genotype?
> Another question, if I want to calculate the pi value using the sliding
> window along the genome, how can I do this using the
> Bio::PopGen::Statistics module?
> Thanks for your help!
> Yours sincerely,
> Jun
> 
> Chinese Academy of Sciences
> 
> 2011-06-01 
> 
> 
> 
> lvu.jun 

-- 
View this message in context: http://old.nabble.com/questions-about-the-bioperl-module-Bio%3A%3APopGen%3A%3AStatistics-tp31749977p32805997.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From shachigahoimbi at gmail.com  Wed Nov  9 05:22:33 2011
From: shachigahoimbi at gmail.com (Shachi Gahoi)
Date: Wed, 9 Nov 2011 10:52:33 +0530
Subject: [Bioperl-l] Run FGENESH using bioperl
Message-ID: <CACyyM1ZOiMspVH3hF4fJOvedw=8YzZDuuzJRHsuJUJ=mkuYyng@mail.gmail.com>

Dear All.

I have a multi-FASTA sequence file and I would like to run FGENESH on each
of the sequences stored in it, one by one.

Is this possible using BioPerl?

Please guide me.

Thanks in advance.


-- 
Regards,
Shachi


From pankajt322 at gmail.com  Thu Nov  3 12:12:44 2011
From: pankajt322 at gmail.com (pankaj)
Date: Thu, 3 Nov 2011 05:12:44 -0700 (PDT)
Subject: [Bioperl-l] extract ORF ID from fasta file using bioperl
In-Reply-To: <CACyyM1brvFU1N3NOqUDfJ8eBSNqNZSsHGkxaLoK2Euik0Z2s0w@mail.gmail.com>
References: <CACyyM1brvFU1N3NOqUDfJ8eBSNqNZSsHGkxaLoK2Euik0Z2s0w@mail.gmail.com>
Message-ID: <bc50d79e-005d-46a9-ac0f-6237d43df0f4@u10g2000prl.googlegroups.com>


On Oct 21, 1:59 am, Shachi Gahoi <shachigahoi... at gmail.com> wrote:
> Dear all,
>
> I have fasta format sequence file and I want to extract ORF ID "PITG_14194"
> from fasta file and then I want to rename same file with that ORF ID
> "PITG_14194".
>
> I have many files and I want to do same exercise with all sequence files.
>
> Please tell me how can i do this with perl or bioperl.
>
> >tr|D0NNU7|D0NNU7_PHYIT Carbohydrate esterase, putative OS=Phytophthora
>
> infestans (strain T30-4) GN=PITG_14194 PE=4 SV=1
> MVKLSIVSSTMQSLLAPLLRVWTDPERRRKFLRWLFGGTSGAIALLLILEATRGFCRTPL
> ETAQLLAGISWTLCKITVQFVARGFKPKFAKWTLRYELLHGLMRTAATMFGERIVDLQHA
> RVIRHHTGMFGTVLGSFARWQNEMRLESVRLNGLEHIWLKSSTCTTETKSERKRLVVLFF
> HGGGYAVLSPRMYISFCSAVAGAIRQQLASDDVDVDVFLANYRKLPEHKFPVPAEDAVAM
> YEYLLQHEKLEPSQIILAGDSAGGGLVMSTLLRVRDGLSSWKSKLPLPLAAIVMCPLADL
> TWDEDEIAGQHCVLPLNMTAASVLTYHPTRDDPSTWADASPVHCNLQGLPPVFLQSATLD
> RLFQHSVRLAAKAKADGLVNWEVDIHEGVPHVFMVIPAYVLPYARVGVGRMAAFAAKQFR
> NGIAVDHKGVICNGKAPIEIAVDENTLSAAA
>
> --
> Regards,
> Shachi
> _______________________________________________
> Bioperl-l mailing list
> Bioper... at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From azaballos at isciii.es  Wed Nov  9 11:28:21 2011
From: azaballos at isciii.es (Angel Zaballos)
Date: Wed, 9 Nov 2011 12:28:21 +0100
Subject: [Bioperl-l] bp_genbank2gff.pl  bug
Message-ID: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>

Running bp_genbank2gff.pl got this:

[root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession AAXT01000001.1 > babesichr3.gff
Replacement list is longer than search list at /usr/share/perl5/Bio/Range.pm line 251.


Ángel Zaballos
Unidad de Genómica
Centro Nacional de Microbiología-ISCIII
Carretera Majadahonda-Pozuelo, Km 2,2
28220-Majadahonda

Tel: 918223994
mail:  azaballos at isciii.es


************************* LEGAL NOTICE *************************
This electronic message is intended exclusively for its addressees and
may contain private and confidential attachments.
If you have received this message in error and are not among the
addressees, please do not use, report, distribute, print or copy its
contents by any means. Please notify the sender and completely delete
the message and its attachments. The Instituto de Salud Carlos III
assumes no legal responsibility of any kind for the content of this message
when it does not correspond to the functions attributed to its sender by
current regulations.


From scott at scottcain.net  Wed Nov  9 16:12:02 2011
From: scott at scottcain.net (Scott Cain)
Date: Wed, 9 Nov 2011 11:12:02 -0500
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
Message-ID: <CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>

Hi Angel,

I would suggest using bp_genbank2gff3.pl, as it is more actively
maintained; the bp_genbank2gff.pl script hasn't really been touched in many
years, and I imagine it's suffering from some serious code rot.

Scott


2011/11/9 Angel Zaballos <azaballos at isciii.es>

> Running bp_genbank2gff.pl got this:
>
> [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession
> AAXT01000001.1 > babesichr3.gff
> Replacement list is longer than search list at
> /usr/share/perl5/Bio/Range.pm line 251.
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From carandraug+dev at gmail.com  Wed Nov  9 16:13:10 2011
From: carandraug+dev at gmail.com (Carnë Draug)
Date: Wed, 9 Nov 2011 16:13:10 +0000
Subject: [Bioperl-l] extract ORF ID from fasta file using bioperl
In-Reply-To: <bc50d79e-005d-46a9-ac0f-6237d43df0f4@u10g2000prl.googlegroups.com>
References: <CACyyM1brvFU1N3NOqUDfJ8eBSNqNZSsHGkxaLoK2Euik0Z2s0w@mail.gmail.com>
	<bc50d79e-005d-46a9-ac0f-6237d43df0f4@u10g2000prl.googlegroups.com>
Message-ID: <CAPOrs_030887wt=T7ZJyDUid92poO+FX4kKkRFTzWweXi5ffvw@mail.gmail.com>

On 3 November 2011 12:12, pankaj <pankajt322 at gmail.com> wrote:
>
>
> On Oct 21, 1:59 am, Shachi Gahoi <shachigahoi... at gmail.com> wrote:
>> Dear all,
>>
>> I have fasta format sequence file and I want to extract ORF ID "PITG_14194"
>> from fasta file and then I want to rename same file with that ORF ID
>> "PITG_14194".
>>
>> I have many files and I want to do same exercise with all sequence files.
>>
>> Please tell me how can i do this with perl or bioperl.
>>
>> >tr|D0NNU7|D0NNU7_PHYIT Carbohydrate esterase, putative OS=Phytophthora
>>
>> infestans (strain T30-4) GN=PITG_14194 PE=4 SV=1
>> MVKLSIVSSTMQSLLAPLLRVWTDPERRRKFLRWLFGGTSGAIALLLILEATRGFCRTPL
>> ETAQLLAGISWTLCKITVQFVARGFKPKFAKWTLRYELLHGLMRTAATMFGERIVDLQHA
>> RVIRHHTGMFGTVLGSFARWQNEMRLESVRLNGLEHIWLKSSTCTTETKSERKRLVVLFF
>> HGGGYAVLSPRMYISFCSAVAGAIRQQLASDDVDVDVFLANYRKLPEHKFPVPAEDAVAM
>> YEYLLQHEKLEPSQIILAGDSAGGGLVMSTLLRVRDGLSSWKSKLPLPLAAIVMCPLADL
>> TWDEDEIAGQHCVLPLNMTAASVLTYHPTRDDPSTWADASPVHCNLQGLPPVFLQSATLD
>> RLFQHSVRLAAKAKADGLVNWEVDIHEGVPHVFMVIPAYVLPYARVGVGRMAAFAAKQFR
>> NGIAVDHKGVICNGKAPIEIAVDENTLSAAA
>>

---------- Forwarded message ----------
From: Jason Stajich <jason.stajich at gmail.com>
Date: 21 October 2011 10:56
Subject: Re: [Bioperl-l] extract ORF ID from fasta file using bioperl
To: Shachi Gahoi <shachigahoimbi at gmail.com>
Cc: bioperl-l at bioperl.org


This is easy to do with a simple regular expression and opening a new
file.  Have you read up on this concept in Perl?
You can use SeqIO to parse FASTA files - did you read the HOWTO and
website documentation first?

We don't typically do people's work for them on this mailing list, so
please show some effort first.
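
For readers of the archive, the regular-expression approach mentioned above
might look roughly like the sketch below. It uses plain Perl rather than
SeqIO, assumes the whole FASTA header sits on the first line of each file
(the header in the quoted example appears to have been wrapped by the mail
client), and takes the ORF ID to be the value of the GN= field. The
subroutine name orf_id_from_header and the ".fasta" suffix are illustrative
choices, not anything prescribed in this thread.

```perl
use strict;
use warnings;

# Return the value of the GN= field in a FASTA header line, or undef.
sub orf_id_from_header {
    my ($header) = @_;
    return $header =~ /\bGN=(\S+)/ ? $1 : undef;
}

# Rename every FASTA file named on the command line after its ORF ID.
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $header = <$fh>;    # only the first line is needed
    close $fh;

    my $id = defined $header ? orf_id_from_header($header) : undef;
    unless (defined $id) {
        warn "No GN= field found in $file, skipped\n";
        next;
    }

    my $new = "$id.fasta";
    if (-e $new) {
        # never overwrite an existing file
        warn "$new already exists, $file not renamed\n";
        next;
    }
    rename $file, $new or warn "Could not rename $file: $!\n";
}
```

Saved under a hypothetical name such as rename_by_orf.pl, it would be run as
"perl rename_by_orf.pl *.fasta".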


From scott at scottcain.net  Wed Nov  9 18:43:00 2011
From: scott at scottcain.net (Scott Cain)
Date: Wed, 9 Nov 2011 13:43:00 -0500
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
	<CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>
	<0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu>
Message-ID: <CA+JTaoxrmzW8E1omw5cA1DSLfg=xSjxJDvh9a=_cEz913qTbCg@mail.gmail.com>

Hi Chris,

Actually, removing it from the distribution (but letting it remain in the
code repository) is not a bad idea.  I can't really think of a down side.

Scott


2011/11/9 Fields, Christopher J <cjfields at illinois.edu>

> Scott,
>
> Do we want to add that caveat to the bp_genbank2gff.pl documentation (or
> remove it altogether)?
>
> chris
>
> On Nov 9, 2011, at 10:12 AM, Scott Cain wrote:
>
> > Hi Angel,
> >
> > I would suggest using bp_genbank2gff3.pl, as it is more actively
> > maintained; the bp_genbank2gff.pl script hasn't really been touched in
> > many years, and I imagine it's suffering from some serious code rot.
> >
> > Scott
> >
> >
> > 2011/11/9 Angel Zaballos <azaballos at isciii.es>
> >
> >> Running bp_genbank2gff.pl got this:
> >>
> >> [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession
> >> AAXT01000001.1 > babesichr3.gff
> >> Replacement list is longer than search list at
> >> /usr/share/perl5/Bio/Range.pm line 251.
> >>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From cjfields at illinois.edu  Wed Nov  9 18:39:52 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 9 Nov 2011 18:39:52 +0000
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
	<CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>
Message-ID: <0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu>

Scott,

Do we want to add that caveat to the bp_genbank2gff.pl documentation (or remove it altogether)?

chris

On Nov 9, 2011, at 10:12 AM, Scott Cain wrote:

> Hi Angel,
> 
> I would suggest using bp_genbank2gff3.pl, as it is more actively
> maintained; the bp_genbank2gff.pl script hasn't really been touched in many
> years, and I imagine it's suffering from some serious code rot.
> 
> Scott
> 
> 
> 2011/11/9 Angel Zaballos <azaballos at isciii.es>
> 
>> Running bp_genbank2gff.pl got this:
>> 
>> [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession
>> AAXT01000001.1 > babesichr3.gff
>> Replacement list is longer than search list at
>> /usr/share/perl5/Bio/Range.pm line 251.
>> 


From cjfields at illinois.edu  Wed Nov  9 19:51:48 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 9 Nov 2011 19:51:48 +0000
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <CA+JTaoxrmzW8E1omw5cA1DSLfg=xSjxJDvh9a=_cEz913qTbCg@mail.gmail.com>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
	<CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>
	<0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu>
	<CA+JTaoxrmzW8E1omw5cA1DSLfg=xSjxJDvh9a=_cEz913qTbCg@mail.gmail.com>
Message-ID: <C0212F3D-AFD7-41A4-9649-B876FAFA7C02@illinois.edu>

Scott,

It would remain in the repo history if it is removed, otherwise we can probably set up an 'unmaintained' folder.  Either would prevent it from being packaged and installed in future versions.  

(Speaking of which, we should discuss (w/ Lincoln) possibly splitting out Bio::DB::SeqFeature/GFF and related code/tests/etc. into its own distribution, in line with slimming down the core modules.)

chris

On Nov 9, 2011, at 12:43 PM, Scott Cain wrote:

> Hi Chris,
> 
> Actually, removing it from the distribution (but letting it remain in the code repository) is not a bad idea.  I can't really think of a down side.
> 
> Scott
> 
> 
> 2011/11/9 Fields, Christopher J <cjfields at illinois.edu>
> Scott,
> 
> Do we want to add that caveat to the bp_genbank2gff.pl documentation (or remove it altogether)?
> 
> chris
> 
> On Nov 9, 2011, at 10:12 AM, Scott Cain wrote:
> 
> > Hi Angel,
> >
> > I would suggest using bp_genbank2gff3.pl, as it is more actively
> > maintained; the bp_genbank2gff.pl script hasn't really been touched in many
> > years, and I imagine it's suffering from some serious code rot.
> >
> > Scott
> >
> >
> > 2011/11/9 Angel Zaballos <azaballos at isciii.es>
> >
> >> Running bp_genbank2gff.pl got this:
> >>
> >> [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession
> >> AAXT01000001.1 > babesichr3.gff
> >> Replacement list is longer than search list at
> >> /usr/share/perl5/Bio/Range.pm line 251.
> >>


From carandraug+dev at gmail.com  Wed Nov  9 20:39:17 2011
From: carandraug+dev at gmail.com (Carnë Draug)
Date: Wed, 9 Nov 2011 20:39:17 +0000
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <CA+JTaoxrmzW8E1omw5cA1DSLfg=xSjxJDvh9a=_cEz913qTbCg@mail.gmail.com>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
	<CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>
	<0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu>
	<CA+JTaoxrmzW8E1omw5cA1DSLfg=xSjxJDvh9a=_cEz913qTbCg@mail.gmail.com>
Message-ID: <CAPOrs_2jZADvKkKE-1EbcMiXe8J74rNPd2y09N_BguVw4Er=hA@mail.gmail.com>

On 9 November 2011 18:43, Scott Cain <scott at scottcain.net> wrote:
> Hi Chris,
>
> Actually, removing it from the distribution (but letting it remain in the
> code repository) is not a bad idea.  I can't really think of a down side.
>
> Scott

Hi

Can I suggest instead simply making the script issue a warning right
at the start? Something like "bp_genbank2gff is obsolete and will be
removed from a future version of BioPerl; please use bp_genbank2gff3
instead". You could leave it there for the next 2 releases and then
finally remove it. This would have 2 advantages:

1) people who have been using it will immediately know what to use as
a replacement (instead of coming to ask on the mailing list)
2) people who use it but don't know anything about the subject, because
someone told them to "just press this button" or "just type this in
the terminal", won't suddenly have a broken system and will have time
to find someone who will make it work again.

That's what's done in GNU Octave and I think it works well there.
Carnë
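
A minimal sketch of what such a start-up warning could look like in Perl is
shown here. The subroutine name deprecation_notice is made up for the
example, and this is not the change that was actually committed to
bp_genbank2gff.pl; it only illustrates the idea of warning on STDERR and then
running as before.

```perl
use strict;
use warnings;

# Build the deprecation message for an obsolete script.
sub deprecation_notice {
    my ($old, $new) = @_;
    return "$old is obsolete and will be removed from a future version of "
         . "BioPerl; please use $new instead\n";
}

# Print the notice on STDERR at start-up; the script then continues
# normally, so existing pipelines keep working for now.
warn deprecation_notice('bp_genbank2gff', 'bp_genbank2gff3');
```

Because the message goes to STDERR, output redirected from STDOUT (for
example with -stdout > file.gff) is not polluted by the warning.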


From scott at scottcain.net  Wed Nov  9 20:48:07 2011
From: scott at scottcain.net (Scott Cain)
Date: Wed, 9 Nov 2011 15:48:07 -0500
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <CAPOrs_2jZADvKkKE-1EbcMiXe8J74rNPd2y09N_BguVw4Er=hA@mail.gmail.com>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
	<CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>
	<0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu>
	<CA+JTaoxrmzW8E1omw5cA1DSLfg=xSjxJDvh9a=_cEz913qTbCg@mail.gmail.com>
	<CAPOrs_2jZADvKkKE-1EbcMiXe8J74rNPd2y09N_BguVw4Er=hA@mail.gmail.com>
Message-ID: <CA+JTaowZXPfbXawZGC+N_JVX5BJpKMTVRbSoh7kd6oWxCYbJZg@mail.gmail.com>

Hi Carnë,

You are absolutely correct; that is the right way to do it.  I'll add that
right now (and if the original post's fix is an easy one, I'll fix that too
:-)

Scott


2011/11/9 Carnë Draug <carandraug+dev at gmail.com>

> On 9 November 2011 18:43, Scott Cain <scott at scottcain.net> wrote:
> > Hi Chris,
> >
> > Actually, removing it from the distribution (but letting it remain in the
> > code repository) is not a bad idea.  I can't really think of a down side.
> >
> > Scott
>
> Hi
>
> can I suggest instead to simply make the script issue a warning right
> at the start? Something like "bp_genbank2gff is obsolete and will be
> removed from a future version of BioPerl; please use bp_genbank2gff3
> instead". You could leave it there for the next 2 releases and then
> finally remove it. This would have 2 advantages:
>
> 1) people that have been using it will immediately know what to use as
> replacement (instead of coming and ask in the mailing list)
> 2) people who use it but don't know anything about the subject,
> someone told them to "just press this button" or "just type this in
> the terminal", won't have suddenly a broken system and will have time
> to find someone that will make it work again.
>
> That's what's done in GNU Octave and I think it works good there.
> Carnë
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From cjfields at illinois.edu  Wed Nov  9 21:59:48 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 9 Nov 2011 21:59:48 +0000
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <CA+JTaowZXPfbXawZGC+N_JVX5BJpKMTVRbSoh7kd6oWxCYbJZg@mail.gmail.com>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
	<CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>
	<0A9E463B-A527-4AF7-91FF-0CF3CA72E73C@illinois.edu>
	<CA+JTaoxrmzW8E1omw5cA1DSLfg=xSjxJDvh9a=_cEz913qTbCg@mail.gmail.com>
	<CAPOrs_2jZADvKkKE-1EbcMiXe8J74rNPd2y09N_BguVw4Er=hA@mail.gmail.com>
	<CA+JTaowZXPfbXawZGC+N_JVX5BJpKMTVRbSoh7kd6oWxCYbJZg@mail.gmail.com>
Message-ID: <C86AC2F8-F8E8-431D-83A6-39E896C23485@illinois.edu>

Works for me, it's a standard deprecation policy.  The only caveat is that the next 'release' of the code would be when the related code is split out into its own distribution (which will require its own versioning).

chris

On Nov 9, 2011, at 2:48 PM, Scott Cain wrote:

> Hi Carnë,
> 
> You are absolutely correct; that is the right way to do it.  I'll add that right now (and if the original posts fix is an easy one, I'll fix that too :-)
> 
> Scott
> 
> 
> 2011/11/9 Carnë Draug <carandraug+dev at gmail.com>
> On 9 November 2011 18:43, Scott Cain <scott at scottcain.net> wrote:
> > Hi Chris,
> >
> > Actually, removing it from the distribution (but letting it remain in the
> > code repository) is not a bad idea.  I can't really think of a down side.
> >
> > Scott
> 
> Hi
> 
> can I suggest instead to simply make the script issue a warning right
> at the start? Something like "bp_genbank2gff is obsolete and will be
> removed from a future version of BioPerl; please use bp_genbank2gff3
> instead". You could leave it there for the next 2 releases and then
> finally remove it. This would have 2 advantages:
> 
> 1) people that have been using it will immediately know what to use as
> replacement (instead of coming and ask in the mailing list)
> 2) people who use it but don't know anything about the subject,
> someone told them to "just press this button" or "just type this in
> the terminal", won't have suddenly a broken system and will have time
> to find someone that will make it work again.
> 
> That's what's done in GNU Octave and I think it works good there.
> Carnë
> 
> 
> 
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research


From biopython at maubp.freeserve.co.uk  Thu Nov 10 13:09:40 2011
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 10 Nov 2011 13:09:40 +0000
Subject: [Bioperl-l] [Bioperl-announce-l] Null Pointer - NCBI C++
	Exception
In-Reply-To: <31659982.post@talk.nabble.com>
References: <31659982.post@talk.nabble.com>
Message-ID: <CAKVJ-_445daDbwg6QkxLUhnji1BhS+YvEehBhB9D3ij1M22tmw@mail.gmail.com>

Re: http://lists.open-bio.org/pipermail/bioperl-announce-l/2011-May/000253.html

On Thu, May 19, 2011 at 11:15 PM, rgoldade <rgoldade at sfu.ca> wrote:
>
> I received the following error while trying to run bl2seq from
> standaloneblastplus. Has anyone else encountered this problem?
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: /usr/bin/blastp call crashed: There was a problem running
> /usr/bin/blastp : Error: NCBI C++ Exception:
>
> "/am/ncbiapdata/release/blast/src/2.2.24/Linux32-Suse-icc/c++/ICC1010-ReleaseMT--Linux32-Suse-icc/../src/corelib/ncbiobj.cpp",
> line 688: Critical: ncbi::CObject::ThrowNullPointerException() - Attempt to
> access NULL pointer.
>
> Thank you,
> Ryan

Just hit something very very similar, looks like a BLAST+ bug which I
will report now:

$ tblastn -out NC_003197_vs_NC_011294.tsv -outfmt 6 -query
NC_003197.fna -evalue 0.0001 -subject NC_011294.fna
Error: NCBI C++ Exception:
    "/am/ncbiapdata/release/blast/src/2.2.25/IntelMAC-universal/c++/GCC401-ReleaseMT--IntelMAC-universal/../src/corelib/ncbiobj.cpp",
line 689: Critical: ncbi::CObject::ThrowNullPointerException() -
Attempt to access NULL pointer.

This was on a Mac using BLAST 2.2.25+ (tblastn) whereas yours was
BLAST 2.2.24+ (blastp) from the look of the error. The line number has
changed by one, but I'm confident it is the same point of failure.

In my case I was comparing nucleotide against nucleotide, so should
have been using tblastx not tblastn, but it still shouldn't have had a
pointer exception.

Peter


From cjfields at illinois.edu  Thu Nov 10 14:00:46 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 10 Nov 2011 14:00:46 +0000
Subject: [Bioperl-l] [Bioperl-announce-l] Null Pointer - NCBI
	C++	Exception
In-Reply-To: <CAKVJ-_445daDbwg6QkxLUhnji1BhS+YvEehBhB9D3ij1M22tmw@mail.gmail.com>
References: <31659982.post@talk.nabble.com>
	<CAKVJ-_445daDbwg6QkxLUhnji1BhS+YvEehBhB9D3ij1M22tmw@mail.gmail.com>
Message-ID: <B1FB239F-28AD-4437-97C0-10110763F34D@illinois.edu>

On Nov 10, 2011, at 7:09 AM, Peter wrote:

> Re: http://lists.open-bio.org/pipermail/bioperl-announce-l/2011-May/000253.html
> 
> On Thu, May 19, 2011 at 11:15 PM, rgoldade <rgoldade at sfu.ca> wrote:
>> 
>> I received the following error while trying to run bl2seq from
>> standaloneblastplus. Has anyone else encountered this problem?
>> 
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: /usr/bin/blastp call crashed: There was a problem running
>> /usr/bin/blastp : Error: NCBI C++ Exception:
>> 
>> "/am/ncbiapdata/release/blast/src/2.2.24/Linux32-Suse-icc/c++/ICC1010-ReleaseMT--Linux32-Suse-icc/../src/corelib/ncbiobj.cpp",
>> line 688: Critical: ncbi::CObject::ThrowNullPointerException() - Attempt to
>> access NULL pointer.
>> 
>> Thank you,
>> Ryan
> 
> Just hit something very very similar, looks like a BLAST+ bug which I
> will report now:
> 
> $ tblastn -out NC_003197_vs_NC_011294.tsv -outfmt 6 -query
> NC_003197.fna -evalue 0.0001 -subject NC_011294.fna
> Error: NCBI C++ Exception:
>    "/am/ncbiapdata/release/blast/src/2.2.25/IntelMAC-universal/c++/GCC401-ReleaseMT--IntelMAC-universal/../src/corelib/ncbiobj.cpp",
> line 689: Critical: ncbi::CObject::ThrowNullPointerException() -
> Attempt to access NULL pointer.
> 
> This was on a Mac using BLAST 2.2.25+ (tblastn) whereas yours was
> BLAST 2.2.24+ (blastp) from the look of the error. The line number has
> changed by one, but I'm confident it is the same point of failure.
> 
> In my case I was comparing nucleotide against nucleotide, so should
> have been using tblastx not tblastn, but it still shouldn't have had a
> pointer exception.
> 
> Peter

Yeah, that's bad.  I have seen a few things like this myself that make me worry about the transition to BLAST+.

chris

PS - Odd I didn't see this one, was it caught in the bioperl-announce filter?


From casaburi at ceinge.unina.it  Thu Nov 10 12:29:55 2011
From: casaburi at ceinge.unina.it (Giorgio C)
Date: Thu, 10 Nov 2011 04:29:55 -0800 (PST)
Subject: [Bioperl-l]  Scripting help to identify adaptors count in reads
Message-ID: <32818254.post@talk.nabble.com>


Hi everybody,

I have some 454 reads that contain adaptors (shown as NNNN...): one, two or
three adaptors per read, depending on the read. Is there any way to
establish how many reads have 1 adaptor, how many have 2 and how many have 3,
out of the total?

>271-88
GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCAGGTGCCTACG
>272-88
GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCANNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTTGCCAGCCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGC
>273-88
GCCTCCCTCGCGCATCAGATCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCA
>274-88
GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTCCCTCGCGCCATCAGATCGTNNNNNNNNNNNNNNNNNNTCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCATCAGATCGTAGGCACCATCAA

The problem is that some adaptors occur in the middle of the sequences
because the reads come from a concatemerization experimental design (the
miRNAs sit between the NNNNNN... stretches). So I am looking for a script or
tool that reports how many reads have 1 adaptor, how many have 2, and so on
(the maximum is 4), relative to the total number of reads.
Can anyone suggest a tool or script that would help?

Thank you very much 
-- 
View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32818254.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From jovel_juan at hotmail.com  Thu Nov 10 16:06:16 2011
From: jovel_juan at hotmail.com (Juan Jovel)
Date: Thu, 10 Nov 2011 16:06:16 +0000
Subject: [Bioperl-l] Scripting help to identify adaptors count in reads
In-Reply-To: <32818254.post@talk.nabble.com>
References: <32818254.post@talk.nabble.com>
Message-ID: <COL102-W340F68DB20A406E5B3A9FBFADC0@phx.gbl>


There are many ways to do it.
Perhaps the simplest is to count the number of times the adapter sequence (or part of it) appears in each read.
For example:

    # $adapter_matches stores the number of times the adapter occurs in $read.
    # A global regex match is used because tr/// counts single characters,
    # not occurrences of a whole sequence.
    my $adapter_matches = () = $read =~ /\Q$adapter_sequence\E/g;

You then place that result in a hash bin:

    # %adapter_frequency is declared once, before looping over the reads
    my %adapter_frequency;
    $adapter_frequency{$adapter_matches}++;

    # Then you can sort and output your classes
    foreach my $class (sort { $a <=> $b } keys %adapter_frequency) {
        print "$class\t$adapter_frequency{$class}\n";
    }

You can work out the details, but something like this should work.
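
To make the idea concrete for the reads in the question, here is a
self-contained sketch in plain Perl. It rests on an assumption about the
data: each adaptor shows up as an unbroken run of at least 10 N characters
(the runs in the posted reads are 15 to 18 long). The subroutine names
count_adaptors and adaptor_histogram and the 10-base threshold are
illustrative only.

```perl
use strict;
use warnings;

# Count adaptor stretches in one read. An adaptor is assumed to be a run
# of at least $min_len consecutive N characters (case-insensitive).
sub count_adaptors {
    my ($read, $min_len) = @_;
    $min_len //= 10;
    my $count = () = $read =~ /N{$min_len,}/gi;
    return $count;
}

# Read FASTA records from a filehandle and tally reads by adaptor count.
# Returns a hash: number of adaptors => number of reads.
sub adaptor_histogram {
    my ($fh, $min_len) = @_;
    my (%hist, $seq);
    my $flush = sub {
        $hist{ count_adaptors($seq, $min_len) }++ if defined $seq;
        undef $seq;
    };
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>/) {         # header line: close the previous record
            $flush->();
            $seq = '';
        }
        elsif (defined $seq) {       # sequence line (may span several lines)
            $seq .= $line;
        }
    }
    $flush->();
    return %hist;
}

# Command-line use: perl script.pl reads.fasta
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my %hist = adaptor_histogram($fh);
    close $fh;
    my $total = 0;
    $total += $_ for values %hist;
    printf "%d adaptor(s)\t%d reads\t%.1f%%\n",
        $_, $hist{$_}, 100 * $hist{$_} / $total
        for sort { $a <=> $b } keys %hist;
}
```

Reads with no N run end up in the 0 class, so the percentages are relative
to the total number of reads, as asked.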


> Date: Thu, 10 Nov 2011 04:29:55 -0800
> From: casaburi at ceinge.unina.it
> To: Bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l]  Scripting help to identify adaptors count in reads
> 
> 
> Hi everybody,
> 
> i have some reads (454) where there are adaptors (NNNN...), one,two or three
> adaptors for each reads depending on the reads. Is there any way to
> establish how many reads have 1 adaptors, how many 2 and how many 3 adaptors
> over the total ???
> 
> >271-88
> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCAGGTGCCTACG
> >272-88
> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCANNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTTGCCAGCCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGC
> >273-88
> GCCTCCCTCGCGCATCAGATCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCA
> >274-88
> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTCCCTCGCGCCATCAGATCGTNNNNNNNNNNNNNNNNNNTCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCATCAGATCGTAGGCACCATCAA
> 
> The problem is that some adpators occur in the middle of the sequences
> because they coming out from a concameration experimental design (they are
> miRNAs between NNNNNN...). So i want to know a script or tool that may say
> how many reads have 1 adapt, how many 2, (max are 4) in respect to the total
> number of reads. Do you know any tool/script that may help ? Tnx 
> Can anyone suggests me a script to fix this ???
> 
> Thank you very much 
> -- 
> View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32818254.html
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
 		 	   		  

From scott at scottcain.net  Thu Nov 10 16:55:53 2011
From: scott at scottcain.net (Scott Cain)
Date: Thu, 10 Nov 2011 11:55:53 -0500
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <0CACB816-21E2-494E-ACEF-67F9FBB5AC41@isciii.es>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
	<CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>
	<0CACB816-21E2-494E-ACEF-67F9FBB5AC41@isciii.es>
Message-ID: <CA+JTaoyWsS534Ew43Ye3v77o-+d8zK3rcYAup8PCEPApX_ispw@mail.gmail.com>

Hi Angel,

Please keep correspondence on the mailing list.

I just ran bp_genbank2gff.pl with a genbank file (fruit fly mitochondria),
and it worked fine.  I suspect there is something wrong with your genbank
file.

Scott


On Thu, Nov 10, 2011 at 3:15 AM, Angel Zaballos <azaballos at isciii.es> wrote:

> Hi Scott,
>
> Thanks everyone for your help. I tried bp_genbank2gff3.pl, but the same
> happened:
>
> [root at localhost zaballos]# bp_genbank2gff3.pl babesiaChr3.gbk >
> babesichr3_2.gff
> Replacement list is longer than search list at
> /usr/share/perl5/Bio/Range.pm line 251.
> UNIVERSAL->import is deprecated and will be removed in a future perl at
> /usr/share/perl5/Bio/Tree/TreeFunctionsI.pm line 94
>
> However, the output file seems to be correct (indeed, that was also the
> case for bp_genbank2gff.pl). I then ran ldHgGene and it worked:
>
> [zaballos at localhost ~]$ ./ldHgGene -out=babesiaChr3_2.gpe db tab
> babesiachr3_2.gff
> Reading babesiachr3_2.gff
> Read 4776 transcripts in 8821 lines in 1 files
>   4776 groups 1 seqs 1 sources 6 feature types
> 2379 gene predictions
>
> I'm using Fedora (for bioperl) and CentOS (for ldHgGene), virtualized on a
> Mac with Parallels. Maybe this is the cause of such a message.
>
> Regards
>
>
> Ángel
>
>
> On 09/11/2011, at 17:12, Scott Cain wrote:
>
> Hi Angel,
>
> I would suggest using bp_genbank2gff3.pl, as it is more actively
> maintained; the bp_genbank2gff.pl script hasn't really been touched in
> many years, and I imagine it's suffering from some serious code rot.
>
> Scott
>
>
> 2011/11/9 Angel Zaballos <azaballos at isciii.es>
>
>> Running bp_genbank2gff.pl got this:
>>
>> [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession
>> AAXT01000001.1 > babesichr3.gff
>> Replacement list is longer than search list at
>> /usr/share/perl5/Bio/Range.pm line 251.
>>
>>
>>
>> Ángel Zaballos
>> Unidad de Genómica
>> Centro Nacional de Microbiología-ISCIII
>> Carretera Majadahonda-Pozuelo, Km 2,2
>> 28220-Majadahonda
>>
>> Tel: 918223994
>> mail:  azaballos at isciii.es
>>
>>
>>
>>
>> ************************* LEGAL NOTICE *************************
>> This electronic message is intended exclusively for its addressees and
>> may contain attached documents of a private and confidential nature.
>> If you have received this message in error and are not among the
>> addressees, please do not use, disclose, distribute, print or copy its
>> contents by any means. Please notify the sender and completely delete
>> the message and its attachments. The Instituto de Salud Carlos III
>> assumes no legal liability for the content of this message when it
>> does not correspond to the functions attributed to its sender by
>> current regulations.
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain
> dot net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research
>
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot
net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From l.m.timmermans at students.uu.nl  Thu Nov 10 17:17:12 2011
From: l.m.timmermans at students.uu.nl (L.M. Timmermans)
Date: Thu, 10 Nov 2011 18:17:12 +0100
Subject: [Bioperl-l] Scripting help to identify adaptors count in reads
In-Reply-To: <COL102-W340F68DB20A406E5B3A9FBFADC0@phx.gbl>
References: <32818254.post@talk.nabble.com>
	<COL102-W340F68DB20A406E5B3A9FBFADC0@phx.gbl>
Message-ID: <CAC1jpXAW_MTQjBY8Z8ffr67g_0TrGwWddixuQvtTB19+S+DLVg@mail.gmail.com>

On Thu, Nov 10, 2011 at 5:06 PM, Juan Jovel <jovel_juan at hotmail.com> wrote:

>
> There are many ways to do it.
> Perhaps the simplest is to count the number of times the adapter sequence
> (or part of it) appears in each read.
> For example:
> $adapter_matches = tr/adapter_sequence/adapter_sequence/;#
> $adapter_matches will store the number of times the adapter sequence is
> repeated.
>

No, it will not. tr/// will count characters, not sequences. Something like
"scalar (() = $sequence =~ m/(N+)/g)" should work OTOH.
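
The match-counting idiom above could be turned into a per-read tally along these lines. This is only a minimal sketch: the sub names (count_adaptors, adaptor_histogram) and the three toy reads are made up for illustration, and it assumes the adaptors really are masked as runs of N.

```perl
use strict;
use warnings;

# Count runs of N (masked adaptors) in one read.
# Assigning the global match to an empty list forces list context;
# scalar() of that assignment yields the number of matches.
sub count_adaptors {
    my ($sequence) = @_;
    return scalar( () = $sequence =~ /N+/g );
}

# Tally how many reads carry 0, 1, 2, ... adaptors.
# Returns a hash ref: adaptor count => number of reads.
sub adaptor_histogram {
    my @reads = @_;
    my %histogram;
    $histogram{ count_adaptors($_) }++ for @reads;
    return \%histogram;
}

# Toy reads, not real data.
my @reads = (
    'GCCTTGCCAGNNNNNNNNATCAGG',
    'GCCTTNNNNNATCANNNNNNCTGATNNNNGGC',
    'GCCTCCCTCGCGCATCAGATCG',
);

my $histogram = adaptor_histogram(@reads);
for my $n ( sort { $a <=> $b } keys %$histogram ) {
    printf "%d adaptor(s): %d of %d reads\n",
        $n, $histogram->{$n}, scalar @reads;
}
```

For a real FASTA file the reads would come from a parser (Bio::SeqIO, for instance) instead of a hard-coded list.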

Leon


From cjfields at illinois.edu  Thu Nov 10 19:17:57 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 10 Nov 2011 19:17:57 +0000
Subject: [Bioperl-l] bp_genbank2gff.pl bug
In-Reply-To: <CA+JTaoyWsS534Ew43Ye3v77o-+d8zK3rcYAup8PCEPApX_ispw@mail.gmail.com>
References: <1299F32D-8640-44CA-9718-0F6F3E105414@isciii.es>
	<CA+JTaoz0eAY7xzuATsZaKT=gchPQvFLrgc+=p4Y4BTEuMDzYfQ@mail.gmail.com>
	<0CACB816-21E2-494E-ACEF-67F9FBB5AC41@isciii.es>
	<CA+JTaoyWsS534Ew43Ye3v77o-+d8zK3rcYAup8PCEPApX_ispw@mail.gmail.com>
Message-ID: <66F13EAF-0DAA-45E0-AB5B-E71EC8FA2323@illinois.edu>

This is running using an older version of bioperl (probably 1.6.0 or 1.6.1).  The warnings pop up when using perl v5.12 or v5.14; the first warning is from a bad tr/// in Bio::Range, the second is from bad usage of UNIVERSAL functions. Both have been addressed.
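
For anyone curious how the first message comes about, here is a small stand-alone sketch that triggers the same warning text. It is not the actual Bio::Range code; the tr/ab/xyz/ expression is an invented example of a replacement list that is longer than its search list.

```perl
use strict;
use warnings;

my @captured;
{
    # Collect warnings instead of letting them go to STDERR.
    local $SIG{__WARN__} = sub { push @captured, @_ };

    # The replacement list (xyz) has more characters than the search
    # list (ab). The check happens when the tr/// is compiled, so the
    # snippet is compiled at run time with a string eval.
    eval q{ my $s = 'abc'; $s =~ tr/ab/xyz/; 1 } or die $@;
}

print "captured: $_" for @captured;
```

The warning is harmless for the output, which fits the observation that the GFF file was still correct.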

chris

On Nov 10, 2011, at 10:55 AM, Scott Cain wrote:

> Hi Angel,
> 
> Please keep correspondence on the mailing list.
> 
> I just ran bp_genbank2gff.pl with a genbank file (fruit fly mitochondria),
> and it worked fine.  I suspect there is something wrong with your genbank
> file.
> 
> Scott
> 
> 
> On Thu, Nov 10, 2011 at 3:15 AM, Angel Zaballos <azaballos at isciii.es> wrote:
> 
>> Hi Scott,
>> 
>> Thanks everyone for your help. I tried bp_genbank2gff3.pl, but the same
>> happened:
>> 
>> [root at localhost zaballos]# bp_genbank2gff3.pl babesiaChr3.gbk >
>> babesichr3_2.gff
>> Replacement list is longer than search list at
>> /usr/share/perl5/Bio/Range.pm line 251.
>> UNIVERSAL->import is deprecated and will be removed in a future perl at
>> /usr/share/perl5/Bio/Tree/TreeFunctionsI.pm line 94
>> 
>> However, the output file seems to be correct (Indeed, that was also the
>> case for  bp_genbank2gff.pl). I then ran ldHgGene and worked:
>> 
>> [zaballos at localhost ~]$ ./ldHgGene -out=babesiaChr3_2.gpe db tab
>> babesiachr3_2.gff
>> Reading babesiachr3_2.gff
>> Read 4776 transcripts in 8821 lines in 1 files
>>  4776 groups 1 seqs 1 sources 6 feature types
>> 2379 gene predictions
>> 
>> I'm using Fedora (for bioperl) and CentOS (for ldHgGene), virtualized on a
>> Mac with Parallels. Maybe this is the cause of such a message.
>> 
>> Regards
>> 
>> 
>> Ángel
>> 
>> 
>> On 09/11/2011, at 17:12, Scott Cain wrote:
>> 
>> Hi Angel,
>> 
>> I would suggest using bp_genbank2gff3.pl, as it is more actively
>> maintained; the bp_genbank2gff.pl script hasn't really been touched in
>> many years, and I imagine it's suffering from some serious code rot.
>> 
>> Scott
>> 
>> 
>> 2011/11/9 Angel Zaballos <azaballos at isciii.es>
>> 
>>> Running bp_genbank2gff.pl got this:
>>> 
>>> [root at localhost zaballos]# bp_genbank2gff.pl -stdout -accession
>>> AAXT01000001.1 > babesichr3.gff
>>> Replacement list is longer than search list at
>>> /usr/share/perl5/Bio/Range.pm line 251.
>>> 
>>> 
>>> 
>> 
>> 
>> 
>> --
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D.                                   scott at scottcain
>> dot net
>> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>> Ontario Institute for Cancer Research
>> 
>> 
> 
> 
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain dot
> net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From biopython at maubp.freeserve.co.uk  Thu Nov 10 19:27:22 2011
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 10 Nov 2011 19:27:22 +0000
Subject: [Bioperl-l] [Bioperl-announce-l] Null Pointer - NCBI C++
	Exception
In-Reply-To: <B1FB239F-28AD-4437-97C0-10110763F34D@illinois.edu>
References: <31659982.post@talk.nabble.com>
	<CAKVJ-_445daDbwg6QkxLUhnji1BhS+YvEehBhB9D3ij1M22tmw@mail.gmail.com>
	<B1FB239F-28AD-4437-97C0-10110763F34D@illinois.edu>
Message-ID: <CAKVJ-_4+hGzxmn43qJ4SkJfCaPUQw=PkV5QSjUyqPSDmyVw64A@mail.gmail.com>

On Thu, Nov 10, 2011 at 2:00 PM, Fields, Christopher J
<cjfields at illinois.edu> wrote:
> On Nov 10, 2011, at 7:09 AM, Peter wrote:
>
>> Re: http://lists.open-bio.org/pipermail/bioperl-announce-l/2011-May/000253.html
>>
>> On Thu, May 19, 2011 at 11:15 PM, rgoldade <rgoldade at sfu.ca> wrote:
>>>
>>> I received the following error while trying to run bl2seq from
>>> standaloneblastplus. Has anyone else encountered this problem?
>>>
>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>> MSG: /usr/bin/blastp call crashed: There was a problem running
>>> /usr/bin/blastp : Error: NCBI C++ Exception:
>>>
>>> "/am/ncbiapdata/release/blast/src/2.2.24/Linux32-Suse-icc/c++/ICC1010-ReleaseMT--Linux32-Suse-icc/../src/corelib/ncbiobj.cpp",
>>> line 688: Critical: ncbi::CObject::ThrowNullPointerException() - Attempt to
>>> access NULL pointer.
>>>
>>> Thank you,
>>> Ryan
>>
>> Just hit something very very similar, looks like a BLAST+ bug which I
>> will report now:
>>
>> $ tblastn -out NC_003197_vs_NC_011294.tsv -outfmt 6 -query
>> NC_003197.fna -evalue 0.0001 -subject NC_011294.fna
>> Error: NCBI C++ Exception:
>>    "/am/ncbiapdata/release/blast/src/2.2.25/IntelMAC-universal/c++/GCC401-ReleaseMT--IntelMAC-universal/../src/corelib/ncbiobj.cpp",
>> line 689: Critical: ncbi::CObject::ThrowNullPointerException() -
>> Attempt to access NULL pointer.
>>
>> This was on a Mac using BLAST 2.2.25+ (tblastn) whereas yours was
>> BLAST 2.2.24+ (blastp) from the look of the error. The line number has
>> changed by one, but I'm confident it is the same point of failure.
>>
>> In my case I was comparing nucleotide against nucleotide, so should
>> have been using tblastx not tblastn, but it still shouldn't have had a
>> pointer exception.
>>
>> Peter
>
> Yeah, that's bad.  I have seen a few things like this myself that make me worry about the transition to BLAST+.
>
> chris

I'm told it is already fixed and will be part of BLAST 2.2.26+, which is good.

>
> PS - Odd I didn't see this one, was it caught in the bioperl-announce filter?
>

Maybe once, but it was in the archive and my email account.

Peter


From anna.fr at gmail.com  Thu Nov 10 20:01:57 2011
From: anna.fr at gmail.com (Anna Friedlander)
Date: Fri, 11 Nov 2011 09:01:57 +1300
Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi?
Message-ID: <CALv2E+3QpXhOU2Decmw146Bd1P4bDtg45ZosGWkeGkqzZx0XOg@mail.gmail.com>

Hi all

Does anyone know if there is a way to get a Taxonomy node and/or
taxonid from a gi number using the flatfile with taxonomy db?

I have blast output that I want to append taxonomic information to. I
have hundreds of thousands of items to do this for, so it's not
practical to use entrez to query the NCBI database.

I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I
think much too large to put into a hash!

This was also discussed in 2009:
http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I
don't think there was a conclusion?

Thanks for your help
Anna Friedlander


From shalabh.sharma7 at gmail.com  Thu Nov 10 20:12:09 2011
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Thu, 10 Nov 2011 15:12:09 -0500
Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi?
In-Reply-To: <CALv2E+3QpXhOU2Decmw146Bd1P4bDtg45ZosGWkeGkqzZx0XOg@mail.gmail.com>
References: <CALv2E+3QpXhOU2Decmw146Bd1P4bDtg45ZosGWkeGkqzZx0XOg@mail.gmail.com>
Message-ID: <CAA7rn9cf=iBZWRmg0f1GzeT9=NOp4nV7MfDyLxmTHH4rfYHtug@mail.gmail.com>

Hi Anna,
           I think the thread you mentioned was started by me.
At that time I wrote a few scripts to map GIs to taxa, and after some time I
found some other efficient ways as well. But recently Miguel Pignatelli
directed me to some Bio-LITE modules that are really helpful.

These are the modules he mentioned; I found them really easy to use and
very efficient.

Bio-LITE-Taxonomy-0.07
Bio-LITE-Taxonomy-NCBI-0.07
Bio-LITE-Taxonomy-NCBI-Gi2taxid-0.04

Cheers
Shalabh

On Thu, Nov 10, 2011 at 3:01 PM, Anna Friedlander <anna.fr at gmail.com> wrote:

> Hi all
>
> Does anyone know if there is a way to get a Taxonomy node and/or
> taxonid from a gi number using the flatfile with taxonomy db?
>
> I have blast output that I want to append taxonomic information to. I
> have hundreds of thousands of items to do this for, so it's not
> practical to use entrez to query the NCBI database.
>
> I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I
> think much too large to put into a hash!
>
> This was also discussed in 2009:
> http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I
> don't think there was a conclusion?
>
> Thanks for your help
> Anna Friedlander
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
Shalabh Sharma
Scientific Computing Professional Associate (Bioinformatics Specialist)
Department of Marine Sciences
University of Georgia
Athens, GA 30602-3636


From cjfields at illinois.edu  Thu Nov 10 20:23:14 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 10 Nov 2011 20:23:14 +0000
Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi?
In-Reply-To: <CAA7rn9cf=iBZWRmg0f1GzeT9=NOp4nV7MfDyLxmTHH4rfYHtug@mail.gmail.com>
References: <CALv2E+3QpXhOU2Decmw146Bd1P4bDtg45ZosGWkeGkqzZx0XOg@mail.gmail.com>
	<CAA7rn9cf=iBZWRmg0f1GzeT9=NOp4nV7MfDyLxmTHH4rfYHtug@mail.gmail.com>
Message-ID: <53AF9ECA-5905-4D14-B7C1-FF4B2F2FA084@illinois.edu>

Yes, these are probably wrappers around the gi2taxid, and taxonomy data; bioperl lacks the former, whereas the latter is handled by Bio::DB::Taxonomy (the 'flatfile' option).  I did something very similar locally, though I used Bio::DB::Taxonomy for the taxonomy lookups.

chris

On Nov 10, 2011, at 2:12 PM, shalabh sharma wrote:

> Hi Anna,
>           I think the thread you mentioned was started by me.
> That time i wrote few scripts to map gi to taxa, after some time i found
> some other efficient ways also. But recently 'Miguel Pignatelli' directed
> to some Bio-LITE modules that are really helpful.
> 
> These are the modules he mentioned, i found them really easy to use and
> very efficient.
> 
> Bio-LITE-Taxonomy-0.07
> Bio-LITE-Taxonomy-NCBI-0.07
> Bio-LITE-Taxonomy-NCBI-Gi2taxid-0.04
> 
> Cheers
> Shalabh
> 
> On Thu, Nov 10, 2011 at 3:01 PM, Anna Friedlander <anna.fr at gmail.com> wrote:
> 
>> Hi all
>> 
>> Does anyone know if there is a way to get a Taxonomy node and/or
>> taxonid from a gi number using the flatfile with taxonomy db?
>> 
>> I have blast output that I want to append taxonomic information to. I
>> have hundreds of thousands of items to do this for, so it's not
>> practical to use entrez to query the NCBI database.
>> 
>> I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I
>> think much too large to put into a hash!
>> 
>> This was also discussed in 2009:
>> http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I
>> don't think there was a conclusion?
>> 
>> Thanks for your help
>> Anna Friedlander
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
> 
> 
> 
> -- 
> Shalabh Sharma
> Scientific Computing Professional Associate (Bioinformatics Specialist)
> Department of Marine Sciences
> University of Georgia
> Athens, GA 30602-3636
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From bernd.web at gmail.com  Thu Nov 10 20:51:13 2011
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 10 Nov 2011 21:51:13 +0100
Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi?
In-Reply-To: <CALv2E+3QpXhOU2Decmw146Bd1P4bDtg45ZosGWkeGkqzZx0XOg@mail.gmail.com>
References: <CALv2E+3QpXhOU2Decmw146Bd1P4bDtg45ZosGWkeGkqzZx0XOg@mail.gmail.com>
Message-ID: <CAExAtoD4aD_zPJXmZZHOG8uUqcTmQr90oQsbdumku5wrWz-erw@mail.gmail.com>

Hi Anna,

Jason changed his example script from using hashes to using SQLite:
bp_classify_hits_kingdom - classify BLAST hits by taxonomic kingdom

See
https://github.com/bioperl/bioperl-live/blob/master/scripts/taxa/bp_classify_hits_kingdom.pl

It's an example script that shows how to do the tax to gi mapping for
blast reports.


Bernd

On Thu, Nov 10, 2011 at 9:01 PM, Anna Friedlander <anna.fr at gmail.com> wrote:
> Hi all
>
> Does anyone know if there is a way to get a Taxonomy node and/or
> taxonid from a gi number using the flatfile with taxonomy db?
>
> I have blast output that I want to append taxonomic information to. I
> have hundreds of thousands of items to do this for, so it's not
> practical to use entrez to query the NCBI database.
>
> I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I
> think much too large to put into a hash!
>
> This was also discussed in 2009:
> http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I
> don't think there was a conclusion?
>
> Thanks for your help
> Anna Friedlander
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From cjfields at illinois.edu  Thu Nov 10 21:13:12 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 10 Nov 2011 21:13:12 +0000
Subject: [Bioperl-l] Scripting help to identify adaptors count in reads
In-Reply-To: <COL102-W340F68DB20A406E5B3A9FBFADC0@phx.gbl>
References: <32818254.post@talk.nabble.com>
	<COL102-W340F68DB20A406E5B3A9FBFADC0@phx.gbl>
Message-ID: <9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu>

If the adaptors are masked (e.g. are represented by the N's below) or if you are really confident that the adaptors don't have base mis-calls, why not use split?  Maybe with something like 'scalar(split(/N+/, $foo))' or scalar(split(/$adaptor/, $foo)).  

tr/// won't work for the reasons Leon mentioned; it's a transliteration of a character mapping, not a pattern match.  '$foo =~ tr/ATGCatgc/TACGtacg/' for instance converts $foo to the complement sequence (it doesn't match the pattern /ATGCatgc/).
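
To make the difference concrete, here is a rough side-by-side sketch. The read is a toy string built for the example, and the variable names are illustrative. Note that split returns the pieces between the N runs, so its field count is not always the number of adaptors (it depends on whether a run sits at the start or end of the read).

```perl
use strict;
use warnings;

# Toy read: two masked adaptors of 5 and 6 Ns.
my $read = 'GCCTT' . 'N' x 5 . 'ATCA' . 'N' x 6 . 'CTGATGGC';

# tr/// with identical search and replacement lists leaves the string
# unchanged and returns how many characters matched: every single N.
my $n_chars = ( $read =~ tr/N/N/ );

# A global match in list context yields one element per run of Ns.
my $n_runs = () = $read =~ /N+/g;

# split returns the fields between the runs, not the runs themselves.
my @fields = split /N+/, $read;

print "N characters: $n_chars\n";
print "N runs:       $n_runs\n";
print "split fields: ", scalar(@fields), "\n";

# tr/// doing what it is meant for: complementing a copy of the read.
( my $complement = $read ) =~ tr/ACGTacgt/TGCAtgca/;
print "complement:   $complement\n";
```

If the reads can start or end with an adaptor, counting matches is probably the safer of the two approaches.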

chris

On Nov 10, 2011, at 10:06 AM, Juan Jovel wrote:

> 
> There are many ways to do it. 
> Perhaps the simplest is to count the number of times the adapter sequence (or part of it) appears in each read. 
> For example: 
> 
> $adapter_matches = tr/adapter_sequence/adapter_sequence/;
> # $adapter_matches will store the number of times the adapter sequence is repeated.
> 
> You then place that result in a hash bin:
> 
> my %adapter_frequency;
> my $class = "$adapter_matches";
> if (exists $adapter_frequency{$class}) {
>     $adapter_frequency{$class}++
> } else {
>     $adapter_frequency{$class} = 1
> }
> 
> # Then you can sort and output your classes
> foreach $class (sort keys %adapter_frequency) {
>     print "$class\t$adapter_frequency{$class}\n";
> }
> 
> You can workout the details, but something like this should work.
> 
> 
> 
> 
> 
> 
> 
>> Date: Thu, 10 Nov 2011 04:29:55 -0800
>> From: casaburi at ceinge.unina.it
>> To: Bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l]  Scripting help to identify adaptors count in reads
>> 
>> 
>> Hi everybody,
>> 
>> i have some reads (454) where there are adaptors (NNNN...), one,two or three
>> adaptors for each reads depending on the reads. Is there any way to
>> establish how many reads have 1 adaptors, how many 2 and how many 3 adaptors
>> over the total ???
>> 
>>> 271-88
>> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCAGGTGCCTACG
>>> 272-88
>> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNATCANNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTTGCCAGCCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGC
>>> 273-88
>> GCCTCCCTCGCGCATCAGATCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCA
>>> 274-88
>> GCCTTGCCAGCCGCTCAGATTGATNNNNNNNNNNNNNNNCTGATGGCGCGAGGGAGGCGCCTCCCTCGCGCCATCAGATCGTNNNNNNNNNNNNNNNNNNTCGTAGGCACCATCAATCTGAGCGGGCTGGCAAGGCGCCTCCCTCGCGCCATCAGATCGTAGGCACCATCAA
>> 
>> The problem is that some adpators occur in the middle of the sequences
>> because they coming out from a concameration experimental design (they are
>> miRNAs between NNNNNN...). So i want to know a script or tool that may say
>> how many reads have 1 adapt, how many 2, (max are 4) in respect to the total
>> number of reads. Do you know any tool/script that may help ? Tnx 
>> Can anyone suggests me a script to fix this ???
>> 
>> Thank you very much 
>> -- 
>> View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32818254.html
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 		 	   		  
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From jason.stajich at gmail.com  Thu Nov 10 21:15:29 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Thu, 10 Nov 2011 13:15:29 -0800
Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi?
In-Reply-To: <CAExAtoD4aD_zPJXmZZHOG8uUqcTmQr90oQsbdumku5wrWz-erw@mail.gmail.com>
References: <CALv2E+3QpXhOU2Decmw146Bd1P4bDtg45ZosGWkeGkqzZx0XOg@mail.gmail.com>
	<CAExAtoD4aD_zPJXmZZHOG8uUqcTmQr90oQsbdumku5wrWz-erw@mail.gmail.com>
Message-ID: <1C8130A1-A60A-4F16-A077-4E1B592579BC@gmail.com>

Here's another variant that I wrote for my own purposes. The code at the beginning uses a NoSQL solution to store all the ACC -> GI mappings,
and then a second db to store GI -> TAXONID.

This is the case where I have a file of accession numbers and I want to add to the description line the taxonomy string.

https://github.com/hyphaltip/mobedac-fungi/blob/master/scripts/taxonomy_lookupmissing.pl

That's the first 165 lines, and then lookups are basically what you see on line 195.

Would be good to rewrite that script below to use Tokyo Cabinet or Kyoto Cabinet (the newer implementation; not sure if it is faster?).
One thing that this does is take up a lot of disk space, but there are tradeoffs between that and which compression scheme you use, which will impact loading performance.
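
The general pattern (load the GI -> TAXONID table into an on-disk key-value store once, then do cheap lookups) can be sketched roughly as follows. This is not the code from the script linked above. SDBM_File is used here only because it ships with Perl; the helper names (open_index, load_dump) and the three GI/taxid pairs are made up for the example. SDBM has small record limits and is not tuned for a multi-gigabyte table, so for the full dump something like DB_File, SQLite or Kyoto Cabinet is probably a better fit.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use SDBM_File;
use File::Temp qw(tempdir);

# Tie a hash to an on-disk SDBM database and return a reference to it.
sub open_index {
    my ($db_path) = @_;
    tie my %index, 'SDBM_File', $db_path, O_RDWR | O_CREAT, 0666
        or die "Cannot open $db_path: $!";
    return \%index;
}

# Load "gi<TAB>taxid" lines from a filehandle into the index.
sub load_dump {
    my ($index, $fh) = @_;
    while ( my $line = <$fh> ) {
        chomp $line;
        my ($gi, $taxid) = split /\t/, $line;
        $index->{$gi} = $taxid if defined $taxid;
    }
    return;
}

# Toy stand-in for the real GI -> taxid dump (values are illustrative).
my $toy_dump = "101\t9606\n102\t562\n103\t9606\n";
open my $fh, '<', \$toy_dump or die "Cannot read toy dump: $!";

my $dir   = tempdir( CLEANUP => 1 );
my $index = open_index("$dir/gi2taxid");
load_dump($index, $fh);
close $fh;

# Lookups now hit the disk-backed hash rather than an in-memory one.
for my $gi (101, 102, 999) {
    my $taxid = $index->{$gi};
    print "$gi\t", ( defined $taxid ? $taxid : 'not found' ), "\n";
}
```

Once a taxid is in hand, the taxonomy node itself can be fetched with Bio::DB::Taxonomy as discussed elsewhere in this thread.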

Jason

On Nov 10, 2011, at 12:51 PM, Bernd Web wrote:

> Hi Anna,
> 
> Jason changed his example script from using hashes to using SQLite:
> bp_classify_hits_kingdom - classify BLAST hits by taxonomic kingdom
> 
> See
> https://github.com/bioperl/bioperl-live/blob/master/scripts/taxa/bp_classify_hits_kingdom.pl
> 
> It's an example script that shows how to do the tax to gi mapping for
> blast reports.
> 
> 
> Bernd
> 
> On Thu, Nov 10, 2011 at 9:01 PM, Anna Friedlander <anna.fr at gmail.com> wrote:
>> Hi all
>> 
>> Does anyone know if there is a way to get a Taxonomy node and/or
>> taxonid from a gi number using the flatfile with taxonomy db?
>> 
>> I have blast output that I want to append taxonomic information to. I
>> have hundreds of thousands of items to do this for, so it's not
>> practical to use entrez to query the NCBI database.
>> 
>> I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I
>> think much too large to put into a hash!
>> 
>> This was also discussed in 2009:
>> http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I
>> don't think there was a conclusion?
>> 
>> Thanks for your help
>> Anna Friedlander
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From anna.fr at gmail.com  Fri Nov 11 01:07:57 2011
From: anna.fr at gmail.com (Anna Friedlander)
Date: Fri, 11 Nov 2011 14:07:57 +1300
Subject: [Bioperl-l] taxonomy db flatfile: get taxon from gi?
In-Reply-To: <1C8130A1-A60A-4F16-A077-4E1B592579BC@gmail.com>
References: <CALv2E+3QpXhOU2Decmw146Bd1P4bDtg45ZosGWkeGkqzZx0XOg@mail.gmail.com>
	<CAExAtoD4aD_zPJXmZZHOG8uUqcTmQr90oQsbdumku5wrWz-erw@mail.gmail.com>
	<1C8130A1-A60A-4F16-A077-4E1B592579BC@gmail.com>
Message-ID: <CALv2E+09JeJiXPUoZphNZnaVhWM9mstkhhp+=1Jvs6Hjy3c+uA@mail.gmail.com>

thanks all for the fast responses.

I'll try the bio-lite modules shalabh mentioned

On Fri, Nov 11, 2011 at 10:15 AM, Jason Stajich <jason.stajich at gmail.com> wrote:
> Here's another variant of one I wrote which is for my own purposes, the code
> at the beginning uses a NOSQL solution to storing all the ACC -> GI
> and then a second db to store GI -> TAXONID
> This is the case where I have a file of accession numbers and I want to add
> to the description line the taxonomy string.
> https://github.com/hyphaltip/mobedac-fungi/blob/master/scripts/taxonomy_lookupmissing.pl
> That's the first 165 lines, and then lookups are basically what you see on
> line 195.
> Would be good to rewrite that script below to use Tokyo Cabinet
> or Kyoto Cabinet (is newer implementation, not sure if it is faster?).
> one thing that this does is take up a lot of disk space, but you can have
> tradeoffs between that and which compression scheme you use, which will
> impact performance of loading.
> Jason
> On Nov 10, 2011, at 12:51 PM, Bernd Web wrote:
>
> Hi Anna,
>
> Jason changed his example script from using hashes to using SQLite:
> bp_classify_hits_kingdom - classify BLAST hits by taxonomic kingdom
>
> See
> https://github.com/bioperl/bioperl-live/blob/master/scripts/taxa/bp_classify_hits_kingdom.pl
>
> It's an example script that shows how to do the tax to gi mapping for
> blast reports.
>
>
> Bernd
>
> On Thu, Nov 10, 2011 at 9:01 PM, Anna Friedlander <anna.fr at gmail.com> wrote:
>
> Hi all
>
> Does anyone know if there is a way to get a Taxonomy node and/or
>
> taxonid from a gi number using the flatfile with taxonomy db?
>
> I have blast output that I want to append taxonomic information to. I
>
> have hundreds of thousands of items to do this for, so it's not
>
> practical to use entrez to query the NCBI database.
>
> I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I
>
> think much too large to put into a hash!
>
> This was also discussed in 2009:
>
> http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I
>
> don't think there was a conclusion?
>
> Thanks for your help
>
> Anna Friedlander
>
> _______________________________________________
>
> Bioperl-l mailing list
>
> Bioperl-l at lists.open-bio.org
>
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


From arun_innovative90 at yahoo.com  Fri Nov 11 11:09:46 2011
From: arun_innovative90 at yahoo.com (Arun Kumar)
Date: Fri, 11 Nov 2011 03:09:46 -0800 (PST)
Subject: [Bioperl-l] BIOPERL MATERIAL
Message-ID: <1321009786.42748.YahooMailNeo@web110101.mail.gq1.yahoo.com>

Hi team,

I am Arun Kumar, a bioinformatics student, and I wish to master BioPerl by reading your documentation. If possible, please send me a PDF of the BioPerl documentation, as it would be useful for getting familiar with BioPerl.

Thanks in advance

Thanks & Regards,
Arunkumar.d


From awitney at sgul.ac.uk  Fri Nov 11 13:23:29 2011
From: awitney at sgul.ac.uk (Adam Witney)
Date: Fri, 11 Nov 2011 13:23:29 +0000
Subject: [Bioperl-l] BIOPERL MATERIAL
In-Reply-To: <1321009786.42748.YahooMailNeo@web110101.mail.gq1.yahoo.com>
References: <1321009786.42748.YahooMailNeo@web110101.mail.gq1.yahoo.com>
Message-ID: <EA1DBB02-0280-4207-97E7-A116C058A615@sgul.ac.uk>


All BioPerl documents can be found here:

http://www.bioperl.org/wiki/Main_Page

And a useful place to start would be the HOWTOs:

http://www.bioperl.org/wiki/HOWTOs

regards

adam


On 11 Nov 2011, at 11:09, Arun Kumar wrote:

> Hi team,
>
> I am Arun Kumar, a bioinformatics student, and I wish to master BioPerl by reading your documentation. If possible, please send me a PDF of the BioPerl documentation, as it would be useful for getting familiar with BioPerl.
>
> Thanks in advance
> 
> Thanks & Regards,
> Arunkumar.d
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From casaburi at ceinge.unina.it  Fri Nov 11 12:13:50 2011
From: casaburi at ceinge.unina.it (Giorgio C)
Date: Fri, 11 Nov 2011 04:13:50 -0800 (PST)
Subject: [Bioperl-l] Scripting help to identify adaptors count in reads
In-Reply-To: <9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu>
References: <32818254.post@talk.nabble.com>
	<COL102-W340F68DB20A406E5B3A9FBFADC0@phx.gbl>
	<9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu>
Message-ID: <32825229.post@talk.nabble.com>


Hi thank you for your answer !!! 

In the end I tried this script, and it seems to work for this purpose:


perl -pe 's/.*?((NNNNNNNNNNNNNNN|NNNNNNNNNNNNNNN)(.*?)(NNNNNNNNNNNNNNN|NNNNNNNNNNNNNNN))/$3/g' \
    Scrivania/orchidea/Fiore/Mydata.fasta > result.txt


-- 
View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32825229.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From casaburi at ceinge.unina.it  Fri Nov 11 12:21:29 2011
From: casaburi at ceinge.unina.it (Giorgio C)
Date: Fri, 11 Nov 2011 04:21:29 -0800 (PST)
Subject: [Bioperl-l] Scripting help to identify adaptors count in reads
In-Reply-To: <9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu>
References: <32818254.post@talk.nabble.com>
	<COL102-W340F68DB20A406E5B3A9FBFADC0@phx.gbl>
	<9E50CF00-FA3B-4306-81C6-DD4DC784868E@illinois.edu>
Message-ID: <32825274.post@talk.nabble.com>


Thanks everybody for answering me so soon !!! Probably another way may be:

perl -ne '$count{s/N+//g}++ if /^[^>]/;END{for $i (keys %count){print "$count{$i} have $i ADAPTOR\n";}}' myFile.fasta > result.txt


and/or with 'nawk':

nawk -F'[N]+' '/^[^>]/{a[NF-1]++}END{for(i in a) print a[i] " have " i " ADAPTOR"}' myFile.fasta > result.txt

They give the same result. If you run into this problem, try these;
they work well!
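
For anyone who prefers Python, the same tally can be sketched as follows.
This is only an illustration of the counting idea used in the two
one-liners above (each run of N on a sequence line is treated as one
adaptor). The function name count_n_runs and the sample reads are made up
for the example.

```python
import re
from collections import Counter


def count_n_runs(lines):
    """Tally sequence lines by the number of runs of N they contain."""
    tally = Counter()
    for line in lines:
        line = line.strip()
        # Skip blank lines and FASTA header lines, like /^[^>]/ does.
        if not line or line.startswith(">"):
            continue
        tally[len(re.findall(r"N+", line))] += 1
    return tally


# Made-up reads, purely for illustration.
fasta = [">read1", "ACGTNNNNACGTNNNNACGT", ">read2", "ACGTACGT", ">read3", "NNNNACGT"]
for runs, n_reads in sorted(count_n_runs(fasta).items()):
    print(f"{n_reads} have {runs} ADAPTOR")
```

To run it on a real file, pass an open file handle instead of the list,
e.g. count_n_runs(open("myFile.fasta")).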

Still Thanks to all,

Giorgio


-- 
View this message in context: http://old.nabble.com/Scripting-help-to-identify-adaptors-count-in-reads-tp32818254p32825274.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From p.j.a.cock at googlemail.com  Sun Nov 13 12:24:35 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 13 Nov 2011 12:24:35 +0000
Subject: [Bioperl-l] OBDA redux?
In-Reply-To: <FB761CFA-1CFD-4FA0-A708-2CE3F2F240D9@illinois.edu>
References: <CAKVJ-_6bzfZZr82y+J4qmGbn0du4rxaKaGxjmbC7p-pU_nMuoQ@mail.gmail.com>
	<FB761CFA-1CFD-4FA0-A708-2CE3F2F240D9@illinois.edu>
Message-ID: <CAKVJ-_4i8AJL1emBKpnO+p-SVzNtbSVwdL9uSy72NWkHugRtVA@mail.gmail.com>

On Thu, Nov 3, 2011 at 7:47 PM, Fields, Christopher J
<cjfields at illinois.edu> wrote:
> On Nov 3, 2011, at 1:52 PM, Peter Cock wrote:
>
>> On Thu, Nov 3, 2011 at 6:28 PM, Fields, Christopher J
>> <cjfields at illinois.edu> wrote:
>>> (side thread, so re-titling...)
>>>
>> And CC'ing open-bio-l, which is a better home for this than bioperl-l,
>> where OBDA v2 talk came up again in discussion of a BioPerl indexing
>> problem. Archive links for thread here:
>>
>> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035807.html
>> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035808.html
>> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035811.html
>> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035812.html
>> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035813.html
>> http://lists.open-bio.org/pipermail/bioperl-l/2011-November/035822.html
>
> yes, good idea...

I've not CC'd the bioperl-l anymore.

>>> On Nov 1, 2011, at 1:06 PM, Peter Cock wrote:
>>>>
>>>> Yes, we're using SQLite3 to store essentially a list of filenames
>>>> and their format as one table, and then in the main table an
>>>> entry for each sequence recording the ID (only one accession,
>>>> unlike OBDA which had infrastructure for a secondary accession),
>>>> file number, offset of the start of the record, and optionally the
>>>> length of the record on disk.
>>>>
>>>> i.e. Basically what OBDA does, but using SQLite rather
>>>> than BDB (not included in Python 3) or a flat file index
>>>> (poor performance with large datasets).
>>>>
>>>> I find this design attractive on several levels:
>>>> * File format neutral, covers FASTA, FASTQ, GenBank, etc
>>>> * Preserves the original file untouched
>>>> * Index is a small single file (thanks to SQLite)
>>>> * Back end could be switched out
>>>> * Could be applied to compressed file formats
>>>> * Reuses existing parsing code to access entries
>>>>
>>>> This could easily form basis of OBDA v2, the main points
>>>> of difference I anticipate between the Bio* projects would
>>>> be naming conventions for the different file formats, and
>>>> what we consider to be the default record ID of each read
>>>> (e.g. which field in a GenBank file - although agreement
>>>> here is not essential). Some of that was already settled in
>>>> principle with OBDA v1.
>>>
>>> The primary/secondary IDs could be configurable with a sane
>>> default, I think the bioperl implementations allowed this (and
>>> it is certainly something that will be requested).
>>
>> One reason I went with a single ID only was to keep the
>> Python dictionary based API simple (think hash in Perl).
>> You don't get secondary keys in a Python dict or a hash ;)
>>
>> As a nod to flexibility, in Biopython's Bio.SeqIO indexing you
>> can provide a call back function to map the suggested ID to
>> something else. Obviously this doesn't give the full flexibility
>> of extracting a field from the record's annotation because we
>> don't parse the whole record during indexing (it would be too
>> slow).
>
> Same with bioperl.
>
>> However, I'm happy for there to be an *optional* secondary
>> key in an OBDA v2 SQLite schema, but Biopython probably
>> won't populate it. We could optionally use it rather than the
>> primary ID on loading an existing index though.
>
> Optional implementation of that is fine by me.
>
>> Personally I would stick with one key in the index - it should
>> be faster and makes it simpler to switch the back end if we
>> need to later. If anyone wants a second key, they can build
>> a second index *grin*.
>
> That's easy enough.
>
>>>> On the other hand, you could try and store the parsed data
>>>> itself, which is where NOSQL looks more interesting. That
>>>> essentially requires the ability to serialise your annotated
>>>> sequence object model to disk - which would be tricky to do
>>>> cross project (much more ambitious than BioSQL is). It also
>>>> means the "index" becomes very large because it now holds
>>>> all the original data.
>>>>
>>>> Peter
>>>
>>> For a fully cross-Bio* compliant format, I don't think it's feasible
>>> to use serialized data unless they are serialized in something
>>> that is easily deserialized across HLLs (JSON, BSON, YAML,
>>> XML, etc). Either that, or such data is stored concurrently with
>>> the binary blob, along with meta data that indicates the source
>>> of the blob, parser, version, etc, etc (unless there are tools out
>>> there that reliably interconvert serialized complex data structures
>>> between HLLs). Any way you go about it, it seems like it could
>>> be a major ball of hurt, unless implemented very carefully.
>>
>> You missed out RDF as a serialisation ;)
>>
>> But yes, going down the shared serialisation route is going
>> to be messy - as you are well aware:
>>
>>> Aside: I think this was one of the problems with
>>> Bio::DB::SeqFeature::Store, in that it at one point stored
>>> Perl-specific Storable blobs.
>>>
>>> chris
>>
>> Peter
>
> yes, it's a problem w/o an easy solution. Anyway, I think an
> implementation of such at this point would be a premature
> optimization.
>
> chris

So, Chris and I seem in general agreement that an OBDA v2
using SQLite but based on essentially the same approach as
the BDB or flat file based OBDA v1 is a good idea. i.e. Tables
mapping record identifiers to file offsets in the original sequence
files.
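
As a rough illustration of that approach, here is a minimal Python sketch
of an offset index kept in SQLite. The table names (files, records), the
column names and the helpers build_index and fetch are assumptions made up
for this example; they are not an agreed OBDA v2 schema, and the sketch
only handles plain, well-formed FASTA.

```python
import os
import sqlite3
import tempfile


def build_index(db, fasta_paths):
    """Store (id, file number, offset, length) for every FASTA record."""
    db.execute("CREATE TABLE files (file_number INTEGER PRIMARY KEY, "
               "name TEXT, format TEXT)")
    db.execute("CREATE TABLE records (id TEXT PRIMARY KEY, "
               "file_number INTEGER, offset INTEGER, length INTEGER)")
    for file_number, path in enumerate(fasta_paths):
        db.execute("INSERT INTO files VALUES (?, ?, ?)",
                   (file_number, path, "fasta"))
        with open(path, "rb") as handle:
            rec_id, start = None, 0
            while True:
                offset = handle.tell()
                line = handle.readline()
                # A new header or end of file closes the previous record.
                if line.startswith(b">") or not line:
                    if rec_id is not None:
                        db.execute("INSERT INTO records VALUES (?, ?, ?, ?)",
                                   (rec_id, file_number, start, offset - start))
                    if not line:
                        break
                    rec_id, start = line[1:].split()[0].decode(), offset
    db.commit()


def fetch(db, rec_id):
    """Return the raw text of one record by seeking to its stored offset."""
    row = db.execute(
        "SELECT name, offset, length FROM records "
        "JOIN files USING (file_number) WHERE id = ?", (rec_id,)).fetchone()
    if row is None:
        raise KeyError(rec_id)
    name, offset, length = row
    with open(name, "rb") as handle:
        handle.seek(offset)
        return handle.read(length).decode()


# Tiny made-up FASTA file, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.fasta")
    with open(path, "wb") as out:
        out.write(b">seq1 first\nACGT\nACGT\n>seq2 second\nGGCC\n")
    db = sqlite3.connect(":memory:")
    build_index(db, [path])
    print(fetch(db, "seq2"))
    db.close()
```

The original file is never modified; the index only records where each
record starts and how long it is, so the existing parser can be pointed
at the returned text.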

I hope to get BioRuby on board; they already have OBDA
v1 support, so that shouldn't be too hard.

Right now I don't recall if BioJava has/had OBDA v1 support,
and if they did if it was affected in their recent move to BioJava
v3 (I understand from their mailing list that some older lower
priority functionality has not all been ported yet).

Also EMBOSS are likely to be interested, certainly Peter Rice
was interested in the SQLite indexing we're already using in
Biopython for sequence files (i.e. what is effectively the
prototype for OBDA v2).

Note that in addition to simple indexing of text files, we are
already using the same simple offset + length approach for
indexing binary files (e.g. SFF).

On the immediate practical side, I think I can edit the
current OBDA website of http://obda.open-bio.org/
via /home/websites/obda.open-bio.org/html on the
server.

We need to work out where the current OBDA indexing
specification lives (CVS or SVN?) and perhaps move
that to GitHub. We may need a general OBF organisation
account on GitHub for this and any other cross-project
repositories.

I see there is already an OBDA project on RedMine,
(Chris can you add me to that please?)
https://redmine.open-bio.org/projects/obda

Peter


From p.j.a.cock at googlemail.com  Sun Nov 13 12:30:37 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 13 Nov 2011 12:30:37 +0000
Subject: [Bioperl-l] OBDA redux? Compressed files
Message-ID: <CAKVJ-_6s1hOo9DLDP0pnZ_96pJdd=mpHe96oKUedwELGLDgfJw@mail.gmail.com>

Hi again,

I've retitled this as it is a little off topic from the main OBDA redux thread,
http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000819.html
http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000820.html
http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000821.html

As far as I recall, the original flat file and BDB based OBDA
specification for indexing sequencing files didn't cover
compressed files. That might be something to consider
(although we should sort out uncompressed text/binary
files first).

I've recently been experimenting with using compressed
files - in particular simple GZIP files (ignoring any block structure)
and BGZF (the specialised gzipped blocking used in BAM), see:

http://blastedbio.blogspot.com/2011/11/bgzf-blocked-bigger-better-gzip.html
http://seqanswers.com/forums/showthread.php?t=15347

The virtual offset approach used in BGZF squeezes a 16 bit
within block offset (thus limiting you to 64KB blocks) and a
48 bit block start offset (thus limiting you to a 256TB file) into
a single 64 bit "virtual" offset. That makes sense if you are
keeping the lookup table of many offsets in memory, and
can be used as is with code expecting a single offset (like
the current Biopython SQLite index schema).
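
The packing described above can be sketched in a few lines of Python. The
two helper functions are defined locally for this example (their names are
my own choice), and the sample numbers are arbitrary.

```python
def make_virtual_offset(block_start, within_block):
    """Pack a 48 bit block start and a 16 bit within block offset."""
    if not 0 <= within_block < 2**16:
        raise ValueError("within block offset must fit in 16 bits")
    if not 0 <= block_start < 2**48:
        raise ValueError("block start offset must fit in 48 bits")
    return (block_start << 16) | within_block


def split_virtual_offset(virtual_offset):
    """Undo make_virtual_offset, giving (block_start, within_block)."""
    return virtual_offset >> 16, virtual_offset & 0xFFFF


voffset = make_virtual_offset(100000, 10)
print(voffset)                        # 6553600010
print(split_virtual_offset(voffset))  # (100000, 10)
```

Because the result is a single integer, it can be stored in the same
column that would otherwise hold a plain file offset.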

I have also been looking at bzip2, which is block based, with the
block size ranging from 100KB to 900KB.

http://bzip.org/
http://bzip.org/1.0.5/bzip2-manual-1.0.5.html

I haven't tried any performance tests yet, which would
be interesting as I believe compression/decompression
of bzip2 is more costly in CPU terms than gzip (although
both will be block size dependent).

If we wanted to imitate the BGZF virtual offset scheme for
arbitrary BZIP2 files, an alternative 64 bit virtual offset scheme
could use 20 bits to cover bz2 blocks of up to 900KB, leaving
64 - 20 = 44 bits for the start offset, thus limiting you to just
2^44 bytes or 16TB, which sounds OK only in the medium term.
On the bright side this could be used to index any BZIP2 file
(under 16TB), whereas BGZF virtual offsets cannot be applied to
an arbitrary GZIP file.
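
The arithmetic behind this hypothetical 20 + 44 bit split can be checked
with a short Python sketch. Nothing here is an existing format; the
constant and function names are invented for the example, and it leaves
aside the practical question of how block starts would be located in a
real bzip2 file.

```python
WITHIN_BITS = 20               # 2**20 = 1,048,576, enough for a 900,000 byte block
START_BITS = 64 - WITHIN_BITS  # 44 bits left for the block start offset


def pack_bz2_offset(block_start, within_block):
    """Combine block start and within block offset into one 64 bit value."""
    if not 0 <= within_block < 2**WITHIN_BITS:
        raise ValueError("within block offset must fit in 20 bits")
    if not 0 <= block_start < 2**START_BITS:
        raise ValueError("block start offset must fit in 44 bits")
    return (block_start << WITHIN_BITS) | within_block


def unpack_bz2_offset(virtual_offset):
    """Split a packed value back into (block_start, within_block)."""
    return virtual_offset >> WITHIN_BITS, virtual_offset & (2**WITHIN_BITS - 1)


print(2**WITHIN_BITS > 900000)   # True
print(2**START_BITS // 2**40)    # file size limit in TiB: 16
print(unpack_bz2_offset(pack_bz2_offset(123456789, 899999)))
```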

On the other hand, storing the block start and within block
offset separately is truly generic and could be used on any blocked
GZIP file (including BGZF) and BZIP2 etc. It would make
the SQLite schema a bit more complicated though.
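
A sketch of what the more complicated schema might look like, using
Python's sqlite3 module. The table and column names are assumptions for
illustration only, and the inserted rows are made-up numbers rather than
offsets from a real file.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE records (
    id TEXT PRIMARY KEY,
    file_number INTEGER,
    block_start INTEGER,   -- where the compressed block starts in the file
    within_block INTEGER,  -- where the record starts in the decompressed block
    length INTEGER)""")

# Made-up entries: two records in the first block, one in a later block.
rows = [("seq1", 0, 0, 0, 120),
        ("seq2", 0, 0, 120, 95),
        ("seq3", 0, 18034, 42, 110)]
db.executemany("INSERT INTO records VALUES (?, ?, ?, ?, ?)", rows)


def locate(rec_id):
    """Return (block_start, within_block) for a record, or None if absent."""
    return db.execute(
        "SELECT block_start, within_block FROM records WHERE id = ?",
        (rec_id,)).fetchone()


print(locate("seq3"))  # (18034, 42)
```

Compared with a single packed offset this costs one extra column, but
neither value is limited by how many bits the other one needs.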

Maybe something to consider for the next revision to OBDA,
and focus on the non-compressed case for now?

Regards,

Peter


From p.j.a.cock at googlemail.com  Sun Nov 13 12:32:12 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 13 Nov 2011 12:32:12 +0000
Subject: [Bioperl-l] OBDA redux? Compressed files
In-Reply-To: <CAKVJ-_6s1hOo9DLDP0pnZ_96pJdd=mpHe96oKUedwELGLDgfJw@mail.gmail.com>
References: <CAKVJ-_6s1hOo9DLDP0pnZ_96pJdd=mpHe96oKUedwELGLDgfJw@mail.gmail.com>
Message-ID: <CAKVJ-_7G639PJBZFLE8mQPT=0LXeTWaf54U0tbMgh6XWfUAKtQ@mail.gmail.com>

On Sun, Nov 13, 2011 at 12:30 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Hi again,
>
> I've retitled this as it is a little off topic from the main OBDA redux thread,
> http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000819.html
> http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000820.html
> http://lists.open-bio.org/pipermail/open-bio-l/2011-November/000821.html
>
> As far as I recall, the original flat file and BDB based OBDA
> specification for indexing sequencing files didn't cover
> compressed files. That might be something to consider
> (although we should sort out uncompressed text/binary
> files first).

Sorry - didn't mean to include bioperl-l on that, although it may be
of interest to you guys anyway.

Peter


From jluis.lavin at unavarra.es  Mon Nov 14 11:14:43 2011
From: jluis.lavin at unavarra.es (=?ISO-8859-1?Q?Jos=E9_Luis_Lav=EDn?=)
Date: Mon, 14 Nov 2011 12:14:43 +0100
Subject: [Bioperl-l] Fwd: How to get Remote BLAST results in a single out
In-Reply-To: <CADm9iynE1+y2EGyx8NLzZSzj_E81o-a5_==9ghNQ5R0hX3QbAw@mail.gmail.com>
References: <CADm9iynE1+y2EGyx8NLzZSzj_E81o-a5_==9ghNQ5R0hX3QbAw@mail.gmail.com>
Message-ID: <CADm9iy=L0c8HTZcaRD8aLw79cg1uvgQrRJ5PH4bZA5zRtt=L_Q@mail.gmail.com>

Hello everybody,

I've been using "Bio::Tools::Run::RemoteBlast" for some time and it has
worked fine for me. Now I need to perform a multiple BLAST search, but this
time I'd like to get all the BLAST results in a single output file
instead of having each sequence's report written individually. I've read
the documentation of the module, but given my limited experience with
complex modules such as this one, I can't figure out where to change the
script to achieve this.
Here is the script I've been using (it's basically the one posted in
the module cookbook).

#!/usr/bin/perl
use strict;
use warnings;
use Bio::Tools::Run::RemoteBlast;
use Bio::SearchIO;
use Data::Dumper;

# Here I set the parameters for BLAST (chomp strips the trailing newline)
print "Enter your BLAST choice (blastn, blastp, blastx, tblastn, tblastx):\n";
chomp(my $prog = <STDIN>);
print "Enter a database to search (nr, refseq_protein, swissprot, pat, pdb, env_nr):\n";
chomp(my $db = <STDIN>);
print "Enter your cut-off score (1e-n):\n";
chomp(my $e_val = <STDIN>);

my @params = ( '-prog' => $prog,
         '-data' => $db,
         '-expect' => $e_val,
         '-readmethod' => 'SearchIO' );

my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params);


# Select the file and submit the BLAST search.
print "Enter your FASTA file:\n";
chomp(my $infile = <STDIN>);
my $r = $remoteBlast->submit_blast($infile);
my $v = 1;

print STDERR "waiting..." if( $v > 0 );  # wait for the results to return
while ( my @rids = $remoteBlast->each_rid ) {
  foreach my $rid ( @rids ) {
    my $rc = $remoteBlast->retrieve_blast($rid);
    if( !ref($rc) ) {
      # not ready yet (0) or failed (negative)
      if( $rc < 0 ) {
        $remoteBlast->remove_rid($rid);
      }
      print STDERR "." if ( $v > 0 );
      sleep 5;
    } else {
      my $result = $rc->next_result();
      # save the output, currently one file per query
      # (earlier attempt: open SALIDA, '>>'."$^T"."Report".".out";)
      my $filename = $result->query_name().".out";
      $remoteBlast->save_output($filename);
      $remoteBlast->remove_rid($rid);
      print "\nQuery Name: ", $result->query_name(), "\n";
      while ( my $hit = $result->next_hit ) {
        next unless ( $v > 0);
        print "\thit name is ", $hit->name, "\n";
        while( my $hsp = $hit->next_hsp ) {
          print "\t\tscore is ", $hsp->score, "\n";
        }
      }
    }
  }
}


Could any of you please explain how to solve this?

Thanks in advance

With best wishes

-- 
-- 
Dr. José Luis Lavín Trueba

Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN




From jason.stajich at gmail.com  Mon Nov 14 11:59:56 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Mon, 14 Nov 2011 06:59:56 -0500
Subject: [Bioperl-l] Fwd: How to get Remote BLAST results in a single out
In-Reply-To: <CADm9iy=L0c8HTZcaRD8aLw79cg1uvgQrRJ5PH4bZA5zRtt=L_Q@mail.gmail.com>
References: <CADm9iynE1+y2EGyx8NLzZSzj_E81o-a5_==9ghNQ5R0hX3QbAw@mail.gmail.com>
	<CADm9iy=L0c8HTZcaRD8aLw79cg1uvgQrRJ5PH4bZA5zRtt=L_Q@mail.gmail.com>
Message-ID: <FDFB72A5-E38C-4637-9415-5A15E4C5B551@gmail.com>

if you want to do a bunch of BLASTs remotely on the cmdline you should also just use the NCBI's blastcl3 tool (not sure if there is a BLAST+ equivalent). This might be faster to do and easier since you need to learn the programming part too.

If you want to do this within this code I guess the question is what format you want the data in - a BLAST report or something more like a table?  



From jason.stajich at gmail.com  Mon Nov 14 14:07:36 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Mon, 14 Nov 2011 09:07:36 -0500
Subject: [Bioperl-l] Fwd: Fwd: How to get Remote BLAST results in a single
	out
References: <CADm9iynj0NcWOtE4asYimfFBhbTYAFrROLj9qccmoH6o=yKVrg@mail.gmail.com>
Message-ID: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com>

Please keep these discussions on the list.

Sent from my iPhone-please excuse typos

--
Jason Stajich

Begin forwarded message:

> From: José Luis Lavín <jluis.lavin at unavarra.es>
> Date: November 14, 2011 8:04:25 AM EST
> To: Jason Stajich <jason.stajich at gmail.com>
> Subject: Re: [Bioperl-l] Fwd: How to get Remote BLAST results in a single out
> 
> Hello Jason,
> 
> In answer to your question:
> 
> " If you want to do this within this code I guess the question is what format you want the data in - a BLAST report or something more like a table?"
> 
> A concatenation of BLAST (default format) reports should be OK, since I have a script to parse that kind of results. Anyway formats 1 or 2 will also do the trick. 
> I'll be happy to get assistance  on how to change the OUTFILE from "a query a report" to "all queries in the same report", because I don't seem to be able to do it myself after reading the module documentation.
> 
> Thanks in advance
> 
> On 14 November 2011 at 12:59, Jason Stajich <jason.stajich at gmail.com> wrote:
> if you want to do a bunch of BLASTs remotely on the cmdline you should also just use the NCBI's blastcl3 tool (not sure if there is a BLAST+ equivalent). This might be faster to do and easier since you need to learn the programming part too.
> 
> If you want to do this within this code I guess the question is what format you want the data in - a BLAST report or something more like a table?
> 
> 
> -- 
> -- 
> Dr. José Luis Lavín Trueba
> 
> Dpto. de Producción Agraria
> Grupo de Genética y Microbiología
> Universidad Pública de Navarra
> 31006 Pamplona
> Navarra
> SPAIN


From cl134 at duke.edu  Sun Nov 13 14:42:05 2011
From: cl134 at duke.edu (Cheng-Ruei Lee)
Date: Sun, 13 Nov 2011 09:42:05 -0500
Subject: [Bioperl-l] Suggestion for Bio::PopGen::Statistics
Message-ID: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu>

Hi all,

     Bioperl version: 1.006
     Here are two error messages I get when using this module to
calculate Fu & Li's statistics:
Illegal division by zero at (the Statistics.pm file) line 359
Illegal division by zero at (the Statistics.pm file) line 376
     Tracking this down further shows that the first error happens when
$n (sample size in the ingroup) equals 1 or 2, and the second error
happens when $n equals 3. This is not really a bug, though. I would
suggest either adding a check before the calculation in the original
code (skipping the calculation when $n == 1, 2, or 3 rather than
letting the whole program die), or adding a few lines of notes to
the CPAN page.

Sincerely,
Cheng-Ruei Lee


From joluito at gmail.com  Mon Nov 14 09:21:31 2011
From: joluito at gmail.com (=?ISO-8859-1?Q?Jos=E9_Luis_Lav=EDn?=)
Date: Mon, 14 Nov 2011 10:21:31 +0100
Subject: [Bioperl-l] How to get Remote BLAST results in a single out
Message-ID: <CADm9iynE1+y2EGyx8NLzZSzj_E81o-a5_==9ghNQ5R0hX3QbAw@mail.gmail.com>

Hello everybody,

I've been using "Bio::Tools::Run::RemoteBlast" for some time and it has
worked fine for me. Now I need to perform a multiple BLAST search, but this
time I'd like to get all the BLAST results in a single output file
instead of having each sequence's report written individually. I've read
the documentation of the module, but given my limited experience with
complex modules such as this one, I can't figure out where to change the
script to achieve this.
Here is the script I've been using (it's basically the one posted in
the module cookbook).

#!/usr/bin/perl
use strict;
use warnings;
use Bio::Tools::Run::RemoteBlast;
use Bio::SearchIO;
use Data::Dumper;

# Here I set the parameters for BLAST (chomp strips the trailing newline)
print "Enter your BLAST choice (blastn, blastp, blastx, tblastn, tblastx):\n";
chomp(my $prog = <STDIN>);
print "Enter a database to search (nr, refseq_protein, swissprot, pat, pdb, env_nr):\n";
chomp(my $db = <STDIN>);
print "Enter your cut-off score (1e-n):\n";
chomp(my $e_val = <STDIN>);

my @params = ( '-prog' => $prog,
         '-data' => $db,
         '-expect' => $e_val,
         '-readmethod' => 'SearchIO' );

my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params);


# Select the file and submit the BLAST search.
print "Enter your FASTA file:\n";
chomp(my $infile = <STDIN>);
my $r = $remoteBlast->submit_blast($infile);
my $v = 1;

print STDERR "waiting..." if( $v > 0 );  # wait for the results to return
while ( my @rids = $remoteBlast->each_rid ) {
  foreach my $rid ( @rids ) {
    my $rc = $remoteBlast->retrieve_blast($rid);
    if( !ref($rc) ) {
      # not ready yet (0) or failed (negative)
      if( $rc < 0 ) {
        $remoteBlast->remove_rid($rid);
      }
      print STDERR "." if ( $v > 0 );
      sleep 5;
    } else {
      my $result = $rc->next_result();
      # save the output, currently one file per query
      # (earlier attempt: open SALIDA, '>>'."$^T"."Report".".out";)
      my $filename = $result->query_name().".out";
      $remoteBlast->save_output($filename);
      $remoteBlast->remove_rid($rid);
      print "\nQuery Name: ", $result->query_name(), "\n";
      while ( my $hit = $result->next_hit ) {
        next unless ( $v > 0);
        print "\thit name is ", $hit->name, "\n";
        while( my $hsp = $hit->next_hsp ) {
          print "\t\tscore is ", $hsp->score, "\n";
        }
      }
    }
  }
}


Could any of you please explain how to solve this?

Thanks in advance

With best wishes

-- 
-- 
Dr. José Luis Lavín Trueba

Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN


From cjfields at illinois.edu  Mon Nov 14 17:02:22 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 14 Nov 2011 17:02:22 +0000
Subject: [Bioperl-l] How to get Remote BLAST results in a single	out
In-Reply-To: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com>
References: <CADm9iynj0NcWOtE4asYimfFBhbTYAFrROLj9qccmoH6o=yKVrg@mail.gmail.com>
	<8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com>
Message-ID: <5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>

Re: a BLAST+ equivalent for blastcl3, I believe there is an option for the various 'blast*' indicating the search is to use a remote database.  I haven't used it, though...

chris

On Nov 14, 2011, at 8:07 AM, Jason Stajich wrote:

> Please keep this on list discussions 
> 
> Sent from my iPhone-please excuse typos
> 
> --
> Jason Stajich
> 
> Begin forwarded message:
> 
>> From: José Luis Lavín <jluis.lavin at unavarra.es>
>> Date: November 14, 2011 8:04:25 AM EST
>> To: Jason Stajich <jason.stajich at gmail.com>
>> Subject: Re: [Bioperl-l] Fwd: How to get Remote BLAST results in a single out
>> 
>> Hello Jason,
>> 
>> To answer your question:
>> 
>> " If you want to do this within this code I guess the question is what format you want the data in - a BLAST report or something more like a table?"
>> 
>> A concatenation of BLAST (default format) reports should be OK, since I have a script to parse that kind of result. Formats 1 or 2 would also do the trick. 
>> I'd be happy to get assistance on how to change the OUTFILE from "one report per query" to "all queries in the same report", because I don't seem to be able to do it myself after reading the module documentation.
>> 
>> Thanks in advance
>> 
>> On 14 November 2011 at 12:59, Jason Stajich <jason.stajich at gmail.com> wrote:
>> if you want to do a bunch of BLASTs remotely on the cmdline you should also just use the NCBI's blastcl3 tool (not sure if there is a BLAST+ equivalent). This might be faster to do and easier since you need to learn the programming part too.
>> 
>> If you want to do this within this code I guess the question is what format you want the data in - a BLAST report or something more like a table?
>> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Mon Nov 14 17:03:04 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 14 Nov 2011 17:03:04 +0000
Subject: [Bioperl-l] Suggestion for Bio::PopGen::Statistics
In-Reply-To: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu>
References: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu>
Message-ID: <E385D24C-E562-43B9-A820-2A7C59E9399A@illinois.edu>

Cheng,

Have you tried the latest CPAN release (we're at 1.006901)?

chris

On Nov 13, 2011, at 8:42 AM, Cheng-Ruei Lee wrote:

> Hi all,
> 
>    Bioperl version: 1.006
>    Here are two error messages when I'm using this module to calculate Fu & Li's statistics:
> Illegal division by zero at (the Statistics.pm file) line 359
> Illegal division by zero at (the Statistics.pm file) line 376
>    A further tracking down shows that the first error happens when $n (sample size in the ingroup) equals 1 or 2, and the second error happens when $n equals 3. This is not really a bug though. I would suggest either in the original code, do a checking before the calculation (and skip the current calculation when $n == 1, 2, or 3 - rather than let the whole program die), or add a few lines of notes in the CPAN page.
> 
> Sincerely,
> Cheng-Ruei Lee


From cjfields at illinois.edu  Mon Nov 14 17:59:35 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 14 Nov 2011 17:59:35 +0000
Subject: [Bioperl-l] OBDA redux?
In-Reply-To: <CAKVJ-_4i8AJL1emBKpnO+p-SVzNtbSVwdL9uSy72NWkHugRtVA@mail.gmail.com>
References: <CAKVJ-_6bzfZZr82y+J4qmGbn0du4rxaKaGxjmbC7p-pU_nMuoQ@mail.gmail.com>
	<FB761CFA-1CFD-4FA0-A708-2CE3F2F240D9@illinois.edu>
	<CAKVJ-_4i8AJL1emBKpnO+p-SVzNtbSVwdL9uSy72NWkHugRtVA@mail.gmail.com>
Message-ID: <12E3B71D-6E61-41AD-A956-A1FC2076AF24@illinois.edu>

On Nov 13, 2011, at 6:24 AM, Peter Cock wrote:

> So, Chris and I seem in general agreement that an OBDA v2
> using SQLite but based on essentially the same approach as
> the BDB or flat file based OBDA v1 is a good idea. i.e. Tables
> mapping record identifiers to file offsets in the original sequence
> files.

The worry I have is adhering to a specific backend (e.g. SQLite).  The reason I say this is b/c BDB in its time was the go-to way of storing simple index data, but it is no longer feasible for very large data sets.  Who's to say something similar won't happen to SQLite, or that it is the best option available?  

Maybe we should focus on the data storage schema, as simple as it may be, then indicate the default backend must be SQLite but others are allowed (maybe with a mention that SQLite can be replaced by alternatives in the future if needed).  

> I hope to get BioRuby on board, they already have an OBDA
> v1 support so that shouldn't be too hard.
> 
> Right now I don't recall if BioJava has/had OBDA v1 support,
> and if they did if it was affected in their recent move to BioJava
> v3 (I understand from their mailing list that some older lower
> priority functionality has not all been ported yet).

I wouldn't be surprised at that, OBDA kind of lingered for a while, and I'm not sure how widely adopted it became (maybe others can shed light on that?)

> Also EMBOSS are likely to be interested, certainly Peter Rice
> was interested in the SQLite indexing we're already using in
> Biopython for sequence files (i.e. what is effectively the
> prototype for OBDA v2).
> 
> Note that in addition to simple indexing of text files, we are
> already using the same simple offset + length approach for
> indexing binary files (e.g. SFF).

I think that's the general idea; it's how all BioPerl data was indexed, first with the Bio::Index modules and later with the OBDA implementations as well.
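
The offset + length approach can be sketched in plain Perl. This is only an illustration under assumptions: an in-memory hash stands in for the SQLite or BDB table, the sub names `build_fasta_index` and `fetch_record` are made up for this example, and it is not the OBDA specification or the Bio::Index API.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build an index mapping record id => [byte offset, byte length].
# A hash stands in for the table a real backend would store (assumption).
sub build_fasta_index {
    my ($path) = @_;
    my %index;
    my $current;
    open my $fh, '<:raw', $path or die "Cannot read $path: $!";
    while (1) {
        my $offset = tell $fh;          # position before reading the line
        my $line   = <$fh>;
        last unless defined $line;
        if ($line =~ /^>(\S+)/) {       # header line starts a new record
            $current = $1;
            $index{$current} = [$offset, 0];
        }
        $index{$current}[1] += length $line if defined $current;
    }
    close $fh;
    return \%index;
}

# Fetch one record by seeking straight to its stored offset.
sub fetch_record {
    my ($path, $index, $id) = @_;
    my $entry = $index->{$id} or return;
    my ($offset, $length) = @$entry;
    open my $fh, '<:raw', $path or die "Cannot read $path: $!";
    seek $fh, $offset, 0 or die "Cannot seek in $path: $!";
    my $got = read $fh, my $record, $length;
    close $fh;
    die "Short read for $id\n" unless defined $got && $got == $length;
    return $record;
}

# Demonstration on a small temporary FASTA file.
my ($out, $fasta) = tempfile(UNLINK => 1);
binmode $out;
print {$out} ">seq1 first record\nACGT\nACGT\n>seq2\nTTTT\n";
close $out;

my $index = build_fasta_index($fasta);
print fetch_record($fasta, $index, 'seq2');
```

Because only identifiers, offsets and lengths are stored, the same table layout could sit behind any backend, which is the point made above about not tying the schema to SQLite.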

> On the immediate practical side, I think I can edit the
> current OBDA website of http://obda.open-bio.org/
> via /home/websites/obda.open-bio.org/html on the
> server.

See below w/ regards to my thoughts on the wiki.

> We need to work out where the current OBDA indexing
> specification lives (CVS or SVN?) and perhaps move
> that to github. We may need a general OBF organisation
> account on git hub for this and any other cross-project
> repositories.

+1 to a move to github, but maybe this belongs in an OBF-specific organization.  And maybe we should take advantage of the simple wiki or project homepage that GitHub offers and move everything (docs) there. 

> I see there is already an OBDA project on RedMine,
> (Chris can you add me to that please?)
> https://redmine.open-bio.org/projects/obda
> 
> Peter

Done (last night actually, but I didn't have time to respond immediately).

chris


From David.Messina at sbc.su.se  Mon Nov 14 19:31:18 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 14 Nov 2011 20:31:18 +0100
Subject: [Bioperl-l] How to get Remote BLAST results in a single	out
In-Reply-To: <5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>
References: <CADm9iynj0NcWOtE4asYimfFBhbTYAFrROLj9qccmoH6o=yKVrg@mail.gmail.com>
	<8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com>
	<5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>
Message-ID: <29C56604-BBEE-4D80-9662-7C3627907200@sbc.su.se>


> Re: a BLAST+ equivalent for blastcl3, I believe there is an option for the various 'blast*' indicating the search is to use a remote database.  I haven't used it, though...

Yes, it's the -remote option. I've used it, and it works great.

The speed is throttled by NCBI, however, so for an appreciable number of queries the standard advice applies: run the search on your own computers.


Dave



From jluis.lavin at unavarra.es  Mon Nov 14 21:23:31 2011
From: jluis.lavin at unavarra.es (José Luis Lavín)
Date: Mon, 14 Nov 2011 22:23:31 +0100
Subject: [Bioperl-l] How to get Remote BLAST results in a single out
In-Reply-To: <5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>
References: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com>
	<5E778003-D3D4-4424-BFC4-407C6489542B@illinois.edu>
Message-ID: <CADm9iy=JcWtUp-KvazA=go2V_VMR7N8D92cHCMe5Rg5kzWmZKQ@mail.gmail.com>

Thank you very much for your answers, but judging by them I'm afraid I didn't
explain myself well enough.

 I'm not looking for another tool to perform a BLAST task. I was just
wondering if there was a way to simply change the way the module writes the
outputs, so that I can get multiple searches in a single report file
instead of having a report for each BLAST search.
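
One way to get there, sketched under assumptions: keep letting the module write each report to its own file, then append that file to a single combined report. The helper name `append_report` and the file names are hypothetical, and the demonstration below uses fake report text rather than a live NCBI request.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Append the contents of one report file to a combined report file.
# Both file names come from the caller; nothing here is BioPerl-specific.
sub append_report {
    my ($report_file, $combined_file) = @_;
    open my $in,  '<',  $report_file   or die "Cannot read $report_file: $!";
    open my $out, '>>', $combined_file or die "Cannot append to $combined_file: $!";
    print {$out} $_ while <$in>;
    close $in;
    close $out or die "Cannot close $combined_file: $!";
    return;
}

# Stand-alone demonstration with two fake "reports".
my (undef, $combined) = tempfile(UNLINK => 1);
for my $text ("report for query 1\n", "report for query 2\n") {
    my ($fh, $single) = tempfile(UNLINK => 1);
    print {$fh} $text;
    close $fh;
    append_report($single, $combined);
}

open my $check, '<', $combined or die "Cannot read $combined: $!";
print while <$check>;
close $check;
```

In the posted loop this would mean calling `$remoteBlast->save_output($filename)` as before, then `append_report($filename, 'all_queries.out')`, and optionally `unlink $filename` afterwards. That combination has not been run against the NCBI service here, so treat it as a starting point.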

Maybe there's some issue I'm not aware of that makes you recommend the use of
other tools instead of the BioPerl Remote BLAST module... I would appreciate
it if you let me know about it (NCBI server problems with web services or
the like).

Thank you for your answers anyway

Best wishes



-- 
-- 
Dr. José Luis Lavín Trueba

Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN


From jason.stajich at gmail.com  Tue Nov 15 03:53:19 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Mon, 14 Nov 2011 22:53:19 -0500
Subject: [Bioperl-l] Suggestion for Bio::PopGen::Statistics
In-Reply-To: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu>
References: <604D23CF-C1A3-4053-A28F-154412FC8345@duke.edu>
Message-ID: <0A6DF9A2-F34F-4277-8E84-C3E5351BB3FF@gmail.com>

sure -- as you say, the implementation presumed that more than 3 individuals would be passed to this method, which is a shortcoming.  I have committed the code fix but still need someone to add a comment to the perldoc. I've made it a redmine bug. 

https://redmine.open-bio.org/issues/3313
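
For scripts running against an older release, a caller-side guard is one option. The sketch below is an assumption-laden illustration, not the committed fix: `guarded_statistic` and the toy statistic are made-up names, and the threshold of 4 simply reflects the report that failures occur for n = 1, 2 and 3.

```perl
use strict;
use warnings;

# Smallest ingroup sample size for which the report mentions no
# division-by-zero error (failures were seen for n = 1, 2 and 3).
use constant MIN_SAMPLE_SIZE => 4;

# Run a statistic only when enough individuals are present.
# $calc is any code reference taking the sample size; it is a stand-in
# for a call into Bio::PopGen::Statistics, not that module's API.
sub guarded_statistic {
    my ($n, $calc) = @_;
    if ($n < MIN_SAMPLE_SIZE) {
        warn "Skipping: sample size $n is below " . MIN_SAMPLE_SIZE . "\n";
        return;
    }
    return $calc->($n);
}

# Toy statistic that, like the reported case, divides by zero for small n.
my $toy = sub {
    my ($n) = @_;
    return 1 / (($n - 1) * ($n - 2) * ($n - 3));
};

for my $n (1 .. 5) {
    my $value = guarded_statistic($n, $toy);
    printf "n=%d: %s\n", $n, defined $value ? $value : 'skipped';
}
```

Returning an undefined value lets a loop over many populations continue instead of dying on the first small sample.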

Jason

Can you provide a test script? We'll add a test for this.
On Nov 13, 2011, at 9:42 AM, Cheng-Ruei Lee wrote:

> Hi all,
> 
>    Bioperl version: 1.006
>    Here are two error messages when I'm using this module to calculate Fu & Li's statistics:
> Illegal division by zero at (the Statistics.pm file) line 359
> Illegal division by zero at (the Statistics.pm file) line 376
>    A further tracking down shows that the first error happens when $n (sample size in the ingroup) equals 1 or 2, and the second error happens when $n equals 3. This is not really a bug though. I would suggest either in the original code, do a checking before the calculation (and skip the current calculation when $n == 1, 2, or 3 - rather than let the whole program die), or add a few lines of notes in the CPAN page.
> 
> Sincerely,
> Cheng-Ruei Lee


From cchehoud at gmail.com  Tue Nov 15 01:39:32 2011
From: cchehoud at gmail.com (Christel Chehoud)
Date: Mon, 14 Nov 2011 17:39:32 -0800
Subject: [Bioperl-l] Bioperl installation help
Message-ID: <CAO28iBQV1iAFpUKwsZ2g-JaDo8Dz3idx=w8DgFu7A1TBOHoDhg@mail.gmail.com>

Dear BioPerl,
Thank you for creating such useful code. Unfortunately, every time I
try to install BioPerl, it takes me a long time and is a challenging
ordeal :( I am a new Mac user and was not able to download BioPerl
using CPAN. Here is the error I am getting:

ERROR: Can't create '/usr/local/bin'
Do not have write permissions on '/usr/local/bin'
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 at /Users/Christel/.cpan/build/BioPerl-1.6.0-DiZ0rN/Bio/Root/Build.pm line 902
  CJFIELDS/BioPerl-1.6.0.tar.gz
  ./Build install  -- NOT OK
----
  You may have to su to root to install the package
  (Or you may want to run something like
    o conf make_install_make_command 'sudo make'
  to raise your permissions.Warning (usually harmless): 'YAML' not
installed, will not store persistent state
Failed during this command:
 CMUNGALL/Data-Stag-0.11.tar.gz               : make NO
 CJFIELDS/BioPerl-1.6.0.tar.gz                : make_test FAILED but
failure ignored because 'force' in effect


so I did:
cpan> o conf make_install_make_command 'sudo make'
followed by
cpan> o conf commit

and started over. I got the same number of errors as last time (so I
decided not to force install this time). Do you have any suggestions?

63 tests and 305 subtests skipped.
Failed 11/329 test scripts. 981/17708 subtests failed.
Files=329, Tests=17708, 127 wallclock secs (104.69 cusr + 12.51 csys =
117.20 CPU)
Failed 11/329 test programs. 981/17708 subtests failed.
  CJFIELDS/BioPerl-1.6.1.tar.gz
  ./Build test -- NOT OK
//hint// to see the cpan-testers results for installing this module, try:
  reports CJFIELDS/BioPerl-1.6.1.tar.gz
Warning (usually harmless): 'YAML' not installed, will not store
persistent state
Running Build install
  make test had returned bad status, won't install without force
Failed during this command:
 CMUNGALL/Data-Stag-0.11.tar.gz               : make NO
 FLORA/ExtUtils-Manifest-1.60.tar.gz          : make NO
 CJFIELDS/BioPerl-1.6.1.tar.gz                : make_test NO


Thanks a lot for your time and help.  I appreciate it.

Thank you,
Christel


From casaburi at ceinge.unina.it  Tue Nov 15 09:25:25 2011
From: casaburi at ceinge.unina.it (Giorgio C)
Date: Tue, 15 Nov 2011 01:25:25 -0800 (PST)
Subject: [Bioperl-l]  Blast > parsing result in Exel
Message-ID: <32846407.post@talk.nabble.com>


Hi everybody,

I have a BLAST (-m 1) result file like this one:

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Query= 132-291
(59 letters)

Database: Scrivania/orchidea/mature_mirBase.fa
21,643 sequences; 470,608 total letters

Searching..................................................done


Score E
Sequences producing significant alignments: (bits) Value

mtr-miR2644b MIMAT0013413 Medicago truncatula miR2644b 28 0.031
mtr-miR2644a MIMAT0013412 Medicago truncatula miR2644a 28 0.031
gga-miR-1704 MIMAT0007596 Gallus gallus miR-1704 22 1.9
gga-miR-1557 MIMAT0007414 Gallus gallus miR-1557 22 1.9
mmu-miR-880-5p MIMAT0017266 Mus musculus miR-880-5p 22 1.9

132_0 8 cagccgctcagattgatggtgcctacagccttgccagcccgctcagattgat 59
12631 5 .............. 18
12630 5 .............. 18
7826 5 ........... 15
7644 19 ........... 9
5394 3 ........... 13
5394 3 ........... 13
BLASTN 2.2.21 [Jun-14-2009]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
...
....
..........

______________________________________________________________
I need to parse it into an Excel sheet with these columns:

1)ID 2)Name of the hit 3)E-value 4)Score 5)Species


1) 132-291 2)mir2644b 3) 0,031 4)28 5) Medicago truncatula


Is it possible to obtain, from a big BLAST result file, an Excel sheet with
5 columns where each row holds the first hit of one BLAST result? Can anyone
help me solve this problem? A little Perl script would also be fine.
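
A rough plain-Perl sketch of the idea, with loud assumptions: it reads only the "Query=" lines and the first line after "Sequences producing significant alignments", it assumes hit descriptions of the form "Genus species miRNA-name" as in the miRBase sample above, and the sub names `parse_hit_line` and `first_hits` are invented for this example. It writes tab-separated text, which Excel can open; it does not use Bio::SearchIO.

```perl
use strict;
use warnings;

# Split one hit summary line into name, accession, species, score, E-value.
# Assumes "name accession Genus species miRNA-name score evalue".
sub parse_hit_line {
    my ($line) = @_;
    my @fields = split ' ', $line;
    return unless @fields >= 6;
    my $evalue = pop @fields;
    my $score  = pop @fields;
    my ($name, $accession, @description) = @fields;
    pop @description;                      # drop the trailing miRNA name
    return {
        name      => $name,
        accession => $accession,
        species   => join(' ', @description),
        score     => $score,
        evalue    => $evalue,
    };
}

# Collect [query id, hit name, E-value, score, species] for the first
# hit of every query found on the given filehandle.
sub first_hits {
    my ($fh) = @_;
    my (@rows, $query, $want_hit);
    while (my $line = <$fh>) {
        if ($line =~ /^Query=\s*(\S+)/) {
            $query    = $1;
            $want_hit = 0;
        }
        elsif ($line =~ /^Sequences producing significant alignments/) {
            $want_hit = 1;                 # next non-blank line is the top hit
        }
        elsif ($want_hit && $line =~ /\S/) {
            my $hit = parse_hit_line($line);
            push @rows, [defined $query ? $query : '',
                         @{$hit}{qw(name evalue score species)}] if $hit;
            $want_hit = 0;
        }
    }
    return @rows;
}

# Demonstration on a fragment of the report quoted above.
my $report = <<'END';
Query= 132-291
(59 letters)

Score E
Sequences producing significant alignments: (bits) Value

mtr-miR2644b MIMAT0013413 Medicago truncatula miR2644b 28 0.031
mtr-miR2644a MIMAT0013412 Medicago truncatula miR2644a 28 0.031
END

open my $in, '<', \$report or die "Cannot open in-memory report: $!";
print join("\t", qw(ID Hit E-value Score Species)), "\n";
print join("\t", @$_), "\n" for first_hits($in);
close $in;
```

For a real file, open it with `open my $in, '<', 'results.blast'` (a placeholder name) and redirect the output to a `.tsv` file.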


Thank you very much
-- 
View this message in context: http://old.nabble.com/Blast-%3E-parsing-result-in-Exel-tp32846407p32846407.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From nisa.dar10 at gmail.com  Wed Nov 16 00:49:00 2011
From: nisa.dar10 at gmail.com (nisa.dar)
Date: Tue, 15 Nov 2011 16:49:00 -0800 (PST)
Subject: [Bioperl-l]  print alignment from blast results file
Message-ID: <32851673.post@talk.nabble.com>


Hi,

I am parsing a blast results file. I have found bioperl modules to get query
string, homology string and hit string for each hit/hsp. I want to print
them in the form of an alignment instead of aligning them individually.

this is what I am doing, but it doesn't seem correct

while (my $hsp = $hit->next_hsp) {
    my $start_query_num = $hsp->start('query');
    my $query_string    = $hsp->query_string;
    my $end_query_num   = $hsp->end('query');
    my $homology_string = $hsp->homology_string;
    my $start_hit_num   = $hsp->start('hit');
    my $hit_string      = $hsp->hit_string;
    my $end_hit_num     = $hsp->end('hit');
    my $aln_o           = $hsp->get_aln;
    $query_string    =~ s/\n//g;    # get rid of newline characters
    $homology_string =~ s/\n//g;
    $hit_string      =~ s/\n//g;

    print "<h3>Alignment:</h3><br />";
    print "$start_query_num-$query_string-$end_query_num<br />";
    print "   \n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$homology_string<br />";
    print "$start_hit_num-$hit_string-$end_hit_num<br /><br />";
}

Please let me know how I can print them in the form of an alignment as seen
in the blast results file.
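
A possible approach, sketched with assumptions: pad the start coordinates to a common width so the three strings begin in the same column, and show the result in a fixed-width font (in HTML, inside `<pre>`). The sub name `format_hsp` and the sample values are hypothetical; the sketch also assumes the query, homology and hit strings all have the same length, as they normally do for one HSP.

```perl
use strict;
use warnings;

# Return three lines (query, match line, hit) whose sequence columns line up.
# Arguments mirror the values pulled from the HSP in the loop above.
sub format_hsp {
    my ($q_start, $q_string, $q_end,
        $homology,
        $h_start, $h_string, $h_end) = @_;

    # Width of the widest start coordinate, used to pad all three lines.
    my $width = length $q_start;
    $width = length $h_start if length $h_start > $width;

    return (
        sprintf("Query  %-*s %s %s", $width, $q_start, $q_string, $q_end),
        sprintf("       %-*s %s",    $width, '',       $homology),
        sprintf("Sbjct  %-*s %s %s", $width, $h_start, $h_string, $h_end),
    );
}

# Demonstration with made-up coordinates and sequences.
my @lines = format_hsp(8, 'ACGTACGT', 15, '|||| |||', 12631, 'ACGTTCGT', 12638);
print "<pre>\n", join("\n", @lines), "\n</pre>\n";
```

Without `<pre>` the browser collapses runs of spaces and uses a proportional font, which is likely why the original output does not line up. Another route worth checking is passing the object returned by `$hsp->get_aln` to a `Bio::AlignIO` writer, which produces standard alignment formats.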

Thanks


-- 
View this message in context: http://old.nabble.com/print-alignment-from-blast-results-file-tp32851673p32851673.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From p.j.a.cock at googlemail.com  Wed Nov 16 09:11:40 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 16 Nov 2011 09:11:40 +0000
Subject: [Bioperl-l] Blast > parsing result in Exel
In-Reply-To: <32846407.post@talk.nabble.com>
References: <32846407.post@talk.nabble.com>
Message-ID: <CAKVJ-_5PTZttkHXS-FB-tOxhDRCty_qJH9PTurDWn2M5p3VzSw@mail.gmail.com>

On Tue, Nov 15, 2011 at 9:25 AM, Giorgio C <casaburi at ceinge.unina.it> wrote:
>
> Hy everybody,
>
> in this situation froma blast (-m 1) result file :
>
> ...
>
> I need to parse in an exel sheet :
>
> 1)ID 2)Name of the hit 3)E-value 4)Score 5)Species
>
>
> 1) 132-291 2)mir2644b 3) 0,031 4)28 5) Medicago truncatula
>
> Is possible from a big blast result file obtain an exel with 5 columns where
> every field is the first hit of the blast result. Can anyone halp me to fix
> this problem ??? Also with a little script in perl.
>
> Thank you very much

Have you looked at any of the BioPerl BLAST parsing examples? e.g
http://www.bioperl.org/wiki/HOWTO:Beginners#BLAST
http://www.bioperl.org/wiki/HOWTO:SearchIO
http://www.bioperl.org/wiki/Module:Bio::SearchIO

See also http://seqanswers.com/forums/showthread.php?t=15489

Peter


From bosborne11 at verizon.net  Wed Nov 16 13:19:33 2011
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 16 Nov 2011 08:19:33 -0500
Subject: [Bioperl-l] print alignment from blast results file
In-Reply-To: <32851673.post@talk.nabble.com>
References: <32851673.post@talk.nabble.com>
Message-ID: <035404FE-A7D9-413A-B6EF-48564FE76426@verizon.net>

Nisa,

See:

http://www.bioperl.org/wiki/HOWTO:SearchIO

Brian O.


On Nov 15, 2011, at 7:49 PM, nisa.dar wrote:

> 
> Hi,
> 
> I am parsing a blast results file. I have found bioperl modules to get query
> string, homology string and hit string for each hit/hsp. I want to print
> them in the form of an alignment instead of aligning them individually.
> 
> this is what I am doing, but it doesn't seem correct
> 
> while (my $hsp = $hit->next_hsp) {
>                                        my
> $start_query_num=$hsp->start('query');
> 					my $query_string=$hsp->query_string;
> 					my $end_query_num=$hsp->end('query');
> 					my $homology_string=$hsp->homology_string;
> 					my $start_hit_num=$hsp->start('hit');
> 					my $hit_string=$hsp->hit_string;
> 					my $end_hit_num=$hsp->end('hit');
> 					my $aln_o = $hsp->get_aln;
> 					$query_string=~s/\n//g;#get rid of new line characters
> 					$homology_string=~s/\n//g;
> 					$hit_string=~s/\n//g;
> 
>                         print "<h3>Alignment:</h3><br />";
> 			print "$start_query_num-$query_string-$end_query_num<br />";
> 			print "   
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$homology_string<br />";
> 			print "$start_hit_num-$hit_string-$end_hit_num<br /><br />";
> 
> 
> 
> }
> 
> Please let me know how can I print them in the form of an alignment as seen
> in the blast results file.
> 
> Thanks
> 
> 
> -- 
> View this message in context: http://old.nabble.com/print-alignment-from-blast-results-file-tp32851673p32851673.html
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
> 


From cjfields at illinois.edu  Wed Nov 16 16:44:27 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 16 Nov 2011 16:44:27 +0000
Subject: [Bioperl-l] Bioperl installation help
In-Reply-To: <CAO28iBQV1iAFpUKwsZ2g-JaDo8Dz3idx=w8DgFu7A1TBOHoDhg@mail.gmail.com>
References: <CAO28iBQV1iAFpUKwsZ2g-JaDo8Dz3idx=w8DgFu7A1TBOHoDhg@mail.gmail.com>
Message-ID: <72481F31-3ADB-4E3D-9DBC-714FBEC730E4@illinois.edu>

For some reason you are trying to install an older version of BioPerl; try installing Bio::Perl (or one of the core modules).  This should automatically install the latest version from CPAN.  My guess is this will address some of the issues.  However, w/o actually seeing what tests failed we can't help.

Also, if you are only interested in running local jobs, install BioPerl locally, or just grab the dist and add it to PERL5LIB.  There are instructions in the installation docs for that.  You can also use cpanm (cpanminus) to install things locally as well, it's highly recommended and much easier than cpan.
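
To see which version Perl actually picks up after changing PERL5LIB or installing locally, a small check like the following may help. It is a sketch under assumptions: the sub name `installed_version` is made up, and the module list is only an example (Data::Stag appears in the failure log above).

```perl
use strict;
use warnings;

# Return the version of a module as seen on the current @INC,
# or undef if the module cannot be loaded at all.
sub installed_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return undef unless eval { require $file; 1 };
    return $module->VERSION;    # may itself be undef if no $VERSION is set
}

for my $module (qw(Bio::Root::Version Data::Stag)) {
    my $version = installed_version($module);
    printf "%-20s %s\n", $module,
        defined $version ? $version : 'not found (or no version)';
}
```

Running it before and after an install shows whether the newer release is the one being loaded.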

chris

On Nov 14, 2011, at 7:39 PM, Christel Chehoud wrote:

> Dear BioPerl,
> Thank you for creating such useful code. Unfortunately, every time I
> try to install Bioperl, it takes me a long time and is a challenging
> ordeal :( I am a new MAC user and was not able to download bioperl
> using CPAN. Here is the error I am getting:
> 
> ERROR: Can't create '/usr/local/bin'
> Do not have write permissions on '/usr/local/bin'
> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
> at /Users/Christel/.cpan/build/BioPerl-1.6.0-DiZ0rN/Bio/Root/Build.pm line 902
>  CJFIELDS/BioPerl-1.6.0.tar.gz
>  ./Build install  -- NOT OK
> ----
>  You may have to su to root to install the package
>  (Or you may want to run something like
>    o conf make_install_make_command 'sudo make'
>  to raise your permissions.Warning (usually harmless): 'YAML' not
> installed, will not store persistent state
> Failed during this command:
> CMUNGALL/Data-Stag-0.11.tar.gz               : make NO
> CJFIELDS/BioPerl-1.6.0.tar.gz                : make_test FAILED but
> failure ignored because 'force' in effect
> 
> 
> so I did:
> cpan> o conf make_install_make_command 'sudo make'
> followed by
> cpan> o conf commit
> 
> and started over..I got the same number of errors as last time (so I
> decided not to force install this time). do you have any suggestions:
> 
> 63 tests and 305 subtests skipped.
> Failed 11/329 test scripts. 981/17708 subtests failed.
> Files=329, Tests=17708, 127 wallclock secs (104.69 cusr + 12.51 csys =
> 117.20 CPU)
> Failed 11/329 test programs. 981/17708 subtests failed.
>  CJFIELDS/BioPerl-1.6.1.tar.gz
>  ./Build test -- NOT OK
> //hint// to see the cpan-testers results for installing this module, try:
>  reports CJFIELDS/BioPerl-1.6.1.tar.gz
> Warning (usually harmless): 'YAML' not installed, will not store
> persistent state
> Running Build install
>  make test had returned bad status, won't install without force
> Failed during this command:
> CMUNGALL/Data-Stag-0.11.tar.gz               : make NO
> FLORA/ExtUtils-Manifest-1.60.tar.gz          : make NO
> CJFIELDS/BioPerl-1.6.1.tar.gz                : make_test NO
> 
> 
> Thanks a lot for your time and help.  I appreciate it.
> 
> Thank you,
> Christel


From cjfields at illinois.edu  Wed Nov 16 16:46:16 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 16 Nov 2011 16:46:16 +0000
Subject: [Bioperl-l] print alignment from blast results file
In-Reply-To: <035404FE-A7D9-413A-B6EF-48564FE76426@verizon.net>
References: <32851673.post@talk.nabble.com>
	<035404FE-A7D9-413A-B6EF-48564FE76426@verizon.net>
Message-ID: <B7768538-08CE-40A0-8EB9-5EB5169C1072@illinois.edu>

Small hint: you can get a Bio::Align::AlignI object from the HSP via get_aln (which can then be written out through a Bio::AlignIO instance).
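
A minimal sketch of that route, assuming a plain-text BLAST report given on the command line. The choice of 'clustalw' as the output format is an assumption made here; its layout resembles, but does not exactly match, the pairwise blocks in the BLAST report.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SearchIO;
use Bio::AlignIO;

# The report path is supplied by the caller; nothing is hard-coded.
my $report = shift @ARGV or die "Usage: $0 blast_report.txt\n";

my $in  = Bio::SearchIO->new(-format => 'blast',    -file => $report);
my $out = Bio::AlignIO->new(-format  => 'clustalw', -fh   => \*STDOUT);

while (my $result = $in->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            # get_aln returns the HSP as an alignment object,
            # which AlignIO can write in any format it supports.
            my $aln = $hsp->get_aln;
            $out->write_aln($aln);
        }
    }
}
```

In a CGI page the output would probably need to be wrapped in <pre> tags so that the columns stay lined up in a fixed-width font.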

chris

On Nov 16, 2011, at 7:19 AM, Brian Osborne wrote:

> Nisa,
> 
> See:
> 
> http://www.bioperl.org/wiki/HOWTO:SearchIO
> 
> Brian O.
> 
> 
> On Nov 15, 2011, at 7:49 PM, nisa.dar wrote:
> 
>> 
>> Hi,
>> 
>> I am parsing a blast results file. I have found bioperl modules to get query
>> string, homology string and hit string for each hit/hsp. I want to print
>> them in the form of an alignment instead of aligning them individually.
>> 
>> this is what I am doing, but it doesn't seem correct
>> 
>> while (my $hsp = $hit->next_hsp) {
>>                                       my
>> $start_query_num=$hsp->start('query');
>> 					my $query_string=$hsp->query_string;
>> 					my $end_query_num=$hsp->end('query');
>> 					my $homology_string=$hsp->homology_string;
>> 					my $start_hit_num=$hsp->start('hit');
>> 					my $hit_string=$hsp->hit_string;
>> 					my $end_hit_num=$hsp->end('hit');
>> 					my $aln_o = $hsp->get_aln;
>> 					$query_string=~s/\n//g;#get rid of new line characters
>> 					$homology_string=~s/\n//g;
>> 					$hit_string=~s/\n//g;
>> 
>>                        print "<h3>Alignment:</h3><br />";
>> 			print "$start_query_num-$query_string-$end_query_num<br />";
>> 			print "   
>> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$homology_string<br />";
>> 			print "$start_hit_num-$hit_string-$end_hit_num<br /><br />";
>> 
>> 
>> 
>> }
>> 
>> Please let me know how can I print them in the form of an alignment as seen
>> in the blast results file.
>> 
>> Thanks
>> 
>> 


From David.Messina at sbc.su.se  Wed Nov 16 17:01:49 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 16 Nov 2011 18:01:49 +0100
Subject: [Bioperl-l] Bioperl installation help
In-Reply-To: <CAO28iBQV1iAFpUKwsZ2g-JaDo8Dz3idx=w8DgFu7A1TBOHoDhg@mail.gmail.com>
References: <CAO28iBQV1iAFpUKwsZ2g-JaDo8Dz3idx=w8DgFu7A1TBOHoDhg@mail.gmail.com>
Message-ID: <CAM3TQQWDJ1_HPrAUguFfH5ngV42WeUOvXE6N2GktgmeTFs=ijw@mail.gmail.com>

Hi Christel,

Sorry to hear you're having trouble with the installation.

It looks like these modules aren't getting installed and are causing the
failed tests:
CMUNGALL/Data-Stag-0.11.tar.gz               : make NO
FLORA/ExtUtils-Manifest-1.60.tar.gz          : make NO

I would try installing those separately via CPAN first and then trying
again to install BioPerl.
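
To see whether those two prerequisites are visible to Perl before retrying, a small check along these lines may help. It is a sketch only; missing_modules is a hypothetical helper name, not something CPAN or BioPerl provides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the names of modules that cannot be loaded by this perl.
# missing_modules is an illustrative helper, not a CPAN or BioPerl function.
sub missing_modules {
    my @missing;
    for my $module (@_) {
        (my $file = "$module.pm") =~ s{::}{/}g;   # Data::Stag -> Data/Stag.pm
        push @missing, $module unless eval { require $file; 1 };
    }
    return @missing;
}

my @missing = missing_modules('Data::Stag', 'ExtUtils::Manifest');
print @missing
    ? "Still missing: @missing\n"
    : "Both prerequisites load fine\n";
```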

Also, it was a good idea to set the make_install_make_command option in
CPAN, and that should have worked. Unfortunately, there's another
installation system called Module::Build (the './Build install' step in
your log) that has its own option which may need to be set:
cpan> o conf mbuild_install_build_command 'sudo ./Build'


That being said, I would suggest you grab the latest version of BioPerl
from github instead of using v1.6.1 from CPAN, which is fairly out of date
at this point.

And unless you're planning to use BioPerl with GBrowse or Bio::Graphics,
there's another, simpler way to get BioPerl up and running (assuming you
have all the prerequisites like Data::Stag installed):

See "Don't want to install BioPerl?" here:
http://www.seqxml.org/xml/BioPerl.html


Best,
Dave


On Tue, Nov 15, 2011 at 02:39, Christel Chehoud <cchehoud at gmail.com> wrote:

> Dear BioPerl,
> Thank you for creating such useful code. Unfortunately, every time I
> try to install Bioperl, it takes me a long time and is a challenging
> ordeal :( I am a new MAC user and was not able to download bioperl
> using CPAN. Here is the error I am getting:
>
> ERROR: Can't create '/usr/local/bin'
> Do not have write permissions on '/usr/local/bin'
> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>  at /Users/Christel/.cpan/build/BioPerl-1.6.0-DiZ0rN/Bio/Root/Build.pm
> line 902
>  CJFIELDS/BioPerl-1.6.0.tar.gz
>  ./Build install  -- NOT OK
> ----
>  You may have to su to root to install the package
>  (Or you may want to run something like
>    o conf make_install_make_command 'sudo make'
>  to raise your permissions.Warning (usually harmless): 'YAML' not
> installed, will not store persistent state
> Failed during this command:
>  CMUNGALL/Data-Stag-0.11.tar.gz               : make NO
>  CJFIELDS/BioPerl-1.6.0.tar.gz                : make_test FAILED but
> failure ignored because 'force' in effect
>
>
> so I did:
> cpan> o conf make_install_make_command 'sudo make'
> followed by
> cpan> o conf commit
>
> and started over..I got the same number of errors as last time (so I
> decided not to force install this time). do you have any suggestions:
>
> 63 tests and 305 subtests skipped.
> Failed 11/329 test scripts. 981/17708 subtests failed.
> Files=329, Tests=17708, 127 wallclock secs (104.69 cusr + 12.51 csys =
> 117.20 CPU)
> Failed 11/329 test programs. 981/17708 subtests failed.
>  CJFIELDS/BioPerl-1.6.1.tar.gz
>  ./Build test -- NOT OK
> //hint// to see the cpan-testers results for installing this module, try:
>  reports CJFIELDS/BioPerl-1.6.1.tar.gz
> Warning (usually harmless): 'YAML' not installed, will not store
> persistent state
> Running Build install
>  make test had returned bad status, won't install without force
> Failed during this command:
>  CMUNGALL/Data-Stag-0.11.tar.gz               : make NO
>  FLORA/ExtUtils-Manifest-1.60.tar.gz          : make NO
>  CJFIELDS/BioPerl-1.6.1.tar.gz                : make_test NO
>
>
> Thanks a lot for your time and help.  I appreciate it.
>
> Thank you,
> Christel


From jluis.lavin at unavarra.es  Wed Nov 16 18:31:46 2011
From: jluis.lavin at unavarra.es (José Luis Lavín)
Date: Wed, 16 Nov 2011 19:31:46 +0100
Subject: [Bioperl-l] How to get Remote BLAST results in a single out
In-Reply-To: <CALf8LpwFrv2jWMm35nTaC88atO6yrSbGza9j0TyTZTzBtxaCxw@mail.gmail.com>
References: <8AFF1359-D64D-4EEB-B82E-E9DB00822DC5@gmail.com>
	<CADm9iy=JcWtUp-KvazA=go2V_VMR7N8D92cHCMe5Rg5kzWmZKQ@mail.gmail.com>
	<CALf8LpwFrv2jWMm35nTaC88atO6yrSbGza9j0TyTZTzBtxaCxw@mail.gmail.com>
Message-ID: <CADm9iy=mMqHhWO5rTXbJS4ZG8aG-t0mAVHqN720tnyA7Hy_nkg@mail.gmail.com>

Thank you for your answer, Jason.

While replying to you I figured out how to do it... sometimes you need other
people's point of view to see the light.

As you pointed out:

"what is complicated is the file name right now is based on the query
name."

That is exactly what I hoped would have an easy fix: the output file name
depends on the query name, which is why I couldn't figure out how to change
the name of the output file.

While reading the code to answer you, I came across the solution.
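
The solution itself is not spelled out in this message. One possible way to get a single report, not necessarily the one found here, is to open the combined file once before the retrieval loop and append each saved report to it. The sketch below shows only the appending step in plain Perl; append_report is a hypothetical helper, and two small temporary files stand in for the per-query reports.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Append the contents of one saved report to an already-open handle.
# append_report is an illustrative helper, not part of RemoteBlast.
sub append_report {
    my ($report_path, $combined_fh) = @_;
    open my $in, '<', $report_path or die "Cannot read $report_path: $!";
    print {$combined_fh} $_ while <$in>;
    close $in;
    return;
}

# Stand-in demo: two small files play the role of per-query reports.
my ($combined_fh, $combined_path) = tempfile();
for my $text ("report for query 1\n", "report for query 2\n") {
    my ($fh, $path) = tempfile();
    print {$fh} $text;
    close $fh;
    append_report($path, $combined_fh);
    unlink $path;
}
close $combined_fh;
print "Combined report written to $combined_path\n";
```

In the script quoted below, this would presumably mean calling save_output on a temporary file name instead of one derived from query_name, then passing that file to append_report. That integration is untested against NCBI and should be treated as an assumption.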

I insisted on doing it this way because I need to run BLAST remotely
from my CGI; that's why I didn't pay attention to all the other options you
suggested. Thank you all for your suggestions anyway.

;)

Best wishes

JL


On 16 November 2011 18:03, Jason Stajich <jason at bioperl.org> wrote:

> the answer to your question is to move the line that opens a file to
> outside the loop. what is complicated is the file name right now is
> based on the query name. so you need to think how you want to name the
> file. Since this isn't obvious to you, then I think we are suggesting you
> probably need to understand programming more, and it might just be easier
> to use the tools as we have suggested rather than teaching you to modify
> what is just an example code.  our suggestions are based on the way we'd
> solve the problem so maybe you have other reasons for the direction you
> want to take.
>
> I also think it is not efficient or logical to run
> remote blast through the web protocol simply to write it back out with
> bioperl since that has to parse it in and then write it out -- why not just
> run the program that generates the output directly from NCBI. Or run BLAST
> locally for likely more efficient running.
>
>  Finally the bioperl writer may not 100% reproduce the blast output so if
> you are planning on further parsing the output that comes out from this
> script, it really doesn't seem like a good idea to launder it through
> bioperl parser first.
>
>
>
> 2011/11/14 José Luis Lavín <jluis.lavin at unavarra.es>
>
>> Thank you very much for your answers, but due to them, I'm afraid I didn't
>> explained myself good enough.
>>
>>  I'm not looking for another tool to perform a BLAST task. I was just
>> wondering if there was a way to simply change the way the module writes
>> the
>> outputs, so that I can get multiple searches in a single report file
>> instead of having a report for each BLAST search.
>>
>> Maybe there's some issue I ignore, that makes you recommend the use of
>> other tools instead of the Bioperl Remote BLAST module...it would be
>> appreciated if you let me know about that (NCBI server problems with
>> web-services or so)...
>>
>> Thank you for your answers anyway
>>
>> Best wishes
>>
>> 2011/11/14 Fields, Christopher J <cjfields at illinois.edu>
>>
>> > Re: a BLAST+ equivalent for blastcl3, I believe there is an option for
>> the
>> > various 'blast*' indicating the search is to use a remote database.  I
>> > haven't used it, though...
>> >
>> > chris
>> >
>> > On Nov 14, 2011, at 8:07 AM, Jason Stajich wrote:
>> >
>> > > Please keep this on list discussions
>> > >
>> > > Sent from my iPhone-please excuse typos
>> > >
>> > > --
>> > > Jason Stajich
>> > >
>> > > Begin forwarded message:
>> > >
>> > >> From: José Luis Lavín <jluis.lavin at unavarra.es>
>> > >> Date: November 14, 2011 8:04:25 AM EST
>> > >> To: Jason Stajich <jason.stajich at gmail.com>
>> > >> Subject: Re: [Bioperl-l] Fwd: How to get Remote BLAST results in a
>> > single out
>> > >>
>> > >> Hello Jason,
>> > >>
>> > >> As answering your question:
>> > >>
>> > >> " If you want to do this within this code I guess the question is
>> what
>> > format you want the data in - a BLAST report or something more like a
>> > table?"
>> > >>
>> > >> A concatenation of BLAST (default format) reports should be OK,
>> since I
>> > have a script to parse that kind of results. Anyway formats 1 or 2 will
>> > also do the trick.
>> > >> I'll be happy to get assistance  on how to change the OUTFILE from "a
>> > query a report" to "all queries in the same report", because I don't
>> seem
>> > to be able to do it myself after reading the module documentation.
>> > >>
>> > >> Thanks in advance
>> > >>
>> > >> On 14 November 2011 12:59, Jason Stajich <
>> > jason.stajich at gmail.com> wrote:
>> > >> if you want to do a bunch of BLASTs remotely on the cmdline you
>> should
>> > also just use the NCBI's blastcl3 tool (not sure if there is a BLAST+
>> > equivalent). This might be faster to do and easier since you need to
>> learn
>> > the programming part too.
>> > >>
>> > >> If you want to do this within this code I guess the question is what
>> > format you want the data in - a BLAST report or something more like a
>> table?
>> > >>
>> > >> On Nov 14, 2011, at 6:14 AM, José Luis Lavín wrote:
>> > >>
>> > >>> Hello everybody,
>> > >>>
>> > >>> I've been using  "Bio::Tools::Run::RemoteBlast" for a time and it
>> has
>> > >>> worked fine for me. Now I need to perform a multiple BLAST search,
>> but
>> > this
>> > >>> time I'd just like to get all the BLAST results in a single out file
>> > >>> instead of having each sequence's report written individually. I've
>> > read
>> > >>> the documentation of the module, but due to my short
>> > >>> experience/understanding on complex modules as this one seems to be
>> I
>> > can't
>> > >>> figure out where to change the script to achieve my previously
>> > mentioned
>> > >>> aim.
>> > >>> Here I post the script I've been using (it's basically the one
>> posted
>> > on
>> > >>> the module cookbook).
>> > >>>
>> > >>> #!/c:/Perl -w
>> > >>> use Bio::Tools::Run::RemoteBlast;
>> > >>> use Bio::SearchIO;
>> > >>> use Data::Dumper;
>> > >>>
>> > >>> #Here i set the parameters for blast
>> > >>> print "Enter your BLAST choice (blastn, blastp, blastx, tblastn,
>> > >>> tblastx):\n";
>> > >>> my $blst = <STDIN>;
>> > >>> my $prog = "$blst";
>> > >>> print "Enter a database to search (nr, refseq_protein, swissprot,
>> pat,
>> > pdb,
>> > >>> env_nr):\n";
>> > >>> my $dtb = <STDIN>;
>> > >>> $db = "$dtb";
>> > >>> print "Enter your cutt off score (1e-n):\n";
>> > >>> my $cut = <STDIN>;
>> > >>> my $e_val = "$cut";
>> > >>>
>> > >>> my @params = ( '-prog' => $prog,
>> > >>>        '-data' => $db,
>> > >>>        '-expect' => $e_val,
>> > >>>        '-readmethod' => 'SearchIO' );
>> > >>>
>> > >>> my $remoteBlast = Bio::Tools::Run::RemoteBlast->new(@params);
>> > >>>
>> > >>>
>> > >>> #Select the file and make the blast.
>> > >>> print "Enter your FASTA file:\n";
>> > >>> chomp(my $infile = <STDIN>);
>> > >>> my $r = $remoteBlast->submit_blast($infile);
>> > >>> my $v = 1;
>> > >>>
>> > >>>   print STDERR "waiting..." if( $v > 0 );  ########  WAIT FOR THE
>> > RESULTS
>> > >>> TO RETURN!!!!!
>> > >>>   while ( my @rids = $remoteBlast->each_rid ) {
>> > >>>     foreach my $rid ( @rids ) {
>> > >>>       my $rc = $remoteBlast->retrieve_blast($rid);
>> > >>>       if( !ref($rc) ) {
>> > >>>         if( $rc < 0 ) {
>> > >>>           $remoteBlast->remove_rid($rid);
>> > >>>         }
>> > >>>         print STDERR "." if ( $v > 0 );
>> > >>>         sleep 5;
>> > >>>       } else {
>> > >>>         my $result = $rc->next_result();
>> > >>>         #save the output
>> > >>>         my $filename =
>> > >>> $result->query_name()."\.out";##################open SALIDA,
>> > >>> '>>'."$^T"."Report"."\.out";
>> > >>>         $remoteBlast->save_output($filename);#############
>> > >>>         $remoteBlast->remove_rid($rid);
>> > >>>         print "\nQuery Name: ", $result->query_name(), "\n";
>> > >>>         while ( my $hit = $result->next_hit ) {
>> > >>>           next unless ( $v > 0);
>> > >>>           print "\thit name is ", $hit->name, "\n";
>> > >>>           while( my $hsp = $hit->next_hsp ) {
>> > >>>             print "\t\tscore is ", $hsp->score, "\n";
>> > >>>           }
>> > >>>         }
>> > >>>       }
>> > >>>     }
>> > >>>   }
>> > >>>
>> > >>>
>> > >>> May any of you please explain me how to solve my question?
>> > >>>
>> > >>> Thanks in advence
>> > >>>
>> > >>> With best wishes
>> > >>>
>> > >>> --
>> > >>> Dr. José Luis Lavín Trueba
>> > >>>
>> > >>> Dpto. de Producción Agraria
>> > >>> Grupo de Genética y Microbiología
>> > >>> Universidad Pública de Navarra
>> > >>> 31006 Pamplona
>> > >>> Navarra
>> > >>> SPAIN
>>
>
>


-- 
Dr. José Luis Lavín Trueba

Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN


From l.m.timmermans at students.uu.nl  Fri Nov 18 14:15:47 2011
From: l.m.timmermans at students.uu.nl (L.M. Timmermans)
Date: Fri, 18 Nov 2011 15:15:47 +0100
Subject: [Bioperl-l] Blast > parsing result in Exel
In-Reply-To: <32846407.post@talk.nabble.com>
References: <32846407.post@talk.nabble.com>
Message-ID: <CAC1jpXC7uBtbHb_ixzMy2idvfeFQc1Y=d8Zi3xn_=0RyGYTzrA@mail.gmail.com>

On Tue, Nov 15, 2011 at 10:25 AM, Giorgio C <casaburi at ceinge.unina.it> wrote:

> I need to parse in an Excel sheet:
>

What you're saying here is nonsense. I think you meant to say you want to
output Excel.


> Is it possible to obtain, from a big blast result file, an Excel sheet
> with 5 columns where every entry is the first hit of a blast result? Can
> anyone help me to fix this problem? A little Perl script would also do.
>

There are a number of Perl modules on CPAN for outputting Excel. Try
Excel::Writer::XLSX (for .xlsx files) or Spreadsheet::WriteExcel (for the
older .xls format), for example.
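
As a rough sketch of how the pieces could fit together, the script below reads a BLAST report with Bio::SearchIO and writes the first hit of each query to one row. The five columns chosen here (query, hit, description, E-value, bit score) are an assumption, since the original post does not say which fields are wanted.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SearchIO;
use Excel::Writer::XLSX;

my ($report, $xlsx) = @ARGV;
die "Usage: $0 blast_report.txt first_hits.xlsx\n" unless defined $xlsx;

my $in       = Bio::SearchIO->new(-format => 'blast', -file => $report);
my $workbook = Excel::Writer::XLSX->new($xlsx);
die "Cannot create $xlsx: $!" unless defined $workbook;
my $worksheet = $workbook->add_worksheet('first_hits');

# The five columns are an assumption; the original post does not name them.
$worksheet->write_row(0, 0, [qw(query hit description evalue bits)]);

my $row = 1;
while (my $result = $in->next_result) {
    my $hit = $result->next_hit or next;   # skip queries without hits
    $worksheet->write_row($row++, 0, [
        $result->query_name,
        $hit->name,
        $hit->description,
        $hit->significance,
        $hit->bits,
    ]);
}
$workbook->close;
```

Only the first call to next_hit is used per result, which relies on the report listing hits best-first, as BLAST normally does.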

Leon


From tzhu at mail.bnu.edu.cn  Mon Nov 21 05:17:18 2011
From: tzhu at mail.bnu.edu.cn (Tao Zhu)
Date: Mon, 21 Nov 2011 13:17:18 +0800
Subject: [Bioperl-l] Is there a "combine" method that would combine several
 sequence alignments to a single alignment?
Message-ID: <4EC9DEDE.6030901@mail.bnu.edu.cn>

I can use the "slice" method to split a single sequence alignment into 
several subalignments. Then is there a corresponding "combine" method to 
combine such subalignments back?

-- 
Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing
100875, China
Email: tzhu at mail.bnu.edu.cn


From David.Messina at sbc.su.se  Mon Nov 21 09:58:51 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 21 Nov 2011 10:58:51 +0100
Subject: [Bioperl-l] Is there a "combine" method that would combine
 several sequence alignments to a single alignment?
In-Reply-To: <4EC9DEDE.6030901@mail.bnu.edu.cn>
References: <4EC9DEDE.6030901@mail.bnu.edu.cn>
Message-ID: <CAM3TQQVy1jPUXB+cZcEXEm9RQeBB=VsZ5aCFX9bO=WJSaq-VYg@mail.gmail.com>

Hi,

No, I don't believe such a method exists. Could you describe what you are
wanting to do? Perhaps there is another way to do it.


Dave


On Mon, Nov 21, 2011 at 06:17, Tao Zhu <tzhu at mail.bnu.edu.cn> wrote:

> I can use the "slice" method to split a single sequence alignment into
> several subalignments. Then is there a corresponding "combine" method to
> combine such subalignments back?
>
> --
> Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing
> 100875, China
> Email: tzhu at mail.bnu.edu.cn
>
>


From roy.chaudhuri at gmail.com  Mon Nov 21 11:41:09 2011
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Mon, 21 Nov 2011 11:41:09 +0000
Subject: [Bioperl-l] Is there a "combine" method that would combine
 several sequence alignments to a single alignment?
In-Reply-To: <CAM3TQQVy1jPUXB+cZcEXEm9RQeBB=VsZ5aCFX9bO=WJSaq-VYg@mail.gmail.com>
References: <4EC9DEDE.6030901@mail.bnu.edu.cn>
	<CAM3TQQVy1jPUXB+cZcEXEm9RQeBB=VsZ5aCFX9bO=WJSaq-VYg@mail.gmail.com>
Message-ID: <4ECA38D5.8050709@gmail.com>

See the cat method in Bio::Align::Utilities:

http://search.cpan.org/~cjfields/BioPerl-1.6.901/Bio/Align/Utilities.pm#cat
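
A minimal round-trip sketch using slice and cat, assuming a FASTA-format alignment file given on the command line. The split point (the midpoint) is arbitrary, and the third argument to slice is used here on the understanding that it keeps rows that are gap-only within the range, so both halves hold the same sequences.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::AlignIO;
use Bio::Align::Utilities qw(cat);

my $file = shift @ARGV or die "Usage: $0 alignment.fasta\n";
my $aln  = Bio::AlignIO->new(-format => 'fasta', -file => $file)->next_aln;

# Split at an arbitrary column (the midpoint is just for illustration) ...
my $mid   = int($aln->length / 2);
my $left  = $aln->slice(1, $mid, 1);
my $right = $aln->slice($mid + 1, $aln->length, 1);

# ... and join the pieces back together column-wise.
my $merged = cat($left, $right);
printf "original: %d columns, merged: %d columns\n",
    $aln->length, $merged->length;
```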

On 21/11/2011 09:58, Dave Messina wrote:
> Hi,
>
> No, I don't believe such a method exists. Could you describe what you are
> wanting to do? Perhaps there is another way to do it.
>
>
> Dave
>
>
>
> On Mon, Nov 21, 2011 at 06:17, Tao Zhu<tzhu at mail.bnu.edu.cn>  wrote:
>
>> I can use the "slice" method to split a single sequence alignment into
>> several subalignments. Then is there a corresponding "combine" method to
>> combine such subalignments back?
>>
>> --
>> Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing
>> 100875, China
>> Email: tzhu at mail.bnu.edu.cn
>>


From zntayl at gmail.com  Thu Nov 17 01:07:07 2011
From: zntayl at gmail.com (Nathan Taylor)
Date: Wed, 16 Nov 2011 20:07:07 -0500
Subject: [Bioperl-l] seqIO.pm
Message-ID: <CAB_EPVxHq0=HkB5MCVcAHzchtLZVJ8-c4mQz=yDkzVtvn2Gb_A@mail.gmail.com>

Hello,

   Can SeqIO.pm convert a file of FASTQ reads into .phd files? Or,
barring that, a file of FASTA sequences and a file of quals into .phd files?

Many thanks,
Nathan


From gregonomic at yahoo.co.nz  Mon Nov 21 12:00:50 2011
From: gregonomic at yahoo.co.nz (Gregory Baillie)
Date: Mon, 21 Nov 2011 04:00:50 -0800 (PST)
Subject: [Bioperl-l] Is there a "combine" method that would combine
	several sequence alignments to a single alignment?
In-Reply-To: <CAM3TQQVy1jPUXB+cZcEXEm9RQeBB=VsZ5aCFX9bO=WJSaq-VYg@mail.gmail.com>
References: <4EC9DEDE.6030901@mail.bnu.edu.cn>
	<CAM3TQQVy1jPUXB+cZcEXEm9RQeBB=VsZ5aCFX9bO=WJSaq-VYg@mail.gmail.com>
Message-ID: <1321876850.71978.YahooMailNeo@web112508.mail.gq1.yahoo.com>

Hi.

I've attached a simple script (concatenate_alignments.pl) I wrote to concatenate alignments.

It can be a bit of a memory hog if you have long alignments or large numbers of sequences; otherwise you should be OK.

Usage:
concatenate_alignments.pl -o <output_alignment> <input_alignment_1> <input_alignment_2> <... input_alignment_n>


If you want to insert a string between the concatenated sequences, you can use the -j option (eg. -j '---').

Greg.


________________________________
 From: Dave Messina <David.Messina at sbc.su.se>
To: Tao Zhu <tzhu at mail.bnu.edu.cn> 
Cc: BioPerl <bioperl-l at lists.open-bio.org> 
Sent: Monday, 21 November 2011 7:58 PM
Subject: Re: [Bioperl-l] Is there a "combine" method that would combine several sequence alignments to a single alignment?
 
Hi,

No, I don't believe such a method exists. Could you describe what you are
wanting to do? Perhaps there is another way to do it.


Dave


On Mon, Nov 21, 2011 at 06:17, Tao Zhu <tzhu at mail.bnu.edu.cn> wrote:

> I can use the "slice" method to split a single sequence alignment into
> several subalignments. Then is there a corresponding "combine" method to
> combine such subalignments back?
>
> --
> Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing
> 100875, China
> Email: tzhu at mail.bnu.edu.cn
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: concatenate_alignments.pl
Type: application/octet-stream
Size: 3349 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20111121/aa673dba/attachment-0004.obj>

From jason.stajich at gmail.com  Mon Nov 21 15:31:50 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Mon, 21 Nov 2011 10:31:50 -0500
Subject: [Bioperl-l] Is there a "combine" method that would combine
	several sequence alignments to a single alignment?
In-Reply-To: <1321876850.71978.YahooMailNeo@web112508.mail.gq1.yahoo.com>
References: <4EC9DEDE.6030901@mail.bnu.edu.cn>
	<CAM3TQQVy1jPUXB+cZcEXEm9RQeBB=VsZ5aCFX9bO=WJSaq-VYg@mail.gmail.com>
	<1321876850.71978.YahooMailNeo@web112508.mail.gq1.yahoo.com>
Message-ID: <39ECA743-8C56-4B23-8813-40EEEAB7DBB1@gmail.com>

Greg, looks good. You could simplify part of the code by using the .= operator, and use AlignIO to write the seqs out.

This is my script to combine a directory of MSA aligned .fasaln files into a single concatenated alignment.

https://github.com/hyphaltip/genome-scripts/blob/master/phylogenetics/combine_fasaln.pl
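
The .= idea can be sketched in plain Perl as follows. This is not the code from either script in this thread; concat_alignments is a hypothetical helper, and the alignments are represented as simple hashes of id to aligned sequence rather than BioPerl objects.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Concatenate alignments given as hash refs of id => aligned sequence.
# Sequences missing from one alignment are padded with gaps so that
# every row keeps the same length. concat_alignments is an illustrative
# helper, not the code from either script in this thread.
sub concat_alignments {
    my @alignments = @_;
    my %ids      = map { $_ => 1 } map { keys %$_ } @alignments;
    my %combined = map { $_ => '' } keys %ids;
    for my $aln (@alignments) {
        my ($any) = values %$aln;
        my $width = defined $any ? length $any : 0;
        for my $id (keys %combined) {
            $combined{$id} .= exists $aln->{$id} ? $aln->{$id} : '-' x $width;
        }
    }
    return \%combined;
}

my $merged = concat_alignments(
    { human => 'ACGT', mouse => 'AC-T' },
    { human => 'GG',   mouse => 'GA', rat => 'GC' },
);
print "$_\t$merged->{$_}\n" for sort keys %$merged;
```

Padding absent taxa with gaps is a design choice made for this sketch; dropping them instead would also be defensible, depending on the downstream analysis.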

On Nov 21, 2011, at 7:00 AM, Gregory Baillie wrote:

> Hi.
> 
> I've attached a simple script (concatenate_alignments.pl) I wrote to concatenate alignments.
> 
> It can be a bit of a memory hog if you have long alignments or large numbers of sequences; otherwise you should be OK.
> 
> Usage:
> concatenate_alignments.pl -o <output_alignment> <input_alignment_1> <input_alignment_2> <... input_alignment_n>
> 
> 
> If you want to insert a string between the concatenated sequences, you can use the -j option (eg. -j '---').
> 
> Greg.
> 
> 
> ________________________________
> From: Dave Messina <David.Messina at sbc.su.se>
> To: Tao Zhu <tzhu at mail.bnu.edu.cn> 
> Cc: BioPerl <bioperl-l at lists.open-bio.org> 
> Sent: Monday, 21 November 2011 7:58 PM
> Subject: Re: [Bioperl-l] Is there a "combine" method that would combine several sequence alignments to a single alignment?
> 
> Hi,
> 
> No, I don't believe such a method exists. Could you describe what you are
> wanting to do? Perhaps there is another way to do it.
> 
> 
> Dave
> 
> 
> 
> On Mon, Nov 21, 2011 at 06:17, Tao Zhu <tzhu at mail.bnu.edu.cn> wrote:
> 
>> I can use the "slice" method to split a single sequence alignment into
>> several subalignments. Then is there a corresponding "combine" method to
>> combine such subalignments back?
>> 
>> --
>> Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing
>> 100875, China
>> Email: tzhu at mail.bnu.edu.cn
>> 


From p.j.a.cock at googlemail.com  Mon Nov 21 16:15:13 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 21 Nov 2011 16:15:13 +0000
Subject: [Bioperl-l] seqIO.pm
In-Reply-To: <CAB_EPVxHq0=HkB5MCVcAHzchtLZVJ8-c4mQz=yDkzVtvn2Gb_A@mail.gmail.com>
References: <CAB_EPVxHq0=HkB5MCVcAHzchtLZVJ8-c4mQz=yDkzVtvn2Gb_A@mail.gmail.com>
Message-ID: <CAKVJ-_6wWxCGKizRZuum5FFYOtvwOqqTF9D8YzGJY5RqKhW17g@mail.gmail.com>

On Thu, Nov 17, 2011 at 1:07 AM, Nathan Taylor <zntayl at gmail.com> wrote:
> Hello,
>
>   Can SeqIO.pm convert a file of fastq reads into .phd files. Or,
> barring that, a file of fastas and file of quals into .phd files?
>
> Many thanks,
> Nathan

In principle that is possible (e.g. Biopython can do fastq to phd).
Have you tried using BioPerl's SeqIO to do this? Was there an
error message?

Peter


From cjfields at illinois.edu  Mon Nov 21 16:57:29 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 21 Nov 2011 16:57:29 +0000
Subject: [Bioperl-l] seqIO.pm
In-Reply-To: <CAKVJ-_6wWxCGKizRZuum5FFYOtvwOqqTF9D8YzGJY5RqKhW17g@mail.gmail.com>
References: <CAB_EPVxHq0=HkB5MCVcAHzchtLZVJ8-c4mQz=yDkzVtvn2Gb_A@mail.gmail.com>
	<CAKVJ-_6wWxCGKizRZuum5FFYOtvwOqqTF9D8YzGJY5RqKhW17g@mail.gmail.com>
Message-ID: <2E075A8F-92F9-4A04-9254-EF4C07793A7C@illinois.edu>

On Nov 21, 2011, at 10:15 AM, Peter Cock wrote:

> On Thu, Nov 17, 2011 at 1:07 AM, Nathan Taylor <zntayl at gmail.com> wrote:
>> Hello,
>> 
>>   Can SeqIO.pm convert a file of fastq reads into .phd files. Or,
>> barring that, a file of fastas and file of quals into .phd files?
>> 
>> Many thanks,
>> Nathan
> 
> In principle that is possible (e.g. Biopython can do fastq to phd).
> Have you tried using BioPerl's SeqIO to do this? Was there an
> error message?
> 
> Peter

This should be possible in either case (FASTQ should be more straightforward); there is a Bio::SeqIO::phd module for this purpose.  Nathan, if you run into problems with that conversion, let us know.

chris


From rondonbio at yahoo.com.br  Mon Nov 21 17:31:21 2011
From: rondonbio at yahoo.com.br (Rondon Neto)
Date: Mon, 21 Nov 2011 09:31:21 -0800 (PST)
Subject: [Bioperl-l] seqIO.pm
In-Reply-To: <CAKVJ-_6wWxCGKizRZuum5FFYOtvwOqqTF9D8YzGJY5RqKhW17g@mail.gmail.com>
References: <CAB_EPVxHq0=HkB5MCVcAHzchtLZVJ8-c4mQz=yDkzVtvn2Gb_A@mail.gmail.com>
	<CAKVJ-_6wWxCGKizRZuum5FFYOtvwOqqTF9D8YzGJY5RqKhW17g@mail.gmail.com>
Message-ID: <1321896681.57079.YahooMailNeo@web130205.mail.mud.yahoo.com>

Hi! Try this script:

#!/usr/bin/perl
use warnings;
use strict;
use Bio::SeqIO;

# Require the input FASTQ file as the first argument.
if (!$ARGV[0]) { die "\n### USAGE::: perl $0 [file.fastq]\n"; }

my $fastq = $ARGV[0];

# Read FASTQ records from the input file.
my $in = Bio::SeqIO->new( -file   => $fastq,
                          -format => 'fastq' );

# Write every record to out.phd in PHD format.
my $out = Bio::SeqIO->new( -file   => ">out.phd",
                           -format => 'phd' );

while (my $seq = $in->next_seq()) {
    $out->write_seq($seq);
}

exit;


Best wishes,
Rondon, a Brazilian friend.


_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


From Russell.Smithies at agresearch.co.nz  Mon Nov 21 20:04:01 2011
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 22 Nov 2011 09:04:01 +1300
Subject: [Bioperl-l] seqIO.pm
In-Reply-To: <1321896681.57079.YahooMailNeo@web130205.mail.mud.yahoo.com>
References: <CAB_EPVxHq0=HkB5MCVcAHzchtLZVJ8-c4mQz=yDkzVtvn2Gb_A@mail.gmail.com>
	<CAKVJ-_6wWxCGKizRZuum5FFYOtvwOqqTF9D8YzGJY5RqKhW17g@mail.gmail.com>
	<1321896681.57079.YahooMailNeo@web130205.mail.mud.yahoo.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF340186CF1AB@exchsth.agresearch.co.nz>

Or you could use the built-in script bp_sreformat.pl.

--Russell

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From goodyearkl at gmail.com  Tue Nov 22 02:23:13 2011
From: goodyearkl at gmail.com (Kylie Goodyear)
Date: Mon, 21 Nov 2011 18:23:13 -0800 (PST)
Subject: [Bioperl-l] Fasta counting script?
Message-ID: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>

Hi,
This may seem like a stupid question, but I am just learning BioPerl
and I am trying to figure out how to get a count of all the characters
in my FASTA file. I managed to get the number of sequences using the
following. Is there a way to tell BioPerl to count the characters?

#!/usr/bin/perl -w
#Defines perl modules
#Bio::Seq deals with sequences and their features
use Bio::Seq;
#Bio::SeqIO handles reading and parsing of sequences of many different
#formats
use Bio::SeqIO;
#Read FASTA file
$seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta",
                             -format => "fasta" );
#Count how many sequences are present in file
my $count=0;
while (my $seq_obj = $seqio_obj->next_seq) {
    $count++;
}
#Display the number of sequences present
print "There are $count sequences present.\n";


From David.Messina at sbc.su.se  Tue Nov 22 08:08:11 2011
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 22 Nov 2011 09:08:11 +0100
Subject: [Bioperl-l] Fasta counting script?
In-Reply-To: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
Message-ID: <CAM3TQQXmB-QHRgp6JdNJFb9a--SAQ0CpjmFCtwhNC3JARThJsw@mail.gmail.com>

Hi Kylie,

You can use the length method for this.

my $seq_length = $seq_obj->length();

Have you taken a look at the beginner's HOWTO? There's a nice table of
sequence methods as well as lots of other good information in there.

http://www.bioperl.org/wiki/HOWTO:Beginners


Dave




From liam.elbourne at mq.edu.au  Tue Nov 22 04:11:12 2011
From: liam.elbourne at mq.edu.au (Liam Elbourne)
Date: Tue, 22 Nov 2011 15:11:12 +1100
Subject: [Bioperl-l] Fasta counting script?
In-Reply-To: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
Message-ID: <EEEBBE60-96CB-4458-A460-F154CCC7459D@mq.edu.au>

Hi Kylie,

I think the length() method is what you're after:

....
my $sequence_length = $seq_obj->length();

....

in your case. Have a look at:

HOWTO:SeqIO - BioPerl

and,

HOWTO:Beginners - BioPerl

for some more general stuff.


Regards,
Liam.




From goodyearkl at gmail.com  Tue Nov 22 13:00:55 2011
From: goodyearkl at gmail.com (Kylie Goodyear)
Date: Tue, 22 Nov 2011 05:00:55 -0800 (PST)
Subject: [Bioperl-l] Fasta counting script?
In-Reply-To: <CAM3TQQXmB-QHRgp6JdNJFb9a--SAQ0CpjmFCtwhNC3JARThJsw@mail.gmail.com>
References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
	<CAM3TQQXmB-QHRgp6JdNJFb9a--SAQ0CpjmFCtwhNC3JARThJsw@mail.gmail.com>
Message-ID: <273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com>

Thank you for your help. It keeps telling me that it can't find
"length". Do you think it has to do with the way I am coding it?

#!/usr/bin/perl -w
#Defines perl modules

#Bio::Seq deals with sequences and their features
use Bio::Seq;

#Bio::SeqIO handles reading and parsing of sequences of many different
#formats
use Bio::SeqIO;


#Read FASTA file
$seqio_obj = Bio::SeqIO->new(-file => "DNA_sequences.fasta",
                             -format => "fasta" );


#Count how many sequences are present in file
my $countseq=0;
while (my $seq_obj = $seqio_obj->next_seq, ) {
    $countseq++;
    }
#Display the number of sequences present
print "There are $countseq sequences present.\n";

#Count number of characters in file
my $seq_length = $seq_obj->length ;
print $seq_length




From roy.chaudhuri at gmail.com  Tue Nov 22 15:50:31 2011
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Tue, 22 Nov 2011 15:50:31 +0000
Subject: [Bioperl-l] Fasta counting script?
In-Reply-To: <273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com>
References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
	<CAM3TQQXmB-QHRgp6JdNJFb9a--SAQ0CpjmFCtwhNC3JARThJsw@mail.gmail.com>
	<273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com>
Message-ID: <4ECBC4C7.10401@gmail.com>

Hi Kylie,

I suspect the error you get is actually 'Can't call method "length" on an
undefined value' (in future, please report the exact text of any error
messages). You declare $seq_obj with "my" in the while loop, so it only
exists inside the loop; the $seq_obj you use after the loop is a different,
undefined variable. Try printing out the length of each $seq_obj within the
while loop.

You should always include "use strict;" at the top of your program; that
helps to catch errors like this.

Cheers,
Roy.
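
A minimal sketch of that suggestion, assuming BioPerl is installed: the length is added up inside the loop, where $seq_obj is still in scope. The sub name count_fasta and the script name in the comment are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Return (number of sequences, total number of residues) for a FASTA file.
sub count_fasta {
    my ($path) = @_;
    my $seqio_obj = Bio::SeqIO->new(-file => $path, -format => 'fasta');

    my ($count, $total_length) = (0, 0);
    while (my $seq_obj = $seqio_obj->next_seq) {
        # $seq_obj only exists inside this block, so use it here.
        $count++;
        $total_length += $seq_obj->length;
    }
    return ($count, $total_length);
}

# Hypothetical usage: perl count_fasta.pl DNA_sequences.fasta
if (@ARGV) {
    my ($count, $total_length) = count_fasta($ARGV[0]);
    print "There are $count sequences present.\n";
    print "They contain $total_length characters in total.\n";
}
```

With "use strict" in place, the original version (calling $seq_obj->length after the loop) would fail at compile time with a message about $seq_obj needing an explicit package name, which points straight at the problem line.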



From cjfields at illinois.edu  Tue Nov 22 16:13:01 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 22 Nov 2011 16:13:01 +0000
Subject: [Bioperl-l] Fasta counting script?
In-Reply-To: <273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com>
References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
	<CAM3TQQXmB-QHRgp6JdNJFb9a--SAQ0CpjmFCtwhNC3JARThJsw@mail.gmail.com>
	<273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com>
Message-ID: <0507832F-51F3-4946-BDCD-6913BBE697BC@illinois.edu>

This sounds a little homework-y.  Sure this isn't for a class? :)

One clue (and a good thing to keep in mind): always 'use strict; use warnings;' with your scripts if you are new to perl.  Doing so would let you know there is a problem with the script the way it is written, specifically, the place where you are inquiring about the length.

chris



From Russell.Smithies at agresearch.co.nz  Tue Nov 22 20:47:36 2011
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 23 Nov 2011 09:47:36 +1300
Subject: [Bioperl-l] Fasta counting script?
In-Reply-To: <0507832F-51F3-4946-BDCD-6913BBE697BC@illinois.edu>
References: <6f8a5606-093c-4b99-a92c-2b9ffac6832b@y42g2000yqh.googlegroups.com>
	<CAM3TQQXmB-QHRgp6JdNJFb9a--SAQ0CpjmFCtwhNC3JARThJsw@mail.gmail.com>
	<273d584f-8cc5-42f7-9257-029b0f3a4cd5@cu3g2000vbb.googlegroups.com>
	<0507832F-51F3-4946-BDCD-6913BBE697BC@illinois.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF340186CF1B8@exchsth.agresearch.co.nz>

Or again, you could use the built-in scripts bp_seq_length.pl or bp_gccalc.pl.
As previous posters have hinted, RTFM - the answers are all in there!

--Russell



From charles-listes+bioperl at plessy.org  Wed Nov 23 10:27:45 2011
From: charles-listes+bioperl at plessy.org (Charles Plessy)
Date: Wed, 23 Nov 2011 19:27:45 +0900
Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
Message-ID: <20111123102745.GC20168@merveille.plessy.net>

Dear BioPerl developers,

I am trying to process some unaligned paired-end reads with Bio::DB::Sam.  For
each pair, I want to detect a sequence index and a unique molecular identifier in
the linker, record them as auxiliary flags, and trim the linker from the read.

I collect the pairs through a features iterator, and can access all their data
through the high-level Bio::DB::Bam::Alignment API.  After modifying them
(linker trimming and adding flags), I want to write the resulting pairs as a
new unaligned BAM file.

I apologise if the solution is trivial, but my problem is that I have not
managed to modify the Bio::DB::Bam::Alignment objects.  Typically, attempts
such as '$pair[0]->qseq("GATACA")' give errors like
'Usage: Bio::DB::Bam::Alignment::qseq(b) at /usr/lib/perl5/Bio/DB/Bam/AlignWrapper.pm line 80'.

Since I did not find explanations or portions of source code indicating how to
modify Bio::DB::Bam::Alignment objects, I wonder if they are read-only?

Have a nice day,

-- 
Charles Plessy
Tsurumi, Kanagawa, Japan


From MEC at stowers.org  Wed Nov 23 16:02:26 2011
From: MEC at stowers.org (Cook, Malcolm)
Date: Wed, 23 Nov 2011 10:02:26 -0600
Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
In-Reply-To: <20111123102745.GC20168@merveille.plessy.net>
References: <20111123102745.GC20168@merveille.plessy.net>
Message-ID: <2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org>

Charles,

I suggest you reconsider your approach: rather, use `samtools view` to pipe your reads to stdout in SAM format, then stream-edit the barcode and pipe the result back to samtools for conversion to a .bam file.

I know this is not what you're asking.  I'm pretty sure that the direct answer to your question is, "yes - they are read-only".

~Malcolm
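
A rough sketch of the stream-editing step in plain Perl, under several assumptions that are not from Charles's description: the linker sits at the very start of every read as a fixed-length index followed by a fixed-length UMI, the reads are unaligned (so there is no CIGAR string to adjust), and the tag names XI and XU are made-up user-space tags.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical linker layout: a sample index at the start of the read,
# followed directly by a unique molecular identifier (UMI).
my $INDEX_LEN = 6;
my $UMI_LEN   = 8;

# Trim the linker from one SAM line and record index and UMI as aux tags.
# Header lines and reads too short to hold the linker are returned unchanged.
sub trim_linker {
    my ($line, $index_len, $umi_len) = @_;
    return $line if $line =~ /^\@/;          # SAM header line

    chomp(my $record = $line);
    my @field = split /\t/, $record;
    my ($seq, $qual) = @field[9, 10];        # SEQ and QUAL columns
    my $linker_len = $index_len + $umi_len;
    return $line if length($seq) <= $linker_len;

    my $index = substr($seq, 0, $index_len);
    my $umi   = substr($seq, $index_len, $umi_len);

    # Trim SEQ and QUAL by the same amount so they stay the same length.
    $field[9]  = substr($seq, $linker_len);
    $field[10] = substr($qual, $linker_len) unless $qual eq '*';

    # XI and XU are illustrative user-defined tags, not standard ones.
    push @field, "XI:Z:$index", "XU:Z:$umi";
    return join("\t", @field) . "\n";
}

# Act as a filter only when called with '-' as the first argument.
if (@ARGV && $ARGV[0] eq '-') {
    while (my $line = <STDIN>) {
        print trim_linker($line, $INDEX_LEN, $UMI_LEN);
    }
}
```

Saved as, say, trim_linker.pl (a hypothetical name), it could sit in the middle of the pipeline described above:

samtools view -h in.bam | perl trim_linker.pl - | samtools view -bS - > out.bam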




From cjfields at illinois.edu  Wed Nov 23 19:26:31 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 23 Nov 2011 19:26:31 +0000
Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
In-Reply-To: <2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org>
References: <20111123102745.GC20168@merveille.plessy.net>
	<2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org>
Message-ID: <CA5177C9-FF24-49D7-AFB3-1B79A742F2B4@illinois.edu>

According to the docs for the low-level API of Bio-Samtools, both reading and writing are allowed:

http://search.cpan.org/perldoc?Bio::DB::Sam#The_low-level_API

Using the low-level API for this purpose isn't documented as well, though (the high-level API is read only AFAICT).  

The error message is a standard one generated from the XS bindings when the arguments passed aren't mapped correctly.  Looking through the Sam.xs file, qseq() is only prototyped as a reader; its only argument is a Bio::DB::Bam::Alignment (e.g. $self).  However, it appears there is a function specified for Bio::DB::Bam::Alignment named l_qseq() that might be the setter, whereas qseq() is probably the getter (ignore the 'bama_' prefix):

....

int
bama_l_qseq(b,...)
    Bio::DB::Bam::Alignment b
PROTOTYPE: $;$
CODE:
    if (items > 1)
      b->core.l_qseq = SvIV(ST(1));
    RETVAL=b->core.l_qseq;
OUTPUT:
    RETVAL

SV*
bama_qseq(b)
Bio::DB::Bam::Alignment b
PROTOTYPE: $
PREINIT:
    char* seq;
    int   i;
CODE:
    seq = Newxz(seq,b->core.l_qseq+1,char);
    for (i=0;i<b->core.l_qseq;i++) {
      seq[i]=bam_nt16_rev_table[bam1_seqi(bam1_seq(b),i)];
    }
    RETVAL = newSVpv(seq,b->core.l_qseq);
    Safefree(seq);
OUTPUT:
    RETVAL


-chris

On Nov 23, 2011, at 10:02 AM, Cook, Malcolm wrote:

> Charles,
> 
> I suggest you reconsider your approach to rather, use `samtools view` to pipe your reads to stdout in sam format, then stream edit the barcode and pipe it back to samtools for conversion back to .bam file.
> 
> I know this is not what you're asking.  I'm pretty sure that direct answer to your question is, "yes - they are read-only".
> 
> ~Malcolm
> 
> 
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Charles Plessy
>> Sent: Wednesday, November 23, 2011 4:28 AM
>> To: bioperl-l at bioperl.org
>> Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
>> 
>> Dear BioPerl developers,
>> 
>> I am trying to process some unaligned paired-end reads with Bio::DB::Sam.
>> For
>> each pair, I want to detect a sequence index and a unique molecular
>> identifier in
>> the linker, record them as auxiliary flags, and trim the linker from the read.
>> 
>> I collect the pairs through a features iterator, and can access all their data
>> through the high-level Bio::DB::Bam::Alignment API.  After modifying them
>> (linker trimming and adding flags), I want to write the resulting pairs as a
>> new unaligned BAM file.
>> 
>> I apologise if the solution is trivial, but my problem is that I do not manage to
>> modify the Bio::DB::Bam::Alignment objects.  Typically, attempts such as
>> '$pair[0]->qseq("GATACA")' give errors like
>> 'Usage: Bio::DB::Bam::Alignment::qseq(b) at
>> /usr/lib/perl5/Bio/DB/Bam/AlignWrapper.pm line 80'.
>> 
>> Since I did not find explanations or portions of source code indicating how to
>> modify Bio::DB::Bam::Alignment objects, I wonder if they are read-only?
>> 
>> Have a nice day,
>> 
>> --
>> Charles Plessy
>> Tsurumi, Kanagawa, Japan
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From lincoln.stein at gmail.com  Wed Nov 23 22:02:23 2011
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Thu, 24 Nov 2011 06:02:23 +0800
Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
In-Reply-To: <20111123102745.GC20168@merveille.plessy.net>
References: <20111123102745.GC20168@merveille.plessy.net>
Message-ID: <CAOS1dzwxY2Kt3_xkgnbCps_TYfnT3dGE9+gAirpBCeJoMT7YDg@mail.gmail.com>

I apologize that the qseq() method is only allowing read-only access. I
will attempt to fix this.

Lincoln

On Wed, Nov 23, 2011 at 6:27 PM, Charles Plessy <
charles-listes+bioperl at plessy.org> wrote:

> Dear BioPerl developers,
>
> I am trying to process some unaligned paired-end reads with Bio::DB::Sam.
>  For
> each pair, I want to detect a sequence index and a unique molecular
> identifier in
> the linker, record them as auxiliary flags, and trim the linker from the
> read.
>
> I collect the pairs through a features iterator, and can access all their
> data
> through the high-level Bio::DB::Bam::Alignment API.  After modifying them
> (linker trimming and adding flags), I want to write the resulting pairs as
> a
> new unaligned BAM file.
>
> I apologise if the solution is trivial, but my problem is that I do not
> manage to
> modify the Bio::DB::Bam::Alignment objects.  Typically, attempts such as
> '$pair[0]->qseq("GATACA")' give errors like
> 'Usage: Bio::DB::Bam::Alignment::qseq(b) at
> /usr/lib/perl5/Bio/DB/Bam/AlignWrapper.pm line 80'.
>
> Since I did not find explanations or portions of source code indicating
> how to
> modify Bio::DB::Bam::Alignment objects, I wonder if they are read-only?
>
> Have a nice day,
>
> --
> Charles Plessy
> Tsurumi, Kanagawa, Japan
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


-- 
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <Renata.Musa at oicr.on.ca>


From lincoln.stein at gmail.com  Wed Nov 23 22:05:41 2011
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Thu, 24 Nov 2011 06:05:41 +0800
Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
In-Reply-To: <CA5177C9-FF24-49D7-AFB3-1B79A742F2B4@illinois.edu>
References: <20111123102745.GC20168@merveille.plessy.net>
	<2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org>
	<CA5177C9-FF24-49D7-AFB3-1B79A742F2B4@illinois.edu>
Message-ID: <CAOS1dzwR050PGsomiSJZT+qns60XU8Smgy_WrotyxcOo+LmgNw@mail.gmail.com>

Unfortunately l_qseq read/writes the length of the query sequence, not the
sequence itself.
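
Until a setter is available, the SAM text round trip that Malcolm suggested
sidesteps the problem. A minimal sketch of the per-record edit follows. It
assumes a fixed-length linker at the start of each read, with the sample index
first and the unique molecular identifier right after it; the sub name, the
linker layout and the choice of the BC and RX tags are illustrative
assumptions, not part of Bio::DB::Sam.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical linker layout (an assumption for this sketch): the first
# $index_len bases are the sample index, the next $umi_len bases are the
# unique molecular identifier (UMI).
sub trim_linker {
    my ($sam_line, $index_len, $umi_len) = @_;
    chomp $sam_line;
    return $sam_line if $sam_line =~ /^\@/;    # pass header lines through

    my @f = split /\t/, $sam_line;             # 11 mandatory fields, then tags
    my ($seq, $qual) = @f[9, 10];
    my $linker_len = $index_len + $umi_len;
    return $sam_line if length($seq) <= $linker_len;    # too short to trim

    my $index = substr $seq, 0, $index_len;
    my $umi   = substr $seq, $index_len, $umi_len;

    # Trim sequence and qualities by the same amount, then record the
    # removed pieces as auxiliary tags.
    $f[9]  = substr $seq, $linker_len;
    $f[10] = $qual eq '*' ? '*' : substr $qual, $linker_len;
    push @f, "BC:Z:$index", "RX:Z:$umi";
    return join "\t", @f;
}

# Demonstration on one unaligned read
# (flag 77 = paired, read and mate unmapped, first in pair).
my $record = join "\t", 'read1', 77, '*', 0, 0, '*', '*', 0, 0,
                        'ACGTTTGGGATACA', 'IIIIIIIIIIIIII';
print trim_linker($record, 4, 3), "\n";
```

To use it in a pipe, replace the demonstration with a loop over STDIN that
prints trim_linker() for each line, and put the script between
`samtools view -h` on the input side and `samtools view -b` on the output side.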

Lincoln

On Thu, Nov 24, 2011 at 3:26 AM, Fields, Christopher J <
cjfields at illinois.edu> wrote:

> According to the docs the low-level API for Bio-Samtools, both read and
> write are allowed:
>
> http://search.cpan.org/perldoc?Bio::DB::Sam#The_low-level_API
>
> Using the low-level API for this purpose isn't documented as well, though
> (the high-level API is read only AFAICT).
>
> The error message is a standard one generated by the XS bindings when
> the passed argument isn't mapped correctly.  Looking through the
> Sam.xs file, qseq() is only prototyped as a reader; the only arg is a
> Bio::DB::Bam::Alignment (i.e. $self).  However, it appears there is a
> function specified for Bio::DB::Bam::Alignment named l_qseq() that might be
> the setter, whereas qseq() may be meant as the getter (ignore the 'bama_'
> prefix):
>
> ....
>
> int
> bama_l_qseq(b,...)
>    Bio::DB::Bam::Alignment b
> PROTOTYPE: $;$
> CODE:
>    if (items > 1)
>      b->core.l_qseq = SvIV(ST(1));
>    RETVAL=b->core.l_qseq;
> OUTPUT:
>    RETVAL
>
> SV*
> bama_qseq(b)
> Bio::DB::Bam::Alignment b
> PROTOTYPE: $
> PREINIT:
>    char* seq;
>    int   i;
> CODE:
>    seq = Newxz(seq,b->core.l_qseq+1,char);
>    for (i=0;i<b->core.l_qseq;i++) {
>      seq[i]=bam_nt16_rev_table[bam1_seqi(bam1_seq(b),i)];
>    }
>    RETVAL = newSVpv(seq,b->core.l_qseq);
>    Safefree(seq);
> OUTPUT:
>    RETVAL
>
>
> -chris
>
> On Nov 23, 2011, at 10:02 AM, Cook, Malcolm wrote:
>
> > Charles,
> >
> > I suggest you reconsider your approach to rather, use `samtools view` to
> pipe your reads to stdout in sam format, then stream edit the barcode and
> pipe it back to samtools for conversion back to .bam file.
> >
> > I know this is not what you're asking.  I'm pretty sure that direct
> answer to your question is, "yes - they are read-only".
> >
> > ~Malcolm
> >
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >> bounces at lists.open-bio.org] On Behalf Of Charles Plessy
> >> Sent: Wednesday, November 23, 2011 4:28 AM
> >> To: bioperl-l at bioperl.org
> >> Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
> >>
> >> Dear BioPerl developers,
> >>
> >> I am trying to process some unaligned paired-end reads with
> Bio::DB::Sam.
> >> For
> >> each pair, I want to detect a sequence index and a unique molecular
> >> identifier in
> >> the linker, record them as auxiliary flags, and trim the linker from
> the read.
> >>
> >> I collect the pairs through a features iterator, and can access all
> their data
> >> through the high-level Bio::DB::Bam::Alignment API.  After modifying
> them
> >> (linker trimming and adding flags), I want to write the resulting pairs
> as a
> >> new unaligned BAM file.
> >>
> >> I apologise if the solution is trivial, but my problem is that I do not
> manage to
> >> modify the Bio::DB::Bam::Alignment objects.  Typically, attempts such as
> >> '$pair[0]->qseq("GATACA")' give errors like
> >> 'Usage: Bio::DB::Bam::Alignment::qseq(b) at
> >> /usr/lib/perl5/Bio/DB/Bam/AlignWrapper.pm line 80'.
> >>
> >> Since I did not find explanations or portions of source code
> indicating how to
> >> modify Bio::DB::Bam::Alignment objects, I wonder if they are read-only?
> >>
> >> Have a nice day,
> >>
> >> --
> >> Charles Plessy
> >> Tsurumi, Kanagawa, Japan
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <Renata.Musa at oicr.on.ca>


From cjfields at illinois.edu  Thu Nov 24 01:07:09 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 24 Nov 2011 01:07:09 +0000
Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
In-Reply-To: <CAOS1dzwR050PGsomiSJZT+qns60XU8Smgy_WrotyxcOo+LmgNw@mail.gmail.com>
References: <20111123102745.GC20168@merveille.plessy.net>
	<2C40E43D1F7A56408C4463FD245DDDF99398FEB5@EXCHMB-02.stowers-institute.org>
	<CA5177C9-FF24-49D7-AFB3-1B79A742F2B4@illinois.edu>,
	<CAOS1dzwR050PGsomiSJZT+qns60XU8Smgy_WrotyxcOo+LmgNw@mail.gmail.com>
Message-ID: <92CA8F24-47CB-42AF-8C20-9C4765A592A5@illinois.edu>

Ah, okay, makes sense.  I thought it was oddly named. :)

Chris

Sent from my iPad

On Nov 23, 2011, at 4:05 PM, "Lincoln Stein" <lincoln.stein at gmail.com> wrote:

Unfortunately l_qseq read/writes the length of the query sequence, not the sequence itself.

Lincoln

On Thu, Nov 24, 2011 at 3:26 AM, Fields, Christopher J <cjfields at illinois.edu> wrote:
According to the docs the low-level API for Bio-Samtools, both read and write are allowed:

http://search.cpan.org/perldoc?Bio::DB::Sam#The_low-level_API

Using the low-level API for this purpose isn't documented as well, though (the high-level API is read only AFAICT).

The error message is a standard one generated by the XS bindings when the passed argument isn't mapped correctly.  Looking through the Sam.xs file, qseq() is only prototyped as a reader; the only arg is a Bio::DB::Bam::Alignment (i.e. $self).  However, it appears there is a function specified for Bio::DB::Bam::Alignment named l_qseq() that might be the setter, whereas qseq() may be meant as the getter (ignore the 'bama_' prefix):

....

int
bama_l_qseq(b,...)
   Bio::DB::Bam::Alignment b
PROTOTYPE: $;$
CODE:
   if (items > 1)
     b->core.l_qseq = SvIV(ST(1));
   RETVAL=b->core.l_qseq;
OUTPUT:
   RETVAL

SV*
bama_qseq(b)
Bio::DB::Bam::Alignment b
PROTOTYPE: $
PREINIT:
   char* seq;
   int   i;
CODE:
   seq = Newxz(seq,b->core.l_qseq+1,char);
   for (i=0;i<b->core.l_qseq;i++) {
     seq[i]=bam_nt16_rev_table[bam1_seqi(bam1_seq(b),i)];
   }
   RETVAL = newSVpv(seq,b->core.l_qseq);
   Safefree(seq);
OUTPUT:
   RETVAL


-chris

On Nov 23, 2011, at 10:02 AM, Cook, Malcolm wrote:

> Charles,
>
> I suggest you reconsider your approach to rather, use `samtools view` to pipe your reads to stdout in sam format, then stream edit the barcode and pipe it back to samtools for conversion back to .bam file.
>
> I know this is not what you're asking.  I'm pretty sure that direct answer to your question is, "yes - they are read-only".
>
> ~Malcolm
>
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Charles Plessy
>> Sent: Wednesday, November 23, 2011 4:28 AM
>> To: bioperl-l at bioperl.org
>> Subject: [Bioperl-l] Are Bio::DB::Bam::Alignment objects read-only ?
>>
>> Dear BioPerl developers,
>>
>> I am trying to process some unaligned paired-end reads with Bio::DB::Sam.
>> For
>> each pair, I want to detect a sequence index and a unique molecular
>> identifier in
>> the linker, record them as auxiliary flags, and trim the linker from the read.
>>
>> I collect the pairs through a features iterator, and can access all their data
>> through the high-level Bio::DB::Bam::Alignment API.  After modifying them
>> (linker trimming and adding flags), I want to write the resulting pairs as a
>> new unaligned BAM file.
>>
>> I apologise if the solution is trivial, but my problem is that I do not manage to
>> modify the Bio::DB::Bam::Alignment objects.  Typically, attempts such as
>> '$pair[0]->qseq("GATACA")' give errors like
>> 'Usage: Bio::DB::Bam::Alignment::qseq(b) at
>> /usr/lib/perl5/Bio/DB/Bam/AlignWrapper.pm line 80'.
>>
>> Since I did not find explanations or portions of source code indicating how to
>> modify Bio::DB::Bam::Alignment objects, I wonder if they are read-only?
>>
>> Have a nice day,
>>
>> --
>> Charles Plessy
>> Tsurumi, Kanagawa, Japan
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


--
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <Renata.Musa at oicr.on.ca>


From ross at cuhk.edu.hk  Sun Nov 27 08:24:43 2011
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Sun, 27 Nov 2011 16:24:43 +0800
Subject: [Bioperl-l] Check the location type for a particular gene in a
	Genbank file
In-Reply-To: <CAA7rn9dYGfeCS-qVccDcdwVenj8KFQs4i89EoT8_63gQk5b42g@mail.gmail.com>
References: <CAA7rn9dYGfeCS-qVccDcdwVenj8KFQs4i89EoT8_63gQk5b42g@mail.gmail.com>
Message-ID: <000101ccacde$0fc4ba90$2f4e2fb0$@edu.hk>

Hi all,

To write a script to extract sequence generically for all types of
BioLocation objects, I'd like to know if there is any way to check what
types (e.g. simple or split) are being processed.

Bio::Location::CoordinatePolicyI appears to be doing something similar but
it is more like a post checking step. If I parse the genbank file line by
line, I can certainly check whether the line contains keywords like "join"
but as I'm using something like:

        my @features = grep { $_->primary_tag eq $chkTags[0] } $seqobj->get_SeqFeatures;

        foreach (@features) {
            $pseudo = $_->has_tag('pseudo') ? 'pseudo' : 'functional';
            @gene = ();    # was @gene=[], which stores a single array reference

I'd appreciate if anybody knows a better integration with the well-developed
bioperl module.

Thanks a lot.


From Russell.Smithies at agresearch.co.nz  Mon Nov 28 00:46:05 2011
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 28 Nov 2011 13:46:05 +1300
Subject: [Bioperl-l] Galaxy tools?
Message-ID: <18DF7D20DFEC044098A1062202F5FFF340186CF1E3@exchsth.agresearch.co.nz>

Possibly the wrong place to ask but has anyone written Galaxy tools using BioPerl?
I was thinking of creating blast graphic and format converter tools as I couldn't see any already available in their toolbox.
It looks like I can just write a Python wrapper for my existing BioPerl scripts - although I suspect the "correct" method is to use BioPython methods (but Python annoys me with its lack of semi-colons and required white-space)

--Russell

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From p.j.a.cock at googlemail.com  Mon Nov 28 01:28:33 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 28 Nov 2011 01:28:33 +0000
Subject: [Bioperl-l] Galaxy tools?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF340186CF1E3@exchsth.agresearch.co.nz>
References: <18DF7D20DFEC044098A1062202F5FFF340186CF1E3@exchsth.agresearch.co.nz>
Message-ID: <CAKVJ-_7k6RpQHw4a6=H3qOK+zb+r3T_sG74MG2fPM5_7NbViYA@mail.gmail.com>

On Monday, November 28, 2011, Smithies, Russell  wrote:
> Possibly the wrong place to ask but has anyone written
> Galaxy tools using BioPerl?
> I was thinking of creating blast graphic and format converter
>  tools as I couldn't see any already available in their toolbox.
> It looks like I can just write a Python wrapper for my existing
> BioPerl scripts - although I suspect the "correct" method is to
> use BioPython methods (but Python annoys me with its lack
> of semi-colons and required white-space)

Galaxy is agnostic about what language the tools are in,
you can use a binary, shell script, Java, Perl, Python etc.

Peter


From florent.angly at gmail.com  Mon Nov 28 02:09:45 2011
From: florent.angly at gmail.com (Florent Angly)
Date: Mon, 28 Nov 2011 12:09:45 +1000
Subject: [Bioperl-l] Galaxy tools?
In-Reply-To: <CAKVJ-_7k6RpQHw4a6=H3qOK+zb+r3T_sG74MG2fPM5_7NbViYA@mail.gmail.com>
References: <18DF7D20DFEC044098A1062202F5FFF340186CF1E3@exchsth.agresearch.co.nz>
	<CAKVJ-_7k6RpQHw4a6=H3qOK+zb+r3T_sG74MG2fPM5_7NbViYA@mail.gmail.com>
Message-ID: <4ED2ED69.10601@gmail.com>

Hi Russell,

As Peter said, the tools to be wrapped do not need to be written in Python.

I have built a few wrappers for Galaxy, including one for the read 
simulator Grinder (http://sourceforge.net/projects/biogrinder/), which 
uses Bioperl and is available in the Galaxy Toolshed 
(http://sourceforge.net/projects/biogrinder/). It is not very hard to do 
a wrapper for trivial programs, but becomes more complicated once you 
start having optional arguments or multiple output files.

Grinder uses Getopt::Euclid (http://search.cpan.org/dist/Getopt-Euclid/) 
to parse command-line arguments. I have been thinking about leveraging 
the information that Getopt::Euclid stores about command-line arguments 
to automate most of the Galaxy wrapper generation, but I have not gotten 
to it yet.

Florent


On 28/11/11 11:28, Peter Cock wrote:
> On Monday, November 28, 2011, Smithies, Russell  wrote:
>> Possibly the wrong place to ask but has anyone written
>> Galaxy tools using BioPerl?
>> I was thinking of creating blast graphic and format converter
>>   tools as I couldn't see any already available in their toolbox.
>> It looks like I can just write a Python wrapper for my existing
>> BioPerl scripts - although I suspect the "correct" method is to
>> use BioPython methods (but Python annoys me with its lack
>> of semi-colons and required white-space)
> Galaxy is agnostic about what language the tools are in,
> you can use a binary, shell script, Java, Perl, Python etc.
>
> Peter
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From florent.angly at gmail.com  Mon Nov 28 04:35:31 2011
From: florent.angly at gmail.com (Florent Angly)
Date: Mon, 28 Nov 2011 14:35:31 +1000
Subject: [Bioperl-l] Interest in Bio::Community modules
Message-ID: <4ED30F93.4000407@gmail.com>

Hi all,

I have been thinking about starting a set of Perl modules that would be 
useful for (microbial) ecologists to represent communities of organisms. 
At the moment, there does not seem to be anything like this in Bioperl. 
I am happy to make these modules available under the Bioperl umbrella 
using the Bio::Community::* namespace.

I envision the following modules:
* Bio::Community::Member module representing members of a community.
* Bio::Community::IO modules to read/write files that describe community 
composition (a.k.a. OTU table, or site by species table) as used by 
programs like QIIME, Pyrotagger, GAAS, ...
* Bio::Community::Tools modules to help manipulate communities, e.g. to 
take some members at random, normalize the community to a given number 
of individuals, or do rarefaction curves.
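
To make the Tools idea a bit more concrete, here is a rough sketch of the
normalization step in plain Perl. The sub name and the hash-of-counts
representation are placeholders for illustration only, not a proposed
Bio::Community API.

```perl
use strict;
use warnings;
use List::Util qw(shuffle sum);

# Randomly subsample a community (member name => count) down to $n
# individuals, without replacement. Hypothetical helper for illustration.
sub subsample_community {
    my ($counts, $n) = @_;
    my $total = sum(0, values %$counts);
    die "Cannot draw $n individuals from $total\n" if $n > $total;

    # Expand the counts into one entry per individual, shuffle them,
    # and keep the first $n.
    my @individuals = map { ($_) x $counts->{$_} } sort keys %$counts;
    my @picked      = (shuffle @individuals)[0 .. $n - 1];

    my %sub;
    $sub{$_}++ for @picked;
    return \%sub;
}

my %community = (OTU_1 => 50, OTU_2 => 30, OTU_3 => 20);
my $rarefied  = subsample_community(\%community, 10);
printf "%s\t%d\n", $_, $rarefied->{$_} for sort keys %$rarefied;
```

Expanding every individual is wasteful for large communities; it is only
meant to show the behaviour such a tool would need to reproduce.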

The idea is to implement these modules in Moose to teach myself Moose. 
The members of a community could be a sequence (Bio::SeqI), a species 
(Bio::S), an arbitrary string or even other things. I am not quite sure 
if Bioperl provides facilities to attach some arbitrary information to an 
object.

Any interest? Ideas? Comments?

Thanks,

Florent


From cjfields at illinois.edu  Mon Nov 28 19:42:12 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 28 Nov 2011 19:42:12 +0000
Subject: [Bioperl-l] Check the location type for a particular gene in
	a	Genbank file
In-Reply-To: <000101ccacde$0fc4ba90$2f4e2fb0$@edu.hk>
References: <CAA7rn9dYGfeCS-qVccDcdwVenj8KFQs4i89EoT8_63gQk5b42g@mail.gmail.com>
	<000101ccacde$0fc4ba90$2f4e2fb0$@edu.hk>
Message-ID: <49363DC1-110A-49A8-B8D7-75AA624A535C@illinois.edu>

Ross,

The standard way is to check whether the location object is a SplitLocationI or not, see the following for an example:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Location_Objects
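
A minimal sketch of that check, in case it helps. The two features are built
in memory purely for illustration (in your script they would come from
$seqobj->get_SeqFeatures), the coordinates are made up, and location_type()
is just a name I picked for the example.

```perl
use strict;
use warnings;
use Bio::SeqFeature::Generic;
use Bio::Location::Simple;
use Bio::Location::Split;

# Return 'split' for join()-style locations, 'simple' otherwise.
sub location_type {
    my ($feature) = @_;
    return $feature->location->isa('Bio::Location::SplitLocationI')
        ? 'split' : 'simple';
}

# A simple feature: one contiguous range.
my $simple = Bio::SeqFeature::Generic->new(
    -primary_tag => 'gene', -start => 10, -end => 50, -strand => 1);

# A split feature, equivalent to join(10..20,40..50) in GenBank.
my $join = Bio::Location::Split->new;
$join->add_sub_Location(
    Bio::Location::Simple->new(-start => 10, -end => 20, -strand => 1));
$join->add_sub_Location(
    Bio::Location::Simple->new(-start => 40, -end => 50, -strand => 1));
my $split = Bio::SeqFeature::Generic->new(
    -primary_tag => 'gene', -location => $join);

for my $feat ($simple, $split) {
    my $type = location_type($feat);
    print "$type:";
    if ($type eq 'split') {
        # Each sub-location is itself a location with start/end.
        print ' ', $_->start, '..', $_->end for $feat->location->sub_Location;
    } else {
        print ' ', $feat->start, '..', $feat->end;
    }
    print "\n";
}
```

If the goal is only to extract the sequence, $feat->spliced_seq should cover
both cases once the feature is attached to a sequence, so the explicit check
is mostly needed when you want to treat the sub-locations individually.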

chris

On Nov 27, 2011, at 2:24 AM, Ross KK Leung wrote:

> Hi all,
> 
> To write a script to extract sequence generically for all types of
> BioLocation objects, I'd like to know if there is any way to check what
> types (e.g. simple or split) are being processed.
> 
> Bio::Location::CoordinatePolicyI appears to be doing something similar but
> it is more like a post checking step. If I parse the genbank file line by
> line, I can certainly check whether the line contains keywords like "join"
> but as I'm using something like:
> 
>        my @features=grep{$_->primary_tag eq $chkTags[0]}
> $seqobj->get_SeqFeatures;                                    
> 
> 
>        foreach (@features) {
> 
>            $pseudo=$_->has_tag('pseudo')?'pseudo':'functional';
> 
>            @gene=[];                                                   
> 
> I'd appreciate if anybody knows a better integration with the well-developed
> bioperl module.
> 
> Thanks a lot.
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Mon Nov 28 19:47:10 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Mon, 28 Nov 2011 19:47:10 +0000
Subject: [Bioperl-l] Interest in Bio::Community modules
In-Reply-To: <4ED30F93.4000407@gmail.com>
References: <4ED30F93.4000407@gmail.com>
Message-ID: <3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu>

I think the idea is sound, it would be nice to have.  Jason is working a bit in this area, maybe he has some additional thoughts?  Would there be some redundancy with any current code (Bio::Tree, Bio::Taxon, etc)?  I do think it should be developed on its own, per our recent discussions re: slimming down core.

Re: using Moose and BioPerl, from experience it's a little tricky integrating the two as BioPerl uses perl's simple OO system while Moose allows much more (though it has its own issues, memory and speed being the top two).  Specifying inheritance from a BioPerl module is a bit trickier, you'll have to wrap the module via MooseX::NonMoose.  Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine.

chris

On Nov 27, 2011, at 10:35 PM, Florent Angly wrote:

> Hi all,
> 
> I have been thinking about starting a set of Perl modules that would useful for (microbial) ecologists to represent communities of organisms. At the moment, there does not seem to be anything like this in Bioperl. I am happy to make these modules available under the Bioperl umbrella using the Bio::Community::* namespace.
> 
> I envision the following modules:
> * Bio::Community::Member module representing members of a community.
> * Bio::Community::IO modules to read/write files that describe community composition (a.k.a. OTU table, or site by species table) as used programs like QIIME, Pyrotagger, GAAS, ...
> * Bio::Community::Tools modules to help manipulate communities, e.g. to take some members at random, normalize the community to a given number of individuals, or do rarefaction curves.
> 
> The idea is to implement these modules in Moose to teach myself Moose. The members of a community could be a sequence (Bio::SeqI), a species (Bio::S), an arbitrary string or even other things. I am not quite sure if Bioperl provide facilities to attach some arbitrary information to an object.
> 
> Any interest? Ideas? Comments?
> 
> Thanks,
> 
> Florent
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From l.m.timmermans at students.uu.nl  Mon Nov 28 20:25:13 2011
From: l.m.timmermans at students.uu.nl (Leon Timmermans)
Date: Mon, 28 Nov 2011 21:25:13 +0100
Subject: [Bioperl-l]  Interest in Bio::Community modules
In-Reply-To: <CAC1jpXDrmNJqzzWR80BvEzwDgGJBJekS4k+a5-ZUTkunU0r9VQ@mail.gmail.com>
References: <4ED30F93.4000407@gmail.com>
	<CAC1jpXDrmNJqzzWR80BvEzwDgGJBJekS4k+a5-ZUTkunU0r9VQ@mail.gmail.com>
Message-ID: <CAC1jpXCOTKF21kMNQbJZWF7oR-Ue1ry3HpYkVWK9=FV--QfTog@mail.gmail.com>

And now to the list too,

On Mon, Nov 28, 2011 at 5:35 AM, Florent Angly <florent.angly at gmail.com>wrote:

> The idea is to implement these modules in Moose to teach myself Moose. The
> members of a community could be a sequence (Bio::SeqI), a species (Bio::S),
> an arbitrary string or even other things. I am not quite sure if Bioperl
> provide facilities to attach some arbitrary information to an object.
>
> Any interest? Ideas? Comments?
>

Sounds like a good use-case for roles, maybe even parametric roles.

Leon


From florent.angly at gmail.com  Tue Nov 29 00:59:24 2011
From: florent.angly at gmail.com (Florent Angly)
Date: Tue, 29 Nov 2011 10:59:24 +1000
Subject: [Bioperl-l] Interest in Bio::Community modules
In-Reply-To: <3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu>
References: <4ED30F93.4000407@gmail.com>
	<3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu>
Message-ID: <4ED42E6C.6020501@gmail.com>

Hi Chris,

On 29/11/11 05:47, Fields, Christopher J wrote:
> I think the idea is sound, it would be nice to have.  Jason is working a bit in this area, maybe he has some additional thoughts?  Would there be some redundancy with any current code (Bio::Tree, Bio::Taxon, etc)?
None of these features would be duplicated. Rather, they would be used as 
attributes of the Bio::Community::* objects. For example, a member of a 
community could have a Bio::SeqI attached to it as well as a Bio::Taxon, 
etc...

> I do think it should be developed on it's own, per our recent discussions re: slimming down core.
Yes, the features are so different that it makes sense to have the 
Bio::Community::* modules as a separate BioPerl distribution (like the 
Bio-FeatureIO BioPerl distribution).

> Re: using Moose and BioPerl, from experience it's a little tricky integrating the two as BioPerl uses perl's simple OO system while Moose allows much more (though it has it's own issues, memory and speed being the top two).  Specifying inheritance from a BioPerl module is a bit trickier, you'll have to wrap the module via MooseX::NonMoose.  Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine.
Beyond Bio::RootI, I cannot imagine that any of the Bio::Community::* 
modules would need to inherit from any other BioPerl modules. 
Considering this and the performance aspects of Moose, do you think that 
using Moose is a wise design decision?

Best,

Florent


> chris
>
> On Nov 27, 2011, at 10:35 PM, Florent Angly wrote:
>
>> Hi all,
>>
>> I have been thinking about starting a set of Perl modules that would useful for (microbial) ecologists to represent communities of organisms. At the moment, there does not seem to be anything like this in Bioperl. I am happy to make these modules available under the Bioperl umbrella using the Bio::Community::* namespace.
>>
>> I envision the following modules:
>> * Bio::Community::Member module representing members of a community.
>> * Bio::Community::IO modules to read/write files that describe community composition (a.k.a. OTU table, or site by species table) as used programs like QIIME, Pyrotagger, GAAS, ...
>> * Bio::Community::Tools modules to help manipulate communities, e.g. to take some members at random, normalize the community to a given number of individuals, or do rarefaction curves.
>>
>> The idea is to implement these modules in Moose to teach myself Moose. The members of a community could be a sequence (Bio::SeqI), a species (Bio::S), an arbitrary string or even other things. I am not quite sure if Bioperl provide facilities to attach some arbitrary information to an object.
>>
>> Any interest? Ideas? Comments?
>>
>> Thanks,
>>
>> Florent
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Tue Nov 29 05:32:50 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 29 Nov 2011 05:32:50 +0000
Subject: [Bioperl-l] Interest in Bio::Community modules
In-Reply-To: <CAC1jpXCOTKF21kMNQbJZWF7oR-Ue1ry3HpYkVWK9=FV--QfTog@mail.gmail.com>
References: <4ED30F93.4000407@gmail.com>
	<CAC1jpXDrmNJqzzWR80BvEzwDgGJBJekS4k+a5-ZUTkunU0r9VQ@mail.gmail.com>
	<CAC1jpXCOTKF21kMNQbJZWF7oR-Ue1ry3HpYkVWK9=FV--QfTog@mail.gmail.com>
Message-ID: <C87E8F45-FE8A-4E77-A612-DF1E25C9CA73@illinois.edu>

On Nov 28, 2011, at 2:25 PM, Leon Timmermans wrote:

> And now to the list too,
> 
> On Mon, Nov 28, 2011 at 5:35 AM, Florent Angly <florent.angly at gmail.com>wrote:
> 
>> The idea is to implement these modules in Moose to teach myself Moose. The
>> members of a community could be a sequence (Bio::SeqI), a species (Bio::S),
>> an arbitrary string or even other things. I am not quite sure if Bioperl
>> provide facilities to attach some arbitrary information to an object.
>> 
>> Any interest? Ideas? Comments?
>> 
> 
> Sounds like a good use-case for roles, maybe even parametric roles.
> 
> Leon

Yep, agree totally.  It would be a good replacement in most cases for the BioI interfaces.  

(see also, the Biome project, which I'm slooooooowly working on again, on github)

chris


From pmr at ebi.ac.uk  Tue Nov 29 13:39:52 2011
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 29 Nov 2011 13:39:52 +0000
Subject: [Bioperl-l] BinarySearch.pm
Message-ID: <4ED4E0A8.30102@ebi.ac.uk>

In trying to use bioflat_index.pl index files in EMBOSS, I ran into some 
problems.

Both appear to be in the Bio/Flat/BinarySearch.pm source file.

EMBL ID lines are failing to drop the ';' from the ID. Updating the 
regular expression to make sure the ';' is not picked up seems to work:

   if ($format =~ /embl/i) {
     return ('ID',
	    "^ID   (\\S+[^; ])",
	    "^ID   (\\S+[^; ])",
	    {
	     ACC     => q/^AC   (\S+);/,
	     VERSION => q/^SV\s+(\S+)/
	    });
   }
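
A quick check of the pattern outside BioPerl, for anyone who wants to try it.
The ID line is an EMBL-style header written for this test, and the "old"
pattern is only my assumption about the previous behaviour, not a copy of the
BinarySearch.pm source.

```perl
use strict;
use warnings;

my $id_line = 'ID   X56734; SV 1; linear; mRNA; STD; PLN; 1859 BP.';

my $old = qr/^ID   (\S+)/;         # assumed previous behaviour: keeps the ';'
my $new = qr/^ID   (\S+[^; ])/;    # proposed: last captured char is not ';' or ' '

for my $pair ([old => $old], [new => $new]) {
    my ($name, $re) = @$pair;
    my ($id) = $id_line =~ $re;
    print "$name: $id\n";
}
```

Note that the proposed pattern needs at least two characters in the ID, which
should not matter for real EMBL entries.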

The ACC secondary index has every record duplicated.
This line is duplicated in the write_secondary_indices source code. Is 
that intentional?

  		    print $fh sprintf("%-${length}s",$record);

regards,

Peter Rice
EMBOSS Team


From uni.anastasia at gmail.com  Sat Nov 26 17:32:48 2011
From: uni.anastasia at gmail.com (anastsia shapiro)
Date: Sat, 26 Nov 2011 19:32:48 +0200
Subject: [Bioperl-l] Problem with parsing blast results
Message-ID: <CAHuWCXcLkSZ=N-HDG6aifV0Y1sL8JKNU5Jz=sGzyAfT8JDVTXw@mail.gmail.com>

Hello,

I'm running a script that should parse blast results using Bio::SearchIO.

Sometimes the script works fine, however sometimes it stops, and I receive
the following error.

------------- EXCEPTION -------------
MSG: no data for midline Query
------------------------------------------------------------
STACK Bio::SearchIO::blast::next_result C:/Perl64/site/lib/Bio\SearchIO\
blast.pm:1805
STACK toplevel
D:\D\uni-anastasia\project\scripts\ParsingBlastResults\ParsingBlastResults.pl:36
-------------------------------------
The blast result files were produced by running the following blast
command:
blastn -task blastn -db xxxxxxxxx.txt -evalue 1e-10 -perc_identity 80 -dust
no -num_descriptions 0  -query xxxxx.txt -out results.txt -strand plus

I am using bioperl 1.6.1.
I read all the forums , and it seems to be a bug, but on version 1.5 it was
fixed.

I will really appreciate your help, since I am trying to understand the
problem for over a month.

Regards,
Anastasia


From bunk at novozymes.com  Tue Nov 29 16:46:54 2011
From: bunk at novozymes.com (Jacob Bunk Nielsen)
Date: Tue, 29 Nov 2011 17:46:54 +0100
Subject: [Bioperl-l] Problem with parsing blast results
In-Reply-To: <CAHuWCXcLkSZ=N-HDG6aifV0Y1sL8JKNU5Jz=sGzyAfT8JDVTXw@mail.gmail.com>
	(anastsia shapiro's message of "Sat, 26 Nov 2011 18:32:48 +0100")
References: <CAHuWCXcLkSZ=N-HDG6aifV0Y1sL8JKNU5Jz=sGzyAfT8JDVTXw@mail.gmail.com>
Message-ID: <77sjl698qp.fsf@spurv.nzcorp.net>

Hi

anastsia shapiro <uni.anastasia at gmail.com> writes:

> I'm running a script that should parse BLAST results using Bio::SearchIO.
>
> Sometimes the script works fine, but sometimes it stops and I receive
> the following error.
>
> ------------- EXCEPTION -------------
> MSG: no data for midline Query
> ------------------------------------------------------------
> STACK Bio::SearchIO::blast::next_result C:/Perl64/site/lib/Bio\SearchIO\
> blast.pm:1805
> STACK toplevel
> D:\D\uni-anastasia\project\scripts\ParsingBlastResults\ParsingBlastResults.pl:36
> -------------------------------------
> The BLAST result files were produced by running the following command:
> blastn -task blastn -db xxxxxxxxx.txt -evalue 1e-10 -perc_identity 80 -dust no -num_descriptions 0  -query xxxxx.txt -out results.txt -strand plus

I don't know why this exact problem arises, but I think you should
consider using an output format that is better machine parseable, like
the XML format.

With the BLAST+ blastn you are running, XML output is selected with -outfmt 5
(-m 7 is the equivalent option for the older blastall). When reading
the result with BioPerl you must specify -format => 'blastxml' for Bio::SearchIO.

That way I think you are likely to see a lot fewer problems regarding
the parsing of blast output.
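
A minimal sketch of reading such a report, assuming BioPerl and its XML parser dependencies are installed. The sub name `report_hits` and the file name `results.xml` are hypothetical; the Bio::SearchIO calls are the standard result/hit/HSP iterators:

```perl
use strict;
use warnings;

# Print one tab-separated line per HSP from a BLAST XML report,
# e.g. one produced with:  blastn ... -outfmt 5 -out results.xml
# BioPerl is loaded inside the sub so this file still compiles on a
# machine where it is not installed.
sub report_hits {
    my ($xml_file) = @_;
    require Bio::SearchIO;

    my $in = Bio::SearchIO->new(-format => 'blastxml', -file => $xml_file);
    my $count = 0;
    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                print join("\t", $result->query_name, $hit->name,
                           $hsp->evalue, $hsp->percent_identity), "\n";
                $count++;
            }
        }
    }
    return $count;
}

# Run only when a report file is given on the command line.
report_hits($ARGV[0]) if @ARGV;
```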

If the above doesn't solve the problem, you had better show us the code that
fails.

Best regards

Jacob


From cjfields at illinois.edu  Tue Nov 29 19:11:11 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 29 Nov 2011 19:11:11 +0000
Subject: [Bioperl-l] Interest in Bio::Community modules
In-Reply-To: <4ED42E6C.6020501@gmail.com>
References: <4ED30F93.4000407@gmail.com>
	<3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu>
	<4ED42E6C.6020501@gmail.com>
Message-ID: <4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu>

On Nov 28, 2011, at 6:59 PM, Florent Angly wrote:

> Hi Chris,
> 
> On 29/11/11 05:47, Fields, Christopher J wrote:
> ...
>> Re: using Moose and BioPerl, from experience it's a little tricky integrating the two as BioPerl uses perl's simple OO system while Moose allows much more (though it has its own issues, memory and speed being the top two).  Specifying inheritance from a BioPerl module is a bit trickier: you'll have to wrap the module via MooseX::NonMoose.  Beyond that, I think if you stick with using is-a/has-a based relations for the time being when defining interactions with other BioPerl modules you'll be fine.
> Beyond Bio::RootI, I cannot imagine that any of the Bio::Community::* modules would need to inherit from any other BioPerl modules. Considering this and the performance aspects of Moose, do you think that using Moose is a wise design decision?

Moose can actually help in some circumstances; flattening the inheritance hierarchy by using roles seems to help.  And it never hurts to learn something new like Moose and other modern perl niceties.

> Best,
> 
> Florent


chris


From cjfields at illinois.edu  Tue Nov 29 22:30:58 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 29 Nov 2011 22:30:58 +0000
Subject: [Bioperl-l] BinarySearch.pm
In-Reply-To: <4ED4E0A8.30102@ebi.ac.uk>
References: <4ED4E0A8.30102@ebi.ac.uk>
Message-ID: <6F926A89-3B07-4924-8CC4-68A027E7FFCE@illinois.edu>

Peter, 

Can you send a test file that is failing?  I added a few tests using an example file with a ';' in the ID line, but everything is passing with our other EMBL example files.  I'm also looking into adding a method to return secondary IDs for a specific type ('ACC' for instance) so we can test the repeat issue for accessions.  Both changes pass tests as is, though, so I have committed them in the meantime.

chris


From florent.angly at gmail.com  Wed Nov 30 01:18:41 2011
From: florent.angly at gmail.com (Florent Angly)
Date: Wed, 30 Nov 2011 11:18:41 +1000
Subject: [Bioperl-l] Interest in Bio::Community modules
In-Reply-To: <4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu>
References: <4ED30F93.4000407@gmail.com>	<3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu>	<4ED42E6C.6020501@gmail.com>
	<4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu>
Message-ID: <4ED58471.3030106@gmail.com>

Chris,
Yes, it is exciting to learn something new.
I have developed a bit of code in the last few days in my local git 
repository. Do you think you could create a repository for Bio-Community 
on the Bioperl Github space or is it too soon?
Cheers,
Florent


From cjfields at illinois.edu  Wed Nov 30 02:34:00 2011
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 30 Nov 2011 02:34:00 +0000
Subject: [Bioperl-l] Interest in Bio::Community modules
In-Reply-To: <4ED58471.3030106@gmail.com>
References: <4ED30F93.4000407@gmail.com>
	<3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu>
	<4ED42E6C.6020501@gmail.com>
	<4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu>
	<4ED58471.3030106@gmail.com>
Message-ID: <A4EEF7BF-F70C-4672-B582-748B3023E709@illinois.edu>

On Nov 29, 2011, at 7:18 PM, Florent Angly wrote:

> Chris,
> Yes, it is exciting to learn something new.
> I have developed a bit of code in the last few days in my local git repository. Do you think you could create a repository for Bio-Community on the Bioperl Github space or is it too soon?

It's up to you.  I set up the barebones repo and added you on to push/pull/admin, you should be able to push to it whenever you are ready:

https://github.com/bioperl/Bio-Community

chris


From florent.angly at gmail.com  Wed Nov 30 02:50:04 2011
From: florent.angly at gmail.com (Florent Angly)
Date: Wed, 30 Nov 2011 12:50:04 +1000
Subject: [Bioperl-l] Interest in Bio::Community modules
In-Reply-To: <A4EEF7BF-F70C-4672-B582-748B3023E709@illinois.edu>
References: <4ED30F93.4000407@gmail.com>	<3C22AEEE-4D84-4612-9EB9-1C6D777C0768@illinois.edu>	<4ED42E6C.6020501@gmail.com>
	<4F371B97-0D6A-48CF-9B0B-62C6C69FD1DE@illinois.edu>
	<4ED58471.3030106@gmail.com>
	<A4EEF7BF-F70C-4672-B582-748B3023E709@illinois.edu>
Message-ID: <4ED599DC.6090808@gmail.com>

Fantastic! Thank you very much Chris,
Florent


From lsbrath at gmail.com  Wed Nov 30 05:25:32 2011
From: lsbrath at gmail.com (Mgavi Brathwaite)
Date: Wed, 30 Nov 2011 00:25:32 -0500
Subject: [Bioperl-l] Exception MSG
Message-ID: <CAJm=ba9qxU2JY-i5K=sfuVcODpB941cRehpJgxcT0sThXYx68g@mail.gmail.com>

Hello,

I'm brushing up on my BioPerl and can't figure out this exception message:

------------- EXCEPTION -------------
MSG: cannot open >/Users/mydata/Desktop/gi|255572219|ref|XP_002527049|.out
STACK Bio::Tools::Run::RemoteBlast::save_output /Library/Perl/5.10.0/Bio/Tools/Run/RemoteBlast.pm:678
STACK toplevel /Users/mydata/Documents/workspace/BI7643/rb_ex.pl:40
-------------------------------------

Here is the code:

#!/usr/bin/perl -w

use strict;
use Bio::Tools::Run::RemoteBlast;

#=cut
my $prog  = 'blastp';
my $db    = 'swissprot';
my $e_val = '1e-10';

my @params = ('-prog'      => $prog,
              '-data'      => $db,
              'expect'     => $e_val,
              'readmethod' => 'SearchIO' );

my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

#human database
$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens [ORGN]';

my $v = 1; # this is just to turn on and off the messages

# Construct the sequence object
my $seq_in = Bio::SeqIO->new(-file   => "/Users/mydata/Desktop/rb_ex.fa",
                             -format => "fasta");

while (my $input = $seq_in->next_seq()){
  my $r = $factory->submit_blast($input);
  print STDERR "waiting..." if ($v > 0);
  while (my @rids = $factory->each_rid()){
    foreach my $rid (@rids){
      my $rc = $factory->retrieve_blast($rid);
      if( !ref($rc) ) {
        if($rc < 0){
          $factory->remove_rid($rid);
        }
        print STDERR "." if ($v > 0);
        sleep 5;
      } else {
        my $result = $rc->next_result();
        #save output
        my $filename = ">/Users/mydata/Desktop/".$result->query_name().".out";#error
        $factory->save_output($filename);
        $factory->remove_rid($rid);
        print "\nQuery Name: ", $result->query_name(), "\n";
        while ( my $hit = $result->next_hit ) {
          next unless ( $v > 0);
          print "\thit name is ", $hit->name, "\n";
          while( my $hsp = $hit->next_hsp ) {
            print "\t\tscore is ", $hsp->score, "\n";
          }
        }
      }
    }
  }
}


Thanks for the help!


From jason.stajich at gmail.com  Wed Nov 30 06:05:41 2011
From: jason.stajich at gmail.com (Jason Stajich)
Date: Tue, 29 Nov 2011 22:05:41 -0800
Subject: [Bioperl-l] Exception MSG
In-Reply-To: <CAJm=ba9qxU2JY-i5K=sfuVcODpB941cRehpJgxcT0sThXYx68g@mail.gmail.com>
References: <CAJm=ba9qxU2JY-i5K=sfuVcODpB941cRehpJgxcT0sThXYx68g@mail.gmail.com>
Message-ID: <50C532E3-5212-49CE-A3CA-9FE91BDC00CF@gmail.com>

I don't think you need to give it the '>' when you specify the filename for the output. That is done by the filehandle opening itself.
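
A small illustration of that point in plain Perl, under the assumption that save_output opens the file with an explicit write mode (this has not been checked against RemoteBlast.pm). With three-argument open, a '>' at the start of the name is taken as part of the path rather than as a mode. The sub name `can_create` and the file names are made up for the example:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Try to create $name for writing with three-argument open.
# Returns 1 on success, 0 on failure.
sub can_create {
    my ($name) = @_;
    open(my $fh, '>', $name) or return 0;
    close $fh;
    return 1;
}

# Scratch directory that is removed when the script exits.
my $dir = tempdir(CLEANUP => 1);

# The mode is passed separately, so the leading '>' becomes a path
# component that points at a directory which does not exist.
print "with '>':    ", can_create(">$dir/query.out"), "\n";
print "without '>': ", can_create("$dir/query.out"), "\n";
```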


From ss2489 at cornell.edu  Wed Nov 30 14:32:47 2011
From: ss2489 at cornell.edu (Surya Saha)
Date: Wed, 30 Nov 2011 09:32:47 -0500
Subject: [Bioperl-l] Exception MSG
In-Reply-To: <50C532E3-5212-49CE-A3CA-9FE91BDC00CF@gmail.com>
References: <CAJm=ba9qxU2JY-i5K=sfuVcODpB941cRehpJgxcT0sThXYx68g@mail.gmail.com>
	<50C532E3-5212-49CE-A3CA-9FE91BDC00CF@gmail.com>
Message-ID: <CAEiaqD=5V3_JVpzi1kjtgfCRmZg1VkOw2jWGUqNctkq9DbOnZw@mail.gmail.com>

If that does not fix it, try using one of the unique identifiers (the gi
number, for example) as the file name instead of the full query name. The
pipe (|) characters might cause problems.
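
One way to do that, as a sketch only: the helper name `safe_basename` and the fallback rule are assumptions, not part of BioPerl. It takes the gi number when the query name looks like "gi|12345|..." and otherwise replaces any run of awkward characters with an underscore:

```perl
use strict;
use warnings;

# Build a file-system-friendly base name from a BLAST query name.
# Prefers the gi number when the name starts with "gi|<digits>|";
# otherwise every run of characters other than word characters,
# dots and hyphens is replaced with a single '_'.
sub safe_basename {
    my ($query_name) = @_;
    if ($query_name =~ /^gi\|(\d+)\|/) {
        return "gi_$1";
    }
    (my $clean = $query_name) =~ s/[^\w.-]+/_/g;
    return $clean;
}

print safe_basename('gi|255572219|ref|XP_002527049|'), "\n";
print safe_basename('my query|v2'), "\n";
```

The result can then be joined with the output directory and ".out" before being passed to save_output.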


From lsbrath at gmail.com  Wed Nov 30 14:34:52 2011
From: lsbrath at gmail.com (Mgavi Brathwaite)
Date: Wed, 30 Nov 2011 09:34:52 -0500
Subject: [Bioperl-l] Exception MSG
In-Reply-To: <CAEiaqD=5V3_JVpzi1kjtgfCRmZg1VkOw2jWGUqNctkq9DbOnZw@mail.gmail.com>
References: <CAJm=ba9qxU2JY-i5K=sfuVcODpB941cRehpJgxcT0sThXYx68g@mail.gmail.com>
	<50C532E3-5212-49CE-A3CA-9FE91BDC00CF@gmail.com>
	<CAEiaqD=5V3_JVpzi1kjtgfCRmZg1VkOw2jWGUqNctkq9DbOnZw@mail.gmail.com>
Message-ID: <CAJm=ba-yP6q53NunpxPJzdurthGE2uN3GAtiGs7eHm1rY6AdoA@mail.gmail.com>

Surya,

As Jason suggested, I removed the '>' and it worked. Thanks for your
response.

Lom


From ericdemuinck at gmail.com  Wed Nov 30 23:36:36 2011
From: ericdemuinck at gmail.com (Ericde)
Date: Wed, 30 Nov 2011 15:36:36 -0800 (PST)
Subject: [Bioperl-l] retrieving blast multiple alignment in fasta form
Message-ID: <32886592.post@talk.nabble.com>


:-/

I am a newbie and I am trying to retrieve a BLAST multiple alignment in
FASTA form. The BLAST output (-m 2) gives several alignments (which is good),
and parsing the XML file seems to list all of these alignments (which
is also good).

The problem is that the FASTA alignment file only includes one of the hits,
and the alignment does not include all the sequences (including the query
sequence).

I would like to generate a FASTA file that includes all the alignments
in the -m 2 output (plus the query sequence if possible). I have
cobbled together a script (below). I will attach the sample BLAST XML file
and the -m 2 file as well. Any insight is appreciated :/

#module load perl

#give the name of the blast xml file to parse in the line where it says 'file =>'
use Bio::SearchIO;
use Bio::AlignIO;
#Use -m 7 to generate xml file from blastall
my $in = Bio::SearchIO->new(-format => 'blastxml',
                            -file   => 'BLASToutxml');
while( my $result = $in->next_result ) {
  ## $result is a Bio::Search::Result::ResultI compliant object
  while( my $hit = $result->next_hit ) {
    ## $hit is a Bio::Search::Hit::HitI compliant object
    while( my $hsp = $hit->next_hsp ) {
      ## $hsp is a Bio::Search::HSP::HSPI compliant object
      #ENTER desired sequence length
      if( $hsp->length('total') > 50 ) {
        #ENTER desired percent identity
        if ( $hsp->percent_identity >= 75 ) {
          print "Query=",   $result->query_name,
            " Hit=",        $hit->name,
            " Length=",     $hsp->length('total'),
            " Percent_id=", $hsp->percent_identity, "\n";
          #Print alignment to file
          #$aln will be a Bio::SimpleAlign object
          my $aln = $hsp->get_aln;

          #changed msf to fasta and hsp.msf to hsp.fas, output is now a fasta file
          my $alnIO = Bio::AlignIO->new(-format => "fasta", -file => ">hsp.fas");
          $alnIO->write_aln($aln);
        }
      }
    }
  }
}
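
A possible explanation, offered as a sketch rather than a confirmed diagnosis: the Bio::AlignIO writer is created inside the innermost loop with ">hsp.fas", so the file is reopened and truncated for every HSP that passes the filters, and only the last one remains. The version below opens the writer once, before the loops. The sub name and its parameters are hypothetical. Note also that each `get_aln` result is a pairwise query/hit alignment, so the file would hold a series of pairs, not a single query-anchored multiple alignment like the -m 2 report:

```perl
use strict;
use warnings;

# Write every HSP alignment that passes the filters to ONE FASTA file
# and return how many were written. BioPerl is loaded inside the sub
# so this file still compiles on a machine where it is not installed.
sub write_hsp_alignments {
    my ($xml_file, $out_file, $min_len, $min_pid) = @_;
    require Bio::SearchIO;
    require Bio::AlignIO;

    my $in = Bio::SearchIO->new(-format => 'blastxml', -file => $xml_file);

    # Open the writer once, before the loops, so that earlier
    # alignments are not overwritten by later ones.
    my $out = Bio::AlignIO->new(-format => 'fasta', -file => ">$out_file");

    my $written = 0;
    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                next unless $hsp->length('total') > $min_len;
                next unless $hsp->percent_identity >= $min_pid;
                $out->write_aln($hsp->get_aln);   # pairwise query/hit alignment
                $written++;
            }
        }
    }
    return $written;
}

# Run only when input and output file names are given on the command line.
if (@ARGV == 2) {
    my $n = write_hsp_alignments($ARGV[0], $ARGV[1], 50, 75);
    print "wrote $n alignments to $ARGV[1]\n";
}
```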
http://old.nabble.com/file/p32886592/BLASToutxml BLASToutxml 
http://old.nabble.com/file/p32886592/hsp.fas hsp.fas 
-- 
View this message in context: http://old.nabble.com/retrieving-blast-multiple-alignment-in-fasta-form-tp32886592p32886592.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.