[Bioperl-l] fastq splitter

Wed Feb 29 13:23:27 EST 2012

Just want to say: if you can set up a local perl and local::lib, it makes your life a LOT easier, particularly if you are running jobs on older versions of RHEL, which notoriously stuck with outdated/broken versions of perl (as well as other tools).
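
As a rough sketch of why this helps (not something taken from the thread): perl searches the directories in PERL5LIB in order, before its built-in @INC, so the directory listed first wins. A local::lib setup and the profile-sourcing trick quoted below both rely on that ordering. The helper name prepend_perl5lib is made up for illustration, and the path assumes local::lib's default location under ~/perl5.

```shell
# Hypothetical helper: put one directory at the front of PERL5LIB so perl
# searches it before the other PERL5LIB entries and the system @INC.
prepend_perl5lib() {
    dir=$1
    case "${PERL5LIB:-}" in
        "$dir"|"$dir":*) ;;                  # already first: leave as-is
        "") PERL5LIB=$dir ;;                 # unset or empty: just set it
        *)  PERL5LIB=${dir}:${PERL5LIB} ;;   # otherwise prepend
    esac
    export PERL5LIB
}

# Assumed location; ~/perl5 is where local::lib installs by default.
prepend_perl5lib "$HOME/perl5/lib/perl5"
```

Calling something like this at the top of a job script, or from the profile the job script sources, should make the compute nodes pick up the same modules as your login shell. Running perl -V on a node prints the resulting @INC if you want to check.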

chris

On Feb 29, 2012, at 12:11 PM, Thomas Sharpton wrote:

> This was an interesting thread to follow (I'm about to dive into Illumina data). Glad you found the cause of the problem, Sean.
> 
> FYI - you may already know this trick, but when I work on a cluster, the first command in my submission script always sources my bash profile (.profile, .bashrc, etc., depending on your setup).
> 
> This way you can control the structure of the PERL5LIB variable (among others) on the slave nodes and ensure your local perl modules are preferentially called.
> 
> Of course there are other solutions to this problem too.
> 
> Best,
> Tom
> 
> On Feb 29, 2012 9:38 AM, "Sean O'Keeffe" <limericksean at gmail.com> wrote:
> Yes. I ran my script on a cluster which may have had bioperl installed, not
> sure.
> Running it locally = success.
> 
> Thanks all!
> 
> 
> 
> On 29 February 2012 12:13, Fields, Christopher J <cjfields at illinois.edu> wrote:
> 
> > Sean,
> >
> > To follow up just in case it was a bug, tested with your seq examples and
> > they also work, so my guess is something else is wrong locally.
> >
> > [cjfields at pyrimidine-laptop sean]$ perl test.pl < example2.fastq
> > @HWI-ST156:445:C0EDLACXX:4:1101:1496:1039 1:N:0:ATCACG
> > CTGCTGGTAGTGCCCAAAGACCTCGAATACAATGGGCTTGGTTTTGATGT
> > +
> > BCCFFFFEHHHHHJJJJJHIIJIJJIIGIJJJJJJJIJJJI?FHJJIIJA
> > @HWI-ST156:445:C0EDLACXX:4:2308:20877:199811 2:Y:0:ATCACG
> > TCATAAAAATAACAAAACCACCACCCCATACAAACTCTACTCATCTCCAC
> > +
> > ##################################################
> >
> > chris
> >
> > On Feb 28, 2012, at 3:11 PM, Sean O'Keeffe wrote:
> >
> > > Hi,
> > > I'm trying to write a quick script to separate one large PE fastq file into
> > > 2 separate files, one for each mate pair
> > >
> > > The file is of the format (mate1)
> > > @HWI-ST156:445:C0EDLACXX:4:1101:1496:1039 1:N:0:ATCACG
> > > CTGCTGGTAGTGCCCAAAGACCTCGAATACAATGGGCTTGGTTTTGATGT
> > > +
> > > BCCFFFFEHHHHHJJJJJHIIJIJJIIGIJJJJJJJIJJJI?FHJJIIJA
> > >
> > > && (mate2)
> > >
> > > @HWI-ST156:445:C0EDLACXX:4:2308:20877:199811 2:Y:0:ATCACG
> > > TCATAAAAATAACAAAACCACCACCCCATACAAACTCTACTCATCTCCAC
> > > +
> > > ##################################################
> > >
> > >
> > > My idea is to separate using a regex such that / 1:/ would be the first
> > > mate pair and / 2:/ would go in the second mate file.
> > > I implemented the code below but each output file is empty. Can someone
> > > spot my error?
> > >
> > > Thanks,
> > > Sean.
> > >
> > > my $infile   = shift;
> > > my $outfile1 = $infile."_1";
> > > my $outfile2 = $infile."_2";
> > >
> > > my $seqin = Bio::SeqIO->new(
> > >                             -file   => "<$infile",
> > >                             -format => "fastq",
> > >                             );
> > > my $seqout1 = Bio::SeqIO->new(
> > >                              -file   => ">$outfile1",
> > >                              -format => "fastq",
> > >                              );
> > >
> > > my $seqout2 = Bio::SeqIO->new(
> > >                              -file   => ">$outfile2",
> > >                              -format => "fastq",
> > >                              );
> > > while (my $inseq = $seqin->next_seq) {
> > >    if ($seqin->desc =~ / 1:/){
> > >      $seqout1->write_seq($inseq);
> > >    } else {
> > >      $seqout2->write_seq($inseq);
> > >    }
> > > }
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >